A self-supervised framework for laboratory data imputation in electronic health records
Abstract Background Laboratory data in electronic health records (EHRs) is an effective source of information to characterize patient populations, inform accurate diagnostics and treatment decisions, and fuel research studies. However, despite their value, laboratory values are underutilized due to...
Saved in:
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
Nature Portfolio
2025-07-01
|
| Series: | Communications Medicine |
| Online Access: | https://doi.org/10.1038/s43856-025-00973-w |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | Abstract Background Laboratory data in electronic health records (EHRs) is an effective source of information to characterize patient populations, inform accurate diagnostics and treatment decisions, and fuel research studies. However, despite their value, laboratory values are underutilized due to high levels of missingness. Existing imputation methods fall short, as they do not fully leverage patient clinical histories and are commonly not scalable to the large number of tests available in real-world data (RWD). Methods To address these shortcomings, we present Laboratory Imputation Framework for EHRs (LIFE), a self-supervised learning framework based on multi-head attention that is trained to impute any laboratory test value at any point in time in the patient’s journey using their complete EHRs. This architecture (1) eliminates the need to train a different model for each laboratory test by jointly modeling all laboratory data of interest; and (2) better clinically contextualizes the predictions by leveraging additional EHR variables, such as diagnosis, medications, and discrete laboratory results. Results We validate our framework using a large-scale, real-world dataset encompassing over 1 million oncology patients. Our results demonstrate that LIFE obtains superior or equivalent results compared to state-of-the-art baseline methods in 23 out of 25 evaluated laboratory tests and better enhances a downstream adverse event detection task in 7 out of 9 cases. Conclusions LIFE shows promise in accurately estimating missing laboratory values and enhancing the utilization of large-scale RWD in healthcare. This advancement could lead to better clinical models, more informed decision-making and improved patient outcomes. |
|---|---|
| ISSN: | 2730-664X |