A self-supervised framework for laboratory data imputation in electronic health records
Abstract Background Laboratory data in electronic health records (EHRs) is an effective source of information to characterize patient populations, inform accurate diagnostics and treatment decisions, and fuel research studies. However, despite their value, laboratory values are underutilized due to...
| Main Authors: | Samuel P. Heilbroner, Curtis Carter, David M. Vidmar, Erik T. Mueller, Martin C. Stumpe, Riccardo Miotto |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Communications Medicine |
| Online Access: | https://doi.org/10.1038/s43856-025-00973-w |
| _version_ | 1849334606878212096 |
|---|---|
| author | Samuel P. Heilbroner; Curtis Carter; David M. Vidmar; Erik T. Mueller; Martin C. Stumpe; Riccardo Miotto |
| collection | DOAJ |
| description | Abstract. Background: Laboratory data in electronic health records (EHRs) is an effective source of information to characterize patient populations, inform accurate diagnostics and treatment decisions, and fuel research studies. However, despite their value, laboratory values are underutilized due to high levels of missingness. Existing imputation methods fall short, as they do not fully leverage patient clinical histories and are commonly not scalable to the large number of tests available in real-world data (RWD). Methods: To address these shortcomings, we present Laboratory Imputation Framework for EHRs (LIFE), a self-supervised learning framework based on multi-head attention that is trained to impute any laboratory test value at any point in time in the patient’s journey using their complete EHRs. This architecture (1) eliminates the need to train a different model for each laboratory test by jointly modeling all laboratory data of interest; and (2) better clinically contextualizes the predictions by leveraging additional EHR variables, such as diagnosis, medications, and discrete laboratory results. Results: We validate our framework using a large-scale, real-world dataset encompassing over 1 million oncology patients. Our results demonstrate that LIFE obtains superior or equivalent results compared to state-of-the-art baseline methods in 23 out of 25 evaluated laboratory tests and better enhances a downstream adverse event detection task in 7 out of 9 cases. Conclusions: LIFE shows promise in accurately estimating missing laboratory values and enhancing the utilization of large-scale RWD in healthcare. This advancement could lead to better clinical models, more informed decision-making and improved patient outcomes. |
| format | Article |
| id | doaj-art-4d77bec095e14973ae8863fde02f7987 |
| institution | Kabale University |
| issn | 2730-664X |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Communications Medicine |
| spelling | doaj-art-4d77bec095e14973ae8863fde02f7987; 2025-08-20T03:45:31Z; eng; Nature Portfolio; Communications Medicine; 2730-664X; 2025-07-01; vol. 5, no. 1, pp. 1-11; 10.1038/s43856-025-00973-w; A self-supervised framework for laboratory data imputation in electronic health records; Samuel P. Heilbroner, Curtis Carter, David M. Vidmar, Erik T. Mueller, Martin C. Stumpe, Riccardo Miotto (all Tempus AI, Inc.); https://doi.org/10.1038/s43856-025-00973-w |
| title | A self-supervised framework for laboratory data imputation in electronic health records |
| url | https://doi.org/10.1038/s43856-025-00973-w |
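
The abstract describes LIFE as self-supervised: the model is trained to impute laboratory values from the rest of the record. One common way to set up such an objective, sketched below, is to hide a fraction of the values that were actually observed and score an imputer only on those hidden entries. This is an illustrative assumption and not the authors' implementation: the function names, the toy lab matrix, and the column-mean imputer (a stand-in for the multi-head attention model) are all hypothetical.

```python
import random


def mask_observed(rows, frac, rng):
    """Hide a fraction of the observed (non-None) entries.

    Returns the masked copy and the set of (row, col) positions hidden.
    At least one entry is always hidden so the score is well defined.
    """
    observed = [(i, j) for i, row in enumerate(rows)
                for j, v in enumerate(row) if v is not None]
    k = max(1, round(frac * len(observed)))
    hidden = set(rng.sample(observed, k))
    masked = [[None if (i, j) in hidden else v for j, v in enumerate(row)]
              for i, row in enumerate(rows)]
    return masked, hidden


def column_mean_impute(masked):
    """Stand-in imputer (NOT LIFE): fill each test with its observed mean."""
    means = []
    for j in range(len(masked[0])):
        seen = [row[j] for row in masked if row[j] is not None]
        # Fall back to 0.0 if every value of a test happens to be hidden.
        means.append(sum(seen) / len(seen) if seen else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in masked]


def masked_mae(truth, imputed, hidden):
    """Mean absolute error, computed only on the entries that were hidden."""
    return sum(abs(truth[i][j] - imputed[i][j]) for i, j in hidden) / len(hidden)


# Toy data: rows are patient visits, columns are three hypothetical lab tests;
# None marks a value that is genuinely missing from the record.
labs = [
    [13.5, 250.0, None],
    [12.1, None, 0.9],
    [None, 310.0, 1.1],
    [14.2, 275.0, 1.0],
]
rng = random.Random(0)
masked, hidden = mask_observed(labs, frac=0.25, rng=rng)
imputed = column_mean_impute(masked)
score = masked_mae(labs, imputed, hidden)
print(f"hidden entries: {len(hidden)}, masked MAE: {score:.3f}")
```

Because the hidden entries have known true values, the score needs no external labels. A learned model would replace `column_mean_impute` and be trained to minimize the same masked error.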