A self-supervised framework for laboratory data imputation in electronic health records

Abstract. Background: Laboratory data in electronic health records (EHRs) is an effective source of information to characterize patient populations, inform accurate diagnostics and treatment decisions, and fuel research studies. However, despite their value, laboratory values are underutilized due to...

Bibliographic Details
Main Authors: Samuel P. Heilbroner, Curtis Carter, David M. Vidmar, Erik T. Mueller, Martin C. Stumpe, Riccardo Miotto
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Communications Medicine
Online Access: https://doi.org/10.1038/s43856-025-00973-w
_version_ 1849334606878212096
author Samuel P. Heilbroner
Curtis Carter
David M. Vidmar
Erik T. Mueller
Martin C. Stumpe
Riccardo Miotto
author_sort Samuel P. Heilbroner
collection DOAJ
description Abstract. Background: Laboratory data in electronic health records (EHRs) is an effective source of information to characterize patient populations, inform accurate diagnostics and treatment decisions, and fuel research studies. However, despite their value, laboratory values are underutilized due to high levels of missingness. Existing imputation methods fall short, as they do not fully leverage patient clinical histories and are commonly not scalable to the large number of tests available in real-world data (RWD). Methods: To address these shortcomings, we present Laboratory Imputation Framework for EHRs (LIFE), a self-supervised learning framework based on multi-head attention that is trained to impute any laboratory test value at any point in time in the patient’s journey using their complete EHRs. This architecture (1) eliminates the need to train a different model for each laboratory test by jointly modeling all laboratory data of interest; and (2) better clinically contextualizes the predictions by leveraging additional EHR variables, such as diagnosis, medications, and discrete laboratory results. Results: We validate our framework using a large-scale, real-world dataset encompassing over 1 million oncology patients. Our results demonstrate that LIFE obtains superior or equivalent results compared to state-of-the-art baseline methods in 23 out of 25 evaluated laboratory tests and better enhances a downstream adverse event detection task in 7 out of 9 cases. Conclusions: LIFE shows promise in accurately estimating missing laboratory values and enhancing the utilization of large-scale RWD in healthcare. This advancement could lead to better clinical models, more informed decision-making and improved patient outcomes.
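The self-supervised idea in the Methods can be illustrated with a small sketch: hide some laboratory values that were actually observed, ask a model to fill them in, and score the predictions against the hidden values. This is only an illustration under stated assumptions and is not the LIFE implementation. The function names (`mask_observed`, `mean_imputer`, `masked_mae`), the masking fraction, and the example values are all hypothetical, and the mean-based imputer is a simple stand-in for the multi-head attention model described in the abstract.

```python
import random


def mask_observed(values, mask_fraction, rng):
    """Hide a random subset of observed entries.

    Returns the masked series and a dict {index: true_value} of hidden targets.
    Entries that were already missing (None) are never selected as targets.
    """
    observed = [i for i, v in enumerate(values) if v is not None]
    n_hide = max(1, int(len(observed) * mask_fraction)) if observed else 0
    hidden = rng.sample(observed, n_hide)
    masked = list(values)
    targets = {}
    for i in hidden:
        targets[i] = masked[i]
        masked[i] = None
    return masked, targets


def mean_imputer(masked):
    """Stand-in model (assumption): fill every gap with the mean of visible values."""
    visible = [v for v in masked if v is not None]
    fill = sum(visible) / len(visible) if visible else 0.0
    return [fill if v is None else v for v in masked]


def masked_mae(predictions, targets):
    """Mean absolute error computed only on the artificially hidden entries."""
    if not targets:
        raise ValueError("no hidden targets to score")
    return sum(abs(predictions[i] - t) for i, t in targets.items()) / len(targets)


# Illustrative series for one laboratory test; None marks a truly missing value.
series = [13.1, None, 12.8, 12.5, None, 11.9]
masked, targets = mask_observed(series, mask_fraction=0.25, rng=random.Random(0))
predictions = mean_imputer(masked)
print("hidden targets:", targets)
print("masked MAE:", masked_mae(predictions, targets))
```

Because the hidden values come from the data itself, no manual labels are needed, which is what makes the setup self-supervised. In a real training loop the stand-in imputer would be replaced by a learned model and the masked error would serve as the loss.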
format Article
id doaj-art-4d77bec095e14973ae8863fde02f7987
institution Kabale University
issn 2730-664X
language English
publishDate 2025-07-01
publisher Nature Portfolio
record_format Article
series Communications Medicine
spelling doaj-art-4d77bec095e14973ae8863fde02f7987
2025-08-20T03:45:31Z
eng
Nature Portfolio
Communications Medicine
2730-664X
2025-07-01
51111
10.1038/s43856-025-00973-w
A self-supervised framework for laboratory data imputation in electronic health records
Samuel P. Heilbroner (0), Curtis Carter (1), David M. Vidmar (2), Erik T. Mueller (3), Martin C. Stumpe (4), Riccardo Miotto (5)
Tempus AI, Inc.; Tempus AI, Inc.; Tempus AI, Inc.; Tempus AI, Inc.; Tempus AI, Inc.; Tempus AI, Inc.
Abstract. Background: Laboratory data in electronic health records (EHRs) is an effective source of information to characterize patient populations, inform accurate diagnostics and treatment decisions, and fuel research studies. However, despite their value, laboratory values are underutilized due to high levels of missingness. Existing imputation methods fall short, as they do not fully leverage patient clinical histories and are commonly not scalable to the large number of tests available in real-world data (RWD). Methods: To address these shortcomings, we present Laboratory Imputation Framework for EHRs (LIFE), a self-supervised learning framework based on multi-head attention that is trained to impute any laboratory test value at any point in time in the patient’s journey using their complete EHRs. This architecture (1) eliminates the need to train a different model for each laboratory test by jointly modeling all laboratory data of interest; and (2) better clinically contextualizes the predictions by leveraging additional EHR variables, such as diagnosis, medications, and discrete laboratory results. Results: We validate our framework using a large-scale, real-world dataset encompassing over 1 million oncology patients. Our results demonstrate that LIFE obtains superior or equivalent results compared to state-of-the-art baseline methods in 23 out of 25 evaluated laboratory tests and better enhances a downstream adverse event detection task in 7 out of 9 cases. Conclusions: LIFE shows promise in accurately estimating missing laboratory values and enhancing the utilization of large-scale RWD in healthcare. This advancement could lead to better clinical models, more informed decision-making and improved patient outcomes.
https://doi.org/10.1038/s43856-025-00973-w
title A self-supervised framework for laboratory data imputation in electronic health records
url https://doi.org/10.1038/s43856-025-00973-w