Transformer patient embedding using electronic health records enables patient stratification and progression analysis

Bibliographic Details
Main Authors: Su Xian, Monika E. Grabowska, Iftikhar J. Kullo, Yuan Luo, Jordan W. Smoller, Theresa L. Walunas, Wei-Qi Wei, Gail P. Jarvik, Sean D. Mooney, David R. Crosslin
Format: Article
Language: English
Published: Nature Portfolio 2025-08-01
Series: npj Digital Medicine
Online Access: https://doi.org/10.1038/s41746-025-01872-z
Description
Summary: Current studies on the secondary use of electronic health records (EHR) predominantly rely on domain expertise and existing medical knowledge. A powerful representation approach can unlock the potential to discover new medical patterns underlying the EHR. Here, we introduce an unsupervised method for embedding high-dimensional EHR data at the patient level to characterize heterogeneity in complex diseases and identify novel disease patterns linked to disparities in clinical outcomes. We applied this approach to 34,851 unique medical codes across 1,046,649 longitudinal patient events, including 102,740 patients in the Electronic Medical Records and GEnomics (eMERGE) Network. The model performed strongly in predicting future disease (median AUROC = 0.87 within one year) and in bulk phenotyping (median AUROC = 0.84). Notably, these patient embeddings revealed diverse comorbidity profiles and health outcomes, including distinct subtypes and progression patterns in colorectal cancer and systemic lupus erythematosus.
ISSN: 2398-6352