Autoencoder-Based Representation Learning for Similar Patients Retrieval From Electronic Health Records: Comparative Study

Bibliographic Details
Main Authors: Deyi Li, Aditi Shukla, Sravani Chandaka, Bradley Taylor, Jie Xu, Mei Liu
Format: Article
Language: English
Published: JMIR Publications, 2025-07-01
Series: JMIR Medical Informatics
Online Access: https://medinform.jmir.org/2025/1/e68830
Description
Summary:

Background: By analyzing electronic health record snapshots of similar patients, physicians can proactively predict disease onsets, customize treatment plans, and anticipate patient-specific trajectories. However, modeling electronic health record data is inherently challenging due to its high dimensionality, mixed feature types, noise, bias, and sparsity. Patient representation learning using autoencoders (AEs) presents promising opportunities to address these challenges. A critical question remains: how do different AE designs and distance measures impact the quality of retrieved similar patient cohorts?

Objective: This study aims to evaluate the performance of 5 common AE variants in retrieving similar patients: vanilla autoencoder, denoising autoencoder, contractive autoencoder, sparse autoencoder, and robust autoencoder. Additionally, it investigates the impact of different distance measures and hyperparameter configurations on model performance.

Methods: We tested the 5 AE variants on 2 real-world datasets, from the University of Kansas Medical Center (n=13,752) and the Medical College of Wisconsin (n=9568), across 168 different hyperparameter configurations. To retrieve similar patients based on the AE-produced latent representations, we applied k-nearest neighbors (k-NN) using Euclidean and Mahalanobis distances. Two prediction targets were evaluated: acute kidney injury onset and postdischarge 1-year mortality.

Results: Our findings demonstrate that (1) denoising autoencoders outperformed other AE variants when paired with Euclidean distance (P …

Conclusions: This study provides a comprehensive analysis of the performance of different AE variants in retrieving similar patients and evaluates the impact of various hyperparameter configurations on model performance. The findings lay the groundwork for future development of AE-based patient similarity estimation and personalized medicine.
ISSN: 2291-9694
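The Methods summary describes retrieving similar patients by running k-NN over AE-produced latent representations with either Euclidean or Mahalanobis distance. The following is a minimal NumPy sketch of that retrieval step, not the authors' implementation. It assumes the latent codes are already available as arrays. The function names (`pairwise_distances`, `retrieve_similar`, `neighbor_risk`) are hypothetical. Estimating the Mahalanobis covariance from the reference cohort's codes is an assumption, and so is scoring a patient by the fraction of retrieved neighbors with a positive outcome; the abstract does not state the study's exact prediction rule.

```python
import numpy as np


def pairwise_distances(query, reference, metric="euclidean", cov=None):
    """Return an (n_query, n_reference) matrix of distances between latent codes."""
    query = np.asarray(query, dtype=float)
    reference = np.asarray(reference, dtype=float)
    diff = query[:, None, :] - reference[None, :, :]  # shape (q, r, d)
    if metric == "euclidean":
        return np.sqrt(np.sum(diff ** 2, axis=-1))
    if metric == "mahalanobis":
        if cov is None:
            # Assumption: covariance is estimated from the reference cohort's codes.
            cov = np.atleast_2d(np.cov(reference, rowvar=False))
        inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance
        # Quadratic form diff^T * inv_cov * diff for every (query, reference) pair.
        sq = np.einsum("qrd,de,qre->qr", diff, inv_cov, diff)
        return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative round-off
    raise ValueError(f"unknown metric: {metric!r}")


def retrieve_similar(query, reference, k, metric="euclidean", cov=None):
    """Indices and distances of the k nearest reference patients for each query."""
    dist = pairwise_distances(query, reference, metric, cov)
    idx = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)


def neighbor_risk(neighbor_idx, labels):
    """Hypothetical scoring rule: fraction of retrieved neighbors with a positive outcome."""
    return np.asarray(labels, dtype=float)[neighbor_idx].mean(axis=1)


# Toy usage with made-up 2-dimensional latent codes and binary outcome labels.
reference_codes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
outcomes = np.array([0, 0, 1, 1])  # e.g., acute kidney injury onset (toy labels)
query_codes = np.array([[0.1, 0.0]])

for metric in ("euclidean", "mahalanobis"):
    idx, dist = retrieve_similar(query_codes, reference_codes, k=2, metric=metric)
    print(metric, idx, dist, neighbor_risk(idx, outcomes))
```

Mahalanobis distance rescales each direction of the latent space by the inverse covariance, so latent dimensions with large variance do not dominate the neighbor ranking as they can under Euclidean distance. With an identity covariance the two measures coincide.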