Identification of Patients With Congestive Heart Failure From the Electronic Health Records of Two Hospitals: Retrospective Study
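| Main Authors: | Daniel Sumsion, Elijah Davis, Marta Fernandes, Ruoqi Wei, Rebecca Milde, Jet Malou Veltink, Wan-Yee Kong, Yiwen Xiong, Samvrit Rao, Tara Westover, Lydia Petersen, Niels Turley, Arjun Singh, Stephanie Buss, Shibani Mukerji, Sahar Zafar, Sudeshna Das, Valdery Moura Junior, Manohar Ghanta, Aditya Gupta, Jennifer Kim, Katie Stone, Emmanuel Mignot, Dennis Hwang, Lynn Marie Trotti, Gari D Clifford, Umakanth Katwa, Robert Thomas, M Brandon Westover, Haoqi Sun |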
|---|---|
| Format: | Article |
| Language: | English |
| Published: | JMIR Publications, 2025-04-01 |
| Series: | JMIR Medical Informatics |
| Online Access: | https://medinform.jmir.org/2025/1/e64113 |
| _version_ | 1850186183972225024 |
|---|---|
| author | Daniel Sumsion; Elijah Davis; Marta Fernandes; Ruoqi Wei; Rebecca Milde; Jet Malou Veltink; Wan-Yee Kong; Yiwen Xiong; Samvrit Rao; Tara Westover; Lydia Petersen; Niels Turley; Arjun Singh; Stephanie Buss; Shibani Mukerji; Sahar Zafar; Sudeshna Das; Valdery Moura Junior; Manohar Ghanta; Aditya Gupta; Jennifer Kim; Katie Stone; Emmanuel Mignot; Dennis Hwang; Lynn Marie Trotti; Gari D Clifford; Umakanth Katwa; Robert Thomas; M Brandon Westover; Haoqi Sun |
| author_sort | Daniel Sumsion |
| collection | DOAJ |
| description |
Background: Congestive heart failure (CHF) is a common cause of hospital admissions. Medical records contain valuable information about CHF, but manual chart review is time-consuming. Claims databases (using International Classification of Diseases [ICD] codes) provide a scalable alternative but are less accurate. Automated analysis of medical records through natural language processing (NLP) enables more efficient adjudication but has not yet been validated across multiple sites.
Objective: We seek to accurately classify the diagnosis of CHF based on structured and unstructured data from each patient, including medications, ICD codes, and information extracted through NLP of notes left by providers, by comparing the effectiveness of several machine learning models.
Methods: We developed an NLP model to identify CHF from medical records using electronic health records (EHRs) from two hospitals (Mass General Hospital and Beth Israel Deaconess Medical Center; from 2010 to 2023), with 2800 clinical visit notes from 1821 patients. We trained and compared the performance of logistic regression, random forests, and RoBERTa models. We measured model performance using area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). These models were also externally validated by training on data from one hospital and testing on data from the other, and an overall estimated error was calculated using a completely random sample from both hospitals.
Results: The average age of the patients was 66.7 (SD 17.2) years; 978 (54.3%) out of 1821 patients were female. The logistic regression model achieved the best performance using a combination of ICD codes, medications, and notes, with an AUROC of 0.968 (95% CI 0.940-0.982) and an AUPRC of 0.921 (95% CI 0.835-0.969). The models that only used ICD codes or medications had lower performance. The estimated overall error rate in a random EHR sample was 1.6%. The model also showed high external validity from training on Mass General Hospital data and testing on Beth Israel Deaconess Medical Center data (AUROC 0.927, 95% CI 0.908-0.944) and vice versa (AUROC 0.968, 95% CI 0.957-0.976).
Conclusions: The proposed EHR-based phenotyping model for CHF achieved excellent performance, external validity, and generalization across two institutions. The model enables multiple downstream uses, paving the way for large-scale studies of CHF treatment effectiveness, comorbidities, outcomes, and mechanisms. |
| format | Article |
| id | doaj-art-7761e237ff094e1daa0527b8020f675b |
| institution | OA Journals |
| issn | 2291-9694 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | JMIR Publications |
| record_format | Article |
| series | JMIR Medical Informatics |
| spelling | doaj-art-7761e237ff094e1daa0527b8020f675b; 2025-08-20T02:16:28Z; eng; JMIR Publications; JMIR Medical Informatics; 2291-9694; 2025-04-01; 13; e64113; 10.2196/64113; Identification of Patients With Congestive Heart Failure From the Electronic Health Records of Two Hospitals: Retrospective Study; Daniel Sumsion (https://orcid.org/0009-0009-4994-0558); Elijah Davis (https://orcid.org/0009-0005-1821-5944); Marta Fernandes (https://orcid.org/0000-0002-7203-2832); Ruoqi Wei (https://orcid.org/0000-0002-1771-542X); Rebecca Milde (https://orcid.org/0009-0003-0923-6765); Jet Malou Veltink (https://orcid.org/0009-0000-5141-9934); Wan-Yee Kong (https://orcid.org/0000-0003-0553-9577); Yiwen Xiong (https://orcid.org/0009-0002-6857-4927); Samvrit Rao (https://orcid.org/0009-0001-7136-3760); Tara Westover (https://orcid.org/0009-0004-4795-2612); Lydia Petersen (https://orcid.org/0009-0008-4491-2948); Niels Turley (https://orcid.org/0009-0009-4806-578X); Arjun Singh (https://orcid.org/0009-0005-4370-3077); Stephanie Buss (https://orcid.org/0000-0002-9912-063X); Shibani Mukerji (https://orcid.org/0000-0002-5677-6954); Sahar Zafar (https://orcid.org/0000-0001-5252-5376); Sudeshna Das (https://orcid.org/0000-0002-9486-6811); Valdery Moura Junior (https://orcid.org/0000-0001-5735-9143); Manohar Ghanta (https://orcid.org/0009-0004-8488-3644); Aditya Gupta (https://orcid.org/0000-0002-5243-368X); Jennifer Kim (https://orcid.org/0000-0003-3072-6198); Katie Stone (https://orcid.org/0000-0003-2797-3171); Emmanuel Mignot (https://orcid.org/0000-0002-6928-5310); Dennis Hwang (https://orcid.org/0000-0002-4070-1640); Lynn Marie Trotti (https://orcid.org/0000-0003-2329-6847); Gari D Clifford (https://orcid.org/0000-0002-5709-201X); Umakanth Katwa (https://orcid.org/0009-0002-1810-4134); Robert Thomas (https://orcid.org/0000-0002-5575-3953); M Brandon Westover (https://orcid.org/0000-0003-4803-312X); Haoqi Sun (https://orcid.org/0000-0002-5041-8312); https://medinform.jmir.org/2025/1/e64113 |
| title | Identification of Patients With Congestive Heart Failure From the Electronic Health Records of Two Hospitals: Retrospective Study |
| url | https://medinform.jmir.org/2025/1/e64113 |
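
The abstract reports AUROC both for pooled data and for cross-site validation (train on one hospital, score the other). As a rough illustration of what that metric measures, the sketch below computes AUROC as the fraction of positive/negative pairs in which the CHF-positive note receives the higher score. It is a minimal sketch, not the authors' code: the function name `auroc` and the toy labels and scores are hypothetical, and the confidence intervals reported in the abstract are not reproduced here.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out site: 1 = CHF, 0 = no CHF, with scores from a
# model assumed to have been trained on the other site's data.
held_out_labels = [1, 0, 1, 0, 0, 1]
held_out_scores = [0.9, 0.2, 0.7, 0.4, 0.8, 0.6]

print(f"external AUROC: {auroc(held_out_labels, held_out_scores):.3f}")
```

The pairwise form is quadratic in the number of notes, which is fine for a few thousand records; a rank-based implementation would be the usual choice for larger samples.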