Large language models aided patient progression documentation according to the ICD standard

Background and Objective: Healthcare documentation processing is becoming increasingly efficient and effective as a result of advances in machine learning and natural language processing (NLP). One challenge in clinical practice is the early detection of a patient's potential future diagnoses, which is...

Bibliographic Details
Main Authors: Nuria Lebeña, Arantza Casillas, Alicia Pérez
Format: Article
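Language: English
Published: Elsevier 2025-01-01
Series: Informatics in Medicine Unlocked
Subjects: Natural language processing; Electronic health records; International classification of diseases; Clinical documentation; Preventive medicine
Online Access: http://www.sciencedirect.com/science/article/pii/S2352914825000255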
author Nuria Lebeña
Arantza Casillas
Alicia Pérez
collection DOAJ
description Background and Objective: Healthcare documentation processing is becoming increasingly efficient and effective as a result of advances in machine learning and natural language processing (NLP). One challenge in clinical practice is the early detection of a patient's potential future diagnoses, which is crucial for preventive medicine. Estimating potential future diagnoses helps to speed up the management of Electronic Health Records (EHRs) and opens a path towards clinical prevention. It is a challenging task, as there are thousands of possible diseases and, in general, limited data is available to train systems due to privacy concerns. The objective of this study is to infer probable future diagnoses given a patient's diagnosis history. In previous work, this task has been carried out using structured data, such as ICD-coded diagnoses, overlooking unstructured textual information in EHRs. Unlike traditional methods, this study aims to enhance next-diagnosis prediction by integrating patient diagnosis information codified according to the International Classification of Diseases (ICD) with unstructured clinical text.
Methods: We propose a multi-faceted model that integrates structured ICD-encoded patient histories with unstructured EHR text for future diagnosis prediction. Our approach consists of (1) a sequential model trained on structured diagnosis timelines, (2) a Clinical Longformer-based model trained on unstructured EHRs, and (3) an ensemble strategy to combine predictions from both components.
Results: Our proposed ensemble strategy significantly outperforms current state-of-the-art approaches in predicting future diagnoses, achieving a Precision@5 of 72.34% and a Precision@20 of 77.49%. Additionally, it showed high robustness and reliability across different demographic groups and varying scopes of medical history.
Conclusion: This research demonstrates that integrating structured ICD diagnosis timelines with unstructured EHRs achieves better results than using structured diagnosis timelines alone. Notably, the proposed model also maintained high accuracy even with a short-term history of diagnoses.
format Article
id doaj-art-385b1e46e8a64400a905baa265f20524
institution OA Journals
issn 2352-9148
language English
publishDate 2025-01-01
publisher Elsevier
record_format Article
series Informatics in Medicine Unlocked
spelling doaj-art-385b1e46e8a64400a905baa265f20524
Timestamp: 2025-08-20T02:31:16Z
Volume: 55
Article number: 101637
DOI: 10.1016/j.imu.2025.101637
Affiliations:
Nuria Lebeña (corresponding author): HiTZ Center - Ixa, University of the Basque Country (UPV/EHU). Department of electricity and electronics, Sarriena 2, Leioa 48940, Spain
Arantza Casillas: HiTZ Center - Ixa, University of the Basque Country (UPV/EHU). Department of electricity and electronics, Sarriena 2, Leioa 48940, Spain
Alicia Pérez: HiTZ Center - Ixa, University of the Basque Country (UPV/EHU). Department of computer languages and systems, Rafael Moreno “Pitxitxi” 2/3, Bilbao 48013, Spain
title Large language models aided patient progression documentation according to the ICD standard
topic Natural language processing
Electronic health records
International classification of diseases
Clinical documentation
Preventive medicine
url http://www.sciencedirect.com/science/article/pii/S2352914825000255