An automated information extraction model for unstructured discharge letters using large language models and GPT-4



Bibliographic Details
Main Authors:	Robert M. Siepmann, Giulia Baldini, Cynthia S. Schmidt, Daniel Truhn, Gustav Anton Müller-Franzes, Amin Dada, Jens Kleesiek, Felix Nensa, René Hosch
Format:	Article
Language:	English
Published:	Elsevier 2025-06-01
Series:	Healthcare Analytics
Subjects:	Large language models; Automated information extraction; Artificial intelligence; Generative pre-trained transformer (GPT); ChatGPT; Discharge letters
Online Access:	http://www.sciencedirect.com/science/article/pii/S2772442524000807

collection	DOAJ
description	The administrative burden of manually extracting clinical information from discharge letters is a common challenge in healthcare. This study explores the use of Large Language Models (LLMs), specifically OpenAI's Generative Pretrained Transformer 4 (GPT-4), for automated extraction of diagnoses, medications, and allergies from discharge letters. Data were sourced from two healthcare institutions in Germany and comprise discharge letters for ten patients from each institution. The first experiment was conducted using a standardized prompt for information extraction. Because challenges were encountered, the prompt was refined (prompt fine-tuning) in a second experiment to improve the results. We further tested whether open-source LLMs can achieve similar results. In the first experiment, primary diagnoses were identified with 85% accuracy and secondary diagnoses with 55.8%. Medications and allergies were extracted with 85.9% and 100% accuracy, respectively. The International Classification of Diseases, 10th revision (ICD-10) codes for the identified diagnoses achieved an accuracy of 85% for primary diagnoses and 60.7% for secondary diagnoses. Anatomical Therapeutic Chemical (ATC) codes were identified with an accuracy of 78.8%. By contrast, open-source LLMs did not reach similar accuracy and could not consistently fill the template. With prompt fine-tuning in the second experiment, primary diagnoses, secondary diagnoses, and medications were predicted with 95%, 88.9%, and 92.2% accuracy, respectively. GPT-4 shows excellent potential for automated extraction of crucial diagnostic and medication information from discharge letters, which could lower the administrative burden for healthcare professionals and improve patient outcomes.
id	doaj-art-ac1d469139464923b279eea96d11db05
institution	Kabale University
issn	2772-4425
citation	Healthcare Analytics, vol. 7, article 100378 (Elsevier, 2025-06-01)
affiliations	Robert M. Siepmann: Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany
	Giulia Baldini: Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Essen, Germany; Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany
	Cynthia S. Schmidt: Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany; Institute for Transfusion Medicine, University Hospital Essen, Essen, Germany
	Daniel Truhn: Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany
	Gustav Anton Müller-Franzes: Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany
	Amin Dada: Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany
	Jens Kleesiek: Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany
	Felix Nensa: Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Essen, Germany; Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany
	René Hosch (corresponding author): Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Essen, Germany; Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany. Correspondence: Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, 45147 Essen, Germany.
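The abstract reports that the open-source LLMs "could not consistently fill the template" and scores GPT-4 on diagnoses, ICD-10 codes, medications and ATC codes. The Python sketch below shows one hypothetical way to check such a filled template and score extracted items. The JSON keys (`primary_diagnosis`, `secondary_diagnoses`, `medications`, `allergies`), the simplified code patterns and the recall-style `share_recovered` score are all assumptions for illustration. They are not the authors' template or evaluation method, which this record does not describe.

```python
import json
import re

# Hypothetical template keys; the study's actual template is not given in this record.
REQUIRED_KEYS = ("primary_diagnosis", "secondary_diagnoses", "medications", "allergies")

# Simplified ICD-10 shape: letter, two digits, optional "." plus one or two digits.
# This checks format only; it does not confirm the code exists or fits the diagnosis.
ICD10_PATTERN = re.compile(r"[A-Z][0-9]{2}(\.[0-9]{1,2})?")
# ATC level-5 (chemical substance) code, e.g. B01AC06: letter, 2 digits, 2 letters, 2 digits.
ATC_PATTERN = re.compile(r"[A-Z][0-9]{2}[A-Z]{2}[0-9]{2}")


def _as_list(value):
    """Treat anything that is not a list as an empty list."""
    return value if isinstance(value, list) else []


def validate_template(raw: str) -> list[str]:
    """Return problems found in an LLM's JSON answer; an empty list means it looks usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]

    diagnoses = [data.get("primary_diagnosis")] + _as_list(data.get("secondary_diagnoses"))
    for dx in diagnoses:
        if not isinstance(dx, dict):
            continue  # missing or non-object entries are not checked further in this sketch
        if not ICD10_PATTERN.fullmatch(str(dx.get("icd10", ""))):
            problems.append(f"malformed ICD-10 code: {dx.get('icd10')!r}")

    for med in _as_list(data.get("medications")):
        if isinstance(med, dict) and not ATC_PATTERN.fullmatch(str(med.get("atc", ""))):
            problems.append(f"malformed ATC code: {med.get('atc')!r}")
    return problems


def share_recovered(predicted: list[str], reference: list[str]) -> float:
    """Fraction of reference items found in the prediction (case-insensitive exact match).

    A recall-style score chosen for illustration only; not necessarily how the
    study computed its reported accuracies.
    """
    if not reference:
        # Nothing to find (e.g. no documented allergies): correct only if nothing was predicted.
        return 1.0 if not predicted else 0.0
    found = {item.strip().casefold() for item in predicted}
    hits = sum(1 for item in reference if item.strip().casefold() in found)
    return hits / len(reference)


if __name__ == "__main__":
    answer = json.dumps({
        "primary_diagnosis": {"name": "Acute myocardial infarction, unspecified", "icd10": "I21.9"},
        "secondary_diagnoses": [{"name": "Essential (primary) hypertension", "icd10": "I10"}],
        "medications": [{"name": "Acetylsalicylic acid", "atc": "B01AC06"}],
        "allergies": [],
    })
    print(validate_template(answer))  # []
    print(share_recovered(["Acetylsalicylic acid"], ["acetylsalicylic acid", "Atorvastatin"]))  # 0.5
```

A format check of this kind can only catch answers that fail to fill the template or that contain implausibly shaped codes; whether a well-formed ICD-10 or ATC code is the right one still has to be judged against a reference, as the study did.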

