An automated information extraction model for unstructured discharge letters using large language models and GPT-4
The administrative burden of manually extracting clinical information from discharge letters is a common challenge in healthcare. This study aims to explore the use of Large Language Models (LLMs), specifically Generative Pretrained Transformer 4 (GPT-4) by OpenAI, for automated extraction of diagno...
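Main Authors: | Robert M. Siepmann, Giulia Baldini, Cynthia S. Schmidt, Daniel Truhn, Gustav Anton Müller-Franzes, Amin Dada, Jens Kleesiek, Felix Nensa, René Hosch |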
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2025-06-01 |
Series: | Healthcare Analytics |
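Subjects: | Large language models; Automated information extraction; Artificial intelligence; Generative pre-trained transformer (GPT); ChatGPT; Discharge letters |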
Online Access: | http://www.sciencedirect.com/science/article/pii/S2772442524000807 |
_version_ | 1832595275302043648 |
---|---|
author | Robert M. Siepmann; Giulia Baldini; Cynthia S. Schmidt; Daniel Truhn; Gustav Anton Müller-Franzes; Amin Dada; Jens Kleesiek; Felix Nensa; René Hosch |
author_facet | Robert M. Siepmann; Giulia Baldini; Cynthia S. Schmidt; Daniel Truhn; Gustav Anton Müller-Franzes; Amin Dada; Jens Kleesiek; Felix Nensa; René Hosch |
author_sort | Robert M. Siepmann |
collection | DOAJ |
description | The administrative burden of manually extracting clinical information from discharge letters is a common challenge in healthcare. This study aims to explore the use of Large Language Models (LLMs), specifically Generative Pretrained Transformer 4 (GPT-4) by OpenAI, for automated extraction of diagnoses, medications, and allergies from discharge letters. Data for this study were sourced from two healthcare institutions in Germany, comprising discharge letters for ten patients from each institution. The first experiment is conducted using a standardized prompt for information extraction. However, challenges were encountered, and the prompt was fine-tuned in a second experiment to improve the results. We further tested whether open-source LLMs can achieve similar results. In the first experiment, primary diagnoses were identified with 85% accuracy and secondary diagnoses with 55.8%. Medications and allergies were extracted with 85.9% and 100% accuracy, respectively. The International Classification of Diseases, 10th revision (ICD-10) codes for the identified diagnoses achieved an accuracy of 85% for primary diagnoses and 60.7% for secondary diagnoses. Anatomical Therapeutic Chemical (ATC) codes were identified with an accuracy of 78.8%. On the other hand, open-source LLMs did not provide similar levels of accuracy and could not consistently fill the template. With prompt fine-tuning in the second experiment, the primary diagnoses, secondary diagnoses, and medications could be predicted with 95%, 88.9%, and 92.2% accuracy, respectively. GPT-4 shows excellent potential for automated extraction of crucial diagnostic and medication information from discharge letters, presumably lowering the administrative burden for healthcare professionals and improving patient outcomes. |
format | Article |
id | doaj-art-ac1d469139464923b279eea96d11db05 |
institution | Kabale University |
issn | 2772-4425 |
language | English |
publishDate | 2025-06-01 |
publisher | Elsevier |
record_format | Article |
series | Healthcare Analytics |
spelling | doaj-art-ac1d469139464923b279eea96d11db05; 2025-01-19T06:26:56Z; eng; Elsevier; Healthcare Analytics; 2772-4425; 2025-06-01; 7; 100378; An automated information extraction model for unstructured discharge letters using large language models and GPT-4; Robert M. Siepmann (Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany); Giulia Baldini (Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Essen, Germany; Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany); Cynthia S. Schmidt (Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany; Institute for Transfusion Medicine, University Hospital Essen, Essen, Germany); Daniel Truhn (Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany); Gustav Anton Müller-Franzes (Department of Diagnostic and Interventional Radiology, University Hospital Aachen, Aachen, Germany); Amin Dada (Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany); Jens Kleesiek (Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany); Felix Nensa (Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Essen, Germany; Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany); René Hosch (Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Essen, Germany; Institute for Artificial Intelligence in Medicine (IKIM), University Hospital Essen, Essen, Germany; Corresponding author. Institute of Diagnostic and Interventional Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, 45147, Essen, Germany.); The administrative burden of manually extracting clinical information from discharge letters is a common challenge in healthcare. This study aims to explore the use of Large Language Models (LLMs), specifically Generative Pretrained Transformer 4 (GPT-4) by OpenAI, for automated extraction of diagnoses, medications, and allergies from discharge letters. Data for this study were sourced from two healthcare institutions in Germany, comprising discharge letters for ten patients from each institution. The first experiment is conducted using a standardized prompt for information extraction. However, challenges were encountered, and the prompt was fine-tuned in a second experiment to improve the results. We further tested whether open-source LLMs can achieve similar results. In the first experiment, primary diagnoses were identified with 85% accuracy and secondary diagnoses with 55.8%. Medications and allergies were extracted with 85.9% and 100% accuracy, respectively. The International Classification of Diseases, 10th revision (ICD-10) codes for the identified diagnoses achieved an accuracy of 85% for primary diagnoses and 60.7% for secondary diagnoses. Anatomical Therapeutic Chemical (ATC) codes were identified with an accuracy of 78.8%. On the other hand, open-source LLMs did not provide similar levels of accuracy and could not consistently fill the template. With prompt fine-tuning in the second experiment, the primary diagnoses, secondary diagnoses, and medications could be predicted with 95%, 88.9%, and 92.2% accuracy, respectively. GPT-4 shows excellent potential for automated extraction of crucial diagnostic and medication information from discharge letters, presumably lowering the administrative burden for healthcare professionals and improving patient outcomes.; http://www.sciencedirect.com/science/article/pii/S2772442524000807; Large language models; Automated information extraction; Artificial intelligence; Generative pre-trained transformer (GPT); ChatGPT; Discharge letters |
spellingShingle | Robert M. Siepmann; Giulia Baldini; Cynthia S. Schmidt; Daniel Truhn; Gustav Anton Müller-Franzes; Amin Dada; Jens Kleesiek; Felix Nensa; René Hosch; An automated information extraction model for unstructured discharge letters using large language models and GPT-4; Healthcare Analytics; Large language models; Automated information extraction; Artificial intelligence; Generative pre-trained transformer (GPT); ChatGPT; Discharge letters |
title | An automated information extraction model for unstructured discharge letters using large language models and GPT-4 |
topic | Large language models; Automated information extraction; Artificial intelligence; Generative pre-trained transformer (GPT); ChatGPT; Discharge letters |
url | http://www.sciencedirect.com/science/article/pii/S2772442524000807 |
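
The abstract reports accuracy separately for each extracted category (diagnoses, medications, allergies), but this record does not say how accuracy was computed. The following is a minimal sketch of one plausible scoring scheme, assuming accuracy means the fraction of reference items that also appear in the model output after simple normalization. The function names and the sample letter are hypothetical illustrations, not taken from the article.

```python
from typing import Dict, List


def normalize(item: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count as misses."""
    return " ".join(item.lower().split())


def category_accuracy(gold: List[str], extracted: List[str]) -> float:
    """Fraction of reference (gold) items found in the extracted items.

    Assumption: exact match after normalization. The article may have used
    a different criterion, such as manual review.
    """
    gold_set = {normalize(g) for g in gold}
    if not gold_set:
        # Nothing to find: correct only if nothing was extracted.
        return 1.0 if not extracted else 0.0
    found = {normalize(e) for e in extracted}
    return len(gold_set & found) / len(gold_set)


def score_letter(gold: Dict[str, List[str]],
                 extracted: Dict[str, List[str]]) -> Dict[str, float]:
    """Score every category of one discharge letter."""
    return {
        category: category_accuracy(items, extracted.get(category, []))
        for category, items in gold.items()
    }


# Made-up example letter, for illustration only.
gold = {
    "medications": ["Metoprolol 47.5 mg", "Ramipril 5 mg"],
    "allergies": ["Penicillin"],
}
extracted = {
    "medications": ["metoprolol 47.5 mg"],
    "allergies": ["penicillin"],
}

scores = score_letter(gold, extracted)
print(scores)
```

Under this scheme a missed medication lowers the medication score without affecting the allergy score, which matches the per-category reporting in the abstract. Averaging the per-letter scores over all letters would give one figure per category.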