Extracting Pulmonary Embolism Diagnoses From Radiology Impressions Using GPT-4o: Large Language Model Evaluation Study

Bibliographic Details
Main Authors: Mohammed Mahyoub, Kacie Dougherty, Ajit Shukla
Format: Article
Language: English
Published: JMIR Publications, 2025-04-01
Series: JMIR Medical Informatics
Online Access:https://medinform.jmir.org/2025/1/e67706
Collection: DOAJ

Description

Background: Pulmonary embolism (PE) is a critical condition requiring rapid diagnosis to reduce mortality. Extracting PE diagnoses from radiology reports manually is time-consuming, highlighting the need for automated solutions. Advances in natural language processing, especially transformer models like GPT-4o, offer promising tools to improve diagnostic accuracy and workflow efficiency in clinical settings.

Objective: This study aimed to develop an automatic extraction system using GPT-4o to extract PE diagnoses from radiology report impressions, enhancing clinical decision-making and workflow efficiency.

Methods: In total, 2 approaches were developed and evaluated: a fine-tuned Clinical Longformer as a baseline model and a GPT-4o-based extractor. Clinical Longformer, an encoder-only model, was chosen for its robustness in text classification tasks, particularly on smaller scales. GPT-4o, a decoder-only instruction-following LLM, was selected for its advanced language understanding capabilities. The study aimed to evaluate GPT-4o’s ability to perform text classification compared to the baseline Clinical Longformer. The Clinical Longformer was trained on a dataset of 1000 radiology report impressions and validated on a separate set of 200 samples, while the GPT-4o extractor was validated using the same 200-sample set. Postdeployment performance was further assessed on an additional 200 operational records to evaluate model efficacy in a real-world setting.

Results: GPT-4o outperformed the Clinical Longformer in 2 of the metrics, achieving a sensitivity of 1.0 (95% CI 1.0-1.0; Wilcoxon test, P<.001) and an F1-score of 0.975 (95% CI 0.9495-0.9947; Wilcoxon test, P<.001) across the validation dataset. Postdeployment evaluations also showed strong performance of the deployed GPT-4o model with a sensitivity of 1.0 (95% CI 1.0-1.0), a specificity of 0.94 (95% CI 0.8913-0.9804), and an F1-score of 0.97 (95% CI 0.9479-0.9908). This high level of accuracy supports a reduction in manual review, streamlining clinical workflows and improving diagnostic precision.

Conclusions: The GPT-4o model provides an effective solution for the automatic extraction of PE diagnoses from radiology reports, offering a reliable tool that aids timely and accurate clinical decision-making. This approach has the potential to significantly improve patient outcomes by expediting diagnosis and treatment pathways for critical conditions like PE.
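
The abstract reports sensitivity, specificity, and F1-score with 95% CIs but does not state how the intervals were computed. The following is a hypothetical sketch, not the authors' code: it shows how these metrics follow from a binary confusion matrix (1 = PE present) and how a percentile bootstrap, one common choice and an assumption here, yields intervals. The function names (`confusion_counts`, `metrics`, `bootstrap_ci`) and the toy labels are invented for illustration and are not study data.

```python
import math
import random
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Count (tp, fp, fn, tn) for binary labels, where 1 means 'PE present'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def _ratio(num: int, den: int) -> float:
    # Undefined ratios (e.g. sensitivity when no PE-positive case is present) become NaN.
    return num / den if den else math.nan


def metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # share of PE-positive reports flagged
        "specificity": _ratio(tn, tn + fp),  # share of PE-negative reports left unflagged
        "precision": _ratio(tp, tp + fp),
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and sensitivity
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


def bootstrap_ci(y_true: Sequence[int], y_pred: Sequence[int], metric: str,
                 n_resamples: int = 2000, alpha: float = 0.05,
                 seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap: resample reports with replacement and recompute the metric."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metrics([y_true[i] for i in idx], [y_pred[i] for i in idx])[metric]
        if not math.isnan(value):  # skip resamples where the metric is undefined
            scores.append(value)
    if not scores:
        raise ValueError(f"{metric} is undefined in every resample")
    scores.sort()
    m = len(scores)
    lower = scores[int(alpha / 2 * m)]
    upper = scores[min(m - 1, int((1 - alpha / 2) * m))]
    return lower, upper


# Hypothetical toy labels (not study data): 3 PE-positive and 7 PE-negative impressions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # one false positive, no false negatives
print(metrics(y_true, y_pred))  # sensitivity 1.0, specificity ~0.857, precision 0.75, F1 ~0.857
print(bootstrap_ci(y_true, y_pred, "sensitivity"))  # (1.0, 1.0)
```

With no false negatives, every resample that contains a PE-positive report also has sensitivity 1.0, so the percentile interval collapses to 1.0-1.0. This is one way a degenerate interval like the reported one can arise, although the abstract does not confirm which interval method was used.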

ISSN: 2291-9694
DOI: 10.2196/67706
Volume: 13, article e67706
ORCID iDs: Mohammed Mahyoub, https://orcid.org/0000-0002-0808-3205; Kacie Dougherty, https://orcid.org/0000-0001-6295-0026; Ajit Shukla, https://orcid.org/0009-0000-2094-9318
Record ID: doaj-art-19dbafd03fa446c7bd0c43e27cfd8cb3