Weakly supervised language models for automated extraction of critical findings from radiology reports

Bibliographic Details
Main Authors: Avisha Das, Ish A. Talati, Juan Manuel Zambrano Chaves, Daniel Rubin, Imon Banerjee
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: npj Digital Medicine
Online Access: https://doi.org/10.1038/s41746-025-01522-4
collection DOAJ
description Abstract Critical findings in radiology reports are life-threatening conditions that need to be communicated promptly to physicians for timely management of patients. Although challenging, advancements in natural language processing (NLP), particularly large language models (LLMs), now enable the automated identification of key findings from verbose reports. Given the scarcity of labeled critical findings data, we implemented a two-phase, weakly supervised fine-tuning approach on 15,000 unlabeled Mayo Clinic reports. This fine-tuned model then automatically extracted critical terms on internal (Mayo Clinic, n = 80) and external (MIMIC-III, n = 123) test datasets, validated against expert annotations. Model performance was further assessed on 5000 MIMIC-IV reports using LLM-aided metrics, G-eval and Prometheus. Both manual and LLM-based evaluations showed improved task alignment with weak supervision. The pipeline and model, publicly available under an academic license, can aid in critical finding extraction for research and clinical use (https://github.com/dasavisha/CriticalFindings_Extract).
id doaj-art-c496ba8fdae946d6971f907ddfba92bd
institution DOAJ
issn 2398-6352
spelling doaj-art-c496ba8fdae946d6971f907ddfba92bd; 2025-08-20T03:09:21Z; eng; Nature Portfolio; npj Digital Medicine; ISSN 2398-6352; 2025-05-01; vol. 8, no. 1, pp. 1-9; https://doi.org/10.1038/s41746-025-01522-4
Avisha Das: Arizona Advanced AI & Innovation (A3I) Hub, Mayo Clinic Arizona
Ish A. Talati: Department of Radiology, Stanford University
Juan Manuel Zambrano Chaves: Department of Biomedical Data Science, Stanford University
Daniel Rubin: Department of Radiology, Stanford University
Imon Banerjee: Arizona Advanced AI & Innovation (A3I) Hub, Mayo Clinic Arizona