Vision-language model for report generation and outcome prediction in CT pulmonary angiogram

Bibliographic Details
Main Authors: Zhusi Zhong, Yuli Wang, Jing Wu, Wen-Chi Hsu, Vin Somasundaram, Lulu Bi, Shreyas Kulkarni, Zhuoqi Ma, Scott Collins, Grayson Baird, Sun Ho Ahn, Xue Feng, Ihab Kamel, Cheng Ting Lin, Colin Greineder, Michael Atalay, Zhicheng Jiao, Harrison Bai
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: npj Digital Medicine
Online Access: https://doi.org/10.1038/s41746-025-01807-8
collection DOAJ
description Abstract Accurate and comprehensive interpretation of pulmonary embolism (PE) from Computed Tomography Pulmonary Angiography (CTPA) scans remains a clinical challenge due to the limited specificity and structure of existing AI tools. We propose an agent-based framework that integrates Vision-Language Models (VLMs) for detecting 32 PE-related abnormalities and Large Language Models (LLMs) for structured report generation. Trained on over 69,000 CTPA studies from 24,890 patients across Brown University Health (BUH), Johns Hopkins University (JHU), and the INSPECT dataset from Stanford, the model demonstrates strong performance in abnormality classification and report generation. For abnormality classification, it achieved AUROC scores of 0.788 (BUH), 0.754 (INSPECT), and 0.710 (JHU), with corresponding BERT-F1 scores of 0.891, 0.829, and 0.842. The abnormality-guided reporting strategy consistently outperformed the organ-based and holistic captioning baselines. For survival prediction, a multimodal fusion model that incorporates imaging, clinical variables, diagnostic outputs, and generated reports achieved concordance indices of 0.863 (BUH) and 0.731 (JHU), outperforming traditional PESI scores. This framework provides a clinically meaningful and interpretable solution for end-to-end PE diagnosis, structured reporting, and outcome prediction.
format Article
id doaj-art-34f3d892d656456385a36fd4691243ac
institution DOAJ
issn 2398-6352
language English
publishDate 2025-07-01
publisher Nature Portfolio
record_format Article
series npj Digital Medicine
spelling doaj-art-34f3d892d656456385a36fd4691243ac
2025-08-20T03:06:08Z
eng
Nature Portfolio
npj Digital Medicine
2398-6352
2025-07-01
81115
10.1038/s41746-025-01807-8
Vision-language model for report generation and outcome prediction in CT pulmonary angiogram
0 Zhusi Zhong, Department of Diagnostic Imaging, Brown University Health
1 Yuli Wang, Department of Biomedical Engineering, Johns Hopkins University School of Medicine
2 Jing Wu, Second Xiangya Hospital, Central South University
3 Wen-Chi Hsu, Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital at Linkou
4 Vin Somasundaram, Department of Diagnostic Imaging, Brown University Health
5 Lulu Bi, Department of Diagnostic Imaging, Brown University Health
6 Shreyas Kulkarni, Department of Diagnostic Imaging, Brown University Health
7 Zhuoqi Ma, Department of Diagnostic Imaging, Brown University Health
8 Scott Collins, Department of Diagnostic Imaging, Brown University Health
9 Grayson Baird, Department of Diagnostic Imaging, Brown University Health
10 Sun Ho Ahn, Department of Diagnostic Imaging, Brown University Health
11 Xue Feng, Carina AI
12 Ihab Kamel, Department of Radiology, University of Colorado School of Medicine
13 Cheng Ting Lin, Department of Radiology and Radiological Sciences, Johns Hopkins University School of Medicine
14 Colin Greineder, Department of Emergency Medicine and Department of Pharmacology, University of Michigan
15 Michael Atalay, Department of Diagnostic Imaging, Brown University Health
16 Zhicheng Jiao, Department of Diagnostic Imaging, Brown University Health
17 Harrison Bai, Department of Radiology and Radiological Sciences, Johns Hopkins University School of Medicine
https://doi.org/10.1038/s41746-025-01807-8
title Vision-language model for report generation and outcome prediction in CT pulmonary angiogram
url https://doi.org/10.1038/s41746-025-01807-8