Vision-language model for report generation and outcome prediction in CT pulmonary angiogram

Abstract Accurate and comprehensive interpretation of pulmonary embolism (PE) from Computed Tomography Pulmonary Angiography (CTPA) scans remains a clinical challenge due to the limited specificity and structure of existing AI tools. We propose an agent-based framework that integrates Vision-Languag...

Bibliographic Details
Main Authors:	Zhusi Zhong, Yuli Wang, Jing Wu, Wen-Chi Hsu, Vin Somasundaram, Lulu Bi, Shreyas Kulkarni, Zhuoqi Ma, Scott Collins, Grayson Baird, Sun Ho Ahn, Xue Feng, Ihab Kamel, Cheng Ting Lin, Colin Greineder, Michael Atalay, Zhicheng Jiao, Harrison Bai
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-07-01
Series:	npj Digital Medicine
Online Access:	https://doi.org/10.1038/s41746-025-01807-8

author	Zhusi Zhong, Yuli Wang, Jing Wu, Wen-Chi Hsu, Vin Somasundaram, Lulu Bi, Shreyas Kulkarni, Zhuoqi Ma, Scott Collins, Grayson Baird, Sun Ho Ahn, Xue Feng, Ihab Kamel, Cheng Ting Lin, Colin Greineder, Michael Atalay, Zhicheng Jiao, Harrison Bai
author_sort	Zhusi Zhong
collection	DOAJ
description	Abstract Accurate and comprehensive interpretation of pulmonary embolism (PE) from Computed Tomography Pulmonary Angiography (CTPA) scans remains a clinical challenge due to the limited specificity and structure of existing AI tools. We propose an agent-based framework that integrates Vision-Language Models (VLMs) for detecting 32 PE-related abnormalities and Large Language Models (LLMs) for structured report generation. Trained on over 69,000 CTPA studies from 24,890 patients across Brown University Health (BUH), Johns Hopkins University (JHU), and the INSPECT dataset from Stanford, the model demonstrates strong performance in abnormality classification and report generation. For abnormality classification, it achieved AUROC scores of 0.788 (BUH), 0.754 (INSPECT), and 0.710 (JHU), with corresponding BERT-F1 scores of 0.891, 0.829, and 0.842. The abnormality-guided reporting strategy consistently outperformed the organ-based and holistic captioning baselines. For survival prediction, a multimodal fusion model that incorporates imaging, clinical variables, diagnostic outputs, and generated reports achieved concordance indices of 0.863 (BUH) and 0.731 (JHU), outperforming traditional PESI scores. This framework provides a clinically meaningful and interpretable solution for end-to-end PE diagnosis, structured reporting, and outcome prediction.
format	Article
id	doaj-art-34f3d892d656456385a36fd4691243ac
institution	DOAJ
issn	2398-6352
language	English
publishDate	2025-07-01
publisher	Nature Portfolio
record_format	Article
series	npj Digital Medicine
spelling	doaj-art-34f3d892d656456385a36fd4691243ac | 2025-08-20T03:06:08Z | eng | Nature Portfolio | npj Digital Medicine | 2398-6352 | 2025-07-01 | 81115 | 10.1038/s41746-025-01807-8 | Vision-language model for report generation and outcome prediction in CT pulmonary angiogram | Zhusi Zhong [0], Yuli Wang [1], Jing Wu [2], Wen-Chi Hsu [3], Vin Somasundaram [4], Lulu Bi [5], Shreyas Kulkarni [6], Zhuoqi Ma [7], Scott Collins [8], Grayson Baird [9], Sun Ho Ahn [10], Xue Feng [11], Ihab Kamel [12], Cheng Ting Lin [13], Colin Greineder [14], Michael Atalay [15], Zhicheng Jiao [16], Harrison Bai [17] | Affiliations: [0] Department of Diagnostic Imaging, Brown University Health; [1] Department of Biomedical Engineering, Johns Hopkins University School of Medicine; [2] Second Xiangya Hospital, Central South University; [3] Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital at Linkou; [4] Department of Diagnostic Imaging, Brown University Health; [5] Department of Diagnostic Imaging, Brown University Health; [6] Department of Diagnostic Imaging, Brown University Health; [7] Department of Diagnostic Imaging, Brown University Health; [8] Department of Diagnostic Imaging, Brown University Health; [9] Department of Diagnostic Imaging, Brown University Health; [10] Department of Diagnostic Imaging, Brown University Health; [11] Carina AI; [12] Department of Radiology, University of Colorado School of Medicine; [13] Department of Radiology and Radiological Sciences, Johns Hopkins University School of Medicine; [14] Department of Emergency Medicine and Department of Pharmacology, University of Michigan; [15] Department of Diagnostic Imaging, Brown University Health; [16] Department of Diagnostic Imaging, Brown University Health; [17] Department of Radiology and Radiological Sciences, Johns Hopkins University School of Medicine | Abstract Accurate and comprehensive interpretation of pulmonary embolism (PE) from Computed Tomography Pulmonary Angiography (CTPA) scans remains a clinical challenge due to the limited specificity and structure of existing AI tools. We propose an agent-based framework that integrates Vision-Language Models (VLMs) for detecting 32 PE-related abnormalities and Large Language Models (LLMs) for structured report generation. Trained on over 69,000 CTPA studies from 24,890 patients across Brown University Health (BUH), Johns Hopkins University (JHU), and the INSPECT dataset from Stanford, the model demonstrates strong performance in abnormality classification and report generation. For abnormality classification, it achieved AUROC scores of 0.788 (BUH), 0.754 (INSPECT), and 0.710 (JHU), with corresponding BERT-F1 scores of 0.891, 0.829, and 0.842. The abnormality-guided reporting strategy consistently outperformed the organ-based and holistic captioning baselines. For survival prediction, a multimodal fusion model that incorporates imaging, clinical variables, diagnostic outputs, and generated reports achieved concordance indices of 0.863 (BUH) and 0.731 (JHU), outperforming traditional PESI scores. This framework provides a clinically meaningful and interpretable solution for end-to-end PE diagnosis, structured reporting, and outcome prediction. | https://doi.org/10.1038/s41746-025-01807-8
title	Vision-language model for report generation and outcome prediction in CT pulmonary angiogram
url	https://doi.org/10.1038/s41746-025-01807-8

Similar Items