Vision-language model for report generation and outcome prediction in CT pulmonary angiogram
Abstract: Accurate and comprehensive interpretation of pulmonary embolism (PE) from Computed Tomography Pulmonary Angiography (CTPA) scans remains a clinical challenge due to the limited specificity and structure of existing AI tools. We propose an agent-based framework that integrates Vision-Languag...
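| Main Authors: | Zhusi Zhong, Yuli Wang, Jing Wu, Wen-Chi Hsu, Vin Somasundaram, Lulu Bi, Shreyas Kulkarni, Zhuoqi Ma, Scott Collins, Grayson Baird, Sun Ho Ahn, Xue Feng, Ihab Kamel, Cheng Ting Lin, Colin Greineder, Michael Atalay, Zhicheng Jiao, Harrison Bai |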
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | npj Digital Medicine |
| Online Access: | https://doi.org/10.1038/s41746-025-01807-8 |
| _version_ | 1849761110377365504 |
|---|---|
| author | Zhusi Zhong Yuli Wang Jing Wu Wen-Chi Hsu Vin Somasundaram Lulu Bi Shreyas Kulkarni Zhuoqi Ma Scott Collins Grayson Baird Sun Ho Ahn Xue Feng Ihab Kamel Cheng Ting Lin Colin Greineder Michael Atalay Zhicheng Jiao Harrison Bai |
| author_sort | Zhusi Zhong |
| collection | DOAJ |
| description | Abstract: Accurate and comprehensive interpretation of pulmonary embolism (PE) from Computed Tomography Pulmonary Angiography (CTPA) scans remains a clinical challenge due to the limited specificity and structure of existing AI tools. We propose an agent-based framework that integrates Vision-Language Models (VLMs) for detecting 32 PE-related abnormalities and Large Language Models (LLMs) for structured report generation. Trained on over 69,000 CTPA studies from 24,890 patients across Brown University Health (BUH), Johns Hopkins University (JHU), and the INSPECT dataset from Stanford, the model demonstrates strong performance in abnormality classification and report generation. For abnormality classification, it achieved AUROC scores of 0.788 (BUH), 0.754 (INSPECT), and 0.710 (JHU), with corresponding BERT-F1 scores of 0.891, 0.829, and 0.842. The abnormality-guided reporting strategy consistently outperformed the organ-based and holistic captioning baselines. For survival prediction, a multimodal fusion model that incorporates imaging, clinical variables, diagnostic outputs, and generated reports achieved concordance indices of 0.863 (BUH) and 0.731 (JHU), outperforming traditional PESI scores. This framework provides a clinically meaningful and interpretable solution for end-to-end PE diagnosis, structured reporting, and outcome prediction. |
| format | Article |
| id | doaj-art-34f3d892d656456385a36fd4691243ac |
| institution | DOAJ |
| issn | 2398-6352 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | npj Digital Medicine |
| spelling | doaj-art-34f3d892d656456385a36fd4691243ac; 2025-08-20T03:06:08Z; eng; Nature Portfolio; npj Digital Medicine; 2398-6352; 2025-07-01; 81115; 10.1038/s41746-025-01807-8; Vision-language model for report generation and outcome prediction in CT pulmonary angiogram. Authors and affiliations: Zhusi Zhong (Department of Diagnostic Imaging, Brown University Health); Yuli Wang (Department of Biomedical Engineering, Johns Hopkins University School of Medicine); Jing Wu (Second Xiangya Hospital, Central South University); Wen-Chi Hsu (Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital at Linkou); Vin Somasundaram (Department of Diagnostic Imaging, Brown University Health); Lulu Bi (Department of Diagnostic Imaging, Brown University Health); Shreyas Kulkarni (Department of Diagnostic Imaging, Brown University Health); Zhuoqi Ma (Department of Diagnostic Imaging, Brown University Health); Scott Collins (Department of Diagnostic Imaging, Brown University Health); Grayson Baird (Department of Diagnostic Imaging, Brown University Health); Sun Ho Ahn (Department of Diagnostic Imaging, Brown University Health); Xue Feng (Carina AI); Ihab Kamel (Department of Radiology, University of Colorado School of Medicine); Cheng Ting Lin (Department of Radiology and Radiological Sciences, Johns Hopkins University School of Medicine); Colin Greineder (Department of Emergency Medicine and Department of Pharmacology, University of Michigan); Michael Atalay (Department of Diagnostic Imaging, Brown University Health); Zhicheng Jiao (Department of Diagnostic Imaging, Brown University Health); Harrison Bai (Department of Radiology and Radiological Sciences, Johns Hopkins University School of Medicine) |
| title | Vision-language model for report generation and outcome prediction in CT pulmonary angiogram |
| url | https://doi.org/10.1038/s41746-025-01807-8 |