Towards a holistic framework for multimodal LLM in 3D brain CT radiology report generation

Abstract: Multimodal large language models (MLLMs) have transformed the landscape of modern healthcare, with automated radiology report generation (RRG) emerging as a cutting-edge application. While 2D MLLM-based RRG is well established, its utility for 3D medical images remains largely unexplored. In this regard, we curate the 3D-BrainCT dataset (18,885 text-scan pairs) and develop BrainGPT, a clinically visual instruction-tuned (CVIT) model designed for 3D CT RRG. Because traditional LLM metrics fail to gauge the diagnostic quality of generated reports, we propose feature-oriented radiology task evaluation (FORTE), an evaluation scheme that captures the clinical essence of the generated reports. Here we show that BrainGPT achieves an average FORTE F1-score of 0.71 (degree = 0.661; landmark = 0.706; feature = 0.693; impression = 0.779) and that 74% of BrainGPT-generated reports were indistinguishable from human-written ground truth in a Turing-like test. Together, our work establishes a comprehensive framework encompassing dataset curation, anatomy-aware model fine-tuning, and the development of robust evaluation metrics for RRG. By sharing our experience in 3D MLLM-based RRG, we aim to accelerate progress in human-machine collaboration for next-generation healthcare.
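As a quick consistency check on the figures quoted above, the average FORTE F1-score of 0.71 corresponds to the unweighted mean of the four per-category scores; the short Python sketch below is illustrative only (it is not code from the paper) and simply reproduces that arithmetic.

```python
# Illustrative check (not from the paper): the average FORTE F1-score
# quoted in the abstract matches the unweighted mean of the four
# per-category F1-scores.
forte_f1 = {
    "degree": 0.661,
    "landmark": 0.706,
    "feature": 0.693,
    "impression": 0.779,
}

average_f1 = sum(forte_f1.values()) / len(forte_f1)
print(f"Average FORTE F1-score: {average_f1:.2f}")  # prints 0.71
```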

Bibliographic Details
Main Authors: Cheng-Yi Li, Kao-Jung Chang, Cheng-Fu Yang, Hsin-Yu Wu, Wenting Chen, Hritik Bansal, Ling Chen, Yi-Ping Yang, Yu-Chun Chen, Shih-Pin Chen, Shih-Jen Chen, Jiing-Feng Lirng, Kai-Wei Chang, Shih-Hwa Chiou
Format: Article
Language: English
Published: Nature Portfolio, 2025-03-01
Series: Nature Communications
Online Access: https://doi.org/10.1038/s41467-025-57426-0
author Cheng-Yi Li
Kao-Jung Chang
Cheng-Fu Yang
Hsin-Yu Wu
Wenting Chen
Hritik Bansal
Ling Chen
Yi-Ping Yang
Yu-Chun Chen
Shih-Pin Chen
Shih-Jen Chen
Jiing-Feng Lirng
Kai-Wei Chang
Shih-Hwa Chiou
collection DOAJ
format Article
id doaj-art-bcb337b2a2dd4543bb89a64467f00dce
institution DOAJ
issn 2041-1723
language English
publishDate 2025-03-01
publisher Nature Portfolio
record_format Article
series Nature Communications
Author affiliations:
Cheng-Yi Li: School of Medicine, National Yang Ming Chiao Tung University
Kao-Jung Chang: Department of Medical Research, Taipei Veterans General Hospital
Cheng-Fu Yang: Department of Computer Science, University of California
Hsin-Yu Wu: School of Medicine, National Yang Ming Chiao Tung University
Wenting Chen: Department of Electrical Engineering, City University of Hong Kong
Hritik Bansal: Department of Computer Science, University of California
Ling Chen: Institute of Hospital and Health Care Administration, National Yang Ming Chiao Tung University
Yi-Ping Yang: School of Medicine, National Yang Ming Chiao Tung University
Yu-Chun Chen: School of Medicine, National Yang Ming Chiao Tung University
Shih-Pin Chen: School of Medicine, National Yang Ming Chiao Tung University
Shih-Jen Chen: School of Medicine, National Yang Ming Chiao Tung University
Jiing-Feng Lirng: School of Medicine, National Yang Ming Chiao Tung University
Kai-Wei Chang: Department of Computer Science, University of California
Shih-Hwa Chiou: Department of Medical Research, Taipei Veterans General Hospital