Towards a holistic framework for multimodal LLM in 3D brain CT radiology report generation
Abstract Multi-modal large language models (MLLMs) have transformed the landscape of modern healthcare, with automated radiology report generation (RRG) emerging as a cutting-edge application. While 2D MLLM-based RRG is well established, its utility for 3D medical images remains largely unexplored. In this regard, we curate the 3D-BrainCT dataset (18,885 text-scan pairs) and develop BrainGPT, a clinically visual instruction-tuned (CVIT) model designed for 3D CT RRG. Because traditional LLM metrics fail to gauge the diagnostic quality of generated reports, we propose feature-oriented radiology task evaluation (FORTE), an evaluation scheme that captures the clinical essence of the generated reports. Here we show that BrainGPT achieves an average FORTE F1-score of 0.71 (degree = 0.661; landmark = 0.706; feature = 0.693; impression = 0.779) and that 74% of BrainGPT-generated reports were indistinguishable from human-written ground truth in a Turing-like test. Together, our work establishes a comprehensive framework encompassing dataset curation, anatomy-aware model fine-tuning, and the development of robust evaluation metrics for RRG. By sharing our experience in 3D MLLM-based RRG, we aim to accelerate progress in human-machine collaboration for next-generation healthcare.
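The average FORTE F1-score quoted above is simply the mean of the four per-category keyword F1-scores. The sketch below (not the authors' implementation; the `keyword_f1` helper and the keyword-extraction step it stands in for are illustrative assumptions) shows a set-overlap F1 per category and reproduces the 0.71 average from the figures reported in the abstract:

```python
# Illustrative sketch of a FORTE-style aggregation: a set-overlap F1 between
# keywords extracted from a generated report and from its ground-truth report,
# averaged across the four keyword categories named in the abstract.
# keyword_f1 is a hypothetical stand-in for the paper's actual pipeline.

def keyword_f1(predicted, reference):
    """F1-score of the overlap between two keyword collections."""
    predicted, reference = set(predicted), set(reference)
    if not predicted or not reference:
        return 0.0
    tp = len(predicted & reference)          # keywords present in both reports
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)

# Per-category F1-scores reported in the abstract; their mean is ~0.71.
category_f1 = {"degree": 0.661, "landmark": 0.706, "feature": 0.693, "impression": 0.779}
average_f1 = sum(category_f1.values()) / len(category_f1)
print(f"average FORTE F1 = {average_f1:.2f}")  # -> average FORTE F1 = 0.71
```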
| Main Authors: | Cheng-Yi Li, Kao-Jung Chang, Cheng-Fu Yang, Hsin-Yu Wu, Wenting Chen, Hritik Bansal, Ling Chen, Yi-Ping Yang, Yu-Chun Chen, Shih-Pin Chen, Shih-Jen Chen, Jiing-Feng Lirng, Kai-Wei Chang, Shih-Hwa Chiou |
|---|---|
| Author Affiliations: | School of Medicine, National Yang Ming Chiao Tung University; Department of Medical Research, Taipei Veterans General Hospital; Department of Computer Science, University of California; Department of Electrical Engineering, City University of Hong Kong; Institute of Hospital and Health Care Administration, National Yang Ming Chiao Tung University |
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-03-01 |
| Series: | Nature Communications |
| ISSN: | 2041-1723 |
| Collection: | DOAJ |
| Online Access: | https://doi.org/10.1038/s41467-025-57426-0 |