R2GenGPT: Radiology Report Generation with frozen LLMs

Bibliographic Details
Main Authors: Zhanyu Wang, Lingqiao Liu, Lei Wang, Luping Zhou
Format: Article
Language: English
Published: KeAi Communications Co., Ltd. 2023-11-01
Series: Meta-Radiology
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2950162823000334
_version_ 1850194058433003520
author Zhanyu Wang
Lingqiao Liu
Lei Wang
Luping Zhou
author_facet Zhanyu Wang
Lingqiao Liu
Lei Wang
Luping Zhou
author_sort Zhanyu Wang
collection DOAJ
description Large Language Models (LLMs) have consistently showcased remarkable generalization capabilities when applied to various language tasks. Nonetheless, harnessing the full potential of LLMs for Radiology Report Generation (R2Gen) still presents a challenge, stemming from the inherent disparity in modality between LLMs and the R2Gen task. To bridge this gap effectively, we propose R2GenGPT, a novel solution that aligns visual features with the word embedding space of LLMs using an efficient visual alignment module. This approach empowers the previously static LLM to seamlessly integrate and process image information, marking a step forward in optimizing R2Gen performance. R2GenGPT offers the following benefits. First, it attains state-of-the-art (SOTA) performance by training only the lightweight visual alignment module while freezing all the parameters of the LLM. Second, it exhibits high training efficiency, as it requires the training of an exceptionally small number of parameters while achieving rapid convergence. By employing delta tuning, our model trains only 5M parameters (just 0.07% of the total parameter count) to achieve performance close to SOTA levels. Our code is available at https://github.com/wang-zhanyu/R2GenGPT.
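The core idea in the abstract above — keeping every LLM parameter frozen and training only a small module that projects visual features into the LLM's word-embedding space — can be sketched as follows. This is a minimal, framework-free toy illustration, not the paper's actual implementation: the dimensions, weight values, and the single-linear-layer alignment module are all hypothetical stand-ins for illustration only.

```python
# Toy sketch of the frozen-LLM + trainable alignment idea (pure Python).
# Only the alignment parameters (W, b) would be updated during training;
# the LLM's own embeddings stay fixed throughout.

VISUAL_DIM = 4  # toy dimensionality of the visual encoder's output
EMBED_DIM = 6   # toy dimensionality of the LLM word-embedding space

def linear_project(features, weights, bias):
    """Map each visual feature vector into the LLM embedding space."""
    return [
        [sum(f[i] * weights[i][j] for i in range(len(f))) + bias[j]
         for j in range(len(bias))]
        for f in features
    ]

# Trainable alignment parameters (learned in practice; fixed here).
W = [[0.1] * EMBED_DIM for _ in range(VISUAL_DIM)]
b = [0.0] * EMBED_DIM

# Frozen LLM word embeddings for a toy two-token text prompt.
prompt_embeddings = [[0.5] * EMBED_DIM, [0.2] * EMBED_DIM]

# One image feature vector from the (also frozen) visual encoder.
visual_features = [[1.0, 2.0, 3.0, 4.0]]
visual_tokens = linear_project(visual_features, W, b)

# The frozen LLM then consumes visual tokens and text embeddings
# as a single input sequence.
llm_input = visual_tokens + prompt_embeddings
print(len(llm_input), len(llm_input[0]))  # prints: 3 6
```

Because gradients would flow only through `W` and `b`, the trainable parameter count stays tiny relative to the full model, which is the source of the efficiency the abstract describes (5M trained parameters, about 0.07% of the total).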
format Article
id doaj-art-e35f5633af0d481cab241a2e8bedf6cd
institution OA Journals
issn 2950-1628
language English
publishDate 2023-11-01
publisher KeAi Communications Co., Ltd.
record_format Article
series Meta-Radiology
spelling doaj-art-e35f5633af0d481cab241a2e8bedf6cd2025-08-20T02:14:06ZengKeAi Communications Co., Ltd.Meta-Radiology2950-16282023-11-011310003310.1016/j.metrad.2023.100033R2GenGPT: Radiology Report Generation with frozen LLMsZhanyu Wang0Lingqiao Liu1Lei Wang2Luping Zhou3University of Sydney, New South Wales 2006, AustraliaUniversity of Adelaide, South Australia 5005, AustraliaUniversity of Wollongong, New South Wales 2522, AustraliaUniversity of Sydney, New South Wales 2006, Australia; Corresponding author.Large Language Models (LLMs) have consistently showcased remarkable generalization capabilities when applied to various language tasks. Nonetheless, harnessing the full potential of LLMs for Radiology Report Generation (R2Gen) still presents a challenge, stemming from the inherent disparity in modality between LLMs and the R2Gen task. To bridge this gap effectively, we propose R2GenGPT, which is a novel solution that aligns visual features with the word embedding space of LLMs using an efficient visual alignment module. This innovative approach empowers the previously static LLM to seamlessly integrate and process image information, marking a step forward in optimizing R2Gen performance. R2GenGPT offers the following benefits. First, it attains state-of-the-art (SOTA) performance by training only the lightweight visual alignment module while freezing all the parameters of LLM. Second, it exhibits high training efficiency, as it requires the training of an exceptionally minimal number of parameters while achieving rapid convergence. By employing delta tuning, our model only trains 5M parameters (which constitute just 0.07% of the total parameter count) to achieve performance close to the SOTA levels. Our code is available at https://github.com/wang-zhanyu/R2GenGPT.http://www.sciencedirect.com/science/article/pii/S2950162823000334Radiology report generationLarge language modelsLLAMA
spellingShingle Zhanyu Wang
Lingqiao Liu
Lei Wang
Luping Zhou
R2GenGPT: Radiology Report Generation with frozen LLMs
Meta-Radiology
Radiology report generation
Large language models
LLAMA
title R2GenGPT: Radiology Report Generation with frozen LLMs
title_full R2GenGPT: Radiology Report Generation with frozen LLMs
title_fullStr R2GenGPT: Radiology Report Generation with frozen LLMs
title_full_unstemmed R2GenGPT: Radiology Report Generation with frozen LLMs
title_short R2GenGPT: Radiology Report Generation with frozen LLMs
title_sort r2gengpt radiology report generation with frozen llms
topic Radiology report generation
Large language models
LLAMA
url http://www.sciencedirect.com/science/article/pii/S2950162823000334
work_keys_str_mv AT zhanyuwang r2gengptradiologyreportgenerationwithfrozenllms
AT lingqiaoliu r2gengptradiologyreportgenerationwithfrozenllms
AT leiwang r2gengptradiologyreportgenerationwithfrozenllms
AT lupingzhou r2gengptradiologyreportgenerationwithfrozenllms