R2GenGPT: Radiology Report Generation with frozen LLMs
Large Language Models (LLMs) have consistently showcased remarkable generalization capabilities when applied to various language tasks. Nonetheless, harnessing the full potential of LLMs for Radiology Report Generation (R2Gen) still presents a challenge, stemming from the inherent disparity in modality between LLMs and the R2Gen task. To bridge this gap effectively, we propose R2GenGPT, a novel solution that aligns visual features with the word embedding space of LLMs using an efficient visual alignment module. This approach empowers the previously static LLM to seamlessly integrate and process image information, marking a step forward in optimizing R2Gen performance. R2GenGPT offers the following benefits. First, it attains state-of-the-art (SOTA) performance by training only the lightweight visual alignment module while keeping all parameters of the LLM frozen. Second, it exhibits high training efficiency, as it trains an exceptionally small number of parameters while achieving rapid convergence. By employing delta tuning, our model trains only 5M parameters (just 0.07% of the total parameter count) to achieve performance close to SOTA levels. Our code is available at https://github.com/wang-zhanyu/R2GenGPT.
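The abstract describes a small trainable alignment module that feeds image features into an otherwise frozen LLM. Below is a minimal PyTorch sketch of that idea; the module names, dimensions, and the stand-in Transformer used in place of LLaMA are illustrative assumptions, not the paper's actual code (the real implementation is at https://github.com/wang-zhanyu/R2GenGPT).

```python
# Illustrative sketch only: a tiny stand-in "LLM" plus a trainable
# visual alignment module, mirroring the frozen-LLM recipe described
# in the abstract. All names and sizes are assumptions.
import torch
import torch.nn as nn

class VisualAlignment(nn.Module):
    """Projects visual encoder features into the LLM's word-embedding space."""
    def __init__(self, vis_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(vis_dim, llm_dim)  # the only trainable piece

    def forward(self, vis_feats: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, vis_dim) -> (batch, num_patches, llm_dim)
        return self.proj(vis_feats)

# Stand-in for the frozen LLM; the paper uses a real LLM (e.g. LLaMA).
llm = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=4,
)
for p in llm.parameters():
    p.requires_grad = False  # freeze every LLM weight

align = VisualAlignment(vis_dim=768, llm_dim=512)  # dims are assumptions

# Prepend projected image tokens to the report-token embeddings, then let
# the frozen LLM attend over the combined sequence.
vis_feats = torch.randn(2, 49, 768)    # dummy ViT-style patch features
txt_embeds = torch.randn(2, 32, 512)   # dummy word embeddings of the report
inputs = torch.cat([align(vis_feats), txt_embeds], dim=1)
out = llm(inputs)                      # shape: (2, 81, 512)

# Only the alignment module receives gradients, which is what keeps the
# trainable fraction tiny (0.07% of all parameters, per the abstract).
trainable = sum(p.numel() for p in align.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in llm.parameters())
print(f"trainable: {trainable:,}  frozen: {frozen:,}")
```

Because only `align.proj` requires gradients, the optimizer touches a few hundred thousand parameters in this toy setup (and about 5M in the paper's configuration), while the LLM itself stays untouched.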
Saved in:
| Main Authors: | Zhanyu Wang, Lingqiao Liu, Lei Wang, Luping Zhou |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | KeAi Communications Co., Ltd., 2023-11-01 |
| Series: | Meta-Radiology |
| Subjects: | Radiology report generation; Large language models; LLAMA |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2950162823000334 |
| Record ID: | doaj-art-e35f5633af0d481cab241a2e8bedf6cd |
|---|---|
| Collection: | DOAJ (OA Journals) |
| ISSN: | 2950-1628 |
| DOI: | 10.1016/j.metrad.2023.100033 |
| Citation: | Meta-Radiology, vol. 1, no. 3 (2023-11-01), article 100033 |
| Affiliations: | Zhanyu Wang: University of Sydney, New South Wales 2006, Australia; Lingqiao Liu: University of Adelaide, South Australia 5005, Australia; Lei Wang: University of Wollongong, New South Wales 2522, Australia; Luping Zhou (corresponding author): University of Sydney, New South Wales 2006, Australia |