A Proactive Agent Collaborative Framework for Zero‐Shot Multimodal Medical Reasoning
The adoption of large language models (LLMs) in healthcare has garnered significant research interest, yet their performance remains limited due to a lack of domain‐specific knowledge, medical reasoning skills, and their unimodal nature, which restricts them to text‐only inputs. To address these limitations, we propose MultiMedRes, a multimodal medical collaborative reasoning framework that simulates human physicians’ communication by incorporating a learner agent to proactively acquire information from domain‐specific expert models. MultiMedRes addresses medical multimodal reasoning problems in three steps: i) Inquire: the learner agent decomposes complex medical reasoning problems into multiple domain‐specific sub‐problems; ii) Interact: the agent engages in iterative “ask‐answer” interactions with expert models to obtain domain‐specific knowledge; and iii) Integrate: the agent integrates all the acquired domain‐specific knowledge to address the medical reasoning problems (e.g., identifying differences in disease levels and abnormality sizes between medical images). We validate the effectiveness of our method on the task of difference visual question answering for X‐ray images. The experiments show that our zero‐shot prediction achieves state‐of‐the‐art performance, surpassing fully supervised methods, which demonstrates that MultiMedRes could offer trustworthy and interpretable assistance to physicians in monitoring the treatment progression of patients, paving the way for effective human–AI interaction and collaboration.
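The Inquire–Interact–Integrate loop in the abstract maps naturally onto a small agent program. The sketch below is a hypothetical illustration under assumed names: `LearnerAgent`, `ExpertModel`, and their stubbed methods are inventions for this example, not the authors' MultiMedRes implementation; a real system would back the learner with an LLM and each expert with a trained medical model such as an X-ray VQA network.

```python
# Minimal sketch of the Inquire-Interact-Integrate loop described in the
# abstract. All names and signatures here are hypothetical stand-ins,
# not the authors' actual MultiMedRes implementation.

from dataclasses import dataclass, field


@dataclass
class ExpertModel:
    """Stand-in for a domain-specific expert (e.g., an X-ray VQA model)."""
    domain: str

    def answer(self, question: str, image_id: str) -> str:
        # A real expert would run inference on the image; this stub just
        # echoes the request so the control flow stays visible.
        return f"[{self.domain} expert on {image_id}: reply to {question!r}]"


@dataclass
class LearnerAgent:
    """Learner that decomposes a problem, queries experts, and integrates."""
    experts: dict[str, ExpertModel]
    dialogue: list[tuple[str, str]] = field(default_factory=list)

    def inquire(self, problem: str) -> list[tuple[str, str]]:
        # Step i) Inquire: split the problem into domain-specific
        # sub-questions. A real agent would prompt an LLM for this.
        return [
            ("abnormality", "What abnormalities are visible?"),
            ("severity", "How severe is the disease?"),
        ]

    def interact(self, sub_questions: list[tuple[str, str]],
                 image_pair: tuple[str, str]) -> None:
        # Step ii) Interact: iterative ask-answer exchanges with the experts,
        # recording every exchange in the dialogue history.
        for domain, question in sub_questions:
            for image_id in image_pair:
                reply = self.experts[domain].answer(question, image_id)
                self.dialogue.append((question, reply))

    def integrate(self, problem: str) -> str:
        # Step iii) Integrate: fuse the acquired knowledge into one answer.
        # A real agent would feed the dialogue back to the LLM as context.
        evidence = "; ".join(reply for _, reply in self.dialogue)
        return f"Answer to {problem!r} given evidence: {evidence}"


agent = LearnerAgent(experts={
    "abnormality": ExpertModel("abnormality"),
    "severity": ExpertModel("severity"),
})
problem = "What changed between the two chest X-rays?"
sub_questions = agent.inquire(problem)
agent.interact(sub_questions, image_pair=("xray_t0", "xray_t1"))
print(agent.integrate(problem))
```

Running the sketch prints one integrated answer assembled from the recorded ask-answer dialogue, mirroring how the learner would hand its accumulated expert evidence back to the LLM for the final difference question.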
| Main Authors: | Zishan Gu, Fenglin Liu, Jiayuan Chen, Changchang Yin, Ping Zhang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley, 2025-08-01 |
| Series: | Advanced Intelligent Systems |
| Subjects: | AI agent; large language model; multimodal medical reasoning |
| Online Access: | https://doi.org/10.1002/aisy.202400840 |
| _version_ | 1849230073362644992 |
|---|---|
| author | Zishan Gu; Fenglin Liu; Jiayuan Chen; Changchang Yin; Ping Zhang |
| author_sort | Zishan Gu |
| collection | DOAJ |
| description | The adoption of large language models (LLMs) in healthcare has garnered significant research interest, yet their performance remains limited due to a lack of domain‐specific knowledge, medical reasoning skills, and their unimodal nature, which restricts them to text‐only inputs. To address these limitations, we propose MultiMedRes, a multimodal medical collaborative reasoning framework that simulates human physicians’ communication by incorporating a learner agent to proactively acquire information from domain‐specific expert models. MultiMedRes addresses medical multimodal reasoning problems in three steps: i) Inquire: the learner agent decomposes complex medical reasoning problems into multiple domain‐specific sub‐problems; ii) Interact: the agent engages in iterative “ask‐answer” interactions with expert models to obtain domain‐specific knowledge; and iii) Integrate: the agent integrates all the acquired domain‐specific knowledge to address the medical reasoning problems (e.g., identifying differences in disease levels and abnormality sizes between medical images). We validate the effectiveness of our method on the task of difference visual question answering for X‐ray images. The experiments show that our zero‐shot prediction achieves state‐of‐the‐art performance, surpassing fully supervised methods, which demonstrates that MultiMedRes could offer trustworthy and interpretable assistance to physicians in monitoring the treatment progression of patients, paving the way for effective human–AI interaction and collaboration. |
| format | Article |
| id | doaj-art-31689f4520fe4b98b307e1987eb5f454 |
| institution | Kabale University |
| issn | 2640-4567 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Wiley |
| record_format | Article |
| series | Advanced Intelligent Systems |
| spelling | doaj-art-31689f4520fe4b98b307e1987eb5f454; 2025-08-21T11:05:47Z; eng; Wiley; Advanced Intelligent Systems; 2640-4567; 2025-08-01; vol. 7, no. 8; pp. n/a; doi:10.1002/aisy.202400840. Zishan Gu, Jiayuan Chen, Changchang Yin, and Ping Zhang: Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA. Fenglin Liu: Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, UK. |
| title | A Proactive Agent Collaborative Framework for Zero‐Shot Multimodal Medical Reasoning |
| title_sort | proactive agent collaborative framework for zero shot multimodal medical reasoning |
| topic | AI agent; large language model; multimodal medical reasoning |
| url | https://doi.org/10.1002/aisy.202400840 |