A Proactive Agent Collaborative Framework for Zero‐Shot Multimodal Medical Reasoning

The adoption of large language models (LLMs) in healthcare has garnered significant research interest, yet their performance remains limited by a lack of domain‐specific knowledge and medical reasoning skills, as well as by their unimodal, text‐only nature. To address these limitations, we propose MultiMedRes, a multimodal medical collaborative reasoning framework that simulates how human physicians communicate by incorporating a learner agent that proactively acquires information from domain‐specific expert models. MultiMedRes addresses multimodal medical reasoning problems in three steps: i) Inquire: the learner agent decomposes a complex medical reasoning problem into multiple domain‐specific sub‐problems; ii) Interact: the agent engages in iterative “ask‐answer” exchanges with expert models to obtain domain‐specific knowledge; and iii) Integrate: the agent integrates all the acquired domain‐specific knowledge to solve the original problem (e.g., identifying differences in disease severity and abnormality size between medical images). We validate the effectiveness of our method on the task of difference visual question answering for X‐ray images. The experiments show that our zero‐shot prediction achieves state‐of‐the‐art performance, surpassing fully supervised methods. This demonstrates that MultiMedRes can offer trustworthy and interpretable assistance to physicians in monitoring patients’ treatment progression, paving the way for effective human–AI interaction and collaboration.
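
The three-step loop described in the abstract maps naturally onto a small agent skeleton. The Python sketch below is a minimal illustrative reading of the Inquire/Interact/Integrate pattern, not the authors' MultiMedRes implementation: the LearnerAgent class, the stub expert model, the hard-coded question decomposition, and the image identifiers are all hypothetical stand-ins. A real system would prompt an LLM for the decomposition and integration steps and would call trained vision models as the domain experts.

```python
"""Minimal sketch of an Inquire / Interact / Integrate agent loop.

Illustrative only: names, stubs, and the stopping logic are assumptions,
not the MultiMedRes code.
"""
from typing import Callable, Dict, List

# An expert model answers one domain-specific question about one image.
ExpertModel = Callable[[str, str], str]  # (image_id, question) -> answer


def stub_abnormality_expert(image_id: str, question: str) -> str:
    # Placeholder for a domain-specific vision model (e.g. a chest X-ray
    # VQA model); returns canned findings for the demo image identifiers.
    canned = {
        "xray_before": "small left pleural effusion",
        "xray_after": "no pleural effusion",
    }
    return canned.get(image_id, "no finding")


class LearnerAgent:
    """Toy learner agent for difference VQA over a pair of medical images."""

    def __init__(self, experts: Dict[str, ExpertModel]):
        self.experts = experts
        self.transcript: List[str] = []  # every ask-answer exchange, for auditability

    def inquire(self, question: str) -> List[tuple]:
        # i) Inquire: decompose the complex question into domain-specific
        # sub-questions. This toy ignores `question` and hard-codes one
        # sub-question; a real system would prompt an LLM to decompose it.
        return [("abnormality", "What abnormalities are visible?")]

    def interact(self, image_ids: List[str], sub_questions: List[tuple]) -> Dict[str, Dict[str, str]]:
        # ii) Interact: iterative ask-answer exchanges with the experts,
        # once per image, recording every exchange in the transcript.
        answers: Dict[str, Dict[str, str]] = {img: {} for img in image_ids}
        for domain, sub_q in sub_questions:
            expert = self.experts[domain]
            for img in image_ids:
                ans = expert(img, sub_q)
                self.transcript.append(f"ASK[{domain}] {img}: {sub_q} -> {ans}")
                answers[img][domain] = ans
        return answers

    def integrate(self, image_ids: List[str], answers: Dict[str, Dict[str, str]]) -> str:
        # iii) Integrate: combine the acquired knowledge into a final answer.
        # A real system would hand the full transcript back to the LLM; here
        # we simply compare the per-domain findings of the first two images.
        before, after = (answers[i] for i in image_ids[:2])
        diffs = [f"{domain}: '{before[domain]}' -> '{after[domain]}'"
                 for domain in before if before[domain] != after[domain]]
        return "; ".join(diffs) if diffs else "no change detected"

    def answer(self, question: str, image_ids: List[str]) -> str:
        sub_questions = self.inquire(question)             # i) Inquire
        answers = self.interact(image_ids, sub_questions)  # ii) Interact
        return self.integrate(image_ids, answers)          # iii) Integrate


if __name__ == "__main__":
    agent = LearnerAgent({"abnormality": stub_abnormality_expert})
    print(agent.answer("What changed between the two chest X-rays?",
                       ["xray_before", "xray_after"]))
```

Keeping the full ask-answer transcript, as the `transcript` list does here, is what makes the final answer auditable by a physician, which is the interpretability property the abstract emphasizes.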

Bibliographic Details
Main Authors: Zishan Gu, Fenglin Liu, Jiayuan Chen, Changchang Yin, Ping Zhang
Affiliations: Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA (Gu, Chen, Yin, Zhang); Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, UK (Liu)
Format: Article
Language: English
Published: Wiley, 2025-08-01
Series: Advanced Intelligent Systems, Volume 7, Issue 8
ISSN: 2640-4567
Subjects: AI agent; large language model; multimodal medical reasoning
Online Access: https://doi.org/10.1002/aisy.202400840