A Proactive Agent Collaborative Framework for Zero‐Shot Multimodal Medical Reasoning

The adoption of large language models (LLMs) in healthcare has garnered significant research interest, yet their performance remains limited by a lack of domain‐specific knowledge and medical reasoning skills, as well as by their unimodal, text‐only nature. To address these limitations, we propose MultiMedRes, a multimodal medical collaborative reasoning framework that simulates how human physicians communicate by incorporating a learner agent that proactively acquires information from domain‐specific expert models. MultiMedRes addresses multimodal medical reasoning problems in three steps: i) Inquire: the learner agent decomposes a complex medical reasoning problem into multiple domain‐specific sub‐problems; ii) Interact: the agent engages in iterative “ask‐answer” exchanges with expert models to obtain domain‐specific knowledge; and iii) Integrate: the agent integrates all the acquired domain‐specific knowledge to solve the original problem (e.g., identifying differences in disease severity and abnormality size between medical images). We validate the effectiveness of our method on the task of difference visual question answering for X‐ray images. The experiments show that our zero‐shot prediction achieves state‐of‐the‐art performance, surpassing fully supervised methods. This demonstrates that MultiMedRes can offer trustworthy and interpretable assistance to physicians in monitoring patients’ treatment progression, paving the way for effective human–AI interaction and collaboration.
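
The three-step loop described in the abstract maps naturally onto a small agent skeleton. The Python sketch below is a minimal illustrative reading of the Inquire/Interact/Integrate pattern, not the authors' MultiMedRes implementation: the LearnerAgent class, the stub expert model, the hard-coded question decomposition, and the image identifiers are all hypothetical stand-ins. A real system would prompt an LLM for the decomposition and integration steps and would call trained vision models as the domain experts.

```python
"""Minimal sketch of an Inquire / Interact / Integrate agent loop.

Illustrative only: names, stubs, and the stopping logic are assumptions,
not the MultiMedRes code.
"""
from typing import Callable, Dict, List

# An expert model answers one domain-specific question about one image.
ExpertModel = Callable[[str, str], str]  # (image_id, question) -> answer


def stub_abnormality_expert(image_id: str, question: str) -> str:
    # Placeholder for a domain-specific vision model (e.g. a chest X-ray
    # VQA model); returns canned findings for the demo image identifiers.
    canned = {
        "xray_before": "small left pleural effusion",
        "xray_after": "no pleural effusion",
    }
    return canned.get(image_id, "no finding")


class LearnerAgent:
    """Toy learner agent for difference VQA over a pair of medical images."""

    def __init__(self, experts: Dict[str, ExpertModel]):
        self.experts = experts
        self.transcript: List[str] = []  # every ask-answer exchange, for auditability

    def inquire(self, question: str) -> List[tuple]:
        # i) Inquire: decompose the complex question into domain-specific
        # sub-questions. This toy ignores `question` and hard-codes one
        # sub-question; a real system would prompt an LLM to decompose it.
        return [("abnormality", "What abnormalities are visible?")]

    def interact(self, image_ids: List[str], sub_questions: List[tuple]) -> Dict[str, Dict[str, str]]:
        # ii) Interact: iterative ask-answer exchanges with the experts,
        # once per image, recording every exchange in the transcript.
        answers: Dict[str, Dict[str, str]] = {img: {} for img in image_ids}
        for domain, sub_q in sub_questions:
            expert = self.experts[domain]
            for img in image_ids:
                ans = expert(img, sub_q)
                self.transcript.append(f"ASK[{domain}] {img}: {sub_q} -> {ans}")
                answers[img][domain] = ans
        return answers

    def integrate(self, image_ids: List[str], answers: Dict[str, Dict[str, str]]) -> str:
        # iii) Integrate: combine the acquired knowledge into a final answer.
        # A real system would hand the full transcript back to the LLM; here
        # we simply compare the per-domain findings of the first two images.
        before, after = (answers[i] for i in image_ids[:2])
        diffs = [f"{domain}: '{before[domain]}' -> '{after[domain]}'"
                 for domain in before if before[domain] != after[domain]]
        return "; ".join(diffs) if diffs else "no change detected"

    def answer(self, question: str, image_ids: List[str]) -> str:
        sub_questions = self.inquire(question)             # i) Inquire
        answers = self.interact(image_ids, sub_questions)  # ii) Interact
        return self.integrate(image_ids, answers)          # iii) Integrate


if __name__ == "__main__":
    agent = LearnerAgent({"abnormality": stub_abnormality_expert})
    print(agent.answer("What changed between the two chest X-rays?",
                       ["xray_before", "xray_after"]))
```

Keeping the full ask-answer transcript, as the `transcript` list does here, is what makes the final answer auditable by a physician, which is the interpretability property the abstract emphasizes.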

Bibliographic Details
Main Authors: Zishan Gu, Fenglin Liu, Jiayuan Chen, Changchang Yin, Ping Zhang
Affiliations: Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA (Gu, Chen, Yin, Zhang); Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, UK (Liu)
Format: Article
Language: English
Published: Wiley, 2025-08-01
Series: Advanced Intelligent Systems, Volume 7, Issue 8
ISSN: 2640-4567
Subjects: AI agent; large language model; multimodal medical reasoning
Online Access: https://doi.org/10.1002/aisy.202400840