Medical Knowledge-Based Differential Image Visual Question Answering

Visual Question Answering (VQA) technology shows great promise for cross-disciplinary applications, with its integration into the medical field emerging as a major research focus in recent years. The current mainstream medical visual question answering (VQA) models only support single-image input, w...

Bibliographic Details
Main Authors: Fangpeng Lu, Songyan Liu, Wenbin Lu, Peng Chen, Boyang Ding
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Visual question answering; medical knowledge graph; multimodal
Online Access: https://ieeexplore.ieee.org/document/10980296/
collection DOAJ
description Visual Question Answering (VQA) shows great promise for cross-disciplinary applications, and its integration into the medical field has become a major research focus in recent years. Current mainstream medical VQA models support only single-image input, whereas differential medical VQA accepts multiple images and answers questions about them, including questions about the differences between images. However, existing differential approaches focus primarily on extracting information from the images and neglect the inherent information about the diseases themselves and the relationships among them. This paper therefore proposes a differential medical VQA method based on medical knowledge, comprising three modules: a feature encoding module, a feature processing module, and an answer generation module. The method first learns cluster embeddings of medical knowledge features in the feature encoding module and then learns them interactively with image and text features. After differential operations, these features are fed into the feature attention module and then into the answer generation module, which produces the final answer. Experimental results show that the method significantly improves performance on the differential medical VQA task. This result offers a useful reference for improving the applicability and interpretability of medical VQA.
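The abstract names a "differential operation" followed by an attention step but does not define either, so the following is only a minimal sketch under stated assumptions: the difference is taken as element-wise subtraction of region features from two images, and attention is plain scaled dot-product attention against a question vector. The function names, feature sizes, and sample values are hypothetical, the medical-knowledge embeddings are left out entirely, and none of this should be read as the authors' implementation.

```python
import math

def difference(main_feats, ref_feats):
    """Element-wise difference between matching region features of two images.
    Assumption: subtraction stands in for the paper's 'differential operation'."""
    return [[m - r for m, r in zip(m_row, r_row)]
            for m_row, r_row in zip(main_feats, ref_feats)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(question, regions):
    """Scaled dot-product attention: weight each region by its similarity
    to the question vector, then pool the regions with those weights."""
    dim = len(question)
    scores = [sum(q * x for q, x in zip(question, row)) / math.sqrt(dim)
              for row in regions]
    weights = softmax(scores)
    pooled = [sum(w * row[i] for w, row in zip(weights, regions))
              for i in range(dim)]
    return weights, pooled

# Hypothetical toy inputs: two regions per image, two feature dimensions.
main_image = [[0.9, 0.1], [0.2, 0.8]]
reference_image = [[0.4, 0.1], [0.2, 0.3]]
question_vec = [1.0, 0.0]

diff = difference(main_image, reference_image)
weights, pooled = attend(question_vec, diff)
```

In this toy setup the first region changes along the dimension the question vector points at, so it receives the larger attention weight; a real system would pass the pooled vector on to an answer decoder.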
format Article
id doaj-art-7331a009741d437384979edbf8d03a0e
institution OA Journals
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
timestamp 2025-08-20T02:32:22Z
volume 13
pages 93818-93829
doi 10.1109/ACCESS.2025.3565695
article_number 10980296
orcid Fangpeng Lu: https://orcid.org/0009-0006-3449-7424
Songyan Liu: https://orcid.org/0009-0009-4602-6031
affiliation School of Electronic Engineering, Heilongjiang University, Harbin, Heilongjiang, China (all five authors)
title Medical Knowledge-Based Differential Image Visual Question Answering
topic Visual question answering
medical knowledge graph
multimodal
url https://ieeexplore.ieee.org/document/10980296/