Medical Knowledge-Based Differential Image Visual Question Answering
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10980296/ |
| Summary: | Visual Question Answering (VQA) technology shows great promise for cross-disciplinary applications, and its integration into the medical field has emerged as a major research focus in recent years. Current mainstream medical VQA models support only single-image input, whereas differential medical VQA accepts multiple images and can answer questions about them, including questions about differences between images. However, these approaches primarily focus on extracting information from images while neglecting the inherent information and relevant relationships associated with the diseases themselves. Therefore, this paper proposes a differential medical visual question answering method based on medical knowledge, comprising three modules: the feature encoding module, the feature processing module, and the answer generation module. The proposed method first learns cluster embeddings of medical knowledge features through the feature encoding module, which are then interactively learned with image and text features. Following differential operations, these features are fed into the feature attention module, and subsequently into the answer generation module to produce the final answer. Experimental results demonstrate that the method significantly enhances performance on the differential medical visual question answering task. This advancement is of considerable reference value for improving the applicability and interpretability of medical visual question answering. |
|---|---|
| ISSN: | 2169-3536 |
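
The summary mentions "differential operations" followed by an attention step but does not define either. The sketch below is only an illustration of what such a pipeline stage could look like, not the paper's method: it assumes the differential operation is an element-wise subtraction of region features from two images, and that attention is a softmax over dot products with a question vector. The function names `feature_difference` and `attention_pool` are hypothetical.

```python
import math


def feature_difference(main_feats, ref_feats):
    """Element-wise difference between region features of two images.

    Assumption: both images are encoded as the same number of regions,
    each a list of floats of equal length.
    """
    if len(main_feats) != len(ref_feats):
        raise ValueError("both images must have the same number of regions")
    return [
        [m - r for m, r in zip(main_row, ref_row)]
        for main_row, ref_row in zip(main_feats, ref_feats)
    ]


def attention_pool(region_feats, query):
    """Pool region features with softmax dot-product attention.

    Returns the pooled feature vector and the attention weights.
    """
    scores = [sum(f * q for f, q in zip(row, query)) for row in region_feats]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * row[d] for w, row in zip(weights, region_feats))
        for d in range(len(query))
    ]
    return pooled, weights


# Toy example: two regions per image, two feature dimensions each.
main_image = [[1.0, 2.0], [3.0, 4.0]]
reference_image = [[1.0, 1.0], [3.0, 2.0]]
diff = feature_difference(main_image, reference_image)
pooled, weights = attention_pool(diff, query=[0.0, 0.0])
```

With an all-zero query, every region receives the same weight, so the pooled vector is simply the mean of the difference features. A learned question embedding would instead concentrate weight on the regions that changed.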