Improving Visual Question Answering by Image Captioning

Visual Question Answering (VQA) is a challenging task that bridges the computer vision and natural language processing communities. It provides natural language answers to questions about an associated image. Most existing VQA methods focus on the fusion and inference of visual features with the...

Bibliographic Details
Main Authors:	Xiangjun Shao, Hongsong Dong, Guangsheng Wu
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Deep learning; image captioning; multimodal learning; visual question answering
Online Access:	https://ieeexplore.ieee.org/document/10918635/

Similar Items

Medical Knowledge-Based Differential Image Visual Question Answering
by: Fangpeng Lu, et al.
Published: (2025-01-01)

Adaptive Conditional Reasoning for Remote Sensing Visual Question Answering
by: Yiqun Gao, et al.
Published: (2025-04-01)

Visual Question Answering in Robotic Surgery: A Comprehensive Review
by: Di Ding, et al.
Published: (2025-01-01)

Hierarchical Modeling for Medical Visual Question Answering with Cross-Attention Fusion
by: Junkai Zhang, et al.
Published: (2025-04-01)

Designing and Evaluating a Dual-Stream Transformer-Based Architecture for Visual Question Answering
by: Faheem Shehzad, et al.
Published: (2024-01-01)

ZPVQA: Visual Question Answering of Images Based on Zero-Shot Prompt Learning
by: Naihao Hu, et al.
Published: (2025-01-01)

Seeing and Reasoning: A Simple Deep Learning Approach to Visual Question Answering
by: Rufai Yusuf Zakari, et al.
Published: (2025-04-01)

Enhancing Visual Question Answering for Multiple Choice Questions
by: Rashi Goel, et al.
Published: (2025-01-01)

Enhanced CLIP-GPT Framework for Cross-Lingual Remote Sensing Image Captioning
by: Rui Song, et al.
Published: (2025-01-01)

AVCaps: An Audio-Visual Dataset With Modality-Specific Captions
by: Parthasaarathy Sudarsanam, et al.
Published: (2025-01-01)

Multimodal representative answer extraction in community question answering
by: Ming Li, et al.
Published: (2023-10-01)

BVQA: Connecting Language and Vision Through Multimodal Attention for Open-Ended Question Answering
by: Md. Shalha Mucha Bhuyan, et al.
Published: (2025-01-01)

Assessing the performance of zero-shot visual question answering in multimodal large language models for 12-lead ECG image interpretation
by: Tomohisa Seki, et al.
Published: (2025-02-01)

Visual explainable artificial intelligence for graph-based visual question answering and scene graph curation
by: Sebastian Künzel, et al.
Published: (2025-04-01)

Analysis of The Use of Discussion And Question And Answer Methods As an Effort to Improve Student Physics Learning Outcomes
by: Delia Sapitri, et al.
Published: (2023-06-01)

Tiny TR-CAP: A novel small-scale benchmark dataset for general-purpose image captioning tasks
by: Abbas Memiş, et al.
Published: (2025-04-01)

Envisioning Answers: Unleashing Deep Learning for Visual Question Answering in Artistic Images
by: Erfan Zolghadriha, et al.
Published: (2024-03-01)

A Semantic Weight Adaptive Model Based on Visual Question Answering
by: Li Huimin, et al.
Published: (2025-01-01)

SHIFA: SBERT-Based Healthcare Information Focused Arabic Question Answering
by: Rahaf Alruwaithi, et al.
Published: (2025-01-01)

Giving Questions and Getting Answers (GQGA) Strategy Improves Biology Learning Outcomes
by: Muhammad Eval Setiawan, et al.
Published: (2019-12-01)

Affective Image Captioning for Visual Artworks Using Emotion-Based Cross-Attention Mechanisms
by: Shintaro Ishikawa, et al.
Published: (2023-01-01)

MusiQAl: A Dataset for Music Question–Answering through Audio–Video Fusion
by: Anna-Maria Christodoulou, et al.
Published: (2025-07-01)

Analyzing Diagnostic Reasoning of Vision–Language Models via Zero-Shot Chain-of-Thought Prompting in Medical Visual Question Answering
by: Fatema Tuj Johora Faria, et al.
Published: (2025-07-01)

Expert Detection In Question Answer Communities
by: Hamed Salimian, et al.
Published: (2022-01-01)

A novel image captioning model with visual-semantic similarities and visual representations re-weighting
by: Alaa Thobhani, et al.
Published: (2024-09-01)

Integrating visual memory for image captioning
by: Jiahui Wei, et al.
Published: (2025-05-01)

ReceiptQA: A Question-Answering Dataset for Receipt Understanding
by: Mahmoud Abdalla, et al.
Published: (2025-05-01)

The role of answer content and length when preparing answers to questions
by: Ruth Elizabeth Corps, et al.
Published: (2024-07-01)

Deep Memory Fusion Model for Long Video Question Answering
by: SUN Guanglu, et al.
Published: (2021-02-01)

Enhancing the performance of neurosurgery medical question-answering systems using a multi-task knowledge graph-augmented answer generation model
by: Ting Pan, et al.
Published: (2025-05-01)

Knowledge injection methods in question answering
by: D. V. Radyush
Published: (2025-06-01)

Adapting an English Corpus and a Question Answering System for Slovene
by: Uroš Šmajdek, et al.
Published: (2023-09-01)

Preliminary Study on Image Captioning for Construction Hazards
by: Wen-Ta Hsiao, et al.
Published: (2024-08-01)

Cross-Encoder-Based Semantic Evaluation of Extractive and Generative Question Answering in Low-Resourced African Languages
by: Funebi Francis Ijebu, et al.
Published: (2025-03-01)

The battle of question formats: a comparative study of retrieval practice using very short answer questions and multiple choice questions
by: Elise V. van Wijk, et al.
Published: (2024-12-01)

Generative Models for Multiple-Choice Question Answering in Portuguese: A Monolingual and Multilingual Experimental Study
by: Guilherme Dallmann Lima, et al.
Published: (2025-05-01)

Human Scene Understanding Mechanism-Based Image Captioning for Blind Assistance
by: Jong-Hoon Kim, et al.
Published: (2025-01-01)

Offline visual aid system for the blind based on image captioning
by: Yue CHEN, et al.
Published: (2022-01-01)

Rhetorical questions as aggressive, friendly or sarcastic/ironical questions with imposed answers
by: Džemal Špago
Published: (2025-01-01)

Improved IEC performance via emotional stimuli-aware captioning
by: Zibo Zhou, et al.
Published: (2025-07-01)