Designing and Evaluating a Dual-Stream Transformer-Based Architecture for Visual Question Answering
In the realm of Visual Question Answering, accurate answers often hinge on the harmonious fusion of textual and visual elements. While these complex architectures are effective, they typically come with a hefty price tag: a large number of parameters that demand significant processing power and leng...
| Main Authors: | Faheem Shehzad, Aniello Minutolo, Massimo Esposito |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Online Access: | https://ieeexplore.ieee.org/document/10811881/ |
Similar Items
- A Semantic Weight Adaptive Model Based on Visual Question Answering
  by: Li Huimin, et al.
  Published: (2025-01-01)
- Envisioning Answers: Unleashing Deep Learning for Visual Question Answering in Artistic Images
  by: Erfan Zolghadriha, et al.
  Published: (2024-03-01)
- Medical Knowledge-Based Differential Image Visual Question Answering
  by: Fangpeng Lu, et al.
  Published: (2025-01-01)
- Adaptive Conditional Reasoning for Remote Sensing Visual Question Answering
  by: Yiqun Gao, et al.
  Published: (2025-04-01)
- Seeing and Reasoning: A Simple Deep Learning Approach to Visual Question Answering
  by: Rufai Yusuf Zakari, et al.
  Published: (2025-04-01)