Attention Mechanism-Based Cognition-Level Scene Understanding

Given a question–image input, a visual commonsense reasoning (VCR) model predicts an answer with a corresponding rationale, which requires inference abilities based on real-world knowledge. The VCR task, which calls for exploiting multi-source information as well as learning different levels of unde...

Bibliographic Details
Main Authors:	Xuejiao Tang, Wenbin Zhang
Format:	Article
Language:	English
Published:	MDPI AG 2025-03-01
Series:	Information
Subjects:	visual commonsense reasoning; visual understanding
Online Access:	https://www.mdpi.com/2078-2489/16/3/203

Similar Items

Visual Commonsense Causal Reasoning From a Still Image
by: Xiaojing Wu, et al.
Published: (2025-01-01)

Commonsense Spatial Reasoning: an Informational Perspective
by: Stefania Bandini, et al.
Published: (2008-07-01)

MuRelSGG: Multimodal Relationship Prediction for Neurosymbolic Scene Graph Generation
by: Muhammad Junaid Khan, et al.
Published: (2025-01-01)

Evaluating LLMs for visualization generation and understanding
by: Saadiq Rauf Khan, et al.
Published: (2025-05-01)

Enhancing human-centered dynamic scene understanding via multiple LLMs collaborated reasoning
by: Hang Zhang, et al.
Published: (2025-03-01)

A Preservationist Christian Sexual Ethic: Verifying and Vindicating a Contested Perspective
by: Kenneth L. Waters
Published: (2025-06-01)

Enabling High-Level Worker-Centric Semantic Understanding of Onsite Images Using Visual Language Models with Attention Mechanism and Beam Search Strategy
by: Hui Deng, et al.
Published: (2025-03-01)

Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach
by: Andre O. Francani, et al.
Published: (2025-01-01)

Sketchy understandings: drawings reveal where students may need additional support to understand scale and abstraction in common representations of DNA
by: Crystal Uminski, et al.
Published: (2025-08-01)

The effectiveness of experiential learning on students' understanding of science and technology
by: Rika Yuliana, et al.
Published: (2025-02-01)

Hierarchical Modeling for Medical Visual Question Answering with Cross-Attention Fusion
by: Junkai Zhang, et al.
Published: (2025-04-01)

The students' visual reasoning in solving integral problems
by: Ummu Sholihah, et al.
Published: (2023-12-01)

Spatial and non-verbal reasoning abilities in first-year female DVM students before and after 4 h of canine osteology training or 19 h of canine dissection: preliminary study
by: J. Claudio Gutierrez, et al.
Published: (2025-07-01)

Affective feature knowledge interaction for empathetic conversation generation
by: Ensi Chen, et al.
Published: (2022-12-01)

Analyzing Students’ Difficulty in Learning Geometry at Upper Secondary School in Cambodia: A Case Study on Vector
by: Sieng Veasna
Published: (2024-09-01)

Shaping the physical world to our ends through the left PF technical-cognition area
by: François Osiurak, et al.
Published: (2025-04-01)

Adaptive Conditional Reasoning for Remote Sensing Visual Question Answering
by: Yiqun Gao, et al.
Published: (2025-04-01)

LMGDoc: Light Multigranular GNN for Efficient Document Understanding
by: Abdellatif Sassioui, et al.
Published: (2025-01-01)

Fusion of Visual Attention and Scene Descriptions With Deep Reinforcement Learning for AAV Indoor Autonomous Navigation
by: Hussein Samma, et al.
Published: (2025-01-01)

Progression of argumentative reasoning and the relation with visual attention in an interactive learning environment
by: Fang-Ying Yang, Yuan-Li Liu, Shih-Chieh Chien, Yi-Wen Hung
Published: (2025-04-01)

Thoughts, Labyrinths, and Torii
by: Maurício Vieira Kritz
Published: (2024-12-01)

The impact of multiple representations on students' understanding of vector field concepts: Implementation of simulations and sketching activities into lecture-based recitations in undergraduate physics
by: Larissa Hahn, et al.
Published: (2025-04-01)

Children’s attention to online adverts is related to low-level saliency factors and individual level of gaze control
by: Nils Holmberg, et al.
Published: (2015-06-01)

Transsaccadic Scene Memory Revisited: A 'Theory of Visual Attention (TVA)' Based Approach to Recognition Memory and Confidence for Objects in Naturalistic Scenes.
by: Melissa L.-H. Võ, et al.
Published: (2008-12-01)

Visual explainable artificial intelligence for graph-based visual question answering and scene graph curation
by: Sebastian Künzel, et al.
Published: (2025-04-01)

Resource Identification and Level of Understanding of Particle Dynamics Concepts
by: Akbar Kadir Masalesi
Published: (2022-10-01)

Challenges in teaching students to plot equations: Another impact of graphing procedures
by: Ulumul Umah, et al.
Published: (2024-09-01)

A Computational–Cognitive Model of Audio-Visual Attention in Dynamic Environments
by: Hamideh Yazdani, et al.
Published: (2025-05-01)

DeepLabV3+-Based Semantic Annotation Refinement for SLAM in Indoor Environments
by: Shuangfeng Wei, et al.
Published: (2025-05-01)

Multiple item representations in visual working memory simultaneously guide attention
by: Caibin Duan, et al.
Published: (2025-03-01)

TOSD: A Hierarchical Object-Centric Descriptor Integrating Shape, Color, and Topology
by: Jun-Hyeon Choi, et al.
Published: (2025-07-01)

Domain-Incremental Learning Paradigm for scene understanding via Pseudo-Replay Generation
by: Zhifeng Xie, et al.
Published: (2025-09-01)

Spatial Position Reasoning of Image Entities Based on Location Words
by: Xingguo Qin, et al.
Published: (2024-12-01)

Interpersonal Relationship Detection Using Multi-Head Graph Attention Networks With Multi-Feature Fusion
by: Simge Akay, et al.
Published: (2025-01-01)

Multimodal Latent Representation Learning for Video Moment Retrieval
by: Jinkwon Hwang, et al.
Published: (2025-07-01)

The impact of visualization on flexible Bayesian reasoning
by: Katharina Böcherer-Linder, et al.
Published: (2017-05-01)

Deep Learning for Traffic Scene Understanding: A Review
by: Parya Dolatyabi, et al.
Published: (2025-01-01)

Improved Visual SLAM Algorithm Based on Dynamic Scenes
by: Jinxing Niu, et al.
Published: (2024-11-01)

The PSR as a practical principle in Kantian ethics
by: Schafer Karl
Published: (2024-01-01)

A week in neuro-ophthalmology: the Singapore scene
by: Karen B. Reyes, MD, et al.
Published: (2010-12-01)