Deep Memory Fusion Model for Long Video Question Answering
Long video question answering involves rich multimodal semantic and inferential information. At present, video question answering models based on recurrent neural networks struggle to fully retain important memory information, to ignore irrelevant redundant information, and to ac...
| Main Authors: | SUN Guanglu, WU Meng, QIU Jing, LIANG Lili |
|---|---|
| Format: | Article |
| Language: | zho |
| Published: | Harbin University of Science and Technology Publications, 2021-02-01 |
| Series: | Journal of Harbin University of Science and Technology |
| Online Access: | https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=1911 |
Similar Items
- MusiQAl: A Dataset for Music Question–Answering through Audio–Video Fusion, by Anna-Maria Christodoulou, et al. Published: 2025-07-01
- MSAM:Video Question Answering Based on Multi-Stage Attention Model, by LIANG Li-li, et al. Published: 2022-08-01
- An Image Grid Can Be Worth a Video: Zero-Shot Video Question Answering Using a VLM, by Wonkyun Kim, et al. Published: 2024-01-01
- Hierarchical Modeling for Medical Visual Question Answering with Cross-Attention Fusion, by Junkai Zhang, et al. Published: 2025-04-01
- Enhancing Visual Question Answering for Multiple Choice Questions, by Rashi Goel, et al. Published: 2025-01-01