An Image Grid Can Be Worth a Video: Zero-Shot Video Question Answering Using a VLM

Stimulated by the sophisticated reasoning capabilities of recent Large Language Models (LLMs), a variety of strategies for bridging video modality have been devised. A prominent strategy involves Video Language Models (VideoLMs), which train a learnable interface with video data to connect advanced...

Bibliographic Details
Main Authors: Wonkyun Kim, Changin Choi, Wonseok Lee, Wonjong Rhee
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10802898/