Open-Vocabulary Action Localization With Iterative Visual Prompting

Video action localization aims to find the timings of specific actions in a long video. Although existing learning-based approaches have been successful, they require annotated videos, which come with a considerable labor cost. This paper proposes a training-free, open-vocabulary approach based...

Bibliographic Details
Main Authors:	Naoki Wake, Atsushi Kanehira, Kazuhiro Sasabuchi, Jun Takamatsu, Katsushi Ikeuchi
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Open-vocabulary action localization; vision-language models; large language models; GPT; action localization
Online Access:	https://ieeexplore.ieee.org/document/10942370/


Similar Items