Efficient text-to-video retrieval via multi-modal multi-tagger derived pre-screening

Abstract: Text-to-video retrieval (TVR) has made significant progress with advances in vision and language representation learning. Most existing methods use real-valued and hash-based embeddings to represent the video and text, allowing retrieval by computing their similarities. However, these metho...

Bibliographic Details
Main Authors: Yingjia Xu, Mengxia Wu, Zixin Guo, Min Cao, Mang Ye, Jorma Laaksonen
Format: Article
Language: English
Published: Springer 2025-03-01
Series: Visual Intelligence
Online Access: https://doi.org/10.1007/s44267-025-00073-2

Similar Items