Efficient text-to-video retrieval via multi-modal multi-tagger derived pre-screening

Abstract: Text-to-video retrieval (TVR) has made significant progress with advances in vision and language representation learning. Most existing methods use real-valued and hash-based embeddings to represent the video and text, allowing retrieval by computing their similarities. However, these metho...

Bibliographic Details
Main Authors:	Yingjia Xu, Mengxia Wu, Zixin Guo, Min Cao, Mang Ye, Jorma Laaksonen
Format:	Article
Language:	English
Published:	Springer 2025-03-01
Series:	Visual Intelligence
Subjects:	Text-to-video retrieval (TVR); Inverted index; Pre-screening; Contrastive learning (CL)
Online Access:	https://doi.org/10.1007/s44267-025-00073-2

Similar Items