Efficient text-to-video retrieval via multi-modal multi-tagger derived pre-screening
Abstract: Text-to-video retrieval (TVR) has made significant progress with advances in vision and language representation learning. Most existing methods use real-valued and hash-based embeddings to represent the video and text, allowing retrieval by computing their similarities. However, these metho...
| Main Authors: | Yingjia Xu, Mengxia Wu, Zixin Guo, Min Cao, Mang Ye, Jorma Laaksonen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-03-01 |
| Series: | Visual Intelligence |
| Subjects: | |
| Online Access: | https://doi.org/10.1007/s44267-025-00073-2 |
Similar Items
- Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions With Multi-Level Representations
  by: Jie Jiang, et al. Published: (2025-01-01)
- DI-VTR: Dual inter-modal interaction model for video-text retrieval
  by: Jie Guo, et al. Published: (2024-09-01)
- Dialogue-to-Video Retrieval via Multi-Grained Attention Network
  by: Yi Yu, et al. Published: (2025-01-01)
- Hierarchical multi-modal video summarization with dynamic sampling
  by: Lingjian Yu, et al. Published: (2024-12-01)
- Strong and Weak Prompt Engineering for Remote Sensing Image-Text Cross-Modal Retrieval
  by: Tianci Sun, et al. Published: (2025-01-01)