Efficient text-to-video retrieval via multi-modal multi-tagger derived pre-screening
Abstract Text-to-video retrieval (TVR) has made significant progress with advances in vision and language representation learning. Most existing methods use real-valued and hash-based embeddings to represent the video and text, allowing retrieval by computing their similarities. However, these metho...
Saved in:
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
Springer
2025-03-01
|
| Series: | Visual Intelligence |
| Subjects: | |
| Online Access: | https://doi.org/10.1007/s44267-025-00073-2 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Be the first to leave a comment!