Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions With Multi-Level Representations

Text-Video Retrieval plays an important role in multi-modal understanding and has attracted increasing attention in recent years. Most existing methods focus on constructing contrastive pairs between whole videos and complete caption sentences, while overlooking fine-grained cross-modal relationship...

Bibliographic Details
Main Authors: Jie Jiang, Shaobo Min, Weijie Kong, Hongfa Wang, Zhifeng Li, Wei Liu
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/9979153/