DI-VTR: Dual inter-modal interaction model for video-text retrieval

Video-text retrieval is a challenging task in multimodal information processing because of the semantic gap between modalities. However, most existing methods do not fully mine intra-modal interactions, such as the temporal correlation of video frames, which results in poor matching perfo...

Bibliographic Details
Main Authors: Jie Guo, Mengying Wang, Wenwei Wang, Yan Zhou, Bin Song
Format: Article
Language: English
Published: KeAi Communications Co., Ltd. 2024-09-01
Series: Journal of Information and Intelligence
Online Access: http://www.sciencedirect.com/science/article/pii/S294971592400026X