DCLMA: Deep correlation learning with multi-modal attention for visual-audio retrieval

The cross-modal retrieval task aims to retrieve audio modality information from the database that best matches the visual modality and vice versa. One of the key challenges in this field is the inconsistency of audio and visual features, which increases the complexity of capturing cross-modal inform...

Bibliographic Details
Main Authors: Jiwei Zhang, Hirotaka Hachiya
Format: Article
Language: English
Published: Elsevier 2025-09-01
Series: Machine Learning with Applications
Online Access: http://www.sciencedirect.com/science/article/pii/S2666827025000787