An adaptive feature fusion strategy using dual-layer attention and multi-modal deep reinforcement learning for all-media similarity search

Bibliographic Details
Main Authors: Jin Yue, Jiayun Lang, Rui Feng
Format: Article
Language: English
Published: Springer 2025-05-01
Series: Discover Artificial Intelligence
Subjects:
Online Access: https://doi.org/10.1007/s44163-025-00332-7
Description
Summary: This paper proposes a novel adaptive feature fusion strategy that combines a dual-layer attention mechanism with multi-modal deep reinforcement learning (DRL) to optimize cross-modal information retrieval. The dual-layer attention mechanism enhances the model's ability to capture deep semantic relationships between different modalities, while DRL optimizes the feature extraction and fusion process, improving adaptability in complex environments. Experimental results demonstrate that this strategy outperforms traditional CNN- and RNN-based methods in accuracy, recall, and efficiency across a range of cross-modal retrieval tasks, particularly in multi-modal data environments such as text-image, text-video, and image-video. The proposed approach offers a promising solution for improving the accuracy and efficiency of cross-modal information retrieval.
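The record does not detail the architecture, but the "dual-layer attention" described in the abstract can be illustrated with a minimal sketch: a first layer of intra-modal self-attention that refines each modality's features independently, followed by a second layer of cross-modal attention that fuses them. All function names, dimensions, and the two-stage layout below are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention: softmax(QK^T / sqrt(d)) V
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def dual_layer_fusion(text_feats, image_feats):
    # Layer 1: intra-modal self-attention refines each modality separately
    text_refined = attention(text_feats, text_feats, text_feats)
    image_refined = attention(image_feats, image_feats, image_feats)
    # Layer 2: cross-modal attention; text queries attend over image
    # keys/values, yielding a fused representation per text token
    return attention(text_refined, image_refined, image_refined)

rng = np.random.default_rng(0)
text = rng.standard_normal((4, 8))   # 4 text tokens, feature dim 8
image = rng.standard_normal((6, 8))  # 6 image regions, feature dim 8
fused = dual_layer_fusion(text, image)
print(fused.shape)  # one fused vector per text token: (4, 8)
```

In a full retrieval system, the fused vectors would feed a similarity-scoring head, and a DRL agent (as the abstract describes) would adapt the extraction and fusion process; both are omitted here.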
ISSN:2731-0809