An adaptive feature fusion strategy using dual-layer attention and multi-modal deep reinforcement learning for all-media similarity search
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-05-01 |
| Series: | Discover Artificial Intelligence |
| Online Access: | https://doi.org/10.1007/s44163-025-00332-7 |
| Summary: | Abstract: This paper proposes a novel adaptive feature fusion strategy that combines a dual-layer attention mechanism with multi-modal deep reinforcement learning (DRL) to optimize cross-modal information retrieval. The dual-layer attention mechanism enhances the model's ability to capture deep semantic relationships between different modalities, while DRL optimizes the feature extraction and fusion process, improving adaptability in complex environments. Experimental results demonstrate that this strategy outperforms traditional CNN- and RNN-based methods in accuracy, recall, and efficiency across a range of cross-modal retrieval tasks, particularly in multi-modal data environments such as text-image, text-video, and image-video. The proposed approach offers a promising solution for improving the accuracy and efficiency of cross-modal information retrieval. |
| ISSN: | 2731-0809 |
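
To make the summary's "dual-layer attention" idea concrete, below is a minimal PyTorch sketch of one plausible reading: an intra-modal self-attention layer followed by a cross-modal attention layer, with a learned gate performing the adaptive feature fusion. The module name, dimensions, gating head, and pooling are illustrative assumptions, not the authors' published architecture, and the DRL component that the abstract says tunes this fusion process is omitted.

```python
# Sketch of a dual-layer attention fusion module (assumed design, not the
# paper's exact architecture). Layer 1: self-attention within each modality.
# Layer 2: cross-modal attention, followed by a gated adaptive fusion.
import torch
import torch.nn as nn

class DualLayerAttentionFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Layer 1: intra-modal self-attention for each modality.
        self.text_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.image_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Layer 2: cross-modal attention (text queries attend to image keys/values).
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Adaptive fusion gate: learns how much cross-modal context to mix in.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # text: (batch, text_len, dim); image: (batch, image_len, dim)
        t, _ = self.text_self(text, text, text)    # intra-modal attention
        v, _ = self.image_self(image, image, image)
        c, _ = self.cross(t, v, v)                 # cross-modal attention
        g = self.gate(torch.cat([t, c], dim=-1))   # per-feature fusion weight
        fused = g * c + (1 - g) * t                # adaptive feature fusion
        return fused.mean(dim=1)                   # pooled joint embedding

if __name__ == "__main__":
    model = DualLayerAttentionFusion()
    text = torch.randn(2, 16, 256)   # e.g. token embeddings
    image = torch.randn(2, 49, 256)  # e.g. image patch embeddings
    print(model(text, image).shape)  # torch.Size([2, 256])
```

In a retrieval setting, the pooled joint embedding would typically be compared against candidate embeddings by cosine similarity; per the abstract, the paper's DRL agent would additionally adapt the extraction and fusion steps rather than leaving the gate as the only learned fusion mechanism.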