TCAINet: an RGB-T salient object detection model with cross-modal fusion and adaptive decoding
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-04-01 |
| Series: | Scientific Reports |
| Online Access: | https://doi.org/10.1038/s41598-025-98423-z |
| Summary: | Abstract: In the field of deep learning-based object detection, RGB-T salient object detection (SOD) networks show significant potential for cross-modal information fusion. However, existing methods still face considerable challenges in complex scenes. Specifically, current cross-modal feature fusion approaches fail to fully exploit the complementary information between modalities, resulting in limited robustness when handling diverse inputs. Furthermore, inadequate adaptation to multi-scale features hinders accurate recognition of salient objects at different scales. Although some feature decoding strategies attempt to mitigate noise interference, they often struggle in high-noise environments and lack flexible feature weighting, further restricting fusion capabilities. To address these limitations, this paper proposes a novel salient object detection network, TCAINet. The network integrates a Channel Attention (CA) mechanism, an enhanced cross-modal fusion module (CAF), and an adaptive decoder (AAD) to improve both the depth and breadth of feature fusion. Additionally, diverse noise addition and augmentation methods are applied during data preprocessing to boost the model's robustness and adaptability. Specifically, the CA module enhances the model's feature selection ability, while the CAF and AAD modules optimize the integration and processing of multimodal information. Experimental results demonstrate that TCAINet outperforms existing methods across multiple evaluation metrics, proving its effectiveness and practicality in complex scenes. Notably, the proposed model achieves improvements of 0.653%, 1.384%, 1.019%, and 5.83% in the Sm, Em, Fm, and MAE metrics, respectively, confirming its efficacy in enhancing detection accuracy and optimizing feature fusion. The code and results can be found at the following link: huyunfei0219/TCAINet. |
| ISSN: | 2045-2322 |
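The summary above describes channel attention and cross-modal (RGB + thermal) feature fusion only at a high level; the actual CA, CAF, and AAD designs are defined in the full paper and linked code. The following is a minimal, hypothetical PyTorch sketch of the general pattern such a fusion step typically follows (squeeze-and-excitation style channel attention applied to concatenated RGB and thermal features); all module and parameter names here are illustrative and are not taken from TCAINet.

```python
# Illustrative sketch only: generic channel-attention fusion of RGB and thermal
# feature maps, not the TCAINet CA/CAF/AAD implementation.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (generic)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # global average pooling
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                            # per-channel weights in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # reweight channels


class CrossModalFusion(nn.Module):
    """Fuse RGB and thermal features: concatenate, attend, project back."""

    def __init__(self, channels: int):
        super().__init__()
        self.attn = ChannelAttention(2 * channels)
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([rgb, thermal], dim=1)     # stack modalities along channels
        return self.reduce(self.attn(fused))         # attend, then restore channel count


if __name__ == "__main__":
    rgb = torch.randn(2, 64, 56, 56)
    thermal = torch.randn(2, 64, 56, 56)
    out = CrossModalFusion(64)(rgb, thermal)
    print(out.shape)  # torch.Size([2, 64, 56, 56])
```

The key design choice this sketch illustrates is that per-channel weights learned from the pooled, concatenated features let the network emphasize whichever modality carries more reliable information for a given input, which is the kind of adaptive weighting the abstract attributes to the CA and CAF modules.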