RDCRNet: RGB-T Object Detection Network Based on Cross-Modal Representation Model

Bibliographic Details
Main Authors: Yubin Li, Weida Zhan, Yichun Jiang, Jinxin Guo
Format: Article
Language: English
Published: MDPI AG, 2025-04-01
Series: Entropy
Online Access: https://www.mdpi.com/1099-4300/27/4/442
Description
Summary: RGB-thermal object detection harnesses complementary information from visible and thermal modalities to enhance detection robustness in challenging environments, particularly under low-light conditions. However, existing approaches suffer from limitations due to their heavy dependence on precisely registered data and insufficient handling of cross-modal distribution disparities. This paper presents RDCRNet, a novel framework incorporating a Cross-Modal Representation Model to effectively address these challenges. The proposed network features a Cross-Modal Feature Remapping Module that aligns modality distributions through statistical normalization and learnable correction parameters, significantly reducing feature discrepancies between modalities. A Cross-Modal Refinement and Interaction Module enables sophisticated bidirectional information exchange via trinity refinement for intra-modal context modeling and cross-attention mechanisms for unaligned feature fusion. Multiscale detection capability is enhanced through a Cross-Scale Feature Integration Module, improving detection performance across various object sizes. To overcome the inherent data scarcity in RGB-T detection, we introduce a self-supervised pretraining strategy that combines masked reconstruction with adversarial learning and semantic consistency loss, effectively leveraging both aligned and unaligned RGB-T samples. Extensive experiments demonstrate that RDCRNet achieves state-of-the-art performance on multiple benchmark datasets while maintaining high computational and storage efficiency, validating its superiority and practical effectiveness in real-world applications.
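The abstract describes the Cross-Modal Feature Remapping Module as aligning modality distributions "through statistical normalization and learnable correction parameters." A minimal NumPy sketch of one plausible reading of this idea — matching the channel-wise statistics of thermal features to those of RGB features, followed by a learnable affine correction — is shown below. The function name, tensor shapes, and the specific statistic-matching scheme are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def remap_features(feat_src, feat_ref, gamma=1.0, beta=0.0, eps=1e-5):
    """Sketch of statistic-based cross-modal remapping.

    Normalizes feat_src (e.g. thermal features, shape (C, H, W)) per
    channel, rescales it to the channel statistics of feat_ref (e.g.
    RGB features), then applies a learnable correction (gamma, beta).
    """
    # Per-channel mean/std of the source (thermal) features.
    mu_s = feat_src.mean(axis=(1, 2), keepdims=True)
    std_s = feat_src.std(axis=(1, 2), keepdims=True)
    # Per-channel mean/std of the reference (RGB) features.
    mu_r = feat_ref.mean(axis=(1, 2), keepdims=True)
    std_r = feat_ref.std(axis=(1, 2), keepdims=True)
    # Whiten the source, then re-color it with the reference statistics.
    normalized = (feat_src - mu_s) / (std_s + eps)
    remapped = normalized * std_r + mu_r
    # Learnable correction parameters (trained in the real network).
    return gamma * remapped + beta
```

With `gamma=1, beta=0`, the output's channel statistics match the reference features, which is the sense in which such a remap "reduces feature discrepancies between modalities."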
ISSN: 1099-4300