RDCRNet: RGB-T Object Detection Network Based on Cross-Modal Representation Model

Bibliographic Details
Main Authors: Yubin Li, Weida Zhan, Yichun Jiang, Jinxin Guo
Format: Article
Language: English
Published: MDPI AG, 2025-04-01
Series: Entropy
Online Access: https://www.mdpi.com/1099-4300/27/4/442
Description
Summary: RGB-thermal object detection harnesses complementary information from visible and thermal modalities to enhance detection robustness in challenging environments, particularly under low-light conditions. However, existing approaches suffer from limitations due to their heavy dependence on precisely registered data and insufficient handling of cross-modal distribution disparities. This paper presents RDCRNet, a novel framework incorporating a Cross-Modal Representation Model to effectively address these challenges. The proposed network features a Cross-Modal Feature Remapping Module that aligns modality distributions through statistical normalization and learnable correction parameters, significantly reducing feature discrepancies between modalities. A Cross-Modal Refinement and Interaction Module enables sophisticated bidirectional information exchange via trinity refinement for intra-modal context modeling and cross-attention mechanisms for unaligned feature fusion. Multiscale detection capability is enhanced through a Cross-Scale Feature Integration Module, improving detection performance across various object sizes. To overcome the inherent data scarcity in RGB-T detection, we introduce a self-supervised pretraining strategy that combines masked reconstruction with adversarial learning and semantic consistency loss, effectively leveraging both aligned and unaligned RGB-T samples. Extensive experiments demonstrate that RDCRNet achieves state-of-the-art performance on multiple benchmark datasets while maintaining high computational and storage efficiency, validating its superiority and practical effectiveness in real-world applications.
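The abstract describes the Cross-Modal Feature Remapping Module as aligning modality distributions "through statistical normalization and learnable correction parameters." A minimal NumPy sketch of one plausible reading of this idea — matching the channel-wise statistics of thermal features to those of RGB features, followed by a learnable affine correction — is shown below. The function name, tensor shapes, and the specific statistic-matching scheme are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def remap_features(feat_src, feat_ref, gamma=1.0, beta=0.0, eps=1e-5):
    """Sketch of statistic-based cross-modal remapping.

    Normalizes feat_src (e.g. thermal features, shape (C, H, W)) per
    channel, rescales it to the channel statistics of feat_ref (e.g.
    RGB features), then applies a learnable correction (gamma, beta).
    """
    # Per-channel mean/std of the source (thermal) features.
    mu_s = feat_src.mean(axis=(1, 2), keepdims=True)
    std_s = feat_src.std(axis=(1, 2), keepdims=True)
    # Per-channel mean/std of the reference (RGB) features.
    mu_r = feat_ref.mean(axis=(1, 2), keepdims=True)
    std_r = feat_ref.std(axis=(1, 2), keepdims=True)
    # Whiten the source, then re-color it with the reference statistics.
    normalized = (feat_src - mu_s) / (std_s + eps)
    remapped = normalized * std_r + mu_r
    # Learnable correction parameters (trained in the real network).
    return gamma * remapped + beta
```

With `gamma=1, beta=0`, the output's channel statistics match the reference features, which is the sense in which such a remap "reduces feature discrepancies between modalities."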
ISSN: 1099-4300