RDCRNet: RGB-T Object Detection Network Based on Cross-Modal Representation Model

RGB-thermal object detection harnesses complementary information from visible and thermal modalities to enhance detection robustness in challenging environments, particularly under low-light conditions. However, existing approaches suffer from limitations due to their heavy dependence on precisely registered data and insufficient handling of cross-modal distribution disparities. This paper presents RDCRNet, a novel framework incorporating a Cross-Modal Representation Model to effectively address these challenges. The proposed network features a Cross-Modal Feature Remapping Module that aligns modality distributions through statistical normalization and learnable correction parameters, significantly reducing feature discrepancies between modalities. A Cross-Modal Refinement and Interaction Module enables sophisticated bidirectional information exchange via trinity refinement for intra-modal context modeling and cross-attention mechanisms for unaligned feature fusion. Multiscale detection capability is enhanced through a Cross-Scale Feature Integration Module, improving detection performance across various object sizes. To overcome the inherent data scarcity in RGB-T detection, we introduce a self-supervised pretraining strategy that combines masked reconstruction with adversarial learning and semantic consistency loss, effectively leveraging both aligned and unaligned RGB-T samples. Extensive experiments demonstrate that RDCRNet achieves state-of-the-art performance on multiple benchmark datasets while maintaining high computational and storage efficiency, validating its superiority and practical effectiveness in real-world applications.

Bibliographic Details
Main Authors: Yubin Li, Weida Zhan, Yichun Jiang, Jinxin Guo
Author Affiliation: The College of Electronic and Information Engineering, Changchun University of Science and Technology, Changchun 130022, China
Format: Article
Language: English
Published: MDPI AG, 2025-04-01
Series: Entropy
ISSN: 1099-4300
DOI: 10.3390/e27040442
Subjects: object detection; multimodal; cross-modal representation; pretraining
Online Access: https://www.mdpi.com/1099-4300/27/4/442