Multispectral Target Detection Based on Deep Feature Fusion of Visible and Infrared Modalities

Multispectral detection leverages visible and infrared imaging to improve detection performance in complex environments. However, conventional convolution-based fusion methods predominantly rely on local feature interactions, limiting their capacity to fully exploit cross-modal information and making them more susceptible to interference from complex backgrounds. To overcome these challenges, the YOLO-MEDet multispectral target detection model is proposed. Firstly, the YOLOv5 architecture is redesigned into a two-stream backbone network, incorporating a midway fusion strategy to integrate multimodal features from the C3 to C5 layers, thereby enhancing detection accuracy and robustness. Secondly, the Attention-Enhanced Feature Fusion Framework (AEFF) is introduced to optimize both cross-modal and intra-modal feature representations by employing an attention mechanism, effectively boosting model performance. Finally, the C3-PSA (C3 Pyramid Compressed Attention) module is integrated to reinforce multiscale spatial feature extraction and refine feature representation, ultimately improving detection accuracy while reducing false alarms and missed detections in complex scenarios. Extensive experiments on the FLIR, KAIST, and M3FD datasets, along with additional validation using SimuNPS simulations, confirm the superiority of YOLO-MEDet. The results indicate that the proposed model outperforms existing approaches across multiple evaluation metrics, providing an innovative solution for multispectral target detection.

Bibliographic Details
Main Authors: Yongsheng Zhao, Yuxing Gao, Xu Yang, Luyang Yang
Format: Article
Language: English
Published: MDPI AG, 2025-05-01
Series: Applied Sciences
Subjects:
Online Access:https://www.mdpi.com/2076-3417/15/11/5857
Citation: Yongsheng Zhao, Yuxing Gao, Xu Yang, Luyang Yang (College of Marine Electrical Engineering, Dalian Maritime University, Dalian 116026, China). Multispectral Target Detection Based on Deep Feature Fusion of Visible and Infrared Modalities. Applied Sciences, vol. 15, no. 11, article 5857, published 2025-05-01 by MDPI AG. ISSN: 2076-3417. DOI: 10.3390/app15115857. Available at: https://www.mdpi.com/2076-3417/15/11/5857
Subjects: multispectral pedestrian detection; cross-modal feature fusion; attention mechanisms; multiscale feature extraction; SimuNPS