Feature Enhancement Network for Infrared Small Target Detection in Complex Backgrounds Based on Multi-Scale Attention Mechanism

Bibliographic Details
Main Authors: Sen Zhang, Weilin Du, Yuan Liu, Ni Zhou, Zheng Li
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/9/4966
Description
Summary: Detecting small targets in single-frame infrared images is a significant challenge in computer vision, primarily because of large variations in target size, cluttered backgrounds, low signal-to-noise ratios, the sensitivity of bounding-box regression to target size, and possible partial occlusion. To address these challenges, we propose a robust feature enhancement network for infrared small target detection based on multi-scale attention. Specifically, we introduce the Iterative Attentional Feature Fusion (iAFF) module at the neck of the detection network to keep small-target features from being overwhelmed during cross-scale feature fusion. We also present the Occlusion-Aware Attention Module (OAAM), which is more tolerant of target localization errors in regions where local features are missing because of partial occlusion. By combining the scale and spatial attention mechanisms of the Dynamic Head, our approach adaptively learns the relative importance of different semantic layers. Furthermore, we integrate the Normalized Gaussian Wasserstein Distance (NWD) to address the convergence problems caused by the heightened sensitivity of bounding-box regression for small infrared targets. To evaluate our method, we present a new benchmark dataset, IRMT-UAV, characterized by large differences in target size, complex backgrounds, and substantial variation in signal-to-noise ratio. Experiments on the public IRSTD-1k dataset and the in-house IRMT-UAV dataset show that our method surpasses state-of-the-art (SOTA) methods, with mAP50 improvements of 1.4% and 4.9%, respectively, demonstrating its effectiveness and robustness.
ISSN: 2076-3417
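
Note: The summary does not spell out how NWD is computed. As a rough illustration only, the sketch below follows the commonly published Normalized Gaussian Wasserstein Distance formulation, in which each box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4). The function names `nwd` and `iou`, the example boxes, and the normalizing constant `c=12.8` are assumptions made here for illustration; they are not taken from the article, and the constant is normally chosen per dataset.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein Distance between two boxes.

    Boxes are (cx, cy, w, h). The constant c is an assumed, dataset-dependent
    normalizer; it is not a value reported in the article.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two axis-aligned Gaussians.
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    # Map the distance into (0, 1], where 1 means identical boxes.
    return math.exp(-math.sqrt(w2_squared) / c)


def iou(box_a, box_b):
    """Plain intersection-over-union for (cx, cy, w, h) boxes, for comparison."""
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    inter_w = max(0.0, min(cxa + wa / 2, cxb + wb / 2) - max(cxa - wa / 2, cxb - wb / 2))
    inter_h = max(0.0, min(cya + ha / 2, cyb + hb / 2) - max(cya - ha / 2, cyb - hb / 2))
    inter = inter_w * inter_h
    union = wa * ha + wb * hb - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical 4x4-pixel target and a prediction shifted 5 pixels to the right.
    target = (10.0, 10.0, 4.0, 4.0)
    prediction = (15.0, 10.0, 4.0, 4.0)
    print("IoU:", iou(target, prediction))
    print("NWD:", nwd(target, prediction))
```

For the two non-overlapping boxes in the usage example, IoU is zero and carries no information about how far apart the boxes are, whereas NWD still decreases smoothly with the offset. This is the usual motivation for using a distance of this kind when regressing very small boxes.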