MLK-TR: a Multi-branch Large Kernel TRansformer for UAV-based images

Bibliographic Details
Main Authors: Xun Li, Yuzhen Zhao, Yang Zhao, Zhun Guo, Jianjing Gao, Baoxi Yuan
Format: Article
Language: English
Published: Springer 2025-05-01
Series: Complex & Intelligent Systems
Online Access: https://doi.org/10.1007/s40747-025-01901-0
Description
Summary: Object detection from unmanned aerial vehicles (UAVs) uses onboard visual sensors to automatically identify and locate ground targets. However, because targets captured by UAVs are small and subject to scale variation and blurred edges, existing methods struggle to maintain high detection accuracy while keeping inference efficient. To address this, this paper proposes a Multi-branch Large-Kernel TRansformer network (MLK-TR) for small target detection in UAV scenarios. MLK-TR improves on existing detectors through four innovations. First, the proposed Sparse Large-Kernel Attention Mechanism (SLK-Atten) selects key information in the image by sparsifying feature representations. Next, the C3PA2 module enhances the detector’s feature extraction capability, thus improving its focus on foreground targets. In addition, the Frequent Interaction Feature Fusion Network (FIFFN) facilitates feature interaction between different levels, enhancing the detector’s adaptability to different scales. Finally, super high-resolution prediction feature maps are introduced to enhance edge details, thereby improving the detector’s sensitivity to small targets. Notably, the proposed modules can be easily integrated into the YOLO series framework. Compared to the original YOLO11n, MLK-TR achieves a 9% improvement in mAP50 on the publicly available VisDrone dataset, a 1.9% improvement in mAP50 on the UAVDT dataset, and a 3.6% improvement in mAP50 on the PVD dataset. These results confirm the effectiveness of MLK-TR in addressing the complexities of UAV object detection.
ISSN: 2199-4536, 2198-6053