MLK-TR: a Multi-branch Large Kernel TRansformer for UAV-based images

Bibliographic Details
Main Authors: Xun Li, Yuzhen Zhao, Yang Zhao, Zhun Guo, Jianjing Gao, Baoxi Yuan
Format: Article
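Language: English
Published: Springer 2025-05-01
Series: Complex & Intelligent Systems
Subjects: YOLO; Large kernel attention mechanism; Feature fusion network; Partially transformer; UAV aerial image; Tiny object detection
Online Access: https://doi.org/10.1007/s40747-025-01901-0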
_version_ 1849733822407507968
author Xun Li
Yuzhen Zhao
Yang Zhao
Zhun Guo
Jianjing Gao
Baoxi Yuan
collection DOAJ
description Abstract: Object detection from unmanned aerial vehicles (UAVs) uses visual sensors mounted on UAVs to automatically identify and locate ground targets. However, because targets captured by UAVs are small and are further affected by scale variation and blurred edges, existing methods struggle to maintain high detection accuracy while keeping inference efficient. To address this, the paper proposes a Multi-branch Large-Kernel TRansformer network (MLK-TR) for small target detection in UAV scenarios. Compared with existing detectors, MLK-TR improves detection performance through the following innovations. First, the proposed Sparse Large-Kernel Attention Mechanism (SLK-Atten) selects key information in the image by sparsifying feature representations. Next, the C3PA2 module enhances the detector’s feature extraction capability, improving its focus on foreground targets. In addition, the Frequent Interaction Feature Fusion Network (FIFFN) facilitates feature interaction between different levels, enhancing the detector’s adaptability to different scales. Finally, super high-resolution prediction feature maps are introduced to enhance edge details, improving the detector’s sensitivity to small targets. Notably, the proposed modules can be easily integrated into the YOLO series framework. Compared to the original YOLO11n, MLK-TR achieves a 9% improvement in mAP50 on the publicly available VisDrone dataset, a 1.9% improvement in mAP50 on the UAVDT dataset, and a 3.6% improvement in mAP50 on the PVD dataset. These results confirm the effectiveness of MLK-TR in addressing the complexities of UAV object detection.
format Article
id doaj-art-4949d03ec05e4d46add5f58fe863f9ba
institution DOAJ
issn 2199-4536
2198-6053
language English
publishDate 2025-05-01
publisher Springer
record_format Article
series Complex & Intelligent Systems
spelling doaj-art-4949d03ec05e4d46add5f58fe863f9ba; 2025-08-20T03:07:57Z; eng; Springer; Complex & Intelligent Systems; 2199-4536; 2198-6053; 2025-05-01; 11 6 1 25; 10.1007/s40747-025-01901-0; MLK-TR: a Multi-branch Large Kernel TRansformer for UAV-based images; Xun Li, Yuzhen Zhao, Yang Zhao, Zhun Guo, Jianjing Gao, Baoxi Yuan (all: Xi’an Key Laboratory of Advanced Photo-electronics Materials and Energy Conversion Device, School of Electronic Information, Xijing University); https://doi.org/10.1007/s40747-025-01901-0; YOLO; Large kernel attention mechanism; Feature fusion network; Partially transformer; UAV aerial image; Tiny object detection
title MLK-TR: a Multi-branch Large Kernel TRansformer for UAV-based images
topic YOLO
Large kernel attention mechanism
Feature fusion network
Partially transformer
UAV aerial image
Tiny object detection
url https://doi.org/10.1007/s40747-025-01901-0