MLK-TR: a Multi-branch Large Kernel TRansformer for UAV-based images
Abstract Object detection from the perspective of unmanned aerial vehicles (UAVs) is a technology that uses visual sensors mounted on UAVs to automatically identify and locate ground targets. However, due to the small size of targets captured by UAVs, along with challenges such as scale variation a...
| Main Authors: | Xun Li, Yuzhen Zhao, Yang Zhao, Zhun Guo, Jianjing Gao, Baoxi Yuan |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-05-01 |
| Series: | Complex & Intelligent Systems |
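| Subjects: | YOLO; Large kernel attention mechanism; Feature fusion network; Partially transformer; UAV aerial image; Tiny object detection |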
| Online Access: | https://doi.org/10.1007/s40747-025-01901-0 |
| _version_ | 1849733822407507968 |
|---|---|
| author | Xun Li; Yuzhen Zhao; Yang Zhao; Zhun Guo; Jianjing Gao; Baoxi Yuan |
| author_sort | Xun Li |
| collection | DOAJ |
| description | Abstract: Object detection from the perspective of unmanned aerial vehicles (UAVs) uses visual sensors mounted on UAVs to automatically identify and locate ground targets. However, because targets captured by UAVs are small and subject to scale variation and blurred edges, existing methods struggle to maintain high detection accuracy while keeping inference efficient. To address this, this paper proposes a Multi-branch Large-Kernel TRansformer network (MLK-TR) for small target detection in UAV scenarios. Compared with existing detectors, MLK-TR improves detection performance through the following innovations. First, the proposed Sparse Large-Kernel Attention mechanism (SLK-Atten) selects key information in the image by sparsifying feature representations. Next, the C3PA2 module strengthens the detector’s feature extraction and thereby its focus on foreground targets. In addition, the Frequent Interaction Feature Fusion Network (FIFFN) facilitates feature interaction between different levels, improving the detector’s adaptability to different scales. Finally, super-high-resolution prediction feature maps are introduced to enhance edge details, improving the detector’s sensitivity to small targets. Notably, the proposed modules can be easily integrated into the YOLO series framework. Compared to the original YOLO11n, MLK-TR achieves a 9% improvement in mAP50 on the publicly available VisDrone dataset, a 1.9% improvement in mAP50 on the UAVDT dataset, and a 3.6% improvement in mAP50 on the PVD dataset. These results confirm the effectiveness of MLK-TR in addressing the complexities of UAV object detection. |
| format | Article |
| id | doaj-art-4949d03ec05e4d46add5f58fe863f9ba |
| institution | DOAJ |
| issn | 2199-4536, 2198-6053 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Springer |
| record_format | Article |
| series | Complex & Intelligent Systems |
| spelling | doaj-art-4949d03ec05e4d46add5f58fe863f9ba; 2025-08-20T03:07:57Z; eng; Springer; Complex & Intelligent Systems; 2199-4536; 2198-6053; 2025-05-01; 116125; 10.1007/s40747-025-01901-0; MLK-TR: a Multi-branch Large Kernel TRansformer for UAV-based images; Xun Li, Yuzhen Zhao, Yang Zhao, Zhun Guo, Jianjing Gao, Baoxi Yuan (all: Xi’an Key Laboratory of Advanced Photo-electronics Materials and Energy Conversion Device, School of Electronic Information, Xijing University); https://doi.org/10.1007/s40747-025-01901-0; YOLO; Large kernel attention mechanism; Feature fusion network; Partially transformer; UAV aerial image; Tiny object detection |
| title | MLK-TR: a Multi-branch Large Kernel TRansformer for UAV-based images |
| topic | YOLO; Large kernel attention mechanism; Feature fusion network; Partially transformer; UAV aerial image; Tiny object detection |
| url | https://doi.org/10.1007/s40747-025-01901-0 |