WDFS-DETR: A Transformer-based framework with multi-scale attention for small object detection in UAV Engineering Tasks
With the growing adoption of Unmanned Aerial Vehicles (UAVs) in surveillance, emergency response, and environmental monitoring, accurate small object detection under resource constraints remains a critical challenge. To address this issue, we propose WDFS-DETR (Wavelet-based Dual-stage Feature-enhan...
| Main Authors: | Jinjiang Liu, Yonghua Xie |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-09-01 |
| Series: | Results in Engineering |
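| Subjects: | Transformer-based detector; Small object detection; Embedded platforms; Multi-scale Feature Fusion; Jetson Orin Nano |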
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2590123025020018 |
| _version_ | 1849682878574624768 |
|---|---|
| author | Jinjiang Liu Yonghua Xie |
| author_facet | Jinjiang Liu Yonghua Xie |
| author_sort | Jinjiang Liu |
| collection | DOAJ |
| description | With the growing adoption of Unmanned Aerial Vehicles (UAVs) in surveillance, emergency response, and environmental monitoring, accurate small object detection under resource constraints remains a critical challenge. To address this issue, we propose WDFS-DETR (Wavelet-based Dual-stage Feature-enhanced Detection Transformer), a Transformer-based detection framework built upon RT-DETR (Real-Time Detection Transformer). The framework integrates four custom-designed components to jointly enhance detection accuracy and efficiency. First, BasicBlock-WTCM (Wavelet-based Transform Coordinate Module) improves small object perception by modeling spatial and channel semantics across scales. Second, the Dual-Stage Adaptive Normalization (DSAN) dynamically selects appropriate normalization strategies during training and inference to improve convergence and runtime performance. Third, the Feature-Focused Diffusion Pyramid Network (FFDPN) enhances context modeling and robustness via hierarchical multi-scale feature alignment and fusion. Finally, the boundary-aware Slide-VarifocalLoss combines sliding window mechanisms with class-weighted reweighting to address class imbalance and improve boundary localization. Experiments show that WDFS-DETR improves mAP@0.5 by 2.2 % over RT-DETR-r18 on VisDrone2019. When deployed on Jetson Orin Nano, inference speed improves from 38.6 FPS to 50.1 FPS, highlighting its suitability for real-time deployment on lightweight platforms. The model also generalizes well to UAVDT and DOTA datasets, demonstrating its applicability to real-world UAV-based detection tasks in complex engineering settings. Source code is available at: https://github.com/liuliuliu2002/WDFS-DETR. |
| format | Article |
| id | doaj-art-b6070b19d18743479635124a947faea8 |
| institution | DOAJ |
| issn | 2590-1230 |
| language | English |
| publishDate | 2025-09-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Results in Engineering |
| spelling | doaj-art-b6070b19d18743479635124a947faea8; 2025-08-20T03:24:03Z; eng; Elsevier; Results in Engineering; 2590-1230; 2025-09-01; 27; 105930; 10.1016/j.rineng.2025.105930; WDFS-DETR: A Transformer-based framework with multi-scale attention for small object detection in UAV Engineering Tasks; Jinjiang Liu (College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, PR China); Yonghua Xie (corresponding author; College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, PR China); With the growing adoption of Unmanned Aerial Vehicles (UAVs) in surveillance, emergency response, and environmental monitoring, accurate small object detection under resource constraints remains a critical challenge. To address this issue, we propose WDFS-DETR (Wavelet-based Dual-stage Feature-enhanced Detection Transformer), a Transformer-based detection framework built upon RT-DETR (Real-Time Detection Transformer). The framework integrates four custom-designed components to jointly enhance detection accuracy and efficiency. First, BasicBlock-WTCM (Wavelet-based Transform Coordinate Module) improves small object perception by modeling spatial and channel semantics across scales. Second, the Dual-Stage Adaptive Normalization (DSAN) dynamically selects appropriate normalization strategies during training and inference to improve convergence and runtime performance. Third, the Feature-Focused Diffusion Pyramid Network (FFDPN) enhances context modeling and robustness via hierarchical multi-scale feature alignment and fusion. Finally, the boundary-aware Slide-VarifocalLoss combines sliding window mechanisms with class-weighted reweighting to address class imbalance and improve boundary localization. Experiments show that WDFS-DETR improves mAP@0.5 by 2.2 % over RT-DETR-r18 on VisDrone2019. When deployed on Jetson Orin Nano, inference speed improves from 38.6 FPS to 50.1 FPS, highlighting its suitability for real-time deployment on lightweight platforms. The model also generalizes well to UAVDT and DOTA datasets, demonstrating its applicability to real-world UAV-based detection tasks in complex engineering settings. Source code is available at: https://github.com/liuliuliu2002/WDFS-DETR.; http://www.sciencedirect.com/science/article/pii/S2590123025020018; Transformer-based detector; Small object detection; Embedded platforms; Multi-scale Feature Fusion; Jetson Orin Nano |
| spellingShingle | Jinjiang Liu Yonghua Xie WDFS-DETR: A Transformer-based framework with multi-scale attention for small object detection in UAV Engineering Tasks Results in Engineering Transformer-based detector Small object detection Embedded platforms Multi-scale Feature Fusion Jetson Orin Nano |
| title | WDFS-DETR: A Transformer-based framework with multi-scale attention for small object detection in UAV Engineering Tasks |
| title_full | WDFS-DETR: A Transformer-based framework with multi-scale attention for small object detection in UAV Engineering Tasks |
| title_fullStr | WDFS-DETR: A Transformer-based framework with multi-scale attention for small object detection in UAV Engineering Tasks |
| title_full_unstemmed | WDFS-DETR: A Transformer-based framework with multi-scale attention for small object detection in UAV Engineering Tasks |
| title_short | WDFS-DETR: A Transformer-based framework with multi-scale attention for small object detection in UAV Engineering Tasks |
| title_sort | wdfs detr a transformer based framework with multi scale attention for small object detection in uav engineering tasks |
| topic | Transformer-based detector Small object detection Embedded platforms Multi-scale Feature Fusion Jetson Orin Nano |
| url | http://www.sciencedirect.com/science/article/pii/S2590123025020018 |
| work_keys_str_mv | AT jinjiangliu wdfsdetratransformerbasedframeworkwithmultiscaleattentionforsmallobjectdetectioninuavengineeringtasks AT yonghuaxie wdfsdetratransformerbasedframeworkwithmultiscaleattentionforsmallobjectdetectioninuavengineeringtasks |
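
The description field reports inference speed on Jetson Orin Nano rising from 38.6 FPS to 50.1 FPS. As a reading aid, the short sketch below converts those two figures into a relative speedup and a per-frame latency. The variable and function names are illustrative and do not come from the record or the authors' code, and the calculation assumes that per-frame latency is simply the reciprocal of the reported FPS.

```python
# Illustrative arithmetic only: the two FPS values are quoted from the
# record's description; everything else (names, the latency = 1/FPS
# assumption) is hypothetical and not taken from the paper's code.

def fps_to_latency_ms(fps: float) -> float:
    """Return the average time per frame in milliseconds for a given FPS."""
    return 1000.0 / fps

baseline_fps = 38.6   # speed before the change, as stated in the description
improved_fps = 50.1   # speed after the change, as stated in the description

# Ratio of the two throughputs and the same figure as a percentage gain.
speedup = improved_fps / baseline_fps
relative_gain_pct = (speedup - 1.0) * 100.0

# Time saved per frame under the latency = 1/FPS assumption.
latency_saved_ms = fps_to_latency_ms(baseline_fps) - fps_to_latency_ms(improved_fps)

print(f"speedup: {speedup:.3f}x ({relative_gain_pct:.1f}% more frames per second)")
print(f"latency saved per frame: {latency_saved_ms:.2f} ms")
```

Under that assumption, the reported figures correspond to roughly a 1.3x throughput ratio, or about 6 ms less time per frame.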