WDFS-DETR: A Transformer-based framework with multi-scale attention for small object detection in UAV Engineering Tasks

Bibliographic Details
Main Authors: Jinjiang Liu, Yonghua Xie
Format: Article
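Language: English
Published: Elsevier 2025-09-01
Series: Results in Engineering
Subjects: Transformer-based detector; Small object detection; Embedded platforms; Multi-scale Feature Fusion; Jetson Orin Nano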
Online Access:http://www.sciencedirect.com/science/article/pii/S2590123025020018
author Jinjiang Liu
Yonghua Xie
collection DOAJ
description With the growing adoption of Unmanned Aerial Vehicles (UAVs) in surveillance, emergency response, and environmental monitoring, accurate small object detection under resource constraints remains a critical challenge. To address this issue, we propose WDFS-DETR (Wavelet-based Dual-stage Feature-enhanced Detection Transformer), a Transformer-based detection framework built upon RT-DETR (Real-Time Detection Transformer). The framework integrates four custom-designed components to jointly enhance detection accuracy and efficiency. First, BasicBlock-WTCM (Wavelet-based Transform Coordinate Module) improves small object perception by modeling spatial and channel semantics across scales. Second, the Dual-Stage Adaptive Normalization (DSAN) dynamically selects appropriate normalization strategies during training and inference to improve convergence and runtime performance. Third, the Feature-Focused Diffusion Pyramid Network (FFDPN) enhances context modeling and robustness via hierarchical multi-scale feature alignment and fusion. Finally, the boundary-aware Slide-VarifocalLoss combines sliding window mechanisms with class-weighted reweighting to address class imbalance and improve boundary localization. Experiments show that WDFS-DETR improves mAP@0.5 by 2.2% over RT-DETR-r18 on VisDrone2019. When deployed on Jetson Orin Nano, inference speed improves from 38.6 FPS to 50.1 FPS, highlighting its suitability for real-time deployment on lightweight platforms. The model also generalizes well to the UAVDT and DOTA datasets, demonstrating its applicability to real-world UAV-based detection tasks in complex engineering settings. Source code is available at: https://github.com/liuliuliu2002/WDFS-DETR.
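The abstract reports Jetson Orin Nano throughput in frames per second. As a rough illustration (not code from the WDFS-DETR repository), the sketch below converts the two reported figures into average per-frame latency and a relative throughput gain. It assumes frames are processed one at a time, so latency is simply the reciprocal of FPS; under batched or pipelined inference that relationship does not hold. It also assumes the 38.6 FPS figure refers to the RT-DETR-r18 baseline, which the record implies but does not state outright. The helper names `fps_to_latency_ms` and `relative_speedup` are hypothetical.

```python
def fps_to_latency_ms(fps: float) -> float:
    """Average per-frame latency in milliseconds, assuming sequential
    (unbatched) inference so that latency = 1 / throughput."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def relative_speedup(baseline_fps: float, new_fps: float) -> float:
    """Fractional throughput gain of new_fps over baseline_fps."""
    return new_fps / baseline_fps - 1.0


# Figures taken from the abstract (Jetson Orin Nano).
BASELINE_FPS = 38.6  # assumed to be the RT-DETR-r18 baseline
WDFS_FPS = 50.1      # WDFS-DETR

baseline_ms = fps_to_latency_ms(BASELINE_FPS)
wdfs_ms = fps_to_latency_ms(WDFS_FPS)
gain = relative_speedup(BASELINE_FPS, WDFS_FPS)

print(f"baseline:  {baseline_ms:.1f} ms/frame")
print(f"WDFS-DETR: {wdfs_ms:.1f} ms/frame")
print(f"throughput gain: {gain:.1%}")
```

Under these assumptions the script reports about 25.9 ms versus 20.0 ms per frame and a throughput gain of about 29.8%, so the 11.5 FPS improvement corresponds to saving roughly 5.9 ms per frame.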
format Article
id doaj-art-b6070b19d18743479635124a947faea8
institution DOAJ
issn 2590-1230
language English
publishDate 2025-09-01
publisher Elsevier
series Results in Engineering
volume 27
article_number 105930
doi 10.1016/j.rineng.2025.105930
affiliation Jinjiang Liu: College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, PR China
Yonghua Xie (corresponding author): College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, PR China
title WDFS-DETR: A Transformer-based framework with multi-scale attention for small object detection in UAV Engineering Tasks
topic Transformer-based detector
Small object detection
Embedded platforms
Multi-scale Feature Fusion
Jetson Orin Nano
url http://www.sciencedirect.com/science/article/pii/S2590123025020018