A Review of DEtection TRansformer: From Basic Architecture to Advanced Developments and Visual Perception Applications
DEtection TRansformer (DETR) introduced an end-to-end object detection paradigm using Transformers, eliminating hand-crafted components like anchor boxes and Non-Maximum Suppression (NMS) via set prediction and bipartite matching. Despite its potential, the original DETR suffered from slow convergen...
| Main Authors: | Liang Yu, Lin Tang, Lisha Mu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Sensors |
| Subjects: | object detection; DETR; transformer; attention; end to end; deep learning |
| Online Access: | https://www.mdpi.com/1424-8220/25/13/3952 |
| _version_ | 1849427682441297920 |
|---|---|
| author | Liang Yu; Lin Tang; Lisha Mu |
| author_facet | Liang Yu; Lin Tang; Lisha Mu |
| author_sort | Liang Yu |
| collection | DOAJ |
| description | DEtection TRansformer (DETR) introduced an end-to-end object detection paradigm using Transformers, eliminating hand-crafted components like anchor boxes and Non-Maximum Suppression (NMS) via set prediction and bipartite matching. Despite its potential, the original DETR suffered from slow convergence, poor small object detection, and low efficiency, prompting extensive research. This paper systematically reviews DETR’s technical evolution from a “problem-driven” perspective, focusing on advancements in attention mechanisms, query design, training strategies, and architectural efficiency. We also outline DETR’s applications in autonomous driving, medical imaging, and remote sensing, and its expansion to fine-grained classification and video understanding. Finally, we summarize current challenges and future directions. This “problem-driven” analysis offers researchers a comprehensive and insightful overview, aiming to fill gaps in the existing literature on DETR’s evolution and logic. |
| format | Article |
| id | doaj-art-3fb707aabbe14cbeb1ad092ba318d99d |
| institution | Kabale University |
| issn | 1424-8220 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Sensors |
| spelling | doaj-art-3fb707aabbe14cbeb1ad092ba318d99d; 2025-08-20T03:28:58Z; eng; MDPI AG; Sensors; 1424-8220; 2025-06-01; vol. 25, no. 13, art. 3952; 10.3390/s25133952; A Review of DEtection TRansformer: From Basic Architecture to Advanced Developments and Visual Perception Applications; Liang Yu, Lin Tang, Lisha Mu (all: College of Software Engineering, Sichuan Polytechnic University, Deyang 618000, China); DEtection TRansformer (DETR) introduced an end-to-end object detection paradigm using Transformers, eliminating hand-crafted components like anchor boxes and Non-Maximum Suppression (NMS) via set prediction and bipartite matching. Despite its potential, the original DETR suffered from slow convergence, poor small object detection, and low efficiency, prompting extensive research. This paper systematically reviews DETR’s technical evolution from a “problem-driven” perspective, focusing on advancements in attention mechanisms, query design, training strategies, and architectural efficiency. We also outline DETR’s applications in autonomous driving, medical imaging, and remote sensing, and its expansion to fine-grained classification and video understanding. Finally, we summarize current challenges and future directions. This “problem-driven” analysis offers researchers a comprehensive and insightful overview, aiming to fill gaps in the existing literature on DETR’s evolution and logic.; https://www.mdpi.com/1424-8220/25/13/3952; object detection; DETR; transformer; attention; end to end; deep learning |
| spellingShingle | Liang Yu; Lin Tang; Lisha Mu; A Review of DEtection TRansformer: From Basic Architecture to Advanced Developments and Visual Perception Applications; Sensors; object detection; DETR; transformer; attention; end to end; deep learning |
| title | A Review of DEtection TRansformer: From Basic Architecture to Advanced Developments and Visual Perception Applications |
| title_sort | review of detection transformer from basic architecture to advanced developments and visual perception applications |
| topic | object detection; DETR; transformer; attention; end to end; deep learning |
| url | https://www.mdpi.com/1424-8220/25/13/3952 |