A Review of DEtection TRansformer: From Basic Architecture to Advanced Developments and Visual Perception Applications

DEtection TRansformer (DETR) introduced an end-to-end object detection paradigm using Transformers, eliminating hand-crafted components like anchor boxes and Non-Maximum Suppression (NMS) via set prediction and bipartite matching. Despite its potential, the original DETR suffered from slow convergen...

Bibliographic Details
Main Authors: Liang Yu, Lin Tang, Lisha Mu
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Sensors
Subjects: object detection; DETR; transformer; attention; end to end; deep learning
Online Access: https://www.mdpi.com/1424-8220/25/13/3952
_version_ 1849427682441297920
author Liang Yu
Lin Tang
Lisha Mu
author_facet Liang Yu
Lin Tang
Lisha Mu
author_sort Liang Yu
collection DOAJ
description DEtection TRansformer (DETR) introduced an end-to-end object detection paradigm using Transformers, eliminating hand-crafted components like anchor boxes and Non-Maximum Suppression (NMS) via set prediction and bipartite matching. Despite its potential, the original DETR suffered from slow convergence, poor small object detection, and low efficiency, prompting extensive research. This paper systematically reviews DETR’s technical evolution from a “problem-driven” perspective, focusing on advancements in attention mechanisms, query design, training strategies, and architectural efficiency. We also outline DETR’s applications in autonomous driving, medical imaging, and remote sensing, and its expansion to fine-grained classification and video understanding. Finally, we summarize current challenges and future directions. This “problem-driven” analysis offers researchers a comprehensive and insightful overview, aiming to fill gaps in the existing literature on DETR’s evolution and logic.
format Article
id doaj-art-3fb707aabbe14cbeb1ad092ba318d99d
institution Kabale University
issn 1424-8220
language English
publishDate 2025-06-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj-art-3fb707aabbe14cbeb1ad092ba318d99d; 2025-08-20T03:28:58Z; eng; MDPI AG; Sensors; 1424-8220; 2025-06-01; 25; 13; 3952; 10.3390/s25133952; A Review of DEtection TRansformer: From Basic Architecture to Advanced Developments and Visual Perception Applications; Liang Yu (0); Lin Tang (1); Lisha Mu (2); (0) College of Software Engineering, Sichuan Polytechnic University, Deyang 618000, China; (1) College of Software Engineering, Sichuan Polytechnic University, Deyang 618000, China; (2) College of Software Engineering, Sichuan Polytechnic University, Deyang 618000, China; DEtection TRansformer (DETR) introduced an end-to-end object detection paradigm using Transformers, eliminating hand-crafted components like anchor boxes and Non-Maximum Suppression (NMS) via set prediction and bipartite matching. Despite its potential, the original DETR suffered from slow convergence, poor small object detection, and low efficiency, prompting extensive research. This paper systematically reviews DETR’s technical evolution from a “problem-driven” perspective, focusing on advancements in attention mechanisms, query design, training strategies, and architectural efficiency. We also outline DETR’s applications in autonomous driving, medical imaging, and remote sensing, and its expansion to fine-grained classification and video understanding. Finally, we summarize current challenges and future directions. This “problem-driven” analysis offers researchers a comprehensive and insightful overview, aiming to fill gaps in the existing literature on DETR’s evolution and logic.; https://www.mdpi.com/1424-8220/25/13/3952; object detection; DETR; transformer; attention; end to end; deep learning
spellingShingle Liang Yu
Lin Tang
Lisha Mu
A Review of DEtection TRansformer: From Basic Architecture to Advanced Developments and Visual Perception Applications
Sensors
object detection
DETR
transformer
attention
end to end
deep learning
title A Review of DEtection TRansformer: From Basic Architecture to Advanced Developments and Visual Perception Applications
title_full A Review of DEtection TRansformer: From Basic Architecture to Advanced Developments and Visual Perception Applications
title_fullStr A Review of DEtection TRansformer: From Basic Architecture to Advanced Developments and Visual Perception Applications
title_full_unstemmed A Review of DEtection TRansformer: From Basic Architecture to Advanced Developments and Visual Perception Applications
title_short A Review of DEtection TRansformer: From Basic Architecture to Advanced Developments and Visual Perception Applications
title_sort review of detection transformer from basic architecture to advanced developments and visual perception applications
topic object detection
DETR
transformer
attention
end to end
deep learning
url https://www.mdpi.com/1424-8220/25/13/3952