Grounding DINO and distillation-enhanced model for advanced traffic sign detection and classification in autonomous vehicles

Accurate traffic sign detection is critical for safe autonomous driving. This paper presents a novel approach that integrates GroundingDINO, known for its semantic grounding capabilities, with a self-distilled ResNet to enhance detection performance and real-time feasibility. While GroundingDINO exc...

Bibliographic Details
Main Authors: Hoang Ngoc Tran, Nam Nhat Ngo Nguyen, Nhi Quynh Phan Le, Thu Anh Ngoc Le, Anh Duy Nguyen
Format: Article
Language: English
Published: Elsevier 2025-04-01
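Series: Engineering Science and Technology, an International Journal
Subjects: Traffic sign recognition; Lightweight models; Open-set object detection; Self-distillation; Model compression; Grounding DINO
Online Access: http://www.sciencedirect.com/science/article/pii/S2215098625000837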
collection DOAJ
description Accurate traffic sign detection is critical for safe autonomous driving. This paper presents a novel approach that integrates GroundingDINO, known for its semantic grounding capabilities, with a self-distilled ResNet to improve detection performance and real-time feasibility. While GroundingDINO excels at linking object detection with contextual understanding, it struggles to detect small or occluded signs. To address these limitations, we employ a lightweight LB-scSE (Linear Bottleneck Block with Simultaneous Spatial and Channel Squeeze & Excitation) architecture, improving detection accuracy while significantly reducing computational overhead. We evaluate our framework on the custom DINO&GTSRBv1 dataset, where the GroundingDINO Pro model achieves a mAP@50 of 68.52%. The self-distilled network further reduces model size tenfold compared with baseline models (e.g., MobileNetV2, VGG16, ResNet18) yet maintains competitive accuracy, providing a robust, resource-efficient solution for real-time deployment. Our results indicate that integrating semantic grounding with distillation-based compression not only enhances traffic sign detection performance but also delivers a scalable and efficient approach for complex traffic environments. Additionally, our method outperforms standard architectures such as MobileNetV1-2, VGG16-19, and ResNet34-50, achieving higher detection accuracy with lower resource consumption, which reinforces its suitability for real-world autonomous driving scenarios.
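The description reports a mAP@50 score without defining it. By the usual convention, the "@50" means a predicted box counts as a correct detection only when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below illustrates that criterion only. The `(x1, y1, x2, y2)` box layout and the function names are assumptions made for illustration, and this is not the authors' evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) tuples; this layout is an assumption.
    """
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes give no intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches_at_50(pred_box, gt_box, threshold=0.5):
    """True when the prediction overlaps the ground truth enough for mAP@50.

    Hypothetical helper: class-label matching and confidence ranking,
    which a full mAP computation also needs, are omitted here.
    """
    return iou(pred_box, gt_box) >= threshold
```

For example, a prediction shifted by 2 pixels against a 10 by 10 ground-truth box still passes the threshold, while one shifted by 5 pixels does not.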
id doaj-art-c1ebd5fc0084464182fda2c7cac423b4
institution Kabale University
issn 2215-0986
spelling doaj-art-c1ebd5fc0084464182fda2c7cac423b4 2025-08-20T03:42:39Z
volume 64
article_number 102028
doi 10.1016/j.jestch.2025.102028
affiliation Hoang Ngoc Tran: FPT University, Can Tho, 94000, Viet Nam (corresponding author)
Nam Nhat Ngo Nguyen: FPT University, Can Tho, 94000, Viet Nam
Nhi Quynh Phan Le: FPT University, Can Tho, 94000, Viet Nam
Thu Anh Ngoc Le: FPT University, Can Tho, 94000, Viet Nam
Anh Duy Nguyen: Department of Mechatronic Engineering, Faculty of Mechanical Engineering, Ho Chi Minh City University of Technology (HCMUT), Vietnam National University, Ho Chi Minh City, Viet Nam (corresponding author)
topic Traffic sign recognition
Lightweight models
Open-set object detection
Self-distillation
Model compression
Grounding DINO