Grounding DINO and distillation-enhanced model for advanced traffic sign detection and classification in autonomous vehicles
Accurate traffic sign detection is critical for safe autonomous driving. This paper presents a novel approach that integrates GroundingDINO, known for its semantic grounding capabilities, with a self-distilled ResNet to enhance detection performance and real-time feasibility. While GroundingDINO exc...
| Main Authors: | Hoang Ngoc Tran, Nam Nhat Ngo Nguyen, Nhi Quynh Phan Le, Thu Anh Ngoc Le, Anh Duy Nguyen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-04-01 |
| Series: | Engineering Science and Technology, an International Journal |
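| Subjects: | Traffic sign recognition; Lightweight models; Open-set object detection; Self-distillation; Model compression; Grounding DINO |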
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2215098625000837 |
| _version_ | 1849344510004297728 |
|---|---|
| author | Hoang Ngoc Tran; Nam Nhat Ngo Nguyen; Nhi Quynh Phan Le; Thu Anh Ngoc Le; Anh Duy Nguyen |
| author_facet | Hoang Ngoc Tran; Nam Nhat Ngo Nguyen; Nhi Quynh Phan Le; Thu Anh Ngoc Le; Anh Duy Nguyen |
| author_sort | Hoang Ngoc Tran |
| collection | DOAJ |
| description | Accurate traffic sign detection is critical for safe autonomous driving. This paper presents a novel approach that integrates GroundingDINO, known for its semantic grounding capabilities, with a self-distilled ResNet to enhance detection performance and real-time feasibility. While GroundingDINO excels in linking object detection with contextual understanding, it faces challenges when detecting small or occluded signs. To address these limitations, we employ a lightweight LB-scSE (Linear Bottleneck Block with Simultaneous Spatial and Channel Squeeze & Excitation) architecture, thereby improving detection accuracy while significantly reducing computational overhead. We evaluate our framework on the custom DINO>SRBv1 dataset, where the GroundingDINO Pro model achieves a mAP@50 of 68.52%. The self-distilled network further reduces model size tenfold compared to baseline models (e.g., MobileNetV2, VGG16, ResNet18), yet maintains competitive accuracy, providing a robust, resource-efficient solution for real-time deployment. Our results indicate that integrating semantic grounding with distillation-based compression not only enhances traffic sign detection performance but also delivers a scalable and efficient approach for complex traffic environments. Additionally, our method outperforms standard architectures such as MobileNetV1-2, VGG16-19, and ResNet34-50, demonstrating higher detection accuracy and lower resource consumption, thus reinforcing its suitability for real-world autonomous driving scenarios. |
| format | Article |
| id | doaj-art-c1ebd5fc0084464182fda2c7cac423b4 |
| institution | Kabale University |
| issn | 2215-0986 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Engineering Science and Technology, an International Journal |
| spelling | doaj-art-c1ebd5fc0084464182fda2c7cac423b4; 2025-08-20T03:42:39Z; eng; Elsevier; Engineering Science and Technology, an International Journal; 2215-0986; 2025-04-01; 64; 102028; 10.1016/j.jestch.2025.102028; Grounding DINO and distillation-enhanced model for advanced traffic sign detection and classification in autonomous vehicles; Hoang Ngoc Tran (FPT University, Can Tho, 94000, Viet Nam, corresponding author); Nam Nhat Ngo Nguyen (FPT University, Can Tho, 94000, Viet Nam); Nhi Quynh Phan Le (FPT University, Can Tho, 94000, Viet Nam); Thu Anh Ngoc Le (FPT University, Can Tho, 94000, Viet Nam); Anh Duy Nguyen (Department of Mechatronic Engineering, Faculty of Mechanical Engineering, Ho Chi Minh City University of Technology (HCMUT), Vietnam National University, Ho Chi Minh City, Viet Nam, corresponding author); Accurate traffic sign detection is critical for safe autonomous driving. This paper presents a novel approach that integrates GroundingDINO, known for its semantic grounding capabilities, with a self-distilled ResNet to enhance detection performance and real-time feasibility. While GroundingDINO excels in linking object detection with contextual understanding, it faces challenges when detecting small or occluded signs. To address these limitations, we employ a lightweight LB-scSE (Linear Bottleneck Block with Simultaneous Spatial and Channel Squeeze & Excitation) architecture, thereby improving detection accuracy while significantly reducing computational overhead. We evaluate our framework on the custom DINO>SRBv1 dataset, where the GroundingDINO Pro model achieves a mAP@50 of 68.52%. The self-distilled network further reduces model size tenfold compared to baseline models (e.g., MobileNetV2, VGG16, ResNet18), yet maintains competitive accuracy, providing a robust, resource-efficient solution for real-time deployment. Our results indicate that integrating semantic grounding with distillation-based compression not only enhances traffic sign detection performance but also delivers a scalable and efficient approach for complex traffic environments. Additionally, our method outperforms standard architectures such as MobileNetV1-2, VGG16-19, and ResNet34-50, demonstrating higher detection accuracy and lower resource consumption, thus reinforcing its suitability for real-world autonomous driving scenarios.; http://www.sciencedirect.com/science/article/pii/S2215098625000837; Traffic sign recognition; Lightweight models; Open-set object detection; Self-distillation; Model compression; Grounding DINO |
| spellingShingle | Hoang Ngoc Tran; Nam Nhat Ngo Nguyen; Nhi Quynh Phan Le; Thu Anh Ngoc Le; Anh Duy Nguyen; Grounding DINO and distillation-enhanced model for advanced traffic sign detection and classification in autonomous vehicles; Engineering Science and Technology, an International Journal; Traffic sign recognition; Lightweight models; Open-set object detection; Self-distillation; Model compression; Grounding DINO |
| title | Grounding DINO and distillation-enhanced model for advanced traffic sign detection and classification in autonomous vehicles |
| title_full | Grounding DINO and distillation-enhanced model for advanced traffic sign detection and classification in autonomous vehicles |
| title_fullStr | Grounding DINO and distillation-enhanced model for advanced traffic sign detection and classification in autonomous vehicles |
| title_full_unstemmed | Grounding DINO and distillation-enhanced model for advanced traffic sign detection and classification in autonomous vehicles |
| title_short | Grounding DINO and distillation-enhanced model for advanced traffic sign detection and classification in autonomous vehicles |
| title_sort | grounding dino and distillation enhanced model for advanced traffic sign detection and classification in autonomous vehicles |
| topic | Traffic sign recognition; Lightweight models; Open-set object detection; Self-distillation; Model compression; Grounding DINO |
| url | http://www.sciencedirect.com/science/article/pii/S2215098625000837 |
| work_keys_str_mv | AT hoangngoctran groundingdinoanddistillationenhancedmodelforadvancedtrafficsigndetectionandclassificationinautonomousvehicles AT namnhatngonguyen groundingdinoanddistillationenhancedmodelforadvancedtrafficsigndetectionandclassificationinautonomousvehicles AT nhiquynhphanle groundingdinoanddistillationenhancedmodelforadvancedtrafficsigndetectionandclassificationinautonomousvehicles AT thuanhngocle groundingdinoanddistillationenhancedmodelforadvancedtrafficsigndetectionandclassificationinautonomousvehicles AT anhduynguyen groundingdinoanddistillationenhancedmodelforadvancedtrafficsigndetectionandclassificationinautonomousvehicles |
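
The description reports a mAP@50 figure. The "@50" conventionally means that a predicted box counts as a correct detection only when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below illustrates that matching criterion only; it is not taken from the article, and the function names `iou` and `is_true_positive`, the `(x1, y1, x2, y2)` box layout, and the sample coordinates are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) tuples with x1 < x2 and y1 < y2
    (an assumed layout, not one specified by the article).
    """
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes yield no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, truth_box, threshold=0.5):
    """Apply the IoU >= 0.5 criterion implied by the '@50' suffix."""
    return iou(pred_box, truth_box) >= threshold


# Hypothetical boxes: a ground-truth sign and two candidate predictions.
truth = (0, 0, 10, 10)
close_prediction = (2, 0, 12, 10)   # overlaps most of the sign
loose_prediction = (5, 0, 15, 10)   # overlaps only half of it

print(is_true_positive(close_prediction, truth))
print(is_true_positive(loose_prediction, truth))
```

A full mAP computation additionally ranks predictions by confidence and averages precision over recall levels and classes; this sketch covers only the per-box matching step.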