Grounding DINO and distillation-enhanced model for advanced traffic sign detection and classification in autonomous vehicles

Bibliographic Details
Main Authors: Hoang Ngoc Tran, Nam Nhat Ngo Nguyen, Nhi Quynh Phan Le, Thu Anh Ngoc Le, Anh Duy Nguyen
Format: Article
Language: English
Published: Elsevier 2025-04-01
Series: Engineering Science and Technology, an International Journal
Online Access: http://www.sciencedirect.com/science/article/pii/S2215098625000837
Description
Summary: Accurate traffic sign detection is critical for safe autonomous driving. This paper presents a novel approach that integrates GroundingDINO, known for its semantic grounding capabilities, with a self-distilled ResNet to enhance detection performance and real-time feasibility. While GroundingDINO excels at linking object detection with contextual understanding, it struggles to detect small or occluded signs. To address these limitations, we employ a lightweight LB-scSE (Linear Bottleneck Block with Simultaneous Spatial and Channel Squeeze & Excitation) architecture, improving detection accuracy while significantly reducing computational overhead. We evaluate our framework on the custom DINO&GTSRBv1 dataset, where the GroundingDINO Pro model achieves a mAP@50 of 68.52%. The self-distilled network further reduces model size tenfold compared to baseline models (e.g., MobileNetV2, VGG16, ResNet18), yet maintains competitive accuracy, providing a robust, resource-efficient solution for real-time deployment. Our results indicate that integrating semantic grounding with distillation-based compression not only enhances traffic sign detection performance but also delivers a scalable and efficient approach for complex traffic environments. Additionally, our method outperforms standard architectures such as MobileNetV1-2, VGG16-19, and ResNet34-50, demonstrating higher detection accuracy and lower resource consumption, which reinforces its suitability for real-world autonomous driving scenarios.
ISSN: 2215-0986
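
Note on the mAP@50 figure in the summary: under the usual convention, a predicted box counts as a correct detection only when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below illustrates that matching rule only. It is not the paper's evaluation code; the function names and the (x1, y1, x2, y2) box layout are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    The corner-coordinate layout is an assumption for this sketch.
    """
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Hypothetical helper: the '@50' matching rule, IoU >= 0.5."""
    return iou(pred_box, gt_box) >= threshold
```

Average precision is then computed per class from the ranked list of such matches, and mAP is the mean over classes. A small or occluded sign leaves little room for localization error, since a shift of a few pixels can push the IoU below the threshold.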