Grounding DINO and distillation-enhanced model for advanced traffic sign detection and classification in autonomous vehicles
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-04-01 |
| Series: | Engineering Science and Technology, an International Journal |
| Subjects: | |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2215098625000837 |
| Summary: | Accurate traffic sign detection is critical for safe autonomous driving. This paper presents a novel approach that integrates GroundingDINO, known for its semantic grounding capabilities, with a self-distilled ResNet to enhance detection performance and real-time feasibility. While GroundingDINO excels in linking object detection with contextual understanding, it faces challenges when detecting small or occluded signs. To address these limitations, we employ a lightweight LB-scSE (Linear Bottleneck Block with Simultaneous Spatial and Channel Squeeze & Excitation) architecture, thereby improving detection accuracy while significantly reducing computational overhead. We evaluate our framework on the custom DINO>SRBv1 dataset, where the GroundingDINO Pro model achieves a mAP@50 of 68.52%. The self-distilled network further reduces model size tenfold compared to baseline models (e.g., MobileNetV2, VGG16, ResNet18), yet maintains competitive accuracy, providing a robust, resource-efficient solution for real-time deployment. Our results indicate that integrating semantic grounding with distillation-based compression not only enhances traffic sign detection performance but also delivers a scalable and efficient approach for complex traffic environments. Additionally, our method outperforms standard architectures such as MobileNetV1-2, VGG16-19, and ResNet34-50, demonstrating higher detection accuracy and lower resource consumption, thus reinforcing its suitability for real-world autonomous driving scenarios. |
|---|---|
| ISSN: | 2215-0986 |