Enhanced YOLOv5s Model for Improved Multi-Sized Object Detection in Road Scenes
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11048496/ |
| Summary: | Detecting objects in complex driving environments is crucial for autonomous vehicles to navigate safely. However, this task becomes challenging in the presence of scale variations, occlusions, and diverse backgrounds. This paper proposes an enhanced YOLOv5s model for handling object sizes ranging from small pedestrians and traffic signs to larger vehicles in road scenes. The proposed enhancement begins by refining the default anchor boxes using a percentile-based quantile method on the distribution of the ground-truth bounding boxes, together with adjustments to the convolution layers for improved feature extraction. Smaller kernel sizes and fewer channels are employed in the initial layers to capture fine-grained details, while in deeper layers the number of channels is progressively increased to capture the broader context that better represents larger objects. Furthermore, an efficient channel attention (ECA) mechanism is integrated into the backbone to prioritize key feature channels, thereby improving the model's ability to detect overlapping and small objects. To strengthen feature fusion, a multi-scale BiFPN block is integrated into the neck of the model; it combines fine-grained spatial details from the shallow layers with more abstract semantic information from deeper layers, enabling the detection of objects across varying scales. To mitigate the effect of class imbalance and improve generalization across varying object sizes, CutMix data augmentation is employed during training. Experimental evaluations on the IDD dataset show that the enhanced YOLOv5s model achieves a 48% increase in mean average precision (mAP@0.5) and a 44% and 49% rise in precision and recall, respectively, over the original YOLOv5s, with an inference time of 14.6 ms. These improvements underscore the effectiveness of the proposed enhancements in addressing the challenges of detecting multi-sized objects in complex road environments. |
|---|---|
| ISSN: | 2169-3536 |
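The summary names a percentile-based quantile method for refining the default anchor boxes but does not spell out the exact procedure. The sketch below is one plausible reading, assuming access to the ground-truth box widths and heights of the training set; the function name `quantile_anchors` and its parameters are hypothetical, not the paper's.

```python
import numpy as np

def quantile_anchors(widths, heights, n_anchors=9):
    """Estimate (w, h) anchor pairs by sampling evenly spaced
    quantiles of the ground-truth box width and height distributions.
    Illustrative sketch; the paper's exact method may differ."""
    qs = np.linspace(0.05, 0.95, n_anchors)          # evenly spaced percentiles
    ws = np.quantile(np.asarray(widths, dtype=float), qs)
    hs = np.quantile(np.asarray(heights, dtype=float), qs)
    # pair the i-th width quantile with the i-th height quantile,
    # then order anchors by area (small -> large) as YOLO expects
    return sorted(zip(ws, hs), key=lambda wh: wh[0] * wh[1])

# toy box sizes in pixels, standing in for a real dataset's statistics
rng = np.random.default_rng(0)
anchors = quantile_anchors(rng.uniform(8, 300, 500), rng.uniform(8, 200, 500))
```

Because each dimension's quantiles are taken independently, this covers the size range without running k-means, which is the usual YOLOv5 AutoAnchor approach.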
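The ECA mechanism mentioned in the summary gates feature channels using a global average pool followed by a lightweight 1D convolution across the channel dimension and a sigmoid. A minimal NumPy sketch of that data flow, with placeholder (not learned) convolution weights:

```python
import numpy as np

def eca(feature_map, k=3):
    """Efficient Channel Attention (ECA) forward pass, sketched in NumPy.
    feature_map: array of shape (C, H, W). The 1D conv weights here are
    uniform placeholders; in a real model they would be learned."""
    c = feature_map.shape[0]
    gap = feature_map.mean(axis=(1, 2))            # (C,) channel descriptor
    w = np.full(k, 1.0 / k)                        # placeholder conv kernel
    pad = k // 2
    conv = np.convolve(np.pad(gap, pad, mode="edge"), w, mode="valid")  # (C,)
    gate = 1.0 / (1.0 + np.exp(-conv))             # sigmoid, one gate per channel
    return feature_map * gate[:, None, None]       # rescale channels
```

The point of ECA over squeeze-and-excitation is that the cross-channel interaction costs only `k` parameters instead of two fully connected layers, which is why it can be dropped into the backbone with negligible overhead.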