The Study of Roadside Visual Perception in Internet of Vehicles Based on Improved YOLOv5 and CombineSORT
Objective: Visual inspection is an important technology for roadside perception in vehicle-road cooperation. In practice, however, limited computing resources make it difficult to achieve optimal detection accuracy and computational efficiency at the same time. This paper proposes a new method...
| Main Authors: | LI Xiaohui, YANG Jie, XIA Qin |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Editorial Department of Journal of Sichuan University (Engineering Science Edition), 2025-01-01 |
| Series: | 工程科学与技术 |
| Subjects: | vehicle-road cooperative; roadside perception; Image recognition; YOLOv5; CombineSORT |
| Online Access: | http://jsuese.scu.edu.cn/thesisDetails#10.12454/j.jsuese.202400467 |
| _version_ | 1850067346029281280 |
|---|---|
| author | LI Xiaohui YANG Jie XIA Qin |
| collection | DOAJ |
| description | Objective: Visual inspection is an important technology for roadside perception in vehicle-road cooperation. In practice, however, limited computing resources make it difficult to achieve optimal detection accuracy and computational efficiency at the same time. This paper proposes a new method based on improved YOLOv5 and CombineSORT for image recognition and target tracking, which achieved both good detection results and low time cost in the experiments. Methods: First, Multi-scale Feature Enhancement (MFE) is applied to the FPN of YOLOv5 to extract shallow details of the target. This module is mainly composed of Scale Fusion, CombineFPN and Pixel-Region Attention. To improve convergence and reduce model complexity, an advanced loss function, super-efficient IOU (SEIOU), and network pruning are applied. The loss is calculated from the differences in length, width, and diagonal between the detection and ground-truth boxes, and batch normalization (BN) layer sparsification is applied to filter convolutional channels. Second, a new method, CombineSORT, is presented for multi-target tracking by combining DeepSORT, StrongSORT and Bot-SORT. It adopts the basic framework of DeepSORT and uses the Bot net with ResNet50 as the backbone to extract appearance features. To improve trajectory smoothness, Kalman filtering is replaced by polynomial fitting, while the joint similarity matrix from StrongSORT is used to match targets with trajectories. A series of experiments following the operational process of the algorithm is designed to validate its effectiveness. Using images of real intersections, an ablation test verifies the effectiveness and data volume of each improved module. The algorithm is then compared with other classical algorithms using video streams from intersections with varying traffic volumes, all executed on a mobile edge computer (MEC) with limited computing power. Results and Discussions: In the ablation test, the original YOLOv5 achieved an mAP@90 of 0.894 with 21.2M parameters. Scale Fusion, CombineFPN and Pixel-Region Attention increased the mAP@90 of the original model to 0.91, 0.923 and 0.916, and the parameter count to 24.4M, 25.3M and 24.1M, respectively. The YOLOv5 integrating all three modules achieved an mAP@90 of 0.939 with 31.0M parameters; network pruning then reduced the parameter count to .6M with an mAP@90 of 0.937. Across three groups of real intersection experiments, the average recall rates for groups 1 to 3 were 97.68%, 95.83% and 96.76%, while the multiple object tracking accuracy (MOTA) values were 0.944, 0.890 and 0.910. Among all targets, pedestrians and non-motorized vehicles exhibited relatively poor detection performance. Especially in group , the recall rate and MOTA of pedestrians were 89.98% and 0.75, while those of non-motorized vehicles were as low as 84.5% and 0.675. This is because these two target types are relatively small and do not strictly follow traffic rules, so they are easily occluded and their trajectories are difficult to predict. Additionally, the recall rates of buses and trucks were nearly 3 percentage points lower than that of cars; especially in group , the results were only 94.81% and 94.92%. This is because box trucks and buses have similar appearance features, making misidentification likely from rear perspectives. Comparing the overall performance of different algorithms at the low-volume intersection, the worst result was a recall rate of 96.54% and a MOTA of 0.938, while the best was a recall rate of 97.69% and a MOTA of 0.946. This indicates that most algorithms can achieve good detection results when targets are sparse, and lightweight models may be preferable given their lower demand for computing resources. For high-volume intersections, however, although the lightweight algorithm based on EfficientNet and ByteTrack had the shortest computation delay, its recall rate and MOTA were only 9.75% and 0.817. In contrast, the algorithms applying YOLOv5, YOLOX, YOLOv7 and the paper's improved YOLOv5 achieved recall rates from 95.26% to 96.28%, while the algorithms applying DeepSORT, StrongSORT, Bot-SORT and CombineSORT achieved MOTA values from 0.887 to 0.901. However, most of them had a time cost exceeding 80 ms and therefore could not run in real time. Among the algorithms with a computation time under 80 ms, the paper's algorithm based on improved YOLOv5 and CombineSORT achieved the best detection result, with a recall rate of 96.27% and a MOTA of 0.900, confirming that it can balance detection accuracy and computational efficiency. Conclusions: The paper focuses on traffic target perception from a fixed roadside perspective, and the results demonstrate the effectiveness and accuracy of the proposed algorithm. Compared with other commonly used algorithms, it achieves higher detection accuracy and lower time cost at high-volume intersections, offering good application prospects in vehicle-road collaboration scenarios. For better engineering practice, further research can address recognition and tracking based on continuous images under adverse weather conditions. |
| format | Article |
| id | doaj-art-4ad14fe82d124b0ab90c1316e5fbc929 |
| institution | DOAJ |
| issn | 2096-3246 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Editorial Department of Journal of Sichuan University (Engineering Science Edition) |
| record_format | Article |
| series | 工程科学与技术 |
| title | The Study of Roadside Visual Perception in Internet of Vehicles Based on Improved YOLOv5 and CombineSORT |
| topic | vehicle-road cooperative roadside perception Image recognition YOLOv5 CombineSORT |
| url | http://jsuese.scu.edu.cn/thesisDetails#10.12454/j.jsuese.202400467 |
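The abstract states that CombineSORT replaces Kalman filtering with polynomial fitting to smooth trajectories, but this record does not give the polynomial degree, the history window, or which state variables are fitted. The following is only a minimal sketch of the general idea under assumed choices: a degree-2 least-squares fit over the recent box-center coordinates of one track, extrapolated to the next frame. The function names `fit_polynomial`, `evaluate` and `predict_center` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def fit_polynomial(ts: Sequence[float], ys: Sequence[float], degree: int = 2) -> List[float]:
    """Least-squares fit; returns c such that y ~ c[0] + c[1]*t + c[2]*t^2 + ..."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(t^(i+j)) and b[i] = sum(y * t^i).
    a = [[float(sum(t ** (i + j) for t in ts)) for j in range(n)] for i in range(n)]
    b = [float(sum(y * t ** i for t, y in zip(ts, ys))) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def evaluate(coeffs: Sequence[float], t: float) -> float:
    """Evaluate the fitted polynomial at time t."""
    return sum(c * t ** i for i, c in enumerate(coeffs))


def predict_center(
    history: Sequence[Tuple[float, float, float]], t_next: float, degree: int = 2
) -> Tuple[float, float]:
    """Predict the (cx, cy) box center at frame t_next.

    history holds (frame_index, cx, cy) observations of one track with
    distinct frame indices (an assumed track representation).
    """
    t0 = history[0][0]  # shift time so powers of t stay small
    ts = [h[0] - t0 for h in history]
    deg = min(degree, len(history) - 1)  # avoid an under-determined fit
    cx = fit_polynomial(ts, [h[1] for h in history], deg)
    cy = fit_polynomial(ts, [h[2] for h in history], deg)
    return evaluate(cx, t_next - t0), evaluate(cy, t_next - t0)


# Example track: cx follows t^2 and cy follows 2t over frames 0..3.
track = [(0, 0.0, 0.0), (1, 1.0, 2.0), (2, 4.0, 4.0), (3, 9.0, 6.0)]
next_cx, next_cy = predict_center(track, t_next=4)
```

In a tracker built this way, the predicted center would take the place of the Kalman prediction step before detections are matched to trajectories. A longer window gives smoother but slower-reacting predictions, so the window length and degree would need tuning on real intersection video.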