A Benchmark Review of YOLO Algorithm Developments for Object Detection
| Main Authors: | , , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11072404/ |
| Summary: | You Only Look Once (YOLO) has established itself as a prominent object detection framework due to its excellent balance between speed and accuracy. This article provides a thorough review of the YOLO series, from YOLOv1 to YOLOv10, including YOLOX, emphasizing their architectural advancements, loss function improvements, and performance enhancements. We have benchmarked the officially released versions from YOLOv3 to YOLOv10, as well as YOLOX, using the widely recognized VOC07+12 and COCO2017 datasets on diverse hardware platforms: NVIDIA GTX Titan X, RTX 3060, and Tesla V100. The benchmark provides significant insights, such as YOLOv9-E achieving the highest mean average precision (mAP) of 76.0% on VOC07+12 and also showing superior detection accuracy on COCO2017 with an mAP of 56.6%, which is 1.2% higher than that of the latest YOLOv10-X. YOLOv9-E stands out for its superior detection accuracy, making it well suited to tasks that demand high accuracy, such as medical image analysis, while lightweight versions such as YOLOv5-S, YOLOv7-S, YOLOv8-S, and YOLOv10-S offer a strong balance of accuracy and speed, making them ideal for real-time applications; among these lightweight models, YOLOv7-S achieves the highest mAP. Inference benchmarks highlight lightweight YOLO models such as YOLOv10-S for their exceptional inference speed on all GPUs, and training-time results indicate that YOLOv9-E takes the longest to converge among all versions on both datasets. This study provides researchers and developers with strategies for choosing appropriate YOLO models based on accuracy requirements, resource availability, and application-specific needs. |
| ISSN: | 2169-3536 |
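
As a companion to the summary above, the following is a minimal sketch of how an accuracy benchmark of this kind might be reproduced with the Ultralytics YOLO package. It is not the article's benchmark code; the package, the `yolov8s.pt` checkpoint, and the `coco.yaml` dataset config are assumptions for illustration, not details taken from the record.

```python
# Minimal sketch (not the article's benchmark code): evaluate a pretrained YOLO
# model on a COCO-style dataset and report mAP, assuming the Ultralytics package
# is installed (`pip install ultralytics`) and the dataset YAML is available.
from ultralytics import YOLO

# "yolov8s.pt" is an assumed lightweight checkpoint; any comparable small model
# from the versions compared in the article could be substituted here.
model = YOLO("yolov8s.pt")

# Run validation on the dataset described by a YAML config such as "coco.yaml";
# Ultralytics resolves the dataset paths listed in that file.
metrics = model.val(data="coco.yaml")

# metrics.box.map is mAP@0.5:0.95 (the COCO-style metric);
# metrics.box.map50 is mAP@0.5 (the VOC-style metric).
print(f"mAP@0.5:0.95 = {metrics.box.map:.3f}")
print(f"mAP@0.5      = {metrics.box.map50:.3f}")
```

The same results object also reports per-image timing for preprocessing, inference, and postprocessing, which is one rough way to approximate the speed axis of such a comparison on a given GPU.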