Dual-stream detection and segmentation framework for vision-based unmanned ground vehicle pothole perception on unstructured roads
| Main Authors: | , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-08-01 |
| Series: | Journal of King Saud University: Computer and Information Sciences |
| Subjects: | |
| Online Access: | https://doi.org/10.1007/s44443-025-00236-7 |
| Summary: | Reliable perception of road-surface damage is essential for the safe, autonomous operation of Unmanned Ground Vehicles (UGVs) on unstructured roads, where irregular textures, blurred boundaries, and environmental interference are common. To overcome the limitations of existing methods, a dual-stream detection–segmentation framework is presented in which object-level localization and pixel-level boundary extraction are decoupled and independently optimized. The detection stream adopts an enhanced YOLOv10+ network equipped with a frequency-aware fusion module (FreqFusion) in the neck to improve semantic–spatial alignment and robustness to texture variation. The segmentation stream introduces GAL-DeepLabv3+, which integrates a Dense Atrous Spatial Pyramid Pooling (DenseASPP) module and a Graph Attention Layer (GAL) into the standard DeepLabv3+ architecture, thereby enhancing contextual reasoning and boundary refinement. Extensive experiments are conducted on a self-constructed dataset of 3,000 annotated images of unstructured roads. Quantitatively, the proposed framework achieves an F1-score of 93.0% and a recall of 93.2% in detection, and an IoU of 92.5% and an F1-score of 96.1% in segmentation. Ablation studies and environmental-condition tests further confirm the effectiveness of its components and its real-world applicability. Compared with state-of-the-art baselines such as YOLOv8 and DeepLabv3+, the proposed method improves detection F1-score by 5.1% and segmentation IoU by 3.4%, highlighting its superior performance and practical value in complex terrain-perception scenarios. |
| ISSN: | 1319-1578; 2213-1248 |
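
For orientation, below is a minimal PyTorch-style sketch of the decoupled dual-stream design described in the summary, paired with a DenseASPP-style module of the kind the segmentation stream integrates. All class names, channel sizes, and dilation rates are illustrative assumptions; this record does not include the authors' actual FreqFusion, DenseASPP, or GAL implementations.

```python
import torch
import torch.nn as nn


class DenseASPP(nn.Module):
    """Dense Atrous Spatial Pyramid Pooling sketch: each atrous branch sees
    the concatenation of the input and all previous branch outputs."""

    def __init__(self, in_ch: int, branch_ch: int = 64, rates=(3, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList()
        ch = in_ch
        for r in rates:
            self.branches.append(nn.Sequential(
                nn.Conv2d(ch, branch_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(branch_ch),
                nn.ReLU(inplace=True),
            ))
            ch += branch_ch  # dense connectivity: the next branch's input grows
        self.project = nn.Conv2d(ch, in_ch, 1)  # fuse back to in_ch channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [x]
        for branch in self.branches:
            feats.append(branch(torch.cat(feats, dim=1)))
        return self.project(torch.cat(feats, dim=1))


class DualStreamPotholePerception(nn.Module):
    """Decoupled dual-stream wrapper: the detector and segmenter share no
    weights, so each stream can be trained and tuned independently."""

    def __init__(self, detector: nn.Module, segmenter: nn.Module):
        super().__init__()
        self.detector = detector    # e.g. a YOLO-style box predictor
        self.segmenter = segmenter  # e.g. a DeepLabv3+-style mask predictor

    @torch.no_grad()
    def forward(self, image: torch.Tensor):
        boxes = self.detector(image)   # object-level pothole localization
        masks = self.segmenter(image)  # pixel-level boundary extraction
        return boxes, masks
```

The design point the sketch captures is the decoupling itself: because the two streams are optimized with separate losses on separate heads, a weakness in pixel-level boundary extraction cannot degrade object-level localization, and vice versa.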