Enhanced urban driving scene segmentation using modified UNet with residual convolutions and attention-guided skip connections
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-08-01 |
| Series: | Discover Artificial Intelligence |
| Subjects: | |
| Online Access: | https://doi.org/10.1007/s44163-025-00455-x |
| Summary: | Abstract Autonomous vehicles rely heavily on precise scene understanding for safe navigation. These vehicles house an array of sophisticated sensors and advanced technologies, such as computer vision and artificial intelligence, to navigate complex and unpredictable real-world driving scenarios. Semantic segmentation is a primary method that enables AVs to perceive and understand their environment. Because driving scenes are dynamic, with unpredictable movements of other vehicles, pedestrians, cyclists, and animals, these vehicles must observe their surroundings in real time, which demands highly precise semantic segmentation of the driving scenes. This work proposes an efficient UNet-inspired architecture, ResAttUNet, in which the classical UNet is modified by introducing an attention mechanism in the skip connections and residual connections in each encoder and decoder block to build a deeper model. The work evaluates the integration of the residual connections and the attention gates for segmentation: the residual connections enable deeper models, while the attention gates in the skip connections allow the model to prioritize critical information and enhance its overall capability. Evaluation on the CamVid dataset shows that the proposed ResAttUNet outperforms existing models such as FCN, PSPNet, and SegFast-Mobile in accuracy and intersection over union (IoU). ResAttUNet surpasses existing state-of-the-art models, achieving a pixel-level accuracy of 98.78% and a mean IoU of 0.5321. |
| ISSN: | 2731-0809 |
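The abstract describes two architectural changes to the classical UNet: residual convolution blocks in the encoder/decoder and attention gates on the skip connections. The following is a minimal PyTorch sketch of how such components are typically wired together; the module names, channel sizes, and additive gating form (as in attention-UNet-style models) are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal, illustrative sketch (not the authors' code): a residual convolution
# block and an additive attention gate on a UNet skip connection, in PyTorch.
# All layer sizes, names, and hyperparameters below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualConvBlock(nn.Module):
    """Two 3x3 convolutions with a residual (identity or 1x1) shortcut."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection so the shortcut matches the output channel count
        self.shortcut = (nn.Identity() if in_ch == out_ch
                         else nn.Conv2d(in_ch, out_ch, 1))

    def forward(self, x):
        return F.relu(self.conv(x) + self.shortcut(x))

class AttentionGate(nn.Module):
    """Additive attention gate: the decoder feature g gates the encoder skip x."""
    def __init__(self, g_ch, x_ch, inter_ch):
        super().__init__()
        self.w_g = nn.Conv2d(g_ch, inter_ch, 1)
        self.w_x = nn.Conv2d(x_ch, inter_ch, 1)
        self.psi = nn.Conv2d(inter_ch, 1, 1)

    def forward(self, g, x):
        # g and x are assumed to already share the same spatial size here
        alpha = torch.sigmoid(self.psi(F.relu(self.w_g(g) + self.w_x(x))))
        return x * alpha  # suppress irrelevant skip activations

# Usage sketch: gate an encoder skip feature before concatenating it with the
# upsampled decoder feature, then fuse with a residual block.
enc = torch.randn(1, 64, 128, 128)   # encoder feature map (skip connection)
dec = torch.randn(1, 64, 128, 128)   # upsampled decoder feature map
gated = AttentionGate(64, 64, 32)(dec, enc)
fused = ResidualConvBlock(128, 64)(torch.cat([dec, gated], dim=1))
print(fused.shape)  # torch.Size([1, 64, 128, 128])
```

The design intuition follows the abstract: the shortcut in `ResidualConvBlock` keeps gradients flowing so the encoder/decoder can be made deeper, while the attention gate reweights skip features so that only the activations relevant to the decoder's current context are passed across the UNet.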