Enhanced urban driving scene segmentation using a modified UNet with residual convolutions and attention-guided skip connections


Bibliographic Details
Main Authors: Siddhant Arora, Ahaan Banerjee, Nitish Katal
Format: Article
Language: English
Published: Springer, 2025-08-01
Series: Discover Artificial Intelligence
Subjects:
Online Access: https://doi.org/10.1007/s44163-025-00455-x
Description
Summary: Abstract Autonomous vehicles rely heavily on precise scene understanding to ensure safe navigation. These vehicles house an array of sophisticated sensors and advanced technologies, such as computer vision and artificial intelligence, to navigate complex and unpredictable real-world driving scenarios. Semantic segmentation is a primary method that enables AVs to perceive and understand their environment. Because driving scenes are dynamic, with unpredictable movements of other vehicles, pedestrians, cyclists, and animals, these vehicles must observe their environment in real time, which in turn demands highly precise semantic segmentation of the driving scene. In the proposed work, an efficient UNet-inspired architecture, ResAttUNet, is presented: the classical UNet is modified by introducing an attention mechanism in the skip connections and residual connections in each encoder and decoder block, allowing a deeper model. The work evaluates the integration of residual connections and attention gates for segmentation: the residual connections enable deeper models, while the attention gates in the skip connections allow the model to prioritize critical information and enhance overall capability. The evaluation was carried out on the CamVid dataset, where the proposed ResAttUNet offered superior performance over existing models such as FCN, PSPNet, and SegFast-Mobile in both pixel accuracy and intersection over union (IoU). ResAttUNet surpasses existing state-of-the-art models, achieving a pixel-level accuracy of 98.78% and a mean IoU of 0.5321.
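To illustrate the two ingredients the abstract describes, the sketch below implements an additive attention gate on a skip connection and a residual block using plain NumPy, with 1x1 (channel-mixing) convolutions standing in for full convolutional layers. This is a minimal sketch under stated assumptions: all function names, tensor shapes, and weight layouts here are illustrative and are not taken from the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    """1x1 convolution as a per-pixel channel mix.
    x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)

def attention_gate(skip, gate, w_x, w_g, psi):
    """Additive attention gate on a skip connection (assumed form):
    alpha = sigmoid(psi . relu(W_x*skip + W_g*gate)); returns skip * alpha.
    `gate` is the coarser decoder feature map, assumed already resized
    to the skip's spatial resolution."""
    q = np.maximum(conv1x1(skip, w_x) + conv1x1(gate, w_g), 0.0)  # ReLU
    alpha = 1.0 / (1.0 + np.exp(-conv1x1(q, psi)))  # sigmoid map, (1, H, W)
    return skip * alpha  # reweight skip features, broadcast over channels

def residual_block(x, w1, w2):
    """Residual block sketch: out = relu(conv(relu(conv(x))) + x).
    Channel counts must match so the identity shortcut adds cleanly."""
    h = np.maximum(conv1x1(x, w1), 0.0)
    return np.maximum(conv1x1(h, w2) + x, 0.0)

# Toy demonstration with hypothetical shapes: 8-channel 16x16 feature maps.
skip = rng.standard_normal((8, 16, 16))
gate = rng.standard_normal((8, 16, 16))
w_x = rng.standard_normal((4, 8))   # project skip to 4 intermediate channels
w_g = rng.standard_normal((4, 8))   # project gate to 4 intermediate channels
psi = rng.standard_normal((1, 4))   # collapse to a single attention map
gated = attention_gate(skip, gate, w_x, w_g, psi)

w1 = rng.standard_normal((8, 8)) * 0.1
w2 = rng.standard_normal((8, 8)) * 0.1
out = residual_block(gated, w1, w2)
```

The gated skip and the residual output keep the input's shape, which is what lets the decoder concatenate attention-weighted encoder features while the identity shortcuts keep gradients flowing through a deeper stack.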
ISSN:2731-0809