Enhanced urban driving scene segmentation using modified UNet with residual convolutions and attention guided skip connections

Bibliographic Details
Main Authors: Siddhant Arora, Ahaan Banerjee, Nitish Katal
Format: Article
Language: English
Published: Springer 2025-08-01
Series: Discover Artificial Intelligence
Online Access:https://doi.org/10.1007/s44163-025-00455-x
Collection: DOAJ

Abstract

Autonomous vehicles (AVs) rely heavily on precise scene understanding to navigate safely. They carry an array of sophisticated sensors and use technologies such as computer vision and artificial intelligence to handle complex, unpredictable real-world driving scenarios. Semantic segmentation is a primary method that enables AVs to perceive and understand their environment. Because driving scenes are dynamic, with unpredictable movements of other vehicles, pedestrians, cyclists, and animals, AVs must observe their surroundings in real time, and the semantic segmentation of these scenes must be highly precise.

This work proposes ResAttUNet, an efficient UNet-inspired architecture that modifies the classical UNet in two ways: residual connections are added to each encoder and decoder block so that a deeper model can be built, and attention gates are introduced in the skip connections. The work evaluates how these two components combine for segmentation: the residual connections make a deeper model feasible, while the attention gates in the skip connections let the model prioritize the most relevant features and improve its overall performance. Evaluated on the CamVid dataset, ResAttUNet outperformed existing state-of-the-art models such as FCN, PSPNet, and SegFast-Mobile in both accuracy and intersection over union (IoU), achieving a pixel-level accuracy of 98.78% and a mean IoU of 0.5321.
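
The residual connections added to each encoder and decoder block can be illustrated with a toy sketch. The abstract does not describe the block layout, so this assumes a common pattern, y = ReLU(F(x) + x), where F is a 3×3 convolution, a ReLU, and a second 3×3 convolution, applied to a single-channel image stored as nested Python lists. The helper names (`conv2d_same`, `relu`, `residual_block`) are hypothetical and are not taken from the paper.

```python
from typing import List

Image = List[List[float]]
Kernel = List[List[float]]


def conv2d_same(img: Image, kernel: Kernel) -> Image:
    """Zero-padded 'same' 2-D cross-correlation, as computed by CNN conv layers."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    r, c = i + di - ph, j + dj - pw
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside the image
                        acc += kernel[di][dj] * img[r][c]
            out[i][j] = acc
    return out


def relu(img: Image) -> Image:
    return [[max(0.0, v) for v in row] for row in img]


def residual_block(x: Image, k1: Kernel, k2: Kernel) -> Image:
    """Hypothetical residual block layout: y = ReLU(F(x) + x), F = conv -> ReLU -> conv."""
    f = conv2d_same(relu(conv2d_same(x, k1)), k2)
    # Identity shortcut: add the block input back onto the residual branch.
    summed = [[fv + xv for fv, xv in zip(frow, xrow)] for frow, xrow in zip(f, x)]
    return relu(summed)


if __name__ == "__main__":
    x = [[1.0, -2.0], [3.0, -4.0]]
    zero = [[0.0] * 3 for _ in range(3)]
    # With an all-zero residual branch the block reduces to ReLU(x): the shortcut
    # still carries the signal even when F(x) contributes nothing.
    print(residual_block(x, zero, zero))  # [[1.0, 0.0], [3.0, 0.0]]
```

Real encoder and decoder blocks work on multi-channel feature maps and typically add batch normalization; when a block changes the number of channels, a 1×1 convolution usually replaces the identity shortcut so that the two terms can be added.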
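
The attention-gated skip connections can likewise be sketched. The abstract does not give the gate's equations, so this assumes the additive attention gate of Attention U-Net (Oktay et al.), in which each skip-connection feature vector x_i is scaled by a coefficient α_i = σ(ψᵀ ReLU(W_x x_i + W_g g_i + b_g) + b_ψ), computed from x_i and a gating signal g_i taken from the coarser decoder level. Features are stored per pixel as plain lists, the gating signal is assumed to be already resampled to the skip connection's resolution, and all names and weights are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def attention_gate(
    skip: List[Vector],  # x_i: skip-connection features, one C_x-vector per pixel
    gate: List[Vector],  # g_i: gating features, one C_g-vector per pixel
    w_x: Matrix,         # F_int x C_x
    w_g: Matrix,         # F_int x C_g
    psi: Vector,         # length F_int
    b_g: Optional[Vector] = None,
    b_psi: float = 0.0,
) -> Tuple[List[Vector], List[float]]:
    """Additive attention gate (Attention U-Net style), applied pixel by pixel.

    Returns the gated skip features alpha_i * x_i and the coefficients alpha_i.
    """
    if b_g is None:
        b_g = [0.0] * len(psi)
    gated, alphas = [], []
    for x_i, g_i in zip(skip, gate):
        # Intermediate representation: ReLU(W_x x_i + W_g g_i + b_g)
        hidden = [max(0.0, a + b + c)
                  for a, b, c in zip(matvec(w_x, x_i), matvec(w_g, g_i), b_g)]
        # One scalar coefficient in [0, 1] per pixel, shared by all its channels
        alpha = sigmoid(sum(p * h for p, h in zip(psi, hidden)) + b_psi)
        alphas.append(alpha)
        gated.append([alpha * v for v in x_i])
    return gated, alphas


if __name__ == "__main__":
    skip = [[1.0, 0.0], [0.0, 0.0]]  # two pixels, C_x = 2
    gate = [[1.0], [0.0]]            # C_g = 1
    gated, alphas = attention_gate(skip, gate, w_x=[[1.0, 0.0]], w_g=[[1.0]], psi=[10.0])
    # The first pixel, where skip and gating signals agree, gets a coefficient
    # close to 1.0; the second pixel gets exactly 0.5.
    print(alphas)
```

In Attention U-Net, W_x and W_g are implemented as 1×1 convolutions over whole feature maps; the per-pixel loop above writes out the same computation explicitly.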
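
The two reported metrics, pixel accuracy and mean IoU, can both be computed from a confusion matrix over labelled pixels. The sketch below assumes dataset-level accumulation over flattened label maps and excludes from the mean any class that appears in neither the ground truth nor the prediction; the abstract does not state which conventions were used (for example, per-image averaging or the handling of CamVid's void label), so the function names and conventions here are illustrative. The toy example also shows one plausible reason why a pixel accuracy near 99% can coexist with a mean IoU near 0.53: pixel accuracy is dominated by classes that cover many pixels, whereas mean IoU weights every class equally, so poorly segmented small classes pull it down.

```python
from typing import List, Sequence

ConfMat = List[List[int]]


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int) -> ConfMat:
    """cm[t][p] counts pixels with ground-truth class t predicted as class p."""
    if len(gt) != len(pred):
        raise ValueError("label maps must have the same number of pixels")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(gt, pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm: ConfMat) -> float:
    """Fraction of all pixels whose predicted class matches the ground truth."""
    correct = sum(cm[c][c] for c in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm: ConfMat) -> float:
    """Mean over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        if tp + fp + fn > 0:                       # skip classes absent from both maps
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Flattened 10-pixel label maps: class 0 is large, class 1 is small.
    gt = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    cm = confusion_matrix(gt, pred, num_classes=2)
    print(pixel_accuracy(cm))  # 0.9
    print(mean_iou(cm))        # (8/9 + 1/2) / 2, about 0.694
```

Missing one pixel of the small class costs only 10 points of pixel accuracy but halves that class's IoU, which is why the mean IoU falls further.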

ISSN: 2731-0809
Author affiliations: Siddhant Arora and Ahaan Banerjee, School of Computer Science and Engineering, Vellore Institute of Technology; Nitish Katal, School of Electronics Engineering, Vellore Institute of Technology
Subjects: Attention gate; Autonomous driving; Residual learning; Semantic segmentation; UNet; Convolutional neural networks (CNNs)