An Efficient Semantic Segmentation Framework with Attention-Driven Context Enhancement and Dynamic Fusion for Autonomous Driving
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-07-01 |
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/15/8373 |
| Summary: | In recent years, a growing number of real-time semantic segmentation networks have been developed to improve segmentation accuracy. However, these advancements often come at the cost of increased computational complexity, which limits their inference efficiency, particularly in scenarios such as autonomous driving, where strict real-time performance is essential. Achieving an effective balance between speed and accuracy has thus become a central challenge in this field. To address this issue, we present a lightweight semantic segmentation model tailored for the perception requirements of autonomous vehicles. The architecture follows an encoder–decoder paradigm, which not only preserves the capability for deep feature extraction but also facilitates multi-scale information integration. The encoder leverages a high-efficiency backbone, while the decoder introduces a dynamic fusion mechanism designed to enhance information interaction between different feature branches. Recognizing the limitations of convolutional networks in modeling long-range dependencies and capturing global semantic context, the model incorporates an attention-based feature extraction component. This is further augmented by positional encoding, enabling better awareness of spatial structures and local details. The dynamic fusion mechanism employs an adaptive weighting strategy, adjusting the contribution of each feature channel to reduce redundancy and improve representation quality. To validate the effectiveness of the proposed network, experiments were conducted on a single RTX 3090 GPU. The Dynamic Real-time Integrated Vision Encoder–Segmenter Network (DriveSegNet) achieved a mean Intersection over Union (mIoU) of 76.9% and an inference speed of 70.5 FPS on the Cityscapes test dataset, 74.6% mIoU and 139.8 FPS on the CamVid test dataset, and 35.8% mIoU with 108.4 FPS on the ADE20K dataset. The experimental results demonstrate that the proposed method achieves an excellent balance between inference speed, segmentation accuracy, and model size. |
| ISSN: | 2076-3417 |
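The adaptive weighting strategy described in the summary — per-channel weights that adjust each feature branch's contribution before fusion — can be illustrated with a minimal sketch. This is not the authors' DriveSegNet implementation; the function names, the two-branch setup, and the softmax gating are illustrative assumptions about how such a dynamic fusion step might look.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dynamic_fuse(branch_a, branch_b, gate_logits):
    """Fuse two per-channel feature vectors with adaptive weights.

    branch_a, branch_b: per-channel responses from two decoder branches
    (hypothetical names; same length). gate_logits: one (logit_a, logit_b)
    pair per channel; a softmax over each pair yields how much each branch
    contributes to that channel, so redundant channels can be down-weighted.
    """
    fused = []
    for a, b, (la, lb) in zip(branch_a, branch_b, gate_logits):
        wa, wb = softmax([la, lb])
        fused.append(wa * a + wb * b)
    return fused

# Example: channel 0's gate favours branch A; channel 1's gate is balanced,
# so its output is the plain average of the two branches.
out = dynamic_fuse([1.0, 2.0], [3.0, 4.0],
                   [(2.0, 0.0), (0.0, 0.0)])
```

In a real network the gate logits would themselves be predicted from the features (e.g. by a small convolution), so the weighting adapts to the input rather than being fixed.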