Discriminative Cross-Modal Attention Approach for RGB-D Semantic Segmentation

Scene understanding through semantic segmentation is a vital component for autonomous vehicles. Given the importance of safety in autonomous driving, existing methods are constantly striving to improve accuracy and reduce error. RGB-based semantic segmentation models typically underperform due to information loss in challenging situations such as lighting variations and limitations in distinguishing occluded objects of similar appearance. Therefore, recent studies have developed RGB-D semantic segmentation methods by employing attention-based fusion modules. Existing fusion modules typically combine cross-modal features by focusing on each modality independently, which limits their ability to capture the complementary nature of modalities. To address this issue, we propose a simple yet effective module called the Discriminative Cross-modal Attention Fusion (DCMAF) module. Specifically, the proposed module performs cross-modal discrimination using element-wise subtraction in an attention-based approach. By integrating the DCMAF module with efficient channel- and spatial-wise attention modules, we introduce the Discriminative Cross-modal Network (DCMNet), a scale- and appearance-invariant model. Extensive experiments demonstrate significant improvements, particularly in predicting small and fine objects, achieving an mIoU of 77.39% on the CamVid dataset, outperforming state-of-the-art RGB-based methods, and a remarkable mIoU of 82.8% on the Cityscapes dataset. As the CamVid dataset lacks depth information, we employ the DPT monocular depth estimation model to generate depth images.

Bibliographic Details
Main Authors: Emad Mousavian, Danial Qashqai, Shahriar B. Shokouhi
Format: Article
Language: English
Published: Ferdowsi University of Mashhad, 2025-04-01
Series: Computer and Knowledge Engineering
Subjects: attention mechanism, autonomous driving, deep learning, RGB-D semantic segmentation
Author Affiliations: Department of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
ISSN: 2538-5453, 2717-4123
DOI: 10.22067/cke.2025.88682.1117
Online Access: https://cke.um.ac.ir/article_46516_bbfb88302877289ce4d9c04dd311ac60.pdf
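Note: The abstract describes cross-modal discrimination via element-wise subtraction inside an attention-based fusion block. As a rough, illustrative sketch of that general idea only (not the authors' actual DCMAF module; the class name, layer sizes, and gating scheme below are all assumptions), a PyTorch-style fusion block might look like the following.

    # Hypothetical sketch: attention fusion driven by the RGB-depth difference.
    # All names and design choices here are assumptions, not the paper's DCMAF.
    import torch
    import torch.nn as nn

    class DiffAttentionFusion(nn.Module):
        def __init__(self, channels: int, reduction: int = 4):
            super().__init__()
            # Channel attention computed from the cross-modal difference signal.
            self.channel_gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, kernel_size=1),
                nn.Sigmoid(),
            )
            # Spatial attention computed from the same difference signal.
            self.spatial_gate = nn.Sequential(
                nn.Conv2d(channels, 1, kernel_size=7, padding=3),
                nn.Sigmoid(),
            )

        def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
            diff = rgb - depth                    # element-wise cross-modal discrimination
            ca = self.channel_gate(diff)          # (B, C, 1, 1) channel weights
            sa = self.spatial_gate(diff)          # (B, 1, H, W) spatial weights
            # Re-weight the modalities with the difference-driven attention and fuse.
            fused = rgb * ca * sa + depth * (1 - ca * sa)
            return fused

    # Example usage with random feature maps of matching shape.
    if __name__ == "__main__":
        rgb_feat = torch.randn(2, 64, 32, 32)
        depth_feat = torch.randn(2, 64, 32, 32)
        fusion = DiffAttentionFusion(channels=64)
        print(fusion(rgb_feat, depth_feat).shape)  # torch.Size([2, 64, 32, 32])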