MFMamba: A Mamba-Based Multi-Modal Fusion Network for Semantic Segmentation of Remote Sensing Images

Semantic segmentation of remote sensing images is a fundamental task in computer vision, holding substantial relevance in applications such as land cover surveys, environmental protection, and urban building planning. In recent years, multi-modal fusion-based models have garnered considerable attent...

Bibliographic Details
Main Authors:	Yan Wang, Li Cao, He Deng
Format:	Article
Language:	English
Published:	MDPI AG 2024-11-01
Series:	Sensors
Subjects:	semantic segmentation; multi-modal remote sensing data; feature fusion
Online Access:	https://www.mdpi.com/1424-8220/24/22/7266

_version_	1850267044184850432
author	Yan Wang; Li Cao; He Deng
author_facet	Yan Wang; Li Cao; He Deng
author_sort	Yan Wang
collection	DOAJ
description	Semantic segmentation of remote sensing images is a fundamental task in computer vision, with substantial relevance to applications such as land cover surveys, environmental protection, and urban building planning. In recent years, multi-modal fusion-based models have attracted considerable attention, exhibiting superior segmentation performance compared with traditional single-modal techniques. Nonetheless, most of these multi-modal models, which rely on Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) for feature fusion, are limited in either long-range modeling capability or computational complexity. This paper presents a novel Mamba-based multi-modal fusion network, MFMamba, for semantic segmentation of remote sensing images. Specifically, the network employs a dual-branch encoding structure consisting of a CNN-based main encoder that extracts local features from high-resolution remote sensing images (HRRSIs) and a Mamba-based auxiliary encoder that captures global features from the corresponding digital surface model (DSM). To capitalize on the distinct attributes of the multi-modal remote sensing data from both branches, a feature fusion block (FFB) is designed to enhance and integrate the features extracted by the two branches at each stage. Extensive experiments on the Vaihingen and Potsdam datasets verify the effectiveness and superiority of MFMamba in semantic segmentation of remote sensing images. Compared with state-of-the-art methods, MFMamba achieves higher overall accuracy (OA), mean F1 score (mF1), and mean intersection over union (mIoU) while maintaining low computational complexity.
format	Article
id	doaj-art-4c48f534f26b4c4983710f73619ccf05
institution	OA Journals
issn	1424-8220
language	English
publishDate	2024-11-01
publisher	MDPI AG
record_format	Article
series	Sensors
spelling	doaj-art-4c48f534f26b4c4983710f73619ccf05; 2025-08-20T01:53:57Z; eng; MDPI AG; Sensors; 1424-8220; 2024-11-01; volume 24; issue 22; article 7266; 10.3390/s24227266; MFMamba: A Mamba-Based Multi-Modal Fusion Network for Semantic Segmentation of Remote Sensing Images; Yan Wang (School of Electrical and Electronic Engineering, Wuhan Polytechnic University, Wuhan 430023, China); Li Cao (School of Electrical and Electronic Engineering, Wuhan Polytechnic University, Wuhan 430023, China); He Deng (School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430081, China); https://www.mdpi.com/1424-8220/24/22/7266; semantic segmentation; multi-modal remote sensing data; feature fusion
title	MFMamba: A Mamba-Based Multi-Modal Fusion Network for Semantic Segmentation of Remote Sensing Images
topic	semantic segmentation; multi-modal remote sensing data; feature fusion
url	https://www.mdpi.com/1424-8220/24/22/7266
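
The description field reports results as overall accuracy (OA), mean F1 score (mF1), and mean intersection over union (mIoU). As a reading aid, the minimal sketch below computes these three metrics from a confusion matrix using their standard definitions. The function name `segmentation_metrics`, the toy two-class matrix, and the choice to average over all classes are illustrative assumptions; this record does not state the paper's exact evaluation protocol (for example, which classes are included in the averages).

```python
def segmentation_metrics(conf):
    """Compute OA, mF1 and mIoU from a square confusion matrix.

    conf[i][j] is the number of pixels whose true class is i and whose
    predicted class is j. Hypothetical helper, not taken from the paper.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(n))

    f1_scores, ious = [], []
    for c in range(n):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(n)) - tp  # predicted c, true class differs
        fn = sum(conf[c]) - tp                       # true c, predicted class differs
        f1_den = 2 * tp + fp + fn
        iou_den = tp + fp + fn
        # A class with no true or predicted pixels scores 0 here (an assumption).
        f1_scores.append(2 * tp / f1_den if f1_den else 0.0)
        ious.append(tp / iou_den if iou_den else 0.0)

    return {
        "OA": correct / total,
        "mF1": sum(f1_scores) / n,
        "mIoU": sum(ious) / n,
    }


# Toy two-class example (made-up counts, not results from the paper).
conf = [[50, 10],
        [5, 35]]
metrics = segmentation_metrics(conf)
print(metrics)
```

OA weights every pixel equally, so it can be dominated by frequent classes, whereas mF1 and mIoU average per-class scores and are therefore more sensitive to performance on rare classes.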

Similar Items