MFMamba: A Mamba-Based Multi-Modal Fusion Network for Semantic Segmentation of Remote Sensing Images

Semantic segmentation of remote sensing images is a fundamental task in computer vision, holding substantial relevance in applications such as land cover surveys, environmental protection, and urban building planning. In recent years, multi-modal fusion-based models have garnered considerable attent...

Bibliographic Details
Main Authors: Yan Wang, Li Cao, He Deng
Format: Article
Language: English
Published: MDPI AG 2024-11-01
Series: Sensors
Subjects: semantic segmentation; multi-modal remote sensing data; feature fusion
Online Access: https://www.mdpi.com/1424-8220/24/22/7266
author Yan Wang
Li Cao
He Deng
collection DOAJ
description Semantic segmentation of remote sensing images is a fundamental task in computer vision, with substantial relevance to applications such as land cover surveys, environmental protection, and urban building planning. In recent years, multi-modal fusion-based models have attracted considerable attention, achieving better segmentation performance than traditional single-modal techniques. Nonetheless, most of these multi-modal models rely on Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) for feature fusion and are therefore limited in either long-range modeling capability or computational complexity. This paper presents a novel Mamba-based multi-modal fusion network called MFMamba for semantic segmentation of remote sensing images. Specifically, the network employs a dual-branch encoding structure, consisting of a CNN-based main encoder that extracts local features from high-resolution remote sensing images (HRRSIs) and a Mamba-based auxiliary encoder that captures global features from the corresponding digital surface model (DSM). To capitalize on the distinct attributes of the multi-modal remote sensing data from both branches, a feature fusion block (FFB) is designed to enhance and integrate the features extracted by the dual-branch structure at each stage. Extensive experiments on the Vaihingen and Potsdam datasets verify the effectiveness of MFMamba in semantic segmentation of remote sensing images. Compared with state-of-the-art methods, MFMamba achieves higher overall accuracy (OA), mean F1 score (mF1), and mean intersection over union (mIoU), while maintaining low computational complexity.
format Article
id doaj-art-4c48f534f26b4c4983710f73619ccf05
institution OA Journals
issn 1424-8220
language English
publishDate 2024-11-01
publisher MDPI AG
record_format Article
series Sensors
spelling doaj-art-4c48f534f26b4c4983710f73619ccf05
2025-08-20T01:53:57Z
eng
MDPI AG
Sensors
ISSN: 1424-8220
2024-11-01
Volume 24, Issue 22, Article 7266
DOI: 10.3390/s24227266
MFMamba: A Mamba-Based Multi-Modal Fusion Network for Semantic Segmentation of Remote Sensing Images
Yan Wang (School of Electrical and Electronic Engineering, Wuhan Polytechnic University, Wuhan 430023, China)
Li Cao (School of Electrical and Electronic Engineering, Wuhan Polytechnic University, Wuhan 430023, China)
He Deng (School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430081, China)
https://www.mdpi.com/1424-8220/24/22/7266
semantic segmentation
multi-modal remote sensing data
feature fusion
title MFMamba: A Mamba-Based Multi-Modal Fusion Network for Semantic Segmentation of Remote Sensing Images
topic semantic segmentation
multi-modal remote sensing data
feature fusion
url https://www.mdpi.com/1424-8220/24/22/7266