A Multilevel Multimodal Hybrid Mamba-Large Strip Convolution Network for Remote Sensing Semantic Segmentation

Bibliographic Details
Main Authors: Lingyu Yan, Qingyang Feng, Jing Wang, Jinshan Cao, Xiaoxiao Feng, Xing Tang
Format: Article
Language: English
Published: MDPI AG 2025-08-01
Series: Remote Sensing
Online Access: https://www.mdpi.com/2072-4292/17/15/2696
Collection: DOAJ

Description: Semantic segmentation is a key task in the intelligent interpretation of remote sensing images and has extensive potential applications. However, when ultra-high-resolution (UHR) remote sensing images exhibit complex, intersecting backgrounds and large variations in object size, existing multimodal fusion segmentation methods based on convolutional neural networks (CNNs) and Transformers are hampered by limited receptive fields and quadratic computational complexity, respectively, which leads to inadequate global context modeling and multimodal feature representation. Moreover, the lack of accurate boundary-detail constraints in the final segmentation further limits accuracy. To address these challenges, we propose FMLSNet, a novel boundary-enhanced, multilevel multimodal fusion Mamba-Large Strip Convolution network for remote sensing image segmentation that offers a global receptive field with efficient linear complexity. Specifically, this paper introduces a new multistage Mamba multimodal fusion framework (FMB) for UHR remote sensing image segmentation. By employing an innovative multimodal scanning mechanism integrated with disentanglement strategies to deepen the fusion process, FMB promotes deep fusion of multimodal features and captures cross-modal contextual information at multiple levels, yielding robust and comprehensive feature integration enriched with global semantic context. Additionally, we propose a Large Strip Spatial Detail (LSSD) extraction module, which adaptively combines multi-directional large strip convolutions to capture more precise, fine-grained boundary features, enabling the network to learn detailed spatial features from shallow layers. Extensive experiments on challenging remote sensing image datasets show that our method outperforms state-of-the-art models.

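The record does not describe how the LSSD module is implemented. Purely as an illustration of the general idea of "adaptively combining multi-directional large strip convolutions", here is a minimal pure-Python sketch under stated assumptions: four strip directions (horizontal, vertical, diagonal, anti-diagonal), a kernel length of k = 11, fixed mean-filter weights instead of learned ones, and a softmax over each branch's global mean response as the adaptive fusion rule. The names `strip_conv` and `multi_directional_strip_conv` are hypothetical; this is not the authors' code.

```python
import math
from typing import Dict, List, Tuple

Grid = List[List[float]]

# Hypothetical strip directions: unit step (dy, dx) along which each 1-D kernel extends.
DIRECTIONS: Dict[str, Tuple[int, int]] = {
    "horizontal": (0, 1),
    "vertical": (1, 0),
    "diagonal": (1, 1),
    "anti_diagonal": (1, -1),
}


def strip_conv(img: Grid, weights: List[float], step: Tuple[int, int]) -> Grid:
    """Apply a 1-D kernel laid out along `step`, centred on each pixel (zero padding)."""
    h, w = len(img), len(img[0])
    half = len(weights) // 2
    dy, dx = step
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i, wt in enumerate(weights):
                t = i - half
                yy, xx = y + t * dy, x + t * dx
                if 0 <= yy < h and 0 <= xx < w:  # pixels outside the image count as zero
                    acc += wt * img[yy][xx]
            out[y][x] = acc
    return out


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_directional_strip_conv(img: Grid, k: int = 11) -> Grid:
    """Hypothetical LSSD-style block: four directional strip convs, fused adaptively."""
    h, w = len(img), len(img[0])
    kernel = [1.0 / k] * k  # assumption: mean-filter weights; a trained model would learn these
    branches = [strip_conv(img, kernel, step) for step in DIRECTIONS.values()]
    # Assumed fusion rule: softmax over each branch's global mean response,
    # similar in spirit to selective-kernel attention.
    means = [sum(map(sum, b)) / (h * w) for b in branches]
    alphas = softmax(means)
    return [
        [sum(a * b[y][x] for a, b in zip(alphas, branches)) for x in range(w)]
        for y in range(h)
    ]


if __name__ == "__main__":
    # Toy 16x16 image containing one thin horizontal line at row 8.
    image = [[0.0] * 16 for _ in range(16)]
    for col in range(16):
        image[8][col] = 1.0
    fused = multi_directional_strip_conv(image, k=11)
    print(fused[8][8] > fused[4][8])  # True: the response stays concentrated on the line
```

On the toy line image, the horizontal branch responds fully along the line while the other three branches see only one line pixel each, so the fused map is strongest on the line itself. Each strip reaches k pixels along one direction using only k weights, compared with k² for a square k×k kernel, which is why strip kernels are a cheap way to extend the receptive field along elongated structures.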
ID: doaj-art-e54388b06431461783978cf4b2d31219
Institution: Kabale University
ISSN: 2072-4292
DOI: 10.3390/rs17152696
Citation: Remote Sensing, vol. 17, no. 15, article 2696 (2025)
Affiliations: School of Computer Science, Hubei University of Technology, Wuhan 430068, China (Lingyu Yan, Qingyang Feng, Jing Wang, Jinshan Cao, Xiaoxiao Feng); Key Laboratory of Green Intelligent Computing Network in Hubei Province, Wuhan 430068, China (Xing Tang)
Keywords: multimodal remote sensing; large strip convolution; visual state space model; semantic segmentation