A Multilevel Multimodal Hybrid Mamba-Large Strip Convolution Network for Remote Sensing Semantic Segmentation
Semantic segmentation is one of the key tasks in the intelligent interpretation of remote sensing images with extensive potential applications. However, when ultra-high resolution (UHR) remote sensing images exhibit complex background intersections and significant variations in object sizes, existin...
| Main Authors: | Lingyu Yan, Qingyang Feng, Jing Wang, Jinshan Cao, Xiaoxiao Feng, Xing Tang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-08-01 |
| Series: | Remote Sensing |
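| Subjects: | multimodal remote sensing; large strip convolution; visual state space model; semantic segmentation |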
| Online Access: | https://www.mdpi.com/2072-4292/17/15/2696 |
| _version_ | 1849405936940089344 |
|---|---|
| author | Lingyu Yan, Qingyang Feng, Jing Wang, Jinshan Cao, Xiaoxiao Feng, Xing Tang |
| author_sort | Lingyu Yan |
| collection | DOAJ |
| description | Semantic segmentation is a key task in the intelligent interpretation of remote sensing images and has extensive potential applications. However, when ultra-high-resolution (UHR) remote sensing images exhibit complex background intersections and large variations in object size, existing multimodal fusion segmentation methods based on convolutional neural networks and Transformers suffer from limited receptive fields and quadratic computational complexity, respectively, leading to inadequate global context modeling and multimodal feature representation. Moreover, the lack of accurate boundary-detail constraints in the final segmentation further limits accuracy. To address these challenges, we propose a novel boundary-enhanced multilevel multimodal fusion Mamba-Large Strip Convolution network (FMLSNet) for remote sensing image segmentation, which offers a global receptive field and linear complexity. Specifically, this paper introduces a new multistage Mamba multimodal fusion framework (FMB) for UHR remote sensing image segmentation. By employing a multimodal scanning mechanism integrated with disentanglement strategies, FMB promotes deep fusion of multimodal features and captures cross-modal contextual information at multiple levels, yielding robust feature integration enriched with global semantic context. Additionally, we propose a Large Strip Spatial Detail (LSSD) extraction module, which adaptively combines multi-directional large strip convolutions to capture more precise and fine-grained boundary features, enabling the network to learn detailed spatial features from shallow layers. Extensive experiments on challenging remote sensing image datasets show that our method outperforms state-of-the-art models. |
| format | Article |
| id | doaj-art-e54388b06431461783978cf4b2d31219 |
| institution | Kabale University |
| issn | 2072-4292 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Remote Sensing |
| spelling | doaj-art-e54388b06431461783978cf4b2d31219; 2025-08-20T03:36:32Z; eng; MDPI AG; Remote Sensing; ISSN 2072-4292; 2025-08-01; vol. 17, no. 15, art. 2696; DOI 10.3390/rs17152696; A Multilevel Multimodal Hybrid Mamba-Large Strip Convolution Network for Remote Sensing Semantic Segmentation; Lingyu Yan (0), Qingyang Feng (1), Jing Wang (2), Jinshan Cao (3), Xiaoxiao Feng (4), Xing Tang (5); affiliations 0-4: School of Computer Science, Hubei University of Technology, Wuhan 430068, China; affiliation 5: Key Laboratory of Green Intelligent Computing Network in Hubei Province, Wuhan 430068, China; https://www.mdpi.com/2072-4292/17/15/2696; multimodal remote sensing; large strip convolution; visual state space model; semantic segmentation |
| title | A Multilevel Multimodal Hybrid Mamba-Large Strip Convolution Network for Remote Sensing Semantic Segmentation |
| topic | multimodal remote sensing; large strip convolution; visual state space model; semantic segmentation |
| url | https://www.mdpi.com/2072-4292/17/15/2696 |