Vision Mamba and xLSTM-UNet for medical image segmentation
Abstract: Deep learning-based medical image segmentation methods are generally divided into convolutional neural networks (CNNs) and Transformer-based models. Traditional CNNs are limited by their receptive field, making it challenging to capture long-range dependencies. While Transformers excel at modeling global information, their high computational complexity restricts their practical application in clinical scenarios. To address these limitations, this study introduces VMAXL-UNet, a novel segmentation network that integrates Structured State Space Models (SSM) and lightweight LSTMs (xLSTM). The network incorporates Visual State Space (VSS) and ViL modules in the encoder to efficiently fuse local boundary details with global semantic context. The VSS module leverages SSM to capture long-range dependencies and extract critical features from distant regions. Meanwhile, the ViL module employs a gating mechanism to enhance the integration of local and global features, thereby improving segmentation accuracy and robustness. Experiments on datasets such as ISIC17, ISIC18, CVC-ClinicDB, and Kvasir demonstrate that VMAXL-UNet significantly outperforms traditional CNNs and Transformer-based models in capturing lesion boundaries and their distant correlations. These results highlight the model's superior performance and provide a promising approach for efficient segmentation in complex medical imaging scenarios.
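The long-range-dependency property of the SSM underlying the VSS module can be illustrated with a minimal linear state-space scan. This is a toy sketch, not the paper's method: the actual VSS module uses a selective, 2-D cross-scan variant with learned parameters, and all names and shapes below are hypothetical.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal linear state-space scan: h_t = A h_{t-1} + B x_t, y_t = C h_t."""
    d_state = A.shape[0]
    h = np.zeros(d_state)
    ys = []
    for x_t in x:
        h = A @ h + B * x_t   # state update carries context forward in time
        ys.append(C @ h)      # readout mixes the accumulated state
    return np.array(ys)

# Toy 1-D sequence: a single impulse at t=0 still influences late outputs,
# which is the long-range-dependency behavior SSMs exploit.
A = np.eye(2) * 0.9                 # slowly decaying state transition
B = np.array([1.0, 0.5])
C = np.array([0.3, 0.7])
x = np.zeros(16)
x[0] = 1.0
y = ssm_scan(x, A, B, C)            # y[-1] is still nonzero 15 steps later
```

Because the scan is linear in sequence length, this recurrence scales better than the quadratic attention the abstract contrasts it with.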
| Main Authors: | Xin Zhong, Gehao Lu, Hao Li |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-03-01 |
| Series: | Scientific Reports |
| Subjects: | Deep Learning; Medical Image Segmentation; SSM; xLSTM |
| Online Access: | https://doi.org/10.1038/s41598-025-88967-5 |
| _version_ | 1850039590707003392 |
|---|---|
| author | Xin Zhong; Gehao Lu; Hao Li |
| author_sort | Xin Zhong |
| collection | DOAJ |
| description | Abstract Deep learning-based medical image segmentation methods are generally divided into convolutional neural networks (CNNs) and Transformer-based models. Traditional CNNs are limited by their receptive field, making it challenging to capture long-range dependencies. While Transformers excel at modeling global information, their high computational complexity restricts their practical application in clinical scenarios. To address these limitations, this study introduces VMAXL-UNet, a novel segmentation network that integrates Structured State Space Models (SSM) and lightweight LSTMs (xLSTM). The network incorporates Visual State Space (VSS) and ViL modules in the encoder to efficiently fuse local boundary details with global semantic context. The VSS module leverages SSM to capture long-range dependencies and extract critical features from distant regions. Meanwhile, the ViL module employs a gating mechanism to enhance the integration of local and global features, thereby improving segmentation accuracy and robustness. Experiments on datasets such as ISIC17, ISIC18, CVC-ClinicDB, and Kvasir demonstrate that VMAXL-UNet significantly outperforms traditional CNNs and Transformer-based models in capturing lesion boundaries and their distant correlations. These results highlight the model’s superior performance and provide a promising approach for efficient segmentation in complex medical imaging scenarios. |
| format | Article |
| id | doaj-art-c954c4ac8f2b4d56951bfe62fcf9fddc |
| institution | DOAJ |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-03-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-c954c4ac8f2b4d56951bfe62fcf9fddc (indexed 2025-08-20T02:56:19Z); eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; 2025-03-01; vol. 15, no. 1, pp. 1–12; doi:10.1038/s41598-025-88967-5; Vision Mamba and xLSTM-UNet for medical image segmentation; Xin Zhong, Gehao Lu, Hao Li (School of Information Science and Engineering, Yunnan University); https://doi.org/10.1038/s41598-025-88967-5; subjects: Deep Learning, Medical Image Segmentation, SSM, xLSTM |
| title | Vision Mamba and xLSTM-UNet for medical image segmentation |
| topic | Deep Learning; Medical Image Segmentation; SSM; xLSTM |
| url | https://doi.org/10.1038/s41598-025-88967-5 |
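The gating mechanism the abstract attributes to the ViL module, which blends local boundary detail with global context, can be sketched as a sigmoid-gated convex combination of two feature vectors. This is a hypothetical illustration: the real xLSTM gates are learned, recurrent, and operate on patch sequences, and all weight names below are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(local_feat, global_feat, w_l, w_g, b):
    """Blend local and global features with a per-channel sigmoid gate.

    gate -> 1 favors the local feature; gate -> 0 favors the global one.
    """
    gate = sigmoid(local_feat @ w_l + global_feat @ w_g + b)  # values in (0, 1)
    return gate * local_feat + (1.0 - gate) * global_feat

rng = np.random.default_rng(0)
d = 8                                   # hypothetical channel count
local_feat = rng.standard_normal(d)     # stand-in for boundary detail
global_feat = rng.standard_normal(d)    # stand-in for global semantic context
w_l = rng.standard_normal((d, d)) * 0.1
w_g = rng.standard_normal((d, d)) * 0.1
b = np.zeros(d)
fused = gated_fusion(local_feat, global_feat, w_l, w_g, b)
```

Because the gate is in (0, 1), each fused channel lies between its local and global inputs, so neither information source can be entirely discarded.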