Vision Mamba and xLSTM-UNet for medical image segmentation

Abstract Deep learning-based medical image segmentation methods are generally divided into convolutional neural networks (CNNs) and Transformer-based models. Traditional CNNs are limited by their receptive field, making it challenging to capture long-range dependencies. While Transformers excel at modeling global information, their high computational complexity restricts their practical application in clinical scenarios. To address these limitations, this study introduces VMAXL-UNet, a novel segmentation network that integrates Structured State Space Models (SSM) and lightweight LSTMs (xLSTM). The network incorporates Visual State Space (VSS) and ViL modules in the encoder to efficiently fuse local boundary details with global semantic context. The VSS module leverages SSM to capture long-range dependencies and extract critical features from distant regions. Meanwhile, the ViL module employs a gating mechanism to enhance the integration of local and global features, thereby improving segmentation accuracy and robustness. Experiments on datasets such as ISIC17, ISIC18, CVC-ClinicDB, and Kvasir demonstrate that VMAXL-UNet significantly outperforms traditional CNNs and Transformer-based models in capturing lesion boundaries and their distant correlations. These results highlight the model’s superior performance and provide a promising approach for efficient segmentation in complex medical imaging scenarios.
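The abstract's claim that an SSM captures long-range dependencies at linear cost can be illustrated with a minimal sketch. This is not the authors' VSS implementation; it is a scalar toy version of the underlying linear state-space recurrence (h_t = A·h_{t-1} + B·x_t, y_t = C·h_t), showing that an input at one position influences all later outputs through powers of A while the scan runs in O(T), unlike self-attention's O(T²) pairwise interactions.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Linear state-space recurrence over a 1-D sequence.

    h_t = A * h_{t-1} + B * x_t
    y_t = C * h_t

    x: shape (T,); A, B, C: scalars (the diagonal, single-channel case).
    One pass over the sequence, so the cost is linear in T.
    """
    h = 0.0
    y = np.empty_like(x)
    for t in range(len(x)):
        h = A * h + B * x[t]
        y[t] = C * h
    return y

# An impulse at t=0 still reaches every later output, decayed by A**t,
# which is the sense in which the recurrence carries long-range context.
x = np.array([1.0, 0.0, 0.0, 0.0])
y = ssm_scan(x, A=0.9, B=1.0, C=1.0)
# y[t] = 0.9**t for this impulse input
```

In practice (e.g. in Mamba-style VSS blocks) A, B, C are learned, input-dependent, and vectorized over channels, and the scan is parallelized; the scalar loop above only illustrates the recurrence itself.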


Bibliographic Details
Main Authors: Xin Zhong, Gehao Lu, Hao Li
Format: Article
Language:English
Published: Nature Portfolio 2025-03-01
Series:Scientific Reports
Subjects: Deep Learning, Medical Image Segmentation, SSM, xLSTM
Online Access:https://doi.org/10.1038/s41598-025-88967-5
collection DOAJ
format Article
id doaj-art-c954c4ac8f2b4d56951bfe62fcf9fddc
institution DOAJ
issn 2045-2322
language English
publishDate 2025-03-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling Vision Mamba and xLSTM-UNet for medical image segmentation. Scientific Reports (Nature Portfolio), ISSN 2045-2322, vol. 15 (2025-03-01). doi:10.1038/s41598-025-88967-5. Authors: Xin Zhong, Gehao Lu, Hao Li (School of Information Science and Engineering, Yunnan University).
title Vision Mamba and xLSTM-UNet for medical image segmentation
topic Deep Learning
Medical Image Segmentation
SSM
XLSTM
url https://doi.org/10.1038/s41598-025-88967-5