MINTFormer: Multi-Scale Information Aggregation with CSWin Vision Transformer for Medical Image Segmentation
| Main Authors: | Chao Deng, Xiao Qin |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-08-01 |
| Series: | Applied Sciences |
| Subjects: | medical image segmentation; deep learning; attention mechanism; MaxViT; CSWin |
| Online Access: | https://www.mdpi.com/2076-3417/15/15/8626 |
| Field | Value |
|---|---|
| author | Chao Deng, Xiao Qin |
| collection | DOAJ |
| description | Transformers have been extensively utilized as encoders in medical image segmentation; however, the information that an encoder can capture is inherently limited. In this study, we propose MINTFormer, which introduces a heterogeneous encoder that integrates CSWin and MaxViT to fully exploit the potential of encoders with different encoding methodologies. Additionally, we observed that the encoder output contains substantial redundant information. To address this, we designed a Demodulate Bridge (DB) to filter out redundant information from feature maps. Furthermore, we developed a multi-Scale Sampling Decoder (SSD) capable of preserving information about organs of varying sizes during upsampling and accurately restoring their shapes. This study demonstrates the superior performance of MINTFormer across several datasets, including Synapse, ACDC, Kvasir-SEG, and skin lesion segmentation datasets. |
| format | Article |
| id | doaj-art-6ae4b2d03fd3446da2223c2de90c1a51 |
| institution | Kabale University |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | MDPI AG |
| series | Applied Sciences |
| doi | 10.3390/app15158626 |
| volume / issue / article | 15 / 15 / 8626 |
| affiliation | School of Artificial Intelligence, Nanning Normal University, Nanning 530100, China (both authors) |
| title | MINTFormer: Multi-Scale Information Aggregation with CSWin Vision Transformer for Medical Image Segmentation |
| topic | medical image segmentation; deep learning; attention mechanism; MaxViT; CSWin |
| url | https://www.mdpi.com/2076-3417/15/15/8626 |
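
The description names CSWin and MaxViT as the two encoder families combined in MINTFormer's heterogeneous encoder but gives no implementation details. As background only, the following plain-Python sketch enumerates which token positions share an attention group under each family's published partitioning scheme: MaxViT's block (local window) and grid (dilated, image-wide) attention, and CSWin's horizontal and vertical stripes. It is not MINTFormer's code. The function names, the 8 × 8 map, window size 4, grid size 4 and stripe width 2 are illustrative assumptions, and no learned projections or attention weights are modeled.

```python
from typing import List, Set, Tuple

Pos = Tuple[int, int]      # (row, col) index of a token in an H x W feature map
Groups = List[List[Pos]]   # each inner list = tokens that attend to each other


def block_groups(h: int, w: int, p: int) -> Groups:
    """MaxViT-style block attention: non-overlapping p x p local windows."""
    assert h % p == 0 and w % p == 0, "map size must be divisible by p"
    return [[(bi * p + i, bj * p + j) for i in range(p) for j in range(p)]
            for bi in range(h // p) for bj in range(w // p)]


def grid_groups(h: int, w: int, g: int) -> Groups:
    """MaxViT-style grid attention: each group is a g x g lattice of tokens
    spaced (h // g, w // g) apart, giving sparse, image-wide mixing."""
    assert h % g == 0 and w % g == 0, "map size must be divisible by g"
    step_r, step_c = h // g, w // g
    return [[(i * step_r + oi, j * step_c + oj) for i in range(g) for j in range(g)]
            for oi in range(step_r) for oj in range(step_c)]


def horizontal_stripes(h: int, w: int, sw: int) -> Groups:
    """CSWin-style horizontal stripes: sw consecutive rows across the full width."""
    assert h % sw == 0, "height must be divisible by the stripe width"
    return [[(s * sw + i, c) for i in range(sw) for c in range(w)]
            for s in range(h // sw)]


def vertical_stripes(h: int, w: int, sw: int) -> Groups:
    """CSWin-style vertical stripes: sw consecutive columns across the full height."""
    assert w % sw == 0, "width must be divisible by the stripe width"
    return [[(r, s * sw + j) for r in range(h) for j in range(sw)]
            for s in range(w // sw)]


def is_partition(groups: Groups, h: int, w: int) -> bool:
    """True if every token of the h x w map appears in exactly one group."""
    flat = [pos for grp in groups for pos in grp]
    return len(flat) == h * w and set(flat) == {(r, c) for r in range(h) for c in range(w)}


def cross_region(h: int, w: int, sw: int, pos: Pos) -> Set[Pos]:
    """Union of the horizontal and vertical stripes containing `pos`, i.e. the
    cross-shaped window covered when half the heads use each stripe direction."""
    region: Set[Pos] = set()
    for groups in (horizontal_stripes(h, w, sw), vertical_stripes(h, w, sw)):
        for grp in groups:
            if pos in grp:
                region.update(grp)
    return region


if __name__ == "__main__":
    H = W = 8
    schemes = {
        "MaxViT block (p=4)": block_groups(H, W, 4),
        "MaxViT grid (g=4)": grid_groups(H, W, 4),
        "CSWin horizontal (sw=2)": horizontal_stripes(H, W, 2),
        "CSWin vertical (sw=2)": vertical_stripes(H, W, 2),
    }
    for name, groups in schemes.items():
        rows = sorted({r for r, _ in groups[0]})
        cols = sorted({c for _, c in groups[0]})
        print(f"{name}: {len(groups)} groups of {len(groups[0])}; "
              f"first group rows {rows}, cols {cols}")
    print("cross-shaped region of (3, 5):", len(cross_region(H, W, 2, (3, 5))), "tokens")
```

On an 8 × 8 map, all four schemes split the 64 tokens into four groups of 16, yet each group covers a very different spatial pattern (a compact window, a dilated lattice, a band of rows, a band of columns). That contrast is one plausible reading of why pairing the two encoder families could yield complementary features; the paper's own rationale should be checked in the full text.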