MINTFormer: Multi-Scale Information Aggregation with CSWin Vision Transformer for Medical Image Segmentation

Bibliographic Details
Main Authors: Chao Deng, Xiao Qin
Format: Article
Language: English
Published: MDPI AG, 2025-08-01
Series: Applied Sciences
Online Access:https://www.mdpi.com/2076-3417/15/15/8626
Collection: DOAJ
Abstract: Transformers have been extensively used as encoders in medical image segmentation; however, the information that a single encoder can capture is inherently limited. In this study, we propose MINTFormer, which introduces a heterogeneous encoder that integrates CSWin and MaxViT to fully exploit the potential of encoders with different encoding methodologies. Additionally, we observed that the encoder output contains substantial redundant information. To address this, we designed a Demodulate Bridge (DB) to filter out redundant information from feature maps. Furthermore, we developed a multi-Scale Sampling Decoder (SSD) capable of preserving information about organs of varying sizes during upsampling and accurately restoring their shapes. This study demonstrates the superior performance of MINTFormer on several benchmarks, including the Synapse, ACDC, Kvasir-SEG, and skin lesion segmentation datasets.
Institution: Kabale University
ISSN: 2076-3417
DOI: 10.3390/app15158626
Citation: Applied Sciences, Volume 15, Issue 15, Article 8626
Affiliations: Chao Deng and Xiao Qin, School of Artificial Intelligence, Nanning Normal University, Nanning 530100, China
Keywords: medical image segmentation; deep learning; attention mechanism; MaxViT; CSWin
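The abstract states that the Demodulate Bridge (DB) filters redundant information from encoder feature maps but does not say how. As a hedged illustration of one widely used way to down-weight less informative channels, the following plain-Python sketch implements squeeze-and-excitation (SE) channel gating. SE is a swapped-in, generic technique and should not be read as MINTFormer's Demodulate Bridge; the function names, the weight matrices `w1` and `w2`, and the nested-list layout for a C x H x W feature map are hypothetical choices made for this example.

```python
import math
from typing import List

# Generic squeeze-and-excitation (SE) gating sketch.
# NOT the paper's Demodulate Bridge; all names here are hypothetical.

Matrix = List[List[float]]  # a 2-D array stored as nested lists
FeatureMap = List[Matrix]   # C channels, each an H x W spatial map


def squeeze(x: FeatureMap) -> List[float]:
    """Global average pooling: collapse each channel to one scalar descriptor."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def matvec(w: Matrix, v: List[float]) -> List[float]:
    """Dense matrix-vector product (stands in for a fully connected layer)."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def sigmoid(s: float) -> float:
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def channel_gates(x: FeatureMap, w1: Matrix, w2: Matrix) -> List[float]:
    """Excitation step: FC (C -> C/r), ReLU, FC (C/r -> C), sigmoid."""
    if len(w2) != len(x):
        raise ValueError("w2 must produce one gate per input channel")
    hidden = [max(0.0, h) for h in matvec(w1, squeeze(x))]
    return [sigmoid(s) for s in matvec(w2, hidden)]


def se_gate(x: FeatureMap, w1: Matrix, w2: Matrix) -> FeatureMap:
    """Rescale every value in channel c by that channel's gate (between 0 and 1)."""
    gates = channel_gates(x, w1, w2)
    return [[[g * v for v in row] for row in ch] for g, ch in zip(gates, x)]


# Hypothetical toy input: 2 channels of size 2 x 2, reduction ratio r = 2.
x = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
w1 = [[1.0, 0.0]]      # C/r x C = 1 x 2
w2 = [[4.0], [-4.0]]   # C x C/r = 2 x 1
print([round(g, 3) for g in channel_gates(x, w1, w2)])  # [0.982, 0.018]
```

Because every gate lies between 0 and 1, this form of gating can only attenuate a channel, never amplify it. In a trained network `w1` and `w2` are learned, so which channels get suppressed is decided by the data rather than fixed by hand.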