MINTFormer: Multi-Scale Information Aggregation with CSWin Vision Transformer for Medical Image Segmentation

Transformers have been extensively utilized as encoders in medical image segmentation; however, the information that an encoder can capture is inherently limited. In this study, we propose MINTFormer, which introduces a heterogeneous encoder that integrates CSWin and MaxViT to fully exploit the poten...

Bibliographic Details
Main Authors:	Chao Deng, Xiao Qin
Format:	Article
Language:	English
Published:	MDPI AG 2025-08-01
Series:	Applied Sciences
Subjects:	medical image segmentation; deep learning; attention mechanism; MaxViT; CSWin
Online Access:	https://www.mdpi.com/2076-3417/15/15/8626

author	Chao Deng, Xiao Qin
collection	DOAJ
description	Transformers have been extensively utilized as encoders in medical image segmentation; however, the information that an encoder can capture is inherently limited. In this study, we propose MINTFormer, which introduces a heterogeneous encoder that integrates CSWin and MaxViT to fully exploit the potential of encoders with different encoding methodologies. Additionally, we observed that the encoder output contains substantial redundant information. To address this, we designed a Demodulate Bridge (DB) to filter out redundant information from feature maps. Furthermore, we developed a multi-Scale Sampling Decoder (SSD) capable of preserving information about organs of varying sizes during upsampling and accurately restoring their shapes. This study demonstrates the superior performance of MINTFormer across several datasets, including Synapse, ACDC, Kvasir-SEG, and skin lesion segmentation datasets.
format	Article
id	doaj-art-6ae4b2d03fd3446da2223c2de90c1a51
institution	Kabale University
issn	2076-3417
language	English
publishDate	2025-08-01
publisher	MDPI AG
series	Applied Sciences
doi	10.3390/app15158626
author_affiliation	School of Artificial Intelligence, Nanning Normal University, Nanning 530100, China (Chao Deng; Xiao Qin)
title	MINTFormer: Multi-Scale Information Aggregation with CSWin Vision Transformer for Medical Image Segmentation
topic	medical image segmentation; deep learning; attention mechanism; MaxViT; CSWin
url	https://www.mdpi.com/2076-3417/15/15/8626


Similar Items