MINTFormer: Multi-Scale Information Aggregation with CSWin Vision Transformer for Medical Image Segmentation
Transformers have been extensively utilized as encoders in medical image segmentation; however, the information that an encoder can capture is inherently limited. In this study, we propose MINTFormer, which introduces a Heterogeneous encoder that integratesCSWin and MaxViT to fully exploit the poten...
Saved in:
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
MDPI AG
2025-08-01
|
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/15/8626 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | Transformers have been extensively utilized as encoders in medical image segmentation; however, the information that an encoder can capture is inherently limited. In this study, we propose MINTFormer, which introduces a Heterogeneous encoder that integratesCSWin and MaxViT to fully exploit the potential of encoders with different encoding methodologies. Additionally, we observed that the encoder output contains substantial redundant information. To address this, we designed a Demodulate Bridge (DB) to filter out redundant information from feature maps. Furthermore, we developed a multi-Scale Sampling Decoder (SSD) capable of preserving information about organs of varying sizes during upsampling and accurately restoring their shapes. This study demonstrates the superior performance of MINTFormer across several datasets, including Synapse, ACDC, Kvasir-SEG, and skin lesion segmentation datasets. |
|---|---|
| ISSN: | 2076-3417 |