GH-UNet: group-wise hybrid convolution-VIT for robust medical image segmentation

Abstract Medical image segmentation is vital for accurate diagnosis. While U-Net-based models are effective, they struggle to capture long-range dependencies in complex anatomy. We propose GH-UNet, a Group-wise Hybrid Convolution-ViT model within the U-Net framework, to address this limitation. GH-UNet integrates a hybrid convolution-Transformer encoder for both local detail and global context modeling, a Group-wise Dynamic Gating (GDG) module for adaptive feature weighting, and a cascaded decoder for multi-scale integration. Both the encoder and GDG are modular, enabling compatibility with various CNN or ViT backbones. Extensive experiments on five public datasets and one private dataset show that GH-UNet consistently achieves superior performance. On ISIC2016, it surpasses H2Former by 1.37% in Dice and 1.94% in IoU while using only 38% of the parameters and 49.61% of the FLOPs. The code is freely available at https://github.com/xiachashuanghua/GH-UNet.

Bibliographic Details
Main Authors: Shengxiang Wang, Ge Li, Min Gao, Linlin Zhuo, Mingzhe Liu, Zhizhong Ma, Wei Zhao, Xiangzheng Fu
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: npj Digital Medicine
Online Access: https://doi.org/10.1038/s41746-025-01829-2
author Shengxiang Wang
Ge Li
Min Gao
Linlin Zhuo
Mingzhe Liu
Zhizhong Ma
Wei Zhao
Xiangzheng Fu
collection DOAJ
description Abstract Medical image segmentation is vital for accurate diagnosis. While U-Net-based models are effective, they struggle to capture long-range dependencies in complex anatomy. We propose GH-UNet, a Group-wise Hybrid Convolution-ViT model within the U-Net framework, to address this limitation. GH-UNet integrates a hybrid convolution-Transformer encoder for both local detail and global context modeling, a Group-wise Dynamic Gating (GDG) module for adaptive feature weighting, and a cascaded decoder for multi-scale integration. Both the encoder and GDG are modular, enabling compatibility with various CNN or ViT backbones. Extensive experiments on five public datasets and one private dataset show that GH-UNet consistently achieves superior performance. On ISIC2016, it surpasses H2Former by 1.37% in Dice and 1.94% in IoU while using only 38% of the parameters and 49.61% of the FLOPs. The code is freely available at https://github.com/xiachashuanghua/GH-UNet.
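The "adaptive feature weighting" that the abstract attributes to the Group-wise Dynamic Gating (GDG) module can be illustrated in miniature: split the channels of a feature map into groups, derive a scalar gate for each group from its pooled response, and rescale the group by that gate. The NumPy sketch below is a hedged illustration of this general group-wise gating pattern, not the paper's actual implementation; the function name, the sigmoid-of-pooled-mean gate, and the per-group parameters `w` and `b` are all assumptions for demonstration (see the linked repository for the authors' code).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def group_wise_dynamic_gating(x, num_groups, w, b):
    """Illustrative group-wise gating (NOT the paper's exact GDG).

    x: feature map of shape (C, H, W); C must divide evenly by num_groups.
    w, b: hypothetical per-group gating parameters, each of shape (num_groups,).
    Each channel group is rescaled by sigmoid(w_g * mean(group) + b_g),
    so the weighting depends dynamically on the group's own activations.
    """
    c, h, width = x.shape
    assert c % num_groups == 0, "channels must split evenly into groups"
    groups = x.reshape(num_groups, c // num_groups, h, width)
    pooled = groups.mean(axis=(1, 2, 3))          # one pooled scalar per group
    gates = sigmoid(w * pooled + b)               # dynamic gate in (0, 1)
    return (groups * gates[:, None, None, None]).reshape(c, h, width)

# Toy usage: 8 channels in 2 groups; with w = b = 0 every gate is
# sigmoid(0) = 0.5, so all features are halved.
feat = np.ones((8, 4, 4))
out = group_wise_dynamic_gating(feat, 2, w=np.zeros(2), b=np.zeros(2))
```

In a real network the gate parameters would be learned, letting each group be amplified or suppressed depending on the input, which is the intuition behind letting the encoder and GDG pair with different CNN or ViT backbones.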
format Article
id doaj-art-b439201d860b4fe1aed606e503c62be9
institution Kabale University
issn 2398-6352
language English
publishDate 2025-07-01
publisher Nature Portfolio
record_format Article
series npj Digital Medicine
affiliation Shengxiang Wang: School of Data Science and Artificial Intelligence, Wenzhou University of Technology
Ge Li: Department of Radiology, Xiangya Hospital, Central South University
Min Gao: Department of Radiology, The Second Xiangya Hospital, Central South University
Linlin Zhuo: School of Data Science and Artificial Intelligence, Wenzhou University of Technology
Mingzhe Liu: School of Data Science and Artificial Intelligence, Wenzhou University of Technology
Zhizhong Ma: School of Data Science and Artificial Intelligence, Wenzhou University of Technology
Wei Zhao: Department of Radiology, The Second Xiangya Hospital, Central South University
Xiangzheng Fu: School of Chinese Medicine, Hong Kong Baptist University
title GH-UNet: group-wise hybrid convolution-VIT for robust medical image segmentation
url https://doi.org/10.1038/s41746-025-01829-2