GH-UNet: group-wise hybrid convolution-VIT for robust medical image segmentation

Bibliographic Details
Main Authors: Shengxiang Wang, Ge Li, Min Gao, Linlin Zhuo, Mingzhe Liu, Zhizhong Ma, Wei Zhao, Xiangzheng Fu
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: npj Digital Medicine
Online Access: https://doi.org/10.1038/s41746-025-01829-2
Description
Summary: Medical image segmentation is vital for accurate diagnosis. While U-Net-based models are effective, they struggle to capture long-range dependencies in complex anatomy. We propose GH-UNet, a Group-wise Hybrid Convolution-ViT model within the U-Net framework, to address this limitation. GH-UNet integrates three components: a hybrid convolution-Transformer encoder that models both local detail and global context, a Group-wise Dynamic Gating (GDG) module for adaptive feature weighting, and a cascaded decoder for multi-scale integration. Both the encoder and the GDG module are modular, enabling compatibility with various CNN or ViT backbones. Extensive experiments on five public datasets and one private dataset show that GH-UNet consistently achieves superior performance. On ISIC2016, it outperforms H2Former by 1.37% in Dice and 1.94% in IoU while using only 38% of the parameters and 49.61% of the FLOPs. The code is freely available at https://github.com/xiachashuanghua/GH-UNet
ISSN: 2398-6352
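
Note on the metrics: the summary reports gains in Dice and IoU, the two standard overlap scores for segmentation masks. The sketch below shows how these scores are commonly computed for a pair of binary masks. It is an illustration only and is not taken from the GH-UNet repository; the function name `dice_and_iou`, the flattened-list input, and the convention for two empty masks are assumptions made for this example.

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Return (Dice, IoU) for two flattened binary masks of equal length.

    Hypothetical helper for illustration; not part of the GH-UNet code.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")

    # Pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection

    if union == 0:
        # Both masks are empty. Treating this as perfect agreement is an
        # assumed convention; other evaluations may score it differently.
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy example: six pixels, two of which are foreground in both masks.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single pair of masks the two scores are related by Dice = 2·IoU / (1 + IoU), so they always rank a pair of predictions the same way. Averaged over a dataset, however, their gains can differ, as with the 1.37% and 1.94% figures in the summary.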