MedFuseNet: fusing local and global deep feature representations with hybrid attention mechanisms for medical image segmentation

Bibliographic Details
Main Authors: Ruiyuan Chen, Saiqi He, Junjie Xie, Tao Wang, Yingying Xu, Jiangxiong Fang, Xiaoming Zhao, Shiqing Zhang, Guoyu Wang, Hongsheng Lu, Zhaohui Yang
Format: Article
Language:English
Published: Nature Portfolio 2025-02-01
Series:Scientific Reports
Subjects:
Online Access:https://doi.org/10.1038/s41598-025-89096-9
author Ruiyuan Chen
Saiqi He
Junjie Xie
Tao Wang
Yingying Xu
Jiangxiong Fang
Xiaoming Zhao
Shiqing Zhang
Guoyu Wang
Hongsheng Lu
Zhaohui Yang
collection DOAJ
description Abstract Medical image segmentation plays a crucial role in addressing emerging healthcare challenges. Although several impressive deep learning architectures based on convolutional neural networks (CNNs) and Transformers have recently demonstrated remarkable performance, there is still potential for further improvement due to their inherent limitations in capturing feature correlations of input data. To address this issue, this paper proposes a novel encoder-decoder architecture called MedFuseNet that fuses local and global deep feature representations with hybrid attention mechanisms for medical image segmentation. More specifically, the proposed approach contains two parallel branches for feature learning: one leverages CNNs to learn local correlations of input data, and the other utilizes Swin-Transformer to capture global contextual correlations. For feature fusion and enhancement, the designed hybrid attention mechanisms combine four different attention modules: (1) an atrous spatial pyramid pooling (ASPP) module for the CNN branch, (2) a cross attention module in the encoder for fusing local and global features, (3) an adaptive cross attention (ACA) module in skip connections for further fusion, and (4) a squeeze-and-excitation attention (SE-attention) module in the decoder for highlighting informative features. We evaluate our proposed approach on the public ACDC and Synapse datasets, achieving average DSCs of 89.73% and 78.40%, respectively. Experimental results on these two datasets demonstrate the effectiveness of our approach on medical image segmentation tasks, outperforming other state-of-the-art approaches.
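The SE-attention module named in item (4) is the standard squeeze-and-excitation operation: channel descriptors are pooled, passed through a bottleneck MLP, and used to reweight the feature map. The following is a minimal NumPy sketch of that operation for illustration only; it is not the authors' implementation, and the weight shapes (with an assumed reduction ratio) are illustrative assumptions.

```python
import numpy as np

def se_attention(x, w1, w2):
    """Squeeze-and-excitation channel gating for a feature map x of shape (C, H, W).

    Illustrative sketch only; MedFuseNet's decoder module may differ in detail.
    w1: (C // r, C) reduction weights; w2: (C, C // r) expansion weights,
    where r is an assumed channel-reduction ratio.
    """
    # Squeeze: global average pooling collapses each channel to one scalar.
    z = x.mean(axis=(1, 2))                      # shape (C,)
    # Excitation: bottleneck MLP with ReLU, then sigmoid gating per channel.
    s = np.maximum(w1 @ z, 0.0)                  # shape (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))          # shape (C,), gates in (0, 1)
    # Scale: reweight each channel of the input, highlighting informative ones.
    return x * s[:, None, None]
```

In a decoder this gating is typically applied after each upsampling stage, so that channels carrying informative features are amplified before the next fusion step.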
format Article
id doaj-art-e6d5493f3fc04fce8e00adf1a0fe5e93
institution DOAJ
issn 2045-2322
language English
publishDate 2025-02-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
affiliations Ruiyuan Chen, Junjie Xie, Tao Wang, Yingying Xu, Jiangxiong Fang, Xiaoming Zhao, Shiqing Zhang, Guoyu Wang, Hongsheng Lu, Zhaohui Yang: Taizhou Central Hospital (Taizhou University Hospital), Taizhou University; Saiqi He: Hengyang Central Hospital
title MedFuseNet: fusing local and global deep feature representations with hybrid attention mechanisms for medical image segmentation
topic Medical image segmentation
Deep learning
Convolutional neural networks
Swin-Transformer
Hybrid attention mechanisms
url https://doi.org/10.1038/s41598-025-89096-9