A Hybrid Learnable Fusion of ConvNeXt and Swin Transformer for Optimized Image Classification


Bibliographic Details
Main Authors: Jaber Qezelbash-Chamak, Karen Hicklin
Format: Article
Language: English
Published: MDPI AG, 2025-05-01
Series: IoT
Subjects: machine learning; deep learning; ConvNeXt; Swin Transformer; feature fusion; biomedical informatics
Online Access: https://www.mdpi.com/2624-831X/6/2/30
Description: Medical image classification often relies on CNNs to capture local details (e.g., lesions, nodules) or on transformers to model long-range dependencies. However, each paradigm alone is limited in addressing both fine-grained structures and broader anatomical context. We propose ConvTransGFusion, a hybrid model that fuses ConvNeXt (for refined convolutional features) and Swin Transformer (for hierarchical global attention) using a learnable dual-attention gating mechanism. By aligning spatial dimensions, scaling each branch adaptively, and applying both channel and spatial attention, the proposed architecture bridges local and global representations, melding fine-grained lesion details with the broader anatomical context essential for accurate diagnosis. Tested on four diverse medical imaging datasets—including X-ray, ultrasound, and MRI scans—the proposed model consistently achieves superior accuracy, precision, recall, F1, and AUC over state-of-the-art CNNs and transformers. Our findings highlight the benefits of combining convolutional inductive biases and transformer-based global context in a single learnable framework, positioning ConvTransGFusion as a robust and versatile solution for real-world clinical applications.
Author Affiliations: Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, USA (both authors)
Institution: Kabale University
Collection: DOAJ
Record ID: doaj-art-bbfbd99c8b8c439b88c3a1de0847d783
ISSN: 2624-831X
DOI: 10.3390/iot6020030
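As an informal illustration for readers of this record, the dual-attention gated fusion summarized in the description can be sketched in a few lines of NumPy. Everything below is hypothetical: the function name, the fixed scalars `alpha`/`beta` standing in for the model's learnable per-branch scaling, and the parameter-free squeeze-style channel/spatial gates are stand-ins, not the authors' published implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dual_attention_fusion(f_cnn, f_trans, alpha=0.5, beta=0.5):
    """Illustrative gated fusion of two spatially aligned feature maps (C, H, W).

    alpha/beta stand in for learnable per-branch scaling weights; the channel
    and spatial attention gates are reduced to their simplest pooling-based
    forms. This is a sketch, not the published ConvTransGFusion architecture.
    """
    # 1) adaptive per-branch scaling, then additive merge
    fused = alpha * f_cnn + beta * f_trans           # (C, H, W)

    # 2) channel attention: global average pool -> sigmoid gate per channel
    chan_gate = sigmoid(fused.mean(axis=(1, 2)))     # (C,)
    fused = fused * chan_gate[:, None, None]

    # 3) spatial attention: channel-wise mean map -> sigmoid gate per pixel
    spat_gate = sigmoid(fused.mean(axis=0))          # (H, W)
    return fused * spat_gate[None, :, :]

# toy feature maps standing in for aligned ConvNeXt / Swin Transformer outputs
rng = np.random.default_rng(0)
f_cnn = rng.standard_normal((8, 4, 4))
f_trans = rng.standard_normal((8, 4, 4))
out = dual_attention_fusion(f_cnn, f_trans)
print(out.shape)  # (8, 4, 4)
```

In the actual model the branch weights and both attention gates would be learned end-to-end; fixed scalars and pooling-only gates are used here purely to keep the sketch self-contained.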