A Hybrid Learnable Fusion of ConvNeXt and Swin Transformer for Optimized Image Classification
Medical image classification often relies on CNNs to capture local details (e.g., lesions, nodules) or on transformers to model long-range dependencies. However, each paradigm alone is limited in addressing both fine-grained structures and broader anatomical context. We propose ConvTransGFusion, a hybrid model that fuses ConvNeXt (for refined convolutional features) and Swin Transformer (for hierarchical global attention) using a learnable dual-attention gating mechanism. By aligning spatial dimensions, scaling each branch adaptively, and applying both channel and spatial attention, the proposed architecture bridges local and global representations, melding fine-grained lesion details with the broader anatomical context essential for accurate diagnosis. Tested on four diverse medical imaging datasets—including X-ray, ultrasound, and MRI scans—the proposed model consistently achieves superior accuracy, precision, recall, F1, and AUC over state-of-the-art CNNs and transformers. Our findings highlight the benefits of combining convolutional inductive biases and transformer-based global context in a single learnable framework, positioning ConvTransGFusion as a robust and versatile solution for real-world clinical applications.
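The abstract outlines the fusion recipe in three steps: align the two branches' spatial dimensions, scale each branch adaptively, then gate the combined map with both channel and spatial attention. The paper's layer sizes, learned parameters, and training details are not given in this record, so the following is only a minimal NumPy sketch of that dual-attention gating idea; the function name `dual_attention_fuse`, the tiny channel-gating MLP (`w1`, `w2`), and the nearest-neighbour upsampling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dual_attention_fuse(f_conv, f_swin, w1, w2, alpha=1.0, beta=1.0):
    """Sketch of a learnable dual-attention fusion of two feature maps.

    f_conv : (C, H, W) ConvNeXt-branch features (finer resolution)
    f_swin : (C, h, w) Swin-branch features (possibly coarser)
    w1, w2 : weights of a small channel-gating MLP (hypothetical, untrained)
    alpha, beta : per-branch scale factors (learnable scalars in the paper;
                  plain floats here)
    """
    C, H, W = f_conv.shape
    _, h, w = f_swin.shape
    # 1. Align spatial dimensions: nearest-neighbour upsample of the Swin map.
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    f_swin_up = f_swin[:, rows[:, None], cols[None, :]]     # (C, H, W)
    # 2. Scale each branch adaptively, then combine.
    mixed = alpha * f_conv + beta * f_swin_up
    # 3. Channel attention: global average pool -> tiny MLP -> sigmoid gate.
    squeeze = mixed.mean(axis=(1, 2))                       # (C,)
    gate_c = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))    # (C,)
    # 4. Spatial attention: channel-wise mean -> sigmoid gate.
    gate_s = sigmoid(mixed.mean(axis=0))                    # (H, W)
    # 5. Apply both gates to the fused map.
    return mixed * gate_c[:, None, None] * gate_s[None, :, :]
```

In a trained model `w1`, `w2`, `alpha`, and `beta` would be optimized jointly with both backbones; the sketch only shows how the aligned maps and the two attention gates compose.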
| Main Authors: | Jaber Qezelbash-Chamak, Karen Hicklin |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | IoT |
| Subjects: | machine learning; deep learning; ConvNeXt; Swin Transformer; feature fusion; biomedical informatics |
| Online Access: | https://www.mdpi.com/2624-831X/6/2/30 |
| _version_ | 1849432365833650176 |
|---|---|
| author | Jaber Qezelbash-Chamak; Karen Hicklin |
| author_facet | Jaber Qezelbash-Chamak; Karen Hicklin |
| author_sort | Jaber Qezelbash-Chamak |
| collection | DOAJ |
| description | Medical image classification often relies on CNNs to capture local details (e.g., lesions, nodules) or on transformers to model long-range dependencies. However, each paradigm alone is limited in addressing both fine-grained structures and broader anatomical context. We propose ConvTransGFusion, a hybrid model that fuses ConvNeXt (for refined convolutional features) and Swin Transformer (for hierarchical global attention) using a learnable dual-attention gating mechanism. By aligning spatial dimensions, scaling each branch adaptively, and applying both channel and spatial attention, the proposed architecture bridges local and global representations, melding fine-grained lesion details with the broader anatomical context essential for accurate diagnosis. Tested on four diverse medical imaging datasets—including X-ray, ultrasound, and MRI scans—the proposed model consistently achieves superior accuracy, precision, recall, F1, and AUC over state-of-the-art CNNs and transformers. Our findings highlight the benefits of combining convolutional inductive biases and transformer-based global context in a single learnable framework, positioning ConvTransGFusion as a robust and versatile solution for real-world clinical applications. |
| format | Article |
| id | doaj-art-bbfbd99c8b8c439b88c3a1de0847d783 |
| institution | Kabale University |
| issn | 2624-831X |
| language | English |
| publishDate | 2025-05-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | IoT |
| spelling | doaj-art-bbfbd99c8b8c439b88c3a1de0847d783; 2025-08-20T03:27:22Z; eng; MDPI AG; IoT; 2624-831X; 2025-05-01; vol. 6, no. 2, art. 30; 10.3390/iot6020030; A Hybrid Learnable Fusion of ConvNeXt and Swin Transformer for Optimized Image Classification; Jaber Qezelbash-Chamak, Karen Hicklin (both: Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611, USA); abstract as in description above; https://www.mdpi.com/2624-831X/6/2/30; machine learning; deep learning; ConvNeXt; Swin Transformer; feature fusion; biomedical informatics |
| spellingShingle | Jaber Qezelbash-Chamak; Karen Hicklin; A Hybrid Learnable Fusion of ConvNeXt and Swin Transformer for Optimized Image Classification; IoT; machine learning; deep learning; ConvNeXt; Swin Transformer; feature fusion; biomedical informatics |
| title | A Hybrid Learnable Fusion of ConvNeXt and Swin Transformer for Optimized Image Classification |
| title_full | A Hybrid Learnable Fusion of ConvNeXt and Swin Transformer for Optimized Image Classification |
| title_fullStr | A Hybrid Learnable Fusion of ConvNeXt and Swin Transformer for Optimized Image Classification |
| title_full_unstemmed | A Hybrid Learnable Fusion of ConvNeXt and Swin Transformer for Optimized Image Classification |
| title_short | A Hybrid Learnable Fusion of ConvNeXt and Swin Transformer for Optimized Image Classification |
| title_sort | hybrid learnable fusion of convnext and swin transformer for optimized image classification |
| topic | machine learning; deep learning; ConvNeXt; Swin Transformer; feature fusion; biomedical informatics |
| url | https://www.mdpi.com/2624-831X/6/2/30 |
| work_keys_str_mv | AT jaberqezelbashchamak ahybridlearnablefusionofconvnextandswintransformerforoptimizedimageclassification AT karenhicklin ahybridlearnablefusionofconvnextandswintransformerforoptimizedimageclassification AT jaberqezelbashchamak hybridlearnablefusionofconvnextandswintransformerforoptimizedimageclassification AT karenhicklin hybridlearnablefusionofconvnextandswintransformerforoptimizedimageclassification |