A Distillation Approach to Transformer-Based Medical Image Classification with Limited Data
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Diagnostics |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2075-4418/15/7/929 |
| Summary: | <b>Background/Objectives</b>: Although transformer-based deep learning architectures are preferred in many hybrid architectures because of their flexibility, they generally perform poorly on image classification tasks with small datasets. Distillation techniques offer an important performance improvement when transformer architectures are trained on limited data, yet their impact on classification accuracy in transformer-based models has not been extensively investigated. <b>Methods</b>: This study investigates the impact of distillation techniques on the classification performance of transformer-based deep learning architectures trained on limited data. We use the transformer-based models ViTx32 and ViTx16 without distillation, and DeiT and BeiT with distillation. A four-class dataset of brain MRI images is used for training and testing. <b>Results</b>: Our experiments show that the DeiT and BeiT architectures with distillation achieve performance gains of 2.2% and 1%, respectively, over ViTx16. A more detailed analysis shows that distillation improves the detection of non-patient individuals by about 4%. The study also reports a detailed analysis of the training times for each architecture. <b>Conclusions</b>: The experiments show that distillation techniques can significantly improve the classification accuracy of transformer-based deep learning models trained on limited data. Based on these findings, we recommend transformer-based models with distillation, especially in medical applications and other areas where flexible models must be developed from limited data. |
|---|---|
| ISSN: | 2075-4418 |
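The distillation idea the abstract refers to (as used by DeiT-style training) blends the usual cross-entropy on the true labels with a term that pulls the student's predictions toward a teacher's softened outputs. The sketch below is illustrative only, not the authors' implementation: the `T` (temperature) and `alpha` (blend weight) parameters and the four-class logits are assumed for illustration.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; T > 1 softens the distribution.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=3.0, alpha=0.5):
    """Soft-label distillation: cross-entropy on the hard labels blended with
    KL divergence from the teacher's temperature-softened predictions.
    T and alpha are assumed hyperparameters, not values from the paper."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student) per example, scaled by T^2 so gradient
    # magnitudes stay comparable across temperatures.
    kl = np.sum(
        p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)),
        axis=-1,
    )
    # Standard cross-entropy on the true class labels (T = 1).
    p_hard = softmax(student_logits, 1.0)
    ce = -np.log(p_hard[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(alpha * ce + (1.0 - alpha) * (T ** 2) * kl))

# Example with a four-class output, matching the brain-MRI setup's class count:
labels = np.array([0])                          # true class: non-patient
teacher = np.array([[4.0, 0.0, 0.0, 0.0]])      # teacher is confident in class 0
agreeing = np.array([[5.0, 0.0, 0.0, 0.0]])     # student agrees with teacher
disagreeing = np.array([[0.0, 5.0, 0.0, 0.0]])  # student picks a wrong class

loss_good = distillation_loss(agreeing, teacher, labels)
loss_bad = distillation_loss(disagreeing, teacher, labels)
```

A student whose logits agree with both the teacher and the labels incurs a lower loss than one that disagrees, which is the signal that lets distillation help when labeled data are scarce.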