A Distillation Approach to Transformer-Based Medical Image Classification with Limited Data
**Background/Objectives**: Although transformer-based deep learning architectures are preferred in many hybrid architectures due to their flexibility, they generally perform poorly on image classification tasks with small datasets. An important way to improve performance when transformer architectures work with limited data is the use of distillation techniques. The impact of distillation techniques on classification accuracy in transformer-based models has not yet been extensively investigated. **Methods**: This study investigates the impact of distillation techniques on the classification performance of transformer-based deep learning architectures trained on limited data. We use the transformer-based models ViTx32 and ViTx16 without distillation, and DeiT and BeiT with distillation. A four-class dataset of brain MRI images is used for training and testing. **Results**: Our experiments show that the DeiT and BeiT architectures with distillation achieve performance gains of 2.2% and 1%, respectively, compared to ViTx16. A more detailed analysis shows that the distillation techniques improve the detection of non-patient individuals by about 4%. Our study also includes a detailed analysis of the training times for each architecture. **Conclusions**: The results of the experiments show that using distillation techniques in transformer-based deep learning models can significantly improve classification accuracy when working with limited data. Based on these findings, we recommend the use of transformer-based models with distillation, especially in medical applications and other areas where flexible models are developed with limited data.
| Main Authors: | Aynur Sevinc, Murat Ucan, Buket Kaya |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Diagnostics |
| Subjects: | BeiT; classification; DeiT; distillation; transformers |
| ISSN: | 2075-4418 |
| DOI: | 10.3390/diagnostics15070929 |
| Author Affiliations: | Aynur Sevinc: Department of Computer Technologies, Silvan Vocational School, Dicle University, Diyarbakir 21640, Turkey. Murat Ucan: Department of Computer Technologies, Vocational School of Technical Sciences, Dicle University, Diyarbakir 21200, Turkey. Buket Kaya: Department of Electronics and Automation, Firat University, Elazig 23119, Turkey |
| Online Access: | https://www.mdpi.com/2075-4418/15/7/929 |
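The abstract compares two undistilled backbones (ViTx32, ViTx16) against two distilled ones (DeiT, BeiT). As a minimal sketch of what setting up that comparison can look like, the code below instantiates the four backbone families with a four-class head using the Hugging Face `transformers` library. This is not the authors' code: the library choice, the checkpoint names, and the class labels (a common four-class brain-MRI split of glioma, meningioma, pituitary, and no tumor) are assumptions for illustration only.

```python
# Minimal sketch (assumptions, not the authors' code): instantiating the four
# backbones compared in the paper with a 4-class head, via Hugging Face
# `transformers`. Checkpoint names are illustrative choices.
from transformers import (
    BeitForImageClassification,
    DeiTForImageClassificationWithTeacher,
    ViTForImageClassification,
)

NUM_CLASSES = 4  # assumed labels: glioma, meningioma, pituitary, no tumor

# ViT without distillation, patch sizes 32 and 16 ("ViTx32"/"ViTx16" in the paper).
# The ImageNet-21k checkpoints ship without a fine-tuned head, so a fresh
# 4-class head is initialized automatically.
vit32 = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch32-224-in21k",
    num_labels=NUM_CLASSES,
)
vit16 = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=NUM_CLASSES,
)

# DeiT: carries an extra distillation token trained against a CNN teacher;
# at inference the model averages its classification and distillation heads.
deit = DeiTForImageClassificationWithTeacher.from_pretrained(
    "facebook/deit-base-distilled-patch16-224",
    num_labels=NUM_CLASSES,
    ignore_mismatched_sizes=True,  # replace the 1000-class ImageNet head
)

# BeiT: pre-trained by masked image modeling over discrete visual tokens.
beit = BeitForImageClassification.from_pretrained(
    "microsoft/beit-base-patch16-224",
    num_labels=NUM_CLASSES,
    ignore_mismatched_sizes=True,  # replace the 1000-class ImageNet head
)
```

At fine-tuning time the distilled models can be trained with the same cross-entropy objective as the plain ViTs; the distillation benefit the paper measures enters through how the backbones were pre-trained (DeiT against a convolutional teacher, BeiT via masked-image-modeling on visual tokens).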