Optimal Knowledge Distillation through Non-Heuristic Control of Dark Knowledge
In this paper, a method is introduced to control the dark knowledge values, also known as soft targets, with the purpose of improving training by knowledge distillation for multi-class classification tasks. Knowledge distillation effectively transfers knowledge from a larger model to a smaller model, achieving efficient, fast, and generalizable performance while retaining much of the original accuracy. Most deep neural models used for classification append a SoftMax layer to generate output probabilities; it is usual to take the highest score as the model's prediction, while the remaining probability values are generally ignored. The focus here is on those probabilities as carriers of dark knowledge, and the aim is to quantify the relevance of dark knowledge not heuristically, as provided in the literature so far, but with an inductive proof on the SoftMax operational limits. These limits are further pushed by using an incremental decision tree with an information-gain split. The user can set a desired precision and accuracy level to obtain a maximal temperature setting for a continual classification process. Moreover, by fitting both the hard targets and the soft targets, one obtains an optimal knowledge distillation effect that better mitigates catastrophic forgetting. The strengths of the method come from the possibility of controlling the amount of distillation transferred non-heuristically and from the agnostic, model-independent nature of the study.
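For context, the distillation setup the abstract refers to is the standard one in which a student model is trained on both the hard labels and the temperature-scaled SoftMax outputs (soft targets) of a teacher. Below is a minimal PyTorch-style sketch of that combined hard/soft-target loss; the temperature `T` and weight `alpha` are illustrative placeholders, not the non-heuristically derived values proposed in the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Combine hard-target cross-entropy with soft-target KL divergence.

    T and alpha are illustrative defaults; the paper's contribution is to
    set the (maximal) temperature non-heuristically rather than by hand.
    """
    # Hard targets: standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # Soft targets: temperature-scaled SoftMax of the teacher, matched by the
    # student via KL divergence; the T*T factor keeps the gradient scale
    # comparable to the hard-target term.
    soft_student = F.log_softmax(student_logits / T, dim=1)
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)

    return alpha * hard_loss + (1.0 - alpha) * soft_loss

# Toy usage with random logits for a 10-class problem.
if __name__ == "__main__":
    student = torch.randn(8, 10)
    teacher = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    print(distillation_loss(student, teacher, labels).item())
```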
| Main Authors: | Darian Onchis, Codruta Istin, Ioan Samuila |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-08-01 |
| Series: | Machine Learning and Knowledge Extraction |
| Subjects: | dark knowledge, knowledge distillation, clustering, incremental learning |
| Online Access: | https://www.mdpi.com/2504-4990/6/3/94 |
| ISSN: | 2504-4990 |
|---|---|
| DOI: | 10.3390/make6030094 |
| Volume / Issue / Pages: | Vol. 6, Issue 3, pp. 1921–1935 |
| Collection: | DOAJ |
| Author Affiliations: | Darian Onchis: Department of Computer Science, West University of Timisoara, 300223 Timisoara, Romania; Codruta Istin: Department of Computer and Information Technology, Politehnica University of Timisoara, 300006 Timisoara, Romania; Ioan Samuila: Department of Computer Science, West University of Timisoara, 300223 Timisoara, Romania |