Optimal Knowledge Distillation through Non-Heuristic Control of Dark Knowledge

In this paper, a method is introduced to control the dark knowledge values, also known as soft targets, with the purpose of improving training by knowledge distillation for multi-class classification tasks. Knowledge distillation effectively transfers knowledge from a larger model to a smaller model to achieve efficient, fast, and generalizable performance while retaining much of the original accuracy. Most deep neural models used for classification tasks append a SoftMax layer to generate output probabilities, and it is usual to take the highest score as the model's prediction while the remaining probability values are ignored. The focus here is on those probabilities as carriers of dark knowledge, and the aim is to quantify the relevance of dark knowledge not heuristically, as provided in the literature so far, but with an inductive proof on the SoftMax operational limits. These limits are further pushed by using an incremental decision tree with an information gain split. The user can set a desired precision and accuracy level to obtain a maximal temperature setting for a continual classification process. Moreover, by fitting both the hard targets and the soft targets, one obtains an optimal knowledge distillation effect that better mitigates catastrophic forgetting. The strengths of the method come from the possibility of controlling the amount of distillation transferred non-heuristically and from the agnostic, model-independent nature of the study.
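
The control of dark knowledge described above builds on the standard distillation recipe of temperature-scaled SoftMax soft targets fitted jointly with hard targets. As background only, a minimal sketch of that standard formulation (Hinton-style) is given below; the temperature T, weight alpha, and PyTorch usage are illustrative assumptions, not the non-heuristically derived maximal temperature or the incremental decision tree procedure of the paper.

    # Minimal sketch of the standard hard + soft target distillation loss.
    # T and alpha are illustrative placeholders, not the paper's derived values.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, hard_labels, T=4.0, alpha=0.5):
        # Hard-target term: ordinary cross-entropy against the ground-truth labels.
        hard_loss = F.cross_entropy(student_logits, hard_labels)

        # Soft targets: temperature-scaled teacher SoftMax. Raising T flattens the
        # distribution and exposes the dark knowledge in the non-maximal classes.
        soft_targets = F.softmax(teacher_logits / T, dim=1)
        log_student = F.log_softmax(student_logits / T, dim=1)

        # Soft-target term: KL divergence at temperature T; the T**2 factor keeps
        # its gradient magnitude comparable to the hard-target term.
        soft_loss = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T ** 2)

        # Fitting both hard and soft targets mixes the two terms.
        return alpha * hard_loss + (1.0 - alpha) * soft_loss

    if __name__ == "__main__":
        # Shapes only; the random values are meaningless and purely for demonstration.
        teacher_logits = torch.randn(32, 10)
        student_logits = torch.randn(32, 10, requires_grad=True)
        labels = torch.randint(0, 10, (32,))
        print(distillation_loss(student_logits, teacher_logits, labels).item())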

Bibliographic Details
Main Authors: Darian Onchis (Department of Computer Science, West University of Timisoara), Codruta Istin (Department of Computer and Information Technology, Politehnica University of Timisoara), Ioan Samuila (Department of Computer Science, West University of Timisoara)
Format: Article
Language: English
Published: MDPI AG, 2024-08-01
Series: Machine Learning and Knowledge Extraction, Vol. 6, No. 3, pp. 1921–1935
ISSN: 2504-4990
DOI: 10.3390/make6030094
Subjects: dark knowledge; knowledge distillation; clustering; incremental learning
Online Access: https://www.mdpi.com/2504-4990/6/3/94