Distilling Diverse Knowledge for Deep Ensemble Learning

Bidirectional knowledge distillation improves network performance by sharing knowledge between networks while multiple networks are trained. Performance is further improved by using an ensemble of the trained networks at inference time. However, the performance gain achieved by an ensemble of networks trained with bidirectional knowledge distillation is smaller than that of a general ensemble trained without knowledge distillation. This trend suggests a relationship between network diversity, which is essential for the performance gains of ensembling, and the knowledge the networks share. We therefore present a distillation strategy that promotes network diversity for ensemble learning. Since different types of network diversity can be considered, we design loss functions that separate knowledge, and we automatically derive an effective distillation strategy for ensemble learning by running a hyperparameter search that treats these loss functions as hyperparameters. Furthermore, taking network diversity into account, we design a network compression method for the ensemble and obtain a single network whose performance is equivalent to that of the ensemble. In the experiments, we automatically design distillation strategies for ensemble learning and evaluate ensemble accuracy on five classification datasets.
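The abstract builds on bidirectional knowledge distillation (peer networks exchanging soft predictions during training, as in deep mutual learning) and on ensemble inference over the trained peers. The sketch below illustrates only that background idea in PyTorch; it is not the distillation strategy proposed in the paper, and the toy models, loss weights, temperature, and symmetric-KL formulation are assumptions chosen for illustration.

# Minimal sketch (illustrative only): bidirectional knowledge distillation
# between two peer classifiers, followed by ensemble averaging at inference.
# Models, loss weights, and the symmetric KL term are assumptions, not the
# strategy proposed in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mutual_distillation_loss(logits_a, logits_b, targets, T=3.0, alpha=0.5):
    # Supervised cross-entropy for both peers.
    ce = F.cross_entropy(logits_a, targets) + F.cross_entropy(logits_b, targets)
    # Symmetric KL between softened predictions lets the peers share knowledge.
    log_p_a = F.log_softmax(logits_a / T, dim=1)
    log_p_b = F.log_softmax(logits_b / T, dim=1)
    kl = (F.kl_div(log_p_a, log_p_b.exp(), reduction="batchmean")
          + F.kl_div(log_p_b, log_p_a.exp(), reduction="batchmean"))
    return ce + alpha * (T * T) * kl

# Two toy peer networks standing in for the ensemble members.
peers = [nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)) for _ in range(2)]
optimizer = torch.optim.SGD([p for m in peers for p in m.parameters()], lr=0.1)

x = torch.randn(8, 3, 32, 32)           # dummy image batch
y = torch.randint(0, 10, (8,))          # dummy labels

optimizer.zero_grad()
logits = [m(x) for m in peers]
loss = mutual_distillation_loss(logits[0], logits[1], y)
loss.backward()
optimizer.step()

# Ensemble inference: average the peers' softmax outputs.
with torch.no_grad():
    probs = torch.stack([F.softmax(m(x), dim=1) for m in peers]).mean(dim=0)
    predictions = probs.argmax(dim=1)

In the paper's setting, which distillation losses are applied between which networks is itself treated as a hyperparameter to be searched; the fixed symmetric term above is only a placeholder for that idea.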

Bibliographic Details
Main Authors: Naoki Okamoto, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Ensemble learning, knowledge distillation, collaborative learning, hyperparameter search, AutoML
Online Access:https://ieeexplore.ieee.org/document/11028994/
Collection: DOAJ
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2025.3578105 (IEEE Access, vol. 13, pp. 100875-100888, 2025)
Author Affiliations:
Naoki Okamoto (https://orcid.org/0000-0003-0053-8890), Department of Robotic Science and Technology, Chubu University, Kasugai-shi, Aichi, Japan
Tsubasa Hirakawa (https://orcid.org/0000-0003-3851-5221), Department of Computer Science, Chubu University, Kasugai-shi, Aichi, Japan
Takayoshi Yamashita (https://orcid.org/0000-0003-2631-9856), Department of Computer Science, Chubu University, Kasugai-shi, Aichi, Japan
Hironobu Fujiyoshi (https://orcid.org/0000-0001-7391-4725), Department of Robotic Science and Technology, Chubu University, Kasugai-shi, Aichi, Japan