Distilling Diverse Knowledge for Deep Ensemble Learning
Bidirectional knowledge distillation improves network performance by sharing knowledge between networks during the training of multiple networks. Additionally, performance is further improved by using an ensemble of multiple networks during inference. However, the performance improvement achieved by...
Saved in:
| Main Authors: | Naoki Okamoto, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Ensemble learning, knowledge distillation, collaborative learning, hyperparameter search, AutoML |
| Online Access: | https://ieeexplore.ieee.org/document/11028994/ |
| _version_ | 1850161248098844672 |
|---|---|
| author | Naoki Okamoto, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi |
| author_facet | Naoki Okamoto, Tsubasa Hirakawa, Takayoshi Yamashita, Hironobu Fujiyoshi |
| author_sort | Naoki Okamoto |
| collection | DOAJ |
| description | Bidirectional knowledge distillation improves network performance by sharing knowledge among networks while multiple networks are trained. Performance is improved further by using an ensemble of these networks at inference. However, the improvement achieved by an ensemble of networks trained with bidirectional knowledge distillation is smaller than that of a conventional ensemble trained without knowledge distillation. From this trend, we hypothesize a relationship between network diversity, which is essential for improving performance through ensembling, and the knowledge the networks share. We therefore present a distillation strategy that promotes network diversity for ensemble learning. Because several types of network diversity can be considered, we design loss functions that separate knowledge and automatically derive an effective distillation strategy for ensemble learning through a hyperparameter search that treats these loss functions as hyperparameters. Furthermore, taking network diversity into account, we design a network compression method for the ensemble and obtain a single network whose performance is equivalent to that of the ensemble. In the experiments, we automatically design distillation strategies for ensemble learning and evaluate ensemble accuracy on five classification datasets. (An illustrative loss sketch follows the record fields below.) |
| format | Article |
| id | doaj-art-89644c16012a4ce4af69c47f0cf10e26 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-89644c16012a4ce4af69c47f0cf10e26; indexed 2025-08-20T02:22:55Z; English; IEEE; IEEE Access; ISSN 2169-3536; published 2025-01-01; vol. 13, pp. 100875-100888; DOI 10.1109/ACCESS.2025.3578105; IEEE document 11028994; Distilling Diverse Knowledge for Deep Ensemble Learning; Naoki Okamoto (https://orcid.org/0000-0003-0053-8890), Department of Robotic Science and Technology, Chubu University, Kasugai-shi, Aichi, Japan; Tsubasa Hirakawa (https://orcid.org/0000-0003-3851-5221), Department of Computer Science, Chubu University, Kasugai-shi, Aichi, Japan; Takayoshi Yamashita (https://orcid.org/0000-0003-2631-9856), Department of Computer Science, Chubu University, Kasugai-shi, Aichi, Japan; Hironobu Fujiyoshi (https://orcid.org/0000-0001-7391-4725), Department of Robotic Science and Technology, Chubu University, Kasugai-shi, Aichi, Japan; abstract as in the description field above; https://ieeexplore.ieee.org/document/11028994/; Ensemble learning; knowledge distillation; collaborative learning; hyperparameter search; AutoML |
| spellingShingle | Naoki Okamoto; Tsubasa Hirakawa; Takayoshi Yamashita; Hironobu Fujiyoshi; Distilling Diverse Knowledge for Deep Ensemble Learning; IEEE Access; Ensemble learning; knowledge distillation; collaborative learning; hyperparameter search; AutoML |
| title | Distilling Diverse Knowledge for Deep Ensemble Learning |
| title_full | Distilling Diverse Knowledge for Deep Ensemble Learning |
| title_fullStr | Distilling Diverse Knowledge for Deep Ensemble Learning |
| title_full_unstemmed | Distilling Diverse Knowledge for Deep Ensemble Learning |
| title_short | Distilling Diverse Knowledge for Deep Ensemble Learning |
| title_sort | distilling diverse knowledge for deep ensemble learning |
| topic | Ensemble learning, knowledge distillation, collaborative learning, hyperparameter search, AutoML |
| url | https://ieeexplore.ieee.org/document/11028994/ |
| work_keys_str_mv | AT naokiokamoto distillingdiverseknowledgefordeepensemblelearning AT tsubasahirakawa distillingdiverseknowledgefordeepensemblelearning AT takayoshiyamashita distillingdiverseknowledgefordeepensemblelearning AT hironobufujiyoshi distillingdiverseknowledgefordeepensemblelearning |
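
The description field above outlines the training scheme only at a high level: each ensemble member is trained with its own task loss, a bidirectional distillation term toward its peers, and diversity-promoting terms whose weights are chosen by hyperparameter search. Below is a minimal PyTorch-style sketch of that idea, assuming a standard mutual-distillation KL term and a simple pairwise-similarity penalty as the diversity term; the function name `ensemble_training_loss` and the weights `alpha` and `beta` are illustrative placeholders, not the paper's actual loss functions or search space.

```python
# Hypothetical sketch: bidirectional (mutual) distillation plus a diversity penalty.
# alpha and beta stand in for the weights the abstract's hyperparameter search would tune.
import torch
import torch.nn.functional as F


def ensemble_training_loss(logits_list, targets, temperature=3.0, alpha=1.0, beta=0.1):
    """logits_list: one [batch, classes] logit tensor per ensemble member."""
    n = len(logits_list)
    probs = [F.softmax(z, dim=1) for z in logits_list]

    # Per-network task loss (standard cross-entropy, averaged over members).
    task_loss = sum(F.cross_entropy(z, targets) for z in logits_list) / n

    # Bidirectional distillation: each member mimics every peer's softened output.
    distill_loss = 0.0
    for i in range(n):
        log_p_i = F.log_softmax(logits_list[i] / temperature, dim=1)
        for j in range(n):
            if i != j:
                p_j = F.softmax(logits_list[j] / temperature, dim=1).detach()
                distill_loss += F.kl_div(log_p_i, p_j, reduction="batchmean") * temperature ** 2
    distill_loss = distill_loss / (n * (n - 1))

    # Diversity-promoting term: penalize pairwise similarity of the members' predictions,
    # one simple way to "separate" the knowledge the members hold.
    similarity = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            similarity += F.cosine_similarity(probs[i], probs[j], dim=1).mean()
    similarity = similarity / (n * (n - 1) / 2)

    return task_loss + alpha * distill_loss + beta * similarity


# Example usage with two tiny classifiers on random data (shapes only, for illustration):
# nets = [torch.nn.Linear(32, 10), torch.nn.Linear(32, 10)]
# x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
# loss = ensemble_training_loss([net(x) for net in nets], y)
# loss.backward()
```

In the search the abstract describes, quantities like `alpha`, `beta`, and the choice of which knowledge-separating losses to enable would be the hyperparameters being optimized; here they are fixed constants for illustration only.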