Leveraging logit uncertainty for better knowledge distillation

Abstract: Knowledge distillation improves student model performance. However, using a larger teacher model does not necessarily result in better distillation gains, due to significant architecture and output gaps with smaller student networks. To address this issue, we reconsider teacher outputs and f...
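For context, the abstract builds on standard logit-based knowledge distillation. The following is a minimal sketch of that baseline objective (softened teacher and student logits compared with KL divergence), not the uncertainty-weighted method proposed in the paper; the function name kd_loss, the temperature T, and the weight alpha are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        """Baseline knowledge-distillation loss: distillation term plus hard-label cross-entropy."""
        # Soften both distributions with temperature T.
        soft_student = F.log_softmax(student_logits / T, dim=1)
        soft_teacher = F.softmax(teacher_logits / T, dim=1)
        # KL divergence between softened distributions, scaled by T^2 as is standard.
        distill = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)
        # Supervised loss on the ground-truth labels.
        ce = F.cross_entropy(student_logits, labels)
        return alpha * distill + (1.0 - alpha) * ce

    # Example usage (batch of 8 samples, 10 classes):
    # student = torch.randn(8, 10); teacher = torch.randn(8, 10)
    # labels = torch.randint(0, 10, (8,))
    # loss = kd_loss(student, teacher, labels)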


Bibliographic Details
Main Authors: Zhen Guo, Dong Wang, Qiang He, Pengzhou Zhang
Format: Article
Language: English
Published: Nature Portfolio 2024-12-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-024-82647-6