Leveraging logit uncertainty for better knowledge distillation
Abstract: Knowledge distillation improves student model performance. However, using a larger teacher model does not necessarily yield better distillation gains, because of the significant architecture and output gaps between the teacher and smaller student networks. To address this issue, we reconsider teacher outputs and f...
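For context, the abstract refers to logit-based knowledge distillation. Below is a minimal sketch of the conventional (Hinton-style) distillation loss, assuming a PyTorch setup; it is background only and does not implement the logit-uncertainty weighting the article proposes. The function name `distillation_loss` and the hyperparameters `temperature` and `alpha` are illustrative, not taken from the paper.

```python
# Minimal sketch of standard (Hinton-style) knowledge distillation, shown for
# background only; the paper's logit-uncertainty method is NOT reproduced here.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 4.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Combine a hard-label cross-entropy term with a soft-label KL term."""
    # Soften both distributions with the temperature; a higher T spreads
    # probability mass over non-target classes ("dark knowledge").
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)

    # KL divergence between teacher and student soft distributions, scaled by
    # T^2 so gradient magnitudes stay comparable across temperatures.
    kd_term = F.kl_div(soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2

    # Ordinary cross-entropy against the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)

    return alpha * kd_term + (1.0 - alpha) * ce_term

# Toy usage: random logits for a batch of 8 examples over 10 classes.
if __name__ == "__main__":
    student = torch.randn(8, 10)
    teacher = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    print(distillation_loss(student, teacher, labels).item())
```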
| Main Authors: | Zhen Guo, Dong Wang, Qiang He, Pengzhou Zhang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2024-12-01 |
| Series: | Scientific Reports |
| Subjects: | |
| Online Access: | https://doi.org/10.1038/s41598-024-82647-6 |
Similar Items
- Optimal Knowledge Distillation through Non-Heuristic Control of Dark Knowledge
  by: Darian Onchis, et al.
  Published: (2024-08-01)
- A Review of Knowledge Distillation in Object Detection
  by: Shengjie Cheng, et al.
  Published: (2025-01-01)
- Decoupled Time-Dimensional Progressive Self-Distillation With Knowledge Calibration for Edge Computing-Enabled AIoT
  by: Yingchao Wang, et al.
  Published: (2024-01-01)
- Autocorrelation Matrix Knowledge Distillation: A Task-Specific Distillation Method for BERT Models
  by: Kai Zhang, et al.
  Published: (2024-10-01)
- Aligning to the teacher: multilevel feature-aligned knowledge distillation
  by: Yang Zhang, et al.
  Published: (2025-08-01)