Two-stage optimization based on heterogeneous branch fusion for knowledge distillation.
| Main Authors: | , , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Public Library of Science (PLoS), 2025-01-01 |
| Series: | PLoS ONE |
| Online Access: | https://doi.org/10.1371/journal.pone.0326711 |
| Summary: | Knowledge distillation transfers knowledge from a teacher model to a student model, effectively improving the student's performance. However, relying solely on the teacher's fixed knowledge for guidance leaves no room for supplementing or expanding that knowledge, which limits the student model's generalization ability. Therefore, this paper proposes two-stage optimization based on heterogeneous branch fusion for knowledge distillation (THFKD), which provides the student model with appropriate knowledge at different stages through a two-stage optimization strategy. Specifically, the pre-trained teacher offers stable and comprehensive static knowledge, preventing the student from deviating from the target early in training. Meanwhile, the student model acquires rich feature representations through heterogeneous branches and a progressive feature fusion module, generating dynamically updated collaborative learning objectives and thus effectively enhancing the diversity of dynamic knowledge. Finally, in the first stage a ramp-up schedule gradually increases the loss weight over the training period, while in the second stage consistent loss weights are applied. The two-stage optimization strategy fully exploits the advantages of each type of knowledge, thereby improving the student model's generalization ability. Although no tests of statistical significance were carried out, our experimental results on standard datasets (CIFAR-100, Tiny-ImageNet) and a long-tail dataset (CIFAR100-LT) suggest that THFKD may slightly improve the student model's classification accuracy and generalization ability. For instance, with a ResNet110 teacher and a ResNet32 student on CIFAR-100, accuracy reaches 75.41%, a 1.52% improvement over the state of the art (SOTA). |
| ISSN: | 1932-6203 |
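
The summary above describes a concrete weighting schedule: a ramp-up weight on the distillation terms during the first stage, then a constant weight during the second. A minimal sketch of such a schedule is given below, assuming a sigmoid-style ramp-up; the names `rampup_weight`, `ramp_epochs`, and `lambda_max`, and the exact way the cross-entropy, teacher-distillation, and collaborative losses are combined, are illustrative assumptions, not the paper's formulation.

```python
import math

def rampup_weight(epoch: int, ramp_epochs: int, lambda_max: float = 1.0) -> float:
    """Stage one: the weight grows smoothly from near 0 toward lambda_max
    over the first `ramp_epochs` epochs (sigmoid-style ramp-up).
    Stage two: the weight is held constant at lambda_max."""
    if epoch >= ramp_epochs:
        return lambda_max  # stage two: consistent loss weight
    phase = 1.0 - epoch / ramp_epochs
    return lambda_max * math.exp(-5.0 * phase * phase)  # stage one: ramp-up

def total_loss(ce_loss: float, kd_loss: float, collab_loss: float,
               epoch: int, ramp_epochs: int) -> float:
    """Combine the supervised loss with the static (teacher) and dynamic
    (collaborative) distillation terms under the two-stage schedule."""
    w = rampup_weight(epoch, ramp_epochs)
    return ce_loss + w * (kd_loss + collab_loss)
```

Under this sketch, with `ramp_epochs=80` for example, the distillation weight rises gradually while the student is still unreliable early in training, then stays fixed once the collaborative targets have stabilized.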