Mixed precision quantization based on information entropy
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-04-01 |
| Series: | Scientific Reports |
| Subjects: | |
| Online Access: | https://doi.org/10.1038/s41598-025-91684-8 |
| Summary: | Abstract Mixed precision quantization is a technique that markedly reduces a model's computational and memory demands by lowering the bit width of the model. However, in practical applications, an improper allocation strategy can fail to leverage the advantages of quantization, wasting computational resources and degrading model performance. We propose a bit-width allocation method based on information entropy as a means of mitigating the precision loss caused by quantization. During the forward pass of the model, the entropy of each layer's output is calculated, and a sliding window is employed to smooth these entropy values. By computing a dynamic threshold from the smoothed average entropy of each layer, we adaptively allocate the bit width for each layer. Furthermore, the threshold and the sliding window size are treated as hyperparameters and are optimized by Optuna under a model-accuracy constraint, thereby automating the bit-width allocation across layers. Finally, we integrate knowledge distillation, where a larger teacher model guides the training of the quantized model, transferring soft labels and deeper knowledge so that high performance is maintained despite compression. Experiments on the ResNet20, ResNet32, and ResNet56 architectures show that our method can effectively reduce the bit width of weights and activations to 3.6M/3.6MP while maintaining model accuracy. The maximum accuracy loss of this method on the CIFAR-100 dataset is only 0.6%, and it achieves accuracy comparable to that of the full-precision model on the CIFAR-10 dataset, fully demonstrating its effectiveness in balancing model compression and performance. |
| ISSN: | 2045-2322 |
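The abstract's allocation pipeline (per-layer output entropy during a forward pass, sliding-window smoothing, and a dynamic threshold that decides each layer's bit width) can be illustrated with a minimal sketch. This is not the authors' code; names such as `compute_layer_entropies`, `window_size`, `threshold_scale`, and the 4-bit/8-bit choices are assumptions for illustration only.

```python
# Illustrative sketch of entropy-driven per-layer bit-width selection.
import numpy as np
import torch
import torch.nn as nn

def activation_entropy(x: torch.Tensor, num_bins: int = 256) -> float:
    """Shannon entropy of a layer's output, estimated from a histogram."""
    hist = torch.histc(x.detach().float().flatten(), bins=num_bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * p.log2()).sum())

def compute_layer_entropies(model: nn.Module, batch: torch.Tensor) -> list:
    """Forward hooks record the entropy of every conv/linear output."""
    entropies, handles = [], []
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            handles.append(m.register_forward_hook(
                lambda _m, _inp, out: entropies.append(activation_entropy(out))))
    with torch.no_grad():
        model(batch)
    for h in handles:
        h.remove()
    return entropies

def allocate_bit_widths(entropies, window_size=3, threshold_scale=1.0,
                        low_bits=4, high_bits=8):
    """Smooth the per-layer entropies with a sliding window, compare each
    smoothed value against a dynamic threshold (scaled mean of the smoothed
    entropies), and assign a higher or lower bit width accordingly."""
    e = np.asarray(entropies, dtype=np.float64)
    kernel = np.ones(window_size) / window_size
    smoothed = np.convolve(e, kernel, mode="same")
    threshold = threshold_scale * smoothed.mean()
    return [high_bits if s >= threshold else low_bits for s in smoothed]
```

Per the abstract, `window_size` and the threshold are treated as hyperparameters searched by Optuna with model accuracy as the constraint; in this sketch, that would amount to wrapping `allocate_bit_widths`, quantization, and evaluation inside an Optuna objective function.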