Mixed precision quantization based on information entropy

Abstract: Mixed precision quantization markedly reduces a model's computational and memory demands by lowering the bit width of its weights and activations. In practice, however, an improper allocation strategy can squander the advantages of quantization, wasting computational resources and degrading model performance. We propose a bit-width allocation method based on information entropy to mitigate the precision loss caused by quantization. During the model's forward pass, the entropy of each layer's output is calculated, and a sliding window smooths these entropy values. A dynamic threshold, computed from each layer's smoothed average entropy, then adaptively assigns a bit width to that layer. The threshold and the sliding-window size are treated as hyperparameters and optimized with Optuna under a model-accuracy constraint, thereby automating bit-width allocation across layers. Finally, we integrate knowledge distillation: a larger teacher model guides the training of the quantized model, transferring soft labels and deeper knowledge so that performance remains high despite compression. Experiments on ResNet20, ResNet32, and ResNet56 show that our method effectively reduces the bit width of weights and activations to 3.6M/3.6MP while maintaining model accuracy. The maximum accuracy loss on the CIFAR-100 dataset is only 0.6%, and the method achieves accuracy comparable to the full-precision model on CIFAR-10, demonstrating its effectiveness in balancing model compression and performance.
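The allocation pipeline the abstract describes has three steps: measure the entropy of each layer's output during a forward pass, smooth the per-layer values with a sliding window, and compare each smoothed value against a dynamic threshold derived from the smoothed average entropy to choose that layer's bit width. The sketch below (Python/PyTorch) is a minimal illustration of those steps, not the paper's implementation: the histogram bin count, the 3/4-bit candidate widths, and the exact threshold rule are assumptions, and the helper names are hypothetical.

```python
import torch

def layer_entropy(x: torch.Tensor, bins: int = 256) -> float:
    """Shannon entropy (in bits) of a histogram over a layer's activations."""
    hist = torch.histc(x.detach().float().cpu(), bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins so log2 stays finite
    return float(-(p * p.log2()).sum())

def smooth(entropies, window):
    """Sliding-window (moving-average) smoothing of the per-layer entropies."""
    out = []
    for i in range(len(entropies)):
        lo = max(0, i - window + 1)
        out.append(sum(entropies[lo:i + 1]) / (i + 1 - lo))
    return out

def allocate_bits(smoothed, threshold):
    """Give information-rich (high-entropy) layers the larger bit width.
    The 3/4-bit choices and the threshold-times-mean rule are assumptions."""
    mean_h = sum(smoothed) / len(smoothed)
    return [4 if h >= threshold * mean_h else 3 for h in smoothed]
```

In practice the per-layer entropies would be collected with forward hooks during a calibration pass, then fed through `smooth` and `allocate_bits` to produce the layer-wise bit-width assignment.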

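The abstract then treats the threshold and window size as hyperparameters tuned by Optuna, with model accuracy as the constraint. A plausible shape for that search is sketched below; the search ranges, the 1% accuracy tolerance, and the `evaluate_quantized` stub are assumptions, while the Optuna calls themselves (`create_study`, `suggest_float`, `suggest_int`, `TrialPruned`) are the library's real API.

```python
import optuna

def evaluate_quantized(threshold: float, window: int) -> float:
    """Stub (assumed helper): apply the entropy-based allocation with these
    hyperparameters, quantize the model, and return validation accuracy."""
    raise NotImplementedError

FULL_PRECISION_ACC = 92.0   # assumed full-precision baseline, in percent

def objective(trial: optuna.Trial) -> float:
    threshold = trial.suggest_float("threshold", 0.5, 1.5)
    window = trial.suggest_int("window", 1, 7)
    acc = evaluate_quantized(threshold, window)
    # Treat accuracy as a constraint: prune trials that fall too far
    # below the full-precision baseline.
    if acc < FULL_PRECISION_ACC - 1.0:
        raise optuna.TrialPruned()
    return acc

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```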
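Finally, a larger full-precision teacher guides the quantized student by transferring soft labels. The abstract does not give the exact distillation loss, so the sketch below uses the standard temperature-scaled formulation (a softened KL term plus hard-label cross-entropy) as a stand-in; the temperature T and mixing weight alpha are illustrative values, not the paper's.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 4.0, alpha: float = 0.7):
    # Soft-label term: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    # Hard-label term: ordinary cross-entropy against the ground truth.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```
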
Bibliographic Details
Main Authors: Ting Qin, Zhao Li, Jiaqi Zhao, Yuting Yan, Yafei Du (School of Computer Science and Technology, Shandong University of Technology)
Format: Article
Language: English
Published: Nature Portfolio, 2025-04-01
Series: Scientific Reports
ISSN: 2045-2322
Subjects: Sliding window; Information entropy; Mixed precision quantization; Knowledge distillation; Model compression
Online Access: https://doi.org/10.1038/s41598-025-91684-8