Mixed precision quantization based on information entropy

Abstract: Mixed precision quantization markedly reduces a model's computational and memory demands by lowering the bit width of its weights and activations. In practice, however, an improper allocation strategy can squander the advantages of quantization, wasting computational resources and degrading model performance. We propose a bit-width allocation method based on information entropy to mitigate the precision loss caused by quantization. During the model's forward pass, the entropy of each layer's output is calculated, and a sliding window smooths these entropy values. A dynamic threshold, computed from each layer's smoothed average entropy, then adaptively assigns a bit width to that layer. The threshold and the sliding-window size are treated as hyperparameters and optimized with Optuna under a model-accuracy constraint, thereby automating bit-width allocation across layers. Finally, we integrate knowledge distillation: a larger teacher model guides the training of the quantized model, transferring soft labels and deeper knowledge so that performance remains high despite compression. Experiments on ResNet20, ResNet32, and ResNet56 show that our method effectively reduces the bit width of weights and activations to 3.6M/3.6MP while maintaining model accuracy. The maximum accuracy loss on the CIFAR-100 dataset is only 0.6%, and the method achieves accuracy comparable to the full-precision model on CIFAR-10, demonstrating its effectiveness in balancing model compression and performance.
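The allocation pipeline the abstract describes has three steps: measure the entropy of each layer's output during a forward pass, smooth the per-layer values with a sliding window, and compare each smoothed value against a dynamic threshold derived from the smoothed average entropy to choose that layer's bit width. The sketch below (Python/PyTorch) is a minimal illustration of those steps, not the paper's implementation: the histogram bin count, the 3/4-bit candidate widths, and the exact threshold rule are assumptions, and the helper names are hypothetical.

```python
import torch

def layer_entropy(x: torch.Tensor, bins: int = 256) -> float:
    """Shannon entropy (in bits) of a histogram over a layer's activations."""
    hist = torch.histc(x.detach().float().cpu(), bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins so log2 stays finite
    return float(-(p * p.log2()).sum())

def smooth(entropies, window):
    """Sliding-window (moving-average) smoothing of the per-layer entropies."""
    out = []
    for i in range(len(entropies)):
        lo = max(0, i - window + 1)
        out.append(sum(entropies[lo:i + 1]) / (i + 1 - lo))
    return out

def allocate_bits(smoothed, threshold):
    """Give information-rich (high-entropy) layers the larger bit width.
    The 3/4-bit choices and the threshold-times-mean rule are assumptions."""
    mean_h = sum(smoothed) / len(smoothed)
    return [4 if h >= threshold * mean_h else 3 for h in smoothed]
```

In practice the per-layer entropies would be collected with forward hooks during a calibration pass, then fed through `smooth` and `allocate_bits` to produce the layer-wise bit-width assignment.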

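The abstract then treats the threshold and window size as hyperparameters tuned by Optuna, with model accuracy as the constraint. A plausible shape for that search is sketched below; the search ranges, the 1% accuracy tolerance, and the `evaluate_quantized` stub are assumptions, while the Optuna calls themselves (`create_study`, `suggest_float`, `suggest_int`, `TrialPruned`) are the library's real API.

```python
import optuna

def evaluate_quantized(threshold: float, window: int) -> float:
    """Stub (assumed helper): apply the entropy-based allocation with these
    hyperparameters, quantize the model, and return validation accuracy."""
    raise NotImplementedError

FULL_PRECISION_ACC = 92.0   # assumed full-precision baseline, in percent

def objective(trial: optuna.Trial) -> float:
    threshold = trial.suggest_float("threshold", 0.5, 1.5)
    window = trial.suggest_int("window", 1, 7)
    acc = evaluate_quantized(threshold, window)
    # Treat accuracy as a constraint: prune trials that fall too far
    # below the full-precision baseline.
    if acc < FULL_PRECISION_ACC - 1.0:
        raise optuna.TrialPruned()
    return acc

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```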
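Finally, a larger full-precision teacher guides the quantized student by transferring soft labels. The abstract does not give the exact distillation loss, so the sketch below uses the standard temperature-scaled formulation (a softened KL term plus hard-label cross-entropy) as a stand-in; the temperature T and mixing weight alpha are illustrative values, not the paper's.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 4.0, alpha: float = 0.7):
    # Soft-label term: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    # Hard-label term: ordinary cross-entropy against the ground truth.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```
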
Bibliographic Details
Main Authors: Ting Qin, Zhao Li, Jiaqi Zhao, Yuting Yan, Yafei Du (School of Computer Science and Technology, Shandong University of Technology)
Format: Article
Language: English
Published: Nature Portfolio, 2025-04-01
Series: Scientific Reports
ISSN: 2045-2322
Subjects: Sliding window; Information entropy; Mixed precision quantization; Knowledge distillation; Model compression
Online Access: https://doi.org/10.1038/s41598-025-91684-8