Reducing Memory and Computational Cost for Deep Neural Network Training with Quantized Parameter Updates
For embedded devices, both memory and computational efficiency are essential due to their constrained resources. However, neural network training remains both computation and memory intensive. Although many existing studies apply quantization schemes to mitigate memory overhead, they often employ stochastic rounding for both inference and gradient computation...
| Main Authors: | Leo Buron, Andreas Erbslöh, Gregor Schiele |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Graz University of Technology, 2025-08-01 |
| Series: | Journal of Universal Computer Science |
| Subjects: | deep learning; quantized parameters; edge training |
| Online Access: | https://lib.jucs.org/article/164737/download/pdf/ |
| _version_ | 1849341529494126592 |
|---|---|
| author | Leo Buron; Andreas Erbslöh; Gregor Schiele |
| author_facet | Leo Buron; Andreas Erbslöh; Gregor Schiele |
| author_sort | Leo Buron |
| collection | DOAJ |
| description | For embedded devices, both memory and computational efficiency are essential due to their constrained resources. However, neural network training remains both computation and memory intensive. Although many existing studies apply quantization schemes to mitigate memory overhead, they often employ stochastic rounding for both inference and gradient computation. Notably, no prior work has explored its advantages exclusively in parameter updates. Here, we introduce Quantized Parameter Updates (QPU), which uses stochastic rounding (SQPU) to achieve improved and more stable training outcomes. Our fixed-point quantization scheme quantizes parameters (weights and biases) upon model initialization, conducts high-precision gradient computations during training, and applies stochastically quantized updates thereafter. This approach substantially lowers memory usage and enables mostly quantized inference, thereby accelerating calculations. Furthermore, storing quantized inputs for gradient computation reduces memory demands even more. When tested on the FASHION-MNIST dataset, our method matches the Straight-Through Estimator (STE) in performance, delivering 0.92% validation accuracy while consuming just 57% of the memory during training. Accepting a slight 1.5% drop in accuracy yields a 50% memory reduction. Additional techniques include stochastic rounding in inference, the use of higher precision for parameters than for layer outputs to limit overflow, L2 regularization via weight decay, and adaptive learning-rate scheduling for improved optimization across a range of batch sizes. |
| format | Article |
| id | doaj-art-b66c0c8c23d74beabc68e628c434e11c |
| institution | Kabale University |
| issn | 0948-6968 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Graz University of Technology |
| record_format | Article |
| series | Journal of Universal Computer Science |
| spelling | doaj-art-b66c0c8c23d74beabc68e628c434e11c2025-08-20T03:43:36ZengGraz University of TechnologyJournal of Universal Computer Science0948-69682025-08-0131996397910.3897/jucs.164737164737Reducing Memory and Computational Cost for Deep Neural Network Training with Quantized Parameter UpdatesLeo Buron0Andreas Erbslöh1Gregor Schiele2University of Duisburg-EssenUniversity of Duisburg-EssenUniversity of Duisburg-EssenFor embedded devices, both memory and computational efficiency are essential due to their constrained resources. However, neural network training remains both computation and memory intensive. Although many existing studies apply quantization schemes to mitigate memory overhead, they often employ stochastic rounding for both inference and gradient computation. Notably, no prior work has explored its advantages exclusively in parameter updates. Here, we introduce Quantized Parameter Updates (QPU), which uses stochastic rounding (SQPU) to achieve improved and more stable training outcomes. Our fixed-point quantization scheme quantizes parameters (weights and biases) upon model initialization, conducts high-precision gradient computations during training, and applies stochastically quantized updates thereafter. This approach substantially lowers memory usage and enables mostly quantized inference, thereby accelerating calculations. Furthermore, storing quantized inputs for gradient computation reduces memory demands even more. When tested on the FASHION-MNIST dataset, our method matches the Straight-Through Estimator (STE) in performance, delivering 0.92% validation accuracy while consuming just 57% of the memory during training. Accepting a slight 1.5% drop in accuracy yields a 50% memory reduction. Additional techniques include stochastic rounding in inference, the use of higher precision for parameters than for layer outputs to limit overflow, L2 regularization via weight decay, and adaptive learning-rate scheduling for improved optimization across a range of batch sizes.https://lib.jucs.org/article/164737/download/pdf/deep learningquantized parametersedge training |
| spellingShingle | Leo Buron Andreas Erbslöh Gregor Schiele Reducing Memory and Computational Cost for Deep Neural Network Training with Quantized Parameter Updates Journal of Universal Computer Science deep learning quantized parameters edge training |
| title | Reducing Memory and Computational Cost for Deep Neural Network Training with Quantized Parameter Updates |
| title_full | Reducing Memory and Computational Cost for Deep Neural Network Training with Quantized Parameter Updates |
| title_fullStr | Reducing Memory and Computational Cost for Deep Neural Network Training with Quantized Parameter Updates |
| title_full_unstemmed | Reducing Memory and Computational Cost for Deep Neural Network Training with Quantized Parameter Updates |
| title_short | Reducing Memory and Computational Cost for Deep Neural Network Training with Quantized Parameter Updates |
| title_sort | reducing memory and computational cost for deep neural network training with quantized parameter updates |
| topic | deep learning quantized parameters edge training |
| url | https://lib.jucs.org/article/164737/download/pdf/ |
| work_keys_str_mv | AT leoburon reducingmemoryandcomputationalcostfordeepneuralnetworktrainingwithquantizedparameterupdates AT andreaserbsloh reducingmemoryandcomputationalcostfordeepneuralnetworktrainingwithquantizedparameterupdates AT gregorschiele reducingmemoryandcomputationalcostfordeepneuralnetworktrainingwithquantizedparameterupdates |
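The abstract above outlines the core mechanism: parameters live on a fixed-point grid, gradients are computed in high precision, and only the parameter update is snapped back to the grid with stochastic rounding. The following is a minimal sketch of that idea, not the authors' implementation; the Q8.8 fixed-point format, NumPy, plain SGD with weight decay, and all function names here are assumptions for illustration only.

```python
import numpy as np

def stochastic_round_fixed_point(x, frac_bits=8, total_bits=16):
    """Stochastically round float values onto a signed fixed-point grid.

    Values are scaled by 2**frac_bits, rounded down or up with probability
    equal to the fractional remainder, then clipped to the representable range.
    (Q8.8 / 16-bit is an assumed format, not taken from the paper.)
    """
    scale = 2.0 ** frac_bits
    scaled = x * scale
    floor = np.floor(scaled)
    prob_up = scaled - floor               # rounding is unbiased in expectation
    rounded = floor + (np.random.random_sample(x.shape) < prob_up)
    q_min = -(2 ** (total_bits - 1))
    q_max = 2 ** (total_bits - 1) - 1
    return np.clip(rounded, q_min, q_max) / scale

def quantized_parameter_update(weights_q, grad, lr=0.01, weight_decay=1e-4):
    """One SGD step where only the resulting parameters are quantized.

    The gradient and the update itself stay in floating point; the new
    weights are returned on the fixed-point grid via stochastic rounding.
    """
    update = lr * (grad + weight_decay * weights_q)   # high-precision update
    return stochastic_round_fixed_point(weights_q - update)

# Usage: weights start on the fixed-point grid and stay on it after each step.
w = stochastic_round_fixed_point(np.random.randn(128, 64).astype(np.float32))
g = np.random.randn(128, 64).astype(np.float32)       # placeholder gradient
w = quantized_parameter_update(w, g)
```

Because the probability of rounding up equals the fractional remainder, the expected quantized update matches the high-precision update, which is what makes restricting quantization to the parameter-update step plausible.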