Custom Network Quantization Method for Lightweight CNN Acceleration on FPGAs

Low-bit quantization can effectively reduce the storage and computation costs of deep neural networks. However, existing quantization methods have yielded unsatisfactory results when applied to lightweight networks. Additionally, following network quantization, differences in data types...

Bibliographic Details
Main Authors: Lingjie Yi, Xianzhong Xie, Yi Wan, Bo Jiang, Junfan Chen
Format: Article
Language: English
Published: Wiley 2024-01-01
Series:International Journal of Distributed Sensor Networks
Online Access:http://dx.doi.org/10.1155/2024/8018810
author Lingjie Yi
Xianzhong Xie
Yi Wan
Bo Jiang
Junfan Chen
collection DOAJ
description Low-bit quantization can effectively reduce the storage and computation costs of deep neural networks. However, existing quantization methods have yielded unsatisfactory results when applied to lightweight networks. Additionally, following network quantization, differences in data types between operators can cause issues when deploying networks on Field Programmable Gate Arrays (FPGAs). Moreover, some operators cannot be accelerated heterogeneously on FPGAs, forcing computation tasks to switch frequently between the Advanced RISC Machine (ARM) and FPGA environments. To address these problems, this paper proposes a custom network quantization approach. First, an improved PArameterized Clipping acTivation (PACT) method is employed during quantization-aware training to restrict the value range of neural network parameters and reduce the precision loss arising from quantization. Second, a Consecutive Execution Of Convolution Operators (CEOCO) strategy is used to mitigate the resource consumption caused by frequent environment switching. The proposed approach is validated on Xilinx Zynq UltraScale+ MPSoC 3EG and Virtex UltraScale+ XCVU13P platforms, using the MobileNetv1, MobileNetv3, PPLCNet, and PPLCNetv2 networks as testbeds. Experiments were conducted on the miniImageNet, CIFAR-10, and Oxford 102 Flowers public datasets. Compared with the original models, the proposed optimization methods result in an average accuracy decrease of 1.2%. Compared with the conventional quantization method, accuracy remains almost unchanged, while the frames per second (FPS) on FPGAs improves by an average factor of 2.1.
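The abstract builds on PACT, a published quantization-aware training technique in which activations are clipped to a learnable threshold α before uniform quantization. The authors' improved variant is not detailed in this record; the sketch below illustrates only the standard PACT forward pass (clip to [0, α], then k-bit uniform quantization). The function name and parameter values are illustrative, not from the paper.

```python
import numpy as np

def pact_quantize(x, alpha, k=8):
    """Standard PACT-style activation quantization (illustrative sketch).

    Clips activations to [0, alpha] (alpha is learned during training),
    then uniformly quantizes the clipped range to k bits.
    """
    # PACT clipping: algebraically equivalent to min(max(x, 0), alpha)
    y = 0.5 * (np.abs(x) - np.abs(x - alpha) + alpha)
    # Uniform k-bit quantization of the clipped range [0, alpha]
    scale = (2 ** k - 1) / alpha
    return np.round(y * scale) / scale
```

During training, α is optimized alongside the weights (with a straight-through estimator for the rounding step), which is what lets PACT trade clipping error against quantization resolution.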
format Article
id doaj-art-aa14af564a8a4388bb4d786042dbf99e
institution DOAJ
issn 1550-1477
language English
publishDate 2024-01-01
publisher Wiley
record_format Article
series International Journal of Distributed Sensor Networks
spelling doaj-art-aa14af564a8a4388bb4d786042dbf99e | 2025-08-20T03:19:02Z | eng | Wiley | International Journal of Distributed Sensor Networks | 1550-1477 | 2024-01-01 | 2024 | 10.1155/2024/8018810 | Custom Network Quantization Method for Lightweight CNN Acceleration on FPGAs | Lingjie Yi (School of Computer Science and Technology); Xianzhong Xie (School of Computer Science and Technology); Yi Wan (School of Computer Science and Technology); Bo Jiang (School of Computer Science and Technology); Junfan Chen (Chongqing Haiyun Jiexun Technology) | http://dx.doi.org/10.1155/2024/8018810
title Custom Network Quantization Method for Lightweight CNN Acceleration on FPGAs
url http://dx.doi.org/10.1155/2024/8018810