Sparse convolutional neural network acceleration with lossless input feature map compression for resource‐constrained systems

Bibliographic Details
Main Authors:	Jisu Kwon, Joonho Kong, Arslan Munir
Format:	Article
Language:	English
Published:	Wiley 2022-01-01
Series:	IET Computers & Digital Techniques
Subjects:	accelerator; compression; convolutional neural networks; field programmable gate array; input sparsity
Online Access:	https://doi.org/10.1049/cdt2.12038

Abstract

Many recent research efforts have exploited data sparsity for the acceleration of convolutional neural network (CNN) inferences. However, the effects of data transfer between main memory and the CNN accelerator have been largely overlooked. In this work, the authors propose a CNN acceleration technique that leverages hardware/software co‐design and exploits the sparsity in input feature maps (IFMs). On the software side, the authors' technique employs a novel lossless compression scheme for IFMs, which are sent to the hardware accelerator via direct memory access. On the hardware side, the authors' technique uses a CNN inference accelerator that performs convolutional layer operations with their compressed data format. With several design optimization techniques, the authors have implemented their technique in a field‐programmable gate array (FPGA) system‐on‐chip platform and evaluated their technique for six different convolutional layers in SqueezeNet. Results reveal that the authors' technique improves the performance by 1.1×–22.6× while reducing energy consumption by 47.7%–97.4% as compared to the CPU‐based execution. Furthermore, results indicate that the IFM size and transfer latency are reduced by 34.0%–85.2% and 4.4%–75.7%, respectively, compared to the case without data compression. In addition, the authors' hardware accelerator shows better performance per hardware resource with less than or comparable power consumption to the state‐of‐the‐art FPGA‐based designs.

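The record does not describe the paper's actual IFM encoding, so the following is only a minimal sketch of why lossless compression of a sparse IFM shrinks the data sent over DMA. It uses a generic bitmap-plus-nonzero-values scheme, which is a common way to encode sparse activations but is NOT claimed to be the authors' format. The function names `compress`, `decompress`, and `size_reduction` are hypothetical, and the `value_bytes=1` default assumes 8-bit activations purely for illustration.

```python
"""Generic bitmap + nonzero-values encoding for a sparse feature map.

Illustrative sketch only; NOT the compression format from the paper.
Assumes activations are stored as integers of `value_bytes` bytes each.
"""
from typing import List, Tuple


def compress(ifm: List[int]) -> Tuple[bytes, List[int]]:
    """Return (packed bitmap, nonzero values) for a flattened IFM."""
    bitmap = bytearray((len(ifm) + 7) // 8)  # 1 bit per element, rounded up to whole bytes
    nonzeros: List[int] = []
    for i, v in enumerate(ifm):
        if v != 0:
            bitmap[i // 8] |= 1 << (i % 8)  # mark position i as nonzero
            nonzeros.append(v)              # keep only the nonzero value
    return bytes(bitmap), nonzeros


def decompress(bitmap: bytes, nonzeros: List[int], length: int) -> List[int]:
    """Rebuild the original IFM exactly; zeros are restored from the bitmap."""
    out: List[int] = []
    it = iter(nonzeros)
    for i in range(length):
        if (bitmap[i // 8] >> (i % 8)) & 1:
            out.append(next(it))
        else:
            out.append(0)
    return out


def size_reduction(ifm: List[int], value_bytes: int = 1) -> float:
    """Fraction of bytes saved versus sending the raw IFM (negative = growth)."""
    bitmap, nonzeros = compress(ifm)
    raw = len(ifm) * value_bytes
    packed = len(bitmap) + len(nonzeros) * value_bytes
    return 1 - packed / raw


# Usage: 64 elements, 16 nonzero (75% zeros), 8-bit values.
ifm = [0] * 48 + list(range(1, 17))
bm, nz = compress(ifm)
assert decompress(bm, nz, len(ifm)) == ifm  # lossless round trip
print(size_reduction(ifm))  # 0.625: 8 bitmap bytes + 16 value bytes vs 64 raw bytes
```

Because every element costs one bitmap bit regardless of its value, this kind of encoding only pays off when the IFM has enough zeros; a fully dense map of 8-bit values grows by 1/8. A practical design would therefore likely need a way to fall back to the uncompressed format for dense layers.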
ISSN:	1751-8601, 1751-861X
Volume/Issue:	16(1), pp. 29–43
Affiliations:	Jisu Kwon, Joonho Kong: School of Electronic and Electrical Engineering, Kyungpook National University, Daegu, South Korea; Arslan Munir: Department of Computer Science, Kansas State University, Manhattan, Kansas, USA

Similar Items