An Accelerated FPGA-Based Parallel CNN-LSTM Computing Device
Recently, the combination of a convolutional neural network (CNN) and long short-term memory (LSTM) has exhibited better performance than a single network architecture. Most of these studies connect LSTM networks behind CNNs. When implemented in hardware, the current CNN-LSTM design resembles a pipeline...
Saved in:
| Main Authors: | Xin Zhou, Wei Xie, Han Zhou, Yongjing Cheng, Ximing Wang, Yun Ren, Shandong Yuan, Liuwen Li |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2024-01-01 |
| Series: | IEEE Access |
| Subjects: | CNN-LSTM, field programmable gate array (FPGA), hardware acceleration, deep learning |
| Online Access: | https://ieeexplore.ieee.org/document/10621060/ |
| author | Xin Zhou Wei Xie Han Zhou Yongjing Cheng Ximing Wang Yun Ren Shandong Yuan Liuwen Li |
| collection | DOAJ |
| description | Recently, the combination of a convolutional neural network (CNN) and long short-term memory (LSTM) has exhibited better performance than a single network architecture. Most of these studies connect LSTM networks behind CNNs. When implemented in hardware, the current CNN-LSTM design resembles a pipeline architecture. However, the classic structure leads to feature loss when data is sent to the LSTM, since the CNN is not good at extracting temporal features. At the same time, as the depth and scale increase, the amount of computation grows enormously, which makes hardware implementation difficult. Based on that, a parallel CNN-LSTM architecture is proposed, in which the two networks extract features from the input data synchronously; this is shown to be more effective than the classic CNN-LSTM. This paper designs a parallel CNN-LSTM computing device based on an FPGA. The device is divided into a control unit and an operation unit. A control stream and a data stream are transported between the two units, ensuring the proper running of the device. A highly parallel multi-channel convolution layer and pooling layer are designed to improve calculation efficiency. A 4-stage pipeline structure is adopted to implement the LSTM part. This paper makes full use of on-chip BRAM to design a look-up table for activation-function approximation, reducing resource consumption by 95% compared with traditional polynomial approximation. Finally, we verify our device under cooperative spectrum sensing (CSS) and handwritten classification scenarios. The proposed device achieves higher accuracy in both scenarios than the classic CNN-LSTM structure, as well as faster calculation speed, and the overall project power is kept below 2 W. The scalability and limitations of this computing device are also discussed. |
| format | Article |
| id | doaj-art-511f428de0b148c19a8cc6a0cbba1952 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2024-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | IEEE Access (ISSN 2169-3536), vol. 12, pp. 106579-106592, 2024-01-01. doi:10.1109/ACCESS.2024.3437663, IEEE document 10621060. An Accelerated FPGA-Based Parallel CNN-LSTM Computing Device. Xin Zhou (ORCID 0009-0008-4422-1020), Wei Xie, Han Zhou, Yongjing Cheng, Ximing Wang (ORCID 0000-0003-2216-9352), Yun Ren, Shandong Yuan, Liuwen Li (ORCID 0009-0000-8167-9315); all authors: College of Information and Communication, National University of Defense Technology, Wuhan, Hubei, China. |
| title | An Accelerated FPGA-Based Parallel CNN-LSTM Computing Device |
| topic | CNN-LSTM field programmable gate array (FPGA) hardware acceleration deep learning |
| url | https://ieeexplore.ieee.org/document/10621060/ |
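The abstract's activation-function technique (a BRAM-backed look-up table replacing polynomial approximation) can be illustrated with a small sketch. This is not the authors' implementation: the table size (1024 entries) and input range are assumptions chosen for the example; an FPGA design would hold the pre-computed table in on-chip BRAM and read it in one cycle.

```python
import numpy as np

# Assumed parameters for illustration only.
LUT_BITS = 10                 # 2^10 = 1024 entries -> fits one small BRAM
IN_MIN, IN_MAX = -8.0, 8.0    # sigmoid saturates to ~0 / ~1 outside this range

# Pre-compute the table once (done offline for a hardware design).
_x = np.linspace(IN_MIN, IN_MAX, 2 ** LUT_BITS)
_SIGMOID_LUT = 1.0 / (1.0 + np.exp(-_x))

def sigmoid_lut(x: np.ndarray) -> np.ndarray:
    """Approximate sigmoid(x): clamp the input, scale it to a table index, read the table."""
    frac = (np.clip(x, IN_MIN, IN_MAX) - IN_MIN) / (IN_MAX - IN_MIN)
    idx = np.round(frac * (2 ** LUT_BITS - 1)).astype(int)
    return _SIGMOID_LUT[idx]

# LSTM gates (and tanh, via a second table) would read such values instead
# of evaluating a multi-term polynomial per sample.
print(sigmoid_lut(np.array([-3.0, 0.0, 3.0])))  # values close to the exact sigmoid
```

With 1024 entries over [-8, 8] the step is about 0.016, so the worst-case error is a few thousandths; accuracy trades directly against table size, which is the knob the paper's 95% resource saving turns on.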