Enhancing Computation-Efficiency of Deep Neural Network Processing on Edge Devices through Serial/Parallel Systolic Computing

Bibliographic Details
Main Authors: Iraj Moghaddasi, Byeong-Gyu Nam
Format: Article
Language: English
Published: MDPI AG 2024-07-01
Series: Machine Learning and Knowledge Extraction
Subjects: systolic array; DNN accelerator; serial inference engine; TPU; energy efficiency
Online Access: https://www.mdpi.com/2504-4990/6/3/70
author Iraj Moghaddasi
Byeong-Gyu Nam
collection DOAJ
description In recent years, deep neural networks (DNNs) have enabled new applications with intelligent autonomy, often achieving higher accuracy than human experts. This capability comes at the cost of ever-increasing model complexity, which makes deployment on resource-limited edge devices highly challenging. Improving the efficiency of DNN hardware accelerators through compression has been explored previously. Existing state-of-the-art studies have applied approximate computing to improve energy efficiency at the cost of a small accuracy loss. In contrast, bit-serial processing has been used to improve the computational efficiency of neural processing without accuracy loss, exploiting a simple design, dynamic precision adjustment, and computation pruning. This research presents Serial/Parallel Systolic Array (SPSA) and Octet Serial/Parallel Systolic Array (OSPSA) processing elements for edge DNN acceleration, which apply bit-serial processing to a systolic array architecture to improve computational efficiency. For evaluation, all designs were described at the register-transfer level (RTL) and synthesized in 28 nm technology. Post-synthesis cycle-accurate simulations of image classification over DNNs showed that, on average, a sample 16 × 16 systolic array improved energy efficiency by 17.6% and 50.6% compared to the baseline, with no loss of accuracy.
format Article
id doaj-art-2ee3adbb86c845e6b391cab1e91cd59b
institution OA Journals
issn 2504-4990
language English
publishDate 2024-07-01
publisher MDPI AG
record_format Article
series Machine Learning and Knowledge Extraction
spelling doaj-art-2ee3adbb86c845e6b391cab1e91cd59b
2025-08-20T01:55:38Z
eng
MDPI AG
Machine Learning and Knowledge Extraction
2504-4990
2024-07-01
Volume 6, Issue 3, pp. 1484-1493
10.3390/make6030070
Iraj Moghaddasi (Department of Computer Science and Engineering, Chungnam National University, Daejeon 305-764, Republic of Korea)
Byeong-Gyu Nam (Department of Computer Science and Engineering, Chungnam National University, Daejeon 305-764, Republic of Korea)
title Enhancing Computation-Efficiency of Deep Neural Network Processing on Edge Devices through Serial/Parallel Systolic Computing
topic systolic array
DNN accelerator
serial inference engine
TPU
energy efficiency
url https://www.mdpi.com/2504-4990/6/3/70
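
The description above mentions bit-serial processing with dynamic precision adjustment and computation pruning. As a rough illustration of that general idea only, the following Python sketch models a serial/parallel multiply-accumulate: weights are held in parallel, activations are consumed one bit per cycle (least significant bit first), and zero bits contribute no addition. This is not the authors' SPSA or OSPSA design. The function names `bit_serial_mac` and `required_precision`, the restriction to unsigned activations, and the cycle model are all assumptions made for illustration.

```python
def required_precision(activations):
    """Smallest bit width that represents every (unsigned) activation.

    Hypothetical helper: models dynamic precision adjustment by letting
    the caller run only as many serial cycles as the data needs.
    """
    return max(1, max(activations, default=0).bit_length())


def bit_serial_mac(weights, activations, precision=8):
    """Dot product computed one activation bit per cycle, LSB first.

    Assumptions (not taken from the paper): activations are unsigned
    integers below 2**precision; weights are ordinary signed integers
    applied in parallel each cycle.
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have equal length")
    for a in activations:
        if a < 0 or a >= (1 << precision):
            raise ValueError("activation does not fit in the given precision")

    acc = 0
    for cycle in range(precision):
        partial = 0
        for w, a in zip(weights, activations):
            # A zero bit adds nothing, so no work is done for it here.
            if (a >> cycle) & 1:
                partial += w
        # Weight the partial sum by the significance of the current bit.
        acc += partial << cycle
    return acc


if __name__ == "__main__":
    w = [3, -2, 5]
    x = [7, 1, 4]
    bits = required_precision(x)
    print(bits, bit_serial_mac(w, x, precision=bits))
```

In this sketch the result matches an ordinary dot product for any precision wide enough to hold the activations; narrowing the precision only reduces the number of serial cycles, which is the intuition behind trading cycles for precision without losing accuracy.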