Enhancing Computation-Efficiency of Deep Neural Network Processing on Edge Devices through Serial/Parallel Systolic Computing
In recent years, deep neural networks (DNNs) have addressed new applications with intelligent autonomy, often achieving higher accuracy than human experts. This capability comes at the expense of the ever-increasing complexity of emerging DNNs, causing enormous challenges while deploying on resource...
| Main Authors: | Iraj Moghaddasi, Byeong-Gyu Nam |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2024-07-01 |
| Series: | Machine Learning and Knowledge Extraction |
| Subjects: | systolic array; DNN accelerator; serial inference engine; TPU; energy efficiency |
| Online Access: | https://www.mdpi.com/2504-4990/6/3/70 |
| _version_ | 1850260436832747520 |
|---|---|
| author | Iraj Moghaddasi; Byeong-Gyu Nam |
| collection | DOAJ |
| description | In recent years, deep neural networks (DNNs) have addressed new applications with intelligent autonomy, often achieving higher accuracy than human experts. This capability comes at the expense of the ever-increasing complexity of emerging DNNs, causing enormous challenges while deploying on resource-limited edge devices. Improving the efficiency of DNN hardware accelerators by compression has been explored previously. Existing state-of-the-art studies applied approximate computing to enhance energy efficiency even at the expense of a little accuracy loss. In contrast, bit-serial processing has been used for improving the computational efficiency of neural processing without accuracy loss, exploiting a simple design, dynamic precision adjustment, and computation pruning. This research presents Serial/Parallel Systolic Array (SPSA) and Octet Serial/Parallel Systolic Array (OSPSA) processing elements for edge DNN acceleration, which exploit bit-serial processing on systolic array architecture for improving computational efficiency. For evaluation, all designs were described at the RTL level and synthesized in 28 nm technology. Post-synthesis cycle-accurate simulations of image classification over DNNs illustrated that, on average, a sample 16 × 16 systolic array indicated remarkable improvements of 17.6% and 50.6% in energy efficiency compared to the baseline, with no loss of accuracy. |
| format | Article |
| id | doaj-art-2ee3adbb86c845e6b391cab1e91cd59b |
| institution | OA Journals |
| issn | 2504-4990 |
| language | English |
| publishDate | 2024-07-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Machine Learning and Knowledge Extraction |
| spelling | doaj-art-2ee3adbb86c845e6b391cab1e91cd59b; 2025-08-20T01:55:38Z; eng; MDPI AG; Machine Learning and Knowledge Extraction; ISSN 2504-4990; 2024-07-01; vol. 6, no. 3, pp. 1484-1493; doi:10.3390/make6030070; Enhancing Computation-Efficiency of Deep Neural Network Processing on Edge Devices through Serial/Parallel Systolic Computing; Iraj Moghaddasi and Byeong-Gyu Nam (both: Department of Computer Science and Engineering, Chungnam National University, Daejeon 305-764, Republic of Korea); https://www.mdpi.com/2504-4990/6/3/70; keywords: systolic array, DNN accelerator, serial inference engine, TPU, energy efficiency |
| title | Enhancing Computation-Efficiency of Deep Neural Network Processing on Edge Devices through Serial/Parallel Systolic Computing |
| topic | systolic array; DNN accelerator; serial inference engine; TPU; energy efficiency |
| url | https://www.mdpi.com/2504-4990/6/3/70 |
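
The description above mentions bit-serial processing with dynamic precision adjustment and computation pruning. The following is a minimal Python sketch of that general idea, not the authors' SPSA or OSPSA design: one operand (the weight) is consumed one bit per step while the other (the activation) is applied in parallel. The function name `serial_parallel_mac`, the choice of which operand is serialized, and the "work step" counter are all assumptions made for illustration.

```python
def serial_parallel_mac(activations, weights, bits=8):
    """Dot product with weights consumed bit by bit (LSB first).

    Illustrative assumption: weights are signed two's-complement integers
    that fit in `bits` bits; activations are ordinary Python integers.
    Returns (accumulated result, number of steps that performed an add).
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    mask = (1 << bits) - 1
    acc = 0
    work_steps = 0
    for a, w in zip(activations, weights):
        if not lo <= w <= hi:
            raise ValueError(f"weight {w} does not fit in {bits} bits")
        pattern = w & mask  # two's-complement bit pattern of the weight
        for i in range(bits):
            if (pattern >> i) & 1 == 0:
                continue  # zero bit: nothing to add (a simple form of pruning)
            partial = a << i  # activation shifted by the bit position
            if i == bits - 1:
                acc -= partial  # the sign bit carries weight -2**(bits - 1)
            else:
                acc += partial
            work_steps += 1
    return acc, work_steps


if __name__ == "__main__":
    acts, wts = [3, -2, 5], [7, -4, 0]
    print(serial_parallel_mac(acts, wts, bits=4))
    print(serial_parallel_mac(acts, wts, bits=8))
```

In this sketch, lowering `bits` shortens the inner loop, which loosely mirrors dynamic precision adjustment, and zero bits contribute no additions. The result matches an ordinary dot product whenever the weights fit in the chosen bit width.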