10T SRAM Computing-in-Memory Macros for Binary and Multibit MAC Operation of DNN Edge Processors

Bibliographic Details
Main Authors: Van Truong Nguyen, Jie-Seok Kim, Jong-Wook Lee
Format: Article
Language:English
Published: IEEE 2021-01-01
Series:IEEE Access
Subjects: Computing-in-memory; static random access memory; deep neural network; machine learning; edge processor
Online Access:https://ieeexplore.ieee.org/document/9429249/
collection DOAJ
description Computing-in-memory (CIM) is a promising approach to reduce the latency and improve the energy efficiency of the multiply-and-accumulate (MAC) operation under the memory-wall constraint of artificial intelligence (AI) edge processors. This paper proposes an approach to scalable CIM design using a new ten-transistor (10T) static random access memory (SRAM) bit-cell. Using the proposed 10T SRAM bit-cell, we present two SRAM-based CIM (SRAM-CIM) macros supporting binary and multibit MAC operations. The first design achieves fully parallel computing and high throughput using 32 parallel binary MAC operations. Advanced circuit techniques, such as an input-dependent dynamic reference generator and an input-boosted sense amplifier, are presented. Fabricated in a 28-nm CMOS process, this design achieves 409.6 GOPS throughput, 1001.7 TOPS/W energy efficiency, and 169.9 TOPS/mm<sup>2</sup> throughput area efficiency. The proposed approach effectively addresses problems of previous designs, such as write disturb, limited throughput, and the power consumption of the analog-to-digital converter (ADC). The second design supports multibit MAC operation (4-b weight, 4-b input, and 8-b output) to increase inference accuracy. We propose an architecture that divides the 4-b weight and 4-b input multiplication into four parallel 2-b multiplications, which increases the signal margin by <inline-formula> <tex-math notation="LaTeX">$16\times $ </tex-math></inline-formula> compared to conventional 4-b multiplication. In addition, the capacitive digital-to-analog converter (CDAC) area issue is effectively addressed using the intrinsic bit-line capacitance already present in the SRAM-CIM architecture. The proposed approach of realizing four parallel 2-b multiplications using the CDAC is successfully demonstrated with a modified LeNet-5 neural network.
These results show that the proposed 10T bit-cell is promising for robust and scalable SRAM-CIM designs, which are essential for fully parallel edge computing.
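The multibit scheme described above rests on a simple identity: splitting each 4-b operand into two 2-b halves turns one 4-b &#215; 4-b product into four 2-b &#215; 2-b partial products recombined with binary shifts, and each partial product then spans only 16 levels instead of 256 (the source of the quoted 16&#215; signal margin). A minimal digital sketch of that arithmetic follows; it is illustrative only (the macro performs the shift-and-add in the analog domain via the CDAC), and the function and variable names are our own:

```python
def mul4b_via_2b(w: int, x: int) -> int:
    """Multiply two 4-bit values by combining four 2-bit x 2-bit
    partial products (digital equivalent of the divide-and-shift
    scheme sketched in the abstract, not the circuit itself)."""
    assert 0 <= w < 16 and 0 <= x < 16
    w_hi, w_lo = w >> 2, w & 0b11   # split 4-b weight into 2-b halves
    x_hi, x_lo = x >> 2, x & 0b11   # split 4-b input into 2-b halves
    # four 2-b x 2-b products, each bounded by 3 * 3 = 9
    # (16 levels instead of 256 for a direct 4-b x 4-b product)
    p_hh = w_hi * x_hi
    p_hl = w_hi * x_lo
    p_lh = w_lo * x_hi
    p_ll = w_lo * x_lo
    # recombine partial products with binary weighting
    return (p_hh << 4) + ((p_hl + p_lh) << 2) + p_ll

# exhaustive check: the decomposition is exact for all 256 operand pairs
assert all(mul4b_via_2b(w, x) == w * x for w in range(16) for x in range(16))
```

The exhaustive assertion at the end confirms the recombination is exact over the full 4-b operand range.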
id doaj-art-3207d958e32941b089e9f1893de91e63
institution Kabale University
issn 2169-3536
spelling "10T SRAM Computing-in-Memory Macros for Binary and Multibit MAC Operation of DNN Edge Processors," IEEE Access, vol. 9, pp. 71262-71276, 2021. DOI: 10.1109/ACCESS.2021.3079425. IEEE Xplore article number: 9429249.
Authors: Van Truong Nguyen; Jie-Seok Kim (https://orcid.org/0000-0003-3420-1145); Jong-Wook Lee (https://orcid.org/0000-0002-9160-2183).
Affiliations: Department of Electronics and Information Convergence Engineering, Information and Communication System-on-chip (SoC) Research Center, Kyung Hee University, Yongin, South Korea (Nguyen, Lee); Department of Electronics, School of Electronics and Information, Kyung Hee University, Yongin, South Korea (Kim).
topic Computing-in-memory
static random access memory
deep neural network
machine learning
edge processor