10T SRAM Computing-in-Memory Macros for Binary and Multibit MAC Operation of DNN Edge Processors

Bibliographic Details
Main Authors: Van Truong Nguyen, Jie-Seok Kim, Jong-Wook Lee
Format: Article
Language:English
Published: IEEE 2021-01-01
Series:IEEE Access
Subjects: Computing-in-memory; static random access memory; deep neural network; machine learning; edge processor
Online Access:https://ieeexplore.ieee.org/document/9429249/
collection DOAJ
description Computing-in-memory (CIM) is a promising approach to reduce the latency and improve the energy efficiency of the multiply-and-accumulate (MAC) operation under the memory-wall constraint of artificial intelligence (AI) edge processors. This paper proposes an approach to scalable CIM design using a new ten-transistor (10T) static random access memory (SRAM) bit-cell. Using the proposed 10T SRAM bit-cell, we present two SRAM-based CIM (SRAM-CIM) macros supporting binary and multibit MAC operations. The first design achieves fully parallel computing and high throughput using 32 parallel binary MAC operations. Advanced circuit techniques, such as an input-dependent dynamic reference generator and an input-boosted sense amplifier, are presented. Fabricated in a 28-nm CMOS process, this design achieves 409.6 GOPS throughput, 1001.7 TOPS/W energy efficiency, and 169.9 TOPS/mm<sup>2</sup> throughput area efficiency. The proposed approach effectively addresses problems of previous designs, such as write disturb, limited throughput, and the power consumption of the analog-to-digital converter (ADC). The second design supports multibit MAC operation (4-b weight, 4-b input, and 8-b output) to increase inference accuracy. We propose an architecture that divides the 4-b weight and 4-b input multiplication into four parallel 2-b multiplications, which increases the signal margin by <inline-formula> <tex-math notation="LaTeX">$16\times $ </tex-math></inline-formula> compared to conventional 4-b multiplication. In addition, the capacitive digital-to-analog converter (CDAC) area issue is effectively addressed using the intrinsic bit-line capacitance already present in the SRAM-CIM architecture. The proposed approach of realizing four parallel 2-b multiplications using the CDAC is successfully demonstrated with a modified LeNet-5 neural network.
These results show that the proposed 10T bit-cell is promising for robust and scalable SRAM-CIM designs, which are essential for fully parallel edge computing.
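The multibit scheme described above rests on a simple identity: splitting each 4-b operand into two 2-b halves turns one 4-b &#215; 4-b product into four 2-b &#215; 2-b partial products recombined with binary shifts, and each partial product then spans only 16 levels instead of 256 (the source of the quoted 16&#215; signal margin). A minimal digital sketch of that arithmetic follows; it is illustrative only (the macro performs the shift-and-add in the analog domain via the CDAC), and the function and variable names are our own:

```python
def mul4b_via_2b(w: int, x: int) -> int:
    """Multiply two 4-bit values by combining four 2-bit x 2-bit
    partial products (digital equivalent of the divide-and-shift
    scheme sketched in the abstract, not the circuit itself)."""
    assert 0 <= w < 16 and 0 <= x < 16
    w_hi, w_lo = w >> 2, w & 0b11   # split 4-b weight into 2-b halves
    x_hi, x_lo = x >> 2, x & 0b11   # split 4-b input into 2-b halves
    # four 2-b x 2-b products, each bounded by 3 * 3 = 9
    # (16 levels instead of 256 for a direct 4-b x 4-b product)
    p_hh = w_hi * x_hi
    p_hl = w_hi * x_lo
    p_lh = w_lo * x_hi
    p_ll = w_lo * x_lo
    # recombine partial products with binary weighting
    return (p_hh << 4) + ((p_hl + p_lh) << 2) + p_ll

# exhaustive check: the decomposition is exact for all 256 operand pairs
assert all(mul4b_via_2b(w, x) == w * x for w in range(16) for x in range(16))
```

The exhaustive assertion at the end confirms the recombination is exact over the full 4-b operand range.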
id doaj-art-3207d958e32941b089e9f1893de91e63
institution Kabale University
issn 2169-3536
spelling "10T SRAM Computing-in-Memory Macros for Binary and Multibit MAC Operation of DNN Edge Processors," IEEE Access, vol. 9, pp. 71262-71276, 2021. DOI: 10.1109/ACCESS.2021.3079425. IEEE Xplore article number: 9429249.
Authors: Van Truong Nguyen; Jie-Seok Kim (https://orcid.org/0000-0003-3420-1145); Jong-Wook Lee (https://orcid.org/0000-0002-9160-2183).
Affiliations: Department of Electronics and Information Convergence Engineering, Information and Communication System-on-chip (SoC) Research Center, Kyung Hee University, Yongin, South Korea (Nguyen, Lee); Department of Electronics, School of Electronics and Information, Kyung Hee University, Yongin, South Korea (Kim).
topic Computing-in-memory
static random access memory
deep neural network
machine learning
edge processor