System-Technology Co-Optimization for Dense Edge Architectures Using 3-D Integration and Nonvolatile Memory
High-performance edge artificial intelligence (Edge-AI) inference applications aim for high energy efficiency, memory density, and small form factor, requiring a design-space exploration across the whole stack—workloads, architecture, mapping, and co-optimization with emerging technology....
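Main Authors: | Leandro M. Giacomini Rocha, Mohamed Naeim, Guilherme Paim, Moritz Brunion, Priya Venugopal, Dragomir Milojevic, James Myers, Mustafa Badaroglu, Marian Verhelst, Julien Ryckaert, Dwaipayan Biswas |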
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2024-01-01 |
Series: | IEEE Journal on Exploratory Solid-State Computational Devices and Circuits |
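Subjects: | 3-D partitioning; edge artificial intelligence (Edge-AI); nonvolatile memory (NVM); system-technology co-optimization (STCO); systolic array; voltage-gated spin-orbit torque (VGSOT) |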
Online Access: | https://ieeexplore.ieee.org/document/10750212/ |
_version_ | 1841526330302136320 |
---|---|
author | Leandro M. Giacomini Rocha; Mohamed Naeim; Guilherme Paim; Moritz Brunion; Priya Venugopal; Dragomir Milojevic; James Myers; Mustafa Badaroglu; Marian Verhelst; Julien Ryckaert; Dwaipayan Biswas |
author_facet | Leandro M. Giacomini Rocha; Mohamed Naeim; Guilherme Paim; Moritz Brunion; Priya Venugopal; Dragomir Milojevic; James Myers; Mustafa Badaroglu; Marian Verhelst; Julien Ryckaert; Dwaipayan Biswas |
author_sort | Leandro M. Giacomini Rocha |
collection | DOAJ |
description | High-performance edge artificial intelligence (Edge-AI) inference applications aim for high energy efficiency, memory density, and small form factor, requiring a design-space exploration across the whole stack: workloads, architecture, mapping, and co-optimization with emerging technology. In this article, we present a system-technology co-optimization (STCO) framework that interfaces with workload-driven system scaling challenges and physical design-enabled technology offerings. The framework is built on three engines that provide the physical design characterization, dataflow mapping optimizer, and system efficiency predictor. The framework builds on a systolic array accelerator to provide the design-technology characterization points using the advanced imec A10 nanosheet CMOS node along with emerging, high-density voltage-gated spin-orbit torque (VGSOT) magnetic memories (MRAM), combined with memory-on-logic fine-pitch 3-D wafer-to-wafer hybrid bonding. We observe that the 3-D system integration of a static random-access memory (SRAM)-based design leads to 9% power savings with 53% footprint reduction at iso-frequency with respect to a 2-D implementation for the same memory capacity. Three-dimensional nonvolatile memory (NVM)-VGSOT allows a $4\times$ memory capacity increase with 30% footprint reduction at iso-power compared with 2-D SRAM $1\times$. Our exploration with two diverse workloads, image resolution enhancement (FSRCNN) and eye tracking (EDSNet), shows that more resources allow better workload mapping possibilities, which are able to compensate for peak system energy efficiency degradation in high memory capacity cases. We show that a 25% peak efficiency reduction on a $32\times$ memory capacity can lead to a $7.4\times$ faster execution with $5.7\times$ higher effective TOPS/W than the $1\times$ memory capacity case on the same technology. |
format | Article |
id | doaj-art-226b40e97ae0451ba5bc686ec2cc116c |
institution | Kabale University |
issn | 2329-9231 |
language | English |
publishDate | 2024-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Journal on Exploratory Solid-State Computational Devices and Circuits |
spelling | doaj-art-226b40e97ae0451ba5bc686ec2cc116c; 2025-01-17T00:00:34Z; eng; IEEE; IEEE Journal on Exploratory Solid-State Computational Devices and Circuits; 2329-9231; 2024-01-01; 10; 125; 134; 10.1109/JXCDC.2024.3496118; 10750212; System-Technology Co-Optimization for Dense Edge Architectures Using 3-D Integration and Nonvolatile Memory; Leandro M. Giacomini Rocha (0) https://orcid.org/0000-0003-2883-2768; Mohamed Naeim (1) https://orcid.org/0009-0006-8202-2159; Guilherme Paim (2) https://orcid.org/0000-0001-7809-9563; Moritz Brunion (3) https://orcid.org/0000-0001-7842-7774; Priya Venugopal (4) https://orcid.org/0000-0002-9783-2713; Dragomir Milojevic (5) https://orcid.org/0000-0001-5915-5160; James Myers (6) https://orcid.org/0009-0000-2558-7504; Mustafa Badaroglu (7) https://orcid.org/0009-0006-0126-9062; Marian Verhelst (8) https://orcid.org/0000-0003-3495-9263; Julien Ryckaert (9); Dwaipayan Biswas (10) https://orcid.org/0000-0001-7912-3692; imec, Leuven, Belgium; imec, Leuven, Belgium; INESC-ID, Lisbon, Portugal; imec, Leuven, Belgium; imec, Leuven, Belgium; Bio, Electro and Mechanical Systems Department (BEAMS), Université Libre de Bruxelles, Brussels, Belgium; imec, Cambridge, U.K.; Qualcomm, San Diego, CA, USA; KU Leuven, Leuven, Belgium; imec, Leuven, Belgium; imec, Leuven, Belgium; https://ieeexplore.ieee.org/document/10750212/; 3-D partitioning; edge artificial intelligence (Edge-AI); nonvolatile memory (NVM); system-technology co-optimization (STCO); systolic array; voltage-gated spin-orbit torque (VGSOT) |
title | System-Technology Co-Optimization for Dense Edge Architectures Using 3-D Integration and Nonvolatile Memory |
title_sort | system technology co optimization for dense edge architectures using 3 d integration and nonvolatile memory |
topic | 3-D partitioning; edge artificial intelligence (Edge-AI); nonvolatile memory (NVM); system-technology co-optimization (STCO); systolic array; voltage-gated spin-orbit torque (VGSOT) |
url | https://ieeexplore.ieee.org/document/10750212/ |