Approximation-Aware Training for Efficient Neural Network Inference on MRAM Based CiM Architecture

Convolutional neural networks (CNNs), despite their broad applications, are constrained by high computational and memory requirements. Existing compression techniques often neglect the approximation errors incurred during training. This work proposes approximation-aware training, in which groups of weights are approximated by a differentiable approximation function, yielding a new weight matrix composed of the approximation function's coefficients (AFC). The network is trained with backpropagation to minimize the loss with respect to the AFC matrix, and linear and quadratic approximation functions preserve accuracy at high compression rates. The work is extended to a compute-in-memory (CiM) architecture for inference with approximate neural networks. The architecture includes a mapping algorithm that modulates the inputs and maps the AFC directly onto crossbar arrays, eliminating the need to reconstruct the approximated weights when evaluating the output. This reduces the number of crossbars, lowering area and energy consumption. Integrating magnetic random-access memory (MRAM) based devices further improves performance by reducing latency and energy. Simulations of approximated LeNet-5, VGG8, AlexNet, and ResNet18 models trained on the CIFAR-100 dataset show reductions of 54%, 30%, 67%, and 20% in the total number of crossbars, respectively, improving area efficiency. For ResNet18, latency and energy consumption decrease by 95% and 93.3%, respectively, with spin-orbit torque (SOT) based crossbars compared to RRAM-based architectures.
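To make the training idea concrete, here is a minimal sketch (an editor's illustration, not the authors' code) of approximation-aware training with a linear approximation function: each group of g weights is replaced by a trainable coefficient pair (a, b) with w_i ≈ a + b·i, and backpropagation updates the coefficients directly because the weight reconstruction is differentiable. The class name, layer sizes, and group size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LinearAFCLayer(nn.Module):
    """Fully connected layer whose weights are generated from trainable
    approximation-function coefficients (AFC) instead of a full matrix."""

    def __init__(self, in_features: int, out_features: int, group: int = 8):
        super().__init__()
        assert in_features % group == 0
        n_groups = in_features // group
        # Two coefficients (a, b) per group replace `group` raw weights.
        self.a = nn.Parameter(torch.randn(out_features, n_groups) * 0.01)
        self.b = nn.Parameter(torch.randn(out_features, n_groups) * 0.01)
        # Fixed index vector 1..g used by the linear approximation w_i = a + b*i.
        self.register_buffer("idx", torch.arange(1, group + 1, dtype=torch.float32))

    def weight(self) -> torch.Tensor:
        # Reconstruct approximate weights w_i = a + b*i for every group.
        w = self.a.unsqueeze(-1) + self.b.unsqueeze(-1) * self.idx  # (out, n_groups, g)
        return w.reshape(self.a.shape[0], -1)                       # (out, in_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reconstruction is differentiable, so backprop trains (a, b) directly.
        return x @ self.weight().t()

layer = LinearAFCLayer(in_features=32, out_features=4, group=8)
x = torch.randn(16, 32)
loss = layer(x).pow(2).mean()
loss.backward()            # gradients land on the AFC, not on raw weights
print(layer.a.grad.shape)  # torch.Size([4, 4])
```

With a group size of 8, each row stores 8 coefficients instead of 32 weights, a 4x compression; the quadratic variant described in the abstract would simply add a third coefficient per group.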
Bibliographic Details
Main Authors: Hemkant Nehete, Sandeep Soni, Tharun Kumar Reddy Bollu, Balasubramanian Raman, Brajesh Kumar Kaushik
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Open Journal of Nanotechnology
ISSN: 2644-1292
DOI: 10.1109/OJNANO.2024.3524265
Affiliation: Indian Institute of Technology Roorkee, Roorkee, India
Subjects: Approximation-aware training; neural networks; compute-in-memory architecture; magnetic random access memory crossbars
Online Access: https://ieeexplore.ieee.org/document/10819260/