Scaling of hardware-compatible perturbative training algorithms
In this work, we explore the capabilities of multiplexed gradient descent (MGD), a scalable and efficient perturbative zeroth-order training method for estimating the gradient of a loss function in hardware and training it via stochastic gradient descent. We extend the framework to include both weight and node perturbation and discuss the advantages and disadvantages of each approach. We investigate the time to train networks using MGD as a function of network size and task complexity. Previous research has suggested that perturbative training methods do not scale well to large problems since in these methods, the time to estimate the gradient scales linearly with the number of network parameters. However, in this work, we show that the time to reach a target accuracy—that is, actually solve the problem of interest—does not follow this undesirable linear scaling and in fact often decreases with network size. Furthermore, we demonstrate that MGD can be used to calculate a drop-in replacement for the gradient in stochastic gradient descent, and therefore, optimization accelerators such as momentum can be used alongside MGD, ensuring compatibility with existing machine learning practices. Our results indicate that MGD can efficiently train large networks on hardware, achieving accuracy comparable with backpropagation, thus presenting a practical solution for future neuromorphic computing systems.
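The abstract describes perturbative, zeroth-order gradient estimation by weight perturbation. As a rough illustration of that general idea — a minimal sketch, not the authors' MGD implementation — the NumPy snippet below correlates small random parameter perturbations with the resulting change in a scalar loss to form a gradient estimate. The names and parameters (`estimate_gradient_wp`, `loss_fn`, `eps`, `n_perturbations`) are assumptions made for this example and do not come from the paper.

```python
import numpy as np

def estimate_gradient_wp(loss_fn, theta, n_perturbations=64, eps=1e-3, rng=None):
    """Illustrative weight-perturbation (zeroth-order) gradient estimate."""
    rng = np.random.default_rng() if rng is None else rng
    base_loss = loss_fn(theta)
    g_hat = np.zeros_like(theta)
    for _ in range(n_perturbations):
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # random +/-1 perturbation of every parameter
        dL = loss_fn(theta + eps * delta) - base_loss      # scalar change in the loss
        g_hat += (dL / eps) * delta                        # correlate the change with the perturbation
    return g_hat / n_perturbations                         # averages toward the true gradient


if __name__ == "__main__":
    # Toy check on a quadratic loss whose exact gradient is 2 * theta.
    theta = np.array([1.0, -2.0, 0.5])
    print(estimate_gradient_wp(lambda w: np.sum(w ** 2), theta, n_perturbations=1000))  # close to [2, -4, 1]
```

Only the scalar loss needs to be read out for each perturbation, which is what makes this style of estimate attractive when the network itself lives in hardware; averaging over more perturbations reduces the variance of the estimate.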
| Main Authors: | B. G. Oripov, A. Dienstfrey, A. N. McCaughan, S. M. Buckley |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | AIP Publishing LLC, 2025-06-01 |
| Series: | APL Machine Learning |
| Online Access: | http://dx.doi.org/10.1063/5.0258271 |
| author | B. G. Oripov, A. Dienstfrey, A. N. McCaughan, S. M. Buckley |
|---|---|
| collection | DOAJ |
| description | In this work, we explore the capabilities of multiplexed gradient descent (MGD), a scalable and efficient perturbative zeroth-order training method for estimating the gradient of a loss function in hardware and training it via stochastic gradient descent. We extend the framework to include both weight and node perturbation and discuss the advantages and disadvantages of each approach. We investigate the time to train networks using MGD as a function of network size and task complexity. Previous research has suggested that perturbative training methods do not scale well to large problems since in these methods, the time to estimate the gradient scales linearly with the number of network parameters. However, in this work, we show that the time to reach a target accuracy—that is, actually solve the problem of interest—does not follow this undesirable linear scaling and in fact often decreases with network size. Furthermore, we demonstrate that MGD can be used to calculate a drop-in replacement for the gradient in stochastic gradient descent, and therefore, optimization accelerators such as momentum can be used alongside MGD, ensuring compatibility with existing machine learning practices. Our results indicate that MGD can efficiently train large networks on hardware, achieving accuracy comparable with backpropagation, thus presenting a practical solution for future neuromorphic computing systems. |
| format | Article |
| id | doaj-art-830730704ec44613b23ded2edd95b4cd |
| institution | Kabale University |
| issn | 2770-9019 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | AIP Publishing LLC |
| record_format | Article |
| series | APL Machine Learning |
| spelling | doaj-art-830730704ec44613b23ded2edd95b4cd; 2025-08-20T03:28:52Z; eng; AIP Publishing LLC; APL Machine Learning; 2770-9019; 2025-06-01; vol. 3; no. 2; 026107; 026107-15; 10.1063/5.0258271; Scaling of hardware-compatible perturbative training algorithms; B. G. Oripov (Department of Physics, University of Colorado, Boulder, Colorado 80309, USA); A. Dienstfrey (National Institute of Standards and Technology, Boulder, Colorado 80305, USA); A. N. McCaughan (National Institute of Standards and Technology, Boulder, Colorado 80305, USA); S. M. Buckley (National Institute of Standards and Technology, Boulder, Colorado 80305, USA); [abstract as in the description field]; http://dx.doi.org/10.1063/5.0258271 |
| title | Scaling of hardware-compatible perturbative training algorithms |
| url | http://dx.doi.org/10.1063/5.0258271 |
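The description field above also mentions node perturbation and the use of the perturbative estimate as a drop-in replacement for the gradient, so that accelerators such as momentum still apply. Below is a minimal sketch of both ideas for a single linear layer — again an illustrative assumption, not the paper's code; `node_perturbation_grad`, the learning rate, and the momentum coefficient are invented for this example. Node perturbation perturbs only the layer's pre-activations, so the number of perturbed quantities scales with the number of nodes rather than the number of weights.

```python
import numpy as np

def node_perturbation_grad(W, x, target, n_perturbations=64, eps=1e-3, rng=None):
    """Illustrative node-perturbation estimate of dL/dW for one linear layer y = W @ x."""
    rng = np.random.default_rng() if rng is None else rng
    loss = lambda y: 0.5 * np.sum((y - target) ** 2)   # simple squared-error loss
    y0 = W @ x
    base_loss = loss(y0)
    delta_hat = np.zeros_like(y0)                      # estimated error signal dL/dy
    for _ in range(n_perturbations):
        d = rng.choice([-1.0, 1.0], size=y0.shape)     # perturb node outputs, not individual weights
        dL = loss(y0 + eps * d) - base_loss            # scalar change in the loss
        delta_hat += (dL / eps) * d                    # correlate loss change with the perturbation
    delta_hat /= n_perturbations
    return np.outer(delta_hat, x)                      # chain rule: dL/dW = (dL/dy) x^T


if __name__ == "__main__":
    # Toy training loop: the zeroth-order estimate is used exactly where an
    # analytic gradient would go, here inside an SGD-with-momentum update.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(2, 3))
    x = np.array([0.5, -1.0, 2.0])
    target = np.array([1.0, 0.0])
    velocity = np.zeros_like(W)
    for _ in range(200):
        g_hat = node_perturbation_grad(W, x, target, rng=rng)
        velocity = 0.9 * velocity - 0.05 * g_hat       # momentum treats g_hat as a drop-in gradient
        W = W + velocity
    print(0.5 * np.sum((W @ x - target) ** 2))         # final loss should be close to zero
```

Because the update consumes only a gradient estimate, other standard optimizers could be substituted for the momentum step in the same way.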