Strategy-Switch: From All-Reduce to Parameter Server for Faster Efficient Training
Deep learning plays a pivotal role in numerous big data applications by enhancing the accuracy of models. However, the abundance of available data presents a challenge when training neural networks on a single node. Consequently, various distributed training methods have emerged. Among these, two prevalent approaches are All-Reduce and Parameter Server. All-Reduce, operating synchronously, faces synchronization-related bottlenecks, while the Parameter Server, often used asynchronously, can compromise the model’s performance. To harness the strengths of both setups, we introduce Strategy-Switch, a hybrid approach that combines the speed of asynchronous training with the model quality of synchronous training. This method initiates training under the All-Reduce system and, guided by an empirical rule, transitions to asynchronous Parameter Server training once the model stabilizes. Our experimental analysis demonstrates that we can achieve accuracy comparable to All-Reduce training in significantly less training time.
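The abstract only outlines the method at a high level, so the following is a minimal, purely illustrative Python sketch of the switch logic it describes: train under synchronous All-Reduce, monitor an empirical stability signal, and hand off to asynchronous Parameter Server training once the model has stabilized. The stability test (relative improvement of a windowed mean loss), the threshold, and the `run_epoch` stub are assumptions made for illustration only; the paper's actual empirical rule and distributed backends are not given in this record.

```python
# Illustrative sketch of a Strategy-Switch style training loop.
# The switch criterion and the synthetic loss curve below are assumptions,
# NOT the rule or workloads used in the paper.
from collections import deque


def has_stabilized(losses, window=3, tol=0.01):
    """Assumed stability rule: the windowed mean loss improved by less than
    `tol` (relative) compared to the preceding window."""
    if len(losses) < 2 * window:
        return False
    values = list(losses)
    earlier = sum(values[-2 * window:-window]) / window
    recent = sum(values[-window:]) / window
    return (earlier - recent) / max(earlier, 1e-12) < tol


def run_epoch(strategy, epoch):
    """Stand-in for one distributed training epoch under `strategy`;
    returns a synthetic, exponentially flattening loss."""
    return 0.1 + 0.9 * (0.5 ** epoch)


def train(num_epochs=20):
    strategy = "all_reduce"  # start in the synchronous All-Reduce phase
    history = deque(maxlen=64)
    for epoch in range(num_epochs):
        loss = run_epoch(strategy, epoch)
        history.append(loss)
        if strategy == "all_reduce" and has_stabilized(history):
            # Model judged stable: continue asynchronously on a parameter
            # server to sidestep synchronization bottlenecks.
            strategy = "parameter_server_async"
        print(f"epoch {epoch:02d}  strategy={strategy:22s}  loss={loss:.4f}")


if __name__ == "__main__":
    train()
```

Run as a plain script, this prints the per-epoch loss and shows the strategy flipping from `all_reduce` to `parameter_server_async` once the synthetic loss flattens; in a real deployment the two phases would map onto actual synchronous and asynchronous distributed backends.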
Main Authors: | Nikodimos Provatas, Iasonas Chalas, Ioannis Konstantinou, Nectarios Koziris |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | Deep learning, distributed systems, all-reduce, parameter server |
Online Access: | https://ieeexplore.ieee.org/document/10836684/ |
_version_ | 1832592899339976704 |
---|---|
author | Nikodimos Provatas, Iasonas Chalas, Ioannis Konstantinou, Nectarios Koziris |
author_sort | Nikodimos Provatas |
collection | DOAJ |
description | Deep learning plays a pivotal role in numerous big data applications by enhancing the accuracy of models. However, the abundance of available data presents a challenge when training neural networks on a single node. Consequently, various distributed training methods have emerged. Among these, two prevalent approaches are All-Reduce and Parameter Server. All-Reduce, operating synchronously, faces synchronization-related bottlenecks, while the Parameter Server, often used asynchronously, can compromise the model’s performance. To harness the strengths of both setups, we introduce Strategy-Switch, a hybrid approach that combines the speed of asynchronous training with the model quality of synchronous training. This method initiates training under the All-Reduce system and, guided by an empirical rule, transitions to asynchronous Parameter Server training once the model stabilizes. Our experimental analysis demonstrates that we can achieve accuracy comparable to All-Reduce training in significantly less training time. |
format | Article |
id | doaj-art-f29fc539363748dc90c74386fe582886 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-f29fc539363748dc90c74386fe582886; 2025-01-21T00:02:27Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 9510–9523; doi: 10.1109/ACCESS.2025.3528248; article no. 10836684; Strategy-Switch: From All-Reduce to Parameter Server for Faster Efficient Training; Nikodimos Provatas (https://orcid.org/0009-0000-0931-1479), Iasonas Chalas, Ioannis Konstantinou (https://orcid.org/0000-0002-7142-8106), Nectarios Koziris; affiliations: School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece (Provatas, Chalas, Koziris); Department of Informatics and Telecommunications, University of Thessaly, Lamia, Greece (Konstantinou); https://ieeexplore.ieee.org/document/10836684/; Deep learning; distributed systems; all-reduce; parameter server |
title | Strategy-Switch: From All-Reduce to Parameter Server for Faster Efficient Training |
topic | Deep learning; distributed systems; all-reduce; parameter server |
url | https://ieeexplore.ieee.org/document/10836684/ |