Strategy-Switch: From All-Reduce to Parameter Server for Faster Efficient Training

Deep learning plays a pivotal role in numerous big data applications by enhancing the accuracy of models. However, the abundance of available data presents a challenge when training neural networks on a single node. Consequently, various distributed training methods have emerged. Among these, two prevalent approaches are All-Reduce and Parameter Server. All-Reduce, operating synchronously, faces synchronization-related bottlenecks, while the Parameter Server, often used asynchronously, can potentially compromise the model’s performance. To harness the strengths of both setups, we introduce Strategy-Switch, a hybrid approach that offers the best of both worlds, combining speed with efficiency and high-quality results. This method initiates training under the All-Reduce system and, guided by an empirical rule, transitions to asynchronous Parameter Server training once the model stabilizes. Our experimental analysis demonstrates that we can achieve comparable accuracy to All-Reduce training but with significantly accelerated training.
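
The abstract describes a two-phase control loop: train synchronously with All-Reduce until the model stabilizes, then continue asynchronously on a Parameter Server. A minimal illustrative sketch in Python of that control flow follows, assuming a simple loss-plateau criterion as a stand-in for the paper's unspecified empirical rule; the helpers train_all_reduce_epoch and train_parameter_server_epoch, the thresholds, and the simulated loss curve are hypothetical placeholders, not the authors' implementation.

from collections import deque

def has_stabilized(losses, window=5, tol=0.01):
    # Assumed stand-in rule: training counts as "stable" once the relative
    # spread of the last `window` epoch losses drops below `tol`.
    if len(losses) < window:
        return False
    recent = list(losses)[-window:]
    return (max(recent) - min(recent)) / max(recent) < tol

def train_all_reduce_epoch(epoch):
    # Hypothetical placeholder for one synchronous All-Reduce epoch;
    # returns a simulated, smoothly decaying training loss.
    return 0.1 + 0.9 * (0.7 ** epoch)

def train_parameter_server_epoch(epoch):
    # Hypothetical placeholder for one asynchronous Parameter Server epoch.
    return 0.1 + 0.9 * (0.7 ** epoch)

def strategy_switch_training(total_epochs=30):
    losses = deque(maxlen=10)
    phase = "all-reduce"
    for epoch in range(total_epochs):
        if phase == "all-reduce":
            loss = train_all_reduce_epoch(epoch)
            losses.append(loss)
            if has_stabilized(losses):
                # Model has stabilized: switch to the faster asynchronous setup.
                phase = "parameter-server"
        else:
            loss = train_parameter_server_epoch(epoch)
        print(f"epoch={epoch:02d}  phase={phase:<16}  loss={loss:.4f}")

if __name__ == "__main__":
    strategy_switch_training()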

Bibliographic Details
Main Authors: Nikodimos Provatas, Iasonas Chalas, Ioannis Konstantinou, Nectarios Koziris
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series:IEEE Access
Subjects: Deep learning, distributed systems, all-reduce, parameter server
Online Access:https://ieeexplore.ieee.org/document/10836684/
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2025.3528248
Citation: IEEE Access, vol. 13, pp. 9510-9523, 2025
Author Affiliations:
Nikodimos Provatas (ORCID: 0009-0000-0931-1479), School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
Iasonas Chalas, School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
Ioannis Konstantinou (ORCID: 0000-0002-7142-8106), Department of Informatics and Telecommunications, University of Thessaly, Lamia, Greece
Nectarios Koziris, School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece