Strategy-Switch: From All-Reduce to Parameter Server for Faster Efficient Training
Deep learning plays a pivotal role in numerous big data applications by enhancing the accuracy of models. However, the abundance of available data presents a challenge when training neural networks on a single node. Consequently, various distributed training methods have emerged. Among these, two prevalent approaches are All-Reduce and Parameter Server. All-Reduce, operating synchronously, faces synchronization-related bottlenecks, while the Parameter Server, often used asynchronously, can compromise the model’s performance. To harness the strengths of both setups, we introduce Strategy-Switch, a hybrid approach that combines the speed of asynchronous training with the model quality of synchronous training. This method initiates training under the All-Reduce system and, guided by an empirical rule, transitions to asynchronous Parameter Server training once the model stabilizes. Our experimental analysis demonstrates that we can achieve accuracy comparable to All-Reduce training in significantly less training time.
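The abstract only outlines the method at a high level, so the following is a minimal, purely illustrative Python sketch of the switch logic it describes: train under synchronous All-Reduce, monitor an empirical stability signal, and hand off to asynchronous Parameter Server training once the model has stabilized. The stability test (relative improvement of a windowed mean loss), the threshold, and the `run_epoch` stub are assumptions made for illustration only; the paper's actual empirical rule and distributed backends are not given in this record.

```python
# Illustrative sketch of a Strategy-Switch style training loop.
# The switch criterion and the synthetic loss curve below are assumptions,
# NOT the rule or workloads used in the paper.
from collections import deque


def has_stabilized(losses, window=3, tol=0.01):
    """Assumed stability rule: the windowed mean loss improved by less than
    `tol` (relative) compared to the preceding window."""
    if len(losses) < 2 * window:
        return False
    values = list(losses)
    earlier = sum(values[-2 * window:-window]) / window
    recent = sum(values[-window:]) / window
    return (earlier - recent) / max(earlier, 1e-12) < tol


def run_epoch(strategy, epoch):
    """Stand-in for one distributed training epoch under `strategy`;
    returns a synthetic, exponentially flattening loss."""
    return 0.1 + 0.9 * (0.5 ** epoch)


def train(num_epochs=20):
    strategy = "all_reduce"  # start in the synchronous All-Reduce phase
    history = deque(maxlen=64)
    for epoch in range(num_epochs):
        loss = run_epoch(strategy, epoch)
        history.append(loss)
        if strategy == "all_reduce" and has_stabilized(history):
            # Model judged stable: continue asynchronously on a parameter
            # server to sidestep synchronization bottlenecks.
            strategy = "parameter_server_async"
        print(f"epoch {epoch:02d}  strategy={strategy:22s}  loss={loss:.4f}")


if __name__ == "__main__":
    train()
```

Run as a plain script, this prints the per-epoch loss and shows the strategy flipping from `all_reduce` to `parameter_server_async` once the synthetic loss flattens; in a real deployment the two phases would map onto actual synchronous and asynchronous distributed backends.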
Main Authors: | Nikodimos Provatas, Iasonas Chalas, Ioannis Konstantinou, Nectarios Koziris |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | Deep learning, distributed systems, all-reduce, parameter server |
Online Access: | https://ieeexplore.ieee.org/document/10836684/ |
_version_ | 1832592899339976704 |
---|---|
author | Nikodimos Provatas, Iasonas Chalas, Ioannis Konstantinou, Nectarios Koziris |
author_sort | Nikodimos Provatas |
collection | DOAJ |
description | Deep learning plays a pivotal role in numerous big data applications by enhancing the accuracy of models. However, the abundance of available data presents a challenge when training neural networks on a single node. Consequently, various distributed training methods have emerged. Among these, two prevalent approaches are All-Reduce and Parameter Server. All-Reduce, operating synchronously, faces synchronization-related bottlenecks, while the Parameter Server, often used asynchronously, can compromise the model’s performance. To harness the strengths of both setups, we introduce Strategy-Switch, a hybrid approach that combines the speed of asynchronous training with the model quality of synchronous training. This method initiates training under the All-Reduce system and, guided by an empirical rule, transitions to asynchronous Parameter Server training once the model stabilizes. Our experimental analysis demonstrates that we can achieve accuracy comparable to All-Reduce training in significantly less training time. |
format | Article |
id | doaj-art-f29fc539363748dc90c74386fe582886 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-f29fc539363748dc90c74386fe582886; 2025-01-21T00:02:27Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 9510–9523; doi: 10.1109/ACCESS.2025.3528248; article no. 10836684; Strategy-Switch: From All-Reduce to Parameter Server for Faster Efficient Training; Nikodimos Provatas (https://orcid.org/0009-0000-0931-1479), Iasonas Chalas, Ioannis Konstantinou (https://orcid.org/0000-0002-7142-8106), Nectarios Koziris; affiliations: School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece (Provatas, Chalas, Koziris); Department of Informatics and Telecommunications, University of Thessaly, Lamia, Greece (Konstantinou); https://ieeexplore.ieee.org/document/10836684/; Deep learning; distributed systems; all-reduce; parameter server |
title | Strategy-Switch: From All-Reduce to Parameter Server for Faster Efficient Training |
topic | Deep learning; distributed systems; all-reduce; parameter server |
url | https://ieeexplore.ieee.org/document/10836684/ |