A snapshot of parallelism in distributed deep learning training

The accelerated development of artificial intelligence applications has driven the creation of increasingly complex neural network models with enormous numbers of parameters, currently reaching into the trillions. Training such models is therefore nearly impossible without parallelization. Parallelism, applied through different approaches, is the mechanism that has been used to make training feasible at this scale. This paper presents a glimpse of the state of the art on parallelism in deep learning training from multiple points of view. It covers pipeline parallelism, hybrid parallelism, mixture-of-experts and auto-parallelism, topics that currently play a leading role in scientific research in this area. Finally, the authors develop a series of experiments with data parallelism and model parallelism, so that the reader can observe the performance of the two types of parallelism and understand each approach more clearly.
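
The record itself contains no code, but to make the two experimental approaches named in the abstract concrete: in data parallelism, every worker holds a full replica of the model and trains on a different shard of the data, synchronizing gradients after each backward pass. A minimal sketch, assuming PyTorch's DistributedDataParallel wrapper (this is an illustration of the general technique, not the paper's actual experiment code):

```python
# Minimal data-parallel training sketch (assumed setup: PyTorch with
# DistributedDataParallel; not the paper's actual experiment code).
# Every process keeps a full model replica and sees a different shard
# of the data; gradients are averaged across replicas on backward().
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # `torchrun` sets RANK, LOCAL_RANK and WORLD_SIZE for each process.
    dist.init_process_group(backend="gloo")  # use "nccl" on GPU nodes

    model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
    ddp_model = DDP(model)  # hooks gradient all-reduce into backward()
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(10):
        # A real run would shard a dataset with DistributedSampler;
        # random tensors stand in for one local mini-batch here.
        x = torch.randn(32, 512)
        y = torch.randint(0, 10, (32,))
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()   # gradients are all-reduced across replicas here
        optimizer.step()  # every replica applies the same averaged update

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched as, say, `torchrun --nproc_per_node=4 train_dp.py` (a hypothetical file name), each of the four processes trains a full replica on its own quarter of the global batch.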

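Model parallelism, the other approach the paper evaluates, instead splits the network itself across devices, so no single device ever holds the full model; activations, rather than gradients, cross the device boundary. A minimal sketch, assuming two CUDA devices (`cuda:0`, `cuda:1`) and the same hypothetical PyTorch setting:

```python
# Minimal model-parallel sketch (assumed setup: PyTorch with two CUDA
# devices; an illustration of the technique, not the paper's code).
# The network is split layer-wise: each stage lives on its own device,
# and activations are copied between devices in forward().
import torch
import torch.nn as nn

class TwoStageModel(nn.Module):
    def __init__(self, dev0="cuda:0", dev1="cuda:1"):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.stage0 = nn.Sequential(nn.Linear(512, 1024), nn.ReLU()).to(dev0)
        self.stage1 = nn.Linear(1024, 10).to(dev1)

    def forward(self, x):
        x = self.stage0(x.to(self.dev0))
        # Activation transfer between devices: the communication cost
        # that distinguishes model parallelism from data parallelism.
        return self.stage1(x.to(self.dev1))

model = TwoStageModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 512)
y = torch.randint(0, 10, (32,), device="cuda:1")  # labels live with the output stage
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()   # autograd routes gradients back across the device boundary
optimizer.step()
```

Unlike the data-parallel case, a single forward pass here is sequential across devices, which is the inefficiency that the pipeline-parallel schemes surveyed in the paper are designed to mitigate.
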
Bibliographic Details
Main Authors: Hairol Romero-Sandí (Universidad Nacional), Gabriel Núñez (Universidad Nacional), Elvis Rojas (Universidad Nacional; National High Technology Center)
Format: Article
Language: English
Published: Universidad Autónoma de Bucaramanga, 2024-06-01
Series: Revista Colombiana de Computación, vol. 25, no. 1 (2024)
ISSN: 1657-2831, 2539-2115
DOI: 10.29375/25392115.5054
Online Access: https://revistasunabeduco.biteca.online/index.php/rcc/article/view/5054