A snapshot of parallelism in distributed deep learning training
The accelerated development of artificial intelligence applications has driven the creation of increasingly complex neural network models with enormous numbers of parameters, currently reaching into the trillions. This makes their training nearly impossible without...
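The experiments the abstract refers to compare data parallelism and model parallelism. As a rough illustration of the first approach, below is a minimal sketch of data-parallel training with PyTorch's `DistributedDataParallel`. The record does not say which framework the authors used, so PyTorch, the model shape, and the `torchrun` launcher are assumptions for illustration only; a model-parallel counterpart appears after the metadata tables at the end of this record.

```python
# Minimal data-parallelism sketch (assumed PyTorch DDP; not the paper's actual code).
# Every rank holds a full model replica and trains on its own data shard;
# gradients are averaged across ranks during backward().
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # One process per GPU; torchrun supplies RANK, WORLD_SIZE, LOCAL_RANK.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = f"cuda:{local_rank}"

    # Full replica of the (illustrative) model on each rank.
    model = nn.Sequential(
        nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)
    ).to(device)
    model = DDP(model, device_ids=[local_rank])

    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # Each rank would normally read its own shard of the dataset
    # (e.g. via DistributedSampler); random tensors stand in here.
    x = torch.randn(32, 1024, device=device)
    y = torch.randint(0, 10, (32,), device=device)

    opt.zero_grad()
    loss_fn(model(x), y).backward()  # gradient all-reduce happens here
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
```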
| Main Authors: | Hairol Romero-Sandí, Gabriel Núñez, Elvis Rojas |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Universidad Autónoma de Bucaramanga, 2024-06-01 |
| Series: | Revista Colombiana de Computación |
| Online Access: | https://revistasunabeduco.biteca.online/index.php/rcc/article/view/5054 |
| author | Hairol Romero-Sandí, Gabriel Núñez, Elvis Rojas |
|---|---|
| collection | DOAJ |
| description | The accelerated development of artificial intelligence applications has driven the creation of increasingly complex neural network models with enormous numbers of parameters, currently reaching into the trillions. This makes their training nearly impossible without parallelization. Parallelism, applied through different approaches, is the mechanism used to make training feasible at such scale. This paper presents a glimpse of the state of the art related to parallelism in deep learning training from multiple points of view. It addresses pipeline parallelism, hybrid parallelism, mixture-of-experts, and auto-parallelism, topics that currently play a leading role in scientific research in this area. Finally, we develop a series of experiments with data parallelism and model parallelism, so that the reader can observe the performance of these two types of parallelism and understand each approach more clearly. |
| format | Article |
| id | doaj-art-9e333338bec145c38bf7194e4f25b455 |
| institution | Kabale University |
| issn | 1657-2831, 2539-2115 |
| language | English |
| publishDate | 2024-06-01 |
| publisher | Universidad Autónoma de Bucaramanga |
| record_format | Article |
| series | Revista Colombiana de Computación |
| spelling | Hairol Romero-Sandí (Universidad Nacional), Gabriel Núñez (Universidad Nacional), Elvis Rojas (Universidad Nacional / National High Technology Center). Revista Colombiana de Computación, 25(1), 2024-06-01. doi: 10.29375/25392115.5054 |
| title | A snapshot of parallelism in distributed deep learning training |
| url | https://revistasunabeduco.biteca.online/index.php/rcc/article/view/5054 |
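For contrast with the data-parallel sketch near the top of this record, here is a minimal sketch of inter-layer model parallelism, where the layers of one model are split across two GPUs and activations move between them. Again, PyTorch, the two-GPU split, and the layer sizes are assumptions for illustration, not the paper's actual setup.

```python
# Minimal model-parallelism sketch (assumed PyTorch; not the paper's actual code).
# The model's layers are partitioned across two devices; instead of replicating
# the whole model as in data parallelism, each GPU holds only its own stage.
import torch
import torch.nn as nn

class TwoStageModel(nn.Module):
    """First stage of the layers on cuda:0, second stage on cuda:1."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.stage2 = nn.Linear(4096, 10).to("cuda:1")

    def forward(self, x):
        x = self.stage1(x.to("cuda:0"))
        # Activations are copied device-to-device between the two stages.
        return self.stage2(x.to("cuda:1"))

model = TwoStageModel()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Random stand-in batch; labels live on the output device.
x = torch.randn(32, 1024)
y = torch.randint(0, 10, (32,), device="cuda:1")

opt.zero_grad()
loss_fn(model(x), y).backward()  # autograd traverses the cross-device graph
opt.step()
```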