SpanTrain: a cross-domain distributed model training system for cloud-edge-end heterogeneous devices
Beyond cloud computing centers, edge and end environments, represented by the Internet of Things and by fixed or mobile edge computing nodes, now host large numbers of intelligent computing devices. Extending deep neural network (DNN) training from cloud computing centers...
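| Main Authors: | WANG Jinquan, LIU Xuzhao, LIAO Xiaojian, XIAO Limin, HUO Zhisheng, SUO Jiashun, LI Yuntong, SHEN Runnan, XIE Xilong, TANG Xicheng |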
|---|---|
| Format: | Article |
| Language: | zho |
| Published: | China InfoCom Media Group, 2025-05-01 |
| Series: | 大数据 |
| Subjects: | distributed computing; DNN training; parallel training strategy |
| Online Access: | http://www.j-bigdataresearch.com.cn/zh/article/doi/10.11959/j.issn.2096-0271.2025040/ |
| _version_ | 1850133486772420608 |
|---|---|
| author | WANG Jinquan, LIU Xuzhao, LIAO Xiaojian, XIAO Limin, HUO Zhisheng, SUO Jiashun, LI Yuntong, SHEN Runnan, XIE Xilong, TANG Xicheng |
| author_sort | WANG Jinquan |
| collection | DOAJ |
| description | Beyond cloud computing centers, edge and end environments, represented by the Internet of Things and by fixed or mobile edge computing nodes, now host large numbers of intelligent computing devices. Extending deep neural network (DNN) training from cloud computing centers to the edge and end offers significant advantages: support for new application patterns, protection of data privacy, and control of training costs. However, most existing distributed training systems are designed for homogeneous devices and adapt poorly to heterogeneous cloud-edge-end computing environments. To address this, SpanTrain, a cross-domain distributed training system spanning heterogeneous cloud, edge, and end devices, has been designed. Through a novel hybrid pipeline parallel mechanism, SpanTrain trains DNN models efficiently through collaboration among these heterogeneous devices. Experiments in typical cloud-edge-end heterogeneous environments show that SpanTrain achieves 1.17x to 3.15x higher training throughput than state-of-the-art systems while improving the resource utilization of heterogeneous devices by 39%. These results validate the efficiency of SpanTrain for DNN training in cloud-edge-end heterogeneous environments. |
| format | Article |
| id | doaj-art-5f2cdc5ad33d4982931070e1c50393b2 |
| institution | OA Journals |
| issn | 2096-0271 |
| language | zho |
| publishDate | 2025-05-01 |
| publisher | China InfoCom Media Group |
| record_format | Article |
| series | 大数据 |
| title | SpanTrain: a cross-domain distributed model training system for cloud-edge-end heterogeneous devices |
| topic | distributed computing; DNN training; parallel training strategy |
| url | http://www.j-bigdataresearch.com.cn/zh/article/doi/10.11959/j.issn.2096-0271.2025040/ |