SpanTrain: a cross-domain distributed model training system for cloud-edge-end heterogeneous devices

Beyond cloud computing centers, edge and end environments, represented by the Internet of Things and by fixed or mobile edge computing nodes, now host large numbers of intelligent computing devices. Extending deep neural network (DNN) training from cloud computing centers...

Bibliographic Details
Main Authors: WANG Jinquan, LIU Xuzhao, LIAO Xiaojian, XIAO Limin, HUO Zhisheng, SUO Jiashun, LI Yuntong, SHEN Runnan, XIE Xilong, TANG Xicheng
Format: Article
Language: Chinese (zho)
Published: China InfoCom Media Group 2025-05-01
Series: 大数据 (Big Data Research)
Subjects: distributed computing; DNN training; parallel training strategy
Online Access: http://www.j-bigdataresearch.com.cn/zh/article/doi/10.11959/j.issn.2096-0271.2025040/
author WANG Jinquan
LIU Xuzhao
LIAO Xiaojian
XIAO Limin
HUO Zhisheng
SUO Jiashun
LI Yuntong
SHEN Runnan
XIE Xilong
TANG Xicheng
collection DOAJ
description Beyond cloud computing centers, edge and end environments, represented by the Internet of Things and by fixed or mobile edge computing nodes, now host large numbers of intelligent computing devices. Extending deep neural network (DNN) training from cloud computing centers to the edge and end offers significant advantages: support for new application patterns, protection of data privacy, and control of training costs. However, most existing distributed training systems are designed for homogeneous devices and adapt poorly to heterogeneous cloud-edge-end computing environments. To address this, SpanTrain, a cross-domain distributed training system for heterogeneous cloud, edge, and end devices, is proposed. Through a novel hybrid pipeline parallel mechanism, SpanTrain enables efficient DNN model training in which heterogeneous cloud, edge, and end devices collaborate. Experiments in typical cloud-edge-end heterogeneous environments show that SpanTrain achieves 1.17x to 3.15x higher training throughput than state-of-the-art systems while improving the resource utilization of heterogeneous devices by 39%. These results validate the efficiency of SpanTrain for DNN training in cloud-edge-end heterogeneous environments.
format Article
id doaj-art-5f2cdc5ad33d4982931070e1c50393b2
institution OA Journals
issn 2096-0271
language zho
publishDate 2025-05-01
publisher China InfoCom Media Group
record_format Article
series 大数据
title SpanTrain: a cross-domain distributed model training system for cloud-edge-end heterogeneous devices
topic distributed computing
DNN training
parallel training strategy
url http://www.j-bigdataresearch.com.cn/zh/article/doi/10.11959/j.issn.2096-0271.2025040/