SpanTrain: a cross-domain distributed model training system for cloud-edge-end heterogeneous devices
| Main Authors: | , , , , , , , , , |
|---|---|
| Format: | Article |
| Language: | zho |
| Published: | China InfoCom Media Group, 2025-05-01 |
| Series: | 大数据 |
| Subjects: | |
| Online Access: | http://www.j-bigdataresearch.com.cn/zh/article/doi/10.11959/j.issn.2096-0271.2025040/ |
| Summary: | Beyond cloud computing centers, edge and end environments, represented by the Internet of Things and by fixed or mobile edge computing nodes, now host a large number of intelligent computing devices. Extending deep neural network (DNN) training from cloud computing centers to the edge and end offers significant advantages in supporting new application patterns, protecting data privacy, and controlling training costs. However, most existing distributed training systems are designed for homogeneous devices and are difficult to adapt to heterogeneous cloud-edge-end computing environments. To address this, SpanTrain, a cross-domain distributed training system for heterogeneous cloud, edge, and end devices, was designed. Through a novel hybrid pipeline parallel mechanism, it enables efficient DNN model training through collaboration among these heterogeneous devices. Experiments in typical cloud-edge-end heterogeneous environments demonstrate that SpanTrain achieves 1.17x to 3.15x higher training throughput than state-of-the-art systems while improving the resource utilization of heterogeneous devices by 39%. These results validate the efficiency of SpanTrain for DNN training in cloud-edge-end heterogeneous environments. |
| ISSN: | 2096-0271 |