Practice of large language model training optimization based on large-scale AI cluster with more than 10 000 domestic NPU

To address low compute utilization, poor stability, difficult training optimization, and an immature domestic accelerator ecosystem when training models on AI clusters with more than 10 000 NPUs, a large language model training optimization solution based...

Bibliographic Details
Main Authors: LOU Tao, NIU Hongweihua, ZHANG Pengfei, DONG Jiangfan, LI Panpan, LI Daotong, XU Weidong, YAO Chenghui, XUE Lianhao, TANG Ting, XIANG Jie
Format: Article
Language: zho
Published: Beijing Xintong Media Co., Ltd 2025-07-01
Series: Dianxin kexue
Online Access: http://www.telecomsci.com/zh/article/doi/10.11959/j.issn.1000-0801.2025166/