Exploring non-zero position constraints: algorithm-hardware co-designed DNN sparse training method
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | zho |
| Published: | EDP Sciences, 2025-02-01 |
| Series: | Xibei Gongye Daxue Xuebao |
| Subjects: | |
| Online Access: | https://www.jnwpu.org/articles/jnwpu/full_html/2025/01/jnwpu2025431p119/jnwpu2025431p119.html |
| Summary: | On-device learning enables edge devices to continuously adapt to new data for AI applications. Leveraging sparsity to eliminate redundant computation and storage during training is a key approach to improving the learning efficiency of edge deep neural networks (DNNs). However, because no assumptions are made about non-zero positions, expensive runtime identification and allocation of non-zero positions and load balancing of irregular computations are often required, making it difficult for existing sparse training works to approach the ideal speedup. This paper points out that if the non-zero position constraints of the operands during training can be predicted in advance, these processing overheads can be skipped, improving the energy efficiency of sparse training. Therefore, this paper explores the position-constraint rules between operands for three typical activation functions in edge scenarios during sparse training. Based on these rules, it proposes a hardware-friendly sparse training algorithm that reduces the computation and storage pressure of the three training phases, together with an energy-efficient sparse training accelerator that estimates the non-zero positions in parallel with the forward-propagation computation so that the runtime processing cost is hidden. Experiments show that the proposed method is 2.2, 1.38, and 1.46 times more energy efficient than a dense accelerator and two other sparse training accelerators, respectively. |
|---|---|
| ISSN: | 1000-2758 2609-7125 |
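The position-constraint idea described in the summary can be illustrated for ReLU, one common activation function: wherever the forward output is zero, the gradient flowing back through that position is also zero, so the non-zero positions of the backward operand are known from the forward pass alone, before backpropagation runs. The following is a minimal NumPy sketch of this principle, not the paper's actual algorithm; the array values are illustrative.

```python
import numpy as np

# Forward pass through ReLU: y = max(x, 0).
x = np.array([-1.0, 2.0, 0.0, 3.0, -0.5])
y = np.maximum(x, 0.0)

# Non-zero position constraint: relu'(x) = 1 where y > 0, else 0.
# This mask is available as soon as the forward pass finishes,
# so a sparse backward pass can skip the masked-out positions.
mask = y > 0.0

# Upstream gradient arriving from the next layer (illustrative values).
g_out = np.array([0.1, 0.2, 0.3, 0.4, 0.5])

# Only positions where mask is True need any computation;
# the rest are known to be zero in advance.
g_in = np.where(mask, g_out, 0.0)
```

Because `mask` is fixed by the forward pass, the zero positions need no runtime identification during backpropagation, which is the overhead the paper's accelerator aims to hide.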