FlexNPU: a dataflow-aware flexible deep learning accelerator for energy-efficient edge devices
This paper introduces FlexNPU, a Flexible Neural Processing Unit that adopts agile design principles to enable versatile dataflows, enhancing energy efficiency. Unlike conventional convolutional neural network accelerator architectures that adhere to fixed dataflows (such as input, weight, output, or row stationary) for transferring activations and weights between storage and compute units, our design enables adaptable dataflows of any type through configurable software descriptors. Because data movement costs considerably outweigh compute costs from an energy perspective, this dataflow flexibility allows the movement per layer to be optimized for minimal data transfer and energy consumption, a capability unattainable in fixed-dataflow architectures. To further enhance throughput and reduce energy consumption in the FlexNPU architecture, we propose a novel sparsity-based acceleration logic that exploits fine-grained sparsity in both the activation and weight tensors to bypass redundant computations, optimizing the convolution engine within the hardware accelerator. Extensive experimental results underscore a significant improvement in the performance and energy efficiency of FlexNPU compared to existing DNN accelerators.
| Main Authors: | Arnab Raha, Deepak A. Mathaikutty, Shamik Kundu, Soumendu K. Ghosh |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-06-01 |
| Series: | Frontiers in High Performance Computing |
| Subjects: | deep neural network accelerator; flexible data flow; sparsity acceleration; energy efficiency; edge intelligence |
| Online Access: | https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1570210/full |
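The abstract's central claim, that a software descriptor rather than fixed hardware selects the dataflow, can be illustrated with a toy loop-nest sketch. This is an illustration only, not FlexNPU's actual descriptor format, and all names here are hypothetical: the descriptor is just a permutation of the m/n/k loop dimensions, where the innermost dimension determines which operand stays stationary (e.g. an innermost `k` loop accumulates each output element in place, an output-stationary schedule), while the numerical result is identical for every ordering.

```python
def matmul_with_descriptor(A, B, order=("m", "n", "k")):
    """Toy matrix multiply whose loop nest is chosen by a dataflow
    'descriptor': a permutation of the m/n/k loop dimensions.
    All six orders yield the same product; what changes is which
    operand is reused in the innermost loop (its "stationarity")."""
    M, K, N = len(A), len(B), len(B[0])
    bounds = {"m": M, "n": N, "k": K}
    C = [[0] * N for _ in range(M)]
    d0, d1, d2 = order
    for i0 in range(bounds[d0]):
        for i1 in range(bounds[d1]):
            for i2 in range(bounds[d2]):
                # Map the descriptor-ordered loop counters back to m/n/k.
                idx = {d0: i0, d1: i1, d2: i2}
                m, n, k = idx["m"], idx["n"], idx["k"]
                C[m][n] += A[m][k] * B[k][n]
    return C
```

In a per-layer setting such as the one the abstract describes, a compiler could emit a different `order` for each layer to minimize data movement, which is exactly what a fixed-dataflow design cannot do.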
| author | Arnab Raha; Deepak A. Mathaikutty; Shamik Kundu; Soumendu K. Ghosh |
|---|---|
| collection | DOAJ |
| description | This paper introduces FlexNPU, a Flexible Neural Processing Unit that adopts agile design principles to enable versatile dataflows, enhancing energy efficiency. Unlike conventional convolutional neural network accelerator architectures that adhere to fixed dataflows (such as input, weight, output, or row stationary) for transferring activations and weights between storage and compute units, our design enables adaptable dataflows of any type through configurable software descriptors. Because data movement costs considerably outweigh compute costs from an energy perspective, this dataflow flexibility allows the movement per layer to be optimized for minimal data transfer and energy consumption, a capability unattainable in fixed-dataflow architectures. To further enhance throughput and reduce energy consumption in the FlexNPU architecture, we propose a novel sparsity-based acceleration logic that exploits fine-grained sparsity in both the activation and weight tensors to bypass redundant computations, optimizing the convolution engine within the hardware accelerator. Extensive experimental results underscore a significant improvement in the performance and energy efficiency of FlexNPU compared to existing DNN accelerators. |
| format | Article |
| id | doaj-art-5dbd69b533c1462397f1f00c28587bfd |
| institution | Kabale University |
| issn | 2813-7337 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | Frontiers Media S.A. |
| series | Frontiers in High Performance Computing |
| title | FlexNPU: a dataflow-aware flexible deep learning accelerator for energy-efficient edge devices |
| topic | deep neural network accelerator; flexible data flow; sparsity acceleration; energy efficiency; edge intelligence |
| url | https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1570210/full |
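The sparsity-based acceleration the abstract describes, bypassing a computation whenever either operand is zero, can be sketched in a few lines. This is a behavioral toy model under our own assumptions, not the paper's hardware logic; the function and variable names are hypothetical.

```python
def sparse_mac(activations, weights):
    """Toy model of two-sided (activation AND weight) fine-grained
    sparsity acceleration: a multiply-accumulate is issued only when
    both operands are nonzero; every other pair is bypassed.
    Returns (dot_product, ops_skipped)."""
    total, skipped = 0, 0
    for a, w in zip(activations, weights):
        if a == 0 or w == 0:
            skipped += 1      # zero operand: the MAC is bypassed entirely
        else:
            total += a * w    # contributes exactly as in the dense case
    return total, skipped
```

The result is bit-identical to the dense dot product, so the speedup is "free" in accuracy terms; with independently distributed zeros at 50% density in each tensor, roughly three quarters of the multiplies can be bypassed, which is the intuition behind the throughput and energy gains the abstract claims.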