FlexNPU: a dataflow-aware flexible deep learning accelerator for energy-efficient edge devices
This paper introduces FlexNPU, a Flexible Neural Processing Unit that adopts agile design principles to enable versatile dataflows, enhancing energy efficiency. Unlike conventional convolutional neural network accelerator architectures that adhere to fixed dataflows (such as input, weight, output, or row stationary) for transferring activations and weights between storage and compute units, our design enables adaptable dataflows of any type through configurable software descriptors. Because data movement costs considerably outweigh compute costs from an energy perspective, this dataflow flexibility allows the movement per layer to be optimized for minimal data transfer and energy consumption, a capability unattainable in fixed-dataflow architectures. To further enhance throughput and reduce energy consumption in the FlexNPU architecture, we propose a novel sparsity-based acceleration logic that exploits fine-grained sparsity in both the activation and weight tensors to bypass redundant computations, optimizing the convolution engine within the hardware accelerator. Extensive experimental results underscore a significant improvement in the performance and energy efficiency of FlexNPU compared to existing DNN accelerators.
| Main Authors: | Arnab Raha, Deepak A. Mathaikutty, Shamik Kundu, Soumendu K. Ghosh |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-06-01 |
| Series: | Frontiers in High Performance Computing |
| Subjects: | deep neural network accelerator; flexible data flow; sparsity acceleration; energy efficiency; edge intelligence |
| Online Access: | https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1570210/full |
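The abstract's central claim, that a software descriptor rather than fixed hardware selects the dataflow, can be illustrated with a toy loop-nest sketch. This is an illustration only, not FlexNPU's actual descriptor format, and all names here are hypothetical: the descriptor is just a permutation of the m/n/k loop dimensions, where the innermost dimension determines which operand stays stationary (e.g. an innermost `k` loop accumulates each output element in place, an output-stationary schedule), while the numerical result is identical for every ordering.

```python
def matmul_with_descriptor(A, B, order=("m", "n", "k")):
    """Toy matrix multiply whose loop nest is chosen by a dataflow
    'descriptor': a permutation of the m/n/k loop dimensions.
    All six orders yield the same product; what changes is which
    operand is reused in the innermost loop (its "stationarity")."""
    M, K, N = len(A), len(B), len(B[0])
    bounds = {"m": M, "n": N, "k": K}
    C = [[0] * N for _ in range(M)]
    d0, d1, d2 = order
    for i0 in range(bounds[d0]):
        for i1 in range(bounds[d1]):
            for i2 in range(bounds[d2]):
                # Map the descriptor-ordered loop counters back to m/n/k.
                idx = {d0: i0, d1: i1, d2: i2}
                m, n, k = idx["m"], idx["n"], idx["k"]
                C[m][n] += A[m][k] * B[k][n]
    return C
```

In a per-layer setting such as the one the abstract describes, a compiler could emit a different `order` for each layer to minimize data movement, which is exactly what a fixed-dataflow design cannot do.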
| author | Arnab Raha; Deepak A. Mathaikutty; Shamik Kundu; Soumendu K. Ghosh |
|---|---|
| collection | DOAJ |
| description | This paper introduces FlexNPU, a Flexible Neural Processing Unit that adopts agile design principles to enable versatile dataflows, enhancing energy efficiency. Unlike conventional convolutional neural network accelerator architectures that adhere to fixed dataflows (such as input, weight, output, or row stationary) for transferring activations and weights between storage and compute units, our design enables adaptable dataflows of any type through configurable software descriptors. Because data movement costs considerably outweigh compute costs from an energy perspective, this dataflow flexibility allows the movement per layer to be optimized for minimal data transfer and energy consumption, a capability unattainable in fixed-dataflow architectures. To further enhance throughput and reduce energy consumption in the FlexNPU architecture, we propose a novel sparsity-based acceleration logic that exploits fine-grained sparsity in both the activation and weight tensors to bypass redundant computations, optimizing the convolution engine within the hardware accelerator. Extensive experimental results underscore a significant improvement in the performance and energy efficiency of FlexNPU compared to existing DNN accelerators. |
| format | Article |
| id | doaj-art-5dbd69b533c1462397f1f00c28587bfd |
| institution | Kabale University |
| issn | 2813-7337 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | Frontiers Media S.A. |
| series | Frontiers in High Performance Computing |
| title | FlexNPU: a dataflow-aware flexible deep learning accelerator for energy-efficient edge devices |
| topic | deep neural network accelerator; flexible data flow; sparsity acceleration; energy efficiency; edge intelligence |
| url | https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1570210/full |
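The sparsity-based acceleration the abstract describes, bypassing a computation whenever either operand is zero, can be sketched in a few lines. This is a behavioral toy model under our own assumptions, not the paper's hardware logic; the function and variable names are hypothetical.

```python
def sparse_mac(activations, weights):
    """Toy model of two-sided (activation AND weight) fine-grained
    sparsity acceleration: a multiply-accumulate is issued only when
    both operands are nonzero; every other pair is bypassed.
    Returns (dot_product, ops_skipped)."""
    total, skipped = 0, 0
    for a, w in zip(activations, weights):
        if a == 0 or w == 0:
            skipped += 1      # zero operand: the MAC is bypassed entirely
        else:
            total += a * w    # contributes exactly as in the dense case
    return total, skipped
```

The result is bit-identical to the dense dot product, so the speedup is "free" in accuracy terms; with independently distributed zeros at 50% density in each tensor, roughly three quarters of the multiplies can be bypassed, which is the intuition behind the throughput and energy gains the abstract claims.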