FlexNPU: a dataflow-aware flexible deep learning accelerator for energy-efficient edge devices

This paper introduces FlexNPU, a Flexible Neural Processing Unit that adopts agile design principles to enable versatile dataflows and improve energy efficiency. Unlike conventional convolutional neural network accelerator architectures, which adhere to a fixed dataflow (input, weight, output, or row stationary) when transferring activations and weights between storage and compute units, our design supports adaptable dataflows of any type through configurable software descriptors. Because data-movement costs considerably outweigh compute costs from an energy perspective, this dataflow flexibility lets us optimize the movement per layer for minimal data transfer and energy consumption, a capability unattainable in fixed-dataflow architectures. To further improve throughput and reduce energy consumption in the FlexNPU architecture, we propose a novel sparsity-based acceleration logic that exploits fine-grained sparsity in both the activation and weight tensors to bypass redundant computations, optimizing the convolution engine within the hardware accelerator. Extensive experimental results show a significant improvement in the performance and energy efficiency of FlexNPU compared to existing DNN accelerators.
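The abstract's two key ideas — choosing a dataflow per layer to minimize data movement, and skipping multiply-accumulate (MAC) operations when either operand is zero — can be illustrated with a small sketch. This is a hypothetical toy model, not the paper's actual implementation: the cost formulas, the `reuse` factor, and all function names are illustrative assumptions.

```python
# Hypothetical sketch of the two mechanisms described in the abstract
# (not FlexNPU's actual logic): per-layer dataflow selection via a toy
# data-movement cost model, and fine-grained sparsity bypass.

def movement_cost(dataflow, layer):
    """Toy cost model: bytes moved between storage and compute units.
    `layer` = (input_bytes, weight_bytes, output_bytes). The stationary
    tensor is fetched once; the other two are re-fetched each reuse step."""
    i, w, o = layer
    reuse = 4  # assumed number of reuse iterations per layer
    if dataflow == "input-stationary":
        return i + reuse * (w + o)
    if dataflow == "weight-stationary":
        return w + reuse * (i + o)
    if dataflow == "output-stationary":
        return o + reuse * (i + w)
    raise ValueError(f"unknown dataflow: {dataflow}")

def best_dataflow(layer):
    """Pick the dataflow minimizing estimated movement for this layer,
    which a fixed-dataflow accelerator cannot do per layer."""
    flows = ["input-stationary", "weight-stationary", "output-stationary"]
    return min(flows, key=lambda f: movement_cost(f, layer))

def sparse_dot(activations, weights):
    """Fine-grained sparsity bypass: perform a MAC only when both
    operands are nonzero, mimicking skipped (gated) compute cycles.
    Returns (accumulated sum, number of MACs actually performed)."""
    acc = 0
    macs = 0
    for a, w in zip(activations, weights):
        if a != 0 and w != 0:
            acc += a * w
            macs += 1
    return acc, macs
```

For a layer whose input tensor dominates, the model keeps that tensor stationary; a layer with dominant weights would instead select weight-stationary, illustrating why per-layer flexibility can beat any single fixed choice.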

Bibliographic Details
Main Authors: Arnab Raha, Deepak A. Mathaikutty, Shamik Kundu, Soumendu K. Ghosh
Format: Article
Language: English
Published: Frontiers Media S.A., 2025-06-01
Series: Frontiers in High Performance Computing
Subjects: deep neural network accelerator; flexible data flow; sparsity acceleration; energy efficiency; edge intelligence
Online Access: https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1570210/full
author Arnab Raha
Deepak A. Mathaikutty
Shamik Kundu
Soumendu K. Ghosh
collection DOAJ
description This paper introduces FlexNPU, a Flexible Neural Processing Unit that adopts agile design principles to enable versatile dataflows and improve energy efficiency. Unlike conventional convolutional neural network accelerator architectures, which adhere to a fixed dataflow (input, weight, output, or row stationary) when transferring activations and weights between storage and compute units, our design supports adaptable dataflows of any type through configurable software descriptors. Because data-movement costs considerably outweigh compute costs from an energy perspective, this dataflow flexibility lets us optimize the movement per layer for minimal data transfer and energy consumption, a capability unattainable in fixed-dataflow architectures. To further improve throughput and reduce energy consumption in the FlexNPU architecture, we propose a novel sparsity-based acceleration logic that exploits fine-grained sparsity in both the activation and weight tensors to bypass redundant computations, optimizing the convolution engine within the hardware accelerator. Extensive experimental results show a significant improvement in the performance and energy efficiency of FlexNPU compared to existing DNN accelerators.
format Article
id doaj-art-5dbd69b533c1462397f1f00c28587bfd
institution Kabale University
issn 2813-7337
language English
publishDate 2025-06-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in High Performance Computing
title FlexNPU: a dataflow-aware flexible deep learning accelerator for energy-efficient edge devices
topic deep neural network accelerator
flexible data flow
sparsity acceleration
energy efficiency
edge intelligence
url https://www.frontiersin.org/articles/10.3389/fhpcp.2025.1570210/full