Cross-Filter Structured Pruning for Efficient Sparse CNN Acceleration


Bibliographic Details
Main Authors: Ngoc-Son Pham, Sangwon Shin, Lei Xu, Weidong Shi, Taeweon Suh
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/11072696/
Description
Summary: Convolutional Neural Networks (CNNs) are widely used in vision tasks for resource-constrained environments due to their computational efficiency and strong generalization. However, the dominance of 1×1 convolutions in modern CNN architectures introduces challenges for sparsity-aware hardware accelerators, particularly in processing element (PE) load balancing, which limits speedup in sparse inference. To address this issue, this paper proposes a cross-filter structured pruning method that enforces a uniform sparsity pattern across multiple filters, ensuring balanced workload distribution among PEs. The approach is further extended to k×k convolutions by decomposing them into 1×1 filters, improving applicability across various CNN layers. The paper also proposes an intra-kernel parallelism technique that significantly reduces the size of each PE's local buffers, a critical bottleneck in sparse CNN accelerators. Experimental results show that the proposed approach maintains accuracy comparable to globally unstructured pruning while significantly enhancing inference speed. FPGA implementation and cycle-accurate simulations confirm improvements in processing speed, energy efficiency, and hardware utilization, making the method well suited for edge and mobile AI applications.
Specifically, the proposed architecture achieves a 1.14× to 1.6× speedup over Sparten for various CNN models and delivers 7.6× and 1.9× higher energy efficiency compared to Sparten and StarSPA, respectively. In terms of area efficiency, synthesis results show a 1.73×–10.95× reduction in required hardware primitives compared to Sparten and StarSPA.
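The core idea of cross-filter structured pruning, as summarized above, is that every filter mapped to the same PE group keeps exactly the same set of input channels, so sparse PEs receive identical workloads. The abstract does not give the pruning criterion, so the sketch below is only an illustration under stated assumptions: a magnitude-based importance score aggregated over each group, a hypothetical `group_size` parameter standing in for the number of filters sharing one PE group, and 1×1-convolution weights flattened to a (filters × input-channels) matrix.

```python
import numpy as np

def cross_filter_prune(weights, group_size, sparsity):
    """Illustrative cross-filter structured pruning for 1x1 convolutions.

    Zeroes the same input channels in every filter of a group, so all
    filters in the group share one sparsity pattern (balanced PE work).
    This is a sketch, not the paper's exact method: channel importance
    here is the group-aggregated L1 magnitude, which is an assumption.

    weights:    array of shape (num_filters, in_channels)
    group_size: filters that must share a pattern (hypothetical knob)
    sparsity:   fraction of input channels to zero per group
    """
    out = weights.astype(float).copy()
    num_filters, in_channels = out.shape
    num_pruned = int(round(sparsity * in_channels))
    for start in range(0, num_filters, group_size):
        group = out[start:start + group_size]
        # Aggregate importance over the whole group so the surviving
        # channel set is identical for every filter in the group.
        importance = np.abs(group).sum(axis=0)
        pruned_channels = np.argsort(importance)[:num_pruned]
        group[:, pruned_channels] = 0.0
    return out
```

A quick check of the invariant: after pruning, `(out[i] == 0)` is the same boolean mask for all filters `i` within a group, which is precisely the property that lets a sparsity-aware accelerator assign equal numbers of nonzero multiply-accumulates to each PE.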
ISSN:2169-3536