CNN Accelerator Performance Dependence on Loop Tiling and the Optimum Resource-Constrained Loop Tiling
This paper analyzes the dependence of the convolutional neural network (CNN) accelerator performance on loop tiling. More specifically, based on the closed-form expression of the CNN accelerator performance, the dependence on the tile sizes is characterized by the derivative, the asymptote and the s...
Main Authors: | Chester Sungchung Park, Sungkyung Park |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2025-01-01 |
Series: | IEEE Access |
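Subjects: | Closed-form expression; CNN accelerator; communication; computation; hardware resource; loop tiling |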
Online Access: | https://ieeexplore.ieee.org/document/10849540/ |
_version_ | 1832576737239629824 |
---|---|
author | Chester Sungchung Park Sungkyung Park |
collection | DOAJ |
description | This paper analyzes the dependence of the convolutional neural network (CNN) accelerator performance on loop tiling. More specifically, based on the closed-form expression of the CNN accelerator performance, the dependence on the tile sizes is characterized by the derivative, the asymptote and the switching point between the computation-limited condition and the communication-limited condition. The analysis provides a useful insight into how to determine the tile sizes to achieve the required performance while avoiding an unnecessary static random access memory (SRAM) size increase. The paper also deals with the optimum resource-constrained loop tiling for CNN accelerators. Given the constraint on either the on-chip buffer size or the multiply-accumulate (MAC) array size, tile sizes are optimized to maximize the performance. The closed-form expressions of the optimum tile sizes provide useful insights into how to allocate the available hardware resources for maximum performance. From performance evaluation, the proposed tile sizes achieve almost the maximum performance, which enables the optimization of tile sizes without relying on exhaustive search, speeding up design space exploration. |
format | Article |
id | doaj-art-137de8a2da0c46dd8f62226d3ee7dfc3 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-137de8a2da0c46dd8f62226d3ee7dfc3; 2025-01-31T00:01:42Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; 13; 16800; 16810; 10.1109/ACCESS.2025.3532790; 10849540; CNN Accelerator Performance Dependence on Loop Tiling and the Optimum Resource-Constrained Loop Tiling; Chester Sungchung Park; 0; https://orcid.org/0000-0003-2009-2814; Sungkyung Park; 1; https://orcid.org/0000-0003-1171-5020; Department of Electrical and Electronics Engineering, Konkuk University, Gwangjin-gu, Seoul, South Korea; Department of Electrical and Electronics Engineering, Pusan National University, Geumjeong-gu, Busan, South Korea; This paper analyzes the dependence of the convolutional neural network (CNN) accelerator performance on loop tiling. More specifically, based on the closed-form expression of the CNN accelerator performance, the dependence on the tile sizes is characterized by the derivative, the asymptote and the switching point between the computation-limited condition and the communication-limited condition. The analysis provides a useful insight into how to determine the tile sizes to achieve the required performance while avoiding an unnecessary static random access memory (SRAM) size increase. The paper also deals with the optimum resource-constrained loop tiling for CNN accelerators. Given the constraint on either the on-chip buffer size or the multiply-accumulate (MAC) array size, tile sizes are optimized to maximize the performance. The closed-form expressions of the optimum tile sizes provide useful insights into how to allocate the available hardware resources for maximum performance. From performance evaluation, the proposed tile sizes achieve almost the maximum performance, which enables the optimization of tile sizes without relying on exhaustive search, speeding up design space exploration.; https://ieeexplore.ieee.org/document/10849540/; Closed-form expression; CNN accelerator; communication; computation; hardware resource; loop tiling |
title | CNN Accelerator Performance Dependence on Loop Tiling and the Optimum Resource-Constrained Loop Tiling |
topic | Closed-form expression CNN accelerator communication computation hardware resource loop tiling |
url | https://ieeexplore.ieee.org/document/10849540/ |