Greedy Prefetch for Reducing Off-Chip Memory Accesses in Convolutional Neural Network Inference

Bibliographic Details
Main Authors: Dengtian Yang, Lan Chen
Format: Article
Language: English
Published: MDPI AG, 2025-02-01
Series: Information, Vol. 16, No. 3, Article 164
ISSN: 2078-2489
DOI: 10.3390/info16030164
Affiliations: Institute of Microelectronics of the Chinese Academy of Sciences, Beijing 100029, China (both authors)
Subjects: deep learning; greedy prefetch; accelerator
Online Access: https://www.mdpi.com/2078-2489/16/3/164

Description
The high parameter and memory access demands of CNNs highlight the need to reduce off-chip memory accesses. While recent approaches have improved data reuse to lessen these accesses, simple and efficient prefetching methods are still lacking. This paper introduces a greedy prefetch method that uses data repetition to optimize the prefetching route, thus decreasing off-chip memory accesses. The method is also implemented in a hardware simulator to organize a deployment strategy with additional optimizations. Our deployment strategy outperforms recent works, with a maximum data reuse improvement of 1.98×.
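
The abstract does not spell out the algorithm, but the idea of greedily ordering prefetches around data repetition can be illustrated in a few lines: at each step, fetch next the tile whose working set overlaps most with what is already buffered on-chip, so repeated data is reused rather than re-read from DRAM. The following Python sketch is purely illustrative; the names (greedy_prefetch_order, tiles, block ids) and the one-tile buffer model are assumptions, not the paper's actual interface.

# Hypothetical sketch (not from the paper): a greedy prefetch-route heuristic.
# Each tile needs a set of off-chip data blocks; at every step we fetch next
# the tile that overlaps most with the blocks already buffered on-chip, so
# repeated data is reused instead of re-read from DRAM.

def greedy_prefetch_order(tiles):
    """tiles: dict tile_id -> set of block ids. Returns (order, off-chip fetch count)."""
    remaining = dict(tiles)
    order, on_chip, fetches = [], set(), 0
    while remaining:
        # Greedy choice: maximize overlap with the current on-chip contents.
        best = max(remaining, key=lambda t: len(remaining[t] & on_chip))
        need = remaining.pop(best)
        fetches += len(need - on_chip)  # only misses cost an off-chip access
        on_chip = need                  # simplified model: buffer holds one tile's data
        order.append(best)
    return order, fetches

# Toy example: convolution tiles sharing halo blocks. The greedy order
# (t0, t3, t1, t2) needs 7 off-chip fetches; the naive order t0..t3 needs 10.
tiles = {"t0": {0, 1, 2}, "t1": {2, 3, 4}, "t2": {4, 5, 6}, "t3": {1, 2, 3}}
print(greedy_prefetch_order(tiles))

How the paper's method weighs weights versus activations, and how it sizes the on-chip buffer, is beyond what the abstract states; the sketch only shows why ordering fetches by data repetition reduces off-chip traffic.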