Greedy Prefetch for Reducing Off-Chip Memory Accesses in Convolutional Neural Network Inference

Bibliographic Details
Main Authors: Dengtian Yang, Lan Chen
Format: Article
Language: English
Published: MDPI AG, 2025-02-01
Series: Information, Vol. 16, No. 3, Article 164
ISSN: 2078-2489
DOI: 10.3390/info16030164
Affiliations: Institute of Microelectronics of the Chinese Academy of Sciences, Beijing 100029, China (both authors)
Subjects: deep learning; greedy prefetch; accelerator
Online Access: https://www.mdpi.com/2078-2489/16/3/164

Description
The high parameter and memory access demands of CNNs highlight the need to reduce off-chip memory accesses. While recent approaches have improved data reuse to lessen these accesses, simple and efficient prefetching methods are still lacking. This paper introduces a greedy prefetch method that uses data repetition to optimize the prefetching route, thus decreasing off-chip memory accesses. The method is also implemented in a hardware simulator to organize a deployment strategy with additional optimizations. Our deployment strategy outperforms recent works, with a maximum data reuse improvement of 1.98×.
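
The abstract does not spell out the algorithm, but the idea of greedily ordering prefetches around data repetition can be illustrated in a few lines: at each step, fetch next the tile whose working set overlaps most with what is already buffered on-chip, so repeated data is reused rather than re-read from DRAM. The following Python sketch is purely illustrative; the names (greedy_prefetch_order, tiles, block ids) and the one-tile buffer model are assumptions, not the paper's actual interface.

# Hypothetical sketch (not from the paper): a greedy prefetch-route heuristic.
# Each tile needs a set of off-chip data blocks; at every step we fetch next
# the tile that overlaps most with the blocks already buffered on-chip, so
# repeated data is reused instead of re-read from DRAM.

def greedy_prefetch_order(tiles):
    """tiles: dict tile_id -> set of block ids. Returns (order, off-chip fetch count)."""
    remaining = dict(tiles)
    order, on_chip, fetches = [], set(), 0
    while remaining:
        # Greedy choice: maximize overlap with the current on-chip contents.
        best = max(remaining, key=lambda t: len(remaining[t] & on_chip))
        need = remaining.pop(best)
        fetches += len(need - on_chip)  # only misses cost an off-chip access
        on_chip = need                  # simplified model: buffer holds one tile's data
        order.append(best)
    return order, fetches

# Toy example: convolution tiles sharing halo blocks. The greedy order
# (t0, t3, t1, t2) needs 7 off-chip fetches; the naive order t0..t3 needs 10.
tiles = {"t0": {0, 1, 2}, "t1": {2, 3, 4}, "t2": {4, 5, 6}, "t3": {1, 2, 3}}
print(greedy_prefetch_order(tiles))

How the paper's method weighs weights versus activations, and how it sizes the on-chip buffer, is beyond what the abstract states; the sketch only shows why ordering fetches by data repetition reduces off-chip traffic.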