MLCRP: ML-Based GPU Cache Performance Modeling Featured With Reuse Profiles

Bibliographic Details
Main Authors: Minjung Cho, Eui-Young Chung
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/11082128/
Description
Summary: Accurate cache performance prediction is critical for designing efficient memory hierarchies in high-performance computing systems. While cycle-accurate simulators provide high accuracy, they incur significant computational cost and time, making them impractical for large-scale design space exploration. Analytical models are faster but lose accuracy in complex cache scenarios. This paper proposes MLCRP, a machine learning-based GPU cache performance prediction framework that uses the reuse profile (RP) as its key feature. The RP captures memory access locality as a histogram of reuse distances. MLCRP consists of three main stages: data preparation, training, and inference. In the data preparation stage, synthetic RP-based traces are generated from parameterized distributions to simulate diverse and non-stationary memory patterns. In the training stage, a regression-based ML model is trained to capture the relationship between RP features, cache configurations, and performance metrics such as miss rate and miss status holding register (MSHR) merge rate. Finally, we propose a method to extract RP features from real GPU application traces, enabling the trained model to predict cache performance. Experimental results demonstrate that MLCRP significantly improves prediction accuracy over existing analytical models, keeping the mean absolute error (MAE) within 5%, and reduces simulation time by an average of four orders of magnitude compared to cycle-accurate simulators. By combining analytical-model speed with simulation-level accuracy, MLCRP offers a scalable and generalizable solution for GPU cache modeling.
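To illustrate the paper's central feature, the sketch below computes a reuse profile — a histogram of reuse distances — from a memory address trace. This is a minimal, textbook-style implementation of the standard reuse-distance definition (distinct addresses touched between two accesses to the same address), not the paper's own trace-extraction method; the function name and trace format are illustrative assumptions.

```python
from collections import Counter

def reuse_profile(trace):
    """Build a reuse profile (RP): a histogram of reuse distances.

    The reuse distance of an access is the number of *distinct* addresses
    touched since the previous access to the same address. First-time
    accesses (cold misses) are binned under the label "inf".
    """
    last_seen = {}        # address -> index of its most recent access
    histogram = Counter()
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # distinct addresses accessed strictly between the two uses
            distance = len(set(trace[last_seen[addr] + 1 : i]))
            histogram[distance] += 1
        else:
            histogram["inf"] += 1
        last_seen[addr] = i
    return dict(histogram)

# Example: for trace A B A C B A, the two reuses of A have distances 1 and 2,
# the reuse of B has distance 2, and three accesses are cold.
print(reuse_profile(["A", "B", "A", "C", "B", "A"]))
# → {'inf': 3, 1: 1, 2: 2}
```

In a fully associative LRU cache of capacity C, an access hits exactly when its reuse distance is less than C, which is why this histogram is a compact predictor of miss rate across cache configurations.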
ISSN:2169-3536