MLCRP: ML-Based GPU Cache Performance Modeling Featured With Reuse Profiles
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11082128/ |
| Summary: | Accurate cache performance prediction is critical for designing efficient memory hierarchies in high-performance computing systems. While cycle-accurate simulators provide high accuracy, they incur significant computational cost and time, making them impractical for large-scale design space exploration. Analytical models are faster but lack accuracy in complex cache scenarios. This paper proposes MLCRP, a machine learning-based GPU cache performance prediction framework that uses the reuse profile (RP) as a key feature. The RP captures memory access locality through a histogram of reuse distances. MLCRP consists of three main stages: data preparation, training, and inference. In the data preparation stage, synthetic RP-based traces are generated from parameterized distributions to simulate diverse and non-stationary memory patterns. In the training stage, a regression-based ML model is trained to capture the relationship between RP features, cache configurations, and performance metrics such as miss rate and miss status holding register (MSHR) merge rate. Finally, we propose a method to extract RP features from real GPU application traces, enabling the trained model to predict cache performance. Experimental results demonstrate that MLCRP significantly improves prediction accuracy over existing analytical models, keeping the mean absolute error (MAE) within 5%. Furthermore, it reduces simulation time by an average of four orders of magnitude compared to cycle-accurate simulators. Combining analytical-model speed with simulation-level accuracy, MLCRP offers a scalable and generalizable solution for GPU cache modeling. |
|---|---|
| ISSN: | 2169-3536 |
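The summary describes the reuse profile (RP) as a histogram of reuse distances. The record does not give the paper's exact definition, so the sketch below assumes the common one: the reuse distance of an access is the number of distinct addresses touched since the previous access to the same address, with first-time (cold) accesses binned separately. The function name `reuse_profile` and the `-1` cold-access bin are illustrative choices, not the paper's API, and the quadratic set-based scan is kept for clarity rather than speed.

```python
from collections import Counter

def reuse_profile(trace):
    """Build a reuse-distance histogram (reuse profile) from an address trace.

    Reuse distance = number of *distinct* addresses accessed between two
    consecutive accesses to the same address. Cold (first-time) accesses
    are binned under -1, standing in for infinite distance.
    """
    last_seen = {}      # address -> index of its most recent access
    histogram = Counter()
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # Distinct addresses touched since the previous access to addr.
            distance = len(set(trace[last_seen[addr] + 1 : i]))
            histogram[distance] += 1
        else:
            histogram[-1] += 1  # cold access
        last_seen[addr] = i
    return dict(histogram)

# Example: a short trace of cache-line addresses.
profile = reuse_profile(["A", "B", "C", "A", "B", "B"])
# Three cold accesses; the repeats of A and B see 2 distinct lines in
# between; the back-to-back B sees 0.
```

Such a histogram, together with cache configuration parameters (size, associativity, MSHR depth), would form the feature vector the regression model consumes; production reuse-distance tools typically replace the quadratic scan with a tree-based counter to handle long traces.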