MLCRP: ML-Based GPU Cache Performance Modeling Featured With Reuse Profiles
Accurate cache performance prediction is critical for designing efficient memory hierarchies in high-performance computing systems. While cycle-accurate simulators provide high accuracy, they incur significant computational cost and time, making them inefficient for large-scale design space exploration. A...
| Main Authors: | Minjung Cho, Eui-Young Chung |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Cache memory; reuse distance; reuse profile; machine learning; train data generation |
| Online Access: | https://ieeexplore.ieee.org/document/11082128/ |
| _version_ | 1849420325283954688 |
|---|---|
| author | Minjung Cho; Eui-Young Chung |
| author_sort | Minjung Cho |
| collection | DOAJ |
| description | Accurate cache performance prediction is critical for designing efficient memory hierarchies in high-performance computing systems. While cycle-accurate simulators provide high accuracy, they incur significant computational cost and time, making them inefficient for large-scale design space exploration. Analytical models are faster but lack accuracy in complex cache scenarios. This paper proposes MLCRP, a machine learning-based GPU cache performance prediction framework that uses the reuse profile (RP) as a key feature. RP captures memory access locality through a histogram of reuse distances. MLCRP consists of three main stages: data preparation, training, and inference. In the data preparation stage, synthetic RP-based traces are generated from parameterized distributions to simulate diverse and non-stationary memory patterns. In the training stage, a regression-based ML model is trained to capture the relationship between RP features, cache configurations, and performance metrics such as miss rate and miss status holding register (MSHR) merge rate. Finally, in the inference stage, we propose a method to extract RP features from real GPU application traces, enabling the trained model to predict cache performance. Experimental results demonstrate that MLCRP significantly improves prediction accuracy compared to existing analytical models, keeping the mean absolute error (MAE) within 5%. Furthermore, it reduces simulation time by an average of four orders of magnitude compared to cycle-accurate simulators. Combining the speed of analytical models with the accuracy of simulation, MLCRP offers a scalable and generalizable solution for GPU cache modeling. |
| format | Article |
| id | doaj-art-dbbbc0ffc9ac453dbd39ff022771ee00 |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-dbbbc0ffc9ac453dbd39ff022771ee00; 2025-08-20T03:31:47Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; vol. 13, pp. 126610-126622; 10.1109/ACCESS.2025.3589802; 11082128; MLCRP: ML-Based GPU Cache Performance Modeling Featured With Reuse Profiles; Minjung Cho (https://orcid.org/0000-0002-6443-9878) and Eui-Young Chung (https://orcid.org/0000-0003-2013-8763), both Department of Electrical and Electronic Engineering, Yonsei University, Seoul, Republic of Korea; https://ieeexplore.ieee.org/document/11082128/; Cache memory; reuse distance; reuse profile; machine learning; train data generation |
| title | MLCRP: ML-Based GPU Cache Performance Modeling Featured With Reuse Profiles |
| topic | Cache memory; reuse distance; reuse profile; machine learning; train data generation |
| url | https://ieeexplore.ieee.org/document/11082128/ |
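
The description states that a reuse profile (RP) is a histogram of reuse distances. As a rough illustration of that idea only, and not the authors' feature-extraction method, the Python sketch below builds such a histogram from a toy address trace. It also estimates a miss rate under the textbook assumption of a fully associative LRU cache, where an access misses when its reuse distance is at least the number of cache lines. The function names, the toy trace, and the LRU assumption are all hypothetical choices made for this example; MLCRP itself feeds RP features to a trained regression model rather than applying this rule.

```python
from collections import Counter, OrderedDict


def reuse_distances(trace):
    """Yield the reuse distance of each access: the number of distinct
    addresses touched since the previous access to the same address,
    or None for a first-time (cold) access."""
    stack = OrderedDict()  # keys ordered from least to most recently used
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            # Count the distinct addresses used after addr's previous access.
            yield len(keys) - 1 - keys.index(addr)
            stack.move_to_end(addr)
        else:
            yield None
            stack[addr] = True


def reuse_profile(trace):
    """Histogram mapping reuse distance (or None for cold) to access count."""
    return Counter(reuse_distances(trace))


def lru_miss_rate(profile, num_lines):
    """Miss-rate estimate for a fully associative LRU cache (assumption):
    cold accesses and accesses with distance >= num_lines are misses."""
    total = sum(profile.values())
    misses = sum(count for dist, count in profile.items()
                 if dist is None or dist >= num_lines)
    return misses / total


# Hypothetical toy trace of cache-line addresses.
trace = ["A", "B", "C", "A", "B", "B", "D", "A"]
profile = reuse_profile(trace)
print(dict(profile))
print(lru_miss_rate(profile, num_lines=3))
```

This linear-scan version costs O(N·M) for N accesses and M distinct addresses, which is fine for a demonstration; tree-based stack-distance algorithms are the usual choice for long traces.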