Research on memory failure prediction based on ensemble learning.

Bibliographic Details
Main Authors: Peng Zhang, Jialiang Zhang, Yi Li
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access:https://doi.org/10.1371/journal.pone.0321954
ISSN: 1932-6203
Citation: PLoS ONE 20(4): e0321954

Abstract

Timely prediction of memory failures is crucial for the stable operation of data centers. However, existing methods often rely on a single classifier, which can lead to inaccurate or unstable predictions. To address this, we propose a new ensemble model for predicting CE-driven memory failures, in which a surge of correctable errors (CEs) in memory leads to a failure that causes server downtime. Our model combines several strong-performing classifiers, such as Random Forest, LightGBM, and XGBoost, and assigns each a weight based on its performance. By optimizing how the weighted classifiers reach a final decision, the model improves prediction accuracy. We validate the model using memory data from Alibaba's data center: it achieves an accuracy of over 84%, outperforming existing single- and dual-classifier models.
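The abstract describes combining Random Forest, LightGBM, and XGBoost and weighting each classifier by its performance. Below is a minimal sketch of that idea as weighted soft voting. It rests on two assumptions. First, each trained model outputs a failure probability per sample. Second, each weight is proportional to that model's validation score. This record does not state the paper's actual weighting or decision rule. The helper names (`performance_weights`, `weighted_soft_vote`, `example_probas`) and all numbers are hypothetical illustrations, not results from the paper.

```python
from typing import Dict, List

# Hypothetical per-model failure probabilities for three samples (servers/DIMMs).
# In practice these would come from trained Random Forest, LightGBM and XGBoost
# models (e.g. predict_proba(...)[:, 1]); the values here are made up.
example_probas: Dict[str, List[float]] = {
    "random_forest": [0.9, 0.2, 0.9],
    "lightgbm": [0.8, 0.1, 0.3],
    "xgboost": [0.7, 0.4, 0.2],
}


def performance_weights(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn per-model validation scores into weights that sum to 1.

    Assumption: weight is proportional to validation score; the paper's
    exact weighting rule is not stated in this record.
    """
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("validation scores must sum to a positive value")
    return {name: score / total for name, score in scores.items()}


def weighted_soft_vote(
    probas: Dict[str, List[float]],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> List[int]:
    """Weighted average of failure probabilities, thresholded to 0/1 labels."""
    if set(probas) != set(weights):
        raise ValueError("every model needs exactly one weight")
    lengths = {len(p) for p in probas.values()}
    if len(lengths) != 1:
        raise ValueError("all models must score the same, non-empty set of samples")
    (n_samples,) = lengths
    labels = []
    for i in range(n_samples):
        # Combined failure score for sample i: sum of weight * probability.
        score = sum(weights[name] * probas[name][i] for name in probas)
        labels.append(1 if score >= threshold else 0)
    return labels


if __name__ == "__main__":
    # Hypothetical validation scores, chosen only to illustrate the mechanics.
    weighted = performance_weights({"random_forest": 0.5, "lightgbm": 0.3, "xgboost": 0.2})
    equal = performance_weights({"random_forest": 1.0, "lightgbm": 1.0, "xgboost": 1.0})
    print(weighted_soft_vote(example_probas, weighted))  # [1, 0, 1]
    print(weighted_soft_vote(example_probas, equal))     # [1, 0, 0]
```

With equal weights, the third sample scores about 0.47 and is labelled healthy. Giving more weight to the best-scoring model raises its score to about 0.58, which flips it to a predicted failure. The same combination is available off the shelf as scikit-learn's `VotingClassifier(voting="soft", weights=...)`.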