Machine learning-based prediction of LDL cholesterol: performance evaluation and validation

Bibliographic Details
Main Authors: Jing-Bi Meng, Zai-Jian An, Chun-Shan Jiang
Format: Article
Language: English
Published: PeerJ Inc. 2025-04-01
Series: PeerJ
Online Access: https://peerj.com/articles/19248.pdf
Description
Summary: Objective: This study aimed to validate and optimize a machine learning algorithm for accurately predicting low-density lipoprotein cholesterol (LDL-C) levels, addressing limitations of traditional formulas, particularly in hypertriglyceridemia. Methods: Various machine learning models, namely linear regression, K-nearest neighbors (KNN), decision tree, random forest, eXtreme Gradient Boosting (XGB), and multilayer perceptron (MLP) regressor, were compared with conventional formulas (Friedewald, Martin, and Sampson) using lipid profiles from 120,174 subjects (2020–2023). Predictive performance was evaluated using R-squared (R²), mean squared error (MSE), and Pearson correlation coefficient (PCC) against measured LDL-C values. Results: Machine learning models outperformed traditional methods, with random forest and XGB achieving the highest accuracy (R² = 0.94, MSE = 89.25) on the internal dataset. Among the traditional formulas, the Sampson method performed best but showed reduced accuracy in the high triglyceride (TG) group (TG > 300 mg/dL). Machine learning models maintained high predictive power across all TG levels. Conclusion: Machine learning models offer more accurate LDL-C estimates, especially in high-TG contexts where traditional formulas are less reliable. These models could enhance cardiovascular risk assessment, potentially leading to more informed treatment decisions and improved patient outcomes.
ISSN: 2167-8359
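
The summary compares formula-based LDL-C estimates against measured values using R², MSE, and PCC. The sketch below illustrates that kind of comparison for the Friedewald formula only (LDL-C = TC − HDL-C − TG/5, all values in mg/dL). It is a minimal illustration, not the authors' pipeline: the function names and the four sample lipid profiles are hypothetical and were made up for this example, so the printed numbers say nothing about the study's results.

```python
import math


def friedewald_ldl(tc, hdl, tg):
    """Friedewald estimate of LDL-C in mg/dL: TC - HDL-C - TG/5."""
    return tc - hdl - tg / 5.0


def mse(y_true, y_pred):
    """Mean squared error between measured and estimated values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Hypothetical profiles (TC, HDL-C, TG, measured LDL-C), all in mg/dL.
# These values are invented for illustration and are NOT from the study.
profiles = [
    (200.0, 50.0, 150.0, 118.0),
    (240.0, 45.0, 200.0, 158.0),
    (180.0, 60.0, 100.0, 97.0),
    (260.0, 40.0, 350.0, 165.0),
]

measured = [p[3] for p in profiles]
estimated = [friedewald_ldl(tc, hdl, tg) for tc, hdl, tg, _ in profiles]

print("MSE:", mse(measured, estimated))
print("R^2:", r_squared(measured, estimated))
print("PCC:", pearson(measured, estimated))
```

The same three metric functions could be applied to the output of any other estimator, such as a fitted regression model, to place it on the same scale as the formula.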