Enhanced cardiovascular risk prediction in the Western Pacific: A machine learning approach tailored to the Malaysian population.

Bibliographic Details
Main Authors: Sazzli Kasim, Putri Nur Fatin Amir Rudin, Sorayya Malek, Nurulain Ibrahim, Xue Ning Kiew, Nafiza Mat Nasir, Khairul Shafiq Ibrahim, Raja Ezman Raja Shariff
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0323949
Description
Summary:<h4>Background</h4>Cardiovascular disease (CVD) is a significant public health challenge in the Western Pacific region, including Malaysia.
<h4>Objective</h4>This study aimed to develop and validate machine learning (ML) models to predict 10-year CVD risk in a Malaysian cohort, which could serve as a model for other Asian populations with similar genetic and environmental backgrounds.
<h4>Methods</h4>Using data from the REDISCOVER Registry (5,688 participants, 2007 to 2017), 30 clinically relevant features were selected and six ML algorithms were trained: Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Neural Network (NN), and Naive Bayes (NB). Ensemble models were also built using three commonly used meta-learners: RF, Generalized Linear Model (GLM), and Gradient Boosting Model (GBM). The dataset was split 70:30 into training and test sets, and 5-fold cross-validation was used to obtain robust performance estimates. Model evaluation was based primarily on the Area Under the Curve (AUC), with additional metrics such as sensitivity, specificity, and the Net Reclassification Index (NRI) used to compare the ML models against traditional risk scores such as the Framingham Risk Score (FRS) and the Revised Pooled Cohort Equations (RPCE).
<h4>Results</h4>The LR model achieved the highest AUC (0.77), outperforming the FRS (AUC = 0.72) and the RPCE (AUC = 0.74). The ensemble model performed robustly but did not significantly outperform the best individual model. SHAP (SHapley Additive exPlanations) analysis identified systolic blood pressure, weight, and waist circumference among the key predictors. The NRI improved significantly, by 13.15% relative to the FRS and 7.00% relative to the RPCE, highlighting the potential of ML approaches to enhance CVD risk prediction in Malaysia. The best-performing model was deployed on a web platform for real-time use, to support ongoing validation and clinical applicability.
<h4>Conclusions</h4>These findings underscore the effectiveness of ML models in improving CVD risk stratification and decision-making in Malaysia and beyond.
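The Methods describe a 70:30 train-test split combined with 5-fold cross-validation. The following is a minimal stdlib Python sketch of how such index splits can be generated. It assumes a simple random (unstratified) shuffle. The fixed seed, the strided fold assignment, and the helper names `train_test_split_indices` and `k_fold_indices` are hypothetical choices made for illustration, not the study's documented procedure.

```python
import random
from typing import List, Tuple


def train_test_split_indices(
    n: int, test_frac: float = 0.30, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Randomly hold out test_frac of the indices 0..n-1 (70:30 by default).

    Assumption: a simple unstratified shuffle with a hypothetical fixed seed.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    # First n_test shuffled indices form the test set; the rest are training.
    return idx[n_test:], idx[:n_test]


def k_fold_indices(
    train_idx: List[int], k: int = 5
) -> List[Tuple[List[int], List[int]]]:
    """Partition training indices into k folds; each fold is validated once.

    Returns a list of (fit_indices, validation_indices) pairs.
    """
    # Strided assignment: fold i takes every k-th index starting at position i.
    folds = [train_idx[i::k] for i in range(k)]
    splits = []
    for i, val in enumerate(folds):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((fit, val))
    return splits


train, test = train_test_split_indices(5688)
cv_splits = k_fold_indices(train, k=5)
```

For the 5,688 registry participants, this yields 3,982 training and 1,706 test records. Each cross-validation fold validates on 796 or 797 of the training records. When events are relatively rare, a common refinement is to stratify by outcome so that every partition keeps the overall CVD event rate.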
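AUC, the primary metric, can be read as a probability, assuming it refers to the area under the receiver operating characteristic (ROC) curve. It is the probability that a randomly chosen participant who developed CVD receives a higher predicted risk than a randomly chosen participant who did not. Below is a minimal pure-Python sketch of this rank-based (Mann-Whitney) formulation. `roc_auc` is a hypothetical helper, and its pairwise loop is O(n²), written for clarity rather than efficiency.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic (hypothetical helper).

    y_true holds 1 for cases and 0 for non-cases. Each correctly ordered
    case/non-case pair counts 1, each tied pair counts 1/2.
    """
    cases = [s for y, s in zip(y_true, scores) if y == 1]
    non_cases = [s for y, s in zip(y_true, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    # Fraction of all case/non-case pairs that the scores rank correctly.
    return wins / (len(cases) * len(non_cases))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75 because three of the four case/non-case pairs are ranked correctly.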
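The NRI summarises how a new model moves individuals across risk categories compared with a reference score. It credits upward reclassification among participants who had an event and downward reclassification among those who did not. In its categorical form, NRI = [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)]. The following is a minimal pure-Python sketch of that formula. The abstract does not say whether the categorical or the category-free (continuous) NRI was used, nor which cut-offs were applied. The 10% and 20% thresholds in `CUTOFFS` and the function names are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical 10-year risk cut-offs: <10%, 10% to <20%, >=20%.
# These are illustrative assumptions, not the thresholds used in the study.
CUTOFFS = (0.10, 0.20)


def risk_category(risk: float, cutoffs: Sequence[float] = CUTOFFS) -> int:
    """Return the risk-category index (0 = lowest) for a predicted risk."""
    return bisect_right(cutoffs, risk)


def categorical_nri(
    y_true: Sequence[int],
    old_risk: Sequence[float],
    new_risk: Sequence[float],
    cutoffs: Sequence[float] = CUTOFFS,
) -> float:
    """Categorical Net Reclassification Index of new_risk versus old_risk.

    y_true holds 1 for participants with a CVD event and 0 otherwise.
    """
    up_event = down_event = n_event = 0
    up_nonevent = down_nonevent = n_nonevent = 0
    for y, old, new in zip(y_true, old_risk, new_risk):
        # Positive shift means a higher risk category under the new model.
        shift = risk_category(new, cutoffs) - risk_category(old, cutoffs)
        if y == 1:
            n_event += 1
            if shift > 0:
                up_event += 1
            elif shift < 0:
                down_event += 1
        else:
            n_nonevent += 1
            if shift > 0:
                up_nonevent += 1
            elif shift < 0:
                down_nonevent += 1
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")
    # Events should move up; non-events should move down.
    event_nri = (up_event - down_event) / n_event
    nonevent_nri = (down_nonevent - up_nonevent) / n_nonevent
    return event_nri + nonevent_nri
```

Multiplying the result by 100 gives a percentage, which is the form used to report the 13.15% and 7.00% improvements.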
ISSN: 1932-6203