Machine Learning–Based Analysis of Lifestyle Risk Factors for Atherosclerotic Cardiovascular Disease: Retrospective Case-Control Study
| Main Authors: | , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | JMIR Publications, 2025-08-01 |
| Series: | JMIR Medical Informatics |
| Online Access: | https://medinform.jmir.org/2025/1/e74415 |
| Summary: | Abstract
Background: The risk of developing atherosclerotic cardiovascular disease (ASCVD) varies among individuals and is related to a variety of lifestyle factors in addition to the presence of chronic diseases.
Objective: We aimed to assess the predictive accuracy of machine learning (ML) models incorporating lifestyle risk behaviors for ASCVD risk using a Korean nationwide database.
Methods: Using data from the Korea National Health and Nutrition Examination Survey, 5 ML algorithms were used to predict high ASCVD risk: logistic regression (LR), support vector machine, random forest, extreme gradient boosting, and light gradient boosting models. ASCVD risk was assessed using the pooled cohort equations, with a high-risk threshold of ≥7.5% over 10 years. Among the 8573 participants aged 40‐79 years, propensity score matching (PSM) was used to adjust for demographic confounders. The dataset was divided into training and test sets in an 8:2 ratio. Bootstrapping was also used in training the ML models, with the area under the receiver operating characteristic curve (AUROC) as the score. Shapley additive explanations were used to identify the variables most important to the models in assessing high ASCVD risk. As a sensitivity analysis, we additionally performed binary LR analysis; the ML models’ results were consistent with this conventional statistical model.
Results: Of the 8573 participants, 41.7% (n=3578) had high ASCVD risk. Before PSM, age and sex differed significantly between groups. PSM (1:1) yielded 1976 patients with balanced demographics. After PSM, the high ASCVD risk group had higher alcohol or tobacco use, lower omega-3 intake, higher BMI, and less physical activity, and spent less time sitting. Among the 5 ML models, the extreme gradient boosting model showed the highest AUROC, indicating superior overall discrimination between high and low ASCVD risk groups. However, the light gradient boosting model performed better in accuracy, recall, and F1 score.
Conclusions: Analyzing lifestyle behavioral factors in ASCVD risk with an ML model improves predictive performance compared with traditional models. Personalized prevention strategies tailored to an individual’s lifestyle can effectively reduce ASCVD risk. |
| ISSN: | 2291-9694 |
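
The evaluation steps named in the Methods summary (an 8:2 train/test split and bootstrapping scored by the area under the receiver operating characteristic curve) can be illustrated with a minimal sketch. This is not the authors' code. The record does not say how bootstrapping was applied, so the sketch assumes one common reading: resampling the held-out set to get a percentile interval for the AUROC. The function names (`split_80_20`, `auroc`, `bootstrap_auroc_ci`), the seeds, and the synthetic scores are all hypothetical, and only the Python standard library is used.

```python
import random


def split_80_20(n_rows, seed=0):
    """Shuffle row indices and cut them 8:2 into train and test index lists."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = n_rows * 8 // 10
    return idx[:cut], idx[cut:]


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity; ties share an average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auroc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (assumed use of bootstrapping)."""
    auroc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [scores[i] for i in sample]))
    stats.sort()
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    # Synthetic labels and risk scores, for illustration only (not study data).
    rng = random.Random(42)
    labels = [rng.randint(0, 1) for _ in range(500)]
    scores = [rng.gauss(0.6 if y else 0.4, 0.2) for y in labels]
    train_idx, test_idx = split_80_20(len(labels))
    y_test = [labels[i] for i in test_idx]
    s_test = [scores[i] for i in test_idx]
    print("test AUROC:", auroc(y_test, s_test))
    print("95% bootstrap interval:", bootstrap_auroc_ci(y_test, s_test, n_boot=200))
```

In practice a library routine such as scikit-learn's `roc_auc_score` would replace the hand-written `auroc`; the rank-based version is used here only to keep the sketch dependency-free.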