Machine Learning Model for Predicting Coronary Heart Disease Risk: Development and Validation Using Insights From a Japanese Population–Based Study
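| Main Authors: | Thien Vu, Yoshihiro Kokubo, Mai Inoue, Masaki Yamamoto, Attayeb Mohsen, Agustin Martin-Morales, Research Dawadi, Takao Inoue, Jie Ting Tay, Mari Yoshizaki, Naoki Watanabe, Yuki Kuriya, Chisa Matsumoto, Ahmed Arafa, Yoko M Nakao, Yuka Kato, Masayuki Teramoto, Michihiro Araki |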
|---|---|
| Format: | Article |
| Language: | English |
| Published: | JMIR Publications, 2025-05-01 |
| Series: | JMIR Cardio |
| Online Access: | https://cardio.jmir.org/2025/1/e68066 |
| _version_ | 1850271551793922048 |
|---|---|
| author | Thien Vu; Yoshihiro Kokubo; Mai Inoue; Masaki Yamamoto; Attayeb Mohsen; Agustin Martin-Morales; Research Dawadi; Takao Inoue; Jie Ting Tay; Mari Yoshizaki; Naoki Watanabe; Yuki Kuriya; Chisa Matsumoto; Ahmed Arafa; Yoko M Nakao; Yuka Kato; Masayuki Teramoto; Michihiro Araki |
| author_sort | Thien Vu |
| collection | DOAJ |
| description |
Abstract
Background: Coronary heart disease (CHD) is a major cause of morbidity and mortality worldwide. Identifying key risk factors is essential for effective risk assessment and prevention. A data-driven approach using machine learning (ML) can analyze complex, nonlinear, and high-dimensional datasets and uncover novel predictors of CHD that traditional models, which rely on predefined variables, cannot capture.
Objective: This study aims to evaluate the contribution of various risk factors to CHD, focusing on both established and novel markers, using ML techniques.
Methods: The study recruited 7672 participants aged 30-84 years from Suita City, Japan, between 1989 and 1999. Participants were monitored for cardiovascular events over an average of 15 years. After excluding individuals with missing outcome data and eliminating unnecessary variables, 7260 participants and 28 variables were included in the analysis. Five ML models (logistic regression, random forest [RF], support vector machine, Extreme Gradient Boosting, and Light Gradient-Boosting Machine) were applied to predict CHD incidence. Model performance was evaluated using accuracy, sensitivity, specificity, precision, area under the curve, F1
Results: Among 7260 participants, 305 (4.2%) were diagnosed with CHD. The RF model demonstrated the highest performance, with an accuracy of 0.73 (95% CI 0.64‐0.80), sensitivity of 0.74 (95% CI 0.62‐0.84), specificity of 0.72 (95% CI 0.61‐0.83), and an area under the curve of 0.73 (95% CI 0.65‐0.80). RF also showed excellent calibration, with predicted probabilities closely aligning with observed outcomes, and provided substantial net benefit across a range of risk thresholds, as demonstrated by decision curve analysis. SHAP (Shapley Additive Explanations) analysis identified key predictors of CHD, including the intima-media thickness (IMT_cMax) of the common carotid artery, blood pressure, lipid profiles (non–high-density lipoprotein cholesterol, high-density lipoprotein cholesterol, and triglycerides), and estimated glomerular filtration rate. Novel risk factors identified as significant contributors to CHD risk included lower calcium levels, elevated white blood cell counts, and body fat percentage. Furthermore, a protective effect was observed in women, suggesting a potential need for gender-specific risk assessment strategies in future cardiovascular health evaluations.
Conclusions: We developed a model to predict CHD using ML and applied SHAP methods for interpretation. This approach highlights the multifactorial nature of CHD risk evaluation and aims to support health care professionals in identifying risk factors and formulating effective prevention strategies. |
| format | Article |
| id | doaj-art-54df5baacd0c459d9bbad8dad8c41da3 |
| institution | OA Journals |
| issn | 2561-1011 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | JMIR Publications |
| record_format | Article |
| series | JMIR Cardio |
| spelling | doaj-art-54df5baacd0c459d9bbad8dad8c41da3; 2025-08-20T01:52:12Z; eng; JMIR Publications; JMIR Cardio; ISSN 2561-1011; 2025-05-01; volume 9; e68066; DOI 10.2196/68066; https://cardio.jmir.org/2025/1/e68066. Author ORCIDs: Thien Vu, http://orcid.org/0000-0002-6956-0191; Yoshihiro Kokubo, http://orcid.org/0000-0002-0705-9449; Mai Inoue, http://orcid.org/0000-0003-3204-1627; Masaki Yamamoto, http://orcid.org/0000-0001-9179-6080; Attayeb Mohsen, http://orcid.org/0000-0003-0690-8012; Agustin Martin-Morales, http://orcid.org/0000-0002-3564-4776; Research Dawadi, http://orcid.org/0000-0002-3524-1459; Takao Inoue, http://orcid.org/0000-0002-2080-7480; Jie Ting Tay, http://orcid.org/0009-0008-8385-8649; Mari Yoshizaki, http://orcid.org/0009-0002-5031-0632; Naoki Watanabe, http://orcid.org/0009-0005-2703-0044; Yuki Kuriya, http://orcid.org/0000-0001-5118-7803; Chisa Matsumoto, http://orcid.org/0000-0002-8066-8363; Ahmed Arafa, http://orcid.org/0000-0002-3335-2243; Yoko M Nakao, http://orcid.org/0000-0002-3627-5626; Yuka Kato, http://orcid.org/0009-0009-2739-9609; Masayuki Teramoto, http://orcid.org/0000-0002-2318-2447; Michihiro Araki, http://orcid.org/0000-0002-6686-4018 |
| title | Machine Learning Model for Predicting Coronary Heart Disease Risk: Development and Validation Using Insights From a Japanese Population–Based Study |
| url | https://cardio.jmir.org/2025/1/e68066 |
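
The abstract reports a CHD prevalence of 4.2% (305 of 7260) alongside a sensitivity of 0.74 and a specificity of 0.72 for the RF model. The sketch below shows how the listed metrics (accuracy, sensitivity, specificity, precision, F1) relate to one another at that prevalence. It is illustrative only: it assumes, hypothetically, that the rounded sensitivity and specificity apply to the whole cohort, which the abstract does not state, so the derived values are not results from the study. The names `Confusion`, `expected_confusion`, and `metrics` are made up for this example.

```python
from dataclasses import dataclass

# Figures quoted in the abstract.
N_PARTICIPANTS = 7260
N_CHD = 305
prevalence = N_CHD / N_PARTICIPANTS  # about 0.042


@dataclass(frozen=True)
class Confusion:
    """Expected (possibly fractional) confusion-matrix counts."""
    tp: float
    fp: float
    fn: float
    tn: float


def expected_confusion(n, n_events, sensitivity, specificity):
    """Expected counts if a classifier with the given sensitivity and
    specificity were applied to n people, n_events of whom have the outcome.
    ASSUMPTION: applying the abstract's rounded values to the full cohort
    is a hypothetical scenario, not the study's evaluation setup."""
    n_non_events = n - n_events
    tp = sensitivity * n_events
    tn = specificity * n_non_events
    return Confusion(tp=tp, fp=n_non_events - tn, fn=n_events - tp, tn=tn)


def metrics(c):
    """Standard classification metrics from confusion-matrix counts."""
    total = c.tp + c.fp + c.fn + c.tn
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    precision = c.tp / (c.tp + c.fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


if __name__ == "__main__":
    conf = expected_confusion(N_PARTICIPANTS, N_CHD, 0.74, 0.72)
    print(f"prevalence: {prevalence:.3f}")
    for name, value in metrics(conf).items():
        print(f"{name}: {value:.3f}")
```

Under this assumption, precision comes out far lower than sensitivity or specificity because non-events outnumber events by more than 20 to 1, which is one reason the abstract reports several metrics rather than accuracy alone.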