Machine Learning Model for Predicting Coronary Heart Disease Risk: Development and Validation Using Insights From a Japanese Population–Based Study

Bibliographic Details
Main Authors: Thien Vu, Yoshihiro Kokubo, Mai Inoue, Masaki Yamamoto, Attayeb Mohsen, Agustin Martin-Morales, Research Dawadi, Takao Inoue, Jie Ting Tay, Mari Yoshizaki, Naoki Watanabe, Yuki Kuriya, Chisa Matsumoto, Ahmed Arafa, Yoko M Nakao, Yuka Kato, Masayuki Teramoto, Michihiro Araki
Format: Article
Language: English
Published: JMIR Publications 2025-05-01
Series: JMIR Cardio
Online Access: https://cardio.jmir.org/2025/1/e68066
collection DOAJ
description Abstract
Background: Coronary heart disease (CHD) is a major cause of morbidity and mortality worldwide. Identifying key risk factors is essential for effective risk assessment and prevention. A data-driven approach using machine learning (ML) offers advanced techniques to analyze complex, nonlinear, and high-dimensional datasets, uncovering novel predictors of CHD that go beyond the limitations of traditional models, which rely on predefined variables.
Objective: This study aims to evaluate the contribution of various risk factors to CHD, focusing on both established and novel markers using ML techniques.
Methods: The study recruited 7672 participants aged 30-84 years from Suita City, Japan, between 1989 and 1999. Over an average of 15 years, participants were monitored for cardiovascular events. A total of 7260 participants and 28 variables were included in the analysis after excluding individuals with missing outcome data and eliminating unnecessary variables. Five ML models were applied for predicting CHD incidence: logistic regression, random forest (RF), support vector machine, Extreme Gradient Boosting, and Light Gradient-Boosting Machine. Model performance was evaluated using accuracy, sensitivity, specificity, precision, area under the curve, F1
Results: Among 7260 participants, 305 (4.2%) were diagnosed with CHD. The RF model demonstrated the highest performance, with an accuracy of 0.73 (95% CI 0.64‐0.80), sensitivity of 0.74 (95% CI 0.62‐0.84), specificity of 0.72 (95% CI 0.61‐0.83), and an area under the curve of 0.73 (95% CI 0.65‐0.80). RF also showed excellent calibration, with predicted probabilities closely aligning with observed outcomes, and provided substantial net benefit across a range of risk thresholds, as demonstrated by decision curve analysis. SHAP analysis elucidated key predictors of CHD, including the intima-media thickness (IMT_cMax) of the common carotid artery, blood pressure, lipid profiles (non–high-density lipoprotein cholesterol, high-density lipoprotein cholesterol, and triglycerides), and estimated glomerular filtration rate. Novel risk factors identified as significant contributors to CHD risk included lower calcium levels, elevated white blood cell counts, and body fat percentage. Furthermore, a protective effect was observed in women, suggesting the potential necessity for gender-specific risk assessment strategies in future cardiovascular health evaluations.
Conclusions: We developed a model to predict CHD using ML and applied SHAP methods for interpretation. This approach highlights the multifactor nature of CHD risk evaluation, aiming to support health care professionals in identifying risk factors and formulating effective prevention strategies.
format Article
id doaj-art-54df5baacd0c459d9bbad8dad8c41da3
institution OA Journals
issn 2561-1011
language English
publishDate 2025-05-01
publisher JMIR Publications
record_format Article
series JMIR Cardio
spelling doaj-art-54df5baacd0c459d9bbad8dad8c41da3; 2025-08-20T01:52:12Z; eng; JMIR Publications; JMIR Cardio; ISSN 2561-1011; 2025-05-01; volume 9; e68066; DOI 10.2196/68066
Thien Vu http://orcid.org/0000-0002-6956-0191
Yoshihiro Kokubo http://orcid.org/0000-0002-0705-9449
Mai Inoue http://orcid.org/0000-0003-3204-1627
Masaki Yamamoto http://orcid.org/0000-0001-9179-6080
Attayeb Mohsen http://orcid.org/0000-0003-0690-8012
Agustin Martin-Morales http://orcid.org/0000-0002-3564-4776
Research Dawadi http://orcid.org/0000-0002-3524-1459
Takao Inoue http://orcid.org/0000-0002-2080-7480
Jie Ting Tay http://orcid.org/0009-0008-8385-8649
Mari Yoshizaki http://orcid.org/0009-0002-5031-0632
Naoki Watanabe http://orcid.org/0009-0005-2703-0044
Yuki Kuriya http://orcid.org/0000-0001-5118-7803
Chisa Matsumoto http://orcid.org/0000-0002-8066-8363
Ahmed Arafa http://orcid.org/0000-0002-3335-2243
Yoko M Nakao http://orcid.org/0000-0002-3627-5626
Yuka Kato http://orcid.org/0009-0009-2739-9609
Masayuki Teramoto http://orcid.org/0000-0002-2318-2447
Michihiro Araki http://orcid.org/0000-0002-6686-4018