Machine Learning–Based Risk Factor Analysis and Prediction Model Construction for the Occurrence of Chronic Heart Failure: Health Ecologic Study

Bibliographic Details
Main Authors: Qian Xu, Xue Cai, Ruicong Yu, Yueyue Zheng, Guanjie Chen, Hui Sun, Tianyun Gao, Cuirong Xu, Jing Sun
Format: Article
Language: English
Published: JMIR Publications 2025-01-01
Series: JMIR Medical Informatics
Online Access: https://medinform.jmir.org/2025/1/e64972
collection DOAJ
description
Background: Chronic heart failure (CHF) is a serious threat to human health. Its high morbidity and mortality rates impose a heavy burden on the health care system and on society. The growing volume of medical data and the rapid development of machine learning (ML) technologies create new opportunities to investigate the mechanisms of CHF in depth and to construct predictive models. A health ecology research methodology allows CHF risk factors to be examined across a wide range of environmental, social, and individual factors. This helps identify high-risk groups at an early stage and provides a scientific basis for developing precise prevention and intervention strategies.
Objective: This study aims to use ML to construct a model that predicts the risk of CHF occurrence and to analyze CHF risk from a health ecology perspective.
Methods: Data were sourced from the Jackson Heart Study database. Preprocessing included handling of missing values and standardization of the data. Principal component analysis and random forest (RF) were compared as feature selection techniques. Eight ML models were then constructed and evaluated: decision tree, RF, extreme gradient boosting, adaptive boosting (AdaBoost), support vector machine, naive Bayes, multilayer perceptron, and bootstrap forest. The models were internally validated with 10-fold cross-validation on the training and validation sets, and their accuracy, precision, sensitivity, F1-score, and area under the curve (AUC) were compared. The best-performing model was then refined through hyperparameter optimization.
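The evaluation protocol described under Methods (10-fold cross-validation, scored by accuracy, precision, sensitivity, F1-score, and AUC) can be sketched in plain Python. This is a minimal illustration, not the authors' code. It assumes binary labels with 1 marking CHF, assumes stratified folds (the abstract does not say whether the folds were stratified), and computes AUC with the pairwise rank formula. All function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn). Assumption: label 1 marks the positive (CHF) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, sensitivity (recall), and F1-score from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * sensitivity / (precision + sensitivity) if precision + sensitivity else 0.0
    return {"accuracy": accuracy, "precision": precision, "sensitivity": sensitivity, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_k_fold(y: Sequence[int], k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_idx, test_idx) pairs whose test folds keep roughly the overall class ratio."""
    rng = random.Random(seed)
    order: List[int] = []
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)  # shuffle within each class
        order.extend(idx)
    # Dealing the class-grouped order round-robin spreads every class across all folds.
    folds = [order[f::k] for f in range(k)]
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits
```

In use, each (train, test) pair would fit one candidate model on the training indices, score the held-out fold with `classification_metrics` and `roc_auc`, and average the results over the 10 folds.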
Results: The 21 features selected by RF yielded an average root mean square error of 0.30, outperforming principal component analysis. For data balancing, Synthetic Minority Oversampling Technique combined with Edited Nearest Neighbors (SMOTE-ENN) showed better accuracy. AdaBoost was the most effective model, with an AUC of 0.86, accuracy of 75.30%, precision of 0.86, sensitivity of 0.69, and F1-score of 0.76. On the training and validation sets, 10-fold cross-validation gave an AUC of 0.97, accuracy of 91.27%, precision of 0.94, sensitivity of 0.92, and F1-score of 0.94. After hyperparameter tuning by random search, the accuracy and AUC of AdaBoost improved: its accuracy reached 77.68% and its AUC was 0.86.
Conclusions: This study offers insights into CHF risk prediction. Future research should focus on prospective and longitudinal studies, more diverse data, more advanced techniques, and the interactions between risk factors to improve CHF prevention and management.
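The Results report that Synthetic Minority Oversampling Technique with Edited Nearest Neighbors gave better accuracy for class balancing. The pure-Python sketch below illustrates that resampling idea; it is not the study's pipeline. It assumes binary labels with 1 as the minority (CHF) class, Euclidean distance on already standardized features, and the majority-vote form of ENN. The function names and the defaults k=5 (SMOTE) and k=3 (ENN) are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _k_nearest(X: Matrix, i: int, k: int, candidates: Sequence[int]) -> List[int]:
    """Indices of the k candidates closest to X[i] (Euclidean), excluding i itself."""
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def smote(X: Matrix, y: Sequence[int], minority: int = 1, k: int = 5,
          seed: int = 0) -> Tuple[Matrix, List[int]]:
    """Add synthetic minority samples until both classes are the same size.

    Assumption: binary labels; `minority` names the smaller class.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, t in enumerate(y) if t == minority]
    if len(minority_idx) < 2:
        raise ValueError("SMOTE needs at least two minority samples to interpolate between")
    n_needed = (len(y) - len(minority_idx)) - len(minority_idx)
    X_out, y_out = [list(row) for row in X], list(y)
    for _ in range(max(n_needed, 0)):
        i = rng.choice(minority_idx)
        j = rng.choice(_k_nearest(X, i, k, minority_idx))
        gap = rng.random()  # position along the segment from X[i] to X[j]
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out


def enn(X: Matrix, y: Sequence[int], k: int = 3) -> Tuple[Matrix, List[int]]:
    """Edited Nearest Neighbours, majority-vote form: drop samples whose k neighbours mostly disagree."""
    all_idx = range(len(y))
    keep = []
    for i in all_idx:
        neighbours = _k_nearest(X, i, k, all_idx)
        agree = sum(1 for j in neighbours if y[j] == y[i])
        if 2 * agree > len(neighbours):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X: Matrix, y: Sequence[int], minority: int = 1, k_smote: int = 5,
              k_enn: int = 3, seed: int = 0) -> Tuple[Matrix, List[int]]:
    """Oversample with SMOTE first, then clean overlapping samples with ENN."""
    X_res, y_res = smote(X, y, minority=minority, k=k_smote, seed=seed)
    return enn(X_res, y_res, k=k_enn)
```

In a cross-validated workflow, resampling like this would normally be applied only to each training fold, so that synthetic or removed points never alter the held-out fold used for scoring.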
id doaj-art-61803035fdf34b4aa84a5b4d92af52e5
institution Kabale University
issn 2291-9694
volume 13
article e64972
doi 10.2196/64972
orcid Qian Xu https://orcid.org/0009-0003-6723-1436
Xue Cai https://orcid.org/0000-0003-4647-3279
Ruicong Yu https://orcid.org/0009-0006-0183-7525
Yueyue Zheng https://orcid.org/0009-0008-0188-3868
Guanjie Chen https://orcid.org/0000-0001-5946-6452
Hui Sun https://orcid.org/0009-0000-1350-5647
Tianyun Gao https://orcid.org/0009-0001-6773-6878
Cuirong Xu https://orcid.org/0000-0002-8979-0533
Jing Sun https://orcid.org/0000-0002-0097-2438
indexed 2025-01-31T20:00:58Z