SGO enhanced random forest and extreme gradient boosting framework for heart disease prediction

Bibliographic Details
Main Authors: Anima Naik, Ghanshyam G. Tejani, Seyed Jalaleddin Mousavirad
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Scientific Reports
Subjects: SGO; RF; XGBClassifier; Heart disease; Cleveland dataset; Statlog dataset
Online Access: https://doi.org/10.1038/s41598-025-02525-7
_version_ 1850127955758415872
author Anima Naik
Ghanshyam G. Tejani
Seyed Jalaleddin Mousavirad
collection DOAJ
description Abstract: Cardiovascular disease (CVD) remains a leading global health concern, accounting for approximately 31.5% of deaths worldwide. According to the World Health Organization (WHO), over 20.5 million people die of CVD each year, a figure projected to rise to 24.2 million by 2030. Early diagnosis is critical and can be facilitated by monitoring key risk factors such as cholesterol levels, blood pressure, diabetes, and obesity. This study proposes a heart disease prediction (HDP) model employing Random Forest (RF) and eXtreme Gradient Boosting (XGB) classifiers. The hyperparameters of both classifiers are tuned with the Social Group Optimization (SGO) algorithm. The model was developed and validated using the Cleveland and Statlog datasets from the UCI repository. Before optimization, RF achieved an accuracy (Acc.) of 84% and a ROC-AUC of 92.03% on Cleveland, and 88.09% Acc. with a ROC-AUC of 97.50% on Statlog. The XGB classifier achieved 81.97% Acc. and a ROC-AUC of 90.73% on Cleveland, and 92.86% Acc. with a ROC-AUC of 96.14% on Statlog. After SGO-based optimization, RF improved to 95.08% Acc. and 95.26% ROC-AUC on Cleveland, and 95.24% Acc. with 98.18% ROC-AUC on Statlog. Similarly, the optimized XGB classifier reached 93.44% Acc. and 95.24% ROC-AUC on Cleveland, and 97.62% Acc. with 97.50% ROC-AUC on Statlog. These results highlight the effectiveness of SGO in enhancing machine learning (ML) performance for medical prediction problems. However, the study has limitations. The evaluation was conducted on only two benchmark datasets, which may not fully reflect the diversity and complexity of real-world clinical populations. Furthermore, external validation using independent or real-time clinical data was not performed, which may limit the generalizability of the results. The computational cost of SGO optimization was also not assessed. Future research should focus on validating the model across broader datasets, assessing real-world applicability, and analyzing computational efficiency to ensure scalability and clinical adoption.
format Article
id doaj-art-39de98da4644427882ba1e240a6dc519
institution OA Journals
issn 2045-2322
language English
publishDate 2025-05-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-39de98da4644427882ba1e240a6dc519
2025-08-20T02:33:31Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2025-05-01
15
1
1
31
10.1038/s41598-025-02525-7
SGO enhanced random forest and extreme gradient boosting framework for heart disease prediction
Anima Naik (0)
Ghanshyam G. Tejani (1)
Seyed Jalaleddin Mousavirad (2)
Department of CSE, Raghu Engineering College
Department of Research Analytics, Saveetha Dental College and Hospitals, Saveetha Institute of Medical and Technical Sciences, Saveetha University
Department of Computer and Electrical Engineering, Mid Sweden University
https://doi.org/10.1038/s41598-025-02525-7
SGO
RF
XGBClassifier
Heart disease
Cleveland dataset
Statlog dataset
title SGO enhanced random forest and extreme gradient boosting framework for heart disease prediction
topic SGO
RF
XGBClassifier
Heart disease
Cleveland dataset
Statlog dataset
url https://doi.org/10.1038/s41598-025-02525-7