The effect of imbalance data mitigation techniques on cardiovascular disease prediction
Class imbalance is a common challenge in medical datasets and can adversely affect the performance of machine learning models. This paper explores how several data imbalance mitigation techniques affect the performance of cardiovascular disease prediction. This study applied va...
| Main Authors: | Raphael Ozighor Enihe, Rajesh Prasad, Francisca Nonyelum Ogwueleka, Fatimah Binta Abdullahi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nigerian Society of Physical Sciences, 2025-05-01 |
| Series: | Journal of Nigerian Society of Physical Sciences |
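| Subjects: | Imbalance dataset; Cardiovascular disease prediction; SMOTE-TOMEK; Machine learning; Overfitting and Underfitting |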
| Online Access: | https://journal.nsps.org.ng/index.php/jnsps/article/view/2385 |
| _version_ | 1850199762691686400 |
|---|---|
| author | Raphael Ozighor Enihe; Rajesh Prasad; Francisca Nonyelum Ogwueleka; Fatimah Binta Abdullahi |
| author_facet | Raphael Ozighor Enihe; Rajesh Prasad; Francisca Nonyelum Ogwueleka; Fatimah Binta Abdullahi |
| author_sort | Raphael Ozighor Enihe |
| collection | DOAJ |
| description | Class imbalance is a common challenge in medical datasets and can adversely affect the performance of machine learning models. This paper explores how several data imbalance mitigation techniques affect the performance of cardiovascular disease prediction. To address this problem, the study applied various data balancing techniques to a real-life cardiovascular disease (CVD) dataset of 1000 patient records with 14 features obtained from the University of Abuja Teaching Hospital, Nigeria. The data balancing techniques used include random under-sampling, Synthetic Minority Over-sampling Technique (SMOTE), SMOTE combined with Edited Nearest Neighbours (SMOTE-ENN), and SMOTE combined with Tomek Links under-sampling (SMOTE-TOMEK). After these techniques were applied, their performance was evaluated on seven machine learning models: Random Forest, XGBoost, LightGBM, Gradient Boosting, K-Nearest Neighbours, Decision Tree, and Support Vector Machine. The evaluation metrics used are precision, recall, F1-score, accuracy, and area under the receiver operating characteristic curve (ROC-AUC). Learning curve plots were also used to show the impact of the different data balancing techniques on overfitting and underfitting. The results showed that applying data balancing techniques significantly enhances the performance of machine learning models in heart disease prediction and effectively addresses overfitting and underfitting. SMOTE-TOMEK yielded the best-balanced fit as well as the highest precision, recall, F1-score, accuracy of 92%, and ROC-AUC of 96% on the Light Gradient Boosting Machine (LightGBM) model. These results underscore the critical role of data balancing in predictive modelling for heart disease and highlight the effectiveness of specific techniques and models in achieving accurate, more reliable, and generalised predictions. |
| format | Article |
| id | doaj-art-a721a9bd021e48d4b63702ef1e370cbf |
| institution | OA Journals |
| issn | 2714-2817 2714-4704 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Nigerian Society of Physical Sciences |
| record_format | Article |
| series | Journal of Nigerian Society of Physical Sciences |
| spelling | doaj-art-a721a9bd021e48d4b63702ef1e370cbf; 2025-08-20T02:12:33Z; eng; Nigerian Society of Physical Sciences; Journal of Nigerian Society of Physical Sciences; 2714-2817; 2714-4704; 2025-05-01; 7; 2; 10.46481/jnsps.2025.2385; The effect of imbalance data mitigation techniques on cardiovascular disease prediction; Raphael Ozighor Enihe [0] https://orcid.org/0000-0001-8155-4205; Rajesh Prasad [1]; Francisca Nonyelum Ogwueleka [2]; Fatimah Binta Abdullahi [3]; [0] Department of Computer Science, Baze University, Abuja, Nigeria; [1] Department of Computer Science & Engineering, Ajay Kumar Garg Engineering College, Ghaziabad, India; Department of Computer Science, University of Abuja, Abuja, Nigeria; [2] Department of Computer Science, University of Abuja; [3] Department of Computer Science, University of Abuja, Abuja, Nigeria; https://journal.nsps.org.ng/index.php/jnsps/article/view/2385; Imbalance dataset; Cardiovascular disease prediction; SMOTE-TOMEK; Machine learning; Overfitting and Underfitting |
| spellingShingle | Raphael Ozighor Enihe; Rajesh Prasad; Francisca Nonyelum Ogwueleka; Fatimah Binta Abdullahi; The effect of imbalance data mitigation techniques on cardiovascular disease prediction; Journal of Nigerian Society of Physical Sciences; Imbalance dataset; Cardiovascular disease prediction; SMOTE-TOMEK; Machine learning; Overfitting and Underfitting |
| title | The effect of imbalance data mitigation techniques on cardiovascular disease prediction |
| title_full | The effect of imbalance data mitigation techniques on cardiovascular disease prediction |
| title_fullStr | The effect of imbalance data mitigation techniques on cardiovascular disease prediction |
| title_full_unstemmed | The effect of imbalance data mitigation techniques on cardiovascular disease prediction |
| title_short | The effect of imbalance data mitigation techniques on cardiovascular disease prediction |
| title_sort | effect of imbalance data mitigation techniques on cardiovascular disease prediction |
| topic | Imbalance dataset; Cardiovascular disease prediction; SMOTE-TOMEK; Machine learning; Overfitting and Underfitting |
| url | https://journal.nsps.org.ng/index.php/jnsps/article/view/2385 |
| work_keys_str_mv | AT raphaelozighorenihe theeffectofimbalancedatamitigationtechniquesoncardiovasculardiseaseprediction AT rajeshprasad theeffectofimbalancedatamitigationtechniquesoncardiovasculardiseaseprediction AT franciscanonyelumogwueleka theeffectofimbalancedatamitigationtechniquesoncardiovasculardiseaseprediction AT fatimahbintaabdullahi theeffectofimbalancedatamitigationtechniquesoncardiovasculardiseaseprediction AT raphaelozighorenihe effectofimbalancedatamitigationtechniquesoncardiovasculardiseaseprediction AT rajeshprasad effectofimbalancedatamitigationtechniquesoncardiovasculardiseaseprediction AT franciscanonyelumogwueleka effectofimbalancedatamitigationtechniquesoncardiovasculardiseaseprediction AT fatimahbintaabdullahi effectofimbalancedatamitigationtechniquesoncardiovasculardiseaseprediction |
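
The description above names random under-sampling as one of the balancing techniques and precision, recall, F1-score, and accuracy as evaluation metrics. As a rough illustration only, the sketch below shows one way these could be written in plain Python. It is not the authors' code: the function names `random_under_sample` and `binary_metrics`, the seed, and the toy data are all assumptions made for this example, and the study's dataset and models are not reproduced here.

```python
import random
from collections import Counter


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows so every class shrinks to the size of the smallest class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample without replacement; the minority class is kept whole.
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1-score, and accuracy from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy data (made up for this sketch): 8 negative and 2 positive records.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
bal_rows, bal_labels = random_under_sample(rows, labels)
print(Counter(bal_labels))

# Toy predictions (made up): one true positive, one miss, one false alarm.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

Under-sampling discards majority-class records, which is why the description also lists over-sampling approaches such as SMOTE; those generate synthetic minority records instead and are not sketched here.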