Analyzing risk factors and handling imbalanced data for predicting stroke risk using machine learning
Stroke is a serious medical condition resulting from disturbances in blood flow to the brain, signaling a chronic health issue that requires an immediate response. Principal risk factors increasing the likelihood of stroke include the presence of pre-existing conditions such as Diabetes Mellitus (DM...
| Main Authors: | Adiwijaya Adiwijaya, Nur Ghaniaviyanto Ramadhan |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Universitas Ahmad Dahlan, 2025-02-01 |
| Series: | IJAIN (International Journal of Advances in Intelligent Informatics) |
| Subjects: | general checkup data, machine learning, stroke prediction, adasyn, random forest |
| Online Access: | https://ijain.org/index.php/IJAIN/article/view/1678 |
| _version_ | 1850240773400821760 |
|---|---|
| author | Adiwijaya Adiwijaya; Nur Ghaniaviyanto Ramadhan |
| author_facet | Adiwijaya Adiwijaya; Nur Ghaniaviyanto Ramadhan |
| author_sort | Adiwijaya Adiwijaya |
| collection | DOAJ |
| description | Stroke is a serious medical condition caused by disturbed blood flow to the brain; it signals a chronic health issue and requires an immediate response. Principal risk factors include pre-existing conditions such as Diabetes Mellitus (DM), hypertension, and high cholesterol levels. Effective preventive measures are crucial to minimize stroke risk, and this study applies predictive methods to three years (2019-2021) of clinical examination data, known as the general checkup (GCU) dataset. The study aims to predict an individual's stroke risk for the following year. It also addresses preprocessing of the GCU dataset: replacing missing values with the statistical mean, label encoding, feature correlation analysis using entropy values, and handling data imbalance with the Adaptive Synthetic (ADASYN) technique. Several machine learning models are compared on predictive performance. In the experiments, the Random Forest model performs best, with 98.7% accuracy and a 63.9% F1-score. The research highlights the importance of preemptive measures against stroke through predictive techniques on clinical data, with Random Forest proving the most effective of the compared models at forecasting stroke probability. |
| format | Article |
| id | doaj-art-cdd31dba679f45aeba61ee49a91e1dee |
| institution | OA Journals |
| issn | 2442-6571, 2548-3161 |
| language | English |
| publishDate | 2025-02-01 |
| publisher | Universitas Ahmad Dahlan |
| record_format | Article |
| series | IJAIN (International Journal of Advances in Intelligent Informatics) |
| spelling | doaj-art-cdd31dba679f45aeba61ee49a91e1dee; 2025-08-20T02:00:46Z; eng; Universitas Ahmad Dahlan; IJAIN (International Journal of Advances in Intelligent Informatics); 2442-6571; 2548-3161; 2025-02-01; 11; 1; 39; 54; 10.26555/ijain.v11i1.1678; 327; Analyzing risk factors and handling imbalanced data for predicting stroke risk using machine learning; Adiwijaya Adiwijaya (School of Computing, Telkom University); Nur Ghaniaviyanto Ramadhan (School of Computing, Telkom University); https://ijain.org/index.php/IJAIN/article/view/1678; general checkup data; machine learning; stroke prediction; adasyn; random forest |
| title | Analyzing risk factors and handling imbalanced data for predicting stroke risk using machine learning |
| topic | general checkup data; machine learning; stroke prediction; adasyn; random forest |
| url | https://ijain.org/index.php/IJAIN/article/view/1678 |
| work_keys_str_mv | AT adiwijayaadiwijaya analyzingriskfactorsandhandlingimbalanceddataforpredictingstrokeriskusingmachinelearning AT nurghaniaviyantoramadhan analyzingriskfactorsandhandlingimbalanceddataforpredictingstrokeriskusingmachinelearning |
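
The description reports 98.7% accuracy alongside a 63.9% F1-score. The gap between the two is typical of imbalanced data, where the many negative cases dominate accuracy. The sketch below shows how both metrics follow from the same confusion matrix. The function name `classification_scores` and every count are hypothetical, invented only to illustrate the arithmetic; they are not the study's confusion matrix or code.

```python
def classification_scores(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall; it ignores true negatives.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Hypothetical counts (an assumption, not taken from the article):
# 2000 patients, of whom 39 are stroke cases.
acc, prec, rec, f1 = classification_scores(tp=23, fp=10, fn=16, tn=1951)
print(f"model:    accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")

# A trivial baseline that always predicts "no stroke" on the same hypothetical data.
base_acc, _, _, base_f1 = classification_scores(tp=0, fp=0, fn=39, tn=1961)
print(f"baseline: accuracy={base_acc:.3f} f1={base_f1:.3f}")
```

With these invented counts the always-negative baseline is also highly accurate but has an F1 of zero. That is why an F1-score is worth reading next to accuracy when the positive class is rare.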