Analyzing risk factors and handling imbalanced data for predicting stroke risk using machine learning

Stroke is a serious medical condition resulting from disturbances in blood flow to the brain, signaling a chronic health issue that requires an immediate response. Principal risk factors increasing the likelihood of stroke include the presence of pre-existing conditions such as Diabetes Mellitus (DM...

Bibliographic Details
Main Authors: Adiwijaya Adiwijaya, Nur Ghaniaviyanto Ramadhan
Format: Article
Language: English
Published: Universitas Ahmad Dahlan 2025-02-01
Series: IJAIN (International Journal of Advances in Intelligent Informatics)
Subjects: general checkup data, machine learning, stroke prediction, adasyn, random forest
Online Access: https://ijain.org/index.php/IJAIN/article/view/1678
_version_ 1850240773400821760
author Adiwijaya Adiwijaya
Nur Ghaniaviyanto Ramadhan
collection DOAJ
description Stroke is a serious medical condition resulting from disrupted blood flow to the brain, a chronic health issue that requires an immediate response. Principal risk factors for stroke include pre-existing conditions such as Diabetes Mellitus (DM), hypertension, and high cholesterol. Effective prevention is crucial to minimizing stroke risk, and this study takes a predictive approach based on three years (2019-2021) of clinical examination data, known as the general checkup (GCU) dataset. The study aims to predict an individual's stroke risk for the following year. It also addresses the preprocessing of the GCU dataset: missing values are replaced with the statistical mean, label encoding is applied, feature correlation is analyzed using entropy values, and data imbalance is handled with the Adaptive Synthetic (ADASYN) technique. Several machine learning models are then compared on predictive performance. In the experiments, the Random Forest model performed best, with 98.7% accuracy and a 63.9% F1-score. The research highlights the importance of preemptive measures against stroke through predictive techniques on clinical data, with the Random Forest model proving most effective at forecasting stroke probability.
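The gap between the reported accuracy (98.7%) and F1-score (63.9%) is typical of imbalanced data, where most records belong to the non-stroke class. The sketch below illustrates the effect with plain Python. The counts are invented for illustration and are not taken from the GCU dataset or the article's results; the names `Confusion`, `confusion`, `accuracy`, and `f1_score` are hypothetical helpers, not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary confusion matrix (1 = stroke, 0 = no stroke)."""
    tp: int
    fp: int
    fn: int
    tn: int


def confusion(y_true, y_pred) -> Confusion:
    """Tally true/false positives and negatives from paired labels."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def accuracy(c: Confusion) -> float:
    """Share of all predictions that are correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def f1_score(c: Confusion) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


# Invented example: 1000 people, only 20 of whom have a stroke.
y_true = [1] * 20 + [0] * 980

# A hypothetical model that finds 12 of the 20 cases and raises 6 false alarms.
y_model = [1] * 12 + [0] * 8 + [1] * 6 + [0] * 974
model = confusion(y_true, y_model)

# A trivial baseline that predicts "no stroke" for everyone.
baseline = confusion(y_true, [0] * 1000)

print(f"model:    accuracy={accuracy(model):.3f}  f1={f1_score(model):.3f}")
print(f"baseline: accuracy={accuracy(baseline):.3f}  f1={f1_score(baseline):.3f}")
```

In this made-up case the baseline that never predicts a stroke still reaches 98% accuracy while its F1-score is zero, which is why F1 is the more informative figure when the positive class is rare and why resampling techniques such as ADASYN are applied before training.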
format Article
id doaj-art-cdd31dba679f45aeba61ee49a91e1dee
institution OA Journals
issn 2442-6571
2548-3161
language English
publishDate 2025-02-01
publisher Universitas Ahmad Dahlan
record_format Article
series IJAIN (International Journal of Advances in Intelligent Informatics)
spelling doaj-art-cdd31dba679f45aeba61ee49a91e1dee
2025-08-20T02:00:46Z
eng
Universitas Ahmad Dahlan
IJAIN (International Journal of Advances in Intelligent Informatics)
2442-6571
2548-3161
2025-02-01
1113954
10.26555/ijain.v11i1.1678
327
Analyzing risk factors and handling imbalanced data for predicting stroke risk using machine learning
Adiwijaya Adiwijaya (School of Computing, Telkom University)
Nur Ghaniaviyanto Ramadhan (School of Computing, Telkom University)
https://ijain.org/index.php/IJAIN/article/view/1678
general checkup data
machine learning
stroke prediction
adasyn
random forest
title Analyzing risk factors and handling imbalanced data for predicting stroke risk using machine learning
topic general checkup data
machine learning
stroke prediction
adasyn
random forest
url https://ijain.org/index.php/IJAIN/article/view/1678