Developing a Predictive Model for Stroke Disease Detection Using a Scalable Machine Learning Approach

Bibliographic Details
Main Authors: Assefa Senbato Genale, Tsion Ayalew Dessalegn
Format: Article
Language: English
Published: Wiley 2025-01-01
Series: Applied Computational Intelligence and Soft Computing
Online Access: http://dx.doi.org/10.1155/acis/7394597
collection DOAJ
description Stroke has been a leading cause of death globally for the last several decades, and early recognition of the disease together with ongoing surveillance can reduce the death rate. However, the largest obstacle to performing advanced analytics with conventional approaches is the massive growth of data from various sources, including patient histories, wearable sensor devices, and medical data. Integrating machine learning with big data analytics (scalable machine learning) is a current technology that could have a large impact on the healthcare sector, particularly in the early diagnosis of stroke. To address this issue, this work presents a scalable stroke prediction model for a multinode distributed environment. The model combines big data analytics concepts with machine learning to handle extensive healthcare datasets, an aspect not seen in the prior literature on stroke detection. We implemented four scalable algorithms (logistic regression, random forest, gradient-boosted tree, and decision tree) using a dataset collected from a Medical Quality Improvement Consortium database. The dataset was analyzed on a cluster of one master node and two worker nodes. The models’ performance was assessed using metrics including the area under the curve (AUC) and the confusion matrix. With an accuracy of 94.3% and an AUC of 99%, random forest performed best in the experiments. Diabetes was found to be the main risk factor for stroke, followed by hypertension. This study demonstrates the effectiveness of Spark’s scalable machine learning techniques for forecasting stroke and identifying risk factors earlier. Physicians can use the findings as clinical decision aids for more accurate identification of stroke.
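The description names accuracy, the confusion matrix, and the AUC as evaluation metrics. The following is a minimal plain-Python sketch of how those quantities are defined, not the authors' Spark pipeline. The labels, scores, the 0.5 threshold, and all function names are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = stroke and 0 = no stroke."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive case receives a higher
    score than a random negative case; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and model scores for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, tn, fn)
auc = roc_auc(y_true, scores)
print(f"tp={tp} fp={fp} tn={tn} fn={fn} accuracy={acc:.3f} auc={auc:.4f}")
```

Accuracy depends on the chosen threshold, while the AUC summarizes ranking quality across all thresholds, which is why a model can report an AUC higher than its accuracy, as in the figures quoted in the description.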
id doaj-art-1ce8dec65c794efa811cb74815fde785
institution Kabale University
issn 1687-9732
affiliations Department of Computer Science; Department of Software Engineering