Developing a Predictive Model for Stroke Disease Detection Using a Scalable Machine Learning Approach
| Main Authors: | Assefa Senbato Genale, Tsion Ayalew Dessalegn |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley, 2025-01-01 |
| Series: | Applied Computational Intelligence and Soft Computing |
| Online Access: | http://dx.doi.org/10.1155/acis/7394597 |
| Field | Value |
|---|---|
| author | Assefa Senbato Genale, Tsion Ayalew Dessalegn |
| collection | DOAJ |
| description | Stroke has been one of the leading causes of death globally for several decades, so early recognition of the disease and ongoing surveillance can reduce the death rate. However, the main obstacle to performing advanced analytics with conventional approaches is the rapid growth of data from various sources, including patient histories, wearable sensor devices, and other medical data. Integrating machine learning with big data analytics (scalable machine learning) is a technology that could have a large impact on the healthcare sector, particularly in the early diagnosis of stroke. To address this issue, this work presents a scalable stroke prediction model for a multinode distributed environment that combines big data analytics concepts with machine learning to handle extensive healthcare datasets, an aspect not seen in prior literature on stroke detection. We implemented four scalable algorithms (logistic regression, random forest, gradient-boosted tree, and decision tree) using a dataset collected from a Medical Quality Improvement Consortium database. The dataset was analyzed on a cluster of one master node and two worker nodes. Model performance was assessed using metrics including the area under the curve (AUC) and the confusion matrix. Based on the experimental results, random forest performed best, with an accuracy of 94.3% and an AUC of 0.99. The results also indicated that diabetes is the main risk factor for stroke, followed by hypertension. This study demonstrates the effectiveness of Spark's scalable machine learning techniques for forecasting stroke and identifying risk factors earlier. Physicians can use the findings as clinical decision aids for more accurate identification of stroke. |
| format | Article |
| id | doaj-art-1ce8dec65c794efa811cb74815fde785 |
| institution | Kabale University |
| issn | 1687-9732 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | Wiley |
| record_format | Article |
| series | Applied Computational Intelligence and Soft Computing |
| title | Developing a Predictive Model for Stroke Disease Detection Using a Scalable Machine Learning Approach |
| url | http://dx.doi.org/10.1155/acis/7394597 |
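The abstract reports the random forest's performance as an accuracy (derived from the confusion matrix) and an AUC. As an illustration only, the minimal Python sketch below shows how these two metrics are computed for a binary stroke/no-stroke classifier. It is not the authors' code: the function names are hypothetical, the labels and scores are made-up toy values rather than data from the study, and AUC is computed here with the pairwise-ranking formulation (an AUC always lies between 0 and 1, so "99%" corresponds to 0.99). At cluster scale, Spark MLlib's `BinaryClassificationEvaluator` (metric `areaUnderROC`) computes the same quantity.

```python
from itertools import product

# Illustrative sketch only: hypothetical helper names, not the authors' implementation.
# Labels: 1 = stroke, 0 = no stroke.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Fraction of correct predictions: (TP + TN) / total."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def roc_auc(y_true, scores):
    """ROC AUC as the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (made-up values, not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold the scores at 0.5

print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 4)
print(accuracy(y_true, y_pred))          # 0.75
print(roc_auc(y_true, scores))           # about 0.933 (14 of 15 pairs ranked correctly)
```

Note that accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds; this is one reason a model can report a high AUC alongside a lower accuracy. The pairwise loop is quadratic in the number of examples and is meant for clarity, not for the large datasets the study targets.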