Beyond the numbers: App-enabled stroke prediction system for high-risk individuals in imbalanced datasets
Background: Brain stroke, characterized by interrupted blood flow to the brain, poses significant mortality risks and quality-of-life impacts. While machine learning approaches show promise in stroke prediction, current research often relies on synthetic data to address dataset imbalance, potential...
| Main Authors: | Abrar Faiaz Eram, Aliva Sadnim Mahmud, Marwan Mostafa Khadem, Md Amimul Ihsan |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-09-01 |
| Series: | Neuroscience Informatics |
| Subjects: | Brain stroke prediction; Data imbalance; Recall; SHAP |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2772528625000305 |
| _version_ | 1849229236791934976 |
|---|---|
| author | Abrar Faiaz Eram; Aliva Sadnim Mahmud; Marwan Mostafa Khadem; Md Amimul Ihsan |
| author_facet | Abrar Faiaz Eram; Aliva Sadnim Mahmud; Marwan Mostafa Khadem; Md Amimul Ihsan |
| author_sort | Abrar Faiaz Eram |
| collection | DOAJ |
| description | Background: Brain stroke, characterized by interrupted blood flow to the brain, poses significant mortality risks and quality-of-life impacts. While machine learning approaches show promise in stroke prediction, current research often relies on synthetic data to address dataset imbalance, potentially compromising real-world model performance in clinical settings. Method: This research proposes an alternative approach focusing on recall as the primary evaluation metric for stroke prediction, specifically targeting the reduction of false negatives. In the context of stroke diagnosis, where missed detection can lead to severe consequences or fatality, recall is crucial for directly measuring the model's ability to identify actual stroke cases. Results: Three superior models were identified: Logistic Regression, an Ensemble using Soft Voting (combining Gaussian Naive Bayes and Logistic Regression), and customized Support Vector Machine. Exceptional stroke prediction was achieved with recall values of 92%, 92%, and 94%, respectively. Interpretability is enhanced through SHAP applied to the best one. While previous methods showed recall values between 5.6% and 40%, this approach outperformed these benchmarks (94%). Current research emphasizes accuracy metrics, relying on oversampling, being inappropriate for sensitive medical datasets. The pitfall is a slight increase in false positives, which is tolerable because the cost of misdiagnosing a stroke patient far outweighs the reverse scenario. Conclusions: The research demonstrates the effectiveness of focusing on recall as an evaluation metric for stroke prediction, minimizing false negative predictions. To facilitate practical implementation, a mobile application incorporating the best-performing model was included. A primary screening which can supplement doctors in stroke diagnosis and prediction was proposed. |
| format | Article |
| id | doaj-art-621d5e9c79944dbcb1bf78496554e176 |
| institution | Kabale University |
| issn | 2772-5286 |
| language | English |
| publishDate | 2025-09-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Neuroscience Informatics |
| spelling | doaj-art-621d5e9c79944dbcb1bf78496554e176; 2025-08-22T04:58:47Z; eng; Elsevier; Neuroscience Informatics; 2772-5286; 2025-09-01; volume 5, issue 3, article 100215; 10.1016/j.neuri.2025.100215; Beyond the numbers: App-enabled stroke prediction system for high-risk individuals in imbalanced datasets; Abrar Faiaz Eram [0]; Aliva Sadnim Mahmud [1]; Marwan Mostafa Khadem [2]; Md Amimul Ihsan [3]; [0] Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1000, Bangladesh; [1] Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1000, Bangladesh; [2] Department of Japanese Studies, University of Dhaka, Dhaka, 1000, Bangladesh; [3] Department of Electrical and Electronic Engineering, Jamalpur Science and Technology University, Jamalpur, 2010, Bangladesh; Department of Biomedical Physics and Technology, University of Dhaka, Dhaka, Bangladesh; Corresponding author at: Department of Electrical and Electronic Engineering, Jamalpur Science and Technology University, Jamalpur, 2010, Bangladesh; http://www.sciencedirect.com/science/article/pii/S2772528625000305; Brain stroke prediction; Data imbalance; Recall; SHAP |
| title | Beyond the numbers: App-enabled stroke prediction system for high-risk individuals in imbalanced datasets |
| title_sort | beyond the numbers app enabled stroke prediction system for high risk individuals in imbalanced datasets |
| topic | Brain stroke prediction; Data imbalance; Recall; SHAP |
| url | http://www.sciencedirect.com/science/article/pii/S2772528625000305 |
| work_keys_str_mv | AT abrarfaiazeram beyondthenumbersappenabledstrokepredictionsystemforhighriskindividualsinimbalanceddatasets AT alivasadnimmahmud beyondthenumbersappenabledstrokepredictionsystemforhighriskindividualsinimbalanceddatasets AT marwanmostafakhadem beyondthenumbersappenabledstrokepredictionsystemforhighriskindividualsinimbalanceddatasets AT mdamimulihsan beyondthenumbersappenabledstrokepredictionsystemforhighriskindividualsinimbalanceddatasets |
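
The description above argues that recall, not accuracy, should drive model selection on an imbalanced stroke dataset. The sketch below illustrates that argument with plain arithmetic. All confusion-matrix counts are hypothetical values chosen for illustration; they are not taken from the article, and the names `Confusion`, `accuracy_first`, and `recall_first` are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a binary confusion matrix (stroke = positive class)."""
    tp: int  # stroke cases correctly flagged
    fn: int  # stroke cases missed (the costly error in screening)
    fp: int  # healthy cases flagged as stroke
    tn: int  # healthy cases correctly cleared

    def recall(self) -> float:
        # Share of actual stroke cases the model identifies.
        return self.tp / (self.tp + self.fn)

    def accuracy(self) -> float:
        # Share of all predictions that are correct.
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total


# Hypothetical cohort: 1000 people, 50 of whom have a stroke (5% positives).
# A model that almost always predicts "no stroke" looks accurate but misses
# most real cases.
accuracy_first = Confusion(tp=5, fn=45, fp=5, tn=945)

# A recall-oriented model accepts more false positives to miss fewer strokes.
recall_first = Confusion(tp=47, fn=3, fp=200, tn=750)

for name, cm in [("accuracy-first", accuracy_first),
                 ("recall-first", recall_first)]:
    print(f"{name}: accuracy={cm.accuracy():.3f}, recall={cm.recall():.3f}")
```

With these made-up counts, the first model has the higher accuracy but detects only a small fraction of stroke cases, while the second trades extra false positives for far fewer missed cases, which is the trade-off the description calls tolerable for primary screening.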