Beyond the numbers: App-enabled stroke prediction system for high-risk individuals in imbalanced datasets

Background: Brain stroke, characterized by interrupted blood flow to the brain, poses significant mortality risks and quality-of-life impacts. While machine learning approaches show promise in stroke prediction, current research often relies on synthetic data to address dataset imbalance, potential...

Bibliographic Details
Main Authors: Abrar Faiaz Eram, Aliva Sadnim Mahmud, Marwan Mostafa Khadem, Md Amimul Ihsan
Format: Article
Language: English
Published: Elsevier 2025-09-01
Series: Neuroscience Informatics
Subjects: Brain stroke prediction, Data imbalance, Recall, SHAP
Online Access: http://www.sciencedirect.com/science/article/pii/S2772528625000305
author Abrar Faiaz Eram
Aliva Sadnim Mahmud
Marwan Mostafa Khadem
Md Amimul Ihsan
author_sort Abrar Faiaz Eram
collection DOAJ
description Background: Brain stroke, characterized by interrupted blood flow to the brain, poses significant mortality risks and impairs quality of life. While machine learning approaches show promise in stroke prediction, current research often relies on synthetic data to address dataset imbalance, potentially compromising real-world model performance in clinical settings. Method: This research proposes an alternative approach that uses recall as the primary evaluation metric for stroke prediction, specifically targeting the reduction of false negatives. In stroke diagnosis, where a missed detection can lead to severe consequences or death, recall is crucial because it directly measures the model's ability to identify actual stroke cases. Results: Three superior models were identified: Logistic Regression, an Ensemble using Soft Voting (combining Gaussian Naive Bayes and Logistic Regression), and a customized Support Vector Machine. These models achieved recall values of 92%, 92%, and 94%, respectively. Interpretability is enhanced through SHAP applied to the best-performing model. Whereas previous methods showed recall values between 5.6% and 40%, this approach reached 94%. Current research emphasizes accuracy metrics and relies on oversampling, which is inappropriate for sensitive medical datasets. The drawback is a slight increase in false positives, which is tolerable because the cost of missing a stroke case far outweighs the cost of a false alarm. Conclusions: The research demonstrates the effectiveness of focusing on recall as an evaluation metric for stroke prediction, minimizing false negative predictions. To facilitate practical implementation, a mobile application incorporating the best-performing model is included. A primary screening tool that can support doctors in stroke diagnosis and prediction is proposed.
format Article
id doaj-art-621d5e9c79944dbcb1bf78496554e176
institution Kabale University
issn 2772-5286
language English
publishDate 2025-09-01
publisher Elsevier
record_format Article
series Neuroscience Informatics
spelling doaj-art-621d5e9c79944dbcb1bf78496554e176; 2025-08-22T04:58:47Z; eng; Elsevier; Neuroscience Informatics; 2772-5286; 2025-09-01; 5; 3; 100215; 10.1016/j.neuri.2025.100215
Abrar Faiaz Eram: Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1000, Bangladesh
Aliva Sadnim Mahmud: Department of Electrical and Electronic Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1000, Bangladesh
Marwan Mostafa Khadem: Department of Japanese Studies, University of Dhaka, Dhaka, 1000, Bangladesh
Md Amimul Ihsan (corresponding author): Department of Electrical and Electronic Engineering, Jamalpur Science and Technology University, Jamalpur, 2010, Bangladesh; Department of Biomedical Physics and Technology, University of Dhaka, Dhaka, Bangladesh
title Beyond the numbers: App-enabled stroke prediction system for high-risk individuals in imbalanced datasets
topic Brain stroke prediction
Data imbalance
Recall
SHAP
url http://www.sciencedirect.com/science/article/pii/S2772528625000305
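
The abstract's argument for recall over accuracy, and its mention of soft voting, can be illustrated with a minimal sketch. This is not the authors' code: the labels and probabilities below are made-up toy values, the function names are hypothetical, and the two probability lists merely stand in for the outputs of a Gaussian Naive Bayes model and a Logistic Regression model. The sketch shows that a model predicting "no stroke" for everyone scores high accuracy with zero recall on imbalanced data, and that averaging two models' probabilities (soft voting) can recover a case that one model alone would miss, at the price of a false positive.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 means stroke."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of actual stroke cases that were flagged: tp / (tp + fn)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all predictions that were correct."""
    tp, _, _, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def soft_vote(prob_a: Sequence[float], prob_b: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Soft voting for two models: average the stroke probabilities, then threshold."""
    return [1 if (a + b) / 2 >= threshold else 0 for a, b in zip(prob_a, prob_b)]


# Toy imbalanced data (assumption): 2 stroke cases among 20 people.
y_true = [1, 1] + [0] * 18

# A trivial model that never predicts stroke looks good on accuracy only.
always_negative = [0] * 20

# Hypothetical stroke probabilities from two different models.
probs_nb = [0.9, 0.45] + [0.2] * 17 + [0.7]
probs_lr = [0.7, 0.65] + [0.1] * 17 + [0.5]
ensemble_pred = soft_vote(probs_nb, probs_lr)

print("always negative:", accuracy(y_true, always_negative), recall(y_true, always_negative))
print("soft voting:    ", accuracy(y_true, ensemble_pred), recall(y_true, ensemble_pred))
```

In this toy setup the first model alone, thresholded at 0.5, would miss the second stroke case (probability 0.45); the averaged probability of 0.55 catches it. The last person becomes a false positive, which mirrors the trade-off the abstract describes as tolerable.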