Development and validation of a machine learning-based risk prediction model for stroke-associated pneumonia in older adult hemorrhagic stroke

Objective: To develop and validate a machine learning (ML)-based model for predicting stroke-associated pneumonia (SAP) risk in older adult hemorrhagic stroke patients. Methods: A retrospective collection of older adult hemorrhagic stroke patients from three tertiary hospitals in Guiyang, Guizhou Provinc...

Bibliographic Details
Main Authors: Yi Cao, Haipeng Deng, Shaoyun Liu, Xi Zeng, Yangyang Gou, Weiting Zhang, Yixinyuan Li, Hua Yang, Min Peng
Format: Article
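Language: English
Published: Frontiers Media S.A. 2025-06-01
Series: Frontiers in Neurology
Subjects: machine learning, older adult, hemorrhagic stroke, stroke-associated pneumonia, prediction model, validation
Online Access: https://www.frontiersin.org/articles/10.3389/fneur.2025.1591570/full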
_version_ 1849422531062136832
author Yi Cao
Haipeng Deng
Shaoyun Liu
Xi Zeng
Yangyang Gou
Weiting Zhang
Yixinyuan Li
Hua Yang
Min Peng
author_sort Yi Cao
collection DOAJ
description Objective: To develop and validate a machine learning (ML)-based model for predicting the risk of stroke-associated pneumonia (SAP) in older adult patients with hemorrhagic stroke.
Methods: Older adult patients with hemorrhagic stroke were retrospectively collected from three tertiary hospitals in Guiyang, Guizhou Province (January 2019–December 2022) to form the modeling cohort, which was randomly split into training and internal validation sets at a 7:3 ratio. External validation used retrospective data from January–December 2023. After univariate and multivariate regression analyses, four ML models (Logistic Regression, XGBoost, Naive Bayes, and SVM) were constructed. Receiver operating characteristic (ROC) curves were plotted and the area under the curve (AUC) was calculated for the training and internal validation sets. Model performance was compared using DeLong's test or a bootstrap test, and predictive efficacy was evaluated by sensitivity, specificity, accuracy, precision, recall, and F1-score. Calibration curves assessed model calibration. The optimal model underwent external validation using ROC and calibration curves.
Results: A total of 788 older adult patients with hemorrhagic stroke were enrolled and divided into a training set (n = 462), an internal validation set (n = 196), and an external validation set (n = 130). The incidence of SAP was 46.7% (368/788). Advanced age [OR = 1.064, 95% CI (1.024, 1.104)], smoking [OR = 2.488, 95% CI (1.460, 4.24)], low GCS score [OR = 0.675, 95% CI (0.553, 0.825)], low Braden score [OR = 0.741, 95% CI (0.640, 0.858)], and nasogastric tube [OR = 1.761, 95% CI (1.048, 2.960)] were identified as risk factors for SAP. Among the four machine learning algorithms evaluated [XGBoost, Logistic Regression (LR), Support Vector Machine (SVM), and Naive Bayes], the LR model performed robustly and consistently across multiple evaluation metrics. The LR model was therefore selected for external validation and presented as a nomogram, and it generalized stably to the external validation cohort. The model achieved AUC values of 0.883 (training), 0.855 (internal validation), and 0.882 (external validation). The Hosmer-Lemeshow (H-L) test indicated satisfactory calibration in all three datasets, with P-values of 0.381, 0.142, and 0.066, respectively.
Conclusions: This study constructed and validated a risk prediction model for SAP in older adult patients with hemorrhagic stroke based on multi-center data. Among the four machine learning algorithms (XGBoost, LR, SVM, and Naive Bayes), the LR model showed the best and most stable predictive performance. Age, smoking, low GCS score, low Braden score, and nasogastric tube were identified as predictive factors for SAP in these patients. These indicators are easily obtainable in clinical practice and facilitate rapid bedside assessment. In internal and external validation the model showed good generalization ability, and a nomogram was drawn to provide an objective, practical risk assessment tool for clinical nursing practice. The tool supports early identification of high-risk patients and guides targeted interventions, with the aim of reducing the incidence of SAP and improving patient prognosis.
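The odds ratios reported in the description can be read as logistic regression coefficients, since each coefficient is the natural logarithm of its odds ratio. The following is a minimal sketch of that arithmetic, not the authors' model: the abstract does not report an intercept, so only odds relative to a reference patient can be computed, never an absolute risk. The variable names, the per-year and per-point units, and both example patients are assumptions made for illustration.

```python
import math

# Odds ratios as reported in the abstract.
# ASSUMPTION: age, GCS and Braden are per-unit effects; smoking and
# nasogastric tube are yes/no indicators. The abstract does not state units.
REPORTED_OR = {
    "age_years": 1.064,
    "smoking": 2.488,
    "gcs_score": 0.675,
    "braden_score": 0.741,
    "nasogastric_tube": 1.761,
}

# In logistic regression, coefficient = ln(odds ratio).
COEF = {name: math.log(value) for name, value in REPORTED_OR.items()}


def relative_odds(patient, reference):
    """Return the odds of SAP for `patient` divided by the odds for `reference`.

    The (unreported) intercept cancels in this ratio, so the result is a
    relative quantity only and says nothing about absolute risk.
    """
    log_ratio = sum(COEF[k] * (patient[k] - reference[k]) for k in COEF)
    return math.exp(log_ratio)


# Hypothetical patients, invented purely for illustration.
reference = {"age_years": 65, "smoking": 0, "gcs_score": 15,
             "braden_score": 18, "nasogastric_tube": 0}
patient = {"age_years": 75, "smoking": 1, "gcs_score": 12,
           "braden_score": 14, "nasogastric_tube": 1}

if __name__ == "__main__":
    ratio = relative_odds(patient, reference)
    print(f"Odds of SAP relative to the reference patient: {ratio:.1f}x")
```

Because odds ratios below 1 (GCS and Braden) multiply the odds down as the score rises, a lower score raises the relative odds, which matches the abstract's reading of low GCS and low Braden scores as risk factors.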
format Article
id doaj-art-cc620a5ed3554ab49a1f58538f0ca39d
institution Kabale University
issn 1664-2295
language English
publishDate 2025-06-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Neurology
spelling doaj-art-cc620a5ed3554ab49a1f58538f0ca39d
2025-08-20T03:31:05Z
eng
Frontiers Media S.A.
Frontiers in Neurology
1664-2295
2025-06-01
16
10.3389/fneur.2025.1591570
1591570
Development and validation of a machine learning-based risk prediction model for stroke-associated pneumonia in older adult hemorrhagic stroke
Yi Cao, Department of Neurosurgery, Affiliated Hospital of Guizhou Medical University, Guiyang, China
Yi Cao, School of Nursing, Guizhou Medical University, Guiyang, China
Haipeng Deng, Department of Neurosurgery, Affiliated Hospital of Guizhou Medical University, Guiyang, China
Shaoyun Liu, Department of Neurosurgery, Affiliated Hospital of Guizhou Medical University, Guiyang, China
Xi Zeng, Department of Neurosurgery, Affiliated Hospital of Guizhou Medical University, Guiyang, China
Yangyang Gou, School of Nursing, Guizhou Medical University, Guiyang, China
Weiting Zhang, School of Nursing, Guizhou Medical University, Guiyang, China
Yixinyuan Li, School of Nursing, Guizhou Medical University, Guiyang, China
Hua Yang, Department of Neurosurgery, Affiliated Hospital of Guizhou Medical University, Guiyang, China
Min Peng, Department of Nursing Quality Management, Affiliated Hospital of Guizhou Medical University, Guiyang, China
https://www.frontiersin.org/articles/10.3389/fneur.2025.1591570/full
machine learning
older adult
hemorrhagic stroke
stroke-associated pneumonia
prediction model
validation
title Development and validation of a machine learning-based risk prediction model for stroke-associated pneumonia in older adult hemorrhagic stroke
topic machine learning
older adult
hemorrhagic stroke
stroke-associated pneumonia
prediction model
validation
url https://www.frontiersin.org/articles/10.3389/fneur.2025.1591570/full