Machine learning-based risk prediction model for pertussis in children: a multicenter retrospective study

Bibliographic Details
Main Authors: Juan Xie, Run-wei Ma, Yu-jing Feng, Yuan Qiao, Hong-yan Zhu, Xing-ping Tao, Wen-juan Chen, Cong-yun Liu, Tan Li, Kai Liu, Li-ming Cheng
Format: Article
Language: English
Published: BMC 2025-03-01
Series: BMC Infectious Diseases
Subjects: Public health; PDW-MPV-RATIO; SII; Calibration curves; Lasso regression; Random forest
Online Access: https://doi.org/10.1186/s12879-025-10797-7
_version_ 1850063892941635584
author Juan Xie
Run-wei Ma
Yu-jing Feng
Yuan Qiao
Hong-yan Zhu
Xing-ping Tao
Wen-juan Chen
Cong-yun Liu
Tan Li
Kai Liu
Li-ming Cheng
author_sort Juan Xie
collection DOAJ
description Abstract
Background Pertussis is a highly contagious respiratory disease. Although vaccination has reduced its incidence, cases have resurged in certain regions because of immune escape and waning vaccine efficacy. Promptly identifying high-risk patients is imperative to limit transmission and avert complications. However, current diagnostic methods, including PCR and bacterial culture, are time-consuming and expensive. Some studies have attempted to develop risk prediction models based on multivariate data, but their performance leaves room for improvement. This study therefore aims to further optimize and extend such risk assessment tools, so that high-risk individuals can be identified more efficiently and the shortcomings of existing diagnostic methods can be offset.
Objective The aim of this study was to develop a pertussis risk prediction model that is efficient and generalizes well across different datasets. The model was constructed using machine learning techniques on multicenter data after screening for key features. The model's performance and generalization ability were evaluated by deploying it on an online platform. The study also aims to give clinicians a rapid and accurate auxiliary diagnostic tool that helps identify high-risk patients in a timely manner, optimize early intervention strategies, and reduce both complications and transmission, thereby improving the efficiency of public health management.
Methods Data from 1085 patients with suspected pertussis were collected from 7 centers, and ten key features were identified using lasso regression and the Boruta algorithm: PDW-MPV-RATIO, SII, white blood cells, platelet distribution width, mean platelet volume, lymphocytes, cough duration, vaccination, fever, and lytic lymphocytes. Eight models were then trained and validated on these features to assess their performance, and external datasets were used to confirm their generalization ability. Finally, an online platform was constructed so that clinicians can use the models in real time.
Results The random forest model demonstrated excellent discrimination, with an AUC of 0.98 in the validation set and 0.97 in the external validation set. Calibration curve and decision curve analysis showed that the model had high accuracy in predicting low-to-medium risk patients, which could help clinicians avoid unnecessary interventions, especially in resource-limited settings. Applying the model can help optimize the early identification and management of high-risk patients and improve clinical decision-making.
Conclusion The pertussis prediction model devised in this study was validated using multicenter data, exhibited high prediction performance, and was successfully implemented online. Future research should broaden the data sources and incorporate dynamic data to enhance the model's accuracy and applicability.
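The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and it does not reproduce their random forest model or their data. The function names and the example numbers are hypothetical. Two definitions are assumptions, because this record does not state them: SII is taken as the commonly used systemic immune-inflammation index (platelets × neutrophils / lymphocytes), and PDW-MPV-RATIO is assumed to be platelet distribution width divided by mean platelet volume. The AUC is computed as the fraction of positive/negative pairs ranked correctly, and the decision-curve net benefit uses the standard formula TP/n − FP/n × pt/(1 − pt).

```python
from typing import Sequence


def sii(platelets: float, neutrophils: float, lymphocytes: float) -> float:
    """Assumed definition: platelets * neutrophils / lymphocytes."""
    return platelets * neutrophils / lymphocytes


def pdw_mpv_ratio(pdw: float, mpv: float) -> float:
    """Assumed definition: platelet distribution width / mean platelet volume."""
    return pdw / mpv


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case scores above a random
    negative case; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(labels: Sequence[int], scores: Sequence[float],
                threshold: float) -> float:
    """Decision-curve net benefit at risk threshold pt (0 < pt < 1):
    TP/n - FP/n * pt / (1 - pt)."""
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical labels (1 = pertussis) and predicted risks, for illustration only.
example_labels = [1, 1, 0, 0, 1, 0]
example_scores = [0.9, 0.7, 0.4, 0.2, 0.35, 0.6]
print("AUC:", auc(example_labels, example_scores))
print("Net benefit at 0.5:", net_benefit(example_labels, example_scores, 0.5))
```

Plotting net_benefit over a range of thresholds, alongside the "treat all" and "treat none" strategies, gives a decision curve of the kind the abstract refers to.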
format Article
id doaj-art-6bdda123738f4201b63a21e39e7e0ec9
institution DOAJ
issn 1471-2334
language English
publishDate 2025-03-01
publisher BMC
record_format Article
series BMC Infectious Diseases
spelling doaj-art-6bdda123738f4201b63a21e39e7e0ec9 2025-08-20T02:49:29Z eng BMC BMC Infectious Diseases 1471-2334 2025-03-01 251114 10.1186/s12879-025-10797-7
Machine learning-based risk prediction model for pertussis in children: a multicenter retrospective study
Juan Xie, Department of Anesthesiology, Kunming Children’s Hospital
Run-wei Ma, Department of Cardiac Surgery, Fuwai Yunnan Hospital, Chinese Academy of Medical Sciences/Affiliated Cardiovascular Hospital of Kunming Medical University
Yu-jing Feng, Comprehensive Pediatrics, Wenshan Maternal and Child Health Care Hospital
Yuan Qiao, Comprehensive Pediatrics and Neonatology, Chuxiong Yi Autonomous Prefecture People’s Hospital
Hong-yan Zhu, Pediatric Respiratory Department, Qujing Maternal and Child Health Hospital
Xing-ping Tao, Department of Pediatrics, Kaiyuan People’s Hospital
Wen-juan Chen, Department of Pediatrics and Emergency, Yuxi Children’s Hospital
Cong-yun Liu, Comprehensive Pediatrics & Pulmonary and Critical Care Medicine, Baoshan People’s Hospital
Tan Li, Department of Respiratory Medicine, Kunming Children’s Hospital
Kai Liu, Comprehensive Pediatrics & Pulmonary and Critical Care Medicine, Kunming Children’s Hospital
Li-ming Cheng, Department of Anesthesiology, Kunming Children’s Hospital
title Machine learning-based risk prediction model for pertussis in children: a multicenter retrospective study
topic Public health
PDW-MPV-RATIO
SII
Calibration curves
Lasso regression
Random forest
url https://doi.org/10.1186/s12879-025-10797-7