Machine learning for predicting severe dengue in Puerto Rico

Bibliographic Details
Main Authors: Zachary J. Madewell, Dania M. Rodriguez, Maile B. Thayer, Vanessa Rivera-Amill, Gabriela Paz-Bailey, Laura E. Adams, Joshua M. Wong
Format: Article
Language: English
Published: BMC 2025-02-01
Series: Infectious Diseases of Poverty
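Subjects: Dengue; Ensemble learning; Gradient boosting; Feature importance; Clinical decision support; Caribbean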
Online Access: https://doi.org/10.1186/s40249-025-01273-0
collection DOAJ
description Abstract
Background: Distinguishing between non-severe and severe dengue is crucial for timely intervention and for reducing morbidity and mortality. World Health Organization (WHO)-recommended warning signs offer a practical approach for clinicians but have limited sensitivity and specificity. This study aims to evaluate machine learning (ML) model performance compared with WHO-recommended warning signs in predicting severe dengue among laboratory-confirmed cases in Puerto Rico.
Methods: We analyzed data from Puerto Rico’s Sentinel Enhanced Dengue Surveillance System (May 2012–August 2024), using 40 clinical, demographic, and laboratory variables. Nine ML models (Decision Trees, K-Nearest Neighbors, Naïve Bayes, Support Vector Machines, Artificial Neural Networks, AdaBoost, CatBoost, LightGBM, and XGBoost) were trained using fivefold cross-validation and evaluated with area under the receiver operating characteristic curve (AUC-ROC), sensitivity, and specificity. A subanalysis excluded hemoconcentration and leukopenia to assess performance in resource-limited settings. An AUC-ROC value of 0.5 (50%) indicates no discriminative power, while values closer to 1.0 (100%) reflect better performance.
Results: Among the 1708 laboratory-confirmed dengue cases, 24.3% were classified as severe. Gradient boosting algorithms achieved the highest predictive performance, with an AUC-ROC of 97.1% (95% CI: 96.0–98.3%) for CatBoost using the full 40-variable feature set. Feature importance analysis identified hemoconcentration (≥ 20% increase during illness or ≥ 20% above baseline for age and sex), leukopenia (white blood cell count < 4000/mm³), and timing of presentation at 4–6 days post-symptom onset as key predictors. When hemoconcentration and leukopenia were excluded, the CatBoost AUC-ROC was 96.7% (95% CI: 95.5–98.0%), a minimal reduction in performance. Individual warning signs such as abdominal pain and restlessness had sensitivities of 79.0% and 64.6%, but lower specificities of 48.4% and 59.1%, respectively. Combining ≥ 3 warning signs improved specificity (80.9%) while maintaining moderate sensitivity (78.6%), resulting in an AUC-ROC of 74.0%.
Conclusions: ML models, especially gradient boosting algorithms, outperformed traditional warning signs in predicting severe dengue. Integrating these models into clinical decision-support tools could help clinicians better identify high-risk patients, guiding timely interventions such as hospitalization, closer monitoring, or the administration of intravenous fluids. The subanalysis excluding hemoconcentration and leukopenia confirmed the models’ applicability in resource-limited settings, where access to laboratory data may be limited.
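The evaluation metrics named in the abstract (sensitivity, specificity, and AUC-ROC) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the eight-patient dataset, and the use of a warning-sign count as the score are all assumptions made up for illustration, and the numbers bear no relation to the study's results. AUC-ROC is computed here with the pairwise (Mann-Whitney) formulation, which may differ from the software the study used.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = severe dengue, 0 = non-severe (invented values).
severe = [1, 1, 1, 0, 0, 0, 0, 0]
warning_sign_count = [4, 3, 1, 3, 2, 1, 0, 0]

# Binary rule analogous to "three or more warning signs".
flagged = [1 if c >= 3 else 0 for c in warning_sign_count]

sens, spec = sensitivity_specificity(severe, flagged)
auc = auc_roc(severe, warning_sign_count)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC-ROC={auc:.3f}")
```

A score that is identical for every patient yields an AUC-ROC of 0.5 under this formulation, which matches the abstract's description of 0.5 as no discriminative power.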
id doaj-art-8f5b6b2c0f6744fead5a08d53b15d568
institution Kabale University
issn 2049-9957
spelling doaj-art-8f5b6b2c0f6744fead5a08d53b15d568 2025-02-09T12:59:46Z eng BMC Infectious Diseases of Poverty 2049-9957 2025-02-01 141117 10.1186/s40249-025-01273-0
Zachary J. Madewell: Division of Vector-Borne Diseases, Centers for Disease Control and Prevention
Dania M. Rodriguez: Division of Vector-Borne Diseases, Centers for Disease Control and Prevention
Maile B. Thayer: Division of Vector-Borne Diseases, Centers for Disease Control and Prevention
Vanessa Rivera-Amill: Ponce Health Sciences University/Ponce Research Institute
Gabriela Paz-Bailey: Division of Vector-Borne Diseases, Centers for Disease Control and Prevention
Laura E. Adams: Division of Vector-Borne Diseases, Centers for Disease Control and Prevention
Joshua M. Wong: Division of Vector-Borne Diseases, Centers for Disease Control and Prevention
topic Dengue
Ensemble learning
Gradient boosting
Feature importance
Clinical decision support
Caribbean