Machine learning for predicting severe dengue in Puerto Rico
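Main Authors: | Zachary J. Madewell, Dania M. Rodriguez, Maile B. Thayer, Vanessa Rivera-Amill, Gabriela Paz-Bailey, Laura E. Adams, Joshua M. Wong |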
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2025-02-01 |
Series: | Infectious Diseases of Poverty |
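Subjects: | Dengue; Ensemble learning; Gradient boosting; Feature importance; Clinical decision support; Caribbean |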
Online Access: | https://doi.org/10.1186/s40249-025-01273-0 |
author | Zachary J. Madewell, Dania M. Rodriguez, Maile B. Thayer, Vanessa Rivera-Amill, Gabriela Paz-Bailey, Laura E. Adams, Joshua M. Wong |
collection | DOAJ |
description | Background: Distinguishing between non-severe and severe dengue is crucial for timely intervention and for reducing morbidity and mortality. World Health Organization (WHO)-recommended warning signs offer a practical approach for clinicians but have limited sensitivity and specificity. This study evaluates the performance of machine learning (ML) models, compared with WHO-recommended warning signs, in predicting severe dengue among laboratory-confirmed cases in Puerto Rico. Methods: We analyzed data from Puerto Rico’s Sentinel Enhanced Dengue Surveillance System (May 2012–August 2024) using 40 clinical, demographic, and laboratory variables. Nine ML models (Decision Trees, K-Nearest Neighbors, Naïve Bayes, Support Vector Machines, Artificial Neural Networks, AdaBoost, CatBoost, LightGBM, and XGBoost) were trained using fivefold cross-validation and evaluated by area under the receiver operating characteristic curve (AUC-ROC), sensitivity, and specificity. A subanalysis excluded hemoconcentration and leukopenia to assess performance in resource-limited settings. An AUC-ROC of 0.5 (reported below as 50%) indicates no discriminative power, while values closer to 1.0 (100%) reflect better performance. Results: Among the 1708 laboratory-confirmed dengue cases, 24.3% were classified as severe. Gradient boosting algorithms achieved the highest predictive performance, with an AUC-ROC of 97.1% (95% CI: 96.0–98.3%) for CatBoost using the full 40-variable feature set. Feature importance analysis identified hemoconcentration (≥ 20% increase during illness or ≥ 20% above baseline for age and sex), leukopenia (white blood cell count < 4000/mm³), and presentation 4–6 days after symptom onset as key predictors. When hemoconcentration and leukopenia were excluded, the CatBoost AUC-ROC was 96.7% (95% CI: 95.5–98.0%), a minimal reduction in performance. Individual warning signs such as abdominal pain and restlessness had sensitivities of 79.0% and 64.6% but lower specificities of 48.4% and 59.1%, respectively. Requiring the presence of ≥ 3 warning signs improved specificity (80.9%) while maintaining moderate sensitivity (78.6%), with an AUC-ROC of 74.0%. Conclusions: ML models, especially gradient boosting algorithms, outperformed traditional warning signs in predicting severe dengue. Integrating these models into clinical decision-support tools could help clinicians identify high-risk patients and guide timely interventions such as hospitalization, closer monitoring, or intravenous fluid administration. The subanalysis excluding hemoconcentration and leukopenia supports the models’ applicability in resource-limited settings, where access to laboratory data may be limited. |
id | doaj-art-8f5b6b2c0f6744fead5a08d53b15d568 |
institution | Kabale University |
issn | 2049-9957 |
affiliation | Zachary J. Madewell, Dania M. Rodriguez, Maile B. Thayer, Gabriela Paz-Bailey, Laura E. Adams, Joshua M. Wong: Division of Vector-Borne Diseases, Centers for Disease Control and Prevention; Vanessa Rivera-Amill: Ponce Health Sciences University/Ponce Research Institute |
topic | Dengue; Ensemble learning; Gradient boosting; Feature importance; Clinical decision support; Caribbean |
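The abstract compares single warning signs with a rule requiring ≥ 3 signs, reporting sensitivity, specificity, and AUC-ROC. The minimal Python sketch below shows how those metrics are computed for a count-of-warning-signs rule. The patient counts and severity labels are made-up toy values, not study data, and the function names (`sens_spec`, `auc_roc`) are hypothetical. AUC-ROC is computed with the Mann-Whitney formulation (the probability that a randomly chosen severe case scores higher than a randomly chosen non-severe case, ties counting one half), which is one standard way to compute it and not necessarily the one the authors used.

```python
from typing import Sequence


def sens_spec(pred: Sequence[bool], truth: Sequence[bool]) -> tuple[float, float]:
    """Return (sensitivity, specificity) of binary predictions against true labels.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Assumes at least one severe and one non-severe case.
    """
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    return tp / (tp + fn), tn / (tn + fp)


def auc_roc(scores: Sequence[float], truth: Sequence[bool]) -> float:
    """AUC-ROC as P(score of a severe case > score of a non-severe case),
    counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (not from the study): number of WHO warning signs per patient
# and whether the patient was classified as severe.
counts = [0, 1, 2, 3, 4, 1, 3, 5, 0, 2]
severe = [False, False, False, True, True, False, False, True, False, True]

for k in (1, 3):
    sens, spec = sens_spec([c >= k for c in counts], severe)
    print(f">= {k} signs: sensitivity={sens:.2f}, specificity={spec:.2f}")
print(f"AUC-ROC of the raw count: {auc_roc(counts, severe):.3f}")
# >= 1 signs: sensitivity=1.00, specificity=0.33
# >= 3 signs: sensitivity=0.75, specificity=0.83
# AUC-ROC of the raw count: 0.917
```

Raising the threshold from ≥ 1 to ≥ 3 signs trades sensitivity for specificity, the same direction of change the abstract reports; the toy numbers themselves carry no clinical meaning.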
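Hemoconcentration and leukopenia are defined in the abstract by numeric thresholds, and the resource-limited subanalysis drops both. The minimal sketch below turns those definitions into binary predictors. It assumes the hemoconcentration criterion applies to hematocrit, compares the highest observed value with the first measurement and with an optional age- and sex-specific baseline, and uses hypothetical names (`LabValues`, `lab_features`, `resource_limited`); the study's actual variable coding is not described in this record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabValues:
    # Hypothetical record layout; field names are illustrative, not the study's.
    hematocrit_first: float                 # first hematocrit (%) during illness
    hematocrit_max: float                   # highest hematocrit (%) during illness
    hematocrit_baseline: Optional[float]    # assumed age/sex reference value, if known
    wbc_per_mm3: float                      # white blood cell count (cells/mm^3)


def hemoconcentration(lab: LabValues) -> bool:
    """>= 20% rise during illness, or >= 20% above the age/sex baseline.

    Assumption: the criterion is applied to hematocrit, using the highest value
    observed; the rise is measured relative to the first measurement.
    """
    rose = (lab.hematocrit_max - lab.hematocrit_first) / lab.hematocrit_first >= 0.20
    above_baseline = (
        lab.hematocrit_baseline is not None
        and (lab.hematocrit_max - lab.hematocrit_baseline) / lab.hematocrit_baseline >= 0.20
    )
    return rose or above_baseline


def leukopenia(lab: LabValues) -> bool:
    """White blood cell count below 4000/mm^3."""
    return lab.wbc_per_mm3 < 4000


def lab_features(lab: LabValues, resource_limited: bool = False) -> dict[str, bool]:
    """Lab-derived predictors; both are dropped in the resource-limited variant."""
    if resource_limited:
        return {}
    return {"hemoconcentration": hemoconcentration(lab), "leukopenia": leukopenia(lab)}


# Made-up example patients (not study data).
patients = [
    LabValues(hematocrit_first=38.0, hematocrit_max=47.0, hematocrit_baseline=None, wbc_per_mm3=3500),
    LabValues(hematocrit_first=40.0, hematocrit_max=42.0, hematocrit_baseline=41.0, wbc_per_mm3=5200),
    LabValues(hematocrit_first=36.0, hematocrit_max=40.0, hematocrit_baseline=33.0, wbc_per_mm3=4100),
]
for p in patients:
    print(lab_features(p))
# {'hemoconcentration': True, 'leukopenia': True}
# {'hemoconcentration': False, 'leukopenia': False}
# {'hemoconcentration': True, 'leukopenia': False}
print(lab_features(patients[0], resource_limited=True))  # {}
```

The third patient shows why the baseline branch matters: the within-illness rise is only about 11%, but the peak sits about 21% above the assumed reference value.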