Machine learning to improve the understanding of rabies epidemiology in low surveillance settings

Abstract In low- and middle-income countries, a large proportion of animal rabies investigations end without a conclusive diagnosis, leading to epidemiologic interpretations informed by clinical rather than laboratory data. We compared Extreme Gradient Boosting (XGB) with Logistic Regression (LR) for...

Bibliographic Details
Main Authors: Ravikiran Keshavamurthy, Cassandra Boutelle, Yoshinori Nakazawa, Haim Joseph, Dady W. Joseph, Pierre Dilius, Andrew D. Gibson, Ryan M. Wallace
Format: Article
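Language: English
Published: Nature Portfolio 2024-10-01
Series: Scientific Reports
Subjects: Rabies epidemiology; Prediction; Machine learning; Extreme gradient boosting; Risk stratification; Zoonotic disease surveillance
Online Access: https://doi.org/10.1038/s41598-024-76089-3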
collection DOAJ
description Abstract In low- and middle-income countries, a large proportion of animal rabies investigations end without a conclusive diagnosis, leading to epidemiologic interpretations informed by clinical rather than laboratory data. We compared Extreme Gradient Boosting (XGB) with Logistic Regression (LR) for their ability to estimate the probability of rabies in animals investigated as part of an Integrated Bite Case Management (IBCM) program. To balance our training data, we used Random Oversampling (ROS) and the Synthetic Minority Oversampling Technique (SMOTE). We developed a risk stratification framework based on predicted rabies probabilities. XGB performed better than LR at predicting rabies cases. Oversampling strategies enhanced model sensitivity, making them the preferred techniques for predicting rare events such as rabies in a biting animal. XGB-ROS classified most of the confirmed rabies cases and only a small proportion of non-cases as either high risk (confirmed cases = 85.2%, non-cases = 0.01%) or moderate risk (confirmed cases = 8.4%, non-cases = 4.0%). Model-based risk stratification led to a 3.2-fold increase in epidemiologically useful data compared to a routine surveillance strategy using IBCM case definitions. Our study demonstrates the application of machine learning to strengthen zoonotic disease surveillance in resource-limited settings.
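The two ideas named in the abstract, random oversampling of the rare class and risk tiers derived from predicted probabilities, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names, the cut-off values (0.8 and 0.3), the "low" tier label, and the toy data are all assumptions made for illustration, and the record does not state the thresholds or software used in the study.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Random oversampling: duplicate randomly chosen rows of each smaller
    class until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Pool of original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical cut-offs chosen for illustration only;
# they are NOT the thresholds reported in the study.
HIGH_CUTOFF = 0.8
MODERATE_CUTOFF = 0.3


def risk_tier(prob, high=HIGH_CUTOFF, moderate=MODERATE_CUTOFF):
    """Map a predicted rabies probability to a risk tier."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if prob >= high:
        return "high"
    if prob >= moderate:
        return "moderate"
    return "low"


if __name__ == "__main__":
    # Toy training set: four non-cases (0) and one confirmed case (1).
    rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
    labels = [0, 0, 0, 0, 1]
    balanced_rows, balanced_labels = random_oversample(rows, labels)
    print(Counter(balanced_labels))

    # Made-up predicted probabilities for three investigated animals.
    for p in (0.95, 0.45, 0.02):
        print(p, risk_tier(p))
```

In practice the oversampling step would be applied only to the training split, so that duplicated minority rows never leak into the data used to evaluate sensitivity.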
format Article
id doaj-art-e7c66c9cf26648fab914350dbcbe1fd2
institution OA Journals
issn 2045-2322
language English
publishDate 2024-10-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
author_affiliation Ravikiran Keshavamurthy: Poxvirus and Rabies Branch, Division of High Consequence Pathogens and Pathology, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention
Cassandra Boutelle: Poxvirus and Rabies Branch, Division of High Consequence Pathogens and Pathology, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention
Yoshinori Nakazawa: Poxvirus and Rabies Branch, Division of High Consequence Pathogens and Pathology, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention
Haim Joseph: Ministère de l’Agriculture, des Ressources Naturelles et du Développement Rural
Dady W. Joseph: Ministère de l’Agriculture, des Ressources Naturelles et du Développement Rural
Pierre Dilius: Ministère de l’Agriculture, des Ressources Naturelles et du Développement Rural
Andrew D. Gibson: Mission Rabies
Ryan M. Wallace: Poxvirus and Rabies Branch, Division of High Consequence Pathogens and Pathology, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention
volume 14
issue 1
pages 1-10
title Machine learning to improve the understanding of rabies epidemiology in low surveillance settings
topic Rabies epidemiology
Prediction
Machine learning
Extreme gradient boosting
Risk stratification
Zoonotic disease surveillance
url https://doi.org/10.1038/s41598-024-76089-3