Machine learning to improve the understanding of rabies epidemiology in low surveillance settings
Abstract: In low- and middle-income countries, a large proportion of animal rabies investigations end without a conclusive diagnosis, leading to epidemiologic interpretations informed by clinical rather than laboratory data. We compared Extreme Gradient Boosting (XGB) with Logistic Regression (LR) for...
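| Main Authors: | Ravikiran Keshavamurthy, Cassandra Boutelle, Yoshinori Nakazawa, Haim Joseph, Dady W. Joseph, Pierre Dilius, Andrew D. Gibson, Ryan M. Wallace |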
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2024-10-01 |
| Series: | Scientific Reports |
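| Subjects: | Rabies epidemiology; Prediction; Machine learning; Extreme gradient boosting; Risk stratification; Zoonotic disease surveillance |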
| Online Access: | https://doi.org/10.1038/s41598-024-76089-3 |
| _version_ | 1850179594960764928 |
|---|---|
| author | Ravikiran Keshavamurthy; Cassandra Boutelle; Yoshinori Nakazawa; Haim Joseph; Dady W. Joseph; Pierre Dilius; Andrew D. Gibson; Ryan M. Wallace |
| author_sort | Ravikiran Keshavamurthy |
| collection | DOAJ |
| description | In low- and middle-income countries, a large proportion of animal rabies investigations end without a conclusive diagnosis, leading to epidemiologic interpretations informed by clinical rather than laboratory data. We compared Extreme Gradient Boosting (XGB) with Logistic Regression (LR) for their ability to estimate the probability of rabies in animals investigated as part of an Integrated Bite Case Management (IBCM) program. To balance our training data, we used Random Oversampling (ROS) and the Synthetic Minority Oversampling Technique (SMOTE). We developed a risk stratification framework based on predicted rabies probabilities. XGB performed better at predicting rabies cases than LR. Oversampling strategies enhanced model sensitivity, making them the preferred technique for predicting rare events such as rabies in a biting animal. XGB-ROS classified most of the confirmed rabies cases and only a small proportion of non-cases as either high (confirmed cases = 85.2%, non-cases = 0.01%) or moderate (confirmed cases = 8.4%, non-cases = 4.0%) risk. Model-based risk stratification led to a 3.2-fold increase in epidemiologically useful data compared to a routine surveillance strategy using IBCM case definitions. Our study demonstrates the application of machine learning to strengthen zoonotic disease surveillance in resource-limited settings. |
| format | Article |
| id | doaj-art-e7c66c9cf26648fab914350dbcbe1fd2 |
| institution | OA Journals |
| issn | 2045-2322 |
| language | English |
| publishDate | 2024-10-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-e7c66c9cf26648fab914350dbcbe1fd2; 2025-08-20T02:18:27Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2024-10-01; volume 14, issue 1, pages 1-10; 10.1038/s41598-024-76089-3; Machine learning to improve the understanding of rabies epidemiology in low surveillance settings; Ravikiran Keshavamurthy [0]; Cassandra Boutelle [1]; Yoshinori Nakazawa [2]; Haim Joseph [3]; Dady W. Joseph [4]; Pierre Dilius [5]; Andrew D. Gibson [6]; Ryan M. Wallace [7]; [0] Poxvirus and Rabies Branch, Division of High Consequence Pathogens and Pathology, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention; [1] Poxvirus and Rabies Branch, Division of High Consequence Pathogens and Pathology, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention; [2] Poxvirus and Rabies Branch, Division of High Consequence Pathogens and Pathology, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention; [3] Ministère de l’Agriculture, des Ressources Naturelles et du Développement Rural; [4] Ministère de l’Agriculture, des Ressources Naturelles et du Développement Rural; [5] Ministère de l’Agriculture, des Ressources Naturelles et du Développement Rural; [6] Mission Rabies; [7] Poxvirus and Rabies Branch, Division of High Consequence Pathogens and Pathology, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention; Abstract: In low- and middle-income countries, a large proportion of animal rabies investigations end without a conclusive diagnosis, leading to epidemiologic interpretations informed by clinical rather than laboratory data. We compared Extreme Gradient Boosting (XGB) with Logistic Regression (LR) for their ability to estimate the probability of rabies in animals investigated as part of an Integrated Bite Case Management (IBCM) program. To balance our training data, we used Random Oversampling (ROS) and the Synthetic Minority Oversampling Technique (SMOTE). We developed a risk stratification framework based on predicted rabies probabilities. XGB performed better at predicting rabies cases than LR. Oversampling strategies enhanced model sensitivity, making them the preferred technique for predicting rare events such as rabies in a biting animal. XGB-ROS classified most of the confirmed rabies cases and only a small proportion of non-cases as either high (confirmed cases = 85.2%, non-cases = 0.01%) or moderate (confirmed cases = 8.4%, non-cases = 4.0%) risk. Model-based risk stratification led to a 3.2-fold increase in epidemiologically useful data compared to a routine surveillance strategy using IBCM case definitions. Our study demonstrates the application of machine learning to strengthen zoonotic disease surveillance in resource-limited settings.; https://doi.org/10.1038/s41598-024-76089-3; Rabies epidemiology; Prediction; Machine learning; Extreme gradient boosting; Risk stratification; Zoonotic disease surveillance |
| title | Machine learning to improve the understanding of rabies epidemiology in low surveillance settings |
| title_sort | machine learning to improve the understanding of rabies epidemiology in low surveillance settings |
| topic | Rabies epidemiology; Prediction; Machine learning; Extreme gradient boosting; Risk stratification; Zoonotic disease surveillance |
| url | https://doi.org/10.1038/s41598-024-76089-3 |
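
The description above mentions two steps that are easy to illustrate: balancing the training data with Random Oversampling (ROS), and assigning animals to risk strata from a predicted rabies probability. The following is a minimal sketch of both ideas using only the Python standard library, not the authors' implementation. The function names `random_oversample` and `risk_tier`, the probability cut-offs (0.2 and 0.8), and the "low" tier label are hypothetical; the record does not state the thresholds used in the study.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Pool of original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def risk_tier(probability, moderate=0.2, high=0.8):
    """Map a predicted probability to a risk stratum.
    The cut-offs are hypothetical placeholders, not values from the study."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability >= high:
        return "high"
    if probability >= moderate:
        return "moderate"
    return "low"


if __name__ == "__main__":
    # Toy data: one positive (rabid) record among six investigations.
    rows = [[0, 1], [1, 1], [0, 0], [1, 0], [0, 0], [1, 1]]
    labels = [1, 0, 0, 0, 0, 0]
    balanced_rows, balanced_labels = random_oversample(rows, labels)
    print(Counter(balanced_labels))
    print([risk_tier(p) for p in (0.05, 0.35, 0.93)])
```

In practice the oversampled data would be used only for training, with evaluation on untouched data, so that duplicated minority records do not inflate the reported sensitivity.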