Predicting communities with high tuberculosis case-finding efficiency to optimise resource allocation in Pakistan: comparing the performance of a negative binomial spatial lag model with a Bayesian machine-learning model

Bibliographic Details
Main Authors: Hasan Tahir, Frank Cobelens, Christina Mergenthaler, Mirjam I Bakker, Tanveer Ahmed, Jake D Mathewson, Daniella Brals, Abdullah Latif, Stephanie Lako, Andreas Werle van der Merwe, Matthys Potgieter, Vincent Meurrens, Zia Samad, Ente Rood
Format: Article
Language: English
Published: BMJ Publishing Group 2025-05-01
Series: BMJ Public Health
Online Access: https://bmjpublichealth.bmj.com/content/3/1/e001424.full
Collection: DOAJ
Description:
Introduction: Despite progress in tuberculosis (TB) treatment coverage in recent years, an estimated 183 000 people with TB may not have been diagnosed in Pakistan in 2022. There is therefore a need for models that help steer active case finding (ACF) towards populations with a high probability of undetected TB. The aim of this study was to cross-validate TB positivity rate predictions in ACF settings from an existing Bayesian machine learning (BML) model against those of a simpler frequentist model.
Methods: We conducted a retrospective analysis of cross-sectional data to identify predictors of the detection of bacteriologically confirmed TB cases during ACF events in Pakistan. A predictive negative binomial regression (NBR) model was built, and the presence of spatial autocorrelation was examined to account for spatial dependencies in the outcome variable. The NBR and BML models were compared on their predictive precision for identifying TB hotspots, based on root mean square error (RMSE) values, k-fold cross-validation and tehsil-level (sub-district) prediction rankings.
Results: Among 21 227 visitors to 414 ACF events between September 2020 and January 2022, 407 (1.9%) bacteriologically confirmed cases were detected. In the final NBR model, the spatial lag variable explained most of the variation in TB positivity rates across ACF events. NBR and BML predictions were similar at tehsil level. While the BML model had a slightly lower RMSE (1.02 vs 1.03), the NBR model had a slightly better fit based on the Akaike information criterion (AIC).
Conclusions: Statistical models can be effective in predicting TB hotspots for ACF planning, and the relatively simple NBR model was nearly as effective as the more complex BML model. The predictions of the two modelling approaches were similar, suggesting that predictions are driven more by the covariates than by the modelling framework. The agreement between model results increases confidence in the potential utility of models for spatially targeting ACF activities in high-need, low-access areas.
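The abstract does not state how the spatial lag variable or the spatial weights were defined, nor which statistic was used to test for spatial autocorrelation. The following is a minimal Python sketch, not the authors' code, of the general quantities the Methods refer to: a spatial lag of positivity rates, Moran's I as one common test statistic for spatial autocorrelation, RMSE, and k-fold splits. It assumes row-standardised k-nearest-neighbour weights between ACF event coordinates; the helper names (knn_weights, spatial_lag, morans_i, rmse, kfold_indices) and the example data are hypothetical.

```python
import math


def knn_weights(coords, k):
    """Row-standardised k-nearest-neighbour weight matrix (hypothetical weighting scheme)."""
    n = len(coords)
    weights = []
    for i in range(n):
        # Rank the other events by Euclidean distance; ties are broken by index.
        ranked = sorted((math.dist(coords[i], coords[j]), j) for j in range(n) if j != i)
        row = [0.0] * n
        for _, j in ranked[:k]:
            row[j] = 1.0 / k  # equal weight per neighbour, so each row sums to 1
        weights.append(row)
    return weights


def spatial_lag(weights, values):
    """Spatial lag W*y: for each event, the weighted mean of its neighbours' values."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


def morans_i(weights, values):
    """Global Moran's I; positive values indicate that similar values cluster in space."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def kfold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds (no shuffling)."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)  # spread the remainder over early folds
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


if __name__ == "__main__":
    # Hypothetical data: four ACF events in two spatial clusters, positivity in %.
    coords = [(0, 0), (1, 0), (10, 0), (11, 0)]
    rates = [5.0, 4.0, 0.5, 1.0]
    W = knn_weights(coords, k=1)
    print(spatial_lag(W, rates))         # [4.0, 5.0, 1.0, 0.5]
    print(round(morans_i(W, rates), 3))  # 0.915
```

Row-standardisation makes each lag a weighted mean of neighbouring positivity rates. When a lag built from observed outcomes is used as a predictor under k-fold cross-validation, the lag for held-out events would generally be computed from training-fold outcomes only, so that test outcomes do not leak into the predictors.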
ISSN: 2753-4294
Volume/Issue: Volume 3, Issue 1, e001424
DOI: 10.1136/bmjph-2024-001424
Affiliations:
Hasan Tahir: Royal Free London NHS Foundation Trust, London, UK
Frank Cobelens: Amsterdam Institute for Global Health & Development (AIGHD) and Department of Global Health, Amsterdam University Medical Centers, Amsterdam, The Netherlands
Christina Mergenthaler: Health Unit, KIT Royal Tropical Institute, Amsterdam, The Netherlands
Mirjam I Bakker: KIT Royal Tropical Institute, Amsterdam, The Netherlands
Tanveer Ahmed: Common Management Unit for AIDS, Tuberculosis and Malaria, Islamabad, Pakistan
Jake D Mathewson: KIT Royal Tropical Institute, Amsterdam, The Netherlands
Daniella Brals: Department of Global Health, Amsterdam Institute for Global Health and Development, Amsterdam, The Netherlands
Abdullah Latif: Mercy Corps, Islamabad, Pakistan
Stephanie Lako: Centre for Applied Spatial Epidemiology, KIT Royal Tropical Institute, Amsterdam, The Netherlands
Andreas Werle van der Merwe: EPCON SA Pty (Ltd), Cape Town, South Africa
Matthys Potgieter: EPCON SA Pty (Ltd), Cape Town, South Africa
Vincent Meurrens: EPCON SA Pty (Ltd), Cape Town, South Africa
Zia Samad: Common Management Unit for AIDS, Tuberculosis and Malaria, Islamabad, Pakistan
Ente Rood: KIT Royal Tropical Institute, Amsterdam, The Netherlands