Multivariate forecasting of dengue infection in Bangladesh: evaluating the influence of data downscaling on machine learning predictive accuracy

Abstract The increasing incidence of dengue virus (DENV) infections poses significant public health challenges in Bangladesh, demanding advanced forecasting methodologies to guide timely interventions. This study introduces a rigorous multivariate time series analysis, integrating meteorological fac...

Bibliographic Details
Main Author: Mahadee Al Mobin
Format: Article
Language: English
Published: BMC 2025-05-01
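Series: BMC Infectious Diseases
Subjects: Machine learning; Dengue; Time series analysis; Tropical disease; Data downscaling; Predictive epidemiology
Online Access: https://doi.org/10.1186/s12879-025-11159-z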
author Mahadee Al Mobin
collection DOAJ
description Abstract The increasing incidence of dengue virus (DENV) infections poses significant public health challenges in Bangladesh, demanding advanced forecasting methodologies to guide timely interventions. This study introduces a rigorous multivariate time series analysis, integrating meteorological factors with state-of-the-art machine learning (ML) models, to predict DENV case trends across different temporal scales. Leveraging a robust data pipeline, this research incorporates a strategic downscaling technique, applying the Stochastic Bayesian Downscaling (SBD) algorithm to convert monthly DENV case data to daily frequency. This approach addresses key issues in the handling of sparse datasets and missing data, offering novel insights into the potential accuracy benefits of data downscaling in time series forecasting. Among the models assessed, the decision tree demonstrated superior performance on the actual monthly data, achieving an accuracy of 74.6%. In contrast, the random forest model outperformed others on the downscaled daily data, reaching an accuracy of 95.8%, thereby supporting the efficacy of data downscaling for ML applications in epidemiology. Comparative analysis reveals that downscaling provided a 28.5% improvement in accuracy and an 89.3% reduction in mean absolute percentage error (MAPE) over non-downscaled data which has been proven to be statistically significant using the Wilcoxon signed rank test, illustrating the substantial advantages of employing downscaling for effective DENV forecasting. Based on the best-performing model, the study further projects a worst-case scenario for 2024, forecasting daily cases to peak at 1,382 (95% CI: 1,341-1,423) between August and October, with a gradual decline expected by December.
The findings not only underscore the critical influence of meteorological variables on DENV transmission but also advocate for the adoption of sophisticated data preprocessing techniques, such as downscaling, to enhance prediction accuracy. This research marks a significant advancement in predictive epidemiology, offering a scalable framework for DENV and other vector-borne diseases, with implications for improving public health responses in vulnerable regions globally.
format Article
id doaj-art-ab7d0730a9924cb4b65affe108c9e6e5
institution DOAJ
issn 1471-2334
language English
publishDate 2025-05-01
publisher BMC
record_format Article
series BMC Infectious Diseases
spelling doaj-art-ab7d0730a9924cb4b65affe108c9e6e5 2025-08-20T03:22:11Z eng BMC BMC Infectious Diseases 1471-2334 2025-05-01 251117 10.1186/s12879-025-11159-z Multivariate forecasting of dengue infection in Bangladesh: evaluating the influence of data downscaling on machine learning predictive accuracy Mahadee Al Mobin 0 Bangladesh Institute of Governance and Management Abstract The increasing incidence of dengue virus (DENV) infections poses significant public health challenges in Bangladesh, demanding advanced forecasting methodologies to guide timely interventions. This study introduces a rigorous multivariate time series analysis, integrating meteorological factors with state-of-the-art machine learning (ML) models, to predict DENV case trends across different temporal scales. Leveraging a robust data pipeline, this research incorporates a strategic downscaling technique, applying the Stochastic Bayesian Downscaling (SBD) algorithm to convert monthly DENV case data to daily frequency. This approach addresses key issues in the handling of sparse datasets and missing data, offering novel insights into the potential accuracy benefits of data downscaling in time series forecasting. Among the models assessed, the decision tree demonstrated superior performance on the actual monthly data, achieving an accuracy of 74.6%. In contrast, the random forest model outperformed others on the downscaled daily data, reaching an accuracy of 95.8%, thereby supporting the efficacy of data downscaling for ML applications in epidemiology. Comparative analysis reveals that downscaling provided a 28.5% improvement in accuracy and an 89.3% reduction in mean absolute percentage error (MAPE) over non-downscaled data which has been proven to be statistically significant using the Wilcoxon signed rank test, illustrating the substantial advantages of employing downscaling for effective DENV forecasting.
Based on the best-performing model, the study further projects a worst-case scenario for 2024, forecasting daily cases to peak at 1,382 (95% CI: 1,341-1,423) between August and October, with a gradual decline expected by December. The findings not only underscore the critical influence of meteorological variables on DENV transmission but also advocate for the adoption of sophisticated data preprocessing techniques, such as downscaling, to enhance prediction accuracy. This research marks a significant advancement in predictive epidemiology, offering a scalable framework for DENV and other vector-borne diseases, with implications for improving public health responses in vulnerable regions globally. https://doi.org/10.1186/s12879-025-11159-z Machine learning Dengue Time series analysis Tropical disease Data downscaling Predictive epidemiology
title Multivariate forecasting of dengue infection in Bangladesh: evaluating the influence of data downscaling on machine learning predictive accuracy
topic Machine learning
Dengue
Time series analysis
Tropical disease
Data downscaling
Predictive epidemiology
url https://doi.org/10.1186/s12879-025-11159-z