Exploring explainable machine learning algorithms to model predictors of tobacco use among men in Sub Sahara Africa between 2018 and 2023

Abstract Tobacco smoking is a significant public health issue in sub-Saharan Africa, with its prevalence shaped by various demographic factors. This study aimed to model predictors of tobacco use among men in sub-Saharan Africa between 2018 and 2023 using machine learning algorithms. Data from Demogr...

Bibliographic Details
Main Authors: Mequannent Sharew Melaku, Nebebe Demis Baykemagn, Lamrot Yohannes, Adem Tsegaw Zegeye
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Subjects: Determinants; Tobacco smoking; Smokeless tobacco; Prediction; Sub-Saharan Africa
Online Access: https://doi.org/10.1038/s41598-025-09380-6
author Mequannent Sharew Melaku
Nebebe Demis Baykemagn
Lamrot Yohannes
Adem Tsegaw Zegeye
collection DOAJ
description Abstract Tobacco smoking is a significant public health issue in sub-Saharan Africa, with its prevalence shaped by various demographic factors. This study aimed to model predictors of tobacco use among men in sub-Saharan Africa between 2018 and 2023 using machine learning algorithms. Data from Demographic and Health Surveys covering 147,466 men were analyzed. STATA version 17 was used for data cleaning and descriptive statistics, while Python 3.9 was employed for machine learning predictions. The study used six machine learning models, namely Decision Tree, Logistic Regression, Random Forest, K-Nearest Neighbors (KNN), eXtreme Gradient Boosting (XGBoost), and AdaBoost, to identify the key predictors of tobacco use among men. Hyperparameter optimization was performed using Randomized Search with tenfold cross-validation to improve model performance. The SHapley Additive exPlanations (SHAP) method was used to assess predictor importance. Model performance was evaluated based on accuracy, precision, recall, F1 score, and area under the curve (AUC). The study found a pooled tobacco use prevalence of 14.73%, with no significant variation between countries. High tobacco use was observed in Mozambique, Zambia, Benin, Mali, Mauritania, Senegal, Guinea, Sierra Leone, and Liberia, with Tanzania, Benin, and Senegal reporting the highest rates. The XGBoost algorithm attained an accuracy of 98% and an AUC score of 97%. SHAP analysis revealed that age, education, wealth index, religion, residence, internet use, occupation, age at first sex, number of sexual partners, and marital status were key predictors. These findings underscore the need for targeted public health interventions and highlight the value of machine learning in identifying at-risk populations and addressing socio-cultural and economic factors influencing tobacco use.
format Article
id doaj-art-9b2d4b848fd948e68ddc37cfb02e3715
institution Kabale University
issn 2045-2322
language English
publishDate 2025-07-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-9b2d4b848fd948e68ddc37cfb02e3715
2025-08-20T03:42:41Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2025-07-01
151120
10.1038/s41598-025-09380-6
Exploring explainable machine learning algorithms to model predictors of tobacco use among men in Sub Sahara Africa between 2018 and 2023
Mequannent Sharew Melaku: Department of Health Informatics, Institute of Public Health, University of Gondar
Nebebe Demis Baykemagn: Department of Health Informatics, Institute of Public Health, University of Gondar
Lamrot Yohannes: Department of Environmental and Occupational Health and Safety, Institute of Public Health, College of Medicine and Health Science, University of Gondar
Adem Tsegaw Zegeye: Department of Health Informatics, Institute of Public Health, University of Gondar
title Exploring explainable machine learning algorithms to model predictors of tobacco use among men in Sub Sahara Africa between 2018 and 2023
topic Determinants
Tobacco smoking
Smokeless tobacco
Prediction
Sub-Saharan Africa
url https://doi.org/10.1038/s41598-025-09380-6