Exploring explainable machine learning algorithms to model predictors of tobacco use among men in Sub Sahara Africa between 2018 and 2023
Abstract Tobacco smoking is a significant public health issue in sub-Saharan Africa, with its prevalence shaped by various demographic factors. This study aimed to model predictors of tobacco use among men in Sub Sahara Africa between 2018 and 2023 using machine learning algorithms. Data from Demogr...
| Main Authors: | Mequannent Sharew Melaku, Nebebe Demis Baykemagn, Lamrot Yohannes, Adem Tsegaw Zegeye |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-07-01 |
| Series: | Scientific Reports |
| Subjects: | Determinants; Tobacco smoking; Smokeless tobacco; Prediction; Sub-Saharan Africa |
| Online Access: | https://doi.org/10.1038/s41598-025-09380-6 |
| _version_ | 1849344271937699840 |
|---|---|
| author | Mequannent Sharew Melaku; Nebebe Demis Baykemagn; Lamrot Yohannes; Adem Tsegaw Zegeye |
| author_sort | Mequannent Sharew Melaku |
| collection | DOAJ |
| description | Abstract: Tobacco smoking is a significant public health issue in sub-Saharan Africa, with its prevalence shaped by various demographic factors. This study aimed to model predictors of tobacco use among men in sub-Saharan Africa between 2018 and 2023 using machine learning algorithms. Data from Demographic and Health Surveys covering 147,466 men were analyzed. STATA version 17 was used for data cleaning and descriptive statistics, while Python 3.9 was employed for machine learning predictions. The study utilized several machine learning models, including Decision Tree, Logistic Regression, Random Forest, k-nearest neighbors (KNN), eXtreme Gradient Boosting (XGBoost), and AdaBoost, to identify the key predictors of tobacco use among men. Hyperparameter optimization was performed using Randomized Search with tenfold cross-validation to improve model performance. The SHapley Additive exPlanations (SHAP) method was used to assess predictor importance. Model performance was evaluated based on accuracy, precision, recall, F1 score, and area under the curve (AUC). The study found a pooled tobacco use prevalence of 14.73%, with no significant variation between countries. High tobacco use was observed in Mozambique, Zambia, Benin, Mali, Mauritania, Senegal, Guinea, Sierra Leone, and Liberia, with Tanzania, Benin, and Senegal reporting the highest rates. The XGBoost algorithm attained an accuracy of 98% and an AUC score of 97%. SHAP analysis revealed that age, education, wealth index, religion, residence, internet use, occupation, age at first sex, number of sexual partners, and marital status were key predictors. These findings underscore the need for targeted public health interventions and highlight the value of machine learning in identifying at-risk populations and addressing socio-cultural and economic factors influencing tobacco use. |
| format | Article |
| id | doaj-art-9b2d4b848fd948e68ddc37cfb02e3715 |
| institution | Kabale University |
| issn | 2045-2322 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Scientific Reports |
| spelling | doaj-art-9b2d4b848fd948e68ddc37cfb02e3715; 2025-08-20T03:42:41Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-07-01; 151120; 10.1038/s41598-025-09380-6; Mequannent Sharew Melaku (Department of Health Informatics, Institute of Public Health, University of Gondar); Nebebe Demis Baykemagn (Department of Health Informatics, Institute of Public Health, University of Gondar); Lamrot Yohannes (Department of Environmental and Occupational Health and Safety, Institute of Public Health, College of Medicine and Health Science, University of Gondar); Adem Tsegaw Zegeye (Department of Health Informatics, Institute of Public Health, University of Gondar) |
| title | Exploring explainable machine learning algorithms to model predictors of tobacco use among men in Sub Sahara Africa between 2018 and 2023 |
| topic | Determinants; Tobacco smoking; Smokeless tobacco; Prediction; Sub-Saharan Africa |
| url | https://doi.org/10.1038/s41598-025-09380-6 |
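
The description above lists accuracy, precision, recall, F1 score, and AUC as the evaluation metrics. The following is a minimal sketch of how those metrics are conventionally defined for a binary outcome; it is not the authors' code, and it uses only the Python standard library. The function names and the toy labels are hypothetical. The toy sample of 10,000 men is an assumption chosen only to mirror the reported pooled prevalence of 14.73%, and it illustrates why accuracy alone can look high on imbalanced data.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = tobacco user)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as 0.5. This pairwise form is quadratic, so it suits small samples only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy sample: 1,473 users among 10,000 men (14.73% prevalence).
y_true = [1] * 1473 + [0] * 8527
# A trivial model that predicts "non-user" for everyone.
always_negative = [0] * len(y_true)
baseline = classification_metrics(y_true, always_negative)
print(baseline)

# Small hypothetical score vector to exercise the AUC helper.
toy_auc = auc([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3])
print(toy_auc)
```

Under this assumed class balance, the trivial model reaches about 85% accuracy while its recall is zero, which is why recall, F1, and AUC are worth reading alongside the accuracy figure.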