Development of a machine learning model related to explore the association between heavy metal exposure and alveolar bone loss among US adults utilizing SHAP: a study based on NHANES 2015–2018

Abstract Background Alveolar bone loss (ABL) is common in modern society. Heavy metal exposure is usually considered to be a risk factor for ABL. Some studies revealed a positive trend found between urinary heavy metals and periodontitis using multiple logistic regression and Bayesian kernel machine...

Full description

Saved in:
Bibliographic Details
Main Author: Jiayi Chen
Format: Article
Language:English
Published: BMC 2025-02-01
Series:BMC Public Health
Subjects:
Online Access:https://doi.org/10.1186/s12889-025-21658-y
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1823861486605303808
author Jiayi Chen
author_facet Jiayi Chen
author_sort Jiayi Chen
collection DOAJ
description Abstract Background Alveolar bone loss (ABL) is common in modern society. Heavy metal exposure is usually considered to be a risk factor for ABL. Some studies revealed a positive trend found between urinary heavy metals and periodontitis using multiple logistic regression and Bayesian kernel machine regression. Overfitting using kernel function, long calculation period, the definition of prior distribution and lack of rank of heavy metal will affect the performance of the statistical model. Optimal model on this topic still remains controversy. This study aimed: (1) to develop an algorithm for exploring the association between heavy metal exposure and ABL; (2) filter the actual causal variables and investigate how heavy metals were associated with ABL; and (3) identify the potential risk factors for ABL. Methods Data were collected from National Health and Nutrition Examination Survey (NHANES) between 2015 and 2018 to develop a machine learning (ML) model. Feature selection was performed using the Least Absolute Shrinkage and Selection Operator (LASSO) regression with 10-fold cross-validation. The selected data were balanced using the Synthetic Minority Oversampling Technique (SMOTE) and divided into a training set and testing set at a 3:1 ratio. Logistic Regression (LR), Support Vector Machines (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), Decision Tree (DT), and XGboost were used to construct the ML model. Accuracy, Area Under the Receiver Operating Characteristic Curve (AUC), Precision, Recall, and F1 score were used to select the optimal model for further analysis. The contribution of the variables to the ML model was explained using the Shapley Additive Explanations (SHAP) method. Results RF showed the best performance in exploring the association between heavy metal exposure and ABL, with an AUC (0.88), accuracy (0.78), precision (0.76), recall (0.83), and F1 score (0.79). Age was the most important factor in the ML model (mean| SHAP value| = 0.09), and Cd was the primary contributor. Sex had little effect on the ML model contribution. Conclusion In this study, RF showed superior performance compared with the other five algorithms. Among the 12 heavy metals, Cd was the most important factor in the ML model. The relationship of Co & Pb and ABL are weaker than that of Cd. Among all the independent variables, age was considered the most important factor for this model. As for PIR, low-income participants present association with ABL. Mexican American and Non-Hispanic White show low association with ABL compared to Non-Hispanic Black and other races. Gender feature demonstrates a weak association with ABL. In the future, more advanced algorithms should be developed to validate these results and related parameters can be tuned to improve the accuracy of the model. Clinical trial number not applicable.
format Article
id doaj-art-9c258de4d59e46a4b675af0093d0cf55
institution Kabale University
issn 1471-2458
language English
publishDate 2025-02-01
publisher BMC
record_format Article
series BMC Public Health
spelling doaj-art-9c258de4d59e46a4b675af0093d0cf552025-02-09T12:58:34ZengBMCBMC Public Health1471-24582025-02-0125111210.1186/s12889-025-21658-yDevelopment of a machine learning model related to explore the association between heavy metal exposure and alveolar bone loss among US adults utilizing SHAP: a study based on NHANES 2015–2018Jiayi Chen0Department of stomatology, Suzhou Wujiang District Hospital of Traditional Chinese MedicineAbstract Background Alveolar bone loss (ABL) is common in modern society. Heavy metal exposure is usually considered to be a risk factor for ABL. Some studies revealed a positive trend found between urinary heavy metals and periodontitis using multiple logistic regression and Bayesian kernel machine regression. Overfitting using kernel function, long calculation period, the definition of prior distribution and lack of rank of heavy metal will affect the performance of the statistical model. Optimal model on this topic still remains controversy. This study aimed: (1) to develop an algorithm for exploring the association between heavy metal exposure and ABL; (2) filter the actual causal variables and investigate how heavy metals were associated with ABL; and (3) identify the potential risk factors for ABL. Methods Data were collected from National Health and Nutrition Examination Survey (NHANES) between 2015 and 2018 to develop a machine learning (ML) model. Feature selection was performed using the Least Absolute Shrinkage and Selection Operator (LASSO) regression with 10-fold cross-validation. The selected data were balanced using the Synthetic Minority Oversampling Technique (SMOTE) and divided into a training set and testing set at a 3:1 ratio. Logistic Regression (LR), Support Vector Machines (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), Decision Tree (DT), and XGboost were used to construct the ML model. Accuracy, Area Under the Receiver Operating Characteristic Curve (AUC), Precision, Recall, and F1 score were used to select the optimal model for further analysis. The contribution of the variables to the ML model was explained using the Shapley Additive Explanations (SHAP) method. Results RF showed the best performance in exploring the association between heavy metal exposure and ABL, with an AUC (0.88), accuracy (0.78), precision (0.76), recall (0.83), and F1 score (0.79). Age was the most important factor in the ML model (mean| SHAP value| = 0.09), and Cd was the primary contributor. Sex had little effect on the ML model contribution. Conclusion In this study, RF showed superior performance compared with the other five algorithms. Among the 12 heavy metals, Cd was the most important factor in the ML model. The relationship of Co & Pb and ABL are weaker than that of Cd. Among all the independent variables, age was considered the most important factor for this model. As for PIR, low-income participants present association with ABL. Mexican American and Non-Hispanic White show low association with ABL compared to Non-Hispanic Black and other races. Gender feature demonstrates a weak association with ABL. In the future, more advanced algorithms should be developed to validate these results and related parameters can be tuned to improve the accuracy of the model. Clinical trial number not applicable.https://doi.org/10.1186/s12889-025-21658-yMachine learningHeavy metalAlveolar bone lossNHANESRandom forest
spellingShingle Jiayi Chen
Development of a machine learning model related to explore the association between heavy metal exposure and alveolar bone loss among US adults utilizing SHAP: a study based on NHANES 2015–2018
BMC Public Health
Machine learning
Heavy metal
Alveolar bone loss
NHANES
Random forest
title Development of a machine learning model related to explore the association between heavy metal exposure and alveolar bone loss among US adults utilizing SHAP: a study based on NHANES 2015–2018
title_full Development of a machine learning model related to explore the association between heavy metal exposure and alveolar bone loss among US adults utilizing SHAP: a study based on NHANES 2015–2018
title_fullStr Development of a machine learning model related to explore the association between heavy metal exposure and alveolar bone loss among US adults utilizing SHAP: a study based on NHANES 2015–2018
title_full_unstemmed Development of a machine learning model related to explore the association between heavy metal exposure and alveolar bone loss among US adults utilizing SHAP: a study based on NHANES 2015–2018
title_short Development of a machine learning model related to explore the association between heavy metal exposure and alveolar bone loss among US adults utilizing SHAP: a study based on NHANES 2015–2018
title_sort development of a machine learning model related to explore the association between heavy metal exposure and alveolar bone loss among us adults utilizing shap a study based on nhanes 2015 2018
topic Machine learning
Heavy metal
Alveolar bone loss
NHANES
Random forest
url https://doi.org/10.1186/s12889-025-21658-y
work_keys_str_mv AT jiayichen developmentofamachinelearningmodelrelatedtoexploretheassociationbetweenheavymetalexposureandalveolarbonelossamongusadultsutilizingshapastudybasedonnhanes20152018