Interpretable machine learning approaches for predicting prostate cancer by using multiple heavy metal exposures based on the data from NHANES 2003–2018

Bibliographic Details
Main Authors: Zu-Ming You, Yuan-Sheng Li, Fan-Shuo Meng, Rui-Xiang Zhang, Chen-Xi Xie, Zhijiang Liang, Ji-Yuan Zhou
Format: Article
Language: English
Published: Elsevier 2025-09-01
Series: Ecotoxicology and Environmental Safety
Online Access: http://www.sciencedirect.com/science/article/pii/S0147651325010759
Collection: DOAJ
Abstract: Environmental pollution plays a major role in the development of prostate cancer (PCA). However, no study has yet used machine learning (ML) to model the relationship between multiple heavy metal exposures and PCA risk. Using 8022 samples from the 2003–2018 National Health and Nutrition Examination Survey (NHANES) database, we analysed the concentrations of 18 blood and urinary heavy metals and minerals, together with 14 covariates. Among the eight ML models evaluated, the random forest (RF) algorithm performed best, achieving an accuracy of 72.835 %, an area under the receiver operating characteristic curve (AUC) of 0.869, an F1 score of 0.145, a G-mean of 0.749, and a Youden index of 0.498 in the test set. Four interpretable methods were integrated into the ML model. The RF model indicated that blood lead (Pb) (0.449–29.964 µg/dL), urinary cesium (Cs) (1.822–270.426 µg/L), and urinary antimony (Sb) (0.015–4.953 µg/L) within these ranges were positively associated with PCA risk, whereas blood cadmium (Cd) (0.247–9.025 µg/L) showed a negative association. Notably, urinary Cs and Sb emerged in this study as novel risk-related metals for PCA. Synergistic effect analysis further identified blood Pb, urinary Sb, and urinary Cs as the major contributing factors. The predictive model established in this study can help inform strategies for the prevention and control of PCA.
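The abstract reports five test-set metrics for the RF model (accuracy, AUC, F1 score, G-mean and Youden index). As a hedged illustration only, the Python sketch below computes the standard textbook definitions of these quantities from a confusion matrix and, for AUC, from predicted scores. The record does not state which classification threshold or software the authors used, so the class `Confusion`, the functions `metrics` and `auc`, and all counts and scores are hypothetical and are not the study's data.

```python
"""Hypothetical sketch of the binary-classification metrics named in the abstract.

All counts and scores are made up for illustration; they are NOT the study's data.
"""
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Confusion:
    tp: int  # cases predicted as cases
    fp: int  # non-cases predicted as cases
    tn: int  # non-cases predicted as non-cases
    fn: int  # cases predicted as non-cases


def metrics(c: Confusion) -> dict[str, float]:
    """Threshold-dependent metrics from confusion-matrix counts (textbook definitions)."""
    sensitivity = c.tp / (c.tp + c.fn)  # recall, true-positive rate
    specificity = c.tn / (c.tn + c.fp)  # true-negative rate
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "g_mean": sqrt(sensitivity * specificity),  # geometric mean of TPR and TNR
        "youden": sensitivity + specificity - 1.0,  # Youden's J statistic
    }


def auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """ROC AUC as the probability that a random case scores higher than a random
    non-case (ties count one half). O(n*m) pairwise loop, fine for a small example."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical, imbalanced example: 10 cases among 1,000 people.
    example = Confusion(tp=8, fp=270, tn=720, fn=2)
    for name, value in metrics(example).items():
        print(f"{name:12s}{value:.3f}")
    # Hypothetical predicted probabilities for three cases and three non-cases.
    print(f"{'auc':12s}{auc([0.9, 0.8, 0.4], [0.3, 0.4, 0.1]):.3f}")
```

With these made-up counts, accuracy is 0.728, G-mean is about 0.763 and the Youden index about 0.527, yet F1 is only about 0.056 because most positive predictions are false positives. In general terms, this is how a low F1 score can coexist with a high G-mean when the outcome is rare; it is not a reconstruction of the authors' results.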
ISSN: 0147-6513
DOI: 10.1016/j.ecoenv.2025.118730
Volume: 302, article 118730
Affiliations:
Zu-Ming You, Yuan-Sheng Li, Fan-Shuo Meng, Rui-Xiang Zhang, Chen-Xi Xie, Ji-Yuan Zhou: Department of Biostatistics, School of Public Health (State Key Laboratory of Multi-organ Injury Prevention and Treatment, and Guangdong Provincial Key Laboratory of Tropical Disease Research), Southern Medical University, Guangzhou 510515, China
Zhijiang Liang (corresponding author): Department of Medical Information and Statistics, Guangdong Women and Children Hospital, 521 Xingnan Road, Panyu District, Guangzhou 511442, China
Correspondence to Ji-Yuan Zhou: Department of Biostatistics, School of Public Health (State Key Laboratory of Multi-organ Injury Prevention and Treatment, and Guangdong Provincial Key Laboratory of Tropical Disease Research), Southern Medical University, No. 1023, South Shatai Road, Baiyun District, Guangzhou 510515, China
Subjects: Interpretable machine learning; Random forest algorithm; Heavy metals; Prostate cancer; Prevention and control