Interpretable machine learning approaches for predicting prostate cancer by using multiple heavy metal exposures based on the data from NHANES 2003–2018
Environmental pollution plays a major role in the development of prostate cancer (PCA). However, there has been no research on machine learning (ML) modelling between multiple heavy metal exposures and PCA risk. Based on the 8022 samples from the 2003–2018 National Health and Nutrition Examination S...
| Main Authors: | Zu-Ming You, Yuan-Sheng Li, Fan-Shuo Meng, Rui-Xiang Zhang, Chen-Xi Xie, Zhijiang Liang, Ji-Yuan Zhou |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2025-09-01 |
| Series: | Ecotoxicology and Environmental Safety |
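| Subjects: | Interpretable machine learning; Random forest algorithm; Heavy metals; Prostate cancer; Prevention and control |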
| Online Access: | http://www.sciencedirect.com/science/article/pii/S0147651325010759 |
| _version_ | 1849389690278379520 |
|---|---|
| author | Zu-Ming You; Yuan-Sheng Li; Fan-Shuo Meng; Rui-Xiang Zhang; Chen-Xi Xie; Zhijiang Liang; Ji-Yuan Zhou |
| author_facet | Zu-Ming You; Yuan-Sheng Li; Fan-Shuo Meng; Rui-Xiang Zhang; Chen-Xi Xie; Zhijiang Liang; Ji-Yuan Zhou |
| author_sort | Zu-Ming You |
| collection | DOAJ |
| description | Environmental pollution plays a major role in the development of prostate cancer (PCA). However, there has been no research on machine learning (ML) modelling of the relationship between multiple heavy metal exposures and PCA risk. Using 8022 samples from the 2003–2018 National Health and Nutrition Examination Survey (NHANES) database, we used the concentrations of 18 blood and urinary heavy metals and minerals together with 14 covariates. Among the eight ML models evaluated, the random forest (RF) algorithm showed superior performance, achieving an accuracy of 72.835%, an area under the receiver operating characteristic curve (AUC) of 0.869, an F1 score of 0.145, a G-mean of 0.749, and a Youden index of 0.498 in the test set. Four interpretability methods were integrated into the ML model. The RF model indicated that specific ranges of blood lead (Pb) (0.449–29.964 µg/dL), urinary cesium (Cs) (1.822–270.426 µg/L), and urinary antimony (Sb) (0.015–4.953 µg/L) were positively associated with PCA risk, while blood cadmium (Cd) (0.247–9.025 µg/L) showed a negative association. Notably, urinary Cs and Sb emerged as novel risk-related metals for PCA in our study. The synergistic effect analysis further identified blood Pb, urinary Sb, and urinary Cs as the major contributing factors. The predictive model established in this study can provide valuable strategies for the prevention and control of PCA. |
| format | Article |
| id | doaj-art-33ff7db65d32459bae9c843afd6538a6 |
| institution | Kabale University |
| issn | 0147-6513 |
| language | English |
| publishDate | 2025-09-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Ecotoxicology and Environmental Safety |
| spelling | doaj-art-33ff7db65d32459bae9c843afd6538a6; 2025-08-20T03:41:53Z; eng; Elsevier; Ecotoxicology and Environmental Safety; 0147-6513; 2025-09-01; 302; 118730; 10.1016/j.ecoenv.2025.118730; Interpretable machine learning approaches for predicting prostate cancer by using multiple heavy metal exposures based on the data from NHANES 2003–2018; Zu-Ming You (0); Yuan-Sheng Li (1); Fan-Shuo Meng (2); Rui-Xiang Zhang (3); Chen-Xi Xie (4); Zhijiang Liang (5); Ji-Yuan Zhou (6); (0)–(4): Department of Biostatistics, School of Public Health (State Key Laboratory of Multi-organ Injury Prevention and Treatment, and Guangdong Provincial Key Laboratory of Tropical Disease Research), Southern Medical University, Guangzhou 510515, China; (5): Department of Medical Information and Statistics, Guangdong Women and Children Hospital, 521 Xingnan Road, Panyu District, Guangzhou 511442, China; Corresponding author; (6): Department of Biostatistics, School of Public Health (State Key Laboratory of Multi-organ Injury Prevention and Treatment, and Guangdong Provincial Key Laboratory of Tropical Disease Research), Southern Medical University, Guangzhou 510515, China; Correspondence to: Department of Biostatistics, School of Public Health (State Key Laboratory of Multi-organ Injury Prevention and Treatment, and Guangdong Provincial Key Laboratory of Tropical Disease Research), Southern Medical University, No. 1023, South Shatai Road, Baiyun District, Guangzhou 510515, China; Environmental pollution plays a major role in the development of prostate cancer (PCA). However, there has been no research on machine learning (ML) modelling of the relationship between multiple heavy metal exposures and PCA risk. Using 8022 samples from the 2003–2018 National Health and Nutrition Examination Survey (NHANES) database, we used the concentrations of 18 blood and urinary heavy metals and minerals together with 14 covariates. Among the eight ML models evaluated, the random forest (RF) algorithm showed superior performance, achieving an accuracy of 72.835%, an area under the receiver operating characteristic curve (AUC) of 0.869, an F1 score of 0.145, a G-mean of 0.749, and a Youden index of 0.498 in the test set. Four interpretability methods were integrated into the ML model. The RF model indicated that specific ranges of blood lead (Pb) (0.449–29.964 µg/dL), urinary cesium (Cs) (1.822–270.426 µg/L), and urinary antimony (Sb) (0.015–4.953 µg/L) were positively associated with PCA risk, while blood cadmium (Cd) (0.247–9.025 µg/L) showed a negative association. Notably, urinary Cs and Sb emerged as novel risk-related metals for PCA in our study. The synergistic effect analysis further identified blood Pb, urinary Sb, and urinary Cs as the major contributing factors. The predictive model established in this study can provide valuable strategies for the prevention and control of PCA.; http://www.sciencedirect.com/science/article/pii/S0147651325010759; Interpretable machine learning; Random forest algorithm; Heavy metals; Prostate cancer; Prevention and control |
| spellingShingle | Zu-Ming You; Yuan-Sheng Li; Fan-Shuo Meng; Rui-Xiang Zhang; Chen-Xi Xie; Zhijiang Liang; Ji-Yuan Zhou; Interpretable machine learning approaches for predicting prostate cancer by using multiple heavy metal exposures based on the data from NHANES 2003–2018; Ecotoxicology and Environmental Safety; Interpretable machine learning; Random forest algorithm; Heavy metals; Prostate cancer; Prevention and control |
| title | Interpretable machine learning approaches for predicting prostate cancer by using multiple heavy metal exposures based on the data from NHANES 2003–2018 |
| title_sort | interpretable machine learning approaches for predicting prostate cancer by using multiple heavy metal exposures based on the data from nhanes 2003 2018 |
| topic | Interpretable machine learning; Random forest algorithm; Heavy metals; Prostate cancer; Prevention and control |
| url | http://www.sciencedirect.com/science/article/pii/S0147651325010759 |
| work_keys_str_mv | AT zumingyou interpretablemachinelearningapproachesforpredictingprostatecancerbyusingmultipleheavymetalexposuresbasedonthedatafromnhanes20032018 AT yuanshengli interpretablemachinelearningapproachesforpredictingprostatecancerbyusingmultipleheavymetalexposuresbasedonthedatafromnhanes20032018 AT fanshuomeng interpretablemachinelearningapproachesforpredictingprostatecancerbyusingmultipleheavymetalexposuresbasedonthedatafromnhanes20032018 AT ruixiangzhang interpretablemachinelearningapproachesforpredictingprostatecancerbyusingmultipleheavymetalexposuresbasedonthedatafromnhanes20032018 AT chenxixie interpretablemachinelearningapproachesforpredictingprostatecancerbyusingmultipleheavymetalexposuresbasedonthedatafromnhanes20032018 AT zhijiangliang interpretablemachinelearningapproachesforpredictingprostatecancerbyusingmultipleheavymetalexposuresbasedonthedatafromnhanes20032018 AT jiyuanzhou interpretablemachinelearningapproachesforpredictingprostatecancerbyusingmultipleheavymetalexposuresbasedonthedatafromnhanes20032018 |
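
The description field reports accuracy, F1 score, G-mean, and Youden index for the test set. All four can be derived from a binary confusion matrix, and the sketch below shows the standard definitions. The function name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not the study's data, and the record does not say how the authors computed their figures.

```python
from math import sqrt


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix metrics for a binary classifier.

    Assumes at least one positive case, one negative case, and one
    positive prediction, so that no denominator is zero.
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    g_mean = sqrt(sensitivity * specificity)
    youden = sensitivity + specificity - 1
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "g_mean": g_mean,
        "youden": youden,
    }


# Hypothetical, imbalanced example: 40 cases among 1000 samples.
m = binary_metrics(tp=30, fp=270, tn=690, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With a rare positive class, as in these illustrative counts, precision can be low even when sensitivity and specificity are both reasonable, so a small F1 score can coexist with a moderate accuracy, G-mean, and Youden index.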