Identification of the primary pollution sources and dominant influencing factors of soil heavy metals using a random forest model optimized by genetic algorithm coupled with geodetector

Bibliographic Details
Main Authors: Tong Liu, Mingshi Wang, Mingya Wang, Qinqing Xiong, Luhao Jia, Wanqi Ma, Shaobo Sui, Wei Wu, Xiaoming Guo
Format: Article
Language: English
Published: Elsevier 2025-01-01
Series: Ecotoxicology and Environmental Safety
Subjects: APCS-MLR, GA-RF, Geodetector, Heavy metals, Soil
Online Access: http://www.sciencedirect.com/science/article/pii/S0147651325000673
Collection: DOAJ
Description: Identifying and quantifying the dominant factors that influence heavy metal (HM) pollution sources are essential for maintaining soil ecological health and implementing effective pollution control measures. This study analyzed soil HM samples from 53 different land use types in Jiaozuo City, Henan Province, China. Pollution sources were identified using Absolute Principal Component Scores (APCS), and 8 anthropogenic factors, 9 natural factors, and 4 soil physicochemical properties were mapped using Geographic Information System (GIS) kernel density estimation. Geodetector and a genetic-algorithm-optimized random forest model (GA-RF) were employed to quantify the dominant factors and identify the pollution sources. A Monte Carlo model was then applied to assess the probabilities of source-oriented health risks across age groups in the study area. The results revealed three principal components representing pollution sources, with contribution rates of 47.2 %, 33.3 %, and 19.5 %, respectively. Pollution source 1 was dominated by industrial activities, with factory density (27.7 %) and distance from the factory (36.3 %) identified as the main factors; Cr, Cu, Mn, and Ni had high loadings in this source. Pollution source 2, a combination of natural and transportation influences, was primarily affected by the normalized difference vegetation index (NDVI, 37.8 %), road network density (16.8 %), and proximity to roads (15.3 %). Pollution source 3 was linked to agricultural activities, with cultivated land density (CLD) contributing 39.1 %. Arsenic (As) had a high loading (91.1 %) in this source, an exceedance rate of 93 % in cultivated soil, a moderate enrichment factor of 2.33, and a strong ecological risk index of 615.72, making it the most polluted metal in the area. The source-oriented Health Risk Assessment (HRA) showed that agricultural activities contributed 88.7 % of the carcinogenic risk from As in cultivated land. Overall, 99.3 % of the population faced an acceptable cancer risk level. Unlike traditional source apportionment methods, which only estimate the percentage contribution of each source, the GA-RF model quantified the contributions of specific influencing factors (e.g., factory density) to each pollution source. This approach provides a novel perspective for HM source apportionment under complex environmental conditions.
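The Geodetector factor detector named in the description measures how much of the spatial variance of a target variable (for example, a pollution-source score per sample) is explained by the classes of one influencing factor, via q = 1 - Σ_h N_h σ_h² / (N σ²). Below is a minimal standard-library Python sketch of that q-statistic, not the authors' code. It assumes each continuous factor surface (such as a kernel density map) has already been discretized into strata; the record does not say how that was done. The function name `geodetector_q` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def geodetector_q(values, strata):
    """Geodetector factor-detector q-statistic (illustrative sketch).

    values: observations of the target variable, one per sample
    strata: class label of one discretized influencing factor, one per sample
    Returns q in [0, 1], the share of the variance of `values` explained by strata.
    """
    if len(values) != len(strata) or not values:
        raise ValueError("values and strata must be non-empty and of equal length")
    n = len(values)
    total_var = pvariance(values)  # population variance sigma^2 over all samples
    if total_var == 0:
        raise ValueError("values have zero variance, so q is undefined")
    groups = defaultdict(list)
    for value, stratum in zip(values, strata):
        groups[stratum].append(value)
    # Within-strata sum of squares: sum over strata h of N_h * sigma_h^2
    ssw = sum(len(g) * pvariance(g) for g in groups.values())
    return 1.0 - ssw / (n * total_var)


if __name__ == "__main__":
    # Hypothetical toy data: six source scores and a factor classed low/high
    scores = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
    density_class = ["low", "low", "low", "high", "high", "high"]
    print(round(geodetector_q(scores, density_class), 3))  # prints 0.993
```

Because q is one minus the ratio of within-strata variance to total variance, a factor whose classes cleanly separate high- and low-scoring samples (as in the toy data) gives q close to 1, while a factor unrelated to the scores gives q near 0.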
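The description also mentions a Monte Carlo model for source-oriented health risk probabilities. The sketch below illustrates the general idea for the soil-ingestion pathway of As only, using the widely used USEPA intake form CDI = C × IngR × EF × ED × 10⁻⁶ / (BW × AT) and CR = CDI × SF. Every distribution and value in it (concentration mean and SD, ingestion-rate and body-weight distributions, EF, ED, AT, the 1.5 (mg/kg/day)⁻¹ oral slope factor, and the 10⁻⁴ acceptability threshold) is an illustrative assumption rather than a value from the study, and it models a single population group instead of separate age groups. The `source_fraction` parameter is hypothetical: it scales the concentration to the share attributed to one source, which is one simple way to obtain a source-oriented risk.

```python
import random


def monte_carlo_as_cancer_risk(conc_mean, conc_sd, source_fraction=1.0,
                               n_iter=10_000, seed=42,
                               slope_factor=1.5, threshold=1e-4):
    """Simulate carcinogenic risk (CR) from incidental ingestion of As in soil.

    All default parameter values are illustrative assumptions.
    Returns (list of simulated CR values, fraction of CR values <= threshold).
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    ef, ed = 350.0, 24.0       # exposure frequency (days/yr), duration (yr); assumed
    at = 70.0 * 365.0          # averaging time for carcinogens (days); assumed
    risks = []
    for _ in range(n_iter):
        # Soil As (mg/kg), truncated at zero, scaled to one source's share
        conc = max(rng.gauss(conc_mean, conc_sd), 0.0) * source_fraction
        ing_rate = rng.triangular(50.0, 200.0, 100.0)  # soil ingestion (mg/day); assumed
        body_wt = max(rng.gauss(60.0, 10.0), 10.0)     # body weight (kg); assumed
        # Chronic daily intake in mg/(kg*day); 1e-6 converts mg of soil to kg
        cdi = conc * ing_rate * ef * ed * 1e-6 / (body_wt * at)
        risks.append(cdi * slope_factor)
    acceptable = sum(r <= threshold for r in risks) / n_iter
    return risks, acceptable


if __name__ == "__main__":
    risks, share = monte_carlo_as_cancer_risk(conc_mean=20.0, conc_sd=5.0)
    print(f"share of simulated exposures with CR <= 1e-4: {share:.3f}")
```

Because CR is linear in concentration in this form, scaling by `source_fraction` scales every simulated risk by the same factor; the share of iterations at or below the threshold is the kind of probability summarized by a statement such as "99.3 % of the population faced an acceptable cancer risk level".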
ISSN: 0147-6513
Citation: Ecotoxicology and Environmental Safety, vol. 290, article 117731 (2025)
Corresponding Author: Mingshi Wang
Affiliations: Tong Liu, Mingshi Wang, Mingya Wang, Luhao Jia, Wanqi Ma, Shaobo Sui, Wei Wu, and Xiaoming Guo: College of Resource and Environment, Henan Polytechnic University, Jiaozuo 454003, China; Qinqing Xiong: College of Atmospheric Physics, Nanjing University of Information Science and Technology, Nanjing 210044, China