Assessing the impact of multi-source environmental variables on soil organic carbon in different land use types of China using an interpretable high-precision machine learning method

Exploring the impact of environmental factors on soil organic carbon (SOC) with machine learning (ML) models is of great significance for mitigating climate change, sequestering soil carbon, and reducing emissions. However, the traditional ML model is limited by the hyperparameter adjustment of a...

Bibliographic Details
Main Authors: Feng Wang, Ruilin Liang, Shuyue Li, Meiyan Xiang, Weihao Yang, Miao Lu, Yingqiang Song
Format: Article
Language: English
Published: Elsevier 2024-12-01
Series: Ecological Indicators
Subjects: Hyperparameter; Machine learning; Land use; Soil organic carbon
Online Access: http://www.sciencedirect.com/science/article/pii/S1470160X24013220
collection DOAJ
description Exploring the impact of environmental factors on soil organic carbon (SOC) with machine learning (ML) models is of great significance for mitigating climate change, sequestering soil carbon, and reducing emissions. However, traditional ML models are limited by manual trial-and-error hyperparameter tuning and by the opacity of the fitting process, so their precision and performance cannot be fully exploited. To this end, this study developed a tree-structured Parzen estimator-extreme gradient boosting (TPE-XGBoost) method combined with SHapley Additive exPlanations (SHAP) analysis to analyze the response of SOC (0-200 cm) to climate, human activities, soil properties, and terrain in different land use types of China. Descriptive statistics gave the following order of SOC content: forest land > grassland > cultivated land > unused land. With increasing soil depth, the SOC content of all land use types decreased continuously, and the values followed a left-skewed, non-normal distribution. The fitting accuracy (R²) of the TPE-XGBoost model for SOC content was greater than 0.8. Prediction accuracy was highest at the 0-5 cm depth for cultivated land (R² = 0.96), grassland (R² = 0.93), forest land (R² = 0.95), and unused land (R² = 0.95). SHAP analysis showed that the factors contributing most to the fitting accuracy for cultivated land, grassland, forest land, and unused land across all depths were temperature, soil pH, temperature, and elevation, respectively. From surface to deep soil, the mean SHAP value showed a downward trend, indicating that the driving force of environmental factors on SOC content gradually weakened. In the variance partitioning (VP) analysis, the individual explanations of climate, terrain, and soil properties for cultivated land (0-200 cm), forest land (30-60 cm), and unused land (0-200 cm) were as high as 0.32, 0.17, and 0.16, respectively, indicating that SOC content responded strongly to these environmental factors. An appropriate temperature was found not only to help plant roots obtain nutrients but also to interact with soil pH in its effect on microorganisms, thereby increasing SOC content. The results confirm that the TPE-XGBoost model combined with SHAP analysis can reliably explain the nonlinear driving effect of environmental factors on SOC, providing credible decision support for carbon budget accounting and carbon sequestration in large-scale regions.
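The abstract ranks environmental factors by their mean SHAP value. As a rough illustration of what that quantity measures, the sketch below computes exact Shapley values by enumerating feature subsets for a small hand-written response function. Everything here is an assumption made for illustration: `toy_model`, the feature names, the zero baseline, and the sample values are hypothetical and are not the study's TPE-XGBoost model, data, or the `shap` library.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    """Hypothetical nonlinear SOC response (illustration only, not the fitted model)."""
    temp, ph, elev = x
    return 2.0 * temp + 0.5 * temp * ph - 1.0 * elev


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


FEATURES = ["temperature", "soil_pH", "elevation"]  # hypothetical names
baseline = [0.0, 0.0, 0.0]                          # assumed reference point
samples = [[1.0, 2.0, 1.0], [2.0, 1.0, 3.0], [0.5, 4.0, 2.0]]  # made-up inputs

all_phi = [shapley_values(toy_model, x, baseline) for x in samples]

# Mean absolute Shapley value per feature, used as a global importance score
mean_abs = [
    sum(abs(p[i]) for p in all_phi) / len(all_phi) for i in range(len(FEATURES))
]
ranking = sorted(zip(FEATURES, mean_abs), key=lambda t: t[1], reverse=True)

if __name__ == "__main__":
    for name, score in ranking:
        print(f"{name}: {score:.3f}")
```

For each sample the per-feature values sum to the difference between the prediction and the baseline prediction, which is the additivity property that makes mean SHAP values comparable across factors. Exact enumeration scales exponentially with the number of features, so it is only practical for a toy case like this one.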
id doaj-art-9126a13cb17b49c4a79f665a9a6ff1ca
institution DOAJ
issn 1470-160X
spelling doaj-art-9126a13cb17b49c4a79f665a9a6ff1ca; 2025-08-20T02:49:08Z; eng; Elsevier; Ecological Indicators; 1470-160X; 2024-12-01; volume 169; article 112865; DOI 10.1016/j.ecolind.2024.112865
Author affiliations:
Feng Wang, Ruilin Liang, Shuyue Li, Meiyan Xiang, Weihao Yang: School of civil engineering and geomatics, Shandong University of Technology, Zibo 255000, China
Miao Lu (corresponding author): State Key Laboratory of Efficient Utilization of Arid and Semi-arid Arable Land in Northern China / Key Laboratory of Agricultural Remote Sensing, Ministry of Agriculture and Rural Affairs / Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing 100081, China; National Center of Technology Innovation for Comprehensive Utilization of Saline-Alkali Land, Dongying 257300, China
Yingqiang Song (corresponding author): School of civil engineering and geomatics, Shandong University of Technology, Zibo 255000, China; National Center of Technology Innovation for Comprehensive Utilization of Saline-Alkali Land, Dongying 257300, China
topic Hyperparameter
Machine learning
Land use
Soil organic carbon