Spatiotemporal prediction of soil organic carbon density in Europe (2000–2022) using earth observation and machine learning

Bibliographic Details
Main Authors: Xuemeng Tian, Sytze de Bruin, Rolf Simoes, Mustafa Serkan Isik, Robert Minarik, Yu-Feng Ho, Murat Şahin, Martin Herold, Davide Consoli, Tomislav Hengl
Format: Article
Language: English
Published: PeerJ Inc. 2025-07-01
Series: PeerJ
Online Access: https://peerj.com/articles/19605.pdf
collection DOAJ
description This article describes a comprehensive framework for modeling and mapping soil organic carbon density (SOCD, kg/m³), based on spatiotemporal random forest (RF) and quantile regression forests (QRF). A total of 45,616 SOCD observations and various Earth observation (EO) feature layers were used to produce 30 m SOCD maps for the EU at four-year intervals (2000–2022) and four soil depth intervals (0–20 cm, 20–50 cm, 50–100 cm, and 100–200 cm). Per-pixel 95% probability prediction intervals (PIs) and extrapolation risk probabilities are also provided. Model evaluation indicates good overall accuracy (R² = 0.63 and CCC = 0.76 for hold-out independent tests). Prediction accuracy varies by land cover, depth interval, and year of prediction, with the worst accuracy for shrubland and for deeper soils (100–200 cm). The PI validation confirmed effective uncertainty estimation, though with reduced accuracy for higher SOCD values. Shapley analysis identified soil depth as the most influential feature, followed by vegetation, long-term bioclimate, and topographic features. While pixel-level uncertainty is substantial, spatial aggregation reduces uncertainty by approximately 66%. Detecting SOCD changes remains challenging, but the results offer a baseline for future improvements. The maps, based primarily on topsoil data from cropland, grassland, and woodland, are best suited for applications related to these land covers and depths. We recommend that users interpret the maps in conjunction with local knowledge and consider the accompanying uncertainty and extrapolation risk layers. All data and code are available under an open license at https://doi.org/10.5281/zenodo.13754343 and https://github.com/AI4SoilHealth/SoilHealthDataCube/.
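The accuracy and uncertainty measures named in the description (R², CCC, and validation of the 95% PIs) can be illustrated with a minimal sketch. This is not the authors' evaluation code, which is in the repository linked above. The function names, the use of the coefficient-of-determination form of R², and the use of interval coverage as the PI check are all assumptions made for illustration.

```python
import numpy as np


def ccc(y_true, y_pred):
    """Lin's concordance correlation coefficient (population moments)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    # Agreement is penalised by both scatter and bias (mean shift).
    return 2.0 * cov / (y_true.var() + y_pred.var() + (mean_t - mean_p) ** 2)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed form)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def pi_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    y_true = np.asarray(y_true, dtype=float)
    inside = (y_true >= np.asarray(lower)) & (y_true <= np.asarray(upper))
    return float(np.mean(inside))
```

For a well-calibrated 95% PI, `pi_coverage` on hold-out data should be close to 0.95. Values well below that suggest intervals that are too narrow.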
issn 2167-8359
volume 13
article_number e19605
doi 10.7717/peerj.19605
affiliation Xuemeng Tian: OpenGeoHub, Doorwerth, Netherlands
Sytze de Bruin: Laboratory of Geo-Information Science and Remote Sensing, Wageningen University and Research, Wageningen, Netherlands
Rolf Simoes: OpenGeoHub, Doorwerth, Netherlands
Mustafa Serkan Isik: OpenGeoHub, Doorwerth, Netherlands
Robert Minarik: OpenGeoHub, Doorwerth, Netherlands
Yu-Feng Ho: OpenGeoHub, Doorwerth, Netherlands
Murat Şahin: Department of Geosciences & Engineering, Delft University of Technology, Delft, Netherlands
Martin Herold: Laboratory of Geo-Information Science and Remote Sensing, Wageningen University and Research, Wageningen, Netherlands
Davide Consoli: OpenGeoHub, Doorwerth, Netherlands
Tomislav Hengl: OpenGeoHub, Doorwerth, Netherlands
topic Soil organic carbon density
Machine learning
Earth observation
Uncertainty
Spatial aggregation
Time series