Impacts of benchmarking choices on inferred model skill of the Arctic–Boreal terrestrial carbon cycle


Bibliographic Details
Main Authors: Jeralyn Poe, Deborah Huntzinger, Nathan Collier, Christopher Schwalm, Jon Wells, Christina Schädel, William J Riley, Stephen Sitch
Format: Article
Language: English
Published: IOP Publishing 2025-01-01
Series: Environmental Research: Ecology
Subjects: Model validation, ILAMB, Arctic–Boreal, functional relationship, TRENDY
Online Access: https://doi.org/10.1088/2752-664X/adacee
collection DOAJ
description Land surface models require continuous validation against observations to improve and reduce simulation uncertainty. However, inferred model performance can be heavily influenced by subjective choices made in the selection and application of observational data products. A key area often misrepresented by models is the Arctic–Boreal region, which is a potential tipping point region in Earth’s climate system due to large permafrost carbon stocks that are vulnerable to release with climate warming. We use the International Land Model Benchmarking (ILAMB) framework to evaluate how the model skill of TRENDY-v9 models varies based on the choice of observation-based benchmark and how benchmarks are applied in model evaluation. This analysis uses global datasets integrated into ILAMB and new, regionally specific observational products from the Arctic–Boreal Vulnerability Experiment. Our results cover 1979–2019 and show that model scores can vary substantially depending on the data product applied, with higher model scores indicating better model performance against observations. Model scores are lowest when models are benchmarked against regional rather than global datasets. We also evaluate observed and modeled functional relationships between ecosystem respiration and air temperature and between gross primary production and precipitation. Here, we find that the magnitude and shape of the responses are strongly impacted by the choice of observational dataset and the approach used to construct the functional relationship benchmark. These results suggest that model evaluation studies could convey a false sense of model skill if they rely on a single benchmark data product or omit regional data products when performing a regional model analysis. Collectively, our findings highlight the influence of benchmarking choices on model evaluation and point to the need for benchmarking guidelines when assessing model skill.
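The functional-relationship comparison described in the abstract can be illustrated with a minimal sketch. It is not the ILAMB API and not the authors' method: the function names `binned_response` and `curve_score`, the exponential score, and all of the synthetic temperature and respiration values are assumptions made up for illustration. The sketch bins a driver variable, averages the response in each bin, and scores one model curve against two hypothetical observational products.

```python
import numpy as np


def binned_response(x, y, edges):
    """Mean of y within each bin of x; NaN where a bin is empty.

    Samples outside [edges[0], edges[-1]) are ignored.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.digitize(x, edges) - 1  # bin index of each sample
    curve = np.full(len(edges) - 1, np.nan)
    for b in range(len(edges) - 1):
        in_bin = idx == b
        if in_bin.any():
            curve[b] = y[in_bin].mean()
    return curve


def curve_score(model_curve, obs_curve):
    """Hypothetical score in (0, 1]: exp(-relative RMSE); 1 means identical curves."""
    ok = ~np.isnan(model_curve) & ~np.isnan(obs_curve)
    if not ok.any():
        return float("nan")
    rmse = np.sqrt(np.mean((model_curve[ok] - obs_curve[ok]) ** 2))
    scale = np.sqrt(np.mean(obs_curve[ok] ** 2))
    if scale == 0.0:
        return float("nan")
    return float(np.exp(-rmse / scale))


# Synthetic air temperature (degrees C) and respiration responses (arbitrary units).
# None of these numbers come from the paper.
temp = np.linspace(-20.0, 20.0, 81)
edges = np.linspace(-20.0, 20.0, 9)

obs_a = 1.0 * np.exp(0.05 * temp)    # hypothetical observational product A
obs_b = 1.5 * np.exp(0.07 * temp)    # hypothetical observational product B
model = 1.0 * np.exp(0.055 * temp)   # hypothetical model output

model_curve = binned_response(temp, model, edges)
score_a = curve_score(model_curve, binned_response(temp, obs_a, edges))
score_b = curve_score(model_curve, binned_response(temp, obs_b, edges))

print("score against product A:", round(score_a, 3))
print("score against product B:", round(score_b, 3))
```

In this toy setup the same model curve receives a different score depending on which product serves as the benchmark, which is the kind of sensitivity the abstract reports. Changing `edges` changes the benchmark curve as well, so the construction approach matters in the sketch too.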
id doaj-art-b42b77d68be643ad87ef794eaa5d2894
institution Kabale University
issn 2752-664X
spelling doaj-art-b42b77d68be643ad87ef794eaa5d2894 2025-02-03T12:13:19Z
eng
IOP Publishing
Environmental Research: Ecology, ISSN 2752-664X
2025-01-01, volume 4, issue 1, article 015007
https://doi.org/10.1088/2752-664X/adacee
Jeralyn Poe (https://orcid.org/0000-0003-1849-5278): School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ, United States of America
Deborah Huntzinger: School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ, United States of America; School of Earth and Sustainability, Northern Arizona University, Flagstaff, AZ, United States of America; Center for Ecosystem Science and Society, Northern Arizona University, Flagstaff, AZ, United States of America
Nathan Collier: Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States of America
Christopher Schwalm: Woodwell Climate Research Center, Falmouth, MA, United States of America
Jon Wells (https://orcid.org/0000-0001-5037-8798): Center for Ecosystem Science and Society, Northern Arizona University, Flagstaff, AZ, United States of America
Christina Schädel (https://orcid.org/0000-0003-2145-6210): Woodwell Climate Research Center, Falmouth, MA, United States of America
William J Riley (https://orcid.org/0000-0002-4615-2304): Earth and Environmental Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
Stephen Sitch: College of Life and Environmental Sciences, University of Exeter, Exeter, United Kingdom
topic Model validation
ILAMB
Arctic–Boreal
functional relationship
TRENDY