Impacts of benchmarking choices on inferred model skill of the Arctic–Boreal terrestrial carbon cycle

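Land surface models require continuous validation against observations to improve and reduce simulation uncertainty. However, inferred model performance can be heavily influenced by subjective choices made in the selection and application of observational data products. A key area often misrepresented by models is the Arctic–Boreal region, which is a potential tipping point region in Earth’s climate system due to large permafrost carbon stocks that are vulnerable to release with climate warming.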

Bibliographic Details
Main Authors:	Jeralyn Poe, Deborah Huntzinger, Nathan Collier, Christopher Schwalm, Jon Wells, Christina Schädel, William J Riley, Stephen Sitch
Format:	Article
Language:	English
Published:	IOP Publishing 2025-01-01
Series:	Environmental Research: Ecology
Subjects:	Model validation, ILAMB, Arctic–Boreal, functional relationship, TRENDY
Online Access:	https://doi.org/10.1088/2752-664X/adacee

_version_	1832542913212448768
author	Jeralyn Poe Deborah Huntzinger Nathan Collier Christopher Schwalm Jon Wells Christina Schädel William J Riley Stephen Sitch
author_sort	Jeralyn Poe
collection	DOAJ
description	Land surface models require continuous validation against observations to improve and reduce simulation uncertainty. However, inferred model performance can be heavily influenced by subjective choices made in the selection and application of observational data products. A key area often misrepresented by models is the Arctic–Boreal region, which is a potential tipping point region in Earth’s climate system due to large permafrost carbon stocks that are vulnerable to release with climate warming. We use the International Land Model Benchmarking (ILAMB) framework to evaluate how the model skill of TRENDY-v9 models varies based on the choice of observational-based benchmark and how benchmarks are applied in model evaluation. This analysis uses global datasets integrated into ILAMB and new, regionally-specific observational products from the Arctic–Boreal Vulnerability Experiment. Our results cover the overall time period of 1979–2019 and show that model scores can vary substantially depending on the data product applied, with higher model scores indicating better model performance against observations. The lowest model scores occur when benchmarked against regional, compared to global, datasets. We also evaluate observed and modeled functional relationships between ecosystem respiration and air temperature and between gross primary production and precipitation. Here, we find that the magnitude and shape of the responses are strongly impacted by the choice of observational dataset and the approach used to construct the functional relationship benchmark. These results suggest that model evaluation studies could conclude a false sense of model skill if only using a single benchmark data product or if not applying regional data products when performing a regional model analysis. Collectively, our findings highlight the influence of benchmarking choices on model evaluation and point to the need for benchmarking guidelines when assessing model skill.
format	Article
id	doaj-art-b42b77d68be643ad87ef794eaa5d2894
institution	Kabale University
issn	2752-664X
language	English
publishDate	2025-01-01
publisher	IOP Publishing
record_format	Article
series	Environmental Research: Ecology
spelling	doaj-art-b42b77d68be643ad87ef794eaa5d2894 2025-02-03T12:13:19Z eng IOP Publishing Environmental Research: Ecology 2752-664X 2025-01-01 4 1 015007 10.1088/2752-664X/adacee Impacts of benchmarking choices on inferred model skill of the Arctic–Boreal terrestrial carbon cycle Jeralyn Poe 0 https://orcid.org/0000-0003-1849-5278 Deborah Huntzinger 1 Nathan Collier 2 Christopher Schwalm 3 Jon Wells 4 https://orcid.org/0000-0001-5037-8798 Christina Schädel 5 https://orcid.org/0000-0003-2145-6210 William J Riley 6 https://orcid.org/0000-0002-4615-2304 Stephen Sitch 7 School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ, United States of America School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ, United States of America; School of Earth and Sustainability, Northern Arizona University, Flagstaff, AZ, United States of America; Center for Ecosystem Science and Society, Northern Arizona University, Flagstaff, AZ, United States of America Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States of America Woodwell Climate Research Center, Falmouth, MA, United States of America Center for Ecosystem Science and Society, Northern Arizona University, Flagstaff, AZ, United States of America Woodwell Climate Research Center, Falmouth, MA, United States of America Earth and Environmental Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America College of Life and Environmental Sciences, University of Exeter, Exeter, United Kingdom Land surface models require continuous validation against observations to improve and reduce simulation uncertainty. However, inferred model performance can be heavily influenced by subjective choices made in the selection and application of observational data products. A key area often misrepresented by models is the Arctic–Boreal region, which is a potential tipping point region in Earth’s climate system due to large permafrost carbon stocks that are vulnerable to release with climate warming. We use the International Land Model Benchmarking (ILAMB) framework to evaluate how the model skill of TRENDY-v9 models varies based on the choice of observational-based benchmark and how benchmarks are applied in model evaluation. This analysis uses global datasets integrated into ILAMB and new, regionally-specific observational products from the Arctic–Boreal Vulnerability Experiment. Our results cover the overall time period of 1979–2019 and show that model scores can vary substantially depending on the data product applied, with higher model scores indicating better model performance against observations. The lowest model scores occur when benchmarked against regional, compared to global, datasets. We also evaluate observed and modeled functional relationships between ecosystem respiration and air temperature and between gross primary production and precipitation. Here, we find that the magnitude and shape of the responses are strongly impacted by the choice of observational dataset and the approach used to construct the functional relationship benchmark. These results suggest that model evaluation studies could conclude a false sense of model skill if only using a single benchmark data product or if not applying regional data products when performing a regional model analysis. Collectively, our findings highlight the influence of benchmarking choices on model evaluation and point to the need for benchmarking guidelines when assessing model skill. https://doi.org/10.1088/2752-664X/adacee Model validation ILAMB Arctic–Boreal functional relationship TRENDY
title	Impacts of benchmarking choices on inferred model skill of the Arctic–Boreal terrestrial carbon cycle
topic	Model validation ILAMB Arctic–Boreal functional relationship TRENDY
url	https://doi.org/10.1088/2752-664X/adacee

Similar Items