Impacts of benchmarking choices on inferred model skill of the Arctic–Boreal terrestrial carbon cycle
Land surface models require continuous validation against observations to improve and reduce simulation uncertainty. However, inferred model performance can be heavily influenced by subjective choices made in the selection and application of observational data products. A key area often misrepresent...
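Main Authors: | Jeralyn Poe, Deborah Huntzinger, Nathan Collier, Christopher Schwalm, Jon Wells, Christina Schädel, William J Riley, Stephen Sitch |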
---|---|
Format: | Article |
Language: | English |
Published: | IOP Publishing, 2025-01-01 |
Series: | Environmental Research: Ecology |
Subjects: | Model validation, ILAMB, Arctic–Boreal, functional relationship, TRENDY |
Online Access: | https://doi.org/10.1088/2752-664X/adacee |
_version_ | 1832542913212448768 |
---|---|
author | Jeralyn Poe, Deborah Huntzinger, Nathan Collier, Christopher Schwalm, Jon Wells, Christina Schädel, William J Riley, Stephen Sitch |
author_sort | Jeralyn Poe |
collection | DOAJ |
description | Land surface models require continuous validation against observations to improve them and to reduce simulation uncertainty. However, inferred model performance can be heavily influenced by subjective choices made in the selection and application of observational data products. A key area often misrepresented by models is the Arctic–Boreal region, a potential tipping-point region in Earth’s climate system because of its large permafrost carbon stocks, which are vulnerable to release with climate warming. We use the International Land Model Benchmarking (ILAMB) framework to evaluate how the skill of TRENDY-v9 models varies with the choice of observation-based benchmark and with how benchmarks are applied in model evaluation. This analysis uses global datasets integrated into ILAMB and new, regionally specific observational products from the Arctic–Boreal Vulnerability Experiment. Our results cover the period 1979–2019 and show that model scores, for which higher values indicate better agreement with observations, can vary substantially depending on the data product applied. The lowest model scores occur when models are benchmarked against regional rather than global datasets. We also evaluate observed and modeled functional relationships between ecosystem respiration and air temperature and between gross primary production and precipitation. Here, we find that the magnitude and shape of the responses are strongly affected by the choice of observational dataset and by the approach used to construct the functional relationship benchmark. These results suggest that model evaluation studies could convey a false sense of model skill if they use only a single benchmark data product or do not apply regional data products in a regional model analysis. Collectively, our findings highlight the influence of benchmarking choices on model evaluation and point to the need for benchmarking guidelines when assessing model skill. |
format | Article |
id | doaj-art-b42b77d68be643ad87ef794eaa5d2894 |
institution | Kabale University |
issn | 2752-664X |
language | English |
publishDate | 2025-01-01 |
publisher | IOP Publishing |
record_format | Article |
series | Environmental Research: Ecology |
spelling | doaj-art-b42b77d68be643ad87ef794eaa5d2894; 2025-02-03T12:13:19Z; eng; IOP Publishing; Environmental Research: Ecology; 2752-664X; 2025-01-01; 4; 1; 015007; 10.1088/2752-664X/adacee; Impacts of benchmarking choices on inferred model skill of the Arctic–Boreal terrestrial carbon cycle; Jeralyn Poe [0] (https://orcid.org/0000-0003-1849-5278); Deborah Huntzinger [1]; Nathan Collier [2]; Christopher Schwalm [3]; Jon Wells [4] (https://orcid.org/0000-0001-5037-8798); Christina Schädel [5] (https://orcid.org/0000-0003-2145-6210); William J Riley [6] (https://orcid.org/0000-0002-4615-2304); Stephen Sitch [7]; [0] School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ, United States of America; [1] School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ, United States of America, School of Earth and Sustainability, Northern Arizona University, Flagstaff, AZ, United States of America, and Center for Ecosystem Science and Society, Northern Arizona University, Flagstaff, AZ, United States of America; [2] Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States of America; [3] Woodwell Climate Research Center, Falmouth, MA, United States of America; [4] Center for Ecosystem Science and Society, Northern Arizona University, Flagstaff, AZ, United States of America; [5] Woodwell Climate Research Center, Falmouth, MA, United States of America; [6] Earth and Environmental Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America; [7] College of Life and Environmental Sciences, University of Exeter, Exeter, United Kingdom; https://doi.org/10.1088/2752-664X/adacee; Model validation; ILAMB; Arctic–Boreal; functional relationship; TRENDY |
title | Impacts of benchmarking choices on inferred model skill of the Arctic–Boreal terrestrial carbon cycle |
topic | Model validation, ILAMB, Arctic–Boreal, functional relationship, TRENDY |
url | https://doi.org/10.1088/2752-664X/adacee |