Understanding the reliability of citizen science observational data using item response models
Abstract: Citizen science projects have become increasingly popular in many fields, including ecology. However, the quality of this information is frequently debated within the scientific community. Modern citizen science implementations therefore require measures of the users' proficiency. We i...
Main Authors: | Edgar Santos‐Fernandez, Kerrie Mengersen |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2021-08-01 |
Series: | Methods in Ecology and Evolution |
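Subjects: | ability estimation; big data; item response theory; latent variable regression; spatial model; species difficulties |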
Online Access: | https://doi.org/10.1111/2041-210X.13623 |
_version_ | 1825206709282406400 |
---|---|
author | Edgar Santos‐Fernandez; Kerrie Mengersen |
author_sort | Edgar Santos‐Fernandez |
collection | DOAJ |
description | Abstract: Citizen science projects have become increasingly popular in many fields, including ecology. However, the quality of this information is frequently debated within the scientific community. Modern citizen science implementations therefore require measures of the users' proficiency. We introduce a new methodological framework of item response that quantifies a citizen scientist's ability, taking into account the difficulty of the task. We focus on citizen science programs involving the classification of images. Our approach accommodates spatial autocorrelation within the item difficulties, and provides deeper insights and relevant ecological measures of species and site‐related difficulties, discriminatory power and guessing behaviour. The identification of very capable versus less skilled participants can facilitate selective use of data in analyses and more targeted training programs for citizen scientists. This paper also addresses challenges in fitting such models to very large datasets. We found that the suggested methods outperform the traditional item response models in terms of RMSE, accuracy and WAIC, based on leave‐one‐out cross‐validation on simulated and empirical data. We present a comprehensive implementation using a case study of species identification in the Serengeti, Tanzania. The R and Stan codes are provided for full reproducibility. Multiple statistical illustrations and visualizations are given, which allow extrapolation to a wide range of citizen science ecological problems. |
format | Article |
id | doaj-art-47f620e814ed40da87920c30fa218d91 |
institution | Kabale University |
issn | 2041-210X |
language | English |
publishDate | 2021-08-01 |
publisher | Wiley |
record_format | Article |
series | Methods in Ecology and Evolution |
spelling | doaj-art-47f620e814ed40da87920c30fa218d91; 2025-02-07T06:21:06Z; eng; Wiley; Methods in Ecology and Evolution; 2041-210X; 2021-08-01; volume 12, issue 8, pages 1533-1548; 10.1111/2041-210X.13623; Understanding the reliability of citizen science observational data using item response models; Edgar Santos‐Fernandez (School of Mathematical Sciences, Queensland University of Technology, Brisbane, Qld, Australia); Kerrie Mengersen (School of Mathematical Sciences, Queensland University of Technology, Brisbane, Qld, Australia); https://doi.org/10.1111/2041-210X.13623; ability estimation; big data; item response theory; latent variable regression; spatial model; species difficulties |
spellingShingle | Edgar Santos‐Fernandez; Kerrie Mengersen; Understanding the reliability of citizen science observational data using item response models; Methods in Ecology and Evolution; ability estimation; big data; item response theory; latent variable regression; spatial model; species difficulties |
title | Understanding the reliability of citizen science observational data using item response models |
topic | ability estimation; big data; item response theory; latent variable regression; spatial model; species difficulties |
url | https://doi.org/10.1111/2041-210X.13623 |
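
The description in this record refers to item response models that combine a participant's ability with task difficulty, discriminatory power and guessing behaviour. As a rough illustration only, the sketch below evaluates the textbook three-parameter logistic (3PL) item response function in Python. It is an assumption made for this example, not the authors' R and Stan implementation, and it leaves out the spatial autocorrelation the paper adds to the item difficulties. The names `prob_correct` and `_sigmoid` and all parameter values are hypothetical.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def prob_correct(ability, difficulty, discrimination=1.0, guessing=0.0):
    """Textbook 3PL probability of a correct classification.

    Illustrative only: a generic item response function, not the
    spatial model described in the paper.
    """
    if not 0.0 <= guessing <= 1.0:
        raise ValueError("guessing must lie in [0, 1]")
    # Higher ability relative to difficulty raises the success probability;
    # discrimination controls how sharply it rises.
    logit = discrimination * (ability - difficulty)
    # The guessing parameter is the floor reached by very low-ability users.
    return guessing + (1.0 - guessing) * _sigmoid(logit)


if __name__ == "__main__":
    # Hypothetical item: moderately hard image, some chance of a lucky guess.
    for ability in (-2.0, 0.0, 2.0):
        p = prob_correct(ability, difficulty=0.5, discrimination=1.2, guessing=0.1)
        print(f"ability={ability:+.1f}  P(correct)={p:.3f}")
```

In this form, a participant whose ability equals the item difficulty succeeds with probability halfway between the guessing floor and 1, which is one way to read "ability, taking into account the difficulty of the task".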