Understanding the reliability of citizen science observational data using item response models

Abstract: Citizen science projects have become increasingly popular in many fields, including ecology. However, the quality of this information is frequently debated within the scientific community. Modern citizen science implementations therefore require measures of the users' proficiency. We i...

Bibliographic Details
Main Authors: Edgar Santos‐Fernandez, Kerrie Mengersen
Format: Article
Language: English
Published: Wiley 2021-08-01
Series: Methods in Ecology and Evolution
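Subjects: ability estimation; big data; item response theory; latent variable regression; spatial model; species difficulties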
Online Access: https://doi.org/10.1111/2041-210X.13623
author Edgar Santos‐Fernandez
Kerrie Mengersen
collection DOAJ
description Abstract: Citizen science projects have become increasingly popular in many fields, including ecology. However, the quality of this information is frequently debated within the scientific community. Modern citizen science implementations therefore require measures of the users' proficiency. We introduce a new item response framework that quantifies a citizen scientist's ability, taking into account the difficulty of the task. We focus on citizen science programs involving the classification of images. Our approach accommodates spatial autocorrelation within the item difficulties, and provides deeper insights and relevant ecological measures of species and site‐related difficulties, discriminatory power and guessing behaviour. The identification of very capable versus less skilled participants can facilitate selective use of data in analyses and more targeted training programs for citizen scientists. This paper also addresses challenges in fitting such models to very large datasets. We found that the suggested methods outperform the traditional item response models in terms of RMSE, accuracy and WAIC, based on leave‐one‐out cross‐validation on simulated and empirical data. We present a comprehensive implementation using a case study of species identification in the Serengeti, Tanzania. The R and Stan code is provided for full reproducibility. Multiple statistical illustrations and visualizations are given, which allow extrapolation to a wide range of citizen science ecological problems.
format Article
id doaj-art-47f620e814ed40da87920c30fa218d91
institution Kabale University
issn 2041-210X
language English
publishDate 2021-08-01
publisher Wiley
record_format Article
series Methods in Ecology and Evolution
spelling doaj-art-47f620e814ed40da87920c30fa218d91
2025-02-07T06:21:06Z
eng
Wiley
Methods in Ecology and Evolution
2041-210X
2021-08-01
12
8
1533
1548
10.1111/2041-210X.13623
Understanding the reliability of citizen science observational data using item response models
Edgar Santos‐Fernandez (0): School of Mathematical Sciences, Queensland University of Technology, Brisbane, Qld, Australia
Kerrie Mengersen (1): School of Mathematical Sciences, Queensland University of Technology, Brisbane, Qld, Australia
https://doi.org/10.1111/2041-210X.13623
ability estimation
big data
item response theory
latent variable regression
spatial model
species difficulties
title Understanding the reliability of citizen science observational data using item response models
topic ability estimation
big data
item response theory
latent variable regression
spatial model
species difficulties
url https://doi.org/10.1111/2041-210X.13623
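
The abstract describes an item response model in terms of participant ability, task difficulty, discriminatory power and guessing behaviour. As a rough illustration only, the sketch below uses the standard three-parameter logistic (3PL) response curve, which is an assumption here: the record does not give the authors' exact formulation, their model additionally accounts for spatial autocorrelation, and their implementation is in R and Stan. The function name, parameter names and numeric values are hypothetical.

```python
import math


def three_pl_probability(ability, difficulty, discrimination=1.0, guessing=0.0):
    """Probability of a correct classification under a standard 3PL curve.

    ability        -- latent skill of the participant (hypothetical scale)
    difficulty     -- latent difficulty of the image or task
    discrimination -- how sharply the item separates skill levels
    guessing       -- probability of a correct answer by chance alone
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    return guessing + (1.0 - guessing) * logistic


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for ability in (-2.0, 0.0, 2.0):
        p = three_pl_probability(ability, difficulty=0.0,
                                 discrimination=1.5, guessing=0.2)
        print(f"ability={ability:+.1f} -> P(correct)={p:.3f}")
```

When ability equals difficulty and there is no guessing, the curve gives a probability of 0.5; a non-zero guessing parameter raises the floor, so even a participant with very low ability is correct at roughly the chance rate.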