Understanding the reliability of citizen science observational data using item response models

Abstract Citizen science projects have become increasingly popular in many fields, including ecology. However, the quality of this information is frequently debated within the scientific community. Modern citizen science implementations therefore require measures of the users' proficiency. We i...

Bibliographic Details
Main Authors:	Edgar Santos‐Fernandez, Kerrie Mengersen
Format:	Article
Language:	English
Published:	Wiley 2021-08-01
Series:	Methods in Ecology and Evolution
Subjects:	ability estimation big data item response theory latent variable regression spatial model species difficulties
Online Access:	https://doi.org/10.1111/2041-210X.13623

author	Edgar Santos‐Fernandez, Kerrie Mengersen
collection	DOAJ
description	Abstract Citizen science projects have become increasingly popular in many fields, including ecology. However, the quality of this information is frequently debated within the scientific community. Modern citizen science implementations therefore require measures of the users' proficiency. We introduce a new methodological framework of item response that quantifies a citizen scientist's ability, taking into account the difficulty of the task. We focus on citizen science programs involving the classification of images. Our approach accommodates spatial autocorrelation within the item difficulties, and provides deeper insights and relevant ecological measures of species and site‐related difficulties, discriminatory power and guessing behaviour. The identification of very capable versus less skilled participants can facilitate selective use of data in analyses and more targeted training programs for citizen scientists. This paper also addresses challenges in fitting such models to very large datasets. We found that the suggested methods outperform the traditional item response models in terms of RMSE, accuracy and WAIC, based on leave‐one‐out cross‐validation on simulated and empirical data. We present a comprehensive implementation using a case study of species identification in the Serengeti, Tanzania. The R and Stan codes are provided for full reproducibility. Multiple statistical illustrations and visualizations are given, which allow extrapolation to a wide range of citizen science ecological problems.
format	Article
id	doaj-art-47f620e814ed40da87920c30fa218d91
institution	Kabale University
issn	2041-210X
language	English
publishDate	2021-08-01
publisher	Wiley
record_format	Article
series	Methods in Ecology and Evolution
spelling	doaj-art-47f620e814ed40da87920c30fa218d91; 2025-02-07T06:21:06Z; eng; Wiley; Methods in Ecology and Evolution; 2041-210X; 2021-08-01; vol. 12, no. 8, pp. 1533-1548; 10.1111/2041-210X.13623; Understanding the reliability of citizen science observational data using item response models; Edgar Santos‐Fernandez and Kerrie Mengersen (both: School of Mathematical Sciences, Queensland University of Technology, Brisbane, Qld, Australia)

Similar Items