Conformal taxonomic validation: A semi-automated validation framework for citizen science records

Citizen science records are a valuable source of marine biodiversity data, especially where standardized sampling campaigns are limited in spatial or temporal scope. However, such records often contain biases and errors and typically require expert validation before they can reliably support scienti...

Bibliographic Details
Main Authors: Matthieu de Castelbajac, Sandra Bringay, Arnaud Sallaberry, Maximilien Servajean, Clémence Epinoux, Juan Carlos Molinero, Delphine Bonnet
Format: Article
Language: English
Published: Elsevier 2025-12-01
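Series: Ecological Informatics
Subjects: Citizen science; Deep-learning; Species identification; Hierarchical classification; Conformal prediction
Online Access: http://www.sciencedirect.com/science/article/pii/S1574954125002997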
_version_ 1849233618892750848
author Matthieu de Castelbajac
Sandra Bringay
Arnaud Sallaberry
Maximilien Servajean
Clémence Epinoux
Juan Carlos Molinero
Delphine Bonnet
author_facet Matthieu de Castelbajac
Sandra Bringay
Arnaud Sallaberry
Maximilien Servajean
Clémence Epinoux
Juan Carlos Molinero
Delphine Bonnet
author_sort Matthieu de Castelbajac
collection DOAJ
description Citizen science records are a valuable source of marine biodiversity data, especially where standardized sampling campaigns are limited in spatial or temporal scope. However, such records often contain biases and errors and typically require expert validation before they can reliably support scientific research. Validating large volumes of citizen science data remains an important challenge. In this paper, we present a semi-automated validation framework that combines a deep learning classifier with conformal prediction to generate sets of plausible taxonomic labels at multiple ranks, while providing rigorous control over prediction confidence. Extensive evaluation was carried out using 25,000 jellyfish records, both with and without prior validation, as well as against 800 expert-validated entries. Our results show that the method frequently produces singleton prediction sets that can be accepted automatically, offering a high-confidence and scalable solution for validating marine citizen science data.
format Article
id doaj-art-e3230fa8d25f4be3b7157a5d5b06985f
institution Kabale University
issn 1574-9541
language English
publishDate 2025-12-01
publisher Elsevier
record_format Article
series Ecological Informatics
spelling doaj-art-e3230fa8d25f4be3b7157a5d5b06985f
2025-08-20T05:05:35Z
eng
Elsevier
Ecological Informatics
1574-9541
2025-12-01
90
103290
10.1016/j.ecoinf.2025.103290
Conformal taxonomic validation: A semi-automated validation framework for citizen science records
Matthieu de Castelbajac (LIRMM, Univ. Montpellier, CNRS, 161 Rue Ada, Montpellier, 34095, France; corresponding author)
Sandra Bringay (LIRMM, Univ. Montpellier, CNRS, 161 Rue Ada, Montpellier, 34095, France; AMIS, Univ. Montpellier Paul-Valéry, Route de Mende, Montpellier, 34095, France)
Arnaud Sallaberry (LIRMM, Univ. Montpellier, CNRS, 161 Rue Ada, Montpellier, 34095, France; AMIS, Univ. Montpellier Paul-Valéry, Route de Mende, Montpellier, 34095, France)
Maximilien Servajean (LIRMM, Univ. Montpellier, CNRS, 161 Rue Ada, Montpellier, 34095, France; AMIS, Univ. Montpellier Paul-Valéry, Route de Mende, Montpellier, 34095, France)
Clémence Epinoux (MARBEC, Univ. Montpellier, Place Eugene Bataillon, Montpellier, 34090, France)
Juan Carlos Molinero (MARBEC, Univ. Montpellier, Place Eugene Bataillon, Montpellier, 34090, France)
Delphine Bonnet (MARBEC, Univ. Montpellier, Place Eugene Bataillon, Montpellier, 34090, France)
Citizen science records are a valuable source of marine biodiversity data, especially where standardized sampling campaigns are limited in spatial or temporal scope. However, such records often contain biases and errors and typically require expert validation before they can reliably support scientific research. Validating large volumes of citizen science data remains an important challenge. In this paper, we present a semi-automated validation framework that combines a deep learning classifier with conformal prediction to generate sets of plausible taxonomic labels at multiple ranks, while providing rigorous control over prediction confidence. Extensive evaluation was carried out using 25,000 jellyfish records, both with and without prior validation, as well as against 800 expert-validated entries. Our results show that the method frequently produces singleton prediction sets that can be accepted automatically, offering a high-confidence and scalable solution for validating marine citizen science data.
http://www.sciencedirect.com/science/article/pii/S1574954125002997
Citizen science
Deep-learning
Species identification
Hierarchical classification
Conformal prediction
spellingShingle Matthieu de Castelbajac
Sandra Bringay
Arnaud Sallaberry
Maximilien Servajean
Clémence Epinoux
Juan Carlos Molinero
Delphine Bonnet
Conformal taxonomic validation: A semi-automated validation framework for citizen science records
Ecological Informatics
Citizen science
Deep-learning
Species identification
Hierarchical classification
Conformal prediction
title Conformal taxonomic validation: A semi-automated validation framework for citizen science records
topic Citizen science
Deep-learning
Species identification
Hierarchical classification
Conformal prediction
url http://www.sciencedirect.com/science/article/pii/S1574954125002997