Conformal taxonomic validation: A semi-automated validation framework for citizen science records
Citizen science records are a valuable source of marine biodiversity data, especially where standardized sampling campaigns are limited in spatial or temporal scope. However, such records often contain biases and errors and typically require expert validation before they can reliably support scienti...
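| Main Authors: | Matthieu de Castelbajac, Sandra Bringay, Arnaud Sallaberry, Maximilien Servajean, Clémence Epinoux, Juan Carlos Molinero, Delphine Bonnet |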
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier 2025-12-01 |
| Series: | Ecological Informatics |
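| Subjects: | Citizen science; Deep-learning; Species identification; Hierarchical classification; Conformal prediction |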
| Online Access: | http://www.sciencedirect.com/science/article/pii/S1574954125002997 |
| _version_ | 1849233618892750848 |
|---|---|
| author | Matthieu de Castelbajac, Sandra Bringay, Arnaud Sallaberry, Maximilien Servajean, Clémence Epinoux, Juan Carlos Molinero, Delphine Bonnet |
| author_sort | Matthieu de Castelbajac |
| collection | DOAJ |
| description | Citizen science records are a valuable source of marine biodiversity data, especially where standardized sampling campaigns are limited in spatial or temporal scope. However, such records often contain biases and errors and typically require expert validation before they can reliably support scientific research. Validating large volumes of citizen science data remains an important challenge. In this paper, we present a semi-automated validation framework that combines a deep learning classifier with conformal prediction to generate sets of plausible taxonomic labels at multiple ranks, while providing rigorous control over prediction confidence. Extensive evaluation was carried out using 25,000 jellyfish records, both with and without prior validation, as well as against 800 expert-validated entries. Our results show that the method frequently produces singleton prediction sets that can be accepted automatically, offering a high-confidence and scalable solution for validating marine citizen science data. |
| format | Article |
| id | doaj-art-e3230fa8d25f4be3b7157a5d5b06985f |
| institution | Kabale University |
| issn | 1574-9541 |
| language | English |
| publishDate | 2025-12-01 |
| publisher | Elsevier |
| record_format | Article |
| series | Ecological Informatics |
| spelling | doaj-art-e3230fa8d25f4be3b7157a5d5b06985f; 2025-08-20T05:05:35Z; eng; Elsevier; Ecological Informatics; ISSN 1574-9541; 2025-12-01; vol. 90; article 103290; DOI 10.1016/j.ecoinf.2025.103290. Title: Conformal taxonomic validation: A semi-automated validation framework for citizen science records. Authors: Matthieu de Castelbajac [0], Sandra Bringay [1], Arnaud Sallaberry [2], Maximilien Servajean [3], Clémence Epinoux [4], Juan Carlos Molinero [5], Delphine Bonnet [6]. Affiliations: [0] LIRMM, Univ. Montpellier, CNRS, 161 Rue Ada, Montpellier, 34095, France; Corresponding author. [1] LIRMM, Univ. Montpellier, CNRS, 161 Rue Ada, Montpellier, 34095, France; AMIS, Univ. Montpellier Paul-Valéry, Route de Mende, Montpellier, 34095, France. [2] LIRMM, Univ. Montpellier, CNRS, 161 Rue Ada, Montpellier, 34095, France; AMIS, Univ. Montpellier Paul-Valéry, Route de Mende, Montpellier, 34095, France. [3] LIRMM, Univ. Montpellier, CNRS, 161 Rue Ada, Montpellier, 34095, France; AMIS, Univ. Montpellier Paul-Valéry, Route de Mende, Montpellier, 34095, France. [4] MARBEC, Univ. Montpellier, Place Eugene Bataillon, Montpellier, 34090, France. [5] MARBEC, Univ. Montpellier, Place Eugene Bataillon, Montpellier, 34090, France. [6] MARBEC, Univ. Montpellier, Place Eugene Bataillon, Montpellier, 34090, France. Abstract: Citizen science records are a valuable source of marine biodiversity data, especially where standardized sampling campaigns are limited in spatial or temporal scope. However, such records often contain biases and errors and typically require expert validation before they can reliably support scientific research. Validating large volumes of citizen science data remains an important challenge. In this paper, we present a semi-automated validation framework that combines a deep learning classifier with conformal prediction to generate sets of plausible taxonomic labels at multiple ranks, while providing rigorous control over prediction confidence. Extensive evaluation was carried out using 25,000 jellyfish records, both with and without prior validation, as well as against 800 expert-validated entries. Our results show that the method frequently produces singleton prediction sets that can be accepted automatically, offering a high-confidence and scalable solution for validating marine citizen science data. URL: http://www.sciencedirect.com/science/article/pii/S1574954125002997. Keywords: Citizen science; Deep-learning; Species identification; Hierarchical classification; Conformal prediction |
| title | Conformal taxonomic validation: A semi-automated validation framework for citizen science records |
| topic | Citizen science; Deep-learning; Species identification; Hierarchical classification; Conformal prediction |
| url | http://www.sciencedirect.com/science/article/pii/S1574954125002997 |
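
The description above outlines the approach only in broad terms: a classifier's outputs are passed through conformal prediction to yield sets of plausible labels, and singleton sets can be accepted automatically. As a rough illustration of that idea, the following is a minimal sketch of standard split conformal prediction for a single taxonomic rank. It is not the authors' implementation: the score function (one minus the predicted probability of the true label), the function names `calibrate_threshold` and `triage`, and the use of `-1` to mark records sent to expert review are all assumptions made here for illustration. The record does not say which score or calibration procedure the paper uses, nor how the multiple ranks are combined.

```python
import numpy as np


def calibrate_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold for the (assumed) score 1 - p(true label).

    cal_probs: (n, n_classes) classifier probabilities on held-out records.
    cal_labels: (n,) integer indices of the expert-validated labels.
    alpha: target miscoverage rate (0.1 aims for about 90% coverage).
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank of the calibration quantile.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration records for this alpha: include every label
        # (scores never exceed 1, so a threshold of 1 keeps them all).
        return 1.0
    return float(np.sort(scores)[k - 1])


def triage(probs, qhat):
    """Build prediction sets and split records into auto-accept vs. review.

    Returns (labels, needs_review): labels holds the accepted class index
    for singleton sets and -1 (hypothetical marker) otherwise.
    """
    probs = np.asarray(probs, dtype=float)
    in_set = (1.0 - probs) <= qhat          # boolean mask, one row per record
    singleton = in_set.sum(axis=1) == 1
    labels = np.where(singleton, in_set.argmax(axis=1), -1)
    return labels, ~singleton
```

In this sketch a record is accepted automatically only when exactly one label survives the threshold; empty sets and sets with several labels are both routed to review. One plausible way to cover several ranks would be to run the same calibration separately per rank, but that is an assumption, not something the record states.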