Through the Citizen Scientists’ Eyes: Insights into Using Citizen Science with Machine Learning for Effective Identification of Unknown-Unknowns in Big Data

Bibliographic Details
Main Authors: Kameswara Bharadwaj Mantha, Hayley Roberts, Lucy Fortson, Chris Lintott, Hugh Dickinson, William Keel, Ramanakumar Sankar, Coleman Krawczyk, Brooke Simmons, Mike Walmsley, Izzy Garland, Jason Shingirai Makechemu, Laura Trouille, Clifford Johnson
Format: Article
Language: English
Published: Ubiquity Press 2024-12-01
Series: Citizen Science: Theory and Practice
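Subjects: deep learning; anomaly detection; human-machine optimization; unsupervised learning; astronomy imaging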
Online Access: https://account.theoryandpractice.citizenscienceassociation.org/index.php/up-j-cstp/article/view/740
Collection: DOAJ
ISSN: 2057-4991
Record ID: doaj-art-95caf762951d42e1b8b4aa3e85464744

Description: In the era of rapidly growing astronomical data, the gap between data collection and analysis is a significant barrier, especially for teams searching for rare scientific objects. Although machine learning (ML) can quickly parse large data sets, it struggles to robustly identify scientifically interesting objects, a task at which humans excel. Human-in-the-loop (HITL) strategies that combine the strengths of citizen science (CS) and ML offer a promising solution, but first, we need to better understand the relationship between human- and machine-identified samples. In this work, we present a case study from the Galaxy Zoo: Weird & Wonderful project, where volunteers inspected ~200,000 astronomical images, processed by an ML-based anomaly detection model, to identify those with unusual or interesting characteristics. Volunteer-selected images with common astrophysical characteristics had higher consensus, while rarer or more complex ones had lower consensus. This suggests that low-consensus choices should not be dismissed in further explorations. Additionally, volunteers were better at filtering out uninteresting anomalies, such as image artifacts, which the machine struggled with. We also found that a higher ML-generated anomaly score that indicates images’ low-level feature anomalousness was a better predictor of the volunteers’ consensus choice. Combining a locus of high volunteer-consensus images within the ML-learnt feature space and anomaly score, we demonstrated a decision boundary that can effectively isolate images with unusual and potentially scientifically interesting characteristics. Using this case study, we lay out important guidelines for future research studies looking to adapt and operationalize human-machine collaborative frameworks for efficient anomaly detection in big data.
DOI: 10.5334/cstp.740
Author affiliations and ORCID iDs:
Kameswara Bharadwaj Mantha, University of Minnesota-Twin Cities, https://orcid.org/0000-0002-6016-300X
Hayley Roberts, University of Minnesota-Twin Cities, https://orcid.org/0000-0003-0046-9848
Lucy Fortson, University of Minnesota-Twin Cities, https://orcid.org/0000-0002-1067-8558
Chris Lintott, University of Oxford, https://orcid.org/0000-0001-5578-359X
Hugh Dickinson, Open University, https://orcid.org/0000-0003-0475-008X
William Keel, University of Alabama, https://orcid.org/0000-0002-6131-9539
Ramanakumar Sankar, University of California Berkeley, https://orcid.org/0000-0002-6794-7587
Coleman Krawczyk, University of Portsmouth, https://orcid.org/0000-0001-9233-2341
Brooke Simmons, Lancaster University, https://orcid.org/0000-0001-5882-3323
Mike Walmsley, University of Toronto, https://orcid.org/0000-0002-6408-4181
Izzy Garland, Lancaster University, https://orcid.org/0000-0002-3887-6433
Jason Shingirai Makechemu, Lancaster University, https://orcid.org/0009-0009-6545-8710
Laura Trouille, Adler Planetarium
Clifford Johnson, Adler Planetarium, https://orcid.org/0000-0002-0511-6737