Live research dialogue on the benefits, costs and utility of synthetic data for researchers

Bibliographic Details
Main Authors: Emily Oliver, Fiona Lugg-Widger, Maureen Haaker, Cristina Magder, Emma Gordon
Format: Article
Language: English
Published: Swansea University, 2024-11-01
Series: International Journal of Population Data Science
Online Access:https://ijpds.org/article/view/2942

Objectives
• To explore use cases and characteristics of synthetic data that make it useful for research.
• To discuss governance frameworks essential for the routine creation, dissemination, and use of synthetic data.
• Drawing on the experience of international participants, to explore measures to mitigate risks to data privacy and public perception relating to synthetic data creation and use whilst maximising its utility.

Approach
Presenters provided context and insights on definitions and interpretations of synthetic data, including debates about fidelity and disclosure risk; known benefits and use cases of synthetic data; and its costs and challenges from the perspective of data owners, data providers and the public. This informed discussions in breakout groups aligned to the objectives. Participants self-identified as data service providers/processors, academic researchers, data owners, and/or ‘other’.

Results
• Characteristics identified as important across a variety of use cases included accessibility, structure and format matching the real data, and documentation.
• The most popular measure to mitigate risks was clear and detailed documentation (metadata, codebooks, user guides, limitations, creation methods, etc.).
• Training was identified as an important benefit and use case of synthetic data across the participant groups.
• The most popular challenge identified by data service providers/processors and ‘other’ participants was the lack of governance and standards; for researchers, it was verifying the synthetic data and the potential for it to be misinterpreted.

Conclusions
There is demand across stakeholder groups for synthetic data for a range of uses. Synthetic datasets should be accompanied by clear and detailed documentation and provided within an agreed governance framework.

ISSN: 2399-4908
DOI: 10.23889/ijpds.v9i5.2942
Volume/Issue: 9(5)
Affiliations: Emily Oliver (ADR UK), Fiona Lugg-Widger (Cardiff University), Maureen Haaker (University of Essex), Cristina Magder (University of Essex), Emma Gordon (ADR UK)