Semi-automated data provenance tracking for transparent data production and linkage to enhance auditing and quality assurance in Trusted Research Environments

Bibliographic Details
Main Authors: Katherine O'Sullivan, Milan Markovic, Jaroslaw Dymiter, Bernhard Scheliga, Chinasa Odo, Katie Wilde
Format: Article
Language: English
Published: Swansea University 2025-02-01
Series:International Journal of Population Data Science
Online Access:https://ijpds.org/article/view/2464
collection DOAJ
description
Introduction: We present a prototype solution that improves the transparency and quality assurance of the data linkage process through data provenance tracking. It is designed to help Data Analysts, researchers and information governance teams authenticate and audit data workflows within a Trusted Research Environment (TRE).
Methods: Using a participatory design process with Data Analysts, researchers and information governance teams, we undertook a contextual inquiry, user requirements interviews, co-design workshops and low-fidelity prototype evaluations. Public Engagement and Involvement activities underpinned these methods to ensure that the project and its approaches merited the public's trust in semi-automated data processing. These activities informed the technical implementation: we applied the PROV-O ontology to create a derived ontology, following the four-step Linked Open Terms methodology, and developed automated scripts to collect provenance information for the data processing workflow.
Results: The resulting Provenance Explorer for Trusted Research Environments (PE-TRE) is an interactive tool that displays data linkage information extracted from a knowledge graph, described using the derived SHP ontology, together with the results of rule-based validation checks. User evaluations confirmed that PE-TRE would contribute to better-quality data linkage and fewer data processing errors.
Conclusion: By semi-automating and systematising data tracking in a single tool across the data processing lifecycle, this project represents the next stage in advancing transparency, openness and quality assurance within TREs.
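The abstract describes provenance captured with a PROV-O-derived ontology, stored as a knowledge graph and checked with rule-based validation. The SHP ontology, the collection scripts and the actual rules are not given in this record, so the following is only a minimal sketch using core W3C PROV-O terms (`prov:Activity`, `prov:Entity`, `prov:Agent`, `prov:used`, `prov:wasGeneratedBy`, `prov:wasAssociatedWith`). The namespace `https://example.org/tre/`, the helpers `record_step`, `validate` and `to_ntriples`, the step and file names, and both validation rules are hypothetical illustrations, not PE-TRE's implementation.

```python
# Minimal sketch: record workflow steps as PROV-O triples and run simple
# rule-based checks over them. Only the PROV-O terms are real; the
# namespace, helper names, file names and rules are hypothetical.
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "https://example.org/tre/"  # hypothetical namespace, not the SHP ontology

Triple = tuple[str, str, str]


def record_step(graph: set[Triple], activity: str, agent: str,
                inputs: list[str], outputs: list[str]) -> None:
    """Add triples describing one processing step (a prov:Activity)."""
    act, ag = EX + activity, EX + agent
    graph.add((act, RDF_TYPE, PROV + "Activity"))
    graph.add((ag, RDF_TYPE, PROV + "Agent"))
    graph.add((act, PROV + "wasAssociatedWith", ag))
    for name in inputs:  # datasets the step read
        graph.add((EX + name, RDF_TYPE, PROV + "Entity"))
        graph.add((act, PROV + "used", EX + name))
    for name in outputs:  # datasets the step produced
        graph.add((EX + name, RDF_TYPE, PROV + "Entity"))
        graph.add((EX + name, PROV + "wasGeneratedBy", act))


def subjects_of_type(graph: set[Triple], cls: str) -> set[str]:
    """Return every subject declared as rdf:type prov:<cls>."""
    return {s for s, p, o in graph if p == RDF_TYPE and o == PROV + cls}


def validate(graph: set[Triple], sources: set[str]) -> list[str]:
    """Hypothetical rules: every non-source entity needs a generating
    activity, and every activity needs an associated agent."""
    problems = []
    generated = {s for s, p, _ in graph if p == PROV + "wasGeneratedBy"}
    for ent in sorted(subjects_of_type(graph, "Entity")):
        if ent not in generated and ent.removeprefix(EX) not in sources:
            problems.append(f"no generating activity: {ent.removeprefix(EX)}")
    associated = {s for s, p, _ in graph if p == PROV + "wasAssociatedWith"}
    for act in sorted(subjects_of_type(graph, "Activity")):
        if act not in associated:
            problems.append(f"no responsible agent: {act.removeprefix(EX)}")
    return problems


def to_ntriples(graph: set[Triple]) -> str:
    """Serialise the graph as N-Triples (every term here is an IRI)."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(graph))


if __name__ == "__main__":
    g: set[Triple] = set()
    record_step(g, "link_cohort", "analyst_1",
                inputs=["gp_extract.csv", "hospital_extract.csv"],
                outputs=["linked_cohort.csv"])
    # Declare an entity with no recorded origin; the first rule flags it.
    g.add((EX + "lab_extract.csv", RDF_TYPE, PROV + "Entity"))
    print(validate(g, sources={"gp_extract.csv", "hospital_extract.csv"}))
    # prints ['no generating activity: lab_extract.csv']
```

Plain Python tuples keep the sketch dependency-free; a real deployment would more plausibly hold the graph in an RDF store and express such checks as SPARQL queries or SHACL shapes.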
issn 2399-4908
Volume: 10
Issue: 2
DOI: 10.23889/ijpds.v10i2.2464
Affiliations:
Katherine O'Sullivan: University of Sheffield, Research and Innovation IT, 10-12 Brunswick St, Sheffield, S10 2FN
Milan Markovic (https://orcid.org/0000-0002-5477-287X): University of Aberdeen, Department of Natural and Computing Sciences, 32 Elphinstone Rd, Aberdeen AB24 3EU
Jaroslaw Dymiter: University of Aberdeen, Grampian Data Safe Haven, Health Sciences Building, Foresterhill, Aberdeen, AB25 2ZD
Bernhard Scheliga (https://orcid.org/0000-0003-2764-6605): University of Aberdeen, Grampian Data Safe Haven, Health Sciences Building, Foresterhill, Aberdeen, AB25 2ZD
Chinasa Odo (https://orcid.org/0000-0002-0770-0806): University of Bradford, Faculty of Health Studies, Bradford, BD7 1DP
Katie Wilde (https://orcid.org/0000-0001-5024-8846): University of Aberdeen, Grampian Data Safe Haven, Health Sciences Building, Foresterhill, Aberdeen, AB25 2ZD
Keywords: Data provenance; semi-automation; trusted research environments; secure data environments; safe haven; data linkage