Semi-automated data provenance tracking for transparent data production and linkage to enhance auditing and quality assurance in Trusted Research Environments
Main Authors: | Katherine O'Sullivan, Milan Markovic, Jaroslaw Dymiter, Bernhard Scheliga, Chinasa Odo, Katie Wilde |
---|---|
Format: | Article |
Language: | English |
Published: | Swansea University, 2025-02-01 |
Series: | International Journal of Population Data Science |
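Subjects: | Data provenance; semi-automation; trusted research environments; secure data environments; safe haven; data linkage |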
Online Access: | https://ijpds.org/article/view/2464 |
Field | Value |
---|---|
author | Katherine O'Sullivan, Milan Markovic, Jaroslaw Dymiter, Bernhard Scheliga, Chinasa Odo, Katie Wilde |
collection | DOAJ |
description |
Introduction
We present a prototype solution that uses data provenance tracking to improve the transparency and quality assurance of the data linkage process. It is designed to assist Data Analysts, researchers and information governance teams in authenticating and auditing data workflows within a Trusted Research Environment (TRE).
Methods
Using a participatory design process with Data Analysts, researchers and information governance teams, we undertook a contextual inquiry, user requirements interviews, co-design workshops and low-fidelity prototype evaluations. Public Engagement and Involvement activities underpinned these methods to ensure that the project and its approaches warranted public trust in semi-automated data processing. These activities informed the technical implementation: we applied the PROV-O ontology to create a derived ontology following the four-step Linked Open Terms methodology, and developed automated scripts to collect provenance information for the data processing workflow.
Results
The resulting interactive tool, the Provenance Explorer for Trusted Research Environments (PE-TRE), displays data linkage information extracted from a knowledge graph described using the derived SHP ontology, alongside the results of rule-based validation checks. User evaluations confirmed that PE-TRE would contribute to better-quality data linkage and fewer data processing errors.
Conclusion
This project demonstrates the next stage in advancing transparency, openness and quality assurance within TREs: semi-automating and systematising data tracking throughout the data processing lifecycle in a single tool.
|
id | doaj-art-52da628051ad49299472307ab2c537a3 |
institution | Kabale University |
issn | 2399-4908 |
record_timestamp | 2025-02-10T10:51:49Z |
volume | 10 |
issue | 2 |
doi | 10.23889/ijpds.v10i2.2464 |
author_details | Katherine O'Sullivan (University of Sheffield, Research and Innovation IT, 10-12 Brunswick St, Sheffield, S10 2FN) |
author_details | Milan Markovic, https://orcid.org/0000-0002-5477-287X (University of Aberdeen, Department of Natural and Computing Sciences, 32 Elphinstone Rd, Aberdeen AB24 3EU) |
author_details | Jaroslaw Dymiter (University of Aberdeen, Grampian Data Safe Haven, Health Sciences Building, Foresterhill, Aberdeen, AB25 2ZD) |
author_details | Bernhard Scheliga, https://orcid.org/0000-0003-2764-6605 (University of Aberdeen, Grampian Data Safe Haven, Health Sciences Building, Foresterhill, Aberdeen, AB25 2ZD) |
author_details | Chinasa Odo, https://orcid.org/0000-0002-0770-0806 (University of Bradford, Faculty of Health Studies, Bradford, BD7 1DP) |
author_details | Katie Wilde, https://orcid.org/0000-0001-5024-8846 (University of Aberdeen, Grampian Data Safe Haven, Health Sciences Building, Foresterhill, Aberdeen, AB25 2ZD) |
topic | Data provenance; semi-automation; trusted research environments; secure data environments; safe haven; data linkage |
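
The Methods and Results in the description above mention scripts that collect provenance using a PROV-O-derived ontology and rule-based validation checks over the resulting knowledge graph. This record does not include the PE-TRE code or the SHP ontology's terms, so the following is only a hypothetical sketch, not the authors' implementation. It assumes core PROV-O terms only (`prov:Activity`, `prov:Entity`, `prov:Agent`, `prov:used`, `prov:wasGeneratedBy`, `prov:wasDerivedFrom`, `prov:wasAssociatedWith`, `prov:endedAtTime`), stores triples as plain Python tuples rather than in an RDF store, and treats the `http://example.org/tre/` namespace, the function names `record_step` and `validate`, the dataset and step names, and the three validation rules as illustrative assumptions.

```python
# Hypothetical sketch: PROV-O-style provenance capture plus rule-based checks.
# Not PE-TRE code; the namespace, names and rules below are assumptions.
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/tre/"  # hypothetical namespace for illustration

Triple = tuple[str, str, str]


def record_step(graph: set[Triple], step: str, analyst: str,
                inputs: list[str], outputs: list[str]) -> None:
    """Add PROV-O triples describing one data processing step to `graph`."""
    act, agent = EX + step, EX + analyst
    graph.add((act, RDF_TYPE, PROV + "Activity"))
    graph.add((agent, RDF_TYPE, PROV + "Agent"))
    graph.add((act, PROV + "wasAssociatedWith", agent))
    # Stored as a plain string; a real RDF graph would use an xsd:dateTime literal.
    graph.add((act, PROV + "endedAtTime", datetime.now(timezone.utc).isoformat()))
    for src in inputs:
        graph.add((EX + src, RDF_TYPE, PROV + "Entity"))
        graph.add((act, PROV + "used", EX + src))
    for out in outputs:
        graph.add((EX + out, RDF_TYPE, PROV + "Entity"))
        graph.add((EX + out, PROV + "wasGeneratedBy", act))
        for src in inputs:  # each output is derived from every input of the step
            graph.add((EX + out, PROV + "wasDerivedFrom", EX + src))


def objects(graph: set[Triple], s: str, p: str) -> set[str]:
    """All objects o such that (s, p, o) is in the graph."""
    return {o for (s2, p2, o) in graph if s2 == s and p2 == p}


def instances(graph: set[Triple], cls: str) -> list[str]:
    """All subjects typed as the given PROV class, sorted for stable output."""
    return sorted(s for (s, p, o) in graph if p == RDF_TYPE and o == PROV + cls)


def validate(graph: set[Triple]) -> list[str]:
    """Apply hypothetical rule-based checks; an empty list means all pass."""
    problems = []
    for act in instances(graph, "Activity"):
        if not objects(graph, act, PROV + "wasAssociatedWith"):
            problems.append(f"{act}: no responsible agent")
    for ent in instances(graph, "Entity"):
        generators = objects(graph, ent, PROV + "wasGeneratedBy")
        if objects(graph, ent, PROV + "wasDerivedFrom") and not generators:
            problems.append(f"{ent}: derived but no generating activity")
        if len(generators) > 1:
            problems.append(f"{ent}: generated by more than one activity")
    return problems


if __name__ == "__main__":
    kg: set[Triple] = set()
    record_step(kg, "link_step_1", "analyst_a",
                inputs=["extract_a", "extract_b"], outputs=["linked_cohort"])
    print(validate(kg))  # []
    # A derived dataset added by hand, with no recorded step, should be flagged.
    kg.add((EX + "cohort_v2", RDF_TYPE, PROV + "Entity"))
    kg.add((EX + "cohort_v2", PROV + "wasDerivedFrom", EX + "linked_cohort"))
    print(validate(kg))  # ['http://example.org/tre/cohort_v2: derived but no generating activity']
```

The plain-tuple representation keeps the sketch dependency-free. A deployment over a real knowledge graph would more likely hold the triples in an RDF library or triple store and express checks like these as SPARQL queries or SHACL shapes.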