A GDPR-compliant solution for analysis of large-scale genomics datasets on HPC cloud infrastructure

Bibliographic Details
Main Authors: Silvia Gioiosa, Beatrice Chiavarini, Mattia D’Antonio, Giuseppe Trotta, Balasubramanian Chandramouli, Juan Mata Naranjo, Giuseppa Muscianisi, Mirko Cestari, Elisa Rossi
Format: Article
Language: English
Published: SpringerOpen 2025-02-01
Series:Journal of Big Data
Online Access:https://doi.org/10.1186/s40537-024-01047-9
Description
Summary: This paper outlines the technical and organizational measures implemented by the Italian supercomputing center CINECA to efficiently collect, process, and store sensitive omics data in compliance with the GDPR. The explosion of High Throughput Sequencing in medicine has created tremendous opportunities for large-scale genomic data analysis. Cohort studies that process hundreds or thousands of input samples, combined with the integration of diverse diagnostic data, enable researchers to conduct integrative analyses at an unprecedented level of detail that would have been impossible to achieve through single-sample studies. To analyse such amounts of data, centres with access to High Performance Computing or extensive cloud resources have become crucial, both for storage and for the efficient execution of data analysis pipelines. Nevertheless, since genomic data are considered sensitive personal data under the EU General Data Protection Regulation, computational centres with high resource capabilities must prioritize data security and protection. The solution has been successfully applied to the Network for Italian Genomes use case, demonstrating that it can scale to other hospitals and universities involved in research projects dealing with sensitive genomic data.
ISSN: 2196-1115