A GDPR-compliant solution for analysis of large-scale genomics datasets on HPC cloud infrastructure

Abstract This paper outlines the technical and organizational measures implemented by the Italian supercomputing center, CINECA, to efficiently collect, process, and store sensitive omics data in compliance with the GDPR. Indeed, the explosion of High Throughput Sequencing in medicine has ra...

Bibliographic Details
Main Authors: Silvia Gioiosa, Beatrice Chiavarini, Mattia D’Antonio, Giuseppe Trotta, Balasubramanian Chandramouli, Juan Mata Naranjo, Giuseppa Muscianisi, Mirko Cestari, Elisa Rossi
Format: Article
Language: English
Published: SpringerOpen 2025-02-01
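Series: Journal of Big Data
Subjects: Cloud-computing; Bioinformatics analyses; High-throughput sequencing analysis; GDPR-compliant; Cloud-secure environment; Sensitive data
Online Access: https://doi.org/10.1186/s40537-024-01047-9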
author Silvia Gioiosa
Beatrice Chiavarini
Mattia D’Antonio
Giuseppe Trotta
Balasubramanian Chandramouli
Juan Mata Naranjo
Giuseppa Muscianisi
Mirko Cestari
Elisa Rossi
author_sort Silvia Gioiosa
collection DOAJ
description Abstract This paper outlines the technical and organizational measures implemented by the Italian supercomputing center, CINECA, to efficiently collect, process, and store sensitive omics data in compliance with the GDPR. The explosion of High Throughput Sequencing in medicine has created tremendous opportunities for large-scale genomic data analysis. Cohort studies that process hundreds or thousands of input samples and integrate diverse diagnostic data enable researchers to conduct integrative analyses at an unprecedented level of detail, which would have been impossible to achieve through single-sample studies. To analyse such amounts of data, centres with access to High Performance Computing or extensive cloud resources have become crucial both for storage and for efficient execution of data analysis pipelines. Nevertheless, since genomic data are considered sensitive personal data under the EU General Data Protection Regulation, computational centres with high resource capabilities must prioritize data security and protection. This solution has been successfully applied to the Network for Italian Genomes use case, demonstrating scalability to other hospitals and universities involved in research projects dealing with sensitive genomic data.
format Article
id doaj-art-20f5ad0ffaff405cb659e3dbf24aee5f
institution Kabale University
issn 2196-1115
language English
publishDate 2025-02-01
publisher SpringerOpen
record_format Article
series Journal of Big Data
spelling doaj-art-20f5ad0ffaff405cb659e3dbf24aee5f; 2025-02-09T12:41:18Z; eng; SpringerOpen; Journal of Big Data; 2196-1115; 2025-02-01; 12 1 1 16; 10.1186/s40537-024-01047-9; A GDPR-compliant solution for analysis of large-scale genomics datasets on HPC cloud infrastructure; Silvia Gioiosa, Beatrice Chiavarini, Mattia D’Antonio, Giuseppe Trotta, Balasubramanian Chandramouli, Juan Mata Naranjo, Giuseppa Muscianisi, Mirko Cestari, Elisa Rossi (all authors: HPC High Performance Computing Department, CINECA); https://doi.org/10.1186/s40537-024-01047-9; Cloud-computing; Bioinformatics analyses; High-throughput sequencing analysis; GDPR-compliant; Cloud-secure environment; Sensitive data
title A GDPR-compliant solution for analysis of large-scale genomics datasets on HPC cloud infrastructure
topic Cloud-computing
Bioinformatics analyses
High-throughput sequencing analysis
GDPR-compliant
Cloud-secure environment
Sensitive data
url https://doi.org/10.1186/s40537-024-01047-9