A GDPR-compliant solution for analysis of large-scale genomics datasets on HPC cloud infrastructure
Abstract: This paper outlines the technical and organizational measures implemented by the Italian supercomputing center, CINECA, to efficiently collect, process, and store sensitive omics data in compliance with GDPR regulations. Indeed, the explosion of High Throughput Sequencing in medicine has ra...
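Main Authors: | Silvia Gioiosa, Beatrice Chiavarini, Mattia D’Antonio, Giuseppe Trotta, Balasubramanian Chandramouli, Juan Mata Naranjo, Giuseppa Muscianisi, Mirko Cestari, Elisa Rossi |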
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2025-02-01 |
Series: | Journal of Big Data |
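Subjects: | Cloud-computing, Bioinformatics analyses, High-throughput sequencing analysis, GDPR-compliant, Cloud-secure environment, Sensitive data |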
Online Access: | https://doi.org/10.1186/s40537-024-01047-9 |
_version_ | 1823861945017565184 |
---|---|
author | Silvia Gioiosa Beatrice Chiavarini Mattia D’Antonio Giuseppe Trotta Balasubramanian Chandramouli Juan Mata Naranjo Giuseppa Muscianisi Mirko Cestari Elisa Rossi |
author_sort | Silvia Gioiosa |
collection | DOAJ |
description | Abstract: This paper outlines the technical and organizational measures implemented by the Italian supercomputing center, CINECA, to efficiently collect, process, and store sensitive omics data in compliance with GDPR regulations. Indeed, the explosion of High Throughput Sequencing in medicine has raised tremendous opportunities for large-scale genomic data analysis. Cohort studies involving the processing of hundreds or thousands of input samples, combined with the integration of diverse diagnostic data, enable researchers to conduct integrative analyses at an unprecedented level of detail that would have been impossible to achieve through single-sample studies. To analyse such an amount of data, centres that have access to High Performance Computing or extensive cloud resources have become crucial for both storage and efficient execution of data analysis pipelines. Nevertheless, since genomic data are considered sensitive personal data according to the EU General Data Protection Regulation, computational centres with high resource capabilities must prioritize data security and protection. This solution has been successfully applied to the Network for Italian Genomes use-case, demonstrating scalability to other hospitals and universities involved in research projects dealing with sensitive genomic data. |
format | Article |
id | doaj-art-20f5ad0ffaff405cb659e3dbf24aee5f |
institution | Kabale University |
issn | 2196-1115 |
language | English |
publishDate | 2025-02-01 |
publisher | SpringerOpen |
record_format | Article |
series | Journal of Big Data |
spelling | doaj-art-20f5ad0ffaff405cb659e3dbf24aee5f; 2025-02-09T12:41:18Z; eng; SpringerOpen; Journal of Big Data; 2196-1115; 2025-02-01; vol. 12, no. 1, pp. 1-16; 10.1186/s40537-024-01047-9; A GDPR-compliant solution for analysis of large-scale genomics datasets on HPC cloud infrastructure; Silvia Gioiosa, Beatrice Chiavarini, Mattia D’Antonio, Giuseppe Trotta, Balasubramanian Chandramouli, Juan Mata Naranjo, Giuseppa Muscianisi, Mirko Cestari, Elisa Rossi (all: HPC High Performance Computing Department, CINECA); Abstract: This paper outlines the technical and organizational measures implemented by the Italian supercomputing center, CINECA, to efficiently collect, process, and store sensitive omics data in compliance with GDPR regulations. Indeed, the explosion of High Throughput Sequencing in medicine has raised tremendous opportunities for large-scale genomic data analysis. Cohort studies involving the processing of hundreds or thousands of input samples, combined with the integration of diverse diagnostic data, enable researchers to conduct integrative analyses at an unprecedented level of detail that would have been impossible to achieve through single-sample studies. To analyse such an amount of data, centres that have access to High Performance Computing or extensive cloud resources have become crucial for both storage and efficient execution of data analysis pipelines. Nevertheless, since genomic data are considered sensitive personal data according to the EU General Data Protection Regulation, computational centres with high resource capabilities must prioritize data security and protection. This solution has been successfully applied to the Network for Italian Genomes use-case, demonstrating scalability to other hospitals and universities involved in research projects dealing with sensitive genomic data.; https://doi.org/10.1186/s40537-024-01047-9; Cloud-computing; Bioinformatics analyses; High-throughput sequencing analysis; GDPR-compliant; Cloud-secure environment; Sensitive data |
title | A GDPR-compliant solution for analysis of large-scale genomics datasets on HPC cloud infrastructure |
topic | Cloud-computing Bioinformatics analyses High-throughput sequencing analysis GDPR-compliant Cloud-secure environment Sensitive data |
url | https://doi.org/10.1186/s40537-024-01047-9 |