Sharing extended summary data from contemporary genetics studies is unlikely to threaten subject privacy.

Bibliographic Details
Main Author: Silviu-Alin Bacanu
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2017-01-01
Series: PLoS ONE
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0179504&type=printable
collection DOAJ
description <h4>Background</h4>
Starting from a forensic problem, Homer et al. showed that it is possible to detect whether an individual contributes as little as 0.5% of the DNA in a pool. The finding was then extended to show that one can detect whether a subject participated in a small, homogeneous GWAS. We denote this as the detection of a subject belonging to a certain cohort (SBCC). Subsequently, Visscher and Hill showed that the power to detect an SBCC signal in an ethnically homogeneous cohort depends roughly on the ratio of the number of independent markers to the total sample size. However, it is not clear whether the same holds for more ethnically diverse cohorts. Later, Masca et al. proposed, as an SBCC test, a regression of (i) the departure of the subject's genotype from the assumed population frequency on (ii) the departure of the cohort-of-interest frequency from the same population frequency. They used simulations to show that this approach has better SBCC detection power than the original Homer method but is impeded by population stratification.
<h4>Approach</h4>
To investigate the possibility of SBCC detection in multi-ethnic cohorts, we generalize the Masca et al. approach by theoretically deriving the correlation between a subject's genotype and the cohort reference allele frequencies (RAFs) for stratified cohorts. Based on the derived formula, we show theoretically that, due to background stratification noise, SBCC detection is unlikely even for mildly stratified cohorts of more than roughly a thousand subjects. Thus, for the vast majority of contemporary cohorts, the fear of compromising privacy via SBCC detection is unfounded.
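The regression idea summarized in the Background can be illustrated with a small simulation. The sketch below is not the paper's stratified derivation and not Masca et al.'s code. It assumes a single unstratified population in Hardy-Weinberg equilibrium, independent markers, and exactly known population RAFs. The function names (`draw_genotype`, `sbcc_z`, `simulate`) and all parameter values are hypothetical. Under these assumptions the correlation between a member's genotype departure and the cohort RAF departure is about 1/sqrt(N), so the z-score is about sqrt(M/N) for M markers and N subjects, in line with the marker-to-sample-size ratio attributed to Visscher and Hill.

```python
import math
import random


def draw_genotype(rng, p):
    """Reference-allele fraction (0, 0.5 or 1) for one marker under Hardy-Weinberg."""
    return ((rng.random() < p) + (rng.random() < p)) / 2


def sbcc_z(subject, cohort_raf, pop_raf):
    """Approximate z-score: correlation of subject and cohort departures from
    the population RAF across markers, scaled by sqrt(number of markers)."""
    x = [c - p for c, p in zip(cohort_raf, pop_raf)]  # cohort departure
    y = [s - p for s, p in zip(subject, pop_raf)]     # subject departure
    m = len(x)
    mean_x, mean_y = sum(x) / m, sum(y) / m
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt(m)


def simulate(n_subjects=100, n_markers=8000, seed=1):
    """Return (z for a cohort member, z for an outsider) in a toy homogeneous cohort."""
    rng = random.Random(seed)
    # Assumed known population RAFs (hypothetical values).
    pop_raf = [rng.uniform(0.1, 0.9) for _ in range(n_markers)]
    cohort = [[draw_genotype(rng, p) for p in pop_raf] for _ in range(n_subjects)]
    # Cohort RAF per marker: the kind of summary statistic a study might share.
    cohort_raf = [sum(col) / n_subjects for col in zip(*cohort)]
    member = cohort[0]
    outsider = [draw_genotype(rng, p) for p in pop_raf]
    return sbcc_z(member, cohort_raf, pop_raf), sbcc_z(outsider, cohort_raf, pop_raf)


if __name__ == "__main__":
    z_member, z_outsider = simulate()
    print(f"member z = {z_member:.2f}, outsider z = {z_outsider:.2f}")
```

Increasing `n_subjects` while holding `n_markers` fixed shrinks the member's expected z-score toward zero even in this idealized setting. The abstract's argument is that stratification noise in real multi-ethnic cohorts erodes the signal further.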
id doaj-art-0ed441a2e0fa4088b0c799e9a1e9681d
institution DOAJ
issn 1932-6203
citation Silviu-Alin Bacanu. Sharing extended summary data from contemporary genetics studies is unlikely to threaten subject privacy. PLoS ONE 12(6): e0179504 (2017). doi:10.1371/journal.pone.0179504