A Protein Domain and Family Based Approach to Rare Variant Association Analysis.

Bibliographic Details
Main Authors: Tom G Richardson, Hashem A Shihab, Manuel A Rivas, Mark I McCarthy, Colin Campbell, Nicholas J Timpson, Tom R Gaunt
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2016-01-01
Series: PLoS ONE
Online Access: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0153803&type=printable
collection DOAJ
description Background: It has become common practice to analyse large-scale sequencing data with statistical approaches that aggregate rare variants within the same gene. We applied a novel approach to rare variant analysis by collapsing variants together using protein domain and family coordinates, regarded as a more discrete definition of a biologically functional unit.
Methods: Using Pfam definitions, we collapsed rare variants (minor allele frequency ≤ 1%) together in three different ways: 1) variants within single genomic regions which map to individual protein domains; 2) variants within two individual protein domain regions which are predicted to be responsible for a protein-protein interaction; 3) all variants within combined regions from multiple genes responsible for coding the same protein domain (i.e. protein families). A conventional collapsing analysis using gene coordinates was also undertaken for comparison. We used UK10K sequence data and investigated associations between regions of variants and lipid traits using the sequence kernel association test (SKAT).
Results: We observed no strong evidence of association between regions of variants based on Pfam domain definitions and lipid traits. Quantile-quantile plots illustrated that the overall distributions of p-values from the protein domain analyses were comparable to those of a conventional gene-based approach. Deviations from this distribution suggested that collapsing by either protein domain or gene definitions may be favourable depending on the trait analysed.
Conclusion: We have collapsed rare variants together using protein domain and family coordinates to present an alternative to collapsing across conventionally used gene-based regions. Although no strong evidence of association was detected in these analyses, future studies may still find value in adopting these approaches to detect previously unidentified association signals.
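As a rough illustration of the collapsing step described in the Methods, the following Python sketch groups rare variants by the regions that contain them. It is not the authors' pipeline: the `Variant` and `Region` types, the function name `collapse_rare_variants`, and the coordinate conventions are hypothetical, and the association test itself (SKAT) is not implemented here.

```python
from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int      # 1-based genomic position (assumed convention)
    maf: float    # minor allele frequency, as a proportion


class Region(NamedTuple):
    name: str     # hypothetical label, e.g. a domain identifier or gene symbol
    chrom: str
    start: int    # 1-based, inclusive (assumed convention)
    end: int      # 1-based, inclusive (assumed convention)


def collapse_rare_variants(variants, regions, maf_threshold=0.01):
    """Group rare variants (MAF <= maf_threshold) by region name.

    Regions that share a name are pooled into one set, so listing the
    same domain label for regions in several genes gives a family-level
    set, while unique labels give one set per domain (or per gene).
    """
    sets = defaultdict(list)
    for variant in variants:
        if variant.maf > maf_threshold:
            continue  # skip common variants
        for region in regions:
            same_chrom = variant.chrom == region.chrom
            if same_chrom and region.start <= variant.pos <= region.end:
                sets[region.name].append(variant)
    return dict(sets)
```

Each resulting variant set would then be passed to a region-based association test. Swapping the region list (domain coordinates, paired interacting domains, pooled family coordinates, or gene coordinates) changes the unit of aggregation without changing the collapsing logic.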
id doaj-art-2b7625b8d4c4434fb6470359f45a3f89
institution Kabale University
issn 1932-6203
spelling doaj-art-2b7625b8d4c4434fb6470359f45a3f89 | 2025-08-20T03:24:30Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2016-01-01 | vol. 11, no. 4 | e0153803 | 10.1371/journal.pone.0153803 | A Protein Domain and Family Based Approach to Rare Variant Association Analysis. | Tom G Richardson; Hashem A Shihab; Manuel A Rivas; Mark I McCarthy; Colin Campbell; Nicholas J Timpson; Tom R Gaunt | https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0153803&type=printable