Identifying rare variants inconsistent with identity‐by‐descent in population‐scale whole‐genome sequencing data
Abstract: Analyses of genetic variation typically assume that rare variants within a population are inherited from a single common ancestral event, that is, through identity‐by‐descent (IBD). However, there are genetic and technical processes through which rare variants in population genetic data may deviate from this...
| Main Authors: | Kelsey E. Johnson, Christopher J. Adams, Benjamin F. Voight |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley 2022-11-01 |
| Series: | Methods in Ecology and Evolution |
| Subjects: | population genetics; bioinformatics; molecular evolution; evolutionary biology; Bayesian methods |
| Online Access: | https://doi.org/10.1111/2041-210X.13991 |
| _version_ | 1849771906460286976 |
|---|---|
| author | Kelsey E. Johnson; Christopher J. Adams; Benjamin F. Voight |
| author_facet | Kelsey E. Johnson; Christopher J. Adams; Benjamin F. Voight |
| author_sort | Kelsey E. Johnson |
| collection | DOAJ |
| description | Abstract: Analyses of genetic variation typically assume that rare variants within a population are inherited from a single common ancestral event, that is, through identity‐by‐descent (IBD). However, there are genetic and technical processes through which rare variants in population genetic data may deviate from this simple evolutionary model, including recurrent mutations, gene conversions and genotyping error. All these processes can shorten the shared background haplotype surrounding a rare variant relative to the length expected if that variant was inherited from a single event descending from a common ancestor. No method exists to computationally infer rare variants inconsistent with this simple model (denoted here as ‘IBD‐inconsistent’) using unphased population sequencing data. We hypothesized that the difference in shared haplotype background length can distinguish variants consistent and inconsistent with this simple IBD transmission model in population sequencing data without pedigree information. We implemented a Bayesian hierarchical model and used Gibbs sampling to estimate the posterior probability of IBD state for rare variants. Using simulated recurrent mutations, we demonstrate that our approach accurately distinguishes rare variants consistent and inconsistent with a simple IBD inheritance model. Applying our method to whole‐genome sequencing data from 3,621 human individuals in the UK10K consortium, we found that IBD‐inconsistent variants correlated with higher local mutation rates and genomic features like replication timing. Using a heuristic to categorize IBD‐inconsistent variants as gene conversions, we found that potential gene conversions had expected properties such as enriched local GC content. By identifying IBD‐inconsistent variants, we can better understand the spectrum of recent mutations in human populations, a source of genetic variation driving evolution and a key factor in understanding recent demographic history. |
| format | Article |
| id | doaj-art-8925a0bbb2564a9689b883cf6ebfd0cb |
| institution | DOAJ |
| issn | 2041-210X |
| language | English |
| publishDate | 2022-11-01 |
| publisher | Wiley |
| record_format | Article |
| series | Methods in Ecology and Evolution |
| spelling | doaj-art-8925a0bbb2564a9689b883cf6ebfd0cb; 2025-08-20T03:02:28Z; eng; Wiley; Methods in Ecology and Evolution; 2041-210X; 2022-11-01; 13(11): 2429-2442; 10.1111/2041-210X.13991; Identifying rare variants inconsistent with identity‐by‐descent in population‐scale whole‐genome sequencing data; Kelsey E. Johnson (0), Christopher J. Adams (1), Benjamin F. Voight (2); (0) Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; (1) Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; (2) Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA; https://doi.org/10.1111/2041-210X.13991; population genetics; bioinformatics; molecular evolution; evolutionary biology; Bayesian methods |
| spellingShingle | Kelsey E. Johnson; Christopher J. Adams; Benjamin F. Voight; Identifying rare variants inconsistent with identity‐by‐descent in population‐scale whole‐genome sequencing data; Methods in Ecology and Evolution; population genetics; bioinformatics; molecular evolution; evolutionary biology; Bayesian methods |
| title | Identifying rare variants inconsistent with identity‐by‐descent in population‐scale whole‐genome sequencing data |
| title_sort | identifying rare variants inconsistent with identity by descent in population scale whole genome sequencing data |
| topic | population genetics; bioinformatics; molecular evolution; evolutionary biology; Bayesian methods |
| url | https://doi.org/10.1111/2041-210X.13991 |
| work_keys_str_mv | AT kelseyejohnson identifyingrarevariantsinconsistentwithidentitybydescentinpopulationscalewholegenomesequencingdata AT christopherjadams identifyingrarevariantsinconsistentwithidentitybydescentinpopulationscalewholegenomesequencingdata AT benjaminfvoight identifyingrarevariantsinconsistentwithidentitybydescentinpopulationscalewholegenomesequencingdata |
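
The description above mentions estimating the posterior probability of IBD state with a Bayesian hierarchical model and Gibbs sampling. The record does not give the model itself, so the following is only a toy sketch of the general idea and not the authors' method. It assumes a two-component exponential mixture over shared haplotype lengths, with component 1 (the higher rate, hence shorter haplotypes) standing in for "IBD‐inconsistent". The function name `gibbs_two_exponentials`, the Gamma(1, 1) and Beta(1, 1) priors, and the exponential likelihood are all illustrative assumptions.

```python
import numpy as np

def gibbs_two_exponentials(lengths, n_iter=2000, burn_in=500, seed=0):
    """Toy Gibbs sampler; returns, per variant, the estimated posterior
    probability of belonging to the short-haplotype component."""
    rng = np.random.default_rng(seed)
    x = np.asarray(lengths, dtype=float)
    n = x.size
    a, b = 1.0, 1.0                       # assumed Gamma(shape, rate) prior on rates
    z = (x < np.median(x)).astype(int)    # initial labels: short -> component 1
    post = np.zeros(n)
    for it in range(n_iter):
        # Conjugate update of each component's exponential rate.
        rates = np.empty(2)
        for k in (0, 1):
            mask = z == k
            rates[k] = rng.gamma(a + mask.sum(), 1.0 / (b + x[mask].sum()))
        # Relabel so that component 1 always has the higher rate.
        if rates[1] < rates[0]:
            rates = rates[::-1]
            z = 1 - z
        # Conjugate update of the mixing weight under a Beta(1, 1) prior.
        pi = rng.beta(1 + z.sum(), 1 + n - z.sum())
        # Sample each label from its conditional posterior.
        log_p1 = np.log(pi) + np.log(rates[1]) - rates[1] * x
        log_p0 = np.log1p(-pi) + np.log(rates[0]) - rates[0] * x
        p1 = 1.0 / (1.0 + np.exp(np.clip(log_p0 - log_p1, -700, 700)))
        z = (rng.random(n) < p1).astype(int)
        if it >= burn_in:
            post += z
    return post / (n_iter - burn_in)
```

Calling `gibbs_two_exponentials(lengths)` on an array of haplotype lengths returns one value in [0, 1] per variant. Because two exponential components overlap heavily near zero, these values are best read as a soft ranking, not as calls.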