Identifying rare variants inconsistent with identity‐by‐descent in population‐scale whole‐genome sequencing data

Abstract: Analyses of genetic variation typically assume that rare variants within a population are identical‐by‐descent (IBD), that is, inherited from a single common ancestral event. However, there are genetic and technical processes through which rare variants in population genetic data may deviate from this...

Bibliographic Details
Main Authors: Kelsey E. Johnson, Christopher J. Adams, Benjamin F. Voight
Format: Article
Language: English
Published: Wiley 2022-11-01
Series: Methods in Ecology and Evolution
Subjects: population genetics; bioinformatics; molecular evolution; evolutionary biology; Bayesian methods
Online Access: https://doi.org/10.1111/2041-210X.13991
collection DOAJ
description Abstract: Analyses of genetic variation typically assume that rare variants within a population are identical‐by‐descent (IBD), that is, inherited from a single common ancestral event. However, there are genetic and technical processes through which rare variants in population genetic data may deviate from this simple evolutionary model, including recurrent mutations, gene conversions and genotyping error. All these processes can shorten the shared background haplotype surrounding a rare variant relative to the length expected if that variant had been inherited from a single event in a common ancestor. No method exists to computationally infer rare variants inconsistent with this simple model (denoted here as ‘IBD‐inconsistent’) using unphased population sequencing data. We hypothesized that the difference in shared haplotype background length can distinguish variants consistent and inconsistent with this simple IBD transmission model in population sequencing data without pedigree information. We implemented a Bayesian hierarchical model and used Gibbs sampling to estimate the posterior probability of IBD state for rare variants, and used simulated recurrent mutations to demonstrate that our approach accurately distinguishes rare variants consistent and inconsistent with a simple IBD inheritance model. Applying our method to whole‐genome sequencing data from 3,621 human individuals in the UK10K consortium, we found that IBD‐inconsistent variants correlated with higher local mutation rates and with genomic features such as replication timing. Using a heuristic to categorize IBD‐inconsistent variants as gene conversions, we found that potential gene conversions had expected properties such as elevated local GC content. By identifying IBD‐inconsistent variants, we can better understand the spectrum of recent mutations in human populations, a source of genetic variation driving evolution and a key factor in understanding recent demographic history.
id doaj-art-8925a0bbb2564a9689b883cf6ebfd0cb
institution DOAJ
issn 2041-210X
volume 13
issue 11
pages 2429-2442
doi 10.1111/2041-210X.13991
author_affiliation Kelsey E. Johnson: Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
author_affiliation Christopher J. Adams: Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
author_affiliation Benjamin F. Voight: Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
topic population genetics
bioinformatics
molecular evolution
evolutionary biology
Bayesian methods