Data‐driven guidelines for phylogenomic analyses using SNP data

Bibliographic Details
Main Authors: Jacob S. Suissa, Gisel Y. De La Cerda, Leland C. Graber, Chloe Jelley, David Wickell, Heather R. Phillips, Ayress D. Grinage, Corrie S. Moreau, Chelsea D. Specht, Jeff J. Doyle, Jacob B. Landis
Format: Article
Language: English
Published: Wiley 2024-11-01
Series: Applications in Plant Sciences
Online Access: https://doi.org/10.1002/aps3.11611
Collection: DOAJ

Abstract

Premise: There is a general lack of consensus on best practices for filtering single‐nucleotide polymorphisms (SNPs), and on whether it is better to use SNPs alone or to include their flanking regions (the full “locus”) in phylogenomic analyses and subsequent comparative methods.

Methods: Using genotyping‐by‐sequencing data from 22 Glycine species, we assessed the effects of SNP vs. locus usage and of SNP retention stringency. We compared branch lengths, node support, and divergence time estimates across 16 datasets that differed in the amount of missing data and in total size.

Results: Our results revealed five aspects of phylogenomic data usage that may be generally applicable: (1) tree topology is largely congruent across analyses; (2) filtering strictly for SNP retention (e.g., 90–100%) reduces support and can alter some inferred relationships; (3) absolute branch lengths differ by two orders of magnitude between SNP and locus datasets; (4) data type and branch length variation have little effect on divergence time estimation; and (5) phylograms alter the estimation of ancestral states and rates of morphological evolution.

Discussion: Whether SNP or locus datasets are used does not significantly alter phylogenetic inference, unless researchers want or need to use absolute branch lengths. We recommend against excessive filtering thresholds for SNP retention, to reduce the risk of producing inconsistent topologies and low support.
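
Result (2) concerns SNP retention stringency. As a rough illustration only, the sketch below assumes that a retention threshold means the minimum fraction of taxa with a non-missing call at a site, and that missing data are coded as N, -, or ?. Both are assumptions, not the authors' definitions, and the function and taxon names are hypothetical; this is not the filtering pipeline used in the study. It shows how raising the threshold toward 100% shrinks the matrix.

```python
from typing import Dict, List

# Assumed missing-data codes; real pipelines may use other symbols.
MISSING = {"N", "-", "?"}


def filter_snps_by_retention(
    alignment: Dict[str, str], min_retention: float
) -> Dict[str, str]:
    """Keep sites where at least `min_retention` of taxa have a non-missing call.

    Hypothetical helper for illustration; not the authors' filtering code.
    """
    taxa = list(alignment)
    n_taxa = len(taxa)
    n_sites = len(alignment[taxa[0]])
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")

    keep: List[int] = []
    for i in range(n_sites):
        # Count taxa with an observed (non-missing) base at site i.
        present = sum(1 for t in taxa if alignment[t][i].upper() not in MISSING)
        if present / n_taxa >= min_retention:
            keep.append(i)

    # Rebuild each sequence from the retained sites only.
    return {t: "".join(alignment[t][i] for i in keep) for t in taxa}


# Toy SNP matrix (hypothetical taxa); each column is one variable site.
aln = {
    "taxon_A": "ACGTN",
    "taxon_B": "AC-CA",
    "taxon_C": "GNGTN",
    "taxon_D": "GTACG",
}

for threshold in (0.5, 0.75, 1.0):
    kept = filter_snps_by_retention(aln, threshold)
    print(threshold, len(kept["taxon_A"]))
# 0.5 5
# 0.75 4
# 1.0 2
```

In this toy matrix, requiring a call in every taxon (1.0) keeps only two of the five sites, illustrating how strict retention thresholds discard data.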

ISSN: 2168-0450
Volume: 12
Issue: 6
Affiliations:
Jacob S. Suissa: Department of Ecology and Evolutionary Biology, University of Tennessee at Knoxville, Knoxville, Tennessee, USA
Gisel Y. De La Cerda: School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York, USA
Leland C. Graber: Department of Entomology, Cornell University, Ithaca, New York, USA
Chloe Jelley: Department of Entomology, Cornell University, Ithaca, New York, USA
David Wickell: School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York, USA
Heather R. Phillips: School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York, USA
Ayress D. Grinage: School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York, USA
Corrie S. Moreau: Department of Entomology, Cornell University, Ithaca, New York, USA
Chelsea D. Specht: School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York, USA
Jeff J. Doyle: School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York, USA
Jacob B. Landis: School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York, USA
Keywords: ancestral state reconstructions; divergence time estimation; genotyping‐by‐sequencing (GBS); Glycine; locus; phylogenetic comparative methods