Data‐driven guidelines for phylogenomic analyses using SNP data
Abstract Premise There is a general lack of consensus on the best practices for filtering of single‐nucleotide polymorphisms (SNPs) and whether it is better to use SNPs or include flanking regions (full “locus”) in phylogenomic analyses and subsequent comparative methods. Methods Using genotyping‐by...
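| Main Authors: | Jacob S. Suissa, Gisel Y. De La Cerda, Leland C. Graber, Chloe Jelley, David Wickell, Heather R. Phillips, Ayress D. Grinage, Corrie S. Moreau, Chelsea D. Specht, Jeff J. Doyle, Jacob B. Landis |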
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Wiley 2024-11-01 |
| Series: | Applications in Plant Sciences |
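| Subjects: | ancestral state reconstructions; divergence time estimation; genotyping‐by‐sequencing (GBS); Glycine; locus; phylogenetic comparative methods |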
| Online Access: | https://doi.org/10.1002/aps3.11611 |
| _version_ | 1850059241327427584 |
|---|---|
| author | Jacob S. Suissa; Gisel Y. De La Cerda; Leland C. Graber; Chloe Jelley; David Wickell; Heather R. Phillips; Ayress D. Grinage; Corrie S. Moreau; Chelsea D. Specht; Jeff J. Doyle; Jacob B. Landis |
| author_sort | Jacob S. Suissa |
| collection | DOAJ |
| description | Abstract. Premise: There is a general lack of consensus on the best practices for filtering of single‐nucleotide polymorphisms (SNPs) and whether it is better to use SNPs or include flanking regions (full “locus”) in phylogenomic analyses and subsequent comparative methods. Methods: Using genotyping‐by‐sequencing data from 22 Glycine species, we assessed the effects of SNP vs. locus usage and SNP retention stringency. We compared branch length, node support, and divergence time estimation across 16 datasets with varying amounts of missing data and total size. Results: Our results revealed five aspects of phylogenomic data usage that may be generally applicable: (1) tree topology is largely congruent across analyses; (2) filtering strictly for SNP retention (e.g., 90–100%) reduces support and can alter some inferred relationships; (3) absolute branch lengths vary by two orders of magnitude between SNP and locus datasets; (4) data type and branch length variation have little effect on divergence time estimation; and (5) phylograms alter the estimation of ancestral states and rates of morphological evolution. Discussion: Using SNP or locus datasets does not alter phylogenetic inference significantly, unless researchers want or need to use absolute branch lengths. We recommend against using excessive filtering thresholds for SNP retention to reduce the risk of producing inconsistent topologies and generating low support. |
| format | Article |
| id | doaj-art-a9ee168ebe6b4264bb4490193d469ae6 |
| institution | DOAJ |
| issn | 2168-0450 |
| language | English |
| publishDate | 2024-11-01 |
| publisher | Wiley |
| record_format | Article |
| series | Applications in Plant Sciences |
| spelling | doaj-art-a9ee168ebe6b4264bb4490193d469ae6; 2025-08-20T02:50:56Z; eng; Wiley; Applications in Plant Sciences; 2168-0450; 2024-11-01; 12; 6; n/a; n/a; 10.1002/aps3.11611; Data‐driven guidelines for phylogenomic analyses using SNP data; Authors: Jacob S. Suissa (0), Gisel Y. De La Cerda (1), Leland C. Graber (2), Chloe Jelley (3), David Wickell (4), Heather R. Phillips (5), Ayress D. Grinage (6), Corrie S. Moreau (7), Chelsea D. Specht (8), Jeff J. Doyle (9), Jacob B. Landis (10); Affiliations: (0) Department of Ecology and Evolutionary Biology, University of Tennessee at Knoxville, Knoxville, Tennessee, USA; (1) School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York, USA; (2) Department of Entomology, Cornell University, Ithaca, New York, USA; (3) Department of Entomology, Cornell University, Ithaca, New York, USA; (4) School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York, USA; (5) School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York, USA; (6) School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York, USA; (7) Department of Entomology, Cornell University, Ithaca, New York, USA; (8) School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York, USA; (9) School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York, USA; (10) School of Integrative Plant Science, Section of Plant Biology and the L. H. Bailey Hortorium, Cornell University, Ithaca, New York, USA; Abstract. Premise: There is a general lack of consensus on the best practices for filtering of single‐nucleotide polymorphisms (SNPs) and whether it is better to use SNPs or include flanking regions (full “locus”) in phylogenomic analyses and subsequent comparative methods. Methods: Using genotyping‐by‐sequencing data from 22 Glycine species, we assessed the effects of SNP vs. locus usage and SNP retention stringency. We compared branch length, node support, and divergence time estimation across 16 datasets with varying amounts of missing data and total size. Results: Our results revealed five aspects of phylogenomic data usage that may be generally applicable: (1) tree topology is largely congruent across analyses; (2) filtering strictly for SNP retention (e.g., 90–100%) reduces support and can alter some inferred relationships; (3) absolute branch lengths vary by two orders of magnitude between SNP and locus datasets; (4) data type and branch length variation have little effect on divergence time estimation; and (5) phylograms alter the estimation of ancestral states and rates of morphological evolution. Discussion: Using SNP or locus datasets does not alter phylogenetic inference significantly, unless researchers want or need to use absolute branch lengths. We recommend against using excessive filtering thresholds for SNP retention to reduce the risk of producing inconsistent topologies and generating low support.; https://doi.org/10.1002/aps3.11611; Keywords: ancestral state reconstructions; divergence time estimation; genotyping‐by‐sequencing (GBS); Glycine; locus; phylogenetic comparative methods |
| title | Data‐driven guidelines for phylogenomic analyses using SNP data |
| topic | ancestral state reconstructions; divergence time estimation; genotyping‐by‐sequencing (GBS); Glycine; locus; phylogenetic comparative methods |
| url | https://doi.org/10.1002/aps3.11611 |