Factors influencing the accuracy and precision in dating single gene trees

Molecular dating is the inference of divergence time from genetic sequences. Knowing the time of appearance of a taxon sets the evolutionary context by connecting it with past ecosystems and species. Knowing the divergence times of gene lineages would provide a context to understand adaptation at th...

Bibliographic Details
Main Authors: Louvel, Guillaume, Roest Crollius, Hugues
Format: Article
Language: English
Published: Peer Community In 2025-05-01
Series: Peer Community Journal
Subjects: molecular clock, molecular dating, uncertainty, gene tree, primates, phylogenetics, phylogenomics
Online Access: https://peercommunityjournal.org/articles/10.24072/pcjournal.556/
author Louvel, Guillaume
Roest Crollius, Hugues
collection DOAJ
description Molecular dating is the inference of divergence times from genetic sequences. Knowing when a taxon appeared sets its evolutionary context by connecting it with past ecosystems and species. Likewise, knowing the divergence times of gene lineages would provide a context for understanding adaptation at the genomic level. However, molecular clock inference faces uncertainty because substitution rates vary between species, between genes, and between sites within genes. When dating speciations, per-lineage rate variability can be informed by fossil calibrations, and gene-specific rates can be either averaged out or modeled by concatenating multiple genes. By contrast, when dating gene-specific events, fossil calibrations only inform speciation nodes, and concatenation does not apply to divergences other than speciations. This study aims to benchmark the accuracy of molecular dating applied to single gene trees and to identify how it is affected by gene tree characteristics. We analyzed 5205 alignments of genes from 21 Primates in which no duplication or loss is observed. We also simulated alignments under a relaxed clock model, based on characteristics of the Primates dataset, to analyze dating accuracy. Divergence times were estimated with the Bayesian program BEAST 2. In the empirical dataset, date estimates deviate more from the median age for shorter alignments, higher rate heterogeneity between branches, and lower average rates; these features underlie the amount of dating information in an alignment and hence the statistical power. The smallest deviations are associated with core biological functions such as ATP binding and cellular organization, categories that are expected to be under strong negative selection. We then investigated dating accuracy with simulated alignments, controlling each of these three parameters separately. The simulations confirmed these factors of precision, but also revealed biases when branch rates are highly heterogeneous. This suggests that, under the relaxed uncorrelated molecular clock, biases arise from the tree prior when calibrations are lacking and rate heterogeneity is high. Finally, our study reports the scale of the gene tree features that influence the consistency of dates with median ages, so that comparisons can be made with other genes and taxa. To tackle the molecular dating of events only observed in single gene trees, such as deep coalescence, horizontal gene transfers, and gene duplications, future models should overcome the lack of power due to the limited information in single genes.
format Article
id doaj-art-297a95cd5ff74f1eb3dde02d677d7919
institution Kabale University
issn 2804-3871
language English
publishDate 2025-05-01
publisher Peer Community In
record_format Article
series Peer Community Journal
spelling Louvel, Guillaume (https://orcid.org/0000-0002-7745-0785): École normale supérieure, PSL Research University, CNRS, Inserm, Institut de Biologie de l'École normale supérieure (IBENS), F-75005 Paris, France; Centre for Anthropobiology and Genomics of Toulouse, CNRS UMR5288/Université de Toulouse, Toulouse, France
Roest Crollius, Hugues (https://orcid.org/0000-0002-8209-173X): École normale supérieure, PSL Research University, CNRS, Inserm, Institut de Biologie de l'École normale supérieure (IBENS), F-75005 Paris, France
Peer Community Journal, volume 5, 2025-05-01, DOI 10.24072/pcjournal.556
title Factors influencing the accuracy and precision in dating single gene trees
topic molecular clock
molecular dating
uncertainty
gene tree
primates
phylogenetics
phylogenomics
url https://peercommunityjournal.org/articles/10.24072/pcjournal.556/