Statistical Distributions of Genome Assemblies Reveal Random Effects in Ancient Viral DNA Reconstructions

Bibliographic Details
Main Authors: Fernando Antoneli, Cristina M. Peter, Marcelo R. S. Briones
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Viruses
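Subjects: ancient DNA; genome assembly; ancient viruses; statistical distributions; power laws; log-normal laws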
Online Access: https://www.mdpi.com/1999-4915/17/2/195
author Fernando Antoneli
Cristina M. Peter
Marcelo R. S. Briones
collection DOAJ
description Ancient human viruses have been detected in ancient DNA (aDNA) samples of both Anatomically Modern Humans and Neanderthals. Reconstructing genomes from aDNA by reference mapping is difficult because of the unique nature of ancient samples, their degraded state, their shorter reads and the limitations of current methodologies. Spurious alignments of reads to reference sequences (mapping) are a major source of false positives in aDNA assemblies, so assessing signal-to-noise ratios is essential to differentiate bona fide reconstructions from random, noisy assemblies. Here, we analyzed the statistical distributions of viral genome assemblies, ancient and modern, and of their respective random “mock” controls used to evaluate the signal-to-noise ratio. We tested whether differences between real and random assemblies could be detected from their statistical distributions. Our analysis shows that the coverage distributions of (1) real viral aDNA assemblies of adenovirus (ADV), herpesvirus (HSV) and papillomavirus (HPV) follow neither power laws nor log-normal laws, (2) (ADV) and control aDNA assemblies are well approximated by log-normal laws, (3) the negative control parvovirus B19 assemblies (real and random) follow a power law with infinite variance and (4) the mapDamage negative control with non-ancient DNA (modern ADV) and the mapDamage positive control (human mtDNA) are well approximated by the negative binomial distribution, consistent with the Lander–Waterman model. Our results show that the tails of the distributions of aDNA assemblies and their controls reveal the weight of random effects and can differentiate spurious assemblies, or false positives, from bona fide assemblies.
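The distributional checks named in the description can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the uniform read placement used for simulation, and the choice of a method-of-moments negative binomial fit and a Hill tail-index estimator are all assumptions made here for illustration. Under the Lander–Waterman model the expected per-base coverage is c = N·L/G (N reads of length L on a genome of length G); a tail index α ≤ 2 in P(X > x) ~ x^(−α) corresponds to infinite variance.

```python
import math
import random
import statistics


def lander_waterman_mean(n_reads, read_len, genome_len):
    """Expected per-base coverage c = N * L / G under uniform random sampling."""
    return n_reads * read_len / genome_len


def simulate_coverage(n_reads, read_len, genome_len, seed=0):
    """Place reads uniformly at random on a linear genome; return per-base depth.

    Hypothetical helper: a stand-in for real coverage data from a mapping.
    """
    rng = random.Random(seed)
    cov = [0] * genome_len
    for _ in range(n_reads):
        start = rng.randrange(genome_len - read_len + 1)
        for i in range(start, start + read_len):
            cov[i] += 1
    return cov


def negbin_moments(coverage):
    """Method-of-moments negative binomial fit.

    Returns (r, p) with mean = r(1-p)/p and variance = r(1-p)/p**2,
    or None when the data are not overdispersed (variance <= mean).
    """
    m = statistics.fmean(coverage)
    v = statistics.pvariance(coverage)
    if v <= m:
        return None
    return m * m / (v - m), m / v


def hill_tail_index(values, k):
    """Hill estimator of the tail index alpha from the k largest positive values."""
    top = sorted((x for x in values if x > 0), reverse=True)[: k + 1]
    if len(top) < k + 1:
        raise ValueError("need at least k + 1 positive values")
    threshold = top[k]
    log_sum = sum(math.log(x / threshold) for x in top[:k])
    if log_sum == 0:
        raise ValueError("top values are all tied; tail index is undefined")
    return k / log_sum
```

In such a sketch, a Hill estimate at or below 2 on the upper tail of the coverage values would be read as a heavy, infinite-variance tail, whereas a good negative binomial fit would be read as consistent with Lander–Waterman-like random sampling. The choice of k is a judgment call and the estimate should be inspected across several values.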
format Article
id doaj-art-f76d65ded6be4b4bacfa29d4a7e06ec8
institution DOAJ
issn 1999-4915
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Viruses
spelling doaj-art-f76d65ded6be4b4bacfa29d4a7e06ec8 | 2025-08-20T03:11:19Z | eng | MDPI AG | Viruses | 1999-4915 | 2025-01-01 | 17 | 2 | 195 | 10.3390/v17020195 | Statistical Distributions of Genome Assemblies Reveal Random Effects in Ancient Viral DNA Reconstructions | Fernando Antoneli; Cristina M. Peter; Marcelo R. S. Briones | Center for Medical Bioinformatics, Escola Paulista de Medicina, Federal University of São Paulo (UNIFESP), São Paulo 04039-032, SP, Brazil (all three authors) | https://www.mdpi.com/1999-4915/17/2/195 | ancient DNA; genome assembly; ancient viruses; statistical distributions; power laws; log-normal laws
title Statistical Distributions of Genome Assemblies Reveal Random Effects in Ancient Viral DNA Reconstructions
topic ancient DNA
genome assembly
ancient viruses
statistical distributions
power laws
log-normal laws
url https://www.mdpi.com/1999-4915/17/2/195