Statistical Distributions of Genome Assemblies Reveal Random Effects in Ancient Viral DNA Reconstructions
Ancient human viruses have been detected in ancient DNA (aDNA) samples of both Anatomically Modern Humans and Neanderthals. Reconstructing genomes from aDNA using reference mapping presents numerous problems due to the unique nature of ancient samples, their degraded state, smaller read sizes and th...
| Main Authors: | Fernando Antoneli, Cristina M. Peter, Marcelo R. S. Briones |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-01-01 |
| Series: | Viruses |
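| Subjects: | ancient DNA; genome assembly; ancient viruses; statistical distributions; power laws; log-normal laws |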
| Online Access: | https://www.mdpi.com/1999-4915/17/2/195 |
| _version_ | 1849722575382380544 |
|---|---|
| author | Fernando Antoneli Cristina M. Peter Marcelo R. S. Briones |
| author_facet | Fernando Antoneli Cristina M. Peter Marcelo R. S. Briones |
| author_sort | Fernando Antoneli |
| collection | DOAJ |
| description | Ancient human viruses have been detected in ancient DNA (aDNA) samples of both Anatomically Modern Humans and Neanderthals. Reconstructing genomes from aDNA using reference mapping presents numerous problems due to the unique nature of ancient samples, their degraded state, smaller read sizes and the limitations of current methodologies. The spurious alignments of reads to reference sequences (mapping) are a main source of false positives in aDNA assemblies, and the assessment of signal-to-noise ratios is essential to differentiate bona fide reconstructions from random, noisy assemblies. Here, we analyzed the statistical distributions of viral genome assemblies, ancient and modern, and their respective random “mock” controls used to evaluate the signal-to-noise ratio. We tested whether differences between real and random assemblies could be detected from their statistical distributions. Our analysis shows that the coverage distributions of (1) real viral aDNA assemblies of adenovirus (ADV), herpesvirus (HSV) and papillomavirus (HPV) follow neither power laws nor log-normal laws, (2) (ADV) and control aDNA assemblies are well approximated by log-normal laws, (3) the negative control parvovirus B19 assemblies (real and random) follow a power law with infinite variance and (4) the mapDamage negative control with non-ancient DNA (modern ADV) and the mapDamage positive control (human mtDNA) are well approximated by the negative binomial distribution, consistent with the Lander–Waterman model. Our results show that the tails of the distributions of aDNA and their controls reveal the weight of random effects and can differentiate spurious assemblies, or false positives, from bona fide assemblies. |
| format | Article |
| id | doaj-art-f76d65ded6be4b4bacfa29d4a7e06ec8 |
| institution | DOAJ |
| issn | 1999-4915 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Viruses |
| title | Statistical Distributions of Genome Assemblies Reveal Random Effects in Ancient Viral DNA Reconstructions |
| topic | ancient DNA genome assembly ancient viruses statistical distributions power laws log-normal laws |
| url | https://www.mdpi.com/1999-4915/17/2/195 |
| work_keys_str_mv | AT fernandoantoneli statisticaldistributionsofgenomeassembliesrevealrandomeffectsinancientviraldnareconstructions AT cristinampeter statisticaldistributionsofgenomeassembliesrevealrandomeffectsinancientviraldnareconstructions AT marcelorsbriones statisticaldistributionsofgenomeassembliesrevealrandomeffectsinancientviraldnareconstructions |
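
The description above names three quantitative ideas: the Lander–Waterman expectation for coverage, a negative binomial fit to per-base coverage, and a power-law tail with infinite variance. The following is a minimal Python sketch of those ideas only; it is not the authors' pipeline, which this record does not describe. All function names and the method-of-moments and maximum-likelihood estimators are assumptions chosen for illustration. For a density proportional to x^(-alpha), the variance is infinite when alpha is at most 3.

```python
import math
import random


def lander_waterman_mean_coverage(n_reads, read_length, genome_length):
    """Expected per-base coverage c = N * L / G (Lander-Waterman model)."""
    return n_reads * read_length / genome_length


def negative_binomial_moments_fit(coverage):
    """Method-of-moments fit of a negative binomial (r, p) to coverage counts.

    Uses mean = r(1 - p)/p and variance = r(1 - p)/p**2, so the fit only
    exists when the sample variance exceeds the sample mean (overdispersion).
    """
    n = len(coverage)
    mean = sum(coverage) / n
    var = sum((x - mean) ** 2 for x in coverage) / (n - 1)
    if var <= mean:
        raise ValueError("no overdispersion: a Poisson model is adequate")
    p = mean / var
    r = mean * p / (1 - p)
    return r, p


def power_law_alpha_mle(values, x_min):
    """Continuous power-law tail exponent: alpha = 1 + n / sum(ln(x / x_min)).

    Only values >= x_min are used; x_min is assumed to be chosen beforehand.
    """
    tail = [x for x in values if x >= x_min]
    return 1 + len(tail) / sum(math.log(x / x_min) for x in tail)


def has_infinite_variance(alpha):
    """A density proportional to x**(-alpha) has infinite variance if alpha <= 3."""
    return alpha <= 3


def sample_power_law(alpha, x_min, size, seed=0):
    """Hypothetical helper: inverse-transform samples from a continuous power law."""
    rng = random.Random(seed)
    return [x_min * (1 - rng.random()) ** (-1 / (alpha - 1)) for _ in range(size)]


if __name__ == "__main__":
    # Synthetic data only; no values here come from the paper.
    print(lander_waterman_mean_coverage(1000, 50, 10000))
    print(negative_binomial_moments_fit([0, 0, 1, 2, 2, 3, 10, 14]))
    alpha_hat = power_law_alpha_mle(sample_power_law(2.5, 1.0, 20000), 1.0)
    print(alpha_hat, has_infinite_variance(alpha_hat))
```

A fitted exponent at or below 3 would correspond to the heavy-tailed, infinite-variance regime the description reports for the parvovirus B19 negative control, whereas a good negative binomial fit corresponds to the behaviour reported for the mapDamage controls. A real analysis would also need a principled choice of `x_min` and a goodness-of-fit test, which this sketch omits.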