Identifying and prioritizing potential human-infecting viruses from their genome sequences.

Bibliographic Details
Main Authors: Nardus Mollentze, Simon A Babayan, Daniel G Streicker
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2021-09-01
Series: PLoS Biology
Online Access:https://journals.plos.org/plosbiology/article/file?id=10.1371/journal.pbio.3001390&type=printable
collection DOAJ
description Determining which animal viruses may be capable of infecting humans is currently intractable at the time of their discovery, precluding prioritization of high-risk viruses for early investigation and outbreak preparedness. Given the increasing use of genomics in virus discovery and the otherwise sparse knowledge of the biology of newly discovered viruses, we developed machine learning models that identify candidate zoonoses solely using signatures of host range encoded in viral genomes. Within a dataset of 861 viral species with known zoonotic status, our approach outperformed models based on the phylogenetic relatedness of viruses to known human-infecting viruses (area under the receiver operating characteristic curve [AUC] = 0.773), distinguishing high-risk viruses within families that contain a minority of human-infecting species and identifying putatively undetected or so far unrealized zoonoses. Analyses of the underpinnings of model predictions suggested the existence of generalizable features of viral genomes that are independent of virus taxonomic relationships and that may preadapt viruses to infect humans. Our model reduced a second set of 645 animal-associated viruses that were excluded from training to 272 high and 41 very high-risk candidate zoonoses and showed significantly elevated predicted zoonotic risk in viruses from nonhuman primates, but not other mammalian or avian host groups. A second application showed that our models could have identified Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) as a relatively high-risk coronavirus strain and that this prediction required no prior knowledge of zoonotic Severe Acute Respiratory Syndrome (SARS)-related coronaviruses. Genome-based zoonotic risk assessment provides a rapid, low-cost approach to enable evidence-driven virus surveillance and increases the feasibility of downstream biological and ecological characterization of viruses.
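The description rests on two technical ideas: models fed with features computed directly from viral genome sequences, and evaluation by the area under the receiver operating characteristic curve (AUC). The Python sketch below illustrates both under explicit assumptions and is not the authors' pipeline. The record does not list the genomic features used, so dinucleotide observed/expected ratios (a standard genome-composition statistic) stand in here as a hypothetical example of such a feature. The names `dinucleotide_oe` and `roc_auc` are illustrative, and the labels and scores in the usage block are made up.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"


def dinucleotide_oe(seq: str) -> dict[str, float]:
    """Observed/expected ratio for each of the 16 dinucleotides.

    Hypothetical example feature (not taken from the record):
    ratio(XY) = f(XY) / (f(X) * f(Y)), where f are frequencies among valid
    mononucleotides and overlapping dinucleotides. Ambiguous bases such as N
    are skipped, as are dinucleotides that include them.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA genomes like DNA
    mono = Counter(b for b in seq if b in NUCLEOTIDES)
    di = Counter(
        seq[i : i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in NUCLEOTIDES and seq[i + 1] in NUCLEOTIDES
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    ratios = {}
    for x, y in product(NUCLEOTIDES, repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono) if n_mono else 0.0
        observed = di[x + y] / n_di if n_di else 0.0
        # NaN when a base is absent, so the ratio is undefined.
        ratios[x + y] = observed / expected if expected else float("nan")
    return ratios


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC in its Mann-Whitney form.

    The fraction of (positive, negative) pairs in which the positive gets
    the higher score, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    cpg = dinucleotide_oe("ACGTTGCAACGU")["CG"]
    print(f"CpG observed/expected: {cpg:.3f}")
    # Hypothetical labels (1 = known to infect humans) and model scores.
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
    print(f"AUC: {roc_auc(labels, scores):.3f}")
```

In the toy usage, 7 of the 9 positive/negative pairs are ordered correctly, giving an AUC of about 0.778. Read the same way, the reported AUC of 0.773 means a model ranks a randomly chosen human-infecting virus above a randomly chosen virus not known to infect humans roughly 77% of the time. The pairwise loop is quadratic but keeps the definition visible; a sort-based rank computation gives the same value faster on large datasets.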
id doaj-art-7b610add7e994491b0b4d404eb5c74b2
institution Kabale University
issn 1544-9173
1545-7885
Citation: PLoS Biology, vol. 19, no. 9, e3001390 (2021). DOI: 10.1371/journal.pbio.3001390