ORF1ab codon frequency model predicts host-pathogen relationship in orthocoronavirinae

Bibliographic Details
Main Authors: Phillip E. Davis, Joseph A. Russell
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-03-01
Series: Frontiers in Bioinformatics
Online Access: https://www.frontiersin.org/articles/10.3389/fbinf.2025.1562668/full
Description: Predicting phenotypic properties of a virus directly from its sequence data is an attractive goal for viral epidemiology. Here, we focus narrowly on the Orthocoronavirinae clade and demonstrate models that are powerfully predictive for a human-pathogen phenotype, with 76.74% average precision and 85.96% average recall on the withheld test set groups, using only ORF1ab codon frequencies. We show alternative examples for other viral coding sequences and feature representations that do not perform well, and discuss what distinguishes the models that are performant. These models point to a small subset of features, specifically 5 codons, that are critical to the success of the models. We discuss and contextualize how this observation may fit within a larger model for the role of translation in virus-host agreement.
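The description names codon frequencies of ORF1ab as the only model input. As a minimal sketch of what such a feature vector could look like (not the authors' pipeline), the Python below counts in-frame codons of a coding sequence and normalizes them to relative frequencies over the 64 standard codons. The function name `codon_frequencies`, the handling of ambiguous bases, and the toy sequence are assumptions made for illustration; the sketch reads the sequence in a single frame and does not model the ribosomal frameshift within ORF1ab.

```python
from collections import Counter
from itertools import product

# All 64 unambiguous DNA codons in a fixed order, so every sequence
# yields a feature vector with the same layout.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds: str) -> dict[str, float]:
    """Return the relative frequency of each of the 64 codons in `cds`.

    Hypothetical helper: assumes `cds` starts in frame. RNA input (U) is
    converted to DNA (T); codons with ambiguous bases (e.g. N) are skipped.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in CODONS)
    if total == 0:
        raise ValueError("sequence contains no unambiguous codons")
    return {c: counts[c] / total for c in CODONS}


if __name__ == "__main__":
    toy = "ATGGCTGCTTAA"  # toy sequence, not real ORF1ab data
    freqs = codon_frequencies(toy)
    # Show only the codons that actually occur in the toy sequence.
    print({c: f for c, f in freqs.items() if f > 0})
```

A vector like this, one per genome, could then be passed to any standard classifier; the record does not state which model family the authors used.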
ISSN: 2673-7647
Subjects: machine learning; feature selection; genotype-to-phenotype; viruses; bioinformatics