A protein language model for exploring viral fitness landscapes

Bibliographic Details
Main Authors: Jumpei Ito, Adam Strange, Wei Liu, Gustav Joas, Spyros Lytras, The Genotype to Phenotype Japan (G2P-Japan) Consortium, Kei Sato
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Nature Communications
Online Access:https://doi.org/10.1038/s41467-025-59422-w
Collection: DOAJ
Institution: Kabale University
ISSN: 2041-1723
Author affiliation (Jumpei Ito, Adam Strange, Wei Liu, Gustav Joas, Spyros Lytras, Kei Sato): Division of Systems Virology, Department of Microbiology and Immunology, The Institute of Medical Science, The University of Tokyo

Abstract
Successively emerging SARS-CoV-2 variants lead to repeated epidemic surges through escalated fitness (i.e., the relative effective reproduction number between variants). Modeling the genotype–fitness relationship enables us to pinpoint the mutations that boost viral fitness and to flag high-risk variants immediately after their detection. Here, we present CoVFit, a protein language model adapted from ESM-2 and designed to predict variant fitness based solely on spike protein sequences. CoVFit was trained on genotype–fitness data derived from viral genome surveillance and from functional mutation assays related to immune evasion. CoVFit successfully ranked the fitness of unknown future variants harboring nearly 15 mutations with informative accuracy. CoVFit identified 959 fitness elevation events throughout SARS-CoV-2 evolution until late 2023. Furthermore, we show that CoVFit is applicable to predicting viral evolution through single amino acid mutations. Our study gives insight into the SARS-CoV-2 fitness landscape and provides a tool for efficiently identifying SARS-CoV-2 variants with higher epidemic risk.
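The abstract states that CoVFit can be used to predict viral evolution through single amino acid mutations but does not say how. One generic way to pose that question is an in silico scan: enumerate every single substitution of a parent spike sequence and rank the mutants by the change in a model's predicted fitness. The sketch below assumes a hypothetical `predict_fitness` callable standing in for a trained sequence-to-fitness model; the function names are illustrative and this is not the CoVFit code or API.

```python
from typing import Callable, List, Tuple

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutants(seq: str) -> List[Tuple[str, str]]:
    """Return (label, mutant_sequence) for every single amino acid substitution.

    Labels use wild-type / 1-based position / mutant notation, e.g. "N501Y".
    """
    mutants = []
    for i, wt in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue  # skip the identity "substitution"
            mutants.append((f"{wt}{i + 1}{aa}", seq[:i] + aa + seq[i + 1:]))
    return mutants


def rank_single_mutants(
    seq: str, predict_fitness: Callable[[str], float], top_k: int = 10
) -> List[Tuple[str, float]]:
    """Rank single mutants by predicted fitness gain over the parent sequence.

    `predict_fitness` is a hypothetical stand-in for a trained model
    (assumption: higher output means higher predicted fitness).
    """
    base = predict_fitness(seq)
    scored = [(label, predict_fitness(mut) - base) for label, mut in single_mutants(seq)]
    # Sort by gain, largest first; Python's sort is stable, so ties keep scan order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

For a full-length spike (1,273 residues in the ancestral Wuhan-Hu-1 sequence) the scan scores 1,273 × 19 = 24,187 mutants, so in practice one would batch the model calls rather than score sequences one at a time.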