A protein language model for exploring viral fitness landscapes
Abstract Successively emerging SARS-CoV-2 variants lead to repeated epidemic surges through escalated fitness (i.e., relative effective reproduction number between variants). Modeling the genotype–fitness relationship enables us to pinpoint the mutations boosting viral fitness and flag high-risk var...
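| Main Authors: | Jumpei Ito, Adam Strange, Wei Liu, Gustav Joas, Spyros Lytras, The Genotype to Phenotype Japan (G2P-Japan) Consortium, Kei Sato |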
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-05-01 |
| Series: | Nature Communications |
| Online Access: | https://doi.org/10.1038/s41467-025-59422-w |
| _version_ | 1849309825133969408 |
|---|---|
| author | Jumpei Ito Adam Strange Wei Liu Gustav Joas Spyros Lytras The Genotype to Phenotype Japan (G2P-Japan) Consortium Kei Sato |
| author_sort | Jumpei Ito |
| collection | DOAJ |
| description | Successively emerging SARS-CoV-2 variants lead to repeated epidemic surges through escalated fitness (i.e., relative effective reproduction number between variants). Modeling the genotype–fitness relationship enables us to pinpoint the mutations boosting viral fitness and flag high-risk variants immediately after their detection. Here, we present CoVFit, a protein language model adapted from ESM-2, designed to predict variant fitness based solely on spike protein sequences. CoVFit was trained on genotype–fitness data derived from viral genome surveillance and functional mutation assays related to immune evasion. CoVFit successfully ranked the fitness of unknown future variants harboring nearly 15 mutations with informative accuracy. CoVFit identified 959 fitness elevation events throughout SARS-CoV-2 evolution until late 2023. Furthermore, we show that CoVFit is applicable for predicting viral evolution through single amino acid mutations. Our study gives insight into the SARS-CoV-2 fitness landscape and provides a tool for efficiently identifying SARS-CoV-2 variants with higher epidemic risk. |
| format | Article |
| id | doaj-art-d31467f3729f406ba1483f031c7ad680 |
| institution | Kabale University |
| issn | 2041-1723 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Nature Portfolio |
| record_format | Article |
| series | Nature Communications |
| spelling | doaj-art-d31467f3729f406ba1483f031c7ad680; 2025-08-20T03:53:57Z; eng; Nature Portfolio; Nature Communications; 2041-1723; 2025-05-01; 16(1):1-16; 10.1038/s41467-025-59422-w; A protein language model for exploring viral fitness landscapes; Jumpei Ito, Adam Strange, Wei Liu, Gustav Joas, Spyros Lytras, The Genotype to Phenotype Japan (G2P-Japan) Consortium, Kei Sato; Division of Systems Virology, Department of Microbiology and Immunology, The Institute of Medical Science, The University of Tokyo (affiliation listed for each of the six named authors); https://doi.org/10.1038/s41467-025-59422-w |
| title | A protein language model for exploring viral fitness landscapes |
| url | https://doi.org/10.1038/s41467-025-59422-w |
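
The abstract describes ranking the fitness of unseen variants, where fitness is the relative effective reproduction number between variants. One common way to check how well a predicted ranking matches an observed one is Spearman's rank correlation. The sketch below is illustrative only: this record does not state which metric the authors used, the function names are hypothetical, and the variant labels and numbers are invented placeholders, not results from the article.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Placeholder data (assumption): relative fitness for four made-up variants.
variants = ["variant_a", "variant_b", "variant_c", "variant_d"]
predicted = [0.9, 1.1, 1.3, 1.2]   # hypothetical model outputs
observed = [1.0, 1.05, 1.4, 1.5]   # hypothetical surveillance estimates

rho = spearman(predicted, observed)
print(f"Spearman rho = {rho:.2f}")  # prints "Spearman rho = 0.80"
```

A rank-based metric suits this setting because the stated goal is to order variants by epidemic risk rather than to reproduce exact fitness values; a value near 1 means the predicted order closely follows the observed one.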