Language models learn to represent antigenic properties of human influenza A(H3) virus

Abstract Given that influenza vaccine effectiveness depends on a good antigenic match between the vaccine and circulating viruses, it is important to assess the antigenic properties of newly emerging variants continuously. With the increasing application of real-time pathogen genomic surveillance, a...

Bibliographic Details
Main Authors: Francesco Durazzi, Marion P. G. Koopmans, Ron A. M. Fouchier, Daniel Remondini
Format: Article
Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-03275-2
author Francesco Durazzi
Marion P. G. Koopmans
Ron A. M. Fouchier
Daniel Remondini
collection DOAJ
description Abstract Given that influenza vaccine effectiveness depends on a good antigenic match between the vaccine and circulating viruses, it is important to continuously assess the antigenic properties of newly emerging variants. With the increasing application of real-time pathogen genomic surveillance, a key question is whether antigenic properties can be reliably predicted from influenza virus genomic information. Based on validated, linked datasets of influenza virus genomic data and wet-lab experimental results, in silico models may learn to predict the immune escape of variants of interest from the protein sequence alone. In this study, we compared several machine-learning methods for three tasks: reconstructing antigenic map coordinates for HA1 protein sequences of influenza A(H3N2) virus, ranking the substitutions responsible for major antigenic changes, and recognizing variants with novel antigenic properties that may warrant future vaccine updates. Methods based on deep-learning language models (BiLSTM and ProtBERT) and more classical approaches based solely on genetic distances and physicochemical properties of amino acid sequences performed comparably on the coarser features of the map. The language models performed better on fine-grained features, such as antigenic change driven by a single amino acid substitution, and in in silico deep mutational scanning experiments that rank the substitutions with the largest impact on antigenic properties. Because the best-performing model that produces protein embeddings is agnostic to the specific pathogen, the presented approach may be applicable to other pathogens.
format Article
id doaj-art-d6428ff5f1e54544b319d9ec54fd5d23
institution Kabale University
issn 2045-2322
language English
publishDate 2025-07-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-d6428ff5f1e54544b319d9ec54fd5d23
2025-08-20T04:01:26Z
eng
Nature Portfolio
Scientific Reports
2045-2322
2025-07-01
15
1
1
10
10.1038/s41598-025-03275-2
Language models learn to represent antigenic properties of human influenza A(H3) virus
Francesco Durazzi (Department of Physics and Astronomy, University of Bologna)
Marion P. G. Koopmans (Department of Viroscience, Erasmus Medical Centre)
Ron A. M. Fouchier (Department of Viroscience, Erasmus Medical Centre)
Daniel Remondini (Department of Physics and Astronomy, University of Bologna)
https://doi.org/10.1038/s41598-025-03275-2
title Language models learn to represent antigenic properties of human influenza A(H3) virus
url https://doi.org/10.1038/s41598-025-03275-2