Machine learning assessment of zoonotic potential in avian influenza viruses using PB2 segment

Abstract. Background: Influenza A virus (IAV) is a major global health threat, causing seasonal epidemics and occasional pandemics. In particular, influenza A viruses from avian species pose significant zoonotic threats, with PB2 adaptation serving as a critical first step in cross-species transmission...

Bibliographic Details
Main Authors: Sangwook Kim, Min-Ah Kim, Bitgoeul Kim, Jisu Lee, Se-Kyung Jung, Jonghong Kim, Ho-Young Chung, Chung-Young Lee, Sungmoon Jeong
Format: Article
Language: English
Published: BMC 2025-04-01
Series: BMC Genomics
Subjects: Influenza A virus; Avian influenza virus; PB2; Artificial intelligence; Machine learning; SHAP
Online Access: https://doi.org/10.1186/s12864-025-11589-8
author Sangwook Kim
Min-Ah Kim
Bitgoeul Kim
Jisu Lee
Se-Kyung Jung
Jonghong Kim
Ho-Young Chung
Chung-Young Lee
Sungmoon Jeong
collection DOAJ
description Abstract. Background: Influenza A virus (IAV) is a major global health threat, causing seasonal epidemics and occasional pandemics. In particular, influenza A viruses from avian species pose significant zoonotic threats, with PB2 adaptation serving as a critical first step in cross-species transmission. A comprehensive risk assessment framework based on PB2 sequences is needed, one that supports detailed analysis of specific residues and mutations while remaining general enough to apply to non-PB2 segments. Results: In this study, we developed two complementary approaches: a regression-based model for accurately distinguishing among risk groups, and a SHAP-based risk assessment model for more meaningful risk analyses. For the regression-based risk models, we compared various methodologies, including tree ensemble methods, conventional regression models, and deep learning architectures. The optimized regression model, combined with SHAP value analysis, identified and ranked individual residues contributing to zoonotic potential. The SHAP-based risk model enabled intra-class analyses within the zoonotic risk assessment framework and quantified the risk attributable to specific mutations. Conclusion: Experimental analyses demonstrated that the Random Forest regression model outperformed other models in most cases, and we validated the target value settings for risk regression through ablation studies. Our SHAP-based analysis identified key residues (271A, 627K, 591R, 588A, 292I, 684S, 684A, 81M, 199S, and 368Q) and mutations (T271A, Q368R/K, E627K, Q591R, A588T/I/V, and I292V/T) critical for zoonotic risk assessment. Using the SHAP-based risk assessment model, we found that influenza A viruses from Phasianidae showed elevated zoonotic risk scores compared to those from other avian species. Additionally, mutations I292V/T, Q368R, A588T/I, V598A/I/T, and E/V627K were identified as significant in Phasianidae. These PB2-focused quantitative methods provide a robust and generalizable framework for both rapid screening of the zoonotic potential of avian viruses and analytical quantification of risks associated with specific residues or mutations.
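The idea of ranking residues by SHAP value and quantifying the risk change from a single mutation can be illustrated with a small sketch. It is not the authors' pipeline: the abstract reports a Random Forest regressor, whereas this example swaps in a hypothetical linear scoring model over one-hot residue features so that the Shapley values have a closed form (for a linear model with independent features, the value of feature i is w_i * (x_i - mean(x_i))). The weights, background sequences, and function names are all invented for illustration.

```python
from typing import Dict, List, Tuple

Feature = Tuple[int, str]   # (PB2 position, residue)
SeqMap = Dict[int, str]     # {position: residue}

# Hypothetical weights for illustration only; NOT values from the paper.
WEIGHTS: Dict[Feature, float] = {(627, "K"): 0.5, (271, "A"): 0.3, (591, "R"): 0.2}


def encode(seq: SeqMap) -> Dict[Feature, float]:
    """One-hot encode: 1.0 if the sequence carries that residue at that position."""
    return {f: 1.0 if seq.get(f[0]) == f[1] else 0.0 for f in WEIGHTS}


def linear_shap(seq: SeqMap, background: List[SeqMap]) -> Dict[Feature, float]:
    """Shapley value per feature for a linear model: w_i * (x_i - background mean)."""
    x = encode(seq)
    encoded_bg = [encode(b) for b in background]
    values: Dict[Feature, float] = {}
    for feat, w in WEIGHTS.items():
        mean = sum(b[feat] for b in encoded_bg) / len(encoded_bg)
        values[feat] = w * (x[feat] - mean)
    return values


def risk_score(seq: SeqMap, background: List[SeqMap]) -> float:
    """Sum of per-feature attributions (score relative to the background mean)."""
    return sum(linear_shap(seq, background).values())


def mutation_effect(seq: SeqMap, pos: int, new_res: str,
                    background: List[SeqMap]) -> float:
    """Change in risk score when the residue at `pos` is replaced by `new_res`."""
    mutated = dict(seq)
    mutated[pos] = new_res
    return risk_score(mutated, background) - risk_score(seq, background)


# Toy background set and query sequence (hypothetical data).
background = [{271: "T", 591: "Q", 627: "E"}, {271: "T", 591: "Q", 627: "K"}]
avian = {271: "T", 591: "Q", 627: "E"}

delta_e627k = mutation_effect(avian, 627, "K", background)
ranked = sorted(linear_shap(avian, background).items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Here `delta_e627k` is the score change for the E627K substitution under the toy weights, and `ranked` orders features by absolute attribution. With a tree ensemble, the closed form above no longer applies and a tree-specific SHAP explainer would be needed instead.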
format Article
id doaj-art-bec00976a0b648d78deb3ad76d223037
institution OA Journals
issn 1471-2164
language English
publishDate 2025-04-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj-art-bec00976a0b648d78deb3ad76d223037 2025-08-20T02:19:07Z eng BMC BMC Genomics 1471-2164 2025-04-01 261114 10.1186/s12864-025-11589-8
Machine learning assessment of zoonotic potential in avian influenza viruses using PB2 segment
Sangwook Kim: Bio-medical Research Institute, Kyungpook National University Hospital
Min-Ah Kim: Department of Microbiology, School of Medicine, Kyungpook National University
Bitgoeul Kim: Department of Microbiology, School of Medicine, Kyungpook National University
Jisu Lee: Department of Microbiology, School of Medicine, Kyungpook National University
Se-Kyung Jung: Department of Microbiology, School of Medicine, Kyungpook National University
Jonghong Kim: Department of Neurology, Keimyung University Dongsan Medical Center
Ho-Young Chung: Department of Medical Informatics, School of Medicine, Kyungpook National University
Chung-Young Lee: Department of Microbiology, School of Medicine, Kyungpook National University
Sungmoon Jeong: Department of Medical Informatics, School of Medicine, Kyungpook National University
title Machine learning assessment of zoonotic potential in avian influenza viruses using PB2 segment
topic Influenza A virus
Avian influenza virus
PB2
Artificial intelligence
Machine learning
SHAP
url https://doi.org/10.1186/s12864-025-11589-8