Prediction of influenza A virus-human protein-protein interactions using XGBoost with continuous and discontinuous amino acids information

Influenza A virus (IAV) has the characteristics of high infectivity and high pathogenicity, which makes IAV infection a serious public health threat. Identifying protein-protein interactions (PPIs) between IAV and human proteins is beneficial for understanding the mechanism of viral infection and de...

Bibliographic Details
Main Authors: Binghua Li, Xin Li, Xiaoyu Li, Li Wang, Jun Lu, Jia Wang
Format: Article
Language: English
Published: PeerJ Inc. 2025-01-01
Series: PeerJ
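Subjects: Pathogen-host interaction (PHI); Protein-protein interaction (PPI); Influenza A virus; XGBoost; Machine learning; GO and KEGG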
Online Access: https://peerj.com/articles/18863.pdf
author Binghua Li
Xin Li
Xiaoyu Li
Li Wang
Jun Lu
Jia Wang
collection DOAJ
description Influenza A virus (IAV) is highly infectious and highly pathogenic, which makes IAV infection a serious public health threat. Identifying protein-protein interactions (PPIs) between IAV and human proteins helps in understanding the mechanism of viral infection and in designing antiviral drugs. In this article, we developed a sequence-based machine learning method for predicting PPIs. First, we applied a new negative sample construction method to establish a high-quality IAV-human PPI dataset. Then we used conjoint triad (CT) and Moran autocorrelation (Moran) to encode biologically relevant features. Jointly using the complementary information from contiguous and discontinuous amino acids provides a more comprehensive description of PPI information. After comparing different machine learning models, the eXtreme Gradient Boosting (XGBoost) model was selected as the final model for prediction. The model achieved an accuracy of 96.89%, a precision of 98.79%, a recall of 94.85%, and an F1-score of 96.78%. Finally, we identified 3,269 potential target proteins. Gene ontology (GO) and pathway analysis showed that these genes were highly associated with IAV infection. Analysis of the PPI network further revealed that the predicted proteins were classified as core proteins within the human protein interaction network. This study may encourage the identification of potential targets for the discovery of more effective anti-influenza drugs. The source code and datasets are available at https://github.com/HVPPIlab/IVA-Human-PPI/.
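The conjoint triad (CT) encoding named in the description can be illustrated with a minimal Python sketch. It is not taken from the authors' repository. The seven-class amino acid grouping, the min-max normalization, and the function names conjoint_triad and encode_pair are assumptions made for illustration; the record itself does not specify them.

```python
from itertools import product

# Assumed grouping of the 20 amino acids into 7 classes, as commonly used
# for conjoint triad features; the record does not list the grouping.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}

# Every ordered triple of classes gets one slot: 7 * 7 * 7 = 343 features.
TRIAD_INDEX = {t: i for i, t in enumerate(product(range(7), repeat=3))}


def conjoint_triad(sequence):
    """Return a 343-dimensional CT vector for one protein sequence."""
    counts = [0] * len(TRIAD_INDEX)
    classes = [AA_TO_CLASS.get(aa) for aa in sequence.upper()]
    # Slide a window of three contiguous residues along the sequence.
    for i in range(len(classes) - 2):
        triad = tuple(classes[i:i + 3])
        if None in triad:
            continue  # skip windows containing non-standard residues
        counts[TRIAD_INDEX[triad]] += 1
    lo, hi = min(counts), max(counts)
    if hi == lo:
        return [0.0] * len(counts)
    # Min-max normalization (an assumption) to reduce sequence-length effects.
    return [(c - lo) / (hi - lo) for c in counts]


def encode_pair(viral_seq, human_seq):
    """Concatenate the CT vectors of a virus protein and a human protein."""
    return conjoint_triad(viral_seq) + conjoint_triad(human_seq)
```

A pair vector built this way could be passed as one row to a gradient boosting classifier. The Moran autocorrelation features mentioned in the description would be appended in the same manner and are omitted here.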
format Article
id doaj-art-05802a02654a405f8b3b3798436ad28d
institution Kabale University
issn 2167-8359
language English
publishDate 2025-01-01
publisher PeerJ Inc.
record_format Article
series PeerJ
spelling doaj-art-05802a02654a405f8b3b3798436ad28d
2025-02-01T15:05:08Z
eng
PeerJ Inc.
PeerJ
2167-8359
2025-01-01
13
e18863
10.7717/peerj.18863
Prediction of influenza A virus-human protein-protein interactions using XGBoost with continuous and discontinuous amino acids information
Binghua Li (College of Informatics, Huazhong Agricultural University, Wuhan, China)
Xin Li (College of Informatics, Huazhong Agricultural University, Wuhan, China)
Xiaoyu Li (College of Informatics, Huazhong Agricultural University, Wuhan, China)
Li Wang (College of Informatics, Huazhong Agricultural University, Wuhan, China)
Jun Lu (College of Engineering, Huazhong Agricultural University, Wuhan, China)
Jia Wang (College of Informatics, Huazhong Agricultural University, Wuhan, China)
https://peerj.com/articles/18863.pdf
Pathogen-host interaction (PHI)
Protein-protein interaction (PPI)
Influenza A virus
XGBoost
Machine learning
GO and KEGG
title Prediction of influenza A virus-human protein-protein interactions using XGBoost with continuous and discontinuous amino acids information
topic Pathogen-host interaction (PHI)
Protein-protein interaction (PPI)
Influenza A virus
XGBoost
Machine learning
GO and KEGG
url https://peerj.com/articles/18863.pdf