A hybrid machine learning model with attention mechanism and multidimensional multivariate feature coding for essential gene prediction

Bibliographic Details
Main Authors: Wu Yan, Fu Yu, Li Tan, Li Mengshan, Xie Xiaojun, Zhou Weihong, Sheng Sheng, Wang Jun, Wu Fu-an
Format: Article
Language: English
Published: BMC 2025-04-01
Series: BMC Biology
Online Access: https://doi.org/10.1186/s12915-025-02209-8
Description
Summary:
Background: Essential genes are crucial for the development, inheritance, and survival of species. Exploring these genes can unravel complex mechanisms and fundamental life processes and identify potential therapeutic targets for various diseases. The identification of essential genes is therefore significant. Machine learning has become the mainstream approach for essential gene prediction, but several key challenges remain, such as the extraction of genetic features, the impact of imbalanced data, and cross-species generalization.
Results: Here, we propose a hybrid machine learning model for essential gene prediction, called EGP Hybrid-ML, based on graph convolutional neural networks (GCN) and bi-directional long short-term memory (Bi-LSTM) with an attention mechanism and multidimensional multivariate feature coding. In the model, the GCN extracts feature encoding information from visualized graphics of gene sequences, and the attention mechanism is combined with the Bi-LSTM to assess the importance of each feature in the gene sequences. We also analyzed the influence of different feature encoding methods and of data imbalance, and evaluated the cross-species predictive performance of the model through cross-validation. The sensitivity of the EGP Hybrid-ML model reached 0.9122.
Conclusions: The model demonstrated superior predictive performance and strong generalization capability compared with other models. EGP Hybrid-ML has broad application prospects in bioinformatics, chemical information, and pharmaceutical information. The code, architectures, parameters, and datasets of the proposed model are freely available on GitHub (https://github.com/gnnumsli/EGP-Hybrid-ML).
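The abstract says attention is combined with the Bi-LSTM to weigh the importance of each position in a gene sequence, but it does not give the exact formulation. The sketch below assumes one common variant, additive attention pooling: each hidden state h_t is scored as u · tanh(W h_t + b), the scores are softmax-normalized into weights, and the weighted sum of hidden states becomes a single context vector. The function name `attention_pool` and the hand-picked values of `W`, `b`, and `u` are hypothetical illustrations, not trained parameters from EGP Hybrid-ML.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[Vector], W: Matrix, b: Vector,
                   u: Vector) -> Tuple[Vector, Vector]:
    """Additive attention pooling over a sequence of hidden states.

    hidden: T vectors of size d (assumed to be concatenated forward and
            backward Bi-LSTM outputs, one per sequence position).
    W (k x d), b (k), u (k): illustrative parameters (hypothetical values).
    Returns (weights, context): weights sum to 1; context is a d-vector.
    """
    scores = []
    for h in hidden:
        # a_t = tanh(W h_t + b)
        a = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
             for row, b_i in zip(W, b)]
        # e_t = u . a_t
        scores.append(sum(u_i * a_i for u_i, a_i in zip(u, a)))
    weights = softmax(scores)
    d = len(hidden[0])
    # Context vector: attention-weighted sum of the hidden states.
    context = [sum(w * h[j] for w, h in zip(weights, hidden)) for j in range(d)]
    return weights, context


if __name__ == "__main__":
    # Three positions, each a 4-dim state (2 forward + 2 backward dims).
    hidden = [[0.1, 0.2, 0.0, 0.3], [0.9, 0.8, 0.7, 0.6], [0.0, 0.1, 0.1, 0.0]]
    W = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
    weights, context = attention_pool(hidden, W, [0.0, 0.0], [1.0, 1.0])
    print([round(w, 3) for w in weights])
```

In this toy input the second position has the largest activations, so it receives the largest attention weight. In a trained model, `W`, `b`, and `u` would be learned jointly with the recurrent layers.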
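The headline result is a sensitivity of 0.9122, and the abstract names imbalanced data as a key challenge. Sensitivity is the recall on the positive class, TP / (TP + FN), which here means the fraction of truly essential genes that the model flags. The abstract does not say how imbalance was handled. The sketch below therefore shows inverse-frequency class weighting only as one common option, not necessarily the authors' approach. All function names are hypothetical, and label 1 is assumed to mean "essential".

```python
from collections import Counter
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    # Assumed labels: 1 = essential gene (positive), 0 = non-essential.
    counts = {"tp": 0, "fn": 0, "tn": 0, "fp": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1:
            counts["tp" if p == 1 else "fn"] += 1
        else:
            counts["fp" if p == 1 else "tn"] += 1
    return counts


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Sensitivity (recall on essential genes) = TP / (TP + FN).
    c = confusion_counts(y_true, y_pred)
    if c["tp"] + c["fn"] == 0:
        raise ValueError("no positive samples in y_true")
    return c["tp"] / (c["tp"] + c["fn"])


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Specificity (recall on non-essential genes) = TN / (TN + FP).
    c = confusion_counts(y_true, y_pred)
    if c["tn"] + c["fp"] == 0:
        raise ValueError("no negative samples in y_true")
    return c["tn"] / (c["tn"] + c["fp"])


def inverse_frequency_weights(y: Sequence[int]) -> Dict[int, float]:
    # weight_c = N / (K * N_c): rarer classes get proportionally larger loss weights.
    n = len(y)
    freq = Counter(y)
    k = len(freq)
    return {label: n / (k * count) for label, count in freq.items()}


if __name__ == "__main__":
    # Toy, imbalanced example: 3 essential vs 7 non-essential genes.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
    print(sensitivity(y_true, y_pred), specificity(y_true, y_pred))
    print(inverse_frequency_weights(y_true))
```

On such skewed data, a model that predicts "non-essential" everywhere would score 70% accuracy but zero sensitivity. This is why sensitivity, rather than accuracy alone, is informative for essential gene prediction.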
ISSN: 1741-7007