Prediction of Promotors in Agrobacterium and Klebsiella Using Novel Feature Engineering and Ensemble Learning Approach

Bibliographic Details
Main Authors: Nagwan Abdel Samee, Rawan Talaat, Ali Raza, Hadil Shaiba, Souham Meshoul
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10897991/
Description
Summary: This paper presents an efficient machine learning-based model for the prediction of promoters in bacterial genomes, specifically targeting Agrobacterium tumefaciens, Klebsiella aerogenes, and Xanthomonas campestris. Agrobacterium tumefaciens is a well-known bacterium that induces plant tumors, such as crown gall disease, negatively impacting crop health and productivity. In contrast, Klebsiella aerogenes is an opportunistic pathogen responsible for various human infections, including healthcare-associated infections that often exhibit antibiotic resistance. Additionally, Xanthomonas campestris is a significant plant pathogen that causes various diseases in crops, making its study important for agricultural sustainability. Accurate promoter prediction in these bacterial species is crucial for understanding gene regulation mechanisms, with potential applications in agricultural biotechnology and medical research. In this study, a novel feature engineering approach was employed to extract significant features from the dataset. A Random Forest classifier was utilized to generate probabilistic features based on the probability estimates for each sequence, providing deeper insights into the data. These features were then used to train an ensemble learning model, combining the strengths of multiple classifiers to enhance prediction accuracy. Our method achieved remarkable accuracy, precision, and recall rates of 99%, 98%, and 99%, respectively, which not only demonstrate the effectiveness of our model but also exceed the current performance of the state of the art in this area. By leveraging advanced classification techniques, we provide a robust framework for promoter prediction that can significantly enhance genetic engineering and biotechnological applications involving these organisms.
ISSN: 2169-3536
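
The summary describes a two-stage approach: a Random Forest produces per-sequence probability estimates, and those estimates become the input features of an ensemble model. The record does not give the implementation, so the following is only a minimal sketch of that general idea using scikit-learn. Everything specific in it is an assumption made for illustration: the synthetic sequences with a planted TATAAT motif, the 3-mer count encoding, the out-of-fold probability scheme, and the choice of logistic regression and k-nearest neighbours as ensemble members. It is not the authors' pipeline and is not expected to reproduce the reported figures.

```python
import itertools
import random

import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier

K = 3  # assumed k-mer length; the source does not specify an encoding
KMERS = ["".join(p) for p in itertools.product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_counts(seq):
    """Hypothetical encoding: counts of each overlapping 3-mer in a sequence."""
    counts = np.zeros(len(KMERS))
    for i in range(len(seq) - K + 1):
        counts[KMER_INDEX[seq[i:i + K]]] += 1
    return counts


def make_toy_data(n=400, length=60, seed=0):
    """Synthetic stand-in data: label-1 sequences carry a planted TATAAT motif."""
    rng = random.Random(seed)
    seqs, labels = [], []
    for i in range(n):
        seq = [rng.choice("ACGT") for _ in range(length)]
        label = i % 2
        if label == 1:
            pos = rng.randrange(length - 6)
            seq[pos:pos + 6] = "TATAAT"
        seqs.append("".join(seq))
        labels.append(label)
    return seqs, np.array(labels)


def rf_probability_features(X_train, y_train, X_test, seed=0):
    """Stage 1: turn Random Forest class probabilities into new features."""
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    # Out-of-fold probabilities for the training rows (an assumption made
    # here so the second stage never sees predictions fitted on its own labels).
    train_proba = cross_val_predict(
        rf, X_train, y_train, cv=5, method="predict_proba"
    )
    # Refit on all training rows to score the held-out test rows.
    rf.fit(X_train, y_train)
    test_proba = rf.predict_proba(X_test)
    return train_proba, test_proba


def run_pipeline(seed=0):
    seqs, y = make_toy_data(seed=seed)
    X = np.array([kmer_counts(s) for s in seqs])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    P_tr, P_te = rf_probability_features(X_tr, y_tr, X_te, seed)

    # Stage 2: a soft-voting ensemble trained on the probability features.
    # The member classifiers are illustrative choices, not the paper's.
    ensemble = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
        ],
        voting="soft",
    )
    ensemble.fit(P_tr, y_tr)
    pred = ensemble.predict(P_te)
    metrics = {
        "accuracy": accuracy_score(y_te, pred),
        "precision": precision_score(y_te, pred),
        "recall": recall_score(y_te, pred),
    }
    return metrics, P_tr


if __name__ == "__main__":
    scores, _ = run_pipeline()
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

The three printed values correspond to the metrics named in the summary (accuracy, precision, recall), computed here on a held-out quarter of the toy data. With real promoter data, the encoding and the ensemble members would need to follow the published method.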