Prediction of Promotors in Agrobacterium and Klebsiella Using Novel Feature Engineering and Ensemble Learning Approach

This paper presents an efficient machine learning-based model for the prediction of promoters in bacterial genomes, specifically targeting Agrobacterium tumefaciens, Klebsiella aerogenes, and Xanthomonas campestris. Agrobacterium tumefaciens is a well-known bacterium that induces plant tumors, such...

Bibliographic Details
Main Authors: Nagwan Abdel Samee, Rawan Talaat, Ali Raza, Hadil Shaiba, Souham Meshoul
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Agrobacterium; Klebsiella; bacterial genomes; ensemble learning; machine learning
Online Access: https://ieeexplore.ieee.org/document/10897991/
collection DOAJ
description This paper presents an efficient machine learning-based model for the prediction of promoters in bacterial genomes, specifically targeting Agrobacterium tumefaciens, Klebsiella aerogenes, and Xanthomonas campestris. Agrobacterium tumefaciens is a well-known bacterium that induces plant tumors, such as crown gall disease, negatively impacting crop health and productivity. In contrast, Klebsiella aerogenes is an opportunistic pathogen responsible for various human infections, including healthcare-associated infections that often exhibit antibiotic resistance. Additionally, Xanthomonas campestris is a significant plant pathogen that causes various diseases in crops, making its study important for agricultural sustainability. Accurate promoter prediction in these bacterial species is crucial for understanding gene regulation mechanisms, with potential applications in agricultural biotechnology and medical research. In this study, a novel feature engineering approach was employed to extract significant features from the dataset. A Random Forest classifier was utilized to generate probabilistic features based on the probability estimates for each sequence, providing deeper insights into the data. These features were then used to train an ensemble learning model, combining the strengths of multiple classifiers to enhance prediction accuracy. Our method achieved remarkable accuracy, precision, and recall rates of 99%, 98%, and 99%, respectively, which not only demonstrate the effectiveness of our model but also exceed the current performance of the state of the art in this area. By leveraging advanced classification techniques, we provide a robust framework for promoter prediction that can significantly enhance genetic engineering and biotechnological applications involving these organisms.
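The two-stage idea in the description (a Random Forest produces per-sequence class probabilities, and those probabilities become the input features of an ensemble) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not say how sequences are encoded, which base classifiers form the ensemble, or what data were used. The 3-mer count encoding, the synthetic sequences with a planted TATAAT motif, the soft-voting ensemble of logistic regression, k-nearest neighbours and a second Random Forest, and all function names below are hypothetical choices made only so the example runs with scikit-learn.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier

K = 3  # assumed k-mer size; the record does not specify an encoding
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_counts(seq):
    """Count overlapping k-mers in one DNA sequence (assumed encoding)."""
    vec = np.zeros(len(KMERS))
    for i in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[i:i + K])
        if idx is not None:
            vec[idx] += 1
    return vec


def make_toy_data(n=200, length=60, seed=0):
    """Synthetic stand-in data: label 1 sequences carry a planted TATAAT motif."""
    rng = np.random.default_rng(seed)
    seqs, labels = [], []
    for i in range(n):
        label = i % 2
        seq = "".join(rng.choice(list("ACGT"), size=length))
        if label == 1:
            pos = int(rng.integers(0, length - 6))
            seq = seq[:pos] + "TATAAT" + seq[pos + 6:]
        seqs.append(seq)
        labels.append(label)
    return seqs, np.array(labels)


seqs, y = make_toy_data()
X = np.array([kmer_counts(s) for s in seqs])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Stage 1: Random Forest class probabilities become the new features.
# Training rows get out-of-fold probabilities so that no row is scored by a
# forest that saw its own label; test rows are scored by a forest fitted on
# the full training split.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
train_feats = cross_val_predict(rf, X_train, y_train, cv=5, method="predict_proba")
rf.fit(X_train, y_train)
test_feats = rf.predict_proba(X_test)

# Stage 2: an ensemble (soft voting here, as an assumption) is trained on the
# probability features.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(train_feats, y_train)
test_accuracy = ensemble.score(test_feats, y_test)
print(f"held-out accuracy on toy data: {test_accuracy:.2f}")
```

The accuracy printed here reflects only the synthetic toy data and says nothing about the 99% accuracy, 98% precision, and 99% recall reported in the description.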
id doaj-art-872fb345fe4144cc83199d4adfffc0cb
institution OA Journals
issn 2169-3536
spelling Record ID: doaj-art-872fb345fe4144cc83199d4adfffc0cb
Record timestamp: 2025-08-20T01:57:52Z
Language: eng
Publisher: IEEE
Journal: IEEE Access (ISSN 2169-3536)
Publication date: 2025-01-01
Volume 13, pages 42116-42128
DOI: 10.1109/ACCESS.2025.3544520
IEEE document number: 10897991
Title: Prediction of Promotors in Agrobacterium and Klebsiella Using Novel Feature Engineering and Ensemble Learning Approach
Authors and affiliations:
Nagwan Abdel Samee (https://orcid.org/0000-0001-5957-1383), Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, Saudi Arabia
Rawan Talaat (https://orcid.org/0009-0001-1152-6951), Department of Biotechnology and Genetics, Agriculture Engineering, Ain Shams University, Cairo, Egypt
Ali Raza (https://orcid.org/0000-0001-5429-9835), Department of Software Engineering, The University of Lahore, Lahore, Pakistan
Hadil Shaiba (https://orcid.org/0000-0003-1652-6579), Department of Computer Science, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, Saudi Arabia
Souham Meshoul, Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, Saudi Arabia
Online access: https://ieeexplore.ieee.org/document/10897991/
Keywords: Agrobacterium; Klebsiella; bacterial genomes; ensemble learning; machine learning
topic Agrobacterium
Klebsiella
bacterial genomes
ensemble learning
machine learning