Prediction of Promotors in Agrobacterium and Klebsiella Using Novel Feature Engineering and Ensemble Learning Approach
| Main Authors: | Nagwan Abdel Samee, Rawan Talaat, Ali Raza, Hadil Shaiba, Souham Meshoul |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Agrobacterium; Klebsiella; bacterial genomes; ensemble learning; machine learning |
| Online Access: | https://ieeexplore.ieee.org/document/10897991/ |
| author | Nagwan Abdel Samee, Rawan Talaat, Ali Raza, Hadil Shaiba, Souham Meshoul |
|---|---|
| collection | DOAJ |
| description | This paper presents an efficient machine learning-based model for the prediction of promoters in bacterial genomes, specifically targeting Agrobacterium tumefaciens, Klebsiella aerogenes, and Xanthomonas campestris. Agrobacterium tumefaciens is a well-known bacterium that induces plant tumors, such as crown gall disease, negatively impacting crop health and productivity. In contrast, Klebsiella aerogenes is an opportunistic pathogen responsible for various human infections, including healthcare-associated infections that often exhibit antibiotic resistance. Additionally, Xanthomonas campestris is a significant plant pathogen that causes various diseases in crops, making its study important for agricultural sustainability. Accurate promoter prediction in these bacterial species is crucial for understanding gene regulation mechanisms, with potential applications in agricultural biotechnology and medical research. In this study, a novel feature engineering approach was employed to extract significant features from the dataset. A Random Forest classifier was utilized to generate probabilistic features based on the probability estimates for each sequence, providing deeper insights into the data. These features were then used to train an ensemble learning model, combining the strengths of multiple classifiers to enhance prediction accuracy. Our method achieved remarkable accuracy, precision, and recall rates of 99%, 98%, and 99%, respectively, which not only demonstrate the effectiveness of our model but also exceed the current performance of the state of the art in this area. By leveraging advanced classification techniques, we provide a robust framework for promoter prediction that can significantly enhance genetic engineering and biotechnological applications involving these organisms. |
| format | Article |
| id | doaj-art-872fb345fe4144cc83199d4adfffc0cb |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| volume | 13 |
| pages | 42116–42128 |
| doi | 10.1109/ACCESS.2025.3544520 |
| article_number | 10897991 |
| affiliations | Nagwan Abdel Samee (ORCID 0000-0001-5957-1383): Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, Saudi Arabia; Rawan Talaat (ORCID 0009-0001-1152-6951): Department of Biotechnology and Genetics, Agriculture Engineering, Ain Shams University, Cairo, Egypt; Ali Raza (ORCID 0000-0001-5429-9835): Department of Software Engineering, The University of Lahore, Lahore, Pakistan; Hadil Shaiba (ORCID 0000-0003-1652-6579): Department of Computer Science, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, Saudi Arabia; Souham Meshoul: Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, Saudi Arabia |
| title | Prediction of Promotors in Agrobacterium and Klebsiella Using Novel Feature Engineering and Ensemble Learning Approach |
| topic | Agrobacterium Klebsiella bacterial genomes ensemble learning machine learning |
| url | https://ieeexplore.ieee.org/document/10897991/ |
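The abstract does not say which sequence features were engineered. As a minimal sketch only, the Python code below assumes normalized k-mer frequencies, a common encoding for DNA sequences that the source does not confirm, and binary labels with 1 = promoter. It turns a sequence into a fixed-length vector and computes the accuracy, precision, and recall metrics the abstract reports. The helper names `kmer_features` and `classification_metrics` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence


def kmer_features(seq: str, k: int = 2) -> List[float]:
    """Encode a DNA sequence as normalized k-mer frequencies (hypothetical helper).

    The vector has 4**k entries, one per k-mer over A, C, G, T in
    lexicographic order. Windows containing other symbols (e.g. 'N')
    are skipped; a sequence with no valid window yields all zeros.
    """
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0] * len(vocab)
    seq = seq.upper()
    # Slide a window of width k along the sequence and count each valid k-mer.
    for start in range(len(seq) - k + 1):
        j = index.get(seq[start:start + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for binary labels (1 = promoter, 0 = non-promoter)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        # accuracy = (TP + TN) / all predictions
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        # precision = TP / (TP + FP): share of predicted promoters that are real
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # recall = TP / (TP + FN): share of real promoters that were found
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    features = kmer_features("ACGTAC", k=2)
    print(len(features), features[1])  # 16 0.4
    print(classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In the pipeline the abstract describes, vectors like these would be passed to a Random Forest, whose per-sequence probability estimates then serve as features for the ensemble model. The abstract does not give the Random Forest or ensemble configuration, so those stages are not sketched here.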