GNNs and ensemble models enhance the prediction of new sRNA-mRNA interactions in unseen conditions

Bibliographic Details
Main Authors: Shani Cohen, Lior Rokach, Isana Veksler-Lublinsky
Format: Article
Language: English
Published: BMC 2025-05-01
Series: BMC Bioinformatics
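Subjects: sRNA-target prediction; Machine learning; Graph neural networks; Bacterial gene regulation; kGraphRNA; GraphRNA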
Online Access:https://doi.org/10.1186/s12859-025-06153-w
collection DOAJ
description Abstract: Bacterial small RNAs (sRNAs) are pivotal in post-transcriptional regulation, affecting functions such as virulence, metabolism, and gene expression by binding specific mRNA targets. Identifying these targets is crucial to understanding sRNA regulation across species. Despite advances in high-throughput (HT) experimental methods, these methods remain technically challenging and are limited to detecting sRNA-target interactions under specific environmental conditions. Computational approaches, especially machine learning (ML), are therefore essential for identifying strong candidates for biological validation. In this paper, we hypothesize that ML models trained on large-scale interaction data from specific conditions can accurately predict new interactions in unseen conditions within the same bacterial strain. To test this, we developed models from two families: (1) graph neural networks (GNNs), including GraphRNA and kGraphRNA, which learn transformed representations of interacting sRNA-mRNA pairs from graph relationships, and (2) decision forests, sInterRF (Random Forest) and sInterXGB (XGBoost), which use various interaction features for prediction. We also proposed Summation Ensemble Models (SEM) that combine scores from multiple models. Across three seen-to-unseen condition evaluations, our models, particularly kGraphRNA, significantly improved the area under the ROC curve (AUC) and the area under the precision-recall curve (PR-AUC) compared to sRNARFTarget, CopraRNA, and RNAup. The SEM model combining GraphRNA and CopraRNA outperformed CopraRNA alone on a low-throughput (LT) interaction test set (HT-to-LT evaluation). Beyond improved performance, our models enable target prediction for species-specific sRNAs, a capability lacking in some existing tools. Furthermore, the GNN models remove the dependency on external tools such as RNAplex or RNAup for computing hybridization duplex or energy features, improving scalability and runtime efficiency. While this study focuses on E. coli K12 MG1655 interactions, our methods are fully adaptable to predicting interactions in other bacterial strains, given sufficient training data. Our comprehensive feature importance analysis revealed the complexity of sRNA-mRNA interactions across environmental conditions and underscored the significance of RNA sequence composition and duplex structure characteristics such as base pairing and energy factors, findings that align with biological evidence from previous studies. As HT experiments expand sRNA-target interaction data across conditions in various bacteria, our ML methods and feature analysis offer promising advances in sRNA-target prediction and deeper insights into sRNA regulatory mechanisms across diverse species.
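The abstract states only that the Summation Ensemble Models (SEM) combine scores from multiple models; it does not say how the scores are scaled or oriented. The Python sketch below shows one plausible summation ensemble under stated assumptions: each model's scores are min-max normalized to [0, 1] before summing, scores where a lower value means a likelier interaction (for example, hybridization energies) are negated first, and only sRNA-mRNA pairs scored by every model are kept. The helper names (`min_max`, `summation_ensemble`, `roc_auc`), pair names, and score values are hypothetical illustrations, not the authors' implementation. The ROC AUC helper uses the rank (Mann-Whitney) formulation, in which ties count as half a win.

```python
from typing import Dict, Optional, Sequence, Tuple

# (sRNA, mRNA) identifier pair; all names used below are hypothetical
Pair = Tuple[str, str]


def min_max(scores: Dict[Pair, float]) -> Dict[Pair, float]:
    """Rescale one model's scores to [0, 1] so models on different scales can be summed."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:  # constant scores carry no ranking information
        return {pair: 0.0 for pair in scores}
    return {pair: (s - lo) / span for pair, s in scores.items()}


def summation_ensemble(
    model_scores: Sequence[Dict[Pair, float]],
    higher_is_better: Optional[Sequence[bool]] = None,
) -> Dict[Pair, float]:
    """Sum normalized per-model scores for the pairs that every model scored.

    Assumption (not from the paper): min-max normalization and sign flipping.
    """
    if higher_is_better is None:
        higher_is_better = [True] * len(model_scores)
    normalized = []
    for scores, higher in zip(model_scores, higher_is_better):
        # Negate scores where smaller means "more likely to interact" (e.g. energies)
        oriented = scores if higher else {p: -s for p, s in scores.items()}
        normalized.append(min_max(oriented))
    shared = set(normalized[0]).intersection(*normalized[1:])
    return {pair: sum(n[pair] for n in normalized) for pair in shared}


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores for one hypothetical sRNA and three candidate mRNAs
gnn = {("sRNA_A", "mRNA_1"): 0.75, ("sRNA_A", "mRNA_2"): 0.25, ("sRNA_A", "mRNA_3"): 0.5}
energy = {("sRNA_A", "mRNA_1"): -4.0, ("sRNA_A", "mRNA_2"): -2.0, ("sRNA_A", "mRNA_3"): 0.0}
truth = {("sRNA_A", "mRNA_1"): 1, ("sRNA_A", "mRNA_2"): 0, ("sRNA_A", "mRNA_3"): 1}

ensemble = summation_ensemble([gnn, energy], higher_is_better=[True, False])
pairs = sorted(ensemble)
auc = roc_auc([truth[p] for p in pairs], [ensemble[p] for p in pairs])
print(ensemble[("sRNA_A", "mRNA_1")], auc)  # → 2.0 0.75
```

scikit-learn's `roc_auc_score` computes the same ROC AUC, and `average_precision_score` is a common estimate of PR-AUC; this record does not state which tooling the authors used.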
id doaj-art-e72a7fdae1484ac5a14a8e09007b34af
institution OA Journals
issn 1471-2105
affiliation Department of Software & Information Systems Engineering, Faculty of Engineering, Ben-Gurion University of the Negev (Shani Cohen, Lior Rokach, Isana Veksler-Lublinsky)
topic sRNA-target prediction
Machine learning
Graph neural networks
Bacterial gene regulation
kGraphRNA
GraphRNA