How large is the universe of RNA-like motifs? A clustering analysis of RNA graph motifs using topological descriptors.

Identifying novel and functional RNA structures remains a significant challenge in RNA motif design and is crucial for developing RNA-based therapeutics. Here we introduce a computational topology-based approach with unsupervised machine-learning algorithms to estimate the database size and content...

Bibliographic Details
Main Authors: Rui Wang, Tamar Schlick
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-07-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1013230
collection DOAJ
description Identifying novel and functional RNA structures remains a significant challenge in RNA motif design and is crucial for developing RNA-based therapeutics. Here we introduce a computational topology-based approach with unsupervised machine-learning algorithms to estimate the database size and content of RNA-like graph topologies. Specifically, we apply graph theory enumeration to generate all 110,667 possible 2D dual graphs for vertex numbers ranging from 2 to 9. Among them, only 0.11% (121 dual graphs) correspond to approximately 200,000 known RNA atomic fragments/substructures (collected in 2021) using the RNA-as-Graphs (RAG) framework. The remaining 99.89% of the dual graphs may be RNA-like or non-RNA-like. To determine which dual graphs in this 99.89% hypothetical set are more likely to be associated with RNA structures, we characterize each graph with 19 computational topology features derived from the Persistent Spectral Graphs (PSG) method and use clustering algorithms to partition all possible dual graphs into two clusters. The cluster with the higher percentage of known RNA dual graphs is defined as the "RNA-like" cluster, while the other is considered "non-RNA-like". The distance between each dual graph and the center of the RNA-like cluster represents the likelihood that it belongs to RNA structures. In validation, our PSG-based RNA-like cluster includes 97.3% of the 121 known RNA dual graphs, suggesting good performance. Furthermore, 46.017% of the hypothetical RNAs are predicted to be RNA-like. Among the top 15 graphs identified as high-likelihood candidates for novel RNA motifs, 4 were confirmed from the RNA dataset collected in 2022. Significantly, we observe that all the top 15 RNA-like dual graphs can be separated into multiple subgraphs, whereas the top 15 non-RNA-like dual graphs tend not to have any subgraphs (subgraphs preserve pseudoknots and junctions). Moreover, a significant topological difference between top RNA-like and non-RNA-like graphs is evident when comparing their topological features (e.g., Betti-0 and Betti-1 numbers). These findings provide valuable insights into the size of the RNA motif universe and RNA design strategies, offering a novel framework for predicting RNA graph topologies and guiding the discovery of novel RNA motifs, and possibly anti-viral therapeutics, by subgraph assembly.
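The cluster-then-rank procedure summarized in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which clustering algorithms were used, so plain k-means with k = 2 is assumed here, the 19 PSG features are replaced by made-up 2D toy vectors, and the names `two_means` and `rank_rna_like` are hypothetical. A smaller distance to the RNA-like center is read as a higher likelihood, which is also an assumption.

```python
import math


def two_means(points, iters=50):
    """Plain Lloyd's k-means with k=2 (assumed algorithm, not from the paper).

    Deterministic start: the first point and the point farthest from it.
    Returns (centers, labels).
    """
    first = points[0]
    second = max(points, key=lambda p: math.dist(p, first))
    centers = [first, second]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min((0, 1), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:  # keep the old center if a cluster is empty
                new_centers.append(centers[c])
                continue
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members)))
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def rank_rna_like(features, known):
    """Label the cluster richer in known graphs as RNA-like, then rank
    the unknown graphs in it by distance to that cluster's center."""
    centers, labels = two_means(features)
    fraction_known = []
    for c in (0, 1):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        fraction_known.append(
            sum(known[i] for i in idx) / len(idx) if idx else 0.0)
    rna_like = 0 if fraction_known[0] >= fraction_known[1] else 1
    candidates = sorted(
        (i for i in range(len(features))
         if labels[i] == rna_like and not known[i]),
        key=lambda i: math.dist(features[i], centers[rna_like]))
    return rna_like, labels, candidates


# Toy data (made up): seven graphs, two of them "known" RNA graphs.
features = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.4, 0.2),
            (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
known = [True, True, False, False, False, False, False]
rna_like, labels, candidates = rank_rna_like(features, known)
print(rna_like, labels, candidates)
```

On this toy input the first four graphs form the RNA-like cluster, and the two unknown graphs in it are returned closest first as candidate novel motifs.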
id doaj-art-7c9ff7c3e8244205a5c6a5fea5abc0ee
institution DOAJ
issn 1553-734X
1553-7358
spelling doaj-art-7c9ff7c3e8244205a5c6a5fea5abc0ee; 2025-08-20T03:13:48Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2025-07-01; vol. 21, no. 7, e1013230; 10.1371/journal.pcbi.1013230; Rui Wang; Tamar Schlick; https://doi.org/10.1371/journal.pcbi.1013230