Figure 5 - available via license: Creative Commons Attribution-NonCommercial 4.0 International
VcQrr3 summary profile graph. Boxes indicate selected profiles, and dashed ovals indicate intersection profiles. Each node is labeled with its profile, in parenthetical notation, along with its specific and general frequencies, written as a ratio. An edge from q to q′ is labeled with the feature(s) in q′ \ q. Similarities between profiles are given by the greatest lower bound, aka 'last common ancestor,' and differences are read from the edge labels. The root is always the (possibly empty) profile common to all sampled structures. Features are listed by maximal helix, with frequency. For illustrative purposes, the secondary structures from Figures 1 and 3 are shown with their selected profiles, with features highlighted in color.
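The lattice structure described in this caption can be illustrated with a short Python sketch. It assumes that a profile is simply a set of feature IDs, that the greatest lower bound of two profiles is their set intersection, and that an edge joins two profiles when one strictly contains the other with no profile in between. The helper names (`glb`, `edge_label`, `profile_graph`) are hypothetical, and the root is approximated here as the intersection of the selected profiles, a simplification of "the profile common to all sampled structures". This is not the RNAprofiling implementation.

```python
from itertools import combinations


def glb(p, q):
    """Greatest lower bound of two profiles: the features they share."""
    return frozenset(p) & frozenset(q)


def edge_label(parent, child):
    """Features gained along an edge parent -> child, i.e. child minus parent."""
    return frozenset(child) - frozenset(parent)


def profile_graph(selected):
    """Close the selected profiles under pairwise intersection (glb) and
    return the nodes and the covering edges of the inclusion order."""
    nodes = {frozenset(p) for p in selected}
    # Root (simplified): features common to every selected profile; may be empty.
    nodes.add(frozenset.intersection(*nodes))
    changed = True
    while changed:  # add intersection profiles until the set is closed under glb
        changed = False
        for p, q in combinations(list(nodes), 2):
            r = glb(p, q)
            if r not in nodes:
                nodes.add(r)
                changed = True
    edges = []
    for p in nodes:
        for q in nodes:
            # Keep p -> q only if q strictly contains p with nothing in between.
            if p < q and not any(p < r < q for r in nodes):
                edges.append((p, q, edge_label(p, q)))
    return nodes, edges


# Hypothetical selected profiles over integer feature IDs.
nodes, edges = profile_graph([{1, 2, 3}, {1, 2, 4}, {1, 5}])
for parent, child, label in edges:
    print(sorted(parent), "->", sorted(child), "gains", sorted(label))
```

In this toy input, {1, 2} is an intersection profile (a dashed oval in the figure's terms), and the difference between {1, 2, 3} and {1, 2, 4} is read off the two edges leaving their common ancestor.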
Source publication
As the biomedical impact of small RNAs grows, so does the need to understand competing structural alternatives for regions of functional interest. Suboptimal structure analysis provides significantly more RNA base pairing information than a single minimum free energy prediction. Yet computational enhancements like Boltzmann sampling have not been f...
Similar publications
Gene expression regulatory elements are scattered in gene promoters and pre-mRNAs. In particular, RNA elements lying in untranslated regions (5′ and 3′ UTRs) are poorly studied because of their peculiar features (i.e., a combination of primary and secondary structure elements) which also pose remarkable computational challenges. Several years ago, w...
It is of great significance to develop an efficient software system for higher-level structural prediction in RNA/protein sequences. For RNA secondary structure prediction in particular, a prediction system must be able to deal with so-called "pseudoknot" structures, one of the most typical and important constructs found...
In this paper we propose the study of properties of RNA secondary structures modeled as dual graphs, by partitioning these graphs into topological components called blocks. We give a full characterization of possible topological configurations of these blocks, and, in particular, we show that an RNA secondary structure contains a pseudoknot if...
Background
Many tree structures are found in nature and organisms. Such trees are believed to be constructed on the basis of certain rules. We have previously developed grammar-based compression methods for ordered and unordered single trees, based on bisection-type tree grammars. These methods find construction rules for a single tree only. On...
Citations
... As with other probing methods, including SHAPE, lead cleavage does not unambiguously distinguish between paired and unpaired positions but provides quantitative evidence that can be converted into a probability that a nucleotide is unpaired. We emphasize that this is not a methodological shortcoming but an inevitable consequence of the fact that RNAs form a free-energy weighted ensemble of structures rather than a single, unambiguous secondary structure (8,69,70). Indeed, methods have recently become available that deconvolve multiple representative structures from a probing signal (70)(71)(72). The 'known' reference structures are therefore necessarily approximations rather than a perfect gold standard. ...
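On the ensemble side, "the probability that a nucleotide is unpaired" can be estimated directly from a Boltzmann sample. The Python sketch below assumes the sample is given as equal-length, pseudoknot-free dot-bracket strings; the function name `unpaired_probability` is hypothetical. It shows only the ensemble quantity that probing data are compared against, not how lead-cleavage or SHAPE reactivities are converted into probabilities.

```python
def unpaired_probability(sample):
    """Fraction of sampled structures in which each position is unpaired.

    `sample` is a list of equal-length dot-bracket strings drawn from the
    Boltzmann ensemble, so each structure counts once: more probable
    structures simply appear more often in the sample.
    """
    if not sample:
        raise ValueError("empty sample")
    n = len(sample[0])
    if any(len(s) != n for s in sample):
        raise ValueError("structures must have equal length")
    return [sum(s[i] == "." for s in sample) / len(sample) for i in range(n)]


sample = ["((..))", "(....)", "......", "((..))"]
print(unpaired_probability(sample))  # [0.25, 0.5, 1.0, 1.0, 0.5, 0.25]
```

Because no structure in the ensemble is singled out, a position such as the second one (unpaired in half of the sample) illustrates why a single reference structure can only approximate the ensemble signal.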
Structural analysis of RNA is an important and versatile tool to investigate the function of this type of molecules in the cell as well as in vitro. Several robust and reliable procedures are available, relying on chemical modification inducing RT stops or nucleotide misincorporations during reverse transcription. Others are based on cleavage reactions and RT stop signals. However, these methods address only one side of the RT stop or misincorporation position. Here, we describe Led-Seq, a new approach based on lead-induced cleavage of unpaired RNA positions, where both resulting cleavage products are investigated. The RNA fragments carrying 2′,3′-cyclic phosphate or 5′-OH ends are selectively ligated to oligonucleotide adapters by specific RNA ligases. In a deep sequencing analysis, the cleavage sites are identified as ligation positions, avoiding possible false positive signals based on premature RT stops. With a benchmark set of transcripts in Escherichia coli, we show that Led-Seq is an improved and reliable approach based on metal ion-induced phosphodiester hydrolysis to investigate RNA structures in vivo.
... RNAprofiling, or just profiling for short, refers to the overall cluster analysis method that organizes and analyzes a collection of secondary structures according to a set of features. It was developed [4] to identify the dominant combinations of base pairing signals in the Boltzmann ensemble. RNAprofiling 1.0 (denoted here Pv1) consistently achieves high sample compression together with low information loss on "small" sequences, on the order of 100 nucleotides (nt). ...
... As described, the content of that information is determined directly from the input sample. When introduced [4], it was established that RNAprofiling provides complementary information to both Sfold and RNAshapes. Moreover, a thorough analysis [9] compared the three, where Pv1 analyzed Boltzmann samples generated by GTfold [10]. ...
... When the observed helix classes (HC) are ordered by decreasing frequency, this yields a distribution with a long tail of low-probability base pairings. The threshold at which to cut the tail is determined by maximizing the average Shannon information entropy [4]. This yields a relatively small set of selected helix classes (SHC). ...
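The cutoff rule can be made concrete with a Python sketch, but the excerpt does not give the exact entropy formula, so the following is an assumption: each helix class contributes the binary Shannon entropy of its sample frequency (present vs. absent), and the cutoff k is chosen to maximize the mean entropy of the top-k classes. The function names and the frequencies in the example are hypothetical, and RNAprofiling's actual definition may differ.

```python
import math


def binary_entropy(f):
    """Shannon entropy (bits) of a feature present with frequency f."""
    if f <= 0.0 or f >= 1.0:
        return 0.0
    return -(f * math.log2(f) + (1 - f) * math.log2(1 - f))


def select_by_average_entropy(freqs):
    """freqs maps helix-class ID -> frequency in the sample.

    Rank classes by decreasing frequency and return the top-k IDs, where k
    maximizes the average entropy over the first k classes (assumed rule).
    """
    ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    best_k, best_avg, running = 0, -1.0, 0.0
    for k, (_, f) in enumerate(ranked, start=1):
        running += binary_entropy(f)
        avg = running / k
        if avg > best_avg:
            best_k, best_avg = k, avg
    return [hc for hc, _ in ranked[:best_k]]


freqs = {"A": 0.95, "B": 0.6, "C": 0.5, "D": 0.05, "E": 0.02}
print(select_by_average_entropy(freqs))  # ['A', 'B', 'C']
```

Under this assumed rule, near-universal and very rare classes both carry little entropy, so the average rises through the informative middle of the distribution and falls once the low-probability tail is reached, which is where the cut lands.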
Understanding the base pairing of an RNA sequence provides insight into its molecular structure. By mining suboptimal sampling data, RNAprofiling 1.0 identifies the dominant helices in low-energy secondary structures as features, organizes them into profiles which partition the Boltzmann sample, and highlights key similarities/differences among the most informative, i.e. selected, profiles in a graphical format. Version 2.0 enhances every step of this approach. First, the featured substructures are expanded from helices to stems. Second, profile selection includes low-frequency pairings similar to featured ones. In conjunction, these updates extend the utility of the method to sequences up to length 600, as evaluated over a sizable dataset. Third, relationships are visualized in a decision tree which highlights the most important structural differences. Finally, this cluster analysis is made accessible to experimental researchers in a portable format as an interactive webpage, permitting a much greater understanding of trade-offs among different possible base pairing combinations.
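Organizing features "into profiles which partition the Boltzmann sample" can be sketched in Python as follows, assuming each sampled structure has already been reduced to the set of feature IDs (helices or stems) it contains. A structure's profile is taken to be the subset of selected features it contains; since every structure has exactly one such subset, the profiles partition the sample. The function name and the toy data are hypothetical.

```python
from collections import Counter


def partition_into_profiles(structure_features, selected):
    """Map each sampled structure to its profile: the subset of selected
    features it contains. Every structure gets exactly one profile, so the
    profiles partition the sample; the Counter records their sizes."""
    sel = frozenset(selected)
    return Counter(frozenset(f) & sel for f in structure_features)


# Hypothetical feature IDs found in four sampled structures.
sample = [{1, 2, 9}, {1, 2}, {1, 3}, {1, 3, 8}]
counts = partition_into_profiles(sample, selected={1, 2, 3})
for profile, n in counts.most_common():
    print(sorted(profile), f"{n}/{len(sample)}")
```

Unselected features (9 and 8 here) are ignored when forming profiles, which is how low-frequency base pairings are kept from fragmenting the sample into many tiny clusters.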
... There have been substantial advances in methods to characterize RNA dynamic ensembles in vitro (Salmon et al., 2014; Xue et al., 2015; Shi et al., 2016; Tian and Das, 2016; Nichols et al., 2018). However, despite significant progress (Mahen et al., 2005, 2010; Spitale et al., 2013; Rogers and Heitsch, 2014; Ruminski et al., 2016; Watters et al., 2016; Lee et al., 2017; Li and Aviran, 2018; Woods et al., 2017; Spasic et al., 2018), probing RNA dynamics in vivo remains challenging. Transcriptome-wide structure probing experiments indicate that the cellular environment can affect RNA folding relative to in vitro (Beaudoin et al., 2018; Mustoe et al., 2018; Sun et al., 2019). ...
... Transcriptome-wide structure probing experiments indicate that the cellular environment can affect RNA folding relative to in vitro (Beaudoin et al., 2018; Mustoe et al., 2018; Sun et al., 2019). Despite some advances in using reactivity data to interpret RNA ensembles (Rogers and Heitsch, 2014; Li and Aviran, 2018; Woods et al., 2017; Spasic et al., 2018), the dependence of reactivity on structure is not entirely understood (Rogers and Heitsch, 2014; Spasic et al., 2018). This, combined with low sensitivity to low-abundance transient structures such as excited states (ESs), has made it difficult to probe secondary structural conformational equilibria within cells (Ganser et al., 2019). ...
Low-abundance short-lived non-native conformations referred to as excited states (ESs) are increasingly observed in vitro and implicated in the folding and biological activities of regulatory RNAs. We developed an approach for assessing the relative abundance of RNA ESs within the functional cellular context. Nuclear magnetic resonance (NMR) spectroscopy was used to estimate the degree to which substitution mutations bias conformational equilibria toward the inactive ES in vitro. The cellular activity of the ES-stabilizing mutants was used as an indirect measure of the conformational equilibria within the functional cellular context. Compensatory mutations that restore the ground-state conformation were used to control for changes in sequence. Using this approach, we show that the ESs of two regulatory RNAs from HIV-1, the transactivation response element (TAR) and the Rev response element (RRE), likely form in cells with abundances comparable to those measured in vitro, and their targeted stabilization may provide an avenue for developing anti-HIV therapeutics.
... Another tool, called RNAHelices [94], predicts RNA structures from a set of helices and classifies the structures using a helix-based distance. Finally, another tool that also bases its prediction on helices, called RNAprofiling [185], classifies the predicted structures according to the frequency of the helices in the structures. ...
In this thesis, we propose new algorithms and tools for predicting the secondary structures of RNAs and of RNA complexes, including particular motifs that are difficult to predict, such as pseudoknots. RNA structure prediction remains a difficult task, and the existing tools, although numerous, do not always give good predictions. To predict these structures more accurately, we propose algorithms that: i) predict the k-best structures; ii) combine several prediction models, to benefit from the advantages of each; iii) can take into account user constraints and structural biological data such as SHAPE. We developed three tools: BiokoP for predicting the secondary structures of a single RNA, and RCPred and C-RCPred for predicting the secondary structures of RNA complexes. BiokoP proposes several optimal and suboptimal structures by combining two prediction models, the MFE energy model and the MEA probabilistic model. This combination is achieved through multi-objective mathematical programming, where each model is treated as an objective function. To this end, we developed a generic algorithm that returns the k-best Pareto curves of a bi-objective integer linear program. RCPred, based on the MFE model, proposes several suboptimal structures. It builds on the many existing tools for predicting the secondary structures of single RNAs and RNA-RNA interactions, taking already predicted secondary structures and interactions as input. The goal of RCPred is to find the best possible combinations among these inputs. C-RCPred is a new version of RCPred that takes into account user constraints and structural biological data (SHAPE, PARS and DMS).
C-RCPred is based on a multi-objective algorithm, where the different objectives correspond to the MFE model, compliance with user constraints, and agreement with structural biological data.
... These two approaches are unified by the method of sampling suboptimal secondary structures from the Boltzmann ensemble [5]. Moreover, these Boltzmann sample predictions often reveal that the suboptimal secondary structures are organized into two or more distinct modalities [4,20,31,42]. ...
... We investigate the validity of this assumption for three known riboswitches which have proposed base pairing models grounded in experimental data. Using the RNA suboptimal structure cluster analysis tool RNAStructProfiling [31], we find that there exist riboswitches whose ligand-bound thermodynamics are accessible to Boltzmann sampling, as represented in Figure 1b. ...
... The ability to sample from the Boltzmann ensemble [5] allows one to examine structural alternatives to the MFE structure in proportion to their estimated probability under the NNTM. Hence, it can be used to search for signals of multimodality in RNA secondary structures [9,20,31,33,43]. A Boltzmann sample is a set of structures, typically of size 1000, sampled from the Boltzmann (i.e., Gibbs) ensemble [5]. ...
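What "sampled from the Boltzmann ensemble" means can be shown on a toy ensemble small enough to enumerate: each structure is drawn with probability proportional to exp(−ΔG/RT). The Python sketch below assumes a hand-made list of structures with hypothetical free energies in kcal/mol at 37 °C; the function names are also hypothetical. Real tools do not enumerate the ensemble; they draw samples by stochastic traceback through the partition function, so this only illustrates the target distribution.

```python
import math
import random

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(energies, temp_c=37.0):
    """Normalized Boltzmann weights exp(-dG/RT) for an enumerated ensemble."""
    kt = R_KCAL * (temp_c + 273.15)
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    z = sum(weights)  # partition function (up to the constant shift)
    return [w / z for w in weights]


def boltzmann_sample(structures, energies, size=1000, seed=None):
    """Draw `size` structures with replacement, proportional to probability."""
    probs = boltzmann_probabilities(energies)
    rng = random.Random(seed)
    return rng.choices(structures, weights=probs, k=size)


structures = ["((....))", "(......)", "........"]  # hypothetical toy ensemble
energies = [-3.0, -1.5, 0.0]  # hypothetical free energies, kcal/mol
sample = boltzmann_sample(structures, energies, size=1000, seed=0)
print({s: sample.count(s) for s in structures})
```

Sampling with replacement is the point: the MFE structure is typically drawn many times, while suboptimal alternatives still appear in proportion to their estimated probability, which is what makes multimodality visible.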
A riboswitch is a type of RNA molecule that regulates important biological functions by changing structure, typically under ligand-binding. We assess the extent to which these ligand-bound structural alternatives are present in the Boltzmann sample, a standard RNA secondary structure prediction method, for three riboswitch test cases. We use the cluster analysis tool RNAStructProfiling to characterize the different modalities present among the suboptimal structures sampled. We compare these modalities to the putative base pairing models obtained from independent experiments using NMR or fluorescence spectroscopy. We find, somewhat unexpectedly, that profiling the Boltzmann sample captures evidence of ligand-bound conformations for two of three riboswitches studied. Moreover, this agreement between predicted modalities and experimental models is consistent with the classification of riboswitches into thermodynamic versus kinetic regulatory mechanisms. Our results support cluster analysis of Boltzmann samples by RNAStructProfiling as a possible basis for de novo identification of thermodynamic riboswitches, while highlighting the challenges for kinetic ones.
... [33] RNA profiling denoises the set of observed base pairs in Boltzmann-sampled structures (structures sampled according to the Boltzmann distribution) to identify significant combinations of base pairs that dominate low-energy RNA secondary structures [34]. The model highlights critical relations at the substructure level, yielding crucial information for molecular biologists. ...
... We also use the online servers of Sfold (Srna module) [23], Mfold [18], RNAstructure [19,20], RNAshapes [32], and RNA profiling [34] with the default parameters to predict the alternative secondary structures for the above 61 test cases (Table S1). ...
RNA is a versatile macromolecule with the ability to fold into and interconvert between multiple functional conformations. The elucidation of the RNA folding landscape, especially the knowledge of alternative structures, is critical to uncover the physical mechanism of RNA functions. Here, we introduce a helix-based strategy for RNA folding landscape partition and alternative secondary structure determination. The benchmark test of 27 RNAs involving alternative stable structures shows that the model has the ability to divide the whole landscape into distinct partitions at the secondary structure level and predict the representative structures for each partition. Furthermore, the predicted structures and equilibrium populations of metastable conformations for the 2′-dG-sensing riboswitch reveal an allosteric conformational switch that depends on transcript length, consistent with the experimental study, indicating the importance of metastable states for RNA-based gene regulation. The model delivers a starting point for the landscape-based strategy toward the RNA folding mechanism and functions.
... Others have developed alternative metrics calculated from partition functions to evaluate the accessibility of the possible secondary structures. These include IPknot, Sfold [29], RNAshapes [30] and RNA profiling [31]. However, although efforts in the field have focused on exploring different metrics, researchers have not reached a consensus on which metrics should be broadly adopted. ...
Background
RNA molecules play many crucial roles in living systems. The spatial complexity that exists in RNA structures determines their cellular functions. Therefore, understanding RNA folding conformations, in particular, RNA secondary structures, is critical for elucidating biological functions. Existing literature has focused on RNA design as either an RNA structure prediction problem or an RNA inverse folding problem where free energy has played a key role.
Results
In this research, we propose a Positive-Unlabeled data-driven framework termed ENTRNA. Other than free energy and commonly studied sequence and structural features, we propose a new feature, Sequence Segment Entropy (SSE), to measure the diversity of RNA sequences. ENTRNA is trained and cross-validated using 1024 pseudoknot-free RNAs and 1060 pseudoknotted RNAs from the RNASTRAND database, respectively. To test the robustness of ENTRNA, the models are further blind tested on 206 pseudoknot-free and 93 pseudoknotted RNAs from the PDB database. For pseudoknot-free RNAs, ENTRNA has 86.5% sensitivity on the training dataset and 80.6% sensitivity on the testing dataset. For pseudoknotted RNAs, ENTRNA shows 81.5% sensitivity on the training dataset and 71.0% on the testing dataset. To test the applicability of ENTRNA to long, structurally complex RNAs, we collect 5 laboratory synthetic RNAs ranging from 1618 to 1790 nucleotides. ENTRNA is able to predict the foldability of 4 of these RNAs.
Conclusion
In this article, we reformulate the RNA design problem as a foldability prediction problem which is to predict the likelihood of the co-existence of a sequence-structure pair. This new construct has the potential for both RNA structure prediction and the inverse folding problem. In addition, this new construct enables us to explore data-driven approaches in RNA research.
... In addition to low sensitivity to rare transient structures, the quantity and complexity of the data are low relative to the number of parameters needed to define an ensemble, and the nature of the dependence of chemical reactivity on structure is not entirely understood [139]. Nevertheless, methodologies are being developed to interpret chemical probing data in terms of secondary structure ensembles [139–142], using strategies similar to those developed to generate NMR 3D ensembles [143] (Supplementary Box 1). This in turn provides insights into how the cellular environment redistributes ensembles compared with the ensembles determined in vitro. ...
RNAs fold into 3D structures that range from simple helical elements to complex tertiary structures and quaternary ribonucleoprotein assemblies. The functions of many regulatory RNAs depend on how their 3D structure changes in response to a diverse array of cellular conditions. In this Review, we examine how the structural characterization of RNA as dynamic ensembles of conformations, which form with different probabilities and at different timescales, is improving our understanding of RNA function in cells. We discuss the mechanisms of gene regulation by microRNAs, riboswitches, ribozymes, post-transcriptional RNA modifications and RNA-binding proteins, and how the cellular environment and processes such as liquid–liquid phase separation may affect RNA folding and activity. The emerging RNA-ensemble–function paradigm is changing our perspective and understanding of RNA regulation, from in vitro to in vivo and from descriptive to predictive. The functions of many regulatory RNAs depend on how their 3D structure changes in response to cellular conditions. Recent studies have revealed that RNA exists as a dynamic ensemble of conformations, which form with different probabilities in different cellular conditions and thus modulate RNA function.