SHAPE directed RNA folding

Abstract

The spatial structure of RNA plays an important role in genome regulation because it critically influences the interaction with proteins and other nucleic acids. Knowledge of RNA structure is therefore crucial for understanding various biological processes. Chemical and enzymatic probing methods provide information on flexibility and accessibility at nucleotide resolution. As these methods are becoming a frequently used technology to experimentally determine RNA structure, for instance in terms of nucleotide-wise flexibility of the RNA backbone (SHAPE), there is increasing demand for efficient and accurate computational methods that incorporate probing data into secondary structure prediction. Existing implementations of such algorithms, e.g. those provided by the ViennaRNA Package, typically yield excellent prediction results for short sequences. However, accuracy decreases to between 40% and 70% for long RNA sequences due to imperfect thermodynamic parameters and inherent limitations of the secondary structure model, which does not account for tertiary interactions, pseudoknots, ligand binding, or kinetic traps. To close this gap in the available computational tools, we have developed a framework for incorporating probing data into the structure prediction algorithms of the ViennaRNA Package by means of soft constraints in order to improve prediction quality.
Dominik Luntzer (1), Ronny Lorenz (1), Ivo L. Hofacker (1,6,7), Peter F. Stadler (1,2,3,4,7,8), Michael T. Wolfinger (1,5)
(1) Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
(2) Bioinformatics Group of the Department of Computer Science, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
(3) Interdisciplinary Center for Bioinformatics of the University of Leipzig
(4) Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
(5) Center for Integrative Bioinformatics Vienna (CIBIV) & Department of Biochemistry and Molecular Cell Biology, Max F. Perutz Laboratories, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria
(6) Bioinformatics and Computational Biology Research Group, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
(7) Center for RNA in Technology and Health, Univ. Copenhagen, Grønnegårdsvej 3, Frederiksberg C, Denmark
(8) Fraunhofer Institute for Cell Therapy and Immunology, Perlickstrasse 1, D-04103 Leipzig, Germany
(9) Santa Fe Institute, 1399 Hyde Park Road, Santa Fe NM 87501, USA
1. Introduction
[1] Luntzer, D., Lorenz, R., Hofacker, I. L., Stadler, P. F., and Wolfinger, M. T. (2015). SHAPE directed RNA folding. Submitted.
[2] Deigan, K. E., Li, T. W., Mathews, D. H., and Weeks, K. M. (2009). Accurate SHAPE-directed RNA structure determination. PNAS, 106, 97–102.
[3] Zarringhalam, K., Meyer, M. M., Dotu, I., Chuang, J. H., and Clote, P. (2012). Integrating chemical footprinting data into RNA secondary structure prediction. PLOS ONE, 7(10).
[4] Washietl, S., Hofacker, I. L., Stadler, P. F., and Kellis, M. (2012). RNA folding with soft constraints: reconciliation of probing data and thermodynamic secondary structure prediction. Nucleic Acids Research, 40(10), 4261–4272.
Contact: ronny@tbi.univie.ac.at - http://www.tbi.univie.ac.at
5. Acknowledgements
This work was partly funded by the Austrian Science
Fund (FWF) project “RNA regulation of the
transcriptome” (F43), Deutsche
Forschungsgemeinschaft (DFG) STA 850/15-1 and
the German ministry of science (0316165C as part of
the e:Bio initiative).
2. Methods
3. Availability
4. Results
The spatial structure of RNA plays an important role
in genome regulation because it critically influences
the interaction with proteins and other nucleic acids.
Knowledge of RNA structure is therefore crucial for
understanding various biological processes.
Chemical and enzymatic probing methods provide
information on flexibility and accessibility
at nucleotide resolution. As these methods are
becoming a frequently used technology to
experimentally determine RNA structure, for instance
in terms of nucleotide-wise flexibility of the RNA
backbone (SHAPE), there is increasing demand for
efficient and accurate computational methods that
incorporate probing data into secondary structure
prediction. Existing implementations of such algorithms,
e.g. those provided by the ViennaRNA Package, typically
yield excellent prediction results for short sequences.
However, accuracy decreases to between 40% and
70% for long RNA sequences due to imperfect
thermodynamic parameters and inherent
limitations of the secondary structure model, which
does not account for tertiary interactions, pseudoknots,
ligand binding, or kinetic traps. To close this gap in the
available computational tools, we have developed a
framework for incorporating probing data into the structure
prediction algorithms of the ViennaRNA Package by
means of soft constraints in order to improve
prediction quality.
Soft constraints guide the folding prediction by adding
position- or motif-specific pseudo-energy
contributions to the free energies of certain loop
motifs. This amounts to a distortion of the equilibrium
ensemble of structures in favour of those that are
consistent with experimental data. Mismatching motifs
are penalized by positive contributions, while
structure patterns where prediction and experiment
agree receive a “bonus” in the form of a
negative pseudo-energy. Current methods for guided
secondary structure prediction by means of soft
constraints mainly focus on the incorporation of
SHAPE reactivity data. For that purpose, three
algorithms are available that aim to transform
normalized SHAPE reactivity data into meaningful
pseudo-energy terms.
The first approach to SHAPE directed RNA
folding uses the simple linear ansatz
$\Delta G_{\text{SHAPE}}(i) = m \ln(\text{SHAPE reactivity}(i) + 1) + b$
to convert SHAPE reactivity values to pseudo-energies
whenever a nucleotide contributes to a
stacked pair (Deigan et al., 2009). A positive slope $m$
penalizes high reactivities in paired regions, while a
negative intercept $b$ results in a confirmatory "bonus"
free energy for correctly predicted base pairs.
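The conversion of reactivities into pseudo-energies with a slope m and an intercept b can be sketched in a few lines of Python. This is a minimal illustration, not the ViennaRNA implementation: it assumes the logarithmic form m ln(reactivity + 1) + b, the function name `deigan_pseudo_energy` is hypothetical, the default values of `m` and `b` are illustrative assumptions rather than values taken from the poster, and treating negative reactivities as missing data is likewise an assumption.

```python
import math


def deigan_pseudo_energy(reactivity, m=1.8, b=-0.6):
    """Pseudo-energy (kcal/mol) for one nucleotide in a stacked pair.

    Assumption: negative reactivities mark missing data and contribute 0.
    The defaults for m (slope) and b (intercept) are illustrative only.
    """
    if reactivity < 0:
        return 0.0
    return m * math.log(reactivity + 1.0) + b


# Hypothetical normalized reactivities; the last entry is a missing value.
reactivities = [0.0, 0.3, 1.5, -999.0]
energies = [deigan_pseudo_energy(r) for r in reactivities]

for r, e in zip(reactivities, energies):
    print(f"reactivity {r:8.2f} -> pseudo-energy {e:6.3f} kcal/mol")
```

With a positive slope and a negative intercept, an unreactive nucleotide receives the full bonus b, while a highly reactive one is penalized.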
A more consistent model considers nucleotide-wise
experimental data in all loop energy evaluations
(Zarringhalam et al., 2012). First, the observed SHAPE
reactivity of nucleotide $i$ is converted into the probability
$q_i$ that position $i$ is unpaired by means of a non-linear map.
Then pseudo-energies of the form
$\Delta G_{\text{SHAPE}}(x, i) = \beta \, |x_i - q_i|$
are computed, where $x_i = 1$ if position $i$ is considered
unpaired and $x_i = 0$ if it is involved in a base pair. While
the parameter $\beta$ serves as a scaling factor, the magnitude of
discrepancy between prediction and experimental
observation is represented by $|x_i - q_i|$.
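A pseudo-energy that scales the discrepancy between a structure and the probing-derived unpaired probabilities can be sketched as follows. The sketch assumes the form beta * |x_i - q_i|, with x_i = 1 for an unpaired position and x_i = 0 for a paired one, so that the penalty vanishes when structure and data agree. The function name, the default `beta`, and the example values are hypothetical, and the non-linear map from reactivities to q_i is deliberately left out.

```python
def zarringhalam_pseudo_energies(structure, q_unpaired, beta=0.8):
    """Per-position pseudo-energies beta * |x_i - q_i| for a dot-bracket structure.

    structure:  dot-bracket string, '.' marks an unpaired position
    q_unpaired: probabilities of being unpaired, None for missing data
    beta:       scaling factor (illustrative default, an assumption)
    """
    if len(structure) != len(q_unpaired):
        raise ValueError("structure and q_unpaired must have equal length")
    energies = []
    for symbol, q in zip(structure, q_unpaired):
        if q is None:
            # Assumption: positions without data receive no pseudo-energy.
            energies.append(0.0)
            continue
        x = 1.0 if symbol == "." else 0.0
        energies.append(beta * abs(x - q))
    return energies


# Hypothetical toy example.
structure = "((..))"
q = [0.1, 0.2, 0.9, None, 0.5, 0.0]
per_position = zarringhalam_pseudo_energies(structure, q)
total = sum(per_position)
print(per_position, total)
```

Unlike the stacking-based scheme, every position with data contributes here, whether it is paired or unpaired in the candidate structure.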
A third, very distinct approach to incorporating SHAPE
reactivity data to guide secondary structure prediction was
suggested by Washietl et al. (2012). Here, the authors
phrase the choice of the bonus energies as an optimization
problem that aims to find a perturbation vector $\vec{\epsilon}$ of
pseudo-energies that minimizes the discrepancy between
the observed and predicted probabilities to see particular
nucleotides unpaired, $q_i$ and $p_i(\vec{\epsilon})$, respectively. At the same
time, the perturbation should be as small as possible.
The tradeoff between the two goals is naturally defined by
the relative uncertainties inherent in the SHAPE
measurements and the energy model.
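The tradeoff between a small perturbation and a small discrepancy can be made concrete with a toy objective function. The sketch below assumes an objective of the form sum(eps_i^2) / tau^2 + sum((p_i(eps) - q_i)^2) / sigma^2, where tau and sigma stand for the uncertainties of the energy model and of the measurements. It evaluates the predicted unpaired probabilities by brute-force enumeration of a hypothetical two-structure ensemble; a real implementation would use partition function recursions instead. All names and numbers are illustrative assumptions.

```python
import math

RT = 0.0019872 * 310.15  # gas constant (kcal/(mol K)) times 37 degrees C in Kelvin


def unpaired_probabilities(ensemble, epsilon):
    """Boltzmann-weighted probability of each position being unpaired.

    ensemble: list of (dot_bracket, free_energy) pairs, a toy stand-in
              for the full structure ensemble
    epsilon:  perturbation energy added for every unpaired position
    """
    weights = []
    for structure, energy in ensemble:
        perturbation = sum(epsilon[i] for i, s in enumerate(structure) if s == ".")
        weights.append(math.exp(-(energy + perturbation) / RT))
    z = sum(weights)
    return [
        sum(w for (structure, _), w in zip(ensemble, weights) if structure[i] == ".") / z
        for i in range(len(epsilon))
    ]


def objective(ensemble, epsilon, q_observed, tau=1.0, sigma=1.0):
    """Size of the perturbation plus discrepancy to the observed probabilities."""
    p = unpaired_probabilities(ensemble, epsilon)
    perturbation_term = sum(e * e for e in epsilon) / tau ** 2
    discrepancy_term = sum((pi - qi) ** 2 for pi, qi in zip(p, q_observed)) / sigma ** 2
    return perturbation_term + discrepancy_term


# Hypothetical ensemble: open chain versus a single closing pair.
ensemble = [(".....", 0.0), ("(...)", -1.0)]
q_observed = [1.0, 1.0, 1.0, 1.0, 1.0]  # probing suggests everything is unpaired

baseline = objective(ensemble, [0.0] * 5, q_observed)
perturbed = objective(ensemble, [-0.5, 0.0, 0.0, 0.0, -0.5], q_observed)
print(baseline, perturbed)
```

In this toy case a moderate bonus on the two terminal positions lowers the objective, because the gain in agreement with the data outweighs the cost of the perturbation.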
Fig. 1 RNA secondary structure of yeast tRNA-asp
annotated with experimentally determined SHAPE
reactivities.
Fig. 2 RNA secondary structure of yeast tRNA-asp with indication
where SHAPE reactivity derived pseudo-energies using the
Deigan et al. approach are applied in the folding prediction. As a
consequence of the method, pseudo-energies are applied twice
for pairs inside a helix, and just once for terminal pairs.
Fig. 3 RNA secondary structure of yeast tRNA-asp where
structural parts that receive a bonus (malus) energy according to
the Zarringhalam et al. method are highlighted in red (paired
nucleotides) and blue (unpaired nucleotides).
[Colour scale: SHAPE reactivity, 0.0 to 1.83]
Fig. 4 RNA secondary structure of yeast tRNA-asp with
highlighted unpaired nucleotides that receive a perturbation
pseudo-energy according to the method of Washietl et al.
All three methods outlined above have been
implemented in the ViennaRNA Package 2.2.
Additional functionalities are available through the
API of the ViennaRNA Library and the command line
interface of RNAfold and RNAalifold. The novel
standalone tool RNApvmin dynamically estimates a
vector of pseudo-energies according to the method of
Washietl et al. (2012), which can be used to guide
structure prediction with RNAfold. This setup makes
it easy for users to incorporate alternative ways of
computing bonus energies, or to use the software
with other types of probing data. Guided structure
prediction has also been included in the
ViennaRNA Websuite, a web server providing an
interface to many tools of the ViennaRNA Package,
available at http://rna.tbi.univie.ac.at.
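As a rough sketch of how probing data might be handed to the command line interface, the following Python snippet writes a reactivity file and assembles an RNAfold call without executing it. Several points are assumptions to be checked against the RNAfold manual of the installed version: the three-column file layout (position, nucleotide, reactivity), the use of a negative value for missing data, and the `--shape` and `--shapeMethod` options. The helper names and the example data are hypothetical.

```python
import tempfile
from pathlib import Path


def write_shape_file(path, sequence, reactivities):
    """Write one line per nucleotide: 1-based position, nucleotide, reactivity.

    Assumption: missing values (None) are encoded as a negative number.
    """
    lines = []
    for position, (nucleotide, value) in enumerate(zip(sequence, reactivities), start=1):
        score = -999.0 if value is None else value
        lines.append(f"{position} {nucleotide} {score:.3f}")
    Path(path).write_text("\n".join(lines) + "\n")


def rnafold_command(shape_path, method="D"):
    """Argument list for RNAfold; option names are assumed, not verified here."""
    return ["RNAfold", f"--shape={shape_path}", f"--shapeMethod={method}"]


# Hypothetical four-nucleotide example with one missing measurement.
shape_path = Path(tempfile.gettempdir()) / "example.shape"
write_shape_file(shape_path, "GCCA", [0.12, None, 1.5, 0.0])
command = rnafold_command(shape_path, method="D")
print(" ".join(command))
```

The resulting argument list could be passed to `subprocess.run` together with the sequence on standard input, provided the ViennaRNA Package is installed.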
We applied all three methods to a benchmark set
containing 24 triples of sequences, their known
reference structures, and corresponding SHAPE
data. In this set, reference structures were either
derived from X-ray crystallography experiments, or
predicted by comparative sequence analysis. The
use of SHAPE data driven soft constraints leads to
improved prediction results for many RNAs.
However, for some of the RNAs in our
benchmark data the additional pseudo-energy terms
impair prediction results. Two factors may contribute:
first, experimental data may be inaccurate, and
second, the underlying energy model excludes
pseudoknotted structures, which are present in
approximately half of the benchmarked RNAs. From
our benchmark we conclude that none of the three
implemented methods consistently outperforms the
other two in terms of prediction performance.
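The poster does not state which measure of prediction performance was used. One common choice, shown here purely as an assumption, is to compare the base pairs of a predicted structure with those of the reference and report sensitivity and positive predictive value (PPV). The helper names and the example structures are hypothetical, and plain dot-bracket notation cannot represent the pseudoknots mentioned above.

```python
def base_pairs(structure):
    """Return the set of (i, j) base pairs of a pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def sensitivity_and_ppv(reference, predicted):
    """Fraction of reference pairs recovered, and fraction of predicted pairs that are correct."""
    ref, pred = base_pairs(reference), base_pairs(predicted)
    true_positives = len(ref & pred)
    sensitivity = true_positives / len(ref) if ref else 0.0
    ppv = true_positives / len(pred) if pred else 0.0
    return sensitivity, ppv


# Hypothetical example: the prediction misses the outermost pair.
reference = "(((...)))"
predicted = ".((...))."
sensitivity, ppv = sensitivity_and_ppv(reference, predicted)
print(f"sensitivity = {sensitivity:.2f}, PPV = {ppv:.2f}")
```

Comparing such scores with and without soft constraints, per RNA, is one way to see where probing data helps and where it impairs the prediction.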
Fig. 5 Secondary structure prediction of E. coli 5S rRNA from
our benchmark data set. (A) Reference structure, (B) prediction
by RNAfold with default parameters, and (C) prediction by
RNAfold with guiding pseudo-energies obtained from
SHAPE reactivity data using RNApvmin. Grey nucleotides
correspond to missing SHAPE reactivity data. Colour scale:
normalized SHAPE reactivity, 0 to 1.6.