
Probing assisted RNA folding


Abstract

The advent of high-throughput structural probing methods like SHAPE-seq or PARS has spurred the development of computational techniques that combine such experimental data with traditional thermodynamic structure prediction approaches. The conversion of RNA structure probing data into pairing probabilities and pseudo energies is not trivial. We present a novel approach for improved RNA secondary structure prediction from probing data by means of a self-consistent probabilistic strategy to derive soft constraints for the underlying energy model.
Michael T. Wolfinger (1,3), Ronny Lorenz (1), Andrea Tanzer (1), Ivo L. Hofacker (1,2)
(1) Department of Theoretical Chemistry, University of Vienna, Währingerstraße 17, 1090 Wien, Austria
(2) Bioinformatics and Computational Biology Research Group, University of Vienna, Währingerstraße 17, 1090 Wien, Austria
(3) Center for Anatomy and Cell Biology, Medical University of Vienna, Währingerstraße 13, 1090 Wien, Austria
1. Introduction
[1] Lorenz, R., Luntzer, D., Hofacker, I.L., Stadler, P.F., Wolfinger, M.T. (2016), SHAPE directed RNA folding. Bioinformatics 32, 145-14.
[2] Lorenz, R., Wolfinger, M.T., Tanzer, A. and Hofacker, I.L. (2016), Predicting RNA Structures from Sequence and Probing Data. Methods.
[3] Eddy, S.R. (2014), Computational analysis of conserved RNA secondary structure in transcriptomes and genomes. Annu Rev Biophys 43, 433-456.
[4] Wu, Y. et al. (2014), Improved prediction of RNA secondary structure by integrating the free energy model with restraints derived from experimental probing data. Nucleic Acids Res 43, 7247-7259.
Contact: michael.wolfinger@univie.ac.at - http://www.tbi.univie.ac.at
6. Acknowledgements
This work was partly funded by the Austrian Science Fund FWF project "RNA regulation of the transcriptome" (F43) and the Austrian/French project "RNA-Lands" (FWF-I-1804-N28 and ANR-14-CE34-0011).
2. Soft Constraints
4. A self-consistent method
RNA function is determined by RNA structure, and therefore knowledge of the spatial conformation of RNA is an asset for understanding various biological processes such as RNA regulation. Chemical and enzymatic probing methods, e.g. SHAPE, allow for fine-grained assessment of RNA structure at nucleotide resolution. The advent of high-throughput structural probing methods like SHAPE-seq or PARS has spurred the development of computational techniques that incorporate such experimental data as auxiliary information.
Popular RNA folding algorithms, as implemented e.g. in the ViennaRNA Package, typically yield excellent prediction results for short sequences. However, accuracy decreases to between 40% and 70% for long RNA sequences. This is due to imperfections in the thermodynamic parameters and to inherent limitations of the secondary structure model, which does not account for tertiary interactions, pseudoknots, ligand binding, or kinetic traps.
Fig. 1 RNA secondary structure of E. coli 5S rRNA annotated with experimentally determined SHAPE reactivities.
To close this gap in available prediction tools, we have developed a framework for incorporating probing data into the structure prediction algorithms of the ViennaRNA Package by means of soft constraints. Soft constraints guide the folding prediction by adding position- or motif-specific pseudo-energy contributions to the free energies of certain loop motifs.
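The effect of position-specific pseudo-energies can be illustrated with a minimal sketch. It uses a toy Nussinov-style model in which every base pair scores -1 and there are no loop energies, so it is not the ViennaRNA energy model. The function name `fold_with_soft_constraints`, the pair score, the minimum loop size, and the `unpaired_penalty` list are all assumptions made for illustration.

```python
from functools import lru_cache

# Canonical and wobble pairs allowed in this toy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fold_with_soft_constraints(seq, unpaired_penalty, pair_energy=-1.0, min_loop=3):
    """Return (energy, dot-bracket) minimising pair energies plus pseudo-energies.

    unpaired_penalty[i] is a pseudo-energy that is added whenever position i
    stays unpaired. Intended for short sequences only (plain recursion).
    """
    n = len(seq)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Optimal (energy, pairs) for the subsequence seq[i..j], inclusive.
        if i > j:
            return 0.0, ()
        # Case 1: position i stays unpaired and pays its pseudo-energy.
        energy, pairs = best(i + 1, j)
        result = (energy + unpaired_penalty[i], pairs)
        # Case 2: position i pairs with k, enclosing at least min_loop bases.
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                e_in, p_in = best(i + 1, k - 1)
                e_out, p_out = best(k + 1, j)
                cand = (pair_energy + e_in + e_out, ((i, k),) + p_in + p_out)
                if cand[0] < result[0]:
                    result = cand
        return result

    energy, pairs = best(0, n - 1)
    structure = ["."] * n
    for i, k in pairs:
        structure[i], structure[k] = "(", ")"
    return energy, "".join(structure)
```

For the sequence `GGGAAAUCCC`, a penalty of 2.0 on leaving the U at index 6 unpaired makes the optimum pair that U with the G at index 2 instead of leaving it in the hairpin loop. The sequence itself is unchanged; only the pseudo-energy shifts the preferred structure.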
We have recently implemented previously published methods for incorporating SHAPE probing data in the ViennaRNA Package [1,2]. Two of these methods rely on an ad-hoc conversion of SHAPE reactivities into pseudo free energies. Later approaches first convert reactivities into probabilities of being (un)paired and then compute pseudo energies from these probabilities.
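A well-known example of such an ad-hoc conversion is the log-linear form proposed by Deigan et al., ΔG_SHAPE(i) = m ln(reactivity_i + 1) + b, which is applied to nucleotides in stacked pairs. The sketch below only evaluates this formula. The defaults m = 1.8 and b = -0.6 kcal/mol are assumed here as commonly used values and are not stated on this poster. Treating negative reactivities as missing data is likewise an assumption.

```python
import math


def deigan_pseudo_energy(reactivity, m=1.8, b=-0.6):
    """Pseudo free energy (kcal/mol) from a normalized SHAPE reactivity.

    m (slope) and b (intercept) are tunable; the defaults are assumptions.
    """
    if reactivity < 0:
        # Assumption: negative values mark positions without usable data.
        return 0.0
    return m * math.log(reactivity + 1.0) + b
```

With these defaults, unreactive positions receive a small bonus (negative pseudo-energy) for being stacked, while highly reactive positions are penalized.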
3. Probabilistic RNA folding
The conversion of RNA structure probing data into pairing probabilities is not trivial. In fact, the reactivity values measured for different structural contexts, e.g. paired and unpaired bases, overlap considerably and therefore cannot be cleanly separated. Thus, there is no simple way to infer from the raw readout alone whether a given nucleotide is paired.
Using probing data for a reference RNA with known secondary structure, however, allows one to derive distributions of the measured reactivity values for different structural contexts. These distributions can then be fitted to a probability density model to compute for each nucleotide $i$ the conditional probability $P(d_i \mid \pi_i)$ to observe a reactivity $d_i$ given its structural context $\pi_i$. Eddy [3] already suggested converting these conditional probabilities into a pseudo energy
$$\Delta G(\pi_i \mid d_i) = -RT \ln P(d_i \mid \pi_i)$$
To overcome the dependency on training data, and thus abandon the ad-hoc assumptions inherent in previous methods, the reactivity distribution of each distinguished structure context must be inferred from the data itself. Therefore, the observed reactivities, i.e. the mixture of distributions, need to be deconvoluted. This, however, is far from easy.
Nevertheless, under the assumption that the RNA's structure ensemble is dominated by a single conformation, we can use computed equilibrium probabilities $P(\pi_i)$ and parameterized model distributions $P(d_i \mid \pi_i, \theta)$ to obtain
$$P(d_i \mid \theta) = \sum_{\pi_i} P(d_i \mid \pi_i, \theta)\, P(\pi_i),$$
i.e. the probability to observe $d_i$. Moreover, the likelihood to observe the measured pattern of probing data is
$$L(\theta) = \prod_i P(d_i \mid \theta).$$
Now, the aim is to find a parameterization $\theta$ for the distributions that maximizes $L(\theta)$. We propose to iteratively use the posterior probabilities $P(\pi_i \mid d_i)$ as soft constraints to update the probabilities $P(\pi_i)$ for the next round. This strategy is then applied until convergence.
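The iteration described above can be sketched as follows, under strong simplifying assumptions that are not taken from the poster: only two contexts (paired, unpaired) are distinguished, each context is modeled by an exponential density, and the soft-constrained folding step is abstracted into a hypothetical callback `refold` that maps posteriors to new pairing probabilities. The names `self_consistent_fit` and `refold`, the starting rates, and the stopping rule are illustrative choices.

```python
import math


def self_consistent_fit(reactivities, p_paired, refold, rounds=200, tol=1e-9):
    """Alternate between fitting context densities and updating pairing probabilities.

    reactivities : measured value d_i for every nucleotide
    p_paired     : initial probability that nucleotide i is paired
    refold       : hypothetical stand-in for a soft-constrained folding run;
                   takes posteriors P(paired | d_i), returns new P(paired)
    Returns (rate_paired, rate_unpaired, posteriors, log_likelihood).
    """
    rate_p, rate_u = 2.0, 1.0   # arbitrary starting rates of the two exponentials
    eps = 1e-6
    post, loglik = [], float("-inf")
    for _ in range(rounds):
        post, loglik = [], 0.0
        for d, p in zip(reactivities, p_paired):
            p = min(max(p, eps), 1.0 - eps)            # keep both contexts possible
            joint_p = p * rate_p * math.exp(-rate_p * d)
            joint_u = (1.0 - p) * rate_u * math.exp(-rate_u * d)
            loglik += math.log(joint_p + joint_u)      # log of the mixture density
            post.append(joint_p / (joint_p + joint_u)) # posterior of "paired"
        # Weighted maximum-likelihood update of the exponential rates.
        w_p = sum(post)
        w_u = len(post) - w_p
        new_p = w_p / max(sum(w * d for w, d in zip(post, reactivities)), 1e-12)
        new_u = w_u / max(sum((1.0 - w) * d for w, d in zip(post, reactivities)), 1e-12)
        shift = abs(new_p - rate_p) + abs(new_u - rate_u)
        rate_p, rate_u = new_p, new_u
        p_paired = refold(post)                        # probabilities for next round
        if shift < tol:
            break
    return rate_p, rate_u, post, loglik
```

In a real setting the callback would run a partition function computation with the posteriors applied as soft constraints. Because the priors change between rounds, the log-likelihood in this sketch is not guaranteed to increase monotonically, which is why the stopping rule watches the parameters instead.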
[Fig. 1 graphic: secondary structure drawing of the 5S rRNA sequence with position labels 10 to 120; color scale "Normalized SHAPE reactivity" from 0 to 1.6.]
[Fig. 2 graphic: "Distribution of SHAPE reactivity" in three panels (full data set, paired nucleotides only, unpaired nucleotides only); x-axis: Reactivity (0 to 3), y-axis: Density; legend: Measurement, Fit.]
The pseudo energy derived from these conditional probabilities is applied to each derivation step of the MFE algorithm in which a nucleotide is added to a growing substructure. As a consequence, the soft constrained MFE structure maximizes the probability to observe the probing data.
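A minimal sketch of this conversion is given below, assuming the pseudo energy takes the form -RT ln P(d_i | π_i). The exponential density is only a stand-in for whatever model was fitted to the reference data, and the rates used in the usage note are hypothetical. The floor on the density is an added safeguard against log(0).

```python
import math

RT = 0.0019872 * 310.15  # gas constant in kcal/(mol K) times 37 degrees C in kelvin


def exponential_pdf(x, rate):
    """Exponential density; a placeholder for a fitted reactivity model."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def pseudo_energy(reactivity, pdf, floor=1e-12):
    """Return -RT ln P(d | context) in kcal/mol for one nucleotide."""
    return -RT * math.log(max(pdf(reactivity), floor))
```

With a hypothetical rate of 5 for paired and 1 for unpaired nucleotides, a low reactivity such as 0.1 yields a lower pseudo energy for the paired context, while a high reactivity such as 1.5 favors the unpaired context.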
Fig. 2 SHAPE reactivity distributions for E. coli 23S rRNA.
Using Bayes' rule, the posterior probability of a structure context $\pi_i$ given its reactivity $d_i$ is
$$P(\pi_i \mid d_i) = \frac{P(d_i \mid \pi_i)\, P(\pi_i)}{P(d_i)}.$$
Still, the probabilities $P(\pi_i)$ and $P(d_i)$ are unknown a priori and can only be estimated from training data. An ad-hoc implementation of this idea is provided by the RME program [4].
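As a small worked example of Bayes' rule over a finite set of structural contexts, the sketch below computes the posterior from per-context likelihoods and priors. The function name `posterior` and the numbers in the example are hypothetical and are not taken from RME or from the poster.

```python
def posterior(likelihoods, priors):
    """Apply Bayes' rule over a finite set of structural contexts.

    likelihoods[c] is P(d | c) and priors[c] is P(c);
    the result maps each context c to P(c | d).
    """
    # P(d), obtained by summing over all contexts (law of total probability).
    evidence = sum(likelihoods[c] * priors[c] for c in priors)
    if evidence <= 0.0:
        raise ValueError("reactivity has zero probability under every context")
    return {c: likelihoods[c] * priors[c] / evidence for c in priors}


# Hypothetical numbers: the observed reactivity is three times as likely
# for a paired base, and both contexts are equally probable a priori.
example = posterior({"paired": 3.0, "unpaired": 1.0},
                    {"paired": 0.5, "unpaired": 0.5})
```

The quality of the result depends entirely on the priors and likelihoods fed in, which is the reason the text stresses that they must be estimated from training data.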
In this self-consistent framework it is even possible to optimize for a combination of individual probing techniques, such as Pb(II), SHAPE, DMS, or PARS. In a PARS experiment, for example, rather than computing log-odds of nuclease S1 and V1 treatment, their intensities can be independently converted into pseudo energy contributions.
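One way to read "independently converted" is that each technique contributes its own pseudo-energy term and the terms are summed. The sketch below makes that assumption explicit: it treats the techniques as conditionally independent given the structural context and uses the form -RT ln P for each term. Neither assumption is confirmed by the poster, and the function name `combined_pseudo_energy` is hypothetical.

```python
import math

RT = 0.0019872 * 310.15  # kcal/mol at 37 degrees C


def combined_pseudo_energy(likelihoods, floor=1e-12):
    """Sum -RT ln P(d^x | context) over probing techniques x for one nucleotide.

    likelihoods maps a technique name (e.g. "S1", "V1") to P(d^x | context).
    Techniques without a measurement are simply left out of the mapping.
    The floor guards against log(0).
    """
    return sum(-RT * math.log(max(p, floor)) for p in likelihoods.values())
```

Because the terms are additive, a technique can be added or dropped per nucleotide without touching the contributions of the others.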
5. Outlook
[Timeline graphic, 1970 to 2016. Categories: thermodynamic structure prediction, probabilistic structure prediction, RNA kinetic folding, high-throughput probing, guided structure prediction.]
1971 (Tinoco et al.): First framework for secondary structure evaluation
1978 (Waterman and Smith): Mathematical analysis of RNA secondary structures
1978 (Nussinov et al.): DP algorithm for base pair (BP) maximization
1981 (Zuker & Stiegler): DP algorithm for MFE prediction
1984 (Steger et al.), 1989 (Zuker): DP algorithm for suboptimals
1990 (McCaskill): Partition function and BP probabilities
1994 (Hofacker et al.): Vienna RNA Package: RNAfold, RNAinverse, partition function and BP probability implementation
1998 (Tabaska et al.): Pseudoknots & base triplets
1999 (Wuchty et al.): RNAsubopt: suboptimals within energy band
1999 (Rivas & Eddy): Pknots: DP algorithm for RNA structures with pseudoknots
1999 (Knudsen et al.): Structure prediction using SCFG
2000 (Isambert et al.): KineFold: stochastic simulation of folding kinetics with pseudoknots
2001 (Ding et al.): Sfold: statistical prediction of RNA accessibility
2002 (Flamm et al.): Barrier trees: energy landscape decomposition
2003 (Knudsen et al.): Pfold: SCFG for consensus structure
2004 (Giegerich et al.): RNAshapes: ADP shape abstraction
2006 (Do et al.): CONTRAfold: SCFG folding for single sequences
2006 (Mückstein et al.): RNAup: RNA-RNA interaction considering accessibility
2008 (Parisien et al.): MC-Fold/MC-Sym
2010 (Hofacker et al.): BarMap: co-transcriptional folding kinetics
2010 (Kertesz et al.): PARS: Parallel Analysis of RNA Structure
2010 (Reuter et al.): RNAstructure
2011 (Bernhart et al.): RNAplfold: efficient calculation of RNA accessibility
2011 (Lucks et al.): SHAPE-seq
2012 (Washietl et al.): RNAPbfold
2013 (Ouyang et al.): SeqFold
2014 (Siegfried et al.): SHAPE-MaP
2014 (Homan et al.): RING-MaP (single molecule)
2015 (Ramani et al.): Proximity ligation
2015 (Tang et al.): StructureFold
2016 (Lorenz et al.): ViennaRNA with soft constraints