# Maximum expected accurate structural neighbors of an RNA secondary structure.

**ABSTRACT** Since RNA molecules regulate genes and control alternative splicing by allostery, that is, by switching between two distinct secondary structures, it is important to develop algorithms to predict RNA conformational switches. It has recently emerged that RNA secondary structure can be more accurately predicted by computing the maximum expected accurate (MEA) structure, rather than the minimum free energy (MFE) structure. The MEA structure S has maximum score $2\sum_{(i,j)\in S} p_{i,j} + \sum_{i\ \mathrm{unpaired}} q_i$, where the first sum is taken over all base pairs (i,j) belonging to S, the second sum is taken over all unpaired positions in S, and $p_{i,j}$ [resp. $q_i$] is the probability that i,j are paired [resp. i is unpaired] in the ensemble of low energy structures. Results: Given an arbitrary RNA secondary structure S0 for an RNA nucleotide sequence $a = a_1,\ldots,a_n$, we say that another secondary structure S of a is a k-neighbor of S0 if the base pair distance between S0 and S is k. Here we describe the algorithm RNAborMEA, which for an arbitrary initial structure S0 and for all values 0 ≤ k ≤ n, computes the secondary structure MEA(k), having maximum expected accuracy over all k-neighbors of S0. We apply our algorithm to the class of purine riboswitches.


Feng Lou

Laboratoire pour la Recherche en Informatique (LRI)

Université Paris-Sud XI

Orsay cedex, France

lou@lri.fr

Peter Clote

Biology Department

Boston College

Chestnut Hill, MA 02467, USA

clote@bc.edu

Abstract: Since RNA molecules regulate genes and control alternative splicing by allostery, that is, by switching between two distinct secondary structures, it is important to develop algorithms to predict RNA conformational switches. It has recently emerged that RNA secondary structure can be more accurately predicted by computing the maximum expected accurate (MEA) structure, rather than the minimum free energy (MFE) structure. The MEA structure S has maximum score $2\sum_{(i,j)\in S} p_{i,j} + \sum_{i\ \mathrm{unpaired}} q_i$, where the first sum is taken over all base pairs (i,j) belonging to S, and the second sum is taken over all unpaired positions in S, and where $p_{i,j}$ [resp. $q_i$] is the probability that i,j are paired [resp. i is unpaired] in the ensemble of low energy structures.

Results: Given an arbitrary RNA secondary structure S0 for an RNA nucleotide sequence $a = a_1,\ldots,a_n$, we say that another secondary structure S of a is a k-neighbor of S0, if the base pair distance between S0 and S is k. Here we describe the algorithm RNAborMEA, which for an arbitrary initial structure S0 and for all values 0 ≤ k ≤ n, computes the secondary structure MEA(k), having maximum expected accuracy over all k-neighbors of S0. We apply our algorithm to the class of purine riboswitches.

Availability: Source code for RNAborMEA can be downloaded from http://bioinformatics.bc.edu/clotelab/RNAborMEA/.

Keywords: RNA secondary structure, maximum expected accurate structure, minimum free energy structure, riboswitch, conformational switch

I. INTRODUCTION

RNA secondary structure conformational switches play an essential role in a number of biological processes, such as regulation of viral replication [3] and of viroid replication [4], regulation of R1 plasmid copy number in E. coli by hok/sok system [9], transcriptional and translational gene regulation in prokaryotes by riboswitches [15], regulation of alternative splicing in eukaryotes [7], and stress-responsive gene regulation in humans [20], etc. Due to their biological importance, several groups have developed algorithms that attempt to recognize conformational switches – in particular, riboswitches [1], [5], [2]. Most current approaches heavily depend on detecting the aptamer, located in the 5′-portion of the riboswitch, that is responsible for high affinity binding of a particular ligand (KD ≈ 5 nM) that triggers the conformational change [16]. Computational tools that rely on stochastic context free grammars, such as Infernal and CMFinder, have been trained to recognize riboswitch aptamers; in particular, Infernal was used to create the Rfam database [12], which includes 14 families of riboswitch aptamers.

Upon ligand binding, the 3′-portion of a riboswitch, called the expression platform, undergoes a conformational change, forming a stem-loop that aborts transcription (thus effecting transcriptional regulation, as in guanine riboswitches) or that sequesters the Shine-Dalgarno sequence (thus effecting translational regulation, as in thiamine pyrophosphate (TPP) riboswitches). Due to evolutionary pressure for accurate ligand recognition, there is generally high sequence identity in the aptamer region; however, there is low sequence identity (data not shown) in the expression platform. Since current riboswitch detection algorithms do not attempt to predict the location of the expression platform, we have developed a tool, RNAborMEA, that yields information concerning alternative structures of a given RNA sequence. This tool can suggest the presence of a conformational switch; however, much more work must be done to actually produce a riboswitch gene finder, part of the difficulty being due to the fact that riboswitch aptamers contain pseudoknots that cannot be captured by secondary structure.

In previous work [10], [11], we described a novel program RNAbor to predict RNA conformational switches. For a given secondary structure S of a given RNA sequence s, the secondary structure T of s is said to be a k-neighbor of S, if the base pair distance between S and T is k. (Base pair distance is defined later.) Given an arbitrary initial structure S0, for all values 0 ≤ k ≤ n, the program RNAbor [10] computes the secondary structure MFE(k), having minimum free energy over all k-neighbors of S0. In this paper, we extend our work by computing for all values 0 ≤ k ≤ n, the secondary structure MEA(k), having maximum expected accuracy (MEA) over all k-neighbors of S0.

In [8], Do et al. introduced¹ the notion of maximum expected accurate (MEA) secondary structure, determined as follows: (i) compute base pairing probabilities p(i,j) using a trained stochastic context free grammar; (ii) compute probabilities $q(i) = 1 - \sum_{i<j} p(i,j) - \sum_{j<i} p(j,i)$ that position i does not pair; (iii) using a dynamic programming algorithm similar to that of Nussinov and Jacobson [19], determine that secondary structure S having maximum score $\sum_{(i,j)\in S} 2\alpha \cdot p(i,j) + \sum_{i\ \mathrm{unpaired}} \beta q_i$, where the first sum is over paired positions (i,j) of S and the second sum is over positions i located in loop regions of S, and where α,β > 0 are parameters with default values 1. Subsequently Kiryu et al. [13] computed the MEA structure by replacing the stochastic context free grammar computation of base pairs in (i) by using McCaskill's algorithm [17], which computes the Boltzmann base pairing probabilities

$$p(i,j) = \frac{\sum_{S:\,(i,j)\in S} \exp(-E(S)/RT)}{\sum_{S} \exp(-E(S)/RT)} \qquad (1)$$

Here E(S) is the free energy of secondary structure S, with respect to the Turner energy model [22], R is the universal gas constant, and T is absolute temperature. Thus p(i,j) is the sum of the Boltzmann factors of all secondary structures that contain the fixed base pair (i,j), divided by the partition function, which latter is the sum of Boltzmann factors of all secondary structures. In fact, Kiryu et al. [13] describe an algorithm to compute the MEA structure common to all RNAs in a given alignment. Later, Lu et al. [14] rediscovered Kiryu's method; in addition, Lu et al. computed suboptimal MEA structures by implementing an analogue of Zuker's method [23].

¹ Miyazawa [18] first introduced the concept of maximum expected accuracy in the context of sequence alignment of two amino acid sequences $a_1,\ldots,a_n$ and $b_1,\ldots,b_m$. Miyazawa computed the Boltzmann pair probability $P(a_i,b_j)$ that $a_i$ is aligned with $b_j$, for all 1 ≤ i ≤ n and 1 ≤ j ≤ m, and then used $P(a_i,b_j)$ as the similarity score between $a_i$ and $b_j$ in the usual Needleman-Wunsch and Smith-Waterman algorithms. Do et al. lifted this method to the context of RNA secondary structure prediction.

In this paper, we extend the MEA technique to compute the maximum expected accurate k-neighbor of a given RNA secondary structure S0; i.e. that secondary structure which has maximum expected accuracy over all structures that differ from S0 by exactly k base pairs.

[Figure 1: secondary structure drawings of MEA(0) (left) and MEA(61) (right).]

Figure 1. Sample outputs from RNAborMEA on a TPP-riboswitch, AF269819/1811-1669. We took the TPP riboswitch aptamer from the Rfam database [12], then extracted right-flanking nucleotides from the corresponding EMBL file. Displayed from left to right are the structures MEA(0) and MEA(61) (the structure MEA(52) is similar to that of MEA(61) and corresponds to a free energy local minimum in the left figure.) The structure MEA(61) had the highest MEA score over all structural neighbors, including the original structure S0 = MEA(0), and had free energy, −46.0 kcal/mol, that was equal to that of the initial structure S0 = MEA(0), which is the minimum free energy structure for the given sequence.

    void RNAborMEA(s, S0, M)
    // M(i, j, k) is the score of the MEA k-neighbor of S0
    initialize M(i, j, k) = 0 for all 1 ≤ i, j ≤ n, 0 ≤ k ≤ n
    compute p_{i,j} for all 1 ≤ i ≤ j ≤ n (McCaskill's algorithm)
    for i = 1 to n
        q_i = 1 − Σ_{j ≥ i} p_{i,j} − Σ_{j < i} p_{j,i}
        // q_i is the Boltzmann probability that i is unpaired
    for d = 0 to n − 1 // d is diagonal offset value
        for i = 1 to n − d
            j = i + d
            for k = 0 to n
                if j − i ≤ θ // θ unpaired bases in hairpin
                    if k == 0
                        M(i, j, k) = Σ_{r=i..j} β·q_r
                    else // k > 0
                        break // for all k > 0, M(i, j, k) = 0
                else if j − i == θ + 1
                    if (i, j) ∈ S0 then
                        M(i, j, 0) = 2α·p_{i,j} + Σ_{r=i+1..j−1} β·q_r
                        M(i, j, 1) = Σ_{r=i..j} β·q_r
                        break // for k > 1, M(i, j, k) = 0
                    else // (i, j) ∉ S0
                        M(i, j, 0) = Σ_{r=i..j} β·q_r
                        if basePair(i, j) then
                            M(i, j, 1) = 2α·p_{i,j} + Σ_{r=i+1..j−1} β·q_r
                        break // for other cases M(i, j, k) = 0
                else // j − i > θ + 1: continued in Figure 4

Figure 3. Initial portion of pseudocode for RNAborMEA algorithm, which continues in Figure 4. Given RNA sequence $s = s_1,\ldots,s_n$ of length n, initial secondary structure S0 of s, RNAborMEA computes for all values of 0 ≤ k ≤ n that structure S with base pair distance k from S0, which maximizes the value $M(i,j,k) = \sum_{(i,j)\in S} 2\alpha p_{i,j} + \sum_{i\ \mathrm{unpaired\ in}\ S} \beta q_i$. The pseudocode actually computes only values M(i,j,k) for all i,j,k; the MEA structures are obtained by backtracing. This algorithm clearly runs in O(n^4) time with O(n^3) space.

                else // j − i > θ + 1
                    max = 0 // M(i, j, k) = max of the following cases
                    // Case 1: j unpaired in S[i, j]
                    b0 = dBP(S0[i, j − 1], S0[i, j])
                    // b0 = 1 if j is paired in S0[i, j], else 0
                    if k ≥ b0
                        val = M(i, j − 1, k − b0) + β·q_j
                        if val > max then
                            max = val
                            index = (0, k − b0, 0) // backtracking: j unpaired, k0 = k − b0
                    // Case 2: (i, j) ∈ S
                    if basePair(i, j) // check if i, j can pair
                        b1 = dBP(S0[i + 1, j − 1] ∪ {(i, j)}, S0[i, j])
                        if k ≥ b1
                            val = M(i + 1, j − 1, k − b1) + 2α·p_{i,j}
                            if val > max then
                                max = val
                                index = (i, 0, k − b1) // backtracking: (i, j) ∈ S, k1 = k − b1
                    // Case 3: (r, j) ∈ S for some i < r < j
                    for r = i + 1 to j − θ − 1
                        if basePair(r, j)
                            b2 = dBP(S0[i, r − 1] ∪ S0[r + 1, j − 1] ∪ {(r, j)}, S0[i, j])
                            for k0 = 0 to k − b2
                                k1 = k − b2 − k0 // k0 + k1 + b2 = k
                                val = M(i, r − 1, k0) + M(r + 1, j − 1, k1) + 2α·p_{r,j}
                                if val > max then
                                    max = val
                                    index = (r, k0, k1) // backtracking: (r, j) ∈ S
                    M(i, j, k) = max
                    M(j, i, k) = index

Figure 4. Pseudocode for RNAborMEA algorithm. Given RNA sequence $s = s_1,\ldots,s_n$ of length n, initial secondary structure S0 of s, RNAborMEA computes for all values of 0 ≤ k ≤ n that structure S with base pair distance k from S0, which maximizes the value $M(i,j,k) = \sum_{(i,j)\in S} 2\alpha p_{i,j} + \sum_{i\ \mathrm{unpaired\ in}\ S} \beta q_i$. The pseudocode actually computes only values M(i,j,k) for all i,j,k; the MEA structures are obtained by backtracing. This algorithm clearly runs in O(n^4) time with O(n^3) space.

II. PRELIMINARIES

Recall the definition of RNA secondary structure.

Definition 1: A secondary structure S on RNA sequence $a_1,\ldots,a_n$ is defined to be a set of ordered pairs (i,j), such that 1 ≤ i < j ≤ n and the following are satisfied.

1) Watson-Crick or GU wobble pairs: If (i,j) belongs to S, then pair $(a_i,a_j)$ must be one of the following canonical base pairs: (A,U), (U,A), (G,C), (C,G), (G,U), (U,G).

2) Threshold requirement: If (i,j) belongs to S, then j − i > θ, where θ, generally taken to be equal to 3, is the minimum number of unpaired bases in a hairpin loop; i.e. there must be at least θ unpaired bases in a hairpin loop.

3) Nonexistence of pseudoknots: If (i,j) and (k,ℓ) belong to S, then it is not the case that i < k < j < ℓ.

4) No base triples: If (i,j) and (i,k) belong to S, then j = k; if (i,j) and (k,j) belong to S, then i = k.
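The four conditions of Definition 1 translate directly into a membership test. The following is a minimal sketch in Python, not part of RNAborMEA itself; the function name `is_secondary_structure` and the representation of a structure as a collection of 1-indexed `(i, j)` tuples are assumptions made for illustration.

```python
from typing import Iterable, Tuple

Pair = Tuple[int, int]

# Canonical base pairs allowed by condition 1 of Definition 1.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def is_secondary_structure(seq: str, pairs: Iterable[Pair], theta: int = 3) -> bool:
    """Return True if `pairs` satisfies Definition 1 on `seq` (1-indexed positions)."""
    pair_list = list(pairs)
    n = len(seq)
    used = set()  # positions that already take part in some base pair
    for i, j in pair_list:
        if not (1 <= i < j <= n):
            return False
        if (seq[i - 1], seq[j - 1]) not in CANONICAL:  # condition 1
            return False
        if j - i <= theta:                             # condition 2
            return False
        if i in used or j in used:                     # condition 4: no base triples
            return False
        used.update((i, j))
    for i, j in pair_list:                             # condition 3: no pseudoknots
        for k, l in pair_list:
            if i < k < j < l:
                return False
    return True
```

For the hypothetical sequence `GGGAAAUCCC`, the nested pairs `{(1, 10), (2, 9), (3, 8)}` pass the check, whereas the crossing pairs `{(1, 8), (3, 10)}` are rejected by condition 3.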

The preceding definition provides for an inductive construction of the set of all secondary structures for a given RNA sequence $a_1,\ldots,a_n$. For all values of d = 0,...,n and all values of i = 1,...,n − d, the collection $S_{i,i+d}$ of all secondary structures for $a_i,\ldots,a_{i+d}$ is defined as follows. If 0 ≤ d ≤ θ, then $S_{i,i+d} = \{\emptyset\}$; i.e. the only secondary structure for $a_i,\ldots,a_{i+d}$ is the empty structure containing no base pairs (due to the requirement that all hairpins contain at least θ unpaired bases). If d > θ and $S_{i,j}$ has been defined by recursion for all i ≤ j < i + d, then, writing j = i + d,

• Any secondary structure of $a_i,\ldots,a_{i+d-1}$ is a secondary structure for $a_i,\ldots,a_{i+d}$, in which $a_{i+d}$ is unpaired.

• If $a_i,a_j$ can form a Watson-Crick or wobble base pair, then for any secondary structure S for $a_{i+1},\ldots,a_{i+d-1}$, the structure S ∪ {(i,j)} is a secondary structure for $a_i,\ldots,a_{i+d}$.

• For any intermediate value i + 1 ≤ r ≤ j − θ − 1, if $a_r,a_j$ can form a Watson-Crick or wobble base pair, then for any secondary structure S for $a_i,\ldots,a_{r-1}$ and any secondary structure T for $a_{r+1},\ldots,a_{j-1}$, the structure S ∪ T ∪ {(r,j)} is a secondary structure for $a_i,\ldots,a_{i+d}$.

Given two secondary structures S,T, we define the base pair distance between S,T, denoted by dBP(S,T), to be the cardinality of the symmetric difference of S,T; i.e. dBP(S,T) = |(S − T) ∪ (T − S)|.
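The inductive construction and the base pair distance can be made concrete for short sequences. The sketch below is an illustration only: it enumerates every secondary structure by the three-case recursion (exponential in general, so suitable only for toy inputs) and tallies how many k-neighbors a given structure has. The names `enumerate_structures`, `base_pair_distance` and `neighbor_counts` are hypothetical and do not come from the RNAborMEA source code.

```python
from collections import Counter
from functools import lru_cache
from typing import FrozenSet, Iterable, List, Tuple

Pair = Tuple[int, int]
Structure = FrozenSet[Pair]
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def enumerate_structures(seq: str, theta: int = 3) -> List[Structure]:
    """List all secondary structures of `seq` via the three-case recursion."""

    def can_pair(i: int, j: int) -> bool:
        return (seq[i - 1], seq[j - 1]) in CANONICAL

    @lru_cache(maxsize=None)
    def structs(i: int, j: int) -> Tuple[Structure, ...]:
        if j - i <= theta:                      # too short: only the empty structure
            return (frozenset(),)
        out = list(structs(i, j - 1))           # case 1: j unpaired
        if can_pair(i, j):                      # case 2: (i, j) is a base pair
            out.extend(s | {(i, j)} for s in structs(i + 1, j - 1))
        for r in range(i + 1, j - theta):       # case 3: (r, j), i < r <= j - theta - 1
            if can_pair(r, j):
                out.extend(s | t | {(r, j)}
                           for s in structs(i, r - 1)
                           for t in structs(r + 1, j - 1))
        return tuple(out)

    return list(structs(1, len(seq)))


def base_pair_distance(s: Iterable[Pair], t: Iterable[Pair]) -> int:
    """dBP(S, T): cardinality of the symmetric difference of S and T."""
    return len(set(s) ^ set(t))


def neighbor_counts(seq: str, s0: Iterable[Pair], theta: int = 3) -> Counter:
    """Number of k-neighbors of s0 for each base pair distance k."""
    return Counter(base_pair_distance(s, s0) for s in enumerate_structures(seq, theta))
```

Because the three cases are mutually exclusive (position j is unpaired, paired with i, or paired with some r > i), each structure is produced exactly once. For the toy sequence `GGAAACC` with S0 = {(1,7), (2,6)}, the sketch finds six structures, of which two are 1-neighbors of S0.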

III. ALGORITHM DESCRIPTION

Given an RNA sequence $a = a_1,\ldots,a_n$, a secondary structure S0 of a, and a maximum desired value Kmax ≤ n, the RNAborMEA algorithm computes, for each 1 ≤ i < j ≤ n and each 0 ≤ k ≤ Kmax ≤ n, the maximum score

$$M(i,j,k) = \sum_{(i,j)\in S} 2\alpha p_{i,j} + \sum_{i\ \mathrm{unpaired}} \beta q_i$$

where the first sum is taken over all base pairs (i,j) belonging to S, the second sum is taken over all unpaired positions in S, and where $p_{i,j}$ [resp. $q_i$] is the probability that i,j are paired [resp. i is unpaired] in the ensemble of low energy structures, and α,β > 0 are weights. Our computational experiments, as in Figure 2, were carried out with default values of 1 for α,β. (See Equation 1 for the formal definition of Boltzmann base pairing probability $p_{i,j}$.)

[Figure 2: two plots for TPP-riboswitch, AF269819/1811-1669. Left panel: free energy in kcal/mol against base pair distance from MFE structure. Right panel: MEA score against base pair distance from MFE structure.]

Figure 2. (Left) Free energy for all MEA(k) structural neighbors, 0 ≤ k ≤ 99, of the TPP-riboswitch, AF269819/1811-1669, described in the previous figure. Clearly, MEA(0) and MEA(61) have the least energy, −46.0 kcal/mol, and MEA(61) has the largest MEA score, 134.555, of all secondary structures for the given RNA sequence. (Right) MEA score for all MEA(k) structural neighbors, 0 ≤ k ≤ 99, of the TPP-riboswitch, AF269819/1811-1669, described in the previous figure. Clearly, MEA(61) has the largest MEA score, 134.555, of all secondary structures for the given RNA sequence.
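The score that RNAborMEA maximizes can be evaluated directly for any single structure. The sketch below is a hedged illustration: it assumes the base pairing probabilities are supplied by the caller as a dictionary keyed by `(i, j)` with i < j (in the paper they come from McCaskill's algorithm), and the function names `unpaired_probabilities` and `mea_score` are made up for this example.

```python
from typing import Dict, Iterable, Tuple

Pair = Tuple[int, int]


def unpaired_probabilities(n: int, p: Dict[Pair, float]) -> Dict[int, float]:
    """q_i = 1 - sum_{j>i} p_{i,j} - sum_{j<i} p_{j,i} for positions 1..n."""
    q = {i: 1.0 for i in range(1, n + 1)}
    for (i, j), prob in p.items():
        q[i] -= prob
        q[j] -= prob
    return q


def mea_score(n: int, pairs: Iterable[Pair], p: Dict[Pair, float],
              q: Dict[int, float], alpha: float = 1.0, beta: float = 1.0) -> float:
    """Score sum_{(i,j) in S} 2*alpha*p_{i,j} + sum_{i unpaired in S} beta*q_i."""
    pair_set = set(pairs)
    paired_positions = {x for pair in pair_set for x in pair}
    pair_term = sum(2 * alpha * p.get(pair, 0.0) for pair in pair_set)
    unpaired_term = sum(beta * q[i] for i in range(1, n + 1)
                        if i not in paired_positions)
    return pair_term + unpaired_term
```

As a worked example with made-up numbers, take n = 5 and a single probability p(1,5) = 0.8. Then q_1 = q_5 = 0.2 and q_2 = q_3 = q_4 = 1, so with α = β = 1 the structure {(1,5)} scores 2·0.8 + 3 = 4.6, while the empty structure scores 0.2 + 3 + 0.2 = 3.4.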

The dynamic programming computation of M(i,j,k) is performed by recursion on increasing values of j − i for all values 1 ≤ i ≤ j ≤ n and 0 ≤ k ≤ Kmax. The value of M(i,j,k), stored in the upper triangular portion of matrix M, will involve taking the maximum over three cases, which correspond to the inductive construction of all secondary structures on $a_i,\ldots,a_j$, as described in the previous section. At the same time, the value M(j,i,k), stored in the lower triangular portion of matrix M, will consist of a triple r,k0,k1 of numbers, such that the following approximately² holds. (i) If r = 0 then M(i,j,k) is maximized by a k-neighbor S of S0[i,j] for the subsequence $a_i,\ldots,a_j$ in which $a_j$ is unpaired. In this case, k0 = k and k1 = 0. (ii) If r = i, then M(i,j,k) is maximized by a k-neighbor S of S0[i,j] for the subsequence $a_i,\ldots,a_j$ in which base pair (i,j) ∈ S. In this case, k0 = 0 and k1 = k − 1. (iii) If i < r ≤ j − θ − 1 then M(i,j,k) is maximized by a k-neighbor S of S0[i,j] for the subsequence $a_i,\ldots,a_j$ in which base pair (r,j) ∈ S. The left portion of S, which is S[i,r − 1], will be a k0-neighbor of S0[i,r − 1], while the right portion of S, which is S[r,j], must contain the base pair (r,j) and itself be a k1-neighbor of S0[r,j]. In summary, the values r,k0,k1 will be used in computing the traceback, where the maximum expected accurate structure that is a k-neighbor of S0[i,j] will be constructed by one of the following: (i) the MEA k-neighbor of S0[i,j − 1], in the event that $a_j$ is unpaired in [i,j]; (ii) the MEA (k−1)-neighbor of S0[i+1,j−1], in the event that $a_i,a_j$ form a base pair; (iii) the MEA k0-neighbor of S0[i,r − 1] and the MEA k1-neighbor of S0[r,j], where k0 + k1 = k, in the event that $a_r,a_j$ form a base pair.

² In this section, we provide the motivating idea. The actual algorithm description, which deviates slightly from the description here, is given in the next section and in Figures 3 and 4.
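The distance bookkeeping above is only approximate because restricting S0 to a subinterval can drop base pairs of S0, and adding a new pair can either agree or disagree with S0. The pseudocode of Figure 4 accounts for this with the correction terms b0, b1 and b2. The following sketch computes those terms for small examples; the helper names `restrict`, `b_unpaired`, `b_enclosing` and `b_split` are assumptions introduced here for illustration.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]


def restrict(s0: Set[Pair], i: int, j: int) -> Set[Pair]:
    """S0[i, j]: the base pairs of S0 that lie entirely within positions i..j."""
    return {(x, y) for (x, y) in s0 if i <= x and y <= j}


def dbp(s: Set[Pair], t: Set[Pair]) -> int:
    """Base pair distance: size of the symmetric difference."""
    return len(s ^ t)


def b_unpaired(s0: Set[Pair], i: int, j: int) -> int:
    """b0 of Figure 4: distance incurred by leaving j unpaired on [i, j]."""
    return dbp(restrict(s0, i, j - 1), restrict(s0, i, j))


def b_enclosing(s0: Set[Pair], i: int, j: int) -> int:
    """b1 of Figure 4: distance incurred by adding the enclosing pair (i, j)."""
    return dbp(restrict(s0, i + 1, j - 1) | {(i, j)}, restrict(s0, i, j))


def b_split(s0: Set[Pair], i: int, r: int, j: int) -> int:
    """b2 of Figure 4: distance incurred by pairing r with j, i < r < j."""
    candidate = restrict(s0, i, r - 1) | restrict(s0, r + 1, j - 1) | {(r, j)}
    return dbp(candidate, restrict(s0, i, j))
```

For instance, with S0 = {(1,10), (2,9)} on the interval [1,10], leaving position 10 unpaired costs b0 = 1 and keeping the pair (1,10) costs b1 = 0, whereas with S0 = {(1,9)} adding the pair (1,10) costs b1 = 2, since (1,9) is lost and (1,10) is new. This is why "k1 = k − 1" in case (ii) holds only approximately.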

Pseudocode for the algorithm RNAborMEA is given in Figures 3 and 4. An array M of size n × n × Kmax is required to store the MEA scores in M(i,j,k) for all subsequences [i,j] and all base pair distances 0 ≤ k ≤ Kmax between structures S[i,j] and initially given structure S0[i,j]. For 1 ≤ i ≤ j ≤ n and all 0 ≤ k ≤ Kmax, the pseudocode in Figure 4 stores a value of the form (x,y,z) in the lower triangular portion, M(j,i,k), of the array. Here, x = 0 indicates that the optimal structure on [i,j], i.e. having maximum MEA score over all k-neighbors of S0[i,j], is obtained by not pairing j with any nucleotide in [i,j]; for values x > 0, hence x ∈ [i,j − θ − 1], the optimal k-neighbor of S0[i,j] is obtained by pairing x with j. The values y,z correspond to the values k0,k1, such that: (i) if x = 0, then the optimal k-neighbor of S0[i,j] is obtained by first computing the optimal k0-neighbor of S0[i,j − 1], where k0 = k − b0, then leaving j unpaired; (ii) if x = i, then the optimal k-neighbor of S0[i,j] is obtained by first computing the optimal k1-neighbor of S0[i+1,j−1], where k1 = k − b1, then adding the enclosing base pair (i,j); (iii) if x = r ∈ [i + 1,j − θ − 1], then the optimal k-neighbor of S0[i,j] is obtained by first computing the optimal k0-neighbor of S0[i,r − 1] as well as the optimal k1-neighbor of S0[r+1,j−1], then adding the base pair (r,j). This last calculation must be done over all values k0,k1 such that k0 + k1 + b2 = k. Using the values M(j,i,k) = (x,y,z), the traceback can be easily computed by recursion; see Figure 5 for pseudocode of traceback.
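The fill and traceback steps can be put together in a compact sketch. This is an illustration under stated assumptions rather than the authors' implementation: the base pairing probabilities `p` are supplied by the caller instead of being computed by McCaskill's algorithm, scores and traceback triples are kept in two dictionaries rather than in the two halves of one array, the special case j − i = θ + 1 is absorbed into the general recursion, and infeasible entries are represented by −∞ rather than 0. The function name `rnabor_mea` is hypothetical.

```python
import math
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def restrict(s0: Set[Pair], i: int, j: int) -> Set[Pair]:
    """S0[i, j]: the base pairs of S0 lying entirely inside positions i..j."""
    return {(x, y) for (x, y) in s0 if i <= x and y <= j}


def rnabor_mea(seq: str, s0: Set[Pair], p: Dict[Pair, float], theta: int = 3,
               alpha: float = 1.0, beta: float = 1.0
               ) -> Dict[int, Tuple[float, Set[Pair]]]:
    """Return {k: (score, structure)} for every k that admits a k-neighbor of s0."""
    n = len(seq)
    q = {i: 1.0 for i in range(1, n + 1)}      # q[i]: probability that i is unpaired
    for (i, j), prob in p.items():
        q[i] -= prob
        q[j] -= prob
    score: Dict[Tuple[int, int, int], float] = {}                 # feasible M(i, j, k)
    back: Dict[Tuple[int, int, int], Tuple[int, int, int]] = {}   # triples (r, k0, k1)

    def can_pair(i: int, j: int) -> bool:
        return (seq[i - 1], seq[j - 1]) in CANONICAL

    def get(i: int, j: int, k: int) -> float:
        if k < 0:
            return -math.inf
        if j < i:                               # empty interval: only distance 0
            return 0.0 if k == 0 else -math.inf
        return score.get((i, j, k), -math.inf)

    for d in range(n):                          # d = j - i, increasing
        for i in range(1, n - d + 1):
            j = i + d
            s0_ij = restrict(s0, i, j)
            for k in range(n + 1):
                if d <= theta:                  # only the empty structure fits
                    if k == len(s0_ij):
                        score[(i, j, k)] = sum(beta * q[r] for r in range(i, j + 1))
                    continue
                best, choice = -math.inf, None
                b0 = len(restrict(s0, i, j - 1) ^ s0_ij)          # case 1: j unpaired
                val = get(i, j - 1, k - b0) + beta * q[j]
                if val > best:
                    best, choice = val, (0, k - b0, 0)
                if can_pair(i, j):                                # case 2: (i, j) in S
                    b1 = len((restrict(s0, i + 1, j - 1) | {(i, j)}) ^ s0_ij)
                    val = get(i + 1, j - 1, k - b1) + 2 * alpha * p.get((i, j), 0.0)
                    if val > best:
                        best, choice = val, (i, 0, k - b1)
                for r in range(i + 1, j - theta):                 # case 3: (r, j) in S
                    if not can_pair(r, j):
                        continue
                    cand = restrict(s0, i, r - 1) | restrict(s0, r + 1, j - 1) | {(r, j)}
                    b2 = len(cand ^ s0_ij)
                    for k0 in range(k - b2 + 1):
                        k1 = k - b2 - k0
                        val = (get(i, r - 1, k0) + get(r + 1, j - 1, k1)
                               + 2 * alpha * p.get((r, j), 0.0))
                        if val > best:
                            best, choice = val, (r, k0, k1)
                if choice is not None:
                    score[(i, j, k)], back[(i, j, k)] = best, choice

    def traceback(i: int, j: int, k: int, pairs: Set[Pair]) -> None:
        if (i, j, k) not in back:               # short or empty interval: no pairs
            return
        r, k0, k1 = back[(i, j, k)]
        if r > 0:                               # j pairs with r
            pairs.add((r, j))
            traceback(r + 1, j - 1, k1, pairs)
            traceback(i, r - 1, k0, pairs)
        else:                                   # j unpaired
            traceback(i, j - 1, k0, pairs)

    result: Dict[int, Tuple[float, Set[Pair]]] = {}
    for k in range(n + 1):
        if (1, n, k) in score:
            pairs: Set[Pair] = set()
            traceback(1, n, k, pairs)
            result[k] = (score[(1, n, k)], pairs)
    return result
```

As a toy check with made-up probabilities, take the sequence `GGAAACC`, S0 = {(1,7), (2,6)} and p(1,7) = 0.6, p(2,6) = 0.5, p(1,6) = 0.2, p(2,7) = 0.1. The sketch then reports S0 itself for k = 0, the structure {(1,7)} for k = 1, the empty structure for k = 2 and {(1,6)} for k = 3, with no k-neighbors for larger k. Using −∞ for infeasible entries avoids confusing "no k-neighbor exists" with a genuine score, at the cost of departing from the zero initialization in Figure 3.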

In a manner similar³ to the pseudocode of Figures 3 and 4, we have developed a program to compute the pseudo-partition function values

$$Z^{(k)}_{i,j} = \sum_{S\ \mathrm{on}\ [i,j],\ d_{BP}(S_0,S)=k} \exp(\mathrm{MEA}(S)/RT)$$

We then graphed the Boltzmann probabilities $Z^{(k)}_{1,n} / Z_{1,n}$ as well as the uniform probabilities $N^{(k)}_{1,n} / N_{1,n}$, where $N^{(k)}_{1,n}$ is the number of k-neighbors, and $N_{1,n}$ is the total number of secondary structures. When RT = n, which normalizes the MEA score to a maximum of 1, it appears that the Boltzmann distribution is the same as the uniform distribution, as illustrated in figures and data that cannot be shown, due to space restrictions.

³ Essentially, one replaces the operation of taking the maximum by a summation, and one replaces the MEA score by the pseudo-Boltzmann factor exp(MEA(S)/RT).

IV. RESULTS

We extended the RNAborMEA program to support structural constraints; i.e. where structures are required to contain certain designated base pairs or for certain designated positions to be unpaired. Taking the B. subtilis XPT riboswitch, whose GENE ON and GENE OFF structures were experimentally determined by in-line probing [21], we applied RNAborMEA to all purine riboswitch aptamers from the Rfam database [12], where additional flanking nucleotides were extracted from the EMBL database. Using the structural alignment program Gardenia [6], we determined values k0,k1 for the most structurally similar structures MEA(k0) to the XPT GENE OFF structure, resp. MEA(k1) to the XPT GENE ON structure. Due to space constraints, we can only show one sample result in Figure 6.

Quite to our surprise, there appears to be little to no correlation between the structures MFE(k) output by RNAbor [10] and the structures MEA(k) output by our current program RNAborMEA. Thus our current program provides a different manner of probing increasingly distant structural neighbors of a given RNA structure.

ACKNOWLEDGEMENTS

Research of P. Clote and F. Lou was funded by the Digiteo Foundation, in the form of a Digiteo Chair of Excellence to P. Clote. Additionally, P. Clote was partially supported by the National Science Foundation under grants DMS-0817971 and DBI-0543506. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

REFERENCES

[1] C. Abreu-Goodger and E. Merino. RibEx: A web server for locating riboswitches and other conserved bacterial regulatory elements. Nucleic Acids Res, 33:W690–W692, 2005.

[2] T.-H. Chang and H.-D. Huang and L.-C. Wu and C.-T. Yeh and B.-J. Liu and J.-T. Horng. Computational identification of riboswitches based on RNA conserved functional sequences and conformations. RNA, 15(7), 2009.

[3] R.C. Olsthoorn and S. Mertens and F.T. Brederode and J.F. Bol. A conformational switch at the 3′ end of a plant virus RNA regulates viral replication. EMBO J., 18:4856–4864, 1999.

[4] D. Repsilber and S. Wiese and M. Rachen and A.W. Schroder and D. Riesner and G. Steger. Formation of metastable RNA structures by sequential folding during transcription: time-resolved structural analysis of potato spindle tuber viroid (−)-stranded RNA by temperature-gradient gel. RNA, 5:574–584, 1999.

[5] P. Bengert and T. Dandekar. Riboswitch finder – A tool for identification of riboswitch RNAs. Nucleic Acids Res, 32:W154–W159, 2004.

[6] Guillaume Blin, Alain Denise, Serge Dulucq, Claire Herrbach, and Hélène Touzet. Alignments of RNA structures. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2010.

[7] M. T. Cheah, A. Wachter, N. Sudarsan, and R. R. Breaker. Control of alternative RNA splicing and gene expression by eukaryotic riboswitches. Nature, 447(7143):497–500, May 2007.

[8] C. B. Do, M. S. Mahabhashyam, M. Brudno, and S. Batzoglou. Probcons: Probabilistic consistency-based multiple sequence alignment. Genome Res., 15(2):330–340, February 2005.

    void traceback(i, j, k, M, paren)
    // perform traceback on [i,j] for k-neighbors of S0[i,j]
    if j − i > θ and M(i,j,k) > 0
        (r, k0, k1) = M(j,i,k)
        if r > 0 // j pairs with r in [i,j]
            paren[r] = '(' // note that paren has dummy char '$' at position 0
            paren[j] = ')'
            traceback(r + 1, j − 1, k1, M, paren)
            traceback(i, r − 1, k0, M, paren)
        else // r = 0, so j not paired in [i,j]
            traceback(i, j − 1, k0, M, paren)
    return

Figure 5. Pseudocode for the traceback computed by our RNAborMEA algorithm.

    >X83878/168-267
    UUACAAUAUAAUAGGAACACUCAUAUAAUCGCGUGGAUAUGGCACGCAAGUUUCUACCGGGCACCGUAAAUGUCCGACUAUGGGUGAGCAAUG
    ((((((...........((((((((.....(((((.......)))))..........((((((.......))))))..))))))))......(
    11 .................((((((((...(.(((((.......))))).)........((((((.......))))))..)))))))).
    80 (((...)).)(.(((((...)(((((..(.((((((...)..))))).)((....))((((((.......))))))).).(((((((

Figure 6. Given riboswitch sequence X83878/168-267 with initial structure the minimum free energy structure, the structure MEA(11) is most similar to the XPT GENE ON structure, with Gardenia similarity 155.5, while its similarity to XPT GENE OFF structure is a much lower 66.0. The MEA(80) structure is most similar to the XPT GENE OFF structures, with Gardenia similarity 101.5, while its similarity to the XPT GENE ON structure is a low 5.0. Maximum expected accurate structural neighbors MEA(k), for 0 ≤ k ≤ 150 were computed by RNAborMEA.

[9] Thomas Franch, Alexander P. Gultyaev, and Kenn Gerdes. Programmed cell death by hok/sok of plasmid R1: Processing at the hok mRNA 3′-end triggers structural rearrangements that allow translation and antisense RNA binding. J. Mol. Biol., 273:38–51, 1997.

[10] E. Freyhult, V. Moulton, and P. Clote. Boltzmann probability of RNA structural neighbors and riboswitch detection. Bioinformatics, 23(16):2054–2062, August 2007.

[11] E. Freyhult, V. Moulton, and P. Clote. RNAbor: a web server for RNA structural neighbors. Nucleic. Acids. Res., 35(Web):W305–W309, July 2007.

[12] P. P. Gardner, J. Daub, J. G. Tate, E. P. Nawrocki, D. L. Kolbe, S. Lindgreen, A. C. Wilkinson, R. D. Finn, S. Griffiths-Jones, S. R. Eddy, and A. Bateman. Rfam: updates to the RNA families database. Nucleic. Acids. Res., 37(Database):D136–D140, January 2009.

[13] H. Kiryu, T. Kin, and K. Asai. Robust prediction of consensus secondary structures using averaged base pairing probability matrices. Bioinformatics, 23(4):434–441, February 2007.

[14] Z. J. Lu, J. W. Gloor, and D. H. Mathews. Improved RNA secondary structure prediction by maximizing expected pair accuracy. RNA., 15(10):1805–1813, October 2009.

[15] M. Mandal, B. Boese, J.E. Barrick, W.C. Winkler, and R.R. Breaker. Riboswitches control fundamental biochemical pathways in Bacillus subtilis and other bacteria. Cell, 113(5):577–586, 2003.

[16] M. Mandal and R. R. Breaker. Adenine riboswitches and gene activation by disruption of a transcription terminator. Nat. Struct. Mol. Biol., 11(1):29–35, January 2004.

[17] J.S. McCaskill. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers, 29:1105–1119, 1990.

[18] S. Miyazawa. A reliable sequence alignment method based on probabilities of residue correspondences. Protein Eng., 8:999–1009, 1994.

[19] R. Nussinov and A. B. Jacobson. Fast algorithm for predicting the secondary structure of single stranded RNA. Proceedings of the National Academy of Sciences, USA, 77(11):6309–6313, 1980.

[20] P. S. Ray, J. Jia, P. Yao, M. Majumder, M. Hatzoglou, and P. L. Fox. A stress-responsive RNA switch regulates VEGFA expression. Nature, 457(7231):915–919, February 2009.

[21] A. Serganov, Y.R. Yuan, O. Pikovskaya, A. Polonskaia, L. Malinina, A.T. Phan, C. Hobartner, R. Micura, R.R. Breaker, and D.J. Patel. Structural basis for discriminative regulation of gene expression by adenine- and guanine-sensing mRNAs. Chem. Biol., 11(12):1729–1741, 2004.

[22] T. Xia, Jr. J. SantaLucia, M.E. Burkard, R. Kierzek, S.J. Schroeder, X. Jiao, C. Cox, and D.H. Turner. Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry, 37:14719–35, 1999.

[23] M. Zuker. On finding all suboptimal foldings of an RNA molecule. Science, 244(7):48–52, 1989.
