BioMed Central
BMC Bioinformatics
Open Access
Methodology article
Selection of antisense oligonucleotides based on multiple predicted
target mRNA structures
Xiaochen Bo†, Shaoke Lou†, Daochun Sun, Wenjie Shu, Jing Yang and
Shengqi Wang*
Address: Beijing Institute of Radiation Medicine, 27 Taiping Road, Beijing 100850, P R China
Email: Xiaochen Bo - boxc@bmi.ac.cn; Shaoke Lou - lousk@163.com; Daochun Sun - sdcwin@163.com; Wenjie Shu - shuwj@bmi.ac.cn;
Jing Yang - jingyang0511@sina.com; Shengqi Wang* - sqwang@bmi.ac.cn
* Corresponding author †Equal contributors
Abstract
Background: Local structures of target mRNAs play a significant role in determining the efficacies
of antisense oligonucleotides (ODNs), but some structure-based target site selection methods are
limited by uncertainties in RNA secondary structure prediction. If all the predicted structures of a
given mRNA within a certain energy limit could be used simultaneously, target site selection would
obviously be improved in both reliability and efficiency. In this study, some key problems in ODN
target selection on the basis of multiple predicted target mRNA structures are systematically
discussed.
Results: Two methods were considered for merging topologically different RNA structures into
integrated representations. Several parameters were derived to characterize local target site
structures. Statistical analysis on a dataset with 448 ODNs against 28 different mRNAs revealed 9
features quantitatively associated with efficacy. Features of structural consistency seemed to be
more highly correlated with efficacy than indices of the proportion of bases in single-stranded or
double-stranded regions. The local structures of the target site 5' and 3' termini were also shown
to be important in target selection. Neural network efficacy predictors using these features, defined
on integrated structures as inputs, performed well in "minus-one-gene" cross-validation
experiments.
Conclusion: Topologically different target mRNA structures can be merged into integrated
representations and then used in computer-aided ODN design. The results of this paper imply that
some features characterizing multiple predicted target site structures can be used to predict ODN
efficacy.
Received: 12 July 2005
Accepted: 09 March 2006
Published: 09 March 2006
BMC Bioinformatics 2006, 7:122 doi:10.1186/1471-2105-7-122
This article is available from: http://www.biomedcentral.com/1471-2105/7/122
© 2006 Bo et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background
Antisense oligonucleotides (ODNs) have served as powerful tools during the post-genome era. They provide an important approach to sequence-specific knockdown of gene expression, offering significant advantages over gene knockout techniques in respect of cost, time and resource requirements, and have therefore been widely used for determining gene function, validating drug targets and elucidating pathways [1,2]. ODNs also have potential as novel therapeutic agents for various diseases; several antisense compounds have been evaluated in clinical trials with promising results [3].
However, even with careful design, only a small propor-
tion of ODNs against a given RNA effectively suppress the
target gene in living cells [4]. It is commonly accepted that
the identification of accessible sites in the target RNA is of
great importance in designing ODNs. Various experimen-
tal approaches to the identification of promising local tar-
get sites have been described in recent years [5-10]. There
has also been much interest in computational approaches
to ODN design, which have advantages over experimental
methods in terms of throughput, cost and efficiency. Sev-
eral approaches to efficacy prediction have been proposed
for rational selection of ODN target sites [11-14].
Among the factors that influence the activity of a given
ODN, the local secondary structures of the target mRNA
are very significant in determining in vitro efficiency [5,15-
17] and are therefore particularly important in current
ODN design strategies [18-20]. Local target site structures
have also been used as the basis of rational design for
other kinds of nucleic acid drugs such as antisense RNAs
[21], catalytic RNAs [22] and ribozymes [23]. However,
the term "structure" in these studies refers to a single
computationally predicted structure, not the real structure of
the target mRNA, because RNA secondary structure is difficult to
determine experimentally.
Many RNA secondary structure prediction algorithms
have been proposed during the past 20 years. Since the
thermodynamically most stable structure of a molecule is
generally the one with the minimum free energy (MFE),
the initial aim of these prediction methods is to determine
the MFE structure [24]. Several MFE structure searching
algorithms have been described and are widely used in
related research [25,26], especially in ODN target selec-
tion. However, partly because of the relatively low relia-
bility of individual target mRNA structure predictions,
researchers have often drawn inconsistent conclusions
about favorable local structure motifs. The results
obtained by Lima et al. [18] and Thierry et al. [19] indi-
cated that single-stranded hairpin loops in RNA were the
best target sites, whereas the studies by Laptev et al. [20]
suggested that ODNs targeted to sequences predicted to
form clustered double-stranded structures in RNA tran-
scripts had the best potential.
It is also possible to consider conformations close to the
energy minimum, and algorithms for calculating subopti-
mal structures within certain energy limits have been pro-
posed [27,28]. The popular RNA secondary structure
prediction program MFold now provides results over a
range of free energies, mitigating the uncertainty of MFE
prediction. Although multiple predicted structures are
apparently more reliable, the MFE structure of the target
mRNA is still used as the only structural basis in some
ODN research. The main difficulty may lie in how to use
these foldings simultaneously, since they can be topolog-
ically very different.
Studies on ensembles of target structures in ODN design
date back to Jaroszewski et al. [29], who considered the 30
lowest-energy computer-simulated structures of rabbit β-
globin mRNA qualitatively. In some thermodynamic
models, multiple predicted target structures have been
merged into the form of free energy [30,31]. The earliest
work on computational ODN design based on the origi-
nal forms of multiple predicted target mRNA structures
was perhaps that of Patzel et al. [17]. Five structures with
low energy were predicted and aligned for a given
sequence stretch, and ODN sequences were chosen if
potentially favourable local structural elements occurred
in all five. In vitro experiments showed that this theoretical
protocol increased the statistical probability of identifying
local target sites accessible to ODN sequences [17,32].
Another way to explore the original forms of optimal and
suboptimal mRNA structures simultaneously, which is
probably more straightforward, is to merge them into a
single-stranded probability profile (SSPP), P = {p_i}, 1 ≤ i ≤ n,
where p_i is the probability that base i is single-stranded.
Algorithms for predicting single-stranded regions in RNA
secondary structures have long been of interest, since such
regions play many important roles in RNA-RNA, RNA-DNA and
RNA-protein interactions [33]. The SFold web server [34] can
now directly output the SSPP of an RNA molecule instead of definite
individual structures. Ding and Lawrence [33] presented a
method for predicting accessible sites in the SSPP of rabbit
β-globin mRNA, obtained by summing statistical samples
of probable secondary structures. Their results showed a
significant correlation between the predicted hybridiza-
tion potential and the degree of inhibition of in vitro trans-
lation. Some researchers regard this method as the most
successful [11,12].
The original RNA structural information is used in essen-
tially different ways in the two methods described above.
In the method based on structure alignment, favorable
structural elements are identified by base pairing patterns,
which can be illustrated as graphs. The role of secondary
structures in this method is similar to its role in earlier
studies of ODN design based on the target mRNA MFE
structure. The success of this method relies mainly on the
greatly increased reliability of structural elements. How-
ever, in the method based on SSPP, the RNA structures
resemble a special time series rather than molecular
"structures" in the usual sense. Base pairing patterns, or
topological features, can hardly be explored in SSPP. The
common ground between these two methods is the emphasis on the role of single-stranded regions in determining target accessibility. In the SSPP of rabbit β-globin mRNA, Ding and Lawrence found a significant correlation between the peak value of SSPP and the degree of inhibition of translation. The "well-characterized" single-stranded regions were revealed by high probability peaks in the profile [33], while in the systematic alignment of multiple predicted target mRNA secondary structures, large (>10 nt) consecutive sequence stretches not involved in base pairing were regarded as favorable structural motifs [17]. Since these two methods were only evaluated on a single target mRNA, further research is needed on a broad range of target genes.
Table 1: Summary of antisense target genes and their predicted structures used in this study
Accession | Description | No. structures | No. ODNs
X62295 | Rattus mRNA for vascular type-1 angiotensin II receptor | 50 | 36
XM_051583 | Homo sapiens v-raf-1 murine leukemia viral oncogene homolog 1 (RAF1), mRNA | 50 | 31
M14758 | Homo sapiens P-glycoprotein (PGY1) mRNA | 50 | 22
NM_004996 | Homo sapiens ATP-binding cassette, sub-family C (CFTR/MRP), member 1 (ABCC1), transcript variant 1, mRNA | 50 | 14
M24283 | Human intercellular adhesion molecule-1 (ICAM-1) | 50 | 66
X52479 | Human PKC alpha mRNA for protein kinase C alpha | 37 | 19
NM_001078 | Homo sapiens vascular cell adhesion molecule 1 (VCAM1), transcript variant 1, mRNA | 50 | 35
XM_057446 | Homo sapiens selectin E (endothelial adhesion molecule 1) (SELE), mRNA | 50 | 11
M30640 | Human endothelial leukocyte adhesion molecule I (ELAM1) mRNA, complete cds | 50 | 4
NM_000877 | Homo sapiens interleukin 1 receptor, type I (IL1R1), mRNA | 50 | 20
M31585 | Mouse (clone lambda-c5e) intercellular adhesion molecule 1 (ICAM-1) mRNA, complete cds | 39 | 8
BC036531 | Homo sapiens collagen, type I, alpha 1, mRNA (cDNA clone MGC:33668 IMAGE:5264710) | 50 | 19
NM_010784 | Mus musculus midkine (Mdk), mRNA | 17 | 4
M15077 | P.pyralis (firefly) luciferase gene, complete cds | 39 | 8
X03484 | Human mRNA for raf oncogene | 50 | 20
X14805 | Mus musculus mRNA for DNA methyltransferase 1 | 50 | 8
BC005976 | Homo sapiens ras homolog gene family, member A, mRNA | 23 | 13
M10843 | Rabbit beta-globin mRNA | 26 | 24
U45880 | Human X-linked inhibitor of apoptosis protein XIAP mRNA | 36 | 6
AF015950 | Human telomerase reverse transcriptase mRNA | 50 | 5
NR_001566 | Homo sapiens telomerase RNA component (TERC) on chromosome 3 | 23 | 5
M34309 | Human epidermal growth factor receptor (HER3) mRNA, complete cds | 50 | 22
NM_004507 | Homo sapiens HUS1 checkpoint homolog (S. pombe) (HUS1), mRNA | 33 | 11
AJ278710 | Escherichia coli 23S rRNA gene, strain K12 DSM 30083T | 50 | 7
X03363 | Human c-erb-B-2 mRNA | 50 | 3
M10988 | Human tumor necrosis factor (TNF) mRNA | 26 | 4
NM_000791 | Homo sapiens dihydrofolate reductase (DHFR), mRNA | 50 | 7
NM_001168 | Homo sapiens baculoviral IAP repeat-containing 5 (survivin) (BIRC5), mRNA | 29 | 5
NM_013642 | Mus musculus dual specificity phosphatase 1 (Dusp1), mRNA | 37 | 8
AF025846 | Co-reporter vector pRL-TK, complete sequence | 50 | 4
The purpose of this article is to systematically explore the
methods for computational selection of ODN target sites
based on features defined in multiple predicted structures
of the target mRNA. In our approach, the predicted mRNA
structures were first merged into integrated representa-
tions. Efficacy-associated features were then screened
from a set of features defined on these representations.
The potential of neural networks for predicting efficacy on
the basis of these features was also validated.
Results
Dataset
Three ODN databases have been reported: ODNBase [35],
AOdb [12] and an unnamed database with experimental
data from Isis Pharmaceuticals [36]. We have also devel-
oped a database named AOBase [37] (NAR molecular
biology database collection entry number 781) for both
the selection and design of ODNs. Currently, it stores 705
ODNs from the published literature tested against tran-
scripts of 54 different target genes. Since no homogeneous
database is publicly available, we perforce used a hetero-
geneous collection of measurements made by different
researchers using different experimental techniques as our
dataset. Four hundred and forty-eight ODNs against 28
different mRNAs were collected from AOBase to construct
this dataset; 54.2% of them had been tested at protein
level and the others at mRNA level. The data selection criteria were similar to those used in other ODN efficacy prediction studies [11-13]: (a) at least 4 ODNs were tested under the same experimental conditions; (b) ODN efficacies were presented as percentages of the control target gene expression level; (c) virus targets were excluded; (d) ODNs targeting the translational initiation site were excluded, since regions surrounding the initiation codon are generally considered to be free of secondary structure [38]. To keep in line with most of the research on drug design, the ODN efficacies in our dataset were transformed into [100% - (% of control expression)].
RNA folding calculation times have been greatly reduced
in recent years because of faster computers and improved
algorithms. The MFold web server [39] can now fold 6000
bases for a batch job, which meets the need of full-length
mRNA structure prediction in most cases and is therefore
used in this study. Because the number of predicted sub-
optimal RNA secondary structures increases exponentially
as the folding energy increases [40], only structures within
5 percent of the computed minimum free energy were
taken into consideration. The upper bound on the
number of simultaneously predicted structures was set to
50 to avoid the high computational cost of long RNA
sequences. These settings were the default settings of the
MFold web server. Table 1 is a brief summary of the data-
set.
Integrating multiple predicted target mRNA secondary
structures
In this study, two methods were used to represent the
multiple predicted local structures of target sites syntheti-
cally. All the predicted local structures were first merged
into an SSPP, which is easily calculated from the ss-count
file in the MFold output. For a more illustrative represen-
tation of the multiple predicted structures, the SSPP was
further transformed to a "single-stranded/pair/uncertain"
sequence (SUP representation) S = {si}, where si = 'S' if
base i is single-stranded, si = 'P' if base i is paired with
another base, and si = 'U' if it is uncertain whether base i is
single-stranded. The thresholds suggested by Ding and
Lawrence [33] were used to map SSPP {pi} into the SUP
representation {si}, giving
$$
s_i =
\begin{cases}
\text{'S'}, & p_i \ge 0.5 \\
\text{'U'}, & 0.2 < p_i < 0.5 \\
\text{'P'}, & p_i \le 0.2
\end{cases}
\qquad (1)
$$
Figure 1: Two representations of multiple predicted structures of rabbit β-globin mRNA (G101-G130). (a) Single-stranded probability profile (x-axis: nucleotide position; y-axis: probability, with reference levels at 0.2 and 0.5); (b) 'SUP' representation of positions 101-130: PPPPSSPPPPPSUUSSPPUPPPPPSSPUPP.
SUP representation loses a lot of structural information in
comparison to graphical illustration or dot-parenthesis
notation of RNA secondary structure and therefore cannot
be used to explore the whole RNA structure. However, for
RNA local structural analysis, especially of very short RNA
regions, SUP gives an adequate simplified representation.
Figure 1 illustrates part of these two representations (101–
130 nt) of rabbit β-globin mRNA structure.
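The two representations can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the per-base single-stranded counts have already been read from the MFold ss-count output into a list, the function names are hypothetical, and the handling of values that fall exactly on 0.2 or 0.5 is an assumption.

```python
def sspp_from_counts(ss_counts, n_structures):
    """Return p_i, the fraction of predicted structures in which base i is unpaired.

    ss_counts[i] is assumed to hold the number of structures (out of
    n_structures) in which base i is single-stranded.
    """
    if n_structures <= 0:
        raise ValueError("n_structures must be positive")
    return [count / n_structures for count in ss_counts]


def sup_from_sspp(sspp, low=0.2, high=0.5):
    """Map each probability to 'S' (single-stranded), 'U' (uncertain) or 'P' (paired)."""
    symbols = []
    for p in sspp:
        if p >= high:      # assumed boundary handling: 0.5 counts as single-stranded
            symbols.append("S")
        elif p <= low:     # assumed boundary handling: 0.2 counts as paired
            symbols.append("P")
        else:
            symbols.append("U")
    return "".join(symbols)
```

For a local target site, the same functions would simply be applied to the slice of the profile covered by the ODN.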
Selection of efficacy-associated features
The first important step in computational design based on
multiple predicted mRNA structures is to find the efficacy-
associated features in the SSPP and SUP representations of
the target sites. Since the data structures of these two linear
representations of multiple predicted structures are very
different from graphical illustrations of RNA molecules,
the topological features known to be correlated with effi-
cacy must be redefined. However, new representations
also afford opportunities to discover novel efficacy-associ-
ated features.
A set of features characterizing the local multiply-pre-
dicted target mRNA secondary structures was derived.
Seven of these features were defined on the SSPP represen-
tation (listed in Table 2) while the other eleven were
defined on the SUP sequence representation (listed in
Table 3). The size of the local target, n, in the definition of
features is equal to the length of the ODN.
The mean of all single stranded probabilities within a
given target site, fmean, indicates the probability that the
target site is single-stranded. The maximum value, fmax,
has also been used for this purpose [33]. fimpulse, can be
viewed as a relative peak value compared to the mean. The
other statistics, frms, fpeak, fwave, and fdifference, describe the
structural consistency of the target site.
Numerical features defined on the SUP sequence are
directly derived from research results and from empirical
rules about target site selection based on local structure.
Features fNS, fNP, fPS, and fPP, give an overall description of
target structure, while f5S, f5P, f3S and f3P emphasize the
local structure of the target site termini. Factors fCS and fCP
are derived to confirm whether the occurrence of consec-
utive subsequences in single-stranded or helical regions is
correlated with efficacy, as explored by Patzel et al. [17].
Table 2: Parameters derived from the SSPP representation
Parameter | Definition
fmean | Mean, $f_{mean} = \frac{1}{n}\sum_{i=1}^{n} p_i$
frms | Root mean square, $f_{rms} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(p_i - f_{mean}\right)^2}$
fmax | Maximum, $f_{max} = \max_{i=1,\dots,n}\{p_i\}$
fimpulse | Impulse factor, $f_{impulse} = \frac{f_{max}}{f_{mean}}$
fpeak | Peak factor, $f_{peak} = \frac{f_{max}}{f_{rms}}$
fwave | Wave factor, $f_{wave} = \frac{f_{rms}}{f_{mean}}$
fdifference | Mean of difference, $f_{difference} = \frac{1}{n-1}\sum_{i=1}^{n-1}\left(p_{i+1} - p_i\right)$
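As a hedged illustration of how the SSPP parameters of Table 2 might be computed for one target site, the sketch below takes the probabilities p_1..p_n as a plain list. The function name and the dictionary keys are hypothetical; f_rms is computed about the mean, and f_difference as the plain mean of successive differences, following the definitions as they can be read from the table. Returning NaN when a denominator is zero is an assumption, since the paper does not say how such sites were handled.

```python
import math


def sspp_features(p):
    """Compute the seven SSPP parameters for one target site (list of probabilities)."""
    n = len(p)
    if n < 2:
        raise ValueError("a target site needs at least two bases")

    f_mean = sum(p) / n
    # Root mean square of the deviations from the mean.
    f_rms = math.sqrt(sum((x - f_mean) ** 2 for x in p) / n)
    f_max = max(p)
    # Mean of the successive differences p[i+1] - p[i].
    f_difference = sum(p[i + 1] - p[i] for i in range(n - 1)) / (n - 1)

    def ratio(numerator, denominator):
        # Assumed convention: undefined ratios are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "f_mean": f_mean,
        "f_rms": f_rms,
        "f_max": f_max,
        "f_impulse": ratio(f_max, f_mean),
        "f_peak": ratio(f_max, f_rms),
        "f_wave": ratio(f_rms, f_mean),
        "f_difference": f_difference,
    }
```

Calling `sspp_features` on the slice of the profile covered by an ODN gives one feature vector per candidate target site.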
Table 3: Parameters derived from the SUP sequence representation
Parameter | Definition
fNS | Number of bases in single-stranded region
fNP | Number of bases in double-stranded region
fPS | Percentage of bases in single-stranded region to the length of ODN
fPP | Percentage of bases in double-stranded region to the length of ODN
fCS | Maximum length of consecutive subsequence in single-stranded region
fCP | Maximum length of consecutive subsequence in base pairing
f5S | Maximum length of consecutive subsequence in single-stranded region counting from 5' terminal
f5P | Maximum length of consecutive subsequence in base pairing counting from 5' terminal
f3S | Maximum length of consecutive subsequence in single-stranded region counting from 3' terminal
f3P | Maximum length of consecutive subsequence in base pairing counting from 3' terminal
fSC | Structure consistency, $f_{SC} = \frac{1}{n-1}\sum_{i=1}^{n-1} E(s_i, s_{i+1})$, where
$$
E(x, y) =
\begin{cases}
1, & x = y,\ x \ne \text{'U'} \\
0, & x = \text{'U'} \text{ or } y = \text{'U'} \\
-1, & x \ne y,\ x \ne \text{'U'},\ y \ne \text{'U'}
\end{cases}
$$
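The SUP parameters of Table 3 lend themselves to a short sketch as well. The code below is an illustration under stated assumptions rather than the authors' code: the SUP string is taken to be written 5' to 3' along the target site, the terminal features are read as the length of the run that starts at the respective terminus, percentages are expressed on a 0-100 scale, and the pair score E treats any pair containing 'U' as 0, equal symbols as +1 and an 'S'/'P' transition as -1. All function names are hypothetical.

```python
def longest_run(seq, symbol):
    """Length of the longest uninterrupted run of `symbol` anywhere in `seq`."""
    best = current = 0
    for ch in seq:
        current = current + 1 if ch == symbol else 0
        best = max(best, current)
    return best


def terminal_run(seq, symbol):
    """Length of the run of `symbol` that starts at the first position of `seq`."""
    count = 0
    for ch in seq:
        if ch != symbol:
            break
        count += 1
    return count


def pair_score(x, y):
    """E(x, y): 0 if either base is uncertain, +1 if both agree, -1 otherwise."""
    if x == "U" or y == "U":
        return 0
    return 1 if x == y else -1


def sup_features(s):
    """Compute the eleven SUP parameters for one target site (string over S, U, P)."""
    n = len(s)
    if n < 2:
        raise ValueError("a target site needs at least two bases")
    n_single, n_paired = s.count("S"), s.count("P")
    reversed_s = s[::-1]  # lets the 3' features reuse terminal_run
    return {
        "f_NS": n_single,
        "f_NP": n_paired,
        "f_PS": 100.0 * n_single / n,
        "f_PP": 100.0 * n_paired / n,
        "f_CS": longest_run(s, "S"),
        "f_CP": longest_run(s, "P"),
        "f_5S": terminal_run(s, "S"),
        "f_5P": terminal_run(s, "P"),
        "f_3S": terminal_run(reversed_s, "S"),
        "f_3P": terminal_run(reversed_s, "P"),
        "f_SC": sum(pair_score(s[i], s[i + 1]) for i in range(n - 1)) / (n - 1),
    }
```

Applied to the 30-symbol SUP string shown in Figure 1, for instance, the function would return one value per parameter for that stretch of rabbit β-globin mRNA.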
Absolute numbers of bases appear in the definitions of
eight features defined on the SUP representation, viz. fNS,
fNP, fCS, fCP, f5S, f5P, f3S and f3P. Since the ODN lengths in
the dataset are not uniform, it is necessary to determine
whether these features are bound up with or limited by
the size of local target. Figure 2(a) shows the distribution
of ODN lengths in the dataset, which range from 10 nt to
22 nt. Most of the ODNs were 20 nt long. The dataset was
divided into groups according to ODN length. The mean
values of these features were calculated for each group and
are shown in Figure 2(b), which indicates no obvious rela-
tionships between these features and target size.
Two types of indices, efficacy prediction potential and classification potency, were used to measure the suitability of these parameters for rational ODN design. The efficacy prediction potential was evaluated by calculating the correlation between the features and efficacy, using Pearson linear correlation, Spearman rank correlation and Kendall rank correlation. The classification potency was evaluated by exploring the performance of Fisher linear discriminators, using the feature as the single independent variable. The performance was measured as specificity $S_p = \frac{T_n}{T_n + F_p}$ and sensitivity $S_e = \frac{T_p}{T_p + F_n}$, where $T_p$, $T_n$, $F_p$ and $F_n$ are the numbers of true positives, true negatives, false positives and false negatives. Two different efficacy threshold values, 50% and 75%, were used to distinguish between positive and negative cases in our dataset, since these indices depend on threshold. Features matching at least one of the following two criteria were selected as efficacy-associated: (a) statistically significant correlation (p < 0.05) with efficacy; and (b) high specificity (≥0.7) or high sensitivity (≥0.7) in distinguishing between active and inactive ODNs.
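The classification criterion can be illustrated with a small sketch. With a single independent variable, a linear discriminant reduces to a cut point on that feature; the code below places the cut at the midpoint of the two class means, which is an assumption (it corresponds to equal priors) and not necessarily the exact rule used in the paper. Treating an efficacy equal to the threshold as active is likewise an assumption, and all names are hypothetical.

```python
def label_active(efficacies, threshold):
    """Label an ODN as active when its efficacy reaches the threshold (assumed >=)."""
    return [e >= threshold for e in efficacies]


def fit_cut_point(values, labels):
    """Fit a one-feature discriminant: midpoint of class means plus its direction."""
    pos = [v for v, y in zip(values, labels) if y]
    neg = [v for v, y in zip(values, labels) if not y]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    cut = (mean_pos + mean_neg) / 2
    # direction is +1 if active ODNs tend to have larger feature values, else -1.
    direction = 1 if mean_pos >= mean_neg else -1
    return cut, direction


def specificity_sensitivity(values, labels, cut, direction):
    """Return (Sp, Se) = (Tn / (Tn + Fp), Tp / (Tp + Fn)) for the fitted cut point."""
    tp = fp = tn = fn = 0
    for v, y in zip(values, labels):
        predicted_active = direction * (v - cut) > 0
        if predicted_active and y:
            tp += 1
        elif predicted_active:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tn / (tn + fp), tp / (tp + fn)
```

A feature would then meet criterion (b) if either returned value is at least 0.7 at the 50% or the 75% efficacy threshold.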
The correlation between parameters and efficacy is presented in Table 4. Only four features defined on SSPP, i.e. frms, fmax, fpeak and fdifference, correlated significantly with efficacy. Table 5 compares the Fisher discrimination results for each parameter and different thresholds, indicating that frms, fmax, fpeak, fdifference, fPP, fCS, fCP, f5S and f3S can be used to distinguish between active and inactive ODNs according to our criteria.
The most noteworthy finding is that ODN efficacy seems not to rely greatly on the degree of single-strandedness in its target site, as suggested in previous publications [18-20], since fmean, fNS and fPS show neither sufficient correlation with efficacy nor good performance in identifying active ODNs. The lengths of consecutive single-stranded regions in the target site, which are characterized by fCS, prove useful for identifying active ODNs. This result is partly consistent with the conclusion drawn by Patzel et al. [17]. In contrast to the conclusion of Ding and Lawrence [33], although fmax is revealed to be efficacy-associated, the peak value of the target site SSPP correlates negatively with efficacy.
The helical region in the target site appears to be more important, as suggested by Laptev et al. [20], because features fPP and fCP satisfy our selection criteria for ODN classification. From the analysis, it is obvious that the structural consistency features, frms, fpeak and fdifference, are more important in target site selection. But this should not be interpreted as implying simple correspondences between structural consistency and efficacy.
Table 4: Correlations between features and efficacy
Parameter | Pearson Correlation | Spearman Correlation | Kendall Correlation
fmean | -0.086 | -0.055 | -0.087
frms | -0.150** | -0.100** | -0.147**
fmax | -0.099* | -0.113** | -0.155**
fimpulse | 0.040 | 0.039 | 0.060
fpeak | 0.124** | 0.083** | 0.125**
fwave | -0.030 | -0.017 | -0.025
fdifference | -0.094* | -0.034 | -0.051
fNS | -0.087 | -0.057 | -0.082
fNP | -0.045 | -0.043 | -0.061
fPS | -0.073 | -0.050 | -0.075
fPP | -0.040 | -0.040 | -0.057
fCS | -0.062 | -0.037 | -0.053
fCP | -0.012 | -0.012 | -0.019
f5S | 0.031 | 0.012 | 0.016
f5P | -0.009 | -0.039 | -0.055
f3S | -0.050 | -0.011 | -0.016
f3P | -0.036 | -0.030 | -0.039
fSC | -0.064 | -0.045 | -0.066
** Correlation is significant at the 0.01 level
* Correlation is significant at the 0.05 level
ODN efficacy may be closely associated with the local
structures of the 5' and 3' termini of the target sites. Fisher
classifiers using factors f5S and f3S gave high specificity or
sensitivity in ODN discrimination.
Although some features are efficacy-associated, the relationship between structural factors and efficacy is highly complex. No single feature has been found to correlate highly with efficacy, and no feature is reliable on its own for distinguishing active from inactive ODNs. Two feature sets defined on the SSPP and SUP representations of the target site are selected as inputs of efficacy-predicting neural networks: FSSPP = {frms, fmax, fpeak, fdifference} and FSUP = {fPP, fCS, fCP, f5S, f3S}.
Figure 2: The distribution of ODN length and length-limited features. (a) The distribution of ODN lengths in the dataset; (b) Mean values of some features of ODNs with different lengths.
Efficacy prediction using neural networks
To assess the ability of selected features to predict efficacy,
two neural network models were constructed, one for fea-
tures defined on the SSPP and the other for features
derived from the SUP sequence representation of the tar-
get structure.
Previous studies have shown that cross-validation is
important for estimating accuracy [11-14]. Since ODNs
always have similar properties if they are near each other
on the same gene or are measured in the same study, the
network training process should be completely independ-
ent of the test data [12,13]. In this research, cross-valida-
tion was done by the "minus-one-gene" (-gene) [13]
approach. ODNs targeting 8 mRNAs (listed in Table 6)
were selected alternately from the dataset for testing,
while the remainder, assayed in the same studies, were
used as the training set. The test mRNA selection criteria
were: (a) more than 15 different target sites were tested;
(b) the efficacy of at least one ODN was greater than 75%.
Sixteen neural networks for efficacy prediction were tested
in our cross-validation experiments. The network group
NSSPP (NSSPP1~NSSPP8) took FSSPP as inputs, and the NSUP
group (NSUP1~NSUP8) took FSUP as the input parameter
set. The outputs of all these networks met the condition of
convergence within 100 training cycles.
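The "minus-one-gene" protocol described above can be sketched as follows. This is an illustration only: the record layout (dictionaries with hypothetical keys "gene", "study" and "efficacy") is assumed, each record is taken to be one distinct target site, and the option to drop training records that share a study with the test gene is an assumed reading of the requirement that training be independent of the test data.

```python
from collections import defaultdict


def eligible_test_genes(records, min_sites=15, min_best_efficacy=75.0):
    """Genes with more than `min_sites` tested sites and at least one ODN above 75%."""
    efficacies_by_gene = defaultdict(list)
    for record in records:
        efficacies_by_gene[record["gene"]].append(record["efficacy"])
    return sorted(
        gene
        for gene, efficacies in efficacies_by_gene.items()
        if len(efficacies) > min_sites and max(efficacies) > min_best_efficacy
    )


def minus_one_gene_split(records, test_gene, drop_shared_studies=True):
    """Hold out every ODN of `test_gene`; train on the remaining records."""
    test = [r for r in records if r["gene"] == test_gene]
    test_studies = {r["study"] for r in test}
    train = [
        r
        for r in records
        if r["gene"] != test_gene
        # Assumed option: also exclude ODNs measured in the same study as the test set.
        and not (drop_shared_studies and r["study"] in test_studies)
    ]
    return train, test
```

Looping `minus_one_gene_split` over the genes returned by `eligible_test_genes` yields one train/test pair per cross-validation experiment.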
Several methods have been used to measure the accuracy
of ODN predictors [11-14]. To obtain rounded assess-
ments for the aforementioned neural networks, two dif-
ferent types of indices were computed: (1) specificity SP,
sensitivity Se and accuracy calcu-
lated using fixed threshold values, as mentioned above in
the account of feature selection; (2) the receiver operating
characteristics (ROC) curve [41], which is a plot of Se ver-
sus 1 - SP at different thresholds. The ROC area was calcu-
lated as a quantitative indicator of the ability of the
network to classify. The cutoff efficacy value used to dis-
tinguish positive from negative ODNs in the cross-valida-
tion test was 75%.
The performances of the neural networks are listed in
Table 7. The specificities, Sp, of all the networks in these
two groups are greater than the corresponding sensitivities, Se.
This behavior is beneficial for ODN design, since in practical
applications users are interested only in candidates with high
predicted efficacy [14]. The ROC curves of
the 16 networks, tested on ODNs targeting 8 different
mRNAs, are shown in Figure 3. The best ROC curve areas
were obtained in cross-validation experiment 7 (networks
NSSPP7 and NSUP7), which used the data from Matveeva et
al. [6] as the test set. The average ROC area for NSSPP is 0.77.
The average for NSUP is 0.73, which is a little lower.
\[ \mathrm{Acc} = \frac{T_p + T_n}{T_p + T_n + F_p + F_n} \]
Table 6: Dataset for cross-validation experiments
Networks            Accession of test gene   Number in training set   Number in test set
NSSPP1 and NSUP1    X62295                   412                      36
NSSPP2 and NSUP2    XM_051583                417                      31
NSSPP3 and NSUP3    M14758                   426                      22
NSSPP4 and NSUP4    M24283                   356                      66
NSSPP5 and NSUP5    NM_001078                379                      35
NSSPP6 and NSUP6    NM_000877                428                      20
NSSPP7 and NSUP7    X03484                   428                      20
NSSPP8 and NSUP8    M10843                   424                      24
Table 5: Performance of Fisher linear discriminators for each parameter
Parameter      Threshold = 50%     Threshold = 75%
               Se       Sp         Se       Sp
fmean          0.56     0.53       0.65     0.51
frms           0.58     0.54       0.50     0.54
fmax           0.33     0.73       0.60     0.72
fimpulse       0.37     0.67       0.48     0.38
fpeak          0.50     0.61       0.56     0.59
fwave          0.42     0.58       0.63     0.43
fdifference    0.52     0.50       0.52     0.50
fNS            0.56     0.50       0.56     0.48
fNP            0.44     0.51       0.52     0.49
fPS            0.54     0.52       0.58     0.51
fPP            0.49     0.45       0.70     0.54
fCS            0.56     0.47       0.73     0.45
fCP            0.43     0.63       0.74     0.40
f5S            0.31     0.71       0.73     0.30
f5P            0.33     0.63       0.58     0.37
f3S            0.29     0.73       0.65     0.29
f3P            0.34     0.64       0.50     0.37
fSC            0.50     0.50       0.60     0.52
Note: Se or Sp values ≥ 0.7 are regarded as high sensitivity or high specificity.
BMC Bioinformatics 2006, 7:122http://www.biomedcentral.com/1471-2105/7/122
Discussion
Compared with most other bioinformatics research prob-
lems, studies on computer-aided ODN design are far from
"data rich". Moreover, the data collected from the pub-
lished literature are variable owing to the diversity of
experimental methods. To provide a more reliable basis
for feature-mining and predictor development, one focus
of future work will be on enlargement of the dataset. A
large dataset with quality control will make the analysis
and cross-validation of grouped homogeneous subsets
possible, and therefore make the ODN design systems
more reliable.
Another "data poor" limitation in our study and related
research [6,17,29] is that not all possible target RNA structures
are taken into account. As pointed out by Mathews,
an ideal way to integrate the predicted RNA structures
would be to compute a partition function, which sums the
contributions of all structures weighted by their Boltzmann
probabilities [44]. However, computing the partition function
has O(N^3) computational complexity [45], so this method is
practicable only for relatively short RNA sequences. Several
studies have addressed the estimation of the partition function
at lower computational cost [44,46-48]. The Vienna RNA
secondary structure prediction server [49] can now compute the
partition function of RNAs of up to 5000 bases for batch jobs.
One implication of this study that warrants further investigation
is ODN design using the partition function of the target mRNA,
which is based on more reliable structural information.
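To make the idea of Boltzmann weighting concrete, the following Python sketch weights a finite set of predicted structures by their free energies and derives, for each base, the weighted probability of being unpaired. It is an illustration only: it sums over a supplied sample of structures rather than the full ensemble handled by the partition function algorithm [45], and the dot-bracket input format, the function names and the temperature of 37 °C are assumptions made for the example.

```python
import math

# Gas constant in kcal/(mol*K) times 310.15 K (37 degrees C); assumed here.
RT = 0.0019872 * 310.15


def boltzmann_weights(energies, rt=RT):
    """Normalized Boltzmann weights for free energies given in kcal/mol."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def unpaired_probability(structures, energies, rt=RT):
    """Per-base probability of being unpaired over the supplied structures.

    `structures` are dot-bracket strings of equal length; '.' marks an
    unpaired base.
    """
    weights = boltzmann_weights(energies, rt)
    length = len(structures[0])
    probs = [0.0] * length
    for structure, weight in zip(structures, weights):
        if len(structure) != length:
            raise ValueError("all structures must have the same length")
        for i, symbol in enumerate(structure):
            if symbol == ".":
                probs[i] += weight
    return probs
```

Because the weights fall off exponentially with free energy, structures a few kcal/mol above the minimum contribute very little, so the result is dominated by the lowest-energy structures in the sample.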
The factors influencing the efficacy of an ODN are complex
and so far poorly understood. Although this paper
focuses on the relationship between ODN efficacy and target
site structure, we do not ignore other factors that have
been shown to influence efficacy, such as chemical properties,
DNA-RNA duplex stability, sequence motifs and metabolic
properties of the target mRNA [4]. We believe
that the more factors are considered in ODN efficacy
prediction, the more reliable target site selection
will become.
Conclusion
This paper presents a method, based on multiple pre-
dicted target mRNA structures, for reducing the uncer-
tainty of structure prediction in ODN design. Several
efficacy-associated features characterizing the integrated
structure of the target site have been discovered. The struc-
tural consistency features of the target seem to be corre-
lated with efficacy. In contrast, some features of favorable
ODN targets reported in previous research, which empha-
sized single-stranded regions, were found to correlate
weakly with efficacy. In addition, the local structures of
the 5' and 3' termini were shown to be important in target
site selection.
Neural network efficacy predictors using features defined
on integrated structures as inputs have been shown to perform
well, implying that these features can also be used
with other efficacy prediction methods such as Bayesian
statistics (BS), multiple linear regression (MLR), decision
trees (DT) and support vector machines (SVM).
Methods
After preliminary experiments, a feed-forward network
architecture with a hidden layer containing 20 nodes was
applied to each network. The input neurons used a hyperbolic
tangent sigmoid (tan-sigmoid) activation function; the
output neurons used a logarithmic sigmoid (log-sigmoid)
activation function. The weights and bias values
of the networks were updated according to the Levenberg-
Marquardt optimization algorithm [42], which appears to
be the fastest method for training moderate-sized feed-forward
neural networks [43]. Matlab® Neural Network
Toolbox 4.0.3 was used for all neural network implementations.
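A minimal sketch of the forward pass of such a network, written in Python rather than with the Matlab toolbox used in this study, is given below. It assumes that the tan-sigmoid is applied in the 20-node hidden layer and the log-sigmoid at a single output node; the function names, the random weight initialization and the number of inputs are illustrative assumptions, and Levenberg-Marquardt training is not shown.

```python
import math
import random


def tansig(x):
    """Hyperbolic tangent sigmoid, output in (-1, 1)."""
    return math.tanh(x)


def logsig(x):
    """Logarithmic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_inputs, n_hidden=20, seed=0):
    """Create small random weights; a placeholder for trained parameters."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_out = rng.uniform(-0.5, 0.5)
    return w_hidden, b_hidden, w_out, b_out


def predict(network, features):
    """Forward pass: inputs -> 20 tan-sigmoid nodes -> one log-sigmoid output."""
    w_hidden, b_hidden, w_out, b_out = network
    hidden = [
        tansig(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return logsig(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
```

With a log-sigmoid output the prediction always lies between 0 and 1, which makes it directly comparable with an efficacy cutoff expressed as a fraction.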
Authors' contributions
SW guided the project. XB and SW conceived of the study.
XB wrote the program, analyzed the results and drafted the
manuscript. SL and DS helped in dataset construction. WS
and JY helped in analysis and discussion and gave useful
comments.
Table 7: The performances of two groups of networks in cross-validation experiments
Networks   Se     Sp     Acc    ROC area     Networks   Se     Sp     Acc    ROC area
NSSPP1     0.50   0.97   0.92   0.91         NSUP1      0      0.94   0.83   0.60
NSSPP2     0.33   0.96   0.90   0.75         NSUP2      0      0.93   0.84   0.69
NSSPP3     0      0.93   0.59   0.71         NSUP3      0      1      0.64   0.65
NSSPP4     0      1      0.88   0.66         NSUP4      0.5    0.86   0.82   0.66
NSSPP5     0      1      0.71   0.69         NSUP5      0      1      0.71   0.74
NSSPP6     0      0.94   0.85   0.81         NSUP6      0      1      0.9    0.89
NSSPP7     0      1      0.70   0.98         NSUP7      0.17   0.93   0.70   0.89
NSSPP8     0      1      0.58   0.63         NSUP8      0.40   0.86   0.67   0.71
Note: Se or Sp values ≥ 0.7 are regarded as high sensitivity or high specificity.
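The group averages quoted in the text can be checked against the ROC area columns of Table 7 with a few lines of Python. The values below are copied from the table; the assignment of each value to a network follows the column order of the table as extracted, and the names used are illustrative.

```python
from statistics import mean

# ROC areas for networks 1 to 8 of each group, as listed in Table 7.
ROC_AREA = {
    "NSSPP": [0.91, 0.75, 0.71, 0.66, 0.69, 0.81, 0.98, 0.63],
    "NSUP": [0.60, 0.69, 0.65, 0.66, 0.74, 0.89, 0.89, 0.71],
}


def average_roc_area(group):
    """Mean ROC area over the eight cross-validation experiments."""
    return mean(ROC_AREA[group])


for group in ROC_AREA:
    print(f"{group}: mean ROC area = {average_roc_area(group):.2f}")
```

The means are 6.14/8 for NSSPP and 5.83/8 for NSUP, which round to 0.77 and 0.73 respectively.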
Figure 3: ROC curves for efficacy-predicting neural networks. ROC curves are shown for networks (a) NSSPP1 and NSUP1; (b) NSSPP2 and
NSUP2; (c) NSSPP3 and NSUP3; (d) NSSPP4 and NSUP4; (e) NSSPP5 and NSUP5; (f) NSSPP6 and NSUP6; (g) NSSPP7 and NSUP7; (h) NSSPP8
and NSUP8.
Acknowledgements
This work was supported by grants from the National Natural Science
Foundation of China (No. 30171111), the National High Technology
Research and Development Program of China (863 Program) (No.
2003AA234031) and the Special Funds for Major State Basic Research
Program of China (973 Program) (No. 2004CB518904).
References
1. Taylor MF, Wiederholt K, Sverdrup F: Antisense oligonucleotides: a systematic high-throughput approach to target validation and gene function determination. Drug Discov Today 1999, 4:562-567.
2. Flaherty KT, Stevenson JP, O'Dwyer PJ: Antisense therapeutics: lessons from early clinical trials. Curr Opin Oncol 2001, 13:499-505.
3. Crooke ST: An overview of progress in antisense therapeutics. Antisense Nucleic Acid Drug Dev 1998, 8:115-122.
4. Far RK, Nedbal W, Sczakiel G: Concepts to automate the theoretical design of effective antisense oligonucleotides. Bioinformatics 2001, 17:1058-1061.
5. Ho SP, Bao Y, Lesher T, Malhotra R, Ma LY, Fluharty SJ, Sakai RR: Mapping of RNA accessible sites for antisense experiments with oligonucleotide libraries. Nature Biotechnology 1998, 16:59-63.
6. Matveeva OV, Felden B, Tsodikov A, Johnston J, Monia BP, Atkins JF, Gesteland RF, Freier SM: Prediction of antisense oligonucleotide efficacy by in vitro methods. Nature Biotechnology 1998, 16:1374-1375.
7. Matveeva O, Felden B, Audlin S, Gesteland RF, Atkins JF: A rapid in vitro method for obtaining RNA accessibility patterns for complementary DNA probes: correlation with an intracellular pattern and known RNA structures. Nucleic Acids Res 1997, 25:5010-5016.
8. Milner N, Mir KU, Southern EM: Selecting effective antisense reagents on combinatorial oligonucleotide arrays. Nature Biotechnology 1997, 15:537-541.
9. Allawi HT, Dong F, Ip HS, Neri BP, Lyamichev VI: Mapping of RNA accessible sites by extension of random oligonucleotide libraries with reverse transcriptase. RNA 2001, 7:314-327.
10. Zhang HY, Mao J, Zhou D, Xu Y, Thonberg H, Liang Z, Wahlestedt C: mRNA accessibility site tagging (MAST): a novel high throughput method for selecting effective antisense oligonucleotides. Nucleic Acids Res 2003, 31:e72.
11. Camps-Valls G, Chalk AM, Serrano-Lopez A, Martin-Guerrero JD, Sonnhammer ELL: Profiled support vector machine for antisense oligonucleotide efficacy prediction. BMC Bioinformatics 2004, 5:135.
12. Chalk AM, Sonnhammer ELL: Computational antisense oligo prediction with a neural network model. Bioinformatics 2002, 18:1567-1575.
13. Giddings MC, Shah AA, Freier S, Atkins JF, Gesteland RF, Matveeva OV: Artificial neural network prediction of antisense oligodeoxynucleotide activity. Nucleic Acids Res 2002, 30:4295-4304.
14. Sætrom P: Predicting the efficacy of short oligonucleotides in antisense and RNAi experiments with boosted genetic programming. Bioinformatics 2004, 20:3055-3063.
15. Scherer LJ, Rossi JJ: Approaches for the sequence-specific knockdown of mRNA. Nature Biotechnology 2003, 21:1457-1465.
16. Vickers TA, Wyatt JR, Freier SM: Effects of RNA secondary structure on cellular antisense activity. Nucleic Acids Res 2000, 28:1340-1347.
17. Patzel V, Steidl U, Kronenwett R, Haas R, Sczakiel G: A theoretical approach to select effective antisense oligodeoxyribonucleotides at high statistical probability. Nucleic Acids Res 1999, 27:4328-4334.
18. Lima WF, Monia BP, Ecker DJ, Freier SM: Implication of RNA structure on antisense oligonucleotide hybridization kinetics. Biochemistry 1992, 31:12055-12061.
19. Thierry AR, Rahman A, Dritschilo A: Overcoming multi drug resistance in human tumor cells using free and liposomally encapsulated antisense oligodeoxynucleotides. Biochem Biophys Res Commun 1993, 190:952-960.
20. Laptev AV, Lu Z, Colige A, Prockop DJ: Specific inhibition of expression of a human collagen gene (COL1A1) with modified antisense oligonucleotides. Biochemistry 1994, 33:11033-11039.
21. Sczakiel G, Homann M, Rittner K: Computer-aided search for effective antisense RNA target sequences of the human immunodeficiency virus type 1. Antisense Res Dev 1993, 3:45-52.
22. Denman RB: Using RNAFOLD to predict the activity of small catalytic RNAs. Biotechniques 1993, 15:1090-1095.
23. James W, Cowe E: Computational approaches to the identification of ribozyme target sites. Methods Mol Biol 1997, 74:17-26.
24. Higgs PG: RNA secondary structure: physical and computational aspects. Quarterly Reviews of Biophysics 2000, 33:199-253.
25. Zuker M, Stiegler P: Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res 1981, 9:133-148.
26. Dumas JP, Ninio J: Efficient algorithms for folding and comparing nucleic acid sequences. Nucleic Acids Res 1982, 10:197-206.
27. Zuker M: On finding all suboptimal foldings of an RNA molecule. Science 1989, 244:48-52.
28. Yamamoto K, Kitamura Y, Yoshikura H: Computation of statistical secondary structure of nucleic acids. Nucleic Acids Res 1984, 12:335-346.
29. Jaroszewski JW, Syi JL, Ghosh M, Ghosh K, Cohen JS: Targeting of antisense DNA: comparison of activity of anti-rabbit beta-globin oligodeoxyribonucleoside phosphorothioates with computer predictions of mRNA folding. Antisense Res Dev 1993, 3:339-348.
30. Walton SP, Stephanopoulos GN, Yarmush ML, Roth CM: Thermodynamic and kinetic characterization of antisense oligodeoxynucleotide binding to a structured mRNA. Biophys J 2002, 82:366-377.
31. Mathews DH, Burkard ME, Freier SM, Wyatt JR, Turner DH: Predicting oligonucleotide affinity to nucleic acid targets. RNA 1999, 5:1458-1469.
32. Scherr M, Rossi JJ, Sczakiel G, Patzel V: RNA accessibility prediction: a theoretical approach is consistent with experimental studies in cell extracts. Nucleic Acids Res 2000, 28:2455-2461.
33. Ding Y, Lawrence CE: Statistical prediction of single-stranded regions in RNA secondary structure and application to predicting effective antisense target sites and beyond. Nucleic Acids Res 2001, 29:1034-1046.
34. Ding Y, Chan CY, Lawrence CE: Sfold web server for statistical folding and rational design of nucleic acids. Nucleic Acids Res 2004, 32:W135-W141.
35. Giddings MC, Matveeva OV, Atkins JF, Gesteland RF: ODNBase – a web database for antisense oligonucleotide effectiveness studies. Bioinformatics 2000, 16:843-844.
36. Matveeva OV, Mathews DH, Tsodikov AD, Shabalina SA, Gesteland RF, Atkins JF, Freier SM: Thermodynamic criteria for high hit rate antisense oligonucleotide design. Nucleic Acids Res 2003, 31:4989-4994.
37. AOBase [http://www.bioit.org.cn/ao/aobase]
38. Sohail M, Southern EM: Selecting optimal antisense reagents. Advanced Drug Delivery Reviews 2000, 44:23-34.
39. Zuker M: Mfold web server for nucleic acid folding and hybridization. Nucleic Acids Res 2003, 31:3406-3415.
40. Wuchty S, Fontana W, Hofacker IL, Schuster P: Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers 1999, 49:145-165.
41. Hanley J, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982, 143:29-36.
42. Hagan MT, Menhaj M: Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks 1994, 5:989-993.
43. Demuth H, Beale M: Neural Network Toolbox. MathWorks Inc 2004.
44. Mathews DH: Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization. RNA 2004, 10:1178-1190.
45. McCaskill JS: The equilibrium partition function and base pair probabilities for RNA secondary structure. Biopolymers 1990, 29:1105-1119.
46. Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P: Fast folding and comparison of RNA secondary structures. Monatsh Chem 1994, 125:167-188.
47. Fekete M, Hofacker IL, Stadler PF: Prediction of RNA base pairing probabilities on massively parallel computers. J Comput Biol 2000, 7:171-182.
48. Ding Y, Lawrence CE: A Bayesian statistical algorithm for RNA secondary structure prediction. Comput Chem 1999, 23:387-400.
49. Hofacker IL: Vienna RNA secondary structure server. Nucleic Acids Res 2003, 31:3429-3431.