
AVISPA: A web tool for the prediction and analysis of alternative splicing

October 2013
Genome Biology 14(10):R114


DOI:10.1186/gb-2013-14-10-r114


License
CC BY 2.0



SOFTWARE Open Access

AVISPA: a web tool for the prediction and analysis of alternative splicing

Yoseph Barash 1,2,4,5*, Jorge Vaquero-Garcia 1,2, Juan González-Vallinas 1,3†, Hui Yuan Xiong 4†, Weijun Gao, Leo J Lee and Brendan J Frey 4,5

Abstract

Transcriptome complexity and its relation to numerous diseases underpin the need to predict splice variants in silico, along with the regulatory elements that affect them. Building upon our recently described splicing code, we developed AVISPA, a Galaxy-based web tool for splicing prediction and analysis. Given an exon and its proximal sequence, the tool does three things. It predicts whether the exon is alternatively spliced, displays its tissue-dependent splicing patterns, and indicates whether it has associated regulatory elements. We assess AVISPA's accuracy on an independent dataset of tissue-dependent exons and illustrate how the tool can be applied to analyze a gene of interest. AVISPA is available at http://avispa.biociphers.org.

Alternative splicing (AS) is estimated to affect transcripts from over 95% of human multi-exon genes [1,2], and the most common class of AS involves cassette exons. Thousands of alternative cassette exons have been found to be differentially spliced between mammalian tissues, with tissues such as the brain displaying the most complex patterns [1,2]. These observations, together with the association of many splicing defects with disease [3], motivated the recent derivation of a splicing code. The code is a model comprising a set of rules that can predict splicing outcomes given genomic sequence and cellular context [4,5], and it used over 1,000 regulatory features. The code's model was trained on inclusion measurements for 3,700 cassette exons across 27 mouse tissues. It was shown to predict differential AS in four tissue groups: the central nervous system (CNS), muscle, digestive, and embryo versus adult tissues.

The derivation of a predictive splicing code served as a proof of concept and enabled insights into RNA biogenesis [5,6], but it was limited in scope: it was applied only to a subset of alternative exons in specific studies. Given the importance of splicing in the study of gene regulation, development, and disease, it became important to turn the splicing code models into a tool accessible to researchers in a wide range of fields. Here, we present AVISPA (Advanced Visualization of Splicing Prediction and Analysis). AVISPA is a web tool that enables both prediction and splicing analysis of alternative and tissue-dependent exons in any gene of interest. Given an exon, the tool predicts whether it is alternative and whether its inclusion is expected to change in different tissues. It reports whether the exon is known to be alternative based on an internal transcript database. It also performs in silico splicing analysis, identifying putative regulatory elements and mapping them as tracks in the genome browser.

AVISPA’s pipeline is illustrated in Figure 1. Users submit a query by specifying the sequence or genomic coordinates of one of two inputs: a single exon, or a triplet of exons that includes the immediate up- and downstream exons of the query exon. In the pre-processing step, the query is matched against an internal database of exon triplets mined from known transcripts and mapped to the reference genome. The result of this pre-processing is reported in AVISPA’s output. It indicates whether there is existing evidence, for example from alignments of cDNA and EST data, that the exon is alternatively spliced. After the query has been successfully matched, RNA features are extracted from the query exon and flanking regions [5]. In the first prediction stage, the extracted features are used to predict whether the query exon is alternatively or constitutively spliced. If the query is predicted to be an alternative cassette exon, a second prediction stage assesses whether the exon is differentially included in specific tissues.

* Correspondence: yosephb@upenn.edu

†Equal contributors

Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA

Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA

Full list of author information is available at the end of the article

Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Barash et al. Genome Biology 2013, 14:R114

http://genomebiology.com/2013/14/10/R114

The new web tool offers marked improvements over available software. First, it offers 'genome-wide' tissue-dependent splicing predictions, where any exon can be submitted as a query. By contrast, the original work only allowed analysis of a previously mined set of approximately 12,000 cassette exons. Other tools focus on quantifying experimental data or on general splice site and motif analysis [7-9].

Second, AVISPA offers a new in silico analysis of regulatory features and maps putative regulatory sequence motifs onto the genome. As part of this analysis, some motifs are removed in silico to determine their effect on the splicing prediction. These are motifs that are robustly included in the Bayesian ensemble of models and are present in the query. The relative effect of these feature removals is reported as a bar chart of the normalized feature effect (NFE). The putative regulatory motifs are also mapped to the genome using the UCSC genome browser. There they can be combined with other tracks, such as known single nucleotide polymorphisms and binding measurements of known splicing factors [10].

Additionally, the query’s feature values are compared against reference groups in AVISPA’s database, such as alternative or constitutively spliced exons, to assess enrichment. Feature enrichment is reported using a standard heat map ranging from blue, for relatively low values, to red, for relatively high values. For example, a relatively strong 3’ splice site will appear red, indicating a high score, while a weak splice site will be marked blue.
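The in silico motif-removal analysis behind the NFE bar chart can be sketched as follows. This is a minimal illustration under stated assumptions, not AVISPA's implementation:

- "Removing" a motif is assumed to mean setting its count feature to zero.
- A motif's effect is taken to be the resulting drop in predicted probability.
- Effects are normalized so that their absolute values sum to one.
- The toy logistic model `toy_predict` and its weights are hypothetical stand-ins for the Bayesian neural network ensemble.

```python
import math
from typing import Callable, Dict, List


def normalized_feature_effects(
    predict: Callable[[Dict[str, float]], float],
    features: Dict[str, float],
    motifs: List[str],
) -> Dict[str, float]:
    """Zero out each motif present in the query and record the change in the
    predicted probability, normalized so absolute effects sum to 1.
    Positive values mean the motif raises the prediction."""
    baseline = predict(features)
    raw = {}
    for motif in motifs:
        if features.get(motif, 0.0) == 0.0:
            continue  # motif absent from the query: nothing to remove
        knocked_out = dict(features)
        knocked_out[motif] = 0.0
        raw[motif] = baseline - predict(knocked_out)
    total = sum(abs(v) for v in raw.values())
    if total == 0.0:
        return {m: 0.0 for m in raw}
    return {m: v / total for m, v in raw.items()}


# Hypothetical toy model: logistic score over motif counts.
WEIGHTS = {"CU_rich": 1.2, "ACUAAY": 0.6, "UGCAUG": -0.3}


def toy_predict(features: Dict[str, float]) -> float:
    z = sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


query = {"CU_rich": 2.0, "ACUAAY": 1.0, "UGCAUG": 1.0}
nfe = normalized_feature_effects(toy_predict, query, list(WEIGHTS))
```

Keeping the sign separates motifs that support the predicted outcome from those that oppose it. In this toy example the UGCAUG entry comes out negative.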

The new tool also includes several other improvements. First, the prediction technique is now based on a Bayesian neural network, which provides better prediction accuracy than a battery of other methods [11]. Second, the original dataset of 3,700 cassette exons has been expanded to approximately 30,000 exons using data from 33 experiments in 11 mouse tissues [12]. Third, AVISPA uses an extended set of features. These include computationally predicted nucleosome occupancy [13] and primary sequence motifs implicated in general splicing regulation.

Assessing splicing prediction accuracy

The new two-stage prediction paradigm, combined with the expanded dataset, yields a significant improvement in detecting alternative cassette exons (Figure 2a). For example, using only the tissue-dependent splicing predictors achieves an area under the curve (AUC) of 64% for distinguishing between alternative and constitutive exons. The first-stage classifier achieves 86%. The higher accuracy achieved for detecting tissue-dependent exons (94% AUC) is to be expected, as many regulatory features and higher intronic conservation are associated with such exons.

Notably, AVISPA’s sequence-based predictions offer a significant improvement over a similar classifier that directly uses normalized exon expression measurements from 33 experiments [12]. The expression-based classifier achieves a lower overall accuracy of 71% AUC. For high-confidence events at a false positive rate of 2%, its sensitivity is 2.5-fold lower (21% versus 54% for AVISPA). These results illustrate the usefulness of the new tool, which generalizes over experimental conditions and is not limited by technical factors such as microarray noise or read coverage. We note that these accuracy estimates can be considered lower bounds, as some of the events labeled as constitutive in our database may in fact be alternative.
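The two accuracy measures used above, AUC and sensitivity at a fixed false positive rate, can both be computed directly from classifier scores. The following is a generic sketch, not the authors' evaluation code. `alt` and `const` are hypothetical score lists for alternative (positive) and constitutive (negative) exons. `fpr_at_score` also shows how a single score can be translated into a false positive rate against a set of negatives.

```python
from typing import Sequence


def roc_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); ties count as one half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fpr_at_score(score: float, neg: Sequence[float]) -> float:
    """Fraction of negatives scoring at or above `score`."""
    return sum(n >= score for n in neg) / len(neg)


def sensitivity_at_fpr(pos: Sequence[float], neg: Sequence[float],
                       max_fpr: float) -> float:
    """Highest true positive rate over thresholds whose FPR is <= max_fpr."""
    best = 0.0
    for t in sorted(set(pos) | set(neg)):
        if fpr_at_score(t, neg) <= max_fpr:
            best = max(best, sum(p >= t for p in pos) / len(pos))
    return best


alt = [0.9, 0.8, 0.7, 0.3]    # hypothetical scores, alternative exons
const = [0.6, 0.4, 0.2, 0.1]  # hypothetical scores, constitutive exons
print(roc_auc(alt, const))                  # 0.875
print(sensitivity_at_fpr(alt, const, 0.0))  # 0.75
```

The quadratic pair count is fine for illustration. For tens of thousands of exons, a rank-based computation would be preferable.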

[Figure 1 diagram: (1) Query submission; (2) Query matching against the Cassette DB, Transcript DB, and Genome (matched or unmatched); (3) Splicing prediction: Pr[AS] (alternative versus constitutive) and Pr[Ts] for CNS, muscle, digestive, and embryo; (4) Splicing regulatory analysis: feature enrichment heat map from low to high values (columns Reg, F[AS], F[Con], F[Inc], F[Exc]), motif effect, and motif map. Region labels: PreA, A, PostA, I1(5'), I1(3'), I2(5'), I2(3').]

Figure 1 AVISPA’s analysis pipeline. The analysis is composed of the following steps. (1) Query submission: users submit a query composed of either a single exon of interest or an exon triplet that also specifies the up- and downstream exons. (2) Query matching: the submitted query is first matched against internal databases (DB) of known transcripts and alternative exons. If no match is found, the query is searched against the reference genome. If the query cannot be matched (red cross), an error is reported. (3) Splicing prediction: a successfully matched query (light blue rectangle) is scored as an alternative cassette exon, then scored for differential splicing in four tissue groups. (4) Splicing analysis: if the query’s predictions pass a user-defined significance threshold, a splicing analysis is performed. The analysis includes feature enrichment, the effect of in silico motif removal on splicing predictions, and mapping of putative regulatory motifs to the genome. A visual summary of both predictions and splicing analysis is produced (right).


The new tool also achieves a significant improvement in detecting tissue-dependent exons (Figure 2b). The overall accuracy in discriminating between tissue-dependent and non-tissue-dependent exons is 89% AUC. However, accuracy varies considerably between tissues, and between differential inclusion and exclusion within the same tissue type. The highest accuracy was achieved for detecting increased inclusion of exons in CNS (94% AUC) and muscle tissues (91% AUC). The lowest accuracy was for detecting increased exclusion in CNS (85% AUC) and increased inclusion in embryonic tissues (82% AUC).

To test AVISPA on an independent dataset, we computed predictions for a set of cassette exons recently shown to be regulated by the Muscleblind-like proteins Mbnl1/2 in mouse brain, muscle, and heart [14]. Figure 2c shows that AVISPA readily distinguished these exons from constitutive exons (97% AUC). This is similar to its performance in detecting tissue-dependent alternative exons in the original test set. AVISPA discriminates the Mbnl1/2-regulated exons from non-CNS-dependent and non-muscle-dependent exons with an AUC of 93% and 94%, respectively. In silico removal of Mbnl1/2 motifs caused, on average, an almost two-fold larger effect for Mbnl1/2-regulated exons than for non-muscle- and non-heart-dependent exons. AVISPA detects Mbnl1/2-regulated exons more accurately than the tissue-dependent exons in the original test data. This is likely due to a lower false detection rate in the RNA-Seq and CLIP-Seq experiments of [14].

Finally, we tested whether the regulatory features added in the web tool were useful for splicing prediction. As expected, many of the sequence motifs implicated in general splicing regulation were included in the code, especially for differentiating between alternative and constitutive exons. By contrast, the relation between nucleosome occupancy and alternative splicing is less well understood and has attracted considerable research attention [15,16]. The model selected features representing nucleosome occupancy around the alternative exon. However, training the model without these features resulted in similar prediction accuracy (data not shown). This result suggests that other features in our model, such as di- and tri-nucleotide frequencies, already captured the 'predictive power' of the computationally derived nucleosome position features.

Vegfa in silico splicing analysis

Previous work demonstrated how the splicing code model could be used to identify new regulatory elements, detect novel tissue-dependent splicing events, and study the evolution of splicing across vertebrates [6]. Here, we illustrate how the new tool can be used to analyze a well-studied gene of major interest: vascular endothelial growth factor A (Vegfa). Vegfa has a complex and highly conserved pattern of alternative splicing that changes across tissues and developmental stages [17,18]. Its role in angiogenesis, which is controlled in part by alternative splicing, has made it an attractive target of several anticancer therapies. Accordingly, there is considerable interest in identifying the factors that regulate the splicing of Vegfa transcripts [18,19].

Analyzing all Vegfa exon triplets revealed that only exons 6 and 7 were predicted to be cassette exons. Their scores corresponded to false positive rates of 0.009 and 0.017, respectively. For comparison, the scores of the other exons corresponded to a false positive rate of 0.22 or higher (data not shown). These predictions are in line with annotated transcripts: many skip exon 6, one skips exon 7 (ENSMUST00000113519), and several skip both. Exons 6 and 7 were also both predicted, with a false positive rate of less than 0.025, to exhibit differential splicing in all four major tissue groups modeled. While confidence in differential splicing was high, the predictions were not conclusive as to whether exon inclusion would increase or decrease in these tissues. These results reflect the conserved and complex splicing pattern of Vegfa. RT-PCR experiments show a complex bi-phasic increase in exon 6 inclusion in developing mouse and chicken heart [18]. The tool currently does not support prediction of other splice variations of Vegfa, such as the 3’ splice site variation in exon 8.

[Figure 2 plot: ROC curves of sensitivity versus 1 − specificity, with a random baseline (Rand). (a) Alternative exons, legend AUC: Pr[Ts] 64%, 33 exon arrays 71%, Pr[AS] 86%, Pr[AS] tissue dep. 91%. (b) Tissue-dependent exons: CNS.Inc 94%, CNS.Exc 85%, Muscle.Inc 91%, Muscle.Exc 86%, EM.Inc 82%, EM.Exc 89%, Digestive.Inc 87%, Digestive.Exc 88%, all tissue dep. 89%. (c) Mbnl1/2-dependent exons: Pr[AS] Mbnl dep. 97%, CNS Mbnl dep. 94%, Muscle Mbnl dep. 95%.]

Figure 2 Prediction accuracy. (a) Differentiating alternative (n = 11,773) from constitutive (n = 9,638) exons. Detecting which exons are alternative (green) is significantly more accurate than a classifier that uses exon expression measurements from 33 experiments (cyan). It is also more accurate than the original classifier trained to detect only tissue-dependent cassette exons (red). Detection of exons that exhibit tissue-dependent splicing changes (blue, n = 659) is much more accurate. Numbers within each legend represent the area under the curve (AUC). (b) Identifying tissue-dependent splicing. Detecting tissue-dependent splicing changes (n = 865) against a random set of non-tissue-dependent exons (n = 4,000) achieves an overall accuracy of 89% AUC (black). Accuracy varies considerably between tissues, and between detecting increased inclusion (solid line) and increased exclusion (dashed) in a tissue. (c) Detection accuracy for an independent set of Mbnl1/2-dependent exons [14] (n = 461). Differentiating between Mbnl1/2-dependent exons and constitutive exons achieves 97% AUC. Accuracy in detecting Mbnl1/2-dependent exons against a random set of non-tissue-dependent exons (n = 2,000) is approximately 94% AUC for both brain (blue) and muscle (red).

Figure 3 shows the regulatory feature analysis for differential inclusion of Vegfa exon 6 in muscle. The enrichment analysis in Figure 3a shows that the alternative exon is depleted of non-tissue-specific exonic splicing enhancers and highly enriched in exonic splicing silencers. Other highlighted features include:

- enriched secondary-structure-free regions in the upstream intron;
- a distant first AG dinucleotide upstream of the exon;
- a particularly short preceding exon (exon 5).

The preceding exon is only 32 bp long. The enrichment analysis indicates that only 0.127% of the alternative exons in the tool’s reference set have a shorter preceding exon.
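A rank-based comparison like the one behind the 0.127% figure can be sketched as below. This illustration rests on two assumptions. First, the reference values are made up (`reference_lengths` is hypothetical). Second, the 5% tail used to call a feature 'low' (blue) or 'high' (red) is an assumed cutoff, not AVISPA's documented significance criterion.

```python
from typing import Sequence


def fraction_below(value: float, reference: Sequence[float]) -> float:
    """Fraction of reference exons whose feature value is strictly smaller."""
    return sum(r < value for r in reference) / len(reference)


def fraction_above(value: float, reference: Sequence[float]) -> float:
    """Fraction of reference exons whose feature value is strictly larger."""
    return sum(r > value for r in reference) / len(reference)


def enrichment_call(value: float, reference: Sequence[float],
                    tail: float = 0.05) -> str:
    """'low' (blue) or 'high' (red) if the query falls in a tail of the
    reference distribution, otherwise 'typical'. The cutoff is assumed."""
    if fraction_below(value, reference) <= tail:
        return "low"
    if fraction_above(value, reference) <= tail:
        return "high"
    return "typical"


# Hypothetical preceding-exon lengths (bp) for a reference set of exons.
reference_lengths = list(range(40, 240))
print(fraction_below(32, reference_lengths))   # 0.0
print(enrichment_call(32, reference_lengths))  # low
```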

In the in silico motif removal (Figure 3b), the largest effect comes from CU-rich elements known to bind Ptb1/2, followed by an ACUAAY motif known to bind Quaking (Qk). These splicing factors have not previously been reported to regulate Vegfa. However, a recent study estimates that 39% of regulated exons during myogenesis are under the control of one or both of them [20]. A smaller effect on splicing prediction in muscle is associated with intronic motifs known to bind Cugbp1/2 and the Muscleblind-like proteins (Mbnl1/2). Both Cugbp1/2 and Mbnl1/2 have been shown to play an important role in regulating splicing in the developing heart. Overexpression of Cugbp1 or knockdown of Mbnl1 in the adult mouse heart did not significantly alter exon 6 inclusion levels [18]. However, recent results point to possible compensatory effects between Mbnl1 and Mbnl2 [14].

Other elements implicated in Vegfa splicing regulation include the short YCAY motifs known to bind Nova proteins [21]. They also include a UGCAUG motif, known to bind the brain- and muscle-specific splicing factor Fox-1 (A2bp1) and its paralog Fox-2 (Rbm9) [22]. The Fox-1/2 binding site is highly conserved, but it resides over 1 kb downstream of exon 6, and Fox-1/2 have not previously been reported to regulate Vegfa. However, recent results indicate that Fox-2 knockdown in mice clearly alters the Vegfa splicing pattern during heart development (Xiang-Dong Fu, personal communication). Smaller effects associated with non-tissue-specific regulation come from G-rich elements, known to bind hnRNP-F/H, and U-rich elements, known to bind hnRNP-C and Tiar/Tia1 [23]. Notably, Tia1 was previously reported to regulate Vegfa isoform expression [24]. Overall, our exploratory analysis of Vegfa splicing is consistent with previous results. It also offers new insights into the mechanism of Vegfa regulation that are supported by recent experiments.

Figure 3 Analysis of Vegfa exon 6 muscle-dependent inclusion. A subset of the summary page produced by AVISPA is shown. (a) Feature enrichment analysis: the values of the features listed on the left are computed for Vegfa exon 6 and compared against matching feature values in a set of labeled exons. The four comparison sets are:
- alternative exons ('AS', third column from the left);
- constitutive exons ('Const', third column from the right);
- exons differentially included in muscle ('Muscle Inc', second column from the right);
- exons differentially excluded in muscle ('Muscle Exc', rightmost column).

Relative enrichment or depletion of features is indicated using the heat map on the right. Only features with significantly low (blue) and high (red) values are shown here. The second column from the left indicates the genomic region of each feature, using the notation and colors in the top figure. (b) Stacked bar chart (left) of the normalized feature effect (NFE, y-axis) on splicing prediction. Only the top motifs are shown. Motif regions are annotated using the color scheme depicted below. Mapping of the motifs onto the UCSC genome browser is shown on the right. Tracks combining all motifs used by the code model (red), the unbiased motif search [5] (greyscale), and conservation (blue) are added at the bottom.

In summary, we presented a new tool, AVISPA, for in silico prediction and analysis of alternative splicing. The tool is not limited by technical constraints such as sequencing depth, and its predictions for alternatively spliced exons generalize over unmeasured conditions. Beyond the splicing outcome, it lets researchers identify putative regulatory elements and map them to the genome. An independent study recently used these capabilities to identify TIA1 as a regulator of an alternative exon coding miR-412 [25]. Here, we used a recent genome-wide study to demonstrate the tool’s accuracy in predicting muscle-, heart-, and brain-regulated exons. We also performed a detailed in silico splicing analysis of vascular endothelial growth factor A.

Several important enhancements of the tool are ongoing or planned. These include:
- predictions for species other than mouse;
- predictions for additional forms of alternative splicing (for example, alternative 3’ and 5’ splice sites);
- higher resolution of tissue specificity.

Currently, AVISPA’s predictions reflect confidence in alternative splicing or in relative, tissue-dependent inclusion changes. Thus, users may infer that an exon is likely to be alternative, or to be differentially included in brain versus other tissues. Predictions of absolute inclusion levels (for example, 20% inclusion in brain and 40% inclusion in liver) are currently not supported.

The tool has some technical limitations as well. Because of the computational burden involved in processing a query, users can submit only a single cassette exon per query. Queries must be based on annotated exons and cannot contain exons shorter than 10 bases. Non-canonical splicing by the minor spliceosome is not supported.

Nonetheless, AVISPA performs splicing prediction irrespective of experimental limitations and adds a new analysis of regulatory elements. Together, these should serve researchers studying gene regulation, RNA biogenesis, and development. Moreover, AVISPA is built as a flexible platform that can be repeatedly updated as more data and improved models become available. The new computational analysis offered by AVISPA should facilitate the discovery of novel splicing variants, regulatory elements, and genomic variations affecting phenotypic variability or disease.

Materials and methods

Query matching against sequence database

The web tool’s internal database includes three components. The first is a database of 11,773 cassette exons that we previously mined from sequence libraries [5]. The second is a set of 9,638 exon triplets derived from RefSeq [26] and other sequence libraries as described in [5]. In this set, every three consecutive constitutive exons in a transcript define a triplet. These triplets were also scanned against exon expression measurements in 11 mouse tissues [12], and triplets suspected to contain an alternative cassette exon were removed. A query’s sequence is matched against these two transcript databases using BLAT with parameters tileSize = 8, minMatch = 2, and minIdentity = 88. The third database component is the mouse assembly mm10 from the UCSC Genome Browser [27]. A query is matched to the reference genome only when two conditions hold. First, no match is found in the two transcript-based databases. Second, genomic coordinates are specified for all three exons.
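The matching order described above can be sketched as follows: transcript-based databases first, then the genome only as a fallback. This is a simplified, hypothetical illustration.

- The lookup callables and the dictionary-backed stub databases are stand-ins. For simplicity, the genome stub takes a sequence rather than coordinates.
- Only the three BLAT parameters come from the text.
- `blat_command` assembles a command line with BLAT's `-tileSize`, `-minMatch`, and `-minIdentity` options but does not run it.

```python
from typing import Callable, List, Optional, Sequence

# A lookup takes a query sequence and returns a match record or None.
Lookup = Callable[[str], Optional[dict]]


def blat_command(database: str, query_fa: str, out_psl: str) -> List[str]:
    """BLAT invocation with the parameters quoted in the text (not executed)."""
    return ["blat", "-tileSize=8", "-minMatch=2", "-minIdentity=88",
            database, query_fa, out_psl]


def match_query(query_seq: str, transcript_dbs: Sequence[Lookup],
                genome_lookup: Lookup,
                has_all_three_coords: bool) -> Optional[dict]:
    """Try each transcript-based database in turn; fall back to the genome
    only if none matches and coordinates for all three exons were given."""
    for lookup in transcript_dbs:
        hit = lookup(query_seq)
        if hit is not None:
            return hit
    if has_all_three_coords:
        return genome_lookup(query_seq)
    return None  # unmatched: the tool reports an error


# Hypothetical stub databases keyed by sequence.
cassette_db = {"ACGTTGCA": {"source": "cassette exons"}}
triplet_db = {"GGCCAATT": {"source": "constitutive triplets"}}


def genome_stub(seq: str) -> Optional[dict]:
    return {"source": "mm10"}


dbs = [cassette_db.get, triplet_db.get]
print(match_query("GGCCAATT", dbs, genome_stub, False))  # {'source': 'constitutive triplets'}
```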

Extended regulatory feature set

We extended the set of putative regulatory features to include the occurrences of 350 new binding motifs in the seven regions around a cassette exon, as defined in [5]. The motifs correspond to general splicing-related RNA binding proteins (RBPs) in two families:
- SR and SR-related proteins (SC35, SRp20, 9G8, ASF/SF2, SRp30c, SRp38, SRp40, SRp55, SRp75, Tra2α/β);
- hnRNP proteins (hnRNPA1, hnRNPA2/B1, hnRNPF/H, hnRNPG).
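The basic operation behind these features, counting motif occurrences per region, can be sketched with IUPAC-degenerate motifs such as the YCAY and ACUAAY elements discussed for Vegfa. This is a hypothetical illustration. The region names and sequences are made up, and RNA U is treated as DNA T. Overlapping hits are counted, which may differ from the feature definitions in [5].

```python
import re
from typing import Dict, Iterable

# IUPAC nucleotide codes as regex character classes (RNA U mapped to T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}


def count_motif(motif: str, seq: str) -> int:
    """Number of (possibly overlapping) occurrences of a degenerate motif."""
    body = "".join(IUPAC[c] for c in motif.upper())
    pattern = re.compile(f"(?=({body}))")  # zero-width lookahead allows overlaps
    return len(pattern.findall(seq.upper().replace("U", "T")))


def motif_features(motifs: Iterable[str],
                   regions: Dict[str, str]) -> Dict[str, int]:
    """One count feature per (motif, region) pair."""
    return {f"{m}@{name}": count_motif(m, seq)
            for m in motifs for name, seq in regions.items()}


# Hypothetical region sequences around a cassette exon.
regions = {"I1_3prime": "UCAUCAU", "A": "GGGACUAACA"}
print(motif_features(["YCAY", "ACUAAY"], regions))
```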

We also added features encoding computationally predicted nucleosome occupancy around the alternative exon [13]. Features were defined as the average and maximal occupancy scores in the first 100 nucleotides of each intron, and in the first or last 50 nucleotides of the alternative exon.
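The occupancy features can be sketched as windowed summaries of a per-nucleotide occupancy track. The layout below is an assumption made for illustration:
- The track is taken to span the upstream intron, the exon, and the downstream intron, in that order.
- `exon_windows` reads "the first 100 nucleotides" of each intron literally, as its 5' end.
- The occupancy values are made up rather than produced by the predictor of [13].

```python
from statistics import mean
from typing import Dict, Sequence, Tuple


def exon_windows(intron1_len: int, exon_len: int,
                 intron2_len: int) -> Dict[str, Tuple[int, int]]:
    """Half-open windows on a track laid out as intron 1 | exon | intron 2."""
    e0, e1 = intron1_len, intron1_len + exon_len
    return {
        "I1_first100": (0, min(100, intron1_len)),
        "A_first50": (e0, e0 + min(50, exon_len)),
        "A_last50": (e1 - min(50, exon_len), e1),
        "I2_first100": (e1, e1 + min(100, intron2_len)),
    }


def occupancy_features(occupancy: Sequence[float],
                       windows: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Average and maximal occupancy score in each window."""
    feats = {}
    for name, (start, end) in windows.items():
        values = occupancy[start:end]
        feats[f"{name}_mean"] = mean(values)
        feats[f"{name}_max"] = max(values)
    return feats


# Made-up occupancy track: 150-nt intron, 80-nt exon, 200-nt intron.
track = [0.0] * 150 + [1.0] * 80 + [0.5] * 200
feats = occupancy_features(track, exon_windows(150, 80, 200))
```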

Extended training set for tissue-specific alternative splicing

A total of 33 data tracks of normalized expression measurements from Affymetrix exon arrays were downloaded from the UCSC Genome Browser. The tracks comprise measurements in 11 mouse tissues (brain, embryo, heart, kidney, liver, lung, muscle, ovary, spleen, testis, thymus), with three replicates for each tissue [12]. These measurements were used to train an ensemble of Bayesian neural networks [11]. The input features were the expression of each exon and the relative inclusion of a putative cassette exon compared to its flanking exons. From these inputs, the networks identified differential inclusion and exclusion of alternative exons in the four previously identified tissue groups (CNS, muscle, digestive, embryo).

Training was based on a subset of 3,770 cassette exons for which three probabilities had previously been computed in each of the four tissue groups [5]: increased inclusion ($q^{\mathrm{inc}}$), increased exclusion ($q^{\mathrm{exc}}$), and no change ($q^{\mathrm{nc}}$). This training step calibrated the differential splicing estimates from the new set of 33 experiments against the estimates used to train the original splicing model [5]. The model ensemble was then used to estimate differential splicing ($q^{\mathrm{inc}}, q^{\mathrm{exc}}, q^{\mathrm{nc}}$) for the remaining exons. For the original set of 3,770 exons, the differential splicing estimates from the two datasets were averaged. Care was taken to ensure that predictions were based on non-overlapping training sets.

Predicting alternative cassette exons using expression data and a single-stage tissue-specific classifier

The 33 expression data tracks described above were also used to train a Bayesian deep neural network classifier [11], denoted '33 exon arrays' in Figure 2a. From the set of 11,773 cassette exons and 9,638 putative constitutive exons, any exon triplets with missing data were removed. This left a total of 8,986 triplets for training and testing. The single-stage tissue classifier, denoted Pr[Ts] in Figure 2a, predicted alternative exons by taking the maximum over tissues of the chance of differential splicing, $1 - p$. Here $p$ is the predicted probability of no change in that tissue.
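Written out, the single-stage score is $\Pr[\mathrm{Ts}] = \max_t (1 - p_t)$, where $p_t$ is the probability of no change in tissue group $t$; the subscript notation is ours. The following minimal sketch uses hypothetical per-tissue probabilities:

```python
from typing import Dict, Tuple


def pr_ts(no_change: Dict[str, float]) -> Tuple[float, str]:
    """Single-stage score: the largest chance of differential splicing,
    1 - P(no change), over tissue groups, plus the tissue attaining it."""
    tissue = min(no_change, key=no_change.get)  # smallest P(no change)
    return 1.0 - no_change[tissue], tissue


# Hypothetical no-change probabilities for the four tissue groups.
p_nc = {"CNS": 0.90, "muscle": 0.40, "digestive": 0.95, "embryo": 0.70}
score, tissue = pr_ts(p_nc)
```

Minimizing the no-change probability is equivalent to maximizing $1 - p_t$, so the sketch needs only one pass over the tissues.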

Training a splicing code model for alternative exons and for tissue-dependent splicing

To infer a regulatory model, we used a Bayesian neural network. For this task it worked better than support vector machines, boosted decision trees, and other leading machine learning techniques [11]. To discriminate between alternative and constitutive exons, the network had 10 hidden units and a sparsity prior of 0.9 for connections between features and hidden units. For predicting tissue-dependent splicing, it had 20 hidden units and a sparsity prior of 0.95. Varying the sparsity prior between 0.85 and 0.95 and adding up to 10 more hidden units did not have a significant effect on the results (data not shown). As previously described [11], differential splicing ($q^{\mathrm{inc}}, q^{\mathrm{exc}}, q^{\mathrm{nc}}$) was estimated using an ensemble of 5,000 models generated by Markov chain Monte Carlo simulations.
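Bayesian model averaging over the MCMC ensemble amounts to averaging each sampled network's predicted class probabilities. The sketch below is illustrative only, not the sampler of [11]:
- It assumes a single tanh hidden layer and a softmax over the three outcomes (inc, exc, nc).
- It ignores the sparsity prior.
- It replaces genuine MCMC draws with random Gaussian weights; `random_sample` is a hypothetical stand-in.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
# One posterior sample: W1 (H x D), b1 (H), W2 (3 x H), b2 (3);
# the three outputs are (inc, exc, nc).
Sample = Tuple[Matrix, List[float], Matrix, List[float]]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(sample: Sample, x: Sequence[float]) -> List[float]:
    """Class probabilities from one sampled network."""
    W1, b1, W2, b2 = sample
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return softmax([sum(w * hi for w, hi in zip(row, h)) + b
                    for row, b in zip(W2, b2)])


def ensemble_predict(samples: Sequence[Sample],
                     x: Sequence[float]) -> List[float]:
    """Posterior predictive (q_inc, q_exc, q_nc): mean over sampled networks."""
    probs = [forward(s, x) for s in samples]
    return [sum(p[k] for p in probs) / len(probs) for k in range(3)]


def random_sample(d: int, h: int, rng: random.Random) -> Sample:
    """Stand-in for one MCMC draw: Gaussian weights, for illustration only."""
    def g() -> float:
        return rng.gauss(0.0, 1.0)
    return ([[g() for _ in range(d)] for _ in range(h)], [g() for _ in range(h)],
            [[g() for _ in range(h)] for _ in range(3)], [g() for _ in range(3)])


rng = random.Random(0)
samples = [random_sample(d=5, h=20, rng=rng) for _ in range(100)]
q_inc, q_exc, q_nc = ensemble_predict(samples, [0.2, -1.0, 0.5, 0.0, 1.3])
```

Averaging probabilities rather than weights is the standard way to use posterior samples. A mixture of valid distributions is itself a valid distribution.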

Scoring tissue-dependent splicing

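Under the new framework, the probability that a given triplet of exons contains a tissue-dependent cassette exon can be expressed as:

$$P(O_t = \mathrm{ch} \mid \mathbf{r}) = P(\mathrm{AS} \mid \mathbf{r})\, P(O_t = \mathrm{ch} \mid \mathbf{r}, \mathrm{AS}),$$

The terms are defined as follows:
- $P(O_t = \mathrm{ch} \mid \mathbf{r})$ is the probability of observing a change in the exon’s inclusion level in tissue $t$, given the exon’s feature vector $\mathbf{r}$;
- $P(\mathrm{AS} \mid \mathbf{r})$ is the probability that the exon is alternative;
- $P(O_t = \mathrm{ch} \mid \mathbf{r}, \mathrm{AS})$ is the probability of observing differential splicing given that the exon is alternative.

The product form follows from the law of total probability. A constitutive exon, by definition, shows no change in inclusion, so $P(O_t = \mathrm{ch} \mid \mathbf{r}, \neg\mathrm{AS}) = 0$. The first term on the right is computed by the first-stage predictor, and the second term by the second-stage predictor.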
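Numerically, the two-stage combination is a per-tissue product of the two predictors' outputs. The following is a minimal sketch with hypothetical stage outputs; the function name is ours.

```python
from typing import Dict


def tissue_change_probs(p_as: float,
                        p_change_given_as: Dict[str, float]) -> Dict[str, float]:
    """P(O_t = ch | r) = P(AS | r) * P(O_t = ch | r, AS) for each tissue t.
    The constitutive branch contributes nothing, since a constitutive exon
    cannot change its inclusion level."""
    for p in [p_as, *p_change_given_as.values()]:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return {t: p_as * p for t, p in p_change_given_as.items()}


# Hypothetical outputs: stage 1 gives P(AS | r); stage 2 gives P(ch | r, AS).
scores = tissue_change_probs(0.8, {"CNS": 0.5, "muscle": 0.9})
```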

ROC performance evaluation

Receiver operating characteristic (ROC) performance was evaluated using repeated five-fold cross-validation, taking care that predictions were based on non-redundant training sets, as previously described [5]. Discrimination between alternative and constitutive exons was evaluated on a set of 11,773 cassette exons and 9,638 putative constitutive exons derived from EST/cDNA sequences [5]. To assess the accuracy of detecting cassette exons that exhibit a tissue-dependent splicing pattern (for example, differential inclusion in muscle), we compared the scores of such exons to those of a random set of exon triplets that do not exhibit this splicing pattern. The random set was selected using the following procedure. First, we used the 33 genome-wide exon expression measurements described above to quantify the inclusion level of all exon triplets from all RefSeq transcripts. Next, we discarded triplets with missing data and required the relative expression of the upstream and downstream exons to be no more than 1.5-fold apart in all experiments. To avoid probe sets with little signal, we required the upstream and downstream exons to have a normalized absolute value of at least 0.1 in at least 15 experiments. Additionally, we required that, in at least three experiments of the tissue group of interest (for example, digestive), the upstream and downstream exons are not in the


bottom 20th percentile. Finally, the relative expression of each middle exon compared to its flanking exons was used to test whether it is differentially included in each tissue group [28]. Any triplet with a P-value of 0.7 or higher was deemed non-tissue-dependent, and approximately 2,000 such exons were then selected for each tissue as a non-tissue-dependent exon set. Exons were selected by first sampling genes at random and then sampling an exon position within each gene at random. We then verified that these exons are not biased in relative location within the gene or in gene length compared to a random sample of triplets from the genome (data not shown).
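The triplet filters above can be written as a single predicate. This sketch is a hypothetical illustration built on four assumptions. The flanking-exon values are positive, linear-scale normalized expression values, so a fold ratio is defined. Percentile ranks within the tissue group's experiments have already been computed. The per-tissue P-value has already been computed. Both flanking exons must satisfy each per-experiment criterion in the same experiment. The thresholds are the ones stated in the text; the class and function names are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Triplet:
    """Hypothetical container for one exon triplet's measurements."""
    # Normalized expression of the flanking exons in every experiment (None = missing).
    upstream: List[Optional[float]]
    downstream: List[Optional[float]]
    # Percentile ranks (0-100) of the flanking exons in the experiments of the
    # tissue group of interest (assumed precomputed against all exons).
    upstream_pct: List[float]
    downstream_pct: List[float]
    # P-value for differential inclusion of the middle exon in that tissue group.
    p_value: float


def is_non_tissue_dependent(
    t: Triplet,
    max_fold: float = 1.5,
    min_abs: float = 0.1,
    min_expressed: int = 15,
    min_tissue_expts: int = 3,
    bottom_pct: float = 20.0,
    min_p_value: float = 0.7,
) -> bool:
    # 1. Discard triplets with missing data.
    if any(v is None for v in t.upstream + t.downstream):
        return False
    pairs = list(zip(t.upstream, t.downstream))
    # 2. Flanking exons at most max_fold apart in all experiments
    #    (non-positive values are rejected because the ratio is undefined).
    for u, d in pairs:
        lo, hi = min(u, d), max(u, d)
        if lo <= 0 or hi / lo > max_fold:
            return False
    # 3. Enough signal: both flanking exons >= min_abs in at least min_expressed experiments.
    if sum(abs(u) >= min_abs and abs(d) >= min_abs for u, d in pairs) < min_expressed:
        return False
    # 4. Both flanking exons outside the bottom percentile in enough tissue experiments.
    above = sum(
        pu >= bottom_pct and pd >= bottom_pct
        for pu, pd in zip(t.upstream_pct, t.downstream_pct)
    )
    if above < min_tissue_expts:
        return False
    # 5. High P-value: no evidence of tissue-dependent inclusion.
    return t.p_value >= min_p_value


if __name__ == "__main__":
    n = 33  # genome-wide experiments, as in the text
    stable = Triplet([1.0] * n, [1.2] * n, [50.0] * 4, [60.0] * 4, p_value=0.8)
    print(is_non_tissue_dependent(stable))  # True
    regulated = Triplet([1.0] * n, [1.2] * n, [50.0] * 4, [60.0] * 4, p_value=0.01)
    print(is_non_tissue_dependent(regulated))  # False
```

The cheap structural checks (missing data, fold ratio) run first, so most triplets are rejected before the per-tissue statistics are consulted.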

While small variations in the parameters of the above process did not have a notable effect on the results, we did detect an apparent selection bias in this procedure. Specifically, using expression measurements to select exons with high confidence in non-tissue-dependent splicing may favor constitutive exons. Note that the 'true' label of any given exon as alternative or constitutive is unavailable. However, since our prediction algorithm has proved accurate in distinguishing alternative from constitutive exons (Figure 2a), we applied it to the set of 2,000 non-tissue-dependent exons selected for each tissue group. Compared to a random set of 1,000 exon triplets, these exons were biased towards constitutive exon scores (Additional file 1). To correct for this apparent bias, we subsampled 1,000 exons for each tissue group so that their scores as alternative matched those in the random set (Additional file 1, green and red lines). This corrected set of 4,000 predictions in total was then used for subsequent analysis (Figure 2b,c). We note that without this correction, the initial set of non-tissue-dependent exons yields higher performance than that shown in Figure 2.
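One simple way to implement the subsampling step is stratified sampling on binned scores. The alternative-versus-constitutive scores are binned, and each bin of the candidate pool is sampled in proportion to that bin's share of the reference (random) set. The text does not specify the exact scheme, so the binning, the bin count, the scores lying in [0, 1], and the function names below are all assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def _bin(score: float, n_bins: int) -> int:
    # Map a score in [0, 1] to a bin index; a score of exactly 1.0 goes in the last bin.
    return min(int(score * n_bins), n_bins - 1)


def match_score_distribution(
    candidates: Sequence[float],
    reference: Sequence[float],
    n_target: int,
    n_bins: int = 10,
    seed: int = 0,
) -> List[int]:
    """Return indices of `candidates` whose binned score histogram follows `reference`."""
    if not reference:
        raise ValueError("reference set is empty")
    rng = random.Random(seed)
    by_bin: Dict[int, List[int]] = defaultdict(list)
    for i, s in enumerate(candidates):
        by_bin[_bin(s, n_bins)].append(i)
    ref_counts = [0] * n_bins
    for s in reference:
        ref_counts[_bin(s, n_bins)] += 1
    chosen: List[int] = []
    for b in range(n_bins):
        want = round(n_target * ref_counts[b] / len(reference))
        pool = by_bin.get(b, [])
        # If a bin is short of candidates, take all of them (the match is then approximate).
        chosen.extend(rng.sample(pool, min(want, len(pool))))
    return chosen


if __name__ == "__main__":
    # Candidates skewed toward low (constitutive-like) scores; the reference is flat.
    candidates = [0.05] * 80 + [0.55] * 20 + [0.95] * 20
    reference = [0.05, 0.55, 0.95] * 10
    picked = match_score_distribution(candidates, reference, n_target=30)
    print(len(picked))  # 30
```

Per-bin counts are rounded, and a bin may hold fewer candidates than requested, so the subsample size can differ slightly from `n_target`. Finer bins give a closer match but make such shortfalls more likely.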

In silico feature removal and normalized feature effect

To evaluate the relative effect of a putative regulatory sequence motif (for example, the occurrence of a [U]GCAUG motif, known to bind Fox1/2, upstream of the alternative exon), the corresponding feature $f$ is first set to zero. The splicing predictions with the mutated feature, denoted $(p^{\mathrm{inc}}_{\Delta f}, p^{\mathrm{exc}}_{\Delta f})$, are then computed. The total effect on differential splicing is defined as

$$\mathrm{FE}_f = \left| p^{\mathrm{inc}} - p^{\mathrm{inc}}_{\Delta f} \right| + \left| p^{\mathrm{exc}} - p^{\mathrm{exc}}_{\Delta f} \right|,$$

where $(p^{\mathrm{inc}}, p^{\mathrm{exc}})$ are the predictions for the original feature vector. This definition aims to capture the effect of features that change the confidence in a splicing change, and also of features that change the relative confidence in differential inclusion versus exclusion. Finally, the normalized feature effect (NFE) is defined as:

$$\mathrm{NFE}_f = \frac{\mathrm{FE}_f}{\sum_{j \in J} \mathrm{FE}_j},$$

where $J$ is the set of robust features. By itself, the NFE has no associated measure of statistical significance. It serves mainly as a quantitative guide for researchers who want to know which of the identified regulatory features have the largest effect on the model's prediction confidence.
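The feature-removal procedure and the FE and NFE definitions above can be sketched as follows. This is an illustration under stated assumptions, not the AVISPA code. `toy_predict` is a hypothetical stand-in for the trained model that maps a feature vector to $(p^{\mathrm{inc}}, p^{\mathrm{exc}})$, and the set of robust features $J$ is passed in as a list of indices. Because FE sums the absolute changes in both outputs, a feature that raises $p^{\mathrm{inc}}$ while lowering $p^{\mathrm{exc}}$ by the same amount still receives a nonzero effect.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# A predictor maps a feature vector to (p_inc, p_exc).
Predictor = Callable[[Sequence[float]], Tuple[float, float]]


def feature_effect(predict: Predictor, features: Sequence[float], index: int) -> float:
    """FE_f = |p_inc - p_inc_df| + |p_exc - p_exc_df| after zeroing feature `index`."""
    p_inc, p_exc = predict(features)
    mutated = list(features)
    mutated[index] = 0.0  # in silico removal: set the feature to zero
    q_inc, q_exc = predict(mutated)
    return abs(p_inc - q_inc) + abs(p_exc - q_exc)


def normalized_feature_effects(
    predict: Predictor, features: Sequence[float], robust: Sequence[int]
) -> Dict[int, float]:
    """NFE_f = FE_f / sum of FE_j over the robust features j in J."""
    effects = {j: feature_effect(predict, features, j) for j in robust}
    total = sum(effects.values())
    if total == 0.0:
        return {j: 0.0 for j in robust}  # no robust feature changes the prediction
    return {j: fe / total for j, fe in effects.items()}


def toy_predict(r: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical model: feature 0 favors inclusion, feature 1 favors
    exclusion, and feature 2 is ignored."""
    def sigmoid(x: float) -> float:
        return 1.0 / (1.0 + math.exp(-x))
    p_inc = 0.5 * sigmoid(2.0 * r[0] - 1.0 * r[1])
    p_exc = 0.5 * sigmoid(1.5 * r[1] - 1.0 * r[0])
    return p_inc, p_exc


if __name__ == "__main__":
    nfe = normalized_feature_effects(toy_predict, [1.0, 1.0, 1.0], robust=[0, 1, 2])
    for j, value in sorted(nfe.items(), key=lambda kv: -kv[1]):
        print(f"feature {j}: NFE = {value:.3f}")
```

The NFE values over $J$ sum to one by construction. This is why they can be shown as a stacked bar chart, but it is also why, as noted above, they rank features rather than test them.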

Additional file

Additional file 1: Figure S1. Correcting constitutive-exon selection bias in non-tissue-dependent exons. Exon scores for being alternative versus constitutive (x-axis) are plotted as a cumulative distribution function (CDF, y-axis). The initial set of selected non-tissue-dependent exons (blue) was biased towards constitutive exons compared to a random sample of 1,000 exon triplets from the genome (red). Subsampling the original set of 2,000 exons per tissue to fit the score distribution of the random set gave a good fit (green). Both the green and red curves are accumulated over all exons in all tissues, as no significant difference was observed between tissues.

Abbreviations

AS: Alternative splicing; AUC: Area under the curve; AVISPA: Advanced visualization of splicing prediction and analysis; CNS: Central nervous system; EST: Expressed sequence tag; NFE: Normalized feature effect.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

YB and BJF conceived of the project. YB developed the combined prediction framework and in silico feature analysis. YB, JVG and WG developed the analysis pipeline with input from all authors. YB, JVG, WG and LJL created the sequence databases. JGV, WG and JVG developed the web tool. HYX, YB and BJF developed the prediction algorithms. YB and JVG performed the data analysis. YB wrote the paper with input from BJF. All authors read and approved the final manuscript.

Acknowledgements

We thank Ben Blencowe for his support and advice throughout this project. We thank Kristen Lynch for helpful feedback and discussions; Xiang-Dong Fu for sharing experimental results; and members of the Barash, Lynch, Blencowe and Frey labs for helpful comments on the manuscript and suggestions for the web tool. JGV was funded through an FPI grant from the Spanish Ministry of Science; HYX, WG and LJL were funded by NSERC Steacie and CIHR grants to BJF. While at the University of Toronto, YB was funded by an OGI Spark grant and a Genome Canada grant to BJF and others.

Author details

Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA.

Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, USA.

Universitat Pompeu Fabra, Barcelona 08003, Spain.

Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4, Canada.

Banting and Best Department of Medical Research, University of Toronto, Toronto, ON M5G 1L6, Canada.

Received: 9 July 2013 Accepted: 11 October 2013

Published: 24 October 2013

References

1. Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ: Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet 2008, 40:1413–1415.

2. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB: Alternative isoform regulation in human tissue transcriptomes. Nature 2008, 456:470–476.

3. Wang GS, Cooper TA: Splicing in disease: disruption of the splicing code and the decoding machinery. Nat Rev Genet 2007, 8:749–761.

4. Wang Z, Burge CB: Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA 2008, 14:802–813.


5. Barash Y, Calarco JA, Gao W, Pan Q, Wang X, Shai O, Blencowe BJ, Frey BJ: Deciphering the splicing code. Nature 2010, 465:53–59.

6. Barbosa-Morais NL, Irimia M, Pan Q, Xiong HY, Gueroussov S, Lee LJ, Slobodeniuc V, Kutter C, Watt S, Çolak R, Kim T, Misquitta-Ali CM, Wilson MD, Kim PM, Odom DT, Frey BJ, Blencowe BJ: The evolutionary landscape of alternative splicing in vertebrate species. Science 2012, 338:1587–1593.

7. Yeo G, Burge CB: Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol 2004, 11:377–394.

8. Dogan RI, Getoor L, Wilbur WJ, Mount SM: SplicePort–an interactive splice-site analysis tool. Nucleic Acids Res 2007, 35:W285–W291.

9. Cartegni L, Wang J, Zhu Z, Zhang MQ, Krainer AR: ESEfinder: a web resource to identify exonic splicing enhancers. Nucleic Acids Res 2003, 31:3568–3571.

10. Yeo GW, Coufal NG, Liang TY, Peng GE, Fu X-D, Gage FH: An RNA code for the FOX2 splicing regulator revealed by mapping RNA-protein interactions in stem cells. Nat Struct Mol Biol 2009, 16:130–137.

11. Xiong HY, Barash Y, Frey BJ: Bayesian prediction of tissue-regulated splicing using RNA sequence and cellular context. Bioinformatics 2011, 27:2554–2562.

12. Pohl AA, Sugnet CW, Clark TA, Smith K, Fujita PA, Cline MS: Affy exon tissues: exon levels in normal tissues in human, mouse and rat. Bioinformatics 2009, 25:2442–2443.

13. Xi L, Fondufe-Mittendorf Y, Xia L, Flatow J, Widom J, Wang JP: Predicting nucleosome positioning using a duration hidden Markov model. BMC Bioinformatics 2010, 11:346.

14. Wang ET, Cody NAL, Jog S, Biancolella M, Wang TT, Treacy DJ, Luo S, Schroth GP, Housman DE, Reddy S, Lécuyer E, Burge CB: Transcriptome-wide regulation of pre-mRNA splicing and mRNA localization by muscleblind proteins. Cell 2012, 150:710–724.

15. Schwartz S, Meshorer E, Ast G: Chromatin organization marks exon-intron structure. Nat Struct Mol Biol 2009, 16:990–995.

16. Tilgner H, Nikolaou C, Althammer S, Sammeth M, Beato M, Valcárcel J, Guigó R: Nucleosome positioning as a determinant of exon recognition. Nat Struct Mol Biol 2009, 16:996–1001.

17. Harper SJ, Bates DO: VEGF-A splicing: the key to anti-angiogenic therapeutics? Nat Rev Cancer 2008, 8:880–887.

18. Kalsotra A, Xiao X, Ward AJ, Castle JC, Johnson JM, Burge CB, Cooper TA: A postnatal switch of CELF and MBNL proteins reprograms alternative splicing in the developing heart. Proc Natl Acad Sci USA 2008, 105:20333–20338.

19. Nowak DG, Amin EM, Rennel ES, Hoareau-Aveilla C, Gammons M, Damodoran G, Hagiwara M, Harper SJ, Woolard J, Ladomery MR, Bates DO: Regulation of vascular endothelial growth factor (VEGF) splicing from pro-angiogenic to anti-angiogenic isoforms: a novel therapeutic strategy for angiogenesis. J Biol Chem 2009, 285:5532–5540.

20. Hall MP, Nagel RJ, Fagg WS, Shiue L, Cline MS, Perriman RJ, Donohue JP, Ares M: Quaking and PTB control overlapping splicing regulatory networks during muscle cell differentiation. RNA 2013, 19:627–638.

21. Ule J, Stefani G, Mele A, Ruggiu M, Wang X, Taneri B, Gaasterland T, Blencowe BJ, Darnell RB: An RNA map predicting Nova-dependent splicing regulation. Nature 2006, 444:580–586.

22. Kawamoto S: Neuron-specific alternative splicing of nonmuscle myosin II heavy chain-B pre-mRNA requires a cis-acting intron sequence. J Biol Chem 1996, 271:17613–17616.

23. Aznarez I, Barash Y, Shai O, He D, Zielenski J, Tsui LC, Parkinson J, Frey BJ, Rommens JM, Blencowe BJ: A systematic analysis of intronic sequences downstream of 5’ splice sites reveals a widespread role for U-rich motifs and TIA1/TIAL1 proteins in alternative splicing regulation. Genome Res 2008, 18:1247–1258.

24. Chen M, Manley JL: Mechanisms of alternative splicing regulation: insights from molecular and genomics approaches. Nat Rev Mol Cell Biol 2009, 10:741–754.

25. Melamed Z, Levy A, Ashwal-Fluss R, Lev-Maor G, Mekahel K, Atias N, Gilad S, Sharan R, Levy C, Kadener S, Ast G: Alternative splicing regulates biogenesis of miRNAs located across exon-intron junctions. Mol Cell 2013, 50:869–881.

26. Pruitt KD, Tatusova T, Brown GR, Maglott DR: NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res 2012, 40:D130–D135.

27. Dreszer TR, Karolchik D, Zweig AS, Hinrichs AS, Raney BJ, Kuhn RM, Meyer LR, Wong M, Sloan CA, Rosenbloom KR: The UCSC Genome Browser database: extensions and updates 2011. Nucleic Acids Res 2012, 40:D918–D923.

28. Ben-Dor A, Friedman N, Yakhini Z: Scoring genes for relevance. Agilent; 2000.

doi:10.1186/gb-2013-14-10-r114

Cite this article as: Barash et al.: AVISPA: a web tool for the prediction and analysis of alternative splicing. Genome Biology 2013, 14:R114.


Predicting exon criticality from protein sequence

Article

Full-text available

Mar 2022
NUCLEIC ACIDS RES

Alternative splicing is frequently involved in the diversification of protein function and can also be modulated for therapeutic purposes. Here we develop a predictive model, called Exon ByPASS (predicting Exon skipping Based on Protein amino acid SequenceS), to assess the criticality of exon inclusion based solely on information contained in the amino acid sequence upstream and downstream of the exon junctions. By focusing on protein sequence, Exon ByPASS predicts exon skipping independent of tissue and species in the absence of any intronic information. We validate model predictions using transcriptomic and proteomic data and show that the model can capture exon skipping in different tissues and species. Additionally, we reveal potential therapeutic opportunities by predicting synthetically skippable exons and neo-junctions arising in cancer cells.

Omics analyses provide insights to CART cell therapy resistance

Article

Full-text available

May 2021

Chimeric antigen receptor T (CART) cell therapy has revolutionized the treatment of relapsed/refractory B cell malignancies in recent years. Despite high initial response rates, durable response rates are low, and CART cell efficacy in solid tumors is very modest. Additionally, the overall success of CART cell therapy is limited by toxicities such as cytokine release syndrome and neurotoxicity. Decades of advancement in genome sequencing technology and bioinformatics have given us a better understanding of how cancer develops and evolves following treatments. This has resulted in a better understanding of patient response to cancer treatment on a molecular level. Resistance to CART cell therapy can be mediated by the cancer cells, the tumor microenvironment, or the patient’s T cells. In this review, we will outline lessons learned from multi-omics studies (1) to identify biomarkers of response or toxicity to CART cell therapy or (2) to develop biomarker-guided therapeutic interventions to overcome these limitations.

Ancient antagonism between CELF and RBFOX families tunes mRNA splicing outcomes

Preprint

Full-text available

Jan 2017

Over 95% of human multi-exon genes undergo alternative splicing, a process important in normal development and often dysregulated in disease. We sought to analyze the global splicing regulatory network of CELF2 in human T cells, a well-studied splicing regulator critical to T cell development and function. By integrating high-throughput sequencing data for binding and splicing quantification with sequence features and probabilistic splicing code models, we find evidence of splicing antagonism between CELF2 and the RBFOX family of splicing factors. We validate this functional antagonism through knockdown and overexpression experiments in human cells and find CELF2 represses RBFOX2 mRNA and protein levels. Because both families of proteins have been implicated in the development and maintenance of neuronal, muscle, and heart tissues, we analyzed publicly available data in these systems. Our analysis suggests global, antagonistic co-regulation of splicing by the CELF and RBFOX proteins in mouse muscle and heart in several physiologically relevant targets including proteins involved in calcium signaling and members of the MEF2 family of transcription factors. Importantly, a number of these co-regulated events are aberrantly spliced in mouse models and human patients with diseases that affect these tissues including heart failure, diabetes, or myotonic dystrophy. Finally, analysis of exons regulated by ancient CELF family homologs in chicken, and Drosophila suggests this antagonism is conserved through evolution.

Enhanced Integrated Gradients: Improving interpretability of deep learning models using splicing codes as a case study

Article

Full-text available

Jun 2020
GENOME BIOL

Despite the success and fast adaptation of deep learning models in biomedical domains, their lack of interpretability remains an issue. Here, we introduce Enhanced Integrated Gradients (EIG), a method to identify significant features associated with a specific prediction task. Using RNA splicing prediction as well as digit classification as case studies, we demonstrate that EIG improves upon the original Integrated Gradients method and produces sets of informative features. We then apply EIG to identify A1CF as a key regulator of liver-specific alternative splicing, supporting this finding with subsequent analysis of relevant A1CF functional (RNA-seq) and binding data (PAR-CLIP).

An Overview of Alternative Splicing Defects Implicated in Myotonic Dystrophy Type I

Article

Full-text available

Sep 2020

Myotonic dystrophy type I (DM1) is the most common form of adult muscular dystrophy, caused by expansion of a CTG triplet repeat in the 3' untranslated region (3'UTR) of the myotonic dystrophy protein kinase (DMPK) gene. The pathological CTG repeats result in protein trapping by expanded transcripts, a decreased DMPK translation and the disruption of the chromatin structure, affecting neighboring genes expression. The muscleblind-like (MBNL) and CUG-BP and ETR-3-like factors (CELF) are two families of tissue-specific regulators of developmentally programmed alternative splicing that act as antagonist regulators of several pre-mRNA targets, including troponin 2 (TNNT2), insulin receptor (INSR), chloride channel 1 (CLCN1) and MBNL2. Sequestration of MBNL proteins and up-regulation of CELF1 are key to DM1 pathology, inducing a spliceopathy that leads to a developmental remodelling of the transcriptome due to an adult-to-foetal splicing switch, which results in the loss of cell function and viability. Moreover, recent studies indicate that additional pathogenic mechanisms may also contribute to disease pathology, including a misregulation of cellular mRNA translation, localization and stability. This review focuses on the cause and effects of MBNL and CELF1 deregulation in DM1, describing the molecular mechanisms underlying alternative splicing misregulation for a deeper understanding of DM1 complexity. To contribute to this analysis, we have prepared a comprehensive list of transcript alterations involved in DM1 pathogenesis, as well as other deregulated mRNA processing pathways implications.

Epigenome-based splicing prediction using a recurrent neural network

Article

Full-text available

Jun 2020
PLOS COMPUT BIOL

Alternative RNA splicing provides an important means to expand metazoan transcriptome diversity. Contrary to what was accepted previously, splicing is now thought to predominantly take place during transcription. Motivated by emerging data showing the physical proximity of the spliceosome to Pol II, we surveyed the effect of epigenetic context on co-transcriptional splicing. In particular, we observed that splicing factors were not necessarily enriched at exon junctions and that most epigenetic signatures had a distinctly asymmetric profile around known splice sites. Given this, we tried to build an interpretable model that mimics the physical layout of splicing regulation where the chromatin context progressively changes as the Pol II moves along the guide DNA. We used a recurrent-neural-network architecture to predict the inclusion of a spliced exon based on adjacent epigenetic signals, and we showed that distinct spatio-temporal features of these signals were key determinants of model outcome, in addition to the actual nucleotide sequence of the guide DNA strand. After the model had been trained and tested (with >80% precision-recall curve metric), we explored the derived weights of the latent factors, finding they highlight the importance of the asymmetric time-direction of chromatin context during transcription.

Predicting cell-type-specific exon inclusion in the human brain reveals more complex splicing mechanisms in neurons than glia

Preprint

Full-text available

Mar 2024

Alternative splicing contributes to molecular diversity across brain cell types. RNA-binding proteins (RBPs) regulate splicing, but the genome-wide mechanisms remain poorly understood. Here, we used RBP binding sites and/or the genomic sequence to predict exon inclusion in neurons and glia as measured by long-read single-cell data in human hippocampus and frontal cortex. We found that alternative splicing is harder to predict in neurons compared to glia in both brain regions. Comparing neurons and glia, the position of RBP binding sites in alternatively spliced exons in neurons differ more from non-variable exons indicating distinct splicing mechanisms. Model interpretation pinpointed RBPs, including QKI, potentially regulating alternative splicing between neurons and glia. Finally, using our models, we accurately predict and prioritize the effect of splicing QTLs. Taken together, our models provide new insights into the mechanisms regulating cell-type-specific alternative splicing and can accurately predict the effect of genetic variants on splicing.

Intelligent Drug Design and Use for Cancer Treatment: The Roles of AI and Precision Oncology in Targeting Patient-Specific Splicing Profiles

Chapter

Jan 2023

The development of new drugs is expensive, time-consuming, and often results in failure. These problems can partially be solved through the use of AI to identify drug targets, search for molecules capable of interacting with these targets, and then model the interactions of the drug and its target while modelling the physiochemical properties of this drug. Alternative splicing is commonly altered in cancer and as such has become a target for the designing of new drugs. While many drugs have been designed to target either the new isoforms that favour cancer development or proteins involved in the splicing pathway, AI can improve this by helping screen proteome and transcriptome databases to identify new splice variants. AI can also model the three-dimensional structure of new isoforms in order to screen for compounds that can bind exclusively to these isoforms.

WSN Node Access Authentication Protocol Based on Trusted Computing

Article

Mar 2022
SIMUL MODEL PRACT TH

Although wireless sensor networks (WSNs) are widely used in many fields, such as industrial production, medical studies, and environmental monitoring, they are vulnerable to various security problems. This study proposes a WSN node access authentication protocol based on trusted connection architecture to prevent easy node capture and various malicious attacks as well as to address the limited energy and computing power and different levels of node credibility in WSNs. First, each node of a WSN is configured using a trusted platform module to ensure complete key generation and safe storage, and thus provides security for the access protocol. Second, an alarm mechanism is introduced to avoid cluster node issues, such as not forwarding data, forwarding part of the data, and forwarding wrong data. This mechanism enhances the troubleshooting capability. Finally, during node access, bidirectional node identity authentication, platform identity authentication, and platform integrity verification are performed to achieve trusted node access. Our protocol is formally verified using Syverson-Van Oorschot (SVO) logic. The security features are applied to analyze the protocol, and back-end analysis modules such as On-the-fly Model-Checker (OFMC) and Constraint Logic based Attack Searcher (CL-AtSe) of the Automated Validation of Internet Security Protocols and Applications (AVISPA) tool are used to test the protocol. The theoretical analysis and test results show that the established security target of the protocol can resist network attacks in real application scenarios. In addition, the implementation efficiency of the protocol is sufficiently analyzed and evaluated. The results show that the protocol has high execution efficiency. In particular, the protocol is suitable for WSNs with high security requirements and limited computing power.

Artificial intelligence and machine learning‐aided drug discovery in central nervous system diseases: State‐of‐the‐arts and future directions

Article

Full-text available

Dec 2020

Neurological disorders significantly outnumber diseases in other therapeutic areas. However, developing drugs for central nervous system (CNS) disorders remains the most challenging area in drug discovery, accompanied with the long timelines and high attrition rates. With the rapid growth of biomedical data enabled by advanced experimental technologies, artificial intelligence (AI) and machine learning (ML) have emerged as an indispensable tool to draw meaningful insights and improve decision making in drug discovery. Thanks to the advancements in AI and ML algorithms, now the AI/ML‐driven solutions have an unprecedented potential to accelerate the process of CNS drug discovery with better success rate. In this review, we comprehensively summarize AI/ML‐powered pharmaceutical discovery efforts and their implementations in the CNS area. After introducing the AI/ML models as well as the conceptualization and data preparation, we outline the applications of AI/ML technologies to several key procedures in drug discovery, including target identification, compound screening, hit/lead generation and optimization, drug response and synergy prediction, de novo drug design, and drug repurposing. We review the current state‐of‐the‐art of AI/ML‐guided CNS drug discovery, focusing on blood–brain barrier permeability prediction and implementation into therapeutic discovery for neurological diseases. Finally, we discuss the major challenges and limitations of current approaches and possible future directions that may provide resolutions to these difficulties.

Quaking and PTB control overlapping splicing regulatory networks during muscle cell differentiation

Article

Full-text available

Mar 2013
RNA

Alternative splicing contributes to muscle development, but a complete set of muscle-splicing factors and their combinatorial interactions are unknown. Previous work identified ACUAA ("STAR" motif) as an enriched intron sequence near muscle-specific alternative exons such as Capzb exon 9. Mass spectrometry of myoblast proteins selected by the Capzb exon 9 intron via RNA affinity chromatography identifies Quaking (QK), a protein known to regulate mRNA function through ACUAA motifs in 3' UTRs. We find that QK promotes inclusion of Capzb exon 9 in opposition to repression by polypyrimidine tract-binding protein (PTB). QK depletion alters inclusion of 406 cassette exons whose adjacent intron sequences are also enriched in ACUAA motifs. During differentiation of myoblasts to myotubes, QK levels increase two- to threefold, suggesting a mechanism for QK-responsive exon regulation. Combined analysis of the PTB- and QK-splicing regulatory networks during myogenesis suggests that 39% of regulated exons are under the control of one or both of these splicing factors. This work provides the first evidence that QK is a global regulator of splicing during muscle development in vertebrates and shows how overlapping splicing regulatory networks contribute to gene expression programs during differentiation.

The Evolutionary Landscape of Alternative Splicing in Vertebrate Species

Article

Full-text available

Dec 2012
SCIENCE

Whence Species Variation? Vertebrates have widely varying phenotypes that are at odds with their much more limited proteincoding genotypes and conserved messenger RNA expression patterns. Genes with multiple exons and introns can undergo alternative splicing, potentially resulting in multiple protein isoforms (see the Perspective by Papasaikas and Valcárcel ). Barbosa-Morais et al. (p. 1587 ) and Merkin et al. (p. 1593 ) analyzed alternative splicing across the genomes of a variety of vertebrates, including human, primates, rodents, opossum, platypus, chicken, lizard, and frog. The findings suggest that the evolution of alternative splicing has for the most part been very rapid and that alternative splicing patterns of most organs more strongly reflect the identity of the species rather than the organ type. Species-classifying alternative splicing can affect key regulators, often in disordered regions of proteins that may influence protein-protein interactions, or in regions involved in protein phosphorylation.

The UCSC Genome Browser database: Extensions and updates 2013

Article

Full-text available

Nov 2012
NUCLEIC ACIDS RES

The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic datasets. As of September 2012, genomic sequence and a basic set of annotation ‘tracks’ are provided for 63 organisms, including 26 mammals, 13 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms, yeast and sea hare. In the past year 19 new genome assemblies have been added, and we anticipate releasing another 28 in early 2013. Further, a large number of annotation tracks have been either added, updated by contributors or remapped to the latest human reference genome. Among these are an updated UCSC Genes track for human and mouse assemblies. We have also introduced several features to improve usability, including new navigation menus. This article provides an update to the UCSC Genome Browser database, which has been previously featured in the Database issue of this journal.

Scoring genes for relevance

Article

Full-text available

Jan 2000

Amir Ben-Dor

Recent molecular level studies that compare different classes of disease conditions produce labeled gene expression data. We examine scoring methods that are useful in mining such gene expression data for genes that have biological relevance to the condition studied. Relevance information is useful in identifying genes driving the biological process, in selecting small subsets of genes with diagnostic potential, and in better understanding the condition studied and its relationship to known or hypothesized biochemical pathways. We present the scoring methods; de-scribe a process for computing the corresponding p-values; and finally, present results from application to actual cancer gene ex-pression data. These include applying classification techniques employing varying relevance based selected sets of genes.

NCBI Reference Sequences (RefSeq): Current status, new features and genome annotation policy

Article

Full-text available

Nov 2011
NUCLEIC ACIDS RES

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database is a collection of genomic, transcript and protein sequence records. These records are selected and curated from public sequence archives and represent a significant reduction in redundancy compared to the volume of data archived by the International Nucleotide Sequence Database Collaboration. The database includes over 16 000 organisms, 2.4 × 106 genomic records, 13 × 106 proteins and 2 × 106 RNA records spanning prokaryotes, eukaryotes and viruses (RefSeq release 49, September 2011). The RefSeq database is maintained by a combined approach of automated analyses, collaboration and manual curation to generate an up-to-date representation of the sequence, its features, names and cross-links to related sources of information. We report here on recent growth, the status of curating the human RefSeq data set, more extensive feature annotation and current policy for eukaryotic genome annotation via the NCBI annotation pipeline. More information about the resource is available online (see http://www.ncbi.nlm.nih.gov/RefSeq/).

The UCSC Genome Browser Database: Extensions and updates 2011

Nov 2011
NUCLEIC ACIDS RES

The University of California Santa Cruz Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analyzing and sharing both publicly available and user-generated genomic data sets. In the past year, the local database has been updated with four new species assemblies, and we anticipate another four will be released by the end of 2011. Further, a large number of annotation tracks have been either added, updated by contributors, or remapped to the latest human reference genome. Among these are new phenotype and disease annotations, UCSC genes, and a major dbSNP update, which required new visualization methods. Growing beyond the local database, this year we have introduced ‘track data hubs’, which allow the Genome Browser to provide access to remotely located sets of annotations. This feature is designed to significantly extend the number and variety of annotation tracks that are publicly available for visualization and analysis from within our site. We have also introduced several usability features including track search and a context-sensitive menu of options available with a right-click anywhere on the Browser's image.

Alternative Splicing Regulates Biogenesis of miRNAs Located across Exon-Intron Junctions

Jun 2013

The initial step in microRNA (miRNA) biogenesis requires processing of the precursor miRNA (pre-miRNA) from a longer primary transcript. Many pre-miRNAs originate from introns, and both a mature miRNA and a spliced RNA can be generated from the same transcription unit. We have identified a mechanism in which RNA splicing negatively regulates the processing of pre-miRNAs that overlap exon-intron junctions. Computational analysis identified dozens of such pre-miRNAs, and experimental validation demonstrated competitive interaction between the Microprocessor complex and the splicing machinery. Tissue-specific alternative splicing regulates maturation of one such miRNA, miR-412, resulting in effects on its targets, which encode a protein network involved in neuronal cell death processes. This mode of regulation specifically controls maturation of splice-site-overlapping pre-miRNAs but not pre-miRNAs located completely within introns or exons of the same transcript. Our data present a biological role of alternative splicing in regulation of miRNA biogenesis.

Regulation of Vascular Endothelial Growth Factor (VEGF) Splicing from Pro-angiogenic to Anti-angiogenic Isoforms

Feb 2010

Vascular endothelial growth factor (VEGF) is produced either as a pro-angiogenic or anti-angiogenic protein depending upon splice site choice in the terminal, eighth exon. Proximal splice site selection (PSS) in exon 8 generates pro-angiogenic isoforms such as VEGF165, and distal splice site selection (DSS) results in anti-angiogenic isoforms such as VEGF165b. Cellular decisions on splice site selection depend upon the activity of RNA-binding splice factors, such as ASF/SF2, which have previously been shown to regulate VEGF splice site choice. To determine the mechanism by which the pro-angiogenic splice site choice is mediated, we investigated the effect of inhibition of ASF/SF2 phosphorylation by SR protein kinases (SRPK1/2) on splice site choice in epithelial cells and in in vivo angiogenesis models. Epithelial cells treated with insulin-like growth factor-1 (IGF-1) increased PSS and produced more VEGF165 and less VEGF165b. This down-regulation of DSS and increased PSS was blocked by protein kinase C inhibition and SRPK1/2 inhibition. IGF-1 treatment resulted in nuclear localization of ASF/SF2, which was blocked by SRPK1/2 inhibition. Pull-down assay and RNA immunoprecipitation using VEGF mRNA sequences identified an 11-nucleotide sequence required for ASF/SF2 binding. Injection of an SRPK1/2 inhibitor reduced angiogenesis in a mouse model of retinal neovascularization, suggesting that regulation of alternative splicing could be a potential therapeutic strategy in angiogenic pathologies.

Transcriptome-wide Regulation of Pre-mRNA Splicing and mRNA Localization by Muscleblind Proteins

Aug 2012

The muscleblind-like (Mbnl) family of RNA-binding proteins plays important roles in muscle and eye development and in myotonic dystrophy (DM), in which expanded CUG or CCUG repeats functionally deplete Mbnl proteins. We identified transcriptome-wide functional and biophysical targets of Mbnl proteins in brain, heart, muscle, and myoblasts by using RNA-seq and CLIP-seq approaches. This analysis identified several hundred splicing events whose regulation depended on Mbnl function in a pattern indicating functional interchangeability between Mbnl1 and Mbnl2. A nucleotide-resolution RNA map associated repression or activation of exon splicing with Mbnl binding near either the 3' splice site or the downstream 5' splice site, respectively. Transcriptomic analysis of subcellular compartments uncovered a global role for Mbnls in regulating localization of mRNAs in both mouse and Drosophila cells, and Mbnl-dependent translation and protein secretion were observed for a subset of mRNAs with Mbnl-dependent localization. These findings hold several new implications for DM pathogenesis.

Bayesian prediction of tissue-regulated splicing using RNA sequence and cellular context

Jul 2011
BIOINFORMATICS

Alternative splicing is a major contributor to cellular diversity in mammalian tissues and relates to many human diseases. An important goal in understanding this phenomenon is to infer a 'splicing code' that predicts how splicing is regulated in different cell types by features derived from RNA, DNA and epigenetic modifiers. We formulate the assembly of a splicing code as a problem of statistical inference and introduce a Bayesian method that uses an adaptively selected number of hidden variables to combine subgroups of features into a network, allows different tissues to share feature subgroups and uses a Gibbs sampler to hedge predictions and ascertain the statistical significance of identified features. Using data for 3665 cassette exons, 1014 RNA features and 4 tissue types derived from 27 mouse tissues (http://genes.toronto.edu/wasp), we benchmarked several methods. Our method outperforms all others, and achieves relative improvements of 52% in splicing code quality and up to 22% in classification error, compared with the state of the art. Novel combinations of regulatory features and novel combinations of tissues that share feature subgroups were identified using our method.