
Prediction of Continuous B-cell Epitopes in an Antigen Using Recurrent Neural Network

October 2006
Proteins Structure Function and Bioinformatics 65(1):40-8

DOI:10.1002/prot.21078

Source
PubMed

Authors:

Sudipto Saha

Bose Institute

Gajendra Pal Singh Raghava

Indraprastha Institute of Information Technology

B-cell epitopes play a vital role in the development of peptide vaccines, in diagnosis of diseases, and also for allergy research. Experimental methods used for characterizing epitopes are time consuming and demand large resources. The availability of epitope prediction method(s) can rapidly aid experimenters in simplifying this problem. The standard feed-forward (FNN) and recurrent neural network (RNN) have been used in this study for predicting B-cell epitopes in an antigenic sequence. The networks have been trained and tested on a clean data set, which consists of 700 non-redundant B-cell epitopes obtained from Bcipep database and equal number of non-epitopes obtained randomly from Swiss-Prot database. The networks have been trained and tested at different input window length and hidden units. Maximum accuracy has been obtained using recurrent neural network (Jordan network) with a single hidden layer of 35 hidden units for window length of 16. The final network yields an overall prediction accuracy of 65.93% when tested by fivefold cross-validation. The corresponding sensitivity, specificity, and positive prediction values are 67.14, 64.71, and 65.61%, respectively. It has been observed that RNN (JE) was more successful than FNN in the prediction of B-cell epitopes. The length of the peptide is also important in the prediction of B-cell epitopes from antigenic sequences. The webserver ABCpred is freely available at www.imtech.res.in/raghava/abcpred/.


SHORT COMMUNICATION

Prediction of Continuous B-Cell Epitopes in an Antigen Using Recurrent Neural Network

Sudipto Saha and G. P. S. Raghava*

Institute of Microbial Technology, Chandigarh, India

Proteins 2006;65:40–48. © 2006 Wiley-Liss, Inc.

Key words: ABCpred; prediction; B-cell epitopes; recurrent neural network; web server

INTRODUCTION

The antigenic regions of a protein that are recognized by the binding sites (paratopes) of immunoglobulin molecules are called B-cell epitopes. When such specific binding (between the epitope of an antigen and the paratope of an antibody) is observed experimentally, the particular immunoglobulin establishes the epitope nature of a protein. Epitopes are thus relational entities that can be defined only in a functional sense (i.e., in an immunoassay) by the binding of complementary paratopes. These epitopes play an important role in the design of peptide-based vaccines and also in the diagnosis of diseases [2–4]. B-cell epitopes are also important for allergy research and in determining the cross-reactivity of IgE-type epitopes of allergens [5–7]. These epitopes may be linear (continuous) or conformational (discontinuous). When linear synthetic peptides are found to cross-react with anti-protein antibodies, or when they are able to induce antibodies that cross-react with the parent protein, these peptides are labeled as linear (continuous) epitopes. Protective linear B-cell epitopes may lead to the synthesis of an efficient peptide vaccine against viral disease. A dominant linear B-cell epitope is used as the target of neutralizing antibody responses in autoimmune diseases. A discontinuous or conformational epitope is composed of several disparate sequence stretches that are spatially contiguous. These sequences form a compact accessible region when the protein is folded. Deciphering these epitopes is a difficult task, but it can give insight into the structural basis of antigen-antibody recognition. Recently, the Conformational Epitope Prediction (CEP) server has been developed for the prediction of conformational epitopes using 3D structural data of protein antigens.

Prediction of immunogenic epitopes using bioinformatic tools remains a vital and challenging task. The inherent complexity of antigen recognition complicates epitope prediction. In the past, a number of algorithms have been developed for predicting continuous B-cell epitopes based on the physico-chemical properties of amino acids, but their rate of successful prediction is not very high. The properties commonly used for the prediction are hydrophilicity (Parker method), flexibility (Karplus method), accessibility (Emini method), and turns (Pellequer method), which had been correlated with the location of continuous epitopes in a few well-characterized proteins. All the prediction calculations are based on propensity scales for each of the 20 amino acids; these scales describe the tendency of each residue to be associated with the physico-chemical properties. Based on these properties, a few computer programs have been developed to assist the user in predicting epitopes in an antigenic sequence. For example, PREDITOP uses 22 normalized scales, corresponding to hydrophilicity, accessibility, flexibility, and secondary structure propensities. Another program, PEOPLE, uses combined prediction methods that take into account physico-chemical properties such as β-turns, surface accessibility, hydrophilicity, and flexibility. A recent program, BEPITOPE, aims at predicting continuous protein epitopes and searching for patterns either in a single protein or in a complete translated genome. The predictive value of algorithms based on eight physico-chemical parameter scales has been assessed for locating 29 continuous epitopes in four model proteins. The results showed that the percentage of correct predictions varies between 40 and 68%, depending on the cut-off level of the threshold and the model protein. Van Regenmortel and Pellequer compared the prediction efficacy of 22 different scales, taking into account both the correct and incorrect predictions, and showed that the prediction accuracy was not >50–60%.

Grant sponsors: Council of Scientific and Industrial Research (CSIR); Department of Biotechnology (DBT), Government of India.

*Correspondence to: Dr. G.P.S. Raghava, Scientist, Institute of Microbial Technology, Sector 39A, Chandigarh, India. E-mail: raghava@imtech.res.in

Received 31 May 2005; Revised 7 March 2006; Accepted 24 April 2006

Published online 7 August 2006 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.21078

© 2006 WILEY-LISS, INC.

PROTEINS: Structure, Function, and Bioinformatics 65:40–48 (2006)

Recently, we studied the performance of various methods on a clean and large data set of B-cell epitopes. Based on our observations, we also developed a combined method, BcePred (www.imtech.res.in/raghava/bcepred/), for predicting B-cell epitopes using various physico-chemical properties. The accuracy of the individual physico-chemical properties varies from 52.9 to 57.5%, whereas the combined method shows 58.7% accuracy. Blythe and Flower found that the existing 484 amino acid propensity scales underperformed when benchmarking B-cell epitope prediction.

One of the major problems with existing methods is that they are qualitative rather than quantitative, as most of these methods produce property plots. From these property plots one can only guess the stretch or region of a protein that may contain a B-cell epitope. It is nearly impossible to identify the exact region (start and end residues) that can serve as a B-cell epitope. To the best of our knowledge, no sophisticated technique such as an artificial neural network (ANN) has been used for the prediction of B-cell epitopes. The major problem in using a machine learning technique is that the input window length has to be fixed, whereas the length of B-cell epitope sequences varies from 5 to 30 residues as reported in the literature (Bcipep database). This is the reason why machine learning techniques such as ANN were not developed in the past. In this study, an attempt has been made to develop a method using ANN for the prediction of B-cell epitopes. To overcome the problem of the varied length of B-cell epitopes, we examined all the available B-cell epitopes and observed that most of the epitopes have a length of about 20 amino acids or less (as reported in the literature), and only a few epitopes are longer than 20 amino acids. Thus, in our study, we considered only the epitopes of length 20 amino acids or less for developing our method. This does not mean that all B-cell epitopes have a length of 20 amino acids or less. Fixed-length patterns were generated by adding or removing a few residues at the termini of the B-cell epitopes. The additional residues were taken from the parent (original) antigenic sequences. To train any prediction method, particularly a machine learning technique, one requires both positive (e.g., B-cell epitope) and negative (e.g., non-B-cell epitope) datasets. We created the positive B-cell epitope data set from the Bcipep database. In the absence of any proven non-epitopes, we took random peptides generated from proteins as non-epitopes in this study. The creation of a negative dataset from random peptides/proteins is a common practice in the literature [26–28]. Although these random peptides may also contain B-cell epitopes, we assumed that the probability of this is low [29,30].

Both the standard feed-forward network (FNN) and the recurrent neural network (RNN) were applied in the present study for predicting B-cell epitopes in an antigenic sequence. Different window lengths, that is, 10–20 residues at intervals of two amino acids, were tested to achieve high accuracy of B-cell epitope prediction. It was observed that the prediction of B-cell epitopes using RNN was more accurate than that using FNN.

METHODS

The Data Set

B-cell epitopes have been obtained from the Bcipep database, which contains 2479 continuous epitopes. To train any machine learning technique one needs fixed-length patterns, whereas B-cell epitopes vary in length. We examined the length of the B-cell epitopes and observed that a large number of epitopes (approximately 90%) are shorter than 20 amino acids. Thus we discarded all epitopes longer than 20 residues in order to fix the size of the pattern. We do not claim that 20 residues is the optimal length for B-cell epitopes; it is a practical way to handle the problem of B-cell epitope prediction. We also removed identical epitopes from our dataset to avoid any bias in the prediction. The final dataset consists of 700 unique B-cell epitopes with a maximum length of 20 amino acids. To generate a negative dataset, we created non-epitopes using random peptides of length 20 residues from the proteins in Swiss-Prot. All random peptides identical to B-cell epitopes were removed. Finally, we selected 700 random peptides and used them as the non-B-cell epitope dataset. Thus our dataset consists of 700 B-cell epitopes and 700 non-B-cell epitopes (random peptides).
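The construction of the negative set described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: the function name `sample_non_epitopes`, the fixed random seed, and the requirement that the sampled peptides be distinct from one another are all assumptions, and the input list stands in for sequences read from Swiss-Prot.

```python
import random


def sample_non_epitopes(proteins, epitopes, n_peptides, length=20, seed=0):
    """Draw distinct random peptides of fixed length that are not known epitopes.

    proteins   : list of protein sequences (stand-ins for Swiss-Prot entries)
    epitopes   : collection of epitope patterns to exclude
    n_peptides : number of non-epitope peptides to return
    """
    rng = random.Random(seed)            # fixed seed keeps the sketch reproducible
    excluded = set(epitopes)
    usable = [p for p in proteins if len(p) >= length]
    if not usable:
        raise ValueError("no protein is long enough for the requested length")
    # Count the distinct candidates first so the sampling loop always terminates.
    available = {p[i:i + length] for p in usable
                 for i in range(len(p) - length + 1)} - excluded
    if len(available) < n_peptides:
        raise ValueError("not enough distinct non-epitope peptides")
    chosen = set()
    while len(chosen) < n_peptides:
        protein = rng.choice(usable)
        start = rng.randrange(len(protein) - length + 1)
        peptide = protein[start:start + length]
        if peptide not in excluded:      # drop peptides identical to an epitope
            chosen.add(peptide)
    return sorted(chosen)
```

Sampling by rejection keeps the selection random while still guaranteeing that no returned peptide is identical to a listed epitope.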

Creation of Fixed Length Pattern

In this study, we considered only B-cell epitopes of length 20 amino acids or less in order to fix the size of the pattern (upper limit). If the epitope is shorter than 20 amino acids, its length is increased by adding an equal number of residues at both termini, taken from its original antigenic sequence. For example, if a peptide has a length of 8 amino acids, we added 6 neighboring residues at each terminus (see Table I). These neighboring residues were obtained from its original antigenic sequence.
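The padding step can be sketched in a few lines. The function name `extend_epitope` is hypothetical, and two behaviors are assumptions because the text does not specify them: when an odd number of residues is missing, the extra residue is taken from the C-terminal side, and an epitope too close to the end of its antigen raises an error.

```python
def extend_epitope(antigen, epitope, target_length=20):
    """Pad an epitope to target_length with flanking residues from its antigen."""
    if len(epitope) > target_length:
        raise ValueError("epitope is longer than the target length")
    start = antigen.find(epitope)
    if start < 0:
        raise ValueError("epitope not found in antigen")
    missing = target_length - len(epitope)
    left = missing // 2                  # residues added on the N-terminal side
    right = missing - left               # odd remainder goes C-terminal (assumption)
    begin = start - left
    end = start + len(epitope) + right
    if begin < 0 or end > len(antigen):
        raise ValueError("not enough flanking residues in the antigen")
    return antigen[begin:end]
```

With a made-up parent sequence that embeds the 20-residue pattern from Table I, `extend_epitope("MSPKGYVGAEFPLDITAGTEAAKL", "AEFPLDIT")` returns `PKGYVGAEFPLDITAGTEAA`, that is, the 8-residue epitope with 6 neighbors on each side.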

Size of the Input Window

After fixing the upper limit and creating patterns of length 20, we generated patterns of different lengths (window lengths 10, 12, 14, 16, 18, and 20). In this case we removed an equal number of residues from both sides of the pattern. Table I shows the different window lengths obtained for two epitopes of length 8 and 20.
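The symmetric trimming can be illustrated with a short sketch. The function names `trim_to_window` and `window_patterns` are hypothetical; the only behavior taken from the text is that the same number of residues is removed from each side of the 20-residue pattern.

```python
def trim_to_window(pattern, window):
    """Remove an equal number of residues from both ends of a pattern."""
    excess = len(pattern) - window
    if excess < 0 or excess % 2:
        raise ValueError("window must not exceed the pattern and must differ by an even amount")
    cut = excess // 2
    return pattern[cut:len(pattern) - cut]


def window_patterns(pattern, windows=(20, 18, 16, 14, 12, 10)):
    """Return {window length: trimmed pattern} for each requested window length."""
    return {w: trim_to_window(pattern, w) for w in windows}
```

Applied to the two 20-residue patterns in Table I, this reproduces the listed rows; for example, the window of 16 for `PKGYVGAEFPLDITAGTEAA` is `GYVGAEFPLDITAGTE`.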

Fivefold Cross-Validation

In this study, a fivefold cross-validation technique has been used, in which the data set is randomly divided into five subsets, each containing an equal number of peptides (280 each). The five subsets are grouped into training, validation, and testing sets. The training set consists of three of these subsets. The network is validated for minimum error on the validation set (one subset) to avoid overtraining, and it is tested on the remaining subset, called the testing set. This process is repeated five times so that each subset is used once for testing. The final prediction results are the average over the five testing sets.
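The rotation of subsets can be sketched as below. The function name `fivefold_splits` and the random seed are hypothetical, and the choice of which subset serves for validation in each round (here, the one following the test subset) is an assumption, since the text only states that one subset is used for validation and one for testing. The split is also not stratified by class, which the text does not specify either.

```python
import random


def fivefold_splits(items, n_folds=5, seed=0):
    """Yield (train, validation, test) lists, rotating the test fold each round."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        val_index = (k + 1) % n_folds    # which fold validates is an assumption
        test = folds[k]
        validation = folds[val_index]
        train = [x for j, fold in enumerate(folds)
                 if j not in (k, val_index) for x in fold]
        yield train, validation, test
```

For the 1400 peptides used here, each round gives 840 training, 280 validation, and 280 testing peptides, and every peptide appears in exactly one testing set.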

Blind Dataset 1

To evaluate our method on a blind (independent) dataset, we obtained the following four immunogenic proteins from the literature. (i) ESAT-6, a low-molecular-weight protein secreted by virulent Mycobacterium tuberculosis, induced a strong antibody response in experimentally infected monkeys. The epitopes were determined by synthesizing overlapping peptides spanning the ESAT-6 protein and measuring the antibody response to these peptides by ELISA in serum samples from monkeys [33,34]. (ii) Ag44 is a recombinant antigen expressing the 134 C-terminal RhopH3 residues of Plasmodium falciparum. Epitopes were determined by overlapping peptide scanning of the protein and ELISA assays. (iii) The nucleocapsid (N) protein of rinderpest virus (RPV) is one of the most abundant and immunogenic viral proteins. Epitope mapping with overlapping peptides revealed three antigenic sites in the regions. (iv) Major surface protein (MSP) 1a of the genus type species Anaplasma marginale has been shown to contribute to protective immunity in cattle. Linear B-cell epitopes of MSP1a were mapped using synthetic peptides representing the entire sequence of the protein, and sera from immunized cattle recognized the peptides.

Blind Dataset 2

We also created another independent dataset, Blind dataset 2, which consists of a total of 187 epitopes (128 IgE epitopes obtained from the Structural Database of Allergenic Proteins (SDAP) and 59 epitopes obtained from the Bcipep database); none of these epitopes were used in the training or testing of the ABCpred algorithm. This dataset includes 109 epitopes shorter than 16 residues. To create a pattern of 16 residues, we added an equal number of residues at both termini of these epitopes from the original sequence. We also generated 200 random 16-mer peptides from the non-allergen dataset of Bjorklund et al. and used them as non-epitopes. In summary, Blind dataset 2 consists of 187 epitopes and 200 non-epitopes.

Neural Network

In this study, an FNN and a partial RNN, each with a single hidden layer, have been used. Initially, the FNN was tried, since it is commonly used among ANNs. However, the FNN did not yield satisfactory results, which prompted us to try an RNN (Jordan network). Both networks have been trained using the back-propagation algorithm with various window lengths from 10 to 20 residues. The target output is a single binary number, one or zero (B-cell epitope or non-epitope). The final Jordan network has an input window of 16 residues and 35 units in a single hidden layer. For a detailed description of the Jordan network, see the supplementary information at www.imtech.res.in/raghava/abcpred/ABC_method.html.

TABLE I. Creation of Fixed Length Patterns of 20 or Less Than 20 Amino Acids from B-cell Epitopes

Window length   AEFPLDIT (8 amino acid length) a   ACVPTDPNPQEVVLVNVTEN (20 amino acid length) b
20              PKGYVGAEFPLDITAGTEAA               ACVPTDPNPQEVVLVNVTEN
18              KGYVGAEFPLDITAGTEA                 CVPTDPNPQEVVLVNVTE
16              GYVGAEFPLDITAGTE                   VPTDPNPQEVVLVNVT
14              YVGAEFPLDITAGT                     PTDPNPQEVVLVNV
12              VGAEFPLDITAG                       TDPNPQEVVLVN
10              GAEFPLDITA                         DPNPQEVVLV

a Patterns of different length generated from an epitope of eight amino acids.
b Patterns of different length generated from an epitope of 20 amino acids.

The publicly available free simulation package SNNS, version 4.2, from Stuttgart University has been used to implement the neural networks. It allows incorporation of the resulting network into an ANSI C function for use in stand-alone code. At the start of each simulation, the weights are initialized with random values. The training is carried out using error back-propagation with a sum-of-squares error function. The magnitude of the error sum on the test and training sets is monitored in each cycle of the training. The final number of cycles is determined when the network converges. During testing, a cutoff value is set for each network, and the output produced by the network is compared with the cutoff value. If the output value is greater than the cutoff, the peptide is predicted as a B-cell epitope; otherwise it is predicted as a non-epitope. For each network, the cutoff value is adjusted so that it yields the highest accuracy for that network. In this study we used the same parameters for training the five networks on the different training sets during fivefold cross-validation. This means that we did not optimize the performance of the networks for individual test sets; instead, we optimized the networks to obtain the best average accuracy. We tried different network parameters during training to get the best overall performance (average accuracy) over the five sets. In other words, our best result was achieved by maintaining uniform parameters over the five subsets.
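The cutoff adjustment described above amounts to scanning candidate thresholds and keeping the one with the highest accuracy. The following is a minimal sketch under that reading; the function names `accuracy_at` and `best_cutoff` are hypothetical, the scores stand for network outputs, and keeping the first threshold in case of a tie is an assumption.

```python
def accuracy_at(scores, labels, threshold):
    """Fraction classified correctly when score > threshold means epitope."""
    correct = sum((s > threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def best_cutoff(scores, labels, thresholds):
    """Return (threshold, accuracy) with the highest accuracy; ties keep the first."""
    return max(((t, accuracy_at(scores, labels, t)) for t in thresholds),
               key=lambda pair: pair[1])
```

Here `labels` holds 1 for epitopes and 0 for non-epitopes, matching the binary target output of the networks.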

Performance Measure

Threshold-dependent measure

We used commonly used parameters to evaluate the performance of the method. Performance was evaluated at the peptide (epitope) level and not at the residue level. Five parameters have been used in the present work to measure the performance of the prediction method. Following is a brief description of the parameters: (1) $Q_{\mathrm{sens}}$ (sensitivity) is the percentage of epitopes that are correctly predicted as epitopes; (2) $Q_{\mathrm{spec}}$ (specificity) is the percentage of non-epitopes that are correctly predicted as non-epitopes; (3) $Q_{\mathrm{acc}}$ (accuracy) is the proportion of correctly predicted peptides; (4) $Q_{\mathrm{ppv}}$ (positive prediction value) is the probability that a predicted epitope is in fact an epitope; and (5) the Matthews correlation coefficient (MCC). The parameters can be calculated by the following equations.

$$Q_{\mathrm{sens}} = \frac{TP}{TP + FN} \times 100\%$$

$$Q_{\mathrm{spec}} = \frac{TN}{TN + FP} \times 100\%$$

$$Q_{\mathrm{acc}} = \frac{TP + TN}{TP + FP + TN + FN} \times 100\%$$

$$Q_{\mathrm{ppv}} = \frac{TP}{TP + FP}$$

$$\mathrm{MCC} = \frac{(TP)(TN) - (FP)(FN)}{\sqrt{[TP + FP][TP + FN][TN + FP][TN + FN]}}$$

where TP and FN refer to true positives and false negatives, and TN and FP refer to true negatives and false positives.
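The five measures can be computed directly from the four confusion-matrix counts. The sketch below uses a hypothetical function name, `performance`, and expresses PPV as a percentage, as the results tables do. The counts in the usage note are not given in the paper: they are back-calculated from the reported sensitivity and specificity for window length 16, assuming the 700 epitopes and 700 non-epitopes are pooled over the five test sets.

```python
import math


def performance(tp, fn, tn, fp):
    """Return sensitivity, specificity, accuracy, PPV (all in %) and MCC."""
    sens = 100.0 * tp / (tp + fn)        # assumes at least one true epitope
    spec = 100.0 * tn / (tn + fp)        # assumes at least one true non-epitope
    acc = 100.0 * (tp + tn) / (tp + fp + tn + fn)
    ppv = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sens": sens, "spec": spec, "acc": acc, "ppv": ppv, "mcc": mcc}
```

With the back-calculated counts TP = 470, FN = 230, TN = 453, and FP = 247, `performance` gives a sensitivity of 67.14%, a specificity of 64.71%, an accuracy of 65.93%, and an MCC of 0.3187 after rounding, in agreement with the values reported for the final network.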

Threshold-independent measures

One problem with threshold-dependent measures is that they measure performance at a given threshold. It is difficult to assess the overall performance of a method using these threshold-dependent parameters. The receiver operating characteristic (ROC) is a threshold-independent measure that was developed as a signal processing technique. For a prediction method, the ROC plot is obtained by plotting all sensitivity values (true-positive fraction) on the y-axis against their equivalent (1 − specificity) values (false-positive fraction) on the x-axis. The area under the ROC curve is taken as an important index because it provides a single measure of overall accuracy that is not dependent on a particular threshold. It measures discrimination, the ability of a method to correctly classify B-cell epitopes and non-epitopes.
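A ROC curve sampled at a few thresholds can be turned into an area estimate with the trapezoidal rule. The sketch below is an illustration only: the function name `roc_auc` is hypothetical, the end points (0, 0) and (1, 1) are added by assumption, and the paper does not state how, or whether, an area was computed numerically.

```python
def roc_auc(points):
    """Trapezoidal area under a ROC curve from (sensitivity %, specificity %) pairs."""
    # Convert to (false-positive fraction, true-positive fraction).
    curve = {(1.0 - spec / 100.0, sens / 100.0) for sens, spec in points}
    curve |= {(0.0, 0.0), (1.0, 1.0)}    # anchor both ends of the curve
    ordered = sorted(curve)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

The function can be fed the sensitivity and specificity columns of a threshold table such as Table IV, one pair per threshold. With only ten operating points the result is a coarse estimate rather than a value reported by the authors.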

RESULTS

All the methods have been trained and tested using fivefold cross-validation. The prediction performance measures have been averaged over the five sets. First, we trained and tested our method using FNN for different window lengths (input units) of 10, 12, 14, 16, 18, and 20 (see Table I). The performance of FNN at different window lengths with a single hidden layer of 35 units at the optimum/default threshold of 0.5 is shown in Table II. The accuracy achieved by FNN varied from 50.43% (nearly random) to 56.79%. We tried a number of options, including different numbers of layers and hidden units, but the accuracy did not improve further with FNN (data not shown).

TABLE II. The Performance of Our Neural Network with FNN at Optimum/Default Threshold (0.5)

Window size   Sensitivity (%)   Specificity (%)   PPV (%)   Accuracy (%)   MCC
10            48.14             52.71             50.53     50.43          0.0088
12            53.00             52.14             52.54     52.57          0.0515
14            51.43             55.00             53.17     53.21          0.0645
16            53.29             56.57             55.10     54.93          0.0859
18            51.43             54.57             52.92     53.00          0.0602
20            54.43             59.14             57.28     56.79          0.1374

These results were obtained by FNN using a single hidden layer of 35 units.
Values in parentheses indicate the maximum percentage at hidden units 10: window size 12, (54.71) (54.72) (54.71); window size 16, (55.86) (54.29) (55.20) (55.07).

The accuracy of the method improved significantly (P value = 0.01732, significant at the 0.02 level) when we implemented RNN for training and testing. The overall performance (ROC plot) of RNN at various thresholds for window length 16 is shown in Figure 1. The ROC plot was obtained by plotting all sensitivity values on the y-axis against (1 − specificity) on the x-axis for thresholds from 0.1 to 1.0 at intervals of 0.1 (see Methods). The best performance of RNN was obtained with 35 hidden units in a single hidden layer (see Fig. 1). We also compared the overall performance (ROC plot) of FNN and RNN with 35 hidden units and window length 16 and observed that RNN was better than FNN over the whole range (see Fig. 2). These results clearly indicate the superiority of RNN over FNN in the prediction of B-cell epitopes. Using RNN at threshold 0.5, we achieved an average accuracy of 65.93%, sensitivity of 67.14%, specificity of 64.71%, and MCC of 0.3187. The learning parameters were the same for all five RNN models (e.g., SSE 0.0005; cycles 5000; JE order; hidden nodes 35) in fivefold cross-validation. The accuracy at threshold 0.5 for the five test sets was 58.57% (Set 1), 73.57% (Set 2), 72.14% (Set 3), 68.93% (Set 4), and 56.43% (Set 5). We used the best RNN model in our server. The sensitivity, specificity, PPV, accuracy, and MCC at different window lengths using RNN are shown in Table III.

Testing of ABCpred server on Blind Dataset 1

To evaluate the performance of the ABCpred server, we computed its predictive performance on a blind dataset (protein sequences not used in the development of the ABCpred algorithm). For this purpose, four recently experimentally annotated proteins were obtained from the literature. We predicted the B-cell epitopes in these proteins using the ABCpred server with default parameters. The B-cell epitopes (predicted as well as experimentally determined) were mapped onto each protein along its amino acid sequence. The B-cell epitopes predicted by the ABCpred server in ESAT-6 [33,34] and the Ag44 protein, and in MSP1a and the rinderpest virus protein, are shown in Figure 3(a) and 3(b), respectively. The predicted peptides are displayed in rank order based on the scores obtained from the trained recurrent neural network. All the peptides shown in the figure were predicted at the default threshold value (0.5) and window length 16 with the overlapping filter. In the case of ESAT-6, there were four experimentally determined epitopes in total, and our server predicted seven epitopes in this protein. Four of our predicted regions fell in the same regions of the sequence as three of the experimentally determined epitopes. A fifth predicted epitope covers nearly half of the fourth B-cell epitope. These results indicate that the server can detect the potential regions that contain B-cell epitopes with significant accuracy. However, the server also produces many over-predictions (false positives) and cannot predict the boundaries of B-cell epitopes. One reason for the poor prediction is that B-cell epitopes do not have a fixed length, whereas we use a window of fixed length. In addition, experimentally determined epitopes do not necessarily have correct boundaries, because only a limited set of peptides (not all peptides of all possible lengths) is tried in an experiment. A similar trend was observed for the other proteins; see Figure 3(a,b) for details. Overall, the results indicate that the performance of the method on real proteins is much better than random.

Testing of ABCpred server on Blind Dataset 2

Fig. 1. The overall performance of our method with RNN for window size 16. This ROC plot was obtained between sensitivity (y-axis) and 1 − specificity (x-axis) for RNN at different thresholds from 0.1 to 1.0 at intervals of 0.1.

Fig. 2. ROC plot of the two neural networks, FNN and RNN, used in this study at window size 16 and 35 hidden units.

The performance of ABCpred has been evaluated on Blind dataset 2, which consists of 187 B-cell epitopes and 200 non-epitopes (random 16-mer peptides). If a B-cell epitope has more than 16 residues, we examined all overlapping 16-mers, and if any 16-mer has a score above the threshold, the whole sequence is predicted as a B-cell epitope. As shown in Table IV, we achieved a sensitivity of 71.66%, specificity of 61.50%, and accuracy of 66.41% at the default threshold value of 0.5. The maximum accuracy of 69.25% was achieved using ABCpred at a threshold value of 0.6. Similarly, we evaluated the performance of the Karplus method using flexibility and achieved a maximum accuracy of 59.43% at a threshold value of 1.50 (Table V). We achieved a maximum accuracy of 61.49% with the Parker method using the hydrophilicity scale at a threshold value of 2.00 (Table VI). These results demonstrate that ABCpred can predict B-cell epitopes with reasonably high accuracy.
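The rule used for epitopes longer than 16 residues can be sketched as a sliding-window check. The function name `predict_peptide` is hypothetical, and `score_window` is a placeholder for the trained network: any callable that maps a 16-residue string to a score will do. Raising an error for peptides shorter than the window is an assumption; in the study such peptides are first padded to 16 residues.

```python
def predict_peptide(sequence, score_window, window=16, threshold=0.5):
    """Call a peptide an epitope if any overlapping window scores above threshold."""
    if len(sequence) < window:
        raise ValueError("pad shorter peptides to the window length first")
    windows = (sequence[i:i + window]
               for i in range(len(sequence) - window + 1))
    return any(score_window(w) > threshold for w in windows)
```

Because a single high-scoring window is enough for a positive call, longer peptides get more chances to be predicted as epitopes than 16-mers do, which is worth keeping in mind when reading the sensitivity figures.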

TABLE III. The Performance of Our Method Using RNN at 0.5 Threshold Using Single Hidden Layer of 35 Units

Window size   Sensitivity (%)   Specificity (%)   PPV (%)   Accuracy (%)   MCC
10            58.71             64.14             61.78     61.43          0.2293
12            53.57             61.71             58.30     57.64          0.1534
14            52.43             65.29             60.12     58.86          0.1786
16            67.14             64.71             65.61     65.93          0.3187
18            58.70             65.00             62.06     61.86          0.2373
20            57.14             71.57             66.51     64.36          0.2871

Fig. 3. Comparative epitope mapping of predictions by the ABCpred server against experimental data on a set of four proteins: (a) Mycobacterium tuberculosis ESAT-6 protein and Plasmodium falciparum Ag44; (b) rinderpest nucleocapsid protein C terminal and Anaplasma marginale MSP1a N terminal. Red residues are reported in the literature as immunogenic, blue residues are predicted by the ABCpred server as epitope, and underlined residues are correctly predicted.


Comparison with Existing Methods

It is important to compare the performance of a newly developed method with that of existing methods. Following is a brief description of the major B-cell epitope prediction methods: (i) the Hopp and Woods method is based on an analysis of 12 proteins; (ii) the method of Parker et al. uses a modified hydrophilicity scale; (iii) Karplus and Schulz developed a method using a flexibility scale for predicting B-cell epitopes; (iv) Emini et al. developed a method using the surface accessibility of amino acids; (v) Kolaskar and Tongaonkar derived their own scale of antigenicity based on the frequency of residues; (vi) Pellequer et al. use turn scales, which they derived from 87 protein structures; (vii) Pellequer and Westhof developed the program PREDITOP, which uses 22 normalized scales; (viii) PEOPLE uses a combination of physico-chemical properties; and (ix) BEPITOPE is a comprehensive program that allows two or more parameters to be combined. It is not practically possible to evaluate all these methods and programs in their original form, for a number of reasons, including the non-availability of some methods and the qualitative nature of most of them. To evaluate the performance of existing methods, we therefore evaluated the performance of various physico-chemical properties of residues rather than the methods themselves (http://www.imtech.res.in/raghava/bcepred/). This evaluation covers the major residue properties (hydrophilicity, flexibility, accessibility, etc.) that are used in most of the existing methods. As shown in Table VII, the accuracy of the physico-chemical properties varies from 52.92 to 57.53%. A maximum accuracy of 58.70% has been achieved using the combination of properties. We achieved a maximum accuracy of 65.93% using the ABCpred method described in this study, which is better than the accuracy achieved using any single property or their combination. We calculated P-values to test whether the accuracy of ABCpred is significantly better than that of the property-based methods. We obtained a P-value of 0.012 between the accuracies of ABCpred and the Karplus method (flexibility) and a P-value of 0.011 between those of ABCpred and the Parker method (hydrophilicity) on the five test sets, at the 0.05 level. These results show that the performance of ABCpred is significantly better than that of the methods based on physico-chemical properties.

Web Server

Based on our observations, a server, ABCpred, has been developed that allows users to predict continuous B-cell epitopes in a protein sequence. Users can submit an amino acid sequence and select the window length as well as the threshold to be used for epitope prediction. The server presents the results in an overlap display and in a tabular frame. In the tabular frame, the server ranks epitopes based on the score obtained from the trained recurrent neural network. A higher peptide score indicates a higher probability that the peptide is a B-cell epitope. The server is accessible from www.imtech.res.in/raghava/abcpred/.

TABLE IV. The Performance of ABCpred Server on Blind Data Set 2

Threshold   Sensitivity (%)   Specificity (%)   Accuracy (%)
0.1         99.47             1.00              48.58
0.2         95.72             7.00              49.87
0.3         92.51             18.00             54.00
0.4         82.89             39.50             60.47
0.5         71.66             61.50             66.41
0.6         60.96             77.00             69.25
0.7         49.73             87.00             68.99
0.8         33.16             95.50             65.37
0.9         4.81              99.50             53.75
1.0         0.00              100.00            51.68
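As a worked check of how the three columns of Table IV relate, the sketch below recomputes one row from confusion counts. The class sizes (187 epitopes, 200 non-epitopes) are not stated in this section; they are inferred here because they reproduce the table's percentages, and should be treated as an assumption.

```python
def metrics(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity, accuracy) in percent from confusion counts."""
    sensitivity = 100.0 * tp / (tp + fn)   # correctly predicted epitopes
    specificity = 100.0 * tn / (tn + fp)   # correctly predicted non-epitopes
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Assumed class sizes, inferred from the table rather than stated in the text.
N_EPITOPES = 187
N_NON_EPITOPES = 200

# Counts consistent with the threshold 0.5 row of Table IV.
tp, tn = 134, 123
sens, spec, acc = metrics(tp, N_EPITOPES - tp, tn, N_NON_EPITOPES - tn)
print(f"{sens:.2f} {spec:.2f} {acc:.2f}")
```

Because the two classes are nearly balanced, the accuracy lies close to the mean of sensitivity and specificity in every row.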

TABLE V. The Performance of Karplus Method Based on Flexibility on Blind Data Set 2

Threshold   Sensitivity (%)   Specificity (%)   Accuracy (%)
0.00        100.00            0.00              48.32
0.50        99.47             5.00              50.65
1.00        95.18             20.50             56.59
1.50        78.60             42.00             59.43
2.00        50.27             63.50             57.11
2.50        23.53             79.00             52.19
3.00        4.81              93.50             50.65

TABLE VI. The Performance of Parker Method Based on Hydrophilicity on Blind Data Set 2

Threshold   Sensitivity (%)   Specificity (%)   Accuracy (%)
0.00        100.00            0.00              48.32
0.50        99.47             6.00              51.16
1.00        95.72             20.50             56.85
1.50        81.81             41.50             60.98
2.00        58.82             64.00             61.49
2.50        26.74             85.50             57.11
3.00        4.28              98.50             52.97

TABLE VII. The Performance of Various Physico-Chemical Properties and ABCpred in B-cell Epitope Prediction

Physico-chemical properties/methods   Accuracy (%)   Sensitivity (%)   Specificity (%)
Hydrophilicity                        54.47          33.04             76.90
Flexibility                           57.53          47.42             67.64
Accessibility                         55.49          65.01             45.97
Turns                                 52.92          17.01             88.82
Antigenic scale                       55.59          58.99             52.19
Polarity                              54.08          27.50             80.66
Surface                               55.73          37.12             74.34
Best combination                      58.70          56.07             61.32
ABCpred (window length 16)            65.93          67.14             64.71


DISCUSSION

The prediction of B-cell epitopes in an antigen sequence is an important and complex problem. Although most antigenic determinants of proteins are discontinuous, it is possible to mimic epitopes by synthetic peptides. Many algorithms have been developed to predict the location of continuous epitopes in proteins, but their rate of successful prediction is low. One of the major problems faced in developing B-cell epitope prediction methods is the variable length of the epitopes. Because machine learning techniques such as SVM, ANN, and PEBLS require patterns (peptides) of fixed length, they cannot be applied directly to B-cell epitope prediction. Although ANN techniques have been used to classify proteins of variable length from their amino acid composition (a fixed-length pattern of 20), this is not possible for epitopes/peptides, which are too short for a meaningful composition to be computed. All the existing methods are residue-property based: they first generate property plots (e.g., hydrophilicity, flexibility) and then select the regions of the antigen that show peaks. These regions are assigned as B-cell epitopes. However, these methods are subjective in nature because the boundaries of the epitopes are not known.
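The property-plot approach described above can be sketched as a sliding-window average followed by a threshold. This is a minimal illustration: `TOY_SCALE` is a made-up scale (charged residues score 1, all others 0), not the Parker or Karplus scale, and the window size and threshold are arbitrary choices. The need to pick such a threshold by hand is exactly the subjectivity the paragraph points out.

```python
def property_profile(sequence, scale, window=7):
    """Average a per-residue scale over a sliding window.

    Returns a list of (center index, mean score); residues missing from the
    scale count as 0.0. The window must be odd so that it has a center.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    profile = []
    for center in range(half, len(sequence) - half):
        segment = sequence[center - half:center + half + 1]
        profile.append((center, sum(scale.get(r, 0.0) for r in segment) / window))
    return profile

def regions_above(profile, threshold):
    """Merge consecutive centers scoring >= threshold into (start, end) ranges (0-based, inclusive)."""
    regions, start, prev = [], None, None
    for idx, score in profile:
        if score >= threshold:
            if start is None:
                start = idx
            prev = idx
        elif start is not None:
            regions.append((start, prev))
            start = None
    if start is not None:
        regions.append((start, prev))
    return regions

# Hypothetical scale for illustration only.
TOY_SCALE = {"D": 1.0, "E": 1.0, "K": 1.0, "R": 1.0}

seq = "AAAAKDEKRAAAAA"
profile = property_profile(seq, TOY_SCALE, window=3)
regions = regions_above(profile, threshold=0.5)
print(regions)
```

Changing the threshold moves the region boundaries, so the predicted epitope limits depend on a user choice rather than on the data.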

In this study, for the first time, a systematic attempt has been made to develop a neural network based method for predicting B-cell epitopes. A major problem for such a method is that the length of B-cell epitopes varies from 5 to 30 residues. The optimal length of a B-cell epitope is not known, unlike that of a T-cell epitope, where the binding core of the MHC molecule prefers 9 amino acids. On the other hand, a machine learning method requires a window of fixed length for training and testing. An initial examination of all the B-cell epitopes obtained from the Bcipep database revealed that most of the epitopes have 20 or fewer residues. Therefore, we used only those epitopes that have 20 or fewer residues, which fixes the upper limit on the size of the patterns used in this study. The next problem is how to handle epitopes with fewer than 20 residues. For these epitopes, we generated patterns of 20 amino acids by adding neighboring residues from the original sequence on both sides of the epitope (see Methods; Table I). In this way we obtain a fixed-length pattern of 20 amino acids for each epitope. We feel that this is one of the best ways to handle this problem.

Another problem we faced in this study was obtaining non-B-cell epitope data. Ideally, one should have experimentally proven non-B-cell epitopes. Because such data are lacking in the public domain, we generated random peptides of 20 amino acids from proteins in the Swiss-Prot database. We do not claim that all these random peptides are non-epitopes; some of them may contain B-cell epitopes. We adopted this strategy of generating non-epitopes (negative examples) because it has been used in a number of past investigations (Refs. 26–28). The final data set contains patterns of length 20, with equal numbers of positive (B-cell epitope) and negative (non-epitope) examples.

A machine learning technique (ANN) was used to discriminate B-cell epitopes from non-epitopes. Although FNN is a commonly used network, we obtained poor results with it. The accuracy obtained using FNN is lower than that of existing methods based on physico-chemical properties, and for window lengths of 10 and 12 the accuracy of FNN is near random (Table II). It has been observed in the past that RNN performs better than FNN in the prediction of protein secondary structure. Therefore, we tried RNN in our study, and its performance was indeed better than that of FNN (Table III). The performance of the RNN-based method described in this study is also significantly better than that reported for any existing B-cell epitope prediction method. The best performance of our method was achieved when the epitope (window) length is 16 residues. However, 16 residues cannot be considered the ideal epitope length, because a number of epitopes 15–22 amino acids long have been identified. We also evaluated the performance of our method on a blind data set, comparing the predicted and experimentally determined epitopes in four proteins (not used in the training or testing of ABCpred). As shown in Figure 3(a,b), our method was able to predict the experimentally determined epitopes with reasonable accuracy. The performance is much better than random, despite the fact that B-cell epitope prediction is a complex problem. Thus, the ABCpred server is worth using for detecting potential B-cell epitopes in an antigen.
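The construction of fixed-length patterns described in this part of the Discussion can be sketched as follows. The exact way flanking residues are split between the two sides is given in the Methods (Table I), which is not reproduced here; the sketch assumes an equal split, shifted inward when the epitope lies near a terminus. The function names and the toy sequences are hypothetical.

```python
import random

PATTERN_LENGTH = 20

def pad_epitope(antigen: str, start: int, end: int, length: int = PATTERN_LENGTH) -> str:
    """Extend the epitope antigen[start:end] to `length` residues using flanking residues.

    Assumption: flanks are split evenly between both sides, and the window is
    shifted inward if it would run past either end of the antigen.
    """
    epitope_len = end - start
    if epitope_len > length:
        raise ValueError("epitope longer than the pattern length")
    if len(antigen) < length:
        raise ValueError("antigen shorter than the pattern length")
    extra = length - epitope_len
    left = start - extra // 2
    left = max(0, min(left, len(antigen) - length))  # keep the window inside the antigen
    return antigen[left:left + length]

def random_peptides(proteins, count, length=PATTERN_LENGTH, seed=0):
    """Draw `count` random peptides of `length` residues as putative negative examples."""
    rng = random.Random(seed)
    eligible = [p for p in proteins if len(p) >= length]
    if not eligible:
        raise ValueError("no protein is long enough")
    peptides = []
    for _ in range(count):
        protein = rng.choice(eligible)
        begin = rng.randrange(len(protein) - length + 1)
        peptides.append(protein[begin:begin + length])
    return peptides

# Toy antigen (the 20 amino acids repeated twice), for illustration only.
antigen = "ACDEFGHIKLMNPQRSTVWY" * 2
pattern = pad_epitope(antigen, 15, 25)          # 10-residue epitope in the middle
negatives = random_peptides([antigen, "SHORT"], count=5)
print(pattern, negatives[0])
```

As the text notes, peptides drawn this way are only assumed to be non-epitopes, so some label noise in the negative set is expected.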

Although we have obtained improved prediction accuracy for B-cell epitopes in this study, the method has its own limitations. The method described here is not an alternative to existing methods but complements them. A number of assumptions have been made in the algorithm because ANN techniques cannot be applied directly to B-cell epitope prediction. The aim of this study is to provide an additional quantitative method for B-cell epitope prediction. The accuracy of the method is also not very high, despite our systematic attempts. Users are advised to predict the B-cell epitopes in an antigen using all existing methods, including ours, and to identify the regions of the antigenic sequence that are predicted by most of the methods.
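The recommendation to keep regions predicted by most methods amounts to a majority vote per residue. The sketch below is one way to do this; the method names and predicted positions are invented for illustration, and the default majority rule (more than half of the methods) is an assumption.

```python
from collections import Counter

def consensus_positions(predictions, min_votes=None):
    """Return sorted residue positions predicted by at least `min_votes` methods.

    `predictions` maps a method name to the set of residue positions it
    predicts as part of an epitope. By default a strict majority is required.
    """
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    votes = Counter(pos for positions in predictions.values() for pos in set(positions))
    return sorted(pos for pos, n in votes.items() if n >= min_votes)

# Hypothetical per-method predictions (1-based residue positions).
predictions = {
    "ABCpred": set(range(10, 26)),
    "hydrophilicity": set(range(12, 20)),
    "flexibility": set(range(18, 30)),
}
consensus = consensus_positions(predictions)
print(consensus[0], consensus[-1])
```

Setting `min_votes` to the number of methods gives the stricter intersection of all predictions instead of the majority.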

CONCLUSIONS

It was observed that RNN (JE) was more successful than FNN in the prediction of B-cell epitopes. The length of the peptide is also important in the prediction of B-cell epitopes from antigenic sequences.
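For readers unfamiliar with the distinction, the sketch below shows what separates a Jordan-type recurrent network from a feed-forward one: the previous output is fed back through a context unit into the hidden layer. This is a generic toy illustration with hand-picked weights and two inputs. It is not the ABCpred network, and it does not reproduce how the authors encoded peptides or configured SNNS.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

class JordanNetwork:
    """Single hidden layer, one output unit, one context unit holding the previous output."""

    def __init__(self, w_in, w_ctx, b_hid, w_out, b_out):
        self.w_in = w_in      # one row of input weights per hidden unit
        self.w_ctx = w_ctx    # context-to-hidden weight per hidden unit
        self.b_hid = b_hid    # hidden biases
        self.w_out = w_out    # hidden-to-output weights
        self.b_out = b_out    # output bias
        self.context = 0.0

    def reset(self) -> None:
        self.context = 0.0

    def step(self, x):
        hidden = [
            sigmoid(sum(w * xi for w, xi in zip(row, x)) + wc * self.context + b)
            for row, wc, b in zip(self.w_in, self.w_ctx, self.b_hid)
        ]
        output = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out)
        self.context = output  # Jordan feedback: the output becomes the next context
        return output

# Arbitrary illustrative weights; setting w_ctx to zeros would give a plain FNN.
net = JordanNetwork(
    w_in=[[0.5, -0.5], [0.25, 0.75]],
    w_ctx=[1.0, -1.0],
    b_hid=[0.0, 0.0],
    w_out=[1.0, -1.0],
    b_out=0.0,
)
x = [1.0, 0.0]
first = net.step(x)
second = net.step(x)  # same input, different output because the context changed
print(f"{first:.4f} {second:.4f}")
```

The same input yields different outputs on successive steps, which is the state dependence a feed-forward network lacks.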

ACKNOWLEDGMENT

We are thankful to Miss Harpreet Kaur for assisting in running SNNS version 4.2.

REFERENCES

1. Van Regenmortel MH. The concept and operational definition of protein epitopes. Philos Trans R Soc Lond B Biol Sci 1989;323:451–466.

2. Wiesmuller KH, Fleckenstein B, Jung G. Peptide vaccines and peptide libraries. Biol Chem 2001;382:571–579.

3. Zauner W, Lingnau K, Mattner F, Von Gabain A, Buschle M. Defined synthetic vaccines. Biol Chem 2001;382:581–595.

4. Van Regenmortel MH. Pitfalls of reductionism in the design of peptide-based vaccines. Vaccine 2001;19:2369–2374.

5. Negroni L, Bernard H, Clement G, Chatel JM, Brune P, Frobert Y, Wal JM, Grassi J. Two-site enzyme immunometric assays for determination of native and denatured β-lactoglobulin. J Immunol Methods 1998;220:25–37.

6. Selo I, Clement G, Bernard H, Chatel J, Creminon C, Peltre G, Wal J. Allergy to bovine β-lactoglobulin: specificity of human IgE to tryptic peptides. Clin Exp Allergy 1999;29:1055–1063.

7. Clement G, Boquet D, Frobert Y, Bernard H, Negroni L, Chatel JM, Adel-Patient K, Creminon C, Wal JM, Grassi J. Epitopic characterization of native bovine β-lactoglobulin. J Immunol Methods 2002;266:67–78.

8. Van Regenmortel MH. Synthetic peptides versus natural antigens in immunoassays. Ann Biol Clin (Paris) 1993;51:39–41.

9. Langeveld JP, Martinez-Torrecuadrada J, Boshuizen RS, Meloen RH, Ignacio CJ. Characterisation of a protective linear B cell epitope against feline parvoviruses. Vaccine 2001;19:2352–2360.

10. Castelletti D, Fracasso G, Righetti S, Tridente G, Schnell R, Engert A, Colombatti M. A dominant linear B-cell epitope of ricin A-chain is the target of a neutralizing antibody response in Hodgkin’s lymphoma patients treated with an anti-CD25 immunotoxin. Clin Exp Immunol 2004;136:365–372.

11. Estienne V, Duthoit C, Blanchin S, Montserret R, Durand-Gorde JM, Chartier M, Baty D, Carayon P, Ruf J. Analysis of a conformational B cell epitope of human thyroid peroxidase: identification of a tyrosine residue at a strategic location for immunodominance. Int Immunol 2002;14:359–366.

12. Kulkarni-Kale U, Bhosle S, Kolaskar AS. CEP: a conformational epitope prediction server. Nucleic Acids Res 2005;33(Web Server issue):W168–W171.

13. Flower DR. Towards in silico prediction of immunogenic epitopes. Trends Immunol 2003;24:667–674.

14. Pellequer JL, Westhof E, Regenmortel MHV. Predicting location of continuous epitopes in proteins from their primary structures. Methods Enzymol 1991;203:176–201.

15. Parker JMD, Guo D, Hodges RS. New hydrophilicity scale derived from high-performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites. Biochemistry 1986;25:5425–5432.

16. Karplus PA, Schulz GE. Prediction of chain flexibility in proteins: a tool for the selection of peptide antigens. Naturwissenschaften 1985;72:212–213.

17. Emini EA, Hughes JV, Perlow DS, Boger J. Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide. J Virol 1985;55:836–839.

18. Pellequer J-L, Westhof E, Regenmortel MHV. Correlation between the location of antigenic sites and the prediction of turns in proteins. Immunol Lett 1993;36:83–99.

19. Pellequer JL, Westhof E. PREDITOP: A program for antigenicity prediction. J Mol Graphics 1993;11:204–210.

20. Alix AJ. Predictive estimation of protein linear epitopes by using the program PEOPLE. Vaccine 1999;18:311–314.

21. Odorico M, Pellequer JL. BEPITOPE: predicting the location of continuous epitope and patterns in proteins. J Mol Recognit 2003;16:20–22.

22. Van Regenmortel MHV, de Marcillac GD. An assessment of prediction methods for locating continuous epitopes in proteins. Immunol Lett 1988;17:95–107.

23. Van Regenmortel MH, Pellequer JL. Predicting antigenic determinants in proteins: looking for unidimensional solutions to a three-dimensional problem? Pept Res 1994;7:224–228.

24. Saha S, Raghava GPS. BcePred: prediction of continuous B-cell epitopes in antigenic sequences using physico-chemical properties. In: Nicosia G, Cutello V, Bentley PJ, Timis J, editors. ICARIS 2004, LNCS 3239. Berlin: Springer; 2004. pp 197–204.

25. Blythe MJ, Flower DR. Benchmarking B cell epitope prediction: underperformance of existing methods. Prot Sci 2005;14:246–248.

26. Brazma A, Jonassen I, Eidhammer I, Gilbert D. Approaches to the automatic discovery of patterns in biosequences. J Comput Biol 1998;5:279–305.

27. Singh H, Raghava GPS. ProPred1: prediction of promiscuous MHC class-I binding sites. Bioinformatics 2003;19:1009–1014.

28. Singh H, Raghava GPS. ProPred: prediction of HLA-DR binding sites. Bioinformatics 2001;17:1236–1237.

29. Lesenechal M, Becquart L, Lacoux X, Ladaviere L, Baida RC, Paranhos-Baccala G, da Silveira JF. Mapping of B-cell epitopes in a Trypanosoma cruzi immunodominant antigen expressed in natural infections. Clin Diagn Lab Immunol 2005;12:329–333.

30. Choi KS, Nah JJ, Ko YJ, Kang SY, Yoon KJ, Jo NI. Antigenic and immunogenic investigation of B-cell epitopes in the nucleocapsid protein of peste des petits ruminants virus. Clin Diagn Lab Immunol 2005;12:114–121.

31. Saha S, Bhasin M, Raghava GPS. Bcipep: A database of B-cell epitopes. BMC Genom 2005;6:79.

32. Bairoch A, Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res 2000;28:45–48.

33. Kanaujia GV, Motzel S, Garcia MA, Andersen P, Gennaro ML. Recognition of ESAT-6 sequences by antibodies in sera of tuberculous nonhuman primates. Clin Diagn Lab Immunol 2004;11:222–226.

34. Harboe M, Malin AS, Dockrell HS, Wiker HG, Ulvund G, Holm A, Jorgensen MC, Andersen P. B-cell epitopes and quantification of the ESAT-6 protein of Mycobacterium tuberculosis. Infect Immun 1998;66:717–723.

35. Doury JC, Goasdoue JL, Tolou H, Martelloni M, Bonnefoy S, Mercereau-Puijalon O. Characterisation of the binding sites of monoclonal antibodies reacting with the Plasmodium falciparum rhoptry protein RhopH3. Mol Biochem Parasitol 1997;85:149–159.

36. Choi KS, Nah JJ, Ko YJ, Kang SY, Yoon KJ, Joo YS. Characterization of immunodominant linear B-cell epitopes on the carboxy terminus of the rinderpest virus nucleocapsid protein. Clin Diagn Lab Immunol 2004;11:658–664.

37. Garcia-Garcia JC, de la Fuente J, Kocan KM, Blouin EF, Halbur T, Onet VC, Saliki JT. Mapping of B-cell epitopes in the N-terminal repeated peptides of Anaplasma marginale major surface protein 1a and characterization of the humoral immune response of cattle immunized with recombinant and whole organism antigens. Vet Immunol Immunopathol 2004;98:137–151.

38. Ivanciuc O, Schein CH, Braun W. SDAP: database and computational tools for allergenic proteins. Nucleic Acids Res 2003;31:359–362.

39. Bjorklund AK, Soeria-Atmadja D, Zorzet A, Hammerling U, Gustafsson MG. Supervised identification of allergen-representative peptides for in silico detection of potentially allergenic proteins. Bioinformatics 2005;21:39–50.

40. Zell A, Mamier G. Stuttgart neural network simulator, version 4.2. University of Stuttgart, Stuttgart, 1997.

41. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature 1986;323:533–536.

42. Deleo JM. In: Proceedings of the second international symposium on uncertainty modelling and analysis, IEEE 1993. College Park, MD: Computer Society Press; 1993. pp 318–325.

43. Hopp TP, Woods RK. Predictions of protein antigenic determinants from amino acid sequences. Proc Natl Acad Sci USA 1981;78:3824–3828.

44. Kolaskar AS, Tongaonkar PC. A semi-empirical method for prediction of antigenic determinants on protein antigens. FEBS Lett 1990;276:172–174.

45. Baldi P, Brunak S. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 1999;15:937–946.

46. Colman PM, Laver WG, Varghese JN, Baker AT, Tulloch PA, Air GM, Webster RG. Three-dimensional structure of a complex of antibody with influenza virus neuraminidase. Nature 1987;326:358.

48 S. SAHA AND G.P.S. RAGHAVA

PROTEINS: Structure, Function, and Bioinformatics DOI 10.1002/prot

A Multi-epitope Subunit Vaccine Identification and Development Against Scrub Typhus (Orientia tsutsugamushi) Using Immunoinformatics Approaches

Article

May 2024

Display of porcine epidemic diarrhea virus spike protein B-cell linear epitope on Lactobacillus mucosae G01 S-layer surface induce a robust immunogenicity in mice

Article

Full-text available

May 2024
MICROB CELL FACT

The Porcine epidemic diarrhea virus (PEDV) presents a substantial risk to the domestic pig industry, resulting in extensive and fatal viral diarrhea among piglets. Recognizing the mucosal stimulation triggered by PEDV and harnessing the regulatory impact of lactobacilli on intestinal function, we have developed a lactobacillus-based vaccine that is carefully designed to elicit a strong mucosal immune response. Through bioinformatics analysis, we examined PEDV S proteins to identify B-cell linear epitopes that meet the criteria of being non-toxic, soluble, antigenic, and capable of neutralizing the virus. In this study, a genetically modified strain of Lactobacillus mucosae G01 (L.mucosae G01) was created by utilizing the S layer protein (SLP) as a scaffold for surface presentation. Chimeric immunodominant epitopes with neutralizing activity were incorporated at various sites on SLP. The successful expression of SLP chimeric immunodominant epitope 1 on the surface of L.mucosae G01 was confirmed through indirect immunofluorescence and transmission electron microscopy, revealing the formation of a transparent membrane. The findings demonstrate that the oral administration of L.mucosae G01, which expresses the SLP chimeric immunodominant gene epitope1, induces the production of secreted IgA in the intestine and feces of mice. Additionally, there is an elevation in IgG levels in the serum. Moreover, the levels of cytokines IL-2, IL-4, IFN-γ, and IL-17 are significantly increased compared to the negative control group. These results suggest that L. mucosae G01 has the ability to deliver exogenous antigens and elicit a specific mucosal immune response against PEDV. This investigation presents new possibilities for immunoprophylaxis against PEDV-induced diarrhea.

Predictive Immunoinformatics Reveal Promising Safety and Anti-Onchocerciasis Protective Immune Response Profiles to Vaccine Candidates (Ov-RAL-2 and Ov-103) in Anticipation of Phase I Clinical Trials

Preprint

Full-text available

May 2024

Onchocerciasis is a devastating tropical disease that causes severe eye and skin lesions. As global efforts shift from disease control to elimination, prophylactic/therapeutic vaccines have emerged as alternative elimination tools. Notably, Ov-RAL-2 and Ov-103 antigens have shown great promise in preclinical studies and plans are underway for clinical trials. Here, we predict the immunogenicity and other vaccine-related parameters for both antigens using immunoinformatics, as potential vaccine candidates against onchocerciasis. The analysis reveals that both antigens exhibit a favourable safety profile, making them promising candidates poised for human trials. Importantly, in silico immune simulation forecasts heightened antibody production and sustained cellular responses for both vaccine candidates. Indeed, the antigens were predicted to harbour substantial numbers of a wide range of distinct epitopes associated with protective responses against onchocerciasis, as well as the potential for stimulating innate immune TLR-4 receptor recognition with Ov-103 exhibiting better structural efficiency and antigenicity with no homology to human proteins compared to Ov-RAL-2. Overall, we provide herein valuable insights for advancing the development of Ov-103 and RAL-2 vaccine candidates against onchocerciasis in humans. Keywords: onchocerciasis, immunoinformatics, Ov-RAL-2, Ov-103, antigenicity, safety, protective immunity, molecular docking, molecular dynamics simulation

An emerging trends of bioinformatics and big data analytics in healthcare

Chapter

Full-text available

May 2024

In the current situation, the world is facing a variety of diseases, and healthcare management is also facing various enabling challenges due to emerging diseases. As healthcare is essential to human existence, most cutting-edge techniques are employed to enhance healthcare. In the era of knowledge mining, informatics plays a crucial role in various branches of research, especially in the period of the technological world since there are constantly evolving computational resources, technology, and algorithms, computational biology driven out of research laboratories and into our everyday lives to deal with it and manage it within the allotted time. Due to advanced and high-tech emerging algorithms in bioinformatics, personal computers can have the power of supercomputers, reducing research costs and time, ensuring safe and effective methods, and accelerating the discovery of novel human and healthcare managementrelated outcomes. Based on the fusion of computers and biology, computational biology can be recognized as an information science discipline that can assist in comprehending the complexity of diseases and their underlying mechanisms using a variety of fundamental approaches. As every individual contains a distinctive genome and a high degree of individuality, achieving a healthcare system where each patient might get personalized medication is one of the greatest challenges humans are facing in the present era. Monitoring patterns of data, which undergoes analysis to facilitate the discovery of strategic and decision-making-relevant insights, is possible in healthcare, thanks to big data analytics technology, along with patient diagnostics, rapid epidemic recognition, and enhanced patient management. Therefore, this chapter aims to provide an indepth overview of bioinformatics, various tools and their applications, health informatics, and the health care system. 
Thus, this study aims to contribute to a technologically distinct perspective of advancements in bioinformatics and big data analysis methods that can be useful to healthcare.

Immunoinformatics strategy for designing a multi-epitope chimeric vaccine to combat Neisseria gonorrhoeae

Article

May 2024
Vacunas

Subtractive proteomics-based vaccine targets annotation and reverse vaccinology approaches to identify multiepitope vaccine against Plesiomonas Shigelloides

Article

May 2024

Plasmodium vivax HAP2 genetic diversity and population structure from worldwide clinical samples. A potential Transmission-Blocking malaria vaccine candidate

Article

May 2024

Ahmed Saif

Exploring malaria parasite surface proteins to devise highly immunogenic multi-epitope subunit vaccine for Plasmodium falciparum

Article

May 2024

Background Malaria has remained a major health concern for decades among people living in tropical and sub-tropical countries. Plasmodium falciparum is one of the critical species that cause severe malaria and is responsible for major mortality. Moreover, the parasite has generated resistance against all WHO recommended drugs and therapies. Therefore, there is an urgent need for preventive measures in the form of reliable vaccines to achieve the target of a malaria-free world. Surface proteins are the preferable choice for subunit vaccine development because they are rapidly detected and engaged by host immune cells and vaccination-induced antibodies. Additionally, abundant surface or membrane proteins may contribute to the opsonization of pathogens by vaccine-induced antibodies. Results In our study, we have listed all those surface proteins from the literature that could be functionally important and essential for infection and immune evasion of the malaria parasite. Eight Plasmodium surface and membrane proteins from the pre-erythrocyte and erythrocyte stages were shortlisted. Thirty-seven epitopes (B-cell, CTL, and HTL epitopes) from these proteins were predicted using immune-informatic tools and joined with suitable peptide linkers to design a vaccine construct. A TLR-4 agonist peptide adjuvant was added at the N-terminus of the multi-epitope series, followed by the PADRE sequence and EAAAK linker. The TLR-4 receptor was docked with the construct’s anticipated model structure. The complex of vaccine and TLR-4, with the lowest energy −1514, was found to be stable under simulated physiological settings. Conclusion This study has provided a novel multi-epitope construct that may be exploited further for the development of an efficient vaccine for malaria.

Vaccinomics

Chapter

Jan 2024

A novel multi-epitope peptide vaccine candidate targeting hepatitis E virus: An in silico approach

Article

Full-text available

May 2024
J VIRAL HEPATITIS

Hepatitis E virus (HEV) is a foodborne virus transmitted through the faecal–oral route that causes viral hepatitis in humans worldwide. Ever since its discovery as a zoonotic agent, HEV was isolated from several species with an expanding range of hosts. HEV possesses several features of other RNA viruses but also has certain HEV‐specific traits that make its viral–host interactions inimitable. HEV leads to severe morbidity and mortality in immunocompromised people and pregnant women across the world. The situation in underdeveloped countries is even more alarming. Even after creating a menace across the world, we still lack an effective vaccine against HEV. Till date, there is only one licensed vaccine for HEV available only in China. The development of an anti‐HEV vaccine that can reduce HEV‐induced morbidity and mortality is required. Live attenuated and killed vaccines against HEV are not accessible due to the lack of a tolerant cell culture system, slow viral replication kinetics and varying growth conditions. Thus, the main focus for anti‐HEV vaccine development is now on the molecular approaches. In the current study, we have designed a multi‐epitope vaccine against HEV through a reverse vaccinology approach. For the first time, we have used viral ORF3, capsid protein and polyprotein altogether for epitope prediction. These are crucial for viral replication and persistence and are major vaccine targets against HEV. The proposed in silico vaccine construct comprises of highly immunogenic and antigenic T‐cell and B‐cell epitopes of HEV proteins. The construct is capable of inducing an effective and long‐lasting host immune response as evident from the simulation results. In addition, the construct is stable, non‐allergic and antigenic for the host. Altogether, our findings suggest that the in silico vaccine construct may be useful as a vaccine candidate for preventing HEV infections.

BcePred: Prediction of Continuous B-Cell Epitopes in Antigenic Sequences Using Physico-chemical Properties

Conference Paper

Full-text available

Sep 2004

A crucial step in designing of peptide vaccines involves the identification of B-cell epitopes. In past, numerous methods have been developed for predicting continuous B-cell epitopes, most of these methods are based on physico-chemical properties of amino acids. Presently, its difficult to say which residue property or method is better than the others because there is no independent evaluation or benchmarking of existing methods. In this study the performance of various residue properties commonly used in B-cell epitope prediction has been evaluated on a clean dataset. The dataset used in this study consists of 1029 non-redundant B cell epitopes obtained from Bcipep database and equally number of non-epitopes obtained randomly from SWISS-PROT database. The performance of each residue property used in existing methods has been computed at various thresholds on above dataset. The accuracy of prediction based on properties varies between 52.92% and 57.53%. We have also evaluated the combination of two or more properties as combination of parameters enhance the accuracy of prediction. Based on our analysis we have developed a method for predicting B cell epitopes, which combines four residue properties. The accuracy of this method is 58.70%, which is slightly better than any single residue property. A web server has been developed to predict B cell epitopes in an antigen sequence. The server is accessible from http://www.imtech.res.in/raghava/bcepred/

The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1998

Article

Full-text available

Jan 1997

SWISS-PROT (http://www.expasy.ch/) is a curated protein sequence database which strives to provide a high level of annotations (such as the description of the function of a protein, its domains structure, post-translational modifications, variants, etc.), a minimal level of redundancy and high level of integration with other databases. Recent developments of the database include: an increase in the number and scope of model organisms; cross-references to two additional databases; a variety of new documentation files and improvements to TrEMBL, a computer annotated supplement to SWISS-PROT. TrEMBL consists of entries in SWISS-PROT-like format derived from the translation of all coding sequences (CDS) in the EMBL nucleotide sequence database, except the CDS already included in SWISS-PROT.

Allergy to bovine beta-lactoglobulin: specificity of human IgE to tryptic peptides

Article

Aug 1999

Background Bovine beta-lactoglobulin (Blg) is a major cow's milk allergen. It is the main whey protein, without any counterpart in human milk. Big chemical hydrolysates appeared to retain most of the immunoreactivity of the native protein. Allergenicity of Big has already been shown to be associated with the four peptides derived from cyanogen bromide cleavage of Big. Objectives To map the major allergenic epitopes (e.g. regions of the molecule able to bind IgE) on Big using specific IgE from sera of 46 milk-allergic patients as a probe. Methods Direct and competitive inhibition enzyme immunoassays involving immobilized native protein or purified peptides derived from Big tryptic cleavage. Results Several peptides capable of specifically binding human IgEs were identified and were classified according to the intensity and frequency of the responses. The major epitopes appeared to be fragments (41-60), (102-124) and (149-162) recognized by 92, 97 and 89% of sera, respectively, whilst a second group which contained the fragments (1-8) and (25-40) was recognized by 58 and 72% of the population. A third group, comprising peptides (9-14), (84-91) and (92-100), was still detected by more than 40% of sera. Conclusion Three peptides were identified as major epitopes, recognized by a large majority of human IgE antibodies. Numerous other epitopes are scattered all along the Big sequence.

Analysis of a conformational B cell epitope of human thyroid peroxidase: identification of a tyrosine residue at a strategic location for immunodominance

Article

Apr 2002
INT IMMUNOL

Valérie Estienne

Thyroid peroxidase (TPO) is involved in autoimmune thyroid diseases and high titers of TPO autoantibodies directed to various conformational B cell epitopes are frequently present in patients’ sera. Deciphering these epitopes is a difficult task, but can give insight into the structural basis of autoimmune recognition. TPO is a membrane-bound enzyme with the extracellular part organized in three protein domains, but of unknown three-dimensional structure. We previously localized a TPO B cell epitope within amino acid residues 742‐848, a region encompassing the two C-terminal, extracellular domains of the protein. We found that at least one of the three tyrosine residues of the peptide 742‐848 might be involved in autoantibody binding. In this study, we show by site-directed mutagenesis that the autoepitope contains tyrosine 772 located near the hinge area between the two protein domains, suggesting they are both involved in the epitope structure. The B cell epitopes of TPO are clustered in two overlapping immunodominant regions. To map the newly localized epitope with respect of these regions, competition experiments were performed using a reference panel of TPO mAb and a further mAb previously found to be specific for the TPO peptide 742‐848 at variance with all the other ones. Here, we show that the tyrosine 772-bearing epitope in the peptide 742‐848 maps in a region that partly overlaps the reported two immunodominant regions. These results are suggestive of a complex TPO folding that involves all the three TPO protein domains to form a highly conformational immunodominant region.

Prediction of chain flexibility in proteins: A tool for the selection of peptide antigens

Article

Jan 1984

Paul Andrew Karplus

SNNS: Stuttgart Neural Network Simulator User Manual

Article

Jan 1995

The SWISS-PROT protein database and its supplement TrEMBL in 2000

Article

Jan 2000

Learning Representations by Back Propagating Errors

Article

Oct 1986
NATURE

We describe a new learning procedure, back-propagation, for networks of neurone-like units. The procedure repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector. As a result of the weight adjustments, internal 'hidden' units which are not part of the input or output come to represent important features of the task domain, and the regularities in the task are captured by the interactions of these units. The ability to create useful new features distinguishes back-propagation from earlier, simpler methods such as the perceptron-convergence procedure.

Prediction of chain flexibility in proteins

Article

Jan 1985

SNNS (Stuttgart Neural Network Simulator)

Chapter

Jul 2011

We here describe SNNS, a neural network simulator for Unix workstations that has been developed at the University of Stuttgart, Germany. Our network simulation environment is a tool to generate, train, test, and visualize artificial neural networks. The simulator consists of three major components: a simulator kernel that operates on the internal representation of the neural networks, a graphical user interface based on X-Windows to interactively create, modify and visualize neural nets, and a compiler to generate large neural networks from a high level network description language.
