
Prediction of Continuous B-cell Epitopes in an Antigen Using Recurrent Neural Network


Abstract

B-cell epitopes play a vital role in the development of peptide vaccines, in the diagnosis of diseases, and in allergy research. Experimental methods for characterizing epitopes are time-consuming and demand large resources. The availability of epitope prediction methods can help experimenters simplify this problem. In this study, a standard feed-forward neural network (FNN) and a recurrent neural network (RNN) have been used for predicting B-cell epitopes in an antigenic sequence. The networks have been trained and tested on a clean data set consisting of 700 non-redundant B-cell epitopes obtained from the Bcipep database and an equal number of non-epitopes obtained randomly from the Swiss-Prot database. The networks have been trained and tested with different input window lengths and numbers of hidden units. Maximum accuracy has been obtained using a recurrent neural network (Jordan network) with a single hidden layer of 35 hidden units for a window length of 16. The final network yields an overall prediction accuracy of 65.93% when tested by fivefold cross-validation. The corresponding sensitivity, specificity, and positive prediction values are 67.14, 64.71, and 65.61%, respectively. RNN (JE) was observed to be more successful than FNN in the prediction of B-cell epitopes. The length of the peptide is also important in the prediction of B-cell epitopes from antigenic sequences. The web server ABCpred is freely available at www.imtech.res.in/raghava/abcpred/.
SHORT COMMUNICATION
Sudipto Saha and G. P. S. Raghava*
Institute of Microbial Technology, Chandigarh, India
Proteins 2006; 65:40–48. © 2006 Wiley-Liss, Inc.
Key words: ABCpred; prediction; B-cell epitopes;
recurrent neural network; web server
INTRODUCTION
The antigenic regions of a protein that are recognized by the binding sites, or paratopes, of immunoglobulin molecules are called B-cell epitopes. When such specific binding (between the epitope of an antigen and the paratope of an antibody) is observed experimentally, the particular immunoglobulin establishes the epitope nature of a protein. Epitopes are thus relational entities that can be defined only in a functional sense (i.e., in an immunoassay) by the binding of complementary paratopes.[1] These epitopes play an important role in the design of peptide-based vaccines and in the diagnosis of diseases.[2–4] B-cell epitopes are also important for allergy research and for determining the cross-reactivity of IgE-type epitopes of allergens.[5–7] These epitopes may be linear (continuous) or conformational (discontinuous). When linear synthetic peptides are found to cross-react with anti-protein antibodies, or when they are able to induce antibodies that cross-react with the parent protein, these peptides are labeled as linear (continuous) epitopes.[8] Protective linear B-cell epitopes may lead to the synthesis of efficient peptide vaccines against viral diseases.[9] A dominant linear B-cell epitope is used as the target of neutralizing antibody responses in autoimmune diseases.[10]
A discontinuous or conformational epitope is composed of several disparate sequence stretches that are spatially contiguous. These sequences form a compact, accessible region when the protein is folded. Deciphering these epitopes is a difficult task, but it can give insight into the structural basis of antigen–antibody recognition.[11] Recently, the Conformational Epitope Prediction (CEP) server has been developed for the prediction of conformational epitopes using 3D structural data of protein antigens.[12]
Grant sponsors: Council of Scientific and Industrial Research (CSIR); Department of Biotechnology (DBT), Government of India.
*Correspondence to: Dr. G.P.S. Raghava, Scientist, Institute of Microbial Technology, Sector 39A, Chandigarh, India. E-mail: raghava@imtech.res.in
Received 31 May 2005; Revised 7 March 2006; Accepted 24 April 2006
Published online 7 August 2006 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.21078
PROTEINS: Structure, Function, and Bioinformatics 65:40–48 (2006)

Prediction of immunogenic epitopes using bioinformatic tools remains a vital and challenging task. The inherent complexity of antigen recognition complicates epitope prediction.[13] In the past, a number of algorithms have been developed for predicting continuous B-cell epitopes based on physico-chemical properties of amino acids,[14] but their rate of successful prediction is not very high. The commonly used properties for the prediction are hydrophilicity (Parker method),[15] flexibility (Karplus method),[16] accessibility (Emini method),[17] and turns (Pellequer method),[18] which had been correlated with the location of continuous epitopes in a few well-characterized proteins. All of these prediction calculations are based on propensity scales for each of the 20 amino acids; these scales describe the tendency of each residue to be associated with a given physico-chemical property. Based on these properties, a few computer programs have been developed to assist users in predicting epitopes in an antigenic sequence. For example, PREDITOP[19] uses 22 normalized scales corresponding to hydrophilicity, accessibility, flexibility, and secondary structure propensities. Another program, PEOPLE,[20] combines prediction methods, taking into account physico-chemical properties such as β-turns, surface accessibility, hydrophilicity, and flexibility. A more recent program, BEPITOPE,[21] aims at predicting continuous protein epitopes and searching for patterns either in a single protein or in a complete translated genome. The predictive value of algorithms based on eight physico-chemical parameter scales has been assessed for locating 29 continuous epitopes in four model proteins; the percentage of correct predictions varied between 40 and 68%, depending on the threshold cut-off level and the model protein.[22] Van Regenmortel and Pellequer compared the prediction efficacy of 22 different scales, taking into account both correct and incorrect predictions, and showed that the prediction accuracy was not >50–60%.[23]
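The propensity-scale approach described above can be illustrated with a short sketch: each residue is assigned a score averaged over a sliding window, and residues whose smoothed score exceeds a cutoff are flagged as the "peaks" of a property plot. The scale values in `TOY_SCALE` are hypothetical placeholders, not the published Parker, Karplus, or Emini values, and the window sizes and cutoff are assumptions chosen only for illustration.

```python
# Minimal sketch of propensity-scale epitope profiling (hypothetical scale values).
from typing import Dict, List

# HYPOTHETICAL per-residue propensities; real methods use published scales
# (e.g., Parker hydrophilicity), which are NOT reproduced here.
TOY_SCALE: Dict[str, float] = {
    "D": 2.0, "E": 2.0, "K": 1.5, "R": 1.5, "N": 1.0, "Q": 1.0, "S": 0.5,
    "G": 0.5, "T": 0.3, "P": 0.3, "A": 0.0, "H": 0.0, "Y": -0.5, "C": -0.5,
    "M": -1.0, "W": -1.5, "V": -1.5, "F": -2.0, "L": -2.0, "I": -2.0,
}

def window_profile(seq: str, scale: Dict[str, float], window: int = 7) -> List[float]:
    """Average the scale over a centered sliding window (one value per residue).
    Near either terminus only the available residues are averaged (an assumption)."""
    half = window // 2
    values = [scale[aa] for aa in seq]
    profile = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half), min(len(seq), i + half + 1)
        profile.append(sum(values[lo:hi]) / (hi - lo))
    return profile

def flag_residues(profile: List[float], cutoff: float) -> List[bool]:
    """Mark residues whose smoothed score exceeds the cutoff."""
    return [v > cutoff for v in profile]

if __name__ == "__main__":
    seq = "LLIVDEKRNDGSLLIF"  # toy sequence
    prof = window_profile(seq, TOY_SCALE, window=5)
    print("".join("^" if f else "." for f in flag_residues(prof, cutoff=0.5)))
```

The output is still a qualitative region marker rather than an exact epitope with start and end residues, which is the limitation of property plots discussed below.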
Recently, we studied the performance of various methods on a large, clean data set of B-cell epitopes.[24] Based on our observations, we also developed a combined method, BcePred (www.imtech.res.in/raghava/bcepred/), for predicting B-cell epitopes using various physico-chemical properties. The accuracy of individual physico-chemical properties varies from 52.9 to 57.5%, whereas the combined method shows 58.7% accuracy.[24] Blythe and Flower, benchmarking 484 existing amino acid propensity scales for B-cell epitope prediction, found that these scales performed poorly.[25]
One of the major problems with existing methods is that they are qualitative rather than quantitative, as most of them produce property plots. From these plots one can only guess the stretch or region of a protein that may contain a B-cell epitope; it is nearly impossible to identify the exact region (start and end residue) that can serve as a B-cell epitope. To the best of our knowledge, no sophisticated technique such as an artificial neural network (ANN) has been used for the prediction of B-cell epitopes. The major problem in using machine learning techniques is that the input window length has to be fixed, whereas B-cell epitope lengths vary from 5 to 30 residues, as reported in the literature (Bcipep database). This is why machine learning techniques such as ANN were not developed for this problem in the past. In this study, an attempt has been made to develop a method using ANN for the prediction of B-cell epitopes. To overcome the problem of the varied length of B-cell epitopes, we examined all the available B-cell epitopes and observed that most have a length of about 20 amino acids or less (as reported in the literature), and only a few are longer than 20 amino acids. Thus, in our study we considered only epitopes of 20 amino acids or less for developing our method. This does not mean that B-cell epitopes are necessarily 20 amino acids or shorter. Fixed-length patterns were generated by adding or removing a few residues at the termini of B-cell epitopes; the additional residues were taken from the parent (original) antigenic sequences. To train any prediction method, particularly a machine learning technique, one requires both positive (e.g., B-cell epitope) and negative (e.g., non-B-cell epitope) datasets. We created the positive B-cell epitope data set from the Bcipep database. In the absence of any proven non-epitopes, we took random peptides generated from proteins as non-epitopes in this study. Creating a negative dataset from random peptides/proteins is a common practice in the literature.[26–28] Although these random peptides may also contain B-cell epitopes, we assumed that the probability of this is low.[29,30]
Both a standard feed-forward network (FNN) and a recurrent neural network (RNN) were applied in the present study for predicting B-cell epitopes in an antigenic sequence. Different window lengths, from 10 to 20 residues at intervals of two amino acids, were tested to achieve high accuracy of B-cell epitope prediction. We observed that the prediction of B-cell epitopes using RNN was more accurate than using FNN.
METHODS
The Data Set
B-cell epitopes have been obtained from the Bcipep database,[31] which contains 2479 continuous epitopes. To train any machine learning technique one needs fixed-length patterns, whereas B-cell epitopes have varying lengths. We examined the length of B-cell epitopes and observed that a large number of epitopes (≥90%) are shorter than 20 amino acids. Thus, we discarded all epitopes longer than 20 residues in order to fix the size of the pattern. We do not claim that 20 residues is the optimal length for B-cell epitopes; it is a practical choice for handling the problem of B-cell epitope prediction. We also removed identical epitopes from our dataset to avoid bias in the prediction. The final dataset consists of 700 unique B-cell epitopes with a maximum length of 20 amino acids. To generate a negative dataset, we created non-epitopes using random peptides of 20 residues from proteins in Swiss-Prot.[32] All random peptides identical to B-cell epitopes were removed. Finally, we selected 700 random peptides and used them as the non-B-cell-epitope dataset. Thus, our dataset consists of 700 B-cell epitopes and 700 non-B-cell epitopes (random peptides).
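A minimal sketch of the negative-set construction described above, assuming the Swiss-Prot sequences are already loaded as plain strings. The function name, the seed, and the decision to also reject duplicate negatives are assumptions for illustration; the paper states only that random 20-residue peptides identical to known epitopes were removed.

```python
import random
from typing import List, Set

def sample_negatives(proteins: List[str], epitopes: Set[str], n: int,
                     length: int = 20, seed: int = 0) -> List[str]:
    """Draw n random peptides of `length` residues from the protein sequences,
    rejecting peptides identical to a known epitope (and, as an assumption,
    rejecting duplicate negatives as well)."""
    rng = random.Random(seed)
    candidates = [p for p in proteins if len(p) >= length]
    if not candidates:
        raise ValueError("no protein is long enough")
    negatives: List[str] = []
    seen: Set[str] = set()
    attempts = 0
    while len(negatives) < n:
        attempts += 1
        if attempts > 100 * n:  # guard against an exhausted peptide pool
            raise RuntimeError("could not draw enough distinct non-epitope peptides")
        prot = rng.choice(candidates)
        start = rng.randrange(len(prot) - length + 1)
        pep = prot[start:start + length]
        if pep in epitopes or pep in seen:
            continue
        seen.add(pep)
        negatives.append(pep)
    return negatives
```

Because the negatives are random rather than experimentally verified, a few of them may still contain true epitopes; this sketch does nothing to remove such label noise, matching the assumption stated in the Introduction.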
Creation of Fixed Length Pattern
In this study, we considered only B-cell epitopes of 20 amino acids or less, to fix the size of the pattern (upper limit). If an epitope is shorter than 20 amino acids, its length is increased by adding an equal number of residues at both termini, taken from its original antigenic sequence. For example, if a peptide is 8 amino acids long, we add 6 neighboring residues at each terminus (see Table I). These neighboring residues are obtained from the original antigenic sequence.
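The padding step above can be sketched as follows. The paper pads with an equal number of residues on each side; how it handled an odd deficit, or an epitope lying close to a protein terminus, is not stated, so the choices below (the extra residue goes to the C-terminal side; the window is shifted to stay inside the antigen) are assumptions. The example uses the epitope AEFPLDIT and its 20-residue pattern from Table I; the extra flanking residues in the test are hypothetical.

```python
def pad_epitope(epitope: str, antigen: str, target: int = 20) -> str:
    """Extend an epitope to `target` residues using flanking residues from its antigen.
    Uses the first occurrence of the epitope in the antigen."""
    if len(epitope) > target:
        raise ValueError("epitope longer than target length")
    if len(antigen) < target:
        raise ValueError("antigen shorter than target length")
    start = antigen.find(epitope)
    if start < 0:
        raise ValueError("epitope not found in antigen")
    deficit = target - len(epitope)
    left = deficit // 2  # equal split; an odd extra residue goes to the right (assumption)
    begin = start - left
    # Shift the window to stay inside the antigen (assumption for epitopes near a terminus).
    begin = max(0, min(begin, len(antigen) - target))
    return antigen[begin:begin + target]

# Table I example: 6 residues are added on each side of the 8-residue epitope.
print(pad_epitope("AEFPLDIT", "PKGYVGAEFPLDITAGTEAA"))
```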
Size of the Input Window
After fixing the upper limit and creating patterns of length 20, we generated patterns of different lengths (window lengths of 10, 12, 14, 16, 18, and 20) by removing an equal number of residues from both sides of each pattern. Table I shows the different window lengths obtained for two epitopes of length 8 and 20.
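Because 20 − w is even for every window length w used here, symmetric trimming is always exact. The short sketch below (the function name is our own) reproduces the rows of Table I from the two 20-residue patterns.

```python
from typing import Dict, Sequence

def trim_windows(pattern: str,
                 lengths: Sequence[int] = (10, 12, 14, 16, 18, 20)) -> Dict[int, str]:
    """Return centered sub-windows, removing equal numbers of residues from both ends."""
    out: Dict[int, str] = {}
    for w in lengths:
        if w > len(pattern) or (len(pattern) - w) % 2:
            raise ValueError(f"cannot trim {len(pattern)} residues symmetrically to {w}")
        cut = (len(pattern) - w) // 2
        out[w] = pattern[cut:cut + w]
    return out

for w, pep in sorted(trim_windows("PKGYVGAEFPLDITAGTEAA").items(), reverse=True):
    print(w, pep)
```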
Fivefold Cross-Validation
In this study, a fivefold cross-validation technique has been used, in which the data set is randomly divided into five subsets, each containing an equal number of peptides (280 each). The five subsets have been grouped into training, validation, and testing sets. The training set consists of three of these subsets. The network is validated for minimum error on the validation set (one subset) to avoid overtraining, and it is tested on the remaining subset, called the testing set. This process has been repeated five times so that each subset was used once for testing. The final prediction results are the average over the five testing sets.
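The rotation described above can be sketched as follows for the 1400 peptides (700 epitopes plus 700 non-epitopes, giving 280 per subset). The paper does not say which subset served as the validation set in each round; using the subset that follows the test subset in cyclic order is an assumption, as are the function name and the seed.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def five_fold_splits(items: Sequence[T],
                     seed: int = 0) -> List[Tuple[List[T], List[T], List[T]]]:
    """Return five (train, validation, test) triples; each subset is tested exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    k = 5
    folds = [shuffled[i::k] for i in range(k)]  # five near-equal subsets
    splits = []
    for t in range(k):
        v = (t + 1) % k  # ASSUMPTION: validation = next subset in cyclic order
        train = [x for i in range(k) if i not in (t, v) for x in folds[i]]
        splits.append((train, folds[v], folds[t]))
    return splits

splits = five_fold_splits(list(range(1400)))
print([(len(tr), len(va), len(te)) for tr, va, te in splits])
```

Averaging the five test-set scores then gives the cross-validated estimate reported in the Results.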
Blind Dataset 1
To evaluate our method on a blind or independent dataset, we obtained the following four immunogenic proteins from the literature. (i) ESAT-6, a low-molecular-weight protein secreted by virulent Mycobacterium tuberculosis, induced a strong antibody response in experimentally infected monkeys. The epitopes were determined by synthesizing overlapping peptides spanning the ESAT-6 protein and measuring the antibody response to these peptides by ELISA in serum samples from the monkeys.[33,34] (ii) Ag44 is a recombinant antigen expressing the 134 C-terminal RhopH3 residues of Plasmodium falciparum. Its epitopes were determined by scanning the protein with overlapping peptides and performing ELISA assays.[35] (iii) The nucleocapsid (N) protein of rinderpest virus (RPV) is one of the most abundant and immunogenic viral proteins. Epitope mapping with overlapping peptides revealed three antigenic sites.[36] (iv) Major surface protein (MSP) 1a of Anaplasma marginale, the type species of its genus, has been shown to contribute to protective immunity in cattle. Linear B-cell epitopes of MSP1a were mapped using synthetic peptides representing the entire sequence of the protein, which were recognized by sera from immunized cattle.[37]
Blind Dataset 2
We also created another independent dataset, Blind dataset 2, which consists of a total of 187 epitopes (128 IgE epitopes obtained from the Structural Database of Allergenic Proteins (SDAP)[38] and 59 epitopes obtained from the Bcipep database[31]); none of these epitopes were used in training or testing the ABCpred algorithm. This dataset contains 109 epitopes shorter than 16 residues. To create patterns of 16 residues, we added an equal number of residues at both termini of these epitopes from their original sequences. We also generated 200 random 16-mer peptides from the non-allergen dataset of Bjorklund et al.[39] and used them as non-epitopes. In summary, Blind dataset 2 consists of 187 epitopes and 200 non-epitopes.
Neural Network
In this study, an FNN and a partially recurrent neural network (RNN), each with a single hidden layer, have been used. FNN was tried first, since it is the most commonly used ANN architecture. However, FNN did not yield satisfactory results, which prompted us to try an RNN (Jordan network). Both networks were trained using the back-propagation algorithm with various window lengths from 10 to 20 residues. The target output is a single binary value: one for B-cell epitopes and zero for non-epitopes. The final Jordan network has an input window of 16 residues and 35 units in a single hidden layer. For a detailed description of the Jordan network, see the supplementary information at www.imtech.res.in/raghava/abcpred/ABC_method.html.
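To make the Jordan architecture concrete, here is a forward pass of a tiny Jordan-style network in plain Python. A Jordan network differs from an Elman network in that its context units hold a copy of the network's output rather than of its hidden layer. This is not the ABCpred model: the weights are random and untrained, and the encoding (one 20-dimensional one-hot vector per residue, fed one residue at a time, with a single context unit that keeps a decayed sum of previous outputs) and the decay value are assumptions for illustration. Only the 35 hidden units and the 16-residue window come from the text; the class name `TinyJordanNet` is hypothetical.

```python
import math
import random
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(aa: str) -> List[float]:
    """20-dimensional sparse encoding of one residue (raises ValueError if unknown)."""
    vec = [0.0] * len(AMINO_ACIDS)
    vec[AMINO_ACIDS.index(aa)] = 1.0
    return vec

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

class TinyJordanNet:
    """Jordan-style network: the output is fed back through a context unit."""

    def __init__(self, n_hidden: int = 35, decay: float = 0.5, seed: int = 0):
        rng = random.Random(seed)
        n_in = len(AMINO_ACIDS) + 1  # residue inputs + 1 context unit
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                         for _ in range(n_hidden)]
        self.b_hidden = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_out = rng.uniform(-0.5, 0.5)
        self.decay = decay  # self-recurrence of the context unit (assumed value)

    def forward(self, peptide: str) -> float:
        context, y = 0.0, 0.0
        for aa in peptide:
            x = one_hot(aa) + [context]
            h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                 for row, b in zip(self.w_hidden, self.b_hidden)]
            y = sigmoid(sum(w * hi for w, hi in zip(self.w_out, h)) + self.b_out)
            context = self.decay * context + y  # Jordan feedback of the output
        return y  # score after the last residue, to be compared with a cutoff

print(TinyJordanNet().forward("GYVGAEFPLDITAGTE"))
```

Training such a network with back-propagation (as done in SNNS for the paper) is omitted; the point is only where the recurrent connection sits.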
TABLE I. Creation of Fixed-Length Patterns of 20 or Fewer Amino Acids from B-cell Epitopes
Window length   AEFPLDIT^a (8 amino acids)   ACVPTDPNPQEVVLVNVTEN^b (20 amino acids)
20              PKGYVGAEFPLDITAGTEAA         ACVPTDPNPQEVVLVNVTEN
18              KGYVGAEFPLDITAGTEA           CVPTDPNPQEVVLVNVTE
16              GYVGAEFPLDITAGTE             VPTDPNPQEVVLVNVT
14              YVGAEFPLDITAGT               PTDPNPQEVVLVNV
12              VGAEFPLDITAG                 TDPNPQEVVLVN
10              GAEFPLDITA                   DPNPQEVVLV
^a Patterns of different length generated from an epitope of eight amino acids.
^b Patterns of different length generated from an epitope of 20 amino acids.

The publicly available, free simulation package SNNS (version 4.2) from Stuttgart University has been used to implement the neural networks.[40] It allows the resulting network to be incorporated into an ANSI C function for use in stand-alone code. At the start of each simulation, the weights are initialized with random values. Training is carried out by error back-propagation with a sum-of-squares error function.[41] The magnitude of the error sum in the test and training sets is monitored in each training cycle, and the final number of cycles is determined when the network converges. During testing, a cutoff value is set for each network and the output produced by the network is compared with it: if the output value is greater than the threshold, the peptide is predicted as a B-cell epitope; otherwise, it is predicted as a non-epitope. For each network, the cutoff value is adjusted so that it yields the highest accuracy for that network. In this study we used uniform (identical) parameters for training the five networks on the different training sets during fivefold cross-validation. That is, we did not optimize the performance of the networks for individual test sets; instead, we optimized the networks to obtain the best average accuracy. We tried different network parameters during training to get the best overall performance (average accuracy) over the five sets. In other words, our best result was achieved by maintaining uniform parameters over the five subsets.
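The cutoff rule above (score greater than the cutoff means "epitope", with the cutoff chosen for the highest accuracy) can be sketched as a simple search over candidate cutoffs. The candidate grid and the toy scores below are hypothetical; ties are broken in favor of the first candidate listed, which is our own choice.

```python
from typing import Sequence, Tuple

def accuracy_at(scores: Sequence[float], labels: Sequence[int], cutoff: float) -> float:
    """Fraction of peptides classified correctly when score > cutoff means 'epitope'."""
    correct = sum((s > cutoff) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)

def best_cutoff(scores: Sequence[float], labels: Sequence[int],
                candidates: Sequence[float]) -> Tuple[float, float]:
    """Return (cutoff, accuracy) for the candidate cutoff with the highest accuracy."""
    best = max(candidates, key=lambda c: accuracy_at(scores, labels, c))
    return best, accuracy_at(scores, labels, best)

# Toy network outputs for two epitopes (label 1) and two non-epitopes (label 0).
print(best_cutoff([0.9, 0.7, 0.4, 0.2], [1, 1, 0, 0], [0.1, 0.3, 0.5, 0.8]))
```

Selecting the cutoff on the same data used for reporting accuracy tends to be optimistic; choosing it on the validation subset would avoid that bias.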
Performance Measure
Threshold-dependent measure
We used standard parameters to evaluate the performance of the method. Performance was evaluated at the peptide (epitope) level, not at the residue level. Five parameters have been used in the present work to measure the performance of the prediction method: (1) Qsens (sensitivity) is the percentage of epitopes that are correctly predicted as epitopes; (2) Qspec (specificity) is the percentage of non-epitopes that are correctly predicted as non-epitopes; (3) Qacc (accuracy) is the proportion of correctly predicted peptides; (4) Qppv (positive prediction value) is the probability that a predicted epitope is in fact an epitope; and (5) the Matthews correlation coefficient (MCC). These parameters are calculated using the following equations:
$$Q_{sens} = \frac{TP}{TP + FN} \times 100\%$$

$$Q_{spec} = \frac{TN}{TN + FP} \times 100\%$$

$$Q_{acc} = \frac{TP + TN}{TP + FP + TN + FN} \times 100\%$$

$$Q_{ppv} = \frac{TP}{TP + FP}$$

$$MCC = \frac{(TP)(TN) - (FP)(FN)}{\sqrt{[TP + FP][TP + FN][TN + FP][TN + FN]}}$$
where TP and FN refer to true positives and false negatives, and TN and FP refer to true negatives and false positives, respectively.
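The equations above translate directly into code. In this sketch Qppv is scaled to a percentage, as in Tables II and III, and MCC is set to 0 when its denominator vanishes, a common convention that the paper does not specify. The counts in the example are hypothetical.

```python
import math
from typing import Dict

def performance(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Threshold-dependent measures (percentages for Q values, raw value for MCC)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Qsens": 100.0 * tp / (tp + fn),
        "Qspec": 100.0 * tn / (tn + fp),
        "Qacc": 100.0 * (tp + tn) / (tp + fp + tn + fn),
        "Qppv": 100.0 * tp / (tp + fp),  # percentage, as reported in the tables
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Hypothetical counts: 10 epitopes and 10 non-epitopes.
print(performance(tp=8, fn=2, tn=6, fp=4))
```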
Threshold-independent measures
One problem with threshold-dependent measures is that they measure performance at a single given threshold, which makes it difficult to assess the overall performance of a method. The receiver operating characteristic (ROC) curve is a threshold-independent measure that was originally developed as a signal processing technique. For a prediction method, the ROC plot is obtained by plotting sensitivity values (true-positive fraction) on the y-axis against the corresponding (1 − specificity) values (false-positive fraction) on the x-axis. The area under the ROC curve is taken as an important index because it provides a single measure of overall accuracy that does not depend on a particular threshold.[42] It measures discrimination, that is, the ability of a method to correctly classify B-cell epitopes and non-epitopes.
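A sketch of how ROC points and the area under the curve can be computed from network scores. Following the Results, points are taken at a fixed list of thresholds (the paper used 0.1 to 1.0 at intervals of 0.1); the trapezoidal rule over such a coarse grid only approximates the full AUC. The scores in the example are hypothetical.

```python
from typing import List, Sequence, Tuple

def roc_points(scores: Sequence[float], labels: Sequence[int],
               thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """(1 - specificity, sensitivity) pairs, predicting 'epitope' when score > threshold."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule, anchored at (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

grid = [round(0.1 * i, 1) for i in range(1, 11)]  # 0.1, 0.2, ..., 1.0
pts = roc_points([0.9, 0.75, 0.6, 0.55, 0.3, 0.2], [1, 1, 0, 1, 0, 0], grid)
print(auc_trapezoid(pts))
```

An AUC of 0.5 corresponds to random prediction and 1.0 to perfect separation of epitopes from non-epitopes.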
RESULTS
All the methods have been trained and tested using fivefold cross-validation, and the prediction performance measures have been averaged over the five sets. First, we trained and tested our method using FNN for different window lengths (input units) of 10, 12, 14, 16, 18, and 20 (see Table I). The performance of FNN at different window lengths, with a single hidden layer of 35 units at the optimum/default threshold of 0.5, is shown in Table II. The accuracy achieved by FNN varied from 50.43% (nearly random) to 56.79% across window lengths. We tried a number of options, including different numbers of layers and hidden units, but the accuracy did not improve further with FNN (data not shown).

TABLE II. The Performance of Our Neural Network with FNN at Optimum/Default Threshold (0.5)
Window size   Sensitivity (%)   Specificity (%)   PPV (%)          Accuracy (%)     MCC
10            48.14             52.71             50.53            50.43            0.0088
12            53.00 (54.71)^a   52.14 (54.71)^a   52.54 (54.72)^a  52.57 (54.71)^a  0.0515
14            51.43             55.00             53.17            53.21            0.0645
16            53.29 (55.86)^a   56.57 (54.29)^a   55.10 (55.20)^a  54.93 (55.07)^a  0.0859
18            51.43             54.57             52.92            53.00            0.0602
20            54.43             59.14             57.28            56.79            0.1374
These results were obtained by FNN using a single hidden layer of 35 units.
^a Maximum percentage, obtained with 10 hidden units.
The accuracy of the method improved significantly (P = 0.01732, at the 0.02 level) when we implemented RNN for training and testing. The overall performance (ROC plot) of RNN at various thresholds for window length 16 is shown in Figure 1. The ROC plot was obtained by plotting sensitivity values on the y-axis against (1 − specificity) on the x-axis for thresholds from 0.1 to 1.0 at intervals of 0.1 (see Methods). The best performance of RNN was obtained with 35 units in a single hidden layer (see Fig. 1). We also compared the overall performance (ROC plot) of FNN and RNN with 35 hidden units and a window length of 16, and observed that RNN was better than FNN over the whole range (see Fig. 2). These results clearly indicate the superiority of RNN over FNN in the prediction of B-cell epitopes. Using RNN at threshold 0.5, we achieved an average accuracy of 65.93%, sensitivity of 67.14%, specificity of 64.71%, and MCC of 0.3187. The learning parameters were the same for all five RNN models in fivefold cross-validation (e.g., SSE 0.0005, 5000 cycles, JE order, 35 hidden nodes). The accuracy at threshold 0.5 for the five test sets was 58.57% (Set 1), 73.57% (Set 2), 72.14% (Set 3), 68.93% (Set 4), and 56.43% (Set 5). We used the best RNN model in our server. The sensitivity, specificity, PPV, accuracy, and MCC at different window lengths using RNN are shown in Table III.
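The reported average accuracy is the arithmetic mean of the five per-set accuracies listed above. The few lines below check that arithmetic and also show the spread across sets; the standard deviation is our addition and is not a value reported in the paper.

```python
from statistics import mean, stdev

# Per-set accuracies (%) of the RNN at threshold 0.5, as listed in the text.
fold_acc = [58.57, 73.57, 72.14, 68.93, 56.43]
avg = mean(fold_acc)
print(f"mean = {avg:.2f}%, sample sd = {stdev(fold_acc):.2f} percentage points")
```

The mean rounds to the reported 65.93%, while the individual sets range from 56.43% to 73.57%, so single-set results vary considerably around the average.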
Testing of ABCpred server on Blind Dataset 1
To evaluate the performance of the ABCpred server, we computed its predictive performance on a blind dataset (protein sequences not used in the development of the ABCpred algorithm). For this purpose, four recently experimentally annotated proteins were obtained from the literature. We predicted the B-cell epitopes in these proteins using the ABCpred server with default parameters. The B-cell epitopes (predicted as well as experimentally determined) were mapped onto each protein along its amino acid sequence. The B-cell epitopes predicted by the ABCpred server in ESAT-6[33,34] and Ag44,[35] and in the rinderpest virus protein[36] and MSP1a,[37] are shown in Figure 3(a) and 3(b), respectively. The predicted peptides are displayed rank-wise based on the scores obtained from the trained recurrent neural network. All the peptides shown in the figure are at the default threshold value (0.5) and window length 16, with the overlapping filter applied. For ESAT-6, there were four experimentally determined epitopes in total, and our server predicted seven epitopes in this protein. Four of our predicted regions fell in the same sequence regions as three of the experimentally determined epitopes, and a fifth predicted epitope covers nearly half of the fourth experimentally determined epitope. These results indicate that the server is able to detect potential regions that contain B-cell epitopes with significant accuracy. However, the server also produces many over-predictions (false positives) and cannot predict the boundaries of B-cell epitopes. One reason for this poor prediction is that B-cell epitopes do not have a fixed length, whereas we use a window of fixed length. Moreover, experimentally determined epitopes do not necessarily have correct boundaries, because experiments test only a limited set of peptides (not all peptides of all possible lengths). A similar trend was observed for the other proteins; see Figure 3(a,b) for details. Overall, the results indicate that the performance of the method on real proteins is much better than random.
Testing of ABCpred server on Blind Dataset 2
The performance of ABCpred has been evaluated on Blind dataset 2, which consists of 187 B-cell epitopes and 200 non-epitopes (16-mer random peptides). If a B-cell epitope has more than 16 residues, we examined all of its overlapping 16-mers, and if any 16-mer has a score above the threshold, the whole sequence is predicted as a B-cell epitope. As shown in Table IV, we achieved a sensitivity of 71.66%, specificity of 61.50%, and accuracy of 66.41% at the default threshold value of 0.5. The maximum accuracy of 69.25% was achieved using ABCpred at a threshold value of 0.6. Similarly, we evaluated the performance of the Karplus method using flexibility and achieved a maximum accuracy of 59.43% at a threshold value of 1.50 (Table V). We achieved a maximum accuracy of 61.49% with the Parker method using the hydrophilicity scale at a threshold value of 2.00 (Table VI). These results demonstrate that ABCpred can predict B-cell epitopes with reasonably high accuracy.

Fig. 1. The overall performance of our method with RNN for window size 16. This ROC plot was obtained between sensitivity (y-axis) and 1 − specificity (x-axis) for RNN at different thresholds from 0.1 to 1.0 at intervals of 0.1.

Fig. 2. ROC plot of the two neural networks, FNN and RNN, used in this study at window size 16 with 35 hidden units.
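The rule used above for epitopes longer than 16 residues (the whole sequence is called an epitope if any overlapping 16-mer scores above the threshold) can be sketched as below. The scoring function is a stand-in: `score_16mer` is a hypothetical callable that would wrap the trained network, and the toy scorer in the example simply counts lysines.

```python
from typing import Callable

def predict_epitope(peptide: str, score_16mer: Callable[[str], float],
                    threshold: float = 0.5, window: int = 16) -> bool:
    """Predict the whole peptide as an epitope if any overlapping window scores above threshold."""
    if len(peptide) < window:
        raise ValueError("pad the peptide to at least the window length first")
    return any(score_16mer(peptide[i:i + window]) > threshold
               for i in range(len(peptide) - window + 1))

def toy_score(s: str) -> float:
    """Hypothetical scorer: fraction of lysine residues."""
    return s.count("K") / len(s)

print(predict_epitope("A" * 16 + "KKKK", toy_score, threshold=0.2))
```

Note that this "any window" rule makes longer epitopes easier to call positive than 16-mers, since they get several chances to exceed the threshold.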
TABLE III. The Performance of Our Method Using RNN at 0.5 Threshold Using Single Hidden Layer of 35 Units
Window size   Sensitivity (%)   Specificity (%)   PPV (%)   Accuracy (%)   MCC
10            58.71             64.14             61.78     61.43          0.2293
12            53.57             61.71             58.30     57.64          0.1534
14            52.43             65.29             60.12     58.86          0.1786
16            67.14             64.71             65.61     65.93          0.3187
18            58.70             65.00             62.06     61.86          0.2373
20            57.14             71.57             66.51     64.36          0.2871
Fig. 3. Comparative epitope mapping of ABCpred server predictions against experimental data for a set of four proteins: (a) Mycobacterium tuberculosis ESAT-6 protein and Plasmodium falciparum Ag44; (b) rinderpest nucleocapsid protein C terminus and Anaplasma marginale MSP1a N terminus. Red residues are reported in the literature as immunogenic, blue residues are predicted by the ABCpred server as epitopes, and underlined residues are correctly predicted.
Comparison with Existing Methods
It is important to compare the performance of a newly developed method with existing methods. The major B-cell epitope prediction methods are briefly described as follows: (i) the Hopp and Woods method is based on the analysis of 12 proteins;[43] (ii) the Parker et al. method uses a modified hydrophilicity scale;[15] (iii) Karplus and Schulz developed a method using a flexibility scale for predicting B-cell epitopes;[16] (iv) Emini et al. developed a method using the surface accessibility of amino acids;[17] (v) Kolaskar and Tongaonkar derived their own antigenicity scale based on the frequency of residues;[44] (vi) Pellequer et al. use turn scales, which they derived from 87 protein structures;[18] (vii) Pellequer and Westhof developed the program PREDITOP, which uses 22 normalized scales;[19] (viii) PEOPLE uses a combination of physico-chemical properties;[20] and (ix) BEPITOPE is a comprehensive program that allows two or more parameters to be combined.[21] It is not practically possible to evaluate all these methods and programs in their original form, for a number of reasons, including the unavailability of some methods and the fact that most of them are qualitative. To evaluate existing methods, we therefore evaluated the performance of the various physico-chemical residue properties rather than the methods themselves[24] (http://www.imtech.res.in/raghava/bcepred/). We evaluated the major residue properties (hydrophilicity,[15] flexibility,[16] accessibility,[17] etc.) that are used in most of the existing methods. As shown in Table VII, the accuracy of the physico-chemical properties varies from 52.92 to 57.53%. The maximum accuracy of 58.70% has been achieved using a combination of properties.[24] We achieved a maximum accuracy of 65.93% using ABCpred, the method described in this study, which is better than the accuracy achieved using any single property or combination of properties. We calculated P-values to test whether the accuracy of ABCpred is significantly better than that of property-based methods. We obtained a P-value of 0.012 between the accuracies of ABCpred and the Karplus[16] method (flexibility), and a P-value of 0.011 between ABCpred and the Parker[15] method (hydrophilicity), on the five test sets at the 0.05 level. These results show that ABCpred performs significantly better than methods based on physico-chemical properties.
Web Server
Based on our observations, we have developed a server, ABCpred, that allows users to predict continuous B-cell epitopes in a protein sequence. Users can submit an amino acid sequence and select any window length, as well as the threshold to be used for epitope prediction. Results are presented in an overlap display and in a tabular frame. In the tabular frame, the server ranks epitopes based on the score obtained from the trained recurrent neural network; a higher score indicates a higher probability that the peptide is a B-cell epitope. The server is accessible at www.imtech.res.in/raghava/abcpred/.
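The tabular output described above can be sketched as a sliding-window scan: every overlapping window of the chosen length is scored, windows above the threshold are kept, and the hits are sorted by score. The `score` callable is hypothetical (the server uses its trained RNN), the toy scorer below just counts lysines, and the server's overlapping filter is not reproduced here.

```python
from typing import Callable, List, Tuple

def rank_windows(sequence: str, score: Callable[[str], float], window: int = 16,
                 threshold: float = 0.5) -> List[Tuple[int, str, float]]:
    """Score every overlapping window; keep those above threshold, highest score first.
    Returns (1-based start position, peptide, score) tuples."""
    hits = []
    for i in range(len(sequence) - window + 1):
        pep = sequence[i:i + window]
        s = score(pep)
        if s > threshold:
            hits.append((i + 1, pep, s))
    hits.sort(key=lambda h: h[2], reverse=True)  # stable: equal scores keep sequence order
    return hits

toy = lambda s: s.count("K") / len(s)  # hypothetical scorer
for hit in rank_windows("AAAAKKKKAAAA", toy, window=4, threshold=0.4):
    print(hit)
```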
TABLE IV. The Performance of ABCpred Server on Blind Data Set 2
Threshold   Sensitivity (%)   Specificity (%)   Accuracy (%)
0.1         99.47             1.00              48.58
0.2         95.72             7.00              49.87
0.3         92.51             18.00             54.00
0.4         82.89             39.50             60.47
0.5         71.66             61.50             66.41
0.6         60.96             77.00             69.25
0.7         49.73             87.00             68.99
0.8         33.16             95.50             65.37
0.9         4.81              99.50             53.75
1.0         0.00              100.00            51.68
TABLE V. The Performance of Karplus Method Based on Flexibility on Blind Data Set 2
Threshold   Sensitivity (%)   Specificity (%)   Accuracy (%)
0.00        100.00            0.00              48.32
0.50        99.47             5.00              50.65
1.00        95.18             20.50             56.59
1.50        78.60             42.00             59.43
2.00        50.27             63.50             57.11
2.50        23.53             79.00             52.19
3.00        4.81              93.50             50.65
TABLE VI. The Performance of Parker Method Based on Hydrophilicity on Blind Data Set 2
Threshold   Sensitivity (%)   Specificity (%)   Accuracy (%)
0.00        100.00            0.00              48.32
0.50        99.47             6.00              51.16
1.00        95.72             20.50             56.85
1.50        81.81             41.50             60.98
2.00        58.82             64.00             61.49
2.50        26.74             85.50             57.11
3.00        4.28              98.50             52.97
TABLE VII. The Performance of Various Physico-Chemical Properties and ABCpred in B-cell Epitope Prediction
Physico-chemical property/method   Accuracy (%)   Sensitivity (%)   Specificity (%)
Hydrophilicity [15]                54.47          33.04             76.90
Flexibility [16]                   57.53          47.42             67.64
Accessibility [17]                 55.49          65.01             45.97
Turns [18]                         52.92          17.01             88.82
Antigenic scale [44]               55.59          58.99             52.19
Polarity [24]                      54.08          27.50             80.66
Surface [4]                        55.73          37.12             74.34
Best combination [24]              58.70          56.07             61.32
ABCpred (window length 16)         65.93          67.14             64.71
DISCUSSION
The prediction of B-cell epitopes in an antigen sequence is an important and complex problem. Although most antigenic determinants of proteins are discontinuous, it is possible to mimic epitopes with synthetic peptides [8]. Many algorithms have been developed to predict the location of continuous epitopes in proteins, but their success rate is low [23]. One of the major problems in developing B-cell epitope prediction methods is the variable length of epitopes. Machine learning techniques such as SVM, ANN, and PEBLS require patterns/peptides of fixed length, so they cannot be applied directly to B-cell epitope prediction. ANN techniques have been used to classify proteins of variable length from their amino acid composition (a fixed-length pattern of 20 values), but this is not possible for epitopes/peptides, which are too short for a meaningful composition to be computed. All the existing methods are based on residue properties: they first generate property plots (e.g., hydrophilicity, flexibility) and then select the regions of an antigen that show peaks, which are assigned as B-cell epitopes. However, these methods are subjective, because the boundaries of the epitopes are not known.
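The property-plot approach described above can be sketched as a sliding-window average followed by peak picking. A minimal sketch: `window_profile`, `peak_regions`, the window size, the threshold and the values in `TOY_SCALE` are all hypothetical and are not the published Parker or Karplus parameters. Changing the window or the threshold moves the region boundaries, which is the subjectivity the text points to.

```python
# Illustrative toy scale (hypothetical values, not a published hydrophilicity scale).
TOY_SCALE = {"D": 2.0, "E": 2.0, "K": 2.0, "R": 2.0, "G": 0.0, "A": -0.5, "L": -1.5}


def window_profile(sequence, scale, window=7):
    """Mean scale value of each odd-length window, assigned to its central residue.

    Returns (0-based centre position, mean value) pairs.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a central residue")
    half = window // 2
    profile = []
    for centre in range(half, len(sequence) - half):
        segment = sequence[centre - half:centre + half + 1]
        profile.append((centre, sum(scale[aa] for aa in segment) / window))
    return profile


def peak_regions(profile, threshold):
    """Merge consecutive positions whose smoothed value exceeds the threshold.

    Returns (start, end) regions, 0-based and inclusive.
    """
    regions = []
    for pos, value in profile:
        if value > threshold:
            if regions and regions[-1][1] == pos - 1:
                regions[-1] = (regions[-1][0], pos)  # extend the current peak
            else:
                regions.append((pos, pos))  # start a new peak
    return regions
```

For example, `peak_regions(window_profile("LLLLKDEKDLLLL", TOY_SCALE, window=3), threshold=0.5)` returns `[(4, 8)]`, the charged stretch KDEKD.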
In this study, a systematic attempt has been made for the first time to develop a neural network based method for predicting B-cell epitopes. A major problem for such a method is that the length of B-cell epitopes varies from 5 to 30 residues. Unlike T-cell epitopes, where the MHC molecule preferentially binds a core of 9 amino acids, the optimal length of a B-cell epitope is not known. On the other hand, machine learning methods require a window of fixed length for training and testing. An initial examination of all the B-cell epitopes obtained from the Bcipep database revealed that most epitopes have 20 or fewer residues; therefore, we used only those epitopes, which fixes the upper limit of the pattern size used in this study. The next problem is how to handle epitopes with fewer than 20 residues. For these epitopes, we generated patterns of 20 amino acids by adding neighboring residues from the original sequence on both sides of the epitope (see Methods; Table I), giving a fixed-length pattern of 20 amino acids for each epitope. We feel that this is one of the best ways to handle this problem. Another problem we faced was obtaining non-B-cell-epitope data. Ideally, one should use experimentally proven non-B-cell epitopes, but because such data are lacking in the public domain, we generated random peptides of 20 amino acids from proteins in the Swiss-Prot database. We do not claim that all these random peptides are non-B-cell epitopes; some of them may contain B-cell epitopes. We adopted this strategy for generating non-epitopes (negative examples) because it has been used in a number of previous investigations [26–28].
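The two data-preparation steps above can be sketched as follows. This is a hypothetical illustration: the text does not say how the added residues are split between the two flanks (that detail is in Table I of Methods) or how the random Swiss-Prot peptides were sampled, so the sketch assumes an even split (any odd residue goes to the C-terminal side, and the window is shifted inward at sequence ends) and uniform sampling of 20-residue windows. As the text notes, sampled negatives may still contain true epitopes.

```python
import random


def pad_epitope(protein, start, end, target_len=20):
    """Extend the epitope protein[start:end] (0-based, end exclusive) to target_len
    residues using flanking residues from the parent sequence (assumed even split)."""
    if not 0 <= start < end <= len(protein):
        raise ValueError("epitope coordinates outside the protein")
    if end - start > target_len:
        raise ValueError("epitope longer than the target length")
    if len(protein) < target_len:
        raise ValueError("parent protein shorter than the target length")
    extra = target_len - (end - start)
    new_start = start - extra // 2  # left flank; the right flank gets the remainder
    # Shift the window back inside the protein if it runs off either end.
    new_start = max(0, min(new_start, len(protein) - target_len))
    return protein[new_start:new_start + target_len]


def random_peptides(proteins, n, length=20, seed=0):
    """Sample n random fixed-length peptides (assumed negatives) uniformly from proteins."""
    rng = random.Random(seed)
    candidates = [p for p in proteins if len(p) >= length]
    if not candidates:
        raise ValueError("no protein is long enough")
    peptides = []
    for _ in range(n):
        protein = rng.choice(candidates)
        offset = rng.randrange(len(protein) - length + 1)
        peptides.append(protein[offset:offset + length])
    return peptides
```

For a 5-residue epitope at positions 10 to 14 of a 30-residue protein, `pad_epitope` adds 7 residues on the left and 8 on the right; near a terminus it shifts the window inward so the result always contains the whole epitope.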
The final data set contains patterns of length 20, with equal numbers of positive (B-cell epitope) and negative (non-epitope) examples. An ANN was used to discriminate B-cell epitopes from non-epitopes. Although FNN is a commonly used network, we obtained poor results with it: the accuracy obtained using FNN is lower than that of existing methods based on physico-chemical properties, and for window lengths of 10 and 12 the accuracy of FNN is close to random (Table II). RNN has been observed to perform better than FNN in the prediction of protein secondary structure [45]; we therefore tried RNN in our study, and its performance was found to be better than that of FNN (Table III). The performance of the RNN-based method described in this study is also significantly better than that reported for any existing B-cell epitope prediction method. Our method performed best with an epitope (window) length of 16 residues. However, 16 cannot be considered an ideal epitope length, as epitopes of 15–22 amino acids have been identified [46].
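For readers unfamiliar with the Jordan architecture named in the abstract, the sketch below shows only its defining recurrence: the network output is fed back through context units into the hidden layer at the next step. It is a hypothetical, untrained forward pass in plain Python, not the SNNS configuration used in this study; the one-residue-per-step one-hot encoding, the context decay of 0.5 and all names are assumptions. Only the 35 hidden units and the 16-residue window come from the paper.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """20-dimensional one-hot vector for a single amino acid."""
    return [1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(weights, values):
    return sum(w * v for w, v in zip(weights, values))


class JordanNet:
    """Forward pass of a Jordan network with one hidden layer (weights untrained)."""

    def __init__(self, n_in, n_hidden, n_out, decay=0.5, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_in = init(n_hidden, n_in)    # input -> hidden
        self.w_ctx = init(n_hidden, n_out)  # context -> hidden (the recurrent path)
        self.w_out = init(n_out, n_hidden)  # hidden -> output
        self.b_hidden = [0.0] * n_hidden
        self.b_out = [0.0] * n_out
        self.decay = decay
        self.n_out = n_out

    def forward(self, inputs):
        """Run over a sequence of input vectors; return the output at every step."""
        context = [0.0] * self.n_out
        outputs = []
        for x in inputs:
            hidden = [sigmoid(dot(w_i, x) + dot(w_c, context) + b)
                      for w_i, w_c, b in zip(self.w_in, self.w_ctx, self.b_hidden)]
            y = [sigmoid(dot(w_o, hidden) + b) for w_o, b in zip(self.w_out, self.b_out)]
            # Jordan context units: decayed previous context plus the current output.
            context = [self.decay * c + o for c, o in zip(context, y)]
            outputs.append(y)
        return outputs
```

With `net = JordanNet(n_in=20, n_hidden=35, n_out=1)`, the value `net.forward([one_hot(aa) for aa in peptide])[-1][0]` is a score in (0, 1) for a 16-residue peptide; training the weights is not shown.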
We also evaluated the performance of our method on a blind data set, comparing the predicted and experimentally determined epitopes in four proteins that were not used in training or testing ABCpred. As shown in Figure 3(a,b), our method was able to predict the experimentally determined epitopes with reasonable accuracy. The performance is much better than random, despite the fact that B-cell epitope prediction is a complex problem. Thus, the ABCpred server is worth using for detecting potential B-cell epitopes in an antigen.
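One simple way to put "better than random" in numbers for a single protein is to compare the fraction of experimentally determined epitope residues covered by predicted peptides with the fraction of the whole sequence that the predictions cover, which is roughly the coverage expected if the predicted peptides were placed at random. This metric is an assumption for illustration and not necessarily the comparison shown in Figure 3; the function names and intervals are hypothetical.

```python
def residue_mask(length, intervals):
    """Boolean mask of residues covered by 1-based, inclusive (start, end) intervals."""
    mask = [False] * length
    for start, end in intervals:
        for i in range(start - 1, end):
            mask[i] = True
    return mask


def overlap_vs_random(length, predicted, known):
    """Return (observed, expected) coverage of known epitope residues.

    observed: fraction of known epitope residues covered by predicted peptides.
    expected: fraction of the whole sequence covered by predictions, roughly the
    coverage expected if the predicted peptides were placed at random.
    """
    predicted_mask = residue_mask(length, predicted)
    known_mask = residue_mask(length, known)
    n_known = sum(known_mask)
    if n_known == 0:
        raise ValueError("no known epitope residues")
    observed = sum(p and k for p, k in zip(predicted_mask, known_mask)) / n_known
    expected = sum(predicted_mask) / length
    return observed, expected
```

For a 20-residue toy protein with one prediction at 1 to 5 and a known epitope at 3 to 7, `overlap_vs_random(20, [(1, 5)], [(3, 7)])` returns `(0.6, 0.25)`: the prediction covers 60% of the epitope although it spans only 25% of the sequence.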
Although the prediction accuracy obtained in this study is better than that of the existing methods, the method has its own limitations. The method described here is not an alternative to existing methods but will help to complement them. A number of assumptions have been made in the algorithm because ANN techniques cannot be applied directly to B-cell epitope prediction. The aim of this study is to provide an additional quantitative method for B-cell epitope prediction. Despite our systematic attempts, the accuracy of the method is not very high. Users are advised to predict B-cell epitopes in an antigen using all existing methods, including ours, and to identify the regions of the antigenic sequence that are predicted by most of the methods.
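The advice to keep regions predicted by most methods amounts to a per-residue majority vote. A minimal sketch, assuming each method's output is given as 1-based, inclusive intervals; the function name and the method names and intervals in the usage note are hypothetical.

```python
def consensus_regions(length, predictions, min_votes=None):
    """Regions where at least min_votes methods predict an epitope.

    predictions: dict mapping a method name to its list of (start, end) intervals,
    1-based and inclusive. min_votes defaults to a strict majority of the methods.
    Returns merged (start, end) regions, 1-based and inclusive.
    """
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    votes = [0] * length
    for intervals in predictions.values():
        covered = [False] * length  # overlapping intervals of one method count once
        for start, end in intervals:
            for i in range(start - 1, end):
                covered[i] = True
        for i, is_covered in enumerate(covered):
            if is_covered:
                votes[i] += 1
    regions = []
    for pos, count in enumerate(votes, start=1):
        if count >= min_votes:
            if regions and regions[-1][1] == pos - 1:
                regions[-1] = (regions[-1][0], pos)
            else:
                regions.append((pos, pos))
    return regions
```

For example, with `{"method_a": [(2, 5)], "method_b": [(3, 6)], "method_c": [(8, 9)]}` on a 10-residue sequence, the majority consensus is `[(3, 5)]`.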
CONCLUSIONS
RNN (JE) was found to be more successful than FNN in predicting B-cell epitopes. The length of the peptide is also important in predicting B-cell epitopes from antigenic sequences.
ACKNOWLEDGMENT
We are thankful to Miss Harpreet Kaur for assisting
in running SNNS version 4.2.
REFERENCES
1. Van Regenmortel MH. The concept and operational definition of protein epitopes. Philos Trans R Soc Lond B Biol Sci 1989;323:451–466.
2. Wiesmuller KH, Fleckenstein B, Jung G. Peptide vaccines and peptide libraries. Biol Chem 2001;382:571–579.
3. Zauner W, Lingnau K, Mattner F, Von Gabain A, Buschle M. Defined synthetic vaccines. Biol Chem 2001;382:581–595.
4. Van Regenmortel MH. Pitfalls of reductionism in the design of peptide-based vaccines. Vaccine 2001;19:2369–2374.
5. Negroni L, Bernard H, Clement G, Chatel JM, Brune P, Frobert Y, Wal JM, Grassi J. Two-site enzyme immunometric assays for determination of native and denatured β-lactoglobulin. J Immunol Methods 1998;220:25–37.
6. Selo I, Clement G, Bernard H, Chatel J, Creminon C, Peltre G, Wal J. Allergy to bovine β-lactoglobulin: specificity of human IgE to tryptic peptides. Clin Exp Allergy 1999;29:1055–1063.
7. Clement G, Boquet D, Frobert Y, Bernard H, Negroni L, Chatel JM, Adel-Patient K, Creminon C, Wal JM, Grassi J. Epitopic characterization of native bovine β-lactoglobulin. J Immunol Methods 2002;266:67–78.
8. Van Regenmortel MH. Synthetic peptides versus natural antigens in immunoassays. Ann Biol Clin (Paris) 1993;51:39–41.
9. Langeveld JP, Martinez-Torrecuadrada J, Boshuizen RS, Meloen RH, Ignacio CJ. Characterisation of a protective linear B cell epitope against feline parvoviruses. Vaccine 2001;19:2352–2360.
10. Castelletti D, Fracasso G, Righetti S, Tridente G, Schnell R, Engert A, Colombatti M. A dominant linear B-cell epitope of ricin A-chain is the target of a neutralizing antibody response in Hodgkin's lymphoma patients treated with an anti-CD25 immunotoxin. Clin Exp Immunol 2004;136:365–372.
11. Estienne V, Duthoit C, Blanchin S, Montserret R, Durand-Gorde JM, Chartier M, Baty D, Carayon P, Ruf J. Analysis of a conformational B cell epitope of human thyroid peroxidase: identification of a tyrosine residue at a strategic location for immunodominance. Int Immunol 2002;14:359–366.
12. Kulkarni-Kale U, Bhosle S, Kolaskar AS. CEP: a conformational epitope prediction server. Nucleic Acids Res 2005;33(Web Server issue):W168–W171.
13. Flower DR. Towards in silico prediction of immunogenic epitopes. Trends Immunol 2003;24:667–674.
14. Pellequer JL, Westhof E, Van Regenmortel MHV. Predicting location of continuous epitopes in proteins from their primary structures. Methods Enzymol 1991;203:176–201.
15. Parker JMD, Guo D, Hodges RS. New hydrophilicity scale derived from high-performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites. Biochemistry 1986;25:5425–5432.
16. Karplus PA, Schulz GE. Prediction of chain flexibility in proteins: a tool for the selection of peptide antigens. Naturwissenschaften 1985;72:212–213.
17. Emini EA, Hughes JV, Perlow DS, Boger J. Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide. J Virol 1985;55:836–839.
18. Pellequer JL, Westhof E, Van Regenmortel MHV. Correlation between the location of antigenic sites and the prediction of turns in proteins. Immunol Lett 1993;36:83–99.
19. Pellequer JL, Westhof E. PREDITOP: a program for antigenicity prediction. J Mol Graph 1993;11:204–210.
20. Alix AJ. Predictive estimation of protein linear epitopes by using the program PEOPLE. Vaccine 1999;18:311–314.
21. Odorico M, Pellequer JL. BEPITOPE: predicting the location of continuous epitopes and patterns in proteins. J Mol Recognit 2003;16:20–22.
22. Van Regenmortel MHV, de Marcillac GD. An assessment of prediction methods for locating continuous epitopes in proteins. Immunol Lett 1988;17:95–107.
23. Van Regenmortel MH, Pellequer JL. Predicting antigenic determinants in proteins: looking for unidimensional solutions to a three-dimensional problem? Pept Res 1994;7:224–228.
24. Saha S, Raghava GPS. BcePred: prediction of continuous B-cell epitopes in antigenic sequences using physico-chemical properties. In: Nicosia G, Cutello V, Bentley PJ, Timmis J, editors. ICARIS 2004, LNCS 3239. Berlin: Springer; 2004. pp 197–204.
25. Blythe MJ, Flower DR. Benchmarking B cell epitope prediction: underperformance of existing methods. Protein Sci 2005;14:246–248.
26. Brazma A, Jonassen I, Eidhammer I, Gilbert D. Approaches to the automatic discovery of patterns in biosequences. J Comput Biol 1998;5:279–305.
27. Singh H, Raghava GPS. ProPred1: prediction of promiscuous MHC class-I binding sites. Bioinformatics 2003;19:1009–1014.
28. Singh H, Raghava GPS. ProPred: prediction of HLA-DR binding sites. Bioinformatics 2001;17:1236–1237.
29. Lesenechal M, Becquart L, Lacoux X, Ladaviere L, Baida RC, Paranhos-Baccala G, da Silveira JF. Mapping of B-cell epitopes in a Trypanosoma cruzi immunodominant antigen expressed in natural infections. Clin Diagn Lab Immunol 2005;12:329–333.
30. Choi KS, Nah JJ, Ko YJ, Kang SY, Yoon KJ, Jo NI. Antigenic and immunogenic investigation of B-cell epitopes in the nucleocapsid protein of peste des petits ruminants virus. Clin Diagn Lab Immunol 2005;12:114–121.
31. Saha S, Bhasin M, Raghava GPS. Bcipep: a database of B-cell epitopes. BMC Genomics 2005;6:79.
32. Bairoch A, Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res 2000;28:45–48.
33. Kanaujia GV, Motzel S, Garcia MA, Andersen P, Gennaro ML. Recognition of ESAT-6 sequences by antibodies in sera of tuberculous nonhuman primates. Clin Diagn Lab Immunol 2004;11:222–226.
34. Harboe M, Malin AS, Dockrell HS, Wiker HG, Ulvund G, Holm A, Jorgensen MC, Andersen P. B-cell epitopes and quantification of the ESAT-6 protein of Mycobacterium tuberculosis. Infect Immun 1998;66:717–723.
35. Doury JC, Goasdoue JL, Tolou H, Martelloni M, Bonnefoy S, Mercereau-Puijalon O. Characterisation of the binding sites of monoclonal antibodies reacting with the Plasmodium falciparum rhoptry protein RhopH3. Mol Biochem Parasitol 1997;85:149–159.
36. Choi KS, Nah JJ, Ko YJ, Kang SY, Yoon KJ, Joo YS. Characterization of immunodominant linear B-cell epitopes on the carboxy terminus of the rinderpest virus nucleocapsid protein. Clin Diagn Lab Immunol 2004;11:658–664.
37. Garcia-Garcia JC, de la Fuente J, Kocan KM, Blouin EF, Halbur T, Onet VC, Saliki JT. Mapping of B-cell epitopes in the N-terminal repeated peptides of Anaplasma marginale major surface protein 1a and characterization of the humoral immune response of cattle immunized with recombinant and whole organism antigens. Vet Immunol Immunopathol 2004;98:137–151.
38. Ivanciuc O, Schein CH, Braun W. SDAP: database and computational tools for allergenic proteins. Nucleic Acids Res 2003;31:359–362.
39. Bjorklund AK, Soeria-Atmadja D, Zorzet A, Hammerling U, Gustafsson MG. Supervised identification of allergen-representative peptides for in silico detection of potentially allergenic proteins. Bioinformatics 2005;21:39–50.
40. Zell A, Mamier G. Stuttgart neural network simulator, version 4.2. University of Stuttgart, Stuttgart, 1997.
41. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature 1986;323:533–536.
42. Deleo JM. In: Proceedings of the second international symposium on uncertainty modelling and analysis, IEEE 1993. College Park, MD: Computer Society Press; 1993. pp 318–325.
43. Hopp TP, Woods KR. Prediction of protein antigenic determinants from amino acid sequences. Proc Natl Acad Sci USA 1981;78:3824–3828.
44. Kolaskar AS, Tongaonkar PC. A semi-empirical method for prediction of antigenic determinants on protein antigens. FEBS Lett 1990;276:172–174.
45. Baldi P, Brunak S. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 1999;15:937–946.
46. Colman PM, Laver WG, Varghese JN, Baker AT, Tulloch PA, Air GM, Webster RG. Three-dimensional structure of a complex of antibody with influenza virus neuraminidase. Nature 1987;326:358.