High-throughput prediction of protein antigenicity using protein microarray data.

Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, CA 92697, USA.
Bioinformatics (Impact Factor: 4.62). 10/2010; 26(23):2936-43. DOI: 10.1093/bioinformatics/btq551
Source: PubMed

ABSTRACT Discovery of novel protective antigens is fundamental to the development of vaccines for existing and emerging pathogens. Most computational methods for predicting protein antigenicity rely directly on homology with previously characterized protective antigens; however, homology-based methods will fail to discover truly novel protective antigens. Thus, there is a significant need for homology-free methods capable of screening entire proteomes for the antigens most likely to generate a protective humoral immune response.
Here we begin by curating two types of positive data: (i) antigens that elicit a strong antibody response in protected individuals but not in unprotected individuals, using human immunoglobulin reactivity data obtained from protein microarray analyses; and (ii) known protective antigens from the literature. The resulting datasets are used to train a sequence-based prediction model, ANTIGENpro, to predict the likelihood that a protein is a protective antigen. ANTIGENpro correctly classifies 82% of the known protective antigens when trained using only the protein microarray datasets. The accuracy on the combined dataset is estimated at 76% by cross-validation experiments. Finally, ANTIGENpro performs well when evaluated on an external pathogen proteome for which protein microarray data were obtained after the initial development of ANTIGENpro.
ANTIGENpro is integrated in the SCRATCH suite of predictors available at
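The cross-validation accuracy estimate reported above can be illustrated with a minimal sketch. This is not ANTIGENpro's actual pipeline: the fold count, the toy two-dimensional feature vectors, and the nearest-centroid classifier are all assumptions, chosen only to keep the example self-contained. The function names (`k_fold_indices`, `train_nearest_centroid`, `cross_val_accuracy`) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Classifier = Callable[[Vector], int]


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k folds by striding (index i goes to fold i % k)."""
    return [list(range(i, n, k)) for i in range(k)]


def train_nearest_centroid(X: List[Vector], y: List[int]) -> Classifier:
    """Hypothetical stand-in classifier: predict the label of the closest class mean."""
    centroids: Dict[int, List[float]] = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> int:
        # Squared Euclidean distance to each class centroid.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def cross_val_accuracy(X: List[Vector], y: List[int], k: int = 5) -> float:
    """Train on k-1 folds, test on the held-out fold, and pool the correct predictions."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        predict = train_nearest_centroid(train_X, train_y)
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(X)


# Toy data (assumed): label 1 = "protective antigen", label 0 = "non-antigen".
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.0],
     [1.0, 0.9], [0.9, 1.0], [1.1, 1.0], [1.0, 1.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
accuracy = cross_val_accuracy(X, y, k=4)
```

Every protein is scored by a model that never saw it during training, which is why a cross-validated figure is a more honest estimate than accuracy measured on the training set itself.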

  • ABSTRACT: Cancer immunotherapy holds a prominent position in cancer prevention and treatment. In this kind of therapy, the immune system is activated to eliminate cancerous cells. Multi-epitope peptide cancer vaccines are emerging as the next generation of cancer immunotherapy. In the present study, we implemented various strategies to design an efficient multi-epitope vaccine. CD8+ cytolytic T lymphocyte (CTL) epitopes, which have a pivotal role in cellular immune responses, helper epitopes, and an adjuvant are three crucial components of a peptide vaccine. CTL epitopes were determined from two highly immunogenic proteins, Wilms tumor-1 (WT1) and human papillomavirus (HPV) E7, by various servers that apply different algorithms. CTL epitopes were linked together by AAY and HEYGAEALERAG motifs to enhance epitope presentation. The pan HLA DR-binding epitope (PADRE) peptide sequence and helper epitopes, which were defined from tetanus toxin fragment C (TTFrC) by various servers, were used to induce CD4+ helper T lymphocyte (HTL) responses. Additionally, helper epitopes were conjugated together via GPGPG motifs that stimulate HTL immunity. Heparin-binding hemagglutinin (HBHA), a novel TLR4 agonist, was employed as an adjuvant to polarize CD4+ T cells toward T-helper 1 to induce strong CTL responses. Moreover, the EAAAK linker was introduced at the N and C termini of HBHA for efficient separation. A 3D model of the protein was generated, and predicted B cell epitopes were determined from the surface of the built structure. Our protein contains several linear and conformational B cell epitopes, which suggests the antibody-triggering property of this novel vaccine. Hence, our final protein can be used for prophylactic or therapeutic purposes, because it can potentially stimulate both cellular and humoral immune responses.
    Journal of Theoretical Biology 02/2014; · 2.35 Impact Factor
  • ABSTRACT: In our earlier study, an immunoblot analysis using sera from febrile patients revealed that a 50-kDa band from an outer membrane protein fraction of Salmonella enterica serovar Typhi was specifically recognized only by typhoid sera and not by sera from other febrile illnesses. Here, we investigated the identities of the proteins contained in the immunogenic 50-kDa band to pinpoint the antigens responsible for its immunogenicity. We first used LC-MS/MS for protein identification, then used the online tool ANTIGENpro for antigenicity prediction, and produced recombinant proteins of the lead antigens for validation in an enzyme-linked immunosorbent assay (ELISA). We found that the proteins TolC, GlpK and SucB were specific to typhoid sera but reacted with antibodies differently under native and denatured conditions. This difference suggests the presence of linear and conformational epitopes on these proteins.
    Applied Biochemistry and Biotechnology 08/2014; · 1.69 Impact Factor
  • ABSTRACT: Identifying protective antigens from bacterial pathogens is important for developing vaccines. Most computational methods for predicting protein antigenicity rely on sequence similarity between a query protein sequence and at least one known antigen. Such methods limit our ability to predict novel antigens (i.e., antigens that are not homologous to any known antigen). Therefore, there is an urgent need for alignment-free computational methods for reliable prediction of protective antigens. We evaluated the discriminative power of four different amino acid composition-derived feature representations using three classification methods (Logistic Regression, Support Vector Machine, and Random Forest) on a cross-validation data set of 193 protective bacterial antigens and 193 non-antigenic bacterial proteins. Our results show that, with all four data representations, Random Forest classifiers consistently outperform the other classifiers. We compared HRF50, one of the best-performing Random Forest classifiers, with VaxiJen and SignalP on independent test sets derived from the Chlamydia trachomatis and Bartonella proteomes. Our results show that our HRF50 predictor outperforms VaxiJen and is competitive with SignalP and ANTIGENpro in predicting protective antigens. We further showed that when we combine SignalP with HRF50, the resulting method, which we call BacGen, yields performance that is comparable to or better than that of ANTIGENpro in predicting antigens in bacterial sequences. We conclude that amino acid sequence composition-derived features can be effectively used to design alignment-free methods for predicting protein antigenicity using Random Forest classifiers. BacGen is available as an online server at:
    Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine; 01/2012
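The last abstract above refers to feature representations derived from amino acid composition but does not say which four were used. As a hedged illustration of the general idea only, the sketch below computes the plain 20-dimensional composition vector (the fraction of each standard residue in a sequence). The function name `amino_acid_composition` and the choice to ignore non-standard residue codes are assumptions, not details taken from the paper.

```python
from typing import List

# The 20 standard amino acids in one-letter code, in a fixed order so that
# every protein maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the fraction of each standard amino acid in the sequence.

    Non-standard symbols (e.g. 'X' or gap characters) are ignored; this is an
    assumption made for the sketch. An empty or fully non-standard sequence
    yields a vector of zeros.
    """
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / total for aa in AMINO_ACIDS]


# Toy example: two alanines, one cysteine, and one ignored 'X'.
features = amino_acid_composition("AACX")
```

Because the vector has a fixed length regardless of protein length and requires no alignment against known antigens, it can be passed directly to a standard classifier such as a Random Forest.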
