High-throughput prediction of protein antigenicity using protein microarray data

Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, CA 92697, USA.
Bioinformatics (Impact Factor: 4.98). 10/2010; 26(23):2936-43. DOI: 10.1093/bioinformatics/btq551
Source: PubMed


Discovery of novel protective antigens is fundamental to the development of vaccines for existing and emerging pathogens. Most computational methods for predicting protein antigenicity rely directly on homology with previously characterized protective antigens; however, homology-based methods will fail to discover truly novel protective antigens. Thus, there is a significant need for homology-free methods capable of screening entire proteomes for the antigens most likely to generate a protective humoral immune response.
Here we begin by curating two types of positive data: (i) antigens that elicit a strong antibody response in protected individuals but not in unprotected individuals, using human immunoglobulin reactivity data obtained from protein microarray analyses; and (ii) known protective antigens from the literature. The resulting datasets are used to train a sequence-based prediction model, ANTIGENpro, to predict the likelihood that a protein is a protective antigen. ANTIGENpro correctly classifies 82% of the known protective antigens when trained using only the protein microarray datasets. The accuracy on the combined dataset is estimated at 76% by cross-validation experiments. Finally, ANTIGENpro performs well when evaluated on an external pathogen proteome for which protein microarray data were obtained after the initial development of ANTIGENpro.
ANTIGENpro is integrated in the SCRATCH suite of predictors available at
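The cross-validation estimate quoted above can be illustrated with a minimal sketch. This is not ANTIGENpro's actual pipeline: the nearest-centroid classifier is a toy stand-in, and the names `k_fold_indices`, `cross_val_accuracy`, and `train_nearest_centroid` are hypothetical helpers introduced only for this example.

```python
import random
from typing import Callable, List, Sequence

Features = Sequence[float]
Predictor = Callable[[Features], int]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_nearest_centroid(X: List[Features], y: List[int]) -> Predictor:
    """Toy stand-in classifier (an assumption, not the published model):
    predict the label whose class mean is closest in squared distance."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Features) -> int:
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict


def cross_val_accuracy(
    X: List[Features],
    y: List[int],
    train: Callable[[List[Features], List[int]], Predictor],
    k: int = 10,
    seed: int = 0,
) -> float:
    """Train on k-1 folds, test on the held-out fold, and pool the results."""
    correct = 0
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(X)) if i not in held]
        predict = train(train_X, train_y)
        correct += sum(predict(X[i]) == y[i] for i in held_out)
    return correct / len(X)
```

Every protein is scored exactly once by a model that never saw it during training, which is what makes the pooled accuracy an estimate of performance on unseen sequences.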



Available from: Pierre Baldi
    • "A high-throughput in silico prediction of protein antigenicity was used to investigate antigenic properties of seroreactive proteins using the SCRATCH prediction suite [41] available online at Evaluated parameters included secondary structure using SSpro and SSpro8 [42], determination of putative epitopic regions in antigens by evaluating continuous B-cell epitopes using COBEpro [43], and protein antigenicity based on multiple representations of the primary sequence using ANTIGENpro as well as SVMTriP software [44] (available at http://sysbio. "
    ABSTRACT: Sporotrichosis is a polymorphic disease that affects both humans and animals worldwide. The fungus gains entry into a warm-blooded host through minor trauma to the skin, typically by contaminated vegetation or by scratches and bites from a diseased cat. Cellular and humoral responses triggered upon pathogen introduction play important roles in the development and severity of the disease. We investigated molecules expressed during the host-parasite interplay that elicit the humoral response in human sporotrichosis. For antigenic profiling, Sporothrix yeast cell extracts were separated by two-dimensional (2D) gel electrophoresis and probed with pooled sera from individuals with fixed cutaneous and lymphocutaneous sporotrichosis. Thirty-five IgG-seroreactive spots were identified as eight specific proteins by MALDI-ToF/MS. Remarkable cross-reactivity among Sporothrix brasiliensis, Sporothrix schenckii, and Sporothrix globosa was noted and antibodies strongly reacted with the 70-kDa protein (gp70), irrespective of clinical manifestation. Gp70 was successfully identified in multiple spots as 3-carboxymuconate cyclase. In addition, 2D-DIGE characterization suggested that the major antigen of sporotrichosis undergoes post-translational modifications involving glycosylation and amino acid substitution, resulting in at least six isoforms and glycoforms that were present in the pathogenic species but absent in the ancestral non-virulent Sporothrix mexicana. Although a primary environmental function related to the benzoate degradation pathway of aromatic polymers has been attributed to orthologs of this molecule, our findings support the hypothesis that gp70 is important for pathogenesis and invasion in human sporotrichosis. We propose a diverse panel of new putative candidate molecules for diagnostic tests and vaccine development.
    Biological significance: Outbreaks due to Sporothrix spp. have emerged over time, affecting thousands of patients worldwide. A sophisticated host-pathogen interplay drives the manifestation and severity of infection, involving immune responses elicited upon traumatic exposure of the skin barrier to the pathogen followed by immune evasion. Using an immunoproteomic approach we characterized proteins of potential significance in pathogenesis and invasion that trigger the humoral response during human sporotrichosis. We found gp70 to be a cross-immunogenic protein shared among pathogenic Sporothrix spp. but absent in the ancestral environmental S. mexicana, supporting the hypothesis that gp70 plays key roles in pathogenicity. For the first time, we demonstrate with 2D-DIGE that post-translational modifications putatively involve glycosylation and amino acid substitution, resulting in at least six isoforms and glycoforms, all of them IgG-reactive. These findings of a convergent humoral response highlight gp70 as an important target for serological diagnosis and for vaccine development among phylogenetically related agents of sporotrichosis.
    Full-text · Article · Nov 2014 · Journal of Proteomics
    • "However, our analyses may exhibit bias as consequence of the proteins selected to go on the array. Yet, we anticipate similar results from a full proteome chip and have previous catalogues of serodominant antigens for numerous bacteria and have consistently found these features predict antigenicity [19, 21, 22, 32]. "
    ABSTRACT: Current serological diagnostic assays for typhoid fever are based on detecting antibodies against Salmonella LPS or flagellum, resulting in a high false-positive rate. Here we used a protein microarray containing 2,724 Salmonella enterica serovar Typhi antigens (>63% of proteome) and identified antibodies against 16 IgG antigens and 77 IgM antigens that were differentially reactive among acute typhoid patients and healthy controls. The IgG target antigens produced a sensitivity of 97% and specificity of 80%, whereas the IgM target antigens produced 97% and 91% sensitivity and specificity, respectively. Our analyses indicated certain features such as membrane association, secretion, and protein expression were significant enriching features of the reactive antigens. About 72% of the serodiagnostic antigens were within the top 25% of the ranked antigen list using a Naïve Bayes classifier. These data provide an important resource for improved diagnostics, therapeutics and vaccine development against an important human pathogen.
    Full-text · Article · Jan 2013 · Scientific Reports
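The figures reported in this abstract (sensitivity, specificity, and the share of true antigens falling in the top quarter of a ranked list) follow from standard definitions. The sketch below shows how such numbers are computed; the function names and any inputs passed to them are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence, Set, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = case, 0 = control).

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def fraction_of_positives_in_top(
    scores: Dict[str, float], positives: Set[str], top_fraction: float = 0.25
) -> float:
    """Share of known positives that rank within the top `top_fraction` by score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    cutoff = max(1, int(len(ranked) * top_fraction))
    top = set(ranked[:cutoff])
    return len(top & positives) / len(positives)
```

A random ranking would place about 25% of the positives in the top 25% of the list, so a value such as 72% indicates substantial enrichment.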
    • "Out of these 1463, 73 proteins are antigenic and the remaining 1390 proteins are considered as non-antigens. More information about this data set can be found in [9] [20] "
    ABSTRACT: Identifying protective antigens from bacterial pathogens is important for developing vaccines. Most computational methods for predicting protein antigenicity rely on sequence similarity between a query protein sequence and at least one known antigen. Such methods limit our ability to predict novel antigens (i.e., antigens that are not homologous to any known antigen). Therefore, there is an urgent need for alignment-free computational methods for reliable prediction of protective antigens. We evaluated the discriminative power of four different amino acid composition derived feature representations using three classification methods (Logistic Regression, Support Vector Machine, and Random Forest) on a cross-validation data set of 193 protective bacterial antigens and 193 non-antigenic bacterial proteins. Our results show that, with all four data representations, Random Forest classifiers consistently outperform other classifiers. We compared HRF50, one of the best-performing Random Forest classifiers, with VaxiJen and SignalP on independent test sets derived from the Chlamydia trachomatis and Bartonella proteomes. Our results show that our HRF50 predictor outperforms VaxiJen and is competitive with SignalP and ANTIGENpro in predicting protective antigens. We further showed that when we combine SignalP with HRF50, the resulting method, which we call BacGen, yields performance that is comparable to or better than that of ANTIGENpro in predicting antigens in bacterial sequences. We conclude that amino acid sequence composition derived features can be effectively used to design alignment-free methods for predicting protein antigenicity using Random Forest classifiers. BacGen is available as an online server at:
    Full-text · Conference Paper · Jan 2012
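As an illustration of what an alignment-free, composition-derived feature can look like, the sketch below computes the plain amino acid composition of a sequence. This is only the simplest such representation and is an assumption made for illustration; the abstract does not specify the four representations evaluated, and `amino_acid_composition` is a hypothetical name.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in a fixed order, so every protein maps to
# a vector of the same length regardless of its sequence length.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the relative frequency of each standard amino acid.

    Non-standard symbols (e.g. X, B, Z) are ignored when normalising.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

Because the vector depends only on residue frequencies, two proteins can receive similar features without sharing any alignable region, which is what lets a classifier trained on such features score proteins with no known homologous antigen.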