Guang Lan Zhang

Dana-Farber Cancer Institute, Boston, MA, USA

Publications (26) · 66.95 Total impact

  • Article: Novel myeloma-associated antigens revealed in the context of syngeneic hematopoietic stem cell transplantation.
    ABSTRACT: Targets of curative donor-derived graft-versus-myeloma (GVM) responses after allogeneic hematopoietic stem cell transplantation (HSCT) remain poorly defined, partly because immunity against minor histocompatibility Ags (mHAgs) complicates the elucidation of multiple myeloma (MM)-specific targets. We hypothesized that syngeneic HSCT would facilitate the identification of GVM-associated Ags because donor immune responses in this setting should exclusively target unique tumor Ags in the absence of donor-host genetic disparities. Therefore, in the present study, we investigated the development of tumor immunity in an HLA-A0201(+) MM patient who achieved durable remission after myeloablative syngeneic HSCT. Using high-density protein microarrays to screen post-HSCT plasma, we identified 6 Ags that elicited high-titer (1:5000-1:10 000) Abs that correlated with clinical tumor regression. Two Ags (DAPK2 and PIM1) had enriched expression in primary MM tissues. Both elicited Ab responses in other MM patients after chemotherapy or HSCT (11 and 6 of 32 patients for DAPK2 and PIM1, respectively). The index patient also developed specific CD8(+) T-cell responses to HLA-A2-restricted peptides derived from DAPK2 and PIM1. Peptide-specific T cells recognized HLA-A2(+) MM-derived cell lines and primary MM tumor cells. Coordinated T- and B-cell immunity develops against MM-associated Ags after syngeneic HSCT. DAPK2 and PIM1 are promising target Ags for MM-directed immunotherapy.
    Blood 01/2012; 119(13):3142-50. · 9.90 Impact Factor
  • Article: PB1-F2 Finder: scanning influenza sequences for PB1-F2 encoding RNA segments.
    ABSTRACT: PB1-F2 is a major virulence factor of influenza A. This protein is a product of an alternative reading frame in the PB1-encoding RNA segment 2. Its presence is dictated by the presence or absence of premature stop codons. This virulence factor is present in every influenza pandemic and major epidemic of the 20th century. Absence of PB1-F2 is associated with mild disease, such as the 2009 H1N1 ("swine flu"). The analysis of 8608 segment 2 sequences showed that only 8.5% have been annotated for the presence of PB1-F2. Our analysis indicates that 75% of segment 2 sequences are likely to encode PB1-F2. Two major populations of PB1-F2 are of lengths 90 and 57 while minor populations include lengths 52, 63, 79, 81, 87, and 101. Additional possible populations include the lengths of 59, 69, 81, 95, and 106. Previously described sequences include only lengths 57, 87, and 90. We observed substantial variation in PB1-F2 sequences where certain variants show up to 35% difference to well-defined reference sequences. Therefore this dataset indicates that there are many more variants that need to be functionally characterized. Our web-accessible tool PB1-F2 Finder enables scanning of influenza sequences for potential PB1-F2 protein products. It provides an initial screen and annotation of PB1-F2 products. It is accessible at http://cvc.dfci.harvard.edu/pb1-f2.
    BMC Bioinformatics 11/2011; 12 Suppl 13:S6. · 2.75 Impact Factor
  • Article: Machine learning competition in immunology - Prediction of HLA class I binding peptides.
    Journal of immunological methods 11/2011; 374(1-2):1-4. · 2.35 Impact Factor
  • Article: Dana-Farber repository for machine learning in immunology.
    ABSTRACT: The immune system is characterized by high combinatorial complexity that necessitates the use of specialized computational tools for analysis of immunological data. Machine learning (ML) algorithms are used in combination with classical experimentation for the selection of vaccine targets and in computational simulations that reduce the number of necessary experiments. The development of ML algorithms requires standardized data sets, consistent measurement methods, and uniform scales. To bridge the gap between the immunology community and the ML community, we designed a repository for machine learning in immunology named Dana-Farber Repository for Machine Learning in Immunology (DFRMLI). This repository provides standardized data sets of HLA-binding peptides with all binding affinities mapped onto a common scale. It also provides a list of experimentally validated naturally processed T cell epitopes derived from tumor or virus antigens. The DFRMLI data were preprocessed to ensure consistency, comparability, detailed descriptions, and statistically meaningful sample sizes for peptides that bind to various HLA molecules. The repository is accessible at http://bio.dfci.harvard.edu/DFRMLI/.
    Journal of immunological methods 07/2011; 374(1-2):18-25. · 2.35 Impact Factor
  • Article: Transcriptionally abundant major histocompatibility complex class I alleles are fundamental to nonhuman primate simian immunodeficiency virus-specific CD8+ T cell responses.
    ABSTRACT: Simian immunodeficiency virus (SIV)-infected macaques are the preferred animal model for human immunodeficiency virus (HIV) vaccines that elicit CD8(+) T cell responses. Unlike humans, whose CD8(+) T cell responses are restricted by a maximum of six HLA class I alleles, macaques express up to 20 distinct major histocompatibility complex class I (MHC-I) sequences. Interestingly, only a subset of macaque MHC-I sequences are transcriptionally abundant in peripheral blood lymphocytes. We hypothesized that highly transcribed MHC-I sequences are principally responsible for restricting SIV-specific CD8(+) T cell responses. To examine this hypothesis, we measured SIV-specific CD8(+) T cell responses in MHC-I homozygous Mauritian cynomolgus macaques. Each of eight CD8(+) T cell responses defined by full-proteome gamma interferon (IFN-γ) enzyme-linked immunospot (ELISPOT) assay were restricted by four of the five transcripts that are transcriptionally abundant (>1% of total MHC-I transcripts in peripheral blood lymphocytes). The five transcriptionally rare transcripts shared by these animals did not restrict any detectable CD8(+) T cell responses. Further, seven CD8(+) T cell responses were defined by identifying peptide binding motifs of the three most frequent MHC-I transcripts on the M3 haplotype. Combined, these results suggest that transcriptionally abundant MHC-I transcripts are principally responsible for restricting SIV-specific CD8(+) T cell responses. Thus, only a subset of the thousands of known MHC-I alleles in macaques should be prioritized for CD8(+) T cell epitope characterization.
    Journal of Virology 01/2011; 85(7):3250-61. · 5.40 Impact Factor
  • Article: Data processing and analysis for protein microarrays.
    ABSTRACT: Protein microarrays are a high-throughput technology capable of generating large quantities of proteomics data. They can be used for general research or for clinical diagnostics. Bioinformatics and statistical analysis techniques are required for interpretation and reaching biologically relevant conclusions from raw data. We describe essential algorithms for processing protein microarray data, including spot-finding on slide images, Z score, and significance analysis of microarrays (SAM) calculations, as well as the concentration dependent analysis (CDA). We also describe available tools for protein microarray analysis, and provide a template for a step-by-step approach to performing an analysis centered on the CDA method. We conclude with a discussion of fundamental and practical issues and considerations.
    Methods in molecular biology (Clifton, N.J.) 01/2011; 723:337-47.
  • Article: Database resources for proteomics-based analysis of cancer.
    Guang Lan Zhang, David S DeLuca, Vladimir Brusic
    ABSTRACT: Biological/bioinformatics databases are essential for medical and biological studies. They integrate and organize biologically related information in a structured format and provide researchers with easy access to a variety of relevant data. This review presents an overview of publicly available databases relevant to proteomics studies in cancer research. They include gene/protein expression databases, gene mutation and single nucleotide polymorphisms databases, tumor antigen databases, protein-protein interaction, and biological pathway databases. Automated information retrieval from these databases enables efficient large-scale proteomics data analysis.
    Methods in molecular biology (Clifton, N.J.) 01/2011; 723:349-64.
  • Article: FLAVIdB: A data mining system for knowledge discovery in flaviviruses with direct applications in immunology and vaccinology.
    ABSTRACT: The flavivirus genus is unusually large, comprising more than 70 species, of which more than half are known human pathogens. It includes a set of clinically relevant infectious agents such as dengue, West Nile, yellow fever, and Japanese encephalitis viruses. Although these pathogens have been studied extensively, safe and efficient vaccines are lacking for the majority of the flaviviruses. We have assembled a database that combines antigenic data of flaviviruses, specialized analysis tools, and workflows for automated complex analyses focusing on applications in immunology and vaccinology. FLAVIdB contains 12,858 entries of flavivirus antigen sequences, 184 verified T-cell epitopes, 201 verified B-cell epitopes, and 4 representative molecular structures of the dengue virus envelope protein. FLAVIdB was assembled by collection, annotation, and integration of data from GenBank, GenPept, UniProt, IEDB, and PDB. The data were subject to extensive quality control (redundancy elimination, error detection, and vocabulary consolidation). Further annotation of selected functionally relevant features was performed by organizing information extracted from the literature. The database was incorporated into a web-accessible data mining system, combining specialized data analysis tools for integrated analysis of relevant data categories (protein sequences, macromolecular structures, and immune epitopes). The data mining system includes tools for variability and conservation analysis, T-cell epitope prediction, and characterization of neutralizing components of B-cell epitopes. FLAVIdB is accessible at cvc.dfci.harvard.edu/flavi/. FLAVIdB represents a new generation of databases in which data and tools are integrated into a data mining infrastructure specifically designed to aid rational vaccine design by discovery of vaccine targets.
    Immunome Research 01/2011; 7(3):1-9.
  • Article: Conservation analysis of dengue virus T-cell epitope-based vaccine candidates using Peptide block entropy.
    ABSTRACT: Broad coverage of the pathogen population is particularly important when designing CD8+ T-cell epitope vaccines against viral pathogens. Traditional approaches are based on combinations of highly conserved T-cell epitopes. Peptide block entropy analysis is a novel approach for assembling sets of broadly covering antigens. Since T-cell epitopes are recognized as peptides rather than individual residues, this method is based on calculating the information content of blocks of peptides from a multiple sequence alignment of homologous proteins rather than using the information content of individual residues. The block entropy analysis provides broad coverage of variant antigens. We applied the block entropy analysis method to the proteomes of the four serotypes of dengue virus (DENV) and found 1,551 blocks of 9-mer peptides, which cover 99% of available sequences with five or fewer unique peptides. In contrast, the benchmark study by Khan et al. (2008) resulted in 165 conserved 9-mer peptides. Many of the conserved blocks are located consecutively in the proteins. Connecting these blocks resulted in 78 conserved regions. Of the 1,551 blocks of 9-mer peptides, 110 comprised predicted HLA binder sets. In total, 457 subunit peptides encompass the diversity of all sequenced DENV strains, of which 333 are T-cell epitope candidates.
    Frontiers in immunology. 01/2011; 2:69.
  • Article: MULTIPRED2: a computational system for large-scale identification of peptides predicted to bind to HLA supertypes and alleles.
    ABSTRACT: MULTIPRED2 is a computational system for facile prediction of peptide binding to multiple alleles belonging to human leukocyte antigen (HLA) class I and class II DR molecules. It enables prediction of peptide binding to products of individual HLA alleles, combination of alleles, or HLA supertypes. NetMHCpan and NetMHCIIpan are used as prediction engines. The 13 HLA Class I supertypes are A1, A2, A3, A24, B7, B8, B27, B44, B58, B62, C1, and C4. The 13 HLA Class II DR supertypes are DR1, DR3, DR4, DR6, DR7, DR8, DR9, DR11, DR12, DR13, DR14, DR15, and DR16. In total, MULTIPRED2 enables prediction of peptide binding to 1077 variants representing 26 HLA supertypes. MULTIPRED2 has visualization modules for mapping promiscuous T-cell epitopes as well as those regions of high target concentration - referred to as T-cell epitope hotspots. Novel graphic representations are employed to display the predicted binding peptides and immunological hotspots in an intuitive manner and also to provide a global view of results as heat maps. Another function of MULTIPRED2, which has direct relevance to vaccine design, is the calculation of population coverage. Currently it calculates population coverage in five major groups in North America. MULTIPRED2 is an important tool to complement wet-lab experimental methods for identification of T-cell epitopes. It is available at http://cvc.dfci.harvard.edu/multipred2/.
    Journal of immunological methods 12/2010; 374(1-2):53-61. · 2.35 Impact Factor
  • Article: Hotspot Hunter: a computational system for large-scale screening and selection of candidate immunological hotspots in pathogen proteomes.
    ABSTRACT: T-cell epitopes that promiscuously bind to multiple alleles of a human leukocyte antigen (HLA) supertype are prime targets for development of vaccines and immunotherapies because they are relevant to a large proportion of the human population. The presence of clusters of promiscuous T-cell epitopes, immunological hotspots, has been observed in several antigens. These clusters may be exploited to facilitate the development of epitope-based vaccines by selecting a small number of hotspots that can elicit all of the required T-cell activation functions. Given the large size of pathogen proteomes, including those of variant strains, computational tools are necessary for automated screening and selection of immunological hotspots. Hotspot Hunter is a web-based computational system for large-scale screening and selection of candidate immunological hotspots in pathogen proteomes through analysis of antigenic diversity. It allows screening and selection of hotspots specific to four common HLA supertypes, namely HLA class I A2, A3, B7 and class II DR. The system uses Artificial Neural Network and Support Vector Machine methods as predictive engines. Soft computing principles were employed to integrate the prediction results produced by both methods for robust prediction performance. Experimental validation of the predictions showed that Hotspot Hunter can successfully identify the majority of the real hotspots. Users can predict hotspots from a single protein sequence, or from a set of aligned protein sequences representing a pathogen proteome. The latter feature provides a global view of the localizations of the hotspots in the proteome set, enabling analysis of antigenic diversity and shift of hotspots across protein variants. The system also allows the integration of prediction results of the four supertypes for identification of hotspots common across multiple supertypes.
    The target selection feature of the system shortlists candidate peptide hotspots for the formulation of an epitope-based vaccine that could be effective against multiple variants of the pathogen and applicable to a large proportion of the human population. Hotspot Hunter is publicly accessible at http://antigen.i2r.a-star.edu.sg/hh/. It is a new generation computational tool aiding in epitope-based vaccine design.
    BMC Bioinformatics 02/2008; 9 Suppl 1:S19. · 2.75 Impact Factor
  • Article: Conservation and variability of dengue virus proteins: implications for vaccine design.
    ABSTRACT: Genetic variation and rapid evolution are hallmarks of RNA viruses, the result of high mutation rates in RNA replication and selection of mutants that enhance viral adaptation, including the escape from host immune responses. Variability is uneven across the genome because mutations resulting in a deleterious effect on viral fitness are restricted. RNA viruses are thus marked by protein sites permissive to multiple mutations and sites critical to viral structure-function that are evolutionarily robust and highly conserved. Identification and characterization of the historical dynamics of the conserved sites have relevance to multiple applications, including potential targets for diagnosis, and prophylactic and therapeutic purposes. We describe a large-scale identification and analysis of evolutionarily highly conserved amino acid sequences of the entire dengue virus (DENV) proteome, with a focus on sequences of 9 amino acids or more, and thus immune-relevant as potential T-cell determinants. DENV protein sequence data were collected from the NCBI Entrez protein database in 2005 (9,512 sequences) and again in 2007 (12,404 sequences). Forty-four (44) sequences (pan-DENV sequences), mainly those of nonstructural proteins and representing approximately 15% of the DENV polyprotein length, were identical in 80% or more of all recorded DENV sequences. Of these 44 sequences, 34 (approximately 77%) were present in ≥95% of sequences of each DENV type, and 27 (approximately 61%) were conserved in other Flaviviruses. The frequencies of variants of the pan-DENV sequences were low (0 to approximately 5%), as compared to variant frequencies of approximately 60 to approximately 85% in the non pan-DENV sequence regions.
    We further showed that the majority of the conserved sequences were immunologically relevant: 34 contained numerous predicted human leukocyte antigen (HLA) supertype-restricted peptide sequences, and 26 contained T-cell determinants identified by studies with HLA-transgenic mice and/or reported to be immunogenic in humans. Forty-four (44) pan-DENV sequences of at least 9 amino acids were highly conserved and identical in 80% or more of all recorded DENV sequences, and the majority were found to be immune-relevant by their correspondence to known or putative HLA-restricted T-cell determinants. The conservation of these sequences through the entire recorded DENV genetic history supports their possible value for diagnosis, prophylactic and/or therapeutic applications. The combination of bioinformatics and experimental approaches applied herein provides a framework for large-scale and systematic analysis of conserved and variable sequences of other pathogens, in particular, for rapidly mutating viruses, such as influenza A virus and HIV.
    PLoS Neglected Tropical Diseases 01/2008; 2(8):e272. · 4.69 Impact Factor
  • Article: Evaluation of MHC-II peptide binding prediction servers: applications for vaccine research.
    ABSTRACT: Initiation and regulation of immune responses in humans involves recognition of peptides presented by human leukocyte antigen class II (HLA-II) molecules. These peptides (HLA-II T-cell epitopes) are increasingly important as research targets for the development of vaccines and immunotherapies. HLA-II peptide binding studies involve multiple overlapping peptides spanning individual antigens, as well as complete viral proteomes. Antigen variation in pathogens and tumor antigens, and extensive polymorphism of HLA molecules increase the number of targets for screening studies. Experimental screening methods are expensive and time consuming and reagents are not readily available for many of the HLA class II molecules. Computational prediction methods complement experimental studies, minimize the number of validation experiments, and significantly speed up the epitope mapping process. We collected test data from four independent studies that involved 721 peptide binding assays. Full overlapping studies of four antigens identified binding affinity of 103 peptides to seven common HLA-DR molecules (DRB1*0101, 0301, 0401, 0701, 1101, 1301, and 1501). We used these data to analyze performance of 21 HLA-II binding prediction servers accessible through the WWW. Because not all servers have predictors for all tested HLA-II molecules, we assessed a total of 113 predictors. The length of test peptides ranged from 15 to 19 amino acids. We tried three prediction strategies - the best 9-mer within the longer peptide, the average of best three 9-mer predictions, and the average of all 9-mer predictions within the longer peptide. The best strategy was the identification of a single best 9-mer within the longer peptide. Overall, measured by the receiver operating characteristic method (AROC), 17 predictors showed good (AROC > 0.8), 41 showed marginal (AROC > 0.7), and 55 showed poor performance (AROC < 0.7). 
    Good performance predictors included HLA-DRB1*0101 (seven), 1101 (six), 0401 (three), and 0701 (one). The best individual predictor was NETMHCIIPAN, closely followed by PROPRED, IEDB (Consensus), and MULTIPRED (SVM). None of the individual predictors was shown to be suitable for prediction of promiscuous peptides. Current predictive capabilities allow prediction of only 50% of actual T-cell epitopes using practical thresholds. The available HLA-II servers do not match prediction capabilities of HLA-I predictors. Currently available HLA-II prediction servers offer only a limited prediction accuracy and the development of improved predictors is needed for large-scale studies, such as proteome-wide epitope mapping. The requirements for accuracy of HLA-II binding predictions are stringent because of the substantial effect of false positives.
    BMC Bioinformatics 01/2008; 9 Suppl 12:S22. · 2.75 Impact Factor
  • Article: Prediction of supertype-specific HLA class I binding peptides using support vector machines.
    ABSTRACT: Experimental approaches for identifying T-cell epitopes are time-consuming, costly and not applicable to large-scale screening. Computer modeling methods can help to minimize the number of experiments required, enable a systematic scanning for candidate major histocompatibility complex (MHC) binding peptides and thus speed up vaccine development. We developed a prediction system based on a novel data representation of peptide/MHC interaction and support vector machines (SVM) for prediction of peptides that promiscuously bind to multiple Human Leukocyte Antigen (HLA, human MHC) alleles belonging to an HLA supertype. Ten-fold cross-validation results showed that the overall performance of SVM models is improved in comparison to our previously published methods based on hidden Markov models (HMM) and artificial neural networks (ANN), also confirmed by blind testing. At specificity 0.90, sensitivity values of SVM models were 0.90 and 0.92 for HLA-A2 and -A3 datasets, respectively. Average area under the receiver operating curve (A(ROC)) of SVM models in blind testing was 0.89 and 0.92 for HLA-A2 and -A3 datasets. A(ROC) of HLA-A2 and -A3 SVM models were 0.94 and 0.95, validated using a full overlapping study of 9-mer peptides from human papillomavirus type 16 E6 and E7 proteins. In addition, a large-scale experimental dataset has been used to validate HLA-A2 and -A3 SVM models. The SVM prediction models were integrated into a web-based computational system MULTIPRED1, accessible at antigen.i2r.a-star.edu.sg/multipred1/.
    Journal of Immunological Methods 04/2007; 320(1-2):143-54. · 2.20 Impact Factor
  • Article: AllerTool: a web server for predicting allergenicity and allergic cross-reactivity in proteins.
    ABSTRACT: Assessment of potential allergenicity and patterns of cross-reactivity is necessary whenever novel proteins are introduced into the human food chain. Current bioinformatic methods in allergology focus mainly on the prediction of allergenic proteins, with no information on cross-reactivity patterns among known allergens. In this study, we present AllerTool, a web server with essential tools for the assessment of predicted as well as published cross-reactivity patterns of allergens. The analysis tools include graphical representation of allergen cross-reactivity information; a local sequence comparison tool that displays information of known cross-reactive allergens; a sequence similarity search tool for assessment of cross-reactivity in accordance with FAO/WHO Codex Alimentarius guidelines; and a method based on support vector machine (SVM). Ten-fold cross-validation results showed that the area under the receiver operating curve (A(ROC)) of SVM models is 0.90 with 86.00% sensitivity (SE) at specificity (SP) of 86.00%. Availability: AllerTool is freely available at http://research.i2r.a-star.edu.sg/AllerTool/.
    Bioinformatics 03/2007; 23(4):504-6. · 5.47 Impact Factor
  • Conference Proceeding: Performance Evaluation of MULTIPRED1 on Prediction of MHC Class I Binders
    ABSTRACT: Identification of T-cell epitopes (parts of antigenic proteins to which T-cell receptors respond) is important in the development of vaccines and immunotherapeutics. We developed MULTIPRED1 (http://antigen.i2r.a-star.edu.sg/multipred1/), a Web-based computational system for prediction of peptides (protein fragments) that bind multiple related human leukocyte antigen (HLA) molecules (the human major histocompatibility complex - MHC molecules). In this paper, the performance of MULTIPRED1 in predicting individual 9-mer binders to HLA-A2 and A3 molecules was compared with five other publicly available prediction tools, SYFPEITHI, BIMAS, SMM, RANKPEP and SVMHC. The results show that MULTIPRED1 is both sensitive and specific for prediction of binders to individual HLA alleles and demonstrates accuracy comparable to that of other prediction tools. Majority voting was applied to combine the strength of the three prediction models of MULTIPRED1 and results indicate that better prediction performance can be achieved. MULTIPRED1 is useful in the selection of key antigenic regions to minimize the number of experiments required for mapping of promiscuous T-cell epitopes.
    Biomedical and Pharmaceutical Engineering, 2006. ICBPE 2006. International Conference on; 01/2007
  • Article: PREDNOD, a prediction server for peptide binding to the H-2g7 haplotype of the non-obese diabetic mouse.
    ABSTRACT: The non-obese diabetic (NOD) mouse is a widely used animal model for the study of autoimmune diseases, in particular human type 1 diabetes mellitus (T1DM). Identification of the subset of peptides that bind MHC molecules comprising the H-2g7 haplotype of the NOD mouse and thereby representing potential NOD T-cell epitopes is important for research into the pathogenesis and immunotherapy of T1DM. The H-2g7 haplotype comprises the MHC class-I molecules Kd and Db and a single class-II molecule I-Ag7. We have developed a prediction system, PREDNOD, for accurate identification of peptides that bind the MHC molecules constituting the H-2g7 haplotype. PREDNOD is accessible at http://antigen.i2r.a-star.edu.sg/Ag7.
    Autoimmunity 01/2007; 39(8):645-50. · 2.47 Impact Factor
  • Article: Prediction of HLA-DQ3.2beta ligands: evidence of multiple registers in class II binding peptides.
    ABSTRACT: While processing of MHC class II antigens for presentation to helper T-cells is essential for normal immune response, it is also implicated in the pathogenesis of autoimmune disorders and hypersensitivity reactions. Sequence-based computational techniques for predicting HLA-DQ binding peptides have encountered limited success, with few prediction techniques developed using three-dimensional models. We describe a structure-based prediction model for modeling peptide-DQ3.2beta complexes. We have developed a rapid and accurate protocol for docking candidate peptides into the DQ3.2beta receptor and a scoring function to discriminate binders from the background. The scoring function was rigorously trained, tested and validated using experimentally verified DQ3.2beta binding and non-binding peptides obtained from biochemical and functional studies. Our model predicts DQ3.2beta binding peptides with high accuracy [area under the receiver operating characteristic (ROC) curve A(ROC) > 0.90], compared with experimental data. We investigated the binding patterns of DQ3.2beta peptides and illustrate that several registers exist within a candidate binding peptide. Further analysis reveals that peptides with multiple registers occur predominantly for high-affinity binders.
    Bioinformatics 06/2006; 22(10):1232-8. · 5.47 Impact Factor
  • Chapter: Extreme Learning Machine for Predicting HLA-Peptide Binding
    ABSTRACT: Machine learning techniques have been recognized as powerful tools for learning from data. One of the most popular learning techniques, the Back-Propagation (BP) Artificial Neural Networks, can be used as a computer model to predict peptides binding to the Human Leukocyte Antigens (HLA). The major advantage of computational screening is that it reduces the number of wet-lab experiments that need to be performed, significantly reducing the cost and time. A recently developed method, Extreme Learning Machine (ELM), which has superior properties over BP, has been investigated to accomplish such tasks. In our work, we found that the ELM is as good as, if not better than, the BP in terms of time complexity, accuracy deviations across experiments, and – most importantly – prevention of over-fitting for prediction of peptide binding to HLA.
    05/2006: pages 716-721;
  • Article: PRED(TAP): a system for prediction of peptide binding to the human transporter associated with antigen processing.
    ABSTRACT: The transporter associated with antigen processing (TAP) is a critical component of the major histocompatibility complex (MHC) class I antigen processing and presentation pathway. TAP transports antigenic peptides into the endoplasmic reticulum where it loads them into the binding groove of MHC class I molecules. Because peptides must first be transported by TAP in order to be presented on MHC class I, TAP binding preferences should impact significantly on T-cell epitope selection. PRED(TAP) is a computational system that predicts peptide binding to human TAP. It uses artificial neural networks and hidden Markov models as predictive engines. Extensive testing was performed to validate the prediction models. The results showed that PRED(TAP) was both sensitive and specific and had good predictive ability (area under the receiver operating characteristic curve Aroc>0.85). PRED(TAP) can be integrated with prediction systems for MHC class I binding peptides for improved performance of in silico prediction of T-cell epitopes. PRED(TAP) is available for public use at [1].
    Immunome Research 01/2006; 2:3.

Institutions

  • 2008–2011
    • Dana-Farber Cancer Institute
      • Department of Medical Oncology
      • Cancer Vaccine Center
      Boston, MA, USA
  • 2005–2006
    • Institute for Infocomm Research
      Singapore, Singapore
    • Nanyang Technological University
      • School of Computer Engineering
      Singapore, Singapore