Available from: Gajendra Pal Singh Raghava
    • "However, NetMHCpan is distinguished from NetMHC in that it is a ‘pan’ method; that is, it leverages binding data across different MHC molecules to make predictions, even for those MHCs with no previous experimental characterizations. A number of papers reporting their predictive performances have been published [9, 10, 23, 24]. "
    ABSTRACT: Background: It is important to accurately determine the performance of peptide:MHC binding predictions, as this enables users to compare and choose between different prediction methods and provides estimates of the expected error rate. Two common approaches to determine prediction performance are cross-validation, in which all available data are iteratively split into training and testing data, and the use of blind sets generated separately from the data used to construct the predictive method. In the present study, we have compared cross-validated prediction performances generated on our last benchmark dataset from 2009 with prediction performances generated on data subsequently added to the Immune Epitope Database (IEDB) which served as a blind set. Results: We found that cross-validated performances systematically overestimated performance on the blind set. This was found not to be due to the presence of similar peptides in the cross-validation dataset. Rather, we found that small size and low sequence/affinity diversity of either training or blind datasets were associated with large differences in cross-validated vs. blind prediction performances. We use these findings to derive quantitative rules of how large and diverse datasets need to be to provide generalizable performance estimates. Conclusion: It has long been known that cross-validated prediction performance estimates often overestimate performance on independently generated blind set data. We here identify and quantify the specific factors contributing to this effect for MHC-I binding predictions. An increasing number of peptides for which MHC binding affinities are measured experimentally have been selected based on binding predictions and thus are less diverse than historic datasets sampling the entire sequence and affinity space, making them more difficult benchmark data sets. This has to be taken into account when comparing performance metrics between different benchmarks, and when deriving error estimates for predictions based on benchmark performance. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2105-15-241) contains supplementary material, which is available to authorized users.
    Full-text · Article · Jul 2014 · BMC Bioinformatics
    • "The front page of BlockLogo with an example of input for the visualization of region 220–229 of MSA of influenza A HA. Numbering is relative to the MSA alignment position and the input in this example is in the FASTA format. 2008a, 2008b; Zhang et al. 2011). When the HLA binding prediction option is selected, netMHC is used to predict HLA class I binders if the selected block is of length 8–11, and for HLA class II binders if the selected block is of length 13–25. "
    ABSTRACT: BlockLogo is a web-server application for visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, selection of motif positions, type of sequence, and output format definition. The output has BlockLogo along with the sequence logo, and a table of motif frequencies. We deployed BlockLogo as an online application and have demonstrated its utility through examples that show visualization of T-cell epitopes and B-cell epitopes (both continuous and discontinuous). Our additional example shows a visualization and analysis of structural motifs that determine specificity of peptide binding to HLA-DR molecules. The BlockLogo server also employs selected experimentally validated prediction algorithms to enable on-the-fly prediction of MHC binding affinity to 15 common HLA class I and class II alleles as well as visual analysis of discontinuous epitopes from multiple sequence alignments. It enables the visualization and analysis of structural and functional motifs that are usually described as regular expressions. It provides a compact view of discontinuous motifs composed of distant positions within biological sequences. BlockLogo is available at: http://research4.dfci.harvard.edu/cvc/blocklogo/ and http://met-hilab.bu.edu/blocklogo/
    Full-text · Article · Aug 2013 · Journal of Immunological Methods
    • "against HLA A and B molecules) (4). Of special note, an MHC class I prediction competition was held recently for the first time (9). Tested on blind peptide:MHC binding datasets generated by an independent group, the consensus method hosted at the IEDB-AR has consistently ranked high (i.e. "
    ABSTRACT: The immune epitope database analysis resource (IEDB-AR: http://tools.iedb.org) is a collection of tools for prediction and analysis of molecular targets of T- and B-cell immune responses (i.e. epitopes). Since its last publication in the NAR webserver issue in 2008, a new generation of peptide:MHC binding and T-cell epitope predictive tools has been added. As validated by different labs and in the first international competition for predicting peptide:MHC-I binding, their predictive performances have improved considerably. In addition, a new B-cell epitope prediction tool was added, and the homology mapping tool was updated to enable mapping of discontinuous epitopes onto 3D structures. Furthermore, to serve a wider range of users, the number of ways in which IEDB-AR can be accessed has been expanded. Specifically, the predictive tools can be programmatically accessed using a web interface and can also be downloaded as software packages.
    Full-text · Article · May 2012 · Nucleic Acids Research