PISCES: recent improvements to a PDB sequence culling server

Institute for Cancer Research, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA 19111, USA.
Nucleic Acids Research (Impact Factor: 9.11). 08/2005; 33(Web Server issue):W94-8. DOI: 10.1093/nar/gki402
Source: PubMed


PISCES is a database server for producing lists of sequences from the Protein Data Bank (PDB) using a number of entry- and chain-specific criteria and mutual sequence identity. Our goal in culling the PDB is to provide the longest possible list of the highest-resolution structures that satisfy the sequence identity and structural quality cut-offs. The new PISCES server uses a combination of PSI-BLAST and structure-based alignments to determine sequence identities. Structure alignment produces more complete alignments and therefore more accurate sequence identities than PSI-BLAST. PISCES now allows a user to cull the PDB by entry in addition to the standard culling by individual chains. In by-entry culling, a list contains only entries in which no chain has a sequence identity above the cut-off to any chain in any other entry in the list. PISCES also provides fully annotated sequences, including gene name and species. The server allows a user to cull an input list of entries or chains, so that other criteria, such as function, can be used to select them. Results from a search on the RCSB's re-engineered PDB site can be entered into the PISCES server with a single click, combining the powerful searching abilities of the PDB with PISCES's utilities for sequence culling. The server's data are updated weekly. The server is available at
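
The by-chain culling described in the abstract can be illustrated with a small greedy sketch. This is not the PISCES implementation: the `Chain` class, the `cull_chains` function, the toy identifiers, and the identity table are all hypothetical, and ranking by resolution alone is an assumption standing in for the server's quality criteria. A greedy pass of this kind also does not guarantee the longest possible list.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Chain:
    chain_id: str      # hypothetical label such as "1AAA_A"
    resolution: float  # in Å; a smaller value means higher resolution


def cull_chains(chains: Iterable[Chain],
                identity: Callable[[Chain, Chain], float],
                max_identity: float) -> List[Chain]:
    """Greedy sketch: visit chains from best to worst resolution and keep a
    chain only if its identity to every chain already kept is at or below
    the cut-off."""
    ranked = sorted(chains, key=lambda c: (c.resolution, c.chain_id))
    kept: List[Chain] = []
    for candidate in ranked:
        if all(identity(candidate, k) <= max_identity for k in kept):
            kept.append(candidate)
    return kept


# Toy data (made up for illustration, not taken from the PDB).
TOY_CHAINS = [Chain("2BBB_A", 1.8), Chain("3CCC_A", 2.5), Chain("1AAA_A", 1.2)]
TOY_IDENTITY = {
    frozenset({"1AAA_A", "2BBB_A"}): 95.0,
    frozenset({"1AAA_A", "3CCC_A"}): 20.0,
    frozenset({"2BBB_A", "3CCC_A"}): 30.0,
}


def toy_identity(a: Chain, b: Chain) -> float:
    """Look up a precomputed percent identity for an unordered pair."""
    return TOY_IDENTITY[frozenset({a.chain_id, b.chain_id})]


culled = cull_chains(TOY_CHAINS, toy_identity, max_identity=25.0)
```

With a 25% cut-off, the toy chain `2BBB_A` is dropped because it is 95% identical to the higher-resolution `1AAA_A`, while `3CCC_A` is kept. In practice the `identity` callable would wrap real alignment results; here it is only a table lookup.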

Available from: Roland Dunbrack,
    • "We structurally aligned the various structures of each protein chain using Kabsch's algorithm (Kabsch, 1978), on all the Cα atoms of the proteins. For Figure 1, we ran PISCES (Wang and Dunbrack, 2005) to remove redundancy within the set of 56,255 protein chains that each have more than one available conformation. By default, when PISCES recognizes two proteins with sequence identity larger than the threshold, it removes the structure of lower quality. "
    ABSTRACT: Protein function involves conformational changes, but often, for a given protein, only some of these conformations are known. The missing conformations could be predicted using the wealth of data in the PDB. Most PDB proteins have multiple structures, and proteins sharing one similar conformation often share others as well. The ConTemplate web server ( exploits these observations to suggest conformations for a query protein with at least one known conformation (or model thereof). We demonstrate ConTemplate on a ribose-binding protein that undergoes significant conformational changes upon substrate binding. Querying ConTemplate with the ligand-free (or bound) structure of the protein produces the ligand-bound (or free) conformation with a root-mean-square deviation of 1.7 Å (or 2.2 Å); the models are derived from conformations of other sugar-binding proteins, sharing approximately 30% sequence identity with the query. The calculation also suggests intermediate conformations and a pathway between the bound and free conformations.
    Structure 10/2015; DOI:10.1016/j.str.2015.08.018 · 5.62 Impact Factor
    • "Our cases of interest were all of those annotated in UniProtKB/Swiss-Prot under their FT lines with the key name MOD_RES or LIPID, both related to posttranslational modifications of single residues that have been systematically categorized by UniProtKB in their controlled vocabulary of posttranslational modifications. To test the coverage of these sites by sequence segments with known 3D structures, we searched for significant BLAST+ [24] hits (E-value < 0.001) in the protein structure database of non-redundant sequences (pdbaanr from 24 November 2013) provided by the Dunbrack Lab [25]. Instead of limiting ourselves to PDB structures of known entries, BLAST+ was used in order to enlarge coverage, ensuring that proteins that did not have their structure experimentally solved, but are statistically significantly related to proteins of known structures, are taken into consideration in our analysis. "
    ABSTRACT: Many protein posttranslational modifications (PTMs) are the result of an enzymatic reaction. The modifying enzyme has to recognize the substrate protein's sequence motif containing the residue(s) to be modified; thus, the enzyme's catalytic cleft engulfs these residue(s) and the respective sequence environment. This residue accessibility condition principally limits the range where enzymatic PTMs can occur in the protein sequence. Non-globular, flexible, intrinsically disordered segments or large loops/accessible long side chains should be preferred whereas residues buried in the core of structures should be void of what we call canonical, enzyme-generated PTMs. We investigate whether PTM sites annotated in UniProtKB (with MOD_RES/LIPID keys) are situated within sequence ranges that can be mapped to known 3D structures. We find that N- or C-termini harbor essentially exclusively canonical PTMs. We also find that the overwhelming majority of all other PTMs are also canonical though, later in the protein's life cycle, the PTM sites can become buried due to complex formation. Among the remaining cases, some can be explained (i) with autocatalysis, (ii) with modification before folding or after temporary unfolding, or (iii) as products of interaction with small, diffusible reactants. Others require further research into how these PTMs are mechanistically generated in vivo. This article is protected by copyright. All rights reserved.
    Proteomics 06/2015; 15(14). DOI:10.1002/pmic.201400633 · 3.81 Impact Factor
    • "The first was all entries (228 in total) from the Disprot v5.0 database (Sickmeier, et al., 2007) flagged as being derived from either NMR or biophysical methods. The other dataset was a redundancy-reduced (percentage sequence identity < 90%) subset of high resolution (<= 2.2 Å) X-ray structure chains from PDB (Berman, et al., 2000) derived from PISCES (Wang and Dunbrack, 2005) compiled in February 2010. Chains shorter than 25 amino acids were discarded. "
    ABSTRACT: Motivation: A sizeable fraction of eukaryotic proteins contain intrinsically disordered regions (IDRs), which act in unfolded states or by undergoing transitions between structured and unstructured conformations. Over time, sequence-based classifiers of IDRs have become fairly accurate and currently a major challenge is linking IDRs to their biological roles from the molecular to the systems level. Results: We describe DISOPRED3, which extends its predecessor with new modules to predict IDRs and protein-binding sites within them. Based on recent CASP evaluation results, DISOPRED3 can be regarded as state of the art in the identification of IDRs, and our self-assessment shows that it significantly improves over DISOPRED2 because its predictions are more specific across the whole board and more sensitive to IDRs longer than 20 amino acids. Predicted IDRs are annotated as protein binding through a novel SVM based classifier, which uses profile data and additional sequence-derived features. Based on benchmarking experiments with full cross-validation, we show that this predictor generates precise assignments of disordered protein binding regions and that it compares well with other publicly available tools.
    Bioinformatics 11/2014; 31(6). DOI:10.1093/bioinformatics/btu744 · 4.98 Impact Factor
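
The chain-level criteria quoted in the last excerpt above (X-ray structures, resolution <= 2.2 Å, chains of at least 25 amino acids) can be written as a simple filter. This is a minimal sketch under stated assumptions: the `ChainRecord` fields, the `passes_filters` function, and the method string `"X-RAY DIFFRACTION"` are hypothetical names chosen for illustration, and the 90% identity cut-off is not handled here because it requires pairwise comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainRecord:
    chain_id: str      # hypothetical label such as "1AAA_A"
    method: str        # experimental method label (assumed naming)
    resolution: float  # in Å
    sequence: str      # one-letter amino acid sequence


def passes_filters(rec: ChainRecord,
                   max_resolution: float = 2.2,
                   min_length: int = 25) -> bool:
    """Return True for X-ray chains at or below the resolution limit whose
    sequence has at least min_length residues."""
    return (rec.method == "X-RAY DIFFRACTION"
            and rec.resolution <= max_resolution
            and len(rec.sequence) >= min_length)


# Toy records (made up for illustration).
records = [
    ChainRecord("1AAA_A", "X-RAY DIFFRACTION", 1.9, "A" * 120),
    ChainRecord("2BBB_A", "X-RAY DIFFRACTION", 2.8, "A" * 200),  # too coarse
    ChainRecord("3CCC_A", "X-RAY DIFFRACTION", 1.5, "A" * 24),   # too short
    ChainRecord("4DDD_A", "SOLUTION NMR", 0.0, "A" * 90),        # not X-ray
]
selected = [r.chain_id for r in records if passes_filters(r)]
```

A filter like this would typically run before any identity-based culling, so that only chains meeting the quality criteria compete for a place in the final list.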