PISCES: recent improvements to a PDB sequence culling server

Institute for Cancer Research, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA 19111, USA.
Nucleic Acids Research (Impact Factor: 9.11). 08/2005; 33(Web Server issue):W94-8. DOI: 10.1093/nar/gki402
Source: PubMed


PISCES is a database server for producing lists of sequences from the Protein Data Bank (PDB) using a number of entry- and chain-specific criteria and mutual sequence identity. Our goal in culling the PDB is to provide the longest possible list of the highest-resolution structures that fulfill the sequence identity and structural quality cut-offs. The new PISCES server uses a combination of PSI-BLAST and structure-based alignments to determine sequence identities. Structure alignment produces more complete alignments and therefore more accurate sequence identities than PSI-BLAST. PISCES now allows a user to cull the PDB by entry in addition to the standard culling by individual chains. In by-entry culling, a list contains only entries in which no chain has a sequence identity above the cut-off to any chain in any other entry in the list. PISCES also provides fully annotated sequences including gene name and species. The server allows a user to cull an input list of entries or chains, so that other criteria, such as function, can be used. Results from a search on the RCSB's re-engineered site for the PDB can be entered into the PISCES server with a single click, combining the powerful searching abilities of the PDB with PISCES's utilities for sequence culling. The server's data are updated weekly. The server is available at
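The difference between by-chain and by-entry culling described in the abstract can be illustrated with a short Python sketch. This is not the PISCES implementation: the greedy best-resolution-first strategy, the function names `cull_chains` and `cull_entries`, the identifiers, and the pairwise identity table are all assumptions made for illustration, and the identities would in practice come from PSI-BLAST or structure-based alignments.

```python
from itertools import product


def cull_chains(chains, identity, cutoff):
    """Greedy by-chain culling (illustrative assumption, not the PISCES code).

    chains:   list of (chain_id, resolution_in_angstroms)
    identity: dict mapping frozenset({chain_a, chain_b}) -> percent identity;
              pairs missing from the table are treated as 0% identical
    cutoff:   maximum allowed percent identity between any two kept chains
    """
    kept = []
    # Visit chains from best (lowest) resolution to worst.
    for chain_id, _resolution in sorted(chains, key=lambda c: c[1]):
        if all(identity.get(frozenset((chain_id, k)), 0.0) <= cutoff for k in kept):
            kept.append(chain_id)
    return kept


def cull_entries(entries, identity, cutoff):
    """Greedy by-entry culling (illustrative assumption, not the PISCES code).

    entries: dict mapping entry_id -> (resolution, [chain_ids])
    An entry is kept only if none of its chains exceeds the cutoff against
    any chain of an entry that has already been kept.
    """
    kept = []
    for entry_id, (_resolution, chain_ids) in sorted(
        entries.items(), key=lambda e: e[1][0]
    ):
        clash = any(
            identity.get(frozenset((a, b)), 0.0) > cutoff
            for k in kept
            for a, b in product(chain_ids, entries[k][1])
        )
        if not clash:
            kept.append(entry_id)
    return kept


# Hypothetical identifiers and identities, made up for this example.
identity = {
    frozenset(("E1:A", "E2:A")): 90.0,
    frozenset(("E1:A", "E3:A")): 20.0,
    frozenset(("E2:A", "E3:A")): 25.0,
}
chains = [("E1:A", 1.5), ("E2:A", 2.0), ("E3:A", 1.8)]
entries = {
    "E1": (1.5, ["E1:A", "E1:B"]),
    "E2": (2.0, ["E2:A"]),
    "E3": (1.8, ["E3:A"]),
}

kept_chains = cull_chains(chains, identity, cutoff=30.0)
kept_entries = cull_entries(entries, identity, cutoff=30.0)
```

In this toy input, the chain `E2:A` and the entry `E2` are dropped because `E2:A` is 90% identical to the better-resolved `E1:A`. A greedy pass of this kind does not guarantee the longest possible list; it only shows how the two culling modes differ in what they compare.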



Available from: Roland Dunbrack
    • "Strong support for the validity of the explicit hydrogen representation is that this model is able to reproduce the observed side chain dihedral angle distributions of residues in protein cores, whereas the extended atom representation does not. To calculate the packing fraction of protein cores, we use the 'Dunbrack database' of high resolution protein crystal structures, which is composed of 221 proteins with resolution ≤ 1.0 Å, side chain B-factors per residue ≤ 30 Å², and R-factor ≤ 0.2 [10,11]. In prior studies, we showed that hard-sphere models of dipeptide mimetics with explicit hydrogens can recapitulate the side chain dihedral angle distributions observed in protein crystal structures [12-16]. "
    ABSTRACT: Shortly after the determination of the first protein x-ray crystal structures, researchers analyzed their cores and reported packing fractions $\phi \approx 0.75$, a value that is similar to close packing of equal-sized spheres. A limitation of these analyses was the use of `extended atom' models, rather than the more physically accurate `explicit hydrogen' model. The validity of using the explicit hydrogen model is proved by its ability to predict the side chain dihedral angle distributions observed in proteins. We employ the explicit hydrogen model to calculate the packing fraction of the cores of over $200$ high resolution protein structures. We find that these protein cores have $\phi \approx 0.55$, which is comparable to random close-packing of non-spherical particles. This result provides a deeper understanding of the physical basis of protein structure that will enable predictions of the effects of amino acid mutations and design of new functional proteins.
    No preview · Article · Oct 2015
    • "We structurally aligned the various structures of each protein chain using Kabsch's algorithm (Kabsch, 1978), on all the Cα atoms of the proteins. For Figure 1, we ran PISCES (Wang and Dunbrack, 2005) to remove redundancy within the set of 56,255 protein chains that each have more than one available conformation. By default, when PISCES recognizes two proteins with sequence identity larger than the threshold, it removes the structure of lower quality. "
    ABSTRACT: Protein function involves conformational changes, but often, for a given protein, only some of these conformations are known. The missing conformations could be predicted using the wealth of data in the PDB. Most PDB proteins have multiple structures, and proteins sharing one similar conformation often share others as well. The ConTemplate web server ( exploits these observations to suggest conformations for a query protein with at least one known conformation (or model thereof). We demonstrate ConTemplate on a ribose-binding protein that undergoes significant conformational changes upon substrate binding. Querying ConTemplate with the ligand-free (or bound) structure of the protein produces the ligand-bound (or free) conformation with a root-mean-square deviation of 1.7 Å (or 2.2 Å); the models are derived from conformations of other sugar-binding proteins, sharing approximately 30% sequence identity with the query. The calculation also suggests intermediate conformations and a pathway between the bound and free conformations.
    Full-text · Article · Oct 2015 · Structure
    • "Our cases of interest were all of those annotated in UniProtKB/Swiss-Prot under their FT lines with the key name MOD_RES or LIPID, both related to posttranslational modifications of single residues that have been systematically categorized by UniProtKB in their controlled vocabulary of posttranslational modifications. To test the coverage of these sites by sequence segments with known 3D structures, we searched for significant BLAST+ [24] hits (E-value < 0.001) in the protein structure database of non-redundant sequences (pdbaanr from 24 November 2013) provided by the Dunbrack Lab [25]. Instead of limiting ourselves to PDB structures of known entries, BLAST+ was used in order to enlarge coverage, ensuring that proteins that did not have their structure experimentally solved, but are statistically significantly related to proteins of known structures, are taken into consideration in our analysis. "
    ABSTRACT: Many protein posttranslational modifications (PTMs) are the result of an enzymatic reaction. The modifying enzyme has to recognize the substrate protein's sequence motif containing the residue(s) to be modified; thus, the enzyme's catalytic cleft engulfs these residue(s) and the respective sequence environment. This residue accessibility condition principally limits the range where enzymatic PTMs can occur in the protein sequence. Non-globular, flexible, intrinsically disordered segments or large loops/accessible long side chains should be preferred whereas residues buried in the core of structures should be void of what we call canonical, enzyme-generated PTMs. We investigate whether PTM sites annotated in UniProtKB (with MOD_RES/LIPID keys) are situated within sequence ranges that can be mapped to known 3D structures. We find that N- or C-termini harbor essentially exclusively canonical PTMs. We also find that the overwhelming majority of all other PTMs are also canonical though, later in the protein's life cycle, the PTM sites can become buried due to complex formation. Among the remaining cases, some can be explained (i) with autocatalysis, (ii) with modification before folding or after temporary unfolding, or (iii) as products of interaction with small, diffusible reactants. Others require further research into how these PTMs are mechanistically generated in vivo. This article is protected by copyright. All rights reserved.
    Full-text · Article · Jun 2015 · Proteomics