Remediation of the Protein Data Bank archive

MSD-EBI, EMBL Outstation-Hinxton, Cambridge CB10 1SD, UK.
Nucleic Acids Research (Impact Factor: 9.11). 02/2008; 36(Database issue):D426-33. DOI: 10.1093/nar/gkm937
Source: PubMed


The Worldwide Protein Data Bank (wwPDB) is the international collaboration that manages the deposition, processing and distribution of the PDB archive. The online PDB archive is the repository for the coordinates and related information for more than 47 000 structures, including proteins, nucleic acids and large macromolecular complexes that have been determined using X-ray crystallography, NMR and electron microscopy techniques. The wwPDB members, namely RCSB PDB (USA), MSD-EBI (Europe), PDBj (Japan) and BMRB (USA), have remediated this archive to address inconsistencies that have been introduced over the years. The scope and methods used in this project are presented.



  • Source
    • "Atom names for standard amino acids and nucleotides follow IUPAC recommendations (IUPAC Commission on Macromolecular Nomenclature, 1979) with the exception of the well-established convention for C-terminal atoms OXT and HXT. In early PDB entries, an alternative atom nomenclature was used and this prior atom nomenclature was also included in definitions where the nomenclature has changed (Henrick et al., 2008). For standard amino acids, additional molecular definitions have been created to specify common protonation variants."
    ABSTRACT: The Chemical Component Dictionary (CCD) is a chemical reference data resource that describes all residue and small molecule components found in Protein Data Bank (PDB) entries. The CCD contains detailed chemical descriptions for standard and modified amino acids/nucleotides, small molecule ligands and solvent molecules. Each chemical definition includes descriptions of chemical properties such as stereochemical assignments, chemical descriptors, systematic chemical names and idealized coordinates. The content, preparation, validation and distribution of this CCD chemical reference dataset are described. Availability and implementation: The CCD is updated regularly in conjunction with the scheduled weekly release of new PDB structure data. The CCD and amino acid variant reference datasets are hosted in the public PDB ftp repository and its mirror sites. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved.
    Bioinformatics 12/2014; 31(8). DOI:10.1093/bioinformatics/btu789 · 4.98 Impact Factor
  • Source
    • "If mutation information is available for a protein sequence, links to the details are provided in the cross-references section. Additionally, cross-references to various other databases, including PDB (102), UniProtKB (103), the UCSC Genome Browser (104), EBI’s InterPro (105), PharmGKB (106) and SFLD (36) are given. Other ModBase pages provide overviews of more than one sequence or structure. "
    ABSTRACT: ModBase is a database of annotated comparative protein structure models. The models are calculated by ModPipe, an automated modeling pipeline that relies primarily on Modeller for fold assignment, sequence-structure alignment, model building and model assessment. ModBase currently contains almost 30 million reliable models for domains in 4.7 million unique protein sequences. ModBase allows users to compute or update comparative models on demand, through an interface to the ModWeb modeling server. ModBase models are also available through the Protein Model Portal. Recently developed associated resources include the AllosMod server for modeling ligand-induced protein dynamics, the AllosMod-FoXS server for predicting a structural ensemble that fits an SAXS profile, the FoXSDock server for protein-protein docking filtered by an SAXS profile, the SAXS Merge server for automatic merging of SAXS profiles and the Pose & Rank server for scoring protein-ligand complexes. In this update, we also highlight two applications of ModBase: a PSI:Biology initiative to maximize the structural coverage of the human alpha-helical transmembrane proteome and a determination of structural determinants of human immunodeficiency virus-1 protease specificity.
    Nucleic Acids Research 11/2013; 42(Database issue). DOI:10.1093/nar/gkt1144 · 9.11 Impact Factor
  • Source
    • "The original SIFTS procedure focused on standardization of taxonomy information in the PDB based on the NCBI taxonomy database, and on adding cross-references to UniProtKB for all the protein sequences in the PDB that are present in the UniProt database. The improved cross-references were fed back into the PDB archival files and these consistent data were then made available as part of the first PDB archive remediation (15). The wwPDB annotation procedures were also modified and now use the SIFTS methodology and rules to assign taxonomy and UniProtKB cross-references for newly deposited PDB entries. "
    ABSTRACT: The Structure Integration with Function, Taxonomy and Sequences resource (SIFTS) is a close collaboration between the Protein Data Bank in Europe (PDBe) and UniProt. The two teams have developed a semi-automated process for maintaining up-to-date cross-reference information to UniProt entries, for all protein chains in the PDB entries present in the UniProt database. This process is carried out for every weekly PDB release and the information is stored in the SIFTS database. The SIFTS process includes cross-references to other biological resources such as Pfam, SCOP, CATH, GO, InterPro and the NCBI taxonomy database. The information is exported in XML format, one file for each PDB entry, and is made available by FTP. Many bioinformatics resources use SIFTS data to obtain cross-references between the PDB and other biological databases so as to provide their users with up-to-date information.
    Nucleic Acids Research 11/2012; 41(Database issue). DOI:10.1093/nar/gks1258 · 9.11 Impact Factor