Protein segment finder: An online search engine for segment motifs in the PDB

Department of Structural Biology, Stanford University, Stanford, CA 94305, USA.
Nucleic Acids Research (Impact Factor: 9.11). 11/2008; 37(Database issue):D224-8. DOI: 10.1093/nar/gkn833
Source: PubMed


Finding related conformations in the Protein Data Bank (PDB) is essential in many areas of bioscience. To assist this task, we designed a search engine that uses a compact database to quickly identify protein segments obeying a set of primary, secondary and tertiary structure constraints. The database contains information such as amino acid sequence, secondary structure, disulfide bonds, hydrogen bonds and atoms in contact, as calculated from all protein structures in the PDB. The search engine parses the database and returns hits that match the queried parameters. The conformation search engine, which is notable for its high speed and interactive feedback, is expected to assist scientists in discovering conformation homologs and predicting protein structure. The engine is publicly available online, and it will also be used in-house in an automatic mode aimed at discovering new protein motifs.
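
As a rough illustration of the kind of query the abstract describes, the sketch below scans one chain for fixed-length segments that satisfy a primary-structure constraint, a secondary-structure constraint and a simple contact threshold at the same time. It is not the engine's actual implementation or database format: the `Residue` record, the `find_segments` and `make_chain` helpers, the one-letter H/E/C secondary-structure alphabet and the per-residue contact count are all assumptions made for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Residue:
    """Hypothetical per-residue record; not the engine's real schema."""
    aa: str        # one-letter amino acid code
    ss: str        # secondary structure: H (helix), E (strand), C (coil)
    contacts: int  # assumed summary value: number of atoms in contact


def make_chain(seq, ss, contacts):
    """Build a list of Residue records from three parallel sequences."""
    return [Residue(a, s, c) for a, s, c in zip(seq, ss, contacts)]


def find_segments(chain, length, seq_pattern, ss_pattern, min_contacts=0):
    """Return (start, end) index pairs of segments meeting every constraint.

    seq_pattern and ss_pattern are regular expressions that must match the
    whole window; end is exclusive, as in Python slicing.
    """
    seq_re = re.compile(seq_pattern)
    ss_re = re.compile(ss_pattern)
    hits = []
    for start in range(len(chain) - length + 1):
        window = chain[start:start + length]
        seq = "".join(r.aa for r in window)
        ss = "".join(r.ss for r in window)
        if (seq_re.fullmatch(seq)
                and ss_re.fullmatch(ss)
                and all(r.contacts >= min_contacts for r in window)):
            hits.append((start, start + length))
    return hits


# Toy chain: four-residue segments that start with Cys and begin with
# at least three helical residues.
chain = make_chain("ACDEFGCK", "CHHHHCCE", [3, 5, 6, 7, 5, 2, 4, 1])
print(find_segments(chain, 4, "C...", "H{3}."))
```

A real engine would precompute these per-residue annotations for every PDB entry and index them, so that a query does not need to rescan raw coordinate files.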

Available from: Michael Levitt, Aug 13, 2014
  • Source
    • "At a more local level, fragment comparison and identification has become a key step for protein structure analysis, annotation and modeling. Fragment similarities reveal functionally important residues (Tendulkar et al., 2010), similar structural motifs may indicate function preservation in remote homologs (Manikandan et al., 2008), and more generally, recurring fragments may be used as building blocks to the construction of de novo models of protein structures (Bystroff et al., 1996; Friedberg and Godzik, 2005; Samson and Levitt, 2009; Unger et al., 1989). "
    ABSTRACT: Meaningful scores to assess protein structure similarity are essential to decipher protein structure and sequence evolution. The mining of the increasing number of protein structures requires fast and accurate similarity measures with statistical significance. Whereas numerous approaches have been proposed for protein domains as a whole, the focus is progressively moving to a more local level of structure analysis, for which similarity measurement still lacks a satisfactory answer. We introduce a new score based on the Binet-Cauchy kernel. It is normalized and bounded between 1 (maximal similarity, which implies exactly the same conformation for the protein fragments) and -1 (mirror-image conformations), with unrelated conformations having a null mean score. This allows searching for both similar and mirror conformations. In addition, the score addresses two major issues of the widely used root mean square deviation (RMSD). First, it achieves length-independent statistics even for short fragments. Second, it shows better performance in discriminating medium-range RMSD values. Being simpler and faster to compute than the RMSD, it also provides the means for large-scale mining of protein structures. The computer software implementing the score is available online.
    Bioinformatics 10/2013; 30(6). DOI:10.1093/bioinformatics/btt618 · 4.98 Impact Factor
  • Source
    • "Seeking for a ‘parts list’ of proteins—with α-helices and β-sheets as prime examples of common parts—fragment libraries have been constructed based on the similarity of the polypeptide backbone (5,6). These protein fragment libraries have been widely used for a range of applications such as structural comparison of protein folds through a simplified representation with fragments (7), homology modeling at the level of fragments (8,9), investigating sequence-to-structure relationships (10), approximating tertiary structure of proteins using fragments (11–14), loop prediction (15–17) or even novel fold prediction (18,19). "
    ABSTRACT: High-resolution structures of proteins remain the most valuable source for understanding their function in the cell and provide leads for drug design. Since the availability of sufficient protein structures to tackle complex problems such as modeling backbone moves or docking remains a problem, alternative approaches using small, recurrent protein fragments have been employed. Here we present two databases that provide a vast resource for implementing such fragment-based strategies. The BriX database contains fragments from over 7000 non-homologous proteins from the Astral collection, segmented in lengths from 4 to 14 residues and clustered according to structural similarity, summing up to a content of 2 million fragments per length. To overcome the lack of loops classified in BriX, we constructed the Loop BriX database of non-regular structure elements, clustered according to end-to-end distance between the regular residues flanking the loop. Both databases are available online and can be accessed through a user-friendly web interface. For high-throughput queries, a web-based API is provided, as well as full database downloads. In addition, two exciting applications are provided as online services: (i) user-submitted structures can be covered on the fly with BriX classes, representing putative structural variation throughout the protein, and (ii) gaps or low-confidence regions in these structures can be bridged with matching fragments.
    Nucleic Acids Research 10/2010; 39(Database issue):D435-42. DOI:10.1093/nar/gkq972 · 9.11 Impact Factor
  • Source
    • "Other databases exist to allow searching conformation alone [e.g. SPASM (8), Fragment Finder (9), Protein Segment Finder (10), Conformational Angles Database (11), PDBeMotif (12)], but even in this arena, the PGD offers a unique combination of convenience and flexibility. "
    ABSTRACT: The backbone bond lengths, bond angles, and planarity of a protein are influenced by the backbone conformation (φ,ψ), but no tool exists to explore these relationships, leaving this area as a reservoir of untapped information about protein structure and function. The Protein Geometry Database (PGD) enables biologists to easily and flexibly query information about the conformation alone, the backbone geometry alone, and the relationships between them. The capabilities the PGD provides are valuable for assessing the uniqueness of observed conformational or geometric features in protein structure as well as discovering novel features and principles of protein structure. The PGD server is available online, and the data and code underlying it are freely available to use and extend.
    Nucleic Acids Research 11/2009; 38(Database issue):D320-5. DOI:10.1093/nar/gkp1013 · 9.11 Impact Factor
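
The Bioinformatics abstract listed above describes a fragment similarity score bounded between 1 (identical conformations) and -1 (mirror images) but does not give its formula. The sketch below assumes one formulation with exactly those properties: for two equal-length fragments with centered coordinate matrices X and Y, the score is det(XᵀY) / sqrt(det(XᵀX) · det(YᵀY)). The function names and the choice of one 3D point per residue (for example, Cα atoms) are assumptions for illustration, not the published software.

```python
import math


def centered(coords):
    """Translate a list of 3D points so that their centroid is the origin."""
    n = len(coords)
    mean = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - mean[k] for k in range(3)] for p in coords]


def transpose_product(x, y):
    """Return the 3x3 matrix X^T Y for two n x 3 coordinate lists."""
    return [[sum(px[i] * py[j] for px, py in zip(x, y)) for j in range(3)]
            for i in range(3)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def bc_score(frag_a, frag_b):
    """Assumed Binet-Cauchy-style score in [-1, 1] for two fragments."""
    if len(frag_a) != len(frag_b):
        raise ValueError("fragments must have the same number of points")
    x, y = centered(frag_a), centered(frag_b)
    det_xx = det3(transpose_product(x, x))
    det_yy = det3(transpose_product(y, y))
    if det_xx <= 0 or det_yy <= 0:
        # Planar or collinear point sets make the normalization undefined.
        raise ValueError("degenerate fragment")
    return det3(transpose_product(x, y)) / math.sqrt(det_xx * det_yy)


# Toy fragment, its mirror image (z negated) and a rotated copy.
frag = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
mirror = [(px, py, -pz) for px, py, pz in frag]
rotated = [(-py, px, pz) for px, py, pz in frag]  # 90 degrees about z
print(bc_score(frag, frag), bc_score(frag, mirror), bc_score(frag, rotated))
```

Under this formulation no superposition step is needed, because a rotation of either fragment leaves every determinant unchanged while a reflection flips the sign of the numerator only.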