Enhanced genome annotation using structural profiles in the program 3D-PSSM.

Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, 44 Lincoln's Inn Fields, London, WC2A 3PX, England.
Journal of Molecular Biology (Impact Factor: 3.96). 07/2000; 299(2):499-520. DOI: 10.1006/jmbi.2000.3741
Source: PubMed

ABSTRACT A method (three-dimensional position-specific scoring matrix, 3D-PSSM) to recognise remote protein sequence homologues is described. The method combines the power of multiple sequence profiles with knowledge of protein structure to provide enhanced recognition and thus functional assignment of newly sequenced genomes. The method uses structural alignments of homologous proteins of similar three-dimensional structure in the structural classification of proteins (SCOP) database to obtain a structural equivalence of residues. These equivalences are used to extend multiply aligned sequences obtained by standard sequence searches. The resulting large superfamily-based multiple alignment is converted into a PSSM. Combined with secondary structure matching and solvation potentials, 3D-PSSM can recognise structural and functional relationships beyond state-of-the-art sequence methods. In a cross-validated benchmark on 136 homologous relationships unambiguously undetectable by position-specific iterated basic local alignment search tool (PSI-Blast), 3D-PSSM can confidently assign 18 %. The method was applied to the remaining unassigned regions of the Mycoplasma genitalium genome and an additional 13 regions were assigned with 95 % confidence. 3D-PSSM is available to the community as a web server:
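
The abstract's step of converting a multiple alignment into a PSSM can be illustrated with a minimal sketch. This is not the 3D-PSSM implementation: it leaves out the structure-based extension of the alignment, secondary structure matching and solvation potentials. The function names (`build_pssm`, `score_ungapped`), the uniform background frequencies and the add-one pseudocount are assumptions chosen for illustration only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0, background=None):
    """Return one {residue: log-odds score} dict per alignment column."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    if background is None:
        # Assumption: uniform background; real profiles use observed frequencies.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

    pssm = []
    for col in range(length):
        # Count residues in this column; gaps and unknown symbols are skipped.
        counts = Counter(row[col] for row in alignment if row[col] in background)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            # Log-odds of the observed frequency against the background.
            column[aa] = math.log2(freq / background[aa])
        pssm.append(column)
    return pssm


def score_ungapped(pssm, query):
    """Sum the per-column scores of a query of the same length as the profile."""
    if len(query) != len(pssm):
        raise ValueError("query length must match the profile length")
    return sum(column[aa] for column, aa in zip(pssm, query))


if __name__ == "__main__":
    toy_alignment = ["ACDE", "ACDF", "ACEE", "-CDE"]  # hypothetical toy data
    profile = build_pssm(toy_alignment)
    print(score_ungapped(profile, "ACDE"))
```

Residues that dominate a column score above zero and rare ones below zero, so a query resembling the family scores higher than an unrelated one. A larger, superfamily-wide alignment of the kind the abstract describes would give each column more evidence to draw on.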

  • ABSTRACT: One of the central problems in the post-genomic era is understanding the function of the myriad putative gene products suggested by the genome sequencing projects. Computational approaches aimed at establishing the relationships between proteins, purely on the basis of their amino acid sequences, provide a rapid and useful first step. Sequence analysis methods that use evolutionary information on protein families perform well in detecting remote homologues. The use of three-dimensional (3-D) structures provides a further edge in detecting distantly related proteins, as 3-D structures are better conserved than amino acid sequences. Also, in many cases, similarity in the fold of proteins corresponds to gross similarity in function. Hence, knowledge of 3-D structures has a profound influence on identifying the functions of newly discovered gene products. This review covers recent developments in this area of homology detection and its influence on computational genomics.
    Introduction: A critical problem confronting the present era of genome revolution is the assignment of function and structure to newly discovered proteins. The exploding rate at which genomes are sequenced is a formidable challenge for experimental scientists who attempt biological and biochemical characterization of proteins. The rapid, preliminary assignment of protein function is therefore a challenging task, and several procedures and strategies have been developed in the last decade to keep pace with genome sequencing. Several effective experimental techniques aim to characterize the properties of proteins at the genomic scale. Microarray and protein expression profiles quantify transcription and translation of genes in an organism. Techniques such as mass spectrometry and 2D gel electrophoresis also serve as important tools for genome-wide characterization of gene products. Genome-wide yeast two-hybrid analysis serves as a powerful tool to study interactions between proteins. While these techniques provide a variety of information about the genes encoded in a genome, biochemical or biological functional characterization is still not available for most proteins in various organisms. Preliminary characterization and assignment of protein function is often performed by relating newly discovered proteins to proteins whose structure and biochemical function are well known [1-6]. These similarities are deduced at the level of amino acid sequences, by performing pair-wise string comparisons between the protein sequences in the process of protein homology detection, a central tool in genomics. Further in-silico approaches to identify interacting proteins include the Rosetta approach [7], comparison of phylogenetic profiles [8] and chromosomal localization of genes [9].
  • ABSTRACT: A long-standing problem in structural bioinformatics is to determine the three-dimensional (3-D) structure of a protein when only a sequence of amino acid residues is given. Many computational methodologies and algorithms have been proposed as solutions to the 3-D Protein Structure Prediction (3-D-PSP) problem. These methods can be divided into four main classes: (a) first principle methods without database information; (b) first principle methods with database information; (c) fold recognition and threading methods; and (d) comparative modeling methods and sequence alignment strategies. Deterministic computational techniques, optimization techniques, data mining and machine learning approaches are typically used in the construction of computational solutions for the PSP problem. Our main goal with this work is to review the methods and computational strategies that are currently used in 3-D protein prediction. Copyright © 2014 Elsevier Ltd. All rights reserved.
    Computational Biology and Chemistry (Impact Factor: 1.60). 10/2014
  • ABSTRACT: Publisher's description: Volume one of this two-volume sequence focuses on the basic characterization of known protein structures as well as structure prediction from protein sequence information. The 11 chapters provide an overview of the field, covering key topics in modeling, force fields, classification, computational methods, and structure prediction. Each chapter is a self-contained review designed to cover (1) definition of the problem and a historical perspective, (2) mathematical or computational formulation of the problem, (3) computational methods and algorithms, (4) performance results, (5) existing software packages, and (6) strengths, pitfalls, challenges, and future research directions. Table of contents: Modeling protein structures. Empirical force fields. Knowledge-based energy functions for computational studies of proteins.