Protein homologous cores and loops: Important clues to evolutionary relationships between structurally similar proteins

Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland 20894, USA.
BMC Structural Biology (Impact Factor: 1.18). 04/2007; 7(1):23. DOI: 10.1186/1472-6807-7-23
Source: PubMed


To discover remote evolutionary relationships and functional similarities between proteins, biologists rely on comparative sequence analysis and, when structures are available, on structural alignments and various measures of structural similarity. The measures most commonly used for this purpose include alignment length, percent sequence identity, superposition RMSD, and combinations of these. More recently, we have introduced the "Homologous core structure overlap score" (HCS) and the "Loop Hausdorff Measure" (LHM). Along with these we also consider the "gapped structural alignment score" (GSAS), which was introduced earlier by other researchers.
We analyze the performance of these scores and of conventional measures at the task of ranking structure neighbors by homology, and we show that the HCS, LHM, and GSAS scores perform considerably better than the conventional measures of sequence or structural similarity.
The HCS, LHM, and GSAS scores are easily computable quantities that help users of structure-neighbor databases identify interesting structural similarities between proteins.
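As a rough illustration of the kinds of quantities the abstract mentions, the sketch below computes percent sequence identity over aligned columns, RMSD over paired coordinates that are assumed to be already superposed, and a generic symmetric Hausdorff distance between two point sets. The function names and input formats are hypothetical, and none of these functions is the paper's HCS, LHM, or GSAS definition, which this page does not give; the Hausdorff function only shows the general set-distance idea that a loop-based measure could build on.

```python
import math


def percent_identity(aligned_a, aligned_b):
    """Percent of aligned (non-gap) columns with identical residues.

    Both inputs are equal-length aligned strings using '-' for gaps
    (an assumed input format, not one taken from the paper).
    """
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over paired 3D points.

    Assumes the two structures have already been superposed; no
    optimal rotation or translation is computed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def hausdorff(points_a, points_b):
    """Generic symmetric Hausdorff distance between two point sets."""
    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(points_a, points_b), directed(points_b, points_a))
```

For example, `percent_identity("AC-DE", "ACGDF")` counts four aligned columns, three of them identical. Unlike RMSD, the Hausdorff distance needs no residue-to-residue pairing, which is one reason a set-based distance is a natural fit for comparing loop regions that align poorly.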

  • Source
    • "The structural similarity between proteins cannot be considered proof of common ancestry, because structure space is relatively small with its limited number of arrangements of secondary structure elements and many examples of 'structural convergence' have been described (Finkelstein and Ptitsyn 1987; Krishna and Grishin 2004). In practice, a homologous relationship is often accepted when the sequences are significantly similar (Doolittle 1994; Pearson 1996; Murzin 1998), when both sequences and structures are sufficiently similar (Murzin 1993; Holm and Sander 1997; Russell et al. 1997; Madej et al. 2007; Cheng et al. 2008) or when, in addition to sequence or structure similarity, other information such as the co-occurrence of rare structural or functional features, functional annotations, or sequence motifs hint at a homologous relationship (Holm and Sander 1997; Murzin 1998; Dietmann and Holm 2001; Nagano et al. 2002; Gewehr et al. 2007). Despite the usefulness of these criteria, the degree of sequence similarity remains the most important criterion for common ancestry in practice."
    ABSTRACT: Outer membrane beta-barrels (OMBBs) are the major class of outer membrane proteins from Gram-negative bacteria, mitochondria, and plastids. Their transmembrane domains consist of 8-24 beta-strands forming a closed, barrel-shaped beta-sheet around a central pore. Despite their obvious structural regularity, evidence for an origin by duplication or for a common ancestry has not been found. We use three complementary approaches to show that all OMBBs from Gram-negative bacteria evolved from a single, ancestral beta beta hairpin. First, we link almost all families of known single-chain bacterial OMBBs with each other through transitive profile searches. Second, we identify a clear repeat signature in the sequences of many OMBBs in which the repeating sequence unit coincides with the structural beta beta hairpin repeat. Third, we show that the observed sequence similarity between OMBB hairpins cannot be explained by structural or membrane constraints on their sequences. The third approach addresses a longstanding problem in protein evolution: how to distinguish between a very remotely homologous relationship and the opposing scenario of "sequence convergence." The origin of a diverse group of proteins from a single hairpin module supports the hypothesis that, around the time of transition from the RNA to the protein world, proteins arose by amplification and recombination of short peptide modules that had previously evolved as cofactors of RNAs.
    Molecular Biology and Evolution 06/2010; 27(6):1348-58. DOI:10.1093/molbev/msq017 · 9.11 Impact Factor
  • Source
    ABSTRACT: Loops connect regular secondary structures. In many instances, they are known to play important biological roles. Analysis and prediction of loop conformations depend directly on the definition of repetitive structures. Nonetheless, secondary structure assignment methods (SSAMs) often lead to divergent assignments. In this study, we analyzed, from both the structure and the sequence points of view, how the divergence between different SSAMs affects the boundary definitions of loops connecting regular secondary structures. The analysis underlines that no clear consensus between the different SSAMs can easily be found. Because the SSAMs greatly influence the loop boundary definitions, important variations are indeed observed; that is, capping positions are shifted between different SSAMs. On the other hand, our results show that the sequence information in these capping regions is more stable than expected, and classical and equivalent sequence patterns were found for most of the SSAMs. This is, to our knowledge, the most exhaustive survey in this field, as (i) various databanks have been used, leading to similar results without implication of protein redundancy, and (ii) this is the first time various SSAMs have been used. This work hence gives new insights into the difficult question of the assignment of repetitive structures and addresses the issue of loop boundary definition. Although SSAMs give very different local structure assignments, capping sequence patterns remain stable.
    Protein Science 09/2009; 18(9):1869-81. DOI:10.1002/pro.198 · 2.85 Impact Factor
  • Source
    ABSTRACT: Glycosylation is an important aspect of epigenetic regulation. Glycosyltransferase is a key enzyme in the biosynthesis of glycans, which glycosylates more than half of all proteins in eukaryotes and is involved in a wide range of biological processes. It has been suggested previously that homooligomerization in glycosyltransferases and other proteins might be crucial for their function. In this study, we explore functional homooligomeric states of glycosyltransferases in various organisms, trace their evolution, and perform comparative analyses to find structural features that can mediate or disrupt the formation of different homooligomers. First, we make a structure-based classification of the diverse superfamily of glycosyltransferases and confirm that the majority of the structures are indeed clustered into the GT-A or GT-B folds. We find that homooligomeric glycosyltransferases appear to be as ancient as monomeric glycosyltransferases and go back in evolution to the last universal common ancestor (LUCA). Moreover, we show that interface residues have significant bias to be gapped out or unaligned in the monomers, implying that they might represent features crucial for oligomer formation. Structural analysis of these features reveals that the majority of them represent loops, terminal regions, and helices, indicating that these secondary-structure elements mediate the formation of glycosyltransferases' homooligomers and directly contribute to the specific binding. We also observe relatively short protein regions that disrupt the homodimer interactions, although such cases are rare. These results suggest that relatively small structural changes in the nonconserved regions may contribute to the formation of different functional oligomeric states and might be important in regulation of enzyme activity through homooligomerization.
    Journal of Molecular Biology 05/2010; 399(1):196-206. DOI:10.1016/j.jmb.2010.03.059 · 4.33 Impact Factor