Sequence space and the ongoing expansion of the protein universe

Bioinformatics and Genomics Programme, Centre for Genomic Regulation, Calle Dr Aiguader 88, Barcelona Biomedical Research Park Building, 08003 Barcelona, Spain.
Nature (Impact Factor: 41.46). 06/2010; 465(7300):922-6. DOI: 10.1038/nature09105
Source: PubMed


The need to maintain the structural and functional integrity of an evolving protein severely restricts the repertoire of acceptable amino-acid substitutions. However, it is not known whether these restrictions impose a global limit on how far homologous protein sequences can diverge from each other. Here we explore the limits of protein evolution using sequence divergence data. We formulate a computational approach to study the rate of divergence of distant protein sequences and measure this rate for ancient proteins, those that were present in the last universal common ancestor. We show that ancient proteins are still diverging from each other, indicating an ongoing expansion of the protein sequence universe. The slow rate of this divergence is imposed by the sparseness of functional protein sequences in sequence space and the ruggedness of the protein fitness landscape: approximately 98 per cent of sites cannot accept an amino-acid substitution at any given moment, but a vast majority of all sites may eventually be permitted to evolve when other, compensatory, changes occur. Thus, approximately 3.5 × 10^9 yr has not been enough to reach the limit of divergent evolution of proteins, and for most proteins the limit of sequence similarity imposed by common function may not exceed that of random sequences.

    • "Rarely, different initial substitutions create alternative paths that lead to cefotaxime resistance, albeit at lower levels than that achieved via the most common trajectory. All else being equal, the rate of adaptive evolution in a "rugged" landscape having multiple adaptive peaks is slower than in a smooth landscape having one global optimum: "rugged" landscapes are more functionally constrained [8], and some fraction of the population can become trapped on peaks of different heights."
    ABSTRACT: The fitness landscape is a powerful metaphor for describing the relationship between genotype and phenotype for a population under selection. However, empirical data as to the topography of fitness landscapes are limited, owing to difficulties in measuring fitness for large numbers of genotypes under any condition. We previously reported a case of reciprocal sign epistasis (RSE), where two mutations individually increased yeast fitness in a glucose-limited environment, but reduced fitness when combined, suggesting the existence of two peaks on the fitness landscape. We sought to determine whether a ridge connected these peaks so that populations founded by one mutant could reach the peak created by the other, avoiding the low-fitness “Valley-of-Death” between them. Sequencing clones after 250 generations of further evolution provided no evidence for such a ridge, but did reveal many presumptive beneficial mutations, adding to a growing body of evidence that clonal interference pervades evolving microbial populations.
    Full-text · Article · Nov 2014 · Genomics
    • "It has been suggested that epistasis is stronger among mutations in the same gene than among mutations in different genes (Poon and Chao 2005; Watson et al. 2010), but confounding methodological variation across studies has prevented solid tests (Szendro, Schenk, et al. 2013). The relatively slow rate of expansion of the "protein universe" (Povolotskaya and Kondrashov 2010) and discrepancy between long and short-term amino acid substitution rates (Breen et al. 2012) also suggest constraints from sign epistasis in protein evolution."
    ABSTRACT: Understanding epistasis is central to biology. For instance, epistatic interactions determine the topography of the fitness landscape and affect the dynamics and determinism of adaptation. However, few empirical data are available and comparing results is complicated by confounding variation in the system and the type of mutations used. Here, we take a systematic approach by quantifying epistasis in two sets of four beneficial mutations in the antibiotic resistance enzyme TEM-1 β-lactamase. Mutations in these sets either have large or small effects on cefotaxime resistance when present as single mutations. By quantifying the epistasis and ruggedness in both landscapes we find two general patterns. First, resistance is maximal for combinations of two mutations in both fitness landscapes and declines when more mutations are added due to abundant sign epistasis and a pattern of diminishing returns with genotype resistance. Second, large-effect mutations interact more strongly than small-effect mutations, suggesting that the effect size of mutations may be an organizing principle in understanding patterns of epistasis. By fitting the data to simple phenotype-resistance models, we show that this pattern may be explained by the nonlinear dependence of resistance on enzyme stability and an unknown phenotype when mutations have antagonistically pleiotropic effects. The comparison to a previously published set of mutations in the same gene with a joint benefit further shows that the enzyme's fitness landscape is locally rugged, but does contain adaptive pathways that lead to high resistance.
    Full-text · Article · May 2013 · Molecular Biology and Evolution
    • "It is a large and mysterious entity, which is an essential underpinning of all biology (Levitt, 2009). The current methods (Levitt, 2009; Dokholyan et al., 2002; Jaroszewski et al., 2009; Koonin, 2007; Koonin et al., 2002; Povolotskaya and Kondrashov, 2010) to reveal the nature of the protein universe cluster sequences into families by similarities. However, these methods cannot even tell us what the nature of the protein universe is concretely."
    ABSTRACT: Current methods cannot tell us what the nature of the protein universe is concretely. They are based on different models of amino acid substitution and multiple sequence alignment which is an NP-hard problem and requires manual intervention. Protein structural analysis also gives a direction for mapping the protein universe. Unfortunately, now only a minuscule fraction of proteins' 3-dimensional structures are known. Furthermore, the phylogenetic tree representations are not unique for any existing tree construction methods. Here we develop a novel method to realize the nature of protein universe. We show the protein universe can be realized as a protein space in 60-dimensional Euclidean space using a distance based on a normalized distribution of amino acids. Every protein is in one-to-one correspondence with a point in protein space, where proteins with similar properties stay close together. Thus the distance between two points in protein space represents the biological distance of the corresponding two proteins. We also propose a natural graphical representation for inferring phylogenies. The representation is natural and unique based on the biological distances of proteins in protein space. This will solve the fundamental question of how proteins are distributed in the protein universe.
    Full-text · Article · Nov 2012 · Journal of Theoretical Biology