Sequence space and the ongoing expansion of the protein universe

Bioinformatics and Genomics Programme, Centre for Genomic Regulation, Calle Dr Aiguader 88, Barcelona Biomedical Research Park Building, 08003 Barcelona, Spain.
Nature (Impact Factor: 42.35). 06/2010; 465(7300):922-6. DOI: 10.1038/nature09105
Source: PubMed

ABSTRACT The need to maintain the structural and functional integrity of an evolving protein severely restricts the repertoire of acceptable amino-acid substitutions. However, it is not known whether these restrictions impose a global limit on how far homologous protein sequences can diverge from each other. Here we explore the limits of protein evolution using sequence divergence data. We formulate a computational approach to study the rate of divergence of distant protein sequences and measure this rate for ancient proteins, those that were present in the last universal common ancestor. We show that ancient proteins are still diverging from each other, indicating an ongoing expansion of the protein sequence universe. The slow rate of this divergence is imposed by the sparseness of functional protein sequences in sequence space and the ruggedness of the protein fitness landscape: approximately 98 per cent of sites cannot accept an amino-acid substitution at any given moment but a vast majority of all sites may eventually be permitted to evolve when other, compensatory, changes occur. Thus, approximately 3.5 × 10^9 yr has not been enough to reach the limit of divergent evolution of proteins, and for most proteins the limit of sequence similarity imposed by common function may not exceed that of random sequences.

  • Source
    ABSTRACT: The fitness landscape is a powerful metaphor for describing the relationship between genotype and phenotype for a population under selection. However, empirical data as to the topography of fitness landscapes are limited, owing to difficulties in measuring fitness for large numbers of genotypes under any condition. We previously reported a case of reciprocal sign epistasis (RSE), where two mutations individually increased yeast fitness in a glucose-limited environment, but reduced fitness when combined, suggesting the existence of two peaks on the fitness landscape. We sought to determine whether a ridge connected these peaks so that populations founded by one mutant could reach the peak created by the other, avoiding the low-fitness “Valley-of-Death” between them. Sequencing clones after 250 generations of further evolution provided no evidence for such a ridge, but did reveal many presumptive beneficial mutations, adding to a growing body of evidence that clonal interference pervades evolving microbial populations.
    Genomics 11/2014; 104(6). DOI:10.1016/j.ygeno.2014.10.011 · 2.79 Impact Factor
  • Source
    ABSTRACT: Understanding epistasis is central to biology. For instance, epistatic interactions determine the topography of the fitness landscape and affect the dynamics and determinism of adaptation. However, few empirical data are available and comparing results is complicated by confounding variation in the system and the type of mutations used. Here, we take a systematic approach by quantifying epistasis in two sets of four beneficial mutations in the antibiotic resistance enzyme TEM-1 β-lactamase. Mutations in these sets either have large or small effects on cefotaxime resistance when present as single mutations. By quantifying the epistasis and ruggedness in both landscapes we find two general patterns. First, resistance is maximal for combinations of two mutations in both fitness landscapes and declines when more mutations are added due to abundant sign epistasis and a pattern of diminishing returns with genotype resistance. Second, large-effect mutations interact more strongly than small-effect mutations, suggesting that the effect size of mutations may be an organizing principle in understanding patterns of epistasis. By fitting the data to simple phenotype-resistance models, we show that this pattern may be explained by the nonlinear dependence of resistance on enzyme stability and an unknown phenotype when mutations have antagonistically pleiotropic effects. The comparison to a previously published set of mutations in the same gene with a joint benefit further shows that the enzyme's fitness landscape is locally rugged, but does contain adaptive pathways that lead to high resistance.
    Molecular Biology and Evolution 05/2013; 30(8). DOI:10.1093/molbev/mst096 · 14.31 Impact Factor
  • Source
    ABSTRACT: Current methods cannot tell us what the nature of the protein universe is concretely. They are based on different models of amino acid substitution and multiple sequence alignment which is an NP-hard problem and requires manual intervention. Protein structural analysis also gives a direction for mapping the protein universe. Unfortunately, now only a minuscule fraction of proteins' 3-dimensional structures are known. Furthermore, the phylogenetic tree representations are not unique for any existing tree construction methods. Here we develop a novel method to realize the nature of protein universe. We show the protein universe can be realized as a protein space in 60-dimensional Euclidean space using a distance based on a normalized distribution of amino acids. Every protein is in one-to-one correspondence with a point in protein space, where proteins with similar properties stay close together. Thus the distance between two points in protein space represents the biological distance of the corresponding two proteins. We also propose a natural graphical representation for inferring phylogenies. The representation is natural and unique based on the biological distances of proteins in protein space. This will solve the fundamental question of how proteins are distributed in the protein universe.
    Journal of Theoretical Biology 11/2012; 318. DOI:10.1016/j.jtbi.2012.11.005 · 2.30 Impact Factor
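
The protein-space construction described in the Journal of Theoretical Biology abstract above can be illustrated with a minimal sketch. The abstract states only that each protein maps to a point in 60-dimensional Euclidean space via a normalized distribution of amino acids. The specific choice of features below (for each of the 20 standard amino acids: its count, its mean position, and a length-normalized second central moment of its positions, giving 20 × 3 = 60 coordinates) is an assumption made for illustration, as are the names `natural_vector` and `protein_distance`.

```python
import math

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def natural_vector(seq):
    """Map a protein sequence to a 60-dimensional vector (assumed features).

    For each amino acid k: count n_k, mean position mu_k (1-based), and
    the second central moment of its positions divided by n_k * len(seq).
    Amino acids absent from the sequence contribute zeros.
    """
    n = len(seq)
    counts, means, moments = [], [], []
    for aa in AMINO_ACIDS:
        positions = [i + 1 for i, c in enumerate(seq) if c == aa]
        n_k = len(positions)
        if n_k == 0:
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu = sum(positions) / n_k
        d2 = sum((p - mu) ** 2 for p in positions) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu)
        moments.append(d2)
    return counts + means + moments


def protein_distance(seq_a, seq_b):
    """Euclidean distance between two sequences in the 60-dimensional space."""
    return math.dist(natural_vector(seq_a), natural_vector(seq_b))


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    print(protein_distance("MKTAYIAKQR", "MKTAYIAKQR"))
    print(protein_distance("MKTAYIAKQR", "MKSAYLAKQR"))
```

Because no alignment is needed, the distance between any two sequences is computed directly from their vectors, which is the property the abstract emphasizes when it contrasts the approach with alignment-based methods.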

