Fay JC, Wu CI. Sequence divergence, functional constraint, and selection in protein evolution. Annu Rev Genomics Hum Genet 4: 213-235

Department of Genome Sciences, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA.
Annual Review of Genomics and Human Genetics (Impact Factor: 8.96). 02/2003; 4:213-35. DOI: 10.1146/annurev.genom.4.020303.162528
Source: PubMed


The genome sequences of multiple species have enabled functional inferences from comparative genomics. A primary objective is to infer biological functions from the conservation of homologous DNA sequences between species. A second, more difficult, objective is to understand what functional DNA sequences have changed over time and are responsible for species' phenotypic differences. The neutral theory of molecular evolution provides a theoretical framework in which both objectives can be explicitly tested. Development of statistical tests within this framework has provided insight into the evolutionary forces that constrain and in some cases change DNA sequences and the resulting patterns that emerge. In this article, we review recent work on how functional constraint and changes in protein function are inferred from protein polymorphism and divergence data. We relate these studies to our understanding of the neutral theory and adaptive evolution.



Available from: Justin C Fay, Oct 10, 2015
    • "Similarly, for Leu (UUR), the RSCU was 2.63 for UUA and 1.62 for UUG. The estimation of nonsynonymous (Ka) and synonymous (Ks) substitution rates is quite useful for understanding the selective constraints acting on the protein-coding sequences across closely related species (Ohta, 1995; Fay and Wu, 2003). In order to detect the influence of selection pressure in Arcidae species, the numbers of Ka, Ks and their ratios were calculated for all pairwise comparisons among the four Arcidae (Supplementary Table 2). "
    ABSTRACT: The mitochondrial (mt) genome is a significant tool for investigating the evolutionary history of metazoan animals. The family Arcidae belongs to the superfamily Arcacea in the bivalve order Arcoida, comprising about 260 species. Currently, three complete mitochondrial genomes are available in GenBank, representing 1 subfamily and 2 genera. Here we present the complete mitochondrial genome sequence of Anadara vellicata (Bivalvia: Arcidae), the first report of a complete mitogenome from Anadara (Arcidae), and compare its sequence with other available Arcidae mitogenomes. The A. vellicata mitogenome is 34,147bp in length, including 12 protein-coding genes (PCGs), 25 transfer RNAs (tRNAs), 2 ribosomal RNA (rRNA) genes and non-coding regions (NCR) (20,722bp). The nucleotide composition of the genome is A+T biased, accounting for 61.03%, with negative AT skew (-0.12) and positive GC skew (0.41). We report evidence of alloacceptor tRNA gene recruitment (trnY-trnL2). A conserved 23bp-long sequence was used as the basis to infer the 3' terminus of rrnS. Most of the non-coding sequences (16,112bp) are observed within one segment. In the NCR, the tandem repeat (TR) region is 1143bp, comprising six tandem repeats of 189bp to 192bp in length. In addition, a long thymine-nucleotide stretch (T-stretch) was detected in the NCR of A. vellicata. The gene order and transcriptional polarity of the protein-coding genes are identical to those of other Arcidae species. tRNA genes are rearranged, making the gene order unique. The results support the view that mt gene arrangement among Arcidae species is not random, but correlated with their evolutionary relationships. Copyright © 2015 Elsevier Inc. All rights reserved.
    Comparative Biochemistry and Physiology Part D Genomics and Proteomics 08/2015; 16:73-82. DOI:10.1016/j.cbd.2015.08.001 · 2.06 Impact Factor
    • "Rennison et al. 2012) and even protein coding genes in general (ω = 0.08–0.18; Fay and Wu 2003). Instead, they are similar to values found for genes known to be under strong positive selection such as human MHC and reproductive proteins (Swanson et al. 2001). "
    ABSTRACT: Studies of cichlid evolution have highlighted the importance of visual pigment genes in the spectacular radiation of the African rift lake cichlids. Recent work, however, has also provided strong evidence for adaptive diversification of riverine cichlids in the Neotropics, which inhabit environments of markedly different spectral properties from the African rift lakes. These ecological and/or biogeographic differences may have imposed divergent selective pressures on the evolution of the cichlid visual system. To test these hypotheses, we investigated the molecular evolution of the dim-light visual pigment, rhodopsin. We sequenced rhodopsin from Neotropical and African riverine cichlids, and combined these data with published sequences from African cichlids. We found significant evidence for positive selection using random sites codon models in all cichlid groups, with the highest levels in African lake cichlids. Tests using branch-site and clade models that partitioned the data along ecological (lake, river) and/or biogeographic (African, Neotropical) boundaries found significant evidence of divergent selective pressures among cichlid groups. However, statistical comparisons among these models suggest that ecological, rather than biogeographic, factors may be responsible for divergent selective pressures that have shaped the evolution of the visual system in cichlids. We found that branch-site models did not perform as well as clade models for our data set, in which there was evidence for positive selection in the background. One of our most intriguing results is that the amino acid sites found to be under positive selection in Neotropical and African lake cichlids were largely non-overlapping, despite falling into the same three functional categories: spectral tuning, retinal uptake/release, and rhodopsin dimerization. Taken together, these results would imply divergent selection across cichlid clades, but targeting similar functions. This study highlights the importance of molecular investigations of ecologically important groups, and the flexibility of clade models in explicitly testing ecological hypotheses.
    Molecular Biology and Evolution 02/2014; 31(5). DOI:10.1093/molbev/msu064 · 9.11 Impact Factor
    • "Alignment-wide estimates of variation (ω) are high for both genes (M0, ω = 0.304 and 0.289 for CD28 and CTLA-4, respectively) compared to typical values of 0.08–0.18 [46]. The values for CD28 and CTLA-4 are comparable to genes coding highly diverse proteins with codon sites under strong positive selection, such as MHC proteins and reproductive proteins (ω = 0.5 and 0.27–0.93, "
    ABSTRACT: Protein N-glycosylation is found in all domains of life and has a conserved role in glycoprotein folding and stability. In animals, glycoproteins transit through the Golgi where the N-glycans are trimmed and rebuilt with sequences that bind lectins, an innovation that greatly increases structural diversity and redundancy of glycoprotein-lectin interaction at the cell surface. Here we ask whether the natural tension between increasing diversity (glycan-protein interactions) and site multiplicity (backup and status quo) might be revealed by a phylogenic examination of glycoproteins and NXS/T(X≠P) N-glycosylation sites. Site loss is more likely by mutation at Asn encoded by two adenosine (A)-rich codons, while site gain is more probable by generating Ser or Thr downstream of an existing Asn. Thus mutations produce sites at novel positions more frequently than the reversal of recently lost sites, and therefore more paths through sequence space are made available to natural selection. An intra-species comparison of secretory and cytosolic proteins revealed a departure from equilibrium in sequences one-mutation-away from NXS/T and in (A) content, indicating strong selective pressures and exploration of N-glycosylation positions during vertebrate evolution. Furthermore, secretory proteins have evolved at rates proportional to N-glycosylation site number, indicating adaptive interactions between the N-glycans and underlying protein. Given the topology of the genetic code, mutation of (A) is more often nonsynonymous, and Lys, another target of many PTMs, is also encoded by two (A)-rich codons. An examination of acetyl-Lys sites in proteins indicated similar evolutionary dynamics, consistent with asymmetry of the target and recognition portions of modified sites. Our results suggest that encoding asymmetry is an ancient mechanism of evolvability that increases diversity and experimentation with PTM site positions. Strong selective pressures on PTMs may have contributed to the A+T→G+C shift in genome-wide nucleotide composition during metazoan radiation.
    PLoS ONE 01/2014; 9(1):e86088. DOI:10.1371/journal.pone.0086088 · 3.23 Impact Factor
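
The citing excerpts above compare ω (the ratio of nonsynonymous to synonymous substitution rates, Ka/Ks) against the range of 0.08–0.18 that they attribute to Fay and Wu (2003) for typical protein-coding genes. As a rough illustration of that comparison only, here is a minimal Python sketch. The function names are hypothetical, and it assumes Ka and Ks have already been estimated by some other tool; it does not reproduce any method from the cited papers.

```python
from typing import Tuple

# Range of omega quoted in the excerpts above for typical protein-coding genes.
TYPICAL_OMEGA: Tuple[float, float] = (0.08, 0.18)


def omega(ka: float, ks: float) -> float:
    """Return Ka/Ks. The ratio is undefined when Ks is not positive."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    return ka / ks


def compare_to_typical(w: float, typical: Tuple[float, float] = TYPICAL_OMEGA) -> str:
    """Say where an omega estimate falls relative to the quoted typical range."""
    low, high = typical
    if w < low:
        return "below typical range"
    if w > high:
        return "above typical range"
    return "within typical range"


if __name__ == "__main__":
    # Alignment-wide omega values quoted in the CD28 / CTLA-4 excerpt above.
    for gene, w in {"CD28": 0.304, "CTLA-4": 0.289}.items():
        print(gene, w, compare_to_typical(w))
```

An ω above the typical range is not by itself evidence of positive selection; the excerpts rely on codon-model tests for that, and this sketch only restates the numerical comparison they describe.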