Evolutionary Analysis of Amino Acid Repeats Across the Genomes of 12 Drosophila Species

Department of Molecular Biology and Genetics, Cornell University, USA.
Molecular Biology and Evolution (Impact Factor: 9.11). 01/2008; 24(12):2598-609. DOI: 10.1093/molbev/msm129
Source: PubMed


Repeated motifs of amino acids within proteins are an abundant feature of eukaryotic sequences and may catalyze the rapid production of genetic and even phenotypic variation among organisms. The completion of the genome sequencing projects of 12 distinct Drosophila species provides a unique dataset to study these intriguing sequence features on a phylogeny with a variety of timescales. We show that there is a higher percentage of proteins containing repeats within the Drosophila genus than in most other eukaryotes, including non-Drosophila insects, which makes this collection of species particularly useful for the study of protein repeats. We also find that proteins containing repeats are overrepresented in functional categories involving developmental processes, signaling, and gene regulation. Using the set of 1-to-1 ortholog alignments for the 12 Drosophila species, we test the ability of repeats to act as reliable phylogenetic signals and find that they resolve the generally accepted phylogeny despite the noise caused by their accelerated rate of evolution. We also determine that, in general, the position of repeats within a protein sequence is non-random, with repeats more often being absent from the middle regions of sequences. Finally, we find evidence to suggest that the presence of repeats is associated with an increase in evolutionary rate across the entire sequence in which they are embedded. With additional evidence to suggest a corresponding elevation in positive selection, we propose that some repeats may be inducing compensatory substitutions in their surrounding sequence.
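
As a rough illustration of what locating repeats and their position within a protein can involve, the sketch below scans a sequence for runs of a single amino acid and reports where each run falls along the protein. The abstract does not state how repeats were defined or detected in the study, so the minimum run length of 5, the function names, and the toy sequence are all assumptions made for illustration only.

```python
import re
from typing import List, Tuple


def find_homopolymer_repeats(seq: str, min_len: int = 5) -> List[Tuple[str, int, int]]:
    """Return (residue, start, length) for every run of one amino acid
    that is at least min_len residues long.

    min_len = 5 is an illustrative threshold, not the study's definition.
    """
    # ([A-Z]) captures one residue; \1{n,} requires at least n more copies of it.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_len - 1))
    return [
        (m.group(1), m.start(), m.end() - m.start())
        for m in pattern.finditer(seq.upper())
    ]


def relative_position(start: int, length: int, seq_len: int) -> float:
    """Midpoint of a run as a fraction of protein length
    (0 = N-terminus, 1 = C-terminus)."""
    return (start + length / 2) / seq_len


# Hypothetical toy sequence: a 7-residue Q run, a 6-residue G run,
# and a 4-residue A run that falls below the threshold.
toy = "MSTQQQQQQQAPLKVGGGGGGDEHAAAAW"

for residue, start, length in find_homopolymer_repeats(toy):
    pos = relative_position(start, length, len(toy))
    print(f"{residue} x{length} at index {start}, relative position {pos:.2f}")
```

Binning the relative positions over many proteins would be one simple way to ask whether repeats avoid the middle of sequences, although the study's actual test may differ.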

  • Source
    • "compared with other functional classes (Karlin et al. 2002; Albà and Guigó 2004; Faux et al. 2005; Huntley and Clark 2007). "
    ABSTRACT: The high regulatory complexity of vertebrates has been related to two closely spaced whole genome duplications (2R-WGD) that occurred before the divergence of the major vertebrate groups. Following these events, many developmental transcription factors (TFs) were retained in multiple copies and subsequently specialized in diverse functions, whereas others reverted to their singleton state. TFs are known to be generally rich in amino acid repeats or low-complexity regions (LCRs), such as polyalanine or polyglutamine runs, which can evolve rapidly and potentially influence the transcriptional activity of the protein. Here we test the hypothesis that LCRs have played a major role in the diversification of TF gene duplicates. We find that nearly half of the TF gene families originated during the 2R-WGD contain LCRs. The number of gene duplicates with LCRs is 155 out of 550 analyzed (28%), about twice as many as the number of single copy genes with LCRs (15 out of 115, 13%). In addition, duplicated TFs preferentially accumulate certain LCR types, the most prominent of which are alanine repeats. We experimentally test the role of alanine-rich LCRs in two different TF gene families, PHOX2A/PHOX2B and LHX2/LHX9. In both cases, the presence of the alanine-rich LCR in one of the copies (PHOX2B and LHX2) significantly increases the capacity of the TF to activate transcription. Taken together, the results provide strong evidence that LCRs are important driving forces of evolutionary change in duplicated genes. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
    Molecular Biology and Evolution 04/2015; 32(9). DOI:10.1093/molbev/msv103 · 9.11 Impact Factor
  • Source
    • "Tandem repeats are probably the most common cause of sequence-length variation within loci (e.g. Huntley and Clark 2007; Messer and Arndt 2007), and yet they are detected only poorly by most alignment programs. Furthermore, small inversions often go undetected because the programs do not look for them explicitly, and therefore interpret them as multiple adjacent substitutions (Kelchner and Wendel 1996; Kelchner and Clark 1997; Graham et al. 2000; Quandt and Stech 2005). "
    ABSTRACT: Sequence alignment is just as much a part of phylogenetics as is tree building, although it is often viewed solely as a necessary tool to construct trees. However, alignment for the purpose of phylogenetic inference is primarily about homology, as it is the procedure that expresses homology relationships among the characters, rather than the historical relationships of the taxa. Molecular homology is rather vaguely defined and understood, despite its importance in the molecular age. Indeed, homology has rarely been evaluated with respect to nucleotide sequence alignments, in spite of the fact that nucleotides are the only data that directly represent genotype. All other molecular data represent phenotype, just as do morphology and anatomy. Thus, efforts to improve sequence alignment for phylogenetic purposes should involve a more refined use of the homology concept at a molecular level. To this end, we present examples of molecular-data levels at which homology might be considered, and arrange them in a hierarchy. The concept that we propose has many levels, which link directly to the developmental and morphological components of homology. Of note, there is no simple relationship between gene homology and nucleotide homology. We also propose terminology with which to better describe and discuss molecular homology at these levels. Our over-arching conceptual framework is then used to shed light on the multitude of automated procedures that have been created for multiple-sequence alignment. Sequence alignment needs to be based on aligning homologous nucleotides, without necessary reference to homology at any other level of the hierarchy. In particular, inference of nucleotide homology involves deriving a plausible scenario for molecular change among the set of sequences. Our clarifications should allow the development of a procedure that specifically addresses homology, which is required when performing alignment for phylogenetic purposes, but which does not yet exist.
    Australian Systematic Botany 01/2015; 28(1):46-62. DOI:10.1071/SB15001 · 1.08 Impact Factor
  • Source
    • "Thus, in principle it is unlikely that a type of variation with high mutational instability, like TRs, would be a major contributor to phenotypic evolution. Support for this argument is provided by the knowledge that TRs are not evolutionarily stable features of eukaryotic genomes (Huntley and Clark 2007; Gibbons and Rokas 2009), as well as by dozens of human genetic diseases, which suggest that a significant fraction of the variation present in TRCNPs is deleterious (Mirkin 2007). These caveats notwithstanding, the mutational instability of TRCNPs might be beneficial in certain specific cases, such as in cell-surface genes from organisms that live in rapidly fluctuating environments (Verstrepen et al. 2005; Vinces et al. 2009). "
    ABSTRACT: Copy number polymorphisms of nucleotide tandem repeat (TR) regions, such as microsatellites and minisatellites, are mutationally reversible and highly abundant in eukaryotic genomes. Studies linking TR polymorphism to phenotypic variation have led some to suggest that TR variation modulates and majorly contributes to phenotypic variation; however, studies in which the authors assess the genome-wide impact of TR variation on phenotype are lacking. To address this question, we quantified relationships between polymorphism levels in 143 genome-wide promoter region TRs across 16 isolates of the filamentous fungus Aspergillus flavus and its ecotype Aspergillus oryzae with expression levels of their downstream genes. We found that only 4.3% of relationships tested were significant; these findings were consistent with models in which TRs act as "tuning," "volume," or "optimality" "knobs" of phenotype but not with "switch" models. Furthermore, the promoter regions of differentially expressed genes between A. oryzae and A. flavus did not show TR enrichment, suggesting that genome-wide differences in molecular phenotype between the two species are not significantly associated with TRs. Although in some cases TR polymorphisms do contribute to transcript abundance variation, these results argue that at least in this case, TRs might not be major modulators of variation in phenotype.
    G3-Genes Genomes Genetics 12/2012; 2(12):1643-9. DOI:10.1534/g3.112.004663 · 3.20 Impact Factor