Evolutionary analysis of amino acid repeats across the genomes of 12 Drosophila species.
ABSTRACT Repeated motifs of amino acids within proteins are an abundant feature of eukaryotic sequences and may catalyze the rapid production of genetic and even phenotypic variation among organisms. The completion of the genome sequencing projects of 12 distinct Drosophila species provides a unique dataset to study these intriguing sequence features on a phylogeny with a variety of timescales. We show that there is a higher percentage of proteins containing repeats within the Drosophila genus than in most other eukaryotes, including non-Drosophila insects, which makes this collection of species particularly useful for the study of protein repeats. We also find that proteins containing repeats are overrepresented in functional categories involving developmental processes, signaling, and gene regulation. Using the set of 1-to-1 ortholog alignments for the 12 Drosophila species, we test the ability of repeats to act as reliable phylogenetic signals and find that they resolve the generally accepted phylogeny despite the noise caused by their accelerated rate of evolution. We also determine that, in general, the position of repeats within a protein sequence is non-random, with repeats more often being absent from the middle regions of sequences. Finally, we find evidence to suggest that the presence of repeats is associated with an increase in evolutionary rate across the entire sequence in which they are embedded. Given additional evidence suggesting a corresponding elevation in positive selection, we propose that some repeats may be inducing compensatory substitutions in their surrounding sequence.
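As a rough illustration of the kind of feature the abstract describes, the sketch below scans a protein sequence for runs of a single amino acid and reports where each run falls along the sequence. It is not the authors' method: the abstract does not state how repeats were defined, so the minimum run length of 5 residues, the three-way split into N-terminal, middle, and C-terminal regions, and the function names `find_homopolymer_repeats` and `region` are all assumptions made for this example.

```python
import re


def find_homopolymer_repeats(seq, min_len=5):
    """Return (residue, start, length, relative_midpoint) for each run.

    A run is `min_len` or more consecutive copies of one residue.
    `min_len=5` is an illustrative threshold, not one taken from the paper.
    """
    # (.)\1{n,} matches one character followed by n or more copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    hits = []
    for m in pattern.finditer(seq):
        length = m.end() - m.start()
        # Midpoint of the run, scaled to the 0..1 range of the sequence.
        midpoint = (m.start() + length / 2) / len(seq)
        hits.append((m.group(1), m.start(), length, midpoint))
    return hits


def region(rel_pos):
    """Bin a relative position into thirds (an assumed, simple binning)."""
    if rel_pos < 1 / 3:
        return "N-terminal"
    if rel_pos < 2 / 3:
        return "middle"
    return "C-terminal"


if __name__ == "__main__":
    # Toy sequence with a polyQ run near the start and a polyA run at the end.
    toy = "MQQQQQQ" + "AKLTDEGS" * 3 + "AAAAA"
    for residue, start, length, mid in find_homopolymer_repeats(toy):
        print(residue, start, length, region(mid))
```

Tallying the `region` labels over many proteins would give a crude view of whether repeats avoid the middle of sequences, although a real analysis would also need a null expectation for comparison.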
- Source available from: PubMed Central
ABSTRACT: Previous studies have found that DNA flanking low-complexity regions (LCRs) has an increased substitution rate. Here, the substitution rate was confirmed to increase in the vicinity of LCRs in several primate species, including humans. This effect was also found among human sequences from the 1000 Genomes Project. A strong correlation was found between average substitution rate per site and distance from the LCR, as well as between the proportion of genes with gaps in the alignment at each site and distance from the LCR. Along with substitution rates, dN/dS ratios were also determined for each site, and the proportion of sites undergoing negative selection was found to have a negative relationship with distance from the LCR. Genome Biology and Evolution 02/2014; · 4.53 Impact Factor
- ABSTRACT: For ∼30 million years, the eggs of Hawaiian Drosophila were laid in ever-changing environments caused by high rates of island formation. The associated diversification of the size and developmental rate of the syncytial fly embryo would have altered morphogenic gradients, thus necessitating frequent evolutionary compensation of transcriptional responses. We investigate the consequences these radiations had on transcriptional enhancers patterning the embryo to see whether their pattern of molecular evolution is different from that of non-Hawaiian species. We identify and functionally assay in transgenic D. melanogaster the Neurogenic Ectoderm Enhancers from two different Hawaiian Drosophila groups: (i) the picture wing group, and (ii) the modified mouthparts group. We find that the binding sites in this set of well-characterized enhancers are footprinted by diverse microsatellite repeat (MSR) sequences. We further show that Hawaiian embryonic enhancers in general are enriched in MSR relative to both Hawaiian non-embryonic enhancers and non-Hawaiian embryonic enhancers. We propose that embryonic enhancers are sensitive to activator spacing because they often serve as assembly scaffolds for the aggregation of transcription factor activator complexes. Furthermore, as most indels are produced by microsatellite repeat slippage, enhancers from Hawaiian Drosophila lineages, which experience dynamic evolutionary pressures, would become grossly enriched in MSR content. PLoS ONE 06/2014; 9(6):e101177. · 3.53 Impact Factor
- ABSTRACT: Polyglutamine (polyQ) tracts have been studied extensively for their roles in a number of human diseases, such as Huntington's disease and various ataxias. However, it has also been recognized that polyQ tracts are abundant and may have important functional and evolutionary roles. In particular, the association of polyQ and polyalanine (polyA) tracts with transcription factors and their activation activity has been noted. While a number of examples of this association have been found for proteins from opisthokonts (animals and fungi), only a few studies exist for polyQ and polyA stretches in plants, and systematic investigations of the significance of these repeats in plant transcription factors are scarce. Here, we analyze the abundance and length of polyQ and polyA stretches in the conceptual proteomes of six plant species and examine the connection between polyQ and polyA tracts and transcription factors among the repeat-containing proteins. We show that there is an association of polyQ stretches with transcription factors in plants. In grasses, transcription factors are also significantly enriched in polyA stretches. While the abundance, length, and functional associations of polyQ and polyA stretches vary between species, no general differences in the evolution of these repeats could be observed between plants and opisthokonts. German Conference on Bioinformatics, Jena; 09/2012