Evolutionary analysis of amino acid repeats across the genomes of 12 Drosophila species.
ABSTRACT: Repeated motifs of amino acids within proteins are an abundant feature of eukaryotic sequences and may catalyze the rapid production of genetic and even phenotypic variation among organisms. The completion of the genome sequencing projects of 12 distinct Drosophila species provides a unique dataset for studying these intriguing sequence features on a phylogeny that spans a variety of timescales. We show that a higher percentage of proteins contain repeats within the Drosophila genus than in most other eukaryotes, including non-Drosophila insects, which makes this collection of species particularly useful for the study of protein repeats. We also find that proteins containing repeats are overrepresented in functional categories involving developmental processes, signaling, and gene regulation. Using the set of 1-to-1 ortholog alignments for the 12 Drosophila species, we test the ability of repeats to act as reliable phylogenetic signals and find that they resolve the generally accepted phylogeny despite the noise caused by their accelerated rate of evolution. We also determine that, in general, the position of repeats within a protein sequence is non-random, with repeats more often absent from the middle regions of sequences. Finally, we find evidence suggesting that the presence of repeats is associated with an increased evolutionary rate across the entire sequence in which they are embedded. Given additional evidence of a corresponding elevation in positive selection, we propose that some repeats may be inducing compensatory substitutions in their surrounding sequence.
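The percentage of repeat-containing proteins and the positional bias of repeats can be illustrated with a small sketch. It rests on two assumptions made purely for illustration, and the study's own repeat definition may differ. First, a "repeat" is taken to be a homopeptide run of at least `min_len` identical residues. Second, a repeat's position is summarized by its midpoint relative to protein length, binned into thirds. The function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def find_homopeptide_runs(seq: str, min_len: int = 5) -> List[Tuple[str, int, int]]:
    """Return (residue, start, end) for each run of >= min_len identical residues.

    Assumes min_len >= 2; `end` is exclusive, as in Python slicing.
    """
    # "(.)\1{k,}" matches one residue followed by at least k more copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), m.end()) for m in pattern.finditer(seq)]


def positional_bin(start: int, end: int, seq_len: int) -> str:
    """Assign a repeat to the N-terminal, middle or C-terminal third by its midpoint."""
    midpoint = (start + end) / 2 / seq_len
    if midpoint < 1 / 3:
        return "N-terminal third"
    if midpoint < 2 / 3:
        return "middle third"
    return "C-terminal third"


def fraction_with_repeats(proteins: Dict[str, str], min_len: int = 5) -> float:
    """Fraction of proteins containing at least one homopeptide run."""
    if not proteins:
        return 0.0
    hits = sum(1 for seq in proteins.values() if find_homopeptide_runs(seq, min_len))
    return hits / len(proteins)


if __name__ == "__main__":
    # Toy sequence: a poly-Q run at the start and a poly-S run at the end.
    seq = "QQQQQ" + "MKLTVGRDEA" + "SSSSSS"
    for residue, start, end in find_homopeptide_runs(seq):
        print(residue, start, end, positional_bin(start, end, len(seq)))
    print(fraction_with_repeats({"p1": seq, "p2": "MKLTVGRDEA"}))
```

Tallying bins over a whole proteome, and comparing the counts to a null expectation such as shuffled repeat positions, would be one way to ask whether the middle third is depleted.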
ABSTRACT: Protein sequences from Mycobacterium tuberculosis H37Rv (Mtb H37Rv) were analyzed to identify homopeptide repeat-containing proteins (HRCPs). Functional annotation showed that HRCPs are preferentially involved in cellular metabolism. Furthermore, these homopeptide repeats might play specific roles in protein-protein interaction. Repeat length differences among Bacteria, Archaea and Eukaryotes were calculated to assess how well the repeats are conserved across these divergent domains. The results showed that these repeats are more highly conserved in Bacteria and Archaea than in Eukaryotes. In addition, there appears to be a direct correlation between the repeat length difference and the degree of divergence between species. Our study supports the hypothesis that the presence of homopeptide repeats influences the rate of evolution of the protein sequences in which they are embedded. Thus, homopeptide repeats may have structural, functional and evolutionary implications for proteins.
Genomics Proteomics & Bioinformatics 08/2012; 10(4):217-25.
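A repeat length difference between two orthologous proteins can be computed in several ways. This minimal sketch makes an illustrative assumption: a homopeptide repeat's length is the longest uninterrupted run of a given residue, and the difference is the absolute gap between the two sequences. The function names are hypothetical, and the study's own procedure may differ.

```python
def longest_run(seq: str, residue: str) -> int:
    """Length of the longest uninterrupted run of `residue` in `seq`."""
    best = current = 0
    for aa in seq:
        if aa == residue:
            current += 1
            best = max(best, current)
        else:
            current = 0  # the run is broken by any other residue
    return best


def repeat_length_difference(seq_a: str, seq_b: str, residue: str) -> int:
    """Absolute difference in longest-run length between two orthologs."""
    return abs(longest_run(seq_a, residue) - longest_run(seq_b, residue))


if __name__ == "__main__":
    # Toy orthologs: one carries a 5-residue poly-A run, the other only 3.
    print(repeat_length_difference("MAAAGAAAAAK", "MAAAK", "A"))
```

Averaging such differences over many ortholog pairs for species pairs of increasing divergence would give the kind of series in which a correlation with divergence could be tested.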
ABSTRACT: Copy number polymorphisms of nucleotide tandem repeat (TR) regions, such as microsatellites and minisatellites, are mutationally reversible and highly abundant in eukaryotic genomes. Studies linking TR polymorphism to phenotypic variation have led some to suggest that TR variation modulates, and contributes substantially to, phenotypic variation; however, studies assessing the genome-wide impact of TR variation on phenotype are lacking. To address this question, we quantified relationships between polymorphism levels in 143 genome-wide promoter-region TRs across 16 isolates of the filamentous fungus Aspergillus flavus and its ecotype Aspergillus oryzae and the expression levels of their downstream genes. We found that only 4.3% of the relationships tested were significant; these findings were consistent with models in which TRs act as "tuning," "volume," or "optimality" "knobs" of phenotype, but not with "switch" models. Furthermore, the promoter regions of genes differentially expressed between A. oryzae and A. flavus did not show TR enrichment, suggesting that genome-wide differences in molecular phenotype between the two species are not significantly associated with TRs. Although in some cases TR polymorphisms do contribute to variation in transcript abundance, these results argue that, at least in this case, TRs might not be major modulators of phenotypic variation.
G3-Genes Genomes Genetics 12/2012; 2(12):1643-9. · 1.79 Impact Factor
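One way to quantify the relationship between a TR's copy number and its downstream gene's expression across isolates is a rank correlation. This sketch assumes Spearman's rho, since the abstract does not name the statistic. The helper names and toy values are hypothetical. Significance testing and the multiple-testing correction that a scan of 143 TRs would need are omitted.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the ranked values."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical copy numbers of one promoter TR and expression of its gene
    # in a handful of isolates (toy values, not data from the study).
    copies = [4, 6, 6, 8, 10]
    expression = [1.2, 1.9, 1.5, 2.4, 3.0]
    print(round(spearman(copies, expression), 3))
```

A rank-based statistic is used here because TR copy numbers are discrete and often tied, and no linear relationship with expression is assumed.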
ABSTRACT: BACKGROUND: Low-complexity regions (LCRs) in proteins are tracts that are highly enriched in one or a few amino acids. Given their high abundance and their capacity to expand in relatively short periods of time through replication slippage, they can contribute greatly to expanding protein sequence space and generating novel protein functions. However, little is known about the global impact of LCRs on protein evolution. RESULTS: We traced back the evolutionary history of 2,802 LCRs from a large set of homologous protein families from H. sapiens, M. musculus, G. gallus, D. rerio and C. intestinalis. Transcription factors and proteins with other regulatory functions are overrepresented among LCR-containing proteins. We found that the gain of novel LCRs is frequently associated with repeat expansion, whereas the loss of LCRs is more often due to the accumulation of amino acid substitutions than to deletions. This dichotomy results in a net gain of protein sequence over time. We detected a significant increase in the rate of accumulation of novel LCRs in the ancestral Amniota and mammalian branches, and a reduction in the chicken branch. Alanine- and/or glycine-rich LCRs are overrepresented in recently emerged LCR sets from all branches, suggesting that their expansion is better tolerated than that of other LCR types. LCRs enriched in positively charged amino acids show the opposite pattern, indicating an important effect of purifying selection in their maintenance. CONCLUSION: We have performed the first large-scale study of the evolutionary dynamics of LCRs in protein families. The study shows that the composition of an LCR is an important determinant of its evolutionary pattern.
BMC Evolutionary Biology 08/2012; 12(1):155. · 3.29 Impact Factor
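Tracts "highly enriched in one or a few amino acids" are commonly located by scanning for windows of low compositional complexity. This sketch is a simplified, SEG-inspired scan and is not the detection method used in the study. It assumes Shannon entropy (in bits) over a fixed window and merges overlapping low-entropy windows into intervals. The window size of 12 and threshold of 2.2 bits are illustrative choices.

```python
import math
from collections import Counter
from typing import List, Tuple


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of `window`."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_intervals(
    seq: str, window: int = 12, max_entropy: float = 2.2
) -> List[Tuple[int, int]]:
    """Merged (start, end) intervals, end exclusive, covered by windows whose
    entropy is at or below `max_entropy`."""
    intervals: List[Tuple[int, int]] = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= max_entropy:
            start, end = i, i + window
            if intervals and start <= intervals[-1][1]:
                # Overlaps the previous low-complexity interval: extend it.
                intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
            else:
                intervals.append((start, end))
    return intervals


if __name__ == "__main__":
    flank = "MKLTVGRDEWCHPNFYIS"  # 18 distinct residues, no alanine
    seq = flank + "A" * 20 + flank  # poly-A tract at positions 18-37
    print(low_complexity_intervals(seq))
```

Because whole windows are reported, the merged interval extends a few residues past each edge of the poly-A tract. Tools such as SEG add a separate extension threshold and a trimming step to sharpen these boundaries.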