Publications (9) · 94.99 Total impact
Dataset: nature06341-s1
Article: Evolutionary analysis of amino acid repeats across the genomes of 12 Drosophila species.
ABSTRACT: Repeated motifs of amino acids within proteins are an abundant feature of eukaryotic sequences and may catalyze the rapid production of genetic and even phenotypic variation among organisms. The completion of the genome sequencing projects of 12 distinct Drosophila species provides a unique dataset to study these intriguing sequence features on a phylogeny with a variety of timescales. We show that there is a higher percentage of proteins containing repeats within the Drosophila genus than most other eukaryotes, including non-Drosophila insects, which makes this collection of species particularly useful for the study of protein repeats. We also find that proteins containing repeats are overrepresented in functional categories involving developmental processes, signaling, and gene regulation. Using the set of 1-to-1 ortholog alignments for the 12 Drosophila species, we test the ability of repeats to act as reliable phylogenetic signals and find that they resolve the generally accepted phylogeny despite the noise caused by their accelerated rate of evolution. We also determine that in general the position of repeats within a protein sequence is non-random, with repeats more often being absent from the middle regions of sequences. Finally, we find evidence to suggest that the presence of repeats is associated with an increase in evolutionary rate upon the entire sequence in which they are embedded. With additional evidence to suggest a corresponding elevation in positive selection, we propose that some repeats may be inducing compensatory substitutions in their surrounding sequence.
Molecular Biology and Evolution 01/2008; 24(12):2598-609. · 5.55 Impact Factor
Article: Evolution of genes and genomes on the Drosophila phylogeny.
ABSTRACT: Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.
Nature 12/2007; 450(7167):203-18. · 36.28 Impact Factor
Article: Selection and slippage creating serine homopolymers.
ABSTRACT: Highly repetitive sequence within proteins is an abundant feature yet is considered by some to be the protein equivalent of "junk DNA." Homopolymer sequences, the most highly repetitive of this group, are typically encoded by trinucleotide repeats at the DNA level. It is thought that many of these sequences are produced by a replicative slippage mechanism. Recent studies suggest that these highly mutable regions within proteins may allow for rapid morphological evolution emerging from the increased variability afforded by such coding structures. However, in a homopolymer, it is difficult to determine if the repeated amino acid is due to slippage at the DNA level or due to selection at the protein level. Here we develop and test a model to detect cases for which the homopolymer tract has clearly been selected for, with no evidence of slippage at the DNA level. The polyserine tract within the phosphatidylserine receptor protein is used as an excellent example of one such case.
Molecular Biology and Evolution 12/2006; 23(11):2017-25. · 5.55 Impact Factor
Article: A genomic comparison of faster-sex, faster-X, and faster-male evolution between Drosophila melanogaster and Drosophila pseudoobscura.
ABSTRACT: A genomic comparison of Drosophila melanogaster and Drosophila pseudoobscura provides a unique opportunity to investigate factors involved in sequence divergence. The chromosomal arrangements of these species include an autosomal segment in D. melanogaster which is homologous to part of the X chromosome in D. pseudoobscura. Using orthologues to calculate rates of nonsynonymous (d(N)) substitutions, we found genes on the X chromosome to be significantly more diverged than those on the autosomes, but this is not true for segment 3L-XR, which is autosomal in D. melanogaster (3L) and X-linked in D. pseudoobscura (XR). We also found that the median d(N) values for genes having reproductive functions in either the male, the female, or both sexes are higher than those for sequences without reproductive function and even higher for sequences involved in male-specific function. These estimates of divergence for male sex-related sequences are most likely underestimates, as the very rapidly evolving reproductive genes would tend to lose homology sooner and thus not be included in the comparison of orthologues. We also noticed a high proportion of male reproductive genes among the orthologous genes with the highest rates of d(N). Reproductive genes with and without an orthologue in D. pseudoobscura were compared among D. melanogaster, D. simulans, and D. yakuba, and it was found that there were in fact higher rates of divergence in the group without a D. pseudoobscura orthologue. These results, from widely separated taxa, bolster the thesis that sexual system genes experience accelerated rates of change in comparison to nonsexual genes in evolution and speciation.
Journal of Molecular Evolution 07/2006; 62(6):693-700. · 2.27 Impact Factor
Article: Simple sequence in brain and nervous system specific proteins.
ABSTRACT: We examined sequences expressed in the brain and nervous system using EST data. A previous study including sequences thought to have neurological function found a deficiency of simple sequence within such sequences. This was despite many examples of neurodegenerative diseases, such as Huntington disease, which are thought to be caused by expansions of polyglutamine tracts within associated protein sequences. It may be that many of the sequences thought to have neurological function have other additional, non-neurological roles. For this reason, we examined sequences with specific expression in the brain and nervous system, using EST expression data to determine if they too are deficient in simple, repetitive sequences. Indeed, we find this class of sequences to be deficient. Unexpectedly, however, we find sequences expressed in the brain and nervous system to be consistently enriched for histidine-enriched simple sequence. Determining the function of these histidine-rich regions within brain-specific proteins requires more experimental data.
Genome 05/2005; 48(2):291-301. · 1.65 Impact Factor
Article: Neurological proteins are not enriched for repetitive sequences.
ABSTRACT: Proteins associated with disease and development of the nervous system are thought to contain repetitive, simple sequences. However, genome-wide surveys for simple sequences within proteins have revealed that repetitive peptide sequences are the most frequent shared peptide segments among eukaryotic proteins, including those of Saccharomyces cerevisiae, which has few to no specialized developmental and neurological proteins. It is therefore of interest to determine if these specialized proteins have an excess of simple sequences when compared to other sets of compositionally similar proteins. We have determined the relative abundance of simple sequences within neurological proteins and find no excess of repetitive simple sequence within this class. In fact, polyglutamine repeats that are associated with many neurodegenerative diseases are no more abundant within neurological specialized proteins than within nonneurological collections of proteins. We also examined the codon composition of serine homopolymers to determine what forces may play a role in the evolution of extended homopolymers. Codon type homogeneity tends to be favored, suggesting replicative slippage instead of selection as the main force responsible for producing these homopolymers.
Genetics 04/2004; 166(3):1141-54. · 4.01 Impact Factor
Article: Simple sequences are rare in the Protein Data Bank.
ABSTRACT: Simple sequence is abundant in the proteins that have been sequenced to date. But unusual protein features, such as simple sequence, are not present in the same high frequency within structural databases. A subset of these simple sequences, a group with a highly repetitive nature, has been shown to be abundant in eukaryotes but not in prokaryotes. In this study, an examination of the eukaryotic proteins in the Protein Data Bank (PDB) has revealed a large deficiency of low-complexity, highly repetitive protein repeats. Through simulated databases of similar samples of eukaryotic proteins taken from the National Center for Biotechnology Information (NCBI) database, it is shown that the PDB contains significantly less highly repetitive simple sequence than artificial databases of similar composition randomly derived from NCBI. When the structural data for those few PDB sequences that did contain highly repetitive simple sequence is examined in detail, it is found that in most cases the tertiary structure is unknown for the regions consisting of simple sequence. This lack of simple sequence both in the PDB database and in the structural information suggests that this type of simple sequence may produce disordered structures that make structural characterization difficult.
Proteins: Structure, Function, and Bioinformatics 08/2002; 48(1):134-40. · 3.39 Impact Factor
Top Journals
- Molecular Biology and Evolution (2)
- Genetics (1)
- Journal of Molecular Evolution (1)
- Nature (1)
- Genome (1)
Institutions
- 2008: Cornell University, Department of Molecular Biology and Genetics, Ithaca, NY, USA
- 2002–2005: McMaster University, Department of Biology, Hamilton, Ontario, Canada