ABSTRACT: Genetic recombination associated with sexual reproduction increases the efficiency of natural selection by reducing the strength of Hill-Robertson interference. Such interference can be caused either by selective sweeps of positively selected alleles, or by background selection against deleterious mutations. Its consequences can be studied by comparing patterns of molecular evolution and variation in genomic regions with different rates of crossing over. We carried out a comprehensive study of the benefits of recombination in Drosophila melanogaster, both by contrasting five independent genomic regions that lack crossing over with the rest of the genome and by comparing regions with different rates of crossing over, using data on DNA sequence polymorphisms from an African population that is geographically close to the putatively ancestral population for the species, and on sequence divergence from a related species. We observed reductions in sequence diversity in non-crossover regions that are inconsistent with the effects of hard selective sweeps in the absence of recombination. Overall, the observed patterns suggest that the recombination rate experienced by a gene is positively related to the efficiency of both positive and purifying selection. The results are consistent with a background selection model with interference among selected sites in non-crossover regions, and with joint effects of background selection, selective sweeps and a past population expansion on variability in regions of the genome that experience crossing over. In such crossover regions, the X chromosome exhibits a higher rate of adaptive protein sequence evolution than the autosomes, implying a Faster-X effect.
Molecular Biology and Evolution 01/2014; · 14.31 Impact Factor
ABSTRACT: The causes of the large effect of the X chromosome in reproductive isolation and speciation have long been debated. Charlesworth et al. (1987) demonstrated that X-linked loci are expected to have higher rates of adaptive evolution than autosomal loci if new beneficial mutations are on average recessive. Reproductive isolation should therefore evolve faster when contributing loci are located on the X chromosome (the faster-X hypothesis). In this study, we have analysed genome-wide nucleotide polymorphism data from the house mouse subspecies Mus musculus castaneus and nucleotide divergence from Mus famulus and Rattus norvegicus to compare rates of adaptive evolution for autosomal and X-linked protein-coding genes. We found significantly faster adaptive evolution for X-linked loci, particularly for genes with expression in male-specific tissues, but autosomal and X-linked genes with expression in female-specific tissues evolve at similar rates. We also estimated rates of adaptive evolution for genes expressed during spermatogenesis, and found that X-linked genes that escape meiotic sex chromosome inactivation (MSCI) show rapid adaptive evolution. Our results suggest that faster-X adaptive evolution is due either to net recessivity of new advantageous mutations or to a special gene content of the X chromosome that regulates male function and spermatogenesis. We discuss how our results help to explain the large effect of the X chromosome in speciation.
ABSTRACT: Modern biological science generates a vast amount of data, the analysis of which presents a major challenge to researchers. Data are commonly represented in tables stored as plain text files and require line-by-line parsing for analysis, which is time consuming and error prone. Furthermore, there is no simple means of indexing these files so that rows containing particular values can be quickly found.
We introduce a new data format and software library called wormtable, which provides efficient access to tabular data in Python. Wormtable stores data in a compact binary format, provides random access to rows, and enables sophisticated indexing on columns within these tables. Files written in existing formats can be easily converted to wormtable format, and we provide conversion utilities for the VCF and GTF formats.
Wormtable's simple API allows users to process large tables orders of magnitude more quickly than is possible when parsing text. Furthermore, the indexing facilities provide efficient access to subsets of the data along with useful methods of summarising columns. Since third-party libraries or custom code are no longer needed to parse complex plain text formats, analysis code can also be substantially simpler, as well as uniform across different data formats. These benefits of reduced code complexity and greatly increased performance allow users much greater freedom to explore their data.
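The abstract's central idea, that fixed-width binary rows allow random access by seeking rather than line-by-line parsing, and that a sorted column index allows range queries, can be illustrated with a small sketch. It uses only Python's standard `struct` and `bisect` modules. The row layout (`chrom`, `pos`, `qual`) and every function name are hypothetical; this is NOT wormtable's file format or API.

```python
import bisect
import io
import struct

# Hypothetical fixed-width row: uint8 chrom, uint32 pos, float32 qual.
# "<" means little-endian with no padding, so each row is exactly 9 bytes.
ROW = struct.Struct("<BIf")

def write_rows(f, rows):
    """Append (chrom, pos, qual) tuples to a binary file object as packed rows."""
    for row in rows:
        f.write(ROW.pack(*row))

def read_row(f, i):
    """Random access: seek straight to row i instead of parsing preceding rows."""
    f.seek(i * ROW.size)
    data = f.read(ROW.size)
    if len(data) != ROW.size:
        raise IndexError(i)
    return ROW.unpack(data)

def build_pos_index(f):
    """Toy column index: (pos, row_number) pairs sorted by position."""
    f.seek(0)
    records = ROW.iter_unpack(f.read())
    return sorted((pos, i) for i, (_, pos, _) in enumerate(records))

def rows_in_range(index, start, stop):
    """Row numbers whose pos lies in [start, stop), found by binary search."""
    keys = [pos for pos, _ in index]
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, stop)
    return [i for _, i in index[lo:hi]]

def demo():
    f = io.BytesIO()  # stands in for a file opened with open(path, "w+b")
    write_rows(f, [(1, 500, 30.0), (1, 100, 20.0), (2, 300, 10.0)])
    return read_row(f, 2), rows_in_range(build_pos_index(f), 100, 400)
```

Here `demo()` returns `((2, 300, 10.0), [1, 2])`. A real tool would store the index on disk and handle variable-length columns; the sketch keeps only the seek-by-offset and binary-search ideas.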
ABSTRACT: The contribution of regulatory versus protein change to adaptive evolution has long been controversial. In principle, the rate and strength of adaptation within functional genetic elements can be quantified on the basis of an excess of nucleotide substitutions between species compared to the neutral expectation or from effects of recent substitutions on nucleotide diversity at linked sites. Here, we infer the nature of selective forces acting in proteins, their UTRs and conserved noncoding elements (CNEs) using genome-wide patterns of diversity in wild house mice and divergence to related species. By applying an extension of the McDonald-Kreitman test, we infer that adaptive substitutions are widespread in protein-coding genes, UTRs and CNEs, and we estimate that there are at least four times as many adaptive substitutions in CNEs and UTRs as in proteins. We observe pronounced reductions in mean diversity around nonsynonymous sites (whether or not they have experienced a recent substitution). This can be explained by selection on multiple, linked CNEs and exons. We also observe substantial dips in mean diversity (after controlling for divergence) around protein-coding exons and CNEs, which can also be explained by the combined effects of many linked exons and CNEs. A model of background selection (BGS) can adequately explain the reduction in mean diversity observed around CNEs. However, BGS fails to explain the wide reductions in mean diversity surrounding exons (encompassing ∼100 kb, on average), implying that there is a substantial role for adaptation within exons or closely linked sites. The wide dips in diversity around exons, which are hard to explain by BGS, suggest that the fitness effects of adaptive amino acid substitutions could be substantially larger than substitutions in CNEs.
We conclude that although there appear to be many more adaptive noncoding changes, substitutions in proteins may dominate phenotypic evolution.
ABSTRACT: We employed deep genome sequencing of two parents and 12 of their offspring to estimate the mutation rate per site per generation in a full-sib family of Drosophila melanogaster recently sampled from a natural population. Sites that were homozygous for the same allele in the parents and heterozygous in one or more offspring were categorized as candidate mutations and subjected to detailed analysis. In 1.23 × 10⁹ callable sites from 12 individuals, we confirmed six single-nucleotide mutations. We estimated the false negative rate in the experiment by generating synthetic mutations using the empirical distributions of numbers of non-reference bases at heterozygous sites in the offspring. The proportion of synthetic mutations at callable sites that we failed to detect was less than 1%, implying that the false negative rate was extremely low. Our estimate of the point mutation rate is 2.8 × 10⁻⁹ (95% confidence interval = 1.0 × 10⁻⁹ to 6.1 × 10⁻⁹) per site per generation, which is at the low end of the range of previous estimates, and suggests an effective population size for the species of ~1.4 × 10⁶. At one site, point mutations were present in two individuals, indicating that there had been a premeiotic mutation cluster, although surprisingly one individual had a G→A transition and the other a G→T transversion, possibly associated with error-prone mismatch repair. We also detected three short deletion mutations and no insertions, giving a deletion mutation rate of 1.2 × 10⁻⁹ (95% confidence interval = 0.7 × 10⁻⁹ to 11 × 10⁻⁹).
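The step from a mutation rate to an effective population size presumably rests on the standard neutral relation θ = 4N_eμ for diploids, with θ estimated by neutral nucleotide diversity π. The abstract does not state the diversity value used, so the π below is an illustrative number back-calculated from the quoted μ and N_e; treat it as an assumption, not a reported result.

```python
def effective_population_size(pi, mu):
    """N_e = pi / (4 * mu), assuming neutral equilibrium so that pi ≈ theta = 4 * N_e * mu."""
    return pi / (4.0 * mu)

mu = 2.8e-9          # point mutation rate per site per generation (from the abstract)
pi_neutral = 0.0157  # ILLUSTRATIVE neutral diversity, back-calculated; not from the abstract
print(f"{effective_population_size(pi_neutral, mu):.2e}")  # prints 1.40e+06
```

Because N_e scales as 1/μ, the wide confidence interval on μ translates directly into a wide interval on N_e.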
ABSTRACT: Sequencing errors and random sampling of nucleotide types among sequencing reads at heterozygous sites present challenges for accurate, unbiased inference of single-nucleotide polymorphism genotypes from high-throughput sequence data. Here, we develop a maximum-likelihood approach to estimate the frequency distribution of the number of alleles in a sample of individuals (the site frequency spectrum), using high-throughput sequence data. Our method assumes binomial sampling of nucleotide types in heterozygotes and random sequencing error. By simulations, we show that close to unbiased estimates of the site frequency spectrum can be obtained if the error rate per base read does not exceed the population nucleotide diversity. We also show that these estimates are reasonably robust if errors are nonrandom. We then apply the method to infer site frequency spectra for zerofold degenerate, fourfold degenerate, and intronic sites of protein-coding genes using the low coverage human sequence data produced by the 1000 Genomes Project phase-one pilot. By fitting a model to the inferred site frequency spectra that estimates parameters of the distribution of fitness effects of new mutations, we find evidence for significant natural selection operating on fourfold sites. We also find that a model with variable effects of mutations at synonymous sites fits the data significantly better than a model with equal mutational effects. Under the variable effects model, we infer that 11% of synonymous mutations are subject to strong purifying selection.
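The two assumptions named in the abstract, binomial sampling of nucleotide types in heterozygotes and random sequencing error, can be combined into a likelihood for the site frequency spectrum. The sketch below is in the spirit of that approach but is not the authors' implementation: it assumes a simplified two-allele error model with per-read error rate `eps`, sums over genotype configurations by dynamic programming, and fits the SFS by EM. All function names are hypothetical.

```python
from math import comb

def genotype_likelihoods(n_reads, k_alt, eps):
    """P(k_alt non-reference reads of n_reads | g alt copies), for g = 0, 1, 2.

    Each read samples one of the two chromosomes at random (binomial sampling)
    and is misread with probability eps (simplified two-allele error model).
    """
    out = []
    for g in range(3):
        p_alt = (g / 2) * (1 - eps) + (1 - g / 2) * eps
        out.append(comb(n_reads, k_alt) * p_alt**k_alt * (1 - p_alt) ** (n_reads - k_alt))
    return out

def allele_count_likelihoods(per_individual):
    """P(reads | j alt alleles among the 2n sampled chromosomes), j = 0..2n.

    Sums over genotype vectors with total j; each vector is weighted by
    prod C(2, g_i) / C(2n, j), its probability when j alt chromosomes are
    assigned at random to individuals.
    """
    h = [1.0]
    for lik in per_individual:
        new = [0.0] * (len(h) + 2)
        for j, hj in enumerate(h):
            for g in range(3):
                new[j + g] += hj * comb(2, g) * lik[g]
        h = new
    two_n = len(h) - 1
    return [hj / comb(two_n, j) for j, hj in enumerate(h)]

def estimate_sfs(sites, n_iter=200):
    """EM estimate of SFS proportions f_j from per-site allele-count likelihoods."""
    m = len(sites[0])
    f = [1.0 / m] * m
    for _ in range(n_iter):
        acc = [0.0] * m
        for lik in sites:
            w = [fj * lj for fj, lj in zip(f, lik)]
            tot = sum(w)
            for j in range(m):
                acc[j] += w[j] / tot  # posterior probability that this site has count j
        f = [a / len(sites) for a in acc]
    return f
```

For example, two individuals each with 10 reference reads at a site give allele-count likelihoods dominated by j = 0, and `estimate_sfs` on many such sites puts almost all weight on the monomorphic class.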
ABSTRACT: There are many more selectively constrained noncoding than coding nucleotides in the mammalian genome, but most mammalian noncoding DNA is subject to weak selection, on average. One of the most striking discoveries to have emerged from comparisons among mammalian genomes is the hundreds of noncoding elements of more than 200 bp in length that show absolute conservation among mammalian orders. These elements represent the tip of the iceberg of a much larger class of conserved noncoding elements (CNEs). Much evidence suggests that CNEs are selectively constrained and not mutational cold-spots, and there is evidence that some CNEs play a role in the regulation of development. Here, we quantify negative and positive selection acting in murine CNEs by analyzing within-species nucleotide variation and between-species divergence of CNEs that we identified using a phylogenetically independent comparison. The distribution of fitness effects of new mutations in CNEs, inferred from within-species polymorphism, suggests that CNEs receive a higher number of strongly selected deleterious mutations and many fewer nearly neutral mutations than amino acid sites of protein-coding genes or regulatory elements close to genes. However, we also show that CNEs experience a far higher proportion of adaptive substitutions than any known category of genomic sites in murids. The absolute rate of adaptation of CNEs is similar to that of amino acid sites of proteins. This result suggests that there is widespread adaptation in mammalian conserved noncoding DNA elements, some of which have been implicated in the regulation of crucially important processes, including development.
Molecular Biology and Evolution 04/2011; 28(9):2651-60. · 14.31 Impact Factor
ABSTRACT: We develop an inference method that uses approximate Bayesian computation (ABC) to simultaneously estimate mutational parameters and selective constraint on the basis of nucleotide divergence for protein-coding genes between pairs of species. Our simulations explicitly model CpG hypermutability and transition vs. transversion mutational biases along with negative and positive selection operating on synonymous and nonsynonymous sites. We evaluate the method by simulations in which true mean parameter values are known and show that it produces reasonably unbiased parameter estimates as long as sequences are not too short and sequence divergence is not too low. We show that the use of quadratic regression within ABC offers an improvement over linear regression, but that weighted regression has little impact on the efficiency of the procedure. We apply the method to estimate mutational and selective constraint parameters in data sets of protein-coding genes extracted from the genome sequences of primates, murids, and carnivores. Estimates of CpG hypermutability are substantially higher in primates than murids and carnivores. Nonsynonymous site selective constraint is substantially higher in murids and carnivores than primates, and autosomal nonsynonymous constraint is higher than X-chromosome constraint in all taxa. We detect significant selective constraint at synonymous sites in primates, carnivores, and murid rodents. Synonymous site selective constraint is weakest in murids, a surprising result, considering that murid effective population sizes are likely to be considerably higher than those of the other two taxa.
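The regression step mentioned in the abstract can be illustrated with a generic rejection-ABC sketch followed by a local regression adjustment (linear, or quadratic with `degree=2`), using Epanechnikov kernel weights. This is a toy with one parameter and one summary statistic (a divergence proportion simulated as binomial over 10,000 sites with a uniform prior); the model, prior, tolerance, and names are hypothetical and far simpler than the multi-parameter model described above.

```python
import numpy as np

def abc_regression(s_obs, simulate, prior_draw, n_sims=20000,
                   accept_frac=0.05, degree=1, seed=0):
    """Rejection ABC plus local regression adjustment of accepted draws.

    degree=1 gives a linear adjustment; degree=2 adds a quadratic term.
    Returns adjusted parameter draws and their Epanechnikov kernel weights.
    """
    rng = np.random.default_rng(seed)
    theta = prior_draw(rng, n_sims)
    s = simulate(rng, theta)
    dist = np.abs(s - s_obs)
    n_keep = int(n_sims * accept_frac)
    keep = np.argsort(dist)[:n_keep]          # accept the closest simulations
    th, d = theta[keep], s[keep] - s_obs
    w = 1.0 - (dist[keep] / dist[keep].max()) ** 2
    # Weighted least squares: theta ~ b0 + b1*d (+ b2*d^2)
    X = np.column_stack([d**p for p in range(degree + 1)])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], th * sw, rcond=None)
    adjusted = th - X[:, 1:] @ beta[1:]       # project each draw onto s = s_obs
    return adjusted, w

L = 10_000  # hypothetical number of aligned sites
post, w = abc_regression(
    s_obs=0.05,
    simulate=lambda rng, p: rng.binomial(L, p) / L,
    prior_draw=lambda rng, n: rng.uniform(0.0, 0.2, n),
)
print(np.average(post, weights=w))  # posterior mean of the divergence parameter
```

Setting all weights to 1 turns this into unweighted regression, which makes it easy to compare the weighted and unweighted variants the abstract discusses.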
ABSTRACT: During the past two decades, evidence has accumulated of adaptive evolution within protein-coding genes in a variety of species. However, with the exception of Drosophila and humans, little is known about the extent of adaptive evolution in noncoding DNA. Here, we study regions upstream and downstream of protein-coding genes in the house mouse Mus musculus castaneus, a species that has a much larger effective population size (N_e) than humans. We analyze polymorphism data for 78 genes from 15 wild-caught M. m. castaneus individuals and divergence to a closely related species, Mus famulus. We find high levels of nucleotide diversity and moderate levels of selective constraint in upstream and downstream regions compared with nonsynonymous sites of protein-coding genes. From the polymorphism data, we estimate the distribution of fitness effects (DFE) of new mutations and infer that most new mutations in upstream and downstream regions behave as effectively neutral and that only a small fraction is strongly negatively selected. We also estimate the fraction of substitutions that have been driven to fixation by positive selection (α) and the ratio of adaptive to neutral divergence (ω_α). We find that α for upstream and downstream regions (∼10%) is much lower than α for nonsynonymous sites (∼50%). However, ω_α estimates are very similar for nonsynonymous sites (∼10%) and upstream and downstream regions (∼5%). We conclude that negative selection operating in upstream and downstream regions of M. m. castaneus is weak and that the low values of α for upstream and downstream regions relative to nonsynonymous sites are most likely due to the presence of a higher proportion of neutrally evolving sites and not due to lower absolute rates of adaptive substitution.
Molecular Biology and Evolution 11/2010; 28(3):1183-91. · 14.31 Impact Factor
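The quantities α and ω_α can be illustrated with the simplest McDonald-Kreitman-style estimator, which uses counts of selected (n) and neutral (s) divergences D and polymorphisms P. This naive version ignores slightly deleterious polymorphisms, which the DFE-based approach in these papers corrects for, so it is only a sketch of the definitions. The counts and site numbers below are hypothetical.

```python
def mk_alpha(Dn, Ds, Pn, Ps):
    """Naive MK estimate of the adaptive fraction: alpha = 1 - (Ds * Pn) / (Dn * Ps)."""
    return 1.0 - (Ds * Pn) / (Dn * Ps)

def omega_alpha(alpha, Dn, Ds, Ln, Ls):
    """Adaptive divergence relative to neutral divergence: alpha * (Dn / Ln) / (Ds / Ls).

    Ln and Ls are the numbers of selected and neutral sites compared.
    """
    return alpha * (Dn / Ln) / (Ds / Ls)

# Hypothetical counts, chosen only to illustrate the arithmetic
a = mk_alpha(Dn=100, Ds=200, Pn=50, Ps=200)
print(a, omega_alpha(a, Dn=100, Ds=200, Ln=3000, Ls=1000))
```

The example shows why α and ω_α can disagree: with α = 0.5 but strong constraint (per-site divergence ratio 1/6), ω_α is only about 0.083. A class of sites with many neutrally evolving positions has a higher divergence ratio and can reach a similar ω_α with a much lower α.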
ABSTRACT: The relative contributions of neutral and adaptive substitutions to molecular evolution have been one of the most controversial issues in evolutionary biology for more than 40 years. The analysis of within-species nucleotide polymorphism and between-species divergence data supports a widespread role for adaptive protein evolution in certain taxa. For example, estimates of the proportion of adaptive amino acid substitutions (α) are 50% or more in enteric bacteria and Drosophila. In contrast, recent estimates of α for hominids have been at most 13%. Here, we estimate α for protein sequences of murid rodents based on nucleotide polymorphism data from multiple genes in a population of the house mouse subspecies Mus musculus castaneus, which inhabits the ancestral range of the Mus species complex, and on nucleotide divergence between M. m. castaneus and M. famulus or the rat. We estimate that 57% of amino acid substitutions in murids have been driven by positive selection. Hominids, therefore, are exceptional in having low apparent levels of adaptive protein evolution. The high frequency of adaptive amino acid substitutions in wild mice is consistent with their large effective population size, leading to effective natural selection at the molecular level. Effective natural selection also manifests itself as a paucity of effectively neutral nonsynonymous mutations in M. m. castaneus compared to humans.
ABSTRACT: Mutation accumulation (MA) experiments, in which mutations are allowed to drift to fixation in inbred lines, have been a principal way of studying the rates and properties of new spontaneous mutations. Phenotypic assays of MA lines inform us about the nature of new mutational variation for quantitative traits and provide estimates of the genomic rate and the distribution of effects of new mutations. Parameter estimates compared for a range of species suggest that the genomic mutation rate varies by several orders of magnitude and that the distribution of effects tends to be dominated by large-effect mutations. Some experiments suggest synergistic interactions between the effects of spontaneous deleterious mutations, whereas others do not. There is little reliable information on the distribution of dominance effects of new mutations. Most evidence does not suggest strong dependency of the effects of new mutations on the environment. Information from phenotypic assays has recently been augmented by direct molecular estimates of the mutation rate.
Annual Review of Ecology Evolution and Systematics 12/2009; 40:151-72. · 10.98 Impact Factor
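A classic way of turning phenotypic MA data into estimates of the genomic mutation rate and mean mutational effect is the Bateman-Mukai method, sketched here as an illustration; the review itself may emphasise other estimators. If every mutation has the same effect a, the per-generation decline in the mean is ΔM = U·a and the increase in among-line variance is ΔV = U·a², so U = ΔM²/ΔV and a = ΔV/ΔM. With unequal effects, these become a lower bound on U and an upper bound on a. The input values are hypothetical, and conventions for haploid versus diploid U differ between studies.

```python
def bateman_mukai(delta_mean, delta_var):
    """Bateman-Mukai estimates from mutation-accumulation data.

    delta_mean: per-generation change in the trait mean (sign ignored)
    delta_var:  per-generation increase in the among-line variance
    Returns (U_min, a_max): a lower bound on the genomic mutation rate and an
    upper bound on the mean mutational effect; both are exact if effects are equal.
    """
    dm = abs(delta_mean)
    return dm**2 / delta_var, delta_var / dm

# Hypothetical MA summary: fitness declines by 1% per generation and the
# among-line variance grows by 0.0005 per generation.
U_min, a_max = bateman_mukai(-0.01, 0.0005)
print(U_min, a_max)
```

Because U_min is a lower bound, the method tends to underestimate U when effects are highly variable, which is one reason phenotypic and molecular estimates of the mutation rate can disagree.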
ABSTRACT: Contrary to the classical view, a large amount of non-coding DNA seems to be selectively constrained in Drosophila and other species. Here, using Drosophila miranda BAC sequences and the Drosophila pseudoobscura genome sequence, we aligned coding and non-coding sequences between D. pseudoobscura and D. miranda, and investigated their patterns of evolution. We found two patterns that have previously been observed in comparisons between Drosophila melanogaster and its relatives. First, there is a negative correlation between intron divergence and intron length, suggesting that longer non-coding sequences may contain more regulatory elements than shorter sequences. Our other main finding is a negative correlation between the rate of non-synonymous substitutions (dN) and codon usage bias (Fop), showing that fast-evolving genes have a lower codon usage bias, consistent with strong positive selection interfering with weak selection for codon usage.
Journal of Molecular Evolution 10/2009; 69(6):601-11. · 1.86 Impact Factor
ABSTRACT: Protein-coding sequences make up only about 1% of the mammalian genome. Much of the remaining 99% has long been assumed to be junk DNA, with little or no functional significance. Here, we show that in hominids, a group with historically low effective population sizes, all classes of noncoding DNA evolve more slowly than ancestral transposable elements and so appear to be subject to significant evolutionary constraints. Under the nearly neutral theory, we expected to see lower levels of selective constraints on most sequence types in hominids than murids, a group that is thought to have a higher effective population size. We found that this is the case for many sequence types examined, the most extreme example being 5'UTRs, for which constraint in hominids is only about one-third that of murids. Surprisingly, however, we observed higher constraints for some sequence types in hominids, notably 4-fold sites, where constraint is more than twice as high as in murids. This implies that more than about one-fifth of mutations at 4-fold sites are effectively selected against in hominids. The higher constraint at 4-fold sites in hominids suggests a more complex protein-coding gene structure than in murids and indicates that methods for detecting selection on protein-coding sequences (e.g., using the dN/dS ratio), with 4-fold sites as a neutral standard, may lead to biased estimates, particularly in hominids. Our constraint estimates imply that 5.4% of nucleotide sites in the human genome are subject to effective negative selection and that there are three times as many constrained sites within noncoding sequences as within protein-coding sequences. Including coding and noncoding sites, we estimate that the genomic deleterious mutation rate is U = 4.2. The mutational load predicted under a multiplicative model is therefore about 99% in hominids.
Molecular Biology and Evolution 09/2009; 27(1):177-92. · 14.31 Impact Factor
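Two calculations in this abstract can be written out explicitly. Selective constraint is commonly defined as C = 1 − d_sel/d_neu, the fraction of mutations removed by selection relative to a neutral reference (here, ancestral transposable elements). Under multiplicative fitness effects, mean fitness relative to a mutation-free genotype is e^(−U), so the load is 1 − e^(−U). The divergence values in the usage lines are hypothetical; U = 4.2 is the estimate from the abstract.

```python
import math

def constraint(d_selected, d_neutral):
    """Selective constraint C = 1 - d_selected / d_neutral."""
    return 1.0 - d_selected / d_neutral

def multiplicative_load(U):
    """Mutational load 1 - exp(-U) for genomic deleterious mutation rate U."""
    return 1.0 - math.exp(-U)

# Hypothetical divergences: 4-fold sites diverging at 78% of the neutral rate
print(round(constraint(0.78, 1.0), 2))       # prints 0.22
print(round(multiplicative_load(4.2), 3))    # prints 0.985
```

A constraint above 0.2 corresponds to the statement that more than about one-fifth of 4-fold mutations are effectively selected against, and e^(−4.2) ≈ 0.015 gives the load of roughly 99% quoted above.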
ABSTRACT: Variation from new mutations is important for several questions in quantitative genetics. Key parameters are the genomic mutation rate and the distribution of effects of mutations (DEM), which determine the amount of new quantitative variation that arises per generation from mutation (V_M). Here, we review methods and empirical results concerning mutation accumulation (MA) experiments that have shed light on properties of mutations affecting quantitative traits. Surprisingly, most data on fitness traits from laboratory assays of MA lines indicate that the DEM is platykurtic in form (i.e., substantially less leptokurtic than an exponential distribution), and imply that most variation is produced by mutations of moderate to large effect. This finding contrasts with results from MA or mutagenesis experiments in which mutational changes to the DNA can be assayed directly, which imply that the vast majority of mutations have very small phenotypic effects, and that the distribution has a leptokurtic form. We compare these findings with recent approaches that attempt to infer the DEM for fitness based on comparing the frequency spectra of segregating nucleotide polymorphisms at putatively neutral and selected sites in population samples. When applied to data for humans and Drosophila, these analyses also indicate that the DEM is strongly leptokurtic. However, by combining the resultant estimates of parameters of the DEM with estimates of the mutation rate per nucleotide, the predicted V_M for fitness is only a tiny fraction of V_M observed in MA experiments. This discrepancy can be explained if we postulate that a few deleterious mutations of large effect contribute most of the mutational variation observed in MA experiments and that such mutations segregate at very low frequencies in natural populations, and effectively are never seen in population samples.
ABSTRACT: Sex allocation theory has proved extremely successful at predicting when individuals should adjust the sex of their offspring in response to environmental conditions. However, we know rather little about the underlying genetics of sex ratio or how genetic architecture might constrain adaptive sex-ratio behavior. We examined how mutation influenced genetic variation in the sex ratios produced by the parasitoid wasp Nasonia vitripennis. In a mutation accumulation experiment, we determined the mutability of sex ratio, and compared this with the amount of genetic variation observed in natural populations. We found that the mutability (h²ₘ) ranges from 0.001 to 0.002, similar to estimates for life-history traits in other organisms. These estimates suggest one mutation every 5–60 generations, each shifting the sex ratio by approximately 0.01 (proportion males). In this and other studies, the genetic variation in N. vitripennis sex ratio ranged from 0.02 to 0.17 (broad-sense heritability, H²). If sex ratio is maintained by mutation-selection balance, a higher genetic variance would be expected given our mutational parameters. Instead, the observed genetic variance perhaps suggests additional selection against sex-ratio mutations with deleterious effects on other fitness traits as well as sex ratio (i.e., pleiotropy), as has been argued to be the case more generally.
ABSTRACT: Interspecies divergence of orthologous transposable element remnants is often assumed to be simply due to genetic drift of neutral mutations that occurred after the divergence of the species. However, divergence may also be affected by other factors, such as variation in the mutation rate, ancestral polymorphisms, or selection. Here we attempt to determine the impact of these forces on divergence of three classes of sites that are often assumed to be selectively unconstrained (INE-1 TE remnants, sites within short introns, and fourfold degenerate sites) in two different pairwise comparisons of Drosophila (D. melanogaster vs. D. simulans and D. simulans vs. D. sechellia). We find that divergence of these three classes of sites is strongly influenced by the recombination environment in which they are located, and this is especially true for the closer D. simulans vs. D. sechellia comparison. We suggest that this is mainly a result of the contribution of ancestral polymorphisms in different recombination regions. We also find that intergenic INE-1 elements are significantly more diverged than intronic INE-1 in both pairwise comparisons, implying the presence of either negative selection or lower mutation rates in introns. Furthermore, we show that substitution rates in INE-1 elements are not associated with the length of the noncoding sequence in which they are located, suggesting that reduced divergence in long noncoding sequences is not due to reduced mutation rates in these regions. Finally, we show that GC content for each site within INE-1 sequences has evolved toward an equilibrium value (approximately 33%) since insertion.
Journal of Molecular Evolution 01/2008; 65(6):627-39. · 1.86 Impact Factor
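The equilibrium GC content mentioned in the last sentence follows from a two-state mutation model: if AT→GC changes occur at rate w and GC→AT changes at rate u per site, then dGC/dt = w(1 − GC) − u·GC, giving an equilibrium of w/(w + u) and an exponential approach to it. The sketch assumes neutral, site-independent mutation with no biased gene conversion, and the rates used are hypothetical; an equilibrium of about 33% corresponds to GC→AT changes being roughly twice as frequent as AT→GC changes.

```python
import math

def gc_equilibrium(at_to_gc, gc_to_at):
    """Equilibrium GC content w / (w + u) under a two-state mutation model."""
    return at_to_gc / (at_to_gc + gc_to_at)

def gc_trajectory(gc0, at_to_gc, gc_to_at, t):
    """GC content after time t: GC* + (GC0 - GC*) * exp(-(w + u) * t)."""
    eq = gc_equilibrium(at_to_gc, gc_to_at)
    return eq + (gc0 - eq) * math.exp(-(at_to_gc + gc_to_at) * t)

# Hypothetical rates with GC->AT twice AT->GC: equilibrium is 1/3
print(gc_equilibrium(1.0, 2.0))
print(gc_trajectory(0.5, 1.0, 2.0, 2.0))  # a GC-rich insertion drifting toward 1/3
```

The rate of approach, w + u, shows why older TE remnants are expected to sit closer to equilibrium than recent insertions.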
ABSTRACT: Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.
ABSTRACT: The recombinational environment is predicted to influence patterns of protein sequence evolution through the effects of Hill-Robertson interference among linked sites subject to selection. In freely recombining regions of the genome, selection should more effectively incorporate new beneficial mutations, and eliminate deleterious ones, than in regions with low rates of genetic recombination.
We examined the effects of recombinational environment on patterns of evolution using a genome-wide comparison of Drosophila melanogaster and D. yakuba. In regions of the genome with no crossing over, we find elevated divergence at nonsynonymous sites and in long introns, a virtual absence of codon usage bias, and an increase in gene length. However, we find little evidence for differences in patterns of evolution between regions with high, intermediate, and low crossover frequencies. In addition, genes on the fourth chromosome exhibit more extreme deviations from regions with crossing over than do other no-crossover genes outside the fourth chromosome.
All of the patterns observed are consistent with a severe reduction in the efficacy of selection in the absence of crossing over, resulting in the accumulation of deleterious mutations in these regions. Our results also suggest that even a very low frequency of crossing over may be enough to maintain the efficacy of selection.
ABSTRACT: Spontaneous mutations are the source of genetic variation required for evolutionary change, and are therefore important for many aspects of evolutionary biology. For example, the divergence between taxa at neutrally evolving sites in the genome is proportional to the per nucleotide mutation rate, u (ref. 1), and this can be used to date speciation events by assuming a molecular clock. The overall rate of occurrence of deleterious mutations in the genome each generation (U) appears in theories of nucleotide divergence and polymorphism, the evolution of sex and recombination, and the evolutionary consequences of inbreeding. However, estimates of U based on changes in allozymes or DNA sequences and fitness traits are discordant. Here we directly estimate u in Drosophila melanogaster by scanning 20 million bases of DNA from three sets of mutation accumulation lines by using denaturing high-performance liquid chromatography. From 37 mutation events that we detected, we obtained a mean estimate for u of 8.4 × 10⁻⁹ per generation. Moreover, we detected significant heterogeneity in u among the three mutation-accumulation-line genotypes. By multiplying u by an estimate of the fraction of mutations that are deleterious in natural populations of Drosophila, we estimate that U is 1.2 per diploid genome. This high rate suggests that selection against deleterious mutations may have a key role in explaining patterns of genetic variation in the genome, and help to maintain recombination and sexual reproduction.