Daniel L Halligan

The University of Edinburgh, Edinburgh, SCT, United Kingdom

Publications (22) · 261.4 Total impact

  • Dataset: nature06341-s1
  • Article: Inference of site frequency spectra from high-throughput sequence data: quantification of selection on nonsynonymous and synonymous sites in humans.
    Peter D Keightley, Daniel L Halligan
    ABSTRACT: Sequencing errors and random sampling of nucleotide types among sequencing reads at heterozygous sites present challenges for accurate, unbiased inference of single-nucleotide polymorphism genotypes from high-throughput sequence data. Here, we develop a maximum-likelihood approach to estimate the frequency distribution of the number of alleles in a sample of individuals (the site frequency spectrum), using high-throughput sequence data. Our method assumes binomial sampling of nucleotide types in heterozygotes and random sequencing error. By simulations, we show that close to unbiased estimates of the site frequency spectrum can be obtained if the error rate per base read does not exceed the population nucleotide diversity. We also show that these estimates are reasonably robust if errors are nonrandom. We then apply the method to infer site frequency spectra for zerofold degenerate, fourfold degenerate, and intronic sites of protein-coding genes using the low coverage human sequence data produced by the 1000 Genomes Project phase-one pilot. By fitting a model to the inferred site frequency spectra that estimates parameters of the distribution of fitness effects of new mutations, we find evidence for significant natural selection operating on fourfold sites. We also find that a model with variable effects of mutations at synonymous sites fits the data significantly better than a model with equal mutational effects. Under the variable effects model, we infer that 11% of synonymous mutations are subject to strong purifying selection.
    Genetics 05/2011; 188(4):931-40. · 4.01 Impact Factor
  • Article: Positive and negative selection in murine ultraconserved noncoding elements.
    ABSTRACT: There are many more selectively constrained noncoding than coding nucleotides in the mammalian genome, but most mammalian noncoding DNA is subject to weak selection, on average. One of the most striking discoveries to have emerged from comparisons among mammalian genomes is the hundreds of noncoding elements of more than 200 bp in length that show absolute conservation among mammalian orders. These elements represent the tip of the iceberg of a much larger class of conserved noncoding elements (CNEs). Much evidence suggests that CNEs are selectively constrained and not mutational cold-spots, and there is evidence that some CNEs play a role in the regulation of development. Here, we quantify negative and positive selection acting in murine CNEs by analyzing within-species nucleotide variation and between-species divergence of CNEs that we identified using a phylogenetically independent comparison. The distribution of fitness effects of new mutations in CNEs, inferred from within-species polymorphism, suggests that CNEs receive a higher number of strongly selected deleterious mutations and many fewer nearly neutral mutations than amino acid sites of protein-coding genes or regulatory elements close to genes. However, we also show that CNEs experience a far higher proportion of adaptive substitutions than any known category of genomic sites in murids. The absolute rate of adaptation of CNEs is similar to that of amino acid sites of proteins. This result suggests that there is widespread adaptation in mammalian conserved noncoding DNA elements, some of which have been implicated in the regulation of crucially important processes, including development.
    Molecular Biology and Evolution 04/2011; 28(9):2651-60. · 5.55 Impact Factor
  • Article: Inference of mutation parameters and selective constraint in mammalian coding sequences by approximate Bayesian computation.
    ABSTRACT: We develop an inference method that uses approximate Bayesian computation (ABC) to simultaneously estimate mutational parameters and selective constraint on the basis of nucleotide divergence for protein-coding genes between pairs of species. Our simulations explicitly model CpG hypermutability and transition vs. transversion mutational biases along with negative and positive selection operating on synonymous and nonsynonymous sites. We evaluate the method by simulations in which true mean parameter values are known and show that it produces reasonably unbiased parameter estimates as long as sequences are not too short and sequence divergence is not too low. We show that the use of quadratic regression within ABC offers an improvement over linear regression, but that weighted regression has little impact on the efficiency of the procedure. We apply the method to estimate mutational and selective constraint parameters in data sets of protein-coding genes extracted from the genome sequences of primates, murids, and carnivores. Estimates of CpG hypermutability are substantially higher in primates than murids and carnivores. Nonsynonymous site selective constraint is substantially higher in murids and carnivores than primates, and autosomal nonsynonymous constraint is higher than X-chromosome constraint in all taxa. We detect significant selective constraint at synonymous sites in primates, carnivores, and murid rodents. Synonymous site selective constraint is weakest in murids, a surprising result, considering that murid effective population sizes are likely to be considerably higher than the other two taxa.
    Genetics 02/2011; 187(4):1153-61. · 4.01 Impact Factor
  • Article: Positive and negative selection on noncoding DNA close to protein-coding genes in wild house mice.
    ABSTRACT: During the past two decades, evidence has accumulated of adaptive evolution within protein-coding genes in a variety of species. However, with the exception of Drosophila and humans, little is known about the extent of adaptive evolution in noncoding DNA. Here, we study regions upstream and downstream of protein-coding genes in the house mouse Mus musculus castaneus, a species that has a much larger effective population size (N(e)) than humans. We analyze polymorphism data for 78 genes from 15 wild-caught M. m. castaneus individuals and divergence to a closely related species, Mus famulus. We find high levels of nucleotide diversity and moderate levels of selective constraint in upstream and downstream regions compared with nonsynonymous sites of protein-coding genes. From the polymorphism data, we estimate the distribution of fitness effects (DFE) of new mutations and infer that most new mutations in upstream and downstream regions behave as effectively neutral and that only a small fraction is strongly negatively selected. We also estimate the fraction of substitutions that have been driven to fixation by positive selection (α) and the ratio of adaptive to neutral divergence (ω(α)). We find that α for upstream and downstream regions (∼ 10%) is much lower than α for nonsynonymous sites (∼ 50%). However, ω(α) estimates are very similar for nonsynonymous sites (∼ 10%) and upstream and downstream regions (∼ 5%). We conclude that negative selection operating in upstream and downstream regions of M. m. castaneus is weak and that the low values of α for upstream and downstream regions relative to nonsynonymous sites are most likely due to the presence of a higher proportion of neutrally evolving sites and not due to lower absolute rates of adaptive substitution.
    Molecular Biology and Evolution 11/2010; 28(3):1183-91. · 5.55 Impact Factor
  • Article: Evidence for pervasive adaptive protein evolution in wild mice.
    ABSTRACT: The relative contributions of neutral and adaptive substitutions to molecular evolution have been one of the most controversial issues in evolutionary biology for more than 40 years. The analysis of within-species nucleotide polymorphism and between-species divergence data supports a widespread role for adaptive protein evolution in certain taxa. For example, estimates of the proportion of adaptive amino acid substitutions (alpha) are 50% or more in enteric bacteria and Drosophila. In contrast, recent estimates of alpha for hominids have been at most 13%. Here, we estimate alpha for protein sequences of murid rodents based on nucleotide polymorphism data from multiple genes in a population of the house mouse subspecies Mus musculus castaneus, which inhabits the ancestral range of the Mus species complex and nucleotide divergence between M. m. castaneus and M. famulus or the rat. We estimate that 57% of amino acid substitutions in murids have been driven by positive selection. Hominids, therefore, are exceptional in having low apparent levels of adaptive protein evolution. The high frequency of adaptive amino acid substitutions in wild mice is consistent with their large effective population size, leading to effective natural selection at the molecular level. Effective natural selection also manifests itself as a paucity of effectively neutral nonsynonymous mutations in M. m. castaneus compared to humans.
    PLoS Genetics 01/2010; 6(1):e1000825. · 8.69 Impact Factor
  • Article: Distributions of selectively constrained sites and deleterious mutation rates in the hominid and murid genomes.
    Lél Eory, Daniel L Halligan, Peter D Keightley
    ABSTRACT: Protein-coding sequences make up only about 1% of the mammalian genome. Much of the remaining 99% has been long assumed to be junk DNA, with little or no functional significance. Here, we show that in hominids, a group with historically low effective population sizes, all classes of noncoding DNA evolve more slowly than ancestral transposable elements and so appear to be subject to significant evolutionary constraints. Under the nearly neutral theory, we expected to see lower levels of selective constraints on most sequence types in hominids than murids, a group that is thought to have a higher effective population size. We found that this is the case for many sequence types examined, the most extreme example being 5'UTRs, for which constraint in hominids is only about one-third that of murids. Surprisingly, however, we observed higher constraints for some sequence types in hominids, notably 4-fold sites, where constraint is more than twice as high as in murids. This implies that more than about one-fifth of mutations at 4-fold sites are effectively selected against in hominids. The higher constraint at 4-fold sites in hominids suggests a more complex protein-coding gene structure than murids and indicates that methods for detecting selection on protein-coding sequences (e.g., using the d(N)/d(S) ratio), with 4-fold sites as a neutral standard, may lead to biased estimates, particularly in hominids. Our constraint estimates imply that 5.4% of nucleotide sites in the human genome are subject to effective negative selection and that there are three times as many constrained sites within noncoding sequences as within protein-coding sequences. Including coding and noncoding sites, we estimate that the genomic deleterious mutation rate U = 4.2. The mutational load predicted under a multiplicative model is therefore about 99% in hominids.
    Molecular Biology and Evolution 09/2009; 27(1):177-92. · 5.55 Impact Factor
  • Article: Spontaneous mutation accumulation studies in evolutionary genetics
    Daniel L Halligan, Peter D Keightley
    ABSTRACT: Mutation accumulation (MA) experiments, in which mutations are allowed to drift to fixation in inbred lines, have been a principal way of studying the rates and properties of new spontaneous mutations. Phenotypic assays of MA lines inform us about the nature of new mutational variation for quantitative traits and provide estimates of the genomic rate and the distribution of effects of new mutations. Parameter estimates compared for a range of species suggest that the genomic mutation rate varies by several orders of magnitude and that the distribution of effects tends to be dominated by large-effect mutations. Some experiments suggest synergistic interactions between the effects of spontaneous deleterious mutations, whereas others do not. There is little reliable information on the distribution of dominance effects of new mutations. Most evidence does not suggest strong dependency of the effects of new mutations on the environment. Information from phenotypic assays has recently been augmented by direct molecular estimates of the mutation rate.
    Annu. Rev. Ecol. Evol. Syst. 01/2009; 40:151-72.
  • Article: Effects of spontaneous mutation accumulation on sex ratio traits in a parasitoid wasp.
    ABSTRACT: Sex allocation theory has proved extremely successful at predicting when individuals should adjust the sex of their offspring in response to environmental conditions. However, we know rather little about the underlying genetics of sex ratio or how genetic architecture might constrain adaptive sex-ratio behavior. We examined how mutation influenced genetic variation in the sex ratios produced by the parasitoid wasp Nasonia vitripennis. In a mutation accumulation experiment, we determined the mutability of sex ratio, and compared this with the amount of genetic variation observed in natural populations. We found that the mutability (h(2)(m)) ranges from 0.001 to 0.002, similar to estimates for life-history traits in other organisms. These estimates suggest one mutation every 5-60 generations, which shift the sex ratio by approximately 0.01 (proportion males). In this and other studies, the genetic variation in N. vitripennis sex ratio ranged from 0.02 to 0.17 (broad-sense heritability, H(2)). If sex ratio is maintained by mutation-selection balance, a higher genetic variance would be expected given our mutational parameters. Instead, the observed genetic variance perhaps suggests additional selection against sex-ratio mutations with deleterious effects on other fitness traits as well as sex ratio (i.e., pleiotropy), as has been argued to be the case more generally.
    Evolution 08/2008; 62(8):1921-35. · 5.15 Impact Factor
  • Article: Effect of divergence time and recombination rate on molecular evolution of Drosophila INE-1 transposable elements and other candidates for neutrally evolving sites.
    Jun Wang, Peter D Keightley, Daniel L Halligan
    ABSTRACT: Interspecies divergence of orthologous transposable element remnants is often assumed to be simply due to genetic drift of neutral mutations that occurred after the divergence of the species. However, divergence may also be affected by other factors, such as variation in the mutation rate, ancestral polymorphisms, or selection. Here we attempt to determine the impact of these forces on divergence of three classes of sites that are often assumed to be selectively unconstrained (INE-1 TE remnants, sites within short introns, and fourfold degenerate sites) in two different pairwise comparisons of Drosophila (D. melanogaster vs. D. simulans and D. simulans vs. D. sechellia). We find that divergence of these three classes of sites is strongly influenced by the recombination environment in which they are located, and this is especially true for the closer D. simulans vs. D. sechellia comparison. We suggest that this is mainly a result of the contribution of ancestral polymorphisms in different recombination regions. We also find that intergenic INE-1 elements are significantly more diverged than intronic INE-1 in both pairwise comparisons, implying the presence of either negative selection or lower mutation rates in introns. Furthermore, we show that substitution rates in INE-1 elements are not associated with the length of the noncoding sequence in which they are located, suggesting that reduced divergence in long noncoding sequences is not due to reduced mutation rates in these regions. Finally, we show that GC content for each site within INE-1 sequences has evolved toward an equilibrium value (approximately 33%) since insertion.
    Journal of Molecular Evolution 01/2008; 65(6):627-39. · 2.27 Impact Factor
  • Article: Evolution of genes and genomes on the Drosophila phylogeny.
    ABSTRACT: Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.
    Nature 12/2007; 450(7167):203-18. · 36.28 Impact Factor
  • Article: Direct estimation of per nucleotide and genomic deleterious mutation rates in Drosophila.
    ABSTRACT: Spontaneous mutations are the source of genetic variation required for evolutionary change, and are therefore important for many aspects of evolutionary biology. For example, the divergence between taxa at neutrally evolving sites in the genome is proportional to the per nucleotide mutation rate, u (ref. 1), and this can be used to date speciation events by assuming a molecular clock. The overall rate of occurrence of deleterious mutations in the genome each generation (U) appears in theories of nucleotide divergence and polymorphism, the evolution of sex and recombination, and the evolutionary consequences of inbreeding. However, estimates of U based on changes in allozymes or DNA sequences and fitness traits are discordant. Here we directly estimate u in Drosophila melanogaster by scanning 20 million bases of DNA from three sets of mutation accumulation lines by using denaturing high-performance liquid chromatography. From 37 mutation events that we detected, we obtained a mean estimate for u of 8.4 × 10^-9 per generation. Moreover, we detected significant heterogeneity in u among the three mutation-accumulation-line genotypes. By multiplying u by an estimate of the fraction of mutations that are deleterious in natural populations of Drosophila, we estimate that U is 1.2 per diploid genome. This high rate suggests that selection against deleterious mutations may have a key role in explaining patterns of genetic variation in the genome, and help to maintain recombination and sexual reproduction.
    Nature 02/2007; 445(7123):82-5. · 36.28 Impact Factor
  • Article: Reduced efficacy of selection in regions of the Drosophila genome that lack crossing over.
    ABSTRACT: The recombinational environment is predicted to influence patterns of protein sequence evolution through the effects of Hill-Robertson interference among linked sites subject to selection. In freely recombining regions of the genome, selection should more effectively incorporate new beneficial mutations, and eliminate deleterious ones, than in regions with low rates of genetic recombination. We examined the effects of recombinational environment on patterns of evolution using a genome-wide comparison of Drosophila melanogaster and D. yakuba. In regions of the genome with no crossing over, we find elevated divergence at nonsynonymous sites and in long introns, a virtual absence of codon usage bias, and an increase in gene length. However, we find little evidence for differences in patterns of evolution between regions with high, intermediate, and low crossover frequencies. In addition, genes on the fourth chromosome exhibit more extreme deviations from regions with crossing over than do other, no crossover genes outside the fourth chromosome. All of the patterns observed are consistent with a severe reduction in the efficacy of selection in the absence of crossing over, resulting in the accumulation of deleterious mutations in these regions. Our results also suggest that even a very low frequency of crossing over may be enough to maintain the efficacy of selection.
    Genome biology 02/2007; 8(2):R18. · 6.63 Impact Factor
  • Article: Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison.
    Daniel L Halligan, Peter D Keightley
    ABSTRACT: Non-coding DNA comprises approximately 80% of the euchromatic portion of the Drosophila melanogaster genome. Non-coding sequences are known to contain functionally important elements controlling gene expression, but the proportion of sites that are selectively constrained is still largely unknown. We have compared the complete D. melanogaster and Drosophila simulans genome sequences to estimate mean selective constraint (the fraction of mutations that are eliminated by selection) in coding and non-coding DNA by standardizing to substitution rates in putatively unconstrained sequences. We show that constraint is positively correlated with intronic and intergenic sequence length and is generally remarkably strong in non-coding DNA, implying that more than half of all point mutations in the Drosophila genome are deleterious. This fraction is also likely to be an underestimate if many substitutions in non-coding DNA are adaptively driven to fixation. We also show that substitutions in long introns and intergenic sequences are clustered, such that there is an excess of substitutions <8 bp apart and a deficit farther apart. These results suggest that there are blocks of constrained nucleotides, presumably involved in gene expression control, that are concentrated in long non-coding sequences. Furthermore, we infer that there is more than three times as much functional non-coding DNA as protein-coding DNA in the Drosophila genome. Most deleterious mutations therefore occur in non-coding DNA, and these may make an important contribution to a wide variety of evolutionary processes.
    Genome Research 07/2006; 16(7):875-84. · 13.61 Impact Factor
  • Article: Natural selection drives extremely rapid evolution in antiviral RNAi genes.
    ABSTRACT: RNA interference (RNAi) is perhaps best known as a laboratory tool. However, RNAi-related pathways represent an antiviral component of innate immunity in both plants and animals. Since viruses can protect themselves by suppressing RNAi, interaction between RNA viruses and host RNAi may represent an ancient coevolutionary "arms race." This could lead to strong directional selection on RNAi genes, but to date their evolution has not been studied. By comparing DNA sequences from different species of Drosophila, we show that the rate of amino acid evolution is substantially elevated in genes related to antiviral RNAi function (Dcr2, R2D2, and Ago2). They are among the fastest evolving 3% of all Drosophila genes; they evolve significantly faster than other components of innate immunity and faster than paralogous genes that mediate "housekeeping" functions. Based on DNA polymorphism data from three species of Drosophila, McDonald-Kreitman tests showed that this rapid evolution is due to strong positive selection. Furthermore, Dcr2 and Ago2 display reduced genetic diversity, indicative of a recent selective sweep in both genes. Together, these data show rapid adaptive evolution of the antiviral RNAi pathway in Drosophila. This is a signature of host-pathogen arms races and implies that the ancient battle between RNA viruses and host antiviral RNAi genes is active and significant in shaping RNAi function.
    Current Biology 04/2006; 16(6):580-5. · 9.65 Impact Factor
  • Article: Evolutionary constraints in conserved nongenic sequences of mammals.
    ABSTRACT: Mammalian genomes contain many highly conserved nongenic sequences (CNGs) whose functional significance is poorly understood. Sets of CNGs have previously been identified by selecting the most conserved elements from a chromosome or genome, but in these highly selected samples, conservation may be unrelated to purifying selection. Furthermore, conservation of CNGs may be caused by mutation rate variation rather than selective constraints. To account for the effect of selective sampling, we have examined conservation of CNGs in taxa whose evolution is largely independent of the taxa from which the CNGs were initially identified, and we have controlled for mutation rate variation in the genome. We show that selective constraints in CNGs and their flanks are about one-half as strong in hominids as in murids, implying that hominids have accumulated many slightly deleterious mutations in functionally important nongenic regions. This is likely to be a consequence of the low effective population size of hominids leading to a reduced effectiveness of selection. We estimate that there are one and two times as many conserved nucleotides in CNGs as in known protein-coding genes of hominids and murids, respectively. Polymorphism frequencies in CNGs indicate that purifying selection operates in these sequences. During hominid evolution, we estimate that a total of about three deleterious mutations in CNGs and protein-coding genes have been selectively eliminated per diploid genome each generation, implying that deleterious mutations are eliminated from populations non-independently and that sex is necessary for long-term population persistence.
    Genome Research 11/2005; 15(10):1373-8. · 13.61 Impact Factor
  • Article: Patterns of intron sequence evolution in Drosophila are dependent upon length and GC content.
    ABSTRACT: Introns comprise a large fraction of eukaryotic genomes, yet little is known about their functional significance. Regulatory elements have been mapped to some introns, though these are believed to account for only a small fraction of genome wide intronic DNA. No consistent patterns have emerged from studies that have investigated general levels of evolutionary constraint in introns. We examine the relationship between intron length and levels of evolutionary constraint by analyzing inter-specific divergence at 225 intron fragments in Drosophila melanogaster and Drosophila simulans, sampled from a broad distribution of intron lengths. We document a strongly negative correlation between intron length and divergence. Interestingly, we also find that divergence in introns is negatively correlated with GC content. This relationship does not account for the correlation between intron length and divergence, however, and may simply reflect local variation in mutational rates or biases. Short introns make up only a small fraction of total intronic DNA in the genome. Our finding that long introns evolve more slowly than average implies that, while the majority of introns in the Drosophila genome may experience little or no selective constraint, most intronic DNA in the genome is likely to be evolving under considerable constraint. Our results suggest that functional elements may be ubiquitous within longer introns and that these introns may have a more general role in regulating gene expression than previously appreciated. Our finding that GC content and divergence are negatively correlated in introns has important implications for the interpretation of the correlation between divergence and levels of codon bias observed in Drosophila.
    Genome biology 02/2005; 6(8):R67. · 6.63 Impact Factor
  • Article: Patterns of evolutionary constraints in intronic and intergenic DNA of Drosophila.
    ABSTRACT: We develop methods to infer levels of evolutionary constraints in the genome by comparing rates of nucleotide substitution in noncoding DNA with rates predicted from rates of synonymous site evolution in adjacent genes or other putatively neutrally evolving sites, while accounting for differences in base composition. We apply the methods to estimate levels of constraint in noncoding DNA of Drosophila. In introns, constraint (the estimated fraction of mutations that are selectively eliminated) is absolute at the 5' and 3' splice junction dinucleotides, and averages 72% in base pairs 3-6 at the 5'-end. Constraint at the 5' base pairs 3-6 is significantly lower in the lineage leading to Drosophila melanogaster than in Drosophila simulans, a finding that agrees with other features of genome evolution in Drosophila and indicates that the effect of selection on intron function has been weaker in the melanogaster lineage. Elsewhere in intron sequences, the rate of nucleotide substitution is significantly higher than at synonymous sites. By using intronic sites outside splice control regions as a putative neutrally evolving standard, constraint in the 500 bp of intergenic DNA upstream and downstream regions of protein-coding genes averages approximately 44%. Although the estimated level of constraint in intergenic regions close to genes is only about one-half of that of amino acid sites, selection against single-nucleotide mutations in intergenic DNA makes a substantial contribution to the mutation load in Drosophila.
    Genome Research 03/2004; 14(2):273-9. · 13.61 Impact Factor
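
Worked example: the mutational load quoted in the Eory, Halligan and Keightley abstract above (U = 4.2, load of about 99%) can be checked with a short calculation. The sketch below assumes the standard multiplicative-fitness result that the load is L = 1 - exp(-U); the abstract names the multiplicative model but does not write out the formula, so treat that step as an assumption. The function name mutational_load is hypothetical and used only for illustration.

```python
import math


def mutational_load(U: float) -> float:
    """Load under multiplicative fitness, assuming L = 1 - exp(-U).

    U is the genomic deleterious mutation rate per diploid genome per
    generation. The formula is an assumption of this sketch; the
    abstracts quote only U and, for hominids, the resulting load.
    """
    if U < 0:
        raise ValueError("U must be non-negative")
    return 1.0 - math.exp(-U)


# U values as quoted in the abstracts listed above.
hominid_load = mutational_load(4.2)     # hominid estimate (Mol. Biol. Evol. 2009)
drosophila_load = mutational_load(1.2)  # D. melanogaster estimate (Nature 2007)

print(f"hominids:   U = 4.2 -> load = {hominid_load:.3f}")
print(f"Drosophila: U = 1.2 -> load = {drosophila_load:.3f}")
```

With U = 4.2 this gives a load of roughly 0.985, in line with the "about 99%" stated in the abstract. Applying the same assumed formula to the Drosophila estimate of U = 1.2 gives roughly 0.70; that abstract does not itself report a load, so the second figure is purely illustrative.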