Accelerated sequence divergence of conserved
genomic elements in Drosophila melanogaster
Alisha K. Holloway,1,4 David J. Begun,1 Adam Siepel,2 and Katherine S. Pollard3
1Department of Evolution and Ecology and Center for Population Biology, University of California, Davis, California 95691, USA;
2Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA;
3UC Davis Genome Center and Department of Statistics, University of California, Davis, California 95691, USA
Recent genomic sequencing of 10 additional Drosophila genomes provides a rich resource for comparative genomics
analyses aimed at understanding the similarities and differences between species and between Drosophila and mammals.
Using a phylogenetic approach, we identified 64 genomic elements that have been highly conserved over most of the
Drosophila tree, but that have experienced a recent burst of evolution along the Drosophila melanogaster lineage.
Compared to similarly defined elements in humans, these regions of rapid lineage-specific evolution in Drosophila
differ dramatically in location, mechanism of evolution, and functional properties of associated genes. Notably, the
majority reside in protein-coding regions and primarily result from rapid adaptive synonymous site evolution. In
fact, adaptive evolution appears to be driving substitutions to unpreferred codons. Our analysis also highlights
interesting noncoding genomic regions, such as regulatory regions in the gene gooseberry-neuro and a putative novel
[Supplemental material is available online at www.genome.org. Sequence data have been submitted to GenBank under
accession nos. EU588685–EU588714.]
Comparative genomics approaches have assumed a central role in the identification of functionally important genomic regions (Kellis et al. 2003; Siepel et al. 2005; Xie et al. 2005; Birney et al. 2007). These approaches rest on the neutral-theory prediction that sequences that have been highly conserved over tens of millions of years are either functionally important or mutational cold spots (although no molecular mechanism for generating such cold spots has been proposed). Recent population genetic analyses showed that low-frequency alleles are more common in highly conserved sequences, which supports the idea that such sequences, including those that do not encode proteins, are functionally constrained in multiple lineages (Drake et al. 2006; Asthana et al. 2007; Casillas et al. 2007; Katzman et al. 2007). On the other hand, questions remain about the functional importance of conserved sequences. For example, a recent functional analysis found no evidence for strong viability selection against the deletion of four conserved noncoding elements in mice (Ahituv et al. 2007).
The conceptual foundation linking conserved function with conserved sequence ignores the biologically interesting question of how biological functions evolve in different lineages. Indeed, from an evolutionary perspective, understanding the causes of rapid sequence evolution may be at least as interesting as understanding the causes of strong sequence conservation. Of particular relevance for identifying potential major functional changes is the identification of genomic regions that are highly conserved over most of a phylogeny but evolve very rapidly in at least one lineage. Such phylogenetically restricted rapid evolution could be due to a dramatic change in functional constraint, an increased mutation rate, or a shift in function that drives large numbers of substitutions through populations under directional selection (Gillespie 1991).
Although the statistical analysis of heterogeneous rates of coding sequence evolution among lineages has a long history (Zuckerkandl and Pauling 1962; Ohta and Kimura 1971; Langley and Fitch 1973, 1974), only recently have genome assemblies and alignments from multiple species (Blanchette et al. 2004; Clark et al. 2007; Stark et al. 2007) permitted such questions to be pursued comprehensively, without bias toward any particular genomic feature. For example, Pollard et al. (2006) used alignments of multiple vertebrate species to identify genomic regions that are highly conserved in most vertebrates but have evolved rapidly in humans. These human accelerated regions (HARs) are candidates for contributing to human-specific biology. Interestingly, the majority of these regions were noncoding, and many were located near genes functioning in the nervous system. A more recent genomic analysis (Kim and Pritchard 2007) took a similar approach but investigated heterogeneous rates of evolution of conserved noncoding sequence more broadly across vertebrates. They concluded that short bursts of adaptive evolution drive divergence in conserved noncoding sequences.
The recent availability of multiple genome assemblies (Stark et al. 2007) and alignments (Karolchik et al. 2003, 2004; Blanchette et al. 2004) from Drosophila motivates an extension of such approaches to the Drosophila model for three main reasons. First, the experimental power of Drosophila opens up the possibility of detailed in vivo functional investigation of candidate regions that are generally highly conserved but evolve rapidly in one lineage. Second, the genome organizations of flies and vertebrates are markedly distinct, with flies having much more compact genomes containing less noncoding DNA. This raises the question of whether lineage-specific increases in substitution rate in flies will also be concentrated in noncoding DNA, or whether differences in the biology and/or population genetics of flies and humans lead to different patterns. Finally, the Drosophila melanogaster genome is very well annotated, which facilitates targeted functional studies. Comparing the functional annotations associated with lineage-specific rate increases in different lineages could point to general trends as well as to lineage-unique biological functions that exhibit these unusual evolutionary patterns.

E-mail email@example.com; fax (530) 752-1449.
Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.077131.108. Freely available online through the Genome Research Open Access option.
Genome Research 18:1592–1601 ©2008 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/08; www.genome.org
Using whole-genome alignments of 10 Drosophila species to the D. melanogaster reference (Karolchik et al. 2003, 2004; Blanchette et al. 2004), we identified genomic regions that have been highly conserved over tens of millions of years but show a recent acceleration in the rate of evolution solely along the D. melanogaster branch (Fig. 1A). Genomic regions were defined as conserved if they were at least 100 bp long and at least 96% identical in sequence among Drosophila simulans, Drosophila yakuba, and Drosophila erecta. We identified 97,901 conserved regions with a mean (and median) length of 140 bp. Next, we assessed acceleration along the D. melanogaster branch using a likelihood ratio test (LRT) to compare two models of evolution over the Drosophila tree. The three species used to identify conserved regions (D. simulans, D. yakuba, and D. erecta) were excluded from this step of the analysis because, by definition, the regions were highly conserved among them. For each candidate region, the LRT compares the likelihood of the multiple alignment under a local null model with no acceleration in D. melanogaster to its likelihood under an alternative model with acceleration. There were 400 accelerated regions with an initial, unadjusted P-value < 0.05. Sixty-four of the conserved regions showed significant acceleration along the D. melanogaster lineage after adjusting for multiple comparisons using the false discovery rate (FDR) (adjusted P-value < 0.05; Table 1). Hereafter, we refer to these as Drosophila melanogaster accelerated regions, or DMARs.
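The acceleration test compares two nested likelihoods for each region and then corrects the resulting P-values for multiple testing. A full implementation fits a phylogenetic substitution model to the multiple alignment over the tree; the minimal sketch below deliberately swaps in a much simpler, hypothetical model so the mechanics are visible. It assumes the number of substitutions k on the D. melanogaster branch is Poisson with mean e (`expected`) under the null and with mean s·e, s ≥ 1, under the alternative, so the statistic is Λ = 2[k ln ŝ − e(ŝ − 1)] with ŝ = max(1, k/e). Because s is constrained to s ≥ 1, the P-value uses a 50:50 mixture of a point mass at zero and χ²₁, a standard choice for one-sided boundary tests. The text states only that the FDR was used; the Benjamini-Hochberg procedure here is an assumption, and the function names and toy counts are hypothetical.

```python
import math


def acceleration_lrt(k: int, expected: float) -> tuple[float, float]:
    """Hypothetical one-branch acceleration LRT under a Poisson stand-in model.

    Null: k ~ Poisson(expected). Alternative: k ~ Poisson(s * expected), s >= 1.
    Returns (statistic, p_value).
    """
    if expected <= 0:
        raise ValueError("expected substitution count must be positive")
    s_hat = max(1.0, k / expected)  # constrained MLE of the rate scale s
    # 2 * [lnL(s_hat) - lnL(1)]; the log(k!) terms cancel.
    stat = 2.0 * (k * math.log(s_hat) - expected * (s_hat - 1.0))
    if stat <= 0.0:
        return 0.0, 1.0
    # 50:50 mixture of a point mass at 0 and chi-square(1);
    # P(chi2_1 > x) = erfc(sqrt(x / 2)).
    return stat, 0.5 * math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy (k, expected) pairs, made up for illustration only.
regions = {"r1": (10, 2.0), "r2": (3, 2.5), "r3": (0, 1.0)}
pvals = [acceleration_lrt(k, e)[1] for k, e in regions.values()]
adjusted = benjamini_hochberg(pvals)
accelerated = [name for name, q in zip(regions, adjusted) if q < 0.05]
print(accelerated)  # ['r1']
```

Replacing the Poisson likelihoods with alignment likelihoods from a fitted phylogenetic model would leave the rest of the procedure (statistic, boundary-corrected P-value, FDR step) unchanged.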
Accelerated rates of evolution could result from multiple single substitution events or from microinversions, which would make a short stretch of sequence appear rapidly diverged. An analysis of possible microinversions showed that only five substitution pairs could have resulted from this process, which explains only ∼1% of all substitutions in DMARs. Therefore, the substitution process that produces DMARs predominantly consists of multiple single substitution events.
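The microinversion analysis is described only at a high level. A minimal sketch, under an assumed criterion, is to flag adjacent sites that both differ between an inferred ancestral sequence and D. melanogaster and whose derived dinucleotide equals the reverse complement of the ancestral one, which is the pattern a 2-bp inversion would leave. The function names, the gapless uppercase ACGT input, and the dinucleotide-only window are all assumptions; the authors' actual procedure may differ.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def microinversion_pairs(ancestral: str, derived: str) -> list[tuple[int, int]]:
    """Return (i, i + 1) pairs where both sites changed and the derived
    dinucleotide equals the reverse complement of the ancestral one."""
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    pairs = []
    i = 0
    while i < len(ancestral) - 1:
        anc, der = ancestral[i:i + 2], derived[i:i + 2]
        both_changed = anc[0] != der[0] and anc[1] != der[1]
        if both_changed and der == revcomp(anc):
            pairs.append((i, i + 1))
            i += 2  # do not reuse a site in an overlapping pair
        else:
            i += 1
    return pairs


def fraction_explained(ancestral: str, derived: str) -> float:
    """Fraction of all substitutions that fall in flagged microinversion pairs."""
    n_subs = sum(a != d for a, d in zip(ancestral, derived))
    if n_subs == 0:
        return 0.0
    return 2 * len(microinversion_pairs(ancestral, derived)) / n_subs


print(microinversion_pairs("TTAGCC", "TTCTCC"))  # [(2, 3)]
```

Summing `fraction_explained` over all regions corresponds to the kind of "share of substitutions explained by microinversions" figure quoted in the text, although the counts here are toy inputs.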
The 64 DMARs were dispersed fairly evenly throughout the major chromosome arms (Fig. 1B). Relative to the proportion of regions on the X chromosome identified as "conserved" in the first step of the analysis (10.5%), DMARs are significantly over-represented on the X chromosome (n = 16; two-tailed Fisher's exact test [FET], P-value = 0.0151). If the substitutions in DMARs were driven to fixation by directional selection, more efficient selection on the X chromosome could have led to this finding (for review, see Vicoso and Charlesworth 2006).
The majority of DMARs (72%) are found in protein-coding regions (Table 1): 46 DMARs lie in exons, nine in intergenic regions, eight in introns, and one in a core promoter/5′ untranslated region (UTR). This distribution among genomic features contrasts dramatically with that of regions in the human genome showing evidence of recent acceleration (HARs), which were found primarily in noncoding regions (Table 2; Pollard et al. 2006). That most HARs are noncoding may not be surprising, considering that only 2% of the human genome is protein-coding. Flies have much more compact genomes, with almost 20% of the genome coding for proteins. However, even after accounting for genomic content in Drosophila, DMARs show a significant excess in protein-coding regions (see Table 2).
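Both enrichment claims in the preceding paragraphs (X-chromosome over-representation and the excess of DMARs in protein-coding sequence) rest on exact tests of counts. The sketch below is not the authors' code. The Fisher test uses the common two-sided rule of summing all tables with the observed margins that are no more probable than the observed one. The binomial tail with p = 0.20 is a hypothetical stand-in for the Table 2 comparison, because "almost 20%" is approximate and the exact test behind Table 2 is not stated here. For the X-chromosome test, the DMAR row of the 2×2 table would be 16 (X) and 48 (elsewhere), but the exact count of conserved regions on the X is not given in this section, so the example applies the Fisher test to a textbook table instead. Function names are hypothetical.

```python
import math


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Enumerates all tables with the observed margins and sums the
    probabilities of those no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def pmf(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = pmf(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    total = sum(p for p in map(pmf, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


# Textbook table (Fisher's tea-tasting example), not data from this study.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
# 46 of 64 DMARs in exons vs. an assumed 20% coding fraction (approximate).
print(binomial_upper_tail(46, 64, 0.20))
```

Exact enumeration with `math.comb` stays tractable even for margins in the tens of thousands, since Python integers are unbounded and the division returns a float.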
DMARs in coding regions can be divided into two groups based
on whether substitutions are found primarily at synonymous
sites or nonsynonymous sites (Supplemental Table S1). DMARs
with primarily synonymous substitutions (DMARSS) were defined
as those with fewer than 25% of substitutions at amino acid
Figure 1. (A) Phylogeny of 11 Drosophila species with genome sequences. Branch lengths are derived from maximum likelihood analysis of all elements conserved throughout the tree. Branches in blue (D. simulans, D. yakuba, and D. erecta) were used to identify the blocks of at least 100 bp with 96% identity among the three species. All other lineages (and the D. melanogaster–D. simulans ancestor) were used to infer whether D. melanogaster had an accelerated rate of evolution relative to the expected rate of evolution based on elements conserved throughout the tree. (B) Locations of D. melanogaster accelerated regions (DMARs). (Stacked bars) Multiple DMARs within a single locus. (Two bars above a "V") Two DMARs that were within the same chromosomal band. DMARs are found predominantly in exons (46/64) and are significantly over-represented on the X chromosome (16/64). Chromosome images adapted from Lefevre (1976).
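The block-identification step described in the legend above (at least 100 bp with 96% identity among D. simulans, D. yakuba, and D. erecta) can be sketched as a sliding-window scan. The assumptions are stated plainly: sequences are gapless and already aligned, a column counts as identical only when all three bases agree, 96% over a 100-column window is taken to mean at least 96 identical columns, and passing windows that overlap or abut are merged into one block. The function name `conserved_blocks` is hypothetical, and the authors' actual criterion (for example, how gaps or merged blocks were handled) may differ.

```python
def conserved_blocks(
    seqs: list[str], window: int = 100, min_matches: int = 96
) -> list[tuple[int, int]]:
    """Hypothetical scan: return [start, end) blocks formed by merging every
    `window`-column window with >= `min_matches` columns identical in all seqs."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    if length < window:
        return []
    # 1 where every sequence has the same base in that column, else 0.
    same = [int(len(set(column)) == 1) for column in zip(*seqs)]
    matches = sum(same[:window])
    blocks: list[list[int]] = []
    for start in range(length - window + 1):
        if start > 0:
            # Slide the window one column: add the new column, drop the old one.
            matches += same[start + window - 1] - same[start - 1]
        if matches >= min_matches:
            end = start + window
            if blocks and start <= blocks[-1][1]:
                blocks[-1][1] = end  # overlaps or abuts the previous block
            else:
                blocks.append([start, end])
    return [(s, e) for s, e in blocks]


# Toy alignment: 20 divergent columns (100-119) in one of three sequences.
base = "A" * 250
alt = base[:100] + "C" * 20 + base[120:]
print(conserved_blocks([base, base, alt]))  # [(0, 104), (116, 250)]
```

Because each block is a union of passing windows, a merged block can extend a few columns into a divergent stretch, as the toy output shows; a stricter pipeline might trim block ends.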
Rapid evolution of conserved genomic elements
Schuster, P. 1994. Fast folding and comparison of RNA secondary
structures. Monatsh. Chem. 125: 167–188.
Hudson, R.R., Kreitman, M., and Aguade, M. 1987. A test of neutral
molecular evolution based on nucleotide data. Genetics
Karolchik, D., Baertsch, R., Diekhans, M., Furey, T.S., Hinrichs, A., Lu,
Y.T., Roskin, K.M., Schwartz, M., Sugnet, C.W., Thomas, D.J., et al.
2003. The UCSC Genome Browser Database. Nucleic Acids Res.
Karolchik, D., Hinrichs, A.S., Furey, T.S., Roskin, K.M., Sugnet, C.W.,
Haussler, D., and Kent, W.J. 2004. The UCSC Table Browser data
retrieval tool. Nucleic Acids Res. 32: D493–D496.
Katzman, S., Kern, A.D., Bejerano, G., Fewell, G., Fulton, L., Wilson,
R.K., Salama, S.R., and Haussler, D. 2007. Human genome
ultraconserved elements are ultraselected. Science 317: 915.
Kellis, M., Patterson, N., Endrizzi, M., Birren, B., and Lander, E.S. 2003.
Sequencing and comparison of yeast species to identify genes and
regulatory elements. Nature 423: 241–254.
Kent, W.J., Baertsch, R., Hinrichs, A., Miller, W., and Haussler, D. 2003.
Evolution’s cauldron: Duplication, deletion, and rearrangement in
the mouse and human genomes. Proc. Natl. Acad. Sci.
Kim, S.Y. and Pritchard, J.K. 2007. Adaptive evolution of conserved
noncoding elements in mammals. PLoS Genet. 3: 1572–1586.
Kimchi-Sarfaty, C., Oh, J.M., Kim, I.W., Sauna, Z.E., Calcagno, A.M.,
Ambudkar, S.V., and Gottesman, M.M. 2007. A “silent”
polymorphism in the MDR1 gene changes substrate specificity.
Science 315: 525–528.
Kliman, R.M. and Hey, J. 1994. The effects of mutation and natural
selection on codon bias in the genes of Drosophila. Genetics
Komar, A.A., Lesnik, T., and Reiss, C. 1999. Synonymous codon
substitutions affect ribosome traffic and protein folding during in
vitro translation. FEBS Lett. 462: 387–391.
Konigsberg, W. and Godson, G.N. 1983. Evidence for use of rare codons
in the dnaG gene and other regulatory genes of Escherichia coli. Proc.
Natl. Acad. Sci. 80: 687–691.
Langley, C.H. and Fitch, W.M. 1973. The constancy of evolution: A
statistical analysis of the alpha and beta haemoglobins, cytochrome
c, and fibrinopeptide A. In Genetic structure of populations (ed. N.E.
Morton), pp. 246–262. University of Hawaii Press, Honolulu.
Langley, C.H. and Fitch, W.M. 1974. An estimation of the constancy of
the rate of molecular evolution. J. Mol. Evol. 3: 161–177.
Lefevre, G. 1976. A photographic representation and interpretation of
the polytene chromosomes of Drosophila melanogaster salivary
glands. In The genetics and biology of Drosophila (eds. M. Ashburner
and E. Novitski), pp. 31–66. Academic Press, London.
Li, X. and Noll, M. 1994a. Compatibility between enhancers and
promoters determines the transcriptional specificity of gooseberry and
gooseberry neuro in the Drosophila embryo. EMBO J. 13: 400–406.
Li, X. and Noll, M. 1994b. Evolution of distinct developmental
functions of three Drosophila genes by acquisition of different
cis-regulatory regions. Nature 367: 83–87.
Li, X., Gutjahr, T., and Noll, M. 1993. Separable regulatory elements
mediate the establishment and maintenance of cell states by the
Drosophila segment-polarity gene gooseberry. EMBO J. 12: 1427–1436.
Lim, L.P., Glasner, M.E., Yekta, S., Burge, C.B., and Bartel, D.P. 2003a.
Vertebrate microRNA genes. Science 299: 1540.
Lim, L.P., Lau, N.C., Weinstein, E.G., Abdelhakim, A., Yekta, S.,
Rhoades, M.W., Burge, C.B., and Bartel, D.P. 2003b. The microRNAs
of Caenorhabditis elegans. Genes & Dev. 17: 991–1008.
McCaskill, J.S. 1990. The equilibrium partition function and base pair
binding probabilities for RNA secondary structure. Biopolymers
McDonald, J.H. and Kreitman, M. 1991. Adaptive protein evolution at
the Adh locus in Drosophila. Nature 351: 652–654.
Neafsey, D.E. and Galagan, J.E. 2007. Positive selection for unpreferred
codon usage in eukaryotic genomes. BMC Evol. Biol. 7: 119. doi:
Nei, M. 1987. Molecular evolutionary genetics. Columbia University Press,
Nei, M. and Gojobori, T. 1986. Simple methods for estimating the
numbers of synonymous and nonsynonymous nucleotide
substitutions. Mol. Biol. Evol. 3: 418–426.
Nielsen, R., Bauer DuMont, V.L., Hubisz, M.J., and Aquadro, C.F. 2007.
Maximum likelihood estimation of ancestral codon usage bias
parameters in Drosophila. Mol. Biol. Evol. 24: 228–235.
Ohta, T. and Kimura, M. 1971. On the constancy of the evolutionary
rate of cistrons. J. Mol. Evol. 1: 18–25.
Parker, J. 1989. Errors and alternatives in reading the universal genetic
code. Microbiol. Rev. 53: 273–298.
Pedersen, J.S., Bejerano, G., Siepel, A., Rosenbloom, K., Lindblad-Toh,
K., Lander, E.S., Kent, J., Miller, W., and Haussler, D. 2006.
Identification and classification of conserved RNA secondary
structures in the human genome. PLoS Comput. Biol. 2: e33. doi:
Pollard, K.S., Salama, S.R., Lambert, N., Lambot, M.A., Coppens, S.,
Pedersen, J.S., Katzman, S., King, B., Onodera, C., Siepel, A., et al.
2006. An RNA gene expressed during cortical development evolved
rapidly in humans. Nature 443: 167–172.
Purvis, I.J., Bettany, A.J., Santiago, T.C., Coggins, J.R., Duncan, K.,
Eason, R., and Brown, A.J. 1987. The efficiency of folding of some
proteins is increased by controlled rates of translation in vivo. A
hypothesis. J. Mol. Biol. 193: 413–417.
Rozas, J., Sanchez-DelBarrio, J.C., Messeguer, X., and Rozas, R. 2003.
DnaSP, DNA polymorphism analyses by the coalescent and other
methods. Bioinformatics 19: 2496–2497.
Siepel, A. and Haussler, D. 2004. Phylogenetic estimation of context-
dependent substitution rates by maximum likelihood. Mol. Biol. Evol.
Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M.,
Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., et
al. 2005. Evolutionarily conserved elements in vertebrate, insect,
worm, and yeast genomes. Genome Res. 15: 1034–1050.
Singh, N.D., Bauer DuMont, V.L., Hubisz, M.J., Nielsen, R., and
Aquadro, C.F. 2007. Patterns of mutation and selection at
synonymous sites in Drosophila. Mol. Biol. Evol. 24: 2687–2697.
Smith, N.G. and Eyre-Walker, A. 2002. Adaptive protein evolution in
Drosophila. Nature 415: 1022–1024.
Stark, A., Lin, M.F., Kheradpour, P., Pedersen, J.S., Parts, L., Carlson,
J.W., Crosby, M.A., Rasmussen, M.D., Roy, S., et al. 2007. Discovery of
functional elements in 12 Drosophila genomes using evolutionary
signatures. Nature 450: 219–232.
Stolc, V., Gauhar, Z., Mason, C., Halasz, G., van Batenburg, M.F., Rifkin,
S.A., Hua, S., Herreman, T., Tongprasit, W., Barbano, P.E., et al.
2004. A gene expression map for the euchromatic genome of
Drosophila melanogaster. Science 306: 655–660.
Thanaraj, T.A. and Argos, P. 1996. Ribosome-mediated translational
pause and protein domain organization. Protein Sci. 5: 1594–1612.
Thornton, K. 2003. Libsequence: A C++ class library for evolutionary
genetic analysis. Bioinformatics 19: 2325–2327.
Vicario, S., Moriyama, E.N., and Powell, J.R. 2007. Codon usage in
twelve species of Drosophila. BMC Evol. Biol. 7: 226. doi:
Vicoso, B. and Charlesworth, B. 2006. Evolution on the X chromosome:
Unusual patterns and processes. Nat. Rev. Genet. 7: 645–653.
Weir, B.S. 1990. Genetic data analysis. Sinauer, Sunderland, MA.
Xie, X., Lu, J., Kulbokas, E.J., Golub, T.R., Mootha, V., Lindblad-Toh, K.,
Lander, E.S., and Kellis, M. 2005. Systematic discovery of regulatory
motifs in human promoters and 3′ UTRs by comparison of several
mammals. Nature 434: 338–345.
Zuker, M. and Stiegler, P. 1981. Optimal computer folding of large RNA
sequences using thermodynamics and auxiliary information. Nucleic
Acids Res. 9: 133–148.
Zuckerkandl, E. and Pauling, L. 1962. Molecular disease, evolution, and
genic heterogeneity. In Horizons in biochemistry (eds. M. Kasha and
B. Pullman), pp. 189–225. Academic Press, New York.
Received February 6, 2008; accepted in revised form June 19, 2008.