Copyright © 2005 by the Genetics Society of America
Origin and Evolution of a Chimeric Fusion Gene in Drosophila subobscura,
D. madeirensis and D. guanche
Corbin D. Jones,*,†,1 Andrew W. Custer* and David J. Begun*
*Center for Population Biology, University of California, Davis, California 95616 and †Department of Biology and
Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, North Carolina 27599-3280
Manuscript received October 7, 2004
Accepted for publication February 9, 2005
An understanding of the mutational and evolutionary mechanisms underlying the emergence of novel
genes is critical to studies of phenotypic and genomic evolution. Here we describe a new example of a
recently formed chimeric fusion gene that occurs in Drosophila guanche, D. madeirensis, and D. subobscura.
This new gene, which we name Adh-Twain, resulted from an Adh mRNA that retrotransposed into the
Gapdh-like gene, CG9010. Adh-Twain is transcribed; its 5′ promoters and transcription patterns appear
similar to those of CG9010. Population genetic and phylogenetic analyses suggest that the amino acid
sequence of Adh-Twain evolved rapidly via directional selection shortly after it arose. Its more recent
history, however, is characterized by slower evolution consistent with increasing functional constraints.
We present a model for the origin of this new gene and discuss genetic and evolutionary factors affecting
the evolution of new genes and functions.
Genomes gain and lose genes at surprisingly high rates in both
unicellular (Ochman and Jones 2000; Lynch and Conery 2003) and
multicellular organisms (Patthy 1999; Betran et al. 2002;
Harrison et al. 2002; Borevitz et al. 2003; Holland
2003; Tian et al. 2003). Identifying the mutational and
population genetic mechanisms involved in gene loss
and gain is critical to understanding the forces shaping
genome variation. The spread of gene duplications, by
either drift or natural selection (Wagner 2001; Ohta
2003; reviewed in Wolfe and Li 2003), is one mechanism
by which gene number increases (Haldane 1932;
Ohno 1970). Gene duplications may, in many cases,
evolve new functions or become subfunctionalized simply
as a result of amino acid or expression evolution
and not as a consequence of large-scale changes in gene
organization (Ohno 1970; Lynch and Conery 2000;
Lynch et al. 2001; Hughes 2002; Betran and Long
2003; Katju and Lynch 2003).
Occasionally, duplication events lead to radical reorganization
and immediate functional divergence. One type of radical
reorganization is gene fusion, whereby two previously
separate and independent genes are fused to
form a single contiguous gene. Such chimeric fusion
genes (CFGs) have been identified in several taxa. For
example, in plants CFGs are implicated in cytoplasmic
male sterility (He et al. 1996). A few CFGs have also
been found in vertebrates (Finta and Zaphiropoulos
2000; Rogalla et al. 2000; Thomson et al. 2000; Cour-
seaux and Nahon 2001). Finally, several novel CFGs
have been described in Drosophila, such as jingwei in
Drosophila teissieri and D. yakuba (Long and Langley
1993), Sdic in D. melanogaster (Nurminsky et al. 1998),
and Adh-finnegan (Sullivan et al. 1994; Begun 1997).
pseudogenes (Fischer and Maniatis 1985; Jeffs and
Ashburner 1991). In both cases, further analysis showed
that these genes were functional genes that acquired
protein-coding sequence 5′ of the Adh-derived region
is a fusion of the amino terminus of a gene known as
yellow emperor and a retrotransposed Adh (Wang et al.
2000). The jingwei expression profile has diverged from
its Adh ancestors. In D. teissieri expression is now limited
to the testes (like ymp), although in D. yakuba it is ex-
pressed in other tissues as well (Long and Langley
1993). Adh-finnegan was created by the chromosomal
duplication of Adh combined with the recruitment of a
new 5′ exon of unknown origin (Begun 1997). Adh-finnegan
appears to be expressed broadly in adult tissues
(Sullivan et al. 1994). Although these two Adh-derived
fusion genes arose via different mechanisms and show
dramatically different expression patterns, directional
selection appears to have driven rapid amino acid evolution
in both genes (Long and Langley 1993; Begun
1997). The fact that two novel Drosophila genes are
Sequence data from this article have been deposited with the
EMBL/GenBank Data Libraries under accession nos. AY874360–
1Corresponding author: Department of Biology, Carolina Center for
Genome Sciences, CB 3280, 414 Coker Hall, University of North
Carolina, Chapel Hill, NC 27599-3280.
Genetics 170: 207–219 (May 2005)
derived from Adh and share some common aspects of
their evolution raises two important questions:
1. Is Adh overrepresented among novel fly genes? Or,
does the discovery of jingwei and Adh-finnegan, both
of which were discovered unintentionally, reflect
the intense study of Adh in Drosophila?
2. If Adh frequently participates in radical reorganizations
associated with novel function, what general
principles of the evolution of novel function are revealed
by these examples of repeated evolution?
Given the history of the discovery of jingwei and Adh-finnegan,
a report of a third putative Adh pseudogene
in the obscura group of Drosophila (Marfany and Gonzalez-Duarte
1992; Luque et al. 1997) attracted our
attention. DNA sequencing showed that this putative
pseudogene originated by retrotransposition. Results
from polytene in situ hybridization showed that this Adh
retrosequence had transposed to chromosome arm E
from chromosome U, which is the expected location of
Adh on the basis of the conservation of Muller elements
in D. subobscura, D. guanche, and D. madeirensis, but not
Duarte 1992; Luque et al. 1997), suggesting that the
gene likely arose within the past 3 million years. Luque
Adh retropseudogene, two each from genomic libraries
of D. subobscura, D. guanche, and D. madeirensis. Comparisons
of putative Adh pseudogenes to Adh for these species
revealed frameshift mutations or indels in 5′ and
3′ regions flanking the Adh coding regions. The codon
homologous to the ATG initiation codon of the ancestral
Adh was CTG in both D. guanche clones. Premature
stop codons were evident in one of the two D. guanche
clones and one of the two D. subobscura clones, but none
of the D. madeirensis clones. These observations suggested
that at least some of these Adh sequences were
no longer functional. The fact that the duplicate Adh
was a retrosequence was also interpreted as support
for the pseudogene hypothesis, as retrotransposed sequences
potentially lack regulatory elements necessary
amino acid substitution rates (dN) relative to Adh. None
of the putative retropseudogenes, however, showed a
nonsynonymous to synonymous substitution rate (dN/dS)
close to one, the expectation for a neutrally evolving
pseudogene. Moreover, codon bias increased in the putative
Adh retropseudogenes, an unexpected result for
a nonfunctional gene. Overall, the data presented a
conflicting picture of the Adh retrosequence. Some aspects
of the data supported the pseudogene hypothesis,
yet others were strangely inconsistent with the hypothesis
and were similar to the situation previously observed
in the repleta group Adh duplication (Sullivan et al.
1994; Begun 1997). We present here our analysis of this
retrotransposed Adh. We show that this putative Adh
retropseudogene is actually part of a new chimeric fusion
gene that is the result of an Adh mRNA inserting
into the Gapdh-like gene, CG9010. This fusion gene is
actively and widely transcribed. While the 5′ promoters
and transcription patterns of this gene are similar to
those of CG9010, the protein-coding region has diverged
for both the CG9010 and the Adh-like regions.
Population genetic and phylogenetic analyses suggest
that this amino acid evolution resulted from directional
selection shortly after the chimeric fusion gene was
MATERIALS AND METHODS
Stocks: D. subobscura stocks were obtained from the Species
Stock Center and from A. Davis. D. guanche, D. hydei, D. pseudo-
obscura, D. melanogaster, and D. yakuba stocks were originally
obtained from the Species Stock Center and the Bloomington
Stock Center. A D. madeirensis stock was kindly provided by
M. Aguadé. All stocks were reared on standard Drosophila
medium at room temperature.
DNA sequencing: PCR products were sequenced directly
using an ABI 377 automated sequencer and BigDye Termina-
tor chemistry (Applied BioSystems, Foster City, CA).
RNA was prepared from whole flies or larvae using a Micro-
Poly(A) kit (Ambion, Austin, TX). cDNA for reverse tran-
scriptase-PCR and rapid amplification of cDNA ends (RACE)
was prepared from this RNA using the SMART RACE cDNA
amplification kit (CLONTECH, Palo Alto, CA). SuperScript
II reverse transcriptase (GIBCO BRL, Rockville, MD) was used
for all RT reactions. Gene-specific primers were used to assay
gene expression by RT-PCR on cDNA isolated from larvae
(first, second, and third instar), whole adult males, and whole
Genomic library construction and screening: D. subobscura
genomic DNA was isolated from adult flies, partially digested
with Sau3AI (New England BioLabs, Beverly, MA), and then
dephosphorylated with CIAP (Promega, Madison, WI). These
fragments were ligated into the Lambda DASH II vector ac-
cording to the manufacturer’s instructions (Stratagene, La
ing using Gigapack III Gold packaging reactions (Stratagene,
La Jolla, CA). The library was amplified once on plates using
XL-1Blue MRA [P2] cells.
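The partial Sau3AI digestion used to build the library can be illustrated with a small simulation. The sketch below is not part of the original protocol: it assumes only that Sau3AI recognizes GATC and cuts 5′ of the G, and it treats each site as cut independently with a fixed probability. The function names, the toy sequence, and the cut probability are hypothetical.

```python
import random

SAU3AI_SITE = "GATC"  # Sau3AI recognition sequence; the enzyme cuts 5' of the G


def find_sites(seq, site=SAU3AI_SITE):
    """Return the 0-based index of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def partial_digest(seq, cut_probability, rng=None):
    """Cut each site independently with probability `cut_probability`.

    Returns the fragments in order; concatenated, they reproduce `seq`.
    This independence model is an illustrative assumption.
    """
    rng = rng or random.Random()
    # Skip a site at position 0: cutting there would yield an empty fragment.
    cuts = [p for p in find_sites(seq) if p > 0 and rng.random() < cut_probability]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]
```

With a cut probability of 1.0 the function models a complete digest, and with lower values it leaves some sites uncut, which is what produces the larger, overlapping fragments wanted for a genomic library.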
Primary and secondary plaque lifts were carried out on
The library was screened with a 1-kb CG9010 probe that was
PCR amplified from D. melanogaster. Because this probe cross-
hybridizes to plaques harboring Gapdh, we used PCR with
primers designed for D. subobscura Gapdh to rule out false
positives. Phage containing CG9010 were digested with EcoRI
probed with CG9010. The fragment containing D. subobscura
CG9010 was subcloned into pBluescript.
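The logic of the PCR counter-screen described above, in which clones that amplify with Gapdh-specific primers are discarded as false positives, can be sketched as a simple in silico PCR. This is an illustration under stated assumptions and not the authors' procedure: it requires exact primer matches, and the function names and any sequences passed to it are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse):
    """Return the predicted amplicon, or None if either primer fails to anneal.

    Exact matching only; real primers tolerate some mismatches.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so search the
    # template for its reverse complement downstream of the forward primer.
    rev_site = reverse_complement(reverse)
    end = template.find(rev_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


def is_gapdh_false_positive(clone_seq, gapdh_forward, gapdh_reverse):
    """Flag a clone as a false positive if the Gapdh primers yield a product."""
    return in_silico_pcr(clone_seq, gapdh_forward, gapdh_reverse) is not None
```

A clone that hybridizes to the CG9010 probe but returns a product here would be set aside as a likely Gapdh clone, mirroring the screening step in the text.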
Genomic Southern blot analysis of CG9010: Southern blot
analysis was used to infer copy number of CG9010 in D. subobscura.
Genomic DNA (5 µg) was purified from D. melanogaster,
D. pseudoobscura, D. guanche, and D. subobscura. These samples
were digested with PstI or HindIII (GIBCO BRL), electropho-
resed on a 0.7% gel, and Southern blotted to Nytran nylon
1-kb fragments of D. melanogaster CG9010 and then D. subob-
Protein analyses: One gram of tissue (whole adults and
Genet. 18: 433–434.
Jain, R., M. C. Rivera and J. A. Lake, 1999 Horizontal gene transfer
among genomes: the complexity hypothesis. Proc. Natl. Acad.
Sci. USA 96: 3801–3806.
Jeffs, P., and M. Ashburner, 1991 Processed pseudogenes in Drosophila.
Proc. R. Soc. Biol. Sci. 44: 151–159.
Katju, V., and M. Lynch, 2003 The structure and early evolution
of recently arisen gene duplicates in the Caenorhabditis elegans
genome. Genetics 165: 1793–1803.
Kern, A. D., C. D. Jones and D. J. Begun, 2002
of nucleotide substitutions in Drosophila simulans. Genetics 162:
Long, M., and C. H. Langley, 1993
of jingwei, a chimeric processed functional gene in Drosophila.
Science 260: 91–95.
Long, M., M. Deutsch, W. Wang, E. Betran, F. G. Brunet et al.,
2003a Origin of new genes: evidence from experimental and
computational analyses. Genetica 118: 171–182.
Long, M., E. Betran, K. Thornton and W. Wang, 2003b
origin of new genes: glimpses from the young and old. Nat. Rev.
Genet. 4: 865–875.
Loukas, M., C. B. Krimbas, P. Mavragani-Tsipidou and C. D. Kastritsis,
1979 Genetics of Drosophila subobscura populations.
VIII. Allozyme loci and their chromosome maps. J. Hered. 70:
Luque, T., G. Marfany and R. Gonzalez-Duarte, 1997
ization and molecular analysis of Adh retrosequences in species
of the Drosophila obscura group. Mol. Biol. Evol. 14: 1316–1325.
Lynch, M., and J. S. Conery, 2000 The evolutionary fate and consequences
of duplicate genes. Science 290: 1151–1155.
Lynch, M., and J. S. Conery, 2003 The origins of genome complexity.
Science 302: 1401–1404.
Lynch, M., M. O’Hely, B. Walsh and A. Force, 2001
ity of preservation of a newly arisen gene duplicate. Genetics 159:
Marfany, G., and R. Gonzalez-Duarte, 1992 Evidence for retrotranscription
of protein-coding genes in the Drosophila subobscura
genome. J. Mol. Evol. 35: 492–501.
McDonald, J. H., and M. Kreitman, 1991 Adaptive protein evolution
at the Adh locus in Drosophila. Nature 351: 652–654.
Nielsen, H., J. Engelbrecht, S. Brunak and G. von Heijne, 1997
A neural network method for identification of prokaryotic and
eukaryotic signal peptides and prediction of their cleavage sites.
Int. J. Neural Syst. 8: 581–599.
Nurminsky, D. I., E. N. Moriyama, E. R. Lozovskaya and D. L.
Hartl, 1996 Molecular phylogeny and genome evolution in
the Drosophila virilis species group: duplication of the alcohol
dehydrogenase gene. Mol. Biol. Evol. 13: 132–149.
Nurminsky, D. I., M. V. Nurminskaya, D. De Aguiar and D. L.
Hartl, 1998 Selective sweep of a newly evolved sperm-specific
gene in Drosophila. Nature 396: 572–575.
Ochman, H., and I. B. Jones, 2000 Evolutionary dynamics of full
genome content in Escherichia coli. EMBO J. 19: 6637–6643.
Ohler, U., G. C. Liao, H. Niemann and G. M. Rubin, 2002
tational analysis of core promoters in the Drosophila genome.
Genome Biol. 3: RESEARCH0087.
Ohno, S., 1970 Evolution by Gene Duplication. Springer, Berlin.
Ohta, T., 2003 Evolution by gene duplication revisited: differentiation
of regulatory elements versus proteins. Genetica 118: 209–
Patthy, L., 1999 Genome evolution and the evolution of exon-shuffling:
a review. Gene 238: 103–114.
Powell, J. R., 1997 Progress and Prospects in Evolutionary Biology: The
Drosophila Model. Oxford University Press, Oxford.
Ramos-Onsins, S., C. Segarra, J. Rozas and M. Aguade, 1998
lecular and chromosomal phylogeny in the obscura group of
Drosophila inferred from sequences of the rp49 gene region.
Mol. Phylogenet. Evol. 9: 33–41.
Reese, M. G., 2001 Application of a time-delay neural network to
put. Chem. 26: 51–56.
Rogalla, P., B. Kazmierczak, A. M. Flohr, S. Hauke and J. Bullerdiek,
2000 Back to the roots of a new exon: the molecular
archaeology of a SP100 splice variant. Genomics 63: 117–122.
Rozas, J., and R. Rozas, 1999 DnaSP version 3: an integrated program
for molecular population genetics and molecular evolution
analysis. Bioinformatics 15: 174–175.
Sullivan, D. T., W. T. Starmer, S. W. Curtiss, M. Menotti-Raymond
and J. Yum, 1994 Unusual molecular evolution of an Adh
pseudogene in Drosophila. Mol. Biol. Evol. 11: 443–458.
Thomson, T. M., J. J. Lozano, N. Loukili, R. Carrio, F. Serras et
al., 2000 Fusion of the human gene for the polyubiquitination
coeffector UEV1 with Kua, a newly identified gene. Genome Res.
Tian, D., M. B. Traw, J. Q. Chen, M. Kreitman and J. Bergelson,
2003 Fitness costs of R-gene-mediated resistance in Arabidopsis
thaliana. Nature 423: 74–77.
Visa, N., G. Marfany, L. Vilageliu, R. Albalat, S. Atrian et al.,
1991 The Adh in Drosophila: chromosomal location and restriction
analysis in species with different phylogenetic relationships.
Chromosoma 100: 315–322.
Vivas, M. V., J. Garcia-Planells, C. Ruiz, G. Marfany, N. Paricio et
al., 1999 GEM, a cluster of repetitive sequences in the Drosophila
subobscura genome. Gene 229: 47–57.
sequenced eukaryotes. Trends Genet. 17: 237–239.
Wang, W., J. Zhang, C. Alvarez, A. Llopart and M. Long, 2000
of its parental gene, yellow emperor, in Drosophila melanogaster. Mol.
Biol. Evol. 17: 1294–1301.
Wolfe, K. H., and W. H. Li, 2003 Molecular evolution meets the
genomics revolution. Nat. Genet. 33: 255–265.
Yang, Z., 1997 PAML: a program package for phylogenetic analysis
by maximum likelihood. Comput. Appl. Biosci. 13: 555–556.
Yang, Z., 1998 Likelihood ratio tests for detecting positive selection
and application to primate lysozyme evolution. Mol. Biol. Evol.
Yang, Z., and J. P. Bielawski, 2000 Statistical methods for detecting
molecular adaptation. Trends Ecol. Evol. 15: 496–503.
Yang, Z., W. J. Swanson and V. D. Vacquier, 2000
likelihood analysis of molecular adaptation in abalone sperm
lysin reveals variable selective pressures among lineages and sites.
Mol. Biol. Evol. 17: 1446–1455.
Yum, J., W. T. Starmer and D. T. Sullivan, 1991 The structure of
the Adh locus in Drosophila mettleri; an intermediate in the evolution
of the Adh locus in the repleta group of Drosophila. Mol.
Biol. Evol. 8: 857–867.
Zhang, J., A. M. Dean, F. Brunet and M. Long, 2004
protein functional diversity in new genes of Drosophila. Proc.
Natl. Acad. Sci. USA 101: 16246–16250.
Communicating editor: T. Eickbush