Evolutionary Origin and Functions of Retrogene Introns
Marie Fablet,1 Manuel Bueno,2 Lukasz Potrzebowski, and Henrik Kaessmann
Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
Retroposed genes (retrogenes) originate via the reverse transcription of mature messenger RNAs from parental source genes and are therefore usually devoid of introns. Here, we characterize a particular set of mammalian retrogenes that acquired introns upon their emergence and thus represent rare cases of intron gain in mammals. We find that although a few retrogenes evolved introns in their coding or 3′ untranslated regions (UTRs), most introns originated together with untranslated exons in the 5′ flanking regions of the retrogene insertion site. They emerged either de novo or through fusions with 5′ UTR exons of host genes into which the retrogenes inserted. Generally, retrogenes with introns display high transcription levels and show broader spatial expression patterns than other retrogenes. Our experimental expression analyses of individual intron-containing retrogenes show that 5′ UTR introns may indeed promote higher expression levels, at least in part through encoded regulatory elements. By contrast, 3′ UTR introns may lead to downregulation of expression levels via nonsense-mediated decay mechanisms. Notably, the majority of retrogenes with introns in their 5′ flanks depend for their expression on distant, sometimes bidirectional, CpG dinucleotide–enriched promoters, which may have been recruited from other genes in the genomic vicinity. We thus propose a scenario where the acquisition of new 5′ exon–intron structures was directly linked to the recruitment of distant promoters by these retrogenes, a process potentially facilitated by the presence of proto-splice sites in the genomic vicinity of retrogene insertion sites. Thus, the primary role and selective benefit of new 5′ introns (and UTR exons) was probably initially to span the often substantial distances to potent CpG promoters driving retrogene transcription. Later in evolution, these introns acquired additional regulatory roles in fine-tuning retrogene expression levels. Our study provides novel insights regarding mechanisms underlying the origin of new introns, the evolutionary relevance of intron gain, and the origin of new gene promoters.
The discovery of introns (Berget et al. 1977; Chow et al. 1977; Evans et al. 1977; Goldberg et al. 1977) represents one of the most remarkable findings in molecular biology. Ever since, both the evolutionary origin and the functional roles of introns have been intensely studied. A major question has been why introns have become prevalent in most eukaryotic genomes. The original establishment of introns appears to have been the result of nonadaptive forces, and introns impose significant costs on their host genes in terms of transcription and mutational load (Lynch 2007; Catania and Lynch 2008). Nevertheless, a number of roles have since been suggested and/or demonstrated for introns, and these roles have made them fundamental components of the eukaryotic genome. For example, it was suggested that by increasing the recombination rate between coding exons, introns would increase the efficiency of selection in genes (Roy and Gilbert 2006). On the molecular–functional level, it is well established that introns increase or otherwise affect transcription, polyadenylation, messenger RNA (mRNA) export, translational efficiency, and mRNA stability/decay (reviewed in Le Hir et al. 2003). Introns have thus evolved indispensable functional roles.
Although both intron gain and loss have been documented, intron loss appears to have been prevailing, at least during the more recent evolution of eukaryotes (Roy and Gilbert 2006). For example, although a significant number of intron losses have been detected in mammals (Coulombe-Huntington and Majewski 2007), only a single reliable case of intron gain has so far been reported for this evolutionary lineage (O’Neill et al. 1998). Thus, opportunities to study the mechanisms of intron gain and its evolutionary and functional relevance have been limited.
Retroposed genes (retrogenes) originate as “stripped-down” gene copies (retrocopies) that lack introns (and usually also regulatory sequences). This is because the duplication mechanism leading to their emergence (retroposition, also termed retroduplication; Kaessmann et al. 2009) involves the reverse transcription of mature mRNAs from parental source genes. Generally, due to their peculiar properties, retrogenes represent particularly good models to study mechanisms underlying the origin of new genes and their functions (Kaessmann et al. 2009). Notably, although retrocopies lack introns upon their emergence, intron-containing retrogenes have been discovered in different eukaryotic lineages (e.g., Drosophila: Long and Langley 1993; plants: Wang et al. 2006; and mammals: Bradley et al. 2004). These retrogenes thus represent interesting cases of intron gain. Often, these introns were acquired by fusions with host genes into which retrocopies inserted (Kaessmann et al. 2009). Recently, however, instances of mammalian retrogenes that appear to have evolved introns de novo from their genomic environment have been identified (Vinckenbosch et al. 2006; Baertsch et al. 2008).
In this study, we set out to characterize in detail a comprehensive set of mammalian retrogenes that acquired new introns during their evolution (Vinckenbosch et al. 2006), mainly in their 5′ untranslated regions (UTRs) but also in their 3′ UTRs. To unravel the mechanism of their emergence, their functional impact, and their evolutionary significance, we performed a range of evolutionary and experimental analyses. We find that introns regulate retrogene
1Present address: Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Villeurbanne,
2Present address: Ecole Polytechnique Fédérale de Lausanne (EPFL), Biomolecular Screening Facility, Building AAB, Station 15, Lausanne,
Key words: retrogenes, origin of new gene functions, intron
evolution, promoter evolution, gene expression.
Mol. Biol. Evol. 26(9):2147–2156. 2009
Advance Access publication June 24, 2009
© The Author 2009. Published by Oxford University Press on behalf of
the Society for Molecular Biology and Evolution. All rights reserved.
For permissions, please e-mail: firstname.lastname@example.org
by guest on December 21, 2015
many retrogene introns seems to be directly associated with
the acquisition of retrogene promoters. We discuss the rel-
evance of our results not only with respect to the functional
evolution of new genes but also in the context of previously
suggested mechanisms for the emergence of new introns
and their evolutionary and functional roles.
Materials and Methods
We determined the 5′ and 3′ ends of each retrogene and the corresponding parent with the First Choice RLM-RACE kit from Ambion. The cDNA pool was synthesized from testis total RNAs according to the manufacturer’s instructions.
The RPL36AL, HSPA2, SPIN2B, and FAM50B retrogenes were PCR amplified from the beginning of the 5′ UTR to the penultimate codon, both from genomic DNA and from a cDNA pool synthesized from testis total RNAs, providing the intron-containing and intron-spliced-out versions of the retrogenes, respectively. The PCR products were subsequently cloned into the pEGFP-N1 vector from Clontech. The ENSG00000182814 retrogene, whose intron is located in its 3′ UTR, was PCR amplified from the first
UTR, from genomic DNA and a cDNA pool synthesized
from testis total RNAs and then cloned into the pEGFP-
C1 vector from Clontech. The transcription of each of these
constructs is driven by a cytomegalovirus promoter, which
allows strong expression in HeLa cells. Complementary
DNAs of the RPL36AL and FAM50B retrogenes were also
cloned into the promoterless pGL4.10 plasmid from Prom-
ega (luciferase reporter gene). For both retrogenes, we PCR
without splice donor–acceptor sites. We confirmed that in-
trons are not spliced out in our experiments using PCR on
cDNAs obtained from transformed cell lysates.
Cell Culture and Transfections
medium (Gibco) supplemented with 10% fetal bovine serum. Cells were transfected with Lipofectamine (Invitrogen)
and lysed for analysis about 48 h after transfection. The
assays, we cotransfected with the pGL4.74 plasmid (Prom-
ega) carrying the Renilla luciferase gene.
Quantitative Reverse Transcription PCR
RNAs were extracted with the RNeasy Mini kit (Qia-
gen) and treated with DNase using the Turbo DNA free kit
(Ambion). Total RNA was converted to cDNA (primed
with oligo dT) with the Superscript III kit (Invitrogen).
Except for HSPA2, the primers used for the SYBR Green qPCR overlap the end of the considered gene and part of the pEGFP sequence in order to exclude transcripts from the endogenous genes. For HSPA2, transcription of the mouse gene starts at two main sites, T1 and T2, the latter being located within the intron (Scieglinska et al. 2001). Transcripts beginning at T2 are therefore intronless. The primers were thus chosen to specifically amplify products from the T1 transcription start site and to exclude transcripts from T2. Because these primers are mouse specific, they are unable to amplify the endogenous HSPA2 transcripts of HeLa cells. For each gene, the designed primers produce amplicons of approximately 70–120 bp. Transcription levels of the constructs were normalized relative to neomycin transcription levels. The neomycin gene is carried by the
scription is driven by an independent SV40 promoter, and
can therefore be used as a control of transfection level
(Jeyaseelan et al. 2001). To confirm that introns were
spliced out from full-length genomic constructs, we per-
formed PCR experiments using cDNAs generated from
RNA of cells transfected with genomic/cDNA constructs.
These experiments showed that full-length constructs and
intronless constructs resulted in products of equal size that
corresponded to the total lengths of retrogene exons. All
primer sequences are available upon request.
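The normalization to neomycin transcripts described above can be sketched as a simple delta-Ct calculation. This is an illustration only. The text does not state which quantification model was used, and the sketch assumes equal, perfect (twofold per cycle) amplification efficiency for the construct and neomycin amplicons. The function name `relative_level` and all Ct values are hypothetical.

```python
from statistics import mean


def relative_level(ct_target: float, ct_reference: float, efficiency: float = 2.0) -> float:
    """Abundance of the target amplicon relative to the reference (neomycin) amplicon.

    Assumes both amplicons share the same per-cycle amplification factor
    (2.0 = perfect doubling), i.e. the common delta-Ct approximation.
    """
    return efficiency ** -(ct_target - ct_reference)


# Hypothetical triplicate Ct values for one transfection (not data from the study).
ct_construct = [24.1, 24.3, 24.2]
ct_neomycin = [20.2, 20.1, 20.3]

level = relative_level(mean(ct_construct), mean(ct_neomycin))
print(f"{level:.4f}")  # 0.0625
```

Because the neomycin gene is expressed from its own promoter on the same plasmid, dividing by its signal is meant to cancel differences in how much plasmid each well received.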
For luminescence assays, cells were rinsed once with 1× Dulbecco’s phosphate-buffered saline and subsequently lysed directly with the Passive Lysis Buffer from the Promega Dual-Luciferase Reporter Assay System. Luciferase activity measurements were done with the Dual-Luciferase Reporter Assay System (Promega) following the manufacturer’s instructions. Luminescence was quantified using a Promega GloMax luminometer.
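In dual-luciferase assays, the firefly signal of each well is typically divided by the Renilla signal from the cotransfected pGL4.74 plasmid in the same well, to correct for transfection efficiency. The sketch below shows that normalization as a minimal example. The helper `normalized_ratios` and all readings are hypothetical, and the exact averaging scheme used in the study is not specified in the text.

```python
from statistics import mean


def normalized_ratios(firefly, renilla):
    """Per-well firefly/Renilla ratios (Renilla corrects for transfection efficiency)."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired per well")
    return [f / r for f, r in zip(firefly, renilla)]


# Hypothetical raw luminescence readings (relative light units), four replicate wells each.
empty_vector = normalized_ratios([1200, 1100, 1300, 1250], [50000, 48000, 52000, 51000])
intron_construct = normalized_ratios([9600, 8800, 10400, 9900], [50000, 47000, 53000, 50000])

# Fold activity of the intron construct over the promoterless empty vector.
fold_change = mean(intron_construct) / mean(empty_vector)
print(f"intron construct vs. empty vector: {fold_change:.1f}-fold")
```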
Expression data (Chalmel et al. 2007) were mapped to
a reannotated set of retrogenes following our previous pro-
cedure (Potrzebowski et al. 2008).
We analyzed nonsynonymous–synonymous substitu-
tion rates of the ENSG00000182814 (FUN14-domain)
retrogene (using orthologs from human, chimpanzee,
orangutan, and macaque) in a phylogenetic framework us-
ing the codeml tool from the PAML4 software package
(Yang 1997). We compared the two models detailed in
the main text using a likelihood ratio test (Yang 1998).
Retrogenes with Introns
In order to identify and characterize retrogenes with
introns, we reanalyzed a large recently established data
set of 3,590 human retrocopies (Vinckenbosch et al. 2006).
From this data set, we extracted 29 retrogenes (fig. 1) with
introns, whose structures were supported by expressed se-
quence tags (ESTs) and full-length mRNA sequences, and
are also confirmed by updated annotations available from
the Ensembl database (Hubbard et al. 2009) and the UCSC
genome browser (Kuhn et al. 2009). Functionality and expression for nearly all of these retrogenes are supported by one or more of the following: the presence of intact orthologous counterparts in the mouse, low nonsynonymous to synonymous substitution rates (suggesting selective preservation by purifying selection), and a significant number of ESTs (fig. 1).
Origin of 5′ UTR Introns
Where are the introns of these retrogenes located, what are their properties, and when and how did they arise? We found that in the majority (26/29) of cases introns are embedded in the 5′ UTR of retrogenes. In four of the cases, the retrogene is located in an intron of the host gene into which
UTR exons derived from 5′ UTR or coding exons of their host genes (figs. 1 and 2A). Thus, they represent a type of gene fusion: they rely on the promoters of their host genes for transcription and are transcribed as “alternative” transcripts of their host genes (Vinckenbosch et al. 2006).
In contrast, the remaining retrogenes with introns in their 5′ flanks appear to have evolved new 5′ untranslated exon–intron structures (termed 5′-UEI retrogenes in the following). This notion is supported by the observation that no other genes from which these exons/introns could have been derived are located in the vicinity of the retrocopy insertion sites. Furthermore, direct inheritance of these introns from partially processed parental transcripts (i.e., retroposition of incompletely spliced mRNAs) is also unlikely, given that their (extant) parental genes generally
FIG. 1.—Schematic structures and characteristics of orthologous intron-containing retrogenes in human and mouse genomes. Black bars indicate coding exons, open bars untranslated exons. The age/phylogenetic distribution of the retrogenes was determined in a previous study (Potrzebowski et al. 2008). (A) Origination in common mammalian ancestor, (B) common therian (eutherian/marsupial) ancestor, (C) common eutherian ancestor, (D) human–mouse ancestor (after dog lineage divergence), (E) primates. “ESTs” refers to unique ESTs mapped to these retrogenes (data from Vinckenbosch et al. 2006). “CpG” (“+”) refers to retrogenes with CpG dinucleotide islands (and associated promoters) overlapping their 5′ ends (see supplementary fig. 1, Supplementary Material online, for details; data from UCSC genome browser, http://genome.ucsc.edu/). Circles indicate bidirectional CpG island promoters. “(F)” indicates retrogenes located inside host genes that fused to the host genes’ 5′ UTR exons. “ParChr” and “ParStructure” refer to the chromosomal location of the parental gene and its exon–intron structure, respectively. NUP62, FAM113B, and ATP6V1E2 each have two alternative transcripts with different 5′ UTR structures, and both are shown. We note that although the human HSPA2 retrogene is depicted here as a single exon gene/transcript (major isoform), it can also be transcribed from a distant LTR promoter together with an additional 5′ UTR exon, as discussed in the main text and illustrated in figure 2B.
contain no UTR introns (fig. 1). For the same reasons, it is unlikely that processed cDNA copies of transcripts partially recombined with segmentally duplicated copies of the parental gene. Such recombination could also explain the presence of introns in this type of gene. Thus, it seems that the 5′ exon–intron structures of these retrogenes indeed evolved de novo, although it remains possible that they arose via duplication–transposition events that involved only UTRs from other genes in the genome.
How did these exons/introns evolve and what was their selective benefit? We suggest (similarly to the fusion cases described above) that at least part of the answer to this question is related to the acquisition of the promoters of these retrogenes, which are, by definition, located just upstream of the newly formed 5′ exon–intron structures. Specifically, we propose a scenario in which the original
rogenes became transcribed from a proto-promoter element (or promoter from another gene; see below) upstream of the insertion site during evolution (fig. 3). These promoters may have been strong. This notion is supported by the observation that 5′-UEI retrogenes show significantly higher transcription activity, as assessed by numbers of associated ESTs (median no. of ESTs: 72, mean: 151.3), than other functional retrogenes that lack introns (median: 11, mean: 70.2; P < 10⁻⁴, Mann–Whitney U test).
Once transcribed, proto-splice sites that were present or evolved in the genomic sequence between the promoter element and the inserted retrocopy were recruited and selectively fixed for splicing the introns out of the newly formed UTR region (fig. 3A). The promoter acquisition and splice site recruitment probably occurred in parallel, because a retrogene transcribed from a distant promoter without intervening intron(s) would likely not have been selectively preserved. The resultant long UTR (fig. 3B)—transcriptional start sites (TSSs) of retrogenes
bases, fig. 1) from the retrogene insertion site—would probably have interfered with translation of retrogene open reading frames (ORFs). It would have done so by introducing premature (upstream ATG) initiation codons (Catania and Lynch 2008) and/or by promoting unfavorable secondary structure formation of the mRNA.
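The point that a long 5′ UTR would likely introduce premature initiation codons can be made concrete by scanning a UTR for upstream ATG (uATG) triplets. The sketch below uses a hypothetical helper on made-up sequences, not retrogene data. It ignores Kozak context and reading frame, which in reality modulate how strongly a uATG competes with the genuine start codon.

```python
def upstream_atgs(utr5: str) -> list[int]:
    """Return 0-based positions of every ATG lying entirely within a 5' UTR.

    Any ATG upstream of the annotated start codon can be recognized by
    scanning ribosomes before they reach the real ORF.
    """
    seq = utr5.upper().replace("U", "T")  # accept RNA or DNA alphabet
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


# Hypothetical sequences: a short UTR, and the same UTR lengthened by an
# extra upstream segment that happens to carry two ATGs.
short_utr = "GCCGCCACC"
long_utr = "GGATGCTTACGCATGAAC" + short_utr

print(upstream_atgs(short_utr))  # []
print(upstream_atgs(long_utr))   # [2, 12]
```

The longer the transcribed UTR, the more chances such triplets have to arise, which is one way to read the selective advantage of splicing most of that sequence out.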
(mean: ~414 bp) is similar to that of other genes in the genome (mean: ~300 bp, Lander et al. 2001; two-tailed P = 0.18, Student’s t-test), reflecting the general selective constraint affecting 5′ UTR exon lengths.
above, the retrogene is first transcribed from a promoter close to the insertion site (or a promoter inherited from the parental gene), similar to other retrogenes (Kaessmann et al. 2009). Then, during evolution, it acquires a more distant (potentially stronger) promoter (fig. 3C). Interestingly,
and its mouse ortholog, HSP70.2. Both the human (fig. 2B)
and mouse (Scieglinska et al. 2001) orthologous retrogenes
have a promoter close to their ORFs and an additional pro-
moter further upstream that is interrupted by a UTR intron,
FIG. 2.—Different mechanisms of retrogene promoter acquisition via the evolution of new 5′ UTR structures, illustrated with individual examples. (A) Insertion of the retrocopy into an intron of a host gene and recruitment of a host gene UTR exon and the CpG promoter of the host gene. (B) Initial transcription from a promoter near the insertion site or utilization of an alternative parental promoter in inherited UTR (Kaessmann et al. 2009), and additional recruitment of a distant promoter from an LTR retrotransposable element. We note that the LTR-driven transcripts were detected in an eye–retinoblastoma library (Strausberg et al. 2002). (C) Recruitment of a bidirectional (CpG-enriched) promoter from a neighboring gene in the vicinity of the retrogene insertion site. (D) Recruitment of a distant CpG proto-promoter not previously associated with a gene. CpG islands and LTRs are marked with the symbols shown in the figure. Also indicated are promoters (bent arrows), translated (black bars) and untranslated (white–shaded bars) parts of retroposed gene copies (to the right), and introns (thin white lines) and exons (dark, narrow shaded boxes) of other (nonretroposed) genes in the genomic vicinity.
dently (see further below).
Generally, as discussed above, the origin of the new 5′ region of the retrogene was intimately linked to the recruitment of potent and potentially distant promoters by the newly formed retrogenes. The major, initial role and selective benefit of the introns of these newly formed genes was probably to reduce the size of the newly acquired UTR exons. But what is the nature of the recruited promoters that apparently drove the evolution of the new 5′ exon–intron structures? Strikingly, in 19 of 24 cases, the TSS of the 5′-UEI retrogenes overlaps with sequences significantly enriched with CpG dinucleotides, the so-called CpG islands (CpGIs; figs. 1 and 2, supplementary fig. 1, Supplementary Material online). Approximately 50% of human promoters are associated with CpG islands (Smale and Kadonaga 2003). CpG island promoters usually drive
genes (Sandelin et al. 2007). Consistently, we find that 5′-UEI retrogenes with CpGI promoters are, on average, expressed in a significantly larger range of tissues (mean: 9.6 tissues, median: 9.5) than other, non–intron-containing retrogenes (mean: 3.9, median: 1; two-tailed P < 10⁻³, Mann–Whitney U test; fig. 4). Thus, it seems that many 5′-UEI retrogenes recruited CpGIs with promoter capacity in their genomic vicinity. Most of these likely represented CpGI proto-promoter sequences not previously associated with any genes.
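CpG islands are conventionally identified by composition thresholds. The sketch below checks a single sequence window against the widely used Gardiner-Garden and Frommer criteria: length of at least 200 bp, GC fraction of at least 0.5, and observed/expected CpG ratio of at least 0.6. These thresholds are close to, but not guaranteed to be identical with, those behind the UCSC annotation used here. The function names and example windows are hypothetical.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """GC fraction and observed/expected CpG ratio of a DNA window.

    observed/expected = (#CpG * length) / (#C * #G)
    """
    s = seq.upper()
    if not s:
        return 0.0, 0.0
    c, g = s.count("C"), s.count("G")
    cpg = sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")
    gc_fraction = (c + g) / len(s)
    obs_exp = cpg * len(s) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """True if the whole window passes all three composition thresholds."""
    gc, oe = cpg_stats(seq)
    return len(seq) >= min_len and gc >= min_gc and oe >= min_obs_exp


# Hypothetical windows for illustration.
print(looks_like_cpg_island("CG" * 100))    # True
print(looks_like_cpg_island("CCTGG" * 40))  # False: GC-rich but CpG-depleted
```

The second window shows why GC content alone is not enough. Most of the mammalian genome is depleted of CpG dinucleotides, so the observed/expected ratio is the discriminating feature.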
(RPL36AL, HNRPH2, SLC35A4, HSPA1L, PLEKHA9,
and ATP6V1E2) obtained their CpGI promoters from other
genes (retrogene TSS less than 200 bp from the TSS of the neighboring gene), being transcribed in the opposite direction
of the adjacent donor genes (figs. 1 and 2C, supplementary
fig. 1, Supplementary Material online). This observation is
consistent with previous studies that showed that many
CpGI-associated promoters have the capacity for bidirec-
tional activity (Trinklein et al. 2004). Therefore, in addition
to directly fusing to other genes (see above; fig. 2A,
capacity from these genes. Thus, our observations lend fur-
ther support to the notion that many retrogenes hitchhike on
(Vinckenbosch et al. 2006; Kaessmann et al. 2009).
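The criterion used above for a CpGI promoter shared with a neighboring gene is that the retrogene TSS lies less than 200 bp from the neighbor's TSS and the two genes are transcribed in opposite directions. This criterion is straightforward to express as a filter over TSS annotations. The sketch below uses hypothetical gene names and coordinates. It deliberately checks only distance and strand, not whether the genes are arranged head to head, which a stricter definition of a bidirectional promoter would require.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tss:
    gene: str
    chrom: str
    pos: int     # genomic coordinate of the transcription start site
    strand: str  # "+" or "-"


def opposite_strand_neighbors(retro: Tss, others: list[Tss], max_dist: int = 200) -> list[str]:
    """Genes whose TSS is on the opposite strand and less than max_dist bp from the retrogene TSS."""
    return [
        t.gene
        for t in others
        if t.chrom == retro.chrom
        and t.strand != retro.strand
        and abs(t.pos - retro.pos) < max_dist
    ]


# Hypothetical annotations, for illustration only.
retro = Tss("RETRO_X", "chr1", 10_000, "+")
genes = [
    Tss("NEIGHBOR_A", "chr1", 9_880, "-"),   # 120 bp away, opposite strand
    Tss("NEIGHBOR_B", "chr1", 9_950, "+"),   # same strand
    Tss("NEIGHBOR_C", "chr1", 12_000, "-"),  # too far away
    Tss("NEIGHBOR_D", "chr2", 10_050, "-"),  # different chromosome
]
print(opposite_strand_neighbors(retro, genes))  # ['NEIGHBOR_A']
```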
Among the 5′ UTR retrogenes that did not recruit
CpGI promoters, the HSPA2 gene (see also above) reveals
another potential source of retrogene promoters. Its first/up-
stream TSS lies within the Long Terminal Repeat (LTR)
of a retrotransposon of the MaLR family (Smit 1993), close
to the location of the expected TSS of this LTR. Thus,
HSPA2 seems to have recruited an LTR promoter that
drives the transcription of its intron-containing transcript,
an observation that may not be surprising given that many
FIG. 3.—Evolutionary scenarios for retrogene promoter gain via the acquisition of new 5′ exon–intron structures. Promoters (bent arrows), splice sites (stars), translated (black bars) and untranslated (gray bars) parts of retroposed gene copies, and genomic DNA (thin parallel lines) are indicated. The specific scenarios are as follows. (A) The retroposed gene copy inserts in a region with upstream proto-splice sites and a distant promoter and can immediately be transcribed with a 5′ UTR intron. (B) The retrocopy inserts in a region with a distant upstream promoter and is initially transcribed from this promoter with a (long) UTR exon and without an intervening 5′ intron; eventually, splice sites evolve in the 5′ UTR and lead to the emergence/splicing of a 5′ UTR intron. (C) The inserted retrocopy is transcribed from a nearby promoter; splice sites eventually evolve further upstream and allow for recruitment of a distant upstream promoter, reducing 5′ UTR exon size. See main text for further discussion of these scenarios.
genes appear to have recruited transposable elements, in-
cluding LTRs, for their transcription (Feschotte 2008).
The HSPA2 LTR promoter recruitment event seems to have
occurred on the primate lineage, because the mouse ortho-
its intron-containing transcript. Consistently, the human
and mouse orthologs appear to have different expression
patterns when expressed from their upstream promoters.
Human LTR–driven transcripts of HSPA2 are detected in
eye/retinoblastoma, whereas the mouse HSP70.2 gene is
specifically expressed in testes (meiotic spermatocytes
and postmeiotic spermatids, Scieglinska et al. 2004).
Generally, however, most 5′ exon–intron structures of 5′-UEI retrogenes that originated before the human–mouse split appear to have been preserved during the 80 My since the separation of the primate and rodent lineages, although
FIG. 4.—Spatial expression patterns of (A) retrogenes with 5′ untranslated exon–intron structures (5′-UEI retrogenes) and (B) single exon retrogenes. Expression data (log2 expression signals, represented according to the plotted scale) are shown for the following 5′-UEI retrogenes: 1) HNRPF, 2) RPL36AL, 3) HSPA2, 4) HNRNPH2, 5) SLC35A4, 6) RHOG, 7) ARF6, 8) SOD3, 9) NXT1, 10) ALDH1B1, 11) RHOH, 12) FAM50B, 13) FAM113B, 14) COX7B2, 15) HSPA1L, 16) ARPM1, 17) ATP6V1E2, 18) DNAJB8, 19) ARD1B, and 20) HSPC105.
some 5′-UEI retrogenes seem to have evolved differences (gains/losses of additional introns) in their 5′ flanks between these species (fig. 1).
Expression Effects of 5′ UTR Introns
From the observations described above it seems prob-
able that the formation of introns is a by-product of the pro-
cess of retrogene promoter acquisition and serves to span
the distance between the recruited promoter and retrogene.
However, we hypothesized that once these introns were ac-
quired, they may have evolved additional functional roles.
In fact, it is well established that introns often increase tran-
script and protein levels through various mechanisms
(Le Hir et al. 2003).
Thus, we sought to experimentally assess the effect of the introns on the mRNA and protein expression levels of 5′-UEI retrogenes. To this end, we cloned both the full-genomic versions (including the 5′ introns) of these genes and cDNA versions of the genes (lacking introns) into an expression vector containing a constitutive promoter upstream of the cloning site. For this analysis, we chose four genes that originated at different time points in mammalian evolution and whose introns are small enough to allow cloning of their entire genomic sequences. These are HSP70.2 (common ancestor of all mammals), RPL36AL (eutherian mammal ancestor), FAM50B (primate–rodent ancestor), and SPIN2B (primate lineage; figs. 1 and 5). Notably, we confirmed the TSSs of these retrogenes using 5′ RACE analysis. We also confirmed their annotated 5′ exon–intron structures by sequencing their cDNAs and aligning these sequences to the genome (see Materials and Methods for details). We then transfected stable cell lines with the different constructs and measured transcript levels using quantitative RT-PCR 48 h posttransfection. Importantly, we confirmed that introns are spliced out from mRNAs transcribed from the full-genomic retrogene versions (see also Materials and Methods).
These experiments revealed several interesting effects of introns on the transcript levels of this set of genes. The “original” intron-containing versions of two genes, RPL36AL and HSPA2, had significantly higher (~1.6 to 2 times) transcript levels than those that lack introns (two-tailed P < 0.01 and P < 0.05, respectively, Mann–Whitney U test; fig. 6). Thus, the introns in these retrogenes seem to support higher mRNA expression levels.
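Each qRT-PCR experiment was repeated six times, so the Mann–Whitney U tests above compare small samples. For such samples, an exact P-value can be obtained by enumerating every way of splitting the pooled measurements into two groups. The sketch below does this with the standard library. The transcript levels are hypothetical, and the text does not state whether exact or normal-approximation P-values were used. With six versus six measurements, complete separation of the groups gives the smallest attainable two-sided P-value, 2/924 ≈ 0.0022.

```python
from itertools import combinations


def u_statistic(x, y):
    """Mann-Whitney U for sample x: each pair with x > y counts 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_mann_whitney(x, y):
    """Two-sided exact P-value from all C(n1 + n2, n1) splits of the pooled values."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    centre = n1 * n2 / 2  # expected U under the null hypothesis
    observed = abs(u_statistic(x, y) - centre)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count splits at least as far from the centre as the observed one.
        if abs(u_statistic(gx, gy) - centre) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical normalized transcript levels, six replicates each (not study data).
genomic = [1.9, 2.1, 1.8, 2.0, 2.2, 1.7]
cdna = [1.1, 1.2, 1.0, 1.3, 0.9, 1.15]
p = exact_mann_whitney(genomic, cdna)
print(f"{p:.4f}")  # 0.0022
```

Brute-force enumeration is practical here because C(12, 6) is only 924. For larger samples, one would switch to a recursive count of the U distribution or to the normal approximation.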
For the RPL36AL gene, we assessed two possible explanations for the observed effect. It may be explained, at least in part, by regulatory elements in the 5′ UTR intron that actively enhance transcription itself. Alternatively, it may rather be due to the effect of the splicing machinery on transcription initiation and RNA polymerase II processivity (Furger et al. 2002; Nott et al. 2003). To this end, we cloned, in addition to the cDNA version, the intron sequence on its own into a luciferase vector that is devoid of a promoter. The intron sequence itself showed significantly higher luminescence signals than the cDNA version of the retrogene or the empty vector (two-tailed P < 0.01, Mann–Whitney U test; fig. 7), suggesting that
ce(s) that promote transcription.
In the case of FAM50B, transcript levels of the intron-containing/genomic retrogene version are not significantly different from those of the cDNA version in the promoter-containing construct (two-tailed P < 0.96, Mann–Whitney U test; fig. 6). However, in the promoterless luciferase assay, the 5′ intron on its own yields significantly higher luminescence signals than the controls (two-tailed P < 0.05, Mann–Whitney U test; fig. 7). This suggests an enhancing effect of the intron on transcript levels of FAM50B.
Surprisingly, in contrast to the other cases tested, the presence of the 5′ intron significantly reduces mRNA expression levels of the SPIN2B gene (two-tailed P < 0.01, Mann–Whitney U test; fig. 6). This may suggest that the intron of this gene contains a transcriptional repressor element. An alternative interesting explanation for this result stems from the observation that the 5′ UTR intron of SPIN2B is located merely 2 bp upstream of the ATG translation start codon. The exon junction complex (EJC) consists of several proteins that are deposited on the mRNA upon intron excision, just upstream of the exon–exon junction. Its components are known to remain bound to the mRNA after its export to the cytoplasm
FIG. 5.—Schematic illustration of the structures of the five experimentally characterized intron-containing retrogenes (see main text for details).
FIG. 6.—Effects of introns on retrogene transcript levels. Quantitative RT-PCR analysis of genomic (“g,” black bars) and cDNA (“c,” gray bars) versions of retrogenes with 5′ exon–intron structures (5′-UEI retrogenes). The y-axis indicates normalized (relative to neomycin) transcript levels (arbitrary units). Each experiment was repeated six times. Stars indicate significant expression differences as assessed by Mann–Whitney U tests (**P < 0.01, *P < 0.05).
and may enhance translation (Nott et al. 2004). We spec-
ulate that in the case of SPIN2B, the bound components
of the EJC may prevent translation initiation by the ribo-
some due to their proximity to the ATG start codon, which
might lead to degradation of the mRNA. Further work is
warranted to test this hypothesis.
3′ UTR Introns and Nonsense-mediated Decay (NMD)
In addition to the 5′ UTR intron cases described above, we identified two retrogenes that each carry an intron in their 3′ UTR (figs. 1 and 5): ARD1B and a FUN14 domain–containing gene, annotated as ENSG00000182814 in the Ensembl database (Hubbard et al. 2009). ARD1B is present and intact in available placental mammal genomes, and its exon–intron structure is conserved. This suggests that the gene is functional and that the functional role of the intron has been preserved. The FUN14-domain gene emerged in primates. Interestingly, this gene (annotated as a pseudogene) also seems to have been preserved by purifying selection. This is despite a shift in the ORF after the retroposition event, which generated an ORF longer than the original (parental) one (supplementary fig. 2, Supplementary Material online; Materials and Methods). Using a maximum likelihood procedure (Yang 1997), we compared two models of the nonsynonymous to synonymous substitution rate ratio (dN/dS) during retrogene evolution. A model where dN/dS is estimated from the data (dN/dS ≈ 0.36) provides a significantly better fit than a model where dN/dS is fixed to 1 (P < 0.05).
as 3′ UTR introns are generally rather rare (Isken and Maquat 2008). The paucity of genes with 3′ UTR introns may be due to the fact that transcripts with 3′ UTR introns (i.e., downstream of the stop codon) are usually targeted by the NMD machinery if those introns are located more than ~55 bp downstream of the stop codon (Isken and Maquat 2008). Interestingly, the 3′ UTR intron of ARD1B is located only 12 bp downstream of the end of the ORF and thus likely does not trigger NMD. In contrast, the intron in the FUN14-domain gene is 559 bp downstream of the termination codon and is therefore expected to elicit NMD.
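The ~55-bp rule cited above can be written as a one-line predicate. The sketch takes, for each exon–exon junction in the mature transcript, its distance downstream of the stop codon, and it applies the text's distances for ARD1B and the FUN14-domain gene. The helper name is hypothetical, and the function is a heuristic rather than a model of NMD, since other transcript features also influence whether decay is triggered.

```python
def predicts_nmd(junction_distances: list[int], threshold: int = 55) -> bool:
    """Rule of thumb: NMD is expected if any exon-exon junction lies more than
    `threshold` nt downstream of the stop codon."""
    return any(d > threshold for d in junction_distances)


# Distances (bp from the stop codon to the 3' UTR intron) as given in the text.
examples = {
    "ARD1B": [12],
    "FUN14-domain gene (ENSG00000182814)": [559],
}
for gene, dists in examples.items():
    print(gene, predicts_nmd(dists))
# ARD1B False
# FUN14-domain gene (ENSG00000182814) True
```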
To assess whether the FUN14-domain gene transcript levels are indeed reduced by the presence of the 3′ UTR intron, we cloned genomic and cDNA versions of the FUN14-domain gene into our green fluorescent protein expression vector. We observed a statistically significant reduction in transcript levels for the intron-containing/genomic version of the gene compared with its cDNA version (two-tailed P < 0.01, Mann–Whitney U test; fig. 6). This suggests that part of the FUN14-domain gene transcripts is indeed targeted for decay by the NMD machinery.
may provide a means for NMD-regulated gene expression,
a peculiar regulatory mechanism that has been reported for
several other genes before (Isken and Maquat 2008).
In this study, we have characterized a unique set of human and mouse retroposed genes that, although retrogenes are usually devoid of introns, acquired introns during their evolution. These genes thus represent rare cases of intron gain in mammals. We find that these introns mainly evolved as part of new 5′ or 3′ exon–intron structures. We also identified two interesting cases of coding intron gain, ARPM1 and HSPC105 (fig. 1), which could not be experimentally characterized due to cloning issues. However, we note that the procedure we used to reliably identify retrogenes relies to a large extent on the absence of introns (relative to the parental gene) over most of the coding sequence. Thus, further work is
FIG. 7.—Regulatory effects of retrogene introns. Luciferase reporter
assays measuring constructs containing cDNAs or introns of the (A)
RPL36AL gene and (B) FAM50B 5#-UEI retrogenes. The y-axis indicates
normalized (relative to the cotransfected pGL4.74 Renilla plasmid)
luminescence intensities (see Materials and Methods for details). Each
experiment was repeated four times. Stars indicate significant expression
differences as assessed by Mann–Whitney U tests (**P , 0.01, *P ,
2154Fablet et al.
by guest on December 21, 2015
warranted to explore in more detail the potential of intron
gain within retrogene-coding regions.
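The normalization described in the fig. 7 caption (firefly luciferase signal relative to the cotransfected Renilla control) is commonly computed as a per-replicate ratio. The sketch below shows that ratio and then, as an added assumption not stated in the caption, expresses each intron construct relative to the mean of a reference (e.g., cDNA) construct. Function names and readings are hypothetical.

```python
from statistics import mean


def normalize_to_renilla(firefly, renilla):
    """Firefly/Renilla luminescence ratio for each transfection replicate."""
    if len(firefly) != len(renilla):
        raise ValueError("need one Renilla reading per firefly reading")
    if any(r <= 0 for r in renilla):
        raise ValueError("Renilla readings must be positive")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_change_vs_reference(test_ratios, reference_ratios):
    """Express normalized test replicates relative to the mean of a reference construct."""
    ref = mean(reference_ratios)
    return [t / ref for t in test_ratios]


# Hypothetical raw readings for four replicates (NOT data from this study).
cdna_ratios = normalize_to_renilla([1200, 1350, 1100, 1280], [400, 420, 390, 410])
intron_ratios = normalize_to_renilla([3100, 2900, 3300, 3050], [405, 395, 415, 400])
print([round(v, 2) for v in fold_change_vs_reference(intron_ratios, cdna_ratios)])
```

Normalizing each well to its own Renilla signal corrects for differences in transfection efficiency and cell number between replicates, which is why the ratio, not the raw firefly signal, is compared across constructs.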
Our work suggests that 5′ UTR retrogene introns evolved in two main ways. First, in four cases, the retrogene inserted into an intron of a host gene and became transcribed (after recruitment and/or evolution of splice sites) together with 5′ UTR exons of the host gene as an “alternative transcript variant.” Some of these UTR host gene fusion cases have been discussed previously in some detail (Vinckenbosch et al. 2006; Kaessmann et al. 2009). Second, in the majority of retrogenes with 5′ UTR introns, the introns probably evolved de novo together with UTR exons, and their origin seems to have been coupled with the acquisition of strong distant promoters by the retrogene. These introns thus probably evolved to span the sometimes considerable distances to promoters in the genomic vicinity of the retrogene insertion site, avoiding long UTRs that would likely have been deleterious because of premature start codons (Mignone et al. 2002; Hong et al. 2006; Scofield et al. 2007; Catania and Lynch 2008) and/or RNA structures incompatible with translation. The emergence of these 5′ UTR introns was likely facilitated by the presence and/or evolution of (proto-)splice sites between the retrogene and the distant promoter. Our work therefore reveals a novel mechanism of intron gain. Notably, although several possible routes of intron gain have been discussed (Roy and Gilbert 2006; Catania and Lynch 2008), the mechanisms underlying the emergence of new introns have remained elusive. In particular, the origin of UTR introns has only recently begun to be considered (Catania and Lynch 2008).
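The argument that long unspliced 5′ UTRs would carry premature start codons can be made roughly quantitative with a null model. The sketch below assumes independent, equiprobable bases, an illustrative simplification since real genomic sequence is neither. Under that assumption it estimates how many ATG triplets a UTR of a given length would contain by chance and the approximate probability of containing none. The function names are hypothetical.

```python
def expected_atg_count(length: int, p_a: float = 0.25, p_t: float = 0.25,
                       p_g: float = 0.25) -> float:
    """Expected number of ATG triplets in a random sequence of the given length,
    assuming independent bases with the stated (illustrative) frequencies."""
    positions = max(length - 2, 0)
    return positions * p_a * p_t * p_g


def prob_no_atg(length: int, p_a: float = 0.25, p_t: float = 0.25,
                p_g: float = 0.25) -> float:
    """Approximate probability that no ATG occurs, treating triplet positions as
    independent (a simplification; adjacent positions are weakly dependent)."""
    positions = max(length - 2, 0)
    return (1.0 - p_a * p_t * p_g) ** positions


def count_atg(seq: str) -> int:
    """Count ATG (or AUG) occurrences in an actual sequence."""
    s = seq.upper().replace("U", "T")
    return sum(1 for i in range(len(s) - 2) if s[i:i + 3] == "ATG")


for utr_len in (100, 1_000, 10_000):
    print(utr_len, round(expected_atg_count(utr_len), 1), f"{prob_no_atg(utr_len):.3g}")
```

Under these assumptions a 100-nt leader is expected to contain about 1.5 ATGs, whereas a 10-kb unspliced leader is expected to contain about 156. Splicing out the intervening sequence removes most of these chance start codons. How strongly an upstream ATG actually inhibits translation also depends on its sequence context and on any upstream ORF it opens, which this count ignores.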
We further show that the promoters recruited via the acquisition of new 5′ exon–intron structures are in most cases strong CpG-enriched promoters, but also include promoters derived from retrotransposable elements. In several cases, the CpG promoters are bidirectional and stem from other genes, illustrating another striking mechanism by which retrogenes can hitchhike on the regulatory machinery of genes in their vicinity (in addition to host gene fusions; see above and Vinckenbosch et al. 2006; Kaessmann et al. 2009). In other cases, proto-CpG island promoters (not previously associated with other genes) were “captured” from the genomic environment. Notably, although most retrogenes are rather specifically expressed in testes, these promoters allowed many of the 5′-UEI retrogenes to be transcribed broadly across tissues, consistent with the fact that the expression of housekeeping genes is usually driven by such CpG-enriched promoters (Sandelin et al. 2007).
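CpG enrichment of a candidate promoter region is conventionally summarized by its GC content and its observed/expected CpG ratio. The sketch below computes both and applies the classic CpG island thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG above 0.6). Whether these exact criteria were used to classify the promoters discussed here is not stated in this section, so treat the thresholds as an assumption. The function names are hypothetical.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence.

    observed/expected CpG = (#CG * length) / (#C * #G)
    """
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("empty sequence")
    c, g = s.count("C"), s.count("G")
    cg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact here
    gc_fraction = (c + g) / n
    obs_exp = (cg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply classic length / GC / CpG o/e thresholds (assumed, see lead-in)."""
    gc, oe = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and oe > min_obs_exp


# Hypothetical toy sequences (NOT retrogene promoters from this study).
print(cpg_stats("ACGTTCGAGCGC"))
print(looks_like_cpg_island("GCGC" * 60), looks_like_cpg_island("ATTA" * 60))
```

In bulk mammalian DNA, CpG dinucleotides are strongly depleted (o/e ratios typically well below 0.6), which is why an o/e ratio near or above that threshold, combined with high GC content, flags the unmethylated, CpG-rich regions typical of housekeeping-gene promoters.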
Although the original gain of 5′ intron–exon structures appears initially to have been a by-product of the acquisition of new promoters, we provide experimental evidence from individual cases suggesting that these novel introns may subsequently have evolved to modulate the transcript levels of these generally highly transcribed retrogenes. There are at least two principal ways by which introns can affect transcription. First, introns can act as repositories for transcriptional regulatory elements such as enhancers and repressors. Second, once transcribed, the splicing signals of introns can enhance RNA polymerase II initiation and processivity. We find that both transcriptional enhancer (RPL36AL, HSPA2) and repressor (SPIN2B) elements appear to have evolved within 5′-UEI retrogene introns. Thus, at least part of the effect of introns on retrogene transcript levels occurs at the DNA level, although further enhancement of transcription may occur at the pre-mRNA level.
Interestingly, we also find cases of intron gain in the 3′ UTRs of retrogenes. Although 3′ UTRs are typically about two to three times longer than 5′ UTRs, 3′ UTR introns are rare, potentially because they trigger NMD when not located close to the termination codon (Isken and Maquat 2008). We experimentally demonstrate that the 3′ UTR intron of one retrogene (the FUN14 domain-containing retrogene) indeed reduces transcript levels. Our observations thus suggest that this retrogene belongs to the rare class of genes regulated by NMD (Isken and Maquat 2008).
In summary, our experimental and bioinformatic scrutiny of intron-containing retrogenes has not only provided several new insights into the mechanisms and evolutionary relevance of intron gain, but also revealed a novel process through which retroposed gene copies can be endowed with regulatory elements. Thus, the process of RNA-based gene duplication continues to provide unprecedented insights into the mechanisms underlying the emergence of new gene functions, as well as into the evolution of mammalian genomes at large.
Supplementary figures 1 and 2 are available at Molecular Biology and Evolution online (http://www.mbe.
erated the idea for this study; the Lausanne DNA Array Facility for technical support; L. Duret, A. Marques, L. Rosso, N. Vinckenbosch, and M. Weier for technical support and helpful discussions; and the anonymous reviewers for their constructive comments that helped to improve the manuscript. This study was supported by grants from the Swiss National Science Foundation and the EMBO Young Investigator Programme to H.K.
Baertsch R, Diekhans M, Kent WJ, Haussler D, Brosius J. 2008. Retrocopy contributions to the evolution of the human genome. BMC Genomics. 9:466.
Berget SM, Moore C, Sharp PA. 1977. Spliced segments at the 5′ terminus of adenovirus 2 late mRNA. Proc Natl Acad Sci
Bradley J, Baltus A, Skaletsky H, Royce-Tolland M, Dewar K, Page DC. 2004. An X-to-autosome retrogene is required for spermatogenesis in mice. Nat Genet. 36:872–876.
Catania F, Lynch M. 2008. Where do introns come from? PLoS
Chalmel F, Rolland AD, Niederhauser-Wiederkehr C, et al. (11 co-authors). 2007. The conserved transcriptome in human and rodent male gametogenesis. Proc Natl Acad Sci USA.
Chow LT, Gelinas RE, Broker TR, Roberts RJ. 1977. An amazing sequence arrangement at the 5′ ends of adenovirus 2 messenger RNA. Cell. 12:1–8.
Coulombe-Huntington J, Majewski J. 2007. Characterization of intron loss events in mammals. Genome Res. 17:23–32.
Evans RM, Fraser N, Ziff E, Weber J, Wilson M, Darnell JE. 1977. The initiation sites for RNA transcription in Ad2 DNA.
Feschotte C. 2008. Transposable elements and the evolution of regulatory networks. Nat Rev Genet. 9:397–405.
Furger A, O’Sullivan JM, Binnie A, Lee BA, Proudfoot NJ. 2002. Promoter proximal splice sites enhance transcription. Genes Dev. 16:2792–2799.
Goldberg S, Schwartz H, Darnell JE Jr. 1977. Evidence from UV transcription mapping in HeLa cells that heterogeneous nuclear RNA is the messenger RNA precursor. Proc Natl Acad Sci USA. 74:4520–4523.
Hong X, Scofield DG, Lynch M. 2006. Intron size, abundance, and distribution within untranslated regions of genes. Mol Biol Evol. 23:2392–2404.
Hubbard TJ, Aken BL, Ayling S, et al. (58 co-authors). 2009. Ensembl 2009. Nucleic Acids Res. 37:D690–D697.
Isken O, Maquat LE. 2008. The multiple lives of NMD factors: balancing roles in gene and genome regulation. Nat Rev
Jeyaseelan K, Ma D, Armugam A. 2001. Real-time detection of gene promoter activity: quantitation of toxin gene transcription. Nucleic Acids Res. 29:E58.
Kaessmann H, Vinckenbosch N, Long M. 2009. RNA-based gene duplication: mechanistic and evolutionary insights. Nat Rev Genet. 10:19–31.
Kuhn RM, Karolchik D, Zweig AS, et al. (22 co-authors). 2009. The UCSC Genome Browser Database: update 2009. Nucleic Acids Res. 37:D755–D761.
Lander ES, Linton LM, Birren B, et al. (238 co-authors). 2001. Initial sequencing and analysis of the human genome. Nature.
Le Hir H, Nott A, Moore MJ. 2003. How introns influence and enhance eukaryotic gene expression. Trends Biochem Sci.
Long M, Langley CH. 1993. Natural selection and the origin of jingwei, a chimeric processed functional gene in Drosophila.
Lynch M. 2007. The origins of genome architecture. Sunderland (MA): Sinauer Associates.
Mignone F, Gissi C, Liuni S, Pesole G. 2002. Untranslated regions of mRNAs. Genome Biol. 3:REVIEWS0004.
Nott A, Le Hir H, Moore MJ. 2004. Splicing enhances translation in mammalian cells: an additional function of the exon junction complex. Genes Dev. 18:210–222.
Nott A, Meislin SH, Moore MJ. 2003. A quantitative analysis of intron effects on mammalian gene expression. RNA.
O’Neill RJ, Brennan FE, Delbridge ML, Crozier RH, Graves JA. 1998. De novo insertion of an intron into the mammalian sex determining gene, SRY. Proc Natl Acad Sci USA. 95:
Potrzebowski L, Vinckenbosch N, Marques AC, Chalmel F, Jegou B, Kaessmann H. 2008. Chromosomal gene movements reflect the recent origin and biology of therian sex chromosomes. PLoS Biol. 6:e80.
Roy SW, Gilbert W. 2006. The evolution of spliceosomal introns: patterns, puzzles and progress. Nat Rev Genet. 7:
Sandelin A, Carninci P, Lenhard B, Ponjavic J, Hayashizaki Y, Hume DA. 2007. Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nat Rev
Scieglinska D, Vydra N, Krawczyk Z, Widlak W. 2004. Location of promoter elements necessary and sufficient to direct testis-specific expression of the Hst70/Hsp70.2 gene. Biochem J.
Scieglinska D, Widlak W, Konopka W, Poutanen M, Rahman N, Huhtaniemi I, Krawczyk Z. 2001. Structure of the 5′ region of the Hst70 gene transcription unit: presence of an intron and multiple transcription initiation sites. Biochem J. 359:
Scofield DG, Hong X, Lynch M. 2007. Position of the final intron in full-length transcripts: determined by NMD? Mol Biol Evol. 24:896–899.
Smale ST, Kadonaga JT. 2003. The RNA polymerase II core promoter. Annu Rev Biochem. 72:449–479.
Smit AF. 1993. Identification of a new, abundant superfamily of mammalian LTR-transposons. Nucleic Acids Res. 21:
Strausberg RL, Feingold EA, Grouse LH, et al. (83 co-authors). 2002. Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc Natl Acad Sci USA. 99:16899–16903.
Trinklein ND, Aldred SF, Hartman SJ, Schroeder DI, Otillar RP, Myers RM. 2004. An abundance of bidirectional promoters in the human genome. Genome Res. 14:62–66.
Vinckenbosch N, Dupanloup I, Kaessmann H. 2006. Evolutionary fate of retroposed gene copies in the human genome. Proc Natl Acad Sci USA. 103:3220–3225.
Wang W, Zheng H, Fan C, et al. (14 co-authors). 2006. High rate of chimeric gene origination by retroposition in plant genomes. Plant Cell. 18:1791–1802.
Yang Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci.
Yang Z. 1998. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol Biol Evol. 15:568–573.
Kenneth Wolfe, Associate Editor
Accepted June 20, 2009