A Driving Force for New Gene Origination?
Xuebing Wu1,2 and Phillip A. Sharp1,3,*
1David H. Koch Institute for Integrative Cancer Research
2Computational and Systems Biology Graduate Program
3Department of Biology
Massachusetts Institute of Technology, Cambridge, MA 02139, USA
The mammalian genome is extensively transcribed, a large fraction of which is divergent tran-
scription from promoters and enhancers that is tightly coupled with active gene transcription.
Here, we propose that divergent transcription may shape the evolution of the genome by new
Widespread Divergent Transcription
The vast majority of the human genome, including half of the
region outside of known genes, is transcribed (Djebali et al.,
2012). However, most intergenic transcription activity produces
short and unstable noncoding transcripts whose abundances
are usually an order of magnitude lower than those from typical
protein-coding genes. Except for a few well-studied cases
(see review in Guttman and Rinn, 2012; Lee, 2012; Mercer
et al., 2009; Ponting et al., 2009; Rinn and Chang, 2012; Ulitsky
and Bartel, 2013; Wang and Chang, 2011; Wei et al., 2011;
Wilusz et al., 2009), it’s unclear whether most intergenic tran-
scription is regulated or has cellular function.
Recent evidence has shown that most intergenic transcription
occurs near or is associated with gene transcription, such as
transcription from promoter and enhancer regions (Sigova
et al., 2013). The majority of mammalian promoters direct tran-
scription initiation on both sides with opposite orientations,
a phenomenon known as divergent transcription (Core et al.,
2008; Preker et al., 2008; Seila et al., 2008). Divergent tran-
scription generates upstream antisense RNAs (uaRNAs, or
PROMPTs, promoter upstream transcripts) near the 5′ end of
genes that are typically short (50–2,000 nucleotides) and rela-
tively unstable (Flynn et al., 2011; Ntini et al., 2013; Preker
et al., 2008, 2011). Similar divergent transcription also occurs
at distal enhancer regions, giving rise to RNAs termed enhancer
RNAs (eRNAs) (Kim et al., 2010; De Santa et al., 2010). In mouse
and human embryonic stem (ES) cells, most long noncoding
RNAs (lncRNAs, longer than 100 nucleotides) are associated
with protein-coding genes, including ~50% as uaRNAs and
~20% as eRNAs (Sigova et al., 2013). These observations
suggest that divergent transcription from promoters and
enhancers of protein-coding genes is the major source of
intergenic transcription in ES cells.
In the textbook model of a eukaryotic promoter, the directionality
is set by the arrangement of an upstream cis-element region
followed by a core promoter (Figure 1A). The cis-elements are
bound by sequence-specific transcription factors, whereas
the core promoter is bound by TATA-binding protein (TBP) and
other factors that recruit the core transcription machinery.
Most mammalian promoters lack a TATA element (TATA-less)
and are CpG rich (Sandelin et al., 2007). For these promoters,
TBP is recruited through sequence-specific transcription factors
such as Sp1 that bind CpG-rich sequences and components
of the TFIID complex that have little sequence specificity.
Thus, in the absence of strong TATA elements such as for CpG
island promoters, TBP-complexes are recruited on both sides
of the transcription factors to form preinitiation complexes
in both orientations (Figure 1B). This model is supported by
the observation that divergent transcription occurs at most
promoters that are associated with CpG islands in mammals,
whereas promoters with TATA elements in mammals and
worm are associated with unidirectional transcription (Core
et al., 2008; Kruesi et al., 2013). In addition, divergent tran-
scription is less common in Drosophila where CpG islands
are rare (Core et al., 2012). Since transcription factors with
chromatin remodeling potential and transcription activation
domains also bind at enhancer sites, it is not surprising that
these are also sites of divergent transcription. In fact, promoters
and enhancers have many properties in common, and it has
been shown recently that many intragenic enhancers can act
as alternative promoters producing tissue-specific lncRNAs
(Kowalczyk et al., 2012).
The U1-PAS Axis and Gene Maturation
Promoter-proximal noncoding transcription in both yeast and
mammals has been shown to be suppressed at the chromatin
level, including nucleosome remodeling (Whitehouse et al.,
2007), histone deacetylation (Churchman and Weissman,
2011), and gene loop formation (Tan-Wong et al., 2012). We
and others recently found that in mammals promoter upstream
antisense transcription is frequently terminated due to cleavage
of the nascent RNA by the same process responsible for the
generation of the poly(A) tract at the 3′ ends of genes (Almada
et al., 2013; Ntini et al., 2013). In both cases, the primary signal
directing this process is the poly(A) signal (PAS) motif, AAUAAA
or similar (Proudfoot, 2011). Pol II terminates transcription
990 Cell 155, November 21, 2013 ª2013 Elsevier Inc.
within several kb after such cleavage (Anamika et al., 2012;
Richard and Manley, 2009). Computational analysis showed
that relative to the 5′ end of the sense regions, PAS motifs are
enriched, whereas potential U1 snRNP-binding sites, or 5′ splice
site-like sequences, are depleted in the upstream antisense
regions. The binding of U1 snRNP is known to suppress PAS-directed
cleavage over regions of a thousand nucleotides downstream
(Berg et al., 2012; Kaida et al., 2010). Thus, the bias in
the distribution of U1 snRNP-binding sites and PAS promotes
expression of full-length mRNAs by suppressing premature
cleavage and polyadenylation but favors early termination of
uaRNAs. This conclusion is strongly supported by the finding
that inhibition of U1 snRNP dramatically increased termination
and polyadenylation of sense-oriented transcripts in the gene
region (Almada et al., 2013).
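The enrichment and depletion described above can be illustrated with a minimal Python sketch. It counts exact matches to the PAS hexamer (AATAAA in DNA) and to two 5′ splice site-like heptamers (GGTAAGT and GGTGAGT, the examples this article gives for U1 snRNP-binding sites) on the strand that matches the RNA. Exact string matching, the function names, and the per-kilobase normalization are assumptions made here for illustration; the published analysis may have scored degenerate motifs differently.

```python
import re

# Motifs are written in the DNA alphabet, in the orientation of the RNA
# (coding strand). Exact matching is a simplifying assumption.
PAS_MOTIF = "AATAAA"
U1_MOTIFS = ("GGTAAGT", "GGTGAGT")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement, e.g. to read the upstream antisense strand."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_overlapping(seq, motif):
    """Count all occurrences of motif in seq, including overlapping ones."""
    return len(re.findall(f"(?={motif})", seq.upper()))


def site_counts(seq):
    """Return (number of PAS motifs, number of U1-like sites) in seq."""
    pas = count_overlapping(seq, PAS_MOTIF)
    u1 = sum(count_overlapping(seq, m) for m in U1_MOTIFS)
    return pas, u1


def per_kb(count, length):
    """Normalize a raw count to sites per 1,000 nucleotides."""
    return 1000.0 * count / length
```

To compare the two directions of one promoter, one would call `site_counts` on the sense sequence downstream of the start site and on `reverse_complement` of the genomic sequence upstream of it, then normalize both with `per_kb`.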
If the U1-PAS axis defines the length of a transcribed region,
then it might be expected that for a typical protein-coding
gene (~20 kb) to evolve from intergenic noncoding DNA would
involve strengthening of the U1-PAS axis by gaining U1 sites
and losing PAS in the sense orientation. Examining the dis-
tributions of U1 and PAS sites in bidirectional promoters
involving UCSC-annotated mRNA-mRNA, mRNA-lncRNA, and
mRNA-uaRNA pairs, we found that lncRNAs showed properties
resembling intermediates between mRNA genes and uaRNA
regions in terms of the density of U1 sites and PAS sites (Almada
et al., 2013). That is, the density of PAS decreases from regions
producing uaRNA to lncRNA to mRNA, whereas U1 sites show
the opposite trend, consistent with the differences in the length
and abundance of these transcripts. We also studied the
evolution of the U1-PAS axis in vertebrates, and found that
older genes exhibit progressive gain of U1 sites and loss of
PAS sites at their 5′ ends. Together these observations suggest
that strengthening of the U1-PAS axis may be associated with
the origination and maturation of genes.
De Novo Gene Origination from Divergent Transcription
Below we propose a model (Figure 2) arguing that the act of
transcription in germ cells strengthens the U1-PAS axis in the
upstream antisense region of an active gene, or the associated
enhancer regions, creating a feedback loop amplifying tran-
scription activity, which eventually may drive origination of a
new antisense-oriented gene (Figure 3).
One consequence of transcription is that it can cause
mutations, especially on the coding (nontranscribed) strand.
During transcription, transient R loops can be formed behind
the transcribing RNA polymerase II, exposing the coding
strand as single-stranded DNA, whereas the noncoding
strand is base paired with and thus protected by the nascent
RNA (Aguilera and García-Muse, 2012). The lack of splicing
signals in the divergent transcript also makes it more vulnerable
to R loop formation, as splicing factors have been implicated
in suppressing R loop formation (Li and Manley, 2006, 2005;
Paulsen et al., 2009). In addition, divergent transcription gener-
ates negative supercoiling at promoters, which facilitates
DNA unwinding and promotes R loop formation (Aguilera and
García-Muse, 2012; Seila et al., 2009). As a consequence of R
loop formation, the single-stranded coding strand is vulnerable
to mutagenic processes, such as cleavage, deamination, and
depurination. Genomics studies have shown that during
mammalian evolution, transcribed regions accumulate G and T
bases on the coding strand, relative to the noncoding strand or
nontranscribed regions (Green et al., 2003; Mugal et al., 2009;
Park et al., 2012; Polak et al., 2010). Evidence suggests that
such strand bias may result from passive effects of deamination,
Figure 1. Transcription Factors Drive Divergent Transcription
and associated factors, which binds the directional TATA element in the DNA
and orientates RNA Pol II to transcribe downstream DNA.
(B) In the absence of strong TATA elements, as is common for CpG island promoters,
TF-recruited TBP and associated factors bind to low-specificity sequences
and form initiation complexes at similar frequencies in both directions.
Figure 2. Feedback Loops between Transcription, U1, and PAS
(A and B) Germ cell transcription exposes the coding strand (nontemplate,
which has the same sequence as the RNA) single stranded and vulnerable to
mutations toward G and T bases (A), which increases the chance of gaining
GT-rich sequences such as U1-binding site (5′ splice site [5′SS]) and
also increases the chance of losing A-rich sequences such as PAS, which
terminates transcription (B). U1 binding can enhance transcription through
promoting transcription initiation and reinitiation, and also inhibiting the usage
of nearby PAS.
repair, and somatic hypermutation pathways in germ cell-transcribed genes,
in the absence of selection (Green et al., 2003; McVicker and Green, 2010;
Polak and Arndt, 2008).
Accumulation of G and T content on the coding strand will
strengthen the U1-PAS axis (Figure 2). A-rich sequences such
as PAS (AATAAA) are likely to be lost when the genomic DNA
accumulates G and T. In contrast, G+T-rich sequences, such
as U1 snRNP-binding sites (e.g., resembling 5′ splice sites,
G|GTAAGT and G|GTGAGT), are likely to emerge in these
regions. Since promoter-proximal PAS reduces transcriptional
activity (Andersen et al., 2012), the loss of PAS and gain of U1
sites should contribute to lengthening of the transcribed region
as well as its more robust transcription. The gain of U1 sites
initiation factors (Damgaard et al., 2008; Furger et al., 2002;
Kwek et al., 2002) or elongation factors (Fong and Zhou, 2001).
Therefore a positive feedback loop is formed: active transcrip-
tion causes the coding strand to accumulate sequence changes
favoring higher transcription activity.
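The direction of this feedback can be made concrete with a toy simulation. The sketch below is not the authors' analysis: it assumes, purely for illustration, that the only substitutions on the coding strand are A→G and C→T, applied independently at each site with a fixed per-generation probability. The function names, the rate, and the seed are hypothetical.

```python
import random

# Assumed substitutions toward G and T on the coding strand (toy model only).
TOWARD_GT = {"A": "G", "C": "T"}


def gt_fraction(seq):
    """Fraction of bases in seq that are G or T."""
    return sum(base in "GT" for base in seq) / len(seq)


def mutate_once(seq, rate, rng):
    """Apply one round of biased mutation; each A or C changes with probability rate."""
    return "".join(
        TOWARD_GT[base] if base in TOWARD_GT and rng.random() < rate else base
        for base in seq
    )


def simulate(seq, generations, rate, seed=0):
    """Return the final sequence and a per-generation history.

    Each history entry is (G+T fraction, non-overlapping count of AATAAA).
    """
    rng = random.Random(seed)
    history = []
    for _ in range(generations):
        seq = mutate_once(seq, rate, rng)
        history.append((gt_fraction(seq), seq.count("AATAAA")))
    return seq, history
```

Under these assumptions the G+T fraction can only rise, so A-rich hexamers such as AATAAA become progressively rarer, although an individual C→T change can occasionally create one. A realistic model would need the full substitution spectrum and selection.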
As noted above, strengthening of the U1-PAS axis also favors
extension of the transcribed region. Being longer gives the tran-
script several advantages: by chance longer RNAs are more
likely to contain additional splicing signals such as a 3′ splice
site to become spliced, or binding sites for splicing-independent
nuclear export factors, thus escaping nuclear exosome degra-
dation by packaging and exporting to cytoplasm (Nott et al.,
2003; Singh et al., 2012). Longer RNAs are also more likely to
carry an open reading frame, either generated de novo or by
incorporation of gene remnants.
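As a rough illustration of what "carrying an open reading frame" means in practice, the following Python sketch scans the three forward frames of a transcript for the longest stretch that begins with ATG and ends at the first in-frame stop codon. The function name and the decision to require an ATG start are assumptions for this sketch, not a method taken from the article.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG-to-stop ORF, or None if there is none.

    Coordinates are 0-based, end is exclusive and includes the stop codon.
    Only the three forward reading frames are scanned.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best
```

Running such a scan on transcripts of increasing length is one simple way to check the expectation that longer RNAs more often contain an ORF by chance.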
Once in the cytoplasm, the RNA should at some frequency be
translated into short polypeptides due to widespread transla-
tional activity (Carvunis et al., 2012). Some of the polypeptides
may provide advantage to the organism and become fixed in
the population, thereby forming a new gene.
Accelerating Other New Gene Origination Processes
In addition to de novo gene origination, the model described
in regions of divergent transcription. Tandem duplication, retro-
position, and recombination of existing genes or gene fragments
are the major mechanisms for new gene origination (Chen et al.,
2013; Long et al., 2013). Most duplicated genes or gene frag-
ments are silenced due to the lack of required elements such
as a promoter. In contrast, genes or gene fragments inserted
into regions of divergent transcription, such as upstream of a
promoter or flanking an enhancer, will be transcribed, likely
under different regulation than prior to their insertion, and thus
could evolve to carry out functions different than the original
gene. In support of this, a recent survey of human and mouse
genes evolved from ‘‘domesticated’’ transposons (Kalitsis and
Saffery, 2009) showed that a significant proportion of them are
located in bidirectional promoters. Promoter upstream regions
also preferentially accumulate transposable elements, which
can carry 5′ splice site sequences that may accelerate the
process of new gene origination (Gotea et al., 2013).
New Gene Origination from Enhancers
and, as a result, new genes might originate at enhancer regions
through the same mechanism described above. The possibility
of enhancer-derived new genes has not been previously dis-
cussed. Manual inspection of a list of 24 hominoid-specific de
novo protein-coding genes (Xie et al., 2012) revealed that
MYEOV (myeloma overexpressed), a gene implicated in various
types of cancer (Janssen et al., 2000, 2002; Leyden et al., 2006;
Moss et al., 2006), is likely derived from an intergenic enhancer
in mouse. The mouse syntenic region of MYEOV is within a
5 kb region about 100 kb away from any gene, but covered
by intensive H3K4me1 marks, diagnostic of an enhancer, and
Figure 3. Divergent Transcription Drives New Gene Origination
(A–F) De novo protein-coding gene origination (A–E), and gene duplication or
translocation (F). (A) Divergent transcription of a gene (right dark block)
generates divergent noncoding RNA (ncRNA) in the upstream antisense
direction, which is terminated by PAS-dependent mechanism (PAS: red bars).
(B) Transcription increases G and T frequency on the coding strand, thus
increases the chance of encoding a U1 site (blue bar) that suppresses a
downstream PAS (PAS1), favoring the usage of a downstream PAS (PAS2). (C)
Increase in G+T content also increases the chance of losing PAS sites (PAS2)
that activates a further downstream site (PAS3) and extends the transcribed
region. (D) The longer transcript acquires splicing signals, which makes it more
stable and exported to the cytoplasm. (E) The longer transcript encodes
a short ORF and the resulting short peptide is selected and fixed in the
population and becomes a new protein-coding gene. (F) Gene A is trans-
located or duplicated into the promoter upstream antisense region of gene B,
and evolves into a new gene A’. Thin and thick blocks represent transcribed
noncoding and coding regions, respectively.
positive for Mediator binding in mouse ES cells, as well as
nascent transcription signals (GRO-seq) indicating divergent
transcription, all indicating this region is an active enhancer in
mouse ES cells. Further analysis is needed to firmly establish
the role of enhancer transcription in the origination of the
MYEOV gene. For example, it will be interesting to examine the
evolutionary dynamics of the spatial and functional relationship
between the enhancer/MYEOV locus and the corresponding
Predictions and Supporting Evidence
detected over a thousand lncRNAs annotated in the upstream
antisense region of human genes, whereas lncRNAs divergent
from the corresponding mouse protein-coding genes could not
be detected (Gotea et al., 2013). This observation suggests that
promoter divergent transcription could be capable of generating
a large number of primate-specific transcripts. Another study
(Xie et al., 2012) identified 24 hominoid-specific de novo protein-coding
genes in human, five of which derive from bidirectional
promoters (p < 0.01, compared to shuffled gene positions),
confirming promoter divergent transcription as an important
source of de novo gene origination, and enhancer transcription
may drive the origination of other new genes, as noted above.
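For readers who want a feel for how unusual 5 of 24 would be, the sketch below computes a one-sided binomial tail probability. This is a swapped-in simplification, not the test reported above, which compared against shuffled gene positions. The background fractions in the loop are hypothetical placeholders, since the expected fraction under the shuffled null is not stated here.

```python
from math import comb


def binomial_tail(k, n, p):
    """Probability of observing at least k successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Counts from the text: 5 of 24 de novo genes derive from bidirectional promoters.
observed, total = 5, 24

# Hypothetical background fractions; the real null comes from shuffled positions.
for background in (0.02, 0.05, 0.10):
    print(background, binomial_tail(observed, total, background))
```

The tail probability rises with the assumed background fraction, which is why the choice of null (here, shuffled positions in the original study) matters for the reported significance.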
An important feature of genes originated in the proposed
model is that both the new gene and the ancestral gene are likely
to be expressed in germ cells. This is because for the transcrip-
tion-induced G and T bias to accumulate and spread in a popu-
lation, these mutations should occur in germ cells. A prediction
of the model is that new genes are preferentially expressed in
germ cells, or tissues with high fraction of germ cells. Consistent
with this, previous reports showed that lineage-specific genes in
human, fly, and zebrafish genomes are preferentially expressed
in reproductive organs or tissues, such as testis (Clark et al.,
2007; Levine et al., 2006; Tay et al., 2009; Yang et al., 2013).
Moreover, divergent gene pairs in the human genome are
enriched for housekeeping genes, such as DNA repair and
transcribed in germ cells. In addition, the strand bias of G and T
content correlates with germ cell but not somatic tissue gene
expression levels (Majewski, 2003).
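The strand bias of G and T content can be summarized with a simple skew statistic. The sketch below is one possible definition, assumed here for illustration rather than taken from the cited studies: the excess of G+T over A+C on a given strand, divided by the total number of bases counted.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gt_skew(seq):
    """Return (G+T - A-C) / (G+T+A+C) for one strand; ranges from -1 to 1."""
    seq = seq.upper()
    gt = seq.count("G") + seq.count("T")
    ac = seq.count("A") + seq.count("C")
    if gt + ac == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return (gt - ac) / (gt + ac)
```

Because G pairs with C and T pairs with A, the skew of one strand is exactly the negative of the skew of its reverse complement, so a positive value on the coding strand implies a matching deficit on the template strand.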
The model could explain the origin of divergent protein-coding
gene pairs separated by less than 1 kb (usually less), which
account for 10% of human protein-coding genes (Adachi and
Lieber, 2002; Li et al., 2006; Piontkivska et al., 2009; Trinklein
et al., 2004; Wakano et al., 2012; Xu et al., 2012), a far higher
proportion than would be expected if genes were randomly
distributed in the genome. The model proposed here provides
a natural explanation for the evolutionary origin of these gene
pairs. It is likely that many more genes originated from divergent
rupted by transposon insertion, recombination, or other genome
rearrangement events. The model also predicts that divergent
gene pairs commonly have unrelated functions, although they
frequently might share coexpression. Except for a few cases,
such as histone gene pairs and collagen gene pairs that are likely
results of tandem duplication, the majority of divergent gene
pairs in the human genome do not share higher functional simi-
larity compared to random gene pairs (Li et al., 2006; Xu et al.,
2012). For example, 35 of the 105 annotated DNA repair genes
have bidirectional promoters, making DNA repair the most
over-represented pathway for genes involved in bidirectional
promoters, yet all 35 DNA repair genes are paired with non-
DNA repair genes (Xu et al., 2012). Similarly, genes coding sub-
units of protein complexes are enriched in bidirectional pairs in
human, yet none of these pairs code for two subunits of the
same complex (Li et al., 2006). A similar observation has been re-
rectional conformation reduces expression noise and is not
strongly selected for shared functionality (Wang et al., 2011).
The lack of functional relatedness is also illustrated by the paral-
lel evolution of bidirectional promoters of RecQ helicases (Piont-
during metazoan evolution, yet all evolve to have divergent part-
ners in human. However, these partner genes showed no func-
tional or sequence similarity with each other (Piontkivska et al.,
2009), suggesting parallel and independent origination of new
genes from all five promoters.
Impact on Genome Organization and Evolution
Divergent transcription likely facilitates the rearrangement
events that reshape the genome and also introduces unique
features into genome organization, including the sharing of
promoters, physical linkage in three-dimensional space, and
coexpression of distal genes.
Although vertebrates share most of their genes, the genomic
position and orientation of specific genes differ significantly
due to genome rearrangement events, such as translocation,
recombination, and duplication followed by the loss of the orig-
inal copy. The survival of the gene or gene fragments at the
new position can be facilitated by divergent transcription as dis-
cussed above. The role of divergent transcription in preserving
the function of the new gene copy is likely significant, given
that translocation preferentially occurs near active promoters
(Chiarle et al., 2011; Klein et al., 2011). The correlation between
transcription and translocation could potentially increase the
chance that the translocated gene is still expressed and thus
functional, therefore reducing the cost of translocation. For
example, although ~40% of human protein-coding genes can
be traced back to fish, fewer than 7% (83/1,262) of human
bidirectional gene pairs are also bidirectional in the fish genome
(Li et al., 2006), suggesting that most human bidirectional gene
through translocation facilitated by divergent transcription.
In addition to bidirectional organization, spatial and functional
coupling between distal gene pairs would be introduced through
new gene origination from enhancer transcription. Due to the
tight coupling between gene transcription and enhancer tran-
scription, an enhancer-derived new gene will share a significant
coexpression pattern with the old gene, despite the distance
in the linear genome. Such coupled transcription of distal
gene pairs brought together by chromatin interactions could
contribute to the formation of transcription factories, nuclear
foci where multiple genes are transcribed together without the
requirement of shared function (Edelman and Fraser, 2012;
Sutherland and Bickmore, 2009). The existence of transcription
factories has been supported by increasing evidence, including
in vivo live imaging (Ghamari et al., 2013) and chromatin inter-
action mapping (Li et al., 2012). These are probably related to
super-enhancers where many genes that are coordinately expressed
are associated with a common enhancer region (Lovén
et al., 2013; Whyte et al., 2013). Overlaying comparative
genomics analysis onto high-throughput chromatin interaction
mapping data across multiple species (Dixon et al., 2012;
Li et al., 2012) may help to reveal the evolutionary origin of tran-
In conclusion, we propose that divergent transcription at pro-
moters and enhancers results in changes of the transcribed
DNA sequences that over evolutionary time drive new gene orig-
here are consistent with significant available data, systematic
tests of these models await further advances such as in-depth
characterization of additional genomes and experiments designed
to test specific hypotheses. Over evolutionary time,
genes formed through divergent transcription can be shuffled
to other locations losing their evolutionary context. We envision
future studies will uncover more functional surprises from diver-
gent transcription, and illuminate how intergenic transcription
is integrated into the cellular transcriptome.
We thank all the Sharp lab members, especially Andrea Kriz, Anthony Chiu,
Albert Almada, Mohini Jangi, and Jesse Zamudio for comments and
Christopher Burge, Qifang Liu, Jianrong Wang, and Jeremy Wilusz for critical
reading of the manuscript. This work was supported by United States Public
Health Service grants R01-GM34277 and R01-CA133404 from the National
Institutes of Health (P.A.S.) and partially by the Koch Institute Support (core)
grant P30-CA14051 from the National Cancer Institute. X.W. is a Howard
Hughes Medical Institute International Student Research fellow.
Adachi, N., and Lieber, M.R. (2002). Bidirectional gene organization: a com-
mon architectural feature of the human genome. Cell 109, 807–809.
Aguilera, A., and García-Muse, T. (2012). R loops: from transcription byproducts
to threats to genome stability. Mol. Cell 46, 115–124.
Almada, A.E., Wu, X., Kriz, A.J., Burge, C.B., and Sharp, P.A. (2013). Promoter
directionality is controlled by U1 snRNP and polyadenylation signals. Nature
Anamika, K., Gyenis, Á., Poidevin, L., Poch, O., and Tora, L. (2012). RNA
polymerase II pausing downstream of core histone genes is different from
genes producing polyadenylated transcripts. PLoS ONE 7, e38769.
Andersen, P.K., Lykke-Andersen, S., and Jensen, T.H. (2012). Promoter-
proximal polyadenylation sites reduce transcription activity. Genes Dev. 26,
Berg, M.G., Singh, L.N., Younis, I., Liu, Q., Pinto, A.M., Kaida, D., Zhang, Z.,
Cho, S., Sherrill-Mix, S., Wan, L., and Dreyfuss, G. (2012). U1 snRNP deter-
mines mRNA length and regulates isoform expression. Cell 150, 53–64.
Carvunis, A.R., Rolland, T., Wapinski, I., Calderwood, M.A., Yildirim, M.A.,
Simonis, N., Charloteaux, B., Hidalgo, C.A., Barbette, J., Santhanam, B.,
et al. (2012). Proto-genes and de novo gene birth. Nature 487, 370–374.
Chen, S., Krinsky, B.H., and Long, M. (2013). New genes as drivers of pheno-
typic evolution. Nat. Rev. Genet. 14, 645–660.
Chiarle, R., Zhang, Y., Frock, R.L., Lewis, S.M., Molinie, B., Ho, Y.-J., Myers,
D.R., Choi, V.W., Compagno, M., Malkin, D.J., et al. (2011). Genome-wide
translocation sequencing reveals mechanisms of chromosome breaks and
rearrangements in B cells. Cell 147, 107–119.
Churchman, L.S., and Weissman, J.S. (2011). Nascent transcript sequencing
visualizes transcription at nucleotide resolution. Nature 469, 368–373.
Clark, A.G., Eisen, M.B., Smith, D.R., Bergman, C.M., Oliver, B., Markow, T.A.,
Kaufman, T.C., Kellis, M., Gelbart, W., Iyer, V.N., et al.; Drosophila 12
Genomes Consortium. (2007). Evolution of genes and genomes on the
Drosophila phylogeny. Nature 450, 203–218.
Core, L.J., Waterfall, J.J., and Lis, J.T. (2008). Nascent RNA sequencing
reveals widespread pausing and divergent initiation at human promoters.
Science 322, 1845–1848.
Core, L.J., Waterfall, J.J., Gilchrist, D.A., Fargo, D.C., Kwak, H., Adelman, K.,
and Lis, J.T. (2012). Defining the status of RNA polymerase at promoters. Cell
Rep 2, 1025–1035.
Damgaard, C.K., Kahns, S., Lykke-Andersen, S., Nielsen, A.L., Jensen, T.H.,
and Kjems, J. (2008). A 5′ splice site enhances the recruitment of basal transcription
initiation factors in vivo. Mol. Cell 29, 271–278.
De Santa, F., Barozzi, I., Mietton, F., Ghisletti, S., Polletti, S., Tusi, B.K., Muller,
H., Ragoussis, J., Wei, C.-L., and Natoli, G. (2010). A large fraction of extra-
Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., and
Ren, B. (2012). Topological domains in mammalian genomes identified by
analysis of chromatin interactions. Nature 485, 376–380.
Djebali, S., Davis, C.A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A.,
Tanzer, A., Lagarde, J., Lin, W., Schlesinger, F., et al. (2012). Landscape of
transcription in human cells. Nature 489, 101–108.
Edelman, L.B., and Fraser, P. (2012). Transcription factories: genetic programming
in three dimensions. Curr. Opin. Genet. Dev. 22, 110–114.
Flynn, R.A., Almada, A.E., Zamudio, J.R., and Sharp, P.A. (2011). Antisense
RNA polymerase II divergent transcripts are P-TEFb dependent and sub-
strates for the RNA exosome. Proc. Natl. Acad. Sci. USA 108, 10460–10465.
Fong, Y.W., and Zhou, Q. (2001). Stimulatory effect of splicing factors on
transcriptional elongation. Nature 414, 929–933.
Furger, A., O’Sullivan, J.M., Binnie, A., Lee, B.A., and Proudfoot, N.J. (2002).
Promoter proximal splice sites enhance transcription. Genes Dev. 16, 2792–
Ghamari, A., van de Corput, M.P.C., Thongjuea, S., van Cappellen, W.A., van
Ijcken, W., van Haren, J., Soler, E., Eick, D., Lenhard, B., and Grosveld, F.G.
(2013). In vivo live imaging of RNA polymerase II transcription factories in
primary cells. Genes Dev. 27, 767–777.
Gotea, V., Petrykowska, H.M., and Elnitski, L. (2013). Bidirectional promoters
as important drivers for the emergence of species-specific transcripts. PLoS
ONE 8, e57323.
Green, P., Ewing, B., Miller, W., Thomas, P.J., and NISC Comparative
Sequencing Program, and Green, E.D. (2003). Transcription-associated muta-
tional asymmetry in mammalian evolution. Nat. Genet. 33, 514–517.
Guttman, M., and Rinn, J.L. (2012). Modular regulatory principles of large non-
coding RNAs. Nature 482, 339–346.
Janssen, J.W.G., Vaandrager, J.W., Heuser, T., Jauch, A., Kluin, P.M., Geelen,
E., Bergsagel, P.L., Kuehl, W.M., Drexler, H.G., Otsuki, T., et al. (2000).
D1 in a subset of multiple myeloma cell lines with t(11;14)(q13;q32). Blood 95,
Janssen, J.W.G., Cuny, M., Orsetti, B., Rodriguez, C., Vallés, H., Bartram,
C.R., Schuuring, E., and Theillet, C. (2002). MYEOV: a candidate gene for
DNA amplification events occurring centromeric to CCND1 in breast cancer.
Int. J. Cancer 102, 608–614.
Kaida, D., Berg, M.G., Younis, I., Kasim, M., Singh, L.N., Wan, L., and Drey-
fuss, G. (2010). U1 snRNP protects pre-mRNAs from premature cleavage
and polyadenylation. Nature 468, 664–668.