A Driving Force for New Gene Origination?
Xuebing Wu1,2 and Phillip A. Sharp1,3,*
1David H. Koch Institute for Integrative Cancer Research
2Computational and Systems Biology Graduate Program
3Department of Biology
Massachusetts Institute of Technology, Cambridge, MA 02139, USA
The mammalian genome is extensively transcribed, and a large fraction of this transcription is divergent transcription from promoters and enhancers that is tightly coupled with active gene transcription. Here, we propose that divergent transcription may shape the evolution of the genome by new gene origination.
Widespread Divergent Transcription
The vast majority of the human genome, including half of the region outside of known genes, is transcribed (Djebali et al., 2012). However, most intergenic transcription produces short, unstable noncoding transcripts whose abundances are usually an order of magnitude lower than those of typical protein-coding genes. Except for a few well-studied cases (reviewed in Guttman and Rinn, 2012; Lee, 2012; Mercer et al., 2009; Ponting et al., 2009; Rinn and Chang, 2012; Ulitsky and Bartel, 2013; Wang and Chang, 2011; Wei et al., 2011; Wilusz et al., 2009), it is unclear whether most intergenic transcription is regulated or has a cellular function.
Recent evidence has shown that most intergenic transcription occurs near, or is associated with, gene transcription, such as transcription from promoter and enhancer regions (Sigova et al., 2013). The majority of mammalian promoters direct transcription initiation on both sides, in opposite orientations, a phenomenon known as divergent transcription (Core et al., 2008; Preker et al., 2008; Seila et al., 2008). Divergent transcription generates upstream antisense RNAs (uaRNAs, also called PROMPTs, for promoter upstream transcripts) near the 5′ ends of genes; these RNAs are typically short (50–2,000 nucleotides) and relatively unstable (Flynn et al., 2011; Ntini et al., 2013; Preker et al., 2008, 2011). Similar divergent transcription also occurs at distal enhancer regions, giving rise to RNAs termed enhancer RNAs (eRNAs) (Kim et al., 2010; De Santa et al., 2010). In mouse and human embryonic stem (ES) cells, most long noncoding RNAs (lncRNAs, longer than 100 nucleotides) are associated with protein-coding genes, with ~50% arising as uaRNAs and ~20% as eRNAs (Sigova et al., 2013). These observations suggest that divergent transcription from the promoters and enhancers of protein-coding genes is the major source of intergenic transcription in ES cells.
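A short coordinate sketch can make the geometry of divergent transcription at a promoter concrete. Given a sense gene's transcription start site (TSS) and strand, the function below returns the interval that a uaRNA of a chosen length would cover. That interval lies on the opposite strand, immediately upstream of the TSS. This is an illustration only. The helper name `uarna_interval`, the 0-based half-open coordinate convention, and the `gap` parameter separating the sense and antisense start sites are all assumptions, not values taken from the studies cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """0-based, half-open genomic interval on one strand (assumed convention)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def uarna_interval(chrom: str, tss: int, strand: str, length: int, gap: int = 0) -> Interval:
    """Hypothetical helper: region covered by an upstream antisense RNA (uaRNA).

    tss    -- 0-based position of the first transcribed base of the sense gene
    length -- assumed uaRNA length in nucleotides (the text cites 50-2,000 nt)
    gap    -- assumed distance between the sense TSS and the antisense TSS
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if length <= 0 or gap < 0:
        raise ValueError("length must be positive and gap non-negative")
    if strand == "+":
        # Sense transcription runs rightward, so upstream is to the left and the
        # antisense RNA is made from the "-" strand, running leftward.
        end = tss - gap
        start = max(0, end - length)  # clamp at the chromosome start
        return Interval(chrom, start, end, "-")
    # Sense transcription runs leftward on "-", so upstream is to the right and
    # the antisense RNA is made from the "+" strand, running rightward.
    start = tss + 1 + gap
    return Interval(chrom, start, start + length, "+")


if __name__ == "__main__":
    print(uarna_interval("chr1", 10_000, "+", 500))
    print(uarna_interval("chr1", 10_000, "-", 500))
```

With the default `gap=0`, a 500-nt uaRNA for a plus-strand gene whose TSS is at position 10,000 occupies [9,500, 10,000) on the minus strand. For a minus-strand gene, the same RNA lies on the plus strand just to the right of the TSS.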
In the textbook model of a eukaryotic promoter, directionality is set by the arrangement of an upstream cis-element region followed by a core promoter (Figure 1A). The cis-elements are bound by sequence-specific transcription factors, whereas the core promoter is bound by TATA-binding protein (TBP) and other factors that recruit the core transcription machinery. Most mammalian promoters lack a TATA element (TATA-less) and are CpG rich (Sandelin et al., 2007). For these promoters, TBP is recruited through sequence-specific transcription factors, such as Sp1, that bind CpG-rich sequences, and through components of the TFIID complex that have little sequence specificity. Thus, in the absence of strong TATA elements, as at CpG island promoters, TBP complexes are recruited on both sides of the transcription factors to form preinitiation complexes in both orientations (Figure 1B). This model is supported by the observation that divergent transcription occurs at most mammalian promoters associated with CpG islands, whereas promoters with TATA elements in mammals and worms are associated with unidirectional transcription (Core et al., 2008; Kruesi et al., 2013). In addition, divergent transcription is less common in Drosophila, where CpG islands are rare (Core et al., 2012). Because transcription factors with chromatin-remodeling potential and transcription activation domains also bind at enhancer sites, it is not surprising that enhancers are also sites of divergent transcription. In fact, promoters and enhancers have many properties in common, and it has recently been shown that many intragenic enhancers can act as alternative promoters that produce tissue-specific lncRNAs (Kowalczyk et al., 2012).
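A toy sequence check can illustrate the contrast drawn above between TATA-containing promoters and CpG-rich, TATA-less promoters. The sketch below is a simplification and not a method used in this article. It applies the widely used CpG island criteria (GC fraction above 0.5 and a CpG observed/expected ratio above 0.6) to a whole sequence. It also searches for a simple TATAWAW consensus (W = A or T) as a stand-in for a TATA box. The function names and the regex are hypothetical choices. Real promoter annotation uses sliding windows, minimum lengths, and position weight matrices.

```python
import re

# Assumed TATA-box consensus TATAWAW (W = A or T); real TATA boxes are usually
# modeled with position weight matrices rather than a fixed pattern.
TATA_RE = re.compile(r"TATA[AT]A[AT]")


def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    return seq.count("CG") * len(seq) / (c * g)


def classify_promoter(seq: str, gc_min: float = 0.5, oe_min: float = 0.6) -> dict:
    """Hypothetical toy classifier; only the given strand is scanned for a TATA-like motif."""
    seq = seq.upper()
    return {
        "cpg_rich": gc_fraction(seq) > gc_min and cpg_obs_exp(seq) > oe_min,
        "tata": TATA_RE.search(seq) is not None,
    }


if __name__ == "__main__":
    print(classify_promoter("CG" * 100))             # {'cpg_rich': True, 'tata': False}
    print(classify_promoter("TATAAAAG" + "A" * 50))  # {'cpg_rich': False, 'tata': True}
```

Under the model described above, sequences in the first category (CpG rich, with no TATA-like motif) would be the candidates for initiation in both orientations. Sequences with a strong TATA element would be expected to favor unidirectional transcription.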
The U1-PAS Axis and Gene Maturation
Promoter-proximal noncoding transcription in both yeast and mammals has been shown to be suppressed at the chromatin level, through mechanisms including nucleosome remodeling (Whitehouse et al., 2007), histone deacetylation (Churchman and Weissman, 2011), and gene loop formation (Tan-Wong et al., 2012). We and others recently found that in mammals, promoter upstream antisense transcription is frequently terminated by cleavage of the nascent RNA through the same process that generates the poly(A) tail at the 3′ ends of genes (Almada et al., 2013; Ntini et al., 2013). In both cases, the primary signal directing this process is the poly(A) signal (PAS) motif, AAUAAA or a similar sequence (Proudfoot, 2011). Pol II terminates transcription
990 Cell 155, November 21, 2013 ©2013 Elsevier Inc.
Wakano, C., Byun, J.S., Di, L.-J., and Gardner, K. (2012). The dual lives of bidirectional promoters. Biochim. Biophys. Acta 1819, 688–693.
Wang, K.C., and Chang, H.Y. (2011). Molecular mechanisms of long noncoding RNAs. Mol. Cell 43, 904–914.
Wang, G.-Z., Lercher, M.J., and Hurst, L.D. (2011). Transcriptional coupling of neighboring genes and gene expression noise: evidence that gene orientation and noncoding transcripts are modulators of noise. Genome Biol. Evol. 3,
Wei, W., Pelechano, V., Järvelin, A.I., and Steinmetz, L.M. (2011). Functional consequences of bidirectional promoters. Trends Genet. 27, 267–276.
Whitehouse, I., Rando, O.J., Delrow, J., and Tsukiyama, T. (2007). Chromatin remodelling at promoters suppresses antisense transcription. Nature 450,
Whyte, W.A., Orlando, D.A., Hnisz, D., Abraham, B.J., Lin, C.Y., Kagey, M.H., Rahl, P.B., Lee, T.I., and Young, R.A. (2013). Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell 153,
Wilusz, J.E., Sunwoo, H., and Spector, D.L. (2009). Long noncoding RNAs: functional surprises from the RNA world. Genes Dev. 23, 1494–1504.
Xie, C., Zhang, Y.E., Chen, J.-Y., Liu, C.-J., Zhou, W.-Z., Li, Y., Zhang, M., Zhang, R., Wei, L., and Li, C.-Y. (2012). Hominoid-specific de novo protein-coding genes originating from long non-coding RNAs. PLoS Genet. 8,
Xu, C., Chen, J., and Shen, B. (2012). The preservation of bidirectional promoter architecture in eukaryotes: what is the driving force? BMC Syst. Biol. 6 (Suppl 1), S21.
Yang, L., Zou, M., Fu, B., and He, S. (2013). Genome-wide identification, characterization, and expression analysis of lineage-specific genes within zebrafish. BMC Genomics 14, 65.