1Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA. 2Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139,
USA. 3Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA.
Shortly after the discovery of messenger RNA, a large class of
heteronuclear RNAs (hnRNAs)1 was described, which did not include mRNA or
associate with polyribosomes2. Following years of sifting through these
hnRNAs, the first RNA subfamilies were identified. These included small
nuclear RNAs involved in splicing regulation3 and small nucleolar RNAs
involved in ribosome biogenesis4, as well as the ribosomal RNAs and
transfer RNAs involved in translation5,6.
The world of RNA genes became even more complex with the discovery
of RNAs that resembled mRNA in length and splicing structure but did
not code for proteins. The first example was H19, which was identified as
an RNA that was induced during liver development in the mouse7. The
mouse H19 transcript contained no large open reading frames (ORFs), but
instead only small sporadic ORFs that were not evolutionarily conserved,
did not template translation in vivo and did not produce an identifiable
protein product8. Shortly afterwards, another non-coding RNA (ncRNA),
termed XIST, was found to be expressed exclusively from the inactive X
chromosome9 and later demonstrated to be required for X inactivation in
mammals10. Over the next two decades, more large ncRNA genes were
discovered, including Airn11, Tug1 (ref. 12), NRON13 and HOTAIR14. With the
availability of a draft sequence of the human genome, it became clear that
much of the mammalian genome is transcribed15–18. These transcripts were
mapped to discrete loci throughout the genome. Over the next 10 years,
both large and small RNA transcripts were discovered at an unprecedented
rate15,17–20; however, the functional significance of most of these transcripts
was unclear. Although some of these could be considered noise21,22, there
are still many large ncRNAs that are known to have diverse functions23–29.
This Review focuses on the classic examples of large ncRNAs that have
helped to form the basis of more recent global studies of coding potential,
function and mechanism. We discuss the concepts that have emerged
from these examples that provide a framework for understanding the
principles of RNA interactions. We propose that by assembling distinct
regulatory components, large ncRNAs could produce intricate functional
specificity, which is suggestive of a possible modular RNA code.
More than half a century after being placed as the central
component in the flow of genetic information from gene to
protein, it is now accepted that RNA can perform diverse roles.
After the sequencing of the human genome, the next major hurdle was
to define the genes it encoded. To do this, several research groups
developed tiling microarrays17,19,20 and complementary DNA sequencing
methods15 to investigate transcriptional activity across the human
genome, which led to the observation of widespread transcription of
the genome. These studies, although limited to specific tissues and cell
types, demonstrated that the mammalian genome encodes many
thousands of non-coding transcripts including both short (<200 nucleotides
in length) and long (>200 nucleotides in length) transcripts. In this
Review, we focus on large ncRNAs produced from long transcripts,
including those that originate from intergenic loci or overlapping pro-
Dramatic innovations in sequencing technologies have allowed the
deep sequencing of cDNAs, known as RNA-Seq30; this deep sequencing,
coupled with new computational methods for assembling the
transcriptome31, has identified non-coding transcripts across many different
cell types and tissues31,32. It is now clear that there are thousands of
well-expressed large ncRNAs with exquisite cell-type and tissue specificity31–33.
As the numbers of identified non-coding transcripts increased, so
did the uncertainty regarding their function; this led some authors to
express concern that many of these transcripts may be just transcriptional
noise21,22 with no function or incidental by-products of transcription
from enhancer regions34,35. These concerns are supported by the
observations that many of these transcripts are expressed at extremely
low levels32,36 and they have lower levels of evolutionary conservation
than protein-coding genes25,31,37. Although some of these transcripts
may indeed be transcriptional noise21, the remaining transcripts
consist of many distinct subclasses, including processed small RNAs18,29,38,
promoter-associated RNAs18,39, transcripts from enhancer regions34,35
and functional large ncRNAs14,23; each class varies in its expression and
conservation properties31,37. Distinguishing between these classes of
RNA transcripts requires additional biological information including
the coding potential of the RNA and the chromatin modifications of the
corresponding genomic region (Fig. 1a).
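As a rough illustration of what a first-pass look at coding potential can involve, the sketch below scans a transcript for its longest open reading frame, echoing the observation that H19 contains only small, sporadic ORFs. It is a minimal sketch under stated assumptions and not the method used in the studies cited here: the function name `longest_orf` and the constant `STOP_CODONS` are hypothetical, only the forward strand is scanned, only ATG-initiated ORFs with an in-frame stop codon are counted, and evolutionary conservation of the ORF is not assessed.

```python
# Hypothetical helper: crude ORF-length screen for a transcript sequence.
# Assumes the standard genetic code stop codons and ATG as the only start.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> int:
    """Return the length, in codons, of the longest ORF in the sequence.

    The count includes the start codon and excludes the stop codon.
    Only the three forward-strand reading frames are scanned, and an
    ORF is counted only if it is closed by an in-frame stop codon.
    """
    # Accept RNA or DNA input by normalising to upper-case DNA letters.
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the current open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


if __name__ == "__main__":
    # Toy sequence with one short ORF (ATG AAA TTT, then TAG) in frame 2.
    print(longest_orf("CCATGAAATTTTAGGG"))
```

A short longest ORF is only weak evidence that a transcript is non-coding, because short conserved ORFs can encode functional peptides. In practice such a screen would be combined with conservation and chromatin information of the kind described in the text.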
Genomic DNA is wrapped around histone proteins and packaged into
higher-order structures termed chromatin40. These histones can be
modified in different ways that are indicative of the underlying DNA
functional state. Advances in sequencing technologies have allowed
the comprehensive characterization of the chromatin-modification
landscape of mammalian genomes41–44. These studies revealed
combinations of histone modifications (termed chromatin signatures) that
correspond to various gene properties, including a signature for active
It is clear that RNA has a diverse set of functions and is more than just a messenger between gene and protein. The
mammalian genome is extensively transcribed, giving rise to thousands of non-coding transcripts. Whether all of these
transcripts are functional is debated, but it is evident that there are many functional large non-coding RNAs (ncRNAs).
Recent studies have begun to explore the functional diversity and mechanistic role of these large ncRNAs. Here we
synthesize these studies to provide an emerging model whereby large ncRNAs might achieve regulatory specificity through
modularity, assembling diverse combinations of proteins and possibly RNA and DNA interactions.
Modular regulatory principles
of large non-coding RNAs
Mitchell Guttman1,2 & John L. Rinn1,3
16 FEBRUARY 2012 | VOL 482 | NATURE | 339
© 2012 Macmillan Publishers Limited. All rights reserved
12 Drosophila genomes. PLoS Comput. Biol. 4, e1000067 (2008).
51. Lin, M. F., Jungreis, I. & Kellis, M. PhyloCSF: a comparative genomics method
to distinguish protein coding and non-coding regions. Bioinformatics 27,
52. Finn, R. D. et al. The Pfam protein families database. Nucleic Acids Res. 38,
53. Ingolia, N. T., Lareau, L. F. & Weissman, J. S. Ribosome profiling of mouse
embryonic stem cells reveals the complexity and dynamics of mammalian
proteomes. Cell 147, 789–802 (2011).
54. Galindo, M. I., Pueyo, J. I., Fouix, S., Bishop, S. A. & Couso, J. P. Peptides encoded
by short ORFs control development and define a new eukaryotic gene family.
PLoS Biol. 5, e106 (2007).
This paper demonstrates the existence of functional small peptides within a
presumed ‘non-coding’ transcript through ORF conservation, in vivo protein
identification and functional analysis.
55. Kondo, T. et al. Small peptides switch the transcriptional activity of Shavenbaby
during Drosophila embryogenesis. Science 329, 336–339 (2010).
56. Jiao, Y. & Meyerowitz, E. M. Cell-type specific analysis of translating RNAs in
developing flowers reveals new levels of control. Mol. Syst. Biol. 6, 419 (2010).
57. Li, Y. M. et al. The H19 transcript is associated with polysomes and may regulate
IGF2 expression in trans. J. Biol. Chem. 273, 28247–28252 (1998).
58. Cai, X. & Cullen, B. R. The imprinted H19 noncoding RNA is a primary microRNA
precursor. RNA 13, 313–316 (2007).
59. Yang, L. et al. ncRNA- and Pc2 methylation-dependent gene relocation between
nuclear structures mediates gene activation programs. Cell 147, 773–788
60. Clamp, M. et al. Distinguishing protein-coding and noncoding genes in the
human genome. Proc. Natl Acad. Sci. USA 104, 19428–19433 (2007).
61. Kastenmayer, J. P. et al. Functional genomics of genes with small open reading
frames (sORFs) in S. cerevisiae. Genome Res. 16, 365–373 (2006).
62. Hanada, K., Zhang, X., Borevitz, J. O., Li, W. H. & Shiu, S. H. A large number
of novel coding small open reading frames in the intergenic regions of the
Arabidopsis thaliana genome are transcribed and/or under purifying selection.
Genome Res. 17, 632–640 (2007).
63. Mattick, J. S. The genetic signatures of noncoding RNAs. PLoS Genet. 5,
64. Tsai, M. C. et al. Long noncoding RNA as modular scaffold of histone
modification complexes. Science 329, 689–693 (2010).
This paper identified multiple protein-interaction domains within HOTAIR that
together allowed it to carry out its function, which demonstrated that a large
ncRNA can act as a molecular scaffold.
65. Gupta, R. A. et al. Long non-coding RNA HOTAIR reprograms chromatin state to
promote cancer metastasis. Nature 464, 1071–1076 (2010).
66. Zappulla, D. C. & Cech, T. R. Yeast telomerase RNA: a flexible scaffold for protein
subunits. Proc. Natl Acad. Sci. USA 101, 10024–10029 (2004).
This paper demonstrated that telomerase RNA can bridge proteins by showing
that protein interaction domains can be swapped and spacer regions deleted
with minimal impact on the function of the RNA.
67. Korostelev, A. & Noller, H. F. The ribosome in focus: new structures bring new
insights. Trends Biochem. Sci. 32, 434–441 (2007).
68. Ivanova, N. et al. Dissecting self-renewal in stem cells with RNA interference.
Nature 442, 533–538 (2006).
69. Martens, J. A., Laprade, L. & Winston, F. Intergenic transcription is required
to repress the Saccharomyces cerevisiae SER3 gene. Nature 429, 571–574
70. Schmitt, S., Prestel, M. & Paro, R. Intergenic transcription through a Polycomb
group response element counteracts silencing. Genes Dev. 19, 697–708 (2005).
71. Lee, J. T. Lessons from X-chromosome inactivation: long ncRNA as guides and
tethers to the epigenome. Genes Dev. 23, 1831–1842 (2009).
72. Ponjavic, J., Oliver, P. L., Lunter, G. & Ponting, C. P. Genomic and transcriptional
co-localization of protein-coding and long non-coding RNA pairs in the
developing brain. PLoS Genet. 5, e1000617 (2009).
73. Tian, D., Sun, S. & Lee, J. T. The long noncoding RNA, Jpx, is a molecular switch
for X chromosome inactivation. Cell 143, 390–403 (2010).
74. Koerner, M. V., Pauler, F. M., Huang, R. & Barlow, D. P. The function of non-coding
RNAs in genomic imprinting. Development 136, 1771–1783 (2009).
75. Pandey, R. R. et al. Kcnq1ot1 antisense noncoding RNA mediates lineage-
specific transcriptional silencing through chromatin-level regulation. Mol. Cell
32, 232–246 (2008).
76. Bertani, S., Sauer, S., Bolotin, E. & Sauer, F. The noncoding RNA Mistral activates
Hoxa6 and Hoxa7 expression and stem cell differentiation by recruiting MLL1 to
chromatin. Mol. Cell 43, 1040–1046 (2011).
77. Feng, J. et al. The Evf-2 noncoding RNA is transcribed from the Dlx-5/6
ultraconserved region and functions as a Dlx-2 transcriptional coactivator.
Genes Dev. 20, 1470–1484 (2006).
78. Koziol, M. J. & Rinn, J. L. RNA traffic control of chromatin complexes. Curr. Opin.
Genet. Dev. 20, 142–148 (2010).
79. Maison, C. et al. Higher-order structure in pericentric heterochromatin involves
a distinct pattern of histone modification and an RNA component. Nature Genet.
30, 329–334 (2002).
80. Bernstein, E. et al. Mouse polycomb proteins bind differentially to methylated
histone H3 and RNA and are enriched in facultative heterochromatin. Mol. Cell.
Biol. 26, 2560–2569 (2006).
81. Wutz, A., Rasmussen, T. P. & Jaenisch, R. Chromosomal silencing and localization
are mediated by different domains of Xist RNA. Nature Genet. 30, 167–174
This paper reported the generation of deletion mutants across the Xist
locus and identified the discrete domains responsible for the silencing and
localization roles of the RNA.
82. Chu, C., Qu, K., Zhong, F. L., Artandi, S. E. & Chang, H. Y. Genomic maps of long
noncoding RNA occupancy reveal principles of RNA–chromatin interactions.
Mol. Cell 44, 667–678 (2011).
83. Simon, M. D. et al. The genomic binding-sites of a non-coding RNA. Proc. Natl
Acad. Sci. USA 108, 20497–20502 (2011).
84. Zhao, J., Sun, B. K., Erwin, J. A., Song, J. J. & Lee, J. T. Polycomb proteins
targeted by a short repeat RNA to the mouse X chromosome. Science 322,
85. Plath, K., Mlynarczyk-Evans, S., Nusinow, D. A. & Panning, B. Xist RNA and the
mechanism of X chromosome inactivation. Annu. Rev. Genet. 36, 233–278
86. Nagano, T. et al. The Air noncoding RNA epigenetically silences transcription
by targeting G9a to chromatin. Science 322, 1717–1720 (2008).
87. Zhao, J. et al. Genome-wide identification of Polycomb-associated RNAs by
RIP-seq. Mol. Cell 40, 939–953 (2010).
88. Kaneko, S. et al. Phosphorylation of the PRC2 component Ezh2 is cell cycle-
regulated and up-regulates its binding to ncRNA. Genes Dev. 24, 2615–2620
89. Wang, X. et al. Induced ncRNAs allosterically modify RNA-binding proteins in
cis to inhibit transcription. Nature 454, 126–130 (2008).
90. Kino, T., Hurt, D. E., Ichijo, T., Nader, N. & Chrousos, G. P. Noncoding RNA Gas5
is a growth arrest- and starvation-associated repressor of the glucocorticoid
receptor. Sci. Signal. 3, ra8 (2010).
91. Salmena, L., Poliseno, L., Tay, Y., Kats, L. & Pandolfi, P. P. A ceRNA hypothesis:
the Rosetta stone of a hidden RNA language? Cell 146, 353–358 (2011).
92. Cesana, M. et al. A long noncoding RNA controls muscle differentiation by
functioning as a competing endogenous RNA. Cell 147, 358–369 (2011).
93. Greider, C. W. & Blackburn, E. H. Identification of a specific telomere terminal
transferase activity in Tetrahymena extracts. Cell 43, 405–413 (1985).
94. Feng, J. et al. The RNA component of human telomerase. Science 269,
95. Lingner, J. et al. Reverse transcriptase motifs in the catalytic subunit of
telomerase. Science 276, 561–567 (1997).
96. Jeon, Y. & Lee, J. T. YY1 tethers Xist RNA to the inactive X nucleation center.
Cell 146, 119–133 (2011).
97. Hasegawa, Y., Brockdorff, N., Kawano, S., Tsutui, K. & Nakagawa, S. The matrix
protein hnRNP U is required for chromosomal localization of Xist RNA. Dev.
Cell 19, 469–476 (2010).
98. Schmitz, K. M., Mayer, C., Postepska, A. & Grummt, I. Interaction of noncoding
RNA with the rDNA promoter mediates recruitment of DNMT3b and silencing
of rRNA genes. Genes Dev. 24, 2264–2269 (2010).
99. Martianov, I., Ramadass, A., Serra Barros, A., Chow, N. & Akoulitchev, A.
Repression of the human dihydrofolate reductase gene by a non-coding
interfering transcript. Nature 445, 666–670 (2007).
100. Bartel, D. P. MicroRNAs: target recognition and regulatory functions. Cell 136,
Acknowledgements We thank M. Cabili, J. Engreitz, M. Garber, P. McDonel and A.
Pauli for their reading and suggestions; T. Cech for comments and suggestions;
E. Lander for helpful discussions and ideas; and S. Knemeyer and L. Gaffney for
assistance with figures in this Review.
Author Information Reprints and permissions information is available at
www.nature.com/reprints. The authors declare no competing financial
interests. Readers are welcome to comment on the online version of this article
at www.nature.com/nature. Correspondence should be addressed to M.G.
(email@example.com) and J.L.R. (firstname.lastname@example.org).