Primate MicroRNAs miR-220 and miR-492 Lie within Processed Pseudogenes
ERIC J. DEVOR
From Molecular Genetics and Bioinformatics, Integrated DNA Technologies, 1710 Commercial Park, Coralville, IA 52241.
Address correspondence to E. J. Devor at the address above, or e-mail: email@example.com.
MicroRNAs (miRNAs) are a new and abundant class of small, noncoding RNAs. To date, the evolutionary history of most of these loci appears to be marked by duplication and divergence. The ultimate origin of miRNAs remains an open question. A survey of the genomic context of more than 300 human miRNA loci revealed that two primate-specific miRNAs, miR-220 and miR-492, each lie within a processed pseudogene. In silico and in vitro examinations of these two loci suggest that this is a rare phenomenon requiring the juxtaposition of a specific combination of factors. Thus it appears that, while processed pseudogenes are good candidates for miRNA incubators, it is unlikely that more than a very small percentage of new miRNAs arise this way.
MicroRNAs (miRNAs) are an abundant class of small, noncoding RNAs. First reported in Caenorhabditis elegans just over a decade ago (Lee et al. 1993), miRNAs are found in animal,
2004; Pfeffer et al. 2004). Functional studies of miRNAs show that they are potent regulators of gene expression and play a crucial role in cellular processes such as cell differentiation, apoptosis, and cell proliferation (Pasquinelli et al. 2005). Many extant miRNAs appear to have arisen via duplication from existing miRNAs (Tanzer and Stadler 2004; Tanzer et al. 2005), but the ultimate origin of these loci is an open question. A survey of the genomic context of 321 human miRNAs from RELEASE 7.0 of miRBase (Ambros et al. 2003; Griffiths-Jones 2004) revealed that just two, hsa-miR-220 and hsa-miR-492, lie within annotated processed pseudogenes. The miRNA hsa-miR-220, located at Xq25, is expressed from the strand opposite to, and lies completely within, a β-tubulin–processed pseudogene (LOC402422, GenBank accession no. NT_011786), and the miRNA hsa-miR-492, located at 12q22, is expressed from the sense strand of an incomplete keratin-19–processed pseudogene (LOC160313, GenBank accession no. NG_002383). Results of in vitro and in silico analyses of these miRNAs demonstrate that they are de novo loci in human, ape, and old world monkey (OWM) genomes and that they became expressed miRNAs after their pseudogene incubators were created. Examination of these loci suggests that this is likely a unique or, at best, a rare phenomenon resulting from a fortuitous combination of factors, including mRNA sequence and specific genomic context.
Materials and Methods
The miRNAs hsa-miR-220 and hsa-miR-492 are two examples of a growing number of human miRNAs listed in RELEASE 7.0 of miRBase, the miRNA database (Ambros et al. 2003; Griffiths-Jones 2004), for which no ortholog can be found in mouse or rat genomes (http://microrna.sanger.ac.uk/sequences/index.shtml). Diligent in silico searching will resolve some of these in rodent or other eutherian genomes, but many are primate specific. Using the chromosome coordinates for hsa-miR-220 (X chromosome, 122421481–122421590, Xq25, minus strand) and for hsa-miR-492 (chromosome 12, 93730642–93730757, 12q22, plus strand), these loci were found to be encoded within annotated processed pseudogenes. Locus hsa-miR-220 is transcribed from the strand opposite to, and completely within, the β-tubulin–processed pseudogene identified as LOC402422 (GenBank accession no. NT_011786), and locus hsa-miR-492 is transcribed from the sense strand of a keratin-19–processed pseudogene identified as LOC160313 (GenBank accession no. NG_002383) (Figure 1). This suggested that both miRNAs evolved after the pseudogenes were created. Genome annotation of LOC402422 identifies it as a TUBB4-processed pseudogene. However, Clustal alignments of numerous β-tubulin mRNAs with the pseudogene sequence indicate that TUBB5 is more likely to be the antecedent (data not shown). Thus,
GenBank accession no. AY890656) was used to estimate the age of the reverse transcription and retroposition event that
keratin-19 (KRT19, GenBank accession no. NM_002276) was used to estimate the age of the reverse transcription and retroposition event that created LOC160313.
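The containment and strand relationships described above can be checked mechanically from annotation coordinates. The following Python sketch assumes 1-based, inclusive coordinates and uses a hypothetical `Interval` type and `relation` helper; the hsa-miR-220 coordinates are taken from the text, but the pseudogene interval is a placeholder, not the real LOC402422 annotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A 1-based, inclusive genomic interval on one strand ('+' or '-')."""
    chrom: str
    start: int
    end: int
    strand: str


def relation(mirna: Interval, feature: Interval) -> str:
    """Describe how a miRNA locus sits relative to an annotated feature."""
    if mirna.chrom != feature.chrom:
        return "different chromosome"
    if mirna.end < feature.start or mirna.start > feature.end:
        return "no overlap"
    contained = feature.start <= mirna.start and mirna.end <= feature.end
    orientation = "sense" if mirna.strand == feature.strand else "antisense"
    return f"{'contained' if contained else 'partial overlap'}, {orientation}"


# hsa-miR-220 coordinates as given in the text (X chromosome, minus strand).
mir_220 = Interval("X", 122421481, 122421590, "-")

# HYPOTHETICAL pseudogene interval for illustration only; substitute the
# real LOC402422 coordinates from the genome annotation before use.
pseudogene = Interval("X", 122420000, 122423000, "+")

print(relation(mir_220, pseudogene))  # contained, antisense
```

A "contained, antisense" result corresponds to the situation reported for hsa-miR-220; a "contained, sense" result would correspond to hsa-miR-492.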
Journal of Heredity 2006:97(2):186–190
Advance Access publication February 17, 2006
© The American Genetic Association. 2006. All rights reserved.
For permissions, please email: firstname.lastname@example.org.
Hsa-miR-220 sequences from several nonhuman primates deposited in GenBank (see Berezikov et al. 2005) were then used to design polymerase chain reaction (PCR) primers to amplify and sequence this locus in additional nonhuman primate species.
for hsa-miR-492, but the locus was found in the chimpanzee, orangutan, and rhesus macaque genome assemblies, and these sequences were used to design PCR primers to amplify and sequence this locus in additional nonhuman primate species.
All primers were designed, and assessed for melting temperature and secondary structure, using PRIMERQUEST online software (available as part of the Integrated DNA Technologies (IDT) SCITOOLS software, www.idtdna.com/scitools/scitools.aspx). PCR amplifications were carried out against a genomic DNA panel composed of human (Homo sapiens), chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla), orangutan (Pongo pygmaeus), siamang (Hylobates syndactylus), vervet monkey (Chlorocebus aethiops), olive baboon (Papio anubis), Assamese macaque (Macaca assamensis), pigtail macaque (Macaca nemestrina), rhesus macaque (Macaca mulatta), squirrel monkey (Saimiri boliviensis), white-fronted capuchin (Cebus albifrons), and brown lemur (Eulemur fulvus).
Amplicons were sequenced in both directions on an Applied Biosystems Model 310 automated fluorescence DNA sequencer. Species for which miR-220 sequences were not previously deposited in GenBank by Berezikov et al. (2005) are orangutan (DQ088046), siamang (DQ088047), olive baboon (DQ088048), vervet monkey (DQ088049), and Assamese macaque
in GenBank for gorilla (DQ289545), siamang (DQ289547), baboon (DQ289550), vervet monkey (DQ289548), and Assamese macaque (DQ289549). Comparative miR-492 sequences for chimpanzee, orangutan, and rhesus macaque were obtained via BLAST searches of the National Center for Biotechnology Information (NCBI) and ENSEMBL databases.
LOC402422, referred to herein as TUBB5Ψ, lies in Xq25 about 3 kb 3′ of another processed pseudogene (NDUFA4Ψ, AL030996) and 36.7 kb 5′ of the gene encoding the transcription/export complex member THOC2 (AL030996, NM_020449) (Figure 1). REPEATMASKER (http://www.repeatmasker.org) shows that the region be-
(94%) of repetitive sequences, including Alu and L1 elements, and most (68.6%) of the region between TUBB5Ψ and THOC2 as well. In particular, the 10-kb region immediately 3′ is almost entirely (98%) composed of L1 elements. Alignment of the LOC402422 sequence with the mRNA of human TUBB5 reveals a 7.9% sequence divergence (p = .079) com-
along with a total of 17 stop codons, of which 11 are in-frame. A 12-base insertion site repeat (TTAATTAA-TAG at the 5′ end and TTAATAAAATAG at the 3′ end) flanks the TUBB5Ψ sequence.
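The two measurements reported for TUBB5Ψ, the proportion of differing sites (p) and the number of stop codons in the parent reading frame, can be sketched in Python as follows. This assumes the pseudogene and the parent mRNA are already aligned, that columns with a gap in either sequence are excluded from p, and that frame 0 corresponds to the parent's reading frame; the helper names and toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped, so indels
    do not contribute to p.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        diffs += a != b
    return diffs / compared


def in_frame_stops(seq: str, frame: int = 0) -> list[int]:
    """0-based positions (gaps removed) of stop codons read in one frame."""
    s = seq.replace("-", "").upper()
    return [i for i in range(frame, len(s) - 2, 3) if s[i:i + 3] in STOP_CODONS]


# HYPOTHETICAL toy sequences, aligned from the parent's start codon.
parent = "ATGCGTCAAGGA"
pseudo = "ATGCGTTAAGGA"  # one C->T change creates TAA in frame 0
print(p_distance(parent, pseudo))  # 0.08333333333333333
print(in_frame_stops(pseudo))      # [6]
```

Summing `len(in_frame_stops(seq, f))` over frames 0, 1, and 2 is one way to tally stop codons across all three forward frames, under the assumption that this is what a "total" count means.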
LOC160313, referred to herein as KRT19Ψ, lies in a sparsely
3′ of DAP13 (Figure 1). REPEATMASKER shows that the region immediately surrounding KRT19Ψ is also repeat rich
the 5′ untranslated region (UTR) and start codon. Within the remaining aligned sequence, there is a 10.8% sequence divergence (p = .108) composed of 122 nucleotide changes (95 transitions and 27 transversions) and nine indels. These changes introduce 22 stop codons into the sequence. The large 5′ deletion appears to have occurred at the time of retroposition, as the remaining sequence is flanked by a 15-base insertion site repeat (AGAAAAGTTCCAGTC). Thus, these loci display all the hallmarks of classical processed pseudogenes (Devor and Moffat-Wilson 2003).
Figure 1. Map of the human genomic regions containing hsa-miR-220 and hsa-miR-492. Distances to the nearest annotated gene or pseudogene are shown along with flanking repeats indicated by REPEATMASKER.
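The split of KRT19Ψ changes into transitions and transversions follows a simple rule: a transition exchanges a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T), and any other substitution is a transversion. The Python sketch below applies that rule to a pairwise DNA alignment. As a simplifying assumption, it counts each run of consecutive gap columns as one indel event; the function name and toy alignment are hypothetical.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitutions(seq_a: str, seq_b: str) -> dict[str, int]:
    """Count transitions, transversions, and indel events in an alignment.

    Assumes DNA bases A/C/G/T and '-' for gaps. A run of consecutive gap
    columns is counted as a single indel event.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    counts = {"transitions": 0, "transversions": 0, "indels": 0}
    in_gap = False
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            if not in_gap:
                counts["indels"] += 1
            in_gap = True
            continue
        in_gap = False
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            counts["transitions"] += 1
        else:
            counts["transversions"] += 1
    return counts


# HYPOTHETICAL toy alignment: A/G is a transition, A/T a transversion.
print(classify_substitutions("AGCT-TACG", "GGCTATTCG"))
# {'transitions': 1, 'transversions': 1, 'indels': 1}
```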
Using the age estimation expression $T = K/2r$, where $r$ is taken to be $1.5 \times 10^{-9}$ sequence changes per position per year (Li 1997) and $K$ is the Jukes-Cantor corrected divergence $K = -\tfrac{3}{4}\ln\left(1 - \tfrac{4}{3}p\right)$, with $p$ the observed proportion of differing sites (Jukes and Cantor 1969), an estimated age of 27.8 million years is obtained for TUBB5Ψ and 38.9 million years for KRT19Ψ. This method of age estimation must be considered approximate, because there are numerous instances of the volatile CpG dinucleotide in each sequence (cf. Labuda and Striker 1989) and because of the large deletion in the KRT19Ψ (miR-492) sequence. Nevertheless, subsequent PCR amplifications are consistent with these estimates, as only human, ape, and OWM samples yielded amplicons containing miR-220 or miR-492. Therefore, both loci were reverse transcribed and retroposed into the primate genome after the divergence of OWM and new world monkeys, an event estimated to have taken place between 35 and 40 million years ago, but prior to the divergence of OWM and apes, an event estimated to have taken place between 20 and 25 million years ago (Szalay and Delson 1979).
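The age estimates can be reproduced directly from the divergence values. This Python sketch implements the Jukes-Cantor correction and $T = K/2r$ with $r = 1.5 \times 10^{-9}$ as stated in the text; the function names are illustrative, and the p values are the 7.9% and 10.8% divergences reported above.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance K = -(3/4) ln(1 - 4p/3)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def retroposition_age(p: float, rate: float = 1.5e-9) -> float:
    """Age in years from T = K / 2r, with r in changes per site per year."""
    return jukes_cantor(p) / (2 * rate)


for name, p in [("TUBB5Ψ", 0.079), ("KRT19Ψ", 0.108)]:
    print(f"{name}: {retroposition_age(p) / 1e6:.1f} million years")
# TUBB5Ψ: 27.8 million years
# KRT19Ψ: 38.9 million years
```

Both values fall between the 20–25 and 35–40 million-year divergence windows cited above, although the KRT19Ψ estimate lies close to the older boundary.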
Precursor sequences (pre-miRNAs) of miR-220 in 11 primate species and of the orthologous reverse complement of human TUBB5 are shown in Figure 2. Also shown in Figure 2 are pre-miR-492 sequences for nine primate species and the orthologous region of human KRT19. miRNAs are transcribed as a primary RNA transcript (pri-miRNA) up to several kilobases in length. Within this is the pre-miRNA, usually 80–110 bases long, which forms a stable hairpin. This hairpin is excised from the pri-miRNA by a complex containing the enzyme DROSHA and its cofactor DGCR8 (known as PASHA in Drosophila melanogaster and C. elegans). The hairpin is exported from the nucleus as a double-stranded RNA by exportin-5; in the cytoplasm, a mature miRNA 21–23 bases long is then processed from it by the same Dicer/RISC complex known to be responsible for RNA interference (cf. Bartel 2004; Berezikov and Plasterk 2005). It is the mature miRNA
purifying selection, particularly in the mature miRNA sequence, such that even very ancient loci display little variation, even among distantly related families (Floyd and Bowman 2004; Pasquinelli et al. 2000). The pre-miRNA sequence alignments presented in Figure 2 reveal a number of nucleotide changes throughout both miR-220 and miR-492. The usual pattern of interspecies nucleotide variation in miRNAs is marked by a high level of conservation in the mature miRNA and its complement, a lower level of conservation in both the stem and loop sequences, and a further decrease in conservation in the sequences flanking the pre-miRNA. This is the “camel-shaped” conservation profile described by Berezikov and Plasterk (2005). Berezikov et al. (2005) point out that nucleotide changes in pre-miRNAs can occur at unpaired sites or at paired sites in the hairpin. At paired sites, a nucleotide substitution will either disrupt the pairing or not (e.g., their example of G::U changing to A::U, which preserves pairing).
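The three substitution types (unpaired site, paired site with pairing preserved, paired site with pairing disrupted) can be assigned automatically once a hairpin structure is known. The Python sketch below assumes the structure is written in dot-bracket notation, as produced by RNA folding programs, and counts Watson-Crick pairs plus G·U wobble pairs as valid. The helper names and the toy hairpin are hypothetical.

```python
# Base pairs counted as valid: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_table(structure: str) -> dict[int, int]:
    """Map each paired position to its partner, from dot-bracket notation."""
    stack: list[int] = []
    partners: dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partners[i], partners[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partners


def classify_substitution(seq: str, structure: str, pos: int, new_base: str) -> str:
    """Classify replacing seq[pos] (0-based) with new_base."""
    partners = pair_table(structure)
    if pos not in partners:
        return "unpaired site"
    partner_base = seq[partners[pos]]
    if (new_base, partner_base) in PAIRS:
        return "paired site, pairing preserved"
    return "paired site, pairing disrupted"


# HYPOTHETICAL toy hairpin: position 2 (G) pairs with position 6 (U).
seq, structure = "GGGAAAUCC", "(((...)))"
print(classify_substitution(seq, structure, 2, "A"))  # G::U -> A::U
# paired site, pairing preserved
```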
Figure 2. Alignment of primate pre-miR-220 and pre-miR-492 sequences. Stem and loop regions are under the solid lines, while the mature miRNA and its stem complement are under the dotted lines. The mature miRNA is indicated in italics. Conserved nucleotides are indicated as dots, and deletions are shown as dashes. Also shown is the corresponding region from the parent gene of each pseudogene.
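Alignments written in the Figure 2 convention (dots for nucleotides identical to the reference, dashes for deletions) lend themselves to a per-column conservation profile. The Python sketch below assumes the first row (human) is the reference and computes, for each column, the fraction of other rows that match it; averaging this profile over the mature, stem, loop, and flanking regions is one way to look for the "camel-shaped" pattern. The function name and toy rows are hypothetical.

```python
def conservation_profile(reference: str, others: list[str]) -> list[float]:
    """Per-column fraction of non-reference rows that match the reference.

    Rows follow the Figure 2 convention: '.' means identical to the
    reference, '-' a deletion, and any other letter a substitution.
    """
    if any(len(row) != len(reference) for row in others):
        raise ValueError("all rows must have the reference length")
    profile = []
    for col, ref_base in enumerate(reference):
        matches = sum(1 for row in others if row[col] in (".", ref_base))
        profile.append(matches / len(others))
    return profile


# HYPOTHETICAL toy alignment: one substitution and one deletion.
print(conservation_profile("ACGU", ["..A.", ".-..", "...."]))
# [1.0, 0.6666666666666666, 0.6666666666666666, 1.0]
```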
The nucleotide substitutions seen in Figure 2 represent all three types, including several in the mature miRNA, particularly in miR-220. To assess the effects of the observed sequence variation on hairpin structure and thermodynamic stability, pre-miRNA transcripts were evaluated using MFOLD (Zuker 2003; available online in IDT SCITOOLS). Each transcript was evaluated as a linear RNA sequence at 37°C. Results of this analysis are shown in Table 1. Hairpin stability is measured by the thermodynamic parameter ΔG, the change in Gibbs free energy in kilocalories per mole. The expression $\Delta G = \Delta H - T\Delta S$, where ΔH is the change in enthalpy (the heat exchanged between the system and its environment), ΔS is the change in entropy (which reflects the ordering of the system as it folds), and T is the absolute temperature in kelvins (°C + 273.15), indicates the stability of a hairpin structure at a given temperature. The more negative the value of ΔG, the more stable the hairpin. In both miR-220 and miR-492, nucleotide differences relative to the human sequence are seen to have a negative impact on thermodynamic stability; that is, the maximum ΔG becomes less negative. However, in only one case, that of miR-220 in gorilla, was the hairpin structure itself significantly altered. These results tend to support the view of Berezikov et al. (2005) that there is selective pressure on pre-miRNA secondary structure, but that some amount of structural change is tolerated.
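The free-energy relation can be illustrated numerically. This Python sketch evaluates $\Delta G = \Delta H - T\Delta S$ at 37°C with ΔH in kcal/mol and ΔS in kcal/(mol·K). The input values are hypothetical and are not taken from Table 1; MFOLD itself derives ΔG from nearest-neighbor energy parameters rather than from a single ΔH/ΔS pair.

```python
def gibbs_free_energy(delta_h: float, delta_s: float, temp_c: float = 37.0) -> float:
    """Return ΔG = ΔH - TΔS in kcal/mol.

    delta_h: enthalpy change in kcal/mol.
    delta_s: entropy change in kcal/(mol*K).
    temp_c:  temperature in degrees Celsius, converted to kelvins.
    """
    temp_k = temp_c + 273.15
    return delta_h - temp_k * delta_s


# HYPOTHETICAL values for illustration only (not from Table 1).
reference = gibbs_free_energy(delta_h=-100.0, delta_s=-0.28)
variant = gibbs_free_energy(delta_h=-95.0, delta_s=-0.28)
print(f"{reference:.2f} kcal/mol")  # -13.16 kcal/mol
print(f"{variant:.2f} kcal/mol")    # -8.16 kcal/mol
```

Here the hypothetical variant's ΔG is less negative than the reference, which is the kind of reduced stability described for the nonhuman sequences.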
The miRNAs miR-220 and miR-492 are unique to primates, specifically to OWM, apes, and humans. Both were found to lie within processed pseudogenes estimated to have been created approximately 28 and 39 million years ago, respectively. Pre-miR-220 and pre-miR-492 sequences were obtained for several primate species representing African and Asian OWM, African and Asian apes, and humans. These pre-miRNA sequences display a number of sequence variants, including a total of seven variants within the mature miRNA itself. However, while these changes do impact hairpin stability, they do not affect
As with the vast majority of miRNAs, the specific regulatory targets of miR-220 and miR-492 are unknown. However, there is evidence that these loci are being transcribed, at least in the human genome (Bentwich et al. 2005; Lim et al. 2003). Their transcriptional status in other primates remains to be determined.
The observation of miRNAs evolving from inside processed pseudogenes raises the question of whether such a mechanism might explain the origin of at least some other miRNAs that are not clearly due to duplications. It has already been demonstrated that one subset of miRNAs is de-
repeat features (Smalheizer and Torvik 2005). Several features of processed pseudogenes make them potential candidates as miRNA antecedents (Devor and Moffat-Wilson 2003). First, they are reasonably common in many genomes. Further, the genes from which they arise are most often those that are suitable candidates for miRNA regulation, such as housekeeping genes and other genes expressed at fairly high levels. Second, while not essential for miRNA formation, processed pseudogenes are created from reverse-transcribed mRNAs, which usually results in the presence of an intact sequence from 5′ UTR to 3′ UTR. Thus, any resulting hairpin structure would be guaran-
transcript. Finally, they are almost always free of selection pressure. This would permit changes to accumulate freely in the sequence.
There are two different ways to approach the question of the potential role of processed pseudogenes as miRNA incubators. The most straightforward is simply to look. Using the chromosome coordinates listed in RELEASE 7.0 of miRBase, the genomic context of more than 300 human miRNAs was evaluated. About 40% of these loci were located in introns and the remainder in intergenic space. However, only the two loci reported here were found within an annotated processed pseudogene. This is not to say that more such loci will not be found, as estimates of the ultimate number of miRNAs in the human genome as high as 1,000 have been put forward.
Table 1. Maximum ΔG values for miRNA hairpin structures for various primate species
Homo sapiens (Hsa)
Pan troglodytes (Ptr)
Gorilla gorilla (Ggo)
Pongo pygmaeus (Ppy)
Hylobates syndactylus (Hsy)
Chlorocebus aethiops (Cae)
Papio anubis (Pan)
Macaca mulatta (Mnl)
Ψ-source gene ortholog(a)
(a) For miR-220 this is the reverse transcript of the orthologous region of TUBB5, and for miR-492 this is the direct ortholog from KRT19.
The apparent rarity, at least for now, of hsa-miR-220 and hsa-miR-492 leads to the second approach to answering the question: how likely is it that a processed pseudogene will contain a hairpin structure suitable for forming a miRNA? The preliminary answer is that it is very likely. In silico RNA transcripts from 14 human processed pseudogenes, selected solely because they were about the same size as TUBB5Ψ and KRT19Ψ (2,302 and 1,153 bp, respectively), were submitted to MFOLD analysis, and every one presented one or more hairpin structures suitable as pre-miRNAs (i.e., 70–110 contiguous bases with an estimated ΔG at least as negative as −30.0 kcal/mol). If it is so easy for processed pseudogene sequences to contain potential miRNA hairpins, why are such miRNAs not more common? The answer lies in the fact that miRNAs must be transcribed, and processed pseudogenes do not carry their own promoters. Thus, hsa-miR-220 and hsa-miR-492 not only possessed an appropriate hairpin structure but also were fortuitously retroposed to a position where a cis-acting RNA polymerase II transcription site (cf. Cai et al. 2004; Lee et al. 2004) was available within a reasonable distance. While the precise location of these sites must await identification of the pri-miRNA transcripts for both loci, a PROMOTER 2.0 (Knudsen 1999) scan of some 10 kb of upstream human genomic sequence did indicate that several candidate transcription sites are present.
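The screening criteria applied to the 14 processed pseudogenes can be expressed as a simple filter over predicted hairpins. The Python sketch below assumes hairpins have already been predicted by a folding program and interprets the threshold as ΔG ≤ −30.0 kcal/mol (at least as stable as −30.0); the `Hairpin` type, the function name, and the toy hairpins are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hairpin:
    start: int      # 1-based start within the transcript
    end: int        # 1-based inclusive end
    delta_g: float  # predicted free energy in kcal/mol

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def pre_mirna_candidates(hairpins, min_len=70, max_len=110, dg_cutoff=-30.0):
    """Keep hairpins of 70-110 contiguous bases with ΔG <= dg_cutoff."""
    return [
        h for h in hairpins
        if min_len <= h.length <= max_len and h.delta_g <= dg_cutoff
    ]


# HYPOTHETICAL predicted hairpins for a single transcript.
toy = [
    Hairpin(1, 90, -35.2),     # passes: 90 bases, stable enough
    Hairpin(200, 260, -40.0),  # fails: only 61 bases
    Hairpin(400, 499, -22.5),  # fails: ΔG not negative enough
    Hairpin(600, 709, -30.0),  # passes: 110 bases, ΔG at the cutoff
]
print([(h.start, h.end) for h in pre_mirna_candidates(toy)])
# [(1, 90), (600, 709)]
```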
Finally, accepting for the moment that processed pseudogenes and juxtaposed L2 or other repeats will prove to be rare origins for new miRNAs, the question remains as to the ultimate origin of this important class of gene expression regulators. Allen et al. (2004) offered a tantalizing glimpse from Arabidopsis of miRNAs evolving from inverted duplications of what ultimately becomes the target site for regulation, but this, too, appears to be a rare occurrence. On the other hand, perhaps the identification of three very different, albeit rare, mechanisms for miRNA origins is, in fact, the answer. miRNAs may have evolved opportunistically, taking advantage of cellular mechanisms that were already present, such as the Dicer/RISC complex, so that there is no one ultimate source for these loci. This possibility is not out of the question, and it could explain why there are no consistent features among miRNAs apart from the fact that all of them have a pre-miRNA hairpin of some sort.
References
Allen E, Xie Z, Gustafson AM, Sung GH, Spatafora JW, and Carrington JC, 2004. Evolution of microRNA genes by inverted duplication of target gene sequences in Arabidopsis thaliana. Nat Genet 36:1282–1290.
Ambros V, Bartel B, Bartel DP, Burge CB, Carrington JC, Chen X, Dreyfuss G, Eddy SR, Griffiths-Jones S, Marshall M, Matzke M, Ruvkun G, and Tuschl T, 2003. A uniform system for microRNA annotation. RNA 9:277–279.
Bartel DP, 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116:281–297.
Bentwich I, Avniel A, Karov Y, Aharonov R, Gilad S, Barad O, Barzilai A, Einat P, Einav U, Meiri E, Sharon E, Spector Y, and Bentwich Z, 2005. Identification of hundreds of conserved and nonconserved human microRNAs. Nat Genet 37:766–770.
Berezikov E, Guryev V, van de Belt J, Wienholds E, Plasterk RHA, and Cuppen E, 2005. Phylogenetic shadowing and computational identification of human microRNA genes. Cell 120:21–24.
Berezikov E and Plasterk RHA, 2005. Camels and zebrafish, viruses and cancer: a microRNA update. Hum Mol Genet 14(2):R183–R190.
Cai X, Hagedorn CH, and Cullen BR, 2004. Human microRNAs are processed from capped, polyadenylated transcripts that can also function as mRNAs. RNA 10:1957–1966.
Devor EJ and Moffat-Wilson KA, 2003. Molecular and temporal characteristics of human retropseudogenes. Hum Biol 75:661–672.
Floyd SK and Bowman JL, 2004. Ancient microRNA target sequences in plants. Nature 428:485–486.
Griffiths-Jones S, 2004. The microRNA registry. Nucleic Acids Res 32:
Jukes TH and Cantor CR, 1969. Evolution of protein molecules. In: Evolution of protein molecules (Munro HN, ed). New York: Academic Press;
Knudsen S, 1999. Promoter2.0: for the recognition of PolII promoter sequences. Bioinformatics 15:356–361.
Labuda D and Striker G, 1989. Sequence conservation in Alu evolution. Nucleic Acids Res 17:2477–2491.
Lee RC, Feinbaum RL, and Ambros V, 1993. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14.
Lee Y, Kim M, Han J, Yeom KH, Lee S, Baek SH, and Kim VN, 2004. MicroRNA genes are transcribed by RNA polymerase II. EMBO J 23:
Li W-H, 1997. Molecular evolution. Sunderland, MA: Sinauer.
Lim LP, Glasner ME, Yekta S, Burge CB, and Bartel DP, 2003. Vertebrate microRNA genes. Science 299:1540.
Murchison EP and Hannon GJ, 2004. miRNAs on the move: miRNA biogenesis and the RNAi machinery. Curr Opin Cell Biol 16:223–229.
Pasquinelli AE, Hunter S, and Bracht J, 2005. MicroRNAs: a developing story. Curr Opin Genet Dev 15:200–205.
Pasquinelli AE, Reinhart BJ, Slack F, Martindale MQ, Kuroda MI, Maller B, Finnerty J, Corbo J, Levine M, Leahy P, Davidson E, and Ruvkun G, 2000. Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408:86–89.
Pfeffer S, Zavolan M, Grasser FA, Chien M, Russo JJ, Ju J, John B, Enright AJ, Marks D, Sander C, and Tuschl T, 2004. Identification of virus-encoded microRNAs. Science 304:734–736.
Smalheizer NR and Torvik VI, 2005. Mammalian microRNAs derived from genomic repeats. Trends Genet 21:322–326.
Szalay FS and Delson E, 1979. Evolutionary history of the primates. New York: Academic Press.
Tanzer A, Amemiya CT, Kim C-B, and Stadler PF, 2005. Evolution of microRNAs located within the Hox gene clusters. J Exp Zool 304B:1–10.
Tanzer A and Stadler PF, 2004. Molecular evolution of a microRNA cluster. J Mol Biol 339:327–335.
Zuker M, 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31:3406–3415.
Received December 5, 2005
Accepted January 9, 2006
Corresponding Editor: William Modi