A Genome-Wide Map of Conserved
MicroRNA Targets in C. elegans
Sabbi Lall, Dominic Grün, Azra Krek, Kevin Chen,
Yi-Lu Wang, Colin N. Dewey, Pranidhi Sood,
Teresa Colombo, Nicolas Bray, Philip MacMenamin,
Huey-Ling Kao, Kristin C. Gunsalus, Lior Pachter,
Fabio Piano, and Nikolaus Rajewsky
Supplemental Results and Discussion
New MicroRNA Target Predictions across Six
We used the new version of the PicTar algorithm to
seven Drosophila species [S1]. We computed new target
predictions based on our UCSC 3′ UTR alignments at two
different levels of conservation, corresponding to modified
versions of the settings S1 and S3 (described in [S1]). The
input data, i.e., the data sets of conserved microRNAs and
the UTR alignments as well as
the cohorts of randomized microRNAs, remained un-
changed compared to the previous version. The modi-
fied settings used for the new predictions are termed
S1.2 and S3.2, respectively. Setting S1.2 requires con-
servation of target sites between D. melanogaster,
D. yakuba, D. ananassae, and D. pseudoobscura. How-
ary distance to the reference species D. melanogaster,
we consider an anchor site conserved when a nucleus
is found in only one of those species and the sequence
of the entire UTR is missing in the third species. Setting
S3.2 requires an anchor site to be additionally con-
served in D. virilis and D. mojavensis, and these two
opposed to the first version of PicTar, we do not mask
repeats in either case in order to maintain the highest
possible sensitivity. Hence, the only remaining difference
between settings S1.2 and S3.2 is the level of
conservation. All additional modifications with
respect to free energy filtering and the treatment of
imperfect nuclei were applied as explained in the Exper-
imental Procedures section. The conservation score
was determined by the number of occurrences of a par-
ticular nucleus in anchor sites conserved in D. pseu-
doobscura (for S1.2) and D. mojavensis (for S3.2),
divided by the number of occurrences in D. mela-
nogaster. For the setting S1.2, we predict on average
105 target genes per microRNA, compared to 95 for
the setting S1. We computed the specificity by averag-
ing the ratio of the number of predicted targets for real
and randomized microRNAs over four cohorts and ob-
tained a signal-to-noise ratio of 2.5 (compared to 2.3
For the setting S3.2, PicTar computes on average 49
target genes per microRNA at a signal-to-noise ratio of
3.6. Compared to the first version of PicTar, this means
an increase of ~5 predicted target genes per microRNA
at an enhanced signal-to-noise ratio (3.2 for setting S3).
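The specificity estimate described above can be illustrated with a minimal sketch. It assumes that the ratio of real to randomized target counts is formed per cohort and then averaged (the text does not state whether ratios or counts are averaged first). The function name and all counts below are hypothetical and are not taken from the PicTar data.

```python
from statistics import mean, stdev


def signal_to_noise(real_count, cohort_counts):
    """Average, over randomized cohorts, of real / randomized target counts.

    real_count    -- average number of predicted targets per real microRNA
    cohort_counts -- the same average for each cohort of randomized microRNAs
    Returns (mean ratio, sample standard deviation of the ratios).
    """
    ratios = [real_count / count for count in cohort_counts]
    spread = stdev(ratios) if len(ratios) > 1 else 0.0
    return mean(ratios), spread


# Hypothetical numbers, chosen only to illustrate the calculation.
real_avg = 105.0
cohort_avgs = [40.0, 42.0, 44.0, 42.0]  # four randomized cohorts
snr, spread = signal_to_noise(real_avg, cohort_avgs)
print(f"signal-to-noise = {snr:.2f} +/- {spread:.2f}")
```

The standard deviation across cohorts is returned as well because the score-dependent plots report it as error bars.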
Compared to settings S1 and S3, respectively, the
specificity of the new PicTar predictions depends
much more strongly on a PicTar score cutoff (shown for S1.2
in Figure S2). As can be seen from Figure S2, applying
a score cutoff allows us to extract highly specific target
predictions without losing too many predicted targets.
For setting S1.2 (S3.2), we obtained a linearly increasing
6.5 (15) at a score cutoff of 2.5. The sensitivity drops
linearly from 105 (49) predicted target genes per
microRNA with no score cutoff to 45 (15) genes per mi-
ifications result in a less restrictive treatment of imper-
fect nuclei, the strong requirement of compensatory
binding of the microRNA 3′ end keeps their contribution
very low. Among all nuclei in anchor sites, imperfect nu-
and 0.5% of all cases with setting S3.2. The fraction of
genes with at least one predicted microRNA target site
We predict that 29% (16%) of all 9958 unique genes are
predicted targets of microRNAs for setting S1.2 (S3.2)
as opposed to 27% (15%) for setting S1 (S3). To assess
the influence of the PicTar modifications on the ranks of
microRNA-target gene pairs, we plotted all PicTar ranks
new version (Figure S3). Whenever a gene is a predicted
target of a particular microRNA for one PicTar version,
but not contained in the target list of the other, it is as-
signed rank zero for the version missing this prediction.
Thus, all newly predicted targets of the new version re-
side on the y axis, while all predictions present in the old
new version, we lose none of our previously predicted
targets, while the number of predicted targets grows
by ~10% in the new version. For both settings, the plot
be explained by a shift to higher ranks due to the inser-
tion of new predictions into the list of target genes.
Because the new scoring scheme assigns a distinct
probability to each possible nucleus of a particular mi-
croRNA, many ranks are observed to change signifi-
cantly. To assess the degree of similarity between both
versions, we asked (for each microRNA) for how many
of the predicted targets the difference between the old
and the new rank is less than 25% of the number of tar-
gets predicted for the microRNA under consideration.
We found that this is the case for roughly 70% of all pre-
dictions with setting S1.2 and for 74% with setting S3.2.
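The rank-stability measure described above can be sketched as follows. This is only an illustration under stated assumptions: the length of the target list is taken from the new version, and a gene missing from the old list is given rank zero (the convention the text uses for the scatterplot; whether it also applies to this measure is not stated). The function name and gene identifiers are hypothetical.

```python
def stable_rank_fraction(old_ranks, new_ranks, tolerance=0.25):
    """Fraction of new predictions for one microRNA whose rank moved by
    less than tolerance * (number of targets predicted for that microRNA)."""
    n_targets = len(new_ranks)  # assumption: list length of the new version
    if n_targets == 0:
        return 0.0
    limit = tolerance * n_targets
    stable = 0
    for gene, new_rank in new_ranks.items():
        # Assumption: genes absent from the old list are treated as rank zero.
        old_rank = old_ranks.get(gene, 0)
        if abs(new_rank - old_rank) < limit:
            stable += 1
    return stable / n_targets


# Hypothetical target lists for a single microRNA (gene -> rank).
new_ranks = {f"g{i}": i for i in range(1, 9)}
old_ranks = {"g1": 1, "g2": 3, "g3": 2, "g4": 7, "g5": 5, "g6": 4, "g7": 6}
fraction = stable_rank_fraction(old_ranks, new_ranks)
print(fraction)
```

Averaging this fraction over all microRNAs, or pooling all microRNA-target pairs, gives a single summary value; the text does not say which of the two was used.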
We conclude that the modifications of PicTar strongly
gorithm virtually without discarding any of the previous
Table S1. PicTar Identifies Previously Validated miRNA Targets in C. elegans
Target Sequences in C. elegansh
Yes YesYes Yesz
Yes YesYes Yes*,** TTTTATACAACCGTTCTACACTCA-27nt
Yes Yes* [ACCCATCTTATCCGAACTAAA..
Yes* Site 1: CTCCCAGTTAGCAACGGGCCC
Site 2: CAATCTGTTGTATGGGTAC
Yes YesYes Yesy,zYesYes1z
The table illustrates the types of experimental evidence that support miRNA/target interactions in C. elegans and the PicTar target finding scores
associated with these genes.
(a) A genetic interaction has been shown between the miRNA and target.
(b) Interaction evidence includes gel shift experiments that show the specificity of binding between the miRNA and target.
(c) Experimental evidence indicates that miRNAs and their suggested targets are expressed in a reciprocal manner.
(e) The entire target 3′ UTR mediates downregulation of a reporter.
(f) The predicted miRNA binding sequence is sufficient to mediate regulation in the context of a naive 3′ UTR.
(g) The miRNA binding sequence is necessary for regulation, as shown via deletion (g1) or point mutation (g2) analysis. An asterisk indicates point
mutation analysis of the target site, and a double asterisk indicates further analysis by compensatory mutation in the miRNA.
(h) Regions in target 3′ UTR that contribute to differential downregulation. The sequence is required for regulation as shown by point mutations
(required sequence in italics) or by deletion (sequence bounding deleted region shown in brackets). In the case of lin-14, hbl-1, and lin-28,
the brackets indicate boundaries of the 3′ UTRs as defined by the respective publications. In the case of daf-12, the regions identified are
subregions of the 3′ UTR.
(i) Rank and scores as calculated by PicTar. Those not predicted as targets are denoted "NP."
of the lin-4 sites in hbl-1, the input 3′ UTR lengths differ significantly. In the case of miR-273 sites in die-1, a perfect 6 nt nucleus is conserved
(y) http://pictar.bio.nyu.edu/data/nematode/single microRNA target predictions/singles_cel_let_7/20161/20161.html
(z) http://pictar.bio.nyu.edu/data/nematode/single microRNA target predictions/singles_cel_lin_4/19900/19900.html
(aa) http://pictar.bio.nyu.edu/data/nematode/single microRNA target predictions/singles_cel_let_7/19648/19648.html
(bb) http://pictar.bio.nyu.edu/data/nematode/single microRNA target predictions/singles_cel_lsy_6/3257/3257.html
(cc) http://pictar.bio.nyu.edu/data/nematode/single microRNA target predictions/singles_cel_lin_4/736/736.html
New MicroRNA Target Predictions across Six
By using the modified PicTar algorithm, we computed
new microRNA target predictions for vertebrates, based
on our previous data sets [S2]. The previous set of 3′ UTR
alignments served as input sequence. However, repeats
were no longer masked in order to increase the
sensitivity of the predictions. As in the first version, we
compute targets at two different levels of conservation.
The mammal predictions require conservation of anchor
sites between human, chimp, mouse, rat, and dog. The
vertebrate predictions require a nucleus to be addition-
ally conserved in chicken. Both human and chimp as
evolutionary level when compared to all species in-
cluded in the analysis. Hence, for both pairs of species
a nucleus in just one species was considered sufficient
if the sequence of the entire 3′ UTR is missing in the
other species. We maintained our sets of microRNAs
conserved in mammals and in vertebrates, respectively,
with the only change that we updated the mature micro-
RNA sequences with the current version provided by
Rfam (Release 7.0). To compute the signal-to-noise ra-
tio, we generated 3 cohorts for a subset of 23 unique mi-
croRNAs conserved in vertebrates and computed target
predictions for these randomized microRNAs with only
half of all 3′ UTRs. For mammals, we predict on average
320 target genes per microRNA at a signal-to-noise ratio
of 1.9. Dependence of the specificity and the sensitivity
on the PicTar score is shown in Figure S2. For high score
cutoffs, the large error bars are due to the small numbers
of predicted targets for randomized microRNAs. At
a score cutoff of 1.4, the signal-to-noise ratio equals
2.3, which corresponds to the signal-to-noise ratio of
the previous predictions for mammals without any score
cutoff. At this signal-to-noise ratio, we now predict on
average 300 target genes per microRNA, an increase
of 7% compared to the old predictions. Furthermore,
came much stronger for the new predictions. At a PicTar
score cutoff of 10, we still predict on average 50 target
genes per microRNA at a signal-to-noise ratio of 5.
Requiring conserved anchor sites in mammals and chicken
(vertebrates) yields on average 130 target genes per
microRNA at a signal-to-noise ratio of 3. The signal-to-noise
ratio of our previous predictions without any score
cutoff (3.6), we now recover at a cutoff of 0.1 with 124
predicted target genes per microRNA (increase of
more than 20% compared to the old version). At a score
cutoff of 2, we still predict on average 60 targets per mi-
croRNA at a signal-to-noise ratio of 8. With a score cut-
off ranging from 0 to 3, the signal-to-noise ratio in-
creases linearly from 3 to 10, while the sensitivity
decreases from 130 to 30 predicted target genes per microRNA.
The contribution of imperfect nuclei to conserved anchor
sites is again very minor. Among the target predictions for
mammals, imperfect nuclei make up 1.5% of all nuclei in
anchor sites, while this fraction rises to 2% when
conservation of anchor sites is also required in chicken.
The total number of genes predicted to be microRNA targets
increased as well compared to the previous version: for
mammals (vertebrates), we predict that 44% (16%) of all
17,450 unique genes have at least one conserved anchor site
compared to 39% (15%) in the old version. The increase in
sensitivity is on the one hand due to the relaxed conservation
requirement. In particular, the recent annotation of
the chimp genome used for the PicTar prediction still
contains many sequencing gaps and artificially diminished
the sensitivity in the first version. On the other
hand, leaving repeats unmasked already enhances the
sensitivity by approximately 2%. The difference between
the new and the old PicTar ranks of the targets
of each microRNA is again measured by the fraction of
predictions with a rank change of less than 25% of the
length of the target list of the respective microRNA.
Approximately 52% (65%) of all predicted microRNA-target
gene pairs conserved in mammals (vertebrates)
do not change their rank by more than 25% of the target
list length of the respective microRNA. Hence, the
PicTar improvements led to changes in the ranking of
target genes, in this case even more pronounced than
in the case of Drosophila target predictions. However,
basically none of the previously predicted targets were
lost while we predict many additional targets with

Table S2. Coverage of the Three Worm Genomes by the Orthology
Coverage of Genome
Coverage of Predicted
number of small scaffolds in its assembly and the fact that it is
approximately 50% larger than the C. elegans genome.

Table S3. Number of Alignments Containing Sequence for All Species
up to a Given Level of Conservation, Including All Transcript
Variants and Only Unique Genes
C. elegans/C. briggsae
C. elegans/C. briggsae/C. remanei

Table S4. Number of Aligned Nucleotides for Each Species, in the
Case of All Alternate Transcript Forms, and Unique Genes

Table S5. Numbers of Orthologous Segment Sets and Predicted
Exon Sets Determined by a 1-to-1-to-1 Orthology Map of the Three
C. elegans/C. briggsae
C. elegans/C. remanei
C. briggsae/C. remanei
C. elegans/C. briggsae/C. remanei 2835
Sets are classified by the subset of genomes that they contain.

Supplemental Experimental Procedures
Alignments of Orthologous Nematode Sequences
Whole-genome alignments of C. elegans (WormBase Release
WS120, March 2004), C. briggsae (cb25.agp8 assembly, July
2002), and C. remanei (Washington University, St. Louis, December
2004 assembly) were produced in a two-step approach that com-
bines orthology mapping with sequence alignment, as described
in these Supplemental Experimental Procedures and Figure S1. Fig-
ure S1 diagrams the process and its primary inputs and outputs.
First, exon annotations were obtained for all three genomes. For
the C. briggsae and C. remanei genomes, exon annotations were
produced by running the GENSCAN [S3], SNAP [S4], and geneid
[S5] gene prediction programs. Exon annotations for C. elegans
were obtained from the UCSC Genome Browser Web site [S6] and
consisted of the RefSeq [S7], WormBase, and Twinscan [S8] anno-
tation sets. Gene annotations from all sources were merged and fil-
tered to produce nonoverlapping coding exon sets for each ge-
nome. The amino acid sequences coded for by the exons in these
sets were then compared to each other with BLAT [S9]. The resulting
pairwise hits were used by the MERCATOR program
(http://bio.math.berkeley.edu/mercator/; C.N.D. and L.P., unpublished data)
to create a 1-to-1-to-1 orthology map of the genomes. Because
the C. remanei and C. briggsae genomes were not yet assembled
into chromosomes, comparative-based assemblies of both ge-
nomes were produced during the orthology mapping process.
This resulted in an assembly of C. briggsae with 488 contigs of
N50 size 1.59 Mb and an assembly of C. remanei with 9200 contigs
of N50 size 0.86 Mb. The final multiple whole-genome alignment of
C. elegans, C. briggsae, and C. remanei revealed 3145 orthologous
segment sets. Each segment set contained at most one region from
each of the three genomes, with all regions in a set being ortholo-
gous and colinear. Because the segment sets represented a 1-to-
1-to-1 orthology map among the genomes, any particular genomic
position was contained in at most one segment set.

Figure S1. Genome Alignment Process for C. elegans,
C. briggsae, and C. remanei
Data inputs, programs run, intermediate outputs,
and main outputs are shown in green,
blue, yellow, and red, respectively.

Figure S2. Score-Dependent Sensitivity and
Specificity of PicTar Target Predictions for
Drosophila and Mammals
Signal-to-noise ratio (specificity, solid line)
and average number of predicted target
genes (sensitivity, dashed line) as a function
of a PicTar score cutoff for the modified PicTar
algorithm in flies for setting S1.2 (A) and
in mammals, requiring conservation in human,
chimp, mouse, rat, and dog (B). The signal-to-noise
ratio was averaged over four
(three) cohorts of randomized microRNAs in
flies (mammals). The standard deviation is
indicated by error bars.

Figure S3. Comparison of PicTar Ranks for
the Original and Updated Drosophila Target
Scatterplot of target gene ranks of the previous
list of PicTar predictions against the PicTar
ranks extracted from the new prediction
sus S3 (B), respectively. Each dot represents
a particular microRNA-target gene pair. If
a microRNA-target gene pair is absent in the
predictions of one version, it is assigned
rank zero for the respective version.

The 2997 segment sets containing regions from
C. elegans implied that along
the C. elegans genome, there were 2992 breaks in colinearity with
the other worm genomes. This number is an underestimate of the
true number of breakpoints because the C. briggsae and C. remanei
genomes have not yet been assembled into chromosomes. Of the
3145 total orthologous segment sets, 90% were common to all spe-
cies. In addition, roughly 75% of the orthologous predicted exon
sets determined by the alignment process had exons from all three
species. The numbers of segments and exons common to all spe-
cies and to only strict subsets of the species are given in Table S5.
Overall coverage statistics for the orthology map are summarized
termined to have 1-to-1 orthologs in C. elegans with the orthologous
gene pairs determined by [S10]. More than 95% of the C. briggsae
orthologous predicted exons overlapped with C. briggsae gene an-
notations given by [S10]. From these overlaps, we calculated that
11,874 of the C. briggsae gene annotations have 1-to-1 orthologs
with C. elegans. This number is in close agreement with the 12,155
ortholog pairs identified by [S10]. However, the number of C. brigg-
sae genes identified as being orthologs by both approaches is
somewhat smaller (10,640), indicating that the two methods differ
in their orthology predictions for more ambiguous cases.
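The comparison of the two orthology calls can be sketched with plain set operations, followed by the overlap fractions implied by the counts reported above. The gene identifiers and the function name are hypothetical; only the three counts (11,874, 12,155, and 10,640) come from the text.

```python
def compare_ortholog_calls(ours, reference):
    """Summarize agreement between two sets of genes called 1-to-1 orthologs."""
    return {
        "shared": len(ours & reference),
        "only_ours": len(ours - reference),
        "only_reference": len(reference - ours),
    }


# Hypothetical gene identifiers, for illustration only.
ours = {"CBG00001", "CBG00002", "CBG00003", "CBG00005"}
reference = {"CBG00001", "CBG00002", "CBG00004", "CBG00005"}
summary = compare_ortholog_calls(ours, reference)
print(summary)

# Overlap fractions implied by the counts reported in the text.
n_ours, n_reference, n_both = 11874, 12155, 10640
frac_of_ours = n_both / n_ours            # share of our calls also in [S10]
frac_of_reference = n_both / n_reference  # share of [S10] calls also in ours
print(f"{frac_of_ours:.3f} {frac_of_reference:.3f}")
```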
The orthologous segment sets were used as input into a subse-
quent multiple alignment procedure based on the MAVID program
[S11]. MAVID is a constraint-based progressive alignment program
designed for large genome sequences. The exon constraints identi-
fied during the orthology mapping procedure were used in conjunc-
tion with the genome sequences from each block, in a series of 3145
multiple alignments. The C. remanei genome was first aligned to the
C. briggsae genome, followed by an alignment to C. elegans. We obtained
alignments of 3′ UTRs for 21,623 transcripts corresponding to
19,333 unique genes. The latter alignment set of unique genes was
used to compute the sensitivity and specificity of PicTar. Repeats
remained unmasked. For comparison with the orthology predictions
made by [S10], we used the files "stein_2003/gene_prediction/cb.hybrid.gff"
and "stein_2003/orthologs_and_orphans/orthologs.txt"
from their supporting information for C. briggsae gene annotations
and orthology predictions, respectively.
Modification of the PicTar Algorithm
a more elaborate way of assigning emission probabilities (for the
‘‘target-site’’ state in the HMM) to perfect and imperfect nuclei in or-
the increased evolutionary conservation of functional sites. For each
possible nucleus, we recorded the number of occurrences in anchor
sites conserved in all species of the search set for each species
separately. The ratio of these numbers for the species most distantly
related to the reference species and the total number of occurrences
(conserved and nonconserved instances) of the same nucleus in
the reference species itself defines a ‘‘conservation score’’ that re-
and imperfect sites are further subject to different modes of free en-
ergy filtering. The influence of modified free energy constraints on
the sensitivity and specificity was tested to derive appropriate rules.
We discarded the constraint that a duplex involving an imperfect
nucleus and its reverse complement must have a free energy equal to
or lower than that of the equivalent perfect nucleus and its reverse
complement. For perfect nuclei, the duplex of the entire microRNA
and its target sequence is no longer subject to free energy filtering.
For imperfect nuclei, we kept this restriction to model enhanced
compensatory base pairing to the microRNA's 3′ end. More precisely,
we compute the free energy of the microRNA/target interaction
and compare it to the free energy of the same microRNA hybridizing
to its reverse complement. In the previous PicTar version, we
required the former number to be at least 66% of the latter in the
case of imperfect nuclei. This threshold was reduced to 60% in order
to increase the sensitivity without a significant loss in specificity.
To further enhance the sensitivity, the conservation requirement was
alleviated. Whenever a group of species with similar evolutionary
distance to the reference species (C. elegans) is present in the
search set, but the whole 3′ UTR sequence of a particular species
is missing, each target site found in the sequences of the remaining
species is considered to be conserved in the whole group. In this
case, C. briggsae and C. remanei are considered to be at a similar
evolutionary distance from C. elegans. This strategy reduces the
influence of incomplete alignments, e.g., due to sequencing gaps. All
anchor sites that pass the conservation and the free energy filtering
step are input to probabilistic scoring by the HMM as previously
described, which computes the probability of a target site against
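The free energy rule for imperfect nuclei described in this section can be sketched as a simple ratio test. This is not the PicTar implementation: the function name is hypothetical, the free energies are assumed to be supplied by an external RNA folding tool as negative values in kcal/mol, and the threshold is read as a minimum fraction.

```python
def passes_free_energy_filter(dg_target, dg_perfect, perfect_nucleus,
                              min_fraction=0.60):
    """Return True if an anchor site survives whole-duplex free energy filtering.

    dg_target       -- free energy of the microRNA/target duplex (negative)
    dg_perfect      -- free energy of the microRNA bound to its reverse
                       complement (negative)
    perfect_nucleus -- True if the site contains a perfect nucleus
    min_fraction    -- required fraction of dg_perfect (0.60 in the modified
                       version, 0.66 in the previous version)
    """
    if perfect_nucleus:
        # Perfect nuclei are no longer subject to this filter.
        return True
    if dg_perfect >= 0:
        raise ValueError("dg_perfect is expected to be negative")
    return dg_target / dg_perfect >= min_fraction


# Hypothetical free energies in kcal/mol.
kept_new = passes_free_energy_filter(-19.0, -30.0, perfect_nucleus=False)
kept_old = passes_free_energy_filter(-19.0, -30.0, perfect_nucleus=False,
                                     min_fraction=0.66)
print(kept_new, kept_old)
```

In this example the site has about 63% of the reference free energy, so it passes the relaxed 60% threshold but would have been rejected at 66%.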
mic DNA. 5′ entry clones were generated by amplifying the intergenic
region (up to 2 kb) lying upstream of the start codon of a predicted
target gene by a 5′ primer and 3′ primer containing the
appropriate att recombinational cloning site (http://www.invitrogen.com).
Sequences for 3′ entry clones were obtained from C. elegans
genomic DNA with primers designed to amplify the entire predicted
3′ UTR of the predicted target gene and that contained the appropriate
flanking recombinational cloning sites. In this study, we focused
porter GFP (from Fire lab vector pPD95.67) was amplified so that
primers introduced flanking att recombinational cloning sites. Amplified
fragments were introduced into "entry clone" vectors according
to the Invitrogen protocol, and these were combined as instructed to give
a destination clone containing specific upstream sequence, reporter
GFP, and the given 3′ UTR.
let-7 Mutant Suppression Test
The ability of predicted targets to suppress the let-7(n2853) mutant
phenotype was assessed by dsRNA-mediated interference, as pre-
viously described [S12, S13]. Bacteria expressing dsRNA targeting
a given gene [S14] were grown in liquid culture overnight, and
dsRNA expression was induced overnight on plates containing 25
mg/ml ampicillin and 1 mM IPTG. L1 stage let-7(n2853) mutants
were placed on these plates and allowed to develop. The percent-
age of worms displaying the let-7 phenotype at the level of vulval
bursting was then scored.
GO Term Analysis
The GeneMerge program [S15] takes as input the set of genes of in-
terest (the ‘‘study set’’), a background set representing the universe
of genes from which the study set is drawn, and an association file
containing GO term annotations for each gene. It outputs a list
of GO terms that are overrepresented in the study set versus the
background set by using a hypergeometric distribution to compute
a P value followed by a Bonferroni correction to account for multiple
For our application, the study set is the set of predicted targets for
a given microRNA, and the background set is the set of all predicted
targets plus any other genes whose 3′ UTR contains at least one
conserved 7-mer (roughly speaking, this set represents all potential
targets of PicTar). For the association file, we used the hierarchical
biological process annotations from the GeneMerge distribution.
Because the Bonferroni correction is extremely conservative for
our application, we applied a relatively loose cutoff of 0.2 on the
Bonferroni-corrected P value. Although we did not explicitly correct
for multiple testing with regard to the different miRNA target sets,
Monte Carlo simulations indicate that our false positive rate for
any particular nonzero entry in the matrices is roughly one in four.
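The enrichment test described above (a hypergeometric P value followed by a Bonferroni correction and a cutoff of 0.2) can be sketched in a few lines. This is not the GeneMerge code: the function names, gene identifiers, and GO term labels are hypothetical, and the Bonferroni factor is assumed to be the number of terms with at least one gene in the study set.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enriched_terms(study, background, annotations, cutoff=0.2):
    """Return {term: Bonferroni-corrected P} for terms passing the cutoff."""
    background = set(background)
    study = set(study) & background
    raw = {}
    for term, genes in annotations.items():
        in_background = set(genes) & background
        k = len(in_background & study)
        if k > 0:  # assumption: only terms seen in the study set are tested
            raw[term] = hypergeom_tail(k, len(study), len(in_background),
                                       len(background))
    n_tests = len(raw)
    corrected = {t: min(1.0, p * n_tests) for t, p in raw.items()}
    return {t: p for t, p in corrected.items() if p <= cutoff}


# Hypothetical background of 20 genes and two GO terms.
background = {f"g{i}" for i in range(20)}
annotations = {
    "development": {f"g{i}" for i in range(5)},
    "transport": {f"g{i}" for i in range(10, 20)},
}
study = {"g0", "g1", "g2", "g3", "g10"}
result = enriched_terms(study, background, annotations)
print(result)
```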
We performed 2-way hierarchical clustering with the program MeV
[S16], with the Pearson correlation coefficient and average linkage
For the motif analysis, we computed the conservation score of
each 7-mer as the ratio of conserved occurrences in nematode 3′
UTRs to the total number of occurrences in C. elegans 3′ UTRs.
Based on the distribution of conservation scores, we computed
a Z score for each 7-mer, similar to [S17]. Of the 252 7-mers with Z
score > 3, which we call ‘‘highly conserved motifs,’’ 44 matched
either the first or second 7-mer of a known C. elegans microRNA,
a fraction consistent with similar data from vertebrate [S17] and fly
alignments (our unpublished data).
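The motif conservation score can be sketched as below. The conservation score follows the definition in the text; the Z score shown here is a plain standardization against the mean and standard deviation of all scores, which is a simplifying assumption and may differ from the procedure of [S17]. The function name and all counts are hypothetical.

```python
from statistics import mean, pstdev


def conservation_z_scores(conserved_counts, total_counts):
    """Return ({7-mer: conservation score}, {7-mer: Z score}).

    conserved_counts -- conserved occurrences of each 7-mer in aligned UTRs
    total_counts     -- all occurrences of each 7-mer in the reference UTRs
    """
    scores = {
        kmer: conserved_counts.get(kmer, 0) / total
        for kmer, total in total_counts.items()
        if total > 0
    }
    mu = mean(scores.values())
    sigma = pstdev(scores.values())
    if sigma == 0:
        return scores, {kmer: 0.0 for kmer in scores}
    z_scores = {kmer: (s - mu) / sigma for kmer, s in scores.items()}
    return scores, z_scores


# Hypothetical counts for four 7-mers.
total_counts = {"CTACCTC": 200, "AAAAAAA": 1000, "GATTACA": 400, "CCCGGGT": 500}
conserved_counts = {"CTACCTC": 60, "AAAAAAA": 50, "GATTACA": 20, "CCCGGGT": 25}
scores, z_scores = conservation_z_scores(conserved_counts, total_counts)
top = max(z_scores, key=z_scores.get)
print(top, round(scores[top], 2))
```

With real data, 7-mers whose Z score exceeds 3 would form the set of highly conserved motifs.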
Anticipating that many of the highly conserved motifs would correspond
to binding sites for miRNAs or other translational regulators,
we defined the set of targets for each 7-mer to be the set of
genes whose 3′ UTRs contained at least one conserved instance of
the 7-mer. We then ran GeneMerge with the same background and
association files as in the previous analysis. Ninety-six 7-mers showed
overrepresentation of at least one GO term at a Bonferroni-corrected P
value cutoff of 0.2. Twenty of the 96 matched a known C. elegans
microRNA in the first or second position, a fraction similar to that found
upon analysis of vertebrate and Drosophila 3′ UTRs (unpublished
data). One 7-mer matched a family of three predicted C. elegans
tncRNAs (tiny noncoding RNAs) in the second position [S18]. In ad-

Figure S4. Some PicTar-Predicted let-7 Targets Can Suppress the Vulval Bursting Phenotype of let-7 Mutants
(A) An RNAi suppression screen of PicTar-predicted targets identifies six novel suppressors of the let-7 phenotype. Percentage of animals in
which vulval bursting is not observed was recorded for let-7(n2853) worms raised at the nonpermissive temperature (25°C) on bacteria producing
dsRNA targeting the given gene. Although phenotypic suppression is weak in some cases, pink indicates significant suppression (p < 0.05,
one-tailed t test).
(B) The same RNAi screen was carried out for a control set of genes that are not predicted to be let-7 targets. In only one case did RNAi targeting
a gene give rise to significantly increased survival over a negative control (shown in pink, p = 0.02). Even in this case, survival did not match the
levels seen with let-7 suppressors shown in (A). While this indicates that RNAi for genes that are not predicted let-7 targets does not lead to
spurious suppression of the let-7 phenotype, it should be noted that the gene set tested happens to not contain any genes where RNAi leads to
a phenotype. Thus, it remains possible that genes for which RNAi leads to a phenotype could spuriously suppress the let-7 vulval bursting
phenotype. Error bars indicate standard deviation in independent experiments.

Figure S5. Overrepresented GO Terms in Individual miRNA Target Gene Sets
Bonferroni-corrected P values for overrepresented GO terms for the targets of each microRNA are plotted on a negative (natural) log scale of
0.0–6.0 (i.e., P values range from e^0 = 1 to e^-6 ≈ 0.002). 53 scores are beyond the upper limit, with a maximum score of 17.33. MicroRNAs
are indicated simply by their number, and colors indicate the following expression classes: black, expressed at high levels at all stages; gray,
expressed at low levels at all stages; dark green, high from L1 onward; peach, low levels from L1 onward; light blue, high from L2 onward; red, high
from L3 onward; pink, high from L4 onward; dark blue, embryonic and gonadal expression; magenta, high during embryogenesis; light green,
discontinuous expression; not colored, expression not clear. Expression is based on Northern blots from a number of references [S18, S19,
S31–S36]. Brackets indicate miRNA families. All GO terms with a Bonferroni-corrected P value below a loose cutoff of e^-1 are included. GO terms
with lower confidence are indicated by lighter colors in the figure. Although no correction for multiple testing of miRNA target sets is performed,
we note that the number of miRNAs with at least one associated GO term is much higher than that expected based on the P value cutoffs for the

Figure S6. Overrepresented GO Terms in Sets of Genes Containing Highly Conserved Heptamers
This is a similar plot to that in Figure S5, but uses highly conserved 7-mers (Z score > 3; see Experimental Procedures) instead of annotated
microRNAs. The target set of each 7-mer is the set of genes that contains at least one conserved instance of the 7-mer. For each 7-mer, if its reverse
complement matches the first or second 7-mer of an annotated C. elegans microRNA, the name of the microRNA is printed. If not, the reverse
complement is checked against the first and second 7-mers of tiny noncoding RNAs (tncRNA [S18]) in C. elegans and microRNAs in other
metazoan species, and in the case of a match, either the name of the tncRNA or a two-letter abbreviation for the name of the species is printed.
Numbers in parentheses mean that more than one microRNA matching the 7-mer is found in that species. The following notations and abbreviations
are used: vertebrates (human, rat, mouse, chimpanzee, gorilla, red-bellied tamarin, bonobo, pig, zebrafish, rhesus monkey, pig-tailed
macaque, chicken), dm (D. melanogaster), ag (A. gambiae), dp (D. pseudoobscura), mm (M. musculus), hs (H. sapiens), tncR7_II (tiny non-coding

S1. Grün, D., Wang, Y.-L., Langenberger, D., Gunsalus, K.C., and
Rajewsky, N. (2005). microRNA target predictions across
seven Drosophila species and comparison to mammalian tar-
S2. Krek, A., Grün, D., Poy, M.N., Wolf, R., Rosenberg, L., Epstein,
E.J., MacMenamin, P., da Piedade, I., Gunsalus, K.C., Stoffel,
M., et al. (2005). Combinatorial microRNA target predictions.
Nat. Genet. 37, 495–500.
S3. Burge, C., and Karlin, S. (1997). Prediction of complete gene
structures in human genomic DNA. J. Mol. Biol. 268, 78–94.
S4. Korf, I. (2004). Gene finding in novel genomes. BMC Bioinfor-
matics 5, 59.
S5. Guigo, R., Knudsen, S., Drake, N., and Smith, T. (1992). Predic-
tion of gene structure. J. Mol. Biol. 226, 141–157.
S6. Karolchik, D., Baertsch, R., Diekhans, M., Furey, T.S., Hinrichs,
D.J., et al. (2003). The UCSC genome browser database.
Nucleic Acids Res. 31, 51–54.
S7. Pruitt, K.D., Tatusova, T., and Maglott, D.R. (2005). NCBI refer-
ence sequence (RefSeq): a curated non-redundant sequence
database of genomes, transcripts and proteins. Nucleic Acids
Res. 33, D501–D504.
S8. Korf, I., Flicek, P., Duan, D., and Brent, M.R. (2001). Integrating
genomic homology into gene structure prediction. Bioinfor-
matics 17 (Suppl 1), S140–S148.
S9. Kent, W.J. (2002). BLAT—the BLAST-like alignment tool.
Genome Res. 12, 656–664.
S10. Stein, L.D., Bao, Z., Blasiar, D., Blumenthal, T., Brent, M.R.,
Chen, N., Chinwalla, A., Clarke, L., Clee, C., Coghlan, A., et al.
(2003). The genome sequence of Caenorhabditis briggsae:
a platform for comparative genomics. PLoS Biol. 1, e45.
S11. Bray, N., and Pachter, L. (2004). MAVID: constrained ancestral
alignment of multiple sequences. Genome Res. 14, 693–699.
S12. Grosshans, H., Johnson, T., Reinert, K.L., Gerstein, M., and
Slack, F.J. (2005). The temporal patterning microRNA let-7
regulates several transcription factors at the larval to adult
transition in C. elegans. Dev. Cell 8, 321–330.
S13. Banerjee, D., Kwok, A., Lin, S.Y., and Slack, F.J. (2005). Devel-
opmental timing in C. elegans is regulated by kin-20 and tim-1,
homologs of core circadian clock genes. Dev. Cell 8, 287–295.
S14. Kamath, R.S., and Ahringer, J. (2003). Genome-wide RNAi
screening in Caenorhabditis elegans. Methods 30, 313–321.
S15. Castillo-Davis, C.I., and Hartl, D.L. (2003). GeneMerge—post-
genomic analysis, data mining, and hypothesis testing. Bioin-
formatics 19, 891–892.
S16. Saeed, A.I., Sharov, V., White, J., Li, J., Liang, W., Bhagabati,
N., Braisted, J., Klapa, M., Currier, T., Thiagarajan, M., et al.
(2003). TM4: a free, open-source system for microarray data
management and analysis. Biotechniques 34, 374–378.
S17. Xie, X., Lu, J., Kulbokas, E.J., Golub, T.R., Mootha, V., Lind-
blad-Toh, K., Lander, E.S., and Kellis, M. (2005). Systematic
discovery of regulatory motifs in human promoters and 3′
UTRs by comparison of several mammals. Nature 434, 338–
S18. Ambros, V., Lee, R.C., Lavanway, A., Williams, P.T., and Jewell,
D. (2003). MicroRNAs and other tiny endogenous RNAs in
C. elegans. Curr. Biol. 13, 807–818.
S19. Reinhart, B.J., Slack, F.J., Basson, M., Pasquinelli, A.E., Bet-
tinger, J.C., Rougvie, A.E., Horvitz, H.R., and Ruvkun, G.
(2000). The 21-nucleotide let-7 RNA regulates developmental
timing in Caenorhabditis elegans. Nature 403, 901–906.
S20. Slack, F.J., Basson, M., Liu, Z., Ambros, V., Horvitz, H.R., and
Ruvkun, G. (2000). The lin-41 RBCC gene acts in the C. elegans
heterochronic pathway between the let-7 regulatory RNA and
the LIN-29 transcription factor. Mol. Cell 5, 659–669.
S21. Vella, M.C., Choi, E.Y., Lin, S.Y., Reinert, K., and Slack, F.J.
(2004). The C. elegans microRNA let-7 binds to imperfect
let-7 complementary sites from the lin-41 3′ UTR. Genes Dev.
S22. Vella, M.C., Reinert, K., and Slack, F.J. (2004). Architecture of
a validated microRNA::target interaction. Chem. Biol. 11,
S23. Ha, I., Wightman, B., and Ruvkun, G. (1996). A bulged lin-4/lin-
14 RNA duplex is sufficient for Caenorhabditis elegans lin-14
temporal gradient formation. Genes Dev. 10, 3041–3050.
S24. Wightman, B., Ha, I., and Ruvkun, G. (1993). Posttranscrip-
tional regulation of the heterochronic gene lin-14 by lin-4
mediates temporal pattern formation in C. elegans. Cell 75,
S25. Lee, R.C., Feinbaum, R.L., and Ambros, V. (1993). The
C. elegans heterochronic gene lin-4 encodes small RNAs
with antisense complementarity to lin-14. Cell 75, 843–854.
S26. Abrahante, J.E., Daul, A.L., Li, M., Volk, M.L., Tennessen, J.M.,
Miller, E.A., and Rougvie, A.E. (2003). The Caenorhabditis
elegans hunchback-like gene lin-57/hbl-1 controls develop-
mental time and is regulated by microRNAs. Dev. Cell 4, 625–
S27. Lin, S.Y., Johnson, S.M., Abraham, M., Vella, M.C., Pasquinelli,
A., Gamberi, C., Gottlieb, E., and Slack, F.J. (2003). The
C. elegans hunchback homolog, hbl-1, controls temporal pat-
terning and is a probable microRNA target. Dev. Cell 4, 639–
S28. Johnston, R.J., and Hobert, O. (2003). A microRNA controlling
left/right neuronal asymmetry in Caenorhabditis elegans.
Nature 426, 845–849.
S29. Chang, S., Johnston, R.J., Jr., Frokjaer-Jensen, C., Lockery, S.,
and Hobert, O. (2004). MicroRNAs act sequentially and asym-
metrically to control chemosensory laterality in the nematode.
Nature 430, 785–789.
S30. Moss, E.G., Lee, R.C., and Ambros, V. (1997). The cold shock
domain protein LIN-28 controls developmental timing in
C. elegans and is regulated by the lin-4 RNA. Cell 88, 637–646.
S31. Johnson, S.M., Grosshans, H., Shingara, J., Byrom, M., Jarvis,
F.J. (2005). RAS is regulated by the let-7 microRNA family. Cell
S32. Feinbaum, R., and Ambros, V. (1999). The timing of lin-4 RNA
accumulation controls the timing of postembryonic develop-
mental events in Caenorhabditis elegans. Dev. Biol. 210, 87–
S33. Lee, R.C., and Ambros, V. (2001). An extensive class of small
RNAs in Caenorhabditis elegans. Science 294, 862–864.
S34. Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. (2001). An
abundant class of tiny RNAs with probable regulatory roles in
Caenorhabditis elegans. Science 294, 858–862.
S35. Lim, L.P., Lau, N.C., Weinstein, E.G., Abdelhakim, A., Yekta, S.,
Rhoades, M.W., Burge, C.B., and Bartel, D.P. (2003). The
microRNAs of Caenorhabditis elegans. Genes Dev. 17, 991–1008.
S36. Johnson, S.M., Lin, S.Y., and Slack, F.J. (2003). The time of
appearance of the C. elegans let-7 microRNA is transcriptionally
controlled utilizing a temporal regulatory element in its
promoter. Dev. Biol. 259, 364–379.