The Sequence Structures of Human MicroRNA Molecules
and Their Implications
Zhide Fang (1), Ruofei Du (1), Andrea Edwards (2), Erik K. Flemington (3)*, Kun Zhang (2)*
(1) Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, Louisiana, United States of America
(2) Department of Computer Science, Xavier University of Louisiana, New Orleans, Louisiana, United States of America
(3) Department of Pathology, Tulane University School of Medicine, New Orleans, Louisiana, United States of America
Abstract
The number of nucleotides in a cloned, short genomic sequence has become an important criterion for annotating such
a sequence as a miRNA molecule. While the majority of human mature miRNA sequences consist of 22 nucleotides, the
characteristic lengths of miRNA sequences vary. There is also a lack of systematic studies of this
length distribution and of the biological factors that are related to or may affect the length. In this paper, we aim to fill
this gap by investigating the sequence structure of human miRNA molecules using statistical tools. We demonstrate that the
traditional discrete probability distributions do not model the length distribution of human mature miRNAs well, and we
obtain a statistical distribution model with a decent fit. We observe that the four nucleotide bases in a miRNA sequence
are not randomly distributed, implying that structural patterns such as dinucleotide (trinucleotide or higher-order) patterns
may exist. Furthermore, we study the relationships of this length distribution to multiple important factors such as
evolutionary conservation, tumorigenesis, the length of precursor loop structures, and the number of predicted targets. The
association between the miRNA sequence length and the distributions of target site counts in corresponding predicted
genes is also presented. This study results in several novel findings worthy of further investigation: (1) rapid
evolution introduces variation to the miRNA sequence length distribution; (2) miRNAs with extreme sequence lengths are
unlikely to be cancer-related; and (3) the miRNA sequence length is positively correlated with the precursor length and the
number of predicted target genes.
Citation: Fang Z, Du R, Edwards A, Flemington EK, Zhang K (2013) The Sequence Structures of Human MicroRNA Molecules and Their Implications. PLoS ONE 8(1):
e54215. doi:10.1371/journal.pone.0054215
Editor: Yan Gong, College of Pharmacy, University of Florida, United States of America
Received July 7, 2012; Accepted December 10, 2012; Published January 18, 2013
Copyright: © 2013 Fang et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: Research reported in this publication was supported by National Institutes of Health grants (NIGMS P20GM103424, NCRRRCMI 5G12RR02626004), a
US Department of Army grant (W911NF1210066) and an NSF grant (EPS1006891). The funders had no role in study design, data collection and analysis,
decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: eflemin@tulane.edu (EKF); kzhang@xula.edu (KZ)
Introduction
MicroRNAs (miRNAs) have been identified as a group of small
endogenous non-coding RNAs that negatively regulate protein-coding
messenger RNAs (mRNAs) at the post-transcriptional level.
The biogenesis and the main activity of a miRNA are clear and
well described in the literature. Mature miRNAs are single-stranded
RNAs consisting of about 22 nucleotides and are derived from longer
non-coding primary miRNAs (pri-miRNAs) and then from precursor
miRNAs (pre-miRNAs) by the sequential actions of the Drosha and
Dicer RNA-cleaving enzymes [1–3]. The main function of miRNAs is
to interfere with the translation of mRNAs or to induce
degradation of the mRNAs. In mammals, mature miRNAs are
incorporated into an RNA-induced silencing complex (RISC). The
activated RISC permits the miRNAs to bind to the 3′ untranslated
regions (3′ UTRs) of specific target mRNAs to suppress translation and
cause their degradation by mRNA decay [4–6]. There need not be
a one-to-one correspondence between miRNAs and targeted
mRNAs: a single miRNA may have multiple mRNA targets. Predicting
the targeted mRNAs of a miRNA is a challenging task,
though precise prediction is essential for studying its functional activity
and its association with diseases. The biogenesis of a miRNA
molecule and its main activity are depicted in Figure 1.
To annotate a cloned sequence as a miRNA, the most
important criteria include the characteristic length (approximately
22 nucleotides) of the sequence and a corresponding compact
pre-miRNA loop structure, with a median of 83 nucleotides [7]. The
association between biological significance and sequence
length heterogeneity has recently been recognized for a mature
miRNA in Arabidopsis thaliana [8]. That study shows the importance
of studying the distributional structure of the sequence
lengths of mature miRNA molecules in the genome, and the
factors that may affect this length heterogeneity. With the
development of profiling technology and the advances in
bioinformatics and computational tools, the number of miRNAs
identified has increased dramatically. Since the first miRNA was
discovered in 1993 [9] and the biological functions of miRNAs
were recognized to be conserved in different species in 2000
[10,11], the number of mature human miRNAs jumped from 152
in August 2004 to 1732 in April 2011, according to miRBase,
a database of published miRNA sequences and annotation [12–16].
In this paper, we systematically investigate the length
distribution of miRNAs, anticipating that the non-uniformity
of this distribution can reveal the complexity of the
miRNA molecular structures and have implications for genetic
research.
PLOS ONE | www.plosone.org | January 2013 | Volume 8 | Issue 1 | e54215
Materials and Methods
Materials
The sequences of 1732 human mature miRNAs and the
corresponding precursor miRNAs were downloaded from the
public database miRBase (Release 17, April 2011). All the
calculations were carried out using the R language.
Statistical Methods
A random variable is said to have an asymmetric Laplace
distribution if its density is, up to a normalizing constant,

$$p_{\theta,\kappa,\sigma}(x) = I_{\{x \ge \theta\}}(x)\,\exp\!\left(-\frac{\sqrt{2}\,\kappa}{\sigma}(x-\theta)\right) + I_{\{x < \theta\}}(x)\,\exp\!\left(\frac{\sqrt{2}}{\sigma\kappa}(x-\theta)\right),$$

where $\theta$, $\kappa$, $\sigma$ are three unknown parameters and $I$ is
the indicator function. It reduces to the
symmetric Laplace distribution when $\kappa = 1$. The maximum likelihood estimates, $\hat\theta$, $\hat\kappa$, $\hat\sigma$,
of these parameters are available in [17]. With these estimates, the
fitted discrete asymmetric Laplace distribution, DALaplace, has
probability masses defined by

$$p(k) = \frac{p_{\hat\theta,\hat\kappa,\hat\sigma}(k)}{\sum_{j=16}^{27} p_{\hat\theta,\hat\kappa,\hat\sigma}(j)},$$

where $k$ ranges from 16 to 27 (the range of the sequence lengths of
human mature miRNAs). The discrete symmetric Laplace
distribution (DLaplace) is defined in the same fashion.
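As an illustration of this construction, the following is a minimal Python sketch (the paper's own calculations were done in R). It evaluates the unnormalized asymmetric Laplace kernel at the integers 16 to 27 and rescales the values to sum to one. The function names and the parameter values are hypothetical and chosen only for illustration; they are not the fitted estimates.

```python
import math

def al_kernel(x, theta, kappa, sigma):
    """Unnormalized asymmetric Laplace density at x."""
    if x >= theta:
        return math.exp(-math.sqrt(2) * kappa / sigma * (x - theta))
    return math.exp(math.sqrt(2) / (sigma * kappa) * (x - theta))

def dalaplace_pmf(theta, kappa, sigma, support=range(16, 28)):
    """Normalize the kernel over the integer support to get probability masses."""
    weights = {k: al_kernel(k, theta, kappa, sigma) for k in support}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Hypothetical parameter values, for illustration only (not the fitted estimates).
pmf = dalaplace_pmf(theta=22.0, kappa=1.2, sigma=1.5)

# With kappa = 1 the masses are symmetric about theta (the DLaplace case).
symmetric_pmf = dalaplace_pmf(theta=22.0, kappa=1.0, sigma=1.5)
```

Because the masses are renormalized over the finite support, any constant factor in the continuous density cancels out.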
A zero-inflated Poisson model is defined as

$$f_{p,\lambda}(x) = (1-p)\,I_{\{x=0\}}(x) + p\,\frac{\lambda^{x}}{x!}\exp(-\lambda),$$

where $x$ is a non-negative integer, and $p \in [0,1]$ and $\lambda$ are unknown
parameters. This is a mixture model: it reduces to the Poisson
distribution with mean $\lambda$ when $p = 1$, and to a single-point distribution
putting all its mass at zero when $p = 0$. We fitted this model to the
absolute differences between the mature miRNA sequence lengths and their
median, and obtained the maximum likelihood estimates [18]
$\hat p$, $\hat\lambda$. The discrete, symmetric zero-inflated model (DSZeroInf)
is then defined as

$$p(k) = \left( I_{\{k=m\}}(k) + \tfrac{1}{2}\,I_{\{k \ne m\}}(k) \right) f_{\hat p,\hat\lambda}(|k-m|),$$

where $k$ ranges from 16 to 27, and $m$ is the median of the observed
sequence lengths of human mature miRNAs.
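The symmetrization step can be sketched as follows in Python. This is a sketch under stated assumptions: the fitted mixture is evaluated at the distance |k − m| from the median (an assumption that follows from the model being fitted to absolute differences), and the values of p and λ are hypothetical placeholders rather than the maximum likelihood estimates.

```python
import math

def zip_pmf(x, p, lam):
    """Zero-inflated Poisson mass; p is the weight of the Poisson component."""
    poisson = math.exp(-lam) * lam ** x / math.factorial(x)
    point_mass = 1.0 if x == 0 else 0.0
    return (1 - p) * point_mass + p * poisson

def ds_zero_inf_pmf(m, p, lam, support=range(16, 28)):
    """Spread the mass at distance d > 0 from the median equally over m - d and m + d."""
    masses = {}
    for k in support:
        weight = 1.0 if k == m else 0.5
        masses[k] = weight * zip_pmf(abs(k - m), p, lam)
    return masses

# Hypothetical values for illustration only (not the fitted estimates).
masses = ds_zero_inf_pmf(m=22, p=0.6, lam=1.0)
total = sum(masses.values())
```

Over the truncated range 16 to 27 the masses sum to slightly less than one, since the small probability of distances beyond the range is not redistributed in this sketch.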
The tPoisson distribution is defined as

$$p(k) = c\,\frac{\lambda^{k}\exp(-\lambda)}{k!},$$

where $k$ ranges from 16 to 27, $\lambda$ is the average sequence length of
all human mature miRNAs, and $c$ is a constant such that the sum
of all probabilities is one.
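A minimal Python sketch of the truncated Poisson masses is given below. It takes λ = 21.52, the sample mean reported in the Results, and works on the log scale to avoid large factorials; the function name is an assumption of this sketch.

```python
import math

def tpoisson_pmf(lam, support=range(16, 28)):
    """Poisson masses restricted to the support and rescaled to sum to one."""
    log_weights = {k: k * math.log(lam) - lam - math.lgamma(k + 1) for k in support}
    total = sum(math.exp(v) for v in log_weights.values())  # equals 1 / c
    return {k: math.exp(v) / total for k, v in log_weights.items()}

# lam is the sample mean of the sequence lengths reported in the Results.
pmf = tpoisson_pmf(21.52)
```

The normalizing constant c is simply the reciprocal of the untruncated Poisson probability of the range 16 to 27.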
Results and Discussion
The Distribution of the Sizes of Human Mature miRNA Molecules
The number of nucleotides in a human mature miRNA is
a discrete random variable, which ranges from 16 to 27 and has
a mode and a median of 22. A histogram of the lengths of all human
mature miRNA molecules is presented in Figure 2(a). Though
the Poisson distribution is the traditional model for
count data, it does not fit the length distribution of mature miRNA
molecules well. Figure 2(b) is the Poissonness plot of the data
[19]. It is created by plotting $\log(x_k) + \log(k!)$ against $k$,
where $k$ is the count, $x_k$ is the corresponding observed frequency
and $k!$ is the factorial of $k$. For Poisson data with mean $\lambda$ and total
frequency $N$, the plotted quantity is approximately $\log N - \lambda + k \log\lambda$, so the
points would fall on a straight line. It is clear that the plotted points
do not fall on a straight line: the points in the middle lie above
the line and the points at both ends lie below it. This suggests that
a non-Poisson distribution should be employed to fit the length
distribution of mature miRNAs.
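The coordinates of a Poissonness plot can be computed with a few lines of Python, sketched below. The function name is hypothetical, and the frequencies used in the check are exact Poisson expectations with arbitrary illustrative values of λ and N, not the miRNA data.

```python
import math

def poissonness_points(freq):
    """Return (k, log(x_k) + log(k!)) for every count k with positive frequency x_k."""
    return [(k, math.log(x) + math.lgamma(k + 1))
            for k, x in sorted(freq.items()) if x > 0]

# Sanity check on exact Poisson frequencies (lam and n are arbitrary illustrative values).
lam, n = 3.0, 1000.0
expected = {k: n * math.exp(-lam) * lam ** k / math.factorial(k) for k in range(9)}
points = poissonness_points(expected)

# For Poisson data, successive points differ by the constant slope log(lam).
slopes = [points[i + 1][1] - points[i][1] for i in range(len(points) - 1)]
```

Curvature in the plotted points, such as the pattern described above, indicates a departure from this constant slope.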
Figure 1. Biogenesis of mature miRNAs and their functional activity.
doi:10.1371/journal.pone.0054215.g001
A defining feature of the Poisson distribution is the equality of its
mean and variance. This is not the case for the lengths of human
mature miRNA molecules, because the sample mean (21.52) is
much larger than the sample variance (2.51). This fact also implies
that the negative binomial distribution, another popular model
for count data that handles overdispersion (variance exceeding the mean),
could not fit the human mature miRNA lengths well.
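The mean and variance comparison can be summarized by the index of dispersion (variance divided by mean), which equals one for a Poisson distribution. The short Python sketch below computes it from the reported summary statistics; the list of lengths is a made-up toy sample used only to show the function on raw data.

```python
import statistics

def dispersion_index(values):
    """Sample variance divided by sample mean (equal to 1 under a Poisson model)."""
    return statistics.variance(values) / statistics.mean(values)

# Ratio implied by the summary statistics reported in the text.
reported_ratio = 2.51 / 21.52

# Hypothetical toy sample of sequence lengths, for illustration only.
toy_lengths = [21, 22, 22, 22, 23, 22, 21, 23, 22, 20]
toy_ratio = dispersion_index(toy_lengths)
```

A ratio far below one indicates underdispersion, which neither the Poisson nor the negative binomial model accommodates.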
We show in Figure 3 the fitting results of three
discrete distributions to the sequence lengths of human mature
miRNA molecules. These are a discrete analogue of the
asymmetric Laplace distribution (denoted as DALaplace), a discrete,
symmetric distribution induced from the zero-inflated model
(denoted as DSZeroInf) and a truncated Poisson distribution
(denoted as tPoisson). Details of DALaplace, DSZeroInf and
tPoisson are given in the Materials and Methods. Interested
readers are referred to [17] for the definition of the asymmetric
Laplace distribution and methods for parameter estimation, and to
[19] for the definition and applications of the zero-inflated model.
It is clear from the plot that DALaplace performs best while
tPoisson performs worst. The sums of squares of the residuals
(differences between observed percentages and corresponding
fitted values) are 0.0047, 0.01 and 0.175 for DALaplace, DSZeroInf
and tPoisson, respectively, further illustrating the performance of
these models. We also calculated the AIC (Akaike information
criterion) to evaluate the relative goodness of fit of these
non-nested models. The AICs for DALaplace, DSZeroInf and tPoisson are
5893.396, 6117.659 and 7970.977, respectively. The order of
these values confirms our selection of the model.
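The two comparison criteria can be sketched in Python as follows. The AIC is computed with the standard definition, twice the number of estimated parameters minus twice the maximized log-likelihood; the helper names, the toy counts and the two candidate models are hypothetical and serve only to show the calculation.

```python
import math

def sum_squared_residuals(counts, pmf):
    """Sum of squared differences between observed proportions and fitted masses."""
    n = sum(counts.values())
    return sum((counts.get(k, 0) / n - pmf[k]) ** 2 for k in pmf)

def log_likelihood(counts, pmf):
    """Log-likelihood of observed counts under a discrete model."""
    return sum(c * math.log(pmf[k]) for k, c in counts.items() if c > 0)

def aic(log_lik, n_params):
    """Akaike information criterion: 2 * (number of parameters) - 2 * log-likelihood."""
    return 2 * n_params - 2 * log_lik

# Hypothetical toy histogram and candidate models, for illustration only.
toy_counts = {21: 2, 22: 5, 23: 3}
uniform = {21: 1 / 3, 22: 1 / 3, 23: 1 / 3}
peaked = {21: 0.2, 22: 0.5, 23: 0.3}

ssr_uniform = sum_squared_residuals(toy_counts, uniform)
aic_uniform = aic(log_likelihood(toy_counts, uniform), n_params=0)
aic_peaked = aic(log_likelihood(toy_counts, peaked), n_params=2)
```

Smaller values of either criterion indicate a better fit; unlike the residual sum of squares, the AIC also penalizes the number of estimated parameters.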
The Randomness of Bases in Mature Human miRNA Molecules
Another question of interest to biologists is whether there is any
structural pattern in a human mature miRNA; in other words,
whether the proportion of one nucleotide base is significantly
higher or lower than those of the other bases. We address
this problem in this subsection. Given the length of a mature
miRNA sequence, the vector of counts, $n_A$, $n_C$, $n_G$, $n_U$, of the
bases A, C, G, U follows a multinomial distribution. Using the
likelihood ratio method to test proportional homogeneity,
we reject the hypothesis that the four bases are equally probable in every
sequence (p-value ≈ 0). We further find
that, at the significance level of 0.05, there are 341 (about
20%) mature miRNA sequences showing inequality of base
probabilities. The sample proportions of the four bases in all miRNA
sequences are presented in Figure 4(a), with the 95% simultaneous
confidence interval [20] at the top of the corresponding bar.
These intervals clearly indicate that the four bases are not equally
probable across the sequences. All these findings imply that
structural patterns may exist in the sequences of certain mature
miRNAs.
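For a single sequence, a likelihood ratio test of equal base proportions can be sketched in Python as below. This is an illustration under stated assumptions rather than the authors' exact procedure: it uses the asymptotic chi-square reference distribution with 3 degrees of freedom (evaluated in closed form), which is only a rough approximation for sequences of about 22 nucleotides, and the input string is simply an example 22-nucleotide RNA sequence.

```python
import math
from collections import Counter

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def base_homogeneity_lrt(seq):
    """Likelihood ratio statistic and p-value for H0: A, C, G, U are equally probable."""
    counts = Counter(seq)          # assumes seq contains only A, C, G, U
    expected = len(seq) / 4
    # Bases that do not occur contribute zero to the statistic.
    g = 2 * sum(c * math.log(c / expected) for c in counts.values() if c > 0)
    return g, chi2_sf_df3(g)

# Example 22-nucleotide input (illustrative).
g_stat, p_value = base_homogeneity_lrt("UGAGGUAGUAGGUUGUAUAGUU")

# A perfectly balanced sequence gives a statistic of zero.
g_balanced, p_balanced = base_homogeneity_lrt("ACGU" * 5)
```

Summing the per-sequence statistics over independent sequences would give one way to form a joint test, with the degrees of freedom added accordingly.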
However, as demonstrated in Figure 4(b), the GC content
(50.8%) is very close to the AU content (49.2%). The 99.9% confidence
interval for the GC content is (0.499, 0.516), which is narrow and
covers 0.5. We note that the hypothesis that the GC content
equals 50% is not rejected as long as the significance level is set below
0.0028. So small a p-value for so small a departure from 50% arises because the sample size N (the
total number of bases in all mature miRNA sequences) is large and
because, in the hypothesis testing of a proportion, the p-value for any fixed
departure from the hypothesized value goes to zero as N approaches infinity.
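The effect of the sample size on this test can be illustrated with a short Python sketch. The total number of bases is not stated in the text, so it is approximated here as 1732 sequences times the mean length 21.52, and the GC proportion is the rounded value 0.508; both are assumptions, so the resulting p-value only approximates the reported 0.0028.

```python
import math
from statistics import NormalDist

def proportion_test(p_hat, n, p0=0.5, conf=0.999):
    """Two-sided z-test of H0: p = p0 and a Wald confidence interval for p."""
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return z, p_value, (p_hat - half_width, p_hat + half_width)

# Approximate total number of bases (assumption: 1732 sequences x mean length 21.52).
n_bases = round(1732 * 21.52)
z, p_value, ci = proportion_test(0.508, n_bases)

# The same observed proportion with ten times as many bases gives a much smaller p-value.
_, p_value_large_n, _ = proportion_test(0.508, 10 * n_bases)
```

Under these assumptions the 99.9% interval still covers 0.5 even though the p-value is below conventional significance levels.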
The Relationship to Evolutionary Conservation
Highly conserved DNA sequences are thought to have
functional value. Genetic conservation across evolution has
been an important benchmark for detecting functionally important
nucleic acid sequences, and for studying gene interactions in
a group of co-regulated genes [21–24]. Hirsh and Fraser [25]
revealed a negative and highly significant relationship between the
importance of a gene and its evolutionary rate. A similar relationship
for miRNAs has also been studied in the literature. Zhang et al.
[26] reported the rapid evolution of some miRNA clusters. In this
subsection, we present our findings on the correlation between
evolutionary conservation and the length of mature miRNA
molecules. To our knowledge, this is the first study exploring this
relationship.
All human mature miRNAs are divided into two classes,
conserved and human-specific, by using the procedure documented
in [27]. Out of 1732 mature miRNAs, 914 (about
52.8%) are labeled as conserved and 818 (about 47.2%) as
human-specific. These two proportions are significantly different
(one-sided p-value is 0.01). Figure 5 shows the length distributions of
the sequences in these two groups. We can see that the sequence
lengths of conserved miRNAs are symmetrically distributed
around 22. Both the discrete, symmetric, zero-inflated distribution
(DSZeroInf) and the discrete, symmetric Laplace distribution (DLaplace)
model the distribution decently, and there is little difference
between these two models. In contrast, the sequence lengths
of human-specific miRNAs seem to be bimodally distributed, with
modes of 16 and 22. One may need a mixture of two distributions
to model this variable well. The percentage (7.3%) of
human-specific miRNAs that have a length of 16 or 17 is about ten-fold
that (0.77%) of conserved miRNAs (a Z-test for
equality of two percentages gives a p-value close to 0).
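Both tests in this paragraph can be reproduced approximately with the Python sketch below. The class sizes 914 and 818 come from the text; the counts of short miRNAs (60 and 7) are not given in the text and are back-calculated here from the reported percentages, so they should be read as assumptions.

```python
import math
from statistics import NormalDist

def one_sample_z(successes, n, p0=0.5):
    """One-sided (upper-tail) z-test of H0: proportion = p0."""
    z = (successes / n - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, 1 - NormalDist().cdf(z)

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for equality of two proportions, using the pooled estimate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

# Conserved (914) versus human-specific (818) miRNAs out of 1732.
z_cons, p_cons = one_sample_z(914, 1732)

# Short (16 or 17 nt) miRNAs: 60 of 818 and 7 of 914 are assumed counts
# inferred from the reported 7.3% and 0.77%.
z_short, p_short = two_proportion_z(60, 818, 7, 914)
```

Under these assumptions the first test gives a one-sided p-value of about 0.01 and the second a p-value that is numerically indistinguishable from zero, in line with the values quoted above.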
Figure 2. Histogram and corresponding Poissonness plot of the sequence lengths (sizes) of human mature miRNA molecules.
doi:10.1371/journal.pone.0054215.g002
Figure 3. Histogram of sequence lengths of human mature miRNA molecules and four fitted models.
doi:10.1371/journal.pone.0054215.g003
All these results indicate that rapid evolution seems to increase
the variation in the sequence lengths of human mature miRNA
molecules, and thus complicate the distribution of the length
variable.
The Characteristic Size of Human miRNA Oncogenes and Tumor Suppressors
It has become evident that miRNAs control the expression
levels of gene products that are important in cancer progression. A
number of studies have shown that many miRNAs reside within
chromosomal fragile sites in the human genome and that many
miRNAs are linked to the initiation, progression, and
metastasis of human malignancies, with early reports
associating miRNAs with cancers published in [28,29].
Some miRNAs are able to target oncogenes (those with the capacity
to induce tumor migration and invasion) or tumor suppressor
genes (those with the capacity to suppress cancer and metastasis) [30–33].
The essence of the miRNA regulatory mechanism in cancer
is that increased expression of certain miRNAs can result in
downregulation of tumor suppressor genes, while decreased
expression of other miRNAs can lead to increased expression of
oncogenes. Examples include hsa-miR-10b [34] and hsa-miR-21
[35] in breast cancer, and hsa-miR-155 [36] in human B cell
lymphomas as oncogenes; and hsa-let-7 [37] in lung cancer, and
hsa-miR-15 and hsa-miR-16 [28] in chronic lymphocytic leukemia
as tumor suppressors.
Figure 4. The sample proportions of nucleotide bases and GC, AU contents.
doi:10.1371/journal.pone.0054215.g004
Figure 5. Histograms and fitted distributions of the sequence lengths of mature conserved and human-specific miRNAs.
doi:10.1371/journal.pone.0054215.g005
To investigate the distribution of the sequence lengths of the
mature miRNA molecules that are associated with cancer, we
compiled a class of miRNAs regulating either oncogenes or tumor
suppressor genes. For a miRNA to be included, there must be at
least one publication indicating a causal relationship between the
miRNA and the related oncogene or tumor suppressor gene. We
include miRNAs that play opposite roles in different cancers,
because one miRNA may regulate multiple targets and
the same miRNA may act as a tumor suppressor in certain cancers and as an
oncogene in others [38]. This makes our selection slightly different
from that in [16]. If no such causal relationship has been reported, a miRNA
is selected as an oncogene if it is upregulated in at least three
publications, or as a tumor suppressor if it is downregulated in at
least three publications. We exclude the miRNAs that show
conflicting roles in the same cancer. We obtained 173 cancer-related
miRNAs, listed in Table 1, where the function of a miRNA
is marked "mixed" if it regulates oncogenes in certain
cancers and tumor suppressor genes in other types of
cancer. We find that the characteristic sequence lengths of these
miRNAs are very stable, with 60.7% of them having
sequences of 22 nucleotides, 96.5% having
sequences of 22±1 nucleotides, and 99.4%
having sequences of 22±2 nucleotides. The only miRNA whose
sequence length (18 nucleotides) falls outside the interval 22±2 is
hsa-miR-516a-3p. This miRNA has a connection to human breast
cancer progression [39]. The length distribution for the miRNAs
exclusively regulating oncogenes (or tumor suppressors) is very
similar to that of all cancer-related miRNAs. These observations
suggest that an extremely long or short miRNA is unlikely cancer
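Table 1. All human mature miRNAs associated with cancer and their functions (supp: tumor suppressor; onco: oncogene; mixed: both roles).

| miRNA | Function | miRNA | Function | miRNA | Function | miRNA | Function |
|---|---|---|---|---|---|---|---|
| let-7a | supp | miR-148b | supp | miR-21 | onco | miR-34b | supp |
| let-7a-2* | supp | miR-150 | mixed | miR-210 | mixed | miR-34c-5p | supp |
| let-7b | supp | miR-152 | supp | miR-214 | supp | miR-370 | supp |
| let-7c | supp | miR-153 | supp | miR-215 | supp | miR-372 | onco |
| let-7d | supp | miR-155 | onco | miR-216b | supp | miR-373* | onco |
| let-7e | supp | miR-15a | supp | miR-218 | supp | miR-373 | onco |
| let-7f | supp | miR-15b | supp | miR-219-1-3p | onco | miR-374a | onco |
| let-7f-1* | onco | miR-16 | mixed | miR-22 | supp | miR-375 | mixed |
| let-7g | supp | miR-16-1* | mixed | miR-221 | onco | miR-376a | supp |
| let-7i | supp | miR-17 | onco | miR-222 | onco | miR-376b | supp |
| miR-1 | supp | miR-181a | mixed | miR-223 | mixed | miR-377 | supp |
| miR-100 | supp | miR-181a-2* | onco | miR-224 | onco | miR-424 | supp |
| miR-101 | supp | miR-181b | supp | miR-23a | mixed | miR-429 | supp |
| miR-106a | onco | miR-181c | supp | miR-23b | supp | miR-432 | supp |
| miR-106b | onco | miR-182 | onco | miR-24-1* | onco | miR-449a | supp |
| miR-107 | mixed | miR-182* | onco | miR-24 | onco | miR-451 | supp |
| miR-10a | onco | miR-183 | supp | miR-24-2* | onco | miR-485-5p | supp |
| miR-10b | onco | miR-184 | onco | miR-25 | onco | miR-486-5p | supp |
| miR-122 | supp | miR-185 | supp | miR-26a | mixed | miR-494 | onco |
| miR-124 | supp | miR-18a | onco | miR-26b | mixed | miR-495 | supp |
| miR-125a-5p | supp | miR-18a* | supp | miR-27a | onco | miR-497 | supp |
| miR-125b | mixed | miR-191 | onco | miR-27b | supp | miR-498 | onco |
| miR-125b-1* | supp | miR-192 | supp | miR-296-5p | onco | miR-503 | onco |
| miR-125b-2* | supp | miR-193a-3p | supp | miR-29a | supp | miR-510 | onco |
| miR-126* | mixed | miR-193b | supp | miR-29b | supp | miR-516a-3p | onco |
| miR-126 | mixed | miR-194 | supp | miR-29b-2* | supp | miR-519a | supp |
| miR-127-3p | supp | miR-195 | supp | miR-29c | supp | miR-520c-3p | onco |
| miR-128 | supp | miR-196a | mixed | miR-30a | mixed | miR-520h | supp |
| miR-129-5p | supp | miR-196a* | onco | miR-30a* | supp | miR-521 | onco |
| miR-130b | onco | miR-197 | onco | miR-30e | supp | miR-532-5p | onco |
| miR-133a | supp | miR-199b-5p | supp | miR-31 | supp | miR-661 | supp |
| miR-133b | supp | miR-19a | onco | miR-32 | onco | miR-675 | onco |
| miR-134 | supp | miR-19b | onco | miR-320a | supp | miR-7 | supp |
| miR-135a | supp | miR-19b-2* | onco | miR-324-5p | supp | miR-9 | mixed |
| miR-137 | mixed | miR-200a | supp | miR-326 | supp | miR-9* | onco |
| miR-138 | supp | miR-200b | supp | miR-328 | supp | miR-92a | onco |
| miR-139-3p | supp | miR-200c | mixed | miR-330-3p | supp | miR-93 | onco |
| miR-140-5 | supp | miR-203 | supp | miR-335 | supp | miR-95 | supp |
| miR-141 | supp | miR-204 | mixed | miR-337-3p | supp | miR-96 | onco |
| miR-143 | mixed | miR-205 | supp | miR-340 | onco | miR-98 | onco |
| miR-145 | supp | miR-206 | supp | miR-342-5p | supp | miR-99a | supp |
| miR-146a | mixed | miR-20a | mixed | miR-345 | onco | | |
| miR-146b-5p | supp | miR-20a* | onco | miR-346 | onco | | |
| miR-148a | supp | miR-20b | onco | miR-34a | supp | | |

doi:10.1371/journal.pone.0054215.t001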