CPSS: a computational platform for the analysis of small RNA deep sequencing data.
ABSTRACT Next generation sequencing (NGS) techniques have been widely used to document the small ribonucleic acids (RNAs) implicated in a variety of biological, physiological and pathological processes. An integrated computational tool is needed for handling and analysing the enormous datasets from small RNA deep sequencing approach. Herein, we present a novel web server, CPSS (a computational platform for the analysis of small RNA deep sequencing data), designed to completely annotate and functionally analyse microRNAs (miRNAs) from NGS data on one platform with a single data submission. Small RNA NGS data can be submitted to this server with analysis results being returned in two parts: (i) annotation analysis, which provides the most comprehensive analysis for small RNA transcriptome, including length distribution and genome mapping of sequencing reads, small RNA quantification, prediction of novel miRNAs, identification of differentially expressed miRNAs, piwi-interacting RNAs and other non-coding small RNAs between paired samples and detection of miRNA editing and modifications and (ii) functional analysis, including prediction of miRNA targeted genes by multiple tools, enrichment of gene ontology terms, signalling pathway involvement and protein-protein interaction analysis for the predicted genes. CPSS, a ready-to-use web server that integrates most functions of currently available bioinformatics tools, provides all the information wanted by the majority of users from small RNA deep sequencing datasets. AVAILABILITY: CPSS is implemented in PHP/PERL+MySQL+R and can be freely accessed at http://mcg.ustc.edu.cn/db/cpss/index.html or http://mcg.ustc.edu.cn/sdap1/cpss/index.html.
BIOINFORMATICS APPLICATIONS NOTE
Vol. 28 no. 14 2012, pages 1925–1927
CPSS: a computational platform for the analysis of small RNA
deep sequencing data
Yuanwei Zhang1,†, Bo Xu1,†, Yifan Yang2, Rongjun Ban3, Huan Zhang1, Xiaohua Jiang1,
Howard J. Cooke1,4, Yu Xue5,∗ and Qinghua Shi1,∗
1Hefei National Laboratory for Physical Sciences at Microscale and School of Life Sciences, University of Science
and Technology of China, Hefei 230027, China, 2Department of Statistics, University of Kentucky, Lexington, KY
40506, USA, 3Department of Computer Science & Technology, Nanjing University, Nanjing 210093, 4MRC Human
Genetics Unit, IGMM, University of Edinburgh, Edinburgh EH4 2XU, UK, and 5Department of Biomedical Engineering,
Huazhong University of Science and Technology, Wuhan 430074, China
Associate Editor: Ivo Hofacker
Advance Access publication May 9, 2012
Summary: Next generation sequencing (NGS) techniques have
been widely used to document the small ribonucleic acids (RNAs)
implicated in a variety of biological, physiological and pathological
processes. An integrated computational tool is needed for handling
and analysing the enormous datasets from small RNA deep
sequencing approach. Herein, we present a novel web server,
CPSS (a computational platform for the analysis of small RNA
deep sequencing data), designed to completely annotate and
functionally analyse microRNAs (miRNAs) from NGS data on one
platform with a single data submission. Small RNA NGS data can
be submitted to this server with analysis results being returned
in two parts: (i) annotation analysis, which provides the most
comprehensive analysis for small RNA transcriptome, including
length distribution and genome mapping of sequencing reads, small
RNA quantification, prediction of novel miRNAs, identification of
differentially expressed miRNAs, piwi-interacting RNAs and other
non-coding small RNAs between paired samples and detection
of miRNA editing and modifications and (ii) functional analysis,
including prediction of miRNA targeted genes by multiple tools,
enrichment of gene ontology terms, signalling pathway involvement
and protein–protein interaction analysis for the predicted genes.
CPSS, a ready-to-use web server that integrates most functions of
currently available bioinformatics tools, provides all the information
wanted by the majority of users from small RNA deep sequencing datasets.
Availability: CPSS is implemented in PHP/PERL+MySQL+R and
can be freely accessed at http://mcg.ustc.edu.cn/db/cpss/index.html
or http://mcg.ustc.edu.cn/sdap1/cpss/index.html.
Contact: email@example.com or firstname.lastname@example.org
Supplementary information: Supplementary data are available at
Received on August 25, 2011; revised on April 20, 2012; accepted
on May 3, 2012
∗To whom correspondence should be addressed.
†The authors wish it to be known that, in their opinion, the first two authors
should be regarded as joint First Authors.
Non-coding ribonucleic acids (RNAs), which do not encode
proteins, include ribosomal RNAs, transfer RNAs, microRNAs
These RNAs participate in a surprisingly diverse collection of
attention due to their negative role in widespread regulation
of mRNA metabolism through direct base pairing interactions
at transcriptional and post-transcriptional levels (Carthew and
Sontheimer, 2009). To better understand the regulatory roles
of miRNAs and other small RNAs in different tissues and
developmental stages, the expression profiles of small RNAs need
to be assessed.
techniques has revolutionized the identification of small RNAs with
particularly high levels of sensitivity and accuracy (Zhou et al.,
2011). Several tools for detecting non-coding RNA profiles
from NGS data have been published. For example, miRExpress
(Wang et al., 2009) is stand-alone software for detecting known
and novel miRNAs. miRanalyzer (Hackenberg et al.,
2009), which also offers a stand-alone version, is a web server tool
that can detect known and novel miRNAs, identify differentially
expressed miRNAs and predict miRNA targets. SeqBuster (Pantano
et al., 2010), offering a web-based toolkit and a stand-alone version,
focuses on detecting miRNA variants/isoforms for known miRNAs
and can also be used to identify differentially expressed miRNAs
and predict miRNA targets. There are several recent comprehensive
tools designed to analyse NGS data. mirTools (Zhu et al., 2010)
is a web-based tool designed to explore the genome map and
length distribution of short reads and to classify them into known
categories, to detect differentially expressed miRNAs and to predict
RNA expression and predicting novel non-coding RNAs. WapRNA
(Zhao et al., 2011) is used not only to detect miRNA expression
profiles from small RNA NGS data but also to analyse mRNA NGS
shown in Supplementary Table S1). Until now, none of the currently
available tools provides functional analysis for predicted targets of
miRNAs from NGS data, which could help users to find potentially
© The Author 2012. Published by Oxford University Press. All rights reserved. For Permissions, please email: email@example.com
at University of Science and Technology of China on July 31, 2012
Y.Zhang et al.
Fig. 1. The summary results of CPSS (details are shown in http://mcg.ustc.
features of previous tools with functional analysis for predicted
targets of miRNAs from NGS data, is still needed.
Herein, we present a novel and free web server, CPSS, which
integrates most functions of currently available bioinformatics tools
(Supplementary Table S1 and Supplementary Fig. S1). By using
CPSS, small RNA NGS data can be analysed systematically in
one platform after a single submission of data by integration of
annotation and functional analysis of novel and/or differentially
expressed miRNAs. CPSS generates an analysis report including:
(i) annotation analysis, which provides a comprehensive analysis of the
small RNA transcriptome, such as length distribution and genome
mapping of sequencing reads, small RNA annotation, prediction of
novel miRNAs, identification of differentially expressed miRNAs,
piRNAs and other non-coding small RNAs between paired samples
and detection of miRNA editing and modifications and (ii) functional
analysis, which provides the functional analysis of miRNAs, e.g.
predicting miRNA target genes by multiple tools, enriching gene
ontology terms, signalling pathway involvement and
protein–protein interaction (PPI) for the predicted genes (Fig. 1).
CPSS provides an easy-to-use interface, allowing users to
conveniently analyse the data of small RNA transcriptome from
NGS techniques. Users submit the input data in FASTA format (*.fa),
either as plain files or compressed in *.gz format. The FASTA
files can be generated from the raw data, which contain
very large numbers of reads produced by the Illumina Genome
Analyzer, 454 FLX instrument or SOLiD™ system. The overall
workflow of CPSS is shown in Supplementary Figure S2. First,
the remaining clean sequences in FASTA format filtered above
are classified into several categories, i.e. miRNA, piRNA, other
non-coding small RNAs, mRNA, genomic repeats, etc. Then,
the sequences that can be mapped to the reference genome but
cannot be assigned to any of the referred annotations are used to
mireap) or miRDeep (Friedlander et al., 2008), and their secondary
structures are predicted by RNAfold (http://rna.tbi.univie.ac.at/).
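The input handling and the length-distribution summary described above can be illustrated with a minimal sketch. It assumes one read per FASTA record and treats any path ending in .gz as gzip-compressed; the function names read_fasta and length_distribution are hypothetical and are not part of CPSS.

```python
import gzip
from collections import Counter


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain or gzip-compressed FASTA file.

    Assumption for this sketch: files ending in '.gz' are gzip-compressed.
    """
    opener = gzip.open if path.endswith(".gz") else open
    header, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts; emit the previous one if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
        # Emit the last record of the file.
        if header is not None:
            yield header, "".join(chunks)


def length_distribution(records):
    """Count how many reads have each sequence length."""
    return Counter(len(seq) for _, seq in records)
```

Calling length_distribution(read_fasta("reads.fa.gz")) would return a mapping from read length to read count. Small RNA libraries are usually inspected for peaks at the lengths typical of miRNAs and piRNAs, which is why the length distribution is a natural first summary.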
The potential target genes will be predicted for the most abundant
novel/known miRNAs from one sample and for all the differentially
expressed miRNAs from two paired samples automatically by eight
miRNA target prediction tools. For functional analysis of miRNAs,
the predicted targets are mapped to the GO annotation dataset
(Ashburner et al., 2000) and used to extract the enriched GO
processes using Fisher's exact test (enrichment ratio >2 and
to the signalling pathway annotation datasets from KEGG (Kanehisa
et al., 2010) and the PPI annotation dataset from STRING (Jensen et al.,
2009). (The details of the algorithm for each step are presented in the
Supplementary Material.) CPSS is ready-to-use for most users
without the need to change any of the default analysis parameters.
However, users can also modify most of the parameters according
to their advanced requirements. The final results are presented to
users as a graphical summary in a browser (Fig. 1), and detailed results
are saved in a *.gz file that can be downloaded from the server.
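The GO enrichment step described above (Fisher's exact test with an enrichment ratio cut-off of 2) can be sketched as follows. This is an illustration under stated assumptions, not the CPSS implementation: the test is taken to be one-sided (over-representation), the enrichment ratio is taken to be the fraction of predicted targets carrying a term divided by the fraction of background genes carrying it, and the function name go_enrichment is hypothetical. No P-value cut-off is applied in the sketch.

```python
from math import comb


def go_enrichment(k, n, K, N):
    """Return (enrichment_ratio, p_value) for one GO term.

    k: predicted targets annotated with the term
    n: all predicted targets
    K: background genes annotated with the term
    N: all background genes
    The p-value is the one-sided Fisher's exact test, i.e. the
    hypergeometric probability of observing k or more annotated targets.
    """
    if not (0 <= k <= n <= N and k <= K <= N) or n == 0 or K == 0:
        raise ValueError("inconsistent counts")
    ratio = (k / n) / (K / N)
    total = comb(N, n)
    # Sum the upper tail; comb() returns 0 when a draw is impossible.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return ratio, tail / total
```

For example, go_enrichment(8, 10, 20, 100) describes a term carried by 8 of 10 predicted targets but only 20 of 100 background genes, giving an enrichment ratio of 4. A term would then be kept only if it passes both the ratio cut-off and a chosen significance threshold.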
Currently, CPSS is able to handle the data from either one or two
when the job is done. Users can retrieve the analysis results from
the stored jobs with a unique ID generated randomly by the server
for each job. Most strikingly, the annotation and functional analysis
of novel and/or differentially expressed miRNAs from small RNA
NGS data can be completed in one platform, CPSS, after a single
submission of data.
To evaluate the performance of CPSS, several small RNA
sequencing datasets from our laboratory were tested. First, one sample
from human ovary was uploaded to CPSS (Zhang et al., 2011).
Standard protocols were used for small RNA preparation and
Illumina sequencing. In total, 8721844 clean reads were generated
for the sample. According to the workflow, we filtered and
annotated them using currently available databases (Supplementary
Table S2). Massively parallel pyrosequencing generated 11966289
non-redundant sequences from the human ovary, and 260 predicted
novel miRNAs were detected (Supplementary Fig. S3A). All the
annotation and analysis were completed in 30 min and the detailed
results are available from CPSS (Download ID = 6818289109).
Second, two samples of small RNA sequencing data from testes
of Spo11 knockout and wild-type mice were tested. Following
the workflow, all the small RNA sequences were also filtered and
annotated using CPSS based on the currently available databases,
and the significantly differentially expressed miRNAs and piRNAs
between the samples were detected and functional analysis of them
was completed (Supplementary Fig. S3B) within 1 h (Download ID
= 713628095). The differential expression of these miRNAs and
piRNAs (those expressed differentially based
on both total read counts and the most abundant unique tag) was further
validated by real-time polymerase chain reaction (PCR) (details
in Supplementary Methods), and a strong correlation for miRNA
levels was detected between deep sequencing and real-time PCR
(Supplementary Fig. S3C and D; R=0.947 and 0.916, respectively),
indicating the credibility and robustness of deep sequencing-based
expression analysis obtained from CPSS.
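The agreement between deep sequencing and real-time PCR reported above is summarized by a correlation coefficient R. A minimal sketch of such a comparison follows, assuming R is the Pearson correlation coefficient computed over paired expression values for the same miRNAs; the function name pearson_r is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)
```

In practice x would hold the sequencing-based expression changes and y the corresponding real-time PCR measurements, typically on a log scale so that a few highly abundant miRNAs do not dominate the coefficient.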
Funding: National Basic Research Program [2012CB944402,
2007CB947401, 2011CB944501] of China (973); Program of
Knowledge Innovation [KSCX2-EW-R-07] of Chinese Academy of
Conflict of Interest: none declared.
Ashburner,M. et al. (2000) Gene ontology: tool for the unification of biology. The gene
ontology consortium. Nat. Genet., 25, 25–29.
Carthew,R.W. and Sontheimer,E.J. (2009) Origins and mechanisms of miRNAs and
siRNAs. Cell, 136, 642–655.
sequencing experiments. Nucleic Acids Res., 39, 112–117.
Friedlander,M.R. et al. (2008) Discovering microRNAs from deep sequencing data
using miRDeep. Nat. Biotechnol., 26, 407–415.
Hackenberg,M. et al. (2009) miRanalyzer: a microRNA detection and analysis
tool for next-generation sequencing experiments. Nucleic Acids Res., 37,
Jensen,L.J. et al. (2009) STRING 8—a global view on proteins and their functional
interactions in 630 organisms. Nucleic Acids Res., 37, D412–D416.
Kanehisa,M. et al. (2010) KEGG for representation and analysis of molecular networks
involving diseases and drugs. Nucleic Acids Res., 38, D355–D360.
Moazed,D. (2009) Small RNAs in transcriptional gene silencing and genome defence.
Nature, 457, 413–420.
Pantano,L. et al. (2010) SeqBuster, a bioinformatic tool for the processing and
embryonic cells. Nucleic Acids Res., 38, e34.
Wang,W.C. et al. (2009) miRExpress: analyzing high-throughput sequencing data for
profiling microRNA expression. BMC Bioinformatics, 10, 328.
Zhang,Y.W. et al. (2011) Prediction of novel pre-microRNAs with high accuracy
through boosting and SVM. Bioinformatics, 27, 1436–1437.
Zhao,W. et al. (2011) wapRNA: a web-based application for the processing of RNA
sequences. Bioinformatics, 27, 3076–3077.
Zhou,L. et al. (2011) Small RNA transcriptome investigation based on next-generation
sequencing technology. J. Genet. Genomics, 38, 505–513.
Zhu,E. et al. (2010) mirTools: microRNA profiling and discovery based on high-
throughput sequencing. Nucleic Acids Res., 38, W392–W397.