RIPSeeker: A statistical package for identifying protein-associated transcripts from RIP-seq experiments

Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada; The Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A4, Canada; and Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada.
Nucleic Acids Research (Impact Factor: 9.11). 02/2013; 41(8). DOI: 10.1093/nar/gkt142
Source: PubMed


RIP-seq has recently been developed to discover, genome-wide, the RNA transcripts that interact with a protein or protein complex. RIP-seq is similar to both RNA-seq and ChIP-seq but presents unique properties and challenges. Currently, no statistical tool is dedicated to RIP-seq analysis. We developed RIPSeeker, a free open-source Bioconductor/R package for de novo prediction of RIP peaks based on a hidden Markov model (HMM). To demonstrate the utility of the package, we applied RIPSeeker and six other published programs to three independent RIP-seq datasets and two PAR-CLIP datasets corresponding to six distinct RNA-binding proteins. Based on receiver operating characteristic (ROC) curves, RIPSeeker demonstrates superior sensitivity and specificity in discriminating high-confidence peaks, defined as those consistently agreed on by a majority of the comparison methods; it performed best in 9 of the 12 evaluations, with an average area under the curve (AUC) of 80%. The RIPSeeker peaks are further supported by their significant enrichment for biologically meaningful genomic elements and published sequence motifs, and by their association with canonical transcripts known to interact with the proteins examined. Although RIPSeeker is tailored specifically to RIP-seq data analysis, it also provides a suite of bioinformatics tools integrated within a self-contained package that addresses issues ranging from post-alignment processing to visualization and annotation.
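The abstract states only that RIPSeeker calls peaks de novo with an HMM; it does not spell out the model. As a rough illustration of the general idea, the sketch below decodes binned read counts with a generic two-state HMM (background vs. enriched) using the Viterbi algorithm, then merges consecutive enriched bins into peaks. Everything here is an assumption for illustration: the Poisson emissions, the hand-fixed means and transition probabilities, the bin size, and the hypothetical function names `viterbi_two_state` and `call_peaks`. It is not RIPSeeker's R implementation, and a real tool would estimate the parameters from the data rather than fixing them.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical two-state model (assumption, not RIPSeeker's actual model):
# state 0 = background, state 1 = enriched (candidate RIP peak).
# Each genomic bin's read count is modeled as Poisson with a state-specific mean.


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of observing k reads in a bin under Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_two_state(counts: Sequence[int],
                      means: Sequence[float] = (1.0, 10.0),
                      stay_prob: float = 0.95,
                      init: Sequence[float] = (0.9, 0.1)) -> List[int]:
    """Return the most likely background(0)/enriched(1) state path for binned counts."""
    if not counts:
        return []
    n_states = 2
    # Symmetric transitions: remain in the same state with probability stay_prob.
    log_trans = [[math.log(stay_prob if i == j else 1.0 - stay_prob)
                  for j in range(n_states)] for i in range(n_states)]
    # score[s] = best log-probability of any path ending in state s at the current bin.
    score = [math.log(init[s]) + poisson_logpmf(counts[0], means[s])
             for s in range(n_states)]
    backptr: List[List[int]] = []
    for k in counts[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            candidates = [score[p] + log_trans[p][s] for p in range(n_states)]
            best_prev = max(range(n_states), key=candidates.__getitem__)
            ptr.append(best_prev)
            new_score.append(candidates[best_prev] + poisson_logpmf(k, means[s]))
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=score.__getitem__)
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def call_peaks(path: Sequence[int], bin_size: int = 100) -> List[Tuple[int, int]]:
    """Merge runs of enriched bins into half-open (start, end) coordinates."""
    peaks, start = [], None
    for i, state in enumerate(path):
        if state == 1 and start is None:
            start = i
        elif state != 1 and start is not None:
            peaks.append((start * bin_size, i * bin_size))
            start = None
    if start is not None:
        peaks.append((start * bin_size, len(path) * bin_size))
    return peaks


if __name__ == "__main__":
    counts = [0, 1, 2, 0, 12, 15, 9, 11, 1, 0]  # toy per-bin read counts
    path = viterbi_two_state(counts)
    print(path)              # [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
    print(call_peaks(path))  # [(400, 800)]
```

The transition penalty for switching states (here log(0.05) vs. log(0.95)) is what keeps isolated noisy bins from being called as peaks; a single high bin must outweigh two switch penalties before the decoder labels it enriched.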

    ABSTRACT: Unlike DNA, RNA abundances can vary over several orders of magnitude, so identifying RNA-protein binding sites from high-throughput sequencing data presents unique challenges. While peak identification in ChIP-Seq data has been extensively explored, few bioinformatics tools are tailored for peak calling on analogous datasets for RNA-binding proteins. Here we describe ASPeak, an implementation of an algorithm that we previously applied to detect peaks in Exon Junction Complex (EJC) RNA immunoprecipitation in tandem (RIPiT) experiments. Our peak detection algorithm yields stringent and robust target sets, enabling sensitive motif finding and downstream functional analyses. ASPeak is implemented in Perl as a complete pipeline that takes bedGraph files as input, and it is freely available under the GNU General Public License. ASPeak can run on a personal computer but is designed to be easily parallelized, and it achieves efficient speedup on high-performance computing clusters. The documentation and user manual can be obtained from;
    Bioinformatics 08/2013; 29(19). DOI:10.1093/bioinformatics/btt428 · 4.98 Impact Factor
    ABSTRACT: CLIP-seq is widely used to study genome-wide interactions between RNA-binding proteins and RNAs. However, there are few tools available to analyze CLIP-seq data, thus creating a bottleneck to the implementation of this methodology. Here, we present PIPE-CLIP, a Galaxy framework-based comprehensive online pipeline for reliable analysis of data generated by three types of CLIP-seq protocol: HITS-CLIP, PAR-CLIP and iCLIP. PIPE-CLIP provides both data processing and statistical analysis to determine candidate cross-linking regions, which are comparable to those regions identified from the original studies or using existing computational tools. PIPE-CLIP is available at
    Genome Biology 01/2014; 15(1):R18. DOI:10.1186/gb-2014-15-1-r18 · 10.81 Impact Factor
    ABSTRACT: A number of long noncoding RNAs (lncRNAs) have been identified by deep sequencing methods, but molecular and cellular functions are known for only a limited number of them. Current lncRNA databases mostly serve cataloguing purposes and do not provide the in-depth information required to infer function, so a comprehensive resource on lncRNA function is urgently needed. We present a database for functional investigation of lncRNAs that encompasses annotation, sequence analysis, gene expression, protein binding, and phylogenetic conservation. We compiled lncRNAs for six species (human, mouse, zebrafish, fruit fly, worm, yeast) from ENSEMBL, HGNC, MGI, and lncRNAdb. Each lncRNA was analyzed for coding potential and for phylogenetic conservation in different lineages. Gene expression data from 208 RNA-Seq studies (4995 samples), collected from the GEO, ENCODE, modENCODE, and TCGA databases, were used to provide expression profiles across tissues, diseases, and developmental stages. Importantly, we analyzed RNA-Seq data to identify co-expressed mRNAs that can provide insight into lncRNA functions; the resulting gene list can be subjected to enrichment analysis such as Gene Ontology or KEGG pathways. Furthermore, we compiled protein-lncRNA interactions by collecting and analyzing publicly available CLIP-seq or PAR-CLIP sequencing data. Finally, we explored evolutionarily conserved lncRNAs with correlated expression between human and six other organisms to identify functional lncRNAs. All content is provided through a user-friendly web interface. lncRNAtor is available at
    Bioinformatics 05/2014; 30(17). DOI:10.1093/bioinformatics/btu325 · 4.98 Impact Factor