RIPSeeker: A statistical package for identifying protein-associated transcripts from RIP-seq experiments

Department of Computer Science, University of Toronto, Toronto, Ontario M5S 2E4, Canada, The Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada, Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A4, Canada and Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada.
Nucleic Acids Research (Impact Factor: 9.11). 02/2013; 41(8). DOI: 10.1093/nar/gkt142
Source: PubMed


RIP-seq has recently been developed to discover genome-wide RNA transcripts that interact with a protein or protein complex. RIP-seq is similar to both RNA-seq and ChIP-seq, but presents unique properties and challenges. Currently, no statistical tool is dedicated to RIP-seq analysis. We developed RIPSeeker, a free open-source Bioconductor/R package for de novo RIP peak prediction based on a hidden Markov model (HMM). To demonstrate the utility of the software package, we applied RIPSeeker and six other published programs to three independent RIP-seq datasets and two PAR-CLIP datasets corresponding to six distinct RNA-binding proteins. Based on receiver operating characteristic curves, RIPSeeker demonstrates superior sensitivity and specificity in discriminating high-confidence peaks that are consistently agreed on by a majority of the comparison methods, and it dominated 9 of the 12 evaluations, averaging 80% area under the curve. The peaks from RIPSeeker are further confirmed by their significant enrichment for biologically meaningful genomic elements and published sequence motifs, and by their association with canonical transcripts known to interact with the proteins examined. While RIPSeeker is specifically tailored for RIP-seq data analysis, it also provides a suite of bioinformatics tools integrated within a self-contained software package, comprehensively addressing issues ranging from post-alignment processing to visualization and annotation.
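The idea of HMM-based peak calling named in the abstract can be illustrated roughly as follows. This is a minimal sketch and not RIPSeeker's actual implementation: the Poisson emissions, the fixed rates (`lam_bg`, `lam_rip`), the self-transition probability `p_stay`, the 100-nt bin size and all function names are assumptions chosen for illustration, since the abstract states only that prediction is based on an HMM.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi_two_state(counts, lam_bg=2.0, lam_rip=15.0, p_stay=0.9):
    """Most likely state path over binned read counts.

    States: 0 = background, 1 = enriched (assumed labels).
    All parameters are fixed by hand here; a real tool would estimate them.
    """
    if not counts:
        return []
    lams = (lam_bg, lam_rip)
    log_stay = math.log(p_stay)
    log_switch = math.log(1.0 - p_stay)
    # Uniform initial state distribution (an assumption).
    score = [math.log(0.5) + poisson_logpmf(counts[0], lam) for lam in lams]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for k in counts[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            stay = score[s] + log_stay
            switch = score[1 - s] + log_switch
            pointers.append(s if stay >= switch else 1 - s)
            new_score.append(max(stay, switch) + poisson_logpmf(k, lams[s]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def call_peaks(path, bin_size=100):
    """Merge consecutive enriched bins into (start, end) intervals."""
    peaks, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s == 0 and start is not None:
            peaks.append((start * bin_size, i * bin_size))
            start = None
    if start is not None:
        peaks.append((start * bin_size, len(path) * bin_size))
    return peaks


# Toy read counts per bin (made-up numbers, for illustration only).
counts = [1, 3, 2, 0, 14, 18, 16, 2, 1, 3]
path = viterbi_two_state(counts)
peaks = call_peaks(path)
print(path)
print(peaks)
```

Decoding a state path and then merging adjacent enriched bins is one common way to turn per-bin HMM states into peak intervals; the toy counts above are invented and carry no biological meaning.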

Available from: Yue Li, Oct 17, 2014

  • ABSTRACT: Unlike DNA, RNA abundances can vary over several orders of magnitude. Identifying RNA-protein binding sites from high-throughput sequencing data therefore presents unique challenges. While peak identification in ChIP-Seq data has been extensively explored, there are few bioinformatics tools tailored for peak calling on analogous datasets for RNA-binding proteins. Here we describe ASPeak, an implementation of an algorithm that we previously applied to detect peaks in Exon Junction Complex (EJC) RNA immunoprecipitation in tandem (RIPiT) experiments. Our peak detection algorithm yields stringent and robust target sets, enabling sensitive motif finding and downstream functional analyses. ASPeak is implemented in Perl as a complete pipeline that takes bedGraph files as input. The ASPeak implementation is freely available under the GNU General Public License. ASPeak can be run on a personal computer, yet is designed to be easily parallelizable. ASPeak can also run on high-performance computing clusters, providing efficient speedup. The documentation and user manual can be obtained from;
    Bioinformatics 08/2013; 29(19). DOI:10.1093/bioinformatics/btt428 · 4.98 Impact Factor
  • ABSTRACT: Despite extensive study of DNA- and chromatin-related epigenetics, such as histone modifications and DNA methylation, RNA epigenetics has not received the attention it deserves, owing to the lack of a high-throughput approach for profiling the epitranscriptome. Recently, a new affinity-based sequencing approach, MeRIPseq, was developed and applied to survey global mRNA N6-methyladenosine (m6A) in mammalian cells. As a marriage of ChIPseq and RNAseq, MeRIPseq has the potential to study, for the first time, the transcriptome-wide distribution of different types of post-transcriptional RNA modifications. Yet this technology introduces new computational challenges that have not been adequately addressed. We have previously developed a MATLAB-based package 'exomePeak' for detection of RNA methylation sites from MeRIPseq data. Here, we extend the features of exomePeak by including a novel computational framework that enables differential analysis to unveil the dynamics in RNA epigenetic regulation. The novel differential analysis monitors the percentage of modified RNA molecules among the total transcribed RNAs, which directly reflects the impact of RNA epigenetic regulation. In contrast, currently available software packages developed for sequencing-based differential analysis, such as DESeq or edgeR, monitor changes in the absolute amount of molecules and, if applied to MeRIPseq data, might be dominated by differential gene expression at the transcriptional level. The algorithm is implemented as an R package 'exomePeak' and is freely available. It takes aligned BAM files directly as input, statistically supports biological replicates, corrects PCR artifacts, and outputs exome-based results in BED format, which is compatible with all major genome browsers for convenient visualization and manipulation. Examples are also provided to show how the exomePeak R package is integrated with existing tools for MeRIPseq-based peak calling and differential analysis. In particular, the rationales behind each processing step, the specific method used, the best practice, and possible alternative strategies are briefly discussed. The algorithm was applied to human HepG2 cell MeRIPseq data sets and detected more than 16000 RNA m6A sites, many of which are differentially methylated under ultraviolet radiation. The challenges and potential of MeRIPseq in epitranscriptome studies are discussed at the end.
    2013 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 12/2013
  • ABSTRACT: CLIP-seq is widely used to study genome-wide interactions between RNA-binding proteins and RNAs. However, there are few tools available to analyze CLIP-seq data, thus creating a bottleneck to the implementation of this methodology. Here, we present PIPE-CLIP, a Galaxy framework-based comprehensive online pipeline for reliable analysis of data generated by three types of CLIP-seq protocol: HITS-CLIP, PAR-CLIP and iCLIP. PIPE-CLIP provides both data processing and statistical analysis to determine candidate cross-linking regions, which are comparable to those regions identified from the original studies or using existing computational tools. PIPE-CLIP is available at
    Genome biology 01/2014; 15(1):R18. DOI:10.1186/gb-2014-15-1-r18 · 10.81 Impact Factor