Hybridization-based reconstruction of small non-coding RNA transcripts from deep sequencing data.
ABSTRACT Recent advances in RNA sequencing technology (RNA-Seq) enables comprehensive profiling of RNAs by producing millions of short sequence reads from size-fractionated RNA libraries. Although conventional tools for detecting and distinguishing non-coding RNAs (ncRNAs) from reference-genome data can be applied to sequence data, ncRNA detection can be improved by harnessing the full information content provided by this new technology. Here we present NorahDesk, the first unbiased and universally applicable method for small ncRNAs detection from RNA-Seq data. NorahDesk utilizes the coverage-distribution of small RNA sequence data as well as thermodynamic assessments of secondary structure to reliably predict and annotate ncRNA classes. Using publicly available mouse sequence data from brain, skeletal muscle, testis and ovary, we evaluated our method with an emphasis on the performance for microRNAs (miRNAs) and piwi-interacting small RNA (piRNA). We compared our method with Dario and mirDeep2 and found that NorahDesk produces longer transcripts with higher read coverage. This feature makes it the first method particularly suitable for the prediction of both known and novel piRNAs.