RNA-SeQC: RNA-Seq metrics for quality control and process optimization

Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Bioinformatics (Impact Factor: 4.98). 04/2012; 28(11):1530-2. DOI: 10.1093/bioinformatics/bts196
Source: PubMed


Summary: RNA-seq, the application of next-generation sequencing to RNA, provides transcriptome-wide characterization of cellular activity. Assessment of sequencing performance and library quality is critical to the interpretation of RNA-seq data, yet few tools exist to address this issue. We introduce RNA-SeQC, a program which provides key measures of data quality. These metrics include yield, alignment and duplication rates; GC bias, rRNA content, regions of alignment (exon, intron and intragenic), continuity of coverage, 3′/5′ bias and count of detectable transcripts, among others. The software provides multi-sample evaluation of library construction protocols, input materials and other experimental parameters. The modularity of the software enables pipeline integration and the routine monitoring of key measures of data quality such as the number of alignable reads, duplication rates and rRNA contamination. RNA-SeQC allows investigators to make informed decisions about sample inclusion in downstream analysis. In summary, RNA-SeQC provides quality control measures critical to experiment design, process optimization and downstream computational analysis.

Availability and implementation: See www.genepattern.org to run online, or www.broadinstitute.org/rna-seqc/ for a command line tool.

Contact: ddeluca@broadinstitute.org

Supplementary information: Supplementary data are available at Bioinformatics online.

Available from: David S Deluca,
  • Source
    • "The quality of the RNA-seq samples was verified with RNASseQC version 1.17 [19] "
    ABSTRACT: Boehringer Ingelheim uses two CHO-DG44 lines for manufacturing biotherapeutics, BI-HEX-1 and BI-HEX-2, which produce distinct cell type-specific antibody glycosylation patterns. A recently established CHO-K1 descended host, BI-HEX-K1, generates antibodies with glycosylation profiles differing from CHO-DG44. Manufacturing process development is significantly influenced by these unique profiles. To investigate the underlying glycosylation related gene expression, we leveraged our CHO host and production cell RNA-seq transcriptomics and product quality database together with the CHO-K1 genome. We observed that each BI-HEX host and antibody producing cell line has a unique gene expression fingerprint. CHO-DG44 cells only transcribe Fut10, Gfpt2 and ST8Sia6 when expressing antibodies. BI-HEX-K1 cells express ST8Sia6 at host cell level. We detected a link between BI-HEX-1/BI-HEX-2 antibody galactosylation and mannosylation and the gene expression of the B4galt gene family and genes controlling mannose processing. Furthermore, we found major differences between the CHO-DG44 and CHO-K1 lineages in the expression of sialyl transferases and enzymes synthesizing sialic acid precursors, providing a rationale for the lack of immunogenic NeuGc/NGNA synthesis in CHO. Our study highlights the value of systems biotechnology to understand glycoprotein synthesis and product glycoprofiles. Such data improve future production clone selection and process development strategies for better steering of biotherapeutic product quality. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
    Biotechnology Journal 07/2015; 10(9). DOI:10.1002/biot.201400652 · 3.49 Impact Factor
  • Source
    • "0.11; (Kim et al., 2013)), transcript structure and abundance were estimated using Cufflinks (ver. 2.1.1; (Trapnell et al., 2010)), and differential expression analysis was performed using Cuffdiff (ver. 2.1.1; (Trapnell et al., 2013)). Quality control analysis was performed using RNA-SeQC (ver. 1.1.7; (DeLuca et al., 2012)). The cummeRbund package (ver. 2.4.1; (Trapnell et al., 2012)) for R (ver. 3.0.2) was used for data visualization. Differential expression analysis was performed for the four donor samples Table 3 Top 40 differentially expressed genes between temporal retina vs. macular retina with q-value < 0.001 and abs"
    ABSTRACT: We examined gene expression in temporal, macular, and nasal regions of human retina and retinal pigment epithelium (RPE)/choroid using RNA-Seq. Expression differences between macula and periphery (nasal and temporal regions) in both tissues reflect the distribution of cell types. Nasal and temporal regions of neural retina are indistinguishable in our analysis.
    Experimental Eye Research 11/2014; 129. DOI:10.1016/j.exer.2014.11.001 · 2.71 Impact Factor
  • Source
    • "The mapped reads are available as .sam or .bam files for each sample, which needs to be quality controlled because some issues only appear after the mapping/alignment of reads are finished. Running the after-alignment/mapping quality control (e.g. using software such as RNA-SeQC, DeLuca et al., 2012) or Qualimap (http://qualimap.bioinfo.cipf.es/) "
    ABSTRACT: Livestock genomics has gone through a paradigm shift since the advent of genome sequencing that includes Genome-Wide Association Studies (GWAS), Whole Genome Predictions (WGP) and Genomic Selection (GS). Beginning with a brief review of current progress and challenges in livestock GWAS, WGP and GS, opportunities for next generation methods are introduced that unravel the underlying systems genetics of complex traits and provide biologically meaningful and accurate predictions. Genome-Wide Epistasis Association (GWEA) and Weighted Interaction SNP Hub (WISH) network methods are introduced here to unravel complex trait genetics. These methods effectively address the problems of GWAS that have no ability to model and analyze genome-wide genetic interactions and thus do not capture any epistatic variance that could explain part of the missing heritability. Further, the Systems genomic BLUP (sgBLUP) prediction method is introduced in this paper as a next generation WGP or GS tool that can account for and differentiate SNPs with known biological roles in the phenotypic or disease outcomes and potentially increase the accuracy of prediction. It is emphasized that tools that link genetic variants to their functions, pathways and other biological roles will become even more important in the future. These tools include FunctSNP, Postgwas and NCBI2R which are briefly discussed. Genome-Wide Gene Expression (Transcriptomics) analyses using RNAseq technology are briefly discussed with some examples including results from our own pig experiments. In the last part of this review, systems genetics and systems biology approaches are introduced that involve joint modeling and analyses of multi-omics data types from genomics through transcriptomics (microarray and RNAseq), metabolomics to proteomics. 
    It is shown using published studies that these systems approaches are valuable and powerful compared to stand-alone genomic methods in identifying key causal and highly predictive genetic variants for complex traits as well as in building up complex genetic regulatory networks. In all sections, some applications of next generation –omics methods in livestock species (e.g. feed efficiency, growth, weight gain, fertility and disease resistance in cattle, pigs and sheep) are provided with references to relevant software and tools. In conclusion, this paper reviewed the current progress, lessons and challenges in livestock genomics and its ongoing transition to and opportunities for integrative systems genetics and systems biology in animal and veterinary sciences. Most of these integrative systems genetics and systems biology tools and methods presented here are equally applicable to plant and human genetics and systems biology.
    Livestock Science 08/2014; 166(1). DOI:10.1016/j.livsci.2014.04.028 · 1.17 Impact Factor