RNA-SeQC: RNA-seq metrics for quality control and process optimization

Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Bioinformatics (Impact Factor: 4.62). 04/2012; 28(11):1530-2. DOI: 10.1093/bioinformatics/bts196
Source: PubMed

ABSTRACT
Summary: RNA-seq, the application of next-generation sequencing to RNA, provides transcriptome-wide characterization of cellular activity. Assessment of sequencing performance and library quality is critical to the interpretation of RNA-seq data, yet few tools exist to address this issue. We introduce RNA-SeQC, a program which provides key measures of data quality. These metrics include yield, alignment and duplication rates; GC bias, rRNA content, regions of alignment (exon, intron and intragenic), continuity of coverage, 3′/5′ bias and count of detectable transcripts, among others. The software provides multi-sample evaluation of library construction protocols, input materials and other experimental parameters. The modularity of the software enables pipeline integration and the routine monitoring of key measures of data quality such as the number of alignable reads, duplication rates and rRNA contamination. RNA-SeQC allows investigators to make informed decisions about sample inclusion in downstream analysis. In summary, RNA-SeQC provides quality control measures critical to experiment design, process optimization and downstream computational analysis.
Availability and implementation: See to run online, or for a command line tool.
Contact: ddeluca@broadinstitute.org
Supplementary information: Supplementary data are available at Bioinformatics online.
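
The rate-style metrics named in the abstract (alignment, duplication, rRNA content, exonic and intronic alignment) can be illustrated with a minimal Python sketch. This is not RNA-SeQC's implementation: the field names, the choice of denominators and the pass/fail thresholds are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ReadCounts:
    """Per-sample read tallies (hypothetical field names)."""
    total: int       # all reads sequenced (yield)
    mapped: int      # reads with an alignment
    duplicates: int  # mapped reads flagged as duplicates
    rrna: int        # reads aligned to rRNA regions
    exonic: int      # mapped reads falling within exons
    intronic: int    # mapped reads falling within introns


def safe_ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


def qc_rates(c: ReadCounts) -> Dict[str, float]:
    """Derive rate metrics from raw counts.

    Denominators are an assumption: duplication and region rates are
    taken relative to mapped reads, the others relative to total reads.
    """
    return {
        "mapping_rate": safe_ratio(c.mapped, c.total),
        "duplication_rate": safe_ratio(c.duplicates, c.mapped),
        "rrna_rate": safe_ratio(c.rrna, c.total),
        "exonic_rate": safe_ratio(c.exonic, c.mapped),
        "intronic_rate": safe_ratio(c.intronic, c.mapped),
    }


def passes_qc(rates: Dict[str, float],
              min_mapping: float = 0.7,
              max_duplication: float = 0.5,
              max_rrna: float = 0.1) -> bool:
    """Apply illustrative (hypothetical) thresholds for sample inclusion."""
    return (rates["mapping_rate"] >= min_mapping
            and rates["duplication_rate"] <= max_duplication
            and rates["rrna_rate"] <= max_rrna)


if __name__ == "__main__":
    sample = ReadCounts(total=1000, mapped=800, duplicates=200,
                        rrna=50, exonic=600, intronic=120)
    rates = qc_rates(sample)
    for name, value in rates.items():
        print(f"{name}: {value:.3f}")
    print("include sample:", passes_qc(rates))
```

In practice the thresholds would be tuned per protocol and tissue; the point is only that each sample reduces to a small set of comparable rates that can be monitored across a batch.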



Available from: David S Deluca, Jul 26, 2015
  • Source
    • "0.11; (Kim et al., 2013)), transcript structure and abundance were estimated using Cufflinks (ver. 2.1.1; (Trapnell et al., 2010)), and differential expression analysis was performed using Cuffdiff (ver. 2.1.1; (Trapnell et al., 2013)). Quality control analysis was performed using RNA-SeQC (ver. 1.1.7; (DeLuca et al., 2012)). The cummeRbund package (ver. 2.4.1; (Trapnell et al., 2012)) for R (ver. 3.0.2) was used for data visualization. Differential expression analysis was performed for the four donor samples Table 3 Top 40 differentially expressed genes between temporal retina vs. macular retina with q-value < 0.001 and abs"
    ABSTRACT: We examined gene expression in temporal, macular, and nasal regions of human retina and retinal pigment epithelium (RPE)/choroid using RNA-Seq. Expression differences between macula and periphery (nasal and temporal regions) in both tissues reflect the distribution of cell types. Nasal and temporal regions of neural retina are indistinguishable in our analysis.
    Experimental Eye Research 11/2014; 129. DOI:10.1016/j.exer.2014.11.001 · 3.02 Impact Factor
  • Source
    • "The mapped reads are available as .sam or .bam files for each sample, which needs to be quality controlled because some issues only appear after the mapping/alignment of reads are finished. Running the after-alignment/mapping quality control (e.g. using software such as RNA-SeQC, DeLuca et al., 2012) or Qualimap (http://qualimap.bioinfo. "
    ABSTRACT: Livestock genomics has gone through a paradigm shift since the advent of genome sequencing that includes Genome-Wide Association Studies (GWAS), Whole Genome Predictions (WGP) and Genomic Selection (GS). Beginning with a brief review of current progress and challenges in livestock GWAS, WGP and GS, opportunities for next generation methods are introduced that unravel the underlying systems genetics of complex traits and provide biologically meaningful and accurate predictions. Genome-Wide Epistasis Association (GWEA) and Weighted Interaction SNP Hub (WISH) network methods are introduced here to unravel complex trait genetics. These methods effectively address the problems of GWAS that have no ability to model and analyze genome-wide genetic interactions and thus do not capture any epistatic variance that could explain part of the missing heritability. Further, the Systems genomic BLUP (sgBLUP) prediction method is introduced in this paper as a next generation WGP or GS tool that can account for and differentiate SNPs with known biological roles in the phenotypic or disease outcomes and potentially increase the accuracy of prediction. It is emphasized that tools that link genetic variants to their functions, pathways and other biological roles will become even more important in the future. These tools include FunctSNP, Postgwas and NCBI2R which are briefly discussed. Genome-Wide Gene Expression (Transcriptomics) analyses using RNAseq technology are briefly discussed with some examples including results from our own pig experiments. In the last part of this review, systems genetics and systems biology approaches are introduced that involve joint modeling and analyses of multi-omics data types from genomics through transcriptomics (microarray and RNAseq), metabolomics to proteomics. 
It is shown using published studies that these systems approaches are valuable and powerful compared to stand-alone genomic methods in identifying key causal and highly predictive genetic variants for complex traits as well as in building up complex genetic regulatory networks. In all sections, some applications of next generation –omics methods in livestock species (e.g. feed efficiency, growth, weight gain, fertility and disease resistance in cattle, pigs and sheep) are provided with references to relevant software and tools. In conclusion, this paper reviewed the current progress, lessons and challenges in livestock genomics and its ongoing transition to and opportunities for integrative systems genetics and systems biology in animal and veterinary sciences. Most of these integrative systems genetics and systems biology tools and methods presented here are equally applicable to plant and human genetics and systems biology.
    Livestock Science 08/2014; 166. DOI:10.1016/j.livsci.2014.04.028 · 1.10 Impact Factor
  • Source
    • "
      Resource | Description | References
      GENCODE | Annotates gene-based features including alternatively transcribed variants | PMID: 22955987 Harrow et al. (2012)
      ENCODE | Catalogs all functional elements in the genome | PMID: 15499007 ENCODE Project Consortium (2004)
      Ensembl | Produces genome databases for vertebrates and other eukaryotic species | PMID: 11752248 Hubbard et al. (2002)
      RefSeq | Provides reference sequence standards for genomes, transcripts and proteins | PMID: 11125071 Pruitt and Maglott (2001)
      GTEx | Provides a resource database and tissue bank to investigate relationship between genetic variation and gene expression in human tissues | PMID: 23715323 (GTEx Consortium 2013)
      Illumina Human Body Map | Provides reference RNA-Seq data in 16 different human tissues | PMID: 22496456 Asmann et al. (2012)
      RNA-SeQC | Generates quality control (QC) metrics for RNA-Seq data | PMID: 22539670 DeLuca et al. (2012)
      htSeqTools | Provides quality assessment and visualization of high-throughput data in the Bioconductor environment | PMID: 22199381 Planet et al. (2012)
      TopHat and Cufflinks | Aligns reads, identifies splice sites, and performs differential expression analysis of RNA-Seq data | PMID: 22383036 Trapnell et al. (2012)
      DESeq and edgeR | Facilitates differential expression analysis of RNA-Seq data using R and Bioconductor | PMID: 23975260 Anders et al. (2013)
      AStalavista | Extracts and displays alternative splicing events | PMID: 17485470 Foissac and Sammeth (2007)
      ECgene | Provides annotation for gene structure, function and expression using alternative splicing | PMID: 15608289 Kim et al. (2005)
      MISO/sashimi_plot | Quantifies alternatively spliced genes from RNA-Seq and provides visualization | PMID: 21057496 Katz et al. (2010)
      Integrated Genomics Viewer | Facilitates visualization of genomics high-throughput data | PMID: 21221095 Robinson et al. (2011)
      RESCUE-ESE | Identifies sequences with exonic splicing enhancer activity | PMID: 12114529 Fairbrother et al. (2002)
      FAS-ESS | Identifies sequences with exonic splicing silencer activity | PMID: 15607979 Wang et al. (2004)

      expression as eQTLs (Coulombe-Huntington et al. 2009). Variation in alternative splicing is highly heritable, with family-based linkage analysis demonstrating that transcript isoforms of a variety of genes undergo Mendelian inheritance and segregation (Kwan et al. 2007). "
    ABSTRACT: Alternative splicing is a major cellular mechanism in metazoans for generating proteomic diversity. A large proportion of protein-coding genes in multicellular organisms undergo alternative splicing, and in humans, it has been estimated that nearly 90 % of protein-coding genes-much larger than expected-are subject to alternative splicing. Genomic analyses of alternative splicing have illuminated its universal role in shaping the evolution of genomes, in the control of developmental processes, and in the dynamic regulation of the transcriptome to influence phenotype. Disruption of the splicing machinery has been found to drive pathophysiology, and indeed reprogramming of aberrant splicing can provide novel approaches to the development of molecular therapy. This review focuses on the recent progress in our understanding of alternative splicing brought about by the unprecedented explosive growth of genomic data and highlights the relevance of human splicing variation on disease and therapy.
    Human Genetics 06/2014; 133(6). DOI:10.1007/s00439-013-1411-3 · 4.52 Impact Factor