Synthetic spike-in standards for RNA-seq experiments

Section of Developmental Genomics, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA.
Genome Research (Impact Factor: 14.63). 08/2011; 21(9):1543-51. DOI: 10.1101/gr.121095.111
Source: PubMed


High-throughput sequencing of cDNA (RNA-seq) is a widely deployed transcriptome profiling and annotation technique, but questions about the performance of different protocols and platforms remain. We used a newly developed pool of 96 synthetic RNAs with various lengths and GC content, covering a 2^20 concentration range, as spike-in controls to measure sensitivity, accuracy, and biases in RNA-seq experiments as well as to derive standard curves for quantifying the abundance of transcripts. We observed linearity between read density and RNA input over the entire detection range and excellent agreement between replicates, but we observed significantly larger imprecision than expected under pure Poisson sampling errors. We use the control RNAs to directly measure reproducible protocol-dependent biases due to GC content and transcript length as well as stereotypic heterogeneity in coverage across transcripts correlated with position relative to RNA termini and priming sequence bias. These effects lead to biased quantification for short transcripts and individual exons, which is a serious problem for measurements of isoform abundances, but one that can be partially corrected using appropriate models of bias. By using the control RNAs, we derive limits for the discovery and detection of rare transcripts in RNA-seq experiments. By using data collected as part of the model organism and human Encyclopedia of DNA Elements projects (ENCODE and modENCODE), we demonstrate that external RNA controls are a useful resource for evaluating sensitivity and accuracy of RNA-seq experiments for transcriptome discovery and quantification. These quality metrics facilitate comparable analysis across different samples, protocols, and platforms.
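
The standard-curve idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that read density is linear in spike-in input on a log2-log2 scale, fits that line by ordinary least squares, and inverts it to estimate the abundance of an unknown transcript. The function names, the units, and the input values are hypothetical and made up for illustration; they are not data from the paper.

```python
import math


def fit_standard_curve(concentrations, read_densities):
    """Least-squares fit of log2(density) = slope * log2(conc) + intercept."""
    xs = [math.log2(c) for c in concentrations]
    ys = [math.log2(d) for d in read_densities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(read_density, slope, intercept):
    """Invert the fitted curve to get an input concentration estimate."""
    return 2 ** ((math.log2(read_density) - intercept) / slope)


# Toy spike-in inputs spanning a 2^20 range (arbitrary units, made up).
spike_conc = [2 ** k for k in range(0, 21, 4)]
# Toy read densities that are exactly proportional to input (made up).
spike_density = [0.5 * c for c in spike_conc]

slope, intercept = fit_standard_curve(spike_conc, spike_density)
estimate = estimate_concentration(64.0, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} estimate={estimate:.1f}")
```

A slope near 1 on the log2-log2 scale corresponds to the linearity the abstract reports. Real spike-in data would scatter around the line, and the abstract notes that the scatter exceeds what pure Poisson sampling predicts.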

    • "To interrogate biological variability, it is vital to accurately estimate and then account for technical variability. The most widely used approach to quantify technical variability is to use external spike-in RNA molecules (e.g., the ERCC RNA spike-in mix), which can be added to each cell's lysate at the same quantity (Jiang et al., 2011). "
    ABSTRACT: The differences between individual cells can have profound functional consequences, in both unicellular and multicellular organisms. Recently developed single-cell mRNA-sequencing methods enable unbiased, high-throughput, and high-resolution transcriptomic analysis of individual cells. This provides an additional dimension to transcriptomic information relative to traditional methods that profile bulk populations of cells. Already, single-cell RNA-sequencing methods have revealed new biology in terms of the composition of tissues, the dynamics of transcription, and the regulatory relationships between genes. Rapid technological developments at the level of cell capture, phenotyping, molecular biology, and bioinformatics promise an exciting future with numerous biological and medical applications. Copyright © 2015 Elsevier Inc. All rights reserved.
    Molecular cell 05/2015; 58(4):610-620. DOI:10.1016/j.molcel.2015.04.005 · 14.02 Impact Factor
    • "Analogous methodologies have been applied in areas of gene-expression analysis that have revealed global transcriptional amplification upon normalization and in MethylC-seq, where bisulfite conversion rates have been normalized (Kanno et al., 2006; Krueger et al., 2012; Lin et al., 2012; Lovén et al., 2012; van de Peppel et al., 2003). These advancements have allowed standardization, precision, and a mechanistic understanding of RNA transcription (van Bakel and Holstege, 2004; Jiang et al., 2011; Li et al., 2013); however, no cell count-normalized methods have been applied to global correction of histone posttranslational modifications. Since a vast array of histone modifications have been described in eukaryotic cells that play roles in organismal development, maintenance of cell state, differentiation, and disease, including those associated with transcriptional processes, genome organization, DNA repair, and cell-cycle progression (Calo and Wysocka, 2013; Pastor et al., 2013; Rinn and Chang, 2012; Rivera and Ren, 2013; Tan et al., 2011; Tian et al., 2012), a quantitative method for comparing these key marks is needed. "
    ABSTRACT: Epigenomic profiling by chromatin immunoprecipitation coupled with massively parallel DNA sequencing (ChIP-seq) is a prevailing methodology used to investigate chromatin-based regulation in biological systems such as human disease, but the lack of an empirical methodology to enable normalization among experiments has limited the precision and usefulness of this technique. Here, we describe a method called ChIP with reference exogenous genome (ChIP-Rx) that allows one to perform genome-wide quantitative comparisons of histone modification status across cell populations using defined quantities of a reference epigenome. ChIP-Rx enables the discovery and quantification of dynamic epigenomic profiles across mammalian cells that would otherwise remain hidden using traditional normalization methods. We demonstrate the utility of this method for measuring epigenomic changes following chemical perturbations and show how reference normalization of ChIP-seq experiments enables the discovery of disease-relevant changes in histone modification occupancy.
    Cell Reports 11/2014; 9(3):1163-70. DOI:10.1016/j.celrep.2014.10.018 · 8.36 Impact Factor
    • "From the total of five libraries, we generated 557,094,098 raw reads with an average length of 251 bp containing 139,830,744,098 nucleotide bases. Formal research has suggested that to achieve 99% coverage of an mRNA, at least an 8X sequencing depth is required [27]. For this study, the sequencing depth is 50X, enough to get the maximum coverage. "
    ABSTRACT: Broccoli (Brassica oleracea var. italica), a member of Cruciferae, is an important vegetable containing high concentration of various nutritive and functional molecules especially the anticarcinogenic glucosinolates. The sprouts of broccoli contain 10-100 times higher level of glucoraphanin, the main contributor of the anticarcinogenesis, than the edible florets. Despite the broccoli sprouts' functional importance, currently available genetic and genomic tools for their studies are very limited, which greatly restricts the development of this functionally important vegetable. A total of ∼85 million 251 bp reads were obtained. After de novo assembly and searching the assembled transcripts against the Arabidopsis thaliana and NCBI nr databases, 19,441 top-hit transcripts were clustered as unigenes with an average length of 2,133 bp. These unigenes were classified according to their putative functional categories. Cluster analysis of total unigenes with similar expression patterns and differentially expressed unigenes among different tissues, as well as transcription factor analysis were performed. We identified 25 putative glucosinolate metabolism genes sharing 62.04-89.72% nucleotide sequence identity with the Arabidopsis orthologs. This established a broccoli glucosinolate metabolic pathway with high colinearity to Arabidopsis. Many of the biosynthetic and degradation genes showed higher expression after germination than in seeds; especially the expression of the myrosinase TGG2 was 20-130 times higher. These results along with the previous reports about these genes' studies in Arabidopsis and the glucosinolate concentration in broccoli sprouts indicate the breakdown products of glucosinolates may play important roles in the stage of broccoli seed germination and sprout development. Our study provides the largest genetic resource of broccoli to date. These data will pave the way for further studies and genetic engineering of broccoli sprouts and will also provide new insight into the genomic research of this species and its relatives.
    PLoS ONE 02/2014; 9(2):e88804. DOI:10.1371/journal.pone.0088804 · 3.23 Impact Factor