Synthetic spike-in standards for RNA-seq experiments

Section of Developmental Genomics, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA.
Genome Research (Impact Factor: 14.63). 08/2011; 21(9):1543-51. DOI: 10.1101/gr.121095.111
Source: PubMed


High-throughput sequencing of cDNA (RNA-seq) is a widely deployed transcriptome profiling and annotation technique, but questions about the performance of different protocols and platforms remain. We used a newly developed pool of 96 synthetic RNAs of various lengths and GC content, covering a 2^20 concentration range, as spike-in controls to measure sensitivity, accuracy, and biases in RNA-seq experiments as well as to derive standard curves for quantifying the abundance of transcripts. We observed linearity between read density and RNA input over the entire detection range and excellent agreement between replicates, but we observed significantly larger imprecision than expected under pure Poisson sampling errors. We use the control RNAs to directly measure reproducible protocol-dependent biases due to GC content and transcript length as well as stereotypic heterogeneity in coverage across transcripts correlated with position relative to RNA termini and priming sequence bias. These effects lead to biased quantification for short transcripts and individual exons, which is a serious problem for measurements of isoform abundances, but one that can be partially corrected using appropriate models of bias. By using the control RNAs, we derive limits for the discovery and detection of rare transcripts in RNA-seq experiments. By using data collected as part of the model organism and human Encyclopedia of DNA Elements projects (ENCODE and modENCODE), we demonstrate that external RNA controls are a useful resource for evaluating sensitivity and accuracy of RNA-seq experiments for transcriptome discovery and quantification. These quality metrics facilitate comparable analysis across different samples, protocols, and platforms.
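
The standard-curve idea in the abstract can be illustrated with a minimal sketch. It assumes, as the abstract reports, that read density is linear in RNA input, so a straight line can be fitted in log2-log2 space and then inverted to estimate the abundance of a transcript from its read density. The function names, the least-squares fitting choice, and the example values are hypothetical and are not taken from the paper.

```python
import math


def fit_standard_curve(concentrations, read_densities):
    """Fit log2(read density) = slope * log2(concentration) + intercept.

    Ordinary least squares on log2-transformed values; spike-ins with a
    zero concentration or zero read density are skipped (assumption).
    """
    pairs = [
        (math.log2(c), math.log2(r))
        for c, r in zip(concentrations, read_densities)
        if c > 0 and r > 0
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two detected spike-ins")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("spike-ins must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(read_density, slope, intercept):
    """Invert the fitted curve to estimate input concentration."""
    return 2 ** ((math.log2(read_density) - intercept) / slope)


# Hypothetical spike-in data: concentrations spanning a 2^20 range and
# read densities that are exactly proportional to them.
spike_conc = [2 ** k for k in range(0, 21, 4)]
spike_density = [0.5 * c for c in spike_conc]

slope, intercept = fit_standard_curve(spike_conc, spike_density)
estimated = estimate_concentration(8.0, slope, intercept)
```

A slope close to 1 in this log-log fit corresponds to the proportionality between read density and input described above. Real spike-in data would scatter around the line, and the abstract notes that this scatter exceeds what pure Poisson sampling predicts.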

Available from: Carrie Davis
  • Source
    • "Box 2. Experiment execution choices. RNA-seq library preparation and sequencing procedures include a number of steps (RNA fragmentation, cDNA synthesis, adapter ligation, PCR amplification, bar-coding, and lane loading) that might introduce biases into the resulting data [196]. Including exogenous reference transcripts ('spike-ins') is useful both for quality control [1,197] and for library-size normalization [198]. For bias minimization, we recommend following the suggestions made by Van Dijk et al. [199], such as the use of adapters with random nucleotides at the extremities or the use of chemical-based fragmentation instead of RNase III-based fragmentation. "
    ABSTRACT: RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping. We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics.
    Full-text · Article · Jan 2016 · Genome Biology
  • Source
    • "All sequencing libraries were confirmed to be of high-quality using FastQC (Andrews 2010). In addition, spike-in control mRNAs at a large range of concentrations were added at the total RNA step of library preparation (Jiang et al. 2011), and the results showed that all library preparation steps were of high quality (Table S1). Gene expression levels are reported as normalized values that take into account gene length and RNA-seq library size (Fragments Per Kilobase of Exon Per Million Fragments Mapped; FPKM) (Trapnell et al. 2012) (Table S2, Table S3, Table S4, and Table S5 for FPKM values for each brain region). "
    ABSTRACT: The developmental transition to motherhood requires gene expression changes that alter the brain to drive the female to perform maternal behaviors. We broadly examined the global transcriptional response in the mouse maternal brain, by examining four brain regions: hypothalamus, hippocampus, neocortex, and cerebellum, in virgin females, two pregnancy time points and three postpartum time points. We find that overall there are hundreds of differentially expressed genes, but each brain region and time point shows a unique molecular signature, with only 49 genes differentially expressed in all four regions. Interestingly, a set of 'early-response genes' is repressed in all brain regions during pregnancy and postpartum stages. Several genes previously implicated in underlying postpartum depression change expression. This study serves as an atlas of gene expression changes in the maternal brain, with the results demonstrating that pregnancy, parturition, and postpartum maternal experience substantially impact diverse brain regions.
    Full-text · Article · Nov 2015 · G3: Genes, Genomes, Genetics
  • Source
    • "To interrogate biological variability, it is vital to accurately estimate and then account for technical variability. The most widely used approach to quantify technical variability is to use external spike-in RNA molecules (e.g., the ERCC RNA spike-in mix), which can be added to each cell's lysate at the same quantity (Jiang et al., 2011). "
    ABSTRACT: The differences between individual cells can have profound functional consequences, in both unicellular and multicellular organisms. Recently developed single-cell mRNA-sequencing methods enable unbiased, high-throughput, and high-resolution transcriptomic analysis of individual cells. This provides an additional dimension to transcriptomic information relative to traditional methods that profile bulk populations of cells. Already, single-cell RNA-sequencing methods have revealed new biology in terms of the composition of tissues, the dynamics of transcription, and the regulatory relationships between genes. Rapid technological developments at the level of cell capture, phenotyping, molecular biology, and bioinformatics promise an exciting future with numerous biological and medical applications.
    Full-text · Article · May 2015 · Molecular Cell