Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories.

[1] Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands. [2] Netherlands Bioinformatics Centre, Leiden, The Netherlands.
Nature Biotechnology (Impact Factor: 39.08). 09/2013; DOI: 10.1038/nbt.2702
Source: PubMed

ABSTRACT RNA sequencing is an increasingly popular technology for genome-wide analysis of transcript sequence and abundance. However, understanding of the sources of technical and interlaboratory variation is still limited. To address this, the GEUVADIS consortium sequenced mRNAs and small RNAs of lymphoblastoid cell lines of 465 individuals in seven sequencing centers, with a large number of replicates. The variation between laboratories appeared to be considerably smaller than the already limited biological variation. Laboratory effects were mainly seen in differences in insert size and GC content and could be adequately corrected for. In small-RNA sequencing, the microRNA (miRNA) content differed widely between samples owing to competitive sequencing of rRNA fragments. This did not affect relative quantification of miRNAs. We conclude that distributing RNA sequencing among different laboratories is feasible, given proper standardization and randomization procedures. We provide a set of quality measures and guidelines for assessing technical biases in RNA-seq data.

  • ABSTRACT: Rice yield and quality are adversely affected by high temperatures, especially at night; high nighttime temperatures are more harmful to grain weight than high daytime temperatures. Unfortunately, global temperatures are consistently increasing at an alarming rate and the minimum nighttime temperature has increased three times as much as the corresponding maximum daytime temperature over the past few decades. We analyzed the transcriptome profiles for rice grain from heat-tolerant and -sensitive lines in response to high night temperatures at the early milky stage using the Illumina Sequencing method. The analysis results for the sequencing data indicated that 35 transcripts showed different expressions between heat-tolerant and -sensitive rice, and RT-qPCR analyses confirmed the expression patterns of selected transcripts. Functional analysis of the differentially expressed transcripts indicated that 21 genes have functional annotation and their functions are mainly involved in oxidation-reduction (6 genes), metabolic (7 genes), transport (4 genes), transcript regulation (2 genes), defense response (1 gene) and photosynthetic (1 gene) processes. Based on the functional annotation of the differentially expressed genes, the possible process that regulates these differentially expressed transcripts in rice grain responding to high night temperature stress at the early milky stage was further analyzed. This analysis indicated that high night temperature stress disrupts electron transport in the mitochondria, which leads to changes in the concentration of hydrogen ions in the mitochondrial and cellular matrix and influences the activity of enzymes involved in TCA and its secondary metabolism in plant cells. Using Illumina sequencing technology, the differences between the transcriptomes of heat-tolerant and -sensitive rice lines in response to high night temperature stress at the early milky stage was described here for the first time. The candidate transcripts may provide genetic resources that may be useful in the improvement of heat-tolerant characters of rice. The model proposed here is based on differences in expression and transcription between two rice lines. In addition, the model may support future studies on the molecular mechanisms underlying plant responses to high night temperatures.
    BMC Genomics 12/2015; 16(1). DOI:10.1186/s12864-015-1222-0 · 4.04 Impact Factor
  • ABSTRACT: The high level of accuracy and sensitivity of next generation sequencing for quantifying genetic material across organismal boundaries gives it tremendous potential for pathogen discovery and diagnosis in human disease. Despite this promise, substantial bacterial contamination is routinely found in existing human-derived RNA-seq datasets that likely arises from environmental sources. This raises the need for stringent sequencing and analysis protocols for studies investigating sequence-based microbial signatures in clinical samples.
    PLoS Pathogens 11/2014; 10(11):e1004437. DOI:10.1371/journal.ppat.1004437 · 8.06 Impact Factor
  • ABSTRACT: We describe an open-source kPAL package that facilitates an alignment-free assessment of the quality and comparability of sequencing datasets by analysing k-mer frequencies. We show that kPAL can detect technical artifacts such as high duplication rates, library chimeras, contamination, and differences in library preparation protocols. kPAL also successfully captures the complexity and diversity of microbiomes and provides a powerful means to study changes in microbial communities. Together, these features make kPAL an attractive and broadly applicable tool to determine the quality and comparability of sequence libraries even in the absence of a reference sequence. kPAL is freely available at
    Genome Biology 12/2014; 15(12):555. DOI:10.1186/PREACCEPT-1559595347144548 · 10.47 Impact Factor