Trapnell, C, Williams, BA, Pertea, G, Mortazavi, A, Kwan, G, van Baren, MJ et al. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28: 511-515

Department of Computer Science, University of Maryland, College Park, Maryland, USA.
Nature Biotechnology (Impact Factor: 41.51). 05/2010; 28(5):511-5. DOI: 10.1038/nbt.1621
Source: PubMed


High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes. These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.

    • "Transcripts with FPKM values greater than 0.05 were identified as being expressed (i.e. above the expected false discovery rate following Trapnell et al. 2010), and this threshold was used to determine the number of expressed genes for each sample. Libraries constructed from raw RNA and WTA samples averaged 15,298 and 15,253 expressed genes, respectively (Table 1; Additional file 1: Table S1). "
    ABSTRACT: RNA-Seq has enabled high-throughput gene expression profiling to provide insight into the functional link between genotype and phenotype. Low quantities of starting RNA can be a severe hindrance for studies that aim to utilize RNA-Seq. To mitigate this bottleneck, whole transcriptome amplification (WTA) technologies have been developed to generate sufficient sequencing targets from minute amounts of RNA. Successful WTA requires accurate replication of transcript abundance without the loss or distortion of specific mRNAs. Here, we test the efficacy of NuGEN's Ovation RNA-Seq V2 system, which uses linear isothermal amplification with a unique chimeric primer for amplification, using white adipose tissue from standard laboratory rats (Rattus norvegicus). Our goal was to investigate potential biological artifacts introduced through WTA approaches by establishing comparisons between matched raw and amplified RNA libraries derived from biological replicates. We found that 93% of expressed genes were identical between all unamplified versus matched amplified comparisons, also finding that gene density is similar across all comparisons. Our sequencing experiment and downstream bioinformatic analyses using the Tuxedo analysis pipeline resulted in the assembly of 25,543 high-quality transcripts. Libraries constructed from raw RNA and WTA samples averaged 15,298 and 15,253 expressed genes, respectively. Although significant differentially expressed genes (P < 0.05) were identified in all matched samples, each of these represents less than 0.15% of all shared genes for each comparison. Transcriptome amplification is efficient at maintaining relative transcript frequencies with no significant bias when using this NuGEN linear isothermal amplification kit under ideal laboratory conditions as presented in this study. 
    This methodology has broad applications, from clinical and diagnostic work to field-based studies in which sample acquisition or sample preservation proves challenging.
    Full-text · Article · Dec 2015 · BMC Biotechnology
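
The expression cutoff quoted in this excerpt (FPKM greater than 0.05) can be illustrated with a minimal Python sketch. The transcript identifiers and FPKM values below are made up for illustration, and the function name `expressed_ids` is hypothetical; this is not the citing authors' code.

```python
# Threshold quoted in the excerpt: FPKM values greater than 0.05 count as expressed.
EXPRESSED_FPKM_THRESHOLD = 0.05

def expressed_ids(fpkm_by_id, threshold=EXPRESSED_FPKM_THRESHOLD):
    """Return the sorted IDs whose FPKM is strictly greater than the threshold."""
    return sorted(tid for tid, value in fpkm_by_id.items() if value > threshold)

# Hypothetical FPKM table (illustrative values only, not data from the study).
sample_fpkm = {"tx_a": 12.4, "tx_b": 0.05, "tx_c": 0.0, "tx_d": 0.31}

expressed = expressed_ids(sample_fpkm)
print(len(expressed), "expressed:", expressed)
```

The comparison is strict, so a transcript sitting exactly at 0.05 is not counted, matching the "greater than" wording of the excerpt.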
    • "We performed gene and transcript assembly with Cufflinks (v 2.2.0) [76] for each individual sample. Per-base read coverage and FPKM (fragments per kilobase of transcript per million mapped fragments) values were calculated for each transcript and gene as described by Trapnell et al. (2010). We only considered assembled transcripts that met the following requirements: a) the transcript was covered by at least 4 reads, b) Abundance was higher than 1% of the most abundant isoform of the gene and, c) <20% of reads were mapped to multiple locations in the genome. "
    ABSTRACT: The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that do not contain any gene or gene copy. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, it is yet unclear which is the prevalence and underlying mechanisms of de novo gene emergence. In order to obtain a comprehensive view of this process we have performed in-depth sequencing of the transcriptomes of four mammalian species, human, chimpanzee, macaque and mouse, and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This has resulted in the identification of over five thousand new transcriptional multiexonic events in human and/or chimpanzee that are not observed in the rest of species. By comparative genomics we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. We also find that the coding potential of the new genes is higher than expected by chance, consistent with the presence of protein-coding genes in the dataset. Using available human tissue proteomics and ribosome profiling data we identify several de novo genes with translation evidence. These genes show significant purifying selection signatures, indicating that they are probably functional. Taken together, the data supports a model in which frequently-occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins.
    Full-text · Article · Jul 2015 · PLoS Genetics
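
The FPKM definition and the three transcript filters quoted in this excerpt can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the citing authors' pipeline: the `Transcript` record and all function names are hypothetical, "abundance" is taken to mean FPKM, and the multi-mapping criterion is interpreted as a fraction of the reads assigned to each transcript.

```python
from dataclasses import dataclass

def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)

@dataclass
class Transcript:  # hypothetical record, for illustration only
    gene_id: str
    transcript_id: str
    reads: int               # reads covering the transcript
    multi_mapped_reads: int  # of those, reads mapped to multiple locations
    fpkm: float

def passes_filters(t, max_fpkm_in_gene):
    enough_reads = t.reads >= 4                             # (a) at least 4 reads
    abundant = t.fpkm > 0.01 * max_fpkm_in_gene             # (b) > 1% of top isoform
    mostly_unique = t.multi_mapped_reads < 0.20 * t.reads   # (c) < 20% multi-mapped
    return enough_reads and abundant and mostly_unique

def filter_transcripts(transcripts):
    # Find the most abundant isoform of each gene, then apply the three filters.
    max_by_gene = {}
    for t in transcripts:
        max_by_gene[t.gene_id] = max(max_by_gene.get(t.gene_id, 0.0), t.fpkm)
    return [t for t in transcripts if passes_filters(t, max_by_gene[t.gene_id])]
```

For example, 100 fragments on a 1,000-bp transcript in a library of one million mapped fragments give `fpkm(100, 1000, 1_000_000)`, which is 100 FPKM.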
    • "( using TopHat (defaults options; Trapnell et al., 2009) and Cufflinks (Trapnell et al., 2010, Figure 1). The first assembly was done without specifying gene annotation and was named Cufflinks WA (without annotation; Figure 1). "

    Full-text · Dataset · Jul 2015