Guttman, M. et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature Biotech. 28, 503-510 (2010).

Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.
Nature Biotechnology (Impact Factor: 41.51). 05/2010; 28(5):503-10. DOI: 10.1038/nbt.1633
Source: PubMed


Massively parallel cDNA sequencing (RNA-Seq) provides an unbiased way to study a transcriptome, including both coding and noncoding genes. Until now, most RNA-Seq studies have depended crucially on existing annotations and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We applied it to mouse embryonic stem cells, neuronal precursor cells and lung fibroblasts to accurately reconstruct the full-length gene structures for most known expressed genes. We identified substantial variation in protein coding genes, including thousands of novel 5' start sites, 3' ends and internal coding exons. We then determined the gene structures of more than a thousand large intergenic noncoding RNA (lincRNA) and antisense loci. Our results open the way to direct experimental manipulation of thousands of noncoding RNAs and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes.



Available from: Manuel Garber, Oct 05, 2015
  • Source
    • "… [data: NA12878 human whole genome sequence [32]] (b) Sequence assembly and annotation [33]. [data: Staphylococcus aureus whole genome sequence [31]] (c) Transcriptome reconstruction [23]. [data: …"
    ABSTRACT: Motivation: A wide variety of large-scale data have been produced in bioinformatics. In response, the need for efficient handling of biomedical big data has been partly met by parallel computing. Nevertheless, the time demand of many bioinformatics programs remains high for large-scale practical use, because of factors that hinder acceleration by parallelization. New generations of storage devices, such as NAND flash solid-state drives (SSDs), are now emerging, and with renewed interest in near-data processing they are increasingly becoming acceleration methods that can accompany parallel processing. In certain cases, simply replacing HDDs with SSDs results in dramatic speedup. Despite the various advantages and continuing cost reduction of SSDs, there has been little research on SSD-based profiling and performance exploration of important but time-consuming bioinformatics programs. Results: We perform in-depth profiling and analysis of 23 key bioinformatics programs using multiple types of SSDs. Based on the insight obtained, we further discuss issues related to designing and optimizing bioinformatics algorithms and pipelines to fully exploit SSDs. The programs profiled cover traditional and emerging areas of importance, such as alignment, assembly, mapping, expression analysis, variant calling, and metagenomics. We demonstrate how acceleration by parallelization can be combined with SSDs for extra performance, and how using SSDs can expedite important bioinformatics pipelines such as variant calling with the Genome Analysis Toolkit (GATK) and transcriptome analysis using RNA-seq. We hope that this paper can provide useful directions and tips for future bioinformatics algorithm design that properly considers new generations of powerful storage devices. Availability:
  • Source
    • "RNA-seq enables accurate characterization and comparison of transcript structures across closely related species (Blekhman et al. 2010; Perry et al. 2012). Using RNA-seq data, one can identify exons and predict transcript structures de novo even in the absence of prior transcriptome annotations (Guttman et al. 2010; Trapnell et al. 2010). Here, we used deep RNA-seq data of human and nonhuman primate tissues to reconstruct and compare transcript structures across the primate phylogeny. "
    ABSTRACT: Changes in exon-intron structures and splicing patterns represent an important mechanism for the evolution of gene functions and species-specific regulatory networks. While exon creation is widespread during primate and human evolution and has been studied extensively, much less is known about the scope and potential impact of human-specific exon loss events. Historically, transcriptome data and exon annotations are significantly biased towards humans over nonhuman primates. This ascertainment bias makes it challenging to discover human-specific exon loss events. We carried out a transcriptome-wide search of human-specific exon loss events, by taking advantage of RNA-seq as a powerful and unbiased tool for exon discovery and annotation. Using RNA-seq data of humans, chimpanzees, and other primates, we reconstructed and compared transcript structures across the primate phylogeny. We discovered 33 candidate human-specific exon loss events, among which 6 exons passed stringent experimental filters for the complete loss of splicing activities in diverse human tissues. These events may result from human-specific deletion of genomic DNA, or small-scale sequence changes that inactivated splicing signals. The impact of human-specific exon loss events is predominantly regulatory. Three of the 6 events occurred in the 5'-UTR and affected cis regulatory elements of mRNA translation. In SLC7A6, a gene encoding an amino acid transporter, luciferase reporter assays showed that both a human-specific exon loss event and an independent human-specific single nucleotide substitution in the 5'-UTR increased mRNA translational efficiency. Our study provides novel insights into the molecular mechanisms and evolutionary consequences of exon loss during human evolution.
    Molecular Biology and Evolution 11/2014; 32(2). DOI:10.1093/molbev/msu317 · 9.11 Impact Factor
  • Source
    • "However, more than 50% of the 5hmC peaks they identified were located at genic regions, where they are known to be associated with gene activation [10, 12, 38, 40, 42]. It is also possible that the 5hmC peaks at distal regions are associated with non-coding RNAs such as long non-coding RNAs (lincRNAs) [43]. Sérandour and colleagues also identified 5hmC at distal PPARγ binding sites [33]. "
    ABSTRACT: Background: Recent mapping of 5-hydroxymethylcytosine (5hmC) provides a genome-wide view of the distribution of this important chromatin mark. However, the role of 5hmC in specific regulatory regions is not clear, especially at enhancers. Results: We found a group of distal transcription factor binding sites that are highly enriched for 5hmC but lack any known activating histone marks and are depleted for nascent transcripts, suggesting a repressive role for 5hmC in mouse embryonic stem cells (mESCs). 5-formylcytosine (5fC), which is known to mark poised enhancers where H3K4me1 is enriched, is also observed at these sites. Furthermore, 5hmC levels were inversely correlated with RNA polymerase II (PolII) occupancy in mESCs as well as in fully differentiated adipocytes. Interestingly, activating H3K4me1/2 histone marks were enriched at these sites when the associated genes became activated following lineage specification. These putative enhancers were shown to be functional in embryonic stem cells when unmethylated. Together, these data suggest that 5hmC suppresses the activity of this group of enhancers, which we termed “silenced enhancers”. Conclusions: Our findings indicate that 5hmC has a repressive role at specific proximal and distal regulatory regions in mESCs, and suggest that 5hmC is a new epigenetic mark for silenced enhancers. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-670) contains supplementary material, which is available to authorized users.
    BMC Genomics 08/2014; 15(1):670. DOI:10.1186/1471-2164-15-670 · 3.99 Impact Factor