[show abstract][hide abstract] ABSTRACT: BACKGROUND: Combinations of histone variants and modifications, conceptually representing a histone code, have been proposed to play a significant role in gene regulation and developmental processes in complex organisms. While various mechanisms have been implicated in establishing and maintaining epigenetic patterns at specific locations in the genome, they are generally believed to be independent of primary DNA sequence on a more global scale. RESULTS: To address this systematically in the case of the human genome, we have analyzed primary DNA sequences underlying 19 different methylated histones in human primary T-cells. We report that sequence alone can accurately predict the location of most of these histone marks genome-wide in this cell type. Furthermore, the sequence features responsible for such predictions are distinct for different groups of histone marks. CONCLUSIONS: These findings support the existence of a genomic code for histone modification associated with gene expression and chromatin programming, and they suggest that the mechanisms responsible for global histone modifications may interpret genomic sequence in various ways.
[show abstract][hide abstract] ABSTRACT: RNA sequencing (RNA-Seq) is rapidly replacing microarrays for profiling gene expression with much improved accuracy and sensitivity. One of the most common questions in a typical gene profiling experiment is how to identify a set of transcripts that are differentially expressed between different experimental conditions. Some of the statistical methods developed for microarray data analysis can be applied to RNA-Seq data with or without modifications. Recently several additional methods have been developed specifically for RNA-Seq data sets. This review attempts to give an in-depth review of these statistical methods, with the goal of providing a comprehensive guide when choosing appropriate metrics for RNA-Seq statistical analyses.
[show abstract][hide abstract] ABSTRACT: Single-molecule sequencing instruments can generate multikilobase sequences with the potential to greatly improve genome and transcriptome assembly. However, the error rates of single-molecule reads are high, which has limited their use thus far to resequencing bacteria. To address this limitation, we introduce a correction algorithm and assembly strategy that uses short, high-fidelity sequences to correct the error in single-molecule sequences. We demonstrate the utility of this approach on reads generated by a PacBio RS instrument from phage, prokaryotic and eukaryotic whole genomes, including the previously unsequenced genome of the parrot Melopsittacus undulatus, as well as for RNA-Seq reads of the corn (Zea mays) transcriptome. Our long-read correction achieves >99.9% base-call accuracy, leading to substantially better assemblies than current sequencing strategies: in the best example, the median contig size was quintupled relative to high-coverage, second-generation assemblies. Greater gains are predicted if read lengths continue to increase, including the prospect of single-contig bacterial chromosome assembly.
[show abstract][hide abstract] ABSTRACT: De novo assembly of the transcriptome is crucial for functional genomics studies in bioenergy research, since many of the organisms lack high quality reference genomes. In a previous study we successfully de novo assembled simple eukaryote transcriptomes exclusively from short Illumina RNA-Seq reads . However, extensive alternative splicing, present in most of the higher eukaryotes, poses a significant challenge for current short read assembly processes. Furthermore, the size of next-generation datasets, often large for plant genomes, presents an informatics challenge. To tackle these challenges we present a combined experimental and informatics strategy for de novo assembly in higher eukaryotes. Using maize as a test case, preliminary results suggest our approach can resolve transcript variants and improve gene annotations.
[show abstract][hide abstract] ABSTRACT: Transcription of the eukaryotic genomes is carried out by three distinct RNA polymerases I, II, and III, whereby each polymerase is thought to independently transcribe a distinct set of genes. To investigate a possible relationship of RNA polymerases II and III, we mapped their in vivo binding sites throughout the human genome by using ChIP-Seq in two different cell lines, GM12878 and K562 cells. Pol III was found to bind near many known genes as well as several previously unidentified target genes. RNA-Seq studies indicate that a majority of the bound genes are expressed, although a subset are not suggestive of stalling by RNA polymerase III. Pol II was found to bind near many known Pol III genes, including tRNA, U6, HVG, hY, 7SK and previously unidentified Pol III target genes. Similarly, in vivo binding studies also reveal that a number of transcription factors normally associated with Pol II transcription, including c-Fos, c-Jun and c-Myc, also tightly associate with most Pol III-transcribed genes. Inhibition of Pol II activity using alpha-amanitin reduced expression of a number of Pol III genes (e.g., U6, hY, HVG), suggesting that Pol II plays an important role in regulating their transcription. These results indicate that, contrary to previous expectations, polymerases can often work with one another to globally coordinate gene expression.
Proceedings of the National Academy of Sciences 02/2010; 107(8):3639-44. · 9.74 Impact Factor
[show abstract][hide abstract] ABSTRACT: Comprehensive annotation and quantification of transcriptomes are outstanding problems in functional genomics. While high throughput mRNA sequencing (RNA-Seq) has emerged as a powerful tool for addressing these problems, its success is dependent upon the availability and quality of reference genome sequences, thus limiting the organisms to which it can be applied.
Here, we describe Rnnotator, an automated software pipeline that generates transcript models by de novo assembly of RNA-Seq data without the need for a reference genome. We have applied the Rnnotator assembly pipeline to two yeast transcriptomes and compared the results to the reference gene catalogs of these organisms. The contigs produced by Rnnotator are highly accurate (95%) and reconstruct full-length genes for the majority of the existing gene models (54.3%). Furthermore, our analyses revealed many novel transcribed regions that are absent from well annotated genomes, suggesting Rnnotator serves as a complementary approach to analysis based on a reference genome for comprehensive transcriptomics.
These results demonstrate that the Rnnotator pipeline is able to reconstruct full-length transcripts in the absence of a complete reference genome.
[show abstract][hide abstract] ABSTRACT: RNA-Seq is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies. Studies using this method have already altered our view of the extent and complexity of eukaryotic transcriptomes. RNA-Seq also provides a far more precise measurement of levels of transcripts and their isoforms than other methods. This article describes the RNA-Seq approach, the challenges associated with its application, and the advances made so far in characterizing several eukaryote transcriptomes.
[show abstract][hide abstract] ABSTRACT: The identification of untranslated regions, introns, and coding regions within an organism remains challenging. We developed a quantitative sequencing-based method called RNA-Seq for mapping transcribed regions, in which complementary DNA fragments are subjected to high-throughput sequencing and mapped to the genome. We applied RNA-Seq to generate a high-resolution transcriptome map of the yeast genome and demonstrated that most (74.5%) of the nonrepetitive sequence of the yeast genome is transcribed. We confirmed many known and predicted introns and demonstrated that others are not actively used. Alternative initiation codons and upstream open reading frames also were identified for many yeast genes. We also found unexpected 3'-end heterogeneity and the presence of many overlapping genes. These results indicate that the yeast transcriptome is more complex than previously appreciated.