Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation.

Department of Computer Science, University of Maryland, College Park, Maryland, USA.
Nature Biotechnology (Impact Factor: 39.08). 05/2010; 28(5):511-5. DOI: 10.1038/nbt.1621
Source: PubMed

ABSTRACT High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes. These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.


Available from: Jeltje van Baren, Jun 03, 2015
  • ABSTRACT: Previously, we identified a major quantitative trait locus (QTL) for host response to Porcine Reproductive and Respiratory Syndrome virus (PRRSV) infection in high linkage disequilibrium (LD) with SNP rs80800372 on Sus scrofa chromosome 4 (SSC4). Within this QTL, guanylate binding protein 5 (GBP5) was differentially expressed (DE) (p < 0.05) in blood from AA versus AB rs80800372 genotyped pigs at 7, 11, and 14 days post PRRSV infection. All variants within the GBP5 transcript in LD with rs80800372 exhibited allele specific expression (ASE) in AB individuals (p < 0.0001). A transcript re-assembly revealed three alternatively spliced transcripts for GBP5. An intronic SNP in GBP5, rs340943904, introduces a splice acceptor site that inserts five nucleotides into the transcript. Individuals homozygous for the unfavorable AA genotype predominantly produced this transcript, with a shifted reading frame and early stop codon that truncates the 88 C-terminal amino acids of the protein. RNA-seq analysis confirmed this SNP was associated with differential splicing by QTL genotype (p < 0.0001) and this was validated by quantitative capillary electrophoresis (p < 0.0001). The wild-type transcript was expressed at a higher level in AB versus AA individuals, whereas the five-nucleotide insertion transcript was the dominant form in AA individuals. Splicing and ASE results are consistent with the observed dominant nature of the favorable QTL allele. The rs340943904 SNP was also 100% concordant with rs80800372 in a validation population that possessed an alternate form of the favorable B QTL haplotype. GBP5 is known to play a role in inflammasome assembly during immune response. However, the role of GBP5 host genetic variation in viral immunity is novel. These findings demonstrate that rs340943904 is a strong candidate causal mutation for the SSC4 QTL that controls variation in host response to PRRSV.
    BMC Genomics 05/2015; 16(1). DOI:10.1186/s12864-015-1635-9 · 4.04 Impact Factor
  • ABSTRACT: Hypoxia and temperature stress are two major adverse environmental conditions often encountered by fishes. The interaction between hypoxia and temperature stresses has been well documented and oxygen is considered to be the limiting factor for the thermal tolerance of fish. Although both high and low temperature stresses can impair cardiovascular function and cross-resistance between hypoxia and heat stress has been found, it is not clear whether hypoxia acclimation can protect fish from cold injury. Pre-acclimation of 96-hpf zebrafish larvae to mild hypoxia (5% O2) significantly improved their resistance to lethal hypoxia (2.5% O2) and increased the survival rate of zebrafish larvae after lethal cold (10°C) exposure. However, pre-acclimation of 96-hpf larvae to cold (18°C) decreased their tolerance to lethal hypoxia although their ability to endure lethal cold increased. RNA-seq analysis identified 132 up-regulated and 41 down-regulated genes upon mild hypoxia exposure. Gene ontology enrichment analyses revealed that genes up-regulated by hypoxia are primarily involved in oxygen transport, oxidation-reduction process, hemoglobin biosynthetic process, erythrocyte development and cellular iron ion homeostasis. Hypoxia-inhibited genes are enriched in inorganic anion transport, sodium ion transport, very long-chain fatty acid biosynthetic process and cytidine deamination. A comparison with the dataset of cold-regulated gene expression identified 23 genes co-induced by hypoxia and cold and these genes are mainly associated with oxidation-reduction process, oxygen transport, hemopoiesis, hemoglobin biosynthetic process and cellular iron ion homeostasis. The alleviation of lipid peroxidation damage by both cold- and hypoxia-acclimation upon lethal cold stress suggests the association of these genes with cold resistance. Furthermore, an alternative promoter of the hmbsb gene specifically activated by hypoxia and cold was identified and confirmed. Acclimation responses to mild hypoxia and cold stress were found in zebrafish larvae and pre-acclimation to hypoxia significantly improved the tolerance of larvae to lethal cold stress. RNA-seq and bioinformatics analyses revealed the biological processes associated with hypoxia acclimation. Transcriptional events co-induced by hypoxia and cold may represent the molecular basis underlying the protection that hypoxia acclimation confers against cold injury.
    BMC Genomics 05/2015; 16(1):385. DOI:10.1186/s12864-015-1560-y · 4.04 Impact Factor
  • ABSTRACT: Nucleosomes are the building blocks of chromatin where gene regulation takes place. Chromatin landscapes have been profiled for several species, providing insights into the understanding of fundamental mechanisms of chromatin-mediated transcriptional regulation of gene expression. However, knowledge is missing from several major and deep-branching eukaryotic groups, such as the marine model diatom Phaeodactylum tricornutum. Diatoms are highly diverse and ubiquitous species of phytoplankton that play a key role in global biogeochemical cycles. Dissecting chromatin-mediated regulation of genes in diatoms will help understand the ecological success of these organisms in contemporary oceans. Here, we use high resolution mass spectrometry to identify a full repertoire of post-translational modifications on P. tricornutum histones, including eight novel modifications. We map five histone marks coupled with expression data and show that P. tricornutum displays both unique and broadly conserved chromatin features, reflecting the chimeric nature of its genome. Combinatorial analysis of histone marks and DNA methylation demonstrates the presence of an epigenetic code defining active or repressive chromatin states. We further profile three specific histone marks under conditions of nitrate depletion and show that the histone code is dynamic and targets specific sets of genes. This study is the first genome-wide characterization of the histone code from a Stramenopile and a marine phytoplankton. The work represents an important initial step for understanding the evolutionary history of chromatin and how epigenetic modifications affect gene expression in response to environmental cues in marine environments.
    Genome Biology 05/2015; 16(1):102. DOI:10.1186/s13059-015-0671-8 · 10.47 Impact Factor