De novo assembly and validation of planaria transcriptome by massive parallel sequencing and shotgun proteomics

Max-Delbrück-Center for Molecular Medicine, Berlin Institute for Medical Systems Biology, Robert Rössle Strasse 10, Berlin, Germany.
Genome Research (Impact Factor: 14.63). 05/2011; 21(7):1193-200. DOI: 10.1101/gr.113779.110
Source: PubMed

ABSTRACT Freshwater planaria are a very attractive model system for stem cell biology, tissue homeostasis, and regeneration. The genome of the planarian Schmidtea mediterranea has recently been sequenced and is estimated to contain >20,000 protein-encoding genes. However, the characterization of its transcriptome is far from complete. Furthermore, not a single proteome of the entire phylum has been assayed on a genome-wide level. We devised an efficient sequencing strategy that allowed us to de novo assemble a major fraction of the S. mediterranea transcriptome. We then used independent assays and massive shotgun proteomics to validate the authenticity of transcripts. In total, our de novo assembly yielded 18,619 candidate transcripts with a mean length of 1118 nt after filtering. A total of 17,564 candidate transcripts could be mapped to 15,284 distinct loci on the current genome reference sequence. RACE confirmed complete or almost complete 5' and 3' ends for 22/24 transcripts. The frequencies of frame shifts, fusion, and fission events in the assembled transcripts were computationally estimated to be 4.2%-13%, 0%-3.7%, and 2.6%, respectively. Our shotgun proteomics produced 16,135 distinct peptides that validated 4200 transcripts (FDR ≤1%). The catalog of transcripts assembled in this study, together with the identified peptides, dramatically expands and refines planarian gene annotation, demonstrated by validation of several previously unknown transcripts with stem cell-dependent expression patterns. In addition, our robust transcriptome characterization pipeline could be applied to other organisms without genome assembly. All of our data, including homology annotation, are freely available at SmedGD, the S. mediterranea genome database.

Available from: Pinar Önal, Sep 27, 2015
    • "We sought to define the molecular mechanisms driving pharynx regeneration by expression-profiling experiments. We designed custom oligonucleotide microarrays representing 43,806 predicted S. mediterranea transcripts and isoforms from various sources (Robb et al., 2008; Blythe et al., 2010; Adamidi et al., 2011). Based on our observations that pharynx regeneration triggered a localized stem cell proliferative response, we isolated a plug of tissue surrounding the pharynx wound site in order to enrich for those transcripts most directly relevant to this process (Figure 2F)."
    ABSTRACT: Planarian flatworms regenerate every organ after amputation. Adult pluripotent stem cells drive this ability, but how injury activates and directs stem cells into the appropriate lineages is unclear. Here we describe a single-organ regeneration assay in which ejection of the planarian pharynx is selectively induced by brief exposure of animals to sodium azide. To identify genes required for pharynx regeneration, we performed an RNAi screen of 356 genes upregulated after amputation, using successful feeding as a proxy for regeneration. We found that knockdown of 20 genes caused a wide range of regeneration phenotypes and that RNAi of the forkhead transcription factor FoxA, which is expressed in a subpopulation of stem cells, specifically inhibited regrowth of the pharynx. Selective amputation of the pharynx therefore permits the identification of genes required for organ-specific regeneration and suggests an ancient function for FoxA-dependent transcriptional programs in driving regeneration.
    eLife Sciences 04/2014; 3(3):e02238. DOI:10.7554/eLife.02238 · 9.32 Impact Factor
    • "Evaluating this trade-off is difficult even when a gold standard is available, e.g., when re-assembling a genome with known sequence. In most practical settings, a reference genome sequence is not available, and the validation process must rely on other sources of information, such as independently derived data from mapping experiments [17], or from transcriptome sequencing [18]. Such data are, however, often not generated due to their high cost relative to the rapidly decreasing costs of sequencing."
    ABSTRACT: The current revolution in genomics has been made possible by software tools called genome assemblers, which stitch together DNA fragments "read" by sequencing machines into complete or nearly complete genome sequences. Despite decades of research in this field and the development of dozens of genome assemblers, assessing and comparing the quality of assembled genome sequences still relies on the availability of independently determined standards, such as manually curated genome sequences, or independently produced mapping data. These "gold standards" can be expensive to produce and may only cover a small fraction of the genome, which limits their applicability to newly generated genome sequences. Here we introduce a de novo probabilistic measure of assembly quality which allows for an objective comparison of multiple assemblies generated from the same set of reads. We define the quality of a sequence produced by an assembler as the conditional probability of observing the sequenced reads from the assembled sequence. A key property of our metric is that the true genome sequence maximizes the score, unlike other commonly used metrics. We demonstrate that our de novo score can be computed quickly and accurately in a practical setting even for large datasets, by estimating the score from a relatively small sample of the reads. To demonstrate the benefits of our score, we measure the quality of the assemblies generated in the GAGE and Assemblathon 1 assembly "bake-offs" with our metric. Even without knowledge of the true reference sequence, our de novo metric closely matches the reference-based evaluation metrics used in the studies and outperforms other de novo metrics traditionally used to measure assembly quality (such as N50). Finally, we highlight the application of our score to optimize assembly parameters used in genome assemblers, which enables better assemblies to be produced, even without prior knowledge of the genome being assembled. Likelihood-based measures, such as ours proposed here, will become the new standard for de novo assembly evaluation.
    BMC Research Notes 08/2013; 6(1):334. DOI:10.1186/1756-0500-6-334
    • "rRNA-depleted RNA was selected by using the Ribo-Zero™ rRNA removal kit following manufacturer's protocol (EpiCenter) and quantified using a Nanodrop 7500 spectrophotometer. 100 ng of rRNA-depleted RNA was fragmented and RNAseq library preparation was carried out as described previously (Adamidi et al., 2011). RNA-seq was performed on a HiSeq2000 sequencing platform with 1 × 100 cycles of single read single-plex sequencing, in accordance with manufacturer's instructions (Illumina)."
    ABSTRACT: Transcriptome analysis of polar bears (Ursus maritimus) yielded sequences with highest similarity to the human endogenous retrovirus group HERV-K(HML-2). Further analysis of the polar bear draft genome identified an endogenous betaretrovirus group comprising 26 proviral copies and 231 solo LTRs. Molecular dating indicates the group originated before the divergence of bears from a common ancestor but is not present in all carnivores. Closely related sequences were identified in the giant panda (Ailuropoda melanoleuca) and characterized from its genome. We have designated the polar bear and giant panda sequences U. maritimus endogenous retrovirus (UmaERV) and A. melanoleuca endogenous retrovirus (AmeERV), respectively. Phylogenetic analysis demonstrated that the bear virus group is nested within the HERV-K supergroup among bovine and bat endogenous retroviruses suggesting a complex evolutionary history within the HERV-K group. All individual remnants of proviral sequences contain numerous frameshifts and stop codons and thus, the virus is likely non-infectious.
    Virology 05/2013; 443(1). DOI:10.1016/j.virol.2013.05.008 · 3.32 Impact Factor
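
The BMC Research Notes abstract above contrasts a likelihood-based assembly score (the probability of observing the reads given the assembled sequence) with N50. The Python sketch below is a toy illustration of that contrast, not the authors' implementation. The function names, the uniform start-position model, the forward-strand-only assumption, the substitution-only error model, and the default `error_rate` are all assumptions made for this example.

```python
import math


def n50(lengths):
    """Contig length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contigs given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def read_log_likelihood(read, assembly, error_rate=0.01):
    """log P(read | assembly) under a toy model (assumed, not from the source):
    uniform start position, forward strand only, independent substitutions, no indels."""
    n_starts = len(assembly) - len(read) + 1
    if n_starts <= 0:
        return float("-inf")
    total = 0.0
    for start in range(n_starts):
        p = 1.0
        for r, a in zip(read, assembly[start:start + len(read)]):
            p *= (1 - error_rate) if r == a else error_rate / 3
        total += p
    return math.log(total / n_starts)


def assembly_log_likelihood(reads, assembly, error_rate=0.01):
    """Sum of per-read log-likelihoods; a higher value means the reads fit the assembly better."""
    return sum(read_log_likelihood(r, assembly, error_rate) for r in reads)


if __name__ == "__main__":
    truth = "ACGTACGGTCA"
    wrong = "ACGTTCGGTCA"  # one substituted base
    reads = ["ACGTA", "GTACG", "CGGTC", "GGTCA"]  # error-free reads drawn from truth
    print("N50 of contig lengths [2, 3, 4, 5, 6]:", n50([2, 3, 4, 5, 6]))
    print("score(truth) > score(wrong):",
          assembly_log_likelihood(reads, truth) > assembly_log_likelihood(reads, wrong))
```

In this toy case both candidate sequences have the same length, so a length-based statistic such as N50 cannot tell them apart, while the likelihood score favors the sequence the reads were drawn from. A practical implementation would at least need to handle both strands, indels, and numerical underflow for long reads.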