SmashCell: a software framework for the analysis of single-cell amplified genome sequences

Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA.
Bioinformatics (Impact Factor: 4.62). 10/2010; 26(23):2979-80. DOI: 10.1093/bioinformatics/btq564

ABSTRACT Recent advances in single-cell manipulation technology, whole genome amplification and high-throughput sequencing have now made it possible to sequence the genome of an individual cell. The bioinformatic analysis of these genomes, however, is far more complicated than the analysis of those generated using traditional, culture-based methods. To simplify this analysis, we have developed SmashCell (Simple Metagenomics Analysis SHell, for sequences from single Cells). It is designed to automate the main steps in microbial genome analysis (assembly, gene prediction and functional annotation) in a way that allows parameter and algorithm exploration at each step in the process. It also manages the data created by these analyses and provides visualization methods for rapid analysis of the results.
The SmashCell source code and a comprehensive manual are available at
Supplementary data are available at Bioinformatics online.


Available from: David A Relman, Jun 09, 2015
  • Source
    ABSTRACT: The rapid evolution of sequencing technologies, described by the term Next Generation Sequencing (NGS), has revolutionized metagenomic analysis. These technologies combine high-throughput analytical protocols with delicate measuring techniques in order to discover, properly assemble and map allelic sequences to the correct genomes, achieving particularly high yields for only a fraction of the cost of traditional processes (i.e., Sanger). From a bioinformatic perspective, this boils down to many GB of data being generated from each single sequencing experiment, making the management, or even the storage, of the data a critical bottleneck for the overall analytical endeavor. The complexity is further aggravated by the versatility of the processing steps available, represented by the numerous bioinformatic tools that are essential for each analytical task in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, though non-trivial, quality control of raw data to exceptionally complex protein annotation procedures, and require a high level of expertise for their proper application or for the neat implementation of the whole workflow. Furthermore, a bioinformatic analysis of such scale requires large computational resources, which makes the use of cloud computing infrastructures the sole realistic solution. In this review article we discuss different integrative bioinformatic solutions that address the aforementioned issues, by performing a critical assessment of the available automated pipelines for data management, quality control, and annotation of metagenomic data, covering various major sequencing technologies and applications.
    Frontiers in Cell and Developmental Biology 11/2014; 2:70. DOI:10.3389/fcell.2014.00070
  • Source
    ABSTRACT: The multiple displacement amplification method has revolutionized genomic studies of uncultured bacteria, where the extraction of pure DNA in sufficient quantity for next-generation sequencing is challenging. However, the method is problematic in that it amplifies the target DNA unevenly, induces the formation of chimeric reads and also amplifies contaminating DNA. Here, we have tested the reproducibility of the multiple displacement amplification method using serial dilutions of extracted genomic DNA and intact cells from the cultured endosymbiont Bartonella australis. The amplified DNA was sequenced with the Illumina sequencing technology, and the results were compared to sequence data obtained from unamplified DNA in this study as well as from a previously published genome project. We show that artifacts such as the extent of the amplification bias, the percentage of chimeric reads and the relative fraction of contaminating DNA increase dramatically for the smallest amounts of template DNA. The pattern of read coverage was reproducibly obtained for samples with higher amounts of template DNA, suggesting that the bias is non-random and genome-specific. A re-analysis of previously published sequence data obtained after amplification from clonal endosymbiont populations confirmed these predictions. We conclude that many of the artifacts associated with the use of the multiple displacement amplification method can be alleviated or much reduced by using multiple cells as the template for the amplification. These findings should be particularly useful for researchers studying the genomes of endosymbionts and other uncultured bacteria, for which a small clonal population of cells can be isolated.
    PLoS ONE 11/2013; 8(11):e82319. DOI:10.1371/journal.pone.0082319 · 3.53 Impact Factor
  • Source
    ABSTRACT: Complex microbial communities are an integral part of the Earth's ecosystem and of our bodies in health and disease. In the last two decades, culture-independent approaches have provided new insights into their structure and function, with the exponentially decreasing cost of high-throughput sequencing resulting in broadly available tools for microbial surveys. However, the field remains far from reaching a technological plateau, as both computational techniques and nucleotide sequencing platforms for microbial genomic and transcriptional content continue to improve. Current microbiome analyses are thus starting to adopt multiple and complementary meta'omic approaches, leading to unprecedented opportunities to comprehensively and accurately characterize microbial communities and their interactions with their environments and hosts. This diversity of available assays, analysis methods, and public data is in turn beginning to enable microbiome-based predictive and modeling tools. We thus review here the technological and computational meta'omics approaches that are already available, those that are under active development, their success in biological discovery, and several outstanding challenges.
    Molecular Systems Biology 05/2013; 9:666. DOI:10.1038/msb.2013.22 · 14.10 Impact Factor