Finished bacterial genomes from shotgun sequence data

Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA.
Genome Research (Impact Factor: 14.63). 07/2012; 22(11). DOI: 10.1101/gr.141515.112
Source: PubMed


Exceptionally accurate genome reference sequences have proven to be of great value to microbial researchers. Thus, to date, about 1800 bacterial genome assemblies have been "finished" at great expense with the aid of manual laboratory and computational processes that typically iterate over a period of months or even years. By applying a new laboratory design and new assembly algorithm to 16 samples, we demonstrate that assemblies exceeding finished quality can be obtained from whole-genome shotgun data and automated computation. Cost and time requirements are thus dramatically reduced.


Available from: Dariusz Przybylski
    • "[...] proved to be useful in generating high-quality assemblies at a relatively low cost (Koren et al., 2012; Ribeiro et al., 2012; Deshpande et al., 2013)."
    ABSTRACT: Motivation: Recent advances in single molecule real-time (SMRT) and nanopore sequencing technologies have enabled high-quality assemblies from long and inaccurate reads. However, these approaches require high coverage by long reads and remain expensive. On the other hand, the inexpensive short reads technologies produce accurate but fragmented assemblies. Thus, a hybrid approach that assembles long reads (with low coverage) and short reads has a potential to generate high-quality assemblies at reduced cost. Results: We describe hybridSPAdes algorithm for assembling short and long reads and benchmark it on a variety of bacterial assembly projects. Our results demonstrate that hybridSPAdes generates accurate assemblies (even in projects with relatively low coverage by long reads) thus reducing the overall cost of genome sequencing. We further present the first complete assembly of a genome from single cells using SMRT reads. Availability and implementation: hybridSPAdes is implemented in C++ as a part of SPAdes genome assembler and is publicly available at CONTACT:
    Article · Nov 2015 · Bioinformatics
    • "To enable the use of the ALLPATHS-LG genome assembler, we built two specialized Illumina libraries: a "fragment library" with paired-end 300 bp reads (i.e. 2 x 300 bp) and a "jumping library" with mate-pair reads with an average insert size of approximately 6.5 kb. Briefly, ALLPATHS-LG first joins paired-end reads from the fragment library that overlap to create longer reads, from which it builds a de Bruijn graph to construct contigs; the longer insert jumping library is then incorporated into the de Bruijn graph to scaffold the contigs, resolve repeats, and flatten the graph (Ribeiro et al. 2012). Since all Saccharomyces genomes contain Ty retrotransposons that are approximately 6 kb, duplicate gene families, and several other large repeats, a long-read or long-insert scaffolding strategy is critical to providing physical evidence that spans gaps to order and orient contigs."
    ABSTRACT: The dramatic phenotypic changes that occur in organisms during domestication leave indelible imprints on their genomes. Although many domesticated plants and animals have been systematically compared to their wild genetic stocks, the molecular and genomic processes underlying fungal domestication have received less attention. Here we present a nearly complete genome assembly for the recently described yeast species Saccharomyces eubayanus and compare it to the genomes of multiple domesticated alloploid hybrids of S. eubayanus x S. cerevisiae (S. pastorianus syn. S. carlsbergensis), which are used to brew lager-style beers. We find that the S. eubayanus subgenomes of lager-brewing yeasts have experienced increased rates of evolution since hybridization, and that certain genes involved in metabolism may have been particularly affected. Interestingly, the S. eubayanus subgenome underwent an especially strong shift in selection regimes, consistent with more extensive domestication of the S. cerevisiae parent prior to hybridization. In contrast to recent proposals that lager-brewing yeasts were domesticated following a single hybridization event, the radically different neutral site divergences between the subgenomes of the two major lager yeast lineages strongly favor at least two independent origins for the S. cerevisiae x S. eubayanus hybrids that brew lager beers. Our findings demonstrate how this industrially important hybrid has been domesticated along similar evolutionary trajectories on multiple occasions. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
    Article · Aug 2015 · Molecular Biology and Evolution
    • "To date a few algorithms have been released that are capable of upgrading PacBio CLR data with high accuracy data from CCS or short read NGS data, among which PacBioToCA [9] and LSC [10]. These are further incorporated into hybrid assembly methods such as Celera [11], MIRA [12] and ALLPATHS-LG [13]. Even though promising results have been obtained, the error-correction step with short reads requires a sufficient read length (>75 bp) and sequencing depth, as well as large computational demands. "
    ABSTRACT: Background: The recent introduction of the Pacific Biosciences RS single molecule sequencing technology has opened new doors to scaffolding genome assemblies in a cost-effective manner. The long read sequence information is promised to enhance the quality of incomplete and inaccurate draft assemblies constructed from Next Generation Sequencing (NGS) data. Results: Here we propose a novel hybrid assembly methodology that aims to scaffold pre-assembled contigs in an iterative manner using PacBio RS long read information as a backbone. On a test set comprising six bacterial draft genomes, assembled using either a single Illumina MiSeq or Roche 454 library, we show that even a 50× coverage of uncorrected PacBio RS long reads is sufficient to drastically reduce the number of contigs. Comparisons to the AHA scaffolder indicate our strategy is better capable of producing (nearly) complete bacterial genomes. Conclusions: The current work describes our SSPACE-LongRead software which is designed to upgrade incomplete draft genomes using single molecule sequences. We conclude that the recent advances of the PacBio sequencing technology and chemistry, in combination with the limited computational resources required to run our program, allow to scaffold genomes in a fast and reliable manner.
    Article · Jun 2014 · BMC Bioinformatics
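
One of the excerpts above describes ALLPATHS-LG as first joining overlapping paired-end reads into longer reads and then building a de Bruijn graph from them. The toy Python sketch below illustrates those two ideas only. It is not the ALLPATHS-LG implementation: the function names (merge_pair, de_bruijn_graph), the exact-match overlap rule, and the min_overlap default are all assumptions made for illustration, and real assemblers additionally handle sequencing errors, base qualities, and both strands.

```python
from collections import defaultdict

# Translation table for complementing DNA bases (assumes uppercase A/C/G/T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=4):
    """Join a read and its mate into one longer read if they overlap exactly.

    read2 is assumed to come from the opposite strand, so it is
    reverse-complemented before the suffix/prefix comparison.
    Returns the merged sequence, or None if no overlap of at least
    min_overlap bases is found. (Hypothetical helper; exact matching only.)
    """
    mate = reverse_complement(read2)
    # Try the longest possible overlap first, then shrink.
    for olen in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-olen:] == mate[:olen]:
            return read1 + mate[olen:]
    return None


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, one edge per distinct k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGC"                     # made-up 12-base fragment
    read1 = fragment[:8]                          # forward read
    read2 = reverse_complement(fragment[4:])      # mate from the other strand
    merged = merge_pair(read1, read2)
    print(merged)
    for node, successors in sorted(de_bruijn_graph([merged], k=4).items()):
        print(node, "->", sorted(successors))
```

Contigs correspond to unbranched paths in such a graph; the longer-insert "jumping library" mentioned in the excerpt is what would then be used to order and orient those contigs across repeats, a step this sketch does not attempt.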