Article

MacCallum, I. et al. ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads. Genome Biol. 10, R103 (2009).

Broad Institute of MIT and Harvard, Charles Street, Cambridge, MA 02141, USA.
Genome biology (Impact Factor: 10.47). 10/2009; 10(10):R103. DOI: 10.1186/gb-2009-10-10-r103
Source: PubMed

ABSTRACT We demonstrate that genome sequences approaching finished quality can be generated from short paired reads. Using 36 base (fragment) and 26 base (jumping) reads from five microbial genomes of varied GC composition and sizes up to 40 Mb, ALLPATHS2 generated assemblies with long, accurate contigs and scaffolds. Velvet and EULER-SR were less accurate. For example, for Escherichia coli, the fraction of 10-kb stretches that were perfect was 99.8% (ALLPATHS2), 68.7% (Velvet), and 42.1% (EULER-SR).
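
The "fraction of 10-kb stretches that were perfect" figure can be illustrated with a minimal sketch. It assumes the assembly has already been aligned to the reference without gaps (two equal-length strings) and that stretches are non-overlapping windows. The abstract does not describe the evaluation procedure the authors used, so this is only one plausible reading of the metric, and the function name `perfect_window_fraction` is hypothetical.

```python
def perfect_window_fraction(assembly: str, reference: str, window: int = 10_000) -> float:
    """Fraction of non-overlapping windows in which assembly equals reference.

    Assumption: both sequences are already aligned base-for-base (no gaps),
    so position i in the assembly corresponds to position i in the reference.
    """
    if len(assembly) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0:
        raise ValueError("window must be positive")

    n_windows = len(reference) // window  # trailing partial window is ignored
    if n_windows == 0:
        raise ValueError("sequence is shorter than one window")

    perfect = sum(
        assembly[i * window:(i + 1) * window] == reference[i * window:(i + 1) * window]
        for i in range(n_windows)
    )
    return perfect / n_windows


# Toy example with 4-base windows instead of 10 kb:
reference = "ACGT" * 5
assembly = reference[:6] + "A" + reference[7:]  # one substitution at position 6
print(perfect_window_fraction(assembly, reference, window=4))  # 4 of 5 windows match: 0.8
```

Under this reading, a single base error spoils an entire window, which is why the metric separates assemblers much more sharply than per-base accuracy would.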

  • Source
    • "A new assembly tool, called AllPaths [42], offsets this defect. It was initially designed for large genome assembly using PE reads obtained from the Solexa sequencer, but now it works both on small and large genomes because of the release of a revised version called AllPaths2 [41]. AllPaths preprocesses reads to correct errors [62-65]."
    ABSTRACT: The recent breakthroughs in next-generation sequencing technologies, such as those of Roche 454, Illumina/Solexa, and ABI SOLiD, have dramatically reduced the cost of producing short reads of the genome of new species. The huge volume of reads, along with short read length, high coverage, and sequencing errors, poses a great challenge to de novo genome assembly. However, the paired-end information provides a new solution to these problems. In this paper, we review and compare some current assembly tools, including Newbler, CAP3, Velvet, SOAPdenovo, AllPaths, Abyss, IDBA, PE-Assembly, and Telescoper. In general, we compare the seed extension and graph-based methods that use the overlap/layout/consensus approach and the de Bruijn graph approach for assembly. At the end of the paper, we summarize these methods and discuss the future directions of genome assembly.
    Tsinghua Science & Technology 10/2013; 18(5):500-514. DOI:10.1109/TST.2013.6616523
  • Source
    • "De novo genome assembly requires overlapping stretches of read sequences to form a long continuous sequence. For example, many assembler implementations use de Bruijn graphs [35] [36] to find solutions to this problem [37] [38] [39] [40] [41]. Parts of the sequence that have a large number of overlapping reads are said to have high coverage and are generally accurately assembled."
    ABSTRACT: Reduced costs and increased speed and accuracy of sequencing can bring the genome-based evaluation of individual disease risk to the bedside. While past efforts have identified a number of actionable mutations, the bulk of genetic risk remains hidden in sequence data. The biggest challenge facing genomic medicine today is the development of new techniques to predict the specifics of a given human phenome (set of all expressed phenotypes) encoded by each individual variome (full set of genome variants) in the context of the given environment. Numerous tools exist for the computational identification of the functional effects of a single variant. However, the pipelines taking advantage of full genomic, exomic, transcriptomic (and other) sequences have only recently become a reality. This review looks at the building of methodologies for predicting "variome"-defined disease risk. It also discusses some of the challenges for incorporating such a pipeline into everyday medical practice.
    Journal of Molecular Biology 08/2013; 431(21). DOI:10.1016/j.jmb.2013.07.038 · 4.33 Impact Factor
  • Source
    • "Though error correction has been a part of the AllPathsLG genome assembler for the past several versions, only recently has a stand-alone version of their Python-based error correction module (http://www.broadinstitute.org/software/allpaths-lg/blog/?p=577 and Maccallum et al., 2009), which leverages several of the AllPaths subroutines, become available. With exception to the minimum kmer frequency, which was set to 0 (unique kmers retained in the final corrected dataset), the AllPathsLG error correction software was run using default settings for correcting errors contained within the raw sequencing reads."
    ABSTRACT: The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by the use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure, the de novo construction of a reference transcriptome, must be completed as a prerequisite to further analyses. The accurate reference is critically important, as all downstream steps, including estimating transcript abundance, are critically dependent on the construction of an accurate reference. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on and, while they have been shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show via use of a simulated and empiric dataset that applying error correction to sequencing reads has significant positive effects on assembly accuracy, and should be applied to all datasets. A complete collection of commands which will allow for the production of Reptile corrected reads is available at https://github.com/macmanes/error_correction/tree/master/scripts and as File S1.
    PeerJ 07/2013; 1:e113. DOI:10.7717/peerj.113 · 2.10 Impact Factor