Assembly algorithms for next-generation sequencing data

J. Craig Venter Institute, Rockville, MD 20850-3343, USA.
Genomics (Impact Factor: 2.79). 03/2010; 95(6):315-27. DOI: 10.1016/j.ygeno.2010.03.001
Source: PubMed

ABSTRACT The emergence of next-generation sequencing platforms led to a resurgence of research in whole-genome shotgun assembly algorithms and software. DNA sequencing data from the Roche 454, Illumina/Solexa, and ABI SOLiD platforms typically present shorter read lengths, higher coverage, and different error profiles compared with Sanger sequencing data. Since 2005, several assembly software packages have been created or revised specifically for de novo assembly of next-generation sequencing data. This review summarizes and compares the published descriptions of packages named SSAKE, SHARCGS, VCAKE, Newbler, Celera Assembler, Euler, Velvet, ABySS, AllPaths, and SOAPdenovo. More generally, it compares the two standard methods known as the de Bruijn graph approach and the overlap/layout/consensus approach to assembly.

  • DESCRIPTION: The Instituto Universitario de Enfermedades Tropicales y Salud Pública de Canarias (University Institute of Tropical Diseases and Public Health of the Canary Islands) lacked a platform properly configured for genome assembly of data obtained by massive DNA sequencing with Ion Torrent technology. To address this, IonGAP is presented: a public Web platform capable of performing both the assembly process and subsequent preliminary data analyses in an automated, user-friendly way. There are many public Web tools designed to help researchers in particular stages of genomic data analysis; however, the originality of IonGAP lies in providing simple, automated use of a whole collection of genome assembly and analysis applications, specifically configured to handle microbial genomic data generated with Ion Torrent chemistry.
  • ABSTRACT: For high-level molecular phylogenies, a comprehensive sampling design is a key factor not only for improving inferential accuracy, but also for maximizing the explanatory power of the resulting phylogeny. Two standing problems in molecular phylogenies are the unstable placement of some deep and long branches and conflicts between the relationships shown by robustly supported clades and recognized knowledge. Empirical and theoretical studies suggest that increasing taxon sampling is expected to ameliorate, if not resolve, both problems; however, neither the current taxonomic system nor the established phylogeny provides sufficient information to guide additional sampling design. We examined the phylogeny of the spider family Linyphiidae, and selected ingroup species based on epigynal morphology, which can be reconstructed in a phylogenetic context. Our analyses resulted in seven robustly supported clades within linyphiids. The placements of four deep and long branches are sensitive to variations in both outgroup and ingroup sampling, suggesting the possibility of long branch attraction artifacts. Results of ancestral state reconstruction indicate that successive state transformations of the epigynal plate are associated with early cladogenetic events in linyphiid diversification. Representatives of different subfamilies were mixed together within well-supported clades, and examination revealed that their defining characters, as per traditional taxonomy, are homoplastic. Furthermore, our results demonstrated that increased taxon sampling produced a more informative framework, which in turn helps in studying character evolution and interpreting the relationships among linyphiid lineages. Additional defining characters are needed to revise the linyphiid taxonomic system based on our phylogenetic hypothesis. Copyright © 2015 Elsevier Inc. All rights reserved.
    Molecular Phylogenetics and Evolution 05/2015; DOI:10.1016/j.ympev.2015.05.005 · 4.02 Impact Factor
  • ABSTRACT: Background: Whole genome sequences (WGS) have proliferated as sequencing technology continues to improve and costs decline. While many WGS of model or domestic organisms have been produced, a growing number of non-model species are also being sequenced. In the absence of a reference, construction of a genome sequence necessitates de novo assembly, which may be beyond the ability of many labs due to the large volumes of raw sequence data and extensive bioinformatics required. In contrast, the presence of a reference WGS allows for alignment, which is more tractable than assembly. Recent work has highlighted that the reference need not come from the same species, potentially enabling a wide array of species WGS to be constructed using cross-species alignment. Here we report on the creation of a draft WGS from a single bighorn sheep (Ovis canadensis) using alignment to the closely related domestic sheep (Ovis aries). Results: Two sequencing libraries on SOLiD platforms yielded over 865 million reads, and combined alignment to the domestic sheep reference resulted in a nearly complete sequence (95% coverage of the reference) at an average of 12x read depth (104 SD). From this we discovered over 15 million variants and annotated them relative to the domestic sheep reference. We then conducted an enrichment analysis of those SNPs showing fixed differences between the reference and sequenced individual and found significant differences in a number of gene ontology (GO) terms, including those associated with reproduction, muscle properties, and bone deposition. Conclusion: Our results demonstrate that cross-species alignment enables the creation of novel WGS for non-model organisms. The bighorn sheep WGS will provide a resource for future resequencing studies or comparative genomics.
    BMC Genomics 05/2015; 16(1). DOI:10.1186/s12864-015-1618-x · 4.04 Impact Factor

