
ART: a next-generation sequencing read simulator.

Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA.
Bioinformatics (Impact Factor: 4.62). 12/2011; 28(4):593-4. DOI: 10.1093/bioinformatics/btr708
Source: PubMed

ABSTRACT ART is a set of simulation tools that generate synthetic next-generation sequencing reads. This functionality is essential for testing and benchmarking tools for next-generation sequencing data analysis including read alignment, de novo assembly and genetic variation discovery. ART generates simulated sequencing reads by emulating the sequencing process with built-in, technology-specific read error models and base quality value profiles parameterized empirically in large sequencing datasets. We currently support all three major commercial next-generation sequencing platforms: Roche's 454, Illumina's Solexa and Applied Biosystems' SOLiD. ART also allows the flexibility to use customized read error model parameters and quality profiles. AVAILABILITY: Both source and binary software packages are available at http://www.niehs.nih.gov/research/resources/software/art.
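The core idea in the abstract, emulating sequencing by drawing base quality values from an empirical profile and injecting errors at the rate those values imply, can be illustrated with a minimal Python sketch. This is not ART's implementation: the function names (`phred_to_error_prob`, `simulate_read`, `to_fastq`), the shape of the quality profile (one list of observed Phred scores per read position), and the substitution-only error model are all assumptions made for illustration. ART's built-in models are technology-specific and are not reproduced here.

```python
import random

def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability: p = 10^(-q/10)."""
    return 10 ** (-q / 10)

def simulate_read(reference, start, length, quality_profile, rng):
    """Simulate one read with substitution errors only (illustrative assumption).

    quality_profile[pos] is a hypothetical list of Phred scores observed at
    read position pos; one score is sampled per base.
    """
    fragment = reference[start:start + length]
    bases, quals = [], []
    for pos, true_base in enumerate(fragment):
        q = rng.choice(quality_profile[pos])
        if rng.random() < phred_to_error_prob(q):
            # Substitute with a different base, chosen uniformly.
            base = rng.choice([b for b in "ACGT" if b != true_base])
        else:
            base = true_base
        bases.append(base)
        quals.append(q)
    return "".join(bases), quals

def to_fastq(name, seq, quals):
    """Format a read as a FASTQ record using the Phred+33 encoding."""
    qual_str = "".join(chr(q + 33) for q in quals)
    return f"@{name}\n{seq}\n+\n{qual_str}\n"

if __name__ == "__main__":
    rng = random.Random(42)
    reference = "ACGTTGCAAGCTTAGGCTAACGTTAGC"
    profile = [[30, 35, 40]] * 10  # toy profile: same scores at every position
    seq, quals = simulate_read(reference, 5, 10, profile, rng)
    print(to_fastq("read_1", seq, quals), end="")
```

Because the true origin of every simulated read is known, output like this can serve as ground truth when benchmarking an aligner or variant caller, which is the use case the abstract describes.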

  • Conference Paper: Supersonic MiB
    ABSTRACT: A novel assembly pipeline, MiB, employs Minimum Description Length (MDL), de Bruijn graphs and Bayesian estimation for reference-assisted assembly of a novel genome. In a previous study, MiB assembly was compared with nine other assembly algorithms, showing significant improvement in results coupled with very large execution times. This correspondence introduces 'Supersonic MiB', an extension of our previous work, MiB. Supersonic MiB aims to stimulate the assembly pipeline of MiB, showing significant improvement in execution time compared to its predecessor.
    2013 IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS); 11/2013
  • ABSTRACT: A comprehensive catalog of variability in a given species is useful for many important purposes, e.g., designing high-density arrays or pinpointing potential mutations of economic or physiological interest. Here we provide a genome-wide, worldwide catalog of single nucleotide variants by simultaneously analyzing the shotgun sequence of 128 pigs and five suid outgroups. Despite the high SNP missing rate of some individuals (up to 88%), we retrieved over 48 million high-quality variants. Of them, we were able to assess the ancestral allele of more than 39M biallelic SNPs. We found SNPs in 21,455 out of the 25,322 annotated genes in pig assembly 10.2. The annotation showed that more than 40% of the variants were novel variants, not present in dbSNP. Surprisingly, we found a large variability in transition/transversion rate along the genome, which is very well explained (R² = 0.79) primarily by genome differences in CpG content and recombination rate. The number of SNPs per window also varied but was less dependent on known factors such as gene density, missing rate or recombination (R² = 0.48). When we divided the samples into four groups, Asian wild boar (ASWB), Asian domestics (ASDM), European wild boar (EUWB) and European domestics (EUDM), we found a marked correlation in allele frequencies between domestics and wild boars within Asia and within Europe, but not across continents, due to the large evolutionary distance between pigs of both continents (~1.2 MYA). In general, the porcine species showed a small percentage of SNPs exclusive to each population group. EUWB and EUDM were predicted to harbor a larger fraction of potentially deleterious mutations, according to the SIFT algorithm, than Asian samples, perhaps a result of background selection being less effective due to a lower effective population size in Europe.
    PLoS ONE 01/2015; 10(3):e0118867. DOI:10.1371/journal.pone.0118867 · 3.53 Impact Factor
  • ABSTRACT: Background: Somatically acquired structural variations (SVs) and copy number variations (CNVs) can induce genetic changes that are directly related to tumorigenesis. Somatic SV/CNV detection using next-generation sequencing (NGS) data still faces major challenges introduced by tumor sample characteristics, such as ploidy, heterogeneity, and purity. A simulated cancer genome with known SVs and CNVs can serve as a benchmark for evaluating the performance of existing somatic SV/CNV detection tools and developing new methods. Results: SCNVSim is a tool for simulating somatic CNVs and SVs. In addition to multiple types of SV and CNV events, the tool is capable of simulating important features related to tumor samples, including aneuploidy, heterogeneity and purity. Conclusions: SCNVSim generates the genomes of a cancer cell population with detailed information on copy number status, loss of heterozygosity (LOH), and event break points, which is essential for developing and evaluating somatic CNV and SV detection methods in cancer genomics studies.
    BMC Bioinformatics 02/2015; 16(1). DOI:10.1186/s12859-015-0502-7 · 2.67 Impact Factor
