Pash 3.0: A versatile software package for read mapping and integrative analysis of genomic and epigenomic variation using massively parallel DNA sequencing

Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza Houston, TX 77030, USA.
BMC Bioinformatics (Impact Factor: 2.67). 11/2010; 11(1):572. DOI: 10.1186/1471-2105-11-572
Source: PubMed

ABSTRACT Massively parallel sequencing readouts of epigenomic assays are enabling integrative genome-wide analyses of genomic and epigenomic variation. Pash 3.0 performs sequence comparison and read mapping and can be employed as a module within diverse configurable analysis pipelines, including ChIP-Seq and methylome mapping by whole-genome bisulfite sequencing.
Pash 3.0 generally matches the accuracy and speed of niche programs for fast mapping of short reads, and exceeds their performance on longer reads generated by a new generation of massively parallel sequencing technologies. By exploiting longer read lengths, Pash 3.0 maps reads onto the large fraction of genomic DNA that contains repetitive elements and polymorphic sites, including indel polymorphisms.
We demonstrate the versatility of Pash 3.0 by analyzing the interaction between CpG methylation, CpG SNPs, and imprinting based on publicly available whole-genome shotgun bisulfite sequencing data. Pash 3.0 uses gapped k-mer alignment, a non-seed-based comparison method implemented with multi-positional hash tables. This allows Pash 3.0 to run on diverse hardware platforms, including individual computers with standard RAM capacity, multi-core hardware architectures, and large clusters.
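
The abstract does not describe how Pash 3.0 implements gapped k-mer alignment, so the following is only a minimal sketch of the general idea: sample some positions of a window according to a pattern, hash the sampled bases, and count hits per alignment diagonal. The sampling pattern `MASK` and the function names `gapped_kmer`, `build_index`, and `candidate_diagonals` are illustrative assumptions, not Pash's actual parameters or API, and a single Python dictionary stands in for the multi-positional hash tables mentioned above.

```python
from collections import defaultdict

# Hypothetical sampling pattern (an assumption, not Pash's pattern):
# '1' = base is sampled, '0' = base is skipped.
MASK = "1101011"


def gapped_kmer(seq, start, mask=MASK):
    """Return the bases of seq under the '1' positions of mask, starting at start."""
    return "".join(seq[start + i] for i, bit in enumerate(mask) if bit == "1")


def build_index(reference, mask=MASK):
    """Map each gapped k-mer to the list of reference offsets where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - len(mask) + 1):
        index[gapped_kmer(reference, pos, mask)].append(pos)
    return index


def candidate_diagonals(read, index, mask=MASK):
    """Count gapped k-mer hits per diagonal (reference offset minus read offset)."""
    votes = defaultdict(int)
    for rpos in range(len(read) - len(mask) + 1):
        for gpos in index.get(gapped_kmer(read, rpos, mask), ()):
            votes[gpos - rpos] += 1
    return dict(votes)


if __name__ == "__main__":
    reference = "ACGTACGGTCATTGACCTGA"
    read = reference[5:17]  # exact copy of reference positions 5..16
    print(candidate_diagonals(read, build_index(reference)))
```

Because skipped positions are never compared, a window can still hit the index when a mismatch falls under a `0` in the pattern, which is why diagonals with many votes remain good candidates for a full alignment even when the read differs from the reference.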

Available from: Ronald Alan Harris, Jul 08, 2015
  • Source
    • "However, the massive amount of data generated by NGS poses a great bioinformatics challenge in terms of data processing and analysis. Recently, a number of computational methods have been developed for mining the DNA methylation data generated by NGS, such as RRBSMAP [Xi et al., 2012], BS Seeker [Chen et al., 2010], Bismark [Krueger and Andrews, 2011], PASH [Coarfa et al., 2010], RMAP [Smith et al., 2009], BRATbw [Harris et al., 2012], SAAP-RRBS [Sun et al., 2012], methylKit [Akalin et al., 2012], Meth Tools 2.0 [Grunau et al., 2000], Methyl-Analyzer [Xin et al., 2011], BSmooth [Hansen et al., 2012], EpiExplorer [Halachev et al., 2012], GBSA [Benoukraf et al., 2013], and QDMRs [Zhang et al., 2011]. Among them, RRBSMAP is a short-read alignment tool for handling RRBS data [Xi et al., 2012]. "
    ABSTRACT: In reduced representation bisulfite sequencing (RRBS), genomic DNA is digested with a restriction enzyme and then subjected to next-generation sequencing, which enables detection and quantification of DNA methylation at whole-genome scale and at low cost. However, processing, interpreting, and analyzing the huge amounts of data generated pose a bioinformatics challenge. We developed RRBS-Analyser, a comprehensive genome-scale DNA methylation analysis server based on RRBS data. RRBS-Analyser can assess sequencing quality, generate detailed statistical information, align bisulfite-treated short reads to a reference genome, identify and annotate methylcytosines, and associate them with different genomic features in CG, CHG and CHH content. RRBS-Analyser supports detection, annotation and visualization of differentially methylated regions (DMRs) for multiple samples from nine reference organisms. Moreover, RRBS-Analyser provides researchers with detailed annotation of DMR-containing genes, which will greatly aid subsequent studies. The input of RRBS-Analyser can be raw FASTQ reads, generic SAM format, or a self-defined format containing individual 5mC sites. RRBS-Analyser can be widely used by researchers in the epigenetics community who want to unravel the complexities of the DNA methylome. RRBS-Analyser is freely available at
    Human Mutation 12/2013; 34(12). DOI:10.1002/humu.22444 · 5.05 Impact Factor
  • Source
    • "This not only causes problems in library construction and cluster formation on a sequencing plate, but also more profoundly affects alignment of bisulfite reads to the genome, i.e., mapping. Several algorithms have been developed to improve mapping of WGBS data (Xi and Li 2009; Coarfa et al. 2010; Krueger and Andrews 2011; Frith et al. 2012; Otto et al. 2012), but the problem remains not entirely solved. The confidence of mapping WGBS reads is generally lower than mapping standard, non-bisulfite-converted reads. "
    ABSTRACT: Recent advancements in sequencing-based DNA methylation profiling methods provide an unprecedented opportunity to map complete DNA methylomes. These include whole genome bisulfite sequencing (WGBS, MethylC-seq or BS-seq), Reduced-Representation Bisulfite-Sequencing (RRBS), and enrichment-based methods such as MeDIP-seq, MBD-seq and MRE-seq. These methods yield largely comparable results, but differ significantly in extent of genomic CpG coverage, resolution, quantitative accuracy, and cost, at least while using current algorithms to interrogate the data. None of these existing methods provides single-CpG resolution, comprehensive genome-wide coverage, and cost feasibility for a typical laboratory. We introduce methylCRF, a novel Conditional Random Fields-based algorithm that integrates methylated DNA immunoprecipitation (MeDIP-seq) and methylation-sensitive restriction enzyme (MRE-seq) sequencing data to predict DNA methylation levels at single CpG resolution. Our method is a combined computational and experimental strategy to produce DNA methylomes of all 28 million CpGs in the human genome for a fraction (<10%) of the cost of whole genome bisulfite sequencing methods. MethylCRF was benchmarked for accuracy against Infinium arrays, RRBS, WGBS sequencing and locus specific-bisulfite sequencing performed on the same embryonic stem cell line. MethylCRF transformation of MeDIP-seq/MRE-seq was equivalent to a biological replicate of WGBS in quantification, coverage and resolution. We used conventional bisulfite conversion, PCR, cloning and sequencing to validate loci where our predictions do not agree with whole genome bisulfite data, and in 11 out of 12 cases methylCRF predictions of methylation level agree better with validated results than does whole genome bisulfite sequencing. Therefore, methylCRF transformation of MeDIP-seq/MRE-seq data provides an accurate, inexpensive and widely accessible strategy to create full DNA methylomes.
    Genome Research 06/2013; DOI:10.1101/gr.152231.112 · 13.85 Impact Factor
  • Source
    ABSTRACT: Copy number alterations are important contributors to many genetic diseases, including cancer. We present the readDepth package for R, which can detect these aberrations by measuring the depth of coverage obtained by massively parallel sequencing of the genome. In addition to achieving higher accuracy than existing packages, our tool runs much faster by utilizing multi-core architectures to parallelize the processing of these large data sets. In contrast to other published methods, readDepth does not require the sequencing of a reference sample, and uses a robust statistical model that accounts for overdispersed data. It includes a method for effectively increasing the resolution obtained from low-coverage experiments by utilizing breakpoint information from paired end sequencing to do positional refinement. We also demonstrate a method for inferring copy number using reads generated by whole-genome bisulfite sequencing, thus enabling integrative study of epigenomic and copy number alterations. Finally, we apply this tool to two genomes, showing that it performs well on genomes sequenced to both low and high coverage. The readDepth package runs on Linux and MacOSX, is released under the Apache 2.0 license, and is available at
    PLoS ONE 01/2011; 6(1):e16327. DOI:10.1371/journal.pone.0016327 · 3.53 Impact Factor