Comparison of alignment software for genome-wide bisulphite sequence data

Department of Pathology, Dunedin School of Medicine, University of Otago, 270 Great King Street, Dunedin 9054, New Zealand.
Nucleic Acids Research (Impact Factor: 9.11). 02/2012; 40(10):e79. DOI: 10.1093/nar/gks150
Source: PubMed


Recent advances in next generation sequencing (NGS) technology now provide the opportunity to rapidly interrogate the methylation status of the genome. However, there are challenges in handling and interpretation of the methylation sequence data because of its large volume and the consequences of bisulphite modification. We sequenced reduced representation human genomes on the Illumina platform and efficiently mapped and visualized the data with different pipelines and software packages. We examined three pipelines for aligning bisulphite converted sequencing reads and compared their performance. We also comment on pre-processing and quality control of Illumina data. This comparison highlights differences in methods for NGS data processing and provides guidance to advance sequence-based methylation data analysis for molecular biologists.


Available from: Aniruddha Chatterjee,
  • Source
    • "For the liver methylome samples, the quality decreased towards the end of the sequenced reads; therefore the reads were hard-trimmed from 100 bp to 65 bp to improve data quality. The adaptor sequences were removed from the reads with the cleanadaptors program of the DMAP package as previously described [8] [22]. The brain methylome dataset (read length = 49 bp) contained negligible levels of adaptor sequences (evaluated with cleanadaptors and FastQC)."
    ABSTRACT: Zebrafish (Danio rerio) is a vertebrate model organism that is widely used for studying a plethora of biological questions, including developmental processes, effects of external cues on phenotype, and human disease modeling. DNA methylation is an important epigenetic mechanism that contributes to gene regulation, and is prevalent in all vertebrates. Reduced representation bisulfite sequencing (RRBS) is a cost-effective technique to generate genome-wide DNA methylation maps and has been used in mammalian genomes (e.g., human, mouse and rat) but not in zebrafish. High-resolution DNA methylation data in zebrafish are limited: increased availability of such data will enable us to model and better understand the roles, causes and consequences of changes in DNA methylation. Here we present five high-resolution DNA methylation maps for wild-type zebrafish brain (two pooled male and two pooled female methylomes) and liver. These data were generated using the RRBS technique (includes 1.43 million CpG sites of zebrafish genome) on the Illumina HiSeq platform. Alignment to the reference genome was performed using the Zv9 genome assembly. To our knowledge, these datasets are the only RRBS datasets and base-resolution DNA methylation data available at this time for zebrafish brain and liver. These datasets could serve as a resource for future studies to document the functional role of DNA methylation in zebrafish. In addition, these datasets could be used as controls while performing analysis on treated samples.
    Genomics Data 12/2014; 2(December 2014):342–344. DOI:10.1016/j.gdata.2014.10.008
  • Source
    • "rovic, 2007, 2009), MethyLight (Eads et al., 2000) and epiTYPER (reviewed in McLean et al., 2012). Next generation sequencing has meant that DNA methylation can now be interrogated on a genome wide scale by shot-gun sequencing (Cokus et al., 2008; Lister et al., 2009) or by reduced representation bisulfite sequencing (RRBS) (Chatterjee et al., 2012). DNA methylation can also be interrogated by restriction enzymes that target methylated DNA (i.e., Guo et al., 2011) and this method can also detect 5′ hydroxymethylcytosine (Davis and Vaisvila, 2011). Antibodies against 5′ methyl cytosine and 5′ hydroxymethylcytosine can be used to enrich for methylated regions of th"
    ABSTRACT: Epigenetic mechanisms are proposed as an important way in which the genome responds to the environment. Epigenetic marks, including DNA methylation and histone modifications, can be triggered by environmental effects, and lead to permanent changes in gene expression, affecting the phenotype of an organism. Epigenetic mechanisms have been proposed as key in plasticity, allowing environmental exposure to shape future gene expression. While we are beginning to understand how these mechanisms have roles in human biology and disease, we have little understanding of their roles and impacts on ecology and evolution. In this review, we discuss different types of epigenetic marks, their roles in gene expression and plasticity, methods for assaying epigenetic changes, and point out the future advances we require to understand fully the impact of this field. J. Exp. Zool. (Mol. Dev. Evol.) 9999B: 1–13, 2014. © 2014 Wiley Periodicals, Inc.
    Journal of Experimental Zoology Part B Molecular and Developmental Evolution 06/2014; 322(4). DOI:10.1002/jez.b.22571 · 2.31 Impact Factor
  • Source
    • "Fonseca et al. [25] classified the tools according to their indexing techniques and supported features such as mismatches, splicing, indels, gapped alignment, and minimum and maximum read lengths. Chatterjee et al. [26] compared Bismark, BSMAP, and RMAPBS in terms of uniquely mapped read percentages, multiple mapping percentages, CPU running time, and reads mapped per second. They also pointed out that trimming the data before aligning could improve mapping efficiency."
    ABSTRACT: Background. Large-scale bisulfite treatment and short-read sequencing technology allow comprehensive estimation of methylation states of Cs in the genomes of different tissues, cell types, and developmental stages. Accurate characterization of DNA methylation is essential for understanding genotype-phenotype association, gene and environment interaction, diseases, and cancer. Aligning bisulfite short reads to a reference genome has been a challenging task. We compared five bisulfite short read mapping tools, BSMAP, Bismark, BS-Seeker, BiSS, and BRAT-BW, representing two classes of mapping algorithms (hash table and suffix/prefix tries). We examined their mapping efficiency (i.e., the percentage of reads that can be mapped to the genomes), usability, running time, and effects of changing default parameter settings using both real and simulated reads. We also investigated how preprocessing data might affect mapping efficiency. Conclusion. Among the five programs compared, in terms of mapping efficiency, Bismark performs the best on the real data, followed by BiSS, BSMAP, and finally BRAT-BW and BS-Seeker with very similar performance. If CPU time is not a constraint, Bismark is a good choice of program for mapping bisulfite treated short reads. Data quality greatly affects mapping efficiency. Although increasing the number of mismatches allowed can increase mapping efficiency, it not only significantly slows down the program, but also runs the risk of increasing false positives. Therefore, users should carefully set the related parameters depending on the quality of their sequencing data.
    Advances in Bioinformatics 04/2014; 2014:472045. DOI:10.1155/2014/472045
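
The comparison criteria quoted in the last excerpt (uniquely mapped read percentage, multiple mapping percentage, CPU running time, and reads mapped per second) can be illustrated with a minimal Python sketch. The class and function names here are hypothetical, the read counts are assumed to have been taken from an aligner's own report, and "reads mapped per second" is assumed to mean mapped reads (unique plus multiple) divided by CPU seconds; none of the cited papers specifies this exact calculation.

```python
from dataclasses import dataclass


@dataclass
class MappingRun:
    """Counts from one aligner run (hypothetical container, not an aligner API)."""
    total_reads: int      # reads given to the aligner
    unique_reads: int     # reads mapped to exactly one location
    multi_reads: int      # reads mapped to more than one location
    cpu_seconds: float    # CPU running time of the run


def summarize(run: MappingRun) -> dict:
    """Return the four comparison metrics for a single run."""
    if run.total_reads <= 0 or run.cpu_seconds <= 0:
        raise ValueError("total_reads and cpu_seconds must be positive")
    mapped = run.unique_reads + run.multi_reads
    if mapped > run.total_reads:
        raise ValueError("mapped reads exceed total reads")
    return {
        "unique_pct": 100.0 * run.unique_reads / run.total_reads,
        "multi_pct": 100.0 * run.multi_reads / run.total_reads,
        "unmapped_pct": 100.0 * (run.total_reads - mapped) / run.total_reads,
        # Assumption: throughput counts mapped reads only, per CPU second.
        "reads_per_second": mapped / run.cpu_seconds,
    }


if __name__ == "__main__":
    # Made-up numbers for illustration only; not results from any paper.
    example = MappingRun(total_reads=1000, unique_reads=600,
                         multi_reads=150, cpu_seconds=50.0)
    for name, value in summarize(example).items():
        print(f"{name}: {value:.2f}")
```

Computing the same summary for each aligner on the same input reads would give directly comparable figures, provided each tool's definition of a "unique" hit is checked first, since that definition can differ between programs.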