Comparison of alignment software for genome-wide bisulphite sequence data

Department of Pathology, Dunedin School of Medicine, University of Otago, 270 Great King Street, Dunedin 9054, New Zealand.
Nucleic Acids Research (Impact Factor: 8.81). 02/2012; 40(10):e79. DOI: 10.1093/nar/gks150
Source: PubMed

ABSTRACT: Recent advances in next generation sequencing (NGS) technology now provide the opportunity to rapidly interrogate the methylation status of the genome. However, handling and interpreting methylation sequence data is challenging because of its large volume and the consequences of bisulphite modification. We sequenced reduced-representation human genomes on the Illumina platform and efficiently mapped and visualized the data with different pipelines and software packages. We examined three pipelines for aligning bisulphite-converted sequencing reads and compared their performance. We also comment on pre-processing and quality control of Illumina data. This comparison highlights differences in methods for NGS data processing and provides guidance to advance sequence-based methylation data analysis for molecular biologists.
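To make "the consequences of bisulphite modification" concrete: unmethylated cytosines read as T after bisulphite treatment, so reads no longer match the reference base-for-base. A widely used workaround is "three-letter" alignment, in which both reads and reference are converted in silico (C to T) before matching, and methylation is then called by comparing the original read with the unconverted reference. The minimal Python sketch below assumes exact matching and the forward (C-to-T) strand only; the function names are hypothetical and it is not the implementation of any pipeline compared in this paper.

```python
def bisulphite_convert(seq: str) -> str:
    """In-silico C->T conversion, as for the forward (top) strand."""
    return seq.upper().replace("C", "T")


def find_converted_hits(read: str, reference: str) -> list:
    """Offsets where the converted read exactly matches the converted reference."""
    conv_read = bisulphite_convert(read)
    conv_ref = bisulphite_convert(reference)
    n = len(conv_read)
    return [i for i in range(len(conv_ref) - n + 1)
            if conv_ref[i:i + n] == conv_read]


def call_methylation(read: str, reference: str, offset: int) -> dict:
    """Compare the ORIGINAL read with the UNCONVERTED reference at each
    reference C: a C in the read means the cytosine was protected from
    conversion (methylated, 'M'); a T means it was converted ('U')."""
    calls = {}
    for j, read_base in enumerate(read.upper()):
        pos = offset + j
        if reference[pos].upper() != "C":
            continue  # only reference cytosines carry methylation information
        if read_base == "C":
            calls[pos] = "M"
        elif read_base == "T":
            calls[pos] = "U"
    return calls


reference = "ATCGGACGTTACGA"
read = "CGGATGTT"  # C at reference position 2 kept (methylated), C at 6 read as T
for offset in find_converted_hits(read, reference):
    print(offset, call_methylation(read, reference, offset))  # prints: 2 {2: 'M', 6: 'U'}
```

Converting both sequences hides the bisulphite-induced C/T differences from the matching step, and the methylation state is recovered afterwards from the unconverted sequences. Practical aligners also search the G-to-A converted strand and tolerate mismatches, which this sketch omits.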

Available from: Aniruddha Chatterjee, Jun 18, 2015

  • Source
    ABSTRACT: Although formalin-fixed paraffin-embedded (FFPE) tissue is a major biological source in cancer research, it is challenging to work with because of macromolecular fragmentation and nucleic acid crosslinking. It is therefore important to characterise the quality of data that can be obtained from FFPE samples. We have compared three independent platforms (next generation sequencing, microarray and NanoString) for profiling microRNAs (miRNAs) using clinical FFPE samples from hepatoblastoma (HB) patients. The number of detected miRNAs ranged from 228 to 345 (median = 294) using the next generation sequencing platform, whereas 79 to 125 (median = 112) miRNAs were identified using microarrays in three HB samples, including technical replicates. NanoString identified 299 to 372 miRNAs in two samples. Between the platforms, we observed high reproducibility and significant levels of shared detection. However, for commonly detected miRNAs, a strong correlation between platforms was not observed. Analysis of 10 additional HB samples with NanoString identified significantly overlapping miRNA expression profiles, and an alternative pattern was identified in a poorly differentiated HB with an aggressive phenotype. This investigation serves as a roadmap for future studies investigating miRNA expression in clinical FFPE samples, and as a guideline for the selection of an appropriate platform.
    MicroRNAs (miRNAs) are a large group of small non-protein-coding RNAs that are important regulators of gene expression [1,2]. This group of small RNAs is expressed in normal cells at all stages of development, as well as in cancer cells. A number of miRNAs are overexpressed in cancer and have been shown to function as oncogenes. These "oncomiRs" promote cancer development by negatively regulating tumour suppressor genes, as well as genes controlling cell differentiation and apoptosis. Other types of miRNAs are underexpressed in cancer and frequently function as tumour suppressor genes [3,4].
    Scientific Reports 06/2015; 5:10438. DOI:10.1038/srep10438 · 5.08 Impact Factor
  • Source
    ABSTRACT: Background. Large-scale bisulfite treatment and short-read sequencing technology allow comprehensive estimation of the methylation states of Cs in the genomes of different tissues, cell types, and developmental stages. Accurate characterization of DNA methylation is essential for understanding genotype-phenotype association, gene-environment interaction, diseases, and cancer. Aligning bisulfite short reads to a reference genome has been a challenging task. We compared five bisulfite short-read mapping tools, BSMAP, Bismark, BS-Seeker, BiSS, and BRAT-BW, representing two classes of mapping algorithms (hash table and suffix/prefix tries). We examined their mapping efficiency (i.e., the percentage of reads that can be mapped to the genome), usability, running time, and the effects of changing default parameter settings, using both real and simulated reads. We also investigated how preprocessing data might affect mapping efficiency. Conclusion. Among the five programs compared, Bismark performs best in terms of mapping efficiency on the real data, followed by BiSS, BSMAP, and finally BRAT-BW and BS-Seeker with very similar performance. If CPU time is not a constraint, Bismark is a good choice for mapping bisulfite-treated short reads. Data quality greatly affects mapping efficiency. Although increasing the number of allowed mismatches can increase mapping efficiency, it also slows the programs down significantly and risks increasing false positives. Users should therefore set the related parameters carefully according to the quality of their sequencing data.
    Advances in Bioinformatics 04/2014; 2014:472045. DOI:10.1155/2014/472045
  • Source
    ABSTRACT: Genetic changes occurring in different stages of pre-cancer lesions reflect causal events initiating and promoting the progression to cancer. Co-existing pre-cancerous lesions, including low- and high-grade squamous intraepithelial lesions (LGSIL and HGSIL), and adjacent "normal" cervical epithelium from six formalin-fixed paraffin-embedded samples were selected. Tissues from these 18 samples were isolated using laser-capture microdissection, and RNA was extracted and sequenced. RNA sequencing generated 2.4 billion raw reads in 18 samples, of which ~50.1% mapped to known and annotated genes in the human genome. Forty genes were up-regulated and 3 down-regulated (normal to LGSIL) in at least one-third of the sample pairs (same direction and FDR p < 0.05), including S100A7 and KLK6. Previous studies have shown that S100A7 and KLK7 are up-regulated in several other cancers, whereas CCL18, CFTR, and SLC6A14, also differentially expressed in two samples, are up-regulated specifically in cervical cancer. These differentially expressed genes in normal to LGSIL progression were enriched in pathways related to epithelial cell differentiation, keratinocyte differentiation, peptidase, and extracellular activities. In progression from LGSIL to HGSIL, two genes were up-regulated and five down-regulated in at least two samples. Further investigations using co-existing samples, which account for all internal confounders, will provide insights to better understand progression of cervical pre-cancer.
    Frontiers in Oncology 11/2014; 4:339. DOI:10.3389/fonc.2014.00339
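
The trade-off described in the Advances in Bioinformatics abstract above, where allowing more mismatches raises mapping efficiency at a cost in speed and specificity, can be illustrated with a deliberately naive Python sketch. It defines mapping efficiency as the percentage of reads with a unique best hit on a C-to-T converted reference. The function names, the brute-force scan and the uniqueness rule are illustrative assumptions, not the behaviour of BSMAP, Bismark, BS-Seeker, BiSS or BRAT-BW.

```python
def convert(seq: str) -> str:
    """In-silico C->T conversion (forward strand only, for simplicity)."""
    return seq.upper().replace("C", "T")


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_hits(read: str, conv_ref: str, max_mismatches: int) -> list:
    """All offsets sharing the lowest mismatch count, provided that count
    does not exceed max_mismatches (brute-force scan, no index)."""
    conv_read = convert(read)
    n = len(conv_read)
    best, positions = None, []
    for i in range(len(conv_ref) - n + 1):
        d = hamming(conv_read, conv_ref[i:i + n])
        if d > max_mismatches:
            continue
        if best is None or d < best:
            best, positions = d, [i]
        elif d == best:
            positions.append(i)
    return positions


def mapping_efficiency(reads: list, reference: str, max_mismatches: int) -> float:
    """Percentage of reads with exactly one best hit (ambiguous reads are
    treated as unmapped, a common but not universal convention)."""
    if not reads:
        return 0.0
    conv_ref = convert(reference)
    mapped = sum(1 for r in reads
                 if len(best_hits(r, conv_ref, max_mismatches)) == 1)
    return 100.0 * mapped / len(reads)


reference = "ATCGGACGTTACGA"
reads = ["CGGATGTT",   # matches the converted reference exactly
         "CGGATGTA"]   # same read with one simulated sequencing error
for k in (0, 1):
    print(k, mapping_efficiency(reads, reference, k))  # 0 -> 50.0, 1 -> 100.0
```

Raising `max_mismatches` from 0 to 1 rescues the read carrying a sequencing error, so efficiency rises from 50% to 100%. On a real genome the same change also lets more reads land on wrong or multiple locations, and in indexed aligners it enlarges the search space, which is why the setting should follow data quality.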