FadE: whole genome methylation analysis for multiple sequencing platforms.

Program in Computational Biology and Bioinformatics, University of Southern California, 1050 Childs Way, RRI 201, Los Angeles, CA 90089, USA and Life Technologies Corporation, 850 Lincoln Centre Drive, Foster City, CA 94404, USA.
Nucleic Acids Research (Impact Factor: 8.81). 09/2012; DOI: 10.1093/nar/gks830
Source: PubMed

ABSTRACT DNA methylation plays a central role in genomic regulation and disease. Sodium bisulfite treatment (SBT) causes unmethylated cytosines to be sequenced as thymine, which allows methylation levels to be reflected in the number of 'C'-'C' alignments covering reference cytosines. Di-base color reads produced by Life Technologies' SOLiD sequencer provide unreliable results when translated to bases because single sequencing errors affect the downstream sequence. We describe FadE, an algorithm to accurately determine genome-wide methylation rates directly in color or nucleotide space. FadE uses SBT unmethylated and untreated data to determine background error rates and incorporate them into a model that uses Newton-Raphson optimization to estimate the methylation rate and provide a credible interval describing its distribution at every reference cytosine. We sequenced two slides of a bisulfite-converted fragment library from a human fibroblast cell line with the SOLiD sequencer to investigate genome-wide methylation levels. FadE reported widespread differences in methylation levels across CpG islands and a large number of differentially methylated regions adjacent to genes. These results compare favorably to those of an investigation on the same cell line using nucleotide-space reads at higher coverage levels, suggesting that FadE is an accurate method to estimate genome-wide methylation with color or nucleotide reads.
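
As a rough illustration of the estimation step described in the abstract, the following Python sketch fits a methylation rate at a single reference cytosine with Newton-Raphson. It is not FadE's actual model, which the abstract does not specify. It assumes a simple binomial likelihood in which a read shows 'C' with probability p(m) = e_unconverted + m(1 - e_unconverted - e_miscall). The function name, the two background error parameters, and their default values are all hypothetical, and the credible interval is not attempted.

```python
def estimate_methylation(c_reads, total_reads,
                         e_unconverted=0.02, e_miscall=0.01,
                         tol=1e-10, max_iter=100):
    """Maximum-likelihood methylation rate at one cytosine (illustrative only).

    Assumed model (not taken from the paper):
      e_unconverted: chance an unmethylated C is still read as 'C'
      e_miscall:     chance a methylated C is read as something other than 'C'
    """
    if total_reads <= 0 or not 0 <= c_reads <= total_reads:
        raise ValueError("need 0 <= c_reads <= total_reads and total_reads > 0")
    if not (e_unconverted > 0 and e_miscall > 0
            and e_unconverted + e_miscall < 1):
        raise ValueError("error rates must be positive and sum to less than 1")

    slope = 1.0 - e_unconverted - e_miscall   # dp/dm
    other_reads = total_reads - c_reads
    m = 0.5                                   # starting guess
    for _ in range(max_iter):
        p = e_unconverted + m * slope         # P(read shows 'C' | m)
        # First and second derivatives of the binomial log-likelihood in m.
        grad = slope * (c_reads / p - other_reads / (1.0 - p))
        hess = -slope ** 2 * (c_reads / p ** 2 + other_reads / (1.0 - p) ** 2)
        # Newton-Raphson step, clamped to the valid range [0, 1].
        m_new = min(1.0, max(0.0, m - grad / hess))
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


if __name__ == "__main__":
    # 30 of 40 reads show 'C' at this site.
    print(estimate_methylation(30, 40))
```

Under this simplified model the answer also has a closed form, (c_reads / total_reads - e_unconverted) / (1 - e_unconverted - e_miscall) clipped to [0, 1], so the iteration is only a demonstration of the optimization technique named in the abstract.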

  • Source
    ABSTRACT: It is becoming clear that epigenetic changes are involved in human disease as well as during normal development. A unifying theme of disease epigenetics is defects in phenotypic plasticity--cells' ability to change their behaviour in response to internal or external environmental cues. This model proposes that hereditary disorders of the epigenetic apparatus lead to developmental defects, that cancer epigenetics involves disruption of the stem-cell programme, and that common diseases with late-onset phenotypes involve interactions between the epigenome, the genome and the environment. Increased understanding of epigenetic-disease mechanisms could lead to disease-risk stratification for targeted intervention and to targeted therapies.
    Nature 06/2007; 447(7143):433-40. · 42.35 Impact Factor
  • Source
    ABSTRACT: The determination of single nucleotide polymorphisms (SNPs) has become faster and more cost effective since the advent of short read data from next generation sequencing platforms such as Roche's 454 Sequencer, Illumina's Solexa platform, and Applied Biosystems' SOLiD sequencer. The SOLiD sequencing platform, which is capable of producing more than 6 GB of sequence data in a single run, uses a unique encoding scheme where color reads represent transitions between adjacent nucleotides. The determination of SNPs from color reads usually involves the translation of color alignments to likely nucleotide strings to facilitate the use of tools designed for nucleotide reads. This technique results in the loss of significant information in the color read, producing many incorrect SNP calls, especially if regions exist with dense or adjacent polymorphism. Additionally, color reads align ambiguously and incorrectly more often than nucleotide reads, making integrated SNP calling a difficult challenge. We have developed ComB, a SNP calling tool which operates directly in color space, using a Bayesian model to incorporate unique and ambiguous reads to iteratively determine SNP identity. ComB is capable of accurately calling short consecutive nucleotide polymorphisms and densely clustered SNPs, both of which other SNP calling tools fail to identify. ComB, which is capable of using billions of short reads to accurately and efficiently perform whole human genome SNP calling in parallel, is also capable of using sequence data or even integrating sequence and color space data sets. We use real and simulated data to demonstrate that ComB's iterative strategy and recalibration of quality scores allow it to discover more true SNPs while calling fewer false positives than tools which use only color alignments as well as tools which translate color reads to nucleotide strings.
    Journal of computational biology: a journal of computational molecular cell biology 06/2011; 18(6):795-807. · 1.69 Impact Factor
  • ABSTRACT: The expression of the insulin-like growth factor 2 (Igf2) and H19 genes is imprinted. Although these neighbouring genes share an enhancer, H19 is expressed only from the maternal allele, and Igf2 only from the paternally inherited allele. A region of paternal-specific methylation upstream of H19 appears to be the site of an epigenetic mark that is required for the imprinting of these genes. A deletion within this region results in loss of imprinting of both H19 and Igf2 (ref. 5). Here we show that this methylated region contains an element that blocks enhancer activity. The activity of this element is dependent upon the vertebrate enhancer-blocking protein CTCF. Methylation of CpGs within the CTCF-binding sites eliminates binding of CTCF in vitro, and deletion of these sites results in loss of enhancer-blocking activity in vivo, thereby allowing gene expression. This CTCF-dependent enhancer-blocking element acts as an insulator. We suggest that it controls imprinting of Igf2. The activity of this insulator is restricted to the maternal allele by specific DNA methylation of the paternal allele. Our results reveal that DNA methylation can control gene expression by modulating enhancer access to the gene promoter through regulation of an enhancer boundary.
    Nature 06/2000; 405(6785):482-5. · 42.35 Impact Factor
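
The abstracts above note that SOLiD color reads represent transitions between adjacent nucleotides and that a single sequencing error corrupts the downstream sequence once colors are translated to bases. The Python sketch below illustrates this with the standard two-bit representation of di-base encoding, in which each color is the XOR of the codes of two adjacent bases. The function names and the example sequence are made up for illustration and are not taken from FadE or ComB.

```python
# Two-bit base codes; the color for a pair of adjacent bases is their XOR.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_colors(bases):
    """Return the list of colors (0-3) for each adjacent pair of bases."""
    codes = [BASE_TO_CODE[b] for b in bases]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode_colors(first_base, colors):
    """Translate colors back to bases, starting from a known first base."""
    code = BASE_TO_CODE[first_base]
    out = [first_base]
    for color in colors:
        code ^= color              # each base depends on all earlier colors
        out.append(CODE_TO_BASE[code])
    return "".join(out)


if __name__ == "__main__":
    seq = "ACGTTGCA"
    colors = encode_colors(seq)
    print(colors, decode_colors(seq[0], colors))

    # Simulate one color-call error at index 2 and translate again.
    bad = list(colors)
    bad[2] ^= 3
    print(bad, decode_colors(seq[0], bad))
```

Because decoding accumulates every earlier color, one miscalled color shifts every base after it, which is the motivation both abstracts give for working directly in color space.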

