FadE: Whole genome methylation analysis for multiple sequencing platforms

Program in Computational Biology and Bioinformatics, University of Southern California, 1050 Childs Way, RRI 201, Los Angeles, CA 90089, USA and Life Technologies Corporation, 850 Lincoln Centre Drive, Foster City, CA 94404, USA.
Nucleic Acids Research (Impact Factor: 9.11). 09/2012; 41(1). DOI: 10.1093/nar/gks830


DNA methylation plays a central role in genomic regulation and disease. Sodium bisulfite treatment (SBT) causes unmethylated cytosines to be sequenced as thymine, so methylation levels are reflected in the number of 'C'-'C' alignments covering each reference cytosine. Di-base color reads produced by Life Technologies' SOLiD sequencer give unreliable results when translated to bases, because a single sequencing error affects all downstream bases. We describe FadE, an algorithm that accurately determines genome-wide methylation rates directly in color or nucleotide space. FadE uses SBT-treated unmethylated data and untreated data to determine background error rates. It incorporates these rates into a model that uses Newton-Raphson optimization to estimate the methylation rate at every reference cytosine and to provide a credible interval describing its distribution. To investigate genome-wide methylation levels, we sequenced two slides of a bisulfite-converted fragment library from a human fibroblast cell line with the SOLiD sequencer. FadE reported widespread differences in methylation levels across CpG islands and a large number of differentially methylated regions adjacent to genes. These results compare favorably with those of an investigation of the same cell line that used nucleotide-space reads at higher coverage, suggesting that FadE accurately estimates genome-wide methylation from either color or nucleotide reads. FadE is available at http://code.google.com/p/fade/.
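Why translating color reads to bases is fragile can be shown with a small sketch of di-base encoding. It assumes the commonly described SOLiD scheme in which bases are coded A=0, C=1, G=2, T=3 and each color is the XOR of two adjacent base codes. The helper names `encode` and `decode` are hypothetical and are not part of FadE. Decoding rebuilds each base from the previous one, so one wrong color corrupts every base after it.

```python
# Hypothetical illustration of di-base (color-space) encoding; not FadE code.
# Assumes the common 2-bit scheme A=0, C=1, G=2, T=3 with color = base1 XOR base2.
BASES = "ACGT"
IDX = {b: i for i, b in enumerate(BASES)}


def encode(seq):
    """Return one color (0-3) per adjacent base pair."""
    return [IDX[a] ^ IDX[b] for a, b in zip(seq, seq[1:])]


def decode(first_base, colors):
    """Translate colors back to bases, starting from a known first base."""
    codes = [IDX[first_base]]
    for c in colors:
        # Each base depends on the previously decoded base.
        codes.append(codes[-1] ^ c)
    return "".join(BASES[i] for i in codes)


seq = "ACGTTGCA"
colors = encode(seq)
assert decode(seq[0], colors) == seq

# Simulate a single sequencing error in the third color.
bad = list(colors)
bad[2] = (bad[2] + 1) % 4
print(decode(seq[0], bad))  # every base after the error is wrong
```

Working in color space, as the abstract describes for FadE, avoids this error propagation: a single miscalled color stays a single mismatch against the color-encoded reference instead of turning into a run of base mismatches.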

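The estimation step can be pictured with a deliberately simplified per-cytosine model. This is an assumption for illustration, not FadE's published likelihood. Suppose k of n reads covering a cytosine show 'C'. Let `e_u` be a hypothetical background rate at which an unmethylated cytosine is still read as C, which might be estimated from SBT-treated unmethylated data. Let `e_m` be a hypothetical rate at which a methylated cytosine is read as something else, which might be estimated from untreated data. Under those assumptions, a read shows C with probability p(m) = e_u + m(1 − e_m − e_u). The sketch maximizes the binomial log-likelihood over m with Newton-Raphson and then forms an equal-tailed credible interval from a uniform-prior posterior on a grid. All function names are hypothetical.

```python
import math

# Simplified, hypothetical per-cytosine model (not FadE's exact likelihood):
# P(read shows C | methylation rate m) = e_u + m * (1 - e_m - e_u),
# with 0 < e_u, 0 < e_m and e_u + e_m < 1 so that 0 < p < 1 on [0, 1].


def log_lik(m, k, n, e_u, e_m):
    """Binomial log-likelihood of k 'C' reads out of n (constant term dropped)."""
    p = e_u + m * (1.0 - e_m - e_u)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def estimate_rate(k, n, e_u, e_m, m0=0.5, tol=1e-10, max_iter=100):
    """Newton-Raphson maximization of log_lik over m, clipped to [0, 1]; needs n > 0."""
    d = 1.0 - e_m - e_u
    m = m0
    for _ in range(max_iter):
        p = e_u + m * d
        grad = d * (k / p - (n - k) / (1.0 - p))
        hess = -d * d * (k / p**2 + (n - k) / (1.0 - p) ** 2)  # always < 0
        new_m = min(1.0, max(0.0, m - grad / hess))
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m


def credible_interval(k, n, e_u, e_m, level=0.95, grid=2001):
    """Equal-tailed interval of the posterior under a uniform prior (grid approximation)."""
    ms = [i / (grid - 1) for i in range(grid)]
    logs = [log_lik(m, k, n, e_u, e_m) for m in ms]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # subtract max to avoid underflow
    total = sum(weights)
    lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
    cum, lo, hi = 0.0, None, None
    for m, w in zip(ms, weights):
        cum += w / total
        if lo is None and cum >= lo_q:
            lo = m
        if hi is None and cum >= hi_q:
            hi = m
    return lo, hi


# Hypothetical data: 18 of 25 reads show C; assumed background error rates.
k, n, e_u, e_m = 18, 25, 0.01, 0.02
print(estimate_rate(k, n, e_u, e_m), credible_interval(k, n, e_u, e_m))
```

In this toy model the maximum-likelihood estimate has the closed form (k/n − e_u)/(1 − e_m − e_u), clipped to [0, 1], so the Newton-Raphson result can be checked against it. An iterative optimizer only becomes necessary once the likelihood is richer, for example when it includes per-read or color-space error terms.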

Available from: Ting Chen