ABSTRACT: Background
Many different methods exist to adjust for variability in cell-type mixture proportions when analyzing DNA methylation studies. Here we present the result of an extensive simulation study, built on cell-separated DNA methylation profiles from Illumina Infinium 450K methylation data, to compare the performance of eight methods including the most commonly used approaches.
We designed a rich multi-layered simulation containing a set of probes with true associations with either binary or continuous phenotypes, confounding by cell type, variability in means and standard deviations for population parameters, additional variability at the level of an individual cell-type-specific sample, and variability in the mixture proportions across samples. Performance varied substantially across methods and simulations. In particular, the number of false positives was sometimes unrealistically high, indicating limited ability to discriminate the true signals from those appearing significant through confounding. Methods that filtered probes consequently had poor power. QQ plots of p values across all tested probes showed that adjustments did not always improve the distribution. The same methods were used to examine associations between smoking and methylation data from a case–control study of colorectal cancer, and we also explored the effect of cell-type adjustments on associations between rheumatoid arthritis cases and controls.
We recommend surrogate variable analysis for cell-type mixture adjustment since performance was stable under all our simulated scenarios.
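The simulation described above rests on the idea that a bulk methylation value is a mixture of cell-type-specific values. A minimal sketch of that mixing step follows, assuming a plain proportion-weighted average with optional Gaussian noise. The function name, the Dirichlet-distributed proportions and the toy profiles are illustrative assumptions, not the study's actual simulation design.

```python
import numpy as np


def simulate_mixed_methylation(cell_profiles, proportions, noise_sd=0.0, rng=None):
    """Return bulk beta values as proportion-weighted averages of cell-type profiles.

    cell_profiles: (n_cell_types, n_probes) methylation levels in [0, 1]
    proportions:   (n_samples, n_cell_types), each row summing to 1
    """
    cell_profiles = np.asarray(cell_profiles, dtype=float)
    proportions = np.asarray(proportions, dtype=float)
    if not np.allclose(proportions.sum(axis=1), 1.0):
        raise ValueError("each sample's proportions must sum to 1")
    # Mixing step: one weighted average per sample and probe.
    bulk = proportions @ cell_profiles
    if noise_sd > 0:
        rng = np.random.default_rng() if rng is None else rng
        bulk = bulk + rng.normal(0.0, noise_sd, size=bulk.shape)
    # Beta values are bounded, so keep the result inside [0, 1].
    return np.clip(bulk, 0.0, 1.0)


# Toy example (hypothetical values): 2 cell types, 3 probes, 4 samples.
rng = np.random.default_rng(0)
profiles = np.array([[0.9, 0.1, 0.5],
                     [0.2, 0.8, 0.5]])
props = rng.dirichlet([5.0, 5.0], size=4)  # mixture proportions vary by sample
bulk = simulate_mixed_methylation(profiles, props)
```

In this toy setup the third probe has the same level in both cell types, so its bulk value does not depend on the mixture, whereas the first two probes shift with the proportions. That shift is the kind of confounding the compared methods try to remove.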
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-016-0935-y) contains supplementary material, which is available to authorized users.
ABSTRACT: Aim:
To identify regions of aberrant DNA methylation in acute lymphoblastic leukemia (ALL) cells of different subtypes on a genome-wide scale.
Materials & methods:
Whole-genome bisulfite sequencing (WGBS) was used to determine the DNA methylation levels in cells from four pediatric ALL patients of different subtypes. The findings were confirmed by 450k DNA methylation arrays in a large patient set.
Compared with mature B or T cells, WGBS detected on average 82,000 differentially methylated regions per patient. Differentially methylated regions are enriched in CpG-poor regions, active enhancers and transcriptional start sites. We also identified approximately 8000 CpG islands with variable intermediate DNA methylation that seems to occur as a result of stochastic de novo methylation.
WGBS provides an unbiased view and novel insights into the DNA methylome of ALL cells.
ABSTRACT: Background: Genetic determinants of stroke, the leading neurological cause of death and disability, are poorly understood and have seldom been explored in the general population. Our aim was to identify additional loci for stroke by doing a meta-analysis of genome-wide association studies. Methods: For the discovery sample, we did a genome-wide analysis of common genetic variants associated with incident stroke risk in 18 population-based cohorts comprising 84 961 participants, of whom 4348 had stroke. Stroke diagnosis was ascertained and validated by the study investigators. Mean age at stroke ranged from 45.8 years to 76.4 years, and data collection in the studies took place between 1948 and 2013. We did validation analyses for variants yielding a significant association (at p < 5 × 10⁻⁶) with all-stroke, ischaemic stroke, cardioembolic ischaemic stroke, or non-cardioembolic ischaemic stroke in the largest available cross-sectional studies (70 804 participants, of whom 19 816 had stroke). Summary-level results of discovery and follow-up stages were combined using inverse-variance weighted fixed effects meta-analysis, and in-silico lookups were done in stroke subtypes. For genome-wide significant findings (at p < 5 × 10⁻⁸), we explored associations with additional cerebrovascular phenotypes and did functional experiments using conditional (inducible) deletion of the probable causal gene in mice. We also studied the expression of orthologs of this probable causal gene and its effects on cerebral vasculature in zebrafish mutants. Findings: We replicated seven of eight known loci associated with risk for ischaemic stroke, and identified a novel locus at chromosome 6p25 (rs12204590, near FOXF2) associated with risk of all-stroke (odds ratio [OR] 1.08, 95% CI 1.05-1.12, p = 1.48 × 10⁻⁸; minor allele frequency 21%).
The rs12204590 stroke risk allele was also associated with increased MRI-defined burden of white matter hyperintensity, a marker of cerebral small vessel disease, in stroke-free adults (n=21 079; p=0.0025). Consistently, young patients (aged 2-32 years) with segmental deletions of FOXF2 showed an extensive burden of white matter hyperintensity. Deletion of Foxf2 in adult mice resulted in cerebral infarction, reactive gliosis, and microhaemorrhage. The orthologs of FOXF2 in zebrafish (foxf2b and foxf2a) are expressed in brain pericytes, and mutant foxf2b(-/-) cerebral vessels show decreased smooth muscle cell and pericyte coverage. Interpretation: We identified common variants near FOXF2 that are associated with increased stroke susceptibility. Epidemiological and experimental data suggest that FOXF2 mediates this association, potentially via differentiation defects of cerebral vascular mural cells. Further expression studies in appropriate human tissues, and further functional experiments with long follow-up periods, are needed to fully understand the underlying mechanisms.
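The abstract above combines discovery and follow-up results by inverse-variance weighted fixed-effects meta-analysis. A minimal sketch of that pooling step follows, assuming each study contributes an effect estimate (for example a log odds ratio) and its standard error. The function name and the input numbers are hypothetical and are not taken from the study.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Pool per-study effect estimates with inverse-variance weights."""
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p value under a standard normal null distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical log odds ratios from a discovery and a follow-up stage.
pooled_beta, pooled_se, z, p_value = fixed_effect_meta([0.10, 0.06], [0.02, 0.02])
```

With equal standard errors the pooled estimate is the plain average of the two inputs, and its standard error shrinks by a factor of the square root of two. More precise studies would otherwise pull the pooled estimate towards their own value.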
ABSTRACT: Vitamin B12 (cobalamin, Cbl) cofactors adenosylcobalamin (AdoCbl) and methylcobalamin (MeCbl) are required for the activity of the enzymes methylmalonyl CoA mutase (MCM) and methionine synthase (MS). Inborn errors of Cbl metabolism are rare Mendelian disorders associated with hematological and neurological manifestations, and elevations of methylmalonic acid and/or homocysteine in the blood and urine. We describe a patient whose fibroblasts had decreased functional activity of MCM and MS and decreased synthesis of AdoCbl and MeCbl (3.4% and 1.0% of cellular cobalamin, respectively). The defect in cultured patient fibroblasts complemented those from all known complementation groups. Patient cells accumulated transcobalamin-bound-Cbl, a complex which usually dissociates in the lysosome to release free Cbl. Whole exome sequencing identified putative disease-causing variants c.851T>G (p.L284*) and c.1019C>T (p.T340I) in transcription factor ZNF143. Proximity biotinylation analysis confirmed the interaction between ZNF143 and HCFC1, a protein that regulates expression of the cobalamin trafficking enzyme MMACHC. qRT-PCR analysis revealed low MMACHC expression levels both in patient fibroblasts, and in control fibroblasts incubated with ZNF143 siRNA.
ABSTRACT: Context: Congenital hypothyroidism due to thyroid dysgenesis (CHTD) is a disorder with a prevalence of one in 4,000 live births, the cause of which remains unknown. The most common diagnostic category is thyroid ectopy, which occurs in up to 80% of CHTD cases. CHTD is predominantly not inherited and has a high discordance rate (>92%) between monozygotic (MZ) twins. The sporadic nature of CHTD might be explained by somatic events such as autosomal monoallelic expression (AME), given that genes expressed in a monoallelic way are more vulnerable to otherwise benign heterozygous genetic or epigenetic mutations.
Objective: To search for complete (90%) AME in normal and dysgenetic thyroid tissues.
Methods: Aggregated analysis of whole-exome and bulk RNA sequencing performed on two ectopic thyroids, four normal thyroids and the human thyroid cell line Nthy-ori.
Results: A median of 5,062 (range 2,081-5,270) genes per sample showed sufficient numbers of heterozygous SNPs to be informative. The median monoallelic expression represented 22 (range 16-32) of the informative genes for each thyroid sample. Examples of genes displaying AME are FCGBP, ZNF331, USP10, BCLAF1 and some HLA genes; these genes are involved in epithelial-mesenchymal transition, cell migration, cancer and immunity.
Conclusions: AME may account for the high discordance rate observed between MZ twins and for the sporadic nature of CHTD. Our findings also have implications for other pathologies, including cancers and autoimmune disorders of the thyroid.
Full-text Article · Apr 2016 · Thyroid: official journal of the American Thyroid Association
ABSTRACT: Background:
Asthma and allergic rhinitis (AR) are common allergic comorbidities with a strong genetic component in which epigenetic mechanisms might be involved.
We aimed to identify novel risk loci for asthma and AR while accounting for parent-of-origin effect.
We performed a series of genetic analyses, taking into account the parent-of-origin effect in families ascertained through asthma: (1) genome-wide linkage scan of asthma and AR in 615 European families, (2) association analysis with 1233 single nucleotide polymorphisms (SNPs) covering the significant linkage region in 162 French Epidemiological Study on the Genetics and Environment of Asthma families with replication in 154 Canadian Saguenay-Lac-Saint-Jean asthma study families, and (3) association analysis of disease and significant SNPs with DNA methylation (DNAm) at CpG sites in 40 Saguenay-Lac-Saint-Jean asthma study families.
We detected a significant paternal linkage of the 4q35 region to asthma and allergic rhinitis comorbidity (AAR; P = 7.2 × 10⁻⁵). Association analysis in this region showed strong evidence for the effect of the paternally inherited G allele of rs10009104 on AAR (P = 1.1 × 10⁻⁵, reaching the multiple-testing corrected threshold). This paternally inherited allele was also significantly associated with DNAm levels at the cg02303933 site (P = 1.7 × 10⁻⁴). Differential DNAm at this site was found to mediate the identified SNP-AAR association.
By integrating genetic and epigenetic data, we identified that a differentially methylated CpG site within the melatonin receptor 1A (MTNR1A) gene mediates the effect of a paternally transmitted genetic variant on the comorbidity of asthma and AR. This study provides a novel insight into the role of epigenetic mechanisms in patients with allergic respiratory diseases.
Article · Mar 2016 · The Journal of allergy and clinical immunology
ABSTRACT: Background:
CpG methylation variation is involved in human trait formation and disease susceptibility. Analyses within populations have been biased towards CpG-dense regions through the application of targeted arrays. We generate whole-genome bisulfite sequencing data for approximately 30 adipose and blood samples from monozygotic and dizygotic twins for the characterization of non-genetic and genetic effects at single-site resolution.
Purely invariable CpGs display a bimodal distribution with enrichment of unmethylated CpGs and depletion of fully methylated CpGs in promoter and enhancer regions. Population-variable CpGs account for approximately 15-20% of total CpGs per tissue, are enriched in enhancer-associated regions and depleted in promoters, and single nucleotide polymorphisms at CpGs are a frequent confounder of extreme methylation variation. Differential methylation is primarily non-genetic in origin, with non-shared environment accounting for most of the variance. These non-genetic effects are mainly tissue-specific. Tobacco smoking is associated with differential methylation in blood with no evidence of this exposure impacting cell counts. In contrast to non-genetic effects, genetic effects on CpG methylation are shared across tissues and thus limit inter-tissue epigenetic drift. CpH methylation is rare, and shows variation patterns similar to those of CpGs.
Our study highlights the utility of low pass whole-genome bisulfite sequencing in identifying methylome variation beyond promoter regions, and suggests that targeting the population dynamic methylome of tissues requires assessment of understudied intergenic CpGs distal to gene promoters to reveal the full extent of inter-individual variation.
ABSTRACT: Neuromyelitis optica (NMO) is rare in Finland. To identify rare genetic variants contributing to NMO risk, we performed whole exome, HLA and regulatory region sequencing in all cases ascertained during 2005-2013 (n=5) in a Southern Finnish population of 1.6 million. No rare variant was shared by all patients. Four missense variants, in C3ORF20, PDZD2, C5ORF47 and ZNF606, were each shared by two patients. Another PDZD2 variant was found in a third patient. In the non-coding sequence, two predictably functional rare variants were shared by two patients. Our results do not support a homogeneous genetic etiology of NMO in Finland.
ABSTRACT: Motivation:
DNA methylation patterns are well known to vary substantially across cell types or tissues. Hence, existing normalization methods may not be optimal if they do not take this into account. We therefore present a new R package for normalization of data from the Illumina Infinium Human Methylation450 BeadChip (Illumina 450K) built on the concepts in the recently published funNorm method (Fortin et al. 2014), and introducing cell-type or tissue-type flexibility.
funtooNorm is relevant for data sets containing samples from two or more cell or tissue types. A visual display of cross-validated errors informs the choice of the optimal number of components in the normalization. Benefits of cell (tissue)-specific normalization are demonstrated in three data sets. Improvement can be substantial; it is strikingly better on chromosome X, where methylation patterns have unique inter-tissue variability.
Availability and implementation:
An R package is available at https://github.com/GreenwoodLab/funtooNorm, and has been submitted to Bioconductor at http://bioconductor.org.
Contact:
email@example.com
Supplementary information:
Supplementary data are available at Bioinformatics online.
ABSTRACT: Supplementary Methods. funtooNorm normalization method: Let $M$ ($n \times C$) represent a matrix of summary control-probe data for $n$ samples and $C$ control probe signals, where there is a column for the average log signal from each control probe type and each colour (red, green). We create a larger matrix, $M^*$, by adding additional columns representing the interactions between the control probe summaries and cell type indicators. For example, if there are 3 cell types, then the matrix $M^*$ will have $4C$ columns: the original matrix, as well as all interactions with 3 cell-type indicator variables. That is, $M^* = [\, M \;\; M^{(1)} \;\; M^{(2)} \;\cdots\; M^{(T)} \,]$, where $T$ is the number of cell types and $M^{(t)}$ represents the matrix $M$ multiplied by an indicator for cell type $t$, so that all rows from samples that are not cell type $t$ are zeros. The user can then choose whether to fit principal component regressions (PCR; as in the funNorm algorithm (Fortin, et al., 2014)), or partial least squares regressions (PLS) (Tenenhaus, 1998) predicting a series of quantiles of the A and B signals from the Illumina 450K BeadChip for each sample using this augmented $M^*$ as the covariates. As in funNorm, these models are fit separately for probe type I red, type I green, and type II. We fit models at 529 quantiles: every 0.002 on the quantile scale plus a slightly finer grid in the tails of the distributions. The augmented covariate matrix containing interactions with cell-type or tissue-type indicators allows the relationship between quantiles and control probes to be cell-(or tissue-) type specific, hence implementing additional flexibility. As in funNorm, predictions for signals A and B are obtained for all quantiles by linear interpolation between the quantile fits. An important element of any PLS or PCR model is the number of components needed.
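The augmented covariate matrix described above can be sketched in a few lines. This is a hedged NumPy illustration of the construction only (original control-probe summaries followed by one interaction block per cell type), not the funtooNorm package code, which is written in R. The function name, variable names and toy data are assumptions.

```python
import numpy as np


def augment_with_cell_type(control_summaries, cell_types):
    """Append one cell-type interaction block per cell type to the summaries.

    control_summaries: (n_samples, n_controls) control-probe summary matrix
    cell_types:        length n_samples sequence of cell-type labels
    """
    control_summaries = np.asarray(control_summaries, dtype=float)
    cell_types = np.asarray(cell_types)
    if control_summaries.shape[0] != cell_types.shape[0]:
        raise ValueError("need exactly one cell-type label per sample")
    blocks = [control_summaries]
    for label in np.unique(cell_types):
        # Indicator column: 1 for samples of this cell type, 0 otherwise,
        # so rows from other cell types become all zeros in this block.
        indicator = (cell_types == label).astype(float)[:, None]
        blocks.append(control_summaries * indicator)
    return np.hstack(blocks)


# Toy data (hypothetical): 6 samples, 2 control summaries, 3 cell types.
summaries = np.arange(1.0, 13.0).reshape(6, 2)
labels = ["blood", "blood", "adipose", "adipose", "sperm", "sperm"]
augmented = augment_with_cell_type(summaries, labels)
```

With 3 cell types and 2 control summaries the result has 4 × 2 = 8 columns, matching the column count given in the text. The interaction blocks are ordered by sorted label because `np.unique` sorts its output.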
funtooNorm includes a graphical display of cross-validated errors so that an appropriate number of components can be chosen (see Figure 1 and Supplemental Figure 5). All results except for Supplemental Figure 5 are based on 4 components and PCR; Supplemental Figure 5 demonstrates cross-validation results for PLS (with 4 components). The data for the 10-fold cross-validation is separately partitioned at each quantile, hence the plots are quite noisy. Measures of agreement between replicates:
ABSTRACT: DNA methylation is an epigenetic mark thought to be robust to environmental perturbations on a short time scale. Here, we challenge that view by demonstrating that the infection of human dendritic cells (DCs) with a live pathogenic bacterium is associated with rapid and active demethylation at thousands of loci, independent of cell division. We performed an integrated analysis of data on genome-wide DNA methylation, histone mark patterns, chromatin accessibility, and gene expression, before and after infection. We found that infection-induced demethylation rarely occurs at promoter regions and instead localizes to distal enhancer elements, including those that regulate the activation of key immune transcription factors. Active demethylation is associated with extensive epigenetic remodeling, including the gain of histone activation marks and increased chromatin accessibility, and is strongly predictive of changes in the expression levels of nearby genes. Collectively, our observations show that active, rapid changes in DNA methylation in enhancers play a previously unappreciated role in regulating the transcriptional response to infection, even in non-proliferating cells.
ABSTRACT: The extent to which low-frequency (minor allele frequency (MAF) between 1-5%) and rare (MAF ≤ 1%) variants contribute to complex traits and disease in the general population is mainly unknown. Bone mineral density (BMD) is highly heritable, a major predictor of osteoporotic fractures, and has been previously associated with common genetic variants, as well as rare, population-specific, coding variants. Here we identify novel non-coding genetic variants with large effects on BMD (total n = 53,236) and fracture (total n = 508,253) in individuals of European ancestry from the general population. Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K, a population-based genome sequencing consortium), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000 Genomes reference panel (n = 26,534), and de novo replication genotyping (n = 20,271). We identified a low-frequency non-coding variant near a novel locus, EN1, with an effect size fourfold larger than the mean of previously reported common variants for lumbar spine BMD (rs11692564(T), MAF = 1.6%, replication effect size = +0.20 s.d., meta-analysis P = 2 × 10⁻¹⁴), which was also associated with a decreased risk of fracture (odds ratio = 0.85; P = 2 × 10⁻¹¹; 98,742 cases and 409,511 controls). Using an En1(cre/flox) mouse model, we observed that conditional loss of En1 results in low bone mass, probably as a consequence of high bone turnover. We also identified a novel low-frequency non-coding variant with large effects on BMD near WNT16 (rs148771817(T), MAF = 1.2%, replication effect size = +0.41 s.d., meta-analysis P = 1 × 10⁻¹¹). In general, there was an excess of association signals arising from deleterious coding and conserved non-coding variants.
These findings provide evidence that low-frequency non-coding variants have large effects on BMD and fracture, thereby providing rationale for whole-genome sequencing and improved imputation reference panels to study the genetic architecture of complex traits and disease in the general population.
ABSTRACT: Dietary folate is a major source of methyl groups required for DNA methylation, an epigenetic modification that is actively maintained and remodelled during spermatogenesis. While high dose folic acid supplementation (up to ten times the daily recommended dose) has been shown to improve sperm parameters in infertile men, the effects of supplementation on the sperm epigenome are unknown. To assess the impact of six months of high dose folic acid supplementation on the sperm epigenome, we studied 30 men with idiopathic infertility. Blood folate concentrations increased significantly after supplementation with no significant improvements in sperm parameters. Methylation levels of the differentially methylated regions of several imprinted loci (H19, DLK1/GTL2, MEST, SNRPN, PLAGL1, KCNQ1OT1) were normal both before and after supplementation. Reduced representation bisulfite sequencing (RRBS) revealed a significant global loss of methylation across different regions of the sperm genome. The most marked loss of DNA methylation was found in sperm from patients homozygous for the methylenetetrahydrofolate reductase (MTHFR) C677T polymorphism, a common polymorphism in a key enzyme required for folate metabolism. RRBS analysis also showed that most of the differentially methylated tiles were located in DNA repeats, low CpG density and intergenic regions. Ingenuity Pathway Analysis revealed that methylation of promoter regions was altered in several genes involved in cancer and neurobehavioral disorders including CBFA2T3, PTPN6, COL18A1, ALDH2, UBE4B, ERBB2, GABRB3, CNTNAP4 and NIPA1. Our data reveal alterations of the human sperm epigenome associated with high dose folic acid supplementation, effects that were exacerbated by a common polymorphism in MTHFR.
Full-text Article · Aug 2015 · Human Molecular Genetics
ABSTRACT: Large-scale epigenome mapping by the NIH Roadmap Epigenomics Project, the ENCODE Consortium and the International Human Epigenome Consortium (IHEC) produces genome-wide DNA methylation data at one base-pair resolution. We examine how such data can be made open-access while balancing appropriate interpretation and genomic privacy. We propose guidelines for data release that both reduce ambiguity in the interpretation of open-access data and limit immediate access to genetic variation data that are made available through controlled access.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-015-0723-0) contains supplementary material, which is available to authorized users.
ABSTRACT: Anaplastic oligodendrogliomas (AO) are rare primary brain tumours that are generally incurable, with heterogeneous prognosis and few treatment targets identified. Most oligodendrogliomas have chromosomes 1p/19q co-deletion and an IDH mutation. Here we analysed 51 AO by whole-exome sequencing, identifying previously reported frequent somatic mutations in CIC and FUBP1. We also identified recurrent mutations in TCF12 and in an additional series of 83 AO. Overall, 7.5% of AO are mutated for TCF12, which encodes an oligodendrocyte-related transcription factor. Eighty percent of TCF12 mutations identified were either in the bHLH domain, which is important for TCF12 function as a transcription factor, or were frameshift mutations leading to TCF12 truncated for this domain. We show that these mutations compromise TCF12 transcriptional activity and are associated with a more aggressive tumour type. Our analysis provides further insights into the unique and shared pathways driving AO.
Full-text Article · Jun 2015 · Nature Communications
ABSTRACT: Most genome-wide methylation studies (EWAS) of multifactorial disease traits use targeted arrays or enrichment methodologies preferentially covering CpG-dense regions, to characterize sufficiently large samples. To overcome this limitation, we present here a new customizable, cost-effective approach, methylC-capture sequencing (MCC-Seq), for sequencing functional methylomes, while simultaneously providing genetic variation information. To illustrate MCC-Seq, we use whole-genome bisulfite sequencing on adipose tissue (AT) samples and public databases to design AT-specific panels. We establish its efficiency for high-density interrogation of methylome variability by systematic comparisons with other approaches and demonstrate its applicability by identifying novel methylation variation within enhancers strongly correlated to plasma triglyceride and HDL-cholesterol, including at CD36. Our more comprehensive AT panel assesses tissue methylation and genotypes in parallel at ∼4 and ∼3 M sites, respectively. Our study demonstrates that MCC-Seq provides comparable accuracy to alternative approaches but enables more efficient cataloguing of functional and disease-relevant epigenetic and genetic variants for large-scale EWAS.
Full-text Article · May 2015 · Nature Communications
ABSTRACT: In this letter to the editor, we respond to the recent publication by Philibert et al., "Methylation array data can simultaneously identify individuals and convey protected health information: an unrecognized ethical concern" (Clinical Epigenetics 2014, 6:28). Further discussion of the issues raised by the risk of re-identification of epigenetic methylation data is needed, and a more nuanced approach should be taken with respect to its implications for data sharing policy than the one provided.