ABSTRACT: Background:
CpG methylation variation is involved in human trait formation and disease susceptibility. Analyses within populations have been biased towards CpG-dense regions through the application of targeted arrays. We generate whole-genome bisulfite sequencing data for approximately 30 adipose and blood samples from monozygotic and dizygotic twins for the characterization of non-genetic and genetic effects at single-site resolution.
Purely invariable CpGs display a bimodal distribution with enrichment of unmethylated CpGs and depletion of fully methylated CpGs in promoter and enhancer regions. Population-variable CpGs account for approximately 15-20 % of total CpGs per tissue, are enriched in enhancer-associated regions and depleted in promoters, and single nucleotide polymorphisms at CpGs are a frequent confounder of extreme methylation variation. Differential methylation is primarily non-genetic in origin, with non-shared environment accounting for most of the variance. These non-genetic effects are mainly tissue-specific. Tobacco smoking is associated with differential methylation in blood with no evidence of this exposure impacting cell counts. Opposite to non-genetic effects, genetic effects of CpG methylation are shared across tissues and thus limit inter-tissue epigenetic drift. CpH methylation is rare, and shows similar characteristics of variation patterns as CpGs.
Our study highlights the utility of low pass whole-genome bisulfite sequencing in identifying methylome variation beyond promoter regions, and suggests that targeting the population dynamic methylome of tissues requires assessment of understudied intergenic CpGs distal to gene promoters to reveal the full extent of inter-individual variation.
ABSTRACT: Neuromyelitis optica (NMO) is rare in Finland. To identify rare genetic variants contributing to NMO risk, we performed whole-exome, HLA and regulatory-region sequencing in all cases ascertained during 2005-2013 (n=5) in a Southern Finnish population of 1.6 million. No rare variant was shared by all patients. Four missense variants, in C3ORF20, PDZD2, C5ORF47 and ZNF606, were shared by two patients. Another PDZD2 variant was found in a third patient. In the non-coding sequence, two rare variants predicted to be functional were shared by two patients. Our results do not support a homogeneous genetic etiology of NMO in Finland.
ABSTRACT: Motivation:
DNA methylation patterns are well known to vary substantially across cell types or tissues. Hence, existing normalization methods may not be optimal if they do not take this into account. We therefore present a new R package for normalization of data from the Illumina Infinium Human Methylation450 BeadChip (Illumina 450K) built on the concepts in the recently published funNorm method (Fortin et al. 2014), and introducing cell-type or tissue-type flexibility.
funtooNorm is relevant for data sets containing samples from two or more cell or tissue types. A visual display of cross-validated errors informs the choice of the optimal number of components in the normalization. Benefits of cell (tissue)-specific normalization are demonstrated in three data sets. Improvement can be substantial; it is strikingly better on chromosome X, where methylation patterns have unique inter-tissue variability.
Availability and implementation:
An R package is available at https://github.com/GreenwoodLab/funtooNorm, and has been submitted to Bioconductor at http://bioconductor.org.
Contact:
email@example.com
Supplementary information:
Supplementary data are available at Bioinformatics online.
ABSTRACT: Supplementary Methods

funtooNorm normalization method: Let $X$ ($n \times C$) represent a matrix of summary control-probe data for $n$ samples and $C$ control probe signals, where there is a column for the average log signal from each control probe type and each colour (red, green). We create a larger matrix, $X^*$, by adding additional columns representing the interactions between the control probe summaries and cell type indicators. For example, if there are 3 cell types, then the matrix $X^*$ will have $4C$ columns: the original matrix, as well as all interactions with 3 cell-type indicator variables. That is,

$$X^* = \begin{bmatrix} X & X^{(1)} & X^{(2)} & \cdots & X^{(T)} \end{bmatrix},$$

where $X^{(t)}$ represents the matrix $X$ multiplied by an indicator for cell type $t$, so that all rows from samples that are not cell type $t$ are zeros. The user can then choose whether to fit principal component regressions (PCR; as in the funNorm algorithm (Fortin, et al., 2014)), or partial least squares regressions (PLS) (Tenenhaus, 1998) predicting a series of quantiles of the A and B signals from the Illumina 450 BeadChip for each sample using this augmented $X^*$ as the covariates. As in funNorm, these models are fit separately for probe type I red, type I green, and type II. We fit models at 529 quantiles: at increments of 0.002, plus a slightly finer grid in the tails of the distributions. The augmented covariate matrix containing interactions with cell-type or tissue-type indicators allows the relationship between quantiles and control probes to be cell-(or tissue-) type specific, hence implementing additional flexibility. As in funNorm, predictions for signals A and B are obtained for all quantiles by linear interpolation between the quantile fits. An important element of any PLS or PCR model is the number of components needed.
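The construction of the augmented covariate matrix described above can be illustrated with a minimal Python sketch. This is not the funtooNorm implementation (which is an R package); the function name `augment_with_cell_type`, the toy dimensions and the cell-type labels are assumptions made only for illustration. It appends one interaction block per cell type to the control-probe summary matrix, so that 3 cell types yield 4C columns.

```python
import numpy as np


def augment_with_cell_type(controls, cell_types):
    """Append one (controls x cell-type indicator) block per cell type.

    Hypothetical helper for illustration; not part of funtooNorm.
    controls   : (n, C) array of control-probe summaries
    cell_types : length-n sequence of cell-type labels
    """
    controls = np.asarray(controls, dtype=float)
    cell_types = np.asarray(cell_types)
    blocks = [controls]  # the original matrix comes first
    for label in np.unique(cell_types):
        # Column vector that is 1 for samples of this cell type, else 0
        indicator = (cell_types == label).astype(float)[:, None]
        # Rows from other cell types become all zeros
        blocks.append(controls * indicator)
    return np.hstack(blocks)


# Toy data (assumed): n = 6 samples, C = 2 control summaries, 3 cell types
rng = np.random.default_rng(0)
controls = rng.normal(size=(6, 2))
cell_types = ["blood", "blood", "adipose", "adipose", "sperm", "sperm"]

augmented = augment_with_cell_type(controls, cell_types)
print(augmented.shape)  # (6, 8), i.e. 4C columns for 3 cell types
```

Because every sample belongs to exactly one cell type, the interaction blocks sum back to the original matrix, which is a convenient sanity check on the construction.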
funtooNorm includes a graphical display of cross-validated errors so that an appropriate number of components can be chosen (see Figure 1 and Supplemental Figure 5). All results except for Supplemental Figure 5 are based on 4 components and PCR; Supplemental Figure 5 demonstrates cross-validation results for PLS (with 4 components). The data for the 10-fold cross-validation are partitioned separately at each quantile, hence the plots are quite noisy.

Measures of agreement between replicates:
ABSTRACT: DNA methylation is an epigenetic mark thought to be robust to environmental perturbations on a short time scale. Here, we challenge that view by demonstrating that the infection of human dendritic cells (DCs) with live pathogenic bacteria is associated with rapid and active demethylation at thousands of loci, independent of cell division. We performed an integrated analysis of data on genome-wide DNA methylation, histone mark patterns, chromatin accessibility, and gene expression, before and after infection. We found that infection-induced demethylation rarely occurs at promoter regions and instead localizes to distal enhancer elements, including those that regulate the activation of key immune transcription factors. Active demethylation is associated with extensive epigenetic remodeling, including the gain of histone activation marks and increased chromatin accessibility, and is strongly predictive of changes in the expression levels of nearby genes. Collectively, our observations show that active, rapid changes in DNA methylation in enhancers play a previously unappreciated role in regulating the transcriptional response to infection, even in non-proliferating cells.
ABSTRACT: The extent to which low-frequency (minor allele frequency (MAF) between 1% and 5%) and rare (MAF ≤ 1%) variants contribute to complex traits and disease in the general population is mainly unknown. Bone mineral density (BMD) is highly heritable, a major predictor of osteoporotic fractures, and has been previously associated with common genetic variants, as well as rare, population-specific, coding variants. Here we identify novel non-coding genetic variants with large effects on BMD (n_total = 53,236) and fracture (n_total = 508,253) in individuals of European ancestry from the general population. Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K (ref. 10); a population-based genome sequencing consortium), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000 Genomes reference panel (n = 26,534), and de novo replication genotyping (n = 20,271). We identified a low-frequency non-coding variant near a novel locus, EN1, with an effect size fourfold larger than the mean of previously reported common variants for lumbar spine BMD (rs11692564(T), MAF = 1.6%, replication effect size = +0.20 s.d., P_meta = 2 × 10⁻¹⁴), which was also associated with a decreased risk of fracture (odds ratio = 0.85; P = 2 × 10⁻¹¹; n_cases = 98,742 and n_controls = 409,511). Using an En1cre/flox mouse model, we observed that conditional loss of En1 results in low bone mass, probably as a consequence of high bone turnover. We also identified a novel low-frequency non-coding variant with large effects on BMD near WNT16 (rs148771817(T), MAF = 1.2%, replication effect size = +0.41 s.d., P_meta = 1 × 10⁻¹¹). In general, there was an excess of association signals arising from deleterious coding and conserved non-coding variants. 
These findings provide evidence that low-frequency non-coding variants have large effects on BMD and fracture, thereby providing rationale for whole-genome sequencing and improved imputation reference panels to study the genetic architecture of complex traits and disease in the general population.
ABSTRACT: Dietary folate is a major source of methyl groups required for DNA methylation, an epigenetic modification that is actively maintained and remodelled during spermatogenesis. While high dose folic acid supplementation (up to ten times the daily recommended dose) has been shown to improve sperm parameters in infertile men, the effects of supplementation on the sperm epigenome are unknown. To assess the impact of six months of high dose folic acid supplementation on the sperm epigenome, we studied 30 men with idiopathic infertility. Blood folate concentrations increased significantly after supplementation with no significant improvements in sperm parameters. Methylation levels of the differentially methylated regions of several imprinted loci (H19, DLK1/GTL2, MEST, SNRPN, PLAGL1, KCNQ1OT1) were normal both before and after supplementation. Reduced representation bisulfite sequencing (RRBS) revealed a significant global loss of methylation across different regions of the sperm genome. The most marked loss of DNA methylation was found in sperm from patients homozygous for the methylenetetrahydrofolate reductase (MTHFR) C677T polymorphism, a common polymorphism in a key enzyme required for folate metabolism. RRBS analysis also showed that most of the differentially methylated tiles were located in DNA repeats, low CpG density and intergenic regions. Ingenuity Pathway Analysis revealed that methylation of promoter regions was altered in several genes involved in cancer and neurobehavioral disorders including CBFA2T3, PTPN6, COL18A1, ALDH2, UBE4B, ERBB2, GABRB3, CNTNAP4 and NIPA1. Our data reveal alterations of the human sperm epigenome associated with high dose folic acid supplementation, effects that were exacerbated by a common polymorphism in MTHFR.
Full-text · Article · Aug 2015 · Human Molecular Genetics
ABSTRACT: Large-scale epigenome mapping by the NIH Roadmap Epigenomics Project, the ENCODE Consortium and the International Human Epigenome Consortium (IHEC) produces genome-wide DNA methylation data at one base-pair resolution. We examine how such data can be made open-access while balancing appropriate interpretation and genomic privacy. We propose guidelines for data release that both reduce ambiguity in the interpretation of open-access data and limit immediate access to genetic variation data that are made available through controlled access.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-015-0723-0) contains supplementary material, which is available to authorized users.
ABSTRACT: Anaplastic oligodendrogliomas (AO) are rare primary brain tumours that are generally incurable, with heterogeneous prognosis and few treatment targets identified. Most oligodendrogliomas have a chromosome 1p/19q co-deletion and an IDH mutation. Here we analysed 51 AO by whole-exome sequencing, identifying previously reported frequent somatic mutations in CIC and FUBP1. We also identified recurrent mutations in TCF12, both in this series and in an additional series of 83 AO. Overall, 7.5% of AO are mutated for TCF12, which encodes an oligodendrocyte-related transcription factor. Eighty percent of the TCF12 mutations identified were either in the bHLH domain, which is important for TCF12 function as a transcription factor, or were frameshift mutations leading to TCF12 truncated for this domain. We show that these mutations compromise TCF12 transcriptional activity and are associated with a more aggressive tumour type. Our analysis provides further insights into the unique and shared pathways driving AO.
Full-text · Article · Jun 2015 · Nature Communications
ABSTRACT: Most genome-wide methylation studies (EWAS) of multifactorial disease traits use targeted arrays or enrichment methodologies preferentially covering CpG-dense regions, to characterize sufficiently large samples. To overcome this limitation, we present here a new customizable, cost-effective approach, methylC-capture sequencing (MCC-Seq), for sequencing functional methylomes, while simultaneously providing genetic variation information. To illustrate MCC-Seq, we use whole-genome bisulfite sequencing on adipose tissue (AT) samples and public databases to design AT-specific panels. We establish its efficiency for high-density interrogation of methylome variability by systematic comparisons with other approaches and demonstrate its applicability by identifying novel methylation variation within enhancers strongly correlated with plasma triglyceride and HDL-cholesterol, including at CD36. Our more comprehensive AT panel assesses tissue methylation and genotypes in parallel at ∼4 and ∼3 M sites, respectively. Our study demonstrates that MCC-Seq provides comparable accuracy to alternative approaches but enables more efficient cataloguing of functional and disease-relevant epigenetic and genetic variants for large-scale EWAS.
Full-text · Article · May 2015 · Nature Communications
ABSTRACT: In this letter to the editor, we respond to the recent publication by Philibert et al., "Methylation array data can simultaneously identify individuals and convey protected health information: an unrecognized ethical concern" (Clinical Epigenetics 2014, 6:28). Further discussion of the issues raised by the risk of re-identification of epigenetic methylation data is needed, and a more nuanced approach should be taken with respect to its implications for data sharing policy than the one provided.
ABSTRACT: The interplay between genetic and epigenetic variation is only partially understood. One form of epigenetic variation is methylation at CpG sites, which can be measured as methylation quantitative trait loci (meQTL). Here we report that in a panel of lymphocytes from 1,748 individuals, methylation levels at 1,919 CpG sites are correlated with at least one distal (trans) single-nucleotide polymorphism (SNP) (P < 3.2 × 10⁻¹³; FDR < 5%). These trans-meQTLs include 1,657 SNP–CpG pairs from different chromosomes and 262 pairs from the same chromosome that are >1 Mb apart. Over 90% of these pairs are replicated (FDR < 5%) in at least one of two independent data sets. Genomic loci harbouring trans-meQTLs are significantly enriched (P < 0.001) for long non-coding transcripts (2.2-fold), known epigenetic regulators (2.3-fold), piwi-interacting RNA clusters (3.6-fold) and curated transcription factors (4.1-fold), including zinc-finger proteins (8.75-fold). Long-range epigenetic networks uncovered by this approach may be relevant to normal and disease states.
Full-text · Article · Feb 2015 · Nature Communications
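The trans-meQTL definition in the abstract above (SNP–CpG pairs on different chromosomes, or on the same chromosome but far apart) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the distance threshold is taken to be 1 Mb, and the function name, labels and coordinates are hypothetical, not taken from the study.

```python
ONE_MB = 1_000_000  # assumed distance threshold, in base pairs


def classify_snp_cpg_pair(snp_chrom, snp_pos, cpg_chrom, cpg_pos,
                          min_distance=ONE_MB):
    """Label a SNP-CpG pair by genomic separation (hypothetical helper)."""
    if snp_chrom != cpg_chrom:
        return "trans_different_chromosome"
    if abs(snp_pos - cpg_pos) > min_distance:
        return "trans_same_chromosome"
    # Everything else is too close to count as distal in this sketch
    return "not_trans"


# Made-up coordinates, purely illustrative
pairs = [
    ("chr1", 5_000_000, "chr7", 5_000_000),  # different chromosomes
    ("chr2", 1_000_000, "chr2", 9_500_000),  # same chromosome, 8.5 Mb apart
    ("chr3", 2_000_000, "chr3", 2_400_000),  # same chromosome, 0.4 Mb apart
]
labels = [classify_snp_cpg_pair(*pair) for pair in pairs]
print(labels)
```

Keeping the threshold as a parameter makes it easy to test how sensitive the distal/proximal split is to the chosen cut-off.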
ABSTRACT: Most complex disease-associated genetic variants are located in non-coding regions and are therefore thought to be regulatory in nature. Association mapping of differential allelic expression (AE) is a powerful method to identify SNPs with direct cis-regulatory impact (cis-rSNPs). We used AE mapping to identify cis-rSNPs regulating gene expression in 55 and 63 HapMap lymphoblastoid cell lines from a Caucasian and an African population, respectively, 70 fibroblast cell lines, and 188 purified monocyte samples and found 40–60% of these cis-rSNPs to be shared across cell types. We uncover a new class of cis-rSNPs, which disrupt footprint-derived de novo motifs that are predominantly bound by repressive factors and are implicated in disease susceptibility through overlaps with GWAS SNPs. Finally, we provide the proof-of-principle for a new approach for genome-wide functional validation of transcription factor–SNP interactions. By perturbing NFκB action in lymphoblasts, we identified 489 cis-regulated transcripts with altered AE after NFκB perturbation. Altogether, we perform a comprehensive analysis of cis-variation in four cell populations and provide new tools for the identification of functional variants associated with complex diseases.
Full-text · Article · Oct 2014 · Molecular Systems Biology
ABSTRACT: Allele-specific (AS) assessment of chromatin has the potential to elucidate specific cis-regulatory mechanisms, which are predicted to underlie the majority of the known genetic associations to complex disease. However, development of chromatin landscapes at allelic resolution has been challenging since sites of variable signal strength require substantial read depths not commonly applied in sequencing-based approaches. In this study, we addressed this by performing parallel analyses of input DNA and chromatin immunoprecipitates (ChIP) on high-density Illumina genotyping arrays. Allele-specificity for the histone modifications H3K4me1, H3K4me3, H3K27ac, H3K27me3, and H3K36me3 was assessed using ChIP samples generated from 14 lymphoblast and 6 fibroblast cell lines. AS-ChIP SNPs were combined into domains and validated using high-confidence ChIP-seq sites. We observed characteristic patterns of allelic-imbalance for each histone-modification around allele-specifically expressed transcripts. Notably, we found H3K4me1 to be significantly anti-correlated with allelic expression (AE) at transcription start sites, indicating H3K4me1 allelic imbalance as a marker of AE. We also found that allelic chromatin domains exhibit population and cell-type specificity as well as heritability within trios. Finally, we observed that a subset of allelic chromatin domains is regulated by DNase I-sensitive quantitative trait loci and that these domains are significantly enriched for genome-wide association studies hits, with autoimmune disease associated SNPs specifically enriched in lymphoblasts. This study provides the first genome-wide maps of allelic-imbalance for five histone marks. Our results provide new insights into the role of chromatin in cis-regulation and highlight the need for high-depth sequencing in ChIP-seq studies along with the need to improve allele-specificity of ChIP-enrichment.
Preview · Article · Jul 2014 · Epigenetics: official journal of the DNA Methylation Society
ABSTRACT: We applied genome-wide allele-specific expression analysis to monocytes from 188 samples, purified from white blood cells of healthy blood donors, to detect cis-acting genetic variation that regulates the expression of long non-coding RNAs. We analysed 8929 regions harboring genes for potential long non-coding RNA that were retrieved from data from the ENCODE project. Of these regions, 60% were annotated as intergenic, which implies that they do not overlap with protein-coding genes. Focusing on the intergenic regions, and using stringent analysis of the allele-specific expression data, we detected robust cis-regulatory SNPs in 258 out of 489 informative intergenic regions included in the analysis. The cis-regulatory SNPs that were significantly associated with allele-specific expression of long non-coding RNAs were enriched in enhancer regions marked by histone modifications as active or bivalent, poised chromatin. Out of the lncRNA regions regulated by cis-acting regulatory SNPs, 20% (n = 52) were co-regulated with the closest protein coding gene. We compared the identified cis-regulatory SNPs with those in the catalog of SNPs identified by genome-wide association studies of human diseases and traits. This comparison identified 32 SNPs in loci from genome-wide association studies that displayed a strong association signal with allele-specific expression of non-coding RNAs in monocytes, with p-values ranging from 6.7 × 10⁻⁷ to 9.5 × 10⁻⁸⁹. The identified cis-regulatory SNPs are associated with diseases of the immune system, like multiple sclerosis and rheumatoid arthritis.
ABSTRACT: Objectives
We coupled two strategies, trait extremes and genome-wide pooling, to discover a novel blood pressure (BP) locus that encodes a previously uncharacterized thiamine transporter.
Hypertension is a heritable trait that remains the most potent and widespread cardiovascular risk factor, though details of its genetic determination are poorly understood.
Methods
Representative genomic DNA pools were created from male and female subjects in the highest and lowest 5th percentiles of BP in a primary care population of >50,000 individuals. The peak associated SNPs were typed in individual DNA samples, as well as in twins/siblings phenotyped for cardiovascular and autonomic traits. Biochemical properties of the associated transporter were evaluated in cellular assays.
After chip hybridization and calculation of relative allele scores, the peak associations were typed in individual samples, revealing association of hypertension, SBP, and DBP to the previously uncharacterized solute carrier SLC35F3. The BP genetic association at SLC35F3 was validated by meta-analysis in an independent sample from the original source population, as well as the ICBP (across North America and Western Europe). Sequence homology to a putative yeast thiamine (vitamin B1) transporter prompted us to express human SLC35F3 in E. coli, which catalyzed [3H]-thiamine uptake. SLC35F3 risk allele (T/T) homozygotes displayed decreased erythrocyte thiamine content on microbiological assay. In twin pairs, the SLC35F3 risk allele predicted heritable cardiovascular traits previously associated with thiamine deficiency, including elevated cardiac stroke volume with decreased vascular resistance, and elevated pressor responses to environmental (cold) stress. Allelic expression imbalance (AEI) confirmed that cis-variation at the human SLC35F3 locus influenced expression of that gene, and the AEI peak coincided with the hypertension peak.
Novel strategies were coupled to position a new hypertension susceptibility locus, uncovering a previously unsuspected thiamine transporter whose genetic variants predicted several disturbances in cardiac and autonomic function. The results have implications for the pathogenesis and treatment of systemic hypertension.
Full-text · Article · Apr 2014 · Journal of the American College of Cardiology