[Show abstract][Hide abstract] ABSTRACT: The Illumina HumanMethylation450 BeadChip is increasingly utilized in epigenome-wide association studies; however, this array-based measurement of DNA methylation is subject to measurement variation. Appropriate data preprocessing to remove background noise is important for detecting the small changes that may be associated with disease. We developed a novel background correction method, ENmix, that uses a mixture of exponential and truncated normal distributions to flexibly model signal intensity and uses a truncated normal distribution to model background noise. Depending on data availability, we employ three approaches to estimate background normal distribution parameters using (i) internal chip negative controls, (ii) out-of-band Infinium I probe intensities or (iii) combined methylated and unmethylated intensities. We evaluate ENmix against other available methods for both reproducibility among duplicate samples and accuracy of methylation measurement among laboratory control samples. ENmix out-performed other background correction methods for both these measures and substantially reduced the probe-design type bias between Infinium I and II probes. In reanalysis of existing EWAS data we show that ENmix can identify additional CpGs, and results in smaller P-value estimates for previously-validated CpGs. We incorporated the method into the R package ENmix, which is freely available from the Bioconductor website.
Nucleic Acids Research 09/2015; DOI:10.1093/nar/gkv907 · 9.11 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: Primary and metastatic melanoma tumors share the same cell origin, making it challenging to identify genomic biomarkers that can differentiate them. Primary tumors themselves can be heterogeneous, reflecting ongoing genomic changes as they progress toward metastasizing. We developed a computational method to explore this heterogeneity and to predict metastatic progression of the primary tumors. We applied our method separately to gene expression and to microRNA (miRNA) expression data from ~450 primary and metastatic skin cutaneous melanoma (SKCM) samples from the Cancer Genome Atlas (TCGA). Metastatic progression scores from RNA-seq data were significantly associated with clinical staging of patients' lymph nodes whereas scores from miRNA-seq data were significantly associated with Clark's level. The loss of expression of many characteristic epithelial lineage genes in primary SKCM tumor samples was highly correlated with predicted progression scores. We suggest that those genes/miRNAs might serve as putative biomarkers for SKCM metastatic progression. This article is protected by copyright. All rights reserved.
[Show abstract][Hide abstract] ABSTRACT: Members of the tristetraprolin (TTP) family of CCCH tandem zinc finger proteins bind to AU-rich regions in target mRNAs, leading to their deadenylation and decay. Family members in Saccharomyces cerevisiae influence iron metabolism, whereas the single protein expressed in Schizosaccharomyces pombe, Zfs1, regulates cell-cell interactions. In the human pathogen Candida albicans, deep sequencing of mutants lacking the orthologous protein, Zfs1, revealed significant increases (>1.5 fold) in 156 transcripts. Of these, 113 (72%) contained at least one predicted TTP family member binding site in their 3'UTR, compared to only 3 of 56 (5%) down-regulated transcripts. The zfs1Δ/Δ mutant was resistant to 3-amino-1,2,4-triazole, perhaps because of increased expression of the potential target transcript encoded by HIS3. Sequences of the proteins encoded by the putative Zfs1 targets were highly conserved among other species within the fungal CTG clade, while the predicted Zfs1 binding sites in these mRNAs often "disappeared" with increasing evolutionary distance from the parental species. C. albicans Zfs1 bound to the ideal mammalian TTP binding site with high affinity, and Zfs1 was associated with target transcripts after co-immunoprecipitation. Thus, the biochemical activities of these proteins in fungi are highly conserved, but Zfs1-like proteins may target different transcripts in each species.
[Show abstract][Hide abstract] ABSTRACT: Members of the mammalian tristetraprolin family of CCCH tandem zinc finger proteins can bind to certain AU-rich elements (AREs) in mRNAs, leading to their deadenylation and destabilization. Mammals express three or four members of this family, but Drosophila melanogaster and other insects appear to contain a single gene, Tis11. We found that recombinant Drosophila Tis11 protein could bind to ARE-containing RNA oligonucleotides with low nanomolar affinity. Remarkably, co-expression in mammalian cells with “target” RNAs demonstrated that Tis11 could promote destabilization of ARE-containing mRNAs and that this was partially dependent on a conserved C-terminal sequence resembling the mammalian NOT1 binding domain. Drosophila Tis11 promoted both deadenylation and decay of a target transcript in this heterologous cell system. We used chromosome deletion/duplication and P element insertion to produce two types of Tis11 deficiency in adult flies, both of which were viable and fertile. To address the hypothesis that Tis11 deficiency would lead to the abnormal accumulation of potential target transcripts, we analyzed gene expression in adult flies by deep mRNA sequencing. We identified 69 transcripts from 56 genes that were significantly up-regulated more than 1.5-fold in both types of Tis11-deficient flies. Ten of the up-regulated transcripts encoded probable proteases, but many other functional classes of proteins were represented. Many of the up-regulated transcripts contained potential binding sites for tristetraprolin family member proteins that were conserved in other Drosophila species. Tis11 is thus an ARE-binding, mRNA-destabilizing protein that may play a role in post-transcriptional gene expression in Drosophila and other insects.
[Show abstract][Hide abstract] ABSTRACT: Background
Most genes in mammals generate several transcript isoforms that differ in stability and translational efficiency through alternative splicing. Such alternative splicing can be tissue- and developmental stage-specific, and such specificity is sometimes associated with disease. Thus, detecting differential isoform usage for a gene between tissues or cell lines/types (differences in the fraction of total expression of a gene represented by the expression of each of its isoforms) is potentially important for cell and developmental biology.
We present a new method IUTA that is designed to test each gene in the genome for differential isoform usage between two groups of samples. IUTA also estimates isoform usage for each gene in each sample as well as averaged across samples within each group. IUTA is the first method to formulate the testing problem as testing for equal means of two probability distributions under the Aitchison geometry, which is widely recognized as the most appropriate geometry for compositional data (vectors that contain the relative amount of each component comprising the whole). Evaluation using simulated data showed that IUTA was able to provide test results for many more genes than was Cuffdiff2 (version 2.2.0, released in Mar. 2014), and IUTA performed better than Cuffdiff2 for the limited number of genes that Cuffdiff2 did analyze. When applied to actual mouse RNA-Seq datasets from six tissues, IUTA identified 2,073 significant genes with clear patterns of differential isoform usage between a pair of tissues. IUTA is implemented as an R package and is available at http://www.niehs.nih.gov/research/resources/software/biostatistics/iuta/index.cfm.
Both simulation and real-data results suggest that IUTA accurately detects differential isoform usage. We believe that our analysis of RNA-seq data from six mouse tissues represents the first comprehensive characterization of isoform usage in these tissues. IUTA will be a valuable resource for those who study the roles of alternative transcripts in cell development and disease.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-862) contains supplementary material, which is available to authorized users.
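The Aitchison geometry mentioned in this abstract can be illustrated with a short sketch. This is not the IUTA implementation (IUTA is distributed as an R package); it only shows, under the assumption that isoform usage is a vector of strictly positive fractions, how the centered log-ratio transform, the Aitchison distance and the compositional mean are commonly computed. All function names here are hypothetical.

```python
import math


def closure(x):
    """Rescale strictly positive values so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]


def clr(x):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(v) for v in closure(x)]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr-transformed compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


def aitchison_mean(samples):
    """Compositional centre: closed component-wise geometric mean."""
    n = len(samples)
    closed = [closure(s) for s in samples]
    parts = len(closed[0])
    geo = [
        math.exp(sum(math.log(s[j]) for s in closed) / n)
        for j in range(parts)
    ]
    return closure(geo)
```

The distance depends only on the ratios between parts, so rescaling a usage vector (for example, from fractions to read counts) leaves it unchanged. Zero fractions have no logarithm; any pseudocount used to avoid them would be a further assumption not described in the abstract.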
[Show abstract][Hide abstract] ABSTRACT: Estrogen Receptor α (ERα) interacts with DNA, directly, or indirectly via other transcription factors, referred to as "tethering". Evidence for tethering is based on in vitro studies and a widely used "KIKO" mouse model containing mutations that prevent direct estrogen response element (ERE) DNA-binding. KIKO mice are infertile, due in part to the inability of estrogen (E2) to induce uterine epithelial proliferation. To elucidate the molecular events that prevent KIKO uterine growth, regulation of the pro-proliferative E2 target gene Klf4, and of Klf15, a progesterone (P4) target gene that opposes KLF4's pro-proliferative activity, was evaluated. Klf4 induction was impaired in KIKO uteri; however, Klf15 was induced by E2 rather than by P4. Whole uterine ChIP-seq revealed enrichment of KIKO ERα binding to hormone response element (HRE) motifs. KIKO binding to HRE motifs was verified using reporter gene and DNA-binding assays. Because the KIKO ERα has HRE DNA-binding activity, we evaluated the "EAAE" ERα, which has more severe DBD mutations, and demonstrated lack of ERE or HRE reporter gene induction or DNA binding. The EAAE mouse has an ERα-null like phenotype, with impaired uterine growth and transcriptional activity. Our findings demonstrate that the KIKO mouse model, which has been used by numerous investigators, cannot be used to establish biological functions for ERα tethering, as KIKO ERα effectively stimulates transcription using HRE motifs. The EAAE-ERα DBD mutant mouse demonstrates that ERα DNA-binding is crucial for biological and transcriptional processes in reproductive tissues, and that ERα-tethering may not contribute to estrogen-responsiveness in vivo.
[Show abstract][Hide abstract] ABSTRACT: A protein may bind to its target DNA sites constitutively, i.e., regardless of cell type. Intuitively, constitutive binding sites should be biologically functional. A prerequisite for understanding their functional relevance is knowing all their locations for a protein of interest. Genome-wide discovery of constitutive binding sites requires robust and efficient computational methods to integrate results from numerous binding experiments. Such methods are lacking, however.
To locate constitutive binding sites for a protein using ChIP-seq data for that protein from multiple cell lines, we developed a method, T-KDE, which combines a binary range tree with a kernel density estimator. Using 132 CTCF (CCCTC-binding factor) ChIP-seq datasets, we showed that the number of constitutive sites identified by T-KDE is robust to the choice of tuning parameter and that T-KDE identifies binding site locations more accurately than a binning approach. Furthermore, T-KDE can identify constitutive sites that are missed by a motif-based approach either because a bound site failed to reach the motif significance cutoff or because the peak sequence scanned was too short. By studying sites declared constitutive by T-KDE but not by the motif-based approach, we discovered two new CTCF motif variants. Using ENCODE data on 22 transcription factors (TF) in 132 cell lines, we identified constitutive binding sites for each TF and provide evidence that, for some TFs, they may be biologically meaningful.
T-KDE is an efficient and effective method to predict constitutive protein binding sites using ChIP-seq peaks from multiple cell lines. Besides constitutive binding sites for a given protein, T-KDE can identify genomic "hot spots" where several different proteins bind and, conversely, cell-type-specific sites bound by a given protein.
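The kernel-density idea behind locating constitutive sites can be sketched as follows. This is a simplified illustration, not T-KDE itself: it omits the binary range tree, uses a plain Gaussian kernel evaluated on a fixed grid, and the bandwidth, grid step, window of two bandwidths and 0.9 fraction threshold are all assumed values chosen for the example. The function names are hypothetical.

```python
import math


def gaussian_kde(points, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid position."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in points)
        for g in grid
    ]


def constitutive_sites(peaks, n_cell_lines, bandwidth=50, step=10,
                       min_fraction=0.9):
    """Find density modes supported by most cell lines.

    peaks: list of (position, cell_line_id) pairs on one chromosome.
    Returns a list of (mode_position, number_of_supporting_cell_lines).
    """
    positions = [p for p, _ in peaks]
    grid = list(range(min(positions), max(positions) + 1, step))
    dens = gaussian_kde(positions, grid, bandwidth)
    sites = []
    for i, g in enumerate(grid):
        left = dens[i - 1] if i > 0 else -1.0
        right = dens[i + 1] if i + 1 < len(grid) else -1.0
        # A mode is a grid point strictly above its left neighbour and
        # not below its right neighbour.
        if dens[i] > left and dens[i] >= right:
            lines = {c for p, c in peaks if abs(p - g) <= 2 * bandwidth}
            if len(lines) / n_cell_lines >= min_fraction:
                sites.append((g, len(lines)))
    return sites
```

Each grid point here is compared against every peak, which is quadratic in the worst case; the range tree named in the abstract presumably exists to avoid that cost on genome-scale data.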
[Show abstract][Hide abstract] ABSTRACT: Diethylstilbestrol (DES) is a synthetic estrogen that is associated with adverse effects on reproductive organs. DES-induced toxicity of the mouse seminal vesicle (SV) is mediated by ERα with altered expression of seminal vesicle secretory protein IV (Svs4) and lactoferrin (Ltf) genes.
We examined a role for nuclear receptor activity in association with DNA methylation and altered gene expression.
We used the neonatal DES exposure mouse model to examine DNA methylation patterns via bisulfite conversion sequencing in WT and αERKO SVs.
DNA methylation status at 4 specific CpGs (-160, -237, -306 and -367) in the Svs4 gene promoter changes during mouse development from methylated to un-methylated, and DES prevents this change at 10-weeks of age in WT SV. DES alters the methylation status from methylated to un-methylated at 2 specific CpGs (-449 and -459) of the Ltf gene promoter. Alterations in DNA methylation of Svs4 and Ltf were not observed in αERKO SV, suggesting that changes of methylation status at these CpGs are ERα dependent. The methylation status associates with the level of gene expression. In addition, gene expression of three epigenetic modifiers, including DNMT3A, MBD2, and HDAC2 increased after DES exposure in WT SV.
DES-induced hormonal toxicity results from altered gene expression of Svs4 and Ltf associated with changes in DNA methylation that are mediated by ERα. Alterations in gene expression of DNMT3A, MBD2 and HDAC2 after DES exposure may be involved in mediating the changes in methylation status in the SVs of male mice.
Environmental Health Perspectives 12/2013; 122(3). DOI:10.1289/ehp.1307351 · 7.98 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: We introduce a web-based tool, Peak Annotation and Visualization (PAVIS), for annotating and visualizing ChIP-seq peak data. PAVIS is designed with non-bioinformaticians in mind and presents a straightforward user interface to facilitate biological interpretation of ChIP-seq peak or other genomic enrichment data. PAVIS, through association with annotation, provides relevant genomic context for each peak, such as peak location relative to genomic features including transcription start site, intron, exon, or 5'/3'-UTR. PAVIS reports the relative enrichment p-values of peaks in these functionally distinct categories, and provides a summary plot of the relative proportion of peaks in each category. PAVIS, unlike many other resources, provides a peak-oriented annotation and visualization system, allowing dynamic visualization of tens to hundreds of loci from one or more ChIP-seq experiments simultaneously. PAVIS enables rapid and easy examination and cross-comparison of the genomic context and potential functions of the underlying genomic elements, thus supporting downstream hypothesis generation.
PAVIS is publicly accessible at http://manticore.niehs.nih.gov/pavis.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
[Show abstract][Hide abstract] ABSTRACT: Recent studies suggested that human/mammalian genomes are divided into large, discrete domains that are units of chromosome organization. CTCF, a CCCTC binding factor, has a diverse role in genome regulation including transcriptional regulation, chromosome-boundary insulation, DNA replication, and chromatin packaging. It remains unclear whether a subset of CTCF binding sites plays a functional role in establishing/maintaining chromatin topological domains.
We systematically analysed the genomic, transcriptomic and epigenetic profiles of the CTCF binding sites in 56 human cell lines from ENCODE. We identified ~24,000 CTCF sites (referred to as constitutive sites) that were bound in more than 90% of the cell lines. Our analysis revealed: 1) constitutive CTCF loci were located in constitutive open chromatin and often co-localized with constitutive cohesin loci; 2) most constitutive CTCF loci were distant from transcription start sites and lacked CpG islands but were enriched with the full-spectrum CTCF motifs: a recently reported 33/34-mer and two other potentially novel (22/26-mer); 3) more importantly, most constitutive CTCF loci were present in CTCF-mediated chromatin interactions detected by ChIA-PET and these pair-wise interactions occurred predominantly within, but not between, topological domains identified by Hi-C.
Our results suggest that the constitutive CTCF sites may play a role in organizing/maintaining the recently identified topological domains that are common across most human cells.
[Show abstract][Hide abstract] ABSTRACT: Cancer and infection are predominant causes of human mortality and derive, respectively, from inadequate genomic and host defenses against environmental agents. The transcription factor p53 plays a central role in human tumor suppression. Despite its expression in immune cells and broad responsiveness to stressors, it is virtually unknown whether p53 regulates host defense against infection. We report that the lungs of naive p53(-/-) mice display genome-wide induction of NF-κB response element-enriched proinflammatory genes, suggestive of type 1 immune priming. p53-null and p53 inhibitor-treated mice clear Gram-negative and -positive bacteria more effectively than controls after intrapulmonary infection. This is caused, at least in part, by cytokines produced by an expanded population of apoptosis-resistant, TLR-hyperresponsive alveolar macrophages that enhance airway neutrophilia. p53(-/-) neutrophils, in turn, display heightened phagocytosis, Nox-dependent oxidant generation, degranulation, and bacterial killing. p53 inhibition boosts bacterial killing by mouse neutrophils and oxidant generation by human neutrophils. Despite enhanced bacterial clearance, infected p53(-/-) mice suffer increased mortality associated with aggravated lung injury. p53 thus modulates host defense through regulating microbicidal function and fate of phagocytes, revealing a fundamental link between defense of genome and host during environmental insult.
Journal of Experimental Medicine 04/2013; 210(5). DOI:10.1084/jem.20121674 · 12.52 Impact Factor
[Show abstract][Hide abstract] ABSTRACT: Members of the tristetraprolin (TTP) family of CCCH tandem zinc finger proteins can bind directly to AU-rich elements in mRNAs and promote transcript deadenylation and decay. The yeast Schizosaccharomyces pombe expresses a single TTP family member, Zfs1p. In this study, we identified probable Zfs1p target mRNAs by comparing transcript levels in wild-type yeast and zfs1Δ mutants, using deep sequencing and microarray approaches. We also used direct RNA sequencing to determine polyadenylation site locations and to confirm the presence of potential Zfs1p target sequences within the target mRNA. These studies identified a set of transcripts containing potential Zfs1p binding sites that accumulated significantly in the zfs1Δ mutants; a subset of these transcripts decayed more slowly in the zfs1Δ mutants and bound directly to Zfs1p in coimmunoprecipitation assays. One apparent direct target encodes the transcription factor Cbf12p, which is known to increase cell-cell adhesion and flocculation when overexpressed. Studies of zfs1Δ cbf12Δ double mutants demonstrated that the increased flocculation seen in zfs1Δ mutants is due, at least in part, to a direct effect on the turnover of cbf12 mRNA. These data suggest that Zfs1p can both directly and indirectly regulate the levels of transcripts involved in cell-cell adhesion in this species.
[Show abstract][Hide abstract] ABSTRACT: To advance understanding of mechanisms leading to biological and transcriptional endpoints related to estrogen action in the mouse uterus, we have mapped ERα and RNA polymerase II (PolII) binding sites using chromatin immunoprecipitation followed by sequencing of enriched chromatin fragments. In the absence of hormone, 5184 ERα-binding sites were apparent in the vehicle-treated ovariectomized uterine chromatin, whereas 17,240 were seen 1 h after estradiol (E₂) treatment, indicating that some sites are occupied by unliganded ERα, and that ERα binding is increased by E₂. Approximately 15% of the uterine ERα-binding sites were adjacent to (<10 kb) annotated transcription start sites, and many sites are found within genes or are found more than 100 kb distal from mapped genes; however, the density (sites per base pair) of ERα-binding sites is significantly greater adjacent to promoters. An increase in quantity of sites but no significant positional differences were seen between vehicle and E₂-treated samples in the overall locations of ERα-binding sites either distal from, adjacent to, or within genes. Analysis of the PolII data revealed the presence of poised promoter-proximal PolII on some highly up-regulated genes. Additionally, corecruitment of PolII and ERα to some distal enhancer regions was observed. A de novo motif analysis of sequences in the ERα-bound chromatin confirmed that estrogen response elements were significantly enriched. Interestingly, in areas of ERα binding without predicted estrogen response element motifs, homeodomain transcription factor-binding motifs were significantly enriched. The integration of the ERα- and PolII-binding sites from our uterine sequencing of enriched chromatin fragments data with transcriptional responses revealed in our uterine microarrays has the potential to greatly enhance our understanding of mechanisms governing estrogen response in uterine and other estrogen target tissues.
[Show abstract][Hide abstract] ABSTRACT: ART is a set of simulation tools that generate synthetic next-generation sequencing reads. This functionality is essential for testing and benchmarking tools for next-generation sequencing data analysis including read alignment, de novo assembly and genetic variation discovery. ART generates simulated sequencing reads by emulating the sequencing process with built-in, technology-specific read error models and base quality value profiles parameterized empirically in large sequencing datasets. We currently support all three major commercial next-generation sequencing platforms: Roche's 454, Illumina's Solexa and Applied Biosystems' SOLiD. ART also allows the flexibility to use customized read error model parameters and quality profiles. AVAILABILITY: Both source and binary software packages are available at http://www.niehs.nih.gov/research/resources/software/art.
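The general idea of emulating sequencing errors from base quality values can be shown with a toy example. This is not ART's error model: it handles substitutions only, takes a caller-supplied per-position quality list instead of empirically derived profiles, and relies solely on the standard Phred relation between quality Q and error probability, 10^(-Q/10). The function names are hypothetical.

```python
import random

BASES = "ACGT"


def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def simulate_read(reference, read_length, qualities, rng):
    """Sample one read from the reference with substitution errors.

    qualities: one Phred score per read position (assumed profile).
    rng: a random.Random instance, so results are reproducible.
    Returns (start_position, read_sequence).
    """
    start = rng.randrange(len(reference) - read_length + 1)
    read = []
    for offset in range(read_length):
        base = reference[start + offset]
        if rng.random() < phred_to_error_prob(qualities[offset]):
            # Substitute with one of the three other bases.
            base = rng.choice([b for b in BASES if b != base])
        read.append(base)
    return start, "".join(read)
```

Because the true origin of every read is returned alongside its sequence, output of this kind can be used to score an aligner or variant caller against known ground truth, which is the benchmarking use the abstract describes.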
[Show abstract][Hide abstract] ABSTRACT: We propose a new and effective statistical framework for identifying genome-wide differential changes in epigenetic marks with ChIP-seq data or gene expression with mRNA-seq data, and we develop a new software tool EpiCenter that can efficiently perform data analysis. The key features of our framework are: (i) providing multiple normalization methods to achieve appropriate normalization under different scenarios, (ii) using a sequence of three statistical tests to eliminate background regions and to account for different sources of variation and (iii) allowing adjustment for multiple testing to control false discovery rate (FDR) or family-wise type I error. Our software EpiCenter can perform multiple analytic tasks including: (i) identifying genome-wide epigenetic changes or differentially expressed genes, (ii) finding transcription factor binding sites and (iii) converting multiple-sample sequencing data into a single read-count data matrix. By simulation, we show that our framework achieves a low FDR consistently over a broad range of read coverage and biological variation. Through two real examples, we demonstrate the effectiveness of our framework and the usages of our tool. In particular, we show that our novel and robust 'parsimony' normalization method is superior to the widely-used 'tagRatio' method. Our software EpiCenter is freely available to the public.
Nucleic Acids Research 07/2011; 39(19):e130. DOI:10.1093/nar/gkr592 · 9.11 Impact Factor
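The abstract above mentions adjusting for multiple testing to control the false discovery rate but does not name the procedure. As an illustration only, the sketch below uses the Benjamini-Hochberg step-up adjustment, a common choice; whether EpiCenter uses this exact procedure is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A region or gene would then be reported when its adjusted p-value falls below the chosen FDR level, for example 0.05.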
[Show abstract][Hide abstract] ABSTRACT: ChIP-seq data are enriched in binding sites for the protein immunoprecipitated. Some sequences may also contain binding sites for a coregulator. Biologists are interested in knowing which coregulatory factor motifs may be present in the sequences bound by the protein ChIP'ed.
We present a finite mixture framework with an expectation-maximization algorithm that considers two motifs jointly and simultaneously determines which sequences contain both motifs, either one or neither of them. Tested on 10 simulated ChIP-seq datasets, our method performed better than repeated application of MEME in predicting sequences containing both motifs. When applied to a mouse liver Foxa2 ChIP-seq dataset involving ~ 12 000 400-bp sequences, coMOTIF identified co-occurrence of Foxa2 with Hnf4a, Cebpa, E-box, Ap1/Maf or Sp1 motifs in ~6-33% of these sequences. These motifs are either known as liver-specific transcription factors or have an important role in liver function.
Freely available at http://www.niehs.nih.gov/research/resources/software/comotif/.
Supplementary data are available at Bioinformatics online.
[Show abstract][Hide abstract] ABSTRACT: Epithelial stem cells self-renew while maintaining multipotency, but the dependence of stem cell properties on maintenance of the epithelial phenotype is unclear. We previously showed that trophoblast stem (TS) cells lacking the protein kinase MAP3K4 maintain properties of both stemness and epithelial-mesenchymal transition (EMT). Here, we show that MAP3K4 controls the activity of the histone acetyltransferase CBP, and that acetylation of histones H2A and H2B by CBP is required to maintain the epithelial phenotype. Combined loss of MAP3K4/CBP activity represses expression of epithelial genes and causes TS cells to undergo EMT while maintaining their self-renewal and multipotency properties. The expression profile of MAP3K4-deficient TS cells defines an H2B acetylation-regulated gene signature that closely overlaps with that of human breast cancer cells. Taken together, our data define an epigenetic switch that maintains the epithelial phenotype in TS cells and reveals previously unrecognized genes potentially contributing to breast cancer.
[Show abstract][Hide abstract] ABSTRACT: Metazoan transcription is controlled through either coordinated recruitment of transcription machinery to the gene promoter or regulated pausing of RNA polymerase II (Pol II) in early elongation. We report that a striking difference between genes that use these distinct regulatory strategies lies in the "default" chromatin architecture specified by their DNA sequences. Pol II pausing is prominent at highly regulated genes whose sequences inherently disfavor nucleosome formation within the gene but favor occlusion of the promoter by nucleosomes. In contrast, housekeeping genes that lack pronounced Pol II pausing show higher nucleosome occupancy downstream, but their promoters are deprived of nucleosomes regardless of polymerase binding. Our results indicate that a key role of paused Pol II is to compete with nucleosomes for occupancy of highly regulated promoters, thereby preventing the formation of repressive chromatin architecture to facilitate further or future gene activation.