ABSTRACT: A protein may bind to its target DNA sites constitutively, i.e., regardless of cell type. Intuitively, constitutive binding sites should be biologically functional. A prerequisite for understanding their functional relevance is knowing all their locations for a protein of interest. Genome-wide discovery of constitutive binding sites requires robust and efficient computational methods to integrate results from numerous binding experiments. Such methods are lacking, however.
To locate constitutive binding sites for a protein using ChIP-seq data for that protein from multiple cell lines, we developed a method, T-KDE, which combines a binary range tree with a kernel density estimator. Using 132 CTCF (CCCTC-binding factor) ChIP-seq datasets, we showed that the number of constitutive sites identified by T-KDE is robust to the choice of tuning parameter and that T-KDE identifies binding site locations more accurately than a binning approach. Furthermore, T-KDE can identify constitutive sites that are missed by a motif-based approach either because a bound site failed to reach the motif significance cutoff or because the peak sequence scanned was too short. By studying sites declared constitutive by T-KDE but not by the motif-based approach, we discovered two new CTCF motif variants. Using ENCODE data on 22 transcription factors (TFs) in 132 cell lines, we identified constitutive binding sites for each TF and provide evidence that, for some TFs, they may be biologically meaningful.
T-KDE is an efficient and effective method to predict constitutive protein binding sites using ChIP-seq peaks from multiple cell lines. Besides constitutive binding sites for a given protein, T-KDE can identify genomic "hot spots" where several different proteins bind and, conversely, cell-type-specific sites bound by a given protein.
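As a rough illustration of the kernel-density idea only, the following Python sketch pools peak summits from several cell lines on one chromosome, estimates a one-dimensional Gaussian kernel density, and keeps density modes that are supported by at least a chosen fraction of cell lines. It is not the T-KDE implementation: it omits the binary range tree, and the function names, the `bandwidth`, `step` and `min_fraction` parameters, and the support window of two bandwidths are all assumptions made for this example.

```python
import math


def gaussian_kde_1d(positions, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `positions` on `grid`."""
    norm = 1.0 / (len(positions) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in positions)
        for g in grid
    ]


def constitutive_sites(peaks, n_cell_lines, bandwidth=50, step=10, min_fraction=0.9):
    """Return (mode_position, fraction_of_cell_lines) for well-supported modes.

    peaks: list of (cell_line_id, summit_position) pairs on one chromosome.
    All parameters are illustrative assumptions, not T-KDE defaults.
    """
    positions = [pos for _, pos in peaks]
    lo, hi = min(positions), max(positions)
    grid = list(range(lo - 3 * bandwidth, hi + 3 * bandwidth + 1, step))
    density = gaussian_kde_1d(positions, grid, bandwidth)

    sites = []
    for i in range(1, len(grid) - 1):
        # A grid point is a mode if the density rises into it and does not rise after it.
        if density[i] > density[i - 1] and density[i] >= density[i + 1]:
            mode = grid[i]
            # Count distinct cell lines with a summit near the mode.
            supporting = {cl for cl, pos in peaks if abs(pos - mode) <= 2 * bandwidth}
            fraction = len(supporting) / n_cell_lines
            if fraction >= min_fraction:
                sites.append((mode, fraction))
    return sites


if __name__ == "__main__":
    # Ten cell lines share a site near 1000; only two have a peak near 5000.
    demo = [(i, 1000 + (i - 5) * 2) for i in range(10)] + [(0, 5000), (1, 5002)]
    print(constitutive_sites(demo, n_cell_lines=10))
```

Lowering `min_fraction` in the same sketch would instead surface sites bound in only a few cell lines, which is one simple way to think about the cell-type-specific sites mentioned above.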
ABSTRACT: Diethylstilbestrol (DES) is a synthetic estrogen that is associated with adverse effects on reproductive organs. DES-induced toxicity of the mouse seminal vesicle (SV) is mediated by ERα with altered expression of seminal vesicle secretory protein IV (Svs4) and lactoferrin (Ltf) genes.
We examined a role for nuclear receptor activity in association with DNA methylation and altered gene expression.
We used the neonatal DES exposure mouse model to examine DNA methylation patterns via bisulfite conversion sequencing in WT and αERKO SVs.
DNA methylation status at 4 specific CpGs (-160, -237, -306 and -367) in the Svs4 gene promoter changes during mouse development from methylated to unmethylated, and DES prevents this change at 10 weeks of age in WT SV. DES alters the methylation status from methylated to unmethylated at 2 specific CpGs (-449 and -459) of the Ltf gene promoter. Alterations in DNA methylation of Svs4 and Ltf were not observed in αERKO SV, suggesting that changes of methylation status at these CpGs are ERα dependent. The methylation status is associated with the level of gene expression. In addition, gene expression of three epigenetic modifiers, DNMT3A, MBD2, and HDAC2, increased after DES exposure in WT SV.
DES-induced hormonal toxicity results from altered gene expression of Svs4 and Ltf associated with changes in DNA methylation that are mediated by ERα. Alterations in gene expression of DNMT3A, MBD2 and HDAC2 after DES exposure may be involved in mediating the changes in methylation status in the SVs of male mice.
Environmental Health Perspectives 12/2013; · 7.26 Impact Factor
ABSTRACT: We introduce a web-based tool, Peak Annotation and Visualization (PAVIS), for annotating and visualizing ChIP-seq peak data. PAVIS is designed with non-bioinformaticians in mind and presents a straightforward user interface to facilitate biological interpretation of ChIP-seq peak or other genomic enrichment data. PAVIS, through association with annotation, provides relevant genomic context for each peak, such as peak location relative to genomic features including transcription start site, intron, exon, or 5'/3'-UTR. PAVIS reports the relative enrichment p-values of peaks in these functionally distinct categories, and provides a summary plot of the relative proportion of peaks in each category. PAVIS, unlike many other resources, provides a peak-oriented annotation and visualization system, allowing simultaneous dynamic visualization of tens to hundreds of loci from one or more ChIP-seq experiments. PAVIS enables rapid and easy examination and cross-comparison of the genomic context and potential functions of the underlying genomic elements, thus supporting downstream hypothesis generation.
PAVIS is publicly accessible at http://manticore.niehs.nih.gov/pavis.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT: Recent studies suggested that human/mammalian genomes are divided into large, discrete domains that are units of chromosome organization. CTCF, a CCCTC binding factor, has a diverse role in genome regulation including transcriptional regulation, chromosome-boundary insulation, DNA replication, and chromatin packaging. It remains unclear whether a subset of CTCF binding sites plays a functional role in establishing/maintaining chromatin topological domains.
We systematically analysed the genomic, transcriptomic and epigenetic profiles of the CTCF binding sites in 56 human cell lines from ENCODE. We identified ~24,000 CTCF sites (referred to as constitutive sites) that were bound in more than 90% of the cell lines. Our analysis revealed: 1) constitutive CTCF loci were located in constitutive open chromatin and often co-localized with constitutive cohesin loci; 2) most constitutive CTCF loci were distant from transcription start sites and lacked CpG islands but were enriched with the full-spectrum CTCF motifs: a recently reported 33/34-mer and two other potentially novel (22/26-mer); 3) more importantly, most constitutive CTCF loci were present in CTCF-mediated chromatin interactions detected by ChIA-PET and these pair-wise interactions occurred predominantly within, but not between, topological domains identified by Hi-C.
Our results suggest that the constitutive CTCF sites may play a role in organizing/maintaining the recently identified topological domains that are common across most human cells.
ABSTRACT: Cancer and infection are predominant causes of human mortality and derive, respectively, from inadequate genomic and host defenses against environmental agents. The transcription factor p53 plays a central role in human tumor suppression. Despite its expression in immune cells and broad responsiveness to stressors, it is virtually unknown whether p53 regulates host defense against infection. We report that the lungs of naive p53(-/-) mice display genome-wide induction of NF-κB response element-enriched proinflammatory genes, suggestive of type 1 immune priming. p53-null and p53 inhibitor-treated mice clear Gram-negative and -positive bacteria more effectively than controls after intrapulmonary infection. This is caused, at least in part, by cytokines produced by an expanded population of apoptosis-resistant, TLR-hyperresponsive alveolar macrophages that enhance airway neutrophilia. p53(-/-) neutrophils, in turn, display heightened phagocytosis, Nox-dependent oxidant generation, degranulation, and bacterial killing. p53 inhibition boosts bacterial killing by mouse neutrophils and oxidant generation by human neutrophils. Despite enhanced bacterial clearance, infected p53(-/-) mice suffer increased mortality associated with aggravated lung injury. p53 thus modulates host defense through regulating microbicidal function and fate of phagocytes, revealing a fundamental link between defense of genome and host during environmental insult.
Journal of Experimental Medicine 04/2013; · 13.21 Impact Factor
ABSTRACT: Members of the tristetraprolin (TTP) family of CCCH tandem zinc finger proteins can bind directly to AU-rich elements in mRNAs and promote transcript deadenylation and decay. The yeast Schizosaccharomyces pombe expresses a single TTP family member, Zfs1p. In this study, we identified probable Zfs1p target mRNAs by comparing transcript levels in wild-type yeast and zfs1Δ mutants, using deep sequencing and microarray approaches. We also used direct RNA sequencing to determine polyadenylation site locations and to confirm the presence of potential Zfs1p target sequences within the target mRNA. These studies identified a set of transcripts containing potential Zfs1p binding sites that accumulated significantly in the zfs1Δ mutants; a subset of these transcripts decayed more slowly in the zfs1Δ mutants and bound directly to Zfs1p in coimmunoprecipitation assays. One apparent direct target encodes the transcription factor Cbf12p, which is known to increase cell-cell adhesion and flocculation when overexpressed. Studies of zfs1Δ cbf12Δ double mutants demonstrated that the increased flocculation seen in zfs1Δ mutants is due, at least in part, to a direct effect on the turnover of cbf12 mRNA. These data suggest that Zfs1p can both directly and indirectly regulate the levels of transcripts involved in cell-cell adhesion in this species.
Molecular and cellular biology 08/2012; 32(20):4206-14. · 6.06 Impact Factor
ABSTRACT: To advance understanding of mechanisms leading to biological and transcriptional endpoints related to estrogen action in the mouse uterus, we have mapped ERα and RNA polymerase II (PolII) binding sites using chromatin immunoprecipitation followed by sequencing of enriched chromatin fragments. In the absence of hormone, 5184 ERα-binding sites were apparent in the vehicle-treated ovariectomized uterine chromatin, whereas 17,240 were seen 1 h after estradiol (E₂) treatment, indicating that some sites are occupied by unliganded ERα, and that ERα binding is increased by E₂. Approximately 15% of the uterine ERα-binding sites were adjacent to (<10 kb) annotated transcription start sites, and many sites are found within genes or are found more than 100 kb distal from mapped genes; however, the density (sites per base pair) of ERα-binding sites is significantly greater adjacent to promoters. An increase in quantity of sites but no significant positional differences were seen between vehicle and E₂-treated samples in the overall locations of ERα-binding sites either distal from, adjacent to, or within genes. Analysis of the PolII data revealed the presence of poised promoter-proximal PolII on some highly up-regulated genes. Additionally, corecruitment of PolII and ERα to some distal enhancer regions was observed. A de novo motif analysis of sequences in the ERα-bound chromatin confirmed that estrogen response elements were significantly enriched. Interestingly, in areas of ERα binding without predicted estrogen response element motifs, homeodomain transcription factor-binding motifs were significantly enriched. The integration of the ERα- and PolII-binding sites from our uterine sequencing of enriched chromatin fragments data with transcriptional responses revealed in our uterine microarrays has the potential to greatly enhance our understanding of mechanisms governing estrogen response in uterine and other estrogen target tissues.
ABSTRACT: ART is a set of simulation tools that generate synthetic next-generation sequencing reads. This functionality is essential for testing and benchmarking tools for next-generation sequencing data analysis including read alignment, de novo assembly and genetic variation discovery. ART generates simulated sequencing reads by emulating the sequencing process with built-in, technology-specific read error models and base quality value profiles parameterized empirically in large sequencing datasets. We currently support all three major commercial next-generation sequencing platforms: Roche's 454, Illumina's Solexa and Applied Biosystems' SOLiD. ART also allows the flexibility to use customized read error model parameters and quality profiles. AVAILABILITY: Both source and binary software packages are available at http://www.niehs.nih.gov/research/resources/software/art.
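To make the idea of quality-driven read simulation concrete, here is a minimal Python sketch. It is not ART's code or its error model: it assumes a single fixed per-position Phred quality profile, models substitution errors only (no insertions or deletions), and uses the standard Phred relation between a quality score Q and the error probability, 10^(-Q/10). The function names and the `seed` parameter are hypothetical.

```python
import random

BASES = "ACGT"


def phred_to_error_prob(q):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def simulate_read(reference, start, quality_profile, rng):
    """Copy one read from `reference`, substituting bases at the rate implied by each quality."""
    read = []
    for offset, q in enumerate(quality_profile):
        true_base = reference[start + offset]
        if rng.random() < phred_to_error_prob(q):
            # Substitution error: pick one of the other three bases.
            read.append(rng.choice([b for b in BASES if b != true_base]))
        else:
            read.append(true_base)
    return "".join(read)


def simulate_reads(reference, n_reads, quality_profile, seed=0):
    """Return a list of (start, read) pairs sampled uniformly along `reference`."""
    rng = random.Random(seed)
    read_len = len(quality_profile)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        reads.append((start, simulate_read(reference, start, quality_profile, rng)))
    return reads


if __name__ == "__main__":
    ref = "ACGT" * 25
    # Quality falls toward the 3' end, so late positions carry more errors.
    profile = [40, 40, 38, 35, 30, 25, 20, 15, 10, 5]
    for start, read in simulate_reads(ref, 3, profile, seed=42):
        print(start, read)
```

Because each simulated read keeps its true start position, the output of a sketch like this can be compared directly with what an aligner reports, which is the benchmarking use case the abstract describes.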
ABSTRACT: We propose a new and effective statistical framework for identifying genome-wide differential changes in epigenetic marks with ChIP-seq data or gene expression with mRNA-seq data, and we develop a new software tool EpiCenter that can efficiently perform data analysis. The key features of our framework are: (i) providing multiple normalization methods to achieve appropriate normalization under different scenarios, (ii) using a sequence of three statistical tests to eliminate background regions and to account for different sources of variation and (iii) allowing adjustment for multiple testing to control false discovery rate (FDR) or family-wise type I error. Our software EpiCenter can perform multiple analytic tasks including: (i) identifying genome-wide epigenetic changes or differentially expressed genes, (ii) finding transcription factor binding sites and (iii) converting multiple-sample sequencing data into a single read-count data matrix. By simulation, we show that our framework achieves a low FDR consistently over a broad range of read coverage and biological variation. Through two real examples, we demonstrate the effectiveness of our framework and the usage of our tool. In particular, we show that our novel and robust 'parsimony' normalization method is superior to the widely-used 'tagRatio' method. Our software EpiCenter is freely available to the public.
Nucleic Acids Research 07/2011; 39(19):e130. · 8.28 Impact Factor
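The abstract above does not say which multiple-testing procedures EpiCenter uses. As a generic illustration of the two kinds of adjustment it mentions, the Python sketch below implements the Benjamini-Hochberg step-up adjustment (a standard way to control FDR) and the Bonferroni correction (a standard way to control family-wise type I error). Both choices, and the function names, are assumptions made for this example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(pvalues):
    """Return Bonferroni adjusted p-values, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.005]
    print("BH:        ", benjamini_hochberg(pvals))
    print("Bonferroni:", bonferroni(pvals))
```

Regions whose adjusted p-value falls below the chosen level would be reported as changed. The Bonferroni values are never smaller than the Benjamini-Hochberg ones, which reflects the stricter family-wise criterion.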
ABSTRACT: ChIP-seq data are enriched in binding sites for the protein immunoprecipitated. Some sequences may also contain binding sites for a coregulator. Biologists are interested in knowing which coregulatory factor motifs may be present in the sequences bound by the protein ChIP'ed.
We present a finite mixture framework with an expectation-maximization algorithm that considers two motifs jointly and simultaneously determines which sequences contain both motifs, either one or neither of them. Tested on 10 simulated ChIP-seq datasets, our method, coMOTIF, performed better than repeated application of MEME in predicting sequences containing both motifs. When applied to a mouse liver Foxa2 ChIP-seq dataset involving ~12,000 400-bp sequences, coMOTIF identified co-occurrence of Foxa2 with Hnf4a, Cebpa, E-box, Ap1/Maf or Sp1 motifs in ~6-33% of these sequences. These motifs are either known as liver-specific transcription factors or have an important role in liver function.
Freely available at http://www.niehs.nih.gov/research/resources/software/comotif/.
Supplementary data are available at Bioinformatics online.
ABSTRACT: Epithelial stem cells self-renew while maintaining multipotency, but the dependence of stem cell properties on maintenance of the epithelial phenotype is unclear. We previously showed that trophoblast stem (TS) cells lacking the protein kinase MAP3K4 maintain properties of both stemness and epithelial-mesenchymal transition (EMT). Here, we show that MAP3K4 controls the activity of the histone acetyltransferase CBP, and that acetylation of histones H2A and H2B by CBP is required to maintain the epithelial phenotype. Combined loss of MAP3K4/CBP activity represses expression of epithelial genes and causes TS cells to undergo EMT while maintaining their self-renewal and multipotency properties. The expression profile of MAP3K4-deficient TS cells defines an H2B acetylation-regulated gene signature that closely overlaps with that of human breast cancer cells. Taken together, our data define an epigenetic switch that maintains the epithelial phenotype in TS cells and reveal previously unrecognized genes potentially contributing to breast cancer.
ABSTRACT: Metazoan transcription is controlled through either coordinated recruitment of transcription machinery to the gene promoter or regulated pausing of RNA polymerase II (Pol II) in early elongation. We report that a striking difference between genes that use these distinct regulatory strategies lies in the "default" chromatin architecture specified by their DNA sequences. Pol II pausing is prominent at highly regulated genes whose sequences inherently disfavor nucleosome formation within the gene but favor occlusion of the promoter by nucleosomes. In contrast, housekeeping genes that lack pronounced Pol II pausing show higher nucleosome occupancy downstream, but their promoters are deprived of nucleosomes regardless of polymerase binding. Our results indicate that a key role of paused Pol II is to compete with nucleosomes for occupancy of highly regulated promoters, thereby preventing the formation of repressive chromatin architecture to facilitate further or future gene activation.
ABSTRACT: Aberrant DNA methylation commonly occurs in cancer cells where it has been implicated in the epigenetic silencing of tumor suppressor genes. Additional roles for DNA methylation, such as transcriptional activation, have been predicted but have yet to be clearly demonstrated. The BCL6 oncogene is implicated in the pathogenesis of germinal center-derived B cell lymphomas. We demonstrate that the intragenic CpG islands within the first intron of the human BCL6 locus were hypermethylated in lymphoma cells that expressed high amounts of BCL6 messenger RNA (mRNA). Inhibition of DNA methyltransferases decreased BCL6 mRNA abundance, suggesting a role for these methylated CpGs in positively regulating BCL6 transcription. The enhancer-blocking transcription factor CTCF bound to this intronic region in a methylation-sensitive manner. Depletion of CTCF by short hairpin RNA in neoplastic plasma cells that do not express BCL6 resulted in up-regulation of BCL6 transcription. These data indicate that BCL6 expression is maintained during lymphomagenesis in part through DNA methylation that prevents CTCF-mediated silencing.
Journal of Experimental Medicine 08/2010; 207(9):1939-50. · 13.21 Impact Factor
ABSTRACT: Recently, microarray data analyses using functional pathway information, e.g., gene set enrichment analysis (GSEA) and significance analysis of function and expression (SAFE), have gained recognition as a way to identify biological pathways/processes associated with a phenotypic endpoint. In these analyses, a local statistic is used to assess the association between the expression level of a gene and the value of a phenotypic endpoint. Then these gene-specific local statistics are combined to evaluate association for pre-selected sets of genes. Commonly used local statistics include t-statistics for binary phenotypes and correlation coefficients that assume a linear or monotone relationship between a continuous phenotype and gene expression level. Methods applicable to continuous non-monotone relationships are needed. Furthermore, for multiple experimental categories, methods that combine multiple GSEA/SAFE analyses are needed.
For continuous or ordinal phenotypic outcome, we propose to use as the local statistic the coefficient of multiple determination (i.e., the square of multiple correlation coefficient) R² from fitting natural cubic spline models to the phenotype-expression relationship. Next, we incorporate this association measure into the GSEA/SAFE framework to identify significant gene sets. Unsigned local statistics, signed global statistics and one-sided p-values are used to reflect our inferential interest. Furthermore, we describe a procedure for inference across multiple GSEA/SAFE analyses. We illustrate our approach using gene expression and liver injury data from liver and blood samples from rats treated with eight hepatotoxicants under multiple time and dose combinations. We set out to identify biological pathways/processes associated with liver injury as manifested by increased blood levels of alanine transaminase in common for most of the eight compounds. Potential statistical dependency resulting from the experimental design is addressed in permutation based hypothesis testing.
The proposed framework captures both linear and non-linear association between gene expression level and a phenotypic endpoint and thus can be viewed as extending the current GSEA/SAFE methodology. The framework for combining results from multiple GSEA/SAFE analyses is flexible to address practical inference interests. Our methods can be applied to microarray data with continuous phenotypes with multi-level design or the meta-analysis of multiple microarray data sets.
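For reference, the coefficient of multiple determination used as the local statistic can be written in its usual form. The notation here is an assumption, since the abstract gives no symbols: y_i is the observed response for sample i in the fitted natural cubic spline model, \hat{y}_i is the corresponding fitted value, \bar{y} is the mean response, and n is the number of samples.

```latex
R^2 \;=\; 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}
                   {\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

For a least-squares fit that includes an intercept, this quantity lies between 0 and 1 and carries no sign, which is consistent with the abstract's description of the local statistics as unsigned and lets it register non-monotone as well as linear association.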
ABSTRACT: The Negative Elongation Factor (NELF) is a transcription regulatory complex that induces stalling of RNA polymerase II (Pol II) during early transcription elongation and represses expression of several genes studied to date, including Drosophila Hsp70, mammalian proto-oncogene junB, and HIV RNA. To determine the full spectrum of NELF target genes in Drosophila, we performed a microarray analysis of S2 cells depleted of NELF and discovered that NELF RNAi affects many rapidly inducible genes involved in cellular responses to stimuli. Surprisingly, only one-third of NELF target genes were, like Hsp70, up-regulated by NELF-depletion, whereas the majority of target genes showed decreased expression levels upon NELF RNAi. Our data reveal that the presence of stalled Pol II at this latter group of genes enhances gene expression by maintaining a permissive chromatin architecture around the promoter-proximal region, and that loss of Pol II stalling at these promoters is accompanied by a significant increase in nucleosome occupancy and a decrease in histone H3 Lys 4 trimethylation. These findings identify a novel, positive role for stalled Pol II in regulating gene expression and suggest that there is a dynamic interplay between stalled Pol II and chromatin structure.
Genes & Development 07/2008; 22(14):1921-33. · 12.44 Impact Factor
ABSTRACT: Cyclooxygenase-1 (COX-1, PTGS1) catalyzes the conversion of arachidonic acid to prostaglandin H2, which is subsequently metabolized to various biologically active prostaglandins. We sought to identify and characterize the functional relevance of genetic polymorphisms in PTGS1.
Sequence variations in human PTGS1 were identified by resequencing 92 healthy individuals (24 African, 24 Asian, 24 European/Caucasian, and 20 anonymous). Using site-directed mutagenesis and a baculovirus/insect cell expression system, recombinant wild-type COX-1 and the R8W, P17L, R53H, R78W, K185T, G230S, L237M, and V481I variant proteins were expressed. COX-1 metabolic activity was evaluated in vitro using an oxygen consumption assay under basal conditions and in the presence of indomethacin.
Forty-five variants were identified, including seven nonsynonymous polymorphisms encoding amino acid substitutions in the COX-1 protein. The R53H (35±5%), R78W (36±4%), K185T (59±6%), G230S (57±4%), and L237M (51±3%) variant proteins had significantly lower metabolic activity relative to wild-type (100±7%), while no significant differences were observed with the R8W (104±10%), P17L (113±7%), and V481I (121±10%) variants. Inhibition studies with indomethacin demonstrated that the P17L and G230S variants had significantly lower IC50 values compared to wild-type, suggesting these variants significantly increase COX-1 sensitivity to indomethacin inhibition. Consistent with the metabolic activity data, protein modeling suggested the G230S variant may disrupt the active conformation of COX-1.
Our findings demonstrate that several genetic variants in human COX-1 significantly alter basal COX-1-mediated arachidonic acid metabolism and indomethacin-mediated inhibition of COX-1 activity in vitro. Future studies characterizing the functional impact of these variants in vivo are warranted.
Pharmacogenetics and Genomics 03/2007; 17(2):145-60. · 3.61 Impact Factor
ABSTRACT: Regulatory factor X4 variant transcript 3 (Rfx4_v3) gene disruption in mice demonstrated that interruption of a single allele (heterozygous, +/-) prevented formation of the subcommissural organ, resulting in congenital hydrocephalus, while interruption of two alleles (homozygous, -/-) caused fatal failure of dorsal midline brain structure formation. To identify potential target genes for RFX4_v3, we used microarray analysis to identify differentially expressed genes in Rfx4_v3-deficient mouse brains at embryonic day 10.5, before gross structural changes were apparent. Of 109 differentially expressed transcripts, 24 were chosen for validation and 22 were confirmed by real-time PCR. Many validated genes encoded critical proteins involved in brain morphogenesis, such as the signaling components in the Wnt, bone morphogenetic protein (BMP) and retinoic acid (RA) pathways. Cx3cl1, a CX3C-type chemokine gene that is highly expressed in brain, was down-regulated in the Rfx4_v3-null mice. Both human and mouse Cx3cl1 proximal promoters contained highly conserved X-boxes, known cis-acting elements for RFX protein binding. Using the Cx3cl1 promoter as an example of a target gene, we demonstrated direct binding of RFX4_v3 to the Cx3cl1 promoter, and trans-acting activity of RFX4_v3 protein to stimulate gene expression. These data suggest that RFX4_v3 may act upstream of critical signaling pathways in the process of brain development.
Journal of Neurochemistry 09/2006; 98(3):860-75. · 3.97 Impact Factor
ABSTRACT: An argument is presented that the spontaneous mutation rate, the core of evolution theory, may be dictated by the deuterium/hydrogen (D/H) abundance ratio. This argument is supported by quantum mechanical calculations of the zero-point energy reduction for DNA base pairs upon deuterium substitution for hydrogen and recent experiments that show that the rate of catalytic dsDNA unwinding is dependent on the stability of the dsDNA.
Journal of Theoretical Biology 03/2006; 238(4):914-8. · 2.35 Impact Factor
ABSTRACT: Identifying functional elements, such as transcription factor binding sites, is a fundamental step in reconstructing gene regulatory networks and remains a challenging issue, largely due to limited availability of training samples.
We introduce a novel and flexible model, the Optimized Mixture Markov model (OMiMa), and related methods to allow adjustment of model complexity for different motifs. In comparison with other leading methods, OMiMa can incorporate more than the NNSplice's pairwise dependencies; OMiMa avoids model over-fitting better than the Permuted Variable Length Markov Model (PVLMM); and OMiMa requires smaller training samples than the Maximum Entropy Model (MEM). Testing on both simulated and actual data (regulatory cis-elements and splice sites), we found OMiMa's performance superior to the other leading methods in terms of prediction accuracy, required size of training data or computational time. Our OMiMa system, to our knowledge, is the only motif finding tool that incorporates automatic selection of the best model. OMiMa is freely available at 1.
Our optimized mixture of Markov models represents an alternative to the existing methods for modeling dependent structures within a biological motif. Our model is conceptually simple and effective, and can improve prediction accuracy and/or computational speed over other leading methods.