-
ABSTRACT: We propose differential principal component analysis (dPCA) for analyzing multiple ChIP-sequencing datasets to identify differential protein-DNA interactions between two biological conditions. dPCA integrates unsupervised pattern discovery, dimension reduction, and statistical inference into a single framework. It uses a small number of principal components to summarize concisely the major multiprotein synergistic differential patterns between the two conditions. For each pattern, it detects and prioritizes differential genomic loci by comparing the between-condition differences with the within-condition variation among replicate samples. dPCA provides a unique tool for efficiently analyzing large amounts of ChIP-sequencing data to study dynamic changes of gene regulation across different biological conditions. We demonstrate this approach through analyses of differential chromatin patterns at transcription factor binding sites and promoters as well as allele-specific protein-DNA interactions.
Proceedings of the National Academy of Sciences 04/2013; · 9.68 Impact Factor
-
ABSTRACT: Background: ChIP-seq provides new opportunities to study allele-specific protein-DNA binding (ASB). However, detecting allelic imbalance from a single ChIP-seq dataset often has low statistical power since only sequence reads mapped to heterozygote SNPs are informative for discriminating two alleles. Results: We develop a new method, iASeq, to address this issue by jointly analyzing multiple ChIP-seq datasets. iASeq uses a Bayesian hierarchical mixture model to learn correlation patterns of allele-specificity among multiple proteins. Using the discovered correlation patterns, the model allows one to borrow information across datasets to improve detection of allelic imbalance. Application of iASeq to 77 ChIP-seq samples from 40 ENCODE datasets and 1 genomic DNA sample in GM12878 cells reveals that allele-specificity of multiple proteins is highly correlated, and demonstrates the ability of iASeq to improve allelic inference compared to analyzing each individual dataset separately. Conclusions: iASeq illustrates the value of integrating multiple datasets in the allele-specificity inference and offers a new tool to better analyze ASB.
BMC Genomics 11/2012; 13(1):681. · 4.07 Impact Factor
-
ABSTRACT: Chromatin immunoprecipitation followed by genome tiling array hybridization (ChIP-chip) is a powerful approach to map transcription factor binding sites (TFBSs). Similar to other high-throughput genomic technologies, ChIP-chip often produces noisy data. Distinguishing signals from noise in these data is challenging. ChIP-chip data in public databases are rapidly growing. It is becoming more and more common that scientists can find multiple data sets for the same transcription factor in different biological contexts or data for different transcription factors in the same biological context. When these related experiments are analyzed together, binding site detection can be improved by borrowing information across data sets. This chapter introduces a computational tool JAMIE for Jointly Analyzing Multiple ChIP-chip Experiments. JAMIE is based on a hierarchical mixture model, and it is implemented as an R package. Simulation and real data studies have shown that it can significantly increase sensitivity and specificity of TFBS detection compared to existing algorithms. The purpose of this chapter is to describe how the JAMIE package can be used to perform the integrative data analysis.
Methods in molecular biology (Clifton, N.J.) 01/2012; 802:363-75.
-
Qian-Fei Wang,
George Wu,
Shuangli Mi,
Fuhong He,
Jun Wu,
Jingfang Dong,
Roger T Luo,
Ryan Mattison,
Joseph J Kaberlein,
Shyam Prabhakar,
Hongkai Ji,
Michael J Thirman
ABSTRACT: MLL encodes a histone methyltransferase that is critical in maintaining gene expression during embryonic development and hematopoiesis. 11q23 translocations result in the formation of chimeric MLL fusion proteins that act as potent drivers of acute leukemia. However, it remains unclear what portion of the leukemic genome is under the direct control of MLL fusions. By comparing patient-derived leukemic cell lines, we find that MLL fusion-bound genes are a small subset of that recognized by wild-type MLL. In an inducible MLL-ENL model, MLL fusion protein binding and changes in H3K79 methylation are limited to a specific portion of the genome, whereas wild-type MLL distributes to a much larger set of gene loci. Surprisingly, among 223 MLL-ENL-bound genes, only 12 demonstrate a significant increase in mRNA expression on induction of the fusion protein. In addition to Hoxa9 and Meis1, this includes Eya1 and Six1, which comprise a heterodimeric transcription factor important in several developmental pathways. We show that Eya1 has the capacity to immortalize hematopoietic progenitor cells in vitro and collaborates with Six1 in hematopoietic transformation assays. Altogether, our data suggest that MLL fusions contribute to the development of acute leukemia through direct activation of a small set of target genes.
Blood 06/2011; 117(25):6895-905. · 9.90 Impact Factor
-
ABSTRACT: hmChIP is a database of genome-wide chromatin immunoprecipitation (ChIP) data in human and mouse. Currently, the database contains 2016 samples from 492 ChIP-seq and ChIP-chip experiments, representing a total of 170 proteins and 11 069 914 protein-DNA interactions. A web server provides an interface for database query. Protein-DNA binding intensities can be retrieved from individual samples for user-provided genomic regions. The retrieved intensities can be used to cluster samples and genomic regions to facilitate exploration of combinatorial patterns, cell-type dependencies, and cross-sample variability of protein-DNA interactions. AVAILABILITY: http://jilab.biostat.jhsph.edu/database/cgi-bin/hmChIP.pl.
Bioinformatics 03/2011; 27(10):1447-8. · 5.47 Impact Factor
-
ABSTRACT: Chromatin immunoprecipitation (ChIP) coupled with genome tiling array hybridization (ChIP-chip) and ChIP followed by massively parallel sequencing (ChIP-seq) are high-throughput approaches to profiling genome-wide protein-DNA interactions. Both technologies are increasingly used to study transcription-factor binding sites and chromatin modifications. CisGenome is an integrated software system for analyzing ChIP-chip and ChIP-seq data. This unit describes basic functions of CisGenome and how to use them to find genomic regions with protein-DNA interactions, visualize binding signals, associate binding regions with nearby genes, search for novel transcription-factor binding motifs, and map existing DNA sequence motifs to user-supplied genomic regions to define their exact locations.
Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis ... [et al.] 03/2011; Chapter 2:Unit2.13.
-
Hongkai Ji,
George Wu,
Xiangcan Zhan,
Alexandra Nolan,
Cheryl Koh,
Angelo De Marzo,
Hoang Mai Doan,
Jinshui Fan,
Christopher Cheadle,
Mohammad Fallahi,
John L Cleveland,
Chi V Dang,
Karen I Zeller
ABSTRACT: The functions of key oncogenic transcription factors independent of context have not been fully delineated despite our richer understanding of the genetic alterations in human cancers. The MYC oncogene, which produces the Myc transcription factor, is frequently altered in human cancer and is a major regulatory hub for many cancers. In this regard, we sought to unravel the primordial signature of Myc function by using high-throughput genomic approaches to identify the cell-type independent core Myc target gene signature. Using a model of human B lymphoma cells bearing inducible MYC, we identified a stringent set of direct Myc target genes via chromatin immunoprecipitation (ChIP), global nuclear run-on assay, and changes in mRNA levels. We also identified direct Myc targets in human embryonic stem cells (ESCs). We further document that a Myc core signature (MCS) set of target genes is shared in mouse and human ESCs as well as in four other human cancer cell types. Remarkably, the expression of the MCS correlates with MYC expression in a cell-type independent manner across 8,129 microarray samples, which include 312 cell and tissue types. Furthermore, the expression of the MCS is elevated in vivo in Eμ-Myc transgenic murine lymphoma cells as compared with premalignant or normal B lymphocytes. Expression of the MCS in human B cell lymphomas, acute leukemia, lung cancers or Ewing sarcomas has the highest correlation with MYC expression. Annotation of this gene signature reveals Myc's primordial function in RNA processing, ribosome biogenesis and biomass accumulation as its key roles in cancer and stem cells.
PLoS ONE 01/2011; 6(10):e26057. · 4.09 Impact Factor
-
ABSTRACT: Chromatin immunoprecipitation followed by genome tiling array hybridization (ChIP-chip) is a powerful approach to identify transcription factor binding sites (TFBSs) in target genomes. When multiple related ChIP-chip datasets are available, analyzing them jointly allows one to borrow information across datasets to improve peak detection. This is particularly useful for analyzing noisy datasets.
We propose a hierarchical mixture model and develop an R package JAMIE to perform the joint analysis. The genome is assumed to consist of background and potential binding regions (PBRs). PBRs have context-dependent probabilities to become bona fide binding sites in individual datasets. This model captures the correlation among datasets, which provides basis for sharing information across experiments. Real data tests illustrate the advantage of JAMIE over a strategy that analyzes individual datasets separately.
JAMIE is freely available from http://www.biostat.jhsph.edu/~hji/jamie
Bioinformatics 08/2010; 26(15):1864-70. · 5.47 Impact Factor
-
Cheng Ran Lisa Huang,
Anna M Schneider,
Yunqi Lu,
Tejasvi Niranjan,
Peilin Shen,
Matoya A Robinson,
Jared P Steranka,
David Valle,
Curt I Civin,
Tao Wang,
Sarah J Wheelan,
Hongkai Ji,
Jef D Boeke,
Kathleen H Burns
ABSTRACT: Characterizing structural variants in the human genome is of great importance, but a genome-wide analysis to detect interspersed repeats has not been done. Thus, the degree to which mobile DNAs contribute to genetic diversity, heritable disease, and oncogenesis remains speculative. We perform transposon insertion profiling by microarray (TIP-chip) to map human L1(Ta) retrotransposons (LINE-1s) genome-wide. This identified numerous novel human L1(Ta) insertional polymorphisms with highly variant allelic frequencies. We also explored TIP-chip's usefulness to identify candidate alleles associated with different phenotypes in clinical cohorts. Our data suggest that the occurrence of new insertions is twice as high as previously estimated, and that these repeats are under-recognized as sources of human genomic and phenotypic diversity. We have just begun to probe the universe of human L1(Ta) polymorphisms, and as TIP-chip is applied to other insertions such as Alu SINEs, it will expand the catalog of genomic variants even further.
Cell 06/2010; 141(7):1171-82. · 32.40 Impact Factor
-
ABSTRACT: Many genes initially identified for their roles in cell fate determination or signaling during development can have a significant impact on tumorigenesis. In the developing cerebellum, Sonic hedgehog (Shh) stimulates the proliferation of granule neuron precursor cells (GNPs) by activating the Gli transcription factors. Inappropriate activation of Shh target genes results in unrestrained cell division and eventually medulloblastoma, the most common pediatric brain malignancy. We find dramatic differences in the gene networks that are directly driven by the Gli1 transcription factor in GNPs and medulloblastoma. Gli1 binding location analysis revealed hundreds of genomic loci bound by Gli1 in normal and cancer cells. Only one third of the genes bound by Gli1 in GNPs were also bound in tumor cells. Correlation with gene expression levels indicated that 116 genes were preferentially transcribed in tumors, whereas 132 genes were target genes in both GNPs and medulloblastoma. Quantitative PCR and in situ hybridization for some putative target genes support their direct regulation by Gli. The results indicate that transformation of normal GNPs into deadly tumor cells is accompanied by a distinct set of Gli-regulated genes and may provide candidates for targeted therapies.
Proceedings of the National Academy of Sciences 05/2010; 107(21):9736-41. · 9.68 Impact Factor
-
Nature Biotechnology 04/2010; 28(4):337-40. · 29.50 Impact Factor
-
Kathy K Niakan,
Hongkai Ji,
René Maehr,
Steven A Vokes,
Kit T Rodolfa,
Richard I Sherwood,
Mariko Yamaki,
John T Dimos,
Alice E Chen,
Douglas A Melton,
Andrew P McMahon,
Kevin Eggan
ABSTRACT: In embryonic stem (ES) cells, a well-characterized transcriptional network promotes pluripotency and represses gene expression required for differentiation. In comparison, the transcriptional networks that promote differentiation of ES cells and the blastocyst inner cell mass are poorly understood. Here, we show that Sox17 is a transcriptional regulator of differentiation in these pluripotent cells. ES cells deficient in Sox17 fail to differentiate into extraembryonic cell types and maintain expression of pluripotency-associated transcription factors, including Oct4, Nanog, and Sox2. In contrast, forced expression of Sox17 down-regulates ES cell-associated gene expression and directly activates genes functioning in differentiation toward an extraembryonic endoderm cell fate. We show these effects of Sox17 on ES cell gene expression are mediated at least in part through a competition between Sox17 and Nanog for common DNA-binding sites. By elaborating the function of Sox17, our results provide insight into how the transcriptional network promoting ES cell self-renewal is interrupted, allowing cellular differentiation.
Genes & development 02/2010; 24(3):312-26. · 12.08 Impact Factor
-
Hongkai Ji
ABSTRACT: Chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-seq) is a new technology to map protein-DNA interactions in a genome. The genome-wide transcription factor binding site and chromatin modification data produced by ChIP-seq provide invaluable information for studying gene regulation. This chapter reviews basic characteristics of ChIP-seq data and introduces a computational procedure to identify protein-DNA interactions from ChIP-seq experiments.
Methods in molecular biology (Clifton, N.J.) 01/2010; 674:143-59.
-
ABSTRACT: Individual probes on an Affymetrix tiling array usually behave differently. Modeling and removing these probe effects are critical for detecting signals from the array data. Current data processing techniques either require control samples or use probe sequences to model probe-specific variability, such as with MAT. Although the MAT approach can be applied without control samples, residual probe effects continue to distort the true biological signals.
We propose TileProbe, a new technique that builds upon the MAT algorithm by incorporating publicly available data sets to remove tiling array probe effects. By using a large number of these readily available arrays, TileProbe robustly models the residual probe effects that the MAT model cannot explain. When applied to analyzing ChIP-chip data, TileProbe performs consistently better than MAT across a variety of analytical conditions. This shows that TileProbe resolves the issue of probe-specific effects more completely.
http://www.biostat.jhsph.edu/~hji/cisgenome/index_files/tileprobe.htm.
Bioinformatics 08/2009; 25(18):2369-75. · 5.47 Impact Factor
-
ABSTRACT: We present CisGenome, a software system for analyzing genome-wide chromatin immunoprecipitation (ChIP) data. CisGenome is designed to meet all basic needs of ChIP data analyses, including visualization, data normalization, peak detection, false discovery rate computation, gene-peak association, and sequence and motif analysis. In addition to implementing previously published ChIP-microarray (ChIP-chip) analysis methods, the software contains statistical methods designed specifically for ChIP sequencing (ChIP-seq) data obtained by coupling ChIP with massively parallel sequencing. The modular design of CisGenome enables it to support interactive analyses through a graphic user interface as well as customized batch-mode computation for advanced data mining. A built-in browser allows visualization of array images, signals, gene structure, conservation, and DNA sequence and motif information. We demonstrate the use of these tools by a comparative analysis of ChIP-chip and ChIP-seq data for the transcription factor NRSF/REST, a study of ChIP-seq analysis with or without a negative control sample, and an analysis of a new motif in Nanog- and Sox2-binding regions.
Nature Biotechnology 12/2008; 26(11):1293-300. · 29.50 Impact Factor
-
ABSTRACT: Sonic hedgehog (Shh) signals via Gli transcription factors to direct digit number and identity in the vertebrate limb. We characterized the Gli-dependent cis-regulatory network through a combination of whole-genome chromatin immunoprecipitation (ChIP)-on-chip and transcriptional profiling of the developing mouse limb. These analyses identified approximately 5000 high-quality Gli3-binding sites, including all known Gli-dependent enhancers. Discrete binding regions exhibit a higher-order clustering, highlighting the complexity of cis-regulatory interactions. Further, Gli3 binds inertly to previously identified neural-specific Gli enhancers, demonstrating the accessibility of their cis-regulatory elements. Intersection of DNA binding data with gene expression profiles predicted 205 putative limb target genes. A subset of putative cis-regulatory regions were analyzed in transgenic embryos, establishing Blimp1 as a direct Gli target and identifying Gli activator signaling in a direct, long-range regulation of the BMP antagonist Gremlin. In contrast, a long-range silencer cassette downstream from Hand2 likely mediates Gli3 repression in the anterior limb. These studies provide the first comprehensive characterization of the transcriptional output of a Shh-patterning process in the mammalian embryo and a framework for elaborating regulatory networks in the developing limb.
Genes & Development 11/2008; 22(19):2651-63. · 11.66 Impact Factor
-
ABSTRACT: Sonic hedgehog (Shh) acts as a morphogen to mediate the specification of distinct cell identities in the ventral neural tube through a Gli-mediated (Gli1-3) transcriptional network. Identifying Gli targets in a systematic fashion is central to the understanding of the action of Shh. We examined this issue in differentiating neural progenitors in mouse. An epitope-tagged Gli-activator protein was used to directly isolate cis-regulatory sequences by chromatin immunoprecipitation (ChIP). ChIP products were then used to screen custom genomic tiling arrays of putative Hedgehog (Hh) targets predicted from transcriptional profiling studies, surveying 50-150 kb of non-transcribed sequence for each candidate. In addition to identifying expected Gli-target sites, the data predicted a number of unreported direct targets of Shh action. Transgenic analysis of binding regions in Nkx2.2, Nkx2.1 (Titf1) and Rab34 established these as direct Hh targets. These data also facilitated the generation of an algorithm that improved in silico predictions of Hh target genes. Together, these approaches provide significant new insights into both tissue-specific and general transcriptional targets in a crucial Shh-mediated patterning process.
Development 06/2007; 134(10):1977-89. · 6.60 Impact Factor
-
Ji-Hye Paik,
Ramya Kollipara,
Gerald Chu,
Hongkai Ji,
Yonghong Xiao,
Zhihu Ding,
Lili Miao,
Zuzana Tothova,
James W Horner,
Daniel R Carrasco,
Shan Jiang,
D Gary Gilliland,
Lynda Chin,
Wing H Wong,
Diego H Castrillon,
Ronald A DePinho
ABSTRACT: Activated phosphoinositide 3-kinase (PI3K)-AKT signaling appears to be an obligate event in the development of cancer. The highly related members of the mammalian FoxO transcription factor family, FoxO1, FoxO3, and FoxO4, represent one of several effector arms of PI3K-AKT signaling, prompting genetic analysis of the role of FoxOs in the neoplastic phenotypes linked to PI3K-AKT activation. While germline or somatic deletion of up to five FoxO alleles produced remarkably modest neoplastic phenotypes, broad somatic deletion of all FoxOs engendered a progressive cancer-prone condition characterized by thymic lymphomas and hemangiomas, demonstrating that the mammalian FoxOs are indeed bona fide tumor suppressors. Transcriptome and promoter analyses of differentially affected endothelium identified direct FoxO targets and revealed that FoxO regulation of these targets in vivo is highly context-specific, even in the same cell type. Functional studies validated Sprouty2 and PBX1, among others, as FoxO-regulated mediators of endothelial cell morphogenesis and vascular homeostasis.
Cell 02/2007; 128(2):309-23. · 32.40 Impact Factor
-
ABSTRACT: Computational biology is a rapidly evolving area where methodologies from computer science, mathematics, and statistics are applied to address fundamental problems in biology. The study of gene regulatory information is a central problem in current computational biology. This article reviews recent development of statistical methods related to this field. Starting from microarray gene selection, we examine methods for finding transcription factor binding motifs and cis-regulatory modules in coregulated genes, and methods for utilizing information from cross-species comparisons and ChIP-chip experiments. The ultimate understanding of cis-regulatory logic in mammalian genomes may require the integration of information collected from all these steps.
Biometrics 10/2006; 62(3):645-63. · 1.83 Impact Factor
-
ABSTRACT: Genome-wide location analysis (ChIP-chip, ChIP-PET) is a powerful technique to study mammalian transcriptional regulation. In order to obtain a basic understanding of the location data generated for mammalian transcription factors and potential issues in their analysis, we conducted a comparative study of eight independent ChIP experiments involving six different transcription factors in human and mouse. Our cross-study comparisons, to the best of our knowledge the first to analyze multiple datasets, revealed the importance of carefully chosen genomic controls in the de novo identification of key transcription factor binding motifs, raised issues about the interpretation of ubiquitously occurring sequence motifs, and demonstrated the clustering tendency of protein-binding regions for certain transcription factors.
Nucleic Acids Research 02/2006; 34(21):e146. · 8.03 Impact Factor