ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia

Department of Genetics, Stanford University, Stanford, California 94305, USA
Genome Research (Impact Factor: 14.63). 09/2012; 22(9):1813-31. DOI: 10.1101/gr.136184.111


Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE and modENCODE portals.

Available from: Marc Perry
  • Source
    • "In total, we found 6,208 genomic peaks for GABPa (Figure 1B, Table S15). From a replicate experiment with 16.86 million ChIP-Seq reads, we identified 8,311 genomic peaks of GABPa, which were used for calculation of the overlap of regions among replicates and for calculation of the irreproducible discovery rate (IDR) as described before (Landt, et al. 2012). The 6,208 peaks from the first replicate were mapped to all UCSC known transcripts that start within ±5 kb of the peak (hg19). "
    ABSTRACT: A substantial fraction of phenotypic differences between closely related species are likely caused by differences in gene regulation. While this has already been postulated over 30 years ago, only few examples of evolutionary changes in gene regulation have been verified. Here we identified and investigated binding sites of the transcription factor GABPa aiming to discover cis-regulatory adaptations on the human lineage. By performing ChIP-Seq experiments in a human cell line, we found 11,619 putative GABPa binding sites. Through sequence comparisons of the human GABPa binding regions with orthologous sequences from 34 mammals, we identified substitutions that have resulted in 224 putative human-specific GABPa binding sites. To experimentally assess the transcriptional impact of those substitutions, we selected four promoters for promoter-reporter gene assays using human and African green monkey cells. We compared the activities of wild-type promoters to mutated forms, where we have introduced one or more substitutions to mimic the ancestral state devoid of the GABPa consensus binding sequence. Similarly, we introduced the human-specific substitutions into chimpanzee and macaque promoter backgrounds. Our results demonstrate that the identified substitutions are functional, both in human and non-human promoters. In addition, we performed GABPa knock-down experiments and found 1,215 genes as strong candidates for primary targets. Further analyses of our datasets link GABPa to cognitive disorders, diabetes, KRAB zinc finger (KRAB-ZNF), and human-specific genes. Thus, we propose that differences in GABPa binding sites played important roles in the evolution of human-specific phenotypes.
    Full-text · Article · Jan 2016 · Molecular Biology and Evolution
  • Source
    • "ChIP assay was performed in duplicate using 50 μg of chromatin as previously described and according to ENCODE's guideline (Kelly et al. 2010; Landt et al. 2012). The following antibodies were used: H2A.Z (Abcam, ab4174); H3K4me3 (Active Motif, 39160); H3K4me1 (Active Motif, 39298); H3K27ac (Active Motif, 39297); and H3K27me3 (Active Motif, 39155). "
    ABSTRACT: The holistic role of DNA methylation in the organization of the cancer epigenome is not well understood. Here we perform a comprehensive, high-resolution analysis of chromatin structure to compare the landscapes of HCT116 colon cancer cells and a DNA methylation-deficient derivative. The NOMe-seq accessibility assay unexpectedly revealed symmetrical and transcription-independent nucleosomal phasing across active, poised, and inactive genomic elements. DNA methylation abolished this phasing primarily at enhancers and CpG island (CGI) promoters, with little effect on insulators and non-CGI promoters. Abolishment of DNA methylation led to the context-specific reestablishment of the poised and active states of normal colon cells, which were marked in methylation-deficient cells by distinct H3K27 modifications and the presence of either well-phased nucleosomes or nucleosome-depleted regions, respectively. At higher-order genomic scales, we found that long, H3K9me3-marked domains had lower accessibility, consistent with a more compact chromatin structure. Taken together, our results demonstrate the nuanced and context-dependent role of DNA methylation in the functional, multiscale organization of cancer epigenomes. © 2015 Lay et al.; Published by Cold Spring Harbor Laboratory Press.
    Full-text · Article · Mar 2015 · Genome Research
  • Source
    • "Interactions amongst TFs and TFBSs regulate the transcription of genes differentially on various developmental stages and tissue types [54]. Chromatin Immunoprecipitation followed by high-throughput DNA Sequencing (ChIP-Seq) has been widely used to detect TF-DNA interactions [55]. It can locate DNA regions bound by a certain TF. "
    ABSTRACT: DNA, RNA and protein are three major kinds of biological macromolecules with up to billions of basic elements in such biological organisms as human or mouse. They function at the molecular, cellular and organismal levels individually and interactively. Traditional assays on such macromolecules are largely experimentally based, which are usually time consuming and laborious. In the past few years, high-throughput technologies, such as microarray and next generation sequencing, were developed. Consequently, large genomic datasets are being generated and computational tools for analyzing these data are in urgent demand. This paper reviews several state-of-the-art high-throughput methodologies, representative projects, available databases and data analytic tools at different molecular levels. Finally, challenges and perspectives in processing biomedical big data are discussed.
    Full-text · Article · Feb 2015