An expansive human regulatory lexicon encoded in transcription factor footprints

Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
Nature (Impact Factor: 41.46). 09/2012; 489(7414):83-90. DOI: 10.1038/nature11212
Source: PubMed

ABSTRACT Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNase I, leaving nucleotide-resolution footprints. Using genomic DNase I footprinting across 41 diverse cell and tissue types, we detected 45 million transcription factor occupancy events within regulatory regions, representing differential binding to 8.4 million distinct short sequence elements. Here we show that this small genomic sequence compartment, roughly twice the size of the exome, encodes an expansive repertoire of conserved recognition sequences for DNA-binding proteins that nearly doubles the size of the human cis-regulatory lexicon. We find that genetic variants affecting allelic chromatin states are concentrated in footprints, and that these elements are preferentially sheltered from DNA methylation. High-resolution DNase I cleavage patterns mirror nucleotide-level evolutionary conservation and track the crystallographic topography of protein-DNA interfaces, indicating that transcription factor structure has been evolutionarily imprinted on the human genome sequence. We identify a stereotyped 50-base-pair footprint that precisely defines the site of transcript origination within thousands of human promoters. Finally, we describe a large collection of novel regulatory factor recognition motifs that are highly conserved in both sequence and function, and exhibit cell-selective occupancy patterns that closely parallel major regulators of development, differentiation and pluripotency.

    • "Notably, PE-QMS provides a DNA-context-dependent method of identifying transcriptional regulatory complexes that requires no a priori knowledge of transcription factor identity (Ranish et al, 2003, 2004). Moreover, it can identify cofactor and combinatorial interactions, which are not readily achieved using epigenomic and chromatin accessibility information (Ramsey et al, 2010; Neph et al, 2012; Sherwood et al, 2014). A unique aspect of our PE-QMS design was assessing not only the effect of LXR ligand stimulation on gene regulatory complex composition, but also the subset of that response which was dependent on LXR promoter binding. "
    ABSTRACT: LXR-cofactor complexes activate the gene expression program responsible for cholesterol efflux in macrophages. Inflammation antagonizes this program, resulting in foam cell formation and atherosclerosis; however, the molecular mechanisms underlying this antagonism remain to be fully elucidated. We use promoter enrichment-quantitative mass spectrometry (PE-QMS) to characterize the composition of gene regulatory complexes assembled at the promoter of the lipid transporter Abca1 following downregulation of its expression. We identify a subset of proteins that show LXR ligand- and binding-dependent association with the Abca1 promoter and demonstrate they differentially control Abca1 expression. We determine that NCOA5 is linked to inflammatory Toll-like receptor (TLR) signaling and establish that NCOA5 functions as an LXR corepressor to attenuate Abca1 expression. Importantly, TLR3-LXR signal crosstalk promotes recruitment of NCOA5 to the Abca1 promoter together with loss of RNA polymerase II and reduced cholesterol efflux. Together, these data significantly expand our knowledge of regulatory inputs impinging on the Abca1 promoter and indicate a central role for NCOA5 in mediating crosstalk between pro-inflammatory and anti-inflammatory pathways that results in repression of macrophage cholesterol efflux. © 2015 The Authors.
    The EMBO Journal 03/2015; 34(9). DOI:10.15252/embj.201489819 · 10.43 Impact Factor
    • "In particular, experimental results obtained by chromatin immunoprecipitation (ChIP), FAIRE, and DNase I footprinting assays in combination with high-throughput sequencing have unmasked what was previously a hidden landscape of active DNA regions (Rhee and Pugh, 2011; Furey, 2012; Neph et al., 2012). The compendium of ChIP-seq determined DNA-binding for 119 different proteins in 72 cell experiments produced by the Encyclopedia of DNA Elements (ENCODE) consortium alone has revealed that the number of TF binding events greatly exceeds the number of genes in the genome and that over 8% of the genome can be bound by at least one TF (ENCODE Project Consortium, 2012). "
    ABSTRACT: As exome sequencing gives way to genome sequencing, the need to interpret the function of regulatory DNA becomes increasingly important. To test whether evolutionary conservation of cis-regulatory modules (CRMs) gives insight into human gene regulation, we determined transcription factor (TF) binding locations of four liver-essential TFs in liver tissue from human, macaque, mouse, rat, and dog. Approximately two thirds of the TF-bound regions fell into CRMs. Less than half of the human CRMs were found as a CRM in the orthologous region of a second species. Shared CRMs were associated with liver pathways and disease loci identified by genome-wide association studies. Recurrent rare human disease-causing mutations at the promoters of several blood coagulation and lipid metabolism genes were also identified within CRMs shared in multiple species. This suggests that multi-species analyses of experimentally determined combinatorial TF binding will help identify genomic regions critical for tissue-specific gene control.
    eLife Sciences 10/2014; 3. DOI:10.7554/eLife.02626 · 9.32 Impact Factor
    • "Finally, it has been shown that in a typical DNase-seq experiment, the number of footprints saturates only after reaching a very high sequencing depth (>400 million aligned reads) (Neph et al., 2012c). Given this observation, we also evaluate how the number of footprints and the reconstructed networks depend on the read coverage by progressively down-sampling the alignment files. "
    ABSTRACT: DNase I is an enzyme that preferentially cleaves DNA in highly accessible regions. Recently, next-generation sequencing has been applied to DNase I assays (DNase-seq) to obtain genome-wide maps of these accessible chromatin regions. With high-depth sequencing, DNase I cleavage sites can be identified with base-pair resolution, revealing the presence of protected regions (“footprints”) that correspond to molecules bound to the DNA. Integrating footprint positions close to transcription start sites with motif analysis can reveal the presence of regulatory interactions between specific transcription factors (TFs) and genes. However, this inference relies heavily on the accuracy of the footprint call and on the sequencing depth of the DNase-seq experiment. Using ENCODE data, we comprehensively evaluate the performance of two recent footprint callers (Wellington and DNaseR) and one metric (the Footprint Occupancy Score, or FOS), and assess the consequences of different footprint calls on the reconstruction of TF-TF regulatory networks. We rate Wellington as the method of choice among those tested: not only are its predictions the most accurate, but the properties of the inferred networks are also robust to sequencing depth.
    Frontiers in Genetics 09/2014; 5(278). DOI:10.3389/fgene.2014.00278