Hao Wang

University of Washington Seattle, Seattle, WA, USA

Publications (8) · 185.85 Total impact

  • Article: Systematic localization of common disease-associated variation in regulatory DNA.
    ABSTRACT: Genome-wide association studies have identified many noncoding variants associated with common diseases and traits. We show that these variants are concentrated in regulatory DNA marked by deoxyribonuclease I (DNase I) hypersensitive sites (DHSs). Eighty-eight percent of such DHSs are active during fetal development and are enriched in variants associated with gestational exposure-related phenotypes. We identified distant gene targets for hundreds of variant-containing DHSs that may explain phenotype associations. Disease-associated variants systematically perturb transcription factor recognition sequences, frequently alter allelic chromatin states, and form regulatory networks. We also demonstrated tissue-selective enrichment of more weakly disease-associated variants within DHSs and the de novo identification of pathogenic cell types for Crohn's disease, multiple sclerosis, and an electrocardiogram trait, without prior knowledge of physiological mechanisms. Our results suggest pervasive involvement of regulatory DNA variation in common human disease and provide pathogenic insights into diverse disorders.
    Science 09/2012; 337(6099):1190-5. · 31.20 Impact Factor
  • Article: The accessible chromatin landscape of the human genome.
    ABSTRACT: DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. We identify ∼2.9 million DHSs that encompass virtually all known experimentally validated cis-regulatory sequences and expose a vast trove of novel elements, most with highly cell-selective regulation. Annotating these elements using ENCODE data reveals novel relationships between chromatin accessibility, transcription, DNA methylation and regulatory factor occupancy patterns. We connect ∼580,000 distal DHSs with their target promoters, revealing systematic pairing of different classes of distal DHSs and specific promoter types. Patterning of chromatin accessibility at many regulatory regions is organized with dozens to hundreds of co-activated elements, and the transcellular DNase I sensitivity pattern at a given region can predict cell-type-specific functional behaviours. The DHS landscape shows signatures of recent functional evolutionary constraint. However, the DHS compartment in pluripotent and immortalized cells exhibits higher mutation rates than that in highly differentiated cells, exposing an unexpected link between chromatin accessibility, proliferative potential and patterns of human variation.
    Nature 09/2012; 489(7414):75-82. · 36.28 Impact Factor
  • Article: An expansive human regulatory lexicon encoded in transcription factor footprints.
    ABSTRACT: Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNase I, leaving nucleotide-resolution footprints. Using genomic DNase I footprinting across 41 diverse cell and tissue types, we detected 45 million transcription factor occupancy events within regulatory regions, representing differential binding to 8.4 million distinct short sequence elements. Here we show that this small genomic sequence compartment, roughly twice the size of the exome, encodes an expansive repertoire of conserved recognition sequences for DNA-binding proteins that nearly doubles the size of the human cis-regulatory lexicon. We find that genetic variants affecting allelic chromatin states are concentrated in footprints, and that these elements are preferentially sheltered from DNA methylation. High-resolution DNase I cleavage patterns mirror nucleotide-level evolutionary conservation and track the crystallographic topography of protein-DNA interfaces, indicating that transcription factor structure has been evolutionarily imprinted on the human genome sequence. We identify a stereotyped 50-base-pair footprint that precisely defines the site of transcript origination within thousands of human promoters. Finally, we describe a large collection of novel regulatory factor recognition motifs that are highly conserved in both sequence and function, and exhibit cell-selective occupancy patterns that closely parallel major regulators of development, differentiation and pluripotency.
    Nature 09/2012; 489(7414):83-90. · 36.28 Impact Factor
  • Article: Widespread plasticity in CTCF occupancy linked to DNA methylation.
    ABSTRACT: CTCF is a ubiquitously expressed regulator of fundamental genomic processes including transcription, intra- and interchromosomal interactions, and chromatin structure. Because of its critical role in genome function, CTCF binding patterns have long been assumed to be largely invariant across different cellular environments. Here we analyze genome-wide occupancy patterns of CTCF by ChIP-seq in 19 diverse human cell types, including normal primary cells and immortal lines. We observed highly reproducible yet surprisingly plastic genomic binding landscapes, indicative of strong cell-selective regulation of CTCF occupancy. Comparison with massively parallel bisulfite sequencing data indicates that 41% of variable CTCF binding is linked to differential DNA methylation, concentrated at two critical positions within the CTCF recognition sequence. Unexpectedly, CTCF binding patterns were markedly different in normal versus immortal cells, with the latter showing widespread disruption of CTCF binding associated with increased methylation. Strikingly, this disruption is accompanied by up-regulation of CTCF expression, with the result that both normal and immortal cells maintain the same average number of CTCF occupancy sites genome-wide. These results reveal a tight linkage between DNA methylation and the global occupancy patterns of a major sequence-specific regulatory factor.
    Genome Research 09/2012; 22(9):1680-8. · 13.61 Impact Factor
  • Article: Widespread site-dependent buffering of human regulatory polymorphism.
    ABSTRACT: The average individual is expected to harbor thousands of variants within non-coding genomic regions involved in gene regulation. However, it is currently not possible to interpret reliably the functional consequences of genetic variation within any given transcription factor recognition sequence. To address this, we comprehensively analyzed heritable genome-wide binding patterns of a major sequence-specific regulator (CTCF) in relation to genetic variability in binding site sequences across a multi-generational pedigree. We localized and quantified CTCF occupancy by ChIP-seq in 12 related and unrelated individuals spanning three generations, followed by comprehensive targeted resequencing of the entire CTCF-binding landscape across all individuals. We identified hundreds of variants with reproducible quantitative effects on CTCF occupancy (both positive and negative). While these effects paralleled protein-DNA recognition energetics when averaged, they were extensively buffered by striking local context dependencies. In the significant majority of cases buffering was complete, resulting in silent variants spanning every position within the DNA recognition interface irrespective of level of binding energy or evolutionary constraint. The prevalence of complex partial or complete buffering effects severely constrained the ability to predict reliably the impact of variation within any given binding site instance. Surprisingly, 40% of variants that increased CTCF occupancy occurred at positions of human-chimp divergence, challenging the expectation that the vast majority of functional regulatory variants should be deleterious. Our results suggest that, even in the presence of "perfect" genetic information afforded by resequencing and parallel studies in multiple related individuals, genomic site-specific prediction of the consequences of individual variation in regulatory DNA will require systematic coupling with empirical functional genomic measurements.
    PLoS Genetics 03/2012; 8(3):e1002599. · 8.69 Impact Factor
  • Article: An integrated encyclopedia of DNA elements in the human genome.
    Nature 09/2012; 489(7414):57-74. · 36.28 Impact Factor
  • Article: Experimental validation of predicted mammalian erythroid cis-regulatory modules.
    ABSTRACT: Multiple alignments of genome sequences are helpful guides to functional analysis, but predicting cis-regulatory modules (CRMs) accurately from such alignments remains an elusive goal. We predict CRMs for mammalian genes expressed in red blood cells by combining two properties gleaned from aligned, noncoding genome sequences: a positive regulatory potential (RP) score, which detects similarity to patterns in alignments distinctive for regulatory regions, and conservation of a binding site motif for the essential erythroid transcription factor GATA-1. Within eight target loci, we tested 75 noncoding segments by reporter gene assays in transiently transfected human K562 cells and/or after site-directed integration into murine erythroleukemia cells. Segments with a high RP score and a conserved exact match to the binding site consensus are validated at a good rate (50%-100%, with rates increasing at higher RP), whereas segments with lower RP scores or nonconsensus binding motifs tend to be inactive. Active DNA segments were shown to be occupied by GATA-1 protein by chromatin immunoprecipitation, whereas sites predicted to be inactive were not occupied. We verify four previously known erythroid CRMs and identify 28 novel ones. Thus, high RP in combination with another feature of a CRM, such as a conserved transcription factor binding site, is a good predictor of functional CRMs. Genome-wide predictions based on RP and a large set of well-defined transcription factor binding sites are available through servers at http://www.bx.psu.edu/.
    Genome Research 12/2006; 16(12):1480-92. · 13.61 Impact Factor
  • Article: Global regulation of erythroid gene expression by transcription factor GATA-1.
    ABSTRACT: Transcription factor GATA-1 is required for erythropoiesis, yet its full actions are unknown. We performed transcriptome analysis of G1E-ER4 cells, a GATA-1-null erythroblast line that undergoes synchronous erythroid maturation when GATA-1 activity is restored. We interrogated more than 9000 transcripts at 6 time points representing the transition from late burst forming unit-erythroid (BFU-E) to basophilic erythroblast stages. Our findings illuminate several new aspects of GATA-1 function. First, the large number of genes responding quickly to restoration of GATA-1 extends the repertoire of its potential targets. Second, many transcripts were rapidly down-regulated, highlighting the importance of GATA-1 in gene repression. Third, up-regulation of some known GATA-1 targets was delayed, suggesting that auxiliary factors are required. For example, induction of the direct GATA-1 target gene beta major globin was late and, surprisingly, required new protein synthesis. In contrast, the gene encoding Fog1, which cooperates with GATA-1 in beta globin transcription, was rapidly induced independently of protein synthesis. Guided by bioinformatic analysis, we demonstrated that selected regions of the Fog1 gene exhibit enhancer activity and in vivo occupancy by GATA-1. These findings define a regulatory loop for beta globin expression and, more generally, demonstrate how transcriptome analysis can be used to generate testable hypotheses regarding transcriptional networks.
    Blood 12/2004; 104(10):3136-47. · 9.90 Impact Factor