Combinatorial effects of multiple enhancer variants
in linkage disequilibrium dictate levels of gene expression
to confer susceptibility to common traits
Olivia Corradin,1Alina Saiakhova,1Batool Akhtar-Zaidi,1Lois Myeroff,2Joseph Willis,2,3
Richard Cowper-Sal?lari,4Mathieu Lupien,4Sanford Markowitz,1,2,5
and Peter C. Scacheri1,2,6
1Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, Ohio 44122, USA;2Case Comprehensive
Cancer Center, Case Western Reserve University, Cleveland, Ohio 44106, USA;3Department of Pathology, Case Western Reserve
University, Cleveland, Ohio 44122, USA;4The Princess Margaret Cancer Center, University Health Network, Toronto,
Ontario M5G 1L7, Canada;5Department of Medicine, Case Western Reserve University, Cleveland, Ohio 44122, USA
DNA variants (SNPs) that predispose to common traits often localize within noncoding regulatory elements such as
enhancers. Moreover, loci identified by genome-wide association studies (GWAS) often contain multiple SNPs in
linkage disequilibrium (LD), any of which may be causal. Thus, determining the effect of these multiple variant SNPs on
target transcript levels has been a major challenge. Here, we provide evidence that for six common autoimmune dis-
orders (rheumatoid arthritis, Crohn’s disease, celiac disease, multiple sclerosis, lupus, and ulcerative colitis), the GWAS
association arises from multiple polymorphisms in LD that map to clusters of enhancer elements active in the same cell
type. This finding suggests a ‘‘multiple enhancer variant’’ hypothesis for common traits, where several variants in LD
impact multiple enhancers and cooperatively affect gene expression. Using a novel method to delineate enhancer–gene
interactions, we show that multiple enhancer variants within a given locus typically target the same gene. Using
available data from HapMap and B lymphoblasts as a model system, we provide evidence at numerous loci that multiple
enhancer variants cooperatively contribute to altered expression of their gene targets. The effects on target transcript
levels tend to be modest and can be either gain- or loss-of-function. Additionally, the genes associated with multiple
enhancer variants encode proteins that are often functionally related and enriched in common pathways. Overall, the
multiple enhancer variant hypothesis offers a new paradigm by which noncoding variants can confer susceptibility to
[Supplemental material is available for this article.]
Since 2005 more than 1350 genome-wide association studies
(GWAS) have been published, identifying thousands of single
nucleotide polymorphisms (SNPs) associated with more than 600
common traits and diseases. The vast majority of these GWAS
variants (93%–96%) are located outside protein-coding genes, and
only 10%–15% are in linkage disequilibrium (LD) with a protein-
coding variant (Mauranoet al. 2012; Schaub et al. 2012). Thus, it is
hypothesized that causal GWAS variants influence the function of
noncoding gene regulatory elements such as enhancers. Consis-
tent with this hypothesis, The ENCODE Project Consortium and
other groups have localized thousands of GWAS variants to en-
hancer elements identified through epigenomic profiling studies.
Specifically, DNase I hypersensitivity site (DHS) profiling in 349
different human cell types revealed that 77% of GWAS SNPs are
located in open chromatin presumed to contain enhancer ele-
SNPs correlate with enhancer elements marked with H3K4me1,
H3K27ac, and H3K4me3, usually in cell types that are considered
relevant to a given disorder (Ernst et al. 2011; Akhtar-Zaidi et al.
2012; Trynka et al. 2013).
Although a considerable numberof GWASvariantshave been
identified and mapped to enhancer elements, our understanding
of how these GWAS SNPs confer risk to a given trait is incomplete.
The prevailing hypothesis posits that enhancer variants impact
expression of the gene normally regulated by the enhancer.
However, because the gene targets of most enhancers have not yet
been identified, the impact of most enhancer variants on target
transcript levels remains unknown. Even in instances where the
target gene is known, assessing the transcriptional impact of the
risk variant is complicated. Reporter assays can be employed, but
these fail to recapitulate native chromatin context. As an alterna-
tive, one can compare levels of the predicted target gene between
individuals carrying the risk allele and those carrying the nonrisk
allele. However, because the enhancer landscape differs consider-
ably between cell types, identifying the appropriate target tissue to
examine is challenging. Large sample sizes are required to account
for interindividual variability in gene expression. There are often
multiple SNPs in LD with the ‘‘lead’’ SNP at any given GWAS locus,
any of which could be causal. The prevailing assumption is that
? 2014 Corradin et al.
Harbor Laboratory Press for the first six months after the full-issue publication
date (see http://genome.cshlp.org/site/misc/terms.xhtml). After sixmonths, itis
available under a Creative Commons License (Attribution-NonCommercial 3.0
Unported), as described at http://creativecommons.org/licenses/by-nc/3.0/.
This article is distributed exclusively by Cold Spring
Article published online before print. Article, supplemental material, and publi-
cation date are at http://www.genome.org/cgi/doi/10.1101/gr.164079.113.
24:1–13 Published by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/14; www.genome.org
the causal variants are those that perturb a given regulatory ele-
ment located within the locus defined by the GWAS-association
signal, but this has not been thoroughly investigated for all but
a few GWAS loci (Pomerantz et al. 2009; Zhang et al. 2012). It is
likely that the expression of many if not most genes is often under
exquisite control by multiple regulatory elements, but few studies
have considered the possibility that, at a given GWAS locus, mul-
tiple LD SNPs may impact the function of more than one regula-
tory element and coordinately affect gene expression.
Here we provide evidence that for six human autoimmune
traits, the GWAS association often arises from several poly-
morphisms in LD that map to multiple enhancer elements in cis
and active in the same cell type. This finding suggests that, for
many common traits, there is not a single causal variant that un-
derlies the association signal, but rather several distributed among
multiple enhancers that cooperatively influence gene expression.
We call this the ‘‘multiple enhancer variant’’ hypothesis. Using
epigenome, we show that for a given GWAS locus involving mul-
tiple enhancer variants, the associated enhancers always target the
same gene. Using availabledata from HapMap and B lymphoblasts
as a model system, we provide evidence at numerous loci that
multiple enhancer variants cooperatively contribute to altered
expression of their gene targets. Our study provides a novel ex-
planation for how noncoding regulatory variants confer suscepti-
bility to common traits. Our study also serves as a general frame-
work for annotating noncoding GWAS SNPs, pinpointing the
causal variants and their gene targets, and assessing their impact
on target gene expression. Finally, these findings may help explain
the missing heritability of many common traits and complex
Common disease loci often involve multiple enhancer variants
To study the impact of GWAS–enhancer variants on gene expres-
sion, we first set out to identify a specific cell type for which the
enhancer landscape showed a significant correlation with SNPs
associated with a given trait. To do this, SNPs (both lead SNPs and
those in LD) associated with 627 traits were mapped to H3K4me1
sites (putative enhancer elements) identified through ChIP-seq in
13 different cell types (The ENCODE Project Consortium 2007;
Bernstein et al. 2010; Joseph et al. 2010; Ernst et al. 2011; Hindorff
et al. 2012). Among the SNPs that overlapped H3K4me1 sites,
those associated with six traits including rheumatoid arthritis,
Crohn’s disease, systemic lupus, multiple sclerosis, ulcerative co-
litis, and celiac disease—generally considered autoimmune dis-
orders—often mapped within H3K4me1 sites in B lymphoblasts
(GM12878). We performed Variant Set Enrichment Analysis
(Akhtar-Zaidi et al. 2012; Cowper-Sal lari et al. 2012) to test the
significance of the correlations between H3K4me1 sites in B lym-
phoblasts and all SNPs associated with each of the six autoim-
mune-related traits. For comparison, 12 additional cell types were
also analyzed. Highly significant correlations were found between
the risk SNPs associated with all six traits and H3K4me1 sites in
GM12878 cells (Fig. 1A,B; Supplemental Fig. S1). Other cell types
showed correlations that were either insignificant or less signifi-
cant than those observed in the B lymphoblasts. Two-thirds of the
H3K4me1 sites containing SNPs were enriched for H3K27ac and
were hypersensitive to DNase I digestion, and thus contain the
signature features of active enhancer elements (Fig. 1C–E). Addi-
tionally, 94% were located distal (>1 kb) to transcription start sites.
These results establish B lymphoblasts as a relevant model to in-
vestigate the function of enhancer variants associated with six
We next inspected individual loci where autoimmune SNPs
overlapped putative enhancer elements in B lymphoblasts. At
most loci, multiple common SNPs in LD were found to overlap
multiple putative enhancers arranged in cis. We call these regions
‘‘multiple enhancer variant loci’’, which we define as loci where at
least two GWAS SNPs in LD overlap at least two distinct H3K4me1
sites. An example is shown in Figure 1F, where several Crohn’s
disease SNPs in tight LD (LOD > 2 and D9 > 0.99) with one another
are located within multiple putative enhancers distributed across
the ICOSLG locus. These sites are distinguished from ‘‘single en-
hancer variant loci’’, in which one or more common SNPs overlap
only one enhancer at a given locus (Fig. 1G). All six autoimmune
traits showed evidence of multiple enhancer involvement. More-
over, for all six traits, the number of multiple enhancer variant loci
was far greater than the number of single enhancer variant loci
(Fig. 1H). The number of enhancers associated with each GWAS
locus is shown in Figure 1I. Additionally, compared with 67% of
autoimmune-associated enhancer SNP loci, 51% of randomly se-
lected enhancer SNPs not associated with disease were in LD with
another SNP that mapped to a separate putative enhancer (Fig. 1J).
Thus, autoimmune disease-associated loci are inherently biased
toward multiple enhancer involvement over the background rate
at which common SNPs in LD at a given locus map to multiple
enhancers. These findings led us to consider the possibility that
multiple variants mapping to multiple enhancers within a given
GWAS locus collectively contribute to the GWAS signal and confer
genetic predisposition to common disease. We call this the ‘‘mul-
tiple enhancer variant’’ hypothesis.
Delineation of enhancer–gene interactions with PreSTIGE
We next set out to connect enhancer variants with their gene
targets. The epigenome of a given cell type contains ;75,000
H3K4me1 sites, and these putative enhancers can be located
considerably far upstream of or downstream from the gene whose
expression they influence, and can even influence the expression
of multiple target genes (Spilianakis and Flavell 2004; Vernimmen
et al. 2009). The most common method of assigning an enhancer
computational approaches designed to delineate enhancer–gene
interactions (Ernst et al. 2011; Shen et al. 2012; Thurman et al.
2012), but these methods are not publicly available and their false
discovery rates (FDR) are unknown (Supplemental Table S1).
Moreover, when applied to GWAS, these methods link <10% of
noncoding variants with a predicted gene target. Due to these
limitations, we developed our own bioinformatics approach for
delineating enhancer–gene interactions, which we call PreSTIGE
(Predicting Specific Tissue Interactions of Genes and Enhancers).
PreSTIGE integrates available H3K4me1 ChIP-seq and RNA-seq
data sets from a panel of diverse cell types, and then pairs cell type-
Material). Using PreSTIGE, we delineated enhancer–gene inter-
actions in each of the 13 cell types listed in Figure 1A. Enhancer–
gene predictions were made using two different gene–enhancer
interactions for 46% and 28% of all expressed genes at low and
Corradin et al.
rheumatoid arthritis SNPs in putative enhancer elements in GM12878 cells. Boxplots represent the normalized null distribution generated using 1000
corrected for multiple testing. (B) Noncoding SNPs associated with rheumatoid arthritis. Shaded boxes denote instances where an H3K4me1 ChIP-seq
peak detected in the indicated cell types overlaps either the GWAS lead SNP (listed to the right), or a SNP in LD with the lead SNP. The red boxes denote
SNPs that drive the significant association with H3K4me1 sites in colon crypt and GM12878 cells. (C, left) Average H3K4me1 ChIP-seq signal at loci
containing autoimmune disease-associated SNPs (red) and control H3K4me1 sites not associated with disease (black). (Right) H3K4me1 ChIP-seq signals
at autoimmune disease-associated loci. Each row corresponds to an H3K4me1 site containing a SNP associated with any one of the six autoimmune
diseases. (D) Same as C for H3K27ac. The dashed red line corresponds to the threshold of H3K27ac enrichment. (*) P < 0.004, Wilcoxon test (paired
samples). (E) Same as C for DNase I hypersensitivity. The dashed red line corresponds to the threshold of DNase I HS. (F) Example of multiple enhancer
variant locus associated with Crohn’s disease. SNPs in LD with the lead SNP (rs762421) fall within multiple putative enhancer sites (gray boxes) enriched
for each of the six autoimmune traits. For example, for RA, 29 loci show evidence of multiple enhancer involvement, while seven show evidence of single
elements(purple), and other (gray).Piechartsdisplay thenumberof enhancers containing SNPs for eachGWAS locus. (J)Percent ofloci associated with all
six autoimmune diseases showing evidence of multiple enhancer involvement in B lymphoblasts (red), compared with loci not known to be associated
with a disease (gray). (**) P < 0.0001, by Fisher’s exact test.
Multiple enhancer variant loci associated with autoimmune diseases. (A) Variant Set Enrichment (VSE) analysis depicting enrichment of
GM12878-specific H3K4me1 sites outlined in black are predicted to interact with the SOCS1 promoter, and not the RMI2 promoter. Levels of SOCS1 and
RMI2 transcripts quantified by RNA-seq in each cell type. (FPKM) Fragments per kilobase of transcript per million fragments mapped. (B) UCSC Genome
Browser image of putative enhancers lost in CRC (lost VELs) relative to normal colon crypts. The three H3K4me1 sites highlighted in gray are predicted to
control gene, ASAP3. Note that TCEA3 levels are reduced in CRC lines containing lost VELs while ASAP3 is unaffected. (C) UCSC Genome Browser image of
a representative gained enhancer locus (gained VEL). The H3K4me1 sites highlighted in gray are predicted to target SERBP1 in CRC lines V9P and V703.
Gene expression inthe colon cancer cell linesrelativeto thecolon crypt forthe predicted target SERBP1 andnonpredicted controlgene IL12RB2. Notethat
expression. The left side of the heatmap corresponds to the number of lost (top) or gained (bottom) VELs associated with each gene (rows) in each of the
nine CRC cell lines (columns). Dark blue denotes multiple VELs, whereas white indicates no VEL. The right side of the heatmap is ordered identically to the
left side, and illustrates the change in expression (CRC/crypt) of the genes associated with the VELs by PreSTIGE (left) and the nearest gene to the VEL
(right). (E) Approximation of PreSTIGE FDR (mean 6 SEM) based on colon cancer VEL data compared with five commonly used computational methods.
(*) P < 0.003, by paired t-test.
PreSTIGE methodology and FDR. (A) PreSTIGE links cell type-specific enhancers to genes specifically expressed in the same cell type.
high threshold, respectively. We verified accuracy by comparing
PreSTIGE-identified interactions to those identified through mul-
tiple experimental approaches including 3C (Vernimmen et al.
2009), 5C (Sanyal et al. 2012), ChIA-PET (Li et al. 2012), and eQTL
(Stranger et al. 2007; Schadt et al. 2008; Montgomery et al. 2010)
analysis. The enhancer–gene interactions identified by PreSTIGE
were highly enriched among interactions detected by all four
methods (Supplemental Figs. S11–S14).
we took advantage of recent findings from our lab indicating that
colorectal cancer (CRC) cells display variant enhancer loci (VELs),
which are locus-specific losses and gains of the H3K4me1 mark
relative to normal colonic crypts, from which these cancers are
derived (Akhtar-Zaidi et al. 2012). Similar to a knockdown or
knockout experiment, locus-specific loss of the H3K4me1–en-
be expected to lead to reduced expression of the crypt-predicted
target in CRC, assuming the gene has been correctly assigned to its
enhancer. Likewise, we would expect the correctly predicted target
to crypts. Examples are shown in Figure 2B,C. Extending this
validation approach genome-wide, we plotted the levels of all
PreSTIGE-predicted gene targets of lost and gained VELs detected
among nine CRC cell lines. The results show high concordance
between VELs and PreSTIGE-predicted gene targets, with lost VEL
genes showing reduced levels in CRC relative to crypts, and gained
VEL genes showing elevated levels relative to crypts (Fig. 2D, left).
The magnitude of the transcriptional effect was largely quantita-
tive with the number of VELs involved. Genes associated with
a high number of lost VELs were expressed at lower levels than
genes associated with fewer lost VELs, and transcripts linked to
multiple gained VELs were elevated more than genes linked to
fewer gained VELs. Verifying the specificity of the predicted in-
teractions, the correlations between VELs and gene expression
were greatly diminished when the expression of the gene nearest
to the VEL was analyzed (Fig. 2D, right). Additionally, the tran-
scriptional effect was specific to the predicted gene targets of the
VELs and was not observed for nearby genes not predicted to be
regulated by the VEL (Supplemental Fig. S15). Overall, we estimate
the FDR at 23%–38% for low stringency PreSTIGE predictions, and
13%–25% for high stringency predictions. This represents a 1.5- to
4.5-fold improvement over the nearest gene, the nearest expressed
gene, and other commonly used methods for assigning enhancers
to their gene targets (Fig. 2E; Supplemental Fig. S16). To evaluate
our FDR approximation as conservatively as possible we next
compared the PreSTIGE FDR with that of randomly assigning
genes to enhancer elements. Random association of gene targets
results in an FDR of 70%. We then corrected the PreSTIGE FDR
based on the chance that a randomly assigned gene decreases in
expression. This conservative approach approximates the FDR for
PreSTIGE high stringency predictions to be 13%–35% (Supple-
mental Figs. S16, S17). We provide PreSTIGE as an open-source,
freely available tool (for details, see Supplemental Material).
Multiple enhancer variants in LD are often predicted to target
the same gene
We used PreSTIGE to predict the gene targets of the autoimmune-
associated enhancer variants in B lymphoblasts. Of the 207 total
autoimmune-disorder-associated loci examined, PreSTIGE pre-
high stringency. We next divided the 88 loci into those involving
multiple enhancer variants (61%) and those involving single en-
hancer variants (39%). Remarkably, 100% of the multiple en-
hancer variant loci contained at least two enhancers in cis pre-
dicted to target the same gene. As an example, we highlight the set
of Crohn’s disease SNPs distributed among multiple putative en-
hancers at the ICOSLG locus (Fig. 1F). These enhancers were pre-
dicted by PreSTIGE to interact with the ICOSLG gene. As a second
example, we highlight ulcerative colitis-associated LD SNPs dis-
tributed among four enhancers, all predicted to target IRF5 (Fig.
3A). The 88 enhancer-associated loci were dispersed throughout
the genome, with only a small fraction overlapping among the six
different traits (Supplemental Fig. S18).
Multiple enhancer variant loci were significantly enriched for
14 known transcription factor binding site motifs (Supplemental
Table S3; Davydov et al. 2010). Many of these motifs corresponded
to transcription factors with clear roles in regulating immune-
system activity, including the IRF (interferon regulatory factor)
family. Also of interest, 97% of multiple enhancer variant loci in-
volve at least two enhancers that shareat least onecommonmotif,
implying cooperative regulation. As an example, we highlight
rs2301436, a Crohn’s disease-associated locus on chromosome 6.
This locus has three enhancer elements that share the motif for
ELF1, a member of the ETS transcription factor family associated
with autoimmune function (Gallant and Gilkeson 2006). Variants
contained within single and multiple enhancer loci showed simi-
lar degrees of evolutionary conservation (Supplemental Fig. S19;
Davydov et al. 2010).
Multiple enhancer variants impact target gene expression
Given that the SNPs associated with the six autoimmune traits
were significantly enriched among enhancers in a B lymphoblast
cell line, we reasoned that B lymphoblast lines derived from a co-
hort of individuals could be exploited to study the impact of
multiple enhancer variants on target transcript levels. To this end,
(RNA-seq) along with corresponding SNP genotype data from a
cohort of 61 CEU individuals, made available through the In-
ternational HapMap Consortium (The International HapMap Con-
sortium 2007; The International HapMap 3 Consortium 2010).
The 61 individuals were stratified by their genotypes (risk versus
nonrisk) at 42 GWAS loci for which the GWAS-associated SNPs
were in ‘‘perfect LD’’ and for which there was adequate genotypic
diversity among individuals for sufficient statistical power. Of the
42 loci, 14 were single enhancer variant loci, and 28 were multiple
enhancer variant loci. The levels of predicted transcripts at each
locus were then plotted as FPKMs (fragments per kilobase per
million reads) for each individual, stratified by genotype. Tran-
script levels at each locus were somewhat variable among in-
dividuals with the same genotype, reducing power. Nonetheless,
13 out of 28 (46%) multiple enhancer variant loci contained genes
genotype and those with the nonrisk genotype. By comparison,
a difference between risk and nonrisk genotypes was observed for
only 1 out of 14 (7%) single enhancer variant loci. The rate at
which single enhancer variants impacted gene expression was
(nearest nonpredicted gene target) and a control set of randomly
selected expressed genes (>0.3 FPKM) located within 500 kb of the
lead enhancer SNP at each locus (Fig. 3A). Thus, an effect on gene
expression was rarely observed for genes not predicted to be reg-
ulated by PreSTIGE. Approximately 70% of loci associated with
Combinatorial effects of multiple enhancer variants
with ulcerative colitis. Red arrow indicates lead SNP (rs4728142). FPKMs (fragments per kilobase per million reads) of the predicted target transcript IRF5
expression for each gene. (B) Percent of GWAS loci with transcripts that show differential gene expression based on SNP genotype at single (left) and
multiple (right) enhancer variant loci. Enhancer SNPs linked to a gene target using PreSTIGE are shown in purple and red. Controls include the expressed
gene nearest the SNP that is not a PreSTIGE-predicted target (gray) and a randomly selected expressed gene (within 500 kb) that is not a PreSTIGE-
predicted target (black) (Fisher’s exact test). (C) Percent of GWAS loci associated with differential gene expression for single enhancer variant loci (purple)
versus all multiple enhancer variant loci (red) and loci with more than four enhancers with variants (black) (Fisher’s exact test). (D–G) Expression of
loci. Gray Tukey plots display the normal range of expression for each gene. ‘‘rs’’ numbers correspond to the lead SNPs at each GWAS locus (Mann-
Whitney-Wilcoxon test). (*) P < 0.02, (**) P < 0.007, (***) P < 0.0001.
Impact of multiple enhancer variants on gene expression. (A) UCSC Genome Browser image of multiple enhancer variant locus associated
SNPs hitting more than four enhancers contained transcripts
that significantly differed between risk and nonrisk genotypes,
compared with 46% of all multiple enhancer variant loci (two or
more enhancers involved) and 7% of single enhancer variant
loci (Fig. 3B). Results for five genes associated with multiple
enhancer variants, including those found at the IRF5 (ulcerative
colitis) and ICOSLG (Crohn’s disease) exemplar loci, are shown
in Figure 3C–G. In general, multiple enhancer variants conferred
modest effects on target gene expression (<2.5-fold) and the ef-
fect was not always in the same direction. Specifically, of the
loci where gene expression was significantly impacted by the
genotype of the risk SNP, 75% of target transcripts were ele-
vated, and 25% were suppressed.
Multiple enhancer variants cooperatively contribute
to the effect on target transcript levels
within the 61-person HapMap panel to test if more than one en-
hancer SNP within multiple enhancer variant loci is capable of
impacting the levels of the target gene. To do this, we identified
regions of ‘‘imperfect LD.’’ SNPs in these regions of imperfect LD
are reported to be in tight LD (LOD > 2 and D9 > 0.99) among the
CEU population, but upon examination of the actual haplotypes
within the 61-person HapMap panel, there are several individuals
in which not all enhancer variants are inherited together. These
loci provide an opportunity to test if multiple enhancer variants
cooperatively impact gene expression. An overview of the ap-
proach is shown in Figure 4A.
We stratified individuals by their genotype of the risk SNP at
loci containing multiple enhancer variants, and analyzed the
levels of the predicted target transcripts. At sites of imperfect LD,
only 7% (two of 29) of loci contained a predicted gene target that
showed a difference in gene expression between individuals
stratified by the risk genotype (which was similar to the back-
ground rate of nearest and random control genes). By comparison,
significant effects on target transcript levels were observed at 46%
of loci in which multiple enhancer variants were inherited in
4C. Here, the rheumatoid arthritis risk SNP rs706778 is in imper-
fect LD with three additional SNPs (rs3134883, rs3118470, and
rs7090530). Both SNPs rs706778 and rs3134883 map to one en-
hancer; rs3118470 and rs7090530 map to a separate enhancer.
Bothenhancersare predictedtoregulatePFKFB3. PFKFB3levelsare
not significantly different between individuals stratified by the
genotype of any one of the four SNPs (Fig. 4C). However, exami-
nation of the haplotypes reveals multiple alleles within the pop-
ulation, including homozygotes for the common nonrisk allele
CGTC (purple), heterozygotes for the common risk allele TACA
(red), and individuals with haplotypes that vary from the two
common alleles (CGTA and TGTA, black and gray). Upon exclu-
sion of the individuals with the uncommon alleles where the LD
structure is disrupted,arobust difference inPFKFB3 levelsisclearly
apparent between individuals who are homozygous for the non-
risk alleles and those heterozygous for the risk allele (Fig. 4D).
Thus, the effect on gene expression depends on the genetic
makeup of multiple SNPs within the haplotype, not just one en-
hancer SNP. Moreover, the SNPs associated with both enhancers
need to be considered when assessing the impact on target gene
when individuals were stratified by the genotype of the risk SNP.
Moreover, this variability was reduced substantially upon stratifi-
cation by haplotype (Fig. 4D, cf. left to right). We extended this
analysis to all 29 imperfect LD loci, plotting the average standard
the haplotype. When the entire haplotype was considered, a sig-
nificant decrease in variability of transcript levels was observed
(Fig. 4E). This further demonstrates that, for a given locus, SNPs at
more than one enhancer are likely to mediate the transcriptional
effect. Furthermore, odds ratios for risk annotation at loci in-
volving multiple enhancer variants in perfect LD were generally
higher than those in imperfect LD (Fig. 4F).
Gene targets of multiple enhancer variants are highly cell
type-specific and functionally related
We noticed that PreSTIGE identified gene targets of GWAS en-
baseline prediction rate. Specifically, although only 33% of all
GM12878-specific H3K4me1 sites were assigned to a gene, 55% of
GM12878-specific H3K4me1 sites containing an immune-related
GWAS SNP were assigned to a gene (Fig. 5A). To investigate the
basis for this bias, we ‘‘scored’’ all enhancers and genes by their
relative levels of specificity in GM12878 cells compared with 11
other cell types. We then plotted the range of specificity scores for
all enhancers and genes in GM12878 cells, enhancers and genes
predicted by PreSTIGE in GM12878 cells, and PreSTIGE-predicted
enhancers and genes contained within multiple enhancer variant
loci. Enhancers contained within disease-associated multiple
enhancer variant loci were slightly more GM12878-specific than
all PreSTIGE-predicted enhancers (Fig. 5B, cf. red to gray). By
comparison, genes associated with multiple enhancer variant
loci were highly GM12878-specific (Fig. 5C, cf. purple to gray).
Thus, PreSTIGE assigns GWAS enhancer variants to genes at
a high rate because the genes associated with GWAS enhancer
variants are highly cell type-specific. Genes associated with
multiple enhancer variants were enriched for specialized im-
mune functions including ‘‘regulation of immune response,’’
‘‘response to interferon-gamma,’’ and ‘‘IL2 receptor activity’’
(Fig. 5D). Furthermore, genes associated with each trait were
frequently enriched within common pathways (Fig. 5D). Thus,
although multiple enhancer variant loci associated with a given
trait are dispersed throughout the genome, the associated genes
encode proteins that share similar functions and may be com-
ponents of common pathways.
Many common traits show evidence of multiple enhancer
We next tested if traits other than the six autoimmune diseases
showed evidence of multiple enhancer involvement. We down-
loaded the entire NHGRI catalog of GWAS variants (Hindorff et al.
the SNP or any of its LD SNPs mapped within a coding region to
identify 5824 noncoding trait-associated SNPs. These SNPs (lead
SNPs and all those in LD) were intersected with H3K4me1 sites for
which PreSTIGEidentifieda genetarget in anyoneof 12 cell types.
The traits were hierarchically clustered based on the number of
GWAS loci within predicted enhancers for each cell type, and the
results were plotted as a heatmap (Fig. 6A). Interestingly, related
traits clustered within cell types generally considered relevant to
the pathophysiology of the trait. For example, a cluster of liver-
Combinatorial effects of multiple enhancer variants
GWAS SNPis indicative ofthe genotype oftheentire alleleandthe locusincludesonly two haplotypes (red andpurple) andthree possible genotypes (red/
red, red/purple, purple/purple). For loci with ‘‘imperfect LD’’ the lead SNPdoes not predict the genotype of remaining SNPs. Thisresults in more than two
haplotypes and more than three genotypes. (B) Percent of GWAS loci associated with transcripts that show differential gene expression based on SNP
genotype for multiple enhancer variant loci in sites of ‘‘imperfect LD’’ (black) and perfect LD (red), Fisher’s exact test. (C) Expression of predicted gene
target (PFKFB3) of an ‘‘imperfect LD’’ locus which contains the RA-associated SNP rs706778. Individuals are stratified based on the genotype of each LD
SNP that falls within an enhancer. (D) Each individual is color-coded based on his or her haplotype for the rs706778 ‘‘imperfect LD’’ locus (middle). Note
that the expression of the predicted target gene PFKFB3 segregates by haplotype only when the multiple enhancer variants are in tight LD (right) (Mann-
Whitney-Wilcoxon test). (E) Standard deviation of expression of predicted gene targets within multiple enhancer variant loci with ‘‘imperfect LD’’ for
individuals stratified by lead SNP genotype (black) and stratified by haplotype (red) (Mann-Whitney-Wilcoxon test). (F) Odds ratios for multiple enhancer
variant loci showing perfect (red) and imperfect LD (black) (Welch’s t-test). (*) P < 0.04, (**) P < 0.009.
Effect of individual SNPs in multiple enhancer variant loci. (A)Schematicdescribing‘‘imperfectLD’’loci.WhenSNPsareinperfectLD,thelead
related traits including phospholipid levels, triglyercides-blood
pressure, and cholesterol was observed in hepatocellular carci-
noma cells (HepG2) (Fig. 6B). Neural-related traits including cog-
nitive performance, bipolar disorder, and Alzheimer’s disease cor-
related with enhancer–gene predictions made in neural precursor
cells (NPC) (Fig. 6C). We quantified the number of liver-related
traits found in the HepG2 trait cluster versus those in different cell
types. Approximately 63% of the HepG2 cluster was comprised of
liver traits, compared with 5%–10% in other cell types. The traits
found in each cluster are listed in Supplemental Table S4. These
findings are consistent with other studies showing a correlation
between GWAS SNPs and enhancers in cell types generally con-
GWAS loci containing multiple enhancer variants for the traits in
in all 12 cell types showed evidence of multiple enhancer in-
volvement, ranging from 58% of loci associated with B lympho-
blast (GM12878) traits to 25% of loci associated with H1ES and
MCF-7 trait clusters. In total, for all trait-associated enhancer loci,
48% involved a multiple enhancer variant locus in at least one of
the 12 cell lines (Fig. 6D, black bar). Overall, these findings suggest
that multiple enhancer involvement is not limited to the six traits
associated with B lymphoblast enhancers, and that this trend may
be a general feature of GWAS traits.
In this study, we present several lines of evidence in support of
a ‘‘multiple enhancer variant’’ hypothesis for GWAS traits. We
demonstrated that for certain traits, particularly those with an
autoimmune component, multiple SNPs in LD at a given locus
influence multiple enhancers of a given gene. Using available data
from HapMap and B lymphoblasts as a model system, we provide
evidence at numerous loci that multiple enhancer variants con-
tribute to altered expression of the predicted gene targets. The
multiple enhancer variants can confer either gain- or loss-of-
function effects, elevating or reducing transcript levels. The effects
on gene expression tend to be modest, consistent with the action
of enhancers as modulators of gene expression, rather than binary
switches, as previously proposed (Bajpai et al. 2010; Schnetz et al.
2010). The modest effects may become more pronounced in re-
sponse to a given stimulus, as it is now well established that en-
hancer elements can dynamically respond to a variety of cellular
associated with a PreSTIGE prediction, for all GM12878 cell type-specific H3K4me1 sites (gray) and GM12878-specific sites that contain a GWAS SNP that
is associated with the six immune-related disorders (red) (x2, P-value < 0.0001). (B) Cell type specificity (Shannon entropy Q score) of all enhancers in the
12 cell line comparator set (white), GM12878 cell type-specific enhancers (gray), and enhancers containing disease-associated SNPs (red) (Mann-
Whitney-Wilcoxon test, P-value < 5.3 3 10?6). (C) Cell type specificity (Shannon entropy Q-score) of all genes (white), genes associated with a PreSTIGE
prediction in GM12878 (gray), and predicted gene targets of disease-correlated SNPs (purple) (Mann-Whitney-Wilcoxon test, P-value < 5.3 3 10?6). (D)
GREAT results for each of the six diseases. Top five significant results are shown for Pathway Commons, GO biological processes/molecular function, and
MSigDB pathways categories for each trait.
Gene targets of multiple enhancer variants are highly cell type-specific and functionally related. (A) Percent of H3K4me1 sites that are
Combinatorial effects of multiple enhancer variants
cues or environmental changes (Giorgetti et al. 2010; Heinz et al.
2010; Rada-Iglesias et al. 2011; Zentner et al. 2011; Lin et al.
Some aspects of GWAS have been difficult to explain. For
example, GWAS signals tend to be relatively broad, and it is often
causal SNP. The multiple enhancer variant hypothesis provides an
explanation for these puzzling aspects of GWAS studies, since the
presence of several SNPs distributed among multiple enhancers
throughout the locus, rather than a single SNP, often accounts for
the association signal and the impact on target gene expression.
Additionally, we notethat odds ratios atmultiple enhancer variant
of imperfect LD. GWAS typically pool cases and controls with
mixed genotypes, using HapMap data from ethnically related
populations to approximate LD structure. This would likely result
in an underestimate of the actual effect sizes, which in turn could
account for some of the missing heritability commonly seen in
GWAS. Moreover, because the impact on gene expression is typi-
cally conferred by multiple enhancer variants that are in tight LD,
GWA studies utilizing case–control samples with mixed LD struc-
tures may hinder detection of true association signals.
of SNPs that intersect with H3K4me1 sites linked to a gene target with PreSTIGE. Cluster of disease traits that correlate with SNPs present in HepG2 (B)
and NPC (C) predicted enhancers (zoomed image HepG2 and NPC clusters in A). Genes predicted to be targeted by the disease-correlated SNPs are
shown to the right. Columns are ordered as shown in A. (D) Percent of GWAS enhancer loci that involve multiple enhancer variants for each cluster
highlighted in A.
Multiple enhancer variant loci are a common feature of many GWAS traits. (A) Hierarchical clustering of disease traits based on the number
Corradin et al.
One possibility is that SNPs affecting a single enhancer are not
sufficient to impact expression of the target gene. Alternatively,
SNPs at a single enhancer could impact expression, but the degree
of change is not sufficient to incur risk. We currently cannot dis-
tinguish between these possibilities. It is noteworthythat there are
single enhancer, including preaxial polydactyly and Hirchsprung’s
prostate cancer disrupt transcription factor binding sites at a single
enhancer and alter SOX9 transcript levels in prostate cancer cells
(Zhang et al. 2012). Moreover, in our studies of enhancer function
in colon cancer cells, aberrantly expressed genes in colon cancer
enhancers, but there were numerous examples where epigenetic
loss or gain of a single enhancer altered target transcript levels
many GWAS loci involve multiple enhancer variants, there are
likely numerous examples where a given GWAS SNP(s) impacts
a single enhancer and is sufficient to incur a biological effect. How
often single versus multiple variant loci occur remains to be de-
termined, but our data suggest that these examples of single vari-
ant SNPs influencing gene expression are far rarer than instances
in which two or more enhancer variants cooperate to impact ex-
pression and confer risk. DNA-editing approaches using CRISPR/
Cas or TALEN-based technologies could help distinguish between
these possibilities and further refine the identity of the most in-
fluential SNPs within a given locus.
We have also considered the possibility that multiple en-
hancer variants provide some selective advantage during evolu-
tion. Such is the case for traits associated with ‘‘thrifty genes’’such
as those involved in type 2 diabetes mellitus, where one sees a past
selective advantage for what is now the risk allele. Interestingly,
population genome-wide scans for positive selection have re-
producibly yielded genes with autoimmune-related functions
(Barreiro and Quintana-Murci 2010), which may explain why
multiple enhancer involvement was particularly common among
the loci associated with the six autoimmune traits studied here.
The past decade has observed an explosion of activity in the
identification of SNPs associated with common traits through
GWAS. However, the identification of most of the genes that serve
as the molecular basis of the risk etiology, as well as the identifi-
is due in part to an inability to connect a risk allele to a target gene,
and an inability to functionally test the transcriptional impact of
the target gene will fail to show an expression difference between
individuals with the risk variant versus those with the nonrisk
other confounding factors. The results of our study suggest oth-
the appropriate cell type is utilized, and the SNPs in tight LD at
a given locus are considered together, the impact on target gene
expression is clearly evident. These findings should have impor-
tant implications for studies by the ENCODE and GTEx consortia
or other groups seeking to decipher the impactof genetic variation
also suggest that accurate identification of genes regulated by
multiple enhancer variants could reveal common, and potentially
‘‘druggable,’’ pathways frequently altered among individuals with
a given common disease.
PreSTIGE prediction methodology
To connect cell type-specific enhancers to genes we considered
multiple linear domain models. We systematically evaluated the
use of domainmodelsthat rely onthe distancebetween enhancers
and genes, as well as those that utilized CTCF binding sites to set
the number of predictions made while maintaining the lowest
FDR, utilizes 100 kb as a distance boundary in addition to a subset
the domain model was selected and evaluated, see Supplemental
Material. For an interaction to be predicted in a given cell line, the
normalized H3K4me1–enhancer signal had to be high above
background (>10) and both the enhancer and the gene have to be
specific to the cell line. Specificity was determined by calculating
Shannon entropy Q scores. Details on the development and vali-
dation of the PreSTIGE methodology can be found in the Supple-
mental Material and Supplemental Figs. S2–S16.
ChIP-seq data processing
were obtained for the 12 cell lines of the comparator set (see Sup-
plemental TableS2A) andaligned tohg18withBWA (Li and Durbin
2009). Duplicate reads were removed with SAMtools (Li et al.
reads prior to alignment and used for peak calling with MACS
(Zhang et al. 2008). Called peaks were used to generate a list of
potential enhancer sites. All identified ChIP enriched peaks across
the 12 cell lines were then compiled and overlapping peaks were
collapsed resulting in 309,713 regions. The maximum signal was
then retrieved in each region across all 12 cell lines and the results
were tabled. To normalize for read depth and varying enrichment
across ChIP samples, maximum signals were quantile normalized.
Shannon entropy scoring was performed on normalized maxi-
mum signals to quantify cell type specificity for each region.
RNA-seq data processing
the comparator set (Supplemental Fig. S2A). Reads were aligned to
10 multiple alignments. Gene expression score FPKM was de-
termined for all RefSeq genes using Cufflinks (Trapnell et al. 2010).
An FPKM threshold of 0.3 was chosen to balance the false dis-
covery and false negative rates as described by Ramskold et al.
(2009). Genes with FPKMs <0.3 were rounded to zero and then the
results were tabled. The data obtained for neural precursor cells
using TopHat modified for colorspace reads. Given the different
normalized. Shannon entropy scoring was then performed on the
normalized FPKMs to score cell line specificity of gene expression.
Colon cancer VEL data validation
H3K4me1 ChIP-seq and matching input (Supplemental Table S3B)
for the colon crypt was processed as described above. The crypt
peaks were added to the 309,713 peaks of the comparator set and
peaks within 1 bp were consolidated. The maximum signal in all
regions was obtained forall 13 cell lines andthe results were tabled
and quantile normalized. These results were then Shannon en-
tropy scored. Previously published Affymetrix Human Exon 1.0 ST
exon array data for healthy colon crypt and colon cancer cell lines
Combinatorial effects of multiple enhancer variants
were also obtained (Akhtar-Zaidi et al. 2012). The median PLIER
normalized expression score across three biological replicates of
the colon crypt was used in subsequent analysis. To accurately
compare colon crypt array data to the RNA-seq data of the com-
parator set, array expression was quantile normalized with the
RNA-seq data table and then Shannon entropy scoring was used to
quantify specificity of gene expression. Predictions were made as
described above for interactions that occur in the colon crypt and
CRC samples. For validation of the crypt predictions, colon cancer
cell lines were analyzed in pairs. H3K4me1 sites that show differ-
ential enrichment of H3K4me1 between crypt and cancer (i.e., lost
VELs) and those that are defined as unchanged, present in the
crypt and cancer lines, were considered for validation. As the dis-
tributions of expression were different between the six colon
cancer cell lines and the median colon crypt expression, data sets
decrease in gene expression between the cancer cell lines and the
crypt due to differences in distributions.
FDR calculation with CRC VEL data
Fold changes of transcripts in colon cancer versus normal colon
Enhancer–gene predictions were made in the normal colon crypts
and used to determine the gene targets of enhancers lost, i.e., lost
the enhancer–gene pair was considered successfully validated. The
positive prediction rate was determined by calculating the percent
of lost VEL predicted targets that decreased by >1.3-fold. Genes
associated with a lost VEL that failed to show a decrease in gene
FDR was determined by measuring the percent of genes that
did not decrease by >1.3-fold. The null distribution of the FDR
was determined by randomly assigning genes to colon crypt en-
hancers, and the percent of lost VELs that were randomly associ-
ated with a gene that decreased in expression>1.3-fold was used to
calculate the null distribution for the FDR. The PreSTIGE FDR was
then corrected by dividing the FDR by the FDR of randomly as-
sociating gene targets to enhancers.
Annotation of noncoding GWAS variants
We downloaded the entire NHGRI catalog of GWAS variants
(Hindorff et al. [Sept. 18, 2012]). We retrieved all SNPs in LD with
GWAS lead SNPs using LD blocks identified with publicly available
HapMapdataontheCEPHancestry population.SNPsin strongLD
(LOD > 2 and D9 > 0.99) with the lead SNP were utilized. All lead
and LD SNPs were intersected with human coding exons obtained
from UCSC Table Browser. If the lead SNP or any of its LD SNPs
intersected with the coding sequences, that lead SNP (and its LD
SNPs) was removed from the analysis. All subsequent analyses
utilized the identified noncoding GWAS SNPs.
Variant Set Enrichment Analysis
To test for enrichment of immune-related disorders in B-cell en-
hancers we used Variant Set Enrichment Analysis. SNPs associated
with one of the six disorders (rheumatoid arthritis, Crohn’s dis-
disease) were intersected with the PreSTIGE-predicted enhancers
for all 12 cell lines of the comparator set as well as the colon crypt.
To determine if enrichment of SNPs in a given cell line is statisti-
cally significant, we generated null distributions by randomly
Random SNP sets were matched to disease-associated SNPs by size
so that SNPs in the random set contained the same number of LD
SNPs as the disease-associated set. Enrichment in PreSTIGE-predicted
enhancers of disease-associated SNP and 1000 random size-matched
sets were compared in order to obtain the significance of the en-
richment (Akhtar-Zaidi et al. 2012; Cowper-Sal lari et al. 2012).
Impact of enhancer variant loci on gene expression
To determine the effect of the risk variant on the expression of the
predicted gene target we obtained RNA-seq gene expression data
from 61 CEU individuals (Montgomery et al. 2010) as well as the
corresponding genotypes from HapMap (International HapMap
Consortium 2007; The International HapMap 3 Consortium 2010).
We identified individuals who were homozygous for the nonrisk
allele, heterozygous for the risk allele, and homozygous for the risk
allele for the GWAS loci associated with the six immune-related
diseases. If fewer than three individuals were homozygous for the
nonrisk allele, or fewer than three individuals inherited the risk
allele, then this SNP was excluded from the analysis. We compared
the gene expression of individuals who carried the risk allele (ho-
mozygous or heterozygous) with those who were homozygous for
the nonrisk variant. For all analyses determining the impact of
Wilcoxon test (P-value < 0.05) was used to determine whether the
effect was significant.
Enrichment of disease-associated SNPs for multiple enhancer
We determined the percentage of GWAS enhancer variants with LD
SNPs that mapped to an additional enhancer predicted to target the
same gene. LD SNPs were retrieved as described above. As a control
we identified CEU SNPs that fall within GM12878 putative en-
hancers associated with PreSTIGE predictions (at high stringency)
and filtered out all SNPs that have previously been associated with
any disease. We then determined the percent of these control SNPs
that are associated with multiple enhancer variants in LD. We ran-
domly selected a matched number of control loci 100 times. The
comparing the proportion of genes that are significantly impacted
by SNP genotype were compared using Fisher’s exact test.
The PreSTIGE methodology is available for academic use at
prestige.case.edu through a Galaxy interface (Giardine et al. 2005;
Blankenberg et al. 2010; Goecks et al. 2010). Users can generate
predictions for any cell type of interest for which H3K4me1 ChIP-
seq and RNA-seq data are available. A database of the predictions
for the 13 cell lines listed in Figure 1A is also available at genetics.
We thank Anthony Wynshaw-Boris, Paul Tesar, Tom Laframboise,
and Stephanie Balow for helpful comments and discussion.
This work was supported by NIH grants R01CA160356 (P.C.S.),
5T32GM008056-29 (O.C.), and 5P50CA150964 (S.M., J.W.).
Akhtar-Zaidi B, Cowper-Sal-lari R, Corradin O, Saiakhova A, Bartels CF,
Balasubramanian D, Myeroff L, Lutterbaugh J, Jarrar A, Kalady MF, et al.
Corradin et al.
12 Genome Research
2012. Epigenomic enhancer profiling defines a signature of colon
cancer. Science 336: 736–739.
Bajpai R, Chen DA, Rada-Iglesias A, Zhang J, Xiong Y, Helms J, Chang CP,
Zhao Y, Swigut T, Wysocka J. 2010. CHD7 cooperates with PBAF to
control multipotent neural crest formation. Nature 463: 958–962.
Barreiro LB, Quintana-Murci L. 2010. From evolutionary genetics to human
immunology: How selection shapes host defence genes. Nat Rev Genet
Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A,
Meissner A, Kellis M, Marra MA, Beaudet AL, Ecker JR, et al. 2010. The
NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol 28:
Blankenberg D, Von Kuster G, Coraor N, Ananda G, Lazarus R, Mangan M,
Nekrutenko A, Taylor J. 2010 Galaxy: A web-based genome analysis tool
for experimentalists. Curr Protoc Mol Biol 89: 19.10.1–19.10.21.
Cowper-Sal lari R, Zhang X, Wright JB, Bailey SD, Cole MD, Eeckhoute J,
Moore JH, Lupien M. 2012. Breast cancer risk-associated SNPs modulate
Davydov EV, Goode DL, Sirota M, Cooper GM, Sidow A, Batzoglou S. 2010.
Identifying a high fraction of the human genome to be under selective
constraint using GERP++. PLoS Comput Biol 6: e1001025.
The ENCODE Project Consortium. 2007. Identification and analysis of
functional elements in 1% of the human genome by the ENCODE pilot
project. Nature 447: 799–816.
X, Wang L, Issner R, Coyne M, et al. 2011. Mapping and analysis of
chromatin state dynamics in nine human cell types. Nature 473: 43–49.
Gallant S, Gilkeson G. 2006. ETS transcription factors and regulation of
immunity. Arch Immunol Ther Exp (Warsz) 54: 149–163.
Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y,
Blankenberg D, Albert I, Taylor J, et al. 2005. Galaxy: A platform for
interactive large-scale genome analysis. Genome Res 15: 1451–1455.
Giorgetti L, Siggers T, Tiana G, Caprara G, Notarbartolo S, Corona T,
Pasparakis M, Milani P, Bulyk ML, Natoli G. 2010. Noncooperative
interactions between transcription factors and clustered DNA binding
sites enable graded transcriptional responses to environmental inputs.
Mol Cell 37: 418–428.
Goecks J, Nekrutenko A, Taylor J. 2010. Galaxy: A comprehensive approach
for supporting accessible, reproducible, and transparent computational
research in the life sciences. Genome Biol 11: R86.
Singh H, Glass CK. 2010. Simple combinations of lineage-determining
transcription factors prime cis-regulatory elements required for
macrophage and B cell identities. Mol Cell 38: 576–589.
Hindorff LA, MacArthur J, Morales J, Junkins HA, Hall PN, Klemm AK,
Manolio TA. 2012. A Catalog of Published Genome-Wide Association
The International HapMap Consortium. 2007. A second generation human
haplotype map of over 3.1 million SNPs. Nature 449: 851–861.
The International HapMap 3 Consortium. 2010. Integrating common and
rare genetic variation in diverse human populations. Nature 467: 52–58.
Joseph R, Orlov YL, Huss M, Sun W, Kong SL, Ukil L, Pan YF, Li G, Lim M,
Thomsen JS, et al. 2010. Integrative model of genomic factors for
determining binding site selection by estrogen receptor-a. Mol Syst Biol
Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-
Wheeler transform. Bioinformatics 25: 1754–1760.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G,
Abecasis G, Durbin R. 2009. The Sequence Alignment/Map format and
SAMtools. Bioinformatics 25: 2078–2079.
Li G, Ruan X, Auerbach RK, Sandhu KS, Zheng M, Wang P, Poh HM, Goh Y,
Lim J, Zhang J, et al. 2012. Extensive promoter-centered chromatin
interactions provide a topological basis for transcription regulation. Cell
Lin C, Garruss AS, Luo Z, Guo F, Shilatifard A. 2013. The RNA Pol II
elongation factor Ell3 marks enhancers in ES cells and primes future
gene activation. Cell 152: 144–156.
Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H,
Reynolds AP, Sandstrom R, Qu H, Brody J, et al. 2012. Systematic
localization of common disease-associated variation in regulatory DNA.
Science 337: 1190–1195.
J, Guigo R, Dermitzakis ET. 2010. Transcriptome genetics using second
Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E,
Veyrieras JB, Stephens M, Gilad Y, Pritchard JK. 2010. Understanding
mechanisms underlying human gene expression variation with RNA
sequencing. Nature 464: 768–772.
Pomerantz MM, Ahmadiyeh N, Jia L, Herman P, Verzi MP, Doddapaneni H,
BeckwithCA,ChanJA,HillsA,DavisM,etal.2009.The 8q24 cancer risk
variant rs6983267 shows long-range interaction with MYC in colorectal
cancer. Nat Genet 41: 882–884.
Rada-Iglesias A, Bajpai R, Swigut T, Brugmann SA, Flynn RA, Wysocka J.
2011. A unique chromatin signature uncovers early developmental
enhancers in humans. Nature 470: 279–283.
Ramskold D, Wang ET, Burge CB, Sandberg R. 2009. An abundance of
ubiquitously expressed genes revealed by tissue transcriptome sequence
data. PLoS Comput Biol 5: e1000598.
Sanyal A, Lajoie BR, Jain G, Dekker J. 2012. The long-range interaction
landscape of gene promoters. Nature 489: 109–113.
Schadt EE, Molony C, Chudin E, Hao K, Yang X, Lum PY, Kasarskis A, Zhang
B, Wang S,Suver C,et al. 2008. Mapping thegenetic architectureof gene
expression in human liver. PLoS Biol 6: 1020–1032.
Schaub MA, Boyle AP, Kundaje A, Batzoglou S, Snyder M. 2012. Linking
disease associations with regulatory information in the human genome.
Genome Res 22: 1748–1759.
Schnetz MP, Handoko L, Akhtar-Zaidi B, Bartels CF, Pereira CF, Fisher AG,
Adams DJ, Flicek P, Crawford GE, Laframboise T, et al. 2010. CHD7
targets active gene enhancer elements to modulate ES cell-specific gene
expression. PLoS Genet 6: e1001023.
Shen Y, Yue F, McCleary DF, Ye Z, Edsall L, Kuan S, Wagner U, Dixon J, Lee L,
mouse genome. Nature 488: 116–120.
Spilianakis CG, Flavell RA. 2004. Long-range intrachromosomal
interactions in the T helper type 2 cytokine locus. Nat Immunol 5: 1017–
Stranger BE, Nica AC, Forrest MS, Dimas A, Bird CP, Beazley C, Ingle CE,
Dunning M, Flicek P, Koller D, et al. 2007. Population genomics of
human gene expression. Nat Genet 39: 1217–1224.
Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E,
Sheffield NC, Stergachis AB, Wang H, Vernot B, et al. 2012. The
accessible chromatin landscape of the human genome. Nature 489:
Trapnell C, Pachter L, Salzberg SL. 2009. TopHat: Discovering splice
junctions with RNA-Seq. Bioinformatics 25: 1105–1111.
Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ,
Salzberg SL, Wold BJ, Pachter L. 2010. Transcript assembly and
quantification by RNA-Seq reveals unannotated transcripts and
isoform switching during cell differentiation. Nat Biotechnol 28: 511–
Trynka G, Sandor C, Han B, Xu H, Stranger BE, Liu XS, Raychaudhuri S.
2013. Chromatin marks identify critical cell types for fine mapping
complex trait variants. Nat Genet 45: 124–130.
Vernimmen D, Marques-Kranc F, Sharpe JA, Sloane-Stanley JA, Wood WG,
Wallace HAC, Smith AJH, Higgs DR. 2009. Chromosome looping at the
human a-globin locus is mediated via the major upstream regulatory
element (HS ?40). Blood 114: 4253–4260.
Visel A, Rubin EM, Pennacchio LA. 2009. Genomic views of distant-acting
enhancers. Nature 461: 199–205.
Zentner GE, Tesar PJ, Scacheri PC. 2011. Epigenetic signatures distinguish
multiple classes of enhancers with distinct cellular functions. Genome
Res 21: 1273–1283.
Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum
C, Myers RM, Brown M, Li W, et al. 2008. Model-based analysis of ChIP-
Seq (MACS). Genome Biol 9: R137.
Zhang X, Cowper-Sal lari R, Bailey SD, Moore JH, Lupien M. 2012.
Integrative functional genomics identifies an enhancer looping to the
SOX9 gene disrupted by the 17q24.3 prostate cancer risk locus. Genome
Res 22: 1437–1446.
Received July 23, 2013; accepted in revised form November 5, 2013.
Combinatorial effects of multiple enhancer variants