ARTICLE OPEN
doi:10.1038/nature13985
Principles of regulatory information conservation between mouse and human
Yong Cheng¹*, Zhihai Ma¹*, Bong-Hyun Kim², Weisheng Wu³,⁴, Philip Cayting¹, Alan P. Boyle¹, Vasavi Sundaram⁵, Xiaoyun Xing⁵, Nergiz Dogan³, Jingjing Li¹, Ghia Euskirchen¹, Shin Lin¹,⁶, Yiing Lin¹,⁷, Axel Visel⁸,⁹,¹⁰, Trupti Kawli¹, Xinqiong Yang¹, Dorrelyn Patacsil¹, Cheryl A. Keller³, Belinda Giardine³, The Mouse ENCODE Consortium†, Anshul Kundaje¹, Ting Wang⁵, Len A. Pennacchio⁸,⁹, Zhiping Weng², Ross C. Hardison³§ & Michael P. Snyder¹§
To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34
orthologous transcription factors (TFs) in human–mouse erythroid progenitor, lymphoblast and embryonic stem-cell
lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and
co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the
mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and
DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous
DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is
more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to
be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites
with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.
Determining the similarities and differences between mouse and human regulatory networks will not only improve our understanding of the evolution of regulatory mechanisms, but also help to interpret biomedical insights derived from research performed on mouse models. Recent genome-wide binding studies of eight TFs in several species uncovered many regulatory networks that have been highly rewired since the divergence of ancestors to mouse and human¹⁻⁴, consistent with early studies in other species⁵. These results contrast sharply with other data showing that conservation of genomic DNA sequences can be a useful guide to discovery of regulatory regions⁶, and that the regulatory landscape can be highly conserved among more distant species⁷. Considering the large numbers of known TFs and their functional diversity, comprehensive studies on a broader range of TFs are needed to resolve these apparent discrepancies. Furthermore, our knowledge of the functional consequences of either divergence or conservation of TF occupancy remains limited.
The mouse–human orthologous occupancy profiles
To examine conservation of TF binding regions both between species and across different cell types, we generated and analysed a large data set of genome-wide binding profiles for 34 TFs in mouse and human. A diverse panel of TFs was chosen, including those that bind DNA through specific consensus sequences, comprise part of the general transcriptional machinery such as RNA polymerase 2 (POL2), and modify or remodel chromatin (Extended Data Fig. 1a and Supplementary Information). For simplicity, we refer to the entire collection as TFs, even though some are general factors. We focused on occupancy by 32 TFs in cell line models for erythroid progenitors (mouse erythroleukaemia MEL and human leukaemia K562 cells) and lymphoblasts (mouse lymphoma CH12 and human B lymphoblastoid GM12878 cells), and we also showed that the results are similar to those obtained in mouse and human embryonic stem cells (Extended Data Fig. 8). Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) assays were conducted using replicate experiments and in accordance with ENCODE standards⁸. A total of 120 data sets were generated and analysed.
Conserved and non-conserved features
These genome-wide binding data for a large and diverse set of TFs revealed both conserved and non-conserved features of TF occupancy between mouse and human. First, although most TFs can reside at both promoters and distal sites, each shows a pronounced preference (Fig. 1a and Extended Data Fig. 2a, b). The preference is strongly conserved between mouse and human (R = 0.8; Extended Data Fig. 2c). The one exception is ETS1. Even though the primary motif in ETS1 is conserved between mouse and human (Fig. 1b), it preferentially binds proximal to promoters in human but not in mouse. ETS1 is responsible for the mouse-specific expression of the T-cell marker Thy-1 in the thymus⁹, and we propose that this marked difference in its binding location may contribute to immune system differences between mouse and human¹⁰. Second, although the primary motifs of most sequence-specific TFs are conserved between mouse and human, the secondary motifs (for example, motifs of associated factors; see Supplementary Information) tend to be lineage-specific (Fig. 1b and Extended Data Fig. 2d), indicating a change in co-associated partners.
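The conservation of promoter versus distal preference quoted above (R = 0.8) is a correlation taken across TFs. As an illustration only, the sketch below computes a Pearson correlation between per-TF promoter-proximal fractions in the two species. The TF labels and fractions are hypothetical placeholders, not values from this study, and the choice of the Pearson coefficient is an assumption, because the text does not state which correlation statistic was used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical promoter-proximal fractions per TF: (mouse, human).
promoter_fraction = {
    "TF_A": (0.82, 0.78),
    "TF_B": (0.15, 0.22),
    "TF_C": (0.55, 0.49),
    "TF_D": (0.30, 0.41),
}
mouse = [m for m, _ in promoter_fraction.values()]
human = [h for _, h in promoter_fraction.values()]
r = pearson_r(mouse, human)
print(f"R = {r:.2f}")
```

A TF such as ETS1, which binds near promoters in one species but not the other, would appear as a point far from the diagonal in such a comparison.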
The preferred chromatin states, defined by histone modifications,
for occupied sequences (OSs) of orthologous TFs are also conserved
*These authors contributed equally to this work.
†Lists of participants and their affiliations appear in the Supplementary Information.
§These authors jointly supervised this work.
¹Department of Genetics, Stanford University, Stanford, California 94305, USA.
²Program in Bioinformatics and Integrative Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA.
³Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA.
⁴BRCF Bioinformatics Core, University of Michigan, Ann Arbor, Michigan 48105, USA.
⁵Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St Louis, Missouri 63108, USA.
⁶Division of Cardiovascular Medicine, Stanford University, Stanford, California 94304, USA.
⁷Department of Surgery, Washington University School of Medicine, St Louis, Missouri 63110, USA.
⁸Lawrence Berkeley National Laboratory, Genomics Division, Berkeley, California 94701, USA.
⁹Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA.
¹⁰School of Natural Sciences, University of California, Merced, California 95343, USA.
20 NOVEMBER 2014 | VOL 515 | NATURE | 371
Macmillan Publishers Limited. All rights reserved
©2014
between mouse and human. Using data on five histone modifications, the mouse and human genomes were segmented into eight chromatin states (Fig. 1c and Extended Data Fig. 3a, b). Most TF OSs are located in states characteristic of promoters and enhancers (states 1–4). By contrast, approximately 50% of OSs for the CTCF–cohesin complex (CTCF, RAD21 and SMC3)¹¹,¹² are located in states 5 and 8, which mark quiescent regions with very low signal for all the histone modifications. MAFK also shows preference for quiescent regions. Notably, both the CTCF–cohesin complex and MAFK¹³ can mediate long-range interactions in the genome. The state preference is conserved between mouse and human (Fig. 1c; R = 0.9; Extended Data Fig. 3b), suggesting that the overall functions of the occupied segments are similar in the two species. Indeed, the proportion of enhancers, predicted by a different approach¹⁴,¹⁵, is also conserved (R = 0.7) (Extended Data Fig. 4).
We also examined DNA methylation profiles in TF OSs by using both methylated DNA immunoprecipitation (MeDIP) and DNA digestion with methyl-sensitive restriction enzymes followed by sequencing (MRE-seq)¹⁶. The TF OSs are highly enriched for MRE-seq signals and depleted of MeDIP-seq signals, showing that TF OSs are generally hypomethylated in both species (Fig. 1d and Extended Data Fig. 3c).
TF- and location-specific occupancy conservation
The TF binding regions are enriched for conservation of DNA sequences, showing a strong signal for evolutionary constraint within ±50 base pairs (bp) of ChIP-seq peak summits (Fig. 2a). This result indicates that purifying selection has acted on DNA sequences in many of the TF OSs, but it does not mean that all TF OSs are uniformly under constraint. Approximately 50% of TF OSs do not align between mouse and human¹⁵ because either they are lineage-specific sequences such as transposable elements¹⁷, or they have diverged to an extent that they no longer align.
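The constraint signal around peak summits is an aggregate profile: per-base conservation scores are averaged at each offset from the summit across many peaks. The sketch below shows one simple way to build such a profile. It assumes scores are already held in memory as a dictionary keyed by (chromosome, position); the function name, the data layout and the toy values are illustrative assumptions rather than the pipeline used in the study.

```python
def mean_profile(scores, summits, flank=50):
    """Average per-base score at each offset from -flank to +flank.

    scores  : dict mapping (chrom, position) -> conservation score
    summits : list of (chrom, position) peak summits
    Positions without a score are skipped rather than counted as zero.
    """
    profile = []
    for offset in range(-flank, flank + 1):
        values = [
            scores[(chrom, pos + offset)]
            for chrom, pos in summits
            if (chrom, pos + offset) in scores
        ]
        profile.append(sum(values) / len(values) if values else None)
    return profile


# Toy data: one chromosome with elevated scores within 5 bp of position 100.
toy_scores = {("chr1", i): (1.0 if abs(i - 100) <= 5 else 0.0) for i in range(200)}
toy_profile = mean_profile(toy_scores, [("chr1", 100)], flank=10)
```

Comparing such a profile with one built from randomly placed background positions gives the kind of contrast shown in Fig. 2a.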
We then focused on the subset of TF OSs in which the sequences aligned between mouse and human to determine whether orthologous DNA sequences are also occupied by orthologous TFs (details in Supplementary Methods). Notably, the proportion of TF OSs at which occupancy was conserved varied markedly both among TFs and with the genomic locations (Fig. 2b). Conservation of occupancy is consistently higher in the promoter regions and lower in distal regions for almost all TFs, suggesting that the promoters may be under stronger selection than distal enhancers. Conserved promoter occupancy is observed both for factors that bind near promoters (NRF1 and MAZ) and for factors with a minority of binding sites in promoter regions (for example, MEF2A and TAL1). A notable exception is the CTCF–cohesin complex, which not only shows high levels of occupancy conservation as described previously¹⁸, but also the conservation remains high at proximal, middle and distal regions relative to the transcription start site (TSS) (Fig. 2b). These patterns of variation in conservation of occupancy are robust. One potential confounding factor is the tendency for promoter sequences to be more conserved than other regulatory regions, but adjusting the occupancy conservation by the sequence conservation difference revealed similar trends, that is, the OSs in promoter regions are more conserved than those in other regions (Extended Data Fig. 5a). Similarly, removal of the few TFs for which markedly different numbers of peaks were called between mouse and human did not change the patterns of conservation of occupancy (Extended Data Fig. 5b and Supplementary Information).
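The occupancy-conservation proportions discussed above can be thought of as a simple overlap calculation: each aligned OS from one species is mapped to the other genome and counted as conserved if it overlaps a peak of the orthologous TF there. The following is a minimal sketch under stated assumptions. The coordinate mapping is assumed to have been done beforehand, the distance cut-offs separating TSS, middle and distal regions are hypothetical (the actual definitions are in the Supplementary Methods), and all function names are illustrative.

```python
from collections import defaultdict


def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def region_class(distance_to_tss, proximal=1_000, middle=10_000):
    """Assign a peak to TSS / middle / distal; both cut-offs are hypothetical."""
    d = abs(distance_to_tss)
    if d <= proximal:
        return "TSS"
    if d <= middle:
        return "middle"
    return "distal"


def conservation_by_region(mapped_peaks, other_species_peaks):
    """Fraction of aligned peaks that are also occupied in the other species.

    mapped_peaks        : list of (interval in the other species' coordinates,
                          distance to the nearest TSS in bp)
    other_species_peaks : list of intervals called for the orthologous TF
    """
    counts = defaultdict(lambda: [0, 0])  # region -> [conserved, total]
    for interval, distance in mapped_peaks:
        region = region_class(distance)
        counts[region][1] += 1
        if any(overlaps(interval, peak) for peak in other_species_peaks):
            counts[region][0] += 1
    return {region: conserved / total for region, (conserved, total) in counts.items()}


# Toy example: four mouse peaks already mapped to human coordinates.
toy_mapped = [
    (("chr1", 100, 200), 50),
    (("chr1", 1000, 1100), 400),
    (("chr2", 500, 600), 50_000),
    (("chr2", 5000, 5100), 80_000),
]
toy_human_peaks = [("chr1", 150, 250), ("chr2", 5050, 5150)]
toy_result = conservation_by_region(toy_mapped, toy_human_peaks)
```

The linear scan over peaks is adequate for a toy example; genome-scale peak sets would normally call for sorted intervals or an interval tree.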
Next, we investigated how epigenetic factors influence TF binding at orthologous sites between mouse and human. As expected, the distribution of chromatin states is highly similar for occupancy-conserved TF OSs. For orthologues of TF OSs that can be aligned between the two species but are bound only in one species, a smaller proportion were in enhancer-associated states (states 3 and 4) and a larger proportion were in either repressed (state 7) or quiescent (states 5 and 8) chromatin OSs (Fig. 2c and Extended Data Fig. 6a, b). Thus species-specific loss of TF occupancy at many sites is accompanied by a shift to repressive or
Figure 1 | General features comparison between orthologous TF OSs. a, Each row represents one TF, and each column represents one genomic region. Heat-map colour shows the proportions of TF OSs (combination of different cell lines in the same species) that are located in each genomic region. b, Motif comparison for sequence-specific TFs examined in lymphoblast cells. In the right panel, each row represents one TF. The level of motif conservation is encoded by colour. Detailed results for the USF2 example are in the left panels. Peaks were divided into different bins according to the occupancy signal (higher signal on the left, lower on the right). The proportions of peaks with the motif in each bin (red lines) and the average distances between motif sites and peak summit in each bin (grey lines) are plotted against ranks of peak bins. Red dots indicate the proportion of control regions (±500 bp flanking the USF2 OS) that have the motif. NA, not available. c, TF OS chromatin state preference comparison between MEL and K562 cells. Heat map shows the percentage of TF OSs (rows) that overlap with eight different chromatin states (columns). d, The average signal distributions for MeDIP-seq and MRE-seq in MEL and K562 cells. Five-kilobase flanking regions centred on the TF OS peak summits were divided into 50-bp bins. Signals were aggregated in each bin.
quiescent chromatin. By contrast, the promoter states (states 1 and 2) were largely maintained in the second species even with the loss of TF binding. This result indicates that other TFs may help to maintain conservation of a promoter state in these regions. We also searched for changes in the level of DNA methylation between TF OSs and their orthologous sequences. DNA methylation levels remained low in both species for occupancy-conserved TF OSs (Fig. 2d and Extended Data Fig. 6c), but the DNA methylation levels were significantly increased in the unbound, orthologous sequences. Thus, species-specific loss of TF occupancy is also associated with species-specific increases in DNA methylation.
Occupancy conservation associates with pleiotropy
We proposed that TF OSs with regulatory functions in several tissues would be under increased selective pressure, and thus more likely to be conserved in occupancy. To test this hypothesis, we first examined DNase I hypersensitive sites (DHSs) across 55 mouse tissues and cell lines¹⁵ to measure the chromatin accessibility of each TF OS among different tissues. Because DHSs are a proxy for regulatory element activity¹⁹, TF OS regions accessible in multiple tissues are more likely to function in those tissues. Chromatin accessibility of TF OSs presents wide variation, ranging from tissue-specific to ubiquitous patterns (Fig. 3a). Notably, the TF OSs with more pervasive chromatin accessibility across different tissues show the highest extent of occupancy conservation between mouse and human. The association between tissue usage and occupancy conservation is general; it was observed for most of the TFs examined (Extended Data Fig. 7b, c). This association is also robust to several potential confounding factors. CTCF–cohesin complexes, which are abundant and conserved across different tissue types and species¹⁸,²⁰, might be expected to bias the result; however, we obtained comparable results after removing all the genomic regions occupied by CTCF, RAD21 or SMC3 (Extended Data Fig. 7a). The conservation of promoter regions among several tissues and species¹⁴ might also be expected to bias our analysis, but, after removal of occupancy-conserved TF OSs that lie within 2 kilobases (kb) of TSSs, we still found that the association between tissue usage and TF occupancy conservation holds for distal TF OSs (Extended Data Fig. 7d, e). Furthermore, specifically examining distal TF OSs that overlapped with enhancers predicted by chromatin signals¹⁴ showed that broad tissue usage of presumptive enhancers tracks strongly with conservation of occupancy between mouse and human (Fig. 3b).
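Breadth of tissue usage is summarized in Fig. 3 with a Shannon index computed over per-tissue signals. A minimal sketch of that quantity follows, assuming the standard definition H = −Σ pᵢ ln pᵢ with pᵢ the share of the total signal contributed by tissue i. The use of the natural logarithm and the normalization by total signal are assumptions made here; the exact formulation used in the study is given in its Methods. Under this definition a site active in a single tissue scores 0, and a site with equal signal in all 55 samples scores ln 55, which is about 4.0.

```python
from math import log


def shannon_index(signals):
    """Shannon index H = -sum(p_i * ln p_i) over non-negative per-tissue signals.

    Each signal is converted to a proportion of the total; tissues with zero
    signal contribute nothing. Returns 0.0 when there is no signal at all.
    """
    total = sum(signals)
    if total <= 0:
        return 0.0
    h = 0.0
    for s in signals:
        if s > 0:
            p = s / total
            h -= p * log(p)
    return h


tissue_specific = shannon_index([10.0] + [0.0] * 54)  # one tissue only
ubiquitous = shannon_index([1.0] * 55)                # equal in all 55 samples
```

Binning OSs by this index and computing the conserved fraction within each bin would give a curve of the kind drawn in red in Fig. 3a, b.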
A prediction of our hypothesis is that occupancy-conserved TF OSs will tend to be active in multiple tissues. To test this prediction experimentally, we randomly chose ten occupancy-conserved GATA1 OSs. Even though OSs were chosen on the basis of the occupancy profile of an erythroid-specific regulatory factor, all ten conserved OSs overlapped with DHS peaks and predicted enhancers in many tissues, such as brain (Fig. 3c). When tested for in vivo enhancer activity in transgenic mouse reporter assays at embryonic day 11.5, nine of the ten showed strong, reproducible in vivo enhancer activity, and four were active in non-erythroid tissues such as midbrain and neural tube (Fig. 3c). We expanded our analysis to examine other mouse GATA1 OSs that overlapped with previously tested enhancers deposited in the VISTA Enhancer Browser (http://enhancer.lbl.gov)²¹. Six GATA1 OSs that are specific to mouse generated positive enhancer assays; only one (16%) showed expression in tissues other than blood vessels and heart. By contrast, among 12 additional occupancy-conserved GATA1 OSs with in vivo enhancer activity, 6 (50%) were active in non-erythroid tissues such as midbrain (Supplementary Table 5).
Conservation and divergence of TFs co-association
Because precise gene regulation requires complex interactions among different TFs, we speculated that differences in conservation of TF occupancy may be related, at least in part, to different co-association partners. By calculating the occupancy signals for all the TFs in each TF OS, we found that, in general, occupancy-conserved TF OSs tend to be
Figure 2 | Conservation and divergence of TF OSs. a, Blue and purple lines represent the average phyloP score distribution near (±100 bp) the ChIP-seq peak summit in human and mouse. The grey line represents the distribution for randomly selected background sequences. The x axis is the distance to the peak summit, and the y axis is the average phyloP score. b, The heat map represents the occupancy conservation of TF (rows) OSs in the four cell lines. The colour intensity represents the proportion of TF OSs for which occupancy is conserved between mouse and human in different genomic regions (columns). c, Comparison of the chromatin state change between TF OSs and orthologous sequences. TF OSs that can be aligned between mouse and human are divided into two groups according to the occupancy conservation status (‘occupancy conserved’ versus ‘occupancy not conserved’). Top, the y axis is the proportion of TF OSs and their orthologous sequences in each chromatin state. Bottom, detailed chromatin state change in human orthologues for mouse TF OSs in chromatin states 1 and 3. The pie charts show the distribution of chromatin states in the orthologous sequence in the second species. d, Comparison of the DNA methylation change between TF OSs and orthologous sequences. The y axis gives the normalized DNA methylation signals (MeDIP-seq). TF OSs are divided into two categories according to the occupancy conservation status as in c.
bound by more TFs compared to lineage-specific TF OSs (P < 2.2 × 10⁻¹⁶, two-tailed t-test; Fig. 4a), suggesting that co-association with several TFs increases the level of purifying selection on the occupied sequences. Furthermore, by examining each co-associated TF pair (Fig. 4b), we determined whether the co-associations were more enriched in occupancy-conserved versus species-specific binding sites (Fig. 4c and Extended Data Fig. 9). The relationships fell into three categories. In the first category, co-association of TFs is not linked with occupancy conservation. For example, RAD21 is highly associated with CTCF in MEL cells; however, this co-association occurs with equivalent frequency at occupancy-conserved and species-specific binding sites. In the second category, TF co-association is negatively correlated with occupancy conservation. For example, the co-association of MYC OSs with EP300, an enhancer-associated factor²², is highly enriched in the mouse-specific binding sites. In the last category, TF co-association is positively correlated with occupancy conservation, as exemplified by the co-association of MYC OSs with the co-repressor SIN3A (ref. 23), suggesting that MYC-associated repressors tend to be conserved between mouse and human.
Figure 3 | Conservation of occupancy is associated with chromatin accessibility and enhancer activity in multiple tissues. a, Association between occupancy conservation and chromatin accessibility across several tissues. The density plot represents the frequency that TF OSs are in accessible chromatin in varying numbers of cell types. The x axis is the Shannon index density calculated on the basis of the DHS signals in 55 tissues or cell lines in mouse; high values mean the TF OS is in accessible chromatin in many cell types. The red line shows the fraction of TF OSs at which occupancy is conserved within each bin of Shannon index. b, Association between occupancy conservation and enhancer usage across several tissues. The density plot represents the frequency that TF OSs are in chromatin indicative of enhancer activity (calculated using histone H3 acetyl Lys 27 (H3K27ac) ChIP-seq signals) in varying numbers of cell types. The x axis is the Shannon index calculated based on H3K27ac signal across 23 tissues or cell lines. The red line shows the fraction of TF OSs at which occupancy is conserved within each bin of Shannon index. pre-enhancer, presumptive enhancer. c, Results of transgenic mouse enhancer assays of ten occupancy-conserved GATA1 binding sites. The stained embryo images are highlighted by activity in different tissues: light pink for those showing enhancer activity only in heart and vascular tissues, darker pink for those with activities in other tissues. Right panel shows genes, enhancers predicted by histone modifications, chromatin states (using the software ChromHMM, see Methods), factor occupancy, and DHS signals across different tissues for regions containing two GATA1 OSs.
Figure 4 | TFs co-association and occupancy conservation. a, Density plot shows the distribution of co-associated TF numbers in each TF-binding region. The x axis represents the total number of occupied TFs per region. b, Pair-wise TF co-association in MEL cells. The colour intensity represents the extent of co-association between the TFs denoted in the rows and columns compared to the random expectation (details in Supplementary Methods). Red represents co-association higher than random expectation, blue represents co-association lower than random expectation. c, Conditional TF OSs occupancy conservation in MEL cells. The colour intensity represents for a given TF (columns), whether the co-association with the other TF (rows) is more enriched in lineage-specific binding sites (green) or occupancy-conserved binding sites (red). The colour scale represents the extent (–log P value) of the enrichment significance.
Occupancy conservation and functional SNVs
In a previous study, we assigned putative regulatory potential to genome variations by combining high-throughput experimental data sets, computational predictions, and manual annotation²⁴. Interestingly, even though conservation was not considered during the previous classifications, we found that single nucleotide variants (SNVs) with high regulatory potential were highly enriched in occupancy-conserved TF OSs (Extended Data Table 1a). Moreover, examination of the distribution of genome-wide association study (GWAS) single nucleotide polymorphisms (SNPs) as a function of TF OS occupancy conservation revealed a significant enrichment of GWAS SNPs in occupancy-conserved TF OSs (P < 2.2 × 10⁻¹⁶, Fisher’s exact test; see Supplementary Information) compared with the background distribution of all genetic variation in the SNP database (dbSNP). When examining individual phenotypes, we found that SNPs associated with several phenotypes such as type I diabetes are significantly enriched in occupancy-conserved TF OSs (P = 0.019, Fisher’s exact test; Extended Data Table 1b). However, SNPs associated with other phenotypes, such as pulmonary function, are highly human-specific (P = 0.027, Fisher’s exact test; Extended Data Table 1b). Thus, although GWAS SNPs are generally enriched in occupancy-conserved TF OSs, this enrichment is phenotype-specific.
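The enrichment tests above compare counts in a 2 × 2 table, for example SNPs inside versus outside occupancy-conserved OSs, split by whether they are GWAS hits. The sketch below implements a two-sided Fisher's exact test from the hypergeometric distribution using only the standard library. It is an illustration of the test itself and not the authors' code. The two-sided rule used here (summing all tables no more probable than the observed one) is one common convention and is an assumption, and the counts in the usage example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    The p-value is the total hypergeometric probability of all tables with
    the same margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so that ties with p_obs are included.
    p_value = sum(p for p in probs if p <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical counts: rows = SNP in conserved OS / not in conserved OS,
# columns = GWAS SNP / other SNP.
p = fisher_exact_two_sided(12, 88, 5, 195)
print(f"P = {p:.3g}")
```

Exact binomial coefficients keep the sketch short but become slow for genome-scale counts, where a library routine working in log space would be the practical choice.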
Discussion
Here we report that the conservation of TF occupancy associates with pleiotropic functions. This observation was further validated by in vivo enhancer assays in transgenic mice. To our knowledge, this is the first systematic investigation and validation of the relationship between pleiotropic TF OSs and their occupancy conservation. The pleiotropic functions of a regulatory module subject it to several constraints that preserve the underlying motifs and occupancy patterns. However, the roles in different tissues need not be carried out by the same TF. Paralogous proteins that bind to the same DNA motif (for example, GATA5 or GATA6) could be the active proteins in non-erythroid tissues at the GATA1 OSs with conserved occupancy and pleiotropic functions. This prediction can be tested in future studies.
Cell lines were used in this study because they provide an abundant source of almost identical cells, whereas obtaining primary cells in sufficient number for a study of this scale is problematic for many cell types. One concern is that cell lines across different species may not be entirely analogous. Although this possibility cannot be ruled out, when we compared the expression profile of the four cell lines with those of many other mouse tissues, we found that both MEL and K562, and also CH12 and GM12878, were the most similar pairs (Supplementary Fig. 2a). This close similarity was also seen for genome-wide histone modification signatures (Supplementary Fig. 2b). Thus, we conclude that the K562 and MEL pair of cell lines and the GM12878 and CH12 cell-line pair are sufficiently similar for meaningful cross-species comparisons. Another concern is that the trends observed in cell lines may not be representative of primary cells. Examination of binding of five TFs in mouse and human ES cells confirmed the preferential conservation of binding at promoters and the correlation of occupancy conservation with pleiotropy of DHSs (Extended Data Fig. 8). Thus, the principles gleaned from our examination of many TFs in cell lines are likely to hold for TFs in primary cells.
Online Content Methods, along with any additional Extended Data display items
and Source Data, are available in the online version of the paper; references
unique to these sections appear only in the online paper.
Received 5 February; accepted 21 October 2014.
1. Odom, D. T. et al. Tissue-specific transcriptional regulation has diverged
significantly between human and mouse. Nature Genet. 39, 730–732 (2007).
2. Schmidt, D. et al. Five-vertebrate ChIP-seq reveals the evolutionary dynamics of
transcription factor binding. Science 328, 1036–1040 (2010).
3. Stefflova, K. et al. Cooperativity and rapid evolution of cobound transcription
factors in closely related mammals. Cell 154, 530–540 (2013).
4. Kunarso, G. et al. Transposable elements have rewired the core regulatory network
of human embryonic stem cells. Nature Genet. 42, 631–634 (2010).
5. Borneman, A. R. et al. Divergence of transcription factor binding sites across
related yeast species. Science 317, 815–819 (2007).
6. Pennacchio, L. A. & Rubin, E. M. Genomic strategies to identify mammalian
regulatory sequences. Nature Rev. Genet. 2, 100 (2001).
7. He, Q. et al. High conservation of transcription factor binding and evidence for
combinatorial regulation across six Drosophila species. Nature Genet. 43, 414–420
(2011).
8. Landt, S. G. et al. ChIP-seq guidelines and practices of the ENCODE and
modENCODE consortia. Genome Res. 22, 1813 (2012).
9. Tokugawa, Y., Koyama, M. & Silver, J. A molecular basis for species differences in
Thy-1 expression patterns. Mol. Immunol. 34, 1263 (1997).
10. Mestas, J. & Hughes, C. C. W. Of mice and not men: differences between mouse and
human immunology. J. Immunol. 172, 2731–2738 (2004).
11. Nitzsche, A. et al. RAD21 cooperates with pluripotency transcription factors in the
maintenance of embryonic stem cell identity. PLoS ONE 6, e19470 (2011).
12. Merkenschlager, M. & Odom, D. T. CTCF and cohesin: linking gene regulatory
elements with their targets. Cell 152, 1285–1297 (2013).
13. Sawado, T., Igarashi, K. & Groudine, M. Activation of β-major globin gene
transcription is associated with recruitment of NF-E2 to the β-globin LCR and gene
promoter. Proc. Natl Acad. Sci. USA 98, 10226 (2001).
14. Shen, Y. et al. A map of the cis-regulatory sequences in the mouse genome. Nature
488, 116–120 (2012).
15. Yue, F. et al. A comparative encyclopedia of DNA elements in the mouse genome.
Nature http://dx.doi.org/10.1038/nature13992 (this issue).
16. Xie, M. et al. DNA hypomethylation within specific transposable element families
associates with tissue-specific enhancer landscape. Nature Genet. 45, 836–841
(2013).
17. Sundaram, V., Cheng, Y., Snyder, M. P. & Wang, T. Widespread contribution of
transposable elements to the innovation of gene regulatory networks. Genome Res.
http://dx.doi.org/10.1101/gr.168872.113 (15 October 2014).
18. Schmidt, D. et al. Waves of retrotransposon expansion remodel genome
organization and CTCF binding in multiple mammalian lineages. Cell 148,
335–348 (2012).
19. Gross, D. S. & Garrard, W. T. Nuclease hypersensitive sites in chromatin. Annu. Rev.
Biochem. 57, 159–197 (1988).
20. Heintzman, N. D. et al. Distinct and predictive chromatin signatures of
transcriptional promoters and enhancers in the human genome. Nature Genet. 39,
311–318 (2007).
21. Visel, A., Minovitsky, S., Dubchak, I. & Pennacchio, L. A. VISTA Enhancer Browser–a
database of tissue-specific human enhancers. Nucleic Acids Res. 35, D88–D92
(2007).
22. Visel, A. et al. ChIP-seq accurately predicts tissue-specific activity of enhancers.
Nature 457, 854–858 (2009).
23. Kadamb, R., Mittal, S., Bansal, N., Batra, H. & Saluja, D. Sin3: insight into its
transcription regulatory functions. Eur. J. Cell Biol. 92, 237–246 (2013).
24. Boyle, A. P. et al. Annotation of functional variation in personal genomes using
RegulomeDB. Genome Res. 22, 1790–1797 (2012).
Supplementary Information is available in the online version of the paper.
Acknowledgements This work is funded by grants 3RC2HG005602, 5U54HG006996
and 1U54HG00699 (M.P.S.), and R01DK065806 and RC2HG005573 (R.C.H.). A.V.
and L.A.P. were supported by National Human Genome Research Institute (NHGRI)
grant R01HG003988, U54HG006997 and supplementary funds provided by the
American Recovery and Reinvestment Act. The in vivo enhancer activity assays were
conducted at the E.O. Lawrence Berkeley National Laboratory and performed under
Department of Energy Contract DE-AC02-05CH11231, University of California. We
acknowledge R. M. Myers for providing access to ChIP-seq data in human embryonic
cells. Illumina sequencing services were performed by the Stanford Center for
Genomics and Personalized Medicine.
Author Contributions Y.C., B.-H.K., A.P.B., W.W., J.L. and Z.M. analysed the data. Z.M.,
Y.C., P.C., X.Y., D.P., G.E., T.K., C.A.K. and B.G. prepared and pre-processed ChIP-seq data.
V.S. and X.X. prepared and pre-processed MRE-seq and MEDIP-seq data. A.V. and N.D.
conducted the enhancer assay. Y.C., Z.M., R.C.H., M.P.S., K.A., T.W., L.A.P., Z.W., S.L. and
Y.L. wrote the paper with input from all authors. M.P.S. and R.C.H. coordinated and
supervised the project.
Author Information Reprints and permissions information is available at
www.nature.com/reprints. The authors declare no competing financial interests.
Readers are welcome to comment on the online version of the paper. Correspondence
and requests for materials should be addressed to M.P.S. (mpsnyder@stanford.edu) or
R.C.H. (rch8@psu.edu).
This work is licensed under a Creative Commons
Attribution-NonCommercial-ShareAlike 3.0 Unported licence. The images or other
third party material in this article are included in the article’s Creative Commons
licence, unless indicated otherwise in the credit line; if the material is not included
under the Creative Commons licence, users will need to obtain permission from the
licence holder to reproduce the material. To view a copy of this licence, visit
http://creativecommons.org/licenses/by-nc-sa/3.0
ARTICLE RESEARCH
20 NOVEMBER 2014 | VOL 515 | NATURE | 375
Macmillan Publishers Limited. All rights reserved
©2014
METHODS
ChIP-seq. ChIP for TFs was carried out as previously described (ref. 25).
Cultured cells for biological replicates were grown in separate batches and at
separate times. In brief, 5 × 10⁷ cells were grown to a density of
0.6–0.8 × 10⁶ per ml; cells were then cross-linked in 1% formaldehyde for
10 min at room temperature. Nuclear lysates were sonicated using a Branson 250
Sonifier (power setting 7, 100% duty cycle for 12 × 20 s intervals), such that
the chromatin fragments ranged from 50 to 2,000 bp. Information on control IgG
and TF antibodies used for ChIP-seq experiments is listed in Supplementary
Table 2. Protein–DNA–TF antibody complexes were captured on Protein A/G agarose
beads (Millipore 16-156/16-266) and eluted in 1% SDS TE buffer at 65 °C. After
cross-link reversal and DNA purification, the ChIP DNA sequencing libraries
were prepared as described (ref. 8). Libraries were sequenced on an Illumina
Genome Analyzer II and HiSeq 2000.
Uniform ChIP-seq data processing pipeline. We used a uniform processing
pipeline to identify high-confidence binding peaks in mouse and human.
Read mapping: for human ChIP-seq, mapped reads in the form of BAM files were
downloaded from the ENCODE University of California, Santa Cruz (UCSC) Data
Coordination Center (DCC) (http://encodeproject.org/ENCODE/downloads.html).
For mouse ChIP-seq, reads were mapped by BWA (ref. 26). To standardize the
mapping protocol, we used custom mappability tracks to filter out
multi-mapping reads and retain only uniquely mapping reads (reads that map to
exactly one location in the genome). We also filtered all positional and PCR
duplicates.
Quality control: several quality metrics for all replicate experiments of each
data set were computed. In brief, these metrics measure ChIP enrichment,
signal-to-noise ratios, sequencing depth, library complexity and
reproducibility of peak calling (ref. 8). ChIP-seq experiments that did not
pass the minimum quality control thresholds were discarded and not used in any
analyses.
Peak calling: all ChIP-seq experiments were scored against an appropriate
control designated by the production groups (either input DNA or DNA obtained
from a control immunoprecipitation). We used the SPP peak caller (ref. 27) to
identify and score (rank) potential occupancy sites/peaks. To obtain optimal
thresholds, we used the irreproducible discovery rate (IDR) framework to
determine high-confidence occupancy events by leveraging the reproducibility
and rank consistency of peak identifications across replicate experiments of a
data set. Code and detailed step-by-step instructions to call peaks using the
IDR framework are available at:
https://sites.google.com/site/anshulkundaje/projects/idr.
Blacklist: all peak sets were then screened against specially curated
empirical blacklists for each species (A.P.B. and A.K., manuscript submitted).
In brief, these blacklist regions typically show the following
characteristics: unstructured and extremely high signal in sequenced input DNA
and control data sets as well as open chromatin data sets irrespective of cell
type identity; an extreme ratio of multi-mapping to unique mapping reads from
sequencing experiments; overlap with specific types of repeat regions such as
centromeric, telomeric and satellite repeats that often have few unique
mappable locations interspersed in repeats. The human blacklist is available
at:
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeDacMapabilityConsensusExcludable.bed.gz.
The mouse blacklist can be downloaded from:
http://www.broadinstitute.org/~anshul/projects/mouse/blacklist/mm9-blacklist.bed.gz.
In this study, the blacklist-filtered IDR binding peaks for the same TF in the
same cell line generated by different institutes were merged. All the raw read
files, mapped files and peak files in mouse are deposited at
http://mouseencode.org. The human data can be accessed at
https://www.encodeproject.org. The access ID of each experiment can be found
in Supplementary Table 2.
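The blacklist screening and the merging of peak sets from different institutes
can be illustrated with a minimal sketch. It is not the pipeline used in the
study: the function names, the tuple representation of intervals, the rule that
any overlap with a blacklist region discards a peak, and the choice to merge
peaks that overlap or directly abut are all assumptions made for illustration.

```python
from collections import defaultdict

def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end

def filter_blacklisted(peaks, blacklist):
    """Drop every peak that overlaps a blacklist region on the same chromosome.

    peaks, blacklist: lists of (chrom, start, end) tuples in BED-style coordinates.
    Assumption: any overlap, however small, removes the peak.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in blacklist:
        by_chrom[chrom].append((start, end))
    kept = []
    for chrom, start, end in peaks:
        hits = (overlaps(start, end, b_start, b_end) for b_start, b_end in by_chrom[chrom])
        if not any(hits):
            kept.append((chrom, start, end))
    return kept

def merge_peaks(*peak_sets):
    """Merge overlapping or directly adjacent peaks from several peak sets."""
    merged = []
    for chrom, start, end in sorted(p for peaks in peak_sets for p in peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))  # extend the open interval
        else:
            merged.append((chrom, start, end))
    return merged

# Toy peak sets for one TF in one cell line from two hypothetical laboratories.
lab_a = [("chr1", 100, 200), ("chr1", 500, 600)]
lab_b = [("chr1", 150, 250), ("chr2", 10, 50)]
toy_blacklist = [("chr1", 550, 700)]
clean_a = filter_blacklisted(lab_a, toy_blacklist)
clean_b = filter_blacklisted(lab_b, toy_blacklist)
combined = merge_peaks(clean_a, clean_b)
```

A linear scan over blacklist regions is adequate for a sketch; genome-scale
peak sets would normally be handled with an interval index or a dedicated tool.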
Motif finding. To compare mouse and human regulatory networks, we applied
the de novo motif discovery approach that we developed previously (ref. 28)
and obtained a list of high-confidence sequence motifs using the ChIP-seq data
sets. For each ChIP-seq data set, our computational pipeline reported up to
five significant motifs. Typically, one of the motifs is the canonical motif of
the TF, reflecting its DNA-binding specificity, and we call this the primary
motif. If the TF does not have a DNA-binding domain, we define the strongest
motif as its primary motif. We call the remaining motifs secondary motifs.
When the primary motifs of a pair of orthologous TFs are compared, they are
either ‘conserved’ or ‘not conserved’ on the basis of whether the similarity
between them passes the cut-off (1.0 × 10⁻⁵). Because a TF may have several
secondary motifs, the secondary motifs of two orthologous TFs are ‘partly
conserved’ if a subset, but not all, of the motifs are conserved. When neither
the human TF nor the mouse TF has a secondary motif, we assign the situation
as motif ‘not available’.
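The classification rules above can be written down as a short sketch. Several
details are assumptions rather than statements from the text: the similarity
between two motifs is treated as a P-value-like score for which smaller means
more similar, a value at or below the cut-off counts as passing, and a
secondary motif counts as conserved when it matches at least one secondary
motif of the orthologous TF. The function and motif names are hypothetical.

```python
CUTOFF = 1.0e-5  # similarity cut-off quoted in the text

def primary_status(p_value, cutoff=CUTOFF):
    """Classify a pair of primary motifs (assumed: smaller score = more similar)."""
    return "conserved" if p_value <= cutoff else "not conserved"

def secondary_status(human_motifs, mouse_motifs, p_values, cutoff=CUTOFF):
    """Classify the secondary motifs of two orthologous TFs.

    human_motifs, mouse_motifs: lists of motif names.
    p_values: dict mapping (human_motif, mouse_motif) to a similarity score.
    """
    if not human_motifs and not mouse_motifs:
        return "not available"
    pairs = [(h, m) for h in human_motifs for m in mouse_motifs
             if p_values[(h, m)] <= cutoff]
    matched_human = {h for h, _ in pairs}
    matched_mouse = {m for _, m in pairs}
    n_matched = len(matched_human) + len(matched_mouse)
    n_total = len(human_motifs) + len(mouse_motifs)
    if n_matched == 0:
        return "not conserved"
    if n_matched == n_total:
        return "conserved"
    return "partly conserved"

# Toy scores for hypothetical motifs h1, h2 (human) and m1 (mouse).
toy_scores = {("h1", "m1"): 1e-9, ("h2", "m1"): 0.3}
status = secondary_status(["h1", "h2"], ["m1"], toy_scores)
```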
ChromHMM. ChromHMM (ref. 29) was applied on the ChIP-seq data of five histone
modifications to learn a multivariate HMM model for segmentation of the mapped
genome in each cell type. Specifically, the ChIP-seq mapped reads were first
pooled from replicates for each of the five histone modifications (H3K4me3,
H3K4me1, H3K36me3, H3K27ac and H3K27me3). These mapped reads were first
processed by ChromHMM into binarized data in every 200-bp window over the
entire mapped genome, with ChIP ‘input’ reads as the background control. To
learn the model jointly from mouse and human, a pseudo genome table was first
constructed by concatenating the mouse mm9 and human hg19 tables, then the
model was learned from the binarized data in all four cell lines, giving a
single model with a common set of emission parameters and transition
parameters, which was then used to produce segmentations in all cell types
based on the most likely state assignment of the model. We tried models with up
to 20 states and selected an eight-state model as it appeared most
parsimonious in the sense that all eight states had clearly distinct emission
properties, while the interpretability of distinction between states in models
with additional states was less clear.
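The idea of a concatenated pseudo genome table and of 200-bp binarized windows
can be illustrated as follows. This is only a sketch under stated assumptions:
the assembly-prefixed chromosome names, the dropping of a trailing partial
window, and the fixed count threshold are choices made here for illustration,
whereas ChromHMM derives its presence calls from its own background model, and
the chromosome lengths are toy values rather than real mm9 or hg19 sizes.

```python
BIN_SIZE = 200  # window size quoted in the text

def pseudo_genome(tables):
    """Concatenate per-assembly chromosome-size tables into a single table.

    tables: dict {assembly: {chrom: length}}. Names are prefixed with the
    assembly so that human and mouse chromosomes stay distinct (assumed naming).
    """
    combined = {}
    for assembly, sizes in tables.items():
        for chrom, length in sizes.items():
            combined[f"{assembly}_{chrom}"] = length
    return combined

def n_bins(length, bin_size=BIN_SIZE):
    """Number of complete windows in a chromosome (partial last window dropped: assumption)."""
    return length // bin_size

def binarize(counts, threshold):
    """Turn per-window read counts into 0/1 calls with a fixed threshold (toy rule)."""
    return [1 if count >= threshold else 0 for count in counts]

# Toy chromosome lengths, not real assembly sizes.
toy_tables = {"hg19": {"chr1": 1000}, "mm9": {"chr1": 850}}
genome = pseudo_genome(toy_tables)
calls = binarize([0, 3, 7], threshold=3)
```

Learning one model on the concatenated table is what lets a single set of
emission and transition parameters describe both species.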
MeDIP-seq and MRE-seq. MeDIP-seq and MRE-seq experiments were performed
as previously described (ref. 16). The reads were aligned to hg19 and mm9
using BWA. MRE-seq reads were further normalized for difference in enzyme
efficiency.
Defining different genomic locations. TSSs were defined by the ENCODE
consortium (ref. 15). Promoter regions were defined as 2 kb upstream and
downstream of the TSS. Distal regions were defined as 10 kb away from the TSS.
The rest of the genomic regions were defined as middle regions. The three
genomic locations are mutually exclusive, and the priority during the
definition is promoter, distal and middle. Each TF OS was assigned to one (and
only one) genomic location. If a TF OS overlapped with several regions, the
centre of the OS was used to define which region to assign.
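A minimal sketch of this assignment rule is given below. The function name and
the boundary handling are assumptions: a centre exactly 2 kb from the nearest
TSS is treated as promoter and a centre exactly 10 kb away as distal, which the
text does not specify.

```python
import bisect

PROMOTER_BP = 2_000   # promoter: within 2 kb of a TSS
DISTAL_BP = 10_000    # distal: 10 kb or more from the nearest TSS (boundary assumed)

def assign_region(os_start, os_end, tss_positions):
    """Assign one TF OS to 'promoter', 'distal' or 'middle' using its centre.

    tss_positions: non-empty sorted list of TSS coordinates on the same chromosome.
    """
    centre = (os_start + os_end) // 2
    i = bisect.bisect_left(tss_positions, centre)
    # Only the TSSs immediately left and right of the centre can be the nearest.
    neighbours = tss_positions[max(i - 1, 0):i + 1]
    distance = min(abs(centre - tss) for tss in neighbours)
    if distance <= PROMOTER_BP:      # promoter takes priority
        return "promoter"
    if distance >= DISTAL_BP:
        return "distal"
    return "middle"

toy_tss = [10_000, 50_000]  # toy TSS coordinates
labels = [assign_region(9_000, 9_400, toy_tss),
          assign_region(14_900, 15_100, toy_tss),
          assign_region(29_900, 30_100, toy_tss)]
```

Using the centre of the OS guarantees that each site receives exactly one
label even when the interval itself spans a region boundary.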
TF OSs sequence. phyloP (ref. 30) wiggle tracks were downloaded from the UCSC
browser. Specifically, the hg19 phyloP46way track was used for human and the
mm9 phyloP30way track was used for mouse. The average phyloP scores were
calculated at one-base-pair resolution in 200-bp regions centred on the
summits of TF peaks.
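The averaging step can be sketched as below. The dictionary representation of
per-base scores and the decision to skip bases without a score are assumptions
made for illustration; the text does not state how unscored bases were handled.

```python
def mean_phylop(scores, summit, width=200):
    """Average per-base scores over a window of `width` bp centred on `summit`.

    scores: dict {position: phyloP score} for one chromosome.
    Bases without a score are skipped (assumption). Returns None if the window
    contains no scored base.
    """
    start = summit - width // 2
    window = [scores[pos] for pos in range(start, start + width) if pos in scores]
    return sum(window) / len(window) if window else None

# Toy per-base scores: 1.0 upstream of position 1000 and 3.0 from 1000 onwards.
toy_scores = {pos: 1.0 for pos in range(900, 1000)}
toy_scores.update({pos: 3.0 for pos in range(1000, 1100)})
average = mean_phylop(toy_scores, summit=1000)
```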
Mapping reciprocal orthologous sequences between human and mouse.
Orthologous DNA sequences between human and mouse were mapped by bnMapper
(O. Denas, R. Sandstrom and J. Taylor, manuscript submitted) using reciprocal
chains with default settings (bnMapper.py -f BED12).
RegulomeDB SNV and occupancy conservation. SNPs assigned with pre-calculated
regulatory potentials were downloaded from: http://www.regulomedb.org/downloads.
dbSNP138 was downloaded from the UCSC genome browser. TF OSs were divided into
two exclusive groups: occupancy-conserved and human-specific. The number of
SNPs with high regulatory potential and the number of dbSNPs located in each
group of TF OSs were calculated. Fisher’s exact test was conducted to examine
the enrichment of SNPs with high regulatory potential in each group.
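For readers unfamiliar with the test, a self-contained sketch of a two-sided
Fisher’s exact test on a 2 × 2 table is shown below. The table layout is an
assumption (rows: occupancy-conserved and human-specific TF OSs; columns: SNPs
with high regulatory potential and the remaining dbSNPs), and the counts are
toy values, not data from the study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, total)

# Toy counts only.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

Exact integer binomials keep the sketch simple; for genome-scale counts a
statistics library would be the practical choice.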
GWAS SNPs and occupancy conservation. The GWAS catalogue file was downloaded
from: http://www.genome.gov/admin/gwascatalog.txt. Lead SNPs that overlapped
with exons were removed. For each lead SNP, if either the SNP itself or the
linkage disequilibrium SNPs are located within a given TF OS, it was assigned
to that TF OS. Lead SNPs that can be assigned to several TF OSs were also
removed. Two-sided Fisher’s exact tests were conducted to calculate the
enrichment of conservation in each given phenotype compared with the
distribution of all dbSNPs, and P values were further adjusted by the
Benjamini–Hochberg procedure.
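The Benjamini–Hochberg adjustment mentioned above can be sketched in a few
lines. The function name and the toy P values are illustrative only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min  # enforce monotonic adjusted values
    return adjusted

# Toy P values, one per hypothetical phenotype.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_p)
```

Each raw P value is scaled by the number of tests divided by its rank, and the
running minimum keeps the adjusted values from decreasing as raw values grow.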
25. Kasowski, M. et al. Variation in transcription factor binding among humans.
Science 328, 232–235 (2010).
26. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler
Transform. Bioinformatics 25, 1754–1760 (2009).
27. Kharchenko, P. V., Tolstorukov, M. Y. & Park, P. J. Design and analysis of ChIP-seq
experiments for DNA-binding proteins. Nature Biotechnol. 26, 1351–1359 (2008).
28. Wang, J. et al. Sequence features and chromatin structure around the genomic
regions bound by 119 human transcription factors. Genome Res. 22,
1798–1812 (2012).
29. Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and
characterization. Nature Methods 9, 215–216 (2012).
30. Cooper, G. M. et al. Distribution and intensity of constraint in mammalian
genomic sequence. Genome Res. 15, 901–913 (2005).
Extended Data Figure 1 | TF ChIP-seq data overview and analysis workflow.
a, All TFs in this study are grouped according to species and cell types. TF DNA
binding domains are listed in the second column. The TFs without binding
domains are highlighted in grey. The TFs assayed are cross-marked, whereas
TFs not assayed are depicted in white. b, Flowchart for the analysis pipeline for
inter- and intra-species comparisons.
Extended Data Figure 2 | TF OSs distribution and motifs. a, An illustration
of TF OS distribution relative to TSSs in MEL and K562 cells. Each row
represents one TF, each column represents one genomic region. Heat-map
colour shows the proportions of TF OSs that are located in different genomic
regions. b, TF OS distribution plot as in a, for CH12 and GM12878 cells.
c, Correlation between mouse and human TF OS distribution. Dot plot shows
the correlation of orthologous TF OS distribution in each genomic region. Each
dot represents the proportion of OSs for one TF in one genomic region. The
x axis is the proportion in the mouse genome, and the y axis is the proportion
in the human genome. d, Motif comparison for sequence-specific TFs examined in
erythroid progenitor cells (MEL and K562). Each row represents one TF.
The level of motif conservation is encoded by colour.
Extended Data Figure 3 | TF OS chromatin states and DNA methylation
status preference comparison. a, Emission matrix of ChromHMM trained
by five histone modification markers (H3K4me1, H3K4me3, H3K36me3,
H3K27me3 and H3K27ac). b, Heat map shows the proportion of TF OSs (rows)
that overlap with each chromatin state (columns) generated by ChromHMM
using five different histone markers in CH12 and GM12878 cells. c, The average
signal distributions for MeDIP-seq and MRE-seq in CH12 and GM12878 cells.
The 5-kb flanking regions centred on the TF OS peak summits were
divided into 50-bp bins. Signals were aggregated in each bin.
Extended Data Figure 4 | Proportion of predicted enhancers in the
orthologous TF OSs. Bar graphs show the proportions of TF OSs that
overlapped with the predicted enhancers. a, Results in MEL and K562 cells.
b, Results in CH12 and GM12878 cells. The x axis represents different TFs, the
y axis represents the proportion of TF OSs that overlapped with predicted
enhancers.
Extended Data Figure 5 | Occupancy conservation adjusted by sequence
conservation. a, The heat map represents the adjusted occupancy conservation
of TF (row) OSs in the four cell lines. The colour intensity represents the
proportion of TF OSs that are occupancy-conserved between mouse and
human in different genomic regions (column). To remove the bias introduced
by variation of sequence conservation at different genomic loci, only TF OSs in
which the sequence can be aligned between mouse and human were included in
this analysis. b, The heat map is similar to Fig. 2b. TFs showing remarkable
differences in total binding peak numbers between mouse and human
were excluded.
Extended Data Figure 6 | Comparison of the epigenetic features between TF
OSs and orthologous sequences. a, The y axis represents the proportion of
TF OSs in each chromatin state. TF OSs that can be aligned between mouse
and human are divided into two categories according to the occupancy
conservation status. Each panel represents distribution of TF OSs in one cell
line. b, Each panel represents mouse TF OSs in one chromatin state.
The pie chart in each panel shows the proportions of chromatin states in the
orthologous sequence in human. Panels in the left column represent the
occupancy-conserved TF OSs, and panels in the right column represent the TF
OSs that can be aligned but without occupancy conservation. c, The y axis
represents the normalized DNA methylation signals (MeDIP-seq). TF OSs that
can be aligned between mouse and human are divided into two categories
according to the occupancy conservation status (both sequence and occupancy
are conserved (OCC) and sequence is conserved but occupancy is not
conserved (SCNC)). Each panel represents distribution in one cell line.
Extended Data Figure 7 | Conservation of occupancy is associated with
chromatin accessibility and enhancer activity in several tissues.
a, Association between occupancy conservation and chromatin accessibility
across several tissues. The density plot represents the frequency that TF OSs
(excluding DNA sequences occupied by CTCF, RAD21 and SMC3) are in
accessible chromatin in varying numbers of cell types. The x axis is the Shannon
index calculated based on the DHS signals in 55 mouse tissues or cell lines;
high values mean the TF OS is in accessible chromatin in many cell types.
The red line shows the fraction of TF OSs at which occupancy is conserved
within each bin of the Shannon index. b, c, The association between occupancy
conservation and chromatin accessibility across multiple tissues for each
TF (row) in CH12 and MEL cells. TF OSs are divided into different bins
according to the value of the Shannon index (columns). The colour intensity
represents the proportion of occupancy-conserved TF OSs within each bin.
d, e, Similar distributions to b and c, but only for TF OSs that are located
2 kb away from TSSs.
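As an illustration of how a Shannon index summarizes the breadth of chromatin
accessibility, a minimal sketch follows. The normalization of signals to
fractions of the total and the use of the natural logarithm are assumptions;
the legend does not state the exact formula used.

```python
from math import log

def shannon_index(signals):
    """Shannon index H = -sum(p_i * ln(p_i)), with p_i the share of total signal in tissue i.

    signals: non-negative signal values for one TF OS, one value per tissue or cell line.
    Tissues with zero signal contribute nothing; an all-zero input returns 0.0.
    """
    total = sum(signals)
    if total <= 0:
        return 0.0
    return -sum((s / total) * log(s / total) for s in signals if s > 0)

# Toy signals: one site accessible in a single tissue, one accessible equally in four.
specific = shannon_index([5.0, 0.0, 0.0, 0.0])
broad = shannon_index([2.0, 2.0, 2.0, 2.0])
```

Under these assumptions the index is 0 for a site accessible in one tissue only
and reaches its maximum, ln(n), when signal is spread evenly over n tissues.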
Extended Data Figure 8 | Consistency of observations between
embryonic stem cells and cell lines. a, Genomic distribution of five TF OSs
in embryonic stem cells. b, Occupancy conservation in different genomic
locations between human and mouse embryonic stem cells. c, Occupancy
conservation of TF OSs in embryonic stem cells is associated with function in
many tissues.
Extended Data Figure 9 | Relationship between occupancy conservation
and pairwise TF co-association. a–d, Occupancy conservation and TF
co-association analysis was conducted as described in Fig. 4c for all four
cell lines. The TFs were kept in the same order across the four cell lines for
easy visualization.
Extended Data Table 1 | SNVs with regulatory potential are enriched in occupancy-conserved TF OSs
a, SNVs annotated with high regulatory potential by RegulomeDB are enriched in occupancy-conserved TF OSs.
*Category 1a includes SNVs with the following features: eQTL + TF binding + matched TF motif + matched DNase footprint + DNase peak.
**Category 1b includes SNVs with the following features: eQTL + TF binding + any motif + DNase footprint + DNase peak.
b, GWAS SNPs show significant enrichment in occupancy-conserved TF OSs or human-specific TF OSs (highlighted in grey).
... Previous bulk sequencing assays have revealed general principles concerning the conservation of CREs and tissue-specific gene expression patterns. For example, enhancers exhibit rapid turnover during mammalian evolution 4,5 , and conserved enhancers have lower cell type specificity 6,7 . By contrast, sequence divergent enhancers have a substantial role in establishing tissue and species-specific traits 8,9 . ...
... By contrast, sequence divergent enhancers have a substantial role in establishing tissue and species-specific traits 8,9 . Such divergent enhancers are often mediated by de novo insertion of transposable elements (TEs) carrying clusters of transcription-factor-binding sites 6,10,11 . Notably, the conservation of CREs 12,13 and expression 14 generally decreases as development progresses. ...
... We used stringent criteria to identify whether a gene is differentially expressed between a species pair. To account for multiple comparisons we nominated an FDR of 0.001, which we further lowered to 8.33 × 10 −6 by dividing by the number of pairs of species (6), multiplied by the number of cell types (20). In addition to this FDR threshold, we required our differentially expressed genes to meet a minimum fold change of 2, as well as be expressed in at least 15% of the cells in the upregulated species cell type. ...
Article
Full-text available
Divergence of cis-regulatory elements drives species-specific traits¹, but how this manifests in the evolution of the neocortex at the molecular and cellular level remains unclear. Here we investigated the gene regulatory programs in the primary motor cortex of human, macaque, marmoset and mouse using single-cell multiomics assays, generating gene expression, chromatin accessibility, DNA methylome and chromosomal conformation profiles from a total of over 200,000 cells. From these data, we show evidence that divergence of transcription factor expression corresponds to species-specific epigenome landscapes. We find that conserved and divergent gene regulatory features are reflected in the evolution of the three-dimensional genome. Transposable elements contribute to nearly 80% of the human-specific candidate cis-regulatory elements in cortical cells. Through machine learning, we develop sequence-based predictors of candidate cis-regulatory elements in different species and demonstrate that the genomic regulatory syntax is highly preserved from rodents to primates. Finally, we show that epigenetic conservation combined with sequence similarity helps to uncover functional cis-regulatory elements and enhances our ability to interpret genetic variants contributing to neurological disease and traits.
... To explain why human TINAGL1 gene expression differs from mouse, a better understanding of the differences between the human and mouse genomes has clarified that genes involved in intracellular processes such as RNA processing and chromatin organization (transcription factors binding to promoter regions like the ZNF334 gene) tend to have a similar gene expression pattern between humans and mice (highly conserved), whereas genes involved in extracellular matrix and cellular adhesion (distant regulatory elements like the TINAGL1 gene) and signaling receptors are less conserved (lineage-specific) [48][49][50]. ...
Article
Full-text available
Background The primary goal of this work is to identify biomarkers associated with lung squamous cell carcinoma and assess their potential for early detection of lymph node metastasis. Methods This study investigated gene expression in lymph node metastasis of lung squamous cell carcinoma using data from the Cancer Genome Atlas and R software. Protein-protein interaction networks, hub genes, and enriched pathways were analyzed. ZNF334 and TINAGL1, two less explored genes, were further examined through in vitro, ex vivo, and in vivo experiments to validate the findings from bioinformatics analyses. The role of ZNF334 and TINAGL1 in senescence induction was assessed after H2O2 and UV induced senescence phenotype determined using β-galactosidase activity and cell cycle status assay. Results We identified a total of 611 up- and 339 down-regulated lung squamous cell carcinoma lymph node metastasis-associated genes (FDR < 0.05). Pathway enrichment analysis highlighted the central respiratory pathway within mitochondria for the subnet genes and the nuclear DNA-directed RNA polymerases for the hub genes. Significantly down regulation of ZNF334 gene was associated with malignancy lymph node progression and senescence induction has significantly altered ZNF334 expression (with consistency in bioinformatics, in vitro, ex vivo, and in vivo results). Deregulation of TINAGL1 expression with inconsistency in bioinformatics, in vitro (different types of lung squamous cancer cell lines), ex vivo, and in vivo results, was also associated with malignancy lymph node progression and altered in senescence phenotype. Conclusions ZNF334 is a highly generalizable gene to lymph node metastasis of lung squamous cell carcinoma and its expression alter certainly under senescence conditions.
... Integrated multi-omics analysis of identifies distinct molecular characteristics in infections [32,65] and that chromatin accessibility is a marker for active enhancers, which are cis-regulatory elements of gene expression [66]. Chromatin accessibility and RNA abundance are often measured together to map the regulatory context of gene expression but do not include proteomic measurements [67]. ...
Article
Full-text available
Pseudomonas aeruginosa (P. aeruginosa) can cause severe acute infections, including pneumonia and sepsis, and cause chronic infections, commonly in patients with structural respiratory diseases. However, the molecular and pathophysiological mechanisms of P. aeruginosa respiratory infection are largely unknown. Here, we performed assays for transposase-accessible chromatin using sequencing (ATAC-seq), transcriptomics, and quantitative mass spectrometry-based proteomics and ubiquitin-proteomics in P. aeruginosa-infected lung tissues for multi-omics analysis, while ATAC-seq and transcriptomics were also examined in P. aeruginosa-infected mouse macrophages. To identify the pivotal factors that are involved in host immune defense, we integrated chromatin accessibility and gene expression to investigate molecular changes in P. aeruginosa-infected lung tissues combined with proteomics and ubiquitin-proteomics. Our multi-omics investigation discovered a significant concordance for innate immunological and inflammatory responses following P. aeruginosa infection between hosts and alveolar macrophages. Furthermore, we discovered that multi-omics changes in pioneer factors Stat1 and Stat3 play a crucial role in the immunological regulation of P. aeruginosa infection and that their downstream molecules (e.g., Fas) may be implicated in both immunosuppressive and inflammation-promoting processes. Taken together, these findings indicate that transcription factors and their downstream signaling molecules play a critical role in the mobilization and rebalancing of the host immune response against P. aeruginosa infection and may serve as potential targets for bacterial infections and inflammatory diseases, providing insights and resources for omics analyses.
... It is also possible that the nonconserved NR2F1-binding site in mouse cRE1 attenuates the role of cRE2 in mouse r4MNs. Finally, introduction of cRE2 SNVs in our lacZ assay unveiled enhancer activity, probably through oppor tunistic binding of other TFs, which could vary between mice and humans 71 . ...
Article
Full-text available
Hereditary congenital facial paresis type 1 (HCFP1) is an autosomal dominant disorder of absent or limited facial movement that maps to chromosome 3q21-q22 and is hypothesized to result from facial branchial motor neuron (FBMN) maldevelopment. In the present study, we report that HCFP1 results from heterozygous duplications within a neuron-specific GATA2 regulatory region that includes two enhancers and one silencer, and from noncoding single-nucleotide variants (SNVs) within the silencer. Some SNVs impair binding of NR2F1 to the silencer in vitro and in vivo and attenuate in vivo enhancer reporter expression in FBMNs. Gata2 and its effector Gata3 are essential for inner-ear efferent neuron (IEE) but not FBMN development. A humanized HCFP1 mouse model extends Gata2 expression, favors the formation of IEEs over FBMNs and is rescued by conditional loss of Gata3. These findings highlight the importance of temporal gene regulation in development and of noncoding variation in rare mendelian disease.
... Further support for the pleiotropy hypothesis comes from transcriptomic studies on insects and vertebrates, which show conserved expression of regulatory genes during the phylotypic stage, including expression of microRNA genes (Kalinka et al. 2010; De Mendoza et al. 2013; Stergachis et al. 2013; Ninova et al. 2014; Levin et al. 2016), genes with pleiotropic activity in other parts of the embryo (Cheng et al. 2014; Hu et al. 2017), and those with pleiotropic activity at other stages during development (Levin et al. 2012; Hu et al. 2017; see also Fish et al. 2017). ...
Chapter
Essays on evolvability from the perspectives of quantitative and population genetics, evolutionary developmental biology, systems biology, macroevolution, and the philosophy of science. Evolvability, the capability of organisms to evolve, wasn't recognized as a fundamental concept in evolutionary theory until 1990. Though there is still some debate as to whether it represents a truly new concept, the essays in this volume emphasize its value in enabling new research programs and facilitating communication among the major disciplines in evolutionary biology. The contributors, many of whom were instrumental in the development of the concept of evolvability, synthesize what we have learned about it over the past thirty years. They focus on the historical and philosophical contexts that influenced the emergence of the concept and suggest ways to develop a common language and theory to drive further evolvability research. The essays, drawn from a workshop on evolvability hosted in 2019–2020 by the Center of Advanced Study at the Norwegian Academy of Science and Letters, in Oslo, provide scientific and historical background on evolvability. The contributors represent different disciplines of evolutionary biology, including quantitative and population genetics, evolutionary developmental biology, systems biology, and macroevolution, as well as the philosophy of science. This plurality of approaches allows researchers in disciplines as diverse as developmental biology, molecular biology, and systems biology to communicate with those working in mainstream evolutionary biology. The contributors also discuss key questions at the forefront of research on evolvability. Contributors: J. David Aponte, W. Scott Armbruster, Geir H. Bolstad, Salomé Bourg, Ingo Brigandt, Anne Calof, James M. Cheverud, Josselin Clo, Frietson Galis, Mark Grabowski, Rebecca Green, Benedikt Hallgrímsson, Thomas F. Hansen, Agnes Holstad, David Houle, David Jablonski, Arthur Lander, Arnaud LeRouzic, Alan C. Love, Ralph Marcucio, Michael B. Morrissey, Laura Nuño de la Rosa, Øystein H. Opedal, Mihaela Pavličev, Christophe Pélabon, Jane M. Reid, Heather Richbourg, Jacqueline L. Sztepanacz, Masahito Tsuboi, Cristina Villegas, Marta Vidal-García, Kjetil L. Voje, Andreas Wagner, Günter P. Wagner, Nathan M. Young
... Two methods can be employed to identify candidate transcription factors, namely (a) ATAC-seq to measure enrichment of transcription factor binding motifs at promoters and enhancers 21; and (b) RNA-seq to identify transcription factors known to be expressed in the first portion (S1 Segment) of the proximal tubule. ...
Article
Loss of a kidney results in compensatory growth of the remaining kidney, a phenomenon of considerable clinical importance. However, the mechanisms involved are largely unknown. Here, we use a multi-omic approach in a unilateral nephrectomy model in male mice to identify signaling processes associated with renal compensatory hypertrophy, demonstrating that the lipid-activated transcription factor peroxisome proliferator-activated receptor alpha (PPARα) is an important determinant of proximal tubule cell size and is a likely mediator of compensatory proximal tubule hypertrophy.
... More than 50% of TAD boundaries identified in human cells were reported to be found at homologous locations in mouse genomes (Dixon et al. 2012, 2016). However, despite this conservation of TADs across species, the underlying regulatory systems for traits of interest seem to be substantially divergent, implying that understanding complex traits needs species-specific information (Schmidt et al. 2010; Cheng et al. 2014; Stergachis et al. 2014). Additionally, a study by Foissac et al. (2019) indicated that TAD boundaries under stronger selective pressure play a more important role in genome architecture and regulatory function. ...
Article
Context: Recent advances in molecular technology have allowed us to examine the cattle genome with an accuracy never before possible. Genetic variations, both small and large, as well as the transcriptional landscape of the bovine genome, have both been explored in many studies. However, the topological configuration of the genome has not been extensively investigated, largely due to the cost of the assays required. Such assays can both identify topologically associated domains and be used for genome scaffolding. Aims: This study aimed to implement a chromatin conformation capture together with long-read nanopore sequencing (Pore-C) pipeline for scaffolding a draft assembly and identifying topologically associating domains (TADs) of a Bos indicus Brahman cow. Methods: Genomic DNA from a liver sample was first cross-linked to proteins, preserving the spatial proximity of loci. Restriction digestion and proximity ligation were then used to join cross-linked fragments, followed by nucleic isolation. The Pore-C DNA extracts were then prepped and sequenced on a PromethION device. Two genome assemblies were used to analyse the data, namely, one generated from sequencing of the same Brahman cow, and the other is the ARS-UCD1.2 Bos taurus assembly. The Pore-C snakemake pipeline was used to map, assign bins and scaffold the draft and current annotated bovine assemblies. The contact matrices were then used to identify TADs. Key results: The study scaffolded a chromosome-level Bos indicus assembly representing 30 chromosomes. The scaffolded assembly showed a total of 215 contigs (2.6 Gbp) with N50 of 44.8 Mb. The maximum contig length was 156.8 Mb. The GC content of the scaffold assembly is 41 ± 0.02%. Over 50% of mapped chimeric reads identified for both assemblies had three or more contacts. This is the first experimental study to identify TADs in bovine species. In total, 3036 and 3094 TADs across 30 chromosomes were identified for input Brahman and ARS-UCD1.2 assemblies respectively. Conclusions: The Pore-C pipeline presented herein will be a valuable approach to scaffold draft assemblies for agricultural species and understand the chromatin structure at different scales. Implications: The Pore-C approach will open a new era of 3D genome-organisation studies across agriculture species.
Preprint
Cofactors interacting with PPARγ can regulate adipogenesis and adipocyte metabolism by modulating the transcriptional activity and selectivity of PPARγ signaling. ZFP407 was previously demonstrated to regulate PPARγ target genes such as GLUT4, and its overexpression improved glucose homeostasis in mice. Here, using a series of molecular assays, including protein-interaction studies, mutagenesis, and ChIP-seq, ZFP407 was found to interact with the PPARγ/RXRα protein complex in the nucleus of adipocytes. Consistent with this observation, ZFP407 DNA binding sites significantly overlapped with PPARγ sites, with more than half of ZFP407 binding sites overlapping with PPARγ DNA binding sites. Transcription factor binding motifs enriched in these overlapping sites included GFY-Staf, ELF1, ETS, ELK1, and ELK4, which regulate key functions within adipocytes. Site-directed mutagenesis of frequent PPARγ phosphorylation or SUMOylation sites did not prevent its regulation by ZFP407, while mutagenesis of ZFP407 regions necessary for RXR and PPARγ binding abrogated any impact of ZFP407 on PPARγ activity. These data suggest that ZFP407 controls the activity of PPARγ, but does so independently of post-translational modifications, likely by direct binding, establishing ZFP407 as a newly identified PPARγ cofactor. In addition, ZFP407 was also found to bind to DNA in regions that did not overlap with PPARγ. These DNA binding sites were more significantly enriched for the transcription factor binding motifs of GFY and ZNF143, which may contribute to the non-PPARγ dependent functions of ZFP407 in adipocytes and other cell types.
Article
CRISPR screen technology enables systematic and scalable interrogation of gene function by using the CRISPR-Cas9 system to perturb gene expression. In the field of cancer immunotherapy, this technology has empowered the discovery of genes, biomarkers, and pathways that regulate tumor development and progression, immune reactivity, and the effectiveness of immunotherapeutic interventions. By conducting large-scale genetic screens, researchers have successfully identified novel targets to impede tumor growth, enhance anti-tumor immune responses, and surmount immunosuppression within the tumor microenvironment (TME). Here, we present an overview of CRISPR screens conducted in tumor cells for the purpose of identifying novel therapeutic targets. We also explore the application of CRISPR screens in immune cells to propel the advancement of cell-based therapies, encompassing T cells, natural killer cells, dendritic cells, and macrophages. Furthermore, we outline the crucial components necessary for the successful implementation of immune-specific CRISPR screens and explore potential directions for future research.
Article
Recent progress in massively parallel sequencing platforms has enabled genome-wide characterization of DNA-associated proteins using the combination of chromatin immunoprecipitation and sequencing (ChIP-seq). Although a variety of methods exist for analysis of the established alternative ChIP microarray (ChIP-chip), few approaches have been described for processing ChIP-seq data. To fill this gap, we propose an analysis pipeline specifically designed to detect protein-binding positions with high accuracy. Using previously reported data sets for three transcription factors, we illustrate methods for improving tag alignment and correcting for background signals. We compare the sensitivity and spatial precision of three peak detection algorithms with published methods, demonstrating gains in spatial precision when an asymmetric distribution of tags on positive and negative strands is considered. We also analyze the relationship between the depth of sequencing and characteristics of the detected binding positions, and provide a method for estimating the sequencing depth necessary for a desired coverage of protein binding sites.
Article
Because species-specific gene expression is driven by species-specific regulation, understanding the relationship between sequence and function of the regulatory regions in different species will help elucidate how differences among species arise. Despite active experimental and computational research, relationships among sequence, conservation, and function are still poorly understood. We compared transcription factor occupied segments (TFos) for 116 human and 35 mouse TFs in 546 human and 125 mouse cell types and tissues from the Human and the Mouse ENCODE projects. We based the map between human and mouse TFos on a one-to-one nucleotide cross-species mapper, bnMapper, that utilizes whole genome alignments (WGA). We mapped regulatory sequences from an extensive number of TFs and cell types between human and mouse using WGA. A comparative analysis of this correspondence unveiled the extent of the shared regulatory sequence across the TFs and cell types under study. Importantly, a large part of the shared regulatory sequence is repurposed in the other species. This sequence, fueled by turnover events, provides a strong case for exaptation in regulatory elements.
Article
The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.
Article
Transposable elements (TEs) have been shown to contain functional binding sites for certain transcription factors (TFs). However, the extent to which TEs contribute to the evolution of TF binding sites is not well known. We comprehensively mapped binding sites for 26 pairs of orthologous TFs in two pairs of human and mouse cell lines (representing two cell lineages), along with epigenomic profiles, including DNA methylation and six histone modifications. Overall, we found that 20% of binding sites were embedded within TEs. This number varied across different TFs, ranging from 2% to 40%. We further identified 710 TF-TE relationships in which genomic copies of a TE subfamily contributed a significant number of binding peaks for a TF, and we found that LTR elements dominated these relationships in human. Importantly, TE-derived binding peaks were strongly associated with open and active chromatin signatures, including reduced DNA methylation and increased enhancer-associated histone marks. On average, 66% of TE-derived binding events were cell type-specific with a cell type-specific epigenetic landscape. Most of the binding sites contributed by TEs were species-specific, but we also identified binding sites conserved between human and mouse, the functional relevance of which was supported by a signature of purifying selection on DNA sequences of these TEs. Interestingly, several TFs had significantly expanded binding site landscapes only in one species, which were linked to species-specific gene functions, suggesting that TEs are an important driving force for regulatory innovation. Taken together, our data suggest that TEs have significantly and continuously shaped gene regulatory networks during mammalian evolution.
Article
To mechanistically characterize the microevolutionary processes active in altering transcription factor (TF) binding among closely related mammals, we compared the genome-wide binding of three tissue-specific TFs that control liver gene expression in six rodents. Despite an overall fast turnover of TF binding locations between species, we identified thousands of TF regions of highly constrained TF binding intensity. Although individual mutations in bound sequence motifs can influence TF binding, most binding differences occur in the absence of nearby sequence variations. Instead, combinatorial binding was found to be significant for genetic and evolutionary stability; cobound TFs tend to disappear in concert and were sensitive to genetic knockout of partner TFs. The large, qualitative differences in genomic regions bound between closely related mammals, when contrasted with the smaller, quantitative TF binding differences among Drosophila species, illustrate how genome structure and population genetics together shape regulatory evolution.
Article
Transposable element (TE) derived sequences comprise half of our genome and DNA methylome, and are presumed densely methylated and inactive. Examination of the genome-wide DNA methylation status within 928 TE subfamilies in human embryonic and adult tissues revealed unexpected tissue-specific and subfamily-specific hypomethylation signatures. Genes proximal to tissue-specific hypomethylated TE sequences were enriched for functions important for the tissue type and their expression correlated strongly with hypomethylation of the TEs. When hypomethylated, these TE sequences gained tissue-specific enhancer marks including H3K4me1 and occupancy by p300, and a majority exhibited enhancer activity in reporter gene assays. Many such TEs also harbored binding sites for transcription factors that are important for tissue-specific functions and exhibited evidence for evolutionary selection. These data suggest that sequences derived from TEs may be responsible for wiring tissue type-specific regulatory networks, and have acquired tissue-specific epigenetic regulation.
Article
Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.
Article
Current epigenomics approaches have facilitated the genome-wide identification of regulatory elements based on chromatin features and transcriptional regulator binding and have begun to map long-range interactions between regulatory elements and their targets. Here, we focus on the emerging roles of CTCF and cohesin in coordinating long-range interactions between regulatory elements. We discuss how species-specific transposable elements may influence such interactions by remodeling the CTCF binding repertoire and suggest that cohesin's association with enhancers, promoters, and sites defined by CTCF binding has the potential to form developmentally regulated networks of long-range interactions that reflect and promote cell-type-specific transcriptional programs.