Ales Vancura (ed.), Transcriptional Regulation: Methods and Protocols, Methods in Molecular Biology, vol. 809,
DOI 10.1007/978-1-61779-376-9_11, © Springer Science+Business Media, LLC 2012
Mapping Protein–DNA Interactions Using ChIP-Sequencing
Charles E. Massie and Ian G. Mills
Chromatin immunoprecipitation (ChIP) allows enrichment of genomic regions which are associated with
specific transcription factors, histone modifications, and indeed any other epitopes which are present on
chromatin. The original ChIP methods used site-specific PCR and Southern blotting to confirm which
regions of the genome were enriched, on a candidate basis. The combination of ChIP with genomic tiling
arrays (ChIP-chip) allowed a more unbiased approach to map ChIP-enriched sites. However, limitations
of microarray probe design and probe number have a detrimental impact on the coverage, resolution,
sensitivity, and cost of whole-genome tiling microarray sets for higher eukaryotes with large genomes. The
combination of ChIP with high-throughput sequencing technology has allowed more comprehensive
surveys of genome occupancy, greater resolution, and lower cost for whole genome coverage. Herein, we
provide a comparison of high-throughput sequencing platforms and a survey of ChIP-seq analysis tools,
discuss experimental design, and describe a detailed ChIP-seq method.
Key words: Chromatin immunoprecipitation, High-throughput sequencing, “Next-generation” sequencing, Illumina (Solexa) sequencing, Transcription, Cancer, Nuclear hormone receptor
1. Introduction

Chromatin immunoprecipitation (ChIP) has been used to map specific histone modifications (1) and sites of protein occupancy (2–4), providing insights into the dynamic changes associated with gene transcription and identifying the target genes of transcription factors and other regulatory proteins. The ChIP method uses specific antibodies to enrich the genomic regions associated with epitopes corresponding to chromatin marks or chromatin-bound proteins. Because many of these interactions are transient, protein–DNA interactions are first cross-linked using formaldehyde, allowing their isolation by ChIP. Chromatin is then fragmented so that genomic fragments bound by the transcription factor of interest can be separated from those that are not bound. Following antibody enrichment (immunoprecipitation), formaldehyde cross-links are reversed and the enriched DNA fragments are purified. ChIP-enriched genomic DNA fragments can be mapped to the genome and quantified using genomic microarrays (ChIP-chip), direct sequencing (ChIP-seq), or quantitative PCR.
Each of these approaches offers specific advantages and disadvantages. For example, the application of quantitative PCR to assess ChIP-enriched DNA fragments requires a priori knowledge of, ideally, both the target genes and the binding sequences associated with a given transcription factor. Genomic tiling microarrays have been widely used to map transcription factor-binding sites (ChIP-chip), with genomic coverage targeted to specific regions of interest (5), gene promoter regions (3), or whole non-repetitive genomes (4), and have allowed assessment of histone marks in as few as 10³ cells (6) and transcription factor binding from >10⁴ cells (7). However, ChIP-chip approaches require many rounds of amplification to yield sufficient DNA for microarray hybridization and are limited by the constraints of microarray probe design, resulting in uneven genomic coverage and low resolution of DNA-binding sites. Given the large size of higher eukaryote genomes (e.g. human genome ~3 × 10⁹ bp), genome tiling microarrays covering the whole non-repetitive genome are printed on 7–38 individual microarrays (depending on the platform and probe spacing); thus, the amount of ChIP DNA required and the cost of each replicate are high. The combination of ChIP with second-generation sequencing technologies (ChIP-seq) offered an attractive alternative to ChIP-chip, requiring fewer amplification steps, providing more complete genome coverage, increasing the resolution of DNA-binding sites, and reducing the cost of whole-genome coverage (Table 1) (8). Three second-generation sequencing platforms have been developed (Table 1), although platforms that yield many short sequence reads and give greater spatial resolution are more appropriate for ChIP-seq analysis. Most ChIP-seq studies to date have used Illumina (Solexa) sequencing to identify ChIP-enriched DNA fragments, and these studies have shown the benefits of greater genomic coverage and spatial resolution in the study of transcriptional regulation and chromatin organization.
However, this relatively new technology is not without its problems: ChIP-seq approaches require access to expensive specialist equipment, lack well-defined standards for data analysis, and typically call for an order of magnitude more starting material than ChIP-chip approaches. A recent study reported a ChIP-seq method using the Heliscope single-molecule sequencing platform (Helicos) that achieved genome-wide coverage from low cell numbers without the need for amplification steps (9). However, this platform has not yet been extensively compared with other platforms, and in our hands ChIP-seq using Illumina (Solexa) sequencing allows identification of transcription factor-binding sites from small amounts of starting material (IM and CM, unpublished data). The lack of data analysis standards reflects the relatively recent development of these second-generation sequencing technologies, and many analysis tools have been developed (Table 2). However, the performance of these analysis packages varies greatly, as highlighted by a recent community challenge that reported a detailed comparison of 11 currently available ChIP-seq analysis packages (30). We have attempted to provide a comprehensive list of available analysis tools and of the studies that have tested them (Table 2); however, there is no clear consensus on which tools are most appropriate for a given data set. We would therefore recommend comparing the results of two or more analysis tools and cross-checking the analysis results against the aligned sequence tag pile-ups to ensure that false-positive and false-negative rates are low.
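As a minimal sketch of the cross-checking recommended above, the Python below finds the peaks called by one tool that are supported by an overlapping peak from a second tool. It assumes that peak calls have already been parsed into (chromosome, start, end) tuples with half-open coordinates (for example from each tool's BED output; parsing is not shown). The function name `supported_peaks` and the peak coordinates are hypothetical.

```python
from collections import defaultdict


def supported_peaks(peaks_a, peaks_b):
    """Return the peaks in peaks_a that overlap at least one peak in peaks_b.

    Each peak is a (chrom, start, end) tuple using half-open [start, end)
    coordinates; two peaks overlap if they share at least one base.
    """
    # Group the second tool's peaks by chromosome so each lookup only
    # scans peaks on the same chromosome.
    b_by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        b_by_chrom[chrom].append((start, end))

    supported = []
    for chrom, start, end in peaks_a:
        # Simple linear scan: adequate for a sketch, slow for very large sets.
        if any(b_start < end and start < b_end
               for b_start, b_end in b_by_chrom.get(chrom, [])):
            supported.append((chrom, start, end))
    return supported


# Hypothetical peak calls from two analysis tools.
tool_a = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 150)]
tool_b = [("chr1", 250, 400), ("chr2", 500, 600)]

consensus = supported_peaks(tool_a, tool_b)
print(consensus)                     # [('chr1', 100, 300)]
print(len(consensus) / len(tool_a))  # fraction of tool A peaks supported by tool B
```

Requiring support from two tools trades sensitivity for specificity, so peaks found by only one tool are better inspected against the tag pile-ups than discarded outright.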
However, a more fundamental question remains with regard to genome-wide coverage using any genome-wide ChIP method (ChIP-chip or ChIP-seq), since most studies resort to taking arbitrary windows of 10–100 kb around genes in an attempt to link transcription factor-binding sites to their functional targets. Consequently, most studies could have obtained the same amount of usable data from genomic tiling microarrays focussed on protein-coding genes and miRNA loci. This situation is likely to continue until the three-dimensional structure of chromatin in the nucleus has been mapped for a given cell type, opening the door for future studies and more comprehensive analyses of the existing

Table 1
Comparison of high-throughput sequencing platforms
Platform | Read length | Coverage
ABI/SOLiD | 35–75 bp | ~300 Gb/run; deep coverage; $3,000 cost/…
 | 25–55 bp | 25–35 Gb/run; no amplification, less GC bias

Table 2
Compendium of ChIP-seq analysis tools currently available
Conditional binomial model (10)
(11)
Continuous seq-tag density estimation (12)
Commercial uses: NGS Analyzer or MACS
http://liulab.dfci.harvard.edu/
Empirically models fragment lengths and uses a dynamic … (13)
Kharchenko et al. (14)
Probabilistic inference of ChIP-Seq using an empirical Bayes mixture model approach (15)
Markov chain in both strand directions, using penalties proportional to the number of reads (16)
Suite of applications for analyzing Illumina, SOLiD, and 454 data (17)
Negative binomial distribution, Bayesian (18)
Hidden Markov model-based approach (19)
Simple height criteria following eXtension of single-end tags (XSET) (20)
Kernel density estimation (21)
Simple height criteria (22)
Compares reads on different strands (23)
Sample normalization binomial distribution (24)
Mikkelsen et al.
Supervised hidden Markov model p-values (25)
Hidden Markov model (26)
Modular online analysis tool (1)
http://bioinformatics-renlab.ucsd.edu/rentrac/wiki/
Unsupervised learning method (27)
Mixture model-based analysis method for RNAP II analysis (28)
http://bips.u-strasbg.fr/seqminer/tiki-index.php
Allows comparisons between reference genome and multiple ChIP-seq datasets
http://havoc.genomecenter.ucdavis.edu/
One-sample t-test (29)
The materials and methods described below are adapted from previously published protocols (2, 3, 31) and provide a focussed description of positive- and negative-control experiments to measure androgen-stimulated androgen receptor (AR) binding using ChIP in combination with direct Solexa (Illumina) sequencing. These methods are, however, also applicable more generally to the study of the AR in other contexts and to the study of other chromatin-bound factors.
2. Materials

2.1. Cell Culture and Cross Linking

1. RPMI supplemented with 10% fetal bovine serum (FBS).
2. Phenol red-free RPMI supplemented with 10% charcoal dextran-
3. AR ligands (e.g. DHT or the synthetic androgen R1881).
4. 11% formaldehyde in 50 mM HEPES–KOH, pH 7.5, 100 mM NaCl, 1 mM EDTA, 0.5 mM EGTA (see Note 1).
5. Formaldehyde-quenching solution of 2.5 M glycine.

2.2. Harvesting Cells

1. Cell scrapers.
2. Phosphate-buffered saline (PBS) supplemented with Complete protease inhibitors (Roche).
3. Rotating tube mixer at 4°C.
4. Sonicating water bath (e.g. Diagenode Bioruptor) or probe sonicator (see Note 2).
5. ChIP cell lysis buffer: 50 mM HEPES–KOH, pH 7.5, 140 mM NaCl, 1 mM EDTA, 10% glycerol, 0.5% Igepal (NP-40), 0.25% Triton X-100, 1× SIGMAFAST protease inhibitors (Sigma).
6. ChIP nuclei wash buffer: 10 mM Tris–HCl, pH 7.5, 200 mM NaCl, 1 mM EDTA, 0.5 mM EGTA.
7. ChIP nuclear lysis buffer: 10 mM Tris–HCl, pH 7.5, 100 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 0.1% sodium deoxycholate, 0.5% SDS (see Note 3).
8. 10% Triton X-100.
1. Protein A/G Dynal magnetic beads (Invitrogen) and magnetic tube rack (see Note 4).
2. Antibodies to target proteins (e.g. rabbit anti-AR N20, Santa Cruz) (see Note 5 for resources listing validated ChIP-grade
3. PBS supplemented with 0.5% BSA.
4. RIPA ChIP wash buffer: 50 mM HEPES, pH 7.6, 1 mM EDTA, 0.5 M LiCl, 1% Igepal (NP-40), 0.7% sodium
5. TE with 50 mM NaCl: 10 mM Tris–HCl, pH 8.0, 1 mM EDTA, 50 mM NaCl.
6. Elution buffer: 1% SDS, 0.1 M NaHCO₃.
2.4. DNA Isolation

1. TE: 10 mM Tris–HCl, pH 8.0, 1 mM EDTA.
2. RNase A, 1 mg/ml (DNase-free).
3. Proteinase K, 20 mg/ml.
4. 5 M NaCl.
5. Glycogen (Roche) or suitable carrier for precipitation.
6. Phenol:chloroform:isoamyl alcohol (25:24:1).
8. 75% ethanol.
9. 10 mM Tris–HCl, pH 8.0.

2.5.1. Real-Time PCR

1. Oligonucleotide primers to genomic regions of interest.
2. SYBR Green PCR master mix (Applied Biosystems).
3. Optical PCR plates and adhesive covers compatible with the real-time PCR instrument.

2.5.2. Direct Sequencing

1. T4 DNA polymerase.
2. Klenow DNA polymerase.
3. T4 polynucleotide kinase.
4. dNTP mix (10 mM).
5. DNA Clean and Concentrator-5 Kit (Zymo Research).
6. Klenow fragment (3′→5′ exo-minus, 5 U/μl).
7. dATP (1 mM).
8. Illumina oligonucleotide adapters.
9. T4 DNA ligase.
10. Phusion DNA polymerase.
11. Illumina oligonucleotide primers 1.1 and 2.1.
12. Dedicated “clean” electrophoresis equipment.
13. High-purity agarose (e.g. Low Range Ultra Agarose, Bio-Rad).
14. SYBR Safe DNA stain.
15. Dark Reader transilluminator (Clare Chemical).
16. Qiagen MinElute Gel Extraction Kit.
17. Agilent Bioanalyser (see Note 6).
3. Methods

The ChIP-seq method described below has been used successfully to map AR-binding sites, binding sites for other transcriptional regulators, and RNAP II occupancy using Illumina (Solexa) sequencing. However, the basic ChIP method presented could equally be applied to the study of other factors, and the same experimental design, quality control tests, and analysis tools can be used in combination with other sequencing platforms (using the appropriate library preparation methods).

The AR is activated by androgen stimulation, allowing the same antibody to be used in both positive- and negative-control conditions and thus providing an ideal control for antibody specificity; other controls are widely used for non-inducible systems or steady-state measurements (see Note 7).
3.1. Cell Culture and Cross Linking

1. Maintain LNCaP cells in RPMI supplemented with 10% FBS in cell culture incubators (5% CO₂ at 37°C) and passage at a dilution of 1:3 with trypsin/EDTA when approaching confluence. For ChIP-seq assays, we use 10⁷ cells for each ChIP
2. When cells are ~70% confluent, aspirate media from culture flasks, wash cells with PBS, and replace media with phenol red-free RPMI supplemented with 10% charcoal dextran-
3. After 72 h, replace cell culture media with media supplemented with androgens (e.g. 1 nM R1881) or an equal volume of ethanol (vehicle) and return cells to the incubator for 4 h.
4. For every 10 ml of culture media, add 1/10th volume of 11% formaldehyde solution to the cell culture media. Incubate flasks at room temperature for 10 min (see Note 1).
5. Quench the formaldehyde cross-linking reaction by adding 1/20th volume of 2.5 M glycine solution directly to the culture media (to give a final concentration of 125 mM) and incubate for 5 min at room temperature.
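The volumes in steps 4 and 5 above can be checked with simple dilution arithmetic. The sketch below assumes that the glycine is added as 1/20th of the volume present after formaldehyde addition (the protocol does not say which volume is meant) and that the media contain no formaldehyde or glycine beforehand; the helper names are illustrative. It shows that 1/10th volume of 11% formaldehyde gives exactly 1% final, whereas 1/20th volume of 2.5 M glycine gives slightly less than 125 mM (2.5 M/20), because the added volume also dilutes the glycine.

```python
def final_concentration(stock, added_volume, starting_volume):
    """Concentration after adding `added_volume` of `stock` to a solution
    that initially contains none of the solute (C1V1 = C2V2)."""
    return stock * added_volume / (starting_volume + added_volume)


def volume_to_add(stock, target, starting_volume):
    """Volume of `stock` needed to reach `target`, counting the added
    volume as part of the final volume."""
    return target * starting_volume / (stock - target)


media_ml = 10.0

# Step 4: 1/10th volume of 11% formaldehyde.
formaldehyde_ml = media_ml / 10
formaldehyde_pct = final_concentration(11.0, formaldehyde_ml, media_ml)

# Step 5: 1/20th volume of 2.5 M glycine (assumed relative to the current volume).
current_ml = media_ml + formaldehyde_ml
glycine_ml = current_ml / 20
glycine_mM = 1000 * final_concentration(2.5, glycine_ml, current_ml)

# Volume of 2.5 M glycine that would give exactly 125 mM.
exact_glycine_ml = volume_to_add(2.5, 0.125, current_ml)

print(f"formaldehyde: {formaldehyde_pct:.2f}%")    # formaldehyde: 1.00%
print(f"glycine: {glycine_mM:.0f} mM")              # glycine: 119 mM
print(f"for 125 mM add {exact_glycine_ml:.2f} ml")  # for 125 mM add 0.58 ml
```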
3.2. Harvesting Cells

1. Transfer flasks to ice, remove media, and wash cells twice with ice-cold PBS supplemented with protease inhibitors.
2. Aspirate PBS from cells and harvest cells using a cell scraper. Transfer cells to a 15-ml tube using a wide-bore pipette tip and centrifuge cells at 1,000 × g for 3 min at 4°C.
3. Aspirate residual PBS and add 5 ml of ChIP cell lysis buffer per 5 × 10⁶ cells. Incubate on a rotary tube mixer at low speed for 10 min at 4°C. Centrifuge at 1,200 × g for 5 min in a bench-top centrifuge and discard the supernatant.
4. Re-suspend pellet in 5 ml of ChIP nuclei wash buffer and incubate on a rotary tube mixer at low speed for 5 min at 4°C. Centrifuge at 1,200 × g for 5 min in a bench-top centrifuge and discard the supernatant.
5. Re-suspend pellet in 1 ml of ChIP nuclear lysis buffer. Split nuclear lysate into 4 × 250-μl aliquots and sonicate for 15 min at maximum power in a Bioruptor sonicator (Diagenode) to fragment chromatin to an average length of 500 bp (see Note 2). Re-pool sonicated lysates, add 100 μl of 10% Triton X-100, and centrifuge in a bench-top microfuge at 14,000 × g for 10 min at 4°C. Transfer the supernatant to a 15-ml tube and add 2 ml of ChIP nuclear lysis buffer and 200 μl of 10%
6. Take 50 μl from each sample for the total genomic input control. Assess the extent of sonication by electrophoresis on a 1% agarose gel after reversing formaldehyde cross-links (see below). A smear should be visible, with the majority of fragments between 250 bp and 1 kb.

1. Aliquot 100 μl of protein-A Dynal beads per ChIP reaction and wash three times with 1 ml of PBS–BSA (0.5%), collecting beads on a magnetic rack between washes. Re-suspend beads in 250 μl of PBS–BSA, add 7.5 μg of AR N20 antibody (Santa Cruz), and incubate overnight on a rotary mixer at 4°C. Wash antibody–bead complexes three times with 1 ml of PBS–BSA, using a magnetic rack, and re-suspend in 100 μl of PBS–BSA.
2. Add 100 μl of antibody–bead complexes to the 3 ml of sonicated lysate in 15-ml tubes and incubate overnight at 4°C on a rotary mixer at low speed.
3. Working in a cold room at 4°C, transfer chromatin–antibody–bead complexes to 1.5-ml centrifuge tubes by sequentially adding 1 ml of the mixture to tubes on a magnetic rack and discarding the supernatant.
4. Wash beads six times with 1 ml of ice-cold RIPA ChIP wash buffer, taking care to fully re-suspend beads during washes and using a magnetic rack to immobilize beads between washes.
5. Wash beads once with 1 ml of ice-cold TE supplemented with 50 mM NaCl, immobilize beads using a magnetic rack, and discard the supernatant. Centrifuge at 3,000 × g in a bench-top microfuge for 2 min at 4°C and discard the residual
6. Elute ChIP material by adding 200 μl of ChIP elution buffer and incubating at room temperature for 1 min with vigorous mixing. Centrifuge samples at 300 × g for 3 min to pellet beads and transfer the supernatant to a fresh tube. Repeat the elution and combine the eluates.
3.4. DNA Isolation

1. Reverse protein–DNA cross-links in the total genomic input control and ChIP samples by adding NaCl to a final concentration of 200 mM and incubating at 65°C overnight.
2. Digest contaminating RNA by adding RNase A (20 μg/ml final concentration). Incubate at 37°C for 30 min.
3. Digest proteins by adding EDTA (10 mM final concentration), Tris–HCl, pH 6.7 (20 mM final concentration), and proteinase K (80 μg/ml final concentration). Incubate at 55°C for 1 h.
4. Recover DNA by adding an equal volume (~500 μl) of phenol:chloroform:isoamyl alcohol, vortex vigorously, and centrifuge at 13,000 × g for 15 min. Carefully transfer the aqueous phase to a fresh tube, and add 20 μg of glycogen and an equal volume of isopropanol. Vortex vigorously and centrifuge at 13,000 × g for 15 min. Discard the supernatant and add 1 volume of 75% ethanol per volume of isopropanol used. Centrifuge at 7,000 × g for 8 min, discard the supernatant, and air dry the pellet. Re-suspend the pellet in 60 μl of 10 mM Tris–HCl, pH 8.0.

Two methods for quantifying ChIP enrichment are described below: ChIP-quantitative PCR and ChIP-seq. When designing and analysing ChIP experiments, it is important to bear in mind that ChIP enriches rather than isolates genomic targets. Therefore, although protein-bound DNA fragments are highly enriched, the majority of DNA isolated from a ChIP reaction is likely to comprise fragments not bound by the protein of interest, simply because unbound fragments make up such a large proportion of the genome.

3.5.1. Real-Time PCR

ChIP followed by quantitative real-time PCR is currently the “gold standard” for assessing or confirming ChIP enrichment of specific genomic targets and, as such, is a useful tool for assessing ChIP efficiency before genome-wide analysis and for confirming the results of genome-wide analyses. When analysing ChIP enrichment using real-time PCR, it is necessary to compare the test ChIP with a control ChIP (see Note 7) to assess specific enrichment over background. It is also necessary to compare the candidate genomic region (i.e. the region believed to be bound by the protein of interest) with a control genomic region that is not bound by the protein. The control genomic region provides a measure of the non-specific background DNA recovered in each ChIP and can be used to normalize between the test and control ChIPs. Finally, to avoid bias caused by differences in PCR efficiency between test and control PCR reactions, it is advisable to run a serial dilution of input material (e.g. 1× to 1/128) as a standard curve for each PCR reaction.
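The PCR efficiencies used in step 5 of the procedure come from these standard curves. As a minimal sketch, assuming mean Ct values have been exported for each point of a two-fold input dilution series (1× to 1/128), the slope can be fitted by ordinary least squares of Ct against log10(relative input) and the efficiency computed as 10^(−1/slope); a slope of about −3.32 corresponds to an efficiency of 2, i.e. perfect doubling per cycle. The Ct values below are idealized, hypothetical numbers and the function name is illustrative.

```python
import math


def fit_standard_curve(relative_input, ct_values):
    """Fit Ct = slope * log10(relative_input) + intercept by least squares.

    Returns (slope, intercept, efficiency), with efficiency = 10 ** (-1 / slope).
    """
    xs = [math.log10(q) for q in relative_input]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    efficiency = 10 ** (-1 / slope)
    return slope, intercept, efficiency


# Two-fold serial dilution of input DNA: 1, 1/2, ..., 1/128 (eight points).
dilutions = [1 / 2 ** i for i in range(8)]
# Hypothetical Ct values for an ideal reaction: one extra cycle per two-fold dilution.
cts = [20.0 + i for i in range(8)]

slope, intercept, efficiency = fit_standard_curve(dilutions, cts)
print(round(slope, 2), round(efficiency, 2))  # -3.32 2.0
```

In practice the fit's linearity (e.g. R²) would also be checked before the efficiency is trusted.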
1. Into an optical PCR plate, aliquot 1 μl of ChIP DNA and of each standard curve sample in triplicate for each genomic region to be analyzed by real-time PCR (usually a minimum of two PCR reactions, one for the candidate region and one for the control region).
2. Mix 10 pmol of each primer with water and SYBR Green master mix (1× final concentration) in a final volume of 10 μl.
3. Aliquot the PCR mix onto the ChIP DNA, seal plates with adhesive covers, and centrifuge briefly.
4. Use the cycling conditions recommended for the SYBR Green mix used (e.g. hot start: 50°C for 2 min and 95°C for 10 min, followed by 40 cycles of 95°C for 15 s and 60°C for 1 min). Adding a dissociation curve at the end of the PCR allows the specificity of the PCR to be assessed.
5. Specific enrichment by ChIP can be assessed using the following formula:

$$\text{Fold enrichment} = \frac{\text{TestEff}^{\,(\text{control sample Ct} \,-\, \text{test sample Ct})}}{\text{ControlEff}^{\,(\text{control sample Ct} \,-\, \text{test sample Ct})}}$$

Here, “ControlEff” is the efficiency of the control PCR and “TestEff” is the efficiency of the test PCR, both calculated using the formula $10^{-1/\text{slope}}$, where slope is the slope of the standard curve. The Ct values in the numerator come from the test PCR and those in the denominator from the control PCR. “Control sample Ct” and “test sample Ct” are the cycle thresholds (Ct), i.e. the cycles at which the fluorescence signal of the PCR for the control or test sample crosses the detection threshold during exponential amplification.
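A minimal sketch of the step 5 calculation, assuming efficiencies taken from the standard curves and mean Ct values for each sample; the function and argument names are hypothetical. The "test PCR" amplifies the candidate region and the "control PCR" the unbound control region, while the "test sample" and "control sample" are the test and control ChIPs (for example androgen- and vehicle-treated cells).

```python
def fold_enrichment(test_eff, control_eff,
                    test_pcr_control_ct, test_pcr_test_ct,
                    control_pcr_control_ct, control_pcr_test_ct):
    """Efficiency-corrected enrichment of the test ChIP over the control ChIP.

    Numerator (candidate region):  TestEff ** (control sample Ct - test sample Ct)
    Denominator (control region): ControlEff ** (control sample Ct - test sample Ct)
    """
    test_term = test_eff ** (test_pcr_control_ct - test_pcr_test_ct)
    control_term = control_eff ** (control_pcr_control_ct - control_pcr_test_ct)
    return test_term / control_term


# Hypothetical mean Ct values (vehicle ChIP = control sample, androgen ChIP = test sample).
enrichment = fold_enrichment(
    test_eff=2.0, control_eff=2.0,
    test_pcr_control_ct=30.0, test_pcr_test_ct=25.0,        # candidate region
    control_pcr_control_ct=31.0, control_pcr_test_ct=30.0,  # control region
)
print(enrichment)  # 16.0
```

With both efficiencies equal to 2 this reduces to the familiar 2^−ΔΔCt form; the efficiency terms matter when the two primer pairs amplify unequally.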
3.5.2. Direct Sequencing

The combination of sequencing-based approaches with ChIP circumvents many of the problems associated with ChIP-chip (e.g. probe design, probe specificity, genome coverage, and bias introduced by amplification). A comparison of ChIP-seq and ChIP-chip for the STAT1 transcription factor revealed a 64–71% overlap between the binding sites identified by the two techniques, although ChIP-seq found 3.8-fold more binding sites in total, suggesting that it is the more sensitive method (32). The emergence of third-generation, single-molecule sequencing platforms, such as the Heliscope system (Helicos), and the promise of nanopore technology (33) suggest that in the near future the amount of starting material required will be greatly reduced and post-ChIP amplification steps may no longer be needed. However, for most applications, where starting material is not severely limited (e.g. larger tissue samples or cell lines), and given the ever-increasing capacity and falling cost of second-generation sequencing technologies, it seems likely that these will remain the platforms of choice for ChIP-seq studies.
Accordingly, the method outlined below describes the ChIP-seq
library preparation for the second-generation Solexa (Illumina)
1. Blunt the DNA fragments in 50 μl of the ChIP material and in 50 ng of total genomic input control using 15 U of T4 DNA polymerase, 5 U of Klenow DNA polymerase, and 50 U of T4 polynucleotide kinase in 1× T4 DNA ligase buffer. Incubate for 30 min at 20°C. Clean up the DNA using the Zymo Research DNA Clean and Concentrator-5 kit, eluting in 32 μl of preheated EB.
2. Add A-overhangs to 32 μl of blunted ChIP and input DNA samples using 15 U of Klenow fragment (3′→5′ exo-minus) with 200 μM dATP in Klenow buffer. Incubate at 37°C for 30 min. Clean up the DNA using the Zymo Research DNA Clean and Concentrator-5 kit, eluting in 8 μl of preheated EB.
3. Ligate Illumina adapters to 8 μl of ChIP and input DNA using 2 μl of adapters and 2.5 μl of Quick T4 DNA ligase (NEB) in 1× Quick DNA ligase buffer. Incubate at room temperature for 15 min. Clean up the DNA using the Zymo Research DNA Clean and Concentrator-5 kit, eluting in 23 μl of preheated EB.
4. Amplify and enrich adapter-ligated DNA fragments by PCR using 23 μl of ChIP or input DNA, 1 μl of Illumina PCR primer 1.1, 1 μl of Illumina PCR primer 2.1, 0.5 μl of Phusion DNA polymerase, 1× high-fidelity PCR buffer, and 200 μM dNTPs. Amplify using the following conditions: 98°C for 30 s; 17 cycles of 98°C for 10 s, 65°C for 30 s, and 72°C for 30 s; then 72°C for 5 min. Clean up the amplified material using a Qiagen PCR purification kit, eluting DNA in 30 μl of preheated EB.
5. Size-select ChIP and total genomic input material by agarose gel electrophoresis. Pour a 2% TAE agarose gel containing 1× SYBR Safe DNA stain. Load the DNA ladder, ChIP, and input samples using only glycerol (12% final glycerol concentration). Use dedicated electrophoresis equipment for library preparation and run only one library per gel. Run the gel at 120 V for 45 min, visualize on a Dark Reader transilluminator, and excise the 200–300 bp part of the DNA smear. Purify the DNA using a Qiagen MinElute Gel Extraction Kit, eluting in 15 μl of preheated EB (see Note 8).
6. Measure DNA concentration on an Agilent Bioanalyser chip and proceed with sequencing on an Illumina Genome
7. Base calling from raw image files, quality control of sequence reads, alignment of short sequence reads to the reference genome, removal of exact duplicate reads, and peak calling require the implementation of an Illumina bioinformatics pipeline in combination with other bioinformatics programs (e.g. MAQ and MACS), and therefore significant bioinformatics support (see Note 9). For the analysis of filtered and aligned data, there are many ChIP-seq peak calling packages (Table 2), and 11 have recently been directly compared (30). Careful use of multiple analysis packages (e.g. the best performers in direct comparisons: MACS 1.3.5, USeq, Partek, SWEMBL, or BPC) should provide the most reliable peak calls. Because these analysis tools require specialist bioinformatics support, simplified Web-based analysis tools (e.g. Cistrome or Sole Search) (34, 35) offer a useful alternative for wet lab scientists, although they are likely to be most useful for first-pass analysis.
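Step 7 includes removal of exact duplicate reads. One common way to do this for single-end data, shown here as a minimal sketch rather than the Illumina pipeline's own procedure, is to keep a single read for each combination of chromosome, strand, and 5′ alignment position, since PCR duplicates of the same library fragment align to identical positions. Alignments are assumed to be (read ID, chromosome, start, end, strand) tuples with 0-based, half-open coordinates; the function name and reads are hypothetical.

```python
def remove_duplicate_reads(alignments):
    """Return IDs of reads kept after collapsing positional duplicates.

    alignments: iterable of (read_id, chrom, start, end, strand) tuples with
    0-based half-open coordinates and strand '+' or '-'. The first read seen
    at each (chrom, strand, 5' position) is kept.
    """
    seen = set()
    kept = []
    for read_id, chrom, start, end, strand in alignments:
        # The 5' end is the leftmost base on '+' and the rightmost base on '-'.
        five_prime = start if strand == "+" else end - 1
        key = (chrom, strand, five_prime)
        if key not in seen:
            seen.add(key)
            kept.append(read_id)
    return kept


reads = [
    ("r1", "chr1", 100, 136, "+"),
    ("r2", "chr1", 100, 136, "+"),  # duplicate of r1
    ("r3", "chr1", 100, 136, "-"),  # same span, opposite strand: kept
    ("r4", "chr1", 64, 100, "-"),
    ("r5", "chr1", 70, 100, "-"),   # same 5' end as r4 on '-': duplicate
]
print(remove_duplicate_reads(reads))  # ['r1', 'r3', 'r4']
```

Because duplicates are defined by position alone, very deeply sequenced libraries can lose genuinely independent reads at strong binding sites, so some tools allow a small number of reads per position instead of exactly one.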
4. Notes

1. Cross-linking of protein–DNA interactions is commonly used when studying transcription factor binding. Formaldehyde is most widely used for this purpose and produces covalent cross-links between amino or imino groups that lie within 2 Å of each other. It is also possible to use other cross-linking agents, such as imidoesters or NHS-esters (e.g. dimethyl pimelimidate (DMP) or disuccinimidyl glutarate (DSG)), in combination with formaldehyde to increase the efficiency of cross-linking, which may be most applicable to low-abundance DNA-binding proteins (36). The use of imidoesters or NHS-esters as cross-linkers also provides an opportunity to alter the resolution from 2 to 20 Å, depending on the spacer length of the ester used (36). It is possible to perform ChIP without cross-linking (i.e. native ChIP); however, this is only suitable for proteins that bind stably to DNA and is mainly used in ChIP assays for histones (37).
2. It is essential to optimize sonication conditions for each cell type and sonicator, since this step defines the resolution of ChIP. It is advisable to test a range of conditions, including pulse length, number of pulses, and sonication amplitude. The efficiency of sonication should be assessed by resolving a sample of total chromatin by agarose gel electrophoresis after sonication and reversal of cross-links.
3. Sodium dodecyl sulphate (SDS) is liable to precipitate and can cause foaming when used with probe-based sonicators. Therefore, N-lauroyl sarcosine can be substituted for SDS to avoid these problems. However, in our experience, SDS and N-lauroyl sarcosine work equally well when a water bath sonicator is used.
4. Dynal magnetic protein-A/G beads have lower non-specific DNA binding than agarose or sepharose beads, reducing background and increasing specific enrichment. In general, protein-A or protein-G beads may be used for ChIP with rabbit antibodies, and protein-G beads may be used for ChIP with mouse, sheep, and goat IgG1 antibodies. However, we use an equal mixture of protein-A and protein-G beads to allow comparison of ChIP using antibodies from different species.
5. A number of research groups and companies provide searchable databases and compendia of validated ChIP-grade antibodies (38–44).
6. Quantitation of adapter-ligated ChIP DNA in sequencing libraries is an essential step in determining the loading on sequencing flow cells. It is common to use the Agilent Bioanalyser system to quantify ChIP-seq libraries following limited PCR enrichment of adapter-ligated DNA fragments. However, quantitative PCR-based methods have also been described that use the adapter sequences to quantify accurately only those DNA fragments that carry sequencing adapters (45).
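Flow-cell loading is usually specified as a molar concentration, so a Bioanalyser mass concentration has to be converted using the mean library fragment length. A minimal sketch, assuming the common approximation of 660 g/mol per base pair of double-stranded DNA; the 2 nM target and 20 μl final volume below are placeholders, not Illumina recommendations, and the function names are illustrative.

```python
DSDNA_G_PER_MOL_PER_BP = 660.0  # approximate mean mass of one base pair of dsDNA


def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA library concentration from ng/ul to nM.

    1 ng/ul = 1e-3 g/l, so mol/l = 1e-3 * c / (660 * length); multiplying
    by 1e9 to express nM gives c * 1e6 / (660 * length).
    """
    return conc_ng_per_ul * 1e6 / (DSDNA_G_PER_MOL_PER_BP * mean_fragment_bp)


def dilution_volumes(stock_nM, target_nM, final_ul):
    """Volumes of library and diluent that give `final_ul` at `target_nM`."""
    library_ul = target_nM * final_ul / stock_nM
    return library_ul, final_ul - library_ul


# Hypothetical Bioanalyser reading for a library size-selected to 200-300 bp.
stock = library_molarity_nM(2.0, 300)
library_ul, diluent_ul = dilution_volumes(stock, target_nM=2.0, final_ul=20.0)
print(f"{stock:.2f} nM; {library_ul:.2f} ul library + {diluent_ul:.2f} ul buffer")
# 10.10 nM; 3.96 ul library + 16.04 ul buffer
```

The qPCR-based methods mentioned above would replace only the first step, since they measure adapter-carrying molecules directly.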
7. There are many control ChIP experiments that can be used as a reference to assess specific enrichment in the test ChIP. The choice of control depends on the system under investigation and the question to be addressed. Many studies use an IgG ChIP control to assess non-specific enrichment caused by protein–DNA complexes binding to beads or IgG. However, this control does not account for any “off-target” binding of the specific antibody used for ChIP. To assess the specific enrichment by a ChIP antibody, it is necessary to compare isogenic cells in which the protein of interest either is not bound to DNA or is absent altogether. In the case of nuclear hormone receptors (NHRs), it is possible to compare hormone-deprived cells with cells stimulated with the specific NHR ligand, which induces nuclear translocation and DNA binding (e.g. androgen treatment to activate the AR). Where possible, the best controls for ChIP may be isogenic cells that are null for the target protein (e.g. cells with targeted deletion of the gene encoding the target protein) or, alternatively, cells in which the target protein has been knocked down by RNAi.
8. A recent methods paper has comprehensively described improvements to the preparation of Illumina sequencing libraries (45), including the suggestion that heating dsDNA during library preparation may affect GC bias in the resulting library. Therefore, the increased yields afforded by preheating elution buffers for DNA purification columns and gel extraction kits may be counterbalanced by the introduction of experimental noise into this sensitive system, with implications for the fidelity of ChIP-seq libraries.
9. A recent review discussed the characteristics of ChIP-seq analysis software packages (46), and many of the currently available analysis tools were compared side by side in a ChIP-seq community challenge (30). The top-scoring analysis tools were MACS 1.3.5, USeq, Partek, SWEMBL, and BPC, which, used alone or in combination, should provide the most reliable peak calls. However, the published comparisons may not accurately model all types of ChIP-seq data sets, not all analysis tools were included in these comparisons (Table 2), and it remains possible that some analysis tools perform better than others for certain ChIP-seq profiles.
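Several of the packages in Table 2 model background read counts with a Poisson or related distribution; MACS, for example, uses a local estimate of the background rate (13). The sketch below is a deliberately simplified, window-based illustration of that idea, not a reimplementation of MACS or any other listed tool: it scales input counts by the ratio of total reads, takes the larger of this and a genome-wide background rate as the expected count, and reports windows whose ChIP count has a Poisson upper-tail p-value below a cutoff. Window counts, read totals, the background rate, and the cutoff are all hypothetical.

```python
import math


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), as 1 minus the summed lower tail.

    Loses precision for very small p-values; a library routine such as
    scipy.stats.poisson.sf is preferable in practice.
    """
    term = math.exp(-lam)  # P(X = 0)
    lower = 0.0
    for i in range(k):
        lower += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - lower)


def call_enriched_windows(chip_counts, input_counts, chip_total, input_total,
                          genome_lambda, p_cutoff=1e-5):
    """Return (window index, ChIP count, p-value) for significant windows."""
    scale = chip_total / input_total  # normalize input to the ChIP library size
    hits = []
    for i, (chip, background) in enumerate(zip(chip_counts, input_counts)):
        expected = max(genome_lambda, background * scale)
        p = poisson_upper_tail(chip, expected)
        if p < p_cutoff:
            hits.append((i, chip, p))
    return hits


# Hypothetical read counts in four adjacent windows.
chip = [3, 4, 40, 5]
control = [4, 3, 5, 60]  # the last window is also high in the input sample
peaks = call_enriched_windows(chip, control, chip_total=1_000_000,
                              input_total=1_000_000, genome_lambda=2.0)
print([index for index, _, _ in peaks])  # [2]
```

Published tools add further steps (fragment-size estimation, strand shifting, multiple-testing control), which is one reason their results differ and why comparing two or more tools remains worthwhile.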
C. E. Massie is a postdoctoral researcher at the Department of
Haematology, University of Cambridge. I. G. Mills is a research
group leader within the Centre for Molecular Medicine (Norway),
a Visiting Scientist at Cancer Research UK and an Honorary Senior
Visiting Research Fellow within the Department of Oncology at
the University of Cambridge.
1. Barski, A., Cuddapah, S., Cui, K., Roh, T. Y.,
Schones, D. E., Wang, Z., Wei, G., Chepelev,
I., and Zhao, K. (2007) High-resolution profi l-
ing of histone methylations in the human
genome, Cell 129 , 823–837.
2. Carroll, J. S., Meyer, C. A., Song, J., Li, W.,
Geistlinger, T. R., Eeckhoute, J., Brodsky, A.
S., Keeton, E. K., Fertuck, K. C., Hall, G. F.,
Wang, Q., Bekiranov, S., Sementchenko, V.,
Fox, E. A., Silver, P. A., Gingeras, T. R., Liu, X.
S., and Brown, M. (2006) Genome-wide analy-
sis of estrogen receptor binding sites, Nature
Genetics 38 , 1289–1297.
3. Massie, C. E., Adryan, B., Barbosa-Morais, N.
L., Lynch, A. G., Tran, M. G., Neal, D. E., and
Mills, I. G. (2007) New androgen receptor
genomic targets show an interaction with the
ETS1 transcription factor, EMBO Reports 8 ,
4. Wang, Q., Li, W., Zhang, Y., Yuan, X., Xu, K., Yu, J., Chen, Z., Beroukhim, R., Wang, H., Lupien, M., Wu, T., Regan, M. M., Meyer, C. A., Carroll, J. S., Manrai, A. K., Janne, O. A., Balk, S. P., Mehra, R., Han, B., Chinnaiyan, A. M., Rubin, M. A., True, L., Fiorentino, M., Fiore, C., Loda, M., Kantoff, P. W., Liu, X. S., and Brown, M. (2009) Androgen receptor regulates a distinct transcription program in androgen-independent prostate cancer, Cell 138, 245–256.
5. Bolton, E. C., So, A. Y., Chaivorapol, C., Haqq, C. M., Li, H., and Yamamoto, K. R. (2007) Cell- and gene-specific regulation of primary target genes by the androgen receptor, Genes Dev 21, 2005–2017.
6. Dahl, J. A., Reiner, A. H., and Collas, P. (2009) Fast genomic μChIP-chip from 1,000 cells, Genome Biol 10, R13.
7. Acevedo, L. G., Iniguez, A. L., Holster, H. L., Zhang, X., Green, R., and Farnham, P. J. (2007) Genome-scale ChIP-chip analysis using 10,000 human cells, BioTechniques 43, 791–797.
8. Johnson, D. S., Mortazavi, A., Myers, R. M., and Wold, B. (2007) Genome-wide mapping of in vivo protein-DNA interactions, Science 316,
9. Goren, A., Ozsolak, F., Shoresh, N., Ku, M., Adli, M., Hart, C., Gymrek, M., Zuk, O., Regev, A., Milos, P. M., and Bernstein, B. E. (2010) Chromatin profiling by directly sequencing small quantities of immunoprecipitated DNA, Nature Methods 7, 47–49.
10. Ji, H., Jiang, H., Ma, W., Johnson, D. S., Myers, R. M., and Wong, W. H. (2008) An integrated software system for analyzing ChIP-chip and ChIP-seq data, Nat Biotechnol 26, 1293–1300.
11. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L., and Wold, B. (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq, Nature Methods 5, 621–628.
12. Boyle, A. P., Guinney, J., Crawford, G. E., and Furey, T. S. (2008) F-Seq: a feature density estimator for high-throughput sequence tags, Bioinformatics 24, 2537–2538.
13. Zhang, Y., Liu, T., Meyer, C. A., Eeckhoute, J., Johnson, D. S., Bernstein, B. E., Nussbaum, C., Myers, R. M., Brown, M., Li, W., and Liu, X. S. (2008) Model-based analysis of ChIP-Seq (MACS), Genome Biol 9, R137.
14. Kharchenko, P. V., Tolstorukov, M. Y., and Park, P. J. (2008) Design and analysis of ChIP-seq experiments for DNA-binding proteins, Nat Biotechnol 26, 1351–1359.
15. Zhang, X., Robertson, G., Krzywinski, M., Ning, K., Droit, A., Jones, S., and Gottardo, R. PICS: Probabilistic Inference for ChIP-seq,
16. Schmidt, D., Schwalie, P. C., Ross-Innes, C. S., Hurtado, A., Brown, G. D., Carroll, J. S., Flicek, P., and Odom, D. T. A CTCF-independent role for cohesin in tissue-specific transcription, Genome Res 20, 578–588.
17. Nix, D. A., Courdy, S. J., and Boucher, K. M. (2008) Empirical methods for controlling false positives and estimating confidence in ChIP-Seq peaks, BMC Bioinformatics 9, 523.
18. Spyrou, C., Stark, R., Lynch, A. G., and Tavare, S. (2009) BayesPeak: Bayesian analysis of ChIP-seq data, BMC Bioinformatics 10,
19. Qin, Z. S., Yu, J., Shen, J., Maher, C. A., Hu, M., Kalyana-Sundaram, S., and Chinnaiyan, A. M. HPeak: an HMM-based algorithm for defining read-enriched regions in ChIP-Seq data, BMC Bioinformatics 11, 369.
20. Robertson, G., Hirst, M., Bainbridge, M., Bilenky, M., Zhao, Y., Zeng, T., Euskirchen, G., Bernier, B., Varhol, R., Delaney, A., Thiessen, N., Griffith, O. L., He, A., Marra, M., Snyder, M., and Jones, S. (2007) Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing, Nature Methods 4, 651–657.
21. Valouev, A., Johnson, D. S., Sundquist, A., Medina, C., Anton, E., Batzoglou, S., Myers, R. M., and Sidow, A. (2008) Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data, Nature Methods 5,
22. Fejes, A. P., Robertson, G., Bilenky, M., Varhol, R., Bainbridge, M., and Jones, S. J. (2008) FindPeaks 3.1: a tool for identifying areas of enrichment from massively parallel short-read sequencing technology, Bioinformatics 24,
23. Jothi, R., Cuddapah, S., Barski, A., Cui, K., and Zhao, K. (2008) Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data, Nucleic Acids Res 36,
24. Rozowsky, J., Euskirchen, G., Auerbach, R. K., Zhang, Z. D., Gibson, T., Bjornson, R., Carriero, N., Snyder, M., and Gerstein, M. B. (2009) PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls, Nat Biotechnol 27, 66–75.
25. Ku, M., Jaffe, D. B., Issac, B., Lieberman, E., Giannoukos, G., Alvarez, P., Brockman, W., Kim, T. K., Koche, R. P., Lee, W., Mendenhall, E., O’Donovan, A., Presser, A., Russ, C., Xie, X., Meissner, A., Wernig, M., Jaenisch, R., Nusbaum, C., Lander, E. S., and Bernstein, B. E. (2007) Genome-wide maps of chromatin state in pluripotent and lineage-committed cells, Nature 448, 553–560.
26. Xu, H., Wei, C. L., Lin, F., and Sung, W. K. (2008) An HMM approach to genome-wide identification of differential histone modification sites from ChIP-seq data, Bioinformatics 24, 2344–2349.
27. Hon, G., Ren, B., and Wang, W. (2008) ChromaSig: a probabilistic approach to finding common chromatin signatures in the human genome, PLoS Comput Biol 4, e1000201.
28. Feng, W., Liu, Y., Wu, J., Nephew, K. P., Huang, T. H., and Li, L. (2008) A Poisson mixture model to identify changes in RNA polymerase II binding quantity using high-throughput sequencing technology, BMC Genomics 9 Suppl 2, S23.
29. Blahnik, K. R., Dou, L., O’Geen, H., McPhillips, T., Xu, X., Cao, A. R., Iyengar, S., Nicolet, C. M., Ludascher, B., Korf, I., and Farnham, P. J. Sole-Search: an integrated analysis program for peak detection and functional annotation using ChIP-seq data, Nucleic Acids Res 38, e13.
30. http://sourceforge.net/projects/useq/files/
31. Schmidt, D., Wilson, M. D., Spyrou, C., Brown, G. D., Hadfield, J., and Odom, D. T. (2009) ChIP-seq: using high-throughput sequencing to discover protein-DNA interactions, Methods (San Diego, Calif) 48,
32. Robertson, G., Hirst, M., Bainbridge, M., Bilenky, M., Zhao, Y., Zeng, T., Euskirchen, G., Bernier, B., Varhol, R., Delaney, A., Thiessen, N., Griffith, O. L., He, A., Marra, M., Snyder, M., and Jones, S. (2007) Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing, Nature Methods 4, 651–657.
33. http://www.nanoporetech.com/. Nanopore.
34. http://cistrome.dfci.harvard.edu/ap/. Cistrome.
35. http://chipseq.genomecenter.ucdavis.edu/cgi-bin/chipseq.cgi. Sole Search.
36. Nowak, D. E., Tian, B., and Brasier, A. R. (2005) Two-step cross-linking method for identification of NF-kappaB gene network by chromatin immunoprecipitation, BioTechniques 39, 715–725.
37. West, A. G., Huang, S., Gaszner, M., Litt, M. D., and Felsenfeld, G. (2004) Recruitment of histone modifications by USF proteins at a vertebrate barrier element, Molecular Cell 16, 453–463.
38. http://www.chiponchip.org/Antibody/chip.html. Compendium of ChIP grade antibodies.
39. http://www.abcam.com/index.html?c=917. Abcam ChIP grade antibodies.
40. http://www.diagenode.com/en/topics/anti-
41. http://www.cellsignal.com/technologies/chip.html. Cell Signalling ChIP antibodies.
42. http://www.activemotif.com/catalog/18/chip-validated-antibodies.html. Active Motif
43. http://www.millipore.com/microsites/search.
w=10#0:0. Millipore ChIP antibodies.
44. http://www.invitrogen.com/site/us/en/chip.html. Invitrogen ChIP antibodies.
45. Quail, M. A., Kozarewa, I., Smith, F., Scally, A., Stephens, P. J., Durbin, R., Swerdlow, H., and Turner, D. J. (2008) A large genome center’s improvements to the Illumina sequencing system, Nature Methods 5, 1005–1010.
46. Pepke, S., Wold, B., and Mortazavi, A. (2009) Computation for ChIP-seq and RNA-seq studies, Nature Methods 6, S22–32.