Nucleic Acids Research, 2009, Vol. 37, No. 2
Published online 4 December 2008
An affinity-based scoring scheme for predicting
DNA-binding activities of modularly assembled zinc-finger proteins
Jeffry D. Sander1,*, Peter Zaback1, J. Keith Joung2,3, Daniel F. Voytas4 and Drena Dobbs1,*
1Department of Genetics, Development and Cell Biology, Bioinformatics and Computational Biology Program,
Iowa State University, Ames, IA 50011, 2Molecular Pathology Unit, Center for Cancer Research, and Center
for Computational and Integrative Biology, Massachusetts General Hospital, 149 13th Street, Charlestown,
MA 02129, 3Department of Pathology, Harvard Medical School, Boston, MA 02115 and 4Department of
Genetics, Cell Biology and Development and Center for Genome Engineering, University of Minnesota,
MN 55455, USA
Received August 7, 2008; Revised November 10, 2008; Accepted November 12, 2008
ABSTRACT

Zinc-finger proteins (ZFPs) have long been recognized for their potential to manipulate genetic information because they can be engineered to bind novel DNA targets. Individual zinc-finger domains (ZFDs) bind specific DNA triplet sequences; their apparent modularity has led some groups to propose methods that allow virtually any desired DNA motif to be targeted in vitro. In practice, however, ZFPs engineered using this ‘modular assembly’ approach do not always function well in vivo. Here we report a modular assembly scoring strategy that both identifies combinations of modules least likely to function efficiently in vivo and provides accurate estimates of their relative binding affinities in vitro. Predicted binding affinities for 53 ‘three-finger’ ZFPs, computed based on energy contributions of the constituent modules, were highly correlated (r=0.80) with activity levels measured in bacterial two-hybrid assays. Kd values for seven modularly assembled ZFPs and their intended targets, measured using fluorescence anisotropy, were also highly correlated with predictions (r=0.91). We propose that success rates for ZFP modular assembly can be significantly improved by exploiting the score-based strategy described here.
INTRODUCTION

The ability to reliably engineer DNA binding proteins that recognize any desired DNA sequence would provide an unprecedented level of control over genetic information, for example, by allowing the creation of site-specific nucleases that specifically alter genomic DNA (1–5). The C2H2 zinc-finger domain (ZFD) is arguably the best characterized DNA binding motif and offers considerable promise for the rational engineering of site-specific DNA binding proteins (6–11). Zinc-finger proteins (ZFPs) consist of multiple individual ZFDs, each of which typically recognizes one of a series of adjacent sequence triplets in duplex DNA (Figure 1). An individual ZFD comprises a pair of antiparallel β-strands and one α-helix, which coordinate a zinc ion through conserved pairs of cysteine and histidine residues. In the canonical three-finger domain of the Zif268 transcription factor, the amino acid side chains at positions −1, +3 and +6 relative to the amino-terminal end of the α-helix typically make base-specific contacts with three adjacent nucleotides within the major groove of double-stranded DNA (12). An aspartic acid residue at position +2 of the DNA recognition helix can specify a fourth nucleotide, resulting in either target-site overlap with an adjacent module or specification of an additional nucleotide at the 3′-end of the target site (13,14).
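The canonical contact rules above can be illustrated with a short, hypothetical Python sketch (not part of the original study). It assumes a recognition helix written as seven residues from position −1 to +6, the format used for modules such as QSSSLVR in this article, and the standard antiparallel arrangement in which F1, the N-terminal finger, binds the 3′-most triplet. It ignores the +2 overlap effect just described, and the helper names are illustrative only.

```python
# Hypothetical helpers, not from the article. A 7-residue recognition helix
# is assumed to be written N- to C-terminal as positions -1, 1, 2, 3, 4, 5, 6.
HELIX_POSITIONS = (-1, 1, 2, 3, 4, 5, 6)
# Canonical base-contacting positions: -1 (3' base), +3 (middle), +6 (5' base).
CONTACT_POSITIONS = (-1, 3, 6)


def contact_residues(helix: str) -> dict[int, str]:
    """Return the residues at helix positions -1, +3 and +6."""
    if len(helix) != len(HELIX_POSITIONS):
        raise ValueError("expected a 7-residue helix (positions -1 to +6)")
    by_position = dict(zip(HELIX_POSITIONS, helix))
    return {pos: by_position[pos] for pos in CONTACT_POSITIONS}


def assemble_target(triplets_f1_to_f3: list[str]) -> str:
    """Join per-finger triplets into a 5'->3' target site.

    Fingers are listed N- to C-terminal (F1, F2, F3). Because the protein
    runs antiparallel to the target strand, F1 binds the 3'-most triplet,
    so the triplets are concatenated in reverse order.
    """
    return "".join(reversed(triplets_f1_to_f3))


print(contact_residues("QSSSLVR"))            # {-1: 'Q', 3: 'S', 6: 'R'}
print(assemble_target(["GTA", "GAA", "GGG"]))  # GGGGAAGTA
```

Reversing the finger order is the only step needed to turn an N- to C-terminal module list into a conventional 5′→3′ target, which is why modular assembly can be planned from a table of triplet-specific modules.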
Several research groups have characterized ZFDs that
recognize many of the 64 possible DNA triplets (15–20).
Using a ‘modular assembly’ approach, novel ZFPs that
recognize variant DNA sites are assembled by simply stringing together individual ZFDs. In practice, however, ZFPs made by modular assembly display a wide range of binding affinities and specificities (15,19,21–23). Although modular assembly has proven useful for some in vivo applications, such as artificial transcription factors, recent work suggests that the success rate of creating artificial zinc-finger nucleases (ZFNs, fusions of engineered zinc fingers to a non-specific nuclease domain) by this method is considerably lower (24,25). These low success rates, together with the inability to predict which ZFPs are likely to function in vivo, have motivated our groups to improve the procedures and design criteria for ZFP engineering.

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
*To whom correspondence should be addressed. Tel: +1 515 294 4991; Fax: +1 515 294 6790; Email: email@example.com
Correspondence may also be addressed to Drena Dobbs. Tel: +1 515 294 4991; Fax: +1 515 294 6790; Email: firstname.lastname@example.org
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The present study was motivated by our observation
that among a small set of modularly assembled ZFPs,
those that fail to function in vivo are more likely to possess
modules previously shown to have relatively low affinity
for target DNA. This observation suggested that insufficient affinity can contribute to poor function in vivo, and also that it might be possible to predict the affinity of a modularly assembled ZFP using existing affinity data for its component modules. Here we test these hypotheses and demonstrate that both the in vitro binding affinity and the lack of in vivo activity of a ZFP can be predicted using the energy contributions of its component ZFDs. Our approach for predicting the binding of ZFPs to desired target sequences should improve success rates of modular assembly by guiding investigators away from target sites and ZFP combinations least likely to function in vivo.
MATERIALS AND METHODS
Zinc-finger modules and three-finger arrays (ZFPs)
All ZFDs used in these experiments have been described
by the Barbas group (15) and are referred to as ‘Barbas
modules’. ZFPs containing desired three-finger (three-
module) arrays were assembled by iterative ligation and
cloning of restriction fragments encoding ZFDs using
reagents and protocols previously described by the Zinc
Finger Consortium (http://www.zincfingers.org/) (27).
ZFP-encoding fragments were then cloned into vectors
for expression as Gal11P-hybrid proteins in the bacterial
two-hybrid (B2H) system as previously described (27).
A series of B2H reporter plasmids, each harboring a target
binding site for one of 27 different three-finger ZFPs, was
constructed by cloning synthetic target oligonucleotides
into reporter plasmid pBAC-lacZ as previously described
(27). Binding of a Gal11P-ZFP hybrid protein to the target sequence on a B2H reporter plasmid triggers transcriptional activation of a lacZ reporter gene encoding β-galactosidase. In vivo ZFP performance was therefore assayed using a β-galactosidase assay in which ZFP-induced activation of lacZ expression was measured relative to control constructs lacking the ZFP.
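The fold-activation readout of the B2H assay can be sketched as follows. This is a minimal illustration, not the authors' analysis: it assumes paired replicate β-galactosidase measurements with and without the ZFP, computes a ratio per replicate, and reports the mean and the standard error of the mean (the statistic listed in Figure 2a). The function name and activity values are hypothetical.

```python
import math
import statistics


def fold_activation(with_zfp: list[float], without_zfp: list[float]) -> tuple[float, float]:
    """Return (mean fold activation, standard error of the mean).

    Assumes paired replicates: each ZFP measurement is divided by the
    control (no-ZFP) measurement from the same replicate.
    """
    if len(with_zfp) != len(without_zfp) or len(with_zfp) < 2:
        raise ValueError("need at least two paired replicates")
    ratios = [z / c for z, c in zip(with_zfp, without_zfp)]
    mean = statistics.mean(ratios)
    sem = statistics.stdev(ratios) / math.sqrt(len(ratios))
    return mean, sem


# Invented beta-galactosidase activities (arbitrary units).
mean, sem = fold_activation([120.0, 150.0, 135.0], [30.0, 30.0, 30.0])
print(f"{mean:.2f} +/- {sem:.2f}")
```

Taking the ratio within each replicate, rather than dividing averaged activities, keeps day-to-day variation in the control cells from inflating the apparent spread.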
Zinc finger–maltose binding protein fusion protein construction, expression and purification
Zinc finger–maltose binding protein (MBP) fusion protein
constructs were generated by transferring three-finger
arrays, assembled as described above, into pHMTC (28).
The MBP fusion plasmids were transformed into BL21
Escherichia coli cells (Invitrogen) using standard chemical
transformation procedures (29).
For protein expression, 5 ml cultures were grown for 16 h at 30°C with agitation in ZFE broth [Luria Broth (LB), 1.11 mM dextrose, 100 µg/ml ampicillin]. Expansion cultures of 10 ml were inoculated from these overnight cultures (1:100 dilution) and grown to an OD600 of 0.5 before a 2 h induction with isopropyl β-D-1-thiogalactopyranoside (IPTG). Cells were harvested by centrifugation for 10 min at 4000 g at 4°C and frozen overnight at −20°C. The following day, cells were resuspended in 4 ml WB1 (15 mM HEPES pH 7.8, 200 mM NaCl, 20 µM ZnSO4)/1 mM PMSF/0.1% Nonidet™ P40 (NP-40) and refrozen at −70°C. Cells were then thawed in ice water and centrifuged at 9000 g at 4°C for 20 min. To remove remaining nucleic acids, the resulting supernatant was transferred to a new cold tube and polyethyleneimine was added to 0.1%. The supernatant was then incubated for 30 min before a second centrifugation at 16 000 g at 4°C for
Amylose beads (NEB) were prepared in 50 µl aliquots in 1.5 ml micro-centrifuge tubes according to the manufacturer’s instructions. Beads were washed (suspended, spun down and supernatant removed) three times in 1 ml WB1/0.1% NP-40 at 4°C and resuspended in 450 µl WB1. For affinity purification, 1 ml of clarified protein supernatant was added to prepared beads and incubated at 4°C for 30 min. The slurry was centrifuged and the supernatant was removed. The proteins bound to beads were washed two times with 700 µl WB1/0.1% NP-40 and two times with zinc buffer A (ZBA; 10 mM Tris–HCl, pH 7.5, 90 mM KCl, 1 mM MgCl2, 90 µM ZnCl2)/0.1% NP-40 (15). Purified proteins were then eluted in 200 µl ZBA/0.1% NP-40/40 mM maltose for 30 min at room temperature, with gentle agitation. After elution, beads were centrifuged at 16 000 g. The supernatant was transferred to a new cold tube and centrifuged again at 16 000 g. The supernatant was transferred to a new cold tube and gently stirred to mix protein. Proteins were stored at −70°C in Axygen Maxymum Recovery™ tubes. Protein concentrations were estimated using a Bradford assay against a bovine serum albumin (BSA) standard in ZBA/0.1% NP-40.

Figure 1. A three-finger ZFP with its DNA target site. A ZFP consisting of three adjacent ZFDs binds its target DNA through contacts between the amino acids of the DNA recognition helices and consecutive nucleotides in the DNA. The protein chain is drawn in the N- to C-terminal direction and the DNA target in the 3′–5′ direction. Note that an ‘unnatural’ extended array is shown to better illustrate the critical amino acid/nucleotide contacts. Structure diagrams were generated using PyMol (http://www.pymol.org).
Binding measurements using fluorescence anisotropy
Binding reactions were performed in ZBA/0.1% NP-40/0.1 mg/ml non-acetylated BSA (Sigma) for 30 min on ice with 5 nM target DNA. Target sites (shown in Figure 2a) were formed using hairpin DNA oligonucleotides as described (15). HPLC-purified, 3′-6-FAM-labeled oligonucleotides were obtained from Integrated DNA Technologies (Coralville, IA, USA). In each experiment, two serial dilutions of purified ZFP-MBP fusion protein were performed over a range of 1000–0.122 nM. Reported binding affinity values are based on the average of three separate binding experiments, performed on different days, using three separate protein preparations. Fluorescence anisotropy (FA) measurements were made using a Varian Cary Eclipse spectrophotometer in L-format configuration. Each value was based on five measurements averaged over 5 s, using a 490 nm excitation wavelength (5 nm slit width) and a 530 nm emission wavelength (20 nm slit width) at 880 V. Background light scattering for each protein sample dilution was measured and subtracted to correct for protein concentration-dependent variation in intensities. Kd values were determined by nonlinear regression (30,31) using Prism (http://www.graph
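The Kd fit can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the authors' Prism procedure: it assumes a single-site model in which anisotropy rises hyperbolically with total protein concentration (ligand depletion by the 5 nM DNA is ignored), and it replaces Prism's nonlinear regression with a log-spaced grid search over Kd, solving for the free and bound anisotropy by linear least squares at each grid point. The concentration series assumes two-fold steps, which reproduces the stated 1000–0.122 nM range (1000/2^13 ≈ 0.122 nM). Function names and data are hypothetical.

```python
def dilution_series(top_nM: float = 1000.0, steps: int = 14) -> list[float]:
    """Two-fold serial dilution; 14 points span 1000 to ~0.122 nM."""
    return [top_nM / 2**i for i in range(steps)]


def fit_kd(conc_nM: list[float], anisotropy: list[float]) -> tuple[float, float, float]:
    """Grid-search Kd for r = r_free + dr * P / (Kd + P).

    For each trial Kd the model is linear in (r_free, dr), so those two
    parameters are obtained by ordinary least squares; the Kd with the
    smallest residual sum of squares is returned with its fitted values.
    """
    best = None
    for k in range(-200, 401):            # Kd grid from 0.01 to 10,000 nM
        kd = 10 ** (k / 100)
        x = [p / (kd + p) for p in conc_nM]
        n = len(x)
        mx, my = sum(x) / n, sum(anisotropy) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, anisotropy))
        dr = sxy / sxx
        r_free = my - dr * mx
        sse = sum((r_free + dr * xi - yi) ** 2 for xi, yi in zip(x, anisotropy))
        if best is None or sse < best[0]:
            best = (sse, kd, r_free, dr)
    _, kd, r_free, dr = best
    return kd, r_free, dr


# Synthetic, noise-free isotherm with Kd = 20 nM (illustrative values only).
conc = dilution_series()
obs = [0.05 + 0.15 * p / (20.0 + p) for p in conc]
kd, r_free, dr = fit_kd(conc, obs)
print(f"Kd ~ {kd:.1f} nM, r_free ~ {r_free:.3f}, delta r ~ {dr:.3f}")
```

Separating the one nonlinear parameter (Kd) from the two linear ones keeps the search one-dimensional; the grid resolution (0.01 log units) limits precision to about 2%.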
RESULTS

To test the hypothesis that binding energy contributions of individual ZFDs can be used to predict the in vitro binding affinities and in vivo performance of extended ZFP arrays, 27 three-module ZFPs were constructed by assembling various GNN-specific modules previously characterized by the Barbas group (15). ZFP compositions were chosen to systematically explore a wide range of predicted binding affinities and to test the influence of context on module performance. As shown in Table 1, ZFDs were divided into three affinity classes based on their reported affinity constants measured in a fixed context, namely as fingers in the middle position of a three-finger Zif268 variant (15). Modules whose Zif268 variants had Kd values <10 nM were categorized as ‘strong’, those with Kd = 10–30 nM as ‘moderate’ and those with Kd > 30 nM as ‘weak’ (Table 1). Using three different modules to represent each binding class, all possible combinations of strong, moderate and weak affinity modules for a three-module ZFP were assembled. To allow direct comparisons among proteins that differ by a single module, ZFPs were designed in subgroups in which only one finger position was varied (Table 2).
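The classification and the resulting design space can be made concrete with a short sketch. The cut-offs are the ones stated above; treating exactly 10 nM and 30 nM as ‘moderate’ is an assumption about how the boundaries were handled, and the code enumerates class combinations only, not the specific modules listed in Table 1.

```python
from itertools import product


def affinity_class(kd_nM: float) -> str:
    """Bin a module by the Kd of its F2-position Zif268 variant."""
    if kd_nM < 10:
        return "strong"
    if kd_nM <= 30:
        return "moderate"
    return "weak"


CLASSES = ("strong", "moderate", "weak")

# Every ordered choice of affinity class for positions F1, F2 and F3.
designs = list(product(CLASSES, repeat=3))
print(len(designs))   # 27
print(designs[0], designs[-1])
```

Three classes at three positions give 3^3 = 27 class combinations, matching the 27 ZFPs constructed, from three strong modules (ZFP #1) to three weak modules (ZFP #27).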
Predicting relative binding energies for modularly assembled ZFPs
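If one assumes that the binding energy of a three-finger ZFP (ΔG°ZFP) is equal to the sum of the binding energies of its three component ZFDs (ΔG°ZFD) [Equation (1)], it follows that the difference in binding energy between any two ZFPs is the sum over the positions of the difference in binding energy between the modules at each position [Equation (2)].

$$\Delta G^{\circ}_{\mathrm{ZFP}} = \sum_{i=1}^{3} \Delta G^{\circ}_{\mathrm{ZFD}_i} \qquad (1)$$

$$\Delta\Delta G_{\mathrm{ZFP}_a-\mathrm{ZFP}_b} = \sum_{i=1}^{3} \left( \Delta G^{\circ}_{\mathrm{ZFD}_i^{a}} - \Delta G^{\circ}_{\mathrm{ZFD}_i^{b}} \right) \qquad (2)$$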
Because the ZFDs used in this study were evaluated in the middle (F2) position of a three-finger ZFP, and because the other fingers (F1 and F3) were constant in all these ZFPs, the differences in measured binding constants among these constructs should be attributable to the differences in binding energy between the F2 ZFDs. Thus, ΔΔG can be calculated between any two ZFDs by using the identity relating Gibbs free energy to Kd (ΔG° = RT ln Kd) [Equation (3), RT = 0.58 kcal/mol].

$$\Delta\Delta G_{\mathrm{ZFD1-ZFD2}} = RT \ln \frac{K_d^{\mathrm{ZFP1}}}{K_d^{\mathrm{ZFP2}}} \qquad (3)$$
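To compare binding affinity measurements with predicted values, the predicted ΔΔG was calculated as the difference between each ZFP and a standard (STD) ZFP composed entirely of the F2 domain of parental C7 (15), where each Kd^ZFPi term is the published Kd of the Zif268 variant carrying ZFDi in the F2 position:

$$\Delta\Delta G_{\mathrm{ZFP}} = \Delta\Delta G_{\mathrm{ZFD1}} + \Delta\Delta G_{\mathrm{ZFD2}} + \Delta\Delta G_{\mathrm{ZFD3}} = RT \left[ \ln \frac{K_d^{\mathrm{ZFP1}}}{K_d^{\mathrm{STD}}} + \ln \frac{K_d^{\mathrm{ZFP2}}}{K_d^{\mathrm{STD}}} + \ln \frac{K_d^{\mathrm{ZFP3}}}{K_d^{\mathrm{STD}}} \right] \qquad (4)$$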
Thus, using Equation (4) and binding constants for ZFP variants published by the Barbas group (15), we predicted ΔΔG values for 27 novel modularly assembled ZFPs constructed using Barbas GNN modules (Figure 2). Predicted ΔΔG values ranged from 2.1 kcal/mol for ZFP #1, containing three strong modules, to 8.2 kcal/mol for ZFP #27, containing three weak modules.
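Equation (4) translates directly into code. In this sketch the function names and all Kd values are hypothetical placeholders (the Table 1 constants are not reproduced in this text), including a Kd_STD of 1 nM; the published constants for the F2 variants and the STD would be needed to reproduce Figure 2a. With the convention ΔG° = RT ln Kd, a module whose variant binds more weakly than the STD adds a positive term, so weaker designs receive larger ΔΔG values, consistent with the range reported above.

```python
import math

RT = 0.58  # kcal/mol, the value used with Equation (3)


def ddg_module(kd_variant_nM: float, kd_std_nM: float) -> float:
    """Per-module term: RT * ln(Kd_variant / Kd_STD)."""
    return RT * math.log(kd_variant_nM / kd_std_nM)


def ddg_zfp(module_kds_nM: list[float], kd_std_nM: float) -> float:
    """Equation (4): sum the per-module terms for F1, F2 and F3."""
    return sum(ddg_module(kd, kd_std_nM) for kd in module_kds_nM)


# Hypothetical Kd values (nM) standing in for the published Table 1 constants.
KD_STD = 1.0
print(round(ddg_zfp([2.0, 3.0, 5.0], KD_STD), 2))       # three 'strong' modules
print(round(ddg_zfp([60.0, 80.0, 100.0], KD_STD), 2))   # three 'weak' modules
```

Because the score is a sum of logarithms, it depends only on the product of the three Kd ratios, so the position of a module within the array does not affect its contribution under this model.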
To evaluate in vivo binding of the 27 modularly assembled ZFPs to their cognate DNA targets, we used a quantitative B2H assay (32). In this assay, binding of a ZFP to its target site activates transcription of a lacZ reporter positioned downstream of an adjacent promoter. Thus, ZFP DNA-binding activity can be assessed by quantifying β-galactosidase activity in ZFP-expressing cells relative to control cells that do not express the ZFP. We chose to use the B2H system as an assay because recently published studies have shown that absence of ZFP activity in this system is an excellent predictor of failure of these proteins to function as ZFNs in human cells (24–26).
For 25 of the 27 ZFPs tested, the level of lacZ activation observed was in excellent agreement with predicted energies (Figure 2a). Expression of the two ZFPs with the strongest predicted binding energy was toxic to cells, preventing analysis of these constructs. Several models describing the relationship between predicted and measured activity were evaluated, with segmental linear regression providing the best fit (r=0.77; Figure 2b, dashed line). Inspection of the data revealed that the GTA-specific module (QSSSLVR) was present in most ZFPs that exhibited significantly greater activation than predicted (Figure 2b, red diamonds). Excluding ZFPs containing this module from the analysis increased the correlation coefficient to 0.86 (Figure 2b, solid line).

Figure 2. Predicted binding energies are highly correlated with in vivo activity in a B2H assay. Twenty-seven three-module ZFPs were designed by modular assembly to span a broad range of predicted binding affinities. (a) DNA recognition helix sequences, DNA sequence targets, predicted energies, measured fold-activation in the B2H assay, and standard error of the mean are listed for each construct. Entries are sorted from lowest to highest predicted ΔΔG. Constructs marked with an asterisk were also tested in vitro (Figure 4). (b) ZFP activities in the B2H assay are plotted versus predicted energies. The two constructs with highest predicted affinities were toxic to their host cells and therefore could not be included. Points shown as red diamonds correspond to proteins containing the GTA-specific QSSSLVR module (see text). Best-fit lines from a segmental linear regression model using all points (dashed line, r=0.77), or excluding red points (solid line, r=0.86) are shown. (c) Same as (b), except that values indicated by red triangles were adjusted assuming a binding affinity of 2.5 nM, rather than 25 nM, for the GTA-specific QSSSLVR module; this increases the correlation coefficient to 0.86 (see text for details).
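A continuous two-segment (‘hinge’) model is one common way to implement segmental linear regression, and the sketch below fits it by least squares over a grid of candidate breakpoints. The software and exact parameterization used for the segmental fits are not stated in this section, so the hinge form, the breakpoint grid and the synthetic data (activity falling with ΔΔG and flattening at about 5 kcal/mol) are all assumptions for illustration.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_segmented(xs, ys, breakpoints):
    """Least-squares fit of y = b0 + b1*x + b2*max(0, x - x0).

    The two segments meet at x0; each candidate x0 must lie strictly inside
    the range of xs so the normal equations are not singular.
    Returns the best x0 and the coefficients [b0, b1, b2].
    """
    best = None
    for x0 in breakpoints:
        rows = [(1.0, x, max(0.0, x - x0)) for x in xs]
        ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
        b = solve3(ata, aty)
        sse = sum((b[0] + b[1] * r[1] + b[2] * r[2] - y) ** 2 for r, y in zip(rows, ys))
        if best is None or sse < best[0]:
            best = (sse, x0, b)
    return best[1], best[2]


# Synthetic data: activity falls with ddG until 5 kcal/mol, then stays at 1.
xs = [2.0 + 0.5 * i for i in range(13)]              # 2.0 ... 8.0 kcal/mol
ys = [1.0 + 2.0 * max(0.0, 5.0 - x) for x in xs]     # fold activation
x0, coef = fit_segmented(xs, ys, [3.0 + 0.25 * i for i in range(17)])
print(x0, [round(c, 3) for c in coef])
```

The fitted breakpoint plays the role of an activity threshold: below it activation depends on predicted energy, above it activation stays near the baseline.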
The predictions described above relied on published in vitro binding affinities for ZFPs in which modules were evaluated in a fixed context (15) to estimate binding contributions of individual modules. In an alternative approach, we predicted ZFP performance by solving for individual module contributions as component variables of a system of linear equations. Briefly, in constructing the 27 different three-finger proteins, nine ZFDs were each used approximately 8–10 times (approximately three times at each of the three possible positions, Table 1). Assuming that the energy contributions of individual ZFDs in a ZFP are additive, the B2H activity of each ZFP was considered to result from its particular combination of modules (Supplementary Figure 1). Individual module contributions were calculated for each ZFP using a leave-one-out linear system solution. Expected lacZ activation in the B2H assay for each of the ZFPs was then predicted by summing individual module contributions. As shown in Figure 3, expected levels of activation computed in this manner were highly correlated with actual B2H activity.
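The leave-one-out procedure can be sketched with ordinary least squares. The text does not state how the linear system was solved or weighted, so this sketch assumes unweighted least squares on module-count vectors (a module used twice in one ZFP counts twice), uses invented module labels and activities, and is written in plain Python to avoid dependencies. With real data the remaining equations must stay full rank when one ZFP is held out, or the solver will fail.

```python
def solve(a, b):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def module_counts(zfp, modules):
    """One row of the design matrix: how often each module occurs in a ZFP."""
    return [float(zfp.count(mod)) for mod in modules]


def leave_one_out(zfps, activity, modules):
    """Predict each ZFP from module contributions fitted on all the others."""
    p = len(modules)
    preds = []
    for k in range(len(zfps)):
        rows = [module_counts(z, modules) for i, z in enumerate(zfps) if i != k]
        ys = [y for i, y in enumerate(activity) if i != k]
        ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
        aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
        w = solve(ata, aty)  # fitted per-module contributions
        preds.append(sum(c * wi for c, wi in zip(module_counts(zfps[k], modules), w)))
    return preds


MODULES = ["A", "B", "C", "D"]                       # hypothetical module labels
ZFPS = [("A", "B", "C"), ("A", "B", "D"), ("A", "C", "D"), ("B", "C", "D"),
        ("A", "A", "B"), ("C", "D", "D"), ("B", "B", "C")]
TRUE = {"A": 3.0, "B": 2.0, "C": 1.0, "D": 0.5}      # invented contributions
ACTIVITY = [sum(TRUE[mod] for mod in z) for z in ZFPS]  # perfectly additive data
print([round(x, 3) for x in leave_one_out(ZFPS, ACTIVITY, MODULES)])
```

On perfectly additive toy data the held-out predictions reproduce the true activities; on real B2H data, the gap between each prediction and its measurement is what Figure 3 summarizes.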
The energy contributions computed using a system of linear equations to analyze in vivo activity data from the B2H assay indicate that the GTA-specific QSSSLVR module binds with higher affinity than previous Kd estimates suggested. This is consistent with our conclusion based on inspection of energies computed from in vitro binding constants (Figure 2b). We estimated a new value for this module by calculating the Kd that optimizes the correlation of predicted energies with the B2H data. This approach resulted in an estimated Kd of 2.5 nM for this module, 10-fold lower than the previously reported value of 25 nM (15). Incorporating this new estimate improved the correlation between the in vitro energy model and in vivo fold activation data (r=0.86, Figure 2c).
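Because each copy of a module contributes one logarithmic term to Equation (4), revising the QSSSLVR estimate from 25 nM to 2.5 nM lowers the predicted ΔΔG of every ZFP containing that module by the same amount, whatever the value of Kd for the STD. As a worked check of the size of that shift (per copy of the module):

$$\Delta(\Delta\Delta G) = RT \ln \frac{25\ \mathrm{nM}}{2.5\ \mathrm{nM}} = 0.58\ \mathrm{kcal/mol} \times \ln 10 \approx 1.34\ \mathrm{kcal/mol}$$

This uniform shift moves the QSSSLVR-containing points (red triangles in Figure 2c) toward lower predicted energies as a group.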
To directly evaluate the effects of individual module affinities on in vivo performance, sets of related ZFPs designed to vary at a single module position were analyzed for differences in B2H activity (Table 2). For all three sets of ZFPs in which the F1 position was varied (while F2 and F3 were fixed), the greatest in vivo activity was observed when the F1 position contained a high affinity module; the least activity was observed with a low affinity module in this position. The same trend was observed for all four groups in which the F3 position was varied while the F1 and F2 fingers were fixed. For sets in which the F2 position was varied, only one strong module (TSGSLVR) and one moderate affinity module (QSSSLVR) were tested. In these cases, the moderate affinity module outperformed the high affinity module. These results suggest that the effect of single module substitutions on relative binding affinity can be predicted reliably in most cases.

Table 2. Single module substitutions in ZFPs alter target affinity. Individual ZFDs, represented by their DNA recognition helix, are shaded to indicate their affinity class (Strong, Moderate, Weak; see Table 1). For each subgroup (demarcated by horizontal lines), modules in two positions were held constant while the position indicated in the leftmost column was varied. Fold activation denotes performance of the ZFP in the B2H assay. Toxic refers to the poor growth of E. coli cultures observed when cells expressed certain ZFPs.

Table 1. ZFP variants that differ in the middle (F2) position bind targets with variable affinities. The data in this table were reported by Segal et al. (15). Each ZFD in the F2 position was selected to bind a particular target triplet. The F1 and F3 modules (derived from Zif268) were the same in each construct. A binding affinity constant for each ZFP variant was determined using an EMSA. Here, modules are classified based on the affinity of the ZFPs for their cognate target sites (Strong, Moderate, Weak). Three modules from each class were chosen to construct 27 diverse three-module ZFPs (Table 2).
In summary, three lines of analysis, namely (i) predictions based on in vitro binding constants for modules in a fixed context, (ii) predictions derived from a system of linear equations based on in vivo performance and (iii) analysis of the effects of various single finger substitutions in vivo, demonstrate that in vivo performance of ZFPs can be predicted based on the DNA-binding affinities of individual ZFDs.
In vitro DNA binding affinities of ZFPs are highly correlated with predictions
Our success in estimating the activities of ZFPs in the B2H assay suggested that our scoring scheme could be applied more generally to predict in vitro ZFP affinities. To test whether activation measured in the B2H assay directly reflects DNA binding affinity for the desired target site, nine of the 27 engineered proteins, along with a control Zif268 protein, were chosen for in vitro binding affinity measurements (Figure 4). Kd values were determined using fluorescence anisotropy (FA), a rapid and reproducible solution-based DNA binding assay that allows computation of the bound fraction of a fluorescently labeled ligand, based on the decrease in its rotational velocity due to binding (33,34).
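The conversion from polarized intensities to a bound fraction rests on standard relations that the text does not spell out. This sketch assumes the usual L-format definitions (I_VV and I_VH are emission intensities polarized parallel and perpendicular to vertically polarized excitation, and the G factor, I_HV/I_HH, corrects for detector polarization bias) and converts anisotropy to fraction bound under the simplifying assumption that the dye's brightness is unchanged by binding. Function names and intensities are hypothetical.

```python
def g_factor(i_hv: float, i_hh: float) -> float:
    """Instrument G factor from horizontally excited intensities."""
    return i_hv / i_hh


def anisotropy(i_vv: float, i_vh: float, g: float = 1.0) -> float:
    """Steady-state anisotropy r = (I_VV - G*I_VH) / (I_VV + 2*G*I_VH)."""
    return (i_vv - g * i_vh) / (i_vv + 2.0 * g * i_vh)


def fraction_bound(r: float, r_free: float, r_bound: float) -> float:
    """Bound fraction, assuming binding does not change fluorescence intensity."""
    return (r - r_free) / (r_bound - r_free)


# Invented intensities for illustration only.
g = g_factor(i_hv=1000.0, i_hh=1000.0)
r = anisotropy(i_vv=1300.0, i_vh=1000.0, g=g)
print(round(r, 4), round(fraction_bound(r, r_free=0.05, r_bound=0.20), 3))
```

If binding does change the dye's quantum yield, the linear interpolation in fraction_bound needs an intensity correction before it can be fitted as a binding isotherm.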
As shown in Figure 5, binding affinity constants determined by FA were highly correlated with predicted energies. The two ZF proteins with highest predicted affinities were toxic to bacterial cells, prohibiting purification of sufficient quantities of protein for in vitro analysis. Predicted energies computed from in vitro affinity measurements for modules in a fixed context (15) were proportional to the log of the Kd values measured in our experiments (r=0.91) (Figure 5a). As before, assuming a Kd of 2.5 nM for the QSSSLVR module significantly improved the correlation (r=0.97). Predicted in vivo activation levels generated by the leave-one-out linear system method were also highly correlated with experimentally determined binding constants (r=0.93; Figure 5b). Thus, results obtained using a rapid and reliable spectroscopic method suggest that ZFP binding affinities measured in vitro generally correspond to results obtained in vivo using the B2H system. This demonstrates that our rule-based strategy can be used to predict ZFP DNA binding affinity.
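Correlations such as r=0.91 compare predicted ΔΔG with log(Kd) rather than Kd itself. Because Equation (3) makes ΔΔG proportional to the logarithm of a Kd ratio, a perfect prediction would make log(Kd) a linear function of ΔΔG. A minimal Pearson calculation follows; the values are invented stand-ins, since the Figure 4b data are not reproduced in this text.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented example: predicted ddG (kcal/mol) versus measured Kd (nM).
ddg = [2.5, 3.4, 4.1, 4.8, 5.6, 6.3, 7.0]
kd = [1.2, 4.0, 9.5, 30.0, 55.0, 210.0, 400.0]
print(round(pearson_r(ddg, [math.log10(k) for k in kd]), 3))
```

With only seven proteins, a single misestimated module (as with QSSSLVR) can move r noticeably, which is why the revised 2.5 nM estimate changes the correlation from 0.91 to 0.97.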
A binding energy threshold for ZFP function in vivo
To evaluate the generality of this rule-based approach, we calculated predicted energies for another set of modularly assembled ZFPs that had been previously evaluated using the B2H system (25). From 168 modularly assembled ZFPs, we selected all 26 ZFPs comprising GNN or TGG modules for which published in vitro DNA binding affinity constants are available [measured in the F2 position of the standard Zif268 variant backbone (15)]. As shown in Figure 6a, based on a segmental linear regression model, binding energies for 24 of these 26 ZFPs are highly correlated (r=0.80) with reported B2H activity measurements. These results are also in excellent agreement with the results described above and shown in Figure 2b, although slightly higher activation levels were uniformly observed in the latter experiments. Notably, both sets of experiments identify a ΔΔG of ∼5 kcal/mol (corresponding to a Kd of ∼100 nM) as the threshold for zinc-finger function in vivo (in bacterial cells). We also used the scoring function generated from the B2H experiments performed by us (and shown in Figure 2c) to predict B2H activity for the 24 ZFPs evaluated by Ramirez et al. (25). Again, the predicted and measured fold-activation scores were in close agreement, with a correlation coefficient of 0.79 (Figure 6b). Taken together, these results suggest that the scoring function developed and evaluated here may be generally applicable to ZFPs assembled using the Barbas lab modules.
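The threshold can be restated through Equation (4): exponentiating ΔΔG/RT gives the product of the per-module Kd ratios relative to the STD that a predicted energy implies. At ∼5 kcal/mol this is roughly 5.5 × 10³-fold. The hard cut-off used below to flag designs is a simplification of the approximate threshold reported above, and the function names are hypothetical.

```python
import math

RT = 0.58          # kcal/mol, as used with Equation (3)
THRESHOLD = 5.0    # approximate ddG threshold for B2H activity (kcal/mol)


def kd_fold_over_std(ddg_kcal: float) -> float:
    """Product of per-module Kd ratios implied by a ddG under Equation (4)."""
    return math.exp(ddg_kcal / RT)


def likely_to_fail(ddg_kcal: float, threshold: float = THRESHOLD) -> bool:
    """Flag designs whose predicted ddG lies above the approximate threshold."""
    return ddg_kcal > threshold


print(f"{kd_fold_over_std(THRESHOLD):.0f}-fold")     # roughly 5.5e3
print([likely_to_fail(x) for x in (2.1, 4.9, 8.2)])   # [False, False, True]
```

Since every 1.34 kcal/mol (RT ln 10) corresponds to a 10-fold change in Kd, a design near the threshold can usually be rescued only by swapping in a substantially stronger module.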
DISCUSSION

Using a rule-based strategy that combines experimentally determined binding energies of individual ZFDs, we were able to compute binding energies for ZFPs made from a particular set of well-characterized GNN modules (15). We also showed that these predicted binding energies are in excellent agreement (r=0.91; Figure 5a) with binding affinity constants measured directly in vitro. Furthermore, we showed a strong correlation between these computed binding energies and ZFP activities in a B2H system for two different sets of modularly assembled three-finger ZFPs. This is an important advance because a ZFP that lacks activity in the B2H system will also have a high probability of failing to function as a ZFN in human cells (24–26). Thus, using only our scoring method, researchers can now identify target sites that will have a high probability of failing to yield functional zinc-finger arrays by the method of modular assembly. Our rule-based strategy will thus allow researchers to focus their modular assembly efforts on a smaller number of target sites with a higher probability of success.

Figure 3. ZFDs contribute additively to B2H activity, independent of context and position. For each ZF protein, the expected contribution to B2H activity from each of its component modules was estimated by solving a system of linear equations representing the other 24 proteins (see text). Comparison of actual versus predicted B2H activity (expressed as relative fold-activation in the B2H assay) reveals a high correlation.

Figure 4. Determining binding affinity constants using fluorescence anisotropy. (a) A representative in vitro binding isotherm obtained using FA. Data points for each ZFP were collected using three separate purified protein preparations, each assayed for binding activity on a different day. Curve fitting was performed using Prism. (b) Kd values for seven modularly assembled ZFPs, determined in FA experiments. Note that two ZFPs were toxic to host cells, preventing purification of proteins in quantities required for in vitro analysis.

Figure 5. In vitro affinity constants for ZFPs are highly correlated with predictions. Using affinity constants for seven ZFPs determined by FA (Figure 4b), log(Kd) is plotted against: (a) predicted energy, expressed as ΔΔG in kcal/mol (r=0.91) and (b) predicted B2H activity based on a leave-one-out system of linear equations analysis (r=0.93).

Figure 6. Predicted ZFP performance agrees with in vivo activity for an independently generated set of ZFPs. Data shown are for 24 of 26 ZFPs containing characterized GNN or TGG modules, constructed and evaluated by Ramirez et al. (25). (a) A segmental linear regression model provides an excellent fit of in vivo ZF-induced fold activation measured in the B2H assay with predicted binding energies (r=0.80). (b) ZF-induced fold activation values measured in the B2H assay for 24 ZFPs from Ramirez et al. (25) are also highly correlated (r=0.79) with predicted fold-activation levels calculated based on a scoring function derived from the segmental linear regression model fit for the 25 ZFPs shown in Figure 2a (see text for details). Note: predictions for 2 of 26 ZFPs containing characterized GNN or TGG modules from Ramirez et al. (25) were considered outliers (values were outside the range included in these graphs); they were not included in the regression analysis.
We believe that our results also provide one potential explanation for the discrepancy between the overwhelming success rates in a previous in vitro report (35) and the low in vivo success rates observed for ZFPs in the recent study of Ramirez et al. (25): many of the modules used to perform modular assembly have relatively low affinities. Our data suggest, in fact, that 30–50% of potential three-finger ZFPs made wholly from the Barbas GNN modules will fail to function in the B2H system, a result in agreement with the recently published results of Ramirez et al. (25).
Although our results demonstrate that the energy contributions of individual ZFDs in a ZFP array are additive, we also believe they lend additional support to the notion that context is an important parameter that should be accounted for when engineering multi-finger ZFPs (i.e. that a single ZFD module will not always be optimal or adequate for recognition of its cognate 3-bp subsite in different multi-finger ZFP contexts). For example, our data show that although a weak finger will sometimes be found in a nonfunctional ZFP array (if it is joined together with other weak affinity ZFDs), it will also sometimes be found in functional arrays when paired with stronger affinity ZFDs. In addition, our data show that although strong fingers will sometimes be found in functional ZFPs, they can also be found in nonfunctional ZFPs. Furthermore, the use of three strong fingers in a ZFP can lead to toxicity in E. coli cells. Although the precise mechanism of this toxicity is unclear, a reasonable hypothesis is that excessively high affinity leads to binding to related but off-target sequences with sufficient affinity to cause biological consequences (essentially, excessive affinity leading to problems of specificity). Thus, our data further reinforce the ideas that individual ZFDs do not function completely independently and that the specific attributes of neighboring fingers do matter in the context of engineering a multi-finger ZFP.
The importance of context-dependent effects also suggests that identification of additional ZFDs with variable affinities for GNN triplets may be needed if the efficiency of modular assembly is to be improved. If such ZFDs were available, it might be possible to achieve higher success rates for modular assembly by creating several ZFPs for a given target site so as to identify a combination that balances affinities (and presumably, specificities) of its component ZFDs. A related point is that our findings also suggest one possible reason why more complex selection-based methods that account for context-dependent effects [e.g. the OPEN method recently described by Joung and colleagues (26,32)] may be more successful than modular assembly: these methods are able to balance the overall affinity and specificity of the final ZFP array by identifying optimal combinations from various ZFDs with a range of affinities and specificities for their target 3-bp subsites.
The strong correlations among predicted binding ener-
gies, in vivo activities, and in vitro binding affinity con-
stants for the ZFPs analyzed in this work suggests that
our rule-based approach might be extended to evaluate
arrays assembled using GNN modules from other sources
(17,19) and non-GNN modules. We have not yet evalu-
ated such modules, but our work demonstrates two ways
this could be achieved: (i) by directly measuring in vitro
binding constants for modules in the F2 position of a
standardized ZFP framework and (ii) by computing indi-
vidual module contributions to ZFP binding as compo-
nent variables of a system of linear equations that describe
their activities (measured in vivo in this work, but in vitro
binding constants could also be used). The energy
scoring scheme proposed here will allow researchers to
determine whether a modular assembly strategy is likely
to be feasible for specific targets of interest, based on
currently available well-characterized modules, or whether an
alternative selection-based engineering strategy should be pursued.
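Route (ii) amounts to solving an overdetermined system of linear equations. The sketch below estimates per-module contributions by ordinary least squares. For illustration it assumes that each ZFP's measured value (an activity or a binding energy) is the plain sum of one contribution per module, independent of finger position. The function names and toy data are hypothetical, and the text above does not specify which solver or weighting was actually used.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with
    partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: some modules are not identifiable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_module_contributions(proteins, measured):
    """Least-squares per-module contributions, assuming each protein's
    measured value is the plain sum of its modules' contributions."""
    modules = sorted({mod for p in proteins for mod in p})
    index = {mod: i for i, mod in enumerate(modules)}
    n = len(modules)
    # Design matrix A: rows[j][i] counts occurrences of module i in protein j.
    rows = []
    for p in proteins:
        row = [0.0] * n
        for mod in p:
            row[index[mod]] += 1.0
        rows.append(row)
    # Normal equations: (A^T A) x = A^T b.
    ata = [[sum(r[i] * r[k] for r in rows) for k in range(n)] for i in range(n)]
    atb = [sum(r[i] * y for r, y in zip(rows, measured)) for i in range(n)]
    return dict(zip(modules, solve(ata, atb)))


# Toy data generated from known contributions, to show recovery.
proteins = [("M1", "M2", "M3"), ("M1", "M2", "M4"), ("M1", "M3", "M4"),
            ("M2", "M3", "M4"), ("M1", "M1", "M2")]
measured = [-9.0, -6.0, -8.0, -7.0, -8.0]
print(fit_module_contributions(proteins, measured))
```

With these consistent toy data the fit recovers M1 = -3, M2 = -2, M3 = -4 and M4 = -1 up to rounding. With real, noisy measurements it returns the best additive approximation instead. A `ValueError` means the assembled proteins do not separate the effects of some modules, for example two modules that always occur together.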
A recent study on the use of ZFNs for homologous
recombination cited lack of specificity as a primary deter-
minant of ZFN-mediated toxicity in human cells (24).
A likely mechanism for ZFN-induced toxicity is through
binding to genomic sequences similar to the desired target
sequence. As noted above, we observed toxicity in bacter-
ial cells for several ZFPs, even in the absence of a fused
nuclease domain, suggesting that ZFP binding to certain
sites in genomic DNA can be toxic, particularly for high
affinity ZFPs. Although, to our knowledge, this is the first
published report of such toxicity in bacterial cells, similar
toxicity has been observed previously for several other sites
(Joung,J.K., unpublished data). However, bacterial
expression of ZFPs with affinities in the pM range, with no toxic
effects, has also been reported (19,36,37). High-through-
put chip or microfluidics-based DNA binding experiments
(38–41) could be used to obtain affinity and specificity
data for virtually every possible target site for a given
ZFP, providing additional insight into ZFP-induced toxi-
city and into the fundamental rules that govern the affinity
and specificity of DNA recognition by zinc-finger DNA-binding proteins.
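Binding to genomic sequences that resemble the intended site is the proposed toxicity mechanism, so a first step in such an analysis is to enumerate sites within a few mismatches of the target. The sketch below is a naive illustration: it uses Hamming distance only, with no insertions or deletions and no weighting of mismatches by finger or position. The function name and example sequences are hypothetical.

```python
def near_matches(genome, target, max_mismatches):
    """Return (position, strand, window, mismatches) for every window of
    `genome` within `max_mismatches` substitutions of `target` on either
    strand. Position is the 0-based start on the forward strand."""
    genome, target = genome.upper(), target.upper()
    rc_target = target.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    k = len(target)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        # Comparing the forward window with the reverse complement of the
        # target is equivalent to searching the minus strand.
        for strand, probe in (("+", target), ("-", rc_target)):
            mismatches = sum(1 for a, b in zip(window, probe) if a != b)
            if mismatches <= max_mismatches:
                hits.append((i, strand, window, mismatches))
    return hits


genome = "TTGAAGCGGTCAAGAAGCGTTCAA"
for hit in near_matches(genome, "GAAGCGGTC", max_mismatches=1):
    print(hit)
```

In this toy sequence the scan reports the exact site at position 2 and a one-mismatch site at position 13. Ranking such hits by predicted affinity, rather than by mismatch count, would require per-finger energy data of the kind discussed above.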
A correlation between ZFP binding constants measured
in vitro and functional activity measured in vivo has
also been observed by others using different reporter sys-
tems (37). A similar degree of correlation was observed
using the B2H system in our study (Supplementary
Figure 2). Our results further demonstrate that measur-
able ZFP activity in an in vitro binding assay does not
necessarily translate into adequate function in vivo, in
agreement with Beerli et al. (42). However, the energy
threshold we determined for ZFP activity in vivo, using
B2H assays, corresponds to a Kd of ~100 nM, and thus
differs from the estimated threshold Kd of ~10 nM
reported as the minimum affinity necessary for ZFP func-
tion in mammalian cells (42). The significance of this dif-
ference between thresholds determined in bacterial and
mammalian cells is difficult to evaluate, given that func-
tional assays and Kd measurements were performed in
different laboratories using different assays and with ZFPs
containing different numbers of fingers.
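Binding energies and dissociation constants are related by ΔG° = RT ln Kd (Kd in molar units, 1 M standard state). This lets the gap between the two thresholds be expressed in energy terms: a 10-fold difference in Kd corresponds to RT ln 10. The short calculation below assumes 25 °C, because the temperatures of the original measurements are not stated here. The helper name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar, temp_k=298.15):
    """Standard binding free energy in kcal/mol from a dissociation constant
    in molar units (1 M standard state): dG = R*T*ln(Kd).
    More negative values mean tighter binding."""
    return R_KCAL * temp_k * math.log(kd_molar)


dg_100nM = delta_g_from_kd(100e-9)  # about -9.55 kcal/mol at 25 C
dg_10nM = delta_g_from_kd(10e-9)    # about -10.91 kcal/mol at 25 C
print(round(dg_10nM - dg_100nM, 2))  # -1.36
```

At 37 °C the same 10-fold ratio corresponds to about 1.42 kcal/mol. The two thresholds therefore differ by roughly 1.4 kcal/mol regardless of which of these assay temperatures is assumed.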
Stormo and colleagues (43–47) have shown that the
DNA-binding specificity of ZFPs can be effectively pre-
dicted from additive energy contributions of individual
residues that make base-specific contacts with target
site nucleotides. Our results complement this idea by
demonstrating that the affinity of ZFPs also can be pre-
dicted, using affinity data for component modules. Our
application of the binding energy additivity concept differs
somewhat from that used by Stormo to predict specificity
in that it assumes additivity of energy contributions at the
individual finger rather than individual residue level. Also,
our approach implicitly includes energetic contributions of
residues that are not directly involved in base contacts
(e.g. phosphate contacts), as well as energetic contributions
resulting from context-dependent effects that presumably occur
among recognition helix residues within individual fingers.
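Finger-level additivity reduces to a sum over modules. In the minimal sketch below, a 9-bp site is split into three triplets in finger order, following the Zif268-type arrangement in which F1 binds the 3'-most triplet and F3 the 5'-most. The predicted binding energy is then the sum of the per-module energies. The energy table and the activity cutoff are hypothetical placeholders, not the values derived in this work, and the model ignores any position dependence of a module's contribution.

```python
# Hypothetical per-module energies (kcal/mol), keyed by 3-bp subsite.
MODULE_ENERGY = {"GAA": -2.4, "GCG": -3.1, "GTC": -2.0, "GGG": -3.5}


def finger_triplets(site):
    """Return the triplets bound by F1, F2 and F3 for a 9-bp site written
    5'->3': F1 binds the 3'-most triplet, F3 the 5'-most."""
    if len(site) != 9:
        raise ValueError("expected a 9-bp target site")
    return [site[6:9], site[3:6], site[0:3]]


def predicted_energy(site, energies=MODULE_ENERGY):
    """Finger-level additive model: sum of the three module energies."""
    return sum(energies[t] for t in finger_triplets(site.upper()))


def likely_functional(site, cutoff=-7.0, energies=MODULE_ENERGY):
    """Flag sites whose predicted energy is at or below a (hypothetical)
    cutoff; more negative means tighter predicted binding."""
    return predicted_energy(site, energies) <= cutoff


print(finger_triplets("GAAGCGGTC"))   # ['GTC', 'GCG', 'GAA']
print(predicted_energy("GAAGCGGTC"))  # about -7.5
```

A residue-level model in the style of Stormo and colleagues would instead sum contributions over individual residue-base contacts. This sketch keeps the finger as the unit of additivity, so any within-finger context effects are absorbed into each module's single energy value.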
The apparent simplicity of modular assembly has
contributed to the current focus on C2H2 ZFDs as the
domains of choice for designing custom DNA binding
proteins. Our results make it possible, for the first time,
to reliably identify prospective binding sites that are unlikely
to yield functional ZFPs by modular assembly
using a set of GNN-specific finger modules. The rule-
based strategy presented here can provide accurate
guidance for both in vitro binding affinities and in vivo
functionality for engineered ZFPs by computing energy
contributions of individual ZFDs. We have updated the
Zinc Finger Targeter (ZiFiT) web server
(http://bindr.gdcb.iastate.edu/ZiFiT) (48) so that it now provides users
with a list of potential ZFP-target site pairs for a desired
genomic sequence, scored according to the procedures
developed and validated in this work.
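The kind of output described for the updated ZiFiT server, with candidate target sites in a user sequence ranked by a predicted score, can be approximated in a short sketch. It scans for sites that GNN modules could cover (the pattern GNNGNNGNN on either strand) and scores them additively. This is not ZiFiT's code or scoring table. The energies are hypothetical, triplets without an energy entry are treated as unsupported, and a real tool may apply further criteria.

```python
import re

# Hypothetical per-module energies (kcal/mol) for a few GNN triplets.
MODULE_ENERGY = {"GAA": -2.4, "GCG": -3.1, "GTC": -2.0, "GGG": -3.5}
# Lookahead so that overlapping candidate sites are all reported.
SITE = re.compile(r"(?=(G[ACGT]{2}G[ACGT]{2}G[ACGT]{2}))")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def ranked_sites(sequence, energies=MODULE_ENERGY):
    """Return (energy, strand, position, site) for every GNNGNNGNN site whose
    three triplets all have an energy entry, tightest (most negative) first.
    Position is the 0-based start of the site on the strand it was found on."""
    seq = sequence.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for match in SITE.finditer(s):
            site = match.group(1)
            triplets = [site[0:3], site[3:6], site[6:9]]
            if all(t in energies for t in triplets):
                energy = sum(energies[t] for t in triplets)
                hits.append((energy, strand, match.start(), site))
    return sorted(hits)


for hit in ranked_sites("TTGAAGCGGTCAA"):
    print(hit)
```

Sorting the tuples puts the most negative predicted energy first. Sites that fall above an activity threshold could then be flagged for a selection-based method rather than modular assembly.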
Supplementary Data are available at NAR Online.
We thank members of our groups and colleagues, espe-
cially Fengli Fu, Deepak Reyon, David Wright, Ronnie
Winfrey, Ben Lewis, Bob Farnham, Abd Elhamid Azzaz,
Les Miller, Gaya Amarasinghe and Vasant Honavar and
the referees for their helpful suggestions and valuable
feedback. We also thank Guru Rao for the use of his
National Institutes of Health (GM066387 to D.D.);
National Science Foundation (DBI0501678 to D.F.V.);
GM078369 to J.K.J.); and graduate research assistantships
provided by United
IGERT0504304 and ISU’s Center for Integrated Animal
Genomics (CIAG). Funding for open access charge:
National Science Foundation (DBI0501678).
Conflict of interest statement. None declared.
1. Durai,S., Mani,M., Kandavelou,K., Wu,J., Porteus,M.H. and
Chandrasegaran,S. (2005) Zinc finger nucleases: custom-designed
molecular scissors for genome engineering of plant and mammalian
cells. Nucleic Acids Res., 33, 5978–5990.
2. Klug,A. (2005) Towards therapeutic applications of engineered zinc
finger proteins. FEBS Lett., 579, 892–894.
3. Porteus,M.H. and Carroll,D. (2005) Gene targeting using zinc finger
nucleases. Nat. Biotechnol., 23, 967–973.
4. Wu,J., Kandavelou,K. and Chandrasegaran,S. (2007) Custom-
designed zinc finger nucleases: what is next? Cell. Mol. Life Sci., 64,
5. Cathomen,T. and Joung,J.K. (2008) Zinc-finger nucleases: the next
generation emerges. Mol. Ther., 16, 1200–1207.
6. Desjarlais,J.R. and Berg,J.M. (1993) Use of a zinc-finger consensus
sequence framework and specificity rules to design specific DNA
binding proteins. Proc. Natl Acad. Sci. USA, 90, 2256–2260.
7. Jamieson,A.C., Wang,H. and Kim,S.H. (1996) A zinc finger direc-
tory for high-affinity DNA recognition. Proc. Natl Acad. Sci. USA,
8. Beerli,R.R., Segal,D.J., Dreier,B. and Barbas,C.F. 3rd. (1998)
Toward controlling gene expression at will: specific regulation of the
erbB-2/HER-2 promoter by using polydactyl zinc finger proteins
constructed from modular building blocks. Proc. Natl Acad. Sci.
USA, 95, 14628–14633.
9. Wolfe,S.A., Nekludova,L. and Pabo,C.O. (2000) DNA recognition
by Cys2His2 zinc finger proteins. Annu. Rev. Biophys. Biomol.
Struct., 29, 183–212.
10. Pabo,C.O., Peisach,E. and Grant,R.A. (2001) Design and selection
of novel Cys2His2 zinc finger proteins. Annu. Rev. Biochem., 70,
11. Segal,D.J. (2002) The use of zinc finger peptides to study the role of
specific factor binding sites in the chromatin environment. Methods,
12. Pavletich,N.P. and Pabo,C.O. (1991) Zinc finger-DNA recognition:
crystal structure of a Zif268-DNA complex at 2.1 Å. Science, 252,
13. Elrod-Erickson,M., Rould,M.A., Nekludova,L. and Pabo,C.O.
(1996) Zif268 protein-DNA complex refined at 1.6 Å: a model
system for understanding zinc finger-DNA interactions. Structure,
14. Miller,J.C. and Pabo,C.O. (2001) Rearrangement of side-chains in a
Zif268 mutant highlights the complexities of zinc finger-DNA
recognition. J. Mol. Biol., 313, 309–315.
15. Segal,D.J., Dreier,B., Beerli,R.R. and Barbas,C.F. 3rd. (1999)
Toward controlling gene expression at will: selection and design of
zinc finger domains recognizing each of the 5’-GNN-3’ DNA target
sequences. Proc. Natl Acad. Sci. USA, 96, 2758–2763.
16. Wolfe,S.A., Greisman,H.A., Ramm,E.I. and Pabo,C.O. (1999)
Analysis of zinc fingers optimized via phage display: evaluating the
utility of a recognition code. J. Mol. Biol., 285, 1917–1934.
17. Liu,Q., Xia,Z., Zhong,X. and Case,C.C. (2002) Validated zinc finger
protein designs for all 16 GNN DNA triplet targets. J. Biol. Chem.,
18. Dreier,B., Beerli,R.R., Segal,D.J., Flippin,J.D. and Barbas,C.F. 3rd.
(2001) Development of zinc finger domains for recognition of the 5’-
ANN-3’ family of DNA sequences and their use in the construction
of artificial transcription factors. J. Biol. Chem., 276, 29466–29478.
19. Bae,K.H., Kwon,Y.D., Shin,H.C., Hwang,M.S., Ryu,E.H.,
Park,K.S., Yang,H.Y., Lee,D.K., Lee,Y., Park,J. et al. (2003)
Human zinc fingers as building blocks in the construction of arti-
ficial transcription factors. Nat. Biotechnol., 21, 275–280.
20. Dreier,B., Fuller,R.P., Segal,D.J., Lund,C.V., Blancafort,P.,
Huber,A., Koksch,B. and Barbas,C.F. 3rd. (2005) Development of
zinc finger domains for recognition of the 5’-CNN-3’ family DNA
sequences and their use in the construction of artificial
transcription factors. J. Biol. Chem., 280, 35588–35597.
21. Alwin,S., Gere,M.B., Guhl,E., Effertz,K., Barbas,C.F. 3rd.,
Segal,D.J., Weitzman,M.D. and Cathomen,T. (2005) Custom
zinc-finger nucleases for use in human cells. Mol. Ther., 12,
22. Beumer,K., Bhattacharyya,G., Bibikova,M., Trautman,J.K. and
Carroll,D. (2006) Efficient gene targeting in Drosophila with zinc-
finger nucleases. Genetics, 172, 2391–2403.
23. Segal,D.J., Crotty,J.W., Bhakta,M.S., Barbas,C.F. 3rd. and
Horton,N.C. (2006) Structure of Aart, a designed six-finger zinc
finger peptide, bound to DNA. J. Mol. Biol., 363, 405–421.
24. Cornu,T.I., Thibodeau-Beganny,S., Guhl,E., Alwin,S.,
Eichtinger,M., Joung,J.K. and Cathomen,T. (2008) DNA-binding
specificity is a major determinant of the activity and toxicity of zinc-
finger nucleases. Mol. Ther., 16, 352–358.
25. Ramirez,C.L., Foley,J.E., Wright,D.A., Muller-Lerch,F.,
Rahman,S.H., Cornu,T.I., Winfrey,R.J., Sander,J.D., Fu,F.,
Townsend,J.A. et al. (2008) Unexpected failure rates for modular
assembly of engineered zinc fingers. Nat. Methods, 5, 374–375.
26. Maeder,M.L., Thibodeau-Beganny,S., Osiak,A., Wright,D.A.,
Anthony,R.M., Eichtinger,M., Jiang,T., Foley,J.E., Winfrey,R.J.,
Townsend,J.A. et al. (2008) Rapid ‘‘open-source’’ engineering of
customized zinc-finger nucleases for highly efficient gene modifica-
tion. Mol. Cell, 31, 294–301.
27. Wright,D.A., Thibodeau-Beganny,S., Sander,J.D., Winfrey,R.J.,
Hirsh,A.S., Eichtinger,M., Fu,F., Porteus,M.H., Dobbs,D.,
Voytas,D.F. et al. (2006) Standardized reagents and protocols for
engineering zinc finger nucleases by modular assembly. Nat. Protoc.,
28. Ryder,S.P., Frater,L.A., Abramovitz,D.L., Goodwin,E.B. and
Williamson,J.R. (2004) RNA target specificity of the STAR/GSG
domain post-transcriptional regulatory protein GLD-1. Nat. Struct.
Mol. Biol., 11, 20–28.
29. Seidman,C.E. (1997) Transformation using calcium chloride. In
Ausubel,F.M. (ed), Current Protocols in Molecular Biology, Vol. I,
Unit 1.8. John Wiley & Sons, Inc., New York.
30. Lundblad,J.R., Laurance,M. and Goodman,R.H. (1996)
Fluorescence polarization analysis of protein-DNA and protein-
protein interactions. Mol. Endocrinol., 10, 607–612.
31. LiCata,V.J. and Wowor,A.J. (2008) Applications of fluorescence
anisotropy to the study of protein-DNA interactions. Methods Cell
Biol., 84, 243–262.
32. Hurt,J.A., Thibodeau,S.A., Hirsh,A.S., Pabo,C.O. and Joung,J.K.
(2003) Highly specific zinc finger proteins obtained by directed
domain shuffling and cell-based selection. Proc. Natl Acad. Sci.
USA, 100, 12271–12276.
33. Veprintsev,D.B. and Fersht,A.R. (2008) Algorithm for prediction of
tumour suppressor p53 affinity for binding sites in DNA. Nucleic
Acids Res., 36, 1589–1598.
34. Hayouka,Z., Rosenbluh,J., Levin,A., Maes,M., Loyter,A. and
Friedler,A. (2008) Peptides derived from HIV-1 Rev inhibit HIV-1
integrase in a shiftide mechanism. Biopolymers, 90, 481–487.
35. Segal,D.J., Beerli,R.R., Blancafort,P., Dreier,B., Effertz,K.,
Huber,A., Koksch,B., Lund,C.V., Magnenat,L., Valente,D. et al.
(2003) Evaluation of a modular strategy for the construction of
novel polydactyl zinc finger DNA-binding proteins. Biochemistry,
36. Yang,W.P., Wu,H. and Barbas,C.F. 3rd. (1995) Surface plasmon
resonance based kinetic studies of zinc finger-DNA interactions.
J. Immunol. Methods, 183, 175–182.
37. Kang,J.S. (2007) Correlation between functional and binding
activities of designer zinc-finger proteins. Biochem. J., 403, 177–182.
38. Bulyk,M.L. (2006) DNA microarray technologies for measuring
protein-DNA interactions. Curr. Opin. Biotechnol., 17, 422–430.
39. Berger,M.F., Philippakis,A.A., Qureshi,A.M., He,F.S., Estep,P.W.
3rd. and Bulyk,M.L. (2006) Compact, universal DNA microarrays
to comprehensively determine transcription-factor binding site
specificities. Nat. Biotechnol., 24, 1429–1435.
40. Maerkl,S.J. and Quake,S.R. (2007) A systems approach to mea-
suring the binding energy landscapes of transcription factors.
Science, 315, 233–237.
41. Berger,M.F., Badis,G., Gehrke,A.R., Talukder,S., Philippakis,A.A.,
Pena-Castillo,L., Alleyne,T.M., Mnaimneh,S., Botvinnik,O.B.,
Chan,E.T. et al. (2008) Variation in homeodomain DNA binding
revealed by high-resolution analysis of sequence preferences. Cell,
42. Beerli,R.R., Dreier,B. and Barbas,C.F. 3rd. (2000) Positive and
negative regulation of endogenous genes by designed transcription
factors. Proc. Natl Acad. Sci. USA, 97, 1495–1500.
43. Stormo,G.D. and Fields,D.S. (1998) Specificity, free energy and
information content in protein-DNA interactions. Trends Biochem.
Sci., 23, 109–113.
44. Benos,P.V., Bulyk,M.L. and Stormo,G.D. (2002) Additivity in
protein-DNA interactions: how good an approximation is it?
Nucleic Acids Res., 30, 4442–4451.
45. Benos,P.V., Lapedes,A.S. and Stormo,G.D. (2002) Probabilistic
code for DNA recognition by proteins of the EGR family. J. Mol.
Biol., 323, 701–727.
46. Liu,J. and Stormo,G.D. (2005) Quantitative analysis of EGR pro-
teins binding to DNA: assessing additivity in both the binding site
and the protein. BMC Bioinformatics, 6, 176.
47. Liu,J. and Stormo,G.D. (2008) Context-dependent DNA recogni-
tion code for C2H2 zinc-finger transcription factors. Bioinformatics,
48. Sander,J.D., Zaback,P., Joung,J.K., Voytas,D.F. and Dobbs,D.
(2007) Zinc Finger Targeter (ZiFiT): an engineered zinc finger/
target site design tool. Nucleic Acids Res., 35, W599–W605.