Protein–Protein Docking: Overview and Performance
Kevin Wiehe, Matthew W. Peterson, Brian Pierce, Julian Mintseris,
and Zhiping Weng
Protein–protein docking is the computational prediction of protein complex structure given
the individually solved component protein structures. It is an important means for understanding
the physicochemical forces that underlie macromolecular interactions and a valuable
tool for modeling protein complex structures. Here, we report an overview of protein–protein
docking with specific emphasis on our Fast Fourier Transform-based rigid-body docking program
ZDOCK, which is consistently rated as one of the most accurate docking programs in the Critical
Assessment of Predicted Interactions (CAPRI), a series of community-wide blind tests. We also
investigate ZDOCK’s performance on a non-redundant protein complex benchmark. Finally, we
perform regression analysis to better understand the strengths and weaknesses of ZDOCK and
to suggest areas of future development for protein-docking algorithms in general.
Key Words: Protein–protein docking; ZDOCK; RDOCK; Fast Fourier Transform; benchmark;
CAPRI; shape complementarity; electrostatics; desolvation energy; regression analysis.
Protein–protein interactions play a central role in biochemistry. This can
be seen in cell-signaling cascades, enzyme catalysis, the immune response
by means of antibody–antigen interactions, and the large-scale motions of
organisms. These interactions are also implicated in many diseases.
From: Methods in Molecular Biology, Vol. 413: Protein Structure Prediction, Second Edition
Edited by: M. Zaki and C. Bystroff © Humana Press Inc., Totowa, NJ
While experimental techniques such as the yeast two-hybrid system and mass
spectrometry are able to determine the existence of protein–protein interactions,
the structure of the macromolecular complex of two interacting proteins can
provide additional information about their interaction, such as the specific
residues involved in the interaction and the degree of conformational change
undergone by the proteins upon binding.
X-ray crystallography and nuclear magnetic resonance have provided us
with the structures of many complexes, but numerous structures still remain
unsolved because of time and experimental limitations. This leads to a need for
computational methods to understand the nature of protein–protein interactions,
one of which is protein–protein docking.
This chapter is divided into three sections. The first section provides an
overview of protein–protein docking and describes some of the available
algorithms for docking. The second describes the ZDOCK suite of programs
in detail, and the third describes an analysis of the performance of ZDOCK.
1.1. Protein–Protein Docking: An Overview
Protein–protein docking is defined as the prediction of the structure of two
proteins in a complex, given only the structure of the interacting proteins. The
“docking problem” can be broken down into two types of docking: bound
docking, in which a complex is separated and reassembled, and unbound
docking, where the structure of the complex is found from the individually
solved structures of the interacting proteins. Obviously, bound docking has
little applicable value, but it is often used for testing and verification purposes.
Unbound docking is much more difficult than bound docking because the
proteins involved can change conformation upon binding. A study of conformational
changes in protein complexes (1) showed that while the general model
for protein–protein recognition is an induced fit model where the proteins must
change conformation in order to bind, the amount of conformational change
was small enough that binding could be modeled as a “lock-and-key”
mechanism as a first approximation. This allows for successful docking results
even when there are noticeable changes in the conformation of the interacting
proteins. This “rigid-body” approximation has been invaluable in the
advancement of the protein–protein docking field. However, modeling induced
fit by flexible docking remains a central challenge, and a large portion of
current docking research is focused in this area.
There are two main challenges in the development of methods for protein–
protein docking. The first is the construction of a scoring function that allows
for the discrimination between correct or near-correct predictions and incorrect
predictions. The second is the development of an algorithm that quickly searches
and scores all possible orientations of the proteins to be docked. The most
obvious way to dock two proteins would be to simulate the molecular dynamics, as
this would allow the complex to reach its native state with time. Unfortunately,
the computational power necessary for such a simulation makes this currently
infeasible.
Protein–protein docking is often carried out in two stages. The initial stage
treats the proteins as rigid bodies, allowing for an efficient search of the six-
dimensional (6-D) space (three dimensions of translational freedom and three
dimensions of rotational freedom). The 6-D space is searched for regions of
high shape and biochemical complementarity, using a “soft” scoring function
that allows for some clashes between atoms. A critical component of docking
research has been the development of novel techniques for increasing the
speed of the search. One of the most popular methods is the Fast Fourier
Transform (FFT) (2), used in ZDOCK (3), FTDock (4), and GRAMM (5) to
search translational space and in HEX (6) to search angular space. Other search
methods that have been used include representing the proteins using grids of
bits (7), Monte Carlo sampling (8,9), genetic algorithms (10), and geometric
hashing.
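The idea behind the FFT-based translational search can be illustrated with a minimal sketch. It is a one-dimensional toy, not the implementation used by ZDOCK or any other program named above: the grid values (+1 for surface cells, -5 for core cells) and the function names are illustrative assumptions. It shows that a correlation score for every translation can be obtained from Fourier transforms and that it matches the direct, slower evaluation.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT. len(x) must be a power of two.
    The inverse transform is left unnormalized (caller divides by n)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def correlate_direct(receptor, ligand):
    """score[t] = sum_i receptor[i] * ligand[(i - t) % n], computed in O(n^2)."""
    n = len(receptor)
    return [sum(receptor[i] * ligand[(i - t) % n] for i in range(n))
            for t in range(n)]

def correlate_fft(receptor, ligand):
    """Same circular correlation, computed via FFTs in O(n log n)."""
    n = len(receptor)
    R = fft(receptor)
    L = fft(ligand)
    # For real-valued grids, correlation is R times the complex conjugate of L.
    product = [r * l.conjugate() for r, l in zip(R, L)]
    return [v.real / n for v in fft(product, inverse=True)]

# Toy 1-D grids (assumed values): surface cells reward contact, core cells penalize clash.
receptor_grid = [0, 1, 1, -5, -5, 1, 0, 0]
ligand_grid = [1, 1, 0, 0, 0, 0, 0, 0]   # ligand occupies two adjacent cells

scores_direct = correlate_direct(receptor_grid, ligand_grid)
scores_fft = correlate_fft(receptor_grid, ligand_grid)
best_shift = max(range(len(scores_fft)), key=lambda t: scores_fft[t])
print("best translation:", best_shift, "score:", round(scores_fft[best_shift], 6))
```

In a real docking run the grids are three-dimensional and the translational scan is repeated for each sampled ligand rotation; the toy only shows why the FFT makes the translational part cheap.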
Many docking algorithms have a refinement and re-ranking stage. This
involves making small changes to the highest-scoring predictions from the
initial stage using techniques such as 6-D rigid-body movements, molecular
dynamics, and the clustering of similar predictions. Often, a more advanced
scoring function, designed to increase the rank of near-native structures and
decrease the rank of false positives, is introduced. This allows for a more
descriptive approximation of biochemical properties such as desolvation free
energy, electrostatics, and hydrogen bonding. Table 1 provides a list of current
docking methods, along with their methodologies.
1.2. Measuring the Accuracy of Predicted Complexes
Once a prediction has been created, it is useful to evaluate it in a quantitative
fashion. This is most often done using root mean square deviation (RMSD)
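between the atoms (using all atoms, backbone atoms, or Cα atoms) of the
prediction and the experimentally determined complex. The predicted structure is first aligned
with the crystallized complex in a manner that minimizes RMSD. The RMSD
between the predicted (p) and actual (a) Cα atoms is then calculated as follows
(with n being the total number of atoms):
\[
\mathrm{RMSD} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \lVert p_i - a_i \rVert^{2}}
\]
where p_i and a_i are the coordinates of the ith pair of corresponding atoms.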
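As a minimal sketch of this calculation, the function below computes the RMSD between two sets of paired coordinates. It assumes the structures have already been superposed (the alignment step is not shown), and the coordinates are hypothetical values chosen for illustration.

```python
import math

def rmsd(predicted, actual):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the structures are already superposed and that atoms are
    paired by list index.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for p, a in zip(predicted, actual):
        # Squared Euclidean distance between the paired atoms.
        total += sum((pc - ac) ** 2 for pc, ac in zip(p, a))
    return math.sqrt(total / len(predicted))

# Hypothetical Cα coordinates in Å, for illustration only.
predicted_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
actual_ca = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 2.0)]
value = rmsd(predicted_ca, actual_ca)
print(f"RMSD = {value:.3f} Å")
```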
Table 1. A Summary of Docking Tools
AutoDOCK: Flexible docking using Monte Carlo search and incremental construction
Global search using bit mapping, rescored with multiple filters
Docking with DOT/ZDOCK,
Global search with grid-based energy function, flexible docking with random search and
FFT global search using shape complementarity and electrostatics
FFT rigid-body search
FFT with clustering and rescoring
semi-flexible simulated annealing, rescoring using biochemical data
Fourier correlation of spherical harmonics, explicit translational
Docking by combining pseudo-Brownian potential and torsional steps with local gradient
Geometric hashing and pose-clustering to score shape
Optimization of side-chain conformation with rigid-body Monte Carlo minimization
FFT search using shape complementarity, desolvation, and electrostatics. Refinement and re-ranking with RDOCK
FFT, Fast Fourier Transform.
a Availability of download to academic users.
b Browser can be downloaded; docking component must be purchased.
Two of the most often used metrics for measuring the accuracy of a predicted
structure are interface RMSD (iRMSD) and ligand RMSD (lRMSD). iRMSD
is defined as the Cα RMSD of those residues having at least one atom within a
distance cutoff of the interacting partner; lRMSD is calculated by superposing
the receptor of the predicted structure with the known structure, performing
the same transformation on the ligand, and calculating the Cα RMSD of the
ligand. An advantage of using iRMSD is that unlike lRMSD, it is not affected
by conformational change in domains that do not include the binding site.
Often, a prediction is classified as a “hit” if the iRMSD and lRMSD are below
a threshold. Unfortunately, such a hard cutoff ignores many
nuances. Another way of evaluating the accuracy of docking predictions is
the fraction of native and non-native contacts (f_nat and f_non-nat). Contacts are
defined as residue pairs, one residue from the receptor and one from the
ligand, separated by less than 5 Å. f_nat is the fraction of the native contacts
that are correctly predicted, and f_non-nat is the fraction of the predicted
contacts that are not native. f_non-nat serves
as an indication of atomic clash between the interface residues in the predicted
complex and also as a proxy for conformational change, as residues may move
into the interface upon binding.
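The contact-based measures can be sketched as follows. This is an illustrative example rather than the CAPRI assessors' code: it assumes that two residues are in contact when any pair of their atoms is closer than the cutoff, and the data layout (a dictionary from residue identifier to atom coordinates) and the coordinates themselves are hypothetical.

```python
import math

def residue_contacts(receptor, ligand, cutoff=5.0):
    """Return the set of (receptor_id, ligand_id) pairs in contact.

    Each protein is a dict mapping a residue id to a list of (x, y, z)
    atom coordinates in Å. A pair is in contact if any two of its atoms
    are closer than `cutoff` (an assumed definition).
    """
    contacts = set()
    for r_id, r_atoms in receptor.items():
        for l_id, l_atoms in ligand.items():
            if any(math.dist(a, b) < cutoff for a in r_atoms for b in l_atoms):
                contacts.add((r_id, l_id))
    return contacts

def contact_fractions(native, predicted):
    """Return (f_nat, f_non_nat) from native and predicted contact sets."""
    f_nat = len(native & predicted) / len(native) if native else 0.0
    f_non_nat = len(predicted - native) / len(predicted) if predicted else 0.0
    return f_nat, f_non_nat

# Hypothetical one-atom residues, for illustration only.
receptor = {"R1": [(0.0, 0.0, 0.0)], "R2": [(10.0, 0.0, 0.0)]}
native_ligand = {"L1": [(3.0, 0.0, 0.0)], "L2": [(13.0, 0.0, 0.0)]}
predicted_ligand = {"L1": [(2.0, 0.0, 0.0)], "L2": [(2.0, 3.0, 0.0)]}

native = residue_contacts(receptor, native_ligand)
predicted = residue_contacts(receptor, predicted_ligand)
f_nat, f_non_nat = contact_fractions(native, predicted)
print(f"f_nat = {f_nat:.2f}, f_non-nat = {f_non_nat:.2f}")
```

The all-against-all distance loop is quadratic in the number of atoms; it is kept for clarity, and a spatial grid or tree would be the usual choice for full-size complexes.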
1.3. The Critical Assessment of Predicted Interactions Experiment
The CAPRI (Critical Assessment of Predicted Interactions) experiment
was created to compare the performance of docking algorithms of various
groups (19). CAPRI was modeled after the Critical Assessment of Structure
Prediction (CASP), which started in 1994 to compare the performance of
protein-folding algorithms (20).
CAPRI is a blind competition, so the participating groups do not receive
the complex structure until after all predictions have been made. Each group's
predictions are evaluated
based on various factors and assigned a score [incorrect, acceptable (one star),
medium (two stars), and high (three stars)] based on their accuracy. The CAPRI
metrics for these scores are described by the Boolean expressions below:
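\[
\begin{aligned}
\text{High} &= (f_{\text{nat}} \ge 0.5) \cap \bigl((\text{lRMSD} \le 1.0) \cup (\text{iRMSD} \le 1.0)\bigr) \\
\text{Medium} &= \Bigl(\bigl((f_{\text{nat}} \ge 0.3) \cap (f_{\text{nat}} < 0.5)\bigr) \cap \bigl((\text{lRMSD} \le 5.0) \cup (\text{iRMSD} \le 2.0)\bigr)\Bigr) \\
&\quad \cup \bigl((f_{\text{nat}} \ge 0.5) \cap (\text{lRMSD} > 1.0) \cap (\text{iRMSD} > 1.0)\bigr) \\
\text{Acceptable} &= \Bigl(\bigl((f_{\text{nat}} \ge 0.1) \cap (f_{\text{nat}} < 0.3)\bigr) \cap \bigl((\text{lRMSD} \le 10.0) \cup (\text{iRMSD} \le 4.0)\bigr)\Bigr) \\
&\quad \cup \bigl((f_{\text{nat}} \ge 0.3) \cap (\text{lRMSD} > 5.0) \cap (\text{iRMSD} > 2.0)\bigr)
\end{aligned}
\]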
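These expressions translate directly into code. The sketch below transcribes them as written; the function name is hypothetical, and testing the categories from best to worst is an assumption made here because the expressions are not mutually exclusive (a prediction can satisfy more than one). RMSD values are in Å.

```python
def capri_class(f_nat, l_rmsd, i_rmsd):
    """Classify a prediction using the Boolean expressions given above.

    Categories are tested from best to worst (an assumption), so a
    prediction that satisfies several expressions gets the best label.
    """
    high = f_nat >= 0.5 and (l_rmsd <= 1.0 or i_rmsd <= 1.0)
    medium = (
        (0.3 <= f_nat < 0.5 and (l_rmsd <= 5.0 or i_rmsd <= 2.0))
        or (f_nat >= 0.5 and l_rmsd > 1.0 and i_rmsd > 1.0)
    )
    acceptable = (
        (0.1 <= f_nat < 0.3 and (l_rmsd <= 10.0 or i_rmsd <= 4.0))
        or (f_nat >= 0.3 and l_rmsd > 5.0 and i_rmsd > 2.0)
    )
    if high:
        return "high"
    if medium:
        return "medium"
    if acceptable:
        return "acceptable"
    return "incorrect"

# Hypothetical predictions, for illustration only.
for f_nat, l_rmsd, i_rmsd in [(0.8, 0.9, 0.7), (0.4, 4.0, 1.8),
                              (0.2, 8.0, 3.5), (0.05, 20.0, 10.0)]:
    print(f_nat, l_rmsd, i_rmsd, "->", capri_class(f_nat, l_rmsd, i_rmsd))
```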
We have made predictions for all CAPRI targets, and Table 2 summarizes
our performance. As an example, Fig. 1 shows the close resemblance between