
BIOINFORMATICS ORIGINAL PAPER

Vol. 21 no. 8 2005, pages 1472–1478

doi:10.1093/bioinformatics/bti229

Structural bioinformatics

M-ZDOCK: a grid-based approach for Cn symmetric multimer docking

Brian Pierce1, Weiwei Tong2 and Zhiping Weng1,2,∗

1Bioinformatics Program and 2Department of Biomedical Engineering, Boston University, Boston, MA, USA

Received on October 13, 2004; revised on November 10, 2004; accepted on December 13, 2004

Advance Access publication December 21, 2004

ABSTRACT

Summary: Computational protein docking is a useful technique for gaining insights into protein interactions. We have developed an algorithm, M-ZDOCK, for predicting the structure of cyclically symmetric (Cn) multimers based on the structure of an unbound (or partially bound) monomer. Using a grid-based Fast Fourier Transform approach, a space of exclusively symmetric multimers is searched for the best structure. This leads to improvements in both accuracy and running time over the alternative, which is to run the binary docking program ZDOCK and filter the results for near-symmetry. The accuracy is improved because fewer false positives are considered in the search, so hits are less easily overlooked. By searching four instead of six degrees of freedom, the required amount of computation is reduced. This program has been tested on several known multimer complexes from the Protein Data Bank, including four unbound multimers: three trimers and a pentamer. For all of these cases, M-ZDOCK was able to find at least one hit, whereas only two of the four testcases had hits when using ZDOCK and a symmetry filter. In addition, M-ZDOCK runs 30–40% faster.

Availability: M-ZDOCK is freely available to academic users at

http://zlab.bu.edu/m-zdock/

Contact: zhiping@bu.edu

Supplementary information: http://zlab.bu.edu/m-zdock

INTRODUCTION

Much of the activity of cells is guided by interactions between proteins. In order to better understand the workings of cells, and for rational drug development, it is useful to understand these protein–protein interactions. One means of revealing information about protein interactions is to predict the structure of a protein complex from the structures of two individually crystallized subunits. This problem is referred to as unbound docking, as opposed to the simpler (and largely solved) bound docking, which predicts the structure from subunit coordinates taken directly from the bound structure. To simplify unbound docking, it is generally divided into two steps: the initial stage and the refinement stage. The initial stage is a full search of the six-dimensional (6D) space (three rotational and three translational degrees of freedom) of possible relative orientations of the two molecules. To make this search tractable, the proteins are assumed to be rigid during this stage, with allowance for some clash between the proteins (referred to as soft docking). The next stage, the refinement stage, makes slight improvements to a subset of the predictions from the initial docking stage. In addition to slight movements of the rigid bodies in 6D space, the refinement stage sometimes allows for movements of side chains and backbones (referred to as flexible refinement in this case).

∗To whom correspondence should be addressed.

For initial stage docking, a variety of approaches have been

developed; they are discussed in several reviews (Lengauer and

Rarey, 1996; Halperin et al., 2002). A popular approach, using a

fast Fourier transform (FFT) correlation-based method to test for

surface complementarity, was first proposed by Katchalski-Katzir

et al. (1992). The programs DOT (Mandell et al., 2001), GRAMM

(Vakser, 1995), FTDOCK (Gabb et al., 1997) and ZDOCK (Chen

et al., 2003a) all use (and expand upon) this concept successfully to

predict protein complex structures. ZDOCK, developed by our lab,

uses FFT correlations to find complexes based on desolvation and

electrostatics, in addition to a surface complementarity metric called

PSC (Chen and Weng, 2003).

A subclass of interactions between proteins is the case where two or more identical proteins interact to form a homomultimer. A common form of symmetry found in homomultimers is Cn symmetry, or cyclic symmetry, which describes a ring-shaped complex. For symmetric dimers, trimers, pentamers and heptamers, this symmetry is necessarily the case, while it is also found for other numbers of protein subunits. For instance, membrane channels and chaperonins often form oligomers with Cn symmetry.

To efficiently and accurately predict Cn multimer complexes, we have implemented a program called Multimer ZDOCK (M-ZDOCK). This program takes advantage of the properties of Cn symmetry to perform a simplified search for the correct complex. There are many instances where this program can be applied. A number of proteins have been solved as monomers or in a complex with another protein but exist in a homomultimeric state under different conditions in vivo (e.g. heat shock, pH changes, viral fusion).

The recently solved crystal structure of adeno-associated virus Rep40 provides evidence that it oligomerizes for nucleotide binding, possibly as a Cn hexamer (James et al., 2004). Using M-ZDOCK with this monomeric crystal structure as input, the structure of the hexamer can be modeled. Another example is the protein Chaperonin-60 (Cpn60), which is expressed under heat shock and other forms of stress, is a homolog of Escherichia coli GroEL and is typically found in a double ring structure composed of 14 protomers. However, Mycobacterium tuberculosis has been found to have lower-order oligomers of this protein (Qamra et al., 2004). A ring-shaped model for this structure can be obtained by multimer docking. Also, Korkhov et al. (2004) have devised a model for the dimeric structure of GABA transporter-1 (GAT1). By computationally predicting possible structures of dimeric GAT1, multimer docking could help to support this model or suggest new ones.

© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org

M-ZDOCK multimer docking

Since the interface between two adjacent subunits is the same for all interfaces of the complex, only one of the n interfaces needs to be considered, reducing the problem to two monomers for any degree of Cn symmetry. In addition, since all Cn multimers can be aligned in a plane (as their subunits are rotated around a single axis), the relative translation along the symmetry axis is zero, so one spatial degree of freedom can be ignored. Finally, since rotating a Cn complex around its symmetry axis yields the same complex, this rotational degree of freedom is also eliminated. Thus the problem becomes 4D instead of 6D, which reduces both the amount of searching and the computational time.
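The claim that every interface of a Cn ring is identical can be checked numerically. The sketch below is illustrative only: it uses NumPy, a hypothetical random point cloud standing in for a monomer, and the z-axis as the symmetry axis. It builds the n rotated copies and confirms that every pair of adjacent subunits has exactly the same set of inter-subunit distances, which is why a single receptor–ligand pair suffices.

```python
import numpy as np

def rot_z(angle_rad):
    """Rotation matrix about the z-axis (taken here as the Cn symmetry axis)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def cn_subunits(monomer, n):
    """Copies of an N x 3 monomer rotated by k * 360/n degrees about z."""
    beta = 2.0 * np.pi / n
    return [monomer @ rot_z(k * beta).T for k in range(n)]

def inter_distances(a, b):
    """All pairwise atom distances between two subunits."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

rng = np.random.default_rng(0)
# Hypothetical monomer: random points placed off the symmetry axis.
monomer = rng.normal(size=(20, 3)) + np.array([15.0, 0.0, 0.0])
subs = cn_subunits(monomer, n=5)
reference = inter_distances(subs[0], subs[1])
same = all(np.allclose(inter_distances(subs[k], subs[(k + 1) % 5]), reference)
           for k in range(5))
print(same)  # True: every adjacent interface is geometrically identical
```

The check works because rotations about a common axis commute, so the pair (k, k+1) is just the pair (0, 1) rotated rigidly by k·360°/n.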

Another type of symmetry seen in proteins is dihedral (D2) symmetry, which is composed of two homodimers interacting symmetrically, or a dimer of dimers (four asymmetric units). From an interaction standpoint, this case differs from Cn symmetry in that there are two interfaces to predict, rather than the n identical interfaces seen in Cn symmetry. Recently, Berchanski and Eisenstein (2003) filtered and combined the pairwise complexes between monomers generated with an FFT-based generic docking algorithm (Katchalski-Katzir et al., 1992) to predict the structures of D2 multimers. They tested subunits taken directly from the complex structure, as well as homology-modeled monomers, and reported promising results. A similar approach was used earlier to construct the helically symmetric protein coat of the tobacco mosaic virus (Eisenstein et al., 1997). However, due to the discrete nature of the FFT algorithm, the vast majority of these binary predictions are not symmetric, and even the ones that pass the filter are never truly symmetric.

Here we have developed a new docking algorithm, M-ZDOCK, that explores only the part of the search space that conforms to Cn symmetry. We observe a significant improvement in accuracy, lower redundancy and fewer false positives, as shown in a direct comparison with docking and filtering. In addition, since only perfectly symmetric multimers are explored in the search space, less computational time is required.

In the process of developing M-ZDOCK, we have carefully curated a set of test cases that exist in both monomeric and multimeric forms under physiological conditions. Although small, this set represents an exhaustive search of such test cases in the Protein Data Bank (PDB) (Berman et al., 2000). It should prove useful for future docking studies on multimers.

METHODS

Scoring function

The scoring function used by this program is based on the scoring used in the latest version of ZDOCK (Chen et al., 2003a). ZDOCK is an initial-stage docking algorithm designed to predict the structure of the complex of two proteins, referred to as the receptor and the ligand. It takes into account surface complementarity, electrostatics and desolvation to find the optimal fit between two proteins. Surface complementarity is calculated using pairwise shape complementarity (PSC), which consists of a favorable term determined by the number of atom pairs within a distance cutoff, and a penalty term determined by the number of clashes. Atomic Contact Energy (ACE) (Zhang et al., 1997) is used to score desolvation, and the electrostatic term is calculated by applying Coulomb’s equation to the partial charges of the ligand in the electrostatic field of the receptor.
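As a rough illustration of two of these terms, the sketch below counts favorable atom pairs and clashes in the spirit of PSC, and sums a Coulomb energy of ligand charges in the receptor's field. The contact and clash cutoffs, the clash weight and the constant dielectric are hypothetical placeholders, not ZDOCK's parameters; the ACE desolvation term is omitted; and ZDOCK itself evaluates these terms on grids rather than by explicit atom pairs.

```python
import numpy as np

def pairwise_distances(a, b):
    """Distance matrix between (N x 3) and (M x 3) coordinate arrays."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def psc_like_score(receptor, ligand, contact_cutoff=6.0, clash_cutoff=3.0,
                   clash_weight=10.0):
    """Favorable receptor-ligand pairs minus a clash penalty (illustrative values)."""
    d = pairwise_distances(receptor, ligand)
    contacts = np.count_nonzero((d >= clash_cutoff) & (d <= contact_cutoff))
    clashes = np.count_nonzero(d < clash_cutoff)
    return contacts - clash_weight * clashes

def coulomb_energy(receptor, q_receptor, ligand, q_ligand, dielectric=4.0):
    """Ligand partial charges in the receptor's Coulomb field (arbitrary units)."""
    d = pairwise_distances(receptor, ligand)
    return float(np.sum(q_receptor[:, None] * q_ligand[None, :] / (dielectric * d)))

rec = np.array([[0.0, 0.0, 0.0]])
print(psc_like_score(rec, np.array([[4.0, 0.0, 0.0]])))  # one contact, no clash
print(psc_like_score(rec, np.array([[1.0, 0.0, 0.0]])))  # one clash
```

Keeping the terms as separate functions mirrors the text's description of a score built from independent components that are then combined.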

The search strategy of ZDOCK is to discretize both ligand and receptor onto a grid and use FFT to determine the best position of the ligand relative to the receptor. This discretization and FFT are performed for a complete set of angular orientations of the ligand (relative to a fixed receptor).
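The key property of this grid search is that a single FFT-based correlation scores every translation of the ligand at once. The minimal NumPy sketch below assumes periodic (circular) grids and uses arbitrary real-valued grids in place of ZDOCK's actual PSC, desolvation and electrostatic encodings. It compares the FFT result for one shift against a direct computation.

```python
import numpy as np

def fft_translation_scores(receptor_grid, ligand_grid):
    """scores[t] = sum over x of R[x] * L[x - t], for every circular shift t.

    Computed as IFFT(FFT(R) * conj(FFT(L))), which costs O(N log N) for all
    shifts instead of O(N^2) for a direct loop over translations.
    """
    R = np.fft.fftn(receptor_grid)
    L = np.fft.fftn(ligand_grid)
    return np.real(np.fft.ifftn(R * np.conj(L)))

def direct_score(receptor_grid, ligand_grid, shift):
    """Score of a single shift: translate the ligand grid, then sum the overlap."""
    moved = np.roll(ligand_grid, shift, axis=(0, 1, 2))
    return float(np.sum(receptor_grid * moved))

rng = np.random.default_rng(1)
R_grid = rng.normal(size=(8, 8, 8))
L_grid = rng.normal(size=(8, 8, 8))
scores = fft_translation_scores(R_grid, L_grid)
print(np.isclose(scores[1, 2, 3], direct_score(R_grid, L_grid, (1, 2, 3))))  # True
```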

Results have demonstrated that this approach performs well against a docking benchmark and in the international blind test CAPRI (Critical Assessment of Prediction of Interactions) (Chen et al., 2003b).

Euler angles

The Euler angle conventions used in this paper refer to these successive

rotations from the initial configuration, described in Goldstein’s Classical

Mechanics (Goldstein, 1980):

(1) Rotation by ψ around the z-axis.

(2) Rotation by θ around the original x-axis.

See Figure 1 for a diagram of these rotations. Typically, Euler angles are sets

of three angles; in this case, the third angle, φ, is not necessary as it would

be redundant in the symmetric search (see the Search space section for an

explanation).
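The two-angle convention can be written as rotation matrices. In the sketch below (NumPy; the helper names are illustrative), ψ is applied about z and then θ about the fixed x-axis, so the combined matrix is Rx(θ)·Rz(ψ). The sketch also shows why a third rotation φ about z adds nothing here. Applying φ to the receptor, and then forming the ligand as the receptor rotated by 360°/n about z, rigidly rotates the whole receptor–ligand pair about the symmetry axis, so every inter-subunit distance stays the same.

```python
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def orient(coords, psi, theta):
    """Rotate N x 3 coords by psi about z, then by theta about the fixed x-axis."""
    return coords @ (rot_x(theta) @ rot_z(psi)).T

def receptor_ligand_pair(coords, psi, theta, n, phi=0.0):
    """Receptor after (psi, theta, optional phi) and its copy rotated by 360/n about z."""
    receptor = orient(coords, psi, theta) @ rot_z(phi).T
    ligand = receptor @ rot_z(2.0 * np.pi / n).T
    return receptor, ligand

def inter_distances(a, b):
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

rng = np.random.default_rng(2)
coords = rng.normal(size=(15, 3))
rec0, lig0 = receptor_ligand_pair(coords, np.radians(90), np.radians(45), n=3)
rec1, lig1 = receptor_ligand_pair(coords, np.radians(90), np.radians(45), n=3, phi=0.7)
print(np.allclose(inter_distances(rec0, lig0), inter_distances(rec1, lig1)))  # True
```

The angles ψ = 90° and θ = 45° match the example in Figure 1.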

Rotational/translational search

M-ZDOCK uses the convention that the rotational axis will be parallel to the

z-axis, and searches in the x–y plane for the optimal position of this axis.

To perform the search for the best conformation of a multimer based on the

structure of a monomer, it has been necessary to make modifications to the

search methodology that is used for ZDOCK 2.3. The new search algorithm

is outlined below:

(1) Center the receptor (the input monomer) at the origin.

(2) Rotate the receptor by an angle ψ around the z-axis, and then by θ around the x-axis.

(3) Copy the receptor, and rotate the copy by 360°/n around the z-axis to create the ligand.

(4) Discretize both the ligand and receptor, with a grid spacing of 1.2 Å

(the same as ZDOCK 2.3).

(5) Perform the 3D FFT and correlation, and search in the x–y plane for

the best scoring multimer position for that rotational orientation.

(6) Repeat the steps 2–5 for various other sets of ψ and θ.
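The six steps above can be sketched in NumPy as follows. This is a toy and does not reproduce M-ZDOCK's scoring. It assumes a simple Katchalski-Katzir-style shape encoding (surface shell +1, core −15) in place of ZDOCK's PSC, desolvation and electrostatic grids, a hypothetical 32-voxel box, and plain atom occupancy for the ligand. What it does reproduce is the structure of the search: the ligand is the receptor rotated by 360°/n about z, and only the z-shift-zero plane of the 3D FFT correlation is examined.

```python
import numpy as np

SPACING = 1.2  # grid spacing in Angstrom, as stated in step 4
SIZE = 32      # hypothetical voxels per edge; coordinates must stay within +/-19.2 A

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def occupancy(coords):
    """1.0 in every voxel containing at least one atom (grid centered on the origin)."""
    idx = np.floor(coords / SPACING).astype(int) + SIZE // 2
    grid = np.zeros((SIZE, SIZE, SIZE))
    grid[tuple(idx.T)] = 1.0
    return grid

def receptor_encoding(occ, core_weight=-15.0):
    """Toy shape grid: +1 on the empty shell around the molecule, penalty in its core."""
    neighbors = sum(np.roll(occ, s, axis=a) for a in range(3) for s in (1, -1))
    shell = (neighbors > 0) & (occ == 0)
    return shell.astype(float) + core_weight * occ

def signed_index(k):
    """Map a circular FFT index to a signed voxel shift."""
    return k - SIZE if k >= SIZE // 2 else k

def best_xy_placement(receptor, n, psi, theta):
    """Steps 2-5 for one angle set; returns (score, dx, dy) with dx, dy in Angstrom."""
    rec = receptor @ (rot_x(theta) @ rot_z(psi)).T          # step 2
    lig = rec @ rot_z(2.0 * np.pi / n).T                     # step 3
    R = receptor_encoding(occupancy(rec))                    # step 4
    L = occupancy(lig)
    scores = np.real(np.fft.ifftn(np.fft.fftn(R) * np.conj(np.fft.fftn(L))))  # step 5
    plane = scores[:, :, 0]  # zero z-shift: the ligand only moves in the x-y plane
    i, j = np.unravel_index(np.argmax(plane), plane.shape)
    return float(plane[i, j]), signed_index(i) * SPACING, signed_index(j) * SPACING

def search(monomer, n, angle_sets):
    """Step 1 (centering) and step 6 (loop over angle sets), best scores first."""
    receptor = monomer - monomer.mean(axis=0)
    results = [(*best_xy_placement(receptor, n, psi, theta), psi, theta)
               for psi, theta in angle_sets]
    return sorted(results, key=lambda r: r[0], reverse=True)

rng = np.random.default_rng(3)
monomer = rng.normal(scale=3.0, size=(40, 3))
angles = [(np.radians(p), np.radians(t)) for p in (0, 120, 240) for t in (0, 45, 90)]
top = search(monomer, n=3, angle_sets=angles)
print(top[0])  # (score, dx, dy, psi, theta) of the best-scoring placement
```

Restricting the search to the z = 0 plane of the correlation is what removes the translational degree of freedom along the symmetry axis.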

Search space

In order to fully explore the space of multimers, it is necessary to vary ψ from 0 to 360°, and θ from 0 to 90°. θ does not need to sample a full 360° because for a given φ there are redundancies at 180° − θ, 180° + θ and −θ, due to the symmetric nature of these angles around the z- and x-axes.

It is not necessary to sample φ angles (the third rotation, around the z-axis), as these are symmetric around the z-axis and would therefore be redundant for the same values of ψ and θ. This corresponds to the loss of a rotational degree of freedom referred to in the Introduction section.

M-ZDOCK uses 1500 angle sets, as this was found to be a good balance between computational time and predictive performance. In addition, given that ZDOCK 2.3 uses 54000 angle sets for 3D angular freedom (6° sampling density), the number of angles that M-ZDOCK covers is mathematically reasonable: sampling two angular degrees of freedom at the same density requires roughly 54000^(2/3) ≈ 1429 angle sets.
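The arithmetic behind this choice, and one possible way to lay out about 1500 (ψ, θ) pairs, are sketched below. The paper does not state how its 1500 angle sets are distributed. The Fibonacci-lattice scheme here is a hypothetical substitute. It relies on the fact that ψ and θ fix the azimuth and polar angle of the symmetry axis in the monomer's frame, and spreads the sets roughly evenly over the hemisphere θ ≤ 90°.

```python
import numpy as np

def hemisphere_angle_sets(count=1500):
    """Hypothetical sampling: a Fibonacci lattice over theta in [0, 90] degrees.

    Equal-area sampling of directions needs cos(theta) evenly spaced;
    psi advances by a golden-ratio step so that neighboring points spread out.
    """
    golden_ratio = (1.0 + 5.0 ** 0.5) / 2.0
    k = np.arange(count)
    cos_theta = 1.0 - (k + 0.5) / count  # evenly spaced in (0, 1)
    theta = np.degrees(np.arccos(cos_theta))
    psi = np.degrees((2.0 * np.pi * k / golden_ratio) % (2.0 * np.pi))
    return np.column_stack([psi, theta])

print(54000 ** (2.0 / 3.0))  # about 1429
angles = hemisphere_angle_sets()
print(angles.shape)          # (1500, 2)
```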

Reconstructing the multimer

Based on the optimal relative position of two adjacent monomers in the x–y plane (output from the FFT), it is possible to reconstruct the full multimer. The only constraint is that the monomers need to be rotated by 360°/n with respect to one another around the z-axis. Referring to Figure 2, the vector representing the displacement between the two adjacent monomers is L, and the vector from the monomer to the symmetry axis (in the x–y plane) is d. β is the angle around the Cn symmetry axis between the centers of mass of two adjacent monomers, 360°/n. The angle between the vectors L and d is α, given by (180° − β)/2. The magnitude of d can be computed as |d| = |L|/(2 cos α).

B. Pierce et al.

Fig. 1. Diagram of successive rotation through Euler angles ψ and θ, the angles used to describe the rotational configuration of the ligand and receptor. In this case, ψ = 90° and θ = 45°.

Once the rotational axis is found, copies of the monomer are rotated around this axis by multiples of β (kβ for k = 0, 1, ..., n − 1) to form the n subunits of the multimer. Thus, given the vector between two adjacent monomers in the Cn multimer (and the symmetry number), it is possible to reconstruct the entire multimer. To illustrate this concept, a Java applet has been written and is publicly available at http://zlab.bu.edu/m-zdock.
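A minimal NumPy sketch of this reconstruction follows. It assumes, consistent with the search steps, that the receptor is centered at the origin and that the ligand is the receptor rotated by β about z and then shifted by t in the x–y plane, so that L = t. Instead of the trigonometric construction, it solves (I − R(β))a = t for the axis position a. This gives the same |d| = |L|/(2 cos α), and the usage lines verify that. The function names are illustrative.

```python
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def symmetry_axis_xy(t_xy, n):
    """Point where the Cn axis crosses the x-y plane, given the ligand shift t.

    Ligand atoms are R(beta) x + t; a rotation by beta about an axis through a
    maps x to R(beta)(x - a) + a, so matching the two gives (I - R(beta)) a = t.
    """
    beta = 2.0 * np.pi / n
    A = np.eye(2) - rot_z(beta)[:2, :2]
    return np.linalg.solve(A, np.asarray(t_xy, dtype=float))

def build_multimer(receptor, t_xy, n):
    """Return the n subunits: the receptor rotated by k * beta about the recovered axis."""
    axis = np.append(symmetry_axis_xy(t_xy, n), 0.0)
    beta = 2.0 * np.pi / n
    return [(receptor - axis) @ rot_z(k * beta).T + axis for k in range(n)]

rng = np.random.default_rng(4)
receptor = rng.normal(size=(10, 3))
receptor -= receptor.mean(axis=0)  # centered, as in step 1 of the search
t, n = np.array([9.6, 2.4]), 3
subunits = build_multimer(receptor, t, n)
alpha = (np.pi - 2.0 * np.pi / n) / 2.0
print(np.isclose(np.linalg.norm(symmetry_axis_xy(t, n)),
                 np.linalg.norm(t) / (2.0 * np.cos(alpha))))  # True
```

Solving the 2 × 2 linear system avoids choosing the sign of the angle α by hand. The matrix I − R(β) is invertible for any n ≥ 2.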

Symmetry filter

In order to compare the results of M-ZDOCK with results from an existing method of docking, we implemented a symmetry filter that selects only near-symmetric complexes. It is designed to process the results from a docking tool such as ZDOCK, which produces many predictions (54000 in the case of ZDOCK with dense sampling).

The filter determines the rotation angle and axis relating the monomers of the prediction, as well as the center of mass translation between the monomers. For perfect symmetry, the angle between the center of mass translation and the axis is 90°, and the angle of rotation around the axis is 360°/n, but a certain range must be allowed, as the predictions are not perfectly symmetric. In Berchanski and Eisenstein (2003), the angular range for the rotation around the axis was ±6°, and the range for the angle between the axis and the translation was ±3°. To allow for a comparison with the M-ZDOCK results, these ranges were increased to ±18° and ±9°, respectively, so that there would be approximately 1500 predictions per testcase.
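The filter's two tests can be sketched as follows. The sketch assumes that each prediction is supplied as the rotation matrix R and the center-of-mass shift that map the receptor onto the ligand. The helper names are hypothetical, and the default tolerances are the ±18° and ±9° quoted above. The axis extraction used here assumes a rotation angle strictly between 0° and 180°, so it covers n ≥ 3 but not C2 (180°) cases.

```python
import numpy as np

def rotation_angle_and_axis(R):
    """Rotation angle in degrees and unit axis of a 3 x 3 rotation matrix.

    Uses trace(R) = 1 + 2 cos(angle) and the antisymmetric part of R; the axis
    is undefined when the angle is 0 or 180 degrees.
    """
    cos_angle = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return np.degrees(np.arccos(cos_angle)), axis / np.linalg.norm(axis)

def passes_symmetry_filter(R, com_shift, n, rotation_tol=18.0, perpendicular_tol=9.0):
    """True if the receptor-to-ligand transform is close to a Cn relationship."""
    angle, axis = rotation_angle_and_axis(R)
    shift = np.asarray(com_shift, dtype=float)
    cos_between = abs(axis @ shift) / np.linalg.norm(shift)
    between = np.degrees(np.arccos(np.clip(cos_between, 0.0, 1.0)))  # 0 to 90 degrees
    return bool(abs(angle - 360.0 / n) <= rotation_tol
                and 90.0 - between <= perpendicular_tol)

c, s = np.cos(np.radians(120)), np.sin(np.radians(120))
R120 = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
print(passes_symmetry_filter(R120, [5.0, 0.0, 0.0], n=3))  # True: shift perpendicular to axis
print(passes_symmetry_filter(R120, [0.0, 0.0, 5.0], n=3))  # False: shift along the axis
```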

Multimer testcases

We tested M-ZDOCK with two categories of testcases, bound/quasi-bound

and unbound.

Bound and quasi-bound testcases

To ensure that the search space is covered entirely and that the algorithm is valid for various types of Cn symmetry, both bound and quasi-bound docking testcases were used. The bound testcases were generated by extracting the monomer from the multimeric structure so that the docking algorithm can attempt to reassemble the multimer. These testcases should be relatively simple to dock, as there is no conformational change to account for. If the correct structure is not found for these cases, this may indicate a problem with the search algorithm. Quasi-bound testcases, though found in the PDB as both monomers and multimers, are most likely biological multimers. Therefore the conformational change involved is of little or no significance, making these cases similar to (but slightly more difficult than) the bound testcases. The monomer structure that is found in the PDB is used as input to the docking algorithm, while the multimer structure in the PDB is used to evaluate the docking results.

Fig. 2. The relative positions of the subunits of a C3 multimer. The vector L is the relative position between the receptor and the ligand (which is the receptor rotated by β degrees; in this case β = 120°). The magnitude of vector d to the axis of symmetry and the angle α between vectors L and d can be determined algebraically. Thus, once the interface between the ligand and receptor is evaluated by M-ZDOCK, the rest of the multimer (in this case the subunit represented by the dashed lines) can be generated automatically.

Unbound testcases

The second type of testcases is unbound structures. These testcases are significantly more difficult to predict, both because of the conformational change of the proteins inherent in unbound docking and because of the low affinity of the complexes, as these cases must exist in both monomer and multimer forms to be found experimentally. Four proteins were found in the PDB (Berman et al., 2000) for which different symmetric forms exist, according to the Protein Quaternary Structure server classification. Here is a brief summary of these proteins:

RNase A.

Bovine pancreatic RNase A was crystallized in monomeric (Tilton et al., 1992) and trimeric (Liu et al., 2002) forms. The trimer in this case is one of two trimeric forms thought to exist in mildly acidic conditions. Notable about this structure is a domain-swapped C-terminal beta strand.

Phospholipase A2.

The Naja naja naja (Indian cobra) phospholipase A2 (PLA2) was obtained from the venom and crystallized in trimeric form (Segelke et al., 1998) using random crystallization screening. The monomeric version (Scott et al., 1990) is the Naja naja atra (Chinese cobra) PLA2, which was crystallized with a lower concentration of PLA2 and a higher concentration of Ca2+ (the Ca2+ is seen in the structure of the monomer but not the trimer). Segelke et al. (1998) discuss the possibility that the trimeric form is a means of shielding the active site and thus ‘protecting the snake from its own venom’.

Flavivirus envelope protein.

This is the fusion envelope protein of the tick-borne encephalitis virus (TBEV E protein). The input structure is taken from the homodimer structure (Rey et al., 1995). The trimeric form, which occurs at low pH during membrane fusion, was recently solved (Bressanelli et al., 2004).

Bovine trypsin inhibitor.

This testcase is the bovine pancreatic trypsin inhibitor (BPTI), which occurs as a monomer (Wlodawer et al., 1984) at basic pH and as a decamer (Hamiaux et al., 2000) at acidic pH. As the decamer is composed of two C5 symmetric pentamers, the target for this case is one half of the decamer.

Table 1. The unbound multimer testcases

Testcase                               PDB IDs (a)  Symmetry   RMSD (b) (Å)
RNase A                                9RAT/1JS0    Trimer     0.33
Phospholipase A2 (PLA2)                1POA/1A3F    Trimer     0.79
Flavivirus envelope protein (TBEV E)   1SVB/1OML    Trimer     2.08
Bovine trypsin inhibitor (BPTI)        3PTI/1B0C    Pentamer   0.41

(a) The first PDB code is for the structure used as input for docking, while the second one is the bound multimer.

(b) Interface Cα RMSD change between unbound/bound structures.

Table 1 summarizes these testcases. To provide a measure of the difficulty

of docking each complex, interface Cα atoms from unbound monomers were

fitted to two adjacent subunits of the complex. As M-ZDOCK is a rigid-body

docking algorithm, the root mean square deviation (RMSD) in this case can

be seen as the lower limit for the RMSD of the predictions.

RMSD calculations and hits

To evaluate bound and unbound predictions, the RMSDs of interface alpha carbon (Cα) atoms were used. The interface Cα atoms were determined from the crystal structure of the multimer: if any atom of a residue is within 10 Å of any atom of another chain, the Cα atom of that residue is defined as an interface Cα. In addition, to avoid false negatives due to large domain movements, regions of residues with large movement from unbound to bound (>4 Å) were removed before determining interface Cα atoms. See Supplementary information for the removed residues.
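A minimal NumPy sketch of this interface definition follows. It assumes residues are supplied as per-residue atom arrays, which is a hypothetical data layout; a real pipeline would read them from the PDB files. It also assumes that the unbound and bound Cα coordinates have already been superposed before the >4 Å mobility check. "Within 10 Å" is implemented here as a distance of at most 10 Å.

```python
import numpy as np

def interface_residues(residue_atoms, other_chain_atoms, cutoff=10.0):
    """Boolean per residue: True if any of its atoms is within `cutoff` Angstrom
    of any atom belonging to another chain.

    residue_atoms: list of (k_i x 3) arrays, one per residue of this chain.
    other_chain_atoms: (M x 3) array of all atoms in the other chains.
    """
    flags = []
    for atoms in residue_atoms:
        d = np.linalg.norm(atoms[:, None, :] - other_chain_atoms[None, :, :], axis=-1)
        flags.append(bool((d <= cutoff).any()))
    return np.array(flags)

def exclude_mobile(flags, ca_unbound, ca_bound, max_shift=4.0):
    """Drop residues whose (already superposed) Ca moved more than max_shift Angstrom."""
    moved = np.linalg.norm(ca_unbound - ca_bound, axis=1) > max_shift
    return flags & ~moved

residues = [np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]), np.array([[30.0, 0.0, 0.0]])]
partner = np.array([[11.0, 0.0, 0.0]])
flags = interface_residues(residues, partner)
print(flags)  # [ True False]: only the first residue has an atom within 10 A
```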

Once the interface Cα atoms are known, two adjacent subunits of the predicted structure are fitted to two adjacent subunits of the complex using the interface Cα atoms, and the RMSD between the interface Cα atoms is computed. Hits are defined as predictions that have an interface Cα RMSD ≤ 2.5 Å.
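The fit-then-measure step is commonly done with the Kabsch (SVD) superposition. The paper does not name its fitting method, so the sketch below is an assumption. Here P and Q would hold the interface Cα coordinates of two adjacent subunits, concatenated and in matching order, taken from the prediction and from the crystal structure respectively.

```python
import numpy as np

def superposed_rmsd(P, Q):
    """RMSD between matched N x 3 point sets after optimally superposing P on Q (Kabsch)."""
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P0.T @ Q0)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid an improper rotation (reflection)
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P0 @ R.T - Q0
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

def is_hit(pred_ca, true_ca, threshold=2.5):
    """Hit criterion used in the text: interface Ca RMSD of at most 2.5 Angstrom."""
    return superposed_rmsd(pred_ca, true_ca) <= threshold

rng = np.random.default_rng(5)
true_ca = rng.normal(scale=5.0, size=(30, 3))
c, s = np.cos(0.8), np.sin(0.8)
rotation = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
pred_ca = true_ca @ rotation.T + np.array([3.0, -2.0, 7.0])  # same shape, rigidly moved
print(round(superposed_rmsd(pred_ca, true_ca), 6))  # 0.0
```

Because the prediction is fitted before the RMSD is taken, a rigidly moved but otherwise identical prediction scores zero. Only the relative arrangement of the subunits counts.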

RESULTS AND DISCUSSION

Structure prediction: quasi-bound and bound

Eight quasi-bound and bound testcases were used to ensure the coverage and basic functionality of M-ZDOCK for a variety of symmetries. The results in Table 2 clearly demonstrate that M-ZDOCK is capable of predicting structures with Cn symmetry. For all of the structures, the number one ranked prediction was a hit, and in addition there were a number of hits in the top 20 for every testcase.

Structure prediction: unbound

The structure prediction capabilities of M-ZDOCK are shown to be superior to filtering normal docking predictions across the unbound multimer benchmark (Table 3). For M-ZDOCK, all of the first hits are in the top third of the predictions, whereas for filtering there were two cases where no hit was found, and in the two other cases the first hit was in the bottom third of the predictions.

M-ZDOCK successfully predicted a hit for RNase A (Fig. 3a), while the near-symmetric predictions of ZDOCK failed to produce a hit, even though ZDOCK plus filtering produced 375 more predictions. This complex was difficult to predict due to the strand swapping that takes place upon multimerization, which explains the relatively poor rank (476) of the first M-ZDOCK hit.


Table 2. M-ZDOCK results for quasi-bound and bound testcases

Testcase (a)     Symmetry   Hits (b)  Rank (c)  RMSD (d)  References
Quasi-bound testcases
1NSP/1B99 (e)    Trimer     11        1         1.06      (Morera et al., 1995; Gonin et al., 1999)
1KKU/1GZU (e)    Trimer     7         1         0.88      (Garavaglia et al., 2002; Werner et al., 2002)
1AUS/1AA1 (e)    Tetramer   8         1         0.93      (Taylor and Andersson, 1997)
1EXB/1QRQ        Tetramer   7         1         1.15      (Gulbis et al., 1999, 2000)
Bound testcases
1AF6             Trimer     6         1         0.73      (Wang et al., 1997)
1A8R (e)         Pentamer   10        1         0.78      (Auerbach et al., 2000)
1QNU             Pentamer   16        1         0.75      (Kitov et al., 2000)
1G31             Heptamer   17        1         1.81      (Hunt et al., 1997)

(a) PDB IDs of the testcases, with the PDB ID of the input structure for M-ZDOCK listed first for the quasi-bound testcases.

(b) Number of hits in the top 20 (out of 1500) predictions, as ranked by M-ZDOCK.

(c) Rank of the first hit.

(d) RMSD (in Å) of the first hit.

(e) The bound structures in these cases are in fact dimers of the Cn multimer; only the Cn contacts are predicted, so the other interface is ignored.

Table 3. M-ZDOCK results for unbound testcases

             M-ZDOCK                              ZDOCK + filtering
Testcase     N (a)  Hits (b)  Rank (c)  RMSD (d)  N (e)  Hits (b)  Rank (c)  RMSD (d)
RNase A      1500   1         476       2.44      1875   0         -         -
PLA2         1500   6         33        1.11      1595   1         1417      2.05
TBEV E       1500   2         62        2.31      1476   0         -         -
BPTI         1500   20        384       2.25      1164   1         1064      1.83

(a) Number of predictions produced by M-ZDOCK (the number is always 1500).

(b) Number of hits among the predictions.

(c) Rank of the first hit.

(d) RMSD (in Å) of the first hit.

(e) Number of predictions remaining after running ZDOCK and filtering the 54000 predictions for symmetry.

Although the swapped strands were not included in the RMSD calculation, they were clearly part of the interface, making the prediction non-trivial. The swapped strands are highlighted in Figure 3a.

The symmetric PLA2 trimer was successfully predicted by M-ZDOCK. In this case M-ZDOCK predicted six hits, the best of which was ranked 33. While ZDOCK + filtering obtained a hit, it was ranked 1417 and had a higher RMSD (2.05 Å versus 1.11 Å).

Perhaps the most striking results are for the TBEV E protein, where two hits were found by M-ZDOCK, while no hits were found with ZDOCK. This protein is somewhat difficult to dock due to the large C-terminal conformational change upon trimerization that helps to stabilize the interaction. The difficulty is also reflected in the lower bound for the RMSD of 2.08 Å (Table 1), which leaves little room for error if rigid-body docking is to obtain a hit (at most 2.5 Å). However, M-ZDOCK is able to predict this structure, with the first hit at a rank of 62 (see Fig. 3c for the structure).

M-ZDOCK produced a large number of hits (20) for the BPTI pentamer (Fig. 3d). As with the other testcases, M-ZDOCK performed better with regard to the number of hits and the rank of the first hit. In this case the RMSD of the first hit was slightly better for the ZDOCK prediction (1.83 Å versus 2.25 Å). But of the 20 M-ZDOCK hits, 6 had both a better rank and a lower RMSD than the top ZDOCK prediction, so M-ZDOCK is clearly superior in this case as well.

Computational performance

The docking predictions reported in this study were performed on an IBM p690 workstation with 32 Power4 processors (1.3 GHz), using MPI for parallelization. Due to the increased efficiency of the M-ZDOCK search, this approach gives a significant reduction in running time. The discretization of the receptor at every angle set (as described in the Methods section), which is an extra cost compared with regular ZDOCK, is more than compensated for by the faster search. On average, M-ZDOCK runs 30–40% faster than ZDOCK.

M-ZDOCK was also compiled and run on Linux in serial and

parallel, and on Mac OS X in serial. Versions of M-ZDOCK for all

of these platforms are available at: http://zlab.bu.edu/m-zdock.

Future work

A possible future modification to the M-ZDOCK algorithm would

be to incorporate the degree of packing into the algorithm. Since

the algorithm currently used considers only the interface between
