An NMR-based scoring function improves the accuracy of binding pose predictions by docking by two orders of magnitude.
ABSTRACT Low-affinity ligands can be efficiently optimized into high-affinity drug leads by structure based drug design when atomic-resolution structural information on the protein/ligand complexes is available. In this work we show that the use of a few, easily obtainable, experimental restraints improves the accuracy of the docking experiments by two orders of magnitude. The experimental data are measured in nuclear magnetic resonance spectra and consist of protein-mediated NOEs between two competitively binding ligands. The methodology can be widely applied as the data are readily obtained for low-affinity ligands in the presence of non-labelled receptor at low concentration. The experimental inter-ligand NOEs are efficiently used to filter and rank complex model structures that have been pre-selected by docking protocols. This approach dramatically reduces the degeneracy and inaccuracy of the chosen model in docking experiments, is robust with respect to inaccuracy of the structural model used to represent the free receptor and is suitable for high-throughput docking campaigns.
Julien Orts · Stefan Bartoschek · Christian Griesinger · Peter Monecke · Teresa Carlomagno
Received: 30 June 2011 / Accepted: 25 September 2011 / Published online: 14 December 2011
© The Author(s) 2011. This article is published with open access at Springerlink.com
Keywords NMR · INPHARMA · NOE · Docking · Drug design
Introduction
Structure based drug design (SBDD) has evolved over the
last decades into a powerful tool for optimizing many low
molecular weight lead compounds into highly potent drugs
(Rees et al. 2004). The principle of SBDD lies
in the combination of different chemical moieties with the
aim of obtaining a molecule that, while possessing the
pharmacological properties necessary for a drug, is com-
plementary in shape to the receptor-binding pocket. This
process requires knowledge of the exact structure of the
receptor/ligand complex, which is usually obtained by
X-ray crystallography.
In the absence of structural information for the complex,
SBDD relies on the generation of plausible docking models.
However, docking protocols suffer from inaccuracies in the
description of the interaction energies between the ligand
and the target molecule and often fail in the prediction of the
correct interaction mode. This is particularly true when the
docking experiments use low-definition or inaccurate target
structures. Such limitation of the docking approach is
serious when considering the increasing gap between the
newly identified protein sequences and the availability of
structural information (The UniProt Consortium 2008;
Berman et al. 2009). While for proteins sharing more than
30% sequence identity to their homologous templates,
computational methods provide models that are typically
comparable to low-resolution experimental structures,
when the sequence identity drops below 30%, the model
accuracy decreases due to alignment errors (Kortagere and
Ekins 2010; Katritch et al. 2010; Rai et al. 2010). These
problems call for experimental data that could improve the
performance of the docking scoring functions without
requiring the difficult step of obtaining high-resolution
structural information for the target.
Electronic supplementary material: The online version of this
article (doi:10.1007/s10858-011-9590-5) contains supplementary
material, which is available to authorized users.
J. Orts · T. Carlomagno (✉)
EMBL, Structure and Computational Biology Unit,
Meyerhofstrasse 1, 69117 Heidelberg, Germany
e-mail: teresa.carlomagno@embl.de
S. Bartoschek
Sanofi-Aventis Deutschland GmbH, R&D LGCR/Parallel
Synthesis & Natural Products, Industriepark Hoechst, Bldg.
H811, 65926 Frankfurt am Main, Germany
C. Griesinger
Max Planck Institute for Biophysical Chemistry,
Am Fassberg 11, 37077 Göttingen, Germany
P. Monecke
Sanofi-Aventis Deutschland GmbH, R&D LGCR/Structure,
Design & Informatics, Industriepark Hoechst, Bldg. G838,
65926 Frankfurt am Main, Germany
J Biomol NMR (2012) 52:23–30
DOI 10.1007/s10858-011-9590-5
In recent years, nuclear magnetic resonance (NMR)
spectroscopy has taken an important role in the detection
and structural characterization of low-affinity (micromolar
to millimolar range) ligands that can be developed into
high-affinity leads by SBDD (Pochapsky and Pochapsky
2001; Wyss et al. 2002; Van Dongen et al. 2002; Pellecchia
et al. 2002). Transferred-NOEs (Ni and Scheraga 1994) and
transferred-cross correlated relaxation rates (Carlomagno
et al. 1999, 2003) provide the bioactive conformation of the
ligand, while saturation transfer difference (STD) experi-
ments (Mayer and Meyer 2001) reveal the ligand epitope.
These approaches have the advantage of observing only the
resonances of the ligands and of being applicable to protein
targets of any size. Recently, we have developed an NMR-
based methodology, INPHARMA (Interligand NOEs for
PHARmacophore MApping) (Sanchez-Pedregal et al. 2005;
Reese et al. 2007; Orts et al. 2008), which is able to reveal
the relative, and in favourable cases even the absolute,
binding mode of competitively binding, low-affinity
ligands, with the sole requirement of a structural model of
the apo-receptor. The relative binding mode of two ligands
interacting competitively with a common receptor allows
pharmacophore or ligand superimposition. This is an
essential step in SBDD, guiding the synthetic combination
of smaller ligands into a larger, higher affinity compound.
The absolute binding mode of ligands to the receptor rep-
resents a higher level of knowledge that allows optimization
of receptor/ligand interactions at an atomic level.
To demonstrate the efficacy of INPHARMA, we previously
validated the methodology on a system consisting of
protein kinase A (PKA) with two inhibitors of catalysis
(Fig. S1), for which the binding modes determined by
INPHARMA could be compared to existing crystal structures
(Orts et al. 2008). This test established the value of
INPHARMA and confirmed that the combination of tr-NOEs
and INPHARMA NOEs, in the presence of a structural model
for the apo-protein, allows discrimination between a few,
very diverse docking modes (Orts et al. 2008).
Here, we demonstrate the use of INPHARMA data as a
high-throughput scoring function for binding modes
predicted by molecular docking. We show that INPHARMA
allows an increase in accuracy of two orders of magnitude
with respect to state-of-the-art docking scoring functions
and provides ligand binding modes at high resolution (down
to less than 1 Å). In addition, we show that INPHARMA is
also applicable to receptors whose apo-form is not a good
representative of the holo-form or to receptors without an
accurate structural representation. The easy availability
of experimental INPHARMA data for target proteins of any
size and nature makes INPHARMA a tool of choice to
increase the reliability of docking models and to
substantially speed up the process of structure-based drug
design.
Experimental procedures
INPHARMA data
The INPHARMA data were measured on a mixture of the
Chinese hamster Cα catalytic subunit of cyclic adenosine
monophosphate (cAMP) dependent protein kinase A
(PKA) (25–30 µM), L1 (150 µM) and L2 (450 µM), as
described in Orts et al. (2008). The values of the measured
INPHARMA NOEs used in this work are in Table S1.
NOESY spectra were recorded at two mixing times
(τm = 300 and 600 ms) on an 800 MHz spectrometer
(Bruker, Karlsruhe). One NOESY spectrum was recorded
at a mixing time τm = 600 ms on a 900 MHz spectrometer
(Bruker, Karlsruhe).
Molecular dynamics simulations
Molecular dynamics (MD) simulations were performed for
the free PKA, starting from the crystal structure of
3DNE.pdb after removal of L1, with the software NAMD
(Phillips et al. 2005) and the CHARMM force field
(MacKerell et al. 1988). The protein was hydrated with a
5 Å layer of water molecules, and the water sphere was
maintained with a spherical harmonic potential. Langevin
dynamics was performed with a 2 fs time step using the
SHAKE algorithm, without coupling the hydrogens to the
thermal bath and with a damping coefficient γ of 5 per
picosecond. First, 30,000 steps of energy minimization
were performed at 0 K using the conjugate gradient and
line-search algorithm described in the NAMD manual.
To achieve a larger sampling of the conformational
space of the protein, we increased the temperature from an
initial value of 0 K to a final value of 1,200 K, raising it
by 50 K every 30,000 steps. Each final
structure was minimized at a temperature of 0 K in 30,000
steps.
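The heating protocol above can be written down as a simple schedule. The following is a minimal sketch, not the authors' NAMD input: it assumes that the first heated stage runs at 50 K and that each stage lasts exactly 30,000 steps, and the names `heating_schedule`, `STEPS_PER_STAGE` and the other constants are chosen here for illustration only.

```python
# Sketch of the heating schedule described in the text.
# Assumption: the first heated stage is at 50 K (the text only gives the
# 0 K start, the 1,200 K end and the 50 K increment every 30,000 steps).
TIME_STEP_FS = 2           # integration time step in fs (from the text)
STEPS_PER_STAGE = 30_000   # steps between temperature increments (from the text)
T_INCREMENT_K = 50         # temperature increment in K (from the text)
T_FINAL_K = 1200           # final temperature in K (from the text)


def heating_schedule():
    """Return a list of (first_step, temperature_in_K) tuples, one per stage."""
    temperatures = range(T_INCREMENT_K, T_FINAL_K + 1, T_INCREMENT_K)
    return [(i * STEPS_PER_STAGE, t) for i, t in enumerate(temperatures)]


schedule = heating_schedule()
total_steps = len(schedule) * STEPS_PER_STAGE
total_ns = total_steps * TIME_STEP_FS * 1e-6  # convert fs to ns

print(f"{len(schedule)} stages, {total_steps} steps, {total_ns:.2f} ns")
```

Under these assumptions the ramp consists of 24 stages and covers roughly 1.4 ns of simulated time; a different reading of the protocol (for example a stage held at 0 K) would change these totals.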
Docking
PLANTS docking (Korb et al. 2009) was performed using
the ChemPLP scoring function and default parameters. The
crystal structures of PKA/L1 and PKA/L2 were aligned on
the protein. A spherical binding pocket, centered at the
center of mass of L1 and with a radius of 11.2 Å, was
used to restrict the sampling space. After pose clustering
using a threshold of 0.5 Å, the best-ranked 200 poses
were used for further evaluations.
SURFLEX utilizes an idealized binding-site ligand as a
target to generate conjectural poses of molecules. The
idealized binding-site ligand is calculated for each of the
700 protein structures generated by MD simulation and
minimized. Ligands are docked into the protein to optimize
the value of the Hammerhead scoring function. For the
analysis, we select the 10 best scoring poses for each
protein structure. A similarity filter of 0.5 Å is applied for
poses docked into the same protein structure, resulting in a
final set of 4,636 and 4,758 unique complex structures for
PKA/L1 and PKA/L2, respectively.
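A similarity filter of this kind can be pictured as a greedy pass over the score-ordered poses of one protein structure. The sketch below is an illustration under stated assumptions, not the filter implemented in SURFLEX or GLIDE: poses are lists of (x, y, z) coordinates with identical atom ordering, the RMSD is computed without superposition because all poses share the same protein frame, and the function names `rmsd` and `filter_redundant` are hypothetical.

```python
import math


def rmsd(pose_a, pose_b):
    """All-atom RMSD in Å between two poses with identical atom ordering.

    No superposition is applied: both poses are assumed to sit in the
    coordinate frame of the same protein structure.
    """
    squared = sum((x - y) ** 2
                  for atom_a, atom_b in zip(pose_a, pose_b)
                  for x, y in zip(atom_a, atom_b))
    return math.sqrt(squared / len(pose_a))


def filter_redundant(poses, threshold=0.5):
    """Keep a pose only if it lies at least `threshold` Å from every kept pose.

    `poses` must be ordered from best to worst score, so that the best-scoring
    member of each family of similar poses is the one retained.
    """
    kept = []
    for pose in poses:
        if all(rmsd(pose, other) >= threshold for other in kept):
            kept.append(pose)
    return kept


# Toy two-atom poses (synthetic coordinates, for illustration only).
base = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
near = [(0.1, 0.0, 0.0), (1.6, 0.0, 0.0)]  # 0.1 Å from base: redundant
far = [(2.0, 0.0, 0.0), (3.5, 0.0, 0.0)]   # 2.0 Å from base: retained

unique = filter_redundant([base, near, far])
print(len(unique))  # 2 poses remain
```

Because the filter is applied per protein structure, similar ligand poses found in two different protein models would both survive, which matches the treatment of redundancy described in the text.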
GLIDE requires preparing each protein target with the
"preparation wizard" option. For each prepared protein, we
generated a grid around the binding site that was used as
the docking target. A simple precision docking run was
performed for each of the 700 protein structures generated
by MD simulation, producing 5 poses per protein structure
per ligand (PKA/L1 and PKA/L2). As for the SURFLEX
docking, a similarity filter of 0.5 Å (integrated in GLIDE)
deleted redundant ligand poses within the same protein
structure. This procedure resulted in a final set of 2,697 and
3,069 unique complex structures for PKA/L1 and PKA/L2,
respectively.
Calculation of INPHARMA
The theoretical INPHARMA NOEs for the complex pairs
generated by docking were calculated with a program
written in-house following the theory developed in (Reese
et al. 2007; Orts et al. 2009). Protons within 8 Å from any
ligand proton were included in the full relaxation matrix
calculation. For the docking of Fig. 1, 40,000 complex
pairs were calculated; for the docking with SURFLEX in
Fig. 2 the dataset comprised 1,095,085, 2,694,315,
1,159,884, 70,448 and 234,060 complex pairs for proteins with
binding pocket RMSD in the range 0–1, 1–2, 2–3, 3–4 and
4–5 Å from the crystal structure, respectively, of which
8,331, 16,770, 3,296, 181 and 412 correspond to the correct
relative orientation of the ligands in each protein RMSD
range.
Fig. 1 a Initial pool of docked structures for the complexes PKA/L1
and PKA/L2. The receptor model used in the docking is the PKA
structure of 3DNE.pdb. The complex pairs in b pass the selection
through the INPHARMA data (Pearson correlation coefficient R²
between the experimental and the theoretical INPHARMA
NOEs > 0.89). All these complex pairs show a very low ligand
RMSD from the correct binding mode. c Overlap of L1 and L2 in the
complex pairs of b, after superimposition of the protein structures.
The INPHARMA data define the orientation of L1 and L2 correctly to
0.5 and 1 Å resolution, respectively
The ranking of the complex pairs was based on the
centered Pearson correlation coefficient between the
measured and the predicted INPHARMA NOEs. Structures
were accepted when the Pearson correlation coefficient R²
was higher than 0.89 for the data of Fig. 1 and 0.72 for the
data of Fig. 2 and Fig. S4. An additional filter was applied
based on the qualitative agreement of very weak
INPHARMA NOEs, which were visible only at high field
due to the better sensitivity of the instrumentation. For the
docking of Fig. 2, 107, 208, 189, 26 and 23 complex pairs with
proteins in an RMSD range of 0–1, 1–2, 2–3, 3–4 and 4–5 Å
from the crystal structure, respectively, passed the
selection, of which 98, 63, 60, 5 and 7 correspond to the correct
relative orientation of the ligands in each protein RMSD
range.
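The selection step can be sketched in a few lines. This is a minimal illustration, not the in-house program: it assumes that each complex pair already has a list of predicted INPHARMA NOE intensities in the same order as the experimental ones, it does not perform the relaxation matrix calculation, and it additionally requires a positive correlation, which is an assumption made here so that anti-correlated data are not accepted. The names `pearson_r` and `select_pairs` and the toy intensities are hypothetical.

```python
import math


def pearson_r(x, y):
    """Centered Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_pairs(experimental, predicted_by_pair, r2_cutoff=0.89):
    """Rank complex pairs by R² and keep those above the cutoff.

    Returns a list of (pair_id, r_squared) sorted by decreasing R².
    Assumption: pairs with a negative correlation are rejected.
    """
    scored = []
    for pair_id, predicted in predicted_by_pair.items():
        r = pearson_r(experimental, predicted)
        if r > 0:
            scored.append((pair_id, r * r))
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(pair_id, r2) for pair_id, r2 in scored if r2 > r2_cutoff]


# Synthetic intensities, for illustration only.
experimental = [1.0, 2.0, 3.0, 4.0]
predicted_by_pair = {
    "pair_good": [1.1, 1.9, 3.2, 3.9],
    "pair_bad": [4.0, 1.0, 3.0, 2.0],
}
selected = select_pairs(experimental, predicted_by_pair)
print([pair_id for pair_id, _ in selected])
```

The cutoff is a parameter because the text uses 0.89 for the data of Fig. 1 and 0.72 for the data of Fig. 2 and Fig. S4. The additional qualitative filter on very weak NOEs is not modeled in this sketch.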
For the docking with GLIDE in Fig. S4, the dataset
comprised 452,760, 908,974, 575,320, 36,864 and 73,060
complex pairs for proteins with binding pocket RMSD in
the range 0–1, 1–2, 2–3, 3–4 and 4–5 Å from the crystal
structure, respectively, of which 13,560, 8,694, 4,184, 160
and 166 correspond to the correct relative orientation of the
ligands in each protein RMSD range. Moreover, 2, 485,
585, 35 and 39 complex pairs with proteins in an RMSD range
of 0–1, 1–2, 2–3, 3–4 and 4–5 Å from the crystal structure,
respectively, passed the selection, of which 2, 114, 103, 19
and 9 correspond to the correct relative orientation of the
ligands in each protein RMSD range.
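The accuracy before and after INPHARMA selection follows directly from the counts listed in this section. The short calculation below uses the SURFLEX counts reported above for the dataset of Fig. 2; the variable names and the "enrichment" ratio (accuracy after selection divided by accuracy before) are introduced here for illustration and are not terms defined by the authors.

```python
# Counts for the SURFLEX dataset (Fig. 2), copied from the text, per binding
# pocket RMSD range (Å) of the receptor model from the crystal structure.
BINS = ["0-1", "1-2", "2-3", "3-4", "4-5"]
TOTAL_DOCKED = [1_095_085, 2_694_315, 1_159_884, 70_448, 234_060]
CORRECT_DOCKED = [8_331, 16_770, 3_296, 181, 412]
TOTAL_SELECTED = [107, 208, 189, 26, 23]
CORRECT_SELECTED = [98, 63, 60, 5, 7]


def percent(correct, total):
    """Fraction of correct complex pairs, expressed in percent."""
    return 100.0 * correct / total


before = [percent(c, t) for c, t in zip(CORRECT_DOCKED, TOTAL_DOCKED)]
after = [percent(c, t) for c, t in zip(CORRECT_SELECTED, TOTAL_SELECTED)]
enrichment = [a / b for a, b in zip(after, before)]

for name, b, a, e in zip(BINS, before, after, enrichment):
    print(f"{name} Å: before {b:.2f}%  after {a:.1f}%  enrichment {e:.0f}x")
```

For receptor models within 1 Å of the crystal structure, the accuracy rises from below 1% to above 90%, which corresponds to an improvement of about two orders of magnitude.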
Results and discussion
The INPHARMA method is based on the observation of
interligand, spin-diffusion-mediated, transferred-NOE data
between two ligands, L1 and L2, binding competitively and
weakly to a receptor T (Fig. S2). As the ligands are
competitive binders, such NOEs do not originate from a direct
transfer of magnetization between the two ligands, but
rather from a spin-diffusion process mediated by the
protons of the receptor binding pocket and are, therefore,
dependent on the specific interactions of each of the two
ligands with the protein (Sanchez-Pedregal et al. 2005). In
line with common SBDD workflows, the INPHARMA
NOEs are used to select among possible complex structures
suggested by molecular docking. The bound ligand
structures, which can be determined by tr-NOEs, are docked to a
structural model of the apo-receptor. A library consisting of
pairs of complex structures (receptor/L1 and receptor/L2) is
generated by combining all docking modes of L1 to the
receptor with all docking modes of L2 to the receptor. The
resulting docking model pairs are ranked on the basis of
the agreement between the predicted and the experimental
INPHARMA NOEs (Reese et al. 2007).
Previously, we demonstrated that INPHARMA is able to
determine the binding mode of the two ligands L1 and L2 to
the catalytic subunit of PKA (Orts et al. 2008). In this work
we aim at establishing INPHARMA as an effective scoring
function for binding modes in high-throughput docking
campaigns. First, we evaluate the ability of INPHARMA to
provide high-resolution binding modes when ligands are
docked to a correctly folded binding pocket; second, we
evaluate the efficacy of the methodology as a function of
the accuracy of the protein structure used in the docking
experiments. We prove that the use of experimental
INPHARMA data to score binding modes generated in silico
provides a considerable improvement in the accuracy of the
selection of the correct binding pose, even when using a
poor representation of the protein binding pocket. As a test
system, we use the two ligands L1 and L2 bound to the
catalytic subunit of the protein PKA, for which
experimental data have been measured in the laboratory as
described in the Experimental Section. L1 and L2 bind PKA
with KD values of 6 and 16 µM, respectively, and are therefore
suitable to measure both transferred-NOEs and INPHARMA
NOEs. The crystal structures of the complexes PKA/L1
and PKA/L2 (3DNE.pdb and 3DND.pdb, respectively)
serve as benchmark to evaluate the performance of
INPHARMA.
INPHARMA allows the definition of binding modes
to 1 Å resolution
The bound structures of L1 and L2, which can be determined
by transferred-NOEs, are docked into the structure
of the catalytic subunit of PKA from 3DNE.pdb after
removal of the ligand. The PKA structure of 3DND.pdb
could have been used instead, as the protein heavy atom
RMSD (root mean square deviation) between the two complexes
is only 0.28 Å. For each ligand, 200 docking modes are
generated with the program PLANTS (Korb et al. 2009) and
combined pair-wise to give 40,000 pairs of complex
structures of PKA/L1 and PKA/L2. Each pair of this library is
represented in Fig. 1 in terms of the RMSD of each ligand
from the true binding mode, as observed in the crystal
structures of the PKA/L1 and PKA/L2 complexes
(3DNE.pdb and 3DND.pdb). The initial library of docking
modes contains complex structure pairs where both
ligands are in the correct orientation (lower left corner),
both ligands are in the wrong orientation (upper right
corner) or only one ligand is in the correct orientation
(lower right and upper left corners). Next we ranked the
40,000 structure pairs with respect to the agreement
between the theoretical, predicted INPHARMA NOEs for
each structure pair and the experimentally
measured INPHARMA NOEs of Table S1. The purpose of
this analysis is to verify whether INPHARMA data can be
used to select the correct binding modes of L1 and L2 and
to determine the maximum achievable resolution of the
resulting complex structures. We use the linear correlation
coefficient R² to describe the agreement between
experimental and theoretical INPHARMA data; pairs of complex
structures with R² > 0.89 are accepted. Indeed, the
structures selected by INPHARMA (Fig. 1b) are those of the
lower left corner of the graph of Fig. 1a, namely close to
the correct binding poses for both ligands. A closer analysis
of the INPHARMA-selected structures reveals that they
correspond to only one orientation per ligand, with L1 and
L2 being defined to a precision better than 0.5 and 1 Å,
respectively (Fig. 1c). The maximum distance between two
INPHARMA-selected structures is between the orange and
the yellow binding mode of L2 (Fig. 1c) and corresponds to
a rotation of 21° around the axis perpendicular to the figure
plane. This result highlights the impressive performance of
INPHARMA, which distinguishes even between closely
related binding modes at a high level of resolution (~1 Å).
Fig. 2 Accuracy of the INPHARMA predictions as a function of the
quality of the receptor structure. The x-axis (binding pocket RMSD, Å)
represents the (protein only) binding pocket RMSD of the receptor
models used in the docking from the crystallographic structure of
PKA in the complex PKA/L1 (3DNE.pdb). The accuracy on the y-axis
(in %) is defined as the number of complex pairs reproducing the
correct ligand superposition (relative binding mode of L1 and L2)
divided by the total number of pairs selected by INPHARMA. The
numbers over each bar in red (0.8%, 0.6%, 0.2%, 0.2% and 0.2%)
represent the accuracy before applying the INPHARMA score. In this
case the accuracy is the number of the complex pairs showing the
correct ligand superposition divided by the total number of complex
pairs selected by the energy function of the docking program. The
docking for this dataset was performed with SURFLEX
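The construction of the pair library used in this section and its description in terms of ligand RMSD can be illustrated as follows. This is a sketch under stated assumptions: each docking mode is represented only by its ligand RMSD from the crystallographic binding mode, the RMSD values are synthetic placeholders, and the 2.0 Å cutoff used to call an orientation "correct" is a hypothetical choice that is not taken from the text.

```python
from itertools import product

CORRECT_CUTOFF = 2.0  # Å; hypothetical threshold for a "correct" orientation


def classify(rmsd_l1, rmsd_l2, cutoff=CORRECT_CUTOFF):
    """Assign a complex pair to one of the four regions of an RMSD/RMSD plot."""
    l1_ok = rmsd_l1 <= cutoff
    l2_ok = rmsd_l2 <= cutoff
    if l1_ok and l2_ok:
        return "both correct"
    if l1_ok:
        return "only L1 correct"
    if l2_ok:
        return "only L2 correct"
    return "both wrong"


# Placeholder ligand RMSDs (Å) for 200 docking modes per ligand; real values
# would come from comparing each docked pose with the crystal structure.
rmsd_l1_poses = [0.1 * i for i in range(200)]
rmsd_l2_poses = [0.1 * i for i in range(200)]

# Every docking mode of L1 is combined with every docking mode of L2.
library = list(product(rmsd_l1_poses, rmsd_l2_poses))

counts = {}
for r1, r2 in library:
    label = classify(r1, r2)
    counts[label] = counts.get(label, 0) + 1

print(len(library))  # 200 x 200 = 40000 complex pairs
print(counts)
```

The full Cartesian product is affordable here because only 200 modes per ligand are retained; the much larger libraries reported for the MD-derived receptor models follow from the same pairwise construction.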
The receptor model used in the docking can be derived
either from the structure of the apo-receptor or from the
structure of the receptor in complex with a reference ligand
Lx. In the absence of conformational rearrangements
between the apo- and the holo-receptor, or between the
receptor/Lx and the receptor/L1 (receptor/L2) complexes,
the absolute binding mode of any ligand (L1 … Ln) can be
derived at a high confidence level from INPHARMA data
measured for pair-wise combinations of ligands (e.g. L1
and L2).
INPHARMA alleviates the need to crystallize the
receptor in complex with all chemical lead series of
interest, overcoming an important limiting factor in the
daily work of the pharmaceutical industry. The binding modes
of all chemical series of interest are within reach through
the measurement of a few INPHARMA NOESY spectra
and the use of the INPHARMA NOEs as a reliable
selection criterion for docking modes. The NMR time
necessary to acquire data for a ligand pair amounts to only
2 days, while the calculation time is less than 1 day for
40,000 pairs of docking models.
INPHARMA allows a 100-fold improvement
with respect to docking scoring functions
Despite the enormous potential of INPHARMA demonstrated
in the previous section, pharmaceutical research
often faces more challenging cases, where either the
structure of the receptor is not known at a high level of
accuracy or the receptor undergoes substantial conformational
changes between the apo- and holo-forms (Bartoschek
et al. 2010). In this section the performance of
INPHARMA as an energy function to rank docking modes
generated from an ill-defined protein structure is
systematically tested and compared with the performance of
state-of-the-art docking scoring functions.
As a test system we use the protein PKA in complex
with L1 and L2 (Fig. S1). Structures of PKA that differ
from the ligand-bound structure were generated by a
high-temperature molecular dynamics simulation run starting
from the crystal structure of 3DNE.pdb after removal of L1.
700 frames were sampled during the simulation, resulting
in structures that display 0.5–6 Å heavy atom RMSD in the
binding pocket from the ligand-bound structure (Fig. S3).
In our definition the binding pocket comprises all atoms
with distance < 8 Å from any ligand atom in the crystal
structure. All frames were subjected to energy minimization
in explicit water. This initial library of PKA models
contains structures in a wide range of distances from the
correct one and is therefore optimally suited to evaluate the
performance of INPHARMA as a function of the accuracy
of the protein structural model.
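The binding pocket definition and the pocket RMSD used to characterize each receptor model can be sketched as follows. This is an illustration only: it assumes that coordinates are given as (x, y, z) tuples of heavy atoms with identical ordering in model and reference, that each model has already been superimposed on the crystal structure (the text does not specify the fitting procedure), and that models are grouped into 1 Å wide RMSD ranges. The function names and the toy coordinates are hypothetical.

```python
import math


def pocket_indices(protein_xyz, ligand_xyz, cutoff=8.0):
    """Indices of protein atoms closer than `cutoff` Å to any ligand atom."""
    return [i for i, atom in enumerate(protein_xyz)
            if any(math.dist(atom, lig) < cutoff for lig in ligand_xyz)]


def pocket_rmsd(model_xyz, reference_xyz, indices):
    """RMSD in Å over the pocket atoms, without any further fitting."""
    squared = sum(math.dist(model_xyz[i], reference_xyz[i]) ** 2
                  for i in indices)
    return math.sqrt(squared / len(indices))


def rmsd_range(value):
    """Label of the 1 Å wide RMSD range that contains `value`."""
    lower = int(value)
    return f"{lower}-{lower + 1}"


# Toy coordinates (synthetic, for illustration only).
reference = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
ligand = [(1.0, 0.0, 0.0)]
model = [(0.0, 1.5, 0.0), (5.0, 1.5, 0.0), (20.0, 9.0, 0.0)]

pocket = pocket_indices(reference, ligand)     # third atom is 19 Å away
value = pocket_rmsd(model, reference, pocket)  # both pocket atoms moved 1.5 Å
print(pocket, round(value, 2), rmsd_range(value))
```

Restricting the RMSD to pocket atoms means that large displacements far from the ligand, such as that of the third atom in the toy model, do not influence how a receptor model is classified.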
Next we docked the protein-bound conformation of L1
and L2 to each of the 700 structural models of the protein
PKA. We used the rigid docking module of the commercially
available software SURFLEX (Jain 2003) and
retained the 10 best energy solutions for each docking run.
A filter based on similarity was applied to exclude
redundant binding modes for the same protein model. Complex
structures with the same protein model and with ligand
all-atom RMSD < 0.5 Å were represented by one member of
the family. Note that similar ligand binding poses in two
different protein models are considered non-redundant and
are retained. The final set of complexes consists of 4,636
and 4,758 poses for PKA/L1 and PKA/L2, respectively.
The 4,636 and 4,758 structures for the PKA/L1 and
PKA/L2 complexes, respectively, have been selected by the
docking scoring function as the lowest energy ones and
represent the docking solution to the problem. At this point
it is interesting to evaluate which percentage of the docking
models predicts the correct relative orientation of the two