Development of Receptor Desolvation Scoring and Covalent
Sampling in DOCK 6: Methods Evaluated on a RAS Test Set
Y. Stanley Tan, Mayukh Chakrabarti, Reed M. Stein, Lauren E. Prentis, Robert C. Rizzo, Tom Kurtzman,
Marcus Fischer, and Trent E. Balius*
Cite This: J. Chem. Inf. Model. 2025, 65, 722−748
ABSTRACT: Molecular docking methods are widely used in drug discovery
efforts. RAS proteins are important cancer drug targets, and are useful systems for
evaluating docking methods, including accounting for solvation effects and
covalent small molecule binding. Water often plays a key role in small molecule
binding to RAS proteins, and many inhibitors (including FDA-approved drugs)
covalently bind to oncogenic RAS proteins. We assembled a RAS test set,
consisting of 138 RAS protein structures and 2 structures of KRAS DNA in
complex with ligands. In DOCK 6, we have implemented a receptor desolvation
scoring function and a covalent docking algorithm. These new features were
evaluated using the test set, with pose reproduction, cross-docking, and enrichment
calculations. We tested two solvation methods for generating receptor desolvation scoring grids: GIST and 3D-RISM. Using grids
from GIST or 3D-RISM, water displacements are precomputed with Gaussian-weighting, and trilinear interpolation is used to speed
up this scoring calculation. To test receptor desolvation scoring, we prepared GIST and 3D-RISM grids for all KRAS systems in the
test set, and we compare enrichment performance with and without receptor desolvation. Accounting for receptor desolvation using
GIST improves enrichment for 51% of systems and worsens enrichment for 35% of systems, while using 3D-RISM improves
enrichment for 44% of systems and worsens enrichment for 30% of systems. To more rigorously test accounting for receptor
desolvation using 3D-RISM, we compare pose reproduction with and without 3D-RISM receptor desolvation. Pose reproduction
docking with 3D-RISM yields a 1.8 ± 2.41% increase in success rate compared to docking without 3D-RISM. Accounting for
receptor desolvation provides a small, but significant, improvement in both enrichment and pose reproduction for this set. We tested
the covalent attach-and-grow algorithm on 70 KRAS systems containing covalent ligands, obtaining similar pose reproduction
success rates between covalent and noncovalent docking. Comparing covalent docking to noncovalent docking, there is a 2.4 ±
3.29% increase and a 1.27 ± 3.33% decline in the success rate when docking with experimental and SMILES-generated ligand
conformations, respectively. As a proof-of-concept, we performed covalent virtual screens with and without receptor desolvation
scoring, targeting the switch II pocket of KRAS, using 3.4 million make-on-demand acrylamide compounds from the Enamine REAL
database. On average, the attach-and-grow algorithm spends approximately 17.61 s per molecule across the screen. The test set is
available at https://github.com/tbalius/teb_docking_test_sets.
1. INTRODUCTION
RAS proteins play an important role in cellular signaling,
particularly in modulating pathways governing cell growth and
proliferation. RAS genes (KRAS, HRAS, and NRAS) are among
the most prevalent oncogenes, with mutations occurring in
∼20% of human cancers.
1−3
Thus, RAS is an important and long
sought-after cancer drug target. RAS proteins are small GTPases
that bind a nucleotide, either GTP or GDP, and function as
binary switches.
4
GTP-bound RAS is in an “active” state, and
GDP-bound RAS is in an “inactive” state. In recent years, there
has been an influx of structures of RAS proteins in complex
with inhibitors, including approved drugs.
5,6
This structural
compendium
7
provides an excellent opportunity to develop
benchmarks for evaluating the effectiveness of new docking
software features that can contribute to virtual screening
campaigns against this challenging target.
To assess therapeutically useful areas of chemical space,
molecular docking methods are commonly used in drug
discovery campaigns.
8
Numerous docking programs are currently
available, including AutoDock Vina,
9
Glide,
10
FRED,
11
GNINA,
12
FlexX,
13
rDOCK,
14
SEED,
15
PLANTS,
16
Gold,
17
DiffDock,
18
and many others.
19,20
DOCK is the first docking
software,
21
and has undergone extensive development over the
past four decades. Two versions of DOCK are being actively
developed: DOCK 3 and DOCK 6. Both versions have benefits
and drawbacks; DOCK 3 is faster and is well tested, and DOCK
6 is modular and amenable to the development of new features.
There are many potential areas of improvement for docking
methods, including better accounting for water, and improving
covalent docking methods. In recognition of these limitations,
we are porting desirable features from DOCK 3 into DOCK 6
and augmenting them, e.g., the receptor desolvation scoring
function, and implementing new features that have been
inspired by them, e.g., the covalent docking method introduced
herein. We discuss these methods further in the following
sections.
Received: September 9, 2024
Revised: December 4, 2024
Accepted: December 17, 2024
Published: January 6, 2025
© 2025 The Authors. Published by American Chemical Society
https://doi.org/10.1021/acs.jcim.4c01623
This article is licensed under CC-BY-NC-ND 4.0
Water has long been recognized as an important factor and
driver in molecular binding,
22,23
and appears to play a role in
small molecule binding to RAS.
24−26
There are many methods
to account for the contribution of water to binding in biological
systems.
27
Two methods grounded in statistical mechanics are
inhomogeneous solvation theory (IST)
28−30
and 3D reference
interaction site model (3D-RISM) theory.
31,32
IST has been
used by us and others, and it has been applied to ligand discovery
efforts.
33−37
Grid-based IST (GIST) has been useful in
understanding the thermodynamics and energetics of waters
on surfaces of proteins (as well as other molecules), and in their
cavities and openings. These grids provide a visualization of the
hydration patterns about the protein binding site.
38
Accounting
for receptor desolvation during docking with GIST has been
implemented in DOCK 3.
31,32
Many other methods have been
used to account for solvation effects in molecular docking.
39−41
As an alternative to GIST, 3D-RISM can be used to generate
solvation energy grids that can be readily incorporated into
receptor desolvation scoring.
In addition to noncovalent drugs, there are many examples of
covalent drugs,
42,43
including the first approved drug targeting
KRAS.
44
Better covalent docking methods would aid the
discovery of new covalent drugs. Many covalent docking
approaches have been reported, including DOCKovalent,
45
CovDOCK,
46,47
covalent docking using AutoDock4,
48
and
many others.
49
Most of these methods target cysteine residues.
However, other residues may also be attacked, including serines,
aspartates, and arginines, all of which have been targeted for
RAS,
50−52
as well as lysines, tyrosines, threonines, and
histidines.
53,54
There are many chemical moieties that may be
included in compounds as attacking groups, including
acrylamides, chloroacetamides, vinyl sulfones, disulfides, phenol
esters, and many more.
55
DOCKovalent was implemented
within DOCK 3, and it has been used in ligand discovery
campaigns.
45,56
In this work, we present three interconnected topics: (1) We
assembled a RAS test set for evaluating docking performance on
RAS to aid drug discovery and method development. We
evaluated new docking methods on the RAS test set, using both
enrichment calculations and pose reproduction experiments. (2)
We implemented receptor desolvation scoring functions within
DOCK 6, and we tested two solvation methods that account for
receptor desolvation within these scoring functions: GIST and
3D-RISM. GIST and 3D-RISM both calculate similar solvation
energetic quantities that are then stored on a grid; however, the
two methods differ in the underlying theory. GIST quantities are computed
by analyzing an MD simulation, while 3D-RISM numerically
solves the Ornstein−Zernike equation with a closure approximation,
without simulation (see Section 2 for more
information). We evaluated these two methods for accounting
for receptor desolvation by comparing enrichment and pose
reproduction performance on KRAS systems in the test set (and,
for enrichment, on an alternative test set of 43 diverse targets).
(3) We implemented a covalent docking algorithm in DOCK 6,
called attach-and-grow. In this method, the molecule is first
attached to the receptor at the specified covalent residue, and
then the molecule is grown out to generate the pose. That is,
user-specified covalent bond information is used to position the
attachment point, then the molecule is revealed one segment at a
time, and degrees of freedom are sampled, until the molecule is
fully displayed. We evaluated the attach-and-grow algorithm
across 70 covalent KRAS systems (and on an alternative test set
of 207 diverse targets), and we compared its performance to
noncovalent docking using the anchor-and-grow algorithm.
Finally, we combined receptor desolvation scoring and covalent
docking by performing proof-of-concept virtual screens with a
purchasable library of over 3 million covalent compounds to
demonstrate applicability of these methods to ligand discovery
campaigns.
2. METHODS
2.1. Test Set Generation. The RAS test set is made up of
KRAS, HRAS, and NRAS structures in complex with ligands (SI
Section 1 and SI Tables S1−S3). For a full list of ligands and
their corresponding PDB IDs, see Table S1. These structures
differ in nucleotide state and in sequence (Tables S2, S3, and
Figure S1). In all docking experiments, the nucleotide and, when
present, the ion cofactors, are treated as part of the receptor.
To add systems to the RAS test set, structures from the
Protein Data Bank
57
(PDB) are prepared for docking in a
semiautomated manner. The preparation procedure is organized
into two stages: processing the PDB files and preparing the
docking setups. In addition to the RAS test set, we evaluated our
methods on test sets of diverse targets consisting of 207 covalent
systems from Scarpino et al.
49
and 43 systems from the DUDE-Z
test set,
58
prepared in a similar manner.
In the first stage, the PDB files are processed. The files are
separated into ligand, cofactor, and protein files, and each piece
is prepared by adding hydrogens and partial charges.
(1) PDB files are separated into ligand, cofactor, and protein.
A ligand−receptor pair is flagged as covalent if there is a
link definition in the PDB file between the ligand and the
receptor. Ligands flagged as covalent are prepared for
both covalent and noncovalent docking (to exploit
covalent systems for noncovalent docking evaluation,
and to directly compare covalent and noncovalent
docking). Covalently prepared ligands are docked using
the covalent attach-and-grow method. Noncovalently
prepared ligands are docked using two noncovalent
docking methods: anchor-and-grow and hierarchical
database search (see Section 2.4 below). For covalent
systems, the residue that forms a covalent bond with the
ligand is mutated to an alanine to avoid steric clashes.
(2) Cofactor files are protonated using UCSF Chimera.
59
(3) Proteins are passed through Chimera’s Dock Prep, where
solvent (including all structural waters) and alternative
side chains are removed, and missing rotamers of side
chains are added. Hydrogens are added using the program
Reduce.
60
(4) As with the cofactors, experimental conformations of the
ligands (xtal) are protonated using Chimera, and partial
charges are assigned using AM1-BCC charges
61
calculated in Chimera using Amber’s sqm program.
62
(5) The processed ligand, cofactor, and protein files are
aligned to the same frame (aligned to PDB ID 6GJ8)
using Chimera’s MatchMaker.
If needed, the processed ligand, cofactor, and protein
files are modified by hand, allowing us to salvage systems
that fail to run through our automated pipeline. The files
are then placed in a staging directory for convenient
access.
In the second stage, the files for docking produced in
the first stage are used to prepare the docking setups.
(6) Partial charges of the protonated nucleotides are
generated with Amber’s antechamber using the AM1-
BCC model. The protonated receptor is opened in
Chimera, and partial charges are added using the Dock
Prep module. The nucleotide is added to the receptor
mol2 file using the combine command in Chimera. The
charged nucleotide mol2 files are converted to parameter
files to be read into blastermaster.py; charges are taken
from the mol2 file, and Sybyl mol2 atom types are
converted into DOCK atom types (used to define the van
der Waals parameters).
(7) The processed ligand mol2 files containing the experimental
structure from step 1 are sufficient for docking
with anchor-and-grow using Grid Score. AMSOL 7.1
63
is
also used to recalculate partial charges and the per-atom
desolvation parameters for use with Chemgrid Score. To
prepare covalent ligands noncovalently, the atom involved
in the covalent bond is removed to ensure that the
experimental pose can be reproduced without clashing.
The ligand is then prepared in the same way as other
noncovalent ligands. To prepare ligands covalently, the
ligand with the covalent adduct (i.e., the γ sulfur, Sγ, and β
carbon, Cβ) is protonated, and partial charges are added
using Chimera and AMSOL 7.1. AMSOL is also used to
calculate the per-atom desolvation parameters. Hydro-
gens associated with the covalent adduct are removed, and
the partial charges of the two remaining heavy atoms are
set to zero. The sum of the removed partial charges is
distributed to the remaining atoms. The two covalent
adduct heavy atoms are then converted to dummy atoms.
The Sγ and Cβ atoms are renamed D1 and D2,
respectively, and the atom type is changed to Du.
(8) Ligands are prepared from SMILES. SMILES of each
ligand are obtained from the PDB (see SI Table S1).
Using the covalent linkage information from the PDB
files, the SMILES of covalent ligands are modified for
noncovalent docking (SI Table S1). Ligands are
converted from 2D SMILES format to a 3D file format
for docking in a multistep process (discussed also in
Section 2.6). The scripts to prepare ligands from SMILES
are now distributed with DOCK 6.12. Briefly, protonation
states of the ligands are determined using ChemAxon’s
cxcalc
64
(https://www.chemaxon.com). The protonated
SMILES strings are converted to a single 3D conformation
in the mol2 file format using Corina.
65
AMSOL
7.1 is used to calculate partial charges and per-atom
desolvation parameters. The partial charges and desolvation
parameters are added to the mol2 file. This mol2 file
can be used for docking with anchor-and-grow using
Chemgrid Score. Covalent ligands are prepared noncovalently
using the same procedure, but the covalent
adduct atom is removed from the SMILES. The
procedure to prepare ligands covalently from SMILES is
described below in Section 2.3.1.
(9) DOCK 6 grids and spheres are generated. Grids are used
for scoring, and spheres are used for orienting the ligand
into the binding site. The program DMS
66
is used to
generate the molecular surface of the protein. The
program sphgen
67
is used to generate surface spheres
representing the inverse image of the protein. The
program sphere_selector is used to pick the spheres within
8 Å of the ligand, and a box is generated encompassing the
spheres. DOCK 6’s grid program
68
is used to calculate the
van der Waals grids (using an all-atom force field and 6−9
exponents) and Coulombic grids (using a distance
dependent dielectric).
69
(10) DOCK 3 grids and spheres are generated. We use the
blastermaster.py script distributed with DOCK 3.7.
DOCK 3 spheres, called matching spheres, are generated
by converting the crystallographic ligand atoms to spheres
and supplementing with surface spheres from sphgen. The
DOCK 3 grids are converted to a DOCK 6 compatible
format using the gconv program distributed with DOCK
6.12 (updated to work with grids produced with
blastermaster). Blastermaster is first run without the
nucleotide. The nucleotide is then added to the
protonated receptor file. Blastermaster is run a second
time without the hydrogen generation step (addNOhydrogensflag)
and using the van der Waals and partial
charge parameters supplemented with the nucleotide
parameters generated in step 6.
(11) Receptor desolvation scoring grids are generated using
GIST and 3D-RISM, as described below in Sections
2.2.2−2.2.5.
Scripts for the semiautomated procedure used to prepare the
test set, as well as the processed ligand, cofactor, and protein files
for docking, have been made available on GitHub.
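The charge bookkeeping in step 7 of the procedure above can be illustrated with a minimal sketch. It assumes that the charges of the removed hydrogens and of the two zeroed adduct heavy atoms are spread in equal shares over the remaining ligand atoms so that the net charge is conserved; the text does not state how the charge is distributed, so the equal split is an assumption. The function name and the dictionary layout are hypothetical and are not part of the DOCK 6 preparation scripts.

```python
def make_covalent_dummy_atoms(atoms, adduct_heavy, adduct_hydrogens):
    """Return a new atom list with the covalent adduct converted to dummy atoms.

    atoms: list of dicts with keys "name", "type", and "charge".
    adduct_heavy: the two adduct heavy-atom names, in order (e.g., S-gamma, C-beta).
    adduct_hydrogens: names of the hydrogens attached to the adduct atoms.
    """
    # Charge carried by everything that is removed or zeroed.
    removed_charge = sum(
        a["charge"] for a in atoms
        if a["name"] in adduct_hydrogens or a["name"] in adduct_heavy
    )
    # Drop the adduct hydrogens; copy the dicts so the input is not modified.
    kept = [dict(a) for a in atoms if a["name"] not in adduct_hydrogens]
    ligand_atoms = [a for a in kept if a["name"] not in adduct_heavy]

    # Assumption: equal share per remaining ligand atom (conserves net charge).
    share = removed_charge / len(ligand_atoms)
    for a in ligand_atoms:
        a["charge"] += share

    # Convert the two adduct heavy atoms to dummy atoms D1 and D2 with zero charge.
    for new_name, old_name in zip(("D1", "D2"), adduct_heavy):
        for a in kept:
            if a["name"] == old_name:
                a.update(name=new_name, type="Du", charge=0.0)
    return kept
```

Whatever distribution scheme is used, the net charge of the prepared ligand should equal that of the protonated ligand plus adduct, which is a simple check to run after preparation.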
2.2. DOCK Scoring Functions. In this study, we evaluate
three DOCK 6 scoring functions: (1) Grid Score, (2) Chemgrid
Score, and (3) Chemgrid + RecDesolv Score. RecDesolv Score
refers to the receptor desolvation scoring function, which can be
used with grids produced by either the GIST or 3D-RISM
solvation methods.
Grid Score is a two-component scoring function
$$E_{\mathrm{Grid\_Score}} = E_{\mathrm{vdw}} + E_{\mathrm{coul}}$$
It has a van der Waals component (Evdw) (all-atom, 6−9
attractive and repulsive exponents), and an electrostatic
component using Coulomb’s law (Ecoul) with a distance
dependent dielectric.
69
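As an illustration of the two Grid Score terms, the sketch below evaluates a generalized Lennard-Jones pair energy with selectable repulsive and attractive exponents, and a Coulomb term with a distance-dependent dielectric. It uses the textbook well-depth/minimum-distance form of the potential, which is not necessarily the parametrization used inside DOCK 6. The dielectric factor of 4 and the conversion constant of 332 kcal·Å/(mol·e²) are assumptions made for the example, and both function names are hypothetical.

```python
def generalized_lj(r, well_depth, r_min, rep_exp=9, att_exp=6):
    """Pair energy with a minimum of -well_depth at r = r_min.

    rep_exp and att_exp are the repulsive and attractive exponents
    (9 and 6 for the 6-9 form; 12 and 6 for the 6-12 form).
    """
    a, b = rep_exp, att_exp
    ratio = r_min / r
    return well_depth * ((b / (a - b)) * ratio**a - (a / (a - b)) * ratio**b)


def coulomb_distance_dependent(r, q1, q2, dielectric_factor=4.0):
    """Coulomb energy (kcal/mol) with dielectric D(r) = dielectric_factor * r.

    Charges are in units of e and r is in angstroms. Both the factor of 4
    and the constant 332 are assumptions for this illustration.
    """
    return 332.0 * q1 * q2 / (dielectric_factor * r * r)
```

With a distance-dependent dielectric, the electrostatic term decays as 1/r² rather than 1/r, which damps long-range charge interactions in the absence of explicit solvent.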
Chemgrid Score is a three-component scoring function
$$E_{\mathrm{Chemgrid\_Score}} = E_{\mathrm{vdw}} + E_{\mathrm{ele}} + E_{\mathrm{lig\_desolv}}$$
It has a van der Waals component (Evdw) (united-atom, 6−12
attractive and repulsive exponents), an electrostatic component
calculated using a Poisson−Boltzmann potential grid (Eele), and
a ligand desolvation (Elig_desolv) component.
68,70,71
RecDesolv Score can be called by itself for rescoring, although
it is more useful when combined with the above functions. The
receptor desolvation term is calculated as follows
$$E_{\mathrm{rec\_desolv}} = C * E_{\mathrm{water\_displacement}}$$
Here, the water displacement term (Ewater_displacement) is scaled
by a scaling factor (C) (see SI Section 8). The water
displacement term is calculated using the receptor desolvation
grids, discussed further below.
To incorporate receptor desolvation scoring, we call the
Descriptor Score scoring function, which is used to combine
Chemgrid Score with RecDesolv Score
$$E_{\mathrm{Chemgrid+RecDesolv\_Score}} = E_{\mathrm{vdw}} + E_{\mathrm{ele}} + E_{\mathrm{lig\_desolv}} + E_{\mathrm{rec\_desolv}}$$
2.2.1. Receptor Desolvation. We implemented a receptor
desolvation scoring function into DOCK 6. There are three
types of calculations: (1) GIST displacement, which is the same
as that presented in Balius et al.,
31
(2) GIST blurry displacement,
in which Gaussian weighting is applied to the voxels based on
proximity to the center of an atom to avoid issues with double
counting, and (3) GIST blurry trilinear displacement, in which
atomic displacements are precomputed to make the calculation
fast. GIST blurry trilinear displacement approximates GIST blurry
displacement, and was introduced previously.
32
See Section 2.2.6
for more information on these types of receptor desolvation
calculations. We first discuss how the receptor desolvation
scoring grids are produced. We then discuss how to calculate the
displacements. We also substitute the grids generated using
GIST with grids generated using 3D-RISM, which, in general, is
faster and easier to run (SI Figure S2).
2.2.2. GIST. To generate receptor desolvation grids using
GIST, we perform an all-atom molecular dynamics (MD)
simulation of the receptor in a box of water, where the receptor
heavy atoms are restrained and the water molecules sample
about the receptor. This simulation is then postprocessed. A grid
is laid over the region of interest, and for each grid point, or
voxel, the following thermodynamic properties are calculated
and stored as dx files: the receptor (solute)-water enthalpy (Esw),
the water−water enthalpy (Eww), the water−oxygen occupancy
(go), the translational entropy (Strans), and the orientational
entropy (Sorient). These quantities can be combined in different
ways. The go grid is used to reference the water−water term to
bulk water. The solute (receptor)-water enthalpy and water−
water enthalpy (referenced to bulk) are combined to give the total
enthalpy grid. The entropy grids are combined with the total
enthalpy grid to give the total free energy grid. Here, we focus on
the total enthalpy grid.
2.2.3. GIST Grid Generation. We generate GIST grids using
the procedure described in Balius et al.,
31
with a few key
differences. Briefly, the GIST grid generation procedure involves
running an MD simulation, followed by running the GIST
calculation to generate GIST grids. The GIST grids are then
processed and combined to generate the GIST docking grids,
and then blurred using the procedure described in Stein et al.
32
to generate precomputed grids.
As discussed in Section 2.1, antechamber was used to
parametrize the cofactor, AM1-BCC partial charges were
calculated, and frcmod and prep files were produced. Amber’s
tleap is used to add the force field, and to place the protein,
magnesium, and nucleotide in a box of water. The water box has
a margin of 10 Å in all directions from the complex. The 14SB
force field
72
was applied to the protein, the TIP3P water model
73
was applied to the solvent, 12-6-4 Li/Merz ion parameters
74
were used for the magnesium ion, and the GAFF force field,
75
supplemented with the frcmod file, was used to parametrize the
cofactor. We use Amber 22 pmemd.cuda to perform
minimization and equilibration, and to run the MD simulation.
As an initial test, for 20 KRAS systems, we run a 100 ns
simulation, in ten chunks of 10 ns each. To reduce the
computational demands of running GIST, for all of the KRAS
systems (N = 123), we run a shorter 10 ns simulation (see
Section 3 for more information). To run MD simulations on
systems with missing backbone regions, we capped the missing
regions with ACE and NME groups. See SI Table S1.1 for
systems with missing regions that were capped for MD
simulations. DUDE-Z systems are prepared similarly, and
GIST grids are generated using MD simulations of 10 ns with
all termini capped (see SI Section 5).
We then run the GIST calculation, using cpptraj version 6.22.0
of Amber, to process the simulation data and generate GIST
grids. We used an MPI-parallelization of GIST,
76
which provides
a significant speed up over the nonparallelized calculation (see
SI Section 2 and Figure S2). At this stage, the dimensions of the
GIST grid box are determined. We ensured the GIST grid box
completely contains the other grid boxes (e.g., the van der Waals
grid box) to not impact sampling coverage. If the GIST grids are
smaller, poses that extend outside the GIST grid are discarded,
and many poses that would normally be scored will not be
considered. Note that changes to the box size will have an impact
on enrichment and pose reproduction.
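The containment requirement can be checked before docking with a few lines of code. The sketch below assumes axis-aligned boxes described by their minimum and maximum corners in angstroms; the function name and the tolerance are hypothetical and are not part of DOCK 6.

```python
def box_contains(outer_min, outer_max, inner_min, inner_max, tol=1e-6):
    """Return True if the inner box lies entirely within the outer box.

    Each argument is an (x, y, z) corner; tol absorbs rounding in grid headers.
    """
    return all(
        o_lo - tol <= i_lo and i_hi <= o_hi + tol
        for o_lo, o_hi, i_lo, i_hi in zip(outer_min, outer_max, inner_min, inner_max)
    )
```

For example, `box_contains(gist_min, gist_max, vdw_min, vdw_max)` should return True for every scoring grid that is used together with the GIST grid.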
The grids produced by the GIST calculation are combined to
produce the docking GIST grid. We only use the enthalpy terms,
and we scale the water−water term by two.
31,36
We apply a cap
of ±3 kcal/mol/Å³ to the values of the voxels, i.e., any voxel that
exceeds a magnitude of 3 kcal/mol/Å³ is set to the cap value.
This energy capping step is used to remove extremely high
magnitude energies, and the value of 3 kcal/mol/Å³ follows
the procedure described previously.
31
The grids produced at this stage are
sufficient for docking with the GIST (full) displacement and
GIST blurry displacement methods. We then apply Gaussian
weighting and precompute displacements, as discussed above, to
obtain the blurred GIST grids. The blurred grids are used for
docking with the GIST blurry trilinear displacement method.
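The combination and capping steps described above can be sketched as follows. The example assumes that the water−water grid has already been referenced to bulk and that both grids are flattened to lists of energy densities in kcal/mol/Å³ with identical voxel ordering; the function name is hypothetical.

```python
def docking_enthalpy_grid(e_sw, e_ww_ref, ww_scale=2.0, cap=3.0):
    """Combine solute-water and bulk-referenced water-water enthalpy grids.

    Each voxel value is E_sw + ww_scale * E_ww, then clipped to [-cap, +cap].
    """
    if len(e_sw) != len(e_ww_ref):
        raise ValueError("grids must have the same number of voxels")
    combined = []
    for sw, ww in zip(e_sw, e_ww_ref):
        value = sw + ww_scale * ww
        combined.append(max(-cap, min(cap, value)))  # apply the symmetric cap
    return combined
```

The same clipping applies to other grids by changing `cap`, as long as the cap is expressed in the units of the grid being processed.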
2.2.4. 3D-RISM. 3D-RISM is an alternative solvation model to
GIST. Unlike GIST, 3D-RISM does not postprocess an MD
simulation. Instead, 3D-RISM is an approximation of the
molecular Ornstein−Zernike (OZ) equation,
77
derived from
statistical mechanics. To solve the 3D-RISM equation, a closure
relation is needed. Different approximate closures may be used,
including Kovalenko−Hirata (KH),
78
partial series expansion of
order n (PSEn),
79
and hyper-netted chain (HNC),
80,81
which
are implemented in Amber.
82
The 3D-RISM and closure
equations are solved on a 3D grid by iteratively converging on a
solution,
83
accelerated using the modified direct inversion of the
iterative subspace (MDIIS) method.
84
Convolution integrals in
the 3D-RISM equation are evaluated in reciprocal space using
the fast Fourier transform.
85,86
The 3D-RISM solution is then
used to calculate the thermodynamic properties at each grid
point, including the excess chemical potential (solvation free
energy, labeled exchem), solvation energy (labeled solvene), and
solute−solvent potential energy (labeled potUV).
82
The
solvation energy (solvene) and solute−solvent potential energy
(potUV) roughly correspond to the GIST total enthalpy (Esw +
Eww) and the GIST solute-water enthalpy (Esw), respectively.
3D-RISM models solvent molecules as being composed of
interaction sites, e.g., water has oxygen and hydrogen sites; and
3D-RISM produces separate distribution grids for each solvent
site. Because separate grids for oxygen and hydrogen solvent
sites are produced, the grids must be combined to obtain
molecular distribution grids like GIST. There are two
approaches implemented in Amber: (1) combining the grids
additively or (2) using the molecular reconstruction method
described by Nguyen et al.
87
We generated grids using both
methods, and we compare docking performance without
receptor desolvation to docking with receptor desolvation
calculated with additively combined 3D-RISM (see Section 3),
or with molecular reconstruction 3D-RISM (see SI Section 6).
Because additively combined 3D-RISM grids outperform
molecular reconstruction 3D-RISM grids for docking using
our scoring method, we focus only on additively combined 3D-
RISM in the main text.
2.2.5. 3D-RISM Grid Generation. The 3D-RISM grid
generation procedure closely follows the GIST procedure.
Because 3D-RISM does not postprocess a simulation, it also
does not require placing the protein in a water box, and 3D-
RISM grids are generated without changing the frame.
We perform a minimization with pmemd.cuda on the receptor
before running the 3D-RISM calculation using rism3d.snglpnt in
Amber. The 3D-RISM equation is solved using the KH, PSE2,
and PSE3 closures, in that order. The cTIP3P water model is
used, and 1D-RISM results for cTIP3P water using PSE3 closure
are provided through the cTIP3P_pse3.xvv input file. For
systems that fail to converge, we adjust the MDIIS settings until
the calculation is successful (SI Section 2 and Table S4). We
tested the excess chemical potential, solvation energy, and
solute−solvent potential energy grids summed over all solvent
sites, labeled exchem.1.dx, solvene.1.dx, and potUV.1.dx,
respectively, for our receptor desolvation energy calculations.
These 3D-RISM grids are treated as a direct substitute for the
total enthalpy GIST grids used for docking with the (full)
displacement and blurry displacement methods.
The 3D-RISM grids encompass the entire protein, and we
trim the box dimensions to include only the area around the
binding pocket, matching the size of the GIST grid box. Note
that the trimmed 3D-RISM grid box should completely contain
the other grid boxes to not impact sampling coverage, like the
GIST grids. After trimming the box dimensions, we apply a cap
and generate precomputed grids, following the same procedure
as for GIST. Because the magnitudes of the voxels for the 3D-
RISM grids are different from GIST, we truncate the maximum
values of the voxels in proportion to GIST by applying a cap of
±100 kcal/mol/Å³ to the values of the voxels (i.e., any voxel that
exceeds a magnitude of 100 kcal/mol/Å³ is capped). Having
tested the three different scoring methods with GIST, and
having determined the behavior to be correlated (see SI Section
7), we only test receptor desolvation scoring with 3D-RISM by
docking to the blurred grids with the blurry trilinear displacement
method.
2.2.6. Using GIST and 3D-RISM Grids to Account for
Desolvation in Docking. We calculate the receptor desolvation
energy for ligand binding by summing up the energies of the
voxels displaced by the ligand
$$E_{\mathrm{water\_displacement}} = \sum_{v \in V_{\mathrm{disp},l}} E_v$$
The total displacement region (Vdisp,l) is made up of the voxels
proximal to the atoms of the ligand. Here, Ev has been multiplied
by the volume of the voxel (0.125 Å³), and is the energy in kcal/
mol. If we assume no double counting, meaning no overlap
between atoms, then
$$E_{\mathrm{water\_displacement}} = \sum_{a \in \mathrm{lig}} \sum_{v \in V_{\mathrm{disp},a}} E_v$$
where a is an atom of the ligand (lig) and the atom-specific
displacement region (Vdisp,a) is made up of the voxels proximal to
that atom. However, some atoms in molecules will have
overlapping displacement regions, so we must avoid double
counting
$$E_{\mathrm{water\_displacement}} = \sum_{i = 1, \ldots, N} \; \sum_{\substack{v \in V_{\mathrm{disp},a(i)} \\ v \notin V_{\mathrm{disp},a(j)} \text{ for } j < i}} E_v$$
where N is the number of atoms in the ligand, and we only add
the energy associated with a voxel that has not already been
included. We described this method previously in Balius et al.
31
The bookkeeping associated with avoiding double counting is
substantial, and is partly responsible for a 6-fold slower docking
runtime when evaluating displacement on-the-fly.
31
To obviate
the double counting issue, we precompute receptor desolvation
displacement, which allows us to evaluate receptor desolvation
in the same manner as the other scoring components
(electrostatics and van der Waals). Specifically, the closest
points are looked up and trilinear interpolation is used to infer
the energy at the atomic position.
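The on-the-fly bookkeeping discussed above can be illustrated with a small sketch in which each voxel is counted at most once, even when several atoms displace it. The grid is stored as a dictionary keyed by voxel index, with energies already multiplied by the voxel volume. The function names, the dictionary layout, and the rule that a voxel is displaced when its center lies within the atomic radius are assumptions for illustration, not the DOCK implementation.

```python
import math


def displaced_voxels(center, radius, origin, spacing, shape):
    """Return indices (i, j, k) of voxels whose centers lie within radius of center."""
    lo = [max(0, math.floor((c - radius - o) / spacing))
          for c, o in zip(center, origin)]
    hi = [min(n - 1, math.ceil((c + radius - o) / spacing))
          for c, o, n in zip(center, origin, shape)]
    found = set()
    for i in range(lo[0], hi[0] + 1):
        for j in range(lo[1], hi[1] + 1):
            for k in range(lo[2], hi[2] + 1):
                voxel = tuple(o + idx * spacing for o, idx in zip(origin, (i, j, k)))
                if math.dist(voxel, center) <= radius:
                    found.add((i, j, k))
    return found


def water_displacement(atoms, energy, origin, spacing, shape):
    """Sum voxel energies over the union of all atomic displacement regions.

    atoms: list of (center, radius) pairs; energy: dict mapping (i, j, k) to kcal/mol.
    """
    counted = set()
    total = 0.0
    for center, radius in atoms:
        for idx in displaced_voxels(center, radius, origin, spacing, shape):
            if idx not in counted:  # skip voxels already claimed by an earlier atom
                counted.add(idx)
                total += energy[idx]
    return total
```

The `counted` set is the bookkeeping that must be rebuilt for every pose, which is the cost that the precomputed grids are designed to avoid.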
To avoid the issue of double counting, we weight energy
stored at voxels by their proximity to the center of the atom. We
use the Gaussian equation for this purpose
$$p_a(\vec{x}) = \frac{1}{\sigma_a \sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\lVert \vec{x} - \vec{\mu}_a \rVert}{\sigma_a}\right)^{2}\right)$$
where the mean, $\vec{\mu}_a$, is the atomic center of atom a, and the
standard deviation, σa, is a function of the radius of atom a: σa =
ra/D. Modifying the divisor, D, changes the standard deviation,
which determines the sharpness of the peak of the Gaussian. We
refer to applying Gaussian weighting to the voxels as “blurring
the grids”. Figure 1 shows a visualization of how the divisor
influences the sharpness of weight centered at the atomic
positions. For a smaller divisor of 2, the weight is less sharp, and
we cannot distinguish the shape of the aromatic rings of the
molecule, while for a larger divisor of 3, the weight is sharper and
clearly contours the molecule.
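The effect of the divisor can be made concrete with a short calculation. The sketch below assumes the standard one-dimensional normal density with σa = ra/D, evaluated at a distance from the atomic center; the function name is hypothetical.

```python
import math


def gaussian_weight(distance, atom_radius, divisor):
    """Gaussian weight at a given distance from the atomic center, with sigma = r / D."""
    sigma = atom_radius / divisor
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * (distance / sigma) ** 2)


def relative_weight_at_radius(atom_radius, divisor):
    """Weight at the atomic radius relative to the weight at the center."""
    return (gaussian_weight(atom_radius, atom_radius, divisor)
            / gaussian_weight(0.0, atom_radius, divisor))
```

Under this form, the weight at the atomic radius relative to the center is exp(−D²/2) regardless of the radius: about 0.135 for D = 2 and about 0.011 for D = 3, which is consistent with the sharper contours obtained with the larger divisor.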
We write the water displacement as the sum of energies stored
on all voxels, weighted by the Gaussian distribution. This
Gaussian term will weight voxels closer to the atomic center
more than voxels at the periphery
$$E_{\mathrm{water\_displacement}} = \sum_{a \in \mathrm{lig}} \sum_{v \in \mathrm{grid}} p_a(v) * E_v$$
Due to the Gaussian weighting, we no longer need to
determine if a voxel is proximal to any atom, since the weight will
approach zero for voxels further away from the ligand atoms. For
voxels with overlapping regions, we see that the weight on that
voxel is a sum of all weights from all atoms
$$E_{\mathrm{water\_displacement}} = \sum_{v \in \mathrm{grid}} \sum_{a \in \mathrm{lig}} p_a(v) * E_v = \sum_{v \in \mathrm{grid}} E_v \sum_{a \in \mathrm{lig}} p_a(v)$$
This allows us to visualize the weights (Figure 1).
The farther away a voxel, the less it contributes. Thus, for
speed we only calculate the sum of the weighted energies over
the displaced voxels, that is, the voxels proximal to a ligand atom
$$E_{\text{water displacement}} \approx \sum_{a \in \text{lig}} \sum_{v \in V_{\text{disp},a}} p_a(v) \cdot E_v$$
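The Gaussian-weighted sum over displaced voxels can be sketched in a few lines of Python. This is an illustrative sketch rather than the DOCK 6 implementation: the function names, the list-based voxel representation, and the rule that a voxel counts as displaced when it lies within one atomic radius of the atom center are all assumptions made for the example.

```python
import math

def gaussian_weight(dist, radius, divisor=2.0):
    """Gaussian weight at distance `dist` from an atom center.

    sigma_a = radius / divisor, as in the text; a larger divisor gives
    a sharper peak.
    """
    sigma = radius / divisor
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * (dist / sigma) ** 2) / norm

def blurry_displacement(atoms, voxels, divisor=2.0):
    """Sum p_a(v) * E_v over the voxels displaced by each atom.

    atoms:  list of ((x, y, z), radius)
    voxels: list of ((x, y, z), energy)
    Assumption: a voxel is "displaced" by an atom when it lies within
    one atomic radius of the atom center.
    """
    total = 0.0
    for center, radius in atoms:
        for position, energy in voxels:
            dist = math.dist(center, position)
            if dist <= radius:
                total += gaussian_weight(dist, radius, divisor) * energy
    return total

# One heavy atom (1.8 Å radius) and two voxels: one at the atom center,
# one far away that is skipped by the proximity test.
atoms = [((0.0, 0.0, 0.0), 1.8)]
voxels = [((0.0, 0.0, 0.0), -1.0), ((5.0, 0.0, 0.0), -10.0)]
score = blurry_displacement(atoms, voxels, divisor=2.0)
```

Because the weight decays quickly with distance, restricting the inner loop to nearby voxels changes the sum only slightly while avoiding a pass over the whole grid for every atom.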
We precompute atomic displacement by placing a probe atom
at each grid point. All voxels displaced by the atom are weighted
by the Gaussian, and then summed to give the new precomputed
value. We generate two precomputed (blurred) grids: one with a
probe radius of 1.0 Å for hydrogens, and the other with the probe
radius of 1.8 Å for heavy atoms. To ensure that the probe atom is
contained within the original grid, we subtract a margin value in
all directions. The dimensions of both blurred grids are reduced
by 1.8 Å, so that these new grids have the same dimensions.
The precomputed grids are read into DOCK, and the trilinear
interpolation procedure is used to calculate the ligand
displacement. For each atom, the eight closest grid points are
looked up and combined by linear interpolation to give the value
at the atomic position. Table 1 shows the receptor desolvation
parameters as implemented in DOCK 6.12.
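The trilinear lookup described above can be illustrated with a short Python sketch. It assumes a regular grid stored as a nested list `grid[i][j][k]` with a known origin and uniform spacing; the function name and data layout are hypothetical and are not taken from the DOCK 6 source.

```python
import math

def trilinear(grid, origin, spacing, point):
    """Interpolate a value at `point` from the eight surrounding grid points.

    grid[i][j][k] holds the value at origin + spacing * (i, j, k).
    Raises ValueError if the point lies outside the grid.
    """
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    base = []   # index of the lower corner of the enclosing cell
    frac = []   # fractional position inside that cell, in [0, 1]
    for axis in range(3):
        f = (point[axis] - origin[axis]) / spacing
        if f < 0.0 or f > dims[axis] - 1:
            raise ValueError("point lies outside the grid")
        i = min(int(math.floor(f)), dims[axis] - 2)
        base.append(i)
        frac.append(f - i)

    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                # Weight of each corner is the product of 1D linear weights.
                weight = ((frac[0] if di else 1.0 - frac[0]) *
                          (frac[1] if dj else 1.0 - frac[1]) *
                          (frac[2] if dk else 1.0 - frac[2]))
                value += weight * grid[base[0] + di][base[1] + dj][base[2] + dk]
    return value

# A 3 x 3 x 3 grid filled with a linear function, which trilinear
# interpolation reproduces exactly.
spacing = 0.5
grid = [[[spacing * i + 2.0 * spacing * j + 3.0 * spacing * k
          for k in range(3)] for j in range(3)] for i in range(3)]
value = trilinear(grid, (0.0, 0.0, 0.0), spacing, (0.3, 0.7, 0.9))
```

In a scoring loop, each ligand atom would be looked up in the grid that matches its type (the 1.0 Å probe grid for hydrogens, the 1.8 Å probe grid for heavy atoms) and the interpolated values summed.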
Although the receptor desolvation scoring method was
developed for GIST, the same procedures, i.e., blurring with
the Gaussian and precomputing displacements, may be applied
using 3D-RISM grids instead. When comparing GIST and 3D-
RISM, we replace the processed GIST grids with 3D-RISM grids
processed in the same way.
2.3. Covalent Docking with Attach-and-Grow. We have
implemented a covalent docking method into DOCK 6, called
attach-and-grow. The attach-and-grow algorithm aligns the
modified covalent molecule to the covalent attachment point
residue. The covalent bond environment is defined by four
parameters that are sampled during docking: a dihedral angle
(φ), a bond angle (θ), and two bond distances (d1 and d2). A
cartoon shows these parameters (Figure 2). The dihedral angle
is sampled starting at 0°, and steps to 360° with a user-specified
step size. The dihedral angle sampling is analogous to the orients
in anchor-and-grow. Values for the bond angle and two distances
are user-specified. The different bond environments are
analogous to different anchors (e.g., rings) in anchor-and-grow.
Also analogous to anchor-and-grow, attach-and-grow minimizes
torsions at every stage of growth using the premin option in
DOCK 6. Including the verbose flag at runtime for DOCK 6
displays the progress of growth in the dock statistics, similarly to
anchor-and-grow. The growth statistics may be useful for
Figure 1. Using the Gaussian function centered at each atom to define
the weights of grid voxels for the blurry-displacement version of the
GIST scoring function. (A) The weights determined using a Gaussian
function with a divisor of 2: σa=ra/2. (B) The weights determined
using a Gaussian function with a divisor of 3: σa=ra/3. For both panels,
three contour levels are shown: purple surface is weighted the highest
(weight > (A) 0.6 or (B) 0.45), blue mesh the next highest (weight >
0.3), and finally, yellow mesh the lowest (weight > 0.1).
Table 1. Receptor Desolvation Scoring Function Parameters in DOCK 6.12^a

| parameter | input used | description |
| --- | --- | --- |
| gist_score_primary | yes | parameter to select GIST score as the scoring function. |
| gist_score_att_exp | 6 | attractive exponent. Used for calculating the atomic radius. |
| gist_score_rep_exp | 12 | repulsive exponent. Used for calculating the atomic radius. |
| gist_score_gist_scale | −1.0 | scaling factor. |
| parameters for full displacement receptor desolvation | | |
| gist_score_gist_type | displace | type of GIST calculation; use displacement and avoid double counting (for rescoring). |
| gist_score_grid_file | gist-EswPlus2Eww_ref_cap.dx | |
| parameters for blurry displacement receptor desolvation | | |
| gist_score_gist_type | blurry_displace | type of GIST calculation; use displacement and proximity weighting (for rescoring). |
| gist_score_grid_file | gist-EswPlus2Eww_ref_cap.dx | |
| gist_score_blurry_gist_div | 2 | sigma determines the sharpness of the peak of the Gaussian; equal to the radius divided by a divisor: σa = ra/D. This parameter is only used for blurry displacement. |
| parameters for blurry trilinear displacement receptor desolvation | | |
| gist_score_gist_type | trilinear | type of GIST calculation; use trilinear (for docking). |
| gist_score_grid_file | gist-EswPlus2Eww_ref_cap_blurr_1pt8.dx | run calculation on precomputed displacement grid using blurry GIST. Generated using 1.8 Å probe radius for heavy atoms. |
| gist_score_Hydrogen_grid_file | gist-EswPlus2Eww_ref_cap_blurr_1pt0.dx | generated using 1.0 Å probe radius for hydrogens. |

^a GIST Score may be used as a stand-alone scoring function or combined with other scoring functions as part of Descriptor Score.
debugging if a molecule fails to produce a pose. This method has
been discussed previously.
88
Table 2 shows the sampling
parameters for the attach-and-grow algorithm as implemented
in the DOCK 6.12 release.
2.3.1. Covalent Database Generation. For the example
virtual screens, we first prepare a database of molecules
containing acrylamide warheads for docking. We modify the
acrylamide warheads by attaching two silicon dummy atoms to
the adduct in the place of the γ-sulfur and β-carbon. This is done
using a Python script (simple_reaction_file.py, available on
GitHub in the teb_scripts_programs repository) that takes as
input a SMARTS string specifying a reaction and a file
containing SMILES, and uses RDKit
89
to apply the chemical
transformation.
Here is the SMARTS string that we use for acrylamides:
[SiH3:1][SiH2:2].[C:3]=[C:4][C:5](=[O:6])[N:7]>>[SiH3:1][SiH2:2][C:3][C:4][C:5](=[O:6])[N:7]
We then process the transformed molecules using a script now
distributed with DOCK 6.12:
$DOCK6BASE/template_pipeline/hdb_lig_gen/generate/enumerate_stereoisomers.csh
This script enumerates all unspecified stereoisomers using
ChemAxon’s cxcalc
64
(https://www.chemaxon.com).
To generate mol2 files for covalent docking, we used another
script now distributed with DOCK 6.12:
$DOCK6BASE/template_pipeline/hdb_lig_gen/generate/build_ligand_simple_dock6_covalent_solv.csh
ChemAxon’s cxcalc was used to generate protomers (i.e., the
protonation states of the ligand) and isomers. Corina was used to
convert the ligand chemical structure into a set of 3D atomic
coordinates. AMSOL 7.1 was used to calculate partial charges
and desolvation parameters. Finally, a Python script
(mol_covalent_Si_to_Du_solv.py, available on GitHub in the
teb_scripts_programs repository) was called to finish preparing the mol2
for covalent docking. This script converts the silicon atoms to
dummy atoms, removes hydrogens that were connected to the
silicon atoms, changes dummy atom charges to zero, and, to
ensure the formal charge is an integer, distributes the sum of the
removed partial charges to the remaining atoms equally. It also
adds the per-atom solvation to the mol2 file.
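The charge bookkeeping in this last step can be sketched as follows. This is not the mol_covalent_Si_to_Du_solv.py script itself: the function name, the dictionary representation, and the atom names are hypothetical, and the sketch assumes that the "removed" charge is the sum over the deleted hydrogens and the zeroed dummy atoms, spread equally over the remaining non-dummy atoms.

```python
def redistribute_removed_charge(charges, removed_hydrogens, dummy_atoms):
    """Return new partial charges after converting Si atoms to dummy atoms.

    charges:           dict of atom name -> partial charge
    removed_hydrogens: names of hydrogens deleted from the silicon atoms
    dummy_atoms:       names of the atoms converted to dummy atoms
    The total charge of the molecule is conserved.
    """
    excluded = set(removed_hydrogens) | set(dummy_atoms)
    removed_charge = sum(charges[name] for name in excluded)
    remaining = [name for name in charges if name not in excluded]
    share = removed_charge / len(remaining)

    new_charges = {name: charges[name] + share for name in remaining}
    for name in dummy_atoms:
        new_charges[name] = 0.0   # dummy atoms carry no charge
    return new_charges

# Toy molecule with a net charge of zero (atom names are illustrative).
charges = {"Si1": 0.3, "Si2": 0.2, "H1": -0.05, "H2": -0.05,
           "C1": -0.2, "C2": -0.1, "N1": -0.1}
new_charges = redistribute_removed_charge(
    charges, removed_hydrogens=["H1", "H2"], dummy_atoms=["Si1", "Si2"])
```

Spreading the removed charge over the remaining atoms keeps the sum of partial charges equal to the original total, so an integer formal charge stays an integer.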
Because the attach-and-grow algorithm docks from a mol2 file,
and we do not need to conformationally expand our molecules
to generate DB2 files, database generation is faster and requires
less disk space in comparison to DOCK 3.7 (DOCKovalent).
2.4. Noncovalent Docking. We perform noncovalent
docking in pose reproduction experiments and enrichment
calculations with both noncovalent and covalent ligands. For
noncovalent docking of covalent ligands, the covalent attach-
ment point is not restrained, and the full degrees of freedom are
sampled. We prepare the covalent ligands by removing an atom
so that it is possible for the experimental pose to be reproduced
without a clash between the ligand and the protein (see SI Table
S1). This is done for ligands prepared from the experimental
structure, and ligands prepared from SMILES. There are two
methods that we use for noncovalent docking: Anchor-and-grow
and Hierarchical Database Search.
71,90
Anchor-and-grow is used
for pose reproduction experiments for more thorough sampling,
and Hierarchical Database Search is used for enrichment
calculations for speed.
2.4.1. Anchor-and-Grow. Anchor-and-grow is a breadth-first
search method where ligand conformational degrees of freedom
are explored on-the-fly.
91
The molecule is broken into rigid
segments at rotatable bonds. Segments with the most atoms and
attachment points become anchors, and, for each anchor, the
remaining segments are arranged in layers about the anchor. The
anchor is positioned in the binding site in multiple orientations
using the orienting spheres. For each partially grown pose (e.g.,
anchor placement), the next segment is revealed, and dihedral
angles are sampled and minimized. Conformations that exceed
an energy threshold are pruned before proceeding to the next
step of growth. The growth procedure continues until a
completed conformation of the ligand is generated. We use
the anchor-and-grow search method for noncovalent pose
reproduction experiments.
2.4.2. Hierarchical DataBase (HDB) Search. HDB search is a
depth-first search method where we search through precom-
puted ligand conformations stored in a hierarchical database,
formatted as a DB2 file.
71,90,92
The DB2 file format stores ligand
Figure 2. Covalent bond formation and environment. (A) The thiol−
ene reaction to attach an acrylamide warhead to a cysteine residue is
shown. (B) The covalent bond environment is defined by a bond angle
(θ), and two distances (d1 and d2). The dihedral angle (φ) is sampled.
Table 2. Attach-and-grow Sampling Parameters in DOCK 6.12

| parameter | input used^a | description^b |
| --- | --- | --- |
| covalent_bondlength | 1.8, 1.7:0.1:1.91 | bond length between Du2 and Du1. A value or a range^c is given. |
| covalent_bondlength2 | 1.8, 1.7:0.1:1.91 | bond length between Du1 and A1. A value or a range^c is given. |
| covalent_angle | 104.0, 103.0:1.0:105.0 | angle between Du2, Du1, and A1. A value or a range^c is given. |
| covalent_dihedral_step | 10.0 | step size for exploring the dihedral angle. We start at 0° and go to 360°. The dihedral is defined by Sph3, Du2, Du1, and A1. |

^a First input is a single value used for virtual screens; second input is a range of values used for test set pose reproduction.
^b See Figure 2B for reference.
^c A range is provided using the form: start:step:stop; the form: start:stop may also be used.
conformations as a tree, with each node representing a part or
segment of the molecule in a specific conformation. A branch of
the tree represents a conformation of the complete molecule.
Each node appears only once, e.g., the root node (the rigid
segment), which is shared among all branches, and thus the DB2
file format is a highly compressed way of storing conformations.
Like anchor-and-grow, the rigid segment of the molecule is
placed in the binding site and scored. Then, for each branch
(conformation of the molecule), each node (segment) of the
branch is scored. The score is stored, and the node is flagged. If
the score of the node exceeds a threshold, the branch is
terminated early. If a node that has already been flagged is
re-encountered, the score value is looked up. To make the
algorithm efficient, the node is only oriented in the pocket before
scoring. HDB search is 16 times faster than anchor-and-grow.
90
We use the HDB search method for enrichment calculations.
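The score caching and early termination described above can be sketched in Python. This is a simplified illustration, not the DOCK 6 code: branches are represented as lists of node identifiers, `score_node` is a hypothetical callable standing in for the grid-based scoring of a segment, and the branch total is assumed to be the sum of its node scores.

```python
def score_branches(branches, score_node, threshold):
    """Score each branch (conformation), reusing scores of shared nodes.

    branches:   list of lists of node ids, root (rigid segment) first
    score_node: callable mapping a node id to its score
    threshold:  a node scoring above this value terminates its branch
    Returns (totals, cache): totals maps branch index -> summed score for
    branches that were not terminated; cache maps node id -> score.
    """
    cache = {}    # a node present here has been "flagged" as scored
    totals = {}
    for index, branch in enumerate(branches):
        total = 0.0
        for node in branch:
            if node not in cache:
                cache[node] = score_node(node)   # score once, then look up
            if cache[node] > threshold:
                break                            # terminate the branch early
            total += cache[node]
        else:
            totals[index] = total                # branch completed
    return totals, cache

# Toy example: node "b" clashes, so the second branch is terminated.
node_scores = {"root": -5.0, "a": -2.0, "b": 50.0, "c": -1.0}
calls = []

def score_node(node):
    calls.append(node)
    return node_scores[node]

branches = [["root", "a", "c"], ["root", "b", "c"], ["root", "a"]]
totals, cache = score_branches(branches, score_node, threshold=10.0)
```

In the toy example the three branches contain eight node visits, but only four scoring calls are made because shared nodes are looked up rather than rescored.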
2.5. Pose Reproduction Outcomes. In a completed
docking experiment, we classify the result into one of three
outcomes: docking success, scoring failure, and sampling
failure.
93
Docking success is defined when the top scoring
pose reproduces the experimental pose. Scoring failure is defined
when there exists a pose that reproduces the experimental pose,
but it is not the top scoring pose. Sampling failure is defined
when none of the ten poses generated reproduce the
experimental pose. A docked pose is considered to reproduce
the experimental pose if it is within 2 Å heavy atom root-mean-
square deviation (RMSD). We use a symmetry-corrected
RMSD (Hungarian algorithm
94
) implemented in DOCK 6.
Docking experiments might not run to completion and no poses
may be produced; we classified these as did-not-dock.
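The outcome definitions above map onto a small classification function. The sketch below assumes the RMSD values have already been computed and are ordered from best to worst score, and that "within 2 Å" means an RMSD less than or equal to the cutoff; the function name is hypothetical.

```python
def classify_outcome(pose_rmsds, cutoff=2.0):
    """Classify a docking experiment from the RMSDs of its poses.

    pose_rmsds: heavy-atom RMSDs (Å) to the experimental pose, ordered so
                that the top-scoring pose comes first; empty if the run
                produced no poses.
    """
    if not pose_rmsds:
        return "did-not-dock"
    if pose_rmsds[0] <= cutoff:
        return "docking success"      # top-scoring pose reproduces experiment
    if any(rmsd <= cutoff for rmsd in pose_rmsds[1:]):
        return "scoring failure"      # a correct pose exists but is not ranked first
    return "sampling failure"         # no generated pose is correct

outcomes = [
    classify_outcome([0.8, 3.1, 5.0]),
    classify_outcome([4.2, 1.5, 6.3]),
    classify_outcome([4.2, 3.5, 6.3]),
    classify_outcome([]),
]
```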
2.6. Cross-Docking. The structures were aligned to a single
PDB (PDB ID 6GJ8) prior to being prepared for docking.
Aligning all the structures to the same frame allows us to evaluate
cross-docked poses using the aligned ligand experimental pose
as a reference. Using 127 KRAS ligand−receptor pairs from the
test set, we performed cross-docking of each ligand into each
receptor, for a total of 16 129 docking calculations.
For cross-docking pose reproduction, RMSD values for a
cross-docked pose of a ligand were calculated using the aligned
experimental pose of the same ligand as the reference. Because
the receptor conformation is different between systems and is
kept rigid during docking, the experimental ligand pose may not
be possible to reproduce for some cross-docking experiments. If
the experimental pose clashes with the receptor, we classify the
ligand−receptor pair as incompatible.
93
To determine which
pairs are incompatible, we perform a constrained minimization
of the ligand. If, after the constrained minimization, the
Chemgrid Score is high (above 1000.0 kcal/mol), or if the
molecule moves significantly (more than 2 Å RMSD), the pair is
classified as incompatible (see SI Figure S3). Docking
experiments with ligand−receptor pairs classified as incompat-
ible usually result in a sampling failure or did-not-dock outcome,
with a few exceptions (see SI Section 3 and Table S5). Note that
an incompatible pair (i,j) is determined by comparing the
experimental ligand pose (from system i) with the receptor
(from system j) without any docking, while the sampling failure
and did-not-dock outcomes are determined by the docking
experiments.
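The incompatibility rule can be written as a simple predicate and applied to the off-diagonal entries of the cross-docking matrix. The function names and the dictionary layout are hypothetical; the cutoffs are the ones quoted above.

```python
def is_incompatible(minimized_score, minimized_rmsd,
                    score_cutoff=1000.0, rmsd_cutoff=2.0):
    """True if the experimental pose of ligand i does not fit receptor j.

    minimized_score: Chemgrid Score (kcal/mol) after constrained minimization
    minimized_rmsd:  RMSD (Å) between the starting and minimized ligand pose
    """
    return minimized_score > score_cutoff or minimized_rmsd > rmsd_cutoff

def count_incompatible(results):
    """results maps (ligand_system, receptor_system) -> (score, rmsd).

    Only off-diagonal pairs (i != j) are counted.
    """
    return sum(1 for (i, j), (score, rmsd) in results.items()
               if i != j and is_incompatible(score, rmsd))

# Toy values for three pairs (system names and numbers are illustrative).
results = {
    ("sysA", "sysB"): (-35.2, 0.4),     # fits
    ("sysA", "sysC"): (5400.0, 0.3),    # severe clash remains
    ("sysB", "sysC"): (-20.1, 3.7),     # ligand had to move too far
}
n_incompatible = count_incompatible(results)
```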
2.7. Enrichments. DUDE-Z-like decoy backgrounds were
generated for each of the 110 KRAS ligands for enrichment
calculations on KRAS systems. The decoy generation procedure
is adapted from the procedure for generating DUDE-Z decoys
described in Stein et al.
58
The ligands’ SMILES were protonated
at pH 7.2 using ChemAxon’s cxcalc. For each ligand protomer,
the ZINC20 database
95
was searched for potential decoy
molecules that matched the ligand’s molecular weight,
calculated water-octanol partition coefficient, number of
rotatable bonds, number of hydrogen-bond donors, and number
of hydrogen-bond acceptors. The potential decoy molecules
were protonated, and molecules that did not match the formal
charge of the ligand were discarded. Morgan fingerprints
96
(using a radius of 4) for the ligands and decoys were generated
using RDKit, and Tanimoto coefficients (Tc) were used to
determine similarity between molecules. Decoys too similar to
any ligand (Tc > 0.35) were discarded, and the remaining decoys
were clustered using a Tc threshold of 0.5. After clustering, the
decoys were assigned to the ligands, with a goal of 50 property-
matched decoys per ligand protomer. See SI Section 15 for
discussion of parameter choice. The procedure to generate
charge extrema decoys is the same, except, instead of matching
the formal charge of the ligand, decoys were found for each of
the five formal charges between −2 and +2. A goal of 50 extrema
decoys per charge per ligand protomer was set, for a total of 250
extrema decoys per ligand. Charge extrema decoys were used to
check if the docking scoring function was overoptimizing for
charge interactions.
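The similarity filter in the decoy procedure can be illustrated without a cheminformatics toolkit by treating each fingerprint as a set of on-bit indices. This is a sketch under that assumption; in practice the fingerprints would be Morgan fingerprints from RDKit, and the function names and toy bit sets below are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def filter_decoys(decoys, ligands, max_tc=0.35):
    """Keep decoys whose Tc to every ligand is at most `max_tc`.

    decoys, ligands: dicts mapping a molecule name to its set of on-bits.
    """
    return {name: bits for name, bits in decoys.items()
            if all(tanimoto(bits, lig_bits) <= max_tc
                   for lig_bits in ligands.values())}

ligands = {"L1": {1, 2, 3, 4}}
decoys = {
    "D1": {1, 2, 3, 9},        # Tc = 3/5 = 0.6, too similar to L1
    "D2": {7, 8, 9},           # Tc = 0
    "D3": {1, 5, 6, 7, 8},     # Tc = 1/8 = 0.125
}
kept = filter_decoys(decoys, ligands)
```

Removing decoys that resemble any ligand reduces the chance that a "decoy" is in fact a binder, which would otherwise depress the measured enrichment.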
Enrichment benchmark databases are generated in the DB2
file format. ChemAxon’s cxcalc was used to protonate the
SMILES at pH 7.2. Corina was used to generate 3D
conformations of the molecules from the 2D SMILES
representations. RDKit, AMSOL 7.1, and DOCK 6 were used
to generate multiple conformations of each molecule.
Conformations were expanded about a rigid segment of the
molecule; any rigid segment with more than 4 heavy atoms was
used. The multiconformation mol2, partial charges, and per-
atom desolvation (calculated from the starting conformation)
are converted to the DB2 format using the mol2db2.py script
distributed with DOCK 6.12 (in the $DOCK6BASE/template_-
pipeline/hdb_lig_gen/mol2db2/ directory). All molecules are
generated as noncovalent databases. For covalent ligands, one
atom was removed from the SMILES prior to database
generation to ensure that the molecule could reproduce the
experimental pose without clashing. To evaluate the docking
performance on each pocket, the enrichment benchmark sets
were partitioned based on the binding pocket of the ligands.
Because a significant proportion of switch II pocket-binding
ligands were covalent, the switch II pocket enrichment set was
further partitioned based on whether the ligand binds covalently
or noncovalently. This resulted in three enrichment sets: switch
I/II pocket, switch II pocket noncovalent, and switch II pocket
covalent.
To quantify enrichment, we use the area under the curve
(AUC) of the receiver operating characteristic (ROC) curve,
which plots the true positive rate (% of ligands found) versus the
false positive rate (% of decoys found). To produce the curve,
first, a sorted list of the best scores for each unique ligand and
decoy is obtained. We iterate through the list, stepping up when
a ligand is encountered and stepping right when a decoy is
encountered. Perfect enrichment of the ligands from the decoys
results in an AUC of 100%. Random enrichment where the
ligands are uniformly distributed among the decoys results in an
AUC of 50%. Any AUC value above 50% is better than random,
and any value below 50% is subrandom. To weight early
enrichment more than later enrichment, we calculate the area
under the log adjusted ROC curve, where we take the logarithm
of the false positive rate, and subtract the AUC of the random log
adjusted ROC curve to obtain the logAUC.
70
As a result, the
logAUC will be positive for better than random enrichment and
negative for subrandom enrichment. Because the logarithm
function is undefined at 0, we define a lower bound of 0.001.
That is, the logAUC is calculated on the range from 0.1 to 100%
for the % of decoys found.
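A compact Python sketch of the ROC construction, AUC, and logAUC follows. It makes several assumptions that are not stated in the text: lower scores rank first, ties are not treated specially, values are returned as fractions rather than percentages, and the log-adjusted area is normalized by the width of the logarithmic range before the random value is subtracted. The function names are hypothetical.

```python
import math

def roc_points(scored):
    """Build ROC points from a list of (score, is_ligand) tuples.

    Assumption: a lower (more favorable) score ranks first.
    Returns a list of (false positive rate, true positive rate) points.
    """
    ranked = sorted(scored, key=lambda item: item[0])
    n_lig = sum(1 for _, is_lig in ranked if is_lig)
    n_dec = len(ranked) - n_lig
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, is_lig in ranked:
        if is_lig:
            tp += 1      # step up
        else:
            fp += 1      # step right
        points.append((fp / n_dec, tp / n_lig))
    return points

def auc(points):
    """Area under the step-shaped ROC curve (fraction between 0 and 1)."""
    return sum((x1 - x0) * y0 for (x0, y0), (x1, _) in zip(points, points[1:]))

def log_auc(points, lower=0.001):
    """Area under the ROC curve on a log10 x-axis, minus the random curve."""
    span = math.log10(1.0 / lower)
    area = 0.0
    for (x0, y0), (x1, _) in zip(points, points[1:]):
        if x1 > x0 and x1 > lower:
            area += y0 * (math.log10(x1) - math.log10(max(x0, lower)))
    # Random enrichment (y = x) integrated over log10(x) from `lower` to 1.
    random_area = (1.0 - lower) / math.log(10.0)
    return (area - random_area) / span

perfect = roc_points([(-10.0, True), (-9.0, True), (-1.0, False), (0.0, False)])
worst = roc_points([(-10.0, False), (-9.0, False), (-1.0, True), (0.0, True)])
```

With this normalization, perfect separation gives a positive logAUC and fully inverted ranking gives a negative one, matching the sign convention described above.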
2.7.1. Error in Enrichment Calculations Using Boot-
strapping. To quantify error in our enrichment calculations,
we use bootstrapping
97
to resample. We perform random
sampling with replacement on the list of scored molecules to
obtain a resampled list of the same length as the original. The
ligands and decoys are sampled separately, so that the number of
ligands and decoys remains consistent. We perform 1000
bootstrap runs and calculate enrichment metrics for each,
generating a distribution of logAUC values. To compare
enrichment between different docking setups, we perform
paired bootstrapping. We dock the same ligands and decoys
using two different docking setups and perform the same
resampling of the docked molecules for both enrichment runs.
This results in 1000 paired logAUC values, from which we
calculate the difference of each pair.
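The paired bootstrap can be sketched with the Python standard library. The sketch stores the two setups' scores for each molecule together, so one draw resamples both setups identically, and it draws ligands and decoys separately so their counts stay fixed. The function names are hypothetical, and `simple_auc` is a stand-in metric used to keep the example short; the logAUC would be substituted in practice.

```python
import random

def simple_auc(ligand_scores, decoy_scores):
    """Fraction of ligand-decoy pairs in which the ligand scores lower.

    Stand-in enrichment metric (equivalent to the ROC AUC); ties count half.
    """
    wins = sum((lig < dec) + 0.5 * (lig == dec)
               for lig in ligand_scores for dec in decoy_scores)
    return wins / (len(ligand_scores) * len(decoy_scores))

def paired_bootstrap(ligands, decoys, metric, n_runs=1000, seed=0):
    """Return metric(setup B) - metric(setup A) for each bootstrap resample.

    ligands, decoys: lists of (score_setup_A, score_setup_B), one per molecule.
    """
    rng = random.Random(seed)
    differences = []
    for _ in range(n_runs):
        # Sample with replacement; ligands and decoys are drawn separately.
        lig = [rng.choice(ligands) for _ in ligands]
        dec = [rng.choice(decoys) for _ in decoys]
        value_a = metric([s[0] for s in lig], [s[0] for s in dec])
        value_b = metric([s[1] for s in lig], [s[1] for s in dec])
        differences.append(value_b - value_a)
    return differences

# Toy scores: setup B separates ligands from decoys perfectly, setup A does not.
ligands = [(-5.0, -9.0), (-1.0, -8.0), (-3.0, -7.0)]
decoys = [(-4.0, -1.0), (-2.0, -2.0), (-6.0, -3.0)]
differences = paired_bootstrap(ligands, decoys, simple_auc, n_runs=1000, seed=0)
```

Pairing the resamples removes the variation that comes from which molecules happen to be drawn, so the distribution of differences reflects the change in docking setup alone.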
We calculate p-values using two t-tests from SciPy:
98
two-
sample t-test (using stats.ttest_ind) and one-sample t-test (using
stats.ttest_1samp). The two-sample t-test compares two
distributions, testing the null hypothesis that the two
distributions have the same mean. The one-sample t-test tests
the null hypothesis that the mean of the distribution of logAUC
differences is equal to a given population mean, set to zero. In
both cases, we are testing whether the mean logAUC difference
between the two bootstrapped distributions is significant. We
reject the null hypothesis with a level of significance of 0.01
(p-value ≤ 0.01) and classify the enrichment calculation as
significant or not significant (no significant difference)
accordingly. Rejecting the null hypothesis indicates that the
change in the docking setup (e.g., incorporating receptor
desolvation) had a significant impact on the enrichment results.
We classify changes to the docking setup as better or worse
based on the results of the two-sample t-test and the change in
logAUC. For example, when comparing enrichment with and
without 3D-RISM receptor desolvation for one system, if the p-
value is less than 0.01, and the logAUC with 3D-RISM is greater
than the logAUC without 3D-RISM, or vice versa, we classify the
system as better or worse, respectively.
2.8. Error in Pose Reproduction Experiments. To
quantify error in pose reproduction experiments, we ran the
docking with different random seeds for the simplex
minimization procedure.
91
Because we use the simplex
minimization from the start of the sampling procedure for
both anchor-and-grow and attach-and-grow, modifying the
random seed of the simplex will perturb the sampling from the
start, ensuring sampling divergence.
To compare pose reproduction between different docking
setups, we ran the pose reproduction experiments with each
docking setup 100 times with different random seeds. For each
of the 100 random seed runs (or replicas), we classify the
completed docking calculation for each system into one of the
three docking outcomes: docking success, scoring failure, and
sampling failure. Docking calculations that do not run to
completion were classified as did-not-dock. This results in 100
replicas from which we determine the number of systems with
docking success, scoring failure, sampling failure, and did-not-
dock outcomes. For each outcome, we calculate the mean and
standard deviation. We calculate p-values using the same t-tests
from SciPy used for the enrichment error calculations. The p-
values are used to compare the mean number of each docking
outcome between the two docking setups. A smaller p-value
indicates a greater likelihood that a change in the mean number
of each docking outcome is significant, and results from a change
in the docking setup (e.g., incorporating 3D-RISM receptor
desolvation or covalent docking). For example, if the number of
docking success outcomes is greater with covalent docking than
with noncovalent docking, and the p-value is small (≤0.01), then
covalent docking statistically significantly increases docking
success rate for that set of systems.
3. RESULTS
To evaluate docking performance on the RAS test set, we
performed pose reproduction experiments, for both cognate
docking and cross-docking cases, and enrichment calculations.
3.1. Description of the Test Set. We assembled a RAS test
set consisting of 138 RAS protein structures (125 KRAS4B, 11
HRAS, and 2 NRAS), and 2 structures of KRAS DNA G-
quadruplexes in complex with ligands. To construct this test set,
we searched the PDB for structures of RAS proteins in complex
with small molecule inhibitors. We filtered and discarded
structures containing GTP-competitive inhibitors. From the
remaining structures, we extracted the PDB ligand chemical IDs,
resulting in 110, 5, and 1 unique ligand(s) for the 125 KRAS, 11
HRAS, and 2 NRAS protein structures, respectively. See SI
Table S1 for a full list of ligands and corresponding PDB IDs.
This test set is prepared for docking as described in Section 2.
Shown in Table 3 is a breakdown of the number of systems and
unique ligands for KRAS.
The majority of the ligands bind to one of two main pockets:
the switch I/II pocket (SI−SII), and the switch II pocket (SII).
Two KRAS structures (PDB IDs 7U8H and 8AFD) contain
ligands in both the SI−SII and SII pockets, resulting in 127
ligand−receptor pairs from the 125 KRAS protein structures.
We classify ligands that do not bind to either the SI−SII or SII
pockets as miscellaneous binders. The test set contains ligands
that bind noncovalently and ligands that bind covalently. We
prepared covalent ligands for standard noncovalent docking, and
covalent docking with attach-and-grow (see Section 2).
The locations of the two main pockets, and ligands that bind
to these sites, are seen through an overlay of 48 representative
KRAS structures (Figure 3). Some miscellaneous binders are
shown as well. The bound nucleotides are shown and may be
GDP, or one of three GTP analogs: GPP[NH]P (GNP),
GPP[CH2]P (GCP), or GTPγS (GSP). The fluctuations of the
Switch I and Switch II regions are also seen in the overlay of
structures in Figure 3.
3.2. Pose Reproduction. Using the 127 KRAS ligand−
receptor pairs, we docked all ligands to all receptors and
Table 3. Breakdown of the KRAS Systems and Ligands

| | #systems | #unique ligands |
| --- | --- | --- |
| switch I/II pocket | 40 | 32 |
| switch II pocket | 83 | 74 |
| switch II pocket, noncovalent | 24 | 18 |
| switch II pocket, covalent | 59 | 56 |
| miscellaneous | 4 | 4 |
| total | 127^a | 110 |

^a There are 125 PDB structures; two structures have two ligands for a total of 127 ligand−receptor pairs.
generated cross-docking matrices using the results. Docking
calculations were performed using the noncovalent anchor-and-
grow sampling method with Chemgrid Score. We organize the
cross-docking results by ligand binding pocket, i.e., SI−SII
pocket systems (Figure 4A) and SII pocket systems (Figure 4B);
or view the results for all systems (SI Figure S4A). For each
completed docking experiment (one entry of the matrix), we
define three possible outcomes: docking success, scoring failure,
and sampling failure shown in blue, yellow and red, respectively
(Figure 4), defined above in Section 2. Sometimes a docking
calculation does not run to completion and no pose is generated;
these we classify as a did-not-dock outcome and are shown in
black (Figure 4). There are also often cases of incompatibility
between a ligand and a receptor (see SI Figure S4). RAS is a
flexible protein that adopts many distinct conformations,
7,99
and
these ligand−receptor incompatibilities are likely due to
conformational differences in RAS, particularly within the SII
pocket.
6
In the SI−SII pocket systems we observe a swath of 8
systems where cross-docking experiments with the associated
ligands result in sampling failures for all receptors (Figure 4A);
see Section 4 for more about these 8 systems. A fractional
breakdown of the pose reproduction outcomes for the cross-docking
matrix diagonal (cognate docking), matrix off-diagonal
(cross-docking), and full matrix is shown in Figure 4C. The
cognate docking outcomes (Table 4) and cross-docking
outcomes (Table 5) are also reported below. We use ligand
conformations generated using the experimental conformation
as a starting point; for results using ligand conformations
generated from SMILES, see SI Figure S4B,C.
Tables 4 and 5 show rates of each docking outcome for
cognate docking (diagonal) and cross-docking (off-diagonal).
As expected, we see that cognate docking has a higher docking
success rate than cross-docking. For the SI−SII pocket systems,
the docking success rate is 25.9 percentage points higher for cognate
docking than for cross-docking. We see a similar trend for the SII
pocket systems, 62.8 percentage points higher, and for all systems, 59.4 percentage points higher.
The SII pocket has a higher cognate docking success rate
compared to the SI−SII pocket, 81.9% compared to 52.5%,
respectively, but a lower rate for cross-docking, 19.1% compared
to 26.6%, respectively.
The SI−SII pocket has a lower sampling failure rate than the
SII pocket, 62.3% compared to 73.3%, respectively. This
corresponds to the incompatibility rate: 20.2% for the SI−SII
pocket compared to 32.2% for the SII pocket. For cross-docking,
the docking success rate for the entire set falls to 10.7%, lower
than either the SI−SII or SII pocket alone. This decline in
Figure 3. Representative KRAS structures (N = 48) from the test set. Switch I (residues 30−38, magenta) and Switch II (residues 60−76, cyan)
regions are highlighted. Ligands that occupy the SI−SII pocket are shown in orange. Ligands that occupy the SII pocket are shown in blue. Ligands that
do not bind to either of these two pockets are classified as miscellaneous and shown in green. (A, B) Overlays of 48 representative structurally distant
KRAS protein−ligand complexes. (C, D) Ligands are shown without the protein. (B) is rotated 90° relative to (A), and, likewise, (D) is rotated 90° relative to (C).
Cluster representatives are reported in SI Table S1.1.
docking success rate corresponds with an increase in sampling
failure rate, which is higher for the full set, 84.7%, than for either
pocket alone, 62.3 or 73.3%. Unsurprisingly, the incompatibility
rate is also higher for the full set, 61.1%, than for either pocket,
20.2 or 32.2%. (See SI Section 3 and Table S5 for systems
classified as incompatible, but where cross-docking reproduces
the experimental pose).
For cognate docking, we generally see that the rates of each
docking outcome for all systems fall between the corresponding
rates for the two pockets (Table 4). For instance, the scoring
failure rate for all systems, 12.6%, falls between the scoring
Figure 4. Cross-docking and cognate docking results using Chemgrid Score. Blue, yellow, red, and black indicate docking success, scoring failure,
sampling failure, and did-not-dock outcomes, respectively. Cross-docking matrices showing docking outcomes for (A) 40 SI−SII pocket systems and (B)
83 SII pocket systems. Arrangement of systems determined using hierarchical clustering by similarity between ligand outcomes. (C) Fractional
breakdown of docking outcomes for the complete matrix, cognate docking (diagonal), and cross-docking (off-diagonal).
Table 4. Cognate Docking (Diagonal) Outcomes Using Chemgrid Score

| | docking success | scoring failure | sampling failure | did not dock |
| --- | --- | --- | --- | --- |
| SI−SII pocket^a | 21 (52.5%) | 7 (17.5%) | 12 (30.0%) | 0 (0.0%) |
| SII pocket^b | 68 (81.9%) | 8 (9.6%) | 6 (7.2%) | 1 (1.2%) |
| all systems^c | 89 (70.1%) | 16 (12.6%) | 21 (16.5%) | 1 (0.8%) |

^a Number of cognate docking (diagonal) experiments: 40. ^b 83. ^c 127.
Table 5. Cross-Docking (Off-Diagonal) Outcomes Using Chemgrid Score

| | docking success | scoring failure | sampling failure | did not dock | incompatible |
| --- | --- | --- | --- | --- | --- |
| SI−SII pocket^a | 415 (26.6%) | 173 (11.1%) | 972 (62.3%) | 0 (0.0%) | 315 (20.2%) |
| SII pocket^b | 1301 (19.1%) | 396 (5.8%) | 4988 (73.3%) | 121 (1.8%) | 2190 (32.2%) |
| all systems^c | 1716 (10.7%) | 569 (3.6%) | 13554 (84.7%) | 163 (1.0%) | 9782 (61.1%) |

^a Number of cross-docking (off-diagonal) experiments: 1560. ^b 6806. ^c 16 002.
failure rates for the two pockets, 17.5 and 9.6%. Since most
systems contain ligands that bind to the SI−SII or SII pockets
(i.e., there are few miscellaneous systems), cognate docking for
all systems only differs from the union of results for the two
individual pockets by a few systems. However, this is not the case
for cross-docking (Table 5). This is due to the additional cases
added to cross-docking across all systems, namely, when ligands
that bind to the SI−SII pocket are docked to SII pocket
receptors, or vice versa. For instance, the cross-docking scoring
failure rate for all systems, 3.6%, is lower than the rates for either
of the two pockets, 11.1 or 5.8%.
Focusing on the subset of SII pocket systems containing
noncovalent ligands, we obtain a much higher docking success
rate (Figure 5A). There are 24 systems in this subset, with some
systems sharing the same ligand (Table 3). We generated a
ligand chemical similarity matrix (Figure 5B) to compare to the
cross-docking matrix (Figure 5A). Regions of high ligand
similarity seem to correspond to regions of high docking success
rate. We select three systems from the matrix based on chemical
Figure 5. Pose reproduction for SII pocket noncovalent systems. (A) Docking outcomes for 24 SII pocket systems containing noncovalent ligands.
Blue, yellow, red, and black indicate docking success, scoring failure, sampling failure, and did-not-dock outcomes, respectively. (B) Ligand chemical
similarity determined using Tanimoto coefficients (Tc) between Morgan fingerprints generated using RDKit, where a higher Tc indicates greater
similarity. Black indicates identical or high similarity, gray indicates some similarity, and white indicates poor similarity. (C) Chemical structures of
three selected ligands (PDB ligand chemical IDs VR5, 6IC, and VU6). (D) Cognate docking and two cross-docking poses for the ligand VR5 to
receptors 8TXH (green), 7T47 (orange), and 8ONV (purple). The receptor is hidden for clarity.
similarity of the three ligands (Figure 5B,C). The ligands 6IC
and VU6 serve as representatives of docking success and ligand
similarity from the two clusters. The ligand VR5 was selected for
having a high docking success rate for both clusters. The ligands
6IC and VR5 have high chemical similarity, and cross-docking
VR5 to receptor 7T47 (associated with 6IC) results in a docking
success (Figure 5D, outlined in orange). The ligands VU6 and
VR5 have a lower Tc, indicating less similarity. However, both
molecules contain benzothiophene moieties that orient deep in
the pocket. Cross-docking VR5 to receptor 8ONV (associated
with VU6) also results in a docking success (Figure 5D, outlined
in purple). Chemical similarity between ligands seems to result
in increased compatibility between systems, due to similar
receptor conformations. Similar ligands may also be easier to
orient to the matching spheres, which are generated using ligand
atoms. As a result of sharing chemical similarity with ligands
from both clusters, the ligand VR5 has one of the highest rates of
docking success (83.3%) across the 24 noncovalent, SII pocket
systems.
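The Tanimoto coefficient used for the ligand similarity matrix (Figure 5B) can be illustrated with a minimal sketch. It assumes each fingerprint is represented as a set of on-bit indices; in practice these would come from RDKit Morgan fingerprints, whose radius and bit length are not specified in this section. The function names and the toy bit sets are hypothetical and do not correspond to VR5, 6IC, or VU6.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def similarity_matrix(fingerprints):
    """All-against-all Tc values, keyed by (ligand, ligand) name pairs."""
    names = list(fingerprints)
    return {(p, q): tanimoto(fingerprints[p], fingerprints[q])
            for p in names for q in names}


# Toy on-bit sets for illustration only (NOT real ligand fingerprints).
toy = {
    "lig_A": {1, 4, 9, 16, 25},
    "lig_B": {1, 4, 9, 16, 36},
    "lig_C": {2, 4, 8, 64},
}
matrix = similarity_matrix(toy)
print(f"Tc(lig_A, lig_B) = {matrix[('lig_A', 'lig_B')]:.3f}")
print(f"Tc(lig_A, lig_C) = {matrix[('lig_A', 'lig_C')]:.3f}")
```

The matrix is symmetric with ones on the diagonal, so it can be compared cell by cell with a square cross-docking outcome matrix such as the one in Figure 5A.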
3.3. Receptor Desolvation. We used systems from the test
set to test the newly implemented scoring functions that account
for receptor desolvation. Docking calculations here in Section
3.3 are performed using the noncovalent sampling methods:
hierarchical database search for enrichment calculations, and
anchor-and-grow for pose reproduction.
3.3.1. Enrichment Calculations. Using the decoy generation
procedure described in Section 2, we generated DUDE-Z-like
benchmark sets for enrichment quantification. These sets
consist of the SI−SII and SII pocket-binding ligands from the
test set, and corresponding property-matched and charge
extrema decoy sets. The ligands were first partitioned by
ligand-binding pocket, resulting in 32 SI−SII pocket ligands and
74 SII pocket ligands. A large portion of ligands that bind to the
SII pocket are covalent binders, which we modify by removing
an atom from the covalent warhead to prevent clashes during
noncovalent docking. The 74 SII pocket ligands were further
partitioned based on whether the ligands bind covalently or
noncovalently, resulting in 18 noncovalent ligands and 56
covalent ligands. The result is three sets of ligands: 32 SI−SII
pocket ligands, 18 noncovalent SII pocket ligands, and 56
covalent SII pocket ligands. Each set of ligands has two
corresponding sets of decoys: property-matched decoys and
charge extrema decoys (Table 6).
We performed benchmark enrichment calculations with Chemgrid
Score on the 123 KRAS systems containing ligands that bind to
either the SI−SII or SII pocket, omitting the 4 miscellaneous
systems whose ligands bind to neither pocket. Due to the
computational cost of running GIST using 100 ns simulations,
we first selected the 20 systems with the best enrichment to run
GIST on: 10 SI−SII pocket systems, and 10 SII pocket systems.
Receptors with missing residues were omitted from our
selection. For more discussion on enrichment results for all
KRAS systems, see Section 3.3.2 and SI Section 5. We also
ran GIST, using 10 ns simulations to reduce computational cost,
and 3D-RISM on all KRAS systems in the test set.
3.3.2. Comparison of GIST and 3D-RISM Scoring. We tested
incorporating receptor desolvation scoring by comparing
docking performance with and without the receptor desolvation
term, using enrichment calculations. Receptor desolvation grids
were generated using either GIST or 3D-RISM, and they were
incorporated into the scoring function for docking using the
trilinear interpolation method (i.e., the gist_score_gist_type
parameter was set to trilinear). Thus, we compare three scoring
functions: Chemgrid Score (baseline, no receptor desolvation),
Chemgrid Score with GIST, and Chemgrid Score with 3D-
RISM.
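As an illustration of the trilinear method, the sketch below interpolates a value from a regular 3D grid at an arbitrary point. It is not the DOCK 6 implementation: the function names, the nested-list grid layout, and the choice to sum interpolated values over ligand atoms are assumptions made for this example.

```python
import math


def trilinear(grid, origin, spacing, point):
    """Interpolate a value at `point` from a regular 3D grid.

    grid[i][j][k] holds the value at origin + spacing * (i, j, k).
    Each axis needs at least two grid points.
    """
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    # Fractional grid coordinates of the point along each axis.
    coords = [(p - o) / spacing for p, o in zip(point, origin)]
    idx, frac = [], []
    for axis in range(3):
        if not 0.0 <= coords[axis] <= dims[axis] - 1:
            raise ValueError("point lies outside the grid")
        # Lower corner of the enclosing cell, clamped at the upper edge.
        i0 = min(int(math.floor(coords[axis])), dims[axis] - 2)
        idx.append(i0)
        frac.append(coords[axis] - i0)
    (i, j, k), (tx, ty, tz) = idx, frac
    value = 0.0
    # Weighted sum over the eight corners of the enclosing cell.
    for di, wx in ((0, 1.0 - tx), (1, tx)):
        for dj, wy in ((0, 1.0 - ty), (1, ty)):
            for dk, wz in ((0, 1.0 - tz), (1, tz)):
                value += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return value


def desolvation_term(grid, origin, spacing, atom_coords):
    """Assumed form: sum of interpolated grid values over ligand atom positions."""
    return sum(trilinear(grid, origin, spacing, xyz) for xyz in atom_coords)


# Toy grid sampling the linear function x + 2y + 3z at 0.5 Å spacing.
grid = [[[0.5 * (i + 2 * j + 3 * k) for k in range(3)]
         for j in range(3)] for i in range(3)]
value = trilinear(grid, (0.0, 0.0, 0.0), 0.5, (0.3, 0.7, 0.9))
print(f"interpolated value: {value:.3f}")
```

Because each lookup touches only eight precomputed grid values, the per-atom cost is constant, which is the reason interpolation on a precomputed grid is fast.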
For the GIST calculations, the protein is translated and
rotated into a water box for the MD simulation; this changes the
frame, which impacts the docking. To accurately compare all
three scoring functions, we use receptors (and therefore grids)
aligned to the new GIST MD simulation frame. We regenerate
docking grids for the aligned receptors. We also prepare three
blurred 3D-RISM grids for docking: the excess chemical
potential (exchem), total solvation energy (solvene), and
solute−solvent potential energy (potUV) grids. We visualize
and compare the GIST and potUV 3D-RISM grids (Figure 6).
Using these grids, we quantify water displacement energies:
displacing unfavorable waters (red regions) yields an
energy boost, while displacing favorable waters (blue regions)
incurs an energy penalty. There are visual
similarities between GIST grids (Figure 6A) and 3D-RISM grids
(Figure 6B), but also substantial differences. To use the trilinear
scoring method, we precompute displacement using a Gaussian
weight, as discussed in Section 2. In Figure 6C−F, precomputed
displacement grids using two different probe radii are shown.
We refer to the process of precomputing displacement using a
Gaussian weight as blurring, and the resulting grids as blurred.
GIST and 3D-RISM grids blurred with a probe radius of 1.8 Å
(Figure 6E,F) are the most visually similar. The magnitude of
the voxels also differs between GIST and 3D-RISM, and we
adjust the thresholds used for visual comparison of the grids:
±0.25 kcal/mol/Å³ for GIST, and ±20 kcal/mol/Å³ for 3D-RISM.
This is also the case for the blurred grids, and we use
thresholds of ±2.5 kcal/mol for blurred GIST and ±50 kcal/mol
for blurred 3D-RISM.
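The blurring step can be sketched as a Gaussian-weighted sum over neighboring voxels. The exact weighting is defined in Section 2 and is not reproduced here, so the form below is an assumption: the Gaussian width `sigma`, its relation to the probe radius, the `cutoff`, and all function names are hypothetical. The sketch multiplies each voxel by its volume, which is consistent with the units reported above (kcal/mol/Å³ for raw grids, kcal/mol for blurred grids).

```python
import math


def blur_grid(grid, spacing, sigma, cutoff):
    """Gaussian-weighted sum of voxel energies around each grid point.

    Input values are energy densities (kcal/mol/Å^3). Each neighbor within
    `cutoff` contributes density * voxel volume * exp(-d^2 / (2 sigma^2)),
    so the output is in kcal/mol. The weight form is an ASSUMPTION.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    voxel_volume = spacing ** 3
    reach = int(math.ceil(cutoff / spacing))
    # Precompute neighbor offsets and their Gaussian weights once.
    stencil = []
    for di in range(-reach, reach + 1):
        for dj in range(-reach, reach + 1):
            for dk in range(-reach, reach + 1):
                d2 = (di * di + dj * dj + dk * dk) * spacing ** 2
                if d2 <= cutoff ** 2:
                    weight = math.exp(-d2 / (2.0 * sigma ** 2))
                    stencil.append((di, dj, dk, weight))
    blurred = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                total = 0.0
                for di, dj, dk, weight in stencil:
                    a, b, c = i + di, j + dj, k + dk
                    # Neighbors outside the grid are skipped.
                    if 0 <= a < nx and 0 <= b < ny and 0 <= c < nz:
                        total += weight * grid[a][b][c] * voxel_volume
                blurred[i][j][k] = total
    return blurred


# Toy grid: one voxel of 2.0 kcal/mol/Å^3 at the center of a 5x5x5 grid.
raw = [[[0.0] * 5 for _ in range(5)] for _ in range(5)]
raw[2][2][2] = 2.0
blurred = blur_grid(raw, spacing=0.5, sigma=1.0, cutoff=1.0)
print(f"center: {blurred[2][2][2]:.4f} kcal/mol")
```

Once a blurred grid has been precomputed in this way, scoring reduces to grid lookups at atom positions, and no per-pose sum over displaced voxels is needed.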
To compare the trilinear, blurry displacement, and