Rapid Scanning Structure-Activity Relationships in Combinatorial Data Sets: Identification of Activity Switches

Article in Journal of Chemical Information and Modeling 53(6) · May 2013
DOI: 10.1021/ci400192y · Source: PubMed
Abstract
We present a general approach to describe the structure-activity relationships (SAR) of combinatorial data sets with activity for two biological endpoints with emphasis on the rapid identification of substitutions that have a large impact on activity and selectivity. The approach uses Dual-Activity Difference (DAD) maps that represent a visual and quantitative analysis of all pairwise comparisons of one, two, or more substitutions around a molecular template. Scanning the SAR of data sets using DAD maps allows the visual and quantitative identification of activity switches defined as specific substitutions that have an opposite effect on the activity of the compounds against two targets. The approach also rapidly identifies single- and double-target R-cliffs, i.e., compounds where a single or double substitution around the central scaffold dramatically modifies the activity for one or two targets, respectively. The approach introduced in this report can be applied to any analogue series with two biological activity endpoints. To illustrate the approach, we discuss the SAR of 106 pyrrolidine bis-diketopiperazines tested against two formylpeptide receptors obtained from positional scanning deconvolution methods of mixture-based libraries.
José L. Medina-Franco,* Bruce S. Edwards, Clemencia Pinilla,§ Jon R. Appel,§ Marc A. Giulianotti, Radleigh G. Santos, Austin B. Yongye, Larry A. Sklar, and Richard A. Houghten
Torrey Pines Institute for Molecular Studies, Port St. Lucie, Florida 34987, United States
University of New Mexico, Albuquerque, New Mexico 87131, United States
§ Torrey Pines Institute for Molecular Studies, San Diego, California 92121, United States
Supporting Information
INTRODUCTION
Structure–activity relationship (SAR) analyses of large data sets usually require the application of computational methods, which enable an organized characterization and rapid identification of activity and selectivity cliffs. Systematic identification and quantification of such cases have been the subject of intense research, giving rise to the development of activity landscape modeling, which is extensively reviewed elsewhere [1–3]. Most of the activity landscape methods are applied to diverse data sets using fingerprint-based representations calculated from whole molecular structures. More recently, substructure-based representations have been explored for activity landscapes using the concept of the matched molecular pair (MMP), which is defined as a pair of compounds that differ only at a single site [4–6].
Combinatorial data sets continue to play a central role in lead identification and drug discovery. For example, screening of highly dense mixture-based libraries [7–9] explores uncovered regions of the medicinally relevant chemical space [10], increases the potential of identifying activity cliffs, and provides a rapid understanding of the SAR associated with novel leads and targets [10]. Furthermore, in vivo testing of mixture-based libraries offers the possibility of identifying master key compounds for multitarget drug discovery (i.e., molecules that may operate on a desired set of locks, the targets, to gain access to a desired clinical effect) [11]. High-density libraries are suitable for lead identification because they facilitate the detection of small structural modifications that contribute to biological activity and selectivity.
Computational methods have been developed to navigate and visualize the SAR of analogue series or combinatorial data sets. Recent examples include the SAR map [12], the SAR matrix [13], single R-group polymorphisms that identify R-cliffs [14], and SAR analysis tools recently reviewed by Duffy et al. [15]. Although most of these methods are suitable for quickly identifying specific R-groups that lead to active, inactive, and selective compounds, it is not straightforward to identify pairs of compounds that have opposing activity outcomes on two or more targets due to specific changes in structure.
Herein, we introduce an approach for the facile visualization and analysis of the SAR of combinatorial data sets. The method is based on systematic pairwise comparisons of the R-groups of all molecular pairs in a data set tested across two biological endpoints. The approach represents an extended application of the dual-activity difference (DAD) maps that were designed to
Received: April 1, 2013
Published: May 24, 2013
Article
pubs.acs.org/jcim
© 2013 American Chemical Society | dx.doi.org/10.1021/ci400192y | J. Chem. Inf. Model. 2013, 53, 1475–1485
explore the SAR of diverse sets with activity against two targets [16,17]. In previous applications of the DAD maps, the structural relationships were obtained with similarity calculations computed using fingerprint representations. Because combinatorial data sets have, in general, low structural diversity, fingerprint-based methods do not always result in easily interpretable SAR. In addition, it has been discussed that activity landscape models largely depend on the variables utilized to represent chemical structures [18,19]. To address these issues, we present an intuitive method to compare systematic changes in the substitutions of combinatorial data sets and to find the associations between those changes and the response in biological activity. We also discuss examples of activity switches and selectivity switches [3], the latter concept defined as a minor structural modification that drastically inverts the selectivity pattern of two compounds. As a case study, we explored the SAR of a series of novel high-affinity formylpeptide receptor (FPR) ligands that we reported recently [20]. FPRs are a small group of G protein-coupled receptors that are important in host defense and inflammation. Specifically, the two receptors investigated were FPR1, linked to antibacterial inflammation [21] and malignant glioma cell metastasis [22], and FPR2, linked to chronic inflammation in systemic amyloidosis, Alzheimer's disease, and prion diseases [23]. Using positional scanning deconvolution methods, FPR1- and FPR2-selective ligands with nanomolar binding affinities were identified from mixture-based molecule libraries containing more than 700,000 compounds. The ligands were identified by the screening and deconvolution of the Torrey Pines Institute for Molecular Studies (TPIMS) libraries using a high-throughput screening duplex receptor assay [20]. As noted previously, a number of compounds in this data set showed selective affinities for FPR1 or FPR2 that are the most functionally active reported to date for small molecules in a ligand competition assay format [20].
METHODS
Data Set. We analyzed the SAR of 106 compounds obtained by screening and deconvoluting the pyrrolidine bis-diketopiperazine library in Figure 1 [20]. The library has four diversity positions. Each molecule in the data set has reported binding inhibition constants (Ki) that were obtained from competitive ligand displacement assays. The 106 compounds were synthesized and tested as single molecules on the basis of deconvoluted results from primary high-throughput screening of mixture libraries in a duplex flow cytometry binding assay. Of note, most of the R-groups are polar substituents. The purity of all molecules was analyzed by liquid chromatography–mass spectrometry as described elsewhere [20]. The chemical structures and binding activity (Ki) are presented in Table S1 of the Supporting Information. The initial Ki values (in nM) were transformed to pKi (−log10 Ki) values. The activity of compounds with undefined experimental Ki (>10,000 nM) was approximated as 10,000 nM.
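The transformation described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `pki_from_nm` and the constant `KI_CAP_NM` are hypothetical, and the sketch assumes that pKi is taken on the molar scale (Ki in nM multiplied by 1e-9 before the logarithm). The choice of unit only shifts every pKi by a constant, so the pairwise potency differences used later are unaffected.

```python
import math

# Cap applied to undefined experimental Ki values (>10,000 nM), as stated in the text.
KI_CAP_NM = 10_000.0

def pki_from_nm(ki_nm: float) -> float:
    """Return pKi = -log10(Ki in mol/L) for a Ki given in nM (molar scale is an assumption)."""
    if ki_nm <= 0:
        raise ValueError("Ki must be positive")
    ki_nm = min(ki_nm, KI_CAP_NM)      # values above the cap are treated as 10,000 nM
    return 9.0 - math.log10(ki_nm)     # -log10(ki_nm * 1e-9)

print(round(pki_from_nm(90), 2))       # 7.05
print(pki_from_nm(25_000))             # 5.0 (capped at 10,000 nM)
```

With this convention, a compound reported as Ki > 10,000 nM receives pKi = 5, which is why potency differences involving such compounds are lower bounds ("more than" a given number of log units).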
Dual-Activity Difference (DAD) Maps. The SAR of data sets tested with two biological endpoints can be characterized using pairwise comparisons portrayed in the recently proposed DAD maps [16,17,24,25]. Given a set of N compounds tested with targets I and II, the DAD map depicts the N(N − 1)/2 pairwise potency differences for each possible pair in the data set against both targets. The potency difference for target T for each molecule pair is calculated with the expression

$$\Delta pK_i(T)_{a,b} = pK_i(T)_a - pK_i(T)_b$$

where $pK_i(T)_a$ and $pK_i(T)_b$ are the activities of molecules a and b (b > a) against the two targets and T = FPR1, FPR2. Notably, ΔpKi can have positive or negative values, providing information about the directionality of the SAR. Thus, DAD maps are able to differentiate pairs of molecules where the structural change increases the activity for one target but decreases the activity for the other target (see below) [17].
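The construction of the DAD map coordinates can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the helper names `pki` and `dad_points` are hypothetical, pKi is assumed to be on the molar scale with the 10,000 nM cap described in the Data Set paragraph, and the two Ki pairs in the example are the values quoted for compounds 1754-43 and 1754-49 in the Results.

```python
import math
from itertools import combinations

def pki(ki_nm):
    """pKi from Ki in nM (molar scale assumed), with the 10,000 nM cap from the text."""
    return 9.0 - math.log10(min(ki_nm, 10_000.0))

def dad_points(ki_table):
    """Map {name: (Ki target I, Ki target II)} to a list of (a, b, dI, dII).

    Each potency difference is pKi(T)_a - pKi(T)_b, one value per target.
    """
    points = []
    for a, b in combinations(ki_table, 2):   # N(N - 1)/2 unordered pairs
        d1 = pki(ki_table[a][0]) - pki(ki_table[b][0])
        d2 = pki(ki_table[a][1]) - pki(ki_table[b][1])
        points.append((a, b, d1, d2))
    return points

# Ki values in nM for (FPR1, FPR2); ">10,000 nM" is entered as 10,000.
example = {"1754-43": (90, 10_000), "1754-49": (2322, 33)}
for a, b, d1, d2 in dad_points(example):
    print(a, b, round(d1, 2), round(d2, 2))  # 1754-43 1754-49 1.41 -2.48
```

For 106 compounds the loop yields 106 × 105 / 2 = 5565 data points, the number of pairwise comparisons reported in the Results. The opposite signs of the two differences in the example are what later place this pair among the activity switches.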
Figure 1. Core scaffold of the 106 pyrrolidine bis-diketopiperazines analyzed in this work.
Figure 2. General form of a dual-activity difference (DAD) map for targets I and II. The dashed lines intersect the axes at potency difference values of 0 ± t, e.g., t = 1 (one log unit). The regions are as follows: Z1, substitution(s) result in a significant decrease or increase of activity in both targets; Z2, substitution(s) increase activity for one target, while decreasing activity for the other target significantly; Z3 and Z4, substitution(s) result in significant changes in activity on one target, but not an appreciable change on the other.
A general form of a DAD map is shown in Figure 2. Vertical and horizontal lines at ΔpKi = ±t define boundaries for low/high potency difference for targets I and II, respectively. Here, we set t = 1, one log unit, so that data points were considered to have a low potency difference if −1 ≤ ΔpKi ≤ 1 for each target. The boundaries define zones Z1 through Z5 in Figure 2. Structural modifications for molecule pairs that fall into zone Z1 (whether a small or a large structural change) have a similar impact on the activity against the two targets (increase or decrease in activity). Therefore, Z1 is associated with similar SAR of the pair of compounds for both targets. In sharp contrast, pairs of compounds that fall into Z2 indicate that the change in activity for the compounds in the pair is opposite for I and II. Thus, the structural changes in the pair of compounds in Z2 are associated with an inverse SAR or switch in activity [3], which increases the activity for one target but decreases the activity for the other target. Activity switches therefore point to structural changes that completely invert the activity pattern. Data points in Z3 and Z4 correspond to pairs of molecules with the same or similar activity for one target (I or II, respectively) but different activity for the other target (II or I, respectively). Data points in Z5 denote a pair of compounds with similar activity (or identical if Δactivity = 0 for both targets) against I and II. In other words, structural changes in the pairs of compounds in Z5 have little or no impact on the activity against the two targets. As previously noted, the classification of data points in an activity-difference map is independent of the structure similarity [16,17,24,25].
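The zone definitions above translate directly into a small classifier. The sketch below is an assumption-laden illustration: the function name `dad_zone` is hypothetical, and the assignment of Z3 and Z4 follows the wording of the text (Z3: similar activity for target I, different for target II; Z4: the reverse), since the figure itself cannot be consulted here. A difference counts as large only when its absolute value exceeds t, matching the low-difference range −1 ≤ ΔpKi ≤ 1.

```python
def dad_zone(d1: float, d2: float, t: float = 1.0) -> str:
    """Assign a DAD-map zone from the potency differences for targets I (d1) and II (d2)."""
    big1 = abs(d1) > t   # large potency difference for target I
    big2 = abs(d2) > t   # large potency difference for target II
    if big1 and big2:
        # Same sign: similar SAR (Z1). Opposite sign: activity switch (Z2).
        return "Z1" if (d1 > 0) == (d2 > 0) else "Z2"
    if big2:
        return "Z3"      # similar activity for I, different for II
    if big1:
        return "Z4"      # similar activity for II, different for I
    return "Z5"          # little or no change for both targets

print(dad_zone(1.41, -2.48))   # Z2
print(dad_zone(0.2, 3.0))      # Z3
```

Because the classification uses only the two potency differences, it is independent of any structural similarity measure, as stated at the end of the paragraph.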
Pairwise Comparison of the R-Group Substitutions. Because dual-activity difference maps are based on pairwise comparisons, it is straightforward to incorporate pairwise structure relationships by distinguishing each molecular pair by the number of substitutions (one to four) that differ between the two molecules. The number of different R-groups for each pair of compounds was determined by comparing the text strings of the chemical names of the substituents in Table S1 of the Supporting Information. This is a simple but powerful method to compare combinatorial data sets. Notably, the stereochemistry is taken into account, so that substituents with R and S configuration are easily distinguished by the specific name of the R-group (e.g., R-propyl vs S-propyl).
Analysis of the SAR is focused on data points in the DAD maps with one or two substitutions because these examples are straightforward to interpret from the experimental point of view. As noted above, distinguishing molecular pairs based on the number of different R-groups around a core scaffold is a substructure-based approach to represent chemical structures. The substructure-based representation of compounds has been proposed in activity landscape studies to enhance interpretability of the SAR [5] and address the facts vs artifacts issue of activity cliffs [26].
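The string comparison of substituent names can be sketched as below. The function name `n_substitutions` is hypothetical, and the R-group tuples in the example are placeholders invented for illustration, not entries from Table S1; the sketch only assumes that each compound is described by the names of its four substituents in a fixed R1 to R4 order.

```python
def n_substitutions(r_groups_a, r_groups_b):
    """Count the positions at which two compounds carry differently named R-groups."""
    if len(r_groups_a) != len(r_groups_b):
        raise ValueError("both compounds must list the same diversity positions")
    # Exact string comparison: "R-propyl" and "S-propyl" count as different,
    # so stereochemistry encoded in the name is taken into account.
    return sum(a.strip() != b.strip() for a, b in zip(r_groups_a, r_groups_b))

# Placeholder substituent names (hypothetical, for illustration only).
cmpd_a = ("S-propyl", "R-propyl", "S-benzyl", "cyclohexyl-methyl")
cmpd_b = ("S-propyl", "S-propyl", "S-benzyl", "cyclohexyl-methyl")
print(n_substitutions(cmpd_a, cmpd_b))  # 1
```

The approach relies on substituent names being written consistently; two spellings of the same group would be counted as a substitution, so names would need to be normalized beforehand.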
Fingerprint Representations and Structure Similarity. For comparison, we computed pairwise similarity values using molecular access system (MACCS) keys (166 bits), graph-based three-point pharmacophore fingerprints (GpiDAPH3) implemented in MOE [27], and radial fingerprints implemented in Canvas [28]. These three fingerprints were selected because they have conceptually different designs capturing distinctive aspects of chemical structures [19]. For example, the MACCS keys used in this work are a predefined set of 166 structural keys; GpiDAPH3 fingerprints are graph-based three-point pharmacophores employing any set of three possible atom types (pi system, donor, acceptor); and radial fingerprints are equivalent to the extended connectivity fingerprints (ECFPs) and entail growing a set of fragments radially from each heavy atom over a series of iterations [29,30]. We also computed the average similarity of all three measures as a consensus representation discussed previously. Briefly, the consensus or aggregated representation is intended to capture the different aspects covered by the individual molecular representations [17,31].
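The similarity calculation and the consensus (mean) similarity can be sketched generically. This is not a reproduction of the MOE or Canvas fingerprints: the sketch assumes each fingerprint is available as a set of on-bit indices, uses the Tanimoto coefficient named in the Results section, and the function names `tanimoto` and `mean_similarity` are hypothetical.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 1.0   # convention chosen here for two empty fingerprints (an assumption)
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)

def mean_similarity(similarities):
    """Consensus similarity: arithmetic mean of the per-fingerprint similarity values."""
    return sum(similarities) / len(similarities)

print(tanimoto({1, 2, 3}, {2, 3, 4}))   # 0.5
```

In practice the three similarity values for a molecule pair (MACCS, GpiDAPH3, radial) would be passed to `mean_similarity` to obtain the consensus value for that pair.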
RESULTS AND DISCUSSION
Overview of the Diversity of the Data Set. In order to assess quantitatively the structural similarity and high structural density (low molecular diversity) of the data set, we measured the molecular similarity of the 106 compounds through pairwise comparisons using the Tanimoto metric and three different fingerprints, namely, MACCS, GpiDAPH3, and radial fingerprints. The average of all three Tanimoto/fingerprint similarities was computed as described in the Methods section. Table 1 summarizes the distribution of the molecular similarities of the 5565 pairwise comparisons of the 106 compounds in the data set. Table 1 also summarizes the molecular similarities of the pairwise comparisons of pairs of compounds with one, two, three, and four substitutions (275, 896, 1863, and 2531 pairs of compounds, respectively). The entire data set has, in general, low structural diversity (or high density) as deduced from the median, mean (0.87), and other statistics of the 5565 Tanimoto/MACCS keys similarity values. For comparison, the median MACCS/Tanimoto similarity reported for a general screening collection and a set of approved drugs is 0.32 and 0.30, respectively [10]. The high density of the data set studied in this work can also be deduced from other fingerprints, e.g., the mean GpiDAPH3/Tanimoto similarity of a set of approved drugs is 0.13 [32]. Also remarkable are the different ranges of similarity values obtained with the three fingerprints for the same data set. Such dependence has been extensively discussed in the literature [18,33] and emphasizes the importance of using more than one fingerprint representation.

Table 1. Number of Pairwise Comparisons for 1–4 Substitutions and Summary of the Distribution of the Molecular Similarity Using Tanimoto and Three Different Fingerprints

                                                 number of substitutions
fingerprint          statistic   all compounds   1       2       3       4
no. of pairs                     5565            275     896     1863    2531
MACCS                median      0.867           0.947   0.898   0.867   0.844
                     U95 (a)     0.870           0.946   0.903   0.873   0.851
                     mean        0.869           0.941   0.900   0.871   0.849
                     L95 (a)     0.867           0.936   0.897   0.868   0.847
                     std. dev.   0.053           0.039   0.045   0.048   0.048
GpiDAPH3             median      0.773           0.902   0.837   0.778   0.745
                     U95         0.778           0.917   0.837   0.779   0.742
                     mean        0.776           0.911   0.833   0.776   0.740
                     L95         0.774           0.905   0.829   0.774   0.738
                     std. dev.   0.074           0.052   0.058   0.060   0.059
radial               median      0.143           0.478   0.248   0.166   0.110
                     U95         0.186           0.487   0.289   0.185   0.118
                     mean        0.183           0.470   0.282   0.182   0.117
                     L95         0.180           0.453   0.276   0.179   0.116
                     std. dev.   0.109           0.144   0.094   0.063   0.030
mean similarity (b)  median      0.598           0.774   0.673   0.609   0.568
                     U95         0.611           0.782   0.675   0.612   0.570
                     mean        0.609           0.774   0.672   0.610   0.568
                     L95         0.607           0.767   0.669   0.608   0.567
                     std. dev.   0.066           0.063   0.046   0.040   0.033

(a) U95 and L95 represent the upper and lower 95% confidence intervals of the mean, respectively.
(b) Computed from the average similarity of MACCS, GpiDAPH3, and radial fingerprints.
Not surprisingly, Table 1 also shows that the 275 pairs of compounds with one substitution have higher similarity than the pairs of compounds with four substitutions. Indeed, as the number of substitutions increases, the molecular similarity decreases. This result is similar for all three fingerprint representations.
To compare the similarities of the potencies of the data set toward FPR1 and FPR2, the distributions of their absolute pairwise potency differences were analyzed, and the results are summarized in Table 2. A total of 5565 pairwise comparisons are shown. Considering all pairwise comparisons, the potency difference for FPR1 is lower than that for FPR2, as deduced from the median and all other statistics. This suggests that the activity for FPR2 was, overall, more sensitive to the structural changes of this data set. Table 2 also summarizes the distributions of the potency differences of the pairwise comparisons of pairs of compounds with one, two, three, and four changes in their R-groups. Not surprisingly, for both receptors, the potency difference increases from one to four substitutions. However, the activity for FPR1 was more sensitive than that for FPR2 to one and two changes in R-groups, as clearly shown by the higher absolute potency differences (for example, median values of 0.47 and 0.75 for FPR1 vs 0.14 and 0.42 for FPR2, respectively). This result suggests that, for this data set, FPR1 is involved in more activity cliffs than FPR2, i.e., it is expected that a large change in activity will be observed for FPR1 ligands due to one or two substitutions in the R-groups. It remains to be explored whether this observation applies to other pyrrolidine bis-diketopiperazines tested with FPR1 and FPR2. The most dramatic cliffs for each receptor are discussed in the next section with emphasis on those cases where a change in structure switches the activity pattern for FPR1 and FPR2.
Dual Activity-Difference Maps. In order to facilitate the interpretation of the SAR, the analysis is mainly focused on the pairwise comparisons of molecules with one and two substitutions in the R-groups. Figure 3 shows DAD maps with pairwise potency differences corresponding to pairs of compounds with one (275 data points) and two (896 data points) substitutions; the table beneath the plots shows the number and percentage of data points in each region of the map for compound pairs with single and double substitutions, respectively. DAD maps for pairs of compounds with three (1863 data points) and four (2531 data points) substitutions are shown in Figure S1 of the Supporting Information. Activity changes associated with three and four changes, although easily mined in the DAD maps, are less informative from the SAR interpretation point of view. Notably, the distribution of the data points in all DAD maps discussed here is independent of the structure similarity.
Figure 3A and B show that the majority of the data points in the corresponding DAD maps are in the center, zone Z5, in particular for the DAD map for single substitutions. These results support the notion that similar compounds have similar activity, as one or two changes around the core scaffold do not have a large impact on the activity for both receptors, e.g., potency difference less than one log unit. As the number of substitutions increases from one to four, the percentage of pairs of compounds in Z5 decreases, as shown in Table S2 of the Supporting Information.
The DAD maps in Figure 3 also show data points in the regions Z1–Z4, which are the most informative from an SAR point of view [16,17]. As discussed in the Methods section, compound pairs in these regions point to one or two R-group replacements that change dramatically (by more than one log unit) the activity difference for one or both receptors. Because the compounds in the data set were derived from the screening results of a mixture-based combinatorial library in positional scanning format, it is expected that pairs of compounds with one or two substitutions will have relatively high structural similarity.
The number and percentage of pairs in each zone are summarized in the table below the DAD maps. The higher percentages of data points in Z4 vs Z3 (Figure 3) indicate that the activity for FPR1 is more sensitive than that for FPR2 to single and double substitutions in the data set. This result is in agreement with the results in Table 2.
Mapping Structure Similarity on DAD Maps Filtered by the Number of Substituents. Although the main focus of this work is scanning the SAR from DAD maps showing pairs of compounds with a discrete number of substituents, for reference, the mean similarity values of the molecule pairs were mapped onto the DAD maps. Figure S2 of the Supporting Information shows four DAD maps with data points corresponding to one, two, three, and four substitutions. Data points are colored by mean structural similarity using a continuous scale from less similar (green) to more similar (red). The scale was defined based on the distribution of similarity values for all possible pairs of molecules in the data set. This visual analysis clearly shows that pairs of compounds with only one different R-group are more similar than compound pairs with two, three, and four substitutions (Figure S2, Supporting Information, and Discussion section). This conclusion reflects the quantitative characterization of the molecular similarity analyzed above (Table 1).

Table 2. Distribution of the Absolute Potency Difference for the Two Targets, Corresponding to All 5565 Pairwise Comparisons and All Corresponding Pairwise Comparisons for 1–4 Substitutions

                                                 number of substitutions
receptor             statistic   all compounds   1       2       3       4
no. of pairs                     5565            275     896     1863    2531
FPR1                 median      0.900           0.470   0.750   0.860   1.060
                     U95 (a)     1.077           0.707   0.954   1.031   1.232
                     mean        1.055           0.633   0.906   0.994   1.198
                     L95 (a)     1.033           0.559   0.857   0.957   1.164
                     std. dev.   0.841           0.626   0.745   0.810   0.884
FPR2                 median      0.970           0.140   0.420   0.820   1.580
                     U95         1.215           0.537   0.805   1.121   1.549
                     mean        1.185           0.454   0.747   1.072   1.502
                     L95         1.155           0.371   0.688   1.024   1.455
                     std. dev.   1.139           0.703   0.889   1.072   1.199

(a) U95 and L95 represent the upper and lower 95% confidence intervals of the mean, respectively.
The SAR obtained from zones Z3 and Z4, e.g., single-target activity cliffs, has been broadly discussed in previous applications of DAD maps for other data sets [16,17]. Herein, the discussion of the DAD maps is primarily focused on pairs of compounds in Z1 and, more importantly, in Z2. It should be recalled that Z1 represents similar SARs, while Z2 indicates inverse SAR. All activity switches (in Z2) are discussed first, followed by representative examples of dual-target activity cliffs with the same directionality of SAR (in Z1).
Activity Switches (Z2) with One Substitution. Figure 4 shows a DAD map displaying 275 pairs of compounds with one substitution. In this map, the data points are colored by the mean molecular similarity (distributions summarized in Table 1). As discussed above, most of these points are colored orange to red, further emphasizing the structural similarity of the pairs of compounds with one substitution. Figure 4 also shows the chemical structures, biological activity, potency difference, and, as reference, the structural similarity of the four activity switches in Z2. As discussed below, all compound pairs shown in Figure 4 except 1754-26/1754-56 are, in addition, selectivity switches.
For the four pairs in Figure 4, the change in the R-group (highlighted in magenta) has a large and opposite effect on the activity for FPR1 and FPR2. For example, in the compound pair 1754-43/1754-49, the replacement of an R-propyl with R-2-naphthylmethyl at R2 dramatically increases the activity for FPR2 by more than 2.48 log units (from Ki > 10,000 to 33 nM), but it greatly decreases the activity for FPR1 by 1.41 log units (from Ki = 90 to 2322 nM). This is also an example of a selectivity switch because 1754-43 is selective for FPR1, whereas 1754-49 is selective for FPR2.
A second notable example is the compound pair 1754-26/1754-56, where the substitution of an S-isopropyl with S-butyl at R1 increases the activity for FPR1 by 1.36 log units but decreases the activity for FPR2 by 1.28 log units. Notably, as pointed out previously, 1754-26 was the only compound in the data set with a Ki value less than 100 nM for both receptors [20]. The compound pair 1754-26/1754-56, however, is not a selectivity switch because 1754-26 is nearly equipotent with both receptors.
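The distinction between an activity switch and a selectivity switch can be made concrete with a short sketch. The criterion used here is an assumption made for illustration, not the authors' stated rule: a compound is called selective for a receptor when its pKi for that receptor exceeds its pKi for the other by at least `margin` log units (1.0 by default), and a pair is a selectivity switch when the two compounds are selective for different receptors. The function names are hypothetical; the Ki values in the example are those quoted above for 1754-43 and 1754-49.

```python
import math

def pki(ki_nm):
    """pKi from Ki in nM (molar scale assumed), capped at 10,000 nM as in the Methods."""
    return 9.0 - math.log10(min(ki_nm, 10_000.0))

def selective_for(ki_fpr1_nm, ki_fpr2_nm, margin=1.0):
    """Return 'FPR1', 'FPR2', or None for a roughly equipotent compound (assumed criterion)."""
    diff = pki(ki_fpr1_nm) - pki(ki_fpr2_nm)
    if diff >= margin:
        return "FPR1"
    if diff <= -margin:
        return "FPR2"
    return None

def is_selectivity_switch(cmpd_a, cmpd_b, margin=1.0):
    """True when both compounds are selective and for different receptors."""
    sel_a = selective_for(*cmpd_a, margin=margin)
    sel_b = selective_for(*cmpd_b, margin=margin)
    return sel_a is not None and sel_b is not None and sel_a != sel_b

# (Ki FPR1, Ki FPR2) in nM; ">10,000 nM" is entered as 10,000.
print(is_selectivity_switch((90, 10_000), (2322, 33)))   # True
```

Under this criterion a pair in which one compound is nearly equipotent for both receptors is not reported as a selectivity switch, even when the two potency differences have opposite signs, which mirrors the reasoning given for 1754-26/1754-56.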
The mean structural similarity of the activity switches in Figure 4 is high relative to the mean similarity of the entire data set (for all 5565 pairs of compounds, Table 1). For example, all four molecule pairs have a mean similarity equal to or greater than 0.73, which is higher than the U95 of the mean similarity for the whole data set, 0.61. These results indicate that the switches discussed in this figure would have been identified following a fingerprint similarity-based approach (because of their high fingerprint-based structural similarity). However, as discussed in the literature, using a substructure-based method to classify molecular structures is more intuitive and easier to interpret than using fingerprint representations in activity landscape studies [5].
Figure 3. Dual-activity difference maps for the 106 compounds. Each data point represents a pairwise comparison with (A) one substitution (275 data points total) and (B) two substitutions (896 data points). Data points in the center of each DAD map (zone Z5) with potency difference ≤ 1 log unit for any target are in gray. The table shows the number and percentage of data points in each region of the map for compound pairs with single and double substitutions, respectively.
Activity Switches (Z2) with Two Substitutions. Figure 5 presents a DAD map showing 896 pairs of compounds with two substitutions. Data points are further distinguished by the mean molecular similarity. Comparison of the color pattern of this figure with that of the DAD map in Figure 4 clearly shows the overall lower structural similarity of the pairs of compounds with two substitutions (see also Table 1). Figure 5 also presents the chemical structures of three representative activity switches in Z2 (selected from 49 total) along with the biological activity, potency difference, and structural similarity. The changes in the R-groups are highlighted in magenta. We chose these examples because they have one potent compound (Ki ≤ 90 nM) for FPR1 or FPR2 in the pair. The complete set of 49 activity switches is summarized in Table S3 of the Supporting Information.
For the three examples in Figure 5, the two changes in the R-groups have a large and opposite effect on the activity for FPR1 and FPR2. Notably, in the molecule pair 1754-31/1754-43, the replacement of an S-isopropyl with an S-propyl at R1 and the substitution of an R-2-naphthylmethyl with R-propyl at R2 increase the activity for FPR1 by more than two log units (from Ki > 10,000 to 90 nM) but dramatically decrease the activity for FPR2 by four log units (from Ki = 1 to >10,000 nM). Two other remarkable examples of activity switches with two R-group replacements are given by the pairs of compounds 1754-20/1754-56 and 1754-31/1858-482. The three activity switches in Figure 5 are also selectivity switches because the corresponding replacement in the R-groups has an opposite effect not only on the activity for the two receptors but also on the selectivity.
Figure 4. Activity switches for single substitutions. Data points are colored by the mean structure similarity (Figure S2, Supporting Information). The switches are readily identified in zone Z2 of the DAD maps. The structural change in each pair is highlighted in magenta. The table summarizes the potency difference and fingerprint-based similarity values for each pair.
The molecular similarity of the three compound pairs in Figure 5 (e.g., mean structure similarity equal to or greater than 0.65) is still higher than the U95 mean similarity (0.61) of all pairs of compounds in the data set. Thus, they would be considered structurally similar. Although these examples would have been retrieved from a fingerprint similarity-based approach, it is clear that these pairs of compounds represent borderline cases in activity landscape studies based on fingerprint-based molecular similarity.
The previous examples illustrate that the SAR of pairs of compounds with two substitutions can be rapidly analyzed in a systematic manner in DAD maps. However, the interpretation of the SAR for pairs of compounds with two (or more) substitutions is more difficult than for pairs of compounds with only one substitution. In the following subsections, the discussion is focused on representative examples of dual- and single-target activity cliffs with one R-group replacement. These cases can be regarded as R-cliffs [14].
Dual-Target Activity Cliffs (Z1). Figure 6 shows a DAD map with pairs of compounds with one substitution. The seven Z1 data points are labeled. As clearly shown in the figure, the change in the R-group for all seven pairs simultaneously increases (or decreases) the activity for both targets by more than a log unit. The pair of compounds 1754-44 and 1858-483 illustrates that the replacement of a 2-biphenyl-4-yl-ethyl with a 4-methyl-1-cyclohexyl-methyl at R4 decreases the activity for FPR1 and FPR2. Similar analysis can be performed for the other six pairs of compounds.
Figure 5. Representative activity switches with double substitutions (selected from 49 pairs in total). The switches are readily identified in
zone Z2 of DAD maps. The structural changes in each pair are highlighted in magenta. The table summarizes the potency difference and
fingerprint-based similarity values for each pair.
Single-Target Activity Cliffs. Figure 7 shows the three
single-target activity cliffs with a very large potency difference,
i.e., more than three log units. These examples can be regarded
as deep activity cliffs (ref 31).
It is clear from the figure that the
corresponding R-group modification greatly changes the
activity for only one target, either FPR1 (pairs 1754-44/1754-50
and 1754-56/1858-480) or FPR2 (pair 1754-6/1754-31).
In contrast, the replacement of the R-group does not have a
major impact (<1 log unit) on the activity for the other target.
Interestingly, the chemical structures and Ki values of
compounds 1754-43 and 1858-483 in Figure 6 exemplify a
single-target activity cliff for FPR1 (the pair 1754-43/1858-483
is not labeled in the DAD map in Figure 6, which illustrates
dual-target cliffs). The only difference in the chemical structures of
this pair of compounds is the methyl group in the R4
substituent highlighted in blue, i.e., cyclohexyl-methyl vs
4-methyl-1-cyclohexyl-methyl. This subtle modification changes
the activity for FPR1 from Ki = 90 nM (1754-43) to
Ki > 10,000 nM (1858-483). However, this modification has no
impact on the activity for FPR2.
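As a worked example, the size of this FPR1 cliff can be expressed in log units from the two Ki values quoted above. The sketch assumes that potency is expressed as pKi = -log10(Ki in mol/L) and that the potency difference of a pair is the difference of the two pKi values; the helper name `pki` is hypothetical. Because the second Ki is reported only as a lower limit (> 10,000 nM), the result is a lower bound on the true difference.

```python
import math


def pki(ki_nm):
    """Convert a Ki given in nM to pKi = -log10(Ki in mol/L)."""
    return -math.log10(ki_nm * 1e-9)


# Ki values for FPR1 quoted in the text.
ki_1754_43 = 90.0        # nM
ki_1858_483 = 10_000.0   # nM, reported as a lower limit (> 10,000 nM)

# Lower bound of the potency difference between the two compounds.
delta_fpr1 = pki(ki_1754_43) - pki(ki_1858_483)
print(f"Potency difference for FPR1: at least {delta_fpr1:.2f} log units")
```

The difference of about two log units exceeds the 1 log unit cutoff used for cliffs in the text, while staying below the 3 log units that define the deep cliffs of Figure 7.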
SAS Maps. DAD maps are closely related to structure-activity
similarity (SAS) maps, which are two-dimensional plots
of the potency difference
for one target (typically plotted on the Y-axis) against the
structural similarity (usually plotted on the X-axis) (ref 3). It is
possible to generate SAS maps for FPR1 and FPR2 and further
distinguish the data points based on the number of R-group
changes (Figures S3 and S4, Supporting Information).
Figure 6. Dual-target activity cliffs for single substitutions with the same direction, i.e., the substitution increases or decreases the activity for both targets. The
seven activity cliffs with direct SAR are readily identified in zone Z1 of the DAD maps. The structural changes in each pair are highlighted in
magenta. The table summarizes the potency difference and fingerprint-based similarity values for each pair.
Although it is straightforward to identify single-target activity
cliffs from these maps, the directionality of the SAR is lost;
therefore, selectivity switches and dual-target cliffs with
similar SAR cannot be identified from SAS maps.
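The loss of directionality can be made concrete with a short sketch. It assumes that the Y-axis of a SAS map carries the unsigned (absolute) potency difference for one target, which is consistent with the statement above but is not spelled out in this passage; the function name `sas_y` and the numerical values are hypothetical.

```python
def sas_y(delta):
    """Y coordinate in a SAS map: unsigned potency difference (assumption)."""
    return abs(delta)


# Signed potency differences (target 1, target 2) for two hypothetical pairs.
switch_pair = (1.8, -1.6)  # opposite directions: an activity switch
direct_pair = (1.8, 1.6)   # same direction: a dual-target cliff

# One SAS map per target: each pair contributes |delta| to each map.
sas_switch = tuple(sas_y(d) for d in switch_pair)
sas_direct = tuple(sas_y(d) for d in direct_pair)

print(sas_switch == sas_direct)    # the two pairs coincide in the SAS maps
print(switch_pair == direct_pair)  # the signed values still distinguish them
```

Under this assumption the two pairs occupy the same positions in the per-target SAS maps, whereas the signed coordinates of a DAD map place them in different zones.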
CONCLUSIONS
We report an intuitive substructure-based approach for the
systematic and rapid scanning of the SAR in combinatorial data
sets using the concept of activity landscape modeling. The
general method introduced here enables the quick and
methodical identification of large changes in biological
activity associated with one, two, or more replacements in the
R-groups of a common scaffold. The larger and more structurally
diverse the set of R-groups (e.g., as measured by a fingerprint-based
method), the broader the applicability domain of the DAD
maps. The approach captures the substructure relationships of
all possible pairs in the data set, and it is reminiscent of the
MMP concept. However, while MMPs and MMP-cliffs
consider structural changes at a single R-group, the current
approach considers structural changes at one, two, or more
groups. Of course, changes at one or two groups are the ones
that are easier to interpret for a medicinal chemist. The
substructure relationships can be readily mapped and visualized
in DAD maps. DAD maps are based on pairwise comparisons
of potency differences, making it straightforward to represent
changes in the R-groups and to systematically explore single-
and dual-target activity R-cliffs. In particular, DAD maps enable
the rapid identification of all activity switches, defined as pairs
of compounds where a small change in the structure (e.g.,
replacement of one or two R-groups) completely inverts the
biological response for two targets, namely, increases the
activity for one target but decreases the activity for the second
target. Several activity switches identified in this work were
also selectivity switches, defined as structural changes that
completely invert the selectivity pattern of similar compounds
against two biological endpoints. Activity and selectivity cliffs
and switches are of value for the medicinal chemist because
they point to specific molecules and substitutions that can be
further modified for improved activity and/or selectivity.
However, as it currently stands, the DAD maps presented in
this work are focused on the description of the SAR. Efforts to
conduct prospective applications of DAD maps are ongoing. Of
note, the data points analyzed in the DAD maps are
selected solely based on the number of R-group substitutions of
the core scaffold. This approach is in sharp contrast with
previous applications of the DAD maps, where data points were
selected based on fingerprint-based similarity values. To
illustrate the rapid detection of activity switches in
combinatorial data sets using DAD maps, we discussed the
activity switches and the single- and dual-target activity cliffs of a novel
and relevant set of 106 pyrrolidine bis-diketopiperazines tested
against two formylpeptide receptors (ref 20). Most of the R-group
substituents of the main scaffold are hydrophobic groups, and
we did not observe cliffs that involve highly polar groups. This
data set represents one example of several combinatorial data
sets obtained from positional scanning deconvolution methods
of mixture-based libraries (refs 7 and 9).
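Selecting pairs by the number of R-group substitutions, rather than by a similarity cutoff, can be sketched as follows. This is an illustrative outline and not the authors' scripts: the compound identifiers, the R-group labels, the four substitution positions, and the helper `n_substitutions` are all hypothetical.

```python
from itertools import combinations


def n_substitutions(rgroups_a, rgroups_b):
    """Count the positions at which two analogues carry different R-groups."""
    return sum(1 for a, b in zip(rgroups_a, rgroups_b) if a != b)


# Hypothetical analogue series on a common scaffold; each tuple lists the
# substituent at each position (four positions assumed for illustration).
library = {
    "cpd_A": ("x1", "y1", "z1", "w1"),
    "cpd_B": ("x1", "y1", "z1", "w2"),
    "cpd_C": ("x1", "y2", "z1", "w2"),
}

# Group every pairwise comparison by its number of R-group changes.
pairs_by_count = {}
for (id_a, r_a), (id_b, r_b) in combinations(library.items(), 2):
    count = n_substitutions(r_a, r_b)
    pairs_by_count.setdefault(count, []).append((id_a, id_b))

print(pairs_by_count)
```

Each group can then be plotted as a separate DAD map, e.g., one map for single substitutions and one for double substitutions, without computing any fingerprint.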
Figure 7. Activity cliffs for FPR1 and FPR2 (deep cliffs with >3 log units in potency difference for single substitutions). The structural changes in
each pair are highlighted in magenta. The table summarizes the potency difference and fingerprint-based similarity values for each pair.
ASSOCIATED CONTENT
Supporting Information
SMILES representation and biological activity of the 106
compounds analyzed in this work (Table S1); number of pairs
of compounds in Z5 (Table S2); 49 activity switches with two
substitutions (Table S3); consensus SAS maps for FPR1
(Figure S3); and consensus SAS maps for FPR2 (Figure S4).
This material is available free of charge via the Internet at
http://pubs.acs.org.
AUTHOR INFORMATION
Corresponding Author
*Tel: +1-772-345-4685. Fax: +1-772-345-3649. E-mail:
jmedina@tpims.org.
Notes
The authors declare no competing financial interest.
ACKNOWLEDGMENTS
The authors thank Jacob Waddell for writing scripts used to
generate the DAD maps. This work was supported by NIH
Grants U54MH074425 (L.A.S.), U54MH084690 (L.A.S.),
R01HG005066 (B.S.E.), and 1R01DA031370 (R.A.H.) and
the University of New Mexico Shared Flow Cytometry and
High Throughput Screening Resource (supported in part by
UNM Cancer Center and NIH Grants P30 CA118100 and U54
RR026083). We also appreciate funding from the State of
Florida, Executive Office of the Governor's Department of
Economic Opportunity.
REFERENCES
(1) Stumpfe, D.; Bajorath, J. Exploring activity cliffs in medicinal chemistry. J. Med. Chem. 2012, 55, 2932-2942.
(2) Bajorath, J. Modeling of activity landscapes for drug discovery. Expert Opin. Drug Discovery 2012, 7, 463-473.
(3) Medina-Franco, J. L. Scanning structure-activity relationships with structure-activity similarity and related maps: From consensus activity cliffs to selectivity switches. J. Chem. Inf. Model. 2012, 52, 2485-2493.
(4) Kenny, P. W.; Sadowski, J. Chemoinformatics in Drug Discovery; Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, Germany, 2005.
(5) Hu, X.; Hu, Y.; Vogt, M.; Stumpfe, D.; Bajorath, J. MMP-Cliffs: Systematic identification of activity cliffs on the basis of matched molecular pairs. J. Chem. Inf. Model. 2012, 52, 1138-1145.
(6) Dossetter, A. G.; Griffen, E. J.; Leach, A. G. Matched molecular pair analysis in drug discovery. Drug Discovery Today 2013, DOI: 10.1016/j.drudis.2013.03.003.
(7) Houghten, R. A.; Pinilla, C.; Appel, J. R.; Blondelle, S. E.; Dooley, C. T.; Eichler, J.; Nefzi, A.; Ostresh, J. M. Mixture-based synthetic combinatorial libraries. J. Med. Chem. 1999, 42, 3743-3778.
(8) Pinilla, C.; Appel, J. R.; Borras, E.; Houghten, R. A. Advances in the use of synthetic combinatorial chemistry: Mixture-based libraries. Nat. Med. 2003, 9, 118-122.
(9) Houghten, R. A.; Pinilla, C.; Giulianotti, M. A.; Appel, J. R.; Dooley, C. T.; Nefzi, A.; Ostresh, J. M.; Yu, Y. P.; Maggiora, G. M.; Medina-Franco, J. L.; Brunner, D.; Schneider, J. Strategies for the use of mixture-based synthetic combinatorial libraries: Scaffold ranking, direct testing, in vivo, and enhanced deconvolution by computational methods. J. Comb. Chem. 2008, 10, 3-19.
(10) López-Vallejo, F.; Giulianotti, M. A.; Houghten, R. A.; Medina-Franco, J. L. Expanding the medicinally relevant chemical space with compound libraries. Drug Discovery Today 2012, 17, 718-726.
(11) Medina-Franco, J. L.; Giulianotti, M. A.; Welmaker, G. S.; Houghten, R. A. Shifting from the single to the multitarget paradigm in drug discovery. Drug Discovery Today 2013, 18, 495-501.
(12) Kolpak, J.; Connolly, P. J.; Lobanov, V. S.; Agrafiotis, D. K. Enhanced SAR maps: Expanding the data rendering capabilities of a popular medicinal chemistry tool. J. Chem. Inf. Model. 2009, 49, 2221-2230.
(13) Wassermann, A. M.; Haebel, P.; Weskamp, N.; Bajorath, J. SAR Matrices: Automated extraction of information-rich SAR tables from large compound data sets. J. Chem. Inf. Model. 2012, 52, 1769-1776.
(14) Agrafiotis, D. K.; Wiener, J. J. M.; Skalkin, A.; Kolpak, J. Single R-group polymorphisms (SRPs) and R-cliffs: An intuitive framework for analyzing and visualizing activity cliffs in a single analog series. J. Chem. Inf. Model. 2011, 51, 1122-1131.
(15) Duffy, B. C.; Zhu, L.; Decornez, H.; Kitchen, D. B. Early phase drug discovery: cheminformatics and computational techniques in identifying lead series. Bioorg. Med. Chem. 2012, 20, 5324-5342.
(16) Pérez-Villanueva, J.; Santos, R.; Hernández-Campos, A.; Giulianotti, M. A.; Castillo, R.; Medina-Franco, J. L. Structure-activity relationships of benzimidazole derivatives as antiparasitic agents: Dual activity-difference (DAD) maps. Med. Chem. Comm. 2011, 2, 44-49.
(17) Medina-Franco, J. L.; Yongye, A. B.; Pérez-Villanueva, J.; Houghten, R. A.; Martínez-Mayorga, K. Multitarget structure-activity relationships characterized by activity-difference maps and consensus similarity measure. J. Chem. Inf. Model. 2011, 51, 2427-2439.
(18) Medina-Franco, J. L.; Martínez-Mayorga, K.; Bender, A.; Marín, R. M.; Giulianotti, M. A.; Pinilla, C.; Houghten, R. A. Characterization of activity landscapes using 2D and 3D similarity methods: Consensus activity cliffs. J. Chem. Inf. Model. 2009, 49, 477-491.
(19) Yongye, A.; Byler, K.; Santos, R.; Martínez-Mayorga, K.; Maggiora, G. M.; Medina-Franco, J. L. Consensus models of activity landscapes with multiple chemical, conformer and property representations. J. Chem. Inf. Model. 2011, 51, 1259-1270.
(20) Pinilla, C.; Edwards, B. S.; Appel, J. R.; Yates-Gibbins, T.; Giulianotti, M. A.; Medina-Franco, J. L.; Young, S. M.; Santos, R. G.; Sklar, L. A.; Houghten, R. A. Selective agonists and antagonists of formylpeptide receptors: Duplex flow cytometry and mixture-based positional scanning libraries. Mol. Pharmacol. 2013, in press.
(21) Le, Y.; Murphy, P. M.; Wang, J. M. Formyl-peptide receptors revisited. Trends Immunol. 2002, 23, 541-548.
(22) Zhou, Y.; Bian, X.; Le, Y.; Gong, W.; Hu, J.; Zhang, X.; Wang, L.; Iribarren, P.; Salcedo, R.; Howard, O. M. Z.; Farrar, W.; Wang, J. M. Formylpeptide receptor FPR and the rapid growth of malignant human gliomas. J. Natl. Cancer Inst. 2005, 97, 823-835.
(23) Le, Y.; Yazawa, H.; Gong, W.; Yu, Z.; Ferrans, V. J.; Murphy, P. M.; Wang, J. M. Cutting edge: The neurotoxic prion peptide fragment PrP(106-126) is a chemotactic agonist for the G protein-coupled receptor formyl peptide receptor-like 1. J. Immunol. 2001, 166, 1448-1451.
(24) Pérez-Villanueva, J.; Medina-Franco, J. L.; Méndez-Lucio, O.; Yoo, J.; Soria-Arteche, O.; Izquierdo, T.; Lozada, M. C.; Castillo, R. Case plots for the chemotype-based activity and selectivity analysis: A case study of cyclooxygenase inhibitors. Chem. Biol. Drug Des. 2012, 80, 752-762.
(25) Méndez-Lucio, O.; Pérez-Villanueva, J.; Castillo, R.; Medina-Franco, J. L. Activity landscape modeling of PPAR ligands with dual-activity difference maps. Bioorg. Med. Chem. 2012, 20, 3523-3532.
(26) Medina-Franco, J. L. Activity cliffs: Facts or artifacts? Chem. Biol. Drug Des. 2013, 81, 553-556.
(27) Molecular Operating Environment (MOE), version 2011.10; Chemical Computing Group, Inc.: Montreal, Quebec, Canada. http://www.chemcomp.com (accessed May 31, 2013).
(28) Canvas, version 1.5; Schrödinger, LLC: New York, 2012.
(29) Rogers, D.; Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 2010, 50, 742-754.
(30) Sastry, M.; Lowrie, J. F.; Dixon, S. L.; Sherman, W. Large-scale systematic analysis of 2D fingerprint methods and parameters to improve virtual screening enrichments. J. Chem. Inf. Model. 2010, 50, 771-784.
(31) Pérez-Villanueva, J.; Santos, R.; Hernández-Campos, A.; Giulianotti, M. A.; Castillo, R.; Medina-Franco, J. L. Towards a systematic characterization of the antiprotozoal activity landscape of benzimidazole derivatives. Bioorg. Med. Chem. 2010, 18, 7380-7391.
(32) Singh, N.; Guha, R.; Giulianotti, M. A.; Pinilla, C.; Houghten, R. A.; Medina-Franco, J. L. Chemoinformatic analysis of combinatorial libraries, drugs, natural products, and molecular libraries small molecule repository. J. Chem. Inf. Model. 2009, 49, 1010-1024.
(33) Bender, A. How similar are those molecules after all? Use two descriptors and you will have three different answers. Expert Opin. Drug Discovery 2010, 5, 1141-1151.