J. Chem. Inf. Comput. Sci. 1987, 27, 21-35
Atomic Physicochemical Parameters for Three-Dimensional-Structure-Directed Quantitative Structure-Activity Relationships. 2. Modeling Dispersive and Hydrophobic Interactions
ARUP K. GHOSE* and GORDON M. CRIPPEN*
College of Pharmacy, University of Michigan, Ann Arbor, Michigan 48109
Received July 22, 1986
In an earlier paper (Ghose, A. K.; Crippen, G. M. J. Comput. Chem. 1986, 7, 565) the need for atomic physicochemical properties in three-dimensional-structure-directed quantitative structure-activity relationships was demonstrated, and it was shown how atomic parameters can be developed to evaluate the molecular water-1-octanol partition coefficient, which is a measure of hydrophobicity. In the present work the atomic values of molar refractivity are reported. Carbon, hydrogen, oxygen, nitrogen, sulfur, and halogens are divided into 110 atom types, of which 93 atomic values are evaluated from 504 molecules by using a constrained least-squares technique. These values gave a standard deviation of 1.269 and a correlation coefficient of 0.994. The parameters were used to predict the molar refractivities of 78 compounds. The predicted values have a standard deviation of 1.614 and a correlation coefficient of 0.994. The closeness of the linear relationship between the atomic water-1-octanol partition coefficients and the atomic molar refractivities has been checked through the correlation coefficient over the 89 atom types used for both properties, which has been found to be 0.322. Because this value is low, the two properties are largely independent of each other, which suggests that both parameters can be used to model the intermolecular interaction. The origin of these physicochemical properties and the types of interaction that can be modeled by these properties have been critically analyzed.
INTRODUCTION
In the process of drug design, medicinal chemists evaluate the binding energy of some closely related ligands with a biological receptor. The explicit structure of the receptor in most cases is unknown. The ultimate objective of any quantitative structure-activity relationship (QSAR) is to portray the receptor by the structural, physicochemical, and biological properties of the ligand. Not only is the task difficult, but the inherent weakness of the approach is bound to leave the portrait misty. Explanation of the simplest biological data, namely, the binding energy of the ligand on the purified receptor, involves (1) the three-dimensional structure of the biological receptor¹ and its conformational flexibility,² (2) knowledge of the active site,¹ (3) the conformational behavior of the ligand,³,⁴ (4) the interaction of the biophase⁵ with the ligand/receptor, and, most important, (5) the interaction of the ligand with the receptor. Each process has its energetic (enthalpic) and entropic contribution. The energetic contribution is often easier to model than the entropic part. Entropy is related to the flexibility of the ligand and the receptor as well as the structural randomness of the biophase around the ligand and the receptor before and after binding. The complexity of these processes has made development along this line very slow and calls for some method that allows a rough estimate of the active site.
Most QSAR approaches therefore correlate the binding energy of the ligand with different physicochemical properties for different parts of the ligand. If these physicochemical properties represent the different types of molecular forces, one can guess the nature of the interaction at different regions. The first problem is therefore to identify the possible types of forces in the biomolecular interaction and next to identify the physicochemical properties that can model these forces. Unlike the intermolecular interaction between simple molecules, the biochemical interaction of a drug involves a macromolecule on one side. The macromolecule is assumed to have low flexibility under physiological conditions, and hence the steric fit of the ligand structure at the active site often constitutes a major factor. The flexibility in turn is a complex function of the intramolecular forces within the biomolecule and in the biophase. The interaction of the biophase with the ligand constitutes another important factor in the biochemical process. If the ligand is highly solvated and needs desolvation for the binding process, such binding ought to be weak unless it is compensated by strong interaction with the receptor. The interaction of the biophase with the ligand or the receptor is often governed by entropy rather than by enthalpy. The inert gases and simple hydrocarbons are only slightly soluble in water, although they have a favorable (negative) enthalpy of solution. The negative enthalpy comes from two sources, the dispersive force between the solute and the solvent and structuring of the water around the solute. It is the latter factor that gives the unfavorable (negative) entropy. Both enthalpic and entropic factors are responsible for the hydrophobic interaction. The term hydrophobic interaction refers to the force or the corresponding energy that operates between two or more nonpolar solutes in liquid water. Although the theoretical work on hydrophobic interactions has led to a clear understanding of the molecular structure of aqueous solution, it has hardly begun to build a satisfactory theoretical description of the process that has a wide range of practical applicability. In such a situation, medicinal chemists try to model this interaction using a physicochemical property that closely parallels the hydrophobicity. They use the partition coefficient of the ligand molecules between water and a nonpolar solvent (usually 1-octanol) as a measure of hydrophobicity. This property, in fact, represents nonspecific dispersive and electrostatic forces and the consequent entropic factor. However, biological interaction involves some regiospecific dispersive and electrostatic forces and thus calls for physicochemical properties that can handle these forces. The formal charge density⁴ on the atoms or the electrostatic potential near the van der Waals surface is a good measure of the electrostatic forces. Since the primary objective of this paper is to develop parameters that can be used to model the dispersive interaction of the ligand at the receptor site, we shall consider this interaction in greater detail in what follows.
0095-2338/87/1627-0021$01.50/0 © 1987 American Chemical Society
THEORY OF DISPERSIVE FORCE AND ATOMIC REFRACTIVITY
London first showed that the attractive force between nonpolar molecules is due to correlation of the electron motion. It is therefore known as the London force or dispersive force. An accurate quantum chemical treatment of the process is very difficult. Since the polarizability is closely related to the dispersive force, all approximate formulas for the latter are obtained by replacing unevaluated terms by it, if they approximately represent polarizability. Thus, according to London, the dispersive interaction between two spherically symmetrical systems A and B is
$$E_L = -\frac{3}{2}\,\frac{\alpha_A \alpha_B}{R^6}\,\frac{U_A U_B}{U_A + U_B} \tag{1}$$
where $\alpha$ is the polarizability, $U$ is the approximate ionization energy, and $R$ is the distance between the two systems. On the other hand, according to Slater-Kirkwood
$$E_L = -\frac{3 e \hbar}{2 m^{1/2}}\,\frac{\alpha_A \alpha_B}{R^6\left[(\alpha_A/N_A)^{1/2} + (\alpha_B/N_B)^{1/2}\right]} \tag{2}$$
where $N$ is an empirical parameter known as the effective number of electrons, and $e$, $m$, and $\hbar$ are the electronic charge, the electronic mass, and the reduced Planck constant. Equations 1 and 2 are strictly applicable to spherically symmetrical systems and are not suitable for most molecular systems. However, Pitzer first used this idea to calculate the intramolecular dispersion interaction. The dispersion energies were summed for all pairs of nonbonded atoms. The approximation of atom-pair dissection of the dispersive force ultimately led to the development of the molecular mechanics method for conformational analysis. This method is found to be successful for evaluating intermolecular interactions.¹⁰ Theoretical estimation of the ligand-receptor binding energy from the properties of the ligand is based on the idea that the properties of the ligand and the receptor can be separated
$$E_L = K f(A) f(B) \tag{3}$$
where $f(A)$ and $f(B)$ represent functions characteristic of the ligand and the receptor, respectively. If the receptor is relatively rigid and the different ligands bind in the same region of the receptor, the distance $R$ in eq 1 or 2 may be assumed to be constant. The other part (containing the ionization energy) can be separated in the form of eq 3 if in eq 1
$$\text{(a) } U_A \gg U_B \quad\text{or}\quad \text{(b) } U_A \ll U_B \quad\text{or}\quad \text{(c) } U_A \approx U_B \tag{4}$$
If atom-pair dissection of the dispersive interaction is accepted, then the interaction of a particular ligand atom with the receptor leads to different expressions under different conditions:
under condition 4a
$$E_L = -\frac{3}{2}\,\alpha_A \left[\sum_B \frac{\alpha_B U_B}{R^6}\right] \tag{5}$$
The summation is over all the receptor atoms. The quantity within brackets in eq 5 for a particular receptor will be constant for a small specified region due to the distance factor. In other words, if the receptor is rigid, the proportionality constant of the dispersive force with the polarizability of the ligand, $\alpha_A$, will be different in different regions. This is why in our three-dimensional-structure-directed quantitative structure-activity relationships¹¹ the hypothetical site cavity is divided into small pockets of different types.
under condition 4b
$$E_L = -\frac{3}{2}\,\alpha_A U_A \left[\sum_B \frac{\alpha_B}{R^6}\right] \tag{6}$$
Clearly here the polarizability of the ligand alone cannot be used for modeling the dispersive interaction. Here the appropriate quantity is $\alpha_A U_A$.
Under condition 4c the corresponding expression for the dispersive interaction becomes half of (5) or (6).
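The three limiting cases can be checked numerically. The sketch below assumes the standard textbook form of the London expression, E = -(3/2) αA αB UA UB / [(UA + UB) R^6]; the function names and the numerical inputs are hypothetical and in arbitrary but consistent units.

```python
def london_energy(alpha_a, alpha_b, u_a, u_b, r):
    """London dispersion energy between two spherical systems A and B.

    Assumed textbook form: E = -(3/2) * aA*aB / r**6 * uA*uB / (uA + uB).
    """
    return -1.5 * alpha_a * alpha_b / r**6 * (u_a * u_b) / (u_a + u_b)


def limit_4a(alpha_a, alpha_b, u_b, r):
    """Condition 4a (U_A >> U_B): the ionization-energy factor tends to U_B."""
    return -1.5 * alpha_a * alpha_b * u_b / r**6


def limit_4b(alpha_a, alpha_b, u_a, r):
    """Condition 4b (U_A << U_B): the ionization-energy factor tends to U_A."""
    return -1.5 * alpha_a * alpha_b * u_a / r**6


# Hypothetical inputs, chosen only to exercise the three conditions.
alpha_a, alpha_b, r = 2.0, 3.0, 4.0

# Ratio of the full expression to its limiting form in each regime.
ratio_4a = london_energy(alpha_a, alpha_b, 1000.0, 1.0, r) / limit_4a(alpha_a, alpha_b, 1.0, r)
ratio_4b = london_energy(alpha_a, alpha_b, 1.0, 1000.0, r) / limit_4b(alpha_a, alpha_b, 1.0, r)
ratio_4c = london_energy(alpha_a, alpha_b, 5.0, 5.0, r) / limit_4a(alpha_a, alpha_b, 5.0, r)

print(ratio_4a, ratio_4b, ratio_4c)
```

With a 1000-fold difference in ionization energy the full expression is within about 0.1% of the limiting form, and for equal ionization energies the ratio is exactly one-half, which is the "half of (5) or (6)" statement for condition 4c.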
On the other hand, the Slater-Kirkwood equation (eq 2) can be separated in the form of eq 3 if
$$\text{(a) } \frac{\alpha_A}{N_A} \gg \frac{\alpha_B}{N_B} \quad\text{or}\quad \text{(b) } \frac{\alpha_A}{N_A} \ll \frac{\alpha_B}{N_B} \quad\text{or}\quad \text{(c) } \frac{\alpha_A}{N_A} = \frac{\alpha_B}{N_B} \tag{7}$$
Under conditions 7a and 7b the expression for the total dispersive interaction of a ligand atom with the receptor becomes eq 8 and 9, respectively:
$$E_L = -C\,(\alpha_A N_A)^{1/2} \left[\sum_B \frac{\alpha_B}{R^6}\right] \tag{8}$$
$$E_L = -C\,\alpha_A \left[\sum_B \frac{(\alpha_B N_B)^{1/2}}{R^6}\right] \tag{9}$$
where $C$ stands for the constant prefactor of eq 2. Here also we reach similar conclusions: under certain conditions the dispersive force is a linear function of polarizability, and under some other condition it is a linear function of $(\alpha_A N_A)^{1/2}$.
Although assuming only one of the ionization energy conditions (4a-c) or one of the polarizability conditions (7a-c) for all receptor atoms may seem very crude, in practice it is not that bad, since in a particular region only those atoms of the receptor which are close to the ligand atom will make major contributions to the dispersive interaction. However, it may be a good idea to make the dispersive interaction a linear function of both $\alpha_A$ and $(\alpha_A N_A)^{1/2}$.
The polarizability $\alpha$ of a substance is directly proportional to its molar refractivity, MR, as
$$\mathrm{MR} = 4\pi N \alpha / 3 \tag{10}$$
where $N$ is Avogadro's number. It therefore follows that for a small region in the hypothetical receptor the dispersive interaction may be modeled as a linear function of the molar refractivity. The proportionality constant, which is characterized by the receptor and the position, should be adjusted so that it can represent the observed binding energies of the ligand with the receptor.
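As a small numerical illustration of eq 10, the following sketch converts between polarizability and molar refractivity. It assumes the polarizability is expressed as a volume in cm^3 per molecule (Gaussian units), so that MR comes out in cm^3/mol; the function names and the sample value are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1


def molar_refractivity_from_polarizability(alpha_cm3):
    """Eq 10: MR = 4*pi*N*alpha/3 (alpha in cm^3 per molecule, MR in cm^3/mol)."""
    return 4.0 * math.pi * AVOGADRO * alpha_cm3 / 3.0


def polarizability_from_molar_refractivity(mr_cm3_per_mol):
    """Inverse of eq 10: alpha = 3*MR / (4*pi*N)."""
    return 3.0 * mr_cm3_per_mol / (4.0 * math.pi * AVOGADRO)


# Hypothetical polarizability volume of 10 cubic angstroms (1e-23 cm^3).
alpha = 1.0e-23
mr = molar_refractivity_from_polarizability(alpha)
alpha_back = polarizability_from_molar_refractivity(mr)

print(round(mr, 3))
```

Because the relation is a pure proportionality, any quantity that is linear in the polarizability is also linear in MR, which is what allows MR to stand in for the polarizability in the receptor model.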
It can be deduced from electrostatics that for a spherical molecule
$$\alpha = r^3 \tag{11}$$
where $r$ is the radius of the molecule. Inserting eq 11 in eq 10, we see that the molar refractivity is equal to the actual volume of the molecules in 1 mol. If this interpretation holds in general, then the atomic contribution to molar refractivity is the volume of the atom in the molecule. Such a volume should be different from the isolated atomic volume due to (1) the effect of polarity of the bonds on the atomic volume and (2) the overlap of the electron clouds of the bonded atoms.
METHOD OF CALCULATION
Classification of the Atoms. In an earlier work,¹³ we evaluated atomic hydrophobic parameters from water-octanol partition coefficients. That involved representing commonly occurring atomic states of carbon, hydrogen, oxygen, nitrogen, halogens, and sulfur in organic molecules by 110 atom types. Since the factors considered in classifying the atoms also affect the molar refractivity, and since an identical classification allows checking the correlation between the two properties, the atom classification was kept unaltered in this work (Table I). This classification partly differentiates (1) the polarizing effect of the heteroatoms and (2) the effect of overlapping with non-hydrogen atoms. The classification, however, may be weak in differentiating the conjugation effects. The atoms thus classified cover most of the common neutral organic molecules containing the above-mentioned atoms. The classification may not completely cover all organic molecules, but this is not a serious concern, since addition of atom types is always feasible. Since the constitutive factor of the property has been included (at least partly) by giving the atoms different types, the evaluation of the individual atomic values is based on the idea that the sum of the atomic values ($a_i$) is the molecular value:
$$\mathrm{MR} = \sum_i a_i \tag{12}$$
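The additive scheme can be illustrated with a short sketch. The two carbon values are the study I values quoted in the text for atom types 1 and 2; the hydrogen value, the dictionary keys, and the function name are hypothetical placeholders introduced only for this example, so the resulting number is not a prediction from Table I.

```python
# Atomic refractivity by atom type. Carbon entries: study I values quoted in
# the text for types 1 (C in CH3R) and 2 (C in CH2R2). The hydrogen entry is
# a hypothetical placeholder, not a fitted parameter.
ATOMIC_REFRACTIVITY = {
    "C_type1_CH3R": 1.0330,
    "C_type2_CH2R2": 1.4336,
    "H_hypothetical": 1.0,
}


def molar_refractivity(atom_types):
    """Molecular value as the sum of the atomic values, one term per atom."""
    return sum(ATOMIC_REFRACTIVITY[t] for t in atom_types)


# Propane, CH3-CH2-CH3: two type 1 carbons, one type 2 carbon, eight hydrogens.
propane = ["C_type1_CH3R"] * 2 + ["C_type2_CH2R2"] + ["H_hypothetical"] * 8
mr_propane = molar_refractivity(propane)

print(round(mr_propane, 4))
```

Counting how many atoms of each type a molecule contains turns every compound into one linear equation in the unknown atomic values, which is the form needed for the fitting step.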
Preparation of Data. The preparation of data involves two distinct steps: (1) collection of the molar refractivities of various compounds and (2) classification of the atoms according to their environment in the structure. Since a large number of atom types are used in the atom classification, it is necessary to have an even larger number of molecules in the data set to get a statistically significant result. However, classification of the atoms from a long list of atom types is extremely error prone. In order to keep the data accurate, the molecular structure (topology and bond type) was generated by a computer program, CHEMSTRUC,¹³ using simple commands comparable to CAS ONLINE substructure generation. The correctness of the structure is checked by graphics, and the program has some other logical checks that assure the correctness of the structure even when visual aids fail to detect structure errors. Even then we feel that the best way to prepare absolutely error-free input data is to have the structures generated by more than one person and accept them only if they are identical. However, in the present work such error checking was not done due to lack of resources. The structural information is kept in the Cambridge Crystallographic Data File format with minor modifications. Another program, CLASIF, uses this information to classify the atom types according to Table I.
Mathematics of Evaluation. Although the least-squares technique is the most standard procedure for fitting the data to an equation like eq 12, it cannot be used here. The physical concept of molar refractivity is the volume of the molecule or atom, which cannot have a negative value. In the simple least-squares method such a condition cannot be maintained. Constrained least-squares fitting, however, is a special case of quadratic programming,¹⁴,¹⁵ which has been used here. Another advantage of this method is that, with some modification, quadratic programming can be used to confine the solution to any desired region of the solution space. This feature is sometimes helpful in confining the solution to a physically realistic region. For the present study the quadratic programming problem can be defined as follows:
$$\text{minimize } \sum \left[\mathrm{MR}_{\mathrm{calcd}} - \mathrm{MR}_{\mathrm{obsd}}\right]^2 \tag{13}$$
where $\mathrm{MR}_{\mathrm{calcd}}$ is given by eq 12, subject to the constraints
$$a_i \ge l_i, \quad i = 1, 2, \ldots, n \tag{14}$$
where the $a_i$'s are the atomic refractivities and the $l_i$'s are the corresponding desired lower limits of the solution. It is important to note that this formulation of the problem becomes identical with least squares if the lower limits of the variables, as given by eq 14, are kept sufficiently low.
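A minimal sketch of a lower-bounded least-squares fit follows. It uses projected cyclic coordinate descent, which is swapped in here for brevity and is not the quadratic programming algorithm used in this work; the function name, the toy count matrix, and the toy observations are all hypothetical.

```python
def bounded_least_squares(counts, observed, lower, sweeps=500):
    """Minimize sum_j (sum_i counts[j][i]*a[i] - observed[j])**2 with a[i] >= lower[i].

    counts[j][i] is the number of atoms of type i in molecule j.
    Each coordinate is minimized exactly in turn and clipped to its lower limit.
    """
    n = len(lower)
    a = list(lower)  # start from a feasible point
    for _ in range(sweeps):
        for i in range(n):
            num = 0.0
            den = 0.0
            for row, y in zip(counts, observed):
                if row[i] == 0:
                    continue
                # Target for this molecule with type i's own contribution removed.
                others = sum(c * v for k, (c, v) in enumerate(zip(row, a)) if k != i)
                num += row[i] * (y - others)
                den += row[i] * row[i]
            if den > 0.0:
                a[i] = max(lower[i], num / den)
    return a


# Toy data generated from a = [5, -1]; the unconstrained optimum is negative in a[1].
counts = [[1, 1], [1, 2], [1, 3]]
observed = [4.0, 3.0, 2.0]

constrained = bounded_least_squares(counts, observed, lower=[0.0, 0.0])
unconstrained = bounded_least_squares(counts, observed, lower=[-100.0, -100.0])

print(constrained, [round(v, 6) for v in unconstrained])
```

With the lower limits at zero the second parameter is held at its bound, while with very low limits the fit returns the ordinary least-squares solution, which mirrors the remark that the formulation reduces to least squares when the limits are kept sufficiently low.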
RESULTS AND DISCUSSION
The compounds used to evaluate the atomic refractivity are shown in Table II. The molar refractivity values were either obtained from the compilation of Vogel¹⁶ or evaluated from the molecular weight, density, and refractive index values. Some of the parameters were evaluated from a limited number of compounds because few molecules having that atom type were available. Getting a stable solution is a difficult problem when a large number of parameters are used in a fitting study. When the number of compounds was much lower, the solution for the different carbons was very unstable in the sense that adding more molecules resulted in substantially different fitted values. A relatively stable solution was obtained when the number of compounds was nearly 400. One hundred more compounds were added after this stage for even greater stability. The Lemke algorithm for quadratic programming was used for the initial evaluation of the parameters; the resultant values were finally refined by using the pattern search technique.¹⁸
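The refinement step can be pictured with a simple derivative-free search. The text does not say which pattern search variant was used, so the sketch below assumes a basic compass-style search with lower bounds; the function name, step schedule, and toy objective are hypothetical.

```python
def pattern_search(objective, start, lower, step=0.5, tol=1e-6, max_iter=10000):
    """Compass-style pattern search: probe +/- step along each coordinate.

    A trial point is accepted only if it lowers the objective; when no
    coordinate move helps, the step is halved. Trial points are clipped so
    that x[i] >= lower[i] always holds.
    """
    x = list(start)
    best = objective(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] = max(lower[i], x[i] + delta)
                value = objective(trial)
                if value < best:
                    x, best, improved = trial, value, True
                    break
        if not improved:
            step *= 0.5
    return x, best


def sum_of_squares(a):
    """Toy objective with its minimum at a = [2, 3]."""
    return (a[0] - 2.0) ** 2 + (a[1] - 3.0) ** 2


refined, residual = pattern_search(sum_of_squares, start=[1.0, 1.0], lower=[0.0, 0.0])

print(refined, residual)
```

A search of this kind needs only objective values, so it can polish a starting solution from another solver without leaving the feasible region.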
In order to explain the classification of the atoms, 10 selected molecules are presented with their skeletal structures and complete atom classification in Table III and Figure 1.
During this study we found some inconsistencies in the values of molar refractivities calculated from the data of the CRC Handbook. Some of these compounds should be mentioned. For 2-chloroacetophenone (24 369) the refractive index (nD) has been given as 1.685, which led to a molar refractivity of 48.90. This value was far from the calculated one, approximately 40.5, but the original reference of Beilstein (B73, 963) showed the refractive index to be 1.5404, which gave a molar refractivity of 40.39. For 3,4-benzoisoxazole (25 151) the density and refractive index were given as 1.8127 and 1.5845, respectively, which suggested a molar refractivity of 22.008, whereas the fitted value in most
Table I. Classification of Atoms and Their Contributions to Molar Refractivity and Hydrophobicity
type; descriptionᵃ; atomic refracᵇ (I); atomic refracᵇ (III); no. of compd; freq of use; partition coeffᶜ
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52-55
56
57
58
59
60
61'
62-65
66
67
68
69
70
71
72
C in
:CH3R, CH4
:CH2R2
:CHR3
:CR4
:CH3X
:CH2RX
:CH2X2
:CHR2X
:CHRX2
:CHX3
:CR3X
:CR2X2
:CRX3
:=CH2
:=CR2
:CX4
:=CHR
:=CHX
:=CRX
:≡CH
:≡CR, R=C=R
:≡CX
:R--CH--R
:R--CR--R
:R--CX--R
:R--CH--X
:R--CR--X
:R--CX--X
:X--CH--X
:X--CR--X
:X--CX--X
:R--CH...X
:R--CR...X
:R--CX...X
:Al-CH=X
:Ar-CH=X
:Al-C(=X)-Al
:Ar-C(=X)-R
:R-C(=X)-X
R-C≡X, X=C=X
:X-C(=X)-X
:X--CH...X
:X--CR...X
:X--CX...X
unused
H attached toᵈ
:=CX2
:heteroatom
:α-C
unused
:alcohol
:phenol, enol
carboxyl OH
:=O
:Al-O-Al
:Al-O-Ar, Ar2O
:--O
unused
:Al-NH2
O in
R...O...R, R-O-C=X
N in
:Al2NH
:Al3N
:Ar-NH2, X-NH2
:Ar-NH-Al
:Ar-NAl2
:RCO-N<, >N-X=X
1.0330
1.4336
2.0068
1.8489
2.4666
2.6338
3.1274
2.7332
2.7885
3.0075
2.5823
2.7286
2.1784
3.1677
2.8557
4.1009
3.7162
3.6247
4.4024
1.9708
3.1472
4.2943
3.2593
3.0745
4.3404
3.7428
1.3896
1.3044
1.6607
1.0000
1.0001
3.1193
3.7755
2.7215
3.3750
4.0211
3.3265
3.8103
2.3957
1.8242
2.0234
1.0004
1.0006
1.5903
1.1616
1.0309
1.0001
1.0001
1.4647
1.1231
1.2847
1.8485
1.0000
1.8787
1.0005
1.8396
1.5242
2.1094
3.1442
2.6774
3.8031
2.8495
2.3000
2.3071
2.4926
2.3000
3.4006
3.2624
3.6770
3.0137
3.225
3.2401
2.6140
3.1488
2.3010
3.3559
3.5071
4.4814
3.7781
3.6211
4.4310
3.2000
3.4161
4.3043
3.4905
3.4127
4.3725
3.8182
2.5001
2.5000
2.7967
2.5000
2.5000
3.4372
3.4494
3.1048
3.8251
4.5401
3.7529
4.1288
2.7938
2.4165
3.0606
2.5001
2.5001
1.1461
0.8000
0.8006
0.8001
0.8000
1.0026
1.4430
1.4090
1.6506
1.2000
1.8434
1.6001
2.5001
2.5001
2.5377
3.6195
2.9832
3.9733
3.0059
25
1
165
32
11
67
173
15
40
15
5
15
16
13
4
39
45
7
11
14
8
12
18
7
184
105
98
14
15
4
1
0
1
19
14
4
15
15
15
8
112
13
6
5
3
270
389
69
55
104
113
21
30
169
30
115
21
10
9
8
9
8
10
16
399
372
34
11
97
252
16
45
19
5
16
25
16
4
45
57
7
11
16
12
13
23
7
834
132
144
16
16
4
1
0
3
22
15
5
15
15
16
8
138
13
6
5
2
1695
1773
95
61
145
330
23
35
202
44
145
45
11
9
8
12
8
11
17
-0.6232
-0.3957
-0.2821
0.2112
-1.1423
-0.9557
-0.9679
0.2041
0.5335
0.6684
-1.1165
1.0525
0.5390
1.1390
-0.2386
-0.0363
-0.3295
-0.4739
-0.2407
0.1922
0.1517
0.3324
-0.0447
0.3301
-0.1244
0.0571
0.2218
-0.1456
-0.1179
0.6739
0.0740
0.0117
0.1381
-0.2710
0.1671
-0.0909
-0.6828
-0.4963
-0.4672
-0.2992
-0.6503
-0.2786
-0.2992
0.4283
0.3607
-0.0069
-0.2539
-0.4001
0.2271
0.0101
0.5624
0.1230
0.1017
0.3914
1.8239
0.3562
0.3712
0.4886
0.5128
1.1484
0.7387
0.2160
Table I (Continued)
type; descriptionᵃ; atomic refracᵇ (I); atomic refracᵇ (III); no. of compd; freq of use; partition coeffᶜ
73
:Ar2NH, Ar3N,
74
75
76
77
78
79-80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101-105
106
107
108
109
Ar2N-Al, R...N...Rᶠ
:R≡N, R=N-
:R--N--R,ᵍ R--N--X
RO-NO2
:Al-NO2
:Ar-NO2, R--N(--R)--Oʰ
:Ar-N=X, X-N=X
unused
F attached to
:C_sp3^1
:C_sp3^2
:C_sp3^3
:C_sp2^1
:C_sp2^2-4, C_sp^1, C_sp3^4, X
Cl attached to
:C_sp3^1
:C_sp3^2
:C_sp3^3
:C_sp2^1
:C_sp2^2-4, C_sp^1, C_sp3^4, X
Br attached to
:C_sp3^1
:C_sp3^2
:C_sp3^3
:C_sp2^1
:C_sp2^2-4, C_sp^1, C_sp3^4, X
I attached to
:C_sp3^1
:C_sp3^2
:C_sp3^3
:C_sp2^1
:C_sp2^2-4, C_sp^1, C_sp3^4, X
unused halogens
S in
:R-SH
:R=S
:R-SO-R
:R2S, RS-SR
2.4082
3.3952
6.2666
5.9990
3.9660
3.4136
1.0001
1.0001
1.4160
1.0001
2.2548
5.2233
5.7784
5.7328
4.6108
6.4057
8.2314
8.6483
8.9016
8.0271
9.2260
13.5880
13.6990
13.4388
12.8225
13.6716
7.4314
7.5003
9.4004
4.6036
4.4935
2.6295
3.1464
4.5123
4.7125
3.0389
3.6838
0.8060
0.8000
1.3484
0.8000
1.6440
5.3647
5.6484
5.6858
5.0000
5.9312
8.3379
8.5393
8.8635
8.0866
9.0569
13.7535
13.6306
13.4586
12.8876
13.5530
7.7751
7.3151
9.2916
5.3957
5.4662
7
27
24
15
6
11
8
7
7
21
6
22
16
11
26
28
21
10
3
14
9
7
4
3
5
1
9
18
6
7
8
7
29
27
17
6
14
8
32
28
34
12
27
28
32
29
37
25
21
9
14
9
8
7
3
5
1
10
20
7
7
8
0.4777
0.1989
0.1605
-3.1845
-3.3406
-0.1367
0.4929
-0.1394
0.1457
0.6128
0.4989
1.1021
0.3333
0.4402
1.0372
0.7220
1.1263
0.4640
1.3343
1.0137
1.4608
1.8362
1.0859
1.1181
1.0769
0.3726
-0.5594
-0.6864
ᵃ R represents any group linked through carbon; X represents any heteroatom (O, N, S, and halogens); Al and Ar represent the aliphatic and aromatic groups, respectively; -- represents aromatic bonds as in benzene or delocalized bonds as the N-O bond in the nitro group; ... represents an aromatic single bond as the C-N bond in pyrrole. ᵇ Atomic refractivity of only one atom. ᶜ A different data set was used to evaluate these values. The data set here is similar to the one reported earlier (ref 11) with two additional compounds, 2-methylbenzoimidazole and phenylacetaldehyde. ᵈ The subscript represents hybridization and the superscript its formal oxidation number. ᶠ Pyrrole type structure. ᵍ Pyridine type structure. Pyridine-N-oxide type. As in nitro, =N-oxides.
studies was approximately 34. Beilstein (B27*, 17) showed the density and refractive index to be 1.1866 and 1.5789, respectively. These values suggested a molar refractivity of 33.36. For benzylidene dibromide (27 296) the density and refractive index are given as 1.51 and 1.6147, respectively, suggesting a molar refractivity of 57.743. The fitted value was much lower, near 47.5. In the original reference of Beilstein (B54, 836) these values are given as 1.8365 and 1.6106, respectively, leading to a value of 47.222. For thiazole (37 802) the density is given as 1.998, giving a molar refractivity of 14.515, while the fitted value was around 21. Another source¹⁹ gave the density as 1.1998, giving a molar refractivity of 24.192. Since in most other cases the agreement between the experimental and the fitted values was good enough, we did not try to check those values from Beilstein. Also, the Handbook reference of Beilstein did not always give the density and refractive index values, but in turn cited some other reference. There are two compounds for which we did not find any discrepancy in the reported density and refractive index values but whose values may still be incorrect: p-chloro-N-methylaniline (24 937) and trichloro(3-chlorophenyl)methane (27 078). When these values were corrected, the calculated values showed a standard deviation of 1.269, a correlation coefficient of 0.994, and an explained variance of 0.984. These parameters were used to predict the molar refractivity of 78 molecules listed in Table IV. The calculated values showed a standard deviation of 1.614 and a correlation coefficient of 0.994.
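The consistency checks described above can be reproduced with a few lines. The sketch assumes that the molar refractivity was obtained from the standard Lorentz-Lorenz relation, MR = [(n^2 - 1)/(n^2 + 2)] M/d, which does reproduce the quoted figures for benzylidene dibromide; the molecular weight of 249.93 for C7H6Br2 is computed from standard atomic weights and is not stated in the text, and the function name is hypothetical.

```python
def lorentz_lorenz_mr(refractive_index, density, mol_weight):
    """Molar refractivity from the Lorentz-Lorenz relation.

    MR = (n**2 - 1) / (n**2 + 2) * M / d, with M in g/mol and d in g/cm^3.
    """
    n2 = refractive_index * refractive_index
    return (n2 - 1.0) / (n2 + 2.0) * mol_weight / density


# Benzylidene dibromide, C7H6Br2. Molecular weight is an assumption computed
# from standard atomic weights (7 C + 6 H + 2 Br).
MW_BENZYLIDENE_DIBROMIDE = 249.93

# Handbook values quoted in the text: d = 1.51, n = 1.6147.
mr_handbook = lorentz_lorenz_mr(1.6147, 1.51, MW_BENZYLIDENE_DIBROMIDE)

# Beilstein values quoted in the text: d = 1.8365, n = 1.6106.
mr_beilstein = lorentz_lorenz_mr(1.6106, 1.8365, MW_BENZYLIDENE_DIBROMIDE)

print(round(mr_handbook, 2), round(mr_beilstein, 2))
```

The two results land close to the 57.743 and 47.222 reported above, and only the second is near the fitted value of about 47.5, which is how a mistyped density shows up as an outlier in the fit.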
If we look at the atomic values of the various carbons (study I), we see that the saturated carbons have values around 2.5, lower than the roughly 3.5 for the ethylenic or acetylenic carbons. The effect of carbon substitution on these carbons usually goes through a maximum, as is indicated by the values of the subsets: carbon replacing hydrogen in a saturated carbon when no heteroatom is present, 1-1.0330, 2-1.4336, 3-2.0068, 4-1.8489 (here the first number indicates the atom type and the second one its refractivity; see Table I for the definition of the atom types); when one heteroatom is present, 5-2.4666, 6-2.6338, 8-2.7332, 11-2.5823; when two heteroatoms are present, 7-3.1274, 9-2.7885, 12-2.7286 (is one side of the peak missing here? in the earlier subsets the value started declining at the fourth place); when three heteroatoms are present, 10-3.0075, 14-3.1677; in ethylenic carbon, 15-
Table
11.
Compounds Used to Evaluate the Atomic Refractivity
calcd
from
study
no.
ID"
comDd obsd
I
I1
111
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
1001
1
002
1
004
1008
1 009
1011
1018
1022
1026
3 001
3 002
3 003
3 004
3 005
3 009
3010
3011
3 016
3 018
3 021
3 032
5 001
5 002
5 007
5 008
10117
10119
10 120
11 138
11 139
12 141
12 142
12 143
12 150
12 156
12 159
12 161
12 162
12 166
12 167
12 168
12 177
13 180
13 181
13 190
13 191
13 199
13236
13242
13 250
13263
13 325
14279
14280
14281
14282
14283
14285
14287
14288
14289
14290
14 292
14295
14296
14298
14 299
14300
14 302
14 306
14307
14 308
14 309
14310
14311
methyl malonate
methyl succinate
methyl adipate
ethyl malonate
ethyl succinate
ethyl adipate
methyl dimethylmalonate
methyl dipropylmalonate
1,l
-bis(methoxycarbonyl)cyclohexane
cyclopentanone
3-methylcyclopentanone
cyclohexanone
2-methylcyclohexanone
3-methylcyclohexanone
methylenecyclopentane
methylenecyclohexane
3-methylmethylenecyclohexane
cyclopentene
3-methylcyclopentanol
cyclohexanol
cycloheptanol
acetone
2-butanone
2- hexanone
4-methyl-2-pentanone
toluene
n-propylbenzene
isopropylbenzene
acetophenone
propiophenone
diethyl ether
dipropyl ether
diisopropyl ether
methyl n-butyl ether
2,2'-dichlorodiethyl ether
phenyl methyl ether (anisole)
n-propyl phenyl ether
isopropyl phenyl ether
allyl phenyl ether
dimethoxymethane
diethoxymethane
1,l -dipropoxyethane
ethyl formate
n-propyl formate
n-propyl acetate
isopropyl acetate
methyl propionate
diethyl oxalate
dimethyl succinate
dimethyl adipate
dimethyl methylmalonate
chlorobenzene
1,2-dichloroethane
1,2-dichIoropropane
benzyl chloride
1,3-dichloropropane
methyl chloroacetate
n-propyl chloroacetate
1,2-dibromoethane
1,2-dibromopropane
1,3-dibromopropane
n-propyl bromoacetate
ethyl a-bromopropionate
1
-bromo-2-phenylethane
ethyl 2-bromoethyl ether
1,3-diiodopropane
1 -iodo-2-phenylethane
propyl iodoacetate
1 -fluoropentane
fluorobenzene
4-fluorotoluene
a-fluoronaphthalene
4-chlorotoluene
m-dichlorobenzene
benzenesulfonyl fluoride
28.62
33.01
42.18
37.89
42.35
51.51
37.73
56.07
49.16
23.31
27.97
27.87
32.51
32.65
27.29
32.15
37.19
22.40
29.37
29.16
34.00
16.11
20.67
30.04
30.15
31.10
40.42
40.39
36.27
40.83
22.51
3 1.68
31.71
27.02
3 1.94
32.88
42.28
42.39
41.73
19.20
28.53
42.37
17.71
22.41
26.95
26.96
22.14
33.56
32.99
42.20
33.18
31.14
21.00
25.69
36.03
25.50
22.34
31.72
26.96
3 1.77
31.13
34.57
34.35
43.8
1
29.41
41.51
48.78
39.72
24.99
25.98
30.74
43.73
35.99
36.16
34.87
28.51
32.87
42.10
38.13
42.49
51.72
37.61
56.06
49.07
23.13
27.92
27.74
32.66
32.53
27.35
31.97
36.75
24.37
29.26
29.09
33.70
16.03
20.77
30.00
30.17
31.32
40.55
40.73
36.61
41.35
22.52
3 1.75
32.01
26.94
3 1.27
32.75
42.18
42.30
42.20
19.09
28.71
42.37
17.88
22.50
26.93
27.05
22.24
33.77
32.87
42.10
33.42
29.53
20.36
25.10
35.70
24.98
22.86
32.28
26.38
31.12
30.99
35.29
35.12
43.32
29.91
41.70
48.68
40.65
25.60
25.92
3
1.83
43.08
35.44
33.65
35.37
2818 1
33.46
42.75
38.10
42.75
52.04
38.10
56.69
50.21
23.47
28.12
28.12
32.76
32.76
28.07
32.72
37.37
23.43
29.52
29.52
34.16
16.01
20.66
29.95
29.95
31.19
40.49
40.49
36.08
40.72
22.05
3 1.35
31.35
26.70
3 1.44
32.83
42.12
42.12
42.31
19.04
28.33
42.28
17.64
22.29
26.94
26.94
22.29
33.46
33.46
42.75
33.46
3 1.24
20.51
25.16
35.89
25.16
22.34
3 1.63
26.65
31.30
3 1.30
34.70
34.70
43.60
29.82
41.69
48.80
39.90
25.21
26.69
31.34
42.56
35.89
35.94
35.62
32.80
42.00
38.09
42.40
5 1.60
37.95
56.35
49.47
23.23
28.00
27.83
32.75
32.60
27.28
31.88
36.66
24.36
29.23
29.05
33.65
16.02
20.76
29.96
30.14
31.17
40.37
40.55
36.52
41.27
22.40
31.60
31.78
26.80
3 1.38
32.53
41.93
42.02
41.98
19.28
28.88
42.56
17.69
22.29
26.80
26.89
22.14
33.78
32.80
42.00
33.41
29.88
20.45
25.14
35.66
25.05
22.72
32.12
26.40
31.09
31.00
35.09
34.98
43.24
29.86
41.83
48.65
40.51
25.20
25.68
31.58
42.85
35.78
34.49
35.29
Table II (Continued)
no.  ID(a)  compd  obsd  calcd from study: I  II  III
76 40.16 39.58
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
15313
15 315
15317
15 318
15325
15 327
15 328
16331
16 332
16 334
16 349
16352
16 355
16356
16 359
16 366
17390
17 391
17 398
17 399
17408
17409
17415
17417
17418
18419
18420
18421
18 426
18 428
18 432
18433
19438
19440
19441
19 442
19444
19445
19 446
19450
20 453
20454
20 467
20 468
20 474
20 475
21 483
21 484
21 494
21 495
21 496
22 503
22 505
22513
22514
22515
22517
22518
22 525
22 526
22 528
22 529
22 535
22 541
23 546
23 549
23 553
23 554
23 555
23 551
23 558
23 559
23 560
23 562
benzenesulfonyl chloride
methyl benzoate
n-propyl benzoate
methyl phenylacetate
ethyl phenylacetate
bromobenzene
iodobenzene
α-methylnaphthalene
vinylacetic acid
methyl vinylacetate
n-propyl vinylacetate
ethyl allylmalonate
allyl acetate
allyl succinate
allyl chloride
methyl maleate
ethyl fumarate
methyl but-3-yne-1-carboxylate
ethyl but-3-yne-1-carboxylate
dimethyl acetylenedicarboxylate
diethyl acetylenedicarboxylate
methyl cyanide
ethyl cyanide
allyl cyanide
phenyl cyanide
benzyl cyanide
methyl cyclopropyl ketone
cyclopropanecarboxylic acid
methyl cyclopropanecarboxylate
diethyl cyclopropane-1,1-dicarboxylate
dimethyl cyclobutane-1,1-dicarboxylate
cyclobutanecarboxylic acid
methyl cyclobutanecarboxylate
methyl cyclopentyl ether
cyclopentyl formate
cyclopentyl acetate
cyclopentyl chloride
cyclopentyl iodide
dicyclohexyl
methyl cyclohexyl ether
cyclohexyl chloride
methyl alcohol
ethyl alcohol
allyl alcohol
2-methoxyethanol
acetic acid
propanoic acid
ethanethiol
propanethiol
thiophenol
methyl phenyl thioether
ethyl phenyl thioether
propylamine
isobutylamine
ethylenediamine
aniline
benzylamine
diethylamine
di-n-propylamine
dicyclohexylamine
ethyl N-methylcarbamate
N-nitroso-N-methylaniline
N-methylaniline
tripropylamine
N,N-dimethylaniline
ethyl dichloroacetate
methyl trichloroacetate
dichloromethane
dibromomethane
diiodomethane
1,1,2,2-tetrachloroethane
chloroform
methylchloroform
carbon tetrachloride
1,1,2,2-tetrabromoethane
41.03
37.81
47.22
41.84
46.55
33.99
39.15
48.65
21.73
26.30
35.65
51.27
26.39
50.85
20.42
33.18
43.20
29.32
34.05
32.72
42.22
11.09
15.75
19.67
31.58
35.22
23.91
20.77
25.34
45.60
40.70
25.14
29.71
29.42
29.53
34.07
27.96
36.38
53.22
34.02
32.99
8.22
12.90
16.98
19.18
12.99
17.51
19.02
23.71
34.52
39.42
44.19
19.45
23.98
18.23
30.56
34.45
24.30
33.51
56.91
25.73
39.97
35.67
47.68
40.81
32.16
32.47
16.38
21.90
32.54
30.60
21.37
26.20
26.45
41.97
39.52
37.60
47.02
41.96
46.77
32.95
37.75
48.48
21.33
26.88
36.30
52.30
26.95
51.77
20.62
34.61
44.29
29.52
34.33
32.14
42.36
11.22
15.96
20.60
31.31
35.67
23.30
19.23
24.77
44.85
39.84
23.84
29.39
29.30
29.47
33.90
27.58
35.94
53.34
33.92
32.19
8.07
12.88
17.52
18.99
11.96
16.70
19.19
23.81
33.35
38.38
43.18
19.22
24.00
17.59
30.07
34.32
24.05
33.27
56.46
26.11
38.99
34.55
48.24
40.63
32.69
31.45
16.75
22.49
32.59
30.75
21.21
25.18
26.10
42.23
37.71
47.01
42.36
47.01
34.31
39.51
47.06
22.48
27.13
36.42
52.24
27.13
52.43
20.66
33.65
42.94
30.09
34.73
31.96
41.25
11.85
16.50
21.34
31.92
36.57
23.47
20.46
25.11
45.57
40.92
25.11
29.75
29.52
29.75
34.40
27.93
36.19
53.93
34.16
32.58
8.11
12.76
17.60
19.04
13.00
17.64
18.44
23.09
33.87
38.51
43.16
19.70
24.34
18.98
30.47
35.12
24.34
33.64
57.86
26.22
39.81
35.12
47.58
39.77
31.68
31.73
15.87
22.00
32.40
29.90
20.56
25.21
25.26
42.18
37.52
46.92
41.84
46.64
32.97
37.77
48.35
21.35
26.79
36.19
52.26
26.85
51.70
20.62
34.74
44.34
29.52
34.32
32.79
42.39
11.25
15.99
20.64
31.38
35.69
23.41
19.35
24.78
45.28
40.27
23.95
29.38
29.21
29.30
33.81
27.58
35.96
53.27
33.81
32.17
8.04
12.84
17.49
18.97
11.96
16.70
19.18
23.78
33.46
38.00
42.80
19.30
24.08
17.93
30.10
34.40
24.50
33.70
56.92
26.12
39.02
34.47
48.14
40.46
32.41
31.45
16.57
22.36
32.54
30.64
21.10
25.10
26.10
42.21
Table II (Continued)
no.  ID(a)  compd  obsd  calcd from study: I  II  III
151 23 563 29.86 30.71 29.77 30.63
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
23 565
23 566
23 568
23 569
23 573
23 574
23 577
23 578
23 584
23 585
23 588
23 591
23 601
24 009
24010
24011
24021
24 023
24 026
24 036
24 040
24041
24 042
24 046
24 047
24 051
24 057
24 058
24 060
24 067
24 073
24 176
24 177
24 178
24 183
24 187
24 190
24 193
24 306
24 307
24 309
24314
24 351
24 369
24 370
24 451
24 480
24 481
24 484
24 486
24519
24 770
24 777
24 791
24 797
24 801
24 834
24 844
24 872
24 876
24 879
24 883
24 887
24 937
24 941
24971
24 972
24 999
25 151
25 154
25 326
25 327
25 338
bromoform
ethyl orthoformate
propyl orthoformate
thionyl chloride
sulfuryl chloride
dimethyl-N-nitrosoamine
diethyl-N-nitrosoamine
nitromethane
nitroethane
nitrobenzene
n-butyl nitrite
ethyl nitrate
dimethyl carbonate
propyl xanthate
5-bromoacenaphthene
5-chloroacenaphthene
5-iodoacenaphthene
acetaldehyde
aminoacetaldehyde diethyl acetal
bromoacetaldehyde dimethyl acetal
acetaldehyde diethyl mercaptal
diphenylacetaldehyde
ethoxyacetaldehyde
hydroxyacetaldehyde
acetaldoxime
phenylacetaldehyde
tribromobenzaldehyde
trimethylacetaldehyde
acetamide
diacetylethylamine
N-acetyl-N-butylaniline
diethylacetamide
allyl acetate
acetic anhydride
trifluoroacetic anhydride
bromomethyl acetate
sec-butyl acetate
tert-butyl acetate
2-chloro-2-propyl acetate
acetone
acetone azine
bromoacetone
1,3-dichloroacetone
acetophenone
2-chloroacetophenone
3-chloroacetophenone
1-phenyl-1-propyne
acraldehyde
2-chloroacraldehyde
2-methylacraldehyde
acrylic acid
acrylyl chloride
2-bromoaniline
3-bromoaniline
N-butylaniline
2-tert-butylaniline
4-tert-butylaniline
N,N-dibutylaniline
N,N-diethylaniline
N,N-dimethylaniline
N,N-dimethyl-2-bromoaniline
2-chloro-N,N-dimethylaniline
2-nitro-N,N-dimethylaniline
2,3-dimethylaniline
4-chloro-N-methylaniline
N-methyl-N-nitrosoaniline
N-propylaniline
N-isobutylaniline
3-methoxybenzaldehyde
3,4-benzisooxazole
tert-butyl nitrite
antimalarine
antipyrine
2,5-dimethoxysaffrole
39.30
53.28
22.12
21.43
19.27
28.43
12.36
17.02
32.38
26.87
19.28
18.97
52.72
59.54
56.07
64.03
11.52
36.57
29.89
45.74
60.04
22.46
12.43
15.66
35.88
35.74
25.13
15.21
34.46
58.11
33.08
26.45
22.37
23.83
25.24
31.28
31.45
32.19
16.18
36.17
23.38
25.70
36.51
40.39
40.57
40.05
16.22
20.79
20.94
17.44
21.18
37.86
38.56
49.26
49.01
49.01
68.92
50.15
40.89
47.97
45.32
48.86
39.94
29.34
40.14
45.12
49.26
38.87
33.36
26.77
91.20
57.44
68.27
39.29
53.13
19.26
21.00
20.01
29.63
12.83
17.33
32.92
27.13
20.64
19.33
51.35
59.54
56.12
64.34
11.65
36.14
30.91
46.15
60.74
23.55
13.91
15.48
36.11
35.11
25.48
14.52
32.95
57.43
34.04
26.95
21.22
23.22
25.39
31.67
31.54
31.67
16.03
35.15
24.40
26.75
36.61
40.72
40.72
39.91
16.67
20.42
20.92
16.97
21.09
37.60
37.60
48.59
49.43
49.43
68.71
50.25
40.63
48.16
44.75
48.14
41.88
38.67
38.99
43.98
48.76
39.73
34.88
27.14
88.40
58.19
63.21
39.26
53.20
19.09
20.14
19.74
29.03
13.35
18.00
33.42
26.74
19.63
19.28
50.92
57.64
54.57
62.83
11.36
36.91
31.45
44.35
60.79
22.29
13.00
15.32
36.08
34.65
25.30
15.29
34.11
58.59
33.87
27.13
22.53
23.41
25.41
31.58
31.58
31.63
16.01
34.83
23.77
25.40
36.08
40.77
40.77
38.99
16.20
20.90
20.85
17.84
20.90
38.23
38.23
49.06
49.06
49.06
67.65
49.06
39.77
47.53
44.46
46.64
39.77
39.81
39.81
44.41
49.06
37.71
33.54
26.74
87.42
56.23
59.33
39.44
53.24
18.91
20.63
19.94
29.54
12.65
17.24
32.85
26.98
20.42
19.36
51.27
59.66
56.57
64.46
11.58
36.59
31.23
45.60
60.64
23.34
13.79
15.29
36.02
35.17
25.79
14.36
33.11
57.44
33.96
26.85
21.35
23.42
25.41
31.49
31.42
31.87
16.02
35.03
24.32
26.67
36.52
41.13
41.13
39.78
16.66
20.81
20.90
17.04
20.76
37.79
37.79
48.46
49.78
49.78
68.45
50.06
40.46
48.15
45.06
48.04
41.90
39.07
39.02
43.87
48.64
39.68
34.32
27.01
88.66
58.52
63.10
Table II (Continued)
no.  ID(a)  compd  obsd  calcd from study: I  II  III
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
25 453
25 465
25 510
25 584
25 590
25 654
25 679
25 684
25 687
25 734
25 742
25 758
25 767
25 776
25 793
25 991
25 998
26014
26 042
26 049
26 062
26 093
26 103
26 108
26 149
26 153
26 155
26 156
26 157
26 159
26 163
26 274
26 356
26416
26 429
26 450
26 498
26512
26 531
26 692
26710
26718
26 848
26 866
26 869
26 876
27 070
27 072
27 078
27 088
27 103
27 113
27 119
27 128
27 136
27 137
27 160
27 174
27 175
27 176
27 196
27 200
27 207
27211
27212
27 221
27 222
27 254
27 259
27 261
27 263
27 264
27 296
27311
27312
nonanedioic acid
azobenzene
3,3’-dimethyldiazobenzene
azomethane
azoxybenzene
benzaldehyde
2-chlorobenzaldehyde
3-chlorobenzaldehyde
4-chlorobenzaldehyde
3-ethoxybenzaldehyde
salicylaldehyde
4-hydroxybenzaldehyde
N-ethylbenzaldehyde imine
3-methoxybenzaldehyde
benzoxime
tert-butylbenzene
4-methyl-tert-butylbenzene
2,3-dinitrochlorobenzene
2-chloro-2-phenylpropane
pentafluorochlorobenzene
m-phenylenediamine
2,4-dichloronitrobenzene
catechol
2,4-difluoronitrobenzene
4-nitroethylbenzene
fluorobenzene
4-iodofluorobenzene
o-nitrofluorobenzene
m-nitrofluorobenzene
2,4,6-trimethylfluorobenzene
hexafluorobenzene
pyrogallol
benzenesulfinyl chloride
ethyl benzenesulfonate
propyl benzenesulfonate
benzenesulfonyl fluoride
1-methylbenzimidazole
phenyldichlorofluoromethane
benzoic acid
3-ethylbenzoic acid
salicylic acid
propyl 4-hydroxybenzoate
benzonitrile
4-fluorobenzonitrile
2-hydroxybenzonitrile
3-methylbenzonitrile
trichlorophenylmethane
trichloro-(3-chlorophenyl)methane
benzothiazole
2-chlorobenzothiazole
2-methylbenzothiazole
5-methylbenzothiophene
2-chlorobenzoxazole
2-methylbenzoxazole
benzoyl bromide
benzoyl chloride
benzyl alcohol
3,4-dimethoxybenzyl alcohol
2-phenyl-1-propanol
1-phenylpropanol
benzylamine
4-(methylbenzylamino)-1-butyne
benzyldimethylamine
benzylethylamine
benzylethylaniline
benzylaniline
N-benzyl-2-methylaniline
benzyl chloromethyl ether
benzyl fluoride
benzyl iodide
benzyl isothiocyanate
phenylmethanethiol
benzylidene dibromide
benzylideneethylamine
benzylidenemethylamine
39.08
53.66
72.51
19.75
60.72
32.28
36.74
36.90
37.74
43.81
34.52
35.52
44.45
37.78
36.83
44.99
49.92
45.36
40.01
33.06
36.15
41.43
32.95
32.92
42.74
26.15
34.96
33.78
32.69
40.35
26.49
28.11
25.46
45.63
50.19
35.05
40.25
41.29
33.64
44.84
31.18
50.28
31.48
31.77
33.67
34.81
45.92
41.02
38.99
44.22
43.94
46.56
37.33
35.01
39.60
37.15
32.55
45.79
44.03
41.73
34.27
56.22
43.54
43.37
69.25
61.84
65.29
41.89
31.09
44.94
45.69
38.80
47.22
44.36
39.40
44.86
56.61
68.49
20.55
60.26
32.39
36.51
36.51
36.51
44.54
34.18
34.18
44.70
39.73
36.22
44.78
50.69
44.55
44.93
32.07
34.72
41.16
29.00
33.94
43.45
25.92
38.25
33.43
33.43
43.65
28.46
30.79
37.78
45.75
50.37
35.37
39.67
40.58
32.05
42.57
33.84
48.81
31.31
31.82
33.10
37.22
44.90
49.02
38.81
43.20
42.22
46.86
38.81
37.83
38.99
36.17
32.60
47.27
41.83
41.96
34.32
56.58
44.49
43.76
69.97
59.08
64.99
42.45
31.48
44.07
45.67
38.91
46.64
44.70
39.89
47.40
58.88
68.17
18.74
60.72
31.43
36.13
36.13
36.13
42.36
33.07
33.07
43.05
37.71
35.39
45.13
49.78
44.98
45.18
31.98
34.40
42.81
29.82
33.71
42.71
26.69
39.65
33.57
33.57
40.63
27.43
31.45
39.11
46.40
51.05
35.62
40.48
40.73
33.07
42.36
34.70
48.64
31.92
32.07
33.56
36.57
45.28
49.97
39.23
43.92
43.87
45.53
42.79
42.74
39.19
36.13
32.83
45.39
42.12
42.12
35.12
56.85
44.41
44.41
69.13
59.83
64.48
42.17
31.34
44.15
45.74
38.51
46.72
43.05
38.40
44.93
57.13
68.93
20.18
59.82
32.43
37.03
37.03
37.03
44.48
34.24
34.24
44.52
39.68
36.13
44.95
50.85
45.04
44.89
31.91
34.93
42.07
28.91
33.67
43.35
25.68
38.18
33.26
33.26
43.38
27.71
30.72
37.86
49.09
50.69
35.29
39.60
40.46
32.09
42.59
33.90
48.74
31.38
31.78
33.19
37.27
44.79
49.40
38.49
43.06
42.44
46.34
40.39
39.77
38.94
35.81
32.54
47.04
41.77
41.83
34.40
56.62
44.44
44.20
69.75
58.96
64.86
42.43
31.10
44.05
45.53
38.87
46.54
44.52
39.72
Table II (Continued)
no.  ID(a)  compd  obsd  calcd from study: I  II  III
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
27313
27 383
27 405
27 432
27 483
27 602
27610
27611
27631
27 638
27 639
27 640
27 646
27 647
27 655
27 657
27 658
27 668
27 673
27 679
27 696
27 701
27 738
27 747
27 756
27761
27 770
27 772
27 781
27 782
27 787
27 788
27 798
27 801
27 816
27 829
27 832
27851
27 854
27861
27 876
27 943
27 944
27 945
27 946
27 947
27 949
27 950
27951
27 953
28017
28 029
28 057
28 065
28 066
28 067
28081
28 085
28 088
28 098
28 101
28 102
28 107
28 109
28 123
28 137
28 140
28 144
28 151
28 152
28 153
28 173
28 174
28 178
benzylidene difluoride
butane-2,3-diol
2-chloro-6-phenylphenol
3,3’-difluorobiphenyl
2-iodobiphenyl
bromoacetic acid
bromoacetyl bromide
chlorobromoacetic acid
1,2-butadiene
1,3-butadiene
2-bromo-1,3-butadiene
1-chloro-1,3-butadiene
1,1-dichloro-1,3-butadiene
1,2-dichloro-1,3-butadiene
2-fluoro-1,3-butadiene
hexafluoro-1,3-butadiene
2-iodo-1,3-butadiene
butane-1,3-diyne
n-butylamine
2-methyl-2-aminobutane
2-methyl-2-bromobutane
1-chloro-4-fluorobutane
1,2,3,4-diepoxybutane
2,3-epoxy-2,3-dimethylbutane
1,4-butanedithiol
butyl fluoride
isopentyl iodide
sec-butyl iodide
1-nitrobutane
2-nitrobutane
1,1,2,2-tetrabromobutane
1,2,2,3-tetrabromobutane
1,2,2-trimethylbutane
2,2,3-tribromobutane
2-methyl-2-nitropropane
butane-1,3-diol
butane-1,3-diol sulfite
butane-1-thiol
butane-2-thiol
1-hydroxy-2-aminobutane
2,2,3,3,4,4,4-heptafluorobutane
cis-1-bromo-1-butene
trans-1-bromo-1-butene
2-bromo-1-butene
2-bromo-3-methyl-1-butene
2-bromo-4-phenyl-1-butene
cis-1-chloro-1-butene
trans-1-chloro-1-butene
1-chloro-2-methyl-1-butene
2-chloro-1-butene
crotonic acid
ethyl 4-bromocrotonate
methyl vinyl ketone
but-1-en-3-yne
1-chlorobut-1-yn-3-ene
1-methoxybut-1-en-3-yne
1-(N,N-dimethylamino)butane
2-aminobutane
ethyl-sec-butylamine
tert-butyl bromide
sec-butyl chloride
tert-butyl chloride
1-chloro-2-methyl-1-propene
1,1-dichloro-2-methylpropane
ethyl tert-butyl ether
isobutylisothiocyanide
2-methylpropanethiol
1,1-dimethylethanethiol
butyl nitrate
sec-butyl nitrate
isobutyl nitrite
butyl sulfite
isobutyl sulfite
butyl sulfoxide
30.76
23.61
58.25
52.18
64.63
20.43
27.54
25.76
20.27
22.46
27.94
25.77
30.69
29.96
20.75
24.06
33.76
17.16
24.08
28.61
33.37
25.33
20.18
29.67
35.50
20.46
38.13
33.83
27.45
25.61
51.08
51.42
44.10
43.90
26.40
23.71
30.31
28.74
28.29
25.38
22.95
27.54
27.61
27.61
32.51
51.71
25.00
25.01
28.50
24.98
22.46
39.98
20.03
18.42
23.89
25.83
33.82
21.40
33.48
28.86
26.48
25.81
25.06
29.79
31.43
35.19
28.43
28.71
28.32
28.23
26.91
50.51
50.56
54.14
31.34
23.64
56.95
52.06
63.37
20.32
27.26
25.21
20.54
20.88
28.05
24.97
30.49
28.72
21.02
23.77
32.85
16.94
23.83
28.45
32.84
25.37
19.70
29.38
36.01
20.99
38.36
33.71
26.56
26.39
51.46
51.14
43.63
43.76
25.96
23.52
29.48
28.42
28.55
25.23
22.96
28.36
28.36
28.03
32.81
52.36
24.95
24.95
29.20
24.61
22.86
40.60
21.04
18.91
24.40
26.22
34.00
23.96
33.40
28.23
25.34
25.22
24.95
30.11
31.76
35.35
28.59
28.43
29.87
30.00
27.30
50.19
50.53
47.64
31.49
23.69
57.59
51.56
64.22
20.76
26.89
25.45
19.86
20.80
28.57
25.50
30.19
30.19
20.95
21.69
33.76
17.42
24.34
28.99
32.83
25.26
20.03
29.52
35.06
20.57
38.02
33.38
27.29
27.29
51.47
51.47
43.71
43.71
27.29
23.69
29.73
27.74
27.14
25.98
23.09
28.37
28.37
28.37
33.02
53.09
25.31
25.31
29.95
25.31
22.48
39.54
20.85
19.11
23.81
25.39
33.64
24.34
33.64
28.18
25.11
25.11
25.31
30.00
31.35
34.96
27.74
27.74
28.92
28.92
26.74
50.15
50.15
46.88
31.06
23.59
57.29
51.68
63.37
20.26
27.11
25.07
20.43
20.78
28.01
25.09
30.73
29.24
20.73
23.44
32.81
17.04
23.90
28.53
32.77
25.09
19.75
29.38
36.07
20.61
38.33
33.64
26.44
26.33
51.41
51.32
43.77
43.86
26.07
23.50
29.75
28.37
28.46
25.36
22.95
28.13
28.13
27.96
32.74
52.26
25.04
25.04
29.28
24.88
22.95
40.65
21.10
18.91
24.11
25.87
33.94
23.99
33.79
28.17
25.25
25.19
25.04
30.32
31.63
35.21
28.55
28.40
29.62
29.71
27.16
50.33
50.69
47.45
Table II (Continued)
no.  ID(a)  compd  obsd  calcd from study: I  II  III
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
28 179
28 192
28 207
28 233
28 234
28 240
28 246
28 263
28 282
28 421
28 426
28 434
28 435
28 443
28 445
28 460
28 659
28 661
28 666
28717
28 812
28 921
28 930
28 938
28 939
28 949
28 952
28 961
28 962
28 975
28 976
28 988
28 995
28 996
28 997
29 070
29 071
29 159
29 162
29216
29 226
29 365
29 373
29 456
29 457
29 466
29 528
29 533
30 163
30 171
30 195
30 207
30215
30 220
30 221
30 224
30 225
30 361
30 829
30 847
30 863
30 864
30 867
30871
30 922
30 923
30 924
31 116
31 117
31 119
31 121
31 122
31 170
31 177
31 562
butyl thiocyanate
1-chloro-2-methyl-1-propene
2-butynedinitrile
3-methylbutanal oxime
2,2,3-trichlorobutyraldehyde
butyramide
N,N-dimethylbutyramide
butyric acid
2-bromobutyric acid
butyronitrile
2-methylbutyronitrile
2-bromoisobutyronitrile
2-hydroxyisobutyronitrile
isobutyroyl bromide
butyroyl chloride
isobutyroyl chloride
N,N-diethylcarbamic acid
ethyl carbamate
methyl N-nitro-N-ethylcarbamate
carbon disulfide
monobutyl catechol ether
chloroacetic acid
ethyl hydroxychloroacetate
chloroacetone cyanohydrin
chloroacetonitrile
bis(1-chloroethyl) ether
methyl 1-chloroethyl ether
2-chloroethyl chloroformate
chloromethyl chloroformate
trichloromethyl chloroformate
bis(chloromethyl) ether
2-chloro-1,3-butadiene
ethyl chlorosulfinate
chlorosulfonic acid
methyl chlorosulfonate
cinnamaldehyde
β-bromocinnamaldehyde
cinnamonitrile
cinnamoyl chloride
3-allylpiperidine
2-propylpiperidine
2-bromo-4-methylphenol
2-nitro-4-methylphenol
perfluorocyclobutene
phenyl cyclobutyl ketone
azacycloheptane
cyclohexane epoxide
fluorocyclohexane
N,N-dimethyl-2-methylpropane
N,N-dimethylpentane
ethyl 3,5-dinitrobenzoate
1,3-dioxane
1,4-dioxane
glycol methylene ether
glycerolethylidene ether
1,2-ethylenediol carbonate
1,2-propanediol carbonate
trimethylene 1,3-disulfide
ethyl 2-propyn-1-yl ether
1-chloro-1,2,2-trifluoroethene
1,1-dichloroethene
1,1-dichloro-2-fluoroethene
1,2-dichloro-1,2-difluoroethene
1,2,2-trichloro-1-fluoroethene
methoxyacetylene
phenylacetylene
propoxyacetylene
furan
2-acetylfuran
2-bromofuran
2-tert-butylfuran
2-chlorofuran
furfural
5-methylfurfural
1-fluoroheptane
31.50
25.06
21.66
29.64
35.42
24.32
33.43
22.21
28.53
20.37
25.09
28.11
22.12
29.14
25.80
25.83
32.00
22.60
32.18
21.50
48.56
17.56
28.58
26.90
16.02
32.63
23.17
27.66
22.60
32.58
22.59
25.23
27.33
26.87
22.02
44.20
50.76
42.96
49.99
39.38
40.60
40.08
40.76
18.80
48.60
31.61
27.40
27.54
33.85
38.28
59.97
21.41
21.68
16.84
27.76
16.72
21.36
28.76
24.71
17.52
20.35
20.43
20.48
25.36
16.28
33.43
24.88
18.16
29.58
26.11
37.46
23.41
25.44
30.53
34.39
33.28
24.59
20.17
29.62
35.43
23.88
33.78
21.31
29.38
20.57
25.48
28.21
22.10
28.55
25.43
25.73
30.33
21.16
33.11
21.20
48.58
17.32
28.95
26.48
16.58
31.80
22.35
27.09
22.92
32.32
22.94
24.63
25.50
27.24
22.43
42.27
49.44
41.84
46.70
40.32
40.25
40.65
40.62
20.26
48.49
30.90
27.25
27.97
34.17
38.62
57.42
21.72
21.83
17.10
27.56
17.34
22.09
30.10
25.19
17.11
19.96
20.44
21.26
25.41
15.27
33.99
24.69
18.65
29.36
26.45
37.54
23.63
25.15
30.20
34.83
33.11
25.31
20.37
29.26
34.74
24.58
33.87
22.29
30.05
21.15
25.79
28.91
22.78
28.42
25.35
25.35
30.86
21.57
33.09
21.42
48.40
17.69
28.62
27.48
16.55
31.44
22.10
27.03
22.39
31.78
22.15
25.50
25.33
26.38
21.73
40.92
48.68
41.41
45.61
41.29
41.10
40.59
39.70
19.67
48.19
31.81
27.68
28.03
33.64
38.28
56.10
21.86
21.86
17.21
28.14
17.45
22.09
28.58
25.20
16.46
20.71
20.86
21.00
25.55
15.91
34.34
25.20
19.33
28.86
27.10
37.92
24.03
24.22
28.86
34.51
33.05
24.68
20.49
29.41
35.64
23.70
33.70
21.30
29.34
20.59
25.51
28.37
22.27
28.47
25.03
25.35
30.48
21.12
33.29
21.38
48.34
17.29
28.89
26.76
16.57
32.02
22.41
26.93
22.77
32.26
23.05
24.93
25.42
27.14
22.34
42.28
49.51
41.94
46.37
40.85
40.71
40.68
40.57
19.96
48.51
31.42
27.22
27.62
34.12
38.54
57.48
22.00
21.85
17.40
27.95
17.48
22.17
29.63
25.18
17.26
20.17
20.28
21.55
25.84
15.35
33.96
24.75
18.74
29.04
26.67
37.47
23.54
24.95
29.47
34.40
Table II (Continued)
no.  ID(a)  compd  obsd  calcd from study: I  II  III
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
31 563
31 766
31 790
31 813
31 824
31 828
32 325
32 328
32 329
32 947
33 044
33 065
33 070
33 083
33 084
33 379
34 202
34 203
34619
34 626
34 756
34 757
34 758
35 196
35 622
35 623
35 790
35 796
35 807
35 843
35 846
36 303
36 337
36 441
36 460
36 463
36 466
36 467
36471
36 488
36 490
36 492
36 599
36 623
36 626
36 630
36 861
37 802
37 808
37871
37 872
37 974
38 420
38 430
38 499
38 953
perfluoroheptane 36.90 36.50 36.72 36.44
1-bromo-6-fluorohexane
2,2-dichlorohexane
1-fluorohexane
perfluorohexane
1,1,2,2-tetrachlorohexane
imidazole
1-methylimidazole
4-methylimidazole
2,4,6-triamino-1,3,5-triazine
chloroiodomethane
dichloroiodomethane
diiodomethane
trichloroiodomethane
trifluoroiodomethane
α-fluoronaphthalene
2,4-dimethyloxazole
2,5-dimethyloxazole
1-bromopentyne
1-iodopentyne
2-fluorophenyl ethyl ether
3-fluorophenyl ethyl ether
4-fluorophenyl ethyl ether
phenylacetylene
4-benzylpiperidine
N-butylpiperidine
2-chloro-2-bromopropane
1-chloro-2,2-difluoropropane
1-chloro-1-nitropropane
2,2-difluoropropane
2,2-diiodopropane
propoxyacetylene
1,3-dibromo-1-propyne
pyridine
2-butoxy-5-aminopyridine
2-benzylpyridine
2-bromopyridine
3-bromopyridine
2-chloropyridine
2,3-dimethylpyridine
2,6-dimethyl-4-ethylpyridine
2-(dimethylamino)pyridine
4-methylpyrimidine
pyrrole
1-methyl-2-acetylpyrrole
2,4-dimethylpyrrole
thiophene
thiazole
2,4-dimethylthiazole
2-bromothiophene
2-bromo-5-chlorothiophene
allylthiourea
ethyl tribromoacetate
N,N-dimethyltrichloroacetamide
n-propyl trifluoroacetate
37.57
39.89
29.74
31.58
49.27
18.77
23.27
23.33
36.48
24.31
29.50
32.57
34.92
19.18
43.80
26.09
25.63
31.31
36.26
37.47
37.47
37.33
34.98
54.61
45.75
29.12
20.64
26.36
15.79
41.96
24.88
29.61
24.07
50.08
50.67
31.44
31.49
29.20
34.14
43.49
39.25
26.85
20.65
37.01
30.55
24.36
24.19
32.00
32.53
37.09
32.33
45.97
40.42
27.70
37.60
39.74
30.22
31.77
49.31
19.24
24.19
23.58
37.23
24.67
28.91
32.59
33.80
20.85
43.08
25.45
26.17
31.81
36.26
38.07
38.07
38.07
33.99
55.40
45.86
28.76
20.71
26.08
16.34
41.73
24.69
29.97
23.75
49.54
52.55
32.25
31.29
29.43
34.00
42.96
38.73
27.89
20.18
35.84
31.14
24.27
23.33
31.08
32.07
37.05
35.32
45.77
38.37
27.92
37.62
39.10
29.86
31.78
48.49
19.96
24.61
24.61
33.35
24.13
28.83
32.40
33.52
19.88
42.56
26.97
26.97
31.33
36.52
37.62
37.62
37.62
34.34
56.52
45.75
28.23
20.76
27.34
16.07
41.69
25.20
29.80
24.89
49.04
54.25
32.65
32.65
29.58
34.18
43.48
38.11
27.88
21.62
35.80
30.92
25.02
23.36
32.65
32.78
37.47
36.07
45.58
38.67
27.38
37.27
39.72
29.80
31.69
49.30
19.34
24.34
23.85
36.70
24.56
28.87
32.54
33.87
20.86
42.85
26.21
26.22
31.79
36.28
37.73
37.73
37.73
33.96
55.90
45.86
28.81
20.71
26.51
16.23
41.89
24.75
30.05
23.75
49.72
52.70
32.30
31.44
29.18
34.16
43.26
38.82
27.65
20.33
35.63
30.75
24.22
23.23
31.68
32.14
36.94
35.37
45.78
38.41
27.83
ethyl xanthate 43.25 42.13 41.63 42.07
(a) The compound ID is given for easy reference. All molecules having ID numbers less than 24000 were taken from ref 14. For these compounds the right three digits represent the compound number, and the remaining digits represent the paper sequel number; e.g., compound 14287 was taken from paper 14, and its number was 287. Since the molecules were not numbered by the authors in the first few papers, we used arbitrary numbers. Molecules having ID numbers greater than 24000 were taken from ref 15. Simply subtract 24000 to get the compound number in the CRC Handbook; e.g., compound 24484 is compound 484 in the handbook.
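The ID convention described in this footnote can be sketched as a small helper. This is only an illustration: the function name `decode_compound_id` and the returned dictionary layout are hypothetical, and because the footnote does not say how an ID of exactly 24000 would be treated, the sketch rejects that value.

```python
def decode_compound_id(compound_id: int) -> dict:
    """Split a Table II compound ID into its source and compound number.

    Assumption: IDs below 24000 come from ref 14 (paper number followed by a
    three-digit compound number); IDs above 24000 come from ref 15 (CRC
    Handbook number plus 24000). The helper name is illustrative only.
    """
    if compound_id == 24000:
        # The footnote covers only "less than" and "greater than" 24000.
        raise ValueError("ID 24000 is not covered by the footnote")
    if compound_id > 24000:
        return {"source": "ref 15", "paper": None, "compound": compound_id - 24000}
    # divmod separates the paper sequel number from the last three digits.
    paper, compound = divmod(compound_id, 1000)
    return {"source": "ref 14", "paper": paper, "compound": compound}


# The two examples given in the footnote.
print(decode_compound_id(14287))
print(decode_compound_id(24484))
```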
2.8557, 16-4.1009, 17-3.7162. The heteroatom substitution for hydrogen is even more confusing: heteroatom replacing hydrogen on saturated carbon when there is no carbon substitution, 5-2.4666, 7-3.1274, 10-3.0075, 14-3.1677; when there is one carbon substitution, 6-2.6338, 9-2.7885, 13-2.1784; in ethylenic carbon, 15-2.8557, 18-3.6247, 20-1.9708. Several factors may be involved in these changes. The substituting atoms may have a direct effect on the volume of the atom concerned; e.g., more electronegative atoms lead to volume contraction due to electron withdrawal. The volume loss due to greater overlapping may also affect the atomic refractivities. The nature of the bonds also plays an important role in determining the value.
Table I (study I) shows that the hydrogens have a relatively small span of values, ranging from 1.0 to 1.5. These values are decreased by electron-attracting atoms. Double-bonded oxygens, like the multiple-bonded carbons, have higher values compared to their single-bonded counterparts. The aryl ether or ester oxygens also have high values. Unexpectedly, the oxygens with a delocalized bond, as in the nitro group, have low values. The nitrogen has a higher value in arylamines than in aliphatic amines. The nitrogens in aromatic heterocyclic
Table III. Classification of Atoms in Selected Molecules

molecule ID  structure(a)  atom type (atom list)
14300        I             1 (7), 2 (6), 6 (2, 5), 40 (3), 46 (13-17), 47 (11, 12), 51 (9, 10), 58 (8), 60 (4), 96 (1)
23573        II            5 (1, 5), 47 (6-11), 58 (4), 72 (2), 78 (3)
24484        III           1 (5), 15 (1), 17 (2), 36 (3), 46 (9-11), 47 (6, 7), 49 (8), 58 (4)
25151        IV            24 (1, 4-6), 25 (2), 28 (3), 33 (7), 47 (10-13), 48 (14), 60 (8), 75 (9)
26108        V             24 (3, 5, 6), 26 (1, 2, 4), 47 (12-14), 61 (8, 9), 76 (7), 84 (10, 11)
27088        VI            24 (6-9), 28 (4), 34 (5), 44 (2), 47 (11-14), 75 (3), 90 (10), 107 (1)
27263        VII           6 (7), 24 (2-6), 25 (1), 40 (9), 47 (11-17), 74 (8), 108 (10)
27658        VIII          15 (1, 4), 16 (3), 19 (2), 47 (6-10), 99 (5)
30922        IX            5 (4), 21 (1), 23 (2), 47 (6-8), 48 (5), 60 (3)
32329        X             1 (6), 28 (4), 33 (5), 42 (2), 48 (9), 49 (8), 50 (7), 51 (10-12), 73 (1), 75 (3)

(a) See Figure 1 for the chemical structure of the molecules and their atom numbering.
compounds and aromatic nitro compounds have unexpectedly
high values. Each individual halogen has little variation in
its values, although fluorine, chlorine, and bromine attached
to unsaturated oxidized carbon showed some high values.
Since a very small number of parameters are known to express the molar refractivities of many organic molecules (ref 17) and the present calculation showed discrepancies in a few parameters, the data set was allowed to fit in terms of a very small number of parameters by converting all saturated carbons (1-14) to the same type, all ethylenic carbons (15-18) to the same type, and so on, as in Table V. Such a simplified classification (study II) used only 22 atom types, yet the fit of the data set was remarkably good, having a standard deviation of 1.527, a correlation coefficient of 0.991, and an explained variance of 0.981. When these parameters were allowed to predict the molar refractivity of the 78 molecules, the calculated values showed a standard deviation of 1.618 and a correlation coefficient of 0.995. Since here the fitting was done by using a simple least-squares technique, the statistical goodness of fit of each parameter is also given by its t-test value.
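The simplification used in study II, and the kind of statistics quoted for it, can be sketched as follows. This is a hedged illustration, not the authors' program: only the two merges stated in the text (types 1-14 and 15-18) are encoded, since the rest of Table V is not reproduced here; the labels and function names are hypothetical; and the "standard deviation" is assumed to be a plain root-mean-square residual, which may differ from the paper's definition. The atom-type counts are those listed for molecule 24484 in Table III, and the four observed/calculated pairs are the study II values of the rows printed in full in Tables II and IV.

```python
from collections import Counter
from math import sqrt

# Only the merges stated in the text; the remaining classes are in Table V.
MERGED_RANGES = {"saturated C": range(1, 15), "ethylenic C": range(15, 19)}


def collapse(atom_type):
    """Map a study I atom type to its merged class (unchanged if unmerged)."""
    for label, span in MERGED_RANGES.items():
        if atom_type in span:
            return label
    return atom_type


def collapsed_counts(type_counts):
    """Re-count atoms per merged class from a {atom type: count} mapping."""
    merged = Counter()
    for atom_type, n in type_counts.items():
        merged[collapse(atom_type)] += n
    return merged


def fit_statistics(observed, calculated):
    """Return (RMS residual, Pearson correlation coefficient)."""
    n = len(observed)
    rms = sqrt(sum((o - c) ** 2 for o, c in zip(observed, calculated)) / n)
    mean_o = sum(observed) / n
    mean_c = sum(calculated) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(observed, calculated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_c = sum((c - mean_c) ** 2 for c in calculated)
    return rms, cov / sqrt(var_o * var_c)


# Molecule 24484 (Table III): atom type -> number of atoms of that type.
counts_24484 = {1: 1, 15: 1, 17: 1, 36: 1, 46: 3, 47: 2, 49: 1, 58: 1}
print(collapsed_counts(counts_24484))

# Observed and study II values for four fully printed rows.
observed = [29.86, 36.90, 43.25, 19.73]
study_ii = [29.77, 36.72, 41.63, 19.93]
print(fit_statistics(observed, study_ii))
```

Four rows are far too few to reproduce the published statistics; the sketch only shows how the quantities are formed.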
Although the statistical fit with such few parameters gives very good t-test values, they cannot represent the subtle changes that may occur due to the change in the nature of the substituents. An intermediate step (study III) was taken to get a solution that would keep the atom classification of study I but would not show unexpected variation from this average value and at the same time reflect these changes. We used quadratic programming subject to the constraints that the solution will not deviate beyond 20% of its base value as obtained in study II. The calculated values of this study gave a standard deviation of 1.2897, a correlation coefficient of 0.993, and an explained variance of 0.984. These parameters predicted the values of the 78 molecules with a standard deviation of 1.5817 and a correlation coefficient of 0.995. The statistics of fit and the predictive power of the various studies are presented in Tables VI and VII. The standard deviations of studies I and III are somewhat better than that of study II. However, the correlation coefficients and the explained variances are almost identical. The standard deviation of the predicted values is slightly better for study III, while for studies I and II it is almost identical. The comparison of the parameters obtained from studies I and III shows that in general the parameters having low values in study I have a tendency toward lower values within the allowed limits in study III. Similarly, the high values in I tended to be high in III. It should be remembered that although the number of parameters used in studies III and I is the same, the number of degrees
[Figure 1: structure diagrams I-X not reproduced.]

Figure 1. Schematic representation of the structures of the molecules used to illustrate the atom classification. The number after non-hydrogen atoms indicates the atom label, while the number after hydrogen indicates the quantity. The atom label for hydrogen can be easily obtained from the label of its point of attachment. The numbering starts from the lowest non-hydrogen atom and proceeds toward the higher numbered atoms. The number in between bonded atoms indicates the bond type. The structural information was kept according to the Cambridge Crystallographic Data File, with minor modification. The aromatic bonds in a pyrrole-type structure, for example, were represented by two types of bonds, -5 and -6.
of freedom for regression is much lower in III due to the boundaries formed by the constraints.
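The constrained fit of study III amounts to a least-squares problem with box bounds of ±20% around a set of base values. The sketch below illustrates that idea on a made-up three-molecule, two-parameter problem. It uses projected gradient descent as a simple stand-in and is not the quadratic programming routine the authors used; the function name, the toy matrix, and the base values are all hypothetical.

```python
def bounded_least_squares(A, y, base, tol=0.20, iters=2000):
    """Minimize ||A x - y||^2 with each x[j] kept within +/- tol of base[j].

    A is a list of rows (atom-type counts per molecule), y the observed
    values. Projected gradient descent is used here for illustration only.
    """
    n = len(base)
    lower = [min((1 - tol) * b, (1 + tol) * b) for b in base]
    upper = [max((1 - tol) * b, (1 + tol) * b) for b in base]
    # Safe step size: 1 / (2 * squared Frobenius norm of A), which bounds
    # the largest eigenvalue of the Hessian 2 * A^T A from above.
    step = 1.0 / (2.0 * sum(a * a for row in A for a in row))
    x = list(base)
    for _ in range(iters):
        resid = [sum(a * xj for a, xj in zip(row, x)) - yi
                 for row, yi in zip(A, y)]
        grad = [2.0 * sum(row[j] * r for row, r in zip(A, resid))
                for j in range(n)]
        # Gradient step followed by projection back into the box.
        x = [min(max(xj - step * g, lo), hi)
             for xj, g, lo, hi in zip(x, grad, lower, upper)]
    return x


# Toy data (hypothetical): the unconstrained least-squares solution is (1, 3).
A = [[1, 0], [0, 1], [1, 1]]
y = [1.0, 3.0, 4.0]

# Base values far from the free optimum: the bounds become active.
print(bounded_least_squares(A, y, base=[1.0, 2.0]))
# Base values whose +/-20% box contains the free optimum: bounds are inactive.
print(bounded_least_squares(A, y, base=[1.0, 3.0]))
```

When the box contains the unconstrained optimum, the bounded fit coincides with ordinary least squares; when it does not, some parameters sit on their bounds, which is why the constraints reduce the effective freedom of the regression.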
Molar Refractivity and Hydrophobicity. The hydrophobicities on the scale of water-octanol partition coefficients are presented in Table I. Except for a few cases, these values are very close to those reported earlier (ref 13). Since in the present study we used quadratic programming to evaluate the atomic refractivities, we wanted to evaluate the partition coefficient values using this program as well. In theory, if the lower limit on the solution of the quadratic programming is lower than the value evaluated by the least-squares technique and if there are no other constraints on the solution, it should lead to the same values of the parameters. Except for 12 parameters, exactly the same values were obtained by this method. The discrepancy in the 12 parameters was found to be due to the singularity or near singularity in the least-squares matrix. The singularity was removed by setting parameters 41 and 44 equal, since they are chemically very similar. Under such a condition both methods gave exactly the same solution. The present solution was obtained by introducing two more molecules, 2-methylbenzimidazole and phenylacetaldehyde. This allowed us to evaluate parameters 36 and 43 from more than one
Table IV. Compounds Used to Check the Predictive Power of the Parameters

                                                       calcd from study
no. ID^a    compd                              obsd    I       II      III
1   24 079  N-methylacetamide                  19.73   19.47   19.93   19.36
2   24 099  N-butyl-N-phenylacetamide          58.15   57.43   58.59   57.44
3   24 194  chloromethyl acetate               22.47   22.52   22.34   22.52
4   24 261  1-methylvinyl acetate              26.90   26.94   27.13   26.87
5   24 325  2-methyl phenylacetate             17.66   18.29   17.64   18.22
6   24 331  acetone oxime                      20.11   19.86   19.96   19.72
7   24 342  acetonitrile                       11.07   11.22   11.85   11.25
8   24 343  cyclohexylideneacetonitrile        36.51   37.84   38.09   38.00
9   24 345  dichloroacetonitrile               21.13   21.60   21.24   21.46
10  24 376  dichloroacetophenone               46.19   46.99   45.47   46.74
11  24 394  4-fluoroacetophenone               36.19   37.11   36.23   36.93
12  24 399  3-hydroxyacetophenone              38.55   38.40   37.71   38.34
13  24 410  2-iodoacetophenone                 49.37   48.94   49.04   49.02
14  24 434  acetyl bromide                     20.12   18.90   19.12   18.81
15  24 440  acetyl iodide                      26.15   23.34   24.32   23.31
16  24 441  acetyl isothiocyanide              26.81   24.86   25.90   24.98
17  24 490  ethyl 1-bromoacrylate              34.00   34.49   34.89   34.51
18  24 525  1,6-hexanedial                     29.70   30.40   30.19   30.37
19  24 538  monoethyl adipate                  46.04   41.37   42.75   41.37
20  24 572  3-(methylamino)propionitrile       24.27   23.59   25.07   24.22
21  24 591  2-bromoallyl alcohol               24.85   24.69   25.36   24.73
22  24 604  allylamine                         18.98   19.24   19.89   19.35
23  24 624  allyl vinyl ether                  25.68   27.11   27.09   26.62
24  24 626  diallyl thioether                  37.00   38.30   37.42   37.82
25  24 627  diallyl sulfoxide                  38.04   38.46   37.97   38.36
26  24 658  ethyl glycinate                    25.62   26.29   26.22   26.26
27  24 678  (N,N-dimethylamino)methyl cyanide  24.07   25.37   25.07   25.35
28  24 769  benzalaniline                      59.73   58.86   58.47   58.80
29  24 803  o-chloroaniline                    35.48   34.19   35.17   34.71
30  24 811  p-chloroaniline                    28.64   34.19   35.17   34.71
31  24 923  m-fluoroaniline                    30.32   30.57   30.62   30.51
32  24 924  p-fluoroaniline                    28.79   30.57   30.62   30.51
33  24 949  4-(methylthio)aniline              44.05   43.03   42.44   42.82
34  25 065  o-nitroanisole                     36.89   40.26   39.70   40.10
35  25 067  4-nitroanisole                     37.38   40.26   39.70   40.10
36  25 375  aramite                            87.47   86.64   85.26   86.92
37  25 379  arecoline                          42.42   43.64   43.17   43.81
38  25 499  2,2'-dimethylazobenzene            72.12   68.49   68.18   68.93
39  25 597  3,3'-dimethylazoxybenzene          78.03   72.08   70.01   71.62
40  25 774  o-methoxybenzaldehyde              38.87   39.73   37.71   39.68
41  25 781  3-methylbenzaldehyde               37.08   38.30   36.08   38.32
42  25 796  p-isopropylbenzaldehyde            46.94   47.70   45.37   47.70
43  25 948  (α-bromoethyl)benzene              44.06   43.45   43.60   43.33
44  25 949  (β-bromoethyl)benzene              43.86   43.32   43.60   43.24
45  25 962  m-nitrobromobenzene                40.45   40.46   41.18   40.55
46  26 179  phenyl isocyanate                  33.94   32.56   33.72   32.47
47  26 195  α-nitroisopropylbenzene            45.62   45.68   47.36   45.77
48  26 952  benzophenone imine                 58.62   59.25   58.47   58.95
49  26 960  benzophenone 4-(N-methylimine)     63.74   65.16   63.11   64.85
50  27 015  2-methyl-7,8-benzoquinoline        63.25   61.21   61.27   61.64
51  27 071  trichloromethyl-2-chlorobenzene    50.64   49.02   49.97   49.40
52  27 074  trifluoromethylbenzene             30.76   31.95   31.64   31.78
53  27 110  benzothiophene                     41.97   40.95   40.88   40.44
54  27 298  (dichloromethyl)benzene            40.87   40.90   40.58   40.76
55  27 662  perfluoroisoprene                  25.97   28.51   26.63   28.33
56  27 665  1,2,3,4-tetrachlorobutadiene       40.04   36.56   39.58   37.71
57  27 698  n-butyl chloride                   25.44   25.21   25.11   25.16
58  27 742  1,4-difluorooctachlorobutane       58.84   58.69   58.27   58.93
59  27 744  1,4-diiodobutane                   46.28   46.32   46.34   46.43
60  27 849  butane-1,4-dithiol                 35.50   36.01   35.06   36.07
61  27 938  2-chlorocrotonaldehyde             25.95   26.30   25.54   26.73
62  27 941  3-methylcrotonaldehyde             26.06   26.81   25.50   26.81
63  28 020  3-chlorocrotonic acid              28.06   26.61   27.18   27.10
64  28 124  butoxyacetylene                    29.15   29.31   29.85   29.35
65  28 187  n-butyl nitrite                    26.83   27.13   26.74   26.98
66  28 195  1-butyne                           19.17   18.89   18.92   18.86
67  28 201  2-butyn-1-al                       19.59   20.62   19.16   20.62
68  28 663  diethyl carbamate                  30.31   30.92   30.86   30.92
69  28 685  ethyl thiocarbamate                29.91   28.71   28.94   28.76
70  28 811  (2-phenylmethoxy)phenol            58.61   59.07   59.18   58.84
71  29 546  methylcyclohexylamine              35.33   35.44   36.45   35.91
72  30 174  dimethyl sulfone                   20.47   21.91   20.05   21.58
73  30 175  dimethyl sulfoxide                 20.04   20.17   19.00   19.86
74  30 695  ethanesulfonic acid                21.48   21.84   21.68   21.98
75  30 696  ethanesulfonyl chloride            25.61   25.96   24.74   25.70
76  30 697  2-bromoethanesulfonyl chloride     33.06   33.35   32.51   33.17
77  30 780  ethyl chlorosulfinate              27.33   25.50   25.33   25.42
78  30 781  ethyl chlorosulfonate              26.87   27.24   26.38   27.14

a See the footnote of Table II.
Table V. Atomic Refractivities As Obtained in Study II

atomic type      refrac     no. of compd   freq of use   t test
1-14             2.8158     821            1311          100
15-20            3.8278     124            148           100
21-23            3.8974     37             43            100
24-35, 42-44     3.5090     468            1205          100
36-41            3.0887     178            205           100
46-51            0.9155     999            4099          100
56, 57, 59, 60   1.6351     195            247           100
58               1.7956     169            202           100
61               2.1407     21             45            100
66-73            3.0100     77             83            100
74               3.2009     27             29            100
75               2.7662     24             27            100
76, 77           3.5054     21             23            100
78               3.8095     11             14            100
81-85            1.0632     49             114           100
86-90            5.6105     103            153           100
91-95            8.6782     57             78            100
96-100           13.8741    20             24            100
106, 107         7.3190     27             30            100
108              9.1680     6              7             100
109              6.0762     7              7             100
110              5.3321     8              8             100
Table VI. Statistics of the Various Studies

study   no. of compd   no. of parameters   std dev   correl coeff   explained variance
I       504            93                  1.2685    0.994          0.984
II      504            22                  1.5265    0.991          0.981
III     504            93                  1.2897    0.993          0.984
Table VII. Statistics of Predictive Power of the Various Studies

study   no. of compd   no. of parameters   std dev   correl coeff
I       78             93                  1.6135    0.994
II      78             22                  1.6184    0.995
III     78             93                  1.5817    0.995
datum. When the atomic partition coefficient values are correlated with the atomic refractivities of the various studies, study I showed a correlation coefficient of 0.322, study II 0.358, and study III 0.340. The low coefficient suggests a poor linear correlation between the two parameters, thereby suggesting the use of both parameters in correlation studies. However, it should be remembered that the correlation coefficient evaluated here is based on the complete atom type set and assumes equal weighting. In a particular QSAR study, such a condition may not hold. So, one should be careful when using both parameters to evaluate the correlation for the particular data set.
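The caution about equal weighting can be made concrete with a short sketch. The function `correlation` below is a plain Pearson coefficient with optional weights; the atomic values and the occurrence counts are invented for illustration and are not taken from Table I or Table V. The example only shows that the coefficient over a set of atom types can change once each type is weighted by how often it occurs in a given data set.

```python
import math


def correlation(x, y, weights=None):
    """Pearson correlation coefficient of x and y, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(x)  # equal weighting of every atom type
    total = sum(weights)
    mean_x = sum(w * a for w, a in zip(weights, x)) / total
    mean_y = sum(w * b for w, b in zip(weights, y)) / total
    cov = sum(w * (a - mean_x) * (b - mean_y)
              for w, a, b in zip(weights, x, y))
    var_x = sum(w * (a - mean_x) ** 2 for w, a in zip(weights, x))
    var_y = sum(w * (b - mean_y) ** 2 for w, b in zip(weights, y))
    return cov / math.sqrt(var_x * var_y)


# Hypothetical atomic values for four atom types.
refractivity = [1.0, 2.0, 3.0, 4.0]
hydrophobicity = [1.0, 3.0, 2.0, 4.0]
# Hypothetical number of times each atom type occurs in one data set.
occurrences = [3, 1, 1, 3]

r_equal = correlation(refractivity, hydrophobicity)
r_weighted = correlation(refractivity, hydrophobicity, occurrences)
print(round(r_equal, 3), round(r_weighted, 3))
```

For a real QSAR data set the relevant check is the correlation between the two molecular or regional property values actually entering the regression, not the one between the atomic tables.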
Modeling Repulsive Nonbonded Interaction. Although molar refractivity is suitable for modeling the dispersive force or van der Waals attractive interaction, often an important factor for a strongly bound ligand is its steric fit with the receptor cavity. This is the consequence of repulsive nonbonded interaction. In the Lennard-Jones formulation,20 this interaction is represented by $a/r_{ij}^{12}$, where $r_{ij}$ is the distance between two atoms. Unfortunately, in most cases of interest to medicinal chemists the explicit structure of the receptor is not known, making it extremely difficult to model the repulsive interaction. This property is largely dependent on the flexibility of the ligand. An artificial way to model the situation is to measure the volume of the molecule beyond a selected region of the hypothetical receptor cavity and model the interaction in terms of this volume. A study along this line is in progress and will be communicated in the future.
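A brief numerical sketch shows why this term makes steric fit so sensitive to geometry. The function name `repulsive_energy` and the unit coefficient are assumptions chosen for illustration; only the inverse twelfth-power dependence on interatomic distance comes from the Lennard-Jones form cited above.

```python
def repulsive_energy(coefficient, distance):
    """Repulsive inverse twelfth-power term of a Lennard-Jones type potential."""
    return coefficient / distance ** 12


# Hypothetical unit coefficient and distances in arbitrary units.
reference = repulsive_energy(1.0, 1.0)
ratio_10_percent_closer = repulsive_energy(1.0, 0.9) / reference
ratio_half_distance = repulsive_energy(1.0, 0.5) / reference

print(round(ratio_10_percent_closer, 2))
print(ratio_half_distance)
```

Bringing two atoms 10% closer raises this term by roughly a factor of 3.5, and halving the distance raises it by a factor of 4096, so small misfits in a receptor cavity are penalized heavily.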
CONCLUSION
The objective of the paper is to make the partially additive,
partially constitutive, properties of the ligands, which are
related to molecular interaction, into additive ones by hiding
the constitutive part in the atom classification. Since the
constitutional factors cannot be discretized as we did, it should
be considered as an approximate empirical technique. The
advantage of this approach is comparable to the advantage
of molecular mechanics calculations over quantum mechanical
calculations. Our approach gives great flexibility in a corre-
lation study since the local value of the necessary property can
be easily calculated in any region of three-dimensional space.
An added advantage is that the approximate value of these
properties for any molecule can be evaluated by this approach.
Although a better approach is to give the atomic values on the
basis of some more fundamental properties, such as molecular
orbital indices using some physical model, such a method will
suffer from the burden of doing such calculations, and the
various inaccuracies in those calculations may easily be
transmitted to the evaluated atomic property.
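The additive scheme summarized here amounts to a weighted sum over atom types. The sketch below uses two atomic refractivities listed for study II in Table V (the groups of atom types 1-14 and 46-51); the function `additive_property` and the atom-type counts of the example molecule are hypothetical and serve only to show the arithmetic.

```python
# Atomic refractivities for two atom-type groups, as listed for study II in Table V.
ATOMIC_REFRACTIVITY = {"1-14": 2.8158, "46-51": 0.9155}


def additive_property(type_counts, atomic_values):
    """Sum atomic contributions: each atom type's value times its count."""
    return sum(count * atomic_values[atom_type]
               for atom_type, count in type_counts.items())


# Hypothetical molecule: two atoms of types 1-14 and six atoms of types 46-51.
example_counts = {"1-14": 2, "46-51": 6}
estimate = additive_property(example_counts, ATOMIC_REFRACTIVITY)
print(round(estimate, 4))
```

Restricting the sum to the atoms that fall inside a chosen region of space gives the local value of the property mentioned above, since the constitutive information is already carried by the atom type assigned to each atom.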
ACKNOWLEDGMENT
This work was supported by grants from the National Science Foundation (PCM-8314998) and the National Institutes of Health (5-R01-GM37123-02). We thank Prof. P. K. Ponnuswamy for his comments which helped to improve the manuscript.
REFERENCES AND NOTES

Volz, K.; Matthews, D. A.; Alden, R. A.; Freer, S. T.; Hansch, C.; Kaufmann, B. T.; Kraut, J. J. Biol. Chem. 1982, 257, 2528.
Scheraga, H. A. In Advances in Physical Organic Chemistry; Gold, V., Ed.; Academic: London, 1968.
Burkert, U.; Allinger, N. L. In Molecular Mechanics; American Chemical Society: Washington, DC, 1982.
Csizmadia, I. G. In Theory and Practice of MO Calculations on Organic Molecules; Elsevier: Amsterdam, 1976.
Nemethy, G. Angew. Chem., Int. Ed. Engl. 1967, 6, 195.
Pratt, L. R. Annu. Rev. Phys. Chem. 1985, 36, 433.
Pitzer, K. S. In Advances in Chemical Physics; Prigogine, I., Ed.; Interscience: New York, 1959; Vol. II, p 59.
Claverie, P. In Intermolecular Interactions from Diatomics to Biopolymers; Pullman, B., Ed.; Wiley: New York, 1978; p 69.
Kochanski, E. In Intermolecular Forces; Pullman, B., Ed.; Reidel: Dordrecht, Holland, 1981; p 15.
Blaney, J. M.; Weiner, P. K.; Dearing, A.; Kollman, P. A.; Jorgensen, E. C.; Oatley, S. J.; Burridge, J. M.; Blake, C. C. F. J. Am. Chem. Soc. 1982, 104, 6424.
Ghose, A. K.; Crippen, G. M. J. Med. Chem. 1985, 28, 333.
Glasstone, S. In Textbook of Physical Chemistry; Macmillan: London, 1948; p 543.
Ghose, A. K.; Crippen, G. M. J. Comput. Chem. 1986, 7, 565.
Ravindran, A. Commun. ACM 1972, 15, 818.
Ravindran, A. Commun. ACM 1974, 17, 157.
Vogel, A. I. J. Chem. Soc. 1948, 1833 and previous articles in the series.
CRC Handbook of Chemistry and Physics, 65th ed.; Weast, R. C., Ed.; CRC: Boca Raton, FL, 1984.
The molar refractivity, MR, was evaluated from refractive index, $n_D$, molecular weight, $M$, and density, $d$, by using the expression $\mathrm{MR} = [(n_D^2 - 1)/(n_D^2 + 2)](M/d)$.
Hooke, R.; Jeeves, T. A. J. Assoc. Comput. Mach. 1961, 8, 212.
Dictionary of Organic Compounds; Oxford University: New York, 1965; Vol. 5, p 3044.
Momany, F. A.; Carruthers, L. M.; McGuire, R. F.; Scheraga, H. A. J. Phys. Chem. 1974, 78, 1595.
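The expression for the molar refractivity given in the notes above can be evaluated directly. The following is a minimal sketch; the function name `molar_refractivity` is an assumption, and the input values are approximate textbook figures for water near room temperature, used only to illustrate the arithmetic and not taken from the data set of this study.

```python
def molar_refractivity(n_d, mol_weight, density):
    """Molar refractivity MR = [(n_D^2 - 1) / (n_D^2 + 2)] * (M / d).

    n_d        refractive index (sodium D line)
    mol_weight molecular weight in g/mol
    density    density in g/cm^3
    The result is in cm^3/mol.
    """
    n_squared = n_d ** 2
    return (n_squared - 1.0) / (n_squared + 2.0) * (mol_weight / density)


# Approximate values for water (assumed here for illustration only).
mr_water = molar_refractivity(n_d=1.333, mol_weight=18.015, density=0.998)
print(round(mr_water, 2))
```

Because the refractive-index factor is dimensionless, MR carries the units of molar volume, which is why it serves as a combined measure of size and polarizability.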