American Journal of Epidemiology
Copyright © 1991 by The Johns Hopkins University School of Hygiene and Public Health
All rights reserved
Vol 134, No 10
Printed in U.S.A.
Differential Misclassification Arising from Nondifferential
Errors in Exposure Measurement
Katherine M. Flegal,1 Penelope M. Keyl,2 and F. Javier Nieto2
Misclassification into exposure categories formed from a continuous variable arises
from measurement error in the continuous variable. Examples and mathematical results
are presented to show that if the measurement error is nondifferential (independent of
disease status), the resulting misclassification will often be differential, even in cohort
studies. The degree and direction of differential misclassification vary with the exposure
distribution, the category definitions, the measurement error distribution, and the
exposure-disease relation.
Failure to recognize the likelihood of differential misclassification may lead to incorrect
conclusions about the effects of measurement error on estimates of relative risk when
categories are formed from continuous variables, such as dietary intake. Simulations
were used to examine some effects of nondifferential measurement error. Under the
conditions used, nondifferential measurement error reduced relative risk estimates, but
not to the degree predicted by the assumption of nondifferential misclassification. When
relative risk estimates were corrected using methods appropriate for nondifferential
misclassification, the "corrected" relative risks were almost always higher than the true
relative risks, sometimes considerably higher. The greater the measurement error, the
more inaccurate was the correction. The effects of exposure measurement errors need
more critical evaluation. Am J Epidemiol 1991;134:1233-44.
epidemiologic methods; measurement error; misclassification
In epidemiologic research, subjects are often cross-classified by disease status and exposure category. Some exposures are inherently categorical (married or single, male or female, presence or absence of a positive family history of some condition). Exposure categories are also often formed from continuous variables such as body mass index, years of occupational exposure, cigarettes smoked per day, or dietary intake.
Subjects may be misclassified into an incorrect exposure category. Such misclassification is defined as nondifferential if the probabilities of exposure misclassification are the same for persons with the disease as for persons without the disease, and as differential if the probabilities differ for persons with and without the disease. In case-control studies, differential misclassification may arise from such sources as biased or selective recall by cases, bias by interviewers who are not blinded to case status, or differences between the methods of exposure measurement used for cases and for noncases. Differential misclassification has often been thought unlikely to arise in cohort studies, in which exposure is determined before the onset of disease, although Wacholder et al. (1) have recently demonstrated this possibility.
Received for publication December 26, 1990, and in final form August 19, 1991.
1 Division of Health Examination Statistics, National Center for Health Statistics, Centers for Disease Control, Hyattsville, MD.
2 Department of Epidemiology, The Johns Hopkins University School of Hygiene and Public Health, Baltimore, MD.
Reprint requests to Dr. Katherine Flegal, National Center for Health Statistics, 6525 Belcrest Rd., Room 900, Hyattsville, MD 20782.
Exposure misclassification will tend to bias measures of association between exposure and disease and to affect the power of statistical tests for association (2-4). For a dichotomous variable, nondifferential misclassification biases the expected relative risk or odds ratio toward the null value (5, 6), but differential misclassification may bias the expected relative risk either toward or away from the null value. If the variable has more than two categories, the expected relative risk may be biased in either direction by either differential or nondifferential misclassification (2-6).
Misclassification into exposure categories formed from a continuous variable arises from measurement error in the underlying continuous variable, which may cause an observation to be placed in the wrong category and thus be misclassified. In this paper, we refer to measurement error that is independent of disease status as nondifferential measurement error, to parallel the terminology used for misclassification. It is widely assumed that if measurement error in a continuous variable were nondifferential, then misclassification into categories formed from that variable would also be nondifferential (7, 8).
In this paper, we describe the mechanism by which nondifferential measurement error in a continuous variable may often give rise to differential misclassification. We then examine some effects of nondifferential measurement error in a continuous exposure variable on estimates of relative risk.
METHODS
We used simulations to explore the effects of measurement error in a continuous variable on misclassification. The SAS RANNOR and RANBIN functions were used to generate data (9). In each simulation, we chose true exposure values (E) evenly spaced over a specified interval. Error terms randomly sampled from a normal distribution with mean zero and a specified standard deviation were then added to the true values to simulate measured values. To simulate a disease outcome, we calculated a probability of disease (p) for each true exposure value from a linear logistic model with specified parameters a and b, of the form $\ln[p/(1-p)] = a + bE$. We then generated a 0 or 1 value for disease status by sampling from a binomial distribution with n = 1 and probability of "success" (disease) equal to the probability of disease calculated for that exposure value.
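The data-generating steps just described can be sketched in Python as follows. This is a minimal re-expression of the procedure, not the authors' SAS program. Python's `random.gauss` and `random.random` stand in for RANNOR and RANBIN, and the function `simulate_cohort` and its arguments are hypothetical. The example call uses a = -4.5, which assumes the intercepts are negative: the extracted parameter values appear to have lost their minus signs, and a positive intercept of that size would make disease almost certain at every exposure.

```python
import math
import random


def simulate_cohort(a, b, error_sd, lo=1600, hi=2499, seed=None):
    """Simulate one cohort as described in the Methods (hypothetical helper).

    Returns three parallel lists: true exposures, measured exposures,
    and 0/1 disease status.
    """
    rng = random.Random(seed)
    true_vals = list(range(lo, hi + 1))  # evenly spaced true exposures
    measured, disease = [], []
    for e in true_vals:
        # Nondifferential error: drawn without reference to disease status.
        measured.append(e + rng.gauss(0.0, error_sd))
        # Disease risk depends on the TRUE exposure via logit(p) = a + b*E.
        p = 1.0 / (1.0 + math.exp(-(a + b * e)))
        disease.append(1 if rng.random() < p else 0)  # Binomial(n=1, p) draw
    return true_vals, measured, disease


true_vals, measured, disease = simulate_cohort(a=-4.5, b=0.0019, error_sd=300, seed=1)
print(len(true_vals), sum(disease) / len(disease))
```

Passing a fixed `seed` makes a run reproducible. Setting `error_sd=0` gives measured values identical to the true values, which is a convenient baseline for the "true relative risk".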
We then used simulations to investigate the effects of measurement error on estimates of relative risk under different conditions. In each of these simulations, the integers from 1,600 to 2,499 were arbitrarily chosen as the true exposure values, and high and low categories were defined by a cutpoint of 2,200. We varied the magnitude of measurement error by setting the standard deviation of the measurement error to 100, 300, or 500. We also varied the true relative risk in the high compared with the low category by using three different sets of values for the parameters a and b of the linear logistic model. We ran 200 simulations for each of the nine combinations of three levels of true relative risk and three values of the standard deviation of the measurement error.
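The three parameter sets can be mapped to the expected relative risks reported in Table 2. For each category (E ≥ 2,200 versus E < 2,200), average the logistic risk over the integer true exposures, then take the ratio. The sketch below assumes negative intercepts (a = -4.5, -8.0, -16.0). This is a reading of the extracted parameters, not something stated explicitly here, and `expected_rr` is a hypothetical helper.

```python
import math


def expected_rr(a, b, lo=1600, hi=2499, cut=2200):
    """Expected risk ratio, high (E >= cut) vs. low (E < cut) category, when the
    true exposures are the integers lo..hi and logit P(disease | E) = a + b*E."""
    def risk(e):
        return 1.0 / (1.0 + math.exp(-(a + b * e)))

    low = [risk(e) for e in range(lo, cut)]
    high = [risk(e) for e in range(cut, hi + 1)]
    # Each integer exposure is equally likely, so category risk = mean of per-value risks.
    return (sum(high) / len(high)) / (sum(low) / len(low))


# Intercepts taken as negative (an assumption; see the lead-in).
for a, b in [(-4.5, 0.0019), (-8.0, 0.0035), (-16.0, 0.0072)]:
    print(f"a = {a}, b = {b}: expected RR = {expected_rr(a, b):.2f}")
```

Under that assumption the three values come out close to 1.66, 2.49 and 5.00. Setting b = 0 (no exposure-disease relation) returns a relative risk of 1.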
To examine the effects of nondifferential measurement error on the bias in relative risk, we calculated the relative risk for the measured values in each simulation and compared the mean of these values with the true relative risk. We also applied procedures that predict (10) or correct for (4, 7) the effects of nondifferential misclassification on relative risk. We calculated predicted and corrected relative risks for each simulation using the misclassification rates and the true and measured relative risks calculated for that simulation. The purpose was to examine the effects of applying procedures that assume nondifferential misclassification to data with nondifferential measurement error. The formulas used are given in the Appendix.
RESULTS
Misclassification probabilities higher close to cutpoints
We first present an example to demonstrate that, when a continuous exposure variable with measurement error is dichotomized at an arbitrary cutpoint, the probability of misclassification may well not be uniform across exposure values. Figure 1 shows two identical scatterplots of measured versus true values for a sample of 100 uniformly distributed true values. Error terms randomly sampled from a normal distribution with mean 0 and standard deviation 150 were added to the true values to generate measured values. The magnitude and direction of measurement error were therefore independent of the true exposure value.
Cutpoints of 2,000 (figure 1, top) and 2,200 (figure 1, bottom) were arbitrarily chosen to divide true and measured values into high- and low-exposure categories. In each part of the figure, most of the misclassified points (quadrants 1 and 3) have true values close to the cutpoint; points with true values farther from the cutpoint are rarely misclassified. This example demonstrates that, when exposure categories are formed from a continuous variable with normally distributed measurement error, the probability of misclassification is higher near the cutpoint used to form the categories, regardless of the specific value chosen for the cutpoint.
[Figure 1: two scatterplots of MEASURED VALUE (vertical axis) against TRUE VALUE (horizontal axis), each divided into quadrants 1 to 4 by the cutpoint.]
FIGURE 1. Measured values plotted against true values, from a simulated example with randomly generated normally distributed measurement error (mean = 0, standard deviation = 150), showing correctly classified points (quadrants 2 and 4) and misclassified points (quadrants 1 and 3) for a cutpoint of 2,000 (top) and a cutpoint of 2,200 (bottom).
Probability of misclassification
For a given true exposure value with normally distributed measurement error, the probability that the corresponding measured value will fall on the other side of a given cutpoint (and thus be misclassified) can readily be determined from a cumulative normal distribution. In the examples in figure 1, the probability of misclassification is only 0.02 for true values 300 units (2 standard deviations) from the cutpoint, but 0.48 for true values 10 units from the cutpoint.
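The cumulative-normal calculation fits in a few lines. A true value at distance d from the cutpoint is misclassified only if the error exceeds d in the direction of the cutpoint, so the probability is Φ(-d/σ). In this sketch the function names are illustrative, Python's `math.erf` supplies Φ, and values at or above the cutpoint are assumed to count as "high".

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_misclassified(true_value, cutpoint, error_sd):
    """Probability that measured = true + N(0, error_sd) falls on the other
    side of the cutpoint (values >= cutpoint are treated as 'high')."""
    distance = abs(true_value - cutpoint)
    # The error must exceed `distance` in the one direction that crosses the cutpoint.
    return normal_cdf(-distance / error_sd)


print(round(prob_misclassified(2300, 2000, 150), 3))  # 2 SDs from the cutpoint: 0.023
print(round(prob_misclassified(2010, 2000, 150), 3))  # 10 units from the cutpoint: 0.473
```

The probability depends only on the distance to the cutpoint and on σ, not on which side of the cutpoint the true value lies. A true value exactly at the cutpoint is misclassified with probability 0.5.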
Figures 2 and 3 both depict a situation in which uniformly distributed true exposure values ranging from 1,600 to 2,500 are grouped into low and high categories using a cutpoint of 2,200. As in figure 1, measurement error is assumed to be independent of disease status and normally distributed with a standard deviation of 150. For each true exposure value, the expected probability of exposure misclassification under these assumptions was calculated from a cumulative normal distribution. In figures 2 and 3, the probabilities of misclassification are plotted as a dotted line against true exposure values.
Figures 2 and 3 also show an exposure-disease relation plotted as a solid line against true exposure values. Figure 2 shows the expected probability of disease for a linear logistic model (a = -10, b = 0.004), and figure 3 shows the expected probability of disease p as a quadratic function of the true exposure value E ($p = 6.592 - 0.00673E + 0.00000112E^2$), resulting in a J-shaped curve over the specified range of values.
Differential misclassification when the exposure variable is categorized
For the examples shown in figures 2 and 3, the probability of exposure misclassification and the probability of disease both depend on the true exposure value. In these circumstances, differential misclassification is likely to arise.
The process by which differential misclassification arises may be seen from inspection of the low category in figure 2. Those with true exposure values close to the upper bound of the low category are more likely to have the disease (because they have higher exposure). Consequently, within the low category, persons with the disease will tend to have higher exposure values than persons without the disease. However, those with true exposure values close to the upper bound of the low category are also more likely to be misclassified (solely because they are closer to the cutpoint). The net result is that, in the low category, persons with the disease will be more likely to be misclassified (simply because they have higher exposure values) than persons without the disease. Similar reasoning shows that, in the high category, persons with the disease will be less likely to be misclassified than persons without the disease. Thus, in both the high and the low category, misclassification will tend to be differential.
[Figure 2: PROBABILITY (vertical axis) against TRUE VALUE, 1,600 to 2,500 (horizontal axis), with the LOW and HIGH exposure categories marked.]
FIGURE 2. Probability of misclassification (dotted line) into low- and high-exposure categories, based on normally distributed exposure measurement error (mean = 0, standard deviation = 150), and probability of disease (solid line), based on a logistic regression model (a = -10, b = 0.004), plotted for true exposure values from 1,600 to 2,500.
[Figure 3: PROBABILITY (vertical axis) against TRUE VALUE, 1,600 to 2,500 (horizontal axis).]
FIGURE 3. Probability of misclassification (dotted line) into low- and high-exposure categories, based on normally distributed exposure measurement error (mean = 0, standard deviation = 150), and probability of disease (solid line), based on a linear model with a quadratic term, plotted for true exposure values from 1,600 to 2,500.
The degree and direction of differential misclassification depend in part on the exposure-disease model, as shown by the example in figure 3. In the low category in that example, the probability of disease is highest at the lowest level of exposure. Persons with true exposure values close to the upper bound of the low category are both more likely to be misclassified and less likely to have the disease than are persons with the lowest true exposure values. The net result is that, in the low category, persons without the disease will be more likely to be misclassified than persons with the disease. Therefore, misclassification in the low category in figure 3 will tend to be differential, but in the opposite direction from that seen in the low category in figure 2.
The calculated mean misclassification probabilities for the situations depicted in figures 2 and 3 showed differential misclassification in the predicted direction. In the low category, for the situation depicted in figure 2, those with the disease were more likely to be misclassified than those without the disease (18 percent vs. 9 percent), but for the situation in figure 3, those with the disease were less likely to be misclassified than those without the disease (9 percent vs. 10 percent). In the high category, those with the disease were less likely to be misclassified than those without the disease in both figure 2 (17 percent vs. 21 percent) and figure 3 (14 percent vs. 22 percent). These findings demonstrate that, for a given distribution of exposure values and distribution of measurement error, the probabilities of misclassification for persons with or without the disease vary with the model relating exposure to disease.
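The figure 2 percentages can be approximately reproduced with a weighting step. Weight the misclassification probability at each true exposure value by the probability of disease (or of no disease) at that value, in the spirit of equation 1 in the Appendix. The sketch below assumes the figure 2 setup:

- equally likely integer true values from 1,600 to 2,500;
- a cutpoint of 2,200, with values ≥ 2,200 counted as high;
- a logistic intercept of a = -10. The intercept is read as negative because a positive intercept would put the probability of disease near 1 throughout.

The function name is hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def misclassification_by_disease(a, b, error_sd, cut, lo, hi):
    """Expected P(misclassified | diseased) and P(misclassified | not diseased)
    in the low (E < cut) and high (E >= cut) categories, for equally likely
    integer true exposures lo..hi and logit P(disease | E) = a + b*E."""
    # Per category: [sum m*p, sum p, sum m*(1-p), sum (1-p)]
    sums = {"low": [0.0] * 4, "high": [0.0] * 4}
    for e in range(lo, hi + 1):
        s = sums["low" if e < cut else "high"]
        m = normal_cdf(-abs(e - cut) / error_sd)  # misclassification probability at E
        p = 1.0 / (1.0 + math.exp(-(a + b * e)))  # disease probability at E
        s[0] += m * p
        s[1] += p
        s[2] += m * (1.0 - p)
        s[3] += 1.0 - p
    return {cat: (s[0] / s[1], s[2] / s[3]) for cat, s in sums.items()}


result = misclassification_by_disease(a=-10, b=0.004, error_sd=150, cut=2200, lo=1600, hi=2500)
for cat, (with_d, without_d) in result.items():
    print(f"{cat}: with disease {with_d:.2f}, without disease {without_d:.2f}")
```

Nothing in the loop lets the error depend on disease status. The gap between the two probabilities comes entirely from how disease risk is distributed within each category. Under these assumptions the results come out close to the figure 2 values quoted above: roughly 0.18 vs. 0.09 in the low category and 0.17 vs. 0.21 in the high category.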
Conditions giving rise to differential misclassification
The preceding examples show differential misclassification arising from normally distributed exposure measurement error and two specific models for the disease-exposure relation. However, these examples also show that the general mechanism by which differential misclassification arises does not depend on these specific distributions, so it should be expected in other situations as well. The mathematical results supporting this statement are given in the Appendix.
As shown in the Appendix, two conditions are necessary for differential misclassification to arise from nondifferential measurement error: 1) within an exposure category, the probability of misclassification must vary with the underlying true exposure value, as can occur when categories are formed from a continuous exposure variable; and 2) within an exposure category, the probability of disease must also vary with the underlying true exposure value. This might, for example, be expected to occur within a smoking category of "1-2 packs a day," since those who smoke two packs a day may have a higher risk of lung cancer than those who smoke one pack a day.
Table 1 shows the relation between these two conditions and the type of misclassification expected to arise from nondifferential measurement error. When both conditions are present, the general result is differential misclassification. If either condition is absent, the general result is nondifferential misclassification.
Case 1 in table 1 corresponds to the situation exemplified by figure 2, in which exposure categories are formed from a continuous variable but the probability of disease depends on the underlying exposure value rather than on the category. Case 2 corresponds to the situation in which the true exposure measurement is inherently categorical and the probability of disease depends only on the category (for example, if infants born in hospital A are at greater risk than infants born in hospital B). Case 3 could arise if exposure and disease were not related, since in that situation the probability of disease would not vary within a category. Case 3 might also arise over certain ranges of exposure values if the exposure-disease relation exhibited a threshold or a plateau effect. A situation similar to case 4 could arise if categories were formed from a continuous variable but the probability of misclassification varied little within a category, as might occur if the categories were narrow or if the magnitude of error was very high.
TABLE 1. Expected type of misclassification arising from nondifferential measurement error
Case | Within exposure categories, probability of disease varies with underlying true exposure level | Within exposure categories, probability of misclassification varies with underlying true exposure level | Expected type of misclassification
1 | Yes | Yes | Differential
2 | No | No | Nondifferential
3 | No | Yes | Nondifferential
4 | Yes | No | Nondifferential
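The logic of table 1 can be checked on a toy category of four true exposure values. The numbers below are made up purely for illustration and are not from the paper. The hypothetical helper aggregates per-value probabilities within a category, assuming misclassification and disease are independent given the true value, as in equation 1 of the Appendix.

```python
def category_misclassification(pr_e, pr_d, pr_m):
    """P(misclassified | diseased, C) and P(misclassified | not diseased, C),
    given per-value weights Pr(E), risks Pr(D | E) and Pr(M | E) within C.
    Assumes M and D are independent given E (nondifferential error)."""
    rows = list(zip(pr_e, pr_d, pr_m))
    with_d = sum(w * d * m for w, d, m in rows) / sum(w * d for w, d, m in rows)
    without_d = sum(w * (1 - d) * m for w, d, m in rows) / sum(w * (1 - d) for w, d, m in rows)
    return with_d, without_d


weights = [0.25] * 4                     # four equally likely true values in C
rising_risk = [0.1, 0.2, 0.3, 0.4]       # disease risk rises toward the cutpoint
rising_misc = [0.01, 0.05, 0.15, 0.40]   # misclassification rises toward the cutpoint

case1 = category_misclassification(weights, rising_risk, rising_misc)  # both vary
case3 = category_misclassification(weights, [0.2] * 4, rising_misc)    # risk flat
case4 = category_misclassification(weights, rising_risk, [0.1] * 4)    # misclassification flat
print(case1, case3, case4)
```

Only case 1 separates the two probabilities (about 0.22 vs. 0.13 here). Flattening either ingredient, as in cases 3 and 4, makes them equal, which is the pattern summarized in table 1.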
Effects of nondifferential measurement error on estimated relative risks
We investigated the effects of nondifferential measurement error on estimates of relative risk, using 200 simulations for each of nine combinations of true relative risk and exposure measurement error, as described in the Methods section. The results are presented in tables 2 and 3, and the corresponding formulas are shown in the Appendix.
Table 2 shows the mean sensitivity and specificity, overall and by disease status, for each combination of measurement error and relative risk. As the level of measurement error increased, the probabilities of correct classification decreased. For a given degree of measurement error, the overall mean sensitivity and specificity were the same regardless of the expected relative risk. However, the probabilities of misclassification by disease status varied with the expected relative risk as well as with the degree of measurement error, and the differences between persons with and without the disease were greater at higher levels of relative risk. For every combination of measurement error and relative risk, the mean sensitivity was higher for persons with the disease than for persons without the disease, but the mean specificity was lower.
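The table 2 pattern can be explored with a small simulation built on the Methods description. This sketch pools counts over all simulated cohorts instead of averaging per-simulation sensitivities and specificities as the paper does, so its numbers should be close to table 2 but need not match exactly. The negative intercept (a = -16.0) and the helper name `sens_spec_by_disease` are assumptions.

```python
import math
import random


def sens_spec_by_disease(a, b, error_sd, n_sims=200, cut=2200, seed=0):
    """Pooled sensitivity and specificity of the measured high/low grouping,
    overall and separately for diseased and nondiseased subjects.

    Sensitivity = P(measured high | true high); specificity = P(measured low | true low).
    """
    rng = random.Random(seed)
    # Per group: [true high & measured high, true high, true low & measured low, true low]
    counts = {g: [0, 0, 0, 0] for g in ("overall", "diseased", "nondiseased")}
    for _ in range(n_sims):
        for e in range(1600, 2500):                    # true exposures 1,600..2,499
            measured = e + rng.gauss(0.0, error_sd)    # error ignores disease status
            p = 1.0 / (1.0 + math.exp(-(a + b * e)))   # risk from the TRUE exposure
            group = "diseased" if rng.random() < p else "nondiseased"
            true_high, measured_high = e >= cut, measured >= cut
            for g in ("overall", group):
                c = counts[g]
                if true_high:
                    c[1] += 1
                    c[0] += measured_high
                else:
                    c[3] += 1
                    c[2] += not measured_high
    return {g: (c[0] / c[1], c[2] / c[3]) for g, c in counts.items()}


for group, (se, sp) in sens_spec_by_disease(-16.0, 0.0072, error_sd=100).items():
    print(f"{group:12s} Se={se:.2f} Sp={sp:.2f}")
```

Diseased subjects should show the higher sensitivity and the lower specificity, the same direction as in table 2, even though the error term never looks at disease status.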
Mean values of the true and measured relative risks based on the simulated data over the 200 simulations are presented in table 3. The true relative risks for the simulated data with no measurement error were close to the expected values of relative risk.
TABLE 2. Mean sensitivity (Se) and specificity (Sp), overall and by disease status, over 200 simulations for each combination of expected relative risk (RR) and standard deviation (SD) of normally distributed measurement error
Group | Se (SD 100) | Sp (SD 100) | Se (SD 300) | Sp (SD 300) | Se (SD 500) | Sp (SD 500)
Expected RR = 1.66 (a = -4.5, b = 0.0019)
Overall | 0.87 | 0.93 | 0.68 | 0.80 | 0.61 | 0.72
In diseased | 0.88 | 0.91 | 0.69 | 0.77 | 0.62 | 0.64
In nondiseased | 0.86 | 0.94 | 0.67 | 0.82 | 0.61 | 0.73
Expected RR = 2.49 (a = -8.0, b = 0.0035)
Overall | 0.87 | 0.93 | 0.68 | 0.80 | 0.61 | 0.72
In diseased | 0.89 | 0.89 | 0.70 | 0.74 | 0.62 | 0.67
In nondiseased | 0.85 | 0.95 | 0.66 | 0.82 | 0.61 | 0.73
Expected RR = 5.00 (a = -16.0, b = 0.0072)
Overall | 0.87 | 0.93 | 0.68 | 0.80 | 0.61 | 0.72
In diseased | 0.89 | 0.83 | 0.70 | 0.67 | 0.63 | 0.62
In nondiseased | 0.81 | 0.95 | 0.64 | 0.83 | 0.59 | 0.73
The measured relative risks for the simulated data with measurement error were everywhere smaller than the true relative risks. At each level of exposure measurement error, the mean measured relative risk was less than the true relative risk, showing that in these simulations misclassification on average biased the estimates of relative risk away from the true relative risk and toward the null value of 1.0.
Data on misclassification rates are sometimes used to predict the effects of misclassification on relative risks, either to estimate the expected degree of bias or to determine the sample size that may be required to detect a certain level of risk in the presence of misclassification. Such calculations often assume that misclassification is nondifferential. Table 3 presents the mean predicted relative risks that were calculated from the simulated data using the assumption of nondifferential misclassification. If misclassification were nondifferential, the predicted relative risks would be equal to the measured relative risks. However, the mean predicted relative risks were lower than the mean measured relative risks, showing that for these examples, predictive procedures that assume nondifferential misclassification do not in general give the correct results when measurement error is nondifferential. In these examples, the bias introduced by misclassification was smaller than would be predicted.
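One common way to compute the relative risk predicted under nondifferential misclassification for a two-category exposure is sketched below. It is a generic expression written here as an assumption. The formulas actually used in the paper are those in its Appendix and reference 10, which may be parameterized differently. The argument names and the example inputs are hypothetical.

```python
def predicted_rr(r1, r0, prev, se, sp):
    """Measured risk ratio expected if misclassification were nondifferential.

    r1, r0: true risks in the high and low categories
    prev:   true proportion of subjects in the high category
    se, sp: sensitivity and specificity, assumed equal for cases and noncases
    """
    # Measured-high group: correctly classified highs plus misclassified lows.
    high_cases = se * prev * r1 + (1 - sp) * (1 - prev) * r0
    high_size = se * prev + (1 - sp) * (1 - prev)
    # Measured-low group: misclassified highs plus correctly classified lows.
    low_cases = (1 - se) * prev * r1 + sp * (1 - prev) * r0
    low_size = (1 - se) * prev + sp * (1 - prev)
    return (high_cases / high_size) / (low_cases / low_size)


# Illustrative inputs only: a true RR of 5 with one-third of subjects truly high.
print(predicted_rr(r1=0.70, r0=0.14, prev=1 / 3, se=0.87, sp=0.93))
```

With Se = Sp = 1 the function returns the true relative risk; otherwise the prediction is pulled toward 1. Table 3 shows that when misclassification comes from nondifferential error in a continuous variable, the measured relative risks are less attenuated than such a prediction implies.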
Procedures such as those described by Kleinbaum et al. (4) and by Willett (7) can be used to correct measured relative risks for nondifferential misclassification. Table 3 presents the mean corrected relative risks that were calculated from the simulated data using the assumption of nondifferential misclassification. If misclassification were nondifferential, the corrected relative risks would be equal to the true relative risks. The mean corrected relative risks were higher than the mean true relative risks, and the greater the magnitude of measurement error, the more inaccurate the correction. For the highest relative risk and magnitude of measurement error, the mean corrected relative risk was more than double the true relative risk. The proportion of samples in which the corrected relative risk was greater than the true relative risk ranged from 83.5 percent to 100 percent.
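A standard textbook back-calculation (the "matrix method") corrects observed counts using a single sensitivity and specificity for cases and noncases. The sketch below is a minimal version for a cohort 2 × 2 table. It is offered as an illustration, not as the exact procedure of Kleinbaum et al. (4) or Willett (7), and the argument names and counts are hypothetical.

```python
def corrected_rr(cases_high, cases_total, noncases_high, noncases_total, se, sp):
    """Relative risk after correcting measured counts for misclassification,
    assuming the SAME sensitivity and specificity in cases and noncases."""
    k = se + sp - 1.0
    if k <= 0:
        raise ValueError("correction requires se + sp > 1")
    # Observed high = se * true_high + (1 - sp) * (total - true_high); solve for true_high.
    true_cases_high = (cases_high - (1.0 - sp) * cases_total) / k
    true_noncases_high = (noncases_high - (1.0 - sp) * noncases_total) / k
    true_cases_low = cases_total - true_cases_high
    true_noncases_low = noncases_total - true_noncases_high
    risk_high = true_cases_high / (true_cases_high + true_noncases_high)
    risk_low = true_cases_low / (true_cases_low + true_noncases_low)
    return risk_high / risk_low


# Hypothetical true table: cases 100 high / 100 low, noncases 200 high / 500 low (RR = 2.0).
# Applying se = 0.8 and sp = 0.9 to everyone, 90 cases and 210 noncases are measured as high.
print(corrected_rr(90, 200, 210, 700, se=0.8, sp=0.9))
```

The correction recovers the true relative risk only when sensitivity and specificity really are the same for cases and noncases. Table 2 shows that they differ when categories come from a continuous measurement, which is consistent with the overshooting corrected values in table 3.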
DISCUSSION
In this paper, we demonstrate the apparent paradox that nondifferential exposure measurement error (measurement error unrelated to disease status) can give rise to differential misclassification, in which misclassification probabilities differ by disease status. This phenomenon is likely to occur whenever exposure categories are formed from a continuous exposure variable with nondifferential measurement error but the probability of disease is a function of the continuous variable, rather than of the categories formed by grouping.
Our findings confirm and extend those recently reported by Wacholder et al. (1), who showed that blind assessment of exposure category does not necessarily ensure nondifferential misclassification. The differential misclassification that occurs when categories are formed from a continuous variable can arise in cohort studies, in which exposure is measured before disease onset, as well as in case-control studies.
TABLE 3. Mean ± standard deviation of true, measured, predicted, and corrected relative risks (RR), averaged over 200 simulations for each combination of expected relative risk and standard deviation (SD) of normally distributed measurement error
Quantity | SD = 100 | SD = 300 | SD = 500
Expected RR = 1.66
Mean true RR | 1.6 ± 0.15 | 1.7 ± 0.14 | 1.7 ± 0.14
Mean measured RR | 1.6 ± 0.15 | 1.4 ± 0.13 | 1.3 ± 0.10
Mean predicted RR | 1.5 ± 0.11 | 1.3 ± 0.06 | 1.2 ± 0.04
Mean corrected RR | 1.8 ± 0.21 | 2.0 ± 0.39 | 2.1 ± 0.56
Expected RR = 2.49
Mean true RR | 2.5 ± 0.23 | 2.5 ± 0.21 | 2.5 ± 0.25
Mean measured RR | 2.4 ± 0.20 | 1.8 ± 0.16 | 1.5 ± 0.14
Mean predicted RR | 2.1 ± 0.15 | 1.6 ± 0.08 | 1.4 ± 0.06
Mean corrected RR | 3.0 ± 0.33 | 3.5 ± 0.71 | 3.9 ± 3.00
Expected RR = 5.00
Mean true RR | 5.0 ± 0.50 | 5.0 ± 0.55 | 5.0 ± 0.53
Mean measured RR | 4.4 ± 0.42 | 2.7 ± 0.25 | 1.9 ± 0.19
Mean predicted RR | 3.5 ± 0.27 | 2.1 ± 0.14 | 1.7 ± 0.10
Mean corrected RR | 6.8 ± 1.00 | 10.8 ± 11.67 | 12.0 ± 7.44
Our results apply to any exposure categories formed from continuous variables. Even exposure categories that appear inherently categorical but bear some relation to an underlying continuous exposure measurement could potentially be affected by differential misclassification. Although, for simplicity, we confine our examples to situations with two categories, our results apply to any number of categories.
As shown in the Appendix, the degree and direction of differential misclassification are a function of the distribution of exposure, the definition of the exposure categories, the distribution of measurement error, and the relation between exposure and disease. Thus, for the same degree of measurement error and the same overall probabilities of misclassification, the probabilities of misclassification for persons with the disease and persons without the disease will vary with the form of the exposure-disease relation. For the specific examples we used, the effect of the resulting differential misclassification was to bias the estimated relative risk toward the null value. This will not always be the case: elsewhere we show examples of real data in which measurement error is nondifferential but the relative risk is biased away from the null value (11).
Predicting the effects of misclassification
There are a number of methods that may be used to predict the effects of nondifferential misclassification into ordered exposure categories on risk estimates or required sample size, including those described by Walker and Blettner (12) and by Marshall et al. (8). These and other methods that assume nondifferential misclassification have been applied to ordinal exposure categories formed from continuous variables with measurement error (13-15). However, our results show that methods that assume nondifferential misclassification do not in general correctly predict the effects of misclassification on risk estimates for categories formed from a continuous variable. Therefore, such methods are likely to give incorrect results for categories formed from dietary data or from other continuous exposure measurements and, in general, should not be applied to such categories.
In our examples, the bias in estimates of relative risk that arose from differential misclassification was less than the bias predicted by assuming nondifferential misclassification. The observation that in some circumstances misclassification has less effect than predicted implies that low estimates from imperfect methods may sometimes represent truly low relative risks, not high relative risks attenuated by misclassification. An additional implication is that the sample sizes needed to compensate for the loss of statistical power caused by misclassification may not always be as large as suggested.
Correcting for the effects of misclassification
Our results suggest that, when categories have been formed from a continuous variable, estimates of relative risk should not be corrected for misclassification using methods that assume nondifferential misclassification, such as those described by Kleinbaum et al. (4) and Willett (7). The assumption that random nondifferential error leads to nondifferential misclassification may result in "corrected" relative risks that are considerably higher than the true relative risks. Ironically, the greater the degree of measurement error, the more inaccurate the "correction," so that highly inaccurate measurements may yield "corrected" risks more than double the true risks.
Our results emphasize the point made by Marshall (16) that methods of correction and adjustment are crucially dependent upon a set of assumptions that may well not be correct, and that improving the quality of exposure measurements may be a better strategy than trying to correct for poor exposure measurements. Exposure measurements may have multiple types and sources of error. The effects of these errors are likely to be complex and not easily predicted, and correction for these errors is likely to be difficult.
Differential misclassification commonly unrecognized
Although the problems associated with differential misclassification are well known and well documented, it is commonly assumed that such misclassification arises only from obvious differences in measurement between persons with and without the disease. Our findings suggest that differential exposure misclassification is far more common than is usually recognized, particularly in cohort studies. Whenever exposure categories are explicitly or implicitly formed from a continuous exposure measurement, even purely random measurement error in the continuous variable may well give rise to differential misclassification. Failure to recognize the likelihood of differential misclassification may lead to incorrect assumptions about the effects of nondifferential measurement error on estimates of relative risk.
Differential misclassification is likely to arise from nondifferential measurement error in many situations commonly encountered in epidemiologic studies. Further research is necessary to delineate the expected effects of differential misclassification associated with different types of nondifferential measurement error, methods of forming categories, distributions of exposure, and relations of exposure to disease. Our results show a need for more careful and critical evaluations of the effects of different types of exposure measurement error.
REFERENCES
1. Wacholder S, Dosemeci M, Lubin JH. Blind assignment of exposure does not always prevent differential misclassification. Am J Epidemiol 1991;134:433-7.
2. Fleiss JL. Statistical methods for rates and proportions. 2nd ed. New York: John Wiley & Sons, Inc, 1981.
3. Schlesselman JJ. Case-control studies: design, conduct, analysis. New York: Oxford University Press, 1982.
4. Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiologic research: principles and quantitative methods. Belmont, CA: Lifetime Learning Publications, 1982.
5. Walker AM, Velema JP, Robins JM. Analysis of case-control data derived in part from proxy respondents. Am J Epidemiol 1988;127:905-14.
6. Dosemeci M, Wacholder S, Lubin JH. Does nondifferential misclassification of exposure always bias a true effect toward the null value? Am J Epidemiol 1990;132:746-8.
7. Willett W. Nutritional epidemiology. New York: Oxford University Press, 1990.
8. Marshall JR, Priore R, Graham S, et al. On the distortion of risk estimates in multiple exposure level case-control studies. Am J Epidemiol 1981;113:464-73.
9. SAS Institute, Inc. SAS user's guide: basics, version 5 ed. Cary, NC: SAS Institute, Inc, 1985.
10. Flegal KM, Brownie C, Haas JD. The effects of exposure misclassification on estimates of relative risk. Am J Epidemiol 1986;123:736-51.
11. Keyl PM, Flegal KM, Nieto-Garcia FJ. Effects of using self-reported versus measured weight and height in epidemiologic analyses. (Abstract). Am J Epidemiol 1991;134:733-4.
12. Walker AM, Blettner M. Comparing imperfect measures of exposure. Am J Epidemiol 1985;121:783-90.
13. Van Staveren WA, Burema J, Deurenberg P, et al. Weak associations in nutritional epidemiology: the importance of replication of observations on individuals. Int J Epidemiol 1988;17(suppl):964-9.
14. Freudenheim JL, Johnson NE, Wardrop RL. Nutrient misclassification: bias in the odds ratio and loss of power in the Mantel test for trend. Int J Epidemiol 1989;18:232-8.
15. Hartman AM, Brown CC, Palmgren J, et al. Variability in nutrient and food intakes among older middle-aged men: implications for design of epidemiologic and validation studies using food recording. Am J Epidemiol 1990;132:999-1012.
16. Marshall JR. The use of dual or multiple reports in epidemiologic studies. Stat Med 1989;8:1041-9.
APPENDIX
Differential misclassification from nondifferential measurement error
We now show that misclassification may be nondifferential at a detailed level of exposure but differential for aggregate levels of exposure. For a given population, let $\Pr(E_i)$ denote the probability that a randomly selected individual has a true exposure value $E_i$. Let $\Pr(D \mid E_i)$ denote the probability that a randomly selected individual with a true exposure value $E_i$ has the disease. For the sake of simplicity, we assume no confounding. Define an exposure category $C$ as consisting of all true exposure values $E_i$ within a specified range. Let $\Pr(M \mid E_i \in C)$ denote the probability that a randomly selected individual with true exposure value $E_i$ within the range of $C$ is misclassified out of category $C$. Let the probability of exposure misclassification for any true exposure value $E_i$ be independent of the probability of disease (nondifferential), so that:
$$\Pr(M \cap D \mid E_i \in C) = \Pr(M \mid E_i \in C) \times \Pr(D \mid E_i) \qquad (1)$$
By the definition of conditional probability, the average probability of disease among those with true exposure $E_i$ within the exposure category $C$, $\Pr(D_C)$, is the weighted average of the probabilities of disease for the true exposure values within the category, weighted by the probability of occurrence of each exposure value within the category:

$$\Pr(D_C) = \frac{\sum_{E_i \in C} \Pr(D \cap E_i)}{\sum_{E_i \in C} \Pr(E_i)} = \frac{\sum_{E_i \in C} \Pr(D \mid E_i)\,\Pr(E_i)}{\sum_{E_i \in C} \Pr(E_i)} \qquad (2)$$
Similarly, the average probability of being both misclassified and diseased given an exposure value within the category $C$, $\Pr(M_C \cap D_C)$, is the weighted average of the probabilities of being both diseased and misclassified for the true exposure values within the category, weighted by the probability of occurrence of each true exposure value within the category:

$$\Pr(M_C \cap D_C) = \frac{\sum_{E_i \in C} \Pr(M \cap D \mid E_i)\,\Pr(E_i)}{\sum_{E_i \in C} \Pr(E_i)} \qquad (3)$$
By the definition of conditional probability, the average probability within the category $C$ of being misclassified given disease is:

$$\Pr(M_C \mid D_C) = \frac{\Pr(M_C \cap D_C)}{\Pr(D_C)} \qquad (4)$$
APPENDIX TABLE 1. Fourfold tables showing cell numbers with true exposure and with measured exposure

                     True exposure             Measured exposure
                     High    Low     Total     High    Low     Total
Disease present      n11     n12     n1.       n*11    n*12    n*1.
Disease absent       n21     n22     n2.       n*21    n*22    n*2.
Totals               n.1     n.2     N         n*.1    n*.2    N
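From equations 2-4, using equation 1 to factor the joint probability, the probability within the category of being misclassified for those with the disease may be re-expressed as:

$$\Pr(M_C \mid D_C) = \frac{\sum_{E_i \in C} \Pr(D \mid E_i)\,\Pr(M \mid E_i)\,\Pr(E_i)}{\sum_{E_i \in C} \Pr(D \mid E_i)\,\Pr(E_i)} \qquad (5)$$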
Similar reasoning shows that the probability within the category of being misclassified for those without the disease ($\bar{D}_C$) may be expressed as:

$$\Pr(M_C \mid \bar{D}_C) = \frac{\sum_{E_i \in C} [1 - \Pr(D \mid E_i)]\,\Pr(M \mid E_i)\,\Pr(E_i)}{\sum_{E_i \in C} [1 - \Pr(D \mid E_i)]\,\Pr(E_i)} \qquad (6)$$
The right-hand sides of equations 5 and 6 will be identical (showing that the expected misclassification rates are nondifferential) if the probability of misclassification is constant over all values of $E_i$ within the category, if the probability of disease is constant over all values of $E_i$ within the category, or both.

However, if neither the probability of misclassification nor the probability of disease is constant over all values of $E_i$ within the category, then the probability within the category of being misclassified for those with the disease will not in general be the same as the probability within the category of being misclassified for those without the disease. In other words, misclassification will, in general, be differential under these conditions.
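A small numerical check may make this argument concrete. The Python sketch below evaluates the right-hand sides of equations 5 and 6 for a category containing a handful of discrete true exposure values. The function name and the probabilities in the usage lines are hypothetical, chosen only so that disease risk rises across the category while misclassification is concentrated near the category's lower boundary.

```python
from typing import Sequence, Tuple


def within_category_misclassification(
    p_e: Sequence[float], p_d: Sequence[float], p_m: Sequence[float]
) -> Tuple[float, float]:
    """Return (Pr(M_C | D_C), Pr(M_C | not D_C)) per equations 5 and 6.

    Each list is indexed by the true exposure values E_i in category C:
      p_e[i] = Pr(E_i), p_d[i] = Pr(D | E_i), p_m[i] = Pr(M | E_i).
    """
    # Equation 5: average of Pr(M | E_i) weighted by Pr(D | E_i) Pr(E_i)
    num_d = sum(e * d * m for e, d, m in zip(p_e, p_d, p_m))
    den_d = sum(e * d for e, d in zip(p_e, p_d))
    # Equation 6: same average weighted by (1 - Pr(D | E_i)) Pr(E_i)
    num_nd = sum(e * (1 - d) * m for e, d, m in zip(p_e, p_d, p_m))
    den_nd = sum(e * (1 - d) for e, d in zip(p_e, p_d))
    return num_d / den_d, num_nd / den_nd


if __name__ == "__main__":
    p_e = [0.25, 0.50, 0.25]  # hypothetical Pr(E_i) within the category
    p_d = [0.10, 0.20, 0.40]  # hypothetical risk, rising with exposure
    p_m = [0.40, 0.10, 0.02]  # hypothetical Pr(M | E_i), highest near the cutpoint
    print(within_category_misclassification(p_e, p_d, p_m))
    # Constant misclassification probability: the two rates coincide
    print(within_category_misclassification(p_e, p_d, [0.1, 0.1, 0.1]))
```

With these illustrative inputs, diseased persons are misclassified less often than nondiseased persons (about 0.10 versus 0.17), although the error mechanism at each $E_i$ ignores disease status. Making every entry of `p_m`, or every entry of `p_d`, the same value equalizes the two rates, as stated above.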
For many types of measurement error, the probability of misclassification will not be invariant over all values of $E_i$. In general, if the probability density of measured values of exposure ($x$) for a given true exposure value $E_i$ is some function $f(x \mid E_i)$, then for a category $C = [a, b]$ the probability of misclassification conditional on the true value $E_i$ is given by:

$$\Pr(M \mid E_i \in C) = 1 - \int_a^b f(x \mid E_i)\,dx \qquad (7)$$
For example, if measurement error is additive and normally distributed with mean $\mu = 0$ and variance $\sigma^2$, then for a true exposure value $E_i$ measured exposure values will be normally distributed with mean $\mu = E_i$ and variance $\sigma^2$, and the probability of misclassification conditional on the true value $E_i$ will be:

$$\Pr(M \mid E_i \in C) = 1 - \int_a^b \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x - E_i)^2}{2\sigma^2}\right] dx \qquad (8)$$
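Because the integrand in equation 8 is a normal density, the integral equals $\Phi((b - E_i)/\sigma) - \Phi((a - E_i)/\sigma)$, where $\Phi$ is the standard normal distribution function. A minimal Python sketch of this closed form follows, using the standard library's `math.erf`; the function names, the cutpoint of 30, and the error standard deviation of 1.5 in the usage lines are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_misclassified(true_value: float, a: float, b: float, sigma: float) -> float:
    """Equation 8: Pr(M | E_i in C) for C = [a, b] with additive N(0, sigma^2) error.

    Open-ended categories can be passed with a = -math.inf or b = math.inf.
    """
    inside = normal_cdf((b - true_value) / sigma) - normal_cdf((a - true_value) / sigma)
    return 1.0 - inside


if __name__ == "__main__":
    # Hypothetical "high" category [30, infinity) with error SD 1.5:
    # misclassification is most likely for true values just above the cutpoint.
    for e in (30.0, 31.5, 33.0, 36.0):
        print(e, round(prob_misclassified(e, 30.0, math.inf, 1.5), 4))
```

The output falls steadily as the true value moves away from the cutpoint, which is exactly the non-constancy of $\Pr(M \mid E_i)$ within a category that makes equations 5 and 6 differ.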
Formulas used to calculate quantities in tables 2 and 3
Each simulation generated N individuals, with attributes for disease (present or absent),
true exposure category (high or low), and measured exposure category (high or low).
Individuals were cross-classified according to disease and true exposure, as shown in the left portion of appendix table 1, and were also cross-classified according to disease and measured exposure, as shown in the right portion of the same table. The following quantities were calculated
individually for each simulation and averaged over simulations to give the mean values
presented in tables 2 and 3:
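True relative risk ($R$) = $(n_{11}/n_{.1})/(n_{12}/n_{.2})$
Measured relative risk = $(n^*_{11}/n^*_{.1})/(n^*_{12}/n^*_{.2})$
Sensitivity ($U$) = $n'_{.1}/n_{.1}$
Sensitivity for persons with the disease = $n'_{11}/n_{11}$
Sensitivity for persons without the disease = $n'_{21}/n_{21}$
Specificity ($V$) = $n'_{.2}/n_{.2}$
Specificity for persons with the disease = $n'_{12}/n_{12}$
Specificity for persons without the disease = $n'_{22}/n_{22}$
Prevalence of exposure ($E$) = $n_{.1}/N$
Predicted relative risk assuming nondifferential misclassification = $\dfrac{[URE + (1 - V)(1 - E)]\,[(1 - U)E + V(1 - E)]}{[UE + (1 - V)(1 - E)]\,[(1 - U)RE + V(1 - E)]}$
Corrected cell numbers assuming nondifferential misclassification (for each disease row $i = 1, 2$): $n^c_{i1} = [V n^*_{i1} - (1 - V) n^*_{i2}]/(U + V - 1)$ and $n^c_{i2} = [U n^*_{i2} - (1 - U) n^*_{i1}]/(U + V - 1)$
Corrected relative risk assuming nondifferential misclassification = $(n^c_{11}/n^c_{.1})/(n^c_{12}/n^c_{.2})$
Here $n'_{ij}$ denotes the number of individuals in true-exposure cell $ij$ of appendix table 1 whose measured exposure category agreed with their true exposure category.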
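The simulation and the quantities listed above can be sketched in Python. This is not the authors' program; the standard normal exposure distribution, the logistic exposure-disease relation, the cutpoint, the error standard deviation, and all function names are hypothetical choices for illustration. The sketch also assumes that $n'_{ij}$ counts the individuals in true-exposure cell $ij$ whose measured category agrees with their true category, which is what the sensitivity and specificity formulas require. Only one data set is generated here; averaging over simulations, as described above, would wrap `summarize(*simulate(...))` in a loop over seeds.

```python
import math
import random
from typing import Dict, List, Tuple

# Appendix table 1 layout: row 0 = disease present, row 1 = disease absent;
# column 0 = high exposure, column 1 = low exposure.
Table = List[List[float]]


def relative_risk(t: Table) -> float:
    """(n11 / n.1) / (n12 / n.2): risk in the high category over risk in the low."""
    return (t[0][0] / (t[0][0] + t[1][0])) / (t[0][1] / (t[0][1] + t[1][1]))


def predicted_rr(R: float, U: float, V: float, E: float) -> float:
    """Measured RR expected under nondifferential misclassification."""
    num = (U * R * E + (1 - V) * (1 - E)) * ((1 - U) * E + V * (1 - E))
    den = (U * E + (1 - V) * (1 - E)) * ((1 - U) * R * E + V * (1 - E))
    return num / den


def corrected_row(high: float, low: float, U: float, V: float) -> Tuple[float, float]:
    """Back-calculate one disease row, assuming nondifferential misclassification."""
    k = U + V - 1  # the correction is undefined when U + V = 1
    return (V * high - (1 - V) * low) / k, (U * low - (1 - U) * high) / k


def simulate(n: int, cutpoint: float, err_sd: float, seed: int = 1):
    """Hypothetical setup: N(0, 1) true exposure, logistic risk, N(0, err_sd^2) error."""
    rng = random.Random(seed)
    true = [[0, 0], [0, 0]]
    meas = [[0, 0], [0, 0]]
    agree = [[0, 0], [0, 0]]  # n'_ij: true cell ij, measured category = true category
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        risk = 1.0 / (1.0 + math.exp(3.0 - x))  # assumed exposure-disease relation
        row = 0 if rng.random() < risk else 1
        col = 0 if x > cutpoint else 1
        # The error is drawn without reference to disease: nondifferential error
        mcol = 0 if x + rng.gauss(0.0, err_sd) > cutpoint else 1
        true[row][col] += 1
        meas[row][mcol] += 1
        if mcol == col:
            agree[row][col] += 1
    return true, meas, agree


def summarize(true: Table, meas: Table, agree: Table) -> Dict[str, float]:
    """The appendix quantities, computed for a single simulated data set."""
    n = sum(map(sum, true))
    high, low = true[0][0] + true[1][0], true[0][1] + true[1][1]
    U = (agree[0][0] + agree[1][0]) / high
    V = (agree[0][1] + agree[1][1]) / low
    E = high / n
    R = relative_risk(true)
    corrected = [list(corrected_row(meas[i][0], meas[i][1], U, V)) for i in (0, 1)]
    return {
        "true_rr": R,
        "measured_rr": relative_risk(meas),
        "sens_diseased": agree[0][0] / true[0][0],
        "sens_nondiseased": agree[1][0] / true[1][0],
        "spec_diseased": agree[0][1] / true[0][1],
        "spec_nondiseased": agree[1][1] / true[1][1],
        "predicted_rr": predicted_rr(R, U, V, E),
        "corrected_rr": relative_risk(corrected),
    }


if __name__ == "__main__":
    results = summarize(*simulate(20000, cutpoint=0.0, err_sd=0.5))
    for name, value in results.items():
        print(f"{name}: {value:.3f}")
```

Because the error term is generated without reference to disease status, the measurement error is nondifferential by construction; comparing `sens_diseased` with `sens_nondiseased` (and the two specificities) shows whether the resulting misclassification is, and comparing `predicted_rr` and `corrected_rr` with `measured_rr` and `true_rr` mirrors the comparisons made in tables 2 and 3.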