Am J Epidemiol 2003;158:14–21
American Journal of Epidemiology
Copyright © 2003 by the Johns Hopkins Bloomberg School of Public Health
All rights reserved
Vol. 158, No. 1
Printed in U.S.A.
Structure of Dietary Measurement Error: Results of the OPEN Biomarker Study
Victor Kipnis1, Amy F. Subar2, Douglas Midthune1, Laurence S. Freedman3,4, Rachel Ballard-
Barbash2, Richard P. Troiano2, Sheila Bingham5, Dale A. Schoeller6, Arthur Schatzkin7, and
Raymond J. Carroll8
1 Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Bethesda, MD.
2 Applied Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD.
3 Department of Mathematics, Statistics and Computer Science, Bar Ilan University, Ramat Gan, Israel.
4 Gertner Institute for Epidemiology and Health Policy Research, Tel Hashomer, Israel.
5 Medical Research Council, Dunn Human Nutrition Unit, Cambridge, United Kingdom.
6 Department of Nutritional Sciences, University of Wisconsin, Madison, WI.
7 Nutritional Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD.
8 Department of Statistics, Texas A&M University, College Station, TX.
Received for publication December 26, 2001; accepted for publication December 3, 2002.
Multiple-day food records or 24-hour dietary recalls (24HRs) are commonly used as “reference” instruments to
calibrate food frequency questionnaires (FFQs) and to adjust findings from nutritional epidemiologic studies for
measurement error. Correct adjustment requires that the errors in the adopted reference instrument be
independent of those in the FFQ and of true intake. The authors report data from the Observing Protein and
Energy Nutrition (OPEN) Study, conducted from September 1999 to March 2000, in which valid reference
biomarkers for energy (doubly labeled water) and protein (urinary nitrogen), together with a FFQ and 24HR, were
observed in 484 healthy volunteers from Montgomery County, Maryland. Accounting for the reference
biomarkers, the data suggest that the FFQ leads to severe attenuation in estimated disease relative risks for
absolute protein or energy intake (a true relative risk of 2 would appear as 1.1 or smaller). For protein adjusted
for energy intake by using either nutrient density or nutrient residuals, the attenuation is less severe (a relative
risk of 2 would appear as approximately 1.3), lending weight to the use of energy adjustment. Using the 24HR as
a reference instrument can seriously underestimate true attenuation (up to 60% for energy-adjusted protein).
Results suggest that the interpretation of findings from FFQ-based epidemiologic studies of diet-disease
associations needs to be reevaluated.
bias (epidemiology); biological markers; diet; energy intake; epidemiologic methods; nutrition assessment;
questionnaires; reference values
Abbreviations: DLW, doubly labeled water; FFQ, food frequency questionnaire; OPEN, Observing Protein and Energy Nutrition;
24HR, 24-hour dietary recall.
Much of the recent literature on the relation between diet
and cancer has been based on analytic epidemiologic studies
using food frequency questionnaires (FFQs). A number of
large prospective studies of this kind have failed to find a
consistent relation between dietary components (such as fat,
fiber, and fruits and vegetables) and cancers of the breast,
colon, or rectum (1–3), which may be explained by a true
lack of diet-cancer associations or, alternatively, by serious
methodological limitations of the studies themselves, espe-
cially due to FFQ measurement error.
Over the years, investigators have recognized that the
reported values from FFQs are subject to substantial error,
both systematic and random, that can profoundly affect the
design, analysis, and interpretation of nutritional epidemiologic
studies (4–6). Dietary measurement error often attenuates
(biases toward one) the estimates of disease relative
risks and reduces statistical power to detect their significance.
Therefore, an important relation between diet and
disease may be obscured.
Reprint requests to Dr. Victor Kipnis, Biometry Research Group, Division of Cancer Prevention, National Cancer Institute, Executive Plaza
North, Room 3124, 6130 Executive Boulevard, MSC 7354, Bethesda, MD 20892-7354 (e-mail: firstname.lastname@example.org).
This problem has prompted researchers involved in large
epidemiologic investigations to integrate calibration
substudies that include a more intensive, but presumably
more accurate, reference method, typically multiple-day
food records (7) or multiple 24-hour dietary recalls (24HRs)
(8). Comparing reference measurements with those from the
FFQ enables adjustment for attenuation by using the regres-
sion calibration approach (7). However, the correct applica-
tion of this approach requires that the adopted reference
instrument satisfy two critical conditions. Although it may
be imperfect and contain measurement error, this error
should be independent of 1) true intake and 2) error in the
FFQ (9). Throughout this paper, we take these two condi-
tions as requirements for a valid reference instrument.
A great deal of accumulated evidence suggests that
common dietary report reference instruments are unlikely to
meet these requirements. Studies with the few biomarkers of
dietary intake that do qualify as valid reference measure-
ments (“reference” biomarkers), such as doubly labeled
water (DLW) for total energy expenditure and urinary
nitrogen for protein intake, demonstrate serious systematic
biases in all dietary report instruments that may be poten-
tially related (10–16). This has led to proposals for new
models of dietary measurement error that might explain why
the large prospective studies fail to find a relation between
diet and cancer, even were an important relation to exist (9,
For example, Kipnis et al. (9) considered two potential
systematic components of dietary measurement error. The
first component reflects correlation between error and true
intake (“intake-related” bias). The second component
(“person-specific” bias) is independent of true intake and
represents the difference between total within-person bias
and its intake-related component. The existence of
person-specific biases was proposed in all dietary report
instruments, and a sensitivity analysis demonstrated that
correlation between person-specific biases in the FFQ and
the reference instrument, if ignored, could lead to serious
underestimation of the degree of attenuation in a conven-
tional calibration study. In a subsequent paper, Kipnis et al.
(18) provided empirical evidence directly supporting their
hypothesis, based on the results from a validation study that
included the urinary nitrogen reference biomarker for
protein intake. Moreover, based on the urinary nitrogen data,
the measurement error model was extended to also include
intake-related bias in dietary report reference instruments
and was shown to fit the data statistically significantly better
than other proposed models.
In this paper, we take this further by analyzing data from
the Observing Protein and Energy Nutrition (OPEN) Study
that included reference biomarkers for protein (urinary
nitrogen) and energy (DLW) intakes, together with a FFQ
and a 24HR. This study enabled us to evaluate not only abso-
lute protein intake but also total energy and energy-adjusted
protein intakes (19). We were therefore able to investigate
the conjecture that energy adjustment substantially reduces
measurement error in reported intake and that remaining
error can be reliably corrected for by the common approach
MATERIALS AND METHODS
Effect of measurement error
The effects of dietary measurement error on the estimation
of disease risks are well known (9). The most important
concept is that of attenuation. Consider the disease model
R(D|T) = α0 + α1T, (1)
where R(D|T) denotes the risk of disease D on an appropriate
scale (e.g., logistic) and T is true habitual intake of a given
nutrient, also measured on an appropriate scale. The slope α1
represents an association between the nutrient intake and
disease (e.g., log relative risk). In practice, FFQ-reported
intake Q is used instead of unknown true intake T. We
assume throughout that dietary measurement error is nondif-
ferential with respect to disease D; that is, reported intake
contributes no additional information about disease risk
beyond that provided by true intake. To an excellent approximation,
fitting model 1 to reported intake leads to estimating
not the true risk parameter α1 but the product λ1α1 of
α1 and the slope λ1 in the linear regression calibration model,
T = λ0 + λ1Q + ξ, where ξ denotes random error.
In nutritional studies, the value of λ1 is usually between 0
and 1 (21), so dietary measurement error leads to underesti-
mation of the true risk parameter. This underestimation is
called attenuation, and λ1 is called the attenuation factor.
Values of λ1 closer to zero lead to more serious underestimation
of risk. For example, a true relative risk of 2 would
appear as 2^0.4 = 1.32 if the attenuation factor were 0.4 and as
2^0.2 = 1.15 if the attenuation factor were 0.2.
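The arithmetic behind these examples can be checked with a short sketch. It assumes, as in the text, that the slope α1 is a log relative risk, so that the observed relative risk equals the true relative risk raised to the power λ1. The function name `observed_relative_risk` is ours and is used for illustration only.

```python
def observed_relative_risk(true_rr: float, attenuation: float) -> float:
    """Relative risk expected when the disease model is fitted to error-prone intake.

    Assumes the log relative risk is multiplied by the attenuation factor,
    so the relative risk itself is raised to that power.
    """
    return true_rr ** attenuation


for lam in (0.4, 0.2):
    rr = observed_relative_risk(2.0, lam)
    print(f"attenuation factor {lam}: observed relative risk {rr:.2f}")
```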
Measurement error also leads to loss of statistical
power for testing disease-exposure associations. Approximately,
the sample size required to reach the desired
statistical power to detect a given risk is proportional to
$$
\frac{1}{\rho^2(Q,T)\,\sigma_T^2} = \frac{1}{\lambda_1^2\,\sigma_Q^2},
$$
where ρ(Q,T) is the correlation between the reported and true intakes and
σ²Q and σ²T are the between-person variances of the reported and
true intakes, respectively (21). In particular, for a given FFQ,
the required sample size is inversely proportional to the
squared attenuation factor, λ1². For example, if the true attenuation
factor were 0.2, the sample size, calculated by
assuming that λ1 = 0.4, should be multiplied by 0.4²/0.2² = 4 to
achieve the nominal power.
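As a minimal sketch of this sample size argument, the following function returns the factor by which a planned sample size must be multiplied when the attenuation factor assumed at the design stage differs from the true one. It relies only on the stated inverse proportionality to the squared attenuation factor for a given FFQ; the function name is hypothetical.

```python
def sample_size_multiplier(assumed_attenuation: float, true_attenuation: float) -> float:
    """Factor by which a planned sample size must be multiplied.

    Uses the relation that, for a given FFQ, the required sample size is
    inversely proportional to the squared attenuation factor.
    """
    return (assumed_attenuation / true_attenuation) ** 2


# Design assumed an attenuation factor of 0.4, but the true value is 0.2.
print(sample_size_multiplier(0.4, 0.2))
```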
Estimation of the attenuation factor
Estimation of the attenuation factor λ1 requires collecting
additional reference measurements to compare with the FFQ
in the calibration substudy (9). The common approach in
nutritional epidemiology uses a more intensive dietary report
method as the reference instrument, assuming that it is unbi-
ased at the individual level and that its errors are independent
of those in the FFQ (7). In this paper, we contrast this model
with the measurement error model of Kipnis et al. (18) that
specifies the same general error structure in the dietary
report reference instrument (F) as the one for the FFQ (Q).
To be fully identifiable, the model requires data from a reference
biomarker. The model is specified as
$$
\begin{aligned}
Q_{ij} &= \mu_{Qj} + \beta_{Q0} + \beta_{Q1} T_i + r_i + \varepsilon_{ij},\\
F_{ij} &= \mu_{Fj} + \beta_{F0} + \beta_{F1} T_i + s_i + u_{ij},\\
M_{ij} &= \mu_{Mj} + T_i + \upsilon_{ij},
\end{aligned}
\qquad (2)
$$
where Qij, Fij, and Mij denote the FFQ, 24HR, and biomarker
measurements for person i on occasion j; µQj, µFj, and µMj are
time-specific group intercepts for
the FFQ, 24HR, and biomarker, respectively, which sum to
zero over j; βQ0 and βF0 are the overall group intercepts for
the FFQ and 24HR; βQ1 and βF1 are the slopes reflecting
intake-related bias for the FFQ and 24HR; ri and si are
person-specific biases for the FFQ and 24HR that are independent
of true intake Ti, have means zero and variances
σ²r and σ²s, respectively, and are correlated with the correlation
coefficient ρrs; and εij, uij, and υij are within-person random
errors for the FFQ, 24HR, and biomarker, with means zero
and variances σ²ε, σ²u, and σ²υ, respectively, that are assumed
to be independent of each other and of other terms in the
model, except that “within-pair” errors (εij, uij), (εij, υij), and
(uij, υij) are allowed to be correlated if the corresponding
measurements are taken contemporaneously.
In the presence of the reference biomarker, model 2 does
not require an instrument F to estimate the error components
in the FFQ. However, its inclusion enables us to additionally
analyze the error structure of the dietary report reference
instrument and its relation to that in the FFQ.
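The common model may be obtained from model 2 by
ignoring information from the reference biomarker and
assuming that the dietary report instrument F contains no
intake-related bias (βF1 = 1) or person-specific bias
(σ²s = 0). We use the following general form of this model:
$$
\begin{aligned}
Q_{ij} &= \mu_{Qj} + \beta_{Q0} + \beta_{Q1} T_i + r_i + \varepsilon_{ij},\\
F_{ij} &= \mu_{Fj} + T_i + u_{ij}.
\end{aligned}
\qquad (3)
$$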
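When the model parameters are used, the attenuation
factor is expressed as
$$
\lambda_1 = \frac{\operatorname{cov}(T,Q)}{\operatorname{var}(Q)}
          = \frac{\beta_{Q1}\sigma_T^2}{\beta_{Q1}^2\sigma_T^2 + \sigma_r^2 + \sigma_\varepsilon^2},
\qquad (4)
$$
and the correlation of the FFQ and true intake is given by
$$
\rho(Q,T) = \frac{\operatorname{cov}(T,Q)}{\sqrt{\operatorname{var}(T)\operatorname{var}(Q)}}
          = \frac{\beta_{Q1}\sigma_T}{\sqrt{\beta_{Q1}^2\sigma_T^2 + \sigma_r^2 + \sigma_\varepsilon^2}},
\qquad (5)
$$
where σ²T denotes the between-person variance of true intake.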
Both are estimated by replacing the parameters by their
estimates based on the corresponding model 2 or 3. Doing so
is essentially equivalent to adjusting for random measure-
ment error in the adopted reference instrument.
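To illustrate how the attenuation factor and the correlation follow from the model parameters, the sketch below evaluates cov(T,Q)/var(Q) and cov(T,Q)/sqrt(var(T)var(Q)) for a single FFQ administration. It assumes, as our reading of the model, that var(Q) is the sum of the intake-related part βQ1²σ²T, the person-specific bias variance, and the within-person error variance. The function name is hypothetical, and the inputs are the rounded biomarker-based estimates for energy intake in women from table 2 (divided by 100).

```python
import math


def ffq_attenuation_and_correlation(beta_q1, var_t, var_r, var_eps):
    """Attenuation factor and FFQ/true-intake correlation for one FFQ administration.

    Assumes person-specific bias and within-person error are independent
    of true intake and of each other.
    """
    cov_tq = beta_q1 * var_t
    var_q = beta_q1 ** 2 * var_t + var_r + var_eps
    attenuation = cov_tq / var_q
    correlation = cov_tq / math.sqrt(var_t * var_q)
    return attenuation, correlation


# Rounded table 2 estimates, energy intake, women, biomarker-based model.
lam, rho = ffq_attenuation_and_correlation(
    beta_q1=0.24, var_t=0.024, var_r=0.112, var_eps=0.039
)
print(f"attenuation factor: {lam:.3f}, correlation: {rho:.3f}")
```

With these rounded inputs, the sketch gives values close to the reported 0.039 and 0.098; small discrepancies are expected because the published parameter estimates are rounded.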
The OPEN data
The OPEN Study was conducted by the National Cancer
Institute from September 1999 to March 2000. The recruit-
ment procedure, subject characteristics, and detailed study
conduct are described in the companion paper in this issue of
the Journal (22). Briefly, 261 male and 223 female partici-
pants aged 40–69 years were healthy volunteers from Mont-
gomery County, Maryland. Each participant was asked to
complete a FFQ and a 24HR on two occasions. The FFQ was
completed within 2 weeks of visit 1 and then approximately
3 months later, within a few weeks of visit 3. The 24HR was
completed at visit 1 and then approximately 3 months later at
visit 3. Participants received their DLW dose at visit 1 and
returned 2 weeks later (visit 2) to complete the DLW assess-
ment. In addition, repeat DLW measurements were collected
from 14 male and 11 female volunteers who received their
second DLW dose at the end of visit 2 and returned 2 weeks
later to complete their DLW assessment. Participants
provided two 24-hour urine collections during the 2-week
period between visit 1 and visit 2, verified for completeness
by using the PABAcheck method (23). Since approximately
81 percent of nitrogen intake is excreted through the urine
(18) and nitrogen constitutes 16 percent of protein, the
urinary nitrogen values were adjusted, by dividing by 0.81
and multiplying by 6.25, to estimate individual protein intake.
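The conversion just described amounts to a one-line calculation, sketched here. The constants come from the text (81 percent urinary excretion; nitrogen as 16 percent of protein, hence the factor 6.25); the function name and the example value of 12 g of urinary nitrogen per day are hypothetical.

```python
URINARY_FRACTION_OF_NITROGEN_INTAKE = 0.81  # share of nitrogen intake excreted in urine
PROTEIN_PER_GRAM_NITROGEN = 6.25            # nitrogen is 16 percent of protein (1 / 0.16)


def protein_from_urinary_nitrogen(urinary_nitrogen_g: float) -> float:
    """Estimate protein intake (g/day) from 24-hour urinary nitrogen (g/day)."""
    nitrogen_intake_g = urinary_nitrogen_g / URINARY_FRACTION_OF_NITROGEN_INTAKE
    return nitrogen_intake_g * PROTEIN_PER_GRAM_NITROGEN


# Hypothetical participant excreting 12 g of urinary nitrogen per day.
print(f"{protein_from_urinary_nitrogen(12.0):.1f} g protein/day")
```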
The adopted FFQ was the Diet History Questionnaire,
developed and evaluated at the National Cancer Institute
(24–28). The 24HR was a highly standardized version using
the five-pass method, developed by the US Department of
Agriculture for use in national dietary surveillance (29).
Throughout, we applied the logarithmic transformation to
energy and protein to make measurement error in the DLW
and urinary nitrogen biomarkers additive and homoscedastic
and to better approximate normality. In addition to total
energy and protein, the reference biomarkers in the OPEN
Study enabled us to evaluate dietary measurement error for
energy-adjusted protein intake. Because modeling relations
between disease and multiple covariates measured with error
is beyond the scope of this paper, we assumed that model 1
included only energy-adjusted exposure and that energy was
not related to disease. We used two energy adjustment
methods: nutrient density and nutrient residual (19). Protein
density was calculated as the percentage of energy from pro-
tein sources and was then log transformed. The protein
residual was calculated from the linear regression of protein
on energy intake on the log scale. Both protein density and
residual were calculated for each instrument by using the
protein and energy intakes as measured by this instrument.
The convention used for dealing with biomarker-based
derived measures is explained in the Appendix.
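The two energy adjustment methods can be sketched as follows for a single instrument. This is an illustration under stated assumptions, not the authors' code: the factor of 4 kcal per gram of protein is our assumption (the text does not give it), the residual is taken from an ordinary least-squares fit of log protein on log energy, and the intake values are hypothetical.

```python
import math

KCAL_PER_G_PROTEIN = 4.0  # assumed conversion factor; not stated in the text


def log_protein_density(protein_g, energy_kcal):
    """Log of the percentage of energy supplied by protein."""
    percent_energy = 100.0 * KCAL_PER_G_PROTEIN * protein_g / energy_kcal
    return math.log(percent_energy)


def log_protein_residuals(protein_g, energy_kcal):
    """Residuals from least-squares regression of log protein on log energy."""
    x = [math.log(e) for e in energy_kcal]
    y = [math.log(p) for p in protein_g]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical reported intakes for four people (g/day and kcal/day).
protein = [60.0, 85.0, 110.0, 70.0]
energy = [1800.0, 2400.0, 3000.0, 2100.0]

densities = [log_protein_density(p, e) for p, e in zip(protein, energy)]
residuals = log_protein_residuals(protein, energy)
print(densities)
print(residuals)
```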
For all dietary variables, we excluded extreme outlying
values that fell outside the interval given by the 25th percen-
tile minus twice the interquartile range to the 75th percentile
plus twice the interquartile range. For each variable and each
instrument, no more than six outlying values for men and
four for women were excluded from the analyses.
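The exclusion rule can be expressed as a short sketch. The quartile definition is an assumption, since the text does not say how percentiles were computed; here the default method of Python's `statistics.quantiles` is used, and the log-scale values are hypothetical.

```python
import statistics


def outlier_bounds(values, multiplier=2.0):
    """Interval from Q1 - multiplier*IQR to Q3 + multiplier*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


def exclude_outliers(values, multiplier=2.0):
    """Keep only values inside the outlier bounds."""
    low, high = outlier_bounds(values, multiplier)
    return [v for v in values if low <= v <= high]


# Hypothetical log-transformed energy intakes with one extreme value.
log_energy = [7.0, 7.2, 7.4, 7.5, 7.6, 7.7, 7.9, 8.0, 12.0]
kept = exclude_outliers(log_energy)
print(kept)
```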
The estimates of the model parameters and their standard
errors were obtained by using the method of maximum like-
lihood under the assumption of normality of the random
terms in the models. Standard errors were checked for accu-
racy by using the bootstrap method. Comparisons of corre-
lated parameters (such as attenuation factors estimated by
two models) were performed by comparing the ratios of their
differences to the standard errors of the differences calculated
by the bootstrap method with the standardized normal distribution.
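A comparison of this kind can be sketched as follows, assuming that the bootstrap standard error of the difference is the standard deviation of the bootstrap replicates of that difference and that a two-sided test is wanted. The function name and all numbers are hypothetical; the replicates would in practice come from refitting both models to each bootstrap sample.

```python
import math
import statistics


def compare_correlated_estimates(estimate_a, estimate_b, bootstrap_differences):
    """Z statistic and two-sided p value for the difference of two estimates.

    The standard error of the difference is taken as the standard deviation
    of bootstrap replicates of (estimate_a - estimate_b).
    """
    se_difference = statistics.stdev(bootstrap_differences)
    z = (estimate_a - estimate_b) / se_difference
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical estimates and a tiny set of bootstrap replicates of the difference.
z, p = compare_correlated_estimates(0.50, 0.30, [0.1, 0.2, 0.3])
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```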
RESULTS
The descriptive statistics for measurements taken by using
the different instruments are provided in the companion
paper (22). For energy-adjusted protein, the results for only
nutrient density are shown since the results for nutrient
residual were similar.
Attenuation and correlation with true intake
Table 1 displays the estimates of the attenuation factor λ1
and correlation ρ(Q, T) between the FFQ and true usual
intake resulting from applying models 2 and 3 to energy,
protein, and energy-adjusted protein. The table contrasts the
estimated values when the common approach versus the
biomarker-based model was used.
Absolute intakes. The biomarker-based attenuation
factors were distressingly close to zero. For example, for
women, the attenuation factors for energy and protein were
0.039 and 0.137, respectively. The attenuation factors esti-
mated by using the common approach were substantially
higher (underestimating the corresponding attenuation) for
energy at 0.128 (p = 0.05 when compared with the bio-
marker-based attenuation) and somewhat higher for protein
at 0.158 (p = 0.73). Results for men showed a similar pattern,
with the attenuation factor being statistically significantly
overestimated (p < 0.001) when the common approach for
energy was used.
The correlations between the FFQ and true intake were
also very low. The biomarker-based correlations for energy
and protein intakes for women were 0.098 and 0.298, respec-
tively, while the common approach overestimated correla-
tions at 0.261 (p = 0.10) and 0.334 (p = 0.81). For men, the
correlation estimated by using the common approach was
statistically significantly biased upward (p < 0.001) for energy.
Energy-adjusted intakes. For energy-adjusted intakes,
the attenuation factors were somewhat higher (attenuation
was lower) than for absolute intakes. For example, for
women, the biomarker-based estimate for protein density
was 0.316 compared with 0.137 for protein (p = 0.10).
Results for men showed a similar pattern, with the highly
statistically significant difference in attenuation between
absolute and energy-adjusted protein intakes (p < 0.001).
The attenuation factor estimated by using the common
approach for women again appeared substantially more opti-
mistic than the biomarker-based estimate at 0.501 versus
0.316 (p = 0.10) for protein density. For men, however, no
marked difference was found between the attenuation factors
estimated by using the two models. Correlations between
FFQ and true intake for energy-adjusted protein displayed
the same pattern as those for attenuation factors.
TABLE 1. Estimated attenuation factor λ1 and correlation ρ(Q,T) of food frequency
questionnaire-reported intake (Q) and true intake (T)* in the Observing Protein and
Energy Nutrition Study, Maryland, September 1999–March 2000
* As estimated by the model accounting for the reference biomarker of intake or the
common model accounting only for the 24-hour recall (24HR) as reference measurements.
† Defined as model 2 in the text.
‡ Numbers in parentheses, standard error.
§ Defined as model 3 in the text.
Error structure of the FFQ and 24HR
Intake-related bias. Table 2 demonstrates across-the-
board intake-related bias in both FFQ and 24HR measure-
ments. All biomarker-based estimates of slopes βQ1 and βF1
were substantially smaller than the desired value of 1.0,
leading to the flattened slope phenomenon. If anything,
energy adjustment seemed to make this phenomenon even
more pronounced. The flattened slope in the FFQ estimated
by using the common approach is often not seen as clearly.
For example, for males, the DLW-based estimate of βQ1 for
energy intake was 0.49, but the common estimate was 0.83.
Person-specific bias. Table 2 also demonstrates the existence
and importance of person-specific biases in reported
intakes from the FFQ and 24HR. Compared with the true
between-person variance σ²T, the variances of person-specific biases
σ²r and σ²s were quite dominant for absolute intakes. For
example, for females reporting protein intake, the FFQ
person-specific bias variance was 0.110 and the 24HR
person-specific bias variance was 0.026, quite large
compared with the variance of true intake (0.037). Energy
adjustment considerably reduced person-specific biases.
Continuing with the example above, for protein density, this
variance was reduced from 0.110 to 0.023 for the FFQ and
from 0.026 to 0.012 for the 24HR, while the variance of true
intake remained practically the same (0.035). However, even
for energy-adjusted intakes, person-specific biases were still
substantial and highly significantly different from zero.
Table 2 also demonstrates substantial positive correlation
ρr,s between person-specific biases in the FFQ and 24HR.
The correlation increased after energy adjustment, especially
for women.
Within-person random error. For absolute intakes,
within-person random variation σ²ε in the FFQ was of the
same magnitude as between-person variation σ²T in true
intake. Similar to person-specific bias, it was considerably
reduced by energy adjustment. As expected because of day-to-day
variation in intake, within-person random variation
σ²u in the 24HR was substantially greater. Interestingly, relative
to variation of true intake, it was only moderately
reduced by energy adjustment. In all cases considered,
within-person random errors were not statistically significantly
correlated across instruments.
“Nonprotein” intake. Using the measurements for protein
and energy on each instrument, we also evaluated dietary
measurement error for nonprotein-energy-contributed nutrients
(“nonprotein” for short), for both absolute nonprotein
and energy-adjusted nonprotein intakes. The results for
absolute nonprotein intake were similar to the results for
energy, and the results for energy-adjusted nonprotein were
similar to the results for energy-adjusted protein.
DISCUSSION
In this paper, we focused mostly on the attenuation factor
because it directly affects the observed relative risks and the
sample size necessary to detect diet-disease associations in
epidemiologic studies. The critical requirement for our
results, that the adopted biomarkers represent valid reference
instruments (that is, their errors are unrelated to true intakes
and to errors in dietary report instruments), is supported by
accumulated evidence for both the adjusted urinary nitrogen
(18) and DLW (30). The OPEN Study yielded the following findings.
TABLE 2. Variance of true intake and parameters of dietary measurement error in the food frequency questionnaire and 24-hour
dietary recall,* the Observing Protein and Energy Nutrition Study, Maryland, September 1999–March 2000
Columns: σ²T × 10², variance of true intake; βQ1, slope of FFQ† on true intake; βF1, slope of 24HR on true intake; σ²r × 10², variance of person-specific bias in FFQ; σ²s × 10², variance of person-specific bias in 24HR; ρr,s, correlation between person-specific biases in FFQ and 24HR; σ²ε × 10², variance of within-person random error in FFQ; σ²u × 10², variance of within-person random error in 24HR.
| Nutrient | Sex | Model | σ²T × 10² | βQ1 | βF1 | σ²r × 10² | σ²s × 10² | ρr,s | σ²ε × 10² | σ²u × 10² |
| Energy | Male | Biomarker based‡ | 2.6 (0.27)§ | 0.49 (0.15) | 0.66 (0.10) | 12.2 (1.2) | 3.2 (0.61) | 0.45 (0.08) | | 5.3 (0.48) |
| Energy | Male | 24HR based¶ | 4.4 (0.68) | 0.83 (0.15) | 1 | 9.7 (1.2) | 0 | | 3.2 (0.28) | 5.3 (0.47) |
| Energy | Female | Biomarker based | 2.4 (0.29) | 0.24 (0.17) | 0.46 (0.13) | 11.2 (1.3) | 3.2 (0.78) | 0.28 (0.11) | 3.9 (0.37) | 7.9 (0.75) |
| Energy | Female | 24HR based | 3.7 (0.81) | 0.53 (0.20) | 1 | 10.3 (1.3) | 0 | | 3.9 (0.37) | 7.9 (0.75) |
| Protein | Male | Biomarker based | 4.4 (0.57) | 0.67 (0.15) | 0.70 (0.11) | 13.3 (1.4) | 3.9 (0.94) | 0.18 (0.10) | 3.7 (0.33) | 9.3 (0.82) |
| Protein | Male | 24HR based | 6.1 (1.0) | 0.55 (0.14) | 1 | 13.5 (1.5) | 0 | | 3.7 (0.33) | 9.3 (0.82) |
| Protein | Female | Biomarker based | 3.7 (0.71) | 0.65 (0.21) | 0.60 (0.16) | 11.0 (1.5) | 2.6 (1.1) | 0.24 (0.15) | 4.8 (0.46) | 12.0 (1.2) |
| Protein | Female | 24HR based | 3.9 (1.2) | 0.70 (0.25) | 1 | 10.7 (1.6) | 0 | | 4.8 (0.46) | 12.0 (1.2) |
| Protein density | Male | Biomarker based | 3.1 (0.47) | 0.46 (0.08) | 0.62 (0.11) | 1.6 (0.25) | 1.2 (0.50) | 0.40 (0.15) | 1.2 (0.11) | 5.8 (0.51) |
| Protein density | Male | 24HR based | 2.4 (0.53) | 0.60 (0.13) | 1 | 1.4 (0.30) | 0 | | 1.2 (0.11) | 5.8 (0.51) |
| Protein density | Female | Biomarker based | 3.5 (0.72) | 0.38 (0.11) | 0.39 (0.13) | 2.3 (0.36) | 1.2 (0.60) | 0.94 (0.19) | 1.4 (0.13) | 6.8 (0.65) |
| Protein density | Female | 24HR based | 1.7 (0.59) | 1.2 (0.36) | 1 | 0.30 (0.76) | 0 | | 1.4 (0.13) | 6.8 (0.65) |
* As estimated by the model accounting for the reference biomarker of intake or the common model accounting only for the 24-hour dietary recall (24HR) as reference measurements.
† FFQ, food frequency questionnaire.
‡ Defined as model 2 in the text.
§ Numbers in parentheses, standard error.
¶ Defined as model 3 in the text.
First, the impact of FFQ measurement error on total
energy and absolute protein intakes was severe and in agree-
ment with the findings of Kipnis et al. (18) for protein intake.
Attenuation factors were vexingly close to zero, as were the
correlations with true intake.
Second, the impact of measurement error seemed less
severe after energy adjustment. As follows from expression
4, the attenuation factor is inversely proportional to the vari-
ances of both person-specific bias and within-person random
error relative to between-person variation of true intake.
Since these relative variances decreased substantially after
energy adjustment (table 2) because of correlated errors in
reporting protein and energy, energy-adjusted protein was
less affected by measurement error compared with absolute
protein intake. However, the estimated attenuation factors
for energy-adjusted intakes were in the range 0.32–0.41
(table 1), indicating that measurement error still remained a
substantial problem.
Third, the 24HR was seriously flawed, suffering from
intake-related bias and from person-specific bias that was
correlated with person-specific bias in the FFQ. As a result, it
violated both requirements for a valid reference instrument
and in most cases substantially misrepresented the impact of
measurement error in the FFQ. As follows from formula A1 in
the Appendix, bias in the attenuation factor λF calculated by
using the common approach depends on the sum of the values
for slope βF1 and expression
$$
\frac{\rho_{r,s}\sqrt{\sigma_r^2\,\sigma_s^2}}{\beta_{Q1}\,\sigma_T^2}.
$$
Table 2 reveals that, for absolute intakes, the relative vari-
ances of person-specific biases in the FFQ and 24HR and the
correlation between them were sufficiently large to override
the small values of βF1 and to raise λF above the true attenu-
ation factor λ1. The same remained true for energy-adjusted
protein in women, where the effect of reduced person-
specific biases was compensated for by the increased corre-
lation between them. As a result, the 24HR underestimated
true attenuation. On the other hand, for energy-adjusted
protein in men, the two effects essentially cancelled each
other, demonstrating that a flawed reference instrument may
sometimes produce a good estimate.
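The interplay between the flattened slope in the 24HR and the correlated person-specific biases can be illustrated numerically. The sketch below computes the true attenuation factor cov(T,Q)/var(Q) and the value cov(F,Q)/var(Q) that the common approach targets when the 24HR actually follows model 2. It assumes single administrations and uncorrelated within-person errors across instruments; the function name is hypothetical, and the inputs are the rounded biomarker-based estimates for energy intake in women from table 2 (divided by 100).

```python
import math


def true_and_common_attenuation(beta_q1, beta_f1, var_t, var_r, var_s, rho_rs, var_eps):
    """True attenuation factor and the value targeted by the common approach.

    The common approach regresses the 24HR on the FFQ, so it targets
    cov(F, Q) / var(Q) instead of cov(T, Q) / var(Q).
    """
    var_q = beta_q1 ** 2 * var_t + var_r + var_eps
    cov_tq = beta_q1 * var_t
    cov_fq = beta_f1 * beta_q1 * var_t + rho_rs * math.sqrt(var_r * var_s)
    return cov_tq / var_q, cov_fq / var_q


lam_true, lam_common = true_and_common_attenuation(
    beta_q1=0.24, beta_f1=0.46, var_t=0.024,
    var_r=0.112, var_s=0.032, rho_rs=0.28, var_eps=0.039,
)
print(f"true: {lam_true:.3f}, common approach: {lam_common:.3f}")
```

With these rounded inputs, the correlated person-specific biases outweigh the small value of βF1, and the common-approach value (about 0.13) exceeds the true attenuation factor (about 0.04), in line with the estimates of 0.128 and 0.039 reported for energy in women.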
Our results are in line with previous data presented on
protein intake. For women in the British Medical Research
Council study (18), the urinary-nitrogen-based attenuation
factor for protein was 0.187, while the common approach
based on a 4-day weighed food record produced an overly
optimistic estimate of 0.282. The former is slightly larger
than the 0.137 obtained in the OPEN Study, while the latter
is noticeably more optimistic than our 24HR-based estimate
of 0.158 (p = 0.08). The correlations of FFQ with true intake
were 0.284 (urinary nitrogen based) and 0.432 (record
based) compared with our values of 0.298 (urinary nitrogen
based) and 0.334 (24HR based), respectively. Neither differ-
ence approaches statistical significance.
An important consideration is whether our results could be
affected by the fact that biomarkers in the OPEN Study were
collected mostly over one season. We analyzed 24HRs taken
in different seasons in cross-sectional national survey data
(Continuing Survey of Food Intakes by Individuals 1994–
1996) by region and gender, and we found no seasonal fluc-
tuations in energy or protein intakes. However, if seasonality
were to exist, it would affect only the estimated mean usual
intake and would not change the higher-order parameters
presented in tables 1 and 2.
Since DLW measures total energy expenditure, it would
be important to adjust the data for long-term weight change
to enable DLW to truly represent usual energy intake. Doing
so over the 2-week DLW period may introduce only more
random error, however, since only a small amount of within-
person week-to-week fluctuations in energy balance can be
explained by contemporary changes in weight (31). Even
using the 3-month OPEN Study period may not adequately
represent long-term weight changes, especially given
protocol differences in fasting conditions between the first
and last visits (22). Nevertheless, when we adjusted indi-
vidual DLW measurements for the weight change over either
the 2-week or 3-month period, the results did not change
materially for either absolute or energy-adjusted nutrients.
Recently, Willett (20) suggested that any evaluation of a
FFQ would be invalid unless heterogeneity in the study
population due to gender, age, and body size was adjusted
for. To address this issue, we performed further analyses that
included age in 5-year groups and the logarithm of body
mass index as covariates in the models. The results did not
change substantially except for energy in women; the atten-
uation factor and correlation of the FFQ with true intake
became even closer to zero.
Our results have important implications for nutritional
epidemiology. First, they question the ability of FFQs to
detect diet-disease associations for absolute nutrient intakes.
While some journals have recently required that energy
adjustment be used in the analysis of nutrient-disease associ-
ations, the practice has been controversial (32, 33). Our data
clearly document failure of the FFQ to provide a sufficiently
accurate report of absolute protein, nonprotein, and energy
intakes to enable detection of their moderate associations
with disease. For example, with the attenuation factors of
0.08 for energy intake for males and 0.04 for females, a true
relative risk of 2.0 would appear as 1.06 and 1.03, respec-
tively, using the FFQ data. Needless to say, such small rela-
tive risks are not detectable in epidemiologic studies since
their signal is smaller than the noise caused by confounders.
It is plausible that similarly small attenuation factors would
be found for many other nutrients, although it would require
a suitable reference biomarker for each nutrient to confirm this.
Second, it appears that FFQ-based energy-adjusted
nutrient intakes may just be sufficiently accurate to use in
large cohort studies to detect moderate diet-disease associa-
tions; a relative risk of 2.0 would appear close to 1.3, which
could be at the limits of detection. The benefits of adjusting
for energy intake have been discussed previously at the
general level (19, 32). Our conclusion is necessarily a quali-
fied one, since our study was restricted to energy-adjusted
protein and nonprotein intakes. There is no guarantee that the
results will be as favorable for nonprotein components such
as energy-adjusted fat intake. Even less could be speculated
about the effect of energy adjustment for non-energy-
contributing nutrients. Nevertheless, until further evidence
becomes available on other nutrients, use of energy-adjusted
intakes seems the best working approach for nutritional
epidemiology, at least under the assumption that energy is
not related to disease. Note, however, that biomarker-based
attenuation factors for energy-adjusted protein intake are
between 0.32 and 0.41, indicating that measurement error
has a substantial negative impact on the statistical power of
observational epidemiologic studies.
Third, our results throw into question use of the 24HR as a
reference instrument for validation/calibration studies. In the
OPEN Study, such use substantially overestimated perfor-
mance of the FFQ for absolute intakes of energy and nonpro-
tein. The results also cast some doubt on the performance of
the 24HR as a reference for energy-adjusted intakes. For
example, for protein density in women (table 1), the bio-
marker-based attenuation factor was estimated as 0.3 com-
pared with the 24HR-based estimate of 0.5. Use of the latter
would lead to underestimation of the required sample size by
a factor of 2.8 = 0.5²/0.3², with profound effects on the
power to detect diet-disease associations.
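The sample-size factor in this example follows if the required sample size is taken to be inversely proportional to the square of the attenuation factor, with all other design quantities held fixed. That proportionality is an assumption of the sketch below, and the function name is illustrative.

```python
def sample_size_underestimation(lambda_assumed: float, lambda_true: float) -> float:
    """Factor by which the required sample size is underestimated when
    lambda_assumed is used in place of lambda_true, assuming n is
    proportional to 1 / lambda**2."""
    return (lambda_assumed / lambda_true) ** 2


# Protein density in women: 24HR-based 0.5 versus biomarker-based 0.3.
factor = sample_size_underestimation(0.5, 0.3)
print(f"Required sample size underestimated by a factor of {factor:.1f}")
```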
The OPEN Study provides solid evidence of measurement
errors in a FFQ as they pertain to energy intake and both
absolute and energy-adjusted protein and nonprotein intakes.
Further studies of a similar design are needed to confirm our
results, especially to clarify whether 24HRs or multiple-day
food records can be used reliably as reference instruments in
validation/calibration studies, at least for energy-adjusted
intakes. Unfortunately, few dietary biomarkers qualify as
valid reference instruments; that is, they have errors unre-
lated to true intakes and errors in dietary report instruments.
Most other biomarkers, such as vitamin C or beta-carotene,
measure concentrations of related constituents for which the
quantitative relation to dietary intake is unknown and
depends on individual characteristics (e.g., concomitant
intake of other nutrients, obesity, or smoking habits) (34).
Therefore, such concentration-based biomarkers cannot
provide valid reference measurements and at best can serve
only as correlates of intake. Further work should explore
whether a combination of data from dietary report and bio-
marker measurements for energy or protein can be used to
assess dietary exposure variables for which no reference biomarker exists.
Dr. Carroll’s research was supported by a grant from the
National Cancer Institute (CA-57030) and by the Texas
A&M Center for Environmental and Rural Health via a grant
from the National Institute of Environmental Health
1. Hunter DJ, Spiegelman D, Adami HO, et al. Cohort studies of
fat intake and the risk of breast cancer—a pooled analysis. N
Engl J Med 1996;334:356–61.
2. Fuchs CS, Giovannucci EL, Colditz GA, et al. Dietary fiber and
the risk of colorectal cancer and adenoma in women. N Engl J
3. Michels KB, Giovannucci E, Joshipura KJ, et al. Prospective
study of fruit and vegetable consumption and incidence of
colon and rectal cancers. J Natl Cancer Inst 2000;92:1740–52.
4. Beaton GH, Milner J, Corey P, et al. Sources of variance in 24-
hour dietary recall data: implications for nutrition study design
and interpretation. Am J Clin Nutr 1979;32:2546–59.
5. Freudenheim JL, Marshall JR. The problem of profound mis-
measurement and the power of epidemiologic studies of diet
and cancer. Nutr Cancer 1988;11:243–50.
6. Freedman LS, Schatzkin A, Wax Y. The impact of dietary mea-
surement error on planning a sample size required in a cohort
study. Am J Epidemiol 1990;132:1185–95.
7. Rosner B, Willett WC, Spiegelman D. Correction of logistic
regression relative risk estimates and confidence intervals for
systematic within-person measurement error. Stat Med 1989;8:
8. Kaaks R, Riboli E. Validation and calibration of dietary intake
measurements in the EPIC project: methodological consider-
ations. Int J Epidemiol 1997;26(suppl):S15–25.
9. Kipnis V, Carroll RJ, Freedman LS, et al. Implications of a new
dietary measurement error model for estimation of relative risk:
application to four calibration studies. Am J Epidemiol 1999;
10. Bandini LG, Schoeller DA, Cyr HN, et al. Validity of reported
energy intake in obese and nonobese adolescents. Am J Clin
11. Livingstone MBE, Prentice AM, Strain JJ, et al. Accuracy of
weighed dietary records in studies of diet and health. BMJ
12. Heitmann BL. The influence of fatness, weight change, slim-
ming history and other lifestyle variables on diet reporting in
Danish men and women aged 35–65 years. Int J Obes 1993;17:
13. Heitmann BL, Lissner L. Dietary underreporting by obese indi-
viduals—is it specific or non-specific? BMJ 1995;311:986–9.
14. Martin LJ, Su W, Jones PJ, et al. Comparison of energy intakes
determined by food records and doubly labeled water in women
participating in a dietary-intervention trial. Am J Clin Nutr
15. Sawaya AL, Tucker K, Tsay R, et al. Evaluation of four meth-
ods for determining energy intake in young and older women:
comparison with doubly labeled water measurements of total
energy expenditure. Am J Clin Nutr 1996;63:491–9.
16. Black AE, Bingham SA, Johansson G, et al. Validation of
dietary intakes of protein and energy against 24 hour urinary N and
DLW energy expenditure in middle-aged women, retired men
and post-obese subjects: comparisons with validation against
presumed energy requirements. Eur J Clin Nutr 1997;51:405–
17. Prentice R. Measurement error and results from analytic epide-
miology: dietary fat and breast cancer. J Natl Cancer Inst 1996;
18. Kipnis V, Midthune D, Freedman LS, et al. Empirical evidence
of correlated biases in dietary assessment instruments and its
implications. Am J Epidemiol 2001;153:394–403.
19. Willett WC. Nutritional epidemiology. Chapter 5. New York,
NY: Oxford University Press, 1990.
20. Willett W. Commentary: dietary diaries versus food frequency
questionnaires—a case of undigestible data. Int J Epidemiol
21. Kaaks R, Riboli E, van Staveren W. Calibration of dietary
intake measurements in prospective cohort studies. Am J Epi-
22. Subar AF, Kipnis V, Troiano RP, et al. Using intake biomarkers
to evaluate the extent of dietary misreporting in a large sample
of adults: the OPEN Study. Am J Epidemiol 2003;158:1–13.
23. Bingham SA, Cummings JH. Urine nitrogen as an independent
validatory measure of dietary intake: a study of nitrogen bal-
ance in individuals consuming their normal diet. Am J Clin
24. Subar AF, Thompson FE, Smith AF, et al. Improving food fre-
quency questionnaires: a qualitative approach using cognitive
interviewing. J Am Diet Assoc 1995;95:781–8.
25. Subar AF, Midthune D, Kulldorff M, et al. Evaluation of alter-
native approaches to assign nutrient values to food groups in
food frequency questionnaires. Am J Epidemiol 2000;152:279–
26. Subar AF, Ziegler RG, Thompson FE, et al. Is shorter always
better? Relative importance of questionnaire length and cogni-
tive ease on response rates and data quality for two dietary
questionnaires. Am J Epidemiol 2001;153:404–9.
27. Subar AF, Thompson FE, Kipnis V, et al. Comparative valida-
tion of the Block, Willett, and National Cancer Institute food
frequency questionnaires: the Eating at America’s Table Study.
Am J Epidemiol 2001;154:1089–99.
28. Thompson FE, Subar AF, Brown CC, et al. Cognitive research
enhances accuracy of food frequency questionnaire reports:
results of an experimental validation study. J Am Diet Assoc
29. Moshfegh AJ, Raper N, Ingwersen I, et al. An improved
approach to 24-hour dietary recall methodology. Ann Nutr
Metab 2001;45(suppl 1):156.
30. Schoeller DA. Measurement error of energy expenditure in
free-living humans by using doubly labeled water. J Nutr 1988;
31. Edholm OG, Healy MJR, Wolfe HS, et al. Food intake and
energy expenditure in army recruits. Br J Nutr 1970;24:1091–
32. Willett W, Howe GR, Kushi L. Adjustment for total energy
intake in epidemiological studies. Am J Clin Nutr 1997;
33. Freedman LS, Kipnis V, Brown CC, et al. Comments on
“Adjustment for total energy intake in epidemiological stud-
ies.” Am J Clin Nutr 1997;65(suppl):1229S–31S.
34. Kaaks RJ. Biochemical markers as additional measurements in
studies of the accuracy of dietary questionnaire measurements:
conceptual issues. Am J Clin Nutr 1997;65(suppl):1232S–9S.
Derived Reference Measures Based on the Observed
In the OPEN Study, replications of the DLW measurement
were available for only a small sample of 25 persons (14 men
and 11 women). This fact did not affect the results for total
energy intake since the DLW measurements were remark-
ably consistent across replications. The coefficient of varia-
tion in the DLW measurements was only 5.1 percent, in
effect indicating that energy expenditure was measured with
very little error.
However, a technical difficulty arose in the analysis of
nonprotein and energy-adjusted nutrients. The error in the
biomarker-based derived reference measures was almost
entirely influenced by the error in the urinary nitrogen
measurements, where the coefficient of variation was 17.6
percent. As a result, attempting to estimate the within-person
variance of the derived reference measurements as a param-
eter in the model led to relatively large standard errors in the
main analysis and to instability in the procedure for bootstrapping.
On the basis of these facts, in dealing with the derived
reference measurements for nonprotein and energy-adjusted
protein and nonprotein intakes, we used the following
convention. When defining biomarker-based reference
measures for nonprotein as well as nutrient density and
nutrient residual, we used the first DLW observation with
both the first and second repeat urinary nitrogen observa-
tions. In theory, doing so induced some correlation between
repeat biomarker-based reference observations, but the
DLW measurement error was so small that this correlation
could be ignored in practice.
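The pairing convention described above can be sketched as follows. This is only an illustration: the conversion constants (6.25 g of protein per g of nitrogen, 81 percent urinary recovery of nitrogen intake, 4 kcal per g of protein), the definition of nonprotein intake as total energy minus protein energy, and all names and input values are assumptions of the sketch rather than details taken from this appendix. The nutrient residual is omitted because it requires a regression across participants.

```python
# Assumed conversion constants (not stated in this appendix).
N_TO_PROTEIN = 6.25          # g protein per g nitrogen
URINARY_N_RECOVERY = 0.81    # fraction of nitrogen intake recovered in urine
KCAL_PER_G_PROTEIN = 4.0     # energy content of protein


def derived_references(dlw_first_kcal, urinary_n_repeats_g):
    """Pair the FIRST DLW observation with EACH urinary nitrogen repeat."""
    results = []
    for urinary_n in urinary_n_repeats_g:
        protein_g = N_TO_PROTEIN * urinary_n / URINARY_N_RECOVERY
        protein_kcal = KCAL_PER_G_PROTEIN * protein_g
        results.append({
            "protein_g": protein_g,
            "nonprotein_kcal": dlw_first_kcal - protein_kcal,
            "protein_density": protein_kcal / dlw_first_kcal,
        })
    return results


# Hypothetical participant: one DLW value, two urinary nitrogen repeats.
refs = derived_references(2500.0, [12.0, 14.0])
for i, ref in enumerate(refs, start=1):
    print(f"repeat {i}: nonprotein {ref['nonprotein_kcal']:.0f} kcal, "
          f"protein density {ref['protein_density']:.3f}")
```

Because both derived repeats share the same DLW value, any DLW error is common to them, which is the source of the induced correlation noted above.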
Bias in the Attenuation Factor Based on the Dietary
Report Reference Instrument
For a valid reference biomarker M, the attenuation factor is
expressed as λM = cov(M, Q)/var(Q) = cov(T,Q)/var(Q) (18).
Thus, the biomarker-based attenuation factor λM is equal to
the true attenuation factor λ1. However, the attenuation
factor λF based on the common approach with a dietary
report reference instrument is given by
$$\lambda_F = \frac{\mathrm{cov}(F,Q)}{\mathrm{var}(Q)} = \frac{\beta_{F1}\beta_{Q1}\sigma_T^2 + \rho_{r,s}\sigma_r\sigma_s}{\mathrm{var}(Q)}.$$
Taking into account expression 4 for the true attenuation
factor λ1, we can rewrite this expression as
$$\lambda_F = \lambda_1\left(\beta_{F1} + \frac{\rho_{r,s}}{\beta_{Q1}}\sqrt{\frac{\sigma_r^2}{\sigma_T^2}\cdot\frac{\sigma_s^2}{\sigma_T^2}}\right).$$
Thus, the attenuation factor λF is generally biased. The
relative bias, defined by the expression in parentheses,
depends on intake-related biases in the FFQ and dietary
report instrument F, reflected by slopes βQ1 and βF1, respectively;
the variances of their person-specific biases relative
to variation in true intake, $\sigma_r^2/\sigma_T^2$ and $\sigma_s^2/\sigma_T^2$, respectively;
and the correlation ρr,s between person-specific biases.
Values of slope βF1 less than one decrease λF relative to true
attenuation factor λ1, whereas positive values of ρr,s,
as well as values of slope βQ1 less than one, increase λF.
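As a rough numerical illustration of the relative bias, the sketch below evaluates the factor βF1 + ρr,s·sqrt((σr²/σT²)(σs²/σT²))/βQ1, which is the form implied by the description above under a linear measurement error model with correlated person-specific biases. The function name and every parameter value are hypothetical, chosen only to show the direction of each effect; they are not estimates from the OPEN Study.

```python
import math


def relative_bias(beta_f1, beta_q1, var_ratio_r, var_ratio_s, rho_rs):
    """Ratio lambda_F / lambda_1 under the assumed linear error model.

    var_ratio_r: variance of the FFQ person-specific bias divided by the
                 variance of true intake.
    var_ratio_s: the same ratio for the dietary report reference instrument.
    rho_rs:      correlation between the two person-specific biases.
    """
    return beta_f1 + rho_rs * math.sqrt(var_ratio_r * var_ratio_s) / beta_q1


# Hypothetical parameter values, for illustration only.
uncorrelated = relative_bias(0.8, 0.5, 1.0, 0.5, rho_rs=0.0)
correlated = relative_bias(0.8, 0.5, 1.0, 0.5, rho_rs=0.4)

print(f"rho = 0.0: lambda_F / lambda_1 = {uncorrelated:.2f}")
print(f"rho = 0.4: lambda_F / lambda_1 = {correlated:.2f}")
```

With uncorrelated biases the ratio reduces to βF1, so a flattened slope in the reference instrument alone understates the attenuation factor; a positive correlation pushes the ratio upward and can take it above one.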