Volume 7, Issue 1 2011 Article 1
The International Journal of
Biostatistics
Fitting a Bivariate Measurement Error Model
for Episodically Consumed Dietary
Components
Saijuan Zhang, Texas A&M University
Susan M. Krebs-Smith, National Cancer Institute
Douglas Midthune, National Cancer Institute
Adriana Perez, University of Texas School of Public Health
Dennis W. Buckman, Information Management Services,
Inc.
Victor Kipnis, National Cancer Institute
Laurence S. Freedman, Gertner Institute for Epidemiology
and Public Health Research
Kevin W. Dodd, National Cancer Institute
Raymond J. Carroll, Texas A&M University
Recommended Citation:
Zhang, Saijuan; Krebs-Smith, Susan M.; Midthune, Douglas; Perez, Adriana; Buckman, Dennis
W.; Kipnis, Victor; Freedman, Laurence S.; Dodd, Kevin W.; and Carroll, Raymond J. (2011)
"Fitting a Bivariate Measurement Error Model for Episodically Consumed Dietary
Components," The International Journal of Biostatistics: Vol. 7 : Iss. 1, Article 1.
Available at: http://www.bepress.com/ijb/vol7/iss1/1
DOI: 10.2202/1557-4679.1267
©2011 Berkeley Electronic Press. All rights reserved.
Abstract
There has been great public health interest in estimating usual, i.e., long-term average, intake
of episodically consumed dietary components that are not consumed daily by everyone, e.g., fish,
red meat and whole grains. Short-term measurements of episodically consumed dietary
components have zero-inflated skewed distributions. So-called two-part models have been
developed for such data in order to correct for measurement error due to within-person variation
and to estimate the distribution of usual intake of the dietary component in the univariate case.
However, there is arguably much greater public health interest in the usual intake of an
episodically consumed dietary component adjusted for energy (caloric) intake, e.g., ounces of
whole grains per 1000 kilo-calories, which reflects usual dietary composition and adjusts for
different total amounts of caloric intake. Because of this public health interest, it is important to
have models to fit such data, and it is important that the model-fitting methods can be applied to
all episodically consumed dietary components.
We have recently developed a nonlinear mixed effects model (Kipnis, et al., 2010), and have
fit it by maximum likelihood using nonlinear mixed effects programs and methodology (the SAS
NLMIXED procedure). Maximum likelihood fitting of such a nonlinear mixed model is generally
slow because of 3-dimensional adaptive Gaussian quadrature, and there are times when the
programs either fail to converge or converge to models with a singular covariance matrix. For
these reasons, we develop a Markov Chain Monte Carlo (MCMC) computation for fitting this model, which
allows for both frequentist and Bayesian inference. There are technical challenges to developing
this solution because one of the covariance matrices in the model is patterned. Our main
application is to the National Institutes of Health (NIH)-AARP Diet and Health Study, where we
illustrate our methods for modeling the energy-adjusted usual intake of fish and whole grains. We
demonstrate numerically that our methods lead to increased speed of computation, converge to
reasonable solutions, and have the flexibility to be used in either a frequentist or a Bayesian
manner.
KEYWORDS: Bayesian approach, latent variables, measurement error, mixed effects models,
nutritional epidemiology, zero-inflated data
Author Notes: This paper forms part of the Ph.D. dissertation of the first author at Texas A&M
University. The research of Zhang, Perez and Carroll was supported by a grant from the National
Cancer Institute (R37-CA057030). This publication is based in part on work supported by Award
Number KUS-CI-016-04, made by King Abdullah University of Science and Technology
(KAUST).
1 Introduction
This paper is about the important public health problem of understanding the
distribution of episodically consumed dietary component intakes in terms of
their energy-adjusted amounts, and relating this to diet-disease relationships.
Before commenting in more detail, we first discuss the literature for simpler
problems that are also of interest.
In nutritional surveillance and nutritional epidemiology, there is consider-
able interest in understanding the distribution of usual dietary intake, which
is defined as long-term daily average intake. In addition, of interest is the
regression of this intake on measured covariates, which is needed to correct
diet-disease relationships for measurement error in assessing diet. If the di-
etary component of interest is ubiquitously consumed, as most nutrients are,
the data are continuously distributed and methods are well-established for
solving both problems. See for example Nusser, et al. (1997) for surveillance
and Carroll, et al. (2006) for measurement error modeling.
Another class of dietary components is those which are episodically con-
sumed, as is true of most foods, e.g., fish, red meat, dark green vegetables,
whole grains. When consumption is measured by a short-term instrument
such as a 24 hour recall, hereafter denoted by 24hr, the episodic nature of
these dietary components means that their reported intake either equals
zero on a non-consumption day or is positive on a day the component is
consumed. In many studies, non-consumption days predominate for several
episodically consumed foods of interest. For example, in our data, 65% of
participants reported no fish consumption, and 12% reported no whole grain
consumption, on both of the two administrations of the 24hr. Thus, data on episodically
consumed dietary components are zero-inflated data with measurement error.
Recently, Tooze, et al. (2006) for nutritional surveillance and Kipnis, et al.
(2009) for nutritional epidemiology have reported so-called two-part meth-
ods, which are actually nonlinear mixed effects models, for analyzing episod-
ically consumed dietary components in the univariate case. These methods
are known commonly as the “NCI method” because many of the co-authors
of these papers are members of the National Cancer Institute (NCI), and
because SAS routines based upon the NLMIXED procedure are available
at http://riskfactor.cancer.gov/diet/usualintakes/, an NCI web site. Other
two-part models in different contexts are described, for example, in Olsen and
Schafer (2001), Tooze, et al. (2002) and Li, et al. (2005).
We are interested in the more complex public health problem of understanding
the usual intake of an episodically consumed dietary component
adjusted for energy intake (caloric intake), along with the distribution
of usual intake of energy. This is critical because it addresses the
issue of dietary component composition, and makes comparable diets of
individuals whose usual intakes of energy are very different. As an example,
the U.S. Department of Agriculture’s Healthy Eating Index-2005
(www.cnpp.usda.gov/HealthyEatingIndex.htm) is a measure of diet quality
that assesses conformance to Federal dietary guidance. One component of
that index is the number of ounces of whole grains consumed per 1000
kilo-calories: there are other items in the HEI-2005 that deal with episodically
consumed dietary components, and all of them are adjusted for energy intake.
The data needed to compute such variables are thus the usual intake of the
dietary component consumed and the usual amount of calories consumed, and
(possibly normalized) ratios of them.
Recently, Kipnis, et al. (2010) have developed a model for an episodically
consumed dietary component and energy, see Section 2. They fit this model
using nonlinear mixed effects models with likelihoods computed by adaptive
Gaussian quadrature using the SAS procedure NLMIXED. However, as described
in Section 2 and documented in Section 4, this form of computation
can be slow, and can have serious convergence issues. This is extremely
problematic, because of the importance of the problem and the fact that solutions
will find wide use in the nutrition community, but only if they are numerically
stable.
In this paper, we take an alternative Markov Chain Monte Carlo (MCMC)
approach to computation, which is faster and numerically more stable. There
are many good introductory papers reviewing MCMC, such as Casella, et al.
(1992), Chib, et al. (1995) and Kass, et al. (1998). Effectively, we exploit
the well-known fact (Lehmann and Casella, 1998, Chapter 6.8) that in fully
parametric regular models of the type we study, Bayesian posterior means
of parameters are asymptotically equivalent to their corresponding maximum
likelihood estimators. To implement an MCMC approach in our problem, there
are technical issues that have to be overcome, including the fact that one of
the covariance matrices in the model of Kipnis, et al. (2010) is patterned.
Besides fitting the model, our main focus in this paper is to discuss how to use
the parameter estimates to then estimate the distributions of the usual intake
of energy and energy-adjusted usual intake of dietary components.
In Section 2, we describe the model of Kipnis, et al. (2010). In Section 2,
we also briefly outline some of the details of our implementation, although
the technical details are given in the Appendix. In Sections 3 and
4, we take up the analysis of the NIH-AARP Study of Diet and Health
(http://dietandhealth.cancer.gov/) as an illustration of our model and method.
Section 5 gives concluding remarks.
2 Data and Model
2.1 The Data
In practice, the response data often come from repeated 24hr. Necessarily, due
to cost and logistical reasons, the number of recalls is limited, and is rarely
greater than 2. In a 24hr, what is observed is whether a dietary component
is consumed, and if it is consumed, the reported amount. In addition, the
amount of energy reported to be consumed is also available. Thus, for person
i = 1,...,n, and for the k = 1,...,mi repeats of the 24hr, the data are
$\tilde{Y}_{ik} = (Y_{i1k}, Y_{i2k}, Y_{i3k})^{T}$, where
• Yi1k = Indicator of whether the episodically consumed dietary component
is consumed.
• Yi2k = Amount of the dietary component consumed as reported by the
24hr, which equals zero if the dietary component is not consumed.
• Yi3k = Amount of energy consumed as reported by the 24hr.
There are also generally covariates such as age category, ethnic status and in
many cases the results of reported intakes from a food frequency questionnaire.
We will generically call these covariates X.
2.2 The Model
Here we describe the nonlinear mixed effects latent variable model of Kipnis,
et al. (2010). There are i = 1,...,n individuals and k = 1,...,mi repeats
of the 24hr. Also, the observed data have three parts, relating to whether
the episodically consumed dietary component is consumed, the amount if it is
consumed, and the amount of energy. Also with the observed data, we will have
covariates for the individual, generically called X, see below for more precise
notation. Finally, Kipnis, et al. (2010) use what are called in nutritional
epidemiology “person-specific random effects” which are generically denoted
by U, so that individuals actually differ from one another in usual intake when
they have the same values of the covariates.
To be more precise, for the ith individual there are covariates (Xi1,Xi2,Xi3):
Xi1 are the covariates for the indicator of consumption, Xi2 are the covariates
for the consumption amount of the dietary component of interest, and Xi3 are
the covariates for the consumption of energy. Often, in practice, the covariates
for each observed data component are the same, so that Xi1 = Xi2 = Xi3.
Along with the covariates, there are corresponding person-specific random effects
(Ui1,Ui2,Ui3), the role of which is to allow different people who share
the same covariates to have different amounts of usual intakes. As we will
see shortly, there are also errors accounting for day-to-day variation. Only
the covariates, the person-specific random effects, and, because of transformations,
the variances of the random errors are relevant to the definitions of
usual intake, which are given below at equations (6)-(7).
The model of Kipnis, et al. (2010) uses a latent variable approach. Let
(Wi1k,Wi2k,Wi3k) be latent variables that are assumed to follow the linear
mixed effects model
$$W_{ijk} = X_{ij}^{T}\beta_j + U_{ij} + \epsilon_{ijk} \quad \text{for } j = 1,2,3, \tag{1}$$
where (Ui1,Ui2,Ui3) = Normal(0,Σu) are the person-specific random ef-
fects, while the within-person errors that account for day-to-day variation
(ϵi1k,ϵi2k,ϵi3k) = Normal(0,Σϵ). The (Ui1,Ui2,Ui3) and (ϵi1k,ϵi2k,ϵi3k) are mu-
tually independent.
The observed data are related to the latent variables as follows:
$$Y_{i1k} = I(W_{i1k} > 0); \tag{2}$$
$$Y_{i2k} = Y_{i1k}\, g^{-1}(W_{i2k}, \lambda_F); \tag{3}$$
$$Y_{i3k} = g^{-1}(W_{i3k}, \lambda_E), \tag{4}$$
where I(·) is the indicator function and $g^{-1}(x,\lambda)$ is the inverse of the Box-Cox
transformation $g(x,\lambda) = (x^{\lambda} - 1)/\lambda$ for λ ≠ 0 and g(x,0) = log(x) if λ = 0.
We used the same Box-Cox transformations as Kipnis, et al. (2009, 2010).
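As an illustration of how (2)-(4) turn latent variables into observed recall data, the following is a minimal Python sketch. It is not the code used for the paper; the function names are hypothetical, and the guard on 1 + λw > 0 is an assumption added so that the inverse transformation is well defined.

```python
import math


def box_cox(x, lam):
    """Box-Cox transformation g(x, lam) for x > 0."""
    if lam == 0.0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def inv_box_cox(w, lam):
    """Inverse transformation g^{-1}(w, lam)."""
    if lam == 0.0:
        return math.exp(w)
    base = 1.0 + lam * w
    if base <= 0.0:
        # Assumption of this sketch: reject values outside the range of g.
        raise ValueError("1 + lam * w must be positive")
    return base ** (1.0 / lam)


def observed_from_latent(w1, w2, w3, lam_f, lam_e):
    """Map latent (W1, W2, W3) to observed (Y1, Y2, Y3) as in (2)-(4)."""
    y1 = 1 if w1 > 0.0 else 0                      # consumption indicator, (2)
    y2 = inv_box_cox(w2, lam_f) if y1 else 0.0     # amount, zero if not consumed, (3)
    y3 = inv_box_cox(w3, lam_e)                    # energy, (4)
    return y1, y2, y3
```

For example, `observed_from_latent(-0.3, 1.0, 2.0, 0.5, 0.0)` describes a non-consumption day: the amount is zero regardless of the second latent variable, while energy is still reported.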
Under the model defined by (1)-(4), the probability to consume follows the
probit model
$$\mathrm{pr}(Y_{i1k} = 1 \mid X_{i1}, U_{i1}, U_{i2}, U_{i3}) = \Phi(X_{i1}^{T}\beta_1 + U_{i1}), \tag{5}$$
where Φ(·) is the standard normal distribution function. The probit model is
commonly used to model a relationship between a binary dependent variable
and one or more independent variables. The probit link was used in Kipnis, et
al. (2010) to allow the day-to-day variation in whether a food is consumed to
be correlated with the amount of energy consumed, and in such a way that the
day-to-day variation random variables (ϵi1k,ϵi2k,ϵi3k) are jointly normal, thus
facilitating both nonlinear mixed effects software and the MCMC. The Box-Cox
transformations in (3)-(4) allow for skewed distributions typically seen
with dietary data. Of course, the notation in (5) means that consumption
depends on (Ui1,Ui2,Ui3) only through Ui1.
Under the assumption that the 24hr is unbiased for usual (mean) intake,
the usual intake of the dietary component and energy are given as
TFi = E(Yi2k|Xi1,Xi2,Ui1,Ui2) and TEi = E(Yi3k|Xi3,Ui3). Kipnis, et al.
(2009, 2010) use a Taylor series approximation
$E\{g^{-1}(v + \epsilon, \lambda) \mid v\} \approx g^{-1}(v,\lambda) + (1/2)\,\mathrm{var}(\epsilon)\,\{\partial^2 g^{-1}(v,\lambda)/\partial v^2\}$.
Using this approximation, see equation (19) of
Kipnis, et al. (2009), and under the covariance matrix restriction described
below in Section 2.3, they show that the usual intake TFi of the dietary component
and the usual intake TEi of energy for individual i are given as
$$T_{Fi} = \Phi(X_{i1}^{T}\beta_1 + U_{i1})\, g^{*}\{X_{i2}^{T}\beta_2 + U_{i2}, \lambda_F, \Sigma_\epsilon(2,2)\}, \tag{6}$$
$$T_{Ei} = g^{*}\{X_{i3}^{T}\beta_3 + U_{i3}, \lambda_E, \Sigma_\epsilon(3,3)\}, \tag{7}$$
where the (j,k) element of Σϵ is denoted as Σϵ(j,k) and
$$g^{*}(v,\lambda,\sigma_\epsilon^2) = g^{-1}(v,\lambda) + (1/2)\,\sigma_\epsilon^2\,\{\partial^2 g^{-1}(v,\lambda)/\partial v^2\}.$$
Of course, (6)-(7) are approximations
because g∗(·) is an approximate inverse of g(·). We can combine the usual
intakes of dietary component and energy in various ways, e.g., the number of
ounces of whole grains per 1000 kilo-calories, i.e., 1000 × TFi/TEi.
Remark 1 The Taylor series approximation to computing expectations of
inverses of the Box-Cox transformation is used here because it was used by
Kipnis, et al. (2009, 2010). More precise quadrature formulae can be used,
and we have done so, finding almost no numerical changes. The computational
convenience of the approximation makes it attractive.
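The Taylor-series back-transformation g∗ and the usual intakes in (6)-(7) can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the arguments `eta1`, `eta2`, `eta3` stand for the linear predictors X^T β + U of the three model parts, and the closed form used for the second derivative of the inverse Box-Cox transformation, (1 − λ)(1 + λv)^{1/λ − 2}, is derived here rather than quoted from the paper. The sketch assumes 1 + λv > 0.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inv_box_cox(v, lam):
    """Inverse Box-Cox transformation g^{-1}(v, lam), assuming 1 + lam * v > 0."""
    return math.exp(v) if lam == 0.0 else (1.0 + lam * v) ** (1.0 / lam)


def inv_box_cox_d2(v, lam):
    """Second derivative of g^{-1}(v, lam) with respect to v."""
    if lam == 0.0:
        return math.exp(v)
    return (1.0 - lam) * (1.0 + lam * v) ** (1.0 / lam - 2.0)


def g_star(v, lam, sigma2):
    """Second-order Taylor approximation to E{g^{-1}(v + eps, lam) | v}."""
    return inv_box_cox(v, lam) + 0.5 * sigma2 * inv_box_cox_d2(v, lam)


def usual_intakes(eta1, eta2, eta3, lam_f, lam_e, s22, s33):
    """Usual intakes (T_F, T_E) as in (6)-(7); eta_j = X_ij^T beta_j + U_ij."""
    t_f = std_normal_cdf(eta1) * g_star(eta2, lam_f, s22)
    t_e = g_star(eta3, lam_e, s33)
    return t_f, t_e
```

When λ = 0.5 the inverse transformation is quadratic in v, so the second-order expansion is exact for that case; for other values of λ it is an approximation, which is consistent with the observation in Remark 1 that quadrature changes the results very little.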
2.3 Restriction on the Covariance Matrix
There are two restrictions necessary in the specification of Σϵ. First, following
Kipnis, et al. (2009, 2010), we set ϵi1k and ϵi2k to be independent. The
intuitive way to think about the independence between the first two is that
whether the dietary component is consumed or not and the amount consumed
are assumed to be independent. This actually makes sense because a dietary
component being consumed cannot indicate how much was consumed. Second,
for identifiability of β1 and the distribution of Ui1, we require that var(ϵi1k) =
1, because otherwise the marginal probability of consumption is
$\Phi\{(X_{i1}^{T}\beta_1 + U_{i1})/\mathrm{var}^{1/2}(\epsilon_{i1k})\}$.
Without this second restriction, β1, var(Ui1), cov(Ui1,Ui2)
and cov(Ui1,Ui3) are identified only up to scale factors. Hence we have that
$$\Sigma_\epsilon = \begin{pmatrix} 1 & 0 & s_{13} \\ 0 & s_{22} & s_{23} \\ s_{13} & s_{23} & s_{33} \end{pmatrix}. \tag{8}$$
The difficulty with parameterizations such as (8) is that (s13,s23,s22,s33)
cannot be left unconstrained, or else (8) need not be a covariance matrix.
Define $s_{13} = \rho_{13}\, s_{33}^{1/2}$ and $s_{23} = \rho_{23}\,(s_{22}s_{33})^{1/2}$. Then the determinant
$|\Sigma_\epsilon| = s_{22}s_{33}(1 - \rho_{13}^2 - \rho_{23}^2)$. Since Σϵ is a covariance matrix, its determinant
must be non-negative, and hence we cannot allow the correlations (ρ13,ρ23)
to vary freely. There are many ways to parameterize Σϵ in an unrestricted
way that forces it to be positive semi-definite. Here we use a polar coordinate
representation, ρ13 = γ cos(θ) while ρ23 = γ sin(θ), with γ ∈ (−1,1) and
θ ∈ (−π,π).
The zero entries in (8) are not required, although they are implicit in the
two part model used in the original papers involving only the episodically
consumed dietary component and not energy (Tooze, et al., 2006; Kipnis, et
al., 2009) and they make intuitive sense in our context. We have chosen to
use this restriction for these reasons and especially so that the marginal model
for the episodically consumed dietary component is the same as that in the
literature.
Kipnis, et al. (2010) explore a sample selection model (Heckman, 1976,
1979; Leung and Yu, 1996; Kyriazidou, 1997; Min and Agresti, 2002) that does
not have this restriction. They found that such a sample selection model can be
very unstable in our context, with the components of Σu and Σϵ varying wildly.
Although it is possible to use MCMC computations to fit the sample selection
model, given the acceptance of the restriction in nutritional epidemiology and
of the NCI method, we focus on the covariance model (8).
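The polar-coordinate parameterization of the covariance model (8) can be illustrated with a short Python sketch. The function names are hypothetical and the numerical values in the usage note are made up for illustration. The sketch builds Σϵ from (s22, s33, γ, θ) and lets one check numerically that the determinant equals s22 s33 (1 − γ²), which is positive whenever |γ| < 1.

```python
import math


def sigma_eps_from_polar(s22, s33, gamma, theta):
    """Build the patterned covariance matrix (8) from unrestricted-style parameters.

    rho13 = gamma * cos(theta) and rho23 = gamma * sin(theta), so that
    rho13^2 + rho23^2 = gamma^2 < 1 and the determinant stays positive.
    """
    if s22 <= 0.0 or s33 <= 0.0:
        raise ValueError("variances s22 and s33 must be positive")
    if not -1.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (-1, 1)")
    rho13 = gamma * math.cos(theta)
    rho23 = gamma * math.sin(theta)
    s13 = rho13 * math.sqrt(s33)          # var(eps_1) is fixed at 1
    s23 = rho23 * math.sqrt(s22 * s33)
    return [[1.0, 0.0, s13],
            [0.0, s22, s23],
            [s13, s23, s33]]


def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
```

For instance, with the illustrative values `sigma_eps_from_polar(1.2, 1.4, 0.7, 0.9)`, `det3` returns a value that agrees with 1.2 × 1.4 × (1 − 0.49) up to rounding. Because the leading principal minors 1 and s22 are also positive, the resulting matrix is positive definite for every admissible (γ, θ).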
Remark 2 It is very important to allow for Σϵ being non-diagonal. The term
s23 ≠ 0 simply reflects the reality that, within a person and hence conditional
on (Ui1,Ui2,Ui3), the amount of food reported consumed and the amount of
energy consumed are sometimes highly correlated. The reason we allow s13 ≠ 0
is to account for the very real possibility that, again within a person, the very
fact that one consumes a food leads to a higher or lower reported energy
(caloric) intake.
2.4 Model Fitting and Computation
It is possible in principle to fit model (1)-(8) using nonlinear mixed effects soft-
ware. Kipnis, et al. (2010) use the SAS procedure PROC NLMIXED. How-
ever, we have found that such implementation is slow and not very stable, with
many issues of convergence. NLMIXED uses adaptive Gaussian quadrature
to integrate the likelihood over the distribution of random effects. NLMIXED
can have convergence problems, especially when there are too many, or too
few, zeros. What typically happens is that corr(Ui1,Ui2) tries to go to 1.00 or
sometimes even −1.00, or that var(Ui1) or var(Ui2) tries to go to 0.00. When
one of these things happens, the model usually converges, according to the
change-in-likelihood criterion, but the Hessian is not positive definite. Occa-
sionally, NLMIXED fails to converge at all. In general, we have found that
when NLMIXED does not have such numerical problems, its results and ours
are in reasonable agreement. These issues are described in more detail in
Section 4.2.
Hence, for stability and speed, we have turned to a Bayesian approach
for fitting the model described by equations (1)-(8). We emphasize that the
Markov Chain Monte Carlo computation can either be thought of as a strictly
Bayesian computation with ordinary Bayesian inference, or as a means of
developing frequentist estimators of the crucial parameters, based on the well-
known fact that in parametric models such as ours, the posterior mean of the
parameters is a consistent and asymptotically normally distributed frequentist
estimator, see for example Lehmann and Casella (1998, Chapter 6.8).
Our computational algorithm, described in detail in the appendix, uses
Gibbs sampling with some Metropolis-Hastings steps. We have implemented
this approach in both Matlab and R, and it is fast enough for practical use. In
the NIH-AARP Diet and Health Study described in Section 3, with a sample
size of 899, for a burn-in of 1,000 steps followed by 10,000 MCMC iterations,
our Matlab and R programs take approximately 2 minutes and 11.7 minutes
on an Intel(R) Xeon(TM) CPU with 3.73GHz and 7.8GB of RAM in a Linux
system, respectively. For a burn-in of 5,000 steps followed by 15,000 MCMC
iterations, our Matlab and R programs take approximately 3 minutes and 17.5
minutes, respectively. Both programs are available from the first author.
We have also developed an implementation in WinBUGS with a BUGS
model called from R by using the package R2WinBUGS. Details are available
from the third author. As is to be expected, the WinBUGS code is much slower
than the custom programs, taking approximately 5 hours (Pentium computer
with 3.5GHz CPU and 1.99GB of RAM in a Windows system) for a burn-
in of 1,000 steps followed by 10,000 MCMC samples. We are also currently
developing a SAS macro for use by the nutritional community. On various test
data sets, the WinBUGS, R, SAS and Matlab code gave very similar answers.
In our empirical work, we use the Matlab code.
Remark 3 There are important data conventions that we use. These are de-
scribed in detail in the Appendix. For example, in Section A.1, we mention
that covariates are always standardized to have sample mean zero and sample
variance one. The reason is a matter of scaling: energy intake is in terms of
calories, which are typically in the 1,000’s, so that the corresponding regres-
sion parameters, without standardization, with the FFQ energy as a covariate,
would necessarily be tiny, making it hard to develop a plausible prior distribution.
As described in Section A.1, we also standardize the responses for
numerical stability and to weaken dependence upon the prior distributions, and
in Section A.2 we describe why this standardization makes sense. We have fit
our method with various different prior distributions, and there is very little
sensitivity to prior specification.
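The covariate standardization mentioned in Remark 3 can be sketched as follows. This is a minimal illustration, not the procedure of Section A.1: the function name and the example energy values are hypothetical, and the use of the n − 1 denominator for the sample variance is an assumption of this sketch. An intercept column would be left unstandardized.

```python
import statistics


def standardize(values):
    """Rescale a covariate to sample mean zero and sample variance one."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: n - 1 denominator
    if sd == 0.0:
        raise ValueError("a constant covariate cannot be standardized")
    return [(v - mean) / sd for v in values]


# Hypothetical FFQ energy values in kilo-calories, for illustration only.
ffq_energy = [1800.0, 2100.0, 2500.0, 1950.0]
z_energy = standardize(ffq_energy)
```

After this rescaling the regression coefficient of the covariate is on a scale of order one rather than of order one thousandth, which is the scaling issue the remark describes.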
2.5 The Role of Covariates
Covariates are important for estimating the distribution of usual intakes, for
at least three reasons.
• First, as a matter of model specification. Consider abstractly the simple
linear regression model Y = β0 + β1X + ϵ: given X, ϵ might be normally
distributed, but if X is not simultaneously normally distributed, then
removing it from the model would give a model Y = κ0 + ξ, and ξ
would not be normally distributed, and our model assumptions would
be violated.
• Subar, et al. (2006) studied using food frequency questionnaire (FFQ)
data as covariates to estimate the distributions of individual usual in-
takes of episodically consumed dietary components. They found strong
and consistent relationships between FFQ and 24hr. This supports the
postulate that FFQ data may provide important covariate information in
supplementing 24hr for estimating usual intake of dietary components.
Besides FFQ, there are some other clinical covariates such as gender,
age, body mass index (BMI), etc. that may be associated with usual in-
take. Thus, our covariates included an intercept, age, BMI, the FFQ for
energy intake and the FFQ for the dietary component of interest. They
are used to reduce the error with which the usual intake is estimated,
and to make more plausible our distributional assumptions.
• Kipnis, et al. (2009) state in their abstract “One feature of the proposed
method is that additional covariates potentially related to usual intake
may be used to increase the precision of estimates of usual intake and of
diet-health outcome associations”. In their introduction they state “In
Section 3, using data from the Eating at America’s Table Study (EATS),
we quantify the increased precision obtained from including a FFQ report
as a covariate”.
A referee has asked whether the β-coefficients for the covariates are interpretable,
and whether it would be of interest to make inferences about whether
the covariates are associated with usual intake. Because energy-adjusted usual
intakes involve three β-coefficients for each covariate, interpretation of any one
of them is difficult. Whether a particular covariate is associated with usual intake
is a mildly interesting question, but is far less important than estimating
distributions of energy-adjusted usual intakes.
2.6 Simulation Study
We performed a simulation study that was based upon our empirical study
given in Section 3, in order to ascertain whether the methodology results in
reasonably unbiased estimates of (β1,β2,β3,Σu,Σϵ). To test whether our
algorithm can produce non-near-zero correlations when the true correlations
are actually far from zero, we simulated 200 data sets, each of size n = 1,000,
roughly the size of the NIH-AARP calibration cohort in Section 3. In this
simulation, we used the same covariates for each of the three outcomes, i.e.,
we set Xi1= Xi2= Xi3. The covariate vectors had three components, the
first equal to 1.0 for an intercept, and the other two generated as Normal(0,1).
The parameters (β1,β2,β3) were generated as Uniform(0,1) for each simulated
data set. We used
$$\Sigma_u = \begin{pmatrix} 0.50 & 0.24 & 0.24 \\ 0.24 & 0.70 & 0.35 \\ 0.24 & 0.35 & 0.70 \end{pmatrix}; \qquad \Sigma_\epsilon = \begin{pmatrix} 1.00 & 0.00 & 0.47 \\ 0.00 & 1.20 & 0.78 \\ 0.47 & 0.78 & 1.40 \end{pmatrix}.$$
The means of the posterior means of (β1,β2,β3) were unbiased overall and are
not reported here. The means of the posterior means of (Σu,Σϵ) were
$$\hat{\Sigma}_u = \begin{pmatrix} 0.51 & 0.27 & 0.27 \\ 0.27 & 0.68 & 0.33 \\ 0.27 & 0.33 & 0.67 \end{pmatrix}; \qquad \hat{\Sigma}_\epsilon = \begin{pmatrix} 1.00 & 0.00 & 0.39 \\ 0.00 & 1.23 & 0.80 \\ 0.39 & 0.80 & 1.43 \end{pmatrix}.$$
Crucially, for the main purposes of estimating the distribution of usual intakes,
the posterior means were essentially unbiased for estimating Σu. As seen in
the Appendix, Σϵ also has a role in the definition of usual intake, and it
too was essentially unbiased except for a small bias of size 0.08 in estimating
cov(ϵi1k,ϵi3k), a term that does not appear in the definitions of usual intake.
Remark 4 We give here only the results of a single simulation because what
we have shown above is representative of other simulations we have done.
For example, we have simulated cases where the off-diagonal elements of Σu
were zero and cases where some of them were negative. We have also simulated
cases where the diagonal elements of Σu were smaller and somewhat larger. In
none of the cases did we see any significant bias in the estimates.
Remark 5 We have not displayed the simulation results for the Proc NLMIXED
procedure because in those cases that it converges, it is very nearly unbiased,
just like our method.
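The data-generating process of the simulation in Section 2.6 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the simulation code used for the paper: the function names are hypothetical, the number of recalls per person (m = 2) is assumed, and the amounts are left on the transformed (latent) scale because the section does not state which Box-Cox parameters were used. The covariance matrices are the true values of Σu and Σϵ given in Section 2.6.

```python
import math
import random

SIGMA_U = [[0.50, 0.24, 0.24], [0.24, 0.70, 0.35], [0.24, 0.35, 0.70]]
SIGMA_EPS = [[1.00, 0.00, 0.47], [0.00, 1.20, 0.78], [0.47, 0.78, 1.40]]


def cholesky(a):
    """Lower-triangular Cholesky factor of a small positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_mvn(low, rng):
    """One draw from Normal(0, low * low^T)."""
    z = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(low))]


def simulate(n=1000, m=2, seed=1):
    """Simulate one data set from model (1) with the indicator rule (2).

    Returns (beta, data); each element of data is (x, recalls), and each recall
    is (y1, w2 or None, w3) with amounts on the transformed scale.
    """
    rng = random.Random(seed)
    low_u, low_e = cholesky(SIGMA_U), cholesky(SIGMA_EPS)
    # beta[j] holds the three coefficients of model part j + 1, drawn as Uniform(0, 1).
    beta = [[rng.uniform(0.0, 1.0) for _ in range(3)] for _ in range(3)]
    data = []
    for _ in range(n):
        x = [1.0, rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]  # intercept + two covariates
        u = draw_mvn(low_u, rng)                             # person-specific effects
        recalls = []
        for _ in range(m):
            eps = draw_mvn(low_e, rng)                       # day-to-day variation
            w = [sum(b * xv for b, xv in zip(beta[j], x)) + u[j] + eps[j]
                 for j in range(3)]
            y1 = 1 if w[0] > 0.0 else 0
            recalls.append((y1, w[1] if y1 else None, w[2]))
        data.append((x, recalls))
    return beta, data
```

Calling `simulate(n=1000, m=2, seed=1)` produces one data set of roughly the size described in the text; repeating the call with 200 different seeds would mimic the 200 replications.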
3 Empirical Analysis: Methods
3.1 Introduction to the NIH-AARP Diet and Health Study
The NIH-AARP Diet and Health Study, see http://dietandhealth.cancer.gov/
and Schatzkin, et al. (2001), has two components, the main study with diet
assessed by a Food Frequency Questionnaire (FFQ) and a calibration sub-
study with additional diet assessment by two 24hr. We considered a part of
the main study that consists of np = 142,364 women, who contributed an
FFQ as well as relevant demographic characteristics. The data used were the
same as in Sinha, et al. (2010). The covariates X used included an intercept,
age, body mass index, the FFQ for energy intake and the FFQ for the dietary
component in question. The 24hr was not available for these subjects. Thus,
the primary sample represents data on Xi = Xi1 = Xi2 = Xi3 for i = 1,...,np.
In addition to the primary sample, there was a subsample of nv = 899
women in the calibration sub-study who completed an FFQ and demographic
characteristics, so that there are Xi = Xi1 = Xi2 = Xi3 for
i = np + 1,...,nv + np. In addition, these women completed two 24hr. Hence
we observed (Yi1k,Yi2k,Yi3k) for k = 1,2 and for i = np + 1,...,nv + np.
We illustrate our computational algorithm using data from both the two
24hr and the FFQ for whole grains, fish and energy intake, along with covariates.
Following Kipnis, et al. (2009, 2010), the FFQ values for fish, whole grain
and energy intake were transformed using λ = 0.25, λ = 0.33 and λ = 0.00,
respectively. The 24hr used λ = 0.50, λ = 0.33 and λ = 0.33, respectively.
The MCMC calculations result in samples from the posterior distribution
of $B = (\beta_1^{T}, \beta_2^{T}, \beta_3^{T})^{T}$, Σu, Σϵ and (Ui1,Ui2,Ui3), the latter only for i =
np + 1,...,nv + np. The means of the samples for (B,Σu,Σϵ) can be taken
as frequentist point estimates of these quantities, and are denoted here as
$(\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \hat{\Sigma}_u, \hat{\Sigma}_\epsilon)$. We will use shorthand notation for usual intake:
Usual dietary component intake is TFi = G1{Xi1,Xi2,β1,β2,Ui1,Ui2,Σϵ(2,2)}, see (6);
Usual energy intake is TEi = G2{Xi3,β3,Ui3,Σϵ(3,3)}, see (7).
For both usual dietary component intake and usual energy intake, 24hr samples
are available for i = np + 1,...,nv + np.
3.2 Frequentist Analysis
We are going to write the variable of interest as H(TFi,TEi). Thus, (a) the
dietary component is H(TFi,TEi) = TFi; (b) energy is H(TFi,TEi) = TEi; and
(c) the energy adjusted dietary component is H(TFi,TEi) = 1000 × TFi/TEi.
In general then, the usual intake variable of interest for person i can be written
as
$$Q_i = H[G_1\{X_{i1}, X_{i2}, \beta_1, \beta_2, U_{i1}, U_{i2}, \Sigma_\epsilon(2,2)\}, G_2\{X_{i3}, \beta_3, U_{i3}, \Sigma_\epsilon(3,3)\}],$$
for i = 1,...,np + nv, where we have that (Ui1,Ui2,Ui3) = Normal(0,Σu).
Estimation of the distribution of Q across the population is easily accom-
plished by a Monte-Carlo computation. This is a different Monte-Carlo com-
putation than the MCMC, and is performed after the MCMC has been done.
Specifically, for a large B, where we took B = 5,000, and for b = 1,...,B, generate
$(U_{bi1}, U_{bi2}, U_{bi3}) = \mathrm{Normal}(0, \hat{\Sigma}_u)$. Here B is not the number of burn-in
steps, but simply a large enough number to do numerical integration. Then
the distribution of usual intake can be estimated as the empirical distribution
of the values
$$Q_{bi} = H[G_1\{X_{i1}, X_{i2}, \hat{\beta}_1, \hat{\beta}_2, U_{bi1}, U_{bi2}, \hat{\Sigma}_\epsilon(2,2)\}, G_2\{X_{i3}, \hat{\beta}_3, U_{bi3}, \hat{\Sigma}_\epsilon(3,3)\}],$$
taken across i = 1,...,nv + np and b = 1,...,B.
Standard errors and confidence intervals for the distribution of usual intake
can be formed easily by bootstrapping. We used 400 bootstrap samples in our
numerical work.
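The Monte-Carlo step of Section 3.2 can be sketched in Python for the energy-adjusted case H(TF, TE) = 1000 × TF/TE. This is a minimal sketch, not the authors' Matlab or R code. All names and all numerical values at the bottom (`BETA_HAT`, `SIGMA_U_HAT`, `X_ROWS`, the two error variances, and the choice λF = λE = 0) are hypothetical placeholders standing in for posterior-mean estimates and covariate rows; they are not results from the NIH-AARP data.

```python
import math
import random


def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def g_star(v, lam, sigma2):
    """Taylor-approximated back-transformation g*(v, lam, sigma2) of (6)-(7)."""
    if lam == 0.0:
        return math.exp(v) * (1.0 + 0.5 * sigma2)
    base = 1.0 + lam * v
    if base <= 0.0:
        raise ValueError("1 + lam * v must be positive")
    return base ** (1.0 / lam) + 0.5 * sigma2 * (1.0 - lam) * base ** (1.0 / lam - 2.0)


def cholesky(a):
    """Lower-triangular Cholesky factor of a small positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / low[j][j]
    return low


def energy_adjusted_draws(x_rows, beta, sigma_u, s22, s33, lam_f, lam_e, n_draws, seed=0):
    """Return Q_bi = 1000 * T_F / T_E over all persons i and draws b = 1..n_draws."""
    rng = random.Random(seed)
    low = cholesky(sigma_u)
    draws = []
    for x in x_rows:
        lin = [sum(b * xv for b, xv in zip(beta[j], x)) for j in range(3)]
        for _ in range(n_draws):
            z = [rng.gauss(0.0, 1.0) for _ in range(3)]
            u = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(3)]
            t_f = norm_cdf(lin[0] + u[0]) * g_star(lin[1] + u[1], lam_f, s22)
            t_e = g_star(lin[2] + u[2], lam_e, s33)
            draws.append(1000.0 * t_f / t_e)
    return draws


def empirical_quantile(values, p):
    """Quantile of the empirical distribution of the pooled draws."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(p * len(ordered)) - 1))
    return ordered[index]


# Hypothetical placeholder estimates and covariate rows, for illustration only.
BETA_HAT = [[0.1, 0.3, 0.2], [0.5, 0.2, 0.1], [1.0, 0.1, 0.3]]
SIGMA_U_HAT = [[0.50, 0.24, 0.24], [0.24, 0.70, 0.35], [0.24, 0.35, 0.70]]
X_ROWS = [[1.0, 0.2, -0.5], [1.0, -1.0, 0.3]]

draws = energy_adjusted_draws(X_ROWS, BETA_HAT, SIGMA_U_HAT, 1.2, 1.4,
                              0.0, 0.0, n_draws=500, seed=1)
median_q = empirical_quantile(draws, 0.5)
```

Bootstrap standard errors of the kind described in the text would wrap this computation: resample persons with replacement, refit the model, and recompute the quantiles for each bootstrap sample.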