
# A Recursive Estimator for Random Coefficient Models

## Abstract

This paper describes a recursive method for estimating random coefficient models. Starting with a trial value for the moments of the distribution of coefficients in the population, draws are taken and then weighted to represent draws from the conditional distribution for each sampled agent (i.e., conditional on the agent's observed dependent variable). The moments of the weighted draws are calculated and then used as the new trial values, repeating the process to convergence. The recursion is a simulated EM algorithm that provides a method of simulated scores estimator. The estimator is asymptotically equivalent to the maximum likelihood estimator under specified conditions. The recursive procedure is faster than maximum simulated likelihood (MSL) with numerical gradients, easier to code than MSL with analytic gradients, assures a positive definite covariance matrix for the coefficients at each iteration, and avoids the numerical difficulties that often occur with gradient-based optimization. The method is illustrated with a mixed logit model of households' choice among energy suppliers.
Kenneth Train
Department of Economics
University of California, Berkeley
October 18, 2007
Keywords: Mixed logit, probit, random coeﬃcients, EM algorithm.
1 Introduction
Random coeﬃcient models, such as mixed logit or probit, are widely used be-
cause they parsimoniously represent the fact that diﬀerent agents have diﬀerent
preferences. The parameters of the model are the parameters of the distribu-
tion of coeﬃcients in the population. The speciﬁcations generally permit full
covariance among the random coeﬃcients. However, this full generality is sel-
dom realized in empirical applications due to the numerical diﬃculty of max-
imizing a likelihood function that contains so many parameters. As a result,
most applications tend to assume no covariance among coefficients (Chen and
Cosslett, 1998; Goett et al., 2000; Hensher et al., 2005) or covariance among
only a subset of coefficients (Train, 1998; Revelt and Train, 1998).¹
This paper presents a procedure that facilitates estimation of random co-
eﬃcient models with full covariance among coeﬃcients. In its simplest form,
it is implemented as follows. For each sampled agent, draws are taken from
the population distribution of coeﬃcients using a trial value for the mean and
covariance of this distribution. Each draw is weighted proportionally to the
probability of the agent’s observed dependent variable under this draw. The
mean and covariance of these weighted draws over all sampled agents are then
calculated. This mean and covariance become the new trial values, and the pro-
cess is repeated to convergence. The procedure provides a method of simulated
scores estimator (Hajivassiliou and McFadden, 1998), which is asymptotically
equivalent to maximum likelihood under well-known conditions discussed be-
low. The recursive procedure constitutes a simulated EM algorithm (Dempster
et al., 1977; Ruud, 1991), which converges to a root of the score condition.
The procedure is related to the diagnostic tool described by Train (2003, section 11.5) of comparing the conditional and unconditional densities of coefficients for an estimated model. In particular, to evaluate a model, draws are taken from the conditional distribution of coefficients for each agent in the sample, and then the distribution of these draws is compared with the estimated population (i.e., unconditional) distribution. If the model is correctly specified, the two distributions should be similar, since the expectation of the former is equal to the latter. In the current paper, this concept is used as an estimation criterion rather than a diagnostic tool.

¹ Restrictions on the covariances are not as benign as they might at first appear. For example, Louviere (2003) argues, with compelling empirical evidence, that the scale of utility (or, equivalently, the variance of random terms over repeated choices by the same agent) varies over people, especially in stated-preference experiments. Models without full covariance of utility coefficients imply the same scale for all people. If in fact scale varies, the variation in scale, which does not affect marginal rates of substitution (MRS), manifests erroneously as variation in independent coefficients that does affect estimated MRS.

The procedure is described and applied in the sections below. Section 2
provides the basic version under assumptions that are more restrictive than
needed but facilitate explanation and implementation. Section 3 generalizes
the basic version. Section 4 applies the procedure to data on households’
choices among energy suppliers.
2 Basic Version
Each agent faces exogenous observed explanatory variables $x$ and observed dependent variable(s) $y$. We assume in our notation that $y$ is discrete and $x$ is continuous, though these assumptions can be changed with appropriate change in notation. Let $\beta$ be a vector of random coefficients that affect the agent's outcome and are distributed over agents in the population with density $f(\beta \mid \theta)$, where $\theta$ are parameters that characterize the density, such as its mean and covariance. For the purposes of this section, we specify $f$ to be the normal density, independent of $x$; these assumptions will be relaxed in section 3. Let $m(\beta)$ be the vector-valued function consisting of $\beta$ itself and the vectorized lower portion of $(\beta\beta')$. Then, by definition, $\theta = \int m(\beta) f(\beta \mid \theta)\, d\beta$. That is, $\theta$ are the unconditional moments of $\beta$.
Consider now the behavioral model. Given $\beta$, the behavioral model gives the probability that an agent facing $x$ has outcome $y$ as some function $L(y \mid \beta, x)$, which we assume in this section depends on coefficients $\beta$ and not (directly) on elements of $\theta$. In a mixed logit model with repeated choices for each agent, $L$ is a product of logits. In other models, $L$, which we call the kernel of the behavioral model, takes other forms.² Since $\beta$ is not known, the probability of outcome $y$ is $P(y \mid x, \theta) = \int L(y \mid \beta, x) f(\beta \mid \theta)\, d\beta$.

² If all random elements of the behavioral model are captured in $\beta$, then $L$ is an indicator function of whether or not the observed outcome arises under that $\beta$.

The density of $\beta$ can be determined for each agent conditional on the agent's outcome. This conditional distribution is the distribution of $\beta$ among the subpopulation of agents who, when faced with $x$, have outcome $y$. By Bayes' identity, the conditional density is $h(\beta \mid y, x, \theta) = L(y \mid \beta, x) f(\beta \mid \theta) / P(y \mid x, \theta)$. The moments of this conditional density are $\int m(\beta) h(\beta \mid y, x, \theta)\, d\beta$, and the expectation of such moments in the population is:

$$
M(\theta) = \int_x \sum_y S(y \mid x) \int_\beta m(\beta)\, h(\beta \mid y, x, \theta)\, d\beta\; g(x)\, dx
$$

where $g(x)$ is the density of $x$ in the population and $S(y \mid x)$ is the share of agents with outcome $y$ among those facing $x$.

Denote the true parameters as $\theta^*$. At the true parameters $S(y \mid x) = P(y \mid x, \theta^*)$, such that the expected value of the moments of the conditional distributions equals the unconditional moments:

$$
\begin{aligned}
M(\theta^*) &= \int_x \sum_y S(y \mid x) \int_\beta m(\beta)\, \frac{L(y \mid \beta, x) f(\beta \mid \theta^*)}{P(y \mid x, \theta^*)}\, d\beta\; g(x)\, dx \\
&= \int_x \sum_y \int_\beta m(\beta)\, L(y \mid \beta, x) f(\beta \mid \theta^*)\, d\beta\; g(x)\, dx \\
&= \int_x \int_\beta m(\beta) \Big[ \sum_y L(y \mid \beta, x) \Big] f(\beta \mid \theta^*)\, d\beta\; g(x)\, dx \\
&= \int_x \int_\beta m(\beta) f(\beta \mid \theta^*)\, d\beta\; g(x)\, dx \\
&= \theta^*,
\end{aligned}
$$

since $L(y \mid \beta, x)$ sums to one over all possible values of $y$.

The estimation procedure uses a sample analog to the population expectation $M(\theta)$. The variables for sampled agents are subscripted by $n = 1, \ldots, N$. The sample average of the moments of the conditional distributions is then:

$$
M(\theta) = \frac{1}{N} \sum_n \int_\beta m(\beta)\, \frac{L(y_n \mid \beta, x_n)}{P(y_n \mid x_n)}\, f(\beta \mid \theta)\, d\beta.
$$

This quantity is simulated as follows: (1) For each agent, take $R$ draws of $\beta$ from $f(\beta \mid \theta)$ and label the $r$-th draw for agent $n$ as $\beta_{nr}$. (2) Calculate $L(y_n \mid \beta_{nr}, x_n)$ for all draws for all agents. (3) Weight draw $\beta_{nr}$ by $w_{nr} = L(y_n \mid \beta_{nr}, x_n) \big/ \frac{1}{R} \sum_{r'} L(y_n \mid \beta_{nr'}, x_n)$, such that the weights average to one over draws for each given agent. (4) Average the weighted moments:

$$
\tilde{M}(\theta) = \sum_n \sum_r w_{nr}\, m(\beta_{nr}) / NR.
$$

The estimator $\hat{\theta}$ is defined by $\tilde{M}(\hat{\theta}) = \hat{\theta}$. The recursion starts with an initial value of $\theta$ and repeatedly calculates $\theta^{t+1} = \tilde{M}(\theta^t)$ until $\theta^{T+1} = \theta^T$ within a tolerance. Since the first two moments determine the covariance, the procedure is equivalently applied to the mean and covariance directly. Note that the covariance in each iteration is necessarily positive definite, since it is calculated as the covariance of weighted draws.
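The four simulation steps and the recursion can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the author's code: the kernel `binary_logit_kernel` (one binary choice per agent), the function name `recursive_estimator`, the absolute-change tolerance, and the choice to hold the standard normal draws fixed across iterations are all assumptions made for the example.

```python
import numpy as np

def binary_logit_kernel(y, x, beta):
    """Illustrative kernel L(y_n | beta_nr, x_n): one binary logit choice per agent."""
    v = np.einsum("nrk,nk->nr", beta, x)           # utility index for each agent and draw
    p1 = 1.0 / (1.0 + np.exp(-v))
    return np.where(y[:, None] == 1, p1, 1.0 - p1)

def recursive_estimator(kernel, y, x, b, W, R=100, tol=1e-3, max_iter=200, seed=0):
    """Iterate theta_{t+1} = M~(theta_t) on the mean b and covariance W."""
    rng = np.random.default_rng(seed)
    N, K = x.shape[0], b.size
    eta = rng.standard_normal((N, R, K))           # assumption: same draws every iteration
    for it in range(1, max_iter + 1):
        C = np.linalg.cholesky(W)                  # W is positive definite by construction
        beta = b + eta @ C.T                       # step 1: draws from N(b, W)
        L = kernel(y, x, beta)                     # step 2: kernel at every draw
        w = L / L.mean(axis=1, keepdims=True)      # step 3: weights average to one per agent
        flat = beta.reshape(-1, K)
        wf = w.reshape(-1) / (N * R)               # step 4: weights sum to one over all N*R draws
        b_new = wf @ flat
        dev = flat - b_new
        W_new = (dev * wf[:, None]).T @ dev        # weighted covariance of the draws
        change = max(np.abs(b_new - b).max(), np.abs(W_new - W).max())
        b, W = b_new, W_new
        if change < tol:
            break
    return b, W, it

# Small synthetic demonstration (made-up data, not the paper's application).
rng = np.random.default_rng(1)
x = rng.standard_normal((300, 2))
beta_true = np.array([1.0, -0.5]) + 0.5 * rng.standard_normal((300, 2))
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-(beta_true * x).sum(axis=1)))).astype(int)
b_hat, W_hat, n_iter = recursive_estimator(binary_logit_kernel, y, x, np.zeros(2), np.eye(2))
```

Because each update of `W` is a weighted covariance of draws with strictly positive weights, the Cholesky factorization at the top of the loop succeeds without any explicit constraint.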
We ﬁrst examine the properties of the estimator and then the recursion.
2.1 Relation of estimator to maximum likelihood
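Given the specification of $P(y_n \mid x_n)$, the score can be written:

$$
\begin{aligned}
s_n(\theta) &= \frac{\partial \log P(y_n \mid x_n)}{\partial \theta} \\
&= \frac{1}{P(y_n \mid x_n)} \int L(y_n \mid \beta, x_n)\, \frac{\partial f(\beta \mid \theta)}{\partial \theta}\, d\beta \\
&= \int \frac{\partial \log f(\beta \mid \theta)}{\partial \theta}\, \frac{L(y_n \mid \beta, x_n)}{P(y_n \mid x_n)}\, f(\beta \mid \theta)\, d\beta.
\end{aligned}
$$

The maximum likelihood estimator is a root of $\sum_n s_n(\theta) = 0$.

Let $b$ be the mean and $W$ the covariance of the normally distributed coefficients, such that $\log f(\beta \mid b, W) = k - \frac{1}{2} \log(|W|) - \frac{1}{2} (\beta - b)' W^{-1} (\beta - b)$. The derivatives entering the score are:

$$
\begin{aligned}
\frac{\partial \log f}{\partial b} &= W^{-1} (\beta - b) \\
\frac{\partial \log f}{\partial W} &= -\frac{1}{2} W^{-1} + \frac{1}{2} W^{-1} [(\beta - b)(\beta - b)'] W^{-1}.
\end{aligned}
$$

It is easy to see that $\sum_n s_n(\theta_0) = 0$ for some $\theta_0$ if and only if $M(\theta_0) = \theta_0$, such that, in the non-simulated version, the estimator is the same as MLE.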
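The equivalence can be spelled out as a short worked step. The sketch below writes $h_n(\beta) = L(y_n \mid \beta, x_n) f(\beta \mid \theta) / P(y_n \mid x_n)$ as shorthand (a label introduced here for brevity) and uses only the two derivatives given above together with $\int h_n(\beta)\, d\beta = 1$.

$$
\begin{aligned}
\sum_n \int W^{-1} (\beta - b)\, h_n(\beta)\, d\beta = 0
&\;\Longleftrightarrow\; b = \frac{1}{N} \sum_n \int \beta\, h_n(\beta)\, d\beta, \\
\sum_n \int \Big[ -\tfrac{1}{2} W^{-1} + \tfrac{1}{2} W^{-1} (\beta - b)(\beta - b)' W^{-1} \Big] h_n(\beta)\, d\beta = 0
&\;\Longleftrightarrow\; W = \frac{1}{N} \sum_n \int (\beta - b)(\beta - b)'\, h_n(\beta)\, d\beta.
\end{aligned}
$$

The first line follows because $W^{-1}$ is nonsingular; the second follows after pre- and post-multiplying by $W$. The right-hand sides are the mean and covariance of the conditional distributions averaged over the sample, which is the condition $M(\theta) = \theta$ expressed in terms of $b$ and $W$.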
Consider now simulation. A direct simulator of the score is

$$
\tilde{s}_n(\theta) = \frac{1}{R} \sum_r w_{nr}\, \frac{\partial \log f(\beta_{nr} \mid \theta)}{\partial \theta}.
$$

A method of simulated scores estimator is a root of $\sum_n \tilde{s}_n(\theta) = 0$. As in the non-simulated case, $\sum_n \tilde{s}_n(\theta_0) = 0$ iff $\tilde{M}(\theta_0) = \theta_0$, such that the recursive estimator is this MSS estimator. Hajivassiliou and McFadden (1998) give properties of MSS estimators. In our case, the score simulator is not unbiased, due to the inverse probability that enters the weights. In this case, the MSS estimator is consistent and asymptotically equivalent to MLE if $R$ rises at a rate greater than $\sqrt{N}$.
These properties, and the requirement on the draws, are the same as for maximum simulated likelihood (MSL; Hajivassiliou and Ruud, 1994; Lee, 1995). However, the estimator is not the same as the MSL estimator. For MSL, the probability is expressed as an integral over a parameter-free density, with the parameters entering the kernel. The gradient then involves the derivatives of the kernel rather than the derivatives of the density. That is, the coefficients are treated as functions $\beta(\theta, \mu)$ with $\mu$ having a parameter-free distribution. The probability is expressed as $P(y \mid x, \theta) = \int L(y \mid \beta(\theta, \mu), x) f(\mu)\, d\mu$ and simulated as $\tilde{P}(y \mid x, \theta) = \sum_r L(y \mid \beta(\theta, \mu_r), x) / R$ for draws $\mu_1, \ldots, \mu_R$. The derivative of the log of this simulated probability is

$$
\tilde{\tilde{s}}(\theta) = \frac{1}{\tilde{P}(y \mid x, \theta)}\, \frac{1}{R} \sum_r \frac{\partial L(y \mid \beta(\theta, \mu_r), x)}{\partial \theta},
$$

which is not numerically the same as $\tilde{s}(\theta)$ for a finite number of draws. In particular, the value of $\theta$ that solves $\sum_n \tilde{s}_n(\theta) = 0$ is not the same as the value that solves $\sum_n \tilde{\tilde{s}}_n(\theta) = 0$ and maximizes the log of the simulated likelihood function. Either simulated score can serve as the basis for a MSS estimator, and they are asymptotically equivalent to each other under the maintained condition that $R$ rises faster than $\sqrt{N}$. The distinction is the same as for any MSS estimator that is based on a simulated score that is not the derivative of the log of the simulated probability.³

³ An important class are the unbiased score simulators that Hajivassiliou and McFadden (1998) discuss, which, by definition, differ from the derivative of the log of the simulated probability because the latter is necessarily biased due to the log operation.

The simulated scores at $\hat{\theta}$ provide an estimate of the information matrix, analogous to the BHHH estimate for standard maximum likelihood: $\hat{I} = S'S/N$, where $S$ is the $N \times K$ matrix of $K$-dimensional scores for $N$ agents. The covariance matrix of the estimated parameters is then estimated as $V = \hat{I}^{-1}/N = (S'S)^{-1}$, under the maintained assumption that $R$ rises faster than $\sqrt{N}$. Also, the scores can be used as a convergence criterion, using the statistic $\bar{s}' V \bar{s}$, where $\bar{s} = \sum_n \tilde{s}_n / N$.
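The covariance estimate and the convergence statistic are simple matrix operations once the simulated scores are stacked. A minimal sketch follows; the function name `bhhh_summary` and the synthetic score matrix are illustrative assumptions, and the scores themselves would have to come from a score simulator such as $\tilde{s}_n(\theta)$ above.

```python
import numpy as np

def bhhh_summary(scores):
    """Covariance V = (S'S)^{-1}, standard errors, and the statistic s_bar' V s_bar.

    scores: (N, K) array whose n-th row is the simulated score of agent n.
    """
    S = np.asarray(scores, dtype=float)
    V = np.linalg.inv(S.T @ S)           # equals I_hat^{-1} / N with I_hat = S'S / N
    s_bar = S.mean(axis=0)               # average score; near zero at a root
    stat = float(s_bar @ V @ s_bar)      # convergence statistic
    std_err = np.sqrt(np.diag(V))
    return V, std_err, stat

# Synthetic scores, centered so that the average score is zero as at convergence.
rng = np.random.default_rng(0)
S_demo = rng.standard_normal((500, 3))
S_demo -= S_demo.mean(axis=0)
V_demo, se_demo, stat_demo = bhhh_summary(S_demo)
```

With the scores centered, the statistic is zero up to rounding error, which is the sense in which it measures distance from a root of the simulated score condition.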
2.2 Simulated EM algorithm
We can show that the recursive procedure is an EM algorithm and, as such, is guaranteed to converge. In general, an EM algorithm is a procedure for maximizing a likelihood function in the presence of missing data (Dempster et al., 1977). For sample $n = 1, \ldots, N$, with discrete observed sample outcome $y_n$ and continuous missing data $z_n$ for observation $n$ (and suppressing notation for observed explanatory variables), the log-likelihood function is $\sum_n \log \int P(y_n \mid z, \theta) f_n(z \mid \theta)\, dz$, where $f_n(z \mid \theta)$ is the density of the missing data for observation $n$, which can depend on parameters $\theta$. The recursion is specified as:

$$
\theta^{t+1} = \arg\max_\theta \sum_n \int h_n(z \mid y_n, \theta^t) \log P(y_n, z \mid \theta)\, dz
$$

where $P$ is the probability-density of both the observed outcome and missing data, and $h$ is the density of the missing data conditional on $y$. It is called EM because it consists of an expectation that is maximized. The term being maximized is the expected log-likelihood of both the outcome and the missing data, where this expectation is over the density of the missing data conditional on the outcome. The expectation is calculated using the previous iteration's value of $\theta$ in $h_n$, and the maximization to obtain the next iteration's value is over $\theta$ in $\log P(y_n, z \mid \theta)$. This distinction between the $\theta$ entering the weights for the expectation and the $\theta$ entering the log-likelihood is the key element of the EM algorithm. Under conditions given by Boyles (1983) and Wu (1983), this algorithm converges to a local maximum of the original likelihood function. As with standard gradient-based methods, it is advisable to check whether the local maximum is global, by, e.g., using different starting values.
In the present context, the missing data are the $\beta$'s, which have the same unconditional density for all observations, such that the above notation is translated to $z_n = \beta_n$ and $f_n(z \mid \theta) = f(\beta \mid \theta)\ \forall n$. The EM recursion becomes:

$$
\theta^{t+1} = \arg\max_\theta \sum_n \int h(\beta \mid y_n, x_n, \theta^t) \log[L(y_n \mid \beta, x_n) f(\beta \mid \theta)]\, d\beta. \tag{1}
$$

Since $L(y_n \mid \beta, x_n)$ does not depend on $\theta$, the recursion becomes

$$
\theta^{t+1} = \arg\max_\theta \sum_n \int h(\beta \mid y_n, x_n, \theta^t) \log f(\beta \mid \theta)\, d\beta. \tag{2}
$$

The integral is approximated by simulation, giving:

$$
\theta^{t+1} = \arg\max_\theta \sum_n \sum_r w_{nr}(\theta^t) \log f(\beta_{nr} \mid \theta) \tag{3}
$$

where the weights are expressed as functions of $\theta^t$ since they are calculated from $\theta^t$. Note, as stated above, that in the maximization to obtain $\theta^{t+1}$, the weights are fixed, and the maximization is over $\theta$ in $f$. The function being maximized is the log-likelihood function for a sample of draws from $f$ weighted by $w(\theta^t)$. In the current section, $f$ is the normal density, which makes this maximization easy. In particular, for a sample of weighted draws from a normal distribution, the maximum likelihood estimator for the mean and covariance of the distribution is simply the mean and covariance of the weighted draws. This is our recursive procedure.⁴
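The claim that the weighted mean and covariance maximize the weighted normal log-likelihood can be checked numerically. The sketch below is an illustration with made-up draws and weights; the function names are assumptions introduced for the example.

```python
import numpy as np

def weighted_normal_loglik(draws, w, b, W):
    """Sum over draws of w * log N(draw | b, W), the objective in the M step."""
    K = draws.shape[1]
    dev = draws - b
    quad = np.einsum("ik,kl,il->i", dev, np.linalg.inv(W), dev)
    _, logdet = np.linalg.slogdet(W)
    return float(np.sum(w * (-0.5 * K * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad)))

def weighted_normal_mle(draws, w):
    """Closed-form maximizer: mean and covariance of the weighted draws."""
    p = w / w.sum()
    b = p @ draws
    dev = draws - b
    W = (dev * p[:, None]).T @ dev
    return b, W

rng = np.random.default_rng(0)
draws = rng.standard_normal((1000, 2)) @ np.array([[1.0, 0.0], [0.4, 0.8]])
w = rng.uniform(0.1, 2.0, size=1000)           # arbitrary positive weights

b_star, W_star = weighted_normal_mle(draws, w)
ll_star = weighted_normal_loglik(draws, w, b_star, W_star)
ll_shift = weighted_normal_loglik(draws, w, b_star + 0.05, W_star)   # perturbed mean
ll_scale = weighted_normal_loglik(draws, w, b_star, 1.1 * W_star)    # perturbed covariance
```

Both perturbed values fall below `ll_star`, consistent with the closed-form update being the maximizer, so no numerical optimizer is needed in the M step.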
3 Generalization
We consider non-normal distributions, ﬁxed coeﬃcients, and parameters that
enter the kernel but not the distribution of coeﬃcients.
3.1 Non-normal distributions
For distributions that can be expressed as a transformation of normally distributed terms, the transformation can be taken in the kernel, $L(y \mid T(\beta), x)$ for transformation $T$, and all other aspects of the procedure remain the same. The parameters of the model are still the mean and covariance of the normally distributed terms, before transformation. Draws are taken from a normal with given mean and covariance, weights are calculated for each draw, the mean and covariance of the weighted draws are calculated, and the process is repeated with the new mean and covariance. The transformation affects the weights, but nothing else. A considerable degree of flexibility can be obtained in this way. Examples include lognormals with transformation $\exp(\beta)$, censored normal with $\max(0, \beta)$, and Johnson's $S_B$ distribution with $\exp(\beta)/(1 + \exp(\beta))$. The empirical application in section 4 explores the use of these kinds of transformations.

⁴ EM algorithms have been used extensively to examine Gaussian mixture models for cluster analysis and data mining (e.g., McLachlan and Peel, 2000). In these models, the density of the data is described by a mixture of Gaussian distributions, and the goal is to estimate the mean and covariance of each Gaussian distribution and the parameters that mix them. In our case, the data being explained are discrete outcomes rather than continuous variables, and the Gaussian is the mixing distribution rather than the quantity that is mixed.

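As a small illustration of taking the transformation inside the kernel, the sketch below applies the three example transformations to selected columns of an array of normal draws. The helper `transform_draws` and its dictionary argument are assumptions made for the example; the transformed draws would be passed to the kernel, while the weighted mean and covariance are still computed from the untransformed draws.

```python
import numpy as np

def lognormal(beta):
    return np.exp(beta)

def censored_normal(beta):
    return np.maximum(0.0, beta)

def johnson_sb(beta):
    # exp(beta) / (1 + exp(beta)), written in the numerically stabler logistic form
    return 1.0 / (1.0 + np.exp(-beta))

def transform_draws(beta, transforms):
    """Apply T column by column; columns not listed stay normal."""
    out = np.array(beta, dtype=float, copy=True)
    for k, T in transforms.items():
        out[..., k] = T(out[..., k])
    return out

normal_draws = np.array([[0.0, -1.0, 0.0, 2.5]])
coefficients = transform_draws(
    normal_draws, {0: lognormal, 1: censored_normal, 2: johnson_sb}
)
```

For the single draw shown, the first three coefficients become 1, 0 and 0.5, and the fourth is left unchanged at 2.5.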
For any distribution, the EM algorithm in eqn (3) states that the next value of the parameter, $\theta^{t+1}$, is the MLE of $\theta$ from a sample of weighted draws from the distribution. With a normal distribution, the MLE is the mean and covariance of the weighted draws. For many other distributions, the same is true, namely, that the parameters of the distribution are moments whose MLE is the analogous moments in the sample of weighted draws. When this is not the case, then the moments of the weighted draws are replaced with whatever constitutes the MLE of parameters based on the weighted draws. The equivalence of $\sum_n \tilde{s}_n(\theta) = 0$ and $\tilde{M}(\theta) = \theta$ arises under any $f$ when $\tilde{M}$ is defined as the ML estimator from weighted draws from $f$.
3.2 Fixed coeﬃcients and parameters in the kernel
The procedure can be conveniently modified to allow random coefficients to contain a systematic part that would ordinarily appear as a fixed coefficient in the kernel. Let $\beta_n = \Gamma z_n + \eta_n$ where $z_n$ is a vector of observed variables relating to agent $n$, $\Gamma$ is a conforming matrix, and $\eta_n$ is normally distributed. The parameters $\theta$ are now $\Gamma$ and the mean and covariance of $\eta$. The density of $\beta$ is denoted $f(\beta \mid z_n, \theta)$ since it depends on $z$. The probability for observation $n$ is $P(y_n \mid x_n, z_n, \theta) = \int L(y_n \mid \beta, x_n) f(\beta \mid z_n, \theta)\, d\beta$, and the conditional density of $\beta$ is $h(\beta \mid y_n, x_n, z_n, \theta^t) = L(y_n \mid \beta, x_n) f(\beta \mid z_n, \theta^t) / P(y_n \mid x_n, z_n, \theta^t)$. The EM recursion is

$$
\theta^{t+1} = \arg\max_\theta \sum_n \int h(\beta \mid y_n, x_n, z_n, \theta^t) \log[L(y_n \mid \beta, x_n) f(\beta \mid z_n, \theta)]\, d\beta.
$$

As before, $L$ does not depend on $\theta$ and so drops out, giving:

$$
\theta^{t+1} = \arg\max_\theta \sum_n \int h(\beta \mid y_n, x_n, z_n, \theta^t) \log f(\beta \mid z_n, \theta)\, d\beta,
$$

which is simulated by

$$
\theta^{t+1} = \arg\max_\theta \sum_n \sum_r w_{nr}(\theta^t) \log f(\beta_{nr} \mid z_n, \theta) \tag{4}
$$

where $w_{nr} = L(y_n \mid \beta_{nr}, x_n) \big/ \frac{1}{R} \sum_{r'} L(y_n \mid \beta_{nr'}, x_n)$. Given a value of $\theta$, draws of $\beta_n$ are obtained by drawing $\eta$ from its normal distribution and adding $\Gamma z_n$. The weight for each draw of $\beta_n$ is determined as before, proportional to $L(y_n \mid \beta_n, x_n)$. Then the ML estimate of $\theta$ is obtained from the sample of weighted draws. Since $\beta$ is specified as a system of linear equations with normal errors, the MLE of the parameters is the weighted seemingly unrelated regression (SUR) of $\beta_n$ on $z_n$ (e.g., Greene, 2000, section 15.4). The estimated coefficients of $z_n$ are the new value of $\Gamma$; the estimated constants are the new means of $\eta$; and the covariance of the residuals is the new value of the covariance of $\eta$.
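A minimal sketch of this update is given below. It relies on the standard result that, when every equation has the same regressors (here a constant and $z_n$), SUR reduces to equation-by-equation least squares, so the weighted version is a weighted multivariate regression. The function name `weighted_sur_update` and the array layout (one row per draw, with $z_n$ repeated for each of agent $n$'s draws) are assumptions made for the example.

```python
import numpy as np

def weighted_sur_update(beta, z, w):
    """Weighted regression of beta draws on a constant and z.

    beta: (M, K) draws stacked over agents and draws
    z:    (M, Q) regressors, with each agent's z_n repeated for its draws
    w:    (M,)   weights of the draws
    Returns Gamma (K, Q), the mean of eta (K,), and the covariance of eta (K, K).
    """
    M = beta.shape[0]
    X = np.column_stack([np.ones(M), z])
    Xw = X * w[:, None]
    coef = np.linalg.solve(X.T @ Xw, Xw.T @ beta)    # (X' diag(w) X)^{-1} X' diag(w) beta
    resid = beta - X @ coef
    cov_eta = (resid * w[:, None]).T @ resid / w.sum()
    return coef[1:].T, coef[0], cov_eta

# Check on noise-free synthetic data: the update recovers Gamma and the constants exactly.
rng = np.random.default_rng(0)
z_demo = rng.standard_normal((50, 2))
Gamma_true = np.array([[1.0, -2.0], [0.5, 0.0], [0.0, 3.0]])
mean_true = np.array([0.2, -0.1, 1.5])
beta_demo = mean_true + z_demo @ Gamma_true.T
w_demo = rng.uniform(0.5, 1.5, size=50)
Gamma_hat, mean_hat, cov_hat = weighted_sur_update(beta_demo, z_demo, w_demo)
```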
For fixed parameters that are not implicitly part of a random coefficient, an extra step must be added to the procedure. To account for this generality, let the kernel depend on parameters $\lambda$ that do not enter the distribution of the random $\beta$: i.e., $L(y \mid \beta, x, \lambda)$. Denote the parameters as $\theta, \lambda$, where $\theta$ is still the mean and covariance of the normally distributed coefficients. The EM recursion given in eq (1) becomes:

$$
\theta^{t+1}, \lambda^{t+1} = \arg\max_{\theta, \lambda} \sum_n \int h(\beta \mid y_n, x_n, \theta^t, \lambda^t) \log[L(y_n \mid \beta, x_n, \lambda) f(\beta \mid \theta)]\, d\beta.
$$

Unlike before, $L$ now depends on the parameters and so does not drop out. However, $L$ depends only on $\lambda$, and $f$ depends only on $\theta$, such that the two sets of parameters can be updated separately. The equivalent recursion is:

$$
\theta^{t+1} = \arg\max_\theta \sum_n \int h(\beta \mid y_n, x_n, \theta^t, \lambda^t) \log f(\beta \mid \theta)\, d\beta
$$

as before and

$$
\lambda^{t+1} = \arg\max_\lambda \sum_n \int h(\beta \mid y_n, x_n, \theta^t, \lambda^t) \log L(y_n \mid \beta, x_n, \lambda)\, d\beta.
$$

The latter is the MLE for the kernel model on weighted observations. If, e.g., the kernel is a logit formula, then the updated value of $\lambda$ is obtained by estimating a standard (i.e., non-mixed) logit model on weighted observations, with each draw of $\beta$ providing an observation. A more realistic situation is a model in which the kernel is a product of GEV probabilities (McFadden, 1978), with $\lambda$ being the nesting parameters, which are the same for all agents. The updated values of the nesting parameters are obtained by MLE of the nested logit kernel on the weighted observations, where the only parameters in this estimation are the nesting parameters themselves. The parameters associated with the random coefficients are updated the same as before, as the mean and covariance of the weighted draws.
Alternative-specific constants in discrete choice models can be handled in the way just described. However, if the constants are the only parameters that enter the kernel, then the contraction suggested by Berry, Levinsohn, and Pakes (1995) can be applied rather than estimating them by ML.⁵ For constants $\alpha$, this contraction is a recursive application of $\alpha^{t+1} = \alpha^t + \ln(S) - \ln(\hat{S}(\theta^t, \alpha^t))$, where $S$ is the sample (or population) share choosing each alternative, and $\hat{S}(\theta, \alpha)$ is the predicted share based on parameters $\theta$ and $\alpha$. This recursion would ideally be iterated to convergence with respect to $\alpha$ for each iteration of the recursion for $\theta$. However, it is probably effective with just one updating of $\alpha$ for each updating of $\theta$.
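The contraction itself is a one-line update. The sketch below iterates it to convergence for a logit kernel, holding everything else fixed; the array `v` (the non-constant part of utility for each simulated agent or draw), the function names, and the target shares are illustrative assumptions, not taken from the paper's application.

```python
import numpy as np

def predicted_shares(alpha, v):
    """Average logit probabilities over rows of v (N, J), given constants alpha (J,)."""
    u = v + alpha
    u = u - u.max(axis=1, keepdims=True)        # guard against overflow in exp
    p = np.exp(u)
    p /= p.sum(axis=1, keepdims=True)
    return p.mean(axis=0)

def contraction(alpha, v, S, tol=1e-10, max_iter=1000):
    """Iterate alpha <- alpha + ln(S) - ln(S_hat(alpha)) until alpha stops changing."""
    for _ in range(max_iter):
        alpha_new = alpha + np.log(S) - np.log(predicted_shares(alpha, v))
        if np.max(np.abs(alpha_new - alpha)) < tol:
            return alpha_new
        alpha = alpha_new
    return alpha

rng = np.random.default_rng(0)
v = rng.standard_normal((500, 3))               # made-up utilities net of constants
S = np.array([0.5, 0.3, 0.2])                   # made-up observed shares
alpha_hat = contraction(np.zeros(3), v, S)
shares_hat = predicted_shares(alpha_hat, v)
```

At the fixed point the predicted shares equal the observed shares. Performing a single pass of the update inside each iteration of the recursion for $\theta$ corresponds to the one-updating variant mentioned above.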
4 Application
We apply the procedure to a mixed logit model, using data on households' choice among energy suppliers in stated-preference (SP) exercises. SP exercises are often used to estimate preferences for attributes that are not exhibited in markets or for which market data provide insufficient variation for meaningful estimation. A general description of the approach, with a review of its history and applications, is provided by, e.g., Louviere et al. (2000). In an SP survey, each respondent is presented with a series of choice exercises. Each exercise consists of two or more alternatives, with attributes of each alternative described. The respondent is asked to identify the alternative that they would choose if facing the choice in the real world. The attributes are varied over situations faced by each respondent as well as over respondents, to obtain the variation that is needed for estimation.

⁵ If the kernel is the logit formula, then the contraction gives the MLE of the constants, since both equate sample and predicted shares for each alternative; see, e.g., Train (2003, p. 66).

In the current application, respondents are residential energy customers, defined as a household member who is responsible for the household's electricity bills. Each respondent was presented with 12 SP exercises representing choice among electricity suppliers. Each exercise consisted of four alternatives, with the following attributes of each alternative specified: the price charged by the supplier in cents per kWh; the length of contract that binds the customer and supplier to that price (varying from 0 for no binding to 5 years); whether the supplier is the local incumbent electricity company (as opposed to an entrant); whether, if an entrant, the supplier is a well-known company like Home Depot (as opposed to an entrant that is not otherwise known); whether time-of-use rates are applied, with the rates in each period specified; and whether seasonal rates are applied, with the rates in each period specified. Choices were obtained for 361 respondents, with nearly all respondents completing all 12 exercises. These data are described by Goett (1998). Huber and Train (2001) used the data to compare ML and Bayesian methods for estimation of conditional distributions of utility coefficients.
The behavioral model is specified as a mixed logit with repeated choices (Revelt and Train, 1998). Consumer $n$ faces $J$ alternatives in each of $T$ choice situations. The utility that consumer $n$ obtains from alternative $j$ in choice situation $t$ is $U_{njt} = \beta_n' x_{njt} + \varepsilon_{njt}$, where $x_{njt}$ is a vector of observed variables, $\beta_n$ is random with distribution specified below, and $\varepsilon_{njt}$ is iid extreme value. In each choice situation, the agent chooses the alternative with the highest utility, and this choice is observed but not the latent utilities themselves. By specifying $\varepsilon_{njt}$ to be iid, all structure in unobserved terms is captured in the specification of $\beta_n' x_{njt}$. McFadden and Train (2000) show that any random utility choice model can be approximated to any degree of accuracy by a mixed logit model of this form.⁶
Let $y_{nt}$ denote consumer $n$'s chosen alternative in choice situation $t$, with the vector $y_n$ collecting the choices in all $T$ situations. Similarly, let $x_n$ be the collection of variables for all alternatives in all choice situations. Conditional on $\beta$, the probability of the consumer's observed choices is a product of logits:

$$
L(y_n \mid \beta, x_n) = \prod_t \frac{e^{\beta' x_{n y_{nt} t}}}{\sum_j e^{\beta' x_{njt}}}.
$$

The (unconditional) probability of the consumer's sequence of choices is:

$$
P(y_n \mid x_n) = \int L(y_n \mid \beta, x_n) f(\beta \mid \theta)\, d\beta
$$

where $f$ is the density of $\beta$, which depends on parameters $\theta$. This $f$ is the (unconditional) distribution of coefficients in the population. The density of $\beta$ conditional on the choices that consumer $n$ made when facing variables $x_n$ is $h(\beta \mid y_n, x_n) = L(y_n \mid \beta, x_n) f(\beta \mid \theta) / P(y_n \mid x_n)$.
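The product-of-logits kernel for one consumer can be sketched as follows. The function name `mixed_logit_kernel` and the array layout are assumptions made for the example; the calculation is done in logs to reduce the risk of underflow when $T$ is large.

```python
import numpy as np

def mixed_logit_kernel(beta, x, y):
    """L(y_n | beta, x_n) for one consumer at each of R draws.

    beta: (R, K) draws of the coefficient vector
    x:    (T, J, K) attributes of J alternatives in T choice situations
    y:    (T,) index of the chosen alternative in each situation
    """
    v = np.einsum("rk,tjk->rtj", beta, x)                  # utilities for every draw
    v = v - v.max(axis=2, keepdims=True)
    logp = v - np.log(np.exp(v).sum(axis=2, keepdims=True))
    chosen = logp[:, np.arange(x.shape[0]), y]             # (R, T) log-probabilities of choices
    return np.exp(chosen.sum(axis=1))                      # product over choice situations

# With beta = 0 every alternative is equally likely, so L = (1/J)^T for each draw.
rng = np.random.default_rng(0)
x_demo = rng.standard_normal((3, 5, 2))                    # T = 3, J = 5, K = 2
y_demo = np.array([0, 1, 2])
L_demo = mixed_logit_kernel(np.zeros((4, 2)), x_demo, y_demo)

P_sim = L_demo.mean()                                      # simulated P(y_n | x_n)
w_demo = L_demo / L_demo.mean()                            # weights that simulate h(beta | y_n, x_n)
```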
We first assume that $\beta$ is normally distributed with mean $b$ and covariance $W$. The recursive estimation procedure is implemented as follows, with $b$ and $W$ used explicitly for $\theta$:
1. Specify initial values $b^0$ and $W^0$.
2. For each sampled consumer, take $R$ draws of $\beta$, with the $r$-th draw for consumer $n$ created as $\beta_{nr} = b^0 + C^0 \eta$, where $C^0$ is the lower triangular Choleski factor of $W^0$ and $\eta$ is a vector of iid standard normal draws.
3. Calculate a weight for each draw as $w_{nr} = L(y_n \mid \beta_{nr}, x_n) \big/ \frac{1}{R} \sum_{r'} L(y_n \mid \beta_{nr'}, x_n)$.
4. Calculate the weighted mean and covariance of the $N \cdot R$ draws, and label them $b^1$ and $W^1$.
5. Repeat steps (2)-(4) using $b^1$ and $W^1$ in lieu of $b^0$ and $W^0$, continuing to convergence.

⁶ It is important to note that McFadden and Train's theorem is an existence result only and does not provide guidance on finding the appropriate distribution and specification of variables that attains a close approximation.

The last choice situation for each respondent was not used in estimation and instead was reserved as a "hold-out" choice to assess the predictive ability of the estimated models. For simulation, 200 randomized Halton draws were used for each respondent. These draws are described by, e.g., Train (2003). In the context of mixed logit models, Bhat (2001) found that 100 Halton draws provided greater accuracy than 1000 pseudo-random draws; his results have been confirmed by Train (2000), Munizaga and Alvarez-Daziano (2001) and Hensher (2001).
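For readers unfamiliar with Halton draws, the sketch below generates a Halton sequence by the radical-inverse construction, randomizes it with a random shift modulo one, and maps the result to standard normal deviates. The shift-based randomization, the number of initial elements skipped, and the function names are assumptions made for the example and are not necessarily the scheme used in the application.

```python
import random
from statistics import NormalDist

def halton(index, base):
    """The index-th element (index >= 1) of the Halton sequence for a prime base."""
    f, h = 1.0, 0.0
    while index > 0:
        f /= base
        h += f * (index % base)
        index //= base
    return h

def randomized_halton_normals(n_draws, primes=(2, 3, 5), skip=10, seed=0):
    """n_draws standard normal vectors, one dimension per prime."""
    rng = random.Random(seed)
    shifts = [rng.random() for _ in primes]            # one random shift per dimension
    std_normal = NormalDist()
    draws = []
    for i in range(skip + 1, skip + n_draws + 1):
        row = []
        for p, s in zip(primes, shifts):
            u = (halton(i, p) + s) % 1.0               # shifted point, still in [0, 1)
            u = min(max(u, 1e-12), 1.0 - 1e-12)        # inv_cdf requires 0 < u < 1
            row.append(std_normal.inv_cdf(u))
        draws.append(row)
    return draws

eta_draws = randomized_halton_normals(200)
column_means = [sum(row[k] for row in eta_draws) / len(eta_draws) for k in range(3)]
```

These deviates play the role of $\eta$ in step 2 of the procedure: each row is multiplied by the Choleski factor and added to the mean.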
The estimated parameters are given in Table 1, with standard errors calculated as described above, using the simulated scores at convergence. Table 1 also contains the estimated parameters obtained by maximum simulated likelihood (MSLE). The results are quite similar. Note that the recursive estimator (RE) treats the covariances of the coefficients as parameters, while the parameters for MSLE are the elements of the Choleski factor of the covariance. (The covariances are not parameters in MSLE because of the difficulty of assuring that the covariance matrix at each iteration is positive definite when using gradient-based methods. By construction, the RE assures a positive definite covariance at each iteration, since each new value is the covariance of weighted draws.) To provide a more easily interpretable comparison, Table 2 gives the estimated standard deviations and correlation matrix implied by the estimated parameters for each method.
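The conversion behind such a comparison is straightforward. The sketch below, with made-up numbers, recovers a covariance matrix from a Choleski factor and then the implied standard deviations and correlations; the function names are assumptions made for the example.

```python
import numpy as np

def cov_from_cholesky(C):
    """Covariance implied by a lower triangular Choleski factor (MSLE parameterization)."""
    return C @ C.T

def std_and_corr(W):
    """Standard deviations and correlation matrix implied by a covariance matrix."""
    sd = np.sqrt(np.diag(W))
    return sd, W / np.outer(sd, sd)

C_demo = np.array([[2.0, 0.0],
                   [1.0, 1.0]])            # illustrative factor, not an estimate from the paper
W_demo = cov_from_cholesky(C_demo)         # [[4, 2], [2, 2]]
sd_demo, corr_demo = std_and_corr(W_demo)
```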
The estimated parameters were used to calculate the probability of each respondent's choice in their last choice situation. The results are given at the bottom of Table 1. Two calculation methods were utilized. First, the probability was calculated by mixing over the population density of parameters (i.e., the unconditional distribution): $P_{nT} = \int L(y_{nT} \mid \beta, x_{nT}) f(\beta \mid \hat{\theta})\, d\beta$, where $T$ denotes the last choice situation. This is the appropriate formula to use in situations for which previous choices by each sampled agent are not observed. RE gives an average probability of 0.3742, and MSLE gives 0.3620. The probability is slightly higher for RE than MSLE, which indicates that RE predicts somewhat better. The same result was observed for all the alternative specifications discussed below. The second calculation mixes over the conditional density for each respondent, using $h(\beta \mid y, x, \hat{\theta})$. This formula is appropriate when previous choices of agents have been observed. The probability is of course higher under both estimators than when using the unconditional density, since each respondent's previous choices provide useful information about how they will choose in a new situation. The average probability from RE is again higher than that from MSLE. However, unlike the unconditional probability calculation, this relation is reversed for some of the alternative specifications discussed below.
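The two hold-out calculations differ only in how the draws are weighted. In the sketch below, `L_prev` holds the kernel of each respondent's earlier choices at each draw and `L_last` the logit probability of the hold-out choice at the same draws; these array names, the function name, and the numbers are assumptions made for illustration.

```python
import numpy as np

def holdout_probabilities(L_prev, L_last):
    """Simulated hold-out probabilities for each respondent.

    L_prev: (N, R) kernel of the choices used in estimation, at each draw
    L_last: (N, R) probability of the hold-out choice, at each draw
    """
    unconditional = L_last.mean(axis=1)                     # mix over f(beta | theta_hat)
    w = L_prev / L_prev.mean(axis=1, keepdims=True)         # weights that simulate h
    conditional = (w * L_last).mean(axis=1)                 # mix over h(beta | y, x, theta_hat)
    return unconditional, conditional

L_prev_demo = np.array([[0.2, 0.6]])
L_last_demo = np.array([[0.3, 0.5]])
p_uncond, p_cond = holdout_probabilities(L_prev_demo, L_last_demo)
```

In this toy case the draw that explains the earlier choices better also predicts the hold-out choice better, so the conditional probability (0.45) exceeds the unconditional one (0.40).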
The MSLE algorithm converged in 141 iterations and took 7 minutes, 4 seconds using analytic gradients and 3 hours, 20 minutes using numerical gradients.⁷ For RE, I defined convergence as each parameter changing by less than one-half of one percent and the convergence statistic given above being less than 1E-4. The first of these criteria was the more stringent in this case, in that the second was met (at 0.82E-4) once the first was. RE converged in 162 iterations and took 7 minutes, 59 seconds. Since RE does not require the coding of gradients, the implication of these time comparisons is that using RE instead of MSLE reduces either the researcher's time in coding analytic [...]
Alternative convergence criteria were explored for RE, both more relaxed and more stringent. Using a more relaxed criterion of each parameter changing less than one percent, estimation required 63 iterations; took 3 minutes, 1 second; and obtained a convergence statistic of 1.2E-4. When the criterion was tightened to each parameter changing by less than one-tenth of one percent, estimation required 609 iterations; took 29 minutes, 3 seconds; and obtained a convergence statistic of 0.44E-4. The estimated parameters changed little by applying the stricter criterion. Interestingly, the more relaxed criterion obtained parameters that were a bit closer to the MSL estimates. For example, the mean and standard deviation of the price coefficient were -0.927 and 0.611 after 62 iterations and -0.9954 and 0.5471 after 162 iterations, compared to the MSL estimates of -0.939 and 0.691.

⁷ All estimation was in Gauss on a PC with a Pentium 4 processor, 3.2GHz, with 2 GB of RAM. For MSLE, I used Gauss' maxlik routine with my codes for the mixed logit log-likelihood function and for analytic gradients under normally distributed coefficients. For RE, I wrote my own code; one of the advantages of the approach is the ease of coding it. I [...] from my website at http://elsa.berkeley.edu/train.

Step-sizes are compared across the algorithms by examining the iteration
log. Table 3 gives the iteration log for the mean and standard deviation of the
price coeﬃcient, which is indicative for all the parameters. The RE algorithm
moves, at ﬁrst, considerably more quickly toward the converged values than the
gradient-based MSLE algorithm. However, it later slows down and eventually
takes smaller steps than the MSLE algorithm. As Dempster et al. (1977) point
out, this is a common feature of EM algorithms. However, Ruud (1991) notes
that the algorithm’s slowness near convergence is balanced by greater numerical
stability, since it avoids the numerical problems that are often encountered
in gradient-based methods, such as overstepping the maximum and getting
“stuck” in areas of the likelihood function that are poorly approximated by a
quadratic. We observed these problems with MSLE in two of our alternative
speciﬁcations, discussed below, where new starting values were required after
the MSLE algorithm failed at the original starting values. We encountered no
such problems with RE.8
Alternative starting values were tried in each algorithm. Several diﬀerent
convergence points were found with each of the algorithms. All of them were
similar to the estimates in Table 1, and none obtained a higher log-likelihood
value. However, the fact that diﬀerent converged values were obtained indi-
cates that the likelihood function is “rippled” around the maximum. This
phenomenon is not unexpected given the large number of parameters and the
relatively small behavioral diﬀerences associated with diﬀerent combinations
of parameter values. Though this issue might constitute a warning about estimation
of so many parameters, restricting the parameters doesn’t necessarily
resolve the issue as much as mask it. In any case, the issue is the same for
MSLE and RE.
8 The recursion can be used as an “accelerator” rather than an estimator, by using it
for initial iterations and then switching to MSL near convergence. This procedure takes
advantage of its larger initial steps and the avoidance of numerical problems, which usually
occur in MSL further from the maximum, while retaining the familiarity of MSL and its
larger step-sizes near convergence.
Table 4 gives statistics for several alternative speciﬁcations. The columns
in the table are for the following speciﬁcations:
1. All coeﬃcients are normally distributed. This is the speciﬁcation in Table
1 and is included here for comparison.
2. Price coefficient is lognormally distributed, as $-\exp(\beta_p)$, with $\beta_p$ and the
coefficients of the other variables normally distributed. This specification
assures a negative price coefficient for all agents.
3. The coeﬃcients of price, TOU rates and seasonal rates are lognormally
distributed, and the other coeﬃcients are normal. This speciﬁcation as-
sures that all three price-related attributes have negative coeﬃcients for
all agents.
4. Price coefficient is censored normal, $\min(0, \beta_p)$, with others normal. This
specification prevents positive price coefficients but allows some agents
to place no importance on price, at least in the range of prices considered
in the choice situations.
5. Price coefficient is distributed as $S_B$ from 0 to 2, as $2\exp(\beta_p)/(1 + \exp(\beta_p))$,
others normal. This distribution is bounded on both sides and
allows a variety of shapes within these bounds; see Train and Sonnier
(2005) for an application and discussion of its use.
6. The model is specified in willingness-to-pay space, using the concepts
from Sonnier et al. (2007) and Train and Weeks (2005). Utility is
reparameterized as $U = \alpha p + \alpha\beta z + \varepsilon$ for price $p$ and non-price attributes
$z$, such that $\beta$ is the agent’s willingness to pay (wtp) for attribute $z$. This
parameterization allows the distribution of wtp to be estimated directly.
Under the usual parameterization, the distribution of wtp is estimated
indirectly by estimating the distribution of the price and attribute
coefficients, and deriving (or simulating) the distribution of their ratio.
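The transformations in specifications 1, 2, 4 and 5 can be illustrated with a short simulation. The sketch below is not the code used for the estimation; the function names `price_coefficient` and `coefficient_moments`, the number of draws and the seed are assumptions. The lognormal case carries a negative sign so that the coefficient is negative for all agents, as specification 2 requires, and the $S_B$ case follows the expression in specification 5 as written, so it returns values between 0 and 2.

```python
import numpy as np

def price_coefficient(beta_p, spec):
    """Map draws of the underlying normal term beta_p to the price coefficient."""
    beta_p = np.asarray(beta_p, dtype=float)
    if spec == "normal":       # specification 1: coefficient is the normal term itself
        return beta_p
    if spec == "lognormal":    # specification 2: strictly negative coefficient
        return -np.exp(beta_p)
    if spec == "censored":     # specification 4: non-positive, with a mass at zero
        return np.minimum(0.0, beta_p)
    if spec == "sb":           # specification 5 as written: bounded between 0 and 2
        return 2.0 * np.exp(beta_p) / (1.0 + np.exp(beta_p))
    raise ValueError(f"unknown specification: {spec}")

def coefficient_moments(mean, sd, spec, n_draws=200_000, seed=0):
    """Simulate the mean and standard deviation of the coefficient itself."""
    rng = np.random.default_rng(seed)
    beta_p = mean + sd * rng.standard_normal(n_draws)
    coef = price_coefficient(beta_p, spec)
    return coef.mean(), coef.std()

# RE estimates of the underlying normal for specification (2) in Table 4.
m, s = coefficient_moments(-0.2441, 0.5560, "lognormal")
print(round(m, 3), round(s, 3))
```

With the RE estimates for specification (2), the simulated moments should land close to the mean of -.9144 and standard deviation of .5503 reported for the coefficient itself in Table 4, which shows how the moments of the underlying normal and those of the coefficient can differ.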
MSLE and RE provide fairly similar estimates under all the speciﬁcations.
In cases when the estimated mean and standard deviation of the underlying
normal for the price coeﬃcient are somewhat diﬀerent, the diﬀerence is less
when comparing the mean and standard deviation of the coeﬃcient itself. For
example, in speciﬁcations (2) and (3), a ﬁfty percent diﬀerence in the estimated
mean of the underlying normal translates into less than four percent diﬀerence
in the mean of the coeﬃcient itself.
For all the specifications, the log of the simulated likelihood ($\widetilde{LL}$) is lower at
convergence with RE than with MSLE. This difference is by construction, since
the MSL estimates are those that maximize the $\widetilde{LL}$, while the RE estimates are
those that set the simulated scores equal to zero with the simulated scores not
being the derivative of the $\widetilde{LL}$. However, despite this difference, it would be
useful if the $\widetilde{LL}$ under the two methods moved in the same direction when the
specification is changed. This is not the case. $\widetilde{LL}$ is higher for specification (3)
than specification (1) under either estimator. However, for specification (4),
$\widetilde{LL}$ under RE is higher while that under MSLE is lower than for specification
(1). The change under MSLE does not necessarily provide better guidance,
since simulation error can affect MSLE both in the estimates that are obtained
and the calculation of the log-likelihood at those estimates.
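For reference, the simulated log-likelihood that MSLE maximizes, and that is evaluated at the converged values of either estimator, can be sketched as below for a mixed logit with normally distributed coefficients. This is a hedged illustration, not the Gauss code used here: the array layout (`X` with shape agents × choice situations × alternatives × attributes, `y` holding the index of the chosen alternative), the function name and the default number of draws are all assumptions.

```python
import numpy as np

def simulated_loglik(X, y, b, W, n_draws=200, seed=0):
    """Simulated log-likelihood of a mixed logit with coefficients ~ N(b, W).

    X: (N, T, J, K) attributes; y: (N, T) integer index of the chosen alternative.
    """
    rng = np.random.default_rng(seed)
    N, T, J, K = X.shape
    L = np.linalg.cholesky(W)                       # W must be positive definite
    sll = 0.0
    for n in range(N):
        draws = b + rng.standard_normal((n_draws, K)) @ L.T   # (R, K) coefficient draws
        v = np.einsum("tjk,rk->rtj", X[n], draws)             # utilities per draw
        v -= v.max(axis=2, keepdims=True)                     # guard against overflow
        p = np.exp(v)
        p /= p.sum(axis=2, keepdims=True)                     # logit probabilities
        p_chosen = p[:, np.arange(T), y[n]]                   # (R, T) chosen alternatives
        # Average over draws the probability of the agent's whole choice sequence.
        sll += np.log(p_chosen.prod(axis=1).mean())
    return sll
```

Because the value depends on the draws, two evaluations with different seeds or different numbers of draws will generally differ slightly, which is one way simulation error enters the comparison of log-likelihood values across specifications.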
The average probability for the “hold-out” choice using the population den-
sity is higher under RE than MSLE for all speciﬁcations. When using the
conditional density, neither method obtains a higher average probability for all
speciﬁcations. These results were mentioned above.
For MSLE, I used numerical gradients rather than recoding the analytic
gradients. The run times in Table 4 therefore reflect equal amounts of recoding
time for each method. Run times are much lower for RE than MSLE when
numerical gradients are used for MSLE. With analytic gradients, MSLE would
be about the same speed as RE,9 but of course would require more coding
time. As mentioned above, the ML algorithm failed for two of the specifications
(namely, 5 and 6) when using the same starting values as for the others; these
runs were repeated with the converged values from specification (2) used as
starting values.
9 In some cases, MSLE is slower even with analytic gradients. For example, specification
(2) took 333 iterations in MSLE, while RE took 139. An iteration in MSLE with analytic
gradients takes about the same time as an iteration in RE, such that for specification (2),
MSLE with analytic gradients would be slower than RE.
5 Summary
A simple recursive estimator for random coeﬃcients is based on the fact that
the expectation of the conditional distributions of coeﬃcients is equal to the
unconditional distribution. The procedure takes draws from the unconditional
distribution at trial values for its moments, weights the draws such that they
are equivalent to draws from the conditional distributions, calculates the mo-
ments of the weighted draws, and then repeats the process with these calculated
moments, continuing until convergence. The procedure constitutes a simulated
EM algorithm and provides a method of simulated scores estimator. The estimator
is asymptotically equivalent to MLE if the number of draws used in
simulation rises faster than √N, which is the same condition as for MSL. In
an application of mixed logit on stated-preference data, the procedure gave
estimates that are similar to those by MSL, was faster than MSL with numeri-
cal gradients, and avoided the numerical problems that MSL encountered with
some of the speciﬁcations.
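The recursion summarized above can be sketched in a few lines for the case where all coefficients are normally distributed. This is a minimal illustration under stated assumptions, not the author's Gauss implementation: the array layout (`X` as agents × choice situations × alternatives × attributes, `y` as chosen-alternative indices), the function names, the reuse of the same standard normal draws at every iteration, and the absolute-change stopping rule are all choices made here for brevity.

```python
import numpy as np

def sequence_probs(X_n, y_n, betas):
    """Logit probability of one agent's observed choice sequence at each draw."""
    v = np.einsum("tjk,rk->rtj", X_n, betas)        # utilities, shape (R, T, J)
    v -= v.max(axis=2, keepdims=True)               # guard against overflow
    p = np.exp(v)
    p /= p.sum(axis=2, keepdims=True)
    return p[:, np.arange(len(y_n)), y_n].prod(axis=1)   # shape (R,)

def recursive_estimator(X, y, b0, W0, n_draws=100, tol=1e-3, max_iter=500, seed=0):
    """Iterate on the mean b and covariance W of normally distributed coefficients."""
    rng = np.random.default_rng(seed)
    N, T, J, K = X.shape
    eta = rng.standard_normal((N, n_draws, K))      # assumed fixed across iterations
    b, W = np.asarray(b0, dtype=float), np.asarray(W0, dtype=float)
    for it in range(1, max_iter + 1):
        # Draws from the unconditional distribution at the trial moments.
        betas = b + eta @ np.linalg.cholesky(W).T   # shape (N, R, K)
        # Weight each draw by the probability of the agent's observed choices,
        # so that the weighted draws represent the conditional distribution.
        w = np.stack([sequence_probs(X[n], y[n], betas[n]) for n in range(N)])
        w /= w.sum(axis=1, keepdims=True)           # weights sum to one per agent
        # Moments of the weighted draws become the new trial values.
        b_new = np.einsum("nr,nrk->k", w, betas) / N
        dev = betas - b_new
        W_new = np.einsum("nr,nrk,nrl->kl", w, dev, dev) / N
        change = max(np.abs(b_new - b).max(), np.abs(W_new - W).max())
        b, W = b_new, W_new
        if change < tol:                            # assumed absolute-change criterion
            break
    return b, W, it
```

Since the updated covariance is a weighted sum of outer products of deviations, it is positive semidefinite by construction at every iteration, which is the property that lets the Cholesky factor be taken at the next step without a constrained optimizer.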
References
Berry, S., J. Levinsohn and A. Pakes (1995), ‘Automobile prices in market
equilibrium’, Econometrica 63, 841–889.
Bhat, C. (2001), ‘Quasi-random maximum simulated likelihood estimation of
the mixed multinomial logit model’, Transportation Research B 35, 677–
693.
Boyles, R. (1983), ‘On the convergence of the EM algorithm’, Journal of the
Royal Statistical Society B 45, 47–50.
Chen, H. and S. Cosslett (1998), ‘Environmental quality preference and beneﬁt
estimation in multinomial probit models: A simulation approach’, Amer-
ican Journal of Agricultural Economics 80, 512–520.
Dempster, A., N. Laird and D. Rubin (1977), ‘Maximum likelihood from incomplete
data via the EM algorithm’, Journal of the Royal Statistical Society
B 39, 1–38.
Goett, A. (1998), ‘Estimating customer preferences for new pricing products’,
Electric Power Research Institute Report TR-111483, Palo Alto.
Goett, A., K. Hudson and K. Train (2000), ‘Consumers’ choice among retail
energy suppliers: The willingness-to-pay for service attributes’, The Energy
Journal 21, 1–28.
Greene, W. (2000), Econometric Analysis, Prentice Hall, New York.
Hajivassiliou, V. and D. McFadden (1998), ‘The method of simulated scores
for the estimation of LDV models’, Econometrica 66, 863–96.
Hajivassiliou, V. and P. Ruud (1994), Classical estimation methods for LDV
models using simulation, in R. Engle and D. McFadden, eds, ‘Handbook of
Econometrics’, North-Holland, Amsterdam, pp. 2383–441.
Hensher, D. (2001), ‘The valuation of commuter travel time savings for car
drivers in New Zealand: Evaluating alternative model specifications’,
Transportation 28, 101–118.
Hensher, D., N. Shore and K. Train (2005), ‘Households’ willingness to pay
for water service attributes’, Environmental and Resource Economics
32, 509–531.
Huber, J. and K. Train (2001), ‘On the similarity of classical and Bayesian
estimates of individual mean partworths’, Marketing Letters 12, 259–269.
Lee, L. (1995), ‘Asymptotic bias in simulated maximum likelihood estimation
of discrete choice models’, Econometric Theory 11, 437–483.
Louviere, J. (2003), ‘Random utility theory-based stated preference elicitation
methods’, working paper, Faculty of Business, University of Technology,
Sydney.
Louviere, J., D. Hensher and J. Swait (2000), Stated Choice Methods: Analysis
and Applications, Cambridge University Press, New York.
McLachlan, G. and D. Peel (2000), Finite Mixture Models, John Wiley and
Sons, New York.
Munizaga, M. and R. Alvarez-Daziano (2001), ‘Mixed logit versus nested logit
and probit’, Working Paper, Departmento de Ingeniera Civil, Universidad
de Chile.
Revelt, D. and K. Train (1998), ‘Mixed logit with repeated choices’, Review of
Economics and Statistics 80, 647–657.
Ruud, P. (1991), ‘Extensions of estimation methods using the em algorithm’,
Journal of Econometrics 49, 305–341.
Sonnier, G., A. Ainslie and T. Otter (2007), ‘Heterogeneous distributions of
willingness to pay in choice models’, Quantitative Marketing and Economics
5(3), 313–331.
Train, K. (1998), ‘Recreation demand models with taste variation’, Land Eco-
nomics 74, 230–239.
Train, K. (2000), ‘Halton sequences for mixed logit’, Working Paper No. E00-
278, Department of Economics, University of California, Berkeley.
Train, K. (2003), Discrete Choice Methods with Simulation, Cambridge Uni-
versity Press, New York.
Train, K. and G. Sonnier (2005), Mixed logit with bounded distributions of
correlated partworths, in R. Scarpa and A. Alberini, eds, ‘Applications of
Simulation Methods in Environmental and Resource Economics’, Springer,
Dordrecht, pp. 117–134.
Train, K. and M. Weeks (2005), Discrete choice models in preference space
and willingness-to-pay space, in R. Scarpa and A. Alberini, eds,
‘Applications of Simulation Methods in Environmental and Resource Economics’,
Springer, Dordrecht, pp. 1–16.
Wu, C. (1983), ‘On the convergence properties of the EM algorithm’, Annals of
Statistics 11, 95–103.
Table 1: Mixed Logit Model of Electricity Supplier Choice
All coefficients normally distributed
Recursive estimator (RE), Maximum simulated likelihood estimator (MSLE)
(Std errors in parentheses)

Parameters                  RE                    MSLE
Means
1. Price                    -0.9954 (0.0521)      -0.9393 (0.0520)
2. Contract length          -0.2404 (0.0231)      -0.2428 (0.0256)
3. Local utility             2.5464 (0.1210)       2.3328 (0.1337)
4. Well known co.            1.8845 (0.0742)       1.8354 (0.0104)
5. TOU rates                -9.3126 (0.4571)      -9.1682 (0.4400)
6. Seasonal rates           -9.6898 (0.4496)      -9.0710 (0.4365)
                            Covariances           Choleski
11                           0.5471 (0.0726)       0.6909 (0.0611)
21                           0.0266 (0.0439)      -0.0333 (0.0290)
22                           0.1222 (0.0146)       0.4180 (0.0236)
31                           0.9430 (0.2672)      -1.6089 (0.1523)
32                           0.3602 (0.1039)       0.2419 (0.1475)
33                           2.8709 (0.3321)       1.4068 (0.1468)
41                           0.5208 (0.1689)      -0.9107 (0.1218)
42                           0.2668 (0.0681)       0.1526 (0.1192)
43                           1.3065 (0.3543)       0.6746 (0.1216)
44                           1.1015 (0.1339)      -1.0424 (0.0997)
51                           4.5204 (1.2492)      -4.6228 (0.4740)
52                           0.2707 (0.3972)      -0.1813 (0.1690)
53                           7.9995 (2.4263)       1.8399 (0.1740)
54                           4.5092 (1.5356)       0.3592 (0.2026)
55                           45.050 (5.9201)       2.6309 (0.1631)
61                           4.4860 (1.1875)      -5.3688 (0.4862)
62                           0.1156 (0.3933)      -0.3913 (0.1302)
63                           7.5672 (2.2439)       0.4850 (0.1691)
64                           3.8878 (1.3968)       0.5309 (0.2054)
65                           39.927 (10.578)       1.1074 (0.1544)
66                           41.916 (5.2169)       1.7984 (0.1371)
Log of Sim. Likelihood      -3482.93              -3423.08
Average probability of chosen alt. in last situation
Unconditional density        0.3742                0.3620
Conditional density          0.5678                0.5632
Table 2: Standard deviations and correlations

              Std devs          Correlations (RE bottom, MSLE top)
              RE      MSLE      Price   Contract  Local util  Well known  TOU rates  Seasonal
Price         0.740   0.691     1.000   0.079     0.748       0.589       0.819      0.921
Contract      0.350   0.419     0.103   1.000     0.172       0.145       0.033      0.006
Local util    1.694   2.151     0.752   0.608     1.000       0.736       0.822      0.736
Well known    1.050   1.547     0.671   0.728     0.735       1.000       0.578      0.511
TOU rates     6.712   5.643     0.911   0.115     0.703       0.640       1.000      0.879
Seasonal      6.474   5.827     0.937   0.051     0.690       0.572       0.919      1.000
Table 3: Iterations
Price coefficients

              Mean                  Std dev
Iteration     RE        MSLE        RE        MSLE
1              0         0          2.449     0.2000
2              0.1431    0.1108     0.6650    0.1259
3              0.0479    0.0718     0.3641    0.1567
4             -0.0553    0.0174     0.2944    0.1120
5             -0.1405    0.1127     0.2657    0.2430
6             -0.2136    0.0567     0.2567    0.2217
7             -0.2762   -0.0326     0.2575    0.1552
8             -0.3293   -0.0162     0.2625    0.1717
9             -0.3746   -0.0070     0.2702    0.1687
10            -0.4132   -0.0064     0.2800    0.1514
20            -0.6416   -0.3322     0.4191    0.0697
30            -0.7607   -0.5693     0.5346    0.3825
40            -0.8357   -0.7316     0.5947    0.4505
50            -0.8869   -0.7919     0.6325    0.5633
60            -0.9217   -0.8913     0.6573    0.6173
70            -0.9446   -0.9272     0.6738    0.6414
80            -0.9602   -0.9399     0.6854    0.6559
90            -0.9711   -0.9325     0.6941    0.6485
100           -0.9776   -0.9415     0.7006    0.6593
110           -0.9827   -0.9462     0.7044    0.6688
120           -0.9856   -0.9464     0.7087    0.6786
130           -0.9848   -0.9425     0.7178    0.6862
140           -0.9832   -0.9396     0.7282    0.6904
150           -0.9862    NA         0.7341    NA
160           -0.9932    NA         0.7381    NA
Table 4: Alternative Specifications

                        (1)        (2)        (3)          (4)        (5)        (6)
                        All        Price      Price, TOU,  Price      Price      WTP
                        normal     lognormal  season       censored   S_B        space
                                              lognormal    normal
Price
 Underlying normal
  Mean
   RE                   -.9954     -.2441     -.1692       -1.0203    -.1761     -0.0892
   MSLE                 -.9393     -.1655     -.2466       -.9828     -.0915     -.1125
  Std dev
   RE                   .7397      .5560      .6274        .6253      1.316      .2941
   MSLE                 .6909      .4475      .7903        .6530      1.7964     .2444
 Coefficient
  Mean
   RE                   -.9954     -.9144     -1.028       -1.033     -.9335     -.9551
   MSLE                 -.9393     -.9397     -1.068       -1.002     -.9711     -.9207
  Std dev
   RE                   .7397      .5503      .7140        .5971      .4990      .2871
   MSLE                 .6909      .4411      .9946        .6155      .5958      .2284
Probability for last choice
 Population density
   RE                   .3742      .3629      .3702        .3785      .3688      .3696
   MSLE                 .3620      .3557      .3539        .3565      .3662      .3649
 Conditional densities
   RE                   .5678      .5501      .5640        .5630      .5634      .5309
   MSLE                 .5632      .5569      .5674        .5691      .5637      .5415
Log Sim. Likelihood
   RE                   -3482.93   -3510.81   -3467.49     -3508.84   -3474.66   -3554.66
   MSLE                 -3423.08   -3456.63   -3420.58     -3420.21   -3424.19   -3494.48
Run time
   RE                   7m59s      6m45s      12m2s        11m27s     10m14s     22m32s
   MSLE*                3h20m30s   7h54m34s   3h31m31s     6h26m46s   2h51m9s    4h6m5s
*Using numerical derivatives. Starting values for (5) and (6) are estimates from (2).