Duality in Dynamic Discrete Choice Models

Quantitative Economics 7 (2016), 83–115 1759-7331/20160083
Duality in dynamic discrete-choice models
Khai Xiang Chiong
INET and Department of Economics, University of Southern California
Alfred Galichon
Economics Department and CIMS, New York University and Economics Department, Sciences Po
Matt Shum
Division of Humanities and Social Sciences, California Institute of Technology
Using results from Convex Analysis, we investigate a novel approach to identifica-
tion and estimation of discrete-choice models that we call the mass transport ap-
proach. We show that the conditional choice probabilities and the choice-specific
payoffs in these models are related in the sense of conjugate duality, and that
the identification problem is a mass transport problem. Based on this, we pro-
pose a new two-step estimator for these models; interestingly, the first step of
our estimator involves solving a linear program that is identical to the classic as-
signment (two-sided matching) game of Shapley and Shubik (1971). The applica-
tion of convex-analytic tools to dynamic discrete-choice models and the connec-
tion with two-sided matching models is new in the literature. Monte Carlo results
demonstrate the good performance of this estimator, and we provide an empirical
application based on Rust’s (1987) bus engine replacement model.
Keywords. Conditional choice probability inversion, estimation of discrete
choice models, mass transportation approach.
JEL classification. C35, C61, D90.
1. Introduction
Empirical research utilizing dynamic discrete-choice models of economic decision-
making has flourished in recent decades, with applications in all areas of applied mi-
Khai Xiang Chiong: kchiong@usc.edu
Alfred Galichon: ag133@nyu.edu
Matt Shum: mshum@caltech.edu
The authors thank the editor and three anonymous referees, as well as Benjamin Connault, Thierry Magnac,
Emerson Melo, Bob Miller, Sergio Montero, John Rust, Sorawoot (Tang) Srisuma, and Haiqing Xu for useful
comments. We are especially grateful to Guillaume Carlier for providing decisive help with the proof of
Theorem 5. We also thank audiences at Michigan, Northwestern, NYU, Pittsburgh, UCSD, the CEMMAP
Conference on Inference in Game-Theoretic Models (June 2013), UCLA Econometrics Mini-Conference
(June 2013), the Boston College Econometrics of Demand Conference (December 2013), and the Toulouse
Conference on Recent Advances in Set Identification (December 2013) for helpful comments. Galichon’s
research has received funding from the European Research Council under the European Union’s Seventh
Framework Programme (FP7/2007-2013)/ERC Grant 313699.
Copyright ©2016 Khai Xiang Chiong, Alfred Galichon, and Matt Shum. Licensed under the Creative Com-
mons Attribution-NonCommercial License 3.0. Available at http://www.qeconomics.org.
DOI: 10.3982/QE436
croeconomics including labor economics, industrial organization, public finance, and
health economics. The existing literature on the identification and estimation of these
models has recognized a close link between the conditional-choice probabilities (here-
after, CCP, which can be observed and estimated from the data) and the payoffs (or
choice-specific value functions, which are unobservable to the researcher); indeed, most
estimation procedures contain an “inversion” step in which the choice-specific value
functions are recovered given the estimated choice probabilities.
This paper has two contributions. First, we explicitly characterize this duality rela-
tionship between the choice probabilities and choice-specific payoffs. Specifically, in
discrete-choice models, the social surplus function (McFadden (1978)) provides us with
the mapping from payoffs to the probabilities with which a choice is chosen at each state
(conditional-choice probabilities). Recognizing that the social surplus function is con-
vex, we develop the idea that the convex conjugate of the social surplus function gives us
the inverse mapping: from choice probabilities to utility indices. More precisely, the sub-
differential of the convex conjugate is a correspondence that maps from the observed
choice probabilities to an identified set of payoffs. In short, the choice probabilities and
utility indices are related in the sense of conjugate duality. The discovery of this rela-
tionship allows us to succinctly characterize the empirical content of discrete-choice
models, both static and dynamic.
Not only is the convex conjugate of the social surplus function a useful theoretical
object; it also provides a new and practical way to “invert” from a given vector of choice
probabilities back to the underlying utility indices that generated these probabilities.
This is the second contribution of this paper. We show how the conjugate along with its
set of subgradients can be efficiently computed by means of linear programming. This
linear-programming formulation has the structure of an optimal assignment problem
(as in Shapley and Shubik’s (1971) classic work). This surprising connection enables us to
apply insights developed in the optimal transport literature (e.g., Villani (2003, 2009)) to
discrete-choice models. We call this new methodology the mass transport approach to
CCP inversion.
This paper focuses on the estimation of dynamic discrete-choice models via two-
step estimation procedures in which conditional-choice probabilities are estimated in
the initial stage; this estimation approach was pioneered in Hotz and Miller (HM, 1993)
and Hotz, Miller, Sanders, and Smith (1994).1 Our use of tools and concepts from Convex
Analysis to study identification and estimation in this dynamic discrete-choice (DDC)
setting is novel in the literature. Based on our findings, we propose a new two-step esti-
mator for DDC models. A nice feature of our estimator is that it works for practically any
assumed distribution of the utility shocks.2 Thus, our estimator would make possible
1 Subsequent contributions include Aguirregabiria and Mira (2002, 2007), Magnac and Thesmar (2002),
Pesendorfer and Schmidt-Dengler (2008), Bajari, Chernozhukov, Hong, and Nekipelov (2009), Arcidiacono
and Miller (2011), and Norets and Tang (2013).
2 While existing identification results for dynamic discrete-choice models allow for quite general specifications
of the additive choice-specific utility shocks, many applications of these two-step estimators maintain
the restrictive assumption that the utility shocks are independent and identically distributed (i.i.d.)
type I extreme values, independent of the state variables, leading to choice probabilities that take the
multinomial logit form.
the task of evaluating the robustness of estimation to different distributional assump-
tions.3
Section 2 contains our main results regarding duality between choice probabilities
and payoffs in discrete-choice models. Based on these results, we propose, in Section 3,
a two-step estimation approach for these models. We also emphasize here the surpris-
ing connection between dynamic discrete-choice and optimal matching models. In Sec-
tion 4, we discuss computational details for our estimator, focusing on the use of lin-
ear programming to compute (approximately) the convex conjugate function from the
dynamic discrete-choice model. Monte Carlo experiments (in Section 5) show that our
estimator performs well in practice, and we apply the estimator to Rust’s (1987) bus engine
replacement data (Section 6). Section 7 concludes. The Appendix contains proofs
and also a brief primer on relevant results from Convex Analysis. Sections 2.2 and 2.3, as
well as Section 4, are not specific to dynamic discrete-choice problems but are also true
for any (static) discrete-choice model.
2. Basic model
2.1 The framework
In this section we review the basic dynamic discrete-choice setup, as encapsulated in
Rust’s (1987) seminal paper. The state variable is $x \in \mathcal{X}$, which we assume to take only a
finite number of values. Agents choose actions $y \in \mathcal{Y}$ from a finite space $\mathcal{Y} = \{0, 1, \ldots, D\}$.
The single-period utility flow that an agent derives from choosing $y$ in a given period is
$$\bar{u}_y(x) + \varepsilon_y,$$
where $\varepsilon_y$ denotes the utility shock pertaining to action $y$, which differs across agents.
Across agents and time periods, the set of utility shocks $(\varepsilon_y)_{y \in \mathcal{Y}}$ is distributed according
to a joint distribution function $Q(\cdot\,; x)$, which can depend on the current values
of the state variable $x$. We assume that this distribution $Q$ is known to the researcher.
Throughout, we consider a stationary setting in which the agent’s decision environment
remains unchanged across time periods; thus, for any given period, we use primes
($'$) to denote next-period values. Following Rust (1987) and most of the subsequent papers
in this literature, we maintain the following conditional independence assumption
(which rules out serially persistent forms of unobserved heterogeneity4).
3 While they are not the focus in this paper, many applications of dynamic-choice models do not utilize
HM-type two-step estimation procedures, and they allow for quite flexible distributions of the utility shocks
and also for serial correlation in these shocks (examples include Pakes (1986) and Keane and Wolpin (1997)).
This literature typically employs the simulated method of moments or simulated maximum likelihood for
estimation (see Rust (1994, Section 3.3)).
4 See Norets (2009), Kasahara and Shimotsu (2009), Arcidiacono and Miller (2011), and Hu and Shum
(2012).
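Assumption 1 (Conditional Independence). The set $(x, \varepsilon)$ evolves across time periods
as a controlled first-order Markov process, with transition
$$\Pr(x', \varepsilon' \mid y, x, \varepsilon) = \Pr(\varepsilon' \mid x', y, x, \varepsilon) \cdot \Pr(x' \mid y, x, \varepsilon) = \Pr(\varepsilon' \mid x') \cdot \Pr(x' \mid y, x).$$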
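The discount rate is $\beta$. Agents are dynamic optimizers whose choices each period
satisfy5
$$y \in \arg\max_{\tilde{y} \in \mathcal{Y}} \left\{ \bar{u}_{\tilde{y}}(x) + \varepsilon_{\tilde{y}} + \beta E\left[ \bar{V}(x', \varepsilon') \mid x, \tilde{y} \right] \right\}, \tag{1}$$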
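where the value function $\bar{V}$ is recursively defined via Bellman’s equation as6
$$\bar{V}(x, \varepsilon) = \max_{\tilde{y} \in \mathcal{Y}} \left\{ \bar{u}_{\tilde{y}}(x) + \varepsilon_{\tilde{y}} + \beta E\left[ \bar{V}(x', \varepsilon') \mid x, \tilde{y} \right] \right\}.$$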
The ex ante value function is defined as7
$$V(x) = E\left[ \bar{V}(x, \varepsilon) \mid x \right].$$
The expectation above is conditional on the current state $x$. In the literature, $V(x)$ is
called the ex ante (or integrated) value function, because it measures the continuation
value of the dynamic optimization problem before the agent observes his shocks $\varepsilon$, so
that the optimal action is still stochastic from the agent’s point of view.
Next we define the choice-specific value functions as consisting of two terms: the per-period
utility flow and the discounted continuation payoff
$$w_y(x) \equiv \bar{u}_y(x) + \beta E\left[ V(x') \mid x, y \right].$$
In this paper, the utility flows $\{\bar{u}_y(x); \forall y \in \mathcal{Y}, \forall x \in \mathcal{X}\}$ and, subsequently, also the
choice-specific value functions $\{w_y(x); \forall y, \forall x\}$ will be treated as unknown parameters; we will
study the identification and estimation of these parameters. For this reason, in the initial
part of the paper, we will suppress the explicit dependence of $w_y$ on $x$ for convenience.
Given these preliminaries, we derive the duality that is central to this paper.
2.2 The social surplus function and its convex conjugate
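We start by introducing the expected indirect utility of a decision-maker facing the $|\mathcal{Y}|$-dimensional
vector of choice-specific values $w \equiv \{w_y, y \in \mathcal{Y}\}$,
$$G(w; x) = E\left[ \max_{y \in \mathcal{Y}} (w_y + \varepsilon_y) \,\middle|\, x \right], \tag{2}$$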
5 We have used Assumption 1 to eliminate $\varepsilon$ as a conditioning variable in the expectation in (1).
6 See, for example, Bertsekas (1987, Chapter 5) for an introduction and derivation of this equation.
7 There is a difference between the definition of $V(x)$ and the last terms in (1). Here, we are considering
the expectation of the value function $\bar{V}(x, \varepsilon)$ taken over the distribution of $\varepsilon \mid x$ (i.e., holding the first
argument fixed). In the last term of (1), however, we are considering the expectation over the joint distribution
of $(x', \varepsilon') \mid x$ (i.e., holding neither argument fixed).
where the expectation is assumed to be finite and is taken over the distribution of the
utility shocks, $Q(\cdot\,; x)$. This function $G(\cdot\,; x): \mathbb{R}^{|\mathcal{Y}|} \to \mathbb{R}$ is called the social surplus function
in McFadden’s (1978) random utility framework, and can be interpreted as the expected
welfare of a representative agent in the dynamic discrete-choice problem.
For convenience in what follows, we introduce the notation $Y(w, \varepsilon)$ to denote an
agent’s optimal choice given the vector of choice-specific value functions $w$ and the vector
of utility shocks $\varepsilon$; that is, $Y(w, \varepsilon) = \arg\max_{y \in \mathcal{Y}} (w_y + \varepsilon_y)$.8 This notation makes explicit
the randomness in the optimal alternative (arising from the utility shocks $\varepsilon$). We
get
$$\begin{aligned} G(w; x) &= E\left[ w_{Y(w, \varepsilon)} + \varepsilon_{Y(w, \varepsilon)} \mid x \right] \\ &= \sum_{y \in \mathcal{Y}} \underbrace{\Pr\left( Y(w, \varepsilon) = y \mid x \right)}_{p_y(x)} \left( w_y + E\left[ \varepsilon_y \mid Y(w, \varepsilon) = y, x \right] \right), \end{aligned} \tag{3}$$
which shows an alternative expression for the social surplus function as a weighted average,
where the weights are the components of the vector of conditional-choice probabilities
$p(x)$. For the remainder of this section, we suppress the dependence of all quantities
on $x$ for convenience. In later sections, we will reintroduce this dependence when it is
necessary.
In the case when the social surplus function $G(w)$ is differentiable (which holds for
most discrete-choice model specifications considered in the literature9), we obtain a
well known fact that the vector of choice probabilities $p$ compatible with rational choice
coincides with the gradient of $G$ at $w$.
Proposition 1 (The Williams–Daly–Zachary (WDZ) Theorem). We have
$$p = \nabla G(w).$$
This result, which is analogous to Roy’s identity in discrete-choice models, is expounded
in McFadden (1978) and Rust (1994, Theorem 3.1). It characterizes the vector
of choice probabilities corresponding to optimal behavior in a discrete-choice model
as the gradient of the social surplus function. For completeness, we include a proof in
Appendix B. The WDZ theorem provides a mapping from the choice-specific value functions
(which are unobserved by researchers) to the observed choice probabilities $p$.
However, the identification problem is the reverse problem, namely to determine the
set of $w$ that would lead to a given vector of choice probabilities. This problem is exactly
solved by convex duality and the introduction of the convex conjugate of $G$, which we
denote as $G^*$.10
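As an illustration only, the weighted-average form (3) and the WDZ theorem can be checked numerically. The minimal Python sketch below assumes a small, hypothetical discrete distribution of shocks (the draws `eps` and the values `w` are made up for the example and are not from the paper); with discrete shocks $G$ is piecewise linear, so the gradient is taken at a point with no ties.

```python
# Hypothetical discrete Q: five equally likely draws of (eps_0, eps_1, eps_2).
eps = [
    (0.3, -0.2, 1.1),
    (-0.5, 0.9, 0.1),
    (1.2, 0.4, -0.7),
    (0.0, -1.0, 0.9),
    (-0.8, 0.7, 0.2),
]
w = [0.5, 0.2, -0.1]  # hypothetical choice-specific values


def social_surplus(w, eps):
    """G(w) = E[max_y (w_y + eps_y)] under equally weighted draws."""
    return sum(max(wy + ey for wy, ey in zip(w, e)) for e in eps) / len(eps)


def optimal_choice(w, e):
    """Y(w, eps): index of the utility-maximizing alternative (no ties here)."""
    values = [wy + ey for wy, ey in zip(w, e)]
    return values.index(max(values))


def choice_probabilities(w, eps):
    """p_y = Pr(Y(w, eps) = y), computed as a share of draws."""
    counts = [0] * len(w)
    for e in eps:
        counts[optimal_choice(w, e)] += 1
    return [c / len(eps) for c in counts]


def weighted_average_form(w, eps):
    """Right-hand side of (3): sum_y p_y * (w_y + E[eps_y | Y = y])."""
    p = choice_probabilities(w, eps)
    total = 0.0
    for y in range(len(w)):
        chosen = [e[y] for e in eps if optimal_choice(w, e) == y]
        if chosen:
            total += p[y] * (w[y] + sum(chosen) / len(chosen))
    return total


def numerical_gradient(w, eps, h=1e-4):
    """Central finite differences of G; valid away from ties."""
    grad = []
    for y in range(len(w)):
        up, down = list(w), list(w)
        up[y] += h
        down[y] -= h
        grad.append((social_surplus(up, eps) - social_surplus(down, eps)) / (2 * h))
    return grad


G = social_surplus(w, eps)
p = choice_probabilities(w, eps)
grad = numerical_gradient(w, eps)
print(G, weighted_average_form(w, eps))  # the two expressions in (3)
print(p, grad)                           # WDZ: gradient of G equals p
```

The step size `h` is chosen smaller than the smallest utility gap between the best and second-best alternative across draws, so no draw switches its choice during the perturbation.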
8 We use $w$ and $\varepsilon$ (and also $p$ below) to denote vectors, while $w_y$ and $\varepsilon_y$ (and $p_y$) denote the $y$th
components of these vectors.
9 This includes logit, nested logit, multinomial probit, and so forth in which the distribution of the utility
shocks is absolutely continuous and $w$ is bounded (see Lemma 1 in Shi, Shum, and Song (2014)).
10 Details of convex conjugates are expounded in Appendix A. Convex conjugates are also encountered
in classic producer and consumer theory. For instance, when $f$ is the convex cost function of the firm
(decreasing returns to scale in production), then the convex conjugate of the cost function, $f^*$, is in fact the
firm’s optimal profit function.
Definition 1 (Convex Conjugate). We define $G^*$, the Legendre–Fenchel conjugate
function of $G$ (a convex function), by
$$G^*(p) = \sup_{w \in \mathbb{R}^{\mathcal{Y}}} \left\{ \sum_{y \in \mathcal{Y}} p_y w_y - G(w) \right\}. \tag{4}$$
Equation (4) has the property that if $p$ is not a probability, that is, if either of the conditions
$p_y \geq 0$ or $\sum_{y \in \mathcal{Y}} p_y = 1$ does not hold, then $G^*(p) = +\infty$. Because the choice-specific
value functions $w$ and the choice probabilities $p$ are, respectively, the arguments of the
functions $G$ and its convex conjugate function $G^*$, we say that $w$ and $p$ are related in the
sense of conjugate duality. The theorem below states an implication of this duality, and
provides an “inverse” correspondence from the observed choice probabilities back to
the unobserved $w$, which is a necessary step for identification and estimation.
Theorem 1. The following pair of equivalent statements captures the empirical content
of the DDC model:
(i) The vector $p$ is in the subdifferential of $G$ at $w$:
$$p \in \partial G(w). \tag{5}$$
(ii) The vector $w$ is in the subdifferential of $G^*$ at $p$:
$$w \in \partial G^*(p). \tag{6}$$
The definition and properties of the subdifferential of a convex function are provided
in Appendix A.11 Part (i) is, of course, connected to the WDZ theorem (Proposition 1);
indeed, it is the WDZ theorem when $G(w)$ is differentiable at $w$. Hence, it encapsulates
an optimality requirement that the vector of observed choice probabilities $p$ be derived
from optimal discrete-choice decision-making for some unknown vector $w$ of choice-specific
value functions.
Part (ii) of this theorem, which describes the “inverse” mapping from conditional-choice
probabilities to choice-specific value functions, does not appear to have been
exploited in the literature on dynamic discrete choice. It relates to Galichon and Salanié
(2012) who use Convex Analysis to estimate matching games with transferable utilities.
It specifically states that the vector of choice-specific value functions can be identified
from the corresponding vector of observed choice probabilities $p$ as the subgradient of
the convex conjugate function $G^*(p)$. Equation (6) is also constructive, and suggests a
11 The function $G$ is differentiable at $w$ if and only if $\partial G(w)$ is single-valued. In that case, part (i) of
Theorem 1 reduces to $p = \nabla G(w)$, which is the WDZ theorem. If, in addition, $\nabla G$ is one-to-one, then we
immediately get $w = (\nabla G)^{-1}(p)$, or $\nabla G^*(p) = (\nabla G)^{-1}(p)$, which is the case of the classical Legendre transform.
However, as we show below, $\nabla G(w)$ is not typically one-to-one in discrete-choice models, so that the
statement in part (ii) of Theorem 1 is more suitable.
procedure for computing the choice-specific value functions corresponding to observed
choice probabilities. We will fully elaborate this procedure in subsequent sections.12
Appendix A contains additional derivations related to the subgradient of a convex
function. Specifically, it is known (equation (25)) that $G(w) + G^*(p) = \sum_{y \in \mathcal{Y}} p_y w_y$ if and only if
$p \in \partial G(w)$. Combining this with (3), we obtain an alternative expression for the convex
conjugate function $G^*$,
$$G^*(p) = -\sum_{y} p_y E\left[ \varepsilon_y \mid Y(w, \varepsilon) = y \right], \tag{7}$$
corresponding to the weighted expectations of the utility shocks $\varepsilon_y$ conditional on
choosing the option $y$. It is also known that the subdifferential $\partial G^*(p)$ corresponds to
the set of maximizers in the program (4) that define the conjugate function $G^*(p)$; that
is,
$$w \in \partial G^*(p) \iff w \in \arg\max_{\tilde{w} \in \mathbb{R}^{\mathcal{Y}}} \left\{ \sum_{y \in \mathcal{Y}} p_y \tilde{w}_y - G(\tilde{w}) \right\}. \tag{8}$$
Later, we will exploit this variational representation of the subdifferential $\partial G^*(p)$ for
computational purposes (cf. Section 4).
Example 1 (Logit). Before proceeding, we discuss the logit model, for which the functions
and relations above reduce to familiar expressions. When the distribution $Q$ of $\varepsilon$
obeys an extreme-value type I distribution, it follows from extreme-value theory that $G$
and $G^*$ can be obtained in closed form:13 $G(w) = \log\left( \sum_{y \in \mathcal{Y}} \exp(w_y) \right) + \gamma$, while
$G^*(p) = \sum_{y \in \mathcal{Y}} p_y \log p_y - \gamma$ if $p$ belongs to the interior of the simplex, and $G^*(p) = +\infty$ otherwise
($\gamma \approx 0.57$ is Euler’s constant). Hence in this case, $G^*$ is the entropy of distribution $p$ (see
Anderson, de Palma, and Thisse (1988) and references therein).
The subdifferential of $G^*$ is characterized as $w \in \partial G^*(p)$ if and only if $w_y = \log p_y - K$
for some $K \in \mathbb{R}$. Because $G^*$ is the entropy of distribution $p$ in this logit case, it can be
called a generalized entropy function even in nonlogit contexts.
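The logit formulas above lend themselves to a quick numerical check. The sketch below is illustrative only: the vector `w` is a hypothetical input, and the function names are ours. It verifies that $G(w) + G^*(p) = \sum_y p_y w_y$ at $p = \nabla G(w)$, and that choosing $K = \gamma$ in $w_y = \log p_y - K$ gives a vector $w^0$ with $G(w^0) = 0$ that differs from $w$ by the constant $G(w)$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def G(w):
    """Logit social surplus: log-sum-exp of w plus Euler's constant."""
    m = max(w)
    return m + math.log(sum(math.exp(v - m) for v in w)) + EULER_GAMMA


def choice_probabilities(w):
    """Gradient of G: the multinomial logit choice probabilities."""
    m = max(w)
    expw = [math.exp(v - m) for v in w]
    total = sum(expw)
    return [e / total for e in expw]


def G_star(p):
    """Convex conjugate of G on the interior of the simplex."""
    return sum(py * math.log(py) for py in p) - EULER_GAMMA


def invert(p, K=EULER_GAMMA):
    """One element of the subdifferential of G* at p: w_y = log p_y - K."""
    return [math.log(py) - K for py in p]


w = [1.0, 0.3, -0.5]  # hypothetical choice-specific values
p = choice_probabilities(w)

# Fenchel equality at p = grad G(w).
fenchel_gap = G(w) + G_star(p) - sum(py * wy for py, wy in zip(p, w))

# With K = gamma, the inverted vector satisfies G(w0) = 0.
w0 = invert(p)
shift = [wy - w0y for wy, w0y in zip(w, w0)]  # constant across y, equal to G(w)

print(fenchel_gap, G(w0), shift, G(w))
```

Any other choice of `K` returns another vector that generates the same `p`; the choice $K = \gamma$ is simply the one that sets the social surplus to zero.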
2.3 Identification
It follows from Theorem 1 that the identification of systematic utilities boils down to the
problem of computing the subgradient of a generalized entropy function. However, from
examining the social surplus function $G$, we see that if $w \in \partial G^*(p)$, then it is also true that
12 Clearly, Theorem 1 also applies to static random utility discrete-choice models, with the $w(x)$ being
interpreted as the utility indices for each of the choices. As such, (6) relates to results regarding the
invertibility of the mapping from utilities to choice probabilities in static discrete-choice models (e.g., Berry (1994),
Haile, Hortacsu, and Kosenok (2008), and Berry, Gandhi, and Haile (2013)). Similar results have also arisen
in the literature on stochastic learning in games (Hofbauer and Sandholm (2002) and Cominetti, Melo, and
Sorin (2010)).
13 In a related article, Arcidiacono and Miller (2011, pp. 1839–1841) discuss computational and analytical
solutions for the $G^*$ function in the generalized extreme-value setting.
$w - K \in \partial G^*(p)$, where $K \in \mathbb{R}^{|\mathcal{Y}|}$ is a vector taking the value $K$ in all $|\mathcal{Y}|$ components.
Indeed, the choice probabilities are only affected by the differences in the levels offered
by the various alternatives. In what follows, we shall tackle this indeterminacy problem
by isolating a particular $w^0$ among those satisfying $w \in \partial G^*(p)$, where we choose
$$G(w^0) = 0. \tag{9}$$
We will impose the following assumption on the heterogeneity.
Assumption 2 (Full Support). Assume the distribution $Q$ of the vector of utility shocks $\varepsilon$
is such that the distribution of the vector $(\varepsilon_y - \varepsilon_1)_{y \neq 1}$ has full support.
Under this assumption, Theorem 2 below shows that (9) defines $w^0$ uniquely. Theorem 3
will then show that the knowledge of $w^0$ allows for easy recovery of all vectors $w$
satisfying $p \in \partial G(w)$.
Theorem 2. Under Assumption 2, let $p$ be in the interior of the simplex $\Delta^{|\mathcal{Y}|}$ (i.e., $p_y > 0$
for each $y$ and $\sum_y p_y = 1$). Then there exists a unique $w^0 \in \partial G^*(p)$ such that $G(w^0) = 0$.
The proof of this theorem is given in Appendix B. Moreover, even when Assumption 2
is not satisfied, $w^0$ will still be set-identified; Theorem 4 below describes the identified
set of $w^0$ corresponding to a given vector of choice probabilities $p$.
Our next result is our main tool for identification; it shows that our choice of $w^0(x)$
as defined in (9) is without loss of generality; it is not an additional model restriction,
but merely a convenient way to represent all $w(x)$ in $\partial G^*(p(x))$ with respect to a natural
and convenient reference point.14
Theorem 3. Maintain Assumption 2 and let $K$ denote any scalar $K \in \mathbb{R}$. The set of conditions
$$w \in \partial G^*(p) \quad \text{and} \quad G(w) = K$$
is equivalent to
$$w_y = w^0_y + K \quad \forall y \in \mathcal{Y}.$$
This theorem shows that any vector within the set $\partial G^*(p)$ can be characterized as
the sum of the (uniquely determined, by Theorem 2) vector $w^0$ and a constant $K \in \mathbb{R}$. As
we will see below, this is our invertibility result for dynamic discrete-choice problems,
14 This indeterminacy issue has been resolved in the existing literature on dynamic discrete-choice models
(e.g., Hotz and Miller (1993), Rust (1994), and Magnac and Thesmar (2002)) by focusing on the differences
between choice-specific value functions, which is equivalent to setting $w_{y_0}(x)$, the choice-specific
value function for a benchmark choice $y_0$, equal to zero. Compared to this, our choice of $w^0(x)$ satisfying
$G(w^0(x)) = 0$ is more convenient in our context, as it leads to a simple expression for the constant $K$ (see
Section 2.4).
as it will imply unique identification of the vector of choice-specific value functions cor-
responding to any observed vector of conditional-choice probabilities.15
2.4 Empirical content of dynamic discrete-choice model
To summarize the empirical content of the model, we recall the fact that the ex ante
value function $V$ solves the equation
$$V(x) = \sum_{y \in \mathcal{Y}} p_y(x) \left( \bar{u}_y(x) + E\left[ \varepsilon_y \mid Y(w, \varepsilon) = y, x \right] + \beta \sum_{x'} p(x' \mid x, y) V(x') \right)$$
(derived in Pesendorfer and Schmidt-Dengler (2008), among others), where we write
$p(x' \mid x, y) = \Pr(x_{t+1} = x' \mid x_t = x, y_t = y)$. Noting that the choice-specific value function is
just
$$w_y(x) = \bar{u}_y(x) + \beta \sum_{x'} p(x' \mid x, y) V(x') \tag{10}$$
and comparing with (3) yields
$$V(x) = G(w(x); x) \quad \text{and} \quad p(x) \in \partial G(w(x); x).$$
Hence, by Theorem 3, the true $w(x)$ will differ from $w^0(x)$ by a constant term $V(x)$,
$$w(x) = w^0(x) + V(x),$$
where $w^0(x)$ is defined in Theorem 2. This result is also convenient for identification
purposes, as it separates identification of $w$ into two subproblems: the determination of
$w^0$ and the determination of $V$. Once $w^0$ and $V$ are known, the utility flows are determined
from (10). This motivates our two-step estimation procedure, which we describe
next.
3. Estimation using the mass transport approach
Based on the derivations in the previous section, we present a two-step estimation procedure.
In the first step, we use the results from Theorem 3 to recover the vector of
choice-specific value functions $w^0(x)$ corresponding to each observed vector of choice
probabilities $p(x)$. In the second step, we recover the utility flow functions $\bar{u}_y(x)$ given
the $w^0(x)$ obtained from the first step.
3.1 First step
In the first step, the goal is to recover the vector of choice-specific value functions
$w^0(x) \in \partial G^*(p(x))$ corresponding to the vector of observed choice probabilities $p(x)$ for
15 See Berry (1994), Chiappori and Komunjer (2010), and Berry, Gandhi, and Haile (2013), among others,
for conditions that ensure the invertibility or “univalence” of demand systems stemming from multinomial
choice models under settings more general than the random utility framework considered here.
each value of $x$. In doing this, we use Theorem 1 above and Proposition 2 below, which
show how $w^0(x)$ belongs to the subdifferential of the conjugate function $G^*(p(x))$. We
delay discussing these details until Section 4. There we will show how this problem of
obtaining $w^0(x)$ can be reformulated in terms of a class of mathematical programming
problems, the Monge–Kantorovich mass transport problems, which lead to convenient
computational procedures. Since this is the central component of our estimation procedure,
we have named it the mass transport approach (MTA).
3.2 Second step
From the first step, we obtained $w^0(x)$ such that $w(x) = w^0(x) + V(x)$. Now in the second
step, we use the recursive structure of the dynamic model, along with fixing one of the
utility flows, to jointly pin down the values of $w(x)$ and $V(x)$. Finally, once $w(x)$ and $V(x)$
are known, the utility flows can be obtained from $\bar{u}_y(x) = w_y(x) - \beta E[V(x') \mid x, y]$.
To nonparametrically identify $\bar{u}_y(x)$, we need to fix some values of the utility flows.
Following Bajari et al. (2009), we fix the utility flow corresponding to a benchmark choice
$y_0$ to be constant at zero.16
Assumption 3 (Fix Utility Flow for Benchmark Choice). For all $x$, $\bar{u}_{y_0}(x) = 0$.
With this assumption, we get
$$0 = w^0_{y_0}(x) + V(x) - \beta E\left[ V(x') \mid x, y = y_0 \right]. \tag{11}$$
Let $W$ be the column vector whose general term is $(w^0_{y_0}(x))_{x \in \mathcal{X}}$, let $V$ be the column
vector whose general term is $(V(x))_{x \in \mathcal{X}}$, and let $\Pi^0$ be the $|\mathcal{X}| \times |\mathcal{X}|$ matrix whose general
term $\Pi^0_{ij}$ is $\Pr(x_{t+1} = j \mid x_t = i, y = y_0)$. Equation (11), rewritten in matrix notation, is
$$W = \beta \Pi^0 V - V,$$
and for $\beta$ below 1, matrix $I - \beta \Pi^0$ is a diagonally dominant matrix. Hence, it is invertible and
(11) becomes
$$V = \left( \beta \Pi^0 - I \right)^{-1} W. \tag{12}$$
The right-hand side of this equation is uniquely estimated from the data. After obtaining
$V(x)$, $\bar{u}_y(x)$ can be nonparametrically identified by
$$\bar{u}_y(x) = w^0_y(x) + V(x) - \beta E\left[ V(x') \mid x, y \right], \tag{13}$$
where $w^0(x)$ is as in Theorem 3, and $V$ is given by (12).
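To make the two steps concrete, the following Python sketch runs a round trip under logit shocks, where the first step has the closed form $w^0_y(x) = \log p_y(x) - \gamma$. Everything in it is an assumption made for illustration: the two-state, two-action primitives (`u_true`, `P`, `beta`) are hypothetical, the model is solved by value iteration only to generate choice probabilities, and the linear system behind (12) is solved by the fixed-point iteration $V \leftarrow \beta \Pi^0 V - W$ instead of a matrix inverse, to keep the example free of dependencies.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def expected_value(P_y, V, x):
    """E[V(x') | x, y] for the transition matrix P_y of action y."""
    return sum(P_y[x][x_next] * V[x_next] for x_next in range(len(V)))


def choice_values(u, P, beta, V):
    """w_y(x) = u_y(x) + beta * E[V(x') | x, y], as in (10)."""
    return [[u[x][y] + beta * expected_value(P[y], V, x) for y in range(len(P))]
            for x in range(len(u))]


def solve_logit_model(u, P, beta, tol=1e-12, max_iter=100000):
    """Value iteration on V(x) = G(w(x)); returns CCPs p[x][y] and V."""
    V = [0.0] * len(u)
    for _ in range(max_iter):
        w = choice_values(u, P, beta, V)
        V_new = [logsumexp(w_x) + EULER_GAMMA for w_x in w]
        gap = max(abs(a - b) for a, b in zip(V_new, V))
        V = V_new
        if gap < tol:
            break
    w = choice_values(u, P, beta, V)
    p = [[math.exp(w_xy - logsumexp(w_x)) for w_xy in w_x] for w_x in w]
    return p, V


def recover_flows(p, P, beta, y0=0, tol=1e-12, max_iter=100000):
    """Two-step recovery of utility flows from CCPs, with u_{y0}(x) = 0."""
    n_x, n_y = len(p), len(P)
    # First step (logit closed form): w0 with G(w0) = 0.
    w0 = [[math.log(p[x][y]) - EULER_GAMMA for y in range(n_y)] for x in range(n_x)]
    # Second step: solve W = beta * Pi0 V - V by iterating V <- beta * Pi0 V - W.
    W = [w0[x][y0] for x in range(n_x)]
    V = [0.0] * n_x
    for _ in range(max_iter):
        V_new = [beta * expected_value(P[y0], V, x) - W[x] for x in range(n_x)]
        gap = max(abs(a - b) for a, b in zip(V_new, V))
        V = V_new
        if gap < tol:
            break
    # Equation (13).
    u_hat = [[w0[x][y] + V[x] - beta * expected_value(P[y], V, x)
              for y in range(n_y)] for x in range(n_x)]
    return u_hat, V


beta = 0.9
u_true = [[0.0, -1.0], [0.0, 0.5]]  # u_true[x][y]; benchmark action y0 = 0
P = [
    [[0.8, 0.2], [0.1, 0.9]],  # P[0][x][x']: transitions under action 0
    [[1.0, 0.0], [1.0, 0.0]],  # P[1][x][x']: action 1 resets the state
]
p, V_true = solve_logit_model(u_true, P, beta)
u_hat, V_hat = recover_flows(p, P, beta)
print(u_hat)
```

The iteration converges because $\beta \Pi^0$ is a contraction when $\beta$ is below 1, which is the same property that makes $I - \beta \Pi^0$ invertible. With another shock distribution only the first step changes; the second step is unchanged.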
16In a static discrete-choice setting (i.e., β=0), this assumption would be a normalization and without
loss of generality. In a dynamic discrete-choice setting, however, this entails some loss of generality because
different values for the utility flows imply different values for the choice-specific value functions, which
leads to differences in the optimal choice behavior. Norets and Tang (2013) discuss this issue in greater
detail.
As a sanity check, one recovers $\bar{u}_{y_0}(\cdot) = W + V - \beta \Pi^0 V = 0$. Also, when $\beta \to 0$, one
recovers $\bar{u}_y(x) = w^0_y(x) - w^0_{y_0}(x)$, which is the case in standard static discrete choice.
Moreover, since our approach to identifying the utility flows is nonparametric, the MTA
does not leverage any known restrictions on the flow utility (including parametric
or shape restrictions) in identifying or estimating the flow utilities.17
Equations (12) and (13), which show how the per-period utility flows can be recovered
from the choice-specific value functions via a system of linear equations,
echo similar derivations in the existing literature (e.g., Aguirregabiria and Mira (2007),
Pesendorfer and Schmidt-Dengler (2008), and Arcidiacono and Miller (2011, 2013)).
Hence, the innovative aspect of our MTA estimator lies not in the second step, but rather
in the first step. In the next section, we delve into computational aspects of this first
step.
Existing procedures for estimating DDC models typically rely on a small class of
distributions for the utility shocks (primarily those in the extreme-value family, as in
Example 1 above) because these distributions yield analytical (or near analytical) formulas
for the choice probabilities and $\{E[\varepsilon_y \mid Y(w, \varepsilon) = y, x]\}_y$, the vector of conditional
expectations of the utility shocks for the optimal choices, which is required so as to
recover the utility flows.18 Our approach, however, which is based on computing the
$G^*$ function, easily accommodates different choices for $Q_\varepsilon$, the (joint) distribution of
the utility shocks conditional on $X$. Therefore, our findings expand the set of dynamic
discrete-choice models suitable for applied work far beyond those with extreme-value
distributed utility shocks.19
4. Computational details for the MTA estimator
In Section 4.1, we show that the problem of identification in DDC models can be formulated
as a mass transport problem, and also how this may be implemented in practice.
In showing how to compute $G^*$, we exploit the connection, alluded to above, between
this function and the assignment game: a model of two-sided matching with transferable
utility that has been used to model marriage and housing markets (such as Shapley
and Shubik (1971) and Becker (1973)).
17 To ensure that the inverted $w$ satisfies certain shape restrictions, the linkage between $w$ and the CCP
will no longer be stipulated by the subdifferential of the convex conjugate function. It is possible that there
exists a modification of the convex conjugate function that is equivalent to imposing certain shape restrictions
on utilities. This is an interesting avenue for future research.
18 Related papers include Hotz and Miller (1993), Hotz et al. (1994), Aguirregabiria and Mira (2007),
Pesendorfer and Schmidt-Dengler (2008), and Arcidiacono and Miller (2011). Norets and Tang (2013) propose
another estimation approach for binary dynamic-choice models in which the choice probability function
is not required to be known.
19 This remark is also relevant for static discrete-choice models. In fact, the random-coefficients multinomial
demand model of Berry, Levinsohn, and Pakes (1995) does not have a closed-form expression for
the choice probabilities, thus necessitating a simulation-based inversion procedure. In ongoing work, we
are exploring the estimation of random-coefficients discrete-choice demand models using our approach.
4.1 Mass transport formulation
Much of our computational strategy will be based on the following proposition, which
was derived in Galichon and Salanié (2012, Proposition 2). It characterizes the $G^*$ function
as an optimum of a well studied mathematical program: the mass transport problem;
see Villani (2003).
Proposition 2 (Galichon and Salanié). Given Assumption 2, the function $G^*(p)$ is the
value of the mass transport problem in which the distribution $Q$ of vectors of utility shocks
$\varepsilon$ is matched optimally to the distribution of actions $y$ given by the multinomial distribution
$p$, when the cost associated to a match of $(\varepsilon, y)$ is given by
$$c(y, \varepsilon) = -\varepsilon_y,$$
where $\varepsilon_y$ is the utility shock from taking the $y$th action. That is,
$$G^*(p) = \sup_{\substack{w, z \\ \text{s.t. } w_y + z(\varepsilon) \leq c(y, \varepsilon)}} \left\{ E_p[w_Y] + E_Q[z(\varepsilon)] \right\}, \tag{14}$$
where the supremum is taken over the pair $(w, z)$, where $w_y$ is a vector of dimension $|\mathcal{Y}|$
and $z(\cdot)$ is a $Q$-measurable random variable. By Monge–Kantorovich duality, (14) coincides
with its dual
$$G^*(p) = \min_{\substack{Y \sim p \\ \varepsilon \sim Q}} E\left[ c(Y, \varepsilon) \right], \tag{15}$$
where the minimum is taken over the joint distribution of $(Y, \varepsilon)$ such that the first margin
$Y$ has distribution $p$ and the second margin $\varepsilon$ has distribution $Q$. Moreover, $w \in \partial G^*(p)$
if and only if there exists $z$ such that $(w, z)$ solves (14). Finally, $w^0 \in \partial G^*(p)$ and $G(w^0) = 0$
if and only if there exists $z$ such that $(w^0, z)$ solves (14) and $z$ is such that $E_Q[z(\varepsilon)] = 0$.
In (15) above, the minimum is taken across all joint distributions of $(Y, \varepsilon)$ with
marginal distributions equal to, respectively, $p$ and $Q$. It follows from the proposition
that the main problem of identification of the choice-specific value functions $w$ can be
recast as a mass transport problem (Villani (2003)), in which the set of optimizers to (14)
yields vectors of choice-specific value functions $w \in \partial G^*(p)$.
Moreover, the mass transport problem can be interpreted as an optimal matching
problem. Using a marriage market analogy, consider a setting in which a matched couple
consisting of a “man” (with characteristics $y \sim p$) and a “woman” (with characteristics
$\varepsilon \sim Q$) obtain a joint marital surplus $-c(y, \varepsilon) = \varepsilon_y$. Accordingly, (15) is an optimal
matching problem in which the joint distribution of characteristics $(y, \varepsilon)$ of matched
couples is chosen to maximize the aggregate marital surplus.
In the case when $Q$ is a discrete distribution, the mass transport problem in the
above proposition reduces to a linear-programming problem that coincides with the
assignment game of Shapley and Shubik (1971). This connection suggests a convenient
way to efficiently compute the $G^*$ function (along with its subgradient). Specifically, we
will show how the dual problem (15) takes the form of a linear-programming problem
or assignment game, for which some of the associated Lagrange multipliers correspond
to the subgradient $\partial G^*$, and hence the choice-specific value functions. These computational
details are the focus of Section 4.2. We include the proof of Proposition 2 in
Appendix B for completeness.
4.2 Linear-programming computation
Let $\hat{Q}$ be a discrete approximation to the distribution $Q$. Specifically, consider an $S$-point
approximation to $Q$, where the support is $\mathrm{Supp}(\hat{Q}) = \{\varepsilon^1, \ldots, \varepsilon^S\}$. Let $\Pr(\hat{Q} = \varepsilon^s) = q_s$.
The best $S$-point approximation is such that the support points are equally weighted,
$q_s = 1/S$, that is, the best $\hat{Q}$ is a uniform distribution; see Kennan (2006). Therefore, let $\hat{Q}$
be a uniform distribution whose support can be constructed by drawing $S$ points from
the distribution $Q$. Moreover, $\hat{Q}$ converges to $Q$ uniformly as $S \to \infty$,20 so that the approximation
error from this discretization will vanish when $S$ is large. Under these assumptions,
problem (14)–(15) has a linear-programming formulation as
\[
\max_{\pi \ge 0} \sum_{y, s} \pi_{ys} \varepsilon^s_y \tag{16}
\]
\[
\text{s.t.} \quad \sum_{s=1}^{S} \pi_{ys} = p_y \quad \forall y \in \mathcal{Y}, \tag{17}
\]
\[
\sum_{y \in \mathcal{Y}} \pi_{ys} = q_s \quad \forall s \in \{1, \ldots, S\}. \tag{18}
\]
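For this discretized problem, the set of $w \in \partial G^*(p)$ is the set of vectors $w$ of Lagrange
multipliers corresponding to constraints (17). To see how we recover $w^0$, the specific
element in $\partial G^*(p)$ as defined in Theorem 1, we begin with the dual problem
\[
\min_{\lambda, z} \sum_{y \in \mathcal{Y}} p_y \lambda_y + \sum_{s=1}^{S} q_s z_s \tag{19}
\]
\[
\text{s.t.} \quad \lambda_y + z_s \ge \varepsilon^s_y.
\]
Consider $(\lambda, z)$ to be a solution to (19). By duality, $\lambda$ and $z$ are, respectively, vectors
of Lagrange multipliers associated to constraints (17) and (18).21 We have
$G^*(p) = \sum_{y \in \mathcal{Y}} p_y \lambda_y + \sum_{s=1}^{S} q_s z_s$, which implies22 that $G(\lambda) = -\sum_{s=1}^{S} q_s z_s$. Also, for any two
elements $\lambda, w^0 \in \partial G^*(p)$, we have $\sum_{y \in \mathcal{Y}} p_y \lambda_y - G(\lambda) = \sum_{y \in \mathcal{Y}} p_y w^0_y - G(w^0)$.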
20 Because $\hat{Q}$ is constructed from i.i.d. draws from $Q$, this uniform convergence follows from the
Glivenko–Cantelli theorem.
21 Because the two linear programs (16) and (19) are dual to each other, the Lagrange multipliers of interest
$\lambda_y$ can be obtained by computing either program. In practice, for the simulations and empirical application
below, we computed the primal problem (16).
22 This uses (25) in Appendix A, which (in our setup) states that $G^*(p) + G(\lambda) = p \cdot \lambda$ for all Lagrange
multiplier vectors $\lambda \in \partial G^*(p)$.
Hence, because $G(w^0) = 0$, we get
\[
w^0_y = \lambda_y - G(\lambda) = \lambda_y + \sum_{s=1}^{S} q_s z_s. \tag{20}
\]
In Theorem 5 below, we establish the consistency of this estimate of $w^0$.
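The computation in this section can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes NumPy and SciPy are available, the function name `mta_invert` is hypothetical, and it solves the discretized version of (14) with the sign convention $c(y, \varepsilon) = -\varepsilon_y$, so that the normalization in (20) can be applied directly as $w^0 = w - \hat{G}(w)$, where $\hat{G}$ is the surplus function under the uniform draws.

```python
import numpy as np
from scipy.optimize import linprog


def mta_invert(p, eps):
    """Recover normalized payoffs w0 from choice probabilities p.

    p   : (Y,) vector of choice probabilities (interior of the simplex).
    eps : (S, Y) array of shock draws; each row has weight q_s = 1/S.
    """
    p = np.asarray(p, dtype=float)
    eps = np.asarray(eps, dtype=float)
    S, Y = eps.shape
    q = np.full(S, 1.0 / S)

    # Variables are x = (w_1..w_Y, z_1..z_S). Discretized (14):
    #   max p.w + q.z   s.t.   w_y + z_s <= -eps[s, y]  for all (s, y).
    # linprog minimizes, so the objective is negated.
    cost = -np.concatenate([p, q])
    A = np.zeros((S * Y, Y + S))
    row = np.arange(S * Y)
    s_idx, y_idx = np.divmod(row, Y)      # row = s * Y + y
    A[row, y_idx] = 1.0
    A[row, Y + s_idx] = 1.0
    b = -eps.reshape(-1)

    # (w + t, z - t) leaves the program unchanged, so pin w_1 = 0.
    bounds = [(0.0, 0.0)] + [(None, None)] * (Y - 1 + S)
    res = linprog(cost, A_ub=A, b_ub=b, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)

    w = res.x[:Y]
    # Normalization as in (20): subtract the discretized surplus G(w).
    G_hat = np.mean(np.max(w + eps, axis=1))
    return w - G_hat
```

As a sanity check under an assumed logit design (i.i.d. standard Gumbel shocks), the differences $w^0_y - w^0_{y'}$ returned by this sketch should be close to $\log(p_y / p_{y'})$ for large $S$, for example with `eps = np.random.default_rng(0).gumbel(size=(1000, 3))`.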
4.3 Discretization of Q and a second type of indeterminacy issue
Thus far, we have proposed a procedure for computing $G^*$ (and the choice-specific value
functions $w^0$) by discretizing the otherwise continuous distribution $Q$. However, because
the support of $\varepsilon$ is discrete, $w^0_y$ will generally not be unique.23 This is due to the
nonuniqueness of the solution to the dual of the linear-programming (LP) problem in
(16), and corresponds to Shapley and Shubik's (1971) well-known results on the multiplicity
of the core in the finite assignment game. Applied to discrete-choice models, it
implies that when the support of the utility shocks is finite, the utilities from the discrete-choice
model will only be partially identified. In this section, we discuss this partial identification,
or indeterminacy, problem further.
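Recall that
\[
G^*(p) = \sup_{w_y + z(\varepsilon) \le c(y, \varepsilon)} \big\{ E_p[w_Y] + E_Q[z(\varepsilon)] \big\}, \tag{21}
\]
where $c(y, \varepsilon) = -\varepsilon_y$. In Proposition 2, this problem was shown to be the dual formulation
of an optimal assignment problem.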
We call the identified set of payoff vectors, denoted by $I(p)$, the set of vectors $w$ such
that
\[
\Pr\Big( w_y + \varepsilon_y \ge \max_{y'} \{ w_{y'} + \varepsilon_{y'} \} \Big) = p_y, \tag{22}
\]
and we denote by $I_0(p)$ the normalized identified set of payoff vectors, that is, the set of
$w \in I(p)$ such that $G(w) = 0$. If $Q$ were to have full support, $I_0(p)$ would contain only
the singleton $\{w^0\}$ as in Theorem 3. Instead, when the distribution $Q$ is discrete, the set
$I_0(p)$ contains a multiplicity of vectors $w$ that satisfy (5). One has the following theorem.
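Theorem 4. The following statements hold:
(i) The set $I(p)$ coincides with the set of $w$ such that there exists $z$ such that $(w, z)$ is a
solution to (21). Thus
\[
I(p) = \big\{ w : \exists z,\ w_y + z_\varepsilon \le c(y, \varepsilon),\ E_p[w_Y] + E_Q[z_\varepsilon] = G^*(p) \big\}.
\]
(ii) The set $I_0(p)$ is determined by the set of linear inequalities
\[
I_0(p) = \big\{ w : \exists z,\ w_y + z_\varepsilon \le c(y, \varepsilon),\ E_p[w_Y] = G^*(p),\ E_Q[z_\varepsilon] = 0 \big\}.
\]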
23 Note that Theorem 1 requires $\varepsilon$ to have full support.
This result allows us to easily derive bounds on the individual components of $w^0$ using
the characterization of the identified set by linear inequalities. Indeed, for each
$y \in \mathcal{Y}$, we can obtain upper (resp. lower) bounds on $w_y$ by maximizing (resp. minimizing)
$w_y$ subject to the linear inequalities characterizing $I_0(p)$,24 which is a linear-programming
problem.25
Furthermore, when the dimensionality of discretization, $S$, is high, the core shrinks
and collapses to the singleton $\{w^0\}$. This is a consequence of our next theorem,
which is a consistency result.26 In our Monte Carlo experiments below, we provide
evidence for the magnitude of this indeterminacy problem under different levels of
discretization.
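The bounding procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `identified_set_bounds` and the slack parameter `tol` are hypothetical, the shocks are equally weighted draws, and the equality $E_p[w_Y] = G^*(p)$ from Theorem 4(ii) is relaxed to $E_p[w_Y] \ge G^*(p) - \text{tol}$ to absorb solver round-off (the reverse inequality is implied by the other constraints).

```python
import numpy as np
from scipy.optimize import linprog


def _solve(c, **kw):
    res = linprog(c, method="highs", **kw)
    if not res.success:
        raise RuntimeError(res.message)
    return res


def identified_set_bounds(p, eps, tol=1e-6):
    """Componentwise bounds on w over the normalized identified set I_0(p)."""
    p = np.asarray(p, dtype=float)
    eps = np.asarray(eps, dtype=float)
    S, Y = eps.shape
    q = np.full(S, 1.0 / S)

    # Inequalities w_y + z_s <= c(y, eps^s) = -eps[s, y], one row per (s, y).
    A = np.zeros((S * Y, Y + S))
    row = np.arange(S * Y)
    s_idx, y_idx = np.divmod(row, Y)
    A[row, y_idx] = 1.0
    A[row, Y + s_idx] = 1.0
    b = -eps.reshape(-1)
    free = [(None, None)] * (Y + S)

    # Step 1: G*(p) = max p.w + q.z (w_1 pinned to 0 to remove the shift).
    obj = np.concatenate([p, q])
    first = _solve(-obj, A_ub=A, b_ub=b, bounds=[(0.0, 0.0)] + free[1:])
    g_star = -first.fun
    w_star = first.x[:Y] + q @ first.x[Y:]   # one normalized element of I_0(p)

    # Step 2: impose E_Q[z] = 0 and E_p[w] >= G*(p) - tol, then
    # minimize and maximize each coordinate w_y in turn.
    A2 = np.vstack([A, -np.concatenate([p, np.zeros(S)])])
    b2 = np.append(b, -(g_star - tol))
    A_eq = np.concatenate([np.zeros(Y), q])[None, :]
    lower, upper = np.empty(Y), np.empty(Y)
    for y in range(Y):
        e = np.zeros(Y + S)
        e[y] = 1.0
        kw = dict(A_ub=A2, b_ub=b2, A_eq=A_eq, b_eq=[0.0], bounds=free)
        lower[y] = _solve(e, **kw).fun
        upper[y] = -_solve(-e, **kw).fun
    return lower, upper, w_star
```

The width `upper - lower` gives a direct measure of the indeterminacy for a given $S$; the returned `w_star` is one element of the set and should lie between the two bounds.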
4.4 Consistency of the MTA estimator
Here we show (strong) consistency for our MTA estimator of $w^0$, the normalized choice-specific
value functions. In our proof, we accommodate two types of error: (i) an approximation
error from discretizing the distribution $Q$ of $\varepsilon$ and (ii) a sampling error from
our finite-sample observations of the choice probabilities. We use $Q_n$ to denote the discretized
distributions of $\varepsilon$, and use $p^n$ to denote the sample estimates of the choice probabilities.
The limiting vector of choice probabilities is denoted $p^0$. For a given $(Q_n, p^n)$,
let $w^n_y$ denote the choice-specific value functions estimated using the MTA.
Theorem 5. Make the following assumptions:
(i) The sequence of vectors $\{p^n_y\}_{y \in \mathcal{Y}}$, viewed as the multinomial distribution of $y$, converges
weakly to $p^0$.
(ii) The discretized distributions of $\varepsilon$ converge weakly to $Q$: $Q_n \xrightarrow{d} Q$.
(iii) The second moments of $Q_n$ are uniformly bounded.
Then the convergence $w^n_y \to w^0_y$ for each $y \in \mathcal{Y}$ holds almost surely.
The proof, which is given in Appendix B, may be of independent interest as the main
argument relies on approximation results from mass transport theory, which we believe
to be the first use of such results for proving consistency in an econometrics context.
5. Monte Carlo evidence
In this section, we illustrate our estimation framework using a dynamic model of re-
source extraction. To illustrate how our method can tractably handle any general distri-
bution of the unobservables, we use a distribution in which shocks to different choices
are correlated. We will begin by describing the setup.
24 However, letting $\bar{w}_y$ (resp. $\underline{w}_y$) denote the upper (resp. lower) bound on $w_y$, we note that typically the
vector $(\bar{w}_y;\ y \in \mathcal{Y}) \notin I_0(p)$.
25 Moreover, partial identification in $w^0$ (due to discretization of the shock distribution $Q(\varepsilon)$) will naturally
also imply partial identification in the utility flows $u^0$. For a given identified vector $w^0$ (and also given
the choice probabilities $p$ and transition matrix $\Pi^0$ from the data), we can recover the corresponding $u^0$
using (12)–(13).
26 Gretsky, Ostroy, and Zame (1999) also discuss this phenomenon in their paper.
At each time $t$, let $x_t \in \{1, 2, \ldots, 30\}$ be the state variable denoting the size of the
resource pool. There are three choices:
$y_t = 0$: The pool of resources is extracted fully. The transition of the state variable is
governed by $x_{t+1} \mid x_t, y_t = 0$, which follows a multinomial distribution on $\{1, 2, 3, 4\}$ with
parameter $\pi = (\pi_1, \pi_2, \pi_3, \pi_4)$. The utility flow is $\bar{u}(y_t = 0, x_t) = 0.5\sqrt{x_t} - 2 + \varepsilon_0$.
$y_t = 1$: The pool of resources is extracted partially. The transition of the state
variable is governed by $x_{t+1} \mid x_t, y_t = 1$, which follows a multinomial distribution on
$\{\max\{1, x_t - 10\}, \max\{2, x_t - 9\}, \max\{3, x_t - 8\}, \max\{4, x_t - 7\}\}$ with parameter $\pi$. The utility
flow is $\bar{u}(y_t = 1, x_t) = 0.4\sqrt{x_t} - 2 + \varepsilon_1$.
$y_t = 2$: The agent waits for the pool to grow and does not extract. The transition of
the state variable is governed by $x_{t+1} \mid x_t, y_t = 2$, which follows a multinomial distribution
on $\{x_t, x_t + 1, x_t + 2, x_t + 3\}$ with parameter $\pi$. We normalize the utility flow to be
$\bar{u}(y_t = 2, x_t) = \varepsilon_2$.
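The joint distribution of the unobserved state variables is given by
\[
(\varepsilon_0 - \varepsilon_2,\ \varepsilon_1 - \varepsilon_2) \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0.5 & 0.5 \\ 0.5 & 1 \end{pmatrix} \right).
\]
Other parameters we fix and hold constant for the Monte Carlo study are the discount
rate, $\beta = 0.9$, and $\pi = (0.3, 0.35, 0.25, 0.10)$.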
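The design above can be written down compactly. The following is a sketch under stated assumptions rather than the authors' simulation code: the function names are hypothetical, states $1, \ldots, 30$ are stored at indices $0, \ldots, 29$, and, because the text does not say what happens when waiting would push the pool beyond 30, next states are capped at 30 (an assumption).

```python
import numpy as np

N_STATES = 30
PI = np.array([0.3, 0.35, 0.25, 0.10])
BETA = 0.9  # discount parameter from the text


def transition_matrices(n_states=N_STATES, pi=PI):
    """Return P with P[y, x-1, x'-1] = Pr(x' | x, y) for y = 0, 1, 2."""
    P = np.zeros((3, n_states, n_states))
    for x in range(1, n_states + 1):
        for k, prob in enumerate(pi):
            nxt0 = k + 1                     # y=0: next state in {1, 2, 3, 4}
            nxt1 = max(k + 1, x - 10 + k)    # y=1: max{1, x-10}, ..., max{4, x-7}
            nxt2 = min(x + k, n_states)      # y=2: {x, ..., x+3}, capped (assumption)
            P[0, x - 1, nxt0 - 1] += prob
            P[1, x - 1, nxt1 - 1] += prob
            P[2, x - 1, nxt2 - 1] += prob
    return P


def draw_shock_differences(n, rng):
    """Draw n pairs (eps0 - eps2, eps1 - eps2) from the bivariate normal above."""
    cov = np.array([[0.5, 0.5], [0.5, 1.0]])
    return rng.multivariate_normal(np.zeros(2), cov, size=n)
```

Only the shock differences relative to choice 2 are specified in the text, which is why the sketch draws the two differences directly instead of the three individual shocks.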
5.1 Asymptotic performance
As a preliminary check of our estimation procedure, we show that we are able to recover
the utility flows using the actual conditional choice probabilities implied by the underlying
model. We discretized the distribution of $\varepsilon$ using $S = 5000$ support points. As is
clear from Figure 1, the estimated utility flows (plotted as dots) as a function of states
match the actual utility functions very well.
5.2 Finite-sample performance
To test the performance of our estimation procedure when there is sampling error in
the CCPs, we generate simulated panel data of the form $\{(y_{it}, x_{it}) : i = 1, 2, \ldots, N;\ t = 1, 2, \ldots, T\}$,
where $y_{it} \in \{0, 1, 2\}$ is the dynamically optimal choice at $x_{it}$ after the realization
of simulated shocks. We vary the number of cross-section observations $N$ and the
number of periods $T$, and for each combination of $(N, T)$, we generate 100 independent
data sets.27
For each replication or simulated data set, the root-mean-square error (RMSE) and
$R^2$ are calculated, showing how well the estimated $\bar{u}_y(x)$ fits the true utility function for
each $y$. The averages are reported in Table 1.
27 In each data set, we initialized $x_{i1}$ with a random state in $\mathcal{X}$. When calculating RMSE and $R^2$, we restrict
to states where the probability is in the interior of the simplex $\Delta^3$; otherwise utilities are not identified and
the estimates are meaningless.
Figure 1. Comparison between the estimated and true utility flows.
Table 1. Average fit across all replications.
Design              RMSE (y=0)   RMSE (y=1)   R^2 (y=0)   R^2 (y=1)
N=100,  T=100       0.5586       0.2435       0.3438      0.7708
N=100,  T=500       0.1070       0.1389       0.7212      0.9119
N=100,  T=1000      0.0810       0.1090       0.8553      0.9501
N=200,  T=100       0.1244       0.1642       0.5773      0.8736
N=200,  T=200       0.1177       0.1500       0.7044      0.9040
N=500,  T=100       0.0871       0.1162       0.8109      0.9348
N=500,  T=500       0.0665       0.0829       0.8899      0.9678
N=1000, T=100       0.0718       0.0928       0.8777      0.9647
N=1000, T=1000      0.0543       0.0643       0.9322      0.9820
Note: Standard errors are reported in Table 3 in Appendix C.
5.3 Size of the identified set of payoffs
As mentioned in Section 4.3, using a discrete approximation to the distribution of the
unobserved state variable introduces a partial identification problem: the identified
choice-specific value functions might not be unique. Using simulations, we next show
that the identified set of choice-specific value functions (which we will simply refer to as
payoffs) shrinks to a singleton as $S$ increases, where $S$ is the number of support points
in the discrete approximation of the continuous error distribution. For $S$ ranging from
100 to 1000, we plot in Figure 2 the differences between the largest and the smallest
choice-specific value function for $y = 2$ across all values of $p \in \Delta^3$ (using the linear-programming
procedures described in Section 4.3).
Figure 2. The identified set of payoffs shrinks to a singleton across $\Delta^3$. For each value of $S$, we
plot the values of the differences $\max_{w \in \partial G^*(p)} w - \min_{w \in \partial G^*(p)} w$ across all values of $p \in \Delta^3$. In the
box plot, the central mark is the median, the edges of the box are the 25th and 75th percentiles,
the whiskers extend to the most extreme data points not considered outliers, and outliers are
plotted individually.
As is evident, even at small $S$, the identified payoffs are very close to each other in
magnitude. At $S = 1000$, where computation is near instantaneous, for most of the values
in the discretized grid of $\Delta^3$, the core is a singleton; when it is not, the difference in
the estimated payoff is less than 0.01. Similar results hold for the choice-specific value
functions for choices $y = 0$ and $y = 1$, which are plotted in Figures 5 and 6 in Appendix C.
To sum up, it appears that this indeterminacy issue in the payoffs is not a worrisome
problem for even very modest values of $S$.
5.4 Comparison: MTA versus simulated maximum likelihood
One common technique used in the literature to estimate dynamic discrete-choice
models with nonstandard distributions of unobservables is simulated maximum likelihood
(SML). Our MTA method has a distinct advantage over SML: while MTA allows
the utility flows $\bar{u}_y(x)$ for different choices $y$ and states $x$ to be nonparametric, the SML
approach typically requires parameterizing these utility flows as a function of a low-dimensional
parameter vector. This makes comparison of these two approaches awkward.
Nevertheless, here we undertake a comparison of the nonparametric MTA versus
the parametric SML approach. First we compare the performance of the two alternative
approaches in terms of computational time. The computations were performed on
a quad-core Intel Xeon 2.93-GHz UNIX workstation, and the results are presented in
Table 2.
Table 2. Comparison: MTA versus simulated maximum likelihood.
S (discretized points)   SML:+ avg. seconds   MTA:++ avg. seconds
2000                     19.8                 2.6
3000                     24.5                 4.4
4000                     26.5                 6.6
5000                     40.9                 9.6
6000                     70.5                 13.4
7000                     105.0                17.5
8000                     129.4                21.5
Note: + In this column we report the time it takes to estimate the parameters
$\theta = (\theta_{00}, \theta_{01}, \theta_{10}, \theta_{11})$ as a local maximum of a simulated maximum likelihood, where $\theta$ corresponds
to $\bar{u}_{y=0}(x) = \theta_{00} + \theta_{01} x$ and $\bar{u}_{y=1}(x) = \theta_{10} + \theta_{11} x$. ++ In this column we report
the time it takes to nonparametrically estimate the per-period utility flow.
From a computational point of view, the disadvantage of SML is that the dynamic
programming problem must be solved (via Bellman function iteration) for each trial
parameter vector, whereas the MTA requires solving a large-scale linear-programming
problem, but only once. Table 2 shows that our MTA procedure significantly outperforms
SML in terms of computational speed. This finding, along with the results in Table 1,
shows that MTA has the desirable properties of speed and accuracy, and also allows
for nonparametric specification of the utility flows $\bar{u}_y(x)$.
Furthermore, as confirmed in our computations, the nonlinear optimization routines
typically used to implement SML have trouble finding the global optimum; in
contrast, the MTA estimator, by virtue of being a linear-programming problem, always
finds the global optimum. Indeed, under the logistic assumption on unobservables and
linear-in-parameters utility, one advantage of the Hotz–Miller estimator for DDC models
(vs. SML) is that the system of equations defining the estimator has a unique global
solution; in their discussion of this, Aguirregabiria and Mira (2010, p. 48) remark that
“extending the range of applicability of ... CCP methods to models which do not impose the
CLOGIT [logistic] assumption is a topic for further research.” This paper fills the gap: our
MTA estimator shares the computational advantages of the CLOGIT (conditional logit)
setup, but works for nonlogistic models. In this sense, the MTA estimator is a generalized
CCP estimator.
6. Empirical application: Revisiting Harold Zurcher
In this section, we apply our estimation procedure to the bus engine replacement data
set first analyzed in Rust (1987). In each period $t$, Harold Zurcher (bus depot manager)
chooses $y_t \in \{0, 1\}$ after observing the mileage $x_t \in \mathcal{X}$ and the realized shocks $\varepsilon_t$. If $y_t = 0$,
then he chooses not to replace the bus engine, and $y_t = 1$ means that he chooses to replace
the bus engine. The state space is $\mathcal{X} = \{0, 1, 2, \ldots, 29\}$, that is, we divided the mileage
space into 30 states, each representing a 12,500-mile increment in mileage since the last engine
replacement.28 Harold Zurcher manages a fleet of 104 identical buses, and we observe
the decisions that he made, as well as the corresponding bus mileage at each time
period $t$. The duration between $t + 1$ and $t$ is a quarter of a year, and the data set spans 10
years. Figures 7 and 8 in Appendix C summarize the frequencies and mileage at which
replacements take place in the data set.
28 This grid is coarser compared to Rust's (1987) original analysis of these data, in which he divided the
mileage space into increments of 5000 miles. However, because replacement of engines occurred so infrequently
(there were only 61 replacements in the entire 10-year sample period), using such a fine grid size
First, we can directly estimate the probability of choosing to replace and not to re-
place the engine for each state in X. Also directly obtained from the data are the Markov
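transition probabilities for the observed state variable $x_t \in \mathcal{X}$, estimated as
\[
\widehat{\Pr}(x_{t+1} = j \mid x_t = i,\ y_t = 0) =
\begin{cases}
0.7405 & \text{if } j = i, \\
0.2595 & \text{if } j = i + 1, \\
0 & \text{otherwise},
\end{cases}
\]
\[
\widehat{\Pr}(x_{t+1} = j \mid x_t = i,\ y_t = 1) =
\begin{cases}
0.7405 & \text{if } j = 0, \\
0.2595 & \text{if } j = 1, \\
0 & \text{otherwise}.
\end{cases}
\]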
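As a small illustration, the two estimated transition laws can be stored as $30 \times 30$ matrices. This is a sketch with hypothetical names; the treatment of the highest mileage state is an assumption, since the estimated law for $y_t = 0$ would otherwise send mass to a state outside the grid, so here that mass is kept in the top state.

```python
import numpy as np


def bus_transition_matrices(n_states=30, stay=0.7405, move=0.2595):
    """Return (P_keep, P_replace) with P[i, j] = Pr(x' = j | x = i, y)."""
    P_keep = np.zeros((n_states, n_states))
    P_replace = np.zeros((n_states, n_states))
    for i in range(n_states):
        # y = 0: mileage stays in state i or advances by one state.
        P_keep[i, i] += stay
        P_keep[i, min(i + 1, n_states - 1)] += move  # top state absorbs (assumption)
        # y = 1: mileage restarts from the bottom of the grid.
        P_replace[i, 0] = stay
        P_replace[i, 1] = move
    return P_keep, P_replace
```

Each row of both matrices sums to one, so they can be used directly as Markov kernels in the second step of the procedure.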
For this analysis, we assumed a normal mixture distribution of the error term, specifically,
$\varepsilon_{t0} - \varepsilon_{t1} \sim \frac{1}{2} N(0, 1) + \frac{1}{2} N\big(0, \frac{1}{1 + 0.1x}\big)$.29 We chose this mixture distribution so as to
allow the utility shocks to depend on mileage, which accommodates, for instance, operating
costs that may be more volatile and unpredictable at different levels of mileage. At
the same time, these specifications for the utility shock distribution showcase the flexibility
of our procedure in estimating dynamic discrete-choice models for any general
error distribution. For comparison, we repeat this exercise using an error distribution
that is homoskedastic, that is, its variance does not depend on the state variable $x_t$. The
result appears to be robust to using different distributions of $\varepsilon_{t0} - \varepsilon_{t1}$. We set the discount
rate $\beta = 0.9$.
To nonparametrically estimate $\bar{u}_{y=0}(x)$, we fixed $\bar{u}_{y=1}(x)$ to 0 for all $x \in \mathcal{X}$. Hence, our
estimates of $\bar{u}_{y=0}(x)$ should be interpreted as the magnitude of operating costs30 relative
to replacement costs,31 with positive values implying that replacement costs exceed
operating costs. The estimated utility flows from choosing $y = 0$ (do not replace) relative
to $y = 1$ (replace engine) are plotted in Figure 3. We only present estimates for mileage
within the range $x \in [9, 25]$, because within this range, the CCPs are in the interior of the
probability simplex (see footnote 28 and Figure 8 in Appendix C).
Within this range, the estimated utility function does not vary much with increasing
mileage, that is, it has a slope that is not significantly different from zero. The recovered
utilities fall within the narrow band of 9 and 9.5, which implies that on average
the replacement cost is much higher than the maintenance cost, by a magnitude
of 18–19 times the variance of the utility shocks. It is somewhat surprising that
our results suggest that when the mileage goes beyond the cutoff point of 100,000
miles, Harold Zurcher perceived the operating costs to be inelastic with respect to accumulated
mileage. It is worth noting that Rust (1987) mentioned that “[a]ccording to
Zurcher, monthly maintenance costs increase very slowly as a function of accumulated
mileage.”
Figure 3. Estimates of utility flows $\bar{u}_{y=0}(x)$ across values of mileage $x$.
leads to many states that have zero probability of choosing replacement. Our procedure, like all other
CCP-based approaches, fails when the vector of conditional choice probabilities lies on the boundary of the
simplex.
29 In this paper, we restrict attention to the case where the researcher fully knows the distribution of the
unobservables $Q_\varepsilon$, so that there are no unknown parameters in these distributions. In principle, the two-step
procedure proposed here can be nested inside an additional “outer loop” in which unknown parameters
of $Q_\varepsilon$ are considered, but identification and estimation in this case must rely on model restrictions in
addition to those considered in this paper. We are currently exploring such a model in the context of the
simpler static discrete-choice setting.
30 Operating costs include maintenance, fuel, and insurance costs, plus Zurcher's estimate of the costs of
lost ridership and good will due to unexpected breakdowns.
31 To be pedantic, this also includes the operating cost at $x = 0$.
To get an idea of the effect of sampling error on our estimates, we bootstrapped
our estimation procedure. For each of 100 resamples, we randomly drew 80 buses with
replacement from the data set, and reestimated the utility flows $\bar{u}_{y=0}(x)$ using our procedure.
The results are plotted in Figure 4. The evidence suggests that we are able to
obtain fairly tight cost estimates for states where there is at least one replacement, that
is, for $x \ge 9$ ($x \ge 112{,}500$ miles), and for states that are reached often enough, that is, for
$x \le 22$ ($x \le 275{,}000$ miles).
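The resampling step of this bootstrap can be sketched as below. The data layout is an assumption made for illustration (three aligned integer arrays `bus_id`, `x`, `y` with one entry per bus-period), and the function names are hypothetical. The sketch only recomputes the first-step replacement frequencies on each resample of buses; each resampled CCP vector would then be passed through the MTA inversion to obtain a bootstrap draw of $\bar{u}_{y=0}(x)$.

```python
import numpy as np


def replacement_ccp(x, y, n_states=30):
    """Frequency of replacement (y = 1) by mileage state; NaN if never visited."""
    visits = np.bincount(x, minlength=n_states)
    repl = np.bincount(x, weights=y, minlength=n_states)
    ccp = np.full(n_states, np.nan)
    seen = visits > 0
    ccp[seen] = repl[seen] / visits[seen]
    return ccp


def bootstrap_ccps(bus_id, x, y, n_draw=80, n_boot=100, n_states=30, seed=0):
    """Resample whole buses with replacement and recompute the CCPs."""
    rng = np.random.default_rng(seed)
    ids = np.unique(bus_id)
    rows_by_bus = {b: np.flatnonzero(bus_id == b) for b in ids}
    out = np.empty((n_boot, n_states))
    for r in range(n_boot):
        drawn = rng.choice(ids, size=n_draw, replace=True)
        rows = np.concatenate([rows_by_bus[b] for b in drawn])
        out[r] = replacement_ccp(x[rows], y[rows], n_states)
    return out
```

Resampling at the bus level, as the text describes, keeps each bus's full mileage history together within a resample.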
7. Conclusion
In this paper, we have shown how results from Convex Analysis can be fruitfully ap-
plied to study identification in dynamic discrete-choice models; modulo the use of
these tools, a large class of dynamic discrete-choice problems with quite general util-
ity shocks becomes no more difficult to compute and estimate than the logit model
encountered in most empirical applications. This has allowed us to provide a natural
and holistic framework encompassing the papers of Rust (1987), Hotz and Miller (1993),
and Magnac and Thesmar (2002). While the identification results in this paper are comparable
to other results in the literature, the approach we take, based on the convexity
of the social surplus function $G$ and the resulting duality between choice probabilities
and choice-specific value functions, appears to be new. Far more than providing a mere
reformulation, this approach is powerful, and has significant implications in several
dimensions.
Figure 4. Bootstrapped estimates of utility flows $\bar{u}_{y=0}(x)$. We plot the values of the bootstrapped
resampled estimates of $\bar{u}_{y=0}(x)$. In each box plot, the central mark is the median, the
edges of the box are the 25th and 75th percentiles, and the whiskers extend to the 5th and 95th
percentiles.
First, by drawing the (surprising) connection between the computation of the $G^*$
function and the computation of optimal matchings in the classical assignment game,
we can apply the powerful tools developed to compute optimal matchings to dynamic
discrete-choice models.32
We believe the present paper opens a more flexible way to deal with discrete-choice
models. While identification is exact for a fixed structure of the unobserved heterogene-
ity, one may wish to parameterize the distribution of the utility shocks and do inference
on that parameter. The results and methods developed in this paper may also extend to
dynamic discrete games, with the utility shocks reinterpreted as players’ private infor-
mation.33 However, we leave these directions for future exploration.
32 While the present paper has used standard linear-programming algorithms such as the Simplex algorithm,
other, more powerful matching algorithms such as the Hungarian algorithm may be efficiently put to
use when the dimensionality of the problem grows. The technical details are available in a supplementary
file on the journal website, http://qeconomics.org/supp/436/code_and_data.zip. Moreover, by reformulating
the problem as an optimal matching problem, all existence and uniqueness results are inherited from
the theory of optimal transport. For instance, the uniqueness of a systematic utility rationalizing the consumer's
choices follows from the uniqueness of a potential in the Monge–Kantorovich theorem.
33 See, for example, Aguirregabiria and Mira (2007) or Pesendorfer and Schmidt-Dengler (2008).
Appendix A: Background results
A.1 Convex analysis for discrete-choice models
Here, we give a brief review of the main notions and results used in the paper. We keep an
informal style and do not give proofs, but we refer to Rockafellar (1970) for an extensive
treatment of the subject.
Let $u \in \mathbb{R}^{|\mathcal{Y}|}$ be a vector of utility indices. For utility shocks $\{\varepsilon_y\}_{y \in \mathcal{Y}}$ distributed
according to a joint distribution function $Q$, we define the social surplus function
as
\[
G(u) = E\Big[ \max_y \{ u_y + \varepsilon_y \} \Big], \tag{23}
\]
where $u_y$ is the $y$th component of $u$. If $E(\varepsilon_y)$ exists and is finite, then the function $G$ is
a proper convex function that is continuous everywhere. Moreover, assuming that $Q$ is
sufficiently well behaved (for instance, if it has a density with respect to the Lebesgue
measure), $G$ is differentiable everywhere.
Define the Legendre–Fenchel conjugate, or convex conjugate of $G$, as
$G^*(p) = \sup_{u \in \mathbb{R}^{|\mathcal{Y}|}} \{ p \cdot u - G(u) \}$. Clearly, $G^*$ is a convex function as it is the supremum of affine
functions. Note that the inequality
\[
G(u) + G^*(p) \ge p \cdot u \tag{24}
\]
holds in general. The domain of $G^*$ consists of $p \in \mathbb{R}^{|\mathcal{Y}|}$ for which the supremum is finite.
In the case when $G$ is defined by (23), it follows from Norets and Takahashi (2013)
that the domain of $G^*$ contains the simplex $\Delta^{|\mathcal{Y}|}$, which is the set of $p \in \mathbb{R}^{|\mathcal{Y}|}$ such that
$p_y \ge 0$ and $\sum_{y \in \mathcal{Y}} p_y = 1$. This means that our convex conjugate function is always well
defined.
The subgradient $\partial G(u)$ of $G$ at $u$ is the set of $p \in \mathbb{R}^{|\mathcal{Y}|}$ such that
\[
p \cdot u - G(u) \ge p \cdot u' - G(u')
\]
holds for all $u' \in \mathbb{R}^{|\mathcal{Y}|}$. Hence $\partial G$ is a set-valued function or correspondence. The subgradient
$\partial G(u)$ is a singleton if and only if $G$ is differentiable at $u$; in this case,
$\partial G(u) = \{\nabla G(u)\}$.
One sees that $p \in \partial G(u)$ if and only if $p \cdot u - G(u) = G^*(p)$, that is, if equality is reached
in inequality (24),
\[
G(u) + G^*(p) = p \cdot u. \tag{25}
\]
This equation is itself of interest, and is known in the literature as Fenchel's equality. By
symmetry in (25), one sees that $p \in \partial G(u)$ if and only if $u \in \partial G^*(p)$. In particular, when
both $G$ and $G^*$ are differentiable, then $\nabla G^* = (\nabla G)^{-1}$.
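As a concrete check of (24) and (25), the sketch below uses the logit case, which is an assumed example and not part of this appendix: with i.i.d. standard Gumbel shocks, $G(u) = \gamma + \log \sum_y e^{u_y}$ (with $\gamma$ Euler's constant), $\nabla G(u)$ is the softmax of $u$, and $G^*(p) = \sum_y p_y \log p_y - \gamma$ on the interior of the simplex. The function names are hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # mean of a standard Gumbel shock


def G_logit(u):
    """Social surplus E[max_y (u_y + eps_y)] under i.i.d. standard Gumbel shocks."""
    m = np.max(u)
    return EULER_GAMMA + m + np.log(np.sum(np.exp(u - m)))


def grad_G_logit(u):
    """Gradient of G: the logit choice probabilities (softmax of u)."""
    e = np.exp(u - np.max(u))
    return e / e.sum()


def G_star_logit(p):
    """Convex conjugate of G_logit at an interior point p of the simplex."""
    return float(np.sum(p * np.log(p)) - EULER_GAMMA)


u = np.array([0.3, -1.2, 2.0])
p = grad_G_logit(u)                 # p in the subgradient of G at u
gap_at_p = G_logit(u) + G_star_logit(p) - p @ u          # Fenchel's equality (25)
p_other = np.array([0.2, 0.5, 0.3])
gap_other = G_logit(u) + G_star_logit(p_other) - p_other @ u   # inequality (24)
```

Here `gap_at_p` is zero up to round-off, while `gap_other` is strictly positive; in the logit case the gap equals the Kullback–Leibler divergence between `p_other` and the softmax of `u`.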
Appendix B: Proofs
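Proof of Proposition 1. Consider the $y$th component, corresponding to $\partial G(w) / \partial w_y$:
\[
\frac{\partial G(w)}{\partial w_y} = \frac{\partial}{\partial w_y} \int \max_{y'} [w_{y'} + \varepsilon_{y'}] \, dQ \tag{26}
\]
\[
= \int \frac{\partial}{\partial w_y} \max_{y'} [w_{y'} + \varepsilon_{y'}] \, dQ \tag{27}
\]
\[
= \int 1\big( w_y + \varepsilon_y \ge w_{y'} + \varepsilon_{y'},\ \forall y' \ne y \big) \, dQ = p(y). \tag{28}
\]
(We have suppressed the dependence on $x$ for convenience.)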
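The identity in (26)–(28) can be checked numerically by simulation. The sketch below is only an illustration under assumptions (hypothetical function names, independent standard normal shocks, a fixed set of draws shared by all evaluations): it compares a central finite difference of the simulated surplus with the simulated choice frequencies.

```python
import numpy as np


def surplus(w, eps):
    """Simulated social surplus: average of max_y (w_y + eps_y) over draws."""
    return np.mean(np.max(w + eps, axis=1))


def choice_frequencies(w, eps):
    """Share of draws for which each alternative attains the maximum."""
    winners = np.argmax(w + eps, axis=1)
    return np.bincount(winners, minlength=len(w)) / len(eps)


def numerical_gradient(w, eps, h=1e-5):
    """Central finite-difference gradient of the simulated surplus."""
    grad = np.empty(len(w))
    for y in range(len(w)):
        step = np.zeros(len(w))
        step[y] = h
        grad[y] = (surplus(w + step, eps) - surplus(w - step, eps)) / (2 * h)
    return grad
```

Because the simulated surplus is piecewise linear in $w$, the finite difference coincides with the choice frequencies except for the few draws whose best and second-best alternatives are within $2h$ of each other.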
Proof of Theorem 1. This proof follows directly from Fenchel's equality (see Rockafellar
(1970, Theorem 23.5); see also Appendix A.1), which states that
\[
p \in \partial G(w)
\]
is equivalent to $G(w) + G^*(p) = \sum_y p_y w_y$, which is equivalent in turn to
\[
w \in \partial G^*(p).
\]
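Proof of Theorem 2. Because $\varepsilon$ has full support, the choice probabilities $p$ will lie
strictly in the interior of the simplex $\Delta^{|\mathcal{Y}|}$. Let $\tilde{w} \in \partial G^*(p)$ and let $w_y = \tilde{w}_y - G(\tilde{w})$. One has
$G(w) = 0$, and an immediate calculation shows that $\nabla G(w) = p$. Let us now show that $w$
is unique. Consider $w$ and $w'$ such that $G(w) = G(w') = 0$, and $p \in \partial G(w)$ and $p \in \partial G(w')$.
Assume $w \ne w'$ to get a contradiction; then there exist two distinct $y_0$ and $y_1$ such that
$w_{y_0} - w_{y_1} \ne w'_{y_0} - w'_{y_1}$; without loss of generality one may assume
\[
w_{y_0} - w_{y_1} > w'_{y_0} - w'_{y_1}.
\]
Let $S$ be the set of $\varepsilon$'s such that
\[
w_{y_0} - w_{y_1} > \varepsilon_{y_1} - \varepsilon_{y_0} > w'_{y_0} - w'_{y_1},
\]
\[
w_{y_0} + \varepsilon_{y_0} > \max_{y \ne y_0, y_1} \{ w_y + \varepsilon_y \},
\]
\[
w'_{y_1} + \varepsilon_{y_1} > \max_{y \ne y_0, y_1} \{ w'_y + \varepsilon_y \}.
\]
Because $\varepsilon$ has full support, $S$ has positive probability.
Let $\bar{w} = \frac{w + w'}{2}$. Because $p \in \partial G(w)$ and $p \in \partial G(w')$, $G$ is linear on the segment $[w, w']$;
thus $G(\bar{w}) = 0$, and thus
\[
0 = E\big[ \bar{w}_{Y(\bar{w}, \varepsilon)} + \varepsilon_{Y(\bar{w}, \varepsilon)} \big]
= \tfrac{1}{2} E\big[ w_{Y(\bar{w}, \varepsilon)} + \varepsilon_{Y(\bar{w}, \varepsilon)} \big] + \tfrac{1}{2} E\big[ w'_{Y(\bar{w}, \varepsilon)} + \varepsilon_{Y(\bar{w}, \varepsilon)} \big]
\]
\[
\le \tfrac{1}{2} E\big[ w_{Y(w, \varepsilon)} + \varepsilon_{Y(w, \varepsilon)} \big] + \tfrac{1}{2} E\big[ w'_{Y(w', \varepsilon)} + \varepsilon_{Y(w', \varepsilon)} \big]
= \tfrac{1}{2} \big( G(w) + G(w') \big) = 0.
\]
Hence equality holds term by term, and
\[
w_{Y(w, \varepsilon)} + \varepsilon_{Y(w, \varepsilon)} = w_{Y(\bar{w}, \varepsilon)} + \varepsilon_{Y(\bar{w}, \varepsilon)},
\]
\[
w'_{Y(w', \varepsilon)} + \varepsilon_{Y(w', \varepsilon)} = w'_{Y(\bar{w}, \varepsilon)} + \varepsilon_{Y(\bar{w}, \varepsilon)}.
\]
For $\varepsilon \in S$, $Y(w, \varepsilon) = Y(\bar{w}, \varepsilon) = y_0$ and $Y(w', \varepsilon) = Y(\bar{w}, \varepsilon) = y_1$, and we get the desired
contradiction.
Hence $w = w'$, and the uniqueness of $w$ follows.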
Proof of Theorem 3. From $G(w^0) = 0$ and $\partial G(w - G(w)) = \partial G(w)$, and by the uniqueness
result in Theorem 2, it follows that
\[
w^0 = w - G(w).
\]
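Proof of Proposition 2. The proof is given in Galichon and Salanié (2012), but we
include it here to be self-contained. This connection between the $G^*$ function and a
matching model follows from manipulation of the variational problem in the definition
of $G^*$:
\[
G^*(p) = \sup_{w \in \mathbb{R}^{\mathcal{Y}}} \Big\{ \sum_y p_y w_y - E_Q\Big[ \max_{y \in \mathcal{Y}} (w_y + \varepsilon_y) \Big] \Big\} \tag{29}
\]
\[
= \sup_{w \in \mathbb{R}^{\mathcal{Y}}} \Big\{ \sum_y p_y w_y + E_Q\Big[ \underbrace{\min_{y \in \mathcal{Y}} (-w_y - \varepsilon_y)}_{\equiv z(\varepsilon)} \Big] \Big\}.
\]
Defining $c(y, \varepsilon) \equiv -\varepsilon_y$, one can rewrite (29) as
\[
G^*(p) = \sup_{w_y + z(\varepsilon) \le c(y, \varepsilon)} \big\{ E_p[w_Y] + E_Q[z(\varepsilon)] \big\}. \tag{30}
\]
As is well known from the results of Monge–Kantorovich (Villani (2003, Theorem 1.3)),
this is the dual problem for a mass transport problem. The corresponding primal problem
is
\[
G^*(p) = \min_{\substack{Y \sim p \\ \varepsilon \sim \hat{Q}}} E[c(Y, \varepsilon)],
\]
which is equivalent to (16)–(18). Comparing (29) and (30), we see that the subdifferential
$\partial G^*(p)$ is identified with those elements $w$ such that $(w, z)$ for some $z$ solves the dual
problem (30).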
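Proof of Theorem 4. Part (i) follows from Proposition 2 and the fact that if $w_y + z(\varepsilon) \le c(y, \varepsilon)$,
then $E_p[w_Y] + E_Q[z(\varepsilon)] = G^*(p)$ if and only if $(w, z)$ is a solution to the dual
problem.
Part (ii) follows from the fact that $z(\varepsilon) = \sup_y \{ w_y - c(y, \varepsilon) \} = \sup_y \{ w_y + \varepsilon_y \}$; thus
$E_Q[z(\varepsilon)] = 0$ is equivalent to $E_Q[\sup_y \{ w_y + \varepsilon_y \}] = 0$, that is, $G(w) = 0$.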
Proof of Theorem 5. We shall show that the vector of choice-specific value functions
derived from the MTA estimation procedure, denoted $w^n$, converges to the true vector
$w^0$. In our procedure, there are two sources of estimation error. The first is the sampling
error in the vector of choice probabilities, denoted $p^n$. The second is the simulation error
involved in the discretization of the distribution of $\varepsilon$. We let $Q_n$ denote this discretized
distribution.
A distinctive aspect of our proof is that it utilizes the theory of mass transport,
namely convergence results for sequences of mass transport problems. For $y \in \mathcal{Y}$, let
$\iota_y$ denote the $|\mathcal{Y}|$-dimensional row vector with all zeros except a 1 in the $y$th column.
The discretized mass transport problem from which we obtain $w^n$ is
\[
\sup_{\gamma \in M(Q_n, p^n)} \int_{\mathbb{R}^d \times \mathbb{R}^d} (\iota \cdot \varepsilon) \, \gamma(d\varepsilon, d\iota), \tag{31}
\]
where $M(Q_n, p^n)$ denotes the set of joint (discrete) probability measures with marginal
distributions $Q_n$ and $p^n$. In (31), $\iota$ denotes a random vector that is equal to $\iota_y$ with probability
$p^n_y$ for $y \in \mathcal{Y}$. The dual problem used in the MTA procedure is
\[
\inf_{z, w} \int z(\varepsilon) \, dQ_n(\varepsilon) + \sum_y w_y p^n_y \tag{32}
\]
\[
\text{s.t.} \quad z(\varepsilon) \ge \iota_y \cdot \varepsilon - w_y \quad \forall y, \forall \varepsilon, \tag{33}
\]
\[
G_n(w^n) = 0, \tag{34}
\]
where $G_n(w) \equiv E_{Q_n}[\max_y (w_y + \varepsilon_y)]$. We let $(z^n, w^n)$ denote solutions to this discretized dual
problem (32). Recall (from the discussion in Section 2.3) that the extra constraint (34) in
the dual problem just selects among the many dual optimizing arguments $(w^n, z^n)$ corresponding
to the optimal primal solution $\gamma^n$, and so does not affect the primal problem.34
Next we derive a more manageable representation of this constraint (34). From Fenchel's equality (25), we have $\sum_y p^n_y w^n_y = G_n(w^n) + G_n^*(p^n) = G_n^*(p^n)$ (with $G_n^*$ defined as the convex conjugate function of $G_n$). Moreover, from Proposition 2, we know that $G_n^*(p^n)$ can be characterized as the optimized dual objective function in (32). Hence, we see that the constraint $G_n(w^n) = 0$ is equivalent to $\int z_n(\varepsilon)\, dQ_n(\varepsilon) = 0$. We introduce this latter constraint directly and rewrite the dual program:
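$$\inf_{z, w} \sum_y w_y p^n_y + \int z(\varepsilon)\, dQ_n(\varepsilon) \tag{35}$$
$$\text{s.t.} \quad z(\varepsilon) \ge \iota_y \cdot \varepsilon - w_y \quad \forall y, \forall \varepsilon, \tag{36}$$
$$\int z(\varepsilon)\, dQ_n(\varepsilon) = 0. \tag{37}$$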
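To make the role of the normalization (37) concrete, the following sketch solves the discretized dual (35)–(37) in closed form for the special case of two alternatives and equally weighted draws with $p_1 = k/S$. This binary shortcut (sorting the differences $\varepsilon_1 - \varepsilon_0$) is an illustrative assumption and not the general linear program; the helper names and the parameter `lam`, which picks a point in the interval of optimal thresholds, are hypothetical.

```python
import random

def primal_value(draws, k):
    """Value of the discretized primal: max E[eps_Y] with exactly k draws assigned to y = 1."""
    gains = sorted((e1 - e0 for e0, e1 in draws), reverse=True)
    return (sum(e0 for e0, _ in draws) + sum(gains[:k])) / len(draws)

def dual_solution(draws, k, lam):
    """A dual optimizer (w_0, w_1) normalized so that the mean of z over the draws is zero.

    Any threshold t = w_1 - w_0 between the (k+1)-th and k-th largest gain is optimal;
    lam in [0, 1] selects one of them. Requires 0 < k < S.
    """
    S = len(draws)
    gains = sorted((e1 - e0 for e0, e1 in draws), reverse=True)
    upper, lower = gains[k - 1], gains[k]
    t = lam * upper + (1.0 - lam) * lower
    # z(eps) = max(eps_0, eps_1 - t) - w_0, so mean z = 0 pins down w_0.
    w0 = sum(max(e0, e1 - t) for e0, e1 in draws) / S
    return (w0, w0 + t)

def z(eps, w):
    """Smallest z satisfying the constraint z(eps) >= eps_y - w_y for both alternatives."""
    return max(eps[0] - w[0], eps[1] - w[1])

def dual_objective(draws, w, k):
    S = len(draws)
    p1 = k / S
    return (1 - p1) * w[0] + p1 * w[1] + sum(z(e, w) for e in draws) / S

rng = random.Random(0)
draws = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10)]  # illustrative shocks
k = 4

for lam in (0.0, 0.5, 1.0):
    w = dual_solution(draws, k, lam)
    mean_z = sum(z(e, w) for e in draws) / len(draws)
    print(f"lam={lam}: w={w}, mean z={mean_z:.2e}, dual={dual_objective(draws, w, k):.6f}")
print(f"primal value: {primal_value(draws, k):.6f}")
```

Every choice of `lam` should attain the primal value while returning a different $w$, which illustrates why the discretized dual optimizer is a selection rather than a unique point.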
$^{34}$ We note that, as discussed before, the discreteness of $Q_n$ implies that $(z_n, w^n)$ will not be uniquely determined, as the core of the assignment game for a finite market is not a singleton. But this does not affect the proof, as our arguments below hold for any sequence of selections $\{z_n, w^n\}_n$.
We will demonstrate consistency by showing that $(z_n, w^n)$ converge almost surely (a.s.) to the dual optimizers in the “limit” dual problem, given by
$$\inf_{z, w} \sum_y w_y p^0_y \tag{38}$$
$$\text{s.t.} \quad z(\varepsilon) \ge \iota_y \cdot \varepsilon - w_y \quad \forall y, \forall \varepsilon, \tag{39}$$
$$\int z(\varepsilon)\, dQ = 0. \tag{40}$$
We denote the optimizers in this limit problem by $(w^0, z_0)$, where, by construction, $w^0$ represents the “true” values of the choice-specific value functions. The difference between the discretized and limit dual problems is that $Q_n$ in the former has been replaced by $Q$ (the continuous distribution of $\varepsilon$) and the estimated choice probabilities $p^n$ have been replaced by the limit $p^0$.
We proceed in two steps. First, we argue that the sequence of optimized dual programs (35) converges to the optimized limit dual program (38) a.s. Based on this, we then argue that the sequence of dual optimizers $(w^n, z_n)$ necessarily converges to its unique limit optimizers, $(w^0, z_0)$, a.s.
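First step. By the Kantorovich duality theorem, we know that the optimized values for the limit primal and dual programs coincide:
$$\sup_{\gamma \in \Pi(Q, p^0)} \int_{\mathbb{R}^d \times \mathbb{R}^d} (\iota \cdot \varepsilon)\, \gamma(d\varepsilon, d\iota) = \inf \sum_y w_y p^0_y + \int z(\varepsilon)\, dQ. \tag{41}$$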
Moreover, both the primal and the dual problems in the discretized case are finite-dimensional linear-programming problems, and by the usual LP duality, the optimal values of the primal and dual problems for the discretized case also coincide:
$$\int_{\mathbb{R}^d \times \mathbb{R}^d} (\iota \cdot \varepsilon)\, \gamma_n(d\varepsilon, d\iota) = \sum_y w^n_y p^n_y + \int z_n(\varepsilon)\, dQ_n.$$
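Given Assumption 1, and by Theorem 5.20 in Villani (2009, p. 77), we have that, up to a subsequence extraction, $\gamma_n$ (the optimizing argument of (31)) converges weakly. In addition, by Theorem 5.30 in Villani (2009), the left-hand side of (41) has a unique solution $\gamma$; hence, the whole sequence $\gamma_n$, and not merely a subsequence, must converge to $\gamma$. This implies a.s. convergence of the value of the primal problems,
$$\int_{\mathbb{R}^d \times \mathbb{R}^d} (\iota \cdot \varepsilon)\, \gamma_n(d\varepsilon, d\iota) \to \int_{\mathbb{R}^d \times \mathbb{R}^d} (\iota \cdot \varepsilon)\, \gamma(d\varepsilon, d\iota) \quad \text{a.s.},$$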
and, by duality, we must also have a.s. convergence of the discretized dual problem to the limit problem:
$$\sum_y w^n_y p^n_y + \int z_n(\varepsilon)\, dQ_n \to \sum_y w_y p^0_y + \int z(\varepsilon)\, dQ \quad \text{a.s.} \tag{42}$$
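The convergence of values can be checked by simulation in a case where the limit is known in closed form. The sketch below assumes two alternatives with i.i.d. standard Gumbel shocks (the logit case, an assumption made only for this illustration), for which the limit primal value is the entropy of $p$ plus Euler's constant. The sample sizes, seed, and helper names are arbitrary choices.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def gumbel(rng):
    """One standard Gumbel draw by inversion; u is clamped away from 0 and 1."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))

def primal_value(draws, k):
    """Discretized primal value: max E[eps_Y] with exactly k of the S draws assigned to y = 1."""
    gains = sorted((e1 - e0 for e0, e1 in draws), reverse=True)
    return (sum(e0 for e0, _ in draws) + sum(gains[:k])) / len(draws)

def logit_limit(p1):
    """Limit value under i.i.d. standard Gumbel shocks: entropy of p plus Euler's constant."""
    entropy = -(p1 * math.log(p1) + (1 - p1) * math.log(1 - p1))
    return entropy + EULER_GAMMA

rng = random.Random(1)
p1 = 0.25
for S in (100, 1000, 40000):
    draws = [(gumbel(rng), gumbel(rng)) for _ in range(S)]
    k = round(p1 * S)
    print(f"S={S:6d}  discretized value={primal_value(draws, k):.4f}  limit={logit_limit(p1):.4f}")
```

As $S$ grows, the discretized value should settle near the closed-form limit, in line with the a.s. convergence stated in (42); here $p^n$ is held fixed at $p^0$, so only the simulation error in $Q_n$ is at work.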
Second step. Next, we show that the discretized dual minimizers $(z_n, w^n)$ converge a.s. For convenience, in what follows we will suppress the qualifier “a.s.” from all the statements. Let
$$\bar{w}^n = \min_y w^n_y. \tag{43}$$
From examination of the dual problem (35), we see that $z_n$ is the piecewise affine function
$$z_n(\varepsilon) = \max_y \{\iota_y \cdot \varepsilon - w^n_y\}. \tag{44}$$
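Thus $z_n$ is $M$-Lipschitz with $M := \max_y |\iota_y| = 1$. Now observe that
$$z_n(\varepsilon) + \bar{w}^n = \max_y \{\iota_y \cdot \varepsilon - w^n_y + \bar{w}^n\} \le \max_y \{\iota_y \cdot \varepsilon\} =: \bar{z}(\varepsilon) \tag{45}$$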
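and, letting $y^*$ be the argument of the minimum in (43),
$$z_n(\varepsilon) + \bar{w}^n \ge \iota_{y^*} \cdot \varepsilon - w^n_{y^*} + \bar{w}^n = \iota_{y^*} \cdot \varepsilon \ge \min_y \{\iota_y \cdot \varepsilon\} =: \underline{z}(\varepsilon). \tag{46}$$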
Thus, by a combination of (45) and (46),
$$\underline{z}(\varepsilon) \le z_n(\varepsilon) + \bar{w}^n \le \bar{z}(\varepsilon). \tag{47}$$
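The envelope (47) and the 1-Lipschitz property of $z_n$ are easy to check numerically. The sketch below draws random shock vectors and random $w$ (three alternatives and Gaussian draws are arbitrary illustrative choices) and verifies both properties up to a small floating-point tolerance; the function names are hypothetical.

```python
import math
import random

def z_n(eps, w):
    """Piecewise affine function (44): max_y {eps_y - w_y}."""
    return max(e - wy for e, wy in zip(eps, w))

def within_envelope(eps, w, tol=1e-9):
    """Check (47): min_y eps_y <= z_n(eps) + min_y w_y <= max_y eps_y."""
    shifted = z_n(eps, w) + min(w)
    return min(eps) - tol <= shifted <= max(eps) + tol

def lipschitz_ok(eps_a, eps_b, w, tol=1e-9):
    """Check |z_n(a) - z_n(b)| <= |a - b| in the Euclidean norm."""
    return abs(z_n(eps_a, w) - z_n(eps_b, w)) <= math.dist(eps_a, eps_b) + tol

rng = random.Random(0)
all_ok = True
for _ in range(1000):
    w = [rng.gauss(0, 2) for _ in range(3)]
    a = [rng.gauss(0, 1) for _ in range(3)]
    b = [rng.gauss(0, 1) for _ in range(3)]
    all_ok = all_ok and within_envelope(a, w) and lipschitz_ok(a, b, w)
print("envelope (47) and Lipschitz bound hold on all trials:", all_ok)
```

Both checks hold for every $w$, not only for dual optimizers, which is what allows the bounds to be applied uniformly along the sequence $z_n$.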
By $\int z_n(\varepsilon)\, dQ_n(\varepsilon) = 0$, integrating (47) against $Q_n$ shows that $\bar{w}^n$ is uniformly bounded, so that $z_n$ is uniformly sublinear: for some constant $C$, $|z_n(\varepsilon)| \le C(1 + |\varepsilon|)$ for every $n$ and every $\varepsilon$. Hence the sequence $z_n$ is uniformly equicontinuous, and converges locally uniformly up to a subsequence extraction by Ascoli's theorem. Let this limit function be denoted $z_0$. By (42) and Theorem 2, we deduce that $z$, the optimizer in the limit dual problem, is unique,$^{35}$ so that it must coincide with the limit function $z_0$.
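By the definition of $(w^n, z_n)$ as optimizing arguments for (35), we have $\sum_y w^n_y p^n_y \le \sum_y \bar{w}^n p^n_y + \int [\bar{z}(\varepsilon)]\, dQ_n(\varepsilon)$, or
$$\sum_y (w^n_y - \bar{w}^n)\, p^n_y \le \int \bar{z}(\varepsilon)\, dQ_n(\varepsilon) = E_{Q_n} \bar{z}.$$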
The second moment restrictions on $Q_n$ (condition (ii) in the theorem) imply that $E_{Q_n} \bar{z}(\varepsilon)$ exists and converges to $E_Q \bar{z}$. Hence, the nonnegative vectors $(w^n_y - \bar{w}^n)$ are bounded; accordingly, the vectors $(w^n_y)$ are themselves bounded. This implies that $w^n$ converges up to a subsequence to some limit point $w^*$, using the Bolzano–Weierstrass theorem. This implies that $\sum_y w^n_y p^n_y \to \sum_y w^*_y p^0_y$ by bounded convergence. By Theorem 2, we know that the limit point $w^*$ must coincide with $w^0$, which is the unique optimizer in the dual limit problem (38). Thus, we have shown that $w^n$ converges to $w^0$ a.s.
$^{35}$ Although the support of $\varepsilon$ is not bounded, the locally uniform convergence of $z_n$ and the fact that the second moments of $Q_n$ are uniformly bounded are enough to conclude.
Appendix C: Additional table and figures
Table 3. This table reports the standard errors (in parentheses) for the Monte Carlo experiment in Table 1.

Design            RMSE (y=0)         RMSE (y=1)         R2 (y=0)           R2 (y=1)
N=100, T=100      0.5586 (3.7134)    0.2435 (0.1155)    0.3438 (0.7298)    0.7708 (0.2073)
N=100, T=500      0.1070 (0.0541)    0.1389 (0.0638)    0.7212 (0.2788)    0.9119 (0.0820)
N=100, T=1000     0.0810 (0.0376)    0.1090 (0.0425)    0.8553 (0.1285)    0.9501 (0.0352)
N=200, T=100      0.1244 (0.0594)    0.1642 (0.0628)    0.5773 (0.6875)    0.8736 (0.1112)
N=200, T=200      0.1177 (0.0736)    0.1500 (0.0816)    0.7044 (0.2813)    0.9040 (0.0842)
N=500, T=100      0.0871 (0.0375)    0.1162 (0.0430)    0.8109 (0.2468)    0.9348 (0.0650)
N=500, T=500      0.0665 (0.0261)    0.0829 (0.0290)    0.8899 (0.1601)    0.9678 (0.0374)
N=1000, T=100     0.0718 (0.0340)    0.0928 (0.0344)    0.8777 (0.1320)    0.9647 (0.0314)
N=1000, T=1000    0.0543 (0.0176)    0.0643 (0.0162)    0.9322 (0.0577)    0.9820 (0.0101)
Figure 5. For each value of $S$, we plot the values of the differences $\max_{w \in \partial G^*(p)} w_1 - \min_{w \in \partial G^*(p)} w_1$ across all values of $p \in \Delta^3$. In the box plot, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually.
Figure 6. For each value of $S$, we plot the values of the differences $\max_{w \in \partial G^*(p)} w_2 - \min_{w \in \partial G^*(p)} w_2$ across all values of $p \in \Delta^3$. In the box plot, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually.
Figure 7. This figure plots the number of observed engine replacements as a function of $x$, the mileage since last replacement.
Figure 8. This figure plots the empirical probability of engine replacement conditional on x,
the mileage since last replacement.
References
Aguirregabiria, V. and P. Mira (2002), “Swapping the nested fixed point algorithm: A class
of estimators for discrete Markov decision models.” Econometrica, 70, 1519–1543. [84]
Aguirregabiria, V. and P. Mira (2007), “Sequential estimation of dynamic discrete games.”
Econometrica, 75, 1–53. [84,93,104]
Aguirregabiria, V. and P. Mira (2010), “Dynamic discrete choice structural models: A sur-
vey.” Journal of Econometrics, 156, 38–67. [101]
Anderson, S., A. de Palma, and J.-F. Thisse (1988), “A representative consumer theory of
the logit model.” International Economic Review, 29 (3), 461–466. [89]
Arcidiacono, P. and R. Miller (2011), “Conditional choice probability estimation of
dynamic discrete choice models with unobserved heterogeneity.” Econometrica, 79,
1823–1867. [84,85,89,93]
Arcidiacono, P. and R. Miller (2013), “Identifying dynamic discrete choice models off
short panels.” Working paper. [93]
Bajari, P., V. Chernozhukov, H. Hong, and D. Nekipelov (2009), “Nonparametric and
semiparametric analysis of a dynamic game model.” Preprint. [84,92]
Becker, G. S. (1973), “A theory of marriage: Part I.” Journal of Political Economy, 81,
813–846. [93]
Berry, S. (1994), “Estimating discrete-choice models of product differentiation.”
RAND Journal of Economics, 25, 242–262. [89,91]
Berry, S., A. Gandhi, and P. Haile (2013), “Connected substitutes and invertibility of de-
mand.” Econometrica, 81, 2087–2111. [89,91]
Berry, S., J. Levinsohn, and A. Pakes (1995), “Automobile prices in market equilibrium.”
Econometrica, 63, 841–890. [93]
Bertsekas, D. (1987), Dynamic Programming: Deterministic and Stochastic Models.
Prentice-Hall, Englewood Cliffs, NJ. [86]
Chiappori, P. and I. Komunjer (2010), “On the nonparametric identification of multiple
choice models.” Working paper. [91]
Cominetti, R., E. Melo, and S. Sorin (2010), “A payoff-based learning procedure and its
application to traffic games.” Games and Economic Behavior, 70, 71–83. [89]
Galichon, A. and B. Salanié (2012), “Cupid’s invisible hand: Social surplus and identification
in matching models.” Preprint. [88,94,107]
Gretsky, N., J. Ostroy, and W. Zame (1999), “Perfect competition in the continuous as-
signment model.” Journal of Economic Theory, 85, 60–118. [97]
Haile, P., A. Hortacsu, and G. Kosenok (2008), “On the empirical content of quantal re-
sponse models.” American Economic Review, 98, 180–200. [89]
Hofbauer, J. and W. Sandholm (2002), “On the global convergence of stochastic fictitious
play.” Econometrica, 70, 2265–2294. [89]
Hotz, J. and R. Miller (1993), “Conditional choice probabilities and the estimation of dynamic
models.” Review of Economic Studies, 60, 497–529. [90,93,103]
Hotz, J., R. Miller, S. Sanders, and J. Smith (1994), “A simulation estimator for dynamic
models of discrete choice.” Review of Economic Studies, 61, 265–289. [84,93]
Hu, Y. and M. Shum (2012), “Nonparametric identification of dynamic models with un-
observed heterogeneity.” Journal of Econometrics, 171, 32–44. [85]
Kasahara, H. and K. Shimotsu (2009), “Nonparametric identification of finite mixture
models of dynamic discrete choice.” Econometrica, 77, 135–175. [85]
Keane, M. and K. Wolpin (1997), “The career decisions of young men.” Journal of Politi-
cal Economy, 105, 473–522. [85]
Kennan, J. (2006), “A note on discrete approximations of continuous distributions.” Report,
University of Wisconsin, Madison. [95]
Magnac, T. and D. Thesmar (2002), “Identifying dynamic discrete decision processes.”
Econometrica, 70, 801–816. [84,90,103]
McFadden, D. (1978), “Modeling the choice of residential location.” In Spatial Interac-
tion Theory and Residential Location (A. Karlquist et al., eds.), North-Holland, Amster-
dam. [84,87]
Norets, A. (2009), “Inference in dynamic discrete choice models with serially correlated
unobserved state variables.” Econometrica, 77, 1665–1682. [85]
Norets, A. and S. Takahashi (2013), “On the surjectivity of the mapping between utilities
and choice probabilities.” Quantitative Economics, 4 (1), 149–155. [105]
Norets, A. and X. Tang (2013), “Semiparametric inference in dynamic binary choice
models.” Preprint, Princeton University. [84,92,93]
Pakes, A. (1986), “Patents as options: Some estimates of the value of holding European
patent stocks.” Econometrica, 54, 1027–1057. [85]
Pesendorfer, M. and P. Schmidt-Dengler (2008), “Asymptotic least squares estimators for
dynamic games.” Review of Economic Studies, 75, 901–928. [84,91,93,104]
Rockafellar, R. T. (1970), Convex Analysis. Princeton University Press, Princeton. [105,
106]
Rust, J. (1994), “Structural estimation of Markov decision processes.” In Handbook of
Econometrics, Vol. 4 (R. Engle and D. McFadden, eds.), North-Holland, Amsterdam.
[85,87,90]
Rust, J. (1987), “Optimal replacement of GMC bus engines: An empirical model of Harold
Zurcher.” Econometrica, 55, 999–1033. [83,85,101,103]
Shapley, L. and M. Shubik (1971), “The assignment game I: The core.” International Journal
of Game Theory, 1 (1), 111–130. [83,84,93,94,96]
Shi, X., M. Shum, and W. Song (2014), “Estimating multinomial models using cyclic
monotonicity.” Caltech Social Science Working Paper 1397. [87]
Villani, C. (2003), Topics in Optimal Transportation. Graduate Studies in Mathematics,
Vol. 58. American Mathematical Society, Providence, RI. [84,94,107]
Villani, C. (2009), Optimal Transport, Old and New. Springer, Berlin. [84,109]
Co-editor Rosa L. Matzkin handled this manuscript.
Submitted February, 2014. Final version accepted June, 2015.