Bayesian analysis of the ordered probit model with endogenous selection
ABSTRACT This paper presents a Bayesian analysis of an ordered probit model with endogenous selection. The model can be applied when analyzing ordered outcomes that depend on endogenous covariates that are discrete choice indicators modeled by a multinomial probit model. The model is illustrated by analyzing the effects of different types of medical insurance plans on the level of hospital utilization, allowing for potential endogeneity of insurance status. The estimation is performed using Markov chain Monte Carlo (MCMC) methods to approximate the posterior distribution of the parameters in the model.
Bayesian Analysis of the Ordered Probit Model with Endogenous Selection
Murat K. Munkin
Department of Economics
531 Stokely Management Center
University of Tennessee
Knoxville, TN 37919, U.S.A.
Pravin K. Trivedi
Department of Economics
Wylie Hall 105
Indiana University
Bloomington, IN 47405, U.S.A.
February 2, 2007
Key words: Treatment Effects; MCMC; Discontinuity Regression.
∗We thank Jeff Racine for comments on an earlier version of the paper presented at the 2004 meetings
of the Southern Economic Association. In revising and rewriting the paper we have benefitted from the
comments of two anonymous referees, an Associate Editor, and Co-Editor John Geweke. However, we
remain responsible for the current version.
1. Introduction
This paper develops an estimation method for the ordered probit model with endogenous covariates, termed the ordered probit model with endogenous selection (OPES). Specifically, we analyze the effect of endogenous multinomial choice indicators on an ordinal dependent variable. Endogeneity is modeled using a correlated latent variable structure, with multinomial choice represented by the multinomial probit model. Markov chain Monte Carlo (MCMC) methods are then used to approximate the posterior distribution of the parameters and treatment effects. The application of the model is illustrated by analyzing the effects of different types of medical insurance plans on the level of hospital care utilization by the US adult population.
The ordered probit (OP) model with exogenous covariates is well established in the literature. Extending it to the case where some covariates are endogenous is empirically useful: the extended model can also be applied to count dependent variables whose frequencies are restricted to just a few support points. Thus, the OPES model may serve as an alternative to the existing count models with endogenous treatment.
Our model analyzes the effect of a set of endogenous choice indicators on a count variable whose distribution displays a very large proportion of zeros. Specifically, we consider cases in which even extensions of the Poisson model that allow for overdispersion do not provide an adequate fit. Examples of such extensions include the negative binomial and the Poisson-lognormal mixture models (Munkin and Trivedi, 2003). There are at least two empirical considerations which motivate this paper. First, using observational data we want to model an outcome (the biannual number of hospitalizations) which is a count variable, but more than 80 percent of observations are zeros, and the distribution has a short tail. Second, the outcome depends on some categorical dummy variables (e.g., types of health insurance plans) which are potentially endogenous, i.e., jointly determined with the outcome variable. This is simply a particular case of an often-encountered model in which some of the covariates are endogenous dummy variables. We develop a model that generalizes the OP model by including endogenous choice variables among the covariates.
Our approach is Bayesian. The full model consists of an ordered probit equation and a set of discrete choice equations. The interdependence between the OP and discrete choice equations is modeled using a correlated latent variable structure. The defined latent variables are made a part of the parameter set. Augmenting full conditional densities with latent variables, following Tanner and Wong (1987) and others, simplifies the MCMC algorithm. Our analysis is related to several previous contributions, including Albert and Chib (1993), Cowles (1996), Chib and Hamilton (2000), Geweke, Gowrisankaran, and Town (2003), Poirier and Tobias (2003), and Li and Tobias (2006). Albert and Chib (1993) present a Bayesian treatment of the OP model using the Gibbs sampler. However, the proposed Gibbs sampler mixes poorly in the case of many threshold parameters and large samples. Geweke et al. (2003) analyze the endogenous binary probit model (EBP) to study the quality of hospitals based on mortality rates in treating pneumonia. In their analysis the patients self-select hospitals, so choices are endogenous. Our model can be interpreted as an extension or synthesis of both the OP model and the EBP model.
The rest of the paper is organized as follows. Section 2 describes the OPES model.
Section 3 presents the MCMC estimation algorithm for the model. Section 4 presents
an illustrative application using the Medical Expenditure Panel Survey (MEPS) data
on hospitalizations and health insurance. Section 5 concludes.
2. An Ordered Probit Model with Endogenous Selection
Assume that we observe N independent observations for individuals who choose the treatment variable among J alternatives. Let $d_i = (d_{1i}, d_{2i}, \ldots, d_{J-1,i})$ be binary random variables for individual $i$ ($i = 1,\ldots,N$) representing this choice (category J is the baseline) such that $d_{ji} = 1$ if alternative $j$ is chosen and $d_{ji} = 0$ otherwise. Define the multinomial probit model using the multinomial latent variable structure which represents gains in utility received from the choices, relative to the utility received from choosing alternative J. Let the $(J-1) \times 1$ random vector $Z_i$ be defined as
$$Z_i = W_i\alpha + \varepsilon_i,$$
where $W_i$ is a $(J-1) \times q$ matrix of exogenous regressors and $\alpha$ is a $q \times 1$ parameter vector, such that
$$d_{ji} = \prod_{l=1}^{J} I_{[0,+\infty)}(Z_{ji} - Z_{li}), \quad j = 1,\ldots,J,$$
where $Z_{Ji} = 0$ and $I_{[0,+\infty)}$ is the indicator function for the set $[0,+\infty)$. The distribution of the error term $\varepsilon_i$ is $(J-1)$-variate normal $N(0,\Sigma)$. For identification it is customary to restrict the leading diagonal element of $\Sigma$ to unity. We will impose identifying restrictions after defining the entire model.
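As an illustration of the latent-utility structure just described, the following minimal Python sketch simulates choice indicators for J = 3 alternatives: the alternative with the largest latent utility is chosen, with the baseline utility fixed at zero. The function name `simulate_choices`, the block layout of the regressor matrices and all numerical values are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def simulate_choices(W, alpha, Sigma, rng):
    """Draw Z_i = W_i alpha + eps_i and derive the choice indicators d_i.

    W     : array (N, J-1, q), one regressor matrix W_i per individual
    alpha : array (q,)
    Sigma : array (J-1, J-1), covariance matrix of eps_i
    Returns Z of shape (N, J-1) and d of shape (N, J-1); a row of zeros
    in d means the baseline alternative J was chosen.
    """
    N, Jm1, _ = W.shape
    eps = rng.multivariate_normal(np.zeros(Jm1), Sigma, size=N)
    Z = W @ alpha + eps                          # latent utility gains
    Z_full = np.column_stack([Z, np.zeros(N)])   # append Z_Ji = 0 (baseline)
    choice = Z_full.argmax(axis=1)               # index of the largest utility
    d = np.zeros((N, Jm1), dtype=int)
    rows = np.nonzero(choice < Jm1)[0]           # individuals not choosing baseline
    d[rows, choice[rows]] = 1
    return Z, d

rng = np.random.default_rng(0)
N, q = 1000, 4
# Hypothetical layout: row 1 of W_i uses columns 0-1, row 2 uses columns 2-3.
x = rng.normal(size=(N, 2))
W = np.zeros((N, 2, q))
W[:, 0, :2] = x
W[:, 1, 2:] = x
alpha = np.array([0.5, -0.3, 0.2, 0.4])          # illustrative values only
Sigma = np.array([[1.0, 0.3], [0.3, 1.5]])       # illustrative values only
Z, d = simulate_choices(W, alpha, Sigma, rng)
```

Each row of `d` contains at most a single one, which mirrors the product-of-indicators definition of $d_{ji}$.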
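To model the ordered dependent variable we assume that there is another latent variable $Y_i^*$ that depends on the outcomes of $d_i$ such that
$$Y_i^* = X_i\beta + d_i\rho + u_i,$$
where $X_i$ is a $1 \times p$ vector of exogenous regressors, and $\beta$ ($p \times 1$) and $\rho$ ($(J-1) \times 1$) are parameter vectors. Define $Y_i$ as
$$Y_i = m \quad \text{if} \quad \tau_{m-1} \le Y_i^* < \tau_m,$$
where $\tau_0, \tau_1, \ldots, \tau_M$ are threshold parameters and $m = 1,\ldots,M$. In our application $Y_i$ is an ordered variable measuring the degree of medical service utilization. For identification, it is standard to set $\tau_0 = -\infty$ and $\tau_M = \infty$ and additionally restrict $\tau_1 = 0$. Denote $\tau = (\tau_2,\ldots,\tau_{M-1})$. The choice of insurance is potentially endogenous to utilization and this endogeneity is modeled through correlation between $u_i$ and $\varepsilon_i$. Assume that they are jointly normally distributed such that $\mathrm{cov}(\varepsilon_i, u_i) = \delta$, with the variance of $u_i$ restricted for identification since $Y_i^*$ is latent. Assume that $\mathrm{Var}(u_i) = 1 + \delta'\Sigma^{-1}\delta$. Then $u_i \mid \varepsilon_i \sim N(\delta'\Sigma^{-1}\varepsilon_i, 1)$.
We present our estimation strategy by first simplifying the exposition of the model to be consistent with the application and reparameterizing $\Sigma$. In the application the multinomial choice is among three alternatives so that $J = 3$. Let $Z_i = (Z_{1i}, Z_{2i})'$ such that $\tilde{Z}_i = Z_{2i}$ and use tilde to denote all parameters and variables related to $\tilde{Z}_i$. Denote $\mathrm{Var}(\tilde{Z}_i) = \tilde{\sigma}_{22}$ where in fact $\tilde{\sigma}_{22} = \sigma_{22}$, $\mathrm{cov}(\tilde{Z}_i, Z_{1i}) = \sigma_{21}$, and restrict the variance of $Z_{1i}$ for identification such that $\mathrm{Var}(\varepsilon_{1i}) = 1 + \sigma_{21}^2\tilde{\sigma}_{22}^{-1}$. Then $\varepsilon_{1i} \mid \tilde{\varepsilon}_i \sim N(\sigma_{21}\tilde{\sigma}_{22}^{-1}\tilde{\varepsilon}_i, 1)$. Denote $\pi' = \delta'\Sigma^{-1}$, $\pi' = (\pi_1, \tilde{\pi})$ (where $\pi_1$ is $1 \times 1$ and $\tilde{\pi}$ is $1 \times 1$) and $\tilde{\sigma}_{21} = \sigma_{21}\tilde{\sigma}_{22}^{-1}$. There is a one-to-one correspondence between parameter sets $(\delta, \Sigma)$ and $(\pi_1, \tilde{\pi}, \tilde{\sigma}_{21}, \tilde{\sigma}_{22})$. Then the model can be presented as
$$Y_i^* = X_i\beta + d_i\rho + (Z_{1i} - W_{1i}\alpha_1)\pi_1 + (\tilde{Z}_i - \tilde{W}_i\tilde{\alpha})\tilde{\pi} + \zeta_i,$$
$$Z_{1i} = W_{1i}\alpha_1 + (\tilde{Z}_i - \tilde{W}_i\tilde{\alpha})\tilde{\sigma}_{21} + \eta_i,$$
$$\tilde{Z}_i = \tilde{W}_i\tilde{\alpha} + \tilde{\varepsilon}_i,$$
where, by the conditional distributions above, $\zeta_i \sim N(0,1)$, $\eta_i \sim N(0,1)$ and $\tilde{\varepsilon}_i \sim N(0, \tilde{\sigma}_{22})$ are mutually independent.
Let $\Delta_i = (X_i, W_i, \tau, \beta, \rho, \pi_1, \tilde{\pi}, \alpha_1, \tilde{\alpha}, \tilde{\sigma}_{21}, \tilde{\sigma}_{22})$. For each observation $i$ the joint density of the observable data and latent variables is
$$f(Y_i, d_i, Y_i^*, Z_{1i}, \tilde{Z}_i \mid \Delta_i) = (2\pi)^{-3/2}\,\tilde{\sigma}_{22}^{-1/2} \exp\left[-\tfrac{1}{2}\left(Y_i^* - X_i\beta - d_i\rho - (Z_{1i} - W_{1i}\alpha_1)\pi_1 - (\tilde{Z}_i - \tilde{W}_i\tilde{\alpha})\tilde{\pi}\right)^2\right]$$
$$\times \exp\left[-\tfrac{1}{2}\left(Z_{1i} - W_{1i}\alpha_1 - (\tilde{Z}_i - \tilde{W}_i\tilde{\alpha})\tilde{\sigma}_{21}\right)^2\right] \exp\left[-\tfrac{1}{2}\,\tilde{\sigma}_{22}^{-1}\left(\tilde{Z}_i - \tilde{W}_i\tilde{\alpha}\right)^2\right]$$
on the region of the latent variables that is consistent with the observed $(Y_i, d_i)$, that is, where $\tau_{Y_i - 1} \le Y_i^* < \tau_{Y_i}$ and the chosen alternative has the largest latent utility, and zero elsewhere.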
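The one-to-one correspondence between $(\delta, \Sigma)$ and $(\pi_1, \tilde{\pi}, \tilde{\sigma}_{21}, \tilde{\sigma}_{22})$ for J = 3 can be checked numerically. The sketch below is only an illustration under the identification restriction $\mathrm{Var}(\varepsilon_{1i}) = 1 + \sigma_{21}^2/\sigma_{22}$; the function names `to_delta_sigma` and `to_pi_sigma_tilde` and the parameter values are hypothetical.

```python
import numpy as np

def to_delta_sigma(pi1, pi_t, s21_t, s22_t):
    """Map (pi_1, pi~, sigma~_21, sigma~_22) to (delta, Sigma)."""
    s21 = s21_t * s22_t                    # sigma_21 = sigma~_21 * sigma~_22
    s11 = 1.0 + s21**2 / s22_t             # restricted variance of eps_1i
    Sigma = np.array([[s11, s21], [s21, s22_t]])
    delta = Sigma @ np.array([pi1, pi_t])  # from pi' = delta' Sigma^{-1}
    return delta, Sigma

def to_pi_sigma_tilde(delta, Sigma):
    """Inverse map; assumes Sigma satisfies the identification restriction."""
    pi1, pi_t = np.linalg.solve(Sigma, delta)
    s22_t = Sigma[1, 1]
    s21_t = Sigma[0, 1] / Sigma[1, 1]
    return pi1, pi_t, s21_t, s22_t

# Illustrative values only.
delta, Sigma = to_delta_sigma(0.4, -0.2, 0.5, 2.0)
recovered = to_pi_sigma_tilde(delta, Sigma)
# Implied Var(u_i) = 1 + delta' Sigma^{-1} delta
var_u = 1.0 + delta @ np.linalg.solve(Sigma, delta)
```

Working with the tilde parameters means that each of the three equations has an independent error term, which is what makes the joint density factor into three univariate normal kernels.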
The joint distribution of observable and latent variables for all observations is the product of N such independent terms over $i = 1,\ldots,N$. The posterior density is proportional to the product of the prior density of the parameters and the joint distribution of observables and included latent variables.
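On the log scale the product over observations becomes a sum. The following sketch evaluates the log of the Gaussian part of the joint density for J = 3, assuming the latent variables passed in already satisfy the support restrictions implied by the observed outcomes and choices. The function name `log_joint` and the argument layout are assumptions for illustration; adding a log prior density (not specified in this section) would give the log posterior kernel.

```python
import numpy as np

def log_joint(Ystar, Z1, Zt, X, d, W1, Wt,
              beta, rho, pi1, pi_t, a1, a_t, s21_t, s22_t):
    """Sum over i of the log Gaussian density of (Y*_i, Z_1i, Z~_i).

    Ystar, Z1, Zt : arrays (N,), latent variables
    X (N, p), d (N, 2), W1 (N, q1), Wt (N, q2) : data
    The remaining arguments are parameters (vectors or scalars).
    Indicator (support) terms are assumed to hold and are not checked.
    """
    e1 = Z1 - W1 @ a1                       # eps_1i
    e_t = Zt - Wt @ a_t                     # eps~_i
    eta = e1 - e_t * s21_t                  # error of the Z_1i equation
    zeta = Ystar - X @ beta - d @ rho - e1 * pi1 - e_t * pi_t
    n = Ystar.shape[0]
    const = -1.5 * n * np.log(2.0 * np.pi) - 0.5 * n * np.log(s22_t)
    return const - 0.5 * np.sum(zeta**2 + eta**2 + e_t**2 / s22_t)
```

Because the observations are independent, calling `log_joint` on the full sample gives the same value as summing its results over single observations.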
In order to identify causal effects of the endogenous treatment variables on the outcome variable one needs exclusion restrictions, which arise if there are variables which affect the insurance choices but not utilization. We discuss such restrictions at greater length in the application section. However, there is a further identification issue in that