Accounting for Excess Zeros and Sample Selection in Poisson and Negative Binomial Regression Models
by
William H. Greene
Department of Economics
Stern School of Business
New York University
44 West 4th Street
New York, NY 10012-1126
Phone 212-998-0876
Internet: wgreene@stern.nyu.edu
March, 1994
Abstract: We present several modifications of the Poisson and negative binomial models for count data to
accommodate cases in which the number of zeros in the data exceeds what would typically be predicted by
either model. The excess zeros can masquerade as overdispersion. We present a new test procedure for
distinguishing between zero inflation and overdispersion. We also develop a model for sample selection
which is analogous to the Heckman-style specification for continuous choice models. An application is
presented to a data set on consumer loan behavior in which both of these phenomena are clearly present.
JEL Classification: C12, C13  Econometric and Statistical Methods and Estimation
Field Designation: Cross Section Econometrics
We are grateful for the able research assistance of Jin Yoo. Errors in the paper are our responsibility.
Modified Poisson Models
───────────────────────────────────────────
1 Introduction
The Poisson regression model forms the basis for a large proportion of the received empirical
literature involving discrete outcomes and count data. However, real data considerations and the
shortcomings of the basic model itself have led researchers to employ a wide variety of alternative
specifications. Modifications of the Poisson model have been suggested to accommodate:
· over- and underdispersion, which is a violation of the Poisson restriction that the variance of
the observed random variable equal its mean,
· unobserved individual heterogeneity, for example, in panel data (Hausman, et al. (1984)),
which mandates the introduction of a disturbance term into the Poisson specification much like
that which appears in conventional regression models, and which induces overdispersion,
· `non-Poissonness' (Johnson and Kotz (1969)), which is reflected in an overabundance or
underabundance of certain specific values, usually zero.1
1 The third of these could show up as if it were the first or second. Terza and Wilson (1990) introduce a
variant of our zero altered Poisson model specifically to allow for overdispersion by (at least in principle)
disconnecting the Poisson mean and variance.
Another issue which arises occasionally (e.g., Heilbron (1989), Smith (1990)), but remains to be examined in
detail, is an extension of the Poisson regression model to
· sample selection, which will likely produce distortions in the inference drawn from count data
by conventional methods similar to those which arise in the analysis of continuous choice
models.
The literature on the Poisson regression model often treats two topics separately: specification and
estimation of the model and its variants, and specification testing in the context of the basic model. The
focus of this paper is primarily the first of these. We will present two modifications of the Poisson model, a
model for handling `excess zeros,' and a specification for modeling sample selection in the spirit of Heckman
(1979). Since excess zeros will masquerade as overdispersion, we are also interested in the first two points.
Our first model extends an existing literature. In addition, we will present a method of testing this extension
of the model against the base (Poisson or negative binomial) case. The test procedure can also be applied to
the problem of testing for overdispersion in the Poisson model, so, in passing, we will add another method to
the set of tools that have already been proposed for this problem. Our second model is a sample selection
model which does not appear to have been treated elsewhere.
This paper proceeds as follows: Section 2 will review some of the existing literature on the Poisson
model and tie together some widely dispersed but related contributions. We begin with a cursory
presentation of the basic Poisson specification. Section 3.1 will describe the specification and estimation of
our `zero inflated Poisson' (ZIP, Lambert (1992)) regression. Restrictive variants of the ZIP model have
appeared elsewhere in the literature. In addition to presenting a more general model, we will propose a new
method of testing the specification against the basic Poisson model. Section 3.2 will describe a framework
for analyzing sample selection in the context of the Poisson model. Although this model has been hinted at
in various places, it appears not to have been formalized previously. This section will detail a sample
selection model and provide methods of parameter estimation and computation of appropriate asymptotic
covariance matrices for the estimates. The ZIP and sample selection models are combined in Section 3.3. In
Section 4, we present an application of the techniques to an aspect of consumer behavior, default on credit
card loans. We will use the Poisson model to examine the number of major derogatory reports to a credit
reporting agency for a group of credit card applicants. The overwhelming majority of applicants have `clean'
(at least in this respect) credit histories, so there is a prevalence of zeros in the data. Hence the ZIP model is
appropriate. We will apply the model to a general population and to a heavily screened subpopulation (those
whose applications for credit were accepted), in which the screen clearly produces the sort of
nonrandomness found in settings in which Heckman's selection model is usually applied. The statistical
results suggest unambiguously that models more general than the basic Poisson regression are called for.
Conclusions are drawn in Section 5.
2 Poisson and Negative Binomial Regression Models
2 See Johnson and Kotz (1969) for an extensive survey on the unconditional model. A useful overview is given by
Cameron and Trivedi (1986).
The Poisson model arises in many contexts as the probability distribution for the discrete,
nonnegative count of the number of occurrences of an event.2 Applications of the basic model include:
· the number of failures of electronic components per unit of time,
· the number of individuals arriving at a serving station (bank teller, gas station, cash register,
etc.) within a fixed interval,
· the number of homicides per year (Grogger (1990a)),
· the number of patents applied for and received (Hausman, et al. (1984)),
and so on.3 The unconditional probability distribution for a Poisson random variable is given by
$$\mathrm{Prob}[Y_i = y_i] = p(y_i) = \frac{e^{-\lambda t_i} (\lambda t_i)^{y_i}}{y_i!}, \quad y_i = 0, 1, \ldots \tag{2.1}$$
where λ is the mean occurrence rate per unit of time and ti is the length of the interval over which yi is
observed.4 For our purposes, no generality will be lost by assuming that the time interval is one unit for each
observation.5 It is easily shown that the unconditional mean of yi given a unit length interval is λ. The
model lends itself conveniently to a regression framework by defining the conditional mean function,
$$E[y_i \mid x_i, t_i = 1] = \lambda_i = e^{\beta' x_i},$$
where, here and in what follows, xi will denote the full set of regressor variables for yi.6 The exponentiation
ensures a positive mean.7 Estimation and inference for the Poisson model are considered below.
3 Other applications in the literature include Gray and Jones (1991) (citation counts), King (1986) (count data in
political science), Kostiuk and Follman (1989) (success rates of military recruiters), Papke (1986) (industry
"births" in different states), and Portney and Mullahy (1986) (air quality and the incidence of respiratory
illness). See, as well, Agresti (1984), Arvan (1989), Cooil (1991), Coughlin, et al. (1988), Flowerdew and
Aitken (1982), Frome (1983), Frome et al. (1973), Gart (1964), and Okoruwa, et al. (1988) for a variety of
specifications and uses of the Poisson model.
4 See, e.g., Hoffman and Milligan (1990).
5 In the context of the regression model to be analyzed here, the case of differing time intervals is handled by
including the log of the time variable in the linear index function of the model with a coefficient of 1.0. We
will return briefly to this point below. For an application, see McCullagh and Nelder (1983, pp. 136-140).
6 See Cameron and Trivedi (1986), El Sayyad (1973), Engel (1984), Holgate (1964), Jorgenson (1961), Lawless
(1987a, 1987b), Maddala (1983), McCullagh and Nelder (1983), and Simpson (1987).
7 Note that the normalization needed to accommodate an observation specific interval length is handled by
tiλi = exp(ln ti + β′xi). Thus, ln ti is simply included in the regression with a coefficient of one. Henceforth,
we will omit further reference to the normalization and, for simplicity, just assume that ti equals one.
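As an illustration only, the probability function and the exponential conditional mean described above might be sketched as follows. The coefficient and regressor values are hypothetical, and the function names are ours rather than part of any library; the interval length enters through its logarithm with a coefficient of one.

```python
import math

def poisson_pmf(y, mean):
    """Poisson probability of the count y for a given positive mean."""
    return math.exp(-mean + y * math.log(mean) - math.lgamma(y + 1))

def conditional_mean(beta, x, t=1.0):
    """t * exp(beta'x): log(t) enters the index with a coefficient of one."""
    index = sum(b * xj for b, xj in zip(beta, x))
    return math.exp(math.log(t) + index)

beta = [0.5, -0.25]    # hypothetical coefficients
x = [1.0, 2.0]         # constant term plus one hypothetical regressor

lam = conditional_mean(beta, x)            # unit interval
lam_t = conditional_mean(beta, x, t=3.0)   # interval of length 3

# The probabilities sum to one and the mean equals t * lambda.
total = sum(poisson_pmf(y, lam_t) for y in range(200))
mean = sum(y * poisson_pmf(y, lam_t) for y in range(200))
```

Working with the log of the probability and `math.lgamma` avoids overflow in the factorial for large counts.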
The Poisson distribution has the convenient, albeit restrictive property that
$$E[y_i] = \mathrm{Var}[y_i] = \lambda_i,$$
(where, for the moment, we have subsumed the conditioning variables, xi, in the subscript). The equality of
the mean and variance is the subject of the literature on over- and underdispersion in the Poisson model.
Although a number of modifications have been proposed, the most frequently cited alternative is the
negative binomial regression model,
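$$p(y_i) = \frac{\Gamma(\theta + y_i)}{\Gamma(\theta)\, y_i!}\, u_i^{\theta} (1 - u_i)^{y_i}, \quad \theta > 0,\ y_i = 0, 1, \ldots \tag{2.2}$$
where
$$u_i = \frac{\theta}{\theta + \lambda_i}.$$
This has E[yi] = λi
and Var[yi] = λi[1 + (1/θ)λi] = λi(1 + αλi), with α = 1/θ.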
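The mean and variance just stated can be checked by direct summation over the support. The sketch below is illustrative only: the values of λ and θ are hypothetical, and the function name is ours.

```python
import math

def negbin_pmf(y, lam, theta):
    """Negative binomial probability with u = theta / (theta + lam)."""
    u = theta / (theta + lam)
    log_p = (math.lgamma(theta + y) - math.lgamma(theta) - math.lgamma(y + 1)
             + theta * math.log(u) + y * math.log(1.0 - u))
    return math.exp(log_p)

lam, theta = 2.0, 1.5      # hypothetical mean and dispersion parameter
alpha = 1.0 / theta

support = range(2000)      # the tail beyond this point is negligible here
probs = [negbin_pmf(y, lam, theta) for y in support]
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
```

With these values the variance is λ(1 + αλ) = 2(1 + 4/3), so the variance-to-mean ratio exceeds one, as the overdispersion interpretation requires.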
The negative binomial model has been formulated with overdispersion,
$$\frac{\mathrm{Var}[y_i]}{E[y_i]} = 1 + \alpha E[y_i] > 1,$$
as an end in itself,8 or as a consequence of incorporating unobserved individual heterogeneity (e.g.,
Hausman, et al. (1984)). Let
E[yi│εi] = λi εi
where εi is a disturbance distributed as gamma with mean 1 and variance α;
$$g(\varepsilon_i) = \frac{\theta^{\theta}}{\Gamma(\theta)}\, e^{-\theta \varepsilon_i}\, \varepsilon_i^{\theta - 1}, \quad \varepsilon_i > 0,\ \theta = \frac{1}{\alpha}.$$
This produces
$$f(y_i \mid \varepsilon_i) = \frac{e^{-\lambda_i \varepsilon_i} (\lambda_i \varepsilon_i)^{y_i}}{y_i!}, \quad \lambda_i = \exp(\beta' x_i).$$
The marginal distribution is
$$f(y_i) = \int_0^{\infty} f(y_i \mid \varepsilon_i)\, g(\varepsilon_i)\, d\varepsilon_i,$$
which is the negative binomial model given earlier.9
8 See Cameron and Trivedi (1986) (their model `NegBin II') and King (1989b).
9 Hausman et al. (1984) present an extensive application of the model from this perspective. Recent modifications
include Wedel, et al. (1993), Wasserman (1983), and Winkelmann and Zimmermann (1991a, 1991b, 1991c).
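The claim that mixing a Poisson variate over a gamma disturbance with mean 1 and variance 1/θ reproduces the negative binomial probabilities can be checked numerically. The following is a rough sketch, not a production integrator: it assumes a conditional mean of λε, uses a plain midpoint rule on a truncated range, and takes hypothetical values λ = 1.5 and θ = 2.

```python
import math

def poisson_given_eps(y, lam, eps):
    """Poisson probability of y with conditional mean lam * eps."""
    m = lam * eps
    return math.exp(-m + y * math.log(m) - math.lgamma(y + 1))

def gamma_density(eps, theta):
    """Gamma density with mean 1 and variance 1/theta."""
    return math.exp(theta * math.log(theta) - math.lgamma(theta)
                    - theta * eps + (theta - 1.0) * math.log(eps))

def mixed_pmf(y, lam, theta, upper=50.0, n=100_000):
    """Midpoint-rule approximation to the integral over eps in (0, upper)."""
    h = upper / n
    total = 0.0
    for k in range(n):
        eps = (k + 0.5) * h
        total += poisson_given_eps(y, lam, eps) * gamma_density(eps, theta)
    return total * h

def negbin_pmf(y, lam, theta):
    """Closed-form negative binomial probability with u = theta / (theta + lam)."""
    u = theta / (theta + lam)
    return math.exp(math.lgamma(theta + y) - math.lgamma(theta) - math.lgamma(y + 1)
                    + theta * math.log(u) + y * math.log(1.0 - u))

lam, theta = 1.5, 2.0      # hypothetical values
gaps = [abs(mixed_pmf(y, lam, theta) - negbin_pmf(y, lam, theta)) for y in range(3)]
```

The midpoint rule is used because it never evaluates the integrand at ε = 0, where the logarithm is undefined.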
Maximum likelihood estimation of parameters of the Poisson regression model is straightforward.
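The log-likelihood function and its first and second derivatives are
$$\log L = \sum_{i=1}^{N} \log f(y_i) = \sum_{i=1}^{N} \left[ -\lambda_i + y_i \log \lambda_i - \log y_i! \right], \tag{2.3}$$
$$\frac{\partial \log L}{\partial \beta} = \sum_{i=1}^{N} (y_i - \lambda_i)\, x_i = \sum_{i=1}^{N} e_i x_i, \tag{2.4}$$
$$\frac{\partial^2 \log L}{\partial \beta\, \partial \beta'} = -\sum_{i=1}^{N} \lambda_i x_i x_i'. \tag{2.5}$$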
The Hessian is always negative definite, which makes Newton's method a convenient way to compute the
MLE of β. Alternatively, the model is a nonlinear regression, so β can be estimated consistently by
nonlinear ordinary least squares, or efficiently by nonlinear generalized (iteratively reweighted) least
squares.10
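A minimal sketch of Newton's method for the Poisson regression, assuming a model with a constant and a single regressor so that the two-by-two Hessian can be inverted by hand. The data are invented for illustration, and the iteration takes full Newton steps with no line search or convergence test.

```python
import math

def poisson_score(b0, b1, xs, ys):
    """Gradient of the log-likelihood: sum of (y - lambda) * (1, x)."""
    g0 = g1 = 0.0
    for x, y in zip(xs, ys):
        e = y - math.exp(b0 + b1 * x)
        g0 += e
        g1 += e * x
    return g0, g1

def poisson_newton(xs, ys, iterations=25):
    """Newton iterations started from the constant-only fit."""
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iterations):
        g0, g1 = poisson_score(b0, b1, xs, ys)
        h00 = h01 = h11 = 0.0            # elements of sum(lambda * x x'), the negative Hessian
        for x in xs:
            lam = math.exp(b0 + b1 * x)
            h00 += lam
            h01 += lam * x
            h11 += lam * x * x
        det = h00 * h11 - h01 * h01
        # beta_new = beta + (negative Hessian)^(-1) * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # hypothetical regressor values
ys = [0, 1, 1, 3, 2, 5, 7]                 # hypothetical counts
b0_hat, b1_hat = poisson_newton(xs, ys)
g0, g1 = poisson_score(b0_hat, b1_hat, xs, ys)
```

Because the Hessian is negative definite everywhere, the iterations settle quickly from the constant-only starting point in this example; a line search would be a sensible safeguard for less well-conditioned data.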
The log-likelihood and gradient for the negative binomial model are11
$$\log L = \sum_{i=1}^{N} \left[ \mathbf{1}(y_i > 0) \sum_{j=0}^{y_i - 1} \log(\theta + j) - \log y_i! + \theta \log u_i + y_i \log(1 - u_i) \right], \tag{2.6}$$
$$\frac{\partial \log L}{\partial \beta} = \sum_{i=1}^{N} u_i e_i x_i, \tag{2.7}$$
$$\frac{\partial \log L}{\partial \theta} = \sum_{i=1}^{N} \left[ \mathbf{1}(y_i > 0) \sum_{j=0}^{y_i - 1} \frac{1}{\theta + j} + \log u_i + (1 - u_i)\left(1 - \frac{y_i}{\lambda_i}\right) \right], \tag{2.8}$$
where, as before, ei = yi − λi and ui = θ/(θ + λi).
10 Note that the moment condition in (2.4) is the same as that for the classical regression model and prescribes
nonlinear GLS as the efficient GMM estimator.
11 We have manipulated the function to eliminate the gamma integrals. This simplifies programming and marginally
speeds up computation of the estimates. See Greene (1991). We will use the indicator function, 1(condition) = 1
if the condition is true and 0 if not, at various points below.
12 Gourieroux, et al. (1984) and White (1982).
The function is a bit less well behaved than the Poisson log-likelihood owing to the need to keep θ positive,
but is easily handled by a gradient method incorporating a line search. Among the interesting aspects of this
model is the robustness of the Poisson MLE of β in the presence of heterogeneity (overdispersion).12
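As a sanity check on the negative binomial log-likelihood and its gradient, the analytic derivatives can be compared with finite differences. This sketch assumes the gamma-function terms are written as finite sums, u = θ/(θ + λ), and a model with a constant and one regressor; the data and parameter values are hypothetical.

```python
import math

def negbin_loglik(b0, b1, theta, xs, ys):
    """Log-likelihood with log Gamma(theta + y) - log Gamma(theta) as a finite sum."""
    total = 0.0
    for x, y in zip(xs, ys):
        lam = math.exp(b0 + b1 * x)
        u = theta / (theta + lam)
        total += sum(math.log(theta + j) for j in range(y))   # empty sum when y = 0
        total += -math.lgamma(y + 1) + theta * math.log(u) + y * math.log(1.0 - u)
    return total

def negbin_gradient(b0, b1, theta, xs, ys):
    """Analytic derivatives with respect to (b0, b1, theta)."""
    g0 = g1 = gt = 0.0
    for x, y in zip(xs, ys):
        lam = math.exp(b0 + b1 * x)
        u = theta / (theta + lam)
        g0 += u * (y - lam)
        g1 += u * (y - lam) * x
        gt += (sum(1.0 / (theta + j) for j in range(y))
               + math.log(u) + (1.0 - u) * (1.0 - y / lam))
    return [g0, g1, gt]

def numeric_gradient(params, xs, ys, h=1e-6):
    """Central finite differences, one parameter at a time."""
    grad = []
    for k in range(len(params)):
        up, dn = list(params), list(params)
        up[k] += h
        dn[k] -= h
        grad.append((negbin_loglik(*up, xs, ys) - negbin_loglik(*dn, xs, ys)) / (2 * h))
    return grad

xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # hypothetical regressor values
ys = [0, 1, 1, 3, 2, 5, 7]                 # hypothetical counts
params = [0.1, 0.6, 1.5]                   # hypothetical (b0, b1, theta)
analytic = negbin_gradient(*params, xs, ys)
numeric = numeric_gradient(params, xs, ys)
gap = max(abs(a - n) for a, n in zip(analytic, numeric))
```

In an actual estimation routine, θ could be kept positive by optimizing over log θ instead, which is a design choice of this note and not something taken from the paper.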
The literature on testing the Poisson restriction of equal mean and variance (over- or
underdispersion) is vast.13 The problem of testing for homogeneity does not fit into the classical
Neyman-Pearson methodology because the restricted case lies on the boundary of the parameter space: α = 0, or
θ → +∞.14 The mechanics of the existing procedures for testing for heterogeneity or overdispersion are tangential
to the subject of this paper. The interested reader is referred to the literature cited earlier for details.
However, the specification test procedure for our zero altered model which is proposed below is readily
adapted to a test for heterogeneity (see Section 3.1). An evaluation of this testing procedure versus the
alternatives is left for further work.
Several modifications of the Poisson regression model beyond the negative binomial specification
are of direct relevance to this study. They deal primarily with the observed frequencies in the data or the
functional form of the conditional mean function.
Problems of censoring and truncation are common, and arise from the same sources as in the more
familiar regression settings. In survey data, for example, respondents are sometimes given a limit category,
`C or more' for some large value (censoring). In other settings, such as surveys of users of recreational
facilities (e.g., Smith (1990)), respondents who report zero are sometimes discarded from the sample. These
complications can be built directly into the basic Poisson or negative binomial model in the same way that they
are handled in the classical normal regression model in the form of the tobit and truncated regression models.15
13 See, e.g., Breslow (1984, 1990), Cameron and Trivedi (1990), Chesher (1984), Collings and Margolin (1985),
Cox (1983), Dean and Lawless (1989), Ganio and Schafer (1991), Gurmu (1991), King (1989), Lee (1986), Mullahy
(1986, 1990), Potthoff and Whittinghill (1966), and Wasserman (1983).
14 Note that this is equivalent to the problem of testing for a zero variance, as arises in the random effects
classical regression model. See Breusch and Pagan (1980).
15 See Greene (1991) for details. Applications are given by Terza (1985), Grogger and Carson (1988, 1991), Cohen
(1954, 1960), Creel and Loomis (1990, 1991), van Praag (1993), and Shaw (1988).
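For the first example given, the log-likelihood function for a model incorporating censoring of the form
suggested is
$$\log L = \sum_{i=1}^{N} \left[ \mathbf{1}(y_i < C) \log p(y_i) + \mathbf{1}(y_i \ge C) \log\left(1 - \sum_{j=0}^{C-1} p(j)\right) \right].$$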
The counterpart for a model with truncation at zero, as in the second example, would be
$$\log L = \sum_{i=1}^{N} \left[ \log p(y_i) - \log(1 - p(0)) \right]. \tag{2.9}$$
The gradients and Hessians, albeit tedious, are straightforward and appear in general form in Greene
(1991).16
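The two log-likelihood contributions can be sketched as follows for the Poisson case. This is an illustration under stated assumptions: censoring is taken to mean that any count of C or more is recorded only as `C or more', truncation means zeros are excluded, and the values of λ and C are hypothetical.

```python
import math

def poisson_pmf(y, lam):
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def censored_loglik_term(y, lam, C):
    """Contribution when counts of C or more are recorded only as 'C or more'."""
    if y < C:
        return math.log(poisson_pmf(y, lam))
    # Probability of the limit category: one minus the mass below C.
    return math.log(1.0 - sum(poisson_pmf(j, lam) for j in range(C)))

def truncated_loglik_term(y, lam):
    """Contribution when zeros are excluded from the sample (y >= 1)."""
    return math.log(poisson_pmf(y, lam)) - math.log(1.0 - poisson_pmf(0, lam))

lam, C = 1.8, 4            # hypothetical mean and censoring point

# Both modified distributions are proper: their probabilities sum to one.
censored_total = sum(math.exp(censored_loglik_term(y, lam, C)) for y in range(C + 1))
truncated_total = sum(math.exp(truncated_loglik_term(y, lam)) for y in range(1, 60))
```

Replacing `poisson_pmf` with a negative binomial probability function would give the corresponding terms for that model.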
We will be interested in two rather sparsely analyzed variations on the Poisson/negative binomial
models in this study. First, there are situations in which the number of occurrences of a specific value (usually
zero) exceeds what would be predicted by the Poisson model. The problem was analyzed by Cohen (1954)
and is described in some detail in Johnson and Kotz (1969). Various modifications have been suggested
which involve a rudimentary parameterization of the `non-Poissonness' of the distribution. For example,
Heilbron (1989), who labels this the `zero altered Poisson,' or ZAP(λ,ρ) model, and Mullahy (1986), among
others, who calls this a `hurdle' model,17 suggest
$$\mathrm{Prob}[y_i = 0] = \rho,$$
$$\mathrm{Prob}[y_i = k] = \left[ \frac{1 - \rho}{1 - e^{-\lambda_i}} \right] \frac{e^{-\lambda_i} \lambda_i^{k}}{k!}, \quad k = 1, 2, \ldots$$
16 Full details for the censored Poisson model appear in Terza (1985). Greene (1991) gives results for models with
truncation and for the negative binomial model.
17 See Cragg (1971) and Lin and Schmidt (1984).
18 Terza and Wilson (1990) adopt this formulation solely to induce overdispersion.
Heilbron's interpretation of the model is as a modification of the Poisson model to add mass to the zero
point,18 while Mullahy's hurdle interpretation (which is closer to ours) treats the modification as a binary data
generating process. Thus, "[t]he idea underlying the hurdle formulations is that a binomial probability model
governs the binary outcome of whether a count variate has a zero or a positive realization."19,20 Note that the
positive part of the distribution is the truncated Poisson model which appears in (2.9).
In the ZAP model, observations which surpass the `hurdle' are positive. Our interest here is in a
setting in which the zeros are observed as well, with greater frequency than would otherwise be predicted by
the Poisson model. The surfeit of zeros results from a mixture of two processes, both of which produce
zeros. One generates the regime choice as a binary outcome, while the other generates the count variable,
which may equal zero as well. In one regime, the zero value is automatic, while in the other, it is but one
possible outcome. Consider, for example, answers to the survey question "[h]ow many children do you
have?" Respondents would be of two types, some who have no intention of ever having children and some
who may have some children or may not yet have any children at the time the question is asked, but might
later. The model we propose is a straightforward modification of what Mullahy and Heilbron (following
Johnson and Kotz) label the `with zeros' (WZ) model,
$$\mathrm{Prob}[y_i = 0] = \psi + (1 - \psi) f(0),$$
$$\mathrm{Prob}[y_i = j] = (1 - \psi) f(j), \quad j = 1, 2, \ldots,$$
where ψ is a parameter between 0 and 1.21 This formulation has the virtue of simplicity, though the
inequality constraint on ψ does complicate the computation of the maximum likelihood estimates.
For our purposes, the primary shortcoming of Heilbron/Mullahy's specification is that there are no
covariates in ψ, so that the construction of a behavioral splitting model (regime generation process) remains.
19 Mullahy (1986, p. 345).
20 The hurdle model has a close resemblance to Schmidt and Witte's (1989) `splitting' model. They model a binary
censoring indicator in the context of various survival models with a probit or logit specification. The
counterpart to Mullahy's zero and truncated Poisson model is their survival or hazard function.
21 As Heilbron notes, some negative values of ψ are admissible, though the interpretation of ψ as a mixing
parameter will be lost in this case.
22 Heilbron notes (p. 29), "For non-Poisson counts, there is no transformation of Y that would make sensible the
application of a normal theory sample selection model. Further, it seems difficult to formulate an appealing
sample selection model for count data." Smith (1990) makes note of the utility of a selection model for counts of
uses of recreational sites, but states that it is "beyond the scope of his study." Bockstael, et al. (1990) note
the issue in passing (p. 41) but treat the counts as realizations of a continuous choice variable, and make no
further mention of the problem. Shaw's (1988) model is somewhat related to this problem, but his analysis
centers on direct truncation, not sample selection as we are considering it here. This appears to be the extent
of the received commentary on the subject.
This interpretation of the model is suggested, more or less, in the work of Lambert (1992), upon which much
of our current study is built. Once again, her primary motivation is `non-Poissonness,' although as will be
clear below, one of her specifications provides our main building block.
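To make the distinction between the two constant-parameter specifications discussed above concrete, the sketch below writes out a zero altered (hurdle) form and a `with zeros' form for a Poisson base distribution. It assumes the hurdle form assigns a free probability ρ to zero and rescales a truncated Poisson over the positive counts; the parameter values are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def hurdle_pmf(k, lam, rho):
    """Zero altered (hurdle) form: mass rho at zero, truncated Poisson on k >= 1."""
    if k == 0:
        return rho
    return (1.0 - rho) / (1.0 - math.exp(-lam)) * poisson_pmf(k, lam)

def with_zeros_pmf(k, lam, psi):
    """'With zeros' form: mixture of a point mass at zero and a full Poisson."""
    zero_part = psi if k == 0 else 0.0
    return zero_part + (1.0 - psi) * poisson_pmf(k, lam)

lam, rho, psi = 1.2, 0.6, 0.4     # hypothetical parameter values

hurdle_total = sum(hurdle_pmf(k, lam, rho) for k in range(100))
wz_total = sum(with_zeros_pmf(k, lam, psi) for k in range(100))
wz_zero = with_zeros_pmf(0, lam, psi)
```

In the hurdle form every zero comes from the binary process, whereas in the `with zeros' form a zero can come from either regime, which is why its zero probability is ψ plus a share of the Poisson mass at zero.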
The second framework proposed here will be on the subject of sample selection modelling. There
appears to have been little progress on this aspect of the model.22 We note that since the Poisson model is a
bona fide regression model, the problem of sample selection poses itself naturally. In Smith's (1988)
application, which drops neatly into this framework, we have a sample of observations which has been
culled from a larger sample specifically on the basis of their use of recreational sites. The second purpose of
this study is to offer one possible specification for handling the problem.
3 Modified Poisson and Negative Binomial Models
Our models for a zero augmented count model and for sample selection are built on the preceding in
a straightforward fashion. The first is an extension of Lambert's ZIP model. The selection model takes the
approach of modifying the joint discrete distribution of the random variables and the conditional mean
function of the count variable, rather than relying on a transformation to normality to produce the conditional
distribution of a latent continuous variable.
3.1. ZIP Models
Lambert (1992) proposes the following modification of the Heilbron/Mullahy WZ model, which she
labels the `zero inflated Poisson' or ZIP model:23
$$y_i \sim \begin{cases} 0 & \text{with probability } q_i, \\ \text{Poisson}(\lambda_i) & \text{with probability } 1 - q_i, \end{cases} \tag{3.1}$$
where log λi = β′xi as before, and
$$q_i = \frac{e^{\gamma' w_i}}{1 + e^{\gamma' w_i}} \quad \text{or} \quad q_i = \frac{e^{\tau \beta' x_i}}{1 + e^{\tau \beta' x_i}} = \frac{\lambda_i^{\tau}}{1 + \lambda_i^{\tau}}.$$
23 We have changed the notation a bit, but the substance of the model is identical.
24 Lambert gives her formulation in terms of −τ, but since τ is unrestricted in sign or magnitude, no generality
is lost by using our slightly more convenient parameterization.
Lambert labels the latter the ZIP(τ) model.24 Thus, the ZIP model generalizes Mullahy's WZ model by
parameterizing a formal probability model for ψ. Although Mullahy's interpretation of a primary regime
generating process for ψ is consistent with the ZIP model, in fact, Lambert's primary interest appears to be in
non-Poissonness, i.e., the shape of the distribution.25 We will propose some extensions of Lambert's ZIP and
ZIP(τ) models. First, we will consider an alternative formulation of the splitting variate, that is, the
determination of qi.26 Second, we will extend the ZIP model to the negative binomial model. The extension is a
natural one. Lambert does mention the possibility of augmenting the mass at zero for other discrete distributions
(p. 12); however, our interest here goes beyond merely specifying an alternative distribution. The presence of
excess zeros in the data will likely lead to a conclusion of overdispersion. In our case, we are interested in
heterogeneity as the source of the overdispersion. A zero inflated negative binomial (ZINB or ZINB(τ))
model will enable us to distinguish between the effect of the splitting mechanism and the overdispersion
induced by individual heterogeneity. In this connection, we are interested in a procedure which will enable
us to test the zero inflated model against the simple Poisson model or against the negative binomial model.
The latter will allow us to make a statement as to whether the excess zeros are the consequence of the
splitting mechanism or are a symptom of unobserved heterogeneity. We note that the same test that we
propose here provides an as yet unexamined method of testing the specification of the negative binomial
model against the Poisson model, independently of the splitting mechanism.
We consider a process whereby the observed random variable yi is generated as
$$y_i = z_i\, y_i^{*},$$
where zi is a binary (0/1) variable and yi* is distributed as Poisson(λi) or negative binomial (λi,θ). The ZIP
model is, by this construction, a model of `partial observability': only the product of the two latent variables,
zi and yi*, is observed.27 Thus,
$$\mathrm{Prob}[y_i = 0] = \mathrm{Prob}[z_i = 0] + \mathrm{Prob}[z_i = 1,\ y_i^{*} = 0] = q_i + (1 - q_i) f(0),$$
$$\mathrm{Prob}[y_i = k] = (1 - q_i) f(k), \quad k = 1, 2, \ldots,$$
where f(⋅) is the Poisson or negative binomial probability distribution for yi*. For an application, consider the
response to the question "how many trips have you taken to a certain sport fishing site?" The answer to this
question comes at two levels. There are individuals who would not visit a sport fishing site almost
regardless of circumstances because this activity does not interest them, whereas there are others for whom
the number of visits might follow more conventional patterns amenable to a Poisson or negative binomial
regression, but might, once again, be zero. The binary part of the model, i.e., the splitting mechanism, lends
itself conveniently to a probit or logit specification, though we need not limit it to those two choices.
Likewise, the conditional count variable can have a Poisson or negative binomial (or, in principle, some
other) distribution. Clearly, any combination of models for a binary outcome, zi, and count variable, yi*, might
be considered. We will limit our attention to the familiar probit and logit models for zi and the Poisson and
negative binomial models for yi*. We will also consider both the ZIP and ZIP(τ) specifications.
25 The introduction does state, however, "One interpretation is that slight, unobserved changes in the environment
cause the process to move back and forth between a perfect state in which defects [the Poisson variate] are
extremely rare and an imperfect state in which defects are possible but not inevitable." (Lambert, p. 1.) This
is, of course, consistent with Mullahy's description, though it understates the case a bit since in the perfect
state predicted by the model, defects are impossible.
26 Lambert does mention in passing some alternative formulations for qi in the ZIP(τ) model (p. 3), but confines
attention to the initial logit model.
27 See Poirier (1980) and Abowd and Farber (1982).
3.1.1 Estimation
Estimation of the parameters of the ZIP model is fairly straightforward. Lambert suggested the EM
algorithm, but our experience has been that a straightforward gradient approach with a line search is more
efficient and poses no unusual calibration problems. To formulate the log-likelihood and gradient for the
ZIP models, let
qi = F(γ′wi) for the ZIP model
and
qi = F(τβ′xi) for the ZIP(τ) model,
where F(⋅) is either the cumulative normal probability, Φ(⋅), for the probit model or the cumulative logistic
probability, Λ(⋅), for the logit model. Let f(⋅) denote either the Poisson(λi) or the negative binomial (λi,θ)
probability density function. (This produces eight possible models.) Then, the probability density function
for the observed random variable, yi, is
$$p(y_i) = p_i = (1 - q_i) f(y_i) + \mathbf{1}(y_i = 0)\, q_i,$$
so the log-likelihood is simply
$$\log L = \sum_{i=1}^{N} \log p(y_i). \tag{3.2}$$
To obtain the gradient, let β* equal either β for the Poisson model or (β,θ) for the negative binomial model.
Then, each term in Σi (∂logpi/∂β*) is
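$$\frac{\partial \log p_i}{\partial \beta^{*}} = \frac{1}{p_i} \left[ (1 - q_i) f(y_i) \left( \frac{\partial \log f(y_i)}{\partial \beta^{*}} \right) + \big( \mathbf{1}(y_i = 0) - f(y_i) \big) \left( \frac{\partial q_i}{\partial \beta^{*}} \right) \right]. \tag{3.3}$$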
The derivatives of log f(yi) were given in Section 2. Also, ∂qi/∂β* will equal 0 in the ZIP model, or τxiqi′ for
the ZIP(τ) model, with a trailing zero for θ if f(yi) is the negative binomial model, since θ does not enter qi.
(The inner derivative, qi′, is either the standard normal density, φi, for the probit model, or
Λi(1 − Λi) for the logit model.) Finally, the parameters of the splitting model are either γ, a vector, in the ZIP
model or τ, a scalar, in the ZIP(τ) model. Denoting these generically as γ, we have
$$\frac{\partial \log p_i}{\partial \gamma} = \frac{q_i'}{p_i} \left[ \mathbf{1}(y_i = 0) - f(y_i) \right] (\beta' x_i) \tag{3.4}$$
for the ZIP(τ) model. For the ZIP model, β′xi is replaced with wi, the vector of covariates. The second
derivatives are fairly complicated. In our applications, we have used the BHHH estimator instead as a
convenient expedient. Finally, in the ZIP model, the hypothesis that some or all of the parameters in f(yi)
equal those in qi might be of interest. Estimation subject to the equality constraints is straightforward, and
the test can be carried out by a conventional Wald or likelihood ratio procedure.
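A small numerical check of the ZIP log-likelihood and gradient may be useful when programming the estimator. The sketch below covers only one of the eight cases, a Poisson count with a logit splitting probability and separate covariates w; each index has a constant and one regressor, and the data and parameter values are hypothetical.

```python
import math

def zip_terms(beta, gamma, x, w, y):
    """Return (lambda, q, f(y), p(y)) for one observation."""
    lam = math.exp(beta[0] + beta[1] * x)
    q = 1.0 / (1.0 + math.exp(-(gamma[0] + gamma[1] * w)))   # logit splitting probability
    f = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    p = (1.0 - q) * f + (q if y == 0 else 0.0)
    return lam, q, f, p

def zip_loglik(beta, gamma, data):
    return sum(math.log(zip_terms(beta, gamma, x, w, y)[3]) for x, w, y in data)

def zip_gradient(beta, gamma, data):
    """Analytic gradient with respect to (beta0, beta1, gamma0, gamma1)."""
    grad = [0.0, 0.0, 0.0, 0.0]
    for x, w, y in data:
        lam, q, f, p = zip_terms(beta, gamma, x, w, y)
        d_beta = (1.0 - q) * f * (y - lam) / p                  # q does not depend on beta here
        d_gamma = q * (1.0 - q) * ((1.0 if y == 0 else 0.0) - f) / p
        grad[0] += d_beta
        grad[1] += d_beta * x
        grad[2] += d_gamma
        grad[3] += d_gamma * w
    return grad

def numeric_gradient(beta, gamma, data, h=1e-6):
    params = list(beta) + list(gamma)
    grad = []
    for k in range(4):
        up, dn = params[:], params[:]
        up[k] += h
        dn[k] -= h
        grad.append((zip_loglik(up[:2], up[2:], data)
                     - zip_loglik(dn[:2], dn[2:], data)) / (2 * h))
    return grad

# Hypothetical observations: (x, w, y)
data = [(0.0, 1.0, 0), (0.5, 0.2, 0), (1.0, -0.3, 2),
        (1.5, 0.8, 0), (2.0, -1.0, 4), (2.5, 0.1, 1)]
beta, gamma = [0.2, 0.4], [-0.5, 0.7]
analytic = zip_gradient(beta, gamma, data)
numeric = numeric_gradient(beta, gamma, data)
gap = max(abs(a - n) for a, n in zip(analytic, numeric))
```

The per-observation gradient terms computed inside `zip_gradient` are also what an outer-product (BHHH-type) covariance estimate would be built from.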
For the ZIP specification, a natural set of starting values for the parameters is provided by the probit
or logit and independent Poisson or negative binomial estimates. In the ZIP(τ) case, the Poisson or negative
binomial model can be used for the regression parameters. One could then choose a value for τ which
would produce approximately the correct probability for zero. An alternative possibility would be to
estimate τ by fitting a probit or logit model to the binary indicator 1(yi = 0) with the single covariate equal to
the Poisson estimates of β′xi (only so as to get the right sign and approximately the right magnitude on τ;
this is not a consistent estimator). Save for a few badly identified cases found by experimentation in which
no solution could be found, convergence of the DFP or Broyden algorithms appears to be routine.
3.1.2 Specification Testing
The ZIP model relaxes the assumption of equal mean and variance in the Poisson model. To derive
the unconditional mean and variance of yi, we first consider the Poisson case. The two conditional
distributions are
$$f(y_i \mid z_i = 0) = 1, \quad y_i = 0,$$
$$f(y_i \mid z_i = 1) = \text{Poisson}(\lambda_i), \quad y_i = 0, 1, \ldots$$
Then
$$E[y_i] = E_z\big[E[y_i \mid z_i]\big] = q_i \cdot 0 + (1 - q_i)\lambda_i = (1 - q_i)\lambda_i$$
and
$$\mathrm{Var}[y_i] = E_z\big[\mathrm{Var}[y_i \mid z_i]\big] + \mathrm{Var}_z\big[E[y_i \mid z_i]\big]$$
$$= \big[ q_i \cdot 0 + (1 - q_i)\lambda_i \big] + \big[ q_i \big(0 - (1 - q_i)\lambda_i\big)^2 + (1 - q_i)\big(\lambda_i - (1 - q_i)\lambda_i\big)^2 \big]$$
$$= (1 - q_i)\lambda_i \left[ 1 + q_i \lambda_i \right].$$
The unconditional Poisson model emerges if qi → 0. Also,
$$\frac{\mathrm{Var}[y_i]}{E[y_i]} = 1 + q_i \lambda_i = 1 + \left[ \frac{q_i}{1 - q_i} \right] E[y_i],$$
so the splitting phenomenon produces overdispersion in its own right. Thus, qi/(1 − qi) is the counterpart to α
in the negative binomial model as regards overdispersion. The ratio increases with qi, as might be expected,
so the more likely is the zero state, the greater is the overdispersion. For the negative binomial model, the
conditional means are the same, so the unconditional mean is unchanged. Only the term Ez[Var[yi│zi]]
changes, from (1 − qi)λi to (1 − qi)λi(1 + αλi). Combining terms and simplifying produces the unconditional
variance for the negative binomial model,
$$\mathrm{Var}[y_i] = (1 - q_i)\lambda_i \left[ 1 + (q_i + \alpha)\lambda_i \right] \quad \text{or} \quad \frac{\mathrm{Var}[y_i]}{E[y_i]} = 1 + \left( \frac{q_i + \alpha}{1 - q_i} \right) E[y_i].$$
This shows that the overdispersion arises from these two independent sources. Moreover, the effects are
cumulative, since the term in parentheses is greater than α for all positive qi.
There is a large literature on testing for overdispersion in the Poisson model.28 With rare exception,
the diagnostic statistics proposed and analyzed are based on second moments constructed from:
· deviations of estimated means and variances (e.g., Dean and Lawless (1989)),
· deviations of a regression slope from one or zero (e.g., Cameron and Trivedi (1990)),
· deviations of derivatives from zero (LM tests) (e.g., Mullahy (1986)).
While tests such as these are clearly related to the model analyzed here, the potential lack of fit of the
Poisson or negative binomial model to the observed data seems also to have potential utility as a diagnostic.
28 See, for example, Breslow (1990), Cameron and Trivedi (1990), Collings and Margolin (1985), Ganio and Schafer
(1992), Gurmu (1991), Mullahy (1990), and Potthoff and Whittinghill (1966).
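The unconditional mean and variance of the zero inflated Poisson and negative binomial models can be verified by summing directly over the mixture probabilities. The sketch below is illustrative only, with hypothetical values q = 0.3, λ = 2 and θ = 1.5 (so α = 1/θ); the function names are ours.

```python
import math

def poisson_pmf(y, lam):
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def negbin_pmf(y, lam, theta):
    u = theta / (theta + lam)
    return math.exp(math.lgamma(theta + y) - math.lgamma(theta) - math.lgamma(y + 1)
                    + theta * math.log(u) + y * math.log(1.0 - u))

def zero_inflated_moments(pmf, q, upper=2000):
    """Mean and variance of q * (point mass at 0) + (1 - q) * pmf, by direct summation."""
    probs = [(1.0 - q) * pmf(y) + (q if y == 0 else 0.0) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

q, lam, theta = 0.3, 2.0, 1.5     # hypothetical values
alpha = 1.0 / theta

zip_mean, zip_var = zero_inflated_moments(lambda y: poisson_pmf(y, lam), q)
zinb_mean, zinb_var = zero_inflated_moments(lambda y: negbin_pmf(y, lam, theta), q)

# Closed forms implied by the variance decomposition.
zip_var_formula = (1.0 - q) * lam * (1.0 + q * lam)
zinb_var_formula = (1.0 - q) * lam * (1.0 + (q + alpha) * lam)
```

With these values the variance-to-mean ratio is 1.6 for the zero inflated Poisson and about 2.93 for the zero inflated negative binomial, which illustrates how the two sources of overdispersion accumulate.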