Accounting for Excess Zeros and Sample Selection in Poisson and Negative Binomial Regression Models

Source: RePEc


We present several modifications of the Poisson and negative binomial models for count data to accommodate cases in which the number of zeros in the data exceeds what would typically be predicted by either model. The excess zeros can masquerade as overdispersion. We present a new test procedure for distinguishing between zero inflation and overdispersion. We also develop a model for sample selection that is analogous to the Heckman-style specification for continuous choice models. An application is presented to a data set on consumer loan behavior in which both of these phenomena are clearly present.
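
As a rough illustration of the remark that excess zeros can masquerade as overdispersion, the sketch below assumes the standard zero-inflated Poisson mixture: with probability pi an observation is a structural zero, otherwise it is drawn from a Poisson distribution with mean lam. The function names and the values lam = 2.0 and pi = 0.3 are illustrative assumptions and are not taken from the paper.

```python
import math

def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson (assumed mixture form)."""
    # Poisson probability computed on the log scale to avoid overflow.
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    structural_zero = pi if k == 0 else 0.0
    return structural_zero + (1.0 - pi) * poisson

def zip_moments(lam, pi, kmax=200):
    """Mean and variance obtained by summing the pmf over 0..kmax-1."""
    probs = [zip_pmf(k, lam, pi) for k in range(kmax)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

# Hypothetical parameter values chosen only for illustration.
mean, var = zip_moments(lam=2.0, pi=0.3)
print(f"mean = {mean:.3f}, variance = {var:.3f}")
```

Under this mixture the variance is larger than the mean whenever pi > 0, even though the non-zero component is an ordinary Poisson, so a variance-to-mean comparison alone cannot separate zero inflation from overdispersion.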



Available from: William H Greene
    • "That is, even after accounting for zero inflation, the nonzero part of the count distribution may be overdispersed (in our context, this will be mainly observed for densely ionizing radiation). For dealing with this situation, Greene (1994) introduced an extended version of the negative binomial model for excess zero count data, the ZINB. In that case, when the overdispersion is due to both the ..."
    ABSTRACT: Within the field of cytogenetic biodosimetry, Poisson regression is the classical approach for modeling the number of chromosome aberrations as a function of radiation dose. However, it is common to find data that exhibit overdispersion. In practice, the assumption of equidispersion may be violated due to unobserved heterogeneity in the cell population, which will render the variance of observed aberration counts larger than their mean, and/or the frequency of zero counts greater than expected for the Poisson distribution. This phenomenon is observable for both full- and partial-body exposure, but more pronounced for the latter. In this work, different methodologies for analyzing cytogenetic chromosomal aberrations datasets are compared, with special focus on zero-inflated Poisson and zero-inflated negative binomial models. A score test for testing for zero inflation in Poisson regression models under the identity link is also developed.
    Biometrical Journal 10/2015; DOI:10.1002/bimj.201400233 · 0.95 Impact Factor
    • "during the previous 12 months always exceeds zero was specified using the zero truncated negative binomial distribution (Greene, 1994): "
    ABSTRACT: This research quantifies changes in consumer welfare due to changes in visitor satisfaction with the availability of information about recreational sites. The authors tested the hypothesis that an improvement in visitor satisfaction with recreation information increases the number of visits to national forests, resulting in increased consumer welfare. They tested the hypothesis with a travel cost model for the Allegheny National Forest using data from the National Visitor Use Monitoring (NVUM) programme. An ex ante simulation suggests that annual per capita consumer welfare increased when highly satisfactory recreation information was available. The findings, along with the expected costs of providing better recreation information, may be a useful reference for recreation site managers who wish to increase the number of visits in an economically effective way.
    Tourism Economics 08/2015; 21(4). DOI:10.5367/te.2014.0383 · 0.80 Impact Factor
    • "Lambert (1992) has developed the Zero Inflated Poisson (ZIP) model to handle this case. In order to model both unobserved heterogeneity and excess zeros a Zero Inflated Negative Binomial (ZINB) model could be applied to the data (Greene, 1994). "
    ABSTRACT: This paper focuses on the properties of the matching process which leads to scientific collaboration. In a first step, it proposes a simple theoretical model to describe the intertemporal choice of researchers facing successive opportunities of co-authoring papers. In a second step, the paper empirically assesses the properties of the model. The main empirical result is that the number and the productivity of a researcher's co-authors reflect the productivity of this researcher. This result is consistent with the assumption that co-authorship is motivated by a willingness to increase both the quality and the quantity of research output. As researchers with many influential publications may create links with a large number of influential co-authors, co-authoring with highly productive academics appears to be a signaling device for researchers' quality.