Accounting for Excess Zeros and Sample Selection in Poisson and Negative Binomial Regression Models

Source: RePEc


We present several modifications of the Poisson and negative binomial models for count data to accommodate cases in which the number of zeros in the data exceeds what either model would typically predict. The excess zeros can masquerade as overdispersion. We present a new test procedure for distinguishing between zero inflation and overdispersion. We also develop a model for sample selection that is analogous to the Heckman-style specification for continuous choice models. An application is presented to a data set on consumer loan behavior in which both of these phenomena are clearly present.
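The claim that excess zeros can masquerade as overdispersion can be illustrated with a small numerical sketch. It assumes the simplest zero-inflated Poisson form, in which an observation is a structural zero with constant probability `pi` and otherwise Poisson with mean `lam`. The function names, the constant `pi`, and the parameter values are illustrative assumptions, not the specification or the test procedure used in the paper.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson (illustrative form).

    With probability pi the outcome is a structural zero; otherwise it
    is drawn from Poisson(lam). A constant pi is an assumption made here
    for simplicity.
    """
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def zip_mean_var(lam, pi):
    """Closed-form mean and variance of the zero-inflated Poisson."""
    mean = (1.0 - pi) * lam
    var = (1.0 - pi) * lam * (1.0 + pi * lam)
    return mean, var


# Hypothetical parameter values chosen only for illustration.
lam, pi = 2.0, 0.3
mean, var = zip_mean_var(lam, pi)

print(f"mean = {mean:.3f}, variance = {var:.3f}")
print(f"P(Y=0) under ZIP:               {zip_pmf(0, lam, pi):.4f}")
print(f"P(Y=0) under Poisson, same mean: {poisson_pmf(0, mean):.4f}")
```

Whenever `pi` is positive the variance exceeds the mean even though the count process itself is Poisson, so a variance-to-mean comparison alone cannot separate zero inflation from the heterogeneity that a negative binomial model captures.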



Available from: William H Greene, Oct 05, 2015
  • Source
    • "Lambert (1992) has developed the Zero Inflated Poisson (ZIP) model to handle this case. In order to model both unobserved heterogeneity and excess zeros a Zero Inflated Negative Binomial (ZINB) model could be applied to the data (Greene, 1994). "
    ABSTRACT: This paper focuses on the properties of the matching process which leads to scientific collaboration. In a first step, it proposes a simple theoretical model to describe the intertemporal choice of researchers facing successive opportunities of co-authoring papers. In a second step, the paper empirically assesses the properties of the model. The main empirical result is that the number and the productivity of a researcher's co-authors reflect the productivity of this researcher. This result is consistent with the assumption that co-authorship is motivated by a willingness to increase both the quality and the quantity of research output. As researchers with many influential publications may create links with a large number of influential co-authors, co-authoring with highly productive academics appears to be a signaling device of researchers' quality.
    • "In conclusion, a negative binomial regression model (NBRM) was considered to be most appropriate. In contrast to the Poisson regression (PR), which is an alternative model for left-tailed non-linear count data, NBRM allows for overdispersed data which violate the Poisson restriction that the variance should not exceed its mean (Greene 1994). Zero-inflated variants of both models, which may account for an additional zero clustering, were not considered because we had no hypothesis suggesting that zero observations could be attributed to two different latent groups (Lambert 1992; Ridout et al. 1998). "
    ABSTRACT: There are high proportions of problem gamblers among individuals who themselves or whose parents immigrated to Germany. This study aimed to examine whether demographic risk factors and gambling preference may explain the higher prevalence of gambling problems among those with migration background (MB). Data was obtained from a nationwide telephone survey which was part of the project "Pathological Gambling and Epidemiology" (PAGE). The sample comprised 15,023 study participants aged 14-64 years living in Germany. Participants who had reported gambling within their lifetime (n = 6,406) were defined as gamblers and categorized according to their MB (n = 1,209 with MB), additional demographic characteristics (sex, age, marital status, household size, education, occupation), preferred types of gambling (21 categories covering the gambling types available in Germany), and the count of lifetime gambling problem symptoms (0-10 criteria of the fourth Diagnostic and Statistical Manual of Mental Disorders). Estimates from a negative binomial regression revealed that there is a 146.2 % increase in the expected count of gambling problem symptoms for gamblers with MB compared to those without MB. The percentage decreased to 102.5 and 97.6 % after adjustment for demographic characteristics and further adjustment for preferred types of gambling, respectively. Demographic risk factors and gambling preference may partially mediate but not completely explain the higher prevalence of gambling problems among the population with MB. Having an MB may be considered as an independent risk factor for gambling problems, which indicates a need for culturally sensitive prevention and treatment measures.
    Journal of Gambling Behavior 04/2014; 31(3):741-. DOI:10.1007/s10899-014-9459-0 · 1.28 Impact Factor
    • "This type of approach is quite useful in other settings in statistical modeling to allow for two (or more) types of subjects in a sample. A direct generalization of the ZIP model to deal with the overdispersion commonly encountered with count data is the zero-inflated negative binomial model (Greene, 1994). Another generalization deals with repeated measures of zero-inflated data (Min and Agresti, 2005). "
    ABSTRACT: We present an overview of some important and/or interesting contributions to the latent variable literature for the analysis of multivariate categorical responses, beginning with Lazarsfeld's introduction of latent class models. There is by now an enormous literature on latent variable models for categorical responses, especially in the context of including random effects in generalized linear mixed models, so this is necessarily a highly selective overview. Due to space considerations, we summarize the main ideas, suppressing details. As part of our presentation, we raise a couple of questions that may suggest future research work.
    Communications in Statistics - Theory and Methods 01/2014; 43(4). DOI:10.1080/03610926.2013.814783 · 0.27 Impact Factor