# Are there generalizations of zero-inflated negative binomial and hurdle modeling that address continuous (i.e., non-count) variables?

I know that count variables whose distributions include a lot of zeros in them can be modeled with zero-inflated negative binomial models and also hurdle models, but I'm curious if there are similar kinds of models for continuous, non-negative outcome variables where the probability of departure from zero is itself modeled alongside simultaneous modeling of the non-zero magnitude. It occurred to me that it would be possible to discretize any continuous outcome variable and so convert it to a count variable, but I wonder if there are any models that explicitly address continuous variables of this sort. It goes without saying that such variables are only "continuous" beyond the transition from zero to non-zero values, but I'd be interested in any insights out there.

## Popular Answers

Janet A Tooze · Wake Forest School of Medicine

Sebastian E. Baumeister · Universität Regensburg

## All Answers (18)

Subrata Chakraborty · Dibrugarh University

Jonathan Sidi · Hebrew University of Jerusalem

Stephen Senn · LIH Luxembourg Institute of Health

http://onlinelibrary.wiley.com/journal/10.1002/%28ISSN%291097-0258/earlyview

James P. Howard · University of Maryland University College

There is a second model, called the Cragg model, that may also be of interest. It's a two-step model that uses a probit model to determine departure from zero and a Poisson model to estimate the magnitude. It has the downside of still being a count model internally, which may not fit your dependent variable.

Please feel free to contact me if you need assistance.

Jason T Machan · Rhode Island Hospital

Janet A Tooze · Wake Forest School of Medicine

Jixian Wang · Celgene International, Switzerland

Ken P Kleinman · University of Massachusetts Amherst

Michael Smithson · Australian National University

Graeme R Tucker · University of Adelaide

Vasudeva Guddattu · Manipal University

Jonathan Sidi · Hebrew University of Jerusalem

Jason T Machan · Rhode Island Hospital

I've been treating these as time-to-event data with some success (CT censored at 40), but the sample size is always small. A more complicated and robust method that may bear some relevance is to model the "40" inflation directly, using a model which explicitly truncates a pdf (cycles-to-threshold tend to be Gaussian in my experience), and then to proceed as you would with any other linear model. Figure attached: it was designed to illustrate the impact of imputing 40s and then simply treating the data as if normal. It shows the degree of bias increasing as the central tendency approaches 40 (the red line is the arithmetic mean; green is the best fit allowing for truncation). A key calculation is the difference between a target gene and a housekeeping gene, expressed as 2^deltaCT, the "fold" difference.
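The bias from imputing 40s can be reproduced with a small simulation. The following is only a sketch under stated assumptions, not the method used for the attached figure: it assumes a Gaussian latent CT value that is right-censored at 40, and it finds the maximum-likelihood mean and standard deviation by a coarse grid search. The function name `censored_normal_mle`, the grids, and the simulated parameters are all hypothetical choices for illustration.

```python
import math
import random

def censored_normal_mle(values, limit, mu_grid, sigma_grid):
    """Grid-search MLE of (mu, sigma) for a Gaussian right-censored at `limit`."""
    observed = [v for v in values if v < limit]
    n_obs = len(observed)
    n_cens = len(values) - n_obs          # values recorded at the limit
    s1 = sum(observed)
    s2 = sum(v * v for v in observed)
    best = (-math.inf, None, None)
    for mu in mu_grid:
        # Sum of squared deviations of the uncensored values around mu.
        ss = s2 - 2.0 * mu * s1 + n_obs * mu * mu
        for sigma in sigma_grid:
            z = (limit - mu) / sigma
            # P(X >= limit): likelihood contribution of each censored value.
            tail = 0.5 * math.erfc(z / math.sqrt(2.0))
            loglik = (-n_obs * math.log(sigma)
                      - ss / (2.0 * sigma ** 2)
                      + n_cens * math.log(tail))
            if loglik > best[0]:
                best = (loglik, mu, sigma)
    return best[1], best[2]

# Simulated data (assumed values): latent mean close to the limit of 40.
random.seed(1)
TRUE_MU, TRUE_SIGMA, LIMIT = 39.0, 3.0, 40.0
latent = [random.gauss(TRUE_MU, TRUE_SIGMA) for _ in range(5000)]
recorded = [min(x, LIMIT) for x in latent]   # anything above 40 is stored as 40

naive_mean = sum(recorded) / len(recorded)   # treats the 40s as real values
mu_grid = [35.0 + 0.05 * i for i in range(201)]
sigma_grid = [1.0 + 0.1 * j for j in range(51)]
mu_hat, sigma_hat = censored_normal_mle(recorded, LIMIT, mu_grid, sigma_grid)

print(f"naive mean = {naive_mean:.2f}, censored-MLE mean = {mu_hat:.2f}")
```

In practice a proper optimizer or a Tobit-style regression routine would replace the grid search; the grid is used here only to keep the sketch dependency-free.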

Sebastian E. Baumeister · Universität Regensburg

Nathaniel L Baker · Medical University of South Carolina

Matthew Browne · Central Queensland University

This has already been mentioned by @Vasudeva, but it's probably worth repeating that Tweedie models appear to be the most direct answer to your question. With power parameter 1 < p < 2, they combine a point mass at zero with a continuous, gamma-like distribution on the positives (a compound Poisson-gamma). They have another parameter which specifies the dispersion, making them a pretty flexible and (at least to me) interesting model of the error distribution.

If you wish to use Tweedie models, you are probably best off using the R package 'tweedie'. Having used them myself, I did sometimes have convergence issues, and in some cases ended up using a composite logit / log-normal model, as mentioned by @Nathaniel.
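To see where the exact zeros come from, a Tweedie variable with 1 < p < 2 can be simulated as a Poisson number of gamma-distributed amounts. This is a minimal sketch in Python using only the standard library, not the R 'tweedie' package; the helper names `poisson_draw` and `tweedie_draw` and the parameter values are assumptions made for illustration. It uses the standard mapping from mean `mu`, dispersion `phi` and power `p` to the Poisson rate and gamma shape/scale.

```python
import math
import random

def poisson_draw(rng, lam):
    """Poisson draw via Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def tweedie_draw(rng, mu, phi, p):
    """One draw from a Tweedie distribution with 1 < p < 2 (compound Poisson-gamma)."""
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))   # Poisson rate
    shape = (2.0 - p) / (p - 1.0)               # gamma shape
    scale = phi * (p - 1.0) * mu ** (p - 1.0)   # gamma scale
    n_events = poisson_draw(rng, lam)
    # Zero events gives an exact zero; otherwise a sum of gamma amounts.
    return sum(rng.gammavariate(shape, scale) for _ in range(n_events))

rng = random.Random(42)
MU, PHI, P = 2.0, 1.5, 1.5                      # assumed illustrative values
draws = [tweedie_draw(rng, MU, PHI, P) for _ in range(20000)]

zero_share = sum(1 for y in draws if y == 0.0) / len(draws)
expected_zero_share = math.exp(-MU ** (2.0 - P) / (PHI * (2.0 - P)))
sample_mean = sum(draws) / len(draws)

print(f"share of zeros: {zero_share:.3f} (theory {expected_zero_share:.3f})")
print(f"sample mean:    {sample_mean:.3f} (theory {MU:.3f})")
```

Note that the probability of a zero is tied to the mean through the same parameters, which is why a single linear predictor drives both the zeros and the magnitudes in this family.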

One important practical difference between the two approaches is that composite models (whether done manually, or via the R functions 'hurdle' or 'zeroinfl') estimate separate linear predictors for the two parts of the model. In many cases, this might make perfect sense. In other situations, it might be more sensible to have a single set of coefficients associated with the common linear predictor employed by Tweedie.
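The "separate linear predictors" point can be illustrated with the simplest possible composite model: one binary covariate, a binary part for zero versus non-zero, and a log-normal part for the positive values. The sketch below is an assumption-laden toy in Python, not the 'hurdle' or 'zeroinfl' functions; with a single binary covariate the maximum-likelihood fit of each part reduces to per-group summaries, so no optimizer is needed. The names `fit_two_part` and `simulate` and all simulated values are hypothetical.

```python
import math
import random
from statistics import mean, pvariance

def fit_two_part(outcomes):
    """Two-part fit for one group: P(Y > 0) and a log-normal for Y given Y > 0."""
    log_positives = [math.log(y) for y in outcomes if y > 0]
    p_nonzero = len(log_positives) / len(outcomes)   # binary part
    log_mean = mean(log_positives)                   # magnitude part
    log_var = pvariance(log_positives)               # ML variance estimate
    # Overall mean combines both parts: P(Y > 0) * E[Y | Y > 0].
    expected = p_nonzero * math.exp(log_mean + log_var / 2.0)
    return {"p_nonzero": p_nonzero, "log_mean": log_mean,
            "log_var": log_var, "expected": expected}

def simulate(rng, n, p_nonzero, log_mean, log_sd):
    """Zeros with probability 1 - p_nonzero, log-normal amounts otherwise."""
    return [rng.lognormvariate(log_mean, log_sd) if rng.random() < p_nonzero else 0.0
            for _ in range(n)]

# Assumed scenario: the covariate changes the chance of a non-zero value
# but leaves the size of the non-zero values untouched.
rng = random.Random(7)
control = simulate(rng, 4000, p_nonzero=0.4, log_mean=1.0, log_sd=0.5)
treated = simulate(rng, 4000, p_nonzero=0.7, log_mean=1.0, log_sd=0.5)

fit_c, fit_t = fit_two_part(control), fit_two_part(treated)
for label, fit in (("control", fit_c), ("treated", fit_t)):
    print(f"{label}: P(Y>0)={fit['p_nonzero']:.2f}, "
          f"mean log amount={fit['log_mean']:.2f}, E[Y]={fit['expected']:.2f}")
```

A composite model can report an effect on the zero part and no effect on the magnitude part, as in this scenario, whereas a model with one common linear predictor would summarize both through a single coefficient.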

One last thing to be aware of is that handy methods like stepAIC (stepwise variable selection based on information criteria) are not (to my knowledge) yet available for tweedie models. So, if this kind of thing is required, it might be easier to go for the composite models.

Joseph Hilbe · Arizona State U and U of Hawaii

The binomial commands will be in the Stata Journal, and I will be posting them shortly to my web site, http://works.bepress.com/joseph_hilbe/

Joseph M Hilbe

hilbe@asu.edu

Alessandro Marcon · University of Verona

Would you find it acceptable to use a negative binomial regression model to analyse a QUANTITATIVE DISCRETE score (ranging 0-10) that shows a marked excess of zeroes? This model is generally used for COUNT variables. I found this application for a quantitative discrete score (Byers AL et al. J Clin Epidemiol. 2003;56:559-64), but one could argue that this application violates the assumption of "independence across trials" (the items that are summed up to devise the score are neither Bernoulli trials nor independent within one subject).

Any reference and idea will be very much appreciated.
