Quasi-Poisson vs. negative binomial regression: How should we model overdispersed count data?

National Marine Mammal Laboratory, Alaska Fisheries Science Center, National Marine Fisheries Service, 7600 Sand Point Way NE, Building 4, Seattle, Washington 98115-6349, USA.
Ecology (Impact Factor: 4.66). 12/2007; 88(11):2766-72. DOI: 10.1890/07-0043.1
Source: PubMed


Quasi-Poisson and negative binomial regression models have equal numbers of parameters, and either could be used for overdispersed count data. While they often give similar results, there can be striking differences in estimating the effects of covariates. We explain when and why such differences occur. The variance of a quasi-Poisson model is a linear function of the mean, while the variance of a negative binomial model is a quadratic function of the mean. These variance relationships affect the weights in the iteratively weighted least-squares algorithm used to fit models to data. Because the variance is a function of the mean, large and small counts get weighted differently in quasi-Poisson and negative binomial regression. We provide an example using harbor seal counts from aerial surveys. These counts are affected by date, time of day, and time relative to low tide. We present results on a data set that showed a dramatic difference in estimated abundance of harbor seals when using quasi-Poisson vs. negative binomial regression. This difference is described and explained in light of the different weighting used in each regression method. A general understanding of weighting can help ecologists choose between these two methods.
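The weighting contrast the abstract describes can be made concrete. For a log-link GLM, the IWLS working weight is w = (dμ/dη)²/V(μ) = μ²/V(μ). The sketch below (not code from the paper; the dispersion values θ and k are arbitrary illustrations) shows that the quasi-Poisson weight grows linearly with the mean, while the negative binomial weight saturates, so large counts carry relatively more influence under quasi-Poisson fitting:

```python
# Sketch of IWLS working weights for a log-link GLM: w = mu^2 / V(mu).
# theta and k are illustrative dispersion parameters, not fitted values.

def quasi_poisson_weight(mu, theta=2.0):
    """Quasi-Poisson: V(mu) = theta * mu, so w = mu / theta (linear in mu)."""
    return mu / theta

def neg_binomial_weight(mu, k=5.0):
    """Negative binomial: V(mu) = mu + mu^2 / k, so w = mu / (1 + mu / k),
    which saturates at k as mu grows large."""
    return mu / (1.0 + mu / k)

if __name__ == "__main__":
    for mu in (1.0, 10.0, 100.0):
        print(mu, quasi_poisson_weight(mu), neg_binomial_weight(mu))
```

Comparing the two functions at a large mean makes the point of the paper visible: a count of 100 receives weight 50 under this quasi-Poisson parameterization but under 5 under the negative binomial one.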

  • Source
    • "It is likely that dist.low was only retained as a significant predictor by the quasi-Poisson model because of characteristics of the estimation process. For example, quasi-Poisson regression gives greater weight to larger counts in the fit from iteratively weighted least squares as compared to alternatives such as the negative binomial (Ver Hoef and Boveng 2007), and the age 0 densities contributing to the survival estimates above one were an "
    ABSTRACT: Describing how population-level survival rates are influenced by environmental change becomes necessary during recovery planning to identify threats that should be the focus for future remediation efforts. However, the ways in which data are analyzed have the potential to change our ecological understanding and thus subsequent recommendations for remedial actions to address threats. In regression, distributional assumptions underlying short time series of survival estimates cannot be investigated a priori and data likely contain points that do not follow the general trend (outliers) as well as contain additional variation relative to an assumed distribution (overdispersion). Using juvenile survival data from three endangered Atlantic salmon Salmo salar L. populations in response to hydrological variation, four distributions for the response were compared using lognormal and generalized linear models (GLM). The influence of outliers as well as overdispersion was investigated by comparing conclusions from robust regressions with these lognormal models and GLMs. The analyses strongly supported the use of a lognormal distribution for survival estimates (i.e., modeling the instantaneous rate of mortality as the response) and would have led to ambiguity in the identification of significant hydrological predictors as well as low overall confidence in the predicted relationships if only GLMs had been considered. However, using robust regression to evaluate the effect of additional variation and outliers in the data relative to regression assumptions resulted in a better understanding of relationships between hydrological variables and survival that could be used for population-specific recovery planning. 
This manuscript highlights how a systematic analysis that explicitly considers what monitoring data represent and where variation is likely to come from is required in order to draw meaningful conclusions when analyzing changes in survival relative to environmental variation to aid in recovery planning.
    Ecology and Evolution 08/2015; 5(16). DOI:10.1002/ece3.1614 · 2.32 Impact Factor
  • Source
    • "Two other potentially useful, better known techniques are negative binomial models (Ver Hoef and Boveng 2007; Zuur et al. 2009; O'Hara and Kotze 2010) and zero-inflated regression (Martin et al. 2005; Zeileis et al. 2008; Zuur et al. 2009), which are often used when there are signs of overdispersion. Ecological and behavioural data often display overdispersion, i.e. variances larger than the means (Ver Hoef and Boveng 2007; Zuur et al. 2009). If overdispersion is present, it means that generalized linear models with Poisson errors (i.e. "
    ABSTRACT: Animal behaviour is of fundamental importance but is often overlooked in biological invasion research. A problem with such studies is that they may add pressure to already threatened species and subject vulnerable individuals to increased risk. One solution is to obtain the maximum possible information from the generated data using a variety of statistical techniques, instead of solely using simple versions of linear regression or generalized linear models as is customary. Here, we exemplify and compare the use of modern regression techniques which have very different conceptual backgrounds and aims (negative binomial models, zero-inflated regression, and expectile regression), and which have rarely been applied to behavioural data in biological invasion studies. We show that our data display overdispersion, which is frequent in ecological and behavioural data, and that conventional statistical methods such as Poisson generalized linear models are inadequate in this case. Expectile regression is similar to quantile regression and allows the estimation of functional relationships between variables for all portions of a probability distribution and is thus well suited for modelling boundaries in polygonal relationships or cases with heterogeneous variances which are frequent in behavioural data. We applied various statistical techniques to aggression in invasive mosquitofish, Gambusia holbrooki, and the concomitant vulnerability of native toothcarp, Aphanius iberus, in relation to individual size and sex. We found that medium sized male G. holbrooki carry out the majority of aggressive acts and that smaller and medium size A. iberus are most vulnerable. Of the regression techniques used, only negative binomial models and zero-inflated and expectile Poisson regressions revealed these relationships.
    Reviews in Fish Biology and Fisheries 07/2015; 25(3):537-549. DOI:10.1007/s11160-015-9391-0 · 2.73 Impact Factor
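The snippet above defines overdispersion as variances larger than the means. A minimal check on raw counts follows this definition directly; the counts below are made-up illustration data, not from any of the cited studies:

```python
# Sketch of a crude overdispersion check: under a Poisson model the variance
# equals the mean, so a sample variance well above the sample mean suggests
# overdispersion. Illustrative data only.

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def dispersion_ratio(counts):
    """Ratio of sample variance to sample mean; roughly 1 for Poisson-like data."""
    return variance(counts) / mean(counts)

counts = [0, 0, 1, 2, 2, 3, 5, 9, 14, 30]  # heavy right tail
print(dispersion_ratio(counts))
```

A ratio far above 1, as here, is the informal sign that a plain Poisson GLM is inadequate and a quasi-Poisson or negative binomial model should be considered.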
  • Source
    • "A model averaging approach would take some weighted average of the adjustments provided by both methods. However, for the data set they considered, Ver Hoef and Boveng (2007) showed that negative binomial regression gave unbelievably high adjustments. They explained the difference based on the different weights that quasi-Poisson and negative binomial place on the observations, due to different variance-mean relationships, which influenced the regression estimates."
    ABSTRACT: Multimodel inference accommodates uncertainty when selecting or averaging models, which seems logical and natural. However, there are costs associated with multimodel inferences, so they are not always appropriate or desirable. First, we present statistical inference in the big picture of data analysis and the deductive–inductive process of scientific discovery. Inferences on fixed states of nature, such as survey sampling methods, generally use a single model. Multimodel inferences are used primarily when modeling processes of nature, when there is no hope of knowing the true model. However, even in these cases, iterating on a single model may meet objectives without introducing additional complexity. Additionally, discovering new features in the data through model diagnostics is easier when considering a single model. There are costs for multimodel inferences, including the coding, computing, and summarization time on each model. When cost is included, a reasonable strategy may often be iterating on a single model. We recommend that researchers and managers carefully examine objectives and cost when considering multimodel inference methods. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.
    Journal of Wildlife Management 05/2015; DOI:10.1002/jwmg.891 · 1.73 Impact Factor