Quasi-Poisson vs. negative binomial regression: How should we model overdispersed count data?

National Marine Mammal Laboratory, Alaska Fisheries Science Center, National Marine Fisheries Service, 7600 Sand Point Way NE, Building 4, Seattle, Washington 98115-6349, USA.
Ecology (Impact Factor: 4.66). 12/2007; 88(11):2766-72. DOI: 10.1890/07-0043.1
Source: PubMed


Quasi-Poisson and negative binomial regression models have equal numbers of parameters, and either could be used for overdispersed count data. While they often give similar results, there can be striking differences in estimating the effects of covariates. We explain when and why such differences occur. The variance of a quasi-Poisson model is a linear function of the mean, while the variance of a negative binomial model is a quadratic function of the mean. These variance relationships affect the weights in the iteratively weighted least-squares algorithm used to fit models to data. Because the variance is a function of the mean, large and small counts get weighted differently in quasi-Poisson and negative binomial regression. We provide an example using harbor seal counts from aerial surveys. These counts are affected by date, time of day, and time relative to low tide. We present results on a data set that showed a dramatic difference in the estimated abundance of harbor seals when using quasi-Poisson vs. negative binomial regression. This difference is described and explained in light of the different weighting used in each regression method. A general understanding of weighting can help ecologists choose between these two methods.
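
The weighting argument in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a log link, and the names phi (quasi-Poisson dispersion), kappa (negative binomial dispersion), and both function names are hypothetical. Under those assumptions the variance is phi * mu for quasi-Poisson and mu + kappa * mu^2 for the negative binomial, and the working weight in iteratively weighted least squares is (d mu / d eta)^2 / Var(mu) = mu^2 / Var(mu).

```python
# Illustrative sketch only; parameter values are arbitrary assumptions,
# not estimates from the harbor seal data.

def quasi_poisson_weight(mu: float, phi: float) -> float:
    """IRLS working weight with a log link when Var = phi * mu."""
    return mu ** 2 / (phi * mu)  # simplifies to mu / phi


def negative_binomial_weight(mu: float, kappa: float) -> float:
    """IRLS working weight with a log link when Var = mu + kappa * mu**2."""
    return mu ** 2 / (mu + kappa * mu ** 2)  # simplifies to mu / (1 + kappa * mu)


phi = 5.0    # hypothetical quasi-Poisson dispersion
kappa = 0.5  # hypothetical negative binomial dispersion

for mu in (1.0, 10.0, 100.0, 1000.0):
    qp = quasi_poisson_weight(mu, phi)
    nb = negative_binomial_weight(mu, kappa)
    print(f"mean={mu:7.1f}  quasi-Poisson weight={qp:8.2f}  NB weight={nb:6.3f}")
```

In this sketch the quasi-Poisson weight grows in proportion to the mean, whereas the negative binomial weight levels off near 1 / kappa, so large counts dominate a quasi-Poisson fit far more than a negative binomial fit.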


Available from: Jay M. Ver Hoef, Dec 03, 2014
    • "The effects of the environmental and climatic variables on the abundance of the various sandfly species were analyzed using Generalized Additive Models (GAMs) to detect non-linear relationships. As the data showed a significant over-dispersion, the model was fitted using a negative binomial error distribution and a logarithm link [21]. The explanatory variables were: altitude, slope, wall orientation, mean/minimum/maximum temperature and mean/minimum/maximum relative humidity."
    ABSTRACT: Background: Phlebotomine sandflies are hematophagous insects widely present in Western Mediterranean countries and known for their role as Leishmania vectors. During the last ten years, the risk of leishmaniasis re-emergence has increased in France. However, sandfly biology and ecology in the South of France remain poorly known because the last detailed study on their spatiotemporal dynamics was performed over 30 years ago. The aim of the present study was to update our knowledge on sandfly ecology by determining their spatiotemporal dynamics and by investigating the relationship between environmental/climatic factors and the presence and abundance of sandflies in the South of France. Methods: An entomological survey was carried out during three years (2011-2013) along a 14 kilometer-long transect. The findings were compared with the data collected along the same transect in 1977. Data loggers were placed in each station and programmed to record temperature and relative humidity every six hours between April 2011 and November 2014. Several environmental factors (such as altitude, slope and wall orientation (North, East, West and South)) were characterized at each station. Results: Four sandfly species were collected: Phlebotomus ariasi and Sergentomyia minuta, which were predominant, Ph. perniciosus and Ph. mascittii. Sandfly activity within the studied area started in May and ended in October with peaks in July-August at the optimum average temperature. We found a positive effect of altitude and temperature and a negative effect of relative humidity on Ph. ariasi and Se. minuta presence. We detected interspecific differences and non-linear effects of these climatic variables on sandfly abundance. Although the environment has considerably changed in 30 years, no significant difference in sandfly dynamics and species diversity was found by comparing the 1977 and 2011-2013 data. 
Conclusion: Our study shows that this area maintains a rich sandfly fauna with high Ph. ariasi population density during the active season. This represents a risk for Leishmania transmission. The analysis revealed that the presence and abundance of Ph. ariasi and Se. minuta were differently correlated with the environmental and climatic factors. Comparison with the data collected in 1977 highlighted the sandfly population stability, suggesting that they can adapt, in the short and long term, to changing ecosystems.
    Full-text · Article · Dec 2015 · Parasites & Vectors
    • "It is likely that dist.low was only retained as a significant predictor by the quasi-Poisson model because of characteristics of the estimation process. For example, quasi-Poisson regression gives greater weight to larger counts in the fit from iteratively weighted least squares as compared to alternatives such as the negative binomial (Ver Hoef and Boveng 2007), and the age 0 densities contributing to the survival estimates above one were an "
    ABSTRACT: Describing how population-level survival rates are influenced by environmental change becomes necessary during recovery planning to identify threats that should be the focus for future remediation efforts. However, the ways in which data are analyzed have the potential to change our ecological understanding and thus subsequent recommendations for remedial actions to address threats. In regression, distributional assumptions underlying short time series of survival estimates cannot be investigated a priori and data likely contain points that do not follow the general trend (outliers) as well as contain additional variation relative to an assumed distribution (overdispersion). Using juvenile survival data from three endangered Atlantic salmon Salmo salar L. populations in response to hydrological variation, four distributions for the response were compared using lognormal and generalized linear models (GLM). The influence of outliers as well as overdispersion was investigated by comparing conclusions from robust regressions with these lognormal models and GLMs. The analyses strongly supported the use of a lognormal distribution for survival estimates (i.e., modeling the instantaneous rate of mortality as the response) and would have led to ambiguity in the identification of significant hydrological predictors as well as low overall confidence in the predicted relationships if only GLMs had been considered. However, using robust regression to evaluate the effect of additional variation and outliers in the data relative to regression assumptions resulted in a better understanding of relationships between hydrological variables and survival that could be used for population-specific recovery planning. 
This manuscript highlights how a systematic analysis that explicitly considers what monitoring data represent and where variation is likely to come from is required in order to draw meaningful conclusions when analyzing changes in survival relative to environmental variation to aid in recovery planning.
    Full-text · Article · Aug 2015 · Ecology and Evolution
    • "Two other potentially useful, better known techniques are negative binomial models (Ver Hoef and Boveng 2007; Zuur et al. 2009; O'Hara and Kotze 2010) and zero-inflated regression (Martin et al. 2005; Zeileis et al. 2008; Zuur et al. 2009), which are often used when there are signs of overdispersion. Ecological and behavioural data often display overdispersion, i.e. variances larger than the means (Ver Hoef and Boveng 2007; Zuur et al. 2009). If overdispersion is present, it means that generalized linear models with Poisson errors (i.e. "
    ABSTRACT: Animal behaviour is of fundamental importance but is often overlooked in biological invasion research. A problem with such studies is that they may add pressure to already threatened species and subject vulnerable individuals to increased risk. One solution is to obtain the maximum possible information from the generated data using a variety of statistical techniques, instead of solely using simple versions of linear regression or generalized linear models as is customary. Here, we exemplify and compare the use of modern regression techniques which have very different conceptual backgrounds and aims (negative binomial models, zero-inflated regression, and expectile regression), and which have rarely been applied to behavioural data in biological invasion studies. We show that our data display overdispersion, which is frequent in ecological and behavioural data, and that conventional statistical methods such as Poisson generalized linear models are inadequate in this case. Expectile regression is similar to quantile regression and allows the estimation of functional relationships between variables for all portions of a probability distribution and is thus well suited for modelling boundaries in polygonal relationships or cases with heterogeneous variances which are frequent in behavioural data. We applied various statistical techniques to aggression in invasive mosquitofish, Gambusia holbrooki, and the concomitant vulnerability of native toothcarp, Aphanius iberus, in relation to individual size and sex. We found that medium sized male G. holbrooki carry out the majority of aggressive acts and that smaller and medium size A. iberus are most vulnerable. Of the regression techniques used, only negative binomial models and zero-inflated and expectile Poisson regressions revealed these relationships.
    Full-text · Article · Jul 2015 · Reviews in Fish Biology and Fisheries