Quasi-Poisson vs. negative binomial regression: how should we model overdispersed count data?

National Marine Mammal Laboratory, Alaska Fisheries Science Center, National Marine Fisheries Service, 7600 Sand Point Way NE, Building 4, Seattle, Washington 98115-6349, USA.
Ecology (Impact Factor: 5). 12/2007; 88(11):2766-72. DOI: 10.1890/07-0043.1
Source: PubMed

ABSTRACT: Quasi-Poisson and negative binomial regression models have equal numbers of parameters, and either could be used for overdispersed count data. While they often give similar results, there can be striking differences in estimating the effects of covariates. We explain when and why such differences occur. The variance of a quasi-Poisson model is a linear function of the mean, while the variance of a negative binomial model is a quadratic function of the mean. These variance relationships affect the weights in the iteratively reweighted least-squares algorithm used to fit the models to data. Because the variance is a function of the mean, large and small counts are weighted differently in quasi-Poisson and negative binomial regression. We provide an example using harbor seal counts from aerial surveys. These counts are affected by date, time of day, and time relative to low tide. We present results on a data set that showed a dramatic difference in estimating the abundance of harbor seals when using quasi-Poisson vs. negative binomial regression. This difference is described and explained in light of the different weighting used in each regression method. A general understanding of weighting can help ecologists choose between these two methods.
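
To make the weighting argument concrete, here is a minimal Python sketch (NumPy only) of iteratively reweighted least squares for a log-link count model. With a log link, dμ/dη = μ, so the working weight is w = μ²/V(μ). For quasi-Poisson, V(μ) = φμ gives w = μ/φ, which keeps growing with the mean; for the negative binomial, V(μ) = μ + μ²/θ gives w = θμ/(θ + μ), which levels off at θ, so large counts receive relatively less weight. The function names, the simulated data, and treating θ as known (rather than estimating it by maximum likelihood) are illustrative assumptions, not the authors' code; the harbor seal data are not reproduced here.

```python
import numpy as np


def working_weights(mu, family, theta=1.0):
    """IRLS working weights for a log-link GLM: w = (dmu/deta)^2 / V(mu) = mu^2 / V(mu).

    The quasi-Poisson dispersion phi is omitted: it scales every weight equally,
    so it does not change the coefficient estimates.
    """
    mu = np.asarray(mu, dtype=float)
    if family == "quasipoisson":
        var = mu                       # V(mu) = phi * mu, linear in the mean
    elif family == "negbin":
        var = mu + mu**2 / theta       # V(mu) = mu + mu^2 / theta, quadratic in the mean
    else:
        raise ValueError(f"unknown family: {family}")
    return mu**2 / var


def fit_log_link(X, y, family, theta=1.0, max_iter=100, tol=1e-10):
    """Fit log(mu) = X @ beta by IRLS; theta is treated as known for 'negbin' (assumption)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())         # assumes column 0 of X is the intercept
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu        # working response for the log link
        w = working_weights(mu, family, theta)
        XtW = X.T * w                  # X^T W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def pearson_dispersion(X, y, beta):
    """Quasi-Poisson phi estimate: Pearson chi-square / residual degrees of freedom."""
    mu = np.exp(np.asarray(X, dtype=float) @ beta)
    return np.sum((y - mu) ** 2 / mu) / (len(y) - X.shape[1])


# Hypothetical simulated data (not the harbor seal counts): NB counts with theta = 2.
rng = np.random.default_rng(1)
n, theta = 500, 2.0
x = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
mu_true = np.exp(1.0 + 2.0 * x)
y = rng.negative_binomial(theta, theta / (theta + mu_true))  # mean mu_true

beta_qp = fit_log_link(X, y, "quasipoisson")
beta_nb = fit_log_link(X, y, "negbin", theta=theta)
print("quasi-Poisson beta:", beta_qp, "phi:", pearson_dispersion(X, y, beta_qp))
print("negative binomial beta:", beta_nb)
print("NB weights at mu = 1, 10, 100:", working_weights([1.0, 10.0, 100.0], "negbin", theta))
```

Because φ multiplies every quasi-Poisson weight equally, it cancels out of the coefficient estimates and only inflates their standard errors; θ, by contrast, changes the relative weights of small and large counts and therefore the estimates themselves.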

Available from: Jay M. Ver Hoef, Dec 03, 2014

  • ABSTRACT: Multimodel inference accommodates uncertainty when selecting or averaging models, which seems logical and natural. However, there are costs associated with multimodel inferences, so they are not always appropriate or desirable. First, we present statistical inference in the big picture of data analysis and the deductive–inductive process of scientific discovery. Inferences on fixed states of nature, such as survey sampling methods, generally use a single model. Multimodel inferences are used primarily when modeling processes of nature, when there is no hope of knowing the true model. However, even in these cases, iterating on a single model may meet objectives without introducing additional complexity. Additionally, discovering new features in the data through model diagnostics is easier when considering a single model. There are costs for multimodel inferences, including the coding, computing, and summarization time on each model. When cost is included, a reasonable strategy may often be iterating on a single model. We recommend that researchers and managers carefully examine objectives and cost when considering multimodel inference methods. Published 2015. This article is a U.S. Government work and is in the public domain in the USA.
    Journal of Wildlife Management 05/2015; DOI:10.1002/jwmg.891 · 1.61 Impact Factor
  • ABSTRACT: Overdispersion is common in models of count data in ecology and evolutionary biology, and can occur due to missing covariates, non-independent (aggregated) data, or an excess frequency of zeroes (zero-inflation). Accounting for overdispersion in such models is vital, as failing to do so can lead to biased parameter estimates, and false conclusions regarding hypotheses of interest. Observation-level random effects (OLRE), where each data point receives a unique level of a random effect that models the extra-Poisson variation present in the data, are commonly employed to cope with overdispersion in count data. However, studies investigating the efficacy of observation-level random effects as a means to deal with overdispersion are scarce. Here I use simulations to show that in cases where overdispersion is caused by random extra-Poisson noise, or aggregation in the count data, observation-level random effects yield more accurate parameter estimates compared to when overdispersion is simply ignored. Conversely, OLRE fail to reduce bias in zero-inflated data, and in some cases increase bias at high levels of overdispersion. There was a positive relationship between the magnitude of overdispersion and the degree of bias in parameter estimates. Critically, the simulations reveal that failing to account for overdispersion in mixed models can erroneously inflate measures of explained variance (r²), which may lead to researchers overestimating the predictive power of variables of interest. This work suggests use of observation-level random effects provides a simple and robust means to account for overdispersion in count data, but also that their ability to minimise bias is not uniform across all types of overdispersion and must be applied judiciously.
    PeerJ 10/2014; 2:e616. DOI:10.7717/peerj.616
  • ABSTRACT: Many multispecies models have assumed that prey density determines per-capita predator consumption rates, following a functional response relationship. However, empirical evidence suggests that a predator's diet can also be influenced by a variety of environmental factors, including interactions with other predators. We used diet and abundance data from National Marine Fisheries Service (NMFS) bottom trawl surveys for three groundfish predators (Pacific cod (Gadus macrocephalus), Pacific halibut (Hippoglossus stenolepis), and sablefish (Anoplopoma fimbria)) in the Gulf of Alaska (GOA) to determine whether temperature or other species influence the consumption of walleye pollock (Gadus chalcogrammus). Using an information-theoretic approach, we tested for relationships between walleye pollock observed in predator stomachs and predator length, bottom temperature, prey availability (walleye pollock catch per unit effort (CPUE) scaled by observed prey lengths), and CPUE of the three predators and arrowtooth flounder (Atheresthes stomias). Predator length was positively related to walleye pollock presence and proportion of total diet mass in all predators. Increased temperatures negatively affected consumption of walleye pollock by Pacific halibut, but not the other predators. We found evidence for a number of interpredator effects of co-occurring predators, both positive (facultative) and negative (competitive). Surprisingly, observed prey density was not statistically significant with respect to consumption for these predators, suggesting either that trawls sample the environment very differently from how walleye pollock predators do, or that species interactions are more complex than those used in previous multispecies models. These factors should be considered for future models contributing to ecosystem-based management.
    Canadian Journal of Fisheries and Aquatic Sciences 08/2014; 71(8):1123-1133. DOI:10.1139/cjfas-2013-0260 · 2.28 Impact Factor
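
The observation-level random effect (OLRE) mechanism summarized in the abstract with DOI:10.7717/peerj.616 can be illustrated by simulation: giving each count its own normal random effect on the log scale produces extra-Poisson variation. The minimal Python sketch below only generates data and checks the Poisson-lognormal moments, E[y] = exp(η + σ²/2) and Var[y] = E[y] + (exp(σ²) − 1)·exp(2η + σ²); it does not fit an OLRE mixed model (that requires GLMM software), and the function names and parameter values are arbitrary illustrative choices, not those used in that study.

```python
import numpy as np


def simulate_olre_counts(n, eta, sigma, rng):
    """Poisson counts with an observation-level random effect: log(lambda_i) = eta + eps_i."""
    eps = rng.normal(0.0, sigma, size=n)   # one random-effect level per observation
    return rng.poisson(np.exp(eta + eps))


def olre_moments(eta, sigma):
    """Marginal mean and variance of the resulting Poisson-lognormal count."""
    mean = np.exp(eta + sigma**2 / 2)
    var = mean + (np.exp(sigma**2) - 1) * np.exp(2 * eta + sigma**2)
    return mean, var


rng = np.random.default_rng(42)
eta, sigma = np.log(5.0), 0.7               # hypothetical values for illustration
y = simulate_olre_counts(200_000, eta, sigma, rng)
mean, var = olre_moments(eta, sigma)
print(f"empirical mean {y.mean():.2f}, var {y.var():.2f}")
print(f"theoretical mean {mean:.2f}, var {var:.2f}, var/mean {var / mean:.2f}")
```

With these values the theoretical variance-to-mean ratio is about 5, whereas a plain Poisson model assumes it equals 1, so ignoring the random effect would understate the uncertainty considerably.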