Article

Bayesian capture-recapture analysis and model selection allowing for heterogeneity and behavioral effects


Abstract

In this article, we present a Bayesian analysis of capture-recapture models for a closed population that allow for heterogeneity of capture probabilities between animals and bait/trap (behavioral) effects. We use a flexible discrete mixture model to account for the heterogeneity and behavioral effects. In addition, we present a solid model selection criterion. Through illustrations with a motivating dataset, we demonstrate how Bayesian analysis can be applied in this setting and discuss some major benefits that result, including the use of informative priors based on historical data.
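As a concrete illustration of the discrete-mixture idea (the heterogeneity part only; the behavioral component would enter through history-dependent capture probabilities), here is a minimal Python sketch, with hypothetical names and data, of the likelihood of capture-count frequencies under a two-class mixture, evaluated at a candidate population size N:

import numpy as np
from scipy.special import gammaln, comb

def log_lik_mixture(N, weights, thetas, freq, t):
    # P(an animal is captured x times out of t), mixed over latent classes
    x = np.arange(0, t + 1)
    px = sum(w * comb(t, x) * th**x * (1 - th)**(t - x)
             for w, th in zip(weights, thetas))
    D = freq.sum()                             # distinct animals ever captured
    counts = np.concatenate(([N - D], freq))   # prepend the unseen count
    # multinomial log-likelihood over the capture-count cells 0..t
    return (gammaln(N + 1) - gammaln(counts + 1).sum()
            + (counts * np.log(px)).sum())

# hypothetical frequencies: freq[x-1] animals caught exactly x of t=5 times
freq = np.array([38, 16, 7, 2, 1])
print(log_lik_mixture(120, [0.7, 0.3], [0.05, 0.35], freq, t=5))

Profiling this function over a grid of N, or embedding it in an MCMC sampler with priors on (N, weights, thetas), gives the kind of Bayesian analysis the abstract outlines.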


... Bayesian methods have also become popular in capture-recapture studies. An extensive Bayesian literature on capture-recapture closed population models includes Castledine (1981), Smith (1991), George and Robert (1992), Madigan and York (1997), Basu and Ebrahimi (2001), Ghosh and Norris (2005), King and Brooks (2008), and Ghosh (2009, 2011). Bayesian statistical modeling requires the development of the likelihood function of the observed data, given a set of parameters, as well as the joint prior distribution of all model parameters. ...
Article
Full-text available
Modeling individual heterogeneity in capture probabilities has been one of the most challenging tasks in capture–recapture studies. Heterogeneity in capture probabilities can be modeled as a function of individual covariates, but the correlation structure among capture occasions should be taken into account. We propose generalized estimating equations (GEE) and generalized linear mixed modeling (GLMM) approaches to estimate capture probabilities and population size for capture–recapture closed population models. An example is used for an illustrative application and for comparison with currently used methodology. A simulation study is also conducted to show the performance of the estimation procedures. Our simulation results show that the proposed quasi-likelihood GEE approach provides smaller standard errors than partial likelihood based on either generalized linear models (GLM) or GLMM approaches for estimating population size in a closed capture–recapture experiment. Estimator performance is good if a large proportion of individuals is captured. For cases where only a small proportion of individuals is captured, the estimates become unstable, but the GEE approach outperforms the other methods.
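A minimal sketch of the GEE route (statsmodels assumed; the data, the covariate, and the Horvitz-Thompson step below are hypothetical illustrations, not the authors' exact procedure):

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)

# hypothetical long-format data: one row per animal x occasion, with a
# binary capture indicator and an individual covariate (body weight);
# with real data the rows would cover captured animals only
t, n = 5, 50
df = pd.DataFrame({
    "animal": np.repeat(np.arange(n), t),
    "weight": np.repeat(rng.gamma(5.0, 1.0, n), t),
})
df["caught"] = rng.binomial(1, 0.3, len(df))

X = sm.add_constant(df[["weight"]])
res = sm.GEE(df["caught"], X, groups=df["animal"],
             family=sm.families.Binomial(),
             cov_struct=sm.cov_struct.Exchangeable()).fit()

# Horvitz-Thompson style population-size estimate: each captured animal
# contributes 1 / P(captured at least once in t occasions)
p = res.predict(X)[::t]                  # one fitted probability per animal
N_hat = np.sum(1.0 / (1.0 - (1.0 - p) ** t))
print(res.params, N_hat)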
... This enduring effect is called a trap-happiness or trap-shyness effect according to whether the recapture probability becomes larger or smaller. Even this very simple one-parameter extension sheds some light on the population under study, and the presence of a behavioural effect can have a great impact on the estimate of the unknown population size (Yip et al., 2000; Hwang et al., 2002; Hwang and Huggins, 2011; Lee and Chen, 1998; Chao et al., 2000; Lee et al., 2003; Ghosh and Norris, 2005; Alunni Fegatelli and Tardella, 2013). However, this specific type of behavioural effect is only a limited device for understanding the complex behavioural patterns that can arise in multiple capture-recapture designs. ...
Article
We develop new strategies for building and fitting flexible classes of parametric capture-recapture models for closed populations, aimed at a better understanding of behavioural patterns. We first rely on a conditional probability parameterization and review how a large subset of standard capture-recapture models can be regarded as a suitable partitioning into equivalence classes of the full set of conditional probability parameters. We then propose new quantifications of the conditioning binary partial capture histories as a device for enlarging the scope of flexible behavioural models and for exploring the range of all possible partitions. We show how one can easily find the unconditional MLE of such models within a generalized linear model framework. We illustrate the potential of our approach with the analysis of some known datasets and a simulation study.
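To illustrate the quantification idea in its simplest form (an indicator of any previous capture used as a covariate in a logistic GLM), here is a hedged Python sketch; it builds the design matrix from observed histories only and omits the unconditional-MLE step over N, so the baseline rate is biased by conditioning on capture:

import numpy as np
import statsmodels.api as sm

def trap_response_design(histories):
    # long-format design for an Mb-type logistic model: capture at
    # occasion j regressed on an indicator of any capture before j
    y = histories.ravel()
    prev = (np.cumsum(histories, axis=1) - histories) > 0
    X = sm.add_constant(prev.ravel().astype(float))
    return y, X

# hypothetical trap-happy population: recapture prob 0.4 vs 0.15 at first
rng = np.random.default_rng(1)
hist = np.zeros((200, 6), dtype=int)
for i in range(200):
    caught = False
    for j in range(6):
        hist[i, j] = rng.binomial(1, 0.4 if caught else 0.15)
        caught = caught or hist[i, j] == 1
hist = hist[hist.sum(axis=1) > 0]   # only ever-captured animals are observed

y, X = trap_response_design(hist)
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(fit.params)   # a positive slope on the indicator suggests trap-happiness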
... Previous papers based on a latent class approach are Bartolucci and Forcina (2001), Stanghellini and van der Heijden (2004) and Bartolucci and Forcina (2006). Fewer authors have adopted a Bayesian approach for the simplest permanent behavioural settings (Lee and Chen 1998; Lee et al. 2003; Ghosh and Norris 2005), while alternative estimation approaches have been more recently proposed to cope with behavioural modeling in continuous-time recapture settings (Chaiyapong and Lloyd 1997; Yip et al. 2000; Chao et al. 2000; Hwang et al. 2002). In Sect. 2 we will adopt as model setup a rather general modeling framework recently proposed by Farcomeni (2011), where the contingency table probabilities of all possible capture histories are reparameterized in terms of conditional probabilities. ...
Article
In the context of capture-recapture modeling for estimating the unknown size of a finite population, a flexible framework is often required for dealing with a behavioural response to trapping. Many alternative settings have been proposed in the literature to account for the variation of capture probability at each occasion depending on the previous capture history. Inference is typically carried out relying on the so-called conditional likelihood approach. We highlight that such an approach may, with positive probability, lead to inferential pathologies such as unbounded estimates of the finite population size. The occurrence of such likelihood failures is characterized within a very general class of behavioural effect models. It is also pointed out that a fully Bayesian analysis overcomes the likelihood failure phenomenon. The overall improved performance of alternative Bayesian estimators is investigated under different non-informative prior distributions, verifying their comparative merits with both simulated and real data.
... In this study, a visual criterion was used, given the need to identify the model whose fit was closest to the observed data. However, as a tie-breaker, an analytical criterion was adopted, namely the Gelfand and Ghosh criterion (GGC) (14). The parameter estimates of the mixture models were obtained with the Bayesian method, which provides more precise results for the class of models used in this study (15). ...
Article
Full-text available
OBJECTIVE: To develop a simulation model using public data to estimate the cancer care infrastructure required by the public health system in the state of São Paulo, Brazil. METHOD: Public data from the Unified Health System database regarding cancer surgery, chemotherapy, and radiation therapy, from January 2002-January 2004, were used to estimate the number of cancer cases in the state. The percentages recorded for each therapy in the Hospital Cancer Registry of Brazil were combined with the data collected from the database to estimate the need for services. Mixture models were used to identify subgroups of cancer cases with regard to the length of time that chemotherapy and radiation therapy were required. A simulation model was used to estimate the infrastructure required taking these parameters into account. RESULTS: The model indicated the need for surgery in 52.5% of the cases, radiation therapy in 42.7%, and chemotherapy in 48.5%. The mixture models identified two subgroups for radiation therapy and four subgroups for chemotherapy with regard to mean usage time for each. These parameters yielded the following infrastructure estimates: 147 operating rooms, 2 653 operating beds, 297 chemotherapy chairs, and 102 radiation therapy devices. These estimates suggest the need for a 1.2-fold increase in the number of chemotherapy services and a 2.4-fold increase in the number of radiation therapy services when compared with the parameters currently used by the public health system. CONCLUSION: A simulation model, such as the one used in the present study, permits better distribution of health care resources because it is based on specific, local needs.
... As suggested by Knorr- Held and Richardson (2003), we view the DIC values as rough indices for model evaluation, and did not use them to slavishly rank models, but used them in combination with the posterior distributions of meaningful model parameters to help interpret the results. For model checking, we also computed the MSPE metric which measures how well the posterior predictive distribution conforms to the observed data (e.g., Ghosh and Norris 2005). ...
Article
Full-text available
Underlying dynamic event processes unfolding in continuous time give rise to spatiotemporal patterns that are sometimes observable at only a few discrete times. Such event processes may be modulated simultaneously over several spatial (e.g., latitude and longitude) and temporal (e.g., age, calendar time, and cohort) dimensions. The ecological challenge is to understand the dynamic latent processes that were integrated over several dimensions (space and time) to produce the observed pattern: a so-called inverse problem. An example of such a problem is characterizing epidemiological rate processes from spatially referenced age-specific prevalence data for a wildlife disease such as chronic wasting disease (CWD). With age-specific prevalence data, the exact infection times are not observed, which complicates the direct estimation of rates. However, the relationship between the observed data and the unobserved rate variables can be described with likelihood equations. Typically, for problems with multiple timescales, the likelihoods are integral equations without closed forms. The complexity of the likelihoods often makes traditional maximum-likelihood approaches untenable. Here, using seven years of hunter-harvest prevalence data from the CWD epidemic in white-tailed deer (Odocoileus virginianus) in Wisconsin, USA, we develop and explore a Bayesian approach that allows for a detailed examination of factors modulating the infection rates over space, age, and time, and their interactions. Our approach relies on the Bayesian ability to borrow strength from neighbors in both space and time. Synthesizing a number of areas of event time analysis (current-status data, age/period/cohort models, Bayesian spatial shared frailty models), our general framework has very broad ecological applicability beyond disease prevalence data to a number of important ecological event time analyses, including general survival studies with multiple time dimensions for which existing methodology is limited. We observed strong associations of infection rates with age, gender, and location. The infection rate appears to be increasing with time. We could not detect growth hotspots or location-by-time interactions, which suggests that spatial variation in infection rates is determined primarily by when the disease arrives locally, rather than how fast it grows. We emphasize assumptions and the potential consequences of their violations.
... Also, it has an appealing interpretation as the sum of predictive variances and goodness-of-fit terms. Next, we briefly discuss this criterion (for a detailed discussion, see Gelfand and Ghosh 1998; for a simple discussion of the GGC statistic, see Ghosh and Norris (2005)). Define \Sigma as the observed covariance matrix and \Sigma_{pred} as the predicted covariance matrix generated from the posterior predictive distribution p(\Sigma_{pred} \mid \Sigma) = \int p(\Sigma_{pred} \mid \theta) \, p(\theta \mid \Sigma) \, d\theta, where p(\theta \mid \Sigma) is the posterior distribution of the parameter \theta given the observed covariance matrix. ...
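For concreteness, a minimal Python sketch (hypothetical array names) of how the squared-error-loss version of this criterion (the k -> infinity case) is computed from posterior predictive draws:

import numpy as np

def gelfand_ghosh(y_obs, y_rep):
    # y_rep: posterior predictive draws, one row per MCMC iteration
    mu = y_rep.mean(axis=0)              # predictive means
    G = np.sum((mu - y_obs) ** 2)        # goodness-of-fit term
    P = np.sum(y_rep.var(axis=0))        # penalty: sum of predictive variances
    return G + P, G, P

Smaller values are better; the same posterior predictive draws also yield the MSPE-style model checks mentioned in the excerpt.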
Article
Doctoral dissertation, Rijksuniversiteit Groningen. With bibliography and a summary in Dutch.
... Several methods produce a value for each candidate model to be compared among a set of pre-selected models [Bayes factor, Kass & Raftery, 1995; mean square predictive error, e.g. Ghosh & Norris, 2005; deviance information criterion (DIC), Spiegelhalter et al., 2002; Bayesian information criterion (BIC), e.g. Link & Barker, 2006]. ...
Article
Full-text available
The impact of the ongoing rapid climate change on natural systems is a major issue for human societies. An important challenge for ecologists is to identify the climatic factors that drive temporal variation in demographic parameters, and, ultimately, the dynamics of natural populations. The analysis of long-term monitoring data at the individual scale is often the only available approach to reliably estimate demographic parameters of vertebrate populations. We review statistical procedures used in these analyses to study links between climatic factors and survival variation in vertebrate populations. We evaluated the efficiency of various statistical procedures from an analysis of survival in a population of white stork, Ciconia ciconia, a simulation study, and a critical review of 78 papers published in the ecological literature. We identified six potential methodological problems: (i) the use of statistical models that are not well suited to the analysis of long-term monitoring data collected at the individual scale; (ii) low ratios of number of statistical units to number of candidate climatic covariates; (iii) collinearity among candidate climatic covariates; (iv) the use of statistics, to assess statistical support for climatic covariate effects, that deal poorly with unexplained variation in survival; (v) spurious detection of effects due to the co-occurrence of trends in survival and the climatic covariate time series; and (vi) assessment of the magnitude of climatic effects on survival using measures that cannot be compared across case studies. The critical review of the ecological literature revealed that five of these six methodological problems were often poorly tackled. As a consequence we concluded that many of these studies generated hypotheses but only a few provided solid evidence for impacts of climatic factors on survival or reliable measures of the magnitude of such impacts. We provide practical advice to solve efficiently most of the methodological problems identified. The only frequent issue that still lacks a straightforward solution was the low ratio of the number of statistical units to the number of candidate climatic covariates. With a view to increasing this ratio, and therefore producing more robust analyses of the links between climate and demography, we suggest ways to improve the procedures for designing field protocols and selecting a set of candidate climatic covariates. Finally, we present recent statistical methods with potential interest for assessing the impact of climatic factors on demographic parameters.
Article
The current paper presents a comprehensive analysis of a bivariate Dirichlet process mixture spatial model for estimation of pedestrian and bicycle crash counts. This study focuses on active transportation at the traffic analysis zone (TAZ) level by developing a semi-parametric model that accounts for unobserved heterogeneity by combining the strengths of a bivariate specification for correlation among crash modes, spatial random effects for the impact of neighboring TAZs, and a Dirichlet process mixture for the random intercept. Three alternate models, one Dirichlet and two parametric, are also developed for comparison based on different criteria. Bicycle and pedestrian crashes are observed to share three influential variables: the positive correlation with K12 student enrollment, bike-lane density, and the percentage of arterial roads. The heterogeneity error term demonstrates the presence of a statistically significant correlation between the bicycle and pedestrian crashes, whereas the spatial random effect term indicates the absence of a significant correlation for the area under focus. The Dirichlet models are consistently superior to non-Dirichlet ones under all evaluation criteria. Moreover, the Dirichlet models exhibit the capability to identify latent distinct subpopulations and suggest that the normality assumption on the intercept associated with traditional parametric models does not hold for the TAZ-level crash dataset of the current study.
Article
Full-text available
Capture-recapture models have been widely used to estimate animal population size. This work considers models for closed populations, which assume that the number of individuals in the population remains constant during the study period. An estimator of population size will be biased in the presence of heterogeneity in capture probability related to inherent characteristics of the individuals. That sort of heterogeneity is difficult to measure because it is not observable. This kind of problem has traditionally been approached using capture-recapture models for closed populations, designated Mh and Mth. In this work, logistic regression is used to model observable heterogeneity in individual capture probabilities using covariates, and non-observable heterogeneity is modelled as random effects, through a Bayesian approach. The capture probability estimates are used to estimate the population size. Estimators based on Mth models are compared when only observable heterogeneity is modelled and when both observable and non-observable heterogeneity are modelled. This work is illustrated by an example.
Article
Full-text available
Modelling multiple fishing gear efficiencies and abundance for aggregated populations using fishery or survey data. – ICES Journal of Marine Science, doi: 10.1093/icesjms/fsu068. Fish and wildlife often exhibit an aggregated distribution pattern, whereas local abundance changes constantly due to movement. Estimating population density or size and survey detectability (i.e. gear efficiency in a fishery) for such elusive species is technically challenging. We extend abundance and detectability (N-mixture) methods to deal with this difficult situation, particularly for application to fish populations where gear efficiency is almost never equal to one. The method involves a mixture of statistical models (negative binomial, Poisson, and binomial functions) at two spatial scales: between-cell and within-cell. The innovation in this approach is to use more than one fishing gear with different efficiencies to simultaneously catch (sample) the same population in each cell at the same time-step. We carried out computer simulations on a range of scenarios and estimated the relevant parameters using a Bayesian technique. We then applied the method to a demersal fish species, tiger flathead, to demonstrate its utility. Simulation results indicated that the models can disentangle the confounding parameters in gear efficiency and abundance, and the accuracy generally increases as sample size increases. A joint negative binomial–Poisson model using multiple gears gives the best fit to tiger flathead catch data, while a single gear yields unrealistic results. This cross-sampling method can evaluate gear efficiency cost effectively using existing fishery catch data or survey data. More importantly, it provides a means for estimating gear efficiency for gear types (e.g. gillnets, traps, hook and line, etc.) that are extremely difficult to study using field experiments.
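A much-reduced sketch of the within-cell idea, under the simplifying assumption that the two gears observe the cell independently (scipy assumed; the names and the N_max truncation are illustrative): the latent abundance N is summed out of the product of two binomial detection models and a negative binomial prior.

import numpy as np
from scipy.stats import binom, nbinom

def cell_loglik(n1, n2, q1, q2, size, mu, N_max=500):
    # catches (n1, n2) by two gears with efficiencies q1, q2 in one cell;
    # latent N ~ NegBin with dispersion `size` and mean `mu`, summed out
    N = np.arange(max(n1, n2), N_max + 1)
    p = size / (size + mu)                  # scipy's nbinom parameterization
    prior = nbinom.pmf(N, size, p)
    lik = binom.pmf(n1, N, q1) * binom.pmf(n2, N, q2) * prior
    return np.log(lik.sum())

print(cell_loglik(n1=12, n2=30, q1=0.1, q2=0.25, size=5.0, mu=120.0))

Using two gears with different efficiencies is what separates detectability from abundance: a single binomial observation cannot identify q and N jointly.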
Article
Full-text available
We describe a framework for spatial modeling of data from surveys of stream-dwelling fish species in which repeated counts are made of animals within a sample of habitat units. Using Bayesian modeling with Markov chain Monte Carlo (MCMC) algorithms, it is possible to estimate fish population size from repeated-count survey data while allowing fish detection probabilities to vary across the stream. We propose the use of conditional autoregressive models for modeling the spatial dependence of density across the habitat units of the stream. The spatial dependence model can be used along with covariate models for density and detection to predict density at unsampled units and thereby estimate total abundance across the stream. We apply these models to data sampled from an intensive repeated-count survey of juvenile coho salmon Oncorhynchus kisutch in McGarvey Creek, Northern California. Spatial dependence in fish density was detected, and models that account for spatial dependence produced more precise predictions at unsurveyed units, and thus more precise estimates of total stream abundance, than models that assumed spatial independence. Through a small simulation study, we show that ignoring heterogeneity in detection probabilities can lead to significant underestimation of total abundance. Inclusion of heterogeneity by means of a random effect in the detection component of the model can lead to numerical instability of the MCMC method, and we stress the importance of accounting for heterogeneity by incorporating covariates in modeling detection probability.
Article
Full-text available
We propose a general and flexible capture-recapture model in continuous time. Our model incorporates time-heterogeneity, observed and unobserved individual heterogeneity, and behavioral response to capture. Behavioral response can possibly have a delayed onset and a finite-time memory. Estimation of the population size is based on the conditional likelihood after use of the EM algorithm. We develop an application to the estimation of the number of adult cannabinoid users in Italy.
Article
In a sample of mRNA species counts, sequences without duplicates or with small numbers of copies are likely to carry information related to mutations or diseases and can be of great interest. However, in some situations, sequence abundance is unknown and sequencing the whole sample to find the rare sequences is not practically possible. To collect mRNA sequences of interest, or more generally, species of interest, we propose a two-phase Bayesian sampling method that addresses these concerns. The first phase of the design is used to infer sequence (species) abundance levels through a cluster analysis applied to a pilot data set. The clustering method is built upon a multivariate hypergeometric model with a Dirichlet process prior for species relative frequencies. The second phase, through Monte Carlo simulations, infers the sample size necessary to collect a certain number of species of particular interest. Efficient posterior computing schemes are proposed. The developed approach is demonstrated and evaluated via simulations. An mRNA segment data set is used to illustrate and motivate the proposed sampling method.
Article
A capture–recapture estimation method for closed wildlife populations has been adapted by epidemiologists to estimate the size of a hidden or hard-to-reach population. The effect of heterogeneity of capture probabilities on the estimation of population size using capture–recapture data is considered in this article. A generalized estimating equation approach to the problem of estimating capture probabilities is presented that accounts for the heterogeneity of the study population. The resulting probabilities then serve as denominators for calculating the size of the population.
Article
Full-text available
Conventional biomass dynamics models express next year's biomass as this year's biomass plus surplus production less catch. These models are typically applied to species with several age-classes, but it is unclear how well they perform for short-lived species with low survival and high recruitment variation. Two alternative versions of the standard biomass dynamics model (Standard) were constructed for short-lived species by ignoring the 'old biomass' term (Annual), and assuming that the biomass at the start of the next year depends on density-dependent processes that are a function of that biomass (Stock-recruit). These models were fitted to catch and effort data for the grooved tiger prawn Penaeus semisulcatus using a hierarchical Bayesian technique. The results from the biomass dynamics models were compared with those from more complicated weekly delay-difference models. The analyses show that the Standard model is flexible for short-lived species; the Stock-recruit model provides the most parsimonious fit; simple biomass dynamics models can provide virtually identical results to data-demanding models; and spatial variability in key population dynamics parameters exists for P. semisulcatus. The method outlined in this paper provides a means to conduct quantitative population assessments for data-limited short-lived species.
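The conventional recursion referred to in the first sentence, in its common Schaefer (logistic surplus-production) form, can be sketched in a few lines; the parameter values here are hypothetical:

def schaefer_update(B, r, K, catch):
    # next biomass = current biomass + surplus production - catch,
    # with logistic surplus production r*B*(1 - B/K); floored at zero
    return max(B + r * B * (1.0 - B / K) - catch, 0.0)

# hypothetical five-year projection
B, series = 800.0, []
for C in [120, 150, 90, 200, 60]:
    B = schaefer_update(B, r=0.8, K=1500.0, catch=C)
    series.append(B)
print(series)

The "Annual" variant described in the abstract would drop the carried-over biomass term, so next year's biomass is driven by production alone.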
Article
Full-text available
In the context of capture–recapture experiments, heterogeneous capture probabilities are often perceived as one of the most challenging features to be incorporated in statistical models. In this paper we propose, within a Bayesian framework, a new modeling strategy for inference on the unknown population size in the presence of heterogeneity of subject characteristics. Our approach is attractive in that the parameters are easily interpretable. Moreover, no parametric distributional assumptions are imposed on the latent distribution of individual heterogeneous propensities to be captured. Bayesian inference based on the marginal likelihood bypasses some common identifiability issues, and a formal default prior distribution can be derived. Alternative default prior choices are considered and compared. Performance of our formal default approach is favorably evaluated with two real data sets and with a small simulation study. Keywords: Capture–recapture models; Heterogeneity; Bayesian inference; Population size; Default prior; Model choice. Mathematics Subject Classification (2000): 62F15, 62G05.
Article
Full-text available
The use of a finite dimensional Dirichlet prior in the finite normal mixture model has the effect of acting like a Bayesian method of sieves. Posterior consistency is directly related to the dimension of the sieve and the choice of the Dirichlet parameters in the prior. We find that naive use of the popular uniform Dirichlet prior leads to an inconsistent posterior. However, a simple adjustment to the parameters in the prior induces a random probability measure that approximates the Dirichlet process and yields a posterior that is strongly consistent for the density and weakly consistent for the unknown mixing distribution. The dimension of the resulting sieve can be selected easily in practice and a simple and efficient Gibbs sampler can be used to sample the posterior of the mixing distribution.
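A minimal sketch of the adjustment this abstract describes: drawing mixture weights from a symmetric Dirichlet with parameters alpha/K (rather than the uniform Dirichlet with all parameters equal to 1) yields a random measure that approximates a Dirichlet process with concentration alpha as the sieve dimension K grows. Names are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def finite_dirichlet_weights(alpha, K):
    # symmetric Dirichlet(alpha/K, ..., alpha/K): the parameters must
    # shrink with K for the posterior-consistency result to hold
    return rng.dirichlet(np.full(K, alpha / K))

w = finite_dirichlet_weights(alpha=1.0, K=50)
# a few weights dominate, mimicking the DP's stick-breaking behaviour
print(np.sort(w)[::-1][:5])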
Article
Model choice is a fundamental and much discussed activity in the analysis of datasets. Nonnested hierarchical models introducing random effects may not be handled by classical methods. Bayesian approaches using predictive distributions can be used, though the formal solution, which includes Bayes factors as a special case, can be criticised. We propose a predictive criterion where the goal is good prediction of a replicate of the observed data, tempered by fidelity to the observed values. We obtain this criterion by minimising posterior loss for a given model and then, for the models under consideration, selecting the one which minimises this criterion. For a broad range of losses, the criterion emerges in a form that partitions into a goodness-of-fit term and a penalty term. We illustrate its performance with an application to a large dataset involving residential property transactions.
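For squared-error loss, the partitioned form mentioned above can be written explicitly (a standard statement of the Gelfand–Ghosh criterion; here \mu_i and \sigma_i^2 denote the posterior predictive mean and variance of a replicate of observation y_i, and k \ge 0 weights fidelity to the data):

D_k = \frac{k}{k+1} \sum_{i=1}^{n} (\mu_i - y_i)^2 + \sum_{i=1}^{n} \sigma_i^2

The first sum is the goodness-of-fit term and the second the penalty; among candidate models, the one minimising D_k is selected.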
Article
Capture-recapture models are widely used in the estimation of population sizes. Based on data augmentation considerations, we show how Gibbs sampling can be applied to calculate Bayes estimates in this setting. As a result, formulations which were previously avoided because of analytical and numerical intractability can now be easily considered for practical application. We illustrate this potential by using Gibbs sampling to calculate Bayes estimates for a hierarchical capture-recapture model in a real example.
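The Gibbs idea can be seen in miniature in the simplest closed-population model M0 (a sketch, not the hierarchical model of the cited paper; names and priors are hypothetical): with a Poisson(lam) prior on N and a Beta(a, b) prior on the common capture probability p, both full conditionals are available in closed form, since p | N is Beta and (N - D) | p is Poisson(lam * (1-p)^t).

import numpy as np

rng = np.random.default_rng(42)

def gibbs_M0(D, n_total, t, lam=500.0, a=1.0, b=1.0, iters=5000):
    # D = distinct animals ever seen; n_total = total captures over t occasions
    N, draws = D, []
    for _ in range(iters):
        p = rng.beta(a + n_total, b + N * t - n_total)   # p | N, data
        N = D + rng.poisson(lam * (1.0 - p) ** t)        # N | p, data
        draws.append(N)
    return np.array(draws)

post_N = gibbs_M0(D=60, n_total=130, t=5)
print(post_N[1000:].mean())    # posterior mean of N after burn-in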
Article
We conduct nonparametric maximum likelihood estimation under two common heterogeneous closed population capture-recapture models. Our models specify mixture models (as did previous researchers' models) which have a common generating distribution, say F, for the capture probabilities. Using Lindsay and Roeder's (1992, Journal of the American Statistical Association, 87, 785-794) mixture model results and the EM algorithm, a nonparametric maximum likelihood estimator (MLE) of F for any specified population size N is obtained. Then, the nonparametric MLE of the (N, F) pair and thus for N is determined. Perhaps most importantly, since our MLE pair maximizes the likelihood under the entire nonparametric probability model, it provides an excellent foundation for estimating properties of estimators, conducting a goodness-of-fit test, and performing a likelihood ratio test. These are illustrated in the paper.
Article
We develop the non-parametric maximum likelihood estimator (MLE) of the full Mbh capture-recapture model, which utilizes both initial capture and recapture data and permits both heterogeneity (h) between animals and behavioural (b) response to capture. Our MLE procedure utilizes non-parametric maximum likelihood estimation of mixture distributions (Lindsay, 1983; Lindsay and Roeder, 1992) and the EM algorithm (Dempster et al., 1977). Our MLE provides the first non-parametric estimate of the bivariate capture-recapture distribution. Since non-parametric maximum likelihood estimation exists for the submodels Mh (allowing heterogeneity only), Mb (allowing behavioural response only) and M0 (allowing no changes), we develop maximum likelihood-based model selection, specifically the Akaike information criterion (AIC) (Akaike, 1973). The AIC procedure does well in detecting behavioural response but has difficulty in detecting heterogeneity.
Article
The problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion. These terms are a valid large-sample criterion beyond the Bayesian context, since they do not depend on the a priori distribution.
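The leading terms referred to give the familiar Schwarz/Bayesian information criterion; for a model with k free parameters, maximised likelihood L(\hat\theta), and n observations:

\mathrm{BIC} = -2 \log L(\hat\theta) + k \log n

and the model with the smallest BIC is preferred.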
Article
This paper considers estimation of the unknown size N of a population based on multiple capture‐recapture samples. We extend the Bayesian multiple recapture model to accommodate possible heterogeneity and dependence among the samples and possible heterogeneity within the samples. In the dependent model, we show that posterior inference for N is independent of almost all the nuisance parameters. We develop a flexible Bayesian model for heterogeneity within samples and demonstrate how Gibbs sampling can be used to calculate the Bayesian estimator for N and other quantities of interest. The performance of the proposed estimators is evaluated by simulation under both correct and incorrect model specifications, and we illustrate our methods in two examples about software review and estimation of a cottontail rabbit population.
Article
This paper considers from a Bayesian viewpoint inferences about the size of a closed animal population from data obtained by a multiple-recapture sampling scheme. The method developed enables prior information about the population size and the catch probabilities to be utilized to produce, in certain cases, considerable improvements over ordinary maximum likelihood methods. Several ways of expressing such prior information are explored and a practical example of their use is given. The main result of the paper is an approximation to the posterior distribution of the population size that exhibits the contributions made by the likelihood and the prior ideas.
Article
Agresti (1994, Biometrics 50, 494-500) and Norris and Pollock (1996a, Biometrics 52, 639-649) suggested using methods of finite mixtures to partition the animals in a closed capture-recapture experiment into two or more groups with relatively homogeneous capture probabilities. This enabled them to fit the models Mh, Mbh (Norris and Pollock), and Mth (Agresti) of Otis et al. (1978, Wildlife Monographs 62, 1-135). In this article, finite mixture partitions of animals and/or samples are used to give a unified linear-logistic framework for fitting all eight models of Otis et al. by maximum likelihood. Likelihood ratio tests are available for model comparisons. For many data sets, a simple dichotomy of animals is enough to substantially correct for heterogeneity-induced bias in the estimation of population size, although there is the option of fitting more than two groups if the data warrant it.
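The core of such finite-mixture fitting is a small EM iteration; below is a hedged sketch for a two-group mixture under Mh (capture counts x_i out of t occasions; for a fixed candidate N one would append N - D zero counts; binomial coefficients cancel in the E-step and are omitted). This is only the mixture step, not the full linear-logistic framework of the paper.

import numpy as np

def em_two_group(counts, t, iters=200):
    # returns mixing weight pi and the two group capture probabilities
    pi, th = 0.5, np.array([0.1, 0.4])
    x = np.asarray(counts, dtype=float)
    for _ in range(iters):
        # E-step: responsibility of group 1 for each animal
        l1 = pi * th[0]**x * (1 - th[0])**(t - x)
        l2 = (1 - pi) * th[1]**x * (1 - th[1])**(t - x)
        r = l1 / (l1 + l2)
        # M-step: weighted binomial MLEs
        pi = r.mean()
        th[0] = (r * x).sum() / (r.sum() * t)
        th[1] = ((1 - r) * x).sum() / ((1 - r).sum() * t)
    return pi, th

print(em_two_group([0, 0, 0, 1, 1, 1, 2, 3, 4, 5], t=6))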
Article
We consider the problem of comparing complex hierarchical models in which the number of parameters is not clearly defined. Using an information theoretic argument we derive a measure pD for the effective number of parameters in a model as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest. In general pD approximately corresponds to the trace of the product of Fisher's information and the posterior covariance, which in normal models is the trace of the 'hat' matrix projecting observations onto fitted values. Its properties in exponential families are explored. The posterior mean deviance is suggested as a Bayesian measure of fit or adequacy, and the contributions of individual observations to the fit and complexity can give rise to a diagnostic plot of deviance residuals against leverages. Adding pD to the posterior mean deviance gives a deviance information criterion for comparing models, which is related to other information criteria and has an approximate decision theoretic justification. The procedure is illustrated in some examples, and comparisons are drawn with alternative Bayesian and classical proposals. Throughout it is emphasized that the quantities required are trivial to compute in a Markov chain Monte Carlo analysis.
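Both pD and the DIC are trivial to compute from MCMC output, as the abstract notes; a minimal sketch (hypothetical array names):

import numpy as np

def dic(deviance_draws, deviance_at_posterior_mean):
    # pD = Dbar - D(thetabar): effective number of parameters;
    # DIC = Dbar + pD = 2*Dbar - D(thetabar)
    Dbar = np.mean(deviance_draws)
    pD = Dbar - deviance_at_posterior_mean
    return Dbar + pD, pD

Here deviance_draws holds the deviance evaluated at each posterior draw, and deviance_at_posterior_mean is the deviance evaluated at the posterior means of the parameters of interest.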
WinBUGS model fragment (recovered from the paper's code appendix; the final line is truncated in the source and is left truncated here):

freq[kstar] ~ dbin(p[kstar], N)        # observed frequency for history kstar
freqnew[kstar] ~ dbin(p[kstar], N)     # replicate from the posterior predictive
resid[kstar] <- log(freqnew[kstar]+0.5) - log(freq[kstar]+0.5)   # predictive residual
# mixture cell probabilities (latent class i, occasion indices j, l)
for(j in 1:k){ for(l in 1:(k-j+1)){ for(i in 1:r){
  cellprob[i,j,l] <- pi[i]*theta[1,i]*pow(1-theta[1,i],j-1)*
Akaike, H. (1973), "Information Theory and an Extension of the Maximum Likelihood Principle," in Proceedings of the Second International Symposium on Information Theory, eds. B. N. Petrov and F. Csaki, Budapest: Akademiai Kiado, pp. 267-281.
McLachlan, G., and Peel, D. (2000), Finite Mixture Models, New York: Wiley.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and van der Linde, A. (2002), "Bayesian Measures of Model Complexity and Fit" (with discussion), Journal of the Royal Statistical Society, Ser. B, 64, 583-639.
Brownie, C., Anderson, D. R., Burnham, K. P., and Robson, D. S. (1985), Statistical Inference From Band Recovery Data: A Handbook (2nd ed.), Washington, DC: U.S. Fish and Wildlife Service, Resource Publication 156.
Spiegelhalter, D., Thomas, A., Best, N., and Lunn, D. (2001), WinBUGS User Manual (Version 1.4), Cambridge, UK: MRC Biostatistics Unit. Available at http://www.mrc-bsu.cam.ac.uk/bugs.