Article

Simple Capture-Recapture Models Permitting Unequal Catchability and Variable Sampling Effort

Abstract

We consider two capture-recapture models that imply that the logit of the probability of capture is an additive function of an animal catchability parameter and a parameter reflecting the sampling effort. The models are special cases of the Rasch model, and satisfy the property of quasi-symmetry. One model is log-linear and the other is a latent class model. For the log-linear model, point and interval estimates of the population size are easily obtained using standard software, such as GLIM.
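As a minimal sketch of the additive logit structure described in the abstract, the Python snippet below computes capture probabilities from an animal parameter and an occasion parameter. The names `alpha_i` (animal catchability) and `beta_j` (sampling effort on occasion j) are illustrative and do not come from the paper, and the code assumes captures are independent across occasions once the animal is fixed.

```python
import math

def capture_probability(alpha_i, beta_j):
    """P(animal i is caught on occasion j) when logit(p_ij) = alpha_i + beta_j."""
    return 1.0 / (1.0 + math.exp(-(alpha_i + beta_j)))

def history_probability(alpha_i, betas, history):
    """Probability of a 0/1 capture history for one animal.

    Assumes independence across occasions given the animal (hypothetical
    helper for illustration only).
    """
    prob = 1.0
    for beta_j, caught in zip(betas, history):
        p = capture_probability(alpha_i, beta_j)
        prob *= p if caught else 1.0 - p
    return prob

# Two histories with the same number of captures: their probability ratio
# depends only on the effort parameters, not on the animal parameter.
betas = [0.3, -0.5]
ratio_shy = history_probability(-1.0, betas, (1, 0)) / history_probability(-1.0, betas, (0, 1))
ratio_bold = history_probability(2.0, betas, (1, 0)) / history_probability(2.0, betas, (0, 1))
```

In this sketch both ratios equal exp(beta_1 - beta_2), which illustrates why the animal parameter drops out when comparing histories with equal capture totals.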

... If a logit-link is used for and a log-link for in this situation to model the effect of covariates, say, the required constraints become nonlinear and cannot be coded using the design matrix. Another example where constraints are linear under one parametrization and nonlinear under another is the closed-population heterogeneity model of Agresti (1994) and Tjur (1982). Using a log-linear specification in which the encounter history probabilities are expressed in terms of parameters representing log-odds of capture and log-odds ratios, the required constraints are linear. ...
... Most of the work has been on closed population models, although there has been some work on modelling open population models. The log-linear approaches pioneered by Fienberg (1972), Cormack (1989) and Agresti (1994) provide a general framework for analysing mark-recapture data. In this approach the data can be thought of as contributing to a 2^k contingency table with each sample generating a binary classification according to whether or not an animal is caught (0,1). ...
... For the biologist, the parameters need to be expressed in terms of natural parameters such as survival and capture probabilities. If the log-linear approach is to develop, research needs to focus on identifying constraints on interaction terms that correspond to a reasonable set of constraints on the capture/recovery/resighting processes such as age-dependence, temporary trap response, and simple heterogeneity (Agresti, 1994; Tjur, 1982). ...
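The contingency-table view in the snippet above can be sketched in Python as follows. This is only an illustration: the function name `tabulate_histories` and the toy data are hypothetical, and the cell for animals never caught is marked as unknown because it cannot be observed.

```python
from collections import Counter
from itertools import product

def tabulate_histories(histories, k):
    """Count animals in each cell of the 2^k table of capture histories.

    `histories` is an iterable of length-k tuples of 0/1 (1 = caught).
    The all-zero cell is set to None: animals never caught are unobserved.
    """
    counts = Counter(tuple(h) for h in histories)
    table = {cell: counts.get(cell, 0) for cell in product((0, 1), repeat=k)}
    table[(0,) * k] = None
    return table

# Toy data: three animals, three sampling occasions.
table = tabulate_histories([(1, 0, 1), (1, 0, 1), (0, 1, 0)], k=3)
```

Estimating the population size then amounts to estimating the count in the missing all-zero cell and adding it to the observed total.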
Article
With a proliferation of mark–recapture models and studies collecting mark–recapture data, software and analysis methods are being continually revised. We consider the construction of the likelihood for a general model that incorporates all the features of the recently developed models: it is a multistate robust–design mark–recapture model that includes dead recoveries and resightings of marked animals and is parameterised in terms of state–specific recruitment, survival, movement, and capture probabilities, state–specific abundances, and state–specific recovery and resighting probabilities. The construction that we outline is based on a factorisation of the likelihood function with each factor corresponding to a different component of the data. Such a construction would allow the likelihood function for a mark–recapture analysis to be customized according to the components that are actually present in the dataset.
... This is often undertaken in population censuses on the basis of geography, race/ethnicity, housing characteristics, age and sex (for example, Hogan (1992) for the US and Brown et al. (1999) for the UK). When the covariates that account for the heterogeneity of capture are continuous, instead of categorical, so that in effect there are as many categories as individuals, Rasch-type models can be used (Agresti 1994; Fienberg et al. 1999). The choice of these covariates to ensure that capture is homogeneous across individuals requires a great deal of effort, and it is inevitable that, in some applications, there is a failure to account for all the heterogeneity leading to inaccurate estimates of the population (Chao 2001). ...
... Several authors have discussed latent class analysis within a capture-recapture framework. For instance, Agresti (1994) fits various latent class models to estimate the population of snowshoe hares. In an application to human populations, Bruno et al. (1994) estimate the incidence of diabetes in the northern Italian town of Casale Monferrato, while Wang and Thandrayen (2009) use a similar approach to estimate the number of homeless people in the Australian city of Adelaide's central business district. ...
... In particular, we can consider various two-way and three-way interaction terms. In essence, simultaneously modelling over strata has the advantage over modelling each stratum separately since it enables selection of more parsimonious models through restricting certain parameters to be equal over specific strata (Agresti 1994). ...
Article
Estimation of the unknown population size using capture-recapture techniques relies on the key assumption that the capture probabilities are homogeneous across individuals in the population. This is usually accomplished via post-stratification by some key covariates believed to influence individual catchability. Another issue that arises in population estimation from data collected from multiple sources is list dependence, where an individual’s catchability on one list is related to that of another list. The earlier models for population estimation heavily relied upon list independence. However, there are methods available that can adjust the population estimates to account for dependence among lists. In this article, we propose the use of latent class analysis through log-linear modelling to estimate the population size in the presence of both heterogeneity and list dependence. The proposed approach is illustrated using data from the 1988 US census dress rehearsal.
... Rasch and log-linear models (see also Chapters 19, 20, and 22), for example, lead to estimating the individual probability to be registered by a given source depending on the source capture "ability" (quality) and the individual proneness to be registered (latent factor), see e.g. Agresti [2]. ...
... We revisit the snowshoe hares data (Cormack [86], Agresti [2]), where a sample of n = 68 hares was observed at least once on six occasions. For these data, in the literature a strong sensitivity to the dependency structure is recognized, with estimates ranging from N = 70 to N = 90 for a set of models with and without heterogeneity. ...
... Therefore, the CMP estimator could be a good candidate to estimate the unknown population size. Parameter estimates are ν = 0.77, with λ = 1.43 and the resulting estimated population size is N = 86, slightly higher than the one estimated in Agresti [2], but in line with aforementioned works. If we remove the 2 hares caught 6 times (see Figure 3.1(b)), as Cormack [86], the situation changes considerably and underdispersion is estimated (λ = 2.16; ν = 1.25), with N = 78, close to the estimate proposed by Farcomeni [114]. ...
Chapter
At the beginning of this century, the Dutch government initiated a project to monitor the size of the illegal immigrant population in the Netherlands, which resulted in a series of publications with yearly estimates [111,176,282,285]. The estimates are based on data extractions from police records involving illegal immigrants who had come into contact with the police during that year. These police data consist of a single record for each police contact, and also include covariates such as age and gender. By counting the number of police contacts yi = 1, 2, 3, … for each individual i = 1, …, n in the observed data, a zero-truncated count distribution is obtained. Under the assumption that the counts follow a Poisson distribution with observed heterogeneity, i.e. heterogeneity that is completely described by observed covariates, the zero-truncated Poisson regression model can be used to estimate the total population size N.
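The zero-truncated Poisson idea in this abstract can be illustrated with a simplified Python sketch that ignores covariates entirely, so it is not the regression model the chapter describes. It assumes a single Poisson rate `lam` for everyone, solves the zero-truncated likelihood equation mean(y) = lam / (1 - exp(-lam)) by bisection, and then scales the observed count by the estimated probability of being seen at least once (a Horvitz-Thompson style estimator). The function name and the toy counts are hypothetical.

```python
import math

def fit_zero_truncated_poisson(counts, tol=1e-10):
    """Return (lam_hat, n_hat) for zero-truncated Poisson counts, no covariates.

    counts: observed contacts per individual, all >= 1.
    """
    n = len(counts)
    mean = sum(counts) / n
    if mean <= 1.0:
        raise ValueError("need at least one count above 1 to estimate the rate")

    def truncated_mean(lam):
        # Mean of a Poisson(lam) variable conditional on being positive.
        return lam / -math.expm1(-lam)

    # truncated_mean is increasing, tends to 1 as lam -> 0, and exceeds lam,
    # so the root lies strictly between 0 and the sample mean.
    lo, hi = 1e-9, mean
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if truncated_mean(mid) < mean:
            lo = mid
        else:
            hi = mid
    lam_hat = 0.5 * (lo + hi)
    # Each observed individual represents 1 / P(count > 0) individuals.
    n_hat = n / -math.expm1(-lam_hat)
    return lam_hat, n_hat

toy_counts = [1, 1, 1, 2, 2, 3]  # made-up data, for illustration only
lam_hat, n_hat = fit_zero_truncated_poisson(toy_counts)
```

With covariates, the single rate would be replaced by an individual rate from a regression, and each observed individual would contribute its own inverse detection probability to the sum.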
... The random effects model variation across individuals in the probability of being observed. Early proposals modeled the random effects using a parametric mixing distribution, often chosen because of its mathematical tractability (Sanathanan [1973], Agresti [1994], Darroch et al. [1993]). Other approaches have sought to estimate the mixing distribution in a non-parametric way (Mao [2008]). ...
... The M_th model (Agresti [1994]) is given by the following joint distribution for the list capture random variables X_1, ..., X_T: ...
... These data give multiple-recapture history of n = 68 hares across T = 6 occasions. Previous analyses, including Cormack [1989], Agresti [1994], Dorazio and Royle [2003], Pledger [2005], generally agree that considerable capture heterogeneity across individuals is present. Unsurprisingly, given the small sample size, these analysts have also noted that different models accounting for heterogeneity seem to fit the data comparably well, yet result in different estimates of the population size, N. ...
Article
Capture-recapture methods aim to estimate the size of a closed population on the basis of multiple incomplete enumerations of individuals. In many applications, the individual probability of being recorded is heterogeneous in the population. Previous studies have suggested that it is not possible to reliably estimate the total population size when capture heterogeneity exists. Here we approach population estimation in the presence of capture heterogeneity as a latent length biased nonparametric density estimation problem on the unit interval. We show that in this setting it is generally impossible to estimate the density on the entire unit interval in finite samples, and that estimators of the population size have high and sometimes unbounded risk when the density has significant mass near zero. As an alternative, we propose estimating the population of individuals with capture probability exceeding some threshold. We provide methods for selecting an appropriate threshold, and show that this approach results in estimators with substantially lower risk than estimators of the total population size, with correspondingly smaller uncertainty, even when the parameter of interest is the total population. The alternative paradigm is demonstrated in extensive simulation studies and an application to snowshoe hare multiple recapture data.
... If there are no covariates that can explain the heterogeneity, then individuals can be thought of as having some random effects that determine their catchability in each sample. So under the assumption of independence across individuals, Darroch et al. (1993) and Agresti (1994) allow for heterogeneous capture probabilities by using a logit model with random effects. This model is the same as that suggested by Rasch (1960) in an application to educational testing, with individuals differing on a continuous scale. ...
... However, maximum likelihood estimation has a disadvantage in its assumption of asymptotic normality in capture-recapture. Seber (1982) and Agresti (1994) showed that the distribution of the population size estimator can be markedly skewed, and so the assumption of asymptotic normality may be flawed. Buckland and Garthwaite (1991) suggested a bootstrapping procedure as a method of quantifying the precision of the population size estimator. ...
... Cormack (1992), Agresti (1994) and Coull and Agresti (1999) use a profile likelihood function that views the maximized likelihood as a function of the unobserved cell count. In fact, Coull and Agresti (1999) found that rather than being centred in the interval, it is common for the population estimator to be nearer to the low end of the interval which is indicative of the skewness. ...
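To make the profile likelihood idea concrete, here is a hedged Python sketch for the simplest possible setting: two samples with independent, homogeneous capture. It is not the heterogeneity model of the cited papers. For each candidate population size N (equivalently, each value of the unobserved cell count N - n), the capture probabilities are replaced by their maximizing values n1/N and n2/N, and the interval collects every N whose profile log-likelihood lies within the 95% chi-square cutoff of the maximum. All names and the example counts are hypothetical, and both sample sizes are assumed to be positive.

```python
import math

CHI2_95_1DF = 3.841  # 95% quantile of the chi-square distribution, 1 df

def profile_loglik(N, n1, n2, n):
    """Profile log-likelihood of N (up to a constant) for two independent samples.

    n1, n2: animals caught in sample 1 and sample 2; n: distinct animals seen.
    """
    ll = math.lgamma(N + 1) - math.lgamma(N - n + 1)
    for caught in (n1, n2):
        p_hat = caught / N  # capture probability maximizing the likelihood at this N
        ll += caught * math.log(p_hat)
        if N > caught:
            ll += (N - caught) * math.log(1.0 - p_hat)
    return ll

def profile_interval(n1, n2, m, max_factor=20):
    """Return (lower, mle, upper) for N by scanning integers from n upward.

    m: animals caught in both samples. The scan stops at max_factor * n,
    an arbitrary cap chosen for this sketch.
    """
    n = n1 + n2 - m
    logliks = {N: profile_loglik(N, n1, n2, n) for N in range(n, max_factor * n + 1)}
    mle = max(logliks, key=logliks.get)
    inside = [N for N, ll in logliks.items()
              if 2.0 * (logliks[mle] - ll) <= CHI2_95_1DF]
    return min(inside), mle, max(inside)

lower, mle, upper = profile_interval(n1=50, n2=30, m=20)
```

Because the interval is read off the likelihood itself rather than built symmetrically around the estimate, it can extend further above the estimate than below it, which is the skewness the snippet refers to.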
Thesis
The theoretical framework of estimating the population totals from the Census, Survey and an Administrative Records List is based on capture-recapture methodology which has traditionally been employed for the measurement of abundance of biological populations. Under this framework, in order to estimate the unknown population total, N, an initial set of individuals is captured. Further subsequent captures are taken at later periods. The possible capture histories can be represented by the cells of a 2^r contingency table, where r is the number of captures. This contingency table will have one cell missing, corresponding to the population missed in all r captures. If this cell count can be estimated, adding this to the sum of the observed cells will yield the population size of interest. There are a number of models that may be specified based on the incomplete (2^r
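For the smallest case, r = 2 captures, the missing-cell idea can be sketched in Python under the assumption that the two captures are independent. This is an illustrative special case rather than the thesis's method: independence forces the odds ratio of the 2 x 2 table to equal 1, which pins down the unobserved cell. The function and argument names are hypothetical.

```python
def estimate_missing_cell(n11, n10, n01):
    """Estimate the unobserved cell and population size for two captures.

    n11: caught both times; n10: first capture only; n01: second capture only.
    Assumes independent captures, so that n11 * n00 = n10 * n01.
    """
    if n11 == 0:
        raise ValueError("no recaptures: the missing cell cannot be estimated")
    n00_hat = n10 * n01 / n11
    n_hat = n11 + n10 + n01 + n00_hat
    return n00_hat, n_hat

n00_hat, n_hat = estimate_missing_cell(n11=20, n10=30, n01=10)
```

With these made-up counts the estimated missing cell is 15 and the estimated population size is 75. With more than two captures, a log-linear model fitted to the observed cells plays the same role.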
... For validation purposes, the subpopulation abundance was also calculated with a method based on loglinear estimation that includes the detectability of individuals (Jolly 1963, 1965; Agresti 1994; Baillargeon and Rivest 2007). This estimation was provided by the Rcapture package working in the R environment. ...
... By contrast, one may argue that the problem of the differences in catch effectiveness may be simply solved by using the methods based on the Jolly-Seber models (Jolly 1963, 1965). The input data for these models are supplied by the matrix of individual observations during consecutive 'capture occasions', which enables the catch probability to be calculated and included in the estimation (Cormack 1992; Agresti 1994; Royle 2006). However, when netting butterflies, the probability of catching is 'doubly biased' in relation to traps: the catching probability depends not only on the activities of individual butterflies, but also on the effectiveness of the actual catching process. ...
Article
Understanding metapopulation structures is very important in the context of ecological studies and conservation. Crucial in this respect are the abundances of both the whole metapopulation and its constituent subpopulations. In recent decades, capture-mark-recapture studies have been considered the most reliable means of calculating such abundances. In butterfly studies, individual insects are usually caught with an entomological net. But the effectiveness of this method can vary for a number of reasons: differences between fieldworkers, in time, between sites etc. This article analyses catch effectiveness data with respect to two subpopulations of the Apollo butterfly (Parnassius apollo) metapopulation in the Pieniny National Park (Polish Carpathians). The results show that this parameter varied significantly between sites, probably because of differences in microrelief and plant cover. In addition, a method is proposed that will include information on catch effectiveness for estimating the sizes of particular subpopulations and will help to elucidate the structure of the entire metapopulation.
... Once identifiability is settled for a family of models, one can begin discussing properties of population size estimates using that family, such as consistency (Sanathanan, 1972) or finite-sample risk (Johndrow et al., 2019). Currently, a literature does not exist characterizing identifiability in M_th models, even though a large number of parametric M_th model families have been proposed in the literature (Agresti, 1994; Coull & Agresti, 1999; Fienberg et al., 1999; Pledger, 2000; Bartolucci et al., 2004; Durban & Elston, 2005; King & Brooks, 2008). ...
... Latent class models are a classical tool for the analysis of multivariate categorical data that describe populations which can be stratified into J classes, in which the latent sampling probabilities are homogeneous for individuals within each class (Goodman, 1974; Haberman, 1979). They form a special case of the M_th model, where the mixing distribution is a discrete finite mixture, and have been used for multiple-systems estimation many times (Agresti, 1994; Coull & Agresti, 1999; Pledger, 2000; Bartolucci et al., 2004; Manrique-Vallier, 2016). We denote the family of latent class models with J classes by ...
Preprint
Latent class models have recently become popular for multiple-systems estimation in human rights applications. However, it is currently unknown when a given family of latent class models is identifiable in this context. We provide necessary and sufficient conditions on the number of latent classes needed for a family of latent class models to be identifiable. Along the way we provide a mechanism for verifying identifiability in a class of multiple-systems estimation models that allow for individual heterogeneity.
... These models allow the probability of capture to be affected by three factors: time (capture probabilities vary by occasion); behaviour (probability of initial capture is different to all subsequent recaptures) and heterogeneity (each individual has a different capture probability). These models have been fitted using a variety of methods including maximum likelihood (Otis et al., 1978; Agresti, 1994; Norris and Pollock, 1996; Coull and Agresti, 1999; Pledger, 2000), the jackknife (Burnham and Overton, 1978, 1979; Pollock and Otto, 1983), moment methods based on sample coverage (Chao et al., 1992) and Bayesian methods (Casteldine, 1981; Gazey and Staley, 1986; Smith, 1988, 1991; George and Robert, 1992; Diebolt and Robert, 1993; Ghosh and Norris, 2005; King and Brooks, 2008). To specifically address the problem of heterogeneity in the capture probabilities a variety of models have been proposed, including finite mixtures (Diebolt and Robert, 1993; Agresti, 1994; Norris and Pollock, 1996; Pledger, 2000) and infinite mixtures (Coull and Agresti, 1999; Dorazio and Royle, 2003). ...
... A comparison of examples of the two types of mixture through simulation is presented in Pledger (2005) and Dorazio and Royle (2005). ...
Article
We develop a multi-state model to estimate the size of a closed population from ecological capture-recapture studies. We consider the case where capture-recapture data are not of a simple binary form, but where the state of an individual is also recorded upon every capture as a discrete variable. The proposed multi-state model can be regarded as a generalisation of the commonly applied set of closed population models to a multi-state form. The model permits individuals to move between the different discrete states, whilst allowing heterogeneity within the capture probabilities. A closed-form expression for the likelihood is presented in terms of a set of sufficient statistics. The links between existing models for capture heterogeneity are established, and simulation is used to show that the estimate of population size can be biased when movement between states is not accounted for. The proposed unconditional approach is also compared to a conditional approach to assess estimation bias. The model derived in this paper is motivated by a real ecological data set on great crested newts, Triturus cristatus.
... However, differences of character or behaviour between individuals may cause indirect dependence between lists. Models that allow for varying susceptibility to capture through individuals and unequal catchability have been proposed either in the case of human populations [5] or in animal population studies [6] and psychometric models, such as the Rasch model, were successfully applied. In applying the dichotomous Rasch model to the capture-recapture context, correct or incorrect answers to an item are replaced by "being observed" or "not being observed" in a list and, if all lists are supposed to be of the same kind, it is possible to treat heterogeneity in terms of constant apparent dependence between lists (Darroch [5], Agresti [6], International Working Group for Disease Monitoring and Forecasting [2]). ...
... Bartolucci and Forcina [7] showed how to relax the basic assumptions of the Rasch model (conditional independence and unidimensionality) by adding some suitable columns to the design matrix of the model. ...
... However, it is often questionable in practice, and violating it may lead to biased estimation of the prevalence or population size (Brenner, 1995). While great effort has been directed toward relaxation of such assumptions, many sources (Agresti, 1994; Hook and Regal, 1995; Cormack, 1999) point out that applying popular CRC estimation strategies in practice is almost always fraught with pitfalls; this includes significant drawbacks to the popular log-linear modeling paradigm (Fienberg, 1972; Baillargeon and Rivest, 2007; Jones et al., 2014). To better explore relationships between multiple CRC data sources, some researchers (Chatterjee and Mukherjee, 2016; Zhang and Small, 2020) have proposed sensitivity analysis to evaluate the uncertainty caused by different levels of association. ...
Preprint
Monitoring key elements of disease dynamics (e.g., prevalence, case counts) is of great importance in infectious disease prevention and control, as emphasized during the COVID-19 pandemic. To facilitate this effort, we propose a new capture-recapture (CRC) analysis strategy that takes misclassification into account from easily-administered, imperfect diagnostic test kits, such as the Rapid Antigen Test-kits or saliva tests. Our method is based on a recently proposed "anchor stream" design, whereby an existing voluntary surveillance data stream is augmented by a smaller and judiciously drawn random sample. It incorporates manufacturer-specified sensitivity and specificity parameters to account for imperfect diagnostic results in one or both data streams. For inference to accompany case count estimation, we improve upon traditional Wald-type confidence intervals by developing an adapted Bayesian credible interval for the CRC estimator that yields favorable frequentist coverage properties. When feasible, the proposed design and analytic strategy provides a more efficient solution than traditional CRC methods or random sampling-based bias-corrected estimation to monitor disease prevalence while accounting for misclassification. We demonstrate the benefits of this approach through simulation studies that underscore its potential utility in practice for economical disease monitoring among a registered closed population.
... As multiple sources [1,19,27] have suggested, this challenge creates one of the shortcomings of popular CRC paradigms such as log-linear modeling often used in practice [3,22]. In light of this issue, some authors [10,54,55] have proposed variations on sensitivity analyses to acknowledge uncertainty about associations between CRC surveillance streams. ...
Preprint
Surveillance research is of great importance for effective and efficient epidemiological monitoring of case counts and disease prevalence. Taking specific motivation from ongoing efforts to identify recurrent cases based on the Georgia Cancer Registry, we extend recently proposed "anchor stream" sampling design and estimation methodology. Our approach offers a more efficient and defensible alternative to traditional capture-recapture (CRC) methods by leveraging a relatively small random sample of participants whose recurrence status is obtained through a principled application of medical records abstraction. This sample is combined with one or more existing signaling data streams, which may yield data based on arbitrarily non-representative subsets of the full registry population. The key extension developed here accounts for the common problem of false positive or negative diagnostic signals from the existing data stream(s). In particular, we show that the design only requires documentation of positive signals in these non-anchor surveillance streams, and permits valid estimation of the true case count based on an estimable positive predictive value (PPV) parameter. We borrow ideas from the multiple imputation paradigm to provide accompanying standard errors, and develop an adapted Bayesian credible interval approach that yields favorable frequentist coverage properties. We demonstrate the benefits of the proposed methods through simulation studies, and provide a data example targeting estimation of the breast cancer recurrence case count among Metro Atlanta area patients from the Georgia Cancer Registry-based Cancer Recurrence Information and Surveillance Program (CRISP) database.
... In particular, a number of methods have been developed to estimate population size from capture-recapture data whilst allowing for heterogeneity in individual probability of capture. These include the jackknife estimator (Burnham and Overton, 1978), the generalised removal estimator (Otis et al., 1978) and the sample coverage approach (Lee and Chao, 1994; ... 1992), as well as some likelihood-based methods (Yang and Chao, 2005; Pledger, 2000; Norris and Pollock, 1996; Yip et al., 1995; Agresti, 1994; Huggins, 1991), Bayesian probability models (Anderson et al., 2016; Mäntyniemi et al., 2005) and mixture models (Dorazio and Royle, 2003). Most of these methods are based on the heterogeneous population models first described by Otis et al. (1978), and some of them are used in modern software for density estimation from mark-recapture or capture-recapture data. ...
Thesis
When modelling the population dynamics of wild animals we traditionally assume individual variation in behaviour is of only minor relevance to population dynamics. However, just like humans, animals exhibit consistent variation in behaviour among individuals (“personality”) and most wild populations are behaviourally heterogeneous. In this thesis, we defend the argument that individual heterogeneity in animal behaviour should not be treated only as a source of “noise” in models. Instead, significant behavioural differences between members of the same species can have important consequences for population-level processes and ecological interactions. We ask to what extent individual heterogeneity affects pest eradication, what modelling strategies can be used and what kind of empirical data allow us to quantify these effects. Using the example of invasive mammal pest species in New Zealand, we first perform a meta-analysis to summarise some key characteristics of these species’ trappability and space use, across a range of population densities, habitats and types of surveillance device. We then use numerical simulations to show that individual heterogeneity and the possible transmission of personalities from parent to offspring can have significant effects on the eradication of these species. Finally, we analyse empirical data from field trials to explore the different behavioural profiles observable in North Island brown kiwi, a bird species at the core of New Zealand’s wildlife conservation efforts. The significance of this study is that it adds to our theoretical understanding of animal personalities by introducing a focus on their implications for wildlife management, and indicates which factors to consider when designing field experiments aimed at quantifying animal personalities.
... The K-source capture-recapture method was also applied to populations while accounting for migrations, births and deaths during the study period (the population was then considered "open") (Pollock, 1991; Seber, 1992; Agresti, 1994; Norris et al., 1996). However, these models depended on the assumption that the sources were independent. ...
Thesis
WAHID (OIE), ProMed-mail and EMPRES-i (FAO) are three monitoring systems that report animal health disease events worldwide. This study aimed to estimate the actual number of “exceptional epidemiological events” for the 63 animal diseases reported by the three systems in the countries submitting reports to the OIE between 2005 and 2010, and to estimate the exhaustiveness of surveillance using the capture-recapture method with three sources. Many information exchanges were identified between the three networks. The total number of exceptional events was estimated at 841 [95% CI = 824-862]. The WAHID exhaustiveness rate was estimated at 92% [95% CI = 89-93], compared with 57% [95% CI = 55-58] for ProMed-mail and 51% [95% CI = 50-52] for EMPRES-i. These results, and especially the markedly higher exhaustiveness rate of WAHID compared with the other, less formal networks, are surprising.
... For fishery species, mark-recapture experiments have been designed to investigate local population sizes and sources of mortality like fishery exploitation rates (Seber 1986; Pine et al. 2003). Models for analyzing mark-recapture data have been adapted to address various sources of uncertainty, including unequal catchability (Chao 1987; Agresti 1994), mixed stocks (Michielsens et al. 2006), and tag loss (Kremers 1988; Conn et al. 2004). Mark-recapture studies also have been used to study animal movements (Dorazio et al. 1994; Aguilar et al. 2005; Trudel et al. 2009). ...
Article
Despite the need to quantify total catch to support sustainable fisheries management, estimating harvests of recreational fishers remains a challenge. Harvest estimates from mark–recapture studies have proven valuable, yet animal movements and migrations may bias some of these estimates. To improve recreational harvest estimates, explore seasonal and spatial harvest patterns, and understand the influence of animal movement on exploitation rates, we conducted a mark–recapture experiment for the blue crab (Callinectes sapidus) fishery in Maryland waters of Chesapeake Bay, USA. Data were analyzed with standard tag-return methods and with revised equations that accounted for crab movement between reporting areas. Using standard calculations, state-wide recreational harvest was estimated to be 4.04 million crabs. When movement was included in the calculations, the estimate was 5.39 million, an increase of 34%. With crab movement, recreational harvest in Maryland was estimated to be 6.5% of commercial harvest, a finding consistent with previous effort surveys. The new methods presented herein are broadly applicable for estimating recreational harvest in fisheries that target mobile species and for which spatial variation in commercial harvest is known.
... [24], where k is the number of captures, Poisson and Gamma models with alternatives to the default parameter values of 2 and 3.5 can be fitted using the "closedp" functions. Darroch's models for Mh were considered by Darroch [24][25]. ...
Article
A popular method for dealing with an unknown population size is the capture-recapture (CRC) method. Capture-recapture models have been applied to the adjustment of epidemiological data. The aim of this study was to compare estimators from closed-population capture-recapture models for estimating the number of opioid use disorder (OUD) cases. The data concern opioid use disorders in 2014 in New South Wales, Australia. CRC was used to estimate the number of OUD cases based on three data sources: the patient department, the emergency department, and the national death index. The data were analyzed using the Rcapture package. The M0, Mt, Mh, and Mb models were fitted and compared on the OUD data. The minimum Akaike's Information Criterion (AIC) value was used to select the best model. The three data sources comprised 87 patients from the Patient Department, 407 cases from the Emergency Department, and 15 cases from the National Death Index. The overlap of the three data sources involved 54 cases. The estimates obtained were 666 cases from the M0 model and Mh Chao (LB), 465 cases from the Mt model, 433 cases from Mh Poisson 2, 351 cases from Mh Gamma 3.5, and 503 cases from the Mb model. The smallest AIC was obtained for the Mt model (AIC = 51.962), indicating that the Mt model was the most suitable for estimating the number of OUD cases. This model should be applicable to any setting with a similar context. The method provided a simple, quick way to estimate the number of OUD cases.
... In particular, a number of methods have been developed to estimate population size from capture-recapture data whilst allowing for heterogeneity in individual probability of capture. These include the jackknife estimator (Burnham and Overton 1978), the generalised removal estimator (Otis et al. 1978) and the sample coverage approach (Lee and Chao 1994; Chao et al. 1992), as well as some likelihood-based methods (Yang and Chao 2005; Pledger 2000; Norris and Pollock 1996; Yip et al. 1995; Agresti 1994; Huggins 1991), Bayesian probability models (Anderson et al. 2016; Mäntyniemi et al. 2005) and mixture models (Dorazio and Royle 2003). Most of these methods are based on the heterogeneous population models first described by Otis et al. (1978), and some of them are used in modern software for density estimation from mark-recapture or capture-recapture data (Efford et al. 2004). ...
Article
1. The eradication of invasive small mammal pests is a challenging undertaking, but is needed in many areas of the world to preserve biodiversity. Trapping and poisonous baits are some of the most widespread tools for pest control. Most of the models used to make predictions and to design effective trapping protocols assume that pest populations are behaviourally homogeneous and, in particular, that all individuals react the same way when confronted with a trap or bait. In this study, we analyse the effect of consistent variations in trappability across a pest population on the success of eradication and the time taken to be confident that eradication has occurred. 2. We present results obtained using both a simple, stochastic, individual-based model, and an analytical approach. Simulations were run using two different modelling techniques, one where individuals display consistent daily behaviour towards traps and one with variable daily behaviour. We then show how to use our model to detect and measure heterogeneity in a population using capture data. 3. Results show that neglecting individual heterogeneity in trappability leads to overly optimistic predictions for the efficacy of eradication operations. The presence of even a small proportion of relatively trap-shy individuals is shown to make eradication much more difficult. 4. In this study we reveal how individual heterogeneity can affect capture probabilities and the outcome of pest eradications. Such information contributes towards improved pest management designs, needed by ecological operations making use of trapping systems.
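The effect of a trap-shy minority on eradication time can be illustrated with a small calculation. The sketch below is not the authors' model: it assumes a closed population with no reproduction, independent daily capture attempts, and a fixed per-individual daily capture probability. The function name, the population size of 100, and the probabilities 0.10, 0.11 and 0.01 are all hypothetical values chosen so that both populations have the same mean trappability.

```python
def days_to_confident_eradication(capture_probs, confidence=0.95, max_days=100_000):
    """Return the first day t at which the probability that every
    individual has been caught at least once reaches `confidence`.

    Assumes independent daily capture attempts with a constant
    per-individual probability (an illustrative simplification).
    """
    for t in range(1, max_days + 1):
        p_all_caught = 1.0
        for p in capture_probs:
            # P(individual caught by day t) = 1 - P(missed on all t days)
            p_all_caught *= 1.0 - (1.0 - p) ** t
        if p_all_caught >= confidence:
            return t
    raise ValueError("confidence level not reached within max_days")


# Hypothetical populations of 100 animals with the same mean trappability (0.10).
homogeneous = [0.10] * 100
heterogeneous = [0.11] * 90 + [0.01] * 10  # 10% relatively trap-shy

t_hom = days_to_confident_eradication(homogeneous)
t_het = days_to_confident_eradication(heterogeneous)
print(f"homogeneous: {t_hom} days, heterogeneous: {t_het} days")
```

Under these assumptions the heterogeneous population takes several times longer to reach the same confidence level, even though the average daily capture probability is identical.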
... For example, civilian and military deaths in the Syrian conflict data are captured differently by some data lists. Rasch models and extensions of them [Rasch, 1993] [Darroch et al., 1993] [Agresti, 1994] [Fienberg et al., 1999] incorporate individual heterogeneity into the log-linear model. A more flexible method, mixture models, has also been used to capture individual heterogeneity [Manrique-Vallier and Fienberg, 2008] [Manrique-Vallier, 2016]. ...
Preprint
Heterogeneity of response patterns is important in estimating the size of a closed population from multiple recapture data when capture patterns are different over time and location. In this paper, we extend the non-parametric one layer latent class model for multiple recapture data proposed by Manrique-Vallier (2016) to a nested latent class model with the first layer modeling individual heterogeneity and the second layer modeling location-time differences. Location-time groups with similar recording patterns are in the same top layer latent class and individuals within each top layer class are dependent. The nested latent class model incorporates hierarchical heterogeneity into the modeling to estimate population size from multi-list recapture data. This approach leads to more accurate population size estimation and reduced uncertainty. We apply the method to estimating casualties from the Syrian conflict.
... Extensions to the basic log-linear models are provided. Just to mention a few examples, Cormack (1989) discusses the use of log-linear models for dependence and the detection of the presence of heterogeneity in capture probabilities; Darroch et al. (1993) and Agresti (1994) introduce models in the generalised class of Rasch models to explain the heterogeneity in capture probabilities; Coull and Agresti (1999) introduce generalised mixture models. Evans et al. (1994) suggest applying log-linear models when the heterogeneity effects can be explained by the observable covariates. ...
Article
Data integration is now common practice in official statistics and involves an increasing number of sources. When using multiple sources, an objective is to assess the unknown size of the population. To this aim, capture-recapture methods are applied. Standard capture-recapture methods are based on a number of strong assumptions, including the absence of errors in the integration procedures. However, in particular when the integrated sources were not originally collected for statistical purposes, this assumption is unlikely and linkage errors (false links and missing links) may occur. In this article, the problem of adjusting population estimates in the presence of linkage errors in multiple lists is tackled; under homogeneous linkage error probabilities assumption, a solution is proposed in a realistic and practical scenario of multiple lists linkage procedure.
... The logic behind this is to improve the goodness of fit of the model by partitioning units into two or more homogeneous groups, according to a discrete latent variable (see e.g., Pledger 2000). In particular, the use of LCM in capture-recapture dates back to Agresti (1994). Since then, several extensions to the LCM models have been proposed to include covariates to model observed heterogeneity, and to relax the local independence assumption of the LCM, that is, the hypothesis of independence of captures of the same unit in different sources conditionally on the latent variable (e.g., Bartolucci and Forcina 2001). ...
Article
The quantity and quality of administrative information available to National Statistical Institutes have been constantly increasing over the past several years. However, different sources of administrative data are not expected to each have the same population coverage, so that estimating the true population size from the collective set of data poses several methodological challenges that set the problem apart from a classical capture-recapture setting. In this article, we consider two specific aspects of this problem: (1) misclassification of the units, leading to lists with both overcoverage and undercoverage; and (2) lists focusing on a specific subpopulation, leaving a proportion of the population with null probability of being captured. We propose an approach to this problem that employs a class of capture-recapture methods based on Latent Class models. We assess the proposed approach via a simulation study, then apply the method to five sources of empirical data to estimate the number of active local units of Italian enterprises in 2011.
... Then an estimate of the abundance N is derived from the log-linear parameters. In order to account for heterogeneity, three log-linear models are used: Chao, Poisson and Darroch (Chao, 1987; Darroch et al., 1993; Agresti, 1994; Rivest and Baillargeon, 2007 ...
Thesis
Violent conflict is one of the most persistent threats to the livelihoods and food security of individuals worldwide. Despite a growing literature examining the causes and consequences of conflict, considerable gaps in understanding remain, in part owing to a lack of high-quality conflict event data. Using modern econometric and statistical methods, this monograph contributes empirically to the literature by addressing three interlinked topics: (i) the effects of exposure to violence on radicalization; (ii) the extent of bias in media-based conflict event data; and (iii) the role of violence in neighboring areas in predicting the outbreak and escalation of conflict. First, an analysis of the 2009 Gaza War shows that people directly exposed to violence support radical groups less on average. When earlier voting preferences are statistically taken into account, however, violence has a polarizing effect on voting behavior. Second, an evaluation of Syrian conflict event data based on international and national sources estimates that the media report on only just under ten percent of the events that occur. Moreover, reporting is strongly biased spatially and by conflict actor. Third, panel data on small geographic cells show that the spatial and temporal dynamics of violence strongly influence both the outbreak and the escalation of conflict at a given location. In high-resolution analyses, however, violence in neighboring spatial cells does not increase the predictive power of the model. On the basis of the empirical findings, this work develops a new method for collecting conflict data that draws on direct local sources of information ("crowdseeding") to provide policymakers and researchers with more reliable data.
Article
Natural populations that are rare, cryptic or inaccessible provide a monumental challenge to monitoring, as adequate data are extremely difficult to collect. Surveys often encompass only a small portion of a population's range due to difficult terrain or inclement weather, especially for populations with extensive ranges. Thus, to maximise encounters, sampling efforts may be largely opportunistic or biased to accessible areas. The resulting sparse and spatially biased data may be difficult to model, standardise across years and incorporate into an assessment or management framework. However, in many monitoring programs, there are usually multiple threads of data that, though each may have its own limitations, can be synthesised to reveal important ecological processes. Here, we demonstrate a simple technique to incorporate two additional streams of data on the same population, telemetry and survey effort data, into capture‐recapture analyses to address spatiotemporal sampling bias using simulated data. Utilisation distributions (UDs) computed from telemetry data are overlaid with UDs of survey efforts, providing an ‘effort by animal space use’ overlap covariate for modelling detection in a Jolly–Seber open population model. Using simulated data, we found that our method resulted in more accurate and precise estimates of abundance than traditional capture‐recapture models. We then applied this method to a 16 year photo‐identification capture‐recapture dataset (n = 143 individuals) along with telemetry data (n = 44 satellite tag deployments) collected from the endangered population of false killer whales resident to the main Hawaiian Islands. Incorporating space use and effort into this analysis improved precision of abundance estimates relative to previous modelling endeavours.
Chapter
Capture–recapture methods for both open and closed populations have developed extensively in recent years, especially with the development of sophisticated computer programs and packages. There are now many different methods to estimate the abundance of closed populations. These include standard maximum likelihood methods, jackknife methods, coverage models, martingale estimating equation models, log-linear models, logistic models, non-parametric models, and mixture models, which are all discussed in some detail. Because of the large amount of materials, Bayesian methods are considered in the next chapter for convenience, as those methods are being used more. Covariates such as environmental variables are being used more, and with improved monitoring devices, including DNA methods, we can expect covariate methods to increase. The two-sample capture–recapture model has been extensively used with a focus on variable catchability, use of two observers, which can also help with detectability problems, epidemiological populations using two lists (or later more lists), and dual record systems. For three or more capture–recapture samples, the glue behind the model development has been the setting out of eight particular model categories, due to Pollock, providing for a time factor, a behavioral factor, a heterogeneity factor, and combinations of these. Several variations of these have also been developed by various researchers, including time-to-detection models. Heterogeneity has been the biggest challenge and, as well as various models, has also been considered using covariates or even stratification where possible underlying assumptions are tested. Finally, sampling one at a time and continuous models are considered in detail. With this plethora of methods, the practitioner is left in a quandary. What methods are appropriate for what conditions and types of studies? 
What is needed here is a comparison of the various closed models with respect to both efficiency and robustness. Also, further research is needed on interval estimation, with intervals based on profile likelihoods becoming more popular, and on model diagnostics.
Article
Capture–recapture (CRC) surveys are used to estimate the size of a population whose members cannot be enumerated directly. CRC surveys have been used to estimate the number of Coronavirus Disease 2019 (COVID-19) infections, people who use drugs, sex workers, conflict casualties, and trafficking victims. When k capture samples are obtained, counts of unit captures in subsets of samples are represented naturally by a 2^k contingency table in which one element, the number of individuals appearing in none of the samples, remains unobserved. In the absence of additional assumptions, the population size is not identifiable (i.e., point identified). Stringent assumptions about the dependence between samples are often used to achieve point identification. However, real-world CRC surveys often use convenience samples in which the assumed dependence cannot be guaranteed, and population size estimates under these assumptions may lack empirical credibility. In this work, we apply the theory of partial identification to show that weak assumptions or qualitative knowledge about the nature of dependence between samples can be used to characterize a nontrivial confidence set for the true population size. We construct confidence sets under bounds on pairwise capture probabilities using two methods: test inversion bootstrap confidence intervals and profile likelihood confidence intervals. Simulation results demonstrate well-calibrated confidence sets for each method. In an extensive real-world study, we apply the new methodology to the problem of using heterogeneous survey data to estimate the number of people who inject drugs in Brussels, Belgium.
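The contingency-table representation described in this abstract can be sketched in a few lines. The example below is illustrative only: the function name `capture_table` and the three toy lists are hypothetical, and each list is assumed to be a set of already-matched unit identifiers. It builds the table of capture histories for k lists and marks the all-zero cell as unobserved.

```python
from collections import Counter
from itertools import product


def capture_table(lists):
    """Cross-classify units by presence (1) or absence (0) in each of k lists.

    Returns a dict mapping each of the 2**k capture histories to a count.
    The all-zero history (units seen in no list) cannot be observed, so it
    is stored as None rather than 0.
    """
    k = len(lists)
    observed_units = set().union(*lists)
    counts = Counter(
        tuple(int(unit in source) for source in lists) for unit in observed_units
    )
    table = {history: counts.get(history, 0) for history in product((0, 1), repeat=k)}
    table[(0,) * k] = None  # the unobserved cell
    return table


# Hypothetical matched identifiers from three sources.
source_a = {"a", "b", "c"}
source_b = {"b", "c", "d"}
source_c = {"c", "e"}

table = capture_table([source_a, source_b, source_c])
for history, count in sorted(table.items()):
    print(history, count)
```

Any population size estimator then amounts to a rule for filling in the `None` cell from the observed cells, which is where the dependence assumptions discussed in the abstract enter.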
Article
The application of serial principled sampling designs for diagnostic testing is often viewed as an ideal approach to monitoring prevalence and case counts of infectious or chronic diseases. Considering logistics and the need for timeliness and conservation of resources, surveillance efforts can generally benefit from creative designs and accompanying statistical methods to improve the precision of sampling-based estimates and reduce the size of the necessary sample. One option is to augment the analysis with available data from other surveillance streams that identify cases from the population of interest over the same timeframe, but may do so in a highly nonrepresentative manner. We consider monitoring a closed population (e.g., a long-term care facility, patient registry, or community), and encourage the use of capture–recapture methodology to produce an alternative case total estimate to the one obtained by principled sampling. With care in its implementation, even a relatively small simple or stratified random sample not only provides its own valid estimate, but provides the only fully defensible means of justifying a second estimate based on classical capture–recapture methods. We initially propose weighted averaging of the two estimators to achieve greater precision than can be obtained using either alone, and then show how a novel single capture–recapture estimator provides a unified and preferable alternative. We develop a variant on a Dirichlet-multinomial-based credible interval to accompany our hybrid design-based case count estimates, with a view toward improved coverage properties. Finally, we demonstrate the benefits of the approach through simulations designed to mimic an acute infectious disease daily monitoring program or an annual surveillance program to quantify new cases within a fixed patient registry.
Chapter
The most basic quantitative question about the consequences of armed conflicts is perhaps how many people were killed. During and after conflicts, it is common to attempt to create tallies of victims. However, destroyed infrastructure and institutions, danger to field workers, and a reasonable suspicion of data collection by victim communities limit the result of these efforts to incomplete and non-representative lists. Capture-Recapture (CR) estimation, also known as Multiple Systems Estimation (MSE) in the context of human populations, is a family of methods for estimating the size of closed populations based on matched incomplete samples. CR methods vary in details and complexity, but they all ultimately rely on analyzing the patterns of inclusion of individuals across samples to estimate the probability of not being observed and then the number of unobserved individuals. In this discussion, we describe the versions of MSE with which analysts have estimated the total number of casualties in armed conflicts. We explore the advances of the last 15 years, and we describe outstanding statistical challenges.
Chapter
The capture-recapture approach is a common and potentially useful paradigm for estimating the total number (N) of cases or deaths via multiple registries in epidemiological studies. Using data on childhood deaths from two sources in a Sierra Leone chiefdom collected by the Child Health and Mortality Prevention Surveillance (CHAMPS) project team as a motivating example, we consider point and interval estimation in the two-capture case. We focus primarily on closed population scenarios under what we term and clarify as the LP conditions, i.e., assumptions that make the well-known Lincoln-Petersen (LP) and Chapman estimators valid. We clarify the unverifiable nature of assumptions about a key population-level parameter (akin to a relative risk of capture) implicitly made by popular alternatives such as loglinear models and the estimator of Chao (Biometrics. 43:783–791, 1987). We argue that the LP conditions remain the most central and useful given the possibility to defend them within strata of judiciously chosen covariates and/or to ensure them by design. We then propose two new multinomial distribution-based estimators that are valid under those conditions. The first adjusts for typical (mean) bias and provides a potentially preferable alternative to the Chapman estimator. The second targets reduced median bias, which is generally overlooked as a performance criterion in the capture-recapture setting. Finally, we develop an approach geared toward improved confidence intervals in this setting that utilizes refinements to the posterior distribution of the proposed mean bias-adjusted estimand within a Bayesian credible interval strategy. The proposed point and interval estimators are evaluated in comparison with others through simulation studies.
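For reference, the two classical two-sample estimators named in this abstract have simple closed forms. The sketch below implements the standard Lincoln-Petersen and Chapman formulas, not the new estimators proposed by the authors. The function names and the example counts (50, 40 and 10) are hypothetical.

```python
def lincoln_petersen(n1, n2, m):
    """Lincoln-Petersen estimate: N = n1 * n2 / m.

    n1, n2: numbers of units captured by source 1 and source 2.
    m: number of units captured by both sources (must be positive).
    """
    if m <= 0:
        raise ValueError("Lincoln-Petersen is undefined when there is no overlap")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman estimate: N = (n1 + 1)(n2 + 1) / (m + 1) - 1.

    Defined even when m == 0 and less biased in small samples.
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts: 50 deaths in source 1, 40 in source 2, 10 in both.
print(lincoln_petersen(50, 40, 10))
print(round(chapman(50, 40, 10), 2))
```

Both formulas are valid only under the conditions the abstract refers to as the LP conditions, such as a closed population and independent captures; the code does not check them.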
Article
We propose a method to test for the presence of differential ascertainment in case‐control studies, when data are collected by multiple sources. We show that, when differential ascertainment is present, the use of only the observed cases leads to severe bias in the computation of the odds ratio. We can alleviate the effect of such bias using the estimates that our method of testing for differential ascertainment naturally provides. We apply it to a dataset obtained from the National Violent Death Reporting System, with the goal of checking for the presence of differential ascertainment by race in the count of deaths caused by child maltreatment.
Article
We propose a method for estimating the size of a population in a multiple record system in the presence of missing data. The method is based on a latent class model where the parameters and the latent structure are estimated using a Gibbs sampler. The proposed approach is illustrated through the analysis of a data set already known in the literature, which consists of five registrations of neural tube defects.
Article
Population estimation methods are used for estimating the size of a population from samples of individuals. In many applications, the probability of being observed in the sample varies across individuals, resulting in sampling bias. We show that in this setting, estimators of the population size have high and sometimes infinite risk, leading to large uncertainty in the population size. As an alternative, we propose estimating the population of individuals with observation probability exceeding a small threshold. We show that estimators of this quantity have lower risk than estimators of the total population size. The proposed approach is shown empirically to result in large reductions in mean squared error in a common model for capture-recapture population estimation with heterogeneous capture probabilities.
Chapter
This article has no abstract.
Book
You're being asked to quantify your usability improvements with statistics. But even with a background in statistics, you may be hesitant to statistically analyze your data, unsure which statistical tests to use and how to defend the use of small test sample sizes. The book is a practical guide to solving common quantitative problems arising in usability testing with statistics. It addresses common questions you face every day such as: Is the current product more usable than our competition? Can we be sure at least 70% of users can complete the task on the 1st attempt? How long will it take users to purchase products on the website? This book shows you which test to use, and provides a foundation for both the statistical theory and best practices in applying them. The authors draw on decades of statistical literature from Human Factors, Industrial Engineering and Psychology, as well as their own published research to provide the best solutions. They provide both concrete solutions (Excel formulas, links to their own web-calculators) along with an engaging discussion about the statistical reasons for why the tests work, and how to effectively communicate the results. *Provides practical guidance on solving usability testing problems with statistics for any project, including those using Six Sigma practices *Shows practitioners which test to use, why they work, best practices in application, along with easy-to-use Excel formulas and web-calculators for analyzing data *Recommends ways for practitioners to communicate results to stakeholders in plain English. © 2012 Jeff Sauro and James R. Lewis Published by Elsevier Inc. All rights reserved.
Article
The Conway–Maxwell–Poisson estimator is considered in this paper as the population size estimator. The benefit of using the Conway–Maxwell–Poisson distribution is that it includes the Bernoulli, the Geometric and the Poisson distributions as special cases and, furthermore, allows for heterogeneity. Little emphasis is often placed on the variability associated with the population size estimate. This paper provides a deep and extensive comparison of bootstrap methods in the capture–recapture setting. It deals with the classical bootstrap approach using the true population size, the true bootstrap, and the classical bootstrap using the observed sample size, the reduced bootstrap. Furthermore, the imputed bootstrap, as well as approximating forms in terms of standard errors and confidence intervals for the population size, under the Conway–Maxwell–Poisson distribution, have been investigated and discussed. These methods are illustrated in a simulation study and in benchmark real data examples.
Article
Individual heterogeneity in capture probabilities and time dependence are fundamentally important for estimating the closed animal population parameters in capture-recapture studies. A generalized estimating equations (GEE) approach accounts for linear correlation among capture-recapture occasions, and individual heterogeneity in capture probabilities in a closed population capture-recapture individual heterogeneity and time variation model. The estimated capture probabilities are used to estimate animal population parameters. Two real data sets are used for illustrative purposes. A simulation study is carried out to assess the performance of the GEE estimator. A Quasi-Likelihood Information Criterion (QIC) is applied for the selection of the best fitting model. This approach performs well when the estimated population parameters depend on the individual heterogeneity and the nature of linear correlation among capture-recapture occasions.
Chapter
Capture–recapture methods are used to estimate population sizes and related parameters such as survival rates, birth rates, and migration rates. A typical capture–recapture experiment places traps in the study area and samples the population several times. At each trapping sample, one records and attaches a unique tag to every unmarked animal, records the capture of any animal that has been previously tagged, and returns all animals to the population. This article describes the history of the method, discusses applications to epidemiology, discusses potential models, and discusses the challenges inherent in estimating the population size well.
Article
The Rasch model has been used to estimate the unknown size of a population from multi-list data. It can take both the list effectiveness and individual heterogeneity into account. Estimating the population size is shown to be equivalent to estimating the odds that an individual is unseen. The odds parameter is nonidentifiable. We propose a sequence of estimable lower bounds, including the greatest one, for the odds parameter. We show that a lower bound can be calculated by linear programming. Estimating a lower bound of the odds leads to an estimator for a lower bound of the population size. A simulation experiment is performed and three real examples are studied.
Article
This text is written to provide a mathematically sound but accessible and engaging introduction to Bayesian inference specifically for environmental scientists, ecologists and wildlife biologists. It emphasizes the power and usefulness of Bayesian methods in an ecological context. The advent of fast personal computers and easily available software has simplified the use of Bayesian and hierarchical models. One obstacle remains for ecologists and wildlife biologists, namely the near absence of Bayesian texts written specifically for them. The book includes many relevant examples, is supported by software and examples on a companion website and will become an essential grounding in this approach for students and research ecologists. Engagingly written text specifically designed to demystify a complex subject. Examples drawn from ecology and wildlife research. An essential grounding for graduate and research ecologists in the increasingly prevalent Bayesian approach to inference. Companion website with analytical software and examples. Leading authors with world-class reputations in ecology and biostatistics.
Article
Every day, biologists in parkas, raincoats, and rubber boots go into the field to capture and mark a variety of animal species. Back in the office, statisticians create analytical models for the field biologists' data. But many times, representatives of the two professions do not fully understand one another's roles. This book bridges this gap by helping biologists understand state-of-the-art statistical methods for analyzing capture-recapture data. In so doing, statisticians will also become more familiar with the design of field studies and with the real-life issues facing biologists. Reliable outcomes of capture-recapture studies are vital to answering key ecological questions. Is the population increasing or decreasing? Do more or fewer animals have a particular characteristic? In answering these questions, biologists cannot hope to capture and mark entire populations. And frequently, the populations change unpredictably during a study. Thus, increasingly sophisticated models have been employed to convert data into answers to ecological questions. This book, by experts in capture-recapture analysis, introduces the most up-to-date methods for data analysis while explaining the theory behind those methods. Thorough, concise, and portable, it will be immensely useful to biologists, biometricians, and statisticians, students in both fields, and anyone else engaged in the capture-recapture process.
Article
There have been no estimators of population size associated with the capture-recapture model when the capture probabilities vary by time and individual animal. This work proposes a nonparametric estimation technique that is appropriate for such a model using the idea of sample coverage, which is defined as the proportion of the total individual capture probabilities of the captured animals. A simulation study was carried out to show the performance of the proposed estimation procedure. Numerical results indicate that it generally works satisfactorily when the coefficient of variation of the individual capture probabilities is relatively large. An example is also given for illustration.
Chapter
The analysis of capture-recapture data from 7 or fewer samples can be easily carried out using the GLIM computer package. A program is given which enables the user to be guided by the data towards selection of an appropriate model. Five examples are discussed in detail to demonstrate how easy and flexible the analysis is. These show how misleading the estimates from unthinking application of a Jolly-Seber analysis can be.
Article
We use martingale theory and a method-of-moments technique to derive a class of estimators for the size of a closed population in a capture-recapture experiment in discrete time where the capture probabilities on the successive capture occasions are allowed to be different. The same capture probability is assumed to apply to all animals in the population on the same occasion. Explicit expressions are given for these estimators and their associated standard errors which involve only simple computation. An optimal estimator for the population size is found. Asymptotic results are readily obtained by an application of a martingale central limit theorem. Two examples are given to compare the results.
Article
Log-linear models are developed for capture-recapture experiments, and their advantages and disadvantages discussed. Ways in which they can be extended, sometimes with only partial success, to open populations, subpopulations, trap dependence, and long chains of recapture periods are presented. The use of residual patterns, and analysis of subsets of data, to identify behavioural patterns and acceptable models is emphasised and illustrated with two examples.
Article
The use of conditional likelihood methods in the analysis of capture data allows the modeling of capture probabilities in terms of observable characteristics of the captured individuals and the trapping occasions. The resulting models may then be used to estimate the size of the population. Here the use of conditional likelihood procedures to construct models for capture probabilities is discussed and illustrated by an example.
Article
The multiple recapture census for closed populations is reconsidered, assuming an underlying multinomial sampling model. The resulting data can be put in the form of an incomplete 2^k contingency table, with one missing cell, that displays the full multiple recapture history of all individuals in the population. Log-linear models are fitted to this incomplete contingency table, and the simplest plausible model that fits the observed cells is projected to cover the missing cell, thus yielding an estimate of the total population size. Asymptotic variances for the estimate of the population size are considered, and the techniques are illustrated on a population of children possessing a common congenital anomaly.
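The projection step described in this abstract can be illustrated in the smallest case, k = 2. The sketch below is not taken from the paper; it assumes the independence log-linear model, under which the odds ratio of the 2 x 2 table is 1, so the missing cell satisfies n00 * n11 = n10 * n01. The function name and arguments are hypothetical.

```python
def two_sample_missing_cell(n11, n10, n01):
    """Project the independence log-linear model onto the unobserved cell.

    n11: individuals seen in both samples
    n10: individuals seen in the first sample only
    n01: individuals seen in the second sample only
    Returns (estimated unseen count, estimated total population size).
    """
    if n11 <= 0:
        raise ValueError("need at least one recapture (n11 > 0)")
    # Assumption: independence between samples, i.e. odds ratio = 1,
    # which gives n00 * n11 = n10 * n01.
    n00_hat = n10 * n01 / n11
    # Total = three observed cells + projected missing cell.
    n_hat = n11 + n10 + n01 + n00_hat
    return n00_hat, n_hat
```

For example, `two_sample_missing_cell(20, 30, 40)` projects 60 unseen individuals and a total of 150, which agrees with the classical two-sample estimate 50 * 60 / 20. With more than two samples the fit generally has to be computed iteratively.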
Article
SUMMARY Capture-recapture models are widely used in the estimation of population sizes. Based on data augmentation considerations, we show how Gibbs sampling can be applied to calculate Bayes estimates in this setting. As a result, formulations which were previously avoided because of analytical and numerical intractability can now be easily considered for practical application. We illustrate this potential by using Gibbs sampling to calculate Bayes estimates for a hierarchical capture-recapture model in a real example.
Article
SUMMARY An estimation method of Lloyd & Yip (1991), based on martingales and the assumption of beta distributed capture probabilities, is modified to improve its performance for all but highly heterogeneous populations. This estimator is compared to the jackknife estimator by simulation under both beta and non-beta conditions. Some expressions for standard error are developed but found to behave poorly for realistic sample sizes. Bootstrap estimates of standard error appear superior but do not lead to accurate confidence intervals because of the difficulties in bias estimation.
Article
SUMMARY We consider inferences about Poisson log linear models for which only certain disjoint sums of the data are observable. We derive an explicit formula for the observed information matrix associated with the log linear parameters that is intuitively appealing and simple to evaluate.
Article
The Rasch model for item analysis is an important member of the class of exponential response models in which the number of nuisance parameters increases with the number of subjects, leading to the failure of the usual likelihood methodology. Both conditional-likelihood methods and mixture-model techniques have been used to circumvent these problems. In this article, we show that these seemingly unrelated analyses are in fact closely linked to each other, despite dramatic structural differences between the classes of models implied by each approach. We show that the finite-mixture model for J dichotomous items having T latent classes gives the same estimates of item parameters as conditional likelihood on a set whose probability approaches one if T ≥ (J + 1)/2. Unconditional maximum likelihood estimators for the finite-mixture model can be viewed as Kiefer-Wolfowitz estimators for the random-effects version of the Rasch model. Latent-class versions of the model are especially attractive when T is small relative to J. We analyze several sets of data, propose simple diagnostic checks, and discuss procedures for assigning scores to subjects based on posterior means. A flexible and general methodology for item analysis based on latent class techniques is proposed.
Article
This paper considers a wide class of latent structure models. These models can serve as possible explanations of the observed relationships among a set of m manifest polytomous variables. The class of models considered here includes both models in which the parameters are identifiable and also models in which the parameters are not. For each of the models considered here, a relatively simple method is presented for calculating the maximum likelihood estimate of the frequencies in the m-way contingency table expected under the model, and for determining whether the parameters in the estimated model are identifiable. In addition, methods are presented for testing whether the model fits the observed data, and for replacing unidentifiable models that fit by identifiable models that fit. Some illustrative applications to data are also included.
Article
Estimators of population size under two commonly used models (the time-variation model and the heterogeneity model) for sparse capture-recapture data are proposed. A real data set on the Illinois mud turtle (Kinosternon flavescens spooneri) is used to illustrate the methods and to compare them with other estimators. A simulation study was carried out to show the performance and robustness of the proposed estimators.
Article
It is common in the medical, biological, and social sciences for the categories into which an object is classified not to have a fully objective definition. Theoretically speaking the categories are therefore not completely distinguishable. The practical extent of their distinguishability can be measured when two expert observers classify the same sample of objects. It is shown, under reasonable assumptions, that the matrix of joint classification probabilities is quasi-symmetric, and that the symmetric matrix component is non-negative definite. The degree of distinguishability between two categories is defined and is used to give a measure of overall category distinguishability. It is argued that the kappa measure of observer agreement is unsatisfactory as a measure of overall category distinguishability.
Article
Data of L. M. Wiggins from three-wave panels, each with a single dichotomous response, illustrate the use of models with response probabilities that vary over occasions or over individuals, or neither, or both, with a “no interaction” combination of the two being specified for the last case, which can also be derived from the Rasch measurement model. Models more complicated than these, allowing for changes in individual parameters (interaction of occasions and persons) or serial dependence of responses are considered when the Rasch-type model does not adequately describe the data.
Article
A model which allows capture probabilities to vary by individuals is introduced for multiple recapture studies on closed populations. The set of individual capture probabilities is modelled as a random sample from an arbitrary probability distribution over the unit interval. We show that the capture frequencies are a sufficient statistic. A nonparametric estimator of population size is developed based on the generalized jackknife; this estimator is found to be a linear combination of the capture frequencies. Finally, tests of underlying assumptions are presented.
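To make "a linear combination of the capture frequencies" concrete, here is a minimal sketch of the first-order jackknife estimator in its commonly quoted form, N = S + ((k - 1) / k) * f1. Here S is the number of distinct animals caught, f1 is the number caught exactly once, and k is the number of occasions. The abstract does not state this formula, so its use here is an assumption, and the function name is hypothetical.

```python
def jackknife_first_order(capture_freqs):
    """First-order jackknife estimate of population size.

    capture_freqs[i] is the number of animals caught exactly i + 1 times,
    so len(capture_freqs) is the number of capture occasions k.
    """
    k = len(capture_freqs)
    if k < 2:
        raise ValueError("need at least two capture occasions")
    distinct = sum(capture_freqs)  # S: animals caught at least once
    f1 = capture_freqs[0]          # animals caught exactly once
    # Linear in the capture frequencies: every f_i has weight 1,
    # and f1 carries an extra weight of (k - 1) / k.
    return distinct + (k - 1) / k * f1
```

With five occasions and frequencies `[10, 5, 3, 1, 1]`, twenty distinct animals were caught and the estimate is 20 + 0.8 * 10 = 28. Higher-order versions add further weighted terms in f2, f3, and so on.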
Article
A procedure is given for estimating the size of a closed population in the presence of heterogeneous capture probabilities using capture-recapture data when it is possible to model the capture probabilities of individuals in the population using covariates. The results include the estimation of the parameters associated with the model of the capture probabilities and the use of these estimated capture probabilities to estimate the population size. Confidence intervals for the population size using both the asymptotic normality of the estimator and a bootstrap procedure for small samples are given.
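One common way to turn estimated individual capture probabilities into a population-size estimate is a Horvitz-Thompson-type sum over the captured individuals, each weighted by the inverse of its probability of being caught at least once. The sketch below assumes that form and assumes the per-occasion capture probabilities have already been estimated (for instance from covariates). Whether it matches the paper's procedure in detail is not confirmed by the abstract, and all names are hypothetical.

```python
import math


def horvitz_thompson_size(capture_probs):
    """Estimate population size from fitted capture probabilities.

    capture_probs: one sequence per captured individual, holding that
    individual's estimated capture probability on each occasion.
    """
    total = 0.0
    for probs in capture_probs:
        # Probability of being missed on every occasion, assuming
        # independence across occasions for a given individual.
        p_missed = math.prod(1.0 - p for p in probs)
        p_seen = 1.0 - p_missed
        if p_seen <= 0.0:
            raise ValueError("individual has zero probability of capture")
        # Each captured individual stands in for 1 / p_seen individuals.
        total += 1.0 / p_seen
    return total
```

For two captured individuals who each have capture probability 0.5 on both of two occasions, each is seen with probability 0.75, so the estimate is 2 / 0.75, about 2.67.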
Article
Textbooks continue to recommend the use of an asymptotic normal distribution to provide an interval estimate for the unknown size, N, of a closed population studied by a mark-recapture experiment or multiple-record system. A likelihood interval approach is proposed and its implementation demonstrated for a range of models for such studies, including all main effect and interaction models for incomplete contingency tables.
Article
One encounters in the literature estimates of some rates of genetic and congenital disorders based on log-linear methods to model possible interactions among sources. Often the analyst chooses the simplest model consistent with the data for estimation of the size of a closed population and calculates confidence intervals on the assumption that this simple model is correct. However, despite an apparent excellent fit of the data to such a model, we note here that the resulting confidence intervals may well be misleading in that they can fail to provide an adequate coverage probability. We illustrate this with a simulation for a hypothetical population based on data reported in the literature from three sources. The simulated nominal 95 per cent confidence intervals contained the modelled population size only 30 per cent of the time. Only if external considerations justify the assumption of plausible interactions of sources would use of the simpler model's interval be justified.
Article
The effect of population heterogeneity in capture-recapture, or dual registration, models is discussed. An estimator of the unknown population size based on a logistic regression model is introduced. The model allows different capture probabilities across individuals and across capture times. The probabilities are estimated from the observed data using conditional maximum likelihood. The resulting population estimator is shown to be consistent and asymptotically normal. A variance estimator under population heterogeneity is derived. The finite-sample properties of the estimators are studied via simulation. An application to Finnish occupational disease registration data is presented.
Article
Suppose we observe responses on several categorical variables having the same scale. We consider latent class models for the joint classification that satisfy quasi-symmetry. The models apply when subject-specific response distributions are such that (i) for a given subject, responses on different variables are independent, and (ii) odds ratios comparing marginal distributions of the variables are identical for each subject. These assumptions are often reasonable in modeling multirater agreement, when a sample of subjects is rated independently by different observers. In this application, the model parameters describe two components of agreement--strength of association between classifications by pairs of observers and degree of heterogeneity among the observers' marginal distributions. We illustrate the models by analyzing a data set in which seven pathologists classified 118 subjects in terms of presence or absence of carcinoma, yielding seven categorical classifications with the same binary scale. A good-fitting model has a latent classification that differentiates between subjects on whom there is agreement and subjects on whom there is disagreement.
Article
"A central assumption in the standard capture-recapture approach to the estimation of the size of a closed population is the homogeneity of the 'capture' probabilities. In this article we develop an approach that allows for varying susceptibility to capture through individual parameters using a variant of the Rasch model from psychological measurement situations. Our approach requires an additional recapture. In the context of census undercount estimation, this requirement amounts to the use of a second independent sample or alternative data source to be matched with census and Post-Enumeration Survey (PES) data.... We illustrate [our] models and their estimation using data from a 1988 dress-rehearsal study for the 1990 census conducted by the U.S. Bureau of the Census, which explored the use of administrative data as a supplement to the PES. The article includes a discussion of extensions and related models."
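As a rough illustration of a Rasch-type capture model, the sketch below assumes the basic additive form logit(p_ij) = alpha_i + beta_j, where alpha_i is an individual catchability parameter and beta_j is an occasion (or list) effect. The variant used in the article may differ, and the function names and parameter values are hypothetical.

```python
import math
from itertools import product


def capture_prob(alpha, beta):
    """Capture probability under the assumed logit form alpha + beta."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta)))


def history_probs(alpha, betas):
    """Probability of every capture history for one individual.

    Assumes captures are independent across occasions given alpha.
    Returns a dict mapping tuples such as (1, 0, 1) to probabilities.
    """
    ps = [capture_prob(alpha, b) for b in betas]
    out = {}
    for hist in product((0, 1), repeat=len(betas)):
        prob = 1.0
        for caught, p in zip(hist, ps):
            prob *= p if caught else 1.0 - p
        out[hist] = prob
    return out
```

The all-zero history, for example `history_probs(0.0, [0.0, 0.0])[(0, 0)]`, gives the probability that an individual with that catchability is never observed. That unobserved cell is what population-size estimation has to account for.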
Cormack, R. M. (1993). The flexibility of GLIM analyses of multiple recapture or resighting data. In Marked Individuals in the Study of Bird Population, J.-D. Lebreton and P. M. North (eds), 39-49. Basel: Birkhauser Verlag.