Ecology, 83(8), 2002, pp. 2248–2255
© 2002 by the Ecological Society of America

ESTIMATING SITE OCCUPANCY RATES WHEN DETECTION PROBABILITIES ARE LESS THAN ONE

DARRYL I. MACKENZIE,1,5 JAMES D. NICHOLS,2 GIDEON B. LACHMAN,2,6 SAM DROEGE,2 J. ANDREW ROYLE,3 AND CATHERINE A. LANGTIMM4

1 Department of Statistics, North Carolina State University, Raleigh, North Carolina 27695-8203 USA
2 U.S. Geological Survey, Patuxent Wildlife Research Center, 11510 American Holly Drive, Laurel, Maryland 20708-4017 USA
3 U.S. Fish and Wildlife Service, Patuxent Wildlife Research Center, 11510 American Holly Drive, Laurel, Maryland 20708-4017 USA
4 U.S. Geological Survey, Florida Caribbean Science Center, Southeastern Amphibian Research and Monitoring Initiative, 7920 NW 71st Street, Gainesville, Florida 32653 USA

Abstract. Nondetection of a species at a site does not imply that the species is absent unless the probability of detection is 1. We propose a model and likelihood-based method for estimating site occupancy rates when detection probabilities are <1. The model provides a flexible framework enabling covariate information to be included and allowing for missing observations. Via computer simulation, we found that the model provides good estimates of the occupancy rates, generally unbiased for moderate detection probabilities (>0.3). We estimated site occupancy rates for two anuran species at 32 wetland sites in Maryland, USA, from data collected during 2000 as part of an amphibian monitoring program, Frogwatch USA. Site occupancy rates were estimated as 0.49 for American toads (Bufo americanus), a 44% increase over the proportion of sites at which they were actually observed, and as 0.85 for spring peepers (Pseudacris crucifer), slightly above the observed proportion of 0.83.

Key words: anurans; bootstrap; Bufo americanus; detection probability; maximum likelihood; metapopulation; monitoring; patch occupancy; Pseudacris crucifer; site occupancy.

INTRODUCTION

We describe an approach to estimating the proportion

of sites occupied by a species of interest. We envision

a sampling method that involves multiple visits to sites

during an appropriate season during which a species

may be detectable. However, a species may go unde-

tected at these sites even when present. Sites may rep-

resent discrete habitat patches in a metapopulation dy-

namics context or sampling units (e.g., quadrats) reg-

ularly visited as part of a large-scale monitoring pro-

gram. The patterns of detection and nondetection over

the multiple visits for each site permit estimation of

detection probabilities and the parameter of interest,

proportion of sites occupied.

Our motivation for considering this problem in-

volves potential applications in (1) large-scale moni-

toring programs and (2) investigations of metapopu-

lation dynamics. Monitoring programs for animal pop-

ulations and communities have been established

throughout the world in order to meet a variety of ob-

jectives. Most programs face two important sources of

Manuscript received 29 May 2001; revised 11 October 2001; accepted 22 October 2001.
5 Present address: Proteus Research and Consulting Ltd., P.O. Box 5193, Dunedin, New Zealand. E-mail: darryl@proteus.co.nz
6 Present address: International Association of Fish and Wildlife Agencies, 444 N. Capitol Street NW, Suite 544, Washington, D.C., 20001 USA.

variation that must be incorporated into the design

(e.g., see Thompson 1992, Lancia et al. 1994, Thomp-

son et al. 1998, Yoccoz et al. 2001, Pollock et al. 2002).

The ﬁrst source of variation is space. Many programs

seek to provide inferences about areas that are too large

to be completely surveyed. Thus, small areas must be

selected for surveying, with the selection being carried

out in a manner that permits inference to the entire area

of interest (Thompson 1992, Yoccoz et al. 2001, Pol-

lock et al. 2002).

The second source of variation important to moni-

toring program design is detectability. Few animals are

so conspicuous that they are always detected at each

survey. Instead, some sort of count statistic is obtained

(e.g., number of animals seen, heard, trapped, or oth-

erwise detected), and a method is devised to estimate

the detection probability associated with the count statistic. Virtually all of the abundance estimators described in volumes such as Seber (1982) and Williams et al. (in press) can be viewed as count statistics divided by estimated detection probabilities. Not allowing for

detectability and solely using the count statistic as an

index to abundance is unwise. Changes in the count

may be a product of random variations or changes in

detectability, so it is impossible to make useful infer-

ence about the system under investigation.

The methods used to estimate detection probabilities

of individual animals (and hence abundance) at each

site are frequently expensive of time and effort. For


this reason, these estimation methods are often used in

detailed experiments or small-scale investigations, but

are not as widely used in large-scale monitoring pro-

grams. The methods proposed here to estimate the pro-

portion of sites (or more generally, the proportion of

sampled area) occupied by a species can be imple-

mented more easily and less expensively than the meth-

ods used for abundance estimation. For this reason, our

proposed method should be attractive as a basis for

large-scale monitoring programs, assuming that the

proportion of sites or area occupied is an adequate state

variable with respect to program objectives.

The second motivation for considering this estima-

tion problem involves the importance of patch occu-

pancy data to the study of metapopulation dynamics.

The proportion of patches occupied is viewed as a state

variable in various metapopulation models (e.g., Levins

1969, 1970, Lande 1987, 1988, Hanski 1992, 1994,

1997). So-called “incidence functions” (e.g., see Diamond 1975, Hanski 1992) depict the probability of

occurrence of a species in a patch, expressed as a func-

tion of patch characteristics such as area. Under the

assumption of a stationary Markov process, incidence

function data are sometimes used to estimate patch ex-

tinction and colonization probabilities (e.g., Hanski

1992, 1994, 1997, Moilanen 1999). Given the relevance

of patch occupancy data to metapopulation investiga-

tions and models, it seems important to estimate patch

occupancy probabilities properly. For most animal

sampling situations, detection of a species is indeed

indicative of “presence,” but nondetection of the species is not equivalent to absence. Thus, we expect most

incidence function estimates of the proportion of patch-

es occupied to be negatively biased to some unknown

degree because species can go undetected when pre-

sent.

In this paper, we ﬁrst present general sampling meth-

ods that permit estimation of the probability of site

occupancy when detection probabilities are <1 and

may vary as functions of site characteristics, time, or

environmental variables. We then present a statistical

model for site occupancy data and describe maximum

likelihood estimation under this model. We illustrate

use of the estimation approach with empirical data on

site occupancy by two anuran species at 32 wetland

sites in Maryland collected during 2000. Finally we

discuss extending this statistical framework to address

other issues such as colony extinction/colonization,

species co-occurrence, and allowing for heterogeneous

detection and occupancy probabilities.

METHODS

Notation

We use the following notation throughout this article:

$\psi_i$, probability that a species is present at site $i$;
$p_{it}$, probability that a species will be detected at site $i$ at time $t$, given presence;
$N$, total number of surveyed sites;
$T$, number of distinct sampling occasions;
$n_t$, number of sites where the species was detected at time $t$;
$n_{.}$, total number of sites at which the species was detected at least once.

Our use of $p$ to signify detection probabilities differs from its customary use in the metapopulation literature, where it is used to denote the probability of species presence (our $\psi$). However, our notation is consistent with the mark–recapture literature, which provides the foundation of our approach.

Basic sampling situation

Here we consider situations in which surveys of species at $N$ specific sites are performed at $T$ distinct occasions in time. Sites are occupied by the species of interest for the duration of the survey period, with no new sites becoming occupied after surveying has begun, and no sites abandoned before the cessation of surveying (i.e., the sites are “closed” to changes in occupancy). At each sampling occasion, investigators use sampling methods designed to detect the species of interest. Species are never falsely detected at a site when absent, and a species may or may not be detected at a site when present. Detection of the species at a site is also assumed to be independent of detecting the species at all other sites. The resulting data for each site can be recorded as a vector of 1's and 0's denoting detection and nondetection, respectively, for the occasions on which the site was sampled. The set of such detection histories is used to estimate the quantity of interest, the proportion of sites occupied by the species.

General likelihood

We propose a method that parallels a closed-population, mark–recapture model, with an additional parameter ($\psi$) that represents the probability of species presence. In closed-population models, the focus is to estimate the number of individuals never encountered by using information garnered from those individuals encountered at least once (e.g., see Otis et al. 1978, Williams et al., in press). In our application, sites are analogous to individuals except that we observe the number of sites with the history comprising $T$ 0's (sites at which the species is never detected over the $T$ sampling occasions); hence, the total population size of sites is known, but the focus is to estimate the fraction of those sites that the species actually occupies. One could recast this problem into a more conventional closed mark–recapture framework by only considering those sites where the species was detected at least once. Use of such data with closed-population, capture–recapture models (e.g., Otis et al. 1978) would yield estimates of population size that correspond to the number of sites where the species is present. However, the following method enables additional modeling of $\psi$ to be investigated (such as including covariate information).

A likelihood can be constructed using a series of probabilistic arguments similar to those used in mark–recapture modeling (Lebreton et al. 1992). For sites where the species was detected on at least one sampling occasion, the species must be present and was either detected or not detected at each sampling occasion. For example, the likelihood for site $i$ with history 01010 would be

$$\psi_i (1 - p_{i1})\, p_{i2}\, (1 - p_{i3})\, p_{i4}\, (1 - p_{i5}).$$

However, nondetection of the species does not imply absence. Either the species was present and was not detected after $T$ samples, or the species was not present. For site $k$ with history 00000, the likelihood is

$$\psi_k \prod_{t=1}^{5} (1 - p_{kt}) + (1 - \psi_k).$$

Assuming independence of the sites, the product of all terms (one for each site) constructed in this manner creates the model likelihood for the observed set of data, which can be maximized to obtain maximum likelihood estimates of the parameters.
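The two cases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `site_likelihood` and its argument layout are hypothetical, and the history is assumed to be complete (every occasion surveyed).

```python
from math import prod


def site_likelihood(history, psi, p):
    """Likelihood contribution of one site with a complete detection history.

    history: sequence of 1 (detected) / 0 (not detected), one per occasion
    psi:     probability that the species is present at the site
    p:       sequence of detection probabilities, one per occasion
    (All names here are illustrative, not taken from the paper's software.)
    """
    # Probability of the observed history, given that the species is present.
    given_present = prod(
        p_t if h == 1 else 1.0 - p_t for h, p_t in zip(history, p)
    )
    if any(history):
        # At least one detection: the species must be present.
        return psi * given_present
    # Never detected: present but missed every time, or genuinely absent.
    return psi * given_present + (1.0 - psi)
```

Multiplying the values returned for every site (or summing their logarithms) gives the model likelihood described in the text.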

Note that, at this stage, presence and detection prob-

abilities have been deﬁned as site speciﬁc. In practice,

such a model could not be ﬁt to the data because the

likelihood contains too many parameters: the model

likelihood is over-parameterized. However, the model

is presented in these general terms because, in some

cases, the probabilities may be modeled as a function

of site-speciﬁc covariates, to which we shall return.

When presence and detection probabilities are con-

stant across monitoring sites, the combined model like-

lihood can be written as

$$L(\psi, \mathbf{p}) = \left[\psi^{n_{.}} \prod_{t=1}^{T} p_t^{n_t} (1 - p_t)^{n_{.} - n_t}\right] \times \left[\psi \prod_{t=1}^{T} (1 - p_t) + (1 - \psi)\right]^{N - n_{.}}. \tag{1}$$

Using the likelihood in this form, our model could be implemented with relative ease via spreadsheet software with built-in function maximization routines, because only the summary statistics ($n_1, \ldots, n_T$, $n_{.}$) and $N$ are required. Detection probabilities could be time specific, or reduced forms of the model could be investigated by constraining $p$ to be constant across time or a function of environmental covariates.
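As a rough illustration of how little is needed to evaluate Eq. 1, the sketch below computes its logarithm from the summary statistics alone and then maximizes it for the reduced model with constant $p$. The crude grid search merely stands in for a spreadsheet's maximization routine; the function names, the grid resolution, and any numbers passed in are assumptions made for the example.

```python
from math import log


def log_likelihood(psi, p, n_t, n_dot, N):
    """Log of Eq. 1 from summary statistics.

    p and n_t are lists of length T; requires 0 < psi <= 1 and 0 < p_t < 1.
    """
    detected = n_dot * log(psi)
    miss_all = 1.0  # probability of missing the species on every occasion
    for p_t, n in zip(p, n_t):
        detected += n * log(p_t) + (n_dot - n) * log(1.0 - p_t)
        miss_all *= 1.0 - p_t
    never = (N - n_dot) * log(psi * miss_all + (1.0 - psi))
    return detected + never


def fit_constant_p(n_t, n_dot, N, steps=200):
    """Grid-search maximum of Eq. 1 with p constant across occasions."""
    T = len(n_t)
    best_ll, best_psi, best_p = float("-inf"), None, None
    for i in range(1, steps + 1):      # psi on (0, 1]
        psi = i / steps
        for j in range(1, steps):      # p on (0, 1)
            p = j / steps
            ll = log_likelihood(psi, [p] * T, n_t, n_dot, N)
            if ll > best_ll:
                best_ll, best_psi, best_p = ll, psi, p
    return best_psi, best_p, best_ll
```

A call such as `fit_constant_p([8, 10, 9, 7, 11], 20, 40)` (hypothetical counts for 40 sites and five occasions) returns the grid point with the highest log-likelihood; a proper optimizer would refine it further.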

We suggest that the standard error of $\psi$ be estimated using a nonparametric bootstrap method (Buckland and Garthwaite 1991), rather than the asymptotic (large-sample) estimate involving the second partial derivatives of the model likelihood (Lebreton et al. 1992). The asymptotic estimate represents a lower bound on the value of the standard error, and may be too small when sample sizes are small. A random bootstrap sample of $N$ sites is taken (with replacement) from the $N$ monitored sites. The histories of the sites in the bootstrap sample are used to obtain a bootstrap estimate of $\psi$. The bootstrap procedure is repeated a large number of times, and the estimated standard error is the sample standard deviation of the bootstrap estimates (Manly 1997).
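The bootstrap procedure can be sketched as follows, assuming the simplest model (constant $\psi$ and $p$) and complete histories of equal length. The helper `fit_psi` is a hypothetical stand-in for whatever estimation routine is actually used: for each trial value of $p$ it uses the value of $\psi$ that maximizes the likelihood for that $p$, namely $n_{.} / \{N[1 - (1 - p)^T]\}$ capped at 1, which follows from setting the derivative of the log-likelihood with respect to $\psi$ to zero. The example data are invented.

```python
import random
from math import log
from statistics import stdev


def fit_psi(histories, grid=200):
    """Estimate psi under constant psi and p (complete, equal-length histories)."""
    N, T = len(histories), len(histories[0])
    n_dot = sum(1 for h in histories if any(h))   # sites with >= 1 detection
    d = sum(sum(h) for h in histories)            # total number of detections
    if n_dot == 0:
        return 0.0
    best_ll, best_psi = float("-inf"), None
    for j in range(1, grid):
        p = j / grid
        q = (1.0 - p) ** T                        # P(never detected | present)
        psi = min(1.0, n_dot / (N * (1.0 - q)))   # best psi for this p
        ll = (n_dot * log(psi) + d * log(p) + (n_dot * T - d) * log(1.0 - p)
              + (N - n_dot) * log(psi * q + 1.0 - psi))
        if ll > best_ll:
            best_ll, best_psi = ll, psi
    return best_psi


def bootstrap_se(histories, n_boot=200, seed=0):
    """Nonparametric bootstrap standard error of the psi estimate."""
    rng = random.Random(seed)
    N = len(histories)
    estimates = []
    for _ in range(n_boot):
        # Resample N sites with replacement, keeping each site's full history.
        sample = [rng.choice(histories) for _ in range(N)]
        estimates.append(fit_psi(sample))
    return stdev(estimates)


# Invented detection histories for 20 sites and 5 occasions.
histories = [[1, 0, 1, 0, 1]] * 6 + [[0, 1, 0, 0, 0]] * 4 + [[0, 0, 0, 0, 0]] * 10
psi_hat = fit_psi(histories)
se_hat = bootstrap_se(histories)
```

Resampling whole sites, rather than individual visits, keeps each detection history intact, which matches the description above.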

Extensions to the model

Covariates.—It would be reasonable to expect that $\psi$ may be some function of site characteristics such as habitat type or patch size. Similarly, $p$ may also vary with certain measurable variables such as weather conditions. This covariate information ($X$) can be easily introduced to the model using a logistic model (Eq. 2) for $\psi$ and/or $p$ (denote the parameter of interest as $\theta$ and the vector of model parameters as $B$):

$$\theta = \frac{\exp(XB)}{1 + \exp(XB)}. \tag{2}$$
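For concreteness, Eq. 2 can be evaluated for a single site (or site-by-occasion) covariate row as in the sketch below. The function name and the covariate layout (an intercept followed by covariate values) are assumptions for illustration; the branching is only a standard device to avoid overflow for large linear predictors.

```python
from math import exp


def logistic(x, beta):
    """Eq. 2 for one covariate row x and parameter vector beta (illustrative names)."""
    eta = sum(x_j * b_j for x_j, b_j in zip(x, beta))  # linear predictor XB
    if eta >= 0:
        # Algebraically equal to exp(eta) / (1 + exp(eta)), but cannot overflow.
        return 1.0 / (1.0 + exp(-eta))
    z = exp(eta)
    return z / (1.0 + z)
```

Whatever the covariate values, the result stays between 0 and 1, so it can be used directly as $\psi$ or $p$ in the likelihood.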

Because $\psi$ does not change over time during the sampling (the population is closed), appropriate covariates would be time constant and site specific, whereas covariates for detection probabilities could be time varying and site specific (such as air or water temperature).

This is in contrast to mark–recapture models in

which time-varying individual covariates cannot be

used. In mark–recapture, a time-varying individual

covariate can only be measured on those occasions

when the individual is captured; the covariate value is

unknown otherwise. Here, time-varying, site-speciﬁc

covariates can be collected and used regardless of

whether the species is detected. It would not be pos-

sible, however, to use covariates that change over time

and cannot be measured independent of the detection

process.

If $\psi$ is modeled as a function of covariates, the average species presence probability is

$$\bar{\hat{\psi}} = \frac{\sum_{i=1}^{N} \hat{\psi}_i}{N}. \tag{3}$$

Missing observations.—In some circumstances, it may not be possible to survey all sites at all sampling

occasions. Sites may not be surveyed for a number of

reasons, from logistic difﬁculties in getting ﬁeld per-

sonnel to all sites, to the technician’s vehicle breaking

down en route. These sampling inconsistencies can be

easily accommodated using the proposed model like-

lihood.

If sampling does not take place at site $i$ at time $t$, then that occasion contributes no information to the model likelihood for that site. For example, consider the history 10_11, where no sampling occurred at time 3. The likelihood for this site would be:

$$\psi\, p_1 (1 - p_2)\, p_4\, p_5.$$

Missing observations can only be accounted for in this manner when the model likelihood is evaluated separately for each site, rather than using the combined form of Eq. 1.
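A site-by-site evaluation that skips unsurveyed occasions might look like the sketch below. It assumes, purely for brevity, that $\psi$ and $p$ are constant across sites and occasions, and it encodes a missing observation as `None`; both choices, and the function name, are illustrative rather than taken from the paper's software.

```python
from math import log


def log_likelihood(histories, psi, p):
    """Log-likelihood summed over sites; None marks an occasion with no survey."""
    total = 0.0
    for history in histories:
        surveyed = [obs for obs in history if obs is not None]
        detections = sum(surveyed)
        misses = len(surveyed) - detections
        # Only surveyed occasions enter the product; missing ones drop out.
        given_present = p ** detections * (1.0 - p) ** misses
        if detections > 0:
            total += log(psi * given_present)
        else:
            total += log(psi * given_present + (1.0 - psi))
    return total
```

A site that was never surveyed contributes log(1) = 0, so it carries no information, which is consistent with the treatment of missing occasions described above.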


FIG. 1. Results of the 500 simulated sets of data for $N$ = 40, with no missing values. Indicated are the average value of $\hat{\psi}$, $\bar{\hat{\psi}}$; the replication-based estimate of the true standard error of $\hat{\psi}$, SE($\hat{\psi}$); and the average estimate of the standard error obtained from 200 nonparametric bootstrap samples, $\overline{\mathrm{SE}}(\hat{\psi})$, for various levels of $T$, $p$, and $\psi$.

SIMULATION STUDY

Simulation methods

A simulation study was undertaken to evaluate the proposed method for estimating $\psi$. Data were generated for situations in which all sites had the same probability of species presence, and the detection probability was constant across time and sites, $\psi(\cdot)p(\cdot)$. The effects of five factors were investigated: (1) $N$ = 20, 40, or 60; (2) $\psi$ = 0.5, 0.7, or 0.9; (3) $p$ = 0.1, 0.3, or 0.5; (4) $T$ = 2, 5, or 10; (5) probability of a missing observation = 0.0, 0.1, or 0.2.

For each of the 243 scenarios, 500 sets of data were simulated. For each site, a uniformly distributed, pseudo-random number between 0 and 1 was generated ($y$), and if $y \le \psi$ then the site was occupied. Further pseudo-random numbers were generated and similarly compared to $p$ to determine whether the species was detected at each time period, with additional random numbers being used to establish missing observations. The $\psi(\cdot)p(\cdot)$ model was applied to each set of simulated data. The resulting estimate of $\psi$ was recorded and the nonparametric bootstrap estimate of the standard error was also obtained using 200 bootstrap samples.
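The data-generating step described above can be sketched as follows. The function name, the use of `None` for a missing observation, and the order in which the random numbers are drawn are assumptions for illustration; the original simulation code is not reproduced here.

```python
import random


def simulate(N, T, psi, p, p_missing=0.0, rng=None):
    """Simulate detection histories for N sites and T occasions.

    Each entry is 1 (detected), 0 (not detected), or None (not surveyed).
    """
    rng = rng or random.Random()
    histories = []
    for _ in range(N):
        occupied = rng.random() <= psi        # site occupied if y <= psi
        history = []
        for _ in range(T):
            # Detection is only possible at an occupied site.
            detected = occupied and rng.random() <= p
            if rng.random() < p_missing:      # occasion lost to a missed survey
                history.append(None)
            else:
                history.append(1 if detected else 0)
        histories.append(history)
    return histories
```

For example, `simulate(40, 5, 0.7, 0.3, rng=random.Random(1))` generates one data set for a scenario with 40 sites and five occasions; repeating the call 500 times and fitting the model to each data set would mimic one cell of the simulation design.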

Simulation results

Fig. 1 presents the simulation results for scenarios where $N$ = 40 with no missing values only, but these are representative of the results in general. The full simulation results are included in the Appendix.

Generally, this method provides reasonable estimates of the proportion of sites occupied. When detection probability is 0.3 or greater, the estimates of $\psi$ are reasonably unbiased in all scenarios considered for $T \ge 5$. When $T$ = 2, only when detection probability is at least 0.5 do the estimates of $\psi$ appear to be reasonable. For low detection probabilities, however, $\psi$ tends to be overestimated when the true value is 0.5 or 0.7, but underestimated when $\psi$ equals 0.9. A closer examination of the results reveals that, in some situations in which detection probability is low, $\hat{\psi}$ tends to 1.

In most cases, the nonparametric bootstrap provides a good estimate of the standard error for $\hat{\psi}$, the exception being for situations with low detection probabilities. Again, this is caused by $\psi$ estimates close to 1; in such situations, the bootstrap estimate of the standard error is very small, which overstates the precision of $\hat{\psi}$.

TABLE 1. Relative difference in AIC ($\Delta$AIC), AIC model weights ($w_i$), overall estimate of the fraction of sites occupied by each species ($\hat{\psi}$), and associated standard error (SE($\hat{\psi}$)).

Model, by species                     $\Delta$AIC   $w_i$   $\hat{\psi}$   SE($\hat{\psi}$)
American toad
  $\psi$(Habitat) $p$(Temperature)       0.00       0.36      0.50          0.13
  $\psi(\cdot)$ $p$(Temperature)         0.42       0.24      0.49          0.14
  $\psi$(Habitat) $p(\cdot)$             0.49       0.22      0.49          0.12
  $\psi(\cdot)$ $p(\cdot)$               0.70       0.18      0.49          0.13
Spring peeper
  $\psi$(Habitat) $p$(Temperature)       0.00       0.85      0.84          0.07
  $\psi(\cdot)$ $p$(Temperature)         1.72       0.15      0.85          0.07
  $\psi$(Habitat) $p(\cdot)$            40.49       0.00      0.84          0.07
  $\psi(\cdot)$ $p(\cdot)$              42.18       0.00      0.85          0.07

In general, increasing the number of sampling occasions improves both the accuracy and precision of $\hat{\psi}$, although in some instances there is little gain in using 10 occasions rather than five. If only two occasions are used, however, accuracy tends to be poor unless detection probabilities are high, and even then the standard error of $\hat{\psi}$ is approximately double that of using five sampling occasions.

Similarly, increasing the number of sites sampled, $N$, also improves both the accuracy and precision of $\hat{\psi}$.

Not presented here are the simulation results for scenarios with missing observations. The proposed method appears to be robust to missing data, with the only noticeable effect being (unsurprisingly) a loss of precision. In this study, on average, the standard error of $\hat{\psi}$ increased by 5% with 10% missing observations, and by 11% with 20% missing observations. The bootstrap standard error estimates also increased by a similar amount, accounting well for the loss of information.

FIELD STUDY OF ANURANS AT MARYLAND WETLANDS

Field methods and data collection

We illustrate our method by considering monitoring data collected on American toads (Bufo americanus) and spring peepers (Pseudacris crucifer) at 32 wetland

sites located in the Piedmont and Upper Coastal Plain

physiographic provinces surrounding Washington,

D.C., and Baltimore, Maryland, USA. Volunteers en-

rolled in the National Wildlife Federation/U.S. Geo-

logical Survey’s amphibian monitoring program,

Frogwatch USA, visited monitoring sites between 19

February 2000 and 12 October 2000. Sites were chosen

nonrandomly by volunteers and were monitored at their

convenience. Observers collected information on the

species of frogs and toads heard calling during a 3-min

counting period taken sometime after sundown. Each

species of calling frog and toad was assigned a three-

level calling index, which, for this study, was truncated

to reﬂect either detection (1) or nondetection (0).

The data set was reduced by considering only the

portion of data for each species between the dates of

ﬁrst and last detection exclusive. Truncating the data

in this manner ensures that species were available to

be detected throughout that portion of the monitoring

period, thus satisfying our closure assumption. Includ-

ing the dates of ﬁrst and last detection in the analysis

would bias parameter estimates because the data set

was deﬁned using these points; hence, they were ex-

cluded.

Three sites were removed after the truncation be-

cause they were never monitored during the redeﬁned

period. Fewer than eight of the 29 sites were monitored

on any given day and the number of visits per site

varied tremendously, with a very large number of missing observations (~90%). Note that in the context of

this sampling, the entire sampling period included the

interval between the date at which the ﬁrst wetland was

sampled and the date at which all sampling ended. A

missing observation was thus any date during this in-

terval on which a wetland was not sampled. Each time

a site was visited, air temperature was recorded. Sites

were deﬁned as being either a distinct body of water

(pond, lake) or other habitat (swamp, marsh, wet mead-

ow). These variables were considered as potential cov-

ariates for detection and presence probabilities, re-

spectively. The data used in this analysis have been

included in the Supplement.

Results of ﬁeld study

American toad.—Daily records for the 29 sites, monitored between 9 March 2000 and 30 May 2000, were included for analysis. Sites were visited 8.9 times on average (minimum = 2, maximum = 58 times), with American toads being detected at least once at 10 locations (0.34). Three models with covariates and one

without were ﬁt to the data (Table 1) and ranked ac-

cording to AIC (Burnham and Anderson 1998). The

four models considered have virtually identical weight,

suggesting that all models provide a similar description

of the data, despite the different structural forms.

Therefore we cannot make any conclusive statement

regarding the importance of the covariates, but there

is some suggestion that detection probabilities may in-

crease with increasing temperature and occupancy rates

may be lower for habitats consisting of a distinct body

of water. However, all models provide very similar estimates of the overall occupancy rate (~0.49), which

is 44% larger than the proportion of sites where toads

were detected at least once. The standard error for the

estimate is reasonably large and corresponds to a co-

efﬁcient of variation of 27%.
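The two percentages quoted for the American toad can be checked from the reported figures. The worked arithmetic below uses the rounded naive proportion (0.34) and takes SE($\hat{\psi}$) = 0.13, a value consistent with the range reported in Table 1; both are rounded inputs, so the results are approximate.

```latex
\[
\frac{10}{29} \approx 0.34, \qquad
\frac{\hat{\psi}}{0.34} = \frac{0.49}{0.34} \approx 1.44
\quad (\text{a 44\% increase}), \qquad
\mathrm{CV} = \frac{\mathrm{SE}(\hat{\psi})}{\hat{\psi}}
\approx \frac{0.13}{0.49} \approx 0.27 .
\]
```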

Spring peeper.—Daily records for the 29 sites, monitored between 27 February 2000 and 30 May 2000, were included for analysis. Sites were visited, on average, 9.6 times (minimum = 2, maximum = 66 visits), with spring peepers being detected at least once at 24 locations (0.83). The same models as those for the American toad were fit to the spring peeper data and the results are also displayed in Table 1. Here the two $p(\cdot)$ models have virtually zero weight, indicating that the $p$(Temperature) models provide a much better description of the data. We suspect that this effect is due, partially, to a tapering off of the calling season as spring progresses into summer. The $\psi$(Habitat)$p$(Temperature)

model clearly has greatest weight and suggests that

estimated occupancy rates are lower for distinct bodies

of water (0.77) than for other habitat types (1.00). This

is not unexpected, given spring peepers were actually

detected at all sites of the latter type. Regardless of

how the models ranked, however, all models provide

a similar estimate of the overall occupancy rate that is

only marginally greater than the proportion of sites where

spring peepers were detected at least once. This sug-

gests that detection probabilities were large enough that

spring peepers probably would be detected during the

monitoring if present.

DISCUSSION

The method proposed here to estimate site occupancy

rate uses a simple probabilistic argument to allow for

species detection probabilities of <1. As shown, it provides a flexible modeling framework for incorporating

both covariate information and missing observations.

It also lays the groundwork for some potentially ex-

citing extensions that would enable important ecolog-

ical questions to be addressed.

From the full simulation results for scenarios with

low detection probabilities, it is very easy to identify

circumstances in which one should doubt the estimates of $\psi$. We advise caution if an estimate of $\psi$ very close to 1 is obtained when detection probabilities are low (<0.15), particularly when the number of sampling occasions is also small (<7). In such circumstances, the

level of information collected on species presence/ab-

sence is small, so it is difﬁcult for the model to dis-

tinguish between a site where the species is genuinely

absent and a site where the species has merely not been

detected.

Our simulation results may also provide some guid-

ance on the number of visits to each site required in

order to obtain reasonable estimates of occupancy rate.

If one wishes to visit a site only twice, then it appears that the true occupancy rate needs to be >0.7 and detection probability (at each visit) should be >0.3. Even

then, however, precision of the estimate may be low.

Increasing the number of visits per site improves the

precision of the estimated occupancy rate, and the re-

sulting increase in information improves the accuracy

of the estimate when detection probabilities are low.

We stress that whenever a survey (of any type) is being

designed, some thought should be given to the likely

results and method of analysis, because these consid-

erations can provide valuable insight on the level of

sampling effort required to achieve ‘‘good’’ results.

Logistical considerations of multiple visits will prob-

ably result in some hesitancy to use this approach, but

we suggest that the expenditure of extra effort to obtain

unbiased estimates of parameters of interest generally

will be preferable to the expenditure of less effort to

obtain biased estimates. If travel time to sites is sub-

stantial, then multiple searches or samples may be con-

ducted by multiple observers, or even by a single ob-

server, at a single trip to a site, e.g., conduct two or

more 3-min amphibian calling surveys in a single night

at the same pond. If large numbers of patches must be

surveyed, then it may be reasonable to conduct multiple

visits at a subset of sites for the purpose of estimating

detection probability, and perhaps associated covariate

relationships. Then this information on detection prob-

ability, perhaps modeled as a function of site-speciﬁc

covariates, could be applied to sites visited only once.

Issues about optimal design require additional work,

but it is clear that a great deal of ﬂexibility is possible

in approaches to sampling.

Site occupancy may well change over years or be-

tween seasons as populations change; new colonies

could be formed or colonies could become locally ex-

tinct. When sites are surveyed on more than one oc-

casion between these periods of change, for multiple

periods, the approach described here could be com-

bined with the robust design mark–recapture approach

(Pollock et al. 1990). For example, suppose that the

anuran sampling described in our examples is contin-

ued in the future, such that the same wetland sites are

surveyed multiple times each summer, for multiple

years. During the periods when sites are closed to

changes in occupancy, our approach could be used to

estimate the occupancy rate as in our example. The

change in occupancy rates over years could then be

modeled as functions of site colonization and extinction

rates, analogous with the birth and death rates in an

open-population mark–recapture study. Such Markov

models of patch occupancy dynamics will permit time-

speciﬁc estimation and modeling of patch extinction

and colonization rates that do not require the assumptions of $p$ = 1 or process stationarity invoked in previous modeling efforts (e.g., Erwin et al. [1998] required $p$ = 1; Hanski [1992, 1994] and Clark and Rosenzweig [1994] required both assumptions).

Often monitoring programs collect information on

the presence/absence of multiple species at the same

sites. An important biological question is whether spe-

cies co-occur independently. Does the presence/ab-

sence of species A depend upon the occupancy state

of species B? Our method of modeling species presence

could be extended in this direction, enabling such important ecological questions to be addressed. The model could be parameterized in terms of $\psi_{AB}$ (in addition to $\psi_A$ and $\psi_B$): the probability that both species A and species B are present at a site. However, the number of parameters in the model would increase exponentially with the number of species, so reasonably good data sets might be required. For example, four additional parameters would be required to model co-occurrences between species A, B, and C ($\psi_{AB}$, $\psi_{AC}$, $\psi_{BC}$, $\psi_{ABC}$), but if six species were being modeled, 57 extra parameters would need to be estimated.
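The counts quoted above follow from needing one extra parameter for every subset of two or more species. A minimal sketch of that count (the function name is hypothetical):

```python
from math import comb


def extra_cooccurrence_parameters(n_species):
    """Number of joint-presence parameters: one per subset of >= 2 species."""
    return sum(comb(n_species, k) for k in range(2, n_species + 1))
```

This equals $2^S - S - 1$ for $S$ species, which gives 4 for three species and 57 for six, in agreement with the figures in the text.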

Not addressed are situations in which presence and

detection probabilities are heterogeneous, varying

across sites. Some forms of heterogeneity may be ac-

counted for with covariate information such as site

characteristics or environmental conditions at the time

of sampling. On other occasions, however, the source

of heterogeneity may be unknown. We foresee that one solution would be to combine our method with the mixture-model approach to closed-population mark–recapture models of Pledger (2000), which would keep the problem within a likelihood framework. It may also be possible to combine our method

with other closed-population, mark–recapture methods

such as the jackknife (Burnham and Overton 1978) or

coverage estimators (Chao et al. 1992). For different

sampling frameworks, where monitoring is performed

on a continuous or incidental basis rather than at dis-

crete sampling occasions, combining our methods with

the Poisson family of models (Boyce et al. 2001,

MacKenzie and Boyce 2001) may also be feasible, par-

ticularly for multiple years of data.
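To make the mixture idea concrete, the sketch below writes a single-site likelihood in which detection probability takes one of a few latent values. This is an illustrative assumption about how such a combination might look, not the method of Pledger (2000) nor the software in the Supplement; the function name, the constant-within-class detection probability, and the argument names are all hypothetical.

```python
import math


def site_log_likelihood(history, psi, p_classes, weights):
    """Log-likelihood of one site's 0/1 detection history.

    psi       -- probability the site is occupied
    p_classes -- detection probability in each latent class (assumed constant over visits)
    weights   -- mixing proportions for the classes, summing to 1
    """
    if len(p_classes) != len(weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must match p_classes and sum to 1")
    detections = sum(history)
    misses = len(history) - detections
    # Occupied site: average the detection-history probability over latent classes.
    occupied = psi * sum(
        w * p ** detections * (1.0 - p) ** misses
        for p, w in zip(p_classes, weights)
    )
    # An unoccupied site can only produce an all-zero history.
    unoccupied = (1.0 - psi) if detections == 0 else 0.0
    return math.log(occupied + unoccupied)


if __name__ == "__main__":
    # Two latent classes: hard-to-detect and easy-to-detect sites.
    print(site_log_likelihood([0, 1, 0], 0.7, [0.2, 0.6], [0.5, 0.5]))
```

With a single class the expression reduces to a constant-detection occupancy likelihood; summing the log-likelihood over sites and maximizing it numerically would give the estimates.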

The three extensions to the proposed methods are

currently the focus of ongoing research on this general

topic of estimating site occupancy rates.

Software to perform the above modeling has been

included in the Supplement.

ACKNOWLEDGMENTS

We would like to thank Christophe Barbraud, Mike Conroy, Ullas Karanth, Bill Kendall, Ken Pollock, John Sauer, and Rob Swihart for useful discussions of this general estimation problem and members of the USGS Southeastern Amphibian Research and Monitoring Initiative for discussions on amphibian monitoring. Atte Moilanen provided a constructive review of the manuscript, as did a second anonymous reviewer. Our thanks also go to the Frogwatch USA volunteers directed by Sue Muller at Howard County Department of Parks and Recreation for collection of the data.

LITERATURE CITED

Boyce, M. S., D. I. MacKenzie, B. F. J. Manly, M. A. Har-

oldson, and D. Moody. 2001. Negative binomial models

for abundance estimation of multiple closed populations.

Journal of Wildlife Management 65:498–509.

Buckland, S. T., and P. H. Garthwaite. 1991. Quantifying

precision of mark–recapture estimates using the bootstrap

and related methods. Biometrics 47:255–268.

Burnham, K. P., and D. R. Anderson. 1998. Model selection

and inference—a practical information-theoretic approach.

Springer-Verlag, New York, New York, USA.

Burnham, K. P., and W. S. Overton. 1978. Estimation of the

size of a closed population when capture probabilities vary

among animals. Biometrika 65:625–633.

Chao, A., S.-M. Lee, and S.-L. Jeng. 1992. Estimating pop-

ulation size for capture–recapture data when capture prob-

abilities vary by time and individual animal. Biometrics

48:201–216.

Clark, C. W., and M. L. Rosenzweig. 1994. Extinction and

colonization processes: parameter estimates from sporadic

surveys. American Naturalist 143:583–596.

Diamond, J. M. 1975. Assembly of species communities. Pages 342–444 in M. L. Cody and J. M. Diamond, editors. Ecology and evolution of communities. Harvard University Press, Cambridge, Massachusetts, USA.

Erwin, R. M., J. D. Nichols, T. B. Eyler, D. B. Stotts, and B. R. Truitt. 1998. Modeling colony site dynamics: a case study of Gull-billed Terns (Sterna nilotica) in coastal Virginia. Auk 115:970–978.

Hanski, I. 1992. Inferences from ecological incidence func-

tions. American Naturalist 139:657–662.

Hanski, I. 1994. A practical model of metapopulation dy-

namics. Journal of Animal Ecology 63:151–162.

Hanski, I. 1997. Metapopulation dynamics: from concepts and observations to predictive models. Pages 69–91 in I. A. Hanski and M. E. Gilpin, editors. Metapopulation biology: ecology, genetics, and evolution. Academic Press, New York, New York, USA.

Lancia, R. A., J. D. Nichols, and K. H. Pollock. 1994. Estimating the number of animals in wildlife populations. Pages 215–253 in T. Bookhout, editor. Research and management techniques for wildlife and habitats. The Wildlife Society, Bethesda, Maryland, USA.

Lande, R. 1987. Extinction thresholds in demographic mod-

els of territorial populations. American Naturalist 130:624–

635.

Lande, R. 1988. Demographic models of the northern spotted owl (Strix occidentalis caurina). Oecologia 75:601–607.

Lebreton, J. D., K. P. Burnham, J. Clobert, and D. R. An-

derson. 1992. Modeling survival and testing biological hy-

potheses using marked animals. A uniﬁed approach with

case studies. Ecological Monographs 62:67–118.

Levins, R. 1969. Some demographic and genetic consequenc-

es of environmental heterogeneity for biological control.

Bulletin of the Entomological Society of America 15:237–

240.

Levins, R. 1970. Extinction. Pages 77–107 in M. Gerstenhaber, editor. Some mathematical questions in biology. Volume II. American Mathematical Society, Providence, Rhode Island, USA.

MacKenzie, D. I., and M. S. Boyce. 2001. Estimating closed

population size using negative binomial models. Western

Black Bear Workshop 7:21–23.

Manly, B. F. J. 1997. Randomization, bootstrap and Monte

Carlo methods in biology. Second edition. Chapman and

Hall, London, UK.

Moilanen, A. 1999. Patch occupancy models of metapopu-

lation dynamics: efﬁcient parameter estimation using im-

plicit statistical inference. Ecology 80:1031–1043.

Otis, D. L., K. P. Burnham, G. C. White, and D. R. Anderson.

1978. Statistical inference from capture data on closed an-

imal populations. Wildlife Monographs 62.

Pledger, S. 2000. Uniﬁed maximum likelihood estimates for

closed capture–recapture models using mixtures. Biomet-

rics 56:434–442.

Pollock, K. H., J. D. Nichols, C. Brownie, and J. E. Hines.

1990. Statistical inference for capture–recapture experi-

ments. Wildlife Monographs 107.

Pollock, K. H., J. D. Nichols, T. R. Simons, G. L. Farnsworth, L. L. Bailey, and J. R. Sauer. 2002. Large scale wildlife monitoring studies: statistical methods for design and analysis. Environmetrics 13:1–15.

Seber, G. A. F. 1982. The estimation of animal abundance

and related parameters. MacMillan Press, New York, New

York, USA.


Thompson, S. K. 1992. Sampling. John Wiley, New York,

New York, USA.

Thompson, W. L., G. C. White, and C. Gowan. 1998. Mon-

itoring vertebrate populations. Academic Press, San Diego,

California, USA.

Williams, B. K., J. D. Nichols, and M. J. Conroy. In press. Analysis and management of animal populations. Academic Press, San Diego, California, USA.

Yoccoz, N. G., J. D. Nichols, and T. Boulinier. 2001. Monitoring of biological diversity in space and time; concepts, methods and designs. Trends in Ecology and Evolution 16:446–453.

APPENDIX

Full results of the simulation study are available in ESA’s Electronic Data Archive: Ecological Archives E083-041-A1.

SUPPLEMENT

Software, source code, and the sample data sets are available in ESA’s Electronic Data Archive: Ecological Archives E083-041-S1.