
Abstract

Long-term biodiversity monitoring data are mainly used to estimate changes in species occupancy or abundance over time, but they may also be incorporated into predictive models to document species distributions in space. Although changes in occupancy or abundance may be estimated from a relatively limited number of sampling units, small sample size may lead to inaccurate spatial models and maps of predicted species distributions. We provide a methodological approach to estimate the minimum sample size needed in monitoring projects to produce accurate species distribution models and maps. The method assumes that monitoring data are not yet available when sampling strategies are to be designed and is based on external distribution data from atlas projects. Atlas data are typically collected in a large number of sampling units during a restricted timeframe and are often similar in nature to the information gathered from long-term monitoring projects. The large number of sampling units in atlas projects makes it possible to simulate a broad gradient of sample sizes in monitoring data and to examine how the number of sampling units influences the accuracy of the models. We apply the method to several bird species using data from a regional breeding bird atlas. We explore the effect of prevalence, range size and habitat specialization of the species on the sample size needed to generate accurate models. Model accuracy is sensitive to particularly small sample sizes and levels off beyond a sufficiently large number of sampling units that varies among species depending mainly on their prevalence. The integration of spatial modelling techniques into monitoring projects is a cost-effective approach as it offers the possibility to estimate the dynamics of species distributions in space and over time. We believe our innovative method will help in the sampling design of future monitoring projects aiming to achieve such integration.
how they change over time may provide key information to
guide effective landscape and conservation planning.
Dynamic species distribution mapping may, therefore, be
considered as an essential component of a biodiversity
monitoring project (Brotons et al. 2007, Kéry et al. 2013). In any
monitoring project, sampling units are, however, sparsely
distributed over the region of interest, which is inconvenient
for a straightforward mapping of species distributions.
Species distribution modelling is an increasingly used
technique (Rodríguez et al. 2007) that can produce distribution
maps based on monitoring data (Brotons et al. 2006).
With these models, environmental variables describing the
habitat conditions in the sampling units are related to records
of species presence. These models are used to predict the
species distribution beyond the sampling units in areas where
species occurrence is unknown (Araújo and Guisan 2006,
Elith et al. 2010). The use of models to predict species
distributions is of key significance for biodiversity conservation
(Guisan et al. 2013). Among several applications, models
© 2014 The Authors. Ecography © 2014 Nordic Society Oikos
Subject Editor: Miguel Araújo. Accepted 11 April 2014
Ecography 38: 29–40, 2015
doi: 10.1111/ecog.00749
Optimising long-term monitoring projects for species distribution
modelling: how atlas data may help
Olatz Aizpurua, Jean-Yves Paquet, Lluís Brotons and Nicolas Titeux
O. Aizpurua and N. Titeux (nicolas.titeux@ctfc.es), Centre de Recherche Public Gabriel Lippmann, Dépt Environnement et Agro-biotechnologies,
41 rue du Brill, LU-4422 Belvaux, Luxembourg. L. Brotons, OA and NT, Forest Sciences Centre of Catalonia (CEMFOR-CTFC), Ctra. Sant
Llorenç de Morunys, km 2, ES-25280 Solsona, Spain. LB also at: Centre de Recerca Ecològica i Aplicacions Forestals (CREAF), Univ. Autònoma
de Barcelona, ES-08193 Bellaterra, Spain, and Inst. Català d'Ornitologia (ICO), Museu de Zoologia, Passeig Picasso s/n, ES-08003 Barcelona,
Spain. NT also at: Univ. de Liège Gembloux Agro-Bio Tech, Unité Biodiversité et Paysage, 2 Passage des Déportés, BE-5030, Gembloux,
Belgium. J.-Y. Paquet, Aves-Natagora, Dépt Études, 98 rue Nanon, BE-5000 Namur, Belgium.
Long-term wildlife monitoring is generally considered an
essential tool for biodiversity management and for research
studies on biodiversity conservation (Gitzen et al. 2012).
Monitoring projects primarily aim at delivering information
on the changing status of key features of biodiversity
(Lindenmayer et al. 2012). State variables are used to
characterise the status of these features at different points in time
with a view to assessing system state and inferring changes in
state over time (Gitzen et al. 2012). State variables include,
among others, species occupancy (MacKenzie et al. 2005,
Kéry et al. 2009) or species abundance (Royle and Nichols
2003). In such projects, field data are often repeatedly
collected over time in a network of sampling units according
to standardised procedures (Gitzen et al. 2012). Previous
studies have reported that monitoring projects also have the
potential to provide an appropriate source of data to
document the distribution of species in space (Brotons et al.
2007, Braunisch and Suchant 2010, Rodhouse et al. 2012).
Mapping species distributions in space and documenting
may be used to identify the most important environmental
conditions that influence species distributions or to guide
the prioritization of management options amongst areas that
vary in their suitability for the species (Titeux et al. 2007).
Species distribution models are also often built to explore the
impacts of environmental changes on future species
distributions (Elith et al. 2010). Previous studies examined the use of
monitoring data to generate species distribution models
(Brotons et al. 2006, 2007, Braunisch and Suchant 2010)
and showed that the integration of monitoring data into
modelling approaches may contribute to understanding how
species distributions change over time (De Cáceres and
Brotons 2012, Rodhouse et al. 2012, Kéry et al. 2013).
Sampling design in a monitoring project typically results
from a balance between the number of sampling units and
the number of repeated surveys in these units to document
the state variables with an acceptable level of precision
(MacKenzie et al. 2005). A limited number of sampling
units and a sufficient number of repeated surveys may be
suited, and in some cases recommended, to derive unbiased
estimates of the state variables (MacKenzie and Royle
2005, Kéry et al. 2009, MacKenzie 2012). This appropriate
sampling design for monitoring purposes may, however, fail
to produce enough spatial data to build relevant species
distribution models (Brotons et al. 2007), because a small
number of sampling units is known to induce inaccurate
spatial models (Hernandez et al. 2006, Wisz et al. 2008,
Jiménez-Valverde et al. 2009, Bean et al. 2012). This
drawback can be avoided if dynamic species distribution mapping
is explicitly considered when setting the objectives of the
monitoring project and when making decisions about
sampling design. At this stage of a project, monitoring
data in the region of interest are, however, not yet available
and other sources of information based on upfront sampling
efforts are needed to help put the monitoring project
in place (Hooten et al. 2012).
Atlas projects are an interesting source of spatial
information that may assist in conducting such a pilot analysis.
A two-stage sampling design (Thompson 2012) is increasingly
implemented in last-generation atlases (Estrada et al. 2004,
Jacob et al. 2010, Maes et al. 2013): species presence or
abundance is recorded in 1) primary sampling units to
provide a picture of the species distribution across the whole
region of interest but at coarse spatial resolution and in 2) a
set of secondary sampling units nested within the primary
ones to explore species distribution at finer resolution.
Last-generation atlases are generally completed over
considerable time periods and repeated at long time intervals
(Dunn and Weston 2008), which prevents them from being
suited to detect changes in species distributions with time
scales matching decision-making needs. Interestingly, field
sampling procedures for atlas data collection in secondary
sampling units (e.g. bird or butterfly counts along transects)
are often similar in nature to the procedures implemented in
long-term monitoring projects (Van Swaay et al. 2008,
Vorísek et al. 2008). Such atlas data are generally
collected only once during the atlas period, but in a large
number of secondary sampling units to cover an important
part of the region of interest at a fine spatial resolution
(Carden et al. 2010, Maes et al. 2012). Hence, atlas data in
secondary sampling units may be manipulated to imitate a
broad gradient of sample sizes in a monitoring project and to
build species distribution models with varying numbers of
sampling units. Such an approach may, in turn, contribute
to identifying how large the number of sampling units
should be at the start of a monitoring project if dynamic
species distribution mapping is set as an objective.
Here, we provide an innovative analytical framework
using data from last-generation atlases to aid in the initial
design of monitoring projects able to generate appropriate
data for the production of accurate species distribution
models and maps. We draw attention to important issues that are
to be addressed if we are to generate and update species
distribution maps as a direct output of long-term monitoring
projects. This study illustrates how datasets derived from
last-generation atlas projects can contribute to the
integration of spatial modelling techniques into long-term
monitoring studies in order to cost-efficiently estimate biodiversity
dynamics in space and over time (Rodríguez et al. 2007).
Methods
An increasing number of atlas projects with two-stage
sampling designs are becoming available for different taxa
worldwide (Estrada et al. 2004, Carden et al. 2010, Maes
et al. 2012) and may support the integration of spatial
modelling techniques into monitoring studies. The following
analytical framework is of general interest as it can be applied
to any dataset derived from such last-generation atlas
projects. In the present study, we apply this innovative method
to the ‘Breeding Bird Atlas of Wallonia’ (BBAW) data (Jacob
et al. 2010).
Study area
Belgium is a heavily industrialized north-western European
country with a high human population density. The
southern part of Belgium (Wallonia, ca 16 850 km², Fig. 1a) is
characterised by a strong gradient in landscape composition,
from a densely populated and agriculture-dominated
landscape in the northwest to a hilly landscape with an
important cover of forest and grassland in the southeast
(Jacob et al. 2010).
Atlas data
During 2001–2007, 650 volunteer fieldworkers participated
in the BBAW data collection. Data were collected across a
range of spatial resolutions according to a two-stage
sampling design and an additional territory-mapping procedure.
Grid-based procedure: primary sampling units
Based on regular field visits during day and night from
February to August, fieldworkers were asked to report the
presence, estimate the abundance and record the breeding
evidence for all bird species in 40-km² (5 × 8 km) primary
sampling units (n = 514, Fig. 1b). Fieldworkers paid
particular attention to surveying the different habitat types present in
the primary sampling units. Abundance was estimated by
fieldworkers in the form of 9 abundance classes derived from
Figure 1. (a) Location of Wallonia in NW Europe. (b) Main ecological regions in Wallonia and grid system of the Breeding Bird Atlas of
Wallonia with the 40-km² (5 × 8 km) primary sampling units. (c) Subset of the study area with the 1-km² secondary sampling units
(black squares show an example with red-backed shrike Lanius collurio presence records collected during the transect-based procedure).
(d) Same subset of the study area as in (c) with L. collurio territories (black dots) mapped during the simplified territory-mapping
procedure.
a geometric progression with a common ratio set to 2
(see details in Jacob et al. 2010) and the central value of each
class was used in subsequent analyses. The highest possible
breeding evidence for each species was provided according to
the EOAC classification, i.e. non-breeding, possible
breeding, probable breeding and confirmed breeding (Timothy
and Sharrock 1974).
Transect-based procedure: secondary sampling units
Secondary sampling units of 1-km² squares were selected
according to a regular and systematic sampling design (see
details in Jacob et al. 2010) so that all primary sampling
units were geographically covered in the same way by the
secondary sampling units (Fig. 1c). Within these secondary
sampling units, transects were delineated by volunteer
fieldworkers to cover the whole diversity of habitats in the
squares. Fieldworkers walked for 1 h along these
sampling routes within the first five hours after sunrise, twice
a year during the breeding season, to record early and late
breeders. Each breeding or non-breeding bird (detected either
by sight or by sound) was recorded individually. In each
secondary sampling unit, the transect-based procedure was
conducted in only one year during the timeframe of the
BBAW project. The number of secondary sampling units
surveyed during the BBAW project (n = 2800) covered
almost 17% of the study area.
Territory-mapping procedure
At the start of the BBAW project, bird species were classified
as low-, moderate- or high-abundance species according
to prior knowledge of their regional abundance. Based on
territorial indications collected during the regular field
visits conducted in the diversity of habitat types within the
primary sampling units, fieldworkers were asked to map
the locations of all detected territories or colonies of low-
and moderate-abundance species. These locations were
considered as the centres of the territories and were
associated with an accuracy ranging from 100 to 500 m as
estimated by the fieldworkers (Fig. 1d). This simplified
territory-mapping procedure is a detailed and
time-consuming technique and is unachievable over large areas on
a regular basis.
the minimum sample size (i.e. minimum number of
secondary sampling units) needed to reach an acceptable level of
modelling performance based on three different evaluation
measures. Finally, we evaluated for the whole set of species
the effect of prevalence, range size and habitat specialization
on the minimum sample size (redundancy analysis).
Transect-based model training
We randomly selected subsets of the available secondary
sampling units to simulate a range of sample sizes in a
long-term monitoring project (Jiménez-Valverde et al. 2009):
0.5% of the study area (sample size: n = 83 secondary
sampling units), 1% (n = 166), 2% (n = 332), 4% (n = 664),
6% (n = 996), 8% (n = 1328) and 12% (n = 1992). In order
Overview of the modelling approach
In our analytical framework (Fig. 2), we considered the data
collected during the transect-based procedure in the
secondary sampling units as equivalent to long-term monitoring
data (Vorísek et al. 2008, Maes et al. 2012). We used these
data as a basis to produce large-scale, fine-resolution species
distribution models (hereafter ‘transect-based models’)
and we manipulated the number of secondary sampling
units in order to examine the effect of sample size on the
performance of the models. The territory-mapping data
covered the whole study area and provided the best available
information on the distribution and habitat requirements of
low- to moderate-abundance species. Therefore, we used
territory-mapping data as a reference to evaluate the
performance of the transect-based models. Then, we calculated
Figure 2. Overview of the modelling and analytical framework. Red-backed shrike Lanius collurio is used as an example.
secondary sampling units. The quadratic terms of the
continuous environmental variables were included in addition
to the linear functions. The continuous modelling outputs
were converted into binary predictions by setting a threshold
probability value above which the species was predicted as
present. To set this value, we assumed that some presence
records were located in unsuitable areas (Hirzel and Le Lay
2008) and we defined a threshold such that an omission rate
of 10% was specified in the subsets of secondary sampling
units used for model training (Martin et al. 2013). This
method sets a threshold that is independent of
the false positive fraction, which is suitable in the case of
presence-only data (Pearson et al. 2007).
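
The threshold rule described above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the procedure implemented in Maxent: the function names and the toy suitability scores are hypothetical, and ties between scores are resolved in the simplest possible way.

```python
import math

def omission_threshold(presence_scores, omission_rate=0.10):
    """Return a threshold such that at most `omission_rate` of the
    training presences score strictly below it (hypothetical helper)."""
    if not presence_scores:
        raise ValueError("need at least one presence score")
    ranked = sorted(presence_scores)
    # Number of training presences allowed to fall below the threshold;
    # the small epsilon guards against floating-point rounding.
    n_omitted = math.floor(omission_rate * len(ranked) + 1e-9)
    return ranked[n_omitted]

def predict_presence(score, threshold):
    """Binary prediction: present when the score reaches the threshold."""
    return score >= threshold

# Toy suitability scores at ten training presences (invented values).
scores = [0.12, 0.35, 0.41, 0.48, 0.52, 0.60, 0.66, 0.71, 0.83, 0.90]
threshold = omission_threshold(scores)
omitted = sum(not predict_presence(s, threshold) for s in scores)
print(threshold, omitted)
```

Because the rule only uses the scores at presence records, no information on false positives is required, which is the property the text relies on for presence-only data.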
Territory-based model training: reference
distribution maps
Using the same 1-km² squares as for the environmental
variables, we considered a square as occupied by a species when
it enclosed the centre of at least one territory of the species
recorded during the territory-mapping procedure. In order
to avoid redundancy between the data used for model
training and model evaluation (see below), we removed from the
territory-mapping data the 1-km² squares that coincide
with the set of secondary sampling units. The remaining
territory-mapping data were used to build reference
territory-based distribution models with the same environmental
variables as for the transect-based models. Using a bootstrap
approach, we fitted and averaged ten models for each focal
species based on random selections of 70% of the
territory-mapping data for model training. In order to create a
reference distribution map for each focal species, the modelling
outputs were converted into presence–absence predictions
with the same threshold decision rule as for the
transect-based models (10% of omission rate in the training data).
for the subsets of secondary sampling units to be spread out
over the whole environmental gradient in the study area,
they were generated using a stratified random sampling
procedure (Thompson 2012) with the main ecological
regions in Wallonia as environmental strata (Jacob et al.
2010, Fig. 1b). We iterated this stratified random sampling
with ten bootstrap replicates for each sample size.
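
The subsetting step can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function name `stratified_subset`, the proportional allocation among strata, the random seed and the toy units with region labels A to E are assumptions made for the example. Only the coverage levels, the 16 600 squares and the ten replicates per sample size come from the text.

```python
import random
from collections import defaultdict

def stratified_subset(units, n_total, rng):
    """Draw about n_total units, allocated to strata in proportion to
    stratum size (assumed allocation rule). `units` is a list of
    (unit_id, stratum) pairs; rounding may shift the total slightly."""
    by_stratum = defaultdict(list)
    for unit_id, stratum in units:
        by_stratum[stratum].append(unit_id)
    subset = []
    for stratum, members in sorted(by_stratum.items()):
        share = round(n_total * len(members) / len(units))
        subset.extend(rng.sample(members, min(share, len(members))))
    return subset

N_SQUARES = 16600  # 1-km2 squares within the study area (from the text)
COVERAGES = [0.005, 0.01, 0.02, 0.04, 0.06, 0.08, 0.12]
sample_sizes = [round(c * N_SQUARES) for c in COVERAGES]

rng = random.Random(42)  # fixed seed, for reproducibility of the sketch
# Toy pool of 2800 secondary units spread over five hypothetical regions.
units = [(i, "ABCDE"[i % 5]) for i in range(2800)]

# Ten replicate subsets for each simulated sample size.
replicates = {
    n: [stratified_subset(units, n, rng) for _ in range(10)]
    for n in sample_sizes
}
print(sample_sizes)
```

The computed sample sizes reproduce the values listed for each coverage level, which confirms that they are percentages of the 16 600 squares rather than of the 2800 surveyed units.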
We used 23 environmental variables (Supplementary
material Appendix 1, Table A1) that characterize the most
important habitat conditions for birds (e.g. elevation,
climate, land cover and soil type) in southern Belgium
(Jacob et al. 2010) as predictors in the models. These
variables were sourced from available GIS data layers and
sampled in the 1-km² squares that are completely within the
boundaries of Wallonia (n = 16 600). We considered a
secondary sampling unit as occupied by the species when at
least one individual was recorded with breeding evidence
during the transect-based procedure. The species that
were included in the modelling exercise (hereafter ‘focal
species’, Table 1) fulfilled four criteria: 1) they were recorded
as present in all randomly generated subsets of secondary
sampling units for model training, 2) territory-mapping data
for model evaluation were available, 3) they are diurnal
songbird species, and 4) their territory size or home range is on
average lower than or close to the spatial resolution of the
secondary sampling units.
Reliable absence data were unavailable and this issue may
produce inaccurate presence–absence models (Brotons et al.
2004, Lobo et al. 2010). Hence, we applied the presence-only
maximum entropy framework Maxent 3.3.1 (Phillips et al.
2006). Maxent is only moderately sensitive to sample size and
outperforms other methods when sample size is small
(Hernandez et al. 2006, Wisz et al. 2008, Bean et al. 2012).
For each focal species and sample size, model training
was performed with the ten randomly generated subsets of
Table 1. Minimum sampling coverage (MSC: percentage of the study area) and sample size (MSS: number of secondary sampling units)
needed to achieve an acceptable level of modelling performance according to omission rate, area under the curve of a ROC plot (AUC) and
kappa value for each focal species (n = 20) used in this study. The species are listed by decreasing order of prevalence in secondary sampling
units.
Species                 Code    Prevalence   Range size         Specialization  Omission rate MSC/MSS  AUC MSC/MSS  Kappa MSC/MSS
Picus viridis           PICVIR  0.37 (high)  0.88 (wide)        0.86 (high)     1.78/295               0.88/145     2.48/411
Anthus trivialis        ANTTRI  0.30 (high)  0.71 (wide)        0.68 (high)     1.35/225               0.68/112     1.64/271
Pyrrhula pyrrhula       PYRPYR  0.26 (high)  0.78 (wide)        0.58 (low)      1.13/187               0.01/1       1.27/211
Anthus pratensis        ANTPRA  0.23 (high)  0.73 (wide)        0.46 (low)      1.96/325               0.73/122     1.91/317
Sylvia curruca          SYLCUR  0.22 (high)  0.85 (wide)        0.59 (low)      1.99/330               0.84/140     3.60/597
Cuculus canorus         CUCCAN  0.22 (high)  0.84 (wide)        0.44 (low)      1.92/319               0.83/138     2.49/413
Motacilla flava         MOTFLA  0.21 (high)  0.50 (restricted)  1.20 (high)     1.40/232               0.47/78      1.17/195
Carduelis carduelis     CARCAR  0.20 (high)  0.83 (wide)        0.41 (low)      2.22/369               0.75/124     1.19/198
Turdus pilaris          TURPIL  0.19 (high)  0.56 (restricted)  1.01 (high)     2.97/493               0.76/126     1.97/327
Streptopelia turtur     STRTUR  0.18 (high)  0.84 (wide)        0.43 (low)      2.54/422               0.97/160     4.49/746
Acrocephalus palustris  ACRPAL  0.17 (low)   0.85 (wide)        0.45 (low)      2.83/470               0.90/150     4.57/759
Dryocopus martius       DRYMAR  0.10 (low)   0.67 (restricted)  0.56 (low)      2.80/465               0.98/162     2.55/423
Muscicapa striata       MUSSTR  0.09 (low)   0.75 (wide)        0.46 (low)      6.16/1022              0.76/126     7.91/1313
Saxicola torquatus      SAXTOR  0.08 (low)   0.60 (restricted)  0.50 (low)      4.83/802               1.20/199     5.83/967
Dendrocopos medius      DENMED  0.08 (low)   0.62 (restricted)  1.03 (high)     2.73/453               1.08/179     4.69/778
Lanius collurio         LANCOL  0.07 (low)   0.52 (restricted)  0.71 (high)     4.08/678               0.99/165     4.95/821
Hippolais polyglotta    HIPPOL  0.06 (low)   0.53 (restricted)  0.74 (high)     5.80/964               1.11/185     8.46/1404
Miliaria calandra       MILCAL  0.05 (low)   0.27 (restricted)  1.47 (high)     3.62/601               0.94/156     4.18/694
Emberiza schoeniclus    EMBSCH  0.04 (low)   0.47 (restricted)  0.95 (high)     8.13/1350              1.78/296     8.50/1410
Serinus serinus         SERSER  0.03 (low)   0.38 (restricted)  0.63 (high)     7.76/1288              1.29/214     8.09/1342
where y is the modelling performance measure, x is the
sample size, y0 is the minimum asymptote y value, y0 + a is
the initial modelling performance measure when the sample
size is equal to zero (forced to 1 in our case), and b is the
decay constant.
For each modelling performance measure and each focal
species separately, we calculated the minimum sample size
(MSS, number of secondary sampling units) and sampling
coverage (MSC, percentage of the study area) required to
achieve an acceptable level of modelling performance,
defined as the lowest x value for which the mean predicted y
value was within the 95% confidence limits around the
asymptote value (Fig. 3).
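
For both exponential functions (Eq. 1 and 2), the distance between the fitted curve and its asymptote is a·e^(−bx), so the criterion above has a closed-form solution once the parameters are estimated. The Python sketch below illustrates this under stated assumptions: the function name, the parameter values and the half-width `delta` of the confidence band around the asymptote are hypothetical, and the curve fitting itself is not shown.

```python
import math

N_SQUARES = 16600  # 1-km2 squares within the study area (from the text)

def min_sample_size(a, b, delta):
    """Smallest whole number of units x for which the fitted curve lies
    within `delta` of its asymptote, i.e. a * exp(-b * x) <= delta.
    Applies to the rise (Eq. 1) and the decay (Eq. 2) alike."""
    if not (0 < delta < a) or b <= 0:
        raise ValueError("expects 0 < delta < a and b > 0")
    return math.ceil(math.log(a / delta) / b)

def coverage_percent(n_units, n_squares=N_SQUARES):
    """Sampling coverage (MSC) as a percentage of the study area."""
    return 100.0 * n_units / n_squares

# Hypothetical fitted values, chosen only to illustrate the arithmetic.
a, b, delta = 0.8, 0.01, 0.04
mss = min_sample_size(a, b, delta)
msc = coverage_percent(mss)
print(mss, round(msc, 2))
```

The conversion from MSS to MSC matches the paired values reported in Table 1; for example, 295 units correspond to about 1.78% of the 16 600 squares.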
We calculated prevalence, range size and degree of
habitat specialization for each focal species (Table 1) to
evaluate how these features influence the MSS. The species
prevalence was calculated from the whole set of secondary
sampling units as the proportion of units in which the
species was present. Species range size was calculated as the
number of primary sampling units in which the species was
recorded with probable or confirmed breeding evidence
(McPherson et al. 2004). We used a k-means clustering
analysis (Legendre and Legendre 2012) based on the
continuous environmental variables (Supplementary material
Appendix 1, Table A1) to allocate the primary sampling
units to different habitat classes (n = 10, based on an
analysis of the decrease in the total error sum of squares with
increasing number of classes) and we used the species
abundance data in the primary sampling units to calculate the
degree of habitat specialization for each focal species as
the coefficient of variation (standard deviation/average)
of the average species densities among the habitat classes
(see details in Julliard et al. 2006). We used a redundancy
analysis (RDA) to examine how much of the among-species
variation in the MSS was explained by variation in
prevalence, range size and habitat specialization (Legendre and
Legendre 2012). In order to present the results in a
simplified manner, the set of focal species was divided in
equal-size categories according to prevalence (high- and
low-prevalence species), range size (wide- and
restricted-range species) and degree of habitat specialization (high-
and low-specialization species).
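
As a worked illustration of the specialization measure, the following Python sketch computes the coefficient of variation of mean densities across ten habitat classes. The densities are invented, the function name is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics

def specialization_index(densities_by_class):
    """Coefficient of variation (standard deviation / mean) of the
    average densities of a species among habitat classes."""
    mean_density = statistics.mean(densities_by_class)
    if mean_density == 0:
        raise ValueError("species absent from all habitat classes")
    # Sample standard deviation (assumed; the estimator is not stated).
    return statistics.stdev(densities_by_class) / mean_density

# Invented mean densities in ten habitat classes.
generalist = [2.0] * 10          # same density everywhere
specialist = [10.0] + [0.0] * 9  # confined to a single class

cv_generalist = specialization_index(generalist)
cv_specialist = specialization_index(specialist)
print(cv_generalist, cv_specialist)
```

A species spread evenly among classes scores zero, whereas a species confined to one class scores well above the values reported in Table 1, which gives a sense of the scale of the index.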
Results
Range size was positively correlated with prevalence (r = 0.70,
p = 0.0006) and negatively with habitat specialization
(r = −0.69, p = 0.0007), but prevalence was not related to
habitat specialization (r = 0.19, p = 0.4131). The training
sample prevalence in the random subsets of secondary
sampling units was independent of sample size (Fig. 4)
and reflected the prevalence of the focal species in the whole
set of sampling units (Table 1). In contrast, the proportion
of the species' environmental range represented in the subsets
of secondary sampling units increased with sample size
according to an exponential rise to maximum function
(Fig. 5). This indicates that, even with the implementation
of a stratified random sampling procedure, the complete
range of conditions used by the species is only partly
captured with very small sample sizes.
Transect-based model evaluation
To evaluate the performance of the transect-based models,
we first calculated an omission rate to measure the
percentage of presence records in the evaluation territory-mapping
data that were mistakenly classified as absences. Second, the
area under the curve (AUC) of a receiver operating
characteristic (ROC) plot was used as a threshold-independent
measure of modelling performance (Fielding and Bell
1997). ROC plots were computed using presence and
background data in the evaluation dataset. AUC values reflected
the ability of the transect-based models to discriminate
between presence data and a randomly selected secondary
sampling unit (see details in Phillips et al. 2006,
Jiménez-Valverde 2012). Third, we computed misclassification
matrices to calculate the agreement between the binary
predictions of the transect-based and the territory-based
models based on the Cohen's kappa (Fielding and Bell 1997).
The kappa value documented the extent to which the
output of the transect-based models converged on those of the
territory-based reference models (Hernandez et al. 2006).
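
The three evaluation measures can be written compactly in Python. The sketch below is illustrative only: the function names and toy data are hypothetical, the AUC is computed by direct pairwise comparison of presence and background scores (with ties counted as one half), and no claim is made that this matches the implementation used in Maxent.

```python
def omission_rate(predicted_at_presences):
    """Share of evaluation presences that the model predicts as absent."""
    misses = sum(not p for p in predicted_at_presences)
    return misses / len(predicted_at_presences)

def auc(presence_scores, background_scores):
    """Probability that a random presence outscores a random background
    unit; ties count as one half."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            wins += 1.0 if p > b else 0.5 if p == b else 0.0
    return wins / (len(presence_scores) * len(background_scores))

def cohen_kappa(map_a, map_b):
    """Chance-corrected agreement between two binary (0/1) maps.
    Undefined when expected agreement equals 1."""
    n = len(map_a)
    observed = sum(a == b for a, b in zip(map_a, map_b)) / n
    p_a, p_b = sum(map_a) / n, sum(map_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)

# Toy data (invented): predictions at four evaluation presences,
# suitability scores, and two binary maps over ten squares.
om = omission_rate([True, True, False, True])
area = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1])
transect_map = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
territory_map = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
kappa = cohen_kappa(transect_map, territory_map)
print(om, area, kappa)
```

In this toy case the two maps agree in eight of ten squares, yet kappa is only about 0.52 because most of that agreement is expected by chance when the species is rare.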
Statistical analysis
Before analysing the modelling performance, we evaluated
the extent to which the different subsets of secondary
sampling units captured the range of environmental conditions
used by the species. To do this, the range of all continuous
variables was first normalized between 0 and 1 using a linear
scaling transformation. Second, we calculated for each
focal species and in each random subset of secondary
sampling units, the difference between the maximum and
the minimum values of the environmental variables
associated with a presence record. Third, we calculated the
arithmetic mean of these differences among all environmental
variables to represent the width of the environmental range
covered by the species in the random subsets of secondary
sampling units. Fourth, we applied the same procedure
for each focal species to the full set of evaluation
territory-mapping data. Fifth, we computed the environmental
range overlap for each focal species as the ratio between the
environmental range covered by the species in the random
subsets of secondary sampling units and in the
territory-mapping data (Wisz et al. 2008, Feeley and Silman 2011).
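
The five steps above can be sketched in Python as follows. This is a minimal illustration with invented values and hypothetical function names; records are represented as tuples of environmental values, one per variable.

```python
def min_max_scale(column):
    """Step 1: linear scaling of one variable to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def range_width(presence_records):
    """Steps 2 and 3: per-variable (max - min) among presence records,
    averaged over all variables. Values are assumed already scaled."""
    n_vars = len(presence_records[0])
    widths = [
        max(r[j] for r in presence_records)
        - min(r[j] for r in presence_records)
        for j in range(n_vars)
    ]
    return sum(widths) / n_vars

def range_overlap(subset_presences, reference_presences):
    """Steps 4 and 5: width in the subset relative to the width in the
    reference (territory-mapping) data."""
    return range_width(subset_presences) / range_width(reference_presences)

# Invented example with two scaled variables per presence record.
scaled_elevation = min_max_scale([100, 300, 500])
reference = [(0.0, 0.2), (1.0, 0.8), (0.5, 0.5)]
subset = [(0.2, 0.3), (0.6, 0.6)]
overlap = range_overlap(subset, reference)
print(scaled_elevation, overlap)
```

Here the subset covers less than half of the environmental range occupied in the reference data, the situation the text associates with very small sample sizes.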
Modelling performance was expected to increase with
sample size and to level off beyond a sufficient number of
secondary sampling units (Hernandez et al. 2006, Wisz et al.
2008). We plotted modelling performance measures against
sample size and we fitted exponential functions to the data.
An exponential rise to maximum function was used for
the AUC and kappa values:
$y = a\,(1 - e^{-bx})$    (1)
where y is the modelling performance measure, x is the
sample size, a is the maximum asymptote y value, and b is
the rise constant.
An exponential decay function was used for the omission
rate:
$y = y_0 + a\,e^{-bx}$    (2)
Figures 3 and 6 show the modelling performance
(omission rate, AUC and kappa value) obtained with a
number of sampling units covering 0.5 to 12% of the study area.
Table 1 summarizes the minimum sample size (MSS) and
coverage (MSC) calculated for each focal species according
to the different modelling performance measures.
The constrained axes of the redundancy analysis
(RDA, Fig. 7) explained together 57% of the total variance
in the data (adjusted R² = 0.47). Only the first RDA axis was
found to be statistically significant (permutation tests,
p ≤ 0.001), accounting for more than 97% of the explained
variance. The first RDA axis was for the most part related
to the prevalence of the species (canonical coefficient = 2.39)
and, to a much lower extent, to habitat specialization (0.12)
and range size (0.04). Hence, the RDA results indicated
that the prevalence of the focal species in the whole set of
secondary sampling units had the most prominent influence
on the MSS and that the effect of range size and habitat
specialization can be considered negligible.
On average, the omission rate was lower for
high-prevalence species than for low-prevalence species over the
entire gradient of sample size, but the difference was
decreasingly pronounced with an increasing sample size
(Fig. 6). The exponential functions were estimated to reach
their minimal value with a smaller sample size (MSS =
320 ± 29 SE) or sampling coverage (MSC = 1.93% ± 0.18%)
in high-prevalence species than in low-prevalence species
(MSS = 809 ± 106, MSC = 4.87% ± 0.64%). The AUC
was only weakly sensitive to sample size and levelled off
at smaller sample size in high-prevalence species (MSS =
115 ± 14, MSC = 0.69% ± 0.09%) than in low-prevalence
species (MSS = 183 ± 15, MSC = 1.10% ± 0.09%). The
kappa value increased consistently with sample size, thereby
indicating that the predictions of the transect-based
models gradually converged on those of the reference
territory-based models. Kappa values were particularly
affected by small sample size in low-prevalence species:
they levelled off at smaller sample size in high-prevalence
species (MSS = 369 ± 57, MSC = 2.22% ± 0.35%) than in
Figure 3. Identification of the minimum sampling coverage
required to achieve an acceptable level of modelling performance
according to (a) omission rate, (b) AUC and (c) kappa value for
Lanius collurio. Black dots represent the modelling performance
measures for the transect-based models fitted with the different
subsets of secondary sampling units. Continuous and dashed
black lines are the predicted average and its 95% confidence interval,
respectively, after least-squares fitting of the exponential function (Eq. 1 and 2).
Grey areas represent the 95% confidence interval around the
estimated (a) minimum or (b, c) maximum asymptote value. The
dotted vertical lines indicate the minimum sampling coverage
above which the modelling performance is considered to become
stable.
Figure 4. Average training sample prevalence of the different species
along the gradient of sampling coverage. Sample prevalence was
calculated as the proportion of secondary sampling units with
species presence records in each individual subset of the units used
to fit the models.
and temporal replication in the data collection strategy to
minimize uncertainties associated with the estimation of
changes in state variables over time (Rhodes and Jonzén
2011, Guillera-Arroita and Lahoz-Monfort 2012). In line
with previous studies (De Cáceres and Brotons 2012,
Rodhouse et al. 2012, Kéry et al. 2013), we argue that
monitoring data may also be cost-effectively collected and
used in species distribution models to document the spatial
distribution of the species.
Although the influence of sample size on the performance
of species distribution models is reported in many studies
(Wisz et al. 2008), only a few have addressed this issue
when models are built with data from monitoring projects
(Brotons et al. 2007). This is mostly because
dynamic distribution mapping is seldom explicitly
addressed when setting the objectives of a project. If such an
objective is integrated after the start of the project, the
available data have typically been collected in a limited number
of sampling units. This sampling design prevents the
evaluation of modelling performance over a broad gradient of
sample sizes and the identification of how large the sample size
should be to obtain an acceptable performance. On the other
hand, monitoring data are unavailable when species distribution
mapping is considered as an objective before the start of
data collection. Other sources of information are therefore
needed to help optimise the initial sampling design.
Here, we provide an analytical framework that makes use
of data from large-scale last-generation atlases with a two-
stage sampling design to examine the influence of sample
size on modelling performance and to identify how large the
number of sampling units should be in a monitoring project
to derive accurate species distribution maps. The method
does not rely on existing data from already running monitoring
projects and, hence, it may be applied before the start
of field data collection when decisions about sampling
design are made. The innovative idea was to consider part of
the data collected during last-generation atlas projects as
analogous to those derived from long-term monitoring
projects (Van Swaay et al. 2008, Voříšek et al. 2008). In
contrast with previous studies focusing on the link between
monitoring projects and distribution modelling approaches
(Brotons et al. 2007), the manipulation of atlas data allowed
us to simulate a broad gradient of sample size in order to
identify an optimal number of sampling units to achieve an
acceptable modelling performance. The analytical framework
may be easily implemented wherever such atlas data
are available and where the sampling strategies of monitoring
projects need to be optimised to map species distributions.
Although we used bird data to illustrate our method,
it may also be applied to other
species groups for which atlas data are collected, at least
partly, in the same way as in a monitoring project, such as
butterflies (Maes et al. 2013) or bats (Carden et al. 2010).
We showed that modelling performance was sensitive to
particularly small sample sizes and reached an asymptote
beyond a sufficiently large number of sampling units.
This result is especially interesting because it is generally
assumed or reported that modelling performance increases
with sample size (McPherson et al. 2004, Feeley and Silman
2011), without examining how large the sample size should be
to obtain sufficiently well-performing models. Wintle and
Figure 5. Average (± 95% confidence intervals) proportion of the
species environmental range covered by the subsets of secondary
sampling units along the gradient of sampling coverage for
(a) high- and low-prevalence species, (b) wide- and restricted-
range species, and (c) low- and high-specialization species.
low-prevalence species (MSS = 991 ± 111, MSC =
5.97% ± 0.67%).
Discussion
When designing a monitoring project to estimate biodiver-
sity dynamics, a trade-off is typically made between spatial
Figure 6. Average (± 95% confidence intervals) (a, d, g) omission rate, (b, e, h) AUC and (c, f, i) kappa values along the gradient
of sampling coverage for high- and low-prevalence species, wide- and restricted-range species, and low- and high-specialization species.
Bardos (2006) and Jiménez-Valverde et al. (2009) have previously
studied the influence of sample size on modelling
performance and also showed that the effect of sample size
becomes apparent only below a certain threshold, but their
studies were conducted with virtual species and may only
partly reflect monitoring data.
The prevalence of the species in the random subsets of
secondary sampling units remained stable along the gradient
of sample sizes and reflected the prevalence of the
species in the whole set of sampling units. So, the link
between modelling performance and sample size was
independent of the proportion of sampling units with
species presence records. Hence, we avoided confounding
the effects of sample size and training sample prevalence
(McPherson et al. 2004). In contrast, the extent to
which the subsets of sampling units covered the range of
environmental conditions used by the species was found to
decrease with decreasing sample size and this contributes to explaining
why the ability of the models to capture the environmental
response of the species decreased markedly below a certain
sample size. This issue underlines the importance of using a
well-designed sampling procedure: the stratified random
sampling that we implemented (see also Jiménez-Valverde
et al. 2009) maximizes the chances of sampling the species distribution
along the whole environmental gradient of the study
area even when sample size decreases (Hortal and Lobo
2005, Thompson 2012). Below a certain sample size, the
number of species presence records is, however, insufficient
to cover the full range of environmental conditions used by
the species and the modelling performance becomes less
stable and much lower (see also Wintle and Bardos 2006,
Wisz et al. 2008).
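The subsampling logic discussed in this paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure implemented in the study: the unit records, the four strata, the single environmental variable `env` and the function names are all hypothetical, and the environmental range is reduced to a one-dimensional span for simplicity.

```python
import random
from collections import defaultdict

def stratified_subset(units, fraction, rng):
    """Draw the same fraction of units at random from every stratum."""
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[unit["stratum"]].append(unit)
    subset = []
    for members in by_stratum.values():
        k = max(1, round(fraction * len(members)))  # at least one unit per stratum
        subset.extend(rng.sample(members, k))
    return subset

def prevalence(units):
    """Proportion of sampling units with a presence record."""
    return sum(u["present"] for u in units) / len(units)

def env_range_covered(subset, all_units):
    """Share of the environmental span of occupied units that the subset covers."""
    full = [u["env"] for u in all_units if u["present"]]
    part = [u["env"] for u in subset if u["present"]]
    if not part:
        return 0.0
    span = max(full) - min(full)
    return (max(part) - min(part)) / span if span > 0 else 0.0

# Hypothetical atlas-like data: presence is more likely at high `env` values.
rng = random.Random(42)
units = []
for _ in range(2000):
    env = rng.random()
    units.append({"stratum": int(env * 4), "env": env,
                  "present": rng.random() < 0.6 * env})

for fraction in (0.01, 0.05, 0.25):
    subset = stratified_subset(units, fraction, rng)
    print(fraction, len(subset), round(prevalence(subset), 3),
          round(env_range_covered(subset, units), 3))
```

Comparing the printed prevalence and range coverage across fractions gives a quick check of whether small subsets still represent the full set of units.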
The minimum sample size required to ensure an acceptable
level of modelling performance was strongly related to
the prevalence of the species in the sampling units. On average,
the minimum sample size was larger in low-prevalence
species than in high-prevalence species. In contrast, the
decrease in modelling performance with increasingly smaller
sample size was found to be comparable in restricted-
and wide-range species and in low- and high-specialization
species. A large part of the among-species variance in the
minimum sample size remained, however, unexplained and
may be related to additional methodological issues or ecological
processes. As imperfect detection of the species
may confound the link between species distribution and
environmental conditions, it is for instance warranted to
analyse how detection rates may influence modelling performance.
Rota et al. (2011) showed that using occupancy
models to account for imperfect detection may contribute
to improving modelling performance and relevance, especially
in situations where detection probability varies along
with environmental conditions (see also Kéry et al. 2010).
Figure 7. First two dimensions (RDA1 and RDA2) of the ordination space from the redundancy analysis (RDA, type-2 scaling).
The explanatory variables (prevalence, range size and habitat specialization) are represented with arrows and the response variables
(minimum sampling coverage according to omission rate, AUC and kappa) are represented with bold black lines. Species are plotted using
their code names (Table 1).
Such approaches are based on observation data collected
during repeated surveys in the sampling units and are, there-
fore, only poorly suited to the context of the present study,
as replicated observations are generally unavailable when set-
ting the objectives of a monitoring project (but see Van
Strien et al. 2013). Other ecological processes may have a
direct or indirect influence on the minimum sample size.
For instance, biotic interactions such as competition (with
conspecific individuals or with other species) and predation
may alter the location of the individuals in the landscape
and shape the realized distribution of the species (Cadena
and Loiselle 2007, Lima 2009). Although modelling tools
become increasingly available to deal with this issue
(Boulangeat et al. 2012, Wisz et al. 2013), such factors are
probably beyond the scope of the analyses that could be
done with the available atlas data.
We also stress that further work should
include additional species because the set of species used
in this study had to satisfy a number of criteria for the
modelling exercise, which resulted in the use of a limited
number of species that may only partly reflect the entire bird
species assemblage. One of the most restrictive criteria was
the availability of a sufficient amount of territory-mapping
data to evaluate modelling performance. Such information
was collected only for low- to moderate-abundance species
in the atlas project. In order to increase the number of
species in the analysis, a promising approach would be to
use the increasingly available information on species distri-
bution derived from web-based encoding systems for casual
observation data (Sullivan et al. 2009).
When applying our innovative approach to the low- to
moderate-abundance bird species in southern Belgium, a
minimum sampling coverage of 4–5% (n = 664–830)
was found to be needed in order to achieve an acceptable
level of modelling performance for the majority of the studied
species. Interestingly, Hoeting et al. (2000) and Wintle
and Bardos (2006) obtained similar results with their simulated
data reflecting plant and mammal distributions.
However, the estimated minimum sampling coverage should
probably not be considered as a rule of thumb. First,
our results revealed considerable among-species variation in
this minimum sample size. Second, the heterogeneity of
the study area and the variables that are used to quantify the
environmental conditions undoubtedly influence the number
of sampling units needed to capture the link between
species distribution and environmental conditions.
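The correspondence between sample size and sampling coverage reported here can be checked with simple arithmetic. The Python sketch below assumes a total of 16 600 secondary sampling units; this figure is not stated in this section and is inferred from the reported pairs (664 units at 4% and 830 units at 5%), so it should be read as an assumption of the example. The function names are likewise hypothetical.

```python
import math

# Assumption: total number of secondary sampling units, inferred from the
# reported pairs (664 units <-> 4% coverage, 830 units <-> 5% coverage).
TOTAL_UNITS = 16_600

def coverage_percent(n_units, total_units=TOTAL_UNITS):
    """Sampling coverage (%) corresponding to a number of sampling units."""
    return 100.0 * n_units / total_units

def units_needed(coverage_pct, total_units=TOTAL_UNITS):
    """Number of sampling units needed to reach a given coverage (%)."""
    return math.ceil(total_units * coverage_pct / 100.0)

# Recompute coverage for the mean minimum sample sizes reported in the Results.
for mss in (320, 809, 115, 183, 369, 991):
    print(mss, round(coverage_percent(mss), 2))

print(units_needed(4), units_needed(5))
```

Under this assumption the recomputed coverages agree with the MSC values given in the Results to two decimal places, which supports reading MSC as MSS divided by the total number of secondary sampling units.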
This application in southern Belgium illustrates that
a substantial sampling coverage may be needed to derive
Boulangeat, I. et al. 2012. Accounting for dispersal and biotic
interactions to disentangle the drivers of species distributions
and their abundances. Ecol. Lett. 15: 584–593.
Braunisch, V. and Suchant, R. 2010. Predicting species distributions
based on incomplete survey data: the trade-off between
precision and scale. Ecography 33: 826–840.
Brotons, L. et al. 2004. Presence-absence versus presence-only
modelling methods for predicting bird habitat suitability.
Ecography 27: 437–448.
Brotons, L. et al. 2006. Spatial modeling of large-scale bird
monitoring data: towards pan-European quantitative
distribution maps. J. Ornithol. 147: 29.
Brotons, L. et al. 2007. Updating bird species distribution at large
spatial scales: applications of habitat modelling to data
from long-term monitoring programs. Divers. Distrib. 13:
276–288.
Cadena, C. D. and Loiselle, B. A. 2007. Limits to elevational
distributions in two species of emberizine finches: disentangling
the role of interspecific competition, autoecology, and
geographic variation in the environment. Ecography
30: 491–504.
Carden, R. et al. 2010. Irish bat monitoring schemes: ATLAS
Republic of Ireland. Report for 2008–2009. Bat Conservation
Ireland.
De Cáceres, M. and Brotons, L. 2012. Calibration of hybrid
species distribution models: the value of general-purpose vs.
targeted monitoring data. Divers. Distrib. 18: 977–982.
Dunn, A. M. and Weston, M. A. 2008. A review of terrestrial
bird atlases of the world and their application. Emu 108:
42–67.
Elith, J. et al. 2010. The art of modelling range-shifting species.
Methods Ecol. Evol. 1: 330–342.
Estrada, J. et al. 2004. Atles dels ocells nidificants de Catalunya
(1999–2002). Lynx Editions.
Feeley, K. J. and Silman, M. R. 2011. Keep collecting: accurate
species distribution modelling requires more collections
than previously thought. Divers. Distrib. 17: 1132–1140.
Fielding, A. H. and Bell, J. F. 1997. A review of methods for the
assessment of prediction errors in conservation presence/
absence models. Environ. Conserv. 24: 38–49.
Gitzen, R. A. et al. 2012. Design and analysis of long-term
ecological monitoring studies. Cambridge Univ. Press.
Guillera-Arroita, G. and Lahoz-Monfort, J. J. 2012. Designing
studies to detect differences in species occupancy: power
analysis under imperfect detection. Methods Ecol. Evol.
3: 860–869.
Guisan, A. et al. 2013. Predicting species distributions for
conservation decisions. Ecol. Lett. 16: 1424–1435.
Hernandez, P. A. et al. 2006. The effect of sample size and species
characteristics on performance of different species distribution
modeling methods. Ecography 29: 773–785.
Hirzel, A. H. and Le Lay, G. 2008. Habitat suitability modelling
and niche theory. J. Appl. Ecol. 45: 1372–1381.
Hoeting, J. A. et al. 2000. An improved model for spatially
correlated binary responses. J. Agric. Biol. Environ. Stat.
5: 102–114.
Hooten, M. B. et al. 2009. Optimal spatio-temporal hybrid
sampling designs for ecological monitoring. J. Veg. Sci.
20: 639–649.
Hooten, M. B. et al. 2012. Optimal spatio-temporal monitoring
designs for characterizing population trends. In: Gitzen, R.
A. et al. (eds), Design and analysis of long-term ecological
monitoring studies. Cambridge Univ. Press, pp. 443–459.
Hortal, J. and Lobo, J. M. 2005. An ED-based protocol for
optimal sampling of biodiversity. Biodivers. Conserv. 14:
2913–2947.
Jacob, J. P. et al. 2010. Atlas des oiseaux nicheurs de Wallonie.
Aves & Dépt de l’Etude du Milieu Naturel et Agricole (Service
accurate species distribution models from long-term monitoring
data. A sampling coverage of 4–5% of the study area
is actually much higher than the coverage implemented in
most existing monitoring programmes worldwide. It may
then become logistically difficult to find a trade-off between
the number of sampling units and the number of repeated
surveys in order for the same monitoring project to integrate
in its objectives both the estimation of changes in occupancy
or abundance and the mapping of species distribution.
Interestingly, Hooten et al. (2009, 2012) used an optimal
hybrid sampling design to combine different objectives in a
single long-term monitoring project. In line with such an
approach, a fixed subset of sampling units may be repeatedly
surveyed within and between seasons (static design) to estimate
species occupancy and detection probability, while
a roving subset of sampling units may be surveyed less
frequently over time (dynamic design) to increase spatial
knowledge for distribution mapping. Both static and
dynamic designs have advantages and disadvantages
(MacKenzie and Royle 2005, Wikle and Royle 2005), but an
appropriate allocation of sampling effort between fixed and
roving units may contribute to combining several monitoring
and mapping objectives (Hooten et al. 2009). In this context,
our methodological approach constitutes a pilot analysis
able to provide an initial estimate of the total number of sampling
units needed when monitoring data are not yet available
and to help put the monitoring effort into place in
order to reach one of the objectives. It is now important to
consider additional optimisation criteria and to further integrate
such approaches into a more general analytical framework
to evaluate whether this initial sampling design will be suited
to document species distribution dynamics and to estimate
changes in the selected state variables or, alternatively, how
the design has to be modified to better reach the multiple
(and sometimes conflicting) objectives. In this respect, adaptive
sampling design (Hooten et al. 2012) may prove a useful
approach as it focuses on adjusting the sampling strategy on a
regular basis as new information is gained in order to improve
the cost-efficiency of the monitoring project.
Acknowledgements – We thank participants in the Breeding Bird
Atlas of Wallonia project for fieldwork and Christophe Dehem,
Marc De Sloover, Marc Fasol, Jean-Paul Jacob and Thierry
Kinet for data or project management. Land use (COSW and
IGN), land management (SIGEC) and soil (CNSW) maps were
provided by the Direction Générale de l’Agriculture, des Ressources
Naturelles et de l’Environnement (DGARNE) of the Service
Public de Wallonie (SPW). The BBAW project was funded by the
Service Public de Wallonie (SPW-DGO3). OA was funded by the
National Research Fund, Luxembourg (FNR-AFR-PHD-08-63).
NT and LB were funded through the EU BON project (contract
no. 308454; FP7-ENV-2012, European Commission). This
work has also been supported by the European infrastructure for
biodiversity and ecosystem research (LifeWatch).
References
Araújo, M. B. and Guisan, A. 2006. Five (or so) challenges for
species distribution modelling. J. Biogeogr. 33: 1677–1688.
Bean, W. T. et al. 2012. The effects of small sample size and sample
bias on threshold selection and accuracy assessment of species
distribution models. Ecography 35: 250–258.
Phillips, S. J. et al. 2006. Maximum entropy modeling of
species geographic distributions. Ecol. Model. 190:
231–259.
Rhodes, J. R. and Jonzén, N. 2011. Monitoring temporal trends
in spatially structured populations: how should sampling
effort be allocated between space and time. Ecography 34:
1040–1048.
Rodhouse, T. J. et al. 2012. Assessing the status and trend of bat
populations across broad geographic regions with dynamic
distribution models. Ecol. Appl. 22: 1098–1113.
Rodríguez, J. P. et al. 2007. The application of predictive modelling
of species distribution to biodiversity conservation. Divers.
Distrib. 13: 243–251.
Rota, C. T. et al. 2011. Does accounting for imperfect detection
improve species distribution models? Ecography 34:
659–670.
Royle, J. A. and Nichols, J. D. 2003. Estimating abundance from
repeated presence-absence data or point counts. Ecology 84:
777–790.
Sullivan, B. L. et al. 2009. eBird: a citizen-based bird observation
network in the biological sciences. Biol. Conserv. 142:
2282–2292.
Thompson, S. K. 2012. Sampling. Wiley.
Timothy, J. and Sharrock, R. 1974. Minutes of the second meeting
of the European Ornithological Committee. Acta Ornithol.
14: 404–411.
Titeux, N. et al. 2007. Fitness-related parameters improve
presence-only distribution modelling for conservation practice:
the case of the red-backed shrike. Biol. Conserv. 138:
207–223.
Van Strien, A. J. et al. 2013. Opportunistic citizen science data
of animal species produce reliable estimates of distribution
trends if analysed with occupancy models. J. Appl. Ecol. 50:
1450–1458.
Van Swaay, C. A. M. et al. 2008. Butterfly monitoring in Europe:
methods, applications and perspectives. Biodivers. Conserv.
17: 3455–3469.
Voříšek, P. et al. 2008. A best practice guide for wild bird monitoring
schemes. CSO/RSPB.
Wikle, C. K. and Royle, J. A. 2005. Dynamic design of ecological
monitoring networks for non-Gaussian spatio-temporal data.
Environmetrics 16: 507–522.
Wintle, B. A. and Bardos, D. C. 2006. Modeling species-
habitat relationships with spatially autocorrelated observation
data. Ecol. Appl. 16: 1945–1958.
Wisz, M. S. et al. 2008. Effects of sample size on the
performance of species distribution models. Divers. Distrib.
14: 763–773.
Wisz, M. S. et al. 2013. The role of biotic interactions in
shaping distributions and realised assemblages of species:
implications for species distribution modelling. Biol. Rev. 88:
15–30.
Public de Wallonie Direction générale de l’Agriculture, des
Ressources naturelles et de l’Environnement).
Jiménez-Valverde, A. 2012. Insights into the area under the receiver
operating characteristic curve (AUC) as a discrimination
measure in species distribution modelling. Global Ecol.
Biogeogr. 21: 498–507.
Jiménez-Valverde, A. et al. 2009. The effect of prevalence and its
interaction with sample size on the reliability of species
distribution models. Community Ecol. 10: 196–205.
Julliard, R. et al. 2006. Spatial segregation of specialists and
generalists in bird communities. Ecol. Lett. 9: 1237–1244.
Kéry, M. et al. 2009. Trend estimation in populations
with imperfect detection. J. Appl. Ecol. 46: 1163–1172.
Kéry, M. et al. 2010. Predicting species distributions from
checklist data using site-occupancy models. J. Biogeogr. 37:
1851–1862.
Kéry, M. et al. 2013. Analysing and mapping species range dynamics
using occupancy models. J. Biogeogr. 40: 1463–1474.
Legendre, P. and Legendre, L. 2012. Numerical ecology.
Elsevier.
Lima, S. L. 2009. Predators and the breeding bird: behavioral and
reproductive flexibility under the risk of predation. Biol.
Rev. 84: 485–513.
Lindenmayer, D. B. et al. 2012. Improving biodiversity monitoring.
Austral Ecol. 37: 285–294.
Lobo, J. M. et al. 2010. The uncertain nature of absences and
their importance in species distribution modelling. Ecography
33: 103–114.
MacKenzie, D. I. 2012. Study design and analysis options
for demographic and species occurrence dynamics. In: Gitzen,
R. A. et al. (eds), Design and analysis of long-term ecological
monitoring studies. Cambridge Univ. Press, pp. 397–425.
MacKenzie, D. I. and Royle, J. A. 2005. Designing occupancy
studies: general advice and allocating survey effort. J.
Appl. Ecol. 42: 1105–1114.
MacKenzie, D. I. et al. 2005. Occupancy estimation and modeling:
inferring patterns and dynamics of species occurrence.
Elsevier.
Maes, D. et al. 2012. Applying IUCN Red List criteria at a
small regional level: a test case with butterflies in Flanders
(north Belgium). Biol. Conserv. 145: 258–266.
Maes, D. et al. 2013. Dagvlinders in Vlaanderen: nieuwe kennis
voor betere actie. Uitgeverij Lannoo nv.
Martin, Y. et al. 2013. Testing instead of assuming the importance
of land use change scenarios to model species distributions
under climate change. Global Ecol. Biogeogr. 22: 1204–1216.
McPherson, J. M. et al. 2004. The effects of species range sizes on
the accuracy of distribution models: ecological phenomenon
or statistical artefact? J. Appl. Ecol. 41: 811–823.
Pearson, R. G. et al. 2007. Predicting species distributions from
small numbers of occurrence records: a test case using cryptic
geckos in Madagascar. J. Biogeogr. 34: 102–117.
Supplementary material (Appendix ECOG-00749 at
www.ecography.org/readers/appendix). Appendix 1.
... Ensuring the inclusion of key plots with comprehensive sampling in any subsample is therefore vital for preserving data integrity. A mixed monitoring strategy, which combines a reduction in the number of plots with intensified sampling in critical areas, could help balance resource efficiency with the need for robust conservation (Gardner 2010;Bicknell et al. 2014;Aizpurua et al. 2015). ...
Article
Full-text available
The Bohemian Forest spans the borders of Bavaria, Czechia and Upper Austria, and is important for studying forest biodiversity in central European mountain ecosystems. This study focuses on assessing the patterns in biodiversity in the Šumava National Park. Species richness, Shannon diversity index, evenness and dominance were determined for 117 forest plots (large sample) and a subsample of 49 plots (small sample) using comprehensive monitoring techniques within the Silva Gabreta project, a cross-border initiative implemented together with the Bavarian Forest National Park. Data were collected for the following taxonomic groups: plants, fungi, mammals and invertebrates, using a variety of trapping methods and survey techniques. Results indicate significant differences in the number of species in the different taxonomic groups, with Lepidoptera, fungi and Bryophyta with the highest species richness and diversity, whereas groups such as Neuroptera, Curculionidae and mammals had lower values. Although most biodiversity indicators were not significantly different between the large and the small sample at the taxonomic level, species richness and Shannon diversity were higher in the small sample. This may be attributed to the trapping methods used in those plots, which are likely to have resulted in more complete captures of the species than in the plots of the large sample. The findings indicate that 49 plots are a suitable number for long-term biodiversity monitoring, provided key plots with efficient trapping setups are included. This study highlights the importance of careful plot selection and suggests that a mixed monitoring strategy, incorporating both broad taxonomic assessments and targeted approaches for specific taxa, may be the most effective for monitoring biodiversity.
... Note that this grid resolution also (Gibbons et al., 1993). The use of the grid referenced by an atlas to define PSUs for subsequent abundance estimation is a recent development in atlas production (see, for example, Gibbons et al., 2007;Aizpurua et al., 2015;McCabe et al., 2018). We can therefore see a link, or even a convergence, between atlas production and monitoring programs. ...
... For SDMs to be used effectively, it is therefore essential that such model outputs are accurate representations of the true distributions. The accuracies of SDMs are dependant not only upon the sampling effort (the quantity of occurrence data) used to generate the models (Aizpurua et al., 2015;Valavi et al., 2021), but also the spatial configuration of those sampling points (Kramer-Schadt et al., 2013;Syfert et al., 2013), particularly for presence-pseudoabsence models (Barbet-Massin et al., 2012;Phillips et al., 2009). As there is likely to be considerable sample selection bias in occurrence data, any SDM therefore risks conflating modelling species distribution with modelling this sampling bias (Beck et al., 2014;Phillips et al., 2009;Ploton et al., 2020;Radosavljevic and Anderson, 2014). ...
Article
Full-text available
Species distribution models (SDMs) are key tools in biodiversity and conservation, but assessing their reliability in unsampled locations is difficult, especially where there are sampling biases. We present a spatially-explicit sensitivity analysis for SDMs – SDM profiling – which assesses the leverage that unsampled locations have on the overall model by exploring the interaction between the effect on the variable response curves and the prevalence of the affected environmental conditions. The method adds a ‘pseudo-presence’ and ‘pseudo-absence’ to unsampled locations, re-running the SDM for each, and measuring the difference between the probability surfaces of the original and new SDMs. When the standardised difference values are plotted against each other (a ‘profile plot’), each point's location can be summarized by four leverage measures, calculated as the distances to each corner. We explore several applications: visualization of model certainty; identification of optimal new sampling locations and redundant existing locations; and flagging potentially erroneous occurrence records.
... Among several applications, SDMs can be used to anticipate the impacts of environmental drivers on species performance (Elith et al. 2010), a critical step for effective conservation. SDMs can also help identify priority areas for conservation (Arcos et al. 2012) or optimise long-term monitoring protocols (Aizpurua et al. 2015), which are especially important for species at risk. Ideally, a good recovery plan also requires adaptive management, i.e. information should be obtained about the management effectiveness by monitoring the outcomes (McCarthy & Possingham 2007), with the aim of updating knowledge and improving decision-making over time (Canessa et al. 2016). ...
Article
Full-text available
Despite much discussion about the utility of remote sensing for effective conservation, the inclusion of these technologies in species recovery plans remains largely anecdotal. We developed a modeling approach for the integration of local, spatially measured ecosystem functional dynamics into a species distribution modeling (SDM) framework in which other ecologically relevant factors are modeled separately at broad scales. To illustrate the approach, we incorporated intraseasonal water‐vegetation dynamics into a cross‐scale SDM for the Common Snipe (Gallinago gallinago), which is highly dependent on water and vegetation dynamics. The Common Snipe is an Iberian grassland waterbird characteristic of European agricultural meadows and a member of one of the most threatened bird guilds. The intraseasonal dynamics of water content of vegetation were measured using the standard deviation of the normalized difference water index time series computed from bimonthly images of the Sentinel‐2 satellite. The recovery plan for the Common Snipe in Galicia (northwestern Iberian Peninsula) provided an opportunity to apply our modeling framework. Model accuracy in predicting the species’ distribution at a regional scale (resulting from integration of downscaled climate projections with regional habitat–topographic suitability models) was very high (area under the curve [AUC] of 0.981 and Boyce's index of 0.971). Local water‐vegetation dynamic models, based exclusively on Sentinel‐2 imagery, were good predictors (AUC of 0.849 and Boyce's index of 0.976). The predictive power improved (AUC of 0.92 and Boyce's index of 0.98) when local model predictions were restricted to areas identified by the continental and regional models as priorities for conservation. Our models also performed well (AUC of 0.90 and Boyce's index of 0.93) when projected to updated water‐vegetation conditions. 
Our modeling framework enabled incorporation of key ecosystem processes closely related to water and carbon cycles while accounting for other factors ecologically relevant to endangered grassland waterbirds across different scales, allowed identification of priority areas for conservation, and provided an opportunity for cost‐effective recovery planning by monitoring management effectiveness from space.
... Although both GBIF and the atlases yielded comparable numbers of records, as expected the spatial coverage provided by the atlas data was higher [50] (see Fig C in S1 Appendix). Grid-based biological atlases are often generated by compiling existing data, aggregating them to a reference grid that is reported as a single, generalized georeferencing for the data, and then seeking to fill in the cells lacking in data [51]. ...
Article
The advent of online data aggregator infrastructures has facilitated the accumulation of Digital Accessible Knowledge (DAK) about biodiversity. Despite the vast amount of freely available data records, their usefulness for research depends on completeness of each body of data regarding their spatial, temporal and taxonomic coverage. In this paper, we assess the completeness of DAK about terrestrial mammals distributed across the Iberian Peninsula. We compiled a dataset with all records about mammals occurring in the Iberian Peninsula available in the Global Biodiversity Information Facility and in the national atlases from Portugal and Spain. After cleaning the dataset of errors as well as records lacking collection dates or not determined to species level, we assigned all occurrences to a 10-km grid. We assessed inventory completeness by calculating the ratio between observed and expected richness (based on the Chao2 richness index) in each grid cell and classified cells as well-sampled or under-sampled. We evaluated survey coverage of well-sampled cells along four environmental gradients and temporal coverage. Out of 796,283 retrieved records, quality issues led us to remove 616,141 records unfit for this use. The main reason for discarding records was missing collection dates. Only 25.95% of cells contained enough records to robustly estimate completeness. The completeness of DAK about terrestrial mammals from the Iberian Peninsula was low, and the data were spatially and temporally biased. Out of 5,874 cells holding data, only 620 (9.95%) were classified as well-sampled. Moreover, well-sampled cells were geographically aggregated and reached inventory completeness over the same temporal range. Despite the increasing availability of DAK, its usefulness is still compromised by quality issues and gaps in data. Future work should therefore focus on increasing data quality, in addition to mobilizing unpublished data.
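The completeness ratio described in this abstract (observed richness divided by Chao2 expected richness) can be sketched as follows. This is a minimal illustration, not the authors' code: it uses the bias-corrected form of Chao2, assumes each species in a grid cell is summarized by the number of sampling units in which it was recorded, and the function names are hypothetical. The abstract does not state the threshold used to classify cells as well-sampled, so none is applied here.

```python
from typing import Sequence


def chao2(incidence_counts: Sequence[int], n_units: int) -> float:
    """Bias-corrected Chao2 estimate of species richness.

    incidence_counts: for each species, the number of sampling units
        in which it was recorded (zeros are ignored).
    n_units: total number of sampling units in the grid cell.
    """
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    counts = [c for c in incidence_counts if c > 0]
    s_obs = len(counts)                       # observed richness
    q1 = sum(1 for c in counts if c == 1)     # uniques
    q2 = sum(1 for c in counts if c == 2)     # duplicates
    correction = (n_units - 1) / n_units
    return s_obs + correction * q1 * (q1 - 1) / (2 * (q2 + 1))


def completeness(incidence_counts: Sequence[int], n_units: int) -> float:
    """Ratio of observed to Chao2-expected richness (0 if nothing observed)."""
    s_obs = sum(1 for c in incidence_counts if c > 0)
    expected = chao2(incidence_counts, n_units)
    return s_obs / expected if expected > 0 else 0.0
```

For example, a cell with four sampling units and five species recorded in 1, 1, 2, 3 and 4 units has two uniques and one duplicate, which gives an expected richness of 5.375 and a completeness of about 0.93.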
... Such databases contain errors (e.g. geographic bias, observer and spatial errors), although many of these can be accounted for and minimised (McCarthy 1998, Aizpurua et al. 2015). Importantly, because these databases are frequently the only sources of records spanning the timeframes and geographic extent required to characterise temporal and spatial changes in species' distributions, they are currently the best means of potentially uncovering the drivers of these trends (Shaffer et al. 1998). ...
Thesis
Dryland Australia has a distinctive mammal fauna that has been severely impacted by novel threats since European colonisation. I aimed to understand the defining characteristics of mammal refuges in this region. In chapter 2 I used atlas data to compare the historic and contemporary distributions of dryland marsupials. The greater bilby and common brushtail possum have substantially contracted in distribution. The bilby was more likely to occur on land without cattle grazing and with low rabbit densities, while the possum has contracted to cooler areas. In chapter 3 I focused on the MacDonnell Ranges to understand the factors protecting declining mammals. Predation was supported as a major driver of extant mammal richness and vast areas of rugged terrain provide vital refuge for dryland mammals. In chapter 4 I consider the hypothesis that trophic competition between the dingo and cat creates refuge from predation for small mammals by analysing the diets of the two predators for evidence of competition. I conclude that habitat complexity underpins the refuge and that effects of dingo predation on the cat population are of secondary importance. In chapters 5-7 I focused on the critically endangered central rock-rat (CRR). My habitat suitability maps confirmed a dramatic range contraction for this species over the last 100 years and their current association with extreme ruggedness supported the hypothesis that the impact of cat predation is mediated by habitat complexity. I established the effectiveness of camera trapping for sampling the CRR and, using this sampling tool, found that CRR occupancy was positively associated with areas burnt within the past 5 years and that cats forage less frequently in areas with dense hummock grass cover. Fire management could be used as a tool for rodent conservation in this environment. In chapter 8 I synthesise my findings and provide a framework for research on declining fauna.
... In the context of species distribution modeling, most studies (surveys, plant coverage surveys, air pollution surveys, etc.) have been repeated periodically for long periods of time (Gitzen 2012; Aizpurua et al. 2015). Although the main interest is the spatial evolution of the system under study, it must be considered that it varies not only in space but also in time. ...
Article
The use of complex statistical models has recently increased substantially in the context of species distribution behavior. This complexity has made the inferential and predictive processes challenging to perform. The Bayesian approach has become a good option to deal with these models due to the ease with which prior information can be incorporated along with the fact that it provides a more realistic and accurate estimation of uncertainty. In this paper, we first review the sources of information and different approaches (frequentist and Bayesian) to model the distribution of a species. We also discuss the Integrated Nested Laplace approximation as a tool with which to obtain marginal posterior distributions of the parameters involved in these models. We finally discuss some important statistical issues that arise when researchers use species data: the presence of a temporal effect (presenting different spatial and spatio-temporal structures), preferential sampling, spatial misalignment, non-stationarity, imperfect detection, and the excess of zeros.
... Importantly, issues of sampling design for programmes focusing on multiple species are complicated by differences in species detectability (MacKenzie & Royle 2005), habitat associations and resource availability (Reynolds et al. 2011, Carvalho et al. 2016). In response to these logistical challenges, many monitoring programmes have enlisted the assistance of the public in collecting data on species occurrence and abundance over broad geographical regions (Aizpurua et al. 2015). These citizen science efforts are now a mainstay of ecological monitoring and are considered a global tool in conservation (Greenwood 2007, Dunn & Weston 2008, Devictor et al. 2010). ...
Article
Selecting a sampling design to monitor multiple species across a broad geographical region can be a daunting task, and often involves tradeoffs between limited resources and the accurate estimation of population abundance and occurrence. Since the 1950s, biological atlases have been implemented in various regions to document the occurrence of plant and animal species. As next-generation atlases repeat original surveys, investigators often seek to raise the rigor of atlases by incorporating species abundances. We present a repeatable framework that incorporates existing monitoring data, hierarchical modelling, and sampling simulations to augment existing atlas occurrence and breeding status maps with a secondary sampling of species abundances. Using existing information on three bird species with varying abundance and detectability, we evaluated several sampling scenarios for the 2nd Wisconsin Breeding Bird Atlas. In general, we found that most sampling schemes produced accurate mean statewide abundance estimates for species with medium to high abundance and detection probability, but estimates varied significantly for species with low abundance and low detection probability. Our approach provided a statewide point-count sampling design that: provided precise and unbiased abundance estimates for species of varied prevalence and detectability; ensured suitable spatial coverage across the state and its habitats; and reduced spending on total survey costs. Our framework could benefit investigators conducting atlases and other broad-scale avian surveys that seek to add systematic, multi-species sampling for estimating density and abundance across broad geographic regions.
Thesis
Wetlands are highly threatened ecosystems, especially in boreal landscapes. They provide valuable ecological services and serve as important habitats for many species. In northern Canada, wetlands are being transformed by natural disturbances, climate change, and human activities. These changes can lead to habitat loss for species dependent on wetlands. Small wetlands, particularly ponds, are crucial for amphibians, birds, and mammals. However, there is limited knowledge about vertebrate presence in small ponds. This thesis aimed to compare peatland ponds and beaver ponds and understand the influence of habitat factors and species interactions on amphibians, birds, and mammals. The research highlighted the importance of both pond types but suggested that beaver ponds are more productive for amphibians. Birds responded differently to the two pond types, with beaver ponds supporting greater species richness. The presence of red squirrels negatively affected bird species richness. Camera trap surveys revealed unexpected preferences of certain mammals and bird species for peatland ponds. Overall, understanding wetland ecosystems is vital for their conservation and the diverse range of species they support.
Article
Biodiversity conservation faces a methodological conundrum: Biodiversity measurement often relies on species, most of which are rare at various scales, especially prone to extinction under global change, but also the most challenging to sample and model. Predicting the distribution change of rare species using conventional species distribution models is challenging because rare species are hardly captured by most survey systems. When enough data is available, predictions are usually spatially biased toward locations where the species is most likely to occur, violating the assumptions of many modelling frameworks. Workflows to predict and eventually map rare species distributions imply important trade‐offs between data quantity, quality, representativeness, and model complexity that need to be considered prior to survey and analysis. Our opinion is that study designs need to carefully integrate the different steps, from species sampling to modelling, in accordance with the different types of rarity and available data in order to improve our capacity for sound assessment and prediction of rare species distribution. In this article, we summarize and comment on how different categories of species rarity lead to different types of occurrence and distribution data depending on choices made during the survey process, namely the spatial distribution of samples (where to sample) and the sampling protocol in each selected location (how to sample). We then clarify which species distribution models are suitable depending on the different types of distribution data (how to model). Among others, for most rarity forms, we highlight the insights from systematic species‐targeted sampling coupled with hierarchical models that allow correcting for overdispersion and for spatial and sampling sources of bias.
Our article provides scientists and practitioners with a much‐needed guide through the ever‐increasing diversity of methodological developments to improve prediction of rare species distribution depending on rarity type and available data.
Book
The book summarises recommendations on establishing, running, and improving national wild bird monitoring schemes. The methodology is described in detail and includes field methods, sampling design, data management and analysis, and communication, with case studies from various countries. The guide will be distributed among the Pan-European Common Bird Monitoring Scheme (PECBMS) network of cooperating individuals and organisations across Europe, as well as through the European Bird Census Council national delegates and BirdLife International partner organisations. We hope that the first edition will contribute to and help to improve the high scientific standard of bird monitoring in Europe. The development of new bird monitoring schemes, as well as the need to improve existing schemes, brings an increasing need to use the most scientifically sound methods for counting birds and for analysing and presenting the data. Although general principles of bird monitoring are available in the form of textbooks and scientific papers, the information is scattered across many titles. Perhaps more importantly, there is much good experience and practice across Europe, which can be shared and used for the development and improvement of monitoring schemes. Therefore, PECBMS, a common initiative of the European Bird Census Council and BirdLife International, decided to bring together and publish a Best Practice Guide summarizing the principles of good bird monitoring, including case studies from European countries documenting details of various aspects of bird monitoring.
Chapter
To provide useful and meaningful information, long-term ecological programs need to implement solid and efficient statistical approaches for collecting and analyzing data. This volume provides rigorous guidance on quantitative issues in monitoring, with contributions from world experts in the field. These experts have extensive experience in teaching fundamental and advanced ideas and methods to natural resource managers, scientists and students. The chapters present a range of tools and approaches, including detailed coverage of variance component estimation and quantitative selection among alternative designs; spatially balanced sampling; sampling strategies integrating design- and model-based approaches; and advanced analytical approaches such as hierarchical and structural equation modelling. Making these tools more accessible to ecologists and other monitoring practitioners across numerous disciplines, this is a valuable resource for any professional whose work deals with ecological monitoring. Supplementary example software code is available online at www.cambridge.org/9780521191548.
Book
Occupancy Estimation and Modeling: Inferring Patterns and Dynamics of Species Occurrence, Second Edition, provides a synthesis of model-based approaches for analyzing presence-absence data, allowing for imperfect detection. Beginning from the relatively simple case of estimating the proportion of area or sampling units occupied at the time of surveying, the authors describe a wide variety of extensions that have been developed since the early 2000s. These extensions provide improved insight into species and community ecology and cover detection heterogeneity; correlated detections; spatial autocorrelation; multiple states or classes of occupancy; changes in occupancy over time; species co-occurrence; community-level modeling; and more. The Second Edition has been greatly expanded, with details of the estimation methods and examples of their application. Important study design recommendations are also covered to give a well-rounded view of modeling.
Chapter
Introduction There are a wide range of approaches available for investigating the dynamics of the demographics and occurrence of ecological populations. So many that it would take an entire book, or more, to cover the important issues and options in sufficient detail. In this single chapter it is clearly impossible for me to go into detail on specific approaches, so I instead focus more on outlining some of the options available for addressing different types of questions and on general considerations, particularly with respect to program design. Inherently, because of those whom I have been fortunate enough to work with and learn from to this point in my career, most of the methods I discuss assume that detection of the items of interest (whether it be individual animals or plants, or of a species as a whole) will be imperfect, i.e. it will not be observed with certainty whenever you venture into the field to find it. However, many of the issues I will discuss are still relevant even with perfect detection. Recommended readings for further details on the topics I cover are Williams et al. (2002), Amstrup et al. (2005) and MacKenzie et al. (2006). Before launching into the main thrust of this chapter, I am going to make a few comments (some might even say a rant) about the importance of thinking hard about Why, What, and How during the conception stage of any monitoring program. Although these are fairly basic questions that have been discussed early in this volume, their fundamental importance cannot be emphasized enough.
Chapter
Article
A wide range of modelling algorithms is used by ecologists, conservation practitioners, and others to predict species ranges from point locality data. Unfortunately, the amount of data available is limited for many taxa and regions, making it essential to quantify the sensitivity of these algorithms to sample size. This is the first study to address this need by rigorously evaluating a broad suite of algorithms with independent presence-absence data from multiple species and regions. We evaluated predictions from 12 algorithms for 46 species (from six different regions of the world) at three sample sizes (100, 30, and 10 records). We used data from natural history collections to run the models, and evaluated the quality of model predictions with area under the receiver operating characteristic curve (AUC). With decreasing sample size, model accuracy decreased and variability increased across species and between models. Novel modelling methods that incorporate both interactions between predictor variables and complex response shapes (i.e. GBM, MARS-INT, BRUTO) performed better than most methods at large sample sizes but not at the smallest sample sizes. Other algorithms were much less sensitive to sample size, including an algorithm based on maximum entropy (MAXENT) that had among the best predictive power across all sample sizes. Relative to other algorithms, a distance metric algorithm (DOMAIN) and a genetic algorithm (OM-GARP) had intermediate performance at the largest sample size and among the best performance at the lowest sample size. No algorithm predicted consistently well with small sample size (n < 30) and this should encourage highly conservative use of predictions based on small sample size and restrict their use to exploratory modelling.
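For readers unfamiliar with the evaluation metric used in this abstract, the sketch below computes the area under the ROC curve (AUC) from predicted suitability scores at presence and absence sites. It uses the standard rank interpretation of AUC (the probability that a randomly chosen presence scores higher than a randomly chosen absence, with ties counted as one half). The function name and input layout are illustrative assumptions, not taken from the study.

```python
from typing import Sequence


def auc(presence_scores: Sequence[float],
        absence_scores: Sequence[float]) -> float:
    """AUC as the share of presence/absence pairs ranked correctly.

    Ties between a presence score and an absence score count as 0.5.
    """
    if not presence_scores or not absence_scores:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))
```

A model that ranks every presence above every absence scores 1.0, and a model no better than chance scores about 0.5. The pairwise loop is quadratic in the number of sites, which is acceptable for a sketch but slow for large evaluation sets.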