
Species occurrences inherently include positional error. Such error can be problematic for species distribution models (SDMs), especially those based on fine‐resolution environmental data. It has been suggested that there could be a link between the influence of positional error and the width of the species' ecological niche. Although positional errors in species occurrence data may imply serious limitations, especially for modelling species with a narrow ecological niche, this link has never been thoroughly explored. We used a virtual species approach to assess the effects of positional error on fine‐scale SDMs for species with environmental niches of different widths. We simulated three virtual species with varying niche breadth, from specialist to generalist. The true distribution of these virtual species was then altered by introducing different levels of positional error (from 5 to 500 m). We built generalized linear models and MaxEnt models using the distribution of the three virtual species (unaltered and altered) and a combination of environmental data at 5 m resolution. The models' performance and niche overlap were compared to assess the effect of positional error with varying niche breadth in the geographical and environmental space. Positional error negatively impacted performance and niche overlap metrics. The magnitude of the influence of positional error depended on the species niche, with models for specialist species being more affected than those for generalist species. Positional error had the same effect on both modelling techniques. Finally, increasing sample size did not mitigate the negative influence of positional error. We showed that fine‐scale SDMs are considerably affected by positional error, even when such error is low. 
Therefore, where new surveys are undertaken, we recommend paying attention to data collection techniques to minimize the positional error in occurrence data and thus to avoid its negative effect on SDMs, especially when studying specialist species.
ECOGRAPHY (www.ecography.org)
© 2019 The Authors. Ecography published by John Wiley & Sons Ltd on behalf of Nordic Society Oikos
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
Subject Editor: Dan Warren
Editor-in-Chief: Miguel Araújo
Accepted 4 October 2019
Ecography 43: 256–269, 2020
doi: 10.1111/ecog.04687
Keywords: data errors, niche breadth, spatial overlay, virtual species
The effect of positional error on fine scale species distribution models increases for specialist species
LukášGábor, VítězslavMoudrý, VincentLecours, MarcoMalavasi, VojtěchBarták, MichalFogl,
PetraŠímová, DuccioRocchini and TomášVáclavík
L. Gábor (https://orcid.org/0000-0001-6137-0994), V. Moudrý (https://orcid.org/0000-0002-3194-451X) (moudry@fzp.czu.cz), M. Malavasi, V. Barták (https://orcid.org/0000-0001-9887-1290), M. Fogl (https://orcid.org/0000-0002-5880-6926), P. Šímová (https://orcid.org/0000-0003-2480-1171) and D. Rocchini (https://orcid.org/0000-0003-0087-0594), Dept of Applied Geoinformatics and Spatial Planning, Faculty of Environmental Sciences, Czech Univ. of Life Sciences Prague, Praha – Suchdol, Czech Republic. DR also at: Univ. of Trento, Center Agriculture Food Environment (C3A), S. Michele all’Adige, TN, Italy, and Univ. of Trento, Dept of Cellular, Computational and Integrative Biology – CIBIO, Univ. of Trento, Povo, Italy, and Fondazione Edmund Mach, Research and Innovation Centre, Dept of Biodiversity and Molecular Ecology, S. Michele all’Adige, TN, Italy. – V. Lecours (https://orcid.org/0000-0002-4777-3348), School of Forest Resources and Conservation, Univ. of Florida, Gainesville, FL, USA. – T. Václavík (https://orcid.org/0000-0002-1113-6320), Palacký Univ. Olomouc, Dept of Ecology and Environmental Sciences, Faculty of Science, Olomouc, Czech Republic, and UFZ – Helmholtz Centre for Environmental Research, Dept of Computational Landscape Ecology, Leipzig, Germany.
Introduction
Studying relationships between species and their environment is fundamental for understanding Earth's biodiversity. Species distribution models (SDMs) are a common tool used to study these relationships. They use species occurrence data and environmental data to produce a set of rules explaining the environmental space where species were collected or observed (Ferrier et al. 2017). All applications of SDMs, however, assume that species occurrence data are largely free of spatial error. Nonetheless, all spatial data inherently contain some level and type of spatial error. These errors can be, for example, related to the use of inadequate spatial resolution (Gottschalk et al. 2011, Šímová et al. 2019), low sample size (Wisz et al. 2008, Moudrý et al. 2017), biased sampling (Hijmans 2012, Ranc et al. 2016) or occurrences with positional error (Graham et al. 2008, Osborne and Leitão 2009, Mitchell et al. 2017). Data quality (both for species occurrences and environmental variables) is currently considered a major factor limiting SDM accuracy (Araújo et al. 2019), and demonstrating, quantifying and understanding the consequences of these errors is therefore critical.
It is often assumed that the negative effects of positional error (i.e. inaccurate location of species occurrences) are minimal or mainly associated with relatively old datasets that were georeferenced from textual descriptions of their locations (which may cause errors of up to hundreds of meters; Wieczorek et al. 2004). However, it is also necessary to consider positional errors inherent to data georeferenced using modern global navigation satellite systems (GNSS). The positional error of GNSS data may be caused by the use of outdated technology, by poor satellite signal reception (e.g. because of unfavourable site conditions), or by data processing (e.g. conversion between coordinate systems or rounding of coordinate values). Moreover, species occurrence data often represent the position of the observer rather than the actual position of the species (Zhang et al. 2018). In the marine environment, species data are often acquired using underwater cameras, in which case the positional error can be affected, for example, by camera depth: the deeper the camera, the greater the positional error (Rattray et al. 2014, Mitchell et al. 2017). Therefore, even though the positional error of standard GNSS is usually below 30 m (Frair et al. 2010), the errors associated with such data may be much larger.
In addition, the performance of SDMs is complicated by various spatial (e.g. prevalence or range size) and ecological (e.g. niche breadth) characteristics of the studied species (Luoto et al. 2005, Bulluck et al. 2006, McPherson and Jetz 2007, Evangelista et al. 2008, Chefaoui et al. 2011, Connor et al. 2018). It has been hypothesized that range size is positively correlated with niche breadth (i.e. the range of environments that the species can inhabit); in other words, species able to tolerate a wider range of conditions are typically more widespread (Brown 1984, Gaston et al. 1997, Arribas et al. 2012, Boulangeat et al. 2012). The niche breadth–range size relationship is one of the possible mechanisms explaining commonness and rarity. Modelling rare species (i.e. species with small geographical ranges) is particularly problematic, and novel approaches have been adopted for this purpose (Breiner et al. 2015) to overcome the common problem of a low number of occurrences available for modelling, which may not be sufficient to completely describe the species niche. Similar effects can be caused by low positional accuracy of the occurrences (Johnson and Gillingham 2008, Fernandez et al. 2009, Osborne and Leitão 2009).
Although the magnitude of the niche breadth–range size relationship is still under debate, a recent meta-analysis of 64 studies found a significant positive relationship between range size and niche breadth (Slatyer et al. 2013). Such a relationship can further increase the already high vulnerability of specialist species to environmental change: Slatyer et al. (2013) suggested that specialists might be particularly vulnerable to any environmental change because of the synergistic effects of a narrow niche and a small range size. Specialist species are of high conservation concern, and SDMs might be the only tractable means of estimating their distribution and response to environmental change. However, the confounding effects of inaccurate data on modelling species that utilize a narrow niche breadth (i.e. specialists) versus species that utilize a wide niche breadth (i.e. generalists) are unknown (Connor et al. 2018).
It is intuitive that positional error of a given magnitude might have a greater effect on specialist than on generalist species, as occurrences are more likely to be incorrectly shifted into cells representing an unsuitable environment, i.e. an environment outside the species' environmental niche. This, however, has never been thoroughly explored because it is extremely difficult, if not impossible, to estimate the true responses of a real species to the environment and, consequently, to fully understand the true suitability of an area for the species in question.
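The intuition above can be made concrete with a deliberately simple toy model that is not part of the study's workflow. Assume a one-dimensional transect along which the environment changes linearly, so that a species' suitable conditions form a band of fixed width around its optimum. The hypothetical function `fraction_shifted_out` places presences uniformly inside the band, applies a 5–10 m shift (scenario S1) in a random direction, and counts how many presences leave the band.

```python
import random


def fraction_shifted_out(half_width, min_shift, max_shift, n=20000, seed=1):
    """Toy 1-D model: share of presences pushed out of the suitable band.

    Hypothetical setting: the environment changes linearly along a transect,
    so suitable conditions occupy the interval [-half_width, +half_width] (m)
    around the niche optimum at 0. Presences are drawn uniformly inside that
    band and displaced by a random distance in a random direction.
    """
    rng = random.Random(seed)
    outside = 0
    for _ in range(n):
        x = rng.uniform(-half_width, half_width)     # true presence location
        d = rng.uniform(min_shift, max_shift)        # positional error (m)
        shifted = x + d * rng.choice((-1.0, 1.0))    # random direction in 1-D
        if abs(shifted) > half_width:
            outside += 1
    return outside / n


# Scenario S1 (5-10 m error) for a narrow (40 m) and a wide (400 m) band.
specialist = fraction_shifted_out(half_width=20, min_shift=5, max_shift=10)
generalist = fraction_shifted_out(half_width=200, min_shift=5, max_shift=10)
print(f"specialist: {specialist:.3f}  generalist: {generalist:.3f}")
```

Under these assumptions the expected fraction is the mean shift divided by the band width, about 0.19 for the 40 m band versus about 0.02 for the 400 m band. Real landscapes, with multivariate and spatially autocorrelated environments, will behave less tidily.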
In this study, we focused on Light Detection and Ranging (LiDAR)-derived variables, which are increasingly combined with species distribution data of unknown positional accuracy to study species–environment relationships at fine scales. Studies published so far have used real species to test the effect of positional error. However, real species distribution data are usually affected by a complex set of other uncertainties (e.g. sampling bias, incompleteness, inaccuracies). As a consequence, isolating and identifying the effects of positional error can be very challenging, if not impossible. This is likely one of the reasons why little consensus exists on how the effect of positional error manifests in SDMs (Naimi et al. 2011, Mitchell et al. 2017). For example, Graham et al. (2008) concluded that SDMs are robust to positional error, while others argued that positional errors reduce model performance (Johnson and Gillingham 2008, Fernandez et al. 2009, Osborne and Leitão 2009).
Another aspect may be that positional errors of species occurrences have been studied using relatively coarse environmental data (but see Mitchell et al. 2017). Positional errors considered in prior studies ranged from 50 m up to 50 km (Table 1). An error of a given magnitude shifts an occurrence across only a few cells in a coarse-resolution SDM (e.g. 1 × 1 km), but across many more cells in a fine-resolution SDM (e.g. 10 × 10 m). Therefore, with the increasing availability of fine-scale data, additional studies are needed (Osborne and Leitão 2009); it can be expected that SDMs at fine scales are more sensitive to positional error.
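The resolution argument is simple arithmetic: a displacement of d metres spans roughly d divided by the cell size in cells. The sketch below, with a hypothetical helper `cells_shifted`, applies this to the error scenarios S1–S6 used in this study, for the 5 m grid used here and a 1 km grid typical of earlier work. Whether a sub-cell shift actually moves a point into another cell depends on where the point sits within its cell, which this back-of-the-envelope calculation ignores.

```python
# Positional-error scenarios from this study: (lower, upper) bound in metres.
SCENARIOS = {"S1": (5, 10), "S2": (10, 15), "S3": (15, 20),
             "S4": (20, 50), "S5": (50, 100), "S6": (100, 500)}


def cells_shifted(error_m, cell_size_m):
    """Displacement expressed in raster cells (values below 1 mean a sub-cell shift)."""
    return error_m / cell_size_m


for name, (low, high) in SCENARIOS.items():
    fine = f"{cells_shifted(low, 5):g}-{cells_shifted(high, 5):g}"          # 5 m grid
    coarse = f"{cells_shifted(low, 1000):g}-{cells_shifted(high, 1000):g}"  # 1 km grid
    print(f"{name}: {fine} cells at 5 m, {coarse} cells at 1 km")
```

For example, the 500 m upper bound of S6 spans 100 cells at 5 m resolution but only half a cell at 1 km.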
To ensure full knowledge of the exact ecological and geographical characteristics of the species and to avoid unknown complexities associated with real data, we used a virtual species approach to test the effect of positional error in species occurrences on fine-scale SDMs in the context of species niche breadth (i.e. specialist versus generalist species). We generated three virtual species that differed in characteristics related to their geographic distribution, i.e. prevalence and relative occurrence area (ROA), the proportion of the total study area occupied by the species (Lobo 2008).
The virtual species approach allowed us to control the experiment and to isolate the effects of positional error (Zurell et al. 2010). This approach is increasingly used to evaluate the effects of data inaccuracies on model performance (Barbet-Massin et al. 2012, Václavík and Meentemeyer 2012, Qiao et al. 2015, Ranc et al. 2016, Fernandes et al. 2018, Leroy et al. 2018, Moudrý et al. 2018, Gábor et al. 2019, Meynard et al. 2019), but has yet to be adopted for the study of positional error. In particular, we tested whether: 1) SDMs for specialist species are more affected by positional error than those for generalist species; 2) it is possible to compensate for the assumed negative effect of positional error with a larger sample size; and 3) positional error has different effects when using a parametric (e.g. generalized linear model) versus a nonparametric (e.g. MaxEnt) modelling technique.
Material and methods
LiDAR data acquisition, processing and variable selection
Discrete LiDAR data were collected in Krkonose Mountains National Park (KRNAP), Czech Republic (Supplementary material Appendix 1 Fig. A1) in 2012 using a small-footprint airborne LiDAR system (RIEGL LMS Q-680i). The average point density was approximately six points per square meter. The LiDAR point cloud was automatically classified into ground, vegetation, building, wire and transmission tower classes in the ENVI LiDAR software (ver. 5.3) and LAStools (ver. 171215). The terrain points were used to produce a digital terrain model (DTM), and the vegetation points were used to produce a canopy height model (CHM) (Khosravipour et al. 2016). Both models were generated from the point cloud at 0.5 m resolution and subsequently resampled to a 5 m cell size for the analysis to reduce processing time. A topographic wetness index (TWI) was derived from the DTM based on the equation
$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right)$$
where $A_s$ is the specific catchment area and $\beta$ is the local slope angle in radians, so that $\tan\beta$ is the local slope gradient (Beven and Kirkby 1979). To calculate the specific catchment area, we used the multiple flow routing algorithm of Quinn et al. (1991), recommended by Kopecký and Čížková (2010), as implemented in SAGA-GIS (Conrad 2003).
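A minimal sketch of the TWI formula for a single cell, assuming $A_s$ has already been computed. In the study this was done with the multiple flow routing algorithm in SAGA-GIS, which the sketch does not reproduce. The `min_slope_rad` floor is a hypothetical guard against division by zero on flat cells, not a documented SAGA-GIS behaviour.

```python
import math


def twi(specific_catchment_area, slope_rad, min_slope_rad=1e-3):
    """Topographic wetness index, TWI = ln(A_s / tan(beta)), for one cell.

    specific_catchment_area: A_s, assumed precomputed (e.g. by a flow-routing tool).
    slope_rad: local slope angle beta in radians.
    min_slope_rad: hypothetical floor so that flat cells do not divide by zero.
    """
    tan_beta = math.tan(max(slope_rad, min_slope_rad))
    return math.log(specific_catchment_area / tan_beta)


# Same catchment area, gentle versus steep slope: the gentle cell is "wetter".
print(twi(100.0, 0.05), twi(100.0, 0.30))
```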
The selection of these three variables (DTM, CHM, TWI) was motivated by the need to simulate a realistic situation that includes variables with various levels of spatial autocorrelation (Supplementary material Appendix 2 Fig. A2). CHM describes the horizontal variability of vegetation structure, which is known to affect species richness (Lefsky et al. 2002). For example, taller vegetation was found to be related to higher bird species richness (Davies and Asner 2014). TWI is a surrogate for soil moisture, an environmental variable that affects vegetation composition and that has previously been used to predict bird occurrences (Besnard et al. 2013, Reif et al. 2018). The relationships of CHM and TWI with bird distribution and richness make our study relatable to applications with real species; our virtual species could theoretically be birds with specific habitat requirements in terms of terrain characteristics and vegetation structure. We also used the DTM as a surrogate for climatic variables and to restrict our virtual species to certain altitudes (Coops et al. 2010, Vogeler et al. 2014).

Table 1. Overview of prior studies focused on the influence of positional error in species occurrence data on SDMs.
Study | Species data | Environmental data | Resolution of input environmental data (pixel size) | Range of shifting occurrences (distance) | Range of shifting occurrences (pixels)
Graham et al. 2008 | observed | categorical, continuous | 100 × 100 m | 0–5 km | 0–50
Johnson and Gillingham 2008 | observed | categorical | 30 × 30 m | 50–1000 m (over 50 m) | 1–34
Osborne and Leitão 2009 | observed | continuous | 1 × 1 km | 0–1, 2–3, 4–5, 0–5 km | 0–1, 2–3, 4–5, 0–5
Fernandez et al. 2009 | observed | continuous | 1 × 1 km | 5–10–25–50 km | 1–5, 1–10, 1–25, 1–50
Naimi et al. 2011 | artificial | continuous | artificial data | x | 1–30 (over 1 pixel)
Mitchell et al. 2017 | observed | continuous | 2.5 × 2.5 m | 5–25–50–20–400 m | 1–2, 1–12, 1–80, 1–160
Simulating virtual species with different niche breadths
Virtual species were generated with the virtualspecies package (Leroy et al. 2016) in the statistical software R v.3.4.4 (R Development Core Team). The process involved three steps: a) generating the true distribution of the virtual species' environmental suitability, b) converting the environmental suitability into presences and absences and c) sampling species occurrences for further analysis and modelling.
Applying the formatFunctions function in R, we defined the species–environment relationships using normal distribution curves. To simulate species with different niche breadth, prevalence and ROA, we used the same means and varied the standard deviations of the response curves to the environmental variables (Supplementary material Appendix 3 Table B1). Specifically, we simulated three distinct virtual species with varying ROA and prevalence that represent realistic scenarios of species' extent of occurrence in the study area. The species with low ROA (4%) represents a specialist with low species prevalence (0.04), narrow niche breadth and small geographical range. The species with medium ROA (12%) may be described as an intermediate species (species prevalence = 0.12) with a wider niche breadth and medium geographical range. Finally, the species with high ROA (52%) can be perceived as a generalist with high species prevalence (0.47), wide niche breadth and wide geographical range (Futuyma and Moreno 1988, Devictor et al. 2010, Franklin 2010, Peers et al. 2012). Subsequently, we multiplied the individual species' responses to the environmental variables to obtain an environmental suitability raster (function generateSpFromFun). We opted for multiplication to reflect the irreplaceability of environmental conditions (i.e. we assumed that unsuitability of one condition causes a low probability of occurrence even if the remaining conditions are within the species' range of suitable values).
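The species–environment relationships described above can be sketched as Gaussian response curves whose product gives suitability. The means and standard deviations below are hypothetical placeholders (the actual values are in Supplementary material Appendix 3 Table B1), and rescaling each curve to a maximum of 1 at its optimum is an assumption of this sketch, not a description of how the virtualspecies functions scale their output.

```python
import math


def gaussian_response(value, mean, sd):
    """Normal-curve response rescaled so that it equals 1 at the optimum (assumption)."""
    return math.exp(-0.5 * ((value - mean) / sd) ** 2)


def suitability(env, params):
    """Multiply per-variable responses: one unsuitable variable drags the product down."""
    s = 1.0
    for var, value in env.items():
        mean, sd = params[var]
        s *= gaussian_response(value, mean, sd)
    return s


# Hypothetical parameters: identical optima, narrower curves for the specialist.
specialist = {"DTM": (1200.0, 50.0), "CHM": (15.0, 3.0), "TWI": (8.0, 1.0)}
generalist = {"DTM": (1200.0, 300.0), "CHM": (15.0, 10.0), "TWI": (8.0, 4.0)}

cell = {"DTM": 1300.0, "CHM": 15.0, "TWI": 8.0}   # 100 m above the elevation optimum
print(suitability(cell, specialist), suitability(cell, generalist))
```

A cell 100 m above the elevation optimum keeps most of its suitability for the wide-niche species but falls to exp(−2) ≈ 0.14 for the narrow-niche one, which illustrates how the product penalizes specialists whenever any single variable departs from its optimum.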
As noted in several studies (Meynard and Kaplan 2012, 2013, Moudrý 2015, Meynard et al. 2019), an appropriate setting of the whole simulation with respect to the research questions is crucial for obtaining reliable results. In addition, Meynard et al. (2019) highlighted that simulation studies based on the threshold approach fail to appropriately separate factors such as prevalence and niche breadth. Therefore, we adopted a probabilistic simulation approach (logistic function with α = 0.05 and β = 0.3) to convert the environmental suitability rasters into probabilities of occurrence, which were subsequently used to sample binary presence/absence rasters (function convertToPA). To sample species occurrences (function sampleOccurrences), we generated both presence-only and presence/absence data by uniform random sampling. Both types of occurrence datasets were generated in order to test different modelling techniques (cf. section Model fitting and evaluation). To test whether it is possible to compensate for the assumed negative effect of positional error with a larger sample size, we generated four different sample sizes: 30, 100, 500 and 1000 species presences, complemented, for the purpose of GLM modelling, by twice as many absences.
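A minimal sketch of the probabilistic conversion, assuming the logistic curve is written in its increasing form P = 1 / (1 + exp(−(x − β) / α)), so that α = 0.05 and β = 0.3 as reported give P = 0.5 at a suitability of 0.3. Implementations differ in the sign convention for α, so it is worth checking before reusing these values. Each cell then receives a single Bernoulli draw; the function names are hypothetical and do not correspond to virtualspecies functions.

```python
import math
import random

ALPHA, BETA = 0.05, 0.3  # logistic parameters reported in the text


def occurrence_probability(suit, alpha=ALPHA, beta=BETA):
    """Increasing logistic curve: P = 0.5 when suitability equals beta."""
    return 1.0 / (1.0 + math.exp(-(suit - beta) / alpha))


def to_presence_absence(suitabilities, seed=42):
    """One Bernoulli draw per cell: 1 = presence, 0 = absence."""
    rng = random.Random(seed)
    return [1 if rng.random() < occurrence_probability(s) else 0
            for s in suitabilities]


for s in (0.1, 0.3, 0.5):
    print(s, round(occurrence_probability(s), 3))
```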
Simulating positional error in species occurrences
It is generally assumed that the magnitude of the positional error in species occurrences varies depending on the source of the error. The positional error associated with GNSS points (e.g. species occurrences) may range from a few centimetres up to several metres. Furthermore, for some species such as birds or large predators, it is usually impossible to record their accurate position, and such data are shifted by tens or hundreds of meters. An even greater shift is sometimes observed in museum databases. Therefore, to cover the range of possible magnitudes of positional error, we simulated positional error by shifting the sampled locations (i.e. presences and, in the case of GLM, also absences) in a random direction according to six scenarios that corresponded to different distances, ranging from 5–10 m up to 100–500 m. The error in the focal virtual species locations was 5–10 m for scenario S1, 10–15 m for S2, 15–20 m for S3, 20–50 m for S4, 50–100 m for S5 and 100–500 m for S6 (Supplementary material Appendix 4 Table C1). Scenarios S1–S4 simulated realistic degrees of error when using modern monitoring technologies like GNSS, while scenarios S5–S6 simulated more extreme positional errors that could be associated with species observations recorded without GNSS, species that are difficult to pinpoint such as birds or large predators, or occurrences from museum databases. If shifting an original data point resulted in the point falling outside the study area, we recalculated the shift until the new coordinates were located within the boundaries of the study area. We provide a script of how we simulated virtual species and shifted occurrences in Supplementary material Appendix 2.
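The shifting step can be sketched as drawing a uniform random direction and a distance within the scenario's bounds, and redrawing whenever the shifted point falls outside the study area; for GLM, absences would be shifted the same way. Two assumptions are made here that the text does not specify: the distance is drawn uniformly between the bounds, and the study area is a simple rectangle (the real KRNAP boundary is irregular). `shift_point` is a hypothetical name, not a function from the authors' script.

```python
import math
import random


def shift_point(x, y, min_d, max_d, bounds, rng, max_tries=1000):
    """Displace (x, y) by a random distance in [min_d, max_d] along a random bearing.

    bounds = (xmin, ymin, xmax, ymax) is a hypothetical rectangular study area;
    shifts that land outside it are redrawn, as described in the text.
    """
    xmin, ymin, xmax, ymax = bounds
    for _ in range(max_tries):
        angle = rng.uniform(0.0, 2.0 * math.pi)  # uniform direction
        dist = rng.uniform(min_d, max_d)          # assumed uniform distance
        nx, ny = x + dist * math.cos(angle), y + dist * math.sin(angle)
        if xmin <= nx <= xmax and ymin <= ny <= ymax:
            return nx, ny
    raise RuntimeError("no valid shift found; is the study area smaller than the shift?")


rng = random.Random(0)
area = (0.0, 0.0, 10_000.0, 10_000.0)
occurrences = [(50.0, 50.0), (5_000.0, 5_000.0), (9_990.0, 120.0)]
shifted_s6 = [shift_point(x, y, 100, 500, area, rng) for x, y in occurrences]  # S6
```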
Model fitting and evaluation
We selected generalized linear models (GLM; Nelder and Baker 1972, Oksanen and Minchin 2002) as a presence/absence method and MaxEnt (Phillips et al. 2006) as a presence-background method; both are often adopted in ecological studies (Moudrý and Šímová 2013, Linda et al. 2016, Malavasi et al. 2018, Gábor et al. 2019, Watts et al. 2019). In addition, Graham et al. (2008) showed that these two approaches were among the better performing modelling techniques when the data were affected by positional errors. Models were built in the statistical software R using the 'dismo' (ver. 1.1.4) and 'glm2' (ver. 1.2.1) packages. The GLM was run with a logit link function and a binomial distribution. Quadratic terms of the three environmental variables were included because the response functions were known to follow normal distribution curves. To enable the comparison of individual SDMs, we kept the MaxEnt parameters unchanged, as done in many prior studies (Franklin et al. 2014, Fourcade et al. 2014, Holloway et al. 2016, Ranc et al. 2016, Tingley et al. 2018, Ye et al. 2018). The default settings established by Phillips et al. (2009) were used, with randomly drawn background data generated from the binary map of the true occurrences of the virtual species. The same three environmental variables (DTM, CHM and TWI) used to generate the virtual species were used in the SDMs. Fivefold cross-validation, in which the data were randomly divided into fifths, was used to evaluate the models: four fifths of the data were used to train the model and the remaining fifth was used to assess its performance. Control models without positional error were calculated for all three species with different niche breadth, prevalence and ROA and for both modelling techniques, allowing an easy comparison of the effect of positional error on model performance.
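Fivefold cross-validation as described can be sketched as a random partition of record indices into five folds, each used once as the test fifth. This version does not stratify presences and absences, a detail the text does not specify either way; the helper names are hypothetical.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Randomly partition record indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_test_splits(n, k=5, seed=0):
    """Yield (train, test) index lists; each fold serves once as the test set."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Smallest GLM dataset in the study: 30 presences + 60 absences = 90 records.
for train, test in train_test_splits(90):
    print(len(train), len(test))
```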
The area under the receiver operating characteristic curve (AUC) (Fielding and Bell 1997, Jiménez-Valverde 2012) and the true skill statistic (TSS) (Allouche et al. 2006) were used to assess model performance (i.e. discrimination accuracy). AUC is widely used in ecological studies as a single threshold-independent measure of model performance (Václavík and Meentemeyer 2012, Mitchell et al. 2017). AUC ranges from 0 to 1, where a score of 1 indicates perfect discrimination, a score of 0.5 indicates random performance and values lower than 0.5 indicate worse than random performance. TSS is a frequently used threshold-dependent metric (Cianfrani et al. 2018, Eaton et al. 2018) that takes both omission and commission errors into account. It ranges from −1 to +1, where +1 indicates perfect agreement and values of zero or less indicate performance no better than random (Allouche et al. 2006).
To quantify differences between the true probability of occurrence of the virtual species and the distribution predicted by the models in geographical space, their niche overlap was compared using the I measure (Warren et al. 2008, Rödder and Engler 2011) and Spearman's rank correlation. I ranges between 0 (no overlap) and 1 (perfect overlap). Following Rödder and Engler (2011), we used the following classes to interpret the results: no or very limited overlap (0–0.2), low overlap (0.2–0.4), moderate overlap (0.4–0.6), high overlap (0.6–0.8) and very high overlap (0.8–1.0). Spearman's rank correlation ranges between −1 and +1, where −1 indicates that the compared species responses to the environment are perfectly negatively correlated (opposite) and +1 indicates perfectly positively correlated (identical) responses. The closer the values are to zero, the lower the niche overlap.
The magnitude of the negative effect of positional error on SDMs depends on the size of the positional error and on the distribution of the species' suitable environment in geographical space (Naimi et al. 2011). Occurrences are shifted in geographical space, yet even a relatively small positional error in geographical space can have a profound effect on niche estimates in environmental space, and vice versa. Furthermore, we expected this effect to be related to species niche breadth. Therefore, we were also interested in how positional error manifests in environmental space, and we measured niche overlap in environmental space as well. We used I and Spearman's rank correlation as implemented in ENMTools 0.2 (Warren et al. 2019a, b) to estimate overlap in environmental space between models fitted with accurate occurrences without any positional error (hereafter unaltered models) and models fitted with shifted occurrences (i.e. scenarios S1–S6).
We ran the entire process from species generation to model evaluation 30 times (Fig. 1). In addition, we used analysis of variance (ANOVA) to assess the strength of the individual effects of positional error, sample size, ROA and modelling technique, including all possible interactions. We compared the relative importance of individual predictors based on their contribution to the overall explained variation (R²). Instead of formal testing, we plotted the effects (and their confidence intervals) of all predictor combinations and evaluated them qualitatively. Because both AUC and TSS values were highly heteroscedastic (e.g. the ratio between the maximum and minimum standard deviation across all factor combinations was 22 for AUC and 19 for TSS), we used the robust variance–covariance matrix estimator suggested by MacKinnon and White (1985) to compute confidence intervals. This was done using the R package 'sandwich' (Zeileis 2006).
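The text does not state which MacKinnon and White (1985) estimator was used. Assuming it was HC3, the default type of `vcovHC` in the R package 'sandwich', the robust covariance and approximate 95% confidence intervals for an ordinary least-squares fit can be sketched with NumPy as follows. The toy design (an intercept plus one two-level factor with unequal spread per level) is hypothetical and stands in for the full factorial ANOVA model.

```python
import numpy as np


def ols_hc3(X, y):
    """OLS coefficients and HC3 heteroscedasticity-consistent covariance matrix.

    HC3 weights each squared residual by 1 / (1 - h_i)**2, h_i being the leverage.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    leverage = np.einsum("ij,jk,ik->i", X, bread, X)  # diagonal of the hat matrix
    omega = resid ** 2 / (1.0 - leverage) ** 2
    meat = X.T @ (X * omega[:, None])
    return beta, bread @ meat @ bread


def confidence_intervals(beta, vcov, z=1.96):
    """Approximate 95% Wald intervals (lower, upper) from a covariance matrix."""
    se = np.sqrt(np.diag(vcov))
    return np.column_stack([beta - z * se, beta + z * se])


# Toy design (hypothetical): intercept + one two-level factor, unequal spread per level.
X = np.column_stack([np.ones(6), [0, 0, 0, 1, 1, 1]])
y = np.array([1.0, 2.0, 3.0, 2.0, 4.0, 9.0])
beta, vcov = ols_hc3(X, y)
print(beta, confidence_intervals(beta, vcov))
```

HC3 inflates the contribution of high-leverage observations, which keeps intervals from being overly narrow when residual variance differs strongly between factor combinations.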
Results
Unaltered models
Both performance metrics (AUC and TSS) largely followed the same pattern and indicated excellent model performance for all species, i.e. specialist, intermediate and generalist (AUC ranged from 0.91 to 0.97 for MaxEnt models and from 0.80 to 0.85 for GLM models), the only exception being the MaxEnt models for the generalist species, for which AUC indicated only good performance (mean AUC 0.73). MaxEnt models were more successful in modelling specialist and intermediate species, while GLM models were more accurate for the generalist species (Fig. 2).
Models achieved high or very high niche overlap in geographical space according to both I and Spearman's rank correlation. In general, niche overlap decreased in the following order: generalist, specialist and intermediate species, except for Spearman's rank correlation for specialists modelled by MaxEnt, which was very high. Comparison of modelling techniques showed that MaxEnt models achieved higher niche overlap than GLMs for all species, with the most obvious differences for specialist species. Increasing the sample size of unaltered models led to no or negligible increase in niche overlap (Fig. 3).

Figure 1. General modelling process. (i) We first acquired and processed LiDAR data and selected three fine-scale environmental predictors: DTM, CHM and TWI. (ii) We simulated virtual species with different niche breadths (ROA) by defining their response to environmental gradients for each environmental variable. (iii) We multiplied those responses to generate environmental suitability (the 'true' distribution of the virtual species). (iv) We translated the probability of species occurrence into a presence–absence raster. (v) We sampled occurrences based on the presence–absence raster. (vi) We simulated positional error in species occurrences. (vii) We generated SDMs with accurate as well as shifted occurrences, evaluated their performance (AUC, TSS) and assessed niche overlap (I, Spearman's rank correlation) in geographical and environmental space.

Figure 2. Resulting AUC (A) and TSS (B) scores according to species niche breadth (specialist, intermediate, generalist), positional error (S0, unaltered models; S1, 5–10 m; S2, 10–15 m; S3, 15–20 m; S4, 20–50 m; S5, 50–100 m; S6, 100–500 m) and sample size (number of presences = 30, 100, 500, 1000; note that for GLM models twice as many absences as presences were generated). Black shows results for GLM models and grey shows results for MaxEnt models.
Effect of positional error on models of species with different niche breadth
Results show, independently of the modelling technique, a clear trend of positional error worsening model performance (both AUC and TSS). The largest drop is evident between unaltered models and models affected by the smallest simulated positional error (5–10 m). Increasing the positional error further led to an additional decrease in model performance; however, this decrease was minimal (positional error 10–50 m). Even the extreme cases of positional error (50–100 and 100–500 m) led to a relatively small decrease in model performance compared with the drop caused by the 5–10 m error. For example, for MaxEnt models of the intermediate species, AUC dropped on average from 0.91 (unaltered models) to 0.79 for a positional error of the magnitude inherent to any occurrence data (i.e. up to 10 m), and to 0.71 for the extreme positional error (100–500 m) (Fig. 2). Nevertheless, the magnitude of the negative effect of positional error varied according to species niche breadth. For both GLM and MaxEnt models, the drop between unaltered models and the smallest simulated positional error (5–10 m) was larger for specialist and intermediate species (AUC dropped on average by about 0.12) than for generalist species (AUC dropped on average by about 0.05).
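The two performance metrics can be illustrated with a short NumPy sketch. AUC is computed here in its rank (Mann–Whitney) form, i.e. the probability that a randomly chosen presence receives a higher predicted value than a randomly chosen absence, with ties counted as one half; TSS is sensitivity + specificity - 1 (Allouche et al. 2006). The threshold used for TSS in the study is not stated in this section, so maximising TSS over all observed prediction values is only one common option and is an assumption here, as are the function names.

```python
import numpy as np


def auc(scores_presence, scores_absence):
    """Share of presence-absence pairs in which the presence scores higher (ties count 0.5)."""
    p = np.asarray(scores_presence, dtype=float)[:, None]
    a = np.asarray(scores_absence, dtype=float)[None, :]
    return float(np.mean((p > a) + 0.5 * (p == a)))


def tss(scores_presence, scores_absence, threshold):
    """True skill statistic at one threshold: sensitivity + specificity - 1."""
    sensitivity = np.mean(np.asarray(scores_presence) >= threshold)
    specificity = np.mean(np.asarray(scores_absence) < threshold)
    return float(sensitivity + specificity - 1.0)


def max_tss(scores_presence, scores_absence):
    """TSS at the threshold (among observed predictions) that maximises it (assumed choice)."""
    thresholds = np.unique(np.concatenate([scores_presence, scores_absence]))
    return max(tss(scores_presence, scores_absence, t) for t in thresholds)


# Hypothetical predictions at three presences and three absences.
presence = [0.9, 0.8, 0.4]
absence = [0.7, 0.3, 0.2]
print(auc(presence, absence))  # 8 of 9 pairs correctly ordered, i.e. 8/9
print(tss(presence, absence, 0.5), max_tss(presence, absence))
```

A positional error shifts presences into cells with lower predicted suitability, which shows up directly as fewer correctly ordered pairs in the AUC and lower sensitivity in the TSS.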
The results showed that positional error in the occurrence data reduced the niche overlap in both the geographical and environmental space for both GLM and MaxEnt models. Niche overlap decreased gradually with increasing positional error, with a particularly pronounced decrease at the extreme positional error (100–500 m) (Fig. 3, 4). However, the effect of positional error on niche overlap varied with species’ niche breadth. The decrease in niche overlap was larger for specialist and intermediate species than for generalist species, especially in the geographical space. For example, for MaxEnt models, Spearman’s rank correlation was reduced from 0.98 to 0.58 for the specialist and from 0.83 to 0.70 for the generalist species (Fig. 3). The effect of positional error was less evident in I, especially for the generalist species in geographical space: for the generalist species and MaxEnt models, I decreased on average only from 0.96 to 0.90, and GLM models appeared to be unaffected.
Finally, independently of the validation metric, the results showed that increasing the sample size cannot compensate for the effect of positional error (Fig. 2–4). On the contrary, the combination of a low sample size (30 presences) with positional error led to erratic behaviour and generally low model performance.
Comparison of the relative importance of individual predictors (R²)
The results show that positional error and modelling technique had the highest relative importance (R²) for model performance (AUC, TSS). The relative importance of sample size and niche breadth was much smaller and mutually comparable (Table 2). For niche overlap in geographical space assessed by I (model predictions), niche breadth had the greatest effect, followed by positional error, modelling technique and sample size, the importance of which was almost negligible. In contrast, for the correlations, modelling technique and positional error had the highest relative importance (R²), followed by niche breadth and sample size, the importance of which was minimal. For niche overlap in the environmental space, modelling technique and positional error showed the highest contribution, followed by niche breadth and sample size, the importance of which was again almost negligible. All these factors significantly affected SDM performance and predictions (p < 0.05).
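The relative importance values in Table 2 come from an ANOVA whose exact specification is not reproduced here. As a simplified, hypothetical illustration of the idea, the sketch below computes, for a balanced factorial design, the share of the total sum of squares attributable to each factor's main effect (eta-squared, in %). In a balanced design these main-effect sums of squares are additive and the remainder corresponds to interactions and residual variation; unbalanced designs or heteroskedastic errors would require a full ANOVA model instead.

```python
import itertools

import numpy as np


def relative_importance(y, factors):
    """Percent of the total sum of squares explained by each factor's main effect (eta-squared x 100).

    y       -- 1-D array, one metric value (e.g. AUC) per model run
    factors -- dict: factor name -> 1-D array of level labels aligned with y
    Assumes a balanced design, where main-effect sums of squares do not overlap.
    """
    y = np.asarray(y, dtype=float)
    grand_mean = y.mean()
    ss_total = np.sum((y - grand_mean) ** 2)
    result = {}
    for name, levels in factors.items():
        levels = np.asarray(levels)
        # between-level sum of squares: n_level * (level mean - grand mean)^2, summed over levels
        ss_factor = sum(
            np.sum(levels == lev) * (y[levels == lev].mean() - grand_mean) ** 2
            for lev in np.unique(levels)
        )
        result[name] = float(100.0 * ss_factor / ss_total)
    return result


# Hypothetical fully crossed 2 x 2 design, one run per cell, no interaction and no noise.
design = list(itertools.product(["GLM", "MaxEnt"], ["S0", "S1"]))
technique = np.array([t for t, _ in design])
error = np.array([e for _, e in design])
auc_values = np.array([0.0, 1.0, 3.0, 4.0])  # toy response: technique effect 3, error effect 1
print(relative_importance(auc_values, {"technique": technique, "error": error}))
```

In this noise-free toy example the technique explains 90 % and the error 10 % of the total sum of squares; with real model runs the percentages sum to less than 100, as in Table 2.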
Discussion
In this study, we focused on the effect of positional error in species occurrences on fine-scale SDMs. We simulated species with different levels of niche breadth to assess whether there is a link between the width of the environmental niche and the effect of the size of positional error. Our results showed that introducing positional error into species occurrence data led to a decrease in model performance and prediction accuracy in both the geographical and environmental space. However, the effect of positional error varied with species niche breadth. The same positional error had a greater impact on specialist species (low ROA and prevalence, narrow niche breadth) than on generalist species (high ROA and prevalence, wide niche breadth). This is likely because, in the case of specialist species, shifted occurrences can easily fall into unsuitable environments outside the species’ environmental niche. This could also explain the inconsistent conclusions of previous studies (Graham et al. 2008, Fernandez et al. 2009).
Higher sample sizes slightly improved the accuracy of unaltered models; however, the results showed that increasing the sample size could not compensate for the effect of positional error on model accuracy (Fig. 2–4). On the other hand, low sample sizes of positionally inaccurate data were especially problematic for modelling. These results are in general agreement with the study by Mitchell et al. (2017), who investigated the influence of sample size (ranging from 100 to 400 samples) in conjunction with positional error; their results showed that models based on smaller sample sizes were more affected by positional error than those with higher numbers of species occurrences. However, it is difficult to conclude whether 100 records with a positional error of 10 m are better or worse for modelling at a 5 m resolution than 500 records with a positional error of 25 m. Previous work offers only partial guidance: Moudrý and Šímová (2012) suggested that the spatial resolution of the environmental data should be coarser than the largest positional error of the occurrence data, and Naimi et al. (2011) showed that the effect of positional error is reduced by spatial autocorrelation in environmental variables. However, the trade-off between scale and positional error has not been thoroughly studied.
Figure 3. Resulting I (A) and Spearman’s rank correlation (B) scores of niche overlap in geographical space according to different species niche breadth (specialist, intermediate, generalist), positional error (S0, unaltered models; S1, 5–10 m; S2, 10–15 m; S3, 15–20 m; S4, 20–50 m; S5, 50–100 m; S6, 100–500 m) and sample size (number of presences = 30, 100, 500, 1000; note that for GLM models twice as many absences as presences were generated). Black shows results for GLM models and grey shows results for MaxEnt models.
Figure 4. Resulting I (A) and Spearman’s rank correlation (B) scores of niche overlap in the environmental space according to different species niche breadth (specialist, intermediate, generalist), positional error and sample size (number of presences = 30, 100, 500, 1000; note that for GLM models, twice as many absences as presences were generated). Also note that here we show the niche overlap between unaltered models and models affected by a specified positional error (and not a comparison with the simulated probability of occurrence as in Fig. 3). Thus, for example, S1 shows the niche overlap between unaltered models and models affected by positional error in the range of 5–10 m. Black shows results for GLM models and grey shows results for MaxEnt models.
The degree of decrease between unaltered and altered models (i.e. those with positional error) differed among the adopted validation metrics; assuming a sufficiently large sample size, AUC and TSS provided clear evidence of decreasing model quality. The ability of evaluation metrics to identify the magnitude of error caused by positional inaccuracies was previously discussed by Osborne and Leitão (2009). Interestingly, they found that the use of AUC for error quantification in models affected by positional error was limited, as AUC did not decrease compared with the control models. We hypothesize that this contradiction results from confounding effects of the real data used in their study (i.e. they did not use virtual species). In Osborne and Leitão (2009), the modelling algorithms were allowed to choose the best combination of environmental variables from a set of twelve variables for scenarios with different levels of positional error. Indeed, they showed that positional error altered the variables selected by the modelling algorithm. The selected variables, however, often failed to represent the conditions pertinent to the species during habitat selection. In contrast, here we used the same variables throughout, both to generate the virtual species and to model their distribution. Hence, our modelling approaches (GLM, MaxEnt) could not select variables that would provide a closer fit to the altered occurrence data while lacking ecological relevance, and as a result positional error did not lead to a spurious increase in AUC and TSS values. We suggest that the effect of positional error on the selection of environmental variables should be further investigated.
e eects discussed above raise serious concerns as it is
possible that the use of positionally inaccurate data com-
bined with an arbitrary selection of environmental variables
that may lack ecological relevance results in seemingly accu-
rate but entirely wrong models. For instance, Fourcadeetal.
(2018) successfully tted SDMs with non-ecological vari-
ables such as paintings to demonstrate this point. While
Osborne and Leitão (2009) and Mitchellet al. (2017) sug-
gested that useful predictions can still be generated from data
aected by positional error, they warned that the ecological
interpretation of such data and predictions was dangerous.
Our results support the importance of assessing data in terms
of tness-for-use (Lecours 2017). Fitness-for-use is the con-
cept of determining whether or not a dataset is of sucient
quality for a particular purpose (Goodchild 2006). Spatial
scale is intrinsically linked to such assessment of tness-for-
use (Lecoursetal. 2017) as data accuracy is dependent on
the spatial resolution of the environmental data. As indicated
by Moudrý and Šímová (2012), the spatial resolution of the
environmental data should always be coarser than the largest
positional error associated with occurrence data.
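To make this rule of thumb concrete, the following Monte Carlo sketch estimates how often a displaced occurrence still falls in the raster cell it truly belongs to. It assumes square cells, true locations uniformly distributed within a cell, and displacements of uniform length and uniformly random direction; the function name is hypothetical. With 5 m cells, a 5–10 m error almost always moves the record into a different cell, whereas with 1 km cells the same error rarely does.

```python
import numpy as np


def share_staying_in_cell(cell_size, min_err, max_err, n=200_000, seed=0):
    """Monte Carlo share of occurrences that remain in their true square cell after displacement."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, cell_size, n)  # true position within one cell
    y = rng.uniform(0.0, cell_size, n)
    dist = rng.uniform(min_err, max_err, n)  # error magnitude
    bearing = rng.uniform(0.0, 2.0 * np.pi, n)  # error direction
    xs = x + dist * np.cos(bearing)
    ys = y + dist * np.sin(bearing)
    inside = (xs >= 0.0) & (xs < cell_size) & (ys >= 0.0) & (ys < cell_size)
    return float(inside.mean())


for cell in (5.0, 25.0, 100.0, 1000.0):
    print(f"{cell:>6.0f} m cells, 5-10 m error: {share_staying_in_cell(cell, 5.0, 10.0):.3f}")
```

The share of records matched to the correct cell is a crude proxy for fitness-for-use at a given resolution; it ignores spatial autocorrelation, which can make a wrong cell environmentally similar to the right one (Naimi et al. 2011).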
In line with previous work (Van Niel and Austin 2007, Rocchini et al. 2011, Lecours et al. 2017), we believe that attempts to predict species distributions with data of unknown accuracy are potentially dangerous, and we therefore highlight the necessity of quantifying the positional accuracy of data. If such an assessment is limited by metadata availability, for example in the case of historical data, we recommend at least approximating the positional accuracy from known information such as the collection methodology or the number of decimals recorded with the coordinates. With a proper fitness-for-use assessment that includes data quality and scale, the resolution of environmental variables can be coarsened before they are integrated into a modelling exercise to minimize the adverse effects of the positional error of species occurrences. However, we are aware that this may involve coarsening the data beyond the potentially optimal resolution(s), i.e. the scale at which species respond to the environment (Lecours et al. 2015, Moudrý et al. 2019).
As demonstrated in Lecourset al. (2017), there is a trade-
o between spatial scale and data quality that needs to be
evaluated as a part of the tness-for-use assessment. While
no experiments are currently available to help quantify which
is more important for successful modelling (whether it is the
data quality or scale), we suggest that pre-analyses be per-
formed to test whether keeping a ner resolution is more
important than minimizing positional error, or vice-versa. For
new surveys, we suggest paying a close attention to measure-
ment techniques to minimize positional error, for instance by
using dierential GNSS, especially for species with a narrow
ecological niche as our results show that the positional error
of species occurrence data has a profound eect on results of
SDMs. Finally, we advocate for additional studies focused on
the inuence of positional error using more complex virtual
species (e.g. with a higher number of environmental variables
or with more complex response curves) to improve SDM use
in ecology, macroecology and biogeography.
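A pre-analysis of the kind suggested above needs environmental rasters at several resolutions. A minimal sketch of one way to produce them is block averaging of a fine-resolution grid by an integer factor (e.g. 5 m to 25 m with factor 5) using NumPy reshaping; the function name is hypothetical. Mean aggregation is only one option: for terrain-derived attributes, whether they are derived before or after rescaling can change the result (Moudrý et al. 2019), so the rescaling strategy is itself part of the pre-analysis.

```python
import numpy as np


def coarsen(raster, factor):
    """Average non-overlapping factor x factor blocks of a 2-D raster.

    Trailing rows/columns that do not fill a complete block are dropped;
    NaN cells propagate to the mean of their block.
    """
    r = np.asarray(raster, dtype=float)
    n_rows = (r.shape[0] // factor) * factor
    n_cols = (r.shape[1] // factor) * factor
    r = r[:n_rows, :n_cols]
    # axes: (block row, row within block, block column, column within block)
    blocks = r.reshape(n_rows // factor, factor, n_cols // factor, factor)
    return blocks.mean(axis=(1, 3))


# Hypothetical 5 m elevation grid coarsened to 25 m and 50 m for a resolution-sensitivity pre-analysis.
dem_5m = np.random.default_rng(1).normal(1000.0, 50.0, size=(400, 400))
for factor in (5, 10):
    print(factor * 5, "m:", coarsen(dem_5m, factor).shape)
```

Models fitted with the same occurrences at each resolution can then be compared to see whether gains from a finer grain outweigh the losses caused by positional error.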
Table 2. Comparison of the relative importance of individual factors (R², %) for ANOVA of performance metrics (AUC, TSS) and niche overlap in the geographical and environmental spaces (I, correlation).

| Factor | AUC | TSS | I, geographical space | Correlation, geographical space | I, environmental space | Correlation, environmental space |
|---|---|---|---|---|---|---|
| ROA | 4 | 4.14 | 75 | 11.2 | 9.7 | 1.7 |
| Sample size | 1.1 | 1.78 | 0.1 | 1 | 0.2 | 0.4 |
| Modelling technique | 18.7 | 21.35 | 8 | 24.7 | 45.4 | 21.5 |
| Positional error | 25.4 | 24.58 | 8.4 | 27.5 | 13.2 | 18.3 |
Conclusions
In this study, we explored how positional error in species occurrences affects fine-scale SDMs. We showed that the influence of positional error on SDMs differed according to the width of species’ ecological niches, and this effect was evident in both geographical and environmental space. The effect of positional error on generalist species was much smaller than the effect on specialist species, which were affected the most. In addition, our results show that the negative effects of positionally inaccurate data entering SDMs cannot be mitigated by increasing the sample size. Therefore, a take-away message of our study is that improving the positional accuracy of data appears to be more effective than increasing sample size. We suggest that it is critical to evaluate the quality of data with respect to the spatial resolution of the environmental variables and to select occurrences with a low positional error (note that a positional error of even 1 km can count as low if the spatial resolution of the environmental variables is of similar size). Future research should focus on the influence of positional error using more complex virtual species (e.g. with a higher number of environmental variables or with more complex response curves) and on how positional errors may affect the selection of variables in modelling species distributions, to improve the future application of SDMs in ecology, macroecology and biogeography.
Data availability statement
Using our methods, species occurrence data may be artificially generated using the virtualspecies package in R. The LiDAR data are owned by Krkonose Mountains National Park and are available upon request for research purposes.
Acknowledgements – The authors would like to thank the Krkonose Mountains National Park for providing LiDAR data. We greatly appreciate the contribution of the subject editor and both reviewers.
Funding – This research was funded by the Internal Grant Agency of Faculty of Environmental Sciences, Czech Univ. of Life Sciences Prague, grant no. 20174241 and no. 20194224. VM, VB and MF were also supported by the Czech Science Foundation (project no. 17-17156Y).
Author contributions – All authors contributed substantially to the work. VM and TV are authors of the main idea of the research and supervised the whole research. LG and VB performed all GIS and statistical analyses. VB supervised statistical analyses. MF processed LiDAR data. LG wrote the first draft of the manuscript. VL, MM, PŠ and DR helped to improve the manuscript. All authors gave final approval for publication.
References
Allouche, O.etal. 2006. Assessing the accuracy of species distribu-
tion models: prevalence, kappa and the true skill statistic (TSS).
– J. Appl. Ecol. 43: 1223–1232.
Araújo, M. B. et al. 2019. Standards for distribution models in
biodiversity assessments. – Sci. Adv. 5: eaat4858.
Arribas, P.etal. 2012. Dispersal ability rather than ecological toler-
ance drives dierences in range size between lentic and lotic
water beetles (Coleoptera: Hydrophilidae). – J. Biogeogr. 39:
984–994.
BarbetMassin, M.etal. 2012. Selecting pseudoabsences for spe-
cies distribution models: how, where and how many? – Methods
Ecol. Evol. 3: 327–338.
Besnard, A. G.etal. 2013. Topographic wetness index predicts the
occurrence of bird species in oodplains. – Divers. Distrib. 19:
955–963.
Beven, K. J. and Kirkby, M. J. 1979. A physically based, variable
contributing area model of basin hydrology. – Hydrol. Sci. J.
24: 43–69.
Boulangeat, I. et al. 2012. Niche breadth, rarity and ecological
characteristics within a regional ora spanning large environ-
mental gradients. – J. Biogeogr. 39: 204–214.
Breiner, F. T.etal. 2015. Overcoming limitations of modelling rare
species by using ensembles of small models. – Methods Ecol.
Evol. 6: 1210–1218.
Brown, J. H. 1984. On the relationship between abundance and
distribution of species. – Am. Nat. 124: 255–279.
Bulluck, L.et al. 2006. Spatial and temporal variations in species
occurrence rate aect the accuracy of occurrence models.
– Global Ecol. Biogeogr. 15: 27–38.
Chefaoui, R. M.etal. 2011. Eects of species’ traits and data char-
acteristics on distribution models of threatened invertebrates.
– Anim. Biodivers. Conserv. 34: 229–247.
Cianfrani, C.etal. 2018. More than range exposure: global otter
vulnerability to climate change. – Biol. Conserv. 221: 103–113.
Connor, T.etal. 2018. Eects of grain size and niche breadth on
species distribution modeling. – Ecography 41: 1270–1282.
Conrad, O. 2003. Module topographic wetness index (SAGA).
– Version 2.1.3.
Coops, N. C.etal. 2010. Assessing the utility of LiDAR remote
sensing technology to identify mule deer winter habitat. – Can.
J. Remote Sens. 36: 81–88.
Davies, A. B. and Asner, G. P. 2014. Advances in animal ecology
from 3D-LiDAR ecosystem mapping. – Trends Ecol. Evol. 29:
681–691.
Devictor, V.et al. 2010. Dening and measuring ecological spe-
cialization. – J. Appl. Ecol. 47: 15–25.
Eaton, S.etal. 2018. Adding small species to the big picture: spe-
cies distribution modelling in an age of landscape scale conser-
vation. – Biol. Conserv. 217: 251–258.
Evangelista, P. H.etal. 2008. Modelling invasion for a habitat gen-
eralist and a specialist plant species. – Divers. Distrib. 14:
808–817.
Fernandes, R. F.etal. 2018. How much should one sample to accu-
rately predict the distribution of species assemblages? A virtual
community approach. – Ecol. Inform. 48: 125–134.
Fernandez, M.etal. 2009. Locality uncertainty and the dierential
performance of four common niche-based modeling tech-
niques. – Biodivers. Inform. 6: 36–52.
Ferrier, S.etal. 2017. Biodiversity modelling as part of an observa-
tion system. – In: Walters, M. and Scholers, R. (eds), e GEO
handbook on biodiversity observation networks. Springer, pp.
239–257.
Fielding, A. H. and Bell, J. F. 1997. A review of methods for the
assessment of prediction errors in conservation presence/absence
models. – Environ. Conserv. 24: 38–49.
Fourcade, Y.etal. 2014. Mapping species distributions with MAX-
ENT using a geographically biased sample of presence data: a
268
performance assessment of methods for correcting sampling
bias. – PLoS One 9: e97122.
Fourcade, Y.etal. 2018. Paintings predict the distribution of spe-
cies, or the challenge of selecting environmental predictors and
evaluation statistics. – Global Ecol. Biogeogr. 27: 245–256.
Frair, J. L. etal. 2010. Resolving issues of imprecise and habitat-
biased locations in ecological analyses using GPS telemetry
data. – Phil. Trans. R. Soc. B 365: 2187–2200.
Franklin, J. 2010. Mapping species distributions: spatial inference
and prediction. – Cambridge Univ. Press.
Franklin, J.etal. 2014. Linking spatially explicit species distribu-
tion and population models to plan for the persistence of plant
species under global change. – Environ. Conserv. 41: 97–109.
Futuyma, D. J. and Moreno, G. 1988. e evolution of ecological
specialisation. – Annu. Rev. Ecol. Syst. 207–233.
Gábor, L.etal. 2019. How do species and data characteristics aect
species distribution models and when to use environmental l-
tering? – Int. J. Geogr. Inform. Sci. doi:
10.1080/13658816.2019.1615070
Gaston, K. J.etal. 1997. Interspecic abundance range size rela-
tionships: an appraisal of mechanisms. – J. Anim. Ecol. 66:
579–601.
Goodchild, M. F. 2006. Fundamentals of spatial data quality.
– ISTE, London.
Gottschalk, T. K. et al. 2011. Inuence of grain size on species–
habitat models. – Ecol. Model. 222: 3403–3412.
Graham, C. H.etal. 2008. e inuence of spatial errors in species
occurrence data used in distribution models. – J. Appl. Ecol.
45: 239–247.
Hijmans, R. J. 2012. Crossvalidation of species distribution mod-
els: removing spatial sorting bias and calibration with a null
model. – Ecology 93: 679–688.
Holloway, P.etal. 2016. Incorporating movement in species distri-
bution models: how do simulations of dispersal aect the accu-
racy and uncertainty of projections? – Int. J. Geogr. Inform.
Sci. 30: 2050–2074.
JiménezValverde, A. 2012. Insights into the area under the receiver
operating characteristic curve (AUC) as a discrimination meas-
ure in species distribution modelling. – Global Ecol. Biogeogr.
21: 498–507.
Johnson, C. J. and Gillingham, M. P. 2008. Sensitivity of species-
distribution models to error, bias and model design: an applica-
tion to resource selection functions for woodland caribou.
– Ecol. Model. 213: 143–155.
Khosravipour, A.etal. 2016. Generating spike-free digital surface
models using LiDAR raw point clouds: a new approach for
forestry applications. – Int. J. Appl. Earth Observ. Geoinform.
52: 104–114.
Kopecký, M. and Čížková, Š. 2010. Using topographic wetness
index in vegetation ecology: does the algorithm matter? – Appl.
Veg. Sci. 13: 450–459.
Lecours, V. 2017. On the use of maps and models in conservation
and resource management (warning: results may vary). – Front.
Mar. Sci. 4: 1–18.
Lecours, V. et al. 2015. Spatial scale and geographic context in
benthic habitat mapping: review and future directions. – Mar.
Ecol. Progr. Ser. 535: 259–284.
Lecours, V.etal. 2017. Artefacts in marine digital terrain models:
a multiscale analysis of their impact on the derivation of terrain
attributes. – IEEE Trans. Geosci. Remote Sens. 55: 5391–5406.
Lefsky, M. A. et al. 2002. LiDAR remote sensing for ecosystem
studies: LiDAR, an emerging remote sensing technology that
directly measures the three-dimensional distribution of plant
canopies, can accurately estimate vegetation structural attrib-
utes and should be of particular interest to forest, landscape and
global ecologists. – BioScience 52: 19–30.
Leroy, B.etal. 2016. virtualspecies, an R package to generate virtual
species distributions. – Ecography 39: 599–607.
Leroy, B.etal. 2018. Without quality presence–absence data, dis-
crimination metrics such as TSS can be misleading measures of
model performance. – J. Biogeogr. 45: 1994–2002.
Linda, R. et al. 2016. Developing a criterion for distinguishing
tetraploid birch species from diploid and modelling their poten-
tial distribution on the Czech Republic. – In: Kacálek, D.etal.
(eds), Proceedings of central European silviculture, pp. 71–77.
Lobo, J. M. 2008. More complex distribution models or more rep-
resentative data? – Biodivers. Inform. 5: 14–19.
Luoto, M.etal. 2005. Uncertainty of bioclimate envelope models
based on the geographical distribution of species. – Global Ecol.
Biogeogr. 14: 575–584.
MacKinnon, J. G. and White, H. 1985. Some heteroskedasticity-
consistent covariance matrix estimators with improved nite
sample properties. – J. Economet. 29: 305–325.
Malavasi, M.etal. 2018. Plant invasions in Italy: an integrative
approach using the European LifeWatch infrastructure data-
base. – Ecol. Indic. 91: 182–188.
McPherson, J. M. and Jetz, W. 2007. Eects of species’ ecology
on the accuracy of distribution models. – Ecography 30:
135–151.
Meynard, C. N. and Kaplan, D. M. 2012. e eect of a gradual
response to the environment on species distribution modeling
performance. – Ecography 35: 499–509.
Meynard, C. N. and Kaplan, D. M. 2013. Using virtual species to
study species distributions and model performance. – J. Bioge-
ogr. 40: 1–8.
Meynard, C. N.etal. 2019. Testing methods in species distribution
modelling using virtual species: what have we learnt and what
are we missing? – Ecography doi: 10.1111/ecog.04385
Mitchell, P. J.etal. 2017. Sensitivity of nescale species distribu-
tion models to locational uncertainty in occurrence data across
multiple sample sizes. – Methods Ecol. Evol. 8: 12–21.
Moudrý, V. 2015. Modelling species distributions with simulated
virtual species. – J. Biogeogr. 42: 1365–1366.
Moudrý, V. and Šímová, P. 2012. Inuence of positional accuracy,
sample size and scale on modelling species distributions: a
review. – Int. J. Geogr. Inform. Sci. 26: 2083–2095.
Moudrý, V. and Šímová, P. 2013. Relative importance of climate,
topography and habitats for breeding wetland birds with dier-
ent latitudinal distributions in the Czech Republic. – Appl.
Geogr. 44: 165–171.
Moudrý, V.etal. 2017. Which breeding bird categories should we
use in models of species distribution? – Ecol. Indic. 74:
526–529.
Moudrý, V.etal. 2018. On the use of global DEMs in ecological
modelling and the accuracy of new bare-earth DEMs. – Ecol.
Model. 383: 3–9.
Moudrý, V.etal. 2019. Potential pitfalls in rescaling digital terrain
model-derived attributes for ecological studies. – Ecol. Inform.
54: 100987.
Naimi, B.etal. 2011. Spatial autocorrelation in predictors reduces
the impact of positional uncertainty in occurrence data on spe-
cies distribution modelling. – J. Biogeogr. 38: 1497–1509.
Nelder, J. A. and Baker, R. J. 1972. Generalized linear models.
– Wiley.
Oksanen, J. and Minchin, P. R. 2002. Continuum theory revisited: what shape are species responses along ecological gradients? – Ecol. Model. 157: 119–129.
Osborne, P. E. and Leitão, P. J. 2009. Effects of species and habitat positional errors on the performance and interpretation of species distribution models. – Divers. Distrib. 15: 671–681.
Peers, M. J. et al. 2012. Reconsidering the specialist–generalist paradigm in niche breadth dynamics: resource gradient selection by Canada lynx and bobcat. – PLoS One 7: e51488.
Phillips, S. J. et al. 2006. Maximum entropy modelling of species geographic distributions. – Ecol. Model. 190: 231–259.
Phillips, S. J. et al. 2009. Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. – Ecol. Appl. 19: 181–197.
Qiao, H. et al. 2015. Marble algorithm: a solution to estimating ecological niches from presence-only records. – Sci. Rep. 5: 14232.
Quinn, P. F. B. J. et al. 1991. The prediction of hillslope flow paths for distributed hydrological modelling using digital terrain models. – Hydrol. Process. 5: 59–79.
Ranc, N. et al. 2016. Performance tradeoffs in target-group bias correction for species distribution models. – Ecography 40: 1076–1087.
Rattray, A. et al. 2014. Quantification of spatial and thematic uncertainty in the application of underwater video for benthic habitat mapping. – Mar. Geodesy 37: 315–336.
Reif, J. et al. 2018. Competition-driven niche segregation on a landscape scale: evidence for escaping from syntopy towards allotopy in two coexisting sibling passerine species. – J. Anim. Ecol. 87: 774–789.
Rocchini, D. et al. 2011. Accounting for uncertainty when mapping species distributions: the need for maps of ignorance. – Progr. Phys. Geogr. 35: 211–226.
Rödder, D. and Engler, J. O. 2011. Quantitative metrics of overlaps in Grinnellian niches: advances and possible drawbacks. – Global Ecol. Biogeogr. 20: 915–927.
Šímová, P. et al. 2019. Fine scale waterbody data improve prediction of waterbird occurrence despite coarse species data. – Ecography 42: 511–520.
Slatyer, R. A. et al. 2013. Niche breadth predicts geographical range size: a general ecological pattern. – Ecol. Lett. 16: 1104–1114.
Tingley, R. et al. 2018. Integrating transport pressure data and species distribution models to estimate invasion risk for alien stowaways. – Ecography 41: 635–646.
Václavík, T. and Meentemeyer, R. K. 2012. Equilibrium or not? Modelling potential distribution of invasive species in different stages of invasion. – Divers. Distrib. 18: 73–83.
Van Niel, K. P. and Austin, M. P. 2007. Predictive vegetation modelling for conservation: impact of error propagation from digital elevation data. – Ecol. Appl. 17: 266–280.
Vogeler, J. C. et al. 2014. Terrain and vegetation structural influences on local avian species richness in two mixed-conifer forests. – Remote Sens. Environ. 147: 13–22.
Warren, D. L. et al. 2008. Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. – Evolution 62: 2868–2883.
Warren, D. L. et al. 2019a. Evaluating species distribution models with discrimination accuracy is uninformative for many applications. – BioRxiv 684399.
Warren, D. L. et al. 2019b. danlwarren/ENMTools: initial beta release. – Package ver. 0.2, Zenodo, <https://github.com/danlwarren/ENMTools>.
Watts, S. M. et al. 2019. Modelling potential habitat for snow leopards (Panthera uncia) in Ladakh, India. – PLoS One 14: e0211509.
Wieczorek, J. et al. 2004. The point-radius method for georeferencing locality descriptions and calculating associated uncertainty. – Int. J. Geogr. Inform. Sci. 18: 745–767.
Wisz, M. S. et al. 2008. Effects of sample size on the performance of species distribution models. – Divers. Distrib. 14: 763–773.
Ye, X. et al. 2018. Impacts of future climate and land cover changes on threatened mammals in the semi-arid Chinese Altai Mountains. – Sci. Total Environ. 612: 775–787.
Zeileis, A. 2006. Object-oriented computation of sandwich estimators. – J. Stat. Softw. 16: 1–16.
Zhang, G. et al. 2018. A heuristic-based approach to mitigating positional errors in patrol data for species distribution modelling. – Trans. GIS 22: 202–216.
Zurell, D. et al. 2010. The virtual ecologist approach: simulating data and observers. – Oikos 119: 622–635.
Supplementary material (available online as Appendix ecog-04687 at <www.ecography.org/appendix/ecog-04687>). Appendix 1–5.
... The desire for spatial precision and representative sampling of occurrences leads to a trade-off. On the one hand, if records can be geolocated only imprecisely, using them risks introducing uncertainty into estimates of environmental tolerances (Cheng et al., 2021;Collins et al., 2017;Feeley & Silman, 2010;Fernandez et al., 2009;Gábor et al., 2020Gábor et al., , 2022Graham et al., 2008;Marcer et al., 2022;Mitchell et al., 2016;Osborne & Leitão, 2009;Tulowiecki et al., 2015). On the other hand, discarding records risks under-representing the true geographical and environmental range of a species, even if the location of occurrences is uncertain . ...
... Bars represent the frequency of studies that report the given data cleaning method as a percentage of studies that described any data cleaning procedure (for description of categories and for sampling and scoring methods, see Supporting Information Appendix S1). Gábor et al., 2020Gábor et al., , 2022Graham et al., 2008;Gueta & Carmel, 2016;Hefley et al., 2017;Mitchell et al., 2016;Osborne & Leitão, 2009;Soultan & Safi, 2017;Tulowiecki et al., 2015). ...
Article
Aim Museum and herbarium specimen records are frequently used to assess the conservation status of species and their responses to climate change. Typically, occurrences with imprecise geolocality information are discarded because they cannot be matched confidently to environmental conditions and are thus expected to increase uncertainty in downstream analyses. However, using only precisely georeferenced records risks undersampling of the environmental and geographical distributions of species. We present two related methods to allow the use of imprecisely georeferenced occurrences in biogeographical analysis. Innovation Our two procedures assign imprecise records to the (1) locations or (2) climates that are closest to the geographical or environmental centroid of the precise records of a species. For virtual species, including imprecise records alongside precise records improved the accuracy of ecological niche models projected to the present and the future, especially for species with c . 20 or fewer precise occurrences. Using only precise records underestimated loss of suitable habitat and overestimated the amount of suitable habitat in both the present and the future. Including imprecise records also improves estimates of niche breadth and extent of occurrence. An analysis of 44 species of North American Asclepias (Apocynaceae) yielded similar results. Main conclusions Existing studies examining the effects of spatial imprecision typically compare outcomes based on precise records against the same records with spatial error added to them. However, in real‐world cases, analysts possess a mix of precise and imprecise records and must decide whether to retain or discard the latter. Discarding imprecise records can undersample the geographical and environmental distributions of species and lead to mis‐estimation of responses to past and future climate change. 
Our method, for which we provide a software implementation in the enmSdmX package for R, is simple to use and can help leverage the large number of specimen records that are typically deemed “unusable” because of spatial imprecision in their geolocation.
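The first procedure assigns each imprecise record to the location closest to the geographical centroid of the precise records. It can be sketched in Python as below. This is a minimal illustration, not the enmSdmX implementation, and it rests on three assumptions:

- coordinates are projected onto a plane;
- each imprecise record is represented by a hypothetical list of candidate locations inside its uncertainty area;
- distance is Euclidean.

The climate-based variant would apply the same nearest-to-centroid rule to environmental values instead of coordinates.

```python
import math

def centroid(points):
    """Arithmetic mean of (x, y) points in projected coordinates."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def assign_to_closest(candidates, target):
    """Pick the candidate location nearest (Euclidean) to the target point."""
    return min(candidates, key=lambda p: math.dist(p, target))

# Hypothetical precise records of one species
precise = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
center = centroid(precise)

# Hypothetical imprecise records, each given as candidate locations inside its uncertainty area
imprecise = {
    "rec_A": [(5.0, 5.0), (1.5, 1.5), (4.0, 0.0)],
    "rec_B": [(-3.0, 1.0), (0.0, 4.0)],
}
assigned = {rec_id: assign_to_closest(cands, center) for rec_id, cands in imprecise.items()}
print(center)    # (1.0, 1.0)
print(assigned)  # {'rec_A': (1.5, 1.5), 'rec_B': (0.0, 4.0)}
```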
... been shown to be ineffective. For example, Gábor et al. (2020) demonstrated that increased sample sizes do not reduce the negative effects of positional uncertainty. Similarly, Smith et al. (2023) showed that discarding data with high positional uncertainty limits our ability to determine species' distribution and climatic niche tolerances properly. ...
Article
Species distribution models (SDMs) have become a common tool in studies of species–environment relationships but can be negatively affected by positional uncertainty of underlying species occurrence data. Previous work has documented the effect of positional uncertainty on model predictive performance, but its consequences for inference about species–environment relationships remain largely unknown. Here we use over 12 000 combinations of virtual and real environmental variables and virtual species, as well as a real case study, to investigate how accurately SDMs can recover species–environment relationships after applying known positional errors to species occurrence data. We explored a range of environmental predictors with various spatial heterogeneity, species' niche widths, sample sizes and magnitudes of positional error. Positional uncertainty decreased predictive model performance for all modeled scenarios. The absolute and relative importance of environmental predictors and the shape of species–environment relationships co-varied with the level of positional uncertainty. These differences were much weaker than those observed for overall model performance, especially for homogeneous predictor variables. This suggests that, at least for the example species and conditions analyzed, the negative consequences of positional uncertainty on model performance did not extend as strongly to the ecological interpretability of the models. Although the findings are encouraging for practitioners using SDMs to reveal generative mechanisms based on spatially uncertain data, they suggest greater consequences for applications utilizing distributions predicted from SDMs using positionally uncertain data, such as conservation prioritization and biodiversity monitoring.
... The strength of these relationships is used to infer species' niches and to predict a species' occurrence at unsurveyed locations. Although SDMs are a fundamental tool for answering many ecological, evolutionary, and conservation-related questions, some methodological issues remain unresolved (Araújo et al., 2019; Gábor et al., 2020; Moudrý et al., 2017; Rocchini et al., 2011; Santini et al., 2021). ...
Article
There is a lack of guidance on the choice of the spatial grain of predictor and response variables in species distribution models (SDM). This review summarizes the current state of the art with regard to the following points: (i) the effects of changing the resolution of predictor and response variables on model performance; (ii) the effect of conducting multi-grain versus single-grain analysis on model performance; and (iii) the role of land cover type and spatial autocorrelation in selecting the appropriate grain size. In the reviewed literature, we found that coarsening the resolution of the response variable typically leads to declining model performance. Therefore, we recommend aiming for finer resolutions unless there is a reason to do otherwise (e.g. expert knowledge of the ecological scale). We also found that so far, the improvements in model performance reported for multi-grain models have been relatively low and that useful predictions can be generated even from single-scale models. In addition, the use of high-resolution predictors improves model performance; however, there is only limited evidence on whether this applies to models with coarser-resolution response variables (e.g. 100 km² and coarser). Low-resolution predictors are usually sufficient for species associated with fairly common environmental conditions but not for species associated with less common ones (e.g. common vs rare land cover category). This is because coarsening the resolution reduces variability within heterogeneous predictors and leads to underrepresentation of rare environments, which can lead to a decrease in model performance. Thus, assessing the spatial autocorrelation of the predictors at multiple grains can provide insights into the impacts of coarsening their resolution on model performance. Overall, we observed a lack of studies examining the simultaneous manipulation of the resolution of predictor and response variables.
We stress the need to explicitly report the resolution of all predictor and response variables.
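A toy categorical raster shows how coarsening leads to underrepresentation of rare environments. The sketch below assumes a majority rule for aggregation, which is only one of several possible rules. It also uses hypothetical "forest" and "wetland" classes. A class covering 12.5% of the fine cells disappears entirely at twice the cell size.

```python
from collections import Counter

def aggregate_majority(grid, k):
    """Coarsen a square categorical raster (side divisible by k) by taking the most
    common class in each k x k block. On ties, Counter.most_common returns the class
    encountered first in the block."""
    n = len(grid)
    coarse = []
    for r0 in range(0, n, k):
        row = []
        for c0 in range(0, n, k):
            block = [grid[r][c] for r in range(r0, r0 + k) for c in range(c0, c0 + k)]
            row.append(Counter(block).most_common(1)[0][0])
        coarse.append(row)
    return coarse

def share(grid, cls):
    """Fraction of cells in the raster that belong to class cls."""
    return sum(row.count(cls) for row in grid) / sum(len(row) for row in grid)

F, W = "forest", "wetland"  # hypothetical common and rare land cover classes
fine = [
    [F, F, F, F],
    [F, W, F, F],
    [F, F, W, F],
    [F, F, F, F],
]
coarse = aggregate_majority(fine, 2)
print(share(fine, W), share(coarse, W))  # 0.125 0.0
```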
... The change of grain size therefore affects the estimated suitability through the overlap between the hypervolume of the realized niche and the hypervolume of the environmental background (a measure of similarity between two distributions) (Figure 4B). For narrow-ranged and specialist species, we expect the performance and the estimated suitability of SDMs to be particularly prone to differences in spatial grain because, compared to wide-ranged and generalist species, their realized niches overlap less with the environmental background [49,50], and the hypervolumes of the realized niche and the environmental background are likely to respond differently to spatial grain (Figure 2). [Figure caption fragment: Suitability estimates. (A) A change of niche breadth results in a change in the rankings of niche breadth among species, hence altering the range size–niche breadth relationship and the diversification rate–niche breadth relationship.] ...
Article
Species environmental niches are central to ecology, evolution, and global change research, but their characterization and interpretation depend on the spatial scale (specifically, the spatial grain) of their measurement. We find that the spatial grain of niche measurement is usually uninformed by ecological processes and varies by orders of magnitude. We illustrate the consequences of this variation for the volume, position, and shape of niche estimates, and discuss how it interacts with geographic range size, habitat specialization, and environmental heterogeneity. Spatial grain significantly affects the study of niche breadth, environmental suitability, niche evolution, niche tracking, and climate change effects. These and other fields will benefit from a more mechanism-informed choice of spatial grain and cross-grain evaluations that integrate different data sources.
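The overlap between a realized niche and its environmental background can be approximated with a simple binned index. This is the quantity described above as a similarity between two distributions. The sketch below swaps in Schoener's D on a single environmental variable as a stand-in. It does not compute the hypervolume overlap discussed in the cited work, and the bin counts are hypothetical. Recomputing the bins from rasters aggregated to a different grain would generally change both distributions, and therefore the overlap.

```python
def schoener_d(p, q):
    """Schoener's D overlap between two histograms over the same bins (0 = disjoint, 1 = identical)."""
    total_p, total_q = sum(p), sum(q)
    return 1.0 - 0.5 * sum(abs(a / total_p - b / total_q) for a, b in zip(p, q))

# Hypothetical cell counts per bin of one environmental variable
background = [10, 20, 40, 20, 10]   # all cells in the study area
realized_niche = [0, 0, 5, 5, 0]    # cells occupied by the species

print(round(schoener_d(realized_niche, background), 2))  # 0.6
print(schoener_d(background, background))                 # 1.0
```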
... Coordinates alone are not sufficient to determine confidently and rigorously the environmental conditions of a specimen's preferred habitat (Gábor et al. 2019). In fact, knowing the degree of uncertainty with which these coordinates were determined is crucial for judging whether the data are fit for a particular research objective. ...
Article
Natural history collections (NHCs) represent an enormous and largely untapped wealth of information on the Earth's biota, made available through GBIF as digital preserved specimen records. Precise knowledge of where the specimens were collected is paramount to rigorous ecological studies, especially in the field of species distribution modelling. Here, we present a first comprehensive analysis of georeferencing quality for all preserved specimen records served by GBIF, and illustrate the impact that coordinate uncertainty may have on predicted potential distributions. We used all GBIF preserved specimen records to analyse the availability of coordinates and associated spatial uncertainty across geography, spatial resolution, taxonomy, publishing institutions and collection time. We used three plant species across their native ranges in different parts of the world to show the impact of uncertainty on predicted potential distributions. We found that 38% of the 180+ million records provide coordinates only and 18% provide both coordinates and uncertainty. Georeferencing quality is determined more by the country of collection and publishing than by taxonomic group. Differences in georeferencing practices are more decisive than the inherent characteristics and georeferencing difficulty of specimens. The availability and quality of records contrast across world regions. Uncertainty values are not normally distributed but peak at very distinct values, which can be traced back to specific regions of the world. Uncertainty leads to a wide spectrum of range sizes when modelling species distributions, potentially affecting conclusions in biogeographical and climate change studies. In summary, the digitised fraction of the world's NHCs is far from optimal in terms of georeferencing, and its quality mainly depends on where the collections are hosted.
A collective effort between communities around NHC institutions, ecological research and data infrastructure is needed to bring the data on a par with its importance and relevance for ecological research.
... This source of error has been noted as problematic for other taxa and datasets and influences the outcomes of analyses and conservation assessments (Meier and Dikow 2004; Haase et al. 2006; Stribling et al. 2008; Guzzon and Ardenghi 2018; Egli et al. 2020). Although locality error inherent in natural history records has received specific attention (Graham et al. 2008; Gábor et al. 2020), taxonomic error is unfortunately rarely considered when assessing biases in datasets. This could be problematic especially for meta-analyses, for example, modelling species distributions or global patterns of richness, among others. Museum collections are known to contain incorrectly identified specimens (Graham et al. 2004; Meier and Dikow 2004; Newbold 2010; Maldonado et al. 2015), but this issue can be rectified with re-examination of specimens and barcoding of fresh material from the same or similar localities. ...
Article
It is commonly recognised that natural history datasets contain locality errors that can compromise the utility of those datasets. However, another source of error in these datasets is taxonomic misidentifications, and this type of error is potentially common, particularly with regard to morphologically conservative species. For example, in the African skinks, the Trachylepis striata and T. varia species complexes each contain morphologically similar species that are commonly confused, despite being genetically distinct. Some species are also partly sympatric, and misidentifications are likely to be especially problematic in those areas. Using DNA barcoding, we assessed misidentification rates between species and applied the updated identifications to known distribution maps to examine whether those maps are accurate representations. Existing banked samples and newly collected samples were DNA barcoded using the mitochondrial 16S gene and supplemented with GenBank data. Identifications were made by matching sequences using haplotype networks that included material from near type localities. The barcode-based identifications were compared with the original identifications recorded for those samples. Taxonomic error was common, particularly in areas of presumed sympatry (error for T. striata species complex: 28%; T. varia species complex: 31%), and this resulted in inaccurately represented species distributions and areas of sympatry. Areas of sympatry were, however, confirmed for T. spilogaster/T. punctatissima, T. striata/T. punctatissima and T. damarana/T. laevigata/T. varia. Our findings corroborate other studies that demonstrate taxonomic error in existing datasets is a significant, but typically unrecognised problem, particularly for morphologically conservative species.
This has implications for the utility of historical collections, citizen science records and public databases used in the formulation of species distribution maps, but also for other downstream analyses that rely on these datasets.
... The impact of georeferencing uncertainty on distribution modeling results can be estimated by simulating random location errors (see Graham et al., 2008). A general recommendation is to choose a pixel resolution that is large enough to reduce the impact of georeferencing uncertainty while at the same time accord with the modeling purpose (see e.g., Gábor et al., 2020). Finally, standardized protocols for correction of inaccurate geographic coordinates may be used, for example the SAGA protocol proposed by Bloom et al. (2017) for museum data. ...
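Simulating random location errors, as described above, amounts to shifting each occurrence by a random vector. The Python sketch below is one hypothetical way to do it. It assumes projected coordinates in metres and a maximum error radius. The random distance is drawn as `max_error_m * sqrt(u)` so that displaced points are uniform over the error disc. Drawing the distance itself uniformly would instead concentrate points near the original location. The sketch is not the procedure of any specific cited study.

```python
import math
import random

def add_positional_error(points, max_error_m, rng):
    """Shift each (x, y) point, in projected metres, to a uniformly random position
    within a disc of radius max_error_m centred on the original location."""
    displaced = []
    for x, y in points:
        distance = max_error_m * math.sqrt(rng.random())
        angle = rng.uniform(0.0, 2.0 * math.pi)
        displaced.append((x + distance * math.cos(angle), y + distance * math.sin(angle)))
    return displaced

rng = random.Random(42)
occurrences = [(500.0, 500.0), (1200.0, 800.0), (300.0, 1500.0)]  # hypothetical records
for max_error_m in (5, 50, 500):
    shifted = add_positional_error(occurrences, max_error_m, rng)
    largest = max(math.dist(p, q) for p, q in zip(occurrences, shifted))
    print(f"max error {max_error_m} m -> largest shift {largest:.1f} m")
```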
Article
Information about the distribution of a study object (e.g., species or habitat) is essential in the face of increasing pressure from land or sea use, and climate change. Distribution models are instrumental for acquiring such information, but also encumbered by uncertainties caused by different sources of error, bias and inaccuracy that need to be dealt with. In this paper we identify the most common sources of uncertainties and link them to different phases in the modeling process. Our aim is to outline the implications of these uncertainties for the reliability of distribution models and to summarize the precautions needed to be taken. We performed a step-by-step assessment of errors, biases and inaccuracies related to the five main steps in a standard distribution modeling process: (1) ecological understanding, assumptions and problem formulation; (2) data collection and preparation; (3) choice of modeling method, model tuning and parameterization; (4) evaluation of models; and, finally, (5) implementation and use. Our synthesis highlights the need to consider the entire distribution modeling process when the reliability and applicability of the models are assessed. A key recommendation is to evaluate the model properly by use of a dataset that is collected independently of the training data. We support initiatives to establish international protocols and open geodatabases for distribution models.
Article
Biological invasion assessments have often used species distribution models (SDMs) assuming species equilibrium with the environment. However, the identification of invaded areas seems more accurate when incorporating movement constraints such as landscape connectivity (e.g., circuit theory models). We studied an introduced population of an Asian bird species (Leiothrix lutea) in Portugal during its spreading stage between 2014 and 2019 to: (1) compare accuracy in inferring year-based invasion stages between resistance surface models (from SDMs) and circuit theory models; (2) quantify the consistency of niche conservatism; and (3) map a long-term connectivity pattern (2014–2050) to predict future invaded areas. We considered three environmental variables: two static (distance to rivers and altitude) and one dynamic (normalized difference vegetation index: NDVI). SDMs were projected during the species dispersal period to infer range expansion, and then converted into resistance surfaces. We compared SDM performances with those of circuit theory models, built with resistance surfaces plus reachable habitat patches as nodes. Overall, our results showed the superiority of circuit theory models over SDMs in inferring invasion. Over the years, SDMs showed that the relative importance of river proximity decreased while that of NDVI increased, with landscape metrics suggesting increasing niche generalism. We examined niche conservatism across years by comparing continuous distributions to binary maps (habitat patches) through landscape metrics. We found no evidence for niche conservatism after accounting for landscape variation. Our findings highlight the importance of following each invasion stage of an established exotic species, as well as incorporating niche breadth and dispersal constraints into frameworks to enhance population monitoring and control strategies.
Article
Species distribution models (SDMs) are powerful tools in ecology and conservation. Choosing the right environmental drivers and filtering species' occurrences taking their biases into account are key factors to consider before modeling. In this case study, we address five common problems arising during the selection of input data for presence-only SDMs on an example of a generalist species: the endangered Cantabrian brown bear. First, we focus on the selection of environmental variables that may drive its distribution, testing if climatic variables should be considered at a 1-km analysis grain. Second, we investigate how filtering the species' data in view of (1) their collection procedures, (2) different time frames, (3) dispersal areas, and (4) subpopulations affects the performance and outputs of the models at three different spatial analysis grains (500 m, 1 km, and 5 km). Our results show that models with different input data yielded only minor differences in performance and behaved properly in terms of model validation, although coarsening the analysis grain deteriorated model performance. Still, the contribution of individual variables and the habitat suitability predictions differed among models. We show that a combination of limited data availability and poor selection of environmental variables can lead to inaccurate predictions. Specifically for the brown bear, we conclude that climatic variables should not be considered for exploring habitat suitability and that the best input data for modeling habitat suitability in the study area originate from (1) observations and traces from the (2) most recent period (2006-2019) in which the population is expanding, (3) not considering cells of dispersing bear occurrences and (4) modeling subpopulations independently (as they show distinct habitat preferences).
In conclusion, SDMs can serve as a useful tool for generalist species including all available data; still, expert evaluation from the perspective of data suitability for the purpose of modeling and possible biases is recommended. This is especially important when the results are intended for management and conservation purposes at the local level, and for species that respond to the environment at coarse analysis grains.
Article
The performance of species distribution models (SDMs) is known to be affected by analysis grain and positional error of species occurrences. Coarsening of the analysis grain has been suggested to compensate for positional errors. Nevertheless, this way of dealing with positional errors has never been thoroughly tested. With increasing use of fine‐scale environmental data in SDMs, it is important to test this assumption. Models using fine‐scale environmental data are more likely to be negatively affected by positional error, because inaccurate occurrences can more easily end up in unsuitable environments. This can result in inappropriate conservation actions. Here, we examined the trade‐offs between positional error and analysis grain and provide recommendations for best practice. We generated narrow-niche virtual species using environmental variables derived from LiDAR point clouds at a fine 5 × 5 m resolution. We simulated positional error in the range of 5 m to 99 m and evaluated the effects of several spatial grains in the range of 5 m to 500 m. In total, we assessed 49 combinations of positional accuracy and analysis grain. We used three modelling techniques (MaxEnt, BRT and GLM) and evaluated their discrimination ability, niche overlap with the virtual species and change in realized niche. We found that model performance decreased with increasing positional error in species occurrences and with coarsening of the analysis grain. Most importantly, we showed that coarsening the analysis grain to compensate for positional error did not improve model performance. Our results reject coarsening of the analysis grain as a solution to address the negative effects of positional error on model performance. We recommend fitting models with the finest possible analysis grain, as close to the response grain as possible, even when the available species occurrences suffer from positional errors.
If there are significant positional errors in species occurrences, users are unlikely to benefit from making additional efforts to obtain higher resolution environmental data unless they also minimize the positional errors of species occurrences. Our findings are also applicable to coarse analysis grain, especially for fragmented habitats, and for species with narrow niche breadth.
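A toy simulation illustrates the mechanism invoked above: inaccurate occurrences end up in unsuitable environment, especially for narrow niches at a fine grain. Everything in the Python sketch below is hypothetical:

- a spatially random 100 × 100 raster with 5 m cells, a worst case with no spatial autocorrelation;
- a box-shaped niche of half-width `breadth` around an optimum of 0.5;
- a uniform positional error on each axis.

The sketch reports the share of displaced presences that still land in suitable cells. It does not reproduce the cited models or their grain comparison.

```python
import random

def fraction_still_suitable(landscape, cell_m, breadth, error_m, rng, n_points=500, optimum=0.5):
    """Share of displaced presences that still fall in suitable cells.

    A cell is suitable when |value - optimum| <= breadth (a crude box-shaped niche).
    Presences start at the centre of randomly chosen suitable cells and are shifted by a
    uniform error in [-error_m, error_m] on each axis; the raster wraps around (torus)
    so that no point falls off the edge.
    """
    n = len(landscape)
    suitable_cells = [(r, c) for r in range(n) for c in range(n)
                      if abs(landscape[r][c] - optimum) <= breadth]
    hits = 0
    for _ in range(n_points):
        r, c = rng.choice(suitable_cells)
        y = (r + 0.5) * cell_m + rng.uniform(-error_m, error_m)
        x = (c + 0.5) * cell_m + rng.uniform(-error_m, error_m)
        r2, c2 = int(y // cell_m) % n, int(x // cell_m) % n
        if abs(landscape[r2][c2] - optimum) <= breadth:
            hits += 1
    return hits / n_points

rng = random.Random(1)
# Hypothetical 100 x 100 raster of one environmental variable in [0, 1], 5 m cells
landscape = [[rng.random() for _ in range(100)] for _ in range(100)]
for label, breadth in (("specialist", 0.05), ("generalist", 0.40)):
    for error_m in (0, 5, 50):
        share = fraction_still_suitable(landscape, 5.0, breadth, error_m, rng)
        print(f"{label:10s} error {error_m:>2} m: {share:.2f} of presences still in suitable cells")
```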
Article
Aim Species distribution models are used across evolution, ecology, conservation and epidemiology to make critical decisions and study biological phenomena, often in cases where experimental approaches are intractable. Choices regarding optimal models, methods and data are typically made based on discrimination accuracy: a model's ability to predict subsets of species occurrence data that were withheld during model construction. However, empirical applications of these models often involve making biological inferences based on continuous estimates of relative habitat suitability as a function of environmental predictor variables. We term the reliability of these biological inferences ‘functional accuracy.’ We explore the link between discrimination accuracy and functional accuracy. Methods Using a simulation approach we investigate whether models that make good predictions of species distributions correctly infer the underlying relationship between environmental predictors and the suitability of habitat. Results We demonstrate that discrimination accuracy is only informative when models are simple and similar in structure to the true niche, or when data partitioning is geographically structured. However, the utility of discrimination accuracy for selecting models with high functional accuracy was low in all cases. Main conclusions These results suggest that many empirical studies and decisions are based on criteria that are unrelated to models’ usefulness for their intended purpose. We argue that empirical modelling studies need to place significantly more emphasis on biological insight into the plausibility of models, and that the current approach of maximizing discrimination accuracy at the expense of other considerations is detrimental to both the empirical and methodological literature in this active field. 
Finally, we argue that future development of the field must include an increased emphasis on simulation; methodological studies based on ability to predict withheld occurrence data may be largely uninformative about best practices for applications where interpretation of models relies on estimating ecological processes, and will unduly penalize more biologically informative modelling approaches.
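Discrimination accuracy can say little about functional accuracy partly because AUC depends only on the ranking of scores. The Python sketch below uses hypothetical suitability values. It computes AUC as the probability that a random presence outscores a random absence. A model whose response is a strongly distorted but monotone transform of the true suitability then obtains exactly the same AUC. This illustrates the general point only and is not the simulation design of the cited study.

```python
def auc(presence_scores, absence_scores):
    """AUC as the share of presence-absence pairs in which the presence scores higher (ties = 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))

def distorted(s):
    """A hypothetical model response: monotone in s, but far from the true suitability curve."""
    return s ** 4

true_presence = [0.9, 0.7, 0.6, 0.4]   # true suitability at presence sites
true_absence = [0.5, 0.3, 0.2, 0.1]    # true suitability at absence sites

print(auc(true_presence, true_absence))                      # 0.9375
print(auc([distorted(s) for s in true_presence],
          [distorted(s) for s in true_absence]))             # 0.9375
print(distorted(0.6))  # about 0.13, while the true suitability is 0.6
```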
Article
Species distribution models (SDMs) have become one of the major predictive tools in ecology. However, multiple methodological choices are required during the modelling process, some of which may have a large impact on forecasting results. In this context, virtual species, i.e., the use of simulations involving a fictitious species for which we have perfect knowledge of its occurrence‐environment relationships and other relevant characteristics, have become increasingly popular to test SDMs. This approach provides for a simple virtual ecologist framework under which to test model properties, as well as the effects of the different methodological choices, and allows teasing out the effects of targeted factors with great certainty. This simplification is therefore very useful in setting up modelling standards and best practice principles. As a result, numerous virtual species studies have been published over the last decade. The topics covered include differences in performance between statistical models, effects of sample size, choice of threshold values, methods to generate pseudo‐absences for presence‐only data, among many others. These simulations have therefore already made a great contribution to setting best modelling practices in SDMs. Recent software developments have greatly facilitated the simulation of virtual species, with at least 3 different packages published to that effect. However, the simulation procedure has not been homogeneous, which introduces some subtleties in the interpretation of results, as well as differences across simulation packages. Here we (1) review the main contributions of the virtual species approach in the SDM literature; (2) compare the major virtual species simulation approaches and software packages; and (3) propose a set of recommendations for best simulation practices in future virtual species studies in the context of SDMs.
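A virtual species in the sense described above can be generated in a few lines. The Python sketch below is a generic illustration under stated assumptions, not the workflow of any of the cited packages:

- suitability is a Gaussian response to one hypothetical environmental gradient, with its standard deviation playing the role of niche breadth;
- suitability is converted to a probability of presence with a logistic curve whose midpoint and steepness are arbitrary;
- presences are drawn as Bernoulli trials.

A specialist (narrow breadth) ends up with a lower prevalence than a generalist on the same gradient.

```python
import math
import random

def gaussian_suitability(x, optimum, breadth):
    """Suitability in (0, 1]: Gaussian response to one gradient; breadth = niche width (SD)."""
    return math.exp(-((x - optimum) ** 2) / (2.0 * breadth ** 2))

def presence_probability(suitability, midpoint=0.5, steepness=0.05):
    """Logistic conversion of suitability to probability of presence (arbitrary parameters)."""
    return 1.0 / (1.0 + math.exp(-(suitability - midpoint) / steepness))

def simulate_presences(env_values, optimum, breadth, rng):
    """Draw 1 (presence) or 0 (absence) at each site as a Bernoulli trial."""
    return [int(rng.random() < presence_probability(gaussian_suitability(x, optimum, breadth)))
            for x in env_values]

rng = random.Random(7)
sites = [rng.uniform(0.0, 100.0) for _ in range(2000)]  # hypothetical gradient values at 2000 sites
for label, breadth in (("specialist", 5.0), ("generalist", 25.0)):
    pa = simulate_presences(sites, optimum=50.0, breadth=breadth, rng=rng)
    print(label, "prevalence:", round(sum(pa) / len(pa), 2))
```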
Article
The snow leopard Panthera uncia is an elusive species inhabiting some of the most remote and inaccessible tracts of Central and South Asia. It is difficult to determine its distribution and density pattern, which are crucial for developing conservation strategies. Several techniques for species detection combining camera traps with remote sensing and geographic information systems have been developed to model the habitat of such cryptic and low-density species in challenging terrains. Utilising presence-only data from camera traps and direct observations, alongside six environmental variables (elevation, aspect, ruggedness, distance to water, land cover, and prey habitat suitability), we assessed snow leopard habitat suitability across Ladakh in northern India. This is the first study to model snow leopard distribution both in India and utilising direct observation data. Results suggested that elevation and ruggedness are the two most influential environmental variables for snow leopard habitat suitability, with highly suitable habitat having an elevation range of 2,800 m to 4,600 m and ruggedness of 450 m to 1,800 m. Our habitat suitability map estimated approximately 12% of Ladakh's geographical area (c. 90,000 km²) as highly suitable and 18% as medium suitability. We found that 62.5% of recorded livestock depredation along with over half of all livestock corrals (54%) and homestays (58%) occurred within highly suitable snow leopard habitat. Our habitat suitability model can be used to assist in allocation of conservation resources by targeting construction of livestock corrals to areas of high habitat suitability and promoting ecotourism programs in villages in highly suitable snow leopard habitat.
Article
Demand for models in biodiversity assessments is rising, but which models are adequate for the task? We propose a set of best-practice standards and detailed guidelines enabling scoring of studies based on species distribution models for use in biodiversity assessments. We reviewed and scored 400 modeling studies over the past 20 years using the proposed standards and guidelines. We detected low model adequacy overall, but with a marked tendency of improvement over time in model building and, to a lesser degree, in biological data and model evaluation. We argue that implementation of agreed-upon standards for models in biodiversity assessments would promote transparency and repeatability, eventually leading to higher quality of the models and the inferences used in assessments. We encourage broad community participation toward the expansion and ongoing development of the proposed standards and guidelines.
Poster
Species distribution modelling is now routinely applied in many macroecological studies. However, the reliability of evaluation metrics used to validate these models remains debated. Moreover, the emergence of online databases of environmental variables with global coverage, especially climatic, has favoured the use of the same set of standard predictors. Unfortunately, the effort of variable selection based on the species’ ecology is often limited. In this context, our aim was to highlight the importance of selecting ad hoc variables in species distribution modelling, and to assess the ability of classical evaluation statistics to identify biologically non-significant models.
Article
Terrain attributes (e.g., slope, rugosity) derived in Geographic Information Systems (GIS) from digital terrain models (DTMs) are widely used in both terrestrial and marine ecological studies due to their potential to act as surrogates of species distribution. However, the spatial resolution of DTMs is often altered to match the scale at which species observations were collected. Here, we highlight the significance of adequately reporting the methods used to derive terrain attributes from DTMs and the consequences of their incorrect reporting in ecological studies. To ensure full repeatability of studies, they should report (i) the source and the resolution of the original DTM; (ii) the algorithm used to calculate terrain attributes; (iii) the method used for rescaling (e.g., aggregating or resampling, using the mean or maximum values); and (iv) the order in which these operations were performed. We contrast the effects of two common scale alteration approaches for the derivation of terrain attributes from DTMs. These two scale alteration methods differ in the step at which the change is performed: (i) the resolution alteration is performed after computing terrain attributes from the original DTM at the native resolution, or (ii) the resolution alteration is performed on the native DTM before computing terrain attributes. While these approaches conceptually do the same thing (i.e., change the resolution of the terrain attributes), we demonstrate that they produce two distinct sets of variables that are not interchangeable and describe different properties of the terrain. In a species distribution modelling (SDM) context, the first approach calculates terrain attribute values within the cell where a species is found, while the second approach calculates terrain attribute values with respect to neighbouring cells. 
A mutual substitution of the two approaches results in a decrease of models' discrimination ability and in misleading spatial predictions of species probability of occurrence. Regardless of the DTM-derived attribute, we argue that the choice of the approach should be carefully guided by both the ecological scale relevant to the question being asked and the performance of pre-analyses. We emphasize that selected methods be clearly described to encourage reproducibility and proper interpretation of results, thus enabling a better understanding of the role of scale in ecology.
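The difference between the two scale-alteration orders can be seen even on a one-dimensional transect. The Python sketch below is a simplification:

- the "slope" is a hypothetical mean absolute elevation gradient to the immediate neighbours, not Horn's or any other GIS algorithm;
- aggregation is a block mean;
- the transect is an artificial 1 m zigzag.

Computing the attribute first and then aggregating (approach i) preserves the fine-scale ruggedness. Aggregating the DTM first (approach ii) smooths it away.

```python
def mean_abs_gradient(z, cell):
    """Simplified 1-D slope: mean |dz/dx| between each cell and its immediate neighbours."""
    out = []
    for i in range(len(z)):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(z)]
        out.append(sum(abs(z[i] - z[j]) / cell for j in neighbours) / len(neighbours))
    return out

def aggregate_mean(values, k):
    """Coarsen by averaging non-overlapping blocks of k cells."""
    if len(values) % k:
        raise ValueError("length must be a multiple of k")
    return [sum(values[i:i + k]) / k for i in range(0, len(values), k)]

dtm = [0.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0, 2.0]  # hypothetical rugged transect, 1 m cells
cell, k = 1.0, 2

# Approach (i): compute the attribute at native resolution, then aggregate it
slope_then_aggregate = aggregate_mean(mean_abs_gradient(dtm, cell), k)
# Approach (ii): aggregate the DTM first, then compute the attribute at the coarse cell size
aggregate_then_slope = mean_abs_gradient(aggregate_mean(dtm, k), cell * k)
print(slope_then_aggregate)  # [2.0, 2.0, 2.0, 2.0]
print(aggregate_then_slope)  # [0.0, 0.0, 0.0, 0.0]
```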
Article
Species distribution models (SDMs) are widely used in ecology and conservation. However, their performance is known to be affected by a variety of factors related to species occurrence characteristics. In this study, we used a virtual species approach to overcome the difficulties associated with testing the combined effects of those factors on the performance of presence-only SDMs when using real data. We focused on the individual and combined roles of factors related to the response variable (i.e. sample size, sampling bias, environmental filtering, species prevalence, and species response to environmental gradients). Results suggest that environmental filtering is not necessarily helpful and should not be performed blindly, without evidence of bias in species occurrences. The more gradual the species response to environmental gradients is, the greater is the model sensitivity to an inappropriate use of environmental filtering, although this sensitivity decreases with higher species prevalence. Results show that SDMs are affected to the greatest degree by the species response to environmental gradients, species prevalence, and sample size. Model accuracy decreased when the sample size fell below 300 presences. Furthermore, a high level of interactions among individual factors was observed. Ignoring the combined effects of factors may lead to misleading outcomes and conclusions.
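Environmental filtering, as evaluated above, usually means thinning occurrences so that each region of environmental space is represented by at most one record. The Python sketch below shows one hypothetical version. It uses a regular grid of bins in environmental space and keeps the first record in each bin. The variables, bin widths and records are invented for illustration. In line with the results above, such a filter suits cases with evidence of sampling bias and should not be a default step.

```python
def environmental_filter(records, bin_widths):
    """Keep at most one record per bin of a regular grid in environmental space.

    records: list of (record_id, env_values) pairs; bin_widths: one width per variable.
    The first record seen in each bin is kept (a hypothetical rule for this sketch).
    """
    kept, occupied_bins = [], set()
    for record_id, env_values in records:
        bin_key = tuple(int(v // w) for v, w in zip(env_values, bin_widths))
        if bin_key not in occupied_bins:
            occupied_bins.add(bin_key)
            kept.append(record_id)
    return kept

# Hypothetical occurrences with (mean temperature in °C, annual precipitation in mm)
occurrences = [
    ("a", (12.1, 310.0)),
    ("b", (12.4, 340.0)),  # same 1 °C x 100 mm bin as "a", so it is dropped
    ("c", (15.8, 305.0)),
    ("d", (12.9, 420.0)),
]
print(environmental_filter(occurrences, bin_widths=(1.0, 100.0)))  # ['a', 'c', 'd']
```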
Correlative species distribution models (SDMs) are widely used to predict species distributions and assemblages, with many fundamental and applied uses. Various factors have been shown to affect SDM prediction accuracy. However, real data cannot give unambiguous answers on these issues, and for this reason artificial data have been increasingly used in recent years. Here, we move one step further by assessing how different factors affect the prediction accuracy of virtual assemblages obtained by stacking individual SDM predictions (stacked SDMs, S-SDMs). We modelled 100 virtual species in a real study area, testing five factors: sample size (200, 800 and 3200), sampling method (nested, non-nested), sampling prevalence (25%, 50%, 75% and the species' true prevalence), modelling technique (GAM, GLM, BRT and RF) and thresholding method (ROC, MaxTSS and MaxKappa). We showed that the accuracy of S-SDM predictions is mostly affected by modelling technique, followed by sample size. Models fitted by GAM/GLM had higher accuracy and lower variance than those fitted by BRT/RF. Model accuracy increased with sample size, and a sampling strategy reflecting the true prevalence of the species was most successful. However, even with sample sizes exceeding 3000 sites, residual uncertainty remained in the predictions, potentially reflecting a bias introduced by creating and/or resampling the virtual species. Therefore, when evaluating the accuracy of predictions from S-SDMs fitted with real field data, one can hardly expect to reach perfect accuracy, and reasonably high values of similarity or predictive success can already be considered valuable predictions. We recommend using a ‘plot-like’ sampling method (the best approximation of the species' true prevalence) rather than simply increasing the number of species presences and absences.
As presented here, virtual simulations could be used more systematically in future studies to indicate the best accuracy level that one could expect, given the characteristics of the data and the methods used to fit and stack SDMs.
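The stacking step can be illustrated with a small Python sketch. It is hedged in several ways: it implements only the MaxTSS thresholding rule (not ROC-based or MaxKappa thresholds), takes each distinct predicted value as a candidate threshold, breaks ties toward the lowest threshold, and scores each site with the Sørensen similarity between predicted and observed species sets, treating two empty sets as perfect agreement by convention. The function names and the 2-species, 5-site data are hypothetical.

```python
def tss(observed, probs, threshold):
    """True skill statistic (sensitivity + specificity - 1) at one threshold.
    Assumes `observed` contains at least one presence and one absence."""
    tp = sum(1 for o, p in zip(observed, probs) if o == 1 and p >= threshold)
    fn = sum(1 for o, p in zip(observed, probs) if o == 1 and p < threshold)
    tn = sum(1 for o, p in zip(observed, probs) if o == 0 and p < threshold)
    fp = sum(1 for o, p in zip(observed, probs) if o == 0 and p >= threshold)
    return tp / (tp + fn) + tn / (tn + fp) - 1.0

def max_tss_threshold(observed, probs):
    """MaxTSS: the candidate threshold (each distinct predicted value) with the
    highest TSS; max() keeps the first, i.e. lowest, threshold on ties."""
    return max(sorted(set(probs)), key=lambda t: tss(observed, probs, t))

def sorensen(a, b):
    """Sørensen similarity of two species sets; two empty sets count as 1.0."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def stack_sdms(observed, probs):
    """Binarize each species with its MaxTSS threshold, then stack per site."""
    n_sites = len(next(iter(observed.values())))
    thresholds = {sp: max_tss_threshold(observed[sp], probs[sp]) for sp in observed}
    predicted = [{sp for sp in probs if probs[sp][i] >= thresholds[sp]} for i in range(n_sites)]
    actual = [{sp for sp in observed if observed[sp][i] == 1} for i in range(n_sites)]
    return thresholds, predicted, actual

# Hypothetical evaluation data: 2 species x 5 sites.
observed = {"sp1": [1, 1, 0, 0, 0], "sp2": [0, 1, 1, 0, 1]}
probs = {"sp1": [0.9, 0.6, 0.4, 0.1, 0.3], "sp2": [0.6, 0.8, 0.5, 0.2, 0.4]}

thresholds, predicted, actual = stack_sdms(observed, probs)
richness = [len(s) for s in predicted]
similarity = [sorensen(p, a) for p, a in zip(predicted, actual)]
print(thresholds)  # {'sp1': 0.6, 'sp2': 0.4}
print(richness)    # [2, 2, 1, 0, 1]
```

Here sp2 is over-predicted at the first site, so that site's similarity drops to 2/3 while the other sites match exactly, giving a mean similarity of 14/15 (about 0.93); with real data, imperfect per-species thresholds accumulate in the same way across many species.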