Abstract

Species occurrences inherently include positional error. Such error can be problematic for species distribution models (SDMs), especially those based on fine‐resolution environmental data. It has been suggested that there could be a link between the influence of positional error and the width of the species ecological niche. Although positional errors in species occurrence data may imply serious limitations, especially for modelling species with narrow ecological niche, it has never been thoroughly explored. We used a virtual species approach to assess the effects of the positional error on fine‐scale SDMs for species with environmental niches of different widths. We simulated three virtual species with varying niche breadth, from specialist to generalist. The true distribution of these virtual species was then altered by introducing different levels of positional error (from 5 to 500 m). We built generalized linear models and MaxEnt models using the distribution of the three virtual species (unaltered and altered) and a combination of environmental data at 5 m resolution. The models’ performance and niche overlap were compared to assess the effect of positional error with varying niche breadth in the geographical and environmental space. The positional error negatively impacted performance and niche overlap metrics. The amplitude of the influence of positional error depended on the species niche, with models for specialist species being more affected than those for generalist species. The positional error had the same effect on both modelling techniques. Finally, increasing sample size did not mitigate the negative influence of positional error. We showed that fine‐scale SDMs are considerably affected by positional error, even when such error is low. 
Therefore, where new surveys are undertaken, we recommend paying attention to data collection techniques to minimize the positional error in occurrence data and thus to avoid its negative effect on SDMs, especially when studying specialist species.
www.ecography.org
© 2019 The Authors. Ecography published by John Wiley & Sons Ltd on behalf of Nordic Society Oikos
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
Subject Editor: Dan Warren
Editor-in-Chief: Miguel Araújo
Accepted 4 October 2019
43: 256–269, 2020
doi: 10.1111/ecog.04687
Keywords: data errors, niche breadth, spatial overlay, virtual species
The effect of positional error on fine scale species distribution models increases for specialist species
Lukáš Gábor, Vítězslav Moudrý, Vincent Lecours, Marco Malavasi, Vojtěch Barták, Michal Fogl, Petra Šímová, Duccio Rocchini and Tomáš Václavík
L. Gábor (https://orcid.org/0000-0001-6137-0994), V. Moudrý (https://orcid.org/0000-0002-3194-451X) (moudry@fzp.czu.cz), M. Malavasi, V. Barták (https://orcid.org/0000-0001-9887-1290), M. Fogl (https://orcid.org/0000-0002-5880-6926), P. Šímová (https://orcid.org/0000-0003-2480-1171) and D. Rocchini (https://orcid.org/0000-0003-0087-0594), Dept of Applied Geoinformatics and Spatial Planning, Faculty of Environmental Sciences, Czech Univ. of Life Sciences Prague, Praha – Suchdol, Czech Republic. DR also at: Univ. of Trento, Center Agriculture Food Environment (C3A), S. Michele all’Adige, TN, Italy, and Univ. of Trento, Dept of Cellular, Computational and Integrative Biology – CIBIO, Univ. of Trento, Povo, Italy, and Fondazione Edmund Mach, Research and Innovation Centre, Dept of Biodiversity and Molecular Ecology, S. Michele all’Adige, TN, Italy. – V. Lecours (https://orcid.org/0000-0002-4777-3348), School of Forest Resources and Conservation, Univ. of Florida, Gainesville, FL, USA. – T. Václavík (https://orcid.org/0000-0002-1113-6320), Palacký Univ. Olomouc, Dept of Ecology and Environmental Sciences, Faculty of Science, Olomouc, Czech Republic, and UFZ – Helmholtz Centre for Environmental Research, Dept of Computational Landscape Ecology, Leipzig, Germany.
Introduction
Studying relationships between species and their environment is fundamental for understanding Earth’s biodiversity. Species distribution models (SDMs) are a common tool used to study these relationships. They use species occurrence data and environmental data to produce a set of rules explaining the environmental space where species were collected or observed (Ferrier et al. 2017). All applications of SDMs, however, assume that species occurrence data are largely free of spatial error. Nonetheless, all spatial data inherently contain some level and type of spatial errors. These errors can be, for example, related to the use of inadequate spatial resolution (Gottschalk et al. 2011, Šímová et al. 2019), low sample size (Wisz et al. 2008, Moudrý et al. 2017), biased sampling (Hijmans 2012, Ranc et al. 2016) or occurrences with positional error (Graham et al. 2008, Osborne and Leitão 2009, Mitchell et al. 2017). Data quality (both for species occurrences and environmental variables) is currently considered a major factor limiting SDM accuracy (Araújo et al. 2019) and demonstrating, quantifying and understanding the consequences of these errors is therefore critical.
It is often assumed that the negative effects of positional error (i.e. inaccurate location of species occurrences) are minimal or mainly associated with relatively older datasets that are often georeferenced from textual descriptions of their locations (which may cause errors of up to hundreds of meters, Wieczorek et al. 2004). However, it is also necessary to consider positional errors inherent to data georeferenced using modern global navigation satellite systems (GNSS). The positional error of GNSS data may be caused by the use of outdated technology, by poor satellite signal reception (e.g. because of inappropriate site conditions), or by data processing (e.g. conversion between coordinate systems or rounding of coordinate values). Moreover, species occurrence data often represent the position of the observer and not the actual position of the species (Zhang et al. 2018). Additionally, where the marine environment is concerned, species data are often acquired using underwater cameras, in which case the positional error can be affected for example by the camera depth; the deeper the camera is, the greater is the positional error (Rattray et al. 2014, Mitchell et al. 2017). Therefore, even though the accuracy of standard GNSS is usually below 30 m (Frair et al. 2010), the errors associated with such data may be much larger.
In addition, performance of SDMs is complicated by various spatial (e.g. prevalence or range size) and ecological (e.g. niche breadth) characteristics of the studied species (Luoto et al. 2005, Bulluck et al. 2006, McPherson and Jetz 2007, Evangelista et al. 2008, Chefaoui et al. 2011, Connor et al. 2018). It has been hypothesized that range size is positively correlated with niche breadth (i.e. the range of environments that the species can inhabit), in other words that species able to tolerate a wider range of conditions are typically more widespread (Brown 1984, Gaston et al. 1997, Arribas et al. 2012, Boulangeat et al. 2012). The niche breadth–range size relationship is one of the possible mechanisms explaining commonness and rarity. Modelling rare species (i.e. species with small geographical ranges) is particularly problematic and novel approaches have been adopted for this purpose (Breiner et al. 2015) to overcome the common problem of a low number of occurrences available for modelling that may not be sufficient to completely describe the species niche. Similar effects can be caused by a low positional accuracy of the occurrences (Johnson and Gillingham 2008, Fernandez et al. 2009, Osborne and Leitão 2009).
Although the magnitude of the niche breadth–range size relationship is still under debate, a recent meta-analysis of 64 studies found a significant positive relationship between the range size and niche breadth (Slatyer et al. 2013). Such a synergic relationship can increase the already high vulnerability of specialist species to environmental changes. In addition, Slatyer et al. (2013) suggested that specialist species might be particularly vulnerable to any environmental change due to synergistic effects of a narrow niche and small range size. Specialist species are of high conservation concern, and SDMs might be the only tractable means of estimating their distribution and reaction to environmental change. However, confounding effects of inaccurate data on modelling species that utilize a narrow niche breadth (i.e. specialist) versus species that utilize a wide niche breadth (i.e. generalist) are unknown (Connor et al. 2018).
It is intuitive that positional error of a given magnitude might have a greater effect on specialist than generalist species, as it is more likely that occurrences get incorrectly shifted into cells representing an unsuitable environment, i.e. environment that is outside of the species’ environmental niche. This, however, has never been thoroughly explored because it is extremely difficult, if not impossible, to estimate the true responses of a real species to the environment and, consequently, to be able to fully understand the true suitability of an area for the species in question.
In this study, we focused on Light Detection and Ranging (LiDAR)-derived variables that are being more and more often combined with species distribution data of unknown positional accuracy to study species–environment relationships at fine scales. Studies published so far have used real species to test the effect of positional error. However, real species distribution data are usually affected by a complex set of other uncertainties (e.g. sampling bias, incompleteness, inaccuracies). As a consequence, the isolation and identification of the effects of positional error can be very challenging, if not impossible. This is likely one of the reasons why little consensus exists on how the effect of positional error manifests in SDMs (Naimi et al. 2011, Mitchell et al. 2017). For example, Graham et al. (2008) concluded that SDMs are robust to positional error while others argued that positional errors reduce models’ performance (Johnson and Gillingham 2008, Fernandez et al. 2009, Osborne and Leitão 2009).
Another aspect may be that positional errors of species occurrences were studied using relatively coarse environmental data (but see Mitchell et al. 2017). Positional error considered in prior studies ranged from 50 m up to 50 km (Table 1). While such error results in a shift over several cells in a coarse-resolution SDM (e.g. 1 × 1 km), it will cause a much greater shift in a fine-resolution SDM (e.g. 10 × 10 m). Therefore, with the increasing availability of fine-scale data, additional studies are needed (Osborne and Leitão 2009); it can be expected that SDMs at fine scales would be more sensitive to positional error.
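As a rough illustration of this scale dependence, a positional error can be expressed in raster cells by dividing it by the cell size. The Python sketch below is not part of the original study, which was carried out in R; the function name `cells_shifted` is hypothetical.

```python
def cells_shifted(error_m: float, cell_size_m: float) -> float:
    """Express a positional error (in metres) as a number of raster cells."""
    if cell_size_m <= 0:
        raise ValueError("cell size must be positive")
    return error_m / cell_size_m


if __name__ == "__main__":
    # The same 50 m error on the two example grids from the text (1 km, 10 m)
    # and on the 5 m grid used in this study.
    for cell_size in (1000.0, 10.0, 5.0):
        n_cells = cells_shifted(50.0, cell_size)
        print(f"{cell_size:6.0f} m cells: error spans {n_cells:.2f} cells")
```

On a 1 km grid a 50 m error usually stays within the same cell, whereas on a 5 m grid it moves the occurrence ten cells away.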
To ensure the full knowledge of the exact ecological and geographical characteristics of the species and to avoid unknown complexities associated with real data, we used a virtual species approach to test the effect of the positional error in species occurrences on fine-scale SDMs in the context of species niche breadth (i.e. specialist versus generalist species). We generated three virtual species that differed in characteristics related to the geographic distribution of the species, i.e. prevalence and relative occurrence area (ROA; the proportion of the total study area occupied by the species, Lobo 2008).
The virtual species approach allowed us to control the experiment and to isolate the effects of positional error (Zurell et al. 2010). This approach is increasingly used to evaluate the effects of data inaccuracies on model performance (Barbet-Massin et al. 2012, Václavík and Meentemeyer 2012, Qiao et al. 2015, Ranc et al. 2016, Fernandes et al. 2018, Leroy et al. 2018, Moudrý et al. 2018, Gábor et al. 2019, Meynard et al. 2019), but has yet to be adopted for the study of positional error. In particular, we tested whether: 1) SDMs for specialist species are more affected by positional error than those for generalist species; 2) it is possible to compensate for the assumed negative effect of a positional error with a higher sample size; and 3) the positional error has different effects when using a parametric (e.g. generalized linear model) versus a nonparametric (e.g. MaxEnt) modelling technique.
Material and methods
LiDAR data acquisition, processing and variable selection
Discrete LiDAR data were collected in Krkonose Mountains National Park (KRNAP), Czech Republic (Supplementary material Appendix 1 Fig. A1) in 2012 using a small-footprint airborne LiDAR system (RIEGL LMS Q-680i). The average point density was approximately six points per square meter. The LiDAR point cloud was automatically classified into ground, vegetation, building, wire and transmission tower classes in the ENVI LiDAR software (ver. 5.3) and LAStools (ver. 171215). The terrain data points were used to produce a digital terrain model (DTM), and the vegetation data points were used to produce a canopy height model (CHM) (Khosravipour et al. 2016). Both models were generated from the point cloud at a 0.5 m resolution and subsequently resampled to 5 m cell resolution for the analysis to improve processing time. A topographic wetness index (TWI) was derived from the DTM based on the equation
$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right)$$
where $A_s$ is the specific catchment area and $\beta$ is the local slope in radians (Beven and Kirkby 1979). To calculate the specific catchment area, we used the multiple flow routing algorithm of Quinn et al. (1991), recommended by Kopecký and Čížková (2010), using SAGA-GIS (Conrad 2003).
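A minimal Python sketch of the TWI formula for a single cell follows. It is illustrative only: the study derived TWI in SAGA-GIS, and the function name and the handling of flat cells (raising an error rather than substituting a minimum slope) are assumptions made here.

```python
import math


def twi(specific_catchment_area: float, slope_rad: float) -> float:
    """Topographic wetness index, ln(A_s / tan(beta)), for one raster cell.

    specific_catchment_area: A_s, upslope area per unit contour width.
    slope_rad: local slope angle beta, in radians.
    """
    if specific_catchment_area <= 0:
        raise ValueError("specific catchment area must be positive")
    tan_beta = math.tan(slope_rad)
    if tan_beta <= 0:
        # Assumption: flat cells are rejected here; GIS tools typically
        # replace a zero slope with a small minimum value instead.
        raise ValueError("slope must be greater than zero")
    return math.log(specific_catchment_area / tan_beta)


if __name__ == "__main__":
    # A gentle slope with a large contributing area gives a high (wet) index.
    print(round(twi(500.0, math.radians(2.0)), 2))
    # A steep slope with a small contributing area gives a low (dry) index.
    print(round(twi(5.0, math.radians(30.0)), 2))
```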
The selection of these three variables (DTM, CHM, TWI) was motivated by the need to simulate a realistic situation that includes variables with various levels of spatial autocorrelation (Supplementary material Appendix 2 Fig. A2). CHM describes a horizontal structural variability of the vegetation and is known to affect species richness (Lefsky et al. 2002). For example, higher vegetation was found to be related to higher bird species richness (Davies and Asner 2014). TWI is a surrogate for soil moisture, an environmental variable that affects the vegetation composition and that has been previously used to predict bird occurrences (Besnard et al. 2013, Reif et al. 2018). The relationships between CHM and TWI on the one side and bird distribution and richness on the other side make our study relatable to applications with real species; our virtual species could theoretically be birds with specific habitat requirements in terms of terrain characteristic and vegetation structure. We also used the DTM as a surrogate for climatic variables and to restrict our virtual species to certain altitudes (Coops et al. 2010, Vogeler et al. 2014).
Table 1. Overview of prior studies focused on the influence of positional error in species occurrence data on SDMs.
Study | Species data | Environmental data | Resolution of input environmental data (pixel size) | Range of shifting occurrences (distance) | Range of shifting occurrences (pixels)
Graham et al. 2008 | observed | categorical, continuous | 100 × 100 m | 0–5 km | 0–50 pixels
Johnson and Gillingham 2008 | observed | categorical | 30 × 30 m | 50–1000 m (over 50 m) | 1–34 pixels
Osborne and Leitão 2009 | observed | continuous | 1 × 1 km | 0–1, 2–3, 4–5, 0–5 km | 0–1, 2–3, 4–5, 0–5 pixels
Fernandez et al. 2009 | observed | continuous | 1 × 1 km | 5–10–25–50 km | 1–5, 1–10, 1–25, 1–50 pixels
Naimi et al. 2011 | artificial | continuous | artificial data | x | 1–30 (over 1 pixel)
Mitchell et al. 2017 | observed | continuous | 2.5 × 2.5 m | 5–25–50–20–400 m | 1–2, 1–12, 1–80, 1–160 pixels
Simulating virtual species with different niche breadths
Virtual species were generated with the virtualspecies package (Leroy et al. 2016) in the statistical software R v.3.4.4 (R Development Core Team). The process involved three steps: a) generating the true distribution of the virtual species’ environmental suitability, b) converting the environmental suitability into presences and absences and c) sampling species occurrences for further analysis and modelling.
Applying the formatFunctions function in R, we defined the species–environment relationships using normal distribution curves. To simulate species with different niche breadth, prevalence and ROA, we used the same means and varied standard deviations of the used environmental variables (Supplementary material Appendix 3 Table B1). Specifically, we simulated three distinct virtual species with varying ROAs and prevalence that represent realistic scenarios of species extent of occurrence in the study area. The species with low ROA (4%) represents a specialist with low species prevalence (0.04), narrow niche breadth and small geographical range. The species with medium ROA (12%) may be described as an intermediate species (species prevalence = 0.12) with a wider niche breadth and medium geographical range. Finally, the species with high ROA (52%) can be perceived as a generalist with high species prevalence (0.47), wide niche breadth and wide geographical range (Futuyma and Moreno 1988, Devictor et al. 2010, Franklin 2010, Peers et al. 2012). Subsequently, we multiplied individual species’ responses to environmental variables in order to acquire an environmental suitability raster (function generateSpFromFun). We opted for multiplication of the variables to assume irreplaceability of environmental conditions (i.e. we assumed that unsuitability of one condition causes a low probability of occurrence even though remaining conditions are in species’ range of suitable values).
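The idea of bell-shaped responses combined by multiplication can be sketched in Python as follows. This is not the virtualspecies implementation used in the study: the function names are hypothetical, each response is rescaled so that the optimum scores 1 (an assumption), and the means and standard deviations are invented for illustration, since the actual values are given only in the Supplementary material.

```python
import math


def gaussian_response(x: float, mean: float, sd: float) -> float:
    """Normal-curve response, rescaled so the optimum (x == mean) equals 1."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2)


def suitability(cell: dict, niche: dict) -> float:
    """Multiply the per-variable responses; one poor variable lowers the product."""
    result = 1.0
    for name, (mean, sd) in niche.items():
        result *= gaussian_response(cell[name], mean, sd)
    return result


# Hypothetical niches: same means, different standard deviations (niche breadth).
SPECIALIST = {"dtm": (900.0, 50.0), "chm": (15.0, 2.0), "twi": (8.0, 0.5)}
GENERALIST = {"dtm": (900.0, 300.0), "chm": (15.0, 10.0), "twi": (8.0, 3.0)}

if __name__ == "__main__":
    # A cell slightly off the optimum on every variable.
    cell = {"dtm": 950.0, "chm": 17.0, "twi": 8.5}
    print("specialist:", round(suitability(cell, SPECIALIST), 3))
    print("generalist:", round(suitability(cell, GENERALIST), 3))
```

With identical means, the narrower standard deviations make the specialist's suitability fall off much faster for the same departure from the optimum.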
As noted in several studies (Meynard and Kaplan 2012, 2013, Moudrý 2015, Meynard et al. 2019), an appropriate setting of the whole simulation with respect to the research questions is crucial for obtaining reliable results. In addition, Meynard et al. (2019) highlighted that simulation studies based on the threshold approach fail in appropriately separating factors such as prevalence and niche breadth. Therefore, due to these concerns, we adopted a probabilistic simulation approach (logistic function with α = 0.05 and β = 0.3) to convert the environmental suitability rasters into probabilities of occurrences that were subsequently used to sample binary presence/absence rasters (function convertToPA). To sample species occurrences (function sampleOccurrences), we randomly generated, using a uniform random distribution, both presence-only and presence/absence data. Both types of occurrence datasets were generated in order to test different modelling techniques (cf. section Model fitting and evaluation). To test whether it is possible to compensate for the assumed negative effect of positional error with a higher sample size, we generated four different sample sizes. Specifically, 30, 100, 500 and 1000 species presences were generated, complemented for the purpose of GLM modelling by twice as many absences.
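The probabilistic conversion can be illustrated with a short Python sketch. It assumes a logistic curve in which β is the suitability at which the probability of occurrence equals 0.5 and α controls the steepness; the exact parameterization (in particular the sign convention of α) in the R package may differ, and the function names here are hypothetical.

```python
import math
import random


def occurrence_probability(suitability: float,
                           alpha: float = 0.05,
                           beta: float = 0.3) -> float:
    """Logistic conversion of suitability (0-1) to a probability of occurrence.

    Assumed form: 1 / (1 + exp(-(suitability - beta) / alpha)), so that the
    probability increases with suitability and equals 0.5 at beta.
    """
    return 1.0 / (1.0 + math.exp(-(suitability - beta) / alpha))


def sample_presence(suitability: float, rng: random.Random) -> bool:
    """Draw one presence/absence value with the logistic probability."""
    return rng.random() < occurrence_probability(suitability)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so that the sketch is repeatable
    for s in (0.1, 0.3, 0.5):
        draws = sum(sample_presence(s, rng) for _ in range(1000))
        print(f"suitability {s}: p = {occurrence_probability(s):.3f}, "
              f"{draws} presences in 1000 draws")
```

Unlike a hard threshold, cells with intermediate suitability are sometimes occupied and sometimes not, which is the property the probabilistic approach is chosen for.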
Simulating positional error in species occurrences
It is generally assumed that the magnitude of the positional error in species occurrence varies based on the source of the error. The positional error associated with GNSS points (e.g. species occurrences) may range from a few centimetres up to several metres. Furthermore, in some species such as birds or big predators, it is usually impossible to record their accurate position and such data are shifted by tens or hundreds of meters. An even greater shift is sometimes observed in museum databases. Therefore, to evaluate the range of possible magnitudes of the positional error, we simulated the positional error by shifting the sampled locations (i.e. presences and, in case of GLM, also absences) in a random direction according to six scenarios that corresponded to different distances ranging from 5–10 m up to 100–500 m. The error in the focal virtual species locations was 5–10 m for S1 scenario, 10–15 m for S2, 15–20 m for S3, 20–50 m for S4, 50–100 m for S5 and 100–500 m for S6 (Supplementary material Appendix 4 Table C1). Scenarios S1–S4 simulated realistic degrees of error if using modern monitoring technologies like GNSS, while scenarios S5–S6 simulated more extreme positional errors that could be associated with species observations recorded without GNSS, species difficult to pinpoint properly such as birds or big predators, or occurrences from museum databases. If the shifting of the original data points resulted in the points falling outside the study area, we recalculated the shift until the new coordinates were located within the boundaries of the study area. We provide a script of how we simulated virtual species and shifting occurrences in Supplementary material Appendix 2.
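The shifting procedure described above can be sketched in Python as follows. This is not the authors' script (which is provided in their Supplementary material); it assumes a rectangular study area and a distance drawn uniformly within each scenario's range, and the names `shift_point` and `SCENARIOS` are hypothetical.

```python
import math
import random

# Distance ranges (metres) of the six scenarios described in the text.
SCENARIOS = {"S1": (5, 10), "S2": (10, 15), "S3": (15, 20),
             "S4": (20, 50), "S5": (50, 100), "S6": (100, 500)}


def shift_point(x, y, d_min, d_max, bounds, rng, max_tries=1000):
    """Shift (x, y) in a random direction by a distance in [d_min, d_max].

    bounds = (xmin, ymin, xmax, ymax) is an assumed rectangular study area.
    The shift is redrawn until the new point falls inside the bounds.
    """
    xmin, ymin, xmax, ymax = bounds
    for _ in range(max_tries):
        angle = rng.uniform(0.0, 2.0 * math.pi)   # random direction
        dist = rng.uniform(d_min, d_max)          # assumed uniform distance
        nx = x + dist * math.cos(angle)
        ny = y + dist * math.sin(angle)
        if xmin <= nx <= xmax and ymin <= ny <= ymax:
            return nx, ny
    raise RuntimeError("no valid shift found inside the study area")


if __name__ == "__main__":
    rng = random.Random(42)
    area = (0.0, 0.0, 1000.0, 1000.0)
    for name, (lo, hi) in SCENARIOS.items():
        print(name, shift_point(500.0, 500.0, lo, hi, area, rng))
```

Redrawing the shift near the boundary keeps every occurrence inside the study area, at the cost of slightly favouring inward directions for points close to the edge.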
Model fitting and evaluation
We selected generalized linear models (GLM; Nelder and Baker 1972, Oksanen and Minchin 2002) as a presence/absence method and MaxEnt (Phillips et al. 2006) as a presence-background method that are often adopted in ecological studies (Moudrý and Šímová 2013, Linda et al. 2016, Malavasi et al. 2018, Gábor et al. 2019, Watts et al. 2019). In addition, Graham et al. (2008) showed that these two approaches were among the better performing modelling techniques when the data was affected by positional errors. Models were built in the statistical software R using the ‘dismo’ (ver. 1.1.4) and ‘glm2’ (ver. 1.2.1) packages. The GLM was run with a logit–link function and binomial distribution. The quadratic terms of the three environmental variables were included because of the known normal distribution curves of the response function. To enable the comparison of individual SDMs, we needed to maintain the parameters of MaxEnt unchanged, as done in many prior studies (Franklin et al. 2014, Fourcade et al. 2014, Holloway et al. 2016, Ranc et al. 2016, Tingley et al. 2018, Ye et al. 2018). The default settings established by Phillips et al. (2009) were used with randomly drawn background data generated from the binary map of the true occurrences of the virtual species. The same three environmental variables (DTM, CHM and TWI) used in the process of generating virtual species were used in the SDMs. Fivefold cross-validation where the data were randomly divided into fifths was used to evaluate the models. Four fifths of the data were used to train the model and the remaining one fifth was used to assess the performance. Control models without positional error were calculated for all three species with different niche breadth, prevalence and ROA and for both modelling techniques, allowing an easy comparison of the effect of positional error on model performance.
The area under the receiver operating characteristic curve (AUC) (Fielding and Bell 1997, Jiménez-Valverde 2012) and the true-skill statistic (TSS) (Allouche et al. 2006) were used to assess model performance (i.e. discrimination accuracy). AUC is widely used in ecological studies as a single threshold-independent measure of model performance (Václavík and Meentemeyer 2012, Mitchell et al. 2017). The AUC ranges from 0 to 1 where a score of 1 indicates perfect discrimination, a score of 0.5 indicates random performance and values lower than 0.5 indicate a worse than random performance. TSS is a frequently used threshold-dependent metric (Cianfrani et al. 2018, Eaton et al. 2018) taking both omission and commission errors into account. It ranges from −1 to +1 where +1 indicates perfect agreement and values of zero or less indicate random performance (Allouche et al. 2006).
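Both discrimination metrics can be written in a few lines of Python. The sketch below is illustrative and is not the code used in the study: AUC is computed in its rank-based form (the probability that a randomly chosen presence receives a higher score than a randomly chosen absence, with ties counted as one half), and TSS as sensitivity + specificity − 1 at a threshold supplied by the caller, because the text does not state how the threshold was chosen. The function names are hypothetical.

```python
def auc(scores_presence, scores_absence):
    """Rank-based AUC: P(presence score > absence score), ties count 0.5."""
    wins = 0.0
    for p in scores_presence:
        for a in scores_absence:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(scores_presence) * len(scores_absence))


def tss(scores_presence, scores_absence, threshold):
    """True-skill statistic: sensitivity + specificity - 1 at one threshold."""
    sensitivity = sum(s >= threshold for s in scores_presence) / len(scores_presence)
    specificity = sum(s < threshold for s in scores_absence) / len(scores_absence)
    return sensitivity + specificity - 1.0


if __name__ == "__main__":
    presences = [0.9, 0.7, 0.6, 0.4]   # predicted suitability at presences
    absences = [0.5, 0.3, 0.2, 0.1]    # predicted suitability at absences
    print("AUC:", auc(presences, absences))
    print("TSS at 0.45:", tss(presences, absences, 0.45))
```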
To quantify differences between the true probability of occurrence of virtual species and the predicted distribution inferred from the models in geographical space, their niche overlap was compared using the I measure (Warren et al. 2008, Rödder and Engler 2011) and Spearman’s rank correlation. The I ranges between 0 (no overlap) and 1 (perfect overlap). Following Rödder and Engler (2011), we used the following classes to interpret the results: no or very limited overlap (0–0.2), low overlap (0.2–0.4), moderate overlap (0.4–0.6), high overlap (0.6–0.8) and very high overlap (0.8–1.0). Spearman’s rank correlation ranges between −1 and +1, where −1 indicates that species responses to the environment are exactly negatively correlated (opposite) and +1 indicates perfectly positively correlated overlap (identical). The closer the values are to zero, the lower is the niche overlap.
The magnitude of the negative effect of the positional error on SDMs is dependent on the size of the positional error and distribution of species’ suitable environment in the geographical space (Naimi et al. 2011). The positional data may be shifted in the geographical space and even a relatively low positional error in geographical space can have a profound effect on environmental niche estimates in environmental space and vice versa. Furthermore, we expected this would be related to the species niche breadth. Therefore, we were also interested in how the positional error is manifested in the environmental space and measured the niche overlap in the environmental space as well. We used I and Spearman’s rank correlation implemented in ENMTools 0.2 (Warren et al. 2019a, b) to estimate overlap in the environmental space between models fitted with accurate occurrences without any positional error (hereafter unaltered models) and models fitted with shifted occurrences (i.e. scenarios S1–S6).
We ran the entire process from species generation to model evaluation 30 times (Fig. 1). In addition, we used the analysis of variance (ANOVA) to assess the strength of the individual effects of the positional error, sample size, ROA and modelling technique, including all possible interactions. We compared the relative importance of individual predictors based on their contribution to the overall explained variation (R²). Instead of formal testing, we plotted the effects (and their confidence intervals) of all predictor combinations and evaluated them qualitatively. Because both AUC and TSS values were highly heteroscedastic (e.g. the ratio between maximum and minimum standard deviation across all factor combinations was 22 for AUC and 19 for TSS), we used the robust variance–covariance matrix estimator suggested by MacKinnon and White (1985) for computation of confidence intervals. This was done using the R package ‘sandwich’ (Zeileis 2006).
Results
Unaltered models
Both performance metrics (AUC and TSS) largely followed the same pattern and highlighted excellent model performance for all, i.e. specialist, intermediate and generalist, species (AUC ranged from 0.91 up to 0.97 for MaxEnt models and from 0.80 up to 0.85 for GLM models). The only exceptions were the MaxEnt models for generalist species where AUC achieved only good performance (mean AUC 0.73). MaxEnt models were more successful in modelling specialist and intermediate species while GLM models were more accurate for the generalist species (Fig. 2).
Models achieved high or very high niche overlaps in geographical space according to both I and Spearman’s rank correlation. In general, the niche overlap decreased in the following order: generalist, specialists and intermediate species, except for the Spearman’s rank correlation for specialists modelled by MaxEnt that achieved very high correlation. Comparison of modelling techniques showed that MaxEnt models achieved a higher niche overlap than GLM for all species with the most obvious differences in specialist species. An increase in the sample size of unaltered models led to no or only a negligible increase in niche overlap (Fig. 3).
Figure 1. General modelling process. (i) We first acquired and processed LiDAR data and selected three fine-scale environmental predictors: DTM, CHM and TWI. (ii) We simulated virtual species with different niche breadths (ROA) by defining their response to environmental gradients for each environmental variable. (iii) We multiplied those variables to generate environmental suitability (‘true’ distribution of virtual species). (iv) We translated the probability of species occurrence to a presence–absence raster. (v) We sampled occurrences based on the presence–absence raster. (vi) We simulated the positional error in species occurrences. (vii) We generated SDMs with accurate as well as shifted occurrences, evaluated their performances (AUC, TSS) and assessed the niche overlap (I, Spearman’s rank correlation) in the geographical and environmental space.
Figure 2. Resulting AUC (A) and TSS (B) scores according to different species niche breadth (specialist, intermediate, generalist), positional error (S0, unaltered models; S1, 5–10 m; S2, 10–15 m; S3, 15–20 m; S4, 20–50 m; S5, 50–100 m; S6, 100–500 m) and sample size (number of presences = 30, 100, 500, 1000; note that for GLM models twice as many absences compared to presences were generated). Black colour shows results for GLM models while grey shows results for MaxEnt models.
Effect of positional error on models of species with different niche breadth
Results show, independently of the modelling technique, a clear trend of the positional error worsening model performance (both AUC and TSS). The highest drop is evident between unaltered models and models affected by the smallest simulated positional error (5–10 m). Increasing the positional error further led to an additional decrease in model performance; however, this decrease was minimal (positional error 10–50 m). Even the extreme cases of positional error (50–100 and 100–500 m) led to a relatively low decrease in model performance in contrast to the drop caused by the 5–10 m error. For example, in the case of MaxEnt models for intermediate species, AUC dropped on average from 0.91 (unaltered models) to 0.79 for the positional error of magnitude inherent to any occurrence data (i.e. up to 10 m), and to 0.71 in the case of the extreme positional error (100–500 m) (Fig. 2). Nevertheless, the magnitude of the negative effect of positional error varied according to the species niche breadth. For both GLM and MaxEnt models, the drop between unaltered models and the smallest simulated positional error (5–10 m) was higher for specialist and intermediate species (AUC dropped on average by about 0.12) than for generalist species (AUC dropped on average by about 0.05).
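As a rough illustration of how the error classes S1–S6 could be imposed on accurate occurrences, the following minimal sketch shifts each point by a random distance within the class limits in a random direction. The function names, the uniform sampling of distance and bearing, and the use of projected coordinates in metres are assumptions made for this example; this section does not state the exact procedure used in the study.

```python
import math
import random

# Error classes (metres) as listed in the Figure 2 caption.
ERROR_CLASSES = {
    "S1": (5, 10), "S2": (10, 15), "S3": (15, 20),
    "S4": (20, 50), "S5": (50, 100), "S6": (100, 500),
}


def shift_point(x, y, d_min, d_max, rng):
    """Move (x, y) by a random distance in [d_min, d_max] in a random direction.

    Assumption: distance and bearing are both drawn uniformly.
    """
    distance = rng.uniform(d_min, d_max)
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    return x + distance * math.cos(bearing), y + distance * math.sin(bearing)


def shift_occurrences(points, scenario, seed=0):
    """Apply one error class to every occurrence (projected coordinates, metres)."""
    d_min, d_max = ERROR_CLASSES[scenario]
    rng = random.Random(seed)
    return [shift_point(x, y, d_min, d_max, rng) for x, y in points]
```

With a 5 m grid, even the S1 class moves every occurrence by at least one cell, which is consistent with the largest drop in performance appearing between S0 and S1.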
The results showed that the positional error in the occurrence data reduced the niche overlap in both the geographical and environmental space for both GLM and MaxEnt models. Niche overlap decreased gradually with increasing positional error, with an especially significant decrease in models’ niche overlap at the extreme positional error (100–500 m) (Fig. 3, 4). However, the effect of the positional error on the niche overlap varied depending on species’ niche breadth. The decrease in niche overlap was higher for specialist and intermediate species than for generalist species, especially in the geographical space. For example, in the case of MaxEnt models, Spearman’s rank correlation was reduced from 0.98 to 0.58 for the specialist and from 0.83 to 0.70 for the generalist species (Fig. 3). However, the effect of the positional error was not that evident from I, especially for the generalist species in geographical space. For example, the decrease for generalist species and MaxEnt models was on average only from 0.96 to 0.9, and the GLM models appeared not to be affected at all.
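The two overlap measures reported here can be sketched in a few lines. The block below is an illustrative pure-Python version that treats two prediction surfaces as equal-length lists of cell values; the function names are hypothetical, and it is not the ENMTools implementation cited in the references. Warren’s I is computed as 1 minus half the squared Hellinger distance between the normalised surfaces, and Spearman’s correlation as the Pearson correlation of the ranks.

```python
import math


def warren_i(p, q):
    """Warren's I between two suitability surfaces (equal-length lists)."""
    sp, sq = sum(p), sum(q)
    # Normalise each surface so that its cell values sum to 1.
    hellinger_sq = sum(
        (math.sqrt(a / sp) - math.sqrt(b / sq)) ** 2 for a, b in zip(p, q)
    )
    return 1.0 - 0.5 * hellinger_sq


def _average_ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(p, q):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rp, rq = _average_ranks(p), _average_ranks(q)
    n = len(rp)
    mp, mq = sum(rp) / n, sum(rq) / n
    cov = sum((a - mp) * (b - mq) for a, b in zip(rp, rq))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_q = sum((b - mq) ** 2 for b in rq)
    return cov / math.sqrt(var_p * var_q)
```

Because I compares normalised values while the rank correlation compares only their ordering, the two metrics can respond differently to the same shift in predictions, which may be one reason the effect was less evident from I for generalists.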
Finally, independently of the validation metric, results showed that increasing the sample size cannot compensate for the effect of positional error (Fig. 2–4). On the contrary, it is evident that the combination of a low sample size (30 samples) with positional error led to erratic behaviour and generally low performance of the models.
Comparison of the relative importance of individual predictors (R²)
The results show that the positional error and modelling technique had the highest relative importance (R²) for the model performance (AUC, TSS). The relative importance of the sample size and niche breadth was much smaller and mutually comparable (Table 2). According to the niche overlap in geographical space assessed by I (model predictions), niche breadth had the greatest effect, followed by the positional error, modelling technique and sample size, the importance of which was almost negligible. In contrast, according to correlations, the modelling technique and positional error had the highest relative importance (R²), followed by the niche breadth and by sample size, the importance of which was minimal. When assessing relative importance for niche overlap in the environmental space, the modelling technique and positional error showed the highest contribution, followed by the niche breadth and by sample size, the importance of which was almost negligible, just as in the above metrics. All those factors significantly affected SDM performance and predictions (p-value < 0.05).
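One common way to express relative importance as a percentage is the share of the total sum of squares attributed to each factor. The sketch below assumes a balanced design, so that main-effect sums of squares follow from level means alone, and ignores interactions; the function name and the row layout are hypothetical, and the study’s actual ANOVA specification is not given in this section.

```python
from collections import defaultdict


def relative_importance(rows, factors, response):
    """Percentage of the total sum of squares explained by each main effect.

    Assumption: a balanced design; interaction terms stay in the remainder.
    """
    values = [row[response] for row in rows]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)

    result = {}
    for factor in factors:
        groups = defaultdict(list)
        for row in rows:
            groups[row[factor]].append(row[response])
        # Between-level sum of squares for this factor.
        ss_factor = sum(
            len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
        )
        result[factor] = 100.0 * ss_factor / ss_total
    return result
```

Each row would be one fitted model, for example `{"error": "S1", "technique": "GLM", "auc": 0.79}`, and the percentages of all factors, interactions and residuals together sum to 100.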
Discussion
In this study, we focused on the effect of positional error in species occurrences on fine-scale SDMs. We simulated species with different levels of niche breadth to assess whether there was a link between the width of the environmental niche and the effect of the size of positional error. Our results showed that introducing positional error into species occurrence data led to a decrease in model performance and prediction accuracy in both the geographical and environmental space. However, the effect of the positional error varied with species niche breadth. The same positional error had a greater impact on specialist (low ROA and prevalence, narrow breadth of niche) than on generalist (high ROA and prevalence, wide breadth of niche) species. This is likely because, in the case of specialist species, occurrences could be easily shifted to inappropriate environments outside of the species’ environmental niche. This could also explain the inconsistent conclusions of previous studies (Graham et al. 2008, Fernandez et al. 2009).
Higher sample sizes slightly improved the accuracy of unaltered models; however, the results showed that increasing the sample size could not compensate for the effect of positional error on models’ accuracy (Fig. 2–4). On the other hand, low sample sizes of positionally inaccurate data were especially problematic for modelling. These results are in general agreement with the study by Mitchell et al. (2017), who investigated the influence of sample size (ranging from 100 to 400 samples) in conjunction with the positional error; their results showed that models based on smaller sample sizes were more affected by positional error than those with higher numbers of species occurrences. However, it is difficult to conclude whether 100 records with a positional error of 10 m are better or worse for modelling at the scale of 5 m than 500 records with a positional error of 25 m. For example, Moudrý and Šímová (2012) suggested that the spatial resolution of the environmental data should be coarser than the biggest positional error of the occurrence data, and Naimi et al. (2011) showed that the effect of positional error is reduced by spatial autocorrelation in environmental variables. However, the trade-off between the scale and positional error has not been thoroughly studied.
Figure 3. Resulting I (A) and Spearman’s rank correlation (B) scores of niche overlap in geographical space according to different species niche breadth (specialist, intermediate, generalist), positional error (S0, unaltered models; S1, 5–10 m; S2, 10–15 m; S3, 15–20 m; S4, 20–50 m; S5, 50–100 m; S6, 100–500 m) and sample size (number of presences = 30, 100, 500, 1000; note that for GLM models twice as many absences as presences were generated). Black colour shows results for GLM models while grey shows results for MaxEnt models.
Figure 4. Resulting I (A) and Spearman’s rank correlation (B) scores of niche overlap in the environmental space according to different species niche breadth (specialist, intermediate, generalist), positional error and sample size (number of presences = 30, 100, 500, 1000; note that for GLM models, twice as many absences as presences were generated). Also note that here we show the niche overlap between unaltered models and models affected by a specified positional error (and not a comparison with simulated probability of occurrences as in Fig. 3). Thus, for example, S1 shows a comparison of niche overlap between unaltered models and models affected by positional error in the range of 5–10 m. Black colour shows results for GLM models while grey shows results for MaxEnt models.
The degree of decrease between unaltered and altered models (i.e. those with positional error) differed among the adopted validation metrics; assuming a sufficiently large sample size, AUC and TSS provided clear evidence of decreasing model quality. The ability of evaluation metrics to identify the magnitude of error caused by positional inaccuracies was previously discussed by Osborne and Leitão (2009). Interestingly, they found that the use of AUC for error quantification in models affected by positional error was limited, as AUC did not decrease when compared to the control models. We hypothesize that this contradiction results from confounding effects of the real data used in their study (i.e. they did not use virtual species). In Osborne and Leitão (2009), the modelling algorithms were allowed to choose the best combination of environmental variables from a set of twelve variables for scenarios with different levels of positional error. Indeed, they showed that positional error led to alteration of the variables selected by the modelling algorithm. The selected variables, however, often failed to represent the conditions pertinent to the species during habitat selection. In contrast, here we used the same variables throughout, both to generate the virtual species and to model their distribution. Hence, our modelling approaches (GLM, MaxEnt) did not have the option to select variables that would provide a closer fit to the altered occurrence data but lacked ecological relevance, and as a result the altered data did not lead to a spurious increase in AUC and TSS values. We suggest that the effect of positional error on the selection of environmental variables should be further investigated.
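For readers less familiar with the two performance metrics compared above, a minimal sketch follows. TSS is sensitivity plus specificity minus one (Allouche et al. 2006) and requires a threshold to convert predicted probabilities into presences and absences; AUC is computed here as the proportion of presence–absence pairs in which the presence receives the higher score. The function names are hypothetical and the threshold is left to the caller, because the thresholding rule used in the study is not stated in this section.

```python
def tss(observed, predicted, threshold):
    """True skill statistic = sensitivity + specificity - 1."""
    tp = fn = tn = fp = 0
    for obs, score in zip(observed, predicted):
        hit = score >= threshold
        if obs and hit:
            tp += 1
        elif obs:
            fn += 1
        elif hit:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


def auc(observed, predicted):
    """AUC as the share of presence/absence pairs ranked correctly (ties count 0.5)."""
    presences = [s for o, s in zip(observed, predicted) if o]
    absences = [s for o, s in zip(observed, predicted) if not o]
    wins = sum((p > a) + 0.5 * (p == a) for p in presences for a in absences)
    return wins / (len(presences) * len(absences))
```

Both metrics depend only on how predictions rank or classify the evaluation points, so a model refitted with different, ecologically irrelevant variables can score well even when its occurrences are misplaced.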
e eects discussed above raise serious concerns as it is
possible that the use of positionally inaccurate data com-
bined with an arbitrary selection of environmental variables
that may lack ecological relevance results in seemingly accu-
rate but entirely wrong models. For instance, Fourcadeetal.
(2018) successfully tted SDMs with non-ecological vari-
ables such as paintings to demonstrate this point. While
Osborne and Leitão (2009) and Mitchellet al. (2017) sug-
gested that useful predictions can still be generated from data
aected by positional error, they warned that the ecological
interpretation of such data and predictions was dangerous.
Our results support the importance of assessing data in terms
of tness-for-use (Lecours 2017). Fitness-for-use is the con-
cept of determining whether or not a dataset is of sucient
quality for a particular purpose (Goodchild 2006). Spatial
scale is intrinsically linked to such assessment of tness-for-
use (Lecoursetal. 2017) as data accuracy is dependent on
the spatial resolution of the environmental data. As indicated
by Moudrý and Šímová (2012), the spatial resolution of the
environmental data should always be coarser than the largest
positional error associated with occurrence data.
In line with previous work (Van Niel and Austin 2007, Rocchini et al. 2011, Lecours et al. 2017), we believe that attempts to predict species distributions with data of unknown accuracy are potentially dangerous and, as such, we highlight the necessity of quantifying the positional accuracy of data. If such assessment is limited by metadata availability, for example in the case of historical data, we recommend at least approximating the positional accuracy based on known information such as the collection methodology or the number of decimals recorded with coordinates. With a proper fitness-for-use assessment that includes data quality and scale, the resolution of environmental variables can be coarsened before they are integrated into a modelling exercise to minimize the adverse effects of the positional error of species occurrences. However, we are aware that this may involve altering the spatial resolution of data to a level that is no longer eligible for potentially optimal resolution(s), i.e. the scale at which species respond to the environment (Lecours et al. 2015, Moudrý et al. 2019). As demonstrated in Lecours et al. (2017), there is a trade-off between spatial scale and data quality that needs to be evaluated as a part of the fitness-for-use assessment. While no experiments are currently available to help quantify which is more important for successful modelling (data quality or scale), we suggest that pre-analyses be performed to test whether keeping a finer resolution is more important than minimizing positional error, or vice versa. For new surveys, we suggest paying close attention to measurement techniques to minimize positional error, for instance by using differential GNSS, especially for species with a narrow ecological niche, as our results show that the positional error of species occurrence data has a profound effect on the results of SDMs. Finally, we advocate for additional studies focused on the influence of positional error using more complex virtual species (e.g. with a higher number of environmental variables or with more complex response curves) to improve SDM use in ecology, macroecology and biogeography.
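Two of the checks suggested above lend themselves to a small worked example: approximating positional precision from the number of coordinate decimals, and finding how much a raster would need to be coarsened so that its cells exceed the largest positional error. The sketch below is illustrative only. The function names are hypothetical, and the constant of roughly 111 km per degree of latitude is a common approximation that ignores the latitude dependence of longitude.

```python
import math

# Rough length of one degree of latitude in metres (approximation).
METRES_PER_DEGREE = 111_320.0


def precision_from_decimals(decimals):
    """Approximate ground size (m) of the last recorded decimal place of latitude."""
    return METRES_PER_DEGREE / (10 ** decimals)


def aggregation_factor(cell_size_m, max_error_m):
    """Smallest integer factor that makes the aggregated cell coarser than the error."""
    return max(1, math.floor(max_error_m / cell_size_m) + 1)
```

For example, coordinates recorded to four decimals resolve positions to roughly 11 m, and a 5 m raster paired with occurrences carrying up to 10 m of error would need aggregating by a factor of 3 (to 15 m cells) to satisfy the rule that resolution be coarser than the largest error.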
Table 2. Comparison of the relative importance of individual factors (R², %) for ANOVA of performance metrics (AUC, TSS) and niche overlap in the geographical and environmental spaces (I, correlation).
| Factor | AUC | TSS | I, geographical space | Correlation, geographical space | I, environmental space | Correlation, environmental space |
| --- | --- | --- | --- | --- | --- | --- |
| ROA | 4 | 4.14 | 75 | 11.2 | 9.7 | 1.7 |
| Sample size | 1.1 | 1.78 | 0.1 | 1 | 0.2 | 0.4 |
| Modelling technique | 18.7 | 21.35 | 8 | 24.7 | 45.4 | 21.5 |
| Positional error | 25.4 | 24.58 | 8.4 | 27.5 | 13.2 | 18.3 |
Conclusions
In this study, we explored how positional error in species occurrences affects fine-scale SDMs. We showed that the influence of positional error on SDMs differed according to the width of species’ ecological niches and that this effect was evident in both geographical and environmental space. The effect of the positional error on generalist species was much smaller than the effect on specialist species, which were affected the most. In addition, our results show that the negative effects of positionally inaccurate data entering SDMs cannot be mitigated by increasing the sample size. Therefore, a take-away message of our study is that improving the positional accuracy of data appears to be more effective than increasing sample size. We suggest that it is critical to evaluate the quality of data with respect to the spatial resolution of the environmental variables and to select occurrences with a low positional error (note that a low positional error can be even 1 km if the spatial resolution of environmental variables is of similar size). Future research should focus on the influence of positional error using more complex virtual species (e.g. with a higher number of environmental variables or with more complex response curves) and on how positional error may affect the selection of variables in species distribution modelling, to improve its future application in ecology, macroecology and biogeography.
Data availability statement
Using our methods, species occurrence data may be artificially generated using the virtualspecies package in R. The LiDAR data are owned by Krkonose Mountains National Park and are available upon request for research purposes.
Acknowledgements – The authors would like to thank the Krkonose Mountains National Park for providing LiDAR data. We greatly appreciate the contribution of the subject editor and both reviewers.
Funding – This research was funded by the Internal Grant Agency of Faculty of Environmental Sciences, Czech Univ. of Life Sciences Prague, grant no. 20174241 and no. 20194224. VM, VB and MF were also supported by the Czech Science Foundation (project no. 17-17156Y).
Author contributions – All authors contributed substantially to the work. VM and TV are authors of the main idea of the research and supervised the whole research. LG and VB performed all GIS and statistical analyses. VB supervised statistical analyses. MF processed LiDAR data. LG wrote the first draft of the manuscript. VL, MM, PŠ and DR helped to improve the manuscript. All authors gave final approval for publication.
References
Allouche, O.etal. 2006. Assessing the accuracy of species distribu-
tion models: prevalence, kappa and the true skill statistic (TSS).
– J. Appl. Ecol. 43: 1223–1232.
Araújo, M. B. et al. 2019. Standards for distribution models in
biodiversity assessments. – Sci. Adv. 5: eaat4858.
Arribas, P.etal. 2012. Dispersal ability rather than ecological toler-
ance drives dierences in range size between lentic and lotic
water beetles (Coleoptera: Hydrophilidae). – J. Biogeogr. 39:
984–994.
BarbetMassin, M.etal. 2012. Selecting pseudoabsences for spe-
cies distribution models: how, where and how many? – Methods
Ecol. Evol. 3: 327–338.
Besnard, A. G.etal. 2013. Topographic wetness index predicts the
occurrence of bird species in oodplains. – Divers. Distrib. 19:
955–963.
Beven, K. J. and Kirkby, M. J. 1979. A physically based, variable
contributing area model of basin hydrology. – Hydrol. Sci. J.
24: 43–69.
Boulangeat, I. et al. 2012. Niche breadth, rarity and ecological
characteristics within a regional ora spanning large environ-
mental gradients. – J. Biogeogr. 39: 204–214.
Breiner, F. T.etal. 2015. Overcoming limitations of modelling rare
species by using ensembles of small models. – Methods Ecol.
Evol. 6: 1210–1218.
Brown, J. H. 1984. On the relationship between abundance and
distribution of species. – Am. Nat. 124: 255–279.
Bulluck, L.et al. 2006. Spatial and temporal variations in species
occurrence rate aect the accuracy of occurrence models.
– Global Ecol. Biogeogr. 15: 27–38.
Chefaoui, R. M.etal. 2011. Eects of species’ traits and data char-
acteristics on distribution models of threatened invertebrates.
– Anim. Biodivers. Conserv. 34: 229–247.
Cianfrani, C.etal. 2018. More than range exposure: global otter
vulnerability to climate change. – Biol. Conserv. 221: 103–113.
Connor, T.etal. 2018. Eects of grain size and niche breadth on
species distribution modeling. – Ecography 41: 1270–1282.
Conrad, O. 2003. Module topographic wetness index (SAGA).
– Version 2.1.3.
Coops, N. C.etal. 2010. Assessing the utility of LiDAR remote
sensing technology to identify mule deer winter habitat. – Can.
J. Remote Sens. 36: 81–88.
Davies, A. B. and Asner, G. P. 2014. Advances in animal ecology
from 3D-LiDAR ecosystem mapping. – Trends Ecol. Evol. 29:
681–691.
Devictor, V.et al. 2010. Dening and measuring ecological spe-
cialization. – J. Appl. Ecol. 47: 15–25.
Eaton, S.etal. 2018. Adding small species to the big picture: spe-
cies distribution modelling in an age of landscape scale conser-
vation. – Biol. Conserv. 217: 251–258.
Evangelista, P. H.etal. 2008. Modelling invasion for a habitat gen-
eralist and a specialist plant species. – Divers. Distrib. 14:
808–817.
Fernandes, R. F.etal. 2018. How much should one sample to accu-
rately predict the distribution of species assemblages? A virtual
community approach. – Ecol. Inform. 48: 125–134.
Fernandez, M.etal. 2009. Locality uncertainty and the dierential
performance of four common niche-based modeling tech-
niques. – Biodivers. Inform. 6: 36–52.
Ferrier, S.etal. 2017. Biodiversity modelling as part of an observa-
tion system. – In: Walters, M. and Scholers, R. (eds), e GEO
handbook on biodiversity observation networks. Springer, pp.
239–257.
Fielding, A. H. and Bell, J. F. 1997. A review of methods for the
assessment of prediction errors in conservation presence/absence
models. – Environ. Conserv. 24: 38–49.
Fourcade, Y.etal. 2014. Mapping species distributions with MAX-
ENT using a geographically biased sample of presence data: a
268
performance assessment of methods for correcting sampling
bias. – PLoS One 9: e97122.
Fourcade, Y.etal. 2018. Paintings predict the distribution of spe-
cies, or the challenge of selecting environmental predictors and
evaluation statistics. – Global Ecol. Biogeogr. 27: 245–256.
Frair, J. L. etal. 2010. Resolving issues of imprecise and habitat-
biased locations in ecological analyses using GPS telemetry
data. – Phil. Trans. R. Soc. B 365: 2187–2200.
Franklin, J. 2010. Mapping species distributions: spatial inference
and prediction. – Cambridge Univ. Press.
Franklin, J.etal. 2014. Linking spatially explicit species distribu-
tion and population models to plan for the persistence of plant
species under global change. – Environ. Conserv. 41: 97–109.
Futuyma, D. J. and Moreno, G. 1988. e evolution of ecological
specialisation. – Annu. Rev. Ecol. Syst. 207–233.
Gábor, L.etal. 2019. How do species and data characteristics aect
species distribution models and when to use environmental l-
tering? – Int. J. Geogr. Inform. Sci. doi:
10.1080/13658816.2019.1615070
Gaston, K. J.etal. 1997. Interspecic abundance range size rela-
tionships: an appraisal of mechanisms. – J. Anim. Ecol. 66:
579–601.
Goodchild, M. F. 2006. Fundamentals of spatial data quality.
– ISTE, London.
Gottschalk, T. K. et al. 2011. Inuence of grain size on species–
habitat models. – Ecol. Model. 222: 3403–3412.
Graham, C. H.etal. 2008. e inuence of spatial errors in species
occurrence data used in distribution models. – J. Appl. Ecol.
45: 239–247.
Hijmans, R. J. 2012. Crossvalidation of species distribution mod-
els: removing spatial sorting bias and calibration with a null
model. – Ecology 93: 679–688.
Holloway, P.etal. 2016. Incorporating movement in species distri-
bution models: how do simulations of dispersal aect the accu-
racy and uncertainty of projections? – Int. J. Geogr. Inform.
Sci. 30: 2050–2074.
JiménezValverde, A. 2012. Insights into the area under the receiver
operating characteristic curve (AUC) as a discrimination meas-
ure in species distribution modelling. – Global Ecol. Biogeogr.
21: 498–507.
Johnson, C. J. and Gillingham, M. P. 2008. Sensitivity of species-
distribution models to error, bias and model design: an applica-
tion to resource selection functions for woodland caribou.
– Ecol. Model. 213: 143–155.
Khosravipour, A.etal. 2016. Generating spike-free digital surface
models using LiDAR raw point clouds: a new approach for
forestry applications. – Int. J. Appl. Earth Observ. Geoinform.
52: 104–114.
Kopecký, M. and Čížková, Š. 2010. Using topographic wetness
index in vegetation ecology: does the algorithm matter? – Appl.
Veg. Sci. 13: 450–459.
Lecours, V. 2017. On the use of maps and models in conservation
and resource management (warning: results may vary). – Front.
Mar. Sci. 4: 1–18.
Lecours, V. et al. 2015. Spatial scale and geographic context in
benthic habitat mapping: review and future directions. – Mar.
Ecol. Progr. Ser. 535: 259–284.
Lecours, V.etal. 2017. Artefacts in marine digital terrain models:
a multiscale analysis of their impact on the derivation of terrain
attributes. – IEEE Trans. Geosci. Remote Sens. 55: 5391–5406.
Lefsky, M. A. et al. 2002. LiDAR remote sensing for ecosystem
studies: LiDAR, an emerging remote sensing technology that
directly measures the three-dimensional distribution of plant
canopies, can accurately estimate vegetation structural attrib-
utes and should be of particular interest to forest, landscape and
global ecologists. – BioScience 52: 19–30.
Leroy, B.etal. 2016. virtualspecies, an R package to generate virtual
species distributions. – Ecography 39: 599–607.
Leroy, B.etal. 2018. Without quality presence–absence data, dis-
crimination metrics such as TSS can be misleading measures of
model performance. – J. Biogeogr. 45: 1994–2002.
Linda, R. et al. 2016. Developing a criterion for distinguishing
tetraploid birch species from diploid and modelling their poten-
tial distribution on the Czech Republic. – In: Kacálek, D.etal.
(eds), Proceedings of central European silviculture, pp. 71–77.
Lobo, J. M. 2008. More complex distribution models or more rep-
resentative data? – Biodivers. Inform. 5: 14–19.
Luoto, M.etal. 2005. Uncertainty of bioclimate envelope models
based on the geographical distribution of species. – Global Ecol.
Biogeogr. 14: 575–584.
MacKinnon, J. G. and White, H. 1985. Some heteroskedasticity-
consistent covariance matrix estimators with improved nite
sample properties. – J. Economet. 29: 305–325.
Malavasi, M.etal. 2018. Plant invasions in Italy: an integrative
approach using the European LifeWatch infrastructure data-
base. – Ecol. Indic. 91: 182–188.
McPherson, J. M. and Jetz, W. 2007. Eects of species’ ecology
on the accuracy of distribution models. – Ecography 30:
135–151.
Meynard, C. N. and Kaplan, D. M. 2012. e eect of a gradual
response to the environment on species distribution modeling
performance. – Ecography 35: 499–509.
Meynard, C. N. and Kaplan, D. M. 2013. Using virtual species to
study species distributions and model performance. – J. Bioge-
ogr. 40: 1–8.
Meynard, C. N.etal. 2019. Testing methods in species distribution
modelling using virtual species: what have we learnt and what
are we missing? – Ecography doi: 10.1111/ecog.04385
Mitchell, P. J.etal. 2017. Sensitivity of nescale species distribu-
tion models to locational uncertainty in occurrence data across
multiple sample sizes. – Methods Ecol. Evol. 8: 12–21.
Moudrý, V. 2015. Modelling species distributions with simulated
virtual species. – J. Biogeogr. 42: 1365–1366.
Moudrý, V. and Šímová, P. 2012. Inuence of positional accuracy,
sample size and scale on modelling species distributions: a
review. – Int. J. Geogr. Inform. Sci. 26: 2083–2095.
Moudrý, V. and Šímová, P. 2013. Relative importance of climate,
topography and habitats for breeding wetland birds with dier-
ent latitudinal distributions in the Czech Republic. – Appl.
Geogr. 44: 165–171.
Moudrý, V.etal. 2017. Which breeding bird categories should we
use in models of species distribution? – Ecol. Indic. 74:
526–529.
Moudrý, V.etal. 2018. On the use of global DEMs in ecological
modelling and the accuracy of new bare-earth DEMs. – Ecol.
Model. 383: 3–9.
Moudrý, V.etal. 2019. Potential pitfalls in rescaling digital terrain
model-derived attributes for ecological studies. – Ecol. Inform.
54: 100987.
Naimi, B.etal. 2011. Spatial autocorrelation in predictors reduces
the impact of positional uncertainty in occurrence data on spe-
cies distribution modelling. – J. Biogeogr. 38: 1497–1509.
Nelder, J. A. and Baker, R. J. 1972. Generalized linear models.
– Wiley.
Oksanen, J. and Minchin, P. R. 2002. Continuum theory revisited: what shape are species responses along ecological gradients? – Ecol. Model. 157: 119–129.
Osborne, P. E. and Leitão, P. J. 2009. Effects of species and habitat positional errors on the performance and interpretation of species distribution models. – Divers. Distrib. 15: 671–681.
Peers, M. J. et al. 2012. Reconsidering the specialist–generalist paradigm in niche breadth dynamics: resource gradient selection by Canada lynx and bobcat. – PLoS One 7: e51488.
Phillips, S. J. et al. 2006. Maximum entropy modelling of species geographic distributions. – Ecol. Model. 190: 231–259.
Phillips, S. J. et al. 2009. Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. – Ecol. Appl. 19: 181–197.
Qiao, H. et al. 2015. Marble algorithm: a solution to estimating ecological niches from presence-only records. – Sci. Rep. 5: 14232.
Quinn, P. F. B. J. et al. 1991. The prediction of hillslope flow paths for distributed hydrological modelling using digital terrain models. – Hydrol. Process. 5: 59–79.
Ranc, N. et al. 2016. Performance tradeoffs in target-group bias correction for species distribution models. – Ecography 40: 1076–1087.
Rattray, A. et al. 2014. Quantification of spatial and thematic uncertainty in the application of underwater video for benthic habitat mapping. – Mar. Geodesy 37: 315–336.
Reif, J. et al. 2018. Competition-driven niche segregation on a landscape scale: evidence for escaping from syntopy towards allotopy in two coexisting sibling passerine species. – J. Anim. Ecol. 87: 774–789.
Rocchini, D. et al. 2011. Accounting for uncertainty when mapping species distributions: the need for maps of ignorance. – Progr. Phys. Geogr. 35: 211–226.
Rödder, D. and Engler, J. O. 2011. Quantitative metrics of overlaps in Grinnellian niches: advances and possible drawbacks. – Global Ecol. Biogeogr. 20: 915–927.
Šímová, P. et al. 2019. Fine scale waterbody data improve prediction of waterbird occurrence despite coarse species data. – Ecography 42: 511–520.
Slatyer, R. A. et al. 2013. Niche breadth predicts geographical range size: a general ecological pattern. – Ecol. Lett. 16: 1104–1114.
Tingley, R. et al. 2018. Integrating transport pressure data and species distribution models to estimate invasion risk for alien stowaways. – Ecography 41: 635–646.
Václavík, T. and Meentemeyer, R. K. 2012. Equilibrium or not? Modelling potential distribution of invasive species in different stages of invasion. – Divers. Distrib. 18: 73–83.
Van Niel, K. P. and Austin, M. P. 2007. Predictive vegetation modelling for conservation: impact of error propagation from digital elevation data. – Ecol. Appl. 17: 266–280.
Vogeler, J. C. et al. 2014. Terrain and vegetation structural influences on local avian species richness in two mixed-conifer forests. – Remote Sens. Environ. 147: 13–22.
Warren, D. L. et al. 2008. Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. – Evolution 62: 2868–2883.
Warren, D. L. et al. 2019a. Evaluating species distribution models with discrimination accuracy is uninformative for many applications. – BioRxiv 684399.
Warren, D. L. et al. 2019b. danlwarren/ENMTools: initial beta release. – Package ver. 0.2, Zenodo, <https://github.com/danlwarren/ENMTools>.
Watts, S. M. et al. 2019. Modelling potential habitat for snow leopards (Panthera uncia) in Ladakh, India. – PLoS One 14: e0211509.
Wieczorek, J. et al. 2004. The point-radius method for georeferencing locality descriptions and calculating associated uncertainty. – Int. J. Geogr. Inform. Sci. 18: 745–767.
Wisz, M. S. et al. 2008. Effects of sample size on the performance of species distribution models. – Divers. Distrib. 14: 763–773.
Ye, X. et al. 2018. Impacts of future climate and land cover changes on threatened mammals in the semi-arid Chinese Altai Mountains. – Sci. Total Environ. 612: 775–787.
Zeileis, A. 2006. Object-oriented computation of sandwich estimators. – J. Stat. Softw. 16: 1–16.
Zhang, G. et al. 2018. A heuristic-based approach to mitigating positional errors in patrol data for species distribution modeling. – Trans. GIS 22: 202–216.
Zurell, D. et al. 2010. The virtual ecologist approach: simulating data and observers. – Oikos 119: 622–635.
Supplementary material (available online as Appendix ecog-04687 at <www.ecography.org/appendix/ecog-04687>). Appendix 1–5.
... wrong transformations among coordinate reference systems, rounding of coordinates, or lack of error correction procedures such as post-differential correction; Sillero and Gonçalves-Seco 2014). Unfortunately, the positional Gábor et al. 2020b Virtual 5 × 5 m 5-500 m 1-100 cells Gábor et al. 2023 Virtual 50 × 50 m 50-1500 m 1-30 cells Gábor et al. 2023 Observed 200 × 200 m 1-30 km 1-30 cells uncertainty of species records is often undocumented (Moudrý andDevillers 2020, Marcer et al. 2022). ...
... Although positional uncertainty seems to depend on species characteristics, its role in affecting SDMs for different groups (such as insects versus big mammals; mobile organisms like birds versus sessile organisms like plants, corals, etc.) is understudied. Among the few studies that analysed the interaction between positional uncertainty and species ecology, Velásquez-Tibatá et al. (2016) and, more recently, Gábor et al. (2020b), showed that positional uncertainty has a greater impact on SDMs' performances for specialists (i.e. species with a narrow niche breadth) than for generalist species (i.e. ...
... In such a case, it is important to consider the following steps to estimate and acknowledge the potential impact of positional uncertainty on the performance of the model. • Fourth, we suggest considering positional uncertainty in light of the particular species' ecology as some groups of species, such as mobile species, might be less affected by positional uncertainty than others (Gábor et al. 2020b). • Fifth, researchers should examine the spatial autocorrelation in predictors to gain insight into whether predictions are likely to be affected by positional uncertainty (Naimi et al. 2011(Naimi et al. , 2014. ...
Article
Full-text available
Species distribution models (SDMs) have proven valuable in filling gaps in our knowledge of species occurrences. However, despite their broad applicability, SDMs exhibit critical shortcomings due to limitations in species occurrence data. These limitations include, in particular, issues related to sample size, positional uncertainty, and sampling bias. In addition, it is widely recognised that the quality of SDMs as well as the approaches used to mitigate the impact of the aforementioned data limitations depend on species ecology. While numerous studies have evaluated the effects of these data limitations on SDM performance, a synthesis of their results is lacking. However, without a comprehensive understanding of their individual and combined effects, our ability to predict the influence of these issues on the quality of modelled species–environment associations remains largely uncertain, limiting the value of model outputs. In this paper, we review studies that have evaluated the effects of sample size, positional uncertainty, sampling bias, and species ecology on SDMs outputs. We build upon their findings to provide recommendations for the critical assessment of species data intended for use in SDMs.
... Like bias, uncertainty is present in all the components of biodiversity data and can stem from various sources. For instance, in the taxonomic dimension uncertainty may arise from imprecise or equivocal species names (Stropp et al. 2022), whereas in the geographic space, positional inaccuracy of survey locations is recognized as a contributor to the overall uncertainty in the data (Gábor et al. 2020). While these aspects of taxonomic and spatial uncertainty are routinely considered in macroecological research, the uncertainty derived from the temporal dimension of the data is often neglected. ...
Article
The availability of biodiversity databases is expanding at unprecedented rates. Nevertheless, species occurrence data can be intrinsically biased and contain uncertainties that impact the accuracy and reliability of biodiversity estimates. In this study, we developed a reproducible framework to assess three dimensions of bias (taxonomic, spatial, and temporal) as well as temporal uncertainty associated with data collections. We utilized the vegetation plot data located in Europe, from sPlotOpen, an open-access database, as a case study. The metrics proposed for estimating bias include completeness of the species richness for taxonomic bias, Nearest Neighbor Index for spatial bias, and Pielou's index for temporal bias. Additionally, we introduced a new method based on a negative exponential curve to model the temporal decay in biodiversity data, aiming to quantify temporal uncertainty. Finally, we assessed the sampling bias considering the influence of various spatial variables (i.e., road density, human population count, Natura 2000 network and topographic roughness). We discovered that the facets of bias and the temporal uncertainty varied throughout Europe, as did the different roles played by spatial variables in determining biases. sPlotOpen showed a clustered distribution of the vegetation plots, and an uneven distribution in sampling completeness, year of sampling and temporal uncertainty. The facets of bias were significantly explained mainly by the presence of Natura 2000 network and marginally by the human population count. These results suggest that employing an efficient procedure to examine biases and uncertainties in data collections can enhance data quality and provide more reliable biodiversity estimates.
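As a rough illustration of the temporal-bias metric named in the abstract above, Pielou's index is the Shannon entropy of a set of counts divided by its maximum possible value, J = H' / ln(S). The sketch below assumes the counts are numbers of plots per sampling year; that input layout, the function name, and the convention of returning 0 for fewer than two non-empty classes are illustrative assumptions, not details taken from the study.

```python
import math


def pielou_evenness(counts):
    """Return Pielou's J = H' / ln(S) for a list of non-negative counts.

    H' is the Shannon entropy of the non-zero counts and S is the number
    of non-zero classes (here assumed to be sampling years).
    """
    positive = [c for c in counts if c > 0]
    s = len(positive)
    if s < 2:
        # Assumed convention: evenness is not defined for a single class.
        return 0.0
    total = sum(positive)
    shannon = -sum((c / total) * math.log(c / total) for c in positive)
    return shannon / math.log(s)


# Hypothetical plots-per-year counts: evenly spread versus concentrated.
even = pielou_evenness([10, 10, 10, 10])
skewed = pielou_evenness([37, 1, 1, 1])
```

Values close to 1 indicate that sampling effort is spread evenly across years, while values close to 0 indicate that most plots come from a few years.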
... Coordinates do not suffice to know confidently and rigorously the environmental conditions of a specimen's preferred habitat (Gábor et al. 2019). In fact, the knowledge of the degree of uncertainty with which these coordinates have been determined is crucial to determine the fitness of data for a particular research objective. ...
Article
Georeferencing is a key process in the digitization of natural history collections as it assigns spatial coordinates to preserved specimen collecting locations, facilitating their use in ecological, evolutionary and conservation research. Georeference data in public repositories such as GBIF is often missing or incomplete, jeopardising their use in research and limiting the return on investment made by public institutions. Despite the existence of guidelines for best practices for georeferencing and widely accepted standards for biodiversity data, there is a lack of a simple yet effective software tool that offers the implementation of both concepts. We present GeoPick with the aim to offer the collections community a standards‐compliant tool that eases the georeferencing process, making it more cost‐effective, and which, by applying best practices, contributes to the betterment of the occurrence data in public repositories. GeoPick also offers the possibility of collaboration between users and institutions through the sharing of georeferences. The tool is hosted by GBIF at https://geopick.gbif.org, and is open source. Its code is available at a public GitHub repository (https://github.com/rtdeb/GeoPick). Keywords: Darwin Core, georeferencing, natural history collections, point‐radius method, web application, Well Known Text Format, WKT
... Even though SDMs are now commonly adopted, ecologists still face challenges. These are in particular related to the quality of the input data, which can significantly impact the fitted models (Araújo et al. 2019; Gábor et al. 2020; Bazzichetto et al. 2023; Smith et al. 2023; Wang and Jackson 2023). Such challenges include, among other issues, the selection of the appropriate scale/grain (Miguet et al. 2016; Wunderlich et al. 2022; Zarzo-Arias et al. 2022) and environmental variables (Williams et al. 2012; Moudrý et al. 2019; Smith and Santos 2020). ...
Article
Species distribution models (SDMs) are widely used in ecology. The selection of environmental variables is a critical step in SDMs, nowadays compounded by the increasing availability of environmental data. We evaluated the interaction between the grain size and the binary (presence or absence of water) or proportional (proportion of water within the cell) representation of the water cover variable when modeling water bird species distribution. eBird occurrence data with an average number of records of 880,270 per species across the North American continent were used for analysis. Models (via Random Forest) were fitted for 57 water bird species, for two seasons (breeding vs. non-breeding), at four grains (1 km² to 2500 km²) and using water cover as a proportional or binary variable. The models’ performances were not affected by the type of the adopted water cover variable (proportional or binary) but a significant decrease was observed in the importance of the water cover variable when used in a binary form. This was especially pronounced at coarser grains and during the breeding season. Binary representation of water cover is useful at finer grain sizes (i.e., 1 km²). At more detailed grains (i.e., 1 km²), the simple presence or absence of a certain land-cover type can be a realistic descriptor of species occurrence. This is particularly advantageous when collecting habitat data in the field as simply recording the presence of a habitat is significantly less time-consuming than recording its total area. For models using coarser grains, we recommend using proportional land-cover variables.
Article
Introduction Coffea arabica (Arabica coffee) is an important cash crop in Yunnan, China. Ongoing climate change has made coffee production more difficult to sustain, posing challenges for the region’s coffee industry. Predictions of the distribution of potentially suitable habitats for Arabica coffee in Yunnan could provide a theoretical basis for the cultivation and rational management of this species. Methods In this study, the MaxEnt model was used to predict the potential distribution of suitable habitat for Arabica coffee in Yunnan under current and future (2021-2100) climate scenarios (SSP2-4.5, SSP3-7.0, and SSP5-8.5) using 56 distributional records and 17 environmental variables and to analyze the important environmental factors. Finally, the Marxan model was used to plan the priority planting areas for this species. Results The predicted suitable and sub-suitable areas were about 4.21×10⁴ km² and 13.87×10⁴ km², respectively, accounting for 47.15% of the total area of the province. The suitable areas were mainly concentrated in western and southern Yunnan. The minimum temperature of the coldest month, altitude, mean temperature of the wettest quarter, slope, and aluminum saturation were the main environmental variables affecting the distribution of Arabica coffee in Yunnan Province. Changes in habitat suitability for Arabica coffee were most significant and contracted under the SSP3-7.0 climate scenario, while expansion was highest under the SSP5-8.5 climate scenario. Priority areas for Arabica coffee cultivation in Yunnan Province under the 30% and 50% targets were Pu’er, Xishuangbanna, Honghe, Dehong, and Kunming. Discussion Climate, soil, and topography combine to influence the potential geographic distribution of Arabica coffee. Future changes in suitable habitat areas under different climate scenarios should lead to the delineation of coffee-growing areas based on appropriate environmental conditions and active policy measures to address climate change.
Article
Aim As global change accelerates, accurate predictions of species distributions and biodiversity patterns are critical to limit biodiversity loss. Numerous studies have found that coarse‐grain species distribution models (SDMs) perform poorly relative to fine‐grain models because they mismatch environmental information with observations. However, it remains unclear how grain‐size biases vary in intensity across space and time, possibly generating inaccurate predictions for specific regions, seasons or species. For example, coarse‐grain biases may intensify in patchy, discontinuous landscapes. Such biases may accumulate to produce highly misleading estimates of continental and seasonal biodiversity patterns. Location United States and Canada. Time Period 2004–2021. Major Taxa Studied Birds (Aves). Methods We fit presence‐absence SDMs characterising the summer and winter distributions of 572 bird species native to the US and Canada across five spatial grains from 1 to 50 km, using observations from the eBird citizen science initiative. We combined these predictions to generate seasonal biodiversity estimates across the US and Canada, which we validated using observations from 322 independent sites. Results We find that in both seasons, 1 km models more accurately predicted species presence, absence and richness at local sites. Coarse‐grain models (even at 3 km) consistently under‐predicted range area, potentially missing important habitat. This bias intensified during summer (83%–86% of species) when many birds have smaller ‘operational scales’ via localised home ranges while breeding. Biases were greatest in desert regions with patchier habitat and for range‐restricted and habitat‐specialist species. Predictions based on coarse‐grain models overpredicted avian diversity in the west and underpredicted it in the great plains, prairie pothole region and boreal zones. 
Main Conclusions We demonstrate that coarse‐grain models can bias seasonal and continental estimates of biodiversity patterns across space and time and that grain‐related biases intensify during summer and in patchier landscapes, especially for range‐restricted and habitat specialist species at risk of population declines.
Article
Climate change significantly alters species distributions. Numerous studies project the future distribution of species using species distribution models (SDMs), most often using coarse resolutions. Working at coarse resolutions in forest ecosystems fails to capture landscape-level dynamics, spatially explicit processes, and temporally defined events that act at finer resolutions and that can disproportionately affect future outcomes. Dynamic Forest Landscape Models (FLMs) can simulate the survival, growth, and mortality of (stands of) trees over long time periods at small resolutions. However, as they are able to simulate at fine resolutions, study landscapes remain relatively small due to computational constraints. The large number of feedbacks between biodiversity, forest, and ecosystem processes cannot completely be captured by FLMs or SDMs alone. Integrating SDMs with FLMs enables a more detailed understanding of the impact of perturbations on forest landscapes and their biodiversity. Several studies have used this approach at landscape scales, using fine resolutions. Yet, many scientific questions in the fields of biogeography, macroecology, conservation management, among others, require a focus on both large scales and fine resolutions. Here, drawn from literature and experience, we provide our perspective on the most important challenges that need to be overcome to use integrated frameworks at spatial scales larger than the landscape and at fine resolutions. Future research should prioritize these challenges to better understand drivers of species distributions in forest ecosystems and effectively design conservation strategies under the influence of changing climates on spatially and temporally explicit processes. We further discuss possibilities to address these challenges.
Article
This article is a literature review of species distribution models that use climatic variables and of the main methodological aspects of this process. The aim of this study is to provide a theoretical foundation regarding modelling techniques and their conceptual and methodological aspects, and to highlight the main applications of these tools in the scientific community. The method used was a survey of the online databases SciELO and Google Scholar, with 28 scientific works selected for analysis according to inclusion and exclusion criteria. We observed that ecological niche models have advanced considerably over the years and that, even with the recent pandemic, publications on modelling are frequent in the literature, as are technical and statistical advances associated with the development of these models. However, the difficulties in achieving accurate predictions remain considerable, whether because of a lack of understanding of the process or because of possible failures to which it is subject.
Article
Blacklegged ticks (Ixodes scapularis Say) pose an enormous public health risk in eastern North America as the vector responsible for transmitting 7 human pathogens, including those causing the most common vector-borne disease in the United States, Lyme disease. Species distribution modeling is an increasingly popular method for predicting the potential distribution and subsequent risk of blacklegged ticks; however, the development of such models thus far is highly variable and would benefit from the use of standardized protocols. To identify where standardized protocols would most benefit current distribution models, we completed the “Overview, Data, Model, Assessment, and Prediction” (ODMAP) distribution modeling protocol for 21 publications reporting 22 blacklegged tick distribution models. We calculated an average adherence of 73.4% (SD ± 29%). Most prominently, we found that authors could better justify and connect their selection of variables and associated spatial scales to blacklegged tick ecology. In addition, the authors could provide clearer descriptions of model development, including checks for multicollinearity, spatial autocorrelation, and plausibility. Finally, authors could improve their reporting of variable effects to avoid undermining the models’ utility in informing species–environment relationships. To enhance future model rigor and reproducibility, we recommend utilizing several resources including the ODMAP protocol, and suggest that journals make protocol compliance a publication prerequisite.
Article
Aim Species distribution models are used across evolution, ecology, conservation and epidemiology to make critical decisions and study biological phenomena, often in cases where experimental approaches are intractable. Choices regarding optimal models, methods and data are typically made based on discrimination accuracy: a model's ability to predict subsets of species occurrence data that were withheld during model construction. However, empirical applications of these models often involve making biological inferences based on continuous estimates of relative habitat suitability as a function of environmental predictor variables. We term the reliability of these biological inferences ‘functional accuracy.’ We explore the link between discrimination accuracy and functional accuracy. Methods Using a simulation approach we investigate whether models that make good predictions of species distributions correctly infer the underlying relationship between environmental predictors and the suitability of habitat. Results We demonstrate that discrimination accuracy is only informative when models are simple and similar in structure to the true niche, or when data partitioning is geographically structured. However, the utility of discrimination accuracy for selecting models with high functional accuracy was low in all cases. Main conclusions These results suggest that many empirical studies and decisions are based on criteria that are unrelated to models’ usefulness for their intended purpose. We argue that empirical modelling studies need to place significantly more emphasis on biological insight into the plausibility of models, and that the current approach of maximizing discrimination accuracy at the expense of other considerations is detrimental to both the empirical and methodological literature in this active field. 
Finally, we argue that future development of the field must include an increased emphasis on simulation; methodological studies based on ability to predict withheld occurrence data may be largely uninformative about best practices for applications where interpretation of models relies on estimating ecological processes, and will unduly penalize more biologically informative modelling approaches.
Article
Species distribution models (SDMs) have become one of the major predictive tools in ecology. However, multiple methodological choices are required during the modelling process, some of which may have a large impact on forecasting results. In this context, virtual species, i.e., the use of simulations involving a fictitious species for which we have perfect knowledge of its occurrence‐environment relationships and other relevant characteristics, have become increasingly popular to test SDMs. This approach provides for a simple virtual ecologist framework under which to test model properties, as well as the effects of the different methodological choices, and allows teasing out the effects of targeted factors with great certainty. This simplification is therefore very useful in setting up modelling standards and best practice principles. As a result, numerous virtual species studies have been published over the last decade. The topics covered include differences in performance between statistical models, effects of sample size, choice of threshold values, methods to generate pseudo‐absences for presence‐only data, among many others. These simulations have therefore already made a great contribution to setting best modelling practices in SDMs. Recent software developments have greatly facilitated the simulation of virtual species, with at least 3 different packages published to that effect. However, the simulation procedure has not been homogeneous, which introduces some subtleties in the interpretation of results, as well as differences across simulation packages. Here we (1) review the main contributions of the virtual species approach in the SDM literature; (2) compare the major virtual species simulation approaches and software packages; and (3) propose a set of recommendations for best simulation practices in future virtual species studies in the context of SDMs.
Article
The snow leopard Panthera uncia is an elusive species inhabiting some of the most remote and inaccessible tracts of Central and South Asia. It is difficult to determine its distribution and density pattern, which are crucial for developing conservation strategies. Several techniques for species detection combining camera traps with remote sensing and geographic information systems have been developed to model the habitat of such cryptic and low-density species in challenging terrains. Utilising presence-only data from camera traps and direct observations, alongside six environmental variables (elevation, aspect, ruggedness, distance to water, land cover, and prey habitat suitability), we assessed snow leopard habitat suitability across Ladakh in northern India. This is the first study to model snow leopard distribution both in India and utilising direct observation data. Results suggested that elevation and ruggedness are the two most influential environmental variables for snow leopard habitat suitability, with highly suitable habitat having an elevation range of 2,800 m to 4,600 m and ruggedness of 450 m to 1,800 m. Our habitat suitability map estimated approximately 12% of Ladakh's geographical area (c. 90,000 km²) as highly suitable and 18% as medium suitability. We found that 62.5% of recorded livestock depredation along with over half of all livestock corrals (54%) and homestays (58%) occurred within highly suitable snow leopard habitat. Our habitat suitability model can be used to assist in allocation of conservation resources by targeting construction of livestock corrals to areas of high habitat suitability and promoting ecotourism programs in villages in highly suitable snow leopard habitat.
Article
Demand for models in biodiversity assessments is rising, but which models are adequate for the task? We propose a set of best-practice standards and detailed guidelines enabling scoring of studies based on species distribution models for use in biodiversity assessments. We reviewed and scored 400 modeling studies over the past 20 years using the proposed standards and guidelines. We detected low model adequacy overall, but with a marked tendency of improvement over time in model building and, to a lesser degree, in biological data and model evaluation. We argue that implementation of agreed-upon standards for models in biodiversity assessments would promote transparency and repeatability, eventually leading to higher quality of the models and the inferences used in assessments. We encourage broad community participation toward the expansion and ongoing development of the proposed standards and guidelines.
Poster
Species distribution modelling is now routinely applied in many macroecological studies. However, the reliability of evaluation metrics used to validate these models remains debated. Moreover, the emergence of online databases of environmental variables with global coverage, especially climatic, has favoured the use of the same set of standard predictors. Unfortunately, the effort of variable selection based on the species’ ecology is often limited. In this context, our aim was to highlight the importance of selecting ad hoc variables in species distribution modelling, and to assess the ability of classical evaluation statistics to identify biologically non-significant models.
Article
Terrain attributes (e.g., slope, rugosity) derived in Geographic Information Systems (GIS) from digital terrain models (DTMs) are widely used in both terrestrial and marine ecological studies due to their potential to act as surrogates of species distribution. However, the spatial resolution of DTMs is often altered to match the scale at which species observations were collected. Here, we highlight the significance of adequately reporting the methods used to derive terrain attributes from DTMs and the consequences of their incorrect reporting in ecological studies. To ensure full repeatability of studies, they should report (i) the source and the resolution of the original DTM; (ii) the algorithm used to calculate terrain attributes; (iii) the method used for rescaling (e.g., aggregating or resampling, using the mean or maximum values); and (iv) the order in which these operations were performed. We contrast the effects of two common scale alteration approaches for the derivation of terrain attributes from DTMs. These two scale alteration methods differ in the step at which the change is performed: (i) the resolution alteration is performed after computing terrain attributes from the original DTM at the native resolution, or (ii) the resolution alteration is performed on the native DTM before computing terrain attributes. While these approaches conceptually do the same thing (i.e., change the resolution of the terrain attributes), we demonstrate that they produce two distinct sets of variables that are not interchangeable and describe different properties of the terrain. In a species distribution modelling (SDM) context, the first approach calculates terrain attribute values within the cell where a species is found, while the second approach calculates terrain attribute values with respect to neighbouring cells. 
A mutual substitution of the two approaches results in a decrease of models' discrimination ability and in misleading spatial predictions of species probability of occurrence. Regardless of the DTM-derived attribute, we argue that the choice of the approach should be carefully guided by both the ecological scale relevant to the question being asked and the performance of pre-analyses. We emphasize that selected methods be clearly described to encourage reproducibility and proper interpretation of results, thus enabling a better understanding of the role of scale in ecology.
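The point that the two orders of operation are not interchangeable can be seen in a minimal one-dimensional sketch. The elevation profile, the finite-difference slope, and the block-mean aggregation below are illustrative assumptions, not the algorithms used in the study; real terrain attributes are computed on 2D rasters with GIS tools.

```python
def slopes(elevation, cell_size):
    """Absolute elevation change per unit distance between adjacent cells."""
    return [abs(b - a) / cell_size for a, b in zip(elevation, elevation[1:])]


def block_mean(values, factor):
    """Aggregate by averaging consecutive, non-overlapping blocks of `factor` values."""
    return [
        sum(values[i:i + factor]) / factor
        for i in range(0, len(values) - factor + 1, factor)
    ]


# Hypothetical rugged profile at a native cell size of 1 unit.
profile = [0, 10, 0, 10, 0, 10, 0, 10]

# Approach (i): derive the attribute at native resolution, then coarsen it.
slope_then_aggregate = block_mean(slopes(profile, 1.0), 2)

# Approach (ii): coarsen the terrain first, then derive the attribute.
aggregate_then_slope = slopes(block_mean(profile, 2), 2.0)
```

In this toy case approach (i) keeps the fine-scale ruggedness (every coarse value is 10), whereas approach (ii) sees a flat surface (every value is 0), which is why the order of operations needs to be reported.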
Article
Species distribution models (SDMs) are widely used in ecology and conservation. However, their performance is known to be affected by a variety of factors related to species occurrence characteristics. In this study, we used a virtual species approach to overcome the difficulties associated with testing of combined effects of those factors on performance of presence-only SDMs when using real data. We focused on the individual and combined roles of factors related to response variable (i.e. sample size, sampling bias, environmental filtering, species prevalence, and species response to environmental gradients). Results suggest that environmental filtering is not necessarily helpful and should not be performed blindly, without evidence of bias in species occurrences. The more gradual the species response to environmental gradients is, the greater is the model sensitivity to an inappropriate use of environmental filtering, although this sensitivity decreases with higher species prevalence. Results show that SDMs are affected to the greatest degree by the species response to environmental gradients, species prevalence, and sample size. Models’ accuracy decreased with sample size below 300 presences. Furthermore, a high level of interactions among individual factors was observed. Ignoring the combined effects of factors may lead to misleading outcomes and conclusions.
Article
Correlative species distribution models (SDMs) are widely used to predict species distributions and assemblages, with many fundamental and applied uses. Different factors were shown to affect SDM prediction accuracy. However, real data cannot give unambiguous answers on these issues, and for this reason, artificial data have been increasingly used in recent years. Here, we move one step further by assessing how different factors can affect the prediction accuracy of virtual assemblages obtained by stacking individual SDM predictions (stacked SDMs, S-SDM). We modelled 100 virtual species in a real study area, testing five different factors: sample size (200-800-3200), sampling method (nested, non-nested), sampling prevalence (25%, 50%, 75% and species true prevalence), modelling technique (GAM, GLM, BRT and RF) and thresholding method (ROC, MaxTSS, and MaxKappa). We showed that the accuracy of S-SDM predictions is mostly affected by modelling technique followed by sample size. Models fitted by GAM/GLM had a higher accuracy and lower variance than BRT/RF. Model accuracy increased with sample size and a sampling strategy reflecting the true prevalence of the species was most successful. However, even with sample sizes as high as >3000 sites, residual uncertainty remained in the predictions, potentially reflecting a bias introduced by creating and/or resampling the virtual species. Therefore, when evaluating the accuracy of predictions from S-SDMs fitted with real field data, one can hardly expect reaching perfect accuracy, and reasonably high values of similarity or predictive success can already be seen as valuable predictions. We recommend the use of a ‘plot-like’ sampling method (best approximation of the species' true prevalence) and not simply increasing the number of presences-absences of species. 
As presented here, virtual simulations might be used more systematically in future studies to inform about the best accuracy level that one could expect given the characteristics of the data and the methods used to fit and stack SDMs.