Species distribution models (SDMs) have proven valuable in filling gaps in our knowledge of species occurrences. However, despite their broad applicability, SDMs exhibit critical shortcomings due to limitations in species occurrence data. These limitations include, in particular, issues related to sample size, positional uncertainty, and sampling bias. In addition, it is widely recognised that the quality of SDMs as well as the approaches used to mitigate the impact of the aforementioned data limitations depend on species ecology. While numerous studies have evaluated the effects of these data limitations on SDM performance, a synthesis of their results is lacking. However, without a comprehensive understanding of their individual and combined effects, our ability to predict the influence of these issues on the quality of modelled species–environment associations remains largely uncertain, limiting the value of model outputs. In this paper, we review studies that have evaluated the effects of sample size, positional uncertainty, sampling bias, and species ecology on SDMs outputs. We build upon their findings to provide recommendations for the critical assessment of species data intended for use in SDMs.
www.ecography.org
ECOGRAPHY
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
Subject Editor: Miguel Araújo
Editor-in-Chief: Miguel Araújo
Accepted 1 July 2024
doi: 10.1111/ecog.07294
2024
1–20
2024: e07294
© 2024 The Authors. Ecography published by John Wiley & Sons Ltd on behalf of Nordic Society Oikos
Species distribution models (SDMs) have proven valuable in lling gaps in our
knowledge of species occurrences. However, despite their broad applicability,
SDMs exhibit critical shortcomings due to limitations in species occurrence data.
ese limitations include, in particular, issues related to sample size, positional
uncertainty, and sampling bias. In addition, it is widely recognised that the quality
of SDMs as well as the approaches used to mitigate the impact of the aforemen-
tioned data limitations depend on species ecology. While numerous studies have
evaluated the eects of these data limitations on SDM performance, a synthesis
of their results is lacking. However, without a comprehensive understanding of
their individual and combined eects, our ability to predict the inuence of these
issues on the quality of modelled species–environment associations remains largely
uncertain, limiting the value of model outputs. In this paper, we review studies
Optimising occurrence data in species distribution models:
sample size, positional uncertainty, and sampling bias matter
Vítězslav Moudrý 1, Manuele Bazzichetto 1, Ruben Remelgado 2,3, Rodolphe Devillers 4,
Jonathan Lenoir 5, Rubén G. Mateo 6, Jonas J. Lembrechts 7, Neftalí Sillero 8, Vincent Lecours 9,
Anna F. Cord 2,3, Vojtěch Barták 1, Petr Balej 1, Duccio Rocchini 1,10, Michele Torresani 11,
Salvador Arenas-Castro 12, Matěj Man 13, Dominika Prajzlerová 1, Kateřina Gdulová 1, Jiří Prošek 1,13,
Elisa Marchetto 10, Alejandra Zarzo-Arias 14,15, Lukáš Gábor 1, François Leroy 1, Matilde Martini 10,
Marco Malavasi 16, Roberto Cazzolla Gatti 10, Jan Wild 1,13 and Petra Šímová 1
1Department of Spatial Sciences, Faculty of Environmental Sciences, Czech University of Life Sciences Prague, Praha-Suchdol, Czech Republic
2Chair of Computational Landscape Ecology, TUD Dresden University of Technology, Dresden, Germany
3Agro-Ecological Modeling Group, Institute of Crop Science and Resource Conservation, University of Bonn, Bonn, Germany
4UMR Espace-Dev, Institut de Recherche Pour le Développement, Univ Réunion, La Réunion, France
5UMR CNRS 7058 ‘Ecologie et Dynamique des Systèmes Anthropisés’ (EDYSAN), Université de Picardie Jules Verne, Amiens, France
6Departamento de Biología and Centro de Investigacion en Biodiversidad y Cambio Global (CIBC-UAM), Universidad Autonoma de Madrid,
Madrid, Spain
7Research Group of Plants and Ecosystems (PLECO), Department of Biology, University of Antwerp, Antwerp, Belgium
8Centro de Investigação em Ciências Geo-Espaciais (CICGE), Faculdade de Ciências da Universidade do Porto, Alameda do Monte da Virgem, Vila
Nova de Gaia, Portugal
9Université du Québec à Chicoutimi, Saguenay, QC, Canada
10BIOME Lab, Department of Biological, Geological and Environmental Sciences, Alma Mater Studiorum University of Bologna, Bologna, Italy
11Free University of Bolzano/Bozen, Faculty of Agricultural, Environmental and Food Sciences, Bolzano/Bozen, Italy
12Área de Ecología, Dpto. de Botánica, Ecología y Fisiología Vegetal, Facultad de Ciencias, Universidad de Córdoba, Edificio Celestino Mutis (C-4), Córdoba, Spain
13Institute of Botany of the Czech Academy of Sciences, Průhonice, Czech Republic
14Universidad de Oviedo, Oviedo, Spain
15Department of Biogeography and Global Change, Museo Nacional de Ciencias Naturales (MNCN-CSIC), Madrid, Spain
16Department of Chemistry, Physics, Mathematics and Natural Sciences, University of Sassari, Sassari, Italy
Correspondence: Manuele Bazzichetto (manuele.bazzichetto@gmail.com)
Review
that have evaluated the eects of sample size, positional uncertainty, sampling bias, and species ecology on SDMs out-
puts. We build upon their ndings to provide recommendations for the critical assessment of species data intended for
use in SDMs.
Keywords: data quality, ecological niche modelling, filtering, sampling, spatial scale, validation
Introduction
The quantity and quality of biological observations have improved dramatically over the past few decades. However, a certain level of uncertainty is inherently present in such data, resulting in uncertainties of scientific inferences based on them (Hortal et al. 2015, Daru and Rodriguez 2023, Hughes et al. 2023). Correlative species distribution models (SDMs; also known as habitat suitability models or ecological niche models; Sillero 2011) are useful for tackling the gaps in our knowledge of species occurrence (Elith and Leathwick 2009). These models combine environmental and species occurrence data to build a set of rules describing the environmental space where species were observed (i.e. species ecological niche) and can then be used to predict the distribution of that species (Ferrier et al. 2017). SDMs support a wide variety of ecological applications, such as the assessment of the spread of invasive species (Guisan et al. 2013, Bazzichetto et al. 2021), the detection of potential impacts of environmental changes on biodiversity (Ehrlén and Morris 2015, Haesen et al. 2023), or the identification of suitable locations for the relocation of endangered species (Guisan et al. 2013, Segal et al. 2021). However, despite their broad applicability, SDMs have critical shortcomings associated in particular with the characteristics of input data, including their quantity and quality (Elith et al. 2002, Barry and Elith 2006, Rocchini et al. 2011, Moudrý and Šímová 2012, Wüest et al. 2020, Davies et al. 2023). In this paper, we focus on the limitations of species occurrence data (for issues associated with environmental data, see for example Fourcade et al. 2018, Araújo et al. 2019, Moudrý et al. 2023).
Limitations of species occurrence data can introduce uncertainty and biases in the estimation of species–environment relationships and, consequently, of their predicted distributions (Araújo et al. 2019). In particular, data availability (i.e. sample size) is critical; the smaller the minimum sample size that can theoretically be used in SDMs, the higher the number of species that can be modelled (Stockwell and Peterson 2002). However, measurement errors associated with data acquisition methods (i.e. positional error; Smith et al. 2023) are another major source of uncertainty, which may, in effect, necessitate a larger sample size than would be needed if the data were accurate. In addition, the choice of inappropriate sampling strategies can introduce biases towards certain locations (i.e. sampling bias; Bazzichetto et al. 2023). Moreover, it is well-recognised that the quality of SDMs is also influenced by the species’ ecology (Segurado and Araujo 2004, Heikkinen et al. 2006, Guisan et al. 2007, McPherson and Jetz 2007, Collart et al. 2023) and that the effects of different data limitations (e.g. sample size, positional uncertainty, and sampling bias) may be species-specific.
As the interest in using SDMs continues to grow, tackling data limitations becomes increasingly critical (Araújo et al. 2019, Wüest et al. 2020, Jansen et al. 2022, Marcer et al. 2022). In this context, it is now expected that data characteristics and limitations are considered and properly reported during the conceptualisation and validation of SDMs (Feng et al. 2019, Zurell et al. 2020, Sillero and Barbosa 2021, Tessarolo et al. 2021, Jansen et al. 2022, Jeliazkov et al. 2022, Boyd et al. 2023). However, without proper knowledge of the individual or combined effects of sample size, positional uncertainty, sampling bias, and their interaction with species’ ecology, our ability to anticipate the impact of these issues on the quality of SDMs remains largely uncertain, limiting the value of model outputs (see Fig. 1 for a diagram introducing data characteristics and their relationships considered in this review).
A common approach to the evaluation of the effects of data limitations on model performances is to manipulate the input data experimentally or to simulate datasets impacted by various sources of bias or uncertainty. Here, we examine studies that manipulated sample size (section ‘Sample size’) or introduced positional uncertainty (section ‘Positional uncertainty’) or sampling bias (section ‘Sampling bias’) to investigate their impact on SDMs’ outputs. Building upon these studies, we provide guidance on how to critically assess the spatial data used in SDMs, and identify directions for optimising the tradeoffs between data limitations and accurate modelling of species–environment relationships (section ‘Guidelines and future directions’).
Sample size
Among all possible factors, sample size (Box 1) has the most profound effect on the performance of an SDM (Thibaud et al. 2014, Santini et al. 2021). Sample size poses an important constraint to the model complexity, i.e. to the number of parameters to be estimated, as well as to the algorithms and their settings used for modelling. In SDMs, sample size can range from just a few (Papeş and Gaubert 2007, Pearson et al. 2007) to millions (Botella et al. 2023, Gábor et al. 2024) of records. In the vast literature measuring the effect of sample size on model performance (Table 1), the primary concern has been to determine the minimum adequate sample size required to produce reliable and fit-for-purpose models (Stockwell and Peterson 2002, Hanberry et al. 2012, Proosdij et al. 2016). In parallel, ecological research investigates to what extent additional time and economic resources should be spent to improve models by increasing the sample size (Liu et al. 2019). Knowing the minimum (and maximum) sample size required for accurate predictions would theoretically allow optimisation of the resources spent on labour-intensive fieldwork and, therefore, help reduce associated costs. Nonetheless, the extent to which modelling could replace fieldwork remains questionable.
Importance of sample size in model training and testing
Studies focusing on a better understanding of how sample size impacts models’ accuracy revealed that it is in principle possible to train SDMs with a relatively small sample. Values typically range from 50 to 150 presences (or presences–absences), although values as low as 10 presences or as high as a few hundred have also been reported (Table 1). However, it is important to note that studies typically reported the minimum sample size at which the model was still relatively useful, not the sample size at which the model gave optimal results. Besides, it has been reported that models relying on fewer than approximately 70 presences do not reliably identify the variables affecting distributional patterns (Smith and Santos 2020) or result in poor(er) estimates of the shapes of species response curves (Coudun and Gégout 2006, Shiroyama et al. 2020, Bazzichetto et al. 2023, Wang and Jackson 2023). In general, all studies agreed that increasing sample size increased a model’s predictive performance (keeping the number of predictors fixed), although a plateau in model performance is generally reached (Stockwell and Peterson 2002). According to recent studies, hundreds of presences are needed to reach the plateau where increasing sample size further adds little to the model performance (Liu et al. 2019, Gábor et al. 2020a).
Insucient attention has so far been devoted to the eval-
uation of possible eects of the testing dataset sample size
on validating SDMs’ predictive performances. Generally,
small validation datasets can lead to inaccurate assessment
of model performance (Hallman and Robinson 2020).
Recently, Jiménez-Valverde (2020) showed that 30 pres-
ence–absence records (i.e. 15 presences and 15 absences) are
a (minimum) adequate sample size for a validation dataset
to estimate the predictive performance of presence–absence
models. However, their conclusions are based on simu-
lations, and it is important to note that studies using real
data are essential to generalise these results. In addition, the
minimal sample size of a validation dataset has not yet been
evaluated in the case of presence–background data; since
these carry less information than presence–absence data,
the validation set should be correspondingly larger (Collart
and Guisan 2023). While the importance of a suciently
large validation sample is intuitive, the impact of increasing
the sample size of the testing dataset on validation accuracy
urgently needs further testing.
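As a minimal sketch of how such a balanced validation set could be set aside, the Python snippet below draws 15 presences and 15 absences at random and keeps the rest for training. The function name `split_validation`, the `(record_id, is_presence)` record layout, and the toy data are assumptions made for illustration; they do not come from the studies cited above.

```python
import random


def split_validation(records, n_presences=15, n_absences=15, seed=0):
    """Set aside a balanced validation set from presence-absence records.

    `records` is a list of (record_id, is_presence) tuples with unique ids
    (a hypothetical layout chosen for this sketch).
    Returns (training, validation).
    """
    rng = random.Random(seed)
    presences = [r for r in records if r[1]]
    absences = [r for r in records if not r[1]]
    if len(presences) < n_presences or len(absences) < n_absences:
        raise ValueError("not enough presences or absences for the validation set")
    # Draw the validation records without replacement from each class.
    validation = rng.sample(presences, n_presences) + rng.sample(absences, n_absences)
    held_out_ids = {r[0] for r in validation}
    training = [r for r in records if r[0] not in held_out_ids]
    return training, validation


# Toy data (invented): 100 presences and 200 absences.
records = [(i, i % 3 == 0) for i in range(300)]
training, validation = split_validation(records)
print(len(training), len(validation))  # 270 30
```

A purely random draw is used here for simplicity; it does nothing to make the validation records spatially independent of the training records.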
Relationships between sample size, species ecology, and model complexity
The association between model performance and sample size depends largely on the species’ ecology. Studies have repeatedly indicated that, for a given sample size, SDMs better predict species with restricted geographical distributions (i.e. low range size, prevalence, or relative occurrence area), as well as specialist species with strict ecological requirements (i.e. narrow ecological niche) than species with wide geographic ranges and generalist (i.e. wide ecological niche) species (Stockwell and Peterson 2002, Seoane et al. 2005, Hernandez et al. 2006, Tsoar et al. 2007, Mateo et al. 2010, Tessarolo et al. 2014, Proosdij et al. 2016, Hallman and Robinson 2020, Arenas-Castro et al. 2022, Wang and Jackson 2023). The association between model performance, sample size, and species ecology can be explained by niche completeness (i.e. the proportion of a species' niche covered by the sampling). For example, if a species has a restricted ecological niche (or range), the niche may likely be well represented by a low number of occurrences. On the other hand, a large sample size does not necessarily mean a complete coverage of the entire ecological niche for widespread species (Bazzichetto et al. 2023, Boyd et al. 2023).
Figure 1. Sample size, positional uncertainty, and sampling bias are the three essential characteristics of species occurrence data addressed in this review. These interconnected characteristics can have a significant impact on the reliability of species distribution models (SDMs) results. Researchers must thoughtfully address these factors during the collection of species occurrence data (sampling design) and the formulation of models (model complexity). Maximising sample size, using sampling bias correction methods, and minimising positional uncertainty relative to the spatial resolution and autocorrelation of environmental predictors during model training and testing, are all essential steps. Additionally, species ecology and the distribution of species observations in the geographic and environmental space can exacerbate or attenuate the negative effects of small sample size, high sampling bias, and high positional uncertainty on the reliability of SDMs results. See Box 1 for definitions of key terms and concepts.
This is further related to model complexity. Selecting a model with an appropriate level of complexity, which would prevent overfitting noise in the data and, at the same time, allow discrimination of influential predictors from uninfluential ones and accurately capture the true species–environment relationship, remains a challenge (Merow et al. 2014, García-Callejas and Araújo 2016, Baartman et al. 2020). Building models with complex species response shapes and/or too many predictors can result in difficulties in recognising true complexity from noise, especially in case of low sample size. However, even large sample sizes can result in low accuracy in the estimation of model parameters if the model is overly complex (i.e. includes too many parameters or interactions, e.g. Wisz et al. 2008, Moreno-Amat et al. 2015). At the same time, underfitting models that are not flexible enough to describe species–environment associations risk failing to identify the factors shaping species distributions. While adding more predictor variables avoids neglecting important ones and can improve model performance, the ability to distinguish between influential and uninfluential variables depends on sample size (Smith and Santos 2020). It is, therefore, recommended to keep
Box 1. Glossary of key terms.
Ecological niche: Hutchinsonian niche, defined as a hypothetical hypervolume spanned by the eco-physiological responses of a species to all environmental factors affecting its fitness.
Model complexity refers to the level of intricacy and flexibility in the representation of a species' ecological niche. It reflects how well the model can capture the underlying relationships between predictors and species distribution. The choice of model complexity depends on the nature of the problem, the amount and quality of available data, the number of model parameters, and the available computational resources. Finding the right balance between a model's ability to capture patterns and its potential for overfitting is a key challenge in building effective models.
Model performance: here intended in a broad sense as a model capacity of recovering the underlying species–environment relationship using available data (‘explanatory’ performance), while also being able to extend (predict) out of the sample used for training/calibration (‘predictive’ performance).
Model training is the process of teaching a machine learning or statistical model to make predictions based on data. It is a crucial step in building and developing predictive models. Model training involves using a dataset with known outcomes to enable the model to learn the underlying patterns and relationships in the data.
Model testing, also known as model evaluation, is the process of assessing the performance and effectiveness of a machine learning or statistical model using a separate (independent) dataset that the model has not seen during training. The primary purpose of model testing is to determine how well the trained model generalises to new, unseen data and to assess its predictive accuracy and reliability.
Positional uncertainty (sometimes also referred to as positional error) in species occurrence data refers to inaccuracies or uncertainty in the recorded coordinates of where a species was observed or collected. This error can result from factors such as imprecise global navigation satellite systems (GNSS) measurements, data entry mistakes, or a lack of accurate location information.
Spatial resolution or grain refers to the level of detail or granularity at which data are collected, represented, or analysed in a spatial context. It can also be thought of as the size of the smallest spatial unit in a dataset (i.e. pixel size).
Sampling design refers to the approach used to collect species occurrence data. The sampling design is a crucial aspect of SDMs, as it should in principle ensure that the data include all relevant information to represent the ecological niche of the species and the environmental conditions in the study area. The quality and representativeness of the data collected directly impact the accuracy and reliability of the model.
Sample size: the size of the data sample used to train and validate the model. Here, we define sample size as the total number of presences and absences (i.e. presence–absence data). When discussing studies based on presence–background data, we refer specifically to the number of presences.
Sampling bias: species occurrence records typically exhibit spatial bias, wherein some locations or environmental conditions are more intensively sampled than others. People sample accessible locations more intensively than remote or unpopular ones. This type of bias means that the available data used as the response variable fail to represent the complete niche of the species.
Table 1. Overview of studies testing the role of the number of presences, or presences and absences, for model performance. PA, presences–absences.
Study | Number of species | Training sample | Testing sample | Study extent / resolution | No. predictors | No. obs. suggested
Stockwell and Peterson 2002 | 130 birds | 1–100 | 1000; presence–background | Mexico / 3 × 3 minutes | 8 | At least 50 presences
Kadmon et al. 2003 | 192 plants | 10–200 | 96 plots; presence–absence | Israel / 1 × 1 km | 3 | 50–75 presences
Hernandez et al. 2006 | 18 animals | 5–100 | 50 presences | California / 1 × 1 km | 10 | 50–75 presences
Wisz et al. 2008 | 46 plants, animals | 10–100 | Presence–absence data | Five regions / 100 × 100 m; 1 × 1 km | 11–13 | At least 30 presences
Mateo et al. 2010 | 2 plants | 9–60 | Compared to maps created with full datasets | Ecuador / 1 × 1 km | 19 | At least 20 presences
Feeley and Silman 2011 | 65 plants | 25–150 | Compared to maps created with full datasets | Tropical South America / 5 × 5 km | 3 | Larger than evaluated
Hanberry et al. 2012 | 16 trees | 30–2500 | Presence samples not used for training | 46 000 km² / 310 000 polygons | 16 | At least 200 presences
Proosdij et al. 2016 | 6 virtual | 3–50 | Compared with actual virtual species distribution | 18 000 000 km² / 5 × 5 minutes | 15 | 14–25 presences
Liu et al. 2019 | 1800 virtual | 20–640 | 3000 presences and absences of virtual species distribution | 62 500 km² / 1 × 1 km | 6 | A few hundred presences
Støa et al. 2019 | 30 insects | 5–320 | Compared to maps created with full datasets | Norway / 1 × 1 km | 2 | 10–15 presences
Smith and Santos 2020 | 1 virtual | 8–1024 | 400 presences and absences of virtual species distribution | Virtual landscape / 1024 × 1024 cells | 1 | At least 128 presences
McPherson et al. 2004 | 7 birds | 50–500 | 500 presences–absences | South Africa / 0.25 × 0.25 degrees | 61 | 300 PA
Coudun and Gégout 2006 | 54 virtual | 50–5000 | Not used | Not relevant | 1 | At least 50 PA
Jiménez-Valverde et al. 2009 | 1 virtual | 182–182, 288 | Compared with actual virtual species distribution | 6576 km² / 0.04 × 0.04° | 4 | At least 70 PA
Shiroyama et al. 2020 | Bluegill | 50–900 | 110 presences absences | Seven rivers in Kanto region, Japan | 4 | At least 400 PA
Bazzichetto et al. 2023 | 2 virtual | 200–500 | Compared with actual virtual species distribution | 10 794 km² / 1 × 1 km | 2 | At least 200 PA
Wang and Jackson 2023 | 16 virtual | 50–800 | 50 presences–absences | 140 000 km² / 4 × 4 km | 2 | At least 100 PA
the number of predictors reasonably small with respect to the sample size (Williams et al. 2012, Brun et al. 2020, Ramampiandra et al. 2023). The minimum required sample size increases with the number of parameters, which also determines the complexity of the assumed species response curves (e.g. quadratic response curves or statistical interactions among predictors; Austin 2002, Barry and Elith 2006, Maggini et al. 2006, Ficetola et al. 2014, Merow et al. 2014, Bell and Schlaepfer 2016, Carretero and Sillero 2016). To minimise the risks of overfitting and underfitting, it is useful to evaluate models with varying levels of complexity and sample size and to select the one with the best performance while also minimising the performance difference between model training and testing (Merow et al. 2014, Ramampiandra et al. 2023).
The minimum ratio of events to predictor variables is suggested by the ‘events per variable’ (EPV) rule. A popular criterion says that one should rely on at least ten observations per predictor considering the event class (presence or absence in case of binary data) with the lowest abundance (e.g. a dataset with 70 presences and 30 absences would allow including a maximum of three predictors; Reineking and Schröder 2006). However, it is worth noting that the EPV rule is a guideline rather than a strict rule, and it is increasingly being questioned (van Smeden et al. 2019). For example, the appropriate ratio may vary depending on the specific context and the complexity of the data (García-Callejas and Araújo 2016). Therefore, in addition to sample size, it is important to consider model complexity with respect to sampling bias and positional uncertainty (see sections ‘Positional uncertainty’ and ‘Sampling bias’).
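The arithmetic behind the EPV criterion can be sketched in a few lines of Python. The function name `max_predictors_epv` is hypothetical, and the default of ten events per variable simply mirrors the popular criterion quoted above; as the text notes, it is a guideline and not a fixed standard.

```python
def max_predictors_epv(n_presences: int, n_absences: int,
                       events_per_variable: int = 10) -> int:
    """Upper bound on the number of predictors under the EPV rule.

    The limiting quantity is the size of the less frequent class
    (presences or absences), divided by the required number of
    events per predictor and rounded down.
    """
    if events_per_variable <= 0:
        raise ValueError("events_per_variable must be positive")
    limiting_class = min(n_presences, n_absences)
    return limiting_class // events_per_variable


# Example from the text: 70 presences and 30 absences.
print(max_predictors_epv(70, 30))  # 3
```

The less frequent class drives the result: with 70 presences and 30 absences, it is the 30 absences that cap the model at three predictors.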
Recommendations associated with sample size
The above-mentioned studies showed that SDMs can perform relatively well even with small sample sizes (Table 1). However, the studies mentioned in Table 1 are difficult to compare due to the use of different species, differences in the used modelling algorithms, numbers of parameters, spatial resolutions, and geographical extents. Whether the sample size is considered small or sufficient depends largely on the number of predictors in the model, and the complexity and nature of the species–environment relationships (Merow et al. 2014, Smith and Santos 2020, Bazzichetto et al. 2023). Hence, given how context-dependent these relationships are, we cannot recommend a specific threshold of what a ‘small’ or ‘large’ sample is, but we provide a series of steps that researchers should consider when preparing SDMs:
First, determining the sample size required for a particular analysis calls for careful consideration of the purpose of the study (Foody 2011). On the one hand, models based on low sample sizes can help identify potential knowledge gaps and optimise the allocation of funds for field surveys (e.g. to pinpoint areas with a high potential for discovering unknown populations of the studied species; Raxworthy et al. 2003, Fois et al. 2015, 2018, Rhoden et al. 2017, Becker et al. 2022). On the other hand, healthy scepticism remains in the scientific community of (macro)ecologists and biogeographers regarding the usability of predictions derived from models with small sample sizes as guidelines for applications such as modelling species ranges, predicting responses to climate change, or planning conservation efforts (Loiselle et al. 2008, Feeley and Silman 2011, Duputié et al. 2014, Muscatello et al. 2021).
Second, species’ ecology has to be considered as SDMs better predict specialist species with narrow ecological niches than generalist species with wider ecological niches (Tsoar et al. 2007).
Third, researchers should consider the number of predictors investigated. As the ability to differentiate between influential and non-influential variables decreases with decreasing sample sizes, the challenge lies in the a priori identification of variables that genuinely influence species distribution (Smith and Santos 2020). Studies that include a small number of variables selected based on expert opinion will generally require a smaller sample size than studies that select variables from a large pool using automated algorithms (Ficetola et al. 2014).
Fourth, the complexity of the shape of species response curves must be taken into account as models based on small sample sizes result in less precise estimates of these shapes (Bazzichetto et al. 2023). Models aiming at generating simple response curves (e.g. linear, hinge, or step) can be developed with relatively low sample sizes. However, models identifying more complex shapes such as Gaussian or even non-parametric smooth functions require much larger sample sizes. Adding interactions between variables increases the requirements for the sample size even more.
Fifth, we cannot suggest a minimum number of presences (presences–absences) as a rule of thumb, but if a researcher is unsure whether the sample size is sufficient given the objectives and complexity of the model, we recommend testing the effect of sample size. Start with the most comprehensive model you think is appropriate in your particular case and progressively increase the sample size until you reach your possible maximum (i.e. all presences you have), and see if your model performance is reaching a plateau. If no plateau is reached, it is likely that more presences are necessary. In such a case, a reduction in the number of variables or the complexity of the response curves should be considered. Remember to set aside at least 30 presence–absence records for model validation, as recommended by Jiménez-Valverde (2020).
Finally, while it is possible to design accurate SDMs with a well-balanced sampling of as few as 50 presences (Table 1), most observational data are too ad hoc and far from being representative of spatial variation in species–environment associations due to confounding effects of data limitations such as positional uncertainty (section ‘Positional uncertainty’), or sampling bias (section ‘Sampling bias’). Hence, researchers should also consider these data limitations before attempting to build a model based on a small sample.
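The sample-size test described in the fifth step can be sketched as a simple plateau check in Python. Given model performance recorded at increasing training sample sizes, the hypothetical helper `reached_plateau` reports whether the most recent increases in sample size still improved the score. The tolerance, the number of increments inspected, and the example scores are invented for illustration and are not recommendations from the reviewed studies.

```python
def reached_plateau(sample_sizes, scores, tolerance=0.01, n_last=2):
    """Return True if each of the last `n_last` increases in sample size
    improved the performance score by less than `tolerance`.

    A drop in score also counts as 'no improvement'.
    """
    if len(sample_sizes) != len(scores):
        raise ValueError("sample_sizes and scores must have the same length")
    if len(scores) < n_last + 1:
        return False  # too few points to judge
    # Order the scores by sample size before computing successive gains.
    ordered = [score for _, score in sorted(zip(sample_sizes, scores))]
    gains = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return all(gain < tolerance for gain in gains[-n_last:])


# Invented example scores (e.g. a discrimination metric on held-out data).
sizes = [25, 50, 100, 200, 400]
levelling_off = [0.62, 0.71, 0.78, 0.785, 0.787]
still_rising = [0.60, 0.66, 0.72, 0.78, 0.84]

print(reached_plateau(sizes, levelling_off))  # True
print(reached_plateau(sizes, still_rising))   # False
```

In practice each score would come from refitting the same model on a random subsample of the given size, ideally averaged over several repetitions, and evaluated on the validation records that were set aside beforehand.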
Positional uncertainty
Species occurrence data are always prone to positional uncertainty, i.e. the difference between the actual and recorded location of a species in the coordinate reference system of the dataset. The magnitude of the positional uncertainty associated with species observations can range from a few centimetres up to tens of kilometres. Under high positional uncertainty, SDMs using environmental layers at spatial resolutions finer than the magnitude of the positional uncertainty (e.g. environmental layers at a 10 m resolution and a 50 m positional uncertainty of species observations) can estimate erroneous/misleading species–environment relationships. The potential effect of positional uncertainty on SDMs performance is determined by several interacting factors (Fig. 2). Therefore, positional uncertainty should be assessed before calibrating and validating SDMs, as it can negatively affect training and testing datasets as well as modelling decisions, such as the spatial resolution of environmental variables.
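A first screening step of this kind can be sketched in Python: compare each record's reported positional uncertainty with the cell size of the environmental layers and flag records whose uncertainty is larger or unknown. The function name `screen_by_uncertainty`, the `uncertainty_m` field, and the toy records are assumptions made for this illustration, and comparing uncertainty with cell size is only one simple screening rule.

```python
def screen_by_uncertainty(records, cell_size_m):
    """Split occurrence records by positional uncertainty.

    `records` is a list of dicts with an 'uncertainty_m' key
    (hypothetical field name; None means the uncertainty is unknown).
    Records are flagged when the uncertainty is unknown or exceeds
    the cell size of the environmental layers.
    """
    usable, flagged = [], []
    for rec in records:
        uncertainty = rec.get("uncertainty_m")
        if uncertainty is None or uncertainty > cell_size_m:
            flagged.append(rec)
        else:
            usable.append(rec)
    return usable, flagged


# Toy records echoing the example in the text: 10 m layers, 50 m uncertainty.
occurrences = [
    {"id": "a", "uncertainty_m": 5.0},
    {"id": "b", "uncertainty_m": 50.0},
    {"id": "c", "uncertainty_m": None},
]
usable, flagged = screen_by_uncertainty(occurrences, cell_size_m=10.0)
print([r["id"] for r in usable], [r["id"] for r in flagged])  # ['a'] ['b', 'c']
```

Flagging is deliberately kept separate from discarding, so that the loss in sample size can be weighed before any record is removed.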
How to address positional uncertainty in training and testing datasets
Several studies have examined the impact of positional uncertainty on SDMs performance by simulating shifts in species presences (Table 2). These studies typically compare SDMs outcomes based on data with high positional accuracy against results obtained using the same data but affected by positional uncertainty of different magnitudes. Findings from these studies have been somewhat mixed: some found little effect of positional uncertainty and reported that SDMs were relatively robust to it (Graham et al. 2008, Fernandez et al. 2009); others concluded that species occurrence data with positional uncertainty generally lead to less accurate SDMs (Johnson and Gillingham 2008, Osborne and Leitao 2009, Mitchell et al. 2017).
In real-world applications, a mix of high- and low-accuracy distribution data is the most common situation, and researchers usually have to find a compromise between positional uncertainty and sample size (Smith et al. 2023). However, studies focusing on this issue yielded somewhat conflicting results. Reside et al. (2011) warned that increasing the sample size by incorporating historic species occurrence data with inaccurate positions can reduce SDMs performances. On the other hand, Smith et al. (2023) showed that the removal of data with high positional uncertainty can excessively reduce the sample size and, thus, the model accuracy. Furthermore, Gábor et al. (2023) showed that even models affected by positional uncertainty in species data can be ecologically interpretable. Another study investigating the effect of positional uncertainty concluded that models with small sample sizes were more affected by positional uncertainty than models based on larger sample sizes (Mitchell et al. 2017).
Figure 2. Three groups of interacting factors that determine the magnitude and potential impact of positional uncertainty on species distribution models (SDMs) performance can be specified: the recording technique and data processing (section ‘Role of recording technique and data processing’); species ecology and characteristics of the site (section ‘Relationships between positional uncertainty, species ecology and ecosystem characteristics’); and the spatial resolution and degree of spatial autocorrelation of the predictors (section ‘Relationship between positional uncertainty, spatial resolution and autocorrelation’).
e role of positional uncertainty is rarely considered
in the evaluation of SDMs. Surprisingly, most SDM stud-
ies dealing with positional uncertainty only focused on the
training dataset, while ignoring the (potential) eect of inac-
curately georeferenced data in the validation dataset. e
ultimate consequence of positional uncertainty in species
data lies in an erroneous identication of the presence or
absence in a given cell (i.e. in specic environmental condi-
tions). In this regard, Foody (2011) demonstrated that vali-
dation data should be error-free (i.e. correctly distinguish
between presences and absences), as even a small amount
of error could result in misidentication of presences/
absences and substantial misestimation of model perfor-
mance. erefore, data correctly labelled as species presence
or absence (i.e. with minimal positional uncertainty) are
essential for assessing model performance. More recently,
Moudrýet al. (2017) showed that the inclusion of poten-
tially erroneous presences (in this case ambiguous breeding
bird categories used in the breeding bird atlases, i.e. possible
and probable breeding) severely aected models’ perfor-
mance metrics when added to the validation dataset, while
it had a relatively minor eect on model performance when
added to the training dataset. erefore, we suggest relying
on large sample size, possibly including observations with
low positional accuracy (i.e. with higher positional uncer-
tainty than the spatial resolution of predictors) for model
calibration, while preserving high-accuracy data for model
validation.
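The suggestion above can be sketched as a simple partitioning step. In
the following illustration, the `Occurrence` record type, the
`split_for_sdm` function, and the `validation_fraction` parameter are
all hypothetical names introduced here; the choice to reserve half of
the accurate records for validation is likewise an assumption, not a
value recommended by the reviewed studies.

```python
import random
from typing import NamedTuple


class Occurrence(NamedTuple):
    """Hypothetical occurrence record with a documented uncertainty."""
    x: float
    y: float
    uncertainty_m: float


def split_for_sdm(records, cell_size_m, validation_fraction=0.5, seed=0):
    """Reserve only high-accuracy records for validation.

    Records whose positional uncertainty is below the predictor cell
    size count as 'accurate'. A fraction of them is held out for
    validation; the remaining accurate records plus all inaccurate
    records form the training set, which keeps the sample size large.
    """
    accurate = [r for r in records if r.uncertainty_m < cell_size_m]
    inaccurate = [r for r in records if r.uncertainty_m >= cell_size_m]
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    rng.shuffle(accurate)
    n_validation = int(len(accurate) * validation_fraction)
    validation = accurate[:n_validation]
    training = accurate[n_validation:] + inaccurate
    return training, validation
```

No record is discarded in this sketch: low-accuracy observations only
contribute to calibration, while model performance is judged
exclusively on records whose cell assignment can be trusted.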
Alternatively, Moudrý and Šímová (2012) suggested that knowing the
positional uncertainty of the occurrences allows balancing high- and
poor-quality data in both training and testing datasets, e.g. by
including positional uncertainty as a predictor in the model (even as
a categorical variable with a few levels of positional uncertainty) or
by up/downweighting the importance of observations (see
Velásquez-Tibatá et al. 2016 for such an approach using Bayesian
models). This allows preserving most of the data and offsetting the
potential negative effect of high positional uncertainty. On the other
hand, if the predictor has many levels and few observations per level,
it might be better to subset the data to retain only those of the best
quality. If only a small sample size is available, we recommend
considering the use of methods to mitigate positional uncertainty
(Hefley et al. 2014, Zhang et al. 2018, Smith et al. 2023). Note,
however, that the existing approaches typically either require
knowledge of the magnitude of the uncertainty and are limited to data
with relatively small positional uncertainty (Zhang et al. 2018), or
require that at least part of the dataset is recorded with minimal
positional uncertainty (Hefley et al. 2014, Smith et al. 2023).
Although recent literature favours the inclusion of observations with
reasonable positional uncertainty rather than reducing sample size
(Gábor et al. 2023, Smith et al. 2023), we recommend careful
consideration of this trade-off. Whether it is preferable to maintain
the sample size or to minimise the adverse effect of positional
uncertainty remains a timely and unanswered question.
Role of recording technique and data processing
Old datasets, such as historical observations archived in museums,
atlases, and natural history collections that were retrospectively
georeferenced, are usually thought to be more prone to positional
error than new ones (Graham et al. 2004, Wieczorek et al. 2004,
Newbold 2010, Bloom et al. 2018, Marcer et al. 2022). However,
positional error affects any dataset, including those georeferenced
using modern technologies such as global navigation satellite systems
(GNSS). Indeed, several factors can degrade GNSS positional accuracy,
including the number and position of satellites and the
characteristics of the study site (e.g. beneath a dense forest canopy
versus an open grassland). A low number of satellites used to
georeference species data may result from outdated technology, such as
a device that relies only on the US Global Positioning System (GPS)
instead of all currently available systems (e.g. Galileo, Glonass, and
Beidou). Even when the above-mentioned challenges are overcome,
species occurrence data may still be impacted by errors introduced
during data processing (e.g. wrong transformations among coordinate
reference systems, rounding of coordinates, or lack of error
correction procedures such as post-differential correction; Sillero
and Gonçalves-Seco 2014). Unfortunately, the positional uncertainty of
species records is often undocumented (Moudrý and Devillers 2020,
Marcer et al. 2022).

Table 2. Studies analysing the influence of positional uncertainty in species occurrence data on species distribution models (SDMs).

| Study | Species data | Resolution of environmental variables | Range of shifting occurrences (distance) | Range of shifting occurrences (cells) |
|---|---|---|---|---|
| Graham et al. 2008 | Observed | 100 × 100 m | 0–5 km | 0–50 cells |
| Johnson and Gillingham 2008 | Observed | 30 × 30 m | 50–1000 m | 1–34 cells |
| Osborne and Leitao 2009 | Observed | 1 × 1 km | 0–5 km | 0–5 cells |
| Fernandez et al. 2009 | Observed | 1 × 1 km | 5–50 km | 1–50 cells |
| Naimi et al. 2011 | Virtual | Artificial data | Not valid | 1–30 cells |
| Mitchell et al. 2017 | Observed | 2.5 × 2.5 m | 5–400 m | 1–160 cells |
| Velásquez-Tibatá et al. 2016 | Virtual | 150 × 150 cells | Not valid | 5–15 cells |
| Gábor et al. 2020b | Virtual | 5 × 5 m | 5–500 m | 1–100 cells |
| Gábor et al. 2023 | Virtual | 50 × 50 m | 50–1500 m | 1–30 cells |
| Gábor et al. 2023 | Observed | 200 × 200 m | 1–30 km | 1–30 cells |
Relationships between positional uncertainty,
species ecology, and ecosystem characteristics
It is usually impossible to accurately georeference positions for
non-sessile species (unless they are equipped with transmitters) due
to environmental barriers (for example, it is impossible to get close
to the species in some habitats) and/or species characteristics (e.g.
size, mobility, and behaviour) (Frair et al. 2010). Besides,
georeferencing a species' location using GNSS in a dense forest or at
the bottom of a narrow and deep ravine may be difficult due to poor
reception of the satellite signal. In addition, buildings, walls, and
trees in the proximity of an antenna can reflect the signal from
satellites, thereby further reducing the positioning accuracy (a
phenomenon known as multipath; Kos et al. 2010). Moreover, GNSS does
not work underwater; in effect, the positioning of species
observations in marine and freshwater environments is based either on
acoustic positioning, whose accuracy decreases with water column
depth, or simply on recording a position at the water surface and
disregarding movements of the sampling gear in the water column
(Rattray et al. 2014, Mitchell et al. 2017). As a result, data for
mobile animals can have a positional uncertainty of tens to hundreds
of metres. The distance between an animal and the observer is
positively associated with the species' body size; big animals are
therefore typically less accurately georeferenced, as they move a lot
or can even be dangerous, which leads to recording their location from
a distance (Zhang et al. 2018).
The effect of positional uncertainty on SDMs may also depend on the
species' mobility, expressed as the daily dispersal range or migration
ability. Many birds, fishes, and big predators are very mobile, and
the accurate georeferencing of their location may play a smaller role
in SDM performance than in the case of sessile species (see Fig. 2 for
an overview of the factors that may interact with the magnitude of
positional uncertainty when building SDMs). In this regard, Gábor et
al. (2023) showed that the performance of a band-tailed pigeon SDM
only slightly decreased with increasing positional uncertainty, while
virtual species simulations that did not consider species mobility
showed a rapid decrease in SDM performance. Although the effect of
positional uncertainty seems to depend on species characteristics, its
role in affecting SDMs for different groups (such as insects versus
big mammals; mobile organisms like birds versus sessile organisms like
plants, corals, etc.) is understudied. Among the few studies that
analysed the interaction between positional uncertainty and species
ecology, Velásquez-Tibatá et al. (2016) and, more recently, Gábor et
al. (2020b) showed that positional uncertainty has a greater impact on
SDM performance for specialists (i.e. species with a narrow niche
breadth) than for generalist species (i.e. those with a wide niche
breadth). This is because occurrences of specialist species are more
susceptible to a shift into unsuitable environments.
Relationships between positional uncertainty, spatial
resolution, and autocorrelation
The spatial resolution of predictors used in SDMs is another critical
factor determining the impact of positional uncertainty on model
performance. Previous studies on positional uncertainty considered
shifts from 5 m up to 50 km. Such a range of uncertainty results in a
less impactful shift of species data over raster cells (and across
environmental conditions) in a coarse-resolution set of environmental
layers (e.g. 1 × 1 km) than in a fine-resolution set of environmental
layers (e.g. 10 × 10 m). Note that more recent studies investigated
shifts of the species occurrence data by up to 160 pixels (roughly
three times the maximum in older studies) thanks to the reduced pixel
sizes in current environmental data (see Table 2 for the combinations
of adopted resolution and positional uncertainty in existing studies).
Indeed, with today’s availability of high spatial resolution
predictors, misuse of positionally inaccurate species occurrences is
increasingly likely, with the risk of exacerbating the negative effect
of positional uncertainty on SDM performance.
To reduce the eect of positional uncertainty, multiple
studies suggested adjusting spatial resolution so that the largest
positional uncertainty associated with occurrence data is lower
than the spatial resolution of the predictors (Engleretal. 2004,
Moudrý and Šímová 2012, Keiletal. 2014, Volleringetal.
2016, Silleroetal. 2021a). However, coarsening the spatial
resolution of the environmental variables may degrade
information on ne-scale heterogeneity in environmental
variables, eventually reducing their explanatory power for
predicting species distribution (Mertes and Jetz 2018). In
addition, spatial resolution can be coarsened to a level that is
too far from the relevant ecological scale (Lecoursetal. 2015,
Moudrýetal. 2023). Recently, Gáboretal. (2022) showed that
coarsening the spatial resolution to compensate for positional
uncertainty does not improve model performance. However,
they used a relatively simple virtual species approach, so more
studies, preferably using ‘real’ species, are needed to validate
their results. Whether maintaining the spatial resolution of
the response variable close to the ecological scale is more
important than minimising the adverse eect of positional
uncertainty (or whether the opposite is true) remains a very
current and unanswered question (see Moudrýetal. 2023 for
a review of practices for appropriate grain selection).
It is crucial to recognise that shifting species records in the
geographic space does not necessarily translate to an equivalent shift
in the environmental space. High positional uncertainty can lead to
mischaracterising the conditions under which a species occurs,
especially in regions characterised by steep ecological gradients,
such as mountainous areas or heavily fragmented landscapes. Indeed,
the impact of positional uncertainty is related to the spatial
autocorrelation of environmental variables. Naimi et al. (2011) found
that the impact of positional uncertainty on SDM prediction
performance decreased with increasing spatial autocorrelation in the
environmental variables. In this regard, examining the degree of
spatial autocorrelation in environmental variables was suggested as a
way to assess a priori the impact of positional uncertainty on SDM
predictions (Naimi et al. 2011, 2014).
Recommendations associated with positional
uncertainty
It is crucial to consider data quality and to carefully assess the
implications of using data affected by positional uncertainty in
either the training or validation process. Such considerations
will yield more reliable assessments of model performance
and improve the accuracy of SDMs.
First, we recommend ‘cleaning’ the dataset and removing aberrant
errors (e.g. records with switched latitude and longitude, or records
located at zoos or botanical gardens). This can be performed using
automated methods such as those implemented by the
‘CoordinateCleaner’ R package (Zizka et al. 2019).
Second, researchers should quantify the positional uncertainty of the
remaining input data, for example, using attributes specifying
positional uncertainty. If such assessment is limited by metadata
availability, for example in the case of historical data, it is
recommended to at least approximate the positional uncertainty based
on known information, such as the collection methodology or the number
of decimals recorded with coordinates (Peterson and Samy 2016,
Watcharamongkol et al. 2018, Moudrý and Devillers 2020).
Third, we recommend that researchers carefully weigh the trade-offs
between positional uncertainty and spatial resolution of environmental
variables, with greater emphasis on the use of a resolution as close
to the ecological scale as possible (Gábor et al. 2022, Moudrý et al.
2023). Preferably, the positional uncertainty should be lower than the
spatial resolution of the environmental variables (Moudrý and Šímová
2012). We suggest that the spatial resolution (cell size) should be at
least twice the positional uncertainty to reduce the risk of
misestimating species–environment relationships. However, this may not
always be achievable. In such a case, it is important to consider the
following steps to estimate and acknowledge the potential impact of
positional uncertainty on the performance of the model.
Fourth, we suggest considering positional uncertainty in light of the
particular species’ ecology, as some groups of species, such as mobile
species, might be less affected by positional uncertainty than others
(Gábor et al. 2020b).
Fifth, researchers should examine the spatial autocorrelation in
predictors to gain insight into whether predictions are likely to be
affected by positional uncertainty (Naimi et al. 2011, 2014). This may
include testing the impact of various resolutions on model
performance.
Finally, we recommend considering the use of methods to mitigate
positional uncertainty (Hefley et al. 2014, Zhang et al. 2018, Smith
et al. 2023). Alternatively, knowing the positional uncertainty of the
occurrences allows it to be included as a predictor in the model or
used to up/downweight the importance of observations (Moudrý and
Šímová 2012, Velásquez-Tibatá et al. 2016). For new surveys, we
suggest using measurement techniques that minimise positional
uncertainty, such as differential GNSS (Sillero et al. 2021b), and
providing an estimate of the measurement accuracy (as is increasingly
common in global databases).
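Two of the recommendations above lend themselves to a short
illustration: approximating positional uncertainty from the number of
decimals recorded with coordinates, and checking whether the cell size
is at least twice that uncertainty. The sketch below rests on several
assumptions that are ours rather than those of the cited studies:
coordinates are in decimal degrees, one degree is taken as roughly
111 km (the approximate length of a degree of latitude), and rounding
is the only source of error. All function names are hypothetical.

```python
# Approximate length of one degree of latitude in metres. This is a
# rough constant assumed for illustration; a degree of longitude is
# shorter away from the equator.
METRES_PER_DEGREE = 111_320


def decimals_in(coordinate_text: str) -> int:
    """Count the digits after the decimal point in a coordinate string."""
    _, _, fraction = coordinate_text.partition(".")
    return len(fraction)


def uncertainty_from_decimals(coordinate_text: str) -> float:
    """Rough rounding error (metres) implied by the recorded decimals.

    Rounding to d decimals can move a value by up to half of 10**-d
    degrees. Trailing zeros dropped during data entry would make the
    coordinate look less precise than it is, so this is only a
    first approximation.
    """
    step_degrees = 10 ** (-decimals_in(coordinate_text))
    return 0.5 * step_degrees * METRES_PER_DEGREE


def resolution_is_adequate(cell_size_m: float, uncertainty_m: float) -> bool:
    """Check the rule of thumb: cell size at least twice the uncertainty."""
    return cell_size_m >= 2 * uncertainty_m


estimate = uncertainty_from_decimals("49.12")  # two decimals, about 557 m
print(round(estimate), resolution_is_adequate(1000, estimate))
```

Under these assumptions, a coordinate recorded with two decimals
implies a rounding error of several hundred metres, which already
fails the rule of thumb for 1 km predictors.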
Sampling bias
Sampling bias poses a signicant challenge in SDMs, lead-
ing to models that provide a partial or distorted view of spe-
cies distribution or ecological niche (Kadmon et al. 2004,
Leitãoetal. 2011, Beanetal. 2012, Becketal. 2014, Stolar
and Nielsen 2015, Bardonet al. 2021). Despite advances,
our knowledge of species distributions still remains limited
for most taxa due to the variations in the sampling inten-
sity over time and huge regions of the world remaining
poorly sampled (Isaac and Pocock 2015, Menegotto and
Rangel 2018, Hughes et al. 2021, Daru and Rodriguez
2023). Typically, positive sampling biases have been reported
towards easily accessible areas (e.g. proximity to roads, riv-
ers, and urban settlements, Kadmonetal. 2004), protected
areas (Boakesetal. 2010, Girardelloetal. 2019), more popu-
lated areas (Geldmannetal. 2016), and charismatic species
(Troudetetal. 2017), leading to spatial and taxonomic biases
(Hugesetal. 2021). Uneven data-sharing practices further
exacerbate this issue (Meyeretal. 2015). Various methods
have been proposed to compensate for sampling bias in spe-
cies occurrence records, aiming to create models with qual-
ity comparable to models developed with unbiased data.
Prevalent approaches for bias compensation include adjust-
ing background samples (the target-group background, TGB,
approach; Phillipsetal. 2009) in presence–background mod-
els, or ltering (thinning) presences (Veloz 2009) (Table 3).
e rationale behind the TGB approach is to select back-
ground data with the same sampling bias as the set of presence
records (i.e. to bias the background locations towards areas
where the presences were sampled; Phillipsetal. 2009). e
TGB approach adjusts the selection of the background data
by assessing the ‘sampling eort’, which indicates the eort
invested during sampling. For example, the TGB approach
restricts the sampling of background data to locations where
other species of the same order or family as the target species
have been observed (preferably using the same methodology/
database). is is done assuming that hypothetical surveys
would have detected the focal species if it had been present
in those locations. erefore it is especially useful for large
citizen science projects (Barberetal. 2022, Boydetal. 2023)
but less suitable for poorly sampled regions where information
on the target group may not be available. An appropriately
selected target-group background leads to a more reliable esti-
mation of species–environment relationships. Note, however,
the importance of careful selection of target group species, as
the density of occurrences not only reects sampling eort but
also the varied densities of species and their ecological prefer-
ences, potentially introducing new biases (Botellaetal. 2020).
The filtering approach (or thinning) was designed to reduce the
negative effect of sampling bias by reducing the number of presences
in oversampled regions in the geographic space (Veloz 2009) or
oversampled environmental conditions in the environmental space
(Varela et al. 2014). Both geographic and environmental filtering use
a distance between presences to determine the filter size. However,
while geographic filtering uses distances in the geographic space
(e.g. latitude and longitude), environmental filtering uses distances
between values of multiple environmental variables (Varela et al.
2014, Castellanos et al. 2019). Another strategy carried out in the
environmental space is to use presence data (i.e. their position in
the environmental space) to identify and filter out background points
likely associated with suitable habitats (Da Re et al. 2023). Many
studies have evaluated the performance of these methods, either by
simulating the bias through sub-sampling the original data (i.e. a
presumably complete dataset without any bias) or by addressing bias
already present in the datasets (Table 3). Such assessments require
independent evaluation data containing both presence and absence
records, or comparison against models based on the unbiased dataset
before sub-sampling simulation.
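The difference between the two filtering strategies can be illustrated
with a minimal sketch. Both functions below are simplified, greedy
versions written for illustration: `geographic_thin` assumes planar
coordinates in metres, `environmental_filter` keeps one record per bin
of a gridded environmental space, and all names and bin widths are
hypothetical rather than taken from the cited implementations.

```python
import math


def geographic_thin(points, min_distance):
    """Greedy geographic filtering.

    Keeps a point only if it lies at least min_distance away from
    every point already kept. The result depends on input order.
    """
    kept = []
    for point in points:
        if all(math.dist(point, other) >= min_distance for other in kept):
            kept.append(point)
    return kept


def environmental_filter(env_values, bin_widths):
    """Greedy environmental filtering.

    env_values: one tuple of predictor values per presence record.
    Returns the indices of the first record found in each bin of the
    gridded environmental space.
    """
    seen_bins = set()
    kept_indices = []
    for index, values in enumerate(env_values):
        key = tuple(math.floor(v / w) for v, w in zip(values, bin_widths))
        if key not in seen_bins:
            seen_bins.add(key)
            kept_indices.append(index)
    return kept_indices


# Two clustered pairs collapse to single records at a 5 m filter.
print(geographic_thin([(0, 0), (1, 0), (10, 0), (10.5, 0), (30, 0)], 5))
# Records 0 and 1 share a temperature/precipitation bin; 2 and 3 do not.
print(environmental_filter([(10.2, 500), (10.4, 520), (15.0, 510), (10.1, 900)],
                           bin_widths=(1.0, 100)))
```

The first function removes records that are close on the map regardless
of their conditions; the second removes records that are close in
predictor values regardless of where they were collected.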
Should the bias be assessed in the geographic or
environmental space?
There is an ongoing debate about whether bias should be assessed in
the geographic or environmental space, or both (Varela et al. 2014,
Moudrý 2015, Cosentino and Maiorano 2021, Xu et al. 2024). According
to Hutchinson's duality, there is a correspondence between the
species' niche in environmental space and its distribution in
geographic space. This means that the environmental conditions where a
species occurs (its ecological niche) are reflected in its geographic
distribution. Conversely, the geographic distribution of a species can
provide insights into its ecological niche requirements (Colwell and
Rangel 2009). In theory, every location in geographic space can be
‘uniquely’ characterised by the environmental conditions at that
location. However, projections of subsets of environmental space into
geographic space can have complicated structures (i.e. a single point
in environmental space may correspond to many locations in geographic
space; see Colwell and Rangel 2009, Soberón and Nakamura 2009). If
only partial knowledge of the ecological niche of a species is
available, predicting its distribution in geographic space may result
in the omission of multiple locations. On the other hand, a missing
site in the geographic space may be substituted by another site with
the same environmental conditions. Consequently, the challenge in
estimating species–environment relationships lies not only in the
spatial bias within the geographic space where the bias originates but
also in how this bias is reflected in the environmental space (i.e.
the ecological niche space). SDMs are not purely spatial methods
(like interpolation, for instance); their calculations actually occur
within the environmental space defining the species’ ecological niche.
Therefore, addressing bias within the environmental space directly
tackles the model calibration.

Table 3. Studies that evaluated the effect of sampling bias and the effectiveness of methods proposed to compensate for sampling bias on model performance. TGB, target-group background.

| Study | Number of species | Bias type | Evaluation approach | Bias correction | Main conclusion |
|---|---|---|---|---|---|
| Phillips et al. 2009 | 226 | Existing | Independent data | TGB | Bias correction improves models |
| Bystriakova et al. 2012 | 5 plants (Asplenium spp.) | Existing | Independent data (but only presences) | TGB | Bias correction improves models |
| Kramer-Schadt et al. 2013 | Malay civet, two virtual species | Existing, Simulated | Simulated data | Geographic filtering, TGB | Geographic filter is preferred relative to TGB |
| Syfert et al. 2013 | Tree fern | Existing | Independent data | TGB | Bias correction improves models |
| Fourcade et al. 2014 | Turtle, salamander, virtual species | Simulated | Original model based on unbiased data | Five methods | Variable efficiency, further research needed |
| Varela et al. 2014 | Virtual | Simulated | Original model based on unbiased data | Environmental and geographic filtering | Recommend environmental filtering |
| Ranc et al. 2017 | Virtual | Simulated | True distribution of simulated species | TGB | Bias correction is detrimental for some species |
| Castellanos et al. 2019 | Virtual | Simulated | True distribution of simulated species | Environmental and geographic filtering | Recommend environmental filtering |
| Gábor et al. 2020a | Virtual | Simulated | True distribution of simulated species | Environmental filtering | Filtering is not necessarily helpful |
| Chauvier et al. 2021 | 1,900 plants | Existing | Independent data | Bias covariate correction, and environmental bias correction | Combining both methods might be the best choice |
| Inman et al. 2021 | Virtual | Simulated | True distribution of simulated species | TGB, geographic and environmental filtering | Bias correction is detrimental for some species |
| Baker et al. 2022 | Virtual | Simulated | True distribution of simulated species | Geographic filtering | More mechanistic understanding of how sampling biases arise is needed |
Sampling bias is inuenced by the sampling design (Hirzel
and Guisan 2002, Tessarolo et al. 2014, Mateoetal. 2018,
Bazzichettoetal. 2023). A fundamental assumption under-
lying presence–background methods is that environmental
conditions are sampled in proportion to their actual avail-
ability (Hastie and Fithian 2013). Note that it is not a geo-
graphic space where uniform sampling is required but rather
the environmental conditions that have to be sampled in pro-
portion to their availability (Aartsetal. 2012, Merowet al.
2013). If this is not fullled, clustered occurrences may lead
to the overestimation of the environmental suitability for the
respective species in environments that have been sampled
more intensively (e.g. environments in protected areas, or
near roads and towns) and underestimated for those surveyed
less intensively (Barry and Elith 2006, Guillera-Arroitaetal.
2015). For instance, fully random draws of species' presence
in the geographic space may introduce a bias towards the most
widespread environmental conditions, which possibly leads
to uneven sampling of the species’ niche within the environ-
mental space (Bazzichettoet al. 2023). is issue is associ-
ated with another underlying assumption: that the species'
niche is comprehensively sampled across the entire spectrum
of environmental conditions in which it occurs (Phillipsetal.
2009). Failing to meet this assumption, which can happen
when there is a lack of knowledge about a species’ tolerance
to abiotic conditions (i.e. environmental bias), may cause a
poor estimation of the actual niche occupied by the species
(Hortaletal. 2008). If the ecological niche of the species is
truncated (i.e. the complete niche of the species is not cap-
tured by the occurrences), it is not possible to extrapolate a
reliable model into dierent spatial or temporal dimensions
(Chevalieretal. 2022). erefore, representative sampling of
the environmental space should in principle give better results,
regardless of its bias in the geographic space (Tessaroloetal.
2014, Sabatinietal. 2021, Bazzichettoetal. 2023).
We recommend considering both geographic and environmental spaces in
the assessment of sampling bias (Tessarolo et al. 2014, Cosentino and
Maiorano 2021). In areas of high geographic and high environmental
bias, and particularly in undersampled environments, further sampling
efforts are required. Alternatively, bias correction based on the TGB
method or geographic filtering can be suitable options (Inman et al.
2021), although the latter was recently strongly criticised, and its
effectiveness in mitigating sampling biases is being questioned (Ten
Caten and Dallas 2023, Lamboley and Fourcade 2024). Given that
geographic filtering reduces the sample size, TGB seems to be a better
alternative (Barber et al. 2022). However, a bias in the geographic
space does not necessarily lead to a bias in the environmental space.
If the geographic bias is high but the environmental bias is low, no
corrections are needed, and the data can be used ‘as is’ for
modelling. For example, Kadmon et al. (2004) and more recently
McCarthy et al. (2012) showed that collecting data close to roads can
still provide an adequate sampling of ecological gradients if the road
network has high environmental coverage, thus allowing the uncovering
of the true species–environment relationships. In the case of low
geographic but high environmental bias, further sampling of
undersampled environments is preferable; however, if this is not
possible, it is reasonable to consider a correction directly in the
environmental space using environmental filtering (Varela et al. 2014,
Cosentino and Maiorano 2021). Nevertheless, see the risks of
performing this procedure described in the following paragraph.
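The combinations discussed above can be summarised as a small lookup.
The function below is a hypothetical illustration that simply restates
the recommendations in this section; how to decide whether a bias
counts as ‘high’ is left to the analyst, and the case of low bias in
both spaces is not discussed explicitly in the text, so the value
returned for it is our assumption.

```python
def suggest_bias_handling(geographic_bias_high: bool,
                          environmental_bias_high: bool) -> str:
    """Map the two bias assessments to the option discussed in the text."""
    if geographic_bias_high and environmental_bias_high:
        return "further sampling; otherwise TGB or geographic filtering"
    if geographic_bias_high:
        # High geographic but low environmental bias.
        return "no correction needed; use the data as is"
    if environmental_bias_high:
        # Low geographic but high environmental bias.
        return ("sample undersampled environments; "
                "otherwise environmental filtering")
    # Assumption: low bias in both spaces is not covered in the text.
    return "no correction assumed necessary"


print(suggest_bias_handling(True, False))
```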
Geographic and environmental spaces are communicating vessels, and so
correcting one component (geographic or environmental) may have a
detrimental effect on the other. For example, geographical filtering
could unwittingly exclude occurrences with unique environmental
conditions or disguise true patterns, e.g. due to clustering for
ecological reasons such as breeding, social behaviour, or
predator–prey dynamics (Varela et al. 2014). On the other hand,
environmental filtering (downweighting repeated species occurrences in
similar environmental conditions) treats grid cells within marginal
habitats as equally suitable as cells representing core habitats. For
example, if the species' probability of occurrence is 0.1 at one site
and 0.7 at another, such sites will be occupied in one and seven out
of 10 cases, respectively. If we disregard the additional presences at
the latter site, we lose the ability to discern the conditions
favoured by the species (Moudrý et al. 2015). Indeed, it is impossible
to use presence–background data to determine whether species
observations in particular environments result from a larger sampling
effort or from ecological preferences (Guillera-Arroita et al. 2015),
and removing bias without information on the sampling effort becomes
quixotic (Rocchini et al. 2023).
How sampling bias (and correction methods)
interact with species ecology
Several studies have reported no improvement, or even detrimental
effects, on SDM performance after filtering out biased samples
(Chefaoui and Serrão 2017, Ranc et al. 2017, Gábor et al. 2020a), and
it has been suggested that this might be related to species ecology
(Bystriakova et al. 2012). For example, Ranc et al. (2017) showed that
range size was the most important factor driving species vulnerability
to sampling bias, and that widespread species were more affected by
sampling bias and more likely to benefit from bias correction than
species with narrow geographic ranges. Similarly, Baker et al. (2022)
showed that species type has a notable effect on model performance,
with models generally being more robust to the effects of sampling
bias for specialist (narrow environmental niches) than for generalist
(wide environmental niches) species. In addition, a few studies
highlighted that bias correction was detrimental for species with
narrow ranges (Ranc et al. 2017), narrow niches (Inman et al. 2021),
or low prevalence (Gábor et al. 2020a) and yielded worse models than
without bias correction. It is evident that different species are
differently affected by sampling bias and respond differently to bias
correction. Therefore, species ecology should be considered when
correcting for sampling bias.
Recommendations associated with sampling bias
Complete elimination of spatial bias from the modelling
procedure is impossible without proper knowledge of all
the processes generating it (Rocchinietal. 2023), and it is
unrealistic to assume that sampling bias in biodiversity data
can be eliminated, even with the development of automated
observation technologies. Hence, SDMs need to explore and
acknowledge the inherent biases associated with the data in
both the geographic and environmental space (Cosentino
and Maiorano 2021, Rocchini et al. 2023).
First, researchers should quantify the sampling bias of
their input data in the geographic space. For example, the
‘sampbias’ R package (Zizka et al. 2021) can be used for
such purposes.
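As a complement to dedicated tools, a first impression of geographic bias can be obtained from simple per-cell record counts. The following Python sketch is an illustration only and does not reproduce what ‘sampbias’ does: it assumes projected coordinates inside a rectangular study extent, and the function names `grid_counts` and `dispersion_index` are hypothetical. It computes the variance-to-mean ratio of records per grid cell, which is close to 1 for a spatially random pattern and well above 1 when records are concentrated in a few cells.

```python
from collections import Counter


def grid_counts(points, cell_size, extent):
    """Count records per grid cell; cells without records are returned as 0.

    points: iterable of (x, y) in projected units (assumed to lie inside extent).
    extent: (xmin, ymin, xmax, ymax) of the rectangular study area.
    """
    xmin, ymin, xmax, ymax = extent
    ncols = int((xmax - xmin) / cell_size)
    nrows = int((ymax - ymin) / cell_size)
    counts = Counter()
    for x, y in points:
        # Clamp so that points lying exactly on the upper edge fall in the last cell.
        col = min(int((x - xmin) / cell_size), ncols - 1)
        row = min(int((y - ymin) / cell_size), nrows - 1)
        counts[(row, col)] += 1
    return [counts.get((r, c), 0) for r in range(nrows) for c in range(ncols)]


def dispersion_index(counts):
    """Variance-to-mean ratio of per-cell counts (sample variance, n - 1)."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


clustered = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.1), (0.4, 0.4)]
even = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
extent = (0, 0, 2, 2)
print(dispersion_index(grid_counts(clustered, 1, extent)))
print(dispersion_index(grid_counts(even, 1, extent)))
```

Clumping can also reflect the true distribution of a species, so an index of this kind is probably more informative when computed on all records of a broader target group than on a single species.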
Second, bias should also be evaluated in the environmental
space by comparing the distribution of the cells where
the focal species was present to all cells in the study area
in a gridded environmental space of ecological predictors.
This can be done, for example, by using ecological niche
factor analysis (Hirzel et al. 2002), the ‘hypervolume’ R package
(Blonder et al. 2014), or principal component analysis
in the ‘ecospat’ R package (Di Cola et al. 2017).
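The comparison of presence cells with all cells in a gridded environmental space can be sketched in a few lines of Python. This is a simplified stand-in for the methods cited above, not an implementation of any of them: it assumes two predictors (for example, two principal component axes) that both vary across the study area, and the names `env_bin` and `environmental_coverage` are hypothetical. The function returns the share of environmental bins available in the study area that contain at least one presence.

```python
def env_bin(value, vmin, vmax, n_bins):
    """Map a predictor value to a bin index in [0, n_bins - 1]."""
    idx = int((value - vmin) / (vmax - vmin) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def environmental_coverage(all_cells, presence_cells, n_bins=10):
    """Share of available environmental bins holding at least one presence.

    all_cells, presence_cells: lists of (predictor_1, predictor_2) tuples,
    one tuple per raster cell. Bin limits are taken from all_cells.
    """
    mins = [min(c[k] for c in all_cells) for k in range(2)]
    maxs = [max(c[k] for c in all_cells) for k in range(2)]

    def to_bin(cell):
        return tuple(env_bin(cell[k], mins[k], maxs[k], n_bins) for k in range(2))

    available = {to_bin(c) for c in all_cells}
    # Only count presence bins that exist in the study area.
    occupied = {to_bin(c) for c in presence_cells} & available
    return len(occupied) / len(available)


study_area = [(t, p) for t in range(10) for p in range(10)]
presences = [(0, 0), (1, 1), (0, 1), (1, 0)]
print(environmental_coverage(study_area, presences))
```

A low value does not by itself indicate bias, because a specialist species is expected to occupy a small part of the environmental space; the value is easier to interpret when compared with the coverage achieved by the overall sampling effort.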
The relationship between geographic and environmental
bias should be further explored using local indicators of
spatial association (LISA; Anselin 1995), and the results of
such an assessment should be used as a basis for the selection
of bias-correction methods (Cosentino and Maiorano
2021, Rocchini et al. 2023). This quantification can also
assist researchers in effectively directing their additional
sampling efforts.
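To make the LISA step more concrete, the sketch below computes the univariate local Moran's I (Anselin 1995) for a small raster, for instance a grid of records per cell. It is a minimal illustration under stated assumptions: rook neighbours (up, down, left, right), row-standardised weights, a non-constant grid, and a hypothetical function name `local_morans_i`. Which surface is mapped, and whether a bivariate variant would be more appropriate for linking geographic and environmental bias, is left open here.

```python
def local_morans_i(grid):
    """Local Moran's I for each cell of a 2D list of numbers.

    I_i = (z_i / m2) * sum_j(w_ij * z_j), where z are deviations from the
    mean, m2 is the mean squared deviation, and w_ij are row-standardised
    rook-neighbour weights. Assumes the grid is not constant (m2 > 0).
    """
    nrows, ncols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    mean = sum(values) / len(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    result = [[0.0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            neighbours = [
                grid[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < nrows and 0 <= cc < ncols
            ]
            # Spatial lag: mean deviation of the neighbouring cells.
            lag = sum(v - mean for v in neighbours) / len(neighbours)
            result[r][c] = (grid[r][c] - mean) / m2 * lag
    return result


records_per_cell = [[10, 10, 0, 0],
                    [10, 10, 0, 0]]
checkerboard = [[1, 0],
                [0, 1]]
print(local_morans_i(records_per_cell)[0][0])
print(local_morans_i(checkerboard)[0][0])
```

Positive values mark cells surrounded by similar values (clusters of heavily or sparsely sampled cells), whereas negative values mark cells that differ from their neighbours. Significance testing, usually done by permutation, is omitted from this sketch.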
The next step lies in the application of a bias-correction
method, if necessary. Filtering and the TGB approach
are possible options, but caution is needed as they could
result in lower model performance in particular cases.
This requires consideration of species’ ecology, as specialist
species typically do not benefit from bias correction
or can even be negatively affected by it (Gábor et al.
2020a, Inman et al. 2021, Baker et al. 2022). In addition,
it is important to note that filtering will inevitably
reduce the number of presences available for modelling.
Therefore, if the sample size is relatively small, the TGB
approach might be the preferred method (or alternatives
such as that proposed by Da Re et al. 2023 for filtering
background points, implemented in the ‘USE’ R
package).
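The loss of presences caused by filtering can be checked before committing to it. The Python sketch below shows one simple form of spatial filtering, keeping a single record per grid cell, and reports how many presences remain. It is an assumption-laden illustration: the function name `grid_thin` is hypothetical, coordinates are assumed to be projected, and the first record encountered in each cell is kept, whereas real workflows may select records at random or thin by minimum distance.

```python
def grid_thin(points, cell_size):
    """Keep the first record encountered in each cell of a square grid."""
    kept = {}
    for x, y in points:
        key = (int(x // cell_size), int(y // cell_size))
        kept.setdefault(key, (x, y))  # later records in the same cell are dropped
    return list(kept.values())


presences = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (1.5, 0.2), (5.0, 5.0)]
thinned = grid_thin(presences, cell_size=1.0)
print(f"{len(thinned)} of {len(presences)} presences retained")
```

Running such a check for several candidate cell sizes shows how quickly the sample shrinks. What counts as too few presences depends on the species and the modelling method, so no threshold is built into the sketch.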
Figure3. Workow for a critical assessment of spatial data to be used in species distribution models (SDMs). For more information on the
individual steps, see the ‘Recommendations’ subsections at the end of each main section.
Guidelines and future directions
Despite the increasing number of studies focusing on how
various limitations inherent to species data affect the performance
of SDMs, there are still gaps in our knowledge, and
the use of SDMs remains problematic in many contexts. To
advance our understanding, future studies should focus on
comprehensive analyses that simultaneously consider various
issues, such as sample size, sampling bias (in the geographic
and environmental space), positional uncertainty,
spatial resolution, and the interaction between these
factors and species’ ecological characteristics (Fig. 1). Such
studies can help establish the urgently needed guidelines
for better-informed modelling choices (e.g. bias correction,
removal of data with high positional uncertainty and
its effect on sample size and SDM performance) concerning
data limitations and species ecology. Regarding species
characteristics, it is important to base such evaluations on
characteristics that are easy to specify (i.e. known
for the majority of species), such as species’ niche breadth
(generalist versus specialist species), dispersal ability, body
size, or trophic group. This way, the assessments can be
further used to guide data selection processes in other
studies. The consideration of data limitations is crucial in
every domain where SDMs are used (Araújo and Peterson
2012, Guisan et al. 2013). These include the discovery of
new populations (Fois et al. 2015), reserve selection and
design (Esselman and Allan 2011), species translocations or
reintroductions (Segal et al. 2021), biological invasions and
disease transmission studies (Peterson 2014, Peterson and
Samy 2016, Johnson et al. 2019), investigations of climate
change impacts (Ehrlén and Morris 2015, Haesen et al.
2023), or testing of biogeographical or evolutionary hypotheses
(Machado et al. 2019).
Finally, it is crucial to transparently report bias and uncertainty
in the data used for modelling. This includes quantifying
sampling bias in geographical and environmental space,
as well as positional uncertainty in relation to the spatial
resolution and autocorrelation of predictors (Fig. 3). It also
includes reporting how species occurrences were divided into
training and testing datasets, whether their positional uncertainty
was considered and, if applicable, which records were removed and
what the impact on sample size was. Whenever possible, rigorous
tests should be conducted to examine the impact of
geographical and environmental bias, as well as of positional
uncertainty, on model performance (e.g. indicating which
approaches were considered to minimise bias and positional
uncertainty and their results). Until more comprehensive
assessments are available, we strongly recommend remaining
vigilant about data limitations and following the basic
guidelines for a critical assessment of spatial data to be used
in SDMs shown in Fig. 3. The data collection methods, preprocessing,
model fitting, and quality assessments can be
reported using the standard protocol for reporting SDMs: overview,
data, model, assessment, and prediction (ODMAP;
Zurell et al. 2020).
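Reporting the effect of removing records with high positional uncertainty on sample size can be automated. The following Python sketch is one possible way to produce such a summary and is not part of the ODMAP protocol itself: the record structure (a dictionary with an `uncertainty_m` key, in metres, or `None` when unknown), the threshold, and the function name `uncertainty_report` are assumptions made for illustration.

```python
def uncertainty_report(records, max_uncertainty_m):
    """Summarise how a positional-uncertainty threshold changes sample size.

    Records with unknown uncertainty (None) are counted separately so that
    they are reported rather than silently dropped or silently kept.
    """
    known = [r for r in records if r["uncertainty_m"] is not None]
    kept = [r for r in known if r["uncertainty_m"] <= max_uncertainty_m]
    return {
        "n_total": len(records),
        "n_unknown": len(records) - len(known),
        "n_removed": len(known) - len(kept),
        "n_kept": len(kept),
        "fraction_kept": len(kept) / len(records) if records else 0.0,
    }


occurrences = [
    {"uncertainty_m": 50},
    {"uncertainty_m": 5000},
    {"uncertainty_m": None},
    {"uncertainty_m": 100},
]
print(uncertainty_report(occurrences, max_uncertainty_m=100))
```

A summary of this kind can be included alongside the description of the training and testing split, so that readers can judge how much data each filtering decision cost.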
Funding – Funded by the European Union. Views and opinions
expressed are, however, those of the author(s) only and do not
necessarily reflect those of the European Union or the European
Research Council Executive Agency. Neither the European Union
nor the granting authority can be held responsible for them. This
work was funded by the Horizon Europe project EarthBridge
(grant agreement no. 101079310). RR and AFC were supported
by the Deutsche Forschungsgemeinschaft (DFG, German Research
Foundation) under Germany’s Excellence Strategy – EXC 2070 –
390732324. MB acknowledges funding from the European Union's
Horizon Europe research and innovation programme under the
Marie Skłodowska-Curie grant agreement no. 101066324. RGM
was funded by project grants Connect2restore (TED2021-
129589B-I00, funded by MCIN/AEI/10.13039/501100011033
and by the European Union NextGenerationEU/PRTR), and
NextDive (PID2021-124187NB-I00, funded by MCIN/
AEI/10.13039/501100011033 and by ERDF, a way of making
Europe). AZ-A was supported by a Margarita Salas Contract
financed by the European Union-NextGenerationEU, Ministerio
de Universidades y Plan de Recuperacion, Transformacion y
Resiliencia, Spain. MJM, JW, and JP were supported by the
Czech Academy of Sciences (project RVO 67985939). FL was
funded by the European Union (ERC, BEAST, 101044740). JJL
was supported by BiodivERsA+ (ASICS project (G0H6720N,
BiodivClim call 2019-2020)). NS was supported by a CEEC2017
contract (CEECIND/02213/2017) from FCT – Fundação para a
Ciência e a Tecnologia, Portugal. MT was partially funded by the
European Union’s Horizon 2020 research and innovation program
under grant agreement no. 862480 (SHOWCASE).
Author contributions
Vítězslav Moudrý: Conceptualization (lead); Visualization
(equal); Writing – original draft (lead); Writing – review and
editing (equal). Manuele Bazzichetto: Conceptualization
(equal); Writing – review and editing (equal). Ruben
Remelgado: Conceptualization (equal); Writing – review
and editing (equal). Rodolphe Devillers: Conceptualization
(equal); Writing – review and editing (equal). Jonathan
Lenoir: Conceptualization (equal); Writing – review and
editing (equal). Rubén G. Mateo: Conceptualization
(equal); Writing – review and editing (equal). Jonas J.
Lembrechts: Conceptualization (equal); Writing – review
and editing (equal). Neftalí Sillero: Conceptualization
(equal); Writing – review and editing (equal). Vincent
Lecours: Conceptualization (equal); Writing – review
and editing (equal). Anna F. Cord: Conceptualization
(equal); Writing – review and editing (equal). Vojtěch
Barták: Conceptualization (equal); Writing – review and
editing (equal). Petr Balej: Conceptualization (equal);
Writing – review and editing (equal). Duccio Rocchini:
Conceptualization (equal); Writing – review and editing
(equal). Michele Torresani: Conceptualization (equal);
Writing – review and editing (equal). Salvador Arenas-
Castro: Conceptualization (equal); Writing – review
and editing (equal). Matěj Man: Conceptualization
(equal); Writing – review and editing (equal). Dominika
Prajzlerová: Conceptualization (equal); Writing – review
and editing (equal). Kateřina Gdulová: Conceptualization
(equal); Writing – review and editing (equal). Jiří Prošek:
Visualization (equal); Writing – review and editing
(equal). Elisa Marchetto: Conceptualization (equal);
Writing – review and editing (equal). Alejandra Zarzo-
Arias: Conceptualization (equal); Writing – review and
editing (equal). Lukáš Gábor: Conceptualization (equal);
Writing – review and editing (equal). François Leroy:
Conceptualization (equal); Writing – review and editing
(equal). Matilde Martini: Conceptualization (equal);
Writing – review and editing (equal). Marco Malavasi:
Conceptualization (equal); Writing – review and editing
(equal). Roberto Cazzolla Gatti: Conceptualization
(equal); Writing – review and editing (equal). Jan Wild:
Conceptualization (equal); Writing – review and editing
(equal). Petra Šímová: Conceptualization (equal); Writing
– review and editing (equal).
Transparent peer review
The peer review history for this article is available at
https://www.webofscience.com/api/gateway/wos/peer-review/10.1111/ecog.07294.
Data availability statement
Data sharing is not applicable to this article as no new data
were created or analyzed in this study.
References
Aarts, G., Fieberg, J. and Matthiopoulos, J. 2012. Comparative
interpretation of count, presence–absence and point methods for
species distribution models. – Methods Ecol. Evol. 3: 177–187.
Anselin, L. 1995. Local indicators of spatial association – LISA. –
Geogr. Anal. 27: 93–115.
Araújo, M. B. and Peterson, A. T. 2012. Uses and misuses of bio-
climatic envelope modeling. – Ecology 93: 1527–1539.
Araújo, M. B., Anderson, R. P., Márcia Barbosa, A., Beale, C. M.,
Dormann, C. F., Early, R., Garcia, R. A., Guisan, A., Maiorano,
L., Naimi, B., O’Hara, R. B., Zimmermann, N. E. and Rahbek,
C. 2019. Standards for distribution models in biodiversity
assessments. – Sci. Adv. 5: eaat4858.
Arenas-Castro, S., Regos, A., Martins, I., Honrado, J. and Alonso,
J. 2022. Effects of input data sources on species distribution
model predictions across species with different distributional
ranges. – J. Biogeogr. 49: 1299–1312.
Austin, M. P. 2002. Spatial prediction of species distribution: an
interface between ecological theory and statistical modelling.
– Ecol. Modell. 157: 101–118.
Baartman, J. E., Melsen, L. A., Moore, D. and van der Ploeg, M.
J. 2020. On the complexity of model complexity: viewpoints
across the geosciences. – Catena 186: 104261.
Baker, D. J., Maclean, I. M., Goodall, M. and Gaston, K. J. 2022.
Correlations between spatial sampling biases and environmental
niches affect species distribution models. – Global Ecol.
Biogeogr. 31: 1038–1050.
Barber, R. A., Ball, S. G., Morris, R. K. and Gilbert, F. 2022.
Target-group backgrounds prove effective at correcting sampling
bias in Maxent models. – Divers. Distrib. 28: 128–141.
Bardon, L. R., Ward, B. A., Dutkiewicz, S. and Cael, B. B. 2021.
Testing the skill of a species distribution model using a 21st
century virtual ecosystem. – Geophys. Res. Lett. 48: e2021.
Barry, S. and Elith, J. 2006. Error and uncertainty in habitat mod-
els. – J. Appl. Ecol. 43: 413–423.
Bazzichetto, M., Massol, F., Carboni, M., Lenoir, J., Lembrechts, J. J.,
Joly, R. and Renault, D. 2021. Once upon a time in the far south:
influence of local drivers and functional traits on plant invasion in
the harsh sub-Antarctic islands. – J. Veg. Sci. 32: e13057.
Bazzichetto, M., Lenoir, J., Da Re, D., Tordoni, E., Rocchini, D.,
Malavasi, M., Barták, V. and Sperandii, M. G. 2023. Sampling
strategy matters to accurately estimate response curves'
parameters in species distribution models. – Global Ecol.
Biogeogr. 32: 1717–1729.
Bean, W. T., Stafford, R. and Brashares, J. S. 2012. The effects of
small sample size and sample bias on threshold selection and
accuracy assessment of species distribution models. – Ecography
35: 250–258.
Beck, J., Böller, M., Erhardt, A. and Schwanghart, W. 2014. Spatial
bias in the GBIF database and its effect on modeling species'
geographic distributions. – Ecol. Inform. 19: 10–15.
Becker, F. S., Slingsby, J. A., Measey, J., Tolley, K. A. and Altwegg,
R. 2022. Finding rare species and estimating the probability
that all occupied sites have been found. – Ecol. Appl. 32: e2502.
Bell, D. M. and Schlaepfer, D. R. 2016. On the dangers of model
complexity without ecological justification in species distribution
modeling. – Ecol. Modell. 330: 50–59.
Blonder, B., Lamanna, C., Violle, C. and Enquist, B. J. 2014. The
n-dimensional hypervolume. – Global Ecol. Biogeogr. 23:
595–609.
Bloom, T. D. S., Flower, A. and DeChaine, E. G. 2018. Why georef-
erencing matters: introducing a practical protocol to prepare species
occurrence records for spatial analysis. – Ecol. Evol. 8: 765–777.
Boakes, E. H., McGowan, P. J., Fuller, R. A., Chang-qing, D.,
Clark, N. E., O'Connor, K. and Mace, G. M. 2010. Distorted
views of biodiversity: spatial and temporal bias in species occur-
rence data. – PLoS Biol. 8: e1000385.
Botella, C., Joly, A., Monestiez, P., Bonnet, P. and Munoz, F. 2020.
Bias in presence-only niche models related to sampling effort
and species niches: lessons for background point selection. –
PLoS One 15: e0232078.
Botella, C., Deneu, B., Marcos, D., Servajean, M., Estopinan, J.,
Larcher, T. and Joly, A. 2023. The GeoLifeCLEF 2023 dataset
to evaluate plant species distribution models at high spatial
resolution across Europe. – arXiv preprint arXiv:2308.05121.
Boyd, R. J., Harvey, M., Roy, D. B., Barber, T., Haysom, K. A.,
Macadam, C. R., Morris, R. K. A., Palmer, C., Palmer, S.,
Preston, C. D., Taylor, P., Ward, R., Ball, S. G. and Pescott, O.
L. 2023. Causal inference and large-scale expert validation shed
light on the drivers of SDM accuracy and variance. – Divers.
Distrib. 29: 774–784.
Brun, P., Thuiller, W., Chauvier, Y., Pellissier, L., Wüest, R. O.,
Wang, Z. and Zimmermann, N. E. 2020. Model complexity
affects species distribution projections under climate change.
– J. Biogeogr. 47: 130–142.
Bystriakova, N., Peregrym, M., Erkens, R. H. J., Bezsmertna, O.
and Schneider, H. 2012. Sampling bias in geographic and
environmental space and its effect on the predictive power of
species distribution models. – Syst. Biodivers. 10: 305–315.
Carretero, M. A. and Sillero, N. 2016. Evaluating how species niche
modelling is affected by partial distributions with an empirical
case. – Acta Oecol. 77: 207–216.
Castellanos, A. A., Huntley, J. W., Voelker, G. and Lawing, A. M.
2019. Environmental filtering improves ecological niche models
across multiple scales. – Methods Ecol. Evol. 10: 481–492.
Chauvier, Y., Zimmermann, N. E., Poggiato, G., Bystrova, D.,
Brun, P. and Thuiller, W. 2021. Novel methods to correct for
observer and sampling bias in presence-only species distribution
models. – Global Ecol. Biogeogr. 30: 2312–2325.
Chefaoui, R. M. and Serrão, E. A. 2017. Accounting for uncertainty
in predictions of a marine species: integrating population genet-
ics to verify past distributions. – Ecol. Modell. 359: 229–239.
Chevalier, M., Zarzo-Arias, A., Guélat, J., Mateo, R. G. and Guisan,
A. 2022. Accounting for niche truncation to improve spatial
and temporal predictions of species distributions. – Front. Ecol.
Evol. 10: 944116.
Collart, F. and Guisan, A. 2023. Small to train, small to test: deal-
ing with low sample size in model evaluation. – Ecol. Inform.
75: 102106.
Collart, F., Broennimann, O., Guisan, A. and Vanderpoorten, A.
2023. Ecological and biological indicators of the accuracy of
species distribution models: lessons from European bryophytes.
– Ecography 23: e06721.
Colwell, R. K. and Rangel, T. F. 2009. Hutchinson's duality: the once
and future niche. – Proc. Natl Acad. Sci. USA 106: 19651–19658.
Cosentino, F. and Maiorano, L. 2021. Is geographic sampling bias
representative of environmental space? – Ecol. Inform. 64:
101369.
Coudun, C. and Gégout, J. C. 2006. The derivation of species
response curves with Gaussian logistic regression is sensitive to
sampling intensity and curve characteristics. – Ecol. Modell.
199: 164–175.
Da Re, D., Tordoni, E., Lenoir, J., Vanwambeke, S. O., Rocchini,
D., Bazzichetto, M. and SoilTemp Consortium. 2023. Use it:
uniformly sampling pseudo-absences within the environmental
space for applications in habitat suitability models.
Daru, B. H. and Rodriguez, J. 2023. Mass production of
unvouchered records fails to represent global biodiversity
patterns. – Nat. Ecol. Evol. 7: 816–831.
Davies, S. C., Thompson, P. L., Gomez, C., Nephin, J., Knudby,
A., Park, A. E., Friesen, S. K., Pollock, L. J., Rubidge, E. M.,
Anderson, S. C., Iacarella, J. C., Lyons, D. A., MacDonald, A.,
McMillan, A., Ward, E. J., Holdsworth, A. M., Swart, N., Price,
J. and Hunter, K. L. 2023. Addressing uncertainty when
projecting marine species' distributions under climate change.
– Ecography 2023: e06731.
Di Cola, V., Broennimann, O., Petitpierre, B., Breiner, F. T.,
d'Amen, M., Randin, C., Engler, R., Pottier, J., Pio, D., Dubuis,
A., Pellissier, L., Mateo, R. G., Hordijk, W., Salamin, N. and
Guisan, A. 2017. ecospat: an R package to support spatial
analyses and modeling of species niches and distributions. –
Ecography 40: 774–787.
Duputié, A., Zimmermann, N. E. and Chuine, I. 2014. Where are
the wild things? Why we need better data on species distribution.
– Global Ecol. Biogeogr. 23: 457–467.
Ehrlén, J. and Morris, W. F. 2015. Predicting changes in the distri-
bution and abundance of species under environmental change.
– Ecol. Lett. 18: 303–314.
Elith, J. and Leathwick, J. R. 2009. Species distribution models:
ecological explanation and prediction across space and time. –
Annu. Rev. Ecol. Evol. Syst. 40: 677–697.
Elith, J., Burgman, M. A. and Regan, H. M. 2002. Mapping epis-
temic uncertainties and vague concepts in predictions of species
distribution. – Ecol. Modell. 157: 313–329.
Engler, R., Guisan, A. and Rechsteiner, L. 2004. An improved
approach for predicting the distribution of rare and endangered
species from occurrence and pseudo-absence data. – J. Appl.
Ecol. 41: 263–274.
Esselman, P. C. and Allan, J. D. 2011. Application of species dis-
tribution models and conservation planning software to the
design of a reserve network for the riverine fishes of northeastern
Mesoamerica. – Freshw. Biol. 56: 71–88.
Feeley, K. J. and Silman, M. R. 2011. Keep collecting: accurate
species distribution modelling requires more collections than
previously thought. – Divers. Distrib. 17: 1132–1140.
Feng, X., Park, D. S., Walker, C., Peterson, A. T., Merow, C. and
Papeş, M. 2019. A checklist for maximizing reproducibility of
ecological niche models. – Nat. Ecol. Evol. 3: 1382–1395.
Fernandez, M., Blum, S., Reichle, S., Guo, Q., Holzman, B. and
Hamilton, H. 2009. Locality uncertainty and the differential
performance of four common niche-based modeling techniques.
– Biodivers. Inform. 6: 36–52.
Ferrier, S., Jetz, W. and Scharlemann, J. 2017. Biodiversity modelling
as part of an observation system. The GEO handbook on
biodiversity observation networks. – Springer, pp. 239–257.
Ficetola, G. F., Bonardi, A., Mücher, C. A., Gilissen, N. L. M. and
Padoa-Schioppa, E. 2014. How many predictors in species
distribution models at the landscape scale? Land use versus
LiDAR-derived canopy height. – Int. J. Geogr. Inf. Sci. 28:
1723–1739.
Fois, M., Fenu, G., Cuena Lombraña, A. C., Cogoni, D. and Bacchetta,
G. 2015. A practical method to speed up the discovery
of unknown populations using species distribution models. – J.
Nat. Conserv. 24: 42–48.
Fois, M., Cuena-Lombraña, A., Fenu, G. and Bacchetta, G. 2018.
Using species distribution models at local scale to guide the
search of poorly known species: review, methodological issues
and future directions. – Ecol. Modell. 385: 124–132.
Foody, G. M. 2011. Impacts of imperfect reference data on the
apparent accuracy of species presence–absence models and their
predictions. – Global Ecol. Biogeogr. 20: 498–508.
Fourcade, Y., Engler, J. O., Rödder, D. and Secondi, J. 2014. Map-
ping species distributions with MAXENT using a geographi-
cally biased sample of presence data: a performance assessment
of methods for correcting sampling bias. – PLoS One 9: e97122.
Fourcade, Y., Besnard, A. G. and Secondi, J. 2018. Paintings predict
the distribution of species, or the challenge of selecting
environmental predictors and evaluation statistics. – Global
Ecol. Biogeogr. 27: 245–256.
Frair, J. L., Fieberg, J., Hebblewhite, M., Cagnacci, F., DeCesare,
N. J. and Pedrotti, L. 2010. Resolving issues of imprecise and
habitat-biased locations in ecological analyses using GPS telem-
etry data. – Phil. Trans. R. Soc. B 365: 2187–2200.
Gábor, L., Moudrý, V., Barták, V. and Lecours, V. 2020a. How do
species and data characteristics affect species distribution models
and when to use environmental filtering? – Int. J. Geogr. Inf.
Sci. 34: 1567–1584.
Gábor, L., Moudrý, V., Lecours, V., Malavasi, M., Barták, V., Fogl,
M., Šímová, P., Rocchini, D. and Václavík, T. 2020b. The effect
of positional error on fine scale species distribution models
increases for specialist species. – Ecography 43: 256–269.
Gábor, L., Jetz, W., Lu, M., Rocchini, D., Cord, A., Malavasi, M.,
Zarzo-Arias, A., Barták, V. and Moudrý, V. 2022. Positional
errors in species distribution modelling are not overcome by the
coarser grains of analysis. – Methods Ecol. Evol. 13: 2289–2302.
Gábor, L., Jetz, W., Zarzo-Arias, A., Winner, K., Yanco, S., Pinkert, S.,
Marsh, C. J., Rogan, M. S., Mäkinen, J., Rocchini, D., Barták, V.,
Malavasi, M., Balej, P. and Moudrý, V. 2023. Species distribution
models affected by positional uncertainty in species occurrences
can still be ecologically interpretable. – Ecography 2023: e06358.
Gábor, L., Cohen, J., Moudrý, V. and Jetz, W. 2024. Assessing the
applicability of binary land-cover variables to species distribution
models across multiple grains. – Landscape Ecol. 39: 66.
García-Callejas, D. and Araújo, M. B. 2016. The effects of model
and data complexity on predictions from species distributions
models. – Ecol. Modell. 326: 4–12.
Geldmann, J., Heilmann-Clausen, J., Holm, T. E., Levinsky, I.,
Markussen, B. O., Olsen, K., Rahbek, C. and Tøttrup, A. P.
2016. What determines spatial bias in citizen science? Exploring
four recording schemes with different proficiency requirements.
– Divers. Distrib. 22: 1139–1149.
Girardello, M., Chapman, A., Dennis, R., Kaila, L., Borges, P. A.
and Santangeli, A. 2019. Gaps in butterfly inventory data: a
global analysis. – Biol. Conserv. 236: 289–295.
Graham, C. H., Ferrier, S., Huettman, F., Moritz, C. and Peterson,
A. T. 2004. New developments in museum-based informatics
and applications in biodiversity analysis. – Trends Ecol. Evol.
19: 497–503.
Graham, C. H., Elith, J., Hijmans, R. J., Guisan, A., Townsend
Peterson, A., Loiselle, B. A. and NCEAS Predicting Species
Distributions Working Group. 2008. The influence of spatial
errors in species occurrence data used in distribution models.
– J. Appl. Ecol. 45: 239–247.
Guillera-Arroita, G., Lahoz-Monfort, J. J., Elith, J., Gordon, A.,
Kujala, H., Lentini, P. E., McCarthy, M. A., Tingley, R. and
Wintle, B. A. 2015. Is my species distribution model fit for
purpose? Matching data and models to applications. – Global
Ecol. Biogeogr. 24: 276–292.
Guisan, A., Zimmermann, N. E., Elith, J., Graham, C. H., Phillips,
S. and Peterson, A. T. 2007. What matters for predicting the
occurrences of trees: techniques, data, or species' characteristics?
– Ecol. Monogr. 77: 615–630.
Guisan, A.etal. 2013. Predicting species distributions for conserva-
tion decisions. – Ecol. Lett. 16: 1424–1435.
Haesen, S., Lenoir, J., Gril, E., De Frenne, P., Lembrechts, J. J.,
Kopecký, M., Macek, M., Man, M., Wild, J. and Van Meerbeek,
K. 2023. Microclimate reveals the true thermal niche of forest
plant species. – Ecol. Lett. 26: 2043–2055.
Hallman, T. A. and Robinson, W. D. 2020. Deciphering ecology
from statistical artefacts: competing influence of sample size,
prevalence and habitat specialization on species distribution
models and how small evaluation datasets can inflate metrics of
performance. – Divers. Distrib. 26: 315–328.
Hanberry, B. B., He, H. S. and Dey, D. C. 2012. Sample sizes and
model comparison metrics for species distribution models. –
Ecol. Modell. 227: 29–33.
Hastie, T. and Fithian, W. 2013. Inference from presence-only data;
the ongoing controversy. – Ecography 36: 864–867.
Hefley, T. J., Baasch, D. M., Tyre, A. J. and Blankenship, E. E.
2014. Correction of location errors for presence-only species
distribution models. – Methods Ecol. Evol. 5: 207–214.
Heikkinen, R. K., Luoto, M., Araújo, M. B., Virkkala, R., Thuiller,
W. and Sykes, M. T. 2006. Methods and uncertainties in
bioclimatic envelope modelling under climate change. – Prog.
Phys. Geogr. 30: 751–777.
Hernandez, P. A., Graham, C. H., Master, L. L. and Albert, D. L.
2006. The effect of sample size and species characteristics on
performance of different species distribution modeling methods.
– Ecography 29: 773–785.
Hirzel, A. and Guisan, A. 2002. Which is the optimal sampling
strategy for habitat suitability modelling. – Ecol. Modell. 157:
331–341.
Hirzel, A. H., Hausser, J., Chessel, D. and Perrin, N. 2002. Ecological-niche
factor analysis: how to compute habitat-suitability
maps without absence data? – Ecology 83: 2027–2036.
Hortal, J., Jiménez-Valverde, A., Gómez, J. F., Lobo, J. M. and
Baselga, A. 2008. Historical bias in biodiversity inventories
affects the observed environmental niche of the species. – Oikos
117: 847–858.
Hortal, J., de Bello, F., Diniz-Filho, J. A. F., Lewinsohn, T. M.,
Lobo, J. M. and Ladle, R. J. 2015. Seven shortfalls that beset
large-scale knowledge of biodiversity. – Annu. Rev. Ecol. Evol.
Syst. 46: 523–549.
Hughes, A., Dorey, J., Bossert, S., Qiao, H. and Orr, M. 2023. Big
data – big problems? How to circumvent problems in
biodiversity mapping and ensure meaningful results. –
Ecography 2024: e07115.
Hughes, A. C., Orr, M. C., Ma, K., Costello, M. J., Waller, J.,
Provoost, P., Yang, Q., Zhu, C. and Qiao, H. 2021. Sampling
biases shape our view of the natural world. – Ecography 44:
1259–1269.
Inman, R., Franklin, J., Esque, T. and Nussear, K. 2021. Compar-
ing sample bias correction methods for species distribution
modeling using virtual species. – Ecosphere 12: e03422.
Isaac, N. J. and Pocock, M. J. 2015. Bias and information in bio-
logical records. – Biol. J. Linn. Soc. 115: 522–531.
Jansen, J., Woolley, S. N., Dunstan, P. K., Foster, S. D., Hill, N.
A., Haward, M. and Johnson, C. R. 2022. Stop ignoring map
uncertainty in biodiversity science and conservation policy. –
Nat. Ecol. Evol. 6: 828–829.
Jeliazkov, A., Gavish, Y., Marsh, C. J., Geschke, J., Brummitt, N.,
Rocchini, D., Haase, P., Kunin, W. E. and Henle, K. 2022.
Sampling and modelling rare species: conceptual guidelines for
the neglected majority. – Global Change Biol. 28: 3754–3777.
Jiménez-Valverde, A. 2020. Sample size for the evaluation of pres-
ence-absence models. – Ecol. Indic. 114: 106289.
Jiménez-Valverde, A., Lobo, J. and Hortal, J. 2009. The effect of
prevalence and its interaction with sample size on the reliability
of species distribution models. – Commun. Ecol. 10: 196–205.
Johnson, C. J. and Gillingham, M. P. 2008. Sensitivity of species-
distribution models to error, bias, and model design: an
application to resource selection functions for woodland cari-
bou. – Ecol. Modell. 213: 143–155.
Johnson, E. E., Escobar, L. E. and Zambrana-Torrelio, C. 2019.
An ecological framework for modeling the geography of disease
transmission. – Trends Ecol. Evol. 34: 655–668.
Kadmon, R., Farber, O. and Danin, A. 2003. A systematic analysis
of factors affecting the performance of climatic envelope models.
– Ecol. Appl. 13: 853–867.
Kadmon, R., Farber, O. and Danin, A. 2004. Effect of roadside
bias on the accuracy of predictive maps produced by bioclimatic
models. – Ecol. Appl. 14: 401–413.
Keil, P., Wilson, A. M. and Jetz, W. 2014. Uncertainty, priors,
autocorrelation and disparate data in downscaling of species
distributions. – Divers. Distrib. 20: 797–812.
Kos, T., Markezic, I. and Pokrajcic, J. 2010. Effects of multipath
reception on GPS positioning performance. – In: Grgić, M.,
Božek, J. and Grgić, S. (eds), Proceedings ELMAR-2010. IEEE,
pp. 399–402.
Kramer-Schadt, S. et al. 2013. The importance of correcting for
sampling bias in MaxEnt species distribution models. – Divers.
Distrib. 19: 1366–1379.
Lamboley, Q. and Fourcade, Y. 2024. No optimal spatial filtering
distance for mitigating sampling bias in ecological niche mod-
els. – J. Biogeogr., doi: 10.1111/jbi.14854.
Lecours, V., Devillers, R., Schneider, D. C., Lucieer, V. L., Brown,
C. J. and Edinger, E. N. 2015. Spatial scale and geographic
context in benthic habitat mapping: review and future
directions. – Mar. Ecol. Prog. Ser. 535: 259–284.
Leitão, P. J., Moreira, F. and Osborne, P. E. 2011. Effects of geographical
data sampling bias on habitat models of species distributions:
a case study with steppe birds in southern Portugal.
– Int. J. Geogr. Inf. Sci. 25: 439–454.
Liu, C., Newell, G. and White, M. 2019. The effect of sample size
on the accuracy of species distribution models: considering both
presences and pseudo-absences or background sites. – Ecography
42: 535–548.
Loiselle, B. A., Jørgensen, P. M., Consiglio, T., Jiménez, I., Blake,
J. G., Lohmann, L. G. and Montiel, O. M. 2008. Predicting
species distributions from herbarium collections: does climate
bias in collection sampling influence model outcomes? – J. Biogeogr.
35: 105–116.
Machado, A. F., Nunes, M. S., Silva, C. R., Dos Santos, M. A.,
Farias, I. P., da Silva, M. N. F. and Anciães, M. 2019. Integrating
phylogeography and ecological niche modelling to test
diversification hypotheses using a Neotropical rodent. – Evol.
Ecol. 33: 111–148.
Maggini, R., Lehmann, A., Zimmermann, N. E. and Guisan, A.
2006. Improving generalized regression analysis for the spatial
prediction of forest communities. – J. Biogeogr. 33: 1729–1749.
Marcer, A., Chapman, A. D., Wieczorek, J. R., Xavier Picó, F.,
Uribe, F., Waller, J. and Ariño, A. H. 2022. Uncertainty matters:
ascertaining where specimens in natural history collections
come from and its implications for predicting species distribu-
tions. – Ecography 2022: e06025.
Mateo, R. G., Felicísimo, Á. M. and Muñoz, J. 2010. Effects of the
number of presences on reliability and stability of MARS
species distribution models: the importance of regional niche
variation and ecological heterogeneity. – J. Veg. Sci. 21:
908–922.
Mateo, R. G., Gastón, A., Aroca-Fernández, M. J., Saura, S. and
García-Viñas, J. I. 2018. Optimization of forest sampling
strategies for woody plant species distribution modelling at the
landscape scale. – For. Ecol. Manage. 410: 104–113.
McCarthy, K. P., Fletcher Jr, R. J., Rota, C. T. and Hutto, R. L.
2012. Predicting species distributions from samples collected
along roadsides. – Conserv. Biol. 26: 68–77.
McPherson, J. M. and Jetz, W. 2007. Effects of species’ ecology on
the accuracy of distribution models. – Ecography 30: 135–151.
McPherson, J. M., Jetz, W. and Rogers, D. J. 2004. The effects of
species’ range sizes on the accuracy of distribution models:
ecological phenomenon or statistical artefact? – J. Appl. Ecol.
41: 811–823.
Menegotto, A. and Rangel, T. F. 2018. Mapping knowledge gaps
in marine diversity reveals a latitudinal gradient of missing
species richness. – Nat. Commun. 9: 4713.
Merow, C., Smith, M. J. and Silander Jr, J. A. 2013. A practical guide
to MaxEnt for modeling species' distributions: what it does, and
why inputs and settings matter. – Ecography 36: 1058–1069.
Merow, C., Smith, M. J., Edwards Jr, T. C., Guisan, A., McMahon,
S. M., Normand, S., Thuiller, W., Wüest, R. O., Zimmermann,
N. E. and Elith, J. 2014. What do we gain from simplicity
versus complexity in species distribution models? – Ecography
37: 1267–1281.
Mertes, K. and Jetz, W. 2018. Disentangling scale dependencies in
species environmental niches and distributions. – Ecography
41: 1604–1615.
Meyer, C., Kreft, H., Guralnick, R. and Jetz, W. 2015. Global
priorities for an effective information basis of biodiversity
distributions. – Nat. Commun. 6: 8221.
Mitchell, P. J., Monk, J. and Laurenson, L. 2017. Sensitivity of
fine-scale species distribution models to locational uncertainty
in occurrence data across multiple sample sizes. – Methods
Ecol. Evol. 8: 12–21.
Moreno-Amat, E., Mateo, R. G., Nieto-Lugilde, D., Morueta-
Holme, N., Svenning, J. C. and García-Amorena, I. 2015.
Impact of model complexity on cross-temporal transferability
in Maxent species distribution models: an assessment using
paleobotanical data. – Ecol. Modell. 312: 308–317.
Moudrý, V. 2015. Modelling species distributions with simulated
virtual species. – J. Biogeogr. 42: 1365–1366.
Moudrý, V. and Devillers, R. 2020. Quality and usability challenges
of global marine biodiversity databases: an example for marine
mammal data. – Ecol. Inform. 56: 101051.
Moudrý, V. and Šímová, P. 2012. Influence of positional accuracy,
sample size and scale on modelling species distributions: a
review. – Int. J. Geogr. Inf. Sci. 26: 2083–2095.
Moudrý, V., Komárek, J. and Šímová, P. 2017. Which breeding bird
categories should we use in models of species distribution? –
Ecol. Indic. 74: 526–529.
Moudrý, V., Keil, P., Gábor, L., Lecours, V., Zarzo-Arias, A., Barták,
V., Malavasi, M., Rocchini, D., Torresani, M., Gdulová, K.,
Grattarola, F., Leroy, F., Marchetto, E., Thouverai, E., Prošek,
J., Wild, J. and Šímová, P. 2023. Scale mismatches between
predictor and response variables in species distribution
modelling: a review of practices for appropriate grain selection.
– Prog. Phys. Geogr. 47: 467–482.
Muscatello, A., Elith, J. and Kujala, H. 2021. How decisions about
fitting species distribution models affect conservation outcomes.
– Conserv. Biol. 35: 1309–1320.
Naimi, B., Skidmore, A. K., Groen, T. A. and Hamm, N. A. 2011.
Spatial autocorrelation in predictors reduces the impact of
positional uncertainty in occurrence data on species distribution
modelling. – J. Biogeogr. 38: 1497–1509.
Naimi, B., Hamm, N. A. S., Groen, T. A., Skidmore, A. K. and
Toxopeus, A. G. 2014. Where is positional uncertainty a problem
for species distribution modelling? – Ecography 37:
191–203.
Newbold, T. 2010. Applications and limitations of museum data
for conservation and ecology, with particular attention to
species distribution models. – Prog. Phys. Geogr. 34: 3–22.
Osborne, P. E. and Leitao, P. J. 2009. Effects of species and
habitat positional errors on the performance and interpretation
of species distribution models. – Divers. Distrib. 15:
671–681.
Papeş, M. and Gaubert, P. 2007. Modelling ecological niches from
low numbers of occurrences: assessment of the conservation
status of poorly known viverrids (Mammalia, Carnivora) across
two continents. – Divers. Distrib. 13: 890–902.
Pearson, R. G., Raxworthy, C. J., Nakamura, M. and Townsend
Peterson, A. 2007. Predicting species distributions from small
numbers of occurrence records: a test case using cryptic geckos
in Madagascar. – J. Biogeogr. 34: 102–117.
Peterson, A. T. 2014. Mapping disease transmission risk: enriching
models using biogeography and ecology. – Johns Hopkins Univ.
Press.
Peterson, A. T. and Samy, A. M. 2016. Geographic potential of
disease caused by Ebola and Marburg viruses in Africa. – Acta
Trop. 162: 114–124.
Phillips, S. J., Dudík, M., Elith, J., Graham, C. H., Lehmann, A.,
Leathwick, J. and Ferrier, S. 2009. Sample selection bias and
presence-only distribution models: implications for background
and pseudo-absence data. – Ecol. Appl. 19: 181–197.
Proosdij, A. S. J. van, Sosef, M. S. M., Wieringa, J. J. and Raes, N.
2016. Minimum required number of specimen records to develop
accurate species distribution models. – Ecography 39: 542–552.
Ramampiandra, E. C., Scheidegger, A., Wydler, J. and Schuwirth,
N. 2023. A comparison of machine learning and statistical
species distribution models: quantifying overfitting supports
model interpretation. – Ecol. Modell. 481: 110353.
Ranc, N., Santini, L., Rondinini, C., Boitani, L., Poitevin, F.,
Angerbjörn, A. and Maiorano, L. 2017. Performance tradeoffs
in target-group bias correction for species distribution models.
– Ecography 40: 1076–1087.
Rattray, A., Ierodiaconou, D., Monk, J., Laurenson, L. J. B. and
Kennedy, P. 2014. Quantification of spatial and thematic
uncertainty in the application of underwater video for benthic
habitat mapping. – Mar. Geod. 37: 315–336.
Raxworthy, C. J., Martinez-Meyer, E., Horning, N., Nussbaum, R.
A., Schneider, G. E., Ortega-Huerta, M. A. and Townsend
Peterson, A. 2003. Predicting distributions of known and
unknown reptile species in Madagascar. – Nature 426: 837–841.
Reineking, B. and Schröder, B. S. 2006. Constrain to perform:
regularization of habitat models. – Ecol. Modell. 193: 675–690.
Reside, A. E., Watson, I., VanDerWal, J. and Kutt, A. S. 2011.
Incorporating low-resolution historic species location data
decreases performance of distribution models. – Ecol. Modell.
222: 3444–3448.
Rhoden, C. M., Peterman, W. E. and Taylor, C. A. 2017. Maxent-directed
field surveys identify new populations of narrowly
endemic habitat specialists. – PeerJ 5: e3632.
Rocchini, D., Hortal, J., Lengyel, S., Lobo, J. M., Jimenez-Valverde,
A., Ricotta, C., Bacaro, G. and Chiarucci, A. 2011. Accounting
for uncertainty when mapping species distributions: the need for
maps of ignorance. – Prog. Phys. Geogr. 35: 211–226.
Rocchini, D.etal. 2023. A quixotic view of spatial bias in model-
ling the distribution of species and their diversity. – NPJ Biodiv.
2: 10.
Sabatini, F. M.etal. 2021. sPlotOpen – an environmentally bal-
anced, openaccess, global dataset of vegetation plots. – Global
Ecol. Biogeogr. 30: 1740–1764.
Santini, L., Benítez-López, A., Maiorano, L., Čengić, M. and Huijbregts,
M. A. 2021. Assessing the reliability of species distribution
projections in climate change research. – Divers. Distrib.
27: 1035–1050.
Segal, R. D., Massaro, M., Carlile, N. and Whitsed, R. 2021.
Small-scale species distribution model identifies restricted
breeding habitat for an endemic island bird. – Anim. Conserv.
24: 959–969.
Segurado, P. and Araujo, M. B. 2004. An evaluation of methods for
modelling species distributions. – J. Biogeogr. 31: 1555–1568.
Seoane, J., Carrascal, L. M., Alonso, C. L. and Palomino, D. 2005.
Species-specic traits associated to prediction errors in bird
habitat suitability modelling. – Ecol. Modell. 185: 299–308.
Shiroyama, R., Wang, M. and Yoshimura, C. 2020. Effect of sample
size on habitat suitability estimation using random forests:
a case of bluegill, Lepomis macrochirus. – Ann. Limnol. Int. J.
Limnol. 56: 13.
Sillero, N. 2011. What does ecological modelling model? A proposed
classification of ecological niche models based on their
underlying methods. – Ecol. Modell. 222: 1343–1346.
Sillero, N. and Barbosa, A. M. 2021. Common mistakes in ecological
niche models. – Int. J. Geogr. Inf. Sci. 35: 213–226.
Sillero, N. and Gonçalves-Seco, L. 2014. Spatial structure analysis
of a reptile community with airborne LiDAR data. – Int. J.
Geogr. Inf. Sci. 28: 1709–1722.
Sillero, N., Arenas-Castro, S., Enriquez-Urzelai, U., Vale, C. G.,
Sousa-Guedes, D., Martínez-Freiría, F., Real, R. and Barbosa,
A. M. 2021a. Want to model a species niche? A step-by-step
guideline on correlative ecological niche modelling. – Ecol.
Modell. 456: 109671.
Sillero, N., Dos Santos, R., Teodoro, A. C. and Carretero, M. A.
2021b. Ecological niche models improve home range estimations.
– J. Zool. 313: 145–157.
Smith, A. B. and Santos, M. J. 2020. Testing the ability of species
distribution models to infer variable importance. – Ecography
43: 1801–1813.
Smith, A. B., Murphy, S. J., Henderson, D. and Erickson, K. D.
2023. Including imprecisely georeferenced specimens improves
accuracy of species distribution models and estimates of niche
breadth. – Global Ecol. Biogeogr. 32: 342–355.
Soberón, J. and Nakamura, M. 2009. Niches and distributional
areas: concepts, methods, and assumptions. – Proc. Natl Acad.
Sci. USA 106: 19644–19650.
Støa, B., Halvorsen, R., Stokland, J. N. and Gusarov, V. I. 2019.
How much is enough? Influence of number of presence observations
on the performance of species distribution models. –
Sommerfeltia 39: 1–28.
Stockwell, D. R. and Peterson, A. T. 2002. Effects of sample size
on accuracy of species distribution models. – Ecol. Modell. 148:
1–13.
Stolar, J. and Nielsen, S. E. 2015. Accounting for spatially biased
sampling eort in presenceonly species distribution modelling.
– Divers. Distrib. 21: 595–608.
Syfert, M. M., Smith, M. J. and Coomes, D. A. 2013. The effects
of sampling bias and model complexity on the predictive performance
of MaxEnt species distribution models. – PLoS One
8: e55158.
Ten Caten, C. and Dallas, T. 2023. Thinning occurrence points
does not improve species distribution model performance. –
Ecosphere 14: e4703.
Tessarolo, G., Rangel, T. F., Araújo, M. B. and Hortal, J. 2014.
Uncertainty associated with survey design in species distribution
models. – Divers. Distrib. 20: 1258–1269.
Tessarolo, G., Ladle, R. J., Lobo, J. M., Rangel, T. F. and Hortal,
J. 2021. Using maps of biogeographical ignorance to reveal the
uncertainty in distributional data hidden in species distribution
models. – Ecography 44: 1743–1755.
Thibaud, E., Petitpierre, B., Broennimann, O., Davison, A. C. and
Guisan, A. 2014. Measuring the relative effect of factors affecting
species distribution model predictions. – Methods Ecol.
Evol. 5: 947–955.
Troudet, J., Grandcolas, P., Blin, A., Vignes-Lebbe, R. and Legendre,
F. 2017. Taxonomic bias in biodiversity data and societal
preferences. – Sci. Rep. 7: 9132.
Tsoar, A., Allouche, O., Steinitz, O., Rotem, D. and Kadmon, R.
2007. A comparative evaluation of presence-only methods for
modelling species distribution. – Divers. Distrib. 13: 397–405.
van Smeden, M., Moons, K. G., de Groot, J. A., Collins, G. S.,
Altman, D. G., Eijkemans, M. J. and Reitsma, J. B. 2019.
Sample size for binary logistic prediction models: beyond
events per variable criteria. – Stat. Methods Med. Res. 28:
2455–2474.
Varela, S., Anderson, R. P., García-Valdés, R. and Fernández-González, F.
2014. Environmental filters reduce the effects of
sampling bias and improve predictions of ecological niche models.
– Ecography 37: 1084–1091.
Velásquez-Tibatá, J., Graham, C. H. and Munch, S. B. 2016. Using
measurement error models to account for georeferencing error
in species distribution models. – Ecography 39: 305–316.
Veloz, S. D. 2009. Spatially autocorrelated sampling falsely inflates
measures of accuracy for presence-only niche models. – J. Biogeogr.
36: 2290–2299.
Vollering, J., Schuiteman, A., de Vogel, E., van Vugt, R. and Raes,
N. 2016. Phytogeography of New Guinean orchids: patterns of
species richness and turnover. – J. Biogeogr. 43: 204–214.
Wang, L. and Jackson, D. A. 2023. Effects of sample size, data
quality, and species response in environmental space on modeling
species distributions. – Landscape Ecol. 38: 4009–4031.
Watcharamongkol, T., Christin, P. A. and Osborne, C. P. 2018. C4
photosynthesis evolved in warm climates but promoted migration
to cooler ones. – Ecol. Lett. 21: 376–383.
Wieczorek, J., Guo, Q. and Hijmans, R. 2004. The point-radius
method for georeferencing locality descriptions and calculating
associated uncertainty. – Int. J. Geogr. Inf. Sci. 18: 745–767.
Williams, K. J., Belbin, L., Austin, M. P., Stein, J. L. and Ferrier,
S. 2012. Which environmental variables should I use in my
biodiversity model? – Int. J. Geogr. Inf. Sci. 26: 2009–2047.
Wisz, M. S., Hijmans, R. J., Li, J., Peterson, A. T., Graham, C. H.,
Guisan, A. and NCEAS Predicting Species Distributions Working
Group. 2008. Effects of sample size on the performance of
species distribution models. – Divers. Distrib. 14: 763–773.
Wüest, R. O., Zimmermann, N. E., Zurell, D., Alexander, J. M.,
Fritz, S. A., Hof, C., Kreft, H., Normand, S., Cabral, J. S.,
Szekely, E., Thuiller, W., Wikelski, M. and Karger, D. N. 2020.
Macroecology in the age of Big Data – where to go from here?
– J. Biogeogr. 47: 1–12.
Xu, Q., Wang, X., Yi, J. and Wang, Y. 2024. Bias correction in
species distribution models based on geographic and environmental
characteristics. – Ecol. Inform. 81: 102604.
Zhang, G., Zhu, A. X., Huang, Z. P. and Xiao, W. 2018. A heuristic-based
approach to mitigating positional errors in patrol
data for species distribution modeling. – Trans. GIS. 22:
202–216.
Zizka, A., Silvestro, D., Andermann, T., Azevedo, J., Duarte Ritter,
C., Edler, D., Farooq, H., Herdean, A., Ariza, M., Scharn, R.,
Svantesson, S., Wengström, N., Zizka, V. and Antonelli, A.
2019. CoordinateCleaner: standardized cleaning of occurrence
records from biological collection databases. – Methods Ecol.
Evol. 10: 744–751.
Zizka, A., Antonelli, A. and Silvestro, D. 2021. Sampbias, a method
for quantifying geographic sampling biases in species distribution
data. – Ecography 44: 25–32.
Zurell, D.et al. 2020. A standard protocol for reporting species
distribution models. – Ecography 43: 1261–1277.
... For species distribution modeling, the occurrence points must be accurate, and special care was taken to select high-quality points. The quality, number, and distribution of occurrence records directly influence the accuracy of SDMs (Bazzichetto et al., 2024; Deiß et al., 2024; Mäkinen et al., 2024; Wahid & Aldiansyah, 2024). For species with limited sample points, the occurrence records were closely evaluated to ensure they met the necessary conditions for modeling, avoiding vague or unreliable data. ...
... Most SDMs require that occurrence data be spatially independent to achieve high performance. However, spatial autocorrelation, where occurrence points are clustered, can bias the model by overfitting it to specific environments (Bazzichetto et al., 2024; Soley-Guardia et al., 2024; Xu et al., 2024). To address this, spatial rarefying was applied. ...
... To this end, species distribution modelling (SDM) and remote sensing have been widely applied for predictions. Ecologists have long used SDMs (Allouche et al. 2006; Jiménez-Valverde 2014; Schwagera and Berg, 2021) to quantify the relationship between species and the environment by identifying ecological niches (Elliott et al., 2024), which enables the creation of distribution maps despite limited and abundant species distribution data (Truong et al., 2017; Moudrý et al., 2024). The performance of the models (boosted regression trees (BRT), Maximum Entropy (MaxEnt), Support Vector Machine (SVM), RF, Generalised Linear Model (GLM)) varied across species and environments, and no single best model was identified (Moudrý et al., 2024). ...
... Ecologists have long used SDMs (Allouche et al. 2006; Jiménez-Valverde 2014; Schwagera and Berg, 2021) to quantify the relationship between species and the environment by identifying ecological niches (Elliott et al., 2024), which enables the creation of distribution maps despite limited and abundant species distribution data (Truong et al., 2017; Moudrý et al., 2024). The performance of the models (boosted regression trees (BRT), Maximum Entropy (MaxEnt), Support Vector Machine (SVM), RF, Generalised Linear Model (GLM)) varied across species and environments, and no single best model was identified (Moudrý et al., 2024). Choosing and implementing these models is required. ...
Savanna rangelands have experienced widespread degradation due to bush encroachment, raising significant concerns among conservationists and rural communities. In the context of climate change, these ecosystem shifts are likely to intensify, especially in South Africa's semi-arid regions. Understanding the impacts of climate variability and change on species distribution within these rangelands is crucial for mitigating further ecosystem disruption. Environmental factors, along with climatic variables, can accelerate the process of bush encroachment, threatening both biodiversity and land use. Early identification of areas vulnerable to invasion is key to developing effective and cost-efficient management strategies. This study aims to model the distribution of invasive species across protected and communal landscapes under long-term climate change projections. A Random Forest (RF) model produced the highest accuracy metrics, with Area under the curve (AUC) = 0.99 and True Skill Statistic (TSS) = 0.97, while a MaxEnt model recorded the second highest AUC (0.98) and TSS (0.97). The results show a clear difference between the current and future scenarios of the spatial distribution in all the models. Applying a species distribution model (SDM) using both MaxEnt and RF produced a higher degree of prediction accuracy because RF is susceptible to overfitting training data while MaxEnt can produce predictable and complex results. Moreover, the overall predictions using the ensemble model demonstrated an increase in areas suitable for encroachment under RCP 8.5 but a decrease in the bush encroachment rate under RCP 2.6. These findings underscore the critical need for proactive management strategies to mitigate bush encroachment, particularly under high-emission scenarios, ensuring the sustainability of semi-arid savanna rangelands in the face of climate change.
... A key question in the modeling process is whether to use all available data regardless of potential errors and ecological relevance or to filter the data to reduce bias (Kramer-Schadt et al. 2013;Guillera-Arroita et al. 2015). In this study, we performed rarefaction of species occurrence points, which involves reducing the spatial clustering of occurrence records to mitigate sampling bias and address issues of spatial autocorrelation and overrepresentation of certain areas due to uneven sampling effort, ultimately leading to improved model performance (Boria et al. 2014;Moudrý et al. 2024). In this study, we found higher model performance after rarefying the occurrence points (Table 2) and achieved the best model performance with occurrence data rarefied at a resolution of 20 km, indicating excellent model accuracy and a strong match between the observed and predicted outcomes. ...
Background Global risk assessment of invasive weeds is a proactive strategy for identifying high-risk species and regions, predicting invasion rates and extents, and evaluating harmful impacts on native biodiversity, agriculture, and ecosystems. In this study, species distribution modeling was used to assess the global invasion risk of Ardisia elliptica, a highly invasive tropical shrub native to South and Southeast Asia that is harmful in other parts of the world, under the current climate and future climate change scenarios [shared socioeconomic pathways (SSPs) SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5] and other environmental variables, including land use and land cover change, soil moisture, soil carbon, soil pH, and human influence index. Results Our study revealed that annual precipitation, human influence index, and precipitation in the wettest month contributed significantly to the MaxEnt model, with estimated contributions of 31.35%, 22.76%, and 14.77%, respectively. These findings suggest that the global distribution of A. elliptica is limited primarily by climatic variables, whereas anthropogenic factors also play an important role in its habitat expansion. The current invasion risk was highest in South America, Oceania (east), and Africa, affecting up to 24.51% of the total land surface area. A risk assessment of 165 countries revealed a risk of invasion in 41 countries with no records of species occurrence. Under future climate change scenarios, a significant global expansion of the distribution was predicted, with invasion in South America covering up to 48.97% of the land surface area by 2061–2080. Habitat suitability analysis revealed that 21 countries under the current climate and 47 countries under SSP5-8.5 had extremely suitable habitats for A. elliptica.
Additionally, the species has already invaded at least 115 countries, while 15 countries, including Benin, Burundi, Japan, Uruguay, Swaziland, and South Korea, are predicted to shift categories from having unsuitable or poor invasion risk to having high invasion risk. Conclusions These findings are crucial for understanding the global invasion risk of A. elliptica under substantial climate change and anthropogenic activities and support the development of effective biosecurity measures and sustainable management strategies for this harmful species at the global and national levels.
... Several approaches exist to address sampling bias (Inman et al., 2021; Baker et al., 2024; Moudrý et al., 2024), including adjusting background points (Barber et al., 2022; Vollering et al., 2019), applying environmental filtering (Castellanos et al., 2019), incorporating bias-related covariates (Varela et al., 2014; Chauvier et al., 2021), or modeling preferential sampling (Diggle et al., 2010; Amaral et al., 2024). One of the most widely used methods to mitigate these biases across the geographic space is spatial thinning, which selectively filters points based on specified criteria to reduce the overrepresentation of the densest locations (Veloz, 2009; Boria et al., 2014). ...
In this paper we present GeoThinneR, an R package for efficient and flexible spatial thinning of species occurrence data. Spatial thinning is a widely used preprocessing step in species distribution modeling (SDM) that can help reduce sampling bias, but existing R implementations rely on brute-force algorithms that scale poorly with large datasets. GeoThinneR implements multiple thinning approaches, including ensuring a minimum distance between points, subsampling points on a grid, and filtering based on decimal precision. To handle large datasets, it introduces two optimized algorithms based on local kd-trees and adaptive neighbor estimation, which greatly reduce memory usage and execution time. Additional functionalities such as group-wise thinning and point prioritization are included to facilitate its use in SDM workflows. We here provide performance benchmarks using both simulated and real-world data to demonstrate substantial performance improvements over existing tools.
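Two of the thinning approaches named above, a minimum distance between retained points and one point per grid cell, can be illustrated with a minimal Python sketch. This is not GeoThinneR's implementation or API: the function names `thin_min_distance` and `thin_grid` are hypothetical, the minimum-distance routine is the kind of brute-force pairwise search that scales poorly with large datasets, and the greedy result depends on the input order of the points.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, assumed for the haversine formula


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def thin_min_distance(points, min_km):
    """Greedy thinning: keep a point only if it lies at least min_km
    from every point already kept (brute force, order dependent)."""
    kept = []
    for p in points:
        if all(haversine_km(p, q) >= min_km for q in kept):
            kept.append(p)
    return kept


def thin_grid(points, cell_deg):
    """Grid thinning: keep the first point encountered in each
    cell_deg x cell_deg cell of a regular lon/lat grid."""
    first_in_cell = {}
    for lon, lat in points:
        cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        first_in_cell.setdefault(cell, (lon, lat))
    return list(first_in_cell.values())


# Illustrative (lon, lat) records: two tight clusters about one degree apart.
records = [(0.0, 0.0), (0.01, 0.0), (1.0, 0.0), (1.001, 0.001)]
thinned_by_distance = thin_min_distance(records, min_km=10)
thinned_by_grid = thin_grid(records, cell_deg=0.5)
```

With these illustrative records, both functions keep one point from each cluster. For real datasets, a spatial index (such as the kd-trees mentioned in the abstract) would replace the pairwise loop.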
... We used this occurrence data set alongside environmental predictors characterising climate, land cover, topography, geology and human pressure to build species distribution models (SDMs) for 49 bat species at 1 km resolution (see Appendix S2 for a description and overview of all environmental predictor variables). For each species, we performed cross-validations to select the best-performing variables with pairwise correlations < 0.7 (Dormann et al. 2013), limiting the number of variables chosen to 1 per 10 available presence records to avoid overfitting (Moudrý et al. 2024; Reineking and Schröder 2006). The number of selected variables ranged from 2 to 16 (median: 8). ...
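The selection rule quoted in this excerpt (pairwise correlations below 0.7 and at most one variable per 10 presence records) can be sketched as a greedy filter. This is an assumption-laden illustration, not the cited study's code: the function name `select_predictors`, the variable names, and the correlation values are hypothetical, and the performance ranking is taken as a given input rather than derived from cross-validation.

```python
def select_predictors(ranked, corr, n_presences, max_abs_r=0.7, records_per_var=10):
    """Walk predictors from best to worst and keep each one whose absolute
    correlation with every already-kept predictor is below max_abs_r,
    stopping once the budget of one variable per records_per_var
    presence records is used up."""
    budget = n_presences // records_per_var
    chosen = []
    for name in ranked:
        if len(chosen) >= budget:
            break
        if all(abs(corr[frozenset((name, other))]) < max_abs_r for other in chosen):
            chosen.append(name)
    return chosen


# Hypothetical predictors, already ranked by (assumed) cross-validated performance.
ranked = ["bio1", "bio12", "elev", "bio5"]

# Hypothetical pairwise Pearson correlations, keyed by unordered pair.
corr = {
    frozenset(("bio1", "bio12")): 0.20,
    frozenset(("bio1", "elev")): -0.85,
    frozenset(("bio1", "bio5")): 0.90,
    frozenset(("bio12", "elev")): 0.10,
    frozenset(("bio12", "bio5")): 0.30,
    frozenset(("elev", "bio5")): -0.60,
}

selected_35 = select_predictors(ranked, corr, n_presences=35)  # budget of 3 variables
selected_15 = select_predictors(ranked, corr, n_presences=15)  # budget of 1 variable
```

In this example, 35 presence records allow up to three variables, but only `bio1` and `bio12` pass the correlation filter; with 15 records the budget of one variable is the binding constraint.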
Aim More species‐rich communities are often assumed to contain more specialist species with narrower niches and smaller ranges. Stronger interspecific competition in species‐rich communities is thought to be a key mechanism explaining these patterns. Yet, the relationship between richness and specialisation has so far only been studied for a few taxa, and characterising the effects of interspecific competition on species distributions is challenging. Here, we assess broad‐scale relationships between niche breadth, range sizes and geographic exclusion along richness gradients of bats. Location Eastern Mediterranean, Western Asia, and Central Asia. Taxon Bats (Chiroptera). Methods Based on a novel integrated species distribution modelling approach that combines occurrence information with expert range maps, we assessed how environmental niche breadth and range sizes varied with species richness. In addition, by contrasting species' potential and realised distributions in areas where species pairs overlap, we derived indicators of geographic exclusion to understand how potential interspecific competition is affecting range limits along richness gradients. Results and Main Conclusions We found a nonlinear association between environmental niche breadth and richness, with the most specialised species occurring in species‐poor regions and niche breadth peaking at intermediate richness. Despite a positive association of niche breadth and range sizes at the species level, range sizes in predicted bat communities declined continuously with species richness. In addition, patterns of geographic exclusion were linked to patterns of niche breadth, with species filling less of their potential range overlaps when overlapping species were more specialised. Our findings suggest that small range sizes in species‐rich bat communities are better explained by the number of interacting species than by environmental specialisation or stronger exclusion between individual species. 
More broadly, we show how integrated distribution modelling approaches can shed new light on the interplay of species richness, specialisation and community structure, and caution against generalising relationships between richness and specialisation across taxa and geographies.
... Such outputs enable researchers to make critical inferences about community assemblages, evolutionary trends, and ecological dynamics, information often beyond the reach of local field data alone (Murphy and Smith 2021; McShea 2014). Despite their broad applicability, SDMs face critical limitations (Moudrý et al. 2024), one of which is sampling bias (Kramer-Schadt et al. 2013). This occurs when available data are disproportionately clumped in certain areas or regions (Phillips et al. 2009). ...
Controlling background data selection in presence-only models is crucial for addressing sampling biases and enhancing model performance. While numerous studies have evaluated the impact of various background data selection techniques across different taxa, research remains limited on how spatially restricted background areas and employing random and biased distribution methods, influence model performance for Rattus species predictions. These species often present challenging collection conditions and low trap success rates, potentially leading to spatial biases in the occurrence records that may affect the accuracy of model predictions. Thus, this study examined methods to assess model accuracy variability for Rattus species by applying spatial background restrictions within the study area. These restrictions were defined by four main criteria: (1) areas within islands with documented species occurrences, (2) areas within the species' extent of occurrence according to IUCN range maps, (3) defined road distance, and (4) varying buffer areas around recorded species occurrences. To further assess the effects of spatial background restrictions on model performance, we used two methods to distribute the background sampling points: random and biased (bias file) method. Our findings demonstrated that the selection of spatial background restrictions and the distribution methods for background sampling points play a critical role in influencing model performance and the accuracy of predicted habitat suitability for Rattus species. Our findings highlight that defining a specific spatial restriction, such as restricting background selection to within 5 km of a road, improves model performance. However, overly narrow or restrictive buffer sizes, such as the 20 km buffer size used in this study, fail to capture the full environmental variability of the species, which can diminish model accuracy. 
Furthermore, the method used to distribute background sampling points, whether random or biased, affects species predictive outcomes. To ensure reliable predictions, we recommend a systematic evaluation of different spatial restriction methods and distribution approaches, along with a thorough analysis of their impacts on model performance. This approach not only reveals how outcomes vary across different modeling scenarios but also provides a strong basis for determining the most reliable predictions. By carefully assessing these factors, researchers can refine and optimize habitat suitability models for Rattus species, ultimately enhancing predictive accuracy and ensuring more consistent and dependable results.
Risk assessments of invasive species present one of the most challenging applications of species distribution models (SDMs) due to the fundamental issues of distributional disequilibrium, niche changes, and truncation. Invasive species often occupy only a fraction of their potential environmental and geographic ranges, as their spatiotemporal dynamics are shaped by intraspecific variability, human‐mediated introductions, novel biotic interactions, climate change, rapid selection, and ecological niche shifts. Traditional correlative SDMs struggle to capture these processes because they implicitly assume distributions are at equilibrium and rely on observed occurrences that seldom represent the full environmental niche of invasive species. Predicting future potential distributions therefore requires moving beyond simple climate‐matching approaches to models that explicitly capture the mechanisms underlying species responses to their environment. Mechanistic niche models (MNMs) are process‐explicit models that address these limitations by capturing species' performance across environmental gradients. By incorporating physiological constraints and vital rates, MNMs offer a mechanistic understanding of species‐environment relationships and enable more robust predictions onto novel environments. However, a unified MNM framework remains elusive. In this review, we delve into the theoretical foundations of MNMs, emphasizing their advantages over correlative approaches, focusing on invasive species. We provide insights into diverse modelling techniques across taxa and examine the benefits and limitations of MNMs for predicting species distributions under novel conditions. Our systematic review reveals that MNMs have been applied sparingly to invasive species, focusing primarily on insects and plants, likely due to high data requirements. 
MNMs constitute the most suitable approach for defining species distribution limits under novel conditions, but their success depends on the relevance of input data and effective parameterisation, including genotype selection, model type, experimental conditions and physiological curve‐fitting techniques. MNMs offer significant potential for advancing ecological research and providing robust tools for evidence‐based management decisions for populations in disequilibrium under changing environmental conditions.
Anthropogenic climate and land use change pose major threats to island floras worldwide, yet few studies have integrated these drivers in a single vulnerability assessment. Here, we examine the endemic flora of Evvia, the second-largest Aegean island in Greece and an important biodiversity hotspot, as a model system to address how these disturbances may reshape species distributions, community composition, and phylogenetic diversity patterns. We used species distribution models under the Ensemble of Small Models and the ENphylo framework, specifically designed to overcome parameter uncertainty in rare species with inherently limited occurrence records. By integrating climate projections and dynamic land use data, we forecasted potential range shifts, habitat fragmentation, and biodiversity patterns for 114 endemic taxa through the year 2100. We addressed transferability uncertainty, a key challenge in projecting distributions under novel conditions, using the Shape framework extrapolation analysis, thus ensuring robust model projections. Our findings reveal pronounced projected range contractions and increased habitat fragmentation for all studied taxa, with more severe impacts on single-island endemics. Our models demonstrated high concordance with established IUCN Red List assessments, validating their ecological relevance despite the sample size limitations of single-island endemics. Current biodiversity hotspots, primarily located in mountainous regions, are expected to shift towards lowland areas, probably becoming extinction hotspots due to projected species losses, especially for Evvia’s single-island endemics. Emerging hotspot analysis identified new biodiversity centres in lowland zones, while high-altitude areas showed sporadic hotspot patterns. Temporal beta diversity analysis indicated higher species turnover of distantly related taxa at higher elevations, with closely related species clustering at lower altitudes. 
This pattern suggests a homogenisation of plant communities in lowland areas. The assessment of protected area effectiveness revealed that while 94.6% of current biodiversity hotspots are within protected zones, this coverage is projected to decline by 2100. Our analysis identified conservation gaps, highlighting areas requiring urgent protection to preserve future biodiversity. Our study reveals valuable information regarding the vulnerability of island endemic floras to global change, offering a framework applicable to other insular systems. Our findings demonstrate that adaptive conservation strategies should account for projected biodiversity shifts and serve as a warning for other insular biodiversity hotspots, urging immediate actions to maintain the unique evolutionary heritage of islands.
Preprint
Risk assessments of invasive species are among the most challenging applications of species distribution models (SDMs). This challenge arises from the disequilibrium in invasive distributions, where recorded occurrences do not fully represent the species' potential range. The spatiotemporal dynamics of invasive populations are shaped by intraspecific variability, human-mediated introductions, novel biotic interactions, climate change, and ecological niche shifts, which are only indirectly incorporated into correlative SDMs. Predicting future potential distributions under these conditions requires moving beyond traditional frameworks reliant on historical climatic data to models that explicitly capture the mechanisms underlying species potential. Mechanistic niche models (MNMs) address these limitations as process-explicit models that integrate species' physiological performance across environmental gradients. By incorporating physiological constraints and vital rates, MNMs define species distribution limits, offering a mechanistic understanding of species-environment relationships and enabling more robust predictions under changing conditions. However, a unified MNM framework remains elusive. In this review we delve into the theoretical foundations of MNMs, emphasizing their advantages over correlative approaches, especially for invasive species. We provide insights into diverse modelling techniques across taxa and examine the benefits and limitations of MNMs for predicting species distributions under novel conditions. Our systematic review revealed that MNMs have been applied sparingly to invasive species, focusing primarily on insects and plants, likely due to high data requirements. 
While MNMs do not explicitly capture spatial processes, they remain the most suitable approach for defining species distribution limits under novel conditions, but their success depends on the relevance of input data and effective parameterization, including genotype selection, model type, experimental conditions, and physiological curve-fitting techniques. MNMs offer significant potential for advancing ecological research and providing robust tools for evidence-based management decisions. By addressing key challenges, they can enhance our understanding of invasive species and other populations in disequilibrium under changing environmental conditions.
Article
Our knowledge of biodiversity hinges on sufficient data, reliable methods, and realistic models. Without an accurate assessment of species distributions, we cannot effectively target and stem biodiversity loss. Species range maps are the foundation of such efforts, but countless studies have failed to account for the most basic assumptions of reliable species mapping practices, undermining the credibility of their results and potentially misleading and hindering conservation and management efforts. Here, we use examples from the recent literature and broader conservation community to highlight the substantial shortfalls in current practices and their consequences for both analyses and conservation management. We detail how different decisions on data filtering impact the outcomes of analysis and provide practical recommendations and steps for more reliable analysis, whilst understanding the limits of what available data will reliably allow and what methods are most appropriate. Whilst perfect analyses are not possible for many taxa given limited and biased data, using data within reasonable limits and understanding the inherent assumptions is crucial to ensure appropriate use. By embracing and enacting such best practices, we can ensure both the accuracy and improved comparability of biodiversity analyses going forward, ultimately enhancing our ability to use data to facilitate our protection of the natural world.
Article
Aim: The continuous development of statistical tools applied to ecology has contributed to great advances for modelling species' niches and distributions from opportunistic observations. However, as these observations are subject to biases caused by spatial variation in sampling effort, ecological niche models (ENMs) are also frequently biased. Among several bias correction methods that have been proposed, spatial filtering (imposing a minimum distance between occurrences) is widely used, yet lacks clear guidelines for choosing the filtering distance. Here, we aimed to explore the impact of spatial filtering distances on the performance of ENMs. Location: Europe. Taxon: Virtual species. Methods: We applied ENMs to two virtual species with contrasting levels of specialisation, across a spectrum of modelling conditions, bias types and sample sizes. Results: Models applied to the specialist species had on average a lower performance than those applied to the generalist species. Using a biased sample reduced model performance, especially when the bias was strong, and when the sample size was large. In many cases, spatial filtering failed to improve model performance or even reduced it. We did find an improvement for the generalist species modelled with large and strongly biased datasets. However, there was no optimal filtering distance, as this improvement was linearly and positively associated with filtering distance. Moreover, because the initial bias was strong and the filtered dataset became very small, the resulting models had only very low accuracy. Main Conclusions: Our results suggest that there is no optimal filtering distance for dealing with sampling bias in ENMs, and that spatial filtering never improves model performance enough to draw accurate predictions. We therefore recommend spatial filtering to be employed cautiously, only when enough data are available, and bearing in mind that its effectiveness remains highly uncertain.
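For readers unfamiliar with the technique, spatial filtering as described above (imposing a minimum distance between occurrences) can be illustrated with a minimal sketch. This is not the procedure used in the study: the function name `thin_occurrences`, the greedy first-come ordering, the planar coordinates and the example records are all assumptions made for illustration.

```python
import math


def thin_occurrences(points, min_dist):
    """Greedy spatial filter (illustrative assumption, not the study's code).

    Keeps a record only if it lies at least `min_dist` away from every
    record already kept. Coordinates are treated as planar (x, y) pairs
    in the same units as `min_dist`.
    """
    kept = []
    for x, y in points:
        if all(math.hypot(x - kx, y - ky) >= min_dist for kx, ky in kept):
            kept.append((x, y))
    return kept


# Hypothetical occurrence records with two tight clusters and one isolated point.
records = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.0, 0.4), (10.0, 10.0)]
thinned = thin_occurrences(records, min_dist=1.0)
print(len(records), "records before filtering;", len(thinned), "after")
```

The sketch also makes the trade-off discussed in the abstract visible: a larger `min_dist` removes more clustered records, but it shrinks the dataset available for model fitting.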
Article
Correcting sampling bias in species distribution models (SDMs) is challenging. The difficulty lies in accurately identifying and quantifying bias and the scarcity of samples, which greatly impedes the implementation of bias correction. Current methods often adjust the distribution of presence or background points within geographic or environmental spaces to correct the sampling bias in probability estimation within SDMs. However, these methods may lead to information loss, rely on subjective assumptions, and often separate geography and environment when correcting for bias. This study proposes a novel and easily implementable method termed “aggregation background.” This method selects background data based on the aggregation degree of presence points in the geographic and environmental feature space, thereby approximating the representation and correction of sampling bias in the presence samples. We compared this new method with other prevalent sampling bias correction methods in the existing literature by analyzing ecological authenticity. Under varying biases and sample sizes, the aggregation background and geographic filtering methods achieved more accurate species distribution predictions compared to the target group background and other methods. Notably, when the sample size was small (≤70), the aggregation background was superior to that obtained using the geographic filtering method. These findings underscore the effectiveness of the aggregation background in improving bias correction using limited available presence sample data, without relying on assumptions about sampling bias. Our method provides a new approach for correcting complex unknown biases in SDMs.
Article
Species distribution models (SDMs) are widely used in ecology. The selection of environmental variables is a critical step in SDMs, nowadays compounded by the increasing availability of environmental data. We evaluated the interaction between grain size and the binary (presence or absence of water) or proportional (proportion of water within the cell) representation of the water cover variable when modeling water bird species distributions. eBird occurrence data, averaging 880,270 records per species across the North American continent, were used for analysis. Models (via Random Forest) were fitted for 57 water bird species, for two seasons (breeding vs. non-breeding), at four grains (1 km² to 2500 km²) and using water cover as a proportional or binary variable. The models' performances were not affected by the type of water cover variable adopted (proportional or binary), but a significant decrease was observed in the importance of the water cover variable when used in a binary form. This was especially pronounced at coarser grains and during the breeding season. Binary representation of water cover is useful at finer grain sizes (i.e., 1 km²). At such fine grains, the simple presence or absence of a certain land-cover type can be a realistic descriptor of species occurrence. This is particularly advantageous when collecting habitat data in the field, as simply recording the presence of a habitat is significantly less time-consuming than recording its total area. For models using coarser grains, we recommend using proportional land-cover variables.
Article
Context: There have been many studies using species distribution models (SDMs) to predict shifts in species distributions due to environmental changes, but few consider effects of data quantity, data quality, or species response shape. Modeling studies using field-sampled data may be impaired to an unknown degree by lack of knowledge on species’ true relationships with environmental changes. Objectives: Using simulations with known relationships we assess model predictions, and investigate which models are more sensitive to sample size, detection limit, or species response shape issues when different SDMs are used for predicting species distribution shifts under environmental changes. Methods: We simulated 16 species response relationships to ecological gradients differing in response shape (skewness and kurtosis) using a generalized β-function. Populations were randomly sampled at different sample sizes and detection limits. Linear discriminant analysis (LDA), multiple logistic regression (MLR), generalized additive models (GAM), boosted regression trees (BRT), random forests (RF), artificial neural networks (ANN), and maximum entropy models (MaxEnt) were developed on sampled datasets and compared for predicting species occurrence. We used these SDMs to predict distribution patterns for virtual species with different response shapes across a real landscape of varying heterogeneity in environmental conditions, and compared them with the probability of occurrence generated by the β-function. Results: GAM and BRT were sensitive to both sample size and detection limit changes; RF was more affected by detection limit; ANN and MaxEnt were more affected by sample size; LDA and MLR were sensitive to species response shape changes. Conclusions: Overall, if little is known about species response to environmental changes, ANN is recommended especially for large sample size. 
If a focal species is likely to occur only in a narrow range of environmental conditions, GAM and BRT are preferred for large good-quality datasets, and GAM tends to perform slightly better under varied data conditions; RF is recommended for limited amounts of good-quality data. If a focal species is likely to be present in a wide range of environmental conditions, MaxEnt is preferred but caution should be taken for small sample size. If the goal is to identify potential distributions of invasive or endangered species but data quantity and quality are very limited, LDA and MLR are recommended as they generally provide reasonable model sensitivity.
Article
Species distributions are conventionally modelled using coarse‐grained macroclimate data measured in open areas, potentially leading to biased predictions since most terrestrial species reside in the shade of trees. For forest plant species across Europe, we compared conventional macroclimate‐based species distribution models (SDMs) with models corrected for forest microclimate buffering. We show that microclimate‐based SDMs at high spatial resolution outperformed models using macroclimate and microclimate data at coarser resolution. Additionally, macroclimate‐based models introduced a systematic bias in modelled species response curves, which could result in erroneous range shift predictions. Critically important for conservation science, these models were unable to identify warm and cold refugia at the range edges of species distributions. Our study emphasizes the crucial role of microclimate data when SDMs are used to gain insights into biodiversity conservation in the face of climate change, particularly given the growing policy and management focus on the conservation of refugia worldwide.
Article
Species distribution models (SDMs) have been widely used to project terrestrial species' responses to climate change and are increasingly being used for similar objectives in the marine realm. These projections are critically needed to develop strategies for resource management and the conservation of marine ecosystems. SDMs are a powerful and necessary tool; however, they are subject to many sources of uncertainty, both quantifiable and unquantifiable. To ensure that SDM projections are informative for management and conservation decisions, sources of uncertainty must be considered and properly addressed. Here we provide ten overarching guidelines that will aid researchers to identify, minimize, and account for uncertainty through the entire model development process, from the formation of a study question to the presentation of results. These guidelines focus on correlative models and were developed at an international workshop attended by over 50 researchers and practitioners. Although our guidelines are broadly applicable across biological realms, we provide particular focus to the challenges and uncertainties associated with projecting the impacts of climate change on marine species and ecosystems.
Article
Aim: Assessing how different sampling strategies affect the accuracy and precision of species response curves estimated by parametric species distribution models. Major Taxa Studied: Virtual plant species. Location: Abruzzo (Italy). Time Period: Timeless (simulated data). Methods: We simulated the occurrence of two virtual species with different ecology (generalist vs specialist) and distribution extent. We sampled their occurrence following different sampling strategies: random, stratified, systematic, topographic, uniform within the environmental space (hereafter, uniform) and close to roads. For each sampling design and species, we ran 500 simulations at increasing sampling efforts (total: 42,000 replicates). For each replicate, we fitted a binomial generalised linear model, extracted model coefficients for precipitation and temperature, and compared them with true coefficients from the known species' equation. We evaluated the quality of the estimated response curves by computing bias, variance and root mean squared error (RMSE). Additionally, we (i) assessed the impact of missing covariates on the performance of the sampling approaches and (ii) evaluated the effect of incompletely sampling the environmental space on the uniform approach. Results: For the generalist species, we found the lowest RMSE when uniformly sampling the environmental space, while sampling occurrence data close to roads provided the worst performance. For the specialist species, all sampling designs showed comparable outcomes. Excluding important predictors similarly affected all sampling strategies. Sampling limited portions of the environmental space reduced the performance of the uniform approach, regardless of the portion surveyed. Main Conclusions: Our results suggest that a proper estimate of the species response curve can be obtained when the choice of the sampling strategy is guided by the species' ecology. 
Overall, uniformly sampling the environmental space seems more efficient for species with wide environmental tolerances. The advantage of seeking the most appropriate sampling strategy vanishes when modelling species with narrow realised niches.
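The three quality measures named in the abstract above (bias, variance and RMSE of estimated coefficients relative to a known true coefficient) can be sketched as follows. The abstract does not give the exact formulas, so the standard definitions are assumed here; the function name `coefficient_error` and the example estimates are hypothetical.

```python
import math


def coefficient_error(estimates, true_value):
    """Bias, variance and RMSE of replicate coefficient estimates.

    Standard definitions are assumed (not taken from the study):
    bias is the mean estimate minus the true value, variance is the
    mean squared deviation from the mean estimate, and RMSE is the
    root of the mean squared deviation from the true value.
    """
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - true_value
    variance = sum((e - mean_est) ** 2 for e in estimates) / n
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return bias, variance, rmse


# Hypothetical coefficient estimates from four simulation replicates.
bias, variance, rmse = coefficient_error([0.8, 1.1, 1.3, 0.9], true_value=1.0)
print(f"bias={bias:.3f} variance={variance:.4f} rmse={rmse:.4f}")
```

Under these definitions the squared RMSE equals the squared bias plus the variance, which is why a sampling design can score poorly either by being systematically off or by being unstable across replicates.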
Article
Spatial biases are an intrinsic feature of occurrence data used in species distribution models (SDMs). Thinning species occurrences, where records close in the geographic or environmental space are removed from the modeling procedure, is an approach often used to address these biases. However, thinning occurrence data can also negatively affect SDM performance, given that the benefits of removing spatial biases might be outweighed by the detrimental effects of data loss caused by this approach. We used real and virtual species to evaluate how spatial and environmental thinning affected different performance metrics of four SDM methods. The occurrence data of virtual species were sampled randomly, evenly spaced, and clustered in the geographic space to simulate different types of spatial biases, and several spatial and environmental thinning distances were used to thin the occurrence data. For each thinning distance, we also generated null datasets by randomly removing the same number of occurrences as the thinning removed, and compared the results of the thinned and null datasets. We found that spatially or environmentally thinning occurrence data was no better than removing records at random, given that thinned datasets performed similarly to null datasets. Specifically, spatial and environmental thinning led to a general decrease in model performance across all SDM methods. These results were observed for real and virtual species, were positively associated with thinning distance, and were consistent across the different types of spatial biases. Our results suggest that thinning occurrence data usually fails to improve SDM performance and that the use of thinning approaches when modeling species distributions should be considered carefully.