Article

Abstract

Species distribution models (SDMs) are widely used in ecology and conservation. However, their performance is known to be affected by a variety of factors related to species occurrence characteristics. In this study, we used a virtual species approach to overcome the difficulties of testing the combined effects of those factors on the performance of presence-only SDMs with real data. We focused on the individual and combined roles of factors related to the response variable (i.e. sample size, sampling bias, environmental filtering, species prevalence, and species response to environmental gradients). Results suggest that environmental filtering is not necessarily helpful and should not be performed blindly, without evidence of bias in species occurrences. The more gradual the species response to environmental gradients, the greater the model sensitivity to an inappropriate use of environmental filtering, although this sensitivity decreases with higher species prevalence. Results show that SDMs are affected to the greatest degree by the species response to environmental gradients, species prevalence, and sample size. Model accuracy decreased when sample size fell below 300 presences. Furthermore, a high level of interaction among individual factors was observed. Ignoring the combined effects of factors may lead to misleading outcomes and conclusions.
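
As an illustration of what environmental filtering does to a set of occurrences, the following minimal sketch keeps one record per cell of a regular grid laid over environmental space. This is one common way to filter and is not necessarily the procedure used in the study; the function name `environmental_filter`, the predictor order and the bin widths are all hypothetical choices made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Record = Tuple[float, ...]  # one value per environmental predictor


def environmental_filter(
    records: Sequence[Record],
    bin_widths: Sequence[float],
) -> List[Record]:
    """Keep the first record found in each cell of a regular grid
    laid over environmental space (one axis per predictor)."""
    kept: Dict[Tuple[int, ...], Record] = {}
    for rec in records:
        # Index of the grid cell that this record falls into.
        cell = tuple(int(value // width) for value, width in zip(rec, bin_widths))
        # setdefault stores the record only if the cell is still empty.
        kept.setdefault(cell, rec)
    return list(kept.values())


# Hypothetical occurrences: (mean temperature in deg C, precipitation in mm).
occurrences = [(10.2, 800.0), (10.8, 820.0), (15.1, 400.0), (15.4, 1200.0)]
filtered = environmental_filter(occurrences, bin_widths=(1.0, 100.0))
print(len(occurrences), "->", len(filtered))
```

The sketch makes the trade-off visible: records that are close in environmental space are collapsed into one, so the sample shrinks whether or not the original clustering reflected sampling bias.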


... The desired effect of stringent filtering is an increase in quality, by reducing bias and error (Steen et al., 2019). Yet, sample size is inevitably reduced by filtering, and as sample size is known to have a major influence on model performance (Gábor et al., 2019; Wisz et al., 2008), stringent filtering leads to a trade-off between data quality and sample size. To our knowledge, the combined impact of data quality and sample size in stringent filtering on the performance of SDMs remains underexplored. ...
... The different drivers of model performance make the interpretation complex, but also highlight the importance of analysing multiple aspects of data manipulation together (Gábor et al., 2019). We add data quality to the list of drivers that can notably impact model performance, such as species characteristics, modeling technique and sample size (Gábor et al., 2019; Tessarolo et al., 2014). Compared to these factors, previous studies found marginal importance of the impact of sampling bias (Gábor et al., 2019; Tessarolo et al., 2014) and we have no reason to contest this finding based on our results (but note that we partially controlled for sampling bias by spatial thinning; Kramer-Schadt et al., 2013). ...
Article
Full-text available
Opportunistically collected species occurrence data are often used for species distribution models (SDMs) when high-quality data collected through standardized recording protocols are unavailable. While opportunistic data are abundant, uncertainty is usually high, e.g. due to observer effects or a lack of metadata. To increase data quality and improve model performance, we filtered species records based on record attributes that provide information on the observation process or post-entry data validation. Data filtering does not only increase the quality of species records, it simultaneously reduces sample size, a trade-off that remains relatively unexplored. By controlling for sample size in a dataset of 255 species, we were able to explore the combined impact of data quality and sample size on model performance. We applied three data quality filters based on observers' activity, the validation status of a record in the database and the detail of a submitted record, and analyzed changes in AUC, Sensitivity and Specificity using Maxent with and without filtering. The impact of stringent filtering on model performance depended on (1) the quality of the filtered data: records validated as correct and more detailed records lead to higher model performance, (2) the proportional reduction in sample size caused by filtering and the remaining absolute sample size: filters causing small reductions that lead to sample sizes of more than 100 presences generally benefitted model performance and (3) the taxonomic group: plant and dragonfly models benefitted more from data quality filtering compared to bird and butterfly models. Our results also indicate that recommendations for quality filtering depend on the goal of the study, e.g. increasing Sensitivity and/or Specificity. Further research must identify what drives species' sensitivity to data quality. 
Nonetheless, our study confirms that large quantities of volunteer generated and opportunistically collected data can make a valuable contribution to ecological research and species conservation.
... In any case, SDMs are seldom fitted using species (and environmental) data collected strictly for that purpose. Instead, biodiversity data used as input in SDMs are primarily opportunistic and sampled for different purposes (Gábor et al., 2020; Hirzel & Guisan, 2002). ...
... This suggests that the sampling of presence/absence data should be planned on a case-by-case basis, that is according to the ecological characteristics of the species (span of the niche breadth and distribution extent) and the environmental heterogeneity of the study area (Chefaoui et al., 2011). We also found that collecting more data (increasing the sample size N) alleviates the impact of the sampling strategy on the variance and RMSE of the coefficients, thereby confirming results from previous studies (Albert et al., 2010; Chefaoui et al., 2011; Gábor et al., 2020; Tessarolo et al., 2014). This suggests that, although exhaustive sampling campaigns are time- and cost-consuming, larger sample sizes successfully improve the estimation of species response curves irrespective of the sampling strategy used. ...
Article
Full-text available
Aim Assessing how different sampling strategies affect the accuracy and precision of species response curves estimated by parametric species distribution models. Major Taxa Studied Virtual plant species. Location Abruzzo (Italy). Time Period Timeless (simulated data). Methods We simulated the occurrence of two virtual species with different ecology (generalist vs specialist) and distribution extent. We sampled their occurrence following different sampling strategies: random, stratified, systematic, topographic, uniform within the environmental space (hereafter, uniform) and close to roads. For each sampling design and species, we ran 500 simulations at increasing sampling efforts (total: 42,000 replicates). For each replicate, we fitted a binomial generalised linear model, extracted model coefficients for precipitation and temperature, and compared them with true coefficients from the known species' equation. We evaluated the quality of the estimated response curves by computing bias, variance and root mean squared error (RMSE). Additionally, we (i) assessed the impact of missing covariates on the performance of the sampling approaches and (ii) evaluated the effect of incompletely sampling the environmental space on the uniform approach. Results For the generalist species, we found the lowest RMSE when uniformly sampling the environmental space, while sampling occurrence data close to roads provided the worst performance. For the specialist species, all sampling designs showed comparable outcomes. Excluding important predictors similarly affected all sampling strategies. Sampling limited portions of the environmental space reduced the performance of the uniform approach, regardless of the portion surveyed. Main Conclusions Our results suggest that a proper estimate of the species response curve can be obtained when the choice of the sampling strategy is guided by the species' ecology. 
Overall, uniformly sampling the environmental space seems more efficient for species with wide environmental tolerances. The advantage of seeking the most appropriate sampling strategy vanishes when modelling species with narrow realised niches.
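
The three quality measures named in this abstract can be sketched as follows for a single coefficient estimated across many replicates. This is a minimal illustration, not the authors' code: the function name `coefficient_error` and the example values are hypothetical, and the population variance is used here as an assumption so that the identity RMSE² = bias² + variance holds exactly.

```python
from math import sqrt
from statistics import fmean, pvariance
from typing import Dict, Sequence


def coefficient_error(estimates: Sequence[float], true_value: float) -> Dict[str, float]:
    """Bias, variance and RMSE of replicate estimates of one coefficient."""
    # Bias: how far the average estimate sits from the known true value.
    bias = fmean(estimates) - true_value
    # Variance: spread of the estimates around their own mean (population form).
    variance = pvariance(estimates)
    # RMSE: typical distance of a single estimate from the true value.
    rmse = sqrt(fmean([(e - true_value) ** 2 for e in estimates]))
    return {"bias": bias, "variance": variance, "rmse": rmse}


# Hypothetical estimates of a coefficient whose true value is 1.0.
result = coefficient_error([0.8, 1.0, 1.2, 1.4], true_value=1.0)
print(result)
```

Because RMSE combines bias and variance, a sampling design can score poorly either by being systematically off or by being unstable across replicates; reporting all three separates the two cases.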
... In any case, SDMs are seldom fitted using species (and environmental) data collected strictly for that purpose. Instead, biodiversity data used as input in SDMs are primarily opportunistic and sampled for different purposes (Hirzel & Guisan 2002; Gábor, Moudrý, Barták, Lecours, 2020). Examples include opportunistic data from museum collections or herbaria (Newbold 2010), citizen science (Leandro, Jay-Robert, Mériguet, Houard, Renner, 2020; Feldman et al., 2021), vegetation surveys (Bazzichetto et al., 2021), or a combination of these (Wasof et al., 2015). ...
... This suggests that the sampling of presence/absence data should be planned on a case-by-case basis, i.e., according to the ecological characteristics of the species (span of the niche breadth and distribution extent) and the environmental heterogeneity of the study area (Chefaoui, Lobo, Hortal, 2011). We also found that collecting more data (increasing the sample size N) alleviates the impact of the sampling strategy on the variance and RMSE of the coefficients, thereby confirming results from previous studies (Albert et al., 2010; Chefaoui et al., 2011; Tessarolo et al., 2014; Gábor et al., 2020). This suggests that, although exhaustive sampling campaigns are time- and cost-consuming, larger sample sizes successfully improve the estimation of species response curves irrespective of the sampling strategy used. ...
Preprint
Full-text available
Aim: Assessing how different sampling strategies affect the accuracy and precision of species response curves estimated by parametric Species Distribution Models. Major taxa studied: Virtual plant species. Location: Abruzzo (Italy). Time period: Timeless (simulated data). Methods: We simulated the occurrence of two virtual species with different ecology (generalist vs specialist) and distribution extent. We sampled their occurrence following different sampling strategies: random, stratified, systematic, topographic, uniform within the environmental space (hereafter, uniform), and close to roads. For each sampling design and species, we ran 500 simulations at increasing sampling efforts (total: 42,000 replicates). For each replicate, we fitted a binomial generalised linear model, extracted model coefficients for precipitation and temperature, and compared them with true coefficients from the known species’ equation. We evaluated the quality of the estimated response curves by computing bias, variance, and root mean squared error. Additionally, we i) assessed the impact of missing covariates on the performance of the sampling approaches and ii) evaluated the effect of incompletely sampling the environmental space on the uniform approach. Results: For the generalist species, we found the lowest root mean squared error when uniformly sampling the environmental space, while sampling occurrence data close to roads provided the worst performance. For the specialist species, all sampling designs showed comparable outcomes. Excluding important predictors similarly affected all sampling strategies. Sampling limited portions of the environmental space reduced the performance of the uniform approach, regardless of the portion surveyed. Main conclusions: Our results suggest that a proper estimate of the species response curve can be obtained when the choice of the sampling strategy is guided by the species’ ecology. 
Overall, uniformly sampling the environmental space seems more efficient for species with wide environmental tolerances. The advantage of seeking the most appropriate sampling strategy vanishes when modelling species with narrow realised niches.
... This consisted of the following steps: (1) removing duplicate records, (2) verifying records with geographic inconsistencies, and (3) reducing areas with a high density of records, linked to oversampling close to accessible areas (settlements, roads, rivers, etc.) [44,45], in order to mitigate spatial autocorrelation and overfitting in the models [46,47]. This process was performed using the spThin package in R [48], enforcing a minimum distance of 1 km between records. In this context, the cleaned database included 366 presence records (Figure 1). ...
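
The idea of thinning records to a minimum separation of 1 km can be sketched as below. This is a simple deterministic greedy pass written for illustration only; it is not the spThin algorithm and does not reproduce the cited study's results. The function names, the coordinate order (latitude, longitude) and the example points are assumptions.

```python
from math import asin, cos, radians, sin, sqrt
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees
EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def thin_records(points: Sequence[Point], min_km: float = 1.0) -> List[Point]:
    """Keep a record only if it lies at least min_km from every record kept so far."""
    kept: List[Point] = []
    for p in points:
        if all(haversine_km(p, q) >= min_km for q in kept):
            kept.append(p)
    return kept


# Hypothetical records: the second lies roughly 0.56 km from the first.
records = [(-1.0, -78.0), (-1.005, -78.0), (-1.02, -78.0)]
print(thin_records(records, min_km=1.0))
```

A greedy pass depends on the order of the input, so different orderings can retain different (and differently sized) subsets; that order dependence is a limitation of this sketch.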
... Multiple recommendations have been made in recent decades to improve the quality of niche models [34,35], including adequate cleaning of presence registers [33], delimitation of the calibration area (M) [48,49], incorporating a wide selection of parameters and configurations to minimize model complexity [56], and the use of multiple statistical criteria during the evaluation process [54]. These recommendations were taken into account when developing the final models, resulting in the generation of robust models at the predictive level. ...
Article
Full-text available
At present, climate change is a direct threat to biodiversity and its effects are evidenced by an increasingly accelerated loss of biodiversity. This study identified the main threats presently facing Tapirus pinchaque in Ecuador, generated predictive models of its distribution, and analyzed protected areas as a conservation tool. The methodology was based on a literature review and the application of binary predictive models to achieve these objectives. The main results indicate that T. pinchaque is seriously threatened, mainly by changes in land use. In addition, three models were selected that show current and future suitable areas for the conservation of the species. Its current distribution amounts to 67,805 km², 33% (22,872 km²) of which is located in 31 of the 61 protected areas. Finally, it is important to take timely actions focused on biodiversity conservation, considering the importance of balanced ecosystems to the humans who depend on them; the results on changes in the current and future distribution areas of the mountain tapir are a valuable contribution to be used as a management tool for its conservation.
... leave-one-out; Pearson et al., 2007) improves model assessments and Ensembles of Small Models (ESMs) make it possible to deal with model complexity while keeping sufficient explanatory power (Breiner et al. 2015). Other techniques are dedicated to sampling bias correction, often implying the filtering of occurrence or environmental data (Gábor et al., 2019) or non-random pseudo-absence selection (Phillips et al., 2009). However, data filtering can become problematic for species with low sample sizes (Vollering et al., 2019), especially when species distribution is highly localised (Inman et al., 2021) and is not recommended in absence of evidence of bias in occurrence data (Gábor et al., 2019). ...
... Other techniques are dedicated to sampling bias correction, often implying the filtering of occurrence or environmental data (Gábor et al., 2019) or non-random pseudo-absence selection (Phillips et al., 2009). However, data filtering can become problematic for species with low sample sizes (Vollering et al., 2019), especially when species distribution is highly localised (Inman et al., 2021) and is not recommended in absence of evidence of bias in occurrence data (Gábor et al., 2019). Similarly, non-random pseudo-absence selection is not always effective (Dubos et al., 2021a, 2021b) and tends to make predictions worse in narrow-niche species (Inman et al., 2021). ...
Article
Full-text available
Narrow-ranging species are usually omitted from species distribution models (SDMs) due to statistical constraints, even though they are predicted to be particularly vulnerable to climate change. Recently available high-resolution environmental predictors, along with recently developed methods, make narrow-ranging species more eligible for SDMs, provided their distribution is well known. We fill a knowledge gap on the effect of predicted climate change on narrow-ranging species. We modelled the distribution of the golden mantella frog Mantella aurantiaca and the Manapany day gecko Phelsuma inexpectata, for which the distribution of occurrence records is well documented. Our modelling scheme included a range of procedures intended to address statistical issues related to narrow-ranging species. We predict an alarming decline in climate suitability across the whole current distribution area of both species by 2070, potentially leading to complete extinction in most scenarios. We identified the areas with the best climate suitability in the future, but these remain largely suboptimal with respect to the species' climatic niches. The high level of habitat fragmentation suggests that both species likely need to be at least partly translocated. Climate change may not only drive range contractions or distribution shifts in narrow-ranging species, but may lead to the complete extirpation of suitable environments across their entire region. This study suggests that the level of threat to narrow-ranging species already identified as threatened may be underestimated, especially in heterogeneous tropical areas. We stress the need to develop sampling campaigns and implement proactive actions for narrow-ranging species in the tropics.
... However, their use remains complex, which implies the need to follow good configuration practices [7] and to interpret the results by combining statistical, spatial, and expert-based indices [8]. Notably, the performance of OCCs is highly sensitive to classifier parametrization (e.g., fitting, thresholding, variable selection) [9][10][11], the quality of the predictive variables used [12], and the reference data [13]. Moreover, assessing the accuracy of OCCs remains challenging without absence data [14]. ...
... Remote Sens. 2021, 13, 1892 ...
Article
Full-text available
Advances in remote sensing (RS) technology in recent years have increased the interest in including RS data into one-class classifiers (OCCs). However, this integration is complex given the interdisciplinary issues involved. In this context, this review highlights the advances and current challenges in integrating RS data into OCCs to map vegetation classes. A systematic review was performed for the period 2013–2020. A total of 136 articles were analyzed based on 11 topics and 30 attributes that address the ecological issues, properties of RS data, and the tools and parameters used to classify natural vegetation. The results highlight several advances in the use of RS data in OCCs: (i) mapping of potential and actual vegetation areas, (ii) long-term monitoring of vegetation classes, (iii) generation of multiple ecological variables, (iv) availability of open-source data, (v) reduction in plotting effort, and (vi) quantification of over-detection. Recommendations related to interdisciplinary issues were also suggested: (i) increasing the visibility and use of available RS variables, (ii) following good classification practices, (iii) bridging the gap between spatial resolution and site extent, and (iv) classifying plant communities.
... We use a simulation approach to explore the effects of sampling bias on SDM in the Maxent PB framework and to compare three bias correction methods that are often used but have not been systematically compared across a robust set of virtual species. Where these methods have been evaluated previously (e.g., Kramer-Schadt et al. 2013, Varela et al. 2014, Fourcade et al. 2014, Stolar and Nielsen 2015, Gábor et al. 2019), emphasis has been on spatial predictions of habitat potential. Here, we dig deeper into their use by exploring how sampling bias not only affects spatial predictions, but also our understanding of fundamental niche characteristics such as which explanatory variables and species-environment relationships best represent a species' true niche. ...
... SDM is most often used for spatial prediction of distributions, and in this case, the use of bias correction methods is clearly recommended. Previous work with virtual species has shown contrasting results, however, with some studies suggesting that G-Filter methods may outperform the FactorBiasOut method in some instances (Kramer-Schadt et al. 2013, Fourcade et al. 2014, Stolar and Nielsen 2015), but others showing E-Filter methods as superior to G-Filter methods when using climatic variables (Varela et al. 2014), or that E-Filtering can lead to mixed results depending on environmental gradients (Gábor et al. 2019) and may not necessarily provide improvements. Each of these studies considered a differing number of virtual species, with at least one, but no more than ten considered. ...
Article
Full-text available
A key assumption in species distribution modeling (SDM) with presence-background (PB) methods is that sampling of occurrence localities is unbiased and that any sampling bias is proportional to the background distribution of environmental covariates. This assumption is rarely met when SDM practitioners rely on federated museum records from natural history collections for geo-located occurrences due to inherent sampling bias found in these collections. We use a simulation approach to explore the effectiveness of three methods developed to account for sampling bias in SDM with PB frameworks. Two of the methods rely on careful filtering of observation data, namely geographic thinning (G-Filter) and environmental thinning (E-Filter), while a third, FactorBiasOut, creates selection weights for background data to bias locations toward areas where the observation dataset was sampled. While these methods have been assessed previously, evaluation has emphasized spatial predictions of habitat potential. Here, we dig deeper into the effectiveness of these methods by exploring how sampling bias not only affects predictions of habitat potential, but also our understanding of niche characteristics such as which explanatory variables and response curves best represent species-environment relationships. We simulate 100 virtual species ranging from generalist to specialist in their habitat preferences and introduce geographic and environmental bias at three intensity levels to measure the effectiveness of each correction method to (1) predict true probability of occurrence across a study area, (2) recover true species-environment relationships, and (3) identify true explanatory variables. We find that the FactorBiasOut most often showed the greatest improvement in recreating known distributions but did no better at correctly identifying environmental covariates or recreating species-environment relationships than G-Filter or E-Filter methods.
Narrow niche species are most problematic for biased calibration datasets, such that correction methods can, in some cases, make predictions worse.
... The virtual species approach allowed us to control the experiment and to isolate the effects of positional error (Zurell et al. 2010). This approach is increasingly used to evaluate the effects of data inaccuracies on model performance (Barbet-Massin et al. 2012, Václavík and Meentemeyer 2012, Qiao et al. 2015, Ranc et al. 2016, Fernandes et al. 2018, Leroy et al. 2018, Moudrý et al. 2018, Gábor et al. 2019, Meynard et al. 2019), but has yet to be adopted for the study of positional error. In particular, we tested whether: 1) SDMs for specialist species are more affected by positional error than those for generalist species; 2) it is possible to compensate the assumed negative effect of a positional error with a higher sample size; and 3) the positional error has different effects when using a parametric (e.g. ...
... We selected generalized linear models (GLM; Nelder and Baker 1972, Oksanen and Minchin 2002) as a presence/absence method and MaxEnt (Phillips et al. 2006) as a presence-background method that are often adopted in ecological studies (Moudrý and Šímová 2013, Linda et al. 2016, Malavasi et al. 2018, Gábor et al. 2019, Watts et al. 2019). In addition, Graham et al. (2008) showed that these two approaches were among the better performing modelling techniques when the data was affected by positional errors. ...
Article
Full-text available
Species occurrences inherently include positional error. Such error can be problematic for species distribution models (SDMs), especially those based on fine‐resolution environmental data. It has been suggested that there could be a link between the influence of positional error and the width of the species ecological niche. Although positional errors in species occurrence data may imply serious limitations, especially for modelling species with narrow ecological niche, it has never been thoroughly explored. We used a virtual species approach to assess the effects of the positional error on fine‐scale SDMs for species with environmental niches of different widths. We simulated three virtual species with varying niche breadth, from specialist to generalist. The true distribution of these virtual species was then altered by introducing different levels of positional error (from 5 to 500 m). We built generalized linear models and MaxEnt models using the distribution of the three virtual species (unaltered and altered) and a combination of environmental data at 5 m resolution. The models’ performance and niche overlap were compared to assess the effect of positional error with varying niche breadth in the geographical and environmental space. The positional error negatively impacted performance and niche overlap metrics. The amplitude of the influence of positional error depended on the species niche, with models for specialist species being more affected than those for generalist species. The positional error had the same effect on both modelling techniques. Finally, increasing sample size did not mitigate the negative influence of positional error. We showed that fine‐scale SDMs are considerably affected by positional error, even when such error is low. 
Therefore, where new surveys are undertaken, we recommend paying attention to data collection techniques to minimize the positional error in occurrence data and thus to avoid its negative effect on SDMs, especially when studying specialist species.
... In addition, records were manually downloaded from the database of the International Union for Conservation of Nature (IUCN; www.iucnredlist.org). Subsequently, to mitigate the potential sampling bias associated with presence data (Gábor et al., 2020; Zizka et al., 2021) and following the standards suggested for building ecological models (Araújo et al., 2019; Zurell et al., 2020), a data-cleaning protocol was carried out (Cobos et al., 2018; Simoes et al., 2020). The protocol consisted of the following: 1) removing records reported outside the native distribution range of O. pyramidale; 2) keeping records reported from 1900 to the present that fall in the preserved specimen category; 3) removing duplicate records and records with null coordinate values; 4) reducing the density of records associated with collection effort (Lobo, 2015), using a minimum thinning distance of 5 km. ...
Article
Full-text available
This study investigates the potential of balsa (Ochroma pyramidale) as a sustainable tool for restoring degraded areas in the Ecuadorian Amazon. Through ecological niche modelling with Maxent and the Kuenm methodology, we identified approximately 7423 km2 of climatically suitable areas for cultivation and restoration with O. pyramidale in the region. Its importance lies in its role in conserving biodiversity, mitigating climate change and fostering economic development. The results highlight the viability of this native species for restoration, emphasising its potential to promote biodiversity and carbon capture, outperforming exotic species. These conclusions have implications for conservation and restoration, and support the need for future research evaluating its impact on local flora and fauna. Balsa emerges as a valuable alternative for addressing environmental and socioeconomic challenges in the Ecuadorian Amazon, supported by interdisciplinary collaborations and the guidance of decision-makers in implementing restoration programmes based on the climatic information provided by this study.
... Reduced sample sizes due to filtering were often associated with greater divergence from the unfiltered model, as well as lower performance metrics. While a smaller, evenly sampled dataset has been found to be more effective than a larger, biased one (Bean et al., 2012; Varela et al., 2014), sample size can have a substantial effect on HSM performance (Gábor et al., 2020), and thus small samples may be further negatively impacted by environmental filtering. We observed a strong negative correlation between per cent decrease in sample size and overlap with the unfiltered HSM (i.e. the more records were filtered out, the less similar the iNaturalist models became). ...
Article
Full-text available
Aim Citizen science is a cost‐effective potential source of invasive species occurrence data. However, data quality issues due to unstructured sampling approaches may discourage the use of these observations by science and conservation professionals. This study explored the utility of low‐structure iNaturalist citizen science data in invasive plant monitoring. We first examined the prevalence of invasive taxa in iNaturalist plant observations and sampling biases associated with these data. Using four invasive species as examples, we then compared iNaturalist and professional agency observations and used the two datasets to model suitable habitat for each species. Location Hawai'i, USA. Methods To estimate the prevalence of invasive plant data, we compared the number of species and observations recorded in iNaturalist to botanical checklists for Hawai'i. Sampling bias was quantified along gradients of site accessibility, protective status and vegetation disturbance using a bias index. Habitat suitability for four invasive species was modelled in Maxent, using observations from iNaturalist, professional agencies and stratified subsets of iNaturalist data. Results iNaturalist plant observations were biased towards invasive species, which were frequently recorded in areas with higher road/trail density and vegetation disturbance. Professional observations of four example invasive species tended to occur in less accessible, native‐dominated sites. Habitat suitability models based on iNaturalist versus professional data showed moderate overlap and different distributions of suitable habitat across vegetation disturbance classes. Stratifying iNaturalist observations had little effect on how suitable habitat was distributed for the species modelled in this study. Main Conclusions Opportunistic iNaturalist observations have the potential to complement and expand professional invasive plant monitoring, which we found was often affected by inverse sampling biases. 
Invasive species represented a high proportion of iNaturalist plant observations, and were recorded in environments that were not captured by professional surveys. Combining the datasets thus led to more comprehensive estimates of suitable habitat.
... Concerning spatial data thinning [33], it might decrease the probability of retaining species with unique environmental conditions. However, in case of a gradual species response to environmental gradients, there is a high model sensitivity to an inappropriate use of data thinning in the environmental space, based on, e.g., thresholding methods [125]. From this point of view, blind data thinning without testing model sensitivity is strongly discouraged. ...
Article
Full-text available
Ecological processes are often spatially and temporally structured, potentially leading to autocorrelation either in environmental variables or species distribution data. Because of that, spatially-biased in-situ samples or predictors might affect the outcomes of ecological models used to infer the geographic distribution of species and diversity. There is a vast heterogeneity of methods and approaches to assess and measure spatial bias; this paper aims at addressing the spatial component of data-driven biases in species distribution modelling, and to propose potential solutions to explicitly test and account for them. Our major goal is not to propose methods to remove spatial bias from the modelling procedure, which would be impossible without proper knowledge of all the processes generating it, but rather to propose alternatives to explore and handle it. In particular, we propose and describe three main strategies that may provide a fair account of spatial bias, namely: (i) how to represent spatial bias; (ii) how to simulate null models based on virtual species for testing biogeographical and species distribution hypotheses; and (iii) how to make use of spatial bias - in particular related to sampling effort - as a leverage instead of a hindrance in species distribution modelling. We link these strategies with good practice in accounting for spatial bias in species distribution modelling.
... The results of the phylogenetic diversity (PDMPD and PDMNTD) in all university campuses were less than zero, suggesting that bird assemblages were characterized by a clustered community structure (SES values between −2.5 and −1.0), following the environmental filtering hypothesis [43]. Habitat environments on campuses are relatively simple, which can attract bird species with similar ecological requirements in terms of habitat and food demands, resulting in closely related species being assembled in the same community [44]. ...
Article
Full-text available
Simple Summary Accelerated urbanization has changed the composition of regional landscape patterns, directly affecting the composition of bird communities. This study analyzes bird community assembly mechanisms and the driving factors in university campuses in Nanjing, China. We found that the phylogeny of bird communities in all universities followed a pattern of aggregation. Grass, water, and buildings were the main factors affecting the campus bird communities’ functional and phylogenetic diversity. Based on our results, we offer several practical measures for urban planners to better protect urban biodiversity and develop eco-friendly cities. Abstract University campuses are important components of cities, harboring the majority of urban biodiversity. In this study, based on monthly bird survey data covering 12 university campuses located either downtown or in the newly developed areas in Nanjing, China, in 2019, we studied the assembly processes of each campus’s bird population and their main drivers by modeling a set of ecological and landscape determinants. Our results showed that (1) bird abundance and species diversity in the newly developed areas were significantly higher than in those downtown; (2) the phylogeny of bird communities in all universities followed a pattern of aggregation, indicating that environmental filtering played a major role in community assembly; (3) specifically, grass, water, and buildings were the main factors affecting each campus’s bird community’s functional and phylogenetic diversity, with the areas of grass and water habitats having a significant positive correlation with phylogenetic diversity, while the size of building areas was negatively correlated. Our results emphasize that habitat features play a decisive role in determining urban bird population diversity and community assembly processes. 
We suggest that increasing landscape diversity, e.g., by reasonably arranging the location and area of water bodies and grasslands and improving the landscape connectivity, could be a powerful way to maintain and promote urban bird diversity.
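The clustering signal reported above rests on a standardized effect size (SES), which compares an observed phylogenetic diversity value with the values obtained under a null model. The following is a minimal sketch of that calculation, not the authors' code: the function name and the numbers are hypothetical, and the sample standard deviation is an assumed choice.

```python
import statistics


def standardized_effect_size(observed, null_values):
    """Return SES = (observed - mean(null)) / sd(null).

    Negative values mean the observed metric is lower than expected
    under the null model, which is usually read as phylogenetic clustering.
    """
    null_mean = statistics.mean(null_values)
    null_sd = statistics.stdev(null_values)  # sample SD (assumed choice)
    return (observed - null_mean) / null_sd


# Hypothetical numbers, for illustration only.
observed_mpd = 40.0
null_mpd = [52.0, 48.0, 55.0, 50.0, 45.0]
ses = standardized_effect_size(observed_mpd, null_mpd)
```

In practice the null distribution would come from many randomizations of the community or the phylogeny rather than from five hand-picked values.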
... Results of the phylogenetic diversity (PDMPD and PDMNTD) in all university campuses were less than zero, suggesting that bird assemblages were characterized by a clustered community structure (SES value between −2.5 and −1.0), following the environmental filtering hypothesis (Gábor et al. 2020). Habitat environments in campuses are relatively simple, which can inhibit bird species with similar ecological requirements in terms of the habitat and food demands, resulting in closely related species assembled in the same community (Freeman et al. 2022). ...
Preprint
Full-text available
Understanding the drivers of community assembly processes is of great importance for better conservation outcomes; the main mechanisms include competitive exclusion, environmental filtering and neutral assembly. While mechanisms of assembly processes for vertebrates living in natural habitats have been well studied, their urban counterparts encountering highly human-modified environments are still largely understudied. As a result, there are knowledge gaps for urban planners seeking to better protect urban biodiversity and develop eco-friendly cities. University campuses are important components of cities, harboring the majority of urban biodiversity. In this study, based on monthly bird survey data covering 12 university campuses located either downtown or in the newly developed areas in Nanjing, China, in 2019, we studied the assembly processes of campus birds, and their main drivers, by modeling a set of ecological and landscape determinants. Our results showed that bird diversity in the newly developed areas was significantly higher than that downtown. The phylogeny of bird communities in all universities followed a pattern of aggregation, indicating that environmental filtering played a major role in community assembly. Specifically, grass, water and buildings were the main factors affecting campus bird functional and phylogenetic diversity, with the area of grass and water habitats having a significant positive correlation with phylogenetic diversity, while building area was negatively correlated. Our results emphasize that habitat features play a decisive role in determining urban bird community assembly processes. We suggest that increasing landscape diversity and improving the landscape connectivity could be a powerful way to maintain and promote urban bird diversity.
... Spatial sampling bias is a major factor affecting the predictive performance of SDMs (Araújo and Guisan, 2006;Barbet-Massin et al., 2012;Kramer-Schadt et al., 2013;Meynard et al., 2019). A number of procedures have been developed to account for sampling bias, which include spatial filtering of presence points (Edrén et al., 2010;Boria et al., 2014;Matutini et al., 2021), environmental filtering (Varela et al., 2014;Gábor et al., 2020), the combination of presence-only and standardised presence-absence data (Dorazio, 2014;Fithian et al., 2015;Koshkina et al., 2017) and the production of a similar sampling bias in non-presence background data/pseudoabsences (Phillips et al., 2009). However, presence points and environmental filtering consist in the removal of occurrence data, thereby inducing a loss of information and statistical power. ...
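One common way to implement environmental filtering is to grid the environmental space and keep a single record per grid cell. The sketch below assumes that approach; it is not necessarily the exact procedure used in the studies cited above, and the variable names, cell sizes and records are hypothetical. It also makes the cost mentioned in the passage visible: records are discarded.

```python
import math


def environmental_filter(records, cell_sizes):
    """Keep the first record found in each cell of a gridded environmental space.

    records    : iterable of tuples, one environmental value per variable
    cell_sizes : tuple with the cell width along each environmental axis
    """
    kept = {}
    for record in records:
        # Identify the environmental grid cell this record falls into.
        cell = tuple(math.floor(value / size)
                     for value, size in zip(record, cell_sizes))
        # setdefault keeps only the first record seen in each cell.
        kept.setdefault(cell, record)
    return list(kept.values())


# Hypothetical presences: (mean temperature in deg C, annual precipitation in mm).
presences = [(10.1, 500.0), (10.4, 520.0), (10.8, 540.0),
             (15.2, 900.0), (15.3, 910.0), (20.0, 300.0)]
filtered = environmental_filter(presences, cell_sizes=(1.0, 100.0))
records_lost = len(presences) - len(filtered)
```

Here six records collapse to three, so half of the sample is removed. Whether that trade is worthwhile depends on whether the clustered records reflect sampling bias or a genuine preference of the species.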
Article
Full-text available
1. Open-source biodiversity databases contain a large number of species occurrence records but are often spatially biased, which affects the reliability of species distribution models based on these records. Sample bias correction techniques require data filtering, which comes at the cost of record numbers, or require considerable additional sampling effort. Since independent data is rarely available, assessment of the correction technique often relies solely on performance metrics computed using subsets of the available – biased – data, which may prove misleading. 2. Here, we assess the extent to which an acknowledged sample bias correction technique is likely to improve models’ ability to predict species distributions in the absence of independent data. We assessed variation in model predictions induced by the aforementioned correction and by model stochasticity, i.e. the variability between model replicates related to a random component (pseudo-absence sets and cross-validation subsets). We then present an index of the effect of correction relative to model stochasticity: the Relative Overlap Index (ROI). We investigated whether the ROI better represented the effect of correction than classic performance metrics (Boyce index, cAUC, AUC and TSS) and absolute overlap metrics (Schoener’s D, Pearson’s and Spearman’s correlation coefficients) when considering data related to 64 vertebrate species and 21 virtual species with a generated sample bias. 3. When based on absolute overlaps and cross-validation performance metrics, we found that correction produced no significant effects. When considering its effect relative to model stochasticity, the effect of correction was strong for most species at one of the three sites. The use of virtual species enabled us to verify that the correction technique improved both distribution predictions and the biological relevance of the selected variables at the specific site, when these were not correlated with sample bias patterns. 4. 
In the absence of additional independent data, the assessment of sample bias correction based on subsample data may be misleading. We propose to investigate both the biological relevance of the environmental variables selected and the effect of sample bias correction relative to model stochasticity.
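Of the absolute overlap metrics named in this abstract, Schoener's D has a compact standard definition: D = 1 − ½ Σ|p₁ᵢ − p₂ᵢ|, where each prediction map is first rescaled to sum to one. The sketch below illustrates that formula only; the ROI itself is not defined in the abstract and is therefore not reproduced. The two suitability vectors are hypothetical.

```python
def schoeners_d(suitability_a, suitability_b):
    """Schoener's D between two suitability maps given as flat sequences.

    Each map is rescaled to sum to 1, then D = 1 - 0.5 * sum(|p_a - p_b|).
    D ranges from 0 (no overlap) to 1 (identical maps).
    """
    if len(suitability_a) != len(suitability_b):
        raise ValueError("maps must cover the same cells")
    total_a = sum(suitability_a)
    total_b = sum(suitability_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("maps must contain positive suitability")
    return 1.0 - 0.5 * sum(
        abs(a / total_a - b / total_b)
        for a, b in zip(suitability_a, suitability_b)
    )


# Hypothetical four-cell predictions before and after a bias correction.
uncorrected = [1.0, 4.0, 4.0, 1.0]
corrected = [4.0, 1.0, 1.0, 4.0]
overlap = schoeners_d(uncorrected, corrected)
```

Comparing such an overlap with the overlap observed between replicate runs of the same model is the kind of relative reading the abstract argues for.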
... This could be due to the number of PO records used to fit these SDMs. Species rarity leads to small numbers of PO records and hence alters the ability of models to fully capture the species-environment relationship (Gábor et al. 2020). The results show that with the PO model, even b(s) close to 1, the biases in the estimates are not zero. ...
Article
Full-text available
Species distribution models (SDMs) have become tools of great importance in ecology, as advanced knowledge of suitable species habitat is required for the process of global biodiversity conservation. Presence-only data are the more abundant and readily available data widely used in SDM applications. These data should be treated as a thinned Poisson process to account for detection errors related to sampling bias and imperfect detection that arise in them. Failure to do so could be detrimental to SDM predictions. This study assesses the effects of species abundance, variation in detection probability, and the number of sites visited in planned surveys on the performance of SDMs accounting for detection errors using simulated data. The results show that the accuracy and precision of estimates differ depending on models and species abundance. Their main difference lies in their ability to estimate β₀, the model intercept. The lower the species abundance, the higher the bias and variance of β̂₀. Furthermore, the lower the detection probability, the higher the bias and variance of β̂₀. However, β₁, the slope parameter, is estimated with fairly high accuracy and precision for all models. This study demonstrates the low efficiency of accounting for sampling bias and imperfect detection based on presence-only data alone. Analysing presence-only data in conjunction with point-count data outperformed the other approaches, whatever the species abundance, as long as the detection probability is at least 0.25 with average values of detectability covariates. The minimum number of sites needed for acceptable accuracy and precision varies depending on species abundance. At least 200 sites are required for the rare species, whereas 50 sites can suffice for the abundant species. 
Since collecting high-quality data is very expensive, this study emphasizes the need to promote initiatives such as citizen science programs that aim to collect species occurrence data with as little bias as possible.
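A small worked example may help show why the intercept is the fragile parameter in a thinned Poisson process. The sketch assumes a log-linear intensity, log λ(x) = β₀ + β₁x, and a constant detection probability p; both are assumptions made for illustration, since the abstract does not state the model form. Under them the observed intensity is p·λ(x), so log(p·λ(x)) = (β₀ + log p) + β₁x: thinning shifts the intercept and leaves the slope untouched. All parameter values are hypothetical.

```python
import math


def log_intensity(x, beta0, beta1):
    """Log of the true intensity under the assumed log-linear model."""
    return beta0 + beta1 * x


def line_through(x1, y1, x2, y2):
    """Intercept and slope of the straight line through two points."""
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return intercept, slope


# Hypothetical true parameters and a constant detection probability.
beta0, beta1 = 1.0, 0.5
p_detect = 0.25

# Log of the thinned (observed) intensity at two covariate values.
x_low, x_high = 0.0, 2.0
obs_low = math.log(p_detect * math.exp(log_intensity(x_low, beta0, beta1)))
obs_high = math.log(p_detect * math.exp(log_intensity(x_high, beta0, beta1)))

# What a model ignoring detection would recover from the thinned intensity.
apparent_beta0, apparent_beta1 = line_through(x_low, obs_low, x_high, obs_high)
```

The apparent intercept equals β₀ + log p while the apparent slope equals β₁, which is consistent with the abstract's finding that the slope is estimated well across models while the intercept is not. When detection varies with covariates the picture is more complicated, and this sketch does not cover that case.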
... models specific to a particular dataset) owing to spatial autocorrelation of environmental variables (Boria et al., 2014). Spatial filtering can reduce this potential source of model bias, but must be done carefully as it risks lowering sample sizes and correspondingly reducing model performance (Gábor et al., 2020) (Table 1). ...
Article
• Olympic mudminnow (Novumbra hubbsi) is the only endemic freshwater fish species in Washington State and is limited to south-western and northern coastal wetlands there. Population decline has led to its listing as state ‘Sensitive’, while recent genetic analysis has identified north coast populations as a sub-group of potential concern because of historical isolation and low level of occurrence. Substantial knowledge gaps about the species have made further assessment of conservation status difficult and hampered proactive conservation measures. • This article describes a three-tiered approach to evaluate conservation status, comprising: (i) a set of high-priority research questions identified by experts and stakeholders to advance conservation knowledge; (ii) a habitat suitability model to identify environmental factors related to the presence and absence of the species; and (iii) synthesis of information from that suitability model and other research to evaluate status using IUCN Red List criteria. Together these components provide an initial research agenda to guide future management and monitoring. • Evidence for an elevated conservation status of Olympic mudminnow across its entire range is mixed, with knowledge about changes in distribution and population over time lacking in many places. The case for enhanced conservation status is strongest for the genetically distinct populations along the northern coast, where suitable habitat is limited and populations highly disjointed. • Development of a monitoring network that would detect changes in distribution and abundance of key populations is a central requirement for the conservation of the species, which is threatened by the effects of climate, land use change and invasive species. • Involving stakeholders in conservation planning at early stages allows development of robust, practical research questions based on shared data and diverse expertise. 
This is particularly valuable where species knowledge is highly constrained. These elements produce confidence in implementing research and management actions that are more likely to gain wide acceptance and increase knowledge of conservation status and needs over time.
... Moreover, the HC5 values of Ni, Cu, and As differ from the national standard value based on Soil Environmental Quality Risk Control Standard for Soil Contamination of Development Land (MEE 2018) ( Table 5). This may be related to the selection of species for the analysis, which can greatly affect the results (Gábor et al. 2020). As the species composition of ecological systems differs from one location to another, national benchmarks may not apply to a specific research area (Xu et al. 2015). ...
Article
Full-text available
With the acceleration of urbanization, road dust poses a significant threat to ecological systems. To explore the source and ecological risks associated with heavy metals in surface dust, 36 road dust samples were collected in Beijing in May, August, and November 2018, and February 2019. The results indicated that the concentrations of Cd, Cr, Cu, Hg, Ni, Pb, and Zn exceeded their background values, and the mean concentration of Cd was nearly 10 times the background value. Three main sources were identified by Principal Component Analysis (PCA): industrial emissions, fuel combustion, and traffic exhaust. According to the Genetic Algorithm optimized Back Propagation (GA-BP) neural network, heavy metals from industrial sources and fuel combustion originate from suburban districts, while heavy metals from traffic exhaust originate from the city center. After careful screening of species and corresponding toxicity data, the concentration of pollutants corresponding to a 5% cumulative probability on the curve (HC5) of eight heavy metals was calculated by species sensitivity distribution (SSD), and the results indicate that most HC5 values are consistent with national standards. However, the HC5 values of Ni, Cu, and As differed from the national standards, which can be used to supplement existing standards. From the risk analysis based on HC5 values, the high-risk area of Ni (12.70–30.97%) was much larger than that of As (1.55–4.95%), and the proportions of high-risk areas of Ni and As reached their highest values in spring and autumn, respectively. Uncertainty existed between different geostatistical methods. The areas of uncertainty of Ni and As were small (less than 5.58%). The heavy metals in the surface dust in Beijing had potential ecological risks that vary with seasons and regions. Therefore, it is critical to implement targeted management measures based on pollution sources and risk distribution.
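The HC5 described above is the concentration at the 5% point of a species sensitivity distribution (SSD). A minimal sketch follows, assuming the SSD is a log-normal distribution fitted to per-species toxicity values. The abstract does not say which distribution the authors fitted, so this is an illustrative choice, and the toxicity values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def hazardous_concentration(toxicity_values, fraction=0.05):
    """Concentration expected to affect `fraction` of species.

    Fits a normal distribution to log10 toxicity values (an assumed
    log-normal SSD) and returns the back-transformed quantile.
    """
    logs = [math.log10(value) for value in toxicity_values]
    ssd = NormalDist(mu=mean(logs), sigma=stdev(logs))
    return 10 ** ssd.inv_cdf(fraction)


# Hypothetical toxicity endpoints for three species (same units, e.g. mg/kg).
toxicity = [1.0, 10.0, 100.0]
hc5 = hazardous_concentration(toxicity)
```

Because the quantile depends on which species enter the fit, swapping or adding species changes the HC5, which is the sensitivity to species selection noted in the citing passage.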
... Thapa et al. [13] treated all bamboo species as single entity in modelling, while we predicted species-wise distribution and integrated the predicted models of all species to predict overall bamboo distribution. In addition, we have used a large sample size [29] and excluded correlated variables in distribution prediction [17], which makes our estimation more accurate. ...
Chapter
Bamboo availability is a central aspect of red panda conservation. However, its diversity, distribution, and species-wise contribution to red panda diet are unknown. This study aimed to list bamboo species, map their distribution and evaluate their contribution to red panda diet. We recorded 13 bamboo species in the red panda range of Nepal. Of these, 11 bamboo species contributed 99% of the total bamboo constituent present in red panda diet. Four bamboo species, namely Thamnocalamus spathiflorus, Yushania maling, Y. microphylla and Drepanostachyum falcatum, contributed more than three-fourths of the total contributions in the diet. Interestingly, no evidence suggested the discrimination of red panda towards these bamboo species. Red panda could be more vulnerable to bamboo loss, and the level of vulnerability could be severe especially during the bamboo flowering events. We suggest planting alternative bamboo species, regulating bamboo harvesting, and managing connectivity in red panda habitat.
... In practice, however, sampling bias is inevitable, especially when the input data has to be assembled from different sources that are based on different sampling methods (Lobo & Tognelli, 2011; Qiao et al., 2017; Stolar & Nielsen, 2015). While the sampling bias caused by uneven sampling can be reduced by rarifying or filtering the occurrence records (Castellanos et al., 2019; Gábor et al., 2020), imprecisely recorded occurrence locations are very difficult (although not entirely impossible) to correct (Hefley et al., 2017). In certain cases, e.g. when using citizen-science databases or local monitoring systems, the occurrence locations of the species may be of a coarse precision or only available at municipal or county level (i.e. ...
Thesis
Full-text available
Vector-borne diseases are infectious diseases that are transmitted among vertebrate hosts by (typically arthropod) vectors. Among the whole world’s population, 80% is at risk of one or more vector-borne diseases, leading to an annual death toll of 700 000. These striking numbers are calling for urgent actions to prevent vector-borne diseases from emerging further. However, to apply preventions, we need to know where a risk exists; and if possible, when the prevention should take place. The key to those two primary questions are risk maps, which are typically generated with ecological niche models or epidemiological models. Ecological niche models require occurrence records of the transmissions and the respective environmental variables (mostly long-term-averaged) to build a correlative model. This correlative model can be projected to a different spatial extent, or into future climate scenarios, etc., showing the spatial outbreak risk. Epidemiological models, on the other hand, look into the transmission process and thus require a good understanding of the transmission cycle of the investigated vector-borne disease. Epidemiological models can work with time-series data, and produce spatio-temporal risk maps based on the basic reproduction number R0. In practice, both ecological niche models and epidemiological models have their respective strengths and drawbacks. In this thesis, I contribute to the improvement of both approaches by analyzing some of their drawbacks and making suggestions for new standards. For ecological niche models, the correlative models are highly dependent on the quality of occurrence records. In this thesis, I investigate how positional error, i.e. substituting the geographical centroid of the respective administrative spatial unit for unknown occurrence records, affects model performance in the context of varying grain size of environmental data. 
I quantify the decrease of model performance caused by the use of geographical centroids and varying grain size, respectively. As a consequence, I suggest that special caution should be taken when geographical centroids are applied as substitutes; when possible, central tendency values should be preferred. For epidemiological models, I review the common ways to generate risk maps and illustrate them with an example. I demonstrate that using different temporal aggregation methods affects the comparability and the quantitative information of the resulting maps; and that via different visualization methods, two fundamentally different maps can appear very similar, and vice versa. Consequently, I highlight the importance of using appropriate temporal aggregations and visualizations and give suggestions for best practice. I recommend showing both intensity and duration of the risk, using small time-steps to show spatio-temporal dynamics when possible. Pushing towards new standards for best practice in vector-borne disease risk mapping, I directly compare ecological niche models and epidemiological models, using Usutu virus as an example. The results from the parallel-model approach show that relying on a single model for assessing vector-borne disease risk may lead to incomplete conclusions. For future research, it is crucial to realize this and aim to apply different modelling approaches for risk assessment of under-studied emerging pathogens like Usutu virus.
... In practice, however, sampling bias is inevitable, especially when the input data have to be assembled from different sources that are based on different sampling methods (Lobo & Tognelli, 2011; Qiao et al., 2017; Stolar & Nielsen, 2015). While the sampling bias caused by uneven sampling can be reduced by rarifying or filtering the occurrence records (Castellanos et al., 2019; Gábor et al., 2020; Kramer-Schadt et al., 2013), imprecisely recorded occurrence locations are very difficult (although not entirely impossible) to correct (Hefley et al., 2017). In certain cases, for example, when using citizen-science databases or local monitoring systems, the occurrence locations of the species may be of a coarse precision or only available at municipal or county level (i.e., related to geographical surfaces of differing sizes). ...
Article
Aim Ecological niche models (ENMs) typically require point locations of species’ occurrence as input data. Where exact locations are not available, geographical centroids of the respective administrational spatial units (ASUs) are often used as a substitute. We investigated how the use of ASU centroids in ENMs affects model performance, what role the size of ASUs plays, and what effects different grain sizes of explanatory variables have. Location Europe. Major taxa studied Virtual species. Methods We set up a two‐factorial study design with artificial ASUs of three different sizes and environmental data of four commonly used grain sizes, repeated over three study regions. To control other factors that may affect ENM performance, we created a virtual species with a known response to environmental variables, precise and even sampling and a known spatial distribution. We ran a series of Maxent models for the virtual species based on centroids and precise occurrence locations under varying ASU and grain sizes. Results The use of ASU centroids introduces a value frequency mismatch of the explanatory variables between centroids and true occurrence locations, and it has a negative effect on ENM performance. Value frequency mismatch, negative effect on ENM performance and over‐prediction of the species’ range all increase with ASU size. The effect of grain size of environmental data, on the contrary, was small in comparison. Main conclusions ENMs built upon ASU centroids can suffer considerably from the introduced error. For ASUs that are sufficiently small or show low spatial heterogeneity of explanatory variables, ASU centroids can still be a viable and convenient surrogate for precise occurrence locations. When possible, however, central tendency values (median, mean) that represent the whole ASU rather than just a single point location need to be considered.
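The contrast between a centroid value and a central tendency value can be shown with a toy administrative unit. The sketch below is an illustration under stated assumptions, not the study's workflow: the unit is a hypothetical 3 × 3 block of raster cells with made-up values of one environmental variable, and the centroid is taken as the cell nearest to the mean cell coordinates.

```python
from statistics import mean, median


def value_at_centroid(cells):
    """Environmental value of the cell nearest to the unit's geometric centroid.

    cells : dict mapping (x, y) cell coordinates to an environmental value
    """
    centre_x = mean(x for x, _ in cells)
    centre_y = mean(y for _, y in cells)
    nearest = min(
        cells,
        key=lambda xy: (xy[0] - centre_x) ** 2 + (xy[1] - centre_y) ** 2,
    )
    return cells[nearest]


# Hypothetical unit: uniform lowland (value 5) with one atypical central cell.
asu = {(x, y): 5.0 for x in range(3) for y in range(3)}
asu[(1, 1)] = 20.0

centroid_value = value_at_centroid(asu)   # value sampled at the single centroid cell
asu_mean = mean(asu.values())             # represents the whole unit
asu_median = median(asu.values())         # robust to the atypical cell
```

In this toy unit the centroid reports 20 while the mean is about 6.7 and the median is 5, a small instance of the value frequency mismatch the abstract describes. The mismatch would be expected to grow as units get larger or more heterogeneous.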
... Knowledge of species distributions is limited (i.e. the so-called Wallacean shortfall; Lomolino 2004) and, in general, geographically and environmentally biased (Beck et al., 2014; Hortal et al., 2008, 2015; Oliveira et al., 2016). In addition, the particular characteristics of each species, like those related to their ecological role and geographical distribution, may also seriously influence the uncertainty of SDM predictions (Chefaoui et al., 2011; Gábor et al., 2019). Species often differ in the biological traits determining their distributions, hindering the automatic detection of the "true" relationship between species occurrences (and, eventually, absences) and environmental predictors (Thuiller et al., 2010). ...
Article
Full-text available
Species distribution models (SDM) are widely used as indicators of different aspects of geographical ranges for many purposes, from conservation to biogeographical and evolutionary analyses. However, these techniques are susceptible to various sources of uncertainty. Data coverage, species’ ecology, and the characteristics of their geographic distributions can affect SDM results, often generating critical errors in predicted distribution maps. We assess the influence of data quality, the characteristics of species distributions, and ecological traits on SDM performance. We predict the distributions of dung beetle species in Madrid region (central Spain) using six SDM techniques and validate them on an independent dataset. We relate variations in model performance with environmental completeness, data characteristics, and species traits through a partial least squares analysis. In this analysis, body size, nesting behaviour, marginality, rarity, data prevalence, Relative Occurrence Area (ROA), range size, niche breadth, and completeness are used as predictors of six assessment metrics (sensitivity, specificity, kappa, TSS, CCR, and AUC). Marginality and data prevalence were the variables that most influenced SDM performance, followed by range size, ROA, and niche breadth: species presenting higher marginality and data prevalence, and smaller ROA and niche breadth were associated with better models. Nesting behaviour, rarity, niche completeness, and body size had minor importance for SDM performance. Our results highlight the importance of taking species’ and data characteristics into account when modelling and comparing large groups of species using SDM. This implies that estimates of species richness and composition based on stacked SDMs can show high levels of error if they are constructed for groups of species with diverse ecological traits and types of geographic distributions. 
We suggest that species with characteristics that lead to poor SDM performance should not be included when constructing composite biodiversity variables. Further effort is needed to develop SDM methodologies and protocols that account for such sources of uncertainty.
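Five of the six assessment metrics listed in this abstract can be computed from a single confusion matrix of predicted versus observed presences and absences. The sketch below uses the standard definitions; AUC is left out because it needs continuous scores rather than thresholded predictions. The function name and the counts are hypothetical.

```python
def threshold_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, CCR, TSS and Cohen's kappa from a confusion matrix.

    tp / fn : observed presences predicted present / absent
    tn / fp : observed absences predicted absent / present
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ccr = (tp + tn) / n                      # correct classification rate
    tss = sensitivity + specificity - 1.0    # true skill statistic
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (ccr - expected) / (1.0 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ccr": ccr,
        "tss": tss,
        "kappa": kappa,
    }


# Hypothetical validation counts on an independent dataset.
metrics = threshold_metrics(tp=40, fp=10, fn=20, tn=30)
```

Kappa and CCR depend on the ratio of presences to absences in the validation data, whereas TSS is built from two rates that do not, which is one reason studies of data prevalence often report several metrics side by side.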
... Alternatively, Varela et al. (2014) showed that filtering by environmental criteria provides better results. However, filtering may provide worse results if performed without reliable information on the bias in species occurrence data (Gábor et al. 2019). ...
Article
Full-text available
Ecological niche models (ENMs) are widely used statistical methods to estimate various types of species niches. After lecturing several editions of introductory courses on ENMs and reviewing numerous manuscripts on this subject, we frequently faced some recurrent mistakes: 1) presence-background modelling methods, such as Maxent or ENFA, are used as if they were pseudo-absence methods; 2) spatial autocorrelation is confused with clustering of species records; 3) environmental variables are used with a higher spatial resolution than species records; 4) correlations between variables are not taken into account; 5) machine-learning models are not replicated; 6) topographical variables are calculated from unprojected coordinate systems, and; 7) environmental variables are downscaled by resampling. Some of these mistakes correspond to student misunderstandings and are corrected before publication. However, other errors can be found in published papers. We explain here why these approaches are erroneous and we propose ways to improve them.
... Data quality and species distribution models remain a relevant focus in this special issue, where Gábor et al. (2020) explored how species occurrence characteristics affect model performance using a virtual species approach. A virtual species approach involves simulating an ecological pattern with known characteristics in order to test a model's ability to estimate it absent of confounding factors associated with 'real' data (Zurell et al. 2010, Miller 2014). ...
... While datasets provided by global aggregators are increasingly rich and useful, they were shown to suffer from various types of data quality issues, such as duplicate records (Mesibov, 2018), records with high positional uncertainty (Maldonado et al., 2015; Otegui et al., 2013), heterogeneity amongst taxa (i.e., our knowledge of species distribution remains poor for most taxa), and to be spatially biased (i.e., uneven distribution of biodiversity information) (Amano et al., 2016; Amano and Sutherland, 2013; Jetz et al., 2012; Menegotto and Rangel, 2018; Meyer et al., 2015, 2016; Webb et al., 2010). Those issues keep challenging the usability of those datasets, which can in turn impact the accuracy of species distribution models (SDMs; Beck et al., 2014; Gábor et al., 2019a) or global species richness characterization (Menegotto and Rangel, 2018; Peterson and Soberón, 2018; Turak et al., 2017). To support the international collaboration related to biodiversity data quality, groups such as the Biodiversity Information Standards group were established (https://www.tdwg.org/), ...
Article
Knowing spatial and temporal patterns of species distribution is paramount to support marine species persistence. While datasets provided by global aggregators are increasingly rich and useful, they suffer from various types of data quality issues that can impact their usage. Using marine mammals as an example, we assessed the quality and information gaps in species distribution data from three major databases: the Global Biodiversity Information Facility (GBIF), the Ocean Biogeographic Information System (OBIS) and the International Union for Conservation of Nature (IUCN) range maps. We analysed marine mammal records from 2015 (n=1,396,581) and from 2019 (n=1,904,968), for six types of common quality or usability issues. Results for both OBIS and GBIF indicate that 35 to 55% (depending on the respective database and year) of each database's records are potential duplicates, fall on land, or lack a collection date. The positional accuracy of data records varies greatly due to varying precision and rounding of geographic coordinates. However, coordinate precision is specified only in 45% and 70% of records in GBIF and OBIS, respectively. In 2019, only approximately 70% of GBIF and OBIS records are encoded using more than three decimals (i.e. remaining records have a positional accuracy lower than 100 m). We also quantified that only 19% (n=135,885) and 11% (n=133,882) of the records in 2015 and 2019, respectively, were common to OBIS and GBIF. Despite the continuous increase in the number of records in both databases, the number of shared records slightly decreased. It is therefore likely that new records added to GBIF and OBIS between 2015 and 2019 come from different data providers. Finally, to identify potential information gaps in marine mammal distributions, we overlaid IUCN range maps and species occurrences from global databases.
We found that areas previously identified as hotspots for marine mammals' diversity show some of the highest rates of potential false positives (i.e. species are thought to occur there based on their range map, but no species records exist in either GBIF or OBIS). While global biodiversity databases are key to assess global species distribution patterns, our study points to challenges that can limit data usability in biodiversity research. Improving existing data entry mechanisms, quality control routines, as well as data exchange between aggregators should help make those databases more useful to the community and reduce the risks of misuse of biological data.
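The link between the number of coordinate decimals and positional accuracy mentioned in the abstract above can be illustrated with a rough calculation. The following is a minimal Python sketch, not the authors' procedure: the function names are hypothetical, and the figure of about 111.32 km per degree of latitude is a spherical-Earth approximation assumed here.

```python
import math

# Assumption: about 111.32 km per degree of latitude (spherical approximation).
METERS_PER_DEGREE_LAT = 111_320.0


def decimal_places(coord_text):
    """Count the digits after the decimal point of a coordinate given as text."""
    _, _, fraction = coord_text.strip().partition(".")
    return len(fraction)


def approx_resolution_m(decimals, latitude_deg=0.0):
    """Return the (north-south, east-west) ground distance in metres
    represented by one unit of the last recorded decimal place."""
    step_deg = 10.0 ** (-decimals)
    north_south = step_deg * METERS_PER_DEGREE_LAT
    # Meridians converge towards the poles, so east-west steps shrink.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for d in (2, 3, 4):
    ns, ew = approx_resolution_m(d, latitude_deg=45.0)
    print(f"{d} decimals: ~{ns:.1f} m north-south, ~{ew:.1f} m east-west")
```

Under these assumptions, a coordinate stored with three decimals resolves to roughly 100 m, which is consistent with the threshold quoted in the abstract.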
... Ecological niche models (ENMs) are fundamental tools for describing the multivariate structure of a species niche and producing spatially explicit maps of probability of occurrence at the landscape scale. Developing reliable ENMs is challenging and the correlative methods used in some studies may suffer from overfitting and biased prediction due to issues relating to spatial autocorrelation and sampling bias in presence localities (Redding et al. 2017), inappropriate background data selection (Barbet-Massin et al. 2012), insufficient independent testing data (Radosavljevic and Anderson 2014), incorrect spatial scale of environmental covariates (McGarigal et al. 2016a), and improper model parameterisation (Huang et al. 2018; Gábor et al. 2019). ...
Article
Full-text available
Context Carnivores in the central Iranian plateau have experienced considerable declines in their populations during the last century. Ecological niche models can inform conservation efforts aimed at increasing the suitability of carnivore habitat by providing valuable information on the scale-dependent relationships between species and their environment. Objectives We used a multiscale modeling framework to predict habitat suitability and investigate the influence of spatial scale on species-environment relationships for three sympatric felids, chosen as surrogate species, including Asiatic cheetah (Acinonyx jubatus), Persian leopard (Panthera pardus), and sand cat (Felis margarita) with the aim of informing conservation efforts for these species and other Iranian carnivores more widely. Methods We used opportunistically collected occurrence data and a presence-only, multiscale MaxEnt approach whilst exploring the impact of spatial filtering and data partitioning on model predictions and performance. Results Scaling optimization showed that the performance of models was associated with variables at multiple spatial scales, with relationships tending to be strongest at the largest scales (4–8 km). Our findings showed that landscape composition generally has a stronger influence on the occurrence of the studied species than configuration. The comparison among models showed distinct patterns of habitat selection, implying niche partitioning between species. Conclusions Our knowledge of scale-dependent relationships between three sympatric felids and their spatial niches facilitates effective conservation of habitat connectivity for multiple carnivore species by prioritizing predicted key suitable patches inside and outside of protected areas which contribute significantly to maintaining landscape connectivity in Iran.
... Here we provide a virtual species example to demonstrate the effects of the substitution of two scale alteration approaches in a controlled environment on SDMs (Gábor et al., 2019; Meynard et al., 2019; Moudrý, 2015). Our study area is a real landscape located in north-west Bohemia, Czech Republic (50°32′ N, 13°50′ E) and occupies an area of 35 km² (Fig. 4). ...
Article
Terrain attributes (e.g., slope, rugosity) derived in Geographic Information Systems (GIS) from digital terrain models (DTMs) are widely used in both terrestrial and marine ecological studies due to their potential to act as surrogates of species distribution. However, the spatial resolution of DTMs is often altered to match the scale at which species observations were collected. Here, we highlight the significance of adequately reporting the methods used to derive terrain attributes from DTMs and the consequences of their incorrect reporting in ecological studies. To ensure full repeatability of studies, they should report (i) the source and the resolution of the original DTM; (ii) the algorithm used to calculate terrain attributes; (iii) the method used for rescaling (e.g., aggregating or resampling, using the mean or maximum values); and (iv) the order in which these operations were performed. We contrast the effects of two common scale alteration approaches for the derivation of terrain attributes from DTMs. These two scale alteration methods differ in the step at which the change is performed: (i) the resolution alteration is performed after computing terrain attributes from the original DTM at the native resolution, or (ii) the resolution alteration is performed on the native DTM before computing terrain attributes. While these approaches conceptually do the same thing (i.e., change the resolution of the terrain attributes), we demonstrate that they produce two distinct sets of variables that are not interchangeable and describe different properties of the terrain. In a species distribution modelling (SDM) context, the first approach calculates terrain attribute values within the cell where a species is found, while the second approach calculates terrain attribute values with respect to neighbouring cells. 
A mutual substitution of the two approaches results in a decrease of models' discrimination ability and in misleading spatial predictions of species probability of occurrence. Regardless of the DTM-derived attribute, we argue that the choice of the approach should be carefully guided by both the ecological scale relevant to the question being asked and the performance of pre-analyses. We emphasize that selected methods be clearly described to encourage reproducibility and proper interpretation of results, thus enabling a better understanding of the role of scale in ecology.
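The difference between the two orders of operation described above (derive the terrain attribute, then rescale; or rescale the DTM, then derive the attribute) can be sketched on a synthetic surface. This is an illustrative Python/NumPy example only: the slope estimator based on `numpy.gradient`, the block-mean aggregation, the random surface, and all function names are assumptions, not the algorithms used in the study.

```python
import numpy as np


def slope_deg(dtm, cell_size):
    """Slope in degrees from finite-difference gradients of the elevation grid."""
    dz_dy, dz_dx = np.gradient(dtm, cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


def aggregate_mean(raster, factor):
    """Coarsen a raster by averaging non-overlapping factor x factor blocks."""
    rows, cols = raster.shape
    rows -= rows % factor  # drop edge cells that do not fill a whole block
    cols -= cols % factor
    blocks = raster[:rows, :cols].reshape(rows // factor, factor,
                                          cols // factor, factor)
    return blocks.mean(axis=(1, 3))


rng = np.random.default_rng(0)
dtm = 100.0 + rng.normal(0.0, 2.0, size=(40, 40))  # synthetic rough terrain
cell, factor = 5.0, 4

# Approach (i): attribute at native resolution, then change the resolution.
slope_then_aggregate = aggregate_mean(slope_deg(dtm, cell), factor)
# Approach (ii): change the DTM resolution, then compute the attribute.
aggregate_then_slope = slope_deg(aggregate_mean(dtm, factor), cell * factor)

print(slope_then_aggregate.mean(), aggregate_then_slope.mean())
```

Both outputs share the same grid, yet their values differ, which illustrates why the order of operations needs to be reported.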
Article
Full-text available
Species distribution models (SDMs) are powerful tools in ecology and conservation. Choosing the right environmental drivers and filtering species' occurrences taking their biases into account are key factors to consider before modeling. In this case study, we address five common problems arising during the selection of input data for presence-only SDMs on an example of a generalist species: the endangered Cantabrian brown bear. First, we focus on the selection of environmental variables that may drive its distribution, testing if climatic variables should be considered at a 1-km analysis grain. Second, we investigate how filtering the species' data in view of (1) their collection procedures, (2) different time frames, (3) dispersal areas, and (4) subpopulations affects the performance and outputs of the models at three different spatial analysis grains (500 m, 1 km, and 5 km). Our results show that models with different input data yielded only minor differences in performance and behaved properly in terms of model validation, although coarsening the analysis grain deteriorated model performance. Still, the contribution of individual variables and the habitat suitability predictions differed among models. We show that a combination of limited data availability and poor selection of environmental variables can lead to inaccurate predictions. Specifically for the brown bear, we conclude that climatic variables should not be considered for exploring habitat suitability and that the best input data for modeling habitat suitability in the study area originate from (1) observations and traces from the (2) most recent period (2006-2019) in which the population is expanding, (3) not considering cells of dispersing bear occurrences and (4) modeling sub-populations independently (as they show distinct habitat preferences).
In conclusion, SDMs can serve as a useful tool for generalist species including all available data; still, expert evaluation from the perspective of data suitability for the purpose of modeling and possible biases is recommended. This is especially important when the results are intended for management and conservation purposes at the local level, and for species that respond to the environment at coarse analysis grains.
Article
The performance of species distribution models (SDMs) is known to be affected by analysis grain and positional error of species occurrences. Coarsening of the analysis grain has been suggested to compensate for positional errors. Nevertheless, this way of dealing with positional errors has never been thoroughly tested. With increasing use of fine‐scale environmental data in SDMs, it is important to test this assumption. Models using fine‐scale environmental data are more likely to be negatively affected by positional error, as inaccurate occurrences can more easily end up in unsuitable environments. This can result in inappropriate conservation actions. Here, we examined the trade‐offs between positional error and analysis grain and provide recommendations for best practice. We generated narrow niche virtual species using environmental variables derived from LiDAR point clouds at 5 × 5 m fine‐scale. We simulated the positional error in the range of 5 m to 99 m and evaluated the effects of several spatial grains in the range of 5 m to 500 m. In total, we assessed 49 combinations of positional accuracy and analysis grain. We used three modelling techniques (MaxEnt, BRT and GLM) and evaluated their discrimination ability, niche overlap with virtual species and change in realized niche. We found that model performance decreased with increasing positional error in species occurrences and coarsening of the analysis grain. Most importantly, we showed that coarsening the analysis grain to compensate for positional error did not improve model performance. Our results reject coarsening of the analysis grain as a solution to address the negative effects of positional error on model performance. We recommend fitting models with the finest possible analysis grain and as close to the response grain as possible even when available species occurrences suffer from positional errors.
If there are significant positional errors in species occurrences, users are unlikely to benefit from making additional efforts to obtain higher resolution environmental data unless they also minimize the positional errors of species occurrences. Our findings are also applicable to coarse analysis grain, especially for fragmented habitats, and for species with narrow niche breadth.
Article
Full-text available
The scale dependence of benthic terrain attributes is well-accepted, and multi-scale methods are increasingly applied for benthic habitat mapping. There are, however, multiple ways to calculate terrain attributes at multiple scales, and the suitability of these approaches depends on the purpose of the analysis and data characteristics. There are currently few guidelines establishing the appropriateness of multi-scale raster calculation approaches for specific benthic habitat mapping applications. First, we identify three common purposes for calculating terrain attributes at multiple scales for benthic habitat mapping: i) characterizing scale-specific terrain features, ii) reducing data artefacts and errors, and iii) reducing the mischaracterization of ground-truth data due to inaccurate sample positioning. We then define criteria that calculation approaches should fulfill to address these purposes. At two study sites, five raster terrain attributes, including measures of orientation, relative position, terrain variability, slope, and rugosity were calculated at multiple scales using four approaches to compare the suitability of the approaches for these three purposes. Results suggested that specific calculation approaches were better suited to certain tasks. A transferable parameter, termed the ‘analysis distance’, was necessary to compare attributes calculated using different approaches, and we emphasize the utility of such a parameter for facilitating the generalized comparison of terrain attributes across methods, sites, and scales.
Article
Full-text available
Aim Species distribution information is essential under increasing global changes, and models can be used to acquire such information but they can be affected by different errors/bias. Here, we evaluated the degree to which errors in species data (false presences–absences) affect model predictions and how this is reflected in commonly used evaluation metrics. Location Western Swiss Alps. Methods Using 100 virtual species and different sampling methods, we created observation datasets of different sizes (100–400–1,600) and added increasing levels of errors (creating false positives or negatives; from 0% to 50%). These degraded datasets were used to fit models using generalized linear model, random forest and boosted regression trees. Model fit (ability to reproduce calibration data) and predictive success (ability to predict the true distribution) were measured on probabilistic/binary outcomes using Kappa, TSS, MaxKappa, MaxTSS and Somers'D (rescaled AUC). Results The interpretation of models’ performance depended on the data and metrics used to evaluate them, with conclusions differing whether model fit, or predictive success were measured. Added errors reduced model performance, with effects expectedly decreasing as sample size increased. Model performance was more affected by false positives than by false negatives. Models with different techniques were differently affected by errors: models with high fit presenting lower predictive success (RFs), and vice versa (GLMs). High evaluation metrics could still be obtained with 30% error added, indicating that some metrics (Somers'D) might not be sensitive enough to detect data degradation. Main conclusions Our findings highlight the need to reconsider the interpretation scale of some commonly used evaluation metrics: Kappa seems more realistic than Somers'D/AUC or TSS. High fits were obtained with high levels of error added, showing that RF overfits the data. 
When collecting occurrence databases, it is advisable to reduce the rate of false positives (or increase sample sizes) rather than false negatives.
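For readers unfamiliar with the evaluation metrics named in the abstract above, the standard definitions of Kappa and TSS from a 2 × 2 confusion matrix, and of Somers' D as a rescaled AUC (D = 2·AUC − 1), can be written out as follows. This is a minimal Python sketch with hypothetical function names and made-up counts; it is not the evaluation code of the study.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Cohen's Kappa and the true skill statistic (TSS) from confusion counts."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)  # share of true presences predicted present
    specificity = tn / (tn + fp)  # share of true absences predicted absent
    observed = (tp + tn) / n      # observed agreement
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    tss = sensitivity + specificity - 1
    return {"kappa": kappa, "tss": tss}


def somers_d_from_auc(auc):
    """Rescale AUC from the [0, 1] range to Somers' D in [-1, 1]."""
    return 2 * auc - 1


# Made-up counts for illustration only.
print(confusion_metrics(tp=40, fp=10, fn=20, tn=30))
print(somers_d_from_auc(0.75))
```

With these illustrative counts, Kappa is 0.4 and TSS is about 0.42, while an AUC of 0.75 corresponds to a Somers' D of 0.5.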
Article
Full-text available
While modelling habitat suitability and species distribution, ecologists must deal with issues related to the spatial resolution of species occurrence and environmental data. Because the spatial resolution of species and environmental datasets ranges from centimeters to hundreds of kilometers, choosing the optimal combination of resolutions is important for achieving the highest possible prediction accuracy. We evaluated how the spatial resolution of land cover/waterbody datasets (meters to 1 km) affects waterbird habitat suitability models based on atlas data (grid cell of 12×11 km). We hypothesized that the area, perimeter and number of waterbodies computed from high resolution datasets would explain distributions of waterbirds better because coarse resolution datasets omit small waterbodies affecting species occurrence. Specifically, we investigated which spatial resolution of waterbodies better explains the distribution of seven waterbirds nesting on ponds/lakes with areas ranging from 0.1 ha to hundreds of hectares. Our results show that the area and perimeter of waterbodies derived from high resolution datasets (raster data with 30 m resolution, vector data corresponding with map scale 1:10,000) explain the distribution of the waterbirds better than those calculated using less accurate datasets despite the coarse grain of the species data. Taking into account the spatial extent (global vs regional) of the datasets, we found the Global Inland Waterbody Dataset to be the most suitable for modelling distribution of waterbirds. In general, we recommend using land cover data of a resolution sufficient to capture the smallest patches of the habitat suitable for a given species’ presence for both fine and coarse grain habitat suitability and distribution modelling.
Article
Full-text available
Scale is a vital component to consider in ecological research, and spatial resolution or grain size is one of its key facets. Species distribution models (SDMs) are prime examples of ecological research in which grain size is an important component. Despite this, SDMs rarely explicitly examine the effects of varying the grain size of the predictors for species with different niche breadths. To investigate the effect of grain size and niche breadth on SDMs, we simulated four virtual species with different grain sizes/niche breadths using three environmental predictors (elevation, aspect, and percent forest) across two real landscapes of differing heterogeneity in predictor values. We aggregated these predictors to seven different grain sizes and modeled the distribution of each of our simulated species using MaxEnt and GLM techniques at each grain size. We examined model accuracy using the AUC statistic, Pearson's correlations of predicted suitability with the true suitability, and the binary area of presence determined from suitability above the maximum true skill statistic (TSS) threshold. Habitat specialists were more accurately modeled than generalist species, and the models constructed at the grain size from which a species was derived generally performed the best. The accuracy of models in the homogeneous landscape deteriorated with increasing grain size to a greater degree than models in the heterogeneous landscape. Variable effects on the model varied with grain size, with elevation increasing in importance as grain size increased while aspect lost importance. The area of predicted presence was drastically affected by grain size, with larger grain sizes overpredicting this value by up to a factor of 14. Our results have implications for species distribution modeling and conservation planning, and we suggest more studies include analysis of grain size as part of their protocol.
Article
Full-text available
The number of alien species transported as stowaways is steadily increasing, and new approaches are urgently needed to tackle this emerging invasion pathway. We introduce a general framework for identifying high-risk transport pathways and receiving sites for alien species that are unintentionally transported via goods and services. This approach combines the probability of species arrival at transport hubs with the likelihood that the environment in the new region can sustain populations of that species. We illustrate our approach using a case study of the Asian black-spined toad (Duttaphrynus melanostictus) in Australia, a species that is of significant biosecurity concern in Australasia, Indonesia, and Madagascar. A correlative model fitted to occurrence data from the native geographic range of D. melanostictus predicted high environmental suitability at locations where the species has established alien populations globally. Applying the model to Australia revealed that transport hubs with the highest numbers of border interceptions and on-shore detections of D. melanostictus were environmentally similar to locations within the species’ native range. Numbers of D. melanostictus interceptions and detections in Australia increased over time, but were unrelated to indices of air and maritime trade volume. Instead, numbers of interceptions and detections were determined by the country of origin of airplanes (Thailand) and ships (Indonesia). Thus, the common assumption that transport pressure is correlated with invasion risk does not hold in all cases. Our work builds on previous efforts to integrate transport pressure data and species distribution models, by jointly modelling the number of intercepted and detected stowaways, while incorporating imperfect detection and the environmental suitability of receiving hubs. 
The approach presented here can be applied to any system for which historical biosecurity data are available, and provides an efficient means to allocate quarantine and surveillance efforts to reduce the probability of alien species establishment.
Article
Full-text available
Typeset full text available at: http://rdcu.be/qJFC. Inland aquatic ecosystems are vulnerable to both climate change and biological invasion at broad spatial scales. The aim of this study was to establish the current and future potential distribution of three invasive plant taxa, Egeria densa, Myriophyllum aquaticum and Ludwigia spp., in their native and exotic ranges. We used Species Distribution Models (SDMs), with nine different algorithms and three global circulation models, and we restricted the suitability maps to cells containing aquatic ecosystems. The current bioclimatic range of the taxa was predicted to represent 6.6 to 12.3% of their suitable habitats at global scale, with considerable variation between continents. In Europe and North America, their invasive ranges are predicted to increase up to two-fold by 2070 under the highest gas emission scenario. Suitable new areas will mainly be located to the north of their current range. In other continents where they are exotic and in their native range (South America), the surface areas of suitable locations are predicted to decrease with climate change, especially for Ludwigia spp. in South America (down to -55% by 2070 with RCP 8.5 scenario). This study identifies areas vulnerable to ongoing invasions by aquatic plant species and thus could help prioritise monitoring and management, as well as contribute to public awareness of biological invasions.
Article
Full-text available
Biological control using natural antagonists has been a most successful management tool against alien invasive plants that threaten biodiversity. The selection of candidate agents remains a critical step in a biocontrol program before more elaborate and time-consuming experiments are conducted. Here, we propose a biogeographic approach to identify candidates and combinations of candidates to potentially cover a large range of the invader. We studied Ambrosia artemisiifolia (common ragweed), native to North America (NA) and invasive worldwide, and six NA biocontrol candidates for the introduced Europe (EU) range of ragweed, both under current and future bioclimatic conditions. For the first time, we constructed species distribution models based on worldwide occurrences and important bioclimatic variables simultaneously for a plant invader and its biocontrol candidates in view of selecting candidates that potentially cover a large range of the target invader. Ordination techniques were used to explore climatic constraints of each species and to perform niche overlap tests with ragweed. We show a large overlap in climatic space between candidates and ragweed, but a considerable discrepancy in geographic range overlap between EU (31.4%) and NA (83.3%). This might be due to niche unfilling and expansion of ragweed in EU and the fact that habitats with high ragweed occurrences in EU are rare in NA and predicted to be unsuitable for the candidates. Total geographic range of all candidates combined is expected to decrease under climate change in both ranges, but they will respond differently. The relative geographic coverage of a plant invader by biocontrol candidates at home is largely transferable to the introduced range, even when the invader shifts its niche. 
Our analyses also identified which combination of candidates is expected to cover the most area and for which abiotic conditions to select in order to develop climatically adapted strains for particular regions, where ragweed is currently unlikely to be controlled.
Article
Full-text available
Predictive models are useful to support decision making, management and conservation planning. However, the performance of models varies across techniques and is affected by several factors including species prevalence (i.e. the occurrence rate of each species in the total samples). Here, we analysed and compared the performance of four common modelling techniques based on the species prevalence. The occurrence of macroinvertebrates collected at 63 sites along the Lower Mekong Basin was predicted using Logistic Regression, Random Forest, Support Vector Machine and Artificial Neural Network (ANN). Model performance was evaluated using Cohen’s Kappa Statistic (Kappa), area under receiver operating characteristic curve (AUC) and error rate. We found a highly significant quadratic effect of species prevalence on the four modelling techniques’ performance. Kappa and AUC were less dependent on species prevalence, making them better measures. The best performance (Kappa and AUC) was reached when predicting species with an intermediate prevalence (e.g. 0.4-0.6). The four modelling techniques yielded significantly different performances (p<0.01), of which ANN performed generally better when using the complete prevalence range (i.e. 0.0-1.0) and the lower prevalence range (i.e. <0.1). However, the four techniques performed similarly when predicting species with a higher prevalence range (i.e. ≥0.3). Our results provide useful insights into the application of modelling techniques in predicting species occurrence and how their performance varies for species with different prevalence ranges. We suggest that the selection of appropriate modelling techniques should carefully take into account the species prevalence, particularly in the case of rare and generalist species.
Article
Full-text available
We describe an algorithm that helps to predict potential distributional areas for species using presence-only records. The Marble Algorithm (MA) is a density-based clustering program based on Hutchinson's concept of ecological niches as multidimensional hypervolumes in environmental space. The algorithm characterizes this niche space using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. When MA is provided with a set of occurrence points in environmental space, the algorithm determines two parameters that allow the points to be grouped into several clusters. These clusters are used as reference sets describing the ecological niche, which can then be mapped onto geographic space and used as the potential distribution of the species. We used both virtual species and ten empirical datasets to compare MA with other distribution-modeling tools, including Bioclimate Analysis and Prediction System, Environmental Niche Factor Analysis, the Genetic Algorithm for Rule-set Production, Maximum Entropy Modeling, Artificial Neural Networks, Climate Space Models, Classification Tree Analysis, Generalised Additive Models, Generalised Boosted Models, Generalised Linear Models, Multivariate Adaptive Regression Splines and Random Forests. Results indicate that MA predicts potential distributional areas with high accuracy, moderate robustness, and above-average transferability on all datasets, particularly when dealing with small numbers of occurrences.
Article
Full-text available
virtualspecies is a freely available package for R designed to generate virtual species distributions, a procedure increasingly used in ecology to improve species distribution models. This package combines the existing methodological approaches with the objective of generating virtual species distributions with increased ecological realism. The package includes (1) generating the probability of occurrence of a virtual species from a spatial set of environmental conditions (i.e., environmental suitability), with two different approaches; (2) converting the environmental suitability into presence-absence with a probabilistic approach; (3) introducing dispersal limitations in the realised virtual species distributions and (4) sampling occurrences with different biases in the sampling procedure. The package was designed to be extremely flexible, to allow users to simulate their own defined species-environment relationships, as well as to provide a fine control over every simulation parameter. The package also includes a function to generate random virtual species distributions. We provide a simple example in this paper showing how increasing ecological realism of the virtual species impacts the predictive performance of species distribution models. We expect that this new package will be valuable to researchers willing to test techniques and protocols of species distribution models as well as various biogeographical hypotheses.
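The first, second and fourth steps listed in the abstract above (suitability from environmental conditions, probabilistic conversion to presence-absence, and sampling of occurrences) can be outlined generically. The sketch below is written in Python with NumPy and does not use or reproduce the API of the R package; the Gaussian response curves, the random environmental layers, the parameter values, and all names are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_response(env, optimum, tolerance):
    """Bell-shaped response to one environmental gradient, scaled to (0, 1]."""
    return np.exp(-0.5 * ((env - optimum) / tolerance) ** 2)


rng = np.random.default_rng(42)

# Step 1: hypothetical environmental layers and the resulting suitability.
temperature = rng.uniform(0.0, 30.0, size=(50, 50))
rainfall = rng.uniform(200.0, 2000.0, size=(50, 50))
suitability = (gaussian_response(temperature, optimum=15.0, tolerance=4.0)
               * gaussian_response(rainfall, optimum=1000.0, tolerance=300.0))

# Step 2: probabilistic conversion, each cell is occupied with a probability
# equal to its suitability (a Bernoulli draw rather than a fixed threshold).
presence = rng.random(suitability.shape) < suitability

# Step 4: sample up to 30 occurrence records at random from occupied cells
# (unbiased sampling; a biased design would weight the cells unequally).
rows, cols = np.nonzero(presence)
picked = rng.choice(len(rows), size=min(30, len(rows)), replace=False)
occurrences = np.column_stack((rows[picked], cols[picked]))

print(f"prevalence: {presence.mean():.3f}, sampled records: {len(occurrences)}")
```

Because the true suitability surface is known, models fitted to `occurrences` can be compared directly against it, which is the central appeal of the virtual species approach.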
Article
Full-text available
Species distribution models (SDMs) are widely used to predict the occurrence of species. Because SDMs generally use presence-only data, validation of the predicted distribution and assessing model accuracy is challenging. Model performance depends on both sample size and species’ prevalence, being the fraction of the study area occupied by the species. Here, we present a novel method using simulated species to identify the minimum number of records required to generate accurate SDMs for taxa of different pre-defined prevalence classes. We quantified model performance as a function of sample size and prevalence and found model performance to increase with increasing sample size under constant prevalence, and to decrease with increasing prevalence under constant sample size. The area under the curve (AUC) is commonly used as a measure of model performance. However, when applied to presence-only data it is prevalence-dependent and hence not an accurate performance index. Testing the AUC of an SDM for significant deviation from random performance provides a good alternative. We assessed the minimum number of records required to obtain good model performance for species of different prevalence classes in a virtual study area and in a real African study area. The lower limit depends on the species’ prevalence with absolute minimum sample sizes as low as 3 for narrow-ranged and 13 for widespread species for our virtual study area which represents an ideal, balanced, orthogonal world. The lower limit of 3, however, is flawed by statistical artefacts related to modelling species with a prevalence below 0.1. In our African study area lower limits are higher, ranging from 14 for narrow-ranged to 25 for widespread species. We advocate identifying the minimum sample size for any species distribution modelling by applying the novel method presented here, which is applicable to any taxonomic clade or group, study area or climate scenario.
Article
Species distribution models (SDMs) have become a standard tool in ecology and applied conservation biology. Modelling rare and threatened species is particularly important for conservation purposes. However, modelling rare species is difficult because the combination of few occurrences and many predictor variables easily leads to model overfitting. A new strategy using ensembles of small models was recently developed in an attempt to overcome this limitation of rare species modelling and has been tested successfully for only a single species so far. Here, we aim to test the approach more comprehensively on a large number of species including a transferability assessment. For each species, numerous small (here bivariate) models were calibrated, evaluated and averaged to an ensemble weighted by AUC scores. These ‘ensembles of small models’ (ESMs) were compared to standard SDMs using three commonly used modelling techniques (GLM, GBM and Maxent) and their ensemble prediction. We tested 107 rare and under‐sampled plant species of conservation concern in Switzerland. We show that ESMs performed significantly better than standard SDMs. The rarer the species, the more pronounced the effects were. ESMs were also superior to standard SDMs and their ensemble when they were evaluated using a transferability assessment. By averaging simple small models to an ensemble, ESMs avoid overfitting without losing explanatory power through reducing the number of predictor variables. They further improve the reliability of species distribution models, especially for rare species, and thus help to overcome limitations of modelling rare species.
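The averaging step mentioned in this abstract (small models combined into an ensemble weighted by AUC scores) can be sketched as follows. This is an illustrative reading only: the function name is hypothetical, and using the raw AUC value as the weight is an assumption, since the abstract does not say how the scores are transformed into weights.

```python
def ensemble_of_small_models(predictions, auc_scores):
    """AUC-weighted mean of per-model suitability predictions.

    predictions: one list of per-site suitabilities for each small model.
    auc_scores:  one AUC value per small model (assumed to be used as-is as the weight).
    """
    if len(predictions) != len(auc_scores) or not predictions:
        raise ValueError("need one AUC score per model and at least one model")
    total_weight = sum(auc_scores)
    n_sites = len(predictions[0])
    return [
        sum(auc * model[site] for model, auc in zip(predictions, auc_scores)) / total_weight
        for site in range(n_sites)
    ]


# Two hypothetical bivariate models evaluated at two sites.
ensemble = ensemble_of_small_models([[0.2, 0.8], [0.6, 0.4]], [0.9, 0.6])
```

The better-scoring model pulls the ensemble towards its own predictions, so the first site ends up closer to 0.2 than to 0.6.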
Article
At the local spatial scale, land-use variables are often employed as predictors for ecological niche models (ENMs). Remote sensing can provide additional synoptic information describing vegetation structure in detail. However, there is limited knowledge on which environmental variables and how many of them should be used to calibrate ENMs. We used an information-theoretic approach to compare the performance of ENMs using different sets of predictors: (1) a full set of land-cover variables (seven, obtained from the LGN6 Dutch National Land Use Database); (2) a reduced set of land-cover variables (three); (3) remotely sensed laser data optimized to measure vegetation structure and canopy height (LiDAR, light detection and ranging); and (4) combinations of land cover and LiDAR. ENMs were built for a set of bird species in the Veluwe Natura 2000 site (the Netherlands); for each species, 26–214 records were available from standardized monitoring. Models were built using MaxEnt, and the best performing models were identified using the Akaike’s information criterion corrected for small sample size (AICc). For 78% of the bird species analysed, LiDAR data were included in the best AICc model. The model including LiDAR only was the best performing one in most cases, followed by the model including a reduced set of land-use variables. Models including many land-use variables tended to have limited support. The number of variables included in the best model increased for species with more presence records. For all species with 33 records or less, the best model included LiDAR only. Models with many land-use variables were only selected for species with >150 records. Test area under the curve (AUC) scores ranged between 0.72 and 0.92. Remote sensing data can thus provide regional information useful for modelling at the local and landscape scale, particularly when presence records are limited. 
ENMs can be optimized through the selection of the number and identity of environmental predictors. Few variables can be sufficient if presence records are limited in number. Synoptic remote sensing data provide a good measure of vegetation structure and may allow a better representation of the available habitat, being extremely useful in this case. Conversely, a larger number of predictors, including land-use variables, can be useful if a large number of presence records are available.
Article
MAXENT is now a common species distribution modeling (SDM) tool used by conservation practitioners for predicting the distribution of a species from a set of records and environmental predictors. However, datasets of species occurrence used to train the model are often biased in the geographical space because of unequal sampling effort across the study area. This bias may be a source of strong inaccuracy in the resulting model and could lead to incorrect predictions. Although a number of sampling bias correction methods have been proposed, there is no consensual guideline to account for it. We compared here the performance of five methods of bias correction on three datasets of species occurrence: one "virtual" derived from a land cover map, and two actual datasets for a turtle (Chrysemys picta) and a salamander (Plethodon cylindraceus). We subjected these datasets to four types of sampling biases corresponding to potential types of empirical biases. We applied five correction methods to the biased samples and compared the outputs of distribution models to unbiased datasets to assess the overall correction performance of each method. The results revealed that the ability of methods to correct the initial sampling bias varied greatly depending on bias type, bias intensity and species. However, the simple systematic sampling of records consistently ranked among the best performing across the range of conditions tested, whereas other methods performed more poorly in most cases. The strong effect of initial conditions on correction performance highlights the need for further research to develop a step-by-step guideline to account for sampling bias. However, this method seems to be the most efficient in correcting sampling bias and should be advised in most cases.
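One common reading of "systematic sampling of records" is grid-based thinning: overlay a regular grid and keep a single record per cell, so that densely sampled areas no longer dominate. The sketch below follows that reading as an assumption; the abstract does not give the procedure, and the function name, the `cell_size` parameter and the coordinates are hypothetical.

```python
import math
import random


def systematic_sample(records, cell_size, rng):
    """Keep one randomly chosen record per grid cell.

    records:   iterable of (longitude, latitude) pairs
    cell_size: grid resolution, in the same units as the coordinates
    """
    cells = {}
    for lon, lat in records:
        key = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        cells.setdefault(key, []).append((lon, lat))
    # Iterate cells in a fixed order so results are reproducible for a given seed.
    return [rng.choice(group) for _, group in sorted(cells.items())]


rng = random.Random(1)
# Three clustered records share one cell; the fourth sits alone in another.
records = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.4), (5.2, 5.1)]
thinned = systematic_sample(records, cell_size=1.0, rng=rng)
```

The choice of `cell_size` controls how aggressively clusters are thinned, and therefore how many records remain for model calibration.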
Article
Monitoring for species occupancy is often carried out at local scales, reflecting specific targets, available logistics, and funding. Problematically, conservation planning and management operate at broader scales and use information inventories with good scale coverage. Translating information between local and landscape scales is commonly treated in an ad hoc manner, but conservation decision-making can benefit from quantifying spatial-knowledge relationships. Fauna occupancy monitoring, in particular, suffers from this issue of scale, as there are many different survey methods employed for different purposes. Rather than ignoring how informative these methods are when predicting regional distributions, we describe a statistical approach that identifies survey combinations that provide the greatest additive value in mammal detection across different scales. We identified minimal sets of survey methods for 53 terrestrial mammal species across a large area in Australia (New South Wales (NSW), 800,000 km²) and for each of the 18 bioregions it encompasses. Utility of survey methods varied considerably at a landscape scale. Unplanned opportunistic sightings were the single largest source of species information (35%). The utility of other survey methods varied spatially; some were retained in minimal sets for many bioregions, while others were spatially restricted or unimportant. Predator scats, Elliot and pitfall trapping, spotlighting, and diurnal herpetofauna surveys were the most frequently included survey methods at a landscape scale. Use of our approach can guide identification of efficient combinations of survey methods, maximising detection and returns for monitoring. Findings and methodologies are easily transferable and are globally applicable across any taxa. They provide guidelines for managing scarce resources for regional monitoring programs, and improving regional strategic conservation planning.
Article
Climate and land-use changes are projected to threaten biodiversity over this century. However, few studies have considered the spatial and temporal overlap of these threats to evaluate how ongoing land-use change could affect species ranges projected to shift outside conservation areas. We evaluated climate change and urban development effects on vegetation distribution in the Southwest ecoregion, California Floristic Province, USA. We also evaluated how well a conservation network protects suitable habitat for rare plant species under these change projections and identified primary sources of uncertainty. We used consensus-based maps from three species distribution models (SDMs) to project current and future suitable habitat for 19 species representing different functional types (defined by fire-response – obligate seeders, resprouting shrubs – and life forms – herbs, subshrubs), and range sizes (large/common, small/rare). We used one spatially explicit urban growth projection; two climate models, emission scenarios, and probability thresholds applied to SDMs; and high-resolution (90 m) environmental data. We projected that suitable habitat could disappear for 4 species and decrease for 15 by 2080. Averaged centroids of suitable habitat (all species) were projected to shift tens (up to hundreds) of kilometers. Herbs showed a small projected response to climate change, while obligate seeders could suffer the greatest losses. Several rare species could lose suitable habitat inside conservation areas while increasing area outside.
We concluded that (i) climate change is more important than urban development for vegetation habitat loss in this ecoregion through 2080 due to diminishing amounts of undeveloped private land in this region; (ii) the existing conservation plan, while extensive, may be inadequate to protect plant diversity under projected patterns of climate change and urban development; and (iii) regional assessments of the dynamics of the drivers of biodiversity change based on high-resolution environmental data and consensus predictive mapping, such as this study, are necessary to identify the species expected to be the most vulnerable and to meaningfully inform regional-scale conservation.
Article
Conservation managers and policy makers require models that can rank the impacts of multiple, interacting threats on biodiversity so that actions can be prioritized. An integrated modelling framework was used to predict the viability of plant populations for five species in southern California's Mediterranean-type ecosystem. The framework integrates forecasts of land-use change from an urban growth model with projections of future climatically-suitable habitat from climate and species distribution models, which are linked to a stochastic population model. The population model incorporates the effects of disturbance regimes and management actions on population viability. This framework: (1) ranks threats by their relative and cumulative impacts on population viability, such as land-use change, climate change, altered disturbance regimes or invasive species, and (2) ranks management responses in terms of their effectiveness for land protection, assisted dispersal, fire management and invasive species control. Too-frequent fire was often the top threat for the species studied, thus fire reduction was ranked the most important management option. Projected changes in suitable habitat as a result of climate change were generally large, but varied across species and climate scenarios; urban development could exacerbate loss of suitable habitat.
Article
Species distribution models (SDMs) trained on presence-only data are frequently used in ecological research and conservation planning. However, users of SDM software are faced with a variety of options, and it is not always obvious how selecting one option over another will affect model performance. Working with MaxEnt software and with tree fern presence data from New Zealand, we assessed whether (a) choosing to correct for geographical sampling bias and (b) using complex environmental response curves have strong effects on goodness of fit. SDMs were trained on tree fern data, obtained from an online biodiversity data portal, with two sources that differed in size and geographical sampling bias: a small, widely-distributed set of herbarium specimens and a large, spatially clustered set of ecological survey records. We attempted to correct for geographical sampling bias by incorporating sampling bias grids in the SDMs, created from all georeferenced vascular plants in the datasets, and explored model complexity issues by fitting a wide variety of environmental response curves (known as "feature types" in MaxEnt). In each case, goodness of fit was assessed by comparing predicted range maps with tree fern presences and absences using an independent national dataset to validate the SDMs. We found that correcting for geographical sampling bias led to major improvements in goodness of fit, but did not entirely resolve the problem: predictions made with clustered ecological data were inferior to those made with the herbarium dataset, even after sampling bias correction. We also found that the choice of feature type had negligible effects on predictive performance, indicating that simple feature types may be sufficient once sampling bias is accounted for. Our study emphasizes the importance of reducing geographical sampling bias, where possible, in datasets used to train SDMs, and the effectiveness and essentialness of sampling bias correction within MaxEnt.
Article
Correlative species distribution models (SDMs) are widely used to predict species distributions and assemblages, with many fundamental and applied uses. Different factors were shown to affect SDM prediction accuracy. However, real data cannot give unambiguous answers on these issues, and for this reason, artificial data have been increasingly used in recent years. Here, we move one step further by assessing how different factors can affect the prediction accuracy of virtual assemblages obtained by stacking individual SDM predictions (stacked SDMs, S-SDM). We modelled 100 virtual species in a real study area, testing five different factors: sample size (200-800-3200), sampling method (nested, non-nested), sampling prevalence (25%, 50%, 75% and species true prevalence), modelling technique (GAM, GLM, BRT and RF) and thresholding method (ROC, MaxTSS, and MaxKappa). We showed that the accuracy of S-SDM predictions is mostly affected by modelling technique followed by sample size. Models fitted by GAM/GLM had a higher accuracy and lower variance than BRT/RF. Model accuracy increased with sample size and a sampling strategy reflecting the true prevalence of the species was most successful. However, even with sample sizes as high as >3000 sites, residual uncertainty remained in the predictions, potentially reflecting a bias introduced by creating and/or resampling the virtual species. Therefore, when evaluating the accuracy of predictions from S-SDMs fitted with real field data, one can hardly expect reaching perfect accuracy, and reasonably high values of similarity or predictive success can already be seen as valuable predictions. We recommend the use of a ‘plot-like’ sampling method (best approximation of the species' true prevalence) and not simply increasing the number of presences-absences of species. 
As presented here, virtual simulations might be used more systematically in future studies to inform about the best accuracy level that one could expect given the characteristics of the data and the methods used to fit and stack SDMs.
Article
The discriminating capacity (i.e. ability to correctly classify presences and absences) of species distribution models (SDMs) is commonly evaluated with metrics such as the area under the receiver operating characteristic curve (AUC), the Kappa statistic and the true skill statistic (TSS). AUC and Kappa have been repeatedly criticized, but TSS has fared relatively well since its introduction, mainly because it has been considered as independent of prevalence. In addition, discrimination metrics have been contested because they should be calculated on presence–absence data, but are often used on presence‐only or presence‐background data. Here, we investigate TSS and an alternative set of metrics: similarity indices, also known as F‐measures. We first show that even in ideal conditions (i.e. perfectly random presence–absence sampling), TSS can be misleading because of its dependence on prevalence, whereas similarity/F‐measures provide adequate estimations of model discrimination capacity. Second, we show that in real‐world situations where sample prevalence is different from true species prevalence (i.e. biased sampling or presence‐pseudoabsence), no discrimination capacity metric provides adequate estimation of model discrimination capacity, including metrics specifically designed for modelling with presence‐pseudoabsence data. Our conclusions are twofold. First, they unequivocally impel SDM users to understand the potential shortcomings of discrimination metrics when quality presence–absence data are lacking, and we recommend obtaining such data. Second, in the specific case of virtual species, which are increasingly used to develop and test SDM methodologies, we strongly recommend the use of similarity/F‐measures, which were not biased by prevalence, contrary to TSS.
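The contrast drawn in this abstract can be made concrete with the standard confusion-matrix definitions: TSS = sensitivity + specificity - 1, and the Sørensen index (one of the similarity/F-measures) = 2TP / (2TP + FP + FN). The sketch below is illustrative only; the counts are invented, and it shows just one mechanism, namely that true negatives enter TSS but not Sørensen.

```python
def tss(tp, fn, fp, tn):
    """True skill statistic: sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def sorensen(tp, fn, fp):
    """Sorensen similarity index (equivalent to the F1 score); ignores true negatives."""
    return 2.0 * tp / (2.0 * tp + fp + fn)


# Same predictions for the presences (TP, FN) and the same false positives (FP);
# only the number of correctly predicted absences differs, i.e. the species is
# rarer in the second, hypothetical study area.
common_species = {"tp": 30, "fn": 10, "fp": 20, "tn": 40}
rare_species = {"tp": 30, "fn": 10, "fp": 20, "tn": 940}

tss_common = tss(**common_species)
tss_rare = tss(**rare_species)
sorensen_both = sorensen(tp=30, fn=10, fp=20)
```

Here TSS rises from about 0.42 to about 0.73 purely because more absences were added, while the Sørensen index stays at about 0.67 in both cases.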
Article
It is now widely acknowledged that the increasing availability of remotely sensed data facilitates ecological modelling. Digital elevation models (DEMs) are arguably one of the most common remote sensing products used in this context. Topographic indices (e.g. slope, orientation, rugosity) derived from DEMs are widely used as surrogates for field-measured environmental variables. Available global DEMs, such as those from the shuttle radar topography mission (SRTM), however, do not provide information on bare-earth elevation as they measure elevation of the highest objects above the ground (e.g. canopy). This affects the derived topographic indices and limits the use of global DEMs in ecological modelling. Unfortunately, most ecological studies ignore this limitation despite the fact that methods to remove the vegetation offset have been developed. We used high resolution LiDAR DTM to assess the accuracy of two newly available global bare-earth DEMs where such methods were applied and to compare them with the SRTM DEM. Furthermore, we assessed the effect of DEMs' vertical error on species distribution models (SDMs) by calculating slope and topographic wetness index (TWI) from these different models and evaluating their suitability for SDMs by adopting a virtual species approach. We simulated virtual species based on slope and TWI derived from accurate LiDAR DTM at three resolutions (30 m, 90 m and 900 m) and developed univariate generalized models to assess the performance of the bare-earth and SRTM DEMs. Our results show that the vertical error in both newly available, vegetation-corrected global DEMs is indeed successfully reduced. The overall vertical root mean squared error (RMSE) was 10.52 m for SRTM, while it was 6.80 m and 6.25 m for the two global bare-earth DEMs. The effect of the vertical error on SDMs was most significant at finer spatial resolutions. 
Using SRTM DEM, as opposed to a more accurate bare-earth DEM, led to a decline in area under curve (AUC) values from 0.94 to 0.77. SDMs fitted with slope and TWI derived from new global bare-earth DTMs performed slightly better than SRTM. Since methods for vegetation-offset removal in DEMs exist and corrected DEMs are freely available, we argue that the vertical accuracy of DEMs should be more consistently considered. Local, high-accuracy DEMs should be used where available; in remaining instances, however, global DEMs where vertical bias was minimized should be used in ecological modelling. Further improvement of global DEMs at 30 m and better resolutions are needed to enhance accuracy of derived indices and ecological models.
Article
Dryland biodiversity plays important roles in the fight against desertification and poverty, but is highly vulnerable to the impacts of environmental change. However, little research has been conducted on dual pressure from climate and land cover changes on biodiversity in arid and semi-arid environments. Consequently, it is crucial to understand the potential impacts of future climate and land cover changes on dryland biodiversity. Here, using the Chinese Altai Mountains as a case study area, we predicted the future spatial distributions and local assemblages of nine threatened mammal species under projected climate and land cover change scenarios for the period 2010-2050. The results show remarkable declines in mammal species richness as well as high rates of species turnover across large areas in the Chinese Altai Mountains, highlighting an urgent need for developing protection strategies for areas outside of the current nature reserve network. The selected mammals are predicted to lose more than 50% of their current ranges on average, which is much higher than species' range gains (around 15%) under future climate and land cover changes. Most of the species are predicted to contract their ranges while moving eastwards and to higher altitudes, raising the need for establishing cross-border migration pathways for species. Furthermore, the inclusion of land cover changes had notable effects on projected range shifts of individual species under climate changes, demonstrating that land cover changes should be incorporated into the assessment of future climate impacts to facilitate biodiversity conservation in arid and semi-arid environments.
Article
In a megadiverse country such as South Africa, plant locality data are routinely sourced from the South African National Herbarium (PRE). Evidence suggests that large areas of the country remain poorly collected and that locality records are not always adequately represented in PRE. Our aim was to assess whether distribution information obtained exclusively from PRE adequately represented the known range of selected species. We also assessed the relative value of regional herbaria and supplementary sources of locality data. Locality information was sourced from PRE, 17 regional herbaria, sight records and literature for a subset of 121 ethnomedicinal plant species that are currently regarded to be threatened with extinction or of conservation concern according to the IUCN Red List criteria. Geographic range (km²) was calculated using distribution information (Quarter-Degree Squares, QDS) obtained from PRE and non-PRE sources. The species' ranges were examined to compare the differences in range size and the overall proportion of QDS records represented in PRE and non-PRE sources. Supplementary data obtained from regional herbaria and other sources increased the number of known QDS records by ±45% per species across the various IUCN Red List threat categories, and the ranges increased by ±28% per species. As the threat status of a species increased, proportionally more QDS were likely to come from supplementary sources. Rarer species tended to be found only in herbaria within their province of occupancy. 'Return for effort' analyses indicated that QDS records should be sourced from PRE plus one other herbarium located within each province in which a species of interest occurs. QDS coverage within species' geographic ranges was under-represented using only data obtained from PRE, reducing the accuracy of species occurrences and distributions relying solely on information sourced from that repository.
We demonstrate that this can impact on the accuracy of conservation planning resources such as Red Lists. Our results highlight the relative importance of regional herbaria.
Chapter
Modelling provides an effective means of integrating the complementary strengths of biodiversity data derived from in situ observation versus remote sensing. The use of modelling in biodiversity change observation, or monitoring, is just one of a number of roles that modelling can play in biodiversity assessment. These roles place different levels of emphasis on explanatory versus predictive modelling, and on modelling across space alone, versus across both space and time, either past-to-present or present-to-future. One of the most challenging, yet vitally important, applications of modelling to biodiversity monitoring involves mapping change in the distribution and retention of terrestrial biodiversity. Unlike many structural and functional attributes of ecosystems, most biological entities at the species and genetic levels of biodiversity cannot be readily detected through remote sensing. Estimating change in these levels of biodiversity across large spatial extents is therefore benefiting from advances in both species-level and community-level approaches to model-based integration of in situ biological observations and remotely sensed environmental data.
Article
Studies often use breeding bird atlases to assess species’ habitat requirements or to estimate species’ potential distribution under environmental changes. In breeding bird atlases, one of the attributes recorded for each grid square is evidence of breeding. The attribute represents the probability of breeding (confirmed, probable, possible), categorized according to breeding behaviour. However, the majority of studies make arbitrary decisions on which category to use. This may have severe consequences for results. This study evaluated whether models’ discrimination ability changes with the inclusion of ambiguous breeding categories (probable, possible). We compared models’ predictions for the distribution of nine wetland birds derived from the Atlas of the breeding distribution of birds in the Czech Republic. For each species, we developed generalized linear models using combinations of the breeding categories as input to model calibration and validation. Our results show that the discrimination ability (AUC) decreased in most cases when all breeding categories were uncritically used in the calibration and validation process. On the other hand, inclusion of probable and possible breeding categories in model calibration did not affect models’ abilities to predict confirmed presences and absences. This implies that inclusion of ambiguous breeding categories has a more serious impact on models’ performance when added to the validation than to the calibration data set. We advocate for more rigorous use of different breeding categories and emphasize that widely used atlases from citizen science programmes offer more than simple occurrence data. Additional attributes (e.g. breeding category, sampling effort) should be used to select high quality data to validate the models.
Article
Species distribution models (SDMs) are often calibrated using presence-only datasets plagued with environmental sampling bias, which leads to a decrease of model accuracy. In order to compensate for this bias, it has been suggested that background data (or pseudoabsences) should represent the area that has been sampled. However, spatially-explicit knowledge of sampling effort is rarely available. In multi-species studies, sampling effort has been inferred following the target-group (TG) approach, where aggregated occurrence of TG species informs the selection of background data. However, little is known about the species-specific response to this type of bias correction.
Article
We aimed to understand how the integration of contextual spatial data on land cover and human infrastructure can help reduce spatial bias in sampling effort, and improve the utilization of citizen science-based species recording schemes. By comparing four different citizen science projects, we explore how the sampling design's complexity affects the role of these spatial biases. The study area was Denmark, Europe. We used a point process model to estimate the effect of land cover and human infrastructure on the intensity of observations from four different citizen science species recording schemes. We then used these results to predict areas of under- and oversampling as well as relative biodiversity ‘hotspots’ and ‘deserts’, accounting for common spatial biases introduced in unstructured sampling designs. We demonstrate that the explanatory power of spatial biases such as infrastructure and human population density increased as the complexity of the sampling schemes decreased. Despite a low absolute sampling effort in agricultural landscapes, these areas still appeared oversampled compared to the observed species richness. Conversely, forests and grassland appeared undersampled despite higher absolute sampling efforts. We also present a novel and effective analytical approach to address spatial biases in unstructured sampling schemes and a new way to address such biases, when more structured sampling is not an option. We show that citizen science datasets, which rely on untrained amateurs, are more heavily prone to spatial biases from infrastructure and human population density. Objectives and protocols of mass-participating projects should thus be designed with this in mind. Our results suggest that, where contextual data are available, modelling the intensity of individual observations can help understand and quantify how spatial biases affect the observed biological patterns.
Article
Species distribution models (SDMs) are one of the most important GIScience research areas in biogeography and are the primary means by which the potential effects of climate change on species’ distributions and ranges are investigated. Dispersal is an important ecological process for species responding to changing climates, however, SDMs and their subsequent spatial products rarely reflect accessibility to any future suitable environment. Dispersal-related movement can be confounded by factors that vary across landscapes and climates, as well as within and among species, and it has therefore remained difficult to parametrise in SDMs. Here we compared 20 models that have previously been used (or have the potential to be used) to represent dispersal processes in SDM to predict future range shifts in response to climate change. We assessed the different dispersal models in terms of their accuracy at predicting future distributions, as well as the uncertainty associated with their predictions. Atlas data for 50 bird species from 1988 to 1991 in Great Britain were treated as base distributions (t1), with the species–environment relationships extrapolated (using three commonly used statistical methods) to 2008–2011 (t2). Dispersal (in the form of the 20 different models) was simulated from the base distribution (t1) to 2008–2011 (t2). The results were then combined and used to identify locations that were both abiotically suitable (obtained from the statistical methods) and accessible (obtained from the dispersal models). The accuracy of these coupled projections was assessed with the 2008–2011 atlas data (the observed t2 distribution). There was substantial variation in the accuracy of the different dispersal models, and in general, the more restrictive dispersal models (e.g. fixed rate dispersal) resulted in lower accuracy for the metrics which reward correct prediction of presences. 
Ensemble models of the dispersal methods (generated by combining multiple projection outcomes) were created for each species, and a new Ensemble Agreement Index (EAI), which ranges from 0 (no agreement among models) to 1 (full agreement among models) was developed to quantify uncertainty among the projections. EAI values ranged from 0.634 (some areas of disagreement and therefore medium uncertainty among dispersal models) to 0.999 (large areas of agreement and low uncertainty among dispersal models). The results of this research highlight the importance of incorporating dispersal and also illustrate that the method with which dispersal is simulated greatly impacts the projected future distribution. This has important implications for studies aimed at predicting the effects of changing environmental conditions on species’ distributions.
Article
Biological recording is in essence a very simple concept in which a record is the report of a species at a physical location at a certain time. The collation of these records into a dataset is a powerful approach to addressing large-scale questions about biodiversity change. Records are collected by volunteers at times and places that suit them, leading to a variety of biases: uneven sampling over space and time, uneven sampling effort per visit and uneven detectability. These need to be controlled for in statistical analyses that use biological records. In particular, the data are ‘presence-only’, and lack information on the sampling protocol or intensity. Submitting ‘complete lists’ of all the species seen is one potential solution because the data can be treated as ‘presence–absence’ and detectability of each species can be statistically modelled. The corollary of bias is that records vary in their ‘information content’. The information content is a measure of how much an individual record, or collection of records, contributes to reducing uncertainty in a parameter of interest. The information content of biological records varies, depending on the question to which the data are being applied. We consider a set of hypothetical ‘syndromes’ of recording behaviour, each of which is characterized by different information content. We demonstrate how these concepts can be used to support the growth of a particular type of recording behaviour. Approaches to recording are rapidly changing, especially with the growth of mass participation citizen science. We discuss how these developments present a range of challenges and opportunities for biological recording in the future. © 2015 The Linnean Society of London, Biological Journal of the Linnean Society, 2015.
Article
Species distribution models (SDMs) have become a dominant paradigm for quantifying species-environment relationships, and both the models and their outcomes have seen widespread use in conservation studies, particularly in the context of climate change research. With the growing interest in SDMs, extensive comparative studies have been undertaken. However, few generalizations and recommendations have resulted from these empirical studies, largely due to the confounding effects of differences in and interactions among the statistical methods, species traits, data characteristics, and accuracy metrics considered. This progress report addresses ‘virtual species distribution models’: the use of spatially explicit simulated data to represent a ‘true’ species distribution in order to evaluate aspects of model conceptualization and implementation. Simulating a ‘true’ species distribution, or a virtual species distribution, and systematically testing how these aspects affect SDMs, can provide an important baseline and generate new insights into how these issues affect model outcomes.
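The idea of simulating a ‘true’ distribution can be illustrated with a minimal sketch. It assumes a single environmental gradient, a Gaussian response curve, and a probabilistic conversion of suitability to presence/absence; these choices, and every name and parameter value below, are illustrative assumptions rather than the procedure of any study listed here.

```python
import math
import random


def gaussian_suitability(env_value, optimum, tolerance):
    """Suitability in [0, 1] from a Gaussian response to one gradient (assumed shape)."""
    return math.exp(-((env_value - optimum) ** 2) / (2 * tolerance ** 2))


def simulate_virtual_species(env_grid, optimum, tolerance, seed=0):
    """Return (suitability, presence) lists, one entry per cell of env_grid."""
    rng = random.Random(seed)  # fixed seed so the 'true' distribution is reproducible
    suitability = [gaussian_suitability(v, optimum, tolerance) for v in env_grid]
    # Probabilistic conversion: a cell is occupied with probability equal to its suitability.
    presence = [1 if rng.random() < s else 0 for s in suitability]
    return suitability, presence


# Hypothetical gradient (e.g. mean temperature) sampled at 301 cells from 0 to 30.
env_grid = [i * 0.1 for i in range(301)]
suitability, presence = simulate_virtual_species(env_grid, optimum=15.0, tolerance=3.0)

# Prevalence: the fraction of cells occupied by the virtual species.
prevalence = sum(presence) / len(presence)
print(f"prevalence = {prevalence:.3f}")
```

Because the generating rule is known, any model fitted to samples of `presence` can be scored against `suitability` directly. A smaller `tolerance` gives a more abrupt response and a lower prevalence.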
Article
Species distribution models (SDMs) are used to inform a range of ecological, biogeographical and conservation applications. However, users often underestimate the strong links between data type, model output and suitability for end-use. We synthesize current knowledge and provide a simple framework that summarizes how interactions between data type and the sampling process (i.e. imperfect detection and sampling bias) determine the quantity that is estimated by a SDM. We then draw upon the published literature and simulations to illustrate and evaluate the information needs of the most common ecological, biogeographical and conservation applications of SDM outputs. We find that, while predictions of models fitted to the most commonly available observational data (presence records) suffice for some applications, others require estimates of occurrence probabilities, which are unattainable without reliable absence records. Our literature review and simulations reveal that, while converting continuous SDM outputs into categories of assumed presence or absence is common practice, it is seldom clearly justified by the application's objective and it usually degrades inference. Matching SDMs to the needs of particular applications is critical to avoid poor scientific inference and management outcomes. This paper aims to help modellers and users assess whether their intended SDM outputs are indeed fit for purpose.
Article
A wide range of modelling algorithms is used by ecologists, conservation practitioners, and others to predict species ranges from point locality data. Unfortunately, the amount of data available is limited for many taxa and regions, making it essential to quantify the sensitivity of these algorithms to sample size. This is the first study to address this need by rigorously evaluating a broad suite of algorithms with independent presence-absence data from multiple species and regions. We evaluated predictions from 12 algorithms for 46 species (from six different regions of the world) at three sample sizes (100, 30, and 10 records). We used data from natural history collections to run the models, and evaluated the quality of model predictions with area under the receiver operating characteristic curve (AUC). With decreasing sample size, model accuracy decreased and variability increased across species and between models. Novel modelling methods that incorporate both interactions between predictor variables and complex response shapes (i.e. GBM, MARS-INT, BRUTO) performed better than most methods at large sample sizes but not at the smallest sample sizes. Other algorithms were much less sensitive to sample size, including an algorithm based on maximum entropy (MAXENT) that had among the best predictive power across all sample sizes. Relative to other algorithms, a distance metric algorithm (DOMAIN) and a genetic algorithm (OM-GARP) had intermediate performance at the largest sample size and among the best performance at the lowest sample size. No algorithm predicted consistently well with small sample size (n < 30) and this should encourage highly conservative use of predictions based on small sample size and restrict their use to exploratory modelling.
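For readers unfamiliar with the evaluation metric, AUC can be computed as the probability that a randomly chosen presence site receives a higher predicted score than a randomly chosen absence site, with ties counted as one half. The sketch below uses this rank-based definition on made-up scores; the function name and the data are illustrative assumptions, not taken from the study.

```python
def auc(scores_presence, scores_absence):
    """Rank-based AUC: share of presence/absence pairs ranked correctly (ties = 0.5)."""
    if not scores_presence or not scores_absence:
        raise ValueError("need at least one presence and one absence score")
    wins = 0.0
    for p in scores_presence:
        for a in scores_absence:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(scores_presence) * len(scores_absence))


# Hypothetical model predictions at independent evaluation sites.
presence_scores = [0.9, 0.4]
absence_scores = [0.5, 0.1]
print(auc(presence_scores, absence_scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect discrimination. The pairwise loop is quadratic in the number of sites, which is acceptable for small evaluation sets such as those in this illustration.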
Article
The following is a report on software developed to create virtual species for the study of species distribution modelling (SDM). SDMvspecies provides several methods to create virtual species. The package is designed to be simple and intuitive, even for users who are not familiar with the R language. SDMvspecies is available online free of charge from <http://cran.r-project.org/web/packages/sdmvspecies/>.
Article
Aim: Species distribution models (SDM) can be used to predict the location of unknown populations from known species occurrences. It follows that how the data used to calibrate the models are collected can have a great impact on prediction success. We evaluated the influence of different survey designs and their interaction with the modelling technique on SDM performance. Location: Iberian Peninsula. Methods: We examine how data recorded using seven alternative survey designs (random, systematic, environmentally stratified by class and environmentally stratified using P-median, biased due to accessibility, biased by human density aggregation and biased towards protected areas) could affect SDM predictions generated with nine modelling techniques (BIOCLIM, Gower distance, Mahalanobis distance, Euclidean distance, GLM, MaxEnt, ENFA and Random Forest). We also study how sample size, species’ characteristics and modelling technique affected SDM predictive ability, using six evaluation metrics. Results: Survey design has a small effect on prediction success. Characteristics of species’ ranges rank highest among the factors affecting SDM results: the species with lower relative occurrence area (ROA) are predicted better. Model predictions are also improved when sample size is large. Main conclusions: The species modelled – particularly the extent of its distribution – are the largest source of influence over SDM results. The environmental coverage of the surveys is more important than the spatial structure of the calibration data. Therefore, climatic biases in the data should be identified to avoid erroneous conclusions about the geographic patterns of species distributions.
Article
1. Species distribution models are increasingly used to address conservation questions, so their predictive capacity requires careful evaluation. Previous studies have shown how individual factors used in model construction can affect prediction. Although some factors probably have negligible effects compared to others, their relative effects are largely unknown. 2. We introduce a general “virtual ecologist” framework to study the relative importance of factors involved in the construction of species distribution models. 3. We illustrate the framework by examining the relative importance of five key factors (a missing covariate, spatial autocorrelation due to a dispersal process in presences/absences, sample size, sampling design and modeling technique) in a real study framework based on virtual plants in a mountain landscape at regional scale, and show that, for the parameter values considered here, most of the variation in prediction accuracy is due to sample size and modeling technique. Contrary to repeatedly reported concerns, spatial autocorrelation has only comparatively small effects. 4. This study shows the importance of using a nested statistical framework to evaluate the relative effects of factors that may affect species distribution models.
Article
Metapopulation biology is concerned with the dynamic consequences of migration among local populations and the conditions of regional persistence of species with unstable local populations. Well established effects of habitat patch area and isolation on migration, colonization and population extinction have now become integrated with classic metapopulation dynamics. This has led to models that can be used to predict the movement patterns of individuals, the dynamics of species, and the distributional patterns in multispecies communities in real fragmented landscapes.
Article
Ecological niche models represent key tools in biogeography but the effects of biased sampling hinder their use. Here, we address the utility of two forms of filtering the calibration data set (geographic and environmental) to reduce the effects of sampling bias. To do so we created a virtual species, projected its niche to the Iberian Peninsula and took samples from its binary geographic distribution using several biases. We then built models for various sample sizes after applying each of the filtering approaches. While geographic filtering did not improve discriminatory ability (and sometimes worsened it), environmental filtering consistently led to better models. Models made with few but climatically filtered points performed better than those made with many unfiltered (biased) points. Future research should address additional factors such as the complexity of the species’ niche, strength of filtering, and ability to predict suitability (rather than focus purely on discrimination).
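One common way to filter occurrences in environmental space is to divide that space into a regular grid and keep a single record per occupied cell. The sketch below assumes this formulation; the abstract does not specify the exact procedure used, and the bin widths, variable choices, and records are hypothetical.

```python
import math


def environmental_filter(records, bin_widths):
    """Keep the first record falling in each cell of a regular grid in environmental space.

    records    -- list of tuples of environmental values, one tuple per occurrence
    bin_widths -- cell size along each environmental axis (assumed, user-chosen)
    """
    kept = []
    seen_cells = set()
    for rec in records:
        # Index of the environmental-space cell that this record falls into.
        cell = tuple(math.floor(value / width) for value, width in zip(rec, bin_widths))
        if cell not in seen_cells:
            seen_cells.add(cell)
            kept.append(rec)
    return kept


# Hypothetical occurrences as (mean temperature in deg C, annual precipitation in mm).
occurrences = [(10.1, 800), (10.4, 820), (10.3, 790), (14.9, 400), (15.0, 410)]
filtered = environmental_filter(occurrences, bin_widths=(1.0, 100.0))
print(filtered)  # the second record shares a cell with the first and is dropped
```

Wider bins discard more records, so the strength of filtering trades directly against sample size.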
Article
Given species' vulnerability to climate change, land use change, and habitat loss, it is pertinent to examine how the distribution of a particular species is related to those factors. We assessed the use of climate, habitat, and topography data for modeling the distributions of 14 central European wetland birds, and compared the relative importance of these factors among bird groups with differing latitudinal distributions in Europe. We used the Third Atlas of Breeding Birds in the Czech Republic as a source of species distribution data. Variables were derived from Corine Land Cover, WorldClim, and Shuttle Radar Topography Mission (SRTM) data. Hierarchical partitioning and multiple logistic models identified climatic, topographical, and habitat predictors as important determinants of distribution for each of the species under study. However, the relative contributions of particular variables differed among the species. Climatic, topographical, and habitat factor groups also differed in their importance to latitudinal species groups. Our results indicated that wetland birds with range margins close to the Czech Republic were potentially limited by two different factors: climate conditions impact the southerly distributed species and the availability of suitable habitat affects the northerly distributed species. The accuracy of the study models varied from fair to high (area under the curve values ranged from 0.60 to 0.89) and was negatively correlated with the relative occurrence area. In this study, we propose that any difference in model performance is more attributable to data characteristics than to a species' geographical characteristics.
Article
Species distribution models (SDMs) are an important tool in biogeography and ecology and are widely used for both fundamental and applied research purposes. SDMs require spatially explicit information about species occurrence and environmental covariates to produce a set of rules that identify and scale the environmental space where the species was observed and that can further be used to predict the suitability of a site for the species. More spatially accurate data are increasingly available, and the number of publications on the influence of spatial inaccuracies on the performance of modelling procedures is growing exponentially. Three main sources of uncertainty are associated with the three elements of a predictive function: the dependent variable, the explanatory variables and the algorithm or function used to relate these two variables. In this study, we review how spatial uncertainties influence model accuracy and we propose some methodological issues in the application of SDMs with regard to the modelling of fundamental and realized niches of species. We distinguish two cases suitable for different types of spatial data accuracy. For modelling the realized distribution of a species, particularly for management and conservation purposes, we suggest using only accurate species occurrence data and large sample sizes. Appropriate data filtering and examination of the spatial autocorrelation in predictors should be a routine procedure to minimize the possible influence of positional uncertainty in species occurrence data. However, if the data are sparse, models of the potential distribution of species can be created using a relatively small sample size, and this can provide a generalized indication of the main regional drivers of the distribution patterns. 
By this means, field surveys can be targeted to discover unknown populations and species in poorly surveyed regions in order to improve the robustness of the data for later modelling of the realized distributions. Based on this review, we conclude that (1) with data that are currently available, studies performed at a resolution of 1–100 km² are useful for hypothesizing about the environmental conditions that limit the distribution of a species and (2) incorporating coarse resolution species occurrence data in a model, despite an increase in sample size, lowers model performance.
Article
With the rise of new powerful statistical techniques and GIS tools, the development of predictive habitat distribution models has rapidly increased in ecology. Such models are static and probabilistic in nature, since they statistically relate the geographical distribution of species or communities to their present environment. A wide array of models has been developed to cover aspects as diverse as biogeography, conservation biology, climate change research, and habitat or species management. In this paper, we present a review of predictive habitat distribution modeling. The variety of statistical techniques used is growing. Ordinary multiple regression and its generalized form (GLM) are very popular and are often used for modeling species distributions. Other methods include neural networks, ordination and classification methods, Bayesian models, locally weighted approaches (e.g. GAM), environmental envelopes or even combinations of these models. The selection of an appropriate method should not depend solely on statistical considerations. Some models are better suited to reflect theoretical findings on the shape and nature of the species’ response (or realized niche). Conceptual considerations include e.g. the trade-off between optimizing accuracy versus optimizing generality. In the field of static distribution modeling, the latter is mostly related to selecting appropriate predictor variables and to designing an appropriate procedure for model selection. New methods, including threshold-independent measures (e.g. receiver operating characteristic (ROC)-plots) and resampling techniques (e.g. bootstrap, cross-validation) have been introduced in ecology for testing the accuracy of predictive models. The choice of an evaluation measure should be driven primarily by the goals of the study. This may possibly lead to the attribution of different weights to the various types of prediction errors (e.g. omission, commission or confusion). 
Testing the model in a wider range of situations (in space and time) will permit one to define the range of applications for which the model predictions are suitable. In turn, the qualification of the model depends primarily on the goals of the study that define the qualification criteria and on the usability of the model, rather than on statistics alone.