Article

Paintings predict the distribution of species, or the challenge of selecting environmental predictors and evaluation statistics

Authors:
  • LEHNA University of Lyon 1 ; University of Angers,
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Aim: Species distribution modelling, a family of statistical methods that predicts species distributions from a set of occurrences and environmental predictors, is now routinely applied in many macroecological studies. However, the reliability of evaluation metrics usually employed to validate these models remains questioned. Moreover, the emergence of online databases of environmental variables with global coverage, especially climatic, has favoured the use of the same set of standard predictors. Unfortunately, the selection of variables is too rarely based on a careful examination of the species' ecology. In this context, our aim was to highlight the importance of selecting ad hoc variables in species distribution models, and to assess the ability of classical evaluation statistics to identify models with no biological realism.

Innovation: First, we reviewed the current practices in the field of species distribution modelling in terms of variable selection and model evaluation. Then, we computed distribution models of 509 European species using pseudo‐predictors derived from paintings or using a real set of climatic and topographic predictors. We calculated model performance based on the area under the receiver operating curve (AUC) and true skill statistics (TSS), partitioning occurrences into training and test data with different levels of spatial independence. Most models computed from pseudo‐predictors were classified as good and sometimes were even better evaluated than models computed using real environmental variables. However, on average they were better discriminated when the partitioning of occurrences allowed testing for model transferability.

Main conclusions: These findings confirm the crucial importance of variable selection and the inability of current evaluation metrics to assess the biological significance of distribution models.
We recommend that researchers carefully select variables according to the species' ecology and evaluate models only according to their capacity to be transferred to distant areas. Nevertheless, statistics of model evaluations must still be interpreted with great caution.
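
The two evaluation statistics named in the abstract can be illustrated with a minimal sketch. It assumes the model returns a suitability score per site and that test presences and background (or absence) sites are already separated; the function names, toy scores, and the 0.5 threshold are hypothetical and are not taken from the article.

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a presence outranks an absence (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def tss(scores_pos, scores_neg, threshold):
    """TSS = sensitivity + specificity - 1 at a given suitability threshold."""
    sensitivity = sum(s >= threshold for s in scores_pos) / len(scores_pos)
    specificity = sum(s < threshold for s in scores_neg) / len(scores_neg)
    return sensitivity + specificity - 1.0


# Hypothetical suitability scores for held-out presences and background sites.
presence_scores = [0.9, 0.8, 0.6, 0.4]
background_scores = [0.7, 0.3, 0.2, 0.1]

print(auc(presence_scores, background_scores))       # rank-based AUC
print(tss(presence_scores, background_scores, 0.5))  # TSS at an assumed threshold
```

Both statistics only measure how well scores separate the test presences from the test background, which is why a model built on meaningless predictors can still score well when test and training data are spatially close.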


... Additionally, they provide critical insights into ecological niche understanding [Sillero et al., 2021]. These models correlate observations of species occurrence with recorded environmental variables [Elith and Leathwick, 2009], often focusing on abiotic factors, such as temperature, precipitation, and soil properties [Fourcade et al., 2018], and sometimes incorporating biotic factors, such as vegetation cover and species interactions [Wisz et al., 2013]. The selection of which abiotic and biotic variables to include in SDMs is critical, as the modeled outcome can vary depending on the choice of predictors [Araújo and Guisan, 2006, Austin and Van Niel, 2011, Peterson et al., 2011, Sillero et al., 2021]. ...
... However, their availability is not always consistent, and traditional SDMs such as Maxent, generalized linear models (GLMs), or decision tree-based approaches [Valavi et al., 2022], often struggle with collinearity among predictors, particularly when occurrence data are scarce [Braunisch et al., 2013, Ashcroft et al., 2011]. Consequently, the number of predictors is frequently reduced, often oversimplifying the ecological processes being modeled [Fourcade et al., 2018, Cobos et al., 2019]. ...
... First, SDMs should provide the flexibility to select predictors at inference that are deemed most relevant to a specific task and target species. The applications and research questions for SDMs are numerous, each requiring a different set of predictors to be fed into the model [Araújo and Guisan, 2006, Williams et al., 2012, Mod et al., 2016, Fourcade et al., 2018]. For example, estimating the current range of a species requires incorporating human influence data along with environmental variables, as anthropogenic pressures significantly affect habitat suitability [Frans and Liu, 2024]. ...
Preprint
Full-text available
Species Distribution Models (SDMs) play a vital role in biodiversity research, conservation planning, and ecological niche modeling by predicting species distributions based on environmental conditions. The selection of predictors is crucial, strongly impacting both model accuracy and how well the predictions reflect ecological patterns. To ensure meaningful insights, input variables must be carefully chosen to match the study objectives and the ecological requirements of the target species. However, existing SDMs, including both traditional and deep learning-based approaches, often lack key capabilities for variable selection: (i) flexibility to choose relevant predictors at inference without retraining; (ii) robustness to handle missing predictor values without compromising accuracy; and (iii) explainability to interpret and accurately quantify each predictor's contribution. To overcome these limitations, we introduce MaskSDM, a novel deep learning-based SDM that enables flexible predictor selection by employing a masked training strategy. This approach allows the model to make predictions with arbitrary subsets of input variables while remaining robust to missing data. It also provides a clearer understanding of how adding or removing a given predictor affects model performance and predictions. Additionally, MaskSDM leverages Shapley values for precise predictor contribution assessments, improving upon traditional approximations. We evaluate MaskSDM on the global sPlotOpen dataset, modeling the distributions of 12,738 plant species. Our results show that MaskSDM outperforms imputation-based methods and approximates models trained on specific subsets of variables. These findings underscore MaskSDM's potential to increase the applicability and adoption of SDMs, laying the groundwork for developing foundation models in SDMs that can be readily applied to diverse ecological applications.
... regularisation multipliers and feature classes) can have a significant influence on resulting MaxEnt model outputs and inferences drawn (Warren and Seifert, 2011; Webber et al., 2011; Shcheglovitova and Anderson, 2013; Boria et al., 2017; Sutton and Martin, 2022). Despite its importance in the model building process, covariate selection methods have received considerably little attention to date (but see Austin and Van Niel, 2011; Fourcade et al., 2018; Low et al., 2021; Adde et al., 2023), whereby covariate selection refers to "identify[ing] the best subset of covariates out of a panel of many candidates, both from an ecological and statistical perspective" (see Adde et al., 2023, and references therein). ...
... RIDGE and LASSO regression (Guisan et al., 2002; Guyon and Elisseeff, 2003; Saeys et al., 2007; Adde et al., 2023). For a more detailed overview of the different covariate selection methods, readers are directed to several recent reviews on this topic (Fois et al., 2018; Fourcade et al., 2018; Melo-Merino et al., 2020). While there are guidelines on how to implement these methods individually, there is considerable variation in parameter recommendations. ...
... A growing number of species distribution modelling studies have shown that covariate selection and model parameter choice can affect predictive output to varying degrees, depending on both the quality of the input data and the biological and ecological context of the focal taxon/taxa (e.g. niche conservatism) (Araújo and Guisan, 2006; Austin and Van Niel, 2011; Petitpierre et al., 2017; Fourcade et al., 2018; Adde et al., 2023). The present work used three sets of independent testing data for the insect citrus pest, Diaphorina citri, to explore the effects of covariate selection on MaxEnt models by applying multiple reduction methods to combinations of starting climatic covariates. ...
Article
Full-text available
The performance and transferability of species distribution models (SDMs) depend on a number of ecological, biological, and methodological factors. There is a growing body of literature that explores how the choice of climate covariate combinations and model parameters can affect predictive performance, but relatively few studies delve into covariate reduction methods and the optimisation of model parameters, and the resulting spatial and temporal transferability of those models. The present work used the citrus pest, Diaphorina citri Kuwayama (Hemiptera: Psyllidae), to illustrate how MaxEnt models trained on the insect's native range in Asia varied in their predictions of climatic suitability across the introduced range when eight different covariate reduction methods were applied during model building. Additionally, it showed how model sensitivity varied across these different covariate combinations using three sets of independently validated occurrence points in the invaded range of the psyllid. Climatically suitable areas for D. citri differed by as much as twofold between the best and worst-performing models in selected areas. Great care should be taken in the selection of the highest-performing predictor combinations and model parameter settings for SDMs, particularly in the case of invasive species where the assumption of environmental equilibrium is likely violated in the introduced range. Understanding how the predictive ability of SDMs can be influenced by the methodological choices made during the model building phase is vital to ensuring that ecological and invasion management programmes do not over- or underestimate climatic suitability and subsequent invasion risk.
... To avoid nonindependence between training and testing data, we implemented the ENMeval 'block' approach, which partitions occurrences according to their longitude and latitude, as recommended by Radosavljevic and Anderson (2014). The result is four nonoverlapping geographical intervals of equal numbers of occurrences, corresponding to each corner of the geographical space (Fourcade et al. 2018). In this approach, the background points were also split following the same spatial partitions. ...
... Then, in each modeling step, the model was trained without background points located in the same area as the test points. Therefore, the block method provides the best spatial independence between the training and testing datasets that can be obtained from partitioning a single dataset (Fourcade et al. 2018). ...
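
The block partitioning described in these excerpts can be sketched as follows. This is a simplified illustration, not ENMeval's implementation: it assumes points are given as (longitude, latitude) pairs, splits them at the median longitude and then splits each half at its own median latitude. The function names are hypothetical.

```python
def block_partition(points):
    """Assign each (lon, lat) point to one of four spatial blocks (0-3).

    Points are split at the median longitude, then each half is split at
    its own median latitude, so the four blocks hold near-equal counts.
    """
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    half = len(order) // 2
    blocks = [None] * len(points)
    for side, idxs in enumerate((order[:half], order[half:])):
        by_lat = sorted(idxs, key=lambda i: points[i][1])
        mid = len(by_lat) // 2
        for i in by_lat[:mid]:
            blocks[i] = 2 * side      # southern block of this half
        for i in by_lat[mid:]:
            blocks[i] = 2 * side + 1  # northern block of this half
    return blocks


def spatial_folds(points):
    """Yield (train_indices, test_indices) with one block held out per fold."""
    blocks = block_partition(points)
    for held_out in range(4):
        test = [i for i, b in enumerate(blocks) if b == held_out]
        train = [i for i, b in enumerate(blocks) if b != held_out]
        yield train, test


# Hypothetical occurrence coordinates (lon, lat).
occurrences = [(0, 0), (1, 5), (2, 1), (3, 6), (4, 0), (5, 7), (6, 2), (7, 8)]
for train, test in spatial_folds(occurrences):
    print("train:", train, "test:", test)
```

Background points would be assigned to blocks with the same cut lines, so that each model is trained without background from the held-out area, as the excerpt above describes.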
... Consequently, this method quantifies the ability of models to extrapolate their predictions into new areas (Fourcade et al. 2018). For model selection, the Akaike information criterion corrected for small sample sizes (AICc) was used (Burnham and Anderson 2002). ...
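
For reference, the small-sample correction in AICc can be written as a short function. This is the general formula only; how the log-likelihood and the parameter count are obtained for a given SDM algorithm is not specified in the excerpt, so the arguments below are assumed inputs and the numbers are hypothetical.

```python
def aicc(log_likelihood, k, n):
    """AICc = -2 ln(L) + 2k + 2k(k + 1) / (n - k - 1).

    log_likelihood: maximised log-likelihood of the model
    k: number of estimated parameters
    n: number of observations (e.g. occurrence records)
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


# Hypothetical values: the candidate with the lower AICc would be preferred.
print(aicc(log_likelihood=-100.0, k=3, n=50))
print(aicc(log_likelihood=-98.0, k=12, n=50))
```

The correction term grows quickly as k approaches n, which penalises heavily parameterised models fitted to few occurrences.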
Article
Full-text available
Anthropogenic climate change significantly impacts ecosystem health, biodiversity, and the life cycle and distribution of aquatic macrophytes. Mexican aquatic habitats for macrophytes are particularly vulnerable, with their degradation posing severe ecological risks for freshwater, wetland, and terrestrial ecosystems. This study analyzed the current and future distributions of Sagittaria latifolia and S. macrophylla, two crucial aquatic plant species in Mexico. Species distribution models (SDM) were used, incorporating bioclimatic and topographic variables, with projections for 2041-2060 and 2061-2080 using three Global Circulation Models. Niche overlap was also assessed. The Trans-Mexican Volcanic Belt emerged as a significant region for both species. We observed substantial variability among climate models. For S. latifolia, gains ranged from 1.708% (CNRM-CM6-1 model) to 74.806% (HadGEM3-GC31-LL model) for 2041-2060, while the highest loss was 44.11% (MPI-ESM1-2-HR model). Similarly, S. macrophylla showed gains up to 73.591% (MPI-ESM1-2-HR) and losses up to 19.734% (CNRM-CM6-1). These results highlight species-specific responses to future climate scenarios. Niche overlap analyses revealed that both species currently share up to 41% of their niches, with this overlap likely to continue in the future. This study provides insights into the potential impacts of climate change on species distributions, informing conservation and management strategies. Given S. latifolia's native status and S. macrophylla's endemic and threatened nature, understanding their distribution dynamics is crucial for conservation efforts. This research underscores the need to address climatic threats to ensure the survival of these key species and maintain the health of Mexican aquatic ecosystems.
... AUC is a common metric to assess SDM accuracy, with values > 0.75 suggesting the model provides good discrimination between locations where the species is present and where it is absent (Elith et al. 2006). Because of the importance of using multiple metrics for SDM evaluation (Fourcade et al. 2017), predictive performance was evaluated using the AUC and TSS metrics on three training and testing dataset combinations: (a) the full dataset tested, (b) k-fold cross-validation with a 75%/25% training/testing data split over each of five folds, and (c) "Leave One Out" cross-validation in which a year of data was iteratively left out from training and retained for testing. To calculate AUC and TSS for the GAM density model, we used the sensitivity-specificity sum maximization approach (Liu et al. 2005) to obtain thresholds for species presence. ...
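
The sensitivity-specificity sum maximization mentioned above can be sketched as a search over candidate thresholds. This is an illustrative version under stated assumptions: scores are continuous model outputs, each observed score is tried as a threshold, and ties are resolved in favour of the lowest threshold. The function name and toy data are hypothetical.

```python
def max_sss_threshold(scores_pos, scores_neg):
    """Return (threshold, tss) maximising sensitivity + specificity.

    A site is predicted present when its score is >= threshold.
    """
    candidates = sorted(set(scores_pos) | set(scores_neg))
    best_t, best_sum = None, -1.0
    for t in candidates:
        sens = sum(s >= t for s in scores_pos) / len(scores_pos)
        spec = sum(s < t for s in scores_neg) / len(scores_neg)
        if sens + spec > best_sum:  # strict ">" keeps the lowest tied threshold
            best_t, best_sum = t, sens + spec
    return best_t, best_sum - 1.0


# Hypothetical predicted values at presence and absence sites.
pos = [0.9, 0.8, 0.6, 0.4]
neg = [0.7, 0.3, 0.2, 0.1]
threshold, best_tss = max_sss_threshold(pos, neg)
print(threshold, best_tss)
```

Because the threshold is chosen to maximise the same quantity that TSS reports, a TSS computed this way on the data used for the search is an optimistic estimate.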
... Nonetheless, since the main objective was the comparison of model performances of a variety of modeling techniques, the different survey coverage is not strictly relevant. [Table fragment: High; ideal for zero-inflated data; more information about the population (distribution and aggregation); requires knowledge of data distribution; more models necessary.] Both threshold-independent (AUC) and threshold-dependent (TSS) metrics can be misleading when the proportion of the study area occupied by the species is low (Fourcade et al. 2017; Somodi et al. 2017). Our findings align with previous research indicating that AUC alone may not be a reliable indicator of species distribution model (SDM) predictive performance, as it does not account for the spatial distribution of model errors (Lobo et al. 2008). ...
Article
Full-text available
Understanding the habitat of highly migratory species is aided by using species distribution models to identify species‐habitat relationships and to inform conservation and management plans. While Generalized Additive Models (GAMs) are commonly used in ecology, and particularly the habitat modeling of marine mammals, there remains a debate between modeling habitat (presence/absence) versus density (# individuals). Our study assesses the performance and predictive capabilities of GAMs compared to boosted regression trees (BRTs) for modeling both fin whale density and habitat suitability alongside Hurdle Models treating presence/absence and density as a two‐stage process to address the challenge of zero‐inflated data. Fin whale data were collected from 2008 to 2022 along fixed transects crossing the NW Mediterranean Sea during the summer period. Data were analyzed using traditional line transect methodology, obtaining the Effective Area monitored. Based on existing literature, we select various covariates, either static in nature, such as bathymetry and slope, or variable in time, for example, SST, MLD, Chl concentration, EKE, and FSLE. We compared both the explanatory power and predictive skill of the different modeling techniques. Our results show that all models performed well in distinguishing presences and absences but, while density and presence patterns for the fin whale were similar, their dependencies on environmental factors can vary depending on the chosen model. Bathymetry was the most important variable in all models, followed by SST and the chlorophyll recorded 2 months before the sighting. This study underscores the role SDMs can play in marine mammal conservation efforts and emphasizes the importance of selecting appropriate modeling techniques. It also quantifies the relationship between environmental variables and fin whale distribution in an understudied area, providing a solid foundation for informed decision making and spatial management.
... To assess the distribution of the Japanese giant salamander, we used the presence data and their corresponding environmental characteristics, considering climatic, topographic and land-use predictors at a 1 km² resolution. As it is recommended to avoid too many predictors in the same models and to focus on variables likely to influence the distribution of the species (Ficetola et al. 2014; Fourcade et al. 2018), we used a limited set of environmental variables to build the models, selected according to the Japanese giant salamander's ecology (Okada et al. 2008; Harris et al. 2013; Bjordahl et al. 2020; Soley-Guardia et al. 2024). Four climatic variables were selected among the 19 bioclimatic variables of the WorldClim version 2 climate dataset (Fick and Hijmans 2017) at a high (1 km²) spatial resolution: mean temperature of the wettest quarter (Bio8), mean temperature of the warmest quarter (Bio10), precipitation of the warmest quarter (Bio18) and precipitation of the coldest quarter (Bio19). ...
... material 1: table S1; Suppl. material 2: table S2 for details on each data source) (Elith et al. 2006; Fourcade et al. 2018). ...
Article
Full-text available
Giant salamanders are the world’s largest amphibians and keystone predators in riverine ecosystems where they face global declines. Identifying environmental variables influencing their distribution is, therefore, an essential step for their conservation. This study aims to assess the current habitat suitability and distribution of the Japanese giant salamander (Andrias japonicus) and to predict changes under future climate scenarios. We used species distribution models (SDMs) over a 282,916 km² area, including 477 high-resolution occurrence data of giant salamanders and seven remote-sensing environmental predictors (climatic, topographic and land use). We projected the prediction maps, identified the most contributing variables and calculated the shifts of suitable areas for three periods (2050, 2070 and 2090) under projected climatic conditions. Climatic variables highly contributed to the distribution of giant salamanders (76% of the total), with preferences for areas with moderate precipitations during cold and wet seasons and mild summer temperatures. A moderately steep surrounding environment was favourable for salamanders, whereas the land-use variables had less influence. Future climate predictions indicate a major decrease of suitable areas. Altogether, our results highlight the habitat preferences of giant salamanders at a broad scale and the negative impact of climate change on future suitable areas. These findings provide important steps for upcoming conservation actions for this threatened species in delineating favourable distribution ranges and priority areas that should be directly affected by climate change. Finally, they emphasise the need for new research at a fine scale on disturbances to the aquatic habitat to enhance the conservation of giant salamanders. 
Highlights
  • We used a species distribution model (MaxEnt), high-resolution occurrence data and remote sensing data (climatic, topographic and land use) to identify suitable habitats for the Japanese giant salamander in Japan.
  • The most suitable environments for the Japanese giant salamander are located both within and beyond its current distribution range, with the ‘Japanese Alps’ forming an impassable natural barrier.
  • Among the variables studied, precipitation of the warmest quarter, precipitation of the coldest quarter, mean temperature of the warmest quarter and mean temperature of the wettest quarter were the most important environmental predictors of the species’ distribution.
  • Climate change is expected to severely reduce the potential suitable geographical areas for the Japanese giant salamander in the future.
  • The present work calls for new surveys based on the projected maps to improve the mapping of salamander distribution and to focus on ecological features and threats at the aquatic habitat level to understand the risks to their populations.
... In summary, the results reveal a significant decline in high-suitability habitat for North China leopards, while areas classified as medium and low suitability are projected to increase (Figure 4). ...
... Currently, among the reported species distribution models, the MaxEnt model has better stability and higher accuracy, and it exhibits less distortion in dealing with group temperature factors [28,32,61]. Variable selection has a remarkable effect on species distribution modeling [62]. Terrestrial ecosystems exhibit a high sensitivity to temperature changes induced by climate change, which can directly or indirectly influence the spatial distribution patterns of species and their associated ecological factors. ...
Article
Full-text available
Climate change has a profound impact on the phenology and growth of vegetation, which in turn influences the distribution and behavior of animal communities, including prey species. This dynamic shift significantly affects predator survival and activities. This study utilizes the MaxEnt model to explore how climate change impacts the distribution of the North China leopard (Panthera pardus japonensis) in the Ziwuling region of Gansu Province, China. As an endemic subspecies and apex predator, the North China leopard is vital for maintaining the structure and function of local ecosystems. Unfortunately, its population faces several threats, including habitat change, interspecies competition, and human encroachment, all of which are compounded by the ongoing effects of climate change. To assess the requirement and quality of habitat for this species, we conducted a population survey in the Ziwuling area from May 2020 to June 2022, utilizing 240 infrared cameras, which identified 46 active leopard sites. Using the MaxEnt model, we simulated habitat suitability and future distribution under different climate change scenarios based on nine environmental variables. Our results indicate that the population distribution of North China leopards is primarily influenced by the mean diurnal range (Bio2), with additional sensitivity to isothermal conditions (Bio3), temperature seasonality (Bio4), maximum temperature of the warmest month (Bio5), and annual temperature range (Bio7). We also evaluated habitat suitability across three socioeconomic pathways (SSP126, SSP245, and SSP585) for three time intervals: the 2050s (2041–2060), the 2070s (2061–2080), and the 2090s (2081–2100). The findings suggest a significant decline in high-suitability habitat for North China leopards, while areas of medium and low suitability are projected to increase. 
Understanding these distributional changes in North China leopards will enhance our comprehension of the region’s biogeography and inform conservation strategies aimed at mitigating the impacts of climate change.
... The usefulness of SDM, like most models, is highly dependent on the quality of the input data. The algorithms under the hood are not based on biology and most were developed as general purpose regression models and will attempt to fit a model even when using nonsense data (Fourcade et al. 2018). As such, the input data for these models were carefully selected and cleaned to remove low quality data and ensure that the climatic data is biologically relevant for each species. ...
... To build the best models, environmental variables must be selected which have biological relevance to the species being studied. Selection of many correlated or irrelevant variables will lead to model overfitting or poor model predictions due to spurious correlations between species presence and environmental data (Fourcade et al. 2018). To start the process of identifying relevant environmental variables, a literature review was conducted for each species, looking for studies which have done species distribution modeling, or studies that investigated the climatic conditions favorable to growth of each species. ...
Technical Report
Full-text available
Many species of invasive plants are currently considered incipient on O'ahu, existing in small populations, but with a high risk of spreading to new locations and forming much larger populations that can transform habitats. This project uses species distribution modeling (SDM) to model and correlate the presence of these species to climatic information and create products which predict what areas of Hawai'i are climatically suitable for the growth of each species. These maps can be used to prioritize early detection surveys in areas where a species is not yet found, but is climatically suitable as well as to identify high priority habitats which may be invaded if a species continues to spread. In total, 20 species were modeled across the Hawaiian islands including several high risk invaders such as fountain grass (Cenchrus setaceus), devilweed (Chromolaena odorata), and cane ti (Tibouchina herbacea). The majority of models produced have high predictive ability for these species in Hawai'i, although some must be interpreted with caution due to limited data or changes in climatic niche between Hawai'i and the species' native range.

Background

Species distribution modeling (SDM) has emerged as a powerful tool in ecology not only for predicting the spread of invasive species (Barbet-Massin et al. 2018; Canelles et al. 2021; Meriggi et al. 2022) but also for identifying new populations of endangered species (Allen & McMullin 2019; Eyre et al. 2022), guiding the selection of biological reserves (Guisan et al. 2013), and prioritizing sites for outplanting or release of endangered species (Eyre et al. 2022). SDM uses wild species occurrence data (GPS points) in combination with environmental data such as rainfall and temperature to train a machine learning model which can then predict what areas are climatically suitable for the growth of the species (Urbina-Cardona et al. 2019).
... First, we selected four climate variables describing the extremes of temperature and ranges of precipitation, as these are likely to be key climatic limiting factors for plant species establishing in mountainous regions (Elith and Leathwick, 2009). Among the 19 bioclimatic variables that are frequently used in SDM studies (Fourcade et al., 2018), we thus used bio 5 (maximum temperature of the warmest month), bio 6 (minimum temperature of the coldest month), bio 13 (precipitation of the wettest month) and bio 14 (precipitation of the driest month). The climate variables were obtained from the CHELSA 2.1 database and represent the average global climate for the period 1980-2010 (Karger et al., 2017). ...
... This strategy allowed the calculation of the area under the receiver operating curve (AUC), the true skill statistic (TSS) and the continuous Boyce index. As AUC and TSS have been criticised for providing a poor measure of overall model performance (Lobo et al., 2008; Fourcade et al., 2018), we used the Boyce index to construct our ensemble predictions, emphasising that any evaluation metric should not be taken as an absolute measure of models' discrimination ability. We computed the importance of each variable using a permutation approach with 10 permutations, which consists in calculating the correlation between model predictions from the original and shuffled variables (see biomod2 manual for details). ...
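
The permutation approach described in this excerpt can be sketched in a few lines. This is not biomod2's code: it assumes a generic `predict` function over rows of predictor values and reports importance as one minus the Pearson correlation between original and shuffled-variable predictions, averaged over permutations. That scoring convention, the function names, and the toy model are assumptions of this sketch.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def permutation_importance(predict, rows, column, n_perm=10, seed=0):
    """Mean of 1 - cor(original predictions, predictions with one column shuffled)."""
    rng = random.Random(seed)
    reference = [predict(r) for r in rows]
    scores = []
    for _ in range(n_perm):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
        scores.append(1.0 - pearson(reference, [predict(r) for r in permuted]))
    return sum(scores) / n_perm


# Hypothetical model that depends on the first predictor only.
def toy_model(row):
    return 2.0 * row[0]


data = [[float(i), float((i * 7) % 10)] for i in range(10)]
print(permutation_importance(toy_model, data, column=0))  # large: predictor is used
print(permutation_importance(toy_model, data, column=1))  # ~0: predictor is ignored
```

A score near zero means shuffling the variable leaves predictions unchanged. Note that such a score measures reliance of the fitted model on a variable, not the variable's ecological relevance.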
Article
Full-text available
Climate change is affecting biodiversity across all taxonomic groups and ecosystems globally. Mountain ecosystems are particularly sensitive to climate warming, as temperature is generally a limiting factor for their vegetation. The Irano-Anatolian global biodiversity hotspot, which includes high elevations with a rich endemic biodiversity, offers good opportunities to study the effects of future climate change on its plant diversity. We used species distribution models to predict changes in species' habitat suitability by the end of the century (2071-2100) under two extreme shared socioeconomic pathways (SSPs), for 713 endemic plant species of the area. We found that a remarkably high number of species are predicted to experience a shift in their climatically suitable habitats from lower to higher elevations, resulting in a decrease in their potential range areas (79 % and 86 % of species, under the SSP 1-2.6 and SSP 5-8.5 scenarios, respectively). As a consequence, we also predicted a decrease in species richness in the low (< 1200 m) and middle (1200-2500 m) elevational belts, and an increase in species richness in the high elevational belt (> 2500 m). This study demonstrates that climate change has the potential to cause a massive restructuring of plant community composition in this area, including the risk of extinction for many species. This poses a significant threat to the biodiversity of this region, which calls for urgent action to mitigate as far as possible the adverse effects of climate change in the region.
... While prediction accuracy on an independent dataset is the standard way to assess model performance, it is equally important to understand how model predictions occurred. Studies have shown that accurate model predictions can be achieved even when irrelevant and artificial predictors are used, outperforming models with domain-relevant predictors (Fourcade et al., 2018; Behrens & Viscarra Rossel, 2020). A tradeoff between interpretability and accuracy is thus needed, particularly in natural sciences, where the models are used for causal explanation (Breiman, 2001b; Shmueli, 2010). ...
... However, we preferred the local R² from each spatial block to get (a) the strictest evaluation for each model (Supplementary Table S14) and (b) the comparison between different spatial blocks, identifying the spatial blocks where model predictions converged or diverged (Fig. 7). The partitioning of the geographical and feature space better represents the error related to extrapolation and transferability, although it increases the AOA (Wenger & Olden, 2012; Roberts et al., 2017; Fourcade et al., 2018; Hao et al., 2020; Gazis & Greinert, 2021; Meyer & Pebesma, 2021). ...
Article
Full-text available
High-resolution mapping of deep-sea polymetallic nodules is needed (a) to understand the reasons behind their patchy distribution, (b) to associate nodule coverage with benthic fauna occurrences, and (c) to enable an accurate resource estimation and mining path planning. This study used an autonomous underwater vehicle to map 37 km² of a geomorphologically complex site in the Eastern Clarion–Clipperton Fracture Zone. A multibeam echosounder system (MBES) at 400 kHz and a side scan sonar at 230 kHz were used to investigate the nodule backscatter response. More than 30,000 seafloor images were analyzed to obtain the nodule coverage and train five machine learning (ML) algorithms: generalized linear models, generalized additive models, support vector machines, random forests (RFs) and neural networks (NNs). All ML models yielded similar maps of nodule coverage with differences occurring in the range of predicted values, particularly at parts with irregular topography. RFs had the best fit and NNs had the worst spatial transferability. Attention was given to the interpretability of model outputs using variable importance ranking across all models, partial dependence plots and domain knowledge. The nodule coverage is higher on relatively flat seafloor (< 3°) with eastward-facing slopes. The most important predictor was the MBES backscatter, particularly from incident angles between 25 and 55°. Bathymetry, slope, and slope orientation were important geomorphological predictors. For the first time, at a water depth of 4500 m, orthophoto-mosaics and image-derived digital elevation models with 2-mm and 5-mm spatial resolutions supported the geomorphological analysis, interpretation of polymetallic nodule occurrences, and backscatter response.
... Therefore, a dominant challenge in applying ALS data is choosing appropriate metrics to represent ecologically important structure (Fourcade et al. 2018, Moudrý et al. 2023). While the wider morphological classes of cover, height, and structural complexity are useful organizational concepts, specific and quantifiable metrics are regularly required for modeling. ...
... The specific predictors tested in modeling are usually chosen to best represent important habitat characteristics or key ecological drivers, but choices are strongly restricted by data availability leading to the use of many proximal metrics with limited direct relevance to a species' ecology (Pratt et al. 2022). The danger of using predictors that have weak ecological justification is that phenomenological models can be produced which adequately describe training data but do not reflect the actual species-environment relationships found in nature (Fourcade et al. 2018, Matthiopoulos et al. 2023). This does little to advance the understanding of a species' ecology and often produces inaccurate predictions in new areas or times (Evans et al. 2015, Yates et al. 2018, Pratt et al. 2022, Rousseau and Betts 2022). ...
Article
Full-text available
Species' habitats are strongly influenced by the 3‐dimensional (3D) structure of ecosystems. The dominant technique used to measure 3D structure is Airborne Laser Scanning (ALS), a type of LiDAR (Light Detection and Ranging) technology. Airborne Laser Scanning captures fine‐scale structural information over large spatial extents and provides useful environmental predictors for habitat modeling. However, due to technical complexities of processing ALS data, the full potential of ALS is not yet realized in wildlife research, with most studies relying on a limited set of 3D predictors, such as vegetation metrics developed principally for forestry applications. Here, we highlight the full potential of ALS data for wildlife research and provide insight into how it can be best used to capture the environmental conditions, resources, and risks that directly determine a species' habitat. We provide a nontechnical overview of ALS data, covering data considerations and the modern options available for creating custom, ecologically relevant, ALS predictors. Options included the following: i) direct point cloud approaches that measure structure using grid, voxel, and point metrics, ii) object‐based approaches that identify user‐defined features in the point cloud, and iii) modeled environmental predictors that use additional modeling to infer a range of habitat characteristics, including the extrapolation of field acquired measurements over ALS data. By using custom ALS predictors that capture species‐specific resources, risks, and environmental conditions, wildlife practitioners can produce models that are tailored to a species' ecology, have greater biological realism, test a wider range of species‐environment relationships across scales, and provide more meaningful insights to inform wildlife conservation and management.
... Model accuracy was firstly evaluated by the area under the receiver operating curve (AUC; Fourcade et al. 2018) to assess the ability of the model to discriminate between presence and pseudo-absence points, and also by Kappa to measure the expected agreement between the prediction and actual presence data [61,62]. These metrics provide an accuracy value estimated across the entire prediction area and were termed 'global' model accuracy estimates. ...
... acceptable between 0.7-0.8, moderate between 0.6-0.7 and poor for AUC values below 0.5 [63]. Kappa values range from −1 to 1 with results considered to indicate excellent agreement for values 0.8-1, substantial for values 0.6-0.8, ...
Article
Full-text available
Background Accurate predictions of animal occurrence in time and space are crucial for informing and implementing science-based management strategies for threatened species. Methods We compiled known, available satellite tracking data for pygmy blue whales in the Eastern Indian Ocean (n = 38), applied movement models to define low (foraging and reproduction) and high (migratory) move persistence underlying location estimates and matched these with environmental data. We then used machine learning models to identify the relationship between whale occurrence and environment, and predict foraging and migration habitat suitability in Australia and Southeast Asia. Results Our model predictions were validated by producing spatially varying accuracy metrics. We identified the shelf off the Bonney Coast, Great Australian Bight, and southern Western Australia as well as the slope off the Western Australian coast as suitable habitat for migration, with predicted foraging/reproduction suitable habitat in Southeast Asia region occurring on slope and in deep ocean waters. Suitable foraging habitat occurred primarily on slope and shelf break throughout most of Australia, with use of the continental shelf also occurring, predominantly in South West and Southern Australia. Depth of the water column (bathymetry) was consistently a top predictor of suitable habitat for most regions, however, dynamic environmental variables (sea surface temperature, surface height anomaly) influenced the probability of whale occurrence. Conclusions Our results indicate suitable habitat is related to dynamic, localised oceanic processes that may occur at fine temporal scales or seasonally. An increase in the sample size of tagged whales is required to move towards developing more dynamic distribution models at seasonal and monthly temporal scales. Our validation metrics also indicated areas where further data collection is needed to improve model accuracy. This is of particular importance for pygmy blue whale management, since threats (e.g., shipping, underwater noise and artificial structures) from the offshore energy and shipping industries will persist or may increase with the onset of an offshore renewable energy sector in Australia.
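The AUC mentioned in the citing excerpts above can be read as the probability that a randomly chosen presence point receives a higher predicted suitability than a randomly chosen pseudo-absence point. The following is a minimal sketch of that rank-based reading, not the code used in any of the cited studies; the function name `auc` and the example scores are hypothetical.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: share of (presence, background) pairs in which the
    presence point has the higher score; ties count as half a win."""
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical predicted suitabilities, for illustration only.
presences = [0.9, 0.4]
pseudo_absences = [0.5, 0.1]
print(auc(presences, pseudo_absences))
```

A value of 0.5 corresponds to a model that ranks presences no better than chance, which is why values near or below 0.5 are read as poor discrimination.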
... Some factors within the WorldClim data [57] exhibit a high degree of correlation, which might exert an adverse influence on the modelling outcomes [58]. In this study, the Pearson correlation coefficient and variance inflation factor (VIF) were comprehensively employed to reduce the dimensionality of the environmental data extracted from the actual occurrence points, which effectively addresses this issue [60][61]. Furthermore, when species distribution data are deficient and fail to cover the entire extent of species existence, it is termed incomplete sampling [62][63] and thereby augments the uncertainty of the model results. ...
Article
Full-text available
Background As a species of considerable medicinal, ecological, and economic significance, the protection of C. songaricum and its host plants is of paramount importance. Biodiversity patterns and species distribution are profoundly influenced by climate change. Understanding the adaptive mechanisms of organisms in response to these changes is essential for effective species conservation. However, there is currently limited information available on simulating habitat suitability and assessing key environmental factors associated with parasite species using niche models. Methods This study utilized environmental and species distribution data to analyze the shifts in the geographic range of C. songaricum and its host plants under current and projected future climate scenarios using the Biomod2 platform, which integrates multiple individual models into an ensemble framework. Additionally, the study quantified the environmental variables influencing the observed distribution patterns. Results The potential geographical distribution and overlapping areas of C. songaricum and its host plants are primarily concentrated in Asia and North America. Under all four scenarios within the two timeframes (2041–2060 and 2061–2080), the overall suitable habitat areas for C. songaricum, Nitraria tangutorum Bobr., N. sphaerocarpa Maxim., and Peganum multisectum (Maxim.) Bobrov are expected to decrease compared with current climatic conditions. Conversely, the total area of suitable habitat for Kalidium foliatum (Pall.) Moq., Nitraria sibirica Pall., and Zygophyllum xanthoxylum (Bunge) Maxim. is predicted to increase. All species except K. foliatum will experience greater reductions between 2041 and 2060 than between 2061 and 2080 under more severe climate change scenarios. There is significant ecological niche overlap among C. songaricum, N. sphaerocarpa, N. tangutorum, and P. multisectum. Key factors influencing the future distribution of C. songaricum include the mean ultraviolet-B light of the lowest month, altitude, and annual mean temperature. Conclusion A comprehensive analysis demonstrated that the accuracy of predictions could be significantly enhanced and the distributional error for individual species could be minimized by employing the Biomod2 ensemble model to simulate the suitable habitats of parasitic species. The findings of this study can significantly inform both the management of C. songaricum plantations and the conservation of C. songaricum and its host plants.
... TSS values range from −1 (all wrong predictions) to 1 (all correct predictions), with values > 0 indicating models that predict better than random. TSS's prevalence-independence is advantageous when prevalence among classes varies (Allouche et al. 2006; Freeman and Moisen 2008), and it is often used to evaluate ecological models with rare classes and imbalanced datasets (Akosa 2017; Fourcade et al. 2018; Somodi et al. 2017). Relative influence is the percentage contribution of each predictor variable to model fit, calculated as the improvement to classification accuracy made by each variable averaged across all trees. ...
Article
Road-crossing structures limit organism movement, but their passabilities are rarely measured because they are numerous and time-consuming to survey. Instead, road-crossing passability could be treated in one of four ways: assuming equal passability at all locations (uniform method), assigning random passability values sampled from barrier surveys (random sample method), using remote sensing data to infer presence (presence/absence method) or rate passability (rating category method). Each prediction method produces different passability estimates for individual barriers, but how these differences affect river connectivity estimates has not been systematically evaluated. We compared river connectivity estimates from these four road-crossing passability prediction methods in the Bear River Basin, USA. We parameterized barrier passability methods with Bonneville Cutthroat Trout Oncorhynchus clarkii utah passage survey data at 140 road crossings. Road crossings blocked fish passage at 37% of survey locations. Those road-crossing barriers that obstructed fish movement also decreased the proportion of connected reaches in the river network from 12% (with dams and all road crossings assumed to be passable) to just 3%. All passability prediction methods produced similar results and had considerable uncertainty predicting passability for individual barriers. Our findings suggest that simpler methods, like uniform or random sample road-crossing passability predictions, are sufficient to characterize river connectivity. Our work highlights the importance of identifying road crossings that act as barriers to organism passage and identifies critical limitations to predicting barrier status for connectivity analysis and conservation planning.
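The TSS range quoted in the citing excerpt above (−1 for all wrong predictions, 1 for all correct ones) follows from its usual definition, sensitivity + specificity − 1. The sketch below illustrates that definition on binary labels; it is not the cited study's code, and the function name and example vectors are hypothetical.

```python
def true_skill_statistic(observed, predicted):
    """TSS = sensitivity + specificity - 1 for binary labels
    (1 = presence / positive class, 0 = absence / negative class)."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # true positives
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # false negatives
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # true negatives
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one observed presence and one absence")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


# Hypothetical observations, for illustration only.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
all_correct = list(observed)
all_wrong = [1 - o for o in observed]
print(true_skill_statistic(observed, all_correct))  # 1.0
print(true_skill_statistic(observed, all_wrong))    # -1.0
```

Because sensitivity and specificity are each computed within one observed class, the statistic does not change when the ratio of presences to absences changes, which is the prevalence-independence the excerpt refers to.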
... These approaches require data on both species occurrences and underlying environmental or habitat conditions, and, given their complexity, take significant effort to build and validate. As such, a wide literature exists regarding best practices for developing (Araújo and New, 2007; Elith and Leathwick, 2009; Barbet-Massin et al., 2012; Robinson et al., 2017; Derville et al., 2018) and evaluating (Allouche et al., 2006; Fourcade et al., 2018) SDMs. In addition, the spatial or temporal resolution of predictive models are only as good as the underlying environmental data upon which they rely. ...
Chapter
Full-text available
Navigating Our Way reflects the broader insights and diverse voices revolutionizing marine conservation. This volume brings together an array of scholars, practitioners, and experts from multiple fields, creating a network of trans-disciplinary and multi-cultural perspectives to address the complex problems in marine conservation. Larry B. Crowder, a leading voice in the field, has curated contributions on a wide range of topics, including critically endangered species in the Bahamas, Argentinian penguins, and the ecosystems of our coral reefs. The book delves deeply into human relationships with nature, the development of climate-smart solutions, and the governance of collective action. Committed to inclusivity, this volume also includes conversations across the disciplines of natural sciences, social sciences, and governance, incorporating both Western and Indigenous knowledge traditions. This volume is highly relevant to marine conservation scholars, practitioners, managers, and students, and anyone interested in preserving our marine environment.
... Pearson's correlation coefficient (r) was then employed to analyze the pair correlation among all these transformed predictors. If two variables were significantly correlated (|r| > 0.7), we retained one of them, choosing those that have frequently been recognized in the literature as being of ecological and biological relevance (Fourcade et al., 2018). ...
Article
Full-text available
Societal Impact Statement African sandalwood (Osyris lanceolata) leaves, roots, barks, fruits, and woods are used for multiple purposes throughout Asia, Africa, and Europe. The species is threatened in several eastern African countries. To improve the species' management and conservation, a habitat suitability study was undertaken in its at‐risk region in eastern Africa and extended to southern and horn of Africa due to its continuous distribution. African sandalwood continues to face intense human pressure and needs to be prioritized in terms of sustainable management practices. The plant's significant human importance necessitates inclusive conservation measures in all three habitat regions in Africa to safeguard it. Summary African sandalwood (Osyris lanceolata) is a versatile plant with significant economic and societal importance. It is threatened in several countries in Africa due to overexploitation. The lack of knowledge about the plant's ecology and environmental requirements complicates the species' long‐term management. We sought to address this issue by providing a novel understanding of the environmental factors that influence the occurrence of African sandalwood and its potential distribution. Using publicly available occurrence records from 1950 to 2021 and field data, we examined the species' habitat requirements in eastern, southern, and horn of Africa regions. We applied the Generalized Additive Models to link the plant's occurrence data to 12 environmental variables reflecting climatic, physiographic, and edaphic characteristics, while controlling for the biases that arise from publicly gathered occurrence records. Our findings revealed that the plant's habitat requirements vary among the three regions investigated. While climatic factors are essential in all three regions, physiographic aspects are mainly important for the eastern and southern populations, while edaphic variables were pertinent exclusively in southern region. 
Areas suitable and optimal for the plant were estimated to comprise 674,700 km² (17.3% of total land area) in eastern Africa, 267,750 km² (25.6%) in the horn of Africa, and 716,300 km² (13.9%) in southern Africa. More than two‐thirds of these areas are located on unprotected lands, highlighting the importance of community involvement for a sustainable management of the species. Our results on the potential geographical distribution of African sandalwood are crucial to guide more targeted conservation and recovery efforts.
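The pairwise filter described in the citing excerpt above (drop one variable of any pair with |r| > 0.7, keeping the more ecologically relevant one) can be sketched as a greedy pass over the predictors. This is an illustration under stated assumptions, not the authors' procedure: the `priority` list is a hypothetical way of encoding "ecological relevance" as an ordering, and the predictor names and values are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def filter_correlated(predictors, priority, threshold=0.7):
    """Walk the predictors in priority order and keep one only if its |r|
    with every already-kept predictor does not exceed the threshold."""
    kept = []
    for name in priority:
        if all(abs(pearson_r(predictors[name], predictors[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical predictor values sampled at six sites.
predictors = {
    "temp_mean": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0],
    "temp_max": [15.0, 17.5, 19.0, 21.5, 23.0, 25.5],
    "precip": [300.0, 120.0, 410.0, 90.0, 350.0, 200.0],
}
# temp_mean is ranked as most relevant, so temp_max is the one dropped.
print(filter_correlated(predictors, ["temp_mean", "temp_max", "precip"]))
```

The result depends on the ordering: putting `temp_max` first in `priority` would retain it and drop `temp_mean` instead, which is where the ecological judgement enters.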
... Predictor selection -The selection of environmental predictors is an unresolved issue in SDM (Leroy, 2023). To best fit with the latest methodological recommendations ( (Dubos et al., 2022d;Fourcade et al., 2018;Hui, 2022), (4) statistical selection based on relative importance (Bellard et al., 2016b;Thuiller et al., 2009), (5) consideration of potential interactive effects (Gábor et al., 2019). (1) We discarded bio3 (isothermality) because the relationship with species presence was unclear. ...
Preprint
Full-text available
Amegilla pulchra is a solitary bee from Australia that has recently been spread throughout many islands of the Pacific. The non-regulated human-driven spread of the species may affect the local pollinator communities and their interactions with host plants. We used an ecological niche modelling approach, accounting for non-equilibrium and anthropogenic spread with the most recently recommended approach, and predicted the potential spread of the species under current and future conditions. We expected climate change and increase in human density to offer new suitable environments for the spread of the species. Invasion risks will increase in the future overall, but more in the non-native regions compared to the native region. In the native region, the projected effect of future environmental change was highly contrasted, with increasing invasion risk in human-dense areas but decreasing elsewhere. We found high risks of invasion in eastern Asia and provided a world ranking of entry points for surveillance priority which accounts for maritime traffic. This study highlights potential contrasted effects between climate and anthropogenic change, with differing projections between the native and the non-native regions. Public awareness and prevention will be the key to prevent further spread and mitigate potential adverse effects of the species on island systems. In regions that are already invaded, we propose that habitat restoration is a promising strategy for both the mitigation of the spread and the conservation of local communities.
... This is often because small sample sizes preclude withholding testing data or, more importantly, there is a lack of independently collected test data. Spatially independent data collected from other data sources or in other areas provide a rigorous test of model performance when applied to common model assessment metrics such as AUC or other measures of model discrimination (Fourcade et al. 2018). A rich independent dataset thus provides this work, and a similar model we created for the Northwest Rocky Mountains (Olson et al. 2021), with a rare opportunity for rigorous model testing and validation. ...
Article
Full-text available
Understanding how species distributions and associated habitat are impacted by natural and anthropogenic disturbance is central for the conservation of rare forest carnivores dependent on subalpine forests. Canada lynx at their range periphery occupy subalpine forests that are structured by large-scale fire and insect outbreaks that increase with climate change. In addition, the Southern Rocky Mountains of the western United States is a destination for winter recreationists worldwide with an associated high degree of urbanization and resort development. We modeled habitat for a reintroduced population of Canada lynx in the Southern Rocky Mountains using an ensemble species distribution model built on abiotic and biotic covariates and validated with independent lynx locations including satellite telemetry, aerial telemetry, camera traps, den locations, and winter backtracking. Based on this model, we delineated Likely and Core lynx-habitat as thresholds that captured 95% and 50% of testing data, respectively. Likely (5727 km²) and Core (441 km²) habitat were spatially limited and patchily distributed across western Colorado, USA. Natural (e.g., insect outbreaks, fire) and anthropogenic (e.g., urbanization, ski resort development, forest management) disturbance overlapped 37% of Likely lynx-habitat and 24% of highest quality Core. Although overlap with fire disturbance was low (5%), future burns likely represent the greatest potential impact over decades-long timeframes. The overlap of publicly owned lands administratively classified as “protected” with Likely (62% overlap) and Core (49%) habitat may insulate lynx from permanent habitat conversion due to direct human disturbance (urbanization, ski resort development).
... A total of 100,000 background points were randomly generated and the block approach of ENMeval was used to spatially partition occurrences, as per the methodology of Radosavljevic and Anderson (2014). This method creates four non-overlapping geographic bins to ensure spatial independence between the training and testing datasets (Fourcade et al. 2018). Model performance was assessed using the area under the curve (AUC) from receiver operating characteristic (ROC) plots (Metz 1978;Phillips et al. 2006). ...
Article
Full-text available
Ambystoma altamirani is a critically endangered, microendemic amphibian species inhabiting the high-altitude rivers and streams of the Trans-Mexican Volcanic Belt (TMVB), a region experiencing severe ecological disturbances. This study aims to assess the current and future distribution of A. altamirani under different climate and land-use change scenarios using ecological niche modelling (ENM). We also evaluate the connectivity of suitable habitats and the overlap with existing natural protected areas (NPAs). Using occurrence records and environmental variables, we modelled the species’ potential distribution under two climate models (CN85 and MP85) for 2050. The results indicate a significant reduction in suitable habitat, particularly in areas such as the Sierra de las Cruces and the Chichinautzin Biological Corridor, with habitat losses projected to reach up to 13.95% by 2050 under the CN85 scenario. Forest cover loss between 2001 and 2023 further exacerbates this threat, especially in municipalities like Tlalpan and Ocuilan. Our analysis highlights the urgent need for targeted conservation efforts, including the preservation of mixed Abies-Pinus forests and the restoration of degraded ecosystems. The findings underscore the critical importance of integrated conservation strategies that address habitat degradation, climate resilience and ecological connectivity to ensure the long-term survival of A. altamirani.
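The citing excerpt above describes partitioning occurrences into four non-overlapping geographic bins so that training and testing data are spatially separated. The sketch below shows one simple way to obtain such bins (split at the median latitude, then split each half at its own median longitude) and to iterate over leave-one-bin-out folds. It is an analogue written for illustration, not ENMeval's implementation, whose exact splitting rule may differ; the function names and coordinates are hypothetical.

```python
def block_partition(points):
    """Assign each (lon, lat) point to one of four geographic bins (0-3)
    holding roughly equal numbers of points."""
    n = len(points)
    by_lat = sorted(range(n), key=lambda i: points[i][1])
    south, north = by_lat[: n // 2], by_lat[n // 2:]
    bins = [None] * n
    for offset, half in ((0, south), (2, north)):
        by_lon = sorted(half, key=lambda i: points[i][0])
        mid = len(by_lon) // 2
        for i in by_lon[:mid]:   # western part of this half
            bins[i] = offset
        for i in by_lon[mid:]:   # eastern part of this half
            bins[i] = offset + 1
    return bins


def leave_one_block_out(bins):
    """Yield (test_bin, train_indices, test_indices) for each bin."""
    for b in sorted(set(bins)):
        test = [i for i, v in enumerate(bins) if v == b]
        train = [i for i, v in enumerate(bins) if v != b]
        yield b, train, test


# Hypothetical occurrence coordinates (lon, lat), for illustration only.
occurrences = [(-99.5, 19.1), (-99.3, 19.2), (-99.1, 19.0), (-98.9, 19.3),
               (-99.4, 19.6), (-99.2, 19.8), (-99.0, 19.7), (-98.8, 19.9)]
bins = block_partition(occurrences)
for test_bin, train, test in leave_one_block_out(bins):
    print(test_bin, train, test)
```

Each fold evaluates the model on a region it never saw during fitting, which is a stricter test of transferability than a random split of the same points.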
... We obtained climate data for current and future conditions from Worldclim 2.1 (Fick & Hijmans, 2017) at a resolution of 30 arc-seconds (~1 km). Choosing variables that are ecologically relevant to the studied species is critical for developing reliable SDMs (Fourcade et al., 2018). We chose five bioclimatic variables that we believe are relevant to the distribution of herpetofauna in Mediterranean areas: the minimum temperature of the coldest month (Bio6), the mean temperature of the warmest quarter (Bio10), precipitations of the driest month (Bio14), precipitation seasonality (Bio15) and precipitations of coldest quarter (Bio19). ...
Article
Full-text available
Climate change and natural land conversion are causing dramatic shifts in species distribution. Amphibians and reptiles, ectothermic animals with limited dispersal ability, and Mediterranean mountain ranges, which are home to numerous locally adapted taxa, are especially vulnerable to these threats. This is the case with Cilento, a highly biodiverse yet under‐investigated area in the southern Apennine Mountains that is protected by a National Park and 30 Natura 2000 sites. We used bias‐corrected species distribution models and area of habitat (AOH) maps to assess the potential combined impact of climate and land‐use change on 11 amphibians and 16 reptiles in the Park and overlapping Natura 2000 sites. The former estimates species climatic suitability (CS) by correlating species presence to climatic characteristics, whereas the latter classifies the land‐use types based on species–habitat relationships. We estimated CS and AOH for current conditions and two climate and land‐use/cover change scenarios: one of sustainability (SSP1‐2.6) and one of fossil‐fueled development (SSP5‐8.5). Under both scenarios, most species showed significant CS loss, with the greatest declines estimated for amphibians and under SSP5‐8.5. Highland species appear to be the most vulnerable, whereas lowland species may gain CS. Given the widespread renaturalization of agricultural land under both scenarios, most species did not show declines in AOH due to land‐use change. However, all species were projected to face significant shifts in CS under both scenarios, presenting a crucial challenge to their survival. These findings offer valuable insights for climate mitigation initiatives aimed at securing the long‐term protection of herpetofauna within Cilento's protected areas.
... To prevent overfitting due to collinearity among environmental factors, variance inflation factor (VIF) analysis was performed in R to screen the environmental variables, and Spearman correlation tests were conducted to reduce model complexity and improve the accuracy of the distribution model (Rose 1995, Fourcade et al. 2018). After screening, factors with a correlation of <0.7 were identified, and factors with a VIF of <5 were then selected. ...
Article
Gynaephora alpherakii (Grum-Grschimailo) (Lepidoptera: Lymantriidae) is a major pest in alpine meadow areas in the Qinghai–Tibetan Plateau (QTP) and causes severe losses in the local livestock production industry. Assessing areas at high risk for G. alpherakii infestation is critical for the effective management of this pest. In this study, an ensemble distribution model was used to analyze areas suitable for G. alpherakii on the QTP. Risk zoning was performed based on the vegetation and environmental conditions in areas with high-occurrence points, and differences between high-occurrence points and other occurrence points were compared. The results revealed that the suitable areas for G. alpherakii on the QTP amounted to 28.27 × 10⁴ hm², accounting for 10.94% of the total area of the QTP; the area of high-risk was 19.07 × 10⁴ hm², and these areas were located mainly in the eastern part of the QTP. Qinghai Province had the highest risk, accounting for 77% of the total area identified as high-risk. In terms of habitat, G. alpherakii preferred alpine Kobresia meadows, which have abundant sunshine, loose soil, and scarce precipitation. This study supports efforts to manage G. alpherakii outbreaks and contributes to the ecological protection of the QTP.
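The VIF screening mentioned in the citing excerpt above can be illustrated with the standard identity that the VIF of predictor j, 1 / (1 − R_j²), equals the j-th diagonal entry of the inverse of the predictors' correlation matrix. The sketch below is a pure-Python illustration of that identity, not the R workflow used in the cited study; the function names and the example columns are hypothetical, and the cutoff of 5 is taken from the excerpt.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length columns."""
    n = len(columns[0])
    centered = [[v - sum(c) / n for v in c] for c in columns]
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]


def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """Variance inflation factor of each column."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


# Hypothetical predictor columns, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]    # correlated with x (r = 0.8)
z = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with x and y
values = vif([x, y, z])
print([round(v, 3) for v in values])
print([v < 5 for v in values])  # which predictors pass the VIF < 5 screen
```

With only two correlated predictors the VIF reduces to 1 / (1 − r²), so r = 0.8 gives about 2.78, still under the cutoff of 5; this is why a pairwise correlation filter and a VIF screen can disagree and are often applied together.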
... SSURGO parent material (PM) and previous SoilGrids maps used in Ramcharan et al. (2018) as covariates were not included due to spatial artifacts and data circularity concerns. Since this study is focused on prediction quality rather than explanatory inference, we do not summarize variable importance due to concerns about misleading interpretations (e.g., Fourcade et al., 2018; Wadoux et al., 2020), but all model objects are included in our online repository if readers have interest in assessing importance (T. Nauman, 2024). ...
Article
Full-text available
Detailed soil property maps are increasingly important for land management decisions and environmental modeling. The US Soil Survey is investing in production of the Soil Landscapes of the United States (SOLUS), a new set of national predictive soil property maps. This paper documents initial 100‐m resolution maps of 20 soil properties that include various textural fractions, physical parameters, chemical parameters, carbon, and depth to restrictions. Many of these properties have not been previously mapped at this resolution. A hybrid training strategy helped increase training data by roughly 10‐fold over previous similar studies by combining commonly used laboratory data with underutilized field descriptions tied to soil survey map unit component property estimates (to help represent within polygon variability) as well as randomly selected soil survey map unit weighted average property estimates. Relative prediction intervals were used to help select which training data sources improved model performance. Conventional and spatial cross‐validation strategies yielded generally strong coefficients of determination between 0.5 and 0.7, but with substantial variability and outliers among the various properties, types of training data, and depths. Internal review of the maps highlighted both strengths and weaknesses of the maps, but most of the critical comments were in areas with high model uncertainty that can be used to guide future improvements. Generally, previously glaciated areas and complex large alluvial basins were harder to model. The new SOLUS 100‐m maps will be updated in the future to address identified issues and feedback as users interact with the data.
... To assess the appropriateness of using alternative occurrence datasets and model algorithms, it is essential to evaluate the performance of SDMs in terms of their reliability and applicability across different contexts. Multiple evaluation metrics comparing the predicted distributions with the actual species occurrences have been developed (Allouche et al., 2006; Fourcade et al., 2018; Wunderlich et al., 2019). These evaluation metrics are categorised into threshold-dependent and threshold-independent measures. ...
Article
Full-text available
Species distribution models (SDMs) are widely used to project how species distributions may vary over time, particularly in response to climate change. Although the fit of such models to current distributions is regularly enumerated, SDMs are rarely tested across longer time spans to gauge their actual performance under environmental change. Here, we utilise paleozoological presence/absence records to independently assess the predictive accuracy of SDMs through time. To illustrate the approach, we focused on modelling the Holocene distribution of the hartebeest, Alcelaphus buselaphus, a widespread savannah‐adapted African antelope. We applied various modelling algorithms to three occurrence datasets, including a point dataset from online repositories and two range maps representing current and ‘natural’ (i.e. hypothetical assuming no human impact) distributions. We compared conventional model evaluation metrics which assess fit to current distributions (i.e. True Skill Statistic, TSSc, and Area Under the Curve, AUCc) to analogous ‘paleometrics’ for past distributions (i.e. TSSp, AUCp, and in addition Boycep, F2‐scorep and Sorensenp). Our findings reveal only a weak correlation between the ranking of conventional metrics and paleometrics, suggesting that the models most effectively capturing present‐day distributions may not be the most reliable to hindcast historical distributions, and that the choice of input data and modelling algorithm both significantly influence environmental suitability predictions and SDM performance. We thus advocate assessment of model performance using paleometrics, particularly those capturing the correct prediction of presences, such as F2‐scorep or Sorensenp, due to the potential unreliability of absence data in paleozoological records. By integrating archaeological and paleontological records into the assessment of alternative models' ability to project shifts in species distributions over time, we are likely to enhance our understanding of environmental constraints on species distributions.
... However, despite their widespread use, SDMs face significant challenges associated in particular with the quality of input data (Araújo et al., 2019; Fourcade et al., 2018; Gábor et al., 2024; Rocchini et al., 2011). ...
Article
Global mapping of forest height is an extremely important task for estimating habitat quality and modeling biodiversity. Recently, three global canopy height maps have been released, the global forest canopy height map (GFCH), the high‐resolution canopy height model of the Earth (HRCH), and the global map of tree canopy height (GMTCH). Here, we assessed their accuracy and usability for biodiversity modeling. We examined their accuracy by comparing them with the reference canopy height models derived from airborne laser scanning (ALS). Our results show considerable differences between the evaluated maps. The root mean square error ranged between 10 and 18 m for GFCH, 9–11 m for HRCH, and 10–17 m for GMTCH, respectively. GFCH and GMTCH consistently underestimated the height of all canopies regardless of their height, while HRCH tended to overestimate the height of low canopies and underestimate tall canopies. Biodiversity models using predicted global canopy height maps as input data are sufficient for estimating simple relationships between species occurrence and canopy height, but their use leads to a considerable decrease in the discrimination ability of the models and to mischaracterization of species niches where derived indices (e.g., canopy height heterogeneity) are concerned. We showed that canopy height heterogeneity is considerably underestimated in the evaluated global canopy height maps. We urge that for temperate areas rich in ALS data, activities should concentrate on harmonizing ALS canopy height maps rather than relying on modeled global products.
... We used checkerboard, spatial, and environment blocking schemes to divide occurrences into 'k folds'; iterative model training involved k − 1 folds, with each iteration reserving a distinct 'fold' for testing (Valavi et al., 2019). These schemes maximize spatial independence between training and testing datasets, allowing the extrapolation of model projections into new regions (Fourcade et al., 2018). ...
Article
Pine wilt disease is one of the most severe and devastating diseases affecting pine forests worldwide, resulting in huge economic losses in many countries. The pinewood nematode (PWN), Bursaphelenchus xylophilus, is the causal agent of pine wilt disease and is obligately vectored by pine sawyer beetles, of the genus Monochamus. For the disease to be present, the habitat must be suitable for the PWN, and include at least one vector species, and at least one host species. To predict its potential distribution, a model must consider all three components. However, no comprehensive study has examined the influence of climatic suitability on the distribution of this “biological complex”. This study addresses this gap by incorporating biotic interactions, specifically involving 13 vectors and 61 host plants, into projections based on the PWN model. We predicted the global potential distribution of pine wilt disease and compared it with the PWN model to highlight the importance of including biotic interactions in species distribution models under climate change. We found that the model revealed an overall trend of increasing suitability scores for both the PWN and pine wilt disease models under future climate scenarios. Furthermore, compared to the PWN model, the biotic model results in an apparent increase in suitability worldwide in the future as the climate will be more suitable to vector and host complexes, suggesting that pine wilt disease could potentially spread to other places via available hosts and vectors. Synthesis and applications. By incorporating biotic interactions, we projected a more accurate suitable area for pine wilt disease, offering valuable insights into regions at high risk for future invasions by the disease and its vectors. This information supports the development of management and early detection strategies in areas of high suitability, helping to mitigate potential economic and ecological losses. 
Additionally, this study introduces a novel approach for integrating biotic factors into species distribution models.
... These symptoms only appear in the final stage of the disease before the hosts die within an average of 12 to 18 days after infection [12]. ... reflect species' physiological constraints [37,38]. Therefore, correlative SDM can lead to mismatches between the native and predicted invasive distribution [37,39]. ...
Article
Chytridiomycosis is one of the greatest threats to the diversity of amphibians worldwide. Caused by the chytrid fungus Batrachochytrium salamandrivorans (Bsal), it plays a decisive role in species declines. Bsal is particularly harmful to the European fire salamander (Salamandra salamandra), causing ulcerations, anorexia and ataxia, which ultimately lead to death. While most studies have focused on the geographic expansion of the pathogen, there is little high-resolution information available. Therefore, we chose a three-step approach in this study: We (I) used a mechanistic distribution model to project the microclimatic growth rate of Bsal within its invasive range at a very high spatial resolution (25 m). We (II) used a correlative distribution model to predict the potential distribution of S. salamandra and (III) applied n-dimensional hypervolumes to quantify the realized microclimatic niches of both species and examine their overlaps. We estimated future trends based on comparisons among three climate scenarios: the current microclimatic conditions, a +2 °C and a +4 °C global mean temperature scenario. We demonstrated that Bsal finds suitable growth conditions everywhere within our study area, thus putting S. salamandra at high risk. However, climate change could lead to less suitable thermal conditions for Bsal, possibly providing a loophole for S. salamandra.
... The LVQ worked by searching for the shortest distance to the value and eliminating the noise, which could potentially interfere with the process of convergence in the forecasting system in large data (Kohonen, 1995). Model verification in this study employs several metrics including Receiver Operating Characteristics-Area Under Curve (ROC-AUC) (Shabani et al., 2016), Correlation (COR), True Skill Statistics (TSS) (Fourcade et al., 2018), Deviance (Agresti, 2018), Prevalence (Allouche et al., 2006), and Calibration (Fieberg et al., 2018). The ROC-AUC evaluates the model's ability to distinguish between presence and absence data, with values ranging from 0 to 1; an AUC>0.7 indicates good model performance. ...
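The two metrics named most often in these excerpts, AUC and TSS, can be illustrated with a minimal sketch. It is not taken from any of the cited studies: the function names `auc` and `tss`, the toy labels and scores, and the 0.5 threshold are all assumptions made for illustration. AUC is computed here in its rank-based form (the probability that a presence scores higher than an absence), and TSS as sensitivity + specificity − 1.

```python
def auc(labels, scores):
    """Rank-based AUC: share of (presence, absence) pairs in which the
    presence has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def tss(labels, scores, threshold):
    """True skill statistic at a given threshold:
    sensitivity + specificity - 1, ranging from -1 to 1."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one presence and one absence")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


# Hypothetical test data: 1 = presence, 0 = absence.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
print(auc(labels, scores))       # threshold-independent
print(tss(labels, scores, 0.5))  # threshold-dependent
```

AUC needs no threshold, whereas TSS depends on the cut-off used to turn suitability scores into presences and absences, which is why the two are usually reported together.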
Article
Climate change significantly impacts living organisms, leading to alterations in their range, distribution, and abundance. This study estimates the potential distribution of representatives of the family Musaceae, noted for their large size and importance to tropical ecosystems. We focus on Musa ingens Simmonds 1960 and employ bioclimatic variables and in situ datasets to model its species distribution. We differentiate potential distribution areas for M. ingens and present a prognostic map of its distribution under four climate change scenarios. Precipitation during the warmest quarter emerges as the primary factor influencing the spatial distribution of M. ingens. Under the RCP (Representative Concentration Pathway) 6.0 scenario, the potential distribution shows an initial decrease, followed by a significant increase by 2070. Meanwhile, the RCP 8.5 scenario indicates an increase in 2050, with a subsequent six percent decrease in 2070. Under the RCP 4.5 scenario for 2050, the species distribution shifts regionally, particularly around the Osua Trikora Mountains and the highlands of the Giluwe Mountains to Mount Victoria. By 2070, the feasible area is expected to expand. Notably, the RCP 2.6 scenario for 2070 predicts a dramatic reduction in habitable area around Mount Bintang Lestari, on the border between Indonesia and Papua New Guinea, rendering the entire lowland region of Papua uninhabitable. Consequently, a sharp decline in the population of M. ingens in this area is predicted.
... Second, our analysis was based on the adequacy of spatial proxies from a prediction accuracy point of view. When using the RF model for knowledge discovery, variables with long or infinite autocorrelation ranges, such as spatial proxies, have been identified as being beyond the prediction horizon (Behrens and Viscarra Rossel, 2020; Wadoux et al., 2020b; Fourcade et al., 2018), and variable-importance statistics in models that include these variables should be interpreted with extreme caution (Meyer et al., 2019; Wadoux et al., 2020a). Third, feature selection based on an appropriate CV scheme has been shown to be helpful in discarding irrelevant features prone to overfitting that generalize poorly to new locations, such as coordinates (Meyer et al., 2019). ...
Article
Spatial proxies, such as coordinates and distance fields, are often added as predictors in random forest (RF) models without any modifications being made to the algorithm to account for residual autocorrelation and improve predictions. However, their suitability under different predictive conditions encountered in environmental applications has not yet been assessed. We investigate (1) the suitability of spatial proxies depending on the modelling objective (interpolation vs. extrapolation), the strength of the residual spatial autocorrelation, and the sampling pattern; (2) which validation methods can be used as a model selection tool to empirically assess the suitability of spatial proxies; and (3) the effect of using spatial proxies in real-world environmental applications. We designed a simulation study to assess the suitability of RF regression models using three different types of spatial proxies: coordinates, Euclidean distance fields (EDFs), and random forest spatial prediction (RFsp). We also tested the ability of probability sampling test points, random k-fold cross-validation (CV), and k-fold nearest neighbour distance matching (kNNDM) CV to reflect the true prediction performance and correctly rank models. As real-world case studies, we modelled annual average air temperature and fine particulate air pollution for continental Spain. In the simulation study, we found that RFs with spatial proxies were poorly suited for spatial extrapolation to new areas due to significant feature extrapolation. For spatial interpolation, proxies were beneficial when both strong residual autocorrelation and regularly or randomly distributed training samples were present. In all other cases, proxies were neutral or counterproductive. Random k-fold cross-validation generally favoured models with spatial proxies even when it was not appropriate, whereas probability test samples and kNNDM CV correctly ranked models. 
In the case studies, air temperature stations were well spread within the prediction area, and measurements exhibited strong spatial autocorrelation, leading to an effective use of spatial proxies. Air pollution stations were clustered and autocorrelation was weaker and thus spatial proxies were not beneficial. As the benefits of spatial proxies are not universal, we recommend using spatial exploratory and validation analyses to determine their suitability, as well as considering alternative inherently spatial modelling approaches.
... E) Occurrence records of Ambystoma altamirani in Bosque de Agua with the Block method partition latitude following the recommendations of Radosavljevic and Anderson (2014). In this context, four geographically independent quadrants with the same number of occurrences were obtained corresponding to each corner of the geographic space (Fourcade et al. 2018). Subsequently, in each modeling stage, the model was executed without background points located in the same area as the training and testing points. ...
Article
Ambystoma altamirani is a microendemic amphibian limited to central Mexico, specifically the Bosque de Agua region in the Trans-Mexican Volcanic Belt, renowned for its endemic amphibian species. Anthropogenic activities such as land use change, water pollution, and the introduction of exotic species such as rainbow trout (Oncorhynchus mykiss) have substantially transformed its habitat, creating barriers that fragment it and impeding the mobility of the species and connectivity with other populations. This fragmentation poses challenges, including emerging diseases, inbreeding, limited gene flow, and a loss of genetic diversity, placing Ambystoma altamirani in national and international risk categories. The present study utilized the ENMeval and biomod2 models for environmental niche modeling (ENM) to assess the potential distribution of Ambystoma altamirani in the Bosque de Agua region. The key supporting variables include rivers, lakes, altitude, and a combination of Abies and Pinus forests, while the detrimental factors include urbanization and agriculture. Employing circuit theory (CT) and least-cost path (LCP) methodologies, this research explored functional connectivity, identifying core areas in the central region of Bosque de Agua. As migration distance decreases, the number of corridors facilitating population flow decreases. In the concluding phase, an analysis assessed the coincidence of state and federal Mexican Natural Protected Areas with core areas, revealing a lack of protection. The results of this study could lead to improved knowledge about Ambystoma altamirani, providing valuable tools for helping stakeholders formulate comprehensive strategies for species conservation.
... However, despite their broad applicability, SDMs have critical shortcomings associated in particular with the characteristics of input data, including their quantity and quality (Elith et al. 2002, Barry and Elith 2006, Rocchini et al. 2011, Moudrý and Šímová 2012, Davies et al. 2023). In this paper, we focus on the limitations of species occurrence data (for issues associated with environmental data, see for example Fourcade et al. 2018, Araújo et al. 2019, Moudrý et al. 2023). ...
Article
Species distribution models (SDMs) have proven valuable in filling gaps in our knowledge of species occurrences. However, despite their broad applicability, SDMs exhibit critical shortcomings due to limitations in species occurrence data. These limitations include, in particular, issues related to sample size, positional uncertainty, and sampling bias. In addition, it is widely recognised that the quality of SDMs as well as the approaches used to mitigate the impact of the aforementioned data limitations depend on species ecology. While numerous studies have evaluated the effects of these data limitations on SDM performance, a synthesis of their results is lacking. However, without a comprehensive understanding of their individual and combined effects, our ability to predict the influence of these issues on the quality of modelled species–environment associations remains largely uncertain, limiting the value of model outputs. In this paper, we review studies that have evaluated the effects of sample size, positional uncertainty, sampling bias, and species ecology on SDMs outputs. We build upon their findings to provide recommendations for the critical assessment of species data intended for use in SDMs.
... Moreover, we applied a spatial "block" cross-validation scheme (Muscarella et al., 2014): data were split into four geographically non-overlapping folds of equal numbers of occurrences, corresponding to each corner of the entire geographical space. This method has been used to assess model transferability, that is the ability to extrapolate predictions into new areas (Roberts et al., 2017), and to penalize models based on biologically meaningless predictors (Fourcade et al., 2018). Maps showing the spatial partitioning used in the spatial "block" cross-validation scheme can be seen in Supplementary Material 2. The predictive performance of each model was assessed by measuring the area under the receiver operating characteristic curve (AUC; Hanley and McNeil, 1982) and the true skill statistic (TSS; Allouche et al., 2006). ...
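The four-corner "block" partition described in this excerpt can be sketched as follows. This is a minimal illustration under stated assumptions, not the ENMeval implementation: it assumes occurrences are first split at the median latitude and each half is then split at its own median longitude, and the names `block_partition` and `train_test_splits` as well as the toy coordinates are hypothetical.

```python
def block_partition(points):
    """Assign each (lon, lat) occurrence to one of four spatial folds.

    Assumed scheme: split at the median latitude, then split each
    latitudinal band at its own median longitude, so the four folds
    hold (as nearly as possible) equal numbers of occurrences.
    """
    n = len(points)
    by_lat = sorted(range(n), key=lambda i: points[i][1])
    half = n // 2
    bands = (by_lat[:half], by_lat[half:])  # south, north
    folds = [None] * n
    for band_id, band in enumerate(bands):
        by_lon = sorted(band, key=lambda i: points[i][0])
        mid = len(by_lon) // 2
        for i in by_lon[:mid]:
            folds[i] = 2 * band_id          # western block
        for i in by_lon[mid:]:
            folds[i] = 2 * band_id + 1      # eastern block
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), holding out one block at a time."""
    for k in sorted(set(folds)):
        train = [i for i, f in enumerate(folds) if f != k]
        test = [i for i, f in enumerate(folds) if f == k]
        yield train, test


# Hypothetical occurrences as (longitude, latitude) pairs.
occurrences = [(0, 0), (1, 0), (10, 1), (11, 1),
               (0, 10), (1, 11), (10, 10), (11, 11)]
folds = block_partition(occurrences)
for train, test in train_test_splits(folds):
    print("train:", train, "test:", test)
```

Because each held-out block occupies its own corner of the study area, test points are geographically separated from most training points, which is what lets this scheme probe transferability rather than interpolation.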
... This metric is commonly used to assess the accuracy and predictive power of models (Fourcade et al., 2018; Freer et al., 2022; Proosdij et al., 2016; Zhang et al., 2018), and a model with an AUC value of ≥0.7 is considered to have a high level of predictive performance (Kindt, 2018; Mudereri et al., 2021). The COR represents the correlation between the observations in the presence-absence dataset and the corresponding predictions. ...
Article
Knowing the impacts of global climate change on the habitat suitability distribution of Limassolla leafhoppers contributes to understanding the feedback of organisms on climate change from a macroecological perspective, and provides an important scientific basis for protecting the ecological environment and biodiversity. However, there is limited knowledge on this aspect. Thus, our study aimed to address this gap by analyzing Asian habitat suitability and centroid shifts of Limassolla based on 19 bioclimatic variables and occurrence records. Selecting five ecological niche models with outstanding predictive performance (Maxlike, generalized linear model, generalized additive model, random forest, and maximum entropy) along with their ensemble model from 12 models, the current habitat suitability of Limassolla and its future habitat suitability under two Shared Socio‐economic Pathways (SSP1‐2.6 and SSP5‐8.5) in the 2050s and 2090s were predicted. The results showed that the prediction results of the five models are generally consistent. Based on the ensemble model, 11 potential biodiversity hotspots with high suitability were identified. With climate change, the suitable range of Limassolla will experience both expansion and contraction. In SSP5‐8.5 2050s, the expansion area is 118.56 × 10⁴ km², while the contraction area is 25.40 × 10⁴ km²; in SSP1‐2.6 2090s, the expansion area is 91.71 × 10⁴ km², and the contraction area is 26.54 × 10⁴ km². Furthermore, the distribution core of Limassolla will shift toward higher latitudes in the northeast direction, and the precipitation of warmest quarter was found to have the greatest impact on the distribution of Limassolla. Our research results supported our four hypotheses. Finally, this research suggests establishing ecological reserves in identified contraction areas to prevent habitat loss, enhancing the protection of biodiversity hotspots, and pursuing a sustainable development path with reduced emissions.
... Future distributions of both native and invasive species are difficult to predict (Fourcade et al., 2018;Gallien et al., 2010;Rumpf et al., 2018). Our work suggests that spread (and the predictability of spread) will be strongly influenced by both dispersal evolution and adaptation to environmental gradients. ...
Article
Rapid evolution of increased dispersal at the edge of a range expansion can accelerate invasions. However, populations expanding across environmental gradients often face challenging environments that reduce fitness of dispersing individuals. We used an eco‐evolutionary model to explore how environmental gradients influence dispersal evolution and, in turn, modulate the speed and predictability of invasion. Environmental gradients opposed evolution of increased dispersal during invasion, even leading to evolution of reduced dispersal along steeper gradients. Counterintuitively, reduced dispersal could allow for faster expansion by minimizing maladaptive gene flow and facilitating adaptation. While dispersal evolution across homogenous landscapes increased both the mean and variance of expansion speed, these increases were greatly dampened by environmental gradients. We illustrate our model's potential application to prediction and management of invasions by parameterizing it with data from a recent invertebrate range expansion. Overall, we find that environmental gradients strongly modulate the effect of dispersal evolution on invasion trajectories.
... Most evaluations of SDMs rely on contemporary occurrence datasets (presence-absence or presence-background data) for model validation. Since independent data sets are difficult to acquire, quasi-independence can be 'enforced' on the data by using spatial or temporal cross-validation (Araújo et al. 2005, Roberts et al. 2017, Fourcade et al. 2018, Liu et al. 2020). Cross-validation is well-suited for evaluating the accuracy of models focused on present-day distributions, but less useful for comparing hindcasted or forecasted SDMs. ...
Article
Climate change poses a threat to biodiversity, and it is unclear whether species can adapt to or tolerate new conditions, or migrate to areas with suitable habitats. Reconstructions of range shifts that occurred in response to environmental changes since the last glacial maximum (LGM) from species distribution models (SDMs) can provide useful data to inform conservation efforts. However, different SDM algorithms and climate reconstructions often produce contrasting patterns, and validation methods typically focus on accuracy in recreating current distributions, limiting their relevance for assessing predictions to the past or future. We modeled historically suitable habitat for the threatened North American tree green ash Fraxinus pennsylvanica using 24 SDMs built using two climate models, three calibration regions, and four modeling algorithms. We evaluated the SDMs using contemporary data with spatial block cross‐validation and compared the relative support for alternative models using a novel integrative method based on coupled demographic‐genetic simulations. We simulated genomic datasets using habitat suitability of each of the 24 SDMs in a spatially‐explicit model. Approximate Bayesian computation (ABC) was then used to evaluate the support for alternative SDMs through comparisons to an empirical population genomic dataset. Models had very similar performance when assessed with contemporary occurrences using spatial cross‐validation, but ABC model selection analyses consistently supported SDMs based on the CCSM climate model, an intermediate calibration extent, and the generalized linear modeling algorithm. Finally, we projected the future range of green ash under four climate change scenarios. Future projections using the SDMs selected via ABC suggest only minor shifts in suitable habitat for this species, while some of those that were rejected predicted dramatic changes. 
Our results highlight the different inferences that may result from the application of alternative distribution modeling algorithms and provide a novel approach for selecting among a set of competing SDMs with independent data.
... Species distribution models assess the niche requirements of species by correlating occurrence records with environmental variables and ultimately produce suitability maps. These maps are a widely used tool for delineating species conservation regions (Esselman & Allan, 2011;Fourcade et al., 2018). However, species distribution models assume that the niche of species and biotic interactions remain constant over time, which may be unrealistic, especially for long time scales (Roberts & Hamann, 2015). ...
Article
Climatic change is a challenge for plant conservation due to plants' limited dispersal abilities. The survival and sustainable development of plants directly depend on the availability of suitable habitats. In this study, we employed an optimized MaxEnt model to evaluate the relative contribution of each environmental variable and predict the suitable habitat under past, current, and future periods for Alsophila costularis, an endangered relict tree fern known as a living fossil. For the Last Glacial Maximum (LGM) and Mid‐Holocene scenarios, we adopted two atmosphere–ocean general circulation models: CCSM4 and MIROC‐ESM. The BCC‐CSM2‐MR model was used for future projections. The results revealed that temperature annual range (Bio7) contributed most to the model construction with an optimal range of 13.74–22.44°C. Species distribution modeling showed that current suitable areas were mainly located in most areas of Yunnan, most areas of Hainan, most areas of Taiwan, southeastern Tibet, southwestern Guizhou, western Guangxi, southern Sichuan, and southern Guangdong, with an area of 35.90 × 10⁴ km². The suitable habitat area expanded northward in Yunnan from the Last Interglacial to the LGM under the CCSM4 model, while a significant contraction toward southwestern Yunnan was found under the MIROC‐ESM model. Furthermore, the potential distributions during the Mid‐Holocene were more widespread in Yunnan compared to those under the current period. It is predicted that in the future, the range will significantly expand to northern Yunnan and western Guizhou. Almost all centroids of suitable habitats were distributed in southeastern Yunnan under different periods. The stable areas were located in southwestern Yunnan in all scenarios. The simulation results could provide a theoretical basis for the formulation of reasonable conservation and management measures to mitigate the effects of future climate change for A. costularis.
... Since robust predictors with low uncertainty and high quality are fundamental for obtaining reliable SDMs (Fourcade et al., 2018), the key advantage of digital soil maps is that they are spatially explicit and can cover soil variation at greater extents than is feasible with measured soil profile data. Therefore, modelled digital soil maps add value to 'in-situ' measured data sets. ...
... SDMs are based on single sources of empirical data, and this can be limiting, with estimated relationships within the model and specific parameters only as good as the data underpinning them (Fourcade et al., 2018;Guillera-Arroita et al., 2015). ...
Article
Species distribution modelling is a highly used tool for understanding and predicting biodiversity change, and recent work has emphasised the importance of understanding how species distributions change over both time and space. Spatio‐temporal models require large amounts of data spread over time and space, and as such are clear candidates to benefit from model‐based integration of different data sources. However, spatio‐temporal models are highly computationally intensive and integrating different data sources can make this approach even more unfeasible to ecologists. Here we demonstrate how the R‐INLA methodology can be used for model‐based data integration for spatio‐temporally explicit modelling of species distribution change. We demonstrate that this method can be applied to both point and areal data with two contrasting case studies, one using the SPDE approach for modelling spatio‐temporal change in the Gatekeeper butterfly (Pyronia tithonus) across Great Britain and the second using a spatio‐temporal areal model to describe change in caddisfly (Trichoptera) populations across the River Thames catchment. We show that in the caddisfly case study integrating together different data sources led to greater understanding of the change in abundance across the River Thames both seasonally and over 5 years of data. However, in the butterfly case study moving to a spatio‐temporal context exacerbated differences between the data sources and resulted in no greater ecological insight into change in the Gatekeeper population. Our work provides a computationally feasible framework for spatio‐temporally explicit integration of data within SDMs and demonstrates both the potential benefits and the challenges in applying this methodology to real ecological data.
... To ensure independence between the training and test data, we implemented the ENMeval "block" standard, which divides the distribution points according to their longitude and latitude following the recommendations of Radosavljevic and Anderson (2014). In this context, four geographically independent quadrants with the same number of occurrences were obtained corresponding to each corner of the geographic space (Fourcade et al. 2018). Subsequently, in each modeling stage, the model ran without background points located in the same area as the training and testing points. ...
Preprint
Ambystoma altamirani is a microendemic amphibian limited to central Mexico, specifically the Bosque de Agua region in the Trans-Mexican Volcanic Belt, renowned for its endemic amphibian species. Anthropogenic activities such as land use change, water pollution, and the introduction of exotic species such as rainbow trout (Oncorhynchus mykiss) have substantially transformed its habitat, creating barriers that fragment it and impeding the mobility of the species and connectivity with other populations. This fragmentation poses challenges, including emerging diseases, inbreeding, limited gene flow, and a loss of genetic diversity, placing Ambystoma altamirani in national and international risk categories. The present study utilized the ENMeval and biomod2 models for environmental niche modeling (ENM) to assess the potential distribution of Ambystoma altamirani in the Bosque de Agua region. The key supporting variables include rivers, lakes, altitude, and a combination of Abies and Pinus forests, while the detrimental factors include urbanization and agriculture. Employing circuit theory (CT) and least-cost path (LCP) methodologies, this research explored structural connectivity, identifying core areas in the central region of Bosque de Agua. As migration distance decreases, the number of corridors facilitating population flow decreases. In the concluding phase, an analysis assessed the coincidence of state and federal Mexican Natural Protected Areas with core areas, revealing a lack of protection. The results of this study could lead to improved knowledge about Ambystoma altamirani, providing valuable tools for helping stakeholders formulate comprehensive strategies for species conservation.
... We therefore randomly undersampled pseudo-absences to match the number of observations. As predictors, we used all 19 bioclimatic variables from WorldClim ver. 2 (Fourcade et al. 2018), which were centered and standardized. ...
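The two preprocessing steps mentioned in this excerpt, undersampling pseudo-absences to match the number of observations and centering and standardizing predictors, might look like the sketch below. It is an illustration only and does not reproduce the cited package's code: the helper names `undersample` and `standardize`, the fixed seed, and the toy values under the key "bio1" are assumptions.

```python
import random
import statistics


def undersample(presences, pseudo_absences, seed=0):
    """Randomly keep as many pseudo-absences as there are presences."""
    rng = random.Random(seed)  # fixed seed only for reproducibility
    k = min(len(presences), len(pseudo_absences))
    return rng.sample(pseudo_absences, k)


def standardize(columns):
    """Center each predictor on its mean and scale by its sample
    standard deviation (z-scores)."""
    out = {}
    for name, values in columns.items():
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)
        if sd == 0:
            raise ValueError(f"predictor {name!r} is constant")
        out[name] = [(v - mean) / sd for v in values]
    return out


# Hypothetical record identifiers and predictor values.
presences = ["p1", "p2", "p3"]
pseudo_absences = [f"a{i}" for i in range(100)]
kept = undersample(presences, pseudo_absences)
print(len(kept))  # same count as presences

scaled = standardize({"bio1": [1.0, 2.0, 3.0]})
print(scaled["bio1"])
```

Balancing the two classes and putting predictors on a common scale are general habits when training neural networks; whether a given study used the sample or the population standard deviation is not stated in the excerpt.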
Article
Deep neural networks (DNN) have become a central method in ecology. To build and train DNNs in deep learning (DL) applications, most users rely on one of the major deep learning frameworks, in particular PyTorch or TensorFlow. Using these frameworks, however, requires substantial experience and time. Here, we present ‘cito', a user‐friendly R package for DL that allows specifying DNNs in the familiar formula syntax used by many R packages. To fit the models, ‘cito' takes advantage of the numerically optimized ‘torch' library, including the ability to switch between training models on the CPU or the graphics processing unit (GPU) which allows the efficient training of large DNNs. Moreover, ‘cito' includes many user‐friendly functions for model plotting and analysis, including explainable AI (xAI) metrics for effect sizes and variable importance. All xAI metrics as well as predictions can optionally be bootstrapped to generate confidence intervals, including p‐values. To showcase a typical analysis pipeline using ‘cito', with its built‐in xAI features, we built a species distribution model of the African elephant. We hope that by providing a user‐friendly R framework to specify, deploy and interpret DNNs, ‘cito' will make this interesting class of models more accessible to ecological data analysis. A stable version of ‘cito' can be installed from the comprehensive R archive network (CRAN).
Article
Wood density is a critical control on tree biomass, so poor understanding of its spatial variation can lead to large and systematic errors in forest biomass estimates and carbon maps. The need to understand how and why wood density varies is especially critical in tropical America where forests have exceptional species diversity and spatial turnover in composition. As tree identity and forest composition are challenging to estimate remotely, ground surveys are essential to know the wood density of trees, whether measured directly or inferred from their identity. Here, we assemble an extensive dataset of variation in wood density across the most forested and tree-diverse continent, examine how it relates to spatial and environmental variables, and use these relationships to predict spatial variation in wood density over tropical and sub-tropical South America. Our analysis refines previously identified east-west Amazon gradients in wood density, improves them by revealing fine-scale variation, and extends predictions into Andean, dry, and Atlantic forests. The results halve biomass prediction errors compared to a naïve scenario with no knowledge of spatial variation in wood density. Our findings will help improve remote sensing-based estimates of aboveground biomass carbon stocks across tropical South America.
Article
Aim The changing frequency and intensity of climatic extremes due to climate change can have sudden and adverse impacts on the distribution of species. While species distribution modelling is a vital tool in ecological applications, current approaches fail to fully capture the distribution of climatic extremes, particularly of rare events with the most disruptive potential. Especially at the edges of species' ranges, where conditions are already less favourable, predictions might be inaccurate when these extremes are not well represented. Location Europe. Taxon Tree species. Methods We present a novel approach to integrate extreme events into species distribution models based on the generalised extreme value (GEV) distribution. This distribution, which follows from extreme value theory, has been established as a valuable tool in analysing climatic extremes, both in an ecological context and beyond. The approach relying on the GEV distribution is broadly applicable, readily transferable across species and relies on widely available data. We demonstrate the efficacy of our approach for 28 European tree species, illustrating its superior ability to fully capture the distribution of climatic extremes compared to state‐of‐the‐art methods. Results We found that incorporating parameters on climatic extremes derived from the GEV distribution increased model performance (AIC_model) and characterised range edges more accurately (AUC_edge) compared to competing approaches. However, general AUC values were only marginally increased across the species and study period analysed. Overall, the GEV model predicted a narrower niche for the species included in this study. Main Conclusions Incorporating climatic extremes can impact spatial predictions of species distribution models, especially at range margins. We found that using the GEV distribution to characterise extreme variables in SDMs yields the best performance at these distribution edges.
Given the importance of range edges for species conservation, a detailed inclusion of extremes in SDMs employed for those applications will help ensure robust conclusions.
Article
Wildfires are natural phenomena that have shaped ecosystems and maintained biodiversity for millions of years. However, the increased frequency and severity of wildfires in recent decades are predominantly attributed to human activities. These anthropogenic factors, including land use change, climate change, and fire suppression, have disrupted the natural fire regime and heightened the risk of large-scale, destructive wildfires. Reptiles, as ectothermic and often slow-moving animals, are particularly vulnerable to the effects of fires due to their limited mobility and reliance on specific microhabitats. Understanding the impacts of wildfires on reptile populations is crucial for their effective conservation and management in fire-prone areas. This paper focuses on Phrynosoma orbiculare, a species distributed across the northern and southeastern regions of Mexico, where wildfires are common. The study revealed that key environmental variables driving the distribution of P. orbiculare include altitude, temperature extremes, and forest composition, while fire occurrence is strongly influenced by climatic conditions such as temperature and precipitation. As fires become more frequent and severe, the niche overlap between P. orbiculare and fire-prone regions is expected to expand. These findings highlight the importance of integrating fire management into conservation planning, particularly for protecting fire-sensitive ecosystems like Abies forests. Understanding the complex interaction between fire and species distributions is essential for developing effective conservation strategies that ensure the survival of P. orbiculare and other fire-sensitive species in Mexico’s changing landscapes.
Article
In the context of changes in global climate and land uses, biodiversity patterns and plant species distributions have been significantly affected. Soil salinization is a growing problem, particularly in the arid areas of Northwest China. Halophytes are ideal for restoring soil salinization because of their adaptability to salt stress. In this study, we collected the current and future bioclimatic data released by the WorldClim database, along with soil data from the Harmonized World Soil Database (v1.2) and A Big Earth Data Platform for Three Poles. Using the maximum entropy (MaxEnt) model, the potential suitable habitats of six halophytic plant species (Halostachys caspica (Bieb.) C. A. Mey., Halogeton glomeratus (Bieb.) C. A. Mey., Kalidium foliatum (Pall.) Moq., Halocnemum strobilaceum (Pall.) Bieb., Salicornia europaea L., and Suaeda salsa (L.) Pall.) were assessed under the current climate conditions (average for 1970–2000) and future (2050s, 2070s, and 2090s) climate scenarios (SSP245 and SSP585, where SSP is the Shared Socio-economic Pathway). The results revealed that all six halophytic plant species exhibited the area under the receiver operating characteristic curve values higher than 0.80 based on the MaxEnt model, indicating the excellent performance of the MaxEnt model. The suitability of the six halophytic plant species significantly varied across regions in the arid areas of Northwest China. Under different future climate change scenarios, the suitable habitat areas for the six halophytic plant species are expected to increase or decrease to varying degrees. As global warming progresses, the suitable habitat areas of K. foliatum, S. salsa, and H. strobilaceum exhibited an increasing trend. In contrast, the suitable habitat areas of H. glomeratus, S. europaea, and H. caspica showed an opposite trend. 
Furthermore, considering the ongoing global warming trend, the centroids of the suitable habitat areas for various halophytic plant species would migrate to different degrees, and four halophytic plant species, namely, S. salsa, H. strobilaceum, H. glomeratus, and H. caspica, would migrate to higher latitudes. Temperature, precipitation, and soil factors affected the possible distribution ranges of these six halophytic plant species. Among them, precipitation seasonality (coefficient of variation), precipitation of the warmest quarter, mean temperature of the warmest quarter, and exchangeable Na+ significantly affected the distribution of halophytic plant species. Our findings are critical to comprehending and predicting the impact of climate change on ecosystems. The findings of this study hold significant theoretical and practical implications for the management of soil salinization and for the utilization, protection, and management of halophytes in the arid areas of Northwest China.
Preprint
1. The fundamental unit of spatial ecology is a species range: the geographic area that it occupies. Species ranges are delineated by range edges (also known as boundaries or limits). Why range edges occur where they do and not elsewhere, and what makes them move, has been an active area of research since the 19th century. In the present day, range edge dynamics are an important metric of biodiversity’s response to climate change, as species shift toward the poles to track their climatic niches. Yet methods for measuring range edges and quantifying their displacement have never been formalized. 2. Here I described common methods for describing range edge positions and applied them to example data for a bird species and a fish species, using some of the most popular datasets in climate biogeography: the Audubon Society Christmas Bird Count and a National Oceanic and Atmospheric Administration bottom trawl survey. 3. I showed that the choice of range edge metric influences where range edge positions are estimated to occur; whether they are estimated to be shifting over time; and the estimated rate of shift. The lack of universal metrics for range edges has likely shaped statistics reported in synthesis studies that measured overall biodiversity responses to climate change and global rates of range shifts. Through simulation, I found that reliably detecting range edge shifts may require decades of data or more, suggesting that many global change studies in this field are underpowered. 4. Pairing metrics to research questions, sharing raw data and code, and conducting power analyses before reporting statistically significant results will all help to minimize this issue. Going forward, the field of biogeography should confront the degree to which ad hoc methods have influenced our understanding of range edge dynamics, and move toward universally accepted metrics.
Article
Environmental niche modeling (ENM) is commonly used to develop probabilistic maps of species distribution. Among available ENM techniques, MaxEnt has become one of the most popular tools for modeling species distribution, with hundreds of peer-reviewed articles published each year. MaxEnt’s popularity is mainly due to the use of a graphical interface and automatic parameter configuration capabilities. However, recent studies have shown that using the default automatic configuration may not be always appropriate because it can produce non-optimal models; particularly when dealing with a small number of species presence points. Thus, the recommendation is to evaluate the best potential combination of parameters (feature classes and regularization multiplier) to select the most appropriate model. In this work we reviewed 244 articles published between 2013 and 2015 to assess whether researchers are following recommendations to avoid using the default parameter configuration when dealing with small sample sizes, or if they are using MaxEnt as a “black box tool.” Our results show that in only 16% of analyzed articles authors evaluated best feature classes, in 6.9% evaluated best regularization multipliers, and in a meager 3.7% evaluated simultaneously both parameters before producing the definitive distribution model. We analyzed 20 articles to quantify the potential differences in resulting outputs when using software default parameters instead of the alternative best model. Results from our analysis reveal important differences between the use of default parameters and the best model approach, especially in the total area identified as suitable for the assessed species and the specific areas that are identified as suitable by both modelling approaches. These results are worrying, because publications are potentially reporting over-complex or over-simplistic models that can undermine the applicability of their results. 
Of particular importance are studies used to inform policy making. Therefore, researchers, practitioners, reviewers and editors need to be very judicious when dealing with MaxEnt, particularly when the modelling process is based on small sample sizes.
Article
Ecological data often show temporal, spatial, hierarchical (random effects), or phylogenetic structure. Modern statistical approaches are increasingly accounting for such dependencies. However, when performing cross-validation, these structures are regularly ignored, resulting in serious underestimation of predictive error. One cause for the poor performance of uncorrected (random) cross-validation, noted often by modellers, are dependence structures in the data that persist as dependence structures in model residuals, violating the assumption of independence. Even more concerning, because often overlooked, is that structured data also provides ample opportunity for overfitting with non-causal predictors. This problem can persist even if remedies such as autoregressive models, generalized least squares, or mixed models are used. Block cross-validation, where data are split strategically rather than randomly, can address these issues. However, the blocking strategy must be carefully considered. Blocking in space, time, random effects or phylogenetic distance, while accounting for dependencies in the data, may also unwittingly induce extrapolations by restricting the ranges or combinations of predictor variables available for model training, thus overestimating interpolation errors. On the other hand, deliberate blocking in predictor space may also improve error estimates when extrapolation is the modelling goal. Here, we review the ecological literature on non-random and blocked cross-validation approaches. We also provide a series of simulations and case studies, in which we show that, for all instances tested, block cross-validation is nearly universally more appropriate than random cross-validation if the goal is predicting to new data or predictor space, or for selecting causal predictors. 
We recommend that block cross-validation be used wherever dependence structures exist in a dataset, even if no correlation structure is visible in the fitted model residuals, or if the fitted models account for such correlations.
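To make the contrast with random splitting concrete, the sketch below assigns records to cross-validation folds by spatial block rather than individually. It is a minimal illustration under stated assumptions, not the procedure used in the article: the function name, the square grid in coordinate units and the random assignment of blocks to folds are all hypothetical choices.

```python
import numpy as np

def spatial_block_folds(lon, lat, block_size, k, seed=0):
    """Hypothetical helper: give every record in a grid block the same fold."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    # Index of the square grid cell (block) that contains each record.
    col = np.floor((lon - lon.min()) / block_size).astype(int)
    row = np.floor((lat - lat.min()) / block_size).astype(int)
    block_id = row * (col.max() + 1) + col
    # Shuffle the occupied blocks, then deal them out to the k folds in turn.
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.unique(block_id))
    fold_of_block = {b: i % k for i, b in enumerate(shuffled)}
    return np.array([fold_of_block[b] for b in block_id])
```

Because neighbouring records share a block, they are never split between training and test sets, which is the property that random k-fold partitioning lacks. The block size has to be chosen with the range of spatial dependence in mind, and, as the abstract warns, large blocks may force the model to extrapolate.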
Article
The area under the receiver operating characteristic (ROC) curve, known as the AUC, is currently considered to be the standard method to assess the accuracy of predictive distribution models. It avoids the supposed subjectivity in the threshold selection process, when continuous probability derived scores are converted to a binary presence–absence variable, by summarizing overall model performance over all possible thresholds. In this manuscript we review some of the features of this measure and bring into question its reliability as a comparative measure of accuracy between model results. We do not recommend using AUC for five reasons: (1) it ignores the predicted probability values and the goodness-of-fit of the model; (2) it summarises the test performance over regions of the ROC space in which one would rarely operate; (3) it weights omission and commission errors equally; (4) it does not give information about the spatial distribution of model errors; and, most importantly, (5) the total extent to which models are carried out highly influences the rate of well-predicted absences and the AUC scores.
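Reason (1) can be checked numerically. The sketch below computes the AUC as the proportion of presence and absence pairs that the model ranks correctly, which is the standard rank-based (Mann-Whitney) formulation. The function name and the toy scores are assumptions made for illustration.

```python
import numpy as np

def auc(scores_pres, scores_abs):
    """AUC as the probability that a presence outscores an absence."""
    pres = np.asarray(scores_pres, dtype=float)[:, None]
    absn = np.asarray(scores_abs, dtype=float)[None, :]
    wins = (pres > absn).sum()   # correctly ordered pairs
    ties = (pres == absn).sum()  # tied pairs count for one half
    return (wins + 0.5 * ties) / (pres.size * absn.size)

# Toy scores (hypothetical): cubing them changes every predicted value
# but not their order, so the AUC is identical.
p = np.array([0.9, 0.3])
a = np.array([0.5, 0.1])
print(auc(p, a), auc(p ** 3, a ** 3))
```

Any strictly increasing transformation of the predictions leaves the AUC unchanged, so a poorly calibrated model and a well calibrated one can receive the same score.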
Article
Spatial thinning of species occurrence records can help address problems associated with spatial sampling biases. Ideally, thinning removes the fewest records necessary to substantially reduce the effects of sampling bias, while simultaneously retaining the greatest amount of useful information. Spatial thinning can be done manually; however, this is prohibitively time consuming for large datasets. Using a randomization approach, the ‘thin’ function in the spThin R package returns a dataset with the maximum number of records for a given thinning distance, when run for sufficient iterations. We here provide a worked example for the Caribbean spiny pocket mouse, where the results obtained match those of manual thinning.
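A randomized thinning procedure of this kind can be sketched as below. This is an illustration in Python and not the spThin implementation: it assumes planar coordinates with Euclidean distances, and the function names, the rule of dropping a record with the most close neighbours, and the default number of repetitions are all assumptions.

```python
import numpy as np

def thin_once(xy, min_dist, rng):
    """One randomized pass: drop records until none are closer than min_dist."""
    xy = np.asarray(xy, dtype=float)
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2))
    close = d < min_dist
    np.fill_diagonal(close, False)
    keep = np.ones(len(xy), dtype=bool)
    while True:
        # Number of retained neighbours within min_dist of each retained record.
        counts = (close & keep[None, :]).sum(axis=1) * keep
        if counts.max(initial=0) == 0:
            break
        # Remove one of the most crowded records, chosen at random among ties.
        worst = np.flatnonzero(counts == counts.max())
        keep[rng.choice(worst)] = False
    return np.flatnonzero(keep)

def thin(xy, min_dist, reps=50, seed=0):
    """Repeat the randomized pass and return the largest retained index set."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(reps):
        kept = thin_once(xy, min_dist, rng)
        if best is None or len(kept) > len(best):
            best = kept
    return best
```

Repeating the pass matters because random tie-breaking can lead to different retained sets; keeping the largest one mirrors the abstract's point that enough iterations are needed to retain the maximum number of records.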
Article
Choice of variables, climate models and emissions scenarios all influence the results of species distribution models under future climatic conditions. However, an overview of applied studies suggests that the uncertainty associated with these factors is not always appropriately incorporated or even considered. We examine the effects that the choice of variables, climate models and emissions scenarios can have on future species distribution models using two endangered species: one a short-lived invertebrate species (Ptunarra Brown Butterfly), and the other a long-lived paleo-endemic tree species (King Billy Pine). We show the range in projected distributions that result from different variable selection, climate models and emissions scenarios. The extent to which results are affected by these choices depends on the characteristics of the species modelled, but they all have the potential to substantially alter conclusions about the impacts of climate change. We discuss implications for conservation planning and management, and provide recommendations to conservation practitioners on variable selection and accommodating uncertainty when using future climate projections in species distribution models.
Article
Recent studies have demonstrated a need for increased rigour in building and evaluating ecological niche models (ENMs) based on presence‐only occurrence data. Two major goals are to balance goodness‐of‐fit with model complexity (e.g. by ‘tuning’ model settings) and to evaluate models with spatially independent data. These issues are especially critical for data sets suffering from sampling bias, and for studies that require transferring models across space or time (e.g. responses to climate change or spread of invasive species). Efficient implementation of procedures to accomplish these goals, however, requires automation. We developed ENMeval, an R package that: (i) creates data sets for k‐fold cross‐validation using one of several methods for partitioning occurrence data (including options for spatially independent partitions), (ii) builds a series of candidate models using Maxent with a variety of user‐defined settings and (iii) provides multiple evaluation metrics to aid in selecting optimal model settings. The six methods for partitioning data are n − 1 jackknife, random k‐folds (= bins), user‐specified folds and three methods of masked geographically structured folds. ENMeval quantifies six evaluation metrics: the area under the curve of the receiver‐operating characteristic plot for test localities (AUC_TEST), the difference between training and testing AUC (AUC_DIFF), two different threshold‐based omission rates for test localities and the Akaike information criterion corrected for small sample sizes (AICc). We demonstrate ENMeval by tuning model settings for eight tree species of the genus Coccoloba in Puerto Rico based on AICc. Evaluation metrics varied substantially across model settings, and models selected with AICc differed from default ones. In summary, ENMeval facilitates the production of better ENMs and should promote future methodological research on many outstanding issues.
Article
Over the past few decades, there has been a rapid proliferation of statistical methods that infer evolutionary and ecological processes from data on species distributions. These methods have led to considerable new insights, but they often fail to account for the effects of historical biogeography on present-day species distributions. Because the geography of speciation can lead to patterns of spatial and temporal autocorrelation in the distributions of species within a clade, this can result in misleading inferences about the importance of deterministic processes in generating spatial patterns of biodiversity. In this opinion article, we discuss ways in which patterns of species distributions driven by historical biogeography are often interpreted as evidence of particular evolutionary or ecological processes. We focus on three areas that are especially prone to such misinterpretations: community phylogenetics, environmental niche modelling, and analyses of beta diversity (compositional turnover of biodiversity).
Article
MAXENT is now a common species distribution modeling (SDM) tool used by conservation practitioners for predicting the distribution of a species from a set of records and environmental predictors. However, datasets of species occurrence used to train the model are often biased in the geographical space because of unequal sampling effort across the study area. This bias may be a source of strong inaccuracy in the resulting model and could lead to incorrect predictions. Although a number of sampling bias correction methods have been proposed, there is no consensual guideline to account for it. We compared here the performance of five methods of bias correction on three datasets of species occurrence: one "virtual" derived from a land cover map, and two actual datasets for a turtle (Chrysemys picta) and a salamander (Plethodon cylindraceus). We subjected these datasets to four types of sampling biases corresponding to potential types of empirical biases. We applied five correction methods to the biased samples and compared the outputs of distribution models to unbiased datasets to assess the overall correction performance of each method. The results revealed that the ability of methods to correct the initial sampling bias varied greatly depending on bias type, bias intensity and species. However, the simple systematic sampling of records consistently ranked among the best performing across the range of conditions tested, whereas other methods performed more poorly in most cases. The strong effect of initial conditions on correction performance highlights the need for further research to develop a step-by-step guideline to account for sampling bias. However, this method seems to be the most efficient in correcting sampling bias and should be advised in most cases.
Article
Predicting how species distributions might shift as global climate changes is fundamental to the successful adaptation of conservation policy. An increasing number of studies have responded to this challenge by using climate envelopes, modeling the association between climate variables and species distributions. However, it is difficult to quantify how well species actually match climate. Here, we use null models to show that species–climate associations found by climate envelope methods are no better than chance for 68 of 100 European bird species. In line with predictions, we demonstrate that the species with distribution limits determined by climate have more northerly ranges. We conclude that scientific studies and climate change adaptation policies based on the indiscriminate use of climate envelope methods irrespective of species sensitivity to climate may be misleading and in need of revision. Keywords: bioclimatic niche, global change, null models, ornithology, species distribution
Article
Aim Species distribution models have been widely used to tackle ecological, evolutionary and conservation problems. Most species distribution modelling techniques produce continuous suitability predictions, but many real applications (e.g. reserve design, species invasion and climate change impact assessment) and model evaluations require binary outputs, and thresholds are needed for these transformations. Although there are many threshold selection methods for presence/absence data, it is unclear whether these are suitable for presence‐only data. In this paper, we investigate mathematically and empirically which of the existing threshold selection methods can be used confidently with presence‐only data. Location We used real spatially explicit environmental data derived from the western part of the state of Victoria, south‐eastern Australia, and simulated species distributions within this area. Methods Thirteen existing threshold selection methods were investigated mathematically to see whether the same threshold can be produced using either presence/absence data or presence‐only data. We further adopted a simulation approach, created many virtual species with differing prevalences in a real landscape in south‐eastern Australia, generated data sets with different proportions of pseudo‐absences, built eight types of models with four modelling techniques, and investigated the behaviours of four threshold selection methods in these situations. Results Three threshold selection methods were not affected by pseudo‐absences, including max SSS (which is based on maximizing the sum of sensitivity and specificity), the prevalence of model training data and the mean predicted value of a set of random points. Max SSS produced higher sensitivity in most cases and higher true skill statistic and kappa in many cases than the other methods. The other methods produced different thresholds from presence‐only data to those determined from presence/absence data.
Main conclusions Max SSS is a promising method for threshold selection when only presence data are available.
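The max SSS rule named in this abstract can be illustrated with a short sketch. The code below is an assumption-laden example rather than the authors' implementation: it treats a site as predicted present when its score is at or above the threshold, searches only the observed scores as candidate thresholds, and keeps the lowest threshold when several tie. It also returns the true skill statistic (TSS), which equals sensitivity plus specificity minus one.

```python
import numpy as np

def max_sss_threshold(scores_pres, scores_abs):
    """Threshold maximizing sensitivity + specificity, and the TSS it yields."""
    pres = np.asarray(scores_pres, dtype=float)
    absn = np.asarray(scores_abs, dtype=float)
    best_t, best_sum = None, -np.inf
    # Candidate thresholds: every distinct observed score, in ascending order.
    for t in np.unique(np.concatenate([pres, absn])):
        sensitivity = np.mean(pres >= t)  # presences predicted present
        specificity = np.mean(absn < t)   # absences predicted absent
        if sensitivity + specificity > best_sum:  # strict: lowest tie is kept
            best_t, best_sum = t, sensitivity + specificity
    return best_t, best_sum - 1.0  # TSS = sensitivity + specificity - 1

# Hypothetical suitability scores at presences and (pseudo-)absences.
t, tss = max_sss_threshold([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(t, tss)
```

With these toy scores the rule selects 0.4, giving a sensitivity of 1, a specificity of 0.75 and a TSS of 0.75.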
Article
Correlative species distribution models are frequently used to predict species’ range shifts under climate change. However, climate variables often show high collinearity and most statistical approaches require the selection of one among strongly correlated variables. When causal relationships between species presence and climate parameters are unknown, variable selection is often arbitrary, or based on predictive performance under current conditions. While this should only marginally affect current range predictions, future distributions may vary considerably when climate parameters do not change in concert. We investigated this source of uncertainty using four highly correlated climate variables together with a constant set of landscape variables in order to predict current (2010) and future (2050) distributions of four mountain bird species in central Europe. Simulating different parameterization decisions, we generated a) four models including each of the climate variables singly, b) a model taking advantage of all variables simultaneously and c) an un-weighted average of the predictions of a). We compared model accuracy under current conditions, predicted distributions under four scenarios of climate change, and – for one species – evaluated back-projections using historical occurrence data. Although current and future variable-correlations remained constant, and the models’ accuracy under contemporary conditions did not differ, future range predictions varied considerably in all climate change scenarios. Averaged models and models containing all climate variables simultaneously produced intermediate predictions; the latter, however, performed best in back-projections. This pattern, consistent across different modelling methods, indicates a benefit from including multiple climate predictors in ambiguous situations. 
Variable selection proved to be an important source of uncertainty for future range predictions, difficult to control using contemporary information. Small, but diverging changes of climate variables, masked by constant overall correlation patterns, can cause substantial differences between future range predictions which need to be accounted for, particularly when outcomes are intended for conservation decisions.
Article
Aim Interest in species distribution models (SDMs) and related niche studies has increased dramatically in recent years, with several books and reviews being prepared since 2000. The earliest SDM studies are dealt with only briefly even in the books. Consequently, many researchers are unaware of when the first SDM software package (bioclim) was developed and how a broad range of applications using the package was explored within the first 8 years following its release. The purpose of this study is to clarify these early developments and initial applications, as well as to highlight bioclim's continuing relevance to current studies. Location Mainly Australia and New Zealand, but also some global applications. Methods We outline the development of the bioclim package, early applications (1984–1991) and its current relevance. Results bioclim was the first SDM package to be widely used. Early applications explored many of the possible uses of SDMs in conservation biogeography, such as quantifying the environmental niche of species, identifying areas where a species might be invasive, assisting conservation planning and assessing the likely impacts of climate change on species distributions. Main conclusions Understanding this pioneering work is worthwhile as bioclim was for many years one of the leading SDM packages and remains widely used. Climate interpolation methods developed for bioclim were used to create the WorldClim database, the most common source of climate data for SDM studies, and bioclim variables are used in about 76% of recent published MaxEnt analyses of terrestrial ecosystems. Also, some of the bioclim studies from the late 1980s, such as measuring niche (both realized and fundamental) and assessing possible impacts of climate change, are still highly relevant to key conservation biogeography issues.
Article
Species distribution models (SDMs) are increasingly proposed to support conservation decision making. However, evidence of SDMs supporting solutions for on-ground conservation problems is still scarce in the scientific literature. Here, we show that successful examples exist but are still largely hidden in the grey literature, and thus less accessible for analysis and learning. Furthermore, the decision framework within which SDMs are used is rarely made explicit. Using case studies from biological invasions, identification of critical habitats, reserve selection and translocation of endangered species, we propose that SDMs may be tailored to suit a range of decision-making contexts when used within a structured and transparent decision-making process. To construct appropriate SDMs to more effectively guide conservation actions, modellers need to better understand the decision process, and decision makers need to provide feedback to modellers regarding the actual use of SDMs to support conservation decisions. This could be facilitated by individuals or institutions playing the role of 'translators' between modellers and decision makers. We encourage species distribution modellers to get involved in real decision-making processes that will benefit from their technical input; this strategy has the potential to better bridge theory and practice, and contribute to improve both scientific knowledge and conservation outcomes.
Article
The utility of species distribution models for applications in invasion and global change biology is critically dependent on their transferability between regions or points in time, respectively. We introduce two methods that aim to improve the transferability of presence-only models: density-based occurrence thinning and performance-based predictor selection. We evaluate the effect of these methods along with the impact of the choice of model complexity and geographic background on the transferability of a species distribution model between geographic regions. Our multifactorial experiment focuses on the notorious invasive seaweed Caulerpa cylindracea (previously Caulerpa racemosa var. cylindracea) and uses Maxent, a commonly used presence-only modeling technique. We show that model transferability is markedly improved by appropriate predictor selection, with occurrence thinning, model complexity and background choice having relatively minor effects. The data show that, if available, occurrence records from the native and invaded regions should be combined, as this leads to models with high predictive power while reducing the sensitivity to choices made in the modeling process. The inferred distribution model of Caulerpa cylindracea shows the potential for this species to further spread along the coasts of Western Europe, western Africa and the south coast of Australia.
Article
Background/Question/Methods Maxent, one of the most commonly used methods for inferring species distributions and environmental tolerances from occurrence data, allows users to fit models of arbitrary complexity. Model complexity is typically constrained via a process known as L1 regularization, but at present little guidance is available for setting the appropriate level of regularization, and the effects of inappropriately complex or simple models are largely unknown. In this study, we demonstrate the use of information criterion approaches to setting regularization in Maxent, and compare models selected using information criteria to models selected using other criteria that are common in the literature. We evaluate model performance using occurrence data generated from a known “true” initial Maxent model, using several different metrics for model quality and transferability. Results/Conclusions We demonstrate that models that are inappropriately complex or inappropriately simple show reduced ability to infer habitat quality, reduced ability to infer the relative importance of variables in constraining species’ distributions, and reduced transferability to other time periods. We also measure the relative effectiveness of different model selection criteria, and demonstrate that information criteria may offer significant advantages over the AUC-based methods commonly used in the literature.
Article
Aim: Modeling the distribution of rare and invasive species often occurs in situations where reliable absences for evaluating model performance are unavailable. However, predictions at randomly located sites, or “background” sites, can stand in for true absences. The maximum value of the area under the receiver operator characteristic curve, AUC, calculated with background sites is believed to be 1 – a/2, where a is the typically unknown prevalence of the species on the landscape. Location: Any occasion when background sites are used in place of absences for evaluating models and when test presences do not represent each inhabited region of a species’ range in proportion to its area. Methods: Using a simple example of a species’ range, I show how AUC can achieve values >1 – a/2. I then demonstrate algebraically how disproportionate representation of habitable sites influences AUC above this threshold. This example is then extended to more realistic situations. Results: Values of AUC that surpass 1 – a/2 are associated with higher model predictions in areas overrepresented in the test data set, even if they are less environmentally suitable than other regions the species occupies. Pursuit of high AUC values can encourage inclusion of spurious predictors in the final model if they help to differentiate areas with disproportionate representation in the test data. Main conclusions: Choices made during modeling to increase AUC calculated with background sites on the assumption that higher scores connote more accurate models can decrease actual accuracy when test presences disproportionately represent inhabited areas.
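The 1 – a/2 ceiling mentioned above can be reproduced with a small worked example. The sketch below is only an illustration, not the author's code: the landscape size, the prevalence value and the function name `auc_presence_background` are assumptions chosen for the example. It covers the baseline case in which test presences represent every occupied cell equally.

```python
def auc_presence_background(presence_scores, background_scores):
    """AUC as the probability that a presence outscores a background site.

    Ties count as half a win, which matches the usual rank-based definition.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical landscape: 1000 cells, 200 of them occupied (prevalence a = 0.2).
n_cells, n_occupied = 1000, 200
occupied = [1] * n_occupied + [0] * (n_cells - n_occupied)

# A "perfect" model scores occupied cells 1.0 and all other cells 0.0.
scores = [float(o) for o in occupied]

# Test presences: every occupied cell. Background: every cell, occupied or not.
presence_scores = [s for s, o in zip(scores, occupied) if o == 1]
background_scores = scores

a = n_occupied / n_cells
auc = auc_presence_background(presence_scores, background_scores)
print(auc, 1 - a / 2)
```

Because the background includes occupied cells, even a perfect model ties with itself on a fraction a of the comparisons, which is where the a/2 penalty comes from.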
Article
Recently, interest in species distribution modelling has increased following the development of new methods for the analysis of presence‐only data and the deployment of these methods in user‐friendly and powerful computer programs. However, reliable inference from these powerful tools requires that several assumptions be met, including the assumptions that observed presences are the consequence of random or representative sampling and that detectability during sampling does not vary with the covariates that determine occurrence probability. Based on our interactions with researchers using these tools, we hypothesized that many presence‐only studies were ignoring important assumptions of presence‐only modelling. We tested this hypothesis by reviewing 108 articles published between 2008 and 2012 that used the MAXENT algorithm to analyse empirical (i.e. not simulated) data. We chose to focus on these articles because MAXENT has been the most popular algorithm in recent years for analysing presence‐only data. Many articles (87%) were based on data that were likely to suffer from sample selection bias; however, methods to control for sample selection bias were rarely used. In addition, many analyses (36%) discarded absence information by analysing presence–absence data in a presence‐only framework, and few articles (14%) mentioned detection probability. We conclude that there are many misconceptions concerning the use of presence‐only models, including the misunderstanding that MAXENT, and other presence‐only methods, relieve users from the constraints of survey design. In the process of our literature review, we became aware of other factors that raised concerns about the validity of study conclusions. In particular, we observed that 83% of articles focused exclusively on model output (i.e. maps) without providing readers with any means to critically examine modelled relationships and that MAXENT's logistic output was frequently (54% of articles) and incorrectly interpreted as occurrence probability. We conclude with a series of recommendations, foremost that researchers analyse data in a presence–absence framework whenever possible, because fewer assumptions are required and inferences can be made about clearly defined parameters such as occurrence probability.
Article
1. Ecologists have long sought to distinguish relationships that are general from those that are idiosyncratic to a narrow range of conditions. Conventional methods of model validation and selection assess in- or out-of-sample prediction accuracy but do not assess model generality or transferability, which can lead to overestimates of performance when predicting in other locations, time periods or data sets. 2. We propose an intuitive method for evaluating transferability based on techniques currently in use in the area of species distribution modelling. The method involves cross-validation in which data are assigned non-randomly to groups that are spatially, temporally or otherwise distinct, thus using heterogeneity in the data set as a surrogate for heterogeneity among data sets. 3. We illustrate the method by applying it to distribution modelling of brook trout (Salvelinus fontinalis Mitchill) and brown trout (Salmo trutta Linnaeus) in western United States. We show that machine-learning techniques such as random forests and artificial neural networks can produce models with excellent in-sample performance but poor transferability, unless complexity is constrained. In our example, traditional linear models have greater transferability. 4. We recommend the use of a transferability assessment whenever there is interest in making inferences beyond the data set used for model fitting. Such an assessment can be used both for validation and for model selection and provides important information beyond what can be learned from conventional validation and selection techniques.
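The non-random cross-validation described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that contiguous longitudinal bands are an acceptable way of forming spatially distinct groups, and the coordinates and function names (`spatial_folds`, `train_test_splits`) are hypothetical.

```python
def spatial_folds(points, k):
    """Assign (lon, lat) points to k contiguous longitudinal bands of near-equal size.

    Returns a list of fold labels (0..k-1), one per input point.
    """
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    folds = [0] * len(points)
    for rank, idx in enumerate(order):
        folds[idx] = rank * k // len(points)
    return folds


def train_test_splits(points, k):
    """Yield (train_indices, test_indices), holding out one spatial band at a time."""
    folds = spatial_folds(points, k)
    for held_out in range(k):
        train = [i for i, f in enumerate(folds) if f != held_out]
        test = [i for i, f in enumerate(folds) if f == held_out]
        yield train, test


# Hypothetical sampling sites as (longitude, latitude) pairs.
points = [(-120.5, 44.0), (-119.8, 45.1), (-112.3, 43.2),
          (-111.9, 46.0), (-105.4, 40.7), (-104.9, 39.9)]

folds = spatial_folds(points, 3)
splits = list(train_test_splits(points, 3))
print(folds)  # [0, 0, 1, 1, 2, 2]
```

Each model is then fitted on the training bands and scored on the held-out band, so the test data are geographically separated from the calibration data rather than interleaved with them.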
Article
Species distribution models (SDMs) trained on presence-only data are frequently used in ecological research and conservation planning. However, users of SDM software are faced with a variety of options, and it is not always obvious how selecting one option over another will affect model performance. Working with MaxEnt software and with tree fern presence data from New Zealand, we assessed whether (a) choosing to correct for geographical sampling bias and (b) using complex environmental response curves have strong effects on goodness of fit. SDMs were trained on tree fern data, obtained from an online biodiversity data portal, with two sources that differed in size and geographical sampling bias: a small, widely-distributed set of herbarium specimens and a large, spatially clustered set of ecological survey records. We attempted to correct for geographical sampling bias by incorporating sampling bias grids in the SDMs, created from all georeferenced vascular plants in the datasets, and explored model complexity issues by fitting a wide variety of environmental response curves (known as "feature types" in MaxEnt). In each case, goodness of fit was assessed by comparing predicted range maps with tree fern presences and absences using an independent national dataset to validate the SDMs. We found that correcting for geographical sampling bias led to major improvements in goodness of fit, but did not entirely resolve the problem: predictions made with clustered ecological data were inferior to those made with the herbarium dataset, even after sampling bias correction. We also found that the choice of feature type had negligible effects on predictive performance, indicating that simple feature types may be sufficient once sampling bias is accounted for. Our study emphasizes the importance of reducing geographical sampling bias, where possible, in datasets used to train SDMs, and the effectiveness and essentialness of sampling bias correction within MaxEnt.
Article
Bioclimatic envelope models use associations between aspects of climate and species' occurrences to estimate the conditions that are suitable to maintain viable populations. Once bioclimatic envelopes are characterized, they can be applied to a variety of questions in ecology, evolution, and conservation. However, some have questioned the usefulness of these models, because they may be based on implausible assumptions or may be contradicted by empirical evidence. We review these areas of contention, and suggest that criticism has often been misplaced, resulting from confusion between what the models actually deliver and what users wish that they would express. Although improvements in data and methods will have some effect, the usefulness of these models is contingent on their appropriate use, and they will improve mainly via better awareness of their conceptual basis, strengths, and limitations.
Article
1. Protected area networks for river ecosystems must account for the highly connected nature of river habitats and the fact that conditions in distant locations can influence downstream habitats and biota. We used Marxan conservation planning software to address the unique constraints of reserve design in river ecosystems and structure a reserve network to overcome key challenges to freshwater conservation. 2. The range limits of 63 fish species in Mesoamerica were predicted and used in Marxan to design a network of conservation focal areas that encompasses 15% of the range of each species in areas with low risk of environmental degradation. Upstream risk intensity was estimated by propagating landscape-based sources of stress downstream along the direction of flow in GIS. We constrained Marxan solutions to account for basin divides, and we defined critical management zones to include important habitats that contribute to species persistence and mitigate threats. 3. The proposed reserve network encompassed 11% of the study area, half of which was contained within existing protected areas. Our exercise also identified important gaps in protection. Because terrestrial-based environmental risks were propagated through the river network and considered in the solution, focal areas were constrained to catchments with low levels of upstream human activity. Addition of critical management zones – riparian buffers and fish migration corridors – expanded the network area by one-fifth. 4. Our application of Marxan allowed longitudinal connectivity and topographic barriers to species movement to be considered. Adding critical management zones expanded the size of the reserve network, but is crucial to the network’s conservation efficacy. We call for an evaluation of the added management capacity needed to conserve critical management zones and suggest ways to further improve the reserve design process.
Article
Statistical species distribution models (SDMs) are widely used to predict the potential changes in species distributions under climate change scenarios. We suggest that we need to revisit the conceptual framework and ecological assumptions on which the relationship between species distributions and environment is based. We present a simple conceptual framework to examine the selection of environmental predictors and data resolution scales. These vary widely in recent papers, with light inconsistently included in the models. Focusing on light as a necessary component of plant SDMs, we briefly review its dependence on aspect and slope and existing knowledge of its influence on plant distribution. Differences in light regimes between north- and south-facing aspects in temperate latitudes can produce differences in temperature equivalent to moves 200 km polewards. Local topography may create refugia that are not recognized in many climate change SDMs using coarse-scale data. We argue that current assumptions about the selection of predictors and data resolution need further testing. Application of these ideas can clarify many issues of scale, extent and choice of predictors, and potentially improve the use of SDMs for climate change modelling of biodiversity.
Chapter
This chapter describes a framework for selecting appropriate strategies for evaluating model performance and significance. It begins with a review of key concepts, focusing on how primary occurrence data can be presence-only, presence/background, presence/pseudoabsence, or presence/absence as well as factors that may contribute to apparent commission error. It then considers the availability of two pools of occurrence data: one for model calibration and another for evaluation of model predictions. It also discusses strategies for detecting overfitting or sensitivity to bias in model calibration, with particular emphasis on quantification of performance and tests of significance. Finally, it suggests directions for future research as regards model evaluation, highlighting areas in need of theoretical and/or methodological advances.
Article
Aim To synthesize the species distribution modelling (SDM) literature to inform which variables have been used in MaxEnt models for different taxa and to quantify how frequently they have been important for species’ distributions. Location Global. Methods We conducted a quantitative synthesis analysing the contribution of over 400 distinct environmental variables to 2040 MaxEnt SDMs for nearly 1900 species representing over 300 families. Environmental variables were grouped into 24 related factors and results were analysed by examining the frequency with which variables were found to be most important, the mean contribution of each variable (at various taxonomic levels), and using TrueSkill™, a Bayesian skill rating system. Results Precipitation, temperature, bathymetry, distance to water and habitat patch characteristics were the most important variables overall. Precipitation and temperature were analysed most frequently and one of these variables was often the most important predictor in the model (nearly 80% of models, when tested). Notably, distance to water was the most important variable in the highest proportion of models in which it was tested (42% of 225 models). For terrestrial species, precipitation, temperature and distance to water had the highest overall contributions, whereas for aquatic species, bathymetry, precipitation and temperature were most important. Main conclusions Over all MaxEnt models published, the ability to discriminate occurrence from reference sites was high (average AUC = 0.92). Much of this discriminatory ability was due to temperature and precipitation variables. Further, variability (temperature) and extremes (minimum precipitation) were the most predictive. More generally, the most commonly tested variables were not always the most predictive, with, for instance, ‘distance to water’ infrequently tested, but found to be very important when it was. 
Thus, the results from this study summarize the MaxEnt SDM literature, and can aid in variable selection by identifying underutilized, but potentially important variables, which could be incorporated in future modelling efforts.
Article
Aim Niche‐based species distribution models (SDMs) are commonly used to predict impacts of global change on biodiversity, but the reliability of these predictions in space and time depends on their transferability. We tested how the strategy used to choose predictors impacts the transferability of SDMs at a cross‐continental scale. Location North America, Eurasia and Australia. Method We used a systematic approach including 50 Holarctic plant invaders and 27 initial predictor variables, considering 10 different strategies for variable selection, accounting for the proximality, multicollinearity and climate analogy of predictors. We compared the average performance of each strategy, some of which used a large number of predictor combinations. Next, we looked for the single best model for each species across all the predictor combinations retained in the analysis. Transferability was considered as the predictive success of SDMs calibrated in the native range and projected onto the invaded range. Results Two strategies showed better SDM transferability on average: a set of predictors known for their ecologically meaningful effects on plant distribution, and the two first axes of a principal component analysis calibrated on all predictor variables (S_pc2). From the more than 2000 combinations of predictors per species across strategies, the best set of predictors yielded SDMs with good transferability for 45 species (90%). These best combinations consisted of eight randomly assembled (39 species) or uncorrelated predictors (6 species) and S_pc2 (5 species). We also found that internal cross‐validation was not sufficient to give full information about the transferability of an SDM to a distinct range. Main conclusion Transferring SDMs at the macroclimatic scale, and thus anticipating invasions, is possible for the large majority of invasive plants considered in this study, but the accuracy of the predictions relies strongly on the choice of predictors. 
From our results, we recommend including either proximal and state‐of‐the‐art variables or a reduced and orthogonalized set to obtain robust SDM projections.
Article
Species distribution model (SDM) projections under future climate scenarios are increasingly being used to inform resource management and conservation strategies. A critical assumption for projecting climate change responses is that SDMs are transferable through time, an assumption that is largely untested because investigators often lack temporally independent data for assessing transferability. Further, understanding how the ecology of species influences temporal transferability is critical yet almost wholly lacking. This raises two questions. (1) Are SDM projections transferable in time? (2) Does temporal transferability relate to species ecological traits? To address these questions we developed SDMs for 133 vascular plant species using data from the mountain ranges of California (USA) from two time periods: the 1930s and the present day. We forecast historical models over 75 years of measured climate change and assessed their projections against current distributions. Similarly, we hindcast contemporary models and compared their projections to historical data. We quantified transferability and related it to species ecological traits including physiognomy, endemism, dispersal capacity, fire adaptation, and commonness. We found that non-endemic species with greater dispersal capacity, intermediate levels of prevalence, and little fire adaptation had higher transferability than endemic species with limited dispersal capacity that rely on fire for reproduction. We demonstrate that variability in model performance was driven principally by differences among species as compared to model algorithms or time period of model calibration. Further, our results suggest that the traits correlated with prediction accuracy in a single time period may not be related to transferability between time periods. Our findings provide a priori guidance for the suitability of SDM as an approach for forecasting climate change responses for certain taxa.
Article
Data about biodiversity are either scattered in many databases or reside on paper or other media not amenable to interactive searching. The Global Biodiversity Information Facility (GBIF) is a framework for facilitating the digitization of biodiversity data and for making interoperable an as-yet-unknown number of biodiversity databases that are distributed around the globe. In concert with other existing efforts, GBIF will catalyze the completion of a Catalog of the Names of Known Organisms and will develop search engines to mine the vast quantities of biodiversity data. It will be an outstanding tool for scientists, natural resource managers, and policy-makers.
Article
Aim Presence‐only datasets represent an important source of information on species' distributions. Collections of presence‐only data, however, are often spatially biased, particularly along roads and near urban populations. These biases can lead to inaccurate inferences and predicted distributions. We demonstrate a new approach of accounting for effort bias in presence‐only data by explicitly incorporating sample biases in species distribution modelling. Location Alberta, Canada. Methods First, we used logistic regression to model sampling effort of recorded rare vascular plants, bryophytes and butterflies in Alberta. Second, we simulated presence/absence data for nine ‘virtual’ species based on three relative occurrence thresholds – common, rare and very rare – for each taxonomic group. We sampled these virtual species using our bias model to represent typical sampling effort characteristic of presence‐only datasets. We then modelled the distributions of these virtual species using logistic regression and attempted to recover their original simulated distributions using a sample weighting term (prior weight) estimated as the inverse of probability of sampling. Bias‐adjusted model estimates were compared to those obtained from random samples and biased samples without adjustment. We also compared prior‐weight adjustment to bias‐file and target‐group background approaches in Maxent. Results Sample weighting recovered regression coefficients and mapped predictions estimated from unbiased presence‐only data and improved model predictive accuracy as evaluated by regression and correlation coefficients, sensitivity and specificity. Similar model improvements were achieved using the Maxent bias‐file method, but results were inconsistent for the target‐group background approach. Main conclusions These results suggest that sample weighting can be used to account for spatially biased presence‐only datasets in species distribution modelling. 
The framework presented is potentially widely applicable due to availability of online biodiversity databases and the flexibility of the approach.
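The sample weighting idea above (a prior weight equal to the inverse of the probability of sampling) can be illustrated with a toy calculation. The sketch below is an assumption-laden example, not the authors' model: the two strata, their presence counts and their sampling probabilities are invented, and `inverse_probability_weights` is a hypothetical helper name.

```python
def inverse_probability_weights(sampling_probs):
    """Return one weight per record, equal to 1 / (probability of being sampled)."""
    return [1.0 / p for p in sampling_probs]


# Hypothetical landscape with two strata and known true numbers of presences.
strata = ["near_road", "remote"]
true_presences = [100, 300]

# Assumed sampling effort: roadside sites are visited far more often.
sampling_prob = [0.8, 0.2]

# Presences expected in a biased collection (100 * 0.8 and 300 * 0.2).
observed = [80, 60]

weights = inverse_probability_weights(sampling_prob)
weighted = [n * w for n, w in zip(observed, weights)]

naive_share = observed[0] / sum(observed)      # share of roadside records, unadjusted
weighted_share = weighted[0] / sum(weighted)   # share after weighting
true_share = true_presences[0] / sum(true_presences)

print(round(naive_share, 3), round(weighted_share, 3), true_share)
```

In this toy case the unadjusted records suggest that most presences are near roads, while the weighted records recover the true 25% share. In practice the sampling probabilities are not known and have to be estimated, which the abstract does with a logistic regression of sampling effort.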
Article
When using species distribution models to predict distributions of invasive species, we are faced with the trade-off between model realism, generality, and precision. Models are most applicable to specific conditions on which they are developed, but typically not readily transferred to other situations. To better assist management of biological invasions, it is critical to know how to validate and improve model generality while maintaining good model precision and realism. We examined this issue with Bythotrephes longimanus, to determine the importance of different models and datasets in providing insights into understanding and predicting invasions. We developed models (linear discriminant analysis, multiple logistic regression, random forests, and artificial neural networks) on datasets with different sample sizes (315 or 179 lakes) and predictor information (environmental with or without fish data), and evaluated them by cross-validation and several independent datasets. In cross-validation, models developed on 315-lake environmental dataset performed better than those developed on 179-lake environmental and fish dataset. The advantage of a larger dataset disappeared when models were tested on independent datasets. Predictions of the models were more diverse when developed on environmental conditions alone, whereas they were more consistent when including fish (especially diversity) data. Random forests had relatively good and more stable performance than the other approaches when tested on independent datasets. Given the improvement of model transferability in this study by including relevant species occurrence or diversity index, incorporating biotic information in addition to environmental predictors, may help develop more reliable models with better realism, generality, and precision.
Article
The MaxEnt software package is one of the most popular tools for species distribution and environmental niche modeling, with over 1000 published applications since 2006. Its popularity is likely for two reasons: 1) MaxEnt typically outperforms other methods based on predictive accuracy and 2) the software is particularly easy to use. MaxEnt users must make a number of decisions about how they should select their input data and choose from a wide variety of settings in the software package to build models from these data. The underlying basis for making these decisions is unclear in many studies, and default settings are apparently chosen, even though alternative settings are often more appropriate. In this paper, we provide a detailed explanation of how MaxEnt works and a prospectus on modeling options to enable users to make informed decisions when preparing data, choosing settings and interpreting output. We explain how the choice of background samples reflects prior assumptions, how nonlinear functions of environmental variables (features) are created and selected, how to account for environmentally biased sampling, the interpretation of the various types of model output and the challenges for model evaluation. We demonstrate MaxEnt’s calculations using both simplified simulated data and occurrence data from South Africa on species of the flowering plant family Proteaceae. Throughout, we show how MaxEnt’s outputs vary in response to different settings to highlight the need for making biologically motivated modeling decisions.
Article
Species distribution modelling has become a common approach in ecology in the last decades. As in any modelling exercise, evaluation of the predicted suitability surfaces is a key process, and the area under the receiver operating characteristic (ROC) curve (AUC) has become the most popular statistic for this purpose. A close covariation between the AUC and threshold-dependent discrimination measures (sensitivity Se and specificity Sp) raises into question the advantage of the threshold-independence of the AUC. In this study, the relationship between the AUC and several threshold-dependent discrimination measures is characterized in detail, and the sensitivity of the pattern to variations in the shape of the ROC curve is assessed. Hypothetical suitability values, coming from normal and skew-normal distributions, were simulated for both instances of presence and absence. The flexibility of the skew-normal distribution allowed for the simulation of a wide range of ROC curve configurations. The relationship between the AUC and threshold-dependent measures was graphically assessed; independently of the ROC curve shape, a nonlinear asymptotic relationship between the AUC and Se (and Sp) was obtained after applying the threshold that makes Se = Sp. A nonlinear asymptotic relationship between the AUC and the Youden index was also reported. These results imply that the AUC does not appropriately measure changes in the discrimination of models, and it is especially incapable of distinguishing between models with high discrimination capacity. Se or Sp derived from the application of the threshold that makes them equal is a preferred measure of discrimination power. Together with the rate of false positives and negatives, and with the prevalence of the species, these statistics provide more information about the discrimination capacity of the models than the AUC.
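The threshold-dependent measures discussed above follow directly from a confusion matrix. The sketch below is illustrative only: the counts are invented and the function name `discrimination_stats` is an assumption. The Youden index computed here, Se + Sp − 1, is the same quantity that the species distribution literature calls the true skill statistic (TSS).

```python
def discrimination_stats(tp, fn, tn, fp):
    """Compute sensitivity, specificity and the Youden index from confusion-matrix counts.

    tp: presences predicted present   fn: presences predicted absent
    tn: absences predicted absent     fp: absences predicted present
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1.0,
        "false_positive_rate": 1.0 - specificity,
        "false_negative_rate": 1.0 - sensitivity,
        "prevalence": (tp + fn) / (tp + fn + tn + fp),
    }


# Hypothetical evaluation at the threshold where Se equals Sp.
stats = discrimination_stats(tp=40, fn=10, tn=80, fp=20)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Reporting these values together, rather than a single AUC, is the kind of summary the abstract recommends.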
Article
Aim Models of species niches and distributions have become invaluable to biogeographers over the past decade, yet several outstanding methodological issues remain. Here we address three critical ones: selecting appropriate evaluation data, detecting overfitting, and tuning program settings to approximate optimal model complexity. We integrate solutions to these issues for Maxent models, using the Caribbean spiny pocket mouse, Heteromys anomalus, as an example. Location North‐western South America. Methods We partitioned data into calibration and evaluation datasets via three variations of k‐fold cross‐validation: randomly partitioned, geographically structured and masked geographically structured (which restricts background data to regions corresponding to calibration localities). Then, we carried out tuning experiments by varying the level of regularization, which controls model complexity. Finally, we gauged performance by quantifying discriminatory ability and overfitting, as well as via visual inspections of maps of the predictions in geography. Results Performance varied among data‐partitioning approaches and among regularization multipliers. The randomly partitioned approach inflated estimates of model performance and the geographically structured approach showed high overfitting. In contrast, the masked geographically structured approach allowed selection of high‐performing models based on all criteria. Discriminatory ability showed a slight peak in performance around the default regularization multiplier. However, regularization levels two to four times higher than the default yielded substantially lower overfitting. Visual inspection of maps of model predictions coincided with the quantitative evaluations. Main conclusions Species‐specific tuning of model parameters can improve the performance of Maxent models. Further, accurate estimates of model performance and overfitting depend on using independent evaluation data. 
These strategies for model evaluation may be useful for other modelling methods as well.
Article
Aim When faced with dichotomous events, such as the presence or absence of a species, discrimination capacity (the ability to separate the instances of presence from the instances of absence) is usually the only characteristic that is assessed in the evaluation of the performance of predictive models. Although neglected, calibration or reliability (how well the estimated probability of presence represents the observed proportion of presences) is another aspect of the performance of predictive models that provides important information. In this study, we explore how changes in the distribution of the probability of presence make discrimination capacity a context‐dependent characteristic of models. For the first time, we explain the implications that ignoring the context dependence of discrimination can have in the interpretation of species distribution models. Innovation In this paper we corroborate that, under a uniform distribution of the estimated probability of presence, a well‐calibrated model will not attain high discrimination power and the value of the area under the curve will be 0.83. Under non‐uniform distributions of the probability of presence, simulations show that a well‐calibrated model can attain a broad range of discrimination values. These results illustrate that discrimination is a context‐dependent property, i.e. it gives information about the performance of a certain algorithm in a certain data population. Main conclusions In species distribution modelling, the discrimination capacity of a model is only meaningful for a certain species in a given geographic area and temporal snapshot. This is because the representativeness of the environmental domain changes with the geographical and temporal context, which unavoidably entails changes in the distribution of the probability of presence. Comparative studies that intend to generalize their results only based on the discrimination capacity of models may not be broadly extrapolated. 
Assessment of calibration is especially recommended when the models are intended to be transferred in time or space.
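The 0.83 figure quoted above can be checked numerically. The sketch below is a minimal illustration, not the authors' code. It assumes a perfectly calibrated model whose estimated probabilities are uniform on [0, 1], draws presences as Bernoulli outcomes of those probabilities, and computes the AUC with a rank-based (Mann-Whitney) formula. The function name `rank_auc`, the sample size and the seed are arbitrary choices made for the illustration.

```python
import numpy as np

def rank_auc(scores, labels):
    """AUC via the Mann-Whitney U statistic (assumes no tied scores)."""
    order = np.argsort(scores)
    ranks = np.empty(scores.size, dtype=float)
    ranks[order] = np.arange(1, scores.size + 1)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

rng = np.random.default_rng(0)

# Assumption: estimated probabilities of presence are uniform on [0, 1].
p = rng.uniform(size=200_000)

# Perfect calibration: each site is occupied with exactly its estimated probability.
y = (rng.uniform(size=p.size) < p).astype(int)

auc = rank_auc(p, y)
print(f"simulated AUC = {auc:.3f}")
```

Under these assumptions the analytical value is 5/6 (about 0.833), so the simulated AUC should land close to it even though the model is calibrated perfectly.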
Book
Alfred Russel Wallace (1823–1913) was a British biologist and explorer whose theories of evolution, arrived at independently, caused Darwin to allow their famous joint paper to go forward to the Linnean Society in 1858. Considered the nineteenth century's leading expert on the geographical distribution of animals, Wallace carried out extensive fieldwork in areas as diverse as North and South America, Africa, China, India and Australia to document the habitats, breeding, migration and feeding behaviour of thousands of species around the world, and the influence of environmental conditions on their survival. First published in 1876, this two-volume set presents Wallace's findings, and represents a landmark in the study of zoology, evolutionary biology and palaeontology which remains relevant to scholars in these fields today. Volume 2 explores the distribution of primates, the habitats and characteristics of mammals, birds, reptiles, fish and insects, and patterns of migration.
Article
The area under the receiver operating characteristic (ROC) curve, known as the AUC, is currently considered to be the standard method to assess the accuracy of predictive distribution models. It avoids the supposed subjectivity in the threshold selection process, when continuous probability derived scores are converted to a binary presence-absence variable, by summarizing overall model performance over all possible thresholds. In this manuscript we review some of the features of this measure and bring into question its reliability as a comparative measure of accuracy between model results. We do not recommend using AUC for five reasons: (1) it ignores the predicted probability values and the goodness-of-fit of the model; (2) it summarises the test performance over regions of the ROC space in which one would rarely operate; (3) it weights omission and commission errors equally; (4) it does not give information about the spatial distribution of model errors; and, most importantly, (5) the total extent to which models are carried out highly influences the rate of well-predicted absences and the AUC scores.
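Reason (5) can be illustrated with a small simulation. This is a hedged sketch under invented score distributions, not an analysis from the paper: presences and "core" absences receive overlapping scores, and enlarging the study extent is mimicked by appending absences from far outside the range that the model trivially scores near zero. The distributions, sample sizes and the helper names `rank_auc` and `auc_for` are all assumptions.

```python
import numpy as np

def rank_auc(scores, labels):
    """AUC via the Mann-Whitney U statistic (assumes no tied scores)."""
    order = np.argsort(scores)
    ranks = np.empty(scores.size, dtype=float)
    ranks[order] = np.arange(1, scores.size + 1)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

rng = np.random.default_rng(1)
pres = rng.beta(4, 2, size=500)              # scores at presence sites
abs_core = rng.beta(2, 2, size=500)          # absences inside the core study area
abs_far = rng.uniform(0.0, 0.05, size=5000)  # easy absences added by a wider extent

def auc_for(absences):
    scores = np.concatenate([pres, absences])
    labels = np.concatenate([np.ones(pres.size, dtype=int),
                             np.zeros(absences.size, dtype=int)])
    return rank_auc(scores, labels)

auc_core = auc_for(abs_core)
auc_wide = auc_for(np.concatenate([abs_core, abs_far]))
print(f"core extent AUC = {auc_core:.3f}, wide extent AUC = {auc_wide:.3f}")
```

The model scores are identical in both cases; only the set of absences changes, yet the AUC for the wider extent is higher.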
Article
Simple interval estimate methods for proportions exhibit poor coverage and can produce evidently inappropriate intervals. Criteria appropriate to the evaluation of various proposed methods include: closeness of the achieved coverage probability to its nominal value; whether intervals are located too close to or too distant from the middle of the scale; expected interval width; avoidance of aberrations such as limits outside [0,1] or zero width intervals; and ease of use, whether by tables, software or formulae. Seven methods for the single proportion are evaluated on 96,000 parameter space points. Intervals based on tail areas and the simpler score methods are recommended for use. In each case, methods are available that aim to align either the minimum or the mean coverage with the nominal 1 - α.
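The aberrations mentioned above are easy to reproduce. The sketch below contrasts the simple (Wald) interval with the Wilson score interval, one commonly used score method. Whether it corresponds exactly to the score variants evaluated in the paper is an assumption, and the function names are hypothetical.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval

def wald_interval(successes, n, z=Z95):
    """Simple interval: p +/- z * sqrt(p(1-p)/n), deliberately left unclamped."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson_interval(successes, n, z=Z95):
    """Wilson score interval for a single proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp only to absorb floating-point noise at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)

print(wald_interval(0, 10))    # zero-width interval
print(wald_interval(1, 10))    # lower limit below 0
print(wilson_interval(0, 10))  # non-degenerate interval starting at 0
```

With 0 successes out of 10, the simple interval collapses to zero width, and with 1 out of 10 its lower limit is negative. The score interval avoids both problems.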
Article
Distribution models are used to predict the likelihood of occurrence or abundance of a species at locations where census data are not available. An integral part of modelling is the testing of model performance. We compared different schemes and measures for testing model performance using 79 species from the North American Breeding Bird Survey. The four testing schemes we compared featured increasing independence between test and training data: resubstitution, random data hold-out and two spatially segregated data hold-out designs. The different testing measures also addressed different levels of information content in the dependent variable: regression R² for absolute abundance, squared correlation coefficient r² for relative abundance and AUC/Somers' D for presence/absence. We found that higher levels of independence between test and training data lead to lower assessments of prediction accuracy. Even for data collected independently, spatial autocorrelation leads to dependence between random hold-out test data and training data, and thus to inflated measures of model performance. While there is a general awareness of the importance of autocorrelation to model building and hypothesis testing, its consequences via violation of independence between training and testing data have not been addressed systematically and comprehensively before. Furthermore, increasing information content (from correctly classifying presence/absence, to predicting relative abundance, to predicting absolute abundance) leads to decreasing predictive performance. The current tests for presence/absence distribution models are typically overly optimistic because a) the test and training data are not independent and b) the correct classification of presence/absence has a relatively low information content and thus capability to address ecological and conservation questions compared to a prediction of abundance.
Meaningful evaluation of model performance requires testing on spatially independent data, if the intended application of the model is to predict into new geographic or climatic space, which arguably is the case for most applications of distribution models.
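The difference between a random hold-out and a spatially segregated hold-out can be sketched as follows. This is an illustration under invented assumptions (uniformly scattered sites on a 10 × 10 square in arbitrary units, a 5 × 5 grid of blocks, six held-out blocks), not the design used in the study.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
lon = rng.uniform(0, 10, size=n)  # hypothetical site coordinates
lat = rng.uniform(0, 10, size=n)

# Random hold-out: test sites are interspersed among training sites.
is_test_random = rng.uniform(size=n) < 0.25

# Spatial hold-out: assign each site to a 2 x 2 block and hold out whole blocks.
block_id = np.floor(lon / 2).astype(int) * 5 + np.floor(lat / 2).astype(int)
test_blocks = rng.choice(25, size=6, replace=False)
is_test_block = np.isin(block_id, test_blocks)

def mean_nearest_train_distance(is_test):
    """Mean distance from each test site to its nearest training site."""
    test_xy = np.column_stack([lon[is_test], lat[is_test]])
    train_xy = np.column_stack([lon[~is_test], lat[~is_test]])
    d = np.linalg.norm(test_xy[:, None, :] - train_xy[None, :, :], axis=2)
    return d.min(axis=1).mean()

print("random hold-out:", mean_nearest_train_distance(is_test_random))
print("block hold-out: ", mean_nearest_train_distance(is_test_block))
```

Test sites in the block design sit farther from their nearest training site, which is what reduces the dependence induced by spatial autocorrelation.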
Article
Aim The area under the receiver operating characteristic (ROC) curve (AUC) is a widely used statistic for assessing the discriminatory capacity of species distribution models. Here, I used simulated data to examine the interdependence of the AUC and classical discrimination measures (sensitivity and specificity) derived for the application of a threshold. I shall further exemplify with simulated data the implications of using the AUC to evaluate potential versus realized distribution models. Innovation After applying the threshold that makes sensitivity and specificity equal, a strong relationship between the AUC and these two measures was found. This result is corroborated with real data. On the other hand, the AUC penalizes the models that estimate potential distributions (the regions where the species could survive and reproduce due to the existence of suitable environmental conditions), and favours those that estimate realized distributions (the regions where the species actually lives). Main conclusions Firstly, the independence of the AUC from the threshold selection may be irrelevant in practice. This result also emphasizes the fact that the AUC assumes nothing about the relative costs of errors of omission and commission. However, in most real situations this premise may not be optimal. Measures derived from a contingency table for different cost ratio scenarios, together with the ROC curve, may be more informative than reporting just a single AUC value. Secondly, the AUC is only truly informative when there are true instances of absence available and the objective is the estimation of the realized distribution. When the potential distribution is the goal of the research, the AUC is not an appropriate performance measure because the weight of commission errors is much lower than that of omission errors.
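The threshold that equalizes sensitivity and specificity can be located by a direct search. The sketch below uses simulated scores (normal distributions with an arbitrary separation), so the data and the helper name `sens_spec` are assumptions rather than anything taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
scores_abs = rng.normal(0.0, 1.0, size=2000)   # model scores at absence sites
scores_pres = rng.normal(1.0, 1.0, size=2000)  # model scores at presence sites

def sens_spec(threshold):
    """Sensitivity and specificity when sites scoring >= threshold are predicted present."""
    sensitivity = np.mean(scores_pres >= threshold)
    specificity = np.mean(scores_abs < threshold)
    return sensitivity, specificity

# Try every observed score as a threshold and keep the one with the smallest gap.
candidates = np.sort(np.concatenate([scores_abs, scores_pres]))
gaps = [abs(np.subtract(*sens_spec(t))) for t in candidates]
best = candidates[int(np.argmin(gaps))]
sens, spec = sens_spec(best)
print(f"threshold = {best:.3f}, sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

Reporting sensitivity and specificity at this threshold, or at thresholds reflecting other cost ratios, gives the contingency-table view that the abstract recommends alongside the AUC.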
Article
Aim Species distribution modelling is commonly used to guide future conservation policies in the light of potential climate change. However, arbitrary decisions during the model-building process can affect predictions and contribute to uncertainty about where suitable climate space will exist. For many species, the key climatic factors limiting distributions are unknown. This paper assesses the uncertainty generated by using different climate predictor variable sets for modelling the impacts of climate change.
Article
Aim Species distribution models (SDMs) are used to infer niche responses and predict climate change-induced range shifts. However, their power to distinguish real and chance associations between spatially autocorrelated distribution and environmental data at continental scales has been questioned. Here this is investigated at a regional (10 km) scale by modelling the distributions of 100 plant species native to the UK. Location UK. Methods SDMs fitted using real climate data were compared with those utilizing simulated climate gradients. The simulated gradients preserve the exact values and spatial structure of the real ones, but have no causal relationships with any species and so represent an appropriate null model. SDMs were fitted as generalized linear models (GLMs) or by the Random Forest machine-learning algorithm and were either non-spatial or included spatially explicit trend surfaces or autocovariates as predictors. Results Species distributions were significantly but erroneously related to the simulated gradients in 86% of cases (P < 0.05 in likelihood-ratio tests of GLMs), with the highest error for strongly autocorrelated species and gradients and when species occupied 50% of sites. Even more false effects were found when curvilinear responses were modelled, and this was not adequately mitigated in the spatially explicit models. Non-spatial SDMs based on simulated climate data suggested that 70–80% of the apparent explanatory power of the real data could be attributable to its spatial structure. Furthermore, the niche component of spatially explicit SDMs did not significantly contribute to model fit in most species. Main conclusions Spatial structure in the climate, rather than functional relationships with species distributions, may account for much of the apparent fit and predictive power of SDMs. Failure to account for this means that the evidence for climatic limitation of species distributions may have been overstated. 
As such, predicted regional- and national-scale impacts of climate change based on the analysis of static distribution snapshots will require re-evaluation.
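The core effect, spurious associations between independent but spatially autocorrelated variables, can be reproduced in a toy setting. The sketch below is much simpler than the study's design and swaps in different techniques: a one-dimensional transect instead of a 10 km grid, moving-average noise instead of simulated climate gradients, and a naive correlation test instead of likelihood-ratio tests of GLMs. All names and parameter values are assumptions.

```python
import numpy as np

def smooth_noise(rng, n, window):
    """Autocorrelated series: moving average of white noise, length n."""
    raw = rng.normal(size=n + window - 1)
    return np.convolve(raw, np.ones(window) / window, mode="valid")

rng = np.random.default_rng(4)
n_sites, window, n_reps = 500, 50, 500

# Approximate 5% critical |r| for n_sites independent observations.
critical_r = 1.96 / np.sqrt(n_sites)

hits = 0
for _ in range(n_reps):
    suitability = smooth_noise(rng, n_sites, window)
    presence = (suitability > np.median(suitability)).astype(float)
    fake_climate = smooth_noise(rng, n_sites, window)  # independent of presence
    r = np.corrcoef(presence, fake_climate)[0, 1]
    hits += abs(r) > critical_r

false_positive_rate = hits / n_reps
print(f"naive false-positive rate = {false_positive_rate:.2f}")
```

Because the test treats autocorrelated sites as independent observations, the proportion of "significant" associations with the unrelated gradient is far above the nominal 5%.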
Article
Within the field of species distribution modelling an apparent dichotomy exists between process-based and correlative approaches, where the processes are explicit in the former and implicit in the latter. However, these intuitive distinctions can become blurred when comparing species distribution modelling approaches in more detail. In this review article, we contrast the extremes of the correlative–process spectrum of species distribution models with respect to core assumptions, model building and selection strategies, validation, uncertainties, common errors and the questions they are most suited to answer. The extremes of such approaches differ clearly in many aspects, such as model building approaches, parameter estimation strategies and transferability. However, they also share strengths and weaknesses. We show that claims of one approach being intrinsically superior to the other are misguided and that they ignore the process–correlation continuum as well as the domains of questions that each approach is addressing. Nonetheless, the application of process-based approaches to species distribution modelling lags far behind more correlative (process-implicit) methods and more research is required to explore their potential benefits. Critical issues for the employment of species distribution modelling approaches are given, together with a guideline for appropriate usage. We close with challenges for future development of process-explicit species distribution models and how they may complement current approaches to study species distributions.