Article

Comparison between optimized MaxEnt and random forest modeling in predicting potential distribution: A case study with Quasipaa boulengeri in China

Abstract

Random forest (RF) and MaxEnt models are shallow machine learning approaches that perform well in predicting species' potential distributions. RF models can produce robust results with the default automatic configuration in most cases, whereas MaxEnt requires its model settings to be optimized to perform well, and the difference in predictive performance between optimized MaxEnt and RF remains uncertain. To explore this issue, the potential distribution of the endangered amphibian Quasipaa boulengeri in China was predicted using optimized MaxEnt and RF models. A total of 408 occurrence records were selected, 1000 locations were generated as pseudo-absence data by the geographic distance method, and 10,000 sites were selected as background data by creating a bias file. Partial ROC at different thresholds and success rate curves were used to compare the predictive performance of optimized MaxEnt and RF. Our results showed that the RF and optimized MaxEnt models both performed well in predicting the potential distribution of Q. boulengeri, with the RF model performing slightly better on both partial ROC and success rate curves. Furthermore, the core suitable habitat regions of Q. boulengeri identified by RF and MaxEnt were similar and were all located in the Sichuan, Chongqing, Hubei, Hunan, and Guizhou provinces. However, the RF model produced a habitat suitability map with higher discrimination and greater heterogeneity. Temperature annual range, mean temperature of the driest quarter, and annual precipitation were the vital environmental variables limiting the distribution of Q. boulengeri. The RF model is the stronger machine learner. We believe it may be more applicable in predicting the native potential distributions of species with sufficient occurrence data, given the additional predictive detail, the simplicity of use, the computational time involved, and the operational complexity.

... More than 10 species distribution models (SDMs) have been reported, but the MaxEnt model is low cost, simple to operate, quick to run, and can simulate the suitable range of species well with a very small number of samples (n ≥ 5) [5]. At present, it is also widely used in research predicting amphibian habitats, for example for Odorrana hainanensis [2], five species of Scutiger [6], Rana zhenhaiensis [7], Buergeria oxycephala [8], Nanorana parkeri [9], Plethodon [10], and Quasipaa boulengeri [11]. ...
Article
Full-text available
Quasipaa spinosa is a large cold-water frog unique to China, with great ecological and economic value. In recent years, due to the impact of human activities on the climate, its habitat has been destroyed, resulting in a sharp decline in natural population resources. Based on the existing distribution records of Q. spinosa, this study uses the optimized MaxEnt model and ArcGIS 10.2 software to select 10 factors, such as climate and altitude, and to predict its future potential distribution area under climate change. The results show that when the parameters are FC = LQHP and RM = 3, the MaxEnt model is optimal and AUC values are greater than 0.95. The precipitation of the driest month (bio14), temperature seasonality (bio4), elevation (ele), isothermality (bio3), and the minimum temperature of the coldest month (bio6) were the main environmental factors affecting the potential range of Q. spinosa. At present, high-suitability areas are mainly in the Hunan, Fujian, Jiangxi, Chongqing, Guizhou, Anhui, and Sichuan provinces of China. In the future, the potential distribution area of Q. spinosa may gradually extend to the northwest and north. The low-concentration emissions scenario can increase the area of suitable habitat for Q. spinosa and slow the loss of high-suitability areas to a certain extent. In conclusion, the habitat of Q. spinosa is mainly distributed in southern China. Because of global climate change, the high-altitude mountainous areas in southern China with abundant water resources may be the main potential habitat area of Q. spinosa. Predicting the changes in the distribution patterns of Q. spinosa can help us better understand its biogeography and develop conservation strategies to minimize the impacts of climate change.
... To handle this problem, we chose the Random Forest (RF) machine learning method implemented in the Python programming language in the Scikit-learn package [44]. We selected RF as a method to build the models following the results of several studies indicating that RF may be more applicable in predicting the native potential distribution of species with sufficient species occurrence data [45,46]. Scikit-learn is a general-purpose machine learning package focused on rapid prototyping, validating, and deploying supervised and unsupervised learning models. ...
Article
Full-text available
Spruce taiga forests in Northeast Asia are of great economic and conservation importance. Continued climate warming may cause profound changes in their distribution. We use prognostic and retrospective species distribution models based on the Random Forest machine learning method to estimate the potential range change of the dominant taiga conifer Jezo spruce (Picea jezoensis (Siebold & Zucc.) Carrière) for the year 2070 climate warming scenarios and for past climate epochs, the Last Glacial Maximum (LGM) (~21,000 years before present) and the mid-Holocene Climatic Optimum (MHO) (~7000 years before present), using the MIROC-ESM and CCSM4 climate models. The current suitable climatic conditions for P. jezoensis are estimated to be 500,000 km². Both climatic models show similar trends in past and future ranges but provide different quantitative areal estimates. During the LGM, the main part of the species range was located much further south than today at 35–45° N. Projected climate warming will cause a greater change in the distributional range of P. jezoensis than has occurred since the MHO. Overlapping climatic ranges at different times show that the Changbai Mountains, the central parts of the Japanese Alps, Hokkaido, and the Sikhote-Alin Mountains will remain suitable refugia for Jezo spruce until 2070. The establishment of artificial forest stands of P. jezoensis and intraspecific taxa in the future climate-acceptable regions may be important for the preservation of genetic diversity.
... However, the MaxEnt model was strongly influenced by specific variables, whereas the RF model was generally influenced by more variables than the MaxEnt model. This was consistent with the results of previous studies using the RF and MaxEnt models [48,52]. Additionally, the occurrence probability of WCSBs increased with high elevation, bright light at night, and proximity to roads, whereas it decreased as annual precipitation increased ( Figure 5). ...
Article
Full-text available
The western conifer seed bug (WCSB; Leptoglossus occidentalis) causes huge ecological and economic problems as an alien invasive species in forests. In this study, a species distribution model (SDM) was developed to evaluate the potential occurrence of the WCSBs and the effects of climate on WCSB distribution in South Korea. Based on WCSB occurrence and environmental data, including geographical and meteorological variables, SDMs were developed with maximum entropy (MaxEnt) and random forest (RF) algorithms, which are machine learning methods, and they showed good performance in predicting WCSB occurrence. On the potential distribution map of WCSBs developed by the model ensemble with integrated MaxEnt and RF models, the WCSB occurrence areas were mostly located at low altitudes, near roads, and in urban areas. Additionally, environmental factors associated with anthropogenic activities, such as roads and night lights, strongly influenced the occurrence and dispersal of WCSBs. Metropolitan cities and their vicinities in South Korea showed a high probability of WCSB occurrence. Furthermore, the occurrence of WCSBs in South Korea is predicted to intensify in the future owing to climate change.
... cn). We used ArcGIS 10.6 to test the cross-correlation of 30 variables and only those variables with a correlation coefficient (r²) < 0.8 were selected (Negrete et al., 2020; Zhao et al., 2022). Each ephemeral plant's occurrence records were randomly selected from each cell with dimensions of 20 × 20 km (Boria et al., 2014). ...
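The two preprocessing steps named in this excerpt, collinearity filtering and keeping one record per grid cell, can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the cited authors' ArcGIS workflow: the function names `drop_correlated` and `thin_to_grid`, the greedy keep-first rule, and the idea that coordinates are already projected in kilometres are all hypothetical. The threshold is applied to |r| here, whereas the excerpt writes r², and the sketch keeps the first record per cell rather than a random one.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences (non-constant inputs assumed)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(variables, threshold=0.8):
    """Greedily keep variables whose |r| with every already-kept variable is below threshold.

    `variables` maps a variable name to its values sampled at the same sites.
    Keeping variables in insertion order is an assumption made for this sketch.
    """
    kept = []
    for name, values in variables.items():
        if all(abs(pearson_r(values, variables[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def thin_to_grid(points, cell_km=20.0):
    """Keep the first occurrence record found in each cell_km x cell_km cell.

    `points` are (x_km, y_km) pairs in a projected coordinate system (assumed).
    """
    seen = {}
    for x, y in points:
        cell = (math.floor(x / cell_km), math.floor(y / cell_km))
        seen.setdefault(cell, (x, y))
    return list(seen.values())


if __name__ == "__main__":
    env = {
        "bio1": [1.0, 2.0, 3.0, 4.0, 5.0],
        "bio5": [2.1, 4.0, 6.2, 7.9, 10.1],  # nearly collinear with bio1
        "bio12": [5.0, 1.0, 4.0, 2.0, 3.0],
    }
    print(drop_correlated(env))
    print(thin_to_grid([(1.0, 1.0), (5.0, 19.0), (25.0, 3.0), (41.0, 41.0)]))
```

Swapping the keep-first rule for a random draw per cell would follow the excerpt's wording more closely, at the cost of non-deterministic output.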
Article
Full-text available
Background Arid and semi-arid regions account for about 40% of the world’s land surface area, and are the most sensitive areas to climate change, leading to a dramatic expansion of arid regions in recent decades. Ephemeral plants are crucial herbs in this area and are very sensitive to climate change, but it is still unclear which factors can determine the distribution of ephemeral plants and how the distribution of ephemeral plants responds to future climate change across the globe. Aims Understanding the impact of climate change on ephemeral plant distribution is crucial for sustainable biodiversity conservation. Methods This study explored the potential distribution of three types of ephemeral plants in arid and semi-arid regions (cold desert, hot desert, and deciduous forest) on a global scale using the MaxEnt software. We used species global occurrence data and 30 environmental factors in scientific collections. Results Our results showed that (1) the average value of the area under the receiver operating curve (AUC) of each species was higher than 0.95, indicating that the MaxEnt model’s simulation accuracy for each species was good; (2) distributions of cold desert and deciduous forest species were mainly determined by soil pH and annual mean temperature; the key factor that determines the distribution of hot desert species was precipitation of the driest month; and (3) the potential distribution of ephemeral plants in the cold desert was increased under one-third of climate scenarios; in the hot desert, the potential suitable distribution for Anastatica hierochuntica was decreased in more than half of the climate scenarios, but Trigonella arabica was increased in more than half of the climate scenarios. In deciduous forests, the ephemeral plant Crocus alatavicus decreased in nearly nine-tenths of climate scenarios, and Gagea filiformis was increased in 75% of climate scenarios. 
Conclusions The potential suitable distributions of ephemeral plants in the different ecosystems were closely related to their specific adaptation strategies. These results contribute to a comprehensive understanding of the potential distribution pattern of some ephemeral plants in arid and semi-arid ecosystems.
... In contrast, absence points (or "background" points) in MaxEnt were randomly sampled across the whole area (Massada et al., 2012; Oppel et al., 2012) and may, by chance, also coincide with single presence points, which might influence the model results. Nevertheless, in other comparative studies, both machine learning-based methods tend to perform similarly (Acharya et al., 2019; Bektas et al., 2022; Kaky et al., 2020; Kaky & Gilbert, 2016; Mi et al., 2017; Zhao et al., 2022). Because, in our case as well, the results of MaxEnt and of modeling strategy M1 (the best of the random forest models) were qualitatively similar, we only present the results of the random forest machine learning approaches in the results section. ...
Article
Full-text available
Subterranean animals act as ecosystem engineers, for example, through soil perturbation and herbivory, shaping their environments worldwide. As the occurrence of animals is often linked to above‐ground features such as plant species composition or landscape textures, satellite‐based remote sensing approaches can be used to predict the distribution of subterranean species. Here, we combine in‐situ collected vegetation composition data with remotely sensed data to improve the prediction of a subterranean species across a large spatial scale. We compared three machine learning‐based modeling strategies, including field and satellite‐based remote sensing data to different extents, in order to predict the distribution of the subterranean giant root‐rat GRR, Tachyoryctes macrocephalus, an endangered rodent species endemic to the Bale Mountains in southeast Ethiopia. We included no, some and extensive fieldwork data in the modeling to test how these data improved prediction quality. We found prediction quality to be particularly dependent on the spatial coverage of the training data. Species distributions were best predicted by using texture metrics and eyeball‐selected data points of landscape marks created by the GRR. Vegetation composition as a predictor showed the lowest contribution to model performance and lacked spatial accuracy. Our results suggest that the time‐consuming collection of vegetation data in the field is not necessarily required for the prediction of subterranean species that leave traceable above‐ground landscape marks like the GRR. Instead, remotely sensed and spatially eyeball‐selected presence data of subterranean species could profoundly enhance predictions. The usage of remote sensing‐derived texture metrics has great potential for improving the distribution modeling of subterranean species, especially in arid ecosystems. 
We compared three machine learning‐based modeling strategies, which included field‐ and remote‐sensing data to a different extent, for predicting the distribution of a subterranean species. We used the endangered giant root‐rat Tachyoryctes macrocephalus, endemic to the afro‐alpine ecosystem of the Bale Mountains in Ethiopia, for demonstrating that remotely sensed and spatially eyeball‐selected presence data of subterranean species could profoundly enhance distribution predictions. Our results suggest that the time‐consuming collection of vegetation data in the field is not necessarily required for the distribution prediction of subterranean species that leave traceable above‐ground marks in the landscape.
Article
Full-text available
Broussonetia papyrifera is an important native tree species in China with strong adaptability, wide distribution, and economic importance. Climate change is considered the main threat to ecological processes and global biodiversity. Predicting the potential geographical distribution of B. papyrifera in future climate change scenarios will provide a scientific basis for ecological restoration in China. Principal component analysis (PCA) and Pearson correlation analysis were conducted to select the environmental variables. The distribution and changes in the potential suitable area for B. papyrifera were predicted using the maximum entropy (MaxEnt) model and the CMIP6 dataset from 2041 to 2060. The current highly suitable areas for B. papyrifera were mainly located in Guangdong (5.60×10⁴ km²), Guangxi (4.39×10⁴ km²), Taiwan (2.54×10⁴ km²), and Hainan (2.17×10⁴ km²) provinces. The mean temperature of the coldest quarter (11.54–27.11 ℃), precipitation of the driest quarter (51.48–818.40 mm), and precipitation of the wettest quarter (665.51–2302.60 mm) were the main factors limiting the suitable areas for B. papyrifera. The multi-model averages of the highly suitable and total suitable areas for B. papyrifera were 111.42×10⁴ km² and 349.11×10⁴ km² in the SSP5-8.5 scenario, while those in the SSP1-2.6 scenario were 87.50×10⁴ km² and 328.29×10⁴ km², respectively. The gained suitable areas for B. papyrifera will expand to western and northern China in the future scenarios. The multi-model averaging results showed that the potential available planting area was 212.66×10⁴ km² and 229.32×10⁴ km² in the SSP1-2.6 and SSP5-8.5 scenarios, respectively, when the suitable area within the farmland range was excluded.
Article
Full-text available
Wildfires directly affect global ecosystem stability and severely threaten human life. The mountainous areas of Southwest China experience frequent wildfires. Mapping the susceptibility patterns and analyzing the drivers of wildfires are crucial for effective wildfire management, especially considering that the inclusion of seasonal dimensions will produce more dynamic results. Using Yunnan Province of China as a case study area, a method was attempted to distinguish dependable wildfires by season, while possible wildfire drivers were obtained and refined within seasons. The patterns of wildfire susceptibility in different seasons were modeled based on the Maxent and random forest models. Then, the spatial relationships between wildfire and potential drivers were analyzed integrating with GeoDetector to evaluate the variable importance of drivers and the marginal effect of drivers. The results showed that the two models effectively depicted each season's wildfire susceptibility. The susceptible wildfire areas in spring and winter are located throughout Yunnan Province, with anthropogenic factors being the most significant drivers. During the summer and autumn, wildfire risk areas are relatively concentrated, showing a trend of dominant drought-driven and humid conditions. The differences in wildfire drivers across seasons reflect the lagged effect of climate factors on wildfires, leading to significant discrepancies in the marginal effects of how seasonal drivers affect wildfires. The findings improve our understanding of the effects of the interseasonal variability of environmental variables on wildfires and promote the development of specific seasonal wildfire management strategies.
Article
Full-text available
Largehead hairtail Trichiurus japonicus is a major commercial fish species in the Beibu Gulf of the northwestern South China Sea. Despite much effort to protect the fishery resource, the current stock of T. japonicus is overexploited. As the impacts of climate change unfold globally, seasonal changes in the distribution of largehead hairtail in the Beibu Gulf have not yet been clarified. Maximum entropy model based on mixed layer depth and salinity were projected onto seasonal habitat changes of T. japonicus in the Beibu Gulf under a current scenario and three different Representative Concentration Pathways (126, 370, 585) to evaluate geographic distribution changes under the different climate-change scenarios. The current geographic distribution results showed variation with seasonality, as the wintering population shifts toward the northeast. Under each of three SSP scenarios, there is higher risk to habitat suitability in the 2090s as compared with that in the 2050s. The disadvantage to T. japonicus distribution is greatest in winter under each of the three climate change scenarios, both in the short- and long-term. Potential suitable habitat distributions have a minor range extension in Representative Concentration Pathway 370–2050 winter, but in the rest of the scenes and years they contract to south of the Beibu Gulf. The overall results indicate that seasonal differences in suitable habitat should be considered to ensure effective planning of future management strategies for T. japonicus.
Article
Full-text available
Background Hemorrhagic fever with renal syndrome (HFRS) is a serious public health problem in China. Its geographic distribution has spread throughout China, and Zhejiang Province is an important epidemic area. Since 1963, more than 110,000 cases have been reported. Methods We collected the meteorological factors and socioeconomic indicators of Zhejiang Province, and constructed the HFRS ecological niche model of Zhejiang Province based on the maximum entropy algorithm. Results The model AUC from 2009 to 2018 was 0.806–0.901. The high incidence of epidemics in Zhejiang Province is mainly concentrated in the eastern, western and central regions of the province. The contribution of the digital elevation model from 2009 to 2018 ranged from 4.22 to 26.0%. The contribution of average temperature ranged from 6.26 to 19.65%, that of Gross Domestic Product from 7.53 to 21.25%, and that of average land surface temperature peaked at 16.73% in 2011. In addition, the average contribution of DMSP/OLS, 20-8 precipitation and 8-20 precipitation were all in the range of 9%. All-day precipitation increases with the increase of rainfall, and the effect curve peaks at 1,250 mm, then decreases rapidly, and a small peak appears again at 1,500 mm. The average temperature response curve shows an inverted V-shape, where the incidence peaks at 17.8°C. The response curve of HFRS for GDP and DMSP/OLS shows a positive correlation. Conclusion The incidence of HFRS in Zhejiang Province peaked in areas where the average temperature was 17.8°C, a reminder that in areas where the temperature is suitable, personal protection should be taken when going outdoors to avoid contact with rodents. The impact of GDP and DMSP/OLS on HFRS is positively correlated. Most cities have good medical conditions, but we should consider whether there are under-diagnosed cases in economically underdeveloped areas.
Article
Full-text available
Invasive species have been the focus of ecologists due to their undesired impacts on the environment. The extent and rapid increase in invasive plant species is recognized as a natural cause of global-biodiversity loss and degrading ecosystem services. Biological invasions can affect ecosystems across a wide spectrum of bioclimatic conditions. Understanding the impact of climate change on species invasion is crucial for sustainable biodiversity conservation. In this study, the possibility of mapping the distribution of invasive Prosopis juliflora (Swartz) DC. was shown using present background data in Khuzestan Province, Iran. After removing the spatial bias of background data by creating weighted sampling bias grids for the occurrence dataset, we applied six modelling algorithms (generalized additive model (GAM), classification tree analysis (CTA), random forest (RF), multivariate adaptive regression splines (MARS), maximum entropy (MaxEnt) and ensemble model) to predict invasion distribution of the species under current and future climate conditions for both optimistic (RCP 2.6) and pessimistic (RCP 8.5) scenarios for the years 2050 and 2070, respectively. Predictor variables including weighted mean of CHELSA (climatologies at high resolution for the Earth’s land surface areas)-bioclimatic variables and geostatistical-based bioclimatic variables (1979–2020), physiographic variables extracted from shuttle radar topography mission (SRTM) and some human factors were used in modelling process. To avoid causing a biased selection of predictors or model coefficients, we resolved the spatial autocorrelation of presence points and multi-collinearity of the predictors. As in a conventional receiver operating characteristic (ROC), the area under curve (AUC) is calculated using presence and absence observations to measure the probability and the two error components are weighted equally. 
All models were evaluated using partial ROC at different thresholds and other statistical indices derived from the confusion matrix. Sensitivity analysis showed that mean diurnal range (Bio2) and annual precipitation (Bio12) explained more than 50% of the changes in the invasion distribution and played a pivotal role in mapping habitat suitability of P. juliflora. At all thresholds, the ensemble model differed significantly from the single models. However, MaxEnt and RF outperformed the other models. Under climate change scenarios, it is predicted that suitable areas for this invasive species will increase in Khuzestan Province, and the increase in climatically suitable areas will facilitate its future spread. These findings can support conservation planning and management efforts in ecological engineering and be used in formulating preventive measures.
Article
Full-text available
Accurate species delimitation is the key to precise estimation of species diversity and is fundamental to most branches of biology. Unclear species boundaries within species complexes may lead to the underestimation of species diversity. However, species delimitation of species complexes remains challenging due to the continuum of phenotypic variations. To robustly examine species boundaries within a species complex, integrative approaches in phylogeny, ecology and morphology were applied to the Stewartia sinensis complex (Theaceae) endemic to China. Multispecies coalescent‐based species delimitation using 572 nuclear ortholog sequences (anchored enrichment) supported reciprocal phylogenetic monophyly of the northern lineage (NL) and southern lineage (SL) which were not sister clades. Niche equivalency and similarity tests demonstrated significant climatic niche differentiation between NL and SL with observed Warren et al.’s I = 0.0073 and Schoener’s D = 0.0021. Species distribution modeling also separated their potential distribution. Morphometric analyses suggested an inter‐lineage differentiation of multiple traits including the ratio of length and width, leaf width, and pedicel length significantly although overall similarity did not differ. Based on the integrative species concept, two distinct species were proposed with legitimate names of S. gemmata for SL and S. sinensis for NL, respectively. Our empirical study of the S. sinensis complex highlights the importance of applying multiple species criteria, in particular the underappreciated niche differentiation, to species delimitation in species complexes pervasive in plants.
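The two niche-overlap indices quoted in this abstract can be illustrated with a short sketch. It assumes the commonly used definitions over two suitability surfaces normalised to sum to one: Schoener's D = 1 − ½ Σ|p₁ − p₂| and Warren et al.'s I = 1 − ½ Σ(√p₁ − √p₂)². The function names and the toy grids are hypothetical, and the cited study may have computed the indices in environmental rather than geographic space.

```python
import math


def _normalise(suitability):
    """Scale non-negative suitability scores so they sum to 1 over all cells."""
    total = sum(suitability)
    return [s / total for s in suitability]


def schoeners_d(a, b):
    """Schoener's D: 1 means identical surfaces, 0 means no overlap."""
    pa, pb = _normalise(a), _normalise(b)
    return 1.0 - 0.5 * sum(abs(x - y) for x, y in zip(pa, pb))


def warrens_i(a, b):
    """Warren et al.'s I, based on the Hellinger distance between the surfaces."""
    pa, pb = _normalise(a), _normalise(b)
    return 1.0 - 0.5 * sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(pa, pb))


if __name__ == "__main__":
    # Toy suitability grids flattened to lists (hypothetical values).
    northern = [0.9, 0.8, 0.1, 0.0]
    southern = [0.0, 0.1, 0.7, 0.9]
    print(schoeners_d(northern, southern), warrens_i(northern, southern))
```

Values near zero, like those reported above, indicate that the two lineages occupy almost entirely different climatic conditions.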
Article
Full-text available
Bioclimatic envelope models are commonly used to assess the influence of climate change on species' distributions and biodiversity patterns. Understanding how methodological choices influence these models is critical for a comprehensive evaluation of the estimated impacts. Here we systematically assess the performance of bioclimatic envelope models in relation to the selection of predictors, modeling technique, and pseudo‐absences. We considered (a) five different predictor sets, (b) seven commonly used modeling techniques and an ensemble model, and (c) three sets of pseudo‐absences (1,000 pseudo‐absences, 10,000 pseudo‐absences, and the same as the number of presences). For each combination of predictor set, modeling technique, and pseudo‐absence set, we fitted bioclimatic envelope models for 300 species of mammals, amphibians, and freshwater fish, and evaluated the predictive performance of the models using the true skill statistic (TSS), based on a spatially independent test set as well as cross‐validation. On average across the species, model performance was mostly influenced by the choice of predictor set, followed by the choice of modeling technique. The number of the pseudo‐absences did not have a strong effect on the model performance. Based on spatially independent testing, ensemble models based on species‐specific nonredundant predictor sets revealed the highest predictive performance. In contrast, the Random Forest technique yielded the highest model performance in cross‐validation but had the largest decrease in model performance when transferred to a different spatial context, thus highlighting the need for spatially independent model evaluation. We recommend building bioclimatic envelope models according to an ensemble modeling approach based on a nonredundant set of bioclimatic predictors, preferably selected for each modeled species.
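The true skill statistic used for evaluation in this abstract is sensitivity plus specificity minus one, computed from a thresholded confusion matrix. The sketch below shows that arithmetic only; the function name, the 0.5 default threshold, and the toy data are assumptions made for illustration and are not taken from the study.

```python
def true_skill_statistic(observed, predicted, threshold=0.5):
    """Return TSS = sensitivity + specificity - 1.

    `observed` holds 1 for presence and 0 for (pseudo-)absence;
    `predicted` holds suitability scores that are cut at `threshold`.
    Both classes must be present in `observed`.
    """
    tp = fn = tn = fp = 0
    for obs, score in zip(observed, predicted):
        positive = score >= threshold
        if obs == 1:
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    sensitivity = tp / (tp + fn)   # share of presences predicted as present
    specificity = tn / (tn + fp)   # share of absences predicted as absent
    return sensitivity + specificity - 1.0


if __name__ == "__main__":
    obs = [1, 1, 1, 1, 0, 0, 0, 0]
    pred = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.6, 0.9]
    print(true_skill_statistic(obs, pred))
```

TSS ranges from −1 to 1, with values at or below zero indicating performance no better than random, which is why the abstract's contrast between cross-validated and spatially independent scores matters.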
Article
Full-text available
Aim: Despite the large literature documenting the negative effects of invasive grasses, we lack an understanding of the drivers of their habitat suitability, especially for shade-tolerant species that do not respond positively to canopy disturbance. We aimed to understand the environmental niche and potential spatial distribution of a relatively new invasive species, wavyleaf basketgrass (Oplismenus undulatifolius (Ard.) Roem. & Schult, WLBG) by leveraging data available at two different spatial scales. Location: Mid-Atlantic region of the United States. Methods: Maximum entropy modeling (Maxent) was used to predict the habitat suitability of WLBG at the regional scale and the landscape scale. Following variable evaluation, model calibration, and model evaluation, final models were created using 1,000 replicates and projected to each study area. Results: At the regional scale, our best models show that suitability for WLBG was driven by relatively high annual mean temperatures, low temperature seasonality and monthly range, low slope, and high cumulative Normalized Difference Vegetation Index (NDVI). At the landscape scale, suitability was highest near roads and streams, far from trails, at low elevations, in sandy, moist soil, and in areas with high NDVI. Main conclusions: We found that invasion potential of this relatively new invader appears high in productive, mesic habitats at low slope and elevations. At the regional scale, our model predicted areas of suitable habitat far outside areas where WLBG has been reported, including large portions of Virginia and West Virginia, suggesting serious potential for spread. However, large portions of this area carry a high extrapolation risk and should therefore be interpreted with caution. In contrast, at the landscape level, the suitability of WLBG is largely restricted to areas near current presence points, suggesting that the expansion risk of this species within Shenandoah National Park is somewhat limited.
Article
Full-text available
Predictive models are central to both archaeological research and cultural resource management. Yet, archaeological applications of predictive models are often insufficient due to small training data sets, inadequate statistical techniques, and a lack of theoretical insight to explain the responses of past land use to predictor variables. Here we address these critiques and evaluate the predictive power of four statistical approaches widely used in ecological modeling—generalized linear models, generalized additive models, maximum entropy, and random forests—to predict the locations of Formative Period (2100–650 BP) archaeological sites in the Grand Staircase-Escalante National Monument. We assess each modeling approach using a threshold-independent measure, the area under the curve (AUC), and threshold-dependent measures, like the true skill statistic. We find that the majority of the modeling approaches struggle with archaeological datasets due to the frequent lack of true-absence locations, which violates model assumptions of generalized linear models, generalized additive models, and random forests, as well as measures of their predictive power (AUC). Maximum entropy is the only method tested here which is capable of utilizing pseudo-absence points (inferred absence data based on known presence data) and controlling for a non-representative sampling of the landscape, thus making maximum entropy the best modeling approach for common archaeological data when the goal is prediction. Regression-based approaches may be more applicable when prediction is not the goal, given their grounding in well-established statistical theory. Random forests, while the most powerful, is not applicable to archaeological data except in the rare case where true-absence data exist. Our results have significant implications for the application of predictive models by archaeologists for research and conservation purposes and highlight the importance of understanding model assumptions.
Article
In the last two decades, various security authorities around the world have acknowledged the importance of exploiting the ever-growing amount of information published on the web on various types of events for early detection of certain threats, situation monitoring and risk analysis. Since the information related to a particular real-world event might be scattered across various sources and mentioned on different dates, an important task is to link together all event mentions that are interrelated. This article studies the application of various statistical and machine learning techniques to solve a new application-oriented variation of the task of event pair relatedness classification, which merges different fine-grained event relation types reported elsewhere into one concept. The task focuses on linking event templates automatically extracted from online news by an existing event extraction system, which contain only short text snippets, and potentially erroneous and incomplete information. Results of exploring the performance of shallow learning methods such as decision tree-based random forest and gradient boosted tree ensembles (XGBoost) along with kernel-based support vector machines (SVM) are presented in comparison to both simpler shallow learners and a deep learning approach based on a long short-term memory (LSTM) recurrent neural network. Our experiments focus on using linguistically lightweight features (some of which are not reported elsewhere) which are easily portable across languages. We obtained F1 scores ranging from 92% (simplest shallow learner) to 96.4% (LSTM-based recurrent neural network) evaluated on a newly created event linking corpus.
Article
Sclerophrys perreti is a critically endangered Nigerian native frog currently imperilled by human activities. A better understanding of its potential distribution and habitat suitability will aid in conservation; however, such knowledge is limited for S. perreti. Herein, we used a species distribution model (SDM) approach with all known occurrence data (n = 22) from our field surveys and primary literature, and environmental variable predictors (19 bioclimatic variables, elevation and land cover) to elucidate habitat suitability and the impact of climate change on this species. The SDM showed that temperature and precipitation were the predictors of habitat suitability for S. perreti, with precipitation seasonality as the strongest predictor of habitat suitability. The following variables also had a significant effect on habitat suitability: temperature seasonality, temperature annual range, precipitation of driest month, mean temperature of wettest quarter and isothermality. The model predicted current suitable habitat for S. perreti covering an area of 1,115 km². However, this habitat is predicted to experience a 60% reduction by 2050 owing to changes in temperature and precipitation. The SDM also showed that suitable habitat exists in the south‐eastern range of the inselberg, with predicted low impact of climate change compared to other ranges. Therefore, this study recommends improved conservation measures through collaborations and stakeholder meetings with local farmers for the management and protection of S. perreti.
Article
The acid frogs of eastern Australia are a highly specialized group of threatened species endemic to acidic coastal wetlands of southern Queensland and New South Wales. The distribution of these species overlaps with areas of increasing development where land‐use intensification poses a significant threat. Successful conservation of these species requires that areas of high conservation value for acid frogs are properly identified and protected, particularly in south‐east Queensland which supports important populations of all four acid frog species: Litoria olongburensis, Litoria freycineti, Crinia tinnula, and the Queensland‐endemic Litoria cooloolensis. Species distribution modeling using rigorously vetted species occurrence data was used to identify areas of potential acid frog habitat with >89% predictive power for all species. Key predictor variables for acid frog species occurrence included: soil sandiness, vegetation, presence and/or type of wetland, and soil clay content. All species' predicted distributions occurred primarily in coastal regions, overlapping with densely human‐populated areas. Our modeling and analysis of species' distributions highlight local government areas where protection of wallum habitat is most important for the conservation of acid frogs, as well as areas of higher conservation value providing habitat for multiple acid frog species.
Article
Climate change is likely to impact multiple dimensions of biodiversity. Species range shifts are expected and may drive changes in the composition of species assemblages. In some regions, changes in climate may precipitate the loss of geographically restricted, niche specialists and facilitate their replacement by more widespread, niche generalists, leading to decreases in β‐diversity and biotic homogenization. However, in other regions climate change may drive local extinctions and range contraction, leading to increases in β‐diversity and biotic heterogenization. Regional topography should be a strong determinant of such changes as mountainous areas often are home to many geographically restricted species, whereas lowlands and plains are more often inhabited by widespread generalists. Climate warming, therefore, may simultaneously bring about opposite trends in β‐diversity in mountainous highlands versus relatively flat lowlands. To test this hypothesis, we used species distribution modelling to map the present‐day distributions of 2669 Neotropical anuran species, and then generated projections of their future distributions assuming future climate change scenarios. Using traditional metrics of β‐diversity, we mapped shifts in biotic homogenization across the entire Neotropical region. We used generalized additive models to then evaluate how changes in β‐diversity were associated with shifts in species richness, phylogenetic diversity and one measure of ecological generalism. Consistent with our hypothesis, we find increasing biotic homogenization in most highlands, associated with increased numbers of generalists and, to a lesser extent, losses of specialists, leading to an overall increase in alpha diversity, but lower mean phylogenetic diversity. In the lowlands, biotic heterogenization was more common, and primarily driven by local extinctions of generalists, leading to lower α‐diversity, but higher mean phylogenetic diversity. 
Our results suggest that impacts of climate change on β‐diversity are likely to vary regionally, but will generally lead to lower diversity, with increases in β‐diversity offset by decreases in α‐diversity.
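To make the link between β‐diversity and biotic homogenization in the abstract above concrete, here is a minimal Python sketch. It assumes Sørensen dissimilarity as the pairwise β‐diversity metric; the abstract only says "traditional metrics", so that choice, the function name, and the toy species lists are all illustrative assumptions.

```python
def sorensen_dissimilarity(site_a, site_b):
    """Pairwise beta-diversity: 1 - 2*shared / (richness_a + richness_b)."""
    a, b = set(site_a), set(site_b)
    if not a and not b:
        return 0.0  # two empty assemblages are treated as identical
    shared = len(a & b)
    return 1.0 - (2.0 * shared) / (len(a) + len(b))


# Hypothetical assemblages for two neighbouring grid cells.
present_cell_1 = {"sp1", "sp2", "sp3"}
present_cell_2 = {"sp3", "sp4", "sp5"}
future_cell_1 = {"sp2", "sp3", "sp4"}  # a generalist (sp4) arrives, a specialist (sp1) is lost
future_cell_2 = {"sp3", "sp4", "sp5"}

beta_present = sorensen_dissimilarity(present_cell_1, present_cell_2)
beta_future = sorensen_dissimilarity(future_cell_1, future_cell_2)

# A drop in dissimilarity between the same cells indicates homogenization.
print(beta_present, beta_future, beta_future - beta_present)
```

In this toy case the two cells become more alike, so β‐diversity falls even though the richness of each cell is unchanged.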
Article
Random forests (RF) is a powerful species distribution model (SDM) algorithm. This ensemble model by default can produce categorical and numerical species distribution maps based on its classification tree (CT) and regression tree (RT) algorithms, respectively. The CT algorithm can also produce numerical predictions (class probability). Here, we present a detailed procedure involving the use of the CT and RT algorithms using the RF method with presence-only data to model the distribution of species. CT and RT are used to generate numerical prediction maps, and then numerical predictions are converted to binary predictions through objective threshold-setting methods. We also applied simple methods to deal with collinearity of predictor variables and spatial autocorrelation of species occurrence data. A geographically stratified sampling method was employed for generating pseudo-absences. The detailed procedural framework is meant to be a generic method to be applied to virtually any SDM prediction question using presence-only data.
• How to use RF as a standard method for generic species distributions with presence-only data
• How to choose RF (CT or RT) methods for the distribution modeling of species
• A general and detailed procedure for any SDM prediction question
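The step of converting numerical predictions to binary predictions can be sketched as follows. The abstract does not say which objective threshold-setting methods were used; this Python sketch assumes one common rule, choosing the threshold that maximizes sensitivity plus specificity. The function names and toy scores are hypothetical.

```python
def best_threshold(presence_scores, pseudo_absence_scores):
    """Pick the observed score that maximizes sensitivity + specificity."""
    candidates = sorted(set(presence_scores) | set(pseudo_absence_scores))
    best_t, best_value = None, float("-inf")
    for t in candidates:
        sensitivity = sum(s >= t for s in presence_scores) / len(presence_scores)
        specificity = sum(s < t for s in pseudo_absence_scores) / len(pseudo_absence_scores)
        if sensitivity + specificity > best_value:  # strict '>' keeps the lowest tied threshold
            best_t, best_value = t, sensitivity + specificity
    return best_t


def to_binary(scores, threshold):
    """Convert numerical predictions to presence (1) / absence (0)."""
    return [1 if s >= threshold else 0 for s in scores]


# Toy class-probability outputs (hypothetical values).
toy_presence = [0.9, 0.8, 0.6, 0.4]
toy_pseudo_absence = [0.7, 0.3, 0.2, 0.1]

threshold = best_threshold(toy_presence, toy_pseudo_absence)
binary_map = to_binary([0.9, 0.35, 0.4], threshold)
print(threshold, binary_map)
```

Other objective rules, such as fixing sensitivity at a target value, would slot into `best_threshold` without changing `to_binary`.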
Article
Spatiotemporal predictions of bycatch (i.e., catch of nontargeted species) have shown promise as dynamic ocean management tools for reducing bycatch. However, which spatiotemporal model framework to use for generating these predictions is unclear. We evaluated a relatively new method, Gaussian Markov random fields (GMRFs), with two other frameworks, generalized additive models (GAMs) and random forests. We fit geostatistical delta-models to fisheries observer bycatch data for six species with a broad range of movement patterns (e.g., highly migratory sea turtles versus sedentary rockfish) and bycatch rates (percentage of observations with nonzero catch, 0.3%–96.2%). Random forests had better interpolation performance than the GMRF and GAM models for all six species, but random forests performance was more sensitive when predicting data at the edge of the fishery (i.e., spatial extrapolation). Using random forests to identify and remove the 5% highest bycatch risk fishing events reduced the bycatch-to-target species catch ratio by 34% on average. All models considerably reduced the bycatch-to-target ratio, demonstrating the clear potential of species distribution models to support spatial fishery management.
Article
Understanding geographic distributions of species is a crucial step in spatial planning for biodiversity conservation, particularly as regards changes in response to global climate change. This information is especially important for species of global conservation concern that are susceptible to the effects of habitat loss and climate change. In this study, we used ecological niche modeling to assess the current and future geographic distributional potential of White-breasted Guineafowl (Agelastes meleagrides) (Vulnerable) across West Africa. We used primary occurrence data obtained from the Global Biodiversity Information Facility and national parks in Liberia and Sierra Leone, and two independent environmental datasets (Moderate Resolution Imaging Spectroradiometer normalized difference vegetation index at 250 m spatial resolution, and Worldclim climate data at 2.5′ spatial resolution for two representative concentration pathway emissions scenarios and 27 general circulation models for 2050) to build ecological niche models in Maxent. From the projections, White-breasted Guineafowl showed a broader potential distribution across the region compared to the current IUCN range estimate for the species. Suitable areas were concentrated in the Gola rainforests in northwestern Liberia and southeastern Sierra Leone, the Tai-Sapo corridor in southeastern Liberia and southwestern Côte d’Ivoire, and the Nimba Mountains in northern Liberia, southeastern Guinea, and northwestern Côte d’Ivoire. Future climate-driven projections anticipated minimal range shifts in response to climate change. By combining remotely sensed data and climatic data, our results suggest that forest cover, rather than climate is the major driver of the species’ current distribution. Thus, conservation efforts should prioritize forest protection and mitigation of other anthropogenic threats (e.g. hunting pressure) affecting the species.
Article
The construction of transport infrastructure is often preceded by an environmental impact assessment procedure, which should identify amphibian breeding sites and migration routes. However, the assessment is very difficult to conduct because of the large number of habitats spread out over a vast expanse, and the limited amount of time available for fieldwork. We propose utilizing local environmental variables that can be gathered remotely using only GIS systems and satellite images together with machine learning methods. In this article, we introduce six new and easily extractable types of environmental features. Most of the features we propose can be easily obtained from satellite imagery and spatial development plans. The proposed feature space was evaluated using four machine learning algorithms, namely: a C4.5 decision tree, AdaBoost, random forest and gradient-boosted trees. The obtained results indicated that the proposed feature space facilitated prediction and was comparable to other solutions. Moreover, three of the new proposed features are ranked most important; these are the three dominant properties of the surroundings of water reservoirs. One of the new features is the percentage access from the edges of the reservoir to open areas, but it affects only a few species. Furthermore, our research confirmed that the gradient-boosted trees were the best method for the analyzed dataset.
Article
Background: Ecological niche modeling is a set of analytical tools with applications in diverse disciplines, yet creating these models rigorously is now a challenging task. The calibration phase of these models is critical, but despite recent attempts at providing tools for performing this step, adequate detail is still missing. Here, we present the kuenm R package, a new set of tools for performing detailed development of ecological niche models using the platform Maxent in a reproducible way.
Results: This package takes advantage of the versatility of R and Maxent to enable detailed model calibration and selection, final model creation and evaluation, and extrapolation risk analysis. Best parameters for modeling are selected considering (1) statistical significance, (2) predictive power, and (3) model complexity. For final models, we enable multiple parameter sets and model transfers, making processing simpler. Users can also evaluate extrapolation risk in model transfers via mobility-oriented parity (MOP) metric.
Discussion: Use of this package allows robust processes of model calibration, facilitating creation of final models based on model significance, performance, and simplicity. Model transfers to multiple scenarios, also facilitated in this package, significantly reduce time invested in performing these tasks. Finally, efficient assessments of strict-extrapolation risks in model transfers via the MOP and MESS metrics help to prevent overinterpretation in model outcomes.
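The three selection criteria listed in the abstract above (significance, predictive power, complexity) can be read as a sequence of filters. The Python sketch below is only an illustration of that idea and is not the kuenm API. The dictionary keys, the use of omission rate and AICc as stand-ins for predictive power and complexity, and the cut-off values are all assumptions made for the example.

```python
def select_models(candidates, alpha=0.05, max_omission=0.05, max_delta_aicc=2.0):
    """Filter candidate models in three steps (all thresholds are assumed values)."""
    # (1) Statistical significance: keep models with a small p-value.
    significant = [c for c in candidates if c["p_value"] < alpha]
    # (2) Predictive power: keep models with a low omission rate.
    low_omission = [c for c in significant if c["omission_rate"] <= max_omission]
    if not low_omission:
        return []
    # (3) Complexity: keep models close to the lowest AICc among the survivors.
    best_aicc = min(c["aicc"] for c in low_omission)
    return [c for c in low_omission if c["aicc"] - best_aicc <= max_delta_aicc]


# Hypothetical calibration results for five candidate parameter settings.
toy_candidates = [
    {"name": "a", "p_value": 0.01, "omission_rate": 0.03, "aicc": 1000.0},
    {"name": "b", "p_value": 0.01, "omission_rate": 0.04, "aicc": 1001.5},
    {"name": "c", "p_value": 0.20, "omission_rate": 0.01, "aicc": 990.0},
    {"name": "d", "p_value": 0.01, "omission_rate": 0.10, "aicc": 980.0},
    {"name": "e", "p_value": 0.01, "omission_rate": 0.05, "aicc": 1010.0},
]

selected_names = [c["name"] for c in select_models(toy_candidates)]
print(selected_names)
```

Applying the filters in this order means a model with the lowest AICc can still be rejected if it is not significant or omits too many test records, as candidates c and d show.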
Article
Climate change‐induced species range shift may pose severe challenges to species conservation. The Qinghai‐Tibet Plateau is the highest and biggest plateau, and also one of the most sensitive areas to global warming in the world, which provides important shelters for a unique assemblage of species. Here, an ecological niche‐based model was employed to project the potential distributions of 59 key rare and endangered species under three climate change scenarios (RCP2.6, RCP4.5 and RCP8.5) in Qinghai Province. I assessed the potential impacts of climate change on these key species (habitats, species richness and turnover) and the effectiveness of nature reserves (NRs) in protecting these species. The results revealed that climate change would shrink the geographic ranges of about a third of the studied species and expand the habitats of two thirds of these species, which would thus alter the conservation value of some local areas and the conservation effectiveness of some NRs in Qinghai Province. Some regions require special attention as they are expected to experience significant changes in species turnover, species richness or newly colonized species in the future, including Haidong, Haibei and Haixi junctions, the southwestern Yushu, Qinghai Nuomuhong Provincial NR, Qinghai Qaidam and Haloxylon Forest NR. Haidong and the eastern part of Haibei are projected to have high species richness and conservation value both currently and in the future, but they are currently not protected, and thus require extra protection in the future. The results could provide the first basis on the high latitude region to formulate biodiversity conservation strategies on climate change adaptation.
Article
Climate change might drive species declines by altering species interactions, such as host–parasite interactions. However, few studies have combined experiments, field data, and historical climate records to provide evidence that an interaction between climate change and disease caused any host declines. A recently proposed hypothesis, the thermal mismatch hypothesis, could identify host species that are vulnerable to disease under climate change because it predicts that cool‐ and warm‐adapted hosts should be vulnerable to disease at unusually warm and cool temperatures, respectively. Here, we conduct experiments on Atelopus zeteki, a critically endangered, captively bred frog that prefers relatively cool temperatures, and show that frogs have high pathogen loads and high mortality rates only when exposed to a combination of the pathogenic chytrid fungus (Batrachochytrium dendrobatidis) and high temperatures, as predicted by the thermal mismatch hypothesis. Further, we tested various hypotheses to explain recent declines experienced by species in the amphibian genus Atelopus that are thought to be associated with B. dendrobatidis and reveal that these declines are best explained by the thermal mismatch hypothesis. As in our experiments, only the combination of rapid increases in temperature and infectious disease could account for the patterns of declines, especially in species adapted to relatively cool environments. After combining experiments on declining hosts with spatiotemporal patterns in the field, our findings are consistent with the hypothesis that widespread species declines, including possible extinctions, have been driven by an interaction between increasing temperatures and infectious disease. Moreover, our findings suggest that hosts adapted to relatively cool conditions will be most vulnerable to the combination of increases in mean temperature and emerging infectious diseases.
Article
This study tested and compared the mineral potential mapping capabilities of the random forest (RF) and maximum entropy (MaxEnt) algorithms using gold deposit occurrences within the Hezuo–Meiwu district, West Qinling Orogen, China. Eighteen orogenic gold deposits in this district and associated regional exploration datasets were used to construct data-driven predictive models to identify locations prospective for gold mineralization. The 18 orogenic gold deposits used in the modeling can be divided into magmatic-hydrothermal gold deposits and mesothermal gold deposits in terms of metallogenic characteristics, and nine evidential maps associated with Au deposit occurrences (i.e., distance to intrusions and faults; Au, As, Ag, Cu, and Sb singularity indices; and principal component scores (PC1 and PC2) based on isometric logratio-transformed geochemical data) were selected as inputs to the models. PC1 represents a primary geochemical signature of tectonic processes or their products (i.e., fault system), whereas PC2 represents a secondary geochemical signature. Both RF and MaxEnt models were then used to quantitatively rank the importance and identify the sensitivity of the evidential maps based on their spatial relationships to the known gold deposits in the study area. The two groups of populations in the response curves and marginal effect curves indicate that the mineral potential mapping should be performed by zones in consideration of different metallogenic characteristics of gold deposits. The accuracy of the resulting models was then assessed, and the results of the mineral potential mapping were examined using receiver operating characteristic (ROC) analysis, capture-efficiency curve, and success rate curve. With both RF and MaxEnt models, mineral potential mapping by zones has higher area under the ROC curve (AUC) values than the models fitted to the whole study area and delineates 19% of the study area containing > 88% of the known deposit occurrences.
Finally, according to the concentration–area (C-A) thresholds for prospectivity maps, two ternary prospectivity maps were generated for further mineral exploration. The results indicate that the RF and MaxEnt algorithms can be used effectively for mineral potential mapping and represent machine learning algorithms that can be used in areas with a few known mineral occurrences.
Article
We report new records of Pinyon Jay (Gymnorhinus cyanocephalus) in Chihuahua, northern Mexico. All were made at Rancho Canoas, in the municipality of Gómez Farías, Chihuahua, involving more than 50 individuals between October 2014 and October 2015. Despite being considered a casual visitor to the Alta Babícora Basin, the presence of G. cyanocephalus may reflect the abundant Pinus cembroides in this region, as the species primarily inhabits forests of pine and Juniperus. We discuss the species' current and historical status, based on the published literature, online databases, and unpublished sightings from experienced birdwatchers. We compared the environmental parameters of available records across the species' geographic range with those in Chihuahua, and found no climatic differences between them.
Article
As a consequence of anthropogenic environmental change, the world is facing a possible sixth mass extinction event. The severity of this biodiversity crisis is exemplified by the rapid collapse of hundreds of amphibian populations around the world. Amphibian declines are associated with a range of factors including habitat loss/modification, human utilisation, exotic/invasive species, environmental acidification and contamination, infectious disease, climate change, and increased ultraviolet-B radiation (UVBR) due to stratospheric ozone depletion. However, it is recognised that these factors rarely act in isolation and that amphibian declines are likely to be the result of complex interactions between multiple anthropogenic and natural factors. Here we present a synthesis of the effects of ultraviolet radiation (UVR) in isolation and in combination with a range of naturally occurring abiotic (temperature, aquatic pH, and aquatic hypoxia) and biotic (infectious disease, conspecific density, and predation) factors on amphibians. We highlight that examining the effects of UVR in the absence of other ecologically relevant environmental factors can greatly oversimplify and underestimate the effects of UVR on amphibians. We propose that the pathways that give rise to interactive effects between multiple environmental factors are likely to be mediated by the behavioural and physiological responses of amphibians to each of the factors in isolation. A sound understanding of these pathways can therefore be gained from the continued use of multi-factorial experimental studies in both the laboratory and the field. Such an understanding will provide the foundation for a strong theoretical framework that will allow researchers to predict the combinations of abiotic and biotic conditions that are likely to influence the persistence of amphibian populations under future environmental change.
Article
We created a new dataset of spatially interpolated monthly climate data for global land areas at a very high spatial resolution (approximately 1 km²). We included monthly temperature (minimum, maximum and average), precipitation, solar radiation, vapour pressure and wind speed, aggregated across a target temporal range of 1970–2000, using data from between 9000 and 60 000 weather stations. Weather station data were interpolated using thin-plate splines with covariates including elevation, distance to the coast and three satellite-derived covariates: maximum and minimum land surface temperature as well as cloud cover, obtained with the MODIS satellite platform. Interpolation was done for 23 regions of varying size depending on station density. Satellite data improved prediction accuracy for temperature variables 5–15% (0.07–0.17 °C), particularly for areas with a low station density, although prediction error remained high in such regions for all climate variables. Contributions of satellite covariates were mostly negligible for the other variables, although their importance varied by region. In contrast to the common approach to use a single model formulation for the entire world, we constructed the final product by selecting the best performing model for each region and variable. Global cross-validation correlations were ≥ 0.99 for temperature and humidity, 0.86 for precipitation and 0.76 for wind speed. The fact that most of our climate surface estimates were only marginally improved by use of satellite covariates highlights the importance of having a dense, high-quality network of climate station data.
Article
Effective conservation and utilization strategies for natural biological resources require a clear understanding of the geographic distribution of the target species. Tricholoma matsutake is an ectomycorrhizal (ECM) mushroom with high ecological and economic value. In this study, the potential geographic distribution of T. matsutake under current conditions in China was simulated using MaxEnt software based on species presence data and 24 environmental variables. The future distributions of T. matsutake in the 2050s and 2070s were also projected under the RCP 8.5, RCP 6, RCP 4.5 and RCP 2.6 climate change emission scenarios described in the Special Report on Emissions Scenarios (SRES) by the Intergovernmental Panel on Climate Change (IPCC). The areas of marginally suitable, suitable and highly suitable habitats for T. matsutake in China were approximately 0.22 × 10⁶ km², 0.14 × 10⁶ km², and 0.11 × 10⁶ km², respectively. The model simulations indicated that the area of marginally suitable habitats would undergo a relatively small change under all four climate change scenarios; however, suitable habitats would significantly decrease, and highly suitable habitat would nearly disappear. Our results will be influential in the future ecological conservation and management of T. matsutake and can be used as a reference for studies on other ectomycorrhizal mushroom species.
Article
Environmental niche modeling (ENM) is commonly used to develop probabilistic maps of species distribution. Among available ENM techniques, MaxEnt has become one of the most popular tools for modeling species distribution, with hundreds of peer-reviewed articles published each year. MaxEnt’s popularity is mainly due to the use of a graphical interface and automatic parameter configuration capabilities. However, recent studies have shown that using the default automatic configuration may not be always appropriate because it can produce non-optimal models; particularly when dealing with a small number of species presence points. Thus, the recommendation is to evaluate the best potential combination of parameters (feature classes and regularization multiplier) to select the most appropriate model. In this work we reviewed 244 articles published between 2013 and 2015 to assess whether researchers are following recommendations to avoid using the default parameter configuration when dealing with small sample sizes, or if they are using MaxEnt as a “black box tool.” Our results show that in only 16% of analyzed articles authors evaluated best feature classes, in 6.9% evaluated best regularization multipliers, and in a meager 3.7% evaluated simultaneously both parameters before producing the definitive distribution model. We analyzed 20 articles to quantify the potential differences in resulting outputs when using software default parameters instead of the alternative best model. Results from our analysis reveal important differences between the use of default parameters and the best model approach, especially in the total area identified as suitable for the assessed species and the specific areas that are identified as suitable by both modelling approaches. These results are worrying, because publications are potentially reporting over-complex or over-simplistic models that can undermine the applicability of their results. 
Of particular importance are studies used to inform policy making. Therefore, researchers, practitioners, reviewers and editors need to be very judicious when dealing with MaxEnt, particularly when the modelling process is based on small sample sizes.
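The recommendation above, to evaluate combinations of feature classes and regularization multipliers instead of relying on defaults, amounts to a grid search. The Python sketch below only enumerates the grid and picks the candidate with the lowest score. It does not run MaxEnt. The specific feature-class combinations, the multiplier values, and the `toy_score` stand-in are assumptions for illustration; in a real workflow the score would come from fitting and evaluating a model for each setting.

```python
from itertools import product

# Assumed candidate values: L = linear, Q = quadratic, H = hinge, P = product.
FEATURE_CLASSES = ["L", "LQ", "H", "LQH", "LQHP"]
REG_MULTIPLIERS = [0.5, 1.0, 2.0, 4.0]


def candidate_settings(feature_classes, reg_multipliers):
    """Every feature-class / regularization-multiplier combination."""
    return [{"fc": fc, "rm": rm} for fc, rm in product(feature_classes, reg_multipliers)]


def select_best(candidates, score):
    """Return the candidate with the lowest score (for example an AICc value)."""
    return min(candidates, key=score)


def toy_score(setting):
    """Hypothetical stand-in for a real evaluation of one fitted model."""
    return abs(setting["rm"] - 2.0) + len(setting["fc"])


grid = candidate_settings(FEATURE_CLASSES, REG_MULTIPLIERS)
best = select_best(grid, toy_score)
print(len(grid), best)
```

Even this small grid contains 20 candidate models, which gives a sense of the extra work that the default configuration skips.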
Article
High-quality abundance data are expensive and time-consuming to collect and often highly limited in availability. Nonetheless, accurate, high-resolution abundance distributions are essential for many ecological applications ranging from species conservation to epidemiology. Producing models that can predict abundance well, with good resolution over large areas, has therefore been an important aim in ecology, but poses considerable challenges. We present a two-stage approach to modeling abundance, combining two established techniques. First, we produce ensemble species distribution models (SDMs) of trees in Great Britain at a fine resolution, using much more common presence-absence data and key environmental variables. We then use random forest regression to predict abundance by linking the results of the SDMs to a much smaller amount of abundance data. We show that this method performs well in predicting the abundance of 20 of 25 tested British tree species, a group that is generally considered challenging for modeling distributions due to the strong influence of human activities. Maps of predicted tree abundance for the whole of Great Britain are provided at 1 km² resolution. Abundance maps have a far wider variety of applications than presence-only maps, and these maps should allow improvements to aspects of woodland management and conservation including analysis of habitats and ecosystem functioning, epidemiology, and disease management, providing a useful contribution to the protection of British trees. We also provide complete R scripts to facilitate application of the approach to other scenarios.
Article
Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution, and more recently, in conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue of SDMs. In order to explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of the above four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found that Random Forest demonstrated the best performance for most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years, and by now, has been known to perform extremely well in ecological predictions. However, while its use is on the rise, its potential is still widely underused in conservation, (spatial) ecological applications and inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled.
This method helps to save model-selection time and effort, and it allows robust and rapid assessments and decisions for efficient conservation.
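The two metrics and the averaging ensemble named in this abstract can be illustrated with a minimal sketch. It assumes per-site suitability scores in [0, 1] and known presence and absence sites; the function names (`auc`, `tss`, `ensemble_mean`) and the toy numbers are hypothetical and are not taken from the study.

```python
def auc(scores_presence, scores_absence):
    """Probability that a random presence site outscores a random absence site.

    Ties count as half a win (the rank-based definition of the area under the ROC curve).
    """
    wins = 0.0
    for p in scores_presence:
        for a in scores_absence:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(scores_presence) * len(scores_absence))


def tss(scores_presence, scores_absence, threshold):
    """True skill statistic = sensitivity + specificity - 1 at a given threshold."""
    sensitivity = sum(s >= threshold for s in scores_presence) / len(scores_presence)
    specificity = sum(s < threshold for s in scores_absence) / len(scores_absence)
    return sensitivity + specificity - 1.0


def ensemble_mean(model_predictions):
    """Average per-site probabilities across models with equal weights."""
    n_models = len(model_predictions)
    return [sum(site) / n_models for site in zip(*model_predictions)]


# Toy data (hypothetical): scores at three presence sites and four absence sites.
presence = [0.9, 0.8, 0.4]
absence = [0.7, 0.3, 0.2, 0.1]
print(auc(presence, absence))
print(tss(presence, absence, threshold=0.5))
print(ensemble_mean([[0.2, 0.8], [0.4, 0.6]]))
```

AUC is threshold-free, whereas TSS depends on the cut-off used to turn scores into presence or absence, so the two can rank models differently.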
Article
More than half of the world's population is at risk of vector-borne diseases including dengue fever, chikungunya, Zika, yellow fever, leishmaniasis, Chagas disease, and malaria, with the highest incidences in tropical regions. In Ecuador, vector-borne diseases are present from coastal and Amazonian regions to the Andes Mountains; however, a detailed characterization of the distribution of their vectors has never been carried out. We estimate the distribution of 14 vectors of the above vector-borne diseases under present-day and future climates. Our results consistently suggest that climate warming is likely threatening some vector species with extinction, locally or completely. These results suggest that climate change could reduce the burden of specific vector species. Other vector species are likely to shift and constrain their geographic range to the highlands in Ecuador, potentially affecting novel areas and populations. These forecasts show the need for development of early prevention strategies for vector species currently absent in areas projected as suitable under future climate conditions. Informed interventions could reduce the risk of human exposure to vector species with distributional shifts, in response to current and future climate changes. Based on the mixed effects of future climate on human exposure to disease vectors, we argue that research on vector-borne diseases should be cross-scale and include climatic, demographic, and landscape factors, as well as forces facilitating disease transmission at fine scales.
Article
Many studies predict that climate change will cause species movement and turnover, but few have considered the effect of climate change on range fragmentation for current species and/or populations. We used MaxEnt to predict suitable habitat, fragmentation and turnover for 134 amphibian species in China under 40 future climate change scenarios spanning four pathways (RCP2.6, RCP4.5, RCP6 and RCP8.5) and two time periods (the 2050s and 2070s). Our results show that climate change may cause a major shift in spatial patterns of amphibian diversity. Amphibians in China would lose 20% of their original ranges on average; the distribution outside current ranges would increase by 15%. Suitable habitats for over 90% of species will be located in the north of their current range, for over 95% of species in higher altitudes (from currently 137–4,124 m to 286–4,396 m in the 2050s or 314–4,448 m in the 2070s), and for over 75% of species in the west of their current range. Also, our results predict two different general responses to climate change: some species contract their ranges while moving westwards, southwards and to higher altitudes, while others expand their ranges. Finally, our analyses indicate that range dynamics and fragmentation are related, which means that the effects of climate change on Chinese amphibians might be twofold.
Article
There is an increasing demand for biodiversity mapping to address new challenges in the management of marine ecosystems. Species distribution models are a key tool in supplying part of this information. However, the use of these models in the marine environment is still developing and the reasons for the underlying use of different methodological approaches are not always clear. In this work, we compared four different statistical techniques: ecological niche factor analysis (ENFA), the MAXimum ENTropy algorithm (MAXENT), generalized additive models (GAMs), and Random Forest. ENFA and MAXENT were applied using presence-only data whereas GAM and Random Forest used presence–absence data. As a case study, we used four deep sea urchin species: Centrostephanus longispinus, Coelopleurus floridanus, Stylocidaris affinis, and Cidaris cidaris. The distribution of the studied sea urchins showed strong bathymetric segregation. Depth was the most important variable, followed by reflectivity and slope. The correlations between the predictive outputs of the models were similar between GAM, Random Forest and MAXENT, and lower for ENFA. Models using presence–absence data showed the highest scores in the four species, significantly outperforming ENFA in most of the cases, although differences with MAXENT were significant in only one species.
Article
Aim: To assess the usefulness of combining climate predictors with additional types of environmental predictors in species distribution models for range-restricted species, using common correlative species distribution modelling approaches. Location: Florida, USA. Methods: We used five different algorithms to create distribution models for 14 vertebrate species, using seven different predictor sets: two with bioclimate predictors only, and five ‘combination’ models using bioclimate predictors plus ‘additional’ predictors from groups representing: human influence, land cover, extreme weather or noise (spatially random data). We used a linear mixed-model approach to analyse the effects of predictor set and algorithm on model accuracy, variable importance scores and spatial predictions. Results: Regardless of modelling algorithm, no one predictor set produced significantly more accurate models than all others, though models including human influence predictors were the only ones with significantly higher accuracy than climate-only models. Climate predictors had consistently higher variable importance scores than additional predictors in combination models, though there was variation related to predictor type and algorithm. While spatial predictions varied moderately between predictor sets, discrepancies were significantly greater between modelling algorithms than between predictor sets. Furthermore, there were no differences in the level of agreement between binary ‘presence–absence’ maps and independent species range maps related to the predictor set used. Main conclusions: Our results indicate that additional predictors have relatively minor effects on the accuracy of climate-based species distribution models and minor to moderate effects on spatial predictions. We suggest that implementing species distribution models with only climate predictors may provide an effective and efficient approach for initial assessments of environmental suitability.
Article
We used species distribution modeling to investigate the potential effects of climate change on 24 species of Neotropical anurans of the genus Melanophryniscus. These toads are small, have limited mobility, and a high percentage are endangered or have restricted geographical distributions. We looked at the changes in the size of suitable climatic regions and in the numbers of known occurrence sites within the distribution limits of all species. We used the MaxEnt algorithm to project current and future suitable climatic areas (a consensus of IPCC scenarios A2a and B2a for 2020 and 2080) for each species. Forty percent of the species may lose over 50% of their potential distribution area by 2080, whereas 28% of species may lose less than 10%. Four species had over 40% of the currently known occurrence sites outside the predicted 2080 areas. The effect of climate change (decrease in climatically suitable areas) did not differ according to the present distribution area, major habitat type or phylogenetic group of the studied species. We used the estimated decrease in specific suitable climatic range to set a conservation priority rank for Melanophryniscus species. Four species were set to high conservation priority: M. montevidensis (100% of its original suitable range and all known occurrence points potentially lost by 2080), M. sp.2, M. cambaraensis, and M. tumifrons. Three species (M. spectabilis, M. stelzneri, and M. sp.3) were set between high and intermediate priority (more than 60% decrease in area predicted by 2080); nine species were ranked as intermediate priority, while eight species were ranked as low conservation priority. We suggest that monitoring and conservation actions should be focused primarily on those species and populations that are likely to lose the largest area of suitable climate and the largest number of known populations in the short term.
Article
Species distribution models (SDMs) are efficient tools for modeling species geographic distribution under climate change scenarios. Due to differences among predictions of these models, their results are combined using consensus methods to form an ensemble model. This paper provides an optimal combination of the common SDMs according to accuracy and correlation to model the climatic suitability of Quercus brantii in the west of Iran and projects it into the years 2050 and 2070. This is done using 1000 samples of the species presence and absence, 4 bioclimatic variables related to temperature and precipitation, and 10 modeling algorithms. An ensemble combination of Global Climate Models (GCMs) and 4 optimistic and pessimistic greenhouse-gas emissions scenarios were utilized to identify the climatically suitable areas in the years 2050 and 2070. These models were combined using three common statistics: mean, median, and weighted mean. The predictive accuracies of the single models and the consensus methods were assessed using the area under the curve (AUC) metric, which confirmed acceptable performance for 9 of the 10 models studied. Applying the genetic algorithm, the best combination of the models was selected, including 4 algorithms with accuracy and correlation equal to 0.95 and 0.30, respectively. The results show that the Random Forest (RF) model causes less error in the ensemble model and also compensates for other models' errors more. Projections into the years 2050 and 2070 showed that in both time periods and under all scenarios, changes will occur in the spatial distribution of this species, and the most severe one would be a 55.6% loss under the most pessimistic scenario in 2070.
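The three consensus statistics mentioned in this abstract (mean, median, and weighted mean) can be sketched as follows. This is an illustrative assumption, not the study's code: the function name `combine`, the use of per-model AUC values as weights, and the toy numbers are all hypothetical, and the genetic-algorithm selection step is not shown.

```python
from statistics import mean, median


def combine(predictions, method="mean", weights=None):
    """Combine per-site suitability values from several models into one consensus map.

    predictions: list of lists, one inner list of per-site values per model.
    weights: one non-negative weight per model (only used for "weighted_mean").
    """
    columns = list(zip(*predictions))  # one tuple of model outputs per site
    if method == "mean":
        return [mean(c) for c in columns]
    if method == "median":
        return [median(c) for c in columns]
    if method == "weighted_mean":
        if weights is None or len(weights) != len(predictions):
            raise ValueError("weighted_mean needs one weight per model")
        total = sum(weights)
        return [sum(w * v for w, v in zip(weights, c)) / total for c in columns]
    raise ValueError(f"unknown method: {method}")


# Toy data (hypothetical): three models, two sites.
preds = [[0.2, 0.9], [0.4, 0.7], [0.9, 0.5]]
model_auc = [0.80, 0.85, 0.95]  # assumed weights, e.g. each model's AUC

print(combine(preds, "mean"))
print(combine(preds, "median"))
print(combine(preds, "weighted_mean", weights=model_auc))
```

The median is less sensitive to a single outlying model than the mean, while the weighted mean lets better-performing models contribute more to the consensus.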
Article
Understanding the relationship between the geographical distribution of taxa and their environmental conditions is a key concept in ecology and conservation. The use of ensemble modelling methods for species distribution modelling (SDM) has been promoted over single algorithms such as Maximum Entropy (MaxEnt). Nevertheless, we suggest that in cases where data, technical support or computational power are limited, for example in developing countries, single-algorithm methods produce robust and accurate distribution maps. We fit SDMs for 114 Egyptian medicinal plant species (nearly all native) with a total of 14,396 occurrence points. The predictive performances of eight single-algorithm methods (maxent, random forest (rf), support-vector machine (svm), maxlike, boosted regression trees (brt), classification and regression trees (cart), flexible discriminant analysis (fda) and generalised linear models (glm)) were compared to an ensemble modelling approach combining all eight algorithms. Predictions were based originally on the current climate, and then projected into the future time slice of 2050 based on four alternate climate change scenarios (A2a and B2a for CMIP3 and RCP 2.6 and RCP 8.5 for CMIP5). Ensemble modelling, MaxEnt and rf achieved the highest predictive performances based on AUC and TSS, while svm and cart had the poorest performance. There is high similarity in habitat suitability between MaxEnt and ensemble predictive maps for both current and future emission scenarios, but lower similarity between rf and ensemble, or rf and MaxEnt. We conclude that single-algorithm modelling methods, particularly MaxEnt, are capable of producing distribution maps of comparable accuracy to ensemble methods.
Furthermore, the ease of use, reduced computational time and simplicity of methods like MaxEnt provides support for their use in scenarios when the choice of modelling methods, knowledge or computational power is limited but the need for robust and accurate conservation predictions is urgent.
Article
The Bureau of Land Management (BLM) manages the National Petroleum Reserve-Alaska on the remote North Slope but has limited data on fish distributions on which to base leasing and management decisions. To address this, we used environmental DNA, traditional sampling, watershed landscape characterizations, and maximum entropy modeling to develop species distribution models (SDMs) for 19 fish species. The difficulty of characterizing upstream environments for every stream reach has limited the development of SDMs for riverine taxa to using either only local conditions or a small subset of potential watersheds. We apply a new technique (StreamCat) to characterize the background variation in watershed conditions. We also assessed how including temporal variation in addition to spatial variation and how adjusting the parameters that controlled model parsimony would affect model performance. The best models (mean TSS = 0.87 across all 19 taxa) used only static data, regularization parameters between 1.0 (default) and 2.0 (slightly more parsimonious), and watershed background data. Important predictors in these models included temperature, slope, and land cover. Approaches like this have great potential for providing critically needed data in rapidly developing but data-poor regions like the North Slope of Alaska.
Article
Habitat suitability estimates derived from species distribution models (SDMs) are increasingly used to guide management of threatened species. Poorly estimating species’ ranges can lead to underestimation of threatened status, undervaluing of remaining habitat and misdirection of conservation funding. We aimed to evaluate the utility of a SDM, similar to the models used to inform government regulation of habitat in our study region, in estimating the contemporary distribution of a threatened and declining species. We developed a presence‐only SDM for the endangered New Holland Mouse (Pseudomys novaehollandiae) across Victoria, Australia. We conducted extensive camera trap surveys across model‐predicted and expert‐selected areas to generate an independent data set for use in evaluating the model, determining confidence in absence data from non‐detection sites with occupancy and detectability modelling. We assessed the predictive capacity of the model at thresholds based on (1) sum of sensitivity and specificity (SSS), and (2) the lowest presence threshold (LPT; i.e. the lowest non‐zero model‐predicted habitat suitability value at which we detected the species). We detected P. novaehollandiae at 40 of 472 surveyed sites, with strong support for the species’ probable absence from non‐detection sites. Based on our post hoc optimised SSS threshold of the SDM, 25% of our detection sites were falsely predicted as non‐suitable habitat and 75% of sites predicted as suitable habitat did not contain the species at the time of our survey. One occupied site had a model‐predicted suitability value of zero, and at the LPT, 88% of sites predicted as suitable habitat did not contain the species at the time of our survey. Our findings demonstrate that application of generic SDMs in both regulatory and investment contexts should be tempered by considering their limitations and currency. Further, we recommend engaging species experts in the extrapolation and application of SDM outputs.
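The threshold-based evaluation described in this abstract can be illustrated with a short sketch. It assumes a list of model-predicted suitability values and a matching list of survey detections; the function names and toy data are hypothetical and do not reproduce the study's analysis (in particular, occupancy and detectability modelling is not shown).

```python
def lowest_presence_threshold(suitability, detected):
    """Lowest non-zero predicted suitability among sites where the species was detected."""
    values = [s for s, d in zip(suitability, detected) if d and s > 0]
    if not values:
        raise ValueError("no detections at sites with non-zero suitability")
    return min(values)


def threshold_error_rates(suitability, detected, threshold):
    """Return (omission_rate, unoccupied_fraction) for a suitability threshold.

    omission_rate: share of detection sites predicted as non-suitable.
    unoccupied_fraction: share of sites predicted as suitable with no detection.
    """
    predicted = [s >= threshold for s in suitability]
    n_detected = sum(bool(d) for d in detected)
    n_predicted = sum(predicted)
    missed = sum(1 for p, d in zip(predicted, detected) if d and not p)
    empty = sum(1 for p, d in zip(predicted, detected) if p and not d)
    omission = missed / n_detected if n_detected else float("nan")
    unoccupied = empty / n_predicted if n_predicted else float("nan")
    return omission, unoccupied


# Toy survey (hypothetical): six sites, three detections, one at zero suitability.
suitability = [0.0, 0.2, 0.5, 0.7, 0.9, 0.1]
detected = [True, True, False, True, False, False]

lpt = lowest_presence_threshold(suitability, detected)
print(lpt)
print(threshold_error_rates(suitability, detected, threshold=0.5))
print(threshold_error_rates(suitability, detected, threshold=lpt))
```

Lowering the threshold to the LPT reduces omissions but typically labels more unoccupied sites as suitable, which is the trade-off the abstract reports.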
Article
Species distribution modelling is a powerful tool that can give us ecological insights about species distributions, and potential effects of environmental factors, in poorly known habitats. For the first time, the distribution of terrestrial reptiles in Saudi Arabia was modelled, and environmental factors that affect their current distribution and richness were investigated. Reptiles are a major vertebrate group in Saudi Arabia and protecting them should be a priority for conservation in such an arid environment. Temperature was the most important of eleven predictors. Maximum species richness of reptiles was predicted in the central plateau, north-western borders, and in coastal areas of Saudi Arabia. Overall, the predicted and the observed patterns of species richness followed a similar pattern. Our analysis revealed that large scattered parts of Saudi Arabia are under-sampled in terms of sampling effort for terrestrial reptile species. Our results represent the most comprehensive description of terrestrial reptile diversity distributions and habitat suitability in Saudi Arabia to date.
Article
The random forest (RF) model is a powerful machine learning technique that has been increasingly used for species distribution modeling (SDM) by ecologists and fisheries scientists given various threats to marine habitats and biodiversity. However, the observations for model training are often constrained by limited surveys and financial resources. Under these circumstances, identifying the appropriate sample size for modeling is important for successful predictions. In addition, species with different biological characteristics present various challenges for SDM, which needs to be considered when evaluating model performance. We built and evaluated RF models for 21 marine demersal species using catch data and environmental variables collected during a bottom trawl survey in the coastal waters of Shandong Peninsula, China. The predictive performances of the RF models were evaluated for eight sample sizes using cross validation, in which a range of 10-80 sample sites were used to train the model. The resulting predictive performance was examined for a range of biological and behavioral traits. For most species, the predictive performance of the RF model was substantially improved when the sample size increased from 10 to 30 sites, but less improvement was evident with larger datasets. An ANOVA identified significant influences of migratory behavior, lifespan, body size, feeding mode and prevalence on the model predictability, whereas the effects of trophic level and taxon were insignificant, as were the interactions between the sample size and species traits. The abundance distributions could be better predicted for benthivores, and species with short migratory distances, short lifespans, and small body sizes, and for each species trait, the variation in the relative predictive performances of the trait categorical groups was generally consistent among sample sizes and performance metrics.
Our study may contribute to an improved understanding of successful SDM and provide guidance for the application of RF models to predict the abundance distributions of fish species.
Article
1. The many and varied effects of human-induced environmental change have the potential to threaten animal biodiversity and species abundance. Importantly, human land use and global climate change are predicted to reduce water availability, which might have negative consequences for freshwater organisms. 2. In this study, we tested for an effect of a shortened hydroperiod on larval growth and development, and post‐metamorphic survival and immune function in a temperate frog, Rana pipiens. 3. Animals developing under pond drying conditions metamorphosed at a smaller size and had lower survival after metamorphosis. We found sex‐specific differences in larval period in our fastest drying treatment, with males metamorphosing more quickly than females. Individuals that developed under drying conditions also showed reduced skin swelling after phytohemagglutinin injection, indicating a compromised immune response. We found support for trade‐offs between growth, development, and post‐metamorphic immune function across hydroperiod treatments. Whole blood from animals with shorter larval periods had lower bacterial killing ability, and small‐bodied juveniles had lower antibody titers. 4. Overall, our results indicate that a shortened hydroperiod can affect the rate of larval amphibian growth and development, and might negatively impact the condition of species that rely on freshwater for development. Our work improves understanding of the complex impacts that environmental stressors might have on the health of animal populations.
Article
The random forests (RF) algorithm is a superb learner and classifier in machine learning applications. This ensemble model is also one of the most popular species distribution model algorithms (SDMs) available to date. RF by default can produce categorical and numerical species distribution maps based on its classification tree (CT) and regression tree (RT) algorithms, respectively. Statistically, CT can also produce numerical predictions (class probability). Many real-world applications (e.g. conservation planning) employ binary presence–absence outputs that use classification thresholds to make these conversions. However, there is little available information regarding the difference in model performance between CT and RT for inference settings. Here, under an ensemble modeling framework, 52 forest tree species with presence-only data for all of China were selected for comparison of the performance of CT and RT algorithms in projecting the distribution and potential range shifts of these species under current and future climates. Five climatic variables were used to develop CT and RT models. Eight threshold-setting approaches were employed to convert numerical predictions into binary predictions. With regard to probabilistic predictions, the relative performance of CT and RT depended on the choice of the evaluation criteria. For both RT and CT, threshold-setting methods significantly altered the determination of thresholds, model performance, and subsequently projections of species range shifts under climate change. The four threshold selection methods (MaxKappa, MaxOA, MaxTSS, and MinROCdist) based on the composite model accuracy measures most often achieved significantly higher model performance than CT default threshold method and other threshold methods. They consistently projected that species' geographical ranges changed in response to climate change with the same direction and magnitude. 
We argue for choosing RT rather than CT as the SDM if model discrimination capacity (the ability to differentiate between occurrences of presence and absence) is viewed as more important than model reliability (the agreement between predicted relative indexes of occurrence and observed proportions of occurrence), and vice versa. In line with gradient theory, we can recommend the use of numerical predictions for species distribution modeling since they help to convey more information than binary predictions. Binary conversion of model outputs should only be carried out when it is clearly justified by the application's objective. The four aforementioned threshold methods are promising objective methods for binary conversions of continuous predictions when presence-only data are available. This study proposes guidelines on how machine learning can be used for specific applied and theoretical applications in a SDM context.
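One of the threshold-setting methods named in this abstract, MaxTSS, can be sketched as a simple scan over candidate cut-offs. This is a minimal illustration that assumes presence (1) and absence (0) labels are available for the evaluation sites; the function name `max_tss_threshold` and the toy data are hypothetical, and the study's own handling of presence-only data is not reproduced here.

```python
def max_tss_threshold(scores, labels):
    """Return (threshold, tss) for the cut-off that maximizes the true skill statistic.

    scores: continuous model outputs, one per site.
    labels: 1 for presence, 0 for absence, aligned with scores.
    Candidate thresholds are the distinct observed scores; a site is
    classified as presence when its score is >= the threshold.
    """
    n_pres = sum(labels)
    n_abs = len(labels) - n_pres
    if n_pres == 0 or n_abs == 0:
        raise ValueError("need at least one presence and one absence")
    best_t, best_tss = None, -2.0  # TSS is always >= -1
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        value = tp / n_pres + tn / n_abs - 1.0
        if value > best_tss:  # keep the lowest threshold on ties
            best_t, best_tss = t, value
    return best_t, best_tss


# Toy data (hypothetical): three absences followed by three presences.
scores = [0.1, 0.2, 0.4, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]

threshold, value = max_tss_threshold(scores, labels)
binary_map = [int(s >= threshold) for s in scores]
print(threshold, value, binary_map)
```

Because the chosen cut-off depends on the evaluation data, different threshold rules applied to the same continuous map can yield different binary ranges, which is the sensitivity the abstract describes.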
Article
Amphibians are a valuable indicator group to study potential impacts of climate change (CC) because reproduction is closely linked to the availability of fresh water. Climate projections for the humid subtropical region of South America predict an increase in temperature towards the southwest, an increase in precipitation during the rainy season, and a decrease during the dry season. In this context, we aimed to predict the changes in the distribution range of amphibian species and the variation in their richness. In addition, we attempted to determine the most vulnerable species in terms of the extent of habitat loss and the overlap of optimal species distributions by contrasting present and future species ranges. We modelled the current and future distribution of 55 amphibian species using an inductive approach to model the ecological niche with three different algorithms. We used WorldClim data for current climate and IPCC5 climate projections from a Global Climate Model for two greenhouse gas concentrations at 2050. Depending on the CC scenario, between 48 and 57% of the species showed a decrease in their optimal distribution, and 9–10% of them are likely to be affected by further population fragmentation. We identified three types of patterns of change in the geographical distribution of the optimal areas: (I) reduction, (II) displacement, and (III) increase in their distribution range. Future new areas with favourable conditions may not be reached due to the low dispersion tendency of amphibians. For this reason, it is important to identify those current favourable areas that are maintained in the different future scenarios. In this sense, this study highlights priority areas for the conservation of the studied species and identifies those that are highly vulnerable to the predicted scenarios.
Our results contribute to the knowledge of how different future climates scenarios could affect the conservation of the studied amphibian species and provide key information for the development of strategies and public policies for management and biodiversity conservation.
Article
Polyporus umbellatus is a fungus that has been used medically as a diuretic for thousands of years in China. To evaluate the impacts of climatic change on the distribution of P. umbellatus, we selected the annual mean air temperature, isothermality, minimum temperature of the coldest month, annual temperature range, annual precipitation and precipitation seasonality and used observations from the 2000s and simulated values from two future periods (2041 to 2060 and 2061 to 2080) to build an ensemble model (EM); then, we developed a comprehensive habitat suitability model by integrating soil and vegetation conditions into the EM to assess the distribution of suitable P. umbellatus habitats across China in the 2000s and the two future periods. Our results show that annual precipitation and annual mean air temperature together largely determine the distribution of P. umbellatus and those suitable P. umbellatus habitats generally occur in areas with an optimal annual precipitation of approximately 1000 mm and an optimal annual mean air temperature of approximately 13 °C. In other words, P. umbellatus requires a humid and cool environment for growth. In addition, brown soils with a granular structure and low acidity are more suitable for P. umbellatus. Furthermore, we have observed that the distribution of P. umbellatus is usually associated with the presence of coniferous, mixed coniferous, and broad-leaved forests, suggesting that these vegetation types are suitable habitats for P. umbellatus. In the future, annual precipitation and annual mean air temperature will continue to increase, consequently increasing the availability of habitats suitable for P. umbellatus in northeastern and southwestern China but likely leading to a degradation of suitable P. umbellatus habitats in central China.
Article
Amphibian diversity in Neotropical mountain habitats is at risk, particularly those species associated with stream habitats at altitudes >500 m above sea level (a.s.l.). This pertains especially to the amphibian diversity of Mexico, where the number of species is high on the central and southwestern highlands. In the present study, we predicted the potential distribution of Ambystoma ordinarium using a Geographic Information System modeling approach. We used survey data from 2013 to 2015 and historical data reported in databases and literature, and employed environmental variables from the WorldClim-Global Climate Data Project. Our results indicate that a single factor, Mean Diurnal Range, contributed most to the model, followed by other factors (Minimum Temperature of the Coldest Month and Precipitation of the Driest Month). The conservative predicted distribution was 5256 km², especially in areas that have dynamic aquatic ecosystems (e.g., small streams). The highest probability of occurrence of the species was at locations of 1900-2900 m a.s.l., with 13.7-16.3°C diurnal terrestrial air temperatures, and annual precipitation of 829-1454 mm. In these areas, native forest vegetation has decreased by almost 250 km², and native grassland by 280 km². Agricultural activities, human settlements, and secondary succession vegetation increased by 160, 120, and 330 km², respectively. We infer that A. ordinarium is susceptible to changes in habitat, with most of the constraint on the distribution of this species arising from deforestation, increased urbanization and agricultural activities. Based on our model, and a recent genetic study, we suggest that the population of this species from lower elevations could be considered a different taxon. Consequently, the relative species distribution boundaries should be redefined, and appropriate monitoring programs redesigned to support conservation of the Michoacan stream salamander.
Article
Foxtail millet is one of the main food crops in arid and semi-arid areas of China. Owing to its strong stress resistance, wide adaptability, and tolerance of drought and barren soils, foxtail millet is treated as an important strategic crop reserve for future drought situations. In this study, data from 157 geographical distributions were used to choose 10 climatic indices, 7 soil indices and 3 topographical indices, which were based on the relationship between foxtail millet production and the environmental factors. Four species distribution models, including the maximum entropy model (MaxEnt), ecological niche factor analysis (ENFA), random forest (RF) and generalized additive model (GAM), were applied to analyze the potential geographic distribution of foxtail millet in China. The results showed that all four models did a good job in simulating the potential geographic distribution of foxtail millet and that the MaxEnt model was the best one. Among all selected environmental factors, the distribution of foxtail millet was most sensitive to precipitation and temperature. The outputs of the models, together with the ArcGIS spatial analyst module, showed that the total potential suitable growing regions for foxtail millet, including the highly and moderately suitable growing regions, occupied 55.68×10⁴ km², which was much larger than the actual foxtail millet growing area. The potential suitable growing regions were mainly located in northeast China, including the Northeast Plain, south of Changbai Mountain and Mudanjiang River basin; north China, including north of the Huaihe River; central China, including east of Hanjiang River and north of Dabie Mountains; northwest China, including the Loess Plateau, the southern Ordos Plateau, the eastern Qilian Mountains, the eastern Tianshan Mountains and the Altai Mountains; and southwest China, including north of Chongqing and the western Guizhou Province.
Article
Objectives: Detailed and reliable information about the spatial distribution of species provides important information for species conservation management, especially in the case of rare species of conservation interest. We aimed to study the consequences of climate change on geographical distributions of the tertiary rare tree species Thuja sutchuenensis Franch. (Cupressaceae) to provide reference for conservation management of this species, including priority area selection for introduction and cultivation of the species. We expect that this approach could be promising in predicting the potential distribution of other rare tree species, and as such can be an effective tool in rare tree species restoration and conservation planning, especially for species with narrow distributions or raw presence-only occurrence data. Methods: 107 records covering the whole distribution range of T. sutchuenensis in the Daba Mountains were obtained during a 3-year field survey. The principle of maximum entropy (Maxent) was used to model the species’ potential distribution area under paleoclimate, current and future climate backgrounds. Results: The Maxent model was highly accurate with a statistically significant AUC value of 0.998, which is higher than the 0.5 of a null model. The location of the potential distribution for the last interglacial period is in southeastern China, with the largest optimal habitat area being only 1666 km². In other periods, the central location of the potential distribution is in accordance with the real present distribution, but the model’s predicted optimal habitat area is outside the current distribution. Conclusions: Our findings can be applied in various ways such as the identification of additional localities where T. sutchuenensis may already exist, but has not yet been detected; the recognition of localities where it is likely to spread to; the priority selection area for introduction and cultivation and the conservation management of such rare tree species.
Article
As part of the development of the 2011 National Land Cover Database (NLCD) tree canopy cover layer, a pilot project was launched to test the use of high-resolution photography coupled with extensive ancillary data to map the distribution of tree canopy cover over four study regions in the conterminous US. Two stochastic modeling techniques, random forests (RF) and stochastic gradient boosting (SGB), are compared. The objectives of this study were first to explore the sensitivity of RF and SGB to choices in tuning parameters and, second, to compare the performance of the two final models by assessing the importance of, and interaction between, predictor variables, the global accuracy metrics derived from an independent test set, as well as the visual quality of the resultant maps of tree canopy cover. The predictive accuracy of RF and SGB was remarkably similar on all four of our pilot regions. In all four study regions, the independent test set mean squared error (MSE) was identical to three decimal places, with the largest difference in Kansas where RF gave an MSE of 0.0113 and SGB gave an MSE of 0.0117. With correlated predictor variables, SGB had a tendency to concentrate variable importance in fewer variables, whereas RF tended to spread importance among more variables. RF is simpler to implement than SGB, as RF has fewer parameters needing tuning and also was less sensitive to these parameters. As stochastic techniques, both RF and SGB introduce a new component of uncertainty: repeated model runs will potentially result in different final predictions. We demonstrate how RF allows the production of a spatially explicit map of this stochastic uncertainty of the final model. © 2016 National Research Council of Canada, All rights reserved.
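The idea of mapping stochastic uncertainty can be sketched by refitting a randomized ensemble under several seeds and taking the per-location spread of the predictions. The sketch below uses a toy ensemble of bagged one-split stumps as a stand-in for a random forest; it is not the authors' implementation, and the data, the helper names (`fit_stump`, `bagged_predict`), and all parameter values are assumptions made for illustration.

```python
import random
import statistics


def fit_stump(xs, ys):
    """One-split regression stump: split at the median x and predict
    the mean y on each side of the split."""
    threshold = statistics.median(xs)
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    overall = statistics.fmean(ys)
    left_mean = statistics.fmean(left) if left else overall
    right_mean = statistics.fmean(right) if right else overall
    return threshold, left_mean, right_mean


def bagged_predict(xs, ys, query, n_trees, seed):
    """Fit n_trees stumps on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return [
        statistics.fmean([lo if q <= t else hi for t, lo, hi in stumps])
        for q in query
    ]


# Hypothetical training data: one predictor and a noisy canopy-cover response.
data_rng = random.Random(0)
xs = [i / 19 for i in range(20)]
ys = [x + data_rng.uniform(-0.1, 0.1) for x in xs]
query = [0.1, 0.5, 0.9]  # locations ("pixels") to predict

# Repeat the whole stochastic fit with different seeds.
runs = [bagged_predict(xs, ys, query, n_trees=25, seed=s) for s in range(10)]

# Per-location mean prediction and run-to-run standard deviation.
mean_map = [statistics.fmean(col) for col in zip(*runs)]
uncertainty = [statistics.stdev(col) for col in zip(*runs)]
```

Here `uncertainty` plays the role of the spatially explicit uncertainty map: one value per query location, measuring how much the prediction changes between repeated runs of the same stochastic procedure.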
Article
Species distribution models (SDMs) are routinely used for species conservation and biodiversity management, especially in the context of global climate change. However, little is known about the sources of uncertainty in SDM predictions for aquatic ecosystems, especially in large-scale research. We therefore offer a first perspective on the uncertainties of SDMs in predicting fish species distributions in lake ecosystems. In total, the distributions of 92 fish species were predicted with climatic and geographical variables, respectively, using nine widely implemented species distribution models. We focused on the potential impacts of two main sources of uncertainty: species characteristics (species prevalence, altitude range, temperature range, and precipitation range) and model technique (calibration technique and evaluation technique). Our results highlight that predictions from a single SDM were highly variable and unreliable for all species, whereas ensemble approaches could yield more accurate predictions. The evaluation measures had no significant influence on the model outcomes. Species characteristics such as species prevalence, altitude range size, and precipitation range size strongly affected the outcomes of SDMs, but temperature range size did not show a significant influence. Our findings also support the hypothesis that, in aquatic ecosystems, species with a smaller range size can be predicted more accurately than species with a large range size. Our research provides insights into the prediction of fish species distributions in aquatic ecosystems under global climate change, especially for the conservation of endemic fish species in China. Moreover, our results improve the understanding of the uncertainties arising from species characteristics and modelling techniques in species distribution models.
Article
Species distribution models (SDMs) are widely used to explain and predict species ranges and environmental niches. They are most commonly constructed by inferring species' occurrence–environment relationships using statistical and machine-learning methods. The variety of methods that can be used to construct SDMs (e.g. generalized linear/additive models, tree-based models, maximum entropy, etc.), and the variety of ways that such models can be implemented, permits substantial flexibility in SDM complexity. Building models with an appropriate amount of complexity for the study objectives is critical for robust inference. We characterize complexity as the shape of the inferred occurrence–environment relationships and the number of parameters used to describe them, and search for insights into whether additional complexity is informative or superfluous. By building ‘under fit’ models, having insufficient flexibility to describe observed occurrence–environment relationships, we risk misunderstanding the factors shaping species distributions. By building ‘over fit’ models, with excessive flexibility, we risk inadvertently ascribing pattern to noise or building opaque models. However, model selection can be challenging, especially when comparing models constructed under different modeling approaches. Here we argue for a more pragmatic approach: researchers should constrain the complexity of their models based on study objective, attributes of the data, and an understanding of how these interact with the underlying biological processes. We discuss guidelines for balancing under fitting with over fitting and consequently how complexity affects decisions made during model building. Although some generalities are possible, our discussion reflects differences in opinions that favor simpler versus more complex models. We conclude that combining insights from both simple and complex SDM building approaches best advances our knowledge of current and future species ranges.