Article

Modeling of species distributions with MAXENT: new extensions and a comprehensive evaluation

Wiley
Ecography

Abstract

Accurate modeling of geographic distributions of species is crucial to various applications in ecology and conservation. The best performing techniques often require some parameter tuning, which may be prohibitively time-consuming to do separately for each species, or unreliable for small or biased datasets. Additionally, even with the abundance of good quality data, users interested in the application of species models need not have the statistical knowledge required for detailed tuning. In such cases, it is desirable to use “default settings”, tuned and validated on diverse datasets. Maxent is a recently introduced modeling technique, achieving high predictive accuracy and enjoying several additional attractive properties. The performance of Maxent is influenced by a moderate number of parameters. The first contribution of this paper is the empirical tuning of these parameters. Since many datasets lack information about species absence, we present a tuning method that uses presence-only data. We evaluate our method on independently collected high-quality presence-absence data. In addition to tuning, we introduce several concepts that improve the predictive accuracy and running time of Maxent. We introduce “hinge features” that model more complex relationships in the training data; we describe a new logistic output format that gives an estimate of probability of presence; finally we explore “background sampling” strategies that cope with sample selection bias and decrease model-building time. Our evaluation, based on a diverse dataset of 226 species from 6 regions, shows: 1) default settings tuned on presence-only data achieve performance which is almost as good as if they had been tuned on the evaluation data itself; 2) hinge features substantially improve model
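To make the idea of a hinge feature concrete, here is a minimal sketch in Python. It assumes a common piecewise-linear form: the feature is zero below a knot and rises linearly to one at the maximum of the variable. The abstract does not give the definition, so the exact form used in Maxent may differ. The function names and the evenly spaced knots are illustrative assumptions, not part of the Maxent software.

```python
def forward_hinge(x, knot, x_max):
    """Forward hinge (assumed form): 0 up to the knot, then linear up to 1 at x_max."""
    if x_max <= knot:
        raise ValueError("knot must lie below x_max")
    if x <= knot:
        return 0.0
    return (x - knot) / (x_max - knot)


def hinge_features(values, n_knots):
    """Evaluate n_knots forward hinges on each value.

    Knots are spaced evenly from the minimum of the data (an illustrative
    choice); each inner list holds one value's hinge features.
    """
    lo, hi = min(values), max(values)
    knots = [lo + k * (hi - lo) / n_knots for k in range(n_knots)]
    return [[forward_hinge(v, t, hi) for t in knots] for v in values]


if __name__ == "__main__":
    # Hypothetical values of one environmental variable.
    for row in hinge_features([0.0, 5.0, 10.0], n_knots=2):
        print(row)
```

A weighted sum of such features gives a piecewise-linear response curve, which is how hinges can capture more complex relationships than a single linear term.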


... To generate the ecological niche models, we employed the maximum entropy algorithm, Maxent 3.4.1 (Phillips and Dudík 2008;Phillips et al. 2018), using standard features. This algorithm has shown favorable performance compared to other niche modeling methodologies (Ferrier et al. 2006;Hernandez et al. 2006;Wisz et al. 2008). ...
... In other words, a species will not change its environmental requirements on a short time scale, and the non-occupied areas correspond to the non-realized climate niche rather than niche expansion (Guisan et al. 2014; Liu et al. 2017). The subject also involves the idea of transferability when a potential distribution model calibrated in one region is applied or transferred to another region (Peterson et al. 2007; Phillips 2008; Liu et al. 2022). One way to improve the prediction of climate niche models of invasive species is to calibrate them using trade data. ...
... These areas, affected by human activities, are believed to exhibit diminished biotic resistance (Elton 1958), rendering them particularly conducive for the establishment of invasive species. It is crucial to emphasize that the models might lead to underestimated areas due to insufficient data collection in regions of limited scientific exploration or nonsubmission of collection data by specific organizations or countries (Phillips 2008). These factors could result in larger areas of invasion and competition. ...
Article
Global pet trade demand has led to the introduction of large constrictor snakes into new environments either intentionally or accidentally. Brazil has the third-highest reptile species diversity globally, with snakes representing the predominant reptilian group, including 12 constrictor species. The potential for competition and predation between invasive and native snakes underscores the need for comprehensive assessment of the risks faced by endemic herpetofauna. This study aimed to identify potential areas for the establishment of invasive Python and assess their impact on native constrictors in Brazil. Environmental Niche Models were employed to predict suitable habitats for invasive pythons and the four endemic Brazilian snake species. By overlapping Python spp. records with those of endemic serpents, this study sought to understand the resource availability for potentially invasive species and the vulnerability of native species to Python invasion. These results highlight Python sebae and Python bivittatus as potentially invasive species that threaten native constrictors. Conversely, Eunectes murinus, with its semi-aquatic behavior, exhibited lower vulnerability. Endemic serpents, including Boa constrictor, Corallus hortulanus, and Epicrates cenchria, were identified as being highly susceptible to potential competition from invasive pythons. These findings emphasize the importance of understanding the potential ecological impacts of introducing invasive species into native ecosystems.
... Recently, SDMs have been applied to large-scale datasets to estimate global species distributions (Heshmati et al., 2019;Lee et al., 2021). When using fine-grain background data, computational costs become prohibitively high (Phillips & Dudík, 2008), often mitigated by employing random sampling of background data. The default sample size for random sampling in Maxent is set to 10,000 (Hijmans et al., 2023), although studies suggest that 86,000 points are needed for reliable results (Renner et al., 2015), with recent recommendations suggesting 50,000 based on area under the ROC curve (AUC) values (Valavi et al., 2022). ...
... Practical strategies include increasing the number of background points until model fitting and predictive performance stabilize (Phillips & Dudík, 2008). This paper introduces a different approach to improve computational efficiency in Maxent and, equivalently, in PPM, by using all locations of background data and applying the cumulant-based approximation (CBA) to the normalization constant of Maxent or the numerical integration of the intensity function of PPM. ...
... The runtime for seeking estimators in both PPM and Maxent (Phillips et al., 2006) is prohibitively slow for large n in the iteration process of learning algorithms (Phillips & Dudík, 2008). Valavi et al. (2022) illustrate the relation between the runtime and n together with the estimation accuracy using a species in Australia. ...
Preprint
Species distribution modeling plays an important role in estimating the habitat suitability of species using environmental variables. For this purpose, Maxent and the Poisson point process are popular and powerful methods extensively employed across various ecological and biological sciences. However, the computational speed becomes prohibitively slow when using huge background datasets, which is often the case with fine-resolution data or global-scale estimations. To address this problem, we propose a computationally efficient species distribution model using a cumulant-based approximation (CBA) applied to the loss function of $\gamma$-divergence. Additionally, we introduce a sequential estimating algorithm with an $L_1$ penalty to select important environmental variables closely associated with species distribution. The regularized geometric-mean method, derived from the CBA, demonstrates high computational efficiency and estimation accuracy. Moreover, by applying CBA to Maxent, we establish that Maxent and Fisher linear discriminant analysis are equivalent under a normality assumption. This equivalence leads to a highly efficient computational method for estimating species distribution. The effectiveness of our proposed methods is illustrated through simulation studies and by analyzing data on 226 species from the National Centre for Ecological Analysis and Synthesis and 709 Japanese vascular plant species. The computational efficiency of the proposed methods is significantly improved compared to Maxent, while maintaining comparable estimation accuracy. An R package, CBA, is also provided, containing all code used in the simulation studies and real data analysis.
... For instance, the R package (Hijmans 2018) is widely used, as are comprehensive multi-model platforms such as 'SDM' (Nguyen et al. 2021) and 'biomod2' (Ngila et al. 2023). Additionally, adjustments can be made using generalised linear models to refine models, as seen with the MaxEnt approach (Phillips et al. 2017). Recently, Google developers have incorporated MaxEnt into Google Earth Engine (GEE) (Gorelick et al. 2017). ...
... • ACCESS scenario from the Australian Research Council Centre of Excellence for Climate System Science. (Phillips et al. 2017). A total of 30% of the occurrence sample data were reserved for assessing the model's capacity, while the remaining 70% were used for training. ...
... The MaxEnt model can simultaneously incorporate both continuous and discrete variables as input data. This model has been widely used in habitat zoning for the conservation of various plant species worldwide (Phillips and Dudík 2008, Warren and Seifert 2011, Nguyen et al. 2021). ...
Article
Cinnamomum parthenoxylon (Jack) Meisn. is a tree in the genus Cinnamomum that has been facing global threats due to forest degradation and habitat fragmentation. Many recent studies aim to describe habitats and assess population and species genetic diversity for species conservation by expanding afforestation models for this species. Understanding its current and future potential distribution plays a major role in guiding conservation efforts. Using five modern machine-learning algorithms available on Google Earth Engine helped us evaluate suitable habitats for the species. The results revealed that Random Forest (RF) had the highest accuracy for model comparison, outperforming Support Vector Machine (SVM), Classification and Regression Trees (CART), Gradient Boosting Decision Tree (GBDT) and Maximum Entropy (MaxEnt). The results also showed that the extremely suitable ecological areas for the species are mostly distributed in northern Vietnam, followed by the North Central Coast and the Central Highlands. Elevation, Temperature Annual Range and Mean Diurnal Range were the three most important parameters affecting the potential distribution of C. parthenoxylon. Evaluation of the impact of climate on its distribution under different climate scenarios in the past (Last Glacial Maximum and Mid-Holocene), in the present (Worldclim) and in the future (using four climate change scenarios: ACCESS, MIROC6, EC-Earth3-Veg and MRI-ESM2-0) revealed that the distribution of C. parthenoxylon would likely expand to the northeast, while a large area of central Vietnam will gradually lose its adaptive capacity by 2100.
... Among many Species Distribution Models (SDMs) algorithms, MaxEnt is the most useful and popular tool for predicting suitable areas for endangered species. Even with a few occurrence records, MaxEnt utilizes presence-only datasets [11][12][13]. MaxEnt performs well with incomplete datasets, has a fast model runtime, is simple to use, and has reduced sample size requirements [11,14,15]. ...
... The MaxEnt model has frequently been utilized in prior research to forecast the potential ranges of numerous endangered medicinal and aromatic species across various Mediterranean regions [18,19]. During the prediction process, the model achieved an AUC value of 0.995, indicating outstanding performance [12]. The model's accuracy is contingent on both the quality and quantity of occurrence records, as well as the choice of environmental variables [20,21]. ...
Article
Although ecology is important for plant growth, survival, biodiversity and distribution, climate is considered the most influential driver of spatial patterns in plant species, particularly those with limited distribution ranges. One such species in Morocco is Thymus atlanticus, a vulnerable aromatic and medicinal endemic species found only in high mountains and threatened by overgrazing. In this study, we used MaxEnt to model its current potential distribution, to serve as a basis for its protection and preservation, and to determine the main climatic factors that affect its geographical distribution. In our analysis, we employed a dataset comprising 32 field-based occurrence points and incorporated 20 environmental variables, consisting of 19 bioclimatic variables and one topographic variable (elevation), to project the potential distribution area of this species. In this prediction process, the area under the curve (AUC) value was 0.995 and the standard deviation was 0.006. The response curves revealed that this species prefers habitats with an annual mean temperature (Bio 1) ranging from 9 to 11 °C. Furthermore, precipitation seasonality (Bio 15) is optimal for T. atlanticus at 28 mm. However, the peak temperature seasonality (Bio 4) is 742C of V. T. atlanticus grows optimally at high elevation around 2300 m a.s.l. The potential distribution map of T. atlanticus shows that the areas predicted as least suitable and unsuitable were 14,107 km² and 688,970 km², respectively. The remaining 4862 km² and 2908 km² were found to be moderately and highly suitable, respectively. Thus, the suitable areas for T. atlanticus in Morocco are primarily concentrated in the Atlas Mountains across all its chains: the High Atlas, the Eastern Middle Atlas and the Anti-Atlas. These results could be beneficial for developing adaptive management approaches that enhance the protection and rehabilitation of T. atlanticus.
... Different species distribution model approaches are being used to determine suitable habitat for wildlife species (Ouyang et al. 1995; Schadt et al. 2002; Phillips et al. 2006; Bosso et al. 2022). In this context, presence-only models are the most used for conserving biodiversity, where Maxent has frequently demonstrated accurate prediction abilities with high validation performances (Phillips et al. 2006; Hijmans and Graham 2006; Phillips and Dudík 2008; Elith and Leathwick 2009; Clements et al. 2012). Maxent (Phillips et al. 2006) is software based on the principles of maximum entropy (Javidan et al. 2021) that explores the potential distribution of species across space using only species occurrence records and environmental data (Du et al. 2021; Gull et al. 2022). ...
... An AUC of 1 means the model perfectly discriminates presences from absences. Our modeling had an AUC value of 0.993, confirming that the model is valid and the predicted values are accurate (Phillips and Dudík 2008; Tanner et al. 2020; Hosni et al. 2020, 2022). ...
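The interpretation of AUC in the excerpt above can be illustrated with a short sketch. It assumes the standard rank-based definition of AUC: the probability that a randomly chosen presence site receives a higher score than a randomly chosen absence site, with ties counted as one half. The function name and the example scores are hypothetical and are not taken from any of the cited studies.

```python
def auc(presence_scores, absence_scores):
    """Rank-based AUC: share of presence/absence pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(n * m), which is fine
    for illustration but slow for very large samples.
    """
    if not presence_scores or not absence_scores:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


if __name__ == "__main__":
    # Hypothetical model scores at presence and absence sites.
    print(auc([0.9, 0.8], [0.1, 0.2]))  # every presence outranks every absence
    print(auc([0.9, 0.3], [0.5, 0.1]))  # one of four pairs is ranked wrongly
```

Under this definition a value of 1 corresponds to perfect separation and 0.5 to a model that ranks sites no better than chance.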
Article
Identification and assessment of habitat suitability are essential to the conservation of threatened species such as the Asiatic black bear (Ursus thibetanus) in Pakistan. Regionally, for example, in the Hindu Kush Mountains, there has been growing public concern regarding negative impacts on the bears’ natural habitats due to land use and climate change. Many of the efforts to identify and conserve suitable habitats are based on limited data and have been unable to accurately predict habitat preferences. This study aims to fill this gap by developing predictive models for U. thibetanus based on the integration of new occurrences and climate and land cover data. We installed camera traps in 81 different locations across a gradient of elevation. Over the duration of 413 trap nights, we collected 110 different bear detections at 31 camera stations. The bear favored densely forested regions between 1,835 m and 3,348 m above sea level, with a catch rate of 26.6/100 trap nights. Our models demonstrated high levels of prediction accuracy (AUC > 0.97) and predicted that 43% of the total area would make a good habitat for bears. The mean temperature of coldest quarter, normalized difference vegetation index, and annual mean temperature were the main determinants of habitat suitability. The findings of this study, which is the first to map the current distribution and suitable habitat of the Asiatic black bear in the Hindu Kush Mountain Range, contribute new local-scale habitat suitability data to the study of bears in Swat Valley, Pakistan. Our results may be used to provide important conservation information for U. thibetanus that is useful to policymakers for improving future management planning.
... In the species distribution model, the Maxent approach utilizes environmental information to estimate the occurrence of a given species, which can then be used to construct a spatial potential distribution map (Çoban et al., 2020;Phillips & Dudík, 2008;Zhang & Wang, 2022). In addition, it also provides the levels of interdependence between variables and predictions during the training process (Urbani et al., 2017). ...
... In this study, we applied the Maxent program developed by Phillips and Dudík (2008) to develop a species distribution model for J. subtriplinerve. We developed the model with the following parameter settings. ...
Article
Jasminum subtriplinerve, a rare and valuable medicinal plant, is facing the problem of habitat loss. This study was conducted to determine the influences of environmental factors on the distribution and identify habitat suitability areas for this species in Central Vietnam. Based on 19 bioclimatic variables, 10 soil properties, 3 topographic variables and 86 observed locations, we used correlation analysis followed by the Maximum Entropy (MaxEnt) algorithm to predict the potential spatial distribution of J. subtriplinerve. The MaxEnt model of the J. subtriplinerve distribution showed good performance (AUC testing = 0.821 and AUC training = 0.887). The most influential factors on J. subtriplinerve distribution were Altitude (49.2%), Mean diurnal range (10.6%), Isothermality (6.9%), Bulk density (5.2%), Normalized difference vegetation index (4.7%) and Proportion of sand particles (4.6%). In addition, the model showed that very high and high potential habitat areas of J. subtriplinerve occupied 3.17% (658.88 km²) and 8.95% (1,858.56 km²) of the studied site, distributed in the midland and low mountainous areas; the most suitable Altitude ranged from 8.1 to 138.7 m. Meanwhile, 15.17%, 18.10% and 54.60% of the studied site were classified as moderate (3,148.93 km²), low (3,758.10 km²), and very low (11,334.17 km²) potential habitat areas, respectively. This study comprehensively evaluated factors affecting habitat suitability; these results therefore provide an understanding of the bioecological distribution of J. subtriplinerve, guiding the identification of optimal areas for cultivation and conservation.
... We predicted the distribution of suitable Kaputar rock skink habitat using the Maximum Entropy species distribution modelling algorithm, MaxEnt (Phillips et al. 2006). Species distribution modelling explores the contrast between environmental conditions of occupied sites (i.e. ...
... Feature classes determine the flexibility of the shape of the relationship of covariates to response. The regularisation multiplier dictates the penalty for model complexity, with lower values resulting in a model more fitted to the presences (over-fitted), and higher values in more spread out (underfitted) model predictions (Phillips and Dudík 2008;Elith et al. 2011;Merow et al. 2013). Notably, as one of our aims was to predict new areas of potential occurrence, we did not use values of the regularisation multiplier lower than 1. ...
Article
Context. Knowledge of species' distribution and habitat associations is fundamental for conservation planning and management, especially in the context of range-restricted taxa. The Critically Endangered Kaputar rock skink (Egernia roomi) is a high elevation species that is restricted to the Nandewar Ranges (New South Wales, Australia). The species was not formally recognised until 2019, with its distribution, ecology, and threats poorly known. Aims. To determine the geographical distribution of the Kaputar rock skink and explore its ecology and threats. Methods. We performed surveys throughout high elevation regions of Mount Kaputar National Park, targeting suitable habitat for the Kaputar rock skink (rock outcrops and plateaux). Species distributional modelling (SDM) was used to identify potentially suitable habitat outside of our search areas. Key results. We detected the species at all historical record sites and at 15 new sites, increasing the species' known area of occupancy (AOO) four-fold (from 8 km² to 40 km²), and elevational range threefold (from 1360-1480 m to 1147-1509 m). Conclusion. The AOO for the species now exceeds the IUCN Red List threshold for Critically Endangered, but falls within the range for Endangered under Criterion B. Our SDMs indicated that all predicted suitable habitat for the species falls within the region that we surveyed in this study. Implications. Our study provides valuable information on the geographic range of a threatened lizard species and evaluates the potential impact of large-scale fires on the persistence of the species.
... The biometric variables were applied to the entire world map, and a subset was created for Brazil to highlight the native distribution area of the morphotypes (supplementary material). An initial analysis was conducted to build the model with all 19 bio-variables in MaxEnt v.3.2.1 (Phillips and Dudík 2008), using Jackknife to measure variable importance and then selecting the ones that contributed most for a second modeling. Some default MaxEnt values were retained, with the maximum number of iterations set to 500 and a convergence threshold of 10⁻⁵, while modifying the random test percentage parameter to 20 (20% test data and 80% training data; Phillips et al. 2006; Phillips and Dudík 2008). ...
... The AUC (Area Under the Curve) measure was used to avoid random predictions, with a threshold set above 0.5 (Phillips et al. 2006). ...
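The 20% test / 80% training partition described in the excerpt above can be sketched as a simple random split of the occurrence records. This is a generic illustration, not the MaxEnt implementation of its random test percentage setting. The function name, the fixed seed, and the placeholder records are all assumptions made for the example.

```python
import random


def split_occurrences(records, test_fraction=0.2, seed=0):
    """Randomly partition occurrence records into (training, test) lists.

    A seeded generator keeps the split reproducible; the seed value itself
    is arbitrary.
    """
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    # 100 placeholder record identifiers standing in for occurrence points.
    train, test = split_occurrences(range(100), test_fraction=0.2)
    print(len(train), len(test))
```

Setting `test_fraction=0.3` would give the 30% evaluation / 70% training partition that another excerpt on this page describes.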
Article
Libidibia is a small genus of caesalpinioid legumes with seven species spanning from Mexico and the Caribbean to southern South America. Within this genus, Libidibia ferrea stands out as an iconic Brazilian tree currently classified into the varieties ferrea, glabrescens, leiostachya, and parvifolia. They comprise a species complex together with three other varieties currently accepted as synonyms (var. cearensis, var. megaphylla and var. petiolulata). Together they exhibit complex morphological variation, along with confusion regarding their common names and geographic distribution. Five distinct morphotypes were recognized which were compared using a morphometric study of 26 quantitative leaf characters. We also performed ecological niche modeling for those morphotypes spanning from Quaternary to the present. Principal Component Analysis (PCA) and Discriminant Analysis (DA) revealed four main clusters which also present distinct niche preferences throughout the Quaternary and current distinct geographical distributions. Based on our findings, we propose recognizing four morphotypes as separate species: L. ferrea, L. juca, L. leiostachya, and L. parvifolia. Libidibia ferrea and L. juca are small trees and shrubs, distributed respectively in the southern portion of the ‘Caatinga’ and from the Amazon to the northern ‘Caatinga’ region. On the other hand, L. leiostachya and L. parvifolia are both tall trees, predominantly inhabiting the wetter regions of the ‘Caatinga’ (L. parvifolia) and extending into the coastal rainforests of southeastern Brazil (L. leiostachya). Three new combinations are proposed and an identification key, diagnostic descriptions, and taxonomic notes are presented.
... A low AI indicates relatively dry climatic conditions, whereas a high AI indicates relatively wet climatic conditions. Habitat suitability for B. ermanii in Japan was predicted using the maximum entropy principle algorithm in MaxEnt (Phillips & Dudík, 2008). Detailed information on this estimation can be found in Aihara et al. (2024). ...
Article
As plant distribution and performance are determined by both environmental and genetic factors, clarifying the contribution of these two factors is key to understanding plant adaptation and predicting plant distribution under ongoing global warming. Betula ermanii is an ideal species for such research because of its wide distribution across diverse environments. Stomatal density and size are crucial traits that change as plants adapt to different environments, as these traits directly influence plant photosynthesis and transpiration. In this study, we conducted a multi-location common garden experiment using B. ermanii to (1) clarify the contribution of both environmental and genetic factors to the variation in stomatal density and size of B. ermanii, (2) demonstrate the differences in the plasticity of stomatal density and size among B. ermanii populations, and (3) understand how stomatal density and size of B. ermanii would respond to increased temperature and changing precipitation patterns. Genetic factors played a more significant role in stomatal size than environmental factors, suggesting that B. ermanii struggles to adjust its stomatal size in response to a changing environment. Our results also revealed a positive correlation between stomatal size plasticity and original habitat suitability, indicating that B. ermanii populations in harsh environments exhibit lower adaptability to environmental shifts. Although stomatal density and size of B. ermanii showed significant responses to increased temperature and shifting precipitation patterns, the response ranges of stomatal density and size to the environmental factors varied among populations. Our findings highlighted the interplay between genetic and environmental factors in determining the intraspecific variation in stomatal density and size in B. ermanii. This indicated that certain populations of B. ermanii exhibit limited stomatal plasticity and adaptability, which could directly affect photosynthesis and transpiration, suggesting potential population-specific fitness implications for B. ermanii under future climate change.
... For both the random and the target group approach, we selected pseudo-absence points for the presence-absence algorithms in our ensembles and background points for the Maxent algorithm (see below). The number of pseudo-absence points was set to four times the number of presence points (Barbet-Massin et al. 2012) and the number of background points was set to 20,000 (Phillips and Dudík 2008). Pseudo-absence and background points in both the random and target group approach were selected from the area enclosed by a convex hull constructed around all presence points of cultivated cacao and of wild cacao extended with a buffer corresponding to 10% of the hull's largest axis (Acevedo et al. 2012) (Fig. S1). ...
Article
Climate change is expected to impact cacao cultivation in Ecuador, the fifth largest cacao producing country in the world and largest exporter of fine flavour cacao. The objective of this study was to evaluate the future impact of climate change on the suitable distribution of cultivated and wild cacao and identify areas where climate change tolerant genotypes may occur in Ecuador. Using 26,152 presence points for cultivated cacao and 95 presence points for wild cacao, we modelled the present suitability distribution of cultivated and wild cacao and performed future climate projections under two greenhouse gas emission scenarios (SSP2-4.5 and SSP3-7.0) and two time periods (2050s and 2070s). For both cultivated and wild cacao, we constructed six different ensemble models employing different filtering methods for presence points, we projected each ensemble model to future climatic conditions, and we then built the final maps of present distribution and future projections based on the majority-vote criterion. Our future projections predict an 8–16% contraction and 19–21% expansion of the currently suitable area of cultivated cacao, while wild cacao is expected to maintain most of its suitable area and experience a further 7–12% expansion in the future. Ecogeographical zones are predicted to change in 23–33% of the combined distributions of cultivated and wild cacao. We identified the areas in Ecuador where populations of climate change tolerant genotypes are expected to occur. Interventions to promote adaptation to climate change will be required in cacao cultivation areas that are expected to be impacted by climate change in Ecuador, including the use of tolerant genotypes.
... The AUC ratio was calculated by the AUC value at an error rate of 5% (E = 0.05), where a ratio greater than one indicated good model performance [31]. Predictive performance was categorized as good (TSS ≥ 0.6; Kappa ≥ 0.75; AUC ≥ 0.9), moderate (0.2 ≤ TSS ≤ 0.6; 0.4 ≤ Kappa ≤ 0.75; 0.7 ≤ AUC ≤ 0.9), or poor (TSS ≤ 0.2; Kappa ≤ 0.4; AUC ≤ 0.7) [32]. ...
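The threshold-based metrics in the excerpt above can be illustrated with a short sketch. It assumes the standard definitions: TSS is sensitivity plus specificity minus one, and Cohen's kappa compares observed agreement with the agreement expected by chance. Because the quoted categories overlap at their boundaries, the sketch assigns a value that sits exactly on a cut-off to the outer class ("good" or "poor"). That tie-breaking rule, the function names, and the example counts are assumptions made for illustration.

```python
def tss_and_kappa(tp, fp, fn, tn):
    """Return (TSS, Cohen's kappa) from confusion-matrix counts.

    Assumes at least one observed presence and one observed absence,
    and that chance agreement is below 1 (otherwise a division by zero occurs).
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    tss = sensitivity + specificity - 1.0
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return tss, kappa


def categorize(value, good, poor):
    """Label a metric using the quoted cut-offs; boundary values go to the outer class."""
    if value >= good:
        return "good"
    if value <= poor:
        return "poor"
    return "moderate"


if __name__ == "__main__":
    # Hypothetical counts: 45 true presences, 5 false presences,
    # 5 missed presences, 45 true absences.
    tss, kappa = tss_and_kappa(tp=45, fp=5, fn=5, tn=45)
    print(round(tss, 3), categorize(tss, good=0.6, poor=0.2))
    print(round(kappa, 3), categorize(kappa, good=0.75, poor=0.4))
    print(categorize(0.85, good=0.9, poor=0.7))  # an AUC value
```

Each metric is categorized on its own here; the excerpt does not say how disagreements between the three metrics should be resolved.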
Article
Global climate changes are expected to profoundly shape species distribution. Quercus oxyphylla, a valuable evergreen broad-leaved tree species, is rigorously conserved and managed in China owing to its substantial scientific, economic, and ecological value. However, the impact of projected climate change on its future distribution and potential climatic drivers remains unclear. Here, a maximum entropy model (MaxEnt) was used to explore the distribution of Q. oxyphylla in China under current conditions and three future scenarios (SSP1-2.6, SSP2-4.5, and SSP5-8.5) for the 2050s and 2070s. We optimized the model using the ‘ENMeval’ package to obtain the best parameter combination (RM = 1, FC = LQHPT), and multiple evaluation metrics (AUC ≥ 0.9; TSS ≥ 0.6; Kappa ≥ 0.75) verified the high accuracy of the model and the reliability of the prediction results. We found the following: (1) The potential distribution of Q. oxyphylla spans across 28 provinces in China under current climatic conditions, predominantly in southern regions, with Sichuan exhibiting the largest suitable area for survival. The total suitable habitat covers 244.98 × 10⁴ km², comprising highly, moderately, and poorly suitable habitats of 51.66 × 10⁴ km², 65.98 × 10⁴ km², and 127.34 × 10⁴ km², respectively. (2) Under future climate conditions, the overall geographical boundaries of Q. oxyphylla are predicted to remain similar to the present one, with an increase of 10.29% in the 2050s and 11.31% in the 2070s. In the 2050s, the total suitable habitats for Q. oxyphylla under the three scenarios (SSP1-2.6, SSP2-4.5, and SSP5-8.5) might increase by 8.83%, 9.62%, and 12.42%, while in the 2070s they might increase by 10.39%, 17.21%, and 6.33%, respectively. (3) Moreover, the centroid of the suitable area is expected to migrate southwestward under the three scenarios in the future. (4) Annual precipitation, isothermality, and temperature annual range emerged as the main factors influencing the distribution of Q. oxyphylla, with contributions of 55.9%, 25.7%, and 13.5%, respectively. Our findings refined the spatial arrangement of Q. oxyphylla growth and revealed its climate resilience. This suggested that under climate change, Sichuan and Shaanxi are the optimal regions for cultivation and management, while appropriate conservation strategies should be formulated in Tibet and Hubei.
... To develop an ecological suitability model in each of the two study areas, the MaxEnt method was used, which estimates the potential geographic distribution of a species using only information on its presence and several environmental variables (Phillips et al., 2006; Phillips & Dudík, 2008 ...
Article
Full-text available
Abstract - The orchid Himantoglossum adriaticum H. Baumann in Lombardy. Over the past five years, the Lombard populations of Himantoglossum adriaticum have been monitored, since this orchid is included in Annexes II and IV of the "Habitats" Directive (92/43/EEC). Records of H. adriaticum obtained from various sources made it possible to clarify its distribution and to characterize the species from a geoecological perspective, confirming that H. adriaticum is strictly confined to calcareous soils. In the Alpine area, the distribution mainly corresponds to the hilly system; accordingly, the species occurs at significantly lower altitudes than in the Apennine area, where the distribution is more uniform and influenced by land use. The number of individuals does not differ between the two areas analysed, where small populations with fewer than ten individuals prevail. The comparison between the ecological suitability model and the distribution of known populations highlights a wide potential range, which could indicate either an underestimation of the presence of H. adriaticum or areas not yet colonized.
... These bioclimatic and soil data were used to create species distribution models (SDM) using the software Maxent (Version 3.4.4; Phillips and Dudík, 2008) in RStudio (Version 2022.2.0.443; R Core Team, 2023). Paleoclimate data were sourced from paleoclim.org ...
Preprint
Full-text available
Cannabis sativa L. is an annual flowering herb of Eurasian origin that has long been associated with humans. Domesticated independently at multiple locations at different times for different purposes (food, fiber, and medicine), these long-standing human associations have influenced its distribution. However, changing environmental conditions and climatic fluctuations have also contributed to the distribution of the species and define where it is optimally cultivated. Here we explore the shifts in distribution that C. sativa may have experienced in the past and explore the likely shifts in the future. Modeling under paleoclimatic scenarios shows niche expansion and contraction in Eurasia through the timepoints examined. Temperature and precipitation variables and soil variable data were combined for species distribution modeling in the present day and showed high and improved predictive ability together as opposed to when examined in isolation. The five most important variables explaining ~65% of the total variation were soil organic carbon content (ORCDRC), pH index measured in water solution (PHIHOX), annual mean temperature (BIO-1), mean temperature of the coldest quarter (BIO-11) and soil organic carbon density (OCDENS) (AUC = 0.934). Climate model projections where efforts are made to curb emissions (RCP45/SSP245) and the business as usual (RCP85/SSP585) models were evaluated. Under projected future climate scenarios, shifts worldwide are predicted with a loss of ~43% in suitability areas with scores above 0.4 observed by 2050 and continued but reduced rates of loss by 2070. Changes in habitat range have large implications for the conservation of wild relatives as well as for the cultivation of Cannabis as the industry moves toward outdoor cultivation practices.
... is generally considered acceptable and allows for the standardization of methods across studies and ease of use (Phillips and Dudík, 2008). Regardless, the overall difference in model performance between the methods was relatively small and a weighted consensus of all seven AI algorithms was used to inform the final monthly ensemble models. ...
Preprint
Development of renewable wind resources on the Outer Continental Shelf of the United States (OCS) has led to growing concerns for marine wildlife. However, significant uncertainty remains regarding the technology's potential to impact species of interest that may occupy planned development sites. This is further compounded by the difficulty of monitoring highly migratory or data-poor species in marine waters, making practical assessment of site- or species-specific threats that could require additional management intervention particularly problematic. Here, I identify a highly generalizable framework to inform species interactions in marine habitats allocated for offshore resource exploitation, using telemetry-derived artificial intelligence species distribution models. Results from a case study of the federally protected Atlantic Sturgeon (Acipenser oxyrinchus) demonstrate excellent discriminatory capacity (i.e., AUC ≥ 0.9) at a relatively fine scale (raster resolution = 1 km²), while providing critical information on predicted occurrence over a broad swath of unmonitored marine habitats (i.e., the Atlantic OCS region of the US; area > 620,000 km²). Furthermore, ensemble map products developed from these models are readily scalable to ongoing management needs and, when overlaid with offshore wind energy lease areas, can feed directly into management strategies to inform best practices for potential habitat influences on Atlantic Sturgeon, as well as other species of commercial or conservation interest.
... A receiver operating characteristic (ROC) and the associated area under the ROC curve (AUC) indicate an overall goodness-of-fit of the MaxEnt model. AUC values often used in assessing the accuracy of infectious disease predictions range from 0 to 1 where 0.5 indicates random prediction and higher values mean more accurate results [55,57,58]. ...
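The AUC described in the excerpt above can be read as the probability that a randomly chosen presence site receives a higher model score than a randomly chosen background (or absence) site. The sketch below computes it by direct pairwise comparison; the function name and inputs are assumptions for illustration, and this is not how the Maxent software itself is implemented.

```python
def auc(presence_scores, background_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (presence, background) pairs in which the
    presence site scores higher; ties count as half a win.
    Hypothetical helper; assumes both lists are non-empty.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))
```

A model that scores every site identically yields 0.5 (random prediction), and a model that ranks every presence above every background site yields 1.0. The pairwise loop is quadratic in the number of sites, so a rank-based formulation would be preferable for large rasters.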
Article
Full-text available
The Korean Demilitarized Zone (DMZ) is one of the world’s most preserved habitats for wild animals and migratory birds. The area also plays a major role in the spread of infectious animal diseases, in particular, African swine fever (ASF) and highly pathogenic avian influenza (HPAI). These outbreaks threaten the livelihood of local livestock farms, not infrequently. In this paper, we explore these relatively under-researched diseases by modeling and mapping ASF and HPAI risks in tandem using MaxEnt, a machine-learning algorithm. The results show robust predictive power with high area under the curve values, of 0.92 and 0.99, respectively. We found that precipitation from spring to early summer and solar radiation in winter were essential in explaining the potential distribution of ASF, but land use contributed little. Thus, understanding only wild boars’ habitat preferences may not be sufficient in preventing ASF epidemics. HPAI risks were shaped by precipitation and mean temperature from winter to spring and land use. Areas with high ASF and HPAI risks were primarily found in forest and agricultural lands, respectively. The DMZ included many high-risk areas, indicating that the DMZ could lead to a broader regional spread of ASF and HPAI in the peninsula. Thus, our results highlight the essential role of cross-border collaboration and the combination of environmental and epidemiological insights in strategies to control ASF and HPAI risks within and surrounding the DMZ.
... In future research, other marine environmental factors can be incorporated to explore their impact on the distribution of this species' trawl fishing grounds, thereby enhancing understanding of the influence of marine environmental factors on the movement of fishing ground centers. Furthermore, although the Maxent model's results in this study show high AUC values, and researchers generally believe that higher AUC values indicate a more accurate model, some scholars [47] argue that a high AUC value does not necessarily mean better fit, and human factors may interfere with the model's computations. ...
Article
Full-text available
To understand the spatiotemporal distribution characteristics of Illex argentinus caught by trawl fishing vessels in the Southwestern Atlantic Ocean and their relationship with key marine environmental factors, this study analyzed the temporal and spatial changes in the fishing ground center of trawl vessels at the ten-day scale from December 2019 to May 2022, combining Chinese trawl fishing log data with satellite remote sensing marine environmental data. Utilizing the Maxent model, ten-day intervals were used as the temporal scale, and ten marine environmental factors, including sea surface temperature, sea surface height, sea surface salinity, chlorophyll concentration, temperature at 50 m and 100 m depth, and the meridional and zonal velocities of ocean currents were quantitatively analyzed to explore the correlation between the spatial distribution of catch and environmental factors. The study reveals that the trawl fishing grounds for Illex argentinus are divided into southern and northern grounds. The southern grounds first appear near 45°20′ S in December, gradually moving southeastward in February and March. The northern grounds do not appear until April, near 42° S in the high seas. On the ten-day time scale, the central fishing grounds of Illex argentinus show significant spatial variability but minor interannual differences. The Maxent model results indicate that sea surface temperature and chlorophyll a concentration are the key environmental factors influencing the spatial and temporal variability of the high seas trawl fishing grounds for most of the time, with high environmental contribution rates during the fishing season. While the range of suitable habitats with an HSI > 0.6 identified by the Maxent model varies significantly between years, a pattern is observed where the range expands at the start and end of the fishing season and contracts during the peak fishing season. 
This suggests that a more concentrated range of suitable habitats is conducive to accurate predictions of trawl fishing grounds, enabling efficient fishing operations.
... The final predictions for the suitable habitat distribution of O. glabra were conducted using MaxEnt 3.4.4 [24], with the optimal combination of Feature Classes (FC) and Regularization Multipliers (RM) settings, consistent with previous studies [25,26]. ...
Article
Full-text available
The research on the significant toxic weed Oxytropis glabra, which adversely affects the grazing industry and the ecological integrity of natural grasslands in the arid and semi-arid regions of northern China, aims to delineate its potential distribution amidst changing climate conditions. This analysis involves both current conditions (1970–2000) and future projections (2050s and 2070s) under four climate scenarios using an R-optimized MaxEnt model. The results indicate that the distribution of O. glabra was primarily influenced by the temperature of the coldest quarter (bio11, ranging from −12.04 to −0.07 °C), precipitation of the coldest quarter (bio19, 0 to 15.17 mm), and precipitation of the warmest quarter (bio18, 0 to 269.50 mm). Currently, the weed predominantly occupies parts of Xinjiang, Inner Mongolia, Gansu, Qinghai, Ningxia, and Tibet. Projections indicate that, across four future climate scenarios, the area of suitable habitats for O. glabra is expected to expand and shift toward higher latitudes and elevations. The research provides valuable information and a theoretical foundation for the management of O. glabra, alongside advancing grassland ecological research and grazing practices.
... Analyzing the spreading trend and potential distribution of S. alterniflora is of great significance for improving wetland ecosystem management and maintaining biodiversity in the coastal zone of Guangxi. The Maxent model (Phillips et al., 2006; Phillips & Dudík, 2008), one of the commonly used species distribution models, is based on the Maximum Entropy Principle, which was proposed by the physicist Jaynes in 1957. ...
Article
Full-text available
In recent years, the continuous expansion of Spartina alterniflora (S. alterniflora) has caused serious damage to coastal wetland ecosystems. Mapping the coverage of S. alterniflora by remote sensing and analyzing its growth pattern are of great importance for controlling its expansion and maintaining the biodiversity of coastal wetlands in Guangxi. This study used harmonic regression to fit time series of vegetation indices derived from Landsat images, and the extracted phenological features were used as input to a random forest model to identify S. alterniflora in the coastal zone of Guangxi from 2009 to 2020. The influence of natural environmental factors on the distribution of S. alterniflora was evaluated with the Maxent model, and the potential distribution was analyzed. The results showed that: (1) Using phenological features extracted from the time series of characteristic indices fitted by harmonic regression, S. alterniflora was identified with high accuracy (for 2009, Overall Accuracy [OA] = 97.33%, Kappa = 0.95). (2) During 2009–2020, S. alterniflora in the coastal zone of Guangxi kept proliferating and expanding from east to west. The total area of S. alterniflora continued to increase, while the growth rate first increased and then decreased. (3) The Maxent model shows good accuracy in simulating the habitat of S. alterniflora, with a potential distribution area of 14,303.39 hm². The findings will be beneficial to the understanding of dynamic changes of S. alterniflora in the coastal zone of Guangxi and provide a scientific reference for other coastal wetland studies on S. alterniflora expansion.
... The model was developed in Maxent software v. 3.4.4. Regularization multipliers (RM) and feature classes (FC) are closely related to the accuracy of the Maxent model (Phillips and Dudík 2008). Therefore, we used the ENMeval package in R software to choose the best combination of FC and RM values based on the lowest corrected Akaike information criterion (AICc) score (Muscarella et al. 2014). ...
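Selecting a feature-class and regularization-multiplier combination by lowest AICc, as the excerpt above describes, can be illustrated with the standard small-sample formula AICc = 2k − 2 ln L + 2k(k + 1)/(n − k − 1), where k is the number of model parameters and n the number of occurrence records. The sketch below is an assumption-laden illustration and not the ENMeval implementation; the function names, the candidate labels, and the convention of returning None when the correction is undefined are all hypothetical.

```python
def aicc(log_likelihood, k, n):
    """Corrected Akaike information criterion.

    log_likelihood: model log-likelihood (ln L)
    k: number of parameters, n: number of occurrence records.
    Returns None when n - k - 1 <= 0, where the correction is undefined
    (an assumed convention for this sketch).
    """
    if n - k - 1 <= 0:
        return None
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


def best_setting(candidates):
    """Return the label with the lowest AICc.

    candidates: list of (label, log_likelihood, k, n) tuples.
    Raises ValueError if no candidate has a defined AICc.
    """
    scored = []
    for label, log_likelihood, k, n in candidates:
        score = aicc(log_likelihood, k, n)
        if score is not None:
            scored.append((score, label))
    return min(scored)[1]
```

For example, a candidate with ln L = −100 and 5 parameters beats one with ln L = −98 and 12 parameters at n = 50, because the small gain in likelihood does not offset the penalty for the extra parameters.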
Article
Full-text available
China’s bamboo output is closely associated with its national economy; however, it is currently rapidly declining due to damage from the pests Anaka burmensis and Cicadella viridis. Identifying regions that are environmentally suitable for these pests is a critical step in their effective control. Therefore, in this study, we used a Maxent model to predict their current and future potential areas of distribution (2021–2040, 2041–2060, and 2061–2080) and explored changes over time using distribution data and related environmental variables. The model results demonstrate that the current potential areas of distribution of A. burmensis are predominantly concentrated in several provinces of southern and central China, such as Guizhou, Guangxi, and Hubei, whereas the current potential areas of distribution of C. viridis are primarily in many provinces across southern, central, and northeastern China. In the future, the potential distribution of A. burmensis will increase and move minimally, whereas the potential distribution of C. viridis will decrease and move considerably. The results of the present study provide vital information for predicting the spread and outbreaks of C. viridis and A. burmensis and provide a reference framework for developing management strategies to control these two pests, thereby minimizing economic loss in the bamboo industry.
... We also tested the significance of the partial response curves using pROC function in the 'ntbox' package (Osorio-Olvera et al., 2020). These performance metrics were calculated over 100-iteration bootstraps using 10% test presence, which reserves 10% of the known occurrence locations for testing the resulting models (Phillips et al., 2006;Phillips and Dudik, 2008). A full array of the test statistics available is presented in Supplementary Table 1. ...
Article
Defining plant ecophysiological responses across natural distributions enables a greater understanding of the niche that plants occupy. Much of the foundational knowledge of species’ ecology and responses to environmental change across their distribution is often lacking, particularly for rare and threatened species, exacerbating management and conservation challenges. Combining high-resolution species distribution models (SDMs) with ecophysiological monitoring characterized the spatiotemporal variation in both plant traits and their interactions with their surrounding environment for the range-restricted Aluta quadrata Rye & Trudgen, and a common, co-occurring generalist, Eremophila latrobei subsp. glabra (L.S.Sm.) Chinnock., from the semi-arid Pilbara and Gascoyne region in northwest Western Australia. The plants reflected differences in gas exchange, plant health and plant water relations at sites with contrasting suitability from the SDM, with higher performance measured in the SDM-predicted high-suitability site. Seasonal differences demonstrated the highest variation across ecophysiological traits in both species, with higher performance in the austral wet season across all levels of habitat suitability. The results of this study allow us to effectively describe how plant performance in A. quadrata is distributed across the landscape in contrast to a common, widespread co-occurring species and demonstrate a level of confidence in the habitat suitability modelling derived from the SDM in predicting plant function determined through intensive ecophysiology monitoring programmes. In addition, the findings also provide a baseline approach for future conservation actions, as well as to explore the mechanisms underpinning the short-range endemism arid zone systems.
... Subsequently, we built models using the maximum entropy algorithm in MAXENT 3.4.1 (Phillips et al. 2006), and the area under the curve (AUC) was used to assess the accuracy of the modeling outcomes (Lobo et al. 2008). The logistic output of each species' best model, representing the probability of presence varies from 0 to 1, was used for direct comparison across models (Phillips and Dudik 2008). To mitigate overprediction in species-poor areas, we removed grid cells with suitability values lower than the maximum training sensitivity plus specificity threshold (Liu et al. 2013). ...
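The "maximum training sensitivity plus specificity" threshold mentioned in the excerpt above is the suitability cutoff that maximizes the sum of sensitivity and specificity. The following sketch shows one way to find it by scanning candidate cutoffs; it is not the cited authors' code, and the function name, the use of background scores in place of true absences, and the tie-breaking rule (lowest cutoff wins) are assumptions.

```python
def max_sss_threshold(presence_scores, background_scores):
    """Cutoff maximizing sensitivity + specificity.

    A site is predicted present when its score is >= the cutoff.
    Candidate cutoffs are the observed scores; on ties the lowest
    cutoff is kept. Hypothetical helper; assumes non-empty inputs.
    """
    candidates = sorted(set(presence_scores) | set(background_scores))
    best_cutoff, best_sum = None, -1.0
    for cutoff in candidates:
        sensitivity = sum(p >= cutoff for p in presence_scores) / len(presence_scores)
        specificity = sum(b < cutoff for b in background_scores) / len(background_scores)
        if sensitivity + specificity > best_sum:
            best_cutoff, best_sum = cutoff, sensitivity + specificity
    return best_cutoff
```

Grid cells whose logistic output falls below the returned cutoff would then be masked out, which is the step the excerpt describes for limiting overprediction.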
Article
Full-text available
How ecological and evolutionary factors affect small mammal diversity in arid regions remains largely unknown. Here, we combined the largest phylogeny and occurrence dataset of Gerbillinae desert rodents to explore the underlying factors shaping present-day distribution patterns. In particular, we analyzed the relative contributions of ecological and evolutionary factors on their species diversity using a variety of models. Additionally, we inferred the ancestral range and possible dispersal scenarios and estimated the diversification rate of Gerbillinae. We found that Gerbillinae likely originated in the Horn of Africa in the Middle Miocene and then dispersed and diversified across arid regions in northern and southern Africa and western and central Asia, forming their current distribution pattern. Multiple ecological and evolutionary factors jointly determine the spatial pattern of Gerbillinae diversity, but evolutionary factors (evolutionary time and speciation rate) and habitat filtering were the most important in explaining the spatial variation in species richness. Our study enhances the understanding of the diversity patterns of small mammals in arid regions and highlights the importance of including evolutionary factors when interpreting the mechanisms underlying large-scale species diversity patterns.
... The SDM was built using MaxEnt 3.4.4 (Phillips and Dudík 2008). This method used presence-only data to assess the correlation between species occurrence and climatic factors (Phillips et al. 2006). ...
Article
Full-text available
Environmental characteristics act as limiting factors for the establishment and survival of organisms, and this holds particular significance for ectotherms. Grammostola vachoni is an endemic Argentinean tarantula that inhabits central mountainous grasslands. In this study, we model its potential distribution and assess thermal parameters and vulnerability indices in unstudied populations to understand the species’ survival limits. The models explaining the geographic distribution of G. vachoni demonstrate a non-random pattern and robust predictive capabilities. Monthly variables for December, October, and March had the greatest influence. Regarding the thermal parameters calculated from locomotor performance, the critical thermal limits were 4.9 °C and 51.1 °C, and the preferred temperatures were correlated with the optimal temperature. According to the occurrence probability, the species has a low probability of persisting in the coldest and hottest environments. In a more temperate environment, the probability increases, mostly at temperatures of about 35 °C. The thermal performance breadth was 19.30, the thermal tolerance range was 46.1 °C and the inferior and superior limits of B80 were 18.9 °C and 37.1 °C. Grammostola vachoni shows better performance at higher temperatures, and a preference for higher temperatures than other ectotherms. The diverse microclimates provided by the heterogeneous nature of mountain grassland outcrops can act as a refuge. Implications for insect conservation: Our results show that grassland outcrops play a key role in maintaining the G. vachoni population, as these environments appear to be less affected in the future. Moreover, the species’ locomotor performance could cope with future thermal shifts.
... We incorporated environmental factors by obtaining habitat suitability. The habitat suitability is achieved using Maxent software [9,16,25,37–42]. Habitat suitability models can help to understand and predict the dynamics of invasions. ...
Article
Full-text available
The spread of the American Bullfrog has a significant impact on the surrounding ecosystem. It is important to study the mechanisms of its spread so that proper mitigation can be applied when needed. This study analyzes data from national surveys on bullfrog distribution. We divided the data into 25 regional clusters. To assess the spread within each cluster, we constructed temporal sequences of spatial distribution using the agglomerative clustering method. We employed Elementary Cellular Automata (ECA) to identify rules governing the changes in spatial patterns. Each cell in the ECA grid represents either the presence or absence of bullfrogs based on observations. For each cluster, we counted the number of presence locations in the sequence to quantify spreading intensity. We used a Convolutional Neural Network (CNN) to learn the ECA rules and predict future spreading intensity by estimating the expected number of presence locations over 400 simulated generations. We incorporated environmental factors by obtaining habitat suitability maps using Maxent. We multiplied spreading intensity by habitat suitability to create an overall assessment of bullfrog invasion risk. We estimated the relative spreading assessment and classified it into four categories: rapidly spreading, slowly spreading, stable populations, and declining populations.
... AUC ranges between 0 and 1, where 1 is perfect prediction and 0.5 is the baseline accuracy of a binary outcome. Models with an AUC value > 0.7 are considered good (Phillips and Dudík 2008). ...
Article
Full-text available
Understanding the mechanisms that maintain species coexistence and determine patterns of community assembly are fundamental goals of ecology. Quantifying the relationship between species traits and stress gradients is a necessary step to disentangle assembly processes and to be able to predict the outcome of environmental change. We examined the hypothesis that desert ant communities are assembled by niche-based processes i.e., environmental filtering and limiting similarity. First, we used population-level morphological trait measurements to study the functional structure of ant communities along a dryland environmental stress gradient. Second, we developed species distribution models for each species to quantify large-scale climatic niche overlap between species. Body, femur, antennal scape, and head lengths were correlated with environmental gradients. Regionally, the ant community was significantly and functionally overdispersed in terms of morphological traits which suggests the importance of competition to ant community structure. Ant community assembly was also strongly influenced by environmental factors as the degree of functional trait divergence, but not phylogenetic divergence, decreased with increasing environmental stress. Thus, environmental stress likely mediates limiting similarity in these desert ecosystems. Species with lower climatic niche overlap were more dissimilar in morphological traits. This suggests that environmental filtering on ant functional traits is important at the scale of species distributions in addition to regional scales. This study shows that environmental and biotic filtering (i.e., niche-based assembly mechanisms) are jointly and non-independently structuring the ant community.
... To model the distribution of the species and each climate-adapted genotype, we used three different classes of algorithms for ensemble forecasting (Araújo & New, 2007), including an envelope method (DOMAIN), logistic regression (GLM) and maximum entropy (Maxent) (Carpenter et al., 1993; Guisan et al., 2002; Phillips & Dudík, 2008). We performed 10 replications for each combination of variables for each method, using 10-fold cross-validation to evaluate the results. ...
Article
Full-text available
Aim The impact of climate change on biodiversity is often analysed under a stable evolutionary perspective focused on whether species can currently tolerate warmer climates. However, species may adapt to changes, and particularly under conditions of low habitat fragmentation, standing adaptive genetic variation can spread across populations tracking changing climates, increasing the potential for evolutionary rescue. Here, our aim is to integrate genomic data, niche modelling and landscape ecology to predict range shifts and the potential for evolutionary rescue. Location The megadiverse Amazonian rainforest. Methods We use genome–environment association analyses to search for candidate loci under environmental selection, while accounting for neutral genetic variation in a widespread Amazonian whiptail lizard (Teiidae: Kentropyx calcarata). We then model the distribution of individuals with genotypes adapted to different climate conditions. We predict range shifts for each genotype in distinct future climate change scenarios by integrating this information with dispersal constraints based on predicted scenarios of forest cover across Amazonia. The predicted ranges of each genotype were then overlapped to infer the potential for evolutionary rescue. Results We find that the potential for evolutionary rescue and, therefore, a smaller degree of range loss buffering extinction risk in the future is considerably high, provided that current forest cover is retained and climate change is not extreme. However, under extreme environmental change scenarios, range loss will be high in central and southern Amazonia, irrespective of the degree of deforestation. Main Conclusions Our results suggest that protecting the Amazonian rainforest against further deforestation and mitigating climate change to moderate scenarios until 2070 could foster evolutionary rescue of ectothermic organisms. 
These actions could prevent substantial biodiversity loss in Amazonia, emphasizing the importance of understanding species adaptability in maintaining biodiversity.
Preprint
Full-text available
In this study, present and future distributions of stone pine under climate change were modeled with MaxEnt. The CNRM ESM2-1 climate model and bioclimatic variables obtained from the WorldClim database were used as climate data. The climate scenarios were SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5, for the periods 2041–2060 and 2081–2100. Pearson correlation analysis was performed to avoid high correlation among bioclimatic variables, and the multicollinearity problem was eliminated by reducing the 19 bioclimatic variables to 9. The contribution of bioclimatic variables to the model was determined by the Jackknife test. To determine the spatial and locational differences between the present and future potential distributions estimated for the species, an analysis of change was conducted. According to the findings of the study, our model has very high predictive power and, according to the Jackknife test results, the bioclimatic variables BIO19, BIO6, and BIO4 contribute the most to the model. Our model predicts that the distribution area of stone pine will decrease, shifting northward and towards higher altitudes. We believe that this will lead to increased risk of forest fires, loss of ecosystem services, and reduced income from stone pine. For these reasons, those who benefit from stone pine need to take into account the effects of climate change in their land use planning and give importance to climate change adaptation efforts. These maps, created with current and future predictions of potential habitat distribution, can be used in afforestation, ecological restoration, rural development, conservation, and all kinds of land use studies.
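The abstract above reduces 19 bioclimatic variables to 9 using Pearson correlation but does not state the cutoff or the selection rule. The sketch below shows one common greedy approach under stated assumptions: a variable is kept only if its absolute correlation with every already-kept variable is below a threshold. The 0.8 threshold, the function names, and the order-dependent selection are all hypothetical and are not taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise division by zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def drop_correlated(variables, threshold=0.8):
    """Greedily keep variables that are not highly correlated.

    variables: dict mapping variable name -> list of values sampled at
    the same sites. Earlier entries take priority, so the dict should
    be ordered by the importance the analyst assigns to each variable.
    The threshold of 0.8 is an assumed value for illustration.
    """
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Because the result depends on input order, a practical workflow would usually rank variables first (for example by jackknife contribution) before filtering.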
Poster
Full-text available
Poster presentation is about the modelling of the distribution of gall-rust disease in Mindanao, Philippines under the current climate and future climate scenario
Thesis
Full-text available
Climate relicts within arid environments may persist in isolated refugia if suitable conditions exist in microsites that differ enough from the regional environment to provide relief from drought and temperature stress. Disjunct populations of plant taxa associated with coastal and interior chaparral have persisted in just a few mountain ranges within the Mojave Desert. These mountain biotic communities are ecological confluences where plant species characteristic of distant floristic regions co-occur. Some of these species are widespread in chaparral of the species-rich California Floristic Province and in the warmer Madrean Floristic Province of Arizona. While the Mojave Desert does not fall within the California or Madrean Floristic Provinces, the occurrence of coastal and interior chaparral species within this region is an interesting exception. Here, the plant community composition of sites where several of these species occur was surveyed in 6 mountain ranges across the Mojave Desert landscape. The distributions of 8 focal chaparral species were found to be influenced by topographic variables including slope position, aspect and terrain wetness, in addition to climatic variables such as summer monsoon precipitation and temperature. A k-means clustering analysis pointed toward community assemblages characterized by species associated along a transitional gradient from xeric shrublands to relatively mesic woodlands where chaparral occurs in the desert. Non-metric multidimensional scaling (nMDS) indicated that differences in the species composition among these mountain ranges were explained by regional floristic variation, and species frequently associated with more mesic woodland assemblages. Species distribution models predicted that, over the next 50 years, the availability of suitable habitat for most of these focal species is likely to contract within the Mojave Desert under the projected RCP 4.5 climate change scenario. 
These results demonstrate how the distribution of relict plant populations can be influenced in an arid environment, and illustrate challenges that chaparral taxa face with prolonged environmental stress.
Article
Advances continue to be made by plant pathologists on topics in plant health, environmental protection and food security. Many advances have been made for individual crops, pathogens and diseases that in many cases have led to their successful management. A wider impact of research depends on recognition of the multifaceted challenges posed by plant diseases and the need to integrate studies in a systems level approach. The adoption of high‐throughput sequencing for diagnosis and detection is widespread but impact depends upon the agricultural and ecological context combined with improved surveillance. Deployment of host resistance in the field needs to be aligned with a greater appreciation of plant genetic diversity and the complementary contribution made by tolerance of disease. Epidemiological understanding of the spatiotemporal spread of plant diseases has improved through population dynamic and genetic analyses. Research emphasis on the plant microbiome has invigorated soil microbial studies, especially for disease complexes and declines, but the challenge is to move to interventions that benefit plant health. Analysis of the impacts of climate change has been made for single‐crop disease studies, but seldom have these been placed in the context of pathogen adaptation, new crops, wild plants, vectors and soil microbes. Advances in informatic analysis illustrate not only the global impacts of plant disease introductions, but also the challenges inherent in marshalling and integrating information. Advances have been made in applying artificial intelligence technologies across many areas of plant pathology but have yet to be integrated within any coordinated research agenda.
Article
Abstract. The current and future geographical distribution of Hancornia speciosa Gomes in Brazil: Perspectives on conservation. What is the future of mangaba? This work aims to obtain occurrence data for Hancornia speciosa Gomes (Apocynaceae) and use them to evaluate the effect of climate change, under contrasting future scenarios, on the distribution and diversity of this species. Occurrence data for mangaba were obtained from the GBIF and speciesLink databases. Six potential species distribution modeling algorithms were used, considering optimistic and pessimistic future climate scenarios provided by the Intergovernmental Panel on Climate Change (IPCC). The results show that climate change will have a negative impact on the distribution of Hancornia speciosa Gomes, and that the species will be restricted to small areas of the Atlantic Forest. Keywords: Climate change; Mangaba; IPCC.
Article
Background Relict species are important for enhancing the understanding of modern biogeographic distribution patterns. Although both geological and climatic changes since the Cenozoic have affected the relict flora in East Asia, the contributions of geographical processes remain unclear. In this study, we employed restriction-site associated DNA sequencing (RAD-seq) and shallow genome sequencing data, in conjunction with ecological niche modeling (ENM), to investigate the spatial genetic patterns and population differentiation history of the relict species Rehderodendron kwangtungense Chun. Results A total of 138 individuals from 16 populations were collected, largely covering the natural distribution of R. kwangtungense. The genetic diversity within the R. kwangtungense populations was extremely low (HO = 0.048 ± 0.019; HE = 0.033 ± 0.011). Mantel tests revealed an isolation-by-distance pattern (R² = 0.38, P < 0.001), and AMOVA showed that the genetic variation of R. kwangtungense occurs mainly between populations (86.88%, K = 7). Between 23 and 21 Ma, R. kwangtungense underwent a period of rapid differentiation that coincided with the rise of the Himalayas and the establishment of the East Asian monsoon. According to ENM and population demographic history, the suitable area and effective population size of R. kwangtungense decreased sharply during the glacial period and expanded after the last glacial maximum (LGM). Conclusion Our study shows that the distribution pattern of southern China mountain relict flora may have developed during the panplain stage between the middle Oligocene and the early Miocene. The flora later fragmented under the force of orogenesis, including intermittent uplift during the Cenozoic Himalayan orogeny and the formation of abundant rainfall associated with the East Asian monsoon. The findings emphasize the predominant role of geographical processes in shaping relict plant distribution patterns.
Article
The environmental conditions of Mexico allow the presence of several species of bamboo, whose commercial uses have diversified due to their rapid growth and characteristics. The Mexican Guaduas have exemplary structural characteristics; however, there is no cartographic information in Mexico for locating and sizing the areas where bamboo species occur or could occur, which restricts decision making for commercial forest plantations. This study therefore aims to determine the probability of occurrence of the 7 species of this bamboo genus, based on the maximum entropy approach and three arrangements of 21 environmental variables, for which 478 presence records were used. All generated models showed a good fit to the training data, with an AUC value greater than 0.90. It was found that the niche distribution of Guadua mexicana species is mainly influenced by altitude (ELEV), so they are found in areas close to coastal regions. Likewise, in defining the distribution of this genus, annual precipitation (BIO12), evapotranspiration (ETP) and average annual temperature (BIO1) stood out. This was independent of the three arrangements of environmental variables that were tested. Keywords Maximum entropy; Omission/commission curves; Permutation importance
Preprint
Goldenseal (Hydrastis canadensis L.) is a perennial herbaceous plant native to forestlands in eastern North America. In Pennsylvania (PA), a U.S. state within the northeastern edge of its range, commercial harvesting for medicinal markets and habitat loss have led to conservation concerns. A better understanding of habitat predilections could help guide in situ conservation efforts including locating extant populations, forest farming adoption, and assisted migration. In this study, GIS-based Maximum Entropy (Maxent) modeling using occurrences (n=51) was combined with field plot data (n=28) to determine factors governing goldenseal’s distribution in PA and identify floral indicators of supportive habitat. The Maxent model suggested that winter temperature and bedrock type were the most important characteristics governing habitat suitability. The model identified base-rich bedrock types as most suitable, a trait confirmed in the field by soil test results showing high calcium and pH levels. However, the influence of bedrock is complicated by overlapping land use legacy, particularly in the Piedmont and Ridge-and-Valley physiographic provinces. Community analysis identified 159 woody and herbaceous associates, including indicators of the following supportive rich mesic forest types: “Tuliptree-Beech-Maple,” “Red Oak-Mixed hardwood,” and “Central Appalachian Rich Cove”. Thirteen so-called “invasive” taxa were encountered, of which at least one was present in 83% of plots. These results suggest that goldenseal habitat is widespread in the state, and species absences may be due to abiotic factors, most importantly the severity of winter temperatures. Additionally, future negative impacts on extant goldenseal populations might be anticipated resulting from the continued spread of invasive taxa.
Article
Mongolian herder households maintain the health and condition of their livestock by adapting to the characteristics of the local vegetation distribution. Thus, predicting future vegetation changes is important for stable livestock grazing and sustainable rangeland use. We predicted the distributional extent of rangeland vegetation, specifically desert steppe, steppe and meadow steppe communities, for the period 2081–2100, based on vegetation data obtained from a previous study. Rangeland vegetation data collected in Mongolia (43–50° N, 87–119° E) between 2012 and 2016 (278 plots) were classified into community types. Species distribution modeling was conducted using a maximum entropy (MaxEnt) model. Distribution data for desert steppe, steppe and meadow steppe communities were used as objective variables, and bioclimatic data obtained from WorldClim were used as explanatory variables. CMIP6-downscaled future climate projections provided by WorldClim were used for future prediction. The area under the curve values for the desert steppe, steppe and meadow steppe models were 0.850, 0.847 and 0.873, respectively. Suitable habitat was projected to shrink under all scenarios and for all communities with climate change. The extent of reduction in potential suitable areas was greatest for meadow steppe communities. Our results indicate that meadow steppe communities will transition to steppe communities with future climate change.
Article
Information from natural history collections (NHCs) about the diversity, taxonomy and historical distributions of species worldwide is becoming increasingly available over the Internet. In light of this relatively new and rapidly increasing resource, we critically review its utility and limitations for addressing a diverse array of applications. When integrated with spatial environmental data, NHC data can be used to study a broad range of topics, from aspects of ecological and evolutionary theory, to applications in conservation, agriculture and human health. There are challenges inherent to using NHC data, such as taxonomic inaccuracies and biases in the spatial coverage of data, which require consideration. Promising research frontiers include the integration of NHC data with information from comparative genomics and phylogenetics, and stronger connections between the environmental analysis of NHC data and experimental and field-based tests of hypotheses.
Article
Theory predicts low niche differentiation between species over evolutionary time scales, but little empirical evidence is available. Reciprocal geographic predictions based on ecological niche models of sister taxon pairs of birds, mammals, and butterflies in southern Mexico indicate niche conservatism over several million years of independent evolution (between putative sister taxon pairs) but little conservatism at the level of families. Niche conservatism over such time scales indicates that speciation takes place in geographic, not ecological, dimensions and that ecological differences evolve later.
Article
Standard techniques for improved generalization from neural networks include weight decay and pruning. Weight decay has a Bayesian interpretation with the decay function corresponding to a prior over weights. The method of transformation groups and maximum entropy suggests a Laplace rather than a Gaussian prior. After training, the weights then arrange themselves into two classes: (1) those with a common sensitivity to the data error and (2) those failing to achieve this sensitivity and that therefore vanish. Since the critical value is determined adaptively during training, pruning (in the sense of setting weights to exact zeros) becomes an automatic consequence of regularization alone. The count of free parameters is also reduced automatically as weights are pruned. A comparison is made with results of MacKay using the evidence framework and a Gaussian regularizer.
Article
Ecological niche modeling, a new methodology for predicting the geographic course of species' invasions, was tested based on four invasive plant species (garlic mustard, sericea lespedeza, Russian olive, and hydrilla) in North America. Models of ecological niches and geographic distributions on native distributional areas (Europe and Asia) were highly statistically significant. Projections for each species to North America (effectively predictions of invasive potential) were highly coincident with areas of known invasions. Hence, in each case, the geographic invasive potential was well summarized in a predictive sense; this methodology holds promise for development of control and eradication strategies and for risk assessment for species' invasions.
Article
Increasing concern over the implications of climate change for biodiversity has led to the use of species–climate envelope models to project species extinction risk under climate-change scenarios. However, recent studies have demonstrated significant variability in model predictions and there remains a pressing need to validate models and to reduce uncertainties. Model validation is problematic as predictions are made for events that have not yet occurred. Resubstitution and data partitioning of present-day data sets are, therefore, commonly used to test the predictive performance of models. However, these approaches suffer from the problems of spatial and temporal autocorrelation in the calibration and validation sets. Using observed distribution shifts among 116 British breeding-bird species over the past ∼20 years, we are able to provide a first independent validation of four envelope modelling techniques under climate change. Results showed good to fair predictive performance on independent validation, although rules used to assess model performance are difficult to interpret in a decision-planning context. We also showed that measures of performance on nonindependent data provided optimistic estimates of models' predictive ability on independent data. Artificial neural networks and generalized additive models provided generally more accurate predictions of species range shifts than generalized linear models or classification tree analysis. Data for independent model validation and replication of this study are rare and we argue that perfect validation may not in fact be conceptually possible. We also note that usefulness of models is contingent on both the questions being asked and the techniques used. Implementations of species–climate envelope models for testing hypotheses and predicting future events may prove wrong, while being potentially useful if put into appropriate context.
Article
Prediction of species’ distributions is central to diverse applications in ecology, evolution and conservation science. There is increasing electronic access to vast sets of occurrence records in museums and herbaria, yet little effective guidance on how best to use this information in the context of numerous approaches for modelling distributions. To meet this need, we compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date. We used presence-only data to fit models, and independent presence-absence data to evaluate the predictions. Along with well-established modelling methods such as generalised additive models and GARP and BIOCLIM, we explored methods that either have been developed recently or have rarely been applied to modelling species’ distributions. These include machine-learning methods and community models, both of which have features that may make them particularly well suited to noisy or sparse information, as is typical of species’ occurrence data. Presence-only data were effective for modelling species’ distributions for many species and regions. The novel methods consistently outperformed more established methods. The results of our analysis are promising for the use of data from museums and herbaria, especially as methods suited to the noise inherent in such data improve.
Article
Aim Many attempts to predict the potential range of species rely on environmental niche (or ‘bioclimate envelope’) modelling, yet the effects of using different niche‐based methodologies require further investigation. Here we investigate the impact that the choice of model can have on predictions, identify key reasons why model output may differ and discuss the implications that model uncertainty has for policy‐guiding applications. Location The Western Cape of South Africa. Methods We applied nine of the most widely used modelling techniques to model potential distributions under current and predicted future climate for four species (including two subspecies) of Proteaceae. Each model was built using an identical set of five input variables and distribution data for 3996 sampled sites. We compare model predictions by testing agreement between observed and simulated distributions for the present day (using the area under the receiver operating characteristic curve (AUC) and kappa statistics) and by assessing consistency in predictions of range size changes under future climate (using cluster analysis). Results Our analyses show significant differences between predictions from different models, with predicted changes in range size by 2030 differing in both magnitude and direction (e.g. from 92% loss to 322% gain). We explain differences with reference to two characteristics of the modelling techniques: data input requirements (presence/absence vs. presence‐only approaches) and assumptions made by each algorithm when extrapolating beyond the range of data used to build the model. The effects of these factors should be carefully considered when using this modelling approach to predict species ranges. Main conclusions We highlight an important source of uncertainty in assessments of the impacts of climate change on biodiversity and emphasize that model predictions should be interpreted in policy‐guiding applications along with a full appreciation of uncertainty.
Article
Aim To design and apply statistical tests for measuring sampling bias in the raw data used to determine priority areas for conservation, and to discuss their impact on conservation analyses for the region. Location Sub-Saharan Africa. Methods An extensive data set comprising 78,083 vouchered locality records for 1068 passerine birds in sub-Saharan Africa has been assembled. Using geographical information systems, we designed and applied two tests to determine if sampling of these taxa was biased. First, we detected possible biases because of accessibility by measuring the proximity of each record to cities, rivers and roads. Second, we quantified the intensity of sampling of each species inside and surrounding proposed conservation priority areas and compared it with sampling intensity in non-priority areas. We applied statistical tests to determine if the distribution of these sampling records deviated significantly from random distributions. Results The analyses show that the location and intensity of collecting have historically been heavily influenced by accessibility. Sampling localities show dense, significant aggregation around city limits, and along rivers and roads. When examining the collecting sites of each individual species, the pattern of sampling has been significantly concentrated within and immediately surrounding areas now designated as conservation priorities. Main conclusions Assessment of patterns of species richness and endemicity at the scale useful for establishing conservation priorities, below the continental level, undoubtedly reflects biases in taxonomic sampling. This is especially problematic for priorities established using the criterion of complementarity because the estimated spatial costs of this approach are highly sensitive to sampling artefacts. Hence such conservation priorities should be interpreted with caution proportional to the bias found.
We argue that conservation priority setting analyses require (1) statistical tests to detect these biases, and (2) data treatment to reflect species distribution rather than patterns of collecting effort.
Article
We describe and develop a close relationship between two problems that have customarily been regarded as distinct: that of maximizing entropy, and that of minimizing worst-case expected loss. Using a formulation grounded in the equilibrium theory of zero-sum games between Decision Maker and Nature, these two problems are shown to be dual to each other, the solution to each providing that to the other. Although Topsøe described this connection for the Shannon entropy over 20 years ago, it does not appear to be widely known even in that important special case. We here
Article
Abstract.—Modeling approaches that relate known occurrences of species to landscape features to discover ecological properties and predict geographic occurrences have seen extensive recent application in ecology, systematics, and conservation. A key component in this process is estimation or characterization of species’ distributions in ecological space, which can then be useful in understanding their potential distributions in geographic space. Hence, this process is often termed ecological niche modeling or (less boldly) species distribution modeling. Applications of this approach vary widely in their aims, products, and requirements; this variety is reviewed herein, examples are provided, and differences in data needs and possible interpretations are discussed.
Article
Climate change over the past approximately 30 years has produced numerous shifts in the distributions and abundances of species and has been implicated in one species-level extinction. Using projections of species' distributions for future climate scenarios, we assess extinction risks for sample regions that cover some 20% of the Earth's terrestrial surface. Exploring three approaches in which the estimated probability of extinction shows a power-law relationship with geographical range size, we predict, on the basis of mid-range climate-warming scenarios for 2050, that 15–37% of species in our sample of regions and taxa will be 'committed to extinction'. When the average of the three methods and two dispersal scenarios is taken, minimal climate-warming scenarios produce lower projections of species committed to extinction (approximately 18%) than mid-range (approximately 24%) and maximum-change (approximately 35%) scenarios. These estimates show the importance of rapid implementation of technologies to decrease greenhouse gas emissions and strategies for carbon sequestration.
Article
Patterns of biological diversity should be interpreted in light of both contemporary and historical influences; however, to date, most attempts to explain diversity patterns have largely ignored history or have been unable to quantify the influence of historical processes. The historical effects on patterns of diversity have been hypothesized to be most important for taxonomic groups with poor dispersal abilities. We quantified the relative stability of rainforests over the late Quaternary period by modeling rainforest expansion and contraction in 21 biogeographic subregions in northeast Australia across four time periods. We demonstrate that historical habitat stability can be as important as, and in endemic low-dispersal taxa even more important than, current habitat area in explaining spatial patterns of species richness. In contrast, patterns of endemic species richness for taxa with high dispersal capacity are best predicted by using current environmental parameters. We also show that contemporary patterns of species turnover across the region are best explained by historical patterns of habitat connectivity. These results clearly demonstrate that spatially explicit analyses of the historical processes of persistence and colonization are both effective and necessary for understanding observed patterns of biodiversity. Keywords: Australian Wet Tropics; β-diversity; dispersal; habitat connectivity; historical habitat stability
Article
This paper studies summary measures of the predictive power of a generalized linear model, paying special attention to a generalization of the multiple correlation coefficient from ordinary linear regression. The population value is the correlation between the response and its conditional expectation given the predictors, and the sample value is the correlation between the observed response and the model predicted value. We compare four estimators of the measure in terms of bias, mean squared error and behaviour in the presence of overparameterization. The sample estimator and a jack-knife estimator usually behave adequately, but a cross-validation estimator has a large negative bias with large mean squared error. One can use bootstrap methods to construct confidence intervals for the population value of the correlation measure and to estimate the degree to which a model selection procedure may provide an overly optimistic measure of the actual predictive power. Copyright © 2000 John Wiley & Sons, Ltd.
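The sample value of the correlation measure described in this abstract (the correlation between the observed response and the model-predicted value) can be sketched as follows. This is a minimal illustration only: the function name `correlation_measure` and the example data are hypothetical and do not come from the paper, and the fitted values are assumed to have been produced by some already-fitted generalized linear model.

```python
import math


def correlation_measure(observed, predicted):
    """Pearson correlation between observed responses and model-predicted values."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_y = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of cross-products and squares about the means
    cov = sum((y - mean_y) * (p - mean_p) for y, p in zip(observed, predicted))
    var_y = sum((y - mean_y) ** 2 for y in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_y == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_y * var_p)


# Hypothetical binary responses and fitted probabilities (illustrative data only)
observed = [0, 0, 1, 1, 0, 1]
predicted = [0.1, 0.3, 0.8, 0.6, 0.4, 0.9]
r = correlation_measure(observed, predicted)
```

Because this estimate reuses the data the model was fitted to, it can be optimistic when the model is overparameterized, which is the motivation for the jack-knife and bootstrap alternatives the abstract mentions.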
Article
Predicting the probability of successful establishment of plant species by matching climatic variables has considerable potential for incorporation in early warning systems for the management of biological invasions. We select South Africa as a model source area of invasions worldwide because it is an important exporter of plant species to other parts of the world because of the huge international demand for indigenous flora from this biodiversity hotspot. We first mapped the five ecoregions that occur both in South Africa and other parts of the world, but the very coarse definition of the ecoregions led to unreliable results in terms of predicting invasible areas. We then determined the bioclimatic features of South Africa's major terrestrial biomes and projected the potential distribution of analogous areas throughout the world. This approach is much more powerful, but depends strongly on how particular biomes are defined in donor countries. Finally, we developed bioclimatic niche models for 96 plant taxa (species and subspecies) endemic to South Africa and invasive elsewhere, and projected these globally after successfully evaluating model projections specifically for three well-known invasive species (Carpobrotus edulis, Senecio glastifolius, Vellereophyton dealbatum) in different target areas. Cumulative probabilities of climatic suitability show that high-risk regions are spatially limited globally but that these closely match hotspots of plant biodiversity. These probabilities are significantly correlated with the number of recorded invasive species from South Africa in natural areas, emphasizing the pivotal role of climate in defining invasion potential. Accounting for potential transfer vectors (trade and tourism) significantly adds to the explanatory power of climate suitability as an index of invasibility. 
The close match that we found between the climatic component of the ecological habitat suitability and the current pattern of occurrence of South African alien species in other parts of the world is encouraging. If species' distribution data in the donor country are available, climatic niche modelling offers a powerful tool for efficient and unbiased first-step screening. Given that eradication of an established invasive species is extremely difficult and expensive, areas identified as potential new sites should be monitored and quarantine measures should be adopted.
Article
Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent "boosting" paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such "TreeBoost" models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire and Friedman, Hastie and Tibshirani are discussed.
Article
Natural-history collections in museums contain data critical to decisions in biodiversity conservation. Collectively, these specimen-based data describe the distributions of known taxa in time and space. As the most comprehensive, reliable source of knowledge for most described species, these records are potentially available to answer a wide range of conservation and research questions. Nevertheless, these data have shortcomings, notably geographic gaps, resulting mainly from the ad hoc nature of collecting effort. This problem has been frequently cited but rarely addressed in a systematic manner. We have developed a methodology to evaluate museum collection data, in particular the reliability of distributional data for narrow-range taxa. We included only those taxa for which there were an appropriate number of records, expert verification of identifications, and acceptable locality accuracy. First, we compared the available data for the taxon of interest to the “background data,” comprised of records for those organisms likely to be captured by the same methods or by the same collectors as the taxon of interest. The “adequacy” of background sampling effort was assessed through calculation of statistics describing the separation, density, and clustering of points, and through generation of a sampling density contour surface. Geographical information systems (GIS) technology was then used to model predicted distributions of species based on abiotic (e.g., climatic and geological) data. The robustness of these predicted distributions can be tested iteratively or by bootstrapping. Together, these methods provide an objective means to assess the likelihood of the distributions obtained from museum collection records representing true distributions. Potentially, they could be used to evaluate any point data to be collated in species maps, biodiversity assessment, or similar applications requiring distributional information.
Article
Logistic regression is an important tool for wildlife habitat-selection studies, but the method frequently has been misapplied due to an inadequate understanding of the logistic model, its interpretation, and the influence of sampling design. To promote better use of this method, we review its application and interpretation under 3 sampling designs: random, case–control, and use–availability. Logistic regression is appropriate for habitat use–nonuse studies employing random sampling and can be used to directly model the conditional probability of use in such cases. Logistic regression also is appropriate for studies employing case–control sampling designs, but careful attention is required to interpret results correctly. Unless bias can be estimated or probability of use is small for all habitats, results of case–control studies should be interpreted as odds ratios, rather than probability of use or relative probability of use. When data are gathered under a use–availability design, logistic regression can be used to estimate approximate odds ratios if probability of use is small, at least on average. More generally, however, logistic regression is inappropriate for modeling habitat selection in use–availability studies. In particular, using logistic regression to fit the exponential model of Manly et al. (2002:100) does not guarantee maximum-likelihood estimates, valid probabilities, or valid likelihoods. We show that the resource selection function (RSF) commonly used for the exponential model is proportional to a logistic discriminant function. Thus, it may be used to rank habitats with respect to probability of use and to identify important habitat characteristics or their surrogates, but it is not guaranteed to be proportional to probability of use. Other problems associated with the exponential model also are discussed.
We describe an alternative model based on Lancaster and Imbens (1996) that offers a method for estimating conditional probability of use in use–availability studies. Although promising, this model fails to converge to a unique solution in some important situations. Further work is needed to obtain a robust method that is broadly applicable to use–availability studies. JOURNAL OF WILDLIFE MANAGEMENT 68(4):774–789 Key words: bias, case–control, contaminated control, exponential model, habitat modeling, log-binomial model, logis-
Article
Current circumstances — that the majority of species distribution records exist as presence-only data (e.g. from museums and herbaria), and that there is an established need for predictions of species distributions — mean that scientists and conservation managers seek to develop robust methods for using these data. Such methods must, in particular, accommodate the difficulties caused by lack of reliable information about sites where species are absent. Here we test two approaches for overcoming these difficulties, analysing a range of data sets using the technique of multivariate adaptive regression splines (MARS). MARS is closely related to regression techniques such as generalized additive models (GAMs) that are commonly and successfully used in modelling species distributions, but has particular advantages in its analytical speed and the ease of transfer of analysis results to other computational environments such as a Geographic Information System. MARS also has the advantage that it can model multiple responses, meaning that it can combine information from a set of species to determine the dominant environmental drivers of variation in species composition. We use data from 226 species from six regions of the world, and demonstrate the use of MARS for distribution modelling using presence-only data. We test whether (1) the type of data used to represent absence or background and (2) the signal from multiple species affect predictive performance, by evaluating predictions at completely independent sites where genuine presence–absence data were recorded. Models developed with absences inferred from the total set of presence-only sites for a biological group, and using simultaneous analysis of multiple species to inform the choice of predictor variables, performed better than models in which species were analysed singly, or in which pseudo-absences were drawn randomly from the study area. 
The methods are fast, relatively simple to understand, and useful for situations where data are limited. A tutorial is included.
Article
Predicting the probability of successful establishment of plant species by matching climatic variables has considerable potential for incorporation in early warning systems for the management of biological invasions. We select South Africa as a model source area of invasions worldwide because it is an important exporter of plant species to other parts of the world because of the huge international demand for indigenous flora from this biodiversity hotspot. We first mapped the five ecoregions that occur both in South Africa and other parts of the world, but the very coarse definition of the ecoregions led to unreliable results in terms of predicting invasible areas. We then determined the bioclimatic features of South Africa's major terrestrial biomes and projected the potential distribution of analogous areas throughout the world. This approach is much more powerful, but depends strongly on how particular biomes are defined in donor countries. Finally, we developed bioclimatic niche models for 96 plant taxa (species and subspecies) endemic to South Africa and invasive elsewhere, and projected these globally after successfully evaluating model projections specifically for three well-known invasive species (Carpobrotus edulis, Senecio glastifolius, Vellereophyton dealbatum) in different target areas. Cumulative probabilities of climatic suitability show that high-risk regions are spatially limited globally but that these closely match hotspots of plant biodiversity. These probabilities are significantly correlated with the number of recorded invasive species from South Africa in natural areas, emphasizing the pivotal role of climate in defining invasion potential. Accounting for potential transfer vectors (trade and tourism) significantly adds to the explanatory power of climate suitability as an index of invasibility.
Article
This paper studies summary measures of the predictive power of a generalized linear model, paying special attention to a generalization of the multiple correlation coefficient from ordinary linear regression. The population value is the correlation between the response and its conditional expectation given the predictors, and the sample value is the correlation between the observed response and the model predicted value. We compare four estimators of the measure in terms of bias, mean squared error and behaviour in the presence of overparameterization. The sample estimator and a jack-knife estimator usually behave adequately, but a cross-validation estimator has a large negative bias with large mean squared error. One can use bootstrap methods to construct confidence intervals for the population value of the correlation measure and to estimate the degree to which a model selection procedure may provide an overly optimistic measure of the actual predictive power. Copyright © 2000 John Wiley & Sons, Ltd.
Article
Aim Numerous geographical information system (GIS)‐based techniques for estimating a species’ potential geographical distribution now exist. While a species’ potential distribution is more extensive than its documented range, the lack of records from some suitable regions may simply derive from inadequate sampling there. Using occurrence records of both the study species and the more inclusive overall target group, I propose a progression of statistical models to evaluate apparent absences in species distributions. Location Northern Venezuela. Methods Employing data from the Smithsonian Venezuelan Project (a large set of standardized mammalian inventories undertaken across Venezuela), I tested distributional hypotheses for the sigmodontine rodent Oryzomys albigularis (Tomes, 1860). Those inventories collected O. albigularis in two of the five major montane regions of northern Venezuela (the Cordillera de Mérida/Macizo de El Tamá and Cordillera de la Costa Central). I used the Genetic Algorithm for Rule‐Set Prediction (GARP) to estimate the species’ potential distribution in northern Venezuela. Then, based on all collection localities from the Smithsonian Venezuelan Project, I determined the probability that the absence of O. albigularis from the three regions of potential presence where it was not documented (the Serranía de Perijá, Lara–Falcón highlands, and Cordillera de la Costa Oriental) could be the result of inadequate sampling. Results and main conclusions All statistical models indicated that the sampling efforts of the Smithsonian Venezuelan Project were insufficient to demonstrate conclusively the absence of O. albigularis from any of the three regions lacking records. Indeed, a subsequent compilation of specimens from ten natural history museums confirmed its presence in the Serranía de Perijá and the Lara–Falcón highlands.
Tests using empirical sampling effort and taking human modification of the landscape into account most closely fulfilled the assumptions required for the tests. By providing a framework for bringing additional quantitative rigour to studies of species distributions, these methods will probably prove of wide applicability to other systems.
Article
Comparison of generative and discriminative classifiers is an ever-lasting topic. As an important contribution to this topic, based on their theoretical and empirical comparisons between the naïve Bayes classifier and linear logistic regression, Ng and Jordan (NIPS 841–848, 2001) claimed that there exist two distinct regimes of performance between the generative and discriminative classifiers with regard to the training-set size. In this paper, our empirical and simulation studies, as a complement of their work, however, suggest that the existence of the two distinct regimes may not be so reliable. In addition, for real world datasets, so far there is no theoretically correct, general criterion for choosing between the discriminative and the generative approaches to classification of an observation x into a class y; the choice depends on the relative confidence we have in the correctness of the specification of either p(y|x) or p(x, y) for the data. This can be to some extent a demonstration of why Efron (J Am Stat Assoc 70(352):892–898, 1975) and O’Neill (J Am Stat Assoc 75(369):154–160, 1980) prefer normal-based linear discriminant analysis (LDA) when no model mis-specification occurs but other empirical studies may prefer linear logistic regression instead. Furthermore, we suggest that pairing of either LDA assuming a common diagonal covariance matrix (LDA-Λ) or the naïve Bayes classifier and linear logistic regression may not be perfect, and hence it may not be reliable for any claim that was derived from the comparison between LDA-Λ or the naïve Bayes classifier and linear logistic regression to be generalised to all generative and discriminative classifiers.
Article
Predicting the distribution of endangered species from habitat data is frequently perceived to be a useful technique. Models that predict the presence or absence of a species are normally judged by the number of prediction errors. These may be of two types: false positives and false negatives. Many of the prediction errors can be traced to ecological processes such as unsaturated habitat and species interactions. Consequently, if prediction errors are not placed in an ecological context the results of the model may be misleading. The simplest, and most widely used, measure of prediction accuracy is the number of correctly classified cases. There are other measures of prediction success that may be more appropriate. Strategies for assessing the causes and costs of these errors are discussed. A range of techniques for measuring error in presence/absence models, including some that are seldom used by ecologists (e.g. ROC plots and cost matrices), are described. A new approach to estimating prediction error, which is based on the spatial characteristics of the errors, is proposed. Thirteen recommendations are made to enable the objective selection of an error assessment technique for ecological presence/absence models.
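As an illustration of the ROC-based measures mentioned in this abstract, the area under the ROC curve (AUC) can be computed as a rank statistic. The following is a minimal sketch, not the cited paper's procedure; the function name `auc` and the plain-list inputs are assumptions made for the example.

```python
def auc(scores_presence, scores_absence):
    """Area under the ROC curve as a rank statistic.

    Returns the fraction of (presence, absence) pairs in which the presence
    site receives the higher model score; ties count as half a win.
    Hypothetical helper written for illustration only.
    """
    if not scores_presence or not scores_absence:
        raise ValueError("need at least one score in each class")
    wins = 0.0
    for p in scores_presence:
        for a in scores_absence:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(scores_presence) * len(scores_absence))


if __name__ == "__main__":
    # Three of the four presence/absence pairs are ranked correctly.
    print(auc([0.9, 0.3], [0.5, 0.1]))
```

A value of 0.5 corresponds to a model that ranks sites no better than chance, and 1.0 to a perfect ranking. Unlike the count of correctly classified cases, this measure does not depend on choosing a single threshold.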
Chapter
Information theory answers two fundamental questions in communication theory: what is the ultimate data compression (answer: the entropy H), and what is the ultimate transmission rate of communication (answer: the channel capacity C). For this reason some consider information theory to be a subset of communication theory. We will argue that it is much more. Indeed, it has fundamental contributions to make in statistical physics (thermodynamics), computer science (Kolmogorov complexity or algorithmic complexity), statistical inference (Occam's Razor: “The simplest explanation is best”) and to probability and statistics (error rates for optimal hypothesis testing and estimation). The relationship of information theory to other fields is discussed. Information theory intersects physics (statistical mechanics), mathematics (probability theory), electrical engineering (communication theory) and computer science (algorithmic complexity). We describe these areas of intersection in detail.
Article
Aim To assess the geographical transferability of niche‐based species distribution models fitted with two modelling techniques. Location Two distinct geographical study areas in Switzerland and Austria, in the subalpine and alpine belts. Methods Generalized linear and generalized additive models (GLM and GAM) with a binomial probability distribution and a logit link were fitted for 54 plant species, based on topoclimatic predictor variables. These models were then evaluated quantitatively and used for spatially explicit predictions within (internal evaluation and prediction) and between (external evaluation and prediction) the two regions. Comparisons of evaluations and spatial predictions between regions and models were conducted in order to test if species and methods meet the criteria of full transferability. By full transferability, we mean that: (1) the internal evaluation of models fitted in region A and B must be similar; (2) a model fitted in region A must at least retain a comparable external evaluation when projected into region B, and vice‐versa; and (3) internal and external spatial predictions have to match within both regions. Results The measures of model fit are, on average, 24% higher for GAMs than for GLMs in both regions. However, the differences between internal and external evaluations (AUC coefficient) are also higher for GAMs than for GLMs (a difference of 30% for models fitted in Switzerland and 54% for models fitted in Austria). Transferability, as measured with the AUC evaluation, fails for 68% of the species in Switzerland and 55% in Austria for GLMs (respectively for 67% and 53% of the species for GAMs). For both GAMs and GLMs, the agreement between internal and external predictions is rather weak on average (Kulczynski's coefficient in the range 0.3–0.4), but varies widely among individual species. 
The dominant pattern is an asymmetrical transferability between the two study regions (a mean decrease of 20% for the AUC coefficient when the models are transferred from Switzerland and 13% when they are transferred from Austria). Main conclusions The large inter‐specific variability observed among the 54 study species underlines the need to consider more than a few species to test properly the transferability of species distribution models. The pronounced asymmetry in transferability between the two study regions may be due to peculiarities of these regions, such as differences in the ranges of environmental predictors or the varied impact of land‐use history, or to species‐specific reasons like differential phenotypic plasticity, existence of ecotypes or varied dependence on biotic interactions that are not properly incorporated into niche‐based models. The lower variation between internal and external evaluation of GLMs compared to GAMs further suggests that overfitting may reduce transferability. Overall, a limited geographical transferability calls for caution when projecting niche‐based models for assessing the fate of species in future environments.
Article
Identification of areas containing high biological diversity (‘hotspots’) from species presence-only data has become increasingly important in species and ecosystem management when presence/absence data is unavailable. However, as presence-only data sets lack any information on absences and as they suffer from many biases associated with the ad hoc or non-stratified sampling, they are often assumed problematic and inadequate for most statistical modeling methods. In this paper, this supposition is investigated by comparing generalized additive models (GAM) fitted with 43 native New Zealand fern species presence/absence data, obtained from a survey of 19 875 forested plots, to GAM models and ecological niche factor analysis (ENFA) models fitted with identical presence data and, in the case of GAM models, computer generated ‘pseudo’ absences. By using the same presence data for all models, absence data is isolated as the varying factor allowing different techniques for generating ‘pseudo’ absences used in the GAM models to be analyzed and compared over three principal levels of investigation. GAM models fitted with an environmentally weighted distribution of ‘pseudo’ absences and ENFA models selected environmental variables more similar to the GAM presence/absence models than did the GAM models fitted with randomly distributed ‘pseudo’ absences. Average contributions for the GAM presence/absence model showed mean annual temperature and mean annual solar radiation as the most important factors followed by lithology. Comparisons of prediction results show GAM models incorporating an environmentally weighted distribution of ‘pseudo’ absences to be more closely correlated to the GAM presence/absence models than either the GAM models fitted with randomly selected ‘pseudo’ absences or the ENFA models. ENFA models were found to be the least correlated to the GAM presence/absence models. 
These latter models were also shown to give the most optimistic predictions overall, however, as ENFA predicts habitat suitability rather than probability of presence this was expected. Summation of species predictions used as a measure of species potential biodiversity ‘hotspots’ also shows ENFA models to give the highest and largest distribution of potential biodiversity. Additionally, GAM models incorporating ‘pseudo’ absences were more highly correlated to the GAM presence/absence model than was ENFA. However, ENFA identified more areas of potential biodiversity ‘hotspots’ similar to the GAM presence/absence model, than either GAM model incorporating ‘pseudo’ absences.
Article
This article describes flexible statistical methods that may be used to identify and characterize nonlinear regression effects. These methods are called "generalized additive models". For example, a commonly used statistical model in medical research is the logistic regression model for binary data. Here we relate the mean of the binary response μ = P(y = 1) to the predictors via a linear regression model and the logit link function: log
Conference Paper
We study the problem of maximum entropy density estimation in the presence of known sample selection bias. We propose three bias correction approaches. The first one takes advantage of unbiased sufficient statistics which can be obtained from biased samples. The second one estimates the biased distribution and then factors the bias out. The third one approximates the second by only using samples from the sampling distribution. We provide guarantees for the first two approaches and evaluate the performance of all three approaches in synthetic experiments and on real data from species habitat modeling, where maxent has been successfully applied and where sample selection bias is a significant problem.
Conference Paper
We consider the problem of estimating an unknown probability distribution from samples using the principle of maximum entropy (maxent). To alleviate overfitting with a very large number of features, we propose applying the maxent principle with relaxed constraints on the expectations of the features. By convex duality, this turns out to be equivalent to finding the Gibbs distribution minimizing a regularized version of the empirical log loss. We prove non-asymptotic bounds showing that, with respect to the true underlying distribution, this relaxed version of maxent produces density estimates that are almost as good as the best possible. These bounds are in terms of the deviation of the feature empirical averages relative to their true expectations, a number that can be bounded using standard uniform-convergence techniques. In particular, this leads to bounds that drop quickly with the number of samples, and that depend very moderately on the number or complexity of the features. We also derive and prove convergence for both sequential-update and parallel-update algorithms. Finally, we briefly describe experiments on data relevant to the modeling of species geographical distributions.
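The equivalence described in this abstract can be made concrete on a small finite domain. The sketch below evaluates a regularized empirical log loss for a Gibbs distribution; it assumes an l1-style penalty with per-feature weights `beta`, which is one form such regularization can take and is not specified in the abstract. All function and variable names are hypothetical, and no optimization algorithm is implemented.

```python
import math


def gibbs(features, lam):
    """Gibbs distribution q(x) proportional to exp(lam . f(x)) on a finite domain.

    features: one feature vector per point of the domain.
    lam: one weight per feature.
    """
    raw = [sum(l * f for l, f in zip(lam, fx)) for fx in features]
    shift = max(raw)  # subtract the maximum for numerical stability
    weights = [math.exp(r - shift) for r in raw]
    z = sum(weights)
    return [w / z for w in weights]


def regularized_log_loss(features, sample_idx, lam, beta):
    """Empirical log loss of the Gibbs distribution plus an assumed l1-style penalty.

    sample_idx: indices of domain points where the species was observed.
    beta: non-negative regularization weight per feature (an assumption).
    """
    q = gibbs(features, lam)
    log_loss = -sum(math.log(q[i]) for i in sample_idx) / len(sample_idx)
    penalty = sum(b * abs(l) for b, l in zip(beta, lam))
    return log_loss + penalty


if __name__ == "__main__":
    domain = [[0.0], [0.5], [1.0], [1.5]]  # one feature, four sites
    print(regularized_log_loss(domain, [3, 3], [1.0], [0.1]))
```

With all weights at zero the Gibbs distribution is uniform and the loss equals the log of the domain size; larger `beta` values penalize nonzero weights more heavily, which is how the relaxation limits overfitting.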
Article
Treatment of the predictive aspect of statistical mechanics as a form of statistical inference is extended to the density-matrix formalism and applied to a discussion of the relation between irreversibility and information loss. A principle of "statistical complementarity" is pointed out, according to which the empirically verifiable probabilities of statistical mechanics necessarily correspond to incomplete predictions. A preliminary discussion is given of the second law of thermodynamics and of a certain class of irreversible processes, in an approximation equivalent to that of the semiclassical theory of radiation. It is shown that a density matrix does not in general contain all the information about a system that is relevant for predicting its behavior. In the case of a system perturbed by random fluctuating fields, the density matrix cannot satisfy any differential equation because dρ/dt does not depend only on ρ(t), but also on past conditions. The rigorous theory involves stochastic equations of the type ρ(t) = G(t, 0)ρ(0), where the operator G is a functional of conditions during the entire interval (0 → t). Therefore a general theory of irreversible processes cannot be based on differential rate equations corresponding to time-proportional transition probabilities. However, such equations often represent useful approximations.
Article
Information theory provides a constructive criterion for setting up probability distributions on the basis of partial knowledge, and leads to a type of statistical inference which is called the maximum-entropy estimate. It is the least biased estimate possible on the given information; i.e., it is maximally noncommittal with regard to missing information. If one considers statistical mechanics as a form of statistical inference rather than as a physical theory, it is found that the usual computational rules, starting with the determination of the partition function, are an immediate consequence of the maximum-entropy principle. In the resulting "subjective statistical mechanics," the usual rules are thus justified independently of any physical argument, and in particular independently of experimental verification; whether or not the results agree with experiment, they still represent the best estimates that could have been made on the basis of the information available.
Book
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap.
Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.
Chapter
This chapter introduces the concept of differential entropy, which is the entropy of a continuous random variable. Differential entropy is also related to the shortest description length, and is similar in many ways to the entropy of a discrete random variable. But there are some important differences, and there is need for some care in using the concept.
Article
A new method is presented for flexible regression modeling of high dimensional data. The model takes the form of an expansion in product spline basis functions, where the number of basis functions as well as the parameters associated with each one (product degree and knot locations) are automatically determined by the data. This procedure is motivated by the recursive partitioning approach to regression and shares its attractive properties. Unlike recursive partitioning, however, this method produces continuous models with continuous derivatives. It has more power and flexibility to model relationships that are nearly additive or involve interactions in at most a few variables. In addition, the model can be represented in a form that separately identifies the additive contributions and those associated with the different multivariable interactions.
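To illustrate what a product spline basis function with a knot location looks like, here is a minimal sketch using piecewise-linear truncated ("hinge") pieces. The piecewise-linear form is an assumption made for the example: the abstract does not give the basis explicitly, and it describes the final model as having continuous derivatives, which these linear pieces lack at the knot. The function names are hypothetical.

```python
def hinge_pair(x, knot):
    """Mirrored pair of piecewise-linear basis functions sharing one knot.

    Returns (max(0, x - knot), max(0, knot - x)).
    """
    return max(0.0, x - knot), max(0.0, knot - x)


def product_basis(point, terms):
    """Product of hinge functions, one factor per (variable, knot, sign) term.

    point: sequence of predictor values.
    terms: list of (variable_index, knot, sign) with sign = +1 or -1;
           the number of terms is the product degree of the basis function.
    """
    value = 1.0
    for var, knot, sign in terms:
        value *= max(0.0, sign * (point[var] - knot))
    return value


if __name__ == "__main__":
    # Degree-2 interaction: active only where x0 > 1.0 and x1 < 2.0.
    print(product_basis([3.0, 0.5], [(0, 1.0, 1), (1, 2.0, -1)]))
```

A fitted model of this kind would be a weighted sum of such basis functions, with the number of functions, their product degree, and their knots chosen from the data as the abstract describes.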
Article
In ecological modeling of the habitat of a species, it can be prohibitively expensive to determine species absence. Presence-only data consist of a sample of locations with observed presences and a separate group of locations sampled from the full landscape, with unknown presences. We propose an expectation-maximization algorithm to estimate the underlying presence-absence logistic model for presence-only data. This algorithm can be used with any off-the-shelf logistic model. For models with stepwise fitting procedures, such as boosted trees, the fitting process can be accelerated by interleaving expectation steps within the procedure. Preliminary analyses based on sampling from presence-absence records of fish in New Zealand rivers illustrate that this new procedure can reduce both deviance and the shrinkage of marginal effect estimates that occur in the naive model often used in practice. Finally, it is shown that the population prevalence of a species is only identifiable when there is some unrealistic constraint on the structure of the logistic model. In practice, it is strongly recommended that an estimate of population prevalence be provided.
Article
This article reviews flexible statistical methods that are useful for characterizing the effect of potential prognostic factors on disease endpoints. Applications to survival models and binary outcome models are illustrated.
Article
Theory predicts low niche differentiation between species over evolutionary time scales, but little empirical evidence is available. Reciprocal geographic predictions based on ecological niche models of sister taxon pairs of birds, mammals, and butterflies in southern Mexico indicate niche conservatism over several million years of independent evolution (between putative sister taxon pairs) but little conservatism at the level of families. Niche conservatism over such time scales indicates that speciation takes place in geographic, not ecological, dimensions and that ecological differences evolve later.