Steven J. Phillips’s research while affiliated with AMP and other places


Publications (18)


Figure 2: Examples of plots that can be achieved with functions supplied in the R package vignette. Top left: maps of species data; top right: an interactive map with PA site locations; bottom left: density plot showing the distribution of PO data along one environmental gradient, compared with that of random points from the region; bottom right: pairwise correlations between variables.
Presence-only and Presence-absence Data for Comparing Species Distribution Modeling Methods
  • Article

July 2020 · 6,783 Reads · 73 Citations

Biodiversity Informatics


Species distribution models (SDMs) are widely used to predict and study distributions of species. Many different modeling methods and associated algorithms are used and continue to emerge. It is important to understand how different approaches perform, particularly when applied to species occurrence records that were not gathered in structured surveys (e.g. opportunistic records). This need motivated a large-scale, collaborative effort, published in 2006, that aimed to create objective comparisons of algorithm performance. As a benchmark, and to facilitate future comparisons of approaches, here we publish that dataset: point location records for 226 anonymized species from six regions of the world, with accompanying predictor variables in raster (grid) and point formats. A particularly interesting characteristic of this dataset is that independent presence-absence survey data are available for evaluation alongside the presence-only species occurrence data intended for modeling. The dataset is available on Open Science Framework and as an R package and can be used as a benchmark for modeling approaches and for testing new ways to evaluate the accuracy of SDMs.


Point process models for presence‐only analysis

February 2015 · 1,922 Reads · 412 Citations

Presence‐only data are widely used for species distribution modelling, and point process regression models are a flexible tool that has considerable potential for this problem, when data arise as point events. In this paper, we review point process models, some of their advantages and some common methods of fitting them to presence‐only data. Advantages include (and are not limited to) clarification of what the response variable is that is modelled; a framework for choosing the number and location of quadrature points (commonly referred to as pseudo‐absences or ‘background points’) objectively; clarity of model assumptions and tools for checking them; models to handle spatial dependence between points when it is present; and ways forward regarding difficult issues such as accounting for sampling bias. Point process models are related to some common approaches to presence‐only species distribution modelling, which means that a variety of different software tools can be used to fit these models, including maxent or generalised linear modelling software.
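As a rough illustration of the quadrature idea mentioned in this abstract, the sketch below fits a log-linear Poisson point process intensity, lambda(x) = exp(b0 + b1*x), on a one-dimensional unit interval by maximizing the approximate log-likelihood sum_i log lambda(s_i) - sum_j w_j lambda(q_j). The function names, the midpoint quadrature rule, the Newton solver and the toy presence locations are all assumptions made for illustration; none of them come from the paper.

```python
import math

def fit_ipp_1d(points, n_quad=200, n_iter=25):
    """Fit lambda(x) = exp(b0 + b1*x) on [0, 1] to presence locations.

    Hypothetical helper: the integral of lambda over the region is
    approximated with equally weighted midpoint quadrature points.
    """
    quad = [(j + 0.5) / n_quad for j in range(n_quad)]
    w = 1.0 / n_quad                      # each quadrature weight; weights sum to 1
    n = len(points)
    sum_x = sum(points)
    b0, b1 = math.log(n), 0.0             # start from a homogeneous process
    for _ in range(n_iter):
        lam = [math.exp(b0 + b1 * q) for q in quad]
        s0 = sum(w * l for l in lam)
        s1 = sum(w * q * l for q, l in zip(quad, lam))
        s2 = sum(w * q * q * l for q, l in zip(quad, lam))
        g0, g1 = n - s0, sum_x - s1       # gradient of the log-likelihood
        det = s0 * s2 - s1 * s1           # determinant of the negative Hessian
        b0 += (s2 * g0 - s1 * g1) / det   # Newton step
        b1 += (s0 * g1 - s1 * g0) / det
    return b0, b1

def integrated_intensity(b0, b1, n_quad=200):
    """Quadrature estimate of the expected number of points in [0, 1]."""
    return sum(math.exp(b0 + b1 * (j + 0.5) / n_quad) for j in range(n_quad)) / n_quad

# Toy presence locations along a single environmental gradient (made up).
presences = [0.1, 0.4, 0.5, 0.7, 0.8, 0.9, 0.95]
b0, b1 = fit_ipp_1d(presences)
print(b0, b1, integrated_intensity(b0, b1))
```

At the fitted parameters the integrated intensity matches the number of presence records, which is one simple check that the quadrature-based fit has converged.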


Phillips SJ, Anderson RP, Schapire RE. Maximum entropy modeling of species geographic distributions. Ecol Model 190: 231-259

April 2013 · 8,164 Reads · 8,595 Citations

Ecological Modelling

The availability of detailed environmental data, together with inexpensive and powerful computers, has fueled a rapid increase in predictive modeling of species environmental requirements and geographic distributions. For some species, detailed presence/absence occurrence data are available, allowing the use of a variety of standard statistical techniques. However, absence data are not available for most species. In this paper, we introduce the use of the maximum entropy method (Maxent) for modeling species geographic distributions with presence-only data. Maxent is a general-purpose machine learning method with a simple and precise mathematical formulation, and it has a number of aspects that make it well-suited for species distribution modeling. In order to investigate the efficacy of the method, here we perform a continental-scale case study using two Neotropical mammals: a lowland species of sloth, Bradypus variegatus, and a small montane murid rodent, Microryzomys minutus. We compared Maxent predictions with those of a commonly used presence-only modeling method, the Genetic Algorithm for Rule-Set Prediction (GARP). We made predictions on 10 random subsets of the occurrence records for both species, and then used the remaining localities for testing. Both algorithms provided reasonable estimates of the species’ range, far superior to the shaded outline maps available in field guides. All models were significantly better than random in both binomial tests of omission and receiver operating characteristic (ROC) analyses. The area under the ROC curve (AUC) was almost always higher for Maxent, indicating better discrimination of suitable versus unsuitable areas for the species. The Maxent modeling approach can be used in its present form for many applications with presence-only datasets, and merits further research and development.
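The AUC comparison described in this abstract can be illustrated with a small rank-based calculation. The sketch below is a generic implementation of the area under the ROC curve as the probability that a randomly chosen test presence site receives a higher model score than a randomly chosen comparison site; the function name and the example scores are assumptions for illustration and are not drawn from the study.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: fraction of (presence, background) pairs in which
    the presence site scores higher, counting ties as one half."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

# Made-up model scores at held-out presence sites and at comparison sites.
print(auc([0.9, 0.4], [0.5, 0.1]))  # 0.75
```

A value of 0.5 corresponds to a model that ranks sites no better than random, and 1.0 to perfect separation of the two sets.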



Logistic Methods for Resource Selection Functions and Presence-Only Species Distribution Models

August 2011 · 28 Reads · 12 Citations

Proceedings of the AAAI Conference on Artificial Intelligence

In order to better protect and conserve biodiversity, ecologists use machine learning and statistics to understand how species respond to their environment and to predict how they will respond to future climate change, habitat loss and other threats. A fundamental modeling task is to estimate the probability that a given species is present in (or uses) a site, conditional on environmental variables such as precipitation and temperature. For a limited number of species, survey data consisting of both presence and absence records are available, and can be used to fit a variety of conventional classification and regression models. For most species, however, the available data consist only of occurrence records, that is, locations where the species has been observed. In two closely related but separate bodies of ecological literature, diverse special-purpose models have been developed that contrast occurrence data with a random sample of available environmental conditions. The most widespread statistical approaches involve either fitting an exponential model of species' conditional probability of presence, or fitting a naive logistic model in which the random sample of available conditions is treated as absence data; both approaches have well-known drawbacks, and do not necessarily produce valid probabilities. After summarizing existing methods, we overcome their drawbacks by introducing a new scaled binomial loss function for estimating an underlying logistic model of species presence/absence. Like the Expectation-Maximization approach of Ward et al. and the method of Steinberg and Cardell, our approach requires an estimate of population prevalence, Pr(y=1), since prevalence is not identifiable from occurrence data alone. In contrast to the latter two methods, our loss function is straightforward to integrate into a variety of existing modeling frameworks such as generalized linear and additive models and boosted regression trees. We also demonstrate that approaches by Lele and Keim and by Lancaster and Imbens that surmount the identifiability issue by making parametric data assumptions do not typically produce valid probability estimates.
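To make the role of prevalence concrete, the sketch below uses a standard case-control argument rather than the paper's scaled binomial loss, which the abstract does not spell out. It assumes, as an idealization, that presence records are a random sample of occupied sites and that background points are a random sample of all sites. Under those assumptions the odds from the naive presence-versus-background model equal n_presence * Pr(y=1|x) / (n_background * prevalence), so the relation can be inverted once a prevalence value is supplied. The function name and the numbers are hypothetical.

```python
def presence_probability_from_naive(naive_prob, n_presence, n_background, prevalence):
    """Convert a naive presence-vs-background probability into an implied
    probability of presence, given an externally supplied prevalence.

    Assumes idealized sampling (see the text above); this is an
    illustrative identity, not the method proposed in the paper.
    """
    naive_odds = naive_prob / (1.0 - naive_prob)
    return prevalence * (n_background / n_presence) * naive_odds

# Equal sample sizes, assumed prevalence of 0.3.
print(presence_probability_from_naive(0.5, 100, 100, 0.3))
# A fitted naive model is not constrained to keep the implied value below 1.
print(presence_probability_from_naive(0.9, 100, 100, 0.3))
```

The second call returns a value greater than 1, which is one way to see why the naive model does not necessarily yield valid probabilities and why the answer depends on a prevalence estimate that the occurrence data alone cannot provide.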



The art of modeling range-shifting species

December 2010 · 5,403 Reads · 2,423 Citations

1. Species are shifting their ranges at an unprecedented rate through human transportation and environmental change. Correlative species distribution models (SDMs) are frequently applied for predicting potential future distributions of range-shifting species, despite these models’ assumptions that species are at equilibrium with the environments used to train (fit) the models, and that the training data are representative of conditions to which the models are predicted. Here we explore modelling approaches that aim to minimize extrapolation errors and assess predictions against prior biological knowledge. Our aim was to promote methods appropriate to range-shifting species.


Modeling of species distributions with MAXENT: new extensions and a comprehensive evaluation

April 2008 · 5,159 Reads · 7,124 Citations

Accurate modeling of geographic distributions of species is crucial to various applications in ecology and conservation. The best performing techniques often require some parameter tuning, which may be prohibitively time-consuming to do separately for each species, or unreliable for small or biased datasets. Additionally, even with the abundance of good quality data, users interested in the application of species models need not have the statistical knowledge required for detailed tuning. In such cases, it is desirable to use “default settings”, tuned and validated on diverse datasets. Maxent is a recently introduced modeling technique, achieving high predictive accuracy and enjoying several additional attractive properties. The performance of Maxent is influenced by a moderate number of parameters. The first contribution of this paper is the empirical tuning of these parameters. Since many datasets lack information about species absence, we present a tuning method that uses presence-only data. We evaluate our method on independently collected high-quality presence-absence data. In addition to tuning, we introduce several concepts that improve the predictive accuracy and running time of Maxent. We introduce “hinge features” that model more complex relationships in the training data; we describe a new logistic output format that gives an estimate of probability of presence; finally we explore “background sampling” strategies that cope with sample selection bias and decrease model-building time. Our evaluation, based on a diverse dataset of 226 species from 6 regions, shows: 1) default settings tuned on presence-only data achieve performance which is almost as good as if they had been tuned on the evaluation data itself; 2) hinge features substantially improve model
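The hinge features mentioned in this abstract can be pictured as piecewise-linear transforms of a single environmental variable. The sketch below shows one common form, a forward hinge that is zero up to a knot and then rises linearly to 1 at the variable's maximum. The function name, the scaling and the example values are assumptions for illustration and may differ from the exact definition used in the Maxent software.

```python
def forward_hinge(x, knot, x_max):
    """Forward hinge feature: 0 up to the knot, then linear up to 1 at x_max.

    Illustrative form only; assumes knot < x_max.
    """
    if x <= knot:
        return 0.0
    return (x - knot) / (x_max - knot)

# A hypothetical temperature variable with a knot at 10 and a maximum of 20.
for temperature in (5.0, 10.0, 15.0, 20.0):
    print(temperature, forward_hinge(temperature, knot=10.0, x_max=20.0))
```

A weighted sum of several such features with different knots gives a piecewise-linear response curve, which is how a model can capture more complex relationships than a single linear term.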



The influence of spatial errors in species occurrence data used in distribution models

February 2008 · 407 Reads · 444 Citations

Species distribution modelling is used increasingly in both applied and theoretical research to predict how species are distributed and to understand attributes of species’ environmental requirements. In species distribution modelling, various statistical methods are used that combine species occurrence data with environmental spatial data layers to predict the suitability of any site for that species. While the number of data sharing initiatives involving species’ occurrences in the scientific community has increased dramatically over the past few years, various data quality and methodological concerns related to using these data for species distribution modelling have not been addressed adequately. We evaluated how uncertainty in georeferences and associated locational error in occurrences influence species distribution modelling using two treatments: (1) a control treatment where models were calibrated with original, accurate data and (2) an error treatment where data were first degraded spatially to simulate locational error. To incorporate error into the coordinates, we shifted each coordinate by a random amount drawn from a normal distribution with a mean of zero and a standard deviation of 5 km. We evaluated the influence of error on the performance of 10 commonly used distributional modelling techniques applied to 40 species in four distinct geographical regions. Locational error in occurrences reduced model performance in three of these regions; relatively accurate predictions of species distributions were possible for most species, even with degraded occurrences. Two species distribution modelling techniques, boosted regression trees and maximum entropy, were the best performing models in the face of locational errors. The results obtained with boosted regression trees were only slightly degraded by errors in location, and the results obtained with the maximum entropy approach were not affected by such errors. Synthesis and applications. To use the vast array of occurrence data that exists currently for research and management relating to the geographical ranges of species, modellers need to know the influence of locational error on model quality and whether some modelling techniques are particularly robust to error. We show that certain modelling techniques are particularly robust to a moderate level of locational error and that useful predictions of species distributions can be made even when occurrence data include some error.
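The error treatment described in this abstract, in which each coordinate is perturbed by zero-mean normal noise with a standard deviation of 5 km, can be sketched as follows. The function name, the use of projected coordinates in metres, the fixed random seed and the example points are assumptions made for illustration.

```python
import random

def degrade_locations(points_m, sd_m=5000.0, seed=0):
    """Add independent Normal(0, sd_m) noise to each x and y coordinate.

    points_m: list of (x, y) tuples in a projected coordinate system (metres).
    """
    rng = random.Random(seed)  # seeded so the degraded set is reproducible
    return [(x + rng.gauss(0.0, sd_m), y + rng.gauss(0.0, sd_m))
            for x, y in points_m]

# Made-up occurrence records in projected metres.
original = [(500000.0, 4200000.0), (512500.0, 4187300.0)]
degraded = degrade_locations(original)
for before, after in zip(original, degraded):
    print(before, "->", after)
```

Models calibrated on the original and on the degraded records can then be compared on the same evaluation data to gauge sensitivity to locational error.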


Citations (18)


... Species distribution modeling (SDM) involves many approaches, that have been used and proposed in literature to fit presence-only data in ecology. Most of the considered methods involve point processes models approach (Pearce and Boyce 2006; Elith and Leathwick 2009; Warton and Shepherd 2010; Chakraborty et al. 2011; McDonald 2013; Warton et al. 2013; Wiegand and Moloney 2013; Merow et al. 2016; Yuan et al. 2017; Soriano-Redondo et al. 2019; Isaac et al. 2020; Martino et al. 2021) as well as a variety of other methods involving simple logistic regression (Phillips and Elith 2011), penalized regression and/or regularization (Song and Raskutti 2019; Guilbault et al. 2023), Bayesian inference (Di Lorenzo et al. 2011; Golini 2012), Machine-learning and Classification and Regression Trees (CART) (Ward 2007; Zheng and Raskutti 2023). ...

Reference:

Larval fish abundance classification and modeling through spatio-temporal point processes approach
Logistic Methods for Resource Selection Functions and Presence-Only Species Distribution Models
  • Citing Article
  • August 2011

Proceedings of the AAAI Conference on Artificial Intelligence

... MaxEnt relies on presence data (locations where the species has been observed) and contrasts them with background data sampled from across the study area to estimate the relative environmental suitability of different locations. In contrast, many machine learning models require both presence and absence data for more comprehensive modeling (Elith et al., 2020; Lotfian et al., 2022). This study employs the Random Forest algorithm to generate distribution models for three selected invasive plant species. ...

Presence-only and Presence-absence Data for Comparing Species Distribution Modeling Methods

Biodiversity Informatics

... We randomly generated locations within each site's respective availability polygon. The ratio of used to available locations can heavily bias point-process models such as RSFs [73,74]. To determine the appropriate number of available locations in each study site, we first gathered resource layers representative of distance to each land cover type for each crop-growing season for each study site (winter and summer; detailed above). ...

Point process models for presence‐only analysis
  • Citing Article
  • February 2015

... Effective conservation of large, wide ranging predators demands large protected areas which safeguard the preservation of other species as well (Di Minin et al., 2016). However, many large areas of suitable habitat may not support functional populations due to anthropogenic pressures, historical factors, or isolation from other suitable habitat patches (Phillips, 2012). Consequently, population level data are essential for implementing effective conservation and management plans. ...

Inferring prevalence from presence-only data: A response to 'Can we model the probability of presence of species without absence data?'
  • Citing Article
  • May 2012

... Several factors, including historical geological or climatic events, prolonged droughts, or significant physical barriers such as tall mountains, may prevent a species from establishing itself in a particular area. While early studies on this subject are limited, the recognition that historical factors influence elephant distribution is an emerging area of research (Wisz et al., 2008). Recent studies have begun integrating historical influences into static models to better understand their effects on wildlife dispersal. ...

Effects of sample size on the performance of species distribution models

Diversity and Distributions

... The Nigerian shapefile was obtained from the World Bank Data Catalog (an open-license standardized resource of boundaries (i.e., state, county) for every country in the world). ... species distributions [3, 21, 22, 23]. The nineteen bioclimatic variables with a 30 s spatial resolution (about 1 km) were downloaded from the WorldClim database (http://www.worldclim.org/) ...

Sensitivity of predictive species distribution models to change in grain size
  • Citing Article
  • May 2007

... Studies have shown that species misidentification (Phillips et al., 2009), sampling bias (Leitão et al., 2011; Beck et al., 2014), temporal or spatial bias (Graham et al., 2008; Meyer et al., 2015) and sampling strategy (Hirzel and Guisan, 2002) in species occurrence data can lead to uncertainty in species distribution models. In addition, the quality of environmental data (Graham et al., 2008), model assumptions (Chen et al., 2019) and biological interactions (Wisz et al., 2013) also contribute to uncertainty in these models. ...

The influence of spatial errors in species occurrence data used in distribution models
  • Citing Article
  • February 2008

... Thus, the default number of background samples (10,000 points) was used to characterize the environmental conditions in the area of interest [69]. Moreover, as the number of records included in P1 differs from that in P2, and thus the sampling effort in the two periods differed, a bias file for each period was created [70]. Bias files were created based on all the orchid records in each period using the geographic distance method. ...

Transferability, sample selection bias and background data in presence-only modelling: A response to Peterson et al. (2007)
  • Citing Article
  • March 2008

... Maximum entropy procedures were adapted from García-Cancel and Cox (2023) to develop the SDMs using the software program MaxEnt v.3.4.0, December 2016 (Phillips 2017), which uses presence-only data for the target organism (Phillips and Dudík 2008; Stohlgren et al. 2010). ...

Modeling of species distributions with MAXENT: new extensions and a comprehensive evaluation
  • Citing Article
  • April 2008

... Still, the non-equilibrium of species distribution in the invaded range was often overlooked, possibly leading to overfitting (i.e., the excessive dependence of model parameters on environmental conditions found in the calibration area). The issue of overfitting is particularly relevant for invasives because the aim is usually to predict suitable areas outside the known distribution of the target species (Elith et al. 2010; Mainali et al. 2015; Fourcade 2021). This means that it is crucial to find a balance between model flexibility, which is needed to capture complex species-environment relationships, and overfitting, which should be minimized to avoid modeling the spatial structure of the training points, allowing the production of meaningful predictions outside the geographic space of model training (Moreno-Amat et al. 2015; Valavi et al. 2023). ...

The art of modeling range-shifting species