Conference Paper

EXPLICITLY ACCOUNTING FOR UNCERTAINTY IN CROWDSOURCED DATA FOR SPECIES DISTRIBUTION MODELLING

Authors:

Abstract

Species distribution models are an important approach for mapping the spread of plant and animal species over space (and time). Like all statistical modelling techniques that rely on field data, they are prone to uncertainty. In this study we explicitly deal with uncertainty deriving from field data sampling; in particular, we propose i) methods to map sampling effort bias and ii) methods to map semantic bias.


... MaxEnt (Phillips et al., 2006), the current state-of-the-art machine-learning algorithm traditionally used in distribution modelling (Peterson et al., 2007; Warren and Seifert, 2011), represents an appealing alternative to common "soft" classifiers because it can be trained with presence-only data, treating land cover classes the same way as a single species or habitat (Mack et al., 2016). The use of presence-only models represents a fascinating challenge, since they allow uncertainty to be mapped in the form of suitability maps instead of binary presence/absence (Rocchini et al., 2015), overcoming the problems derived from a crisp view of landscapes and taking into account the complexity of land cover classes. ...
Article
Land-cover change, a major driver of the distribution and functioning of ecosystems, is characterized by a high diversity of patterns of change across space and time. Thus, a large amount of information is necessary to analyse change and develop plans for proper management of natural resources. In this work we tested the MaxEnt algorithm in a completely remote land-cover classification and change analysis. To provide an empirical example, we selected the Italian southern Alps as a test region. We classified two Landsat images (1976 and 2001) in order to forecast the probability of occurrence for unsampled locations and to determine the best subset of predictors (spectral bands). For each land cover class, we produced a difference map representing the difference between the 1976 and 2001 probability of occurrence values. To better address the analysis of change patterns, we combined the difference maps with topographic variables since, in the study area, these are considered the main environmental characteristics driving land-use change, in connection with climate change. Our results indicate that the selected algorithm, applied to land cover classes, can provide reliable data, especially for classes with homogeneous texture properties and surface reflectance. The models had satisfactory predictive performance, showing relatively clear patterns of difference between the two time steps considered. Developing a methodology that, in the absence of field data, allows data on land use change dynamics to be obtained is of extreme importance for land planning and management.
Article
Binary similarity indices are widely used in ecology, e.g. for detecting associations between species’ occurrence patterns, comparing regional and temporal species assemblages, and assessing beta diversity patterns, including spatial and temporal species loss and turnover. Such indices have widespread applications in biogeography, global change biology, and biodiversity conservation. Similarity indices are commonly calculated upon binary presence/absence (or sometimes modelled suitable/unsuitable) data, which are generally incomplete and more categorical than their underlying natural patterns. Probable false absences are disregarded, amplifying the effects of data deficiencies and the scale-dependence of the results. Fuzzy occurrence data, with a degree of uncertainty attributed to localities where presence or absence cannot be safely assigned, could better reflect species distributions, compensating for incomplete knowledge and methodological errors. Similarity indices would therefore also benefit from accommodating such fuzzy data directly. This paper proposes fuzzy versions of the binary similarity indices most commonly used in ecology, so that they can be directly applied to continuous (fuzzy) rather than binary occurrence values, thus producing more realistic similarity assessments. Fuzzy occurrence can be obtained with several methods, some of which are also provided. The procedure is robust to data source disparities, gaps or other errors in species occurrence records, even for restricted species for which slight inaccuracies can affect substantial parts of their range. The method is implemented in a free and open-source software package, fuzzySim, which is available for the R statistical software and under implementation for the QGIS geographic information system. It is provided with sample data and an illustrated tutorial suitable for non-experienced users.
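To make the idea of a fuzzy similarity index concrete, the following minimal Python sketch generalises Jaccard's index to occurrence values in [0, 1]. It assumes the usual fuzzy-set operators (minimum for intersection, maximum for union); the function name `fuzzy_jaccard` is hypothetical, and the sketch is not claimed to reproduce the exact formulation used in the fuzzySim package.

```python
import math
from typing import Sequence


def fuzzy_jaccard(a: Sequence[float], b: Sequence[float]) -> float:
    """Jaccard similarity for fuzzy occurrence values in [0, 1].

    Assumption: intersection is the site-wise minimum and union the
    site-wise maximum, as in standard fuzzy set theory.
    """
    if len(a) != len(b):
        raise ValueError("both species must be scored at the same sites")
    intersection = sum(min(x, y) for x, y in zip(a, b))
    union = sum(max(x, y) for x, y in zip(a, b))
    # Undefined when neither species has any occurrence value above zero.
    return intersection / union if union > 0 else math.nan


# With binary data the result equals the classic Jaccard index (1 shared / 3 occupied).
print(fuzzy_jaccard([1, 1, 0, 0], [1, 0, 1, 0]))
# With fuzzy data, partial memberships contribute proportionally.
print(fuzzy_jaccard([0.9, 0.2], [0.6, 0.4]))
```

Because minimum and maximum reduce to logical AND and OR on 0/1 values, the same function can be applied to both binary and fuzzy occurrence data.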
Article
The confusion matrix is the standard way for reporting the accuracy of land cover and other information classified from remote-sensing imagery. This letter describes a geographically weighted method for generating spatially distributed measures of accuracy (overall, user and producer accuracies) from a logistic geographically weighted regression. A kernel-based approach defines the data and weights that are used to calculate the accuracies at each location in the study area. The results compare the global accuracy measures from a standard confusion matrix with those that have been allowed to vary locally. Maps of spatially varying user and producer accuracies describe the spatial autocorrelation of error. The use of geographically weighted models in the context of land cover accuracy is discussed and suggested as a generic approach for examining how and where error processes vary.
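As a rough illustration of a spatially varying accuracy measure, the sketch below computes a kernel-weighted overall accuracy at a target location. It is a deliberate simplification: it uses a weighted proportion of correctly classified validation points, not the logistic geographically weighted regression described in the letter, and the Gaussian kernel, the `bandwidth` parameter and the function name `gw_overall_accuracy` are assumptions made for illustration.

```python
import math
from typing import Iterable, Tuple


def gw_overall_accuracy(
    points: Iterable[Tuple[float, float, int]],
    target: Tuple[float, float],
    bandwidth: float,
) -> float:
    """Kernel-weighted share of correctly classified validation points.

    points: (x, y, correct) triples, where correct is 1 if the mapped class
            matches the reference class at that point and 0 otherwise.
    target: (x, y) location at which the local accuracy is estimated.
    bandwidth: Gaussian kernel bandwidth, in the same units as x and y
               (an assumed kernel choice).
    """
    weighted_correct = 0.0
    weight_sum = 0.0
    for x, y, correct in points:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        w = math.exp(-0.5 * d2 / bandwidth ** 2)  # nearer points weigh more
        weighted_correct += w * correct
        weight_sum += w
    # No usable points within reach of the kernel: accuracy is undefined.
    return weighted_correct / weight_sum if weight_sum > 0 else math.nan


validation = [(0.0, 0.0, 1), (10.0, 0.0, 0)]
print(gw_overall_accuracy(validation, (0.0, 0.0), 1.0))  # dominated by the correct point
print(gw_overall_accuracy(validation, (5.0, 0.0), 1.0))  # both points weigh equally
```

Evaluating the function over a grid of target locations would give a map of local accuracy, in contrast to the single global figure from a standard confusion matrix.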
Article
The error matrix is the most common way of expressing the accuracy of remote sensing image classifications, such as land cover. However, it and the measures that can be calculated from it have been criticised for not providing any indication of the spatial distribution of errors. Other research has identified the need for methods to analyse the spatial non-stationarity of error and to visualise the spatial variation in classification uncertainty. This research uses geographically weighted approaches to model the spatial variations in the accuracy of both (crisp) Boolean and (soft) fuzzy land cover classes. Remotely sensed data were classified using a maximum likelihood classifier and a fuzzy classifier to predict Boolean and fuzzy land cover classes respectively. Field data were collected at sub-pixel locations and used to generate soft and crisp validation data. A Geographically Weighted Regression was used to analyse spatial variations in the relationships between observations of Boolean land cover in the field and land cover classified from remote sensing imagery. A geographically weighted difference measure was used to analyse spatial variations in fuzzy land cover accuracy. Maps of the spatial distribution of accuracy were created for fuzzy and Boolean classes. This research demonstrates that data collected as part of a standard remote sensing validation exercise can be used to estimate mapped, spatial distributions of accuracy that would augment standard accuracy measures reported in the error matrix. It suggests that geographically weighted approaches, and the spatially explicit representations of accuracy they support, offer the opportunity to report land cover accuracy in a more informative way.
Article
The coherence between different aspects in the environmental system leads to a demand for comprehensive models of this system to explore the effects of different management alternatives. Fuzzy logic has been suggested as a means to extend the application domain of environmental modelling from physical relations to expert knowledge. In such applications the expert describes the system in terms of fuzzy variables and inference rules. The result of the fuzzy reasoning process is a numerical output value. In such a model, as in any other, the model context, structure, technical aspects, parameters and inputs may contribute uncertainties to the model output. Analysis of these contributions in a simplified model for agriculture suitability shows how important information about the accuracy of the expert knowledge in relation to the other uncertainties can be provided. A method for the extensive assessment of uncertainties in compositional fuzzy rule-based models is proposed, combining the evaluation of model structure, input and parameter uncertainties. In an example model, each of these three appear to have the potential to dominate aggregated uncertainty, supporting the relevance of an ample uncertainty approach.
Article
Species-area relationships (SARs) are fundamental to the study of key and high-profile issues in conservation biology and are particularly widely used in establishing the broad patterns of biodiversity that underpin approaches to determining priority areas for biological conservation. Classically, the SAR has been argued in general to conform to a power-law relationship, and this form has been widely assumed in most applications in the field of conservation biology. Here, using nonlinear regressions within an information theoretical model selection framework, we included uncertainty regarding both model selection and parameter estimation in SAR modeling and conducted a global-scale analysis of the form of SARs for vascular plants and major vertebrate groups across 792 terrestrial ecoregions representing almost 97% of Earth's inhabited land. The results revealed a high level of uncertainty in model selection across biomes and taxa, and that the power-law model is clearly the most appropriate in only a minority of cases. Incorporating this uncertainty into a hotspots analysis using multimodel SARs led to the identification of a dramatically different set of global richness hotspots than when the power-law SAR was assumed. Our findings suggest that the results of analyses that assume a power-law model may be at severe odds with real ecological patterns, raising significant concerns for conservation priority-setting schemes and biogeographical studies. Keywords: conservation biology, ecoregions, model selection, vascular plants, vertebrates.
Article
Map makers have for many years searched for a way to construct cartograms, maps in which the sizes of geographic regions such as countries or provinces appear in proportion to their population or some other analogous property. Such maps are invaluable for the representation of census results, election returns, disease incidence, and many other kinds of human data. Unfortunately, to scale regions and still have them fit together, one is normally forced to distort the regions' shapes, potentially resulting in maps that are difficult to read. Many methods for making cartograms have been proposed, some of them extremely complex, but all suffer either from this lack of readability or from other pathologies, like overlapping regions or strong dependence on the choice of coordinate axes. Here, we present a technique based on ideas borrowed from elementary physics that suffers none of these drawbacks. Our method is conceptually simple and produces useful, elegant, and easily readable maps. We illustrate the method with applications to the results of the 2000 U.S. presidential election, lung cancer cases in the State of New York, and the geographical distribution of stories appearing in the news.
Article
There is much interest in using volunteered geographic information (VGI) in formal scientific analyses. This analysis uses VGI describing land cover that was captured using a web-based interface linked to Google Earth. A number of control points, for which the land cover had been determined by experts, allowed measures of the reliability of each volunteer in relation to each land cover class to be calculated. Geographically weighted kernels were used to estimate surfaces of volunteered land cover information accuracy and then to develop spatially distributed correspondences between the volunteer land cover class and land cover from three contemporary global datasets (GLC-2000, GlobCover and MODIS v.5). Specifically, a geographically weighted approach calculated local confusion matrices (correspondences) at each location in a central African study area and generated spatial distributions of user's, producer's, portmanteau, and partial portmanteau accuracies. These were used to evaluate the global datasets and to infer which of them was ‘best’ at describing Tree cover at each location in the study area. The resulting maps show where specific global datasets are recommended for analyses requiring Tree cover information. The methods presented in this research suggest that some of the concerns about the quality of VGI can be addressed through careful data collection, the use of control points to evaluate volunteer performance and spatially explicit analyses. A research agenda for the use and analysis of VGI about land cover is outlined.
Article
Positive regional correlations between biodiversity and human population have been detected for several taxonomic groups and geographical regions. Such correlations could have important conservation implications and have been mainly attributed to ecological factors, with little testing for an artefactual explanation: more populated regions may show higher biodiversity because they are more thoroughly surveyed. We tested the hypothesis that the correlation between people and herptile diversity in Europe is influenced by survey effort.
Article
The use of fuzzy sets in map accuracy assessment expands the amount of information that can be provided regarding the nature, frequency, magnitude, and source of errors in a thematic map. The need for using fuzzy sets arises from the observation that not all map locations fit unambiguously in a single map category. Fuzzy sets allow for varying levels of set membership for multiple map categories. A linguistic measurement scale allows the kinds of comments commonly made during map evaluations to be used to quantify map accuracy. Four tables result from the use of fuzzy functions, and when taken together they provide more information than traditional confusion matrices. The use of a hypothetical dataset helps illustrate the benefits of the new methods. It is hoped that the enhanced ability to evaluate maps resulting from the use of fuzzy sets will improve our understanding of uncertainty in maps and facilitate improved error modeling.
Article
Aim: To explore the impacts of imperfect reference data on the accuracy of species distribution model predictions. The main focus is on impacts of the quality of reference data (labelling accuracy) and, to a lesser degree, data quantity (sample size) on species presence–absence modelling. Innovation: The paper challenges the common assumption that some popular measures of model accuracy and model predictions are prevalence independent. It highlights how imperfect reference data may impact on a study and the actions that may be taken to address problems. Main conclusions: The theoretical independence of prevalence of popular accuracy measures, such as sensitivity, specificity, true skill statistic (TSS) and area under the receiver operating characteristic curve (AUC), is unlikely to occur in practice due to reference data error; all of these measures of accuracy, together with estimates of species occurrence, showed prevalence dependency arising through the use of a non-gold-standard reference. The number of cases used also had implications for the ability of a study to meet its objectives. Means to reduce the negative effects of imperfect reference data in study design and interpretation are suggested.
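For readers unfamiliar with the accuracy measures named above, the following short Python sketch computes sensitivity, specificity and TSS from the four cells of a presence–absence confusion matrix, using the standard definitions (TSS = sensitivity + specificity - 1). The function name `accuracy_measures` and the example counts are hypothetical; the sketch does not model the reference-data error that the paper analyses.

```python
from typing import Dict


def accuracy_measures(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity and TSS from confusion-matrix counts.

    tp/fn: reference presences predicted present/absent.
    tn/fp: reference absences predicted absent/present.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference presence and one absence")
    sensitivity = tp / (tp + fn)  # share of presences correctly predicted
    specificity = tn / (tn + fp)  # share of absences correctly predicted
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "tss": sensitivity + specificity - 1.0,
    }


# Hypothetical counts: 50 reference presences and 50 reference absences.
print(accuracy_measures(tp=40, fp=20, fn=10, tn=30))
```

Since each measure is computed within the reference presences or the reference absences separately, none of them depends on prevalence when the reference labels are error-free; this is the theoretical independence that the abstract argues is unlikely to hold in practice.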
Article
Aim: This article aims to test for and explore spatial nonstationarity in the relationship between avian species richness and a set of explanatory variables to further the understanding of species diversity variation. Location: Sub-Saharan Africa. Methods: Geographically weighted regression was used to study the relationship between species richness of the endemic avifauna of sub-Saharan Africa and a set of perceived environmental determinants, comprising the variables of temperature, precipitation and normalized difference vegetation index. Results: The relationships between species richness and the explanatory variables were found to be significantly spatially variable and scale-dependent. At local scales > 90% of the variation was explained, but this declined at coarser scales, with the greatest sensitivity to scale variation evident for narrow ranging species. The complex spatial pattern in regression model parameter estimates also gave rise to a spatial variation in scale effects. Main conclusions: Relationships between environmental variables are generally assumed to be spatially stationary and conventional, global, regression techniques are therefore used in their modelling. This assumption was not satisfied in this study, with the relationships varying significantly in space. In such circumstances the average impression provided by a global model may not accurately represent conditions locally. Spatial nonstationarity in the relationship has important implications, especially for studies of species diversity patterns and their scaling.
Article
This article traces the development of conceptual paradigms of soil classification and mapping from the pre-1960s model of crisp classes in attribute space linked to crisply delineated mapping units in geographical space, to modern approaches using fuzzy classification and geostatistical interpolation for simultaneously handling continuous variation in both attributes and location. Continuous classification yields a separate map of class membership values for every class; the dominance of any class at each location can be expressed by a confusion index, CI. If spatial correlation is strong, zones of high CI are concentrated in narrow geographical transition zones between locally dominant classes: these zones can be refined to automatically delineate class-specific boundaries. If spatial correlation in membership values is weak, then broad zones of large values of CI occur all over the map. Simulation modelling and two case studies demonstrate that contiguity in geographical space is more important for successful mapping than attribute class compactness. The studies show that soil information systems must take the spatial aspects of soil variation into account; further improvements in identifying and mapping significant soil groupings should be possible using numerical models of soil processes together with the methods presented here.
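The abstract does not give the formula for the confusion index. A commonly used definition, assumed here, is CI = 1 - (largest membership - second-largest membership) at each location, so that CI approaches 1 where two classes compete and 0 where one class clearly dominates. The Python sketch below applies that assumed definition; the function name `confusion_index` is hypothetical.

```python
from typing import Sequence


def confusion_index(memberships: Sequence[float]) -> float:
    """Confusion index for one location's fuzzy class memberships.

    Assumed definition: CI = 1 - (mu_max - mu_second), where mu_max and
    mu_second are the largest and second-largest membership values.
    """
    if len(memberships) < 2:
        raise ValueError("at least two classes are required")
    top, second = sorted(memberships, reverse=True)[:2]
    return 1.0 - (top - second)


# One class clearly dominates: low confusion.
print(confusion_index([0.8, 0.1, 0.1]))
# Two classes are equally plausible: maximal confusion.
print(confusion_index([0.5, 0.5, 0.0]))
```

Computing this value for every cell of the membership maps would produce the kind of CI map described above, with high values expected along transition zones between locally dominant classes.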
Article
Remote sensing has considerable potential for vegetation mapping. The model of vegetation distribution represented in an image classification, however, may not always be appropriate as the algorithms typically used give a ‘hard’ class allocation. Here the output of three classification techniques, a maximum likelihood, artificial neural network and fuzzy sets classification, are softened and shown to be able to reflect the class composition of image pixels and so be able to provide a better representation of some vegetation from remotely sensed imagery.
Article
Given the pervasive influence of human induced habitat fragmentation in ecological processes, landscape models are a welcome advance. The development of GIS software has allowed a greater use of these models and of analyses of the relationship between species and habitat variables. Habitat suitability models are thus theoretical concepts that can be used for planning in fragmented landscapes and habitat conservation. The most commonly used models are based on single species and on the assignment of suitability values for some environmental variables. Generally, the cartographic basis for modeling suitability is a set of thematic maps produced by Boolean logic. In this paper we propose a model based on a set of focal species and on maps produced by a fuzzy classification method. Focal species, selected by an expert-based approach, provide a practical way of extending the scope of habitat suitability models to the conservation of biodiversity at landscape scale. The utilisation of a classification method that applies a continuity criterion may allow more consideration of the connectivity of an area because it allows a better detection of ecological gradients within a landscape. We applied this methodology to the Tuscany region, focusing on terrestrial mammals. By performing a fuzzy classification we produced five land cover maps, and through image processing operations we obtained a suitability model which applies a continuity criterion. The resulting fuzzy suitability model seems better for the study of connectivity and fragmentation, especially in areas with high spatial complexity.
Article
In this paper, we introduce and characterize several types of normality in double fuzzy topological spaces. The effects of some types of functions on these types of normality are introduced.
Article
Accurate mapping of species distributions is a fundamental goal of modern biogeography, both for basic and applied purposes. This is commonly done by plotting known species occurrences, expert-drawn range maps or geographical estimations derived from species distribution models. However, all three kinds of maps are implicitly subject to uncertainty, due to the quality and bias of raw distributional data, the process of map building, and the dynamic nature of species distributions themselves. Here we review the main sources of uncertainty suggesting a code of good practices in order to minimize their effects. Specifically, we claim that uncertainty should be always explicitly taken into account and we propose the creation of maps of ignorance to provide information on where the mapped distributions are reliable and where they are uncertain.
Article
In this article we review the problems encountered during the use of taxonomic information for the purpose of monitoring biodiversity. These problems encompass the nature of taxonomic data that requires human interpretation in order to be recognised in the field and grouped into well-defined classes such as species. We then briefly discuss some methods that may be utilized in order to minimise these problems.
Neteler, M., Bowman, M. H., Landa, M. and Metz, M., 2012. GRASS GIS: A multi-purpose open source GIS. Environmental Modelling & Software 31, pp. 124-130.
Zadeh, L., 1965. Fuzzy sets. Information and Control 8, pp. 338-353.
ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume II-3/W5, 2015. ISPRS Geospatial Week 2015, 28 Sep - 03 Oct 2015, La Grande Motte, France. This contribution has been peer-reviewed. The double-blind peer-review was conducted on the basis of the full paper. Editors: A.-M. Olteanu-Raimond, C. de-Runz, and R. Devillers. doi:10.5194/isprsannals-II-3-W5-333-2015