Technical Report

Developing a Bayesian species occupancy/abundance indicator for the UK National Plant Monitoring Scheme

Authors:

Abstract

The National Plant Monitoring Scheme (NPMS) is a volunteer-based, structured plant recording scheme. This report focuses on the development of a new statistical model for the species-level data generated by the NPMS, with the ultimate aim of contributing to a new indicator of UK habitat quality.

NPMS surveyors collect data on plant abundance (percentage cover) from small plots targeted at specific habitats. They can participate at different levels, and the level of participation determines the list of species sought in the field. Typically, surveyors record around five small plots in a 1 km square, with each plot visited twice a year.

NPMS data must be processed so that they accurately represent the information content of the plot surveys. Because surveyors at different levels use different species lists, in some cases we need to distinguish between true absences (species on a surveyor's target list but not reported) and unknown cases (species not on the surveyor's target list, whose absence from a record is therefore uninformative).

We present a novel hierarchical statistical model for NPMS species-level data. The model seeks to make maximum use of the data collected, and integrates a standard occupancy modelling approach for plot detections with a Beta distribution model for a species' non-zero cover data.

We evaluate the proposed model using a variety of simulated datasets, assessing its performance in terms of the bias and variance of the estimates relative to the parameter values used to simulate the data.

The simulations indicate that the model performs as expected under a "perfect" scenario. Smaller datasets induce various biases, many of which can be traced to the fact that, in our simulations, abundance and detectability are closely related. This biases the estimated mean of the underlying cover distribution upwards, and also affects estimates of the intercept and regression coefficient in the detection sub-model. In real datasets this relationship is likely to be less clear-cut, and we do not expect these biases to affect species' relative annual trend estimates.

Finally, we apply the model to NPMS data collected between 2015 and 2018 for 86 grassland species. The model estimates ecologically sensible mean cover values for the species analysed. However, mean plot occupancies tended to centre on 0.5, suggesting that many species may not yet have sufficient data for mean occupancy to be well estimated.

A novel combined abundance/occupancy indicator has been developed for NPMS data in a Bayesian framework. The simulation tests and applications to real data explored in this report indicate that the model performs well in ideal scenarios; biases in less data-rich scenarios can largely be explained by relationships between abundance and detectability. These relationships are likely to be less clear-cut in real datasets, and future work will explore how additional covariates describing a species' detectability could be incorporated. Extending the model to create annual indices, and considering how these may be aggregated, will also be required before NPMS data can be used to create a habitat quality indicator.
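A minimal sketch of how the occupancy and Beta cover components described above could combine into a single-plot likelihood. It assumes constant occupancy (psi) and per-visit detection (p) probabilities and a Beta cover distribution parameterised by its mean (mu) and precision (phi); the function names and this parameterisation are hypothetical, not taken from the report, whose detection sub-model also includes a regression term. In a Bayesian fit this likelihood would be combined with priors and sampled by MCMC. Visits are coded to follow the true-absence/unknown distinction: a species not on the surveyor's target list is `None` and drops out, while a listed but unrecorded species counts as a non-detection.

```python
import math
from typing import Optional, Sequence


def beta_logpdf(y: float, mu: float, phi: float) -> float:
    """Log density at y in (0, 1) of a Beta distribution with mean mu and precision phi."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def plot_loglik(visits: Sequence[Optional[float]], psi: float, p: float,
                mu: float, phi: float) -> float:
    """Log-likelihood for one species in one plot, with occupancy z summed out.

    Each visit is coded as:
      None -> species not on the surveyor's target list (uninformative, skipped)
      0.0  -> on the target list but not recorded (a true non-detection)
      y    -> recorded with proportional cover y, 0 < y < 1
    Assumes 0 < psi < 1 and 0 < p < 1.
    """
    informative = [v for v in visits if v is not None]
    covers = [v for v in informative if v > 0.0]
    n_det, n_miss = len(covers), len(informative) - len(covers)
    # Contribution if the plot is occupied (z = 1): detection history plus cover density
    log_occupied = (math.log(psi) + n_det * math.log(p) + n_miss * math.log(1.0 - p)
                    + sum(beta_logpdf(y, mu, phi) for y in covers))
    if n_det > 0:
        return log_occupied  # any detection implies the plot is occupied
    # No detections: occupied but missed on every visit, or genuinely unoccupied (z = 0)
    return math.log(math.exp(log_occupied) + (1.0 - psi))


# Two visits: recorded at 12% cover, then not recorded although on the target list
print(plot_loglik([0.12, 0.0], psi=0.5, p=0.7, mu=0.1, phi=10.0))
# Same plot, but the second visit used a list that does not include the species
print(plot_loglik([0.12, None], psi=0.5, p=0.7, mu=0.1, phi=10.0))
```

The two calls differ by exactly log(1 − p): coding the off-list visit as a zero would add a spurious non-detection, which is the error the data-processing step is meant to prevent.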
... However, as the underlying data often show spatial and temporal inconsistencies in the sampling coverage of subregions within the complete study region, detection of changes in biodiversity over time can be very challenging (Hill, 2012; Pescott, Powney, et al., 2019). Relevé data on the plot level are available from research institutes, universities, online databases (e.g., German Vegetation Reference Database, Jandt & Bruelheide, 2012; or vegetweb.de, ...
Preprint
Based on plant occurrence data covering all parts of Germany, we investigated changes in the distribution of 2136 plant species between 1960 and 2017. We analyzed 29 million occurrence records over an area of approx. 350,000 km² on a 5 × 5 km grid using temporal and spatio-temporal models and accounting for sampling bias. Since the 1960s, more than 70% of investigated plant species showed significant declines in nation-wide occurrence. Archaeophytes (species introduced before 1492) most strongly declined but also native plant species experienced severe declines. In contrast, neophytes (species introduced after 1492) increased in their nation-wide occurrence but not homogeneously throughout the country. Our analysis suggests that the strongest declines in native species already happened in the 1960s–1980s, a time frame in which usually few data exist. Increases in neophytic species were strongest in the 1990s and 2010s. Overall, the increase in neophytes did not compensate for the loss of other species, resulting in a decrease in mean grid-cell species richness of -1.9% per decade. The decline in plant biodiversity is a widespread phenomenon occurring in different habitats and geographic regions. It is likely that this decline has major repercussions on ecosystem functioning and overall biodiversity, potentially with cascading effects across trophic levels. The approach used in this study is transferable to large-scale trend analyses using heterogeneous occurrence data.
Article
Based on plant occurrence data covering all parts of Germany, we investigated changes in the distribution of 2136 plant species between 1960 and 2017. We analyzed 29 million occurrence records over an area of ~350,000 km² on a 5 × 5 km grid using temporal and spatiotemporal models and accounting for sampling bias. Since the 1960s, more than 70% of investigated plant species showed declines in nationwide occurrence. Archaeophytes (species introduced before 1492) most strongly declined but also native plant species experienced severe declines. In contrast, neophytes (species introduced after 1492) increased in their nationwide occurrence but not homogeneously throughout the country. Our analysis suggests that the strongest declines in native species already happened in the 1960s–1980s, a time frame in which often few data exist. Increases in neophytic species were strongest in the 1990s and 2010s. Overall, the increase in neophytes did not compensate for the loss of other species, resulting in a decrease in mean grid cell species richness of −1.9% per decade. The decline in plant biodiversity is a widespread phenomenon occurring in different habitats and geographic regions. It is likely that this decline has major repercussions on ecosystem functioning and overall biodiversity, potentially with cascading effects across trophic levels. The approach used in this study is transferable to other large‐scale trend analyses using heterogeneous occurrence data. Full text available open access at: https://onlinelibrary.wiley.com/doi/full/10.1111/gcb.15447
Article
Volunteer-based plant monitoring in the UK has focused mainly on distribution mapping; there has been less emphasis on the collection of data on plant communities and habitats. Abundance data provide different insights into ecological pattern and allow for more powerful inference when considering environmental change. Abundance monitoring for other groups of organisms is well-established in the UK, e.g. for birds and butterflies, and conservation agencies have long desired comparable schemes for plants. We describe a new citizen science scheme for the UK (the ‘National Plant Monitoring Scheme’; NPMS), with the primary aim of monitoring the abundance of plants at small scales. Scheme development emphasised volunteer flexibility through scheme co-creation and feedback, whilst retaining a rigorous approach to design. Sampling frameworks, target habitats and species, field methods and power are all described. We also evaluate several outcomes of the scheme design process, including: (i) landscape-context bias in the first two years of the scheme; (ii) the ability of different sets of indicator species to capture the main ecological gradients of UK vegetation; and (iii) species richness bias in returns relative to a professional survey. Survey rates have been promising (over 60% of squares released have been surveyed), although upland squares are under-represented. Ecological gradients present in an ordination of an independent, unbiased, national survey were well-represented by NPMS indicator species, although further filtering to an entry-level set of easily identifiable species degraded the signal on an ordination axis representing succession and disturbance. Comparison with another professional survey indicated that different biases might be present at different levels of participation within the scheme.
Understanding the strengths and limitations of the NPMS will guide development, increase trust in outputs, and direct efforts for maintaining volunteer interest, as well as providing a set of ideas for other countries to experiment with.
Article
Imperfect detection leads to underestimates of species presence and decreases the reliability of survey data. Imperfect detection has not been examined in detail for boreal forest understory plants, despite widespread use of surveys for rare plants prior to development. We addressed this issue using detectability trials conducted in Alberta, Canada with decoy vascular plants. Volunteer observers searched in survey plots for species while unaware of their true presence or abundance. Our findings indicate that the detection of cryptic species is very low when abundance is low (0–35%) and plot size is large (< 50% in ≥ 100 m²). Plant density (individuals per unit area) was the most important determinant of detection probability, where more abundant species were detected more often and with less survey effort. When abundance was held constant, diffusely arranged species were twice as likely to be detected compared to those in clumps. Detection of cryptic species can be low even when individuals are flowering, and even morphologically distinct species can go unnoticed in small plots. We suggest that future decoy trials investigate search strategies that could improve detection and that field surveys for vascular plants address imperfect detection through careful consideration of plot size, characteristics of the target species, and survey effort, both in terms of time expenditure within an area and the number of observers employed.
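The positive link between abundance and detection reported above is often formalised with the Royle–Nichols relationship: if each of N individuals in a plot is found independently with probability r, the species is detected with probability 1 − (1 − r)^N. The sketch below assumes that independence (clumping, which the trials found to lower detection, violates it) and uses a hypothetical per-individual rate r, not a value estimated in the study. The helper `searches_needed` (also hypothetical) shows the effort side: how many independent searches are needed to reach a target cumulative detection probability.

```python
import math


def detection_prob(n_individuals: int, r: float) -> float:
    """P(species detected) when each of n individuals is found independently with prob. r."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    return 1.0 - (1.0 - r) ** n_individuals


def searches_needed(p: float, target: float = 0.95) -> int:
    """Independent searches needed so that P(detected at least once) >= target."""
    if p <= 0.0:
        raise ValueError("p must be positive")
    if p >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


r = 0.05  # hypothetical per-individual detection probability
for n in (1, 5, 20, 60):
    p = detection_prob(n, r)
    print(f"N={n:>2}: P(detect)={p:.2f}, searches for 95%: {searches_needed(p)}")
```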
Article
Understanding patterns of species occurrence and the processes underlying these patterns is fundamental to the study of ecology. One of the more commonly used approaches to investigate species occurrence patterns is occupancy modeling, which can account for imperfect detection of a species during surveys. In recent years, there has been a proliferation of Bayesian modeling in ecology, which includes fitting Bayesian occupancy models. The Bayesian framework is appealing to ecologists for many reasons, including the ability to incorporate prior information through the specification of prior distributions on parameters. While ecologists almost exclusively intend to choose priors so that they are “uninformative” or “vague”, such priors can easily be unintentionally highly informative. Here we report on how the specification of a “vague” normally distributed (i.e., Gaussian) prior on coefficients in Bayesian occupancy models can unintentionally influence parameter estimation. Using both simulated data and empirical examples, we illustrate how this issue likely compromises inference about species-habitat relationships. While the extent to which these informative priors influence inference depends on the data set, researchers fitting Bayesian occupancy models should conduct sensitivity analyses to ensure intended inference, or employ less commonly used priors that are less informative (e.g., logistic or t prior distributions). We provide suggestions for addressing this issue in occupancy studies, and an online tool for exploring this issue under different contexts.
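The effect described above can be checked directly for an intercept-only logit model: a Normal(0, sd) prior on the logit-scale intercept implies a prior on the probability scale, and the wider the normal, the more of that prior piles up near 0 and 1. The sketch below (function names hypothetical) computes the prior mass placed on 0.1 < p < 0.9. With sd = 10 it is roughly 17%, whereas a Logistic(0, 1) prior, whose CDF is the inverse logit, gives p a Uniform(0, 1) prior and hence 80%.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def normal_prior_mass(lo: float, hi: float, sd: float) -> float:
    """Prior probability that expit(beta) lies in [lo, hi] when beta ~ Normal(0, sd)."""
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))
    return cdf(logit(hi)) - cdf(logit(lo))


def logistic_prior_mass(lo: float, hi: float) -> float:
    """Same quantity when beta ~ Logistic(0, 1).

    The Logistic(0, 1) CDF is the inverse logit, so expit(beta) is Uniform(0, 1)
    and the mass on [lo, hi] is simply hi - lo.
    """
    def cdf(x: float) -> float:
        return 1.0 / (1.0 + math.exp(-x))
    return cdf(logit(hi)) - cdf(logit(lo))


for sd in (1.0, 2.5, 10.0, 100.0):
    print(f"Normal(0, {sd:>5}): P(0.1 < p < 0.9) = {normal_prior_mass(0.1, 0.9, sd):.3f}")
print(f"Logistic(0, 1):     P(0.1 < p < 0.9) = {logistic_prior_mass(0.1, 0.9):.3f}")
```

A quick calculation like this is one cheap form of the prior sensitivity check the authors recommend, done before any model is fitted.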
Article
Plant cover data collected by monitoring schemes are often expressed on interval-censored scales to reduce field effort. Existing statistical approaches to such data may not make full use of available information, or may both induce bias and assume more precision than may be warranted, e.g. by analysing mid-points and disregarding the spread of observations within a class. We compare four approaches to modelling such data: two established methods (the proportional odds model and generalised linear mixed models) and two novel methods that explicitly accommodate the interval-censored nature of much data on plant cover. Of the latter, the first is a maximum likelihood (ML) approach that incorporates knowledge of the metric interval in which each datum lies. The second uses a Bayesian approach to incorporate interval-censoring and random effects to account for variation in annual changes between sites. All four methods are compared using data simulated with parameter values derived from analysis of a long-term monitoring dataset. We demonstrate that model choice can influence the quality of statistical inference, particularly between models that make simplifications for convenience of fitting, and those which combine realistic distributional assumptions with accommodation of imprecise observations. A comparison of three of the methods demonstrated that all provide good accuracy and increasing precision over time. A comparison of power across the three frequentist approaches showed higher power for the novel ML approach. This is likely to be due to this non-hierarchical method underestimating residual variance. The Bayesian model is not directly comparable, but the measure of belief in a negative trend considered here was generally high, providing gradual increases in the believability of a decline with increasing time, number of sites, initial abundance, and larger effect sizes. 
Our results suggest that the use of hierarchical models for plant monitoring schemes, conveniently applied in a Bayesian context, will help to bring greater realism and sensitivity to assessments of population change, and allow the use of more of the underlying information contained within cover data. Interval-censored methods will also allow for the integration of long-term plant datasets collected according to different cover scales, as well as presence/absence data.
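To make the interval-censoring idea concrete: if cover Y follows a Beta distribution but is recorded only as a class [lo, hi), each record contributes F(hi) − F(lo) to the likelihood, rather than the density at a class mid-point. The sketch below assumes a Beta cover distribution with mean mu and precision phi and uses hypothetical class boundaries (not a specific published cover scale). The Beta CDF (the regularised incomplete beta function) is evaluated with the standard continued-fraction method, as in Numerical Recipes, so that only the Python standard library is needed; `scipy.stats.beta.cdf` would do the same job.

```python
import math
from typing import Sequence, Tuple

TINY = 1e-300  # guards against division by zero in the continued fraction


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),            # even term
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd term
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > TINY else TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > TINY else TINY
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x: float, a: float, b: float) -> float:
    """Regularised incomplete beta I_x(a, b), i.e. the Beta(a, b) CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def interval_censored_loglik(classes: Sequence[Tuple[float, float]],
                             mu: float, phi: float) -> float:
    """Log-likelihood of cover records each known only to lie in [lo, hi)."""
    a, b = mu * phi, (1.0 - mu) * phi
    return sum(math.log(beta_cdf(hi, a, b) - beta_cdf(lo, a, b)) for lo, hi in classes)


# Hypothetical class boundaries (as proportions), not a specific published cover scale
BOUNDS = [0.0, 0.01, 0.05, 0.25, 0.50, 0.75, 1.0]
records = [(0.01, 0.05), (0.05, 0.25), (0.05, 0.25), (0.25, 0.50)]
for mu in (0.05, 0.15, 0.30):
    print(f"mu={mu:.2f}: log-likelihood={interval_censored_loglik(records, mu, 5.0):.3f}")

# Probability of each class under Beta(mean 0.15, precision 5); these sum to 1
a, b = 0.15 * 5.0, 0.85 * 5.0
class_probs = [beta_cdf(hi, a, b) - beta_cdf(lo, a, b) for lo, hi in zip(BOUNDS, BOUNDS[1:])]
print([round(q, 3) for q in class_probs])
```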
Article
Aims: Vegetation sampling employing observers is prone to both inter-observer and intra-observer error. Three types of error are common: (i) overlooking error (i.e. not observing species actually present), (ii) misidentification error (i.e. not correctly identifying species) and (iii) estimation error (i.e. not accurately estimating abundance). I conducted a literature review of 59 articles that provided quantitative estimates or statistical inferences regarding observer error in vegetation studies.
Article
www.npms.org.uk/sites/www.npms.org.uk/files/PDF/British%20Wildlife%2026_4%2007%20plant%20survey_Copyright%20Bloomsbury%20Publishing_0.pdf
Article
Effective wildlife habitat management and conservation requires understanding the factors influencing distribution and abundance of plant species. Field studies, however, have documented observation errors in visually estimated plant cover including measurements which differ from the true value (measurement error) and not observing a species that is present within a plot (detection error). Unlike the rapid expansion of occupancy and N-mixture models for analysing wildlife surveys, development of statistical models accounting for observation error in plants has not progressed quickly. Our work informs development of a monitoring protocol for managed wetlands within the National Wildlife Refuge System. Zero-augmented beta (ZAB) regression is the most suitable method for analysing areal plant cover recorded as a continuous proportion but assumes no observation errors. We present a model extension that explicitly includes the observation process thereby accounting for both measurement and detection errors. Using simulations, we compare our approach to a ZAB regression that ignores observation errors (naïve model) and an "ad hoc" approach using a composite of multiple observations per plot within the naïve model. We explore how sample size and within-season revisit design affect the ability to detect a change in mean plant cover between 2 years using our model. Explicitly modelling the observation process within our framework produced unbiased estimates and nominal coverage of model parameters. The naïve and "ad hoc" approaches resulted in underestimation of occurrence and overestimation of mean cover. The degree of bias was primarily driven by imperfect detection and its relationship with cover within a plot. Conversely, measurement error had minimal impacts on inferences. We found that >30 plots with at least three within-season revisits achieved reasonable posterior probabilities for assessing change in mean plant cover.
For rapid adoption and application, code for Bayesian estimation of our single-species ZAB with errors model is included. Practitioners utilizing our R-based simulation code can explore trade-offs among different survey efforts and parameter values, as we did, but tuned to their own investigation. Less abundant plant species of high ecological interest may warrant the additional cost of gathering multiple independent observations in order to guard against erroneous conclusions.
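The direction of the naïve biases described above (occurrence too low, mean cover too high when detection rises with cover) can be reproduced with a small simulation. This is a sketch under stated assumptions, not the authors' ZAB-with-errors code: one visit per plot, cover drawn from a Beta distribution with mean mu and precision phi, detection probability expit(alpha0 + alpha1 × cover), no measurement error, and arbitrary hypothetical parameter values. The NPMS report abstract at the top of this page attributes the upward bias in its estimated mean cover to a similar abundance–detectability link.

```python
import math
import random
from statistics import mean


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def simulate_naive_bias(n_plots: int, psi: float, mu: float, phi: float,
                        alpha0: float, alpha1: float, seed: int = 1):
    """Simulate one visit per plot with cover-dependent detection.

    Returns (true occupancy, naive occupancy, true mean cover, naive mean cover),
    where the naive quantities use only plots in which the species was detected.
    """
    rng = random.Random(seed)
    a, b = mu * phi, (1.0 - mu) * phi
    true_covers, seen_covers = [], []
    occupied = 0
    for _ in range(n_plots):
        if rng.random() >= psi:
            continue                      # plot unoccupied
        occupied += 1
        cover = rng.betavariate(a, b)     # true cover, given presence
        true_covers.append(cover)
        if rng.random() < expit(alpha0 + alpha1 * cover):
            seen_covers.append(cover)     # detected; cover recorded without error
    return (occupied / n_plots, len(seen_covers) / n_plots,
            mean(true_covers), mean(seen_covers))


occ, naive_occ, cover, naive_cover = simulate_naive_bias(
    20_000, psi=0.6, mu=0.2, phi=5.0, alpha0=-1.0, alpha1=8.0)
print(f"occupancy:  true {occ:.2f}, naive {naive_occ:.2f}")
print(f"mean cover: true {cover:.3f}, naive {naive_cover:.3f}")
```

Setting `alpha1=0.0` (detection unrelated to cover) removes the cover bias while leaving occupancy underestimated, which separates the two mechanisms.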
Book
A guide to data collection, modeling and inference strategies for biological survey data using Bayesian and classical statistical methods. This book describes a general and flexible framework for modeling and inference in ecological systems based on hierarchical models, with a strict focus on the use of probability models and parametric inference. Hierarchical models represent a paradigm shift in the application of statistics to ecological inference problems because they combine explicit models of ecological system structure or dynamics with models of how ecological systems are observed. The principles of hierarchical modeling are developed and applied to problems in population, metapopulation, community, and metacommunity systems. The book provides the first synthetic treatment of many recent methodological advances in ecological modeling and unifies disparate methods and procedures. The authors apply principles of hierarchical modeling to ecological problems, including:
* occurrence or occupancy models for estimating species distribution
* abundance models based on many sampling protocols, including distance sampling
* capture-recapture models with individual effects
* spatial capture-recapture models based on camera trapping and related methods
* population and metapopulation dynamic models
* models of biodiversity, community structure and dynamics
* Wide variety of examples involving many taxa (birds, amphibians, mammals, insects, plants)
* Development of classical, likelihood-based procedures for inference, as well as Bayesian methods of analysis
* Detailed explanations describing the implementation of hierarchical models using freely available software such as R and WinBUGS
* Computing support in technical appendices in an online companion web site.
Book
Applied Hierarchical Modeling in Ecology: Distribution, Abundance, Species Richness offers a new synthesis of the state-of-the-art of hierarchical models for plant and animal distribution, abundance, and community characteristics such as species richness using data collected in metapopulation designs. These types of data are extremely widespread in ecology and its applications in such areas as biodiversity monitoring and fisheries and wildlife management. This first volume explains static models/procedures in the context of hierarchical models that collectively represent a unified approach to ecological research, taking the reader from design, through data collection, and into analyses using a very powerful class of models. Applied Hierarchical Modeling in Ecology, Volume 1 serves as an indispensable manual for practicing field biologists, and as a graduate-level text for students in ecology, conservation biology, fisheries/wildlife management, and related fields.
* Provides a synthesis of important classes of models about distribution, abundance, and species richness while accommodating imperfect detection
* Presents models and methods for identifying unmarked individuals and species
* Written in a step-by-step approach accessible to non-statisticians and provides fully worked examples that serve as a template for readers' analyses
* Includes companion website containing data sets, code, solutions to exercises, and further information
Article
Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods, in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach. Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals. This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management.
* Presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticians
* Covers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and more
* Deemphasizes computer coding in favor of basic principles
* Explains how to write out properly factored statistical expressions representing Bayesian models