Ecological Informatics

Published by Elsevier
Print ISSN: 1574-9541
Publications
This paper presents a new approach to spatiotemporally visualizing the simulation output of migratory insect dynamics and resultant vegetation changes in real time. The visualization displays simulated ecological phenomena in an intuitive manner, which allows research results to be easily understood by a wide range of users. In order to design a fast and efficient visualization technique, a simplified mathematical model is applied to intelligibly represent migrating groups of insects. In addition, impostors are used to accelerate rendering processes. The presented visualization method is implemented in an integrated spatiotemporal analysis system, which models, simulates and analyzes ecological phenomena such as insect migration through time at a variety of spatial resolutions.
 
Predictions of the potential for an insect species to invade a new locality have previously been made using a number of different modelling approaches that include either biotic or abiotic factors as predictor variables. Few models include both, even though both are recognised as important predictors of species distributions. Models that use both factors as independent input variables would therefore be expected to be more accurate than those based on either factor alone. This study compares the accuracy of a range of multilayer perceptron (MLP) artificial neural network (ANN) modelling approaches for modelling the global distribution of six insect species using various combinations of abiotic and biotic factors considered to influence insect species establishment. As well as individual MLP, the modelling approaches included ensemble and cascaded networks. The biotic factors were represented by regional host plant and insect species assemblages, and abiotic factors were represented by a range of climatic variables. While no single model was found to be superior for all species, in general, ensembles of models and the combination of biotic and abiotic factors, particularly in cascaded MLP, gave improved prediction accuracy. Interestingly, sensitivity and contribution analyses showed that the presence or absence of other insect species, represented by the regional insect assemblage, was the best predictor of target species distribution compared with climate and/or host plant assemblage.
 
The use of multi-layer perceptrons (MLP) to determine the relative significance of climatic variables to the establishment of insect pest species is described. Results show that the MLP are able to learn to accurately predict the establishment of a pest species within a specific geographic region. Analysis of the MLP yielded insights into the contribution of the individual input variables and allowed for the identification of those variables that were most significant in either encouraging or inhibiting establishment.
 
In Piedmont (Italy) the environmental changes due to human impact have had profound effects on rivers and their inhabitants. Thus, it is necessary to develop practical tools providing accurate ecological assessments of river and species conditions. We focus our attention on Salmo marmoratus, an endangered salmonid which is characteristic of the Po river system in Italy. In order to contribute to the management of the species, four different approaches were used to assess its presence: discriminant function analysis, logistic regression, decision tree models and artificial neural networks. Either all the 20 environmental variables measured in the field or the 7 coming from feature selection were used to classify sites as positive or negative for S. marmoratus. The performances of the different models were compared. Discriminant function analysis, logistic regression, and decision tree models (unpruned and pruned) had relatively high percentages of correctly classified instances. Although neither tree-pruning technique improved the reliability of the models significantly, they did reduce the tree complexity and hence increased the clarity of the models. The artificial neural network (ANN) approach, especially the model built with the 7 inputs coming from feature selection, showed better performance than all the others. The relative contribution of each independent variable to this model was determined by using the sensitivity analysis technique. Our findings proved that the ANNs were more effective than the other classification techniques. Moreover, ANNs showed high potential when applied in models used to make decisions regarding river and conservation management.
 
Plant abundance data are often analysed using standard statistical procedures without considering their distributional features and the underlying ecological processes. However, plant abundance data, e.g. when measured in biodiversity monitoring programs, are often collected using a hierarchical sampling procedure. Because such data are typically both zero-inflated and over-dispersed, standard statistical procedures are sub-optimal for modelling them. Two distributions (the zero-inflated generalised binomial distribution and the zero-inflated bounded beta distribution) are suggested as possible distributions for analysing discrete, continuous, or ordinal hierarchically sampled plant cover data.
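For orientation, a zero-inflated distribution for discrete abundance data can be written in the generic mixture form below. This is a textbook-style sketch, not the specific generalised binomial or bounded beta parameterisation suggested in the abstract. The symbols $\pi$ (probability of a structural zero), $f$ (base probability mass function) and $\theta$ (its parameters) are notation introduced here for illustration only.

```latex
P(Y = y) =
\begin{cases}
\pi + (1 - \pi)\, f(0 \mid \theta), & y = 0, \\
(1 - \pi)\, f(y \mid \theta), & y > 0.
\end{cases}
```

Setting $\pi = 0$ recovers the base distribution, which is why an excess of zeros relative to $f$ alone points toward a zero-inflated model.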
 
Statistical mechanics of relative species abundance (RSA) patterns in biological networks is presented. The theory is based on multispecies replicator dynamics equivalent to the Lotka–Volterra equation, with diverse interspecies interactions. Various RSA patterns observed in nature are derived from a single parameter related to productivity or maturity of a community. The abundance distribution is formed like a widely observed left-skewed lognormal distribution. It is also found that the “canonical hypothesis” is supported in some parameter region where the typical RSA patterns are observed. As the model has a general form, the result can be applied to similar patterns in other complex biological networks, e.g. gene expression.
 
The complexity of ecosystems is staggering, with hundreds or thousands of species interacting in a number of ways from competition and predation to facilitation and mutualism. Understanding the networks that form the systems is of growing importance, e.g. to understand how species will respond to climate change, or to predict potential knock-on effects of a biological control agent. In recent years, a variety of summary statistics for characterising the global and local properties of such networks have been derived, which provide a measure for gauging the accuracy of a mathematical model for network formation processes. However, the critical underlying assumption is that the true network is known. This is not a straightforward task to accomplish, and typically requires minute observations and detailed field work. More importantly, knowledge about species interactions is restricted to specific kinds of interactions. For instance, while the interactions between pollinators and their host plants are amenable to direct observation, other types of species interactions, like those mentioned above, are not, and might not even be clearly defined from the outset. To discover information about complex ecological systems efficiently, new tools for inferring the structure of networks from field data are needed. In the present study, we investigate the viability of various statistical and machine learning methods recently applied in molecular systems biology: graphical Gaussian models, L1-regularised regression with least absolute shrinkage and selection operator (LASSO), sparse Bayesian regression and Bayesian networks. We have assessed the performance of these methods on data simulated from food webs of known structure, where we combined a niche model with a stochastic population model in a 2-dimensional lattice. 
We assessed the network reconstruction accuracy in terms of the area under the receiver operating characteristic (ROC) curve, which was typically in the range between 0.75 and 0.9, corresponding to the recovery of about 60% of the true species interactions at a false prediction rate of 5%. We also applied the models to presence/absence data for 39 European warblers, and found that the inferred species interactions showed a weak yet significant correlation with phylogenetic similarity scores, which tended to weakly increase when including bio-climate covariates and allowing for spatial autocorrelation. Our findings demonstrate that relevant patterns in ecological networks can be identified from large-scale spatial data sets with machine learning methods, and that these methods have the potential to contribute novel important tools for gaining deeper insight into the structure and stability of ecosystems.
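The area under the ROC curve used above as the accuracy measure can be computed directly from ranked edge scores. The following is a minimal sketch, assuming each candidate species interaction has a numeric score from some inference method and a known true/false label; the function name `roc_auc` and the example numbers are hypothetical and do not come from the study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    scores: one number per candidate edge (higher = more likely a true edge)
    labels: 1 for a true interaction, 0 for a non-interaction
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # true edge ranked above non-edge
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six candidate edges in a known food web.
edge_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
true_edges = [1, 1, 1, 0, 0, 0]
auc = roc_auc(edge_scores, true_edges)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect recovery of the true network.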
 
Environmental sensor networks are now commonly being deployed within environmental observatories and as components of smaller-scale ecological and environmental experiments. Effectively using data from these sensor networks presents technical challenges that are difficult for scientists to overcome, severely limiting the adoption of automated sensing technologies in environmental science. The Realtime Environment for Analytical Processing (REAP) is an NSF-funded project to address the technical challenges related to accessing and using heterogeneous sensor data from within the Kepler scientific workflow system. Using distinct use cases in terrestrial ecology and oceanography as motivating examples, we describe workflows and extensions to Kepler to stream and analyze data from observatory networks and archives. We focus on the use of two newly integrated data sources in Kepler: DataTurbine and OPeNDAP. Integrated access to both near real-time data streams and data archives from within Kepler facilitates both simple data exploration and sophisticated analysis and modeling with these data sources.
 
Standard interfaces for data and information access facilitate data management and usability by minimizing the effort required to acquire, catalog and integrate data from a variety of sources. The authors have prototyped several data management and analysis applications using Sensor Web Enablement Services, a suite of service protocols being developed by the Open Geospatial Consortium specifically for handling sensor data in near-real time. This paper provides a brief overview of some of the service protocols and describes how they are used in various sensor web projects involving near-real-time management of sensor data.
 
Ecological communities consist of a large number of species. Most species are rare or have low abundance, and only a few are abundant and/or frequent. In quantitative community analysis, abundant species are commonly used to interpret patterns of habitat disturbance or ecosystem degradation. Rare species cause many difficulties in quantitative analysis by introducing noise and bulking up datasets, a problem made worse by the difficulty of handling large datasets. In this study we propose a method to reduce the size of large datasets by selecting the most ecologically representative species using a self-organizing map (SOM) and structuring index (SI). As an example, we used diatom community data sampled at 836 sites with 941 species throughout the French hydrosystem. Out of the 941 species, 353 were selected. The selected dataset was effectively classified according to the similarities of community assemblages in the SOM map. Compared to the SOM map generated with the original dataset, the community pattern gave a very similar representation of ecological conditions of the sampling sites, displaying clear gradients of environmental factors between different clusters. Our results showed that this computational technique can be applied to preprocessing data in multivariate analysis. It could be useful for ecosystem assessment and management, helping to reduce both the list of species for identification and the size of datasets to be processed for diagnosing the ecological status of water courses.
 
This paper presents a model of a population of error-prone self-replicative species (replicators) that interact with their environment. The population evolves by natural selection in an environment whose change is caused by the evolutionary process itself. For simplicity, the environment is described by a single scalar factor, i.e. its temperature. The formal formulation of the model extends two basic models of Ecology and Evolutionary Biology, namely, Daisyworld and Quasispecies models. It is also assumed that the environment can change due to external perturbations that are summed up as external noise. Unlike previous models, the population size self-regulates, so no ad hoc population constraints are involved. When species replication is error-free, i.e. without mutation, the system dynamics can be described by an (n + 1)-dimensional system of differential equations, one for each of the species initially present in the system, and another for the evolution of the environment temperature. Analytical results can be obtained straightforwardly in low-dimensional cases. In these examples, we show the stabilizing effect of thermal white noise on the system behavior. The error-prone self-replication, i.e. with mutation, is studied computationally. We assume that each species can mutate two independent parameters: its optimal growth temperature and its influence on the environment temperature. For different mutation rates the system exhibits a large variety of behaviors. In particular, we show that a quasispecies distribution with an internal sub-distribution appears, facilitating species adaptation to new environments. Finally, this ecologically inspired evolutionary model is applied to study the origin and evolution of public opinion.
 
-Four real examples to illustrate the observed patterns of differentiation in the information given by each approach about the functional structure of the vertebrate predator assemblage at Las Chinchillas National Reserve (LCNR). In the middle of each panel is the respective dendrogram of functional dissimilarity between predator species (Ta: Tyto alba, Gn: Glaucidium nanum, Pc: Pseudalopex culpaeus, Bm: Bubo magellanicus, Ac: Athene cunicularia, Fs: Falco sparverius). Histograms represent the combined sample frequency distribution for all nodes, obtained by resampling the observed prey use matrices (1000 iterations per biological season). Upper box-plots show the quantiles 2.5%, 97.5% (whiskers), 5% and 95% (boxes) of such sample distributions, and the observed node values (open dots). The right margin of the boxes corresponds to the threshold value for A1, while whiskers represent threshold values for A2. Lower box-plots display the threshold values for each node (whiskers: 2.5% and 97.5% quantiles; boxes: 25% and 75% quantiles) used in A3 and the observed node values (open dots), ordered by rank of functional dissimilarity. The four panels represent: (A) No clear differentiation (non-breeding season of 2002); (B) A2 detects a high degree of functional divergence while A3 adds no new information (breeding season of 1996); (C) A3 detects significant functional divergence for most nodes, not detected by A1 and A2 (breeding season of 2003); (D) A3 detects significant guild aggregation for most nodes, not detected by A1 and A2 (non-breeding season of 1987).
-(A) Fluctuations in the overall functional structure (FSt) of the predator assemblage at LCNR through the entire study period, following A1. (B) Fluctuations in FSt and (C) degree of functional divergence (FDv) of the predator assemblage, following A2. Because nodes come from dendrograms of diet dissimilarity (Figs. 1 and 2), positive values in panel C indicate higher relative contribution of functional divergence to overall functional structure, while negative values indicate higher contribution of guild aggregation. (D) Fluctuations in FSt and (E) in FDv of the predator assemblage, following A3. Positive and negative values indicate the same as in C. All values were calculated for the respective non-breeding (filled dots) and subsequent breeding (open dots) biological seasons of each year. In A, B and D, values of FSt were calculated using all the observed node values (solid line) or only those that were significantly different from random (segmented line). Note, however, that in D the differences between the two are so small that the lines overlap.
-Differences in statistical power of three algorithms as estimated by the relative discrepancy between the overall functional structure values obtained from all observed node values (FSt) and those obtained only from significant node values (FSt*). Box-plots show the distribution of relative discrepancy values [calculated as (FSt − FSt*) / FStr] obtained for all the non-breeding and breeding seasons (midline: median, box: 1st and 3rd quartiles, whiskers: range). Letters above the box-plots show whether differences of means were significant (a vs b: P D < 0.017), marginally significant (ab vs b: P D = 0.031), or non-significant (a vs a, or a vs ab: P D > 0.05) after randomization tests.
The study of functional structure in species assemblages emphasizes the detection of significant guild aggregation patterns. Thus, protocols based on intensive resampling of empirical data have been proposed to assess guild structure. Such protocols obtain the frequency distribution of a given functional similarity metric, and identify a threshold value (often the 95th percentile) beyond which clusters in a functional dendrogram are considered as significant guilds (using one-tailed tests). An alternative approach sequentially searches for significant differences between clusters at decreasing levels of similarity in a dendrogram until one is detected, then assumes that all subsequent nodes should also be significant. Nevertheless, these protocols do not test both the significance and sign of deviations from random at all levels of functional similarity within a dendrogram. Here, we propose a new bootstrapping approach that: (1) overcomes such pitfalls by performing two-tailed tests for each node in a dendrogram of functional similarity after separately determining their respective sample distributions, and (2) enables the quantification of the relative contribution of guild aggregation and functional divergence to the overall functional structure of the entire assemblage. We exemplify this approach by using long-term data on guild dynamics in a vertebrate predator assemblage of central Chile. Finally, we illustrate how the interpretation of functional structure is improved by applying this new approach to the data set available.
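The per-node two-tailed test described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the null values for a node are assumed to come from resampling the prey-use matrices (here replaced by seeded random numbers as a stand-in), the node metric is assumed to be a dissimilarity so that unusually high values indicate functional divergence and unusually low values indicate guild aggregation, and the names `quantile` and `classify_node` are hypothetical.

```python
import random


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def classify_node(observed, null_values, alpha=0.05):
    """Two-tailed test of one dendrogram node against its own null sample."""
    vals = sorted(null_values)
    lower = quantile(vals, alpha / 2)        # 2.5% threshold
    upper = quantile(vals, 1 - alpha / 2)    # 97.5% threshold
    if observed > upper:
        return "functional divergence"
    if observed < lower:
        return "guild aggregation"
    return "not significant"


# Stand-in null distribution for a single node (1000 resamples).
rng = random.Random(0)
null_sample = [rng.gauss(0.5, 0.05) for _ in range(1000)]

high = classify_node(0.9, null_sample)
low = classify_node(0.1, null_sample)
mid = classify_node(0.5, null_sample)
```

Because each node is compared with its own resampled distribution, both the significance and the sign of the deviation are available at every level of the dendrogram.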
 
Precious ecological information extracted from limnological long-term time series advances the theory on functioning and evolution of freshwater ecosystems. This paper presents results of applications of artificial neural networks (ANN) and evolutionary algorithms (EA) for ordination, clustering, forecasting and rule discovery of complex limnological time-series data of two distinctively different lakes. Ten years of data of the shallow and hypertrophic Lake Kasumigaura (Japan) are utilized in comparison with 13 years of data of the deep and mesotrophic Lake Soyang (Korea). Results demonstrate the potential that: (1) recurrent supervised ANN and EA facilitate 1-week-ahead forecasting of outbreaks of harmful algae or water quality changes, (2) EA discover explanatory rule sets for timing and abundance of outbreaks of harmful algal populations, and (3) non-supervised ANN provide clusters to unravel ecological relationships regarding seasons, water quality ranges and long-term environmental changes.
 
A graphical user interface is presented that allows users of taxonomic data to explore concept relationships between conflicting but related taxonomic classifications. Ecological analyses that use taxonomic metadata depend on accurate naming of specimens and taxa, and if the metadata involves several taxonomies, care has to be taken to match concepts between them. To perform this accurately requires expert-defined concept relationships, which are more complex yet more representative than the simple one-to-one mappings found through simple name matching, and can accommodate nomenclatural changes and differences in classification technique (cf ‘lumpers’ versus ‘splitters’). In the SEEK-Taxon (Scientific Environment for Ecological Knowledge) project we aim to help users of taxonomic datasets untangle and understand these relationships through a prototype visual interface which graphically displays these relationship structures, allowing users to comprehend such information and more accurately name their data.
 
Ecological patterns are difficult to extract directly from vegetation data. The respective surveys provide a high number of interrelated species occurrence variables. Since often only a limited number of ecological gradients determine species distributions, the data might be represented by far fewer but effectively independent variables. This can be achieved by reducing the dimensionality of the data. Conventional methods are either limited to linear feature extraction (e.g., principal component analysis, and Classical Multidimensional Scaling, CMDS) or require a priori assumptions on the intrinsic data dimensionality (e.g., Nonmetric Multidimensional Scaling, NMDS, and self-organizing maps, SOM).
 
We compared the ability of three machine learning algorithms (linear discriminant analysis, decision tree, and support vector machines) to automate the classification of calls of nine frog and three bird species. In addition, we tested two ways of characterizing each call to train/test the system. Calls were characterized with four standard call variables (minimum and maximum frequencies, call duration and maximum power) or eleven variables that included three standard call variables (minimum and maximum frequencies, call duration) and a coarse representation of call structure (frequency of maximum power in eight segments of the call). A total of 10,061 isolated calls were used to train/test the system. The average true positive rates for the three methods were: 94.95% for support vector machine (0.94% average false positive rate), 89.20% for decision tree (1.25% average false positive rate) and 71.45% for linear discriminant analysis (1.98% average false positive rate). There was no statistical difference in classification accuracy based on 4 or 11 call variables, but this efficient data reduction technique in conjunction with the high classification accuracy of the SVM is a promising combination for automated species identification by sound. By combining automated digital recording systems with our automated classification technique, we can greatly increase the temporal and spatial coverage of biodiversity data collection.
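The true positive and false positive rates reported above are per-species quantities derived from a confusion of true and predicted labels. A minimal sketch of that calculation follows, assuming one-vs-rest scoring for a single target species; the function name `tpr_fpr` and the toy labels are hypothetical and are not data from the study.

```python
def tpr_fpr(y_true, y_pred, target):
    """One-vs-rest true positive rate and false positive rate for `target`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    tn = sum(1 for t, p in pairs if t != target and p != target)
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr


# Toy example: eight calls, two species (labels are made up).
truth = ["frog_a"] * 4 + ["bird_b"] * 4
predicted = ["frog_a", "frog_a", "frog_a", "bird_b",
             "bird_b", "bird_b", "bird_b", "frog_a"]
frog_tpr, frog_fpr = tpr_fpr(truth, predicted, "frog_a")
```

Averaging these per-species rates over all twelve species would give figures comparable in kind to the averages quoted in the abstract.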
 
The Global Positioning System (GPS) has been increasingly used during the past decade to monitor the movements of free-ranging animals. This technology allows fitted animals to be relocated automatically, which often results in high-frequency sampling of their trajectory during the study period. However, depending on the objective, trajectory analysis may quickly become difficult due to the lack of well-designed computer programs. For example, the trajectory may be composed of several “parts” corresponding to different behaviours of the animal, and the aim of the analysis could be to identify the different parts, and thereby the different activities, based on the properties of the trajectory. This complex task needs to be performed in a flexible computing environment, to facilitate exploratory analysis of its properties. In this paper, we present a new class of object for the R software, the class “ltraj” included in the package adehabitat, allowing the analysis of animals' trajectories. We developed this class of data after an extensive review of the literature on the analysis of animal movements. This class of data facilitates the computation of descriptive parameters of the trajectory (such as the relative angles between successive moves, distance between successive relocations, etc.), graphical exploration of these parameters, as well as numerous tests and analyses developed in the literature (first passage time, trajectory partitioning, etc.). Finally, this package also contains numerous examples of animal trajectories, and a working example illustrating the use of the package.
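Two of the descriptive parameters mentioned above, the distance between successive relocations and the relative angle between successive moves, can be computed from planar coordinates as in the sketch below. It is written in Python for illustration and does not reproduce the adehabitat or “ltraj” interface; the function name `steps_and_turns` and the sample coordinates are hypothetical, and coordinates are assumed to be projected (metric) rather than latitude and longitude.

```python
import math


def steps_and_turns(points):
    """Return step lengths and relative (turning) angles of a trajectory.

    points: list of (x, y) relocations in chronological order.
    Turning angles are in radians, wrapped to the range -pi to pi.
    """
    steps, headings = [], []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        steps.append(math.hypot(dx, dy))      # distance between relocations
        headings.append(math.atan2(dy, dx))   # absolute direction of the move
    turns = []
    for h0, h1 in zip(headings, headings[1:]):
        d = h1 - h0
        # wrap the heading difference so left/right turns are symmetric
        turns.append(math.atan2(math.sin(d), math.cos(d)))
    return steps, turns


# Hypothetical relocations (metres).
relocations = [(0.0, 0.0), (3.0, 4.0), (3.0, 9.0), (-2.0, 9.0)]
step_lengths, turning_angles = steps_and_turns(relocations)
```

A trajectory with n relocations yields n − 1 step lengths and n − 2 turning angles, which is worth keeping in mind when aligning these series with timestamps.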
 
We describe a semantic data validation tool that is capable of observing incoming real-time sensor data and performing reasoning against a set of rules specific to the scientific domain to which the data belongs. Our software solution can produce a variety of different outcomes when a data anomaly or unexpected event is detected, ranging from simple flagging of data points, to data augmentation, to validation of proposed hypotheses that could explain the phenomenon. Hosted on the Jena Semantic Web Framework, the tool is completely domain-agnostic and is made domain-aware by reference to an ontology and Knowledge Base (KB) that together describe the key resources of the system being observed. The KB comprises ontologies for the sensor packages and for the domain; historical data from the network; concepts designed to guide discovery of internet resources unavailable in the local KB but relevant to reasoning about the anomaly; and a set of rules that represent domain expert knowledge of constraints on data from different kinds of instruments as well as rules that relate types of ecosystem events to properties of the ecosystem. We describe an instance of such a system that includes a sensor ontology, some rules describing coastal storm events and their consequences, and how we relate local data to external resources. We describe in some detail how a specific actual event—an unusually high chlorophyll reading—can be deduced by machine reasoning to be consistent with being caused by benthic diatom resuspension, consistent with being caused by an algal bloom, or both.
 
Ecological systems are governed by complex interactions which are mainly nonlinear. To capture this complexity and nonlinearity, statistical models have recently gained popularity. However, although these models are commonly applied in ecology, no studies to date have aimed to assess their applicability and performance. We provide an overview of the nature of a wide range of data sets and predictive variables, from both aquatic and terrestrial ecosystems with different scales of time-dependent dynamics, and assess the applicability and robustness of predictive modeling methods on such data sets by comparing different statistical modeling approaches. The methods considered are k-NN, LDA, QDA, generalized linear models (GLM), feedforward multilayer backpropagation networks and the pseudo-supervised network ARTMAP. For ecosystems involving time-dependent dynamics and periodicities whose frequencies are possibly less than the time scale of the data considered, GLM and connectionist neural network models appear to be most suitable and robust, provided that a predictive variable reflecting these time-dependent dynamics is included in the model either implicitly or explicitly. For spatial data that do not include any time-dependence comparable to the time scale covered by the data, on the other hand, neighborhood-based methods such as k-NN and ARTMAP proved to be more robust than the other methods considered in this study. In addition, for predictive modeling purposes, a suitable, computationally inexpensive method should be applied to the problem at hand first; if it performs well, the computational cost and effort associated with complex variants become unnecessary.
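As an example of the computationally inexpensive, neighborhood-based end of this comparison, a k-NN classifier fits in a few lines. The sketch below assumes Euclidean distance and majority voting, which the abstract does not specify; the function name `knn_predict` and the toy presence/absence data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points (Euclidean distance)."""
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Toy sites described by two environmental variables (made-up values).
sites = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
         (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
status = ["absent", "absent", "absent", "present", "present", "present"]

near_origin = knn_predict(sites, status, (0.5, 0.5))
near_cluster = knn_predict(sites, status, (5.5, 5.5))
```

In practice the predictor variables would usually be standardized first, since Euclidean distance is sensitive to differences in scale.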
 
Lake Tuendae is a shallow, alkaline, artificially constructed Mojave Desert aquatic environment housing the endangered Mojave Tui Chub (Gila bicolor mohavensis). Detailed physiological response studies have been reported on the Mojave Tui Chub but few on the physico-chemical state of Lake Tuendae, one of four key Mojave Desert habitats for this species. Two sampling campaigns (spring 2004 and 2005) were conducted, and correlation analysis, cluster analysis (CA) and principal component analysis (PCA) were performed on physico-chemical water column and surface sediment quality parameters. CA proved useful in displaying parameter similarity for initial interpretation. PCA proved to be a more reliable display model and permitted the reduction of 14 parameters for the water column to four principal components accounting for 71% of the total variability. For surface sediments, four principal components accounted for 81% of the total variability. This work highlights the successful use of chemometric multivariate techniques in helping elucidate the physico-chemical make-up of shallow desert aquatic environments, and is instructive for investigators assessing the health of aquatic species in such habitats.
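Statements such as "four principal components accounting for 71% of the total variability" come from the eigenvalues of the covariance or correlation matrix: the share explained by the first k components is the sum of the k largest eigenvalues divided by the sum of all of them. The sketch below shows only that arithmetic; the eigenvalues are made up for illustration and are not the Lake Tuendae results, and the function name `cumulative_explained` is hypothetical.

```python
def cumulative_explained(eigenvalues):
    """Cumulative fraction of total variance explained by the leading
    principal components, given the eigenvalues of the PCA matrix."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    fractions, running = [], 0.0
    for v in vals:
        running += v
        fractions.append(running / total)
    return fractions


# Made-up eigenvalues for a five-variable correlation matrix (sum = 5).
example_eigenvalues = [2.0, 1.5, 1.0, 0.3, 0.2]
explained = cumulative_explained(example_eigenvalues)
```

Reading off the first entry of `explained` that exceeds a chosen threshold is one common way to decide how many components to retain.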
 
The dynamics of the dissolved oxygen in water bodies is the result of complex interactions involving physical and biological processes. Understanding how the balance of these influences determines the amount of oxygen available for living organisms is a key factor to interpret the water body conditions, and eventually to use dissolved oxygen as an indicator of the water quality. In this paper we present a Qualitative Reasoning model developed to improve understanding of changes in the amount of dissolved oxygen in different segments of the river Mesta in Bulgaria. Effects on dissolved oxygen result from changes in physical, chemical and biological processes induced both by natural and anthropogenic activities within the watershed. To explore the possibility of establishing a landmark value that may change according to specific conditions, we developed the concept of flexible value mapping, which dynamically captures changes in the dependencies between the landmark value and the values of other quantities as the conditions of the system change during the simulations. The paper also discusses the concept of dominance of a specific process over other competing processes affecting a quantity. With the model described here, we aim to discuss possible solutions to interesting modelling problems and to provide the community of ecological modellers support for educational activities and water resources management.
 
– Description of the 33 environmental variables used to model EPT richness 
Map of the study area and sample sites (n = 195) in the Pacific Northwest and Southern Rocky Mountain regions of the United States. These sites are spatially distributed across 9 level-III ecoregions, including Southern Rockies (n = 53), Colorado Plateau (n = 5), Columbia Plateau (n = 9), Willamette Valley (n = 4), Coast Range (n = 86), Klamath Mountains (n = 4), Cascades (n = 16), Eastern Cascades Slopes and Foothills (n = 10) and North Cascades (n = 8). Symbol sizes are proportional to EPT taxa richness, i.e., larger symbols represent larger EPT richness.
The field of ecoinformatics is concerned with gaining a greater understanding of complex ecological systems. Many ecoinformatic tools, including artificial neural networks (ANNs), can provide important insights into the complexities of ecological data through pattern recognition and prediction; however, we argue that ecological knowledge has been used in a very limited fashion to shape the manner in which these approaches are applied. The present study provides a simple example of using ecological theory to better direct the use of neural networks to address a fundamental question in aquatic ecology: how are local stream macroinvertebrate communities structured by a hierarchy of environmental factors operating at multiple spatial scales? Using data for 195 sites in the western United States, we developed single-scale, multi-scale and hierarchical multi-scale neural networks relating EPT (Orders: Ephemeroptera, Plecoptera, Trichoptera) richness to environmental variables quantified at 3 spatial scales: entire watershed, valley bottom (100s–1000s m), and local stream reach (10s–100s m). Results showed that models based on multiple spatial scales greatly outperformed single-scale analyses (R = 0.74 vs. R̄ = 0.51) and that a hierarchical ANN, which accounts for the fact that valley- and watershed-scale drivers influence local characteristics of the stream reach, provided greater insight into how environmental factors interact across nested spatial scales than did the non-hierarchical multi-scale model. Our analysis suggests that watershed drivers play a greater role in structuring local macroinvertebrate assemblages via their direct effects on local-scale habitats, whereas they play a much smaller indirect role through their influence on valley-scale characteristics. 
For the hierarchical model, the strongest predictors of EPT richness included descriptors of climate, land-use and hydrology at the watershed scale, land-use at the valley scale, and substrate characteristics and riparian cover at the reach scale. In summary, our results highlight the importance of incorporating environmental hierarchies to better understand and predict local patterns of macroinvertebrate assemblage structure in stream ecosystems. More generally, our case study serves to emphasize how incorporating prior ecological knowledge into ANN model structure can strengthen the relevance of ecoinformatic techniques for the broader scientific community.
 
Fuzzy importance matrices from CCA results.
From Table 1 results, projections of effects of all environmental variables across rows.
In a global assessment, canonical correspondence analysis (CCA) and partial CCA were used to ordinate Lake Huron phytoplankton abundances from June and August 1991 and environmental variables. June taxa were associated with NO3 and chloride, while August taxa were associated with SiO2 and temperature, and to some degree, with TSP and NH3. Dominant taxa were Asterionella formosa, Fragilaria capucina, Fragilaria crotonensis, Tabellaria fenestrata, and Urosolenia eriensis in June, and Achnanthidium minutissimum, Cyclotella #6, Cyclotella comensis, Cyclotella michiganiana, and Cyclotella pseudostelligera in August reflecting seasonal change. From local analysis using results from CCA and partial CCA in fuzzy relational analysis, A. minutissimum and C. comensis were influential in June, while F. crotonensis was influential in August. From linguistic translation and trophic status assignment, F. capucina and T. fenestrata indicated eutrophy, A. formosa indicated mesotrophy, C. pseudostelligera indicated mesotrophy–eutrophy, F. crotonensis and U. eriensis indicated oligotrophy–eutrophy, Cyclotella #6 indicated oligotrophy–mesotrophy, and C. michiganiana indicated oligotrophy. A linguistic solution with respect to trophic status is useful for policy makers and others interested in understanding water quality and ways to develop decisions about remediation.
 
Multivariate statistical analysis is a powerful method of examining complex datasets, such as species assemblages, that does not suffer from the oversimplification prevalent in many univariate analyses. However, identifying whether data points on a multivariate plot are clustered is subjective, as there is no determination of significant differences between the points and no indication of the level of confidence in those points. The validity of drawing such conclusions may therefore be considered suspect. This paper describes a method of bootstrapping calculated principal components to estimate a confidence radius, similar to confidence intervals in univariate techniques. Plotting 3D scatterplots of the principal components, with the size of the spherical point representative of the level of confidence of the estimate, gives a clear and visual indication of significant difference between the points — where the spheres overlap there is no significant difference. We apply the technique to mammal assemblages at sites in Epping Forest (Essex, UK) that differ in the level of disturbance present and find that differences between some sites that appear large using traditional principal components analysis are actually not significantly different at the 95% confidence level, while other sites do differ significantly. Sites that differ most in anthropogenic disturbance are not significantly different in terms of assemblage structure.
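The bootstrapped confidence radius described above can be illustrated with a minimal sketch. It assumes each site has several replicate samples (rows) of species counts, that the principal axes are fitted once on the pooled data, and that the radius is the chosen quantile of the distances between bootstrap mean scores and the observed mean score. These choices, and the function names `pca_axes`, `confidence_sphere` and `spheres_overlap`, are illustrative assumptions rather than the procedure used in the paper.

```python
import numpy as np

def pca_axes(X, n_components=3):
    """Return the column means and leading principal axes of X (rows = samples)."""
    mean = X.mean(axis=0)
    # SVD of the centred data: the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def confidence_sphere(samples, mean, axes, n_boot=1000, level=0.95, rng=None):
    """Centre and bootstrap confidence radius of one site's mean PC score.

    Assumption: the radius is the `level` quantile of the distances between
    bootstrap-resampled mean scores and the observed mean score.
    """
    rng = np.random.default_rng(rng)
    scores = (samples - mean) @ axes.T           # project samples onto the PCs
    centre = scores.mean(axis=0)
    n = len(scores)
    idx = rng.integers(0, n, size=(n_boot, n))   # resample rows with replacement
    boot_means = scores[idx].mean(axis=1)        # shape: (n_boot, n_components)
    dists = np.linalg.norm(boot_means - centre, axis=1)
    return centre, float(np.quantile(dists, level))

def spheres_overlap(c1, r1, c2, r2):
    """True when two confidence spheres overlap (no significant difference)."""
    return bool(np.linalg.norm(c1 - c2) <= r1 + r2)
```

In a 3D scatterplot, each site would then be drawn as a sphere at `centre` with the returned radius, and two sites would be read as significantly different only when `spheres_overlap` is false.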
 
This paper presents a new statistical technique, Bayesian Generalized Associative Functional Networks (GAFNs), for modeling the dynamic growth process of greenhouse crops. GAFNs can incorporate both domain knowledge and data to model complex ecosystems. By combining functional networks with a Bayesian framework, prior knowledge can be naturally embedded into the model, and the functional relationship between inputs and outputs can be learned during training. Our main interest is in GAFNs, which are appropriate for modeling multiple-variable processes. Applying Bayesian GAFN methods to the dynamic process of plant growth offers three main advantages. First, the approach provides a powerful tool for revealing useful relationships between greenhouse environmental factors and plant growth parameters. Second, Bayesian GAFNs can model Multiple-Input Multiple-Output (MIMO) systems from the given data and generalize well: a single final model successfully fitted all 12 data sets from 5 years of field experiments. Third, the Bayesian GAFN method can also serve as an optimization tool for estimating parameters of interest in the agro-ecosystem. In this work, two algorithms are proposed for statistical inference of the parameters in GAFNs. Both are based on variational inference, also called variational Bayes (VB), which can provide probabilistic interpretations of the resulting models. VB-based learning methods yield estimates of the full posterior probability of the model parameters. Synthetic and real-world examples confirm the validity of the proposed methods.
 
Coalbed Natural Gas extraction usually results in the production of excess, or product, water, necessitating a strategy for disposal and for minimizing landscape and habitat impacts. In the Powder River Basin in Wyoming, product water is discharged either into ephemeral streams or into retention/detention ponds. Monitoring these water bodies is important from environmental, habitat, and human health perspectives. This study assessed the benefits of using higher spatial resolution ASTER imagery, in contrast to the more commonly used moderate-resolution Landsat imagery, for detecting smaller water bodies in the Powder River Basin. ASTER and Landsat Thematic Mapper (TM) images were acquired concomitantly and classified following similar methods to identify water bodies for three color classes and a range of sizes. Results showed that the ASTER image had significantly higher accuracies for detecting clear and green colored water bodies, but did not demonstrate significant improvement for detecting turbid water bodies. ASTER also showed significant improvements in detecting small-scale water bodies. However, this improved performance was somewhat offset by the misclassification of other landscape elements as water in the ASTER image. Overall, compared with the Landsat TM image, the ASTER image detected more water bodies more accurately, especially those with a relatively small surface area, with the two images producing similar results at large scales. The application of ASTER is therefore appropriate for monitoring and evaluation of water bodies in the Powder River Basin and elsewhere.
 
The regulation of river flow, mainly for flood control in the wet season and water supply in the dry season, has dramatically altered downstream hydrological regimes and thus imposed significant impacts on aquatic ecosystems. The evolution of riparian vegetation is an important indicator for quantifying these impacts. This research focuses on understanding the vegetation dynamics and succession of riparian zones under flow regulation by reservoir operation. The study developed an integrated model that couples a two-dimensional hydrodynamics module with a vegetation evolution module. Because of its ability to represent spatial heterogeneity and local interactions, a cellular automata approach was applied in the vegetation module. To describe the complex morphology and topography more precisely, and to improve computational efficiency, an unstructured cellular automata (UCA) scheme implemented on a triangular mesh was used. The developed model was applied to a typical compound channel of the Lijiang River, which has been strongly affected by the flow regulation of the Qingshitan reservoir for navigation purposes. The model was calibrated using historical vegetation data, field observations, and data from controlled experiments. Through scenario simulations, the effects of flow regulation on riparian vegetation dynamics were analyzed. In addition, the potential of UCA for riparian vegetation modelling was explored.
 
Dependence of radar backscatter on biomass. L-band refers to the sensor wavelength (approximately 23 cm), and hv-polarization to the plane of propagation (horizontal transmit, vertical receive) of the radar waves. The plotted curves show the positive relationship between field-measured biomass and radar backscatter plus coefficient of determination (r²). The curves also show that different structural types (e.g. aspens, pines, etc.) exhibit somewhat different forms of the relationship. Extinction (leveling off or decrease in backscatter) may occur at very high biomass quantities of the particular forest structural type and varies by type.
Michigan Forests Test Site (MFTS). The inset map shows the location of the study site in the upper Great Lakes region, USA. The background map is derived from Landsat data used in this study and depicts general land-cover types of the region. The outlined rectangle corresponds to the path and swath of the radar sensor and to the test site boundaries.
Images used or created through this study: a) vegetation type layer, b) majority layer, c) variety layer, d) biomass layer, e) output composite habitat map for red-eyed vireo modeled with vegetation and biomass.
The goal of this study was to evaluate the contributions of forest and landscape structure derived from remote sensing instruments to habitat mapping. Our empirical data focused at the landscape scale on a test site in northern Michigan, using radar and Landsat imagery and bird-presence data by species. We tested the contributions of multi-dimensional forest and landscape structure variables using GARP (Genetic Algorithm for Rule-Set Production), a representative modeling methodology used in biodiversity informatics. For our multi-dimensional variables, radar data were processed to derive forest biomass maps and these data were used with a Landsat-derived vegetation type classification and spatial neighborhood analyses. We collected field data on bird species presence and habitat for northern forest birds known to have a range of vegetation habitat requirements. We modeled and tested the relationships between bird presence and 1) vegetation type, 2) vegetation type and spatial neighborhood descriptions, 3) vegetation type and biomass, and 4) all variables together, using GARP, for three bird species. Modeled results showed that inclusion of biomass or neighborhoods improved the accuracy of bird habitat prediction over vegetation type alone, and that the inclusion of neighborhoods and biomass together generally produced the greatest improvement. The maps and model rules resulting from the multiple factor models were interpreted to be more precise depictions of a particular species habitat when compared with the models that used vegetation type only. We suggest that for bird species whose niche requirements include forest and landscape structure, inclusion of multi-dimensional information may be advantageous in habitat modeling at the landscape level. 
Further research should focus on testing additional variables and species, on further integration of newer radar and lidar remote sensing capabilities with multi-spectral sensors for quantifying forest and landscape multi-dimensional structure, and incorporating these in biodiversity informatics modeling.
 
Large-scale spatial planning requires careful use and presentation of spatial data as it provides a means for communication with local stakeholders and decision makers. This is especially true for endangered species, such as the badger (Meles meles) in the Netherlands. To effectively mitigate the badger's traffic mortality in an area, two types of tools are needed. The first one estimates the probability of a successful road crossing for individual animals. The second tool is GIS-based and not only models the movement patterns of animals but also estimates an animal's daily number of road crossings. With data on population size as well as on road and traffic characteristics, a combination of both tools provides a measure of the mortality risk roads pose to wildlife in an area. Such estimations proved to be invaluable in a planning process with local inhabitants in the municipality of Brummen (the Netherlands), where ecological as well as safety problems appear. Our study demonstrates the applicability of GIS tools in balancing ecological consequences of road network options with a different distribution of traffic flows over the area in spatial planning and ecology.
 
In this study, stream modifications were surveyed to discover the relationships between geographical characteristics, human population distribution, and artificial stream alteration in the Nakdong River system, South Korea. Prior to this study, there was no comprehensive survey of stream modifications in the Nakdong River basin, even though the utilization of its water resources and ecosystem is recognized as an important issue. A total of 1655 stream sites were investigated by applying the Stream Modification Index (SMI), which consists of 12 parameters grouped into three characteristic factors (channels, land use, and levees), each containing four parameters. The parameters were dichotomous (i.e. scored as 0 or 1), and a higher sum of the 12 parameter values (the SMI score) indicates a more modified state (maximum 12, minimum 0). These data were averaged over the 265 unit catchments in the Nakdong River basin and compared with the population density, seven land-cover categories, elevation, and slope of each unit catchment to discover general patterns of stream modification in the river basin by applying a self-organizing map (SOM). Survey scores tended to increase in unit catchments with urbanized areas and high population density, and significant Spearman rank correlation coefficients were obtained for these relationships. However, although the statistical analysis showed significance, the relationship between survey results and socio-geographical information remained unclear.
The SOM clustered the 265 unit catchments into four groups on a 9 × 6 map (quantization error, 0.329; topographic error, 0.000): catchments where streams were largely modified due to urbanization (cluster 4), catchments relatively well preserved due to high elevation (cluster 2), catchments moderately modified due to agricultural land cover along the main channel of the Nakdong River (cluster 1), and the remaining catchments with relatively moderately modified streams (cluster 3). The degree of modification represented by the index scores was relatively high in catchments in highly urbanized areas with high human population density, whereas little stream modification occurred in relatively elevated and forested areas. The results of this study provide evidence of the general tendency of artificial stream utilization and also demonstrate the efficiency of SOM application for basin-level characterization.
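The SMI scoring and catchment averaging described in this abstract amount to simple arithmetic, sketched below. The factor keys (`channel`, `land_use`, `levee`), the dictionary layout of a site, and the function names are hypothetical; the abstract specifies only that there are three factors of four dichotomous parameters each and that scores are averaged per unit catchment.

```python
from collections import defaultdict

# Hypothetical keys for the three characteristic factors named in the abstract.
FACTORS = ("channel", "land_use", "levee")
PARAMS_PER_FACTOR = 4

def smi_score(site):
    """Sum the 12 dichotomous (0/1) parameters of one site; the result is 0..12."""
    total = 0
    for factor in FACTORS:
        values = site[factor]
        if len(values) != PARAMS_PER_FACTOR or any(v not in (0, 1) for v in values):
            raise ValueError(f"{factor}: expected {PARAMS_PER_FACTOR} values of 0 or 1")
        total += sum(values)
    return total

def mean_score_by_catchment(sites):
    """Average SMI scores over sites that share a unit-catchment id.

    `sites` is an iterable of (catchment_id, site) pairs.
    """
    scores = defaultdict(list)
    for catchment_id, site in sites:
        scores[catchment_id].append(smi_score(site))
    return {cid: sum(vals) / len(vals) for cid, vals in scores.items()}
```

The per-catchment means produced this way are the kind of input that would then be combined with population density, land cover, elevation, and slope for the SOM analysis.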
 
Emerging ecological time series from long-term ecological studies and remote sensing provide excellent opportunities for ecologists to study the dynamic patterns and governing processes of ecological systems. However, signal extraction from long-term time series often requires system learning (e.g., estimation of true system state) to process the large amount of information, to reconstruct system state, to account for measurement error, and to handle missing data. State-space models (SSMs) are a natural choice for these tasks and thus have received increasing attention in ecological and environmental studies. Data-based learning using SSMs that connect ecological processes to the measurement of system state becomes a useful technique in the ecological informatics toolkit. The present study illustrates the use of the Kalman filter (KF), an estimator of SSMs, with case studies of population dynamics. The examples of the SSM applications include the reconstruction of system state using the KF method and Markov chain Monte Carlo methods, estimation of measurement-error variances in the estimates of animal population abundance using basic structural models (BSMs), and estimation of missing values using the KF and Kalman smoother. Estimation of measurement-error variances by BSMs does not require knowledge of the functional form that generates the time series data. Instead, BSMs approximate the trajectory or deterministic skeleton of a system dynamics in a semi-parametric fashion, and provide a robust estimator of measurement-error variances. The present study also compares Bayesian SSMs with non-Bayesian SSMs. The joint use of the KF method or its extensions and Markov chain Monte Carlo (MCMC) methods is a promising approach to the parameter estimation of SSMs.
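To make the role of the Kalman filter concrete, the following minimal sketch filters a scalar local-level (random walk plus noise) state-space model and handles missing observations by skipping the update step. The model form, the parameter names `q` (process variance) and `r` (measurement-error variance), and the function name are assumptions chosen for illustration; the case studies in the paper may use different model structures.

```python
def kalman_filter_local_level(ys, q, r, x0=0.0, p0=1e6):
    """Filtered state means and variances for the model
        x_t = x_{t-1} + w_t,  w_t ~ N(0, q)   (state)
        y_t = x_t + v_t,      v_t ~ N(0, r)   (measurement)
    Entries of `ys` equal to None are treated as missing observations.
    """
    x, p = x0, p0
    means, variances = [], []
    for y in ys:
        # Predict: the random-walk mean is unchanged, the variance grows by q.
        p = p + q
        if y is not None:
            k = p / (p + r)        # Kalman gain
            x = x + k * (y - x)    # update the mean with the innovation
            p = (1.0 - k) * p      # update the variance
        # For a missing value the prediction is carried forward unchanged.
        means.append(x)
        variances.append(p)
    return means, variances
```

At a missing time step the returned mean is the one-step prediction, and its variance is larger than at the preceding observed step whenever `q` is positive, which reflects the extra uncertainty of an unobserved state.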
 
Beta diversity represents a powerful indicator of ecological conditions because of its intrinsic relation with environmental gradients. In this view, remote sensing may be profitably used to derive models characterizing or estimating species turnover over an area. While several examples exist using spectral variability to estimate species diversity at several spatial scales, most of these have relied on standard correlation or regression approaches, such as the common Ordinary Least Squares (OLS) regression, which are problematic with noisy data. Moreover, very few attempts have been made to derive beta diversity characterization models at different taxonomic ranks. In this paper, we performed quantile regression to test whether spectral distance represents a good proxy of beta diversity considering different data thresholds and taxonomic ranks. We used plant distribution data from North and South Carolina, including 146 counties and covering a variety of vegetation formations. The dissimilarity in species composition at different taxonomic ranks (using Sørensen distance) among pairs of counties was compared with their distance in NDVI values derived from 23 yearly MODIS images. Our results indicate that (i) spectral variability represents a good proxy of beta diversity when appropriate statistics are applied and (ii) a lower taxonomic rank is important when changes in species composition are examined spatially using remotely sensed data.
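The pairwise Sørensen distance used here as the beta diversity measure can be sketched as follows, using the standard presence/absence definition 1 − 2|A ∩ B| / (|A| + |B|). The dictionary layout mapping a county to its set of taxa and the function names are illustrative assumptions.

```python
from itertools import combinations

def sorensen_distance(a, b):
    """Sørensen dissimilarity between two taxon lists (presence/absence)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty lists are treated as identical
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))

def pairwise_distances(taxa_by_county):
    """Sørensen distance for every unordered pair of counties."""
    return {
        (i, j): sorensen_distance(taxa_by_county[i], taxa_by_county[j])
        for i, j in combinations(sorted(taxa_by_county), 2)
    }
```

Running the same function on lists of genera or families instead of species gives the distances at other taxonomic ranks, which can then be compared with the corresponding pairwise NDVI distances.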
 
Biodiversity and ecosystem data are both geo-referenced and “species-referenced”. Ecoregion classification systems are relevant to basic ecological research and have been increasingly used for making policy and management decisions. There are practical needs to integrate taxonomic data with ecoregion data in a GIS to visualize and explore species distribution conveniently. In this study, we represent the species distributed in an ecoregion as a taxonomic tree and extend the classic GIS data model to incorporate operations on taxonomic trees. A prototype called GBD-Explorer was developed on top of the open source JUMP GIS. We use the World Wildlife Fund (WWF) terrestrial ecoregion and WildFinder species databases as an example to demonstrate the rich capabilities implemented in the prototype.
 
Advances in technology have enabled new approaches for sensing the environment and collecting data about the world. Once collected, sensor readings can be assembled into data streams and transmitted over computer networks for storage and processing at observatories or to evoke an immediate response from an autonomic computer system. However, such automated collection of sensor data produces an immense quantity of data that is time consuming to organize, search and distill into meaningful information. In this paper, we explore the design and use of distributed pipelines for automated processing of sensor data streams. In particular, we focus on the detection and extraction of meaningful sequences, called ensembles, from acoustic data streamed from natural areas. Our goal is automated detection and classification of various species of birds.
 
Average γ values for 1000 simulated tunnels based on values for C. formosanus where 0.0 < Pbranch < 0.4 and 0.1 < Pterm < 0.9. The yellow box indicates the value of γ (Pterm = 0.43 and Pbranch = 0.1) for empirical tunnel patterns. The normalized foraging efficiency was represented as the degree of gray color. Brighter gray means higher foraging efficiency. Four typical tunnel patterns were represented for three regions, I, II, and III.
In our previous study, we constructed a lattice model of termite tunnel patterns to explore the relationship between tunnel geometry and foraging efficiency. The model was based on experimental data obtained from homogeneous soil substrates without food resources. In the present study, we adopted a more general rule in the model to determine the lengths of branching tunnels. The rule is described by two variables: the probability of tunnel branching, Pbranch, and the probability that a branching tunnel terminates, Pterm. With the modified model, we explored the influence of branching-tunnel geometry on foraging efficiency, γ, for two termite species, Coptotermes formosanus and Reticulitermes flavipes (Isoptera: Rhinotermitidae). For C. formosanus, the γ map over the two variables was partitioned into three regions according to the level of γ, whereas for R. flavipes it was divided into two regions: higher γ and lower γ. This result is discussed in terms of termite foraging strategy.
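One way to read the two-variable rule is sketched below. It assumes that a branch may start at each lattice step of a primary tunnel with probability Pbranch, and that a branch grows one lattice cell at a time, stopping after each cell with probability Pterm. This interpretation, and the function name and arguments, are assumptions made for illustration; the abstract does not give the exact update rule of the lattice model.

```python
import random

def simulate_branch_lengths(n_steps, p_branch, p_term, seed=None):
    """Lengths (in lattice cells) of the branches formed along one primary tunnel.

    Assumed rule: at each of `n_steps` cells a branch starts with probability
    `p_branch`; the branch then grows cell by cell and terminates after each
    cell with probability `p_term`.
    """
    if not 0.0 < p_term <= 1.0:
        raise ValueError("p_term must be in (0, 1] so that every branch terminates")
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_steps):
        if rng.random() < p_branch:
            length = 1
            while rng.random() >= p_term:  # the branch keeps growing
                length += 1
            lengths.append(length)
    return lengths
```

Under this assumed rule, branch lengths follow a geometric distribution with mean 1/Pterm, so a lower Pterm gives longer branches and a higher Pbranch gives more of them, which are the two axes of the γ map.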
 
Successful transfer and uptake of qualitative reasoning technology for modelling and simulation in a variety of domains has been hampered by the lack of a structured methodology to support formalisation of ideas. We present a framework that structures and supports the capture of conceptual knowledge about system behaviour using a qualitative reasoning approach. This framework defines a protocol for representing content that supports the development of a conceptual understanding of systems and how they behave. The framework supports modellers in two ways. First, it structures and explicates the work involved in building models. Second, it facilitates easier comparison and evaluation of intermediate and final results of modelling efforts. We show how this framework has been used in developing qualitative reasoning models about three case studies of sustainable development in different river systems.
 
Our paper computationally explores the extinction dynamics of an animal species affected by a sudden spike in mortality due to an extreme event. In our study, the animal species has a 2-stage life cycle and is endowed with a high survival probability under normal circumstances. Our proposed approach does not involve any restraining assumptions concerning environmental variables or predator–prey relationships. Rather it is based on the simple premise that if observed on a year-to-year basis, the population size will be noted to either have gone up or come down as compared to last year. The conceptualization is borrowed from the theory of asset pricing in stochastic finance. The survival probability (λ) is set at unity, i.e. the model assumes that all young members of the population mature into adults capable of reproduction. As we bias our model heavily in favor of survival, the chance of the population size increasing over time is much higher than it suffering a decline, if no extreme events occur. One of the critical parameters in our simulation model is the shock size, i.e. the number of immediate mortalities that may be caused by an extreme event. We run our model for two pre-selected fecundity levels denoted as “high” and “low”. Under each of the two fecundity levels one hundred independent simulation runs are conducted over a time period of ten stages (i.e. five generations) and the relevant descriptive statistics are reported for the terminal (i.e. the fifth) generation. Shock sizes are varied until at least one scenario of total extinction is observed in the simulation output. Any extinctions occurring in t0 are treated as “trivial cases” and not counted. Our results indicate that an extreme event with a minimum shock size exceeding 2/3 the size of the pristine population can potentially drive animal species with 2-stage life cycles to extinction for both “low” and “high” fecundity levels.
 
The exact area calculation of irregularly distributed data is the focus of all territorial geochemical balancing methods and of the definition of protection zones. Especially in the deep-sea environment, the interpolation of measurements into surfaces represents an important gain of information because data acquisition is cost- and time-intensive. The geostatistical interpolation method of indicator kriging is therefore applied for an accurate mapping of the spatial distribution of benthic communities following a categorical classification scheme at the deep-sea submarine Håkon Mosby Mud Volcano. Georeferenced video mosaics were obtained during several dives by the Remotely Operated Vehicle Victor6000 at a water depth of 1260 m. Mud volcanoes are considered significant source locations for methane, indicated by unique chemoautotrophic communities such as Beggiatoa mats and pogonophoran tube worms. To detect and quantify their spatial distribution, 2840 georeferenced video mosaics were analysed by visual inspection. Polygons digitised on the georeferenced images within a GIS form the data basis for geostatistically interpolated mono-parametric surface maps. Indicator kriging is applied to the centroids of the polygons to calculate the surface maps. The quality of the surface maps is assessed by leave-one-out cross-validation, evaluating the fit of the indicator kriging variograms using statistical mean values. Furthermore, the estimate was evaluated with a validation dataset from the visual inspection of 530 video mosaics not included in the interpolation, thus testing the interpolated surfaces independently. Both validation mechanisms gave satisfactory results, and we provided each category used to identify biogeochemical habitats with a percentage probability of occurrence.
 
Predictive modeling and mapping based on the quantitative relationships between a species and the biophysical features (predictor variables) of the ecosystem in which it occurs can provide fundamental information for developing sustainable resource management policies for species and ecosystems. To create management strategies with the goal of sustaining a species such as sage grouse (Centrocercus urophasianus), whose distribution throughout North America has declined by approximately 50%, land management agencies need to know what attributes of the range they now inhabit will keep populations sustainable and which attributes attract disproportionate levels of use within a home range. The objectives of this study were to 1) quantify the relationships between sage grouse nest-site locations and a set of associated biophysical attributes using Maximum Entropy, 2) find the best subset of predictor variables that explain the data adequately, 3) create quantitative sage grouse distribution maps representing the relative likelihood of nest-site habitat based on those relationships, and 4) evaluate the implications of the results for future management of sage grouse. Nest-site location data from 1995 to 2003 were collected as part of a long-term research program on sage grouse reproductive ecology at Hart Mountain National Antelope Refuge. Two types of models were created: 1) with a set of predictor variables derived from digital elevation models, a field-validated vegetation classification, and UTM coordinates and 2) with the same predictors but with UTM coordinates excluded. East UTM emerged as the most important predictor variable in the first type of model, followed by the vegetation classification, which was the most important predictor in the second type of model. The average training gain from ten modeling runs using all presence records and randomized background points was used to select the best subset of predictors.
A predictive map of sage grouse nest-site habitat created from the application of the model to the study area showed strong overlap between model predictions and nest-site locations.
 
This paper presents a hybrid evolutionary algorithm (HEA) to discover complex rule sets predicting the concentration of chlorophyll-a (Chl.a) based on the measured meteorological, hydrological and limnological variables in the hypertrophic Nakdong River. The HEA is designed: (1) to evolve the structure of rule sets by using genetic programming and (2) to optimise the random parameters in the rule sets by means of a genetic algorithm. Time-series of input–output data from 1995 to 1998 without and with time lags up to 7 days were used for training HEA. Independent input–output data for 1994 were used for testing HEA. HEA successfully discovered rule sets for multiple nonlinear relationships between physical, chemical variables and Chl.a, which proved to be predictive for unseen data as well as explanatory. The comparison of results by HEA and previously applied recurrent artificial neural networks to the same data with input–output time lags of 3 days revealed similar good performances of both methods. The sensitivity analysis for the best performing predictive rule set unraveled relationships between seasons, specific input variables and Chl.a which to some degree correspond with known properties of the Nakdong River. The statistics of numerous random runs of the HEA also allowed determining most relevant input variables without a priori knowledge.
 
Sampling sites in the Du river basin, Vietnam. 
Observed river characteristics in the Du river basin during the sampling period 2006-2008.
Number of runs in which input variables were selected by classification trees with genetic algorithms (a) and number of times that input variables were selected as important in SVM models (b).
Correctly Classified Instances (CCI) (a) and Cohen's kappa (b) of classification trees for 30 modelled benthic macroinvertebrate taxa before (dashed bars) and after (black bars) variable selection.
Correctly Classified Instances (CCI) (a) and Cohen's kappa (b) of support vector machines for 30 modelled benthic macroinvertebrate taxa before (dashed bars) and after (black bars) variable selection.
In the present study, classification trees (CTs) and support vector machines (SVMs) were used to study habitat suitability for 30 macroinvertebrate taxa in the Du river in Northern Vietnam. The presence/absence of the 30 most common macroinvertebrate taxa was modelled based on 21 physical-chemical and structural variables. The predictive performance of the CT and SVM models was assessed based on the percentage of Correctly Classified Instances (CCI) and Cohen's kappa statistics. The results of the present study demonstrated that SVMs performed better than CTs. Attribute weighing in SVMs could replace the application of genetic algorithms for input variable selection. By weighing attributes, SVMs provided quantitative correlations between environmental variables and the occurrence of macroinvertebrates and thus allowed better ecological interpretation. SVMs thus proved to have a high potential when applied for decision-making in the context of river restoration and conservation management.
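The two performance measures named above can be computed directly from paired observed and predicted presence/absence labels, as in this minimal sketch. It uses the standard definition of Cohen's kappa, (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance; the function name and the list-based inputs are illustrative assumptions.

```python
def cci_and_kappa(observed, predicted):
    """Percentage of Correctly Classified Instances and Cohen's kappa."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")
    # Observed agreement: fraction of instances classified correctly.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    labels = set(observed) | set(predicted)
    p_e = sum((observed.count(lab) / n) * (predicted.count(lab) / n) for lab in labels)
    # Kappa is undefined when chance agreement is already perfect.
    kappa = (p_o - p_e) / (1.0 - p_e) if p_e < 1.0 else float("nan")
    return 100.0 * p_o, kappa
```

Reporting kappa alongside CCI is useful for taxa that are rare or very common, because CCI can then be high even for a model that always predicts the majority class, while kappa corrects for that chance agreement.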
 
The potential for physical flora collections to support scientific research is being enhanced by rapid development of digital databases that represent characteristics of the physical specimens held in those collections and make this information available remotely. One example is the unified database of California flora observations from the Consortium of California Herbaria that was developed to support the exploration of plant diversity patterns, distribution ranges of species, and vegetation associations for specimens held in physical collections. Many of the records in the herbaria database, and in complementary databases elsewhere, are geo-referenced; but, current web tools for accessing the data do not take advantage of that georeferencing. In this paper, we report on development and implementation of a web-based client–server map interface to facilitate open mapping and exploration of the dataset. Three research objectives were addressed: (1) develop a method for efficient web-map client–server interaction involving large volumes of spatiotemporal point data, (2) develop a symbology and symbol scaling method for representing those spatial–temporal data in the client, and (3) develop an interface for client–server interactions and data exploration. With a focus on cartographically-sound visualization and user-friendly interaction, we introduce HerbariaViz, a web mapping application that provides space–time–species data query responses efficiently. Following a discussion of relevant literature, we present open-source methods for aggregating point data spatially and temporally, outline our approach to sound cartographic representations of those data, and detail the design of a client interface for making requests and mapping responses. A focus group session involving domain experts was performed to provide user evaluation of the application. 
In our discussion, we present potential avenues of future work, including: facilitating query response comparisons, handling incomplete and inaccurate data, and generalizing the method presented here.
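Aggregating point records spatially and temporally before sending them to a map client can be sketched as simple binning. This is a minimal illustration, not the HerbariaViz implementation: the function name `aggregate_points`, the square degree grid, and the fixed-width year bins are all assumptions.

```python
import math
from collections import Counter


def aggregate_points(records, cell_deg=1.0, years_per_bin=10):
    """Count records per (grid column, grid row, period start year).

    records: iterable of (longitude, latitude, year) tuples.
    cell_deg: grid cell size in decimal degrees (assumed square cells).
    years_per_bin: width of each temporal bin in years.
    """
    counts = Counter()
    for lon, lat, year in records:
        col = math.floor(lon / cell_deg)
        row = math.floor(lat / cell_deg)
        period = (year // years_per_bin) * years_per_bin
        counts[(col, row, period)] += 1
    return counts


# Hypothetical specimen records (lon, lat, collection year).
specimens = [
    (-120.4, 36.2, 1985),
    (-120.9, 36.7, 1989),
    (-120.4, 36.2, 1991),
    (-119.5, 36.2, 1985),
]
binned = aggregate_points(specimens)
```

With binning of this kind, the server returns one count per occupied cell and period rather than every point, so the response size depends on the grid resolution instead of the number of specimens.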
 
Regional climate modeling is a technique for simulating high-resolution physical processes in the atmosphere, soil and vegetation. It can be used to evaluate wildfire potential either by providing meteorological conditions for the computation of fire indices or by predicting soil moisture as a direct measure of fire potential. This study examines these roles using a regional climate model (RCM) for the drought and wildfire events in 1988 in the northern United States. The National Center for Atmospheric Research regional climate model (RegCM) was used to conduct simulations of a summer month in each year from 1988 to 1995. The simulated precipitation and maximum surface air temperature were used to calculate the Keetch–Byram Drought Index (KBDI), which is a popular fire potential index. We found that the KBDI increased significantly under the simulated drought condition. The corresponding fire potential rose from moderate in a normal year to high in the drought year. High fire potential is often an indicator of intense and extensive wildfires. Fire potential changed in the opposite direction for the 1993 flood event, indicating little possibility of severe wildfires. The soil moisture and KBDI evaluations under the drought and flood conditions are in agreement with satellite remotely sensed vegetation conditions and the actual wildfire activity. The precipitation anomaly was a more important contributor to the KBDI changes than the temperature anomaly. The small magnitude of the simulated soil moisture anomalies during the drought event did not provide sufficient evidence for the role of simulated soil moisture as a direct measure of wildfire potential.
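How daily precipitation and maximum temperature feed into the KBDI can be sketched as a daily update. The constants below are those commonly cited for the inch and Fahrenheit formulation of the index and should be checked against the original Keetch and Byram report before any real use. The function name `kbdi_series`, the zero starting value, and the clamping of negative drying to zero are assumptions of this sketch, not details taken from the study.

```python
import math


def kbdi_series(tmax_f, rain_in, annual_rain_in, q0=0.0):
    """Return daily KBDI values (0 to 800, hundredths of an inch of deficit).

    tmax_f: daily maximum temperatures in degrees Fahrenheit.
    rain_in: daily rainfall in inches.
    annual_rain_in: mean annual rainfall in inches.
    q0: starting index value (assumed 0, i.e. saturated soil).
    """
    q = q0
    spell_rain = 0.0  # rain accumulated over the current run of wet days
    series = []
    for t, r in zip(tmax_f, rain_in):
        if r > 0.0:
            # Only rain beyond the first 0.20 in of a wet spell reduces the index.
            already_credited = max(0.0, spell_rain - 0.20)
            spell_rain += r
            net_rain = max(0.0, spell_rain - 0.20) - already_credited
        else:
            spell_rain = 0.0
            net_rain = 0.0
        q = max(0.0, q - 100.0 * net_rain)
        # Daily drought factor; grows with temperature and with remaining capacity.
        drying = (800.0 - q) * (0.968 * math.exp(0.0486 * t) - 8.30) * 1e-3
        drying /= 1.0 + 10.88 * math.exp(-0.0441 * annual_rain_in)
        q = min(800.0, q + max(0.0, drying))  # assumption: no negative drying
        series.append(q)
    return series


# Hypothetical month: 30 hot dry days versus the same month with one 3 in storm.
dry = kbdi_series([95.0] * 30, [0.0] * 30, annual_rain_in=40.0)
wet = kbdi_series([95.0] * 30, [0.0] * 15 + [3.0] + [0.0] * 14, annual_rain_in=40.0)
```

In this form a rainfall deficit acts on the index directly, while temperature acts only through the drying rate, which is consistent with the abstract's finding that the precipitation anomaly contributed more to KBDI changes than the temperature anomaly.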
 
Neutral models in ecology have attracted much attention in recent literature. They can provide considerable insight into the roles of non-species-specific factors (e.g. stochasticity, dispersal, speciation) on community dynamics but often require intensive simulations, particularly in spatial settings. Here, we clearly explain existing techniques for modelling spatially explicit neutral processes in ecology using coalescence. Furthermore, we present several novel extensions to these methods including procedures for dealing with system boundaries which enable improved investigation of the effects of dispersal. We also present a semi-analytical algorithm that calculates the expected species richness in a sample, for any speciation rate. By eliminating the effect of stochasticity in the speciation process, we reduce the variance in estimates of species richness. Our benchmarks show that the combination of existing coalescence theory and our extensions produces higher quality results in vastly shorter time scales than previously possible: years of simulation time are reduced to minutes. As an example application, we find parameters for a spatially explicit neutral model to approximate the species richness of a tropical forest dataset.
 
Study locations on the Porcupine Bank, west of Ireland. Species occurrence records were derived from ROV-based video data at a local area (R1, red square) and a region on the eastern margin of the Rockall Trough (green rectangle). Bathymetric contours every 100 m. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
(a) Bathymetry over the western margin of the Porcupine Bank. Black triangular symbols indicate sample locations of ROV-based video data acquired as part of this study and employed in model development. Maps of the probability of occurrence of L. pertusa from composite models generated with the terrain parameters (b) BPI at an analysis scale of 4950 m, which gave the highest AUC score (0.93); (c) rugosity, slope, BPI, aspect and curvature at an analysis scale of 4950 m; and (d) rugosity, slope, BPI, aspect and curvature at an analysis scale of 1650 m (AUC scores 0.86 and 0.73, respectively). Blue square symbols represent locations of species presence used as training data in the model generation; black filled circles denote locations of species presence used as validation data, recorded from previous surveys. On a scale of 0 to 100, areas with a value of 0 are where the model has not predicted species presence, whereas areas approaching 100 are where the model predicts a high probability of species occurrence. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
(a) Bathymetry over R1 showing a simplified version of the track (black line) of ROV-based video data. Maps of the percentage probability of occurrence of L. pertusa from composite models developed using (b) rugosity, slope and aspect at an analysis scale of 510 m, which gave the highest AUC score (> 0.99); (c) rugosity at an analysis scale of 90 m; and (d) rugosity at an analysis scale of 270 m (AUC scores 0.76 and 0.96, respectively). The areas predicted to be of highest suitability for L. pertusa are coloured red; unsuitable habitat is predicted to occur in green areas. Blue square symbols represent locations of species presence used as training data in the model generation; black filled circles denote locations of species presence used as validation data. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Despite a growing appreciation of the need to protect sensitive deep-sea ecosystems such as cold-water corals, efforts to map the extent of their distribution are limited by their remoteness. Here we develop ecological niche models to predict the likely distributions of cold-water corals based on occurrence records and data describing environmental parameters (e.g. seafloor terrain attributes and oceanographic conditions). This study used bathymetric data derived from ship-borne multibeam swath systems, species occurrence data from remotely operated vehicle video surveys and oceanographic parameters from hydrodynamic models to predict coral locations in regions where there is a paucity of direct observations. Predictions of the locations of the scleractinian coral Lophelia pertusa are based primarily upon ecological niche modelling using a genetic algorithm. The accuracy of these predictions has been quantified at local (~25 km²) and regional (~4000 km²) scales along the Irish continental slope using a variety of error assessment techniques and a comparison with another ecological niche modelling technique. With appropriate choices of parameters and scales of analysis, ecological niche modelling has been effective in predicting the distributions of species at local and regional scales. Refinements of this approach have the potential to be particularly useful for ocean management given the need to manage areas of sensitive habitat where survey data are often limited.
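The AUC scores quoted in the figure captions for this study summarise how well a model ranks presence locations above other locations. A minimal sketch of that measure follows, using the pairwise (Mann-Whitney) formulation. The function name `auc` and the example scores are assumptions, and the study's own error assessment procedure may differ.

```python
def auc(presence_scores, absence_scores):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (presence, absence) pairs in which the presence
    location received the higher predicted score; ties count as half.
    """
    if not presence_scores or not absence_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical predicted probabilities at validation locations.
score = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.1])
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation of presence from absence locations.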
 
This paper introduces a novel numerical stochastic optimization algorithm inspired by colonizing weeds. Weeds are plants whose vigorous, invasive growth habits pose a serious threat to desirable, cultivated plants and therefore to agriculture. Weeds have been shown to be very robust and adaptive to changes in their environment, so capturing their properties could lead to a powerful optimization algorithm. The proposed method attempts to mimic the robustness, adaptation and randomness of colonizing weeds in a simple but effective optimization algorithm designated Invasive Weed Optimization (IWO). The feasibility, efficiency and effectiveness of IWO are tested in detail on a set of benchmark multi-dimensional functions whose global and local minima are known. The reported results are compared with those of other recent evolutionary algorithms: genetic algorithms, memetic algorithms, particle swarm optimization, and shuffled frog leaping. The results are also compared with two versions of simulated annealing (a generic probabilistic meta-algorithm for the global optimization problem): simplex simulated annealing and direct search simulated annealing. Additionally, IWO is employed to solve an engineering problem, the optimization and tuning of a robust controller. The experimental results suggest that IWO outperforms the other methods. In conclusion, IWO performs reasonably well on all the test functions.
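The abstract does not spell out the steps of IWO, so the sketch below follows the form in which the algorithm is usually described: fitter weeds produce more seeds, seeds are scattered around their parent with a spread that shrinks over the iterations, and only the best plants survive once a maximum population is reached. All parameter names and default values, and the `sphere` test function, are assumptions chosen for illustration rather than the settings used in the paper.

```python
import random


def sphere(x):
    """Simple benchmark with a known global minimum of 0 at the origin."""
    return sum(v * v for v in x)


def iwo_minimize(f, dim, bounds, iters=200, pop_init=10, pop_max=25,
                 seeds_min=0, seeds_max=5, sigma_init=1.0, sigma_final=0.001,
                 modulation=3, rng=None):
    """Minimise f over a box using an Invasive Weed Optimization sketch."""
    rng = rng or random.Random(0)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_init)]
    for it in range(iters):
        # Dispersal spread decreases nonlinearly from sigma_init to sigma_final.
        frac = (iters - it) / iters
        sigma = frac ** modulation * (sigma_init - sigma_final) + sigma_final
        costs = [f(w) for w in pop]
        best, worst = min(costs), max(costs)
        offspring = []
        for weed, cost in zip(pop, costs):
            # Seed count rises linearly from seeds_min (worst) to seeds_max (best).
            if worst == best:
                n_seeds = seeds_max
            else:
                ratio = (worst - cost) / (worst - best)
                n_seeds = seeds_min + int(round(ratio * (seeds_max - seeds_min)))
            for _ in range(n_seeds):
                # Scatter the seed around its parent, clipped to the search box.
                seed = [min(hi, max(lo, v + rng.gauss(0.0, sigma))) for v in weed]
                offspring.append(seed)
        # Competitive exclusion: parents and seeds compete, the fittest survive.
        pop = sorted(pop + offspring, key=f)[:pop_max]
    return pop[0], f(pop[0])


best_x, best_cost = iwo_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
```

Because parents compete alongside their seeds, the best cost found never worsens between iterations, while the shrinking spread shifts the search from broad exploration to local refinement.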
 
The ecoinformatics community recognizes that ecological synthesis across studies, space, and time will require new informatics tools and infrastructure. Recent advances have been encouraging, but many problems still face ecologists who manage their own datasets, prepare data for archiving, and search data stores for synthetic research. In this paper, we describe how work by the Canopy Database Project (CDP) might enable use of database technology by field ecologists: increasing the quality of database design, improving data validation, and providing structural and semantic metadata — all of which might improve the quality of data archives and thereby help drive ecological synthesis.
 
Ecological databases continue to grow in volume, breadth and complexity. Higher level descriptions of data (i.e., metadata) and information derived from subsequent data processing and analyses (i.e., “meta-information” in the broadest sense) are essential for understanding and using the increasingly complex and voluminous data and information. The concepts of meta-information, in general, and metadata, in particular, have evolved in concert with the increasing needs for functionality by the community. From a scientific perspective, metadata may be characterized as having developed from initially supporting data discovery; to facilitating acquisition, comprehension and utilization of data by humans; and, most recently, to beginning to enable automated data discovery, ingestion, processing and analysis via metadata-enabled scientific workflow systems. The continued conceptual and operational developments in metadata required to support comprehensive automated scientific workflow systems portend many challenges and opportunities. For example, there are significant opportunities for collaboration among ecologists and computer scientists in developing domain-specific controlled vocabularies and ontologies that provide the basis for semantic mediation—the “glue” technologies that enable automated data discovery, ingestion, processing and analysis. Similarly, there are opportunities for computer scientists and engineers to develop new mechanisms that support automated metadata encoding—such as providing the information that would be necessary to understand the end-to-end flow of sensor data from in situ data collection, streaming through quality assurance filtering, aggregation, transformation and additional processing, analysis, and publication of digital products. 
As the technologies mature, we still have many sociological barriers to overcome including the needs for increased attention to software usability testing and engineering to enhance user-friendliness of metadata management software, new capital investments in ecological data archives, and increasing the metadata management benefit–cost ratio for the average scientist via incentives and enabling tools.
 
The rapid global loss of biodiversity has led to a proliferation of systematic conservation planning methods. In spite of their utility and mathematical sophistication, these methods only provide approximate solutions to real-world problems where there is uncertainty and temporal change. The consequences of errors in these solutions are seldom characterized or addressed. We propose a conceptual structure for exploring the consequences of input uncertainty and oversimplified approximations to real-world processes for any conservation planning tool or strategy. We then present a computational framework based on this structure to quantitatively model species representation and persistence outcomes across a range of uncertainties. These include factors such as land costs, landscape structure, species composition and distribution, and temporal changes in habitat. We demonstrate the utility of the framework using several reserve selection methods including simple rules of thumb and more sophisticated tools such as Marxan and Zonation. We present new results showing how outcomes can be strongly affected by variation in problem characteristics that are seldom compared across multiple studies. These characteristics include number of species prioritized, distribution of species richness and rarity, and uncertainties in the amount and quality of habitat patches. We also demonstrate how the framework allows comparisons between conservation planning strategies and their response to error under a range of conditions. Using the approach presented here will improve conservation outcomes and resource allocation by making it easier to predict and quantify the consequences of many different uncertainties and assumptions simultaneously. Our results show that without more rigorously generalizable results, it is very difficult to predict the amount of error in any conservation plan. 
These results imply the need for standard practice to include evaluating the effects of multiple real-world complications on the behavior of any conservation planning method.
 
Top-cited authors
Cameron Lucas
  • Federation University Australia
Jin Li
  • Data2action Australia
Duccio Rocchini
William Michener
  • University of New Mexico
Kate S. He
  • Murray State University