Focus group discussion is frequently used as a qualitative approach to gain an in‐depth understanding of social issues. The method aims to obtain data from a purposively selected group of individuals rather than from a statistically representative sample of a broader population. Although the method has been applied extensively in conservation research, there has been no critical assessment of its application, and no readily available guidelines exist for conservation researchers.
Here, we reviewed the applications of focus group discussion within biodiversity and conservation research between 1996 and April 2017. We begin with a brief explanation of the technique for first‐time users. We then discuss in detail the empirical applications of this technique in conservation based on a structured literature review (using Scopus).
The screening process resulted in 170 articles, the majority of which (67%, n = 114) were published between 2011 and 2017. The method was rarely used as a stand‐alone technique. Where reported, the number of participants per focus group ranged from 3 to 21, with a median of 10; studies held a median of seven focus group meetings, and sessions lasted a median of 90 min. Four main themes emerged from the review: understanding people's perspectives regarding conservation (32%), assessing conservation and livelihood practices (21%), examining the challenges and impacts of resource management interventions (19%) and documenting the value of indigenous knowledge systems (16%). Most of the studies were conducted in Africa (n = 76), followed by Asia (n = 44) and Europe (n = 30).
We noted serious gaps in the reporting of methodological details in the reviewed papers. More than half of the studies did not report the sample size (n = 101) or group size (n = 93), and 54 studies did not mention the number of focus group discussion sessions when reporting results. Studies rarely provided any rationale for choosing the technique. We provide guidelines to improve the standard of reporting and the future application of the technique in conservation.
Assessing the biological relevance of variance components estimated using Markov chain Monte Carlo (MCMC)‐based mixed‐effects models is not straightforward. Variance estimates are constrained to be greater than zero and their posterior distributions are often asymmetric. Different measures of central tendency for these distributions can therefore vary widely, and credible intervals cannot overlap zero, making it difficult to assess the size and statistical support for among‐group variance. Statistical support is often assessed through visual inspection of the whole posterior distribution and so relies on subjective decisions for interpretation.
We use simulations to demonstrate the difficulties of summarizing the posterior distributions of variance estimates from MCMC‐based models. We then describe different methods for generating the expected null distribution (i.e. a distribution of effect sizes that would be obtained if there was no among‐group variance) that can be used to aid in the interpretation of variance estimates.
Through comparing commonly used summary statistics of posterior distributions of variance components, we show that the posterior median is generally the least biased. We further show how null distributions can be used to derive a p‐value that provides complementary information to the commonly presented measures of central tendency and uncertainty. Finally, we show how these p‐values facilitate the implementation of power analyses within an MCMC framework.
The use of null distributions for variance components can aid study design and the interpretation of results from MCMC‐based models. We hope that this manuscript will make empiricists using mixed models think more carefully about their results, what descriptive statistics they present and what inference they can make.
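To make the summary‐statistic issue concrete, here is a minimal Python sketch (an illustration, not the authors' code): the posterior of a variance component is mimicked by a right‐skewed, strictly positive distribution, the mean and median are contrasted, and a one‐sided p‐value is derived from a null distribution of estimates. The lognormal shapes and the way the null draws are produced are assumptions for illustration; in practice the null would come from refitting the model to data simulated with zero among‐group variance.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical posterior samples of an among-group variance from an MCMC
# fit: strictly positive and right-skewed, as described in the abstract.
posterior = rng.lognormal(mean=-1.0, sigma=0.8, size=4000)

# Measures of central tendency can disagree widely for such asymmetric
# distributions; the median is reported as generally the least biased.
post_mean = posterior.mean()
post_median = np.median(posterior)

# Hypothetical null distribution: variance estimates the model returns
# when the true among-group variance is zero.
null = rng.lognormal(mean=-3.0, sigma=0.8, size=4000)

# One-sided p-value: how often the null yields an estimate at least as
# large as the observed point estimate (+1 continuity correction).
p_value = (1 + np.sum(null >= post_median)) / (1 + null.size)

print(post_mean, post_median, p_value)
```

The same comparison of the observed point estimate against a simulated null underlies the power analyses mentioned above: repeat it across simulated datasets and record how often p falls below the chosen threshold.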
Step selection analysis (SSA) is a common framework for understanding animal movement and resource selection using telemetry data. Such data are, however, inherently autocorrelated in space, a complication that could impact SSA‐based inference if left unaddressed. Accounting for spatial correlation is standard statistical practice when analysing spatial data, and its importance is increasingly recognized in ecological models (e.g. species distribution models). Nonetheless, no framework yet exists to account for such correlation when analysing animal movement using SSA.
Here, we extend the popular method integrated step selection analysis (iSSA) by including a Gaussian field (GF) in the linear predictor to account for spatial correlation. For this, we use the Bayesian framework R‐INLA and the stochastic partial differential equations (SPDE) technique.
We show through a simulation study that our method provides accurate fixed‐effects estimates, quantifies their uncertainty well and improves predictions. In addition, we demonstrate the practical utility of our method by applying it to three wolverine (Gulo gulo) tracks.
Our method solves the problem of assuming spatially independent residuals in the SSA framework. In addition, it offers new possibilities for making long‐term predictions of habitat use.
Significant advances in computational ethology have allowed the quantification of behaviour in unprecedented detail. Tracking animals in social groups, however, remains challenging as most existing methods can either capture pose or robustly retain individual identity over time but not both.
To capture finely resolved behaviours while maintaining individual identity, we built NAPS (NAPS is ArUco Plus SLEAP), a hybrid tracking framework that combines state‐of‐the‐art, deep learning‐based methods for pose estimation (SLEAP) with unique markers for identity persistence (ArUco). We show that this framework allows the exploration of the social dynamics of the common eastern bumblebee (Bombus impatiens).
We provide a stand‐alone Python package for implementing this framework along with detailed documentation to allow for easy utilization and expansion. We show that NAPS can scale to long timescale experiments at a high frame rate and that it enables the investigation of detailed behavioural variation within individuals in a group.
Expanding the toolkit for capturing the constituent behaviours of social groups is essential for understanding the structure and dynamics of social networks. NAPS provides a key tool for capturing these behaviours and can provide critical data for understanding how individual variation influences collective dynamics.
1. Automatic monitoring of wildlife is becoming a critical tool in the field of ecology. In particular, Radio‐Frequency IDentification (RFID) is now a widespread technology for assessing the phenology, breeding and survival of many species. While RFID produces massive datasets, no established fast and accurate methods are yet available for processing this type of data. Deep learning approaches have been used to overcome similar problems in other scientific fields and hence hold the potential to overcome these analytical challenges and unlock the full potential of RFID studies. 2. We present a deep learning workflow, coined "RFIDeep", to derive ecological features, such as breeding status and outcome, from RFID mark–recapture data. To demonstrate the performance of RFIDeep with complex datasets, we used the long‐term automatic monitoring of a long‐lived seabird that breeds in densely packed colonies, hence with many daily entries and exits. 3. To determine individual breeding status and phenology for each breeding season, we first developed a one‐dimensional convolutional neural network (1D‐CNN) architecture. Second, to account for variance in breeding phenology and technical limitations of field data acquisition, we built a new data augmentation
Passive acoustic telemetry is widely used to study the movements of aquatic animals. However, a holistic, mechanistic modelling framework that permits the reconstruction of fine‐scale movements and emergent patterns of space use from detections at receivers remains lacking.
Here, we introduce an integrative modelling framework that recapitulates the movement and detection processes that generate detections to reconstruct fine‐scale movements and patterns of space use. This framework is supported by a new family of algorithms designed for detection and depth observations and can be flexibly extended to incorporate other data types. Using simulation, we illustrate applications of our framework and evaluate algorithm utility and sensitivity in different settings. As a case study, we analyse movement data collected from the Critically Endangered flapper skate (Dipturus intermedius) in Scotland.
We show that our methods can be used to reconstruct fine‐scale movement paths and patterns of space use, and to support habitat preference analyses. For reconstructing patterns of space use, simulations show that the methods are consistently more instructive than the most widely used alternative (the mean‐position algorithm), particularly in clustered receiver arrays. For flapper skate, the reconstruction of movements reveals responses to disturbance, fine‐scale spatial partitioning and patterns of space use with significant implications for marine management.
We conclude that this framework represents a widely applicable methodological advance with applications to studies of pelagic, demersal and benthic species across multiple spatiotemporal scales.
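The mean‐position algorithm mentioned above is simple enough to sketch. The following Python snippet (an illustration, not the authors' implementation; receiver coordinates and counts are hypothetical) computes a centre‐of‐activity estimate as the detection‐weighted mean of receiver coordinates within a time bin:

```python
import numpy as np

def mean_position(receiver_xy, counts):
    """Centre-of-activity estimate: the detection-weighted mean of
    receiver coordinates within a time bin (the 'mean-position
    algorithm' the framework above is compared against)."""
    xy = np.asarray(receiver_xy, dtype=float)
    w = np.asarray(counts, dtype=float)
    return (xy * w[:, None]).sum(axis=0) / w.sum()

# Hypothetical receivers and detection counts for one time bin.
receivers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
counts = [2, 1, 1]
coa = mean_position(receivers, counts)   # -> array([1., 1.])
```

Its simplicity is also its weakness: positions are pulled towards receiver locations regardless of the animal's actual path, which is why the process‐based framework above outperforms it in clustered arrays.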
The rise of passive acoustic monitoring and the rapid growth in large audio datasets is driving the development of analysis methods that allow ecological inferences to be drawn from acoustic data.
Acoustic indices are currently one of the most widely applied tools in ecoacoustics. These numerical summaries of the sound energy contained in digital audio recordings are relatively straightforward and fast to calculate but can be challenging to interpret. Misapplication and misinterpretation have produced conflicting results and led some to question their value.
To encourage better use of acoustic indices, we provide nine points of guidance to support good study design, analysis and interpretation. We offer practical recommendations for the use of acoustic indices in the study of both whole soundscapes and individual taxa and species, and point to emerging trends in ecoacoustic analysis. In particular, we highlight the critical importance of understanding the links between soundscape patterns and acoustic indices.
Acoustic indices can offer insights into the state of organisms, populations, and ecosystems, complementing other ecological research techniques. Judicious selection, appropriate application and thorough interpretation of existing indices is vital to bolster robust developments in ecoacoustics for biodiversity monitoring, conservation and future research.
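As a minimal illustration of what an acoustic index is (a generic sketch, not any specific published index), normalized spectral entropy summarizes how evenly sound energy is spread across frequencies: near 0 for a pure tone, near 1 for white noise. The sampling rate and signals below are hypothetical.

```python
import numpy as np

def spectral_entropy(signal):
    """Normalized Shannon entropy of the power spectrum: a simple
    acoustic index in [0, 1]."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    p = power / power.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(len(power)))

fs, n = 1024, 1024
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * 100 * t)            # 100 Hz, whole cycles in window
noise = np.random.default_rng(0).normal(size=n)

h_tone = spectral_entropy(tone)    # close to 0: energy in one bin
h_noise = spectral_entropy(noise)  # close to 1: energy spread evenly
```

Even this toy example shows why interpretation needs care: the index responds to the distribution of sound energy, not to any particular biological source, so the same value can arise from very different soundscapes.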
Rapid advances in the field of movement ecology have led to increasing insight into both the population‐level abundance patterns and individual‐level behaviour of migratory species. Despite this progress, research questions that require scaling individual‐level understanding of the behaviour of migrating organisms to the population level remain difficult to investigate.
To bridge this gap, we introduce a generalizable framework for training full‐annual cycle individual‐based models of migratory movements by combining information from tracking studies and species occurrence records. Focusing on migratory birds, we call this method: Models of Individual Movement of Avian Species (MIMAS). We implement MIMAS to design individual‐based models of avian migration that are trained using previously published weekly occurrence maps and fit via Approximate Bayesian Computation.
MIMAS models leverage individual‐ and population‐level information to faithfully represent continental‐scale migration patterns. Models can be trained successfully even for species with little existing individual‐level data for parameterization, by relying on population‐level information. In contrast to existing mathematical models of migration, MIMAS explicitly represents and estimates the behavioural attributes of migrants. MIMAS can additionally be used to simulate movement over consecutive migration seasons, and models can easily be updated or validated as new empirical data on migratory behaviours become available.
MIMAS can be applied to a variety of research questions that require representing individual movement at large scales. We demonstrate three applied uses for MIMAS: estimating population‐specific migratory phenology, predicting the spatial patterns and magnitude of ectoparasite dispersal by migrants, and simulating the spread of a pathogen across the annual cycle of a migrant species. Currently, MIMAS can easily be used to build models for hundreds of migratory landbird species but can also be adapted in the future to build models of other types of migratory animals.
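Approximate Bayesian Computation, the fitting approach named above, can be sketched in a few lines. This toy rejection sampler is not MIMAS itself: the one‐parameter "migration model", the prior bounds, the observed summary and the tolerance are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_mean_arrival(theta, n_birds=200):
    """Toy individual-based model: each simulated bird's arrival day is
    the population parameter theta plus individual variation."""
    return rng.normal(theta, 10.0, size=n_birds).mean()

observed = 50.0                            # hypothetical population-level summary
prior = rng.uniform(0, 100, size=20000)    # vague prior on mean arrival day

# Rejection ABC: keep parameter draws whose simulated summary statistic
# lands within a tolerance of the observed one.
accepted = np.array([th for th in prior
                     if abs(simulate_mean_arrival(th) - observed) < 1.0])
posterior_mean = accepted.mean()
```

Real applications replace the toy simulator with a full‐annual‐cycle individual‐based model and the scalar summary with statistics derived from weekly occurrence maps, but the accept/reject logic is the same.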
Drift experiments are essential for understanding stranding patterns and estimating the mortality of beached animals. Most studies do not use telemetry technology because of its high cost. The objective of this paper is to describe the possibilities of tracking marine tetrapod carcasses with a low‐cost and replicable methodology. The study was carried out on the Southern Subtropical Shelf (~28°–34°S), a highly productive and key ecological region of the southwestern Atlantic Ocean (SWA).
We designed and tested a low‐cost mixed methodology that includes Global Positioning System trackers, passive drifters (reused glass bottles) and Citizen Science (through an instant message platform and email) to track carcasses of marine tetrapods. We conducted four drift experiments during the four seasons of 2019. We released 787 drifters (600 nonbiological and 187 carcasses of seabirds, sea turtles, and cetaceans) at sea, at five equally separated distances (5–25 km) from the coast. Beach surveys and citizen science were implemented to recover the beached drifters.
We recovered 71.83% of the nonbiological drifters and 27.27% of the carcasses released. We tracked the movements of 38 carcasses (25 sea turtles and 13 cetaceans) with 17 GPS devices. Drifting time until reaching the beach ranged from 12 h to 17 days for carcasses and from 12 h to 406 days for bottles. Citizen science was the most important source of recovery of nonbiological drifters, accounting for 66.67% of the recovered bottles. For carcasses, active search was the most important recovery source, accounting for 64.7% of the carcasses recovered.
Our study contributes new findings on marine tetrapod drift patterns in the SWA and describes an accessible, low‐cost mixed methodology for small‐ and medium‐budget projects that can be replicated in other coastal regions of the world to track a wide range of marine tetrapod species.
Core samples from trees are a critical reservoir of ecological information, informing our understanding of past climates, as well as contemporary ecosystem responses to global change. Manual measurements of annual growth rings in trees are slow, labour‐intensive and subject to human bias, hindering the generation of big datasets. We present an alternative, neural network‐based implementation that automates detection and measurement of tree‐ring boundaries from coniferous species.
We trained our Mask R‐CNN extensively on over 8000 manually annotated ring boundaries from microscope‐imaged Norway spruce (Picea abies) increment cores. We assessed the performance of the trained model, after post‐processing, on real‐world data generated from our core processing pipeline.
After post‐processing, the CNN performed well, recognizing over 98% of ring boundaries (recall) with a detection precision of 96% when tested on real‐world data. Additionally, we implemented automatic measurements based on the minimum distance between rings. With minimal editing for missed ring detections, these measurements were 98% correlated with human measurements of the same samples. Tests on three other conifer species demonstrate that the CNN generalizes well to species with similar structure.
We demonstrate the efficacy of automating the measurement of growth increment in tree core samples. Our CNN‐based system provides high predictive performance in terms of both tree‐ring detection and growth rate determination. Our application is readily deployable as a Docker container and requires only basic command line skills. Additionally, an easy re‐training option allows users to expand capabilities to other wood types. Application outputs include both editable annotations of predictions as well as ring‐width measurements in a commonly used .pos format, facilitating the efficient generation of large ring‐width measurement datasets from increment core samples, an important source of environmental data.
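The minimum‐distance measurement described above can be sketched directly: given two detected ring boundaries as point sets, the ring width is the smallest pairwise distance between them. This Python snippet is an illustration with hypothetical boundary coordinates, not the application's own code.

```python
import numpy as np

def ring_width(boundary_a, boundary_b):
    """Ring width as the minimum point-to-point distance between two
    detected ring boundaries, each given as an array of (x, y) pixel
    coordinates."""
    a = np.asarray(boundary_a, dtype=float)[:, None, :]
    b = np.asarray(boundary_b, dtype=float)[None, :, :]
    # Broadcast to all pairs, then take the smallest Euclidean distance.
    return float(np.sqrt(((a - b) ** 2).sum(axis=2)).min())

# Hypothetical boundaries: two roughly parallel polylines 3 px apart.
inner = [(x, 0.0) for x in range(10)]
outer = [(x, 3.0) for x in range(10)]
width = ring_width(inner, outer)   # -> 3.0
```

In a full pipeline this per‐pair width, converted from pixels to millimetres via the scan resolution, is what populates the .pos measurement files mentioned above.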
Next‐generation sequencing of pooled samples (Pool‐seq) is an important tool in population genomics and molecular ecology. In Pool‐seq, the relative number of reads with an allele reflects the allele frequencies in the sample. However, unequal individual contributions to the pool and sequencing errors can lead to inaccurate allele frequency estimates, influencing downstream analysis. When designing Pool‐seq studies, researchers need to decide the pool size (number of individuals) and average depth of coverage (sequencing effort). An efficient sampling design should maximise the accuracy of allele frequency estimates while minimising the sequencing effort. We describe a novel tool to simulate single nucleotide polymorphism (SNP) data using coalescent theory and account for sources of uncertainty in Pool‐seq.
We introduce an R package, poolHelper, enabling users to simulate Pool‐seq data under different combinations of average depth of coverage and pool size, accounting for unequal individual contributions and sequencing errors modelled by adjustable parameters. The mean absolute error is computed by comparing the sample allele frequencies obtained from individual genotypes with the frequency estimates obtained with Pool‐seq.
poolHelper enables users to simulate multiple combinations of pooling errors, average depth of coverage, pool sizes and number of pools to assess how they influence the error of sample allele frequencies and expected heterozygosity. Using simulations under a single population model, we illustrate that increasing the depth of coverage does not necessarily lead to more accurate estimates, reinforcing that finding the best Pool‐seq study design is not straightforward. Moreover, we show that simulations can be used to identify different combinations of parameters with similarly low mean absolute errors. This can help users to define an effective sampling design by using those combinations of parameters that minimise the sequencing effort.
The poolHelper package provides tools for performing simulations with different combinations of parameters (e.g. pool size, depth of coverage, unequal individual contribution) before sampling and generating data, allowing users to define sampling schemes based on simulations. This allows researchers to focus on the best sampling scheme to answer their research questions. poolHelper is comprehensively documented with examples to guide effective use.
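The error sources poolHelper models can be illustrated with a small Monte Carlo sketch, written here in Python rather than R and not reproducing the package's actual algorithm: reads are drawn from individuals contributing unequally to the pool (Dirichlet weights), a small per‐read sequencing error is applied, and the read‐based frequency estimate is compared with the true sample frequency. All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

def poolseq_mae(pool_size, depth, error=0.001, true_freq=0.3, reps=500):
    """Mean absolute error of Pool-seq allele-frequency estimates under
    unequal individual contributions and sequencing error (a sketch of
    the kind of simulation poolHelper performs)."""
    errs = []
    for _ in range(reps):
        # True allele counts (0, 1 or 2) for each diploid individual.
        genotypes = rng.binomial(2, true_freq, size=pool_size)
        sample_freq = genotypes.sum() / (2 * pool_size)
        # Unequal DNA contributions of individuals to the pool.
        weights = rng.dirichlet(np.full(pool_size, 10.0))
        coverage = rng.poisson(depth)
        donors = rng.choice(pool_size, size=coverage, p=weights)
        reads = rng.binomial(1, genotypes[donors] / 2.0)
        # A symmetric sequencing error flips a read's allele.
        flip = rng.random(coverage) < error
        reads = np.where(flip, 1 - reads, reads)
        errs.append(abs(reads.mean() - sample_freq))
    return float(np.mean(errs))

mae_low = poolseq_mae(pool_size=50, depth=20)
mae_high = poolseq_mae(pool_size=50, depth=200)
```

Running such a grid over pool sizes and depths is exactly how one can locate combinations with similarly low error and pick the cheapest, as the abstract recommends.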
Considering local population dynamics and dispersal is crucial for projecting species' range adaptations in changing environments. Dynamic models that include these processes are highly computationally intensive, with consequent restrictions on spatial extent and/or resolution.
We present CATS, an open‐source, extensible modelling framework for simulating spatially and temporally explicit population dynamics of plants. It can be used in conjunction with species distribution models or via direct parametrisation of vital rates, and allows fine‐grained control over the models of demographic and dispersal processes.
The performance and flexibility of CATS are exemplified (i) by modelling the range shifts of four plant species under three future climate scenarios across Europe at a spatial resolution of 100 m, and (ii) by exploring the consequences of demographic compensation for range expansion on artificial landscapes.
The presented software attempts to leverage the availability of computational resources and lower the barrier of entry for large‐extent, fine‐resolution simulations of plant range shifts in changing environments.
We introduce community‐level basis function models (CBFMs) as an approach for spatiotemporal joint distribution modelling. CBFMs can be viewed as related to spatiotemporal latent variable models, where the latent variables are replaced by a set of pre‐specified spatiotemporal basis functions which are common across species.
In a CBFM, the coefficients that link the basis functions to each species are treated as random slopes. As such, the CBFM can be formulated to have a similar structure to a generalised additive model. This allows us to adapt existing techniques to fit CBFMs efficiently.
CBFMs can be used for a variety of reasons, such as inferring patterns of habitat use in space and time, understanding how residual covariation between species varies spatially and/or temporally, and spatiotemporal predictions of species‐ and community‐level quantities.
A simulation study and an application to data from a bottom trawl survey conducted across the U.S. Northeast shelf show that CBFMs can achieve similar and sometimes better predictive performance compared to existing approaches for spatiotemporal joint species distribution modelling, while being computationally more scalable.
Transcriptome sequencing technologies have revolutionized the field of phylogenomics by facilitating the identification of homologous genes for species without whole‐genome sequences. To infer complex evolutionary relationships among eukaryotes, it is essential to obtain complete sequences of protein‐coding genes that provide informative mutations. However, eukaryotic transcriptomes often consist of a large number of duplicated genes and alternative isoforms, posing great challenges for developing effective tools to obtain complete coding gene sequences.
Here, we present a net‐flow‐based assembler, TransMCL, which aims to assemble fragmented transcripts into complete sequences while eliminating redundant isoforms during homologue clustering. By employing Markov clustering strategies and homologous gene guidance, TransMCL can accurately assemble genes for multiple organisms in an affordable time frame, making it well suited for phylogenomic studies based on transcriptomic data.
Our results demonstrate that TransMCL can assemble 89.95%–92.95% of the total expressed genes into near‐complete transcripts on benchmark plant/animal datasets. Furthermore, applying TransMCL to multiple transcriptomes in a single run enhances the completeness of genes, even in the absence of guidance homologues from closely related species.
These findings highlight the potential of TransMCL in phylogenomic studies, enabling the comprehensive characterization of gene families at the whole‐genome scale. By overcoming the challenges posed by the complex gene content of eukaryotes, TransMCL can significantly enhance our understanding of the evolution of gene families across species.
1. Statistical tests for molecular evolution provide quantifiable insights into the selection pressures that govern a genome’s evolution. Increasing sample sizes used for analysis leads to higher statistical power. However, this requires more computational nodes or longer computational time.
2. CATE (CUDA Accelerated Testing of Evolution) is a computational solution to this problem comprising two main innovations: a file organization system coupled with a novel search algorithm, and a large‐scale parallelization of algorithms using both GPU and CPU. CATE can conduct evolutionary tests such as Tajima's D, Fu and Li's and Fay and Wu's test statistics, the McDonald–Kreitman Neutrality Index, the Fixation Index and Extended Haplotype Homozygosity.
3. CATE is orders of magnitude faster than standard tools, with benchmarks estimating it to be on average over 180 times faster. For instance, CATE processes all 54,849 human genes across all 22 autosomal chromosomes for the five super populations in the 1000 Genomes Project in less than 30 min, whereas counterpart software took 3.62 days.
4. This proven framework has the potential to be adapted for GPU‐accelerated, large‐scale parallelization of many other evolutionary and genomic analyses.
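CATE's GPU kernels are beyond the scope of an abstract, but the statistics it accelerates are standard. As an illustrative reference implementation (assuming the conventional Tajima 1989 formulation, not CATE's code), Tajima's D from a 0/1 haplotype matrix is:

```python
import numpy as np

def tajimas_d(snps):
    """Tajima's D from a (sequences x segregating sites) 0/1 matrix,
    using the standard constants from Tajima (1989)."""
    snps = np.asarray(snps)
    n, s = snps.shape
    if s == 0:
        return float("nan")
    i = np.arange(1, n)
    a1, a2 = (1 / i).sum(), (1 / i**2).sum()
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    # Mean pairwise differences (pi) from per-site allele counts.
    k = snps.sum(axis=0)
    pi = (k * (n - k)).sum() / (n * (n - 1) / 2)
    theta_w = s / a1  # Watterson's estimator
    return float((pi - theta_w) / np.sqrt(e1 * s + e2 * s * (s - 1)))

# Four haplotypes, two segregating sites at intermediate frequency.
d = tajimas_d([[0, 0], [0, 1], [1, 0], [1, 1]])
```

The per‐gene independence of this computation is what makes the statistic embarrassingly parallel and hence a natural target for the GPU batching described above.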
The use of weather radars to detect and distinguish between different biological patterns greatly improves our understanding of aeroecology and its consequences for our lives. Importantly, it allows us to quantify passerine bird migration at different scales. Yet, no algorithm to detect soaring bird flocks in weather radar is available, precluding our ability to study this type of migration over large spatial scales.
We developed the first automatic algorithm for detecting the migration of flocks of soaring birds, an important bio‐flow phenomenon involving many millions of birds that travel across large spatial extents, with implications for risk of bird‐aircraft collisions. The algorithm was developed with a deep learning network for semantic segmentation using U‐Net architecture. We tested several models with different weather radar products and with image sequences for flock movement identification.
The best model includes the radial velocity product and a sequence of two previous images. It identifies 93% of the soaring bird flocks that were tagged by a human on the radar image, with a false discovery rate of less than 20%.
Large birds such as those detected by the algorithm pose a serious risk to the flight safety of civilian and military aircraft; applying the algorithm can therefore substantially reduce bird strikes, and with them financial losses and threats to human lives. In addition, it can help overcome one of the main challenges in the study of bird migration by automatically and continuously detecting flocks of large birds over wide spatial scales without the need to equip the birds with tracking devices, unravelling the abundance, timing, spatial flyways, seasonal trends and influences of environmental conditions on the migration of bird flocks.
Often there are several complex ecosystem models available to address a specific question. However, structural differences, systematic discrepancies and uncertainties mean that they typically produce different outputs. Rather than selecting a single ‘best’ model, it is desirable to combine them to give a coherent answer to the question at hand.
Many methods of combining ecosystem models assume that one of the models is exactly correct, which is unlikely to be the case. Furthermore, models may not be fitted to the same data, have the same outputs, nor be run for the same time period, making many common methods difficult to implement. In this paper, we use a statistical model to describe the relationship between the ecosystem models, prior beliefs and observations to make coherent predictions of the true state of the ecosystem with robust quantification of uncertainty.
We introduce EcoEnsemble, an R package that takes advantage of the statistical model's structure to efficiently fit the ensemble model, either sampling from the posterior distribution or maximising the posterior density.
We demonstrate EcoEnsemble by investigating what would happen to four fish species in the North Sea under future management scenarios. Although developed for applications in ecology, EcoEnsemble can be used to combine any group of mechanistic models, for example in climate modelling, epidemiology or biology.
Biotic interactions such as predation are difficult ecological processes to quantify in the wild. This is especially the case in the marine environment due to logistical difficulties in capturing animal behaviour. Common approaches use aquarium‐based experiments, live‐tethering, or assays with bait as proxies for quantifying predation pressure. However, these methods often fail to account for natural interactions between species in the wild and may raise ethical and animal welfare concerns.
We designed a novel field‐based method to quantify predator–prey interactions for marine fishes. The “predation dome” is a clear acrylic aquarium that contains a live fish. The dome is filmed and, in contrast to other methods, it allows for natural olfactory and visual cues, and the prey fish is returned to the wild after the assay. Here, we provide a step‐by‐step guide on building and deploying the predation dome in the wild. To demonstrate its use, we quantified predation pressure using the domes in two tropical and two temperate locations.
Piscivores were attracted to the domes and displayed predatory behaviours such as circling and striking. Although the overall number of predatory attacks did not differ among locations, the predation domes revealed higher predation pressure by piscivores at the tropical locations than at the temperate reefs.
Our results show that predation domes represent an ethical and complementary approach to measure predation that may better represent piscivory as compared to other behaviours. Predation domes can be also used to measure other biotic interactions such as territorial defence or courtship.
Remote sensing (RS) increasingly seeks to produce global‐coverage maps of plant functional diversity (PFD) across scales. PFD can be quantified with metrics assessing the dissimilarity of field or RS data. However, their comparison suffers from the lack of normalization approaches that (1) correct for differences in the number and correlation of traits and spectral variables and (2) do not require comparing all available samples to estimate the maximum trait dissimilarity (unfeasible in RS).
We propose a generalizable normalization (GN) based on the maximum potential dissimilarity for the traits and spectral data considered and compare it to more traditional approaches (e.g. the maximum dissimilarity within datasets). To do so, we simulated plant communities with radiative transfer models and compared RS‐based diversity measurements across spatial scales (α‐ and β‐diversity components). Specifically, we assessed the capability of different normalization approaches (GN, local, none) to provide PFD estimates comparable between (1) RS and plant traits and (2) estimates from different RS missions.
Unlike the other approaches, GN provides diversity component estimates that are directly comparable between field data and RS missions with different spectral configurations by removing the effect of differences in the number of traits or bands and the maximum dissimilarity across datasets.
Therefore, GN enables the separated analysis of RS images from different sensors to produce comparable global‐coverage cartography. We suggest GN is necessary to validate RS approaches and develop interpretable maps of PFD using different RS missions.
Accurately estimating species richness in a target region remains a statistical challenge, especially in highly heterogeneous communities. Most richness estimators have been developed under the assumption that data are randomly sampled with replacement or drawn from an infinite population. In reality, however, most field sampling schemes are implemented as sampling without replacement (SWOR). As such, estimators derived under sampling with replacement may overestimate richness as the sampling fraction increases and may not converge to the true richness as the sampling fraction approaches one. Sample‐based incidence data, in which the sampling unit is a plot and only the presence or absence of each species in a chosen plot is recorded, are one of the most commonly used data types for assessing species diversity in ecological studies.
In this manuscript, a new richness estimator for sample‐based incidence data collected through SWOR is proposed, using a truncated beta‐binomial mixture model. The new estimator was obtained through the moment approach, which avoids iterative numerical algorithms for parameter estimation and yields a closed‐form estimator as an alternative to the maximum likelihood method. Although the newly proposed method is a parametric richness estimator, similar to nonparametric estimators it requires only the rare species in the sample (i.e. the frequencies of uniques and duplicates) to estimate undetected richness.
Based on hypothetical models, the statistical performance of the proposed estimator is evaluated under varying degrees of heterogeneity and different mean species detection rates. The simulation results indicate that, compared to other widely used nonparametric and parametric estimators, the proposed estimator has a smaller bias and lower root mean square error when the sampling fraction is greater than 10%, particularly in highly heterogeneous communities.
In addition, one ForestGEO permanent forest plot dataset is used to evaluate and compare the proposed approach with the other estimators discussed in the study. The results demonstrate that the proposed estimator, in comparison to other widely used estimators, produces a less biased estimate of true richness, along with a more accurate 95% confidence interval.
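For orientation, the classic Chao2 estimator shows how undetected richness can be estimated from the frequencies of uniques (Q1) and duplicates (Q2) alone; it is a nonparametric relative of the proposed estimator, not the paper's moment-based beta-binomial formula, and it was derived under sampling with replacement — precisely the setting in which overestimation can occur as the sampling fraction grows.

```python
def chao2(S_obs, Q1, Q2, T):
    """Classic Chao2 lower-bound richness estimator from sample-based
    incidence data: S_obs observed species, Q1 uniques (species found
    in exactly one plot), Q2 duplicates (found in exactly two), T plots."""
    if Q2 > 0:
        return S_obs + (T - 1) / T * Q1 ** 2 / (2 * Q2)
    # bias-corrected form used when there are no duplicates
    return S_obs + (T - 1) / T * Q1 * (Q1 - 1) / 2

print(chao2(S_obs=100, Q1=20, Q2=10, T=50))  # 119.6
```

Note that only S_obs, Q1, Q2 and T enter the formula — the same minimal inputs the abstract highlights for the new estimator.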
Projects focused on movement behaviour and home range are commonplace, but beyond a focus on choosing appropriate research questions, there are no clear guidelines for such studies. Without these guidelines, designing an animal tracking study to produce reliable estimates of space‐use and movement properties (necessary to answer basic movement ecology questions) is often done in an ad hoc manner.
We developed ‘movedesign’, a user‐friendly Shiny application, which can be utilized to investigate the precision of three estimates regularly reported in movement and spatial ecology studies: home range area, speed and distance travelled. Conceptually similar to statistical power analysis, this application enables users to assess the degree of estimate precision that may be achieved with a given sampling design; that is, the choices regarding data resolution (sampling interval) and battery life (sampling duration).
Leveraging the ‘ctmm’ R package, we utilize two methods proven to handle many common biases in animal movement datasets: autocorrelated kernel density estimators (AKDEs) and continuous‐time speed and distance (CTSD) estimators. Longer sampling durations are required to reliably estimate home range areas via the detection of a sufficient number of home range crossings. In contrast, speed and distance estimation requires a sampling interval short enough to ensure that a statistically significant signature of the animal's velocity remains in the data.
This application addresses key challenges faced by researchers when designing tracking studies, including the trade‐off between long battery life and high resolution of GPS locations collected by the devices, which may force a compromise between reliably estimating home range or speed and distance. ‘movedesign’ has broad applications for researchers and decision‐makers, supporting them in focusing efforts and resources on the optimal sampling design for their research questions, prioritizing the deployment decisions that yield insightful and reliable outputs, while understanding the trade‐offs associated with these choices.
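The two sampling-design trade-offs can be caricatured with back-of-the-envelope effective sample sizes (a deliberately crude sketch; ‘movedesign’ and ‘ctmm’ perform full model-based calculations, and all numbers below are hypothetical):

```python
def effective_sizes(duration_days, interval_hours, crossing_time_days,
                    velocity_autocorr_hours):
    """Rough effective sample sizes behind the two trade-offs described
    above (illustrative only). Home-range estimation is limited by the
    number of range crossings observed over the deployment; speed and
    distance estimation requires fixes frequent enough to fall within
    the velocity autocorrelation timescale."""
    n_area = duration_days / crossing_time_days
    n_speed = 0 if interval_hours > velocity_autocorr_hours else \
        duration_days * 24 / interval_hours
    return n_area, n_speed

# Hypothetical deployment: 180 days, hourly fixes, 3-day range crossings,
# 2-hour velocity autocorrelation.
n_area, n_speed = effective_sizes(duration_days=180, interval_hours=1,
                                  crossing_time_days=3,
                                  velocity_autocorr_hours=2)
print(n_area, n_speed)  # 60.0 range crossings, 4320.0 informative fixes
```

A longer battery (duration) raises n_area but usually forces a coarser interval, which can push n_speed to zero — the compromise described above.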
(1) Earth’s biosphere is undergoing drastic reorganization due to the sixth mass extinction brought on by the Anthropocene. Impacts of local and regional extirpation of species have been demonstrated to propagate through the complex interaction networks they are part of, leading to secondary extinctions and exacerbating biodiversity loss. Contemporary ecological theory has developed several measures to analyse the structure and robustness of ecological networks under biodiversity loss. However, a toolbox for directly simulating and quantifying extinction cascades and creating novel interactions (i.e. rewiring) remains absent.
(2) Here, we present NetworkExtinction—a novel R package which we have developed to explore the propagation of species extinction sequences through ecological networks and quantify the effects of rewiring potential in response to primary species extinctions. With NetworkExtinction, we integrate ecological theory and computational simulations to develop functionality with which users may analyse and visualize the structure and robustness of ecological networks. The core functions introduced with NetworkExtinction focus on simulations of sequential primary extinctions and associated secondary extinctions, allowing user-specified secondary extinction thresholds and realization of rewiring potential.
(3) With the package NetworkExtinction, users can estimate the robustness of ecological networks after performing species extinction routines based on several algorithms. Moreover, users can compare the number of simulated secondary extinctions against a null model of random extinctions. In-built visualizations enable graphing topological indices calculated by the deletion sequence functions after each simulation step. Finally, the user can estimate the network’s degree distribution by fitting different common distributions. Here, we illustrate the use of the package and its outputs by analysing a Chilean coastal marine food web.
(4) NetworkExtinction is a compact and easy-to-use R package with which users can quantify changes in ecological network structure in response to different patterns of species loss, thresholds and rewiring potential. Therefore, this package is particularly useful for evaluating ecosystem responses to anthropogenic and environmental perturbations that produce nonrandom and sometimes targeted, species extinctions.
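The basic bottom-up extinction cascade can be illustrated in a few lines of Python (a sketch of the simplest rule only; NetworkExtinction additionally supports user-specified thresholds, rewiring and null-model comparisons):

```python
def cascade(prey_of, basal, primary_removals):
    """Simulate secondary extinctions in a food web after removing a
    set of species. A consumer goes extinct when it loses all of its
    prey; basal species persist unless removed directly."""
    alive = set(prey_of) | set(basal)
    alive -= set(primary_removals)
    changed = True
    while changed:
        changed = False
        for sp in list(alive):
            if sp in basal:
                continue
            if not (prey_of[sp] & alive):  # no surviving prey left
                alive.remove(sp)
                changed = True
    return alive

# Hypothetical three-level chain: one plant, two herbivores, one carnivore.
web = {"carnivore": {"herbivore1", "herbivore2"},
       "herbivore1": {"plant"}, "herbivore2": {"plant"}}
print(sorted(cascade(web, basal={"plant"}, primary_removals={"plant"})))
# removing the plant collapses the whole chain -> []
```

Repeating this over an ordered deletion sequence and counting survivors at each step is the essence of the robustness curves the package computes.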
Microbiomes never stop changing. Their compositions and functions are shaped by the complex interplay of intrinsic and extrinsic drivers, such as growth and migration rates, species interactions, available nutrients and environmental conditions. Mathematical models help us make sense of these complex drivers and intuitively explain how, why and when specific microbiome states are reached while others are not. To make simulations of microbiome dynamics intuitive and accessible, we present miaSim.
miaSim provides users with a wide range of possibilities to match their specific assumptions and scenarios, starting from a core implementation of four widely used models (namely the stochastic logistic model, MacArthur's consumer‐resource model, Hubbell's neutral model and the generalized Lotka‐Volterra model) and several of their derivations. The diverse model implementations share the same data structures and, whenever possible, share state variables, which significantly facilitates cross‐model combinations and comparisons.
We combined and simulated some published examples of microbiome models in miaSim and performed cross‐model comparisons and tested diverse model assumptions. Our examples illustrate the reliability, robustness and user‐friendliness of the package. In addition, miaSim is accompanied by miaSimShiny, which allows users to explore the parameter space of their models in real‐time in an intuitive graphical interface. miaSim is fully compatible with the ‘miaverse’, an R/Bioconductor framework for microbiome data science, allowing users to combine and compare model simulations with microbiome datasets. The stable version of miaSim is available through Bioconductor 3.15, and the version for future development is available at https://github.com/microbiome/miaSim. miaSimShiny is available at https://github.com/gaoyu19920914/miaSimShiny.
We anticipate that miaSim will significantly facilitate the task of simulating microbiome dynamics, highlighting the role of ecological simulations as important tools in microbiome data science.
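As an illustration of one of the four core models, a minimal generalized Lotka‐Volterra simulation might look as follows (plain Python/NumPy, not miaSim's API; parameter values are arbitrary):

```python
import numpy as np

def glv(x0, r, A, dt=0.01, steps=5000):
    """Generalized Lotka-Volterra dynamics dx_i/dt = x_i (r_i + (A x)_i),
    integrated with forward Euler (a minimal sketch; miaSim provides
    stochastic variants and other models behind a common interface)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x += dt * x * (r + A @ x)
        x = np.clip(x, 0.0, None)  # abundances stay non-negative
    return x

# Two competing taxa with logistic self-limitation:
r = np.array([1.0, 0.8])
A = np.array([[-1.0, -0.5],
              [-0.5, -1.0]])
x = glv([0.1, 0.1], r, A)
print(np.round(x, 2))  # converges near the coexistence equilibrium [0.8, 0.4]
```

Sharing state variables (here, the abundance vector) across model implementations is what makes cross‐model combinations straightforward.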
In the light of declining biodiversity, monitoring its fate is essential for conservation strategies. Aggregation of temporal change of different species into multi-species indices such as geometric means makes it possible to identify species groups that are at risk as well as those that are doing well. However, aggregated indices mask the between-species variability in the temporal trajectories, which could be of high relevance for conservation actions.
We propose a toolbox, available as an R package, to investigate compositions of species dynamics in geometric mean multi-species indices. The toolbox is based on a dynamic factor analysis which uses species dynamics and their uncertainty to (1) identify common latent trends in those species dynamics, (2) display the variability of species dynamics and (3) extract clusters of species with similar dynamics within the species groups used for the indices.
We apply the toolbox to common breeding birds in Sweden and explore the variability in dynamics among species included in EU-official indices for farmland and woodland species, highlighting clusters of species with related dynamics previously hidden by averaging.
The toolbox is designed to be applicable to a wide range of ecological monitoring data. By enabling a deeper exploration of the structure behind existing indices, we may refine our understanding of biodiversity change to better inform subsequent conservation policies.
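The aggregation step that the toolbox decomposes is simple to state: a geometric mean index is the exponential of the mean log species index. The toy example below (hypothetical numbers) shows how strongly diverging species trajectories can still produce a nearly flat aggregate:

```python
import numpy as np

def geometric_mean_index(indices):
    """Multi-species indicator: geometric mean across species of each
    species' abundance index (each scaled to 1 in the base year).
    Equivalent to exponentiating the mean of the log indices."""
    logs = np.log(np.asarray(indices, dtype=float))
    return np.exp(logs.mean(axis=0))

# Three species, four years; one doubles while another halves:
idx = [[1.0, 1.1, 1.2, 1.3],
       [1.0, 1.2, 1.6, 2.0],
       [1.0, 0.9, 0.7, 0.5]]
out = geometric_mean_index(idx)
print(np.round(out, 3))
# the aggregate stays close to 1 while species diverge -- the masking
# of between-species variability discussed above
```

The dynamic factor analysis in the toolbox works on the individual log trajectories behind this mean, which is where the masked structure lives.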
The description of biological objects, such as seeds, mainly relies on manual measurements of a few characteristics and on visual classification of structures, both of which can be subjective, error‐prone and time‐consuming. Image analysis tools offer means to address these shortcomings, but we currently lack a method capable of automatically handling seeds from different taxa with varying morphological attributes and obtaining interpretable results.
Here, we provide a simple image acquisition and processing protocol and introduce Traitor, an open‐source software tool available as a command‐line interface (CLI), which automates the extraction of seed morphological traits from images. The workflow for trait extraction consists of scanning seeds against a high‐contrast background, correcting image colours, and analysing images with the software. Traitor is capable of processing hundreds of images of varied taxa simultaneously with just three commands, and without any need for training, manual fine‐tuning or thresholding. The software automatically detects each object in the image and extracts size measurements, traditional morphometric descriptors widely used by scientists and practitioners, standardised shape coordinates, and colorimetric measurements.
The method was tested on a dataset comprising 91,667 images of seeds from 1228 taxa. Traitor's extracted average length and width values closely matched the average manual measurements obtained from the same collection (concordance correlation coefficient of 0.98). Further, we used a large image dataset to demonstrate how Traitor's output can be used to obtain representative seed colours for taxa, determine the phylogenetic signal of seed colour, and build objective classification categories for shape with high levels of visual interpretability.
Our approach increases productivity and allows for large‐scale analyses that would otherwise be unfeasible. Traitor enables the acquisition of data that are readily comparable across different taxa, opening new avenues to explore the functional relevance of morphological traits and to develop new tools for seed identification.
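The agreement statistic quoted above, the concordance correlation coefficient, can be computed directly from its definition (the measurement values below are hypothetical):

```python
import numpy as np

def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient:
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2).
    Unlike Pearson's r, it penalizes systematic offsets and scale
    differences, equalling 1 only for perfect agreement (y = x)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = np.cov(x, y, bias=True)[0, 1]
    return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

manual = [2.0, 3.1, 4.2, 5.0]  # hypothetical manual seed lengths (mm)
auto = [2.1, 3.0, 4.3, 5.1]    # hypothetical automated measurements
print(round(concordance_ccc(manual, auto), 3))
```

A value near 1, as reported for Traitor, indicates the automated measurements track the manual ones with neither bias nor scale distortion.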
1. Two main types of species distribution models are used to project species range shifts in future climatic conditions: correlative and process-based models. Although there is some continuity between these two types of models, they are fundamentally different in their hypotheses (statistical relationships vs. mechanistic relationships) and their calibration methods (SDMs tend to be occurrence data driven while PBMs tend to be prior driven).
2. One of the limitations to the use of process-based models is the difficulty of parameterizing them for a large number of species compared to correlative SDMs. We investigated the feasibility of using an evolutionary algorithm (the covariance matrix adaptation evolution strategy, CMA-ES) to calibrate process-based models using species distribution data. This method is well established in some fields (robotics, aerodynamics, etc.), but to our knowledge has never been used in ecology, despite its ability to deal with very large space dimensions. Using tree species occurrence data across Europe, we adapted the CMA-ES algorithm to find appropriate values of model parameters. We estimated simultaneously 27–77 parameters of two process-based models simulating the ecophysiology of forest trees for three species with varying range sizes and geographical distributions.
3. CMA-ES provided parameter estimates leading to better prediction of species distribution than parameter estimates based on expert knowledge. Our results also revealed that some model parameters and processes were strongly dependent, and different parameter combinations could therefore lead to high model accuracy.
4. We conclude that CMA-ES is an efficient state-of-the-art method to calibrate process-based models with a large number of parameters using species occurrence data. Inverse modelling using CMA-ES is a powerful way to calibrate process-based model parameters that can hardly be measured directly. However, as with correlative models, the method does not guarantee that parameter estimates are correct because of several sources of bias, and expert knowledge is required to validate the results.
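To convey the flavour of the calibration loop without reproducing CMA-ES itself, here is a (1+1) evolution strategy with the classic 1/5 success rule on a toy problem; CMA-ES additionally adapts a full covariance matrix over the search distribution, which is what makes it effective in the 27–77-parameter setting described above:

```python
import numpy as np

rng = np.random.default_rng(0)

def one_plus_one_es(loss, x0, sigma=0.5, iters=2000):
    """(1+1) evolution strategy with the 1/5 success rule -- a much
    simpler relative of CMA-ES, shown only to illustrate the loop:
    propose parameters, score them against the data, keep improvements,
    and adapt the proposal step size."""
    x, fx = np.array(x0, float), loss(x0)
    for _ in range(iters):
        cand = x + sigma * rng.standard_normal(x.size)
        fc = loss(cand)
        if fc < fx:
            x, fx = cand, fc
            sigma *= 1.22  # expand the step after a success
        else:
            sigma *= 0.95  # shrink it after a failure
    return x, fx

# Toy "calibration": recover hidden parameters minimizing model-data mismatch.
target = np.array([1.5, -2.0, 0.5])
loss = lambda p: float(np.sum((np.asarray(p) - target) ** 2))
x, fx = one_plus_one_es(loss, [0.0, 0.0, 0.0])
print(np.round(x, 2))  # close to the hidden target parameters
```

In the real application, `loss` would compare process-based model output (e.g. simulated presence) against the occurrence data rather than a known vector.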
Rhythmicity in the millisecond to second range is a fundamental building block of communication and coordinated movement. But how widespread are rhythmic capacities across species, and how did they evolve under different environmental pressures? Comparative research is necessary to answer these questions but has been hindered by limited crosstalk and comparability among results from different study species.
Most acoustics studies do not explicitly focus on characterising or quantifying rhythm, but many are just a few scrapes away from contributing to and advancing the field of comparative rhythm research. Here, we present an eight‐level rhythm reporting framework which details actionable steps researchers can take to report rhythm‐relevant metrics. Levels fall into two categories: metric reporting and data sharing. Metric reporting levels include defining rhythm‐relevant metrics, providing point estimates of temporal interval variability, reporting interval distributions, and conducting rhythm analyses. Data sharing levels are: sharing audio recordings, sharing interval durations, sharing sound element start and end times, and sharing audio recordings with sound element start/end times.
Using sounds recorded from a sperm whale as a case study, we demonstrate how each reporting framework level can be implemented on real data. We also highlight existing best practice examples from recent research spanning multiple species. We clearly detail how engagement with our framework can be tailored case‐by‐case based on how much time and effort researchers are willing to contribute. Finally, we illustrate how reporting at any of the suggested levels will help advance comparative rhythm research.
This framework will actively facilitate a comparative approach to acoustic rhythms while also promoting cooperation and data sustainability. By quantifying and reporting rhythm metrics more consistently and broadly, new avenues of inquiry and several long‐standing, big picture research questions become more tractable. These lines of research can inform not only about the behavioural ecology of animals but also about the evolution of rhythm‐relevant phenomena and the behavioural neuroscience of rhythm production and perception. Rhythm is clearly an emergent feature of life; adopting our framework, researchers from different fields and with different study species can help understand why.
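Two of the point estimates of temporal interval variability mentioned in the framework, the coefficient of variation of inter-onset intervals and the normalized pairwise variability index (nPVI), can be computed as follows (the onset times below are invented):

```python
import numpy as np

def interval_metrics(onsets):
    """Two point estimates of temporal interval variability: the
    coefficient of variation (CV) of inter-onset intervals, and the
    normalized pairwise variability index (nPVI), which compares each
    interval with its successor."""
    ioi = np.diff(np.asarray(onsets, dtype=float))
    cv = ioi.std(ddof=1) / ioi.mean()
    npvi = 100 / (len(ioi) - 1) * sum(
        abs(a - b) / ((a + b) / 2) for a, b in zip(ioi[:-1], ioi[1:]))
    return cv, npvi

# A perfectly isochronous click train has zero variability:
cv, npvi = interval_metrics([0.0, 0.5, 1.0, 1.5, 2.0])
print(cv, npvi)  # 0.0 0.0
```

Reporting even these two numbers, alongside shared interval durations, already satisfies several levels of the framework.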
The value of large‐scale collaborations for solving complex problems is widely recognized, but many barriers hinder meaningful authorship for all on the resulting multi‐author publications. Because many professional benefits arise from authorship, much of the literature on this topic has focused on cheating, conflict and effort documentation. However, approaches specifically recognizing and creatively overcoming barriers to meaningful authorship have received little attention.
We have developed an inclusive authorship approach arising from 15 years of experience coordinating the publication of over 100 papers arising from a long‐term, international collaboration of hundreds of scientists.
This method of sharing a paper initially as a storyboard with clear expectations, assignments and deadlines fosters communication and creates unambiguous opportunities for all authors to contribute intellectually. By documenting contributions through this multi‐step process, this approach ensures meaningful engagement by each author listed on a publication.
The perception that co‐authors on large authorship publications have not meaningfully contributed underlies widespread institutional bias against multi‐authored papers, disincentivizing large collaborations despite their widely recognized value for advancing knowledge. Our approach identifies and overcomes key barriers to meaningful contributions, protecting the value of authorship even on massively multi‐authored publications.
1. Photo observations are a highly valuable but rarely used source of citizen science (CS) data. Recently, the number of publicly available photo observations has increased strongly, for example, due to the use of smartphone applications for species identification. This has enabled new ecological insights into poorly studied subjects. One of the fields with the highest potential to benefit from the use of photo observations is phenology.
2. We propose a workflow for iPhenology, the use of publicly available photo observations to track phenological events at large scales. The workflow comprises data acquisition, cleaning of observations, phenological classification and modelling spatiotemporal patterns of phenology. We explore the suitability of iPhenology to observe key phenological stages in the plant reproductive cycle of a model species and discuss limitations and future prospects of the approach using the example of an invasive species in Europe.
3. We show that iPhenology is suitable to track key phenological events of widespread species. However, the number and quality of available observations may differ among species and phenological stages.
4. Overall, publicly available CS photo observations are suitable to track key phenological events and can thus significantly advance knowledge of the timing and drivers of plant phenology. In future, integrating the workflow with automated image processing and analysis may enable real-time tracking of plant phenology.
Classifying insect species involves a tedious process of identifying distinctive morphological insect characters by taxonomic experts. Machine learning can harness the power of computers to potentially create an accurate and efficient method for performing this task at scale, given that its analytical processing can be more sensitive to subtle physical differences in insects, which experts may not perceive. However, existing machine learning methods are designed to only classify insect samples into described species, thus failing to identify samples from undescribed species.
We propose a novel deep hierarchical Bayesian model for insect classification, given the taxonomic hierarchy inherent in insects. This model can classify samples of both described and undescribed species; described samples are assigned a species while undescribed samples are assigned a genus, which is a pivotal advancement over just identifying them as outliers. We demonstrated this proof of concept on a new database containing paired insect image and DNA barcode data from four insect orders, including 1040 species, which far exceeds the number of species used in existing work. A quarter of the species were excluded from the training set to simulate undescribed species.
With the proposed classification framework using combined image and DNA data in the model, species classification accuracy for described species was 96.66% and genus classification accuracy for undescribed species was 81.39%. Including both data sources in the model resulted in significant improvement over including image data only (39.11% accuracy for described species and 35.88% genus accuracy for undescribed species), and modest improvement over including DNA data only (73.39% genus accuracy for undescribed species).
Unlike current machine learning methods, the proposed deep hierarchical Bayesian learning approach can simultaneously classify samples of both described and undescribed species, a functionality that could become instrumental in biodiversity monitoring across the globe. This framework can be customized for any taxonomic classification problem for which image and DNA data can be obtained, thus making it relevant for use across all biological kingdoms.
Associating fish sounds to specific species and behaviours is important for making passive acoustics a viable tool for monitoring fish. While recording fish sounds in tanks can sometimes be performed, many fish do not produce sounds in captivity. Consequently, there is a need to identify fish sounds in situ and characterise these sounds under a wide variety of behaviours and habitats.
We designed three portable audio-video platforms capable of identifying species-specific fish sounds in the wild: a large array, a mini array and a mobile array. The large and mini arrays are static autonomous platforms that can be deployed on the seafloor and record audio and video for one to two weeks. They use multichannel acoustic recorders and low-cost video cameras mounted on PVC frames. The mobile array also uses a multichannel acoustic recorder, but mounted on a remotely operated vehicle with built-in video, which allows remote control and real-time positioning in response to observed fish presence. For all arrays, fish sounds were localised in three dimensions and matched to the fish positions in the video data. We deployed these three platforms at four locations off British Columbia, Canada.
The large array provided the best localisation accuracy and, with its larger footprint, was well suited to habitats with a flat seafloor. The mini and mobile arrays had lower localisation accuracy but were easier to deploy, and well suited to rough/uneven seafloors. Using these arrays, we identified, for the first time, sounds from quillback rockfish Sebastes maliger, copper rockfish Sebastes caurinus and lingcod Ophiodon elongatus. In addition to measuring temporal and spectral characteristics of sounds for each species, we estimated mean source levels for lingcod and quillback rockfish sounds (115.4 and 113.5 dB re 1 μPa, respectively) and maximum detection ranges at two sites (between 10.5 and 33 m).
All proposed array designs successfully identified fish sounds in the wild and were adapted to various budget, logistical and habitat constraints. We include here building instructions and processing scripts to help users replicate this methodology, identify more fish sounds around the world and make passive acoustics a more viable way to monitor fish.
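The first step of the localisation pipeline, estimating a time difference of arrival (TDOA) between two hydrophone channels by cross-correlation, can be sketched as follows (synthetic signals; the full 3-D localisation from multiple TDOAs is omitted):

```python
import numpy as np

def tdoa(sig_a, sig_b, fs):
    """Estimate the time difference of arrival of the same sound on two
    hydrophones by cross-correlation. Sets of such pairwise delays across
    an array are what allow a sound source to be localised in 3-D."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = np.argmax(corr) - (len(sig_b) - 1)
    return lag / fs

fs = 1000  # Hz (illustrative sampling rate)
t = np.arange(0, 0.2, 1 / fs)
pulse = np.exp(-((t - 0.05) ** 2) / 1e-4)    # pulse arriving at 50 ms
delayed = np.exp(-((t - 0.08) ** 2) / 1e-4)  # same pulse arriving at 80 ms
print(tdoa(delayed, pulse, fs))  # 0.03 s: this channel hears it 30 ms later
```

Matching the localised source position against fish positions in the synchronized video is then what ties a sound to a species.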
Surveillance programmes are essential for detecting emerging pathogens and often rely on molecular methods to make inference about the presence of a target disease agent. However, molecular methods rarely detect target DNA perfectly. For example, molecular pathogen detection methods can result in misclassification (i.e. false positives and false negatives) or partial detection errors (i.e. detections with ‘ambiguous’, ‘uncertain’ or ‘equivocal’ results). Then, when data are to be analysed, these partial observations are either discarded or censored; this, however, disregards information that could be used to make inference about the true state of the system. There is a critical need for more direction and guidance related to how many samples are enough to declare a unit of interest ‘pathogen free’.
Here, we develop a Bayesian hierarchical framework that accommodates false negative, false positive and uncertain detections to improve inference related to the occupancy of a pathogen. We apply our modelling framework to a case study of the fungal pathogen Pseudogymnoascus destructans (Pd) identified in Texas bats at the invasion front of white‐nose syndrome. To improve future surveillance programmes, we provide guidance on sample sizes required to be 95% certain a target organism is absent from a site.
We found that the presence of uncertain detections increased the variability of resulting posterior probability distributions of pathogen occurrence, and that our estimates of required sample size were very sensitive to prior information about pathogen occupancy, pathogen prevalence and diagnostic test specificity. In the Pd case study, we found that the posterior probability of occupancy was very low in 2018, but occupancy probability approached 1 in 2020, reflecting increasing prior probabilities of occupancy and prevalence elicited from the site manager.
Our modelling framework provides the user a posterior probability distribution of pathogen occurrence, which allows for subjective interpretation by the decision‐maker. To help readers apply and use the methods we developed, we provide an interactive RShiny app that generates target species occupancy estimation and sample size estimates to make these methods more accessible to the scientific community (https://rmummah.shinyapps.io/ambigDetect_sampleSize). This modelling framework and sample size guide may be useful for improving inferences from molecular surveillance data about emerging pathogens, non‐native invasive species and endangered species where misclassifications and ambiguous detections occur.
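Under strong simplifying assumptions (perfect specificity, independent samples, per-sample detection probability equal to prevalence times diagnostic sensitivity — a caricature of the hierarchical model above, not the authors' code), the sample-size logic can be sketched as:

```python
def samples_for_freedom(psi, prevalence, sensitivity, target=0.95):
    """Smallest number of all-negative samples needed to be `target`
    certain a site is pathogen-free. psi is the prior probability the
    site is occupied; each sample detects the pathogen with probability
    prevalence * sensitivity when the site is occupied."""
    p_detect = prevalence * sensitivity
    n = 0
    p_occ = psi  # posterior probability of occupancy, updated by Bayes
    while 1 - p_occ < target:
        n += 1
        neg_given_occ = (1 - p_detect) ** n
        p_occ = psi * neg_given_occ / (psi * neg_given_occ + (1 - psi))
    return n

# Hypothetical site: 50% prior occupancy, 10% prevalence, 90% sensitivity.
print(samples_for_freedom(psi=0.5, prevalence=0.1, sensitivity=0.9))  # 32
```

The strong dependence of the answer on `psi` and `prevalence` mirrors the prior sensitivity reported above; uncertain detections and imperfect specificity, handled by the full model, would further change the required n.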
Global, continental and regional maps of concentrations, stocks and fluxes of natural resources provide baseline data to assess how ecosystems respond to human disturbance and global warming. They are also used as input to numerous modelling efforts. But these maps suffer from multiple error sources and hence it is good practice to report estimates of the associated map uncertainty, so that users can evaluate their fitness for use.
We explain why quantification of uncertainty of spatial aggregates is more complex than uncertainty quantification at point support, because it must account for spatial autocorrelation of the map errors. Unfortunately, this is not done in a number of recent high-profile studies. We describe how spatial autocorrelation of map errors can be accounted for with block kriging, a method that requires geostatistical expertise. Next, we propose a new, model-based approach that avoids the numerical complexity of block kriging and is feasible for large-scale studies where maps are typically made using machine learning. Our approach relies on Monte Carlo integration to derive the uncertainty of the spatial average or total from point support prediction errors. We account for spatial autocorrelation of the map error by geostatistical modelling of the standardized map error.
We show that the uncertainty strongly depends on the spatial autocorrelation of the map errors. In a first case study, we used block kriging to show that the uncertainty of the predicted topsoil organic carbon in France decreases when the support increases. In a second case study, we estimated the uncertainty of spatial aggregates of a machine learning map of the aboveground biomass in Western Africa using Monte Carlo integration. We found that this uncertainty was small because of the weak spatial autocorrelation of the standardized map errors.
We present a tool to get realistic estimates of the uncertainty of spatial averages and totals of natural resources maps. The method presented in this paper is essential for parties that need to evaluate whether differences in aggregated environmental variables or natural resources between regions or over time are statistically significant.
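The dependence of aggregate uncertainty on error autocorrelation is easy to demonstrate by Monte Carlo on a toy 1-D "map" (this sketch simulates errors from an assumed exponential correlogram; the approach described above additionally standardizes the map errors and models them geostatistically):

```python
import numpy as np

def se_of_spatial_mean(corr_range, n=100, sims=4000, seed=1):
    """Monte Carlo standard error of the spatial mean of a map whose
    point-support errors have unit variance and an exponential
    correlogram exp(-h / corr_range) along a transect of n pixels."""
    rng = np.random.default_rng(seed)
    h = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    C = np.exp(-h / corr_range)                   # error covariance matrix
    L = np.linalg.cholesky(C + 1e-10 * np.eye(n))
    errors = L @ rng.standard_normal((n, sims))   # simulated error maps
    return errors.mean(axis=0).std()              # SE of the spatial mean

# Nearly independent errors: SE close to 1/sqrt(100) = 0.1.
# Strongly autocorrelated errors: SE several times larger.
print(round(se_of_spatial_mean(0.001), 2), round(se_of_spatial_mean(25.0), 2))
```

Treating the pixels as independent (the mistake discussed above) would report the small first number even when the second, much larger one is correct.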
1. The ability of marine mammals to accumulate sufficient lipid energy reserves is vital for their survival and successful reproduction. However, long-term monitoring of at-sea changes in body condition, specifically lipid stores, has only been possible in elephant seals performing prolonged drift dives (low-density lipids alter the rates of depth change while drifting). This approach has limited applicability to other species.
2. Using hydrodynamic performance analysis during transit glides, we developed and validated a novel satellite-linked data logger that calculates real-time changes in body density (∝lipid stores). As gliding is ubiquitous amongst divers, the system can assess body condition in a broad array of diving animals. The tag processes high sampling rate depth and three-axis acceleration data to identify 5 s high pitch angle glide segments at depths >100 m. Body density is estimated for each glide using gliding speed and pitch to quantify drag versus buoyancy forces acting on the gliding animal.
3. We used tag data from 24 elephant seals (Mirounga spp.) to validate the onboard calculation of body density relative to drift rate. The new tags relayed body density estimates over 200 days and documented lipid store accumulation during migration with good correspondence between changes in body density and drift rate. Our study provided updated drag coefficient values for gliding (C d,f = 0.03) and drifting (C d,s = 0.12) elephant seals, both substantially lower than previous estimates. We also demonstrated post-hoc estimation of the gliding drag coefficient and body density using transmitted data, which is especially useful when drag parameters cannot be estimated with sufficient accuracy before tag deployment.
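The underlying force balance can be sketched as follows: in a steady glide the along-path component of net buoyancy balances drag, so body density follows from glide speed and pitch. All parameter values here are invented for illustration and are not those used by the tag:

```python
import math

def body_density(speed, pitch_deg, rho_sw=1027.0, cda=0.03 * 0.2,
                 volume=0.4, g=9.81, descending=True):
    """Body density from a steady glide, balancing drag against the
    along-path component of net buoyancy:
    |rho_body - rho_sw| * V * g * sin(|pitch|) = 0.5 * rho_sw * CdA * v^2.
    cda is the drag coefficient times reference area (m^2), volume in m^3."""
    drag = 0.5 * rho_sw * cda * speed ** 2
    delta = drag / (volume * g * math.sin(math.radians(abs(pitch_deg))))
    # a steadily descending glider must be denser than seawater,
    # a steadily ascending one less dense
    return rho_sw + delta if descending else rho_sw - delta

# Hypothetical steep descent glide at 1.5 m/s:
print(round(body_density(speed=1.5, pitch_deg=-60, descending=True), 1))
```

As lipid stores grow, body density falls toward (and below) seawater density, which is the signal the tag tracks over migration.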
1. Spatially explicit densities of wildlife are important for understanding environmental drivers of populations, and density surfaces of intraspecific classes allow exploration of links between demographic ratios and environmental conditions. Although spatially explicit densities and class densities are valuable, conventional design-based estimators remain prevalent when using camera-trapping methods for unmarked populations.
2. We developed a density surface model that utilized camera trap distance sampling data within a hierarchical generalized additive modelling framework. We estimated density surfaces of intraspecific classes of a common ungulate, white-tailed deer Odocoileus virginianus, across three large management regions in Indiana, United States. We then extended simple statistical theory to test for differences in two ratios of density.
3. Deer density was influenced by landscape fragmentation, wetlands and anthropogenic development. We documented class-specific responses of density to availability of concealment cover, and found strong evidence that increased recruitment of young was tied to increased resource availability from anthropogenic agricultural land use. The coefficients of variation of the total density estimates within the three regions we surveyed were 0.11, 0.10 and 0.06.
4. Synthesis and applications. Our strategy extends camera trap distance sampling and enables managers to use camera traps to better understand spatial predictors of density. Our density estimates were more precise than previous estimates from camera trap distance sampling. Population managers can use our methods to detect finer spatiotemporal changes in density or ratios of intraspecific-class densities. Such changes in density can be linked to land use, or to management regimes on habitat and harvest limits of game species.
Among the many diversity indices in the ecologist's toolbox, measures that can be partitioned into additive terms are particularly useful, as the different components can be related to different ecological processes shaping community structure.
In this paper, an additive diversity decomposition is proposed to partition the diversity structure of a given community into three complementary fractions: functional diversity, functional redundancy and species dominance. These three components sum to one, so they can be used to portray community structure in a ternary diagram.
Since the identification of community-level patterns is an essential step to investigate the main drivers of species coexistence, the ternary diagram of functional diversity can be used to relate different facets of diversity to community assembly processes more exhaustively than looking only at one index at a time.
The value of the proposed diversity decomposition is demonstrated by the analysis of actual abundance data on plant assemblages sampled in grazed and ungrazed grasslands in Tuscany (Central Italy).
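One widely used additive scheme of this kind partitions the Gini-Simpson index into Rao's quadratic entropy (functional diversity), functional redundancy and Simpson dominance. Below is a minimal sketch, assuming pairwise dissimilarities scaled to [0, 1]; the paper's exact definitions may differ in detail:

```python
import numpy as np

def partition_diversity(p, d):
    """Additive partition: functional diversity (Rao's Q),
    functional redundancy and species dominance sum to one.
    p: relative abundances (sum to 1);
    d: pairwise functional dissimilarities in [0, 1], d_ii = 0."""
    p = np.asarray(p, float)
    d = np.asarray(d, float)
    Q = p @ d @ p            # Rao's quadratic entropy (functional diversity)
    dom = p @ p              # Simpson concentration (species dominance)
    R = 1.0 - dom - Q        # functional redundancy (what remains of 1)
    return Q, R, dom

p_ab = np.array([0.5, 0.3, 0.2])
d = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.2],
              [1.0, 0.2, 0.0]])
Q, R, dom = partition_diversity(p_ab, d)
```

Because the three fractions are non-negative and sum to one, each community maps to a single point in a ternary diagram, which is what makes the graphical comparison of grazed and ungrazed grasslands possible.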
Keyfitz' entropy is a widely used metric to quantify the shape of the survivorship curve of populations, from plants to animals and microbes. Keyfitz' entropy values <1 correspond to life histories with an increasing mortality rate with age (i.e. actuarial senescence), whereas values >1 correspond to species with a decreasing mortality rate with age (negative senescence), and a Keyfitz entropy of exactly 1 corresponds to a constant mortality rate with age. Keyfitz' entropy was originally defined using a continuous‐time model, and has since been discretised to facilitate its calculation from discrete‐time demographic data.
Here, we show that the previously used discretisation of the continuous‐time metric does not preserve the relationship with increasing, decreasing or constant mortality rates. To resolve this discrepancy, we propose a new discrete‐time formula for Keyfitz' entropy for age‐classified life histories.
We show that this new method of discretisation preserves the relationship with increasing, decreasing, or constant mortality rates. We analyse the relationship between the original and the new discretisation, and we find that the existing metric tends to underestimate Keyfitz' entropy for both short‐lived species and long‐lived species, thereby introducing a consistent bias.
To conclude, to avoid biases when classifying life histories as (non‐)senescent, we suggest researchers use either the new metric proposed here, or one of the many previously suggested survivorship shape metrics applicable to discrete‐time demographic data, such as the Gini coefficient or Hayley's median.
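The classical discretisation discussed above, H = −Σ l_x ln(l_x) / Σ l_x, can be computed directly from a survivorship vector. The sketch below uses hypothetical survivorship curves to illustrate the behaviour the authors describe: a constant mortality rate yields a value slightly below (rather than exactly) one, and a decreasing hazard yields a value above one but biased low relative to the continuous-time metric:

```python
import numpy as np

def keyfitz_entropy_discrete(lx):
    """Classical discretisation of Keyfitz' entropy:
    H = -sum_x l_x * ln(l_x) / sum_x l_x,
    for a discrete survivorship curve l_x with l_0 = 1."""
    lx = np.asarray(lx, float)
    lx = lx[lx > 1e-300]          # drop zeros before taking logs
    return -np.sum(lx * np.log(lx)) / np.sum(lx)

# Constant per-step survival p gives l_x = p**x (constant mortality).
# The continuous-time metric equals exactly 1 here, but this
# discretisation returns -ln(p) * p / (1 - p), slightly below 1.
p = 0.99
H_const = keyfitz_entropy_discrete(p ** np.arange(3000))

# A decreasing hazard (negative senescence): l(x) = exp(-sqrt(x)).
# The continuous-time value is 2; the discretisation lands above 1
# but noticeably below 2, illustrating the underestimation bias.
x = np.arange(5000)
H_dec = keyfitz_entropy_discrete(np.exp(-np.sqrt(x)))
```

This makes the abstract's point concrete: using the threshold H = 1 with the classical discretisation can misclassify a constant-mortality life history as senescent.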
Estimating the genetic variation underpinning a trait is crucial to understanding and predicting its evolution. A key statistical tool to estimate this variation is the animal model. Typically, the environment is modelled as an external variable independent of the organism, affecting the focal phenotypic trait via phenotypic plasticity. We studied what happens if the environment is not independent of the organism because it chooses or adjusts its environment, potentially creating non‐zero genotype–environment correlations.
We simulated a set of biological scenarios assuming the presence or absence of a genetic basis for a focal phenotypic trait and/or the focal environment (treated as an extended phenotype), as well as phenotypic plasticity (the effect of the environment on the phenotypic trait) and/or ‘environmental plasticity’ (the effect of the phenotypic trait on the local environment). We then estimated the additive genetic variance of the phenotypic trait and/or the environment by applying five animal models which differed in which variables were fitted as the dependent variable and which covariates were included.
We show that animal models can estimate the additive genetic variance of the local environment (i.e. the extended phenotype) and can detect environmental plasticity. We show that when the focal environment has a genetic basis, the additive genetic variance of a phenotypic trait increases if there is phenotypic plasticity. We also show that phenotypic plasticity can be mistakenly inferred to exist when it is actually absent and instead environmental plasticity is present. When the causal relationship between the phenotype and the environment is misunderstood, it can lead to severe misinterpretation of the genetic parameters, including finding ‘phantom’ genetic variation for traits that, in reality, have none. We also demonstrate how using bivariate models can partly alleviate these issues. Finally, we provide the mathematical equations describing the expected estimated values.
This study highlights that not taking gene–environment correlations into account can lead to erroneous interpretations of additive genetic variation and phenotypic plasticity estimates. If we aim to understand and predict how organisms adapt to environmental change, we need a better understanding of the mechanisms that may lead to gene–environment correlations.
The flexibility of UAV-lidar remote sensing offers a myriad of new opportunities for savanna ecology, enabling researchers to measure vegetation structure at a variety of temporal and spatial scales. However, this flexibility also increases the number of customizable variables, such as flight altitude, pattern, and sensor parameters, that, when adjusted, can impact data quality as well as the applicability of a dataset to a specific research interest.
To better understand the impacts that UAV flight patterns and sensor parameters have on vegetation metrics, we compared 7 lidar point clouds collected with a Riegl VUX-1LR over a 300 × 300 m area in the Kruger National Park, South Africa. We varied the altitude (60 m above ground, 100 m, 180 m, and 300 m) and sampling pattern (slowing the flight speed, increasing the overlap between flightlines and flying a crosshatch pattern), and compared a variety of vertical vegetation metrics related to height and fractional cover.
Comparing vegetation metrics from acquisitions with different flight patterns and sensor parameters, we found that both flight altitude and pattern had significant impacts on derived structure metrics, with variation in altitude causing the largest impacts. Flying higher resulted in lower point cloud heights, leading to a consistent downward trend in percentile height metrics and fractional cover. The magnitude and direction of these trends also varied depending on the vegetation type sampled (trees, shrubs or grasses), showing that the structure and composition of savanna vegetation can interact with the lidar signal and alter derived metrics. While there were statistically significant differences in metrics among acquisitions, the average differences were often on the order of a few centimetres or less, which shows great promise for future comparison studies.
We discuss how these results apply in practice, explaining the potential trade-offs of flying at higher altitudes and with alternate patterns. We highlight how flight and sensor parameters can be geared toward specific ecological applications and vegetation types, and we explore future opportunities for optimizing UAV-lidar sampling designs in savannas.
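Percentile height metrics and fractional cover of the kind compared above can be derived from the height-above-ground values of a point cloud. A minimal sketch (real pipelines first classify ground returns and normalise heights to a terrain model; the 0.5 m cover break is an illustrative choice):

```python
import numpy as np

def canopy_metrics(z, height_break=0.5, percentiles=(25, 50, 75, 95)):
    """Percentile height metrics and fractional cover from
    height-above-ground values z of lidar returns (toy sketch)."""
    z = np.asarray(z, float)
    heights = {f"p{p}": np.percentile(z, p) for p in percentiles}
    cover = np.mean(z > height_break)  # fraction of returns above the break
    return heights, cover
```

Because these metrics are simple order statistics of the return heights, a systematic downward shift of the point cloud with altitude, as reported above, propagates directly into lower percentile heights and lower cover.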
Ecological data are being collected at a large scale from a multitude of different sources, each with their own sampling protocols and assumptions. As a result, the integration of disparate datasets is a rapidly growing area in quantitative ecology, and is subsequently becoming a major asset in understanding the shifts and trends in species' distributions.
However, the tools and software available to construct statistical models that integrate these disparate datasets into a unified framework are lacking. This has made these methods inaccessible to general practitioners and has stagnated the growth of data integration in more applied settings.
We therefore present PointedSDMs: an easy-to-use R package for constructing integrated species distribution models. It provides functions to format the data easily, fit the models in a computationally efficient way and present the output in a format convenient for additional work.
This paper illustrates the different uses and functions available in the package, which are designed to simplify the modelling of integrated models. A case study using the package is also presented, combining three datasets collected under different sampling protocols, all containing records of Setophaga caerulescens across the state of Pennsylvania.
Organisms such as allopolyploids and F1 hybrids contain multiple distinct subgenomes, each potentially with its own evolutionary history. These organisms present a challenge for multilocus phylogenetic inference and other analyses since it is not apparent which gene copies from different loci are from the same subgenome and thus share an evolutionary history.
Here we introduce homologizer, a flexible Bayesian approach that uses a phylogenetic framework to infer the phasing of gene copies across loci into their respective subgenomes.
Through the use of simulation tests, we demonstrate that homologizer is robust to a wide range of factors, such as incomplete lineage sorting and the phylogenetic informativeness of loci. Furthermore, we establish the utility of homologizer on real data, by analysing a multilocus dataset consisting of nine diploids and 19 tetraploids from the fern family Cystopteridaceae.
Finally, we describe how homologizer may potentially be used beyond its core phasing functionality to identify non‐homologous sequences, such as hidden paralogs or contaminants.
Some capture–recapture models for population estimation cannot easily be fitted by the usual methods (maximum likelihood and Markov‐chain Monte Carlo). For example, there is no straightforward probability model for the capture of animals in traps that hold a maximum of one individual (‘single‐catch traps’), yet such data are commonly collected. It is usual to ignore the limit on individuals per trap and analyse with a competing‐risk ‘multi‐catch’ model that gives unbiased estimates of average density. However, that approach breaks down for models with varying density.
Simulation and inverse prediction was suggested by Efford (2004) for estimating population density with data from single‐catch traps, but the method has been little used, in part because the existing software allows only a narrow range of models. I describe a new R package that refines the method and extends it to include models with varying density, trap interference and other sources of non‐independence among detection histories.
The method depends on (i) a function of the data that generates a proxy for each parameter of interest and (ii) functions to simulate new datasets given values of the parameters. By simulating many datasets, it is possible to infer the relationship between proxies and parameters and, by inverting that relationship, to estimate the parameters from the observed data.
The method is applied to data from a trapping study of brushtail possums Trichosurus vulpecula in New Zealand. A feature of these data is the high frequency of non‐capture events that disabled traps (interference). Allowing for a time‐varying interference process in a model fitted by simulation and inverse prediction increased the steepness of inferred year‐on‐year population decline. Drawbacks and possible extensions of the method are discussed.
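The two ingredients listed above, a proxy function of the data and a simulator, can be illustrated with a deliberately simple toy (recovering a Poisson rate), not the actual single-catch-trap simulator; the grid, proxy and regression choices are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_proxy(theta, n=200):
    """Simulate a dataset under parameter theta and reduce it to a
    proxy statistic (here: log mean count of a Poisson sample)."""
    counts = rng.poisson(theta, size=n)
    return np.log(counts.mean() + 0.5)

# 1) Simulate proxies over a grid of parameter values
grid = np.linspace(2.0, 20.0, 25)
proxies = np.array([np.mean([simulate_proxy(t) for _ in range(20)])
                    for t in grid])

# 2) Fit the proxy-parameter relationship (assumed linear in log theta)
b1, b0 = np.polyfit(np.log(grid), proxies, 1)

# 3) Invert the fitted relationship to estimate theta from an
#    "observed" proxy (generated here with a known true value of 8)
observed = simulate_proxy(8.0)
theta_hat = np.exp((observed - b0) / b1)
```

The estimate carries both simulation noise and any lack of fit in the assumed proxy-parameter relationship, which is why the real method simulates many datasets and chooses proxies carefully.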
Neighbor‐net is a widely used network reconstructing method that approximates pairwise distances between taxa by a circular phylogenetic network.
We present Lpnet, a variant of Neighbor‐net. We first apply standard methods to construct a binary phylogenetic tree and then use integer linear programming to compute an optimal circular ordering that agrees with all tree splits.
This approach achieves an improved approximation of the input distance for the clear majority of experiments that we have run for simulated and real data. We release an implementation in R that can handle up to 94 taxa and usually needs about 1 min on a standard computer for 80 taxa. For larger taxa sets, we include a top‐down heuristic which also tends to perform better than Neighbor‐net.
Lpnet provides an alternative to Neighbor-net and performs better in most cases. We anticipate Lpnet will be useful for generating phylogenetic hypotheses.
Mitochondrial DNA (mtDNA) sequences are often found as byproducts in next‐generation sequencing (NGS) datasets that were originally created to capture genomic or transcriptomic information of an organism. These mtDNA sequences are often discarded, wasting this valuable sequencing information.
We developed MitoGeneExtractor, an innovative tool that extracts mitochondrial protein-coding genes (PCGs) of interest from NGS libraries through multiple sequence alignments of sequencing reads against amino acid references. General references, for example at the order level, are sufficient for mining mitochondrial PCGs. In a case study, we applied MitoGeneExtractor to recently published genomic datasets of 1,993 birds and were able to extract complete or nearly complete sequences of all 13 mitochondrial PCGs for a large proportion of libraries. Compared with an existing assembly-guided sequence reconstruction algorithm, MitoGeneExtractor was faster and substantially more sensitive.
We compared COI sequences mined with MitoGeneExtractor to COI databases. Mined sequences show a high sequence similarity and correct taxonomic assignment between the recovered sequence and the assigned morphospecies in most samples. In some cases of incongruent taxonomic assignments, we found evidence for contamination in NGS libraries.
MitoGeneExtractor allows fast extraction of mitochondrial PCGs from a wide range of NGS datasets. We recommend routinely harvesting and curating mitochondrial sequence information from genomic resources. MitoGeneExtractor output can be used to identify contaminated NGS libraries and to validate the species identity of the sequenced animal based on the extracted COI sequences.
Ecological variables may be expressed on four basic measurement scales (nominal, ordinal, interval or ratio), whereas circular variables and those combining a nominal state with other scale types are also common. However, existing methods are not suited to calculate correlations between all pairwise combinations of such variables, preventing the application of standard multivariate techniques.
The essence of the new approach is to derive a so‐called difference semimatrix for all pairs of observations for each variable, and then to calculate the matrix correlation based on two such semimatrices. The advantage of this function, termed d‐correlation, is that comparisons are made on the same logical basis regardless of the measurement scale, allowing for the use of principal components analysis to visualize interrelationships among many variables simultaneously. Further advantages are that missing values in the data are tolerated and that the Gower index of dissimilarity between objects may also be computed.
The use of the method is demonstrated on a small toy matrix, an artificial plant trait matrix and a large dataset summarizing ecological features of all vascular plant species of Sardinia, Italy. The source code in R and FORTRAN, and applications for three different operating systems, are provided for computations with results serving as input for other statistical software.
The new computational framework will allow the comparison of any types of ecological traits in a mathematically meaningful manner. This option was not available earlier in the field of multivariate statistics, and the method is expected to receive applications in other subject areas as well in which many objects are described in terms of variables expressed on different measurement scales.
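The core idea, correlating two per-variable difference semimatrices, can be sketched in a Mantel-style toy. The difference definitions below (absolute difference for quantitative data, 0/1 mismatch for nominal data) are simplified stand-ins for the paper's scale-specific rules:

```python
import numpy as np
from itertools import combinations

def pairwise_differences(x, nominal=False):
    """Lower-triangle 'difference semimatrix' of one variable, as a
    flat vector over all pairs of observations (simplified sketch)."""
    vals = []
    for i, j in combinations(range(len(x)), 2):
        if nominal:
            vals.append(0.0 if x[i] == x[j] else 1.0)  # state mismatch
        else:
            vals.append(abs(x[i] - x[j]))              # quantitative gap
    return np.array(vals)

def d_correlation(x, y, x_nominal=False, y_nominal=False):
    """Pearson correlation between the two difference semimatrices,
    so variables on different scales are compared on the same basis."""
    dx = pairwise_differences(x, x_nominal)
    dy = pairwise_differences(y, y_nominal)
    return np.corrcoef(dx, dy)[0, 1]
```

Because every variable is first reduced to pairwise differences, a nominal variable and a ratio-scale variable become directly comparable, which is what enables the downstream principal components analysis the abstract mentions.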
Metabarcoding (high‐throughput sequencing of marker gene amplicons) has emerged as a promising and cost‐effective method for characterizing insect community samples. Yet, the methodology varies greatly among studies and its performance has not been systematically evaluated to date. In particular, it is unclear how accurately metabarcoding can resolve species communities in terms of presence‐absence, abundance and biomass.
Here we use mock community experiments and a simple probabilistic model to evaluate the effect of different DNA extraction protocols on metabarcoding performance. Specifically, we ask four questions: (Q1) How consistent are the recovered community profiles across replicate mock communities?; (Q2) How does the choice of lysis buffer affect the recovery of the original community?; (Q3) How are community estimates affected by differing lysis times and homogenization? and (Q4) Is it possible to obtain adequate species abundance estimates through the use of biological spike‐ins?
We show that estimates are quite variable across community replicates. In general, a mild lysis protocol is better at reconstructing species lists and approximate counts, while homogenization is better at retrieving biomass composition. Small insects are more likely to be detected in lysates, while some tough species require homogenization to be detected. Results are less consistent across biological replicates for lysates than for homogenates. Some species are associated with strong PCR amplification bias, which complicates the reconstruction of species counts. Yet, with adequate spike‐in data, species abundance can be determined with roughly 40% standard error for homogenates, and with roughly 50% standard error for lysates, under ideal conditions. In the latter case, however, this often requires species‐specific reference data, while spike‐in data generalize better across species for homogenates.
We conclude that a non‐destructive, mild lysis approach shows the highest promise for the presence/absence description of the community, while also allowing future morphological or molecular work on the material. However, homogenization protocols perform better for characterizing community composition, in particular in terms of biomass.
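Spike-in-based abundance estimation of the kind evaluated above can be sketched as a simple read-count rescaling; the species names, counts and spike-in amount are hypothetical, and real analyses must additionally correct for species-specific amplification bias:

```python
def spikein_abundance(reads, spike_reads, spike_amount):
    """Convert per-species read counts into abundance estimates using
    a biological spike-in of known amount added to each sample."""
    scale = spike_amount / spike_reads   # amount represented per read
    return {sp: n * scale for sp, n in reads.items()}

est = spikein_abundance({"sp_A": 5000, "sp_B": 1250},
                        spike_reads=2500, spike_amount=10.0)
```

The spike-in anchors the otherwise compositional read counts to an absolute scale; the roughly 40-50% standard errors reported above reflect how much amplification bias still perturbs this rescaling.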
Network theory allows us to understand complex systems by evaluating how their constituent elements interact with one another. Such networks are built from matrices which describe the effect of each element on all others. Quantifying the strength of these interactions from empirical data can be difficult, however, because the number of potential interactions increases nonlinearly as more elements are included in the system, and not all interactions may be empirically observable when some elements are rare.
We present a novel modelling framework which uses measures of species performance in the presence of varying densities of their potential interaction partners to estimate the strength of pairwise interactions in diverse horizontal systems.
Our method allows us to directly estimate pairwise effects when they are statistically identifiable and to approximate pairwise effects when they would otherwise be statistically unidentifiable. The resulting interaction matrices can include positive and negative effects, include the effect of a species on itself, and allow for non‐symmetrical interactions.
We show how to link the parameters inferred by our framework to a population dynamics model to make inferences about the effect of interactions on community dynamics and diversity.
The advantages of these features are illustrated with a case study on an annual wildflower community of 22 focal and 52 neighbouring species, and a discussion of potential applications of this framework extending well beyond plant community ecology.
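Linking an inferred interaction matrix to population dynamics might look like the following annual-plant sketch, a standard Beverton-Holt-type model with hypothetical parameters rather than the paper's fitted values:

```python
import numpy as np

def annual_plant_step(N, lam, alpha):
    """One generation of a Beverton-Holt-style annual plant model:
    N_i(t+1) = lam_i * N_i(t) / (1 + sum_j alpha_ij * N_j(t))."""
    N = np.asarray(N, float)
    return lam * N / (1.0 + alpha @ N)

lam = np.array([3.0, 2.5])            # intrinsic growth rates
alpha = np.array([[0.02, 0.01],       # alpha[i, j]: effect of species j
                  [0.01, 0.02]])      # on species i (need not be symmetric)
N = np.array([10.0, 10.0])
for _ in range(200):                  # iterate to (near) equilibrium
    N = annual_plant_step(N, lam, alpha)
```

With intraspecific effects stronger than interspecific ones (the diagonal dominating), the two species coexist at a stable equilibrium; perturbing entries of alpha shows how individual interactions shape community composition and diversity.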
Unbiased mortality estimates are fundamental for testing ecological and evolutionary theory as well as for developing effective conservation actions. However, mortality estimates are often confounded by dispersal, especially in studies where dead‐recovery is not possible. In such instances, missing individuals (i.e. individuals with unobserved time of death) may have died or permanently emigrated from a study area, making inferences about their fate difficult. Mortality before and during dispersal, as well as the decision to disperse, usually depend on a suite of individual, social and environmental covariates, which in turn can be used to draw conclusions about the fate of missing individuals.
Here, we propose a Bayesian hierarchical model that takes into account time‐varying covariates to estimate transitions between life‐history states and mortality in each state using mark‐resighting data with missing individuals. Specifically, our framework estimates mortality rates in two states (resident and dispersing state) by treating the fate of missing individuals as a latent (i.e. unobserved) variable that is statistically inferred based on information from individuals with a known fate and given the individual, social and environmental conditions at the time of disappearance. Our model also estimates rates of state transition (i.e. emigration) to assess whether a missing individual was more likely to have died or survived due to unobserved emigration from the study area.
We used simulations to check the validity of our model and assessed its performance with data of varying degrees of uncertainty. Our modelling framework provided accurate mortality and emigration estimates for simulated data of different sample sizes, proportions of missing individuals, and resighting intervals. Variation in sample size appeared to affect the precision of estimated parameters the most.
Our approach offers a solution to estimating unbiased mortality of both resident and dispersing individuals as well as the probability of emigration using mark‐resighting data with incomplete death records. Conditional on the availability of data on known‐fate individuals and relevant time‐varying covariates, our model can reconstruct the fate (death or emigration) of missing individuals. The modularity of our framework allows mortality analyses to be tailored to a variety of species‐specific life histories.
Large‐scale monitoring of seasonal animal movement is integral to science, conservation and outreach. However, gathering representative movement data across entire species ranges is frequently intractable. Citizen science databases collect millions of animal observations throughout the year, but it is challenging to infer individual movement behaviour solely from observational data.
We present birdflow, a probabilistic modelling framework that draws on citizen science data from the eBird database to model the population flows of migratory birds. We apply the model to 11 species of North American birds, using GPS and satellite tracking data to tune and evaluate model performance.
We show that birdflow models can accurately infer individual seasonal movement behaviour directly from eBird relative abundance estimates. Supplementing the model with a sample of tracking data from wild birds improves performance.
Researchers can extract a number of behavioural inferences from model results, including migration routes, timing, connectivity and forecasts. The birdflow framework has the potential to advance migration ecology research, boost insights gained from direct tracking studies and serve a number of applied functions in conservation, disease surveillance, aviation and public outreach.
Hierarchical Bayesian models have made the integration of ecological theory with empirical methods ubiquitous in ecology. However, there has been little development focused on integrating ecological theory into models for survival analysis. Survival is a fundamental process, linking individual fitness with population dynamics, but incorporating life history strategies to inform survival estimation can be challenging because mortality processes occur at multiple scales.
We develop an approach to survival analysis, incorporating model constraints based on a species' life history strategy using functional analytical tools. Specifically, we structurally separate intrinsic patterns of mortality that arise from age‐specific processes (e.g. increasing survival during early life stages due to growth or maturation, versus senescence) from extrinsic mortality patterns that arise over different periods of time (e.g. seasonal temporal shifts). We use shape constrained generalized additive models (CGAMs) to obtain age‐specific hazard functions that incorporate theoretical information based on classical survivorship curves into the age component of the model and capture extrinsic factors in the time component.
We compare the performance of our modelling approach to standard survival modelling tools that do not explicitly incorporate species life history strategy in the model structure, using metrics of predictive power, accuracy, efficiency and computation time. We applied these models to two case studies that reflect different functional shapes for the underlying survivorship curves, examining age‐period survival for white‐tailed deer Odocoileus virginianus in Wisconsin, USA and Columbian sharp‐tailed grouse Tympanuchus phasianellus columbianus in Colorado, USA.
We found that models that included shape constraints for the age effects in the hazard curves using CGAMs outperformed models that did not include explicit functional constraints. We demonstrate a data‐driven and easily extendable approach to survival analysis by showing its utility to obtain hazard rates and survival probabilities, accounting for heterogeneity across ages and over time, for two very different species. We show how integration of ecological theory using constrained generalized additive models, with empirical statistical methods, enhances survival analyses.
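However an age-specific hazard function is obtained, it converts to survival probabilities in the standard discrete-time way. A sketch with an illustrative bathtub-shaped hazard (high juvenile mortality, low prime-age mortality, then senescence; the numbers are hypothetical, not fitted CGAM output):

```python
import numpy as np

def survival_from_hazard(hazard):
    """Convert discrete age-specific mortality hazards h_a into
    cumulative survival: S(a) = prod_{k <= a} (1 - h_k)."""
    hazard = np.asarray(hazard, float)
    return np.cumprod(1.0 - hazard)

# Illustrative bathtub hazard: decaying juvenile term + constant
# background + accelerating senescent term
ages = np.arange(15)
h = 0.3 * np.exp(-ages) + 0.02 + 0.001 * ages ** 2
S = survival_from_hazard(h)
```

Shape constraints in the CGAM framework act on the hazard curve itself (e.g. forcing the senescent component to be monotonically increasing), and survival probabilities like S above then follow deterministically from the constrained hazard.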