Methods in Ecology and Evolution

Published by Wiley and British Ecological Society

Online ISSN: 2041-210X

Disciplines: Methods & Statistics in Ecology


Top read articles

635 reads in the past 30 days

The use of focus group discussion methodology: Insights from two decades of application in conservation

January 2018 · 60,511 Reads

Focus group discussion is frequently used as a qualitative approach to gain an in‐depth understanding of social issues. The method aims to obtain data from a purposely selected group of individuals rather than from a statistically representative sample of a broader population. Even though the application of this method in conservation research has been extensive, there has been no critical assessment of its application, and no readily available guidelines exist for conservation researchers. Here, we reviewed the applications of focus group discussion within biodiversity and conservation research between 1996 and April 2017. We begin with a brief explanation of the technique for first‐time users. We then discuss in detail the empirical applications of this technique in conservation based on a structured literature review (using Scopus). The screening process resulted in 170 articles, the majority of which (67%, n = 114) were published between 2011 and 2017. Rarely was the method used as a stand‐alone technique. The number of participants per focus group (where reported) ranged from 3 to 21, with a median of 10. Studies held a median of seven focus group meetings, and sessions lasted a median of 90 min. Four main themes emerged from the review: understanding people's perspectives regarding conservation (32%), followed by assessment of conservation and livelihood practices (21%), examination of challenges and impacts of resource management interventions (19%) and documentation of the value of indigenous knowledge systems (16%). Most of the studies were conducted in Africa (n = 76), followed by Asia (n = 44) and Europe (n = 30). We noted serious gaps in the reporting of methodological details in the reviewed papers.
More than half of the studies did not report the sample size (n = 101) or group size (n = 93), and 54 studies did not mention the number of focus group discussion sessions when reporting results. Rarely did studies provide any rationale for choosing the technique. We provide guidelines to improve the standard of reporting and the future application of the technique in conservation.

203 reads in the past 30 days

[Preview figures and table: Fig. 1, sample-size-based and coverage-based rarefaction and extrapolation curves with 95% confidence intervals for the spider data of two treatments at diversity orders q = 0 (species richness), q = 1 (Shannon diversity) and q = 2 (Simpson diversity); Fig. 2, the corresponding species-richness (q = 0) curves for the tropical ant data at five elevations, each with sample completeness curves; and a table listing the functions in the iNEXT package and their descriptions.]
iNEXT: An R package for rarefaction and extrapolation of species diversity (Hill numbers)

June 2016 · 5,659 Reads
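The Hill numbers that iNEXT rarefies and extrapolates (species richness, Shannon and Simpson diversity) all come from one formula, qD = (Σ pᵢ^q)^(1/(1−q)). A minimal pure-Python sketch of that formula only, not the iNEXT implementation (which adds rarefaction, extrapolation and confidence intervals); the function name is invented for illustration:

```python
import math

def hill_number(abundances, q):
    """Hill number of order q: (sum_i p_i^q) ** (1 / (1 - q)).

    q = 0 gives species richness, q = 1 the exponential of Shannon
    entropy (taken as a limit), q = 2 the inverse Simpson index.
    """
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if q == 1:
        # q -> 1 limit: exp(Shannon entropy)
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))
```

For a perfectly even community every order gives the number of species; as q increases, rare species are progressively down-weighted, so diversity declines for uneven communities.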

172 reads in the past 30 days

Using acoustic indices in ecology: Guidance on study design, analyses and interpretation

August 2023 · 404 Reads


Aims and scope


Promotes development of new methods in ecology and evolution, and facilitates their dissemination and uptake by the research community.

Recent articles


Describing posterior distributions of variance components: Problems and the use of null distributions to aid interpretation

September 2023 · 49 Reads

Assessing the biological relevance of variance components estimated using Markov chain Monte Carlo (MCMC)‐based mixed‐effects models is not straightforward. Variance estimates are constrained to be greater than zero and their posterior distributions are often asymmetric. Different measures of central tendency for these distributions can therefore vary widely, and credible intervals cannot overlap zero, making it difficult to assess the size and statistical support for among‐group variance. Statistical support is often assessed through visual inspection of the whole posterior distribution and so relies on subjective decisions for interpretation. We use simulations to demonstrate the difficulties of summarizing the posterior distributions of variance estimates from MCMC‐based models. We then describe different methods for generating the expected null distribution (i.e. a distribution of effect sizes that would be obtained if there was no among‐group variance) that can be used to aid in the interpretation of variance estimates. Through comparing commonly used summary statistics of posterior distributions of variance components, we show that the posterior median is predominantly the least biased. We further show how null distributions can be used to derive a p‐value that provides complementary information to the commonly presented measures of central tendency and uncertainty. Finally, we show how these p‐values facilitate the implementation of power analyses within an MCMC framework. The use of null distributions for variance components can aid study design and the interpretation of results from MCMC‐based models. We hope that this manuscript will make empiricists using mixed models think more carefully about their results, what descriptive statistics they present and what inference they can make.
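The null-distribution idea can be sketched outside an MCMC framework: simulate grouped data with zero among-group variance, collect the resulting variance estimates, and ask how often a null estimate is at least as large as the observed one. This is a hedged stand-in using a crude method-of-moments estimator rather than a posterior median from a fitted mixed model; all function names are invented:

```python
import random
import statistics

def among_group_variance(groups):
    """Crude method-of-moments among-group variance: variance of the
    group means minus the expected contribution of within-group noise
    (a stand-in for the posterior median of an MCMC mixed model)."""
    means = [statistics.mean(g) for g in groups]
    within = statistics.mean(statistics.variance(g) for g in groups)
    return max(0.0, statistics.variance(means) - within / len(groups[0]))

def null_p_value(observed, n_groups, n_per_group, resid_sd,
                 n_sims=1000, seed=1):
    """Proportion of null-simulation estimates (true among-group
    variance = 0) that are at least as large as the observed estimate."""
    rng = random.Random(seed)
    null = [among_group_variance(
                [[rng.gauss(0.0, resid_sd) for _ in range(n_per_group)]
                 for _ in range(n_groups)])
            for _ in range(n_sims)]
    return sum(e >= observed for e in null) / n_sims
```

A large observed among-group variance relative to the null distribution yields a small p-value; an estimate of zero can never be "significant" because null estimates are also bounded below by zero.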

Accounting for unobserved spatial variation in step selection analyses of animal movement via spatial random effects

August 2023 · 58 Reads

Step selection analysis (SSA) is a common framework for understanding animal movement and resource selection using telemetry data. Such data are, however, inherently autocorrelated in space, a complication that could impact SSA‐based inference if left unaddressed. Accounting for spatial correlation is standard statistical practice when analysing spatial data, and its importance is increasingly recognized in ecological models (e.g. species distribution models). Nonetheless, no framework yet exists to account for such correlation when analysing animal movement using SSA. Here, we extend the popular method of integrated step selection analysis (iSSA) by including a Gaussian field (GF) in the linear predictor to account for spatial correlation. For this, we use the Bayesian framework R‐INLA and the stochastic partial differential equations (SPDE) technique. We show through a simulation study that our method provides accurate fixed-effects estimates, quantifies their uncertainty well and improves predictions. In addition, we demonstrate the practical utility of our method by applying it to three wolverine (Gulo gulo) tracks. Our method solves the problem of assuming spatially independent residuals in the SSA framework. In addition, it offers new possibilities for making long‐term predictions of habitat usage.
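At its core, (integrated) step selection analysis fits a conditional-logit likelihood over strata, each containing one observed step plus several randomly sampled "available" steps. A minimal sketch of that selection probability, without the Gaussian-field extension the paper adds (which requires R-INLA); names and covariate values are invented:

```python
import math

def step_probabilities(step_covariates, beta):
    """Conditional-logit probabilities over one stratum of candidate
    steps (the observed step plus 'available' steps):
    P(step j) = exp(x_j . beta) / sum_k exp(x_k . beta)."""
    scores = [math.exp(sum(b * x for b, x in zip(beta, xs)))
              for xs in step_covariates]
    total = sum(scores)
    return [s / total for s in scores]

# one stratum of three candidate steps;
# covariates per step: (habitat quality, step length)
probs = step_probabilities([(1.0, 2.0), (0.0, 2.0), (2.0, 2.0)],
                           beta=(1.0, -0.5))
```

With a positive selection coefficient on habitat quality, candidate steps into better habitat receive higher probability; fitting maximises this likelihood over all strata.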

NAPS: Integrating pose estimation and tag‐based tracking

August 2023 · 11 Reads

Significant advances in computational ethology have allowed the quantification of behaviour in unprecedented detail. Tracking animals in social groups, however, remains challenging as most existing methods can either capture pose or robustly retain individual identity over time but not both. To capture finely resolved behaviours while maintaining individual identity, we built NAPS (NAPS is ArUco Plus SLEAP), a hybrid tracking framework that combines state‐of‐the‐art, deep learning‐based methods for pose estimation (SLEAP) with unique markers for identity persistence (ArUco). We show that this framework allows the exploration of the social dynamics of the common eastern bumblebee (Bombus impatiens). We provide a stand‐alone Python package for implementing this framework along with detailed documentation to allow for easy utilization and expansion. We show that NAPS can scale to long timescale experiments at a high frame rate and that it enables the investigation of detailed behavioural variation within individuals in a group. Expanding the toolkit for capturing the constituent behaviours of social groups is essential for understanding the structure and dynamics of social networks. NAPS provides a key tool for capturing these behaviours and can provide critical data for understanding how individual variation influences collective dynamics.

RFIDeep: Unfolding the potential of deep learning for radio-frequency identification

August 2023 · 28 Reads

1. Automatic monitoring of wildlife is becoming a critical tool in the field of ecology. In particular, Radio-Frequency IDentification (RFID) is now a widespread technology for assessing the phenology, breeding and survival of many species. While RFID produces massive datasets, no established fast and accurate methods are yet available for processing this type of data. Deep learning approaches have been used to overcome similar problems in other scientific fields and hence might hold the potential to overcome these analytical challenges and unlock the full potential of RFID studies. 2. We present a deep learning workflow, coined "RFIDeep", to derive ecological features, such as breeding status and outcome, from RFID mark-recapture data. To demonstrate the performance of RFIDeep with complex datasets, we used long-term automatic monitoring data from a long-lived seabird that breeds in densely packed colonies, hence with many daily entries and exits. 3. To determine individual breeding status and phenology for each breeding season, we first developed a one-dimensional convolutional neural network (1D-CNN) architecture. Second, to account for variance in breeding phenology and technical limitations of field data acquisition, we built a new data augmentation

An integrative modelling framework for passive acoustic telemetry

August 2023 · 68 Reads

Passive acoustic telemetry is widely used to study the movements of aquatic animals. However, a holistic, mechanistic modelling framework that permits the reconstruction of fine‐scale movements and emergent patterns of space use from detections at receivers remains lacking. Here, we introduce an integrative modelling framework that recapitulates the movement and detection processes that generate detections to reconstruct fine‐scale movements and patterns of space use. This framework is supported by a new family of algorithms designed for detection and depth observations and can be flexibly extended to incorporate other data types. Using simulation, we illustrate applications of our framework and evaluate algorithm utility and sensitivity in different settings. As a case study, we analyse movement data collected from the Critically Endangered flapper skate (Dipturus intermedius) in Scotland. We show that our methods can be used to reconstruct fine‐scale movement paths and patterns of space use, and to support habitat preference analyses. For reconstructing patterns of space use, simulations show that the methods are consistently more instructive than the most widely used alternative approach (the mean‐position algorithm), particularly in clustered receiver arrays. For flapper skate, the reconstruction of movements reveals responses to disturbance, fine‐scale spatial partitioning and patterns of space use with significant implications for marine management. We conclude that this framework represents a widely applicable methodological advance with applications to studies of pelagic, demersal and benthic species across multiple spatiotemporal scales.
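The mean-position algorithm the authors benchmark against is simple to state: within a time window, the estimated position is the detection-count-weighted mean of receiver coordinates. A sketch under that definition (function and variable names invented):

```python
def mean_position(detection_counts, receiver_xy):
    """Mean-position ('centre of activity') estimate for one time
    window: detection-count-weighted mean of receiver coordinates.

    detection_counts: {receiver_id: number of detections}
    receiver_xy: {receiver_id: (x, y)}
    """
    n = sum(detection_counts.values())
    x = sum(receiver_xy[r][0] * c for r, c in detection_counts.items()) / n
    y = sum(receiver_xy[r][1] * c for r, c in detection_counts.items()) / n
    return (x, y)
```

Because the estimate can only fall inside the convex hull of detecting receivers, it smooths over the fine-scale movements that the paper's mechanistic framework tries to recover.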

Using acoustic indices in ecology: Guidance on study design, analyses and interpretation

August 2023 · 404 Reads

The rise of passive acoustic monitoring and the rapid growth in large audio datasets is driving the development of analysis methods that allow ecological inferences to be drawn from acoustic data. Acoustic indices are currently one of the most widely applied tools in ecoacoustics. These numerical summaries of the sound energy contained in digital audio recordings are relatively straightforward and fast to calculate but can be challenging to interpret. Misapplication and misinterpretation have produced conflicting results and led some to question their value. To encourage better use of acoustic indices, we provide nine points of guidance to support good study design, analysis and interpretation. We offer practical recommendations for the use of acoustic indices in the study of both whole soundscapes and individual taxa and species, and point to emerging trends in ecoacoustic analysis. In particular, we highlight the critical importance of understanding the links between soundscape patterns and acoustic indices. Acoustic indices can offer insights into the state of organisms, populations, and ecosystems, complementing other ecological research techniques. Judicious selection, appropriate application and thorough interpretation of existing indices are vital to bolster robust developments in ecoacoustics for biodiversity monitoring, conservation and future research.
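As one concrete example of the indices discussed, the Acoustic Complexity Index (ACI) summarises frame-to-frame intensity variation in a spectrogram. A simplified pure-Python sketch; real implementations operate on windowed FFT output and typically sum over temporal sub-clusters, which this illustration omits:

```python
def acoustic_complexity_index(spectrogram):
    """Simplified ACI: for each frequency bin (a list of per-frame
    intensities), sum the absolute changes between adjacent frames,
    normalise by the bin's total intensity, and sum over bins."""
    aci = 0.0
    for band in spectrogram:  # one intensity time series per bin
        diffs = sum(abs(band[t + 1] - band[t])
                    for t in range(len(band) - 1))
        total = sum(band)
        if total > 0:
            aci += diffs / total
    return aci
```

Constant background noise contributes nothing (no frame-to-frame change), while modulated biotic sound raises the index, which is why ACI is often used as a proxy for biophonic activity.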

Individual‐based models of avian migration for estimating behavioural traits and predicting ecological interactions

July 2023 · 73 Reads

Rapid advances in the field of movement ecology have led to increasing insight into both the population‐level abundance patterns and individual‐level behaviour of migratory species. Despite this progress, research questions that require scaling individual‐level understanding of the behaviour of migrating organisms to the population level remain difficult to investigate. To bridge this gap, we introduce a generalizable framework for training full‐annual cycle individual‐based models of migratory movements by combining information from tracking studies and species occurrence records. Focusing on migratory birds, we call this method: Models of Individual Movement of Avian Species (MIMAS). We implement MIMAS to design individual‐based models of avian migration that are trained using previously published weekly occurrence maps and fit via Approximate Bayesian Computation. MIMAS models leverage individual‐ and population‐level information to faithfully represent continental‐scale migration patterns. Models can be trained successfully for species even when little existing individual‐level data is available for parameterization by relying on population‐level information. In contrast to existing mathematical models of migration, MIMAS explicitly represents and estimates behavioural attributes of migrants. MIMAS can additionally be used to simulate movement over consecutive migration seasons, and models can be easily updated or validated as new empirical data on migratory behaviours becomes available. MIMAS can be applied to a variety of research questions that require representing individual movement at large scales. We demonstrate three applied uses for MIMAS: estimating population‐specific migratory phenology, predicting the spatial patterns and magnitude of ectoparasite dispersal by migrants, and simulating the spread of a pathogen across the annual cycle of a migrant species. 
Currently, MIMAS can easily be used to build models for hundreds of migratory landbird species but can also be adapted in the future to build models of other types of migratory animals.
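The Approximate Bayesian Computation step used to fit MIMAS models can be illustrated with the basic rejection sampler: draw parameters from a prior, simulate, and keep draws whose simulated summary statistic falls close to the observed one. A generic sketch, not the MIMAS code; the toy example and all names are invented:

```python
import random

def abc_rejection(observed, simulate, sample_prior, tolerance,
                  n_draws=2000, seed=0):
    """Rejection ABC: the kept parameter draws approximate the
    posterior given the observed summary statistic."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        if abs(simulate(theta, rng) - observed) <= tolerance:
            kept.append(theta)
    return kept

# toy example: infer a mean departure date (day of year, truth ~100)
# from the mean of 50 noisy tracked individuals
posterior = abc_rejection(
    observed=100.0,
    simulate=lambda theta, rng: sum(rng.gauss(theta, 5.0)
                                    for _ in range(50)) / 50,
    sample_prior=lambda rng: rng.uniform(80.0, 120.0),
    tolerance=1.0,
)
```

Tightening the tolerance sharpens the approximate posterior at the cost of fewer accepted draws; MIMAS applies the same trade-off with occurrence-map-based summaries instead of a single mean.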

Tracking marine tetrapod carcasses using a low-cost mixed methodology with GPS trackers, passive drifters and citizen science

July 2023 · 67 Reads

Drift experiments are essential to understand stranding patterns and estimate the mortality of beached animals. Most studies do not use telemetry technology due to the high costs of this methodology. The objective of this paper is to describe the possibilities of tracking marine tetrapod carcasses with a low‐cost and replicable methodology. The study was carried out on the Southern Subtropical Shelf (~28°–34°S), a highly productive and key ecological region of the southwestern Atlantic Ocean (SWA). We designed and tested a low‐cost mixed methodology that includes Global Positioning System (GPS) trackers, passive drifters (reused glass bottles) and citizen science (through an instant messaging platform and email) to track carcasses of marine tetrapods. We conducted four drift experiments during the four seasons of 2019. We released 787 drifters (600 non‐biological drifters and 187 carcasses of seabirds, sea turtles and cetaceans) at sea, at five equally spaced distances (5–25 km) from the coast. Beach surveys and citizen science were used to recover the beached drifters. We recovered 71.83% of the non‐biological drifters and 27.27% of the carcasses released. We tracked the movements of 38 carcasses (25 sea turtles and 13 cetaceans) with 17 GPS devices. The drifting time until reaching the beach ranged from 12 h to 17 days for carcasses and from 12 h to 406 days for bottles. Citizen science was the most important source of recovery of non‐biological drifters, accounting for 66.67% of the recovered bottles, whereas active searching was the most important recovery source for carcasses, accounting for 64.7% of those recovered. Our study contributes new findings on marine tetrapod drift patterns in the SWA and describes an accessible, low‐cost mixed methodology that small- and medium‐budget projects can replicate in other coastal regions of the world to track a wide range of marine tetrapod species.

Automation of tree‐ring detection and measurements using deep learning

July 2023 · 42 Reads

Core samples from trees are a critical reservoir of ecological information, informing our understanding of past climates as well as contemporary ecosystem responses to global change. Manual measurements of annual growth rings in trees are slow, labour‐intensive and subject to human bias, hindering the generation of big datasets. We present an alternative, neural network‐based implementation that automates the detection and measurement of tree‐ring boundaries from coniferous species. We trained our Mask R‐CNN extensively on over 8,000 manually annotated ring boundaries from microscope‐imaged Norway spruce (Picea abies) increment cores. We assessed the performance of the trained model, after post‐processing, on real‐world data generated from our core processing pipeline. The CNN performed well after post‐processing, recognizing over 98% of ring boundaries (recall) with a detection precision of 96% when tested on real‐world data. Additionally, we implemented automatic measurements based on the minimum distance between rings. With minimal editing for missed ring detections, these measurements were 98% correlated with human measurements of the same samples. Tests on three other conifer species demonstrate that the CNN generalizes well to species with similar structure. We demonstrate the efficacy of automating the measurement of growth increments in tree core samples. Our CNN‐based system provides high predictive performance in terms of both tree‐ring detection and growth rate determination. Our application is readily deployable as a Docker container and requires only basic command line skills. Additionally, an easy re‐training option allows users to expand its capabilities to other wood types.
Application outputs include both editable annotations of predictions as well as ring‐width measurements in a commonly used .pos format, facilitating the efficient generation of large ring‐width measurement datasets from increment core samples, an important source of environmental data.
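The minimum-distance measurement mentioned above reduces, in the simplest case, to the smallest Euclidean distance between consecutive detected ring boundaries. A brute-force sketch under that reading (the paper's pipeline operates on Mask R-CNN boundary predictions; names are invented):

```python
import math

def ring_width(boundary_a, boundary_b):
    """Width between two ring boundaries, taken as the minimum
    Euclidean distance between their detected boundary points
    (each boundary is a list of (x, y) pixel coordinates)."""
    return min(math.dist(p, q) for p in boundary_a for q in boundary_b)

def ring_widths(boundaries):
    """Widths along a core: one measurement per consecutive pair of
    boundaries, ordered from bark to pith (or vice versa)."""
    return [ring_width(a, b) for a, b in zip(boundaries, boundaries[1:])]
```

A production version would restrict the search to locally opposing boundary segments rather than all point pairs, but the minimum-distance principle is the same.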

poolHelper: An R package to help in designing Pool‐Seq studies

July 2023 · 18 Reads

Next‐generation sequencing of pooled samples (Pool‐seq) is an important tool in population genomics and molecular ecology. In Pool‐seq, the relative number of reads with an allele reflects the allele frequencies in the sample. However, unequal individual contributions to the pool and sequencing errors can lead to inaccurate allele frequency estimates, influencing downstream analysis. When designing Pool‐seq studies, researchers need to decide the pool size (number of individuals) and average depth of coverage (sequencing effort). An efficient sampling design should maximise the accuracy of allele frequency estimates while minimising the sequencing effort. We describe a novel tool to simulate single nucleotide polymorphism (SNP) data using coalescent theory and account for sources of uncertainty in Pool‐seq. We introduce an R package, poolHelper, enabling users to simulate Pool‐seq data under different combinations of average depth of coverage and pool size, accounting for unequal individual contributions and sequencing errors, modelled by adjustable parameters. The mean absolute error is computed by comparing the sample allele frequencies obtained based on individual genotypes with the frequency estimates obtained with Pool‐seq. poolHelper enables users to simulate multiple combinations of pooling errors, average depth of coverage, pool sizes and number of pools to assess how they influence the error of sample allele frequencies and expected heterozygosity. Using simulations under a single population model, we illustrate that increasing the depth of coverage does not necessarily lead to more accurate estimates, reinforcing that finding the best Pool‐seq study design is not straightforward. Moreover, we show that simulations can be used to identify different combinations of parameters with similarly low mean absolute errors. This can help users to define an effective sampling design by using those combinations of parameters that minimise the sequencing effort.
The poolHelper package provides tools for performing simulations with different combinations of parameters (e.g. pool size, depth of coverage, unequal individual contribution) before sampling and generating data, allowing users to define sampling schemes based on simulations. This allows researchers to focus on the best sampling scheme to answer their research questions. poolHelper is comprehensively documented with examples to guide effective use.
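The kind of simulation poolHelper performs can be miniaturised as two nested sampling steps: chromosomes into the pool, then reads from the pool. A pure-Python sketch assuming equal individual contributions and no sequencing error (poolHelper itself models both, plus coalescent simulation of the SNPs); the function name is invented:

```python
import random

def poolseq_mae(true_freq, pool_size, coverage, n_sims=2000, seed=42):
    """Mean absolute error of Pool-seq allele frequency estimates,
    comparing the read-based estimate with the (genotype-based)
    sample allele frequency, as in poolHelper's MAE definition."""
    rng = random.Random(seed)
    err = 0.0
    for _ in range(n_sims):
        # sample 2 * pool_size chromosomes into the pool
        alleles = sum(rng.random() < true_freq
                      for _ in range(2 * pool_size))
        sample_freq = alleles / (2 * pool_size)
        # draw `coverage` reads from the pooled DNA
        alt_reads = sum(rng.random() < sample_freq
                        for _ in range(coverage))
        err += abs(alt_reads / coverage - sample_freq)
    return err / n_sims
```

Even this stripped-down version shows the design trade-off the abstract describes: read-sampling error shrinks roughly with the square root of coverage, so doubling sequencing effort buys progressively less accuracy.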

CATS: A high‐performance software framework for simulating plant migration in changing environments

July 2023 · 21 Reads

Considering local population dynamics and dispersal is crucial to project species' range adaptations in changing environments. Dynamic models including these processes are highly computer intensive, with consequent restrictions on spatial extent and/or resolution. We present CATS, an open‐source, extensible modelling framework for simulating spatially and temporally explicit population dynamics of plants. It can be used in conjunction with species distribution models, or via direct parametrisation of vital rates, and allows fine‐grained control over the models of demographic and dispersal processes. The performance and flexibility of CATS are exemplified (i) by modelling the range shift of four plant species under three future climate scenarios across Europe at a spatial resolution of 100 m, and (ii) by exploring the consequences of demographic compensation for range expansion on artificial landscapes. The presented software leverages the availability of computational resources and lowers the barrier of entry for large‐extent, fine‐resolution simulations of plant range shifts in changing environments.

[Preview figures: an illustration of the basis function approach for purely spatial data, in which the latent variables of a spatial latent variable model are replaced by a weighted linear combination of basis functions ("building blocks"); simulation-study performance for estimated species-specific regression coefficients and for predictive calibration (Brier score, Tjur R2, logarithmic score) with presence-absence responses, comparing CBFM against latent variable models, stacked SDMs and sjSDM; and results from fitting a community-level basis function model to the NOAA fall bottom trawl survey, including covariate effects and variance partitioning for 39 demersal species and relative test-set performance against GAM and latent variable models.]
Spatiotemporal joint species distribution modelling: A basis function approach

July 2023 · 98 Reads

We introduce community‐level basis function models (CBFMs) as an approach for spatiotemporal joint distribution modelling. CBFMs can be viewed as related to spatiotemporal latent variable models, where the latent variables are replaced by a set of pre‐specified spatiotemporal basis functions which are common across species. In a CBFM, the coefficients that link the basis functions to each species are treated as random slopes. As such, the CBFM can be formulated to have a similar structure to a generalised additive model. This allows us to adapt existing techniques to fit CBFMs efficiently. CBFMs can be used for a variety of reasons, such as inferring patterns of habitat use in space and time, understanding how residual covariation between species varies spatially and/or temporally, and spatiotemporal predictions of species‐ and community‐level quantities. A simulation study and an application to data from a bottom trawl survey conducted across the U.S. Northeast shelf show that CBFMs can achieve similar and sometimes better predictive performance compared to existing approaches for spatiotemporal joint species distribution modelling, while being computationally more scalable.
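The basis-function substitution at the heart of a CBFM can be sketched directly: instead of estimating a latent spatial field, evaluate a fixed set of basis functions at each site and estimate only their weights. A sketch using Gaussian radial basis functions, one common choice of "building block" (the paper's basis functions and fitting machinery differ; names are invented):

```python
import math

def gaussian_basis(sites, centres, bandwidth):
    """Evaluate Gaussian radial basis functions at spatial sites:
    one row per site, one column per basis-function centre."""
    return [[math.exp(-math.dist(site, c) ** 2 / (2 * bandwidth ** 2))
             for c in centres] for site in sites]

def spatial_effect(basis_values, weights):
    """Approximate latent spatial field: weighted sum of the basis
    functions (in a CBFM, one weight vector per species, treated
    as random slopes)."""
    return [sum(w * b for w, b in zip(weights, row))
            for row in basis_values]
```

Because the basis functions are pre-specified and shared across species, only the weights need estimating, which is what lets a CBFM be fitted with generalised-additive-model machinery rather than latent-variable MCMC.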

TransMCL enables the assembly of full‐length coding genes for collecting complete hierarchical orthogroups

July 2023 · 8 Reads

Transcriptome sequencing technologies have revolutionized the field of phylogenomics by facilitating the identification of homologous genes for species without whole-genome sequences. To infer complex evolutionary relationships among eukaryotes, it is essential to obtain complete sequences of protein‐coding genes that provide informative mutations. However, transcriptomes in eukaryotes often consist of a large number of duplicated genes and alternative isoforms, posing great challenges for developing effective tools to obtain complete coding gene sequences. Here, we present a net‐flow-based assembler, TransMCL, which aims to assemble fragmented transcripts into complete sequences while eliminating redundant isoforms during homologue clustering. By employing Markov clustering strategies and homologous gene guidance, TransMCL can accurately assemble genes for multiple organisms in an affordable time frame, making it well‐suited for phylogenomic studies based on transcriptomic data. Our results demonstrate that TransMCL can assemble 89.95%–92.95% of the total expressed genes into near‐complete transcripts on benchmark plant/animal datasets. Furthermore, applying TransMCL to multiple transcriptomes in a single run enhances the completeness of genes, even in the absence of guidance homologues from closely related species. These findings highlight the potential of TransMCL in phylogenomic studies, enabling the comprehensive characterization of gene families at the whole-genome scale. By overcoming the challenges of complex gene content in eukaryotes, TransMCL can significantly enhance our understanding of the evolution of gene families across species.
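The Markov clustering (MCL) strategy that TransMCL employs for homologue clustering alternates "expansion" (squaring a column-stochastic transition matrix) and "inflation" (elementwise powering plus renormalisation) until clusters emerge. A textbook-style sketch on a small graph, not TransMCL's implementation; the threshold and iteration count are arbitrary choices:

```python
def mcl(adjacency, inflation=2.0, iters=20):
    """Markov clustering on a small undirected graph given as an
    adjacency matrix; returns clusters as frozensets of node ids."""
    n = len(adjacency)

    def normalise(mat):
        # make every column sum to 1 (column-stochastic)
        for j in range(n):
            s = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= s
        return mat

    # add self-loops, then column-normalise
    m = normalise([[adjacency[i][j] + (1.0 if i == j else 0.0)
                    for j in range(n)] for i in range(n)])
    for _ in range(iters):
        # expansion: m <- m @ m (flow spreads along the graph)
        m = [[sum(m[i][k] * m[k][j] for k in range(n))
              for j in range(n)] for i in range(n)]
        # inflation: sharpen the random walk towards attractors
        m = normalise([[m[i][j] ** inflation for j in range(n)]
                       for i in range(n)])
    # read clusters off the rows that retain probability mass
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 0.01)
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

On a graph of two dense groups joined by a weak bridge, inflation starves the bridge of flow and the groups separate into distinct clusters, which is how homologue families are split from spurious cross-family edges.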

Overview of the general workflow of CATE. The user begins by configuring the query regions and parameters file followed by execution using the command line interface. CATE will then analyse by identifying the file segments within which the pertaining SNP data for the query region being processed is contained. These SNPs will be processed by the GPU in parallel. Finally using the extracted information CATE will calculate the test statistic for the specified neutrality tests and write the results to a file. Under CATE's normal mode, this process will be repeated till all query regions are processed. CATE, CUDA Accelerated Testing of Evolution; CUDA, compute unified device architecture; GPU, graphical processing unit; SNP, single nucleotide polymorphism.
Illustration of the compound interpolated search algorithm's architecture with an example. It collects required segment files that satisfy the query region of interest. The data that satisfies the query region lies within these segment files. The interpolation search algorithm is used to identify the latch point. The latch point is the first file segment that satisfies the query region. The algorithm will search for the remaining segment files by searching the space around the latch point using two independent sequential searches in parallel threads.
Diagrammatic overview of CATE's high‐performance mode Prometheus. Compared to CATE's normal mode the user configures Prometheus using four additional parameters. In contrast to its normal mode, Prometheus processes multiple query regions together as a batch. It uses four additional parallel processing instances depicted as Multithreading blocks to achieve this. The number of threads used by each block is represented in brackets. Multithreading block 1 determines the specific file segments required to satisfy each query region, followed by the removal of redundant file segments. Multithreading block 2 is optional. It is activated in the presence of SSD drives. If present all segment files will be read at once, else each file will be read sequentially. To prevent the overloading of GPU memory the number of SNPs the GPU handles at a time is controlled by the parameter SNPs per time (s). Multithreading block 3 will then process SNPs and validate them as segregating sites in batches. Once the SNPs are sorted by position the respective neutrality test statistics will be calculated in parallel for all query regions in Multithreading block 4. CATE, CUDA Accelerated Testing of Evolution; CPU, central processing unit; CUDA, compute unified device architecture; GPU, graphical processing unit; SNP, single nucleotide polymorphism; SSD, solid state drive.
Assessment of CATE's performance by comparing the time taken to conduct the neutrality test functions (Tajima's D, Fu and Li's, and Fay and Wu's test statistics) on 22 autosomal chromosomes from the 1000 Genomes Project. (a) CATE's normal mode versus CATE's Prometheus mode for the all‐genes test type. (b) CATE's normal mode versus CATE's Prometheus mode versus PopGenome for analysing all three neutrality tests using the Neutrality full function for the all‐genes test type. (c) CATE's normal mode versus CATE's Prometheus mode for the window test type (window size = 10,000, step size = 10,000). (d) CATE's normal mode versus CATE's Prometheus mode versus PopGenome for analysing the neutrality tests using the Neutrality full function for the window test type (window size = 10,000, step size = 10,000). Panels (b) and (d) show analysis times both with and without the pre‐processing time for CATE's file hierarchy preparation. In all instances, CATE outperforms PopGenome in processing time, with a further speed‐up when Prometheus is activated. (e) Bar plot of Prometheus' total run times for the sliding window analysis (window size = 10,000, step size = 1 SNP). (f) Distribution plot breaking down Prometheus' run times per autosomal chromosome during the sliding window analysis; on average, most chromosomes were processed within 9–12 min. (a–e) Bar plots show the total time taken by each software to process all 22 chromosomes per neutrality test. CATE, CUDA Accelerated Testing of Evolution; SNP, single nucleotide polymorphism.
CATE: A fast and scalable CUDA implementation to conduct highly parallelized evolutionary tests on large scale genomic data

June 2023

·

18 Reads

1. Statistical tests for molecular evolution provide quantifiable insights into the selection pressures that govern a genome's evolution. Increasing the sample size used for analysis raises statistical power, but requires more computational nodes or longer computation time. 2. CATE (CUDA Accelerated Testing of Evolution) is a computational solution to this problem comprising two main innovations: a file organization system coupled with a novel search algorithm, and large-scale parallelization of algorithms across both GPU and CPU. CATE can conduct evolutionary tests such as Tajima's D, Fu and Li's, and Fay and Wu's test statistics, the McDonald–Kreitman Neutrality Index, the Fixation Index, and Extended Haplotype Homozygosity. 3. CATE is orders of magnitude faster than standard tools, with benchmarks estimating it to be on average over 180 times faster. For instance, CATE processes all 54,849 human genes across all 22 autosomal chromosomes for the five super populations of the 1000 Genomes Project in less than thirty minutes, while counterpart software took 3.62 days. 4. This proven framework has the potential to be adapted for many other GPU-accelerated, large-scale parallel evolutionary and genomic analyses.
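As an illustration of the kind of statistic CATE parallelizes, Tajima's D can be computed from aligned sequences using the standard constants from Tajima (1989). This minimal Python sketch is for exposition only and bears no relation to CATE's GPU implementation:

```python
import math
from itertools import combinations

def tajimas_d(seqs):
    """Tajima's D from a list of equal-length aligned sequences.
    Returns None when there are no segregating sites."""
    n = len(seqs)
    # S: number of segregating (polymorphic) sites
    s = sum(1 for site in zip(*seqs) if len(set(site)) > 1)
    if s == 0:
        return None
    # pi: mean number of pairwise nucleotide differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Standard constants (Tajima 1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # D contrasts pi with Watterson's estimator S/a1
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

The numerator contrasts two estimators of the population mutation rate (pairwise diversity versus Watterson's estimator); the GPU-friendly part is that both the segregating-site scan and the pairwise differences decompose per-site.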

Satellite imagery of the study area with the weather radars (green triangles), their 50‐km‐radius coverage (white circles) and international borders (black lines). MER, Mt. Meron; RAM, Mitzpe Ramon.
The different patterns observed in the MER radar with the radial velocity parameter. The date and time of each image are shown at its bottom. (a) Rain clouds, (b) passerine migration and (c) soaring bird flock migration. MER, Mt. Meron.
Example of two ways of correctly tagging the same image from the RAM radar: (a) with small dots and (b) with longer lines. We evaluated model success using contours, a method that accommodates both tagging styles. RAM, Mitzpe Ramon.
Example of semantic segmentation results for the MER radar from 28 August 2018 at 11:44 AM. On this day, many flocks of Honey Buzzards migrated through Israel. The left image is the mask ground truth, with red marks of tagged soaring bird flocks over the radial velocity radar image; the middle image is the estimated mask produced by the model, with blue marks over the radial velocity radar image; and the right image is the estimated mask produced by the model (blue) over the mask ground truth (red). The first row is for the 0° scan and the second row is for the 1° scan. MER, Mt. Meron.
Example of semantic segmentation results for the RAM radar from 24 September 2018 at 10:43 AM. See the description of Figure 4. RAM, Mitzpe Ramon.
Automatic detection of migrating soaring bird flocks using weather radars by deep learning

June 2023

·

66 Reads

The use of weather radars to detect and distinguish between different biological patterns greatly improves our understanding of aeroecology and its consequences for our lives. Importantly, it allows us to quantify passerine bird migration at different scales. Yet, no algorithm to detect soaring bird flocks in weather radar data is available, precluding our ability to study this type of migration over large spatial scales. We developed the first automatic algorithm for detecting the migration of flocks of soaring birds, an important bio‐flow phenomenon involving many millions of birds that travel across large spatial extents, with implications for the risk of bird–aircraft collisions. The algorithm was developed as a deep learning network for semantic segmentation using the U‐Net architecture. We tested several models with different weather radar products and with image sequences for flock movement identification. The best model includes the radial velocity product and a sequence of two previous images. It identifies 93% of soaring bird flocks that were tagged by a human on the radar image, with a false discovery rate of less than 20%. Large birds such as those detected by the algorithm pose a serious risk to the flight safety of civilian and military transportation; applying this algorithm can therefore substantially reduce bird strikes, reducing financial losses and threats to human lives. In addition, it can help overcome one of the main challenges in the study of bird migration by automatically and continuously detecting flocks of large birds over wide spatial scales without the need to equip the birds with tracking devices, unravelling the abundance, timing, spatial flyways, seasonal trends and influences of environmental conditions on the migration of bird flocks.
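The reported detection rate and false discovery rate can be illustrated by matching predicted regions to human-tagged regions. The sketch below is a generic greedy IoU matcher, not the contour-based evaluation used in the paper, and the 0.5 IoU threshold is an assumption:

```python
def iou(a, b):
    """Intersection-over-union of two pixel-coordinate sets."""
    return len(a & b) / len(a | b)

def detection_scores(truth, preds, thresh=0.5):
    """Greedily match predicted regions to ground-truth regions by IoU
    and return (detection_rate, false_discovery_rate).
    `truth` and `preds` are lists of sets of (row, col) pixels."""
    matched_truth, matched_pred = set(), set()
    for i, t in enumerate(truth):
        for j, p in enumerate(preds):
            if j not in matched_pred and iou(t, p) >= thresh:
                matched_truth.add(i)   # this flock was found
                matched_pred.add(j)    # this prediction is a true positive
                break
    recall = len(matched_truth) / len(truth) if truth else 1.0
    fdr = 1 - len(matched_pred) / len(preds) if preds else 0.0
    return recall, fdr
```

With the published figures, a model over many images would yield recall ≈ 0.93 and FDR < 0.20 under whatever matching criterion the authors used.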

A plot of the natural logarithm of the prior predictive spawning‐stock biomass of each species considered in the study, as predicted by each simulator, single‐species assessments and the ensemble model. The edges of the shaded area represent the 5% and 95% quantiles of the prior predictive SSB.
A plot of the posterior predictive natural logarithm of the spawning‐stock biomass of each species considered in the study, as predicted by each simulator, single‐species assessments and the ensemble model. The edges of the shaded area represent the 5% and 95% quantiles of the posterior predictive SSB.
EcoEnsemble: A general framework for combining ecosystem models in R

June 2023

·

15 Reads

Often there are several complex ecosystem models available to address a specific question. However, structural differences, systematic discrepancies and uncertainties mean that they typically produce different outputs. Rather than selecting a single ‘best’ model, it is desirable to combine them to give a coherent answer to the question at hand. Many methods of combining ecosystem models assume that one of the models is exactly correct, which is unlikely to be the case. Furthermore, the models may not be fitted to the same data, produce the same outputs, or be run over the same time period, making many common methods difficult to implement. In this paper, we use a statistical model to describe the relationship between the ecosystem models, prior beliefs and observations to make coherent predictions of the true state of the ecosystem with robust quantification of uncertainty. We introduce EcoEnsemble, an R package that takes advantage of the statistical model's structure to efficiently fit the ensemble model, either sampling from the posterior distribution or maximising the posterior density. We demonstrate EcoEnsemble by investigating what would happen to four fish species in the North Sea under future management scenarios. Although developed for applications in ecology, EcoEnsemble can be used to combine any group of mechanistic models, for example in climate modelling, epidemiology or biology.
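A minimal illustration of why combining models with uncertainty matters is inverse-variance weighting of independent model estimates. This is emphatically not EcoEnsemble's statistical model — which additionally handles shared discrepancies, missing outputs and mismatched time periods — but it conveys the basic idea of uncertainty-weighted combination:

```python
def ensemble_mean(estimates, variances):
    """Inverse-variance-weighted combination of independent model
    estimates of the same quantity. Returns (combined mean, combined
    variance); the combined variance is never larger than any input."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * e for w, e in zip(weights, estimates)) / total
    return mean, 1.0 / total
```

Two equally uncertain simulators pull the estimate to their midpoint while halving the variance; a more certain simulator dominates without the others being discarded.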

(a) Site locations (Lizard Island and Sydney) where the predation domes (b) were deployed.
Steps of the experiment using a predation dome. (a) placing the dome; (b) filming the tag to identify the dome; (c) measuring the distance between the dome and the GoPro; and (d) fish being filmed. Photos from Lizard Island.
Fish species behaviour using the predation domes in tropical (Lizard Island) and temperate (Sydney) regions: (a) attack per hour and (b) inspection per hour. Each dot represents the individual dome‐level value. Note that y‐axes are on log scale. [Corrections added on 24 June 2023, after first online publication: Figure 3 has been updated to include a colour key.]
Predation domes: In‐situ field assays to measure predatory behaviours by fish

June 2023

·

99 Reads

Biotic interactions such as predation are difficult ecological processes to quantify in the wild. This is especially the case in the marine environment due to logistical difficulties in capturing animal behaviour. Common approaches use aquarium‐based experiments, live‐tethering, or assays with bait as proxies for quantifying predation pressure. However, these methods often fail to account for natural interactions between species in the wild and may raise ethical and animal welfare concerns. We designed a novel field‐based method to quantify predator–prey interactions for marine fishes. The “predation dome” is a clear acrylic aquarium that contains a live fish. The dome is filmed and, in contrast to other methods, it allows for natural olfactory and visual cues, and the prey fish is returned to the wild after the assay. Here, we provide a step‐by‐step guide on building and deploying the predation dome in the wild. To demonstrate its use, we quantified predation pressure using the domes in two tropical and two temperate locations. Piscivores were attracted to the domes and displayed predatory behaviours such as circling or striking. Although the overall number of predatory attacks did not differ among locations, predation domes revealed higher predation pressure by piscivores at the tropical locations in comparison to temperate reefs. Our results show that predation domes represent an ethical and complementary approach to measuring predation that may better represent piscivory than other behavioural proxies. Predation domes can also be used to measure other biotic interactions such as territorial defence or courtship.

A generalizable normalization for assessing plant functional diversity metrics across scales from remote sensing

June 2023

·

135 Reads

Remote sensing (RS) increasingly seeks to produce global‐coverage maps of plant functional diversity (PFD) across scales. PFD can be quantified with metrics assessing field or RS data dissimilarity. However, their comparison suffers from the lack of normalization approaches that (1) correct for differences in the number and correlation of traits and spectral variables and (2) do not require comparing all the available samples to estimate the maximum trait dissimilarity (unfeasible in RS). We propose a generalizable normalization (GN) based on the maximum potential dissimilarity for the traits and spectral data considered and compare it to more traditional approaches (e.g. the maximum dissimilarity within datasets). To do so, we simulated plant communities with radiative transfer models and compared RS‐based diversity measurements across spatial scales (α‐ and β‐diversity components). Specifically, we assessed the capability of different normalization approaches (GN, local, none) to provide PFD estimates comparable between (1) RS and plant traits and (2) estimates from different RS missions. Unlike the other approaches, GN provides diversity component estimates that are directly comparable between field data and RS missions with different spectral configurations by removing the effect of differences in the number of traits or bands and the maximum dissimilarity across datasets. Therefore, GN enables the separate analysis of RS images from different sensors to produce comparable global‐coverage cartography. We suggest GN is necessary to validate RS approaches and develop interpretable maps of PFD using different RS missions.
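The principle behind GN can be sketched simply: divide each pairwise dissimilarity by the maximum dissimilarity that is *possible* given the variables' bounds, rather than by the maximum observed within a dataset. The hypothetical Python sketch below uses Euclidean dissimilarity; the paper's exact formulation may differ:

```python
import math

def gn_normalize(samples, bounds):
    """Pairwise Euclidean dissimilarities divided by the maximum
    *potential* dissimilarity implied by each variable's (min, max)
    bounds, so values stay in [0, 1] and remain comparable between
    datasets with different numbers of traits or spectral bands."""
    # Largest possible distance: opposite corners of the bounded space
    d_max = math.sqrt(sum((hi - lo) ** 2 for lo, hi in bounds))
    out = []
    for a in samples:
        row = []
        for b in samples:
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
            row.append(d / d_max)
        out.append(row)
    return out
```

Because `d_max` depends only on the bounds, not on the samples, two sensors with different numbers of bands produce dissimilarities on the same [0, 1] scale without ever comparing all samples — which is the property the abstract highlights.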

A species richness estimator for sample‐based incidence data sampled without replacement

June 2023

·

9 Reads

The accurate estimation of species richness in a target region is still a statistical challenge, especially in a highly heterogeneous community. Most richness estimators have been developed based on the assumption that data are randomly sampled with replacement or that data are sampled from an infinite population. However, in reality, most sampling schemes in the field are implemented as sampling without replacement (SWOR). As such, estimators derived under sampling with replacement may overestimate richness as the sampling fraction increases and fail to converge to the true richness as the sampling fraction approaches one. Sample‐based incidence data, in which the sampling unit is a plot and only the presence or absence of a species in each chosen plot is recorded, is one of the most commonly used data types for assessing species diversity in ecological studies. In this manuscript, a new richness estimator for sample‐based incidence data collected through SWOR is proposed using a truncated beta‐binomial mixture model. The new estimator was obtained through the moment approach, which avoids iterative numerical algorithms for parameter estimation and provides a closed‐form estimator as an alternative to the maximum likelihood method. Although the newly proposed method is a parametric richness estimator, similar to nonparametric estimators, only the rare species in the sample (i.e. the frequencies of uniques and duplicates) are required to estimate undetected richness. Based on hypothetical models, the statistical performance of the proposed estimator is evaluated under varying degrees of heterogeneity and different mean species detection rates. Compared to other widely used nonparametric and parametric estimators, the simulation results indicate that the proposed estimator has a smaller bias and lower root mean square error when the sampling fraction is greater than 10%, particularly in highly heterogeneous communities.
In addition, a ForestGEO permanent forest plot dataset is used to evaluate and compare the proposed approach with the other estimators discussed in the study. The results demonstrate that the proposed estimator produces less biased estimates of true richness than other widely used estimators, along with more accurate 95% confidence intervals.
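For context, the classic nonparametric Chao2 estimator against which such proposals are typically benchmarked also relies only on uniques (Q1) and duplicates (Q2). A minimal sketch — this is the standard bias-corrected Chao2, not the paper's truncated beta-binomial moment estimator:

```python
def chao2(incidence, T):
    """Bias-corrected Chao2 incidence-based richness estimator.
    `incidence` maps species -> number of plots (out of T total plots)
    in which the species was detected."""
    s_obs = len(incidence)
    q1 = sum(1 for v in incidence.values() if v == 1)  # uniques
    q2 = sum(1 for v in incidence.values() if v == 2)  # duplicates
    k = (T - 1) / T
    if q2 > 0:
        return s_obs + k * q1 * q1 / (2 * q2)
    # fallback form when no duplicates are observed
    return s_obs + k * q1 * (q1 - 1) / 2
```

Like the estimator in the abstract, undetected richness is inferred entirely from the rarest incidence classes; the estimators differ in how they model detection heterogeneity and the finite sampling fraction.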

movedesign: Shiny R app to evaluate sampling design for animal movement studies

June 2023

·

98 Reads

Projects focused on movement behaviour and home range are commonplace, but beyond a focus on choosing appropriate research questions, there are no clear guidelines for such studies. Without these guidelines, designing an animal tracking study to produce reliable estimates of space‐use and movement properties (necessary to answer basic movement ecology questions) is often done in an ad hoc manner. We developed ‘movedesign’, a user‐friendly Shiny application, which can be utilized to investigate the precision of three estimates regularly reported in movement and spatial ecology studies: home range area, speed and distance travelled. Conceptually similar to statistical power analysis, this application enables users to assess the degree of estimate precision that may be achieved with a given sampling design; that is, the choices regarding data resolution (sampling interval) and battery life (sampling duration). Leveraging the ‘ctmm’ R package, we utilize two methods proven to handle many common biases in animal movement datasets: autocorrelated kernel density estimators (AKDEs) and continuous‐time speed and distance (CTSD) estimators. Longer sampling durations are required to reliably estimate home range areas via the detection of a sufficient number of home range crossings. In contrast, speed and distance estimation requires a sampling interval short enough to ensure that a statistically significant signature of the animal's velocity remains in the data. This application addresses key challenges faced by researchers when designing tracking studies, including the trade‐off between long battery life and high resolution of GPS locations collected by the devices, which may force a compromise between reliably estimating home range or speed and distance.
‘movedesign’ has broad applications for researchers and decision‐makers, supporting them in focusing efforts and resources on the optimal sampling design strategy for their research questions, prioritizing the correct deployment decisions for insightful and reliable outputs, while understanding the trade‐offs associated with these choices.

NetworkExtinction: An R package to simulate extinction propagation and rewiring potential in ecological networks

June 2023

·

157 Reads

(1) Earth’s biosphere is undergoing drastic reorganization due to the sixth mass extinction brought on by the Anthropocene. Impacts of local and regional extirpation of species have been demonstrated to propagate through the complex interaction networks they are part of, leading to secondary extinctions and exacerbating biodiversity loss. Contemporary ecological theory has developed several measures to analyse the structure and robustness of ecological networks under biodiversity loss. However, a toolbox for directly simulating and quantifying extinction cascades and creating novel interactions (i.e. rewiring) remains absent. (2) Here, we present NetworkExtinction—a novel R package which we have developed to explore the propagation of species extinction sequences through ecological networks and quantify the effects of rewiring potential in response to primary species extinctions. With NetworkExtinction, we integrate ecological theory and computational simulations to develop functionality with which users may analyse and visualize the structure and robustness of ecological networks. The core functions introduced with NetworkExtinction focus on simulations of sequential primary extinctions and associated secondary extinctions, allowing user-specified secondary extinction thresholds and realization of rewiring potential. (3) With the package NetworkExtinction, users can estimate the robustness of ecological networks after performing species extinction routines based on several algorithms. Moreover, users can compare the number of simulated secondary extinctions against a null model of random extinctions. In-built visualizations enable graphing topological indices calculated by the deletion sequence functions after each simulation step. Finally, the user can estimate the network’s degree distribution by fitting different common distributions. Here, we illustrate the use of the package and its outputs by analysing a Chilean coastal marine food web. 
(4) NetworkExtinction is a compact and easy-to-use R package with which users can quantify changes in ecological network structure in response to different patterns of species loss, thresholds and rewiring potential. It is therefore particularly useful for evaluating ecosystem responses to anthropogenic and environmental perturbations that produce nonrandom, and sometimes targeted, species extinctions.
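The core secondary-extinction routine can be sketched as follows. This is an illustrative Python version under a simple bottom-up rule — a consumer goes secondarily extinct once it has no surviving prey — whereas NetworkExtinction's R functions support user-defined thresholds and rewiring:

```python
def simulate_extinction(prey_of, basal, primary):
    """Propagate secondary extinctions after removing `primary` species.
    `prey_of[s]` is the set of prey of consumer s; species in `basal`
    (e.g. primary producers) need no prey to persist.
    Returns the set of secondarily extinct species."""
    alive = (set(prey_of) | set(basal)) - set(primary)
    secondary = set()
    changed = True
    while changed:  # iterate until the cascade stabilizes
        changed = False
        for s in list(alive):
            if s in basal:
                continue
            if not (prey_of.get(s, set()) & alive):  # no surviving prey
                alive.discard(s)
                secondary.add(s)
                changed = True
    return secondary
```

Running this over every possible deletion sequence, and comparing against random sequences, gives the robustness-style summaries the package reports.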

Flowchart of modelling in miaSim. Arrows are possible flows of data, and blocks are available data/models/modules. (a) To start using miaSim, essential variables should be defined (either the number of species or resources and the total time to simulate), while optional variables (e.g. a list of metabolites and their quantities for consumer‐resource model, a matrix describing interspecies interactions for generalized Lotka‐Volterra model, etc.) will be generated if not given. Coloured blocks near each input indicate their applicability to different models in the next step. (b) A model should be selected among all available models (including the logistic model, the generalized Lotka‐Volterra model, the consumer‐resource model, Hubbell's neutral model, the Ricker model and the self‐organized instability model). (c) The results containing simulation data from different models will be stored in an identical format for one single simulation or a series of different simulations. (d) Finally, data are returned in the TreeSummarizedExperiment format, which can be visualized using miaViz or miaSimShiny. In addition, with the evaluation of modelling results, some parameters can be adjusted, generating a new single simulation or a series of multiple simulations using alternative models. A series of multiple simulations can be generated by a single call to the “generateSimulations” function.
Data structures of the outputs in miaSim. (a) A result of the consumer‐resource model simulation is a list acquired by running the simulateConsumerResource function once and contains a matrix of community compositions at different time points in the simulation, initial abundances, probabilities of dispersal from the metacommunity, growth rates and the magnitude of the measurement error in the model. (b) A list of multiple simulations generated using a series of inputs by running generateSimulations function. (c) A summary data frame of multiple simulations acquired from a series of simulating results by applying the getCommunity function to a list of multiple simulations. In this data frame, each row represents one independent simulation.
Impact of the heterogeneity of interaction strength on the distinctness of community types. This figure is a recomputation of the model from (Gibson et al., 2016) using miaSim, but with different random seeds and specific numerical values. With different interaction strength heterogeneities, we generated a series of inter‐species interaction matrices and subsequently simulated the corresponding community changes using the generalized Lotka‐Volterra model. (a) Histograms of the interaction heterogeneity matrix H with different α (7, 3, 2, 1.6, 1.01) in a power‐law distribution; heterogeneity H increases as α approaches 1. (b) Randomly generated networks of interspecies interactions. Each network contains 100 nodes, representing 100 species in the local species pool, from which 80 species were chosen randomly 100 times to form 100 initial communities for every interaction strength pattern. From light yellow to dark blue, the edge colours indicate different strengths of interactions between species. (c) Principal coordinate analyses (PCoA) of the final state of communities using the Jensen‐Shannon distance. In every plot, each point represents a final community, and the number of clusters is determined by the Silhouette Index (SI) obtained with the Partitioning‐Around‐Medoids (PAM) algorithm. SI represents the goodness of clustering; it increases as α decreases, implying that the strongly interacting species (SIS) in the latter interaction patterns promote the formation of distinct community types.
Uniform Manifold Approximation and Projection (UMAP) of simulated communities in different media (both components and concentrations). In each row, the relative composition of nutrients in the medium is the same, but from left to right, the quantities of resources for microbial communities increase in a geometric sequence. In each subfigure, different symbols represent their different initial community compositions. Each subfigure in the last column presents a common clustering result: four community types (communities 1 and 2; communities 3, 4 and 7; communities 5, 8 and 9; communities 6 and 10) are consistently observed in different environments.
Changes in community growth yield when more alternative carbon sources are added. This figure is a recomputation of the model from (Pacheco et al., 2021) using miaSim with a different random seed and slight model differences from the original publication, but qualitatively similar. Box plots contain the distribution of growth yields at the end of simulations of a complex community with 13 species (left) and simple communities with 3 species (middle) and 4 species (right) growing at the same concentration but different types of carbon sources. The central lines indicate the median values, and the top and bottom box edges are the 25th and 75th percentiles, respectively.
miaSim: an R/Bioconductor package to easily simulate microbial community dynamics

May 2023

·

184 Reads

Microbiomes never stop changing. Their compositions and functions are shaped by the complex interplay of intrinsic and extrinsic drivers, such as growth and migration rates, species interactions, available nutrients and environmental conditions. Mathematical models help us make sense of these complex drivers and intuitively explain how, why and when specific microbiome states are reached while others are not. To make simulations of microbiome dynamics intuitive and accessible, we present miaSim. miaSim provides users with a wide range of possibilities to match their specific assumptions and scenarios, starting from a core implementation of four widely used models (namely the stochastic logistic model, MacArthur's consumer‐resource model, Hubbell's neutral model and the generalized Lotka‐Volterra model) and several of their derivations. The diverse model implementations share the same data structures and, whenever possible, share state variables, which significantly facilitates cross‐model combinations and comparisons. We combined and simulated some published examples of microbiome models in miaSim and performed cross‐model comparisons and tested diverse model assumptions. Our examples illustrate the reliability, robustness and user‐friendliness of the package. In addition, miaSim is accompanied by miaSimShiny, which allows users to explore the parameter space of their models in real‐time in an intuitive graphical interface. miaSim is fully compatible with the ‘miaverse’, an R/Bioconductor framework for microbiome data science, allowing users to combine and compare model simulations with microbiome datasets. The stable version of miaSim is available through Bioconductor 3.15, and the version for future development is available at https://github.com/microbiome/miaSim. miaSimShiny is available at https://github.com/gaoyu19920914/miaSimShiny. 
We anticipate that miaSim will significantly facilitate the task of simulating microbiome dynamics, highlighting the role of ecological simulations as important tools in microbiome data science.
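Of the models miaSim implements, the generalized Lotka-Volterra model is the most compact to sketch. The forward-Euler integration below is illustrative only; miaSim's implementation, solver and defaults differ:

```python
def simulate_glv(x0, growth, A, dt=0.01, steps=5000):
    """Generalized Lotka-Volterra dynamics,
        dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j),
    integrated with forward Euler. `growth` holds the intrinsic rates
    r_i and A is the interspecies interaction matrix (A[i][i] < 0 gives
    self-limitation). Returns final abundances, clipped at zero."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        rates = [x[i] * (growth[i] + sum(A[i][j] * x[j] for j in range(n)))
                 for i in range(n)]
        x = [max(0.0, x[i] + dt * rates[i]) for i in range(n)]
    return x
```

With a single species, r = 1 and A = [[-1]], this reduces to logistic growth with carrying capacity 1 — a quick sanity check for any gLV implementation.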

Latent trends from the dynamic factor analysis (DFA) model. (a) The three latent trends obtained for the 15 farmland species and (b) the four latent trends obtained for the 26 woodland species. The shaded areas correspond to the 95% confidence intervals.
Heterogeneity in species dynamics among Swedish farmland birds (n = 15). (a) Clusters of species displayed in the first factorial plane obtained from a PCA on the factor loadings (three latent trends). Species of each cluster are shown in different colours (cluster 1: 11 species in red; cluster 2: three species in blue; one species, the Ortolan bunting Emberiza hortulana, fell outside both clusters). Cluster centres are shown by black square dots. A species' stability within a cluster is shown by the size and brightness of its dot: species always associated with one cluster are depicted by small, bright dots. Insert plots show the variation in dynamics along the first and second principal component axes with the remaining loadings set to 0. (b) Multi‐species index of the 15 farmland species between 1998 and 2020 obtained from the latent trends and the means of all the factor loadings (line), and geometric means of species abundances (dots). (c) Time series of the centre of the first cluster between 1998 and 2020. Approximate standard errors are shown by the shaded area. Cluster stability = 0.69 and scaled mean distance between species and cluster centre = 1.51. (d) Time series of the centre of the second cluster between 1998 and 2020. Standard error is shown by the shaded area. Cluster stability = 0.51 and scaled mean distance between species and cluster centre = 1.63. All trends are shown on the log scale.
Heterogeneity in species dynamics among Swedish woodland birds (n = 26). (a) Clusters of species displayed in the first factorial plane obtained from a PCA on the factor loadings (four latent trends). Species of each cluster are shown in a different colour (cluster 1: 6 species in red; cluster 2: 20 species in blue). Cluster centres are shown by black square dots. A species' stability within a cluster is shown by the size and brightness of its dot: species always associated with one cluster are depicted by small, bright dots. (b) Multi‐species index of the 26 woodland species between 1998 and 2020 obtained from the latent trends and the means of all the factor loadings (line), and geometric means of species abundances (dots). (c) Time series of the centre of the first cluster between 1998 and 2020. Approximate standard error is shown by the shaded area. Cluster stability = 0.69 and scaled mean distance between species and cluster centre = 1.78. (d) Time series of the centre of the second cluster between 1998 and 2020. Standard error is shown by the shaded area. Cluster stability = 0.74 and scaled mean distance between species and cluster centre = 1.70. All trends are shown on the log scale.
A toolbox to explore the composition of species dynamics behind multi‐species indices

May 2023

·

94 Reads

In light of declining biodiversity, monitoring its fate is essential for conservation strategies. Aggregating the temporal change of different species into multi-species indices, such as geometric means, makes it possible to identify species groups that are at risk as well as those that are doing well. However, aggregated indices mask the between-species variability in temporal trajectories, which can be highly relevant for conservation actions. We propose a toolbox, available as an R package, to investigate the composition of species dynamics behind geometric-mean multi-species indices. The toolbox is based on a dynamic factor analysis which uses species dynamics and their uncertainty to (1) identify common latent trends in those species dynamics, (2) display the variability of species dynamics and (3) extract clusters of species with similar dynamics within the species groups used for the indices. We apply the toolbox to common breeding birds in Sweden and explore the variability in dynamics among species included in EU-official indices for farmland and woodland species, highlighting clusters of species with related dynamics previously hidden by averaging. The toolbox is designed to be applicable to a wide range of ecological monitoring data. By enabling a deeper exploration of the structure behind existing indices, we may refine our understanding of biodiversity change to better inform subsequent conservation policies.
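The geometric-mean multi-species index that the toolbox decomposes can be computed directly from species abundance series. A minimal sketch, with the first year as the base (= 100); the toolbox itself works on the DFA-estimated trends rather than raw counts:

```python
import math

def multispecies_index(abundances):
    """Geometric-mean multi-species index: for each year, the geometric
    mean of each species' abundance relative to the first year,
    rescaled so the base year equals 100.
    `abundances[species]` is a list of yearly abundances."""
    n_years = len(next(iter(abundances.values())))
    index = []
    for t in range(n_years):
        # mean of log-ratios == log of the geometric mean of ratios
        logs = [math.log(series[t] / series[0])
                for series in abundances.values()]
        index.append(100 * math.exp(sum(logs) / len(logs)))
    return index
```

The averaging over log-ratios is exactly what hides between-species variability: one species doubling while another halves leaves the index flat, which is the masking the abstract refers to.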

Automated extraction of seed morphological traits from images

May 2023

·

316 Reads

The description of biological objects, such as seeds, mainly relies on manual measurements of a few characteristics and on visual classification of structures, both of which can be subjective, error-prone and time‐consuming. Image analysis tools offer means to address these shortcomings, but we currently lack a method capable of automatically handling seeds from different taxa with varying morphological attributes and obtaining interpretable results. Here, we provide a simple image acquisition and processing protocol and introduce Traitor, an open‐source software tool available as a command‐line interface (CLI), which automates the extraction of seed morphological traits from images. The workflow for trait extraction consists of scanning seeds against a high‐contrast background, correcting image colours, and analysing images with the software. Traitor can process hundreds of images of varied taxa simultaneously with just three commands, without any need for training, manual fine‐tuning or thresholding. The software automatically detects each object in the image and extracts size measurements, traditional morphometric descriptors widely used by scientists and practitioners, standardised shape coordinates, and colorimetric measurements. The method was tested on a dataset comprising 91,667 images of seeds from 1228 taxa. Traitor's extracted average length and width values closely matched the average manual measurements obtained from the same collection (concordance correlation coefficient of 0.98). Further, we used a large image dataset to demonstrate how Traitor's output can be used to obtain representative seed colours for taxa, determine the phylogenetic signal of seed colour, and build objective shape classification categories with high levels of visual interpretability. Our approach increases productivity and allows for large‐scale analyses that would otherwise be unfeasible.
Traitor enables the acquisition of data that are readily comparable across different taxa, opening new avenues to explore functional relevance of morphological traits and to advance on new tools for seed identification.
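The agreement statistic reported above (a concordance correlation coefficient of 0.98) is Lin's CCC, which, unlike Pearson's r, penalises systematic offsets and scale differences between automated and manual measurements. A minimal sketch assuming NumPy; the function name and seed lengths are illustrative, not part of Traitor:

```python
import numpy as np

def concordance_cc(x, y):
    """Lin's concordance correlation coefficient between two measurement
    series (e.g. automated vs. manual seed lengths). Equals 1 only for
    perfect agreement; penalises both offsets and scale differences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()       # population covariance
    return 2 * cov / (x.var() + y.var() + (mx - my) ** 2)

# Made-up seed lengths (mm): identical series agree perfectly, while a
# constant offset lowers the score even though Pearson's r stays at 1.
manual = np.array([2.0, 3.1, 4.2, 5.0, 6.3])
print(concordance_cc(manual, manual))            # 1.0
print(concordance_cc(manual, manual + 1.0) < 1)  # True
```

A constant bias of 1 mm leaves correlation intact but drops the CCC below 1, which is exactly why it is the preferred agreement measure for validating automated against manual measurements.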

Species distribution maps obtained with PHENOFIT expert and inverse calibrations, compared with observed species occurrences. The optimal threshold to dichotomize the model's predicted fitness index into presence/absence is the Youden index‐based cut‐off point. Note that the models predict the species' climatic niche, which is larger than the realized niche corresponding to the species presence map.
Species distribution maps obtained with CASTANEA expert and inverse calibrations, compared with observed species occurrences. The optimal threshold to dichotomize the model's predicted carbon reserves into presence/absence is the Youden index‐based cut‐off point. Note that the models predict the species' climatic niche, which is larger than the realized niche corresponding to the species presence map. Note also that CASTANEA cannot be used in high‐latitude regions (grey area).
Effects of data subsampling and stochasticity on CMA‐ES calibration using the PHENOFIT model for beech: (a) calibration AUC (calculated only with calibration cells) and (b) total AUC (calculated with all presence/absence cells). Each colour is a different subsampling of occurrence data, each point is a calibration run. Diamonds (with black border) are mean AUC values. On (a), the grouping letters represent the multiple comparisons with pairwise Dunn's tests.
Effects of stochasticity of CMA‐ES calibration on PHENOFIT leaf unfolding model parameter values for beech. Each panel is a parameter. Y‐axis limits are lower and upper bounds used during calibration. Each point is a calibrated parameter value, colour gradient is based on Fcrit values. Red diamonds are parameter values obtained with expert calibration, blue ones are parameter values obtained with the best inverse calibration.
Estimating process‐based model parameters from species distribution data using the evolutionary algorithm CMA‐ES

May 2023

·

24 Reads

1. Two main types of species distribution models are used to project species range shifts under future climatic conditions: correlative and process-based models. Although there is some continuity between these two types of models, they are fundamentally different in their hypotheses (statistical vs. mechanistic relationships) and their calibration methods (SDMs tend to be occurrence-data driven while PBMs tend to be prior driven). 2. One of the limitations to the use of process-based models is the difficulty of parameterizing them for a large number of species compared to correlative SDMs. We investigated the feasibility of using an evolutionary algorithm (the covariance matrix adaptation evolution strategy, CMA-ES) to calibrate process-based models using species distribution data. This method is well established in some fields (robotics, aerodynamics, etc.), but has never been used, to our knowledge, in ecology, despite its ability to deal with very large space dimensions. Using tree species occurrence data across Europe, we adapted the CMA-ES algorithm to find appropriate values of model parameters. We estimated simultaneously 27–77 parameters of two process-based models simulating forest trees' ecophysiology for three species with varying range sizes and geographical distributions. 3. CMA-ES provided parameter estimates leading to better prediction of species distributions than parameter estimates based on expert knowledge. Our results also revealed that some model parameters and processes were strongly dependent, and different parameter combinations could therefore lead to high model accuracy. 4. We conclude that CMA-ES is an efficient state-of-the-art method to calibrate process-based models with a large number of parameters using species occurrence data. Inverse modelling using CMA-ES is a powerful method to calibrate process-based parameters which can hardly be measured.
However, the method does not guarantee that parameter estimates are correct, because of several sources of bias (as with correlative models), and expert knowledge is required to validate the results.
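The figure captions above dichotomize a continuous model output into presence/absence at the Youden index‐based cut‐off, i.e. the threshold maximising J = sensitivity + specificity − 1. A minimal sketch assuming NumPy; the toy fitness values and function name are illustrative:

```python
import numpy as np

def youden_threshold(fitness, presence):
    """Cut-off on a continuous fitness index maximising Youden's
    J = sensitivity + specificity - 1 against presence/absence data."""
    fitness = np.asarray(fitness, float)
    presence = np.asarray(presence, int)
    best_t, best_j = None, -np.inf
    for t in np.unique(fitness):
        pred = fitness >= t
        sens = np.mean(pred[presence == 1])    # true positive rate
        spec = np.mean(~pred[presence == 0])   # true negative rate
        if sens + spec - 1 > best_j:
            best_t, best_j = float(t), sens + spec - 1
    return best_t, best_j

# Toy data: presences sit at higher fitness, so the chosen cut-off
# separates the classes perfectly and J reaches its maximum of 1.
t, j = youden_threshold([0.1, 0.2, 0.4, 0.6, 0.7, 0.9],
                        [0, 0, 0, 1, 1, 1])
print(t, j)   # 0.6 1.0
```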

Reporting framework visual guide and key concepts. (a) Overview of reporting framework levels (for more details, see Section 2 and Boxes 1 and 2). For each level, a ‘best practice’ example is provided (Chronister et al., 2021; Kruger & Du Preez, 2016; Moran et al., 2020; Norton & Scharff, 2016; Prat et al., 2017; Raimondi et al., 2023; Schneider & Mercado III, 2019; Tønnesen et al., 2020). (b) A toy example, showing how a sound spectrogram (top) can be temporally labelled (by marking the start and end points of elements; middle left, dark green boxes), annotated (by marking element information, such as element type; middle right, coloured letters), or both (bottom). (c) This decision tree can help users determine which level of reporting is most suitable for their specific situation.
Histogram (dark green) and density plot (gold) of click IOIs (in seconds) from a recording of an echolocating sperm whale. The mean IOI is denoted by the solid purple line and the median IOI by the dashed orange line.
Robust rhythm reporting will advance ecological and evolutionary research

April 2023

·

59 Reads

Rhythmicity in the millisecond to second range is a fundamental building block of communication and coordinated movement. But how widespread are rhythmic capacities across species, and how did they evolve under different environmental pressures? Comparative research is necessary to answer these questions but has been hindered by limited crosstalk and comparability among results from different study species. Most acoustics studies do not explicitly focus on characterising or quantifying rhythm, but many are just a few scrapes away from contributing to and advancing the field of comparative rhythm research. Here, we present an eight‐level rhythm reporting framework which details actionable steps researchers can take to report rhythm‐relevant metrics. Levels fall into two categories: metric reporting and data sharing. Metric reporting levels include defining rhythm‐relevant metrics, providing point estimates of temporal interval variability, reporting interval distributions, and conducting rhythm analyses. Data sharing levels are: sharing audio recordings, sharing interval durations, sharing sound element start and end times, and sharing audio recordings with sound element start/end times. Using sounds recorded from a sperm whale as a case study, we demonstrate how each reporting framework level can be implemented on real data. We also highlight existing best practice examples from recent research spanning multiple species. We clearly detail how engagement with our framework can be tailored case‐by‐case based on how much time and effort researchers are willing to contribute. Finally, we illustrate how reporting at any of the suggested levels will help advance comparative rhythm research. This framework will actively facilitate a comparative approach to acoustic rhythms while also promoting cooperation and data sustainability. 
By quantifying and reporting rhythm metrics more consistently and broadly, new avenues of inquiry and several long‐standing, big picture research questions become more tractable. These lines of research can inform not only about the behavioural ecology of animals but also about the evolution of rhythm‐relevant phenomena and the behavioural neuroscience of rhythm production and perception. Rhythm is clearly an emergent feature of life; adopting our framework, researchers from different fields and with different study species can help understand why.
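The metric-reporting levels above start from inter-onset intervals (IOIs), as in the sperm whale click histogram. A minimal sketch of common IOI point estimates, assuming NumPy; the nPVI shown is one standard variability index, and the onset times are made up:

```python
import numpy as np

def ioi_metrics(onsets):
    """Point estimates of temporal variability from sound-element onset
    times (s): mean and median inter-onset interval (IOI), coefficient
    of variation (CV) and the normalised pairwise variability index."""
    iois = np.diff(np.asarray(onsets, float))
    cv = iois.std(ddof=1) / iois.mean()
    npvi = 100 * np.mean([abs(a - b) / ((a + b) / 2)
                          for a, b in zip(iois[:-1], iois[1:])])
    return {"mean": float(iois.mean()), "median": float(np.median(iois)),
            "cv": float(cv), "npvi": float(npvi)}

# A perfectly isochronous click train: CV and nPVI are both zero.
print(ioi_metrics([0.0, 0.5, 1.0, 1.5, 2.0]))
```

Reporting even these few numbers alongside a recording (framework levels 1–2) already makes a dataset usable for cross-species rhythm comparison.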

Overview of the approach and contribution rubric to provide opportunities for—and documentation of—co‐author contributions while maximizing communication and efficiency throughout the manuscript production process.
Writing a massively multi‐authored paper: Overcoming barriers to meaningful authorship for all

April 2023

·

52 Reads

The value of large‐scale collaborations for solving complex problems is widely recognized, but many barriers hinder meaningful authorship for all on the resulting multi‐author publications. Because many professional benefits arise from authorship, much of the literature on this topic has focused on cheating, conflict and effort documentation. However, approaches specifically recognizing and creatively overcoming barriers to meaningful authorship have received little attention. We have developed an inclusive authorship approach arising from 15 years of experience coordinating the publication of over 100 papers produced by a long‐term, international collaboration of hundreds of scientists. This method of sharing a paper initially as a storyboard with clear expectations, assignments and deadlines fosters communication and creates unambiguous opportunities for all authors to contribute intellectually. By documenting contributions through this multi‐step process, this approach ensures meaningful engagement by each author listed on a publication. The perception that co‐authors on large authorship publications have not meaningfully contributed underlies widespread institutional bias against multi‐authored papers, disincentivizing large collaborations despite their widely recognized value for advancing knowledge. Our approach identifies and overcomes key barriers to meaningful contributions, protecting the value of authorship even on massively multi‐authored publications.

iPhenology: Using open-access citizen science photos to track phenology at continental scale

April 2023

·

184 Reads

1. Photo observations are a highly valuable but rarely used source of citizen science (CS) data. Recently, the number of publicly available photo observations has increased strongly, for example, due to the use of smartphone applications for species identification. This has enabled new ecological insights into poorly studied subjects. One of the fields with the highest potential to benefit from the use of photo observations is phenology. 2. We propose a workflow for iPhenology, the use of publicly available photo observations to track phenological events at large scales. The workflow comprises data acquisition, cleaning of observations, phenological classification and modelling spatiotemporal patterns of phenology. We explore the suitability of iPhenology to observe key phenological stages in the plant reproductive cycle of a model species and discuss limitations and future prospects of the approach using the example of an invasive species in Europe. 3. We show that iPhenology is suitable to track key phenological events of widespread species. However, the number and quality of available observations may differ among species and phenological stages. 4. Overall, publicly available CS photo observations are suitable to track key phenological events and can thus significantly advance knowledge on the timing and drivers of plant phenology. In future, integrating the workflow with automated image processing and analysis may enable real-time tracking of plant phenology.

Classifying the unknown: Insect identification with deep hierarchical Bayesian learning

April 2023

·

35 Reads

Classifying insect species involves a tedious process of identifying distinctive morphological insect characters by taxonomic experts. Machine learning can harness the power of computers to potentially create an accurate and efficient method for performing this task at scale, given that its analytical processing can be more sensitive to subtle physical differences in insects, which experts may not perceive. However, existing machine learning methods are designed to only classify insect samples into described species, thus failing to identify samples from undescribed species. We propose a novel deep hierarchical Bayesian model for insect classification, given the taxonomic hierarchy inherent in insects. This model can classify samples of both described and undescribed species; described samples are assigned a species while undescribed samples are assigned a genus, which is a pivotal advancement over just identifying them as outliers. We demonstrated this proof of concept on a new database containing paired insect image and DNA barcode data from four insect orders, including 1040 species, which far exceeds the number of species used in existing work. A quarter of the species were excluded from the training set to simulate undescribed species. With the proposed classification framework using combined image and DNA data in the model, species classification accuracy for described species was 96.66% and genus classification accuracy for undescribed species was 81.39%. Including both data sources in the model resulted in significant improvement over including image data only (39.11% accuracy for described species and 35.88% genus accuracy for undescribed species), and modest improvement over including DNA data only (73.39% genus accuracy for undescribed species). 
Unlike current machine learning methods, the proposed deep hierarchical Bayesian learning approach can simultaneously classify samples of both described and undescribed species, a functionality that could become instrumental in biodiversity monitoring across the globe. This framework can be customized for any taxonomic classification problem for which image and DNA data can be obtained, thus making it relevant for use across all biological kingdoms.

Identification of fish sounds in the wild using a set of portable audio‐video arrays

April 2023

·

103 Reads

Associating fish sounds to specific species and behaviours is important for making passive acoustics a viable tool for monitoring fish. While recording fish sounds in tanks can sometimes be performed, many fish do not produce sounds in captivity. Consequently, there is a need to identify fish sounds in situ and characterise these sounds under a wide variety of behaviours and habitats. We designed three portable audio-video platforms capable of identifying species-specific fish sounds in the wild: a large array, a mini array and a mobile array. The large and mini arrays are static autonomous platforms that can be deployed on the seafloor and record audio and video for one to two weeks. They use multichannel acoustic recorders and low-cost video cameras mounted on PVC frames. The mobile array also uses a multichannel acoustic recorder, but mounted on a remotely operated vehicle with built-in video, which allows remote control and real-time positioning in response to observed fish presence. For all arrays, fish sounds were localised in three dimensions and matched to the fish positions in the video data. We deployed these three platforms at four locations off British Columbia, Canada. The large array provided the best localisation accuracy and, with its larger footprint, was well suited to habitats with a flat seafloor. The mini and mobile arrays had lower localisation accuracy but were easier to deploy, and well suited to rough/uneven seafloors. Using these arrays, we identified, for the first time, sounds from quillback rockfish Sebastes maliger, copper rockfish Sebastes caurinus and lingcod Ophiodon elongatus. In addition to measuring temporal and spectral characteristics of sounds for each species, we estimated mean source levels for lingcod and quillback rockfish sounds (115.4 and 113.5 dB re 1 μPa, respectively) and maximum detection ranges at two sites (between 10.5 and 33 m).
All proposed array designs successfully identified fish sounds in the wild and were adapted to various budget, logistical and habitat constraints. We include here building instructions and processing scripts to help users replicate this methodology, identify more fish sounds around the world and make passive acoustics a more viable way to monitor fish.

Modified from Conn and Cooch (2009). A contrast of misclassification alone (a), partial observation alone (b), and misclassification and partial observation combined (c). With pure misclassification (a), all samples are assigned a state, but observations are subject to misclassification (shown in blue). In a partially observed system (b), states are determined definitively for some fraction of observations, while those that cannot be determined are recorded as unknown (Obs U) or ambiguous in our pathogen example. Lastly, (c) in the case of misclassifications and partial observations, samples are assigned one of three states, but observations are subject to misclassification and some fraction of observations cannot be determined with certainty.
Decision tree diagram. The observation process is a composite of a detection probability (p) and a diagnostic rate (δ or r), which are defined in the text and found in Table 2. The θ_10 branch (shown in red) is presented for symmetry but is not possible in most pathogen detection studies (i.e. it is impossible for an individual to be infected at an uninfected site; see Guillera‐Arroita et al., 2017 for cases where this is possible). Contamination of a sampling device (which may occur either in the field or in the laboratory) may result in a sample false positive (shown in blue), either from an uninfected individual at an infected site (e.g. Pr(y_ijk = 1|w_ij = 0, z_i = 1)) or an uninfected individual at an uninfected site (e.g. Pr(y_ijk = 1|w_ij = 0, z_i = 0)). False negatives (shown in blue) may also occur by failing to detect the pathogen on a sample from an infected individual. These parameters may be informed by analysis of known samples processed in the field (field blanks are only used for eDNA studies).
Sample size estimates reflecting the number of 0U infection histories across a range of parameter combinations. Each figure panel shows the minimum 0U sample size necessary to be 95% confident that the pathogen is absent from a site given an increasing prior probability of occupancy (ψ'). Columns represent combinations of sensitivity (δ) and specificity (r) values, whereas rows are defined by combinations of detection probabilities (p_111, p_101 and p_100). The prior expectation of prevalence (θ_11') is shown by the colour gradient from low (dark, purple colour) to high (light, yellow colour) values. The individual contamination rate (θ_10) was set to 0, and, therefore, the value of the detection probability p_110 is irrelevant and is not shown.
Posterior probability distributions of Pd occupancy from 2014 to 2020 for location three under three scenarios of varying Pd occupancy and prevalence. Each column represents a different year, whereas each row denotes a Scenario (A, B, C1–C3). The sequence of three numbers at the top of each column shows the total number of Pd non‐detections, uncertain detections and detections for a given year. The three scenarios dictate the values of Pd occupancy and prevalence through time as shown in Table S1. The dashed vertical red lines show the prior probability of Pd occupancy for each year and scenario. The gradient of colours of the distributions represents the posterior probability from low (dark purple) to high (light yellow) values. Every panel contains the posterior probability estimates for the 85,500 parameter combinations (δ: 0.7–0.9, r: 0.7–0.9, p_111: 0.5–0.9, p_110: 0.005–0.05, p_101: 0.005–0.05).
Distribution of posterior probabilities for five locations in Texas, where Pd is suspected to be invading. Each panel shows the distribution of estimated posterior probabilities across all detection parameter combinations (δ: 0.7–0.9, r: 0.7–0.9, p_111: 0.5–0.9, p_110: 0.005–0.05, p_100: 0.005–0.05). The series of numbers in each panel represent the number of negative, uncertain and positive detections for a given location and year. Panels that do not have a histogram or sample size shown did not have samples collected during that time frame at that site. The colour gradient for the distributions indicates increasing posterior probability of occupancy from dark purple to light yellow. Across the years, prevalence was fixed at 1% and the prior probability of occupancy increased from 0.1 in 2014–2016 to 0.2 (2017), 0.5 (2018), 0.6 (2019) and 0.9 (2020) as shown by the vertical dashed red lines.
Inferring pathogen presence when sample misclassification and partial observation occur

April 2023

·

26 Reads

Surveillance programmes are essential for detecting emerging pathogens and often rely on molecular methods to make inference about the presence of a target disease agent. However, molecular methods rarely detect target DNA perfectly. For example, molecular pathogen detection methods can result in misclassification (i.e. false positives and false negatives) or partial detection errors (i.e. detections with ‘ambiguous’, ‘uncertain’ or ‘equivocal’ results). Then, when data are to be analysed, these partial observations are either discarded or censored; this, however, disregards information that could be used to make inference about the true state of the system. There is a critical need for more direction and guidance related to how many samples are enough to declare a unit of interest ‘pathogen free’. Here, we develop a Bayesian hierarchical framework that accommodates false negative, false positive and uncertain detections to improve inference related to the occupancy of a pathogen. We apply our modelling framework to a case study of the fungal pathogen Pseudogymnoascus destructans (Pd) identified in Texas bats at the invasion front of white‐nose syndrome. To improve future surveillance programmes, we provide guidance on sample sizes required to be 95% certain a target organism is absent from a site. We found that the presence of uncertain detections increased the variability of resulting posterior probability distributions of pathogen occurrence, and that our estimates of required sample size were very sensitive to prior information about pathogen occupancy, pathogen prevalence and diagnostic test specificity. In the Pd case study, we found that the posterior probability of occupancy was very low in 2018, but occupancy probability approached 1 in 2020, reflecting increasing prior probabilities of occupancy and prevalence elicited from the site manager.
Our modelling framework provides the user a posterior probability distribution of pathogen occurrence, which allows for subjective interpretation by the decision‐maker. To help readers apply and use the methods we developed, we provide an interactive RShiny app that generates target species occupancy estimation and sample size estimates to make these methods more accessible to the scientific community (https://rmummah.shinyapps.io/ambigDetect_sampleSize). This modelling framework and sample size guide may be useful for improving inferences from molecular surveillance data about emerging pathogens, non‐native invasive species and endangered species where misclassifications and ambiguous detections occur.
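The sample-size question above, how many all-negative samples before one is 95% certain the pathogen is absent, can be illustrated with a deliberately stripped-down Bayes-rule sketch (no uncertain detections, no false positives, so far simpler than the paper's hierarchical model); `p_pos` bundles prevalence, detection probability and test sensitivity into one per-sample number, and all values are illustrative:

```python
def posterior_occupancy(psi, p_pos, n_negative):
    """Posterior probability that a site is occupied after n_negative
    all-negative samples; p_pos is the per-sample probability of a
    positive result at an occupied site (prevalence x detection x
    test sensitivity), assuming no false positives."""
    num = psi * (1 - p_pos) ** n_negative
    return num / (num + (1 - psi))

def samples_for_absence(psi, p_pos, confidence=0.95):
    """Smallest number of all-negative samples before the posterior
    probability of occupancy drops to at most 1 - confidence."""
    n = 0
    while posterior_occupancy(psi, p_pos, n) > 1 - confidence:
        n += 1
    return n

# With a 50% prior and a 20% chance each sample tests positive when
# the pathogen is present, 14 clean samples reach 95% confidence.
print(samples_for_absence(0.5, 0.2))   # 14
```

Even in this simplification, the required sample size is highly sensitive to the prior occupancy and to `p_pos`, echoing the sensitivity the authors report for their full model.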

Uncertainty of spatial averages and totals of natural resource maps

April 2023

·

395 Reads

Global, continental and regional maps of concentrations, stocks and fluxes of natural resources provide baseline data to assess how ecosystems respond to human disturbance and global warming. They are also used as input to numerous modelling efforts. But these maps suffer from multiple error sources and hence it is good practice to report estimates of the associated map uncertainty, so that users can evaluate their fitness for use. We explain why quantification of uncertainty of spatial aggregates is more complex than uncertainty quantification at point support, because it must account for spatial autocorrelation of the map errors. Unfortunately this is not done in a number of recent high-profile studies. We describe how spatial autocorrelation of map errors can be accounted for with block kriging, a method that requires geostatistical expertise. Next, we propose a new, model-based approach that avoids the numerical complexity of block kriging and is feasible for large-scale studies where maps are typically made using machine learning. Our approach relies on Monte Carlo integration to derive the uncertainty of the spatial average or total from point support prediction errors. We account for spatial autocorrelation of the map error by geostatistical modelling of the standardized map error. We show that the uncertainty strongly depends on the spatial autocorrelation of the map errors. In a first case study, we used block kriging to show that the uncertainty of the predicted topsoil organic carbon in France decreases when the support increases. In a second case study, we estimated the uncertainty of spatial aggregates of a machine learning map of the aboveground biomass in Western Africa using Monte Carlo integration. We found that this uncertainty was small because of the weak spatial autocorrelation of the standardized map errors. We present a tool to get realistic estimates of the uncertainty of spatial averages and totals of natural resources maps. 
The method presented in this paper is essential for parties that need to evaluate whether differences in aggregated environmental variables or natural resources between regions or over time are statistically significant.
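The core point above, that positive spatial autocorrelation of map errors inflates the uncertainty of a spatial average, can be checked with a small Monte Carlo sketch, assuming NumPy; the 1-D transect, exponential covariance and range parameter are illustrative, not the case-study settings:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 1-D transect of 50 map cells with unit point-support
# error variance; errors follow an exponential covariance, range 10 cells.
n = 50
dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
cov_corr = np.exp(-dist / 10.0)   # positively autocorrelated errors
cov_ind = np.eye(n)               # spatially independent errors

def sd_of_spatial_mean(cov, n_sim=2000):
    """Monte Carlo estimate of the standard deviation of the spatial
    average of the map error under a given error covariance model."""
    sims = rng.multivariate_normal(np.zeros(len(cov)), cov, size=n_sim)
    return sims.mean(axis=1).std()

# Autocorrelation inflates the uncertainty of the spatial average well
# beyond the independent-error value sqrt(1/50) ≈ 0.14.
print(sd_of_spatial_mean(cov_corr) > sd_of_spatial_mean(cov_ind))  # True
```

The same quantity is available analytically as sqrt(1ᵀC1)/n, so the Monte Carlo route matters mainly when, as in the paper, the error model is only available through simulation.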

Body condition changes at sea: Onboard calculation and telemetry of body density in diving animals

April 2023

·

206 Reads

1. The ability of marine mammals to accumulate sufficient lipid energy reserves is vital for mammals' survival and successful reproduction. However, long-term monitoring of at-sea changes in body condition, specifically lipid stores, has only been possible in elephant seals performing prolonged drift dives (low-density lipids alter the rates of depth change while drifting). This approach has limited applicability to other species. 2. Using hydrodynamic performance analysis during transit glides, we developed and validated a novel satellite-linked data logger that calculates real-time changes in body density (∝lipid stores). As gliding is ubiquitous amongst divers, the system can assess body condition in a broad array of diving animals. The tag processes high sampling rate depth and three-axis acceleration data to identify 5 s high pitch angle glide segments at depths >100 m. Body density is estimated for each glide using gliding speed and pitch to quantify drag versus buoyancy forces acting on the gliding animal. 3. We used tag data from 24 elephant seals (Mirounga spp.) to validate the onboard calculation of body density relative to drift rate. The new tags relayed body density estimates over 200 days and documented lipid store accumulation during migration with good correspondence between changes in body density and drift rate. Our study provided updated drag coefficient values for gliding (C_d,f = 0.03) and drifting (C_d,s = 0.12) elephant seals, both substantially lower than previous estimates. We also demonstrated post-hoc estimation of the gliding drag coefficient and body density using transmitted data, which is especially useful when drag parameters cannot be estimated with sufficient accuracy before tag deployment.
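The force balance behind the tag can be sketched as follows: at a steady descending glide, the along-path component of negative buoyancy balances hydrodynamic drag, which can be solved for body density. This is a hedged simplification of the published hydrodynamic model, and every morphometric value here (drag coefficient, reference area, body volume) is a placeholder, not a measured seal parameter:

```python
import math

def body_density(speed, pitch_deg, rho_sw=1027.0, cd=0.03,
                 area=0.5, volume=0.4, g=9.81):
    """Body density (kg/m^3) from a steady descending glide, assuming
    (rho_body - rho_sw) * V * g * sin(pitch) = 0.5 * rho_sw * Cd * A * U^2.
    All morphometric defaults are illustrative placeholders."""
    drag = 0.5 * rho_sw * cd * area * speed ** 2
    return rho_sw + drag / (volume * g * math.sin(math.radians(pitch_deg)))

# A 1.5 m/s glide at 60 degrees pitch implies a body slightly denser
# than seawater (i.e. a lean, negatively buoyant animal).
print(round(body_density(speed=1.5, pitch_deg=60.0), 1))   # 1032.1
```

As lipid stores accumulate, body density falls toward (or below) seawater density, which is why repeated glides track body condition through a migration.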

Modelling density surfaces of intraspecific classes using camera trap distance sampling

March 2023

·

80 Reads

1. Spatially explicit densities of wildlife are important for understanding environmental drivers of populations, and density surfaces of intraspecific classes allow exploration of links between demographic ratios and environmental conditions. Although spatially explicit densities and class densities are valuable, conventional design-based estimators remain prevalent when using camera-trapping methods for unmarked populations. 2. We developed a density surface model that utilized camera trap distance sampling data within a hierarchical generalized additive modelling framework. We estimated density surfaces of intraspecific classes of a common ungulate, white-tailed deer Odocoileus virginianus, across three large management regions in Indiana, United States. We then extended simple statistical theory to test for differences in two ratios of density. 3. Deer density was influenced by landscape fragmentation, wetlands and anthropogenic development. We documented class-specific responses of density to availability of concealment cover, and found strong evidence that increased recruitment of young was tied to increased resource availability from anthropogenic agricultural land use. The coefficients of variation of the total density estimates within the three regions we surveyed were 0.11, 0.10 and 0.06. 4. Synthesis and applications. Our strategy extends camera trap distance sampling and enables managers to use camera traps to better understand spatial predictors of density. Our density estimates were more precise than previous estimates from camera trap distance sampling. Population managers can use our methods to detect finer spatiotemporal changes in density or ratios of intraspecific-class densities. Such changes in density can be linked to land use, or to management regimes on habitat and harvest limits of game species.

Schematic illustration of the proposed additive diversity decomposition. Vertical bars in (a) illustrate the different fractions of functional diversity. The DRQ ternary diagram in (b) represents the same diversity components in graphical form, with species dominance, functional redundancy and functional diversity corresponding to the three corners of the ternary diagram.
DRQ ternary diagram for the grazed and ungrazed plots of the dry calcareous grassland in Tuscany. Convex hulls delimit groups of grazed and ungrazed plots. According to distance‐based multivariate ANOVA, the two groups of plots occupy significantly different positions of the ternary diagram at p < 0.001 (F = 12.84; Bray–Curtis dissimilarity and 10,000 randomizations).
The ternary diagram of functional diversity

March 2023

·

225 Reads

Among the many diversity indices in the ecologist's toolbox, measures that can be partitioned into additive terms are particularly useful, as the different components can be related to different ecological processes shaping community structure. In this paper, an additive diversity decomposition is proposed to partition the diversity structure of a given community into three complementary fractions: functional diversity, functional redundancy and species dominance. These three components sum up to one. Therefore, they can be used to portray the community structure in a ternary diagram. Since the identification of community-level patterns is an essential step to investigate the main drivers of species coexistence, the ternary diagram of functional diversity can be used to relate different facets of diversity to community assembly processes more exhaustively than looking only at one index at a time. The value of the proposed diversity decomposition is demonstrated by the analysis of actual abundance data on plant assemblages sampled in grazed and ungrazed grasslands in Tuscany (Central Italy).
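The three-way partition above can be written down directly: Simpson dominance D = Σ p_i², Rao's quadratic entropy Q = Σ p_i p_j d_ij, and functional redundancy R = 1 − D − Q, which sum to one when trait distances are scaled to [0, 1]. A minimal sketch assuming NumPy, with an illustrative three-species community:

```python
import numpy as np

def drq_decomposition(abundances, dists):
    """Partition community structure into species dominance (D),
    functional redundancy (R) and functional diversity (Q, Rao's
    quadratic entropy). With trait distances scaled to [0, 1] and a
    zero diagonal, the three fractions sum to one."""
    p = np.asarray(abundances, float)
    p = p / p.sum()
    d = np.asarray(dists, float)
    D = float(np.sum(p ** 2))   # Simpson dominance
    Q = float(p @ d @ p)        # Rao's quadratic entropy
    R = 1.0 - D - Q             # functional redundancy
    return D, R, Q

# Three equally abundant species; species 1 and 2 are functionally
# identical (distance 0), species 3 is maximally distinct (distance 1).
d = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
print(drq_decomposition([1, 1, 1], d))   # D = 1/3, R = 2/9, Q = 4/9
```

Because the fractions sum to one, each community maps to a single point in the DRQ ternary diagram, and grazed versus ungrazed plots can then be compared as point clouds within it.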

Three example survivorship functions of the three different types (type I: senescence; type II: constant mortality; type III: negative senescence). The two Keyfitz entropy measures given by Equations (5) and (10) are calculated for these three curves using two different widths of the age classes, Δt, given below in Table 1. That is, we discretized the survivorship curves shown in the figure using two different sizes of discrete intervals, Δt = 0.01 and Δt = 1.
Variation between the Original Discrete Entropy (Hlx) and New Discrete Entropy (HN) measures from matrix population models of animals in the COMADRE database (panels a and b), and of plants in the COMPADRE database (panels c and d). The blue shaded areas in (a and c) represent regions where the two entropy metrics have given a different classification to a survivorship curve (top blue square: New Discrete metric classified the curve as senescent whereas the Original Discrete metric classified the curve as negatively senescent; bottom blue square: vice versa). (a) Comparing Keyfitz' entropy of animal matrix models from the COMADRE database using the Original Discrete and the New Discrete metric. Points in dark blue are matrices where entropy shifted from negative values (positive senescence) to positive values (negative senescence); points in light blue do not have that shift. Matrices converted from stage to age are shown in orange. (b) Plot of the difference between the new and the existing metric HN−Hlx as a function of the life expectancy of animal species from COMADRE. (c) Comparing Keyfitz' entropy of plant matrix models from the COMPADRE database using the Original Discrete and New Discrete Entropy metric. Points in grey are converted stage‐to‐age matrix models that were not included in the analyses in Bernard et al. (2020). (d) Plot of the difference between the new and the existing metric HN−Hlx as a function of the life expectancy of plant from COMPADRE.
Discretising Keyfitz' entropy for studies of actuarial senescence and comparative demography

March 2023

·

56 Reads

Keyfitz' entropy is a widely used metric to quantify the shape of the survivorship curve of populations, from plants to animals and microbes. Keyfitz' entropy values <1 correspond to life histories with an increasing mortality rate with age (i.e. actuarial senescence), whereas values >1 correspond to species with a decreasing mortality rate with age (negative senescence), and a Keyfitz entropy of exactly 1 corresponds to a constant mortality rate with age. Keyfitz' entropy was originally defined using a continuous‐time model, and has since been discretised to facilitate its calculation from discrete‐time demographic data. Here, we show that the previously used discretisation of the continuous‐time metric does not preserve the relationship with increasing, decreasing or constant mortality rates. To resolve this discrepancy, we propose a new discrete‐time formula for Keyfitz' entropy for age‐classified life histories. We show that this new discretisation preserves the relationship with increasing, decreasing or constant mortality rates. We analyse the relationship between the original and the new discretisation, and find that the existing metric tends to underestimate Keyfitz' entropy for both short‐lived and long‐lived species, thereby introducing a consistent bias. In conclusion, to avoid biases when classifying life histories as (non‐)senescent, we suggest researchers use either the new metric proposed here or one of the many previously suggested survivorship shape metrics applicable to discrete‐time demographic data, such as the Gini coefficient or Hayley's median.
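As a worked illustration of the classification property (not the paper's new discrete formula, which is not reproduced in the abstract), the continuous-time Keyfitz entropy H = −∫ l(x) ln l(x) dx / ∫ l(x) dx can be approximated on a fine age grid. Weibull survivorship l(x) = exp(−x^k) gives H = 1/k analytically, so the three mortality types land on the expected sides of 1:

```python
import numpy as np

def keyfitz_entropy(lx):
    """Continuous-time Keyfitz entropy, H = -∫ l ln l dx / ∫ l dx,
    approximated by Riemann sums on a uniform age grid (the grid width
    cancels in the ratio). Zero-survivorship entries are dropped,
    since l*ln(l) -> 0 as l -> 0."""
    lx = np.asarray(lx, dtype=float)
    lx = lx[lx > 0]
    return -np.sum(lx * np.log(lx)) / np.sum(lx)

# Weibull survivorship l(x) = exp(-x**k) has H = 1/k exactly:
# k > 1 -> rising mortality (type I, H < 1), k = 1 -> constant (type II,
# H = 1), k < 1 -> falling mortality (type III, H > 1).
x = np.arange(1e-3, 400.0, 1e-3)
H = {k: keyfitz_entropy(np.exp(-x ** k)) for k in (2.0, 1.0, 0.5)}
print({k: round(v, 3) for k, v in H.items()})
```

A coarse grid (large Δt) fed into the same ratio is exactly the kind of naive discretisation whose bias the paper documents.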

(a) Typical quantitative genetic partitioning of focal phenotypic trait z into direct additive genetic (az), environmental (x) and residual (ez) components. A known environment may affect the phenotypic trait via phenotypic plasticity (βxz), but this effect is independent of the organism's genes. (b) Compared to (a), when an organism has a genetic preference (ax) to choose or adjust its local environment (x), the genes of the organism and its environment are no longer independent. In this scenario, the genes influencing choice or adjustment of the local environment also indirectly influence the phenotypic trait through phenotypic plasticity (βxz). (c) When the phenotypic trait (z) of an organism has a genetic component (az) and affects the choice or adjustment of the local environment (x) via ‘environmental plasticity’ (βzx), the genes underpinning the expression of the phenotypic trait indirectly affect the choice or adjustment of the environment. Thus, the genes of the organism and its environment are no longer independent, causing a genetic correlation between the organism and its environment.
Estimated additive genetic variance of the phenotypic trait for models 1 (phenotypic trait as dependent variable) and 2 (with the focal environment as a covariate) for the simulated values in scenarios 5–8. The box plots illustrate the distribution of estimates of the 100 simulations for each scenario (the bottom and top of the boxes are the first and third quartiles, the middle band is the median, and the whiskers extend from the box to the highest and lowest points within 1.5 times the interquartile range). Outliers are represented with black dots. Red dots are the simulated direct genetic variances for the focal phenotypic trait. Orange dots are the simulated total genetic variance (direct + indirect; see Section 4). Crossed squares (☒) indicate if non‐zero direct genetic variance for the phenotypic trait, direct genetic variance for the focal environment, phenotypic plasticity and/or environmental plasticity were simulated.
Estimated effects of the focal environment on the phenotypic trait (i.e. strength of phenotypic plasticity), for model 2 (focal environment as covariate). See Figure 2 for box plot description and legend explanation.
Distribution of the estimated values of additive genetic variance of the phenotypic trait for models 1 (phenotypic trait as dependent variable) and 2 (focal environment as covariate) for the simulated values in scenarios 9–12 (see Figure 2 for box plot description and legend explanation).
Estimated values of additive genetic covariance between the phenotypic trait and the focal environment (model 5, bivariate model) for the simulated values in all (12) scenarios. Red dots are the simulated additive genetic covariance between the focal phenotypic trait and the focal environment (see Figure 2 for box plot description and legend explanation).
Estimation of additive genetic variance when there are gene–environment correlations: Pitfalls, solutions and unexplored questions

March 2023

·

35 Reads

Estimating the genetic variation underpinning a trait is crucial to understanding and predicting its evolution. A key statistical tool to estimate this variation is the animal model. Typically, the environment is modelled as an external variable independent of the organism, affecting the focal phenotypic trait via phenotypic plasticity. We studied what happens if the environment is not independent of the organism because it chooses or adjusts its environment, potentially creating non‐zero genotype–environment correlations. We simulated a set of biological scenarios assuming the presence or absence of a genetic basis for a focal phenotypic trait and/or the focal environment (treated as an extended phenotype), as well as phenotypic plasticity (the effect of the environment on the phenotypic trait) and/or ‘environmental plasticity’ (the effect of the phenotypic trait on the local environment). We then estimated the additive genetic variance of the phenotypic trait and/or the environment by applying five animal models which differed in which variables were fitted as the dependent variable and which covariates were included. We show that animal models can estimate the additive genetic variance of the local environment (i.e. the extended phenotype) and can detect environmental plasticity. We show that when the focal environment has a genetic basis, the additive genetic variance of a phenotypic trait increases if there is phenotypic plasticity. We also show that phenotypic plasticity can be mistakenly inferred to exist when it is actually absent and instead environmental plasticity is present. When the causal relationship between the phenotype and the environment is misunderstood, it can lead to severe misinterpretation of the genetic parameters, including finding ‘phantom’ genetic variation for traits that, in reality, have none. We also demonstrate how using bivariate models can partly alleviate these issues. 
Finally, we provide the mathematical equations describing the expected estimated values. This study highlights that not taking gene–environment correlations into account can lead to erroneous interpretations of additive genetic variation and phenotypic plasticity estimates. If we aim to understand and predict how organisms adapt to environmental change, we need a better understanding of the mechanisms that may lead to gene–environment correlations.
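The core mechanism — genes reaching the trait both directly and indirectly through a heritable environment — can be sketched in a few lines. All variable names and parameter values below are illustrative (this is not the paper's simulation code); under this simple generative model the total genetic variance is Var(a_z) + β²·Var(a_x), inflating the naive direct-genetic expectation whenever plasticity acts on a heritable environment:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta = 0.6                    # phenotypic plasticity: effect of x on z
var_az, var_ax = 1.0, 0.5     # assumed direct genetic variances

a_z = rng.normal(0.0, np.sqrt(var_az), n)   # breeding values, focal trait
a_x = rng.normal(0.0, np.sqrt(var_ax), n)   # breeding values, environment choice
x = a_x + rng.normal(0.0, 1.0, n)           # chosen/adjusted local environment
z = a_z + beta * x + rng.normal(0.0, 1.0, n)  # focal phenotypic trait

# Because the environment is heritable, genes reach z through two routes:
# directly (a_z) and indirectly via plasticity (beta * a_x). The total
# genetic variance is Var(a_z) + beta**2 * Var(a_x) = 1.0 + 0.36 * 0.5 = 1.18,
# larger than the direct genetic variance of 1.0 alone.
total_genetic = np.var(a_z + beta * a_x)
print(round(total_genetic, 3))
```

An animal model fitted without the environment as a covariate would attribute this whole quantity to the trait's "additive genetic variance", which is the inflation the abstract describes.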

Flying high: Sampling savanna vegetation with UAV-lidar

March 2023

·

67 Reads

The flexibility of UAV-lidar remote sensing offers a myriad of new opportunities for savanna ecology, enabling researchers to measure vegetation structure at a variety of temporal and spatial scales. However, this flexibility also increases the number of customizable variables, such as flight altitude, pattern and sensor parameters, which, when adjusted, can affect both data quality and the applicability of a dataset to a specific research interest. To better understand the impacts that UAV flight patterns and sensor parameters have on vegetation metrics, we compared seven lidar point clouds collected with a Riegl VUX-1LR over a 300 × 300 m area in the Kruger National Park, South Africa. We varied the altitude (60 m above ground, 100 m, 180 m and 300 m) and sampling pattern (slowing the flight speed, increasing the overlap between flightlines and flying a crosshatch pattern), and compared a variety of vertical vegetation metrics related to height and fractional cover. We found that both flight altitude and pattern had significant impacts on derived structure metrics, with variation in altitude causing the largest impacts. Flying higher resulted in lower point cloud heights, leading to a consistent downward trend in percentile height metrics and fractional cover. The magnitude and direction of these trends also varied with the vegetation type sampled (trees, shrubs or grasses), showing that the structure and composition of savanna vegetation can interact with the lidar signal and alter derived metrics. While there were statistically significant differences in metrics among acquisitions, the average differences were often on the order of a few centimetres or less, which shows great promise for future comparison studies.
We discuss how these results apply in practice, explaining the potential trade-offs of flying at higher altitudes and with alternate patterns. We highlight how flight and sensor parameters can be geared toward specific ecological applications and vegetation types, and we explore future opportunities for optimizing UAV-lidar sampling designs in savannas.
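The metrics being compared — percentile heights and fractional cover — are simple to compute from height-normalised returns. A minimal sketch (the percentile set and the 0.5 m cover threshold are assumptions for illustration; studies choose these to match their vegetation type):

```python
import numpy as np

def structure_metrics(heights, cover_threshold=0.5):
    """Percentile height metrics and fractional cover from height-normalised
    lidar returns (metres above ground). The threshold separating 'vegetation'
    from 'ground' returns is an illustrative assumption."""
    h = np.asarray(heights, dtype=float)
    metrics = {f"p{q}": float(np.percentile(h, q)) for q in (25, 50, 75, 95, 99)}
    # fractional cover: share of returns intercepted above the threshold
    metrics["fcover"] = float(np.mean(h > cover_threshold))
    return metrics

# toy point cloud: mostly grass-layer returns plus a few tree-canopy hits
h = np.concatenate([np.full(90, 0.2), np.full(10, 8.0)])
m = structure_metrics(h)
print(m["p95"], m["fcover"])
```

A systematic downward shift in the input heights (as reported for higher flight altitudes) propagates directly into lower percentile metrics and, near the threshold, lower fractional cover.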

Representation of the structure of the integrated species distribution model, where each dataset is a separate realization of the ‘true’ species distribution. This is done by assuming each dataset has its own observation process, with a common latent field described by ecological covariates and parameters.
(a–c) Plots of the three datasets considered in the case study for species Setophaga caerulescens. (d) Map of the United States of America, highlighting Pennsylvania state.
Mean and standard deviation of the predicted intensity (logλs) of the integrated species distribution model, which reflects relative abundance across the mapped area.
PointedSDMs: An R package to help facilitate the construction of integrated species distribution models

March 2023

·

181 Reads

Ecological data are being collected at a large scale from a multitude of different sources, each with its own sampling protocols and assumptions. As a result, the integration of disparate datasets is a rapidly growing area in quantitative ecology and is becoming a major asset in understanding shifts and trends in species' distributions. However, the tools and software available to construct statistical models that integrate these disparate datasets into a unified framework are lacking. This has made these methods inaccessible to general practitioners and has stagnated the growth of data integration in more applied settings. We therefore present PointedSDMs: an easy-to-use R package for constructing integrated species distribution models. It provides functions to easily format the data, fit the models in a computationally efficient way and present the output in a format that is convenient for additional work. This paper illustrates the different uses and functions available in the package, which are designed to simplify the modelling of integrated models. A case study using the package is also presented: combining three datasets with different sampling protocols, all containing records of Setophaga caerulescens across Pennsylvania state.
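PointedSDMs itself is an R package; purely to illustrate the stacking idea behind data integration — one latent intensity observed through dataset-specific observation processes — here is a minimal Poisson-regression sketch in Python (the IRLS fitter and all names are illustrative, not the package's API). Two simulated surveys with different known efforts share the same covariate effect, and fitting them jointly means nothing more than stacking rows with different offsets:

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_irls(X, y, offset, iters=25):
    """Poisson GLM with log link fitted by iteratively reweighted
    least squares (a minimal stand-in for a proper GLM fitter)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = X @ beta + offset
        mu = np.exp(eta)
        z = eta - offset + (y - mu) / mu      # working response
        W = mu                                # working weights
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

# shared 'true' intensity over sites, observed by two surveys with
# different (known) effort -- integration = stacking rows with offsets
n = 2000
cov = rng.normal(size=n)
log_lam = -1.0 + 0.8 * cov                   # common latent intensity
eff1, eff2 = np.log(1.0), np.log(0.25)       # survey 2: a quarter of the effort
y1 = rng.poisson(np.exp(log_lam + eff1))
y2 = rng.poisson(np.exp(log_lam + eff2))

X = np.column_stack([np.ones(2 * n), np.tile(cov, 2)])
y = np.concatenate([y1, y2])
offset = np.concatenate([np.full(n, eff1), np.full(n, eff2)])
beta = poisson_irls(X, y, offset)
print(np.round(beta, 2))   # close to the simulated values (-1.0, 0.8)
```

Real integrated SDMs add spatial random fields and per-dataset detection models, but the shared-parameter structure is the same.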

Phasing gene copies into polyploid subgenomes on a mul‐tree. (a) A phylogenetic network representing a single reticulation, giving rise to an allopolyploid. (b) The mul‐tree representation of this phylogenetic network has two leaves (green and blue) representing the two subgenomes of the allopolyploid. (c) Four loci were sequenced from the allopolyploid. Two copies (green and blue) of each locus were recovered. Loci 2 and 3 are incorrectly phased. (d) After phasing, each locus is assigned to the correct subgenome.
Model testing for allelic variants and/or homeologs. In this example, two copies of each locus were sequenced from a single allotetraploid. (a) A phasing model that swaps the gene copies between two mul‐tree tips (the green and the blue tips) assumes that the two gene copies of each locus are homeologous. (b) An alternative phasing model swaps gene copies between three mul‐tree tips. This potentially allows for the gene copies of some loci to be inferred as allelic variation and the gene copies of other loci to be inferred as homeologs. In this example, the gene copies of locus 1 and locus 4 are homeologs and the gene copies of locus 2 and 3 are allelic variants. The fit of the data to these two models—mul‐tree A and mul‐tree B—can be compared through Bayes factors.
Performance of homologizer in phasing four loci under different simulated conditions. Each row shows how phasing performance was impacted by a factor expected to influence accuracy: the sequence length; the smallest distance between subgenomes; the smallest distance between a polyploid subgenome and a sampled diploid; and the degree of incomplete lineage sorting (ILS). See the main text for details on how each factor was quantified. The x‐axis in the left panel of each row shows the mean marginal probability of the joint MAP phasing: each point represents an individual simulated polyploid. Points are coloured red if the joint MAP phasing was estimated incorrectly and grey if the phasing was estimated correctly. The grey and red densities represent the distribution of the mean marginal posterior probabilities of correctly and incorrectly phased simulated polyploids, respectively. The right panel of each row summarizes how the proportion of times the model was correct changes with the focal factor. The orange bar is the variance.
Effectiveness of model comparison tests to distinguish allelic variation from homeologs. Each dot represents a single simulated dataset as described in the main text. The y‐axis shows the Bayes factor for the three mul‐tree tip phasing model compared to the two mul‐tree tip phasing model. All replicates had two gene copies simulated for each locus. Replicates 1–50 (left panel; red dots) were simulated with each of the two tips representing a different polyploid subgenome. The Bayes factor for all of these replicates was less than 0 (dashed line), indicating the three‐tip model was not supported over the two‐tip phasing model. Replicates 51–100 (right panel; grey dots) were simulated with three total tips representing two different subgenomes of a polyploid in some loci and a single subgenome with two allelic variants in other loci. For nearly all these replicates the Bayes factor was over 100 (dotted line), indicating “decisive” support (Kass & Raftery, 1995) for the three‐tip phasing model over the two‐tip model.
homologizer analysis of the Cystopteridaceae dataset. The phasing of gene copies into subgenomes is summarized on the maximum a posteriori (MAP) phylogeny. Thickened branches have posterior probabilities of 1.0; posterior probabilities <1.0 are indicated. The columns of the heatmap each represent a locus, and the joint MAP phase assignment is shown as text within each box. Each box is coloured by the marginal posterior probability of the phase assignment. Adjacent to the heatmap is a column that shows the mean marginal probability across loci of the phasing assignment per tip, which summarizes the model's overall confidence in the phasing of that tip. In the sample labels, “A.” = Acystopteris; “C.” = Cystopteris; “G.” = Gymnocarpium; the four‐digit numbers are Fern* Labs Database accession numbers (https://fernlab.biology.duke.edu/); capital A, B, and so forth indicate subgenomes; sp1 and sp2 indicate undescribed cryptic species; and lowercase letters following the accession numbers indicate haploid “individuals” within the sampled diploids (i.e. those diploids are heterozygous). Copy names with a “BLANK” suffix indicate missing sequences (e.g. a subgenome that was present in some loci but not retrieved for others).
homologizer: Phylogenetic phasing of gene copies into polyploid subgenomes

March 2023

·

136 Reads

Organisms such as allopolyploids and F1 hybrids contain multiple distinct subgenomes, each potentially with its own evolutionary history. These organisms present a challenge for multilocus phylogenetic inference and other analyses since it is not apparent which gene copies from different loci are from the same subgenome and thus share an evolutionary history. Here we introduce homologizer, a flexible Bayesian approach that uses a phylogenetic framework to infer the phasing of gene copies across loci into their respective subgenomes. Through the use of simulation tests, we demonstrate that homologizer is robust to a wide range of factors, such as incomplete lineage sorting and the phylogenetic informativeness of loci. Furthermore, we establish the utility of homologizer on real data, by analysing a multilocus dataset consisting of nine diploids and 19 tetraploids from the fern family Cystopteridaceae. Finally, we describe how homologizer may potentially be used beyond its core phasing functionality to identify non‐homologous sequences, such as hidden paralogs or contaminants.
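The phasing problem itself is easy to state: for each locus, decide which gene copy belongs to which subgenome. The toy below is a deliberate caricature — it assigns copies by Hamming distance to diploid-relative sequences, locus by locus, whereas homologizer scores whole phasings jointly across loci by phylogenetic likelihood in a Bayesian framework. All sequences and names are invented for illustration:

```python
from itertools import permutations

def hamming(a, b):
    """Number of mismatching positions between equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def phase_locus(copies, refs):
    """Assign each gene copy of one locus to the subgenome whose
    diploid-relative sequence it matches best, trying every one-to-one
    assignment (a distance caricature of homologizer's tree-likelihood
    scoring, which additionally shares information across loci)."""
    best = min(permutations(range(len(refs))),
               key=lambda perm: sum(hamming(copies[i], refs[j])
                                    for i, j in enumerate(perm)))
    return {f"subgenome_{j}": copies[i] for i, j in enumerate(best)}

refs = ["ACGTACGT", "ACGAACTT"]     # stand-ins for the two diploid progenitors
copies = ["ACGAACTT", "ACGTACGG"]   # unphased gene copies from one locus
phased = phase_locus(copies, refs)
print(phased)
```

Phasing loci independently like this is exactly what produces the mis-phased loci in the paper's Figure (loci 2 and 3); joint inference over all loci is what the package adds.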

Steps for fitting secr model by simulation and inverse prediction. Simulations are conducted at the vertices of a box in parameter space (top left; link scale) centred on an initial guess (blue diamond). The results in proxy space (top right; frame connects design point means, centre omitted for clarity) support a linear model for proxies as a function of parameters. The model is inverted and applied to the observed proxy vector (yellow square) giving the centre of a new, smaller box in parameter space (bottom left). The model is refined by further simulations (bottom right) from which the final parameter estimates are inferred (white square, bottom left). Function ipsecr.fit performs all steps.
Relationship between the proxy variable log(number of individuals detected) (‘logn’) and the density parameter D while holding detection parameters constant. Distribution of values for 100 simulated datasets at each level. Boxes span 25th to 75th percentiles. The relationship is nearly linear for a narrow range of densities.
Simulations of a spatial trend in density fitted using inverse prediction (N = 100, grey curves; mean dashed blue line) or with mis‐specified likelihood for multi‐catch traps (mean dotted red line). True trend (solid black curve) was a log‐linear function of distance in the east–west direction (x). Red ticks and vertical dashed lines mark extent of trapping grid.
ipsecr: An R package for awkward spatial capture–recapture data

March 2023

·

94 Reads

Some capture–recapture models for population estimation cannot easily be fitted by the usual methods (maximum likelihood and Markov‐chain Monte Carlo). For example, there is no straightforward probability model for the capture of animals in traps that hold a maximum of one individual (‘single‐catch traps’), yet such data are commonly collected. It is usual to ignore the limit on individuals per trap and analyse with a competing‐risk ‘multi‐catch’ model that gives unbiased estimates of average density. However, that approach breaks down for models with varying density. Simulation and inverse prediction was suggested by Efford (2004) for estimating population density with data from single‐catch traps, but the method has been little used, in part because the existing software allows only a narrow range of models. I describe a new R package that refines the method and extends it to include models with varying density, trap interference and other sources of non‐independence among detection histories. The method depends on (i) a function of the data that generates a proxy for each parameter of interest and (ii) functions to simulate new datasets given values of the parameters. By simulating many datasets, it is possible to infer the relationship between proxies and parameters and, by inverting that relationship, to estimate the parameters from the observed data. The method is applied to data from a trapping study of brushtail possums Trichosurus vulpecula in New Zealand. A feature of these data is the high frequency of non‐capture events that disabled traps (interference). Allowing for a time‐varying interference process in a model fitted by simulation and inverse prediction increased the steepness of inferred year‐on‐year population decline. Drawbacks and possible extensions of the method are discussed.
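The simulation-and-inverse-prediction loop in the abstract — simulate proxies at design points around a guess, fit a linear model of proxy on parameter, invert it at the observed proxy — can be shown with a one-parameter toy. Here the "simulator" is just Poisson trap counts and the proxy is the log mean count; the real ipsecr simulator generates full spatial capture histories and uses a multivariate linear model, so everything below is an illustrative stand-in:

```python
import numpy as np

rng = np.random.default_rng(42)

def proxy(theta, nsim=200, ntraps=100):
    """Mean proxy value (log mean trap count) over datasets simulated at
    parameter theta = log density. A toy stand-in for a spatial
    capture-recapture simulator."""
    counts = rng.poisson(np.exp(theta), size=(nsim, ntraps))
    return np.mean(np.log(counts.mean(axis=1)))

# 'field' data simulated at an unknown true value theta = 1.3
observed = np.log(rng.poisson(np.exp(1.3), size=100).mean())

# simulate at the two ends of a box centred on an initial guess
theta0, delta = 1.0, 0.5
design = np.array([theta0 - delta, theta0 + delta])
proxies = np.array([proxy(t) for t in design])

# linear model proxy = a + b*theta, inverted at the observed proxy
b = (proxies[1] - proxies[0]) / (design[1] - design[0])
a = proxies[0] - b * design[0]
theta_hat = (observed - a) / b
print(round(theta_hat, 2))   # near the true value of 1.3
```

With several parameters, the design points become the vertices of a box (as in the first figure) and the inversion uses the fitted multivariate linear map; refining the box around the first solution is the second round of simulations shown there.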

Lpnet: Reconstructing phylogenetic networks from distances using integer linear programming

March 2023

·

24 Reads

Neighbor‐net is a widely used network reconstruction method that approximates pairwise distances between taxa by a circular phylogenetic network. We present Lpnet, a variant of Neighbor‐net. We first apply standard methods to construct a binary phylogenetic tree and then use integer linear programming to compute an optimal circular ordering that agrees with all tree splits. This approach achieves an improved approximation of the input distance for the clear majority of experiments that we have run on simulated and real data. We release an implementation in R that can handle up to 94 taxa and usually needs about 1 min on a standard computer for 80 taxa. For larger taxon sets, we include a top‐down heuristic which also tends to perform better than Neighbor‐net. Lpnet thus provides an alternative to Neighbor‐net that performs better in most cases, and we anticipate it will be useful for generating phylogenetic hypotheses.

Illustration of the MitoGeneExtractor algorithm. Exonerate is called to translate and align the DNA sequence reads to the amino acid reference (a). Then, MitoGeneExtractor creates the MSA of reads (b) and infers a gene consensus sequence (c). The degeneracy of the genetic code allows considerable DNA sequence variation between the reads and the reference (d).
Runtime differences between MitoGeneExtractor (MGE) and MitoFinder. Total runtime needed to reconstruct only COI, or all PCGs with MitoGeneExtractor compared to the runtime for the MitoFinder assembly and annotation, depending on various dataset sizes.
COI (left) and ND5 (right) reconstruction success with MitoGeneExtractor (blue) and MitoFinder (red). Density plots indicate the probability density curve. Dots show the number of nucleotides in individual sequences. Diamonds indicate the median nucleotide recovery obtained with MitoGeneExtractor (COI = 1545, ND5 = 1755) and MitoFinder (COI = 0, ND5 = 0). The avian COI and ND5 genes typically comprise 1551 and 1818 nucleotides, respectively.
MitoGeneExtractor: Efficient extraction of mitochondrial genes from next‐generation sequencing libraries

February 2023

·

39 Reads

Mitochondrial DNA (mtDNA) sequences are often found as byproducts in next‐generation sequencing (NGS) datasets that were originally created to capture genomic or transcriptomic information of an organism. These mtDNA sequences are often discarded, wasting valuable sequencing information. We developed MitoGeneExtractor, an innovative tool that extracts mitochondrial protein‐coding genes (PCGs) of interest from NGS libraries through multiple sequence alignments of sequencing reads to amino acid references. General references, for example at the order level, are sufficient for mining mitochondrial PCGs. In a case study, we applied MitoGeneExtractor to recently published genomic datasets of 1993 birds and were able to extract complete or nearly complete sequences for all 13 mitochondrial PCGs for a large proportion of libraries. Compared to an existing assembly‐guided sequence reconstruction algorithm, MitoGeneExtractor was faster and substantially more sensitive. We compared COI sequences mined with MitoGeneExtractor to COI databases. In most samples, mined sequences show high sequence similarity and correct taxonomic assignment between the recovered sequence and the assigned morphospecies. In some cases of incongruent taxonomic assignments, we found evidence for contamination in NGS libraries. MitoGeneExtractor allows fast extraction of mitochondrial PCGs from a wide range of NGS datasets. We recommend routinely harvesting and curating mitochondrial sequence information from genomic resources. MitoGeneExtractor output can be used to identify contaminated NGS libraries and to validate the species identity of the sequenced animal based on the extracted COI sequences.
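The final step in the figure — collapsing the read alignment (b) into a gene consensus (c) — is a plain majority vote per alignment column. A self-contained sketch of just that step, with invented reads and offsets (the actual tool obtains the alignment from Exonerate against an amino acid reference):

```python
from collections import Counter

def consensus(aligned_reads, length):
    """Majority-rule consensus over reads placed in a common reference
    frame. aligned_reads holds (start_offset, sequence) pairs, mimicking
    the output of the alignment step; positions with no coverage are
    reported as 'N'."""
    columns = [Counter() for _ in range(length)]
    for start, seq in aligned_reads:
        for i, base in enumerate(seq):
            columns[start + i][base] += 1
    return "".join(c.most_common(1)[0][0] if c else "N" for c in columns)

# overlapping toy reads tiling a 9-base region
reads = [(0, "ATGGC"), (2, "GGCTA"), (4, "CTAGA"), (2, "GGCTA")]
print(consensus(reads, 9))   # prints "ATGGCTAGA"
```

Real consensus callers additionally apply coverage and agreement thresholds before emitting a base, which is where low-quality or contaminant reads get filtered out.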

Methodological scheme illustrating the calculation of d‐correlations of variables (P) from the data (X) directly or from ranks (R) indirectly through the use of difference matrices (D). As a “by product”, Gower dissimilarities (G) may also be calculated from the D matrices, by averaging absolute differences.
Results for the artificial plant trait matrix (Table 1). (a) Leaf size versus flowering sequence differences demonstrating positive d‐correlation. (b) Flowering sequence versus light EIV differences demonstrating negative d‐correlation. (c) Principal components analysis of all variables.
Principal components analysis of variables describing the Sardinian vascular flora. (a, b) Ordinal variables are treated using relative rank differences. (c, d) Ordinal variables are treated by counting nearest neighbour interchanges. See Table 2, for abbreviations.
Correlating variables with different scale types: A new framework based on matrix comparisons

February 2023

·

86 Reads

Ecological variables may be expressed on four basic measurement scales (nominal, ordinal, interval or ratio), whereas circular variables and those combining a nominal state with other scale types are also common. However, existing methods are not suited to calculating correlations between all pairwise combinations of such variables, preventing the application of standard multivariate techniques. The essence of the new approach is to derive a so‐called difference semimatrix for all pairs of observations for each variable, and then to calculate the matrix correlation based on two such semimatrices. The advantage of this function, termed d‐correlation, is that comparisons are made on the same logical basis regardless of the measurement scale, allowing the use of principal components analysis to visualize interrelationships among many variables simultaneously. Further advantages are that missing values in the data are tolerated and that the Gower index of dissimilarity between objects may also be computed. The use of the method is demonstrated on a small toy matrix, an artificial plant trait matrix and a large dataset summarizing ecological features of all vascular plant species of Sardinia, Italy. The source code in R and FORTRAN, and applications for three different operating systems, are provided for computations whose results can serve as input for other statistical software. The new computational framework will allow the comparison of any types of ecological traits in a mathematically meaningful manner. This option was not available earlier in the field of multivariate statistics, and the method is expected to find applications in other subject areas in which many objects are described by variables expressed on different measurement scales.
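The idea translates directly into code: build one pairwise-difference vector per variable (using a difference function appropriate to that variable's scale type) and correlate the two vectors. The sketch below is an illustration of that scheme, not the paper's R/FORTRAN implementation; the difference functions shown (absolute numeric difference, 0/1 nominal mismatch) are the obvious defaults:

```python
import numpy as np
from itertools import combinations

def d_correlation(x, y, diff_x=None, diff_y=None):
    """Correlate two variables of possibly different scale types by
    comparing their difference semimatrices: for every pair of
    observations compute a scale-appropriate difference, then take the
    Pearson correlation over the paired difference values. diff_*
    default to absolute numeric differences; pass e.g. a 0/1 mismatch
    function for nominal variables."""
    diff_x = diff_x or (lambda a, b: abs(a - b))
    diff_y = diff_y or (lambda a, b: abs(a - b))
    pairs = list(combinations(range(len(x)), 2))
    dx = np.array([diff_x(x[i], x[j]) for i, j in pairs], dtype=float)
    dy = np.array([diff_y(y[i], y[j]) for i, j in pairs], dtype=float)
    return float(np.corrcoef(dx, dy)[0, 1])

leaf_size = [2.0, 2.5, 7.0, 8.0]             # ratio-scale variable
habitat = ["shade", "shade", "sun", "sun"]   # nominal variable
r = d_correlation(leaf_size, habitat,
                  diff_y=lambda a, b: float(a != b))
print(round(r, 3))
```

Averaging the (range-scaled) difference vectors across variables instead of correlating them yields the Gower dissimilarity between objects, which is the "by-product" noted in the methods figure.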

Optimizing insect metabarcoding using replicated mock communities

February 2023

·

113 Reads

Metabarcoding (high‐throughput sequencing of marker gene amplicons) has emerged as a promising and cost‐effective method for characterizing insect community samples. Yet, the methodology varies greatly among studies and its performance has not been systematically evaluated to date. In particular, it is unclear how accurately metabarcoding can resolve species communities in terms of presence‐absence, abundance and biomass. Here we use mock community experiments and a simple probabilistic model to evaluate the effect of different DNA extraction protocols on metabarcoding performance. Specifically, we ask four questions: (Q1) How consistent are the recovered community profiles across replicate mock communities?; (Q2) How does the choice of lysis buffer affect the recovery of the original community?; (Q3) How are community estimates affected by differing lysis times and homogenization? and (Q4) Is it possible to obtain adequate species abundance estimates through the use of biological spike‐ins? We show that estimates are quite variable across community replicates. In general, a mild lysis protocol is better at reconstructing species lists and approximate counts, while homogenization is better at retrieving biomass composition. Small insects are more likely to be detected in lysates, while some tough species require homogenization to be detected. Results are less consistent across biological replicates for lysates than for homogenates. Some species are associated with strong PCR amplification bias, which complicates the reconstruction of species counts. Yet, with adequate spike‐in data, species abundance can be determined with roughly 40% standard error for homogenates, and with roughly 50% standard error for lysates, under ideal conditions. In the latter case, however, this often requires species‐specific reference data, while spike‐in data generalize better across species for homogenates. 
We conclude that a non‐destructive, mild lysis approach shows the highest promise for the presence/absence description of the community, while also allowing future morphological or molecular work on the material. However, homogenization protocols perform better for characterizing community composition, in particular in terms of biomass.
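The spike-in logic behind the abundance estimates (Q4) is simple rescaling: reads of a spike-in added in known amount calibrate the amount-per-read for the whole sample. A minimal sketch with invented numbers; the paper's model additionally handles species-specific amplification bias, which this deliberately ignores:

```python
def abundance_from_spikein(reads, spike_reads, spike_amount):
    """Convert per-species read counts to absolute abundance estimates
    using a biological spike-in of known amount: scale every species'
    reads by the amount-per-read implied by the spike-in. Assumes equal
    amplification efficiency across species, which the study shows
    requires mock-community calibration in practice."""
    per_read = spike_amount / spike_reads
    return {sp: n * per_read for sp, n in reads.items()}

# invented read counts for three species plus a spike-in that
# contributed 1500 reads from a known 10.0 units of material
sample = {"sp_A": 12000, "sp_B": 3000, "sp_C": 600}
est = abundance_from_spikein(sample, spike_reads=1500, spike_amount=10.0)
print(est)   # roughly 80, 20 and 4 units respectively
```

The reported ~40–50% standard errors bound how far such point estimates can be trusted even with good spike-in data.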

Posterior predictive check showing the density distribution of observed seed production values (red line) to simulated seed production values (light grey) as estimated by the joint model, on a log scale. Simulated values were generated using the 80% posterior confidence intervals for each parameter, the black line shows simulated values using the median of each parameter.
Competitive (a) and facilitative (b) scaled interaction networks estimated from our model framework. Competitive and facilitative interactions (a, b) are shown separately for ease of viewing but were analysed together. Only focal species are included in these networks, arrows point to species i and line thickness denotes interaction strength. Interaction strengths are given as the median over 1000 samples. Purple coloured nodes correspond to highly abundant native species, red nodes to exotic species, and the green node indicates a potential keystone species.
For all graphs, diamonds are species medians across all network samples, black lines cover the 50% quantile and grey dots indicate the full range of values as calculated from 1000 sampled networks. Note that only focal × focal interactions are included. The shaded part of the graphs shows facilitative (negative) interactions. Dashed lines represent the median value for all focals. Coloured triangles indicate the species referred to in the main text for each of the ecological questions associated with (a–c). In (a), the x‐axis shows the strength of scaled intraspecific interactions, that is, how strongly a focal species interacts with itself, plotted against a focal species' total log abundance (y‐axis). Values above 0 indicate competition, and values less than 0 indicate facilitation. The two most abundant natives, Velleia rosea (VERO) and Podolepsis canescens (POCA; purple triangles), do not compete with themselves any more or less strongly than the median for all species in the system (dashed line). (b) The sum of interaction effects of focal species on neighbours (x‐axis) against the focal species' total log abundance (y‐axis). On the x‐axis, values greater than 0 indicate that a focal species has an overall competitive effect on neighbours, and values less than 0 indicate that it has an overall facilitative effect. Green triangles identify a species with low overall abundance but strong competitive effects on neighbours: Gilberta tenuifolia (GITE). (c) Decomposes a focal species' net interaction strength into its competitive effects (x‐axis) and facilitative effects (y‐axis). Red triangles show the exotic species Hypochaeris glabra (HYPO), Arctotheca calendula (ARCA) and Pentameris aroides (PEAI). The light grey diagonal shows where x = y; species above that line have an overall facilitative effect on other species, whereas those below that line have an overall competitive effect. Points capturing the effects of H. glabra on neighbours are spread much further apart than for any other species; this is largely driven by its high germination rate, which magnifies interaction effects when those are scaled according to the population dynamics model (see Supporting Information S1.5).
Estimating interaction strengths for diverse horizontal systems using performance data

February 2023

·

32 Reads

Network theory allows us to understand complex systems by evaluating how their constituent elements interact with one another. Such networks are built from matrices which describe the effect of each element on all others. Quantifying the strength of these interactions from empirical data can be difficult, however, because the number of potential interactions increases nonlinearly as more elements are included in the system, and not all interactions may be empirically observable when some elements are rare. We present a novel modelling framework which uses measures of species performance in the presence of varying densities of their potential interaction partners to estimate the strength of pairwise interactions in diverse horizontal systems. Our method allows us to directly estimate pairwise effects when they are statistically identifiable and to approximate pairwise effects when they would otherwise be statistically unidentifiable. The resulting interaction matrices can include positive and negative effects and the effect of a species on itself, and allow for non-symmetrical interactions. We show how to link the parameters inferred by our framework to a population dynamics model to make inferences about the effect of interactions on community dynamics and diversity. The advantages of these features are illustrated with a case study on an annual wildflower community of 22 focal and 52 neighbouring species, and a discussion of potential applications of this framework extending well beyond plant community ecology.
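The core estimation step the abstract describes — regressing a measure of focal performance on the densities of neighbouring species — can be sketched with a toy Ricker-style model. All names and parameter values below are hypothetical; the paper's actual framework additionally approximates statistically unidentifiable pairwise effects and links the estimates to a population dynamics model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Ricker-style performance model for one focal species with two
# neighbour species: log(performance) = log(lambda) - alpha_1*N1 - alpha_2*N2.
# alpha > 0 denotes competition, alpha < 0 denotes facilitation.
true_log_lam = 2.5
true_alpha = np.array([0.08, -0.03])

n = 200
N = rng.integers(0, 20, size=(n, 2)).astype(float)           # neighbour densities
log_perf = true_log_lam - N @ true_alpha + rng.normal(0, 0.05, n)

# Least-squares fit; the slopes on -N recover the pairwise interaction strengths.
X = np.column_stack([np.ones(n), -N])
coef, *_ = np.linalg.lstsq(X, log_perf, rcond=None)
log_lam_hat, alpha_hat = coef[0], coef[1:]
```

This illustrates only the identifiable case; in the case study a single joint model covers 22 focal and 52 neighbouring species, many of whose pairwise combinations are too rare to estimate directly.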


Schematic representation of exemplary life histories (LHs): (a) LH of an individual that dispersed sometime between age y_left and y_right as part of a multiple-member dispersing coalition, separated from the original coalition between age z_left and z_right, settled between age t_R,left and age t_R,right, and died following its last observation as resident at age x_left; (b) LH of an individual that was a potential disperser when it disappeared under unexplained circumstances following its last observation as resident; (c) LH of an individual that was in the dispersing state at the time of last observation and part of a multiple-member dispersing coalition when it disappeared under unexplained circumstances; (d) LH of an individual that died prior to sex determination. Times τ refer to ages at which the focal individual was observed alive and relevant covariates were updated. A description of all remaining variables and parameters is provided in Table 1.
Exemplary illustration of how covariate effects alter the form of the bathtub‐shaped mortality curve of the Siler model (Equation 1). In a proportional hazards framework (Kleinbaum & Klein, 2010), the covariate effects are estimated as hazard ratios (HRs), which have a multiplicative effect on the baseline mortality rate (solid line): HR > 1 shifts the baseline mortality curve upwards (dotted line), which results in higher mortality compared to the ‘baseline’ (for a given age and keeping all other covariates unchanged). The opposite applies to HR < 1 (dashed line). For covariates with an age‐independent effect on mortality (Equation 6), panel (a) shows how such covariates would alter the mortality curve proportionally with age. For covariates with an age‐dependent effect (Equation 7), panels (b) and (c) exemplify how the mortality curve would change at ‘young’ and ‘old’ ages, respectively.
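The Siler bathtub hazard and the multiplicative action of a hazard ratio described in this caption can be written down directly. The parameter values below are hypothetical, chosen only to produce the characteristic bathtub shape; they are not the paper's estimates.

```python
import math

def siler_hazard(x, a1=1.0, b1=1.5, c=0.05, a2=0.01, b2=0.2):
    # Siler bathtub mortality: a declining juvenile term, a constant
    # age-independent term, and an exponentially increasing senescent term.
    return a1 * math.exp(-b1 * x) + c + a2 * math.exp(b2 * x)

def covariate_hazard(x, hr):
    # Proportional hazards: a covariate with hazard ratio HR scales the
    # baseline multiplicatively (HR > 1 raises mortality, HR < 1 lowers it).
    return hr * siler_hazard(x)
```

This age-independent covariate effect corresponds roughly to the caption's Equation 6 case; an age-dependent effect (Equation 7) would instead scale only the juvenile or senescent term.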
Flow‐chart representation of the procedure used to sample the conditional posteriors (see Section 2.3.3). A Markov chain Monte Carlo algorithm is used with parallel chains in which each chain is initiated with different randomly drawn starting values for the unknown parameters. Following the initialisation of all chains, the algorithm involves sampling conditional posteriors of the unknown parameters in successive steps. At each step, the algorithm updates the unknown parameters (analogous to Equation 18) successively. Each parameter is updated by looping over all individuals of the focal sample (illustrated by the blue boxes).
Comparison of estimated mortality rates (shaded areas) with mortality rates used to simulate data (solid lines) for individuals in a resident state (blue) or dispersing state (orange). Shaded areas depict 95% credible intervals (CI) and dashed lines indicate the mean of estimated posteriors. Results are shown for simulated data of different sample sizes (a–d vs. e–h), proportions of missing individuals (a,c,e,g vs. b,d,f,h), and resighting intervals (a,b,e,f vs. c,d,g,h). Note that for computational reasons, we deliberately chose parameters that made simulated mortality of dispersers smaller than mortality of residents (see main text for details). While a minimum dispersal age of 1 year was used in the simulation, the youngest dispersal age of all simulated life histories was 1.3 years; thus, the mortality rates shown for the dispersing state were truncated at that age. For datasets with unknown fates, the ability of the model to correctly infer the fates of missing individuals is indicated by the receiver operating characteristic (ROC) curve and the corresponding area under the curve (AUC), where true positive rates (TPR) are plotted against false positive rates (FPR). When AUC ≥ 0.8, the predictive performance of the model can be considered as excellent (Hosmer & Lemeshow, 2000).
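The AUC reported in the figure equals the probability that a randomly chosen missing individual that truly died receives a higher predicted death probability than one that truly emigrated. A minimal rank-based computation, with hypothetical labels and scores, is:

```python
def auc(labels, scores):
    # AUC = P(score of a random positive > score of a random negative),
    # with ties counting one half (rank-sum formulation of the ROC area).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: 1 = died, 0 = emigrated; scores are inferred death probabilities.
labels = [1, 1, 0, 0]
scores = [0.9, 0.2, 0.35, 0.1]
```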
A workflow that guides the user to modify the provided R code by adding or removing individual model components that are needed for running mortality analyses tailored to the species under study. A description of all parameters is given in Table 1.
A hierarchical approach for estimating state‐specific mortality and state transition in dispersing animals with incomplete death records

February 2023

·

45 Reads

Unbiased mortality estimates are fundamental for testing ecological and evolutionary theory as well as for developing effective conservation actions. However, mortality estimates are often confounded by dispersal, especially in studies where dead‐recovery is not possible. In such instances, missing individuals (i.e. individuals with unobserved time of death) may have died or permanently emigrated from a study area, making inferences about their fate difficult. Mortality before and during dispersal, as well as the decision to disperse, usually depend on a suite of individual, social and environmental covariates, which in turn can be used to draw conclusions about the fate of missing individuals. Here, we propose a Bayesian hierarchical model that takes into account time‐varying covariates to estimate transitions between life‐history states and mortality in each state using mark‐resighting data with missing individuals. Specifically, our framework estimates mortality rates in two states (resident and dispersing state) by treating the fate of missing individuals as a latent (i.e. unobserved) variable that is statistically inferred based on information from individuals with a known fate and given the individual, social and environmental conditions at the time of disappearance. Our model also estimates rates of state transition (i.e. emigration) to assess whether a missing individual was more likely to have died or survived due to unobserved emigration from the study area. We used simulations to check the validity of our model and assessed its performance with data of varying degrees of uncertainty. Our modelling framework provided accurate mortality and emigration estimates for simulated data of different sample sizes, proportions of missing individuals, and resighting intervals. Variation in sample size appeared to affect the precision of estimated parameters the most. 
Our approach offers a solution to estimating unbiased mortality of both resident and dispersing individuals as well as the probability of emigration using mark‐resighting data with incomplete death records. Conditional on the availability of data on known‐fate individuals and relevant time‐varying covariates, our model can reconstruct the fate (death or emigration) of missing individuals. The modularity of our framework allows mortality analyses to be tailored to a variety of species‐specific life histories.

BirdFlow: Learning seasonal bird movements from eBird data

January 2023

·

101 Reads

Large‐scale monitoring of seasonal animal movement is integral to science, conservation and outreach. However, gathering representative movement data across entire species ranges is frequently intractable. Citizen science databases collect millions of animal observations throughout the year, but it is challenging to infer individual movement behaviour solely from observational data. We present birdflow, a probabilistic modelling framework that draws on citizen science data from the eBird database to model the population flows of migratory birds. We apply the model to 11 species of North American birds, using GPS and satellite tracking data to tune and evaluate model performance. We show that birdflow models can accurately infer individual seasonal movement behaviour directly from eBird relative abundance estimates. Supplementing the model with a sample of tracking data from wild birds improves performance. Researchers can extract a number of behavioural inferences from model results, including migration routes, timing, connectivity and forecasts. The birdflow framework has the potential to advance migration ecology research, boost insights gained from direct tracking studies and serve a number of applied functions in conservation, disease surveillance, aviation and public outreach.
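At its core, a model of this kind treats the weekly relative-abundance surfaces as marginals of a Markov chain over locations, so that individual movement behaviour can be sampled as paths through the chain. The three-site transition matrix below is purely illustrative, not the paper's fitted model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative 3-site chain; T[i, j] = P(bird at site i moves to site j per step).
T = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3],
              [0.0, 0.2, 0.8]])
p = np.array([1.0, 0.0, 0.0])        # population initially concentrated at site 0

# Population-level marginals evolve as p_{t+1} = p_t T ...
marginals = [p]
for _ in range(4):
    marginals.append(marginals[-1] @ T)

# ... while an individual "track" is a sample path through the same chain.
site, track = 0, [0]
for _ in range(4):
    site = rng.choice(3, p=T[site])
    track.append(site)
```

Fitting such a chain so that its marginals match eBird abundance estimates, and validating the sampled tracks against GPS and satellite tags, is the substance of the paper.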

Classical survivorship curves (left) show functional shapes of types I (dashed curve), II (solid) and III (dotted curve) survivorship, with corresponding mortality hazard functions (right) that give rise to the survivorship functions. Species can exhibit a mixture of type I and type III hazards (left), denoted here as a type IV survivorship curve (dotted‐dashed curve).
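The link the figure draws between hazard and survivorship is S(x) = exp(−∫₀ˣ h(u) du). A crude numerical version, with illustrative hazard parameters, shows how a type III hazard produces the steep early decline:

```python
import math

def survivorship(hazard, x, dx=0.001):
    # S(x) = exp(-integral_0^x h(u) du), via a left Riemann sum.
    H = sum(hazard(k * dx) * dx for k in range(int(x / dx)))
    return math.exp(-H)

type2 = lambda u: 0.5                              # constant hazard -> exponential S(x)
type3 = lambda u: 2.0 * math.exp(-3.0 * u) + 0.1   # high early hazard, declining with age
```

A type IV curve in the figure's sense would use a hazard mixing the declining early-life term with an increasing senescent term.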
Means (black lines) and 95% credible intervals with the generating functions (green lines) for the natural log hazard age effects (first column), period effects (second column), cumulative survival probability for different ages (third column) and cumulative survival probability for the study period (fourth column) for the Type III simulation using data generated from a single seed. The first row results are for the top model based on relative mean square error, which was the constrained generalized additive model for age effects and the kernel convolution process model for period effects (C–K). The second row results are for the second best model with the constrained generalized additive model with additive natural splines and a LASSO prior for the age effects and a kernel convolution process for the period effects (CSL‐K).
Means (black lines) and 95% credible intervals with the generating functions (green lines) for the natural log hazard age effects (first column), period effects (second column), cumulative survival probability for different ages (third column) and cumulative survival probability for the study period (fourth column) for the Type IV simulation using data generated from a single seed (10000). The first row results are for the top model based on relative mean square error, which was the constrained generalized additive model with additive natural splines and a LASSO prior for the age effects and a kernel convolution process for the period effects (CSL‐K). The second row results are for the second best model with the constrained generalized additive model with additive natural splines and a ridge regression prior for the age effects and a kernel convolution process for the period effects (CS‐K).
Means (black lines) and 95% credible intervals (shaded regions) for the top two models, based on minimum WAIC, of age-period chick and juvenile survival for Columbian sharp-tailed grouse in northern Colorado from 2015 to 2017. The top model uses a kernel convolution process model for the log hazard age effects and a kernel convolution process for the period effects. The right column shows results from the second-best model, which uses the constrained generalized additive model with additive natural splines and the LASSO prior for the age effects, and the kernel convolution process for period effects.
Means (black lines) and 95% credible intervals (shaded regions) for the top two models, based on minimum WAIC, of age-period survival for white-tailed deer in south central Wisconsin from January 2017 to May 2021. The top model uses a constrained generalized additive model for the log hazard age effects and a kernel convolution process for the period effects. The right column shows results from the second-best model, which uses the constrained generalized additive model with additive natural splines and the LASSO prior for the age effects, and the kernel convolution process for period effects.
Assimilating ecological theory with empiricism: Using constrained generalized additive models to enhance survival analyses

January 2023

·

38 Reads

Hierarchical Bayesian models have made the integration of ecological theory with empirical methods ubiquitous in ecology. However, there has been little development focused on integrating ecological theory into models for survival analysis. Survival is a fundamental process, linking individual fitness with population dynamics, but incorporating life history strategies to inform survival estimation can be challenging because mortality processes occur at multiple scales. We develop an approach to survival analysis, incorporating model constraints based on a species' life history strategy using functional analytical tools. Specifically, we structurally separate intrinsic patterns of mortality that arise from age-specific processes (e.g. increasing survival during early life stages due to growth or maturation, versus senescence) from extrinsic mortality patterns that arise over different periods of time (e.g. seasonal temporal shifts). We use shape constrained generalized additive models (CGAMs) to obtain age-specific hazard functions that incorporate theoretical information based on classical survivorship curves into the age component of the model and capture extrinsic factors in the time component. We compare the performance of our modelling approach to standard survival modelling tools that do not explicitly incorporate species life history strategy in the model structure, using metrics of predictive power, accuracy, efficiency and computation time. We applied these models to two case studies that reflect different functional shapes for the underlying survivorship curves, examining age-period survival for white-tailed deer Odocoileus virginianus in Wisconsin, USA and Columbian sharp-tailed grouse Tympanuchus phasianellus columbianus in Colorado, USA. We found that models that included shape constraints for the age effects in the hazard curves using CGAMs outperformed models that did not include explicit functional constraints.
We demonstrate a data‐driven and easily extendable approach to survival analysis by showing its utility to obtain hazard rates and survival probabilities, accounting for heterogeneity across ages and over time, for two very different species. We show how integration of ecological theory using constrained generalized additive models, with empirical statistical methods, enhances survival analyses.
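The shape constraints CGAMs place on the age effect (e.g. a monotonically declining juvenile hazard) are spline-based and not reproduced here, but the simplest instance of a shape-constrained fit — isotonic regression by the pool-adjacent-violators algorithm — conveys the idea. This stdlib-only sketch is an illustration of the general principle, not the paper's implementation.

```python
def pava(y):
    # Pool-adjacent-violators: least-squares fit of y under a monotone
    # non-decreasing constraint, by merging violating adjacent blocks
    # into their weighted mean.
    blocks = [[v, 1] for v in y]              # each block: [mean, size]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] > blocks[i + 1][0]:   # monotonicity violated -> merge
            (m0, n0), (m1, n1) = blocks[i], blocks[i + 1]
            blocks[i:i + 2] = [[(m0 * n0 + m1 * n1) / (n0 + n1), n0 + n1]]
            i = max(i - 1, 0)                 # a merge may create a new violation
        else:
            i += 1
    return [m for m, n in blocks for _ in range(n)]
```

Replacing the hard monotone constraint with constrained spline bases, and embedding the result in a Bayesian hazard model, yields the CGAM approach the abstract describes.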