Wiley

Molecular Ecology Resources

Published by Wiley

Online ISSN: 1755-0998

·

Print ISSN: 1755-098X


Top-read articles

101 reads in the past 30 days

Similarity and replication consistency between samples obtained by two different library protocols. NMDS ordination of the community profiles obtained from taxonomic classification of rRNA, (A). The total extracted RNA g⁻¹ DW soil, (B). The per cent relative abundance of SSU rRNA reads assigned to the order Thermoprotei or deeper. Boxplot showing 25% and 75% quantiles, median (horizontal line) and 5% and 95% quantiles (whiskers), (C). Significant differences tested with two‐way ANOVA, **p < 0.01 (p = 0.003).
Average relative proportion of the Pro‐ and Eukaryota in the community per sample, shown for nonAMP (A) and AMP (B) sequenced libraries and qRT‐PCR reads of 16S and 18S rRNA (primer pair; 1183F/1443R (16S) and 515F/806R (18S); C). Significant differences were tested with two‐way ANOVA, followed by Tukey's HSD post hoc test for between‐sample statistics separately for each dataset A to C. Significant differences (p < 0.05) are indicated with letters a, b, c both between methods (header) and sample type (on bars).
(A) log–log plot of the estimated number of 16S transcripts g⁻¹ DW soil by NAEstd or qMeTra (y‐axis) and qRT‐PCR (x‐axis) and their linear regression (the slope is given in the legend). Diagonal line represents the 1:1 line. (B) Pearson's r correlation matrix for all estimates including the total extracted RNA from the samples. Significant correlations are indicated with ***p < 0.001 (n = 46).
Conceptual drawing of how changes in soil RNA retention strength can result in false community size estimates for qMeTra (and qRT‐PCR), while NAEstd estimates will not be affected.
Quantifying Soil Microbiome Abundance by Metatranscriptomics and Complementary Molecular Techniques—Cross‐Validation and Perspectives

June 2025

·

102 Reads

·

Stella Brachmann

·

·

[...]


Aims and scope


Molecular Ecology Resources is a broad scope journal publishing resources including computer programs, statistical and molecular advances, and extensive molecular tools that facilitate studies in evolution, ecology, and conservation. Papers reporting on empirical research in ecology, rather than new resources and tools, should be submitted to our companion journal, Molecular Ecology.

Recent articles


1‐D plot of the limit of detection in a multiplexed ddPCR assay. Infection standard consisted of 100 singly infected copepods. Last column is a field sample included as a comparison to the lab standard. The panel highlighted in blue is the negative template control. Positive reactions and negative reactions (light green and black, respectively) are separated by a manually set threshold (dashed line, at amplitude of 2000). Green points are FAM‐fluorescently labelled S. solidus droplets and grey points are HEX‐fluorescently labelled copepod droplets. Differences in fluorescence amplitude signal of each target dye (here FAM (green) shines brighter than HEX (black)) allow spatial differentiation of the droplet clusters. Each point is an individual reaction.
Linear relationship between dilution factor and gene concentration. Dilution and absolute concentration are significantly correlated (r = −0.98, p < 0.001 for both targets), demonstrating the linearity of the dilution assay.
Host–parasite detection in environmental samples. In multiplexed ddPCR reactions, we find positive results in (1) diverse lakes in different months and (2) in both water samples and zooplankton tows. Single representative amplification wells are visualised here, but samples were run in triplicate across the dilution series (except for the eDNA sample, which was run only once). Averages in grey boxes are corrected to full strength.
Lake‐level estimates of parasites (A), hosts (B), and mean parasite load (C) vary substantially across sites. Parasite load is highest when both parasite and host DNA copy numbers are high, as observed in Black Lake. In contrast, when parasites are less abundant relative to hosts, as in Pachena Lake, parasite load is lower. Blackwater Lake, where both parasite and host levels are relatively low, exhibits intermediate parasite loads. Error bars represent SEs between replicates (n = 3).
Needle in a Haystack: A Droplet Digital Polymerase Chain Reaction Assay to Detect Rare Helminth Parasites Infecting Natural Host Populations

June 2025

·

22 Reads

Helminths infect humans, livestock, and wildlife, yet remain understudied despite their significant impact on public health and agriculture. Because many of the most prevalent helminth‐borne diseases are zoonotic, understanding helminth transmission among wildlife could improve predictions and management of infection risks across species. A key challenge to understanding helminth transmission dynamics in wildlife is accurately and quantitatively tracking parasite load across hosts and environments. Traditional methods, such as visual parasite identification from environmental samples or infected hosts, are time‐consuming, while standard molecular techniques (e.g., PCR and qPCR) often lack the sensitivity to reliably detect lower parasite burdens. These limitations can underestimate the prevalence and severity of infection, hindering efforts to manage infectious diseases. Here, we developed a multiplexed droplet digital PCR (ddPCR) assay to quantify helminth loads in aquatic habitats using 18S rRNA target genes. Using Schistocephalus solidus and their copepod hosts as a case study, we demonstrate ddPCR's sensitivity and precision. The assay is highly reproducible, reliably detecting target genes at concentrations as low as 1 pg of DNA in lab standards and field samples (multi‐species and eDNA). Thus, we provide a toolkit for quantifying parasite load in intermediate hosts and monitoring infection dynamics across spatio‐temporal scales in multiple helminth systems of concern for public health, agriculture, and conservation biology.


Genome‐scale species delimitation results from three methods (ASAP, PTP, GMYC). (A) Inferred species boundaries are shown against the reference tree of the samples. The samples' geographic origins and morphological identifications are indicated below the molecular species boundaries (ad = adiantiformis; bi = bifurcatum; ova = ovalifolium). The tree was inferred from the genome‐scale alignment with BEAST, and values at internal nodes are Bayesian posterior probabilities (shown if > 0.95). (B) Sequence divergences between samples belonging to the same (intra) or different (inter) inferred species using the three methods, along with summary statistics. Jitter was added to the points in the x‐ and y‐directions. μ indicates the mean.
Gene‐by‐gene comparison of inferred species limits. The guide tree and the numbered grey bars at the top are taken from Figure 1 (numbers refer to the ASAP species limits) to facilitate comparison with the genome‐scale analysis. The values given on the left‐hand side are the substitution rates and the number of parsimony‐informative sites (PIS) for each gene, the latter given on a log‐10 scale. The gene rate and PIS values are also represented as colours along a blue (low) over white (intermediate) to orange (high) colour ramp. The three species delimitation methods are shown in the same colours as in Figure 1.
Spread of inferred species numbers from individual genes. μ and σ indicate the mean and standard deviation of the distribution of inferred species numbers.
Numbers of species predicted in single‐gene analyses as a function of gene features (parsimony‐informative sites and evolutionary rate). The result for the rbcL gene, a traditional DNA barcode in red algae, has a black border around the point.
(A) Scaling of estimated species numbers with the size of the dataset, showing how methods scale from individual genes to whole organelle genomes. (B) Maximum likelihood (ML) estimates of the coalescent lambda parameter from GMYC analyses on the unique haplotype datasets. Values shown across all graphs are means of 10 replicate analyses, with the bars indicating standard error.
Scaling Up Species Delimitation From DNA Barcodes to Whole Organelle Genomes: Strong Evidence for Discordance Among Genes and Methods for the Red Alga Dasyclonium

June 2025

·

57 Reads

Molecular sequence data have become a ubiquitous tool for delimiting species and are particularly important in organisms where morphological traits are not informative about species boundaries. A range of statistical methods have been developed to derive species limits from molecular data, for example, by quantifying changes in branching patterns in phylogenetic trees. We aim to investigate how such methods scale up from single genes to whole organelle genomes. We gathered chloroplast genome data from 38 samples of the red algal genus Dasyclonium and analysed them with the popular species delimitation methods Assemble Species by Automatic Partitioning (ASAP), General Mixed Yule Coalescent (GMYC), and Poisson Tree Processes (PTP). We show extensive variation in inferred species boundaries depending on the method and dataset used. Genome‐scale analyses differed substantially between methods, with ASAP predicting the fewest species, PTP intermediate, and GMYC inferring many species. Based on a series of simulations, we identify a tendency of GMYC to overestimate species numbers as alignments increase in length, while the other two methods are not sensitive to this scaling. Gene‐by‐gene analyses show strong differences in predicted species limits, which is unexpected given that all genes are on a single uniparentally inherited chromosome, and highlight that choosing a particular gene as a DNA barcode has significant consequences for species diversity estimates. We show extensive cryptic diversity in the genus Dasyclonium and propose a consensus solution for species limits based on our combined results, enriched with biogeographic and morphological interpretations. Finally, we make recommendations for interpreting the results and improving the inferences drawn from species delimitation methods.



A Practical Comparison of Short‐ and Long‐Read Metabarcoding Sequencing: Challenges and Solutions for Plastid Read Removal and Microbial Community Exploration of Seaweed Samples

June 2025

·

41 Reads

Short‐read metabarcoding analysis is the gold standard for accessing partial 16S and ITS genes with high read quality. With the advent of long‐read sequencing, the amplification of full‐length target genes is possible, but with low read accuracy. Moreover, 16S rRNA gene amplification in seaweed results in a large proportion of plastid reads, which are directly or indirectly derived from cyanobacteria. Primers designed not to amplify plastid sequences are available for short‐read sequencing, while Oxford Nanopore Technology (ONT) offers adaptive sampling, a unique way to remove reads in real time. In this study, we compare three options to address the issue of plastid reads: deleting plastid reads with adaptive sampling, using optimised primers with Illumina MiSeq technology, and sequencing large numbers of reads with Illumina NovaSeq technology with universal primers. We show that adaptive sampling using the default settings of the MinKNOW software was ineffective for plastid depletion. NovaSeq sequencing with universal primers stood out with its deep coverage, low error rate, and ability to include both eukaryotes and bacteria in the same sequencing run, but it had limitations regarding the identification of fungi. The ONT sequencing helped us explore the fungal diversity and allowed for the retrieval of taxonomic information for genera poorly represented in the sequence databases. We also demonstrated with a mock community that the SAMBA workflow provided more accurate taxonomic assignment at the bacterial genus level than the IDTAXA and KRAKEN2 pipelines, but many false positives were generated at the species level.


Quantifying Soil Microbiome Abundance by Metatranscriptomics and Complementary Molecular Techniques—Cross‐Validation and Perspectives

June 2025

·

102 Reads

Linking meta‐omics and biogeochemistry approaches in soils has remained challenging. This study evaluates the use of an internal RNA extraction standard and its potential for making quantitative estimates of a given microbial community size (biomass) in soil metatranscriptomics. We evaluate commonly used laboratory protocols for RNA processing, metatranscriptomic sequencing and quantitative reverse transcription polymerase chain reaction (qRT‐PCR). Metatranscriptomic profiles from soil samples were generated using two library preparation protocols and prepared in triplicate. RNA extracted from pure cultures of Saccharolobus solfataricus was added to the samples as an internal nucleic acid extraction standard (NAEstd). RNA reads originating from NAEstd were identified with 99.9% accuracy. A remarkable replication consistency between triplicates was seen (average Bray–Curtis dissimilarity 0.03 ± 0.02), in addition to a clear library preparation bias. Nevertheless, the between‐sample pattern was not affected by library type. Estimates of 16S rRNA transcript abundance derived from qRT‐PCR experiments, NAEstd and a previously published quantification method of metatranscriptomics (hereafter qMeTra) were compared with microbial biomass carbon (MBC) and nitrogen (MBN) extracts. The derived biomass estimates differed by orders of magnitude. While most estimates were significantly correlated with each other, no correlation was observed between NAEstd and MBC extracts. We discuss how simultaneous changes in community size and the soil's nucleic acid retention strength might hamper accurate biomass estimation. Adding NAEstd has the potential to shed important light on nucleic acid retention in the substance matrix (e.g., soil) during extraction.
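The replicate consistency in this abstract is reported as an average Bray–Curtis dissimilarity. The following is a minimal sketch of how such a value can be computed, assuming each replicate is a plain vector of per-taxon read counts. The function names and the counts are hypothetical illustrations, not the authors' pipeline or data.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both profiles are empty")
    # Sum of absolute differences divided by the total abundance of both samples.
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def mean_pairwise_dissimilarity(replicates):
    """Average Bray-Curtis dissimilarity over all pairs of replicates."""
    pairs = [
        (i, j)
        for i in range(len(replicates))
        for j in range(i + 1, len(replicates))
    ]
    return sum(bray_curtis(replicates[i], replicates[j]) for i, j in pairs) / len(pairs)


# Hypothetical read counts for three taxa in one triplicate (not study data).
triplicate = [
    [120, 30, 50],
    [118, 33, 49],
    [125, 28, 47],
]
print(round(mean_pairwise_dissimilarity(triplicate), 3))
```

A value near 0 means the replicates are almost identical in composition, and a value of 1 means they share no taxa.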


Epigenetic ages of N = 122 post‐fledgling individuals estimated using LOOCV (r = 0.84, MAD = 0.401). Regression line of epigenetic age against chronological age is shown in blue with the standard error in grey. The dotted line represents the regression line if predicted age is identical to chronological age.
Epigenetic ages of N = 67 pre‐fledgling individuals estimated using LOOCV (r = 0.89, MAD = 1.06). Regression line of epigenetic age against chronological age is shown in blue with the standard error in grey. The dotted line represents the regression line if predicted age is identical to chronological age.
Epigenetic ages estimated using LOOCV for post‐fledgling females (A; N = 58) and males (B; N = 64), and for pre‐fledgling females (C; N = 25) and males (D; N = 26). Regression line of epigenetic age against chronological age is shown in blue with the standard error in grey. The dotted line represents the regression line when the epigenetic age is identical to the chronological age.
Mean (± standard error) epigenetic ages (shown in red), estimated using the final pre‐fledgling clock model, of 14‐day old individuals in the control, enlarged and reduced treatment groups of a brood size manipulation experiment. Treatments with different letters are significantly different from each other, tested using the estimated marginal means with Fisher's least significant difference test. Grey dots represent the raw estimated epigenetic ages of each individual.
Independent Avian Epigenetic Clocks for Ageing and Development

June 2025

·

31 Reads

Information on individual age is a fundamental aspect in many ecological and evolutionary studies. However, accurate and non‐lethal methods that can be applied to estimate the age of wild animals are often absent. Furthermore, since the process of ageing is accompanied by a physical decline and the deterioration of biological functions, the biological age often deviates from the chronological age. Epigenetic marks are widely suggested to be associated with this age‐related physical decline, and especially changes in DNA methylation are suggested to be reliable age‐predictive biomarkers. Here, we developed separate epigenetic clocks for ageing and development in a small passerine bird, the great tit (Parus major). The ageing clock was constructed and evaluated using erythrocyte DNA methylation data of 122 post‐fledging individuals, and the developmental clock using 67 pre‐fledging individuals from a wild population. Using a leave‐one‐out cross‐validation approach, we were able to accurately predict the ages of individuals with median absolute deviations of 0.40 years for the ageing and 1.06 days for the development clock. Moreover, using existing data from a brood‐size manipulation, we show that nestlings from reduced broods are estimated to be biologically older compared to control nestlings, while they are expected to have higher fitness. These epigenetic clocks provide further evidence that, as observed in mammals, changes in DNA methylation of certain CpG sites are highly correlated with chronological age in birds and this opens up new avenues for broad applications in behavioural and evolutionary ecology.
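The leave‐one‐out cross‐validation (LOOCV) and median absolute deviation (MAD) mentioned in this abstract can be illustrated with a small sketch. It assumes a single methylation predictor and an ordinary least squares line as a stand-in model; published epigenetic clocks typically use many CpG sites and a penalised regression, and the abstract does not specify the model. MAD is taken here to mean the median of the absolute differences between predicted and chronological age, which is also an assumption. All names and values are hypothetical.

```python
from statistics import median


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def loocv_predictions(methylation, ages):
    """Predict each individual's age from a model fitted to all other individuals."""
    predictions = []
    for i in range(len(ages)):
        # Hold out individual i and train on the remaining individuals.
        train_x = methylation[:i] + methylation[i + 1:]
        train_y = ages[:i] + ages[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        predictions.append(intercept + slope * methylation[i])
    return predictions


def median_absolute_deviation(predicted, observed):
    """Median of |predicted - observed| (assumed meaning of MAD here)."""
    return median(abs(p - o) for p, o in zip(predicted, observed))


# Hypothetical methylation fractions and chronological ages in years (not study data).
methylation = [0.12, 0.18, 0.33, 0.41, 0.52, 0.58]
ages = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predicted = loocv_predictions(methylation, ages)
print(median_absolute_deviation(predicted, ages))
```

Because each prediction comes from a model that never saw the held-out individual, the resulting MAD estimates out-of-sample error rather than goodness of fit.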


The Impact of Whole‐Animal Fluid Preservation on the Observed Gut Microbiome of Vertebrates: Implications for the Use of Museum Specimens in Microbiome Research

May 2025

·

39 Reads

The vertebrate gut houses diverse microbial communities that provide insights into their host's ecological and evolutionary histories. Nevertheless, microbiome research has not been distributed equally across host taxonomy, geography and timescales. The millions of fluid‐preserved specimens stored in natural history museums worldwide represent a potentially untapped resource for microbiome information. However, it is unknown how fluid preservation and long‐term storage change the composition and diversity of the original microbial community across a variety of host taxa. Here, we present the largest study to date aimed at addressing this question. Specifically, we identified an optimal method for extracting DNA from preserved samples using commercially available kits. Next, for 11 host species representing four vertebrate classes, we compared the gut microbiomes between animals dissected freshly and those collected simultaneously but subsequently fixed in formalin and stored in 70% ethanol for 1 year, similar to museum conditions. In a secondary analysis in amphibians, we compared our collected samples with those from decades‐old historical museum specimens. We found that while fluid preservation altered the community composition and reduced the diversity of the recovered microbiome inventories, host species identity predominated in shaping the gut microbiome, and differences across species and geographic localities were retained after preservation. Historical specimens had microbiomes that were the most different from fresh specimens, suggesting that over time, changes in the microbiome of populations have occurred, or preservation effects have compounded. Considering these findings, we discuss the potential for use of fluid‐preserved museum specimens in future microbiome studies.


Analysis of DNA extracted from herbarium samples highlights typical signatures of ancient DNA and suggests plant/soil and species‐dependent effects on patterns of degradation. (A) An illustration of the herbarium plants with associated soil used for the experiments; numbers in parenthesis refer to their ‘RecolNat’ accession numbers (https://explore.recolnat.org/search/botanique/type=index). (B) Length distribution and deamination profiles of DNA fragments (merged reads) extracted from Triticum durum roots (blue curves) and associated soil (in red) affiliated to T. durum, Streptomyces and Nocardioides. Deamination profiles are illustrated for the first 25 nucleotides at the 5′ end (G to A transitions) and 3′ one (C to T transitions). (C) Distribution (box plots) of the average lengths of the sequenced DNA fragments extracted from 39 leaf (green), root (blue) and soil‐associated (red) herbarium specimens. Each box plot refers to the fragments assigned to either a plant genome (4 species) or one of the 12 most abundant bacterial genera. Different letters below the plots indicate statistically supported (p < 0.001) differences between taxa, while asterisks indicate, for each of the taxa, significant (p < 0.05) differences in the average length of the fragments extracted from soil and plant tissues.
Herbarium‐associated soil and root microbial communities cluster with extant soil communities. Herbarium soil and root microbial communities (empty triangles) cluster close to extant soil communities (pink dots) in a global analysis of microbial communities sampled in highly contrasted environments worldwide (dots). Principal Coordinates Analysis (PCoA) ordination is based on Bray–Curtis indices computed using the Qiita management platform. Note that two of the three herbarium samples located under the 0.0 value of the PCoA2 axis are two of the three communities from leaves (green triangles).
Taxonomic (alpha) diversity of herbarium‐associated soil and root microbial communities. (A) Taxonomic composition (phylum level) of eukaryotic (top) and bacterial (bottom) microbial communities associated with the roots or the soils of the four studied plant species. Each bar represents the mean value of the different replicates of samples collected in Saint Cloud garden in Paris (excluding samples collected from other collection sites). Phyla represented by < 1% of the total number of annotated sequences were pooled in a single category (Others). (B) Distribution of Shannon biodiversity indices (calculated at the genus level) of root and soil eukaryotic and bacterial communities. Dotted lines connect the values of the root and soil communities of each individual herbarium plant sample. Lactuca herbarium plant samples from which no soil samples could be collected were excluded from this analysis.
Sampling collection site and sample type (soil vs. root) affect herbarium‐associated communities at both the phylum and genus levels. (A) At the phylum level, non‐metric multidimensional scaling (NMDS, k = 3, stress < 0.05) ordination based on Bray–Curtis distances illustrates the distribution of microbial communities according to collection site and sample type (red: soil; blue: root; filled dots: garden of Saint‐Cloud, empty triangles: other sites). Explanatory variables (phyla represented by more than 1% of the reads assigned) that significantly (p < 0.05) contributed to the separation of the communities along the two axes are indicated. The lower bar gives the percentage of total variance explained by each variable. (B) and (C) Principal Coordinates Analysis ordinations based on Bray–Curtis distances illustrate the distribution of eukaryotic (B) and bacterial (C) communities according to collection site and sample type. Venn diagrams give the percentage of total variance explained by each of the variables and their interaction (only explained variance values > 0.01 are reported).
Bacterial and eukaryotic genera that showed differences in abundance between herbarium soil and root samples. Analysis using the DeSeq software allowed identifying taxa that displayed a difference in read abundance between the global root and soil sequence datasets above a threshold of 1.5/−1 for Bacteria and 0.9/−1 for Eukarya (Log2 scale). (A) Differentially abundant bacterial taxa (p < 0.001). Taxonomic annotation (phylum level) highlights differences between taxa displaying a higher abundance in soil (dominated by Planctomycetes) and those more abundant in roots (dominated by Proteobacteria and Actinomycetes). (B) Differentially abundant eukaryotic taxa (p < 0.05). Functional annotation highlights differences between taxa displaying a higher abundance in soil (unicellular protists and fungal saprotrophs) and those more abundant in roots (dominated by plant pathogens and symbionts). Since reads assigned to eukarya were far less abundant compared to those attributed to bacteria, the analysis was restricted to taxa with more than 50,000 and 5000 reads in the global dataset for bacteria and eukarya, respectively.
Ancient Microbiomes as Mirrored by DNA Extracted From Century‐Old Herbarium Plants and Associated Soil

May 2025

·

56 Reads

Numerous specimens stored in natural history collections have been involuntarily preserved together with their associated microbiomes. We propose exploiting century‐old soils occasionally found on the roots of herbarium plants to assess the diversity of ancient soil microbial communities originally associated with these plants. We extracted total DNA and sequenced libraries produced from rhizospheric soils and roots of four plants preserved in herbaria for more than 120 years in order to characterise the preservation and taxonomic diversity that can be recovered in such contexts. Extracted DNA displayed typical features of ancient DNA, with cytosine deamination at the ends of fragments predominantly shorter than 50 bp. When compared to extant microbiomes, herbarium microbial communities clustered with soil communities and were distinct from communities from other environments. Herbarium communities also displayed biodiversity features and assembly rules typical of soil and plant‐associated ones. Soil communities were richer than root‐associated ones with which they shared most taxa. Regarding community turnover, we detected collection site, soil versus root and plant species effects. Eukaryotic taxa that displayed a higher abundance in roots were mostly plant pathogens that were not identified among soil‐enriched ones. Conservation of these biodiversity features and assembly rules in herbarium‐associated microbial communities indicates that herbarium‐extracted DNA might reflect the composition of the original plant‐associated microbial communities and that preservation in herbaria seemingly did not dramatically alter these characteristics. Using this approach, it should be possible to investigate historical soils and herbarium plant roots to explore the diversity and temporal dynamics of soil microbial communities.



An Accessible Metagenomic Strategy Allows for Better Characterisation of Invertebrate Bulk Samples

DNA‐based techniques are a popular approach for assessing biodiversity in ecological research, especially for organisms which are difficult to detect or identify morphologically. Metabarcoding, the most established method for determining species composition and relative abundance in bulk samples, can be more sensitive and time‐ and cost‐effective than traditional morphological approaches. However, one drawback of this method is PCR bias caused by between‐species variation in the amplification efficiency of a marker gene. Metagenomics, bypassing PCR amplification, has been proposed as an alternative to overcome this bias. Several studies have already shown the promising potential of metagenomics, but they all indicate the unavailability of reference genomes for most species in any ecosystem as one of the primary bottlenecks preventing its wider implementation. In this study, we present a strategy that combines unassembled reads of low‐coverage whole genome sequencing and publicly available reference genomes to construct a genomic reference database, thus circumventing high sequencing costs and intensive bioinformatic processing. We show that this approach is superior to metabarcoding for approximating relative biomass of macrobenthos species from bulk samples. Furthermore, these results can be obtained with a sequencing effort comparable to metabarcoding. The strategy presented here can thus accelerate the implementation of metagenomics in biodiversity assessments, as it should be relatively easy to adopt by laboratories familiar with metabarcoding and can be used as an accessible alternative.


FIGURE 4 | Comparison of total and viral reads from different samples using five nucleic acid extraction methods (B2, CELL, DRB4, SPEC and CTAB): The bar charts represent the total number of reads (red) and viral reads (green) for each sample, with the viral read percentages indicated by a dashed line. The box plots (red boxes edged in aqua blue for total reads and green boxes edged in red for viral reads) illustrate the distribution of total and viral reads from all extraction methods, emphasising the differences in viral read recovery between techniques. The lower boundary of each box represents the 25th percentile, the line inside the box indicates the median, and the upper boundary represents the 75th percentile. Whiskers (aqua blue for total reads and yellow for viral reads) extend above and below the box, marking the 90th and 10th percentiles. The yellow dashed line indicates the mean for viral reads, while outliers are shown as yellow circles for viral reads and grey squares for total reads.
FIGURE 5 | Virome composition of the 11 samples used to assess the efficiency of the five nucleic acid extraction methods compared in this study. The graph represents a total of 10 distinct viral species that were detected: Grapevine rupestris stem pitting-associated virus (GRSPV), Grapevine leafroll-associated virus 2 (GLRaV-2), Grapevine leafroll-associated virus 3 (GLRaV-3), Grapevine virus H (GVH), Grapevine virus B (GVB), Grapevine virus E (GVE), Grapevine Pinot Gris virus (GPGV), Grapevine red blotch virus (GRBV), Grapevine Red Globe virus (GRGV) and Nepovirus lycopersici (Tomato ringspot virus, ToRSV). Note that the virus icons in the figure are symbolic and do not reflect the actual morphology of the virions.
FIGURE 6 | Heatmap showing the abundance of each virus in five extraction methods (B2-based, CELL, DRB4, CTAB and SPEC). In this figure, "Expected" refers to the true detection of a virus, based on the criteria outlined in the main text and Figure 2; B2 represents the B2-based dsRNA extraction method; CELL, the cellulose-based dsRNA extraction method; DRB4 method, the DRB4-based extraction method (commercial kit, MBL); CTAB, the cetyltrimethylammonium bromide-based RNA extraction method; and SPEC, the total RNA extraction method using the Spectrum Kit. Viral abundance is expressed as a percentage of transcripts per kilobase million (TPM).
FIGURE 7 | Graphical representation of the results of the receiver operating characteristic (ROC) analysis. Specificity refers to the proportion of healthy plants correctly identified as virus-free, while sensitivity reflects the proportion of virus-infected plants correctly identified. The area under the ROC curve (AUC) represents overall accuracy. TP, TN, FP and FN denote true positive, true negative, false positive and false negative events, respectively. Each detection event was assigned a random unique value for plotting in the coordinate system. B2 refers to the B2-based dsRNA extraction method; CELL, the cellulose-based method; DRB4 method, the DRB4-based commercial kit; CTAB, the cetyltrimethylammonium bromide-based RNA extraction method; and SPEC, total RNA extraction using the Spectrum Kit.
Grapevine samples, origin and varieties used in this study.
An Innovative Binding-Protein-Based dsRNA Extraction Method: Comparison of Cost-Effectiveness of Virus Detection Methods Using High-Throughput Sequencing

May 2025

·

23 Reads

Viral diseases represent a threat to global food production. Managing the impact of viruses on crop production requires the ability to monitor viruses, study their ecology and anticipate outbreaks. Double‐stranded RNA (dsRNA) sequencing is a well‐established and reliable method of detecting viruses and studying virome‐host interactions and ecology. Compared to total RNA extraction, dsRNA extraction eliminates the majority of host RNAs, improving the recovery of viral RNAs. In this study, we developed and evaluated a novel dsRNA extraction method for high‐throughput sequencing (HTS) applications based on the Flock House virus (FHV) B2 protein (B2‐based method), and compared its performance with that of established cellulose‐based and DRB4‐based methods (commercial kit), as well as total RNA extraction techniques. The electrostatic properties of B2 have been instrumental in developing a bead‐free and resin‐free dsRNA extraction method. The B2‐based method demonstrated high viral read recovery, achieving proportions exceeding 20% in most samples, and provided better dsRNA purity, with less co‐extracted low‐molecular‐weight RNA, than the DRB4‐based and cellulose‐based methods. Despite producing overall fewer total reads than the DRB4‐based method, the B2‐based enrichment for viral‐derived dsRNA was better, with a higher percentage of viral reads, making it effective in virome profiling. Furthermore, it had an excellent detection specificity (0.97) and a good detection sensitivity (0.71), minimising false positives and false negatives. In addition, the B2‐based method proved to be highly cost‐effective, with a per‐reaction cost of $4.47, compared to $35.34 for the DRB4‐based method. This method offers a practical solution for laboratories with limited resources or for large‐scale sampling for viral ecology studies.
Future improvements to the B2‐based method should focus on optimising sensitivity to Vitivirus species and developing scalable, automated workflows for high‐throughput viral detection.
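
The specificity and sensitivity values quoted in this abstract follow the standard confusion-matrix definitions given in the Figure 7 caption. As a minimal sketch of that arithmetic, the Python below computes both from counts of detection events. The function names and the counts are hypothetical, chosen only for illustration, and are not taken from the study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of virus-infected samples correctly identified: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of virus-free samples correctly identified: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts for illustration only (not data from the study).
tp, fn, tn, fp = 8, 2, 9, 1

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

A method with few false negatives scores high on sensitivity, while one with few false positives scores high on specificity; the two are computed from disjoint sets of samples, so they can differ widely for the same method.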


FIGURE 6 | Differences in treatment on the proportion of reads for Actinopterygii species amplified with MiFishU, where treatments are control (no BSA, normal cycling conditions); +BSA (addition of BSA, normal cycling conditions); and TD-PCR (no BSA, TD PCR) for each Taq polymerase. (A) Mean proportion of reads (from triplicate samples) for each treatment, for each Taq polymerase used (IPSF = Invitrogen Platinum SuperFi; NPHF = NEB Phusion HiFi; PGTF = Promega GoTaq Flexi; QMMM = Qiagen Multiplex Master Mix). (B) Fitted parameters from zoid by treatment, with the intercept term given for the control treatment with NPHF in relation to Trachurus symmetricus in the first panel; and the change in that term given for each treatment in the next panels (whiskers show 95% posterior credibility interval for each effect; values of effect sizes on y-axis are in additive log-ratio space relative to the reference condition), with colours denoting the total number of mismatches to the forward and reverse primers (summed). Species are as follows: Clupea pallasii (Cp), Diogenichthys atlanticus (Da), Engraulis mordax (Em), Hippoglossus stenolepis (Hs), Leuroglossus stilbius (Ls), Merluccius productus (Mp), Oncorhynchus nerka (On), Oncorhynchus tshawytscha (Ot), Sardinops sagax (Ss), Thaleichthys pacificus (Tp), and Trachurus symmetricus (Ts).
Observation Bias in Metabarcoding

May 2025

·

64 Reads

DNA metabarcoding is subject to observation bias associated with PCR and sequencing, which can result in observed read proportions differing from actual species proportions in the DNA extract. Here, we amplify and sequence a mock community of known composition containing marine fishes and cetaceans using four different primer sets and a variety of PCR conditions. We first compare metabarcoding observations to two different sets of expected species proportions based on total genomic DNA and on target mitochondrial template DNA. We find that calibrating observed read proportions based on template DNA concentration is most appropriate as it isolates PCR amplification bias; calibration with total genomic DNA results in bias that can be attributed to both PCR amplification bias and differing ratios of template to total genomic DNA. We then model the remaining amplification bias and find that approximately 60% can be explained by inherent species‐specific DNA characteristics. These include primer‐template mismatches, amplicon fragment length, and GC content, which vary somewhat across Taq polymerases. Finally, we investigate how different PCR protocols influence community composition regardless of expected proportions and find that changing protocols most strongly influences the amplification of templates with primer mismatches. Our findings suggest that using primer‐template pairs without mismatches and targeting a narrow taxonomic group can yield more repeatable and accurate estimates of species' true, underlying DNA template proportions. These findings identify key factors that should be considered when designing studies that aim to apply metabarcoding data quantitatively.


FIGURE 3 | Predictive accuracy of polar bear epigenetic clocks trained with varying levels of class overlap with the testing data, measured by median absolute error (MAE, blue) of epigenetic age relative to chronological age and the R-squared (orange) of the linear relationship between epigenetic and chronological age. Brighter orange and blue boxes indicate more accurate clocks and darker-shaded boxes are less accurate. For each overlap proportion, we fit 100 clocks with new training and testing samples, and the resulting accuracy metrics are displayed as boxplots showing the median, interquartile range and outliers. (A) predicts epigenetic age in 30 samples from two western-Arctic subpopulations (Southern and Northern Beaufort) using 75 samples from the same populations and a genetically distinct central-Arctic subpopulation (Western Hudson Bay), with overlap proportions ranging from genetically identical (0) to entirely distinct (1). (B) predicts epigenetic age in 30 male samples using 60 samples ranging from entirely female (overlap = 0) to entirely male (overlap = 1), with equal numbers from each subpopulation. (C) predicts epigenetic age in 75 muscle samples from seven subpopulations across the Canadian Arctic, using 100 samples ranging from only muscle (overlap = 1) to blood and skin (overlap = 0). (D) predicts epigenetic age in 30 mature bears (> 5 years) using 45 samples ranging from entirely mature (overlap = 1) to entirely immature (< 5 years, overlap = 0), with equal representation from each subpopulation. The plots indicate that clock performance is most affected by biased tissue types and age groups in the training data and that these biases have a greater impact on the deviation of epigenetic age from chronological age than on the linear relationship between epigenetic and chronological age.
Designing Epigenetic Clocks for Wildlife Research

May 2025

·

38 Reads

·

1 Citation

The applications of epigenetic clocks – statistical models that predict an individual's age based on DNA methylation patterns – are expanding in wildlife conservation and management. This growing interest highlights the need for field‐specific design best practices. Here, we provide recommendations for two main applications of wildlife epigenetic clocks: estimating the unknown ages of individuals and assessing their biological ageing rates. Epigenetic clocks were originally developed to measure biological ageing rates of human tissues, which presents challenges for their adoption in wildlife research. Most notably, the estimated chronological ages of sampled wildlife can be unreliable, and sampling restrictions limit the number and variety of tissues with which epigenetic clocks can be constructed, reducing their accuracy. To address these challenges, we present a detailed workflow for designing, validating and applying accurate wildlife epigenetic clocks. Using simulations and analyses applied to an extensive polar bear dataset from across the Canadian Arctic, we demonstrate that accurate epigenetic clocks for wildlife can be constructed and validated using a limited number of samples, accommodating projects with small budgets and sampling constraints. The concerns we address are critical for clock design, whether researchers or third‐party service providers perform the bioinformatics. With our workflow and examples, we hope to support the accessible and widespread use of epigenetic clocks in wildlife conservation and management.


Unrecognised DNA Degradation in Flash-Frozen Genetic Samples in Natural History Collections

May 2025

·

99 Reads

Optimal preservation of tissues from the field to long‐term cryo‐storage is paramount to securing genetic resources for research needs. DNA preservation techniques vary, with flash freezing currently considered the gold standard in tissue preservation. However, flash freezing tissue samples in the field presents challenges, necessitating a more comprehensive understanding of the quantity and quality of preserved DNA from different techniques in archival collections. We compared metrics from DNA extractions from field‐collected amphibian, squamate and bird tissues from archival collections that were flash‐frozen in liquid nitrogen or fixed in either ethanol or tissue lysis buffer prior to archival cryopreservation. We also included DNA extracted from tissues of known liquid nitrogen tank failures to provide a baseline of DNA degradation under the very worst‐case scenario. Flash‐frozen tissues often preserved higher yields of DNA, but peak fragment size, the percentage of fragments larger than 10 kb and DNA integrity numbers were all significantly reduced compared to tissues first preserved in fixative buffers. This pattern was observed across independent samples and between flash‐frozen and buffer‐preserved paired replicates. Degradation seen in flash‐frozen tissues was also distinct from that seen in tissues from known tank failures. We suggest that degradation in flash‐frozen tissues occurred during shipping, sample sorting/accession or during subsequent subsampling when tissues may partially or fully thaw, exposing DNA to damaging freeze–thaw processes. By contrast, tissues in fixative buffers were likely protected from freeze–thaw damage. This study highlights that using multiple field preservation methods and minimising freeze–thaw cycles for flash‐frozen tissues may provide the most robust protection against the DNA degradation sources encountered by field collections.


Towards Large-Scale Museomics Projects: A Cost-Effective and High-Throughput Extraction Method for Obtaining Historical DNA From Museum Insect Specimens

May 2025

·

51 Reads

Natural history collections serve as invaluable repositories of biodiversity data. Large‐scale genomic analysis would greatly expand the utility and accessibility of museum collections, but the high cost and time‐intensive nature of genomic methods limit such projects, particularly for invertebrate specimens. This paper presents an innovative, cost‐effective and high‐throughput approach for extracting genomic DNA from diverse insect specimens using solid‐phase reversible immobilisation (SPRI) beads. We optimised PEG‐8000 and NaCl concentrations to balance DNA yield and purity, reducing reagent cost to 4.0–11.6¢ per sample, depending on sample type. Our method was validated against three widely used extraction protocols and showed comparable DNA yield and amplification success to the widely used Qiagen DNeasy kit. We successfully applied the protocol in a high‐throughput manner, extracting DNA from 3786 insect specimens across a broad range of ages, taxonomies and tissue types. A detailed protocol and instructional video are provided to facilitate the adoption of the method by other researchers. By improving one of the most crucial steps in any molecular project, this SPRI bead‐based DNA extraction approach has significant potential for enabling large‐scale museomics projects, thereby increasing the utility of historical collections for biodiversity research and conservation efforts.


Beyond Presence and Absence: Using eDNA and Microsatellite Genotyping to Estimate Densities of Microscopic Life Forms in Wild Populations

May 2025

·

34 Reads

Many challenges arise when monitoring organisms with cryptic life‐histories. For example, some cryptic life‐stages are hard to identify or sample due to their microscopic nature, which creates unknowns surrounding an organism's population dynamics. Environmental DNA (eDNA) is a non‐invasive sampling technique used to monitor cryptic species when traditional survey methods are challenging. Generally, eDNA has been used to quantify the presence/absence of species in various habitats. However, recent advances in high‐throughput amplicon sequencing techniques have enabled researchers to detect intraspecific genetic diversity with eDNA. In this study, we present two complementary R packages that can be used to estimate the number of individuals in an eDNA sample. The first package (Amplicomsat) cleans high‐throughput amplicon microsatellite sequences and counts the observed alleles identified in eDNA. Our second package (GenotypeQuant) then uses a numerical maximum likelihood estimator (NMLE) to estimate the number of contributors most likely to have produced the sequenced panel of microsatellite alleles amplified from eDNA. We first present simulations to characterise the accuracy and precision of the method. We then estimated densities of Nereocystis luetkeana (bull kelp) microscopic gametophytes from eDNA collected from an experiment with a manipulated number of gametophytes. Finally, we analysed benthic eDNA from kelp forest habitats. We found that gametophyte estimates produced by the NMLE varied within +3/−2 individuals when processing eDNA from rocks with 8 seeded gametophytes. We estimated densities of 500 to 800 gametophytes·m⁻² in July, five or more months after spore germination and before the current year's spore release. Gametophyte abundance scaled with the sampling area and numbers were higher than total sporophyte densities.


EukFunc: A Holistic Eukaryotic Functional Reference for Automated Profiling of Soil Eukaryotes

April 2025

·

219 Reads

The soil eukaryome constitutes a significant portion of Earth's biodiversity that drives major ecosystem functions, such as controlling carbon fluxes and plant performance. Currently, however, we lack a standardised approach to functionally classify the soil eukaryome in a holistic way. Here we compiled EukFunc, the first functional reference database that characterises the most abundant and functionally important soil eukaryotic groups: fungi, nematodes and protists. We classified the 14,060 species in the database based on their mode of nutrient acquisition into the main functional classes of symbiotroph (40%), saprotroph (26%), phototroph (17%), predator (16%) and unknown (2%). EukFunc provides further detailed information about nutrition mode, including a secondary functional class (i.e., for organisms with multiple nutrition modes), and preyed or associated organisms for predatory or symbiotic taxa, respectively. EukFunc is available in multiple formats for user‐friendly functional analyses of specific taxa or annotations of metabarcoding datasets, both embedded in the R package EukFunc. Using a soil dataset from alpine and subalpine meadows, we highlighted the extended ecological insights obtained from combining functional information across the entire soil eukaryome as compared to focusing on fungi, protists or nematodes individually. EukFunc streamlines the annotation process, enhances efficiency and accuracy, and facilitates the investigation of the functional roles of soil eukaryotes, a prerequisite to better understanding soil systems.


Performance of DNA Metabarcoding vs. Morphological Methods for Assessing Intertidal Turf and Foliose Algae Diversity

April 2025

·

45 Reads

Large biogeographical shifts in marine communities are taking place in response to climate change and biological invasions, yet we still lack a full understanding of their diversity and distribution. An important example of this is turf and foliose algae that are key coastal primary producers in several regions and are expanding into new environments. Traditionally, monitoring turf and foliose algae communities involves species identification based on morphological traits, which is challenging due to their reduced dimensions and highly variable morphology. Molecular methods promise to revolutionise this field, but their effectiveness in detecting turf and foliose algae has yet to be tested. Here, we evaluate the performance of DNA metabarcoding (COI and rbcL markers) and morphological identification (in situ and photoquadrat) to describe intertidal turf and foliose algae communities along the Portuguese coast. Both molecular markers detected more taxa than the morphological methods and showed greater discrimination of turf and foliose algae communities between regions, matching our knowledge of the geographical and climatic patterns for the region. In sum, our multi‐marker metabarcoding approach was more efficient than morphology‐based methods in characterising turf and foliose algae communities along the Portuguese coast, differentiating morphologically similar species, and detecting unicellular organisms. However, certain taxa that were identified by in situ and photoquadrat approaches were not detected through metabarcoding, partly due to lack of reference barcodes or taxonomic resolution. Metabarcoding emerges as a valuable tool for monitoring these communities, particularly in long‐term programmes requiring accuracy, speed, and reproducibility.


A Comprehensive Evaluation of Taxonomic Classifiers in Marine Vertebrate eDNA Studies

April 2025

·

25 Reads

Environmental DNA (eDNA) metabarcoding is a widely used tool for surveying marine vertebrate biodiversity. To this end, many computational tools have been released and a plethora of bioinformatic approaches are used for eDNA‐based community composition analysis. Simulation studies and careful evaluation of taxonomic classifiers are essential to establish reliable benchmarks to improve the accuracy and reproducibility of eDNA‐based findings. Here we present a comprehensive evaluation of nine taxonomic classifiers exploring three widely used mitochondrial markers (12S rDNA, 16S rDNA and COI) in Australian marine vertebrates. Curated reference databases and exclusion database tests were used to simulate diverse species compositions, including three positive control and two negative control datasets. Using these simulated datasets ranging from 36 to 302 marker genes, we were able to identify between 19% and 89% of marine vertebrate species using mitochondrial markers. We show that MMSeqs2 and Metabuli generally outperform BLAST with 10% and 11% higher F1 scores for 12S and 16S rDNA markers, respectively, and that Naive Bayes Classifiers such as Mothur outperform sequence‐based classifiers except MMSeqs2 for COI markers by 11%. Database exclusion tests reveal that MMSeqs2 and BLAST are less susceptible to false positives compared to Kraken2 with default parameters. Based on these findings, we recommend that MMSeqs2 is used for taxonomic classification of marine vertebrates given its ability to improve species‐level assignments while reducing the number of false positives. Our work contributes to the establishment of best practices in eDNA‐based biodiversity analysis to ultimately increase the reliability of this monitoring tool in the context of marine vertebrate conservation.
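
The classifier comparison above is reported in terms of F1 scores. For readers unfamiliar with the metric, here is a minimal sketch of how an F1 score is derived from species-level true positives, false positives and false negatives. The function name and the counts are hypothetical and are not taken from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 8 species correctly assigned, 2 spurious
# assignments, 2 species present in the sample but missed.
print(f"F1 = {f1_score(tp=8, fp=2, fn=2):.2f}")
```

Because F1 penalises both spurious assignments and missed species, a classifier cannot score well simply by being very conservative or very permissive.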


Optimization and Evaluation of the bestRAD Sequencing Approach: Towards Ascertainment of the Invasion Routes of the Oriental Fruit Fly, Bactrocera dorsalis

April 2025

·

17 Reads

The bestRAD technique is a reduced genome representation approach with high‐capacity sample multiplexing and physical isolation of biotin‐labelled target DNA fragments using streptavidin beads, which should reduce total cost and genotyping errors. While we here formalise the relevance of this approach within the HTS landscape, our foremost aim was to improve its replicability, validity, and transparency. We first optimised the molecular laboratory protocol and shared the associated protocols (e.g., final detailed methodologies, quality control, best practices) under the FAIR principles. Using 84 worldwide individual samples of the Oriental fruit fly, Bactrocera dorsalis, a major invasive pest, we revealed a low rate of PCR duplicates, robustness to DNA quality and quantity, high genotype call rate, insignificant genotyping error rate, high nuclear and mitochondrial genome representativeness, and a high level of genetic information. This in‐depth data quality assessment, along with total cost and handling time reduced by an estimated one‐third relative to the parent RAD‐Seq version, demonstrates that bestRAD is an excellent compromise between cost and quality. While we generated high‐quality genomic resources for B. dorsalis, we also share details and recommendations for the bestRAD technique that can be readily used in any laboratory and applied to all organisms, even without published genome sequence.


FIGURE 3 | Assay validation using soil samples from 'La petite ferme de Portet', Toulouse, France. (a) Schematic design of animal arrangements on site. The Soil1 sample was taken from the area mainly grouping sheep and goats, versus horses, donkeys and cattle for the Soil2 sample. Soil6 was taken from near the fences. (b) ML phylogenetic reconstruction of cattle mitochondrial variation, including the mitogenome recovered from the Soil2 sample, which clusters within haplogroup T, especially in sub-haplogroup T3, which is the most common haplogroup of domestic cattle in Europe (each coloured bar indicates an individual haplogroup).
FIGURE 4 | Assay validation using cave hyena coprolite samples from Cassenade cave, France. (a) Maximum likelihood (ML) phylogenetic reconstruction of Bison and Bos mitochondrial variation, including the mitochondrial sequence data retrieved from 13 Cassenade coprolites. (b) ML phylogenetic reconstruction of Crocuta crocuta mitochondrial variation, using Proteles cristata as an outgroup, and including the mitochondrial sequence data retrieved from 19 Cassenade coprolites (each coloured bar indicates an individual haplogroup).
Validating a Target-Enrichment Design for Capturing Uniparental Haplotypes in Ancient Domesticated Animals

April 2025

·

319 Reads

In the last three decades, DNA sequencing of ancient animal osteological assemblages has become an important tool complementing standard archaeozoological approaches to reconstruct the history of animal domestication. However, osteological assemblages of key archaeological contexts are not always available or do not necessarily preserve enough ancient DNA for a cost‐effective genetic analysis. Here, we develop an in‐solution target‐enrichment approach, based on 80‐mer species‐specific RNA probes (ranging from 306 to 1686 per species) to characterise (in single experiments) the mitochondrial genetic variation from eight domesticated animal species of major economic interest: cattle, chickens, dogs, donkeys, goats, horses, pigs and sheep. We also illustrate how our design can be adapted to enrich DNA library content and map the Y‐chromosomal diversity within Equus caballus. By applying our target‐enrichment assay to an extensive panel of ancient osteological remains, farm soil, and cave sediments spanning the last 43 kyrs, we demonstrate that minimal sequencing efforts are necessary to exhaust the DNA library complexity and to characterise mitogenomes to an average depth‐of‐coverage of 19.4 to 2003.7‐fold. Our assay further retrieved horse mitogenome and Y‐chromosome data from Late Pleistocene coprolites, as well as bona fide mitochondrial sequences from species that were not part of the probe design, such as bison and cave hyena. Our methodology will prove especially useful to minimise costs related to the genetic analyses of maternal and paternal lineages of a wide range of domesticated and wild animal species, and for mapping their diversity changes over space and time, including from environmental samples.


Leveraging Synteny to Generate Reference Genomes for Conservation: Assembling the Genomes of Hector's and Māui Dolphins

April 2025

·

58 Reads

Escalating concern regarding the impacts of reduced genetic diversity on the conservation of endangered species has spurred efforts to obtain chromosome‐level genomes through consortia such as the Vertebrate Genomes Project. However, assembling reference genomes for many threatened species remains challenging due to difficulties obtaining optimal input samples (e.g., fresh tissue, cell lines) that can characterise long‐term conservation collections. Here, we present a pipeline that leverages genome synteny to construct high‐quality genomes for species of conservation concern despite less‐than‐optimal samples and/or sequencing data, demonstrating its use on Hector's and Māui dolphins. These endemic New Zealand dolphins are threatened by human activities due to their coastal habitat and small population sizes. Hector's dolphins are classified as endangered by the IUCN, while the Māui dolphin is among the most critically endangered marine mammals. To assemble reference genomes for these dolphins, we created a pipeline combining de novo assembly tools with reference‐guided techniques, utilising chromosome‐level genomes of closely related species. The pipeline assembled highly contiguous chromosome‐level genomes (scaffold N50: 110 Mb, scaffold L50: 9, miniBUSCO completeness scores > 96.35%), despite non‐optimal input tissue samples. We demonstrate that these genomes can provide insights relevant for conservation, including historical demography revealing long‐term small population sizes, with subspecies divergence occurring ~20 kya, potentially linked to the Last Glacial Maximum. Māui dolphin heterozygosity was 40% lower than Hector's and comparable to other cetacean species noted for reduced genetic diversity. Through these exemplar genomes, we demonstrate that our pipeline can provide high‐quality genomic resources to facilitate ongoing conservation genomics research.
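
The contiguity figures quoted above (scaffold N50 and L50) are standard assembly statistics. As a minimal sketch of how they are usually computed, the Python below sorts scaffold lengths from longest to shortest and walks down the list until half of the total assembly length is covered. The function name and the example lengths are hypothetical and unrelated to the dolphin assemblies.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of scaffold lengths.

    N50 is the length of the scaffold at which the running total,
    taken from longest to shortest, first reaches half of the total
    assembly length. L50 is the number of scaffolds needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("lengths must contain at least one scaffold")


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
example = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(example)
print(f"N50 = {n50}, L50 = {l50}")
```

A large N50 together with a small L50 indicates that most of the assembly sits in a few long scaffolds, which is what is expected of a chromosome-level genome.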


CCS-Consensuser: A Haplotype-Aware Consensus Generator for PacBio Amplicon Sequences

April 2025

·

29 Reads

DNA sequencing technology has undergone substantial improvements in recent years, to the extent that Third Generation Sequencing platforms are capable of massively generating long reads. Amplicon sequencing has been among the most popular techniques due to its wide application in diverse fields of biological sciences. However, there is a lack of software specifically designed to analyse intra‐individual genetic variation using amplicon long‐read data. Here, we present CCS‐consensuser, an end‐to‐end pipeline that generates consensus sequences from amplicon sequencing using high‐fidelity reads produced by PacBio circular consensus sequencing (CCS). We evaluated the concordance of the results produced using CCS + CCS‐consensuser and other sequencing platforms (Illumina and Sanger), as well as accuracy using a simulated dataset. This assessment showed that CCS amplicon data coupled with CCS‐consensuser can produce high‐quality sequences (PHRED > 30). The pipeline resulted in high proportions of identical sequence bins for real data, achieving up to 94.94% concordance with COI Sanger sequences and 92.61% with nuclear loci Illumina sequences (considering heterozygous loci), and 95.55% with a fully phased nuclear simulated dataset. Furthermore, our pipeline can be used to detect heteroplasmy in mtDNA, cross‐contamination, resolve the phase of nuclear genes in diploid organisms, and conceivably for multi‐copy gene systems such as rDNA. These results not only support its potential for application in studies using haploid data such as DNA barcoding, but also demonstrate its unique capacity to explore within‐individual haplotype variation. Therefore, our strategy shows promise for a broad range of applications in biology and medicine that have been challenging to assess using traditional techniques.


Chromosome-Level Genome Assembly of the Loach Goby Rhyacichthys aspro Offers Insights Into Gobioidei Evolution

April 2025

·

114 Reads

The percomorph fish clade Gobioidei is a suborder that comprises over 2200 species distributed in nearly all aquatic habitats. To understand the genetics underlying their species diversification, we sequenced and annotated the genome of the loach goby, Rhyacichthys aspro, an early-diverging group, and compared it with nine additional Gobioidei species. Within Gobioidei, the loach goby possesses the smallest genome at 594 Mb, and a rise in species diversity from early-diverging to more recently diverged lineages is mirrored by enlarged genomes and a higher presence of transposable elements (TEs), particularly DNA transposons. These DNA transposons are enriched in genic and regulatory regions and their copy number increase is strongly correlated with substitution rate, suggesting that DNA repair after transposon excision/insertion leads to nearby mutations. Consequently, the proliferation of DNA transposons might be the crucial driver of Gobioidei diversification and adaptability. The loach goby genome also points to mechanisms of ecological adaptation. It contains relatively few genes for lateral line development but an overrepresentation of synaptic function genes, with genes putatively under selection linked to synapse organisation and calcium signalling, implicating a sensory system distinct from other Gobioidei species. We also see an overabundance of genes involved in neurocranium development and renal function, adaptations likely connected to its flat morphology suited for strong currents and an amphidromous life cycle. Comparative analyses with hill-stream loaches and the European eel reveal convergent adaptations in body shape and saltwater balance. These findings shed new light on the loach goby's survival mechanisms and the broader evolutionary trends within Gobioidei.


Improving Whole Biodiversity Monitoring and Discovery With Environmental DNA Metagenomics

April 2025

·

84 Reads

Environmental DNA (eDNA) metagenomics sequences all DNA molecules present in environmental samples and has the potential of identifying virtually any organism from which they are derived. However, due to unacceptable levels of false positives and negatives, this approach is underexplored as a tool for biodiversity monitoring across the tree of life, particularly for non‐microscopic eukaryotes. We present SeqIDist, a framework that combines multilocus BLAST matches against several reference databases followed by an analysis of sequence identity distribution patterns to disentangle false positives while revealing new biodiversity and increasing the accuracy of metagenomic approaches. We tested SeqIDist on an eDNA metagenomic dataset from a riverine site and compared the results to those obtained with an eDNA metabarcoding approach for benchmarking purposes. We start by characterising the biological community (~2000 taxa) across the tree of life at low taxonomic levels and show that eDNA metagenomics has a higher sensitivity than eDNA metabarcoding in discovering new diversity. We show that limited representation of whole genome sequences in reference databases can lead to false positives. For non‐microscopic eukaryotes, eDNA metagenomic data often consist of a few sparse, anonymous sequences scattered across the genome, making metagenome assembly methods unfeasible. Finally, we infer eDNA source and residency time using read length distributions as a measure of decay status. The higher accuracy of SeqIDist opens the discussion of the potential of eDNA metagenomics for archived samples and its implementation in long‐term biodiversity monitoring at a planetary scale.


Journal metrics


5.5 (2023)

Journal Impact Factor™


27%

Acceptance rate


15.6 (2023)

CiteScore™


33 days

Submission to first decision


2.587 (2023)

SNIP


$5,250.00 / £3,500.00 / €4,430.00

Article processing charge