Figure 5 - available via license: Creative Commons Attribution-NoDerivatives 4.0 International
Microbial detection sensitivity vs. specificity comparison between KrakenUniq, HOPS and aMeta at different assigned-read thresholds: A) Intersection over Union (Jaccard similarity) and B) F1 score, both computed with respect to the simulated microbial abundance ground truth.
Source publication
Analysis of microbial data from archaeological samples is a rapidly growing field with a great potential for understanding ancient environments, lifestyles and disease spread in the past. However, high error rates have been a long-standing challenge in ancient metagenomics analysis. This is also complicated by a limited choice of ancient microbiome...
Contexts in source publication
Context 1
... each depth of coverage threshold applied to the abundance matrices, we compared the microbial organisms identified by KrakenUniq and HOPS against the true list of organisms simulated by gargammel. As a criterion of overlap between the prediction and the ground truth we used two metrics: Intersection over Union (IoU), also known as Jaccard similarity, and the F1 score, both of which quantify the balance between sensitivity and specificity of microbial detection by KrakenUniq and HOPS (Figure 5). The solid lines in Figure 5 show how the IoU and F1 score change at different depth of coverage thresholds applied to the KrakenUniq and HOPS microbial abundance matrices. ...
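As a rough illustration of how these two metrics relate the predicted and simulated species lists, the Python sketch below computes IoU and F1 at several assigned-read thresholds. This is not the authors' code: the file names and column labels ("reads", "species") are assumptions for illustration, not the pipeline's actual output format.

```python
"""Minimal sketch (not the aMeta code) of the Figure 5 comparison: given a
KrakenUniq- or HOPS-style abundance table and the species list simulated by
gargammel, compute IoU (Jaccard) and F1 at a range of assigned-read thresholds.
File and column names are illustrative assumptions."""

import pandas as pd

def jaccard(predicted: set, truth: set) -> float:
    """Intersection over Union of the detected vs. simulated species sets."""
    union = predicted | truth
    return len(predicted & truth) / len(union) if union else 0.0

def f1(predicted: set, truth: set) -> float:
    """F1 score: harmonic mean of precision and recall."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)

# Hypothetical inputs: one row per detected species with an assigned-read count,
# plus the ground-truth species list used to simulate the data.
abundance = pd.read_csv("krakenuniq_abundance.tsv", sep="\t")       # assumed file
truth = set(pd.read_csv("simulated_species.txt", header=None)[0])   # assumed file

for threshold in (50, 100, 200, 300, 500, 1000):
    detected = set(abundance.loc[abundance["reads"] >= threshold, "species"])
    print(threshold, round(jaccard(detected, truth), 3), round(f1(detected, truth), 3))
```

Sweeping the threshold in this way yields one IoU and one F1 value per cut-off, i.e. the kind of curve drawn as solid lines in Figure 5.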
Context 2
... each depth of coverage threshold applied to the abundance matrices, we compared the microbial organisms identified by KrakenUniq and HOPS against the true list of organisms simulated by gargammel. As a criterion of overlap between the prediction and the ground truth we used two metrics: Intersection over Union (IoU), also known as Jaccard similarity, and the F1 score, both of which quantify the balance between sensitivity and specificity of microbial detection by KrakenUniq and HOPS (Figure 5). The solid lines in Figure 5 show how the IoU and F1 score change at different depth of coverage thresholds applied to the KrakenUniq and HOPS microbial abundance matrices. The dashed horizontal line in Figure 5 corresponds to the IoU and F1 score computed using the depth and breadth of coverage thresholds set by default in aMeta. ...
Context 3
... a criterion of overlap between the prediction and the ground truth we used two metrics: Intersection over Union (IoU), also known as Jaccard similarity, and the F1 score, both of which quantify the balance between sensitivity and specificity of microbial detection by KrakenUniq and HOPS (Figure 5). The solid lines in Figure 5 show how the IoU and F1 score change at different depth of coverage thresholds applied to the KrakenUniq and HOPS microbial abundance matrices. The dashed horizontal line in Figure 5 corresponds to the IoU and F1 score computed using the depth and breadth of coverage thresholds set by default in aMeta. The default aMeta filtering thresholds were previously determined empirically from the analysis of a number of ancient metagenomic samples [34,35]. ...
Context 4
... default aMeta filtering thresholds were previously determined empirically from the analysis of a number of ancient metagenomic samples [34,35]. As Figure 5 shows, the default settings of aMeta result in nearly optimal IoU and F1 score values when used to filter the KrakenUniq abundance matrix. Figure 5 also shows that, irrespective of the depth of coverage threshold applied to the KrakenUniq and HOPS abundance matrices, the IoU and F1 quality metrics for HOPS always remain below the sensitivity vs. specificity level provided by KrakenUniq and aMeta. ...
Context 5
... Figure 5 shows, the default settings of aMeta result in nearly optimal IoU and F1 score values when used to filter the KrakenUniq abundance matrix. Figure 5 also shows that, irrespective of the depth of coverage threshold applied to the KrakenUniq and HOPS abundance matrices, the IoU and F1 quality metrics for HOPS always remain below the sensitivity vs. specificity level provided by KrakenUniq and aMeta. ...
Context 6
... unbiased pre-screening against large databases becomes computationally feasible thanks to the recent low-memory development of KrakenUniq [22]; meaning that, provided a reference database has already been built and is of a reasonable size, taxonomic classification can be performed on virtually any computer, even a laptop, irrespective of the database size. Indeed, according to our tests (Supplementary Figure 5), the new KrakenUniq development enables 10 times faster classification using a 450 GB reference database even on a computer cluster node with 128 GB of RAM, which was previously impossible without a node with at least 512 GB of RAM. This new development opens up exciting opportunities for truly unbiased pre-screening by KrakenUniq, followed by alignment, validation and authentication by MALT, as implemented in our workflow. ...
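For orientation, here is a hedged sketch of how such a memory-capped classification run could be launched from Python. It assumes a KrakenUniq version that supports the --preload-size option for loading the database in RAM-sized chunks; the database path, input file, sizes and thread count are placeholders, not the workflow's actual settings.

```python
"""Sketch only: run KrakenUniq in low-memory mode by loading the database in
chunks that fit into available RAM (--preload-size). Paths, sizes and thread
counts are placeholders for illustration."""

import subprocess

cmd = [
    "krakenuniq",
    "--db", "/path/to/krakenuniq_db",            # placeholder: large reference database
    "--preload-size", "100G",                    # keep each database chunk below available RAM
    "--threads", "16",
    "--fastq-input",
    "--report-file", "sample.krakenuniq.report", # per-taxon report with unique k-mer counts
    "--output", "sample.krakenuniq.out",
    "sample.trimmed.fastq",                      # placeholder: pre-processed reads
]
subprocess.run(cmd, check=True)
```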
Context 7
... the memory gain of our workflow does not compromise the accuracy of microbial detection and authentication. Instead, as shown in Figures 5 and 6, aMeta has a better sensitivity vs. specificity balance for both microbial detection and authentication compared to HOPS. On one hand, the superior sensitivity of aMeta comes from a larger reference database used by KrakenUniq compared to the one used by HOPS. ...
Context 8
... default we filter the output of KrakenUniq using thresholds of 1000 unique k-mers (breadth of coverage filter) and 300 assigned reads (depth of coverage filter) per microbial species. Therefore, the dashed horizontal line in Figure 5 corresponds to the IoU and F1 score computed using the default depth (300 assigned reads) and breadth (1000 unique k-mers) of coverage thresholds set in aMeta. ...
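A minimal pandas sketch of what this joint breadth/depth filter amounts to is shown below, assuming a KrakenUniq-style report with "kmers" (unique k-mer count) and "reads" (assigned read count) columns; the column and file names are illustrative assumptions, not aMeta's exact output format.

```python
"""Sketch of aMeta's default breadth/depth filter on a KrakenUniq report:
keep only taxa with >= 1000 unique k-mers and >= 300 assigned reads.
Column and file names are assumptions for illustration."""

import pandas as pd

KMER_THRESHOLD = 1000   # breadth-of-coverage proxy (unique k-mers)
READ_THRESHOLD = 300    # depth-of-coverage proxy (assigned reads)

report = pd.read_csv("krakenuniq.report.tsv", sep="\t")  # assumed file
kept = report[(report["kmers"] >= KMER_THRESHOLD) & (report["reads"] >= READ_THRESHOLD)]
kept.to_csv("krakenuniq.filtered.tsv", sep="\t", index=False)
print(f"{len(kept)} of {len(report)} taxa pass the default aMeta thresholds")
```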
Similar publications
The rapid diagnosis of infectious diseases has an essential impact on their control, treatment, and recovery. Oxford Nanopore Technologies (ONT) sequencing opens up a new dimension in applying clinical metagenomics. In a large-scale pig farm in Hungary, four fattening and one piglet nasal swab pooled samples were sequenced using ONT for metagenomic...
Analysis of microbial data from archaeological samples is a growing field with great potential for understanding ancient environments, lifestyles, and diseases. However, high error rates have been a challenge in ancient metagenomics, and the availability of computational frameworks that meet the demands of the field is limited. Here, we propose aMe...
Background
The human gut microbiome develops rapidly during infancy, a key window of development coinciding with the maturation of the adaptive immune system. However, little is known about the microbiome growth dynamics over the first few months of life and whether there are any generalizable patterns across human populations. We performed metagen...
Evaluating the quality of metagenomic assemblies is important for constructing reliable metagenome-assembled genomes and downstream analyses. Here, we present metaMIC (https://github.com/ZhaoXM-Lab/metaMIC), a machine learning-based tool for identifying and correcting misassemblies in metagenomic assemblies. Benchmarking results on both simulated a...
Citations
... previously developed the AncientMetagenomeDir project, a set of curated standard sample metadata for ancient host-associated shotgun-sequenced metagenomes, ancient environmental metagenomes, and/or host-associated microbial genomes. 3 However, while sample-level metadata already help with the discovery of suitable comparative data, library-level metadata are also needed to further facilitate data reuse in dedicated aDNA analysis pipelines such as PALEOMIX, 4 nf-core/eager, 5 aMeta, 6 and nf-core/mag. 7 aDNA researchers often build many different types of NGS libraries 8 and may generate (meta)genomic data using multiple different sequencing platforms that require different bioinformatic pre-processing workflows. ...
Background Access to sample-level metadata is important when selecting public metagenomic sequencing datasets for reuse in new biological analyses. The Standards, Precautions, and Advances in Ancient Metagenomics community (SPAAM, https://spaam-community.org) has previously published AncientMetagenomeDir, a collection of curated and standardised sample metadata tables for metagenomic and microbial genome datasets generated from ancient samples. However, while sample-level information is useful for identifying relevant samples for inclusion in new projects, Next Generation Sequencing (NGS) library construction and sequencing metadata are also essential for appropriately reprocessing ancient metagenomic data. Currently, recovering information for downloading and preparing such data is difficult when laboratory and bioinformatic metadata is heterogeneously recorded in prose-based publications. Methods Through a series of community-based hackathon events, AncientMetagenomeDir was updated to provide standardised library-level metadata of existing and new ancient metagenomic samples. In tandem, the companion tool 'AMDirT' was developed to facilitate rapid data filtering and downloading of ancient metagenomic data, as well as improving automated metadata curation and validation for AncientMetagenomeDir. Results AncientMetagenomeDir was extended to include standardised metadata of over 6000 ancient metagenomic libraries. The companion tool 'AMDirT' provides both graphical- and command-line interface based access to such metadata for users from a wide range of computational backgrounds. We also report on errors with metadata reporting that appear to commonly occur during data upload and provide suggestions on how to improve the quality of data sharing by the community. Conclusions Together, both standardised metadata reporting and tooling will help towards easier incorporation and reuse of public ancient metagenomic datasets into future analyses.
... HOPS runs on MALT [35], which provides options for polymerase chain reaction (PCR) duplicate removal and tolerance of deamination patterns at the ends of reads. However, MALT requires large amounts of computational memory, which can exceed 1 TB of RAM even for building a modest-size genome database and processing medium-size input FASTQ files [36]. Therefore, it is often only used on research clusters capable of supporting and maintaining such large databases and of providing large-memory systems. ...
Taxonomic profiling of ancient metagenomic samples is challenging due to the accumulation of specific damage patterns on DNA over time. Although a number of methods for metagenome profiling have been developed, most of them have been assessed on modern metagenomes or simulated metagenomes mimicking modern metagenomes. Further, a comparative assessment of metagenome profilers on simulated metagenomes representing a spectrum of degradation depth, from the extremity of ancient (most degraded) to current or modern (not degraded) metagenomes, has not yet been performed. To understand the strengths and weaknesses of different metagenome profilers, we performed their comprehensive evaluation on simulated metagenomes representing human dental calculus microbiome, with the level of DNA damage successively raised to mimic modern to ancient metagenomes. All classes of profilers, namely, DNA-to-DNA, DNA-to-protein, and DNA-to-marker comparison-based profilers were evaluated on metagenomes with varying levels of damage simulating deamination, fragmentation, and contamination. Our results revealed that, compared to deamination and fragmentation, human and environmental contamination of ancient DNA (with modern DNA) has the most pronounced effect on the performance of each profiler. Further, the DNA-to-DNA (e.g., Kraken2, Bracken) and DNA-to-marker (e.g., MetaPhlAn4) based profiling approaches showed complementary strengths, which can be leveraged to elevate the state-of-the-art of ancient metagenome profiling.
... previously developed the AncientMetagenomeDir project, a set of curated standard sample metadata for ancient host-associated shotgun-sequenced metagenomes, ancient environmental metagenomes, and/or host-associated microbial genomes. 3 However, while sample-level metadata already help with the discovery of suitable comparative data, library-level metadata is also needed to further facilitate data reuse in dedicated aDNA analysis pipelines such as PALEOMIX, 4 nf-core/eager, 5 aMeta, 6 and nf-core/mag. 7 aDNA researchers often build many different types of NGS libraries 8 and may generate (meta)genomic data using multiple different sequencing platforms that require different bioinformatic pre-processing workflows. ...
Background : Access to sample-level metadata is important when selecting public metagenomic sequencing datasets for reuse in new biological analyses. The Standards, Precautions, and Advances in Ancient Metagenomics community (SPAAM, https://spaam-community.github.io) has previously published AncientMetagenomeDir, a collection of curated and standardised sample metadata tables for metagenomic and microbial genome datasets generated from ancient samples. However, while sample-level information is useful for identifying relevant samples for inclusion in new projects, Next Generation Sequencing (NGS) library construction and sequencing metadata are also essential for appropriately reprocessing ancient metagenomic data. Currently, recovering information for downloading and preparing such data is difficult when laboratory and bioinformatic metadata is heterogeneously recorded in prose-based publications.
Methods : Through a series of community-based hackathon events, AncientMetagenomeDir was updated to provide standardised library-level metadata of existing and new ancient metagenomic samples. In tandem, the companion tool 'AMDirT' was developed to facilitate automated metadata curation and data validation, as well as rapid data filtering and downloading.
Results : AncientMetagenomeDir was extended to include standardised metadata of over 5000 ancient metagenomic libraries. The companion tool 'AMDirT' provides both graphical- and command-line interface based access to such metadata for users from a wide range of computational backgrounds. We also report on errors with metadata reporting that appear to commonly occur during data upload and provide suggestions on how to improve the quality of data sharing by the community.
Conclusions : Together, both standardised metadata and tooling will help towards easier incorporation and reuse of public ancient metagenomic datasets into future analyses.
... Furthermore, new tools have been developed to enhance the accuracy of taxonomic assignment of metagenomic reads (PIA, Cribdon et al. (2020)) and for processing and analyzing ancient metagenomic shotgun data, specifically targeting ultra-short molecules (e.g. Collin et al. (2020); Fellows Yates et al. (2021b); Pochon et al. (2022); Neuenschwander et al. (2023)). However, identifying ancient sequences is still a challenge due to the lack of standard bioinformatics pipelines for analyzing DNA metabarcoding or shotgun metagenomics data. ...
Sedimentary ancient DNA (sedaDNA) offers a novel retrospective approach to reconstructing the history of marine ecosystems over geological timescales. Until now, the biological proxies used to reconstruct paleoceanographic and paleoecological conditions were limited to organisms whose remains are preserved in the fossil record. The development of ancient DNA analysis techniques substantially expands the range of studied taxa, providing a holistic overview of past biodiversity. Future development of marine sedaDNA research is expected to dramatically improve our understanding of how the marine biota responded to changing environmental conditions. However, as an emerging approach, marine sedaDNA holds many challenges, and its ability to recover reliable past biodiversity information needs to be carefully assessed. This review aims to highlight current advances in marine sedaDNA research and to discuss potential methodological pitfalls and limitations.
Metagenomic classification tackles the problem of characterising the taxonomic source of all DNA sequencing reads in a sample. A common approach to address the differences and biases between the many different taxonomic classification tools is to run metagenomic data through multiple classification tools and databases. This, however, is a very time-consuming task when performed manually - particularly when combined with the appropriate preprocessing of sequencing reads before the classification.
Here we present nf-core/taxprofiler, a highly parallelised read-processing and taxonomic classification pipeline. It is designed for the automated and simultaneous classification and/or profiling of both short- and long-read metagenomic sequencing libraries against 11 taxonomic classifiers and profilers, as well as multiple databases, within a single pipeline run. Implemented in Nextflow and as part of the nf-core initiative, the pipeline benefits from high levels of scalability and portability, accommodating projects from small to extremely large on a wide range of computing infrastructure. It has been developed following software development best practices and with community support, to ensure the longevity and adaptability of the pipeline and to help keep it up to date with the field of metagenomics.
Analysis of microbial data from archaeological samples is a growing field with great potential for understanding ancient environments, lifestyles, and diseases. However, high error rates have been a challenge in ancient metagenomics, and the availability of computational frameworks that meet the demands of the field is limited. Here, we propose aMeta, an accurate metagenomic profiling workflow for ancient DNA designed to minimize the amount of false discoveries and computer memory requirements. Using simulated data, we benchmark aMeta against a current state-of-the-art workflow and demonstrate its superiority in microbial detection and authentication, as well as substantially lower usage of computer memory.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13059-023-03083-9.
Palaeogenomics continues to yield valuable insights into the evolution, population dynamics, and ecology of our ancestors and other extinct species. However, DNA sequencing cannot reveal tissue-specific gene expression, cellular identity, or gene regulation, only attainable at the transcriptional level. Pioneering studies have shown that useful RNA can be extracted from ancient specimens preserved in permafrost and historical skins from extant canids, but no attempts have been made so far on extinct species. We extract, sequence and analyze historical RNA from muscle and skin tissue of a ~130-year-old Tasmanian tiger (Thylacinus cynocephalus) preserved in desiccation at room temperature in a museum collection. The transcriptional profiles closely resemble those of extant species, revealing specific anatomical features such as slow muscle fibers or blood infiltration. Metatranscriptomic analysis, RNA damage, tissue-specific RNA profiles, and expression hotspots genome-wide further confirm the thylacine origin of the sequences. RNA sequences are used to improve protein-coding and noncoding annotations, evidencing missing exonic loci and the location of ribosomal RNA genes, while increasing the number of annotated thylacine microRNAs from 62 to 325. We discover a thylacine-specific microRNA isoform that could not have been confirmed without RNA evidence. Finally, we detect traces of RNA viruses, suggesting the possibility of profiling viral evolution. Our results represent the first successful attempt to obtain transcriptional profiles from an extinct animal species, providing thought-to-be-lost information on gene expression dynamics. These findings hold promising implications for the study of RNA molecules across the vast collections of Natural History museums and from well-preserved permafrost remains.