Figure 2 - available via license: Creative Commons Attribution-NoDerivatives 4.0 International
Demonstration of the difference between the concepts of depth and breadth of coverage. Two read alignment scenarios, A) and B), have identical depth of coverage (or simply coverage) of N_reads * L_read / L_genome = (4 * L_read) / (4 * L_read) = 1X. However, the reads are spread unevenly in case A) and evenly in case B). The latter has a higher breadth of coverage and corresponds to a true-positive hit, while scenario A) is typical of a false-positive microbial detection.
Source publication
Analysis of microbial data from archaeological samples is a rapidly growing field with a great potential for understanding ancient environments, lifestyles and disease spread in the past. However, high error rates have been a long-standing challenge in ancient metagenomics analysis. This is also complicated by a limited choice of ancient microbiome...
Contexts in source publication
Context 1
... breadth of coverage information is obtained through alignments; the advantage of KrakenUniq is therefore that it can deliver a breadth of coverage estimate via classification without performing explicit alignments. Figure 2 schematically demonstrates why detecting microbial organisms solely based on depth of coverage (or simply coverage), which is largely equivalent to the number of mapped reads, might lead to false-positive identifications. Suppose we have a toy reference genome of length 4 * L and 4 reads of length L mapping to the reference genome. ...
Context 2
... we have a toy reference genome of length 4 * L and 4 reads of length L mapping to the reference genome. When a microbe is truly detected, the reads should map evenly across the reference genome, see Figure 2B. In contrast, in the case of misaligned reads, i.e. when reads originating from species A map to the reference genome of species B, it is common to observe "piles" of reads aligned to a few conserved regions of the reference genome, which is the case in Figure 2A (see also Supplementary Figure 1 for a real data example, where reads from unknown microbial organisms are forced to map to the Yersinia pestis reference genome alone). ...
Context 3
... a microbe is truly detected, the reads should map evenly across the reference genome, see Figure 2B. In contrast, in the case of misaligned reads, i.e. when reads originating from species A map to the reference genome of species B, it is common to observe "piles" of reads aligned to a few conserved regions of the reference genome, which is the case in Figure 2A (see also Supplementary Figure 1 for a real data example, where reads from unknown microbial organisms are forced to map to the Yersinia pestis reference genome alone). Therefore, we consider the breadth of coverage information delivered by KrakenUniq to be of crucial importance for robust filtering in our workflow. ...
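The toy example from Figure 2 can be reproduced with a short sketch. This is only an illustration, not how KrakenUniq estimates breadth of coverage (the excerpt above notes that it does so without explicit alignments). The function names, the read length of 25, and the choice to stack all four reads at the same position in scenario A are assumptions made for the example.

```python
def depth_of_coverage(read_starts, read_len, genome_len):
    """Average depth: total aligned bases divided by the reference length."""
    return len(read_starts) * read_len / genome_len


def breadth_of_coverage(read_starts, read_len, genome_len):
    """Fraction of reference positions covered by at least one read."""
    covered = set()
    for start in read_starts:
        # Clip each read at the end of the reference.
        covered.update(range(start, min(start + read_len, genome_len)))
    return len(covered) / genome_len


L = 25      # hypothetical read length
G = 4 * L   # toy reference genome of length 4 * L

piled = [0, 0, 0, 0]         # scenario A (assumed): all reads pile up on one region
even = [0, L, 2 * L, 3 * L]  # scenario B: reads tile the reference evenly

for name, starts in (("A", piled), ("B", even)):
    depth = depth_of_coverage(starts, L, G)
    breadth = breadth_of_coverage(starts, L, G)
    print(f"scenario {name}: depth = {depth:.2f}X, breadth = {breadth:.2f}")
```

Both scenarios give a depth of 1X, but under these assumptions the breadth is 0.25 for scenario A and 1.0 for scenario B, so a filter on depth or read count alone cannot tell them apart.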
Context 4
... concluded that the Microbial NCBI NT provides sufficient accuracy when performing microbial profiling, i.e. including eukaryotic organisms in the database (as is the case for the Full NCBI NT) does not significantly affect the accuracy of microbial detection. Despite the large variation (large error bars) of Jaccard similarity in Supplementary Figure 2, the increase in Jaccard similarity with database size is quite clear. Therefore, we concluded that, in our simulation work, the larger databases provide higher accuracy of microbial detection, while smaller databases suffer from low sensitivity and may introduce biases into microbial identification in metagenomic samples. ...
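As a rough illustration of the metric mentioned above, Jaccard similarity can be computed between the set of detected taxa and the set of taxa truly present in a simulated sample. The excerpt does not state exactly how the authors computed it, so the set-based definition, the function name, and the species lists below are assumptions chosen for the example.

```python
def jaccard_similarity(detected, truth):
    """Size of the intersection divided by the size of the union of two taxon sets."""
    detected, truth = set(detected), set(truth)
    union = detected | truth
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(detected & truth) / len(union)


# Hypothetical ground truth and detections, for illustration only.
truth = {"Yersinia pestis", "Streptococcus mutans", "Tannerella forsythia"}
small_db_hits = {"Yersinia pestis"}
large_db_hits = truth | {"Escherichia coli"}  # all true taxa plus one false positive

print(jaccard_similarity(small_db_hits, truth))
print(jaccard_similarity(large_db_hits, truth))
```

In this made-up case the smaller database misses two of three taxa (similarity 1/3), while the larger one recovers all three at the cost of one false positive (similarity 3/4).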
Similar publications
Background
The human gut microbiome develops rapidly during infancy, a key window of development coinciding with the maturation of the adaptive immune system. However, little is known about the microbiome growth dynamics over the first few months of life and whether there are any generalizable patterns across human populations. We performed metagen...
Background
Colorectal cancer (CRC) is linked to distinct gut microbiome patterns. The efficacy of gut bacteria as diagnostic biomarkers for CRC has been confirmed. Despite the potential to influence microbiome physiology and evolution, the set of plasmids in the gut microbiome remains understudied.
Methods
We investigated the essential features of...
Large-scale microbiome studies are progressively utilizing multiomics designs, which include the collection of microbiome samples together with host genomics and metabolomics data. Despite the increasing number of data sources, there remains a bottleneck in understanding the relationships between different data modalities due to the limited number...
The rapid diagnosis of infectious diseases has an essential impact on their control, treatment, and recovery. Oxford Nanopore Technologies (ONT) sequencing opens up a new dimension in applying clinical metagenomics. In a large-scale pig farm in Hungary, four fattening and one piglet nasal swab pooled samples were sequenced using ONT for metagenomic...
Citations
... previously developed the AncientMetagenomeDir project, a set of curated standard sample metadata for ancient host-associated shotgun-sequenced metagenomes, ancient environmental metagenomes, and/or host-associated microbial genomes [3]. However, while sample-level metadata already help with the discovery of suitable comparative data, library-level metadata are also needed to further facilitate data reuse in dedicated aDNA analysis pipelines such as PALEOMIX [4], nf-core/eager [5], aMeta [6], and nf-core/mag [7]. aDNA researchers often build many different types of NGS libraries [8] and may generate (meta)genomic data using multiple different sequencing platforms that require different bioinformatic pre-processing workflows. ...
Background: Access to sample-level metadata is important when selecting public metagenomic sequencing datasets for reuse in new biological analyses. The Standards, Precautions, and Advances in Ancient Metagenomics community (SPAAM, https://spaam-community.org) has previously published AncientMetagenomeDir, a collection of curated and standardised sample metadata tables for metagenomic and microbial genome datasets generated from ancient samples. However, while sample-level information is useful for identifying relevant samples for inclusion in new projects, Next Generation Sequencing (NGS) library construction and sequencing metadata are also essential for appropriately reprocessing ancient metagenomic data. Currently, recovering information for downloading and preparing such data is difficult when laboratory and bioinformatic metadata is heterogeneously recorded in prose-based publications.
Methods: Through a series of community-based hackathon events, AncientMetagenomeDir was updated to provide standardised library-level metadata of existing and new ancient metagenomic samples. In tandem, the companion tool 'AMDirT' was developed to facilitate rapid data filtering and downloading of ancient metagenomic data, as well as improving automated metadata curation and validation for AncientMetagenomeDir.
Results: AncientMetagenomeDir was extended to include standardised metadata of over 6000 ancient metagenomic libraries. The companion tool 'AMDirT' provides both graphical- and command-line interface based access to such metadata for users from a wide range of computational backgrounds. We also report on errors with metadata reporting that appear to commonly occur during data upload and provide suggestions on how to improve the quality of data sharing by the community.
Conclusions: Together, both standardised metadata reporting and tooling will help towards easier incorporation and reuse of public ancient metagenomic datasets into future analyses.
... HOPS runs on MALT [35], which provides the options of polymerase chain reaction (PCR) duplicate removal and deamination pattern tolerance at the ends of reads. However, MALT requires high computational memory, which can go over 1 TB of RAM on even a modest size genome database to build and medium-size FASTQ files to input [36]. Therefore, it is often only used in research clusters capable of supporting and maintaining such large databases and allowing usage of large memory systems. ...
Taxonomic profiling of ancient metagenomic samples is challenging due to the accumulation of specific damage patterns on DNA over time. Although a number of methods for metagenome profiling have been developed, most of them have been assessed on modern metagenomes or simulated metagenomes mimicking modern metagenomes. Further, a comparative assessment of metagenome profilers on simulated metagenomes representing a spectrum of degradation depth, from the extremity of ancient (most degraded) to current or modern (not degraded) metagenomes, has not yet been performed. To understand the strengths and weaknesses of different metagenome profilers, we performed their comprehensive evaluation on simulated metagenomes representing human dental calculus microbiome, with the level of DNA damage successively raised to mimic modern to ancient metagenomes. All classes of profilers, namely, DNA-to-DNA, DNA-to-protein, and DNA-to-marker comparison-based profilers were evaluated on metagenomes with varying levels of damage simulating deamination, fragmentation, and contamination. Our results revealed that, compared to deamination and fragmentation, human and environmental contamination of ancient DNA (with modern DNA) has the most pronounced effect on the performance of each profiler. Further, the DNA-to-DNA (e.g., Kraken2, Bracken) and DNA-to-marker (e.g., MetaPhlAn4) based profiling approaches showed complementary strengths, which can be leveraged to elevate the state-of-the-art of ancient metagenome profiling.
... previously developed the AncientMetagenomeDir project, a set of curated standard sample metadata for ancient host-associated shotgun-sequenced metagenomes, ancient environmental metagenomes, and/or host-associated microbial genomes [3]. However, while sample-level metadata already help with the discovery of suitable comparative data, library-level metadata is also needed to further facilitate data reuse in dedicated aDNA analysis pipelines such as PALEOMIX [4], nf-core/eager [5], aMeta [6], and nf-core/mag [7]. aDNA researchers often build many different types of NGS libraries [8] and may generate (meta)genomic data using multiple different sequencing platforms that require different bioinformatic pre-processing workflows. ...
Background : Access to sample-level metadata is important when selecting public metagenomic sequencing datasets for reuse in new biological analyses. The Standards, Precautions, and Advances in Ancient Metagenomics community (SPAAM, https://spaam-community.github.io) has previously published AncientMetagenomeDir, a collection of curated and standardised sample metadata tables for metagenomic and microbial genome datasets generated from ancient samples. However, while sample-level information is useful for identifying relevant samples for inclusion in new projects, Next Generation Sequencing (NGS) library construction and sequencing metadata are also essential for appropriately reprocessing ancient metagenomic data. Currently, recovering information for downloading and preparing such data is difficult when laboratory and bioinformatic metadata is heterogeneously recorded in prose-based publications.
Methods : Through a series of community-based hackathon events, AncientMetagenomeDir was updated to provide standardised library-level metadata of existing and new ancient metagenomic samples. In tandem, the companion tool 'AMDirT' was developed to facilitate automated metadata curation and data validation, as well as rapid data filtering and downloading.
Results : AncientMetagenomeDir was extended to include standardised metadata of over 5000 ancient metagenomic libraries. The companion tool 'AMDirT' provides both graphical- and command-line interface based access to such metadata for users from a wide range of computational backgrounds. We also report on errors with metadata reporting that appear to commonly occur during data upload and provide suggestions on how to improve the quality of data sharing by the community.
Conclusions : Together, both standardised metadata and tooling will help towards easier incorporation and reuse of public ancient metagenomic datasets into future analyses.
... Furthermore, new tools have been developed to enhance the accuracy of taxonomic assignment of metagenomic reads (PIA, Cribdon et al. (2020)), and for processing and analyzing ancient metagenomics shotgun data, specifically targeting ultra-short molecules (e.g. Collin et al. (2020); Fellows Yates et al. (2021b); Pochon et al. (2022); Neuenschwander et al. (2023)). However, identifying ancient sequences is still a challenge due to the lack of standard bioinformatics pipelines to analyze DNA metabarcoding or shotgun metagenomics data. ...
Sedimentary ancient DNA (sedaDNA) offers a novel retrospective approach to reconstructing the history of marine ecosystems over geological timescales. Until now, the biological proxies used to reconstruct paleoceanographic and paleoecological conditions were limited to organisms whose remains are preserved in the fossil record. The development of ancient DNA analysis techniques substantially expands the range of studied taxa, providing a holistic overview of past biodiversity. Future development of marine sedaDNA research is expected to dramatically improve our understanding of how the marine biota responded to changing environmental conditions. However, as an emerging approach, marine sedaDNA holds many challenges, and its ability to recover reliable past biodiversity information needs to be carefully assessed. This review aims to highlight current advances in marine sedaDNA research and to discuss potential methodological pitfalls and limitations.
Metagenomic classification tackles the problem of characterising the taxonomic source of all DNA sequencing reads in a sample. A common approach to address the differences and biases between the many different taxonomic classification tools is to run metagenomic data through multiple classification tools and databases. This, however, is a very time-consuming task when performed manually - particularly when combined with the appropriate preprocessing of sequencing reads before the classification.
Here we present nf-core/taxprofiler, a highly parallelised read-processing and taxonomic classification pipeline. It is designed for the automated and simultaneous classification and/or profiling of both short- and long-read metagenomic sequencing libraries against 11 taxonomic classifiers and profilers as well as databases within a single pipeline run. Implemented in Nextflow and as part of the nf-core initiative, the pipeline benefits from high levels of scalability and portability, accommodating from small to extremely large projects on a wide range of computing infrastructure. It has been developed following best-practise software development practises and community support to ensure longevity and adaptability of the pipeline, to help keep it up to date with the field of metagenomics.
Analysis of microbial data from archaeological samples is a growing field with great potential for understanding ancient environments, lifestyles, and diseases. However, high error rates have been a challenge in ancient metagenomics, and the availability of computational frameworks that meet the demands of the field is limited. Here, we propose aMeta, an accurate metagenomic profiling workflow for ancient DNA designed to minimize the amount of false discoveries and computer memory requirements. Using simulated data, we benchmark aMeta against a current state-of-the-art workflow and demonstrate its superiority in microbial detection and authentication, as well as substantially lower usage of computer memory.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13059-023-03083-9.
Palaeogenomics continues to yield valuable insights into the evolution, population dynamics, and ecology of our ancestors and other extinct species. However, DNA sequencing cannot reveal tissue-specific gene expression, cellular identity, or gene regulation, only attainable at the transcriptional level. Pioneering studies have shown that useful RNA can be extracted from ancient specimens preserved in permafrost and historical skins from extant canids, but no attempts have been made so far on extinct species. We extract, sequence and analyze historical RNA from muscle and skin tissue of a ~130-year-old Tasmanian tiger (Thylacinus cynocephalus) preserved in desiccation at room temperature in a museum collection. The transcriptional profiles closely resemble those of extant species, revealing specific anatomical features such as slow muscle fibers or blood infiltration. Metatranscriptomic analysis, RNA damage, tissue-specific RNA profiles, and expression hotspots genome-wide further confirm the thylacine origin of the sequences. RNA sequences are used to improve protein-coding and noncoding annotations, evidencing missing exonic loci and the location of ribosomal RNA genes, while increasing the number of annotated thylacine microRNAs from 62 to 325. We discover a thylacine-specific microRNA isoform that could not have been confirmed without RNA evidence. Finally, we detect traces of RNA viruses, suggesting the possibility of profiling viral evolution. Our results represent the first successful attempt to obtain transcriptional profiles from an extinct animal species, providing thought-to-be-lost information on gene expression dynamics. These findings hold promising implications for the study of RNA molecules across the vast collections of Natural History museums and from well-preserved permafrost remains.