aMeta: overview of the ancient metagenomics profiling workflow. The workflow combines taxonomic classification and filtering steps with KrakenUniq to establish a list of microbial candidates. This list is then used to build a MALT database, run LCA-based alignments with MALT against that database, and perform validation and authentication analysis based on the alignments.

Source publication
Preprint
Analysis of microbial data from archaeological samples is a rapidly growing field with a great potential for understanding ancient environments, lifestyles and disease spread in the past. However, high error rates have been a long-standing challenge in ancient metagenomics analysis. This is also complicated by a limited choice of ancient microbiome...

Contexts in source publication

Context 1
... aMeta workflow overview is shown in Figure 1. It represents an end-to-end processing and analysis framework implemented in Snakemake [24] that accepts raw data as a set of files, usually belonging to a common project, and outputs a ranked list of detected ancient microbial species together with their abundances for each sample, as well as a number of validation and authentication plots for each identified microorganism in each sample. ...
Context 2
... a microbe is truly detected, the reads should map evenly across the reference genome, see Figure 2B. In contrast, in the case of misaligned reads, i.e. when reads originating from species A map to the reference genome of species B, it is common to observe "piles" of reads aligned to a few conserved regions of the reference genome, which is the case in Figure 2A (see also Supplementary Figure 1 for a real data example, where reads from unknown microbial organisms are forced to map to the Yersinia pestis reference genome alone). Therefore, we consider the breadth of coverage information delivered by KrakenUniq to be of crucial importance for robust filtering in our workflow. ...
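The contrast between evenly spread reads and read "piles" can be illustrated with a minimal sketch. It assumes alignments are available as half-open (start, end) intervals on a single reference. The function name `coverage_stats`, the bin count, and the toy intervals are hypothetical and are not part of aMeta or KrakenUniq; KrakenUniq itself derives its coverage information from unique k-mer counts rather than from alignments.

```python
def coverage_stats(read_intervals, genome_length, n_bins=10):
    """Return (breadth, per-bin covered fraction) for half-open [start, end) alignments.

    Toy illustration only: breadth is the fraction of reference positions
    covered by at least one read; the per-bin fractions hint at evenness.
    """
    covered = [False] * genome_length
    for start, end in read_intervals:
        for pos in range(max(start, 0), min(end, genome_length)):
            covered[pos] = True

    breadth = sum(covered) / genome_length

    bin_size = -(-genome_length // n_bins)  # ceiling division
    bins = []
    for i in range(0, genome_length, bin_size):
        chunk = covered[i:i + bin_size]
        bins.append(sum(chunk) / len(chunk))
    return breadth, bins


# Hypothetical 1000 bp reference with 20 reads of 10 bp each.
even_reads = [(i, i + 10) for i in range(0, 1000, 50)]  # spread across the genome
piled_reads = [(100, 110)] * 20                         # all stacked on one region

even_breadth, even_bins = coverage_stats(even_reads, 1000)
piled_breadth, piled_bins = coverage_stats(piled_reads, 1000)
```

With the same number of reads, the evenly spread set covers every bin, while the piled set covers a single bin and has a much smaller breadth. A read-count filter alone cannot tell the two cases apart.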
Context 3
... KrakenUniq and aMeta have higher sensitivity for microbial detection. This conclusion is confirmed by Supplementary Figures 7-12, where the ground truth for the microbial presence-absence per sample is compared against the one reconstructed by aMeta and HOPS. For example, such simulated species as Campylobacter rectus, Fusarium fujikuroi, Methylobacterium bullatum, Micromonas commoda, Micromonospora echinospora and Mycolicibacterium aurum were correctly identified by aMeta but not detected by HOPS in any simulated sample. ...
Context 4
... ran aMeta and HOPS with default settings on the simulated ground truth dataset, and obtained lists of microbial organisms ranked by the scoring system of aMeta and HOPS, where likely present and ancient microbes received higher scores. Visual inspection of the native heatmap output from HOPS demonstrated its poor authentication performance, Supplementary Figure 13. More specifically, a few bacteria such as Rhodopseudomonas palustris, Rhodococcus hoagii, Lactococcus lactis, Brevibacterium aurantiacum, Burkholderia mallei were erroneously reported by HOPS to be ancient as they got the highest scores in several samples, while they were supposed to be modern according to the simulation's design. ...
Context 5
... were, however, correctly ranked low as potential modern contaminants by aMeta. In contrast, the simulated ancient Salmonella enterica genome was ranked low by HOPS due to read misalignment, while it obtained high scores from aMeta correctly indicating its presence and ancient status, see Supplementary Figures 14-15. Overall, we conclude that aMeta has a lower authentication error compared to HOPS, see Supplementary Information S6 for more details. ...
Context 6
... discovered that nearly 22 000 reads mapped uniquely, i.e. with mapping quality MAPQ > 0 (Supplementary Figure 3). Since the sample was from a modern infant who was unlikely to have suffered from plague, the mapped reads cannot be used as evidence of Y. pestis presence in the infant's stool sample. Further, by visually inspecting the alignments in Integrative Genomics Viewer (IGV) [29], we confirmed that the reads aligned unevenly, implying that Y. pestis was not the right reference genome for the reads, see Supplementary Figure 1. Assuming that a large fraction of the aligned reads might be of human rather than bacterial origin, and thus misaligned to the Y. pestis reference genome due to the absence of a human reference genome in the reference database, we concatenated the hg38 human reference genome, https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/, ...
Context 7
... substantial constraint of HOPS is that it does not provide information on the breadth and evenness of coverage. As discussed in the main text and shown in Figure 2, this often results in Supplementary Figure 1. ...
Context 8
... belonging to the HOPS toolbox, by default assigns +1 to any microbial hit with non-zero terminal damage. Therefore, even a tiny transition mutation frequency, such as 0.001 at the terminal ends of the reads, counts as the presence of damage, which substantially inflates the authentication error, see Supplementary Figure 13. ...
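The effect of the "any non-zero damage" rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not HOPS code: the function `damage_flag` and the 0.05 cutoff are hypothetical and chosen only to show how a minimum-frequency threshold changes the outcome.

```python
def damage_flag(terminal_freq, min_freq=0.0):
    """Return 1 if the terminal transition frequency exceeds min_freq, else 0.

    min_freq=0.0 mimics a rule that counts any non-zero damage;
    a larger min_freq (hypothetical) requires a clearer damage signal.
    """
    return 1 if terminal_freq > min_freq else 0


tiny_damage = 0.001  # the example frequency quoted in the text

flag_any_nonzero = damage_flag(tiny_damage)             # counted as damaged
flag_with_cutoff = damage_flag(tiny_damage, min_freq=0.05)  # not counted
```

Under the non-zero rule, a frequency of 0.001 is scored the same as a strong deamination signal, which is how such a rule can inflate authentication error.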
Context 9
... ground truth abundance matrix was binarized using 1% of all reads as a detection threshold, and a binary (present / absent) heatmap of distribution of microbial organisms across samples is presented in Supplementary Figure 8. To compare the microbial detection accuracy of HOPS and aMeta, we built binary heatmaps based on the microbial abundances reconstructed by HOPS and aMeta, see Supplementary Figures 9 and 10, and compared them with the ground truth in Supplementary Figure 8. For aMeta, default internal filter settings were used to determine the presence / absence of a microbe, while we used a reasonable threshold of 300 reads to binarize the HOPS microbial abundance matrix. ...
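The two binarization rules mentioned above can be sketched as follows. This is a minimal illustration, assuming that "1% of all reads" means 1% of the total reads in each sample and that a value equal to the threshold counts as present; both are assumptions, and the function names and toy counts are hypothetical.

```python
def binarize_by_fraction(abundance, fraction=0.01):
    """Mark a species present if its reads reach `fraction` of the sample total.

    abundance: {sample: {species: read_count}}
    """
    result = {}
    for sample, counts in abundance.items():
        total = sum(counts.values())
        result[sample] = {
            species: total > 0 and n >= fraction * total
            for species, n in counts.items()
        }
    return result


def binarize_by_count(abundance, min_reads=300):
    """Mark a species present if it has at least `min_reads` reads."""
    return {
        sample: {species: n >= min_reads for species, n in counts.items()}
        for sample, counts in abundance.items()
    }


# Hypothetical read counts for one sample.
toy = {"sample1": {"species_A": 950, "species_B": 45, "species_C": 5}}

by_fraction = binarize_by_fraction(toy)       # relative threshold (1%)
by_count = binarize_by_count(toy)             # absolute threshold (300 reads)
```

In this toy sample, `species_B` is present under the relative rule but absent under the absolute rule, which shows why the choice of threshold matters when two tools are compared against the same ground truth.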
Context 10
... summarize the results, we computed the accuracy of microbial detection, i.e., presence or absence irrespective of the ancient status, based on the confusion matrices for aMeta and HOPS, see Supplementary Figure 11. The confusion matrix corresponding to aMeta had a detection accuracy of 86% and was more balanced compared to the HOPS confusion matrix, which had an accuracy of 69% and a very high false-negative rate. ...
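Detection accuracy and false-negative rate follow directly from the confusion-matrix counts. The sketch below uses hypothetical presence/absence calls keyed by (sample, species); it does not reproduce the published figures, and the function names are illustrative.

```python
def confusion_counts(truth, predicted):
    """Count TP, FP, TN, FN for binary presence calls.

    truth, predicted: {(sample, species): bool}; a key missing from
    `predicted` is treated as an absence call.
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for key, is_present in truth.items():
        called = predicted.get(key, False)
        if is_present and called:
            counts["TP"] += 1
        elif is_present and not called:
            counts["FN"] += 1
        elif not is_present and called:
            counts["FP"] += 1
        else:
            counts["TN"] += 1
    return counts


def accuracy(counts):
    """Fraction of all calls that agree with the ground truth."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


def false_negative_rate(counts):
    """Fraction of truly present species that were missed."""
    positives = counts["TP"] + counts["FN"]
    return counts["FN"] / positives if positives else 0.0


# Hypothetical toy data: one truly present species is missed in sample s2.
truth = {("s1", "A"): True, ("s1", "B"): False, ("s2", "A"): True, ("s2", "B"): True}
calls = {("s1", "A"): True, ("s1", "B"): False, ("s2", "A"): False, ("s2", "B"): True}

counts = confusion_counts(truth, calls)
toy_accuracy = accuracy(counts)
toy_fnr = false_negative_rate(counts)
```

Reporting the false-negative rate alongside accuracy makes an unbalanced confusion matrix visible even when overall accuracy looks acceptable.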
Context 11
... /2022 filtering) to 800 reads (very harsh filtering), see Supplementary Figure 12. Within the tested range of read number cutoffs, the accuracy of HOPS microbial detection was consistently lower than that of aMeta and did not show substantial variation. ...
Context 12
... native HOPS scoring system includes only three quality metrics: deamination profile and edit distances for all reads and damaged reads. This not only results in many obvious false-positive discoveries, as shown in Supplementary Figure 13, but also yields too little variation in scores, which makes it difficult to align with aMeta's seven-metric scoring system when computing a ROC curve. Therefore, for a proper comparison, we had to modify the native HOPS scoring system and add reasonable assessments of depth and evenness of coverage, as well as PMD scores and read length distribution. ...

Citations

... previously developed the AncientMetagenomeDir project, a set of curated standard sample metadata for ancient host-associated shotgun-sequenced metagenomes, ancient environmental metagenomes, and/or host-associated microbial genomes. 3 However, while sample-level metadata already help with the discovery of suitable comparative data, library-level metadata are also needed to further facilitate data reuse in dedicated aDNA analysis pipelines such as PALEOMIX, 4 nf-core/eager, 5 aMeta, 6 and nf-core/mag. 7 aDNA researchers often build many different types of NGS libraries 8 and may generate (meta)genomic data using multiple different sequencing platforms that require different bioinformatic pre-processing workflows. ...
Article
Background Access to sample-level metadata is important when selecting public metagenomic sequencing datasets for reuse in new biological analyses. The Standards, Precautions, and Advances in Ancient Metagenomics community (SPAAM, https://spaam-community.org) has previously published AncientMetagenomeDir, a collection of curated and standardised sample metadata tables for metagenomic and microbial genome datasets generated from ancient samples. However, while sample-level information is useful for identifying relevant samples for inclusion in new projects, Next Generation Sequencing (NGS) library construction and sequencing metadata are also essential for appropriately reprocessing ancient metagenomic data. Currently, recovering information for downloading and preparing such data is difficult when laboratory and bioinformatic metadata is heterogeneously recorded in prose-based publications. Methods Through a series of community-based hackathon events, AncientMetagenomeDir was updated to provide standardised library-level metadata of existing and new ancient metagenomic samples. In tandem, the companion tool 'AMDirT' was developed to facilitate rapid data filtering and downloading of ancient metagenomic data, as well as improving automated metadata curation and validation for AncientMetagenomeDir. Results AncientMetagenomeDir was extended to include standardised metadata of over 6000 ancient metagenomic libraries. The companion tool 'AMDirT' provides both graphical- and command-line interface based access to such metadata for users from a wide range of computational backgrounds. We also report on errors with metadata reporting that appear to commonly occur during data upload and provide suggestions on how to improve the quality of data sharing by the community. Conclusions Together, both standardised metadata reporting and tooling will help towards easier incorporation and reuse of public ancient metagenomic datasets into future analyses.
... HOPS runs on MALT [35], which provides the options of polymerase chain reaction (PCR) duplicate removal and deamination pattern tolerance at the ends of reads. However, MALT has high memory requirements, which can exceed 1 TB of RAM when building even a modest-size genome database and processing medium-size FASTQ files [36]. Therefore, it is often used only on research clusters that can support and maintain such large databases and provide large-memory systems. ...
Article
Taxonomic profiling of ancient metagenomic samples is challenging due to the accumulation of specific damage patterns on DNA over time. Although a number of methods for metagenome profiling have been developed, most of them have been assessed on modern metagenomes or simulated metagenomes mimicking modern metagenomes. Further, a comparative assessment of metagenome profilers on simulated metagenomes representing a spectrum of degradation depth, from the extremity of ancient (most degraded) to current or modern (not degraded) metagenomes, has not yet been performed. To understand the strengths and weaknesses of different metagenome profilers, we performed their comprehensive evaluation on simulated metagenomes representing human dental calculus microbiome, with the level of DNA damage successively raised to mimic modern to ancient metagenomes. All classes of profilers, namely, DNA-to-DNA, DNA-to-protein, and DNA-to-marker comparison-based profilers were evaluated on metagenomes with varying levels of damage simulating deamination, fragmentation, and contamination. Our results revealed that, compared to deamination and fragmentation, human and environmental contamination of ancient DNA (with modern DNA) has the most pronounced effect on the performance of each profiler. Further, the DNA-to-DNA (e.g., Kraken2, Bracken) and DNA-to-marker (e.g., MetaPhlAn4) based profiling approaches showed complementary strengths, which can be leveraged to elevate the state-of-the-art of ancient metagenome profiling.
... previously developed the AncientMetagenomeDir project, a set of curated standard sample metadata for ancient host-associated shotgun-sequenced metagenomes, ancient environmental metagenomes, and/or host-associated microbial genomes. 3 However, while sample-level metadata already help with the discovery of suitable comparative data, library-level metadata is also needed to further facilitate data reuse in dedicated aDNA analysis pipelines such as PALEOMIX, 4 nf-core/eager, 5 aMeta, 6 and nf-core/mag. 7 aDNA researchers often build many different types of NGS libraries 8 and may generate (meta)genomic data using multiple different sequencing platforms that require different bioinformatic pre-processing workflows. ...
Article
Background: Access to sample-level metadata is important when selecting public metagenomic sequencing datasets for reuse in new biological analyses. The Standards, Precautions, and Advances in Ancient Metagenomics community (SPAAM, https://spaam-community.github.io) has previously published AncientMetagenomeDir, a collection of curated and standardised sample metadata tables for metagenomic and microbial genome datasets generated from ancient samples. However, while sample-level information is useful for identifying relevant samples for inclusion in new projects, Next Generation Sequencing (NGS) library construction and sequencing metadata are also essential for appropriately reprocessing ancient metagenomic data. Currently, recovering information for downloading and preparing such data is difficult when laboratory and bioinformatic metadata is heterogeneously recorded in prose-based publications. Methods: Through a series of community-based hackathon events, AncientMetagenomeDir was updated to provide standardised library-level metadata of existing and new ancient metagenomic samples. In tandem, the companion tool 'AMDirT' was developed to facilitate automated metadata curation and data validation, as well as rapid data filtering and downloading. Results: AncientMetagenomeDir was extended to include standardised metadata of over 5000 ancient metagenomic libraries. The companion tool 'AMDirT' provides both graphical- and command-line interface based access to such metadata for users from a wide range of computational backgrounds. We also report on errors with metadata reporting that appear to commonly occur during data upload and provide suggestions on how to improve the quality of data sharing by the community. Conclusions: Together, both standardised metadata and tooling will help towards easier incorporation and reuse of public ancient metagenomic datasets into future analyses.
... Furthermore, new tools have been developed to enhance the accuracy of taxonomic assignment of metagenomic reads (PIA, Cribdon et al. (2020)), and for processing and analyzing ancient metagenomics shotgun data, specifically targeting ultra-short molecules (e.g. Collin et al. (2020); Fellows Yates et al. (2021b); Pochon et al. (2022); Neuenschwander et al. (2023)). However, identifying ancient sequences is still a challenge due to the lack of standard bioinformatics pipelines to analyze DNA metabarcoding or shotgun metagenomics data. ...
Article
Sedimentary ancient DNA (sedaDNA) offers a novel retrospective approach to reconstructing the history of marine ecosystems over geological timescales. Until now, the biological proxies used to reconstruct paleoceanographic and paleoecological conditions were limited to organisms whose remains are preserved in the fossil record. The development of ancient DNA analysis techniques substantially expands the range of studied taxa, providing a holistic overview of past biodiversity. Future development of marine sedaDNA research is expected to dramatically improve our understanding of how the marine biota responded to changing environmental conditions. However, as an emerging approach, marine sedaDNA holds many challenges, and its ability to recover reliable past biodiversity information needs to be carefully assessed. This review aims to highlight current advances in marine sedaDNA research and to discuss potential methodological pitfalls and limitations.
Preprint
Metagenomic classification tackles the problem of characterising the taxonomic source of all DNA sequencing reads in a sample. A common approach to address the differences and biases between the many different taxonomic classification tools is to run metagenomic data through multiple classification tools and databases. This, however, is a very time-consuming task when performed manually, particularly when combined with the appropriate preprocessing of sequencing reads before the classification. Here we present nf-core/taxprofiler, a highly parallelised read-processing and taxonomic classification pipeline. It is designed for the automated and simultaneous classification and/or profiling of both short- and long-read metagenomic sequencing libraries against 11 taxonomic classifiers and profilers as well as databases within a single pipeline run. Implemented in Nextflow and as part of the nf-core initiative, the pipeline benefits from high levels of scalability and portability, accommodating from small to extremely large projects on a wide range of computing infrastructure. It has been developed following best-practise software development practises and community support to ensure longevity and adaptability of the pipeline, to help keep it up to date with the field of metagenomics.
Article
Analysis of microbial data from archaeological samples is a growing field with great potential for understanding ancient environments, lifestyles, and diseases. However, high error rates have been a challenge in ancient metagenomics, and the availability of computational frameworks that meet the demands of the field is limited. Here, we propose aMeta, an accurate metagenomic profiling workflow for ancient DNA designed to minimize the amount of false discoveries and computer memory requirements. Using simulated data, we benchmark aMeta against a current state-of-the-art workflow and demonstrate its superiority in microbial detection and authentication, as well as substantially lower usage of computer memory. Supplementary Information The online version contains supplementary material available at 10.1186/s13059-023-03083-9.
Article
Palaeogenomics continues to yield valuable insights into the evolution, population dynamics, and ecology of our ancestors and other extinct species. However, DNA sequencing cannot reveal tissue-specific gene expression, cellular identity, or gene regulation, only attainable at the transcriptional level. Pioneering studies have shown that useful RNA can be extracted from ancient specimens preserved in permafrost and historical skins from extant canids, but no attempts have been made so far on extinct species. We extract, sequence and analyze historical RNA from muscle and skin tissue of a ~130-year-old Tasmanian tiger (Thylacinus cynocephalus) preserved in desiccation at room temperature in a museum collection. The transcriptional profiles closely resemble those of extant species, revealing specific anatomical features such as slow muscle fibers or blood infiltration. Metatranscriptomic analysis, RNA damage, tissue-specific RNA profiles, and expression hotspots genome-wide further confirm the thylacine origin of the sequences. RNA sequences are used to improve protein-coding and noncoding annotations, evidencing missing exonic loci and the location of ribosomal RNA genes, while increasing the number of annotated thylacine microRNAs from 62 to 325. We discover a thylacine-specific microRNA isoform that could not have been confirmed without RNA evidence. Finally, we detect traces of RNA viruses, suggesting the possibility of profiling viral evolution. Our results represent the first successful attempt to obtain transcriptional profiles from an extinct animal species, providing thought-to-be-lost information on gene expression dynamics. These findings hold promising implications for the study of RNA molecules across the vast collections of Natural History museums and from well-preserved permafrost remains.