Figure 4 - available via license: Creative Commons Attribution-NoDerivatives 4.0 International
Content may be subject to copyright.
Authentication output of aMeta. Panels from left to right, top to bottom: a) edit distance computed on all assigned reads, b) edit distance computed on damaged reads, c) evenness / breadth of coverage, d) deamination pattern, e) read length distribution, f) PMD scores distribution, g) number of reads assigned with an identity to a reference, h) candidate reference sequences with percentages of mapped reads, i) MaltExtract statistics
Source publication
Analysis of microbial data from archaeological samples is a rapidly growing field with a great potential for understanding ancient environments, lifestyles and disease spread in the past. However, high error rates have been a long-standing challenge in ancient metagenomics analysis. This is also complicated by a limited choice of ancient microbiome...
Contexts in source publication
Context 1
... the mentioned quality metrics are complementary and serve for more informed decisions about presence and ancient status of microorganisms in metagenomics samples. A typical graphical output from aMeta is demonstrated in Figure 4. ...
Context 2
... the aMeta workflow we constructed and implemented a special scoring system that should facilitate getting a quick user-friendly overview of likely present ancient microbes. The score is computed per microbe and per sample, and represents a quantity that combines seven validation and authentication metrics presented graphically in Figure 4, more specifically: 1) deamination profile, 2) evenness of coverage, 3) edit distance (amount of mismatches) for all reads, 4) edit distance (amount of mismatches) for damaged reads, 5) read length distribution, 6) PMD scores distribution, 7) number of assigned reads (depth of coverage). The score assigns heavier weights to evenness of coverage as an ultimate criterion for the true presence of a microbe, and deamination profile as the most important evidence of its ancient origin. ...
Context 3
... mentioned above, the execution order of different Snakemake rules is determined by the workflow manager creating a Directed Acyclic Graph (DAG) of jobs that can be automatically parallelized. A typical DAG of aMeta run is presented in Supplementary Figure 4. ...
Context 4
... the range of scores a microbe can obtain varies from a minimum of 0 to a maximum value of 10. Below we explain how each validation and authentication metric presented in Figure 4 is quantified according to the aMeta scoring system. First, the deamination profile represents an increased frequency of transition mutations (C→T and G→A) at the terminal ends of the sequencing reads compared to the frequencies of all other possible mutations. ...
Context 5
... large contribution of evenness of coverage to the total score is due to its importance for validating the presence of a microbe in a sample. To quantify the evenness of coverage in Figure 4, we split the reference genome into 100 tiles and count the number of tiles with near zero average breadth of coverage (percent of covered bases in a tile). If fewer than 3 tiles have an average breadth of coverage of less than 1%, i.e. if a few aDNA reads are present in almost all tiles, the coverage is considered sufficiently uniform and the total score is increased by 3 points. ...
Similar publications
Large-scale microbiome studies are progressively utilizing multiomics designs, which include the collection of microbiome samples together with host genomics and metabolomics data. Despite the increasing number of data sources, there remains a bottleneck in understanding the relationships between different data modalities due to the limited number...
It is a challenge to assemble an enormous amount of metagenome data in metagenomics. Usually, metagenome cluster sequence before assembly accelerates the whole process. In SpaRC, sequences are defined as nodes and clustered by a parallel label propagation algorithm (LPA). To address the randomness of label selection from the parallel LPA during clu...
Metagenomic binning has revolutionized the study of uncultured microorganisms. Here we compare single- and multi-coverage binning on the same set of samples, and demonstrate that multi-coverage binning produces better results than single-coverage binning and identifies contaminant contigs and chimeric bins that other approaches miss. While resource...
The rapid diagnosis of infectious diseases has an essential impact on their control, treatment, and recovery. Oxford Nanopore Technologies (ONT) sequencing opens up a new dimension in applying clinical metagenomics. In a large-scale pig farm in Hungary, four fattening and one piglet nasal swab pooled samples were sequenced using ONT for metagenomic...
Citations
... previously developed the AncientMetagenomeDir project, a set of curated standard sample metadata for ancient host-associated shotgun-sequenced metagenomes, ancient environmental metagenomes, and/or host-associated microbial genomes. 3 However, while sample-level metadata already help with the discovery of suitable comparative data, library-level metadata are also needed to further facilitate data reuse in dedicated aDNA analysis pipelines such as PALEOMIX, 4 nf-core/eager, 5 aMeta, 6 and nf-core/mag. 7 aDNA researchers often build many different types of NGS libraries 8 and may generate (meta)genomic data using multiple different sequencing platforms that require different bioinformatic pre-processing workflows. ...
Background Access to sample-level metadata is important when selecting public metagenomic sequencing datasets for reuse in new biological analyses. The Standards, Precautions, and Advances in Ancient Metagenomics community (SPAAM, https://spaam-community.org) has previously published AncientMetagenomeDir, a collection of curated and standardised sample metadata tables for metagenomic and microbial genome datasets generated from ancient samples. However, while sample-level information is useful for identifying relevant samples for inclusion in new projects, Next Generation Sequencing (NGS) library construction and sequencing metadata are also essential for appropriately reprocessing ancient metagenomic data. Currently, recovering information for downloading and preparing such data is difficult when laboratory and bioinformatic metadata is heterogeneously recorded in prose-based publications. Methods Through a series of community-based hackathon events, AncientMetagenomeDir was updated to provide standardised library-level metadata of existing and new ancient metagenomic samples. In tandem, the companion tool 'AMDirT' was developed to facilitate rapid data filtering and downloading of ancient metagenomic data, as well as improving automated metadata curation and validation for AncientMetagenomeDir. Results AncientMetagenomeDir was extended to include standardised metadata of over 6000 ancient metagenomic libraries. The companion tool 'AMDirT' provides both graphical- and command-line interface based access to such metadata for users from a wide range of computational backgrounds. We also report on errors with metadata reporting that appear to commonly occur during data upload and provide suggestions on how to improve the quality of data sharing by the community. Conclusions Together, both standardised metadata reporting and tooling will help towards easier incorporation and reuse of public ancient metagenomic datasets into future analyses.
... HOPS runs on MALT [35], which provides the options of polymerase chain reaction (PCR) duplicate removal and deamination pattern tolerance at the ends of reads. However, MALT requires high computational memory, which can go over 1 TB of RAM on even a modest size genome database to build and medium-size FASTQ files to input [36]. Therefore, it is often only used in research clusters capable of supporting and maintaining such large databases and allowing usage of large memory systems. ...
Taxonomic profiling of ancient metagenomic samples is challenging due to the accumulation of specific damage patterns on DNA over time. Although a number of methods for metagenome profiling have been developed, most of them have been assessed on modern metagenomes or simulated metagenomes mimicking modern metagenomes. Further, a comparative assessment of metagenome profilers on simulated metagenomes representing a spectrum of degradation depth, from the extremity of ancient (most degraded) to current or modern (not degraded) metagenomes, has not yet been performed. To understand the strengths and weaknesses of different metagenome profilers, we performed their comprehensive evaluation on simulated metagenomes representing human dental calculus microbiome, with the level of DNA damage successively raised to mimic modern to ancient metagenomes. All classes of profilers, namely, DNA-to-DNA, DNA-to-protein, and DNA-to-marker comparison-based profilers were evaluated on metagenomes with varying levels of damage simulating deamination, fragmentation, and contamination. Our results revealed that, compared to deamination and fragmentation, human and environmental contamination of ancient DNA (with modern DNA) has the most pronounced effect on the performance of each profiler. Further, the DNA-to-DNA (e.g., Kraken2, Bracken) and DNA-to-marker (e.g., MetaPhlAn4) based profiling approaches showed complementary strengths, which can be leveraged to elevate the state-of-the-art of ancient metagenome profiling.
... previously developed the AncientMetagenomeDir project, a set of curated standard sample metadata for ancient host-associated shotgun-sequenced metagenomes, ancient environmental metagenomes, and/or host-associated microbial genomes. 3 However, while sample-level metadata already help with the discovery of suitable comparative data, library-level metadata is also needed to further facilitate data reuse in dedicated aDNA analysis pipelines such as PALEOMIX, 4 nf-core/eager, 5 aMeta, 6 and nf-core/mag. 7 aDNA researchers often build many different types of NGS libraries 8 and may generate (meta)genomic data using multiple different sequencing platforms that require different bioinformatic pre-processing workflows. ...
Background : Access to sample-level metadata is important when selecting public metagenomic sequencing datasets for reuse in new biological analyses. The Standards, Precautions, and Advances in Ancient Metagenomics community (SPAAM, https://spaam-community.github.io) has previously published AncientMetagenomeDir, a collection of curated and standardised sample metadata tables for metagenomic and microbial genome datasets generated from ancient samples. However, while sample-level information is useful for identifying relevant samples for inclusion in new projects, Next Generation Sequencing (NGS) library construction and sequencing metadata are also essential for appropriately reprocessing ancient metagenomic data. Currently, recovering information for downloading and preparing such data is difficult when laboratory and bioinformatic metadata is heterogeneously recorded in prose-based publications.
Methods : Through a series of community-based hackathon events, AncientMetagenomeDir was updated to provide standardised library-level metadata of existing and new ancient metagenomic samples. In tandem, the companion tool 'AMDirT' was developed to facilitate automated metadata curation and data validation, as well as rapid data filtering and downloading.
Results : AncientMetagenomeDir was extended to include standardised metadata of over 5000 ancient metagenomic libraries. The companion tool 'AMDirT' provides both graphical- and command-line interface based access to such metadata for users from a wide range of computational backgrounds. We also report on errors with metadata reporting that appear to commonly occur during data upload and provide suggestions on how to improve the quality of data sharing by the community.
Conclusions : Together, both standardised metadata and tooling will help towards easier incorporation and reuse of public ancient metagenomic datasets into future analyses.
... Furthermore, new tools have been developed to enhance the accuracy of taxonomic assignment of metagenomic reads (PIA, Cribdon et al. (2020), and for processing and analyzing ancient metagenomics shotgun data, specifically targeting ultra-short molecules, (e.g. Collin et al. (2020); Fellows Yates et al. (2021b); Pochon et al. (2022);Neuenschwander et al. (2023)). However, identifying ancient sequences is still a challenge due to the lack of standard bioinformatics pipelines to analyze DNA metabarcoding or shotgun metagenomics data. ...
Sedimentary ancient DNA (sedaDNA) offers a novel retrospective approach to reconstructing the history of marine ecosystems over geological timescales. Until now, the biological proxies used to reconstruct paleoceanographic and paleoecological conditions were limited to organisms whose remains are preserved in the fossil record. The development of ancient DNA analysis techniques substantially expands the range of studied taxa, providing a holistic overview of past biodiversity. Future development of marine sedaDNA research is expected to dramatically improve our understanding of how the marine biota responded to changing environmental conditions. However, as an emerging approach, marine sedaDNA holds many challenges, and its ability to recover reliable past biodiversity information needs to be carefully assessed. This review aims to highlight current advances in marine sedaDNA research and to discuss potential methodological pitfalls and limitations.
Metagenomic classification tackles the problem of characterising the taxonomic source of all DNA sequencing reads in a sample. A common approach to address the differences and biases between the many different taxonomic classification tools is to run metagenomic data through multiple classification tools and databases. This, however, is a very time-consuming task when performed manually - particularly when combined with the appropriate preprocessing of sequencing reads before the classification.
Here we present nf-core/taxprofiler, a highly parallelised read-processing and taxonomic classification pipeline. It is designed for the automated and simultaneous classification and/or profiling of both short- and long-read metagenomic sequencing libraries against a 11 taxonomic classifiers and profilers as well as databases within a single pipeline run. Implemented in Nextflow and as part of the nf-core initiative, the pipeline benefits from high levels of scalability and portability, accommodating from small to extremely large projects on a wide range of computing infrastructure. It has been developed following best-practise software development practises and community support to ensure longevity and adaptability of the pipeline, to help keep it up to date with the field of metagenomics.
Analysis of microbial data from archaeological samples is a growing field with great potential for understanding ancient environments, lifestyles, and diseases. However, high error rates have been a challenge in ancient metagenomics, and the availability of computational frameworks that meet the demands of the field is limited. Here, we propose aMeta, an accurate metagenomic profiling workflow for ancient DNA designed to minimize the amount of false discoveries and computer memory requirements. Using simulated data, we benchmark aMeta against a current state-of-the-art workflow and demonstrate its superiority in microbial detection and authentication, as well as substantially lower usage of computer memory.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13059-023-03083-9.
Palaeogenomics continues to yield valuable insights into the evolution, population dynamics, and ecology of our ancestors and other extinct species. However, DNA sequencing cannot reveal tissue-specific gene expression, cellular identity, or gene regulation, only attainable at the transcriptional level. Pioneering studies have shown that useful RNA can be extracted from ancient specimens preserved in permafrost and historical skins from extant canids, but no attempts have been made so far on extinct species. We extract, sequence and analyze historical RNA from muscle and skin tissue of a ~130-year-old Tasmanian tiger (Thylacinus cynocephalus) preserved in desiccation at room temperature in a museum collection. The transcriptional profiles closely resemble those of extant species, revealing specific anatomical features such as slow muscle fibers or blood infiltration. Metatranscriptomic analysis, RNA damage, tissue-specific RNA profiles, and expression hotspots genome-wide further confirm the thylacine origin of the sequences. RNA sequences are used to improve protein-coding and noncoding annotations, evidencing missing exonic loci and the location of ribosomal RNA genes, while increasing the number of annotated thylacine microRNAs from 62 to 325. We discover a thylacine-specific microRNA isoform that could not have been confirmed without RNA evidence. Finally, we detect traces of RNA viruses, suggesting the possibility of profiling viral evolution. Our results represent the first successful attempt to obtain transcriptional profiles from an extinct animal species, providing thought-to-be-lost information on gene expression dynamics. These findings hold promising implications for the study of RNA molecules across the vast collections of Natural History museums and from well-preserved permafrost remains.