MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis

Springer Nature
Microbiome
SOFTWARE Open Access
Gherman V. Uritskiy, Jocelyne DiRuggiero* and James Taylor*
Abstract
Background: The study of microbiomes using whole-metagenome shotgun sequencing enables the analysis of
uncultivated microbial populations that may have important roles in their environments. Extracting individual draft
genomes (bins) facilitates metagenomic analysis at the single genome level. Software and pipelines for such analysis
have become diverse and sophisticated, resulting in a significant burden for biologists to access and use
them. Furthermore, while bin extraction algorithms are rapidly improving, there is still a lack of tools for their
evaluation and visualization.
Results: To address these challenges, we present metaWRAP, a modular pipeline software for shotgun metagenomic
data analysis. MetaWRAP deploys state-of-the-art software to handle metagenomic data processing starting from raw
sequencing reads and ending in metagenomic bins and their analysis. MetaWRAP is flexible enough to give
investigators control over the analysis, while still being easy-to-install and easy-to-use. It includes hybrid
algorithms that leverage the strengths of a variety of software to extract and refine high-quality bins from
metagenomic data through bin consolidation and reassembly. MetaWRAP's hybrid bin extraction algorithm
outperforms individual binning approaches and other bin consolidation programs in both synthetic and real
data sets. Finally, metaWRAP comes with numerous modules for the analysis of metagenomic bins, including
taxonomy assignment, abundance estimation, functional annotation, and visualization.
Conclusions: MetaWRAP is an easy-to-use modular pipeline that automates the core tasks in metagenomic
analysis, while contributing significant improvements to the extraction and interpretation of high-quality
metagenomic bins. The bin refinement and reassembly modules of metaWRAP consistently outperform
other binning approaches. Each module of metaWRAP is also a standalone component, making it a flexible
and versatile tool for tackling metagenomic shotgun sequencing data. MetaWRAP is open-source software
available at https://github.com/bxlab/metaWRAP.
Keywords: Metagenomics, WGS, Metagenome, Binning, Bin, Draft genome, Pipeline, Reassembly
Background
The study of microbial communities through whole-
metagenome (WMG) shotgun sequencing opens new ave-
nues for the investigation of the metabolic potentials of
microbiomes, in addition to their taxonomic composition
[1–3]. This greatly improves the ability to interpret and
predict functional interactions, antibiotic resistance, and
population dynamics of microbiomes, with applications in
human health, waste treatment, agriculture, and environmental
stewardship [4–6]. WMG shotgun sequencing reads from hundreds
to thousands of community members generate unique challenges
for data analysis and interpretation [3,7]. Software tools for
WMG data analysis have grown in number and complexity,
improving our ability to
process, analyze, and interpret such data [8–12]. However,
these tools are burdensome for biologists to work with. As
the field of WMG expands, comprehensive and accessible
software for unified analysis of metagenomic data is
needed [7,11].
* Correspondence: jdiruggiero@jhu.edu;james@taylorlab.org
Department of Biology, Johns Hopkins University, 3400 N Charles St.,
Baltimore, MD 21218, USA
© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Uritskiy et al. Microbiome (2018) 6:158
https://doi.org/10.1186/s40168-018-0541-1
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
Running a WMG analysis requires investigators to find
the best currently available tools, install and configure
them, address conflicting libraries and environment vari-
ables, and write scripts to convert outputs from one tool
into the correct format to input into the next tool [13,
14]. These challenges present a major burden to anyone
attempting metagenomic analysis, especially for investi-
gators without computational experience, hindering pro-
gress of microbial genomics as a field [15]. Existing
automated pipelines and cloud services lack modularity,
do not give users control over the analysis, and often
lack functions for genome-resolved metagenomics, the
extraction of putative genomes (bins) through the binning
of metagenomic assemblies [14,16–19].
Genome-resolved metagenomics allows for recon-
struction of the functional potential of individual taxa
and microbiome comparison at a finer scale. While a
number of sophisticated tools such as CONCOCT, MaxBin,
and metaBAT have been developed to address binning,
it is still an actively improving field [9,19–21].
Qualities of a metagenomic bin are (1) completion, the
level of coverage of a population genome, and (2) con-
tamination, the amount of sequence that does not be-
long to this population from another genome. These
metrics can be estimated by counting universal
single-copy genes within each bin [22,23]. CheckM im-
proves on this by checking for single-copy genes that a
genome of the bin's taxonomy is expected to have [24].
The percentage of expected single-copy genes that is
found in a bin is interpreted as its completion, while the
contamination is estimated from the percentage of
single-copy genes that are found in duplicate.
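The marker-gene logic described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: the function name and input format are hypothetical, and CheckM itself works with lineage-specific, collocated marker sets rather than the flat list used here.

```python
from collections import Counter


def estimate_quality(marker_hits, expected_markers):
    """Estimate completion and contamination (both in percent) for one bin.

    marker_hits: list of marker gene IDs detected in the bin, one entry per copy.
    expected_markers: set of single-copy marker IDs expected for the lineage.
    Hypothetical helper for illustration; not part of CheckM or metaWRAP.
    """
    counts = Counter(m for m in marker_hits if m in expected_markers)
    n_expected = len(expected_markers)
    # Completion: share of expected markers seen at least once.
    found = sum(1 for m in expected_markers if counts[m] >= 1)
    # Contamination: share of expected markers seen more than once.
    duplicated = sum(1 for m in expected_markers if counts[m] >= 2)
    return 100.0 * found / n_expected, 100.0 * duplicated / n_expected


expected = {"rpoB", "gyrA", "recA", "dnaK"}
hits = ["rpoB", "gyrA", "gyrA", "recA"]
completion, contamination = estimate_quality(hits, expected)
```

In this toy input three of four expected markers are present and one of them is duplicated, so the bin would be scored as three-quarters complete with one-quarter contamination.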
Most metagenomic binning tools extract bins by clus-
tering together scaffolds that have similar sequence
properties, such as K-mer composition and codon usage,
and similar read coverages across multiple samples [25,
26]. Because no single binning approach is superior in
every case, bin consolidation tools attempt to combine
the strengths and minimize the weaknesses of different
approaches. DAS_Tool predicts single-copy genes in all
the provided bin sets, aggregates bins from different bin-
ning predictions, and extracts a more complete consen-
sus bin from each aggregate such that the resulting bin
has the most single-copy genes while having a reason-
ably low number of duplicate genes [27]. This collapsing
approach significantly improves the completion of the
bins. Binning_refiner, on the other hand, splits the con-
tigs into more bins such that no two contigs are in the
same bin if they were in different bins in any of the ori-
ginal bin sets. This breaks the contigs into many more
bins, reducing contamination [28]. Both of these ap-
proaches consolidate sets of bins from different methods
and result in a superior bin set, but they have limitations:
DAS_Tool increases completion at the expense of
introducing contamination, while Binning_refiner prioritizes
purity but loses completeness. Another way to improve
draft genome quality that is relatively unexplored
is bin reassembly, that is, extracting reads that belong to a
given bin and assembling them separately from the rest
of the metagenome. With proper benchmarking, this ap-
proach could significantly improve the quality and
downstream functional annotation of at least some bins
in a microbial community.
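The splitting strategy attributed to Binning_refiner above can be sketched in a few lines of Python. This is an illustration of the idea only, not the tool's code: the function name and the contig-to-bin dictionaries are hypothetical, and the sketch assumes that contigs missing from any input bin set are simply dropped.

```python
from collections import defaultdict


def split_bins(*bin_sets):
    """Group contigs so that two contigs share a hybrid bin only if they
    share a bin in every input bin set.

    Each bin set is a dict mapping contig ID -> bin label.
    Returns a dict mapping a tuple of labels (one per bin set) -> set of contigs.
    """
    # Assumption: only contigs binned by every method are considered.
    shared = set(bin_sets[0])
    for bin_set in bin_sets[1:]:
        shared &= set(bin_set)

    hybrid = defaultdict(set)
    for contig in shared:
        key = tuple(bin_set[contig] for bin_set in bin_sets)
        hybrid[key].add(contig)
    return dict(hybrid)


set_a = {"c1": "A1", "c2": "A1", "c3": "A2"}
set_b = {"c1": "B1", "c2": "B2", "c3": "B2"}
hybrid_bins = split_bins(set_a, set_b)
```

Here contigs c1 and c2 are together in the first bin set but apart in the second, so they end up in separate hybrid bins. This is why the approach lowers contamination while fragmenting bins and reducing completion.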
Because the field of shotgun metagenomics is relatively
new, there is a lack of software to inspect, analyze,
and visualize metagenomic bins. While there are tools
that can accurately predict the taxonomy of metage-
nomic scaffolds (such as Taxator-tk), there is no tool
to classify entire metagenomic bins [29,30]. Similarly,
there are many ways to estimate the coverage of
scaffolds based on read alignment depth but no way
to find the coverages of entire bins across many sam-
ples [31,32]. Finally, there is no tool to visualize draft
genomes in context of whole metagenomic communi-
ties. The need for an easy-to-use integrated tool for
WMG data analysis, as well as the lack of available
tools for metagenomic bin analysis, motivated the con-
struction of MetaWRAP.
Implementation
Main wrapper function
MetaWRAP is command line software for Unix-based
systems that calls on a collection of modules, each being
a standalone program addressing one aspect of WMG
data processing or analysis (Fig. 1). Each module is a
shell script pipeline that takes in a variety of input file
parameters through command line flags. For detailed
outlines of the algorithms behind each module, see
supplementary material (Additional file 1). The modules
call upon numerous installed software packages as well as custom
Python 2.7 scripts (Additional file 2: Figure S1). MetaWRAP
relies on the module folder (metawrap-modules),
the script folder (metaWRAP-scripts), and a file containing
paths to databases (config-metawrap) to be available
in the PATH. MetaWRAP is hosted on GitHub
(https://github.com/bxlab/metaWRAP), distributed through
Anaconda [33], and can be easily installed locally and on
remote clusters. The metawrap-mg conda package
(https://anaconda.org/ursky/metawrap-mg) includes metaWRAP
and the necessary software for running any metaWRAP
modules. The databases required by some modules need to
be downloaded and unpacked as described in the metaWRAP
database download guide
(https://github.com/bxlab/metaWRAP/blob/master/installation/database_installation.md)
and their paths indicated in the config-metawrap
file. MetaWRAP v0.7 was used in all benchmarking runs.
Bin_refinement module
The metaWRAP-Bin_refinement module produces a su-
perior bin set from multiple original binning predictions.
First, hybrid bin sets are produced with Binning_refiner
[28], which splits the contigs such that no two contigs
are together if they were in different bins in any of the
original sets. Then, the module goes over the different
variants of each bin found in the original and hybridized
bin sets and chooses its best version based on completion
and contamination metrics estimated with
CheckM [24]. The decision of the "best bin" is based
on the user-provided minimum completion and maximum
contamination parameters. The contigs in the
final bin set are then de-replicated, and a report of
their completion, contamination, and other metrics is
produced (Additional file 3: Figure S2). See supplementary
methods (Additional file 1) for more details on the
Bin_refinement module.
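To make the selection step concrete, the following Python sketch picks one version of a bin from several candidates given minimum completion and maximum contamination thresholds. The scoring rule used to rank the passing versions (completion minus five times contamination) is an assumption made for illustration and is not claimed to be the rule the module applies; the function and field names are likewise hypothetical.

```python
def pick_best_version(versions, min_completion=50.0, max_contamination=10.0):
    """Return the preferred version of a bin, or None if none passes.

    versions: list of dicts with 'name', 'completion', 'contamination' (percent).
    Ranking by completion - 5 * contamination is an illustrative assumption.
    """
    passing = [
        v for v in versions
        if v["completion"] >= min_completion
        and v["contamination"] <= max_contamination
    ]
    if not passing:
        return None
    return max(passing, key=lambda v: v["completion"] - 5.0 * v["contamination"])


versions = [
    {"name": "original", "completion": 92.0, "contamination": 8.0},
    {"name": "hybrid", "completion": 85.0, "contamination": 1.0},
    {"name": "fragment", "completion": 40.0, "contamination": 0.0},
]
default_choice = pick_best_version(versions)                      # -c 50 -x 10
strict_choice = pick_best_version(versions, min_completion=90.0)  # -c 90 -x 10
```

With the default thresholds the purer hybrid version wins, whereas raising the minimum completion to 90% leaves only the original version eligible. The same candidates can therefore yield different final bins depending on the thresholds supplied.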
Reassemble_bins module
The metaWRAP-Reassemble_bins module improves a
set of bins by individually re-assembling each bin
(Additional file 4: Figure S3). Reads are mapped to the
bins with BWA v0.7.15 [32] strictly (no mismatches) and
permissively (< 5 mismatches) and stored into their
respective FastQ files. Importantly, read pairs will be
pulled out even if only one read is aligned to the bin.
Each read set is then reassembled with SPAdes [34],
which produces more contiguous sequences compared
to metagenomic assemblers such as MegaHit [35] and
metaSPAdes [36] used in the Assembly modules.
CheckM [24] is used to evaluate the completion and
contamination of each of the three versions of each
bin (the original bin, the "strict" re-assembled bin, and the
"permissive" re-assembled bin), and the best version of
each bin is chosen for the final bin set based on the
user-defined desired bin quality. See supplementary
methods (Additional file 1) for more details on the
Reassemble_bins module.
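The read recruitment rule described above (a pair is kept if at least one mate aligns, with zero mismatches for the strict set and fewer than five for the permissive set) can be sketched as follows. The input format is hypothetical: each pair carries the mismatch count of each mate's alignment to the bin, or None when the mate did not align. Parsing of the BWA output is deliberately left out.

```python
def pair_is_recruited(mate1_mismatches, mate2_mismatches, max_mismatches):
    """True if at least one mate aligned with <= max_mismatches mismatches.

    A value of None means the mate did not align to the bin.
    """
    return any(
        mm is not None and mm <= max_mismatches
        for mm in (mate1_mismatches, mate2_mismatches)
    )


def sort_pairs(pairs):
    """Split read pairs into strict and permissive sets.

    pairs: iterable of (pair_name, mate1_mismatches, mate2_mismatches).
    """
    strict, permissive = [], []
    for name, mm1, mm2 in pairs:
        if pair_is_recruited(mm1, mm2, 0):   # strict: no mismatches
            strict.append(name)
        if pair_is_recruited(mm1, mm2, 4):   # permissive: fewer than 5 mismatches
            permissive.append(name)
    return strict, permissive


pairs = [("p1", 0, None), ("p2", 3, 6), ("p3", None, None), ("p4", 5, 7)]
strict_pairs, permissive_pairs = sort_pairs(pairs)
```

Pair p1 is recruited to both sets even though only one mate aligned, which mirrors the point that whole pairs are pulled out when a single read maps to the bin.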
Results and discussion
MetaWRAP is a flexible, modular pipeline
The metaWRAP installation produces a bioinformatics
environment with over 150 commonly used bioinformat-
ics software and libraries (Additional file 2: Figure S1).
MetaWRAP itself is a collection of modules, each of
which uses a variety of pre-existing and newly developed
software and databases to accomplish a specific step of
metagenomic analysis. Unlike existing metagenomic
wrappers and cloud services, metaWRAP retains modu-
larity and grants the user control of the analysis pipeline.
The user may follow the intuitive workflow starting from
raw metagenomic shotgun sequencing reads all the way
to high-quality draft genomes and their functional anno-
tation or use only specific functions, as each module is
also a standalone program (Fig. 1).
First, the metaWRAP-Read_qc module trims the raw
sequence reads, removes human contamination, and
produces quality reports for each of the sequenced sam-
ples. The reads from all given samples can then be as-
sembled with the metaWRAP-Assembly module using
MegaHit [35] or metaSPAdes [36], which also produces
an assembly report. Both the reads from each sample
and the assembly can be rapidly taxonomically profiled
with the Kraken [29] module, producing interactive kronagrams
[37] of community taxonomy. It should be noted that while Kraken
is fast, post-classification standardization may be needed to
obtain a more accurate community composition estimate [38].
The assembly is then binned with the metaWRAP-Binning module by
three metagenomic binning tools: MaxBin2, metaBAT2, and
CONCOCT [19–21].
Fig. 1 Overall workflow of metaWRAP. Modules (red), metagenomic data (green), intermediate (orange) and final bin sets (yellow), and data reports
and figures (blue)
The other modules of metaWRAP focus on refining,
analyzing, and visualizing metagenomic bins from either
the Binning module or other sources. The metaWRAP-
Bin_refinement module consolidates multiple binning
predictions into a new, improved bin set, while also providing
metrics of their completion and contamination.
MetaWRAP-Reassemble_bins can then be used to reassemble
the reads belonging to each bin, improving their N50,
completion, and contamination. The resulting bins can be
visualized by using the metaWRAP-Blobology module
[39], which plots the contigs of the joint assembly on a
blob plot, annotating them with their taxonomy and bin
membership. The metaWRAP-Quant_bins module can be
used to quickly estimate the abundance of each bin in
each of the metagenomic samples. MetaWRAP-Classify_-
bins can be used to conservatively but accurately estimate
their taxonomy. Finally, the bins can be functionally anno-
tated with the metaWRAP-Annotate_bins module.
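As an illustration of how a per-bin abundance could be derived from per-contig values, the sketch below takes a length-weighted average of contig abundances within one sample. This weighting is an assumption made for the example, and the function and variable names are hypothetical; the text above does not specify how the Quant_bins module aggregates contig abundances.

```python
def bin_abundance(contig_lengths, contig_abundance, bin_contigs):
    """Length-weighted mean abundance of the contigs in one bin for one sample.

    contig_lengths: dict contig ID -> length in bp.
    contig_abundance: dict contig ID -> abundance estimate in the sample.
    bin_contigs: iterable of contig IDs belonging to the bin.
    """
    contigs = list(bin_contigs)
    total_length = sum(contig_lengths[c] for c in contigs)
    weighted_sum = sum(contig_lengths[c] * contig_abundance[c] for c in contigs)
    return weighted_sum / total_length


lengths = {"c1": 1000, "c2": 3000}
abundance = {"c1": 10.0, "c2": 2.0}
value = bin_abundance(lengths, abundance, ["c1", "c2"])
```

Weighting by length keeps short, unusually high-coverage contigs from dominating the estimate for the bin as a whole.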
Compute time of metaWRAP modules
The runtime of each of metaWRAP's modules was evaluated
on a subset of the Human Intestinal Tract (MetaHIT) survey
[40]. The same subset is used in the metaWRAP
tutorial page (https://github.com/bxlab/metaWRAP/blob/master/Usage_tutorial.md).
The data contained three
WMG samples, totaling 145.8 million 75 bp paired-end
reads, or 21.9 Gbp of sequencing data. MetaWRAP was
used to analyze this data set on a Linux server with 24
cores and 100 GB of RAM. All modules were run on de-
fault settings, and the total runtime of each module was
recorded (Additional file 5: Module runtime). The entire
pipeline was completed in 5 h and 36 min, with the major-
ity of compute time dedicated to the Read_qc, Binning,
Bin_refinement, and Reassemble_bins modules. With the
exception of CONCOCT [19], the programs wrapped into
metaWRAP can take advantage of multi-core systems and
scale well with larger data sets. MetaWRAP itself also par-
allelizes processes when possible.
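The stated data volume can be checked with simple arithmetic. The short Python calculation below assumes that the 145.8 million figure counts read pairs (two 75 bp reads each), since that is the reading under which the total matches the reported 21.9 Gbp.

```python
READ_PAIRS = 145.8e6      # assumed to be read pairs, not individual reads
READ_LENGTH_BP = 75
READS_PER_PAIR = 2

total_bp = READ_PAIRS * READS_PER_PAIR * READ_LENGTH_BP
total_gbp = total_bp / 1e9
print(f"{total_gbp:.2f} Gbp")
```

If the figure instead counted individual reads, the total would be roughly half of the reported value.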
MetaWRAP-Bin_refinement improved bin predictions in
synthetic data
To test the efficacy of the metaWRAP-Bin_refinement
module at consolidating and improving bin sets, we used
synthetic metagenomic data sets of varying complexity
from the Critical Assessment of Metagenomic Interpretation
(CAMI) study [9]. The "gold standard" assemblies
from the "high," "medium," and "low" diversity challenges
were first binned with metaBAT2, MaxBin2, and
CONCOCT [19–21] using the metaWRAP-Binning
module, and the resulting three bin sets were then con-
solidated with DAS_Tool [27], Binning_refiner [28], and
metaWRAP-Bin_refinement. The completion and con-
tamination of the bins in the original and refined bin
sets were evaluated with CheckM [24] (Additional file 6:
Figure S4) and Amber [41] (Additional file 7: Figure S5).
True recall and precision for each bin calculated with
Amber were converted to completion and contamination
percentages to be comparable to the CheckM results
(Fig. 2). We found that metaBAT2 consistently outper-
formed MaxBin2 and CONCOCT, producing a total of
385 high-quality bins between all the challenges (com-
pletion greater than 90% and contamination less than
5%) and 271 near-perfect bins (completion greater than
95% and contamination less than 1%). MaxBin2 came in
second with 275 high-quality bins and 164 near-perfect
bins. CONCOCT performed rather poorly in all but the
smallest CAMI challenge data sets, producing 58
high-quality bins and 40 near-perfect bins.
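The conversion and the two quality tiers used in this comparison can be written down as a short Python sketch. The mapping from Amber's recall and precision to percentages (completion as recall, contamination as one minus precision) is the natural reading of the text but is stated here as an assumption; the function names are hypothetical.

```python
def to_completion_contamination(recall, precision):
    """Convert recall/precision fractions (0-1) to percentages.

    Assumption: completion = recall, contamination = 1 - precision.
    """
    return 100.0 * recall, 100.0 * (1.0 - precision)


def quality_tier(completion, contamination):
    """Return the strictest tier a bin satisfies, using the thresholds above."""
    if completion > 95.0 and contamination < 1.0:
        return "near-perfect"
    if completion > 90.0 and contamination < 5.0:
        return "high-quality"
    return "other"


tiers = [
    quality_tier(*to_completion_contamination(recall, precision))
    for recall, precision in [(0.97, 0.995), (0.92, 0.97), (0.80, 0.90)]
]
```

Because every near-perfect bin also meets the high-quality thresholds, a tally of high-quality bins in the sense used above would count both of the first two tiers returned by this function.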
In the consolidated bin sets, DAS_Tool produced 426
high-quality bins and 263 near-perfect bins across all
CAMI challenges, while Binning_refiner produced 289
and 210 bins, respectively. DAS_Tool consistently pro-
duced high-completion bins; however, these bins had
relatively high contamination, which is a result of the ag-
gregation approach that DAS_Tool takes. Binning_refi-
ner on the other hand produced very pure bins with its
splitting approach; however, it did so at the expense of
significantly reduced completion. MetaWRAP-Bin_re-
finement produced bins that had both high completion
and low contamination. In total, it produced 457 high-
quality bins and 339 near-perfect bins (Fig. 2) due to
both splitting and aggregation steps. These results con-
firmed that metaWRAP not only consistently improved
bin sets through its consolidation approach, but it also
outperformed other consolidation algorithms in data sets
of varying complexity.
The CAMI challenge consisted of genomes of varying
degrees of similarity and categorized the genomes into
two broad categories depending on their average nucleo-
tide identity (ANI) to other genomes in the mix.
"Unique strains" are defined as genomes with < 95%
ANI to any other genome and "common strains" as genomes
with ≥ 95% ANI to another genome in the data
set [9]. This gave us an opportunity to benchmark metaWRAP
at recovering genomes from contig clusters of
varying complexity. We found that metaWRAP outper-
formed all other binning methods in reconstituting both
closely and distantly related genomes (Additional file 8:
CAMI binning summary table). Interestingly, we found
that Binning_refiner performed almost as well as meta-
WRAP in distantly related genomes but performed
poorly in closely related genomes. On the other hand,
DAS_Tool recovered almost as many closely related ge-
nomes as metaWRAP but performed relatively poorly in
more discrete genomes.
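The strain categories used in this comparison reduce to a single threshold on ANI. The sketch below assumes that the highest ANI of each genome to any other genome in the data set has already been computed; the function name and input format are hypothetical.

```python
def classify_strains(max_ani_to_other, threshold=95.0):
    """Label each genome as 'common' (ANI >= threshold) or 'unique' (below it).

    max_ani_to_other: dict genome ID -> highest ANI (%) to any other genome.
    """
    return {
        genome: "common" if ani >= threshold else "unique"
        for genome, ani in max_ani_to_other.items()
    }


labels = classify_strains({"g1": 99.2, "g2": 95.0, "g3": 81.4})
```

Splitting the benchmark genomes this way makes it possible to report recovery separately for closely and distantly related genomes, as done in the comparison above.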
The use of CheckM (Additional file 6: Figure S4) and
Amber (Fig. 2) to evaluate the binning sets produced
similar results, although overall CheckM slightly overes-
timated both completion and contamination of the pro-
duced bins. More importantly, the relative performance
of the six binning approaches was the same when evalu-
ating with CheckM or Amber. This validated the use of
CheckM for benchmarking binning results in data sets
where the true genomes remain unknown.
Benchmarking metaWRAP on real metagenomes
MetaWRAP's performance was also assessed with real
WMG Illumina paired read sequencing data, using rep-
resentative metagenomic data sets from water, gut, and
soil microbiomes. The water data set was from a brack-
ish water survey, which investigated the seasonal dynam-
ics and biogeography of the surface bacterioplankton in
the Baltic Sea [42]. This data set included 36 samples for
a total of 196 Gbp of sequencing data. The gut data set
came from the Metagenomics of the Human Intestinal
Tract (MetaHIT) survey, which sequenced the gut
microbiota from volunteers across Europe to explore the
diversity and drivers in individual gut microbiome composition
[40]. The benchmarking data set consisted of
50 samples for a total of 144 Gbp of sequencing data.
The soil data came from sequencing the highly diverse
grassland soil microbial communities from Angelo
Coastal Reserve, CA [27]. This data set consisted of six
samples for a total of 481 Gbp of sequencing data.
Samples from each microbiome type were pre-proc-
essed through the metaWRAP-Read_qc module to trim
reads and remove human contamination, and the Kra-
ken and Blobology modules were used to evaluate the
taxonomic profile of the communities. The water sam-
ples were dominated by Proteobacteria, the gut samples
were dominated by Bacteroidetes and Firmicutes, and
the soil samples comprised a wide variety of Proteobacteria
and Actinobacteria (Additional file 9). Notably,
contigs from the soil microbiomes had much higher GC
content compared to those of the gut and water. Also,
soil contigs did not form as many defined clusters on
the GC vs. abundance plot, suggesting that the commu-
nities were comprised of multiple closely related taxa
(Fig. 3). Due to the high GC content and high taxonomic
similarity of soil microbiota, this data set posed signifi-
cant binning challenges compared to the water and gut
microbiomes.
Fig. 2 True completion and contamination of bins recovered from the CAMI high, medium, and low complexity synthetic data sets using
original binning software (metaBAT2, MaxBin2, and CONCOCT) and software consolidating the original sets (DAS_Tool, Binning_refiner, and
metaWRAP's Bin_refinement module). Only bins with > 50% completion and < 10% contamination are shown
Bin_refinement improved bin predictions in real data
The quality-controlled reads from the representative
metagenomic data sets were then co-assembled with the
metaWRAP-Assembly module and the assemblies
binned with metaBAT2, MaxBin2, and CONCOCT using
the metaWRAP-Binning module. The resulting three bin
sets for each microbiome type were consolidated with
DAS_Tool, Binning_refiner, and metaWRAP-Bin_refinement,
and the completion and contamination of the
resulting bins were evaluated with CheckM (Fig. 4).
Across the original binning software, metaBAT2 consist-
ently produced the best sets of bins when compared to
MaxBin2 and CONCOCT, with 202, 146, and 88 acceptable
quality bins (completion > 50%, contamination < 10%) in the water,
gut, and soil samples, respectively. MaxBin2 had 151, 98,
and 40 bins, and CONCOCT 65, 121, and 39 bins.
Despite incorporating all the binning methods, DAS_-
Tool was unable to improve the original bin sets, produ-
cing 198, 130, and 63 acceptable quality bins in the
water, gut, and soil samples, respectively. DAS_Tool performed
relatively well at higher bin completion ranges
(> 80%), although at the expense of increased contamination.
Binning_refiner performed similarly, with 206,
138, and 83 bins in the water, gut, and soil data sets, re-
spectively. The bins from Binning_refiner were less
complete but also had significantly lower contamination
than bins in the original bin sets. MetaWRAP's Bin_refinement
module produced 235, 175, and 134 acceptable
quality bins in the water, gut, and soil samples, respect-
ively, significantly outperforming all other tested ap-
proaches. The module uses Binning_refiner in its
pipeline to hybridize the input bin sets and then chooses
the best version of each bin from the original and hy-
bridized sets. Because the Bin_refinement module lever-
ages the strength of Binning_refiner but still has a
collapsing step similar to DAS_Tool, it is able to match
DAS_Tool's high-completion rankings, while retaining
the low-contamination rankings of Binning_refiner.
Overall, MetaWRAP consistently produced the highest
quality bin sets in all the tested metagenomic data sets,
which ranged greatly in diversity, taxonomic compos-
ition, and sequencing depths.
It is important to note that the use of metaWRAP's
Bin_refinement module to improve binning predictions
is not limited to the bin sets produced from the
metaWRAP-Binning module (metaBAT2, MaxBin2, and
CONCOCT). Bin sets from any two or three binning
software may be used as input for the module. Further-
more, because the algorithm leverages the differences
between the input bin predictions, it is also possible to
use bin sets produced from different parameters of the
same software as input.
Bin_refinement adjusts to the desired bin quality
To consolidate the original and hybridized bin sets,
metaWRAP-Bin_refinement chooses the best version of
each bin based on their completion and contamination
values. However, this selection is subjective and depends
on what the user believes to be the "best bin." The minimum
completion (-c) and maximum contamination (-x)
options are key parameters that greatly alter the quality
of the bins produced, as the module will dynamically ad-
just its algorithms to produce the maximum number of
bins in this range.
Fig. 3 GC vs. abundance plots of contigs from the water, gut, and soil metagenomes, produced with the Blobology module. Abundance
of contigs was calculated from standardized read coverage in each sample. Contigs were annotated with their phylum taxonomy, as
determined by BLAST
To demonstrate the effects of changing the -c and -x
parameters of metaWRAP's Bin_refinement module, we
ran the original bin sets from the water, gut, and soil
data sets with varying minimum completion (but fixed
maximum contamination) (Additional file 10: Figure
S6) and varying maximum contamination (but fixed
minimum completion) (Additional file 11: Figure S7)
parameters. When compared to the original Bin_refine-
ment run (-c 50 -x 10), the module produced a greater
number of bins at any given threshold when it was
given custom -c and -x parameters. The improvements
were especially noticeable at higher completion and
lower contamination ranges. For example, MetaWRAP-Bin_refinement
-c 90 -x 10 recovered 19, 18, and 1
(water, gut, and soil, respectively) extra bins with a
minimum completion of 90%, when compared to the
baseline -c 50 -x 10 run. Similarly, MetaWRAP-Bin_refinement
with -c 50 -x 1 parameters extracted 8, 21,
and 4 (water, gut, and soil, respectively) more bins at a
maximum contamination of 1%, when compared to the
baseline run. Unlike the arbitrary and sometimes confusing
thresholding parameters in many other software tools, the
minimum completion and maximum contamination
options offer the user an intuitive way to parameterize
metaWRAP's Bin_refinement module to their needs.
This leads to significant increases in the number of
quality bins they are able to extract from their data.
It is important to note that while refinement of bin-
ning predictions results in high-quality bins when evalu-
ated with single-copy gene numbers, they do not
represent the genomes of single individuals in a commu-
nity or even individual strains. In this context, a bin is
simply the optimized taxonomic clustering of contigs,
which themselves are representative consensus resulting
from the clustering of reads belonging to closely related
taxa. In the context of phylogeny, bins may represent in-
dividual strains, species, or even higher-order averaged
taxa, depending on the level of heterogeneity of the
community in question. In the literature, bins are sometimes
referred to as population genomes [43], underscoring
the complex nature of bins. As described in the context
of the CAMI challenge, the analysis of a community
with mostly "unique strains," i.e., distantly related organisms,
will result in bins potentially representing species
or even strains, whereas the analysis of a community
with mostly "common strains," i.e., closely related organisms,
will result in more hybrid bins. In reality, most
communities are an assemblage of both closely and dis-
tantly related taxa resulting in a range of bin qualities.
Because of this, contamination resulting from strain
heterogeneity is expected [44], and the desired bin
quality can be tailored to the requirements of the down-
stream applications. For accurate taxonomic assignment
of bins, a low contamination is important (1–5%) but a
high completion may not be (20–50% may be sufficient).
For accurate reconstruction of metabolic potential on
the other hand, it is more important to reconstruct the
genome with a higher completion (90–98%), even at the
expense of introducing contamination (5–10%), as long
as the user understands that the resulting bins represent
the averaging of closely related taxa. The parameterization
will also be constrained by the characteristics of the
microbiome in question. Communities with relatively low
diversity, low strain heterogeneity, and low GC content
(such as gut microbiomes) will yield bins with lower contamination
and higher completion than those extracted
from a community with high diversity, heterogeneity, and
average GC content (such as soil microbiomes).
Fig. 4 Completion and contamination of bins recovered from the water, gut, and soil metagenomes using original binning software (metaBAT2,
MaxBin2, and CONCOCT) and software consolidating the original sets (DAS_Tool, Binning_refiner, and metaWRAP's Bin_refinement module). Only
bins with > 50% completion and < 10% contamination are shown (estimated by CheckM)
Uritskiy et al. Microbiome (2018) 6:158 Page 7 of 13
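The idea of tailoring bin quality to the downstream application can be illustrated with a minimal Python sketch. The `Bin` record, the `PROFILES` table, and the `select_bins` helper are hypothetical names introduced here, and the cutoffs are simply values picked from within the ranges discussed above; they are not metaWRAP defaults.

```python
from typing import Dict, List, NamedTuple


class Bin(NamedTuple):
    name: str
    completion: float      # percent, e.g., as estimated by CheckM
    contamination: float   # percent, e.g., as estimated by CheckM


# Hypothetical application profiles. The cutoffs are illustrative values
# chosen from within the ranges discussed in the text.
PROFILES: Dict[str, Dict[str, float]] = {
    "taxonomy": {"min_completion": 20.0, "max_contamination": 5.0},
    "metabolism": {"min_completion": 90.0, "max_contamination": 10.0},
}


def select_bins(bins: List[Bin], profile: str) -> List[Bin]:
    """Keep bins that satisfy both cutoffs of the chosen profile."""
    cutoffs = PROFILES[profile]
    return [
        b for b in bins
        if b.completion >= cutoffs["min_completion"]
        and b.contamination <= cutoffs["max_contamination"]
    ]


if __name__ == "__main__":
    example = [
        Bin("bin.1", 35.0, 2.0),
        Bin("bin.2", 96.0, 8.5),
        Bin("bin.3", 92.0, 1.0),
    ]
    for profile in PROFILES:
        print(profile, [b.name for b in select_bins(example, profile)])
```

Under these assumed cutoffs, a fairly incomplete but clean bin is retained for taxonomic assignment, whereas a nearly complete bin with moderate contamination is retained only for metabolic reconstruction.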
Reassemble_bins significantly improved bin quality
MetaWRAP's Reassemble_bins module improves a given
set of bins through individual reassembly with SPAdes
[34]. The module only replaces the original bins if the
reassembled ones are better in terms of completion and
contamination. Like the Bin_refinement module, the
Reassemble_bins module takes in minimum completion
(-c) and maximum contamination (-x) parameters to
allow the user to define what they consider a "good" bin.
The bins produced from the water, gut, and soil data
with metaWRAP-Bin_refinement module runs (-c 50 -x
10) were run through the metaWRAP-Reassemble_bins
module (-c 50 -x 10), and the resulting bins were
re-evaluated with CheckM [24].
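The replacement rule can be sketched in Python as follows. Additional file 4 describes three versions per bin: the original, a "strict" reassembly, and a "permissive" reassembly. The `score` function below (completion minus five times contamination) is an assumption made purely for illustration and should not be read as the exact criterion used by the module; `Quality` and `choose_best` are likewise hypothetical names.

```python
from typing import Dict, NamedTuple, Optional


class Quality(NamedTuple):
    completion: float     # percent
    contamination: float  # percent


def score(q: Quality) -> float:
    # Assumed scoring rule for illustration only: reward completion and
    # penalize contamination five times as heavily.
    return q.completion - 5.0 * q.contamination


def choose_best(versions: Dict[str, Quality],
                min_completion: float = 50.0,
                max_contamination: float = 10.0) -> Optional[str]:
    """Return the label of the best version that passes both cutoffs.

    Versions are visited in the order original, strict, permissive, and a
    later version wins only with a strictly higher score, so the original
    bin is kept on ties. Returns None if no version passes the cutoffs.
    """
    best_label: Optional[str] = None
    best_score = float("-inf")
    for label in ("orig", "strict", "permissive"):
        q = versions.get(label)
        if q is None:
            continue
        if q.completion < min_completion or q.contamination > max_contamination:
            continue
        s = score(q)
        if s > best_score:
            best_label, best_score = label, s
    return best_label


if __name__ == "__main__":
    versions = {
        "orig": Quality(80.0, 6.0),
        "strict": Quality(85.0, 2.0),
        "permissive": Quality(90.0, 4.0),
    }
    print(choose_best(versions))
```

The two keyword arguments mirror the roles of the -c and -x parameters: a version that fails either cutoff is never selected, regardless of its score.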
The Reassemble_bins module improved upon 78%,
98%, and 2% of the bins in the water, gut, and soil bin
sets, respectively. The module significantly improved
the water and gut bins' overall metrics, increasing their
N50 and completion scores. Even more strikingly, the
reassembly process significantly reduced contamination
in these bin sets (Fig. 5). The success of the bin re-
assembly algorithm relies heavily on accurate and spe-
cific recruitment of the correct reads to each bin. In
very diverse and heterogeneous communities such as
those found in soil, the read recruitment may not be
specific enough. This confused the assembler during
the re-assembly stage and resulted in an improvement
for only a small fraction of the bins. However, draft ge-
nomes from the gut and water samples were still sig-
nificantly improved with the Reassemble_bins module
despite their complexity (Fig. 3). Just as with the bin-
ning process, it is important to note that the bins
resulting from the reassembly do not represent the true
genomes of individual organisms found in the commu-
nity but are rather consensus backbones for reads com-
ing from closely related organisms.
Fig. 5 N50, completion, and contamination metrics of original bins extracted from the water, gut, and soil metagenomes with metaWRAP's
Bin_refinement module and the same bins reassembled with metaWRAP's Reassemble_bins module. Only bins with > 50% completion and
< 10% contamination are shown (estimated with CheckM)
MetaWRAP produced high-quality draft genomes
We investigated the performance of different binning
approaches (both original binners and bin consolidation
software) when extracting high-quality draft genomes,
with a contamination less than 5% and completion
greater than 70%, 80%, 90%, and 95% (Fig. 6). The de-
fault run of metaWRAP-Bin_refinement consistently
produced the highest number of high-quality draft ge-
nomes in the water, gut, and soil data sets. These num-
bers further improved when re-running the module with
appropriate minimum completion (-c) settings (i.e., run-
ning Bin_refinement -c 90 when benchmarking for bins
with a minimum completion of 90%). This approach sig-
nificantly outperformed every other tested binning and
bin refinement method at every quality threshold.
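The benchmark tally described above can be reproduced from a table of completion and contamination estimates with a few lines of Python. This is a minimal sketch: the function name `count_high_quality` and the input layout (one `(completion, contamination)` pair per bin) are assumptions, not part of metaWRAP.

```python
from typing import Dict, Iterable, Sequence, Tuple


def count_high_quality(
    bins: Iterable[Tuple[float, float]],
    completion_cutoffs: Sequence[float] = (70.0, 80.0, 90.0, 95.0),
    max_contamination: float = 5.0,
) -> Dict[float, int]:
    """Count bins with contamination below the limit at each completion cutoff.

    Each bin is a (completion, contamination) pair in percent. A bin is
    counted at every cutoff that its completion exceeds, so the counts are
    cumulative from the strictest cutoff down to the most lenient one.
    """
    counts = {cutoff: 0 for cutoff in completion_cutoffs}
    for completion, contamination in bins:
        if contamination >= max_contamination:
            continue  # not a high-purity bin
        for cutoff in completion_cutoffs:
            if completion > cutoff:
                counts[cutoff] += 1
    return counts


if __name__ == "__main__":
    example = [(96.0, 1.0), (85.0, 4.9), (92.0, 6.0), (70.0, 0.0)]
    print(count_high_quality(example))
```

Running the same tally on the output of each binner or consolidation tool gives directly comparable counts per threshold.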
The reassembly of the metaWRAP-derived bins with
the Reassemble_bins module made a further improve-
ment on the number of high-quality draft genomes
extracted from the gut and water data sets. Even the
default run of Reassemble_bins produced a signifi-
cantly better bin set compared to non-reassembled bin
sets produced by all tested software, including metaWRAP's
Bin_refinement. However, just as in the
Bin_refinement runs, the results were further enhanced
when Reassemble_bins was provided with an appropri-
ate -c option.
Compared with the original binning software
(MaxBin2, metaBAT2, and CONCOCT) and bin consolidation
tools (DAS_Tool and Binning_refiner), metaWRAP
produced the largest number of high-quality draft genomes
in all the tested WMG data sets. Additionally,
metaWRAP can improve bin sets from any binning software.
Therefore, when new metagenomic binning software is developed,
its output can still be used with the metaWRAP refinement
and reassembly algorithms.
MetaWRAP enables analysis and visualization of
metagenomic bins
The remaining metaWRAP modules examine and
process a set of bins in preparation for downstream
analysis. The user may visualize the bins in the context
of the entire community with the Blobology module,
quantify their abundances across samples with the
Quant_bins module, estimate their taxonomy with the
Classify_bins module, and functionally annotate them
with the Annotate_bins module.
The metaWRAP-Quant_bins module was used to esti-
mate bin abundances across samples from their respect-
ive microbiome survey, and the results were shown in a
clustered heatmap (Additional file 12: Figure S8). Clus-
tered heatmaps may be used to infer bin co-abundance
and to identify similarities and differences between
samples. Because this approach considers the abun-
dances of every extracted bin individually, it offers
higher resolution information than when using higher
taxonomic ranks.
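As an illustration of how co-abundance might be inferred from a bin abundance table, the following Python sketch log-transforms the abundances and computes pairwise Pearson correlations between bins. The pseudocount, the use of Pearson correlation, and the names `log_abundance`, `pearson`, and `co_abundance` are assumptions made for this example; the clustering behind the heatmaps is not reproduced here.

```python
import math
from typing import Dict, List, Tuple


def log_abundance(values: List[float], pseudocount: float = 1.0) -> List[float]:
    """Log10-transform abundances; the pseudocount avoids log(0)."""
    return [math.log10(v + pseudocount) for v in values]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def co_abundance(table: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Correlate the log abundance profiles of every pair of bins."""
    names = sorted(table)
    logged = {name: log_abundance(table[name]) for name in names}
    return {
        (a, b): pearson(logged[a], logged[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


if __name__ == "__main__":
    # Rows are bins, columns are samples (hypothetical abundances).
    table = {
        "bin.1": [1.0, 10.0, 100.0],
        "bin.2": [2.0, 20.0, 200.0],
        "bin.3": [100.0, 10.0, 1.0],
    }
    for pair, r in co_abundance(table).items():
        print(pair, round(r, 3))
```

Bins whose abundances rise and fall together across samples receive a correlation near 1, whereas bins with opposing trends receive a negative value.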
Fig. 6 Number of high-purity bins (less than 5% contamination) extracted from the water, gut, and soil metagenomes with 70%, 80%, 90%, and
95% completion (estimated with CheckM) using original binning software (metaBAT2, MaxBin2, and CONCOCT) and bin-refining algorithms
(Binning_refiner, DAS_Tool, metaWRAP-Bin_refinement, and metaWRAP-Reassemble_bins). MetaWRAP modules were run with varying
-c (minimum completion) parameters. MetaWRAP's Reassemble_bins module was run on the output of the Bin_refinement module
Bins were also visualized with the metaWRAP-Blobol-
ogy module. The module produces GC vs. abundance
plots of contigs, annotated with their taxonomy [45]
(Fig. 3), bin membership (Fig. 7), or both (Additional
file 13: Figure S9). These plots allow for inspection
of the extracted bins in the context of the entire
community that they belong to, and for visualizing the
relative success of the binning process.
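The data behind a GC vs. abundance plot can be assembled as in the following Python sketch, which computes the GC content of each contig and pairs it with a coverage value and a bin label. The function names and the input dictionaries are hypothetical, and no plotting or taxonomic annotation is attempted.

```python
from typing import Dict, List, Tuple


def gc_content(sequence: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T); 0.0 if there are none."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def blob_points(
    contigs: Dict[str, str],
    coverage: Dict[str, float],
    bin_of: Dict[str, str],
) -> List[Tuple[str, float, float, str]]:
    """Build (contig, GC percent, coverage, bin label) tuples for plotting.

    Contigs without a coverage value get 0.0, and contigs that were not
    assigned to any bin are labeled "unbinned".
    """
    return [
        (name, gc_content(seq), coverage.get(name, 0.0), bin_of.get(name, "unbinned"))
        for name, seq in contigs.items()
    ]


if __name__ == "__main__":
    contigs = {"contig_1": "ATGCGC", "contig_2": "ATATAT"}
    coverage = {"contig_1": 12.5, "contig_2": 3.0}
    bin_of = {"contig_1": "bin.1"}
    for point in blob_points(contigs, coverage, bin_of):
        print(point)
```

Each tuple corresponds to one point in the plot: GC content on one axis, abundance on the other, and the bin label determining the color.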
The final reassembled bins were taxonomically profiled with
the metaWRAP-Classify_bins module (Additional file 14:
Bin taxonomy) and functionally annotated with the Anno-
tate_bins module. Together, this information may be used
in downstream analysis to investigate complex questions
about functional interactions and metabolic potential of
individual community members.
Conclusions
Genome-level analysis of WMG sequencing data is
essential in understanding the composition and func-
tion of microbiomes. Until now, this rapidly growing
field lacked a unifying platform to utilize the wealth
of currently available software and make them easily
accessible to researchers. MetaWRAP is a flexible
pipeline that can handle common tasks in metage-
nomic data analysis starting from raw read quality
control and ending in bin extraction and analysis.
MetaWRAP is easy to install through Bioconda and
simple to use, and its modularity gives the investiga-
tor flexibility in their analysis approach.
MetaWRAP contributed significant improvements to the
recovery of draft genomes from shotgun metagenomic data
through bin refinement and reassembly. The bin refine-
ment module uses a novel hybrid approach to consolidate
bin predictions from different binning software, producing
a single stronger set. This approach significantly outper-
formed individual binning software, as well as other
consolidation algorithms. The algorithm can adjust to
accommodate specific draft genome quality targets, making
it suitable for many research applications. MetaWRAP's bin
reassembly module further improved the draft genomes in
both completeness and purity. Finally, metaWRAP contains
multiple modules for analysis and evaluation of metagenomic
bins: bin taxonomy assignment, abundance estimation,
functional annotation, and visualization.
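The bookkeeping behind the hybrid approach, as outlined for Additional file 3 (Figure S2), can be sketched in Python: three original bin sets give four possible hybrid combinations, for seven candidate bin sets in total. The helper name `candidate_sets` is hypothetical, and the sketch only enumerates the combinations; it does not perform any hybridization or quality estimation.

```python
from itertools import combinations
from typing import List, Tuple


def candidate_sets(
    bin_sets: List[str],
) -> Tuple[List[Tuple[str, ...]], List[Tuple[str, ...]]]:
    """Enumerate the original bin sets and every hybrid combination of them.

    A hybrid is any combination of two or more original bin sets, so three
    inputs yield three pairs plus one triple.
    """
    originals = [(name,) for name in bin_sets]
    hybrids = [
        combo
        for size in range(2, len(bin_sets) + 1)
        for combo in combinations(bin_sets, size)
    ]
    return originals, hybrids


if __name__ == "__main__":
    originals, hybrids = candidate_sets(["metaBAT2", "MaxBin2", "CONCOCT"])
    print(len(originals), "original sets")
    print(len(hybrids), "hybrid sets:", hybrids)
    print(len(originals) + len(hybrids), "candidate sets in total")
```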
Availability and requirements
Project name: metaWRAP
Project home page: https://github.com/bxlab/metaWRAP
Operating system: Linux64
Programming languages: Shell; Python 2.7
Other requirements: Conda; other packages automatic-
ally installed with metaWRAP: CONCOCT [19], MaxBin2
[20], metaBAT [21], CheckM [24], Binning_refiner [28],
Kraken [29], Taxator-tk [30], BWA [32], SPAdes [34,36],
MegaHIT [35], KronaTools [37], Blobology [39], Mega-
BLAST [45], TrimGalore [46], BMTagger [47], FastQC
[48], Bowtie2 [49], Salmon [50,51], and PROKKA [52].
License: MIT
Fig. 7 GC vs. abundance plots of contigs from the water, gut, and soil metagenomes, produced with the Blobology module. Abundance of
contigs was calculated from standardized read coverage in each sample. The contigs were annotated with the bins that they belong to (bin
colors are chosen at random), allowing for quick inspection of binning success. Bins were produced with metaWRAP's Bin_refinement module.
Only bins with > 70% completion and < 10% contamination are shown (estimated with CheckM)
Additional files
Additional file 1: Supplementary methods. Descriptions of analysis
pipelines to process the benchmarking data, and detailed outlines of the
algorithms in each metaWRAP module. (DOCX 170 kb)
Additional file 2: Figure S1. Detailed walkthrough of the data files,
software, databases, and custom scripts that metaWRAP uses. The
components of each metaWRAP module are grouped and denoted with
dotted lines. (PNG 2140 kb)
Additional file 3: Figure S2. Logical workflow of the Bin_refinement
module of metaWRAP. The module takes in three bin sets produced from
the same assembly by different software or different parameters of the
same software. Binning_refiner is used to create hybridized intermediates
(four possible combinations), and the completion and contamination of the
original and hybridized bins are estimated with CheckM. The best version of
each bin is then found in the resulting seven bin sets. (PNG 123 kb)
Additional file 4: Figure S3. Logical workflow of the Reassemble_bins
module, which extracts reads belonging to bins in a given bin set and
individually reassembles them. This process is done for perfectly mapping
reads (strict) and reads mapping with less than three mismatches
(permissive). For each bin, CheckM is used to choose the best bin
between the original and the two reassembled versions. (PNG 164 kb)
Additional file 5: Module runtime. The total real runtime of each module
of metaWRAP when analyzing three samples from the metaHIT gut
metagenomic survey. The modules were tested with default parameters on
a Linux x64 server with 24 cores and 100 GB of RAM. (XLSX 23 kb)
Additional file 6: Figure S4. Completion and contamination (determined
with CheckM) of bins recovered from CAMI's high, medium, and low
complexity synthetic data sets using original binning software (metaBAT2,
MaxBin2, CONCOCT) and software consolidating the original sets (DAS_Tool,
Binning_refiner, metaWRAP). Only bins with > 50% completion and < 10%
contamination are shown. (EPS 474 kb)
Additional file 7: Figure S5. True recall and precision (determined with
AMBER) of bins recovered from CAMI's high, medium, and low complexity
synthetic data sets using original binning software (metaBAT2, MaxBin2,
CONCOCT) and software consolidating the original sets (DAS_Tool,
Binning_refiner, metaWRAP). Only bins with 0.5% recall and 0.9%
precision are shown. (EPS 474 kb)
Additional file 8: CAMI binning summary table. The number of bins
recovered at different quality thresholds (determined with AMBER) from the
CAMI challenge with original binning software (metaBAT2, MaxBin2,
CONCOCT) and software consolidating the original sets (DAS_Tool,
Binning_refiner, metaWRAP). MetaWRAP was run with default parameters.
Performance is shown for "unique strain" (ANI < 95% to any other genome)
and "common strain" (ANI > 95% to another genome) genomes. (XLSX 39 kb)
Additional file 9: Taxonomic distribution of reads from water, gut, and
soil metagenomes, estimated with the metaWRAP-Kraken module.
(HTML 972 kb)
Additional file 10: Figure S6. Completion of bins recovered from water,
gut, and soil metagenomes with the metaWRAP-Bin_refinement module
with a varying minimum completion parameter (-c) but constant maximum
contamination parameter (-x 10). The numbers in the brackets indicate
the number of extra bins gained at that threshold compared to the
baseline run (-c 50 -x 10). Only bins with > 50% completion and < 10%
contamination are shown. (EPS 394 kb)
Additional file 11: Figure S7. Contamination of bins recovered from
water, gut, and soil metagenomes with the metaWRAP-Bin_refinement
module with a varying maximum contamination parameter (-x) but
constant minimum completion parameter (-c 50). The numbers in the
brackets indicate the number of extra bins gained at that threshold
compared to the baseline run (-c 50 -x 10). Only bins with > 50%
completion and < 10% contamination are shown. (EPS 99 kb)
Additional file 12: Figure S8. Clustered heatmaps showing the log of
bin abundance of bins extracted with metaWRAP-Bin_refinement (-c 50
-x 10) across samples in water, gut, and soil metagenomes, calculated
and plotted with metaWRAP's Quant_bins module. (PNG 974 kb)
Additional file 13: Figure S9. MetaWRAP-Blobology visualization of
water, gut, and soil metagenomes, showing the GC and average coverage
of each successfully binned contig (metaWRAP-Bin_refinement -c 70 -x 10)
in the assemblies and annotated with the taxonomy at the phylum
level and the bins that they belong to (bin colors are chosen at
random). (PNG 2629 kb)
Additional file 14: Bin taxonomy. Distribution of the taxonomy among
bacterial bins extracted from water, gut, and soil metagenomes using
metaWRAP's Bin_refinement module (-c 50 -x 10). Taxonomy estimated
with metaWRAP's Classify_bins module. (HTML 167 kb)
Abbreviations
ANI: Average nucleotide identity; -c: Minimum completion parameter;
comp: Completion; cont: Contamination; WMG: Whole metagenome;
-x: Maximum contamination parameter
Acknowledgements
We thank the early users of metaWRAP, Alejandro Palomo, Keith Arora-Williams,
and Emine Ertekin, for their patience and help with debugging; Bing Ma for
module suggestions; and Michael Sauria for computational support.
Funding
This work was supported by grants NNX15AP18G and NNX15AK57G from
NASA, grant DEB1556574 from the NSF, and grant HG006620 from NIH/
NHGRI.
Availability of data and materials
The data sets supporting the conclusions of this article are available from the
original CAMI challenge (https://data.cami-challenge.org/participate) for the
synthetic data sets, the National Center for Biotechnology Information under
SRA numbers SRR2053273-SRR2053308
(https://www.ncbi.nlm.nih.gov/bioproject/PRJNA273799) for the Central Baltic Surface Water Metagenome,
SRA numbers ERR011087-ERR011136
(https://www.ncbi.nlm.nih.gov/bioproject/PRJEB2054) for the Metagenomics of the Human Intestinal Tract (MetaHIT)
survey, and Joint Genome Institute under Gold Analysis Project IDs
Ga0007435, Ga0007436, Ga0007437, Ga0007438, Ga0007439, and
Ga0007440 (https://gold.jgi.doe.gov/study?id=Gs0110119) for the soil
data. All analysis results and scripts used to generate figures are
available at https://github.com/ursky/metawrap_paper.
Authors' contributions
GU built, released, and maintained the metaWRAP software, ran the benchmarks,
and wrote the manuscript. JDR and JT provided ideas for building and
improving metaWRAP and edited the manuscript. All authors read and
approved the final manuscript.
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published
maps and institutional affiliations.
Received: 6 March 2018 Accepted: 29 August 2018
References
1. Jovel J, Patterson J, Wang W, Hotte N, O'Keefe S, Mitchel T, Perry T, Kao D,
Mason AL, Madsen KL, et al. Characterization of the gut microbiome using
16S or shotgun metagenomics. Front Microbiol. 2016;7:459.
2. Mendes LW, Braga LPP, Navarrete AA, Souza DG, Silva GGZ, Tsai SM. Using
metagenomics to connect microbial community biodiversity and functions.
Curr Issues Mol Biol. 2017;24:103–18.
3. Quince C, Walker AW, Simpson JT, Loman NJ, Segata N. Shotgun
metagenomics, from sampling to analysis. Nat Biotechnol. 2017;35(9):833–44.
4. Wang WL, Xu SY, Ren ZG, Tao L, Jiang JW, Zheng SS. Application of
metagenomics in the human gut microbiome. World J Gastroenterol.
2015;21(3):803–14.
5. Guo J, Li J, Chen H, Bond PL, Yuan Z. Metagenomic analysis reveals
wastewater treatment plants as hotspots of antibiotic resistance genes and
mobile genetic elements. Water Res. 2017;123:468–78.
6. Meyer KM, Klein AM, Rodrigues JL, Nusslein K, Tringe SG, Mirza BS, Tiedje
JM, Bohannan BJ. Conversion of Amazon rainforest to agriculture alters
community traits of methane-cycling organisms. Mol Ecol. 2017;26(6):
1547–56.
7. Oulas A, Pavloudi C, Polymenakou P, Pavlopoulos GA, Papanikolaou N,
Kotoulas G, Arvanitidis C, Iliopoulos I. Metagenomics: tools and insights for
analyzing next-generation sequencing data derived from biodiversity
studies. Bioinform Biol Insights. 2015;9:75–88.
8. Roumpeka DD, Wallace RJ, Escalettes F, Fotheringham I, Watson M. A review
of bioinformatics tools for bio-prospecting from metagenomic sequence
data. Front Genet. 2017;8:23.
9. Sczyrba A, Hofmann P, Belmann P, Koslicki D, Janssen S, Droge J, Gregor I,
Majda S, Fiedler J, Dahms E, et al. Critical assessment of metagenome
interpretation - a benchmark of metagenomics software. Nat Methods.
2017;14(11):1063–71.
10. Piro VC, Matschkowski M, Renard BY. MetaMeta: integrating metagenome
analysis tools to improve taxonomic profiling. Microbiome. 2017;5(1):101.
11. Escobar-Zepeda A, Vera-Ponce de Leon A, Sanchez-Flores A. The road to
metagenomics: from microbiology to DNA sequencing technologies and
bioinformatics. Front Genet. 2015;6:348.
12. Sharpton TJ. An introduction to the analysis of shotgun metagenomic data.
Front Plant Sci. 2014;5:209.
13. Ladoukakis E, Kolisis FN, Chatziioannou AA. Integrative workflows for
metagenomic analysis. Front Cell Dev Biol. 2014;2:70.
14. Batut B, Gravouil K, Defois C, Hiltemann S, Brugère J-F, Peyretaillade E,
Peyret P. ASaiM: a Galaxy-based framework to analyze raw shotgun
data from microbiota. bioRxiv. 2017; https://doi.org/10.1101/183970.
15. Kesh S, Raghupathi W. Critical issues in bioinformatics and computing.
Perspect Health Inf Manag. 2004;1:9.
16. Keegan KP, Glass EM, Meyer F. MG-RAST, a metagenomics service for
analysis of microbial community structure and function. In: Martin F,
Uroz S, editors. Microbial Environmental Genomics (MEG). New York:
Springer New York; 2016. p. 207–233. https://doi.org/10.1007/978-1-4939-3369-3_13.
17. Chen IA, Markowitz VM, Chu K, Palaniappan K, Szeto E, Pillay M, Ratner A,
Huang J, Andersen E, Huntemann M, et al. IMG/M: integrated genome and
metagenome comparative data analysis system. Nucleic Acids Res. 2017;
45(D1):D507–16.
18. Louvel G, Der Sarkissian C, Hanghoj K, Orlando L. metaBIT, an integrative
and automated metagenomic pipeline for analysing microbial profiles
from high-throughput sequencing shotgun data. Mol Ecol Resour. 2016;
16(6):1415–27.
19. Alneberg J, Bjarnason BS, de Bruijn I, Schirmer M, Quick J, Ijaz UZ, Lahti L,
Loman NJ, Andersson AF, Quince C. Binning metagenomic contigs by
coverage and composition. Nat Methods. 2014;11(11):1144–6.
20. Wu YW, Simmons BA, Singer SW. MaxBin 2.0: an automated binning
algorithm to recover genomes from multiple metagenomic datasets.
Bioinformatics. 2016;32(4):605–7.
21. Kang DD, Froula J, Egan R, Wang Z. MetaBAT, an efficient tool for accurately
reconstructing single genomes from complex microbial communities. PeerJ.
2015;3:e1165.
22. Sharon I, Morowitz MJ, Thomas BC, Costello EK, Relman DA, Banfield JF.
Time series community genomics analysis reveals rapid shifts in bacterial
species, strains, and phage during infant gut colonization. Genome Res.
2013;23(1):111–20.
23. Rinke C, Schwientek P, Sczyrba A, Ivanova NN, Anderson IJ, Cheng JF,
Darling A, Malfatti S, Swan BK, Gies EA, et al. Insights into the phylogeny
and coding potential of microbial dark matter. Nature. 2013;499(7459):
431–7.
24. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM:
assessing the quality of microbial genomes recovered from isolates, single
cells, and metagenomes. Genome Res. 2015;25(7):1043–55.
25. Mande SS, Mohammed MH, Ghosh TS. Classification of metagenomic
sequences: methods and challenges. Brief Bioinform. 2012;13(6):669–81.
26. Imelfort M, Parks D, Woodcroft BJ, Dennis P, Hugenholtz P, Tyson GW.
GroopM: an automated tool for the recovery of population genomes from
related metagenomes. PeerJ. 2014;2:e603.
27. Sieber CMK, Probst AJ, Sharrar A, Thomas BC, Hess M, Tringe SG, Banfield JF.
Recovery of genomes from metagenomes via a dereplication, aggregation
and scoring strategy. Nat Microbiol. 2018;
https://doi.org/10.1038/s41564-018-0171-1.
28. Song WZ, Thomas T. Binning_refiner: improving genome bins through the
combination of different binning programs. Bioinformatics. 2017;33(12):
1873–5.
29. Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification
using exact alignments. Genome Biol. 2014;15(3):R46.
30. Droge J, Gregor I, McHardy AC. Taxator-tk: precise taxonomic assignment of
metagenomes by fast approximation of evolutionary neighborhoods.
Bioinformatics. 2015;31(6):817–24.
31. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing
genomic features. Bioinformatics. 2010;26(6):841–2.
32. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler
transform. Bioinformatics. 2009;25(14):1754–60.
33. Grüning B, Dale R, Sjödin A, Rowe J, Chapman BA, Tomkins-Tinch CH,
Valieris R, Köster J. Bioconda: a sustainable and comprehensive
software distribution for the life sciences. bioRxiv. 2017;
https://doi.org/10.1101/207092.
34. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin
VM, Nikolenko SI, Pham S, Prjibelski AD, et al. SPAdes: a new genome
assembly algorithm and its applications to single-cell sequencing. J
Comput Biol. 2012;19:455–77.
35. Li D, Luo R, Liu CM, Leung CM, Ting HF, Sadakane K, Yamashita H, Lam TW.
MEGAHIT v1.0: A fast and scalable metagenome assembler driven by
advanced methodologies and community practices. Methods. 2016;
102:3–11.
36. Nurk S, Meleshko D, Korobeynikov A, Pevzner PA. metaSPAdes: a new versatile
metagenomic assembler. Genome Res. 2017;27(5):824–34.
37. Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization
in a Web browser. BMC Bioinformatics. 2011;12:385.
38. Lu J, Breitwieser FP, Thielen P, Salzberg SL. Bracken: estimating species
abundance in metagenomics data. PeerJ Computer Science. 2017;3:e104.
39. Kumar S, Jones M, Koutsovoulos G, Clarke M, Blaxter M. Blobology: exploring
raw genome data for contaminants, symbionts and parasites using taxon-
annotated GC-coverage plots. Front Genet. 2013;4:237.
40. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T,
Pons N, Levenez F, Yamada T, et al. A human gut microbial gene
catalogue established by metagenomic sequencing. Nature. 2010;
464(7285):59–65.
41. Meyer F, Hofmann P, Belmann P, Garrido-Oter R, Fritz A, Sczyrba A, McHardy
AC. AMBER: Assessment of Metagenome BinnERs. bioRxiv. 2017;
https://doi.org/10.1101/239582.
42. Hugerth LW, Larsson J, Alneberg J, Lindh MV, Legrand C, Pinhassi J,
Andersson AF. Metagenome-assembled genomes uncover a global
brackish microbiome. Genome Biol. 2015;16:279.
43. Sangwan N, Xia F, Gilbert JA. Recovering complete and draft population
genomes from metagenome datasets. Microbiome. 2016;4:8.
44. Quince C, Delmont TO, Raguideau S, Alneberg J, Darling AE, Collins G, Eren
AM. DESMAN: a new tool for de novo extraction of strains from metagenomes.
Genome Biol. 2017;18(1):181.
45. Chen Y, Ye W, Zhang Y, Xu Y. High speed BLASTN: an accelerated MegaBLAST
search tool. Nucleic Acids Res. 2015;43(16):7762–8.
46. Krueger F. Trim Galore!: a wrapper tool around Cutadapt and FastQC to
consistently apply quality and adapter trimming to FastQ files. In.,
http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/,0.4.5
edn: Bioconda; 2015. Accessed 15 Feb 2018.
47. Agarwala R, Morgulis A: BMTagger aka Best Match Tagger is for
removing human reads from metagenomics datasets. In.,
ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/bmtagger/, 3.101 edn: Bioconda; 2010.
Accessed 15 Feb 2018.
48. Brown J, Pirrung M, McCue LA. FQC Dashboard: integrates FastQC
results into a web-based, interactive, and extensible FASTQ quality
control tool. Bioinformatics. 2017;
https://doi.org/10.1093/bioinformatics/btx373.
49. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat
Methods. 2012;9(4):357–9.
50. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast
and bias-aware quantification of transcript expression. Nat Methods. 2017;
14(4):417–9.
51. Alexander H, Brown CT: DIBSI Metagenomics Workshop at UC Davis. In.,
http://2017-dibsi-metagenomics.readthedocs.io/en/latest/; 2017. Accessed
15 Feb 2018.
52. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics.
2014;30(14):2068–9.
Uritskiy et al. Microbiome (2018) 6:158 Page 13 of 13
Content courtesy of Springer Nature, terms of use apply. Rights reserved.
1.
2.
3.
4.
5.
6.
Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
accessing, sharing, receiving or otherwise using the Springer Nature journal content you agree to these terms of use (“Terms”). For these
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
apply.
We collect and use personal data to provide access to the Springer Nature journal content. We may also use these personal data internally within
ResearchGate and Springer Nature and as agreed share it, in an anonymised way, for purposes of tracking, analysis and reporting. We will not
otherwise disclose your personal data outside the ResearchGate or the Springer Nature group of companies unless we have your permission as
detailed in the Privacy Policy.
While Users may use the Springer Nature journal content for small scale, personal non-commercial use, it is important to note that Users may
not:
use such content for the purpose of providing other users with access on a regular or large scale basis or as a means to circumvent access
control;
use such content where to do so would be considered a criminal or statutory offence in any jurisdiction, or gives rise to civil liability, or is
otherwise unlawful;
falsely or misleadingly imply or suggest endorsement, approval , sponsorship, or association unless explicitly agreed to by Springer Nature in
writing;
use bots or other automated methods to access the content or redirect messages
override any security feature or exclusionary protocol; or
share the content in order to create substitute for Springer Nature products or services or a systematic database of Springer Nature journal
content.
In line with the restriction against commercial use, Springer Nature does not permit the creation of a product or service that creates revenue,
royalties, rent or income from our content or its inclusion as part of a paid for service or for other commercial gain. Springer Nature journal
content cannot be used for inter-library loans and librarians may not upload Springer Nature journal content on a large scale into their, or any
other, institutional repository.
These terms of use are reviewed regularly and may be amended at any time. Springer Nature is not obligated to publish any information or
content on this website and may remove it or features or functionality at our sole discretion, at any time with or without notice. Springer Nature
may revoke this licence to you at any time and remove access to any copies of the Springer Nature journal content which have been saved.
To the fullest extent permitted by law, Springer Nature makes no warranties, representations or guarantees to Users, either express or implied
with respect to the Springer nature journal content and all parties disclaim and waive any implied warranties or warranties imposed by law,
including merchantability or fitness for any particular purpose.
Please note that these rights do not automatically extend to content, data or other material published by Springer Nature that may be licensed
from third parties.
If you would like to use or distribute our Springer Nature journal content to a wider audience or on a regular basis or in any other manner not
expressly permitted by these Terms, please contact Springer Nature at
onlineservice@springernature.com
... We further employed de novo assembly to recover metagenomeassembled genomes (MAGs) as previously described (Blaustein et al., 2023). In brief, metaSPAdes v.3.15.0 (Bankevich et al., 2012), MetaWRAP v.1.2.2 (Uritskiy et al., 2018), and GUNC v1.0.5 (Orakov et al., 2021) were used, each with default parameters, for assembly, binning (with concoct, maxbin2, and metabat2), and prediction and removal of chimeric MAGs, respectively. For the latter, MAGs with contamination greater than 0.05, clade separation greater than 0.45 and a reference representation score greater than 0.5 were considered chimeric and removed (Saheb Kashaf et al., 2022). ...
Article
Full-text available
Agricultural ponds are essential irrigation resources, though may also serve as reservoirs for pathogens and antimicrobial resistance (AMR) genes. While monitoring microbiological water quality is critical for food safety, the influence of sampling factors (e.g., when and where to collect samples) in making risk assessments and potential applications for using environmental covariates as indicators remain unclear. Here, we explored the hypothesis that metagenomes of agricultural waters change with spatiotemporal shifts in physicochemical water quality, i.e., across water depths over time. Water samples and underlying sediments were collected at a model pond at the surface and within the water column (0, 1, 2 m depths) throughout one day (i.e., 9:00, 12:00, 15:00). All samples were processed for shotgun metagenomic sequencing analysis and enumeration of various water quality parameters (e.g., temperature, nutrient concentrations, turbidity, pH, culturable Escherichia coli). At the pond surface, Microcystis aeruginosa and members of Cyanobacteria, along with genes encoding pathways related to photosynthesis and nucleotide biosynthesis, were enriched throughout the day. In contrast, within the water column (1–2 m depths) and sediments, diverse members of Proteobacteria and Actinobacteria were more dominant, along with encoded pathways related to respiration and amino acid biosynthesis. Various aspects of water quality (i.e., chlorophyll dissolved organic matter, ammonia, E. coli concentrations) correlated with water metagenome diversity, albeit not with any specific AMR genes or virulence factors. Nevertheless, de novo assembly of sequenced reads uncovered 22 unique strains encoding several AMR, virulence, or stress response genetic elements, thus linking metagenome functional potential to key taxa. 
Overall, our findings highlight distinctions in agricultural pond water metagenomes at the surface and in the water column and demonstrate the potential for metagenomic surveillance in water quality monitoring to support food safety.
... For MAG reconstruction, reads were reassembled using the assembly module of the MetaWRAP v1.2.1 pipeline (Uritskiy et al., 2018), which internally uses MEGAHIT v1.1.3 (Li et al., 2016). ...
Article
Salt-tolerant and halophilic microorganisms are critical drivers of ecosystem stability and biogeochemical cycling in athalassohaline environments. Lake Barkol, a high-altitude inland saline lake, provides a valuable natural setting for investigating microbial community dynamics and adaptation mechanisms under extreme salinity. In this study, we employed high-throughput metagenomic sequencing to characterize the taxonomic composition, metabolic potential, and ecological functions of microbial communities in both water and sediment samples from Lake Barkol. We reconstructed 309 metagenome-assembled genomes (MAGs), comprising 279 bacterial and 30 archaeal genomes. Notably, approximately 97% of the MAGs could not be classified at the species level, indicating substantial taxonomic novelty in this ecosystem. Dominant bacterial phyla included Pseudomonadota, Bacteroidota, Desulfobacterota, Planctomycetota, and Verrucomicrobiota, while archaeal communities were primarily composed of Halobacteriota, Thermoplasmatota, and Nanoarchaeota. Metabolic reconstruction revealed the presence of diverse carbon fixation pathways, including the Calvin-Benson-Bassham (CBB) cycle, the Arnon-Buchanan reductive tricarboxylic acid (rTCA) cycle, and the Wood-Ljungdahl pathway. Autotrophic sulfur-oxidizing bacteria, alongside members of Cyanobacteria and Desulfobacterota, were implicated in primary production and carbon assimilation. Nitrogen metabolism was predominantly mediated by Gammaproteobacteria, with evidence for both nitrogen fixation and denitrification processes. Sulfur cycling was largely driven by Desulfobacterota and Pseudomonadota, contributing to sulfate reduction and sulfur oxidation pathways. Microbial communities exhibited distinct osmoadaptation strategies. The “salt-in” strategy was characterized by ion transport systems such as Trk/Ktr potassium uptake and Na⁺/H⁺ antiporters, enabling active intracellular ion homeostasis. 
In contrast, the “salt-out” strategy involved the biosynthesis and uptake of compatible solutes including ectoine, trehalose, and glycine betaine. These strategies were differentially enriched between water and sediment habitats, suggesting spatially distinct adaptive responses to local salinity gradients and nutrient regimes. Additionally, genes encoding microbial rhodopsins were widely distributed, suggesting that rhodopsin-based phototrophy may contribute to supplemental energy acquisition under osmotic stress conditions. The integration of functional and taxonomic data highlights the metabolic versatility and ecological roles of microbial taxa in sustaining biogeochemical processes under hypersaline conditions. Overall, this study reveals extensive taxonomic novelty and functional plasticity among microbial communities in Lake Barkol and underscores the influence of salinity in structuring microbial assemblages and metabolic pathways in athalassohaline ecosystems.
... Metagenomic sequencing for each sample yielded 10 Gb of raw data. Clean reads from all samples were pooled, co-assembled and binned using metaWRAP v1.3 (Uritskiy et al., 2018), which ultimately resulted in 68 bins. Taxonomy annotation was performed using GTDB-Tk v2.2.3 (Chaumeil et al., 2020), and relative abundance was calculated with CoverM v0.6.1. ...
Article
This study investigated the responses of the bacterial community structure and metabolic pathways in a sulfur-based autotrophic denitrification filter (SADF) system to rapidly elevated sulfate salinity, from 0.04 to 1.2% in 30 days. Results showed that the SADF system exhibited robust sulfate salinity stress tolerance at low nitrate concentrations. In the context of sulfate scenarios, the genus Thiobacillus significantly proliferated and was identified as the dominant sulfur-oxidizing player in the SADF system, achieving a relative abundance of 63.79% under 1.2% sulfate salinity. Cooperative and competitive interactions were found in the SADF-related microorganisms, promoting stable denitrification performance under high salinity. Surprisingly, with a low hydraulic retention time (HRT) of 60 min, metagenomic sequencing revealed an upregulated abundance of functional genes encoding enzymes associated with nitrogen and sulfur metabolism, while positive correlations were observed between these two pathways in response to sulfate salinity. Furthermore, global wastewater treatment plants were thoroughly explored for the distribution of the SADF-related microorganisms identified in this study. Interestingly, one-way ANOVA showed that the SADF-related microorganisms were widely distributed globally, demonstrating their universality in potential engineering applications worldwide.
... From paired reads, single-sample assemblies (SA) were performed with MetaSPAdes assembler v3.15.5 [28] and co-assemblies per treatment group and sampling time were generated with Megahit v1.0.2 [29] with a minimum contig length of 1000 nt and the "--presets meta-large" option for large and complex metagenomes. From both SA and co-assemblies, bins were generated with the binning module of the MetaWRAP software v1.3 [30]. For SA, the binning tools implemented were CONCOCT, MaxBin2, and MetaBAT2 [31][32][33]. ...
Article
Background: High-throughput sequencing technologies play an increasingly active role in the surveillance of major global health challenges, such as the emergence of antimicrobial resistance. The post-weaning period is of critical importance for the swine industry and antimicrobials are still required when infection occurs during this period. Here, two sequencing approaches, shotgun metagenomics and metatranscriptomics, were applied to decipher the effect of different treatments used in post-weaning diarrhea on the transcriptome and resistome of the pig gut microbiome. With this objective, a metagenome-assembled genome (MAG) catalogue was generated to use as a reference database for mapping transcripts obtained from a total of 140 pig fecal samples in a cross-sectional and longitudinal design to study differential gene expression. The different treatments included the antimicrobials trimethoprim/sulfamethoxazole, colistin, gentamicin, and amoxicillin, as well as an oral commercial vaccine, a control with water acidification, and an untreated control. For metatranscriptomics, fecal samples from pigs were selected before weaning, three days post-treatment, and four weeks post-treatment. Results: The final non-redundant MAG collection comprised a total of 1396 genomes obtained from single assemblies and co-assemblies per treatment group and sampling time from the metagenomics data. Analysis of antimicrobial resistance genes (ARGs) at this assembly level considerably reduced the total number of ARGs identified in comparison to those found at the reads level. In addition, from the metatranscriptomics data, half of those ARGs were found to be transcriptionally active in all treatment groups. Differential gene expression analysis between sampling times after treatment found a major number of differentially expressed genes (DEGs) against the group treated continuously with amoxicillin, with DEGs being correlated with antimicrobial resistance.
Moreover, at three days post-treatment, a high number of significantly downregulated genes was detected in the group treated with gentamicin. At this sampling time, this group showed an altered expression of ribosomal-related genes, demonstrating the rapid effect of gentamicin in inhibiting bacterial protein synthesis. Conclusions: Different antimicrobial treatments can differently impact the transcriptome and resistome of microbial communities, highlighting the relevance of novel sequencing approaches to monitor the resistome and contribute to more efficient antimicrobial stewardship.
Article
Methane seeps harbor uncharacterized animal–microbe symbioses with unique nutritional strategies. Three undescribed sea spider species (family Ammotheidae; genus Sericosura) endemic to methane seeps were found along the eastern Pacific margin, from California to Alaska, hosting diverse methane- and methanol-oxidizing bacteria on their exoskeleton. δ¹³C tissue isotope values of in situ specimens corroborated methane assimilation (−45‰, on average). Live animal incubations with ¹³C-labeled methane and methanol, followed by nanoscale secondary ion mass spectrometry, confirmed that carbon derived from both compounds was actively incorporated into the tissues within five days. Methano- and methylotrophs of the bacterial families Methylomonadaceae, Methylophagaceae and Methylophilaceae were abundant, based on environmental metagenomics and 16S rRNA sequencing, and fluorescence and electron microscopy confirmed dense epibiont aggregations on the sea spider exoskeleton. Egg sacs carried by the males hosted identical microbes suggesting vertical transmission. We propose that these sea spiders farm and feed on methanotrophic and methylotrophic bacteria, expanding the realm of animals known to harness C1 compounds as a carbon source. These findings advance our understanding of the biology of an understudied animal lineage, unlocking some of the unique nutritional links between the microbial and faunal food webs in the oceans.
Preprint
Background: The new generation of sequencing platforms, coupled with numerous bioinformatics tools, has led to rapid technological progress in metagenomics and metatranscriptomics for investigating complex microbial communities. Nevertheless, a combination of different bioinformatic tools remains necessary to draw conclusions from microbiota studies. Modular and user-friendly tools would greatly improve such studies. Findings: We therefore developed ASaiM, an open-source Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides a curated collection of tools to explore and visualize taxonomic and functional information from raw amplicon, metagenomic or metatranscriptomic sequences. To guide different analyses, several customizable workflows are included. All workflows are supported by tutorials and Galaxy interactive tours to guide users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It is scalable to many thousands of datasets, but can also be used on a normal PC. The associated source code is available under the Apache 2 license at https://github.com/ASaiM/framework and documentation can be found online (http://asaim.readthedocs.io/). Conclusions: Based on the Galaxy framework, ASaiM offers sophisticated analyses to scientists without command-line knowledge. ASaiM provides a powerful framework to easily and quickly explore microbiota data in a reproducible and transparent environment.
Preprint
Reconstructing the genomes of microbial community members is key to the interpretation of shotgun metagenome samples. Genome binning programs deconvolute reads or assembled contigs of such samples into individual bins, but assessing their quality is difficult due to the lack of evaluation software and standardized metrics. We present AMBER, an evaluation package for the comparative assessment of genome reconstructions from metagenome benchmark data sets. It calculates the performance metrics and comparative visualizations used in the first benchmarking challenge of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). As an application, we show the outputs of AMBER for ten different binnings on two CAMI benchmark data sets. AMBER is implemented in Python and available under the Apache 2.0 license on GitHub (https://github.com/CAMI-challenge/AMBER).
Article
Microbial communities are critical to ecosystem function. A key objective of metagenomic studies is to analyse organism-specific metabolic pathways and reconstruct community interaction networks. This requires accurate assignment of assembled genome fragments to genomes. Existing binning methods often fail to reconstruct a reasonable number of genomes and report many bins of low quality and completeness. Furthermore, the performance of existing algorithms varies between samples and biotopes. Here, we present a dereplication, aggregation and scoring strategy, DAS Tool, that combines the strengths of a flexible set of established binning algorithms. DAS Tool applied to a constructed community generated more accurate bins than any automated method. Indeed, when applied to environmental and host-associated samples of different complexity, DAS Tool recovered substantially more near-complete genomes, including previously unreported lineages, than any single binning method alone. The ability to reconstruct many near-complete genomes from metagenomics data will greatly advance genome-centric analyses of ecosystems.
Preprint
We present Bioconda (https://bioconda.github.io), a distribution of bioinformatics software for the lightweight, multi-platform and language-agnostic package manager Conda. Currently, Bioconda offers a collection of over 3000 software packages, which is continuously maintained, updated, and extended by a growing global community of more than 200 contributors. Bioconda improves analysis reproducibility by allowing users to define isolated environments with defined software versions, all of which are easily installed and managed without administrative privileges.
Article
Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.
Article
We introduce DESMAN for De novo Extraction of Strains from Metagenomes. Large multi-sample metagenomes are being generated but strain variation results in fragmentary co-assemblies. Current algorithms can bin contigs into metagenome-assembled genomes but are unable to resolve strain-level variation. DESMAN identifies variants in core genes and uses co-occurrence across samples to link variants into haplotypes and abundance profiles. These are then searched for against non-core genes to determine the accessory genome of each strain. We validated DESMAN on a complex 50-species 210-genome 96-sample synthetic mock data set and then applied it to the Tara Oceans microbiome.
Article
Background: Many metagenome analysis tools are presently available to classify sequences and profile environmental samples. In particular, taxonomic profiling and binning methods are commonly used for such tasks. Tools in these two categories make use of several techniques, e.g., read mapping, k-mer alignment, and composition analysis. Variations in the construction of the corresponding reference sequence databases are also common. In addition, different tools provide good results on different datasets and configurations. All this variation makes it complicated for researchers to decide which methods to use. Installation, configuration and execution can also be difficult, especially when dealing with multiple datasets and tools. Results: We propose MetaMeta: a pipeline to execute and integrate results from metagenome analysis tools. MetaMeta provides an easy workflow to run multiple tools with multiple samples, producing a single enhanced output profile for each sample. MetaMeta includes database generation, pre-processing, execution, and integration steps, allowing easy execution and parallelization. The integration relies on the co-occurrence of organisms from different methods as the main feature to improve community profiling while accounting for differences in their databases. Conclusions: In a controlled case with simulated and real data, we show that the integrated profiles of MetaMeta outperform the best single profile. Using the same input data, it provides more sensitive and reliable results, with the presence of each organism being supported by several methods. MetaMeta uses Snakemake and has six pre-configured tools, all available through the Bioconda channel for easy installation (conda install -c bioconda metameta). The MetaMeta pipeline is open-source and can be downloaded at: https://gitlab.com/rki_bioinformatics.
Article
Microbes constitute about a third of the Earth's biomass and encompass enormous genetic diversity. In most environments, microbial communities play crucial roles in ecosystem functioning, and a drastic alteration or loss of biodiversity could lead to negative effects on the environment and sustainability. A central goal in microbiome studies is to elucidate the relationship between microbial diversity and function. A better understanding of the diversity-function relationship would increase the ability to manipulate that diversity to improve plant and animal health and to set conservation priorities. Recent advances in genomic methodologies in microbial ecology have provided the means to assess highly complex communities in detail, making it possible to link diversity to the functions performed by the microbes. In this work, we first explore some advances in bioinformatics tools that connect microbial community biodiversity to potential metabolism, and then present some examples of how this information can be useful for a better understanding of the microbial role in the environment.
Article
Diverse microbial communities of bacteria, archaea, viruses and single-celled eukaryotes have crucial roles in the environment and in human health. However, microbes are frequently difficult to culture in the laboratory, which can confound cataloging of members and understanding of how communities function. High-throughput sequencing technologies and a suite of computational pipelines have been combined into shotgun metagenomics methods that have transformed microbiology. Still, computational approaches to overcome the challenges that affect both assembly-based and mapping-based metagenomic profiling, particularly of high-complexity samples or environments containing organisms with limited similarity to sequenced genomes, are needed. Understanding the functions and characterizing specific strains of these communities offers biotechnological promise in therapeutic discovery and innovative ways to synthesize products using microbial factories and can pinpoint the contributions of microorganisms to planetary, animal and human health.