This content is subject to copyright. Terms and conditions apply.
Nature Methods | Volume 20 | August 2023 | 1170–1173
nature methods
Brief Communication
https://doi.org/10.1038/s41592-023-01934-8
A comparison of single-coverage and multi-coverage metagenomic binning reveals extensive hidden contamination
Jennifer Mattock¹ & Mick Watson²,³
Metagenomic binning has revolutionized the study of uncultured
microorganisms. Here we compare single- and multi-coverage binning on
the same set of samples, and demonstrate that multi-coverage binning
produces better results than single-coverage binning and identifies
contaminant contigs and chimeric bins that other approaches miss. Although
resource-intensive, multi-coverage binning is a superior approach and
should always be used in preference to single-coverage binning.
Metagenomic binning, the resolution of metagenomic sequence data
into individual genomes, has been used to identify hundreds of thousands
of genomes from microbiome samples1–6. These studies are enabled
by software that groups assembled contigs on the assumption that
contigs with similar sequence content and similar coverage profiles
across multiple samples probably originate from the same genome7,8.
However, calculating coverage from multiple samples becomes a problem
for large sample sizes, because it requires an all-against-all comparison:
the reads from every sample must be mapped to every sample's assembly.
It has therefore become routine to perform single-coverage binning
for large datasets. Previous research has described multi-coverage
binning in the context of co-assembly, finding that at least five samples
are required for it to be worthwhile3, and that increasing the number of
samples used for multi-coverage binning decreased contamination and
increased completeness of bins7,9. However, co-assembly is suboptimal
because it allows the reconstruction of only one bin per species3.
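The cost difference can be sketched as a count of read-mapping jobs. This is a minimal illustration, assuming each sample is assembled individually and that multi-coverage binning maps every sample's reads to every sample's assembly; the function name `mapping_jobs` is hypothetical and not taken from any binning tool.

```python
def mapping_jobs(n_samples: int) -> dict[str, int]:
    """Count read-mapping jobs needed to compute coverage for per-sample binning.

    Assumption: each sample is assembled separately, and multi-coverage
    binning maps every sample's reads to every sample's assembly.
    """
    single = n_samples              # each read set mapped only to its own assembly
    multi = n_samples * n_samples   # all-against-all: n read sets x n assemblies
    return {"single": single, "multi": multi}


jobs = mapping_jobs(42)
print(jobs)  # {'single': 42, 'multi': 1764}
```

For the 42 samples analysed in this study, this gives 42 mapping jobs for single-coverage binning and 1,764 for multi-coverage binning; the quadratic growth is what makes the multi-coverage approach resource-intensive as sample numbers rise.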
In this Brief Communication, we compare single- and multi-coverage
binning on the same dataset, to quantify the effect of the loss of
coverage information on the quantity and quality of bins produced.
We hypothesize that single-coverage binning will frequently bin
together contigs that are co-abundant only in a single sample (Fig. 1a),
that these errors represent invisible contamination and that they can
be detected by using multi-coverage data.
Forty-two rumen microbiome samples were assembled and binned
using two strategies, single-coverage and multi-coverage binning, with
all other parameters held constant. The completeness and contamination
of all bins produced by both methods are shown in Fig. 1b. The
distributions of completeness scores differ minimally between single-
and multi-coverage bins; however, the single-coverage bins show
increased contamination: 22.5% (1,273/5,658) of single-coverage bins
have a contamination score of 5 or greater, versus 3.5% (293/8,420) of
multi-coverage bins. This suggests that the single-coverage approach
incorporates more contaminant contigs into bins.
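The contamination percentages quoted above follow directly from the reported bin counts. A short check, with `percent` as a hypothetical helper name:

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Bins with a contamination score of 5 or greater, out of all bins (counts from the text).
single_cov = percent(1_273, 5_658)
multi_cov = percent(293, 8_420)

# Fold difference between the unrounded proportions.
fold = (1_273 / 5_658) / (293 / 8_420)

print(single_cov, multi_cov)  # 22.5 3.5
print(round(fold, 1))         # 6.5
```

On these counts, a single-coverage bin is about 6.5 times as likely as a multi-coverage bin to reach the contamination threshold.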
The single-coverage approach produced a total of 5,658 bins across
the 42 samples, whereas the multi-coverage approach produced 8,420
(Fig. 1c). A filtered set of bins was produced using completeness and
contamination cutoffs that have previously been used in ruminants6,10–12
(completeness ≥80% and contamination ≤10%). Using these cutoffs,
the single-coverage approach produced 931 filtered bins, compared
to 1,660 produced by the multi-coverage approach, an increase of 78%.
This suggests that the multi-coverage approach produces more high-quality
bins. The filtered bins were used for all downstream analyses.
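The filtering step and the reported 78% increase can be expressed as below. This is a sketch: the `Bin` record and helper names are hypothetical, and the example bins are invented only to show how values on the cutoff boundaries are treated (both cutoffs are inclusive, as stated in the text).

```python
from dataclasses import dataclass


@dataclass
class Bin:
    name: str
    completeness: float   # percent
    contamination: float  # percent


def passes_filter(b: Bin, min_completeness: float = 80.0,
                  max_contamination: float = 10.0) -> bool:
    """Cutoffs from the text: completeness >= 80% and contamination <= 10%."""
    return b.completeness >= min_completeness and b.contamination <= max_contamination


def percent_increase(old: int, new: int) -> float:
    """Relative increase from old to new, in percent."""
    return 100 * (new - old) / old


# Hypothetical bins sitting on or just past the cutoffs.
bins = [Bin("a", 80.0, 10.0), Bin("b", 95.0, 10.5), Bin("c", 79.9, 1.0)]
print([b.name for b in bins if passes_filter(b)])  # ['a']

# Filtered bin counts from the text: 931 single-coverage vs 1,660 multi-coverage.
print(round(percent_increase(931, 1_660)))  # 78
```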
The taxonomies assigned to the bins from each binning method were
compared. The proportion of bins belonging to each taxon varied between
the methods at every rank. A greater proportion of multi-coverage bins
than single-coverage bins were archaea (4.3% versus 3.1%). In both
approaches, the predominant phylum was Bacteroidota, with a slight
difference in the Firmicutes/Bacteroidota ratio: 1.28 in multi-coverage
bins versus 1.05 in single-coverage bins. One phylum (Patescibacteria),
two classes (Endomicrobia and Saccharimonadia), three orders, nine
families, 35 genera and 96 species were found exclusively in the
multi-coverage bins, whereas just two genera and 11 species were found
exclusively in the single-coverage bins. This suggests that single-coverage
binning may overlook taxa that can be recovered using multi-coverage
binning, perhaps due to the increased coverage data available with
Received: 22 February 2022
Accepted: 28 May 2023
Published online: 29 June 2023
¹The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK.
²Centre for Digital Innovation, DSM Biotechnology Center, Delft, The Netherlands.
³Scotland’s Rural College, Peter Wilson Building, King’s Buildings, Edinburgh, UK.
e-mail: mick.watson@dsm.com