Letters
https://doi.org/10.1038/s41564-017-0053-y
© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.
1National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA. 2Institut Pasteur, Unité Biologie Moléculaire du Gène
chez les Extrêmophiles, Paris, France. 3Viral Information Institute, Department of Biology, San Diego State University, San Diego, CA, USA.
*e-mail: koonin@ncbi.nlm.nih.gov
Metagenomic sequence analysis is rapidly becoming the pri-
mary source of virus discovery1–3. A substantial majority of
the currently available virus genomes come from metagenom-
ics, and some of these represent extremely abundant viruses,
even if never grown in the laboratory. A particularly striking
case of a virus discovered via metagenomics is crAssphage,
which is by far the most abundant human-associated virus
known, comprising up to 90% of sequences in the gut virome4.
Over 80% of the predicted proteins encoded in the approxi-
mately 100 kilobase crAssphage genome showed no sig-
nificant similarity to available protein sequences, precluding
classification of this virus and hampering further study. Here
we combine a comprehensive search of genomic and metage-
nomic databases with sensitive methods for protein sequence
analysis to identify an expansive, diverse group of bacterio-
phages related to crAssphage and predict the functions of the
majority of phage proteins, in particular those that comprise
the structural, replication and expression modules. Most,
if not all, of the crAss-like phages appear to be associated
with diverse bacteria from the phylum Bacteroidetes, which
includes some of the most abundant bacteria in the human gut
microbiome and whose members are also common in various other habitats.
These findings provide a basis for experimental characterization
of the most abundant but poorly understood members of the
human-associated virome.
Viruses are the most abundant biological entities on Earth. In
most environments, from ocean water to the content of animal guts,
the number of detected virus particles exceeds that of cells by one
to two orders of magnitude2. Among these viruses, more than 90%
are tailed bacteriophages1. More than 99% of the prokaryotic diver-
sity in the biosphere is represented by bacteria and archaea that fail
to grow in laboratory cultures and, accordingly, the great majority
of the viruses are thought to infect these uncultivated microbes1.
Moreover, analysis of the human gut virome shows that most of the
sequences, in contrast to the bacterial and archaeal sequences, have
no matches in the current sequence databases, suggesting a vast
virome consisting primarily of ‘dark matter’5–7.
The crAssphage is the most extreme manifestation of this trend.
The complete crAssphage (after Cross Assembly) genome was
assembled by joining contigs obtained from several human fae-
cal viral metagenomes as a circular double-stranded (ds) DNA
molecule of ~97 kilobases (kb)4. The circular genome map apparently
results from terminal redundancy and/or circular permutation.
The crAssphage is extremely abundant, accounting for up to 90%
of the reads in the virus-like particle-enriched fraction of the gut
metagenome and about 22% of the reads in the total metagenome.
Reads matching the crAssphage genome have been identified
in numerous gut metagenomes collected in diverse geographic
locations, indicating that crAssphage is not only the most abundant
virus in the human gut microbiome but also a (nearly) ubiquitous
one4,8,9. Read co-occurrence analysis points to bacteria of the phy-
lum Bacteroidetes as the host(s) of crAssphage4,10. This assignment
is compatible with the presence in the crAssphage genome of a pro-
tein containing carbohydrate-binding domains (BACON domains)
that is highly similar to a homologous protein from Bacteroides
and with partial matches between two crAssphage sequences and
CRISPR spacers from two species of Bacteroides4. Members of the
Bacteroidetes dominate the gut microbiome, but most of these bac-
teria so far have not been grown in culture11,12. Thus, it is hardly
surprising that the most abundant—but never isolated—phage from
this environment appears to be a parasite of Bacteroidetes. Analysis
of the protein sequences encoded in the crAssphage genome failed
to identify specific relationships with other bacteriophages4. Several
proteins implicated in phage genome replication have been identified,
including a family B DNA polymerase (DNAP), a primase
and a flavin-dependent thymidylate synthase, but neither the major
capsid protein nor other structural and morphogenetic proteins
were detected. In an attempt to clarify the provenance of this most
abundant but enigmatic human-associated virus, we reanalysed the
crAssphage genome using the most sensitive available methods for
protein sequence analysis and taking advantage of database growth
since the time of crAssphage discovery. The result is the identifica-
tion of a previously unknown, expansive bacteriophage family that
appears to be associated with diverse members of Bacteroidetes and
for which we now recognize the structural, replication and expres-
sion gene modules.
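The circular genome map noted above is attributed to terminal redundancy and/or circular permutation. As a toy illustration of these two properties, and not the assembly procedure used in this work, the following Python sketch checks them on short made-up strings. The function names and the example sequence are hypothetical and chosen only for clarity.

```python
def is_circular_permutation(a: str, b: str) -> bool:
    """Return True if b is a rotation of a, i.e. both strings spell the
    same circular sequence read from different starting points."""
    return len(a) == len(b) and b in a + a


def terminal_redundancy(seq: str) -> int:
    """Return the length of the longest proper prefix of seq that is
    repeated exactly at its end (0 if there is none)."""
    for k in range(len(seq) - 1, 0, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


# Toy sequence for illustration only; not real crAssphage data.
genome = "ATGCCGTAAGCT"

# A circularly permuted copy: same circle, different linear start.
rotated = genome[5:] + genome[:5]

# A terminally redundant linear molecule: the first 3 nt recur at the end.
packaged = genome + genome[:3]

if __name__ == "__main__":
    print(is_circular_permutation(genome, rotated))
    print(terminal_redundancy(packaged))
```

Linear molecules with either property overlap end to end when assembled, which is why a linear phage genome can yield a circular map. Real data would need tolerance for sequencing errors, which this exact-match sketch does not provide.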
The sequences of the crAssphage proteins were compared (using
PSI-BLAST) to the non-redundant protein sequence database
(nr) and the Whole Genome Shotgun (WGS) databases (NCBI,
NIH, Bethesda) containing microbial genomic and metagenomic
sequences. Sequences with significant similarity to crAssphage
proteins were detected in four genomes of previously identified
bacteriophages and numerous contigs assigned to bacterial genomes
(possibly, prophages) and metagenomic contigs. These sequences
Discovery of an expansive bacteriophage family
that includes the most abundant viruses from
the human gut
Natalya Yutin1, Kira S. Makarova1, Ayal B. Gussow1, Mart Krupovic2, Anca Segall1,3,
Robert A. Edwards3 and Eugene V. Koonin1*
NATURE MICROBIOLOGY | VOL 3 | JANUARY 2018 | 38–46 | www.nature.com/naturemicrobiology