
Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut

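Authors: Natalya Yutin, Kira S. Makarova, Ayal B. Gussow, Mart Krupovic, Anca Segall, Robert A. Edwards and Eugene V. Koonin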

Abstract

Metagenomic sequence analysis is rapidly becoming the primary source of virus discovery (1-3). A substantial majority of the currently available virus genomes come from metagenomics, and some of these represent extremely abundant viruses, even if never grown in the laboratory. A particularly striking case of a virus discovered via metagenomics is crAssphage, which is by far the most abundant human-associated virus known, comprising up to 90% of sequences in the gut virome (4). Over 80% of the predicted proteins encoded in the approximately 100-kilobase crAssphage genome showed no significant similarity to available protein sequences, precluding classification of this virus and hampering further study. Here we combine a comprehensive search of genomic and metagenomic databases with sensitive methods for protein sequence analysis to identify an expansive, diverse group of bacteriophages related to crAssphage and predict the functions of the majority of phage proteins, in particular those that comprise the structural, replication and expression modules. Most, if not all, of the crAss-like phages appear to be associated with diverse bacteria from the phylum Bacteroidetes, which includes some of the most abundant bacteria in the human gut microbiome and is also common in various other habitats. These findings provide a basis for experimental characterization of the most abundant but poorly understood members of the human-associated virome.
Letters
https://doi.org/10.1038/s41564-017-0053-y
© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.
¹National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA. ²Institut Pasteur, Unité Biologie Moléculaire du Gène chez les Extrêmophiles, Paris, France. ³Viral Information Institute, Department of Biology, San Diego State University, San Diego, CA, USA.
*e-mail: koonin@ncbi.nlm.nih.gov
Viruses are the most abundant biological entities on Earth. In most environments, from ocean water to the contents of animal guts, the number of detected virus particles exceeds that of cells by one to two orders of magnitude (2). Among these viruses, more than 90% are tailed bacteriophages (1). More than 99% of the prokaryotic diversity in the biosphere is represented by bacteria and archaea that fail to grow in laboratory cultures and, accordingly, the great majority of viruses are thought to infect these uncultivated microbes (1). Moreover, analysis of the human gut virome shows that most of the sequences, in contrast to the bacterial and archaeal sequences, have no matches in the current sequence databases, suggesting a vast virome consisting primarily of ‘dark matter’ (5-7).
The crAssphage is the most striking manifestation of this trend. The complete crAssphage (after Cross Assembly) genome was assembled by joining contigs obtained from several human faecal viral metagenomes as a circular double-stranded (ds) DNA molecule of ~97 kilobases (kb) (4). The circular genome map apparently results from terminal redundancy and/or circular permutation.
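The text does not say how the circular map was inferred. One common heuristic for flagging an assembled contig as circular is to look for a direct terminal repeat, that is, an identical sequence at both ends of the contig, which is what an assembler often emits after walking all the way around a circular or circularly permuted genome. The Python sketch below implements only that heuristic; the function names and default thresholds (min_len=20, max_len=1000 bases) are illustrative assumptions, not parameters from the crAssphage assembly. Circular permutation across a population of packaged genomes cannot be detected from a single contig this way.

```python
def terminal_repeat_length(seq: str, min_len: int = 20, max_len: int = 1000) -> int:
    """Length of the longest direct terminal repeat (prefix identical to suffix).

    Returns 0 if no repeat of at least `min_len` bases is found. Repeats are
    searched only up to `max_len` bases and never longer than half the contig.
    """
    s = seq.upper()
    longest_allowed = min(max_len, len(s) // 2)
    # Scan from the longest allowed repeat downwards, so the first hit is the longest.
    for k in range(longest_allowed, max(min_len, 1) - 1, -1):
        if s[:k] == s[-k:]:
            return k
    return 0


def looks_circular(seq: str, min_len: int = 20) -> bool:
    """Heuristic (assumption): a contig with a direct terminal repeat is treated as circular."""
    return terminal_repeat_length(seq, min_len) > 0


def trim_to_circular(seq: str, min_len: int = 20) -> str:
    """Drop one copy of the terminal repeat to leave a single genome-length unit."""
    k = terminal_repeat_length(seq, min_len)
    return seq[:-k] if k else seq
```

For example, terminal_repeat_length("GATTACACCGGTTAAGATTACA", min_len=5) finds the 7-base repeat GATTACA, and trim_to_circular on the same toy contig returns the 15-base unit GATTACACCGGTTAA. Scanning from the longest candidate downwards keeps the code short; a real pipeline would also require read support across the junction.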
The crAssphage is extremely abundant, accounting for up to 90%
of the reads in the virus-like particle-enriched fraction of the gut
metagenome and about 22% of the reads in the total metagenome.
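The percentages above are shares of sequencing reads assigned to the crAssphage genome. Assuming each read has already been mapped and labelled with the reference it hit (or None if unmapped), the share reduces to counting, as in this minimal Python sketch. The labels and toy counts are hypothetical, chosen only to give round numbers, and are not data from the study; the mapping step itself is outside the sketch.

```python
from collections import Counter
from typing import Iterable, Optional


def percent_of_reads(assignments: Iterable[Optional[str]], target: str) -> float:
    """Percentage of all reads (mapped or not) assigned to `target`.

    `assignments` holds one entry per read: the ID of the reference the read
    mapped to, or None for an unmapped read.
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    return 100.0 * counts[target] / total


# Toy example (hypothetical counts): 9 of 10 reads in a virus-like particle
# fraction and 22 of 100 reads in a bulk metagenome hit the target genome.
vlp_reads = ["crAssphage"] * 9 + [None]
bulk_reads = ["crAssphage"] * 22 + ["other_ref"] * 50 + [None] * 28
print(percent_of_reads(vlp_reads, "crAssphage"))   # 90.0
print(percent_of_reads(bulk_reads, "crAssphage"))  # 22.0
```

The denominator here counts every read, including unmapped ones; dividing by mapped reads only would inflate the share, so the choice should match how a given study reports abundance.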
Reads matching the crAssphage genome have been identified in numerous gut metagenomes collected in diverse geographic locations, indicating that crAssphage is not only the most abundant virus in the human gut microbiome but also a (nearly) ubiquitous one (4, 8, 9). Read co-occurrence analysis points to bacteria of the phylum Bacteroidetes as the host(s) of crAssphage (4, 10). This assignment is compatible with the presence in the crAssphage genome of a protein containing carbohydrate-binding domains (BACON domains) that is highly similar to a homologous protein from Bacteroides, and with partial matches between two crAssphage sequences and CRISPR spacers from two species of Bacteroides (4). Members of the Bacteroidetes dominate the gut microbiome, but most of these bacteria so far have not been grown in culture (11, 12). Thus, it is hardly surprising that the most abundant, but never isolated, phage from this environment appears to be a parasite of Bacteroidetes. Analysis of the protein sequences encoded in the crAssphage genome failed to identify specific relationships with other bacteriophages (4). Several proteins implicated in phage genome replication have been identified, including a family B DNA polymerase (DNAP), a primase and a flavin-dependent thymidylate synthase, but neither the major capsid protein nor other structural and morphogenetic proteins were detected. In an attempt to clarify the provenance of this most abundant but enigmatic human-associated virus, we reanalysed the crAssphage genome using the most sensitive available methods for protein sequence analysis, taking advantage of the growth of sequence databases since crAssphage was discovered. The result is the identification of a previously unknown, expansive bacteriophage family that appears to be associated with diverse members of the Bacteroidetes and for which we now recognize the structural, replication and expression gene modules.
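The host evidence cited above includes partial matches between crAssphage sequences and Bacteroides CRISPR spacers. The text does not give the matching criteria, so the following Python sketch uses a brute-force Hamming-distance scan of both strands with an assumed tolerance of up to 2 mismatches; in practice such searches are usually run with BLASTN against a spacer collection. The function names, thresholds and toy sequences are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def spacer_hits(spacer: str, genome: str, max_mismatches: int = 2,
                circular: bool = True) -> list[tuple[int, str, int]]:
    """Return (start, strand, mismatches) for every genome window matching the spacer.

    The '-' strand is searched by scanning the reverse complement of the spacer
    along the given genome strand; `start` is always a 0-based position on it.
    """
    spacer, genome = spacer.upper(), genome.upper()
    k, n = len(spacer), len(genome)
    if k == 0 or n < k:
        return []
    # For a circular genome, append the first k-1 bases so windows can span the origin.
    search = genome + genome[:k - 1] if circular else genome
    starts = range(n) if circular else range(n - k + 1)
    hits = []
    for strand, query in (("+", spacer), ("-", revcomp(spacer))):
        for i in starts:
            mm = hamming(query, search[i:i + k])
            if mm <= max_mismatches:
                hits.append((i, strand, mm))
    return hits
```

For a ~100 kb circular genome and one spacer this is about 200,000 window comparisons (two strands), which is fine for a sketch; screening thousands of spacers would call for an indexed search instead.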
The sequences of the crAssphage proteins were compared, using PSI-BLAST, with the non-redundant protein sequence database (nr) and the Whole Genome Shotgun (WGS) databases (NCBI, NIH, Bethesda), which contain microbial genomic and metagenomic sequences. Sequences with significant similarity to crAssphage proteins were detected in four genomes of previously identified bacteriophages, in numerous contigs assigned to bacterial genomes (possibly prophages) and in metagenomic contigs. These sequences …
Natalya Yutin¹, Kira S. Makarova¹, Ayal B. Gussow¹, Mart Krupovic², Anca Segall¹,³, Robert A. Edwards³ and Eugene V. Koonin¹*
NATURE MICROBIOLOGY | VOL 3 | JANUARY 2018 | 38–46 | www.nature.com/naturemicrobiology
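The homology search described in the last paragraph of the article text compared crAssphage proteins with nr and WGS using PSI-BLAST; the excerpt does not state cutoffs or output settings. As a hedged sketch of the bookkeeping that typically follows such a search, the Python below assumes hits were written in BLAST+ tabular format (-outfmt 6, whose twelve default columns are listed in FIELDS) and applies an illustrative E-value cutoff of 1e-3. It then reports the fraction of query proteins without any significant hit, the kind of quantity behind the statement that over 80% of crAssphage proteins showed no significant similarity. The toy rows are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def significant_hits(lines: Iterable[str], max_evalue: float = 1e-3) -> Dict[str, Set[str]]:
    """Map each query ID to the set of subject IDs hit at or below `max_evalue`.

    Comment lines and status messages (anything without exactly twelve
    tab-separated fields) are skipped; hits repeated across PSI-BLAST
    iterations collapse into the same set.
    """
    hits: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(parts) != len(FIELDS):
            continue
        row = dict(zip(FIELDS, parts))
        if float(row["evalue"]) <= max_evalue:
            hits[row["qseqid"]].add(row["sseqid"])
    return dict(hits)


def fraction_without_hits(query_ids: Iterable[str], hits: Dict[str, Set[str]]) -> float:
    """Fraction of query proteins that have no significant hit at all."""
    queries = list(query_ids)
    if not queries:
        raise ValueError("no query IDs supplied")
    return sum(1 for q in queries if not hits.get(q)) / len(queries)


# Made-up rows: one strong hit and one weak hit for q1, one hit for q2.
example = [
    "q1\ts1\t45.0\t200\t107\t3\t1\t200\t5\t210\t1e-20\t150",
    "q1\ts2\t20.0\t50\t40\t1\t1\t50\t1\t50\t0.5\t20",
    "Search has CONVERGED!",
    "q2\ts3\t30.0\t100\t70\t2\t1\t100\t1\t100\t2e-05\t60",
]
hits = significant_hits(example)
print(hits)                                                   # {'q1': {'s1'}, 'q2': {'s3'}}
print(fraction_without_hits(["q1", "q2", "q3", "q4"], hits))  # 0.5
```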


... Two of the assembled phage genomes (NCBI GenBank accessions LSPY01000004 and LSPY01000006) existed in all collected colonies and showed similarity to a phage previously sequenced from a bacterium (Candidatus Azobacteroides pseudotrichonymphae, CAP) associated with cellulose-digesting protozoa (Pseudotrichonympha sp.) in the gut of the subterranean termite species Prorhinotermes japonicus from Lanyu island (Taiwan) (Pramono et al., 2017). Phylogenetic analysis by Yutin et al. (2018) characterized both phages as crAss-like phages (Caudovirales, Podoviridae). Therefore, LSPY01000004 was named "Termite Gut CrAss-like phage 04", abbreviated as "Phage TG-crAlp-04", and LSPY01000006 was named "Termite Gut CrAss-like phage 06", abbreviated as "Phage TG-crAlp-06". ...
... CrAssphages were first identified in 2014 as the most abundant virus in the human gut microbiome (Dutilh et al., 2014; Yutin et al., 2018, 2021; Koonin and Yutin, 2020). Although hundreds of crAss-like phage genomes have been identified in silico, few crAssphages have been isolated in pure culture (Shkoporov et al., 2018; Guerin et al., 2021). ...
... Their genome sizes (Phage TG-crAlp-04: 100,626 bp; Phage TG-crAlp-06: 98,173 bp) fell within the general size range of crAss-like phages, which are typically around 100 kb (Guerin et al., 2021). Detailed annotation of the genomes of Phages TG-crAlp-04 and TG-crAlp-06 confirmed signature genes from all major phage modules typically found in phages in general and in crAss-like phages in particular (Keary et al., 2014; Yutin et al., 2018; Koonin and Yutin, 2020). Protein sequences encoded by the signature genes in both phages matched those of crAss-like phages in the NCBI database, confirming the phylogenetic placement of the termite gut phages in the crAss-like phage group (Tikhe and Husseneder, 2018), similar to crAss-like phages from human gut and environmental metagenomes (Yutin et al., 2018, 2021). ...
Article
Subterranean termites depend nutritionally on their gut microbiota, which includes protozoa as well as taxonomically and functionally diverse bacteria. Our previous metavirome study revealed a high diversity and novel families of bacteriophages in the guts of Coptotermes formosanus workers from New Orleans, Louisiana, United States. Two assembled bacteriophage genomes (Phages TG-crAlp-04 and 06, family Podoviridae) existed in all colonies and showed similarity to a prophage (ProJPt-Bp1) previously sequenced from a bacterial endosymbiont (Candidatus Azobacteroides pseudotrichonymphae, CAP) of protozoa in the gut of a termite species of the genus Prorhinotermes from Taiwan. In this study the genomes of Phage TG-crAlp-04 and 06 were subjected to detailed functional annotation. Both phage genomes contained conserved genes for DNA packaging, head and tail morphogenesis, and phage replication. Approximately 30% of the amino acid sequences derived from genes in both genomes matched those of ProJPt-Bp1 phage or other phages from the crAss-like phage group. No integrase was identified; the lack of a lysogeny module is a characteristic of crAss-like phages. Primers were designed to sequence conserved genes of the two phages and their putative host bacterium (CAP) to detect their presence in different termite species from native and introduced distribution ranges. Related strains of the host bacterium were found across different termite genera and geographic regions. Different termite species had separate CAP strains, but intraspecific geographical variation was low. These results, together with the fact that CAP is an important intracellular symbiont of obligate cellulose-digesting protozoa, suggest that CAP is a core gut bacterium and co-evolved across several subterranean termite species. Variants of both crAss-like phages were detected in different Coptotermes species from the native and introduced range, but they did not differentiate by species or geographic region. 
Since similar phages were detected in different termite species, we propose the existence of a core virome associated with core bacterial endosymbionts of protozoa in the guts of subterranean termites. This work provides a strong basis for further study of the quadripartite relationship of termites, protozoa, bacteria, and bacteriophages.
... Consequently, human gut phage studies are limited to relatively low taxonomic levels. While recent efforts uncovered viral families that are widespread in human populations, such as the Crassvirales phages (15, 16), these have not been successfully linked to disease states. In order to develop microbiome-targeted interventions to benefit human health, it is pivotal to study such higher-level phage taxonomies in the gut among relevant cohorts. ...
... Despite efforts to catalog the human gut virome (14, 32), taxonomically higher structures are still largely absent. This study shows the worth of analyzing phages at higher taxonomic levels than genomes or VCs, similarly to what has been shown in recent years regarding the Crassvirales phage order (15, 16). Unlike the Crassvirales, however, Ca. ...
Article
There is significant interest in altering the course of cardiometabolic disease development via gut microbiomes. Nevertheless, the highly abundant phage members of the complex gut ecosystem, which impact gut bacteria, remain understudied. Here, we show gut virome changes associated with metabolic syndrome (MetS), a highly prevalent clinical condition preceding cardiometabolic disease, in 196 participants by combined sequencing of bulk whole genome and virus like particle communities. MetS gut viromes exhibit decreased richness and diversity. They are enriched in phages infecting Streptococcaceae and Bacteroidaceae and depleted in those infecting Bifidobacteriaceae. Differential abundance analysis identifies eighteen viral clusters (VCs) as significantly associated with either MetS or healthy viromes. Among these are a MetS-associated Roseburia VC that is related to healthy control-associated Faecalibacterium and Oscillibacter VCs. Further analysis of these VCs revealed the Candidatus Heliusviridae, a highly widespread gut phage lineage found in 90+% of participants. The identification of the temperate Ca. Heliusviridae provides a starting point to studies of phage effects on gut bacteria and the role that this plays in MetS.
... Thus, these crAssphages are widely regarded as virulent phages. Despite their ubiquity in the human gut, over 80% of the predicted proteins in crAssphage genomes showed no significant similarity to annotated crAssphage protein sequences, hampering their identification in newly sequenced metagenomic data [56]. The low similarity also poses a hard case for current lifestyle classification tools. ...
Preprint
Bacteriophages (or phages), which infect bacteria, have two distinct lifestyles: virulent and temperate. Predicting the lifestyle of phages helps decipher their interactions with their bacterial hosts, aiding phages' applications in fields such as phage therapy. Because experimental methods for annotating the lifestyle of phages cannot keep pace with the fast accumulation of sequenced phages, computational methods for predicting phages' lifestyles have become an attractive alternative. Despite some promising results, computational lifestyle prediction remains difficult because of the limited known annotations and the sheer number of sequenced phage contigs assembled from metagenomic data. In particular, most of the existing tools cannot precisely predict phages' lifestyles for short contigs. In this work, we develop PhaTYP (Phage TYPe prediction tool) to improve the accuracy of lifestyle prediction on short contigs. We design two different training tasks, self-supervised and fine-tuning tasks, to overcome lifestyle prediction difficulties. We rigorously tested and compared PhaTYP with four state-of-the-art methods: DeePhage, PHACTS, PhagePred, and BACPHLIP. The experimental results show that PhaTYP outperforms all these methods and achieves more stable performance on short contigs. In addition, we demonstrated the utility of PhaTYP for analyzing the phage lifestyle on human neonates' gut data. This application shows that PhaTYP is a useful means for studying phages in metagenomic data and helps extend our understanding of microbial communities.
... The crAssphages also appear to be abundant and widespread in diverse animal and environmental-human habitats (Yutin et al., 2018). Several bacteria of the phylum Bacteroidetes appear to be the primary hosts of crAssphages; ΦCrAss001, the first crAssphage isolated, infects Bacteroides intestinalis (Shkoporov et al., … genera was proposed (Guerin et al., 2018). ...
Preprint
Viral metagenomics studies of the human gut microbiota unravel differences in phage populations between health and disease, stimulating interest in the role that phages play in bacterial ecosystem regulation. CrAssphages are not only the most abundant viruses but also are a common component of the gut phageome across human populations. However, the role of crAssphages in obesity (O) and obesity with metabolic syndrome (OMS) remains largely unknown. Therefore, we explored the role that crAssphages have on both diseases in a children's cohort. We found decreased crAssphage abundance, prevalence, richness, and diversity in O and OMS compared to normal-weight (NW), suggesting a loss of crAssphage stability in the human phageome associated with the disease. Interestingly, when we analyzed the abundance of crAssphage host bacteria, we found that Bacteroidetes, Bacteroidia, and Bacteroidales were significantly decreased in O and OMS, suggesting a possible relation with the loss of crAssphage stability. Regarding crAssphage taxonomy, a significantly decreased abundance of the crAssphage Alpha subfamily and the Alpha_1 and Alpha_4 genera and a significant overabundance of Delta_8 were found in OMS. A strong taxonomical signature of obesity is the over-abundance of Bacilli, which also were significantly increased in O and OMS. Notably, we found a significant negative correlation between crAssphage and Bacilli abundances, suggesting an association between the decreased abundance of crAssphage and the over-abundance of Bacilli in OMS. Furthermore, we found a loss of crAssphage stability in the human virome associated with the presence of obesity, having a more significant impact on obesity with metabolic syndrome, suggesting that these bacteriophages could play an essential role in inhibiting metabolic syndrome in obese individuals. Our results point to a promising treatment for these diseases through fecal crAssphage transplantation.
... Most gut commensal bacteria are strictly anaerobic and require special growth media and environments. Massive expansion of gut bacteriophages identified in silico has created new capabilities to further investigate interactions among phages, gut bacteria, immunity, and disease (Benler et al.; Gregory et al., 2020; Nayfach et al., 2021; Yutin et al., 2018). But efforts to isolate gut bacteriophages have proven difficult, and only a limited number of gut phages have been isolated to date (Guerin et al., 2018, 2021; Hryckowian et al., 2020; Porter et al., 2020). ...
Article
Advances in synthetic genomics have led to a great demand for genetic manipulation. Trimming any process to simplify and accelerate streamlining of genetic code into life holds great promise for synthesizing and studying organisms. Here, we develop a simple but powerful stepping-stone strategy to promote genome refactoring of viruses in one pot, validated by successful cross-genus and cross-order rebooting of 90 phages infecting 4 orders of popular pathogens. Genomic sequencing suggests that rebooting outcome is associated with gene number and DNA polymerase availability within phage genomes. We integrate recombineering, screening, and rebooting processes in one pot and demonstrate genome assembly and genome editing of phages by stepping-stone hosts in an efficient and economic manner. Under this framework, in vitro assembly, yeast-based assembly, or genetic manipulation of native hosts are not required. As additional stepping-stone hosts are being developed, this framework will open doors for synthetic phages targeting more pathogens and commensals.
... This set of phages is complemented by a large variable population. Interestingly, the phage community in the human digestive tract is also highly distinct among individuals, with the exception of crAssPhage and perhaps a few other phages that are frequently observed in many individuals (52). crAssPhage is ubiquitous in humans and several groups of primates (53). ...
Article
Viruses that infect bacteria (bacteriophages) are abundant in the microbial communities that live on and in plants and animals. However, our knowledge of the structure, dynamics, and function of these viral communities lags far behind our knowledge of their bacterial hosts.
Article
Bacteriophages (phages) are often described as obligate predators of their bacterial hosts, and phage predation is one of the leading forces controlling the density and distribution of bacterial populations. Every 48 h half of all bacteria on Earth are killed by phages. Efficient killing also forms the basis of phage therapy in humans and animals and the use of phages as food preservatives. In turn, bacteria have a plethora of resistance systems against phage attack, but very few bacterial species, if any, have entirely escaped phage predation. However, in complex communities and environments such as the human gut, this antagonistic model of attack and counter-defence does not fully describe the scope of phage–bacterium interactions. In this Review, we explore some of the more mutualistic aspects of phage–bacterium interactions in the human gut, and we suggest that the relationship between phages and their bacterial hosts in the gut is best characterized not as a fight to the death between enemies but rather as a mutualistic relationship between partners. Bacteriophages are obligate parasites of their bacterial hosts; nevertheless, on a population level, phage–bacterium interactions can have beneficial outcomes. In this Review, Shkoporov, Turkington and Hill discuss the evidence for such mutualistic interactions in the gut microbiota and their roles.
Article
The human body is colonized by a multitude of bacteria, fungi, and viruses, which play important roles in health and disease. Microbial colonization during early life is thought to be a particularly important period with lasting consequences for health. Viral populations in the gut are particularly dynamic in early life before they stabilize in adulthood. The composition of the early-life virome is increasingly recognized as a determinant of disease later in life. Here, we review the development of the virome in healthy infants, as well as the role of the early-life virome in the development of disease states including diarrhea, malnutrition, and autoimmune diseases.
Article
There is significant interest in altering the course of cardiometabolic disease development via the gut microbiome. Nevertheless, the highly abundant phage members of the complex gut ecosystem, which impact gut bacteria, remain understudied. Here, we characterized gut phageome changes associated with metabolic syndrome (MetS), a highly prevalent clinical condition preceding cardiometabolic disease. MetS gut phageome populations exhibited decreased richness and diversity, but larger inter-individual variation. These populations were enriched in phages infecting Bacteroidaceae and depleted in those infecting Ruminococcaceae. Differential abundance analysis identified eighteen viral clusters (VCs) as significantly associated with either MetS or healthy phageomes. Among these are a MetS-associated Roseburia VC that is related to healthy control-associated Faecalibacterium and Oscillibacter VCs. Further analysis of these VCs revealed the Candidatus Heliusviridae, a highly widespread gut phage lineage found in 90+% of the participants. The identification of the temperate Ca. Heliusviridae provides a novel starting point to a better understanding of the effect that phages have on their bacterial hosts and the role that this plays in MetS.
Article
Intense biological conflicts between prokaryotic genomes and their genomic parasites have resulted in an arms race in terms of the molecular “weaponry” deployed on both sides. Using a recursive computational approach, we uncovered a remarkable class of multidomain proteins with 2 to 15 domains in the same polypeptide deployed by viruses and plasmids in such conflicts. Domain architectures and genomic contexts indicate that they are part of a widespread conflict strategy involving proteins injected into the host cell along with parasite DNA during the earliest phase of infection. Their unique feature is the combination of domains with highly disparate biochemical activities in the same polypeptide; accordingly, we term them polyvalent proteins. Of the 131 domains in polyvalent proteins, a large fraction are enzymatic domains predicted to modify proteins, target nucleic acids, alter nucleotide signaling/metabolism, and attack peptidoglycan or cytoskeletal components. They further contain nucleic acid-binding domains, virion structural domains, and 40 novel uncharacterized domains. Analysis of their architectural network reveals both pervasive common themes and specialized strategies for conjugative elements and plasmids or (pro)phages. The themes include likely processing of multidomain polypeptides by zincin-like metallopeptidases and mechanisms to counter restriction or CRISPR/Cas systems and jump-start transcription or replication. DNA-binding domains acquired by eukaryotes from such systems have been reused in XPC/RAD4-dependent DNA repair and mitochondrial genome replication in kinetoplastids. Characterization of the novel domains discovered here, such as RNases and peptidases, is likely to aid in the development of new reagents and elucidation of the spread of antibiotic resistance. 
IMPORTANCE This is the first report of the widespread presence of large proteins, termed polyvalent proteins, predicted to be transmitted by genomic parasites such as conjugative elements, plasmids, and phages during the initial phase of infection along with their DNA. They are typified by the presence of multiple domains with disparate activities combined in the same protein. While some of these domains are predicted to assist the invasive element in replication, transcription, or protection of their DNA, several are likely to target various host defense systems or modify the host to favor the parasite's life cycle. Notably, DNA-binding domains from these systems have been transferred to eukaryotes, where they have been incorporated into DNA repair and mitochondrial genome replication systems.
Article
The gut microbiota is essentially a multifunctional bioreactor within a human being. The exploration of its enormous metabolic potential provides insights into the mechanisms underlying microbial ecology and interactions with the host. The data obtained using “shotgun” metagenomics capture information about the whole spectrum of microbial functions. However, each new study presenting new sequencing data tends to extract only a little of the information concerning the metabolic potential and often omits specific functions. A meta-analysis of the available data with an emphasis on biomedically relevant gene groups can unveil new global trends in the gut microbiota. As a step toward the reuse of metagenomic data, we developed a method for the quantitative profiling of user-defined groups of genes in human gut metagenomes. This method is based on the quick analysis of a gene coverage matrix obtained by pre-mapping the metagenomic reads to a global gut microbial catalogue. The method was applied to profile the abundance of several gene groups related to antibiotic resistance, phages, biosynthesis clusters and carbohydrate degradation in 784 metagenomes from healthy populations worldwide and patients with inflammatory bowel diseases and obesity. We discovered country-wise functional specifics in gut resistome and virome compositions. The most distinct features of the disease microbiota were found for Crohn’s disease, followed by ulcerative colitis and obesity. Profiling of the genes belonging to crAssphage showed that its abundance varied across the world populations and was not associated with clinical status. We demonstrated temporal resilience of crAssphage and the influence of the sample preparation protocol on its detected abundance. Our approach offers a convenient method to add value to accumulated “shotgun” metagenomic data by helping researchers state and assess novel biological hypotheses.
Article
The HU superfamily of proteins, with a unique DNA-binding mode, has been extensively studied as the primary chromosome-packaging protein of the bacterial superkingdom. Representatives also play a role in DNA-structuring during recombination events and in eukaryotic organellar genome maintenance. However, beyond these well-studied roles, little is understood of the functional diversification of this large superfamily. Using sensitive sequence and structure analysis methods we identify multiple novel clades of the HU superfamily. We present evidence that a novel eukaryotic clade prototyped by the human CCDC81 protein acquired roles beyond DNA-binding, likely in protein-protein interaction in centrosome organization and as a potential cargo-binding protein in conjunction with Dynein-VII. We also show that these eukaryotic versions were acquired via an early lateral transfer from bacteroidetes, where we predict a role in chromosome partition. This likely happened prior to the last eukaryotic common ancestor, pointing to potential endosymbiont contributions beyond that of the mitochondrial progenitor. Further, we show that the dramatic lineage-specific expansion of this domain in the bacteroidetes lineage primarily is linked to a functional shift related to potential recognition and preemption of genome invasive entities such as mobile elements. Remarkably, the CCDC81 clade has undergone a similar massive lineage-specific expansion within the archosaurian lineage in birds, suggesting a possible use of the HU superfamily in a similar capacity in recognition of non-self molecules even in this case.
Article
Termites depend nutritionally on their gut microbes, and protistan, bacterial, and archaeal gut communities have been extensively studied. However, limited information is available on viruses in the termite gut. We herein report the complete genome sequence (99,517 bp) of a phage obtained during a genome analysis of "Candidatus Azobacteroides pseudotrichonymphae" phylotype ProJPt-1, which is an obligate intracellular symbiont of the cellulolytic protist Pseudotrichonympha sp. in the gut of the termite Prorhinotermes japonicus. The genome of the phage, designated ProJPt-Bp1, was circular or circularly permuted, and was not integrated into the two circular chromosomes or five circular plasmids composing the host ProJPt-1 genome. The phage was putatively affiliated with the order Caudovirales based on sequence similarities with several phage-related genes; however, most of the 52 protein-coding sequences had no significant homology to sequences in the databases. The phage genome contained a tRNA-Gln (CAG) gene, which showed the highest sequence similarity to the tRNA-Gln (CAA) gene of the host "Ca. A. pseudotrichonymphae" phylotype ProJPt-1. Since the host genome lacked a tRNA-Gln (CAG) gene, the phage tRNA gene may compensate for differences in codon usage bias between the phage and host genomes. The phage genome also contained a non-coding region with high nucleotide sequence similarity to a region in one of the host plasmids. No other phage-related sequences were found in the host ProJPt-1 genome. To the best of our knowledge, this is the first report of a phage from an obligate, mutualistic endosymbiont permanently associated with eukaryotic cells.
Article
Viruses are the most abundant biological entities on earth and show remarkable diversity of genome sequences, replication and expression strategies, and virion structures. Evolutionary genomics of viruses revealed many unexpected connections but the general scenario(s) for the evolution of the virosphere remains a matter of intense debate among proponents of the cellular regression, escaped genes, and primordial virus world hypotheses. A comprehensive sequence and structure analysis of major virion proteins indicates that they evolved on about 20 independent occasions, and in some of these cases likely ancestors are identifiable among the proteins of cellular organisms. Virus genomes typically consist of distinct structural and replication modules that recombine frequently and can have different evolutionary trajectories. The present analysis suggests that, although the replication modules of at least some classes of viruses might descend from primordial selfish genetic elements, bona fide viruses evolved on multiple, independent occasions throughout the course of evolution by the recruitment of diverse host proteins that became major virion components.
Article
The number and diversity of viral sequences that are identified in metagenomic data far exceeds that of experimentally characterized virus isolates. In a recent workshop, a panel of experts discussed the proposal that, with appropriate quality control, viruses that are known only from metagenomic data can, and should, be incorporated into the official classification scheme of the International Committee on Taxonomy of Viruses (ICTV). Although a taxonomy that is based on metagenomic sequence data alone represents a substantial departure from the traditional reliance on phenotypic properties, the development of a robust framework for sequence-based virus taxonomy is indispensable for the comprehensive characterization of the global virome. In this Consensus Statement article, we consider the rationale for why metagenomic sequence data should, and how it can, be incorporated into the ICTV taxonomy, and present proposals that have been endorsed by the Executive Committee of the ICTV.
Article
Viruses and their host genomes often share similar oligonucleotide frequency (ONF) patterns, which can be used to predict the host of a given virus by finding the host with the greatest ONF similarity. We comprehensively compared 11 ONF metrics using several k-mer lengths for predicting host taxonomy from among ∼32 000 prokaryotic genomes for 1427 virus isolate genomes whose true hosts are known. The background-subtracting measure d2* at k = 6 gave the highest host prediction accuracy (33%, genus level) with reasonable computational times. Requiring a maximum dissimilarity score for making predictions (thresholding) and taking the consensus of the 30 most similar hosts further improved accuracy. Using a previous dataset of 820 bacteriophage and 2699 bacterial genomes, d2* host prediction accuracies with thresholding and consensus methods (genus level: 64%) exceeded previous Euclidean distance ONF (32%) or homology-based (22-62%) methods. When applied to metagenomically assembled marine SUP05 viruses and the human gut virus crAssphage, d2*-based predictions overlapped (i.e. some same, some different) with the previously inferred hosts of these viruses. The extent of overlap improved when only using host genomes or metagenomic contigs from the same habitat or samples as the query viruses. The d2* ONF method will greatly improve the characterization of novel, metagenomic viruses.
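The host-prediction pipeline summarized above (a background-subtracted k-mer dissimilarity, a maximum-score threshold, and a consensus over the closest hosts) can be sketched in a few lines. This is a minimal illustration, not the published tool. It assumes an order-0 (independent-nucleotide) background model for the expected k-mer counts and ignores reverse complements, whereas the original method may use a higher-order Markov background. It also reduces the consensus to a simple majority vote over genera. The names `d2_star`, `predict_host_genus`, and the `threshold` value are hypothetical. Written out, the dissimilarity computed here is d2* = ½ · (1 − Σ_w X̃_w Ỹ_w / √(E[X_w] E[Y_w]) / (√(Σ_w X̃_w² / E[X_w]) · √(Σ_w Ỹ_w² / E[Y_w]))), where X̃_w = X_w − E[X_w] is the background-subtracted count of word w.

```python
import itertools
import math
from collections import Counter
from typing import Dict, Optional, Tuple

ALPHABET = "ACGT"


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping windows that contain non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(c in ALPHABET for c in word):
            counts[word] += 1
    return counts


def centred_vector(seq: str, k: int) -> Dict[str, float]:
    """Map each k-mer w to (X_w - E[X_w]) / sqrt(E[X_w]).

    Assumption: E[X_w] comes from an order-0 background model, i.e. the
    number of k-mers times the product of single-nucleotide frequencies.
    """
    counts = kmer_counts(seq, k)
    n_words = sum(counts.values())
    base = Counter(c for c in seq.upper() if c in ALPHABET)
    total = sum(base.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    freq = {b: base[b] / total for b in ALPHABET}
    vec = {}
    for word in map("".join, itertools.product(ALPHABET, repeat=k)):
        expected = n_words * math.prod(freq[c] for c in word)
        if expected > 0:  # words with zero expectation are left out
            vec[word] = (counts[word] - expected) / math.sqrt(expected)
    return vec


def d2_star(x: str, y: str, k: int = 6) -> float:
    """d2* dissimilarity in [0, 1]: 0.5 * (1 - cosine of the scaled vectors)."""
    a, b = centred_vector(x, k), centred_vector(y, k)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 0.5 if norm == 0 else 0.5 * (1 - dot / norm)


def predict_host_genus(virus: str, hosts: Dict[str, Tuple[str, str]],
                       k: int = 6, threshold: Optional[float] = None,
                       top_n: int = 30) -> Optional[str]:
    """Majority genus among the top_n closest hosts.

    hosts maps a host identifier to a (genus, genome_sequence) pair.
    Returns None if there are no hosts or if the best score exceeds threshold.
    """
    scored = sorted((d2_star(virus, seq, k), genus)
                    for genus, seq in hosts.values())
    if not scored or (threshold is not None and scored[0][0] > threshold):
        return None
    votes = Counter(genus for _, genus in scored[:top_n])
    return votes.most_common(1)[0][0]
```

As written, each call recomputes every host's vector. Against tens of thousands of host genomes, a practical version would cache `centred_vector` per host and use vectorized arithmetic.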
Article
The role of bacteriophages in influencing the structure and function of the healthy human gut microbiome is unknown. With few exceptions, previous studies have found a high level of heterogeneity in bacteriophages from healthy individuals. To better estimate and identify the shared phageome of humans, we analyzed a deep DNA sequence dataset of active bacteriophages and available metagenomic datasets of the gut bacteriophage community from healthy individuals. We found 23 shared bacteriophages in more than one-half of 64 healthy individuals from around the world. These shared bacteriophages were found in a significantly smaller percentage of individuals with gastrointestinal/irritable bowel disease. A network analysis identified 44 bacteriophage groups of which 9 (20%) were shared in more than one-half of all 64 individuals. These results provide strong evidence of a healthy gut phageome (HGP) in humans. The bacteriophage community in the human gut is a mixture of three classes: a set of core bacteriophages shared among more than one-half of all people, a common set of bacteriophages found in 20-50% of individuals, and a set of bacteriophages that are either rarely shared or unique to a person. We propose that the core and common bacteriophage communities are globally distributed and comprise the HGP, which plays an important role in maintaining gut microbiome structure/function and thereby contributes significantly to human health.
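The three-class description of the gut phage community amounts to a prevalence rule applied to a presence/absence table. Below is a minimal sketch, not the study's network analysis. It assumes that "more than one-half" is a strict cutoff and that the 20-50% "common" band includes both endpoints. The names `prevalence_class` and `classify_all` are hypothetical.

```python
from typing import Dict, Set


def prevalence_class(n_carriers: int, n_individuals: int) -> str:
    """Classify one phage by the share of individuals in which it was found."""
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    share = n_carriers / n_individuals
    if share > 0.5:          # core: more than one-half of all people
        return "core"
    if share >= 0.2:         # common: 20-50% (endpoints assumed inclusive)
        return "common"
    return "rare_or_unique"  # rarely shared or unique to a person


def classify_all(presence: Dict[str, Set[str]], n_individuals: int) -> Dict[str, str]:
    """presence maps a phage identifier to the set of individuals carrying it."""
    return {phage: prevalence_class(len(carriers), n_individuals)
            for phage, carriers in presence.items()}
```

With 64 individuals, as in the cohort described above, a phage must be present in at least 33 people to count as core under this rule.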
Article
Over the last decade, our appreciation for the contribution of resident gut microorganisms—the gut microbiota—to human health has surged. However, progress is limited by the sheer diversity and complexity of these microbial communities. Compounding the challenge, the majority of our commensal microorganisms are not close relatives of Escherichia coli or other model organisms and have eluded culturing and manipulation in the laboratory. In this Review, we discuss how over a century of study of the readily cultured, genetically tractable human gut Bacteroides has revealed important insights into the biochemistry, genomics and ecology that make a gut bacterium a gut bacterium. While genome and metagenome sequences are being produced at breakneck speed, the Bacteroides provide a significant ‘jump-start’ on uncovering the guiding principles that govern microbiota–host and inter-bacterial associations in the gut that will probably extend to many other members of this ecosystem.
Article
Applying synthetic biology to engineer gut-resident microbes provides new avenues to investigate microbe-host interactions, perform diagnostics, and deliver therapeutics. Here, we describe a platform for engineering Bacteroides, the most abundant genus in the Western microbiota, which includes a process for high-throughput strain modification. We have identified a novel phage promoter and translational tuning strategy and achieved an unprecedented level of expression that enables imaging of fluorescent-protein-expressing Bacteroides stably colonizing the mouse gut. A detailed characterization of the phage promoter has provided a set of constitutive promoters that span over four logs of strength without detectable fitness burden within the gut over 14 days. These promoters function predictably over a 1,000,000-fold expression range in phylogenetically diverse Bacteroides species. With these promoters, unique fluorescent signatures were encoded to allow differentiation of six species within the gut. Fluorescent protein-based differentiation of isogenic strains revealed that priority of gut colonization determines colonic crypt occupancy.