
Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut


Abstract

Metagenomic sequence analysis is rapidly becoming the primary source of virus discovery (1-3). A substantial majority of the currently available virus genomes come from metagenomics, and some of these represent extremely abundant viruses, even if never grown in the laboratory. A particularly striking case of a virus discovered via metagenomics is crAssphage, which is by far the most abundant human-associated virus known, comprising up to 90% of sequences in the gut virome (4). Over 80% of the predicted proteins encoded in the approximately 100 kilobase crAssphage genome showed no significant similarity to available protein sequences, precluding classification of this virus and hampering further study. Here we combine a comprehensive search of genomic and metagenomic databases with sensitive methods for protein sequence analysis to identify an expansive, diverse group of bacteriophages related to crAssphage and predict the functions of the majority of phage proteins, in particular those that comprise the structural, replication and expression modules. Most, if not all, of the crAss-like phages appear to be associated with diverse bacteria from the phylum Bacteroidetes, which includes some of the most abundant bacteria in the human gut microbiome and that are also common in various other habitats. These findings provide for experimental characterization of the most abundant but poorly understood members of the human-associated virome.
Letters
https://doi.org/10.1038/s41564-017-0053-y
© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.
1 National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA. 2 Institut Pasteur, Unité Biologie Moléculaire du Gène chez les Extrêmophiles, Paris, France. 3 Viral Information Institute, Department of Biology, San Diego State University, San Diego, CA, USA.
*e-mail: koonin@ncbi.nlm.nih.gov
Viruses are the most abundant biological entities on Earth. In most environments, from ocean water to the content of animal guts, the number of detected virus particles exceeds that of cells by one to two orders of magnitude (2). Among these viruses, more than 90% are tailed bacteriophages (1). More than 99% of the prokaryotic diversity in the biosphere is represented by bacteria and archaea that fail to grow in laboratory cultures and, accordingly, the great majority of the viruses are thought to infect these uncultivated microbes (1). Moreover, analysis of the human gut virome shows that most of the sequences, in contrast to the bacterial and archaeal sequences, have no matches in the current sequence databases, suggesting a vast virome consisting primarily of ‘dark matter’ (5-7).
The crAssphage is the utmost manifestation of this trend. The complete crAssphage (after Cross Assembly) genome was assembled by joining contigs obtained from several human faecal viral metagenomes as a circular double-stranded (ds) DNA molecule of ~97 kilobases (kb) (4). The circular genome map apparently results from terminal redundancy and/or circular permutation. The crAssphage is extremely abundant, accounting for up to 90% of the reads in the virus-like particle-enriched fraction of the gut metagenome and about 22% of the reads in the total metagenome. Reads matching the crAssphage genome have been identified in numerous gut metagenomes collected in diverse geographic locations, indicating that crAssphage is not only the most abundant virus in the human gut microbiome but also a (nearly) ubiquitous one (4,8,9). Read co-occurrence analysis points to bacteria of the phylum Bacteroidetes as the host(s) of crAssphage (4,10). This assignment is compatible with the presence in the crAssphage genome of a protein containing carbohydrate-binding domains (BACON domains) that is highly similar to a homologous protein from Bacteroides and with partial matches between two crAssphage sequences and CRISPR spacers from two species of Bacteroides (4). Members of the Bacteroidetes dominate the gut microbiome, but most of these bacteria so far have not been grown in culture (11,12). Thus, it is hardly surprising that the most abundant, but never isolated, phage from this environment appears to be a parasite of Bacteroidetes. Analysis of the protein sequences encoded in the crAssphage genome failed to identify specific relationships with other bacteriophages (4). Several proteins implicated in phage genome replication have been identified, including a family B DNA polymerase (DNAP), a primase and a flavin-dependent thymidylate synthase, but neither the major capsid protein nor other structural and morphogenetic proteins were detected. In an attempt to clarify the provenance of this most abundant but enigmatic human-associated virus, we reanalysed the crAssphage genome using the most sensitive available methods for protein sequence analysis and taking advantage of database growth since the time of crAssphage discovery. The result is the identification of a previously unknown, expansive bacteriophage family that appears to be associated with diverse members of Bacteroidetes and for which we now recognize the structural, replication and expression gene modules.
The sequences of the crAssphage proteins were compared (using PSI-BLAST) to the non-redundant protein sequence database (nr) and the Whole Genome Shotgun (WGS) databases (NCBI, NIH, Bethesda) containing microbial genomic and metagenomic sequences. Sequences with significant similarity to crAssphage proteins were detected in four genomes of previously identified bacteriophages and numerous contigs assigned to bacterial genomes (possibly, prophages) and metagenomic contigs. These sequences
Natalya Yutin 1, Kira S. Makarova 1, Ayal B. Gussow 1, Mart Krupovic 2, Anca Segall 1,3, Robert A. Edwards 3 and Eugene V. Koonin 1*
NATURE MICROBIOLOGY | VOL 3 | JANUARY 2018 | 38–46 | www.nature.com/naturemicrobiology


... A powerful computational assay was utilized to find extended sequence databases for the functions of most of the crAssphage genes. Many crAss-like phages were identified in multiple host-associated and environmental viromes [14]. Genetically, crAssphages are highly diverse and have been grouped into ten genera that share more than 40% of their ORFs [15]. ...
... The highest crAssphage prevalence was detected at a relatively high wind speed range (15-21 km/h) in all three sampling areas. On the other hand, crAssphage was not detected in KSU-WWTP and EMB-WWTP in the 8-14 km/h wind speed range (Figure 6). CrAssphage prevalence was detected at different wind speed levels varying from a low wind speed of 1-7 km/h to a high wind speed level of 30 km/h. ...
Article
The most common DNA virus found in wastewaters globally is the cross-assembly phage (crAssphage). King Saud University wastewater treatment plant (KSU-WWTP); Manfoha wastewater treatment plant (MN-WWTP); and the Embassy wastewater treatment plant (EMB-WWTP) in Riyadh, Saudi Arabia were selected, and 36 untreated sewage water samples during the year 2022 were used in the current study. The meteorological impact on crAssphage prevalence was investigated. CrAssphage prevalence was recorded using PCR and Sanger sequencing. The molecular diversity of crAssphage sequences was studied for viral gene segments from the major capsid protein (MCP) and membrane protein containing the peptidoglycan-binding domain (MP-PBD). KSU-WWTP and EMB-WWTP showed a higher prevalence of crAssphage (83.3%) than MN-WWTP (75%). Phylogenetic analysis of MCP and MP-PBD segments depicted a close relationship to the Japanese isolates. The MCP gene from the current study’s isolate WW/2M/SA/2022 depicted zero evolutionary divergence from 3057_98020, 2683_104905, and 4238_99953 isolates (d = 0.000) from Japan. A significant influence of temporal variations on the prevalence of crAssphage was detected in the three WWTPs. CrAssphage displayed the highest prevalence at high temperatures (33–44 °C), low relative humidity (6–14%), and moderate wind speed (16–21 Km/h). The findings provided pioneering insights into crAssphage prevalence and its genetic diversity in WWTPs in Riyadh, Saudi Arabia.
... The endolysins studied so far do not yet cover the diversity of enzymes encoded in the genomes of the phages spread across the planet's ecosystems. Research based on massive sequencing of the bacteria that inhabit particular ecological niches has revolutionized our understanding of the genetic diversity of phages, revealing genomes with completely new evolutionary lineages on an unprecedented scale (Roux et al., 2016; Paez-Espino et al., 2016; Yutin et al., 2018). These genomes are a biological resource for the discovery of endolysins with unique characteristics that can help combat the pathogens for which treatments are urgently sought worldwide. ...
Article
Recently, lytic enzymes encoded by bacteriophages or phages (endolysins) have been proposed as an alternative to combat pathogenic bacteria. The antibacterial character of these enzymes results from their ability to hydrolyze peptidoglycan, which composes the bacterial cell wall. Endolysins are characterized by a narrow spectrum of action, rapid bactericidal effect, low probability of bacterial resistance, harmlessness, and effectiveness against antibiotic-resistant pathogens. Thanks to sequencing platform accessibility and massive data analysis, many endolysin-encoding genes have been identified within phage genomes. Moreover, their bactericidal activity can be enhanced by manipulating their functional domains through protein engineering. Notably, this manipulation can broaden their bactericidal spectrum against Gram-negative bacteria, which are mostly insensitive to their effect. The fusion of peptides capable of permeabilizing the outer membrane has expanded the application of these endolysins against pathogens of critical priority. There are research groups worldwide with outstanding developments and patents with this technology. However, in Mexico, it is a little-explored line of research. With this work, we intend to disseminate this new generation of antibacterials.
... Due to its ubiquity in the global human population and abundance in fecally polluted water, p-crAssphage has been proposed as a human fecal indicator. Other Crassvirales are also highly abundant in the mammalian gut, particularly in humans (3,8,31,42). However, before other Crassvirales can be employed as human fecal markers, several issues need to be addressed. To date, crAssphage has been largely targeted in many reports, and reported as if it were a single virus, even though there is a great heterogeneity within the Crassvirales order. ...
Article
Crassvirales (crAss-like phages) are an abundant group of human gut-specific bacteriophages discovered in silico. The use of crAss-like phages as human fecal indicators is proposed but the isolation of only seven cultured strains of crAss-like phages to date has greatly hindered their study. Here, we report the isolation and genetic characterization of 25 new crAss-like phages (termed crAssBcn) infecting Bacteroides intestinalis, belonging to the order Crassvirales, genus Kehishuvirus and, based on their genomic variability, classified into six species. CrAssBcn phage genomes are similar to ΦCrAss001 but show genomic and amino acid differences when compared to other crAss-like phages of the same family. CrAssBcn phages are detected in fecal metagenomes around the world at a higher frequency than ΦCrAss001. This study increases the number of known crAss-like phage isolates, and their abundance and heterogeneity raise the question of which member of the Crassvirales group should be selected as human fecal marker.
... It has a dsDNA genome and was predicted to mainly infect Bacteroides spp. The genome of crAssphage is ∼100 kb, 80% of which encode predicted proteins with no significant similarity to available protein sequences (Yutin et al., 2018). Initially, genome analysis of crAssphage was unable to find any related phages or fully explain the function of most genes present. ...
Article
The human gastrointestinal tract is colonized by a large number of microorganisms, including bacteria, archaea, viruses, and eukaryotes. The bacterial community has been widely confirmed to have a significant impact on human health, while viruses, particularly phages, have received less attention. Phages are viruses that specifically infect bacteria. They are abundant in the biosphere and exist in a symbiotic relationship with their host bacteria. Although the application of high-throughput sequencing and bioinformatics technology has greatly improved our understanding of the genomic diversity, taxonomic composition, and spatio-temporal dynamics of the human gut phageome, there is still a large portion of sequencing data that is uncharacterized. Preliminary studies have predicted that the phages play a crucial role in driving microbial ecology and evolution. Prior to exploring the function of phages, it is necessary to address the obstacles that hinder establishing a comprehensive sequencing database with sufficient biological properties and understanding the impact of phage–bacteria interactions on human health. In this study, we provide an overview of the human gut phageome, including its composition, structure, and development. We also explore the various factors that may influence the phageome based on current research, including age, diet, ethnicity, and geographical location. Additionally, we summarize the relationship between the phageome and human diseases, such as IBD, IBS, obesity, diabetes, and metabolic syndrome.
... Another point of concern is the relatively high proportion of ORFs without a known function, which in 3/13 prophages is greater than 50%. These findings are aligned with previous studies, which note not only the vast number of unknown phages sequenced amidst metagenomic data (referred to as viral dark matter) but also the abundance of putative proteins whose function is unknown (72-75). In this regard, further studies concerning prophage identification, regulatory pathways, interaction with their host, and protein function should be made. ...
Article
Prophages are bacteriophages integrated into the bacterial host’s chromosome. This research aims to analyze and characterize the existing prophages within a collection of 53 Pseudomonas aeruginosa strains from intensive care units (ICUs) in Portugal and Spain. A total of 113 prophages were localized in the collection, with 18 of them being present in more than one strain simultaneously. After annotation, five of them were discarded as incomplete, and the 13 remaining prophages were characterized. Of 13, 10 belonged to the siphovirus tail morphology group, 2 to the podovirus tail morphology group, and 1 to the myovirus tail morphology group. All prophages had a length ranging from 20,199 to 63,401 bp and a GC% between 56.2% and 63.6%. The number of open reading frames (ORFs) oscillated between 32 and 88, and in 3/13 prophages, more than 50% of the ORFs had an unknown function. With our findings, we show that prophages are present in the majority of the P. aeruginosa strains isolated from Portuguese and Spanish critically ill patients, many of them found in more than one circulating strain at the same time and following a similar clonal distribution pattern. Although a large share of ORFs had an unknown function, a number of proteins related to viral defense (anti-CRISPR proteins, toxin/antitoxin modules, proteins against restriction-modification systems) as well as to prophage interference with their host’s quorum sensing system and regulatory cascades were found. This supports the idea that prophages have an influence in bacterial pathogenesis and anti-phage defense.
IMPORTANCE Despite being known for decades, prophages remain understudied when compared to the lytic phages employed in phage therapy. This research aims to shed some light on the nature, composition, and role of prophages found within a set of circulating strains of Pseudomonas aeruginosa, with special attention to high-risk clones. Given the fact that prophages can effectively influence bacterial pathogenesis, prophage basic research constitutes a topic of growing interest. Furthermore, the abundance of viral defense and regulatory proteins within prophage genomes detected in this study evidences the importance of characterizing the most frequent prophages in circulating clinical strains and in high-risk clones if phage therapy is to be used.
Preprint
Stool samples for fecal immunochemical tests (FIT) are collected in large numbers worldwide as part of colorectal cancer screening programs, but to our knowledge, the utility of these samples for virome studies is still unexplored. Employing FIT samples from 1034 CRCbiome participants, recruited from a Norwegian colorectal cancer screening study, we identified and annotated more than 18000 virus clusters (vOTUs), using shotgun metagenome sequencing. Only six percent of vOTUs were assigned to a known taxonomic family, with Microviridae being the most prevalent viral family. Genome integration state was family-associated, and the majority of identified viruses were unintegrated. Linking individual profiles to comprehensive lifestyle and demographic factors showed 17/25 of them to be associated with the gut virome. Physical activity, smoking, and dietary fiber consumption exhibited strong and consistent associations with both diversity and relative abundance of individual vOTUs, as well as with enrichment for auxiliary metabolic genes. We demonstrate the suitability of FIT samples for virome analysis, opening an opportunity for large-scale studies of this yet enigmatic part of the gut microbiome. The diverse viral populations and their connections to the individual lifestyle uncovered herein paves the way for further exploration of the role of the gut virome in health and disease.
Article
Bacteriophage tail fibers (or called tail spikes) play a critical role in the early stage of infection by binding to the bacterial surface. Podophages with known structures usually possess one or two types of fibers. Here, we resolved an asymmetric structure of the podophage GP4 to near-atomic resolution by cryo-EM. Our structure revealed a symmetry-mismatch relationship between the components of the GP4 tail with previously unseen topologies. In detail, two dodecameric adaptors (adaptors I and II), a hexameric nozzle, and a tail needle form a conserved tail body connected to a dodecameric portal occupying a unique vertex of the icosahedral head. However, five chain-like extended fibers (fiber I) and five tulip-like short fibers (fiber II) are anchored to a 15-fold symmetric fiber-tail adaptor, encircling the adaptor I, and six bamboo-like trimeric fibers (fiber III) are connected to the nozzle. Five fibers I, each composed of five dimers of the protein gp80 linked by an elongated rope protein, are attached to the five edges of the tail vertex of the icosahedral head. In this study, we identified a new structure of the podophage with three types of tail fibers, and such phages with different types of fibers may have a broad host range and/or infect host cells with considerably high efficiency, providing evolutionary advantages in harsh environments.
Article
The gut microbiome is a dense and metabolically active consortium of microorganisms and viruses located in the lower gastrointestinal tract of the human body. Bacteria and their viruses (phages) are the most abundant members of the gut microbiome. Investigating their biology and the interplay between the two is important if we are to understand their roles in human health and disease. In this review, we summarize recent advances in resolving the taxonomic structure and ecological functions of the complex community of phages in the human gut—the gut phageome. We discuss how age, diet, and geography can all have a significant impact on phageome composition. We note that alterations to the gut phageome have been observed in several diseases such as inflammatory bowel disease, irritable bowel syndrome, and colorectal cancer, and we evaluate whether these phageome changes can directly or indirectly contribute to disease etiology and pathogenesis. We also highlight how lack of standardization in studying the gut phageome has contributed to variation in reported results. Expected final online publication date for the Annual Review of Microbiology, Volume 77 is September 2023. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Article
The gut microbiota has been gaining attention due to its interactions with the human body and its role in pathophysiological processes. One of the main interactions is the "gut-liver axis", in which disruption of the gut mucosal barrier seen in portal hypertension and liver disease can influence liver allograft function over time. For example, in patients who are undergoing liver transplantation, preexisting dysbiosis, perioperative antibiotic use, surgical stress, and immunosuppressive use have each been associated with alterations in gut microbiota, potentially impacting overall morbidity and mortality. In this review, studies exploring gut microbiota changes in patients undergoing liver transplantation are reviewed, including both human and experimental animal studies. Common themes include an increase in Enterobacteriaceae and Enterococcaceae species and a decrease in Faecalibacterium prausnitzii and Bacteroides, along with a decrease in the overall diversity of gut microbiota following liver transplantation.
Article
Intense biological conflicts between prokaryotic genomes and their genomic parasites have resulted in an arms race in terms of the molecular “weaponry” deployed on both sides. Using a recursive computational approach, we uncovered a remarkable class of multidomain proteins with 2 to 15 domains in the same polypeptide deployed by viruses and plasmids in such conflicts. Domain architectures and genomic contexts indicate that they are part of a widespread conflict strategy involving proteins injected into the host cell along with parasite DNA during the earliest phase of infection. Their unique feature is the combination of domains with highly disparate biochemical activities in the same polypeptide; accordingly, we term them polyvalent proteins. Of the 131 domains in polyvalent proteins, a large fraction are enzymatic domains predicted to modify proteins, target nucleic acids, alter nucleotide signaling/metabolism, and attack peptidoglycan or cytoskeletal components. They further contain nucleic acid-binding domains, virion structural domains, and 40 novel uncharacterized domains. Analysis of their architectural network reveals both pervasive common themes and specialized strategies for conjugative elements and plasmids or (pro)phages. The themes include likely processing of multidomain polypeptides by zincin-like metallopeptidases and mechanisms to counter restriction or CRISPR/Cas systems and jump-start transcription or replication. DNA-binding domains acquired by eukaryotes from such systems have been reused in XPC/RAD4-dependent DNA repair and mitochondrial genome replication in kinetoplastids. Characterization of the novel domains discovered here, such as RNases and peptidases, are likely to aid in the development of new reagents and elucidation of the spread of antibiotic resistance. 
IMPORTANCE This is the first report of the widespread presence of large proteins, termed polyvalent proteins, predicted to be transmitted by genomic parasites such as conjugative elements, plasmids, and phages during the initial phase of infection along with their DNA. They are typified by the presence of multiple domains with disparate activities combined in the same protein. While some of these domains are predicted to assist the invasive element in replication, transcription, or protection of their DNA, several are likely to target various host defense systems or modify the host to favor the parasite's life cycle. Notably, DNA-binding domains from these systems have been transferred to eukaryotes, where they have been incorporated into DNA repair and mitochondrial genome replication systems.
Article
The gut microbiota is essentially a multifunctional bioreactor within a human being. The exploration of its enormous metabolic potential provides insights into the mechanisms underlying microbial ecology and interactions with the host. The data obtained using “shotgun” metagenomics capture information about the whole spectrum of microbial functions. However, each new study presenting new sequencing data tends to extract only a little of the information concerning the metabolic potential and often omits specific functions. A meta-analysis of the available data with an emphasis on biomedically relevant gene groups can unveil new global trends in the gut microbiota. As a step toward the reuse of metagenomic data, we developed a method for the quantitative profiling of user-defined groups of genes in human gut metagenomes. This method is based on the quick analysis of a gene coverage matrix obtained by pre-mapping the metagenomic reads to a global gut microbial catalogue. The method was applied to profile the abundance of several gene groups related to antibiotic resistance, phages, biosynthesis clusters and carbohydrate degradation in 784 metagenomes from healthy populations worldwide and patients with inflammatory bowel diseases and obesity. We discovered country-wise functional specifics in gut resistome and virome compositions. The most distinct features of the disease microbiota were found for Crohn’s disease, followed by ulcerative colitis and obesity. Profiling of the genes belonging to crAssphage showed that its abundance varied across the world populations and was not associated with clinical status. We demonstrated temporal resilience of crAssphage and the influence of the sample preparation protocol on its detected abundance. Our approach offers a convenient method to add value to accumulated “shotgun” metagenomic data by helping researchers state and assess novel biological hypotheses.
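The profiling idea in this abstract (summing coverage over user-defined gene groups in a precomputed gene coverage matrix) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `profile_gene_groups`, the nested-dictionary layout of the matrix, and the normalization to a fraction of total coverage per sample are all assumptions made for illustration.

```python
def profile_gene_groups(coverage, groups):
    """Summarize a gene coverage matrix by user-defined gene groups.

    coverage: dict mapping sample name -> dict mapping gene ID -> coverage
              (hypothetical layout; the abstract only says "gene coverage matrix").
    groups:   dict mapping group name -> set of gene IDs.
    Returns:  dict mapping sample -> dict mapping group -> fraction of the
              sample's total coverage that falls on genes of that group
              (this normalization is an assumption of the sketch).
    """
    profile = {}
    for sample, gene_cov in coverage.items():
        total = sum(gene_cov.values())
        profile[sample] = {}
        for group, genes in groups.items():
            # Genes missing from the sample contribute zero coverage.
            group_cov = sum(gene_cov.get(gene, 0.0) for gene in genes)
            profile[sample][group] = group_cov / total if total else 0.0
    return profile


# Toy data: two samples, three catalogue genes, one group of "phage" genes.
toy_coverage = {
    "sample_A": {"gene1": 2.0, "gene2": 6.0, "gene3": 2.0},
    "sample_B": {"gene1": 0.0, "gene2": 5.0, "gene3": 0.0},
}
toy_groups = {"phage": {"gene1", "gene3"}}
toy_profile = profile_gene_groups(toy_coverage, toy_groups)
```

Because the reads are mapped to the catalogue only once, a new gene group can be profiled by re-running this cheap summation, which is the reuse the abstract emphasizes.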
Article
The HU superfamily of proteins, with a unique DNA-binding mode, has been extensively studied as the primary chromosome-packaging protein of the bacterial superkingdom. Representatives also play a role in DNA-structuring during recombination events and in eukaryotic organellar genome maintenance. However, beyond these well-studied roles, little is understood of the functional diversification of this large superfamily. Using sensitive sequence and structure analysis methods we identify multiple novel clades of the HU superfamily. We present evidence that a novel eukaryotic clade prototyped by the human CCDC81 protein acquired roles beyond DNA-binding, likely in protein-protein interaction in centrosome organization and as a potential cargo-binding protein in conjunction with Dynein-VII. We also show that these eukaryotic versions were acquired via an early lateral transfer from bacteroidetes, where we predict a role in chromosome partition. This likely happened prior to the last eukaryotic common ancestor, pointing to potential endosymbiont contributions beyond that of the mitochondrial progenitor. Further, we show that the dramatic lineage-specific expansion of this domain in the bacteroidetes lineage primarily is linked to a functional shift related to potential recognition and preemption of genome invasive entities such as mobile elements. Remarkably, the CCDC81 clade has undergone a similar massive lineage-specific expansion within the archosaurian lineage in birds, suggesting a possible use of the HU superfamily in a similar capacity in recognition of non-self molecules even in this case.
Article
Termites depend nutritionally on their gut microbes, and protistan, bacterial, and archaeal gut communities have been extensively studied. However, limited information is available on viruses in the termite gut. We herein report the complete genome sequence (99,517 bp) of a phage obtained during a genome analysis of "Candidatus Azobacteroides pseudotrichonymphae" phylotype ProJPt-1, which is an obligate intracellular symbiont of the cellulolytic protist Pseudotrichonympha sp. in the gut of the termite Prorhinotermes japonicus. The genome of the phage, designated ProJPt-Bp1, was circular or circularly permuted, and was not integrated into the two circular chromosomes or five circular plasmids composing the host ProJPt-1 genome. The phage was putatively affiliated with the order Caudovirales based on sequence similarities with several phage-related genes; however, most of the 52 protein-coding sequences had no significant homology to sequences in the databases. The phage genome contained a tRNA-Gln (CAG) gene, which showed the highest sequence similarity to the tRNA-Gln (CAA) gene of the host "Ca. A. pseudotrichonymphae" phylotype ProJPt-1. Since the host genome lacked a tRNA-Gln (CAG) gene, the phage tRNA gene may compensate for differences in codon usage bias between the phage and host genomes. The phage genome also contained a non-coding region with high nucleotide sequence similarity to a region in one of the host plasmids. No other phage-related sequences were found in the host ProJPt-1 genome. To the best of our knowledge, this is the first report of a phage from an obligate, mutualistic endosymbiont permanently associated with eukaryotic cells.
Significance: The entire history of life is the story of virus–host coevolution. Therefore the origins and evolution of viruses are an essential component of this process. A signature feature of the virus state is the capsid, the proteinaceous shell that encases the viral genome. Although homologous capsid proteins are encoded by highly diverse viruses, there are at least 20 unrelated varieties of these proteins. We show here that many, if not all, capsid proteins evolved from ancestral proteins of cellular organisms on multiple, independent occasions. These findings reveal a stronger connection between the virosphere and cellular life forms than previously suspected.
The number and diversity of viral sequences that are identified in metagenomic data far exceed those of experimentally characterized virus isolates. In a recent workshop, a panel of experts discussed the proposal that, with appropriate quality control, viruses that are known only from metagenomic data can, and should be, incorporated into the official classification scheme of the International Committee on Taxonomy of Viruses (ICTV). Although a taxonomy that is based on metagenomic sequence data alone represents a substantial departure from the traditional reliance on phenotypic properties, the development of a robust framework for sequence-based virus taxonomy is indispensable for the comprehensive characterization of the global virome. In this Consensus Statement article, we consider the rationale for why metagenomic sequence data should, and how it can, be incorporated into the ICTV taxonomy, and present proposals that have been endorsed by the Executive Committee of the ICTV.
Viruses and their host genomes often share similar oligonucleotide frequency (ONF) patterns, which can be used to predict the host of a given virus by finding the host with the greatest ONF similarity. We comprehensively compared 11 ONF metrics using several k-mer lengths for predicting host taxonomy from among ∼32 000 prokaryotic genomes for 1427 virus isolate genomes whose true hosts are known. The background-subtracting measure d2* at k = 6 gave the highest host prediction accuracy (33%, genus level) with reasonable computational times. Requiring a maximum dissimilarity score for making predictions (thresholding) and taking the consensus of the 30 most similar hosts further improved accuracy. Using a previous dataset of 820 bacteriophage and 2699 bacterial genomes, d2* host prediction accuracies with thresholding and consensus methods (genus-level: 64%) exceeded previous Euclidean distance ONF (32%) or homology-based (22-62%) methods. When applied to metagenomically-assembled marine SUP05 viruses and the human gut virus crAssphage, d2*-based predictions overlapped (i.e. some same, some different) with the previously inferred hosts of these viruses. The extent of overlap improved when only using host genomes or metagenomic contigs from the same habitat or samples as the query viruses. The d2* ONF method will greatly improve the characterization of novel, metagenomic viruses.
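The general idea of ONF-based host prediction can be illustrated with a minimal sketch. Note that this is not the background-subtracting measure favoured in the abstract above; it uses the simpler Euclidean distance between k-mer frequency vectors, which the abstract mentions as an earlier approach. The function names (`kmer_freqs`, `predict_host`), the toy sequences, and the choice of k = 2 are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_freqs(seq, k):
    """Relative frequency of every A/C/G/T k-mer in seq, in a fixed order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)  # ignores k-mers with other symbols
    if total == 0:
        raise ValueError("sequence too short or contains no valid k-mers")
    return [counts[w] / total for w in words]


def euclidean(a, b):
    """Euclidean distance between two equal-length frequency vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_host(virus_seq, host_seqs, k=2):
    """Return the host whose k-mer profile is closest to the virus, plus all scores."""
    virus = kmer_freqs(virus_seq, k)
    scores = {name: euclidean(virus, kmer_freqs(s, k))
              for name, s in host_seqs.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Toy data (hypothetical): one AT-rich and one GC-rich "host genome".
hosts = {
    "host_AT_rich": "ATATTAATATTTAAATATAT" * 5,
    "host_GC_rich": "GCGGCCGCGCCGGGCGCGCC" * 5,
}
virus = "ATTATAATATATTTAATAAT" * 3

best, scores = predict_host(virus, hosts, k=2)
print(best, scores)
```

In practice one would use longer k-mers (the abstract reports k = 6), real genome sequences, and possibly a dissimilarity threshold or a consensus over the closest hosts, as the abstract describes.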
Significance: Humans need a stable, balanced gut microbiome (GM) to be healthy. The GM is influenced by bacteriophages that infect bacterial hosts. In this work, bacteriophages associated with the GM of healthy individuals were analyzed, and a healthy gut phageome (HGP) was discovered. The HGP is composed of core and common bacteriophages shared among healthy adult individuals and is likely globally distributed. We posit that the HGP plays a critical role in maintaining the proper function of a healthy GM. As expected, we found that the HGP is significantly decreased in individuals with gastrointestinal disease (ulcerative colitis and Crohn’s disease). Together, these results reveal a large community of human gut bacteriophages that likely contribute to maintaining human health.
Over the last decade, our appreciation for the contribution of resident gut microorganisms—the gut microbiota—to human health has surged. However, progress is limited by the sheer diversity and complexity of these microbial communities. Compounding the challenge, the majority of our commensal microorganisms are not close relatives of Escherichia coli or other model organisms and have eluded culturing and manipulation in the laboratory. In this Review, we discuss how over a century of study of the readily cultured, genetically tractable human gut Bacteroides has revealed important insights into the biochemistry, genomics and ecology that make a gut bacterium a gut bacterium. While genome and metagenome sequences are being produced at breakneck speed, the Bacteroides provide a significant ‘jump-start’ on uncovering the guiding principles that govern microbiota–host and inter-bacterial associations in the gut that will probably extend to many other members of this ecosystem.
Applying synthetic biology to engineer gut-resident microbes provides new avenues to investigate microbe-host interactions, perform diagnostics, and deliver therapeutics. Here, we describe a platform for engineering Bacteroides, the most abundant genus in the Western microbiota, which includes a process for high-throughput strain modification. We have identified a novel phage promoter and translational tuning strategy and achieved an unprecedented level of expression that enables imaging of fluorescent-protein-expressing Bacteroides stably colonizing the mouse gut. A detailed characterization of the phage promoter has provided a set of constitutive promoters that span over four logs of strength without detectable fitness burden within the gut over 14 days. These promoters function predictably over a 1,000,000-fold expression range in phylogenetically diverse Bacteroides species. With these promoters, unique fluorescent signatures were encoded to allow differentiation of six species within the gut. Fluorescent protein-based differentiation of isogenic strains revealed that priority of gut colonization determines colonic crypt occupancy.