Researchers complete unprecedented study of the global virome

“The amount of analysis and discoveries that we anticipate will follow this dataset cannot be overstated.”

Viruses are the most abundant biological entities on Earth, but we know about only a small fraction of them. An unprecedented study published today in Nature has closed some of those gaps. Researchers analyzed over 5 Tb of metagenomic sequence data from 3,042 samples to assess viruses’ global distribution, phylogenetic diversity, and host specificity. We speak with lead author David Paez Espino about what they learned and the research this dataset paves the way for.

ResearchGate: How might this project prove useful to researchers in the future?

David Paez Espino: With this paper, we are, for the first time, opening a wide door to the study of viruses. Until now we were looking at fewer than 5,000 viruses, and there were no large-scale multihabitat studies of viral discovery. With this work, we have increased the number of viral sequences by 50 times, and through that we identified 99 percent more viral diversity compared to what was known before. This provides an enormous amount of new data that will be studied in more detail in the years to come. We have more than doubled the number of microbial phyla that serve as hosts to viruses, and have created the first global viral distribution map. The amount of analysis and discoveries that we anticipate will follow this dataset cannot be overstated.

RG: What were your goals when you started the project?

Paez Espino: This work had several aims. First, we wanted to overcome the limitations of the current narrow and biased collection of isolate viruses. We also sought to shed light on environmental viral taxonomic diversity and to fill in some of the existing gaps in our understanding of host-virus interactions and host range specificity. Finally, we wanted to explore patterns of biogeographic distribution of the predicted metagenomic viruses.

RG: Why hasn’t this been done before?

Paez Espino: This is the first time that anyone has looked systematically across all habitats and across such a large compendium of data: more than 3,000 different samples. A lot of those viruses have been missed before, because this is the first time that a systematic and very sensitive approach has been applied.

RG: How did you obtain samples to analyze?

Paez Espino: All the samples were obtained from the public IMG/M system, a computational platform developed by some of the co-authors that supports comparative analysis of microbial community aggregate genomes, or metagenomes, in the context of reference genomes from all three domains of life, as well as plasmids, viruses and genome fragments.

RG: Did anything you learned surprise you?

Paez Espino: We were surprised by several findings, for example the detection of the largest phage reported to date (~600 kb), the remarkable habitat specificity for the vast majority of the viral sequences, the presence of the same viruses shared across many different individuals, and the number of viral sequences able to infect microbial phyla previously unknown to be infected by viruses.

We were also surprised by the identification of viruses that can infect organisms from different phyla. The vast majority of the newly discovered metagenomic viral genomes have a narrow host range, consistent with the prevailing notion that broad host range is negatively correlated with viral infection success. However, our data unexpectedly revealed the existence of microbial viruses with expansive taxonomic host ranges, including examples of different microbial phylum-level hosts.

As a comparison, it is well recognized that interspecies and interclass transmission occurs across vertebrates. The Influenza A virus, for example, infects humans as well as swine, horses, birds, and waterfowl. However, lytic phages and archaeal viruses are traditionally thought to display species specificity, making transmission difficult across diverse hosts.

RG: What are some potential applications of this information?

Paez Espino: Our results lead to new insight into microbial host-viral interactions, particularly within the human host ecosystem. These interactions could be leveraged for biotechnological purposes, such as the use of regulatory DNA parts that will work across many different microbial organisms, allowing us to build genes and pathways that could be expressed in many different hosts.

Featured image courtesy of NAID