Project

DIGS for EVEs

Goal: The goal of the DIGS for EVEs project is to systematically screen eukaryotic genomes in silico for non-retroviral endogenous viral elements (EVEs), and to release the curated results as version-controlled 'catalogs' via GitHub. Genome screening is performed on published genome sequence data using the database-integrated genome-screening (DIGS) tool.

Methods: Virology, Genomics, Multiple Sequence Alignment, Evolution, Phylogeny

Project log

Robert J Gifford
added an update
In association with our recent study of flavivirus evolution, we created Flavivirus-GLUE, an open resource for comparative analysis of flaviviruses built using the GLUE software framework.
Flavivirus-GLUE incorporates all of the data items typically used in comparative sequence analysis of flaviviruses (e.g. sequences, alignments, genome feature annotations). Moreover, it represents the complex semantic links between these data items via GLUE's underlying relational database, so that only minimal data preparation is required for genomic analysis, with data effectively being 'poised' for immediate use.
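To illustrate what capturing these semantic links in a relational database buys, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns and records are hypothetical toy values, not GLUE's actual data model; the point is only that once sequences, alignment membership and genome feature annotations are linked, a question such as "which EVEs in this alignment are annotated with NS5?" becomes a single join rather than a data-preparation task.

```python
import sqlite3

# Hypothetical, heavily simplified schema (NOT GLUE's real data model):
# sequences, genome-feature annotations and alignment membership are
# stored as linked tables rather than as separate flat files.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sequence (
    seq_id TEXT PRIMARY KEY,
    is_eve INTEGER NOT NULL          -- 1 for an EVE, 0 for a virus reference
);
CREATE TABLE feature_location (
    seq_id    TEXT REFERENCES sequence(seq_id),
    feature   TEXT NOT NULL,         -- genome feature name, e.g. 'NS5'
    start_pos INTEGER NOT NULL,
    end_pos   INTEGER NOT NULL
);
CREATE TABLE alignment_member (
    alignment TEXT NOT NULL,
    seq_id    TEXT REFERENCES sequence(seq_id)
);
""")

# Toy records with made-up identifiers and coordinates.
conn.executemany("INSERT INTO sequence VALUES (?, ?)",
                 [("REF_1", 0), ("EVE_A", 1), ("EVE_B", 1)])
conn.executemany("INSERT INTO feature_location VALUES (?, ?, ?, ?)",
                 [("REF_1", "NS5", 7600, 10300),
                  ("EVE_A", "NS5", 1, 900),
                  ("EVE_B", "NS3", 1, 600)])
conn.executemany("INSERT INTO alignment_member VALUES (?, ?)",
                 [("flavi_root", "REF_1"), ("flavi_root", "EVE_A"),
                  ("flavi_root", "EVE_B")])


def eves_covering(conn, alignment, feature):
    """Return IDs of EVEs in `alignment` that are annotated with `feature`."""
    rows = conn.execute("""
        SELECT DISTINCT s.seq_id
        FROM sequence s
        JOIN alignment_member a ON a.seq_id = s.seq_id
        JOIN feature_location f ON f.seq_id = s.seq_id
        WHERE s.is_eve = 1 AND a.alignment = ? AND f.feature = ?
        ORDER BY s.seq_id
    """, (alignment, feature))
    return [row[0] for row in rows]


print(eves_covering(conn, "flavi_root", "NS5"))  # ['EVE_A']
```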
We have incorporated endogenous viral element (EVE) data into this resource, using standardised nomenclature and the GLUE database to capture detailed information about EVE loci. We hope that by bringing a higher level of order to EVE data we will facilitate their further use in research. More broadly, we help establish protocols and mechanisms through which EVE researchers can work collaboratively to describe EVE diversity and to investigate EVE biology using comparative approaches.
 
Robert J Gifford
added a research item
The flaviviruses (family Flaviviridae) are a group of positive-strand RNA viruses, many of which pose serious risks to human health on a global scale. Here, we calibrate the timeline of flavivirus evolution using flavivirus-derived DNA sequences identified in animal genomes. We demonstrate that the family is at least 100 million years old and show that this timing can be integrated with dates inferred from co-phylogenetic analysis and paleontological records to produce a cohesive overview of flavivirus evolution in which the main subgroups originate early in animal evolution and broadly co-diverge with animal phyla. In addition, we show that the arthropod-borne ‘classical’ flaviviruses first evolved from tick-specific viruses, and only later adapted to become insect-borne. Our findings demonstrate that the biological properties of flaviviruses have been acquired over many millions of years of evolution, implying that broad-scale comparative analysis can reveal fundamental insights into flavivirus biology. We implement a novel approach to computational genomic studies of viruses that can support these efforts by enabling more efficient use of evolution-related domain knowledge in virus research.
Significance: Understanding how pathogenic viruses evolved can provide vital insights into their biology. In this study we use genomic data to show that flaviviruses, a group of viruses that includes important pathogens such as dengue virus and hepatitis C virus, arose through an extended history of evolutionary interaction with host and vector species. Our findings show that comparative studies can productively utilise genomic data to reveal insights into flavivirus biology, which will help to facilitate the development of more effective antiviral treatments and strategies.
Robert J Gifford
added a research item
Hepadnaviruses (family Hepadnaviridae) are reverse-transcribing animal viruses that infect vertebrates. DNA sequences derived from ancient hepadnaviruses have been identified in the germline genomes of numerous vertebrate species, and these ‘endogenous hepatitis B viruses’ (eHBVs) reveal aspects of the long-term coevolutionary relationship between hepadnaviruses and their vertebrate hosts. Here, we use a novel, data-oriented approach to recover and analyse the complete repertoire of eHBV elements in published animal genomes. We show that germline incorporation of hepadnaviruses is exclusive to a single vertebrate group (Sauria) and that the eHBVs contained in saurian genomes represent a far greater diversity of hepadnaviruses than previously recognised. Through in-depth characterisation of eHBV elements we establish the existence of four distinct subgroups within the genus Avihepadnavirus and trace their evolution through the Cenozoic Era. Furthermore, we provide a completely new perspective on hepadnavirus evolution by showing that the metahepadnaviruses (genus Metahepadnavirus) originated >300 million years ago in the Paleozoic Era and have historically infected a broad range of vertebrates. We also show that eHBVs have been intra-genomically amplified in some saurian lineages, and that eHBVs located at approximately equivalent genomic loci were acquired in entirely distinct germline integration events. These findings indicate that selective forces have favoured the accumulation of hepadnaviral sequences at specific loci in the saurian germline. Our investigation provides a range of new insights into the long-term evolutionary history of reverse-transcribing DNA viruses and shows that germline incorporation of hepadnaviruses has played a role in shaping the evolution of saurian genomes.
Robert J Gifford
added an update
We used database-integrated genome-screening (DIGS) to recover a comprehensive catalog of hepadnavirus-derived EVEs - usually referred to as endogenous hepatitis B viruses (eHBVs). We used the GLUE software framework to capture all of the data associated with our investigation.
The resulting Hepadnaviridae-GLUE project incorporates a set of principles for organising the hepadnavirus 'fossil record', and a protocol through which it can be accessed and collaboratively developed.
Please see the Hepadnaviridae-GLUE website for highlights and further details:
 
Robert J Gifford
added 7 research items
A diverse range of DNA sequences derived from circoviruses (family Circoviridae) have been identified in samples obtained from humans and domestic animals, often in association with pathological conditions. In most cases, however, little is known about the natural biology of the viruses from which these sequences are derived. Endogenous circoviral elements (CVe) are DNA sequences derived from circoviruses that occur in animal genomes and provide a useful source of information about circovirus-host relationships. In this study we screened genome assemblies of 675 animal species and identified numerous circovirus-related sequences, including the first examples of CVe derived from cycloviruses. We confirmed the presence of these CVe in the germline of the elongate twig ant (Pseudomyrmex gracilis), thereby establishing that cycloviruses infect insects. We examined the evolutionary relationships between CVe and contemporary circoviruses, showing that CVe from ants and mites group relatively closely with cycloviruses in phylogenies. Furthermore, the relatively random interspersal of CVe from insect genomes with cyclovirus sequences recovered from vertebrate samples suggested that contamination might be an important consideration in studies reporting these viruses. Our study demonstrates how endogenous viral sequences can inform metagenomics-based virus discovery. In addition, it raises doubts about the role of cycloviruses as pathogens of humans and other vertebrates.
Sequences derived from parvoviruses (family Parvoviridae) are relatively common in animal genomes, but the functional significance of these endogenous parvoviral element (EPV) sequences remains unclear. In this study we use a combination of in silico and molecular biological approaches to investigate a fusion gene encoded by guinea pigs (genus Cavia) that is partially derived from an EPV. This gene, named enRep-M9l, encodes a predicted polypeptide comprising part of a myosin 9-like (M9l) protein fused to a 3′-truncated, EPV-encoded replicase. We examine the genomic and phylogenetic characteristics of the EPV locus (enRep) that encodes the viral portions of enRep-M9l, revealing that it derives from an ancient dependoparvovirus (genus Dependoparvovirus) that was incorporated into the guinea pig germline between ∼22 and 35 million years ago (MYA). Despite these ancient origins, the regions of the enRep locus that are expressed in the enRep-M9l gene are conserved across multiple species in the family Caviidae (guinea pigs and cavies), consistent with a potential function at the amino acid level. Using molecular biological approaches, we further demonstrate that: (i) enRep-M9l mRNA is broadly transcribed in guinea pig cells; (ii) the cloned enRep-M9l transcript can express a protein of the expected size in guinea pig cells in vitro; and (iii) the expressed protein localizes to the cytosol. Our findings demonstrate that, consistent with a functional role, the enRep-M9l fusion gene is evolutionarily conserved, broadly transcribed, and capable of expressing protein.
IMPORTANCE: DNA from viruses has been ‘horizontally transferred’ to mammalian genomes during evolution, but the impact of this process on mammalian biology remains poorly understood. The findings of our study indicate that in guinea pigs a novel gene has evolved through fusion of host and virus genes.
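Integration-date ranges like the one quoted for enRep rest on a presence/absence argument: an orthologous EVE must predate the oldest split among the species that share it, and postdate the split from the closest species whose orthologous site is empty. A minimal sketch of that reasoning follows, assuming orthology has already been established (e.g. from shared flanking sequence), the element has never been lost in any lineage, the carriers form a single clade, and divergence times are known. The function name and the toy divergence times are hypothetical, and this is not necessarily how the study itself derived its estimate.

```python
from itertools import combinations


def integration_window(carriers, non_carriers, divergence_mya):
    """Bracket the germline integration date of an orthologous EVE.

    carriers:       species whose genomes contain the orthologous EVE
    non_carriers:   species with an intact, empty orthologous site
    divergence_mya: dict mapping frozenset({sp1, sp2}) -> divergence time (MYA)

    Assumes the EVE was never lost after integration and that carriers
    form a single clade. Returns (minimum_age, maximum_age) in MYA.
    """
    def t(a, b):
        return divergence_mya[frozenset((a, b))]

    # Shared by every carrier, so it must predate their oldest split.
    min_age = max((t(a, b) for a, b in combinations(sorted(carriers), 2)),
                  default=0.0)
    # Absent from the empty sites, so it must postdate the youngest split
    # separating any carrier from any non-carrier.
    max_age = min((t(c, n) for c in carriers for n in non_carriers),
                  default=float("inf"))
    if min_age > max_age:
        raise ValueError("presence/absence pattern conflicts with divergence times")
    return min_age, max_age


# Toy divergence times (MYA) for four hypothetical species.
divergence = {
    frozenset(("sp_A", "sp_B")): 5,
    frozenset(("sp_A", "sp_C")): 18,
    frozenset(("sp_B", "sp_C")): 18,
    frozenset(("sp_A", "sp_D")): 40,
    frozenset(("sp_B", "sp_D")): 40,
    frozenset(("sp_C", "sp_D")): 40,
}
print(integration_window({"sp_A", "sp_B", "sp_C"}, {"sp_D"}, divergence))
# (18, 40)
```

Real analyses would also have to allow for uncertainty in the divergence times themselves, so each bound is really an interval rather than a point.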
The genus Deltaretrovirus (family Retroviridae) includes the human T-cell leukemia viruses and bovine leukemia virus (BLV). Relatively little is known about the biology and evolution of these viruses, because only a few species have been identified and the genomic 'fossil record' is relatively sparse. Here, we report the discovery of multiple novel endogenous retroviruses (ERVs) derived from ancestral deltaretroviruses. These sequences, two of which contain complete or near-complete internal coding regions, reside in the genomes of several distinct mammalian orders, including bats, carnivores, cetaceans, and insectivores. We demonstrate that two of these ERVs contain unambiguous homologs of the tax gene, indicating that complex gene regulation has ancient origins within the genus Deltaretrovirus. These ERVs demonstrate that the host range of the genus is much more extensive than suggested by the relatively small number of exogenous deltaretroviruses described so far, and allow the evolutionary timeline of deltaretrovirus-mammal interaction to be more accurately calibrated.
Robert J Gifford
added 4 research items
Retroviral integration into germline DNA can result in the formation of a vertically inherited proviral sequence called an endogenous retrovirus (ERV). Over the course of their evolution, vertebrate genomes have accumulated many thousands of ERV loci. These sequences provide useful retrospective information about ancient retroviruses, and have also played an important role in shaping the evolution of vertebrate genomes. There is an immediate need for a unified system of nomenclature for ERV loci, not only to assist genome annotation, but also to facilitate research on ERVs and their impact on genome biology and evolution. In this review, we examine how ERV nomenclatures have developed, and consider the possibilities for implementing a systematic approach to naming ERV loci. We propose that such a nomenclature should not only provide unique identifiers for individual loci, but also denote orthologous relationships between ERVs in different species. In addition, we propose that, where possible, mnemonic links to previous, well-established names for ERV loci and groups should be retained. We show how this approach can be applied and integrated into existing taxonomic and nomenclature schemes for retroviruses, ERVs and transposable elements.
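The two requirements proposed here, a unique identifier for each locus and a name that also signals orthology across species, can both be met by a structured ID. The sketch below uses a hypothetical `class-Group.locus-Host` pattern in which orthologous insertions share everything except the host suffix. The pattern, the `LocusID` class and the example names are illustrative assumptions, not the exact scheme proposed in the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocusID:
    """Hypothetical structured identifier: <class>-<group>.<locus>-<host>."""
    element_class: str  # e.g. "ERV"
    group: str          # mnemonic group name, ideally an established one
    locus: int          # number shared by all orthologs of one insertion
    host: str           # abbreviation of the host species

    def __str__(self):
        return f"{self.element_class}-{self.group}.{self.locus}-{self.host}"

    @classmethod
    def parse(cls, text):
        """Parse a string such as 'ERV-Delta.1-SpcA' (group must not contain '.')."""
        prefix, _, suffix = text.partition(".")
        element_class, group = prefix.split("-", 1)
        locus, host = suffix.split("-", 1)
        return cls(element_class, group, int(locus), host)


def is_orthologous(a, b):
    """Same insertion event (class, group, locus) recorded in different hosts."""
    return ((a.element_class, a.group, a.locus) == (b.element_class, b.group, b.locus)
            and a.host != b.host)


x = LocusID.parse("ERV-Delta.1-SpcA")   # made-up names
y = LocusID("ERV", "Delta", 1, "SpcB")
print(str(x), is_orthologous(x, y))  # ERV-Delta.1-SpcA True
```

Encoding orthology in the shared `group.locus` core means a plain string comparison, rather than a separate lookup table, is enough to tell whether two loci in different genomes derive from the same germline integration.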
Amdoparvoviruses (family Parvoviridae: genus Amdoparvovirus) infect carnivores and are a major cause of morbidity and mortality in farmed animals. In this study, we systematically screened animal genomes to identify endogenous parvoviral elements (PVe) showing a high degree of similarity to amdoparvoviruses, and investigated their genomic, phylogenetic and protein structural features. We report the first examples of full-length, amdoparvovirus-derived PVe in the genome of the Transcaucasian mole vole (Ellobius lutescens). Furthermore, we identify four additional PVe in mammal and reptile genomes that are intermediate between amdoparvoviruses and their sister genus (Protoparvovirus) in terms of their phylogenetic placement and genomic features. In particular, we identify a genome-length PVe in the genome of a pit viper (Protobothrops mucrosquamatus) that is more like a protoparvovirus than an amdoparvovirus in terms of its phylogenetic placement and the structural features of its capsid protein (as revealed by homology modeling), yet exhibits characteristically amdoparvovirus-like genome features, including: (i) a putative middle ORF gene; (ii) a capsid gene that lacks a phospholipase A2 (PLA2) domain; and (iii) a genome structure consistent with an amdoparvovirus-like mechanism of capsid gene expression. Our findings indicate that the host range of amdoparvoviruses has extended to rodents in the past, and that parvovirus lineages possessing a mixture of proto- and amdoparvovirus-like characteristics have also circulated. In addition, we show that PVe in the mole vole and pit viper encode intact, expressible replicase genes that have potentially been co-opted or exapted in these host species.
A significant fraction of most genomes consists of DNA sequences that have been incompletely investigated. This genomic ‘dark matter’ contains a wealth of useful biological information that can be recovered by systematically screening genomes in silico using sequence similarity search tools. Specialized computational tools are required to implement these screens efficiently. Here, we describe the database-integrated genome-screening (DIGS) tool: a computational framework for performing these investigations. To demonstrate its use, we screen mammalian genomes for endogenous viral elements (EVEs) derived from the Filoviridae, Parvoviridae, Circoviridae and Bornaviridae families, identifying numerous novel elements in addition to those that have been described previously. The DIGS tool provides a simple, robust framework for implementing a broad range of heuristic, sequence analysis-based explorations of genomic diversity.
Availability: http://giffordlabcvr.github.io/DIGS-tool/
Contact: robert.gifford@glasgow.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
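The core idea of database-integrated screening, recording every similarity-search hit in a relational database so that hits can be classified, merged and queried, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the similarity search itself (e.g. tBLASTn with viral protein probes) is assumed to have been run already, each hit is assumed to have been scored back against a library of reference sequences, and the table layout and function names (`make_db`, `record_hit`, `merge_loci`) are hypothetical rather than the DIGS tool's actual schema or interface. Strand and reading frame are ignored for brevity.

```python
import sqlite3


def make_db():
    """In-memory hit table (hypothetical layout, not the DIGS schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE hit (
        genome TEXT, scaffold TEXT, start INTEGER, stop INTEGER,
        assigned_to TEXT, bitscore REAL)""")
    return conn


def record_hit(conn, genome, scaffold, start, stop, reciprocal_scores):
    """Store one hit, labelled with its best-scoring reference sequence.

    reciprocal_scores maps reference-library sequence names to the score
    obtained when the hit is searched back against that reference.
    """
    best_ref = max(reciprocal_scores, key=reciprocal_scores.get)
    conn.execute("INSERT INTO hit VALUES (?, ?, ?, ?, ?, ?)",
                 (genome, scaffold, start, stop, best_ref,
                  reciprocal_scores[best_ref]))


def merge_loci(conn, max_gap=0):
    """Merge hits on the same scaffold that overlap or lie within max_gap bp."""
    loci = []
    rows = conn.execute("SELECT genome, scaffold, start, stop, assigned_to, "
                        "bitscore FROM hit ORDER BY genome, scaffold, start")
    for genome, scaffold, start, stop, ref, score in rows:
        last = loci[-1] if loci else None
        if (last and last["genome"] == genome and last["scaffold"] == scaffold
                and start <= last["stop"] + max_gap):
            last["stop"] = max(last["stop"], stop)
            # Keep the classification of the best-scoring constituent hit.
            if score > last["bitscore"]:
                last["assigned_to"], last["bitscore"] = ref, score
        else:
            loci.append({"genome": genome, "scaffold": scaffold, "start": start,
                         "stop": stop, "assigned_to": ref, "bitscore": score})
    return loci


# Toy hits with made-up coordinates and reference names.
conn = make_db()
record_hit(conn, "genome_1", "scaf_1", 100, 400, {"ref_filo": 80.0, "ref_borna": 30.0})
record_hit(conn, "genome_1", "scaf_1", 380, 900, {"ref_filo": 120.0, "ref_borna": 20.0})
record_hit(conn, "genome_1", "scaf_2", 50, 200, {"ref_filo": 10.0, "ref_borna": 60.0})
for locus in merge_loci(conn):
    print(locus["scaffold"], locus["start"], locus["stop"], locus["assigned_to"])
# scaf_1 100 900 ref_filo
# scaf_2 50 200 ref_borna
```

Because every hit lives in a table rather than in transient search output, follow-up questions (for example, all loci assigned to a given reference across all genomes) reduce to ordinary SQL queries.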
Robert J Gifford
added an update
We are using GLUE, a flexible software framework for virus genome data, to collate EVE data recovered via DIGS for EVEs. I created a GLUE project for the Circoviridae that includes a set of reference circoviruses as well as all endogenous circoviral elements (ECV) identified as part of DIGS for EVEs. The project is contained within the project repository on GitHub and can be downloaded and built on any computer that has GLUE installed.
GLUE installation instructions: tools.glue.cvr.ac.uk/#/installation
 
Robert J Gifford
added a project goal
The goal of the DIGS for EVEs project is to systematically screen eukaryotic genomes in silico for non-retroviral endogenous viral elements (EVEs), and to release the curated results as version-controlled 'catalogs' via GitHub. Genome screening is performed on published genome sequence data using the database-integrated genome-screening (DIGS) tool.