Whole-proteome phylogeny of large dsDNA viruses and parvoviruses through a composition vector method related to dynamical language model

Department of Biology, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China.
BMC Evolutionary Biology. 06/2010; 10:192. DOI: 10.1186/1471-2148-10-192

ABSTRACT Background: The vast sequence divergence among different virus groups presents a great challenge to alignment-based analysis of virus phylogeny. Because of the uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignment cannot be applied directly to whole-genome comparison and phylogenomic studies of viruses. There has been growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among these, the dynamical language (DL) method proposed by our group has been successfully applied to the phylogenetic analysis of bacterial and chloroplast genomes.
Results: In this paper, the DL method is used to analyze the whole-proteome phylogeny of 124 large dsDNA viruses and 30 parvoviruses, two data sets that differ greatly in genome size. The trees from our analyses agree well with the latest classification of large dsDNA viruses and parvoviruses by the International Committee on Taxonomy of Viruses (ICTV).
Conclusions: The present method provides a new way to recover the phylogeny of large dsDNA viruses and parvoviruses, and also offers some insight into the affiliation of a number of unclassified viruses. By comparison, some alignment-free methods, such as the CVTree method, can recover the phylogeny of large dsDNA viruses but are not suitable for resolving the phylogeny of parvoviruses, whose genomes are much smaller.

  • ABSTRACT: The Glossina hytrosavirus (family Hytrosaviridae) is a double-stranded DNA virus with rod-shaped, enveloped virions. Its 190 kbp genome encodes 160 putative open reading frames. The virus replicates in the nucleus and acquires a fragile envelope in the cell cytoplasm. Glossina hytrosavirus was first isolated from hypertrophied salivary glands of the tsetse fly, Glossina pallidipes Austen (Diptera; Glossinidae), collected in Kenya in 1986. A proportion of laboratory G. pallidipes flies infected with Glossina hytrosavirus develop hypertrophied salivary glands and midgut epithelial cells, gonadal anomalies, and distorted sex ratios associated with reduced insemination rates, fecundity and lifespan. These symptoms are rare in wild tsetse populations. In East Africa, G. pallidipes is one of the most important vectors of African trypanosomosis, a debilitating zoonotic disease that afflicts 37 sub-Saharan African countries. A large arsenal of control tactics is available to manage tsetse flies and the disease they transmit. The sterile insect technique (SIT) is a robust control tactic that has been shown to be effective in eradicating tsetse populations when integrated with other control tactics in an area-wide integrated approach. The SIT requires the production of sterile male flies in large production facilities. To supply sufficient numbers of sterile males for the SIT component against G. pallidipes, strategies must be developed to manage Glossina hytrosavirus in the colonies. This review provides a historical chronology of the emergence and biogeography of Glossina hytrosavirus, and covers research on the infectomics (defined here as the functional and structural genomics and proteomics) and pathobiology of the virus. Standard operating procedures for virus management in tsetse mass-rearing facilities are proposed, and a future outlook is sketched.
    Insects. 07/2013; 4(3):287-319.
  • ABSTRACT: The composition vector (CV) method is an alignment-free method for phylogenetics. Because it is simpler than alignment-based methods, it has been widely discussed in recent years. The CV method has four main steps: (1) count the frequency of each k-string in the sequence; (2) construct the composition vector for the sequence; (3) compute the distance between every pair of composition vectors to form a distance matrix; and (4) construct the phylogenetic tree. In this paper, we review several developments of the CV method.
  • ABSTRACT: Seronegative hepatitis (non-A, non-B, non-C, non-D, non-E hepatitis) is poorly characterized but strongly associated with serious complications. We collected 92 serum specimens from patients with non-A-E hepatitis in Chongqing, China, between 1999 and 2007. Ten serum pools were screened by Solexa deep sequencing. We discovered a 3,780-bp contig present in all 10 pools that yielded BLASTx E-values of 7e-05 to 0.008 against parvoviruses. The complete sequence of the in silico-assembled 3,780-bp contig was confirmed by gene amplification of overlapping regions over almost the entire genome, and the virus was provisionally designated NIH-CQV. Further analysis revealed that the contig contains two major ORFs. By protein BLAST, ORF1 and ORF2 were most homologous to the replication-associated protein of bat circovirus and the capsid protein of porcine parvovirus, respectively. Phylogenetic analysis indicated that NIH-CQV is located at the interface of Parvoviridae and Circoviridae. The prevalence of NIH-CQV in patients was determined by quantitative PCR. Sixty-three of 90 patient samples (70%) were positive, whereas all samples from 45 healthy controls were negative. The average virus titer in the patient specimens was 1.05 × 10⁴ copies/µL. Specific antibodies against NIH-CQV were sought by immunoblotting. Eighty-four percent of patients were positive for IgG and 31% were positive for IgM; in contrast, 78% of healthy controls were positive for IgG, but all were negative for IgM. Although more work is needed to determine the etiologic role of NIH-CQV in human disease, our data indicate that a parvovirus-like virus is highly prevalent in a cohort of patients with non-A-E hepatitis.
    Proceedings of the National Academy of Sciences. 05/2013.
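The four steps of the CV method listed in the review abstract above can be illustrated with a short Python sketch. It is not the DL method of this paper and not the code of any published tool. Several choices here are assumptions made for illustration: the Markov-subtraction form of the composition vector (borrowed from the CVTree approach), the cosine-based distance D = (1 - C)/2, UPGMA as the tree builder, and all function names (`kstring_counts`, `composition_vector`, `cv_distance`, `upgma`, `cv_tree`). The toy sequences are invented.

```python
from collections import Counter
from itertools import combinations
import math


def kstring_counts(proteins, k):
    """Step 1: count overlapping k-strings over all proteins of a proteome.

    Returns the counts and the total number of k-string positions."""
    counts = Counter()
    total = 0
    for seq in proteins:
        n = len(seq) - k + 1
        if n > 0:
            counts.update(seq[i:i + k] for i in range(n))
            total += n
    return counts, total


def composition_vector(proteins, k=5):
    """Step 2: sparse composition vector.

    Assumption: CVTree-style Markov subtraction a = (p - p0) / p0, where
    p0 = p(prefix) * p(suffix) / p(middle) uses (k-1)- and (k-2)-strings."""
    if k < 3:
        raise ValueError("k must be at least 3")
    fk, nk = kstring_counts(proteins, k)
    fk1, nk1 = kstring_counts(proteins, k - 1)
    fk2, nk2 = kstring_counts(proteins, k - 2)
    if nk == 0:
        return {}
    # Index observed (k-1)-strings by their first k-2 letters.
    by_prefix = {}
    for v in fk1:
        by_prefix.setdefault(v[:-1], []).append(v)
    vec = {}
    # Candidate k-strings s: prefix u and suffix v both observed, so p0 > 0.
    for u in fk1:
        for v in by_prefix.get(u[1:], []):
            s = u + v[-1]
            p0 = (fk1[u] / nk1) * (fk1[v] / nk1) / (fk2[u[1:]] / nk2)
            vec[s] = (fk[s] / nk - p0) / p0  # -1.0 when s never occurs
    return vec


def cv_distance(a, b):
    """Step 3: D = (1 - C) / 2, where C is the cosine between two vectors."""
    dot = sum(x * b.get(s, 0.0) for s, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    if na == 0 or nb == 0:
        return 0.5  # hypothetical convention for an all-zero vector
    return max(0.0, (1 - dot / (na * nb)) / 2)


def upgma(dist):
    """Step 4: UPGMA tree as a Newick string (a simple stand-in for the
    tree builder). `dist` maps frozenset({x, y}) to the distance D."""
    names = sorted({n for pair in dist for n in pair})
    size = {n: 1 for n in names}
    d = dict(dist)
    while len(size) > 1:
        a, b = sorted(min(d, key=d.get))  # closest pair of clusters
        new = f"({a},{b})"
        for c in size:
            if c not in (a, b):  # size-weighted average linkage
                d[frozenset((new, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[new] = size.pop(a) + size.pop(b)
        d = {p: v for p, v in d.items() if a not in p and b not in p}
    return next(iter(size)) + ";"


def cv_tree(proteomes, k=5):
    """Run steps 1-4 on a dict {name: [protein sequences]}."""
    vecs = {name: composition_vector(p, k) for name, p in proteomes.items()}
    dist = {frozenset((x, y)): cv_distance(vecs[x], vecs[y])
            for x, y in combinations(vecs, 2)}
    return dist, upgma(dist)


if __name__ == "__main__":
    toy = {  # invented toy sequences, not real viral proteomes
        "A": ["MKTAYIAKQRQISFVKSHFSRQ", "MSTNPKPQRKTKRNTNRRPQDVKFPGG"],
        "B": ["MKTAYIAKQRQISFVKSHFSRQ", "MSTNPKPQRKTKRNTNRRPQDVKFPGA"],
        "C": ["MADEEKLPPGWEKRMSRSSGRVYYFNHITNASQ"],
    }
    dist, tree = cv_tree(toy, k=3)
    print(tree)
```

UPGMA is used only because it is short; it assumes a roughly constant evolutionary rate, and neighbor joining is a common alternative for real proteome data.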
