Using a mutual information-based site transition network to map the genetic evolution of influenza A/H3N2 virus.

Institute of Bioinformatics, Zhejiang University, Hangzhou, PR China.
Bioinformatics (Impact Factor: 5.47). 09/2009; 25(18):2309-17. DOI: 10.1093/bioinformatics/btp423
Source: PubMed

ABSTRACT Mapping the antigenic and genetic evolution pathways of influenza A is critically important for influenza vaccine development and drug design. In this article, we analyzed more than 4000 A/H3N2 hemagglutinin (HA) sequences from 1968 to 2008 to model the evolutionary path of the virus, which allows us to predict its potential future drifts with specific mutations.
The mutual information (MI) method was used to design a site transition network (STN) for each amino acid site in the A/H3N2 HA sequence. The STN indicates that most of the dynamic interactions are positioned around the epitopes and the receptor binding domain, with strong preferences in both the mutation sites and the amino acid types being mutated to. The network also shows that antigenic changes accumulate over time, with occasional large changes due to multiple co-occurring mutations at antigenic sites. Furthermore, cluster analysis that subdivides the STN into several subnetworks gives a more detailed view of the features of antigenic change: the characteristic inner sites and the connecting inter-subnetwork sites are both responsible for the drifts. A novel five-step prediction algorithm based on the STN shows reasonable accuracy in reproducing historical HA mutations. For example, our method can reproduce the 2003-2004 A/H3N2 mutations with approximately 70% accuracy. The method also predicts seven possible mutations for the next antigenic drift in the coming 2009-2010 season. The STN approach also agrees well with the phylogenetic tree and antigenic maps based on HA inhibition assays.
All code and data are available at
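
The abstract does not spell out how MI is computed between sites, and the following is not the authors' code. As a minimal sketch, assuming an ungapped alignment of equal-length sequences and the standard plug-in MI estimate between two alignment columns, pairs of sites with high MI could be collected as candidate network edges like this. The function names, the toy alignment, and the threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(col_a, col_b):
    """Plug-in mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_a[a] * count_b[b]))
    return mi


def mi_edges(alignment, threshold):
    """Return {(i, j): MI} for every site pair whose MI reaches the threshold.

    Assumption: the threshold is an illustrative cutoff, not a value
    taken from the paper.
    """
    length = len(alignment[0])
    columns = [[seq[i] for seq in alignment] for i in range(length)]
    edges = {}
    for i, j in combinations(range(length), 2):
        mi = mutual_information(columns[i], columns[j])
        if mi >= threshold:
            edges[(i, j)] = mi
    return edges


# Hypothetical toy alignment: sites 0 and 1 always change together,
# site 2 never changes.
toy_alignment = ["KNS", "KNS", "EDS", "EDS"]
edges = mi_edges(toy_alignment, threshold=0.5)
print(edges)
```

In the toy alignment, sites 0 and 1 co-vary perfectly and so share one bit of information, while the invariant site 2 shares none with either. Real HA data would also need gap handling and some correction for small-sample bias, which this sketch leaves out.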

  • ABSTRACT: The capacity of zoonotic influenza to cross species boundaries to infect humans poses a global health threat. A previous study identified sites in 10 influenza proteins that characterize the host shifts from avian to human influenza. Here, we used seven feature selection algorithms based on machine learning techniques to generate a novel and extensive selection of diverse sites from the nine internal proteins of influenza, based on their statistical importance in differentiating avian from human viruses. A set of 131 sites was generated by processing each protein independently, and a selection of 113 sites was found by analyzing a concatenation of sequences from all nine proteins. These new sites were analyzed according to their annual mutational trends. The correlation of each site with all other sites (one-to-many) and the connectivity within groups of specific sites (one-to-one) were identified. We compared the performance of these new sites, evaluated by four classifiers, against those recorded in previous research, and found our sites to be better suited to host distinction in all but one protein, validating the significance of our site selection. Our findings indicated that, in our selection of sites, human influenza tended to mutate more than avian influenza. Despite this, the correlation and connectivity between the avian sites were stronger than those of the human sites, and the percentage of sites with high connectivity was also greater in avian influenza.
    Journal of biomedical science and engineering 01/2010; 3. · 0.27 Impact Factor
  • ABSTRACT: Vaccine design for rapidly changing viruses is based on empirical surveillance of strains circulating in a given season to assess those that will most likely spread during the next season. The choice of which strains to include in the vaccine is critical, as an erroneous decision can lead to a nonimmunized human population that will then be at risk in the face of an epidemic or, worse, a pandemic. Here, we present the first steps toward a very general phylogenetic approach to predict the emergence of novel viruses. Our genomic model builds upon natural features of viral evolution such as selection and recombination/reassortment, and incorporates episodic bursts of evolution and/or of recombination. As a proof of concept, we assess the performance of this model in a retrospective study, focusing (i) on the emergence of an unexpected H3N2 influenza strain in 2007, and (ii) on a longitudinal design. Based on the analysis of hemagglutinin (HA) and neuraminidase (NA) genes, our results show a lack of predictive power in both experimental designs, but shed light on the mode of evolution of these two antigens: (i) supporting the lack of significance of recombination in the evolution of this influenza virus, and (ii) showing that HA evolves episodically while NA changes gradually.
    Journal of Molecular Evolution 12/2013; · 2.15 Impact Factor
  • ABSTRACT: The hypothesis that Mutual Information (MI) dendrograms of influenza A viruses reflect informational groups generated during viral evolutionary processes is put forward. Phylogenetic reconstructions are used for guidance and validation of MI dendrograms. It is found that MI profiles display an oscillatory behavior for each of the eight RNA segments of influenza A. It is shown that dendrograms of MI values of geographically and historically different segments from strains of the RNA virus influenza A are unexpectedly similar to the clusters, but not to the topology, of the phylogenetic trees. No matter how diverse the RNA sequences are, MI dendrograms crisply discern actual viral subtypes together with gains and/or losses of information that occur during viral evolution. The amount of information during a century of evolution of RNA segments of influenza A is measured in bits for both human and avian strains. Overall, the amount of information of segments of pandemic strains oscillates during viral evolution. To our knowledge, this is the first description of clades of information of the viral subtypes and the first estimation of the flow of information content, measured in bits, during an evolutionary process of a virus.
    Entropy 07/2013; 15:3065. · 1.35 Impact Factor