Using a mutual information-based site transition network to map the genetic evolution of influenza A/H3N2 virus

Institute of Bioinformatics, Zhejiang University, Hangzhou, PR China.
Bioinformatics (Impact Factor: 4.62). 09/2009; 25(18):2309-17. DOI: 10.1093/bioinformatics/btp423
Source: PubMed

ABSTRACT Mapping the antigenic and genetic evolution pathways of influenza A is critically important for vaccine development and drug design. In this article, we analyzed more than 4000 A/H3N2 hemagglutinin (HA) sequences from 1968 to 2008 to model the evolutionary path of the virus, which allows us to predict its potential future drifts and their specific mutations.
The mutual information (MI) method was used to design a site transition network (STN) for each amino acid site in the A/H3N2 HA sequence. The STN indicates that most of the dynamic interactions are positioned around the epitopes and the receptor binding domain regions, with strong preferences in both the mutation sites and the amino acid types they mutate to. The network also shows that antigenic changes accumulate over time, with occasional large changes due to multiple co-occurring mutations at antigenic sites. Furthermore, cluster analysis that subdivides the STN into several subnetworks gives a more detailed view of the features of antigenic change: the characteristic inner sites and the connecting inter-subnetwork sites are both responsible for the drifts. A novel five-step prediction algorithm based on the STN shows reasonable accuracy in reproducing historical HA mutations. For example, our method can reproduce the 2003-2004 A/H3N2 mutations with approximately 70% accuracy. The method also predicts seven possible mutations for the next antigenic drift in the coming 2009-2010 season. The STN approach also agrees well with the phylogenetic tree and antigenic maps based on HA inhibition assays.
All code and data are available at
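
The abstract names mutual information between sites as the basis of the STN but does not spell out the construction. As a minimal sketch only, the following Python computes pairwise MI between columns of an aligned set of sequences and keeps the site pairs above a cutoff. The function names, the toy alignment, and the threshold are illustrative assumptions, not the authors' implementation; the paper's STN also tracks amino acid transitions, which this sketch does not attempt.

```python
from collections import Counter
from math import log2


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two equally long alignment columns."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same length")
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(p_ab * n * n / (count_x[a] * count_y[b]))
    return mi


def site_pairs_above(sequences, threshold):
    """Return (i, j, mi) for every site pair whose MI exceeds the threshold.

    `sequences` is a list of equal-length aligned strings (hypothetical input);
    `threshold` is an arbitrary illustrative cutoff.
    """
    columns = list(zip(*sequences))
    edges = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            mi = mutual_information(columns[i], columns[j])
            if mi > threshold:
                edges.append((i, j, mi))
    return edges


# Toy alignment: sites 0 and 1 co-vary, site 2 is conserved.
toy_alignment = ["KNS", "KNS", "QDS", "QDS"]
edges = site_pairs_above(toy_alignment, threshold=0.5)
```

In the toy alignment only the pair of co-varying sites survives the cutoff; a conserved site carries no information about any other site and so never forms an edge.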

  • ABSTRACT: Human influenza A viruses are rapidly evolving pathogens that cause substantial morbidity and mortality in seasonal epidemics around the globe. To ensure continued protection, the strains used for the production of the seasonal influenza vaccine have to be regularly updated, which involves data collection and analysis by numerous experts worldwide. Computer-guided analysis is becoming increasingly important for this problem because of the vast amounts of data generated. Here we describe a computational method for selecting a suitable strain for production of the human influenza A virus vaccine. It interprets available antigenic and genomic sequence data based on measures of antigenic novelty and rate of propagation of the viral strains throughout the population. For viral isolates sampled between 2002 and 2007 we used this method to predict the antigenic evolution of the H3N2 viruses in retrospective testing scenarios. When seasons are scored as true or false predictions, our method returned six true positives, three false negatives, eight true negatives and one false positive prediction, or 77% accuracy overall. In comparison to the recommendations by the WHO, we identified the correct antigenic variant once at the same time and twice one season ahead. Even though it cannot be ruled out that practical reasons, such as the lack of a sufficiently well-growing candidate strain, may in some cases have prevented the WHO from recommending the best matching strain, our computational decision procedure allows quantitative interpretation of the growing amounts of data and may help to match the vaccine better to predominating strains in seasonal influenza epidemics.
    Journal of Virology 08/2014; 88(20). DOI:10.1128/JVI.01861-14 · 4.65 Impact Factor
  • ABSTRACT: The 2009 H1N1 influenza pandemic has attracted worldwide attention. The new virus, which first emerged in Mexico in April 2009, was identified as a unique combination of a triple-reassortant swine influenza A virus, composed of genetic information from pigs, humans, birds, and a Eurasian swine influenza virus. Several recent studies on the 2009 H1N1 virus utilized small datasets to conduct analysis. With the new sequences available to date, we were able to extend the previous research in three areas. The first was finding two networks of co-mutations that may potentially affect the current flu-drug binding sites on neuraminidase (NA), one of the two surface proteins of the flu virus. The second was discovering, for the first time in the 2009 H1N1 strains, a special stalk motif that was dominant in the H5N1 strains in the past. Due to the high virulence of this motif, the second finding is significant in our current research on 2009 H1N1. The third was updating the phylogenetic analysis of current NA sequences of 2009 H1N1 and H5N1, which demonstrated that, in clear contrast to previous findings, the N1 sequences in 2009 are diverse enough to cover different major branches of the phylogenetic tree of those in previous years. As the novel influenza A H1N1 virus continues to spread globally, our results highlight the importance of performing timely analysis on the 2009 H1N1 virus.
    Journal of Biomedical Science and Engineering 01/2009; 02(07). DOI:10.4236/jbise.2009.27080
  • ABSTRACT: H3N2 human influenza A virus causes epidemics of influenza mainly in the winter season in temperate regions. Since the antigenicity of this virus evolves rapidly, several attempts have been made to predict the major amino acid sequence of hemagglutinin 1 (HA1) in the target season of vaccination. However, the usefulness of the predicted sequence was unclear because its relationship to antigenicity was unknown. Here, an antigenic model for estimating the degree of antigenic difference (antigenic distance) between amino acid sequences of HA1 was integrated into the process of selecting vaccine strains for H3N2 human influenza A virus. When the effectiveness of a potential vaccine strain for a target season was evaluated retrospectively using the average antigenic distance between the strain and the epidemic viruses sampled in the target season, the most effective vaccine strain was identified mostly in the season one year before the target season (pre-target season). On average, the effectiveness of actual vaccines appeared to be lower than that of strains randomly chosen in the pre-target season. It was recommended to replace the vaccine strain for every target season with the strain having the smallest average antigenic distance to the others in the pre-target season. The procedure of selecting vaccine strains for future epidemic seasons described in the present study was implemented in the influenza virus forecasting system (INFLUCAST) (
    06/2015; 4. DOI:10.1016/j.mgene.2015.03.003
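
The Journal of Virology abstract in the list above reports six true positives, three false negatives, eight true negatives and one false positive, summarized as 77% accuracy. As a small worked check, assuming accuracy is defined in the usual way as correct predictions over all predictions (the abstract does not state the formula), the counts give 14/18:

```python
def accuracy(tp, fn, tn, fp):
    """Fraction of correct predictions, assuming the standard definition."""
    return (tp + tn) / (tp + fn + tn + fp)


# Counts taken from the abstract; the formula itself is an assumption.
acc = accuracy(tp=6, fn=3, tn=8, fp=1)
print(f"{acc:.3f}")
```

The value 14/18 is about 0.778, which matches the reported 77% if the figure was truncated rather than rounded.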