Using a mutual information-based site transition network to map the genetic evolution of influenza A/H3N2 virus

Institute of Bioinformatics, Zhejiang University, Hangzhou, PR China.
Bioinformatics (Impact Factor: 4.62). 09/2009; 25(18):2309-17. DOI: 10.1093/bioinformatics/btp423
Source: PubMed

ABSTRACT Mapping the antigenic and genetic evolution pathways of influenza A is critically important for influenza vaccine development and drug design. In this article, we analyzed more than 4000 A/H3N2 hemagglutinin (HA) sequences from 1968 to 2008 to model the evolutionary path of the virus, which allows us to predict potential future drifts with specific mutations.
The mutual information (MI) method was used to design a site transition network (STN) for each amino acid site in the A/H3N2 HA sequence. The STN indicates that most of the dynamic interactions are positioned around the epitopes and the receptor binding domain regions, with strong preferences in both the mutation sites and the amino acid types being mutated to. The network also shows that antigenic changes accumulate over time, with occasional large changes due to multiple co-occurring mutations at antigenic sites. Furthermore, cluster analysis that subdivides the STN into several subnetworks gives a more detailed view of the features of antigenic change: the characteristic inner sites and the connecting inter-subnetwork sites are both responsible for the drifts. A novel five-step prediction algorithm based on the STN shows reasonable accuracy in reproducing historical HA mutations. For example, our method can reproduce the 2003-2004 A/H3N2 mutations with approximately 70% accuracy. The method also predicts seven possible mutations for the next antigenic drift in the coming 2009-2010 season. The STN approach also agrees well with the phylogenetic tree and antigenic maps based on HA inhibition assays.
All code and data are available at
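
As a rough illustration of the mutual information measure the abstract refers to, the sketch below computes MI (in bits) between two columns of a sequence alignment. It is not the authors' implementation and does not build the STN or the five-step prediction algorithm; the function names `mutual_information` and `column` and the toy alignment are hypothetical, chosen only for this example.

```python
from collections import Counter
from math import log2


def mutual_information(col_x, col_y):
    """Return the mutual information, in bits, between two alignment columns.

    col_x and col_y are equal-length sequences of residues, one entry per
    aligned sequence. Probabilities are plain frequency estimates.
    """
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same length")
    n = len(col_x)
    if n == 0:
        raise ValueError("columns must not be empty")

    count_x = Counter(col_x)                # marginal counts at site x
    count_y = Counter(col_y)                # marginal counts at site y
    count_xy = Counter(zip(col_x, col_y))   # joint counts of residue pairs

    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[a] * count_y[b]))
    return mi


def column(sequences, index):
    """Extract one alignment column (hypothetical helper)."""
    return [seq[index] for seq in sequences]


# Toy alignment (made up): sites 0 and 1 always change together,
# site 2 never changes.
toy_alignment = ["KNA", "KNA", "EDA", "EDA"]

coupled = mutual_information(column(toy_alignment, 0), column(toy_alignment, 1))
constant = mutual_information(column(toy_alignment, 0), column(toy_alignment, 2))
```

In this toy case the two co-varying sites share one full bit of information, while pairing a site with an invariant one gives zero. On real data, site pairs with high MI would be candidates for edges in a co-mutation network, subject to whatever significance filtering the analysis applies.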

  • Source
    ABSTRACT: H3N2 human influenza A virus causes epidemics of influenza mainly in the winter season in temperate regions. Since the antigenicity of this virus evolves rapidly, several attempts have been made to predict the major amino acid sequence of hemagglutinin 1 (HA1) in the target season of vaccination. However, the usefulness of the predicted sequence was unclear because its relationship to antigenicity was unknown. Here, an antigenic model for estimating the degree of antigenic difference (antigenic distance) between amino acid sequences of HA1 was integrated into the process of selecting vaccine strains for H3N2 human influenza A virus. When the effectiveness of a potential vaccine strain for a target season was evaluated retrospectively using the average antigenic distance between the strain and the epidemic viruses sampled in the target season, the most effective vaccine strain was identified mostly in the season one year before the target season (pre-target season). On average, the effectiveness of actual vaccines appeared to be lower than that of strains randomly chosen in the pre-target season. It was recommended to replace the vaccine strain for every target season with the strain having the smallest average antigenic distance to the others in the pre-target season. The procedure of selecting vaccine strains for future epidemic seasons described in the present study was implemented in the influenza virus forecasting system (INFLUCAST) (
    06/2015; 4. DOI:10.1016/j.mgene.2015.03.003
  • Source
    ABSTRACT: The 2009 H1N1 influenza pandemic has attracted worldwide attention. The new virus, which first emerged in Mexico in April 2009, was identified as a unique combination of a triple-reassortant swine influenza A virus, composed of genetic information from pigs, humans, birds, and a Eurasian swine influenza virus. Several recent studies on the 2009 H1N1 virus utilized small datasets to conduct analysis. With newly available sequences, we were able to extend the previous research in three areas. The first was finding two networks of co-mutations that may potentially affect the current flu-drug binding sites on neuraminidase (NA), one of the two surface proteins of flu virus. The second was discovering a special stalk motif, which was dominant in the H5N1 strains in the past, in the 2009 H1N1 strains for the first time. Due to the high virulence of this motif, the second finding is significant in our current research on 2009 H1N1. The third was updating the phylogenetic analysis of current NA sequences of 2009 H1N1 and H5N1, which demonstrated that, in clear contrast to previous findings, the N1 sequences in 2009 are diverse enough to cover different major branches of the phylogenetic tree of those in previous years. As the novel influenza A H1N1 virus continues to spread globally, our results highlighted the importance of performing timely analysis on the 2009 H1N1 virus.
    Journal of Biomedical Science and Engineering 01/2009; 02(07). DOI:10.4236/jbise.2009.27080
  • Source
    Keywords: Pandemic 2009 H1N1; Polymerase
    ABSTRACT: The NP, PA, PB1, and PB2 proteins of influenza viruses together are responsible for the transcription and replication of viral RNA, and the latter three proteins comprise the viral polymerase. Two recent reports indicated that the mutation at site 627 of PB2 plays a key role in host range and increased virulence of influenza viruses, and could be compensated by multiple mutations at other sites of PB2, suggesting the association of this mutation with those at other sites. The objective of this study was to analyze the co-mutated sites within and between these important proteins of influenza. With mutual information, a set of statistically significant co-mutated position pairs (P value = 0) in NP, PA, PB1, and PB2 of avian, human, pandemic 2009 H1N1, and swine influenza were identified, based on which several highly connected networks of correlated sites in NP, PA, PB1, and PB2 were discovered. These correlation networks further illustrated the inner functional dependence of the four proteins that are critical for host adaptation and pathogenicity. Mutual information was also applied to quantify the correlation of sites within each individual protein and between proteins. In general, the inter-protein correlation of the four proteins was stronger than the intra-protein correlation. Finally, the correlation patterns of the four proteins of pandemic 2009 H1N1 were found to be closer to those of avian and human than to swine influenza, thus rendering a novel insight into the interaction of the four proteins of the pandemic 2009 H1N1 virus when compared to avian, human, and swine influenza and how the origin of these four proteins might affect the correlation patterns uncovered in this analysis.
    Natural Science 01/2010; 2(10):1138-1147. DOI:10.4236/ns.2010.210141
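
The selection rule described in the first related abstract above (choose the pre-target-season strain with the smallest average antigenic distance to the others) can be sketched as follows. This is only an illustration under stated assumptions: the function name `pick_vaccine_strain`, the strain labels, and the distance values are all hypothetical, and the antigenic model that would produce real distances from HA1 sequences is not reproduced here.

```python
def pick_vaccine_strain(names, dist):
    """Return (name, average distance) of the strain closest to all others.

    names: list of strain labels.
    dist:  square matrix (list of lists) where dist[i][j] is the antigenic
           distance between strains i and j. Values here are assumed inputs.
    """
    n = len(names)
    if n < 2:
        raise ValueError("need at least two strains to compare")
    if len(dist) != n or any(len(row) != n for row in dist):
        raise ValueError("distance matrix must be n x n")

    best_index = None
    best_average = float("inf")
    for i in range(n):
        # Mean distance from strain i to every other strain (self excluded).
        average = sum(dist[i][j] for j in range(n) if j != i) / (n - 1)
        if average < best_average:
            best_index, best_average = i, average
    return names[best_index], best_average


# Made-up example: three strains with hypothetical pairwise distances.
strain_names = ["strain_A", "strain_B", "strain_C"]
distances = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]

chosen, chosen_average = pick_vaccine_strain(strain_names, distances)
```

Here `strain_B` is chosen because its mean distance to the other two strains (1.5) is the smallest of the three.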