
Association analyses of large-scale glycan microarray data reveal novel host-specific substructures in influenza A virus binding glycans

Springer Nature
Scientific Reports
Authors:
  • Somur Tech. Co. Ltd.

Abstract and Figures

Influenza A viruses can infect a wide variety of animal species and, occasionally, humans. Infection occurs through the binding formed by viral surface glycoprotein hemagglutinin and certain types of glycan receptors on host cell membranes. Studies have shown that the α2,3-linked sialic acid motif (SA2,3Gal) in avian, equine, and canine species; the α2,6-linked sialic acid motif (SA2,6Gal) in humans; and SA2,3Gal and SA2,6Gal in swine are responsible for the corresponding host tropisms. However, more detailed and refined substructures that determine host tropisms are still not clear. Thus, in this study, we applied association mining on a set of glycan microarray data for 211 influenza viruses from five host groups: humans, swine, canine, migratory waterfowl, and terrestrial birds. The results suggest that besides Neu5Acα2–6Galβ, human-origin viruses could bind glycans with Neu5Acα2–8Neu5Acα2–8Neu5Ac and Neu5Gcα2–6Galβ1–4GlcNAc substructures; Galβ and GlcNAcβ terminal substructures, without sialic acid branches, were associated with the binding of human-, swine-, and avian-origin viruses; sulfated Neu5Acα2–3 substructures were associated with the binding of human- and swine-origin viruses. Finally, through three-dimensional structure characterization, we revealed that the role of glycan chain shapes is more important than that of torsion angles or of overall structural similarities in virus host tropisms.
SCIENTIFIC RepoRts | 5:15778 | DOI: 10.1038/srep15778
www.nature.com/scientificreports
Nan Zhao1,2, Brigitte E. Martin1, Chun-Kai Yang1, Feng Luo3 & Xiu-Feng Wan1,2
Inuenza A viruses can infect a wide variety of animal species and, occasionally, humans. Infection
occurs through the binding formed by viral surface glycoprotein hemagglutinin and certain types
of glycan receptors on host cell membranes. Studies have shown that the α2,3-linked sialic acid
motif (SA2,3Gal) in avian, equine, and canine species; the α2,6-linked sialic acid motif (SA2,6Gal) in
humans; and SA2,3Gal and SA2,6Gal in swine are responsible for the corresponding host tropisms.
However, more detailed and rened substructures that determine host tropisms are still not clear.
Thus, in this study, we applied association mining on a set of glycan microarray data for 211
inuenza viruses from ve host groups: humans, swine, canine, migratory waterfowl, and terrestrial
birds. The results suggest that besides Neu5Acα2–6Galβ, human-origin viruses could bind glycans
with Neu5Acα2–8Neu5Acα2–8Neu5Ac and Neu5Gcα2–6Galβ1–4GlcNAc substructures; Galβ and
GlcNAcβ terminal substructures, without sialic acid branches, were associated with the binding
of human-, swine-, and avian-origin viruses; sulfated Neu5Acα23 substructures were associated
with the binding of human- and swine-origin viruses. Finally, through three-dimensional structure
characterization, we revealed that the role of glycan chain shapes is more important than that of
torsion angles or of overall structural similarities in virus host tropisms.
Inuenza A viruses infect a wide range of hosts, such as humans, sea mammals, swine, bats, and avian,
equine, and canine species1–4. e carbohydrates or glycans in host cells serve as the receptors for inu-
enza viruses and are key to successful virus entry, the rst step in inuenza infection5. e structures of
these glycan receptors have been shown to be unique in animal hosts and even within dierent tissues
in the same host, and these unique glycan structures determine host and tissue tropisms of inuenza A
viruses.
In humans, glycans with α2,6-linked sialic acid (SA2,6Gal) are detected more plentifully in the upper respiratory tract than the lower respiratory tract. SA2,6Gal and SA2,3Gal (α2,3-linked sialic acid) are heterogeneously distributed in the human nasopharynx and bronchi. The expression of SA2,3Gal is greater than that of SA2,6Gal in the respiratory tract of young children6,7. In avian species, SA2,3Gal and SA2,6Gal are distributed within respiratory and intestinal tracts. Although SA2,3Gal are mostly found in waterfowl, it is possible that SA2,3Gal and SA2,6Gal could be expressed very differently in terrestrial birds8–11. Swine express both SA2,3Gal and SA2,6Gal in the respiratory tract, but SA2,6Gal are abundant in the upper trachea and bronchi, and SA2,3Gal are more abundant in the lower respiratory tract10,12–14. The presence of both SA2,3Gal and SA2,6Gal (e.g. Neu5Acα2–3Gal and Neu5Acα2–6Gal) in swine could allow these animals to be susceptible to avian-origin and human-origin influenza A viruses; thus swine have been proposed as “mixing vessels” for influenza viruses12.
1Department of Basic Sciences, College of Veterinary Medicine, Mississippi State University, MS, USA. 2Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, MS, USA. 3School of Computing, Clemson University, Clemson, SC, USA. Correspondence and requests for materials should be addressed to X.-F.W. (email: wan@cvm.msstate.edu)
Received: 23 June 2015
Accepted: 29 September 2015
Published: 28 October 2015
By presenting multiple glycans or glycoconjugates printed on a single slide, the glycan microarray technique has offered high-throughput analyses of the glycan-binding profile of influenza viruses15,16. Glycan microarray has become a routine experimental tool for characterizing the receptor-binding profiles of influenza viruses17. To date, > 500 influenza virus–related glycan microarray data entries have been deposited in the Consortium for Functional Glycomics (CFG) glycan microarray database18, and this number is still increasing. Glycan microarray profiling of influenza virus pandemic strains has shed light on the receptor-binding specificities of their hemagglutinin (HA). For example, such analyses revealed that the 2009 influenza A(H1N1)pdm09 pandemic virus bound to α2,6-linked and to a large range of α2,3-linked sialyl sequences19–22. Moreover, glycan microarray analysis has been widely used to study receptor recognition and host tropism of influenza virus mutants23–28. In addition to providing data on Neu5Acα2–3Gal and Neu5Acα2–6Gal, glycan microarray analysis also provided data on other complicated glycan substructures. In one study, structural topology (i.e., two glycan chain shapes, one cone-like and the other umbrella-like) was reported to be related to SA2,3Gal and SA2,6Gal during influenza virus–receptor interactions29. On the other hand, during a glycan microarray screening, influenza A viruses were shown to bind receptors other than SA2,3Gal and SA2,6Gal, although such bindings have not been confirmed by interventional experiments22,30. For example, influenza A (H1N1) virus can bind α2,8-linked polysialyl sequences22. Nevertheless, it is still unclear what specific substructures or moieties in host receptors determine influenza virus host tropisms.
To better understand structural specificities for glycan binding, Cholleti et al.31 developed an algorithm called GlycanMotifMiner, or GLYMMR, that uses frequent subtree mining to identify motifs for protein–glycan interactions for a given glycan microarray data entry. Porter et al.32 applied a clustering algorithm to identify glycan substructures with high intensities in the glycan array data. More recently, we developed a novel quantitative structure–activity relationship (QSAR) method to analyze glycan array data; the method focuses on glycan substructure features by applying PLS regression and selection functions to the glycan microarray data33. Another frequent glycan structure mining study of influenza virus data also found that sulfated glycan motifs increased viral infection34. However, none of the above methods were designed for large-scale glycan microarray data analysis that integrates multiple microarray data entries for a particular research interest. In particular, statistics-based motif identification methods rely on predefined hypotheses and cannot discover unexpected or infrequent motifs. Feature selection strategies for the regressions of glycan microarray data have not considered modeling multiple microarrays. Thus, a computational method is needed to characterize glycan substructure motifs by utilizing the information across multiple datasets, especially glycan microarray data across various platforms, and this method must be able to tolerate the noise within and across glycan microarray data.
The relationship between host receptors (glycan substructures) and influenza A viruses (e.g., viruses with different host origins) can be naturally formulated as a computational problem of data integration plus association rule mining. Therefore, in this study, we first applied a PLS regression on individual glycan microarray data entries as normalization and then used association rule analysis on extracted glycan substructure features to identify motifs for influenza virus host tropisms. In addition to SA2,3Gal and SA2,6Gal, results showed that glycan substructures with SA2,8SA, non-sialic acid saccharides (Galβ and GlcNAcβ terminal substructures), and sulfated SA2,3Gal could contribute to influenza host tropism differently. Additional computational modeling demonstrated that, for trisaccharide substructures, a shape angle formed by mass centers of three residues, instead of linkage torsion angles, may determine the overall glycan chain shapes and thus distinguish glycans with SA2,3Gal from those with SA2,6Gal or SA2,8SA. These findings may imply a more general property caused by glycan terminals than just by sialic acid with different linkages during influenza–glycan binding.
Methods
Figure1 shows a simplied owchart of the computational strategy we used, with glycan microarray data,
to identify host-specic glycan substructures. In brief, we collected and integrated glycan microarray
datasets (Fig.1A), dened and extracted substructures from glycans (Fig.1B), and applied association
rule mining to identify the inuenza viruses’ specic glycans and their substructures (Fig.1C).
Datasets. Collection of influenza A virus–specific glycan microarray data. A computational script was written to automatically retrieve glycan microarray datasets from CFG18 by using the keyword “influenza.” A total of 542 glycan microarray entries were retrieved, of which 324 were excluded from the final dataset: 31 entries for influenza B viruses, 182 for mutant viruses, 51 for mouse-adapted strains, 53 for HA recombinant proteins, and 7 for microarrays with incomplete binding affinity values. The remaining 218 entries were for influenza A virus–specific glycan microarray datasets with complete binding affinity values. These datasets, which were used for further analyses (Table 1), consisted of influenza A viruses of human origin (n = 154), waterfowl origin (n = 17), terrestrial bird origin (n = 13), canine origin (n = 6), and swine origin (n = 21). The metadata associated with these datasets, including CFG entry identification codes, investigators’ names, influenza virus sample names, glycan array version, raw array binding files, and host species, are listed in Table S1 in the supporting information.
Integration of glycan microarray datasets. In the CFG database, the datasets were generated by using 11 versions of glycan microarrays, each of which had a different number of glycan entries. For example, version 1 had 200 glycans, whereas version 5.1 had 611 glycans. However, most glycans present in earlier glycan microarray versions are also present in later versions. To facilitate the data analyses across different datasets, we merged all microarray versions into one with 936 unique glycans (Table S2) and generated a single matrix for the 211 data entries (211 viruses × 936 glycans, Table S3). Glycan-binding affinities (i.e., fluorescent signal values) in our dataset were assigned to corresponding elements in the matrix. The elements for which there was no corresponding affinity value among the 936 glycans were assigned a “not available” value and excluded from the glycan substructure feature selection.
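The merging step can be illustrated with a minimal sketch. It assumes each data entry is available as a mapping from glycan name to fluorescent signal; the function name `build_binding_matrix`, the virus names, and the signal values are hypothetical and are not taken from the CFG data or from the authors' script.

```python
from typing import Dict, List, Optional, Tuple

def build_binding_matrix(
    entries: Dict[str, Dict[str, float]],
) -> Tuple[List[str], List[str], List[List[Optional[float]]]]:
    """Merge per-array signals into one viruses x glycans matrix.

    `entries` maps a virus name to {glycan name: fluorescent signal} for the
    array version that virus was run on. Glycans missing from that version
    are stored as None, standing in for the "not available" value.
    """
    # Union of all glycans over all array versions, in a stable sorted order.
    glycans = sorted({g for signals in entries.values() for g in signals})
    viruses = list(entries)
    matrix = [[entries[v].get(g) for g in glycans] for v in viruses]
    return viruses, glycans, matrix

# Hypothetical toy input: two viruses measured on different array versions.
toy_entries = {
    "virus_A": {"Neu5Aca2-6Galb1-4GlcNAc": 5200.0, "Galb1-4GlcNAc": 300.0},
    "virus_B": {"Neu5Aca2-6Galb1-4GlcNAc": 150.0, "Neu5Aca2-3Galb1-4GlcNAc": 8100.0},
}
viruses, glycans, matrix = build_binding_matrix(toy_entries)
for name, row in zip(viruses, matrix):
    print(name, row)
```

Keeping missing measurements as an explicit placeholder, rather than zero, prevents an untested glycan from being mistaken for a non-binder in later steps.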
Figure 1. Flowchart of the computational analysis protocol. (A) Glycan microarray data collection for influenza virus bindings. (B) Glycan structure organizations and substructures’ feature extraction and selection. (C) Association rule mining on viral host labeled substructure feature vectors. CFG, Consortium for Functional Glycomics; HA, hemagglutinin; QSAR, quantitative structure–activity relationship.
Glycan substructure feature extraction. Glycan substructures were defined as described elsewhere33. Specifically, mono-, di-, tri-, and tetrasaccharide substructures were extracted from 936 glycans as features. These extractions resulted in 249 monosaccharide, 738 disaccharide, 1,198 trisaccharide, and 1,477 tetrasaccharide substructures (Tables S4–S7). The fluorescent signal value for the corresponding glycan on the array was assigned as the binding affinity for each individual substructure. Only fluorescent signal values ≥ 2,000 were considered as effective numbers in regression, and those < 2,000 were treated as background noise. Next, a partial least squares (PLS) regression and feature selection algorithm (QSAR33) were adapted to select the features predominating glycan binding from an influenza virus–specific glycan microarray dataset (see details in Supplementary Information). This PLS regression was performed four times for each single data entry from our 211 glycan microarrays by using four sets of substructure feature definitions (mono-, di-, tri-, and tetrasaccharides). Each feature vector was finally labeled according to the host origin of the influenza A viruses used in the glycan microarray experiments (i.e., human, swine, canine, waterfowl, or terrestrial bird [chicken, quail, and turkey] host).
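The extraction and thresholding steps can be sketched as follows. This is a simplified illustration only: it assumes linear glycan chains written as lists of residue-plus-linkage tokens, whereas real glycans are branched and the substructure definition of ref. 33 may differ. The names `substructures` and `feature_signals` and the toy data are hypothetical, and the PLS regression itself is not shown.

```python
from typing import Dict, List, Tuple

SIGNAL_CUTOFF = 2000.0  # signals below this are treated as background noise

def substructures(chain: List[str], max_size: int = 4) -> List[Tuple[str, ...]]:
    """Return every contiguous run of 1..max_size residues in a linear chain."""
    found = []
    for size in range(1, max_size + 1):
        for start in range(len(chain) - size + 1):
            found.append(tuple(chain[start:start + size]))
    return found

def feature_signals(
    glycans: Dict[str, List[str]], signals: Dict[str, float]
) -> Dict[Tuple[str, ...], List[float]]:
    """Assign each glycan's signal to all of its substructures.

    Glycans with no signal or a signal below SIGNAL_CUTOFF are skipped.
    """
    features: Dict[Tuple[str, ...], List[float]] = {}
    for name, chain in glycans.items():
        signal = signals.get(name)
        if signal is None or signal < SIGNAL_CUTOFF:
            continue
        for sub in set(substructures(chain)):
            features.setdefault(sub, []).append(signal)
    return features

# Hypothetical toy glycans (linear chains, terminal residue first).
toy_glycans = {
    "g1": ["Neu5Aca2-6", "Galb1-4", "GlcNAc"],
    "g2": ["Galb1-4", "GlcNAc"],
}
toy_signals = {"g1": 5200.0, "g2": 300.0}
features = feature_signals(toy_glycans, toy_signals)
print(len(features), features[("Galb1-4", "GlcNAc")])
```

In this toy case only `g1` passes the cutoff, so the disaccharide it shares with `g2` carries a single signal value.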
Association rule mining for selected glycan substructures. We formulated the detection of host-specific glycan substructures as an association-mining task (see more details in Supplementary Information), where we let $I = \{i_1, i_2, \ldots, i_n\}$ represent a set of items and let $T = \{t_1, t_2, \ldots, t_m\}$ be a set of transactions forming a database. An association rule, $X \Rightarrow Y$, where $X, Y \subseteq I$, is usually interpreted to mean that when the items in $X$ exist, those in $Y$ also occur at a certain confidence level35. Here, for our glycan microarray dataset, transactions $T$ were the data derived from influenza virus–specific glycan microarray entries, so $m = 211$; the substructure features $X$ derived from glycans on the array by previous PLS-β selection and the labeled features $Y$ with host origin will form $I$. Given a rule $X \Rightarrow Y$, the confidence is defined as

$$\mathrm{Conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)},$$

where $\mathrm{supp}(X)$ is the support of item set $X$. The support was defined as the proportion of transactions in the dataset that contain the item set. Another measurement, lift35, is the ratio of the observed support to the support expected if $X$ and $Y$ were independent and was defined as

$$\mathrm{Lift}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X) \times \mathrm{supp}(Y)}.$$

Therefore, we expected to obtain interesting association rules with high confidences (≥ 80%), high lifts (a lift value ≥ 1)36, and low supports (≥ 0.005, infrequent but potentially interesting) to supply highly probable, unexpected, and infrequent conclusions. We adapted the Apriori algorithm implemented in R37 to infer these host substructure–specific associations. Moreover, during the mining process, redundant rules were also removed by defining super rules as redundancy. A super rule is a rule with the same or lower lift value, where the left hand side, X, contains more items than a previous rule, but still results in the same right hand side, Y. Last, we kept only satisfied rules, which were filtered by leaving only those with terminal saccharides on the substructure features.
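The three rule measures can be made concrete with a small sketch. It evaluates support, confidence, and lift by direct counting on a hypothetical toy database; it is not the Apriori implementation in R used in the study, and the threshold comparisons (at least 0.80, 1, and 0.005) are an assumption about how the stated cutoffs are applied. All item names are illustrative.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]

def support(transactions: List[Transaction], items: Iterable[str]) -> float:
    """Proportion of transactions that contain every item in `items`."""
    wanted = frozenset(items)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)

def confidence(transactions: List[Transaction], lhs: Iterable[str], rhs: Iterable[str]) -> float:
    """Conf(X => Y) = supp(X u Y) / supp(X)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

def lift(transactions: List[Transaction], lhs: Iterable[str], rhs: Iterable[str]) -> float:
    """Lift(X => Y) = supp(X u Y) / (supp(X) * supp(Y))."""
    joint = support(transactions, set(lhs) | set(rhs))
    return joint / (support(transactions, lhs) * support(transactions, rhs))

def keep_rule(transactions, lhs, rhs, min_conf=0.80, min_lift=1.0, min_supp=0.005) -> bool:
    """Apply the three cutoffs (assumed to be inclusive lower bounds)."""
    joint = support(transactions, set(lhs) | set(rhs))
    return (
        joint >= min_supp
        and confidence(transactions, lhs, rhs) >= min_conf
        and lift(transactions, lhs, rhs) >= min_lift
    )

# Hypothetical toy database: one transaction per array, holding the selected
# substructure features plus a host label.
toy = (
    [frozenset({"Neu5Gca2-6Gal", "host=human"})] * 2
    + [frozenset({"Neu5Aca2-6Gal", "host=human"})] * 4
    + [frozenset({"Neu5Aca2-3Gal", "host=avian"})] * 2
    + [frozenset({"Neu5Aca2-3Gal", "host=swine"})] * 2
)
lhs, rhs = {"Neu5Gca2-6Gal"}, {"host=human"}
print(support(toy, lhs | rhs), confidence(toy, lhs, rhs), lift(toy, lhs, rhs))
print(keep_rule(toy, lhs, rhs), keep_rule(toy, {"Neu5Aca2-3Gal"}, {"host=avian"}))
```

In the toy data the first rule holds in every transaction that contains its left-hand side, so its confidence is 1 and its lift is the reciprocal of the host label's support. The second rule is rejected because only half of the transactions with its left-hand side carry the avian label.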
Table 1. Glycan-binding microarray data collected for 211 wild-type influenza A virus–specific glycan microarray datasets with complete binding affinity values.
Array version   No. influenza virus arrays   Human   Swine   Canine   Avian (waterfowl)   Avian (terrestrial)
V1.0            7                            5       0       0        2                   0
V2.0            11                           10      0       0        1                   0
V2.1            4                            4       0       0        0                   0
V3.0            10                           10      0       0        0                   0
V3.1            21                           2       1       0        8                   10
V3.2            5                            4       0       0        0                   1
V4.0            33                           31      2       0        0                   0
V4.1            28                           15      13      0        0                   0
V4.2            20                           9       3       0        6                   2
V5.0            68                           64      0       4        0                   0
V5.1            4                            0       2       2        0                   0
Total           211                          154     21      6        17                  13

Three-dimensional structural modeling and analysis. Structural characterization for terminal glycan saccharides. To understand the structural determinants for a specific glycan associated with certain influenza A virus, we compared the spatial relationship between six terminal trisaccharide features derived from data mining. These six features were (Neu5Acα2–6)–(6Galβ1–4)–4GlcNAc (PDB38 accession number: 3UBN39), (Neu5Acα2–8)–(8Neu5Acα2–8)–8Neu5Ac (3HMY40), (Neu5Acα2–3)–(3Galβ1–4)–4GlcNAc (3UBQ39), (Neu5Gcα2–3)–(3Galβ1–4)–4GlcNAc (4POT41), (Galβ1–4)–(4GlcNAcβ1–3)–3Gal (2XRS42), and (Galβ1–4)(Fucα1–3)–(3,4GlcNAcβ1–3)–3Gal (1SL543). The following three geometric measurements were calculated:
(1) The angle formed by the mass centers of three saccharides. We calculated the angle formed by the mass centers of three saccharides as a measurement of the glycan chain’s turning shape.
(2) The root-mean-square deviation (RMSD). Given two glycan substructures, each containing the terminal saccharide, we superimposed the corresponding atoms on the six-membered rings of the terminal saccharides. From there, while keeping the terminal saccharides superimposed, the following two values of RMSD were measured: RMSD2 and RMSD3, using the standard formula for calculating RMSD from two sets of six-membered ring atoms $v$ and $w$:

$$\mathrm{RMSD}(v, w) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left[ (v_{ix} - w_{ix})^2 + (v_{iy} - w_{iy})^2 + (v_{iz} - w_{iz})^2 \right]},$$

where $n = 6$. RMSD2 was calculated for the two saccharides linked directly to their respective terminal saccharides. If both glycan substructures had a third saccharide, RMSD3 was then calculated for the third pair of saccharides.
(3) φ and ψ torsion angles. We calculated the φ and ψ torsion angles for each linkage between two adjacent residues. Glycosidic torsions were defined by only heavy linkage atoms as φ = O5–C1–On–Cn and ψ = C1–On–Cn–C(n−1) (ref. 44). Accordingly, a trisaccharide substructure has two linkages with two sets of torsions.
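The three measurements can be sketched with plain vector arithmetic. The sketch below is illustrative rather than the authors' code: it assumes atom coordinates are already available as (x, y, z) tuples, that mass centers use heavy atoms only with the standard atomic masses listed in `MASS`, and that the two ring-atom sets passed to `rmsd` were superimposed beforehand. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]
MASS = {"C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}  # heavy atoms only (assumption)

def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def mass_center(atoms: List[Tuple[str, Vec]]) -> Vec:
    """Mass-weighted center of one residue given (element, coordinates) pairs."""
    total = sum(MASS[e] for e, _ in atoms)
    return tuple(sum(MASS[e] * p[k] for e, p in atoms) / total for k in range(3))

def shape_angle(c1: Vec, c2: Vec, c3: Vec) -> float:
    """Angle in degrees at the middle mass center c2."""
    u, v = _sub(c1, c2), _sub(c3, c2)
    cosine = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

def rmsd(v: List[Vec], w: List[Vec]) -> float:
    """RMSD between two equally long, already superimposed atom lists."""
    squared = sum(_dot(_sub(a, b), _sub(a, b)) for a, b in zip(v, w))
    return math.sqrt(squared / len(v))

def torsion(p1: Vec, p2: Vec, p3: Vec, p4: Vec) -> float:
    """Signed dihedral angle in degrees for four consecutive atoms."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Synthetic coordinates chosen so the answers are easy to check by hand.
print(shape_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))
print(torsion((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

For a glycosidic linkage, φ would be obtained by passing the O5, C1, On, and Cn coordinates to `torsion` in that order, and ψ by passing C1, On, Cn, and C(n−1).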
Three-dimensional structures for protein–glycan interactions. To demonstrate structural interactions between influenza viruses and glycan substructures at the molecular level, we used four HA protein crystal structures. These structures were from viruses with different host origins (A/California/04/09 H1N1, human origin, SA2,6Gal specific; A/swine/Iowa/15/1930 H1N1, swine origin, SA2,6Gal specific; A/Canine/Colorado/06 H3N8, canine origin, SA2,3Gal specific; A/Vietnam/1203/2004 H5N1, avian origin, SA2,3Gal specific) and were obtained from PDB38 data entries 3LZG45, 1RVT46, 4UO547, and 2FK020, respectively. We docked these HA proteins with corresponding glycan substructures (6SLN [α2–6-sialyl-N-acetyllactosamine], analogous to Neu5Acα2–6Galβ1–4GlcNAc, and 3SLN [α2–3-sialyl-N-acetyllactosamine], analogous to Neu5Acα2–3Galβ1–4GlcNAc) and highlighted conserved and diverse regions on the receptor-binding pocket (amino acid residues 98, 133–138, 153, 183, 188–195, 221–228 by H3 numbering). The substructures (Neu5Acα2–8)–(8Neu5Acα2–8)–8Neu5Ac and (Galβ1–4)–(4GlcNAcβ1–3)–3Gal were also simulated against bound 6SLN and 3SLN to be docked at the receptor-binding pocket of the human-origin and the avian-origin viral HAs, respectively. The HA–glycan docking was conducted in three steps: 1) an initial structure was obtained by superposing the structures of HA and a glycan analog against a native HA 3D structure with glycan; 2) an energy minimization with 500 steps of conjugate gradient and 500 additional steps of steepest descent was performed by using the AMBER force field48 in a GROMACS49 dynamic simulation process; and 3) the final complex structure was obtained after a binding free energy repair by using the FoldX software50 at a simulation temperature of 298 K and without hydrogen atoms.
Results
Inuenza virus–specic features derived from glycan microarray data by PLS regression and
feature selection. Certain saccharide residues are enriched at glycan substructures contributing to
inuenza virus binding. In the integrated dataset of glycan microarrays with 936 unique glycans, the
glycans with inuenza virus–binding anities mostly consist of sialic acids (Neu5Ac and Neu5Gc),
galactose (Gal), N-acetylgalactosamine (GalNAc), glucose (Glc), N-acetylglucosamine (GlcNAc), fucose
(Fuc), and mannose (Man). Table2 summarizes the number and percentage of glycan substructures that
have these saccharides by each one of the four substructure feature denitions on the glycan microarrays
(i.e., mono-, di-, tri-, and tetrasaccharides) and thus reects their existence according to the microarray
design. Table 3 lists the same distribution values of inuenza virus–specic substructures selected by
PLS regression and illustrates that only a small portion of glycan substructures (73/249 monosaccha-
rides, 230/738 disaccharides, 322/1,198 trisaccharides, and 320/1,477 tetrasaccharides) was determined
to contribute to a binding signal of 2,000 with inuenza viruses. All PLS-selected substructure features
are also summarized in Table S8.
A comparison of data in Table2 with that in Table3 shows that Neu5Ac, Neu5Gc, Gal, and GlcNAc
were more abundant in the glycan substructures contributing to inuenza virus binding. For example,
Neu5Ac appeared in 9.59% of the monosaccharides, 11.3% of the disaccharides, 18.0% of the trisaccha-
rides, and 31.9% of the tetrasaccharides when QSAR was used to select signicant glycan substructures
for inuenza virus binding (Table3), compared with 4.82%, 6.10%, 7.68%, and 10.9%, respectively, of all
the glycan substructures from microarrays (Table2). Similar dierences were observed for Neu5Gc, Gal,
and GlcNAc. ese ndings suggest that inuenza virus–specic glycan substructures are prone to have
these four saccharides. Nevertheless, when QSAR was used, glycan substructures with GalNAc, Glc, Fuc,
and Man were equally or less frequently correlated with inuenza virus binding than those on the glycan
www.nature.com/scientificreports/
6
SCIENTIFIC RepoRts | 5:15778 | DOI: 10.1038/srep15778
arrays (Tables2 and 3), which indicates a limited contribution to inuenza binding by substructures with
these saccharides.
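The comparison between the two tables amounts to a ratio of percentages. As a minimal worked check, the sketch below recomputes it for the trisaccharide columns, using the counts printed in Tables 2 and 3; the "enrichment ratio" is an illustrative quantity introduced here, not a statistic reported in the paper.

```python
# Trisaccharide-level counts copied from Table 2 (all substructures, N = 1,198)
# and Table 3 (PLS-selected substructures, N = 322).
N_ALL, N_SELECTED = 1198, 322
all_counts = {"Neu5Ac": 92, "Neu5Gc": 16, "Gal": 811, "GalNAc": 329,
              "Glc": 175, "GlcNAc": 816, "Fuc": 167, "Man": 235}
selected_counts = {"Neu5Ac": 58, "Neu5Gc": 5, "Gal": 241, "GalNAc": 84,
                   "Glc": 26, "GlcNAc": 242, "Fuc": 24, "Man": 38}

def percent(count: int, total: int) -> float:
    """Share of substructures containing a residue, in percent."""
    return 100.0 * count / total

# Ratio > 1 means the residue is more common among selected substructures
# than among all substructures on the arrays.
enrichment = {
    residue: percent(selected_counts[residue], N_SELECTED) / percent(all_counts[residue], N_ALL)
    for residue in all_counts
}
enriched = sorted(r for r, ratio in enrichment.items() if ratio > 1.0)

for residue, ratio in enrichment.items():
    print(f"{residue:7s} selected {percent(selected_counts[residue], N_SELECTED):5.2f}%  "
          f"all {percent(all_counts[residue], N_ALL):5.2f}%  ratio {ratio:4.2f}")
print("enriched:", enriched)
```

At the trisaccharide level the residues with a ratio above 1 are the same four named in the text (Neu5Ac, Neu5Gc, Gal, and GlcNAc).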
Host-specic glycan substructures derived from association rule mining. To understand the
specic substructures associated with each inuenza A virus, we performed associate rule analyses across
211 inuenza virus–specic glycan microarray data. On the basis of their host origins, the 211 inuenza
A viruses were categorized into human (n = 154), canine (n = 6), swine (n = 21), waterfowl (n = 17),
terrestrial (i.e., chicken, quail, and turkeys, n= 13), and avian (waterfowl plus terrestrial birds, n = 30).
e association analysis results (summarized in Fig.2 and Table S9) illustrate the specic substructures
being associated with each of six host origins; these associations aid in our understanding of the key
substructures that determine inuenza host and tissue tropisms.
Neu5Acα2–8Neu5Acα2–8Neu5Ac and Neu5Gcα2–6Galβ1–4GlcNAc substructures contribute to the glycan binding with human-origin influenza A viruses. In addition to the reported α2,6-linked sialic acid glycan substructures (with a Neu5Acα2–6 terminal), which were detected multiple times (28 rules in Table S9) to be associated with human-origin influenza A viruses (Fig. 2), Neu5Acα2–8Neu5Acα2–8Neu5Ac (frequency = 0.0299, confidence = 1.00, lift = 1.44) and Neu5Gcα2–6Galβ1–4GlcNAc (frequency = 0.0479, confidence = 1.00, lift = 1.44) substructures were also found to be associated with human-origin influenza A viruses’ glycan binding (Fig. 2 and Table S9). In Fig. 3A,B, as case studies, two glycan microarray data entries of human-origin viruses demonstrate the significantly high binding affinities to these substructures separately. Specifically, virus A/OK/5342/2010 binds to glycan Neu5Acα2–8Neu5Acα2–8Neu5Ac and A/Memphis/4/73 to glycan Neu5Gcα2–6Galβ1–4GlcNAc. Although Neu5Gc has not been reported to be present in human respiratory tract tissues, human-origin viruses may have the binding ability to α2,6-linked Neu5Gc substructures on glycan microarrays. This observation also suggests that, in addition to Neu5Acα2–6 terminal, other sialic acids with either α2–6 or α2–8 linkages may be recognized by human-origin influenza A viruses.

Table 2. Distribution of saccharide residues in the glycan substructures from all glycans on the microarrays. Values are No. (%) of glycan substructures. Abbreviations: Fuc, fucose; Gal, galactose; GalNAc, N-acetylgalactosamine; Glc, glucose; GlcNAc, N-acetylglucosamine; Man, mannose; Neu5Ac, N-acetylneuraminic acid; Neu5Gc, N-glycolylneuraminic acid.
Residue   Monosaccharide (N = 249)   Disaccharide (N = 738)   Trisaccharide (N = 1,198)   Tetrasaccharide (N = 1,477)
Neu5Ac    12 (4.82)                  45 (6.10)                92 (7.68)                   162 (10.9)
Neu5Gc    4 (1.61)                   10 (1.36)                16 (1.34)                   18 (1.22)
Gal       44 (17.7)                  333 (45.1)               811 (67.7)                  1,064 (72.0)
GalNAc    35 (14.1)                  166 (22.5)               329 (27.5)                  400 (27.1)
Glc       31 (12.4)                  109 (14.8)               175 (14.6)                  201 (13.6)
GlcNAc    45 (18.1)                  364 (49.3)               816 (68.1)                  1,206 (81.7)
Fuc       6 (2.41)                   43 (5.83)                167 (13.9)                  310 (20.9)
Man       28 (11.2)                  89 (12.1)                235 (19.6)                  558 (37.8)

Table 3. Distribution of saccharide residues in the glycan substructures selected by using the quantitative structure–activity relationship. Values are No. (%) of glycan substructures. Abbreviations: Fuc, fucose; Gal, galactose; GalNAc, N-acetylgalactosamine; Glc, glucose; GlcNAc, N-acetylglucosamine; Man, mannose; Neu5Ac, N-acetylneuraminic acid; Neu5Gc, N-glycolylneuraminic acid.
Residue   Monosaccharide (N = 73 out of 249)   Disaccharide (N = 230 out of 738)   Trisaccharide (N = 322 out of 1,198)   Tetrasaccharide (N = 320 out of 1,477)
Neu5Ac    7 (9.59)                             26 (11.3)                           58 (18.0)                              102 (31.9)
Neu5Gc    1 (1.37)                             3 (1.30)                            5 (1.55)                               6 (1.88)
Gal       13 (17.8)                            107 (46.5)                          241 (74.8)                             264 (82.5)
GalNAc    10 (13.7)                            54 (23.5)                           84 (26.1)                              76 (23.8)
Glc       4 (5.48)                             13 (5.65)                           26 (8.07)                              32 (10.0)
GlcNAc    15 (20.5)                            146 (63.5)                          242 (75.2)                             268 (83.8)
Fuc       0 (0.00)                             7 (3.04)                            24 (7.45)                              41 (12.8)
Man       12 (16.4)                            27 (11.7)                           38 (11.8)                              79 (24.7)
Galβ and GlcNAcβ terminal substructures, in addition to sialic acids, are associated with the glycan binding of human-, swine-, and avian-origin influenza A viruses. The α2,3-linked and α2,6-linked sialic acid glycan substructures were identified as the predominant glycan binding motifs of all types of influenza A viruses (Fig. 2). However, it was interesting to observe that glycan substructures with Galβ and GlcNAcβ terminals were detected to be associated with human-, swine-, and avian-origin viruses. These two terminal saccharides are usually followed by β1,3-, β1,4-linked, and occasionally α1,3-, α1,4-linked Gal, GlcNAc, or GalNAc (e.g., Galβ1–4(Fucα1–3)GlcNAcβ1–3Gal with human-origin virus binding: frequency = 0.0359, confidence = 0.857, lift = 1.23; GlcNAcα1–4Galβ1–4GlcNAc with swine-origin virus binding: frequency = 0.012, confidence = 1.00, lift = 9.28; Galβ1–4GlcNAcβ1–3Gal with avian-origin virus binding: frequency = 0.0419, confidence = 1.00, lift = 5.96) (Fig. 2 and Table S9). Moreover, these Galβ and GlcNAcβ terminal substructures could result in influenza virus binding independently, since many glycans with either Galβ or GlcNAcβ terminals, out of the 936 unique glycans on microarrays, do not contain any sialic acid saccharides (Table S2). There are also individual microarray data entries showing that influenza viruses bind to Galβ or GlcNAcβ terminal glycans with no other branches at the same time. For instance, in Fig. 3C–E, human-origin virus A/Oklahoma 309/06, swine-origin virus A/Swine/New Jersey/1/76, and waterfowl-origin virus A/Duck/HK/149/1977 all have relatively high binding signals on their individual glycan microarrays. We therefore conclude that, in the absence of sialic acid residues, glycans having a Galβ or GlcNAcβ terminal could serve as potential receptors for influenza A virus.

Figure 2. Host-specific glycan substructures detected by association rule mining. Human-associated, mammal (swine and canine)-associated, and avian (waterfowl and terrestrial bird)-associated terminal substructures.

Figure 3. Case studies: individual glycan microarrays of influenza A viruses with interesting binding motifs. (A) Human-origin virus A/OK/5342/2010 binds glycan Neu5Acα2–8Neu5Acα2–8Neu5Ac. (B) Human-origin virus A/Memphis/4/73 binds glycan Neu5Gcα2–6Galβ1–4GlcNAc. (C) Human-origin virus A/Oklahoma 309/06 shows binding ability to glycan with Galβ1–4(Fucα1–3)GlcNAcβ1–3Gal substructure. (D) Swine-origin virus A/swine/New Jersey/1/76 shows binding ability to glycans with GlcNAcα1–4Galβ1–4GlcNAc substructure. (E) Waterfowl-origin virus A/Duck/HK/149/1977 shows binding ability to glycans with Galβ1–4GlcNAcβ1–3Gal substructure. (F) Human-origin virus A/Aichi/2/68 binds to sulfated glycan with α2,3-linked sialic acid terminals. (G) Swine-origin virus A/Swine/Iowa/1976 binds to sulfated glycan with α2,3-linked sialic acid terminals.
Sulfation causes Neu5Acα2–3 substructures to be associated with the glycan binding of human- and swine-origin influenza A viruses. As with human-origin influenza A viruses, not surprisingly, multiple substructures (19 rules in Table S9) with Neu5Acα2–6 were identified as being associated with the glycan binding of swine-origin influenza A viruses. In addition, the substructures with Neu5Acα2–3 were also associated with swine-origin influenza A viruses (Fig. 2). However, interestingly, the substructures with Neu5Acα2–3 terminals are usually sulfated on the following saccharide residues, when they were identified to be associated with either human- or swine-origin influenza A viruses. For example, Neu5Acα2–3(6OSO3)Galβ1–4GlcNAc was associated with human-origin (frequency = 0.0719, confidence = 0.80, lift = 1.51) and swine-origin (frequency = 0.0179, confidence = 1.00, lift = 9.28) viruses separately (Fig. 2 and Table S9). As shown in Fig. 3F,G, a human-origin virus A/Aichi/2/68 and a swine-origin A/Swine/Iowa/1976 both have their highest binding affinity to sulfated glycans with α2,3-linked sialic acid terminals (Neu5Acα2–3Galβ1–4(6OSO3)GlcNAc, and Neu5Acα2–3(6OSO3)Galβ1–4GlcNAc). This association rule linking sulfated Neu5Acα2–3 glycans and human- and swine-origin virus binding may support a unique role of sulfation during human and swine adaptation of avian-origin influenza A viruses.
Consensus among the influenza virus–specific glycan substructures. To identify common features among the substructures associated with different hosts, we compared their structural similarity by calculating, as described in Methods, the angle formed by the three residue mass centers of each trisaccharide substructure, RMSD2 and RMSD3, and the φ and ψ torsion angles of the linkages for the six representative glycan substructures (Fig. 4). In addition, superposition images of these glycan substructures are shown in Fig. 5.
3D structural characterization for glycan substructures with sialic acid terminals. As shown in Fig. 4A–D, we obtained four trisaccharide three-dimensional structures, of which one has an SA2,6Gal terminal, one has an SA2,8SA terminal, and two have SA2,3Gal terminals. Three observations were made from these substructures. First, the residue mass centers formed an acute angle (63.1°) for the SA2,6Gal substructure, an angle of 91.1° for the SA2,8SA substructure, and obtuse angles (142.9° and 132.2°) for both α2,3-linked substructures. This observation suggests that SA2,6Gal and SA2,3Gal substructures differ fundamentally in saccharide chain shape, which could lead to virus host tropism: human influenza viruses recognize glycans with SA2,6Gal terminals, canine and avian viruses specifically recognize glycans with SA2,3Gal terminals, and swine viruses can recognize and bind to both shapes. Moreover, the α2,8-linked polysialyl substructure, with its nearly right angle, has a turning shape more similar to that of SA2,6Gal, which may explain its binding by human-origin influenza viruses.
Second, the all-against-all RMSD values for these glycan substructures indicate that no two substructures with sialic acid terminals are similar on the basis of both RMSD2 and RMSD3 values, if we define similar saccharide structures as those with RMSD2 smaller than 3 Å and RMSD3 smaller than 5 Å (Table 4, Fig. 5). This finding shows that shape angles formed by residue mass centers are not the sole factor in glycan structural diversity.
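The similarity criterion can be checked directly against the values in Table 4. The sketch below transcribes the ten RMSD2/RMSD3 pairs from that table and applies the two thresholds; the helper names are hypothetical, and identifying a sialic acid terminal by the prefix of the substructure name is a simplification made for this example.

```python
# (substructure 1, substructure 2, RMSD2 in Å, RMSD3 in Å), transcribed from Table 4.
TABLE4 = [
    ("Neu5Acα2-6Galβ1-4GlcNAc", "Neu5Acα2-8Neu5Acα2-8Neu5Ac", 3.13, 5.58),
    ("Neu5Acα2-6Galβ1-4GlcNAc", "Neu5Acα2-3Galβ1-4GlcNAc", 3.32, 7.15),
    ("Neu5Acα2-6Galβ1-4GlcNAc", "Neu5Gcα2-3Galβ1-4GlcNAc", 2.86, 7.26),
    ("Neu5Acα2-6Galβ1-4GlcNAc", "Galβ1-4GlcNAcβ1-3Gal", 2.78, 8.97),
    ("Neu5Acα2-8Neu5Acα2-8Neu5Ac", "Neu5Acα2-3Galβ1-4GlcNAc", 5.08, 8.90),
    ("Neu5Acα2-8Neu5Acα2-8Neu5Ac", "Neu5Gcα2-3Galβ1-4GlcNAc", 3.15, 4.86),
    ("Neu5Acα2-8Neu5Acα2-8Neu5Ac", "Galβ1-4GlcNAcβ1-3Gal", 3.19, 7.83),
    ("Neu5Acα2-3Galβ1-4GlcNAc", "Neu5Gcα2-3Galβ1-4GlcNAc", 3.29, 8.01),
    ("Neu5Acα2-3Galβ1-4GlcNAc", "Galβ1-4GlcNAcβ1-3Gal", 2.99, 6.44),
    ("Neu5Gcα2-3Galβ1-4GlcNAc", "Galβ1-4GlcNAcβ1-3Gal", 1.94, 4.38),
]


def is_similar(rmsd2: float, rmsd3: float,
               max_rmsd2: float = 3.0, max_rmsd3: float = 5.0) -> bool:
    """Both thresholds must hold: RMSD2 < 3 Å and RMSD3 < 5 Å."""
    return rmsd2 < max_rmsd2 and rmsd3 < max_rmsd3


def has_sialic_terminal(name: str) -> bool:
    """Simplification: the terminal residue is the first one written in the name."""
    return name.startswith(("Neu5Ac", "Neu5Gc"))


similar_pairs = [(a, b) for a, b, r2, r3 in TABLE4 if is_similar(r2, r3)]
similar_sialic_pairs = [(a, b) for a, b in similar_pairs
                        if has_sialic_terminal(a) and has_sialic_terminal(b)]

print("similar pairs:", similar_pairs)
print("similar pairs with two sialic acid terminals:", similar_sialic_pairs)
```

With these thresholds, the only pair that qualifies is Neu5Gcα2–3Galβ1–4GlcNAc with Galβ1–4GlcNAcβ1–3Gal (1.94 Å and 4.38 Å), which involves a Gal terminal; no pair of sialic acid–terminated substructures meets both thresholds, in agreement with the statement above.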
The third observation involves the linkage torsion angles of these four representative host-specific trisaccharides (Table 5). Although most torsions of linkage 2 share similar values and hence do not contribute much to virus host types, both the φ and ψ angles of linkage 1 vary widely and indicate the shape-forming roles of the α2,6, α2,8 and α2,3 linkages with terminal sialic acids. In particular, the linkage 1 φ angle values of these four trisaccharides, combined with their shape angles formed by residue mass centers, could shed some light on the relationship between glycan geometric shapes and influenza virus host types. On one hand, when trisaccharides with SA2,6Gal or SA2,8SA have acute or nearly right shape angles (Fig. 4A,B), a positive φ angle (e.g., 71.32° for Neu5Acα2–6Gal or 55.05° for Neu5Acα2–8Neu5Ac in Table 5) might be necessary to make the glycan associated with the human host type; however, the association might not be unique because the positive φ angle might also result in an association with swine viruses. On the other hand, when trisaccharides with SA2,3Gal have obtuse shape angles (Fig. 4C,D), different terminal residues (i.e., Neu5Ac and Neu5Gc) form φ angles with different values (e.g., 59.47° for Neu5Acα2–3Gal and 50.95° for Neu5Gcα2–3Gal in Table 5). This observation illustrates that an obtuse angle of α2,3-linked trisaccharides is sufficient, but not necessary, for a glycan to associate with non-human-origin viruses and that a positive φ torsion angle at linkage 1 may make the trisaccharides associated with canine- and avian-origin viruses only. Furthermore, all four trisaccharides, except Neu5Acα2–8Neu5Acα2–8Neu5Ac, have linkage 1 ψ angles of similar negative values and therefore do not show a clear relationship with virus host types.
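For readers unfamiliar with torsion angles, a φ or ψ value is the signed dihedral angle defined by four consecutive atoms across the glycosidic linkage. The sketch below shows a generic dihedral calculation from four Cartesian coordinates; which four atoms define φ and ψ for each linkage follows the paper's Methods and is not reproduced here. The function name and test coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees, in (-180, 180], about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates giving a +90° torsion.
print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Because the result is signed, two linkages with the same magnitude but opposite sign describe mirror-image twists, which is why the sign of φ is discussed separately from its size.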
In summary, the structural characteristics of glycan trisaccharides with sialic acid terminals might be associated with influenza virus host tropisms. For example, the shape angle formed by residue mass centers together with the linkage 1 φ torsion angle, rather than torsion angles alone, might suggest certain glycan structural patterns associated with influenza virus host tropism.
3D structural characterization for glycan substructures with a Gal terminal. For glycan trisaccharide substructures with a Gal terminal, we found two additional representative three-dimensional structures of glycans that had the same terminal residues and linkages but were associated with viruses of different host origins: Galβ1–4GlcNAcβ1–3Gal and its fucosylated counterpart Galβ1–4(Fucα1–3)GlcNAcβ1–3Gal (Fig. 4E,F). The Galβ1–4GlcNAcβ1–3Gal backbone of both substructures formed similar obtuse angles (156.9° and 146.1°), while the additional fucose residue formed a 54.9° angle with the Galβ1–4GlcNAcβ1 terminal. This additional turning angle introduced by fucosylation may lead to human-origin virus binding (Fig. 4F). Next, we measured RMSDs between the Galβ1–4GlcNAcβ1–3Gal substructure and the ones with sialic acid terminals (Table 4 and Fig. 5C,D). Although Galβ1–4GlcNAcβ1–3Gal has similar RMSD2 values with Neu5Acα2–6Galβ1–4GlcNAc (2.78 Å) and with Neu5Acα2–3Galβ1–4GlcNAc (2.99 Å), its smaller RMSD3 value with Neu5Acα2–3Galβ1–4GlcNAc (6.44 Å, versus 8.97 Å with Neu5Acα2–6Galβ1–4GlcNAc) indicates greater structural similarity to the SA2,3Gal motif. Thus, this finding might suggest that the glycan substructure Galβ1–4GlcNAcβ1–3Gal does not have to maintain a sialic acid terminal to share a structural similarity with the avian-virus-binding motif (SA2,3Gal). Last, the torsion angles of both linkages of Galβ1–4GlcNAcβ1–3Gal have unique values compared with those of sialic acid substructures and therefore do not support any relationship with binding by particular viruses.
Figure 4. Three-dimensional structures of host-specific glycan substructures. (A–D) Representative structures of host-specific trisaccharide substructures with sialic acid terminals. The shape angles were calculated from the mass centers of three residues. (E–F) Structures of trisaccharide substructures with a Gal (non-sialic acid) terminal saccharide.
Figure 5. Superposition of terminal saccharide residues and the root-mean-square deviations (RMSDs) between the second (RMSD2) and the third (RMSD3) residues.
Structural conservation of receptor binding pocket in influenza A viruses. In Fig. 6A,B, we show superposed HA receptor binding pockets of different influenza viruses (human-origin with swine-origin viruses and canine-origin with avian-origin viruses) interacting with 6SLN and 3SLN (analogous to glycan substructures). Human- and swine-origin HAs recognize glycans with α2,6-linked sialic acid terminals, and they share a highly conserved receptor binding pocket, which differs at only four amino acid residues (133A, 225, 227, and 189) between the viruses of the two host types (Fig. 6A). Similarly, canine- and avian-origin HAs recognize glycans with α2,3-linked sialic acid terminals, and they also have a conserved receptor binding pocket, but with more diverse residues (133A is deleted in canine HA, and residues differ at amino acids 135, 137, 221, 222, 224, 188, and 189) (Fig. 6B).
In Fig.6C,D, we docked Neu5Acα 2–8Neu5Ac and Galβ 1–4GlcNAcβ 1–3Gal to the receptor binding
pocket of the human-origin HA (PDB 3LZG) and the avian-origin HA (PDB 2FK0) separately by using
a HA-glycan structural complex as the template (see Methods). Previous association results suggested
Glycan substructure names RMSD 2
value, Å RMSD 3
value, ÅSubstructure 1 Substructure 2
Neu5Acα 2-6Galβ 1-4GlcNAc Neu5Acα 2-8 Neu5Acα 2-8Neu5Ac 3.13 5.58
Neu5Acα 2-6Galβ 1-4GlcNAc Neu5Acα 2-3Galβ 1-4GlcNAc 3.32 7.15
Neu5Acα 2-6Galβ 1-4GlcNAc Neu5Gcα 2-3Galβ 1-4GlcNAc 2.86 7.26
Neu5Acα 2-6Galβ 1-4GlcNAc Galβ 1-4GlcNAcβ 1-3Gal 2.78 8.97
Neu5Acα 2-8 Neu5Acα 2-8Neu5Ac Neu5Acα 2-3Galβ 1-4GlcNAc 5.08 8.90
Neu5Acα 2-8 Neu5Acα 2-8Neu5Ac Neu5Gcα 2-3Galβ 1-4GlcNAc 3.15 4.86
Neu5Acα 2-8 Neu5Acα 2-8Neu5Ac Galβ 1-4GlcNAcβ 1-3Gal 3.19 7.83
Neu5Acα 2-3Galβ 1-4GlcNAc Neu5Gcα 2-3Galβ 1-4GlcNAc 3.29 8.01
Neu5Acα 2-3Galβ 1-4GlcNAc Galβ 1-4GlcNAcβ 1-3Gal 2.99 6.44
Neu5Gcα 2-3Galβ 1-4GlcNAc Galβ 1-4GlcNAcβ 1-3Gal 1.94 4.38
Table 4. All-against-all RMSD values of representative inuenza A host-specic glycan trisaccharide
substructures. Abbreviations: RMSD, root-mean-square deviation; RMSD2, the RMSD between the two six-
membered rings of the saccharides linked to the terminal saccharide; RMSD3, the RMSD between them of
the third pair of saccharides.
Glycan substructure name
Linkage 1 Linkage 2
φψφψ
Neu5Acα 2-6Galβ 1-4GlcNAc 71.32 151.25 94.84 82.75
Neu5Acα 2–8 Neu5Acα 2–8Neu5Ac 55.05 112.07 57.35 121.40
Neu5Acα 2–3Galβ 1–4GlcNAc 59.47 126.54 81.61 124.23
Neu5Gcα 2–3Galβ 1–4GlcNAc 50.95 134.56 67.73 105.46
Galβ 1–4GlcNAcβ 1–3Gal 96.61 95.29 46.87 139.85
Table 5. Torsion angles (φ and ψ) of linkage 1 and linkage 2 of representative inuenza A host-specic
glycan substructures.
www.nature.com/scientificreports/
12
SCIENTIFIC RepoRts | 5:15778 | DOI: 10.1038/srep15778
a relationship of Neu5Acα 2–8Neu5Ac and Galβ 1–4GlcNAcβ 1–3Gal with the binding for inuenza A
viruses; thus, their comparable binding poses are expected to occur at the virus HA binding pockets
(Fig.6C,D).
Discussion
The objective of this study was to characterize the host-specific glycan substructures associated with influenza A virus infections. Glycan microarray data provide an opportunity to systematically study the factors that determine virus–glycan binding. However, such analyses have several limitations. The first limitation is that glycan microarray data are not quantitative because values vary greatly from batch to batch. This variability arises because spot intensities depend on immobilization efficiency, which makes it misleading to use fluorescence intensities to quantify binding affinities51. The second limitation is that the glycans on a microarray do not represent all glycans or all substructures in the natural hosts, and they are also distributed differently from those in nature. The last limitation is that the numbers of datasets for influenza A viruses of different host origins are not equal. For example, we have 155 datasets for human-origin influenza A viruses but only 7 for canine-origin influenza A viruses.
In this study, we expected association analysis to detect significantly nonrandom, but possibly infrequent, substructure features contributing to influenza A virus binding. To ensure better coverage of all potential substructures, hierarchical clusters (mono-, di-, tri-, and tetrasaccharide) of substructure profiles were characterized and integrated into data mining, and our analyses focused on the terminal structures. To minimize the potential noise across different datasets due to variations in glycan microarray versions and experiments, we integrated the significant substructures extracted from each individual dataset by PLS regression. To identify the host-associated glycan substructures, we categorized 211 data entries into five categories (human, swine, canine, waterfowl, and terrestrial birds) and then formulated the glycan substructure problem as a typical association mining problem, in which we treated glycan substructure features as products, virus host types as the only label of customers, and the glycan–virus binding signals in the dataset as transactions. Compared with other methods, whether statistical or mining strategies, our formulation of the problem benefited the novel observations in this study in the following two ways. First, after the PLS regressions on individual glycan microarray entries, the binding transaction definition was used to integrate all of them for a cross-array analysis, by which we overcame the challenges arising from the varying numbers of glycans on different versions of arrays. Second, the association mining strategy avoided committing to a particular hypothesis before the analyses and was able to detect rare but potentially significant rules.
Figure 6. Three-dimensional structures of different hemagglutinin (HA) receptor binding pockets interacting with various glycan substructures. (A) Human-origin (Protein Data Bank [PDB] entry 3LZG) and swine-origin (PDB 1RVT) HA (recognizing α2,6 sialic acid) superposed and bound to 6SLN (analogous to Neu5Acα2–6Galβ1–4GlcNAc). (B) Canine-origin (PDB 4UO5) and avian-origin (PDB 2FK0) HA (recognizing α2,3 sialic acid) superposed and bound to 3SLN (analogous to Neu5Acα2–3Galβ1–4GlcNAc). (C) A predicted docked structure of Neu5Acα2–8Neu5Acα2–8Neu5Ac interacting with the HA receptor binding pocket (human-origin, PDB 3LZG). (D) A predicted docked structure of Galβ1–4GlcNAcβ1–3Gal interacting with the HA receptor binding pocket (avian-origin, PDB 2FK0).
We have not been able to use this method to identify the specific substructures responsible for glycan binding when multiple terminal glycans are present. For example, glycans with different terminals (e.g., sialic acid and Gal) were observed frequently, and both terminals may be important players during influenza virus binding because they could bind influenza viruses simultaneously. To avoid this problem, in this study, we ignored the associated substructures with branch linkages, because they may be extracted from a glycan with other terminals and by themselves may not contribute to virus binding; that is, to avoid such false positives, we included in the results only terminal substructures without branches. Moreover, the four substructure definitions (mono-, di-, tri-, and tetrasaccharide) could lead to overlapping glycan features that were associated with the same virus host. For example, in Fig. 2, swine-associated disaccharides are all subsets of the corresponding trisaccharides, which are subsets of the corresponding tetrasaccharides. Similar patterns could be observed with other host-origin categories (Table S9). To be consistent, we interpreted these overlapping rules by ignoring subset features and by keeping only the substructures with the highest number of saccharide residues (see Supplementary Methods).
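The subset-pruning step can be sketched as follows. This is only an illustration of the idea, not the procedure from the Supplementary Methods: it assumes that each unbranched terminal substructure is written as a sequence of alternating residues and linkages starting from the terminal residue, so that a shorter substructure is a "subset" of a longer one when it is a prefix of it. The function name `keep_maximal` and the example substructures are hypothetical.

```python
from typing import Iterable, List, Tuple

Substructure = Tuple[str, ...]


def keep_maximal(substructures: Iterable[Substructure]) -> List[Substructure]:
    """Drop every substructure that is a proper prefix of a longer one.

    The result is sorted with the longest substructures first.
    """
    unique = set(substructures)
    maximal = [
        s for s in unique
        if not any(len(o) > len(s) and o[:len(s)] == s for o in unique)
    ]
    return sorted(maximal, key=lambda s: (-len(s), s))


# Hypothetical host-associated substructures, terminal residue first.
mono = ("Neu5Ac",)
di = ("Neu5Ac", "α2-6", "Gal")
tri = ("Neu5Ac", "α2-6", "Gal", "β1-4", "GlcNAc")
gal_di = ("Gal", "β1-4", "GlcNAc")

for s in keep_maximal([mono, di, tri, gal_di]):
    print("".join(s))
```

In this toy example the mono- and disaccharide are absorbed by the trisaccharide, whereas the Gal-terminated disaccharide is kept because it starts from a different terminal residue.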
Our results show that (1) human-origin influenza A viruses could bind glycans with Neu5Acα2–8Neu5Acα2–8Neu5Ac and Neu5Gcα2–6Galβ1–4GlcNAc substructures; (2) Galβ and GlcNAcβ terminal substructures, without any sialic acid terminals, are associated with the glycan binding of human-, swine-, and avian-origin influenza A viruses; and (3) sulfated Neu5Acα2–3 substructures appear to be associated with the glycan binding of human- and swine-origin influenza A viruses. These observations, on one hand, are consistent with previously reported results about various types of host-origin influenza A viruses5. On the other hand, we also identified other substructures that contribute to different virus bindings: α2,6-linked Neu5Gc substructures, α2,8-linked multiple sialic acids, substructures with Gal and GlcNAc terminals, and sulfated α2,3-linked Neu5Ac. These newly discovered influenza A binding moieties, particularly those with non-sialic acid saccharides (Gal, GlcNAc), may suggest that it is the structural pattern of acidic acids, instead of just Neu5Ac and Neu5Gc themselves, that is recognized by influenza viruses of various host origins.
Potential glycan receptors with α2,8-linked sialic acid were reported to be associated with influenza virus binding22, which supports our results with Neu5Acα2–8Neu5Acα2–8Neu5Ac for human influenza viruses. The relatively low 3D structural similarity between this substructure and human-like α2,6-linked sialic acid substructures (Table 4) could imply a potentially novel binding mode for Neu5Acα2–8Neu5Acα2–8Neu5Ac (Fig. 6C). Similarly, it has been reported that glycans with Gal terminals could play a role in receptor binding by some viruses52,53. Our association results refine this conclusion, especially for the Galβ1–4GlcNAcβ1–3Gal substructure, by showing that it shares structural characteristics with substructures with sialic acid. Concerning the associations detected for sulfated α2,3-linked Neu5Ac, it was reported that sulfated glycan motifs might increase influenza virus binding34. Our results further suggest that this sulfation may lead to SA2,3Gal binding by an SA2,6Gal-binding virus. It is worth mentioning that all these substructure motifs were infrequent substructures in the association rules (Table S9), indicating the effectiveness of association mining in this study.
Our three-dimensional structure analysis of representative host-specific substructures showed that, for trisaccharides, the shape angle formed by the mass centers of three residues could be the key feature that distinguishes α2,6-linked, α2,8-linked and α2,3-linked glycans and their virus host tropisms (Fig. 4A–D). Although recent studies argued that the different torsion angles of residue linkages could be the reason for their diverse chain shapes29,44, our torsion angle values calculated from the three-dimensional structures did not support a role for torsion angles in forming the overall trisaccharide chain shapes. Hence, we argue that significant host-specific patterns related to glycan shape may become evident if shape angles are measured instead of flexible torsion angles. In addition, for trisaccharides without sialic acid terminals (e.g., Galβ1–4GlcNAcβ1–3Gal), neither torsion angles nor RMSD values suggested any host-specific patterns in our results. Because we found only a few such glycans associated with influenza viruses, we considered them only as individual cases of virus binding without an identifiable structural feature for host tropism.
References
1. Webster, R. G., Bean, W. J., Gorman, O. T., Chambers, T. M. & Kawaoka, Y. Evolution and ecology of influenza A viruses. Microbiol. Rev. 56, 152–179 (1992).
2. Skehel, J. J. & Wiley, D. C. Receptor binding and membrane fusion in virus entry: the influenza hemagglutinin. Annu. Rev. Biochem. 69, 531–569, doi: 10.1146/annurev.biochem.69.1.531 (2000).
3. Tong, S. et al. A distinct lineage of influenza A virus from bats. Proc. Natl. Acad. Sci. USA 109, 4269–4274, doi: 10.1073/pnas.1116200109 (2012).
4. Tong, S. et al. New world bats harbor diverse influenza A viruses. PLoS Pathog. 9, e1003657, doi: 10.1371/journal.ppat.1003657 (2013).
5. de Graaf, M. & Fouchier, R. A. Role of receptor binding specificity in influenza A virus transmission and pathogenesis. EMBO J. 33, 823–841, doi: 10.1002/embj.201387442 (2014).
6. Nicholls, J. M., Bourne, A. J., Chen, H., Guan, Y. & Peiris, J. S. Sialic acid receptor detection in the human respiratory tract: evidence for widespread distribution of potential binding sites for human and avian influenza viruses. Respir. Res. 8, 73, doi: 10.1186/1465-9921-8-73 (2007).
7. Walther, T. et al. Glycomic analysis of human respiratory tract tissues and correlation with influenza virus infection. PLoS Pathog. 9, e1003223, doi: 10.1371/journal.ppat.1003223 (2013).
8. Franca, M., Stallknecht, D. E. & Howerth, E. W. Expression and distribution of sialic acid influenza virus receptors in wild birds. Avian Pathol. 42, 60–71, doi: 10.1080/03079457.2012.759176 (2013).
9. Costa, T. et al. Distribution patterns of influenza virus receptors and viral attachment patterns in the respiratory and intestinal tracts of seven avian species. Vet. Res. 43, 28, doi: 10.1186/1297-9716-43-28 (2012).
10. Nelli, R. K. et al. Comparative distribution of human and avian type sialic acid influenza receptors in the pig. BMC Vet. Res. 6, 4, doi: 10.1186/1746-6148-6-4 (2010).
11. Trebbien, R., Larsen, L. E. & Viuff, B. M. Distribution of sialic acid receptors and influenza A virus of avian and swine origin in experimentally infected pigs. Virol. J. 8, 434, doi: 10.1186/1743-422X-8-434 (2011).
12. Scholtissek, C. Pigs as ‘mixing vessels’ for the creation of new pandemic influenza A viruses. Med. Prin. Pract. 2, 65–71 (1990).
13. Ito, T. et al. Molecular basis for the generation in pigs of influenza A viruses with pandemic potential. J. Virol. 72, 7367–7373 (1998).
14. Van Poucke, S. G., Nicholls, J. M., Nauwynck, H. J. & Van Reeth, K. Replication of avian, human and swine influenza viruses in porcine respiratory explants and association with sialic acid distribution. Virol. J. 7, 38, doi: 10.1186/1743-422X-7-38 (2010).
15. Blixt, O. et al. Printed covalent glycan array for ligand profiling of diverse glycan binding proteins. Proc. Natl. Acad. Sci. USA 101, 17033–17038, doi: 10.1073/pnas.0407902101 (2004).
16. Alvarez, R. A. & Blixt, O. Identification of ligand specificities for glycan-binding proteins using glycan arrays. Methods Enzymol. 415, 292–310, doi: 10.1016/S0076-6879(06)15018-1 (2006).
17. Stevens, J., Blixt, O., Paulson, J. C. & Wilson, I. A. Glycan microarray technologies: tools to survey host specificity of influenza viruses. Nat. Rev. Microbiol. 4, 857–864, doi: 10.1038/nrmicro1530 (2006).
18. Raman, R. et al. Advancing glycomics: implementation strategies at the consortium for functional glycomics. Glycobiology 16, 82–90, doi: 10.1093/glycob/cwj080 (2006).
19. Stevens, J. et al. Glycan microarray analysis of the hemagglutinins from modern and pandemic influenza viruses reveals different receptor specificities. J. Mol. Biol. 355, 1143–1155, doi: 10.1016/j.jmb.2005.11.002 (2006).
20. Stevens, J. et al. Structure and receptor specificity of the hemagglutinin from an H5N1 influenza virus. Science 312, 404–410, doi: 10.1126/science.1124513 (2006).
21. Kumari, K. et al. Receptor binding specificity of recent human H3N2 influenza viruses. Virol. J. 4, 42, doi: 10.1186/1743-422X-4-42 (2007).
22. Childs, R. A. et al. Receptor-binding specificity of pandemic influenza A (H1N1) 2009 virus determined by carbohydrate microarray. Nat. Biotechnol. 27, 797–799, doi: 10.1038/nbt0909-797 (2009).
23. Nobusawa, E., Ishihara, H., Morishita, T., Sato, K. & Nakajima, K. Change in receptor-binding specificity of recent human influenza A viruses (H3N2): a single amino acid change in hemagglutinin altered its recognition of sialyloligosaccharides. Virology 278, 587–596, doi: 10.1006/viro.2000.0679 (2000).
24. Liu, Y. et al. Altered receptor specificity and cell tropism of D222G hemagglutinin mutants isolated from fatal cases of pandemic A(H1N1) 2009 influenza virus. J. Virol. 84, 12069–12074, doi: 10.1128/JVI.01639-10 (2010).
25. Yang, Z. Y. et al. Immunization by avian H5 influenza hemagglutinin mutants with altered receptor binding specificity. Science 317, 825–828, doi: 10.1126/science.1135165 (2007).
26. Belser, J. A. et al. Effect of D222G mutation in the hemagglutinin protein on receptor binding, pathogenesis and transmissibility of the 2009 pandemic H1N1 influenza virus. PLoS One 6, e25091, doi: 10.1371/journal.pone.0025091 (2011).
27. Puzelli, S. et al. Transmission of hemagglutinin D222G mutant strain of pandemic (H1N1) 2009 virus. Emerg. Infect. Dis. 16, 863–865, doi: 10.3201/eid1605.091815 (2010).
28. Yang, G. et al. Mutation tryptophan to leucine at position 222 of haemagglutinin could facilitate H3N2 influenza A virus infection in dogs. J. Gen. Virol. 94, 2599–2608 (2013).
29. Chandrasekaran, A. et al. Glycan topology determines human adaptation of avian H5N1 virus hemagglutinin. Nat. Biotechnol. 26, 107–113, doi: 10.1038/nbt1375 (2008).
30. Stevens, J. et al. Receptor specificity of influenza A H3N2 viruses isolated in mammalian cells and embryonated chicken eggs. J. Virol. 84, 8287–8299, doi: 10.1128/JVI.00058-10 (2010).
31. Cholleti, S. R. et al. Automated motif discovery from glycan array data. OMICS 16, 497–512, doi: 10.1089/omi.2012.0013 (2012).
32. Porter, A. et al. A motif-based analysis of glycan array data to determine the specificities of glycan-binding proteins. Glycobiology 20, 369–380, doi: 10.1093/glycob/cwp187 (2010).
33. Xuan, P., Zhang, Y., Tzeng, T. R., Wan, X. F. & Luo, F. A quantitative structure-activity relationship (QSAR) study on glycan array data to determine the specificities of glycan-binding proteins. Glycobiology 22, 552–560, doi: 10.1093/glycob/cwr163 (2012).
34. Ichimiya, T., Nishihara, S., Takase-Yoden, S., Kida, H. & Aoki-Kinoshita, K. Frequent glycan structure mining of influenza virus data revealed a sulfated glycan motif that increased viral infection. Bioinformatics 30, 706–711, doi: 10.1093/bioinformatics/btt573 (2014).
35. Hand, D. J., Mannila, H. & Smyth, P. In Principles of data mining. Ch. 13, 254–267 (MIT press, 2001).
36. Geng, L. & Hamilton, H. J. Interestingness measures for data mining: A survey. ACM Comput. Surv. (CSUR) 38, 9 (2006).
37. Borgelt, C. & Kruse, R. Induction of association rules: Apriori implementation. In Proceedings of the 15th conference on Compstat. 395–400 (Springer, 2002).
38. Berman, H. M. et al. The protein data bank. Nucleic Acids Res. 28, 235–242 (2000).
39. Xu, R., McBride, R., Nycholat, C. M., Paulson, J. C. & Wilson, I. A. Structural characterization of the hemagglutinin receptor specificity from the 2009 H1N1 influenza pandemic. J. Virol. 86, 982–990, doi: 10.1128/JVI.06322-11 (2012).
40. Chen, C., Fu, Z., Kim, J. J., Barbieri, J. T. & Baldwin, M. R. Gangliosides as high affinity receptors for tetanus neurotoxin. J. Biol. Chem. 284, 26569–26577, doi: 10.1074/jbc.M109.027391 (2009).
41. Khan, Z. M. et al. Crystallographic and glycan microarray analysis of human polyomavirus 9 VP1 identifies N-glycolyl neuraminic acid as a receptor candidate. J. Virol. 88, 6100–6111, doi: 10.1128/JVI.03455-13 (2014).
42. Holmner, A. et al. Crystal structures exploring the origins of the broader specificity of escherichia coli heat-labile enterotoxin compared to cholera toxin. J. Mol. Biol. 406, 387–402, doi: 10.1016/j.jmb.2010.11.060 (2011).
43. Guo, Y. et al. Structural basis for distinct ligand-binding and targeting properties of the receptors DC-SIGN and DC-SIGNR. Nat. Struct. Mol. Biol. 11, 591–598, doi: 10.1038/nsmb784 (2004).
44. Kamiya, Y., Yagi-Utsumi, M., Yagi, H. & Kato, K. Structural and molecular basis of carbohydrate-protein interaction systems as potential therapeutic targets. Curr. Pharm. Des. 17, 1672–1684 (2011).
45. Xu, R. et al. Structural basis of preexisting immunity to the 2009 H1N1 pandemic influenza virus. Science 328, 357–360, doi: 10.1126/science.1186430 (2010).
46. Gamblin, S. J. et al. The structure and receptor binding properties of the 1918 influenza hemagglutinin. Science 303, 1838–1842, doi: 10.1126/science.1093155 (2004).
47. Collins, P. J. et al. Recent evolution of equine influenza and the origin of canine influenza. Proc. Natl. Acad. Sci. USA 111, 11175–11180, doi: 10.1073/pnas.1406606111 (2014).
48. Ponder, J. W. & Case, D. A. Force fields for protein simulations. Adv. Protein Chem. 66, 27–85 (2003).
49. Pronk, S. et al. GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 29, 845–854, doi: 10.1093/bioinformatics/btt055 (2013).
50. Schymkowitz, J. et al. The FoldX web server: an online force field. Nucleic Acids Res. 33, W382–388, doi: 10.1093/nar/gki387 (2005).
51. Liang, P. H., Wu, C. Y., Greenberg, W. A. & Wong, C. H. Glycan arrays: biological and medical applications. Curr. Opin. Chem. Biol. 12, 86–92, doi: 10.1016/j.cbpa.2008.01.031 (2008).
52. Shen, S., Bryant, K. D., Brown, S. M., Randell, S. H. & Asokan, A. Terminal N-linked galactose is the primary receptor for adeno-associated virus 9. J. Biol. Chem. 286, 13532–13540 (2011).
53. Upham, J. P., Pickett, D., Irimura, T., Anders, E. M. & Reading, P. C. Macrophage receptors for influenza A virus: role of the macrophage galactose-type lectin and mannose receptor in viral entry. J. Virol. 84, 3730–3737, doi: 10.1128/JVI.02148-09 (2010).
Acknowledgements
We thank Drs. Robert Woods and Chi-Ren Shyu for critical discussions. This study was supported by grants 1R15AI107702 and P20GM103646 from the National Institutes of Health.
Author Contributions
N.Z. and X.F.W. conceived the experiment design. N.Z. conducted the experiments. N.Z. and B.E.M.
performed the data collection. F.L. implemented the feature extraction program. N.Z. and C.K.Y. analyzed
the results. N.Z., C.K.Y. and X.F.W. wrote the manuscript.
Additional Information
Supplementary information accompanies this paper at http://www.nature.com/srep
Competing financial interests: The authors declare no competing financial interests.
How to cite this article: Zhao, N. et al. Association analyses of large-scale glycan microarray data reveal novel host-specific substructures in influenza A virus binding glycans. Sci. Rep. 5, 15778; doi: 10.1038/srep15778 (2015).
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
... 111,112 Glycan microarray profiling of influenza strains has provided a new understanding of the specificity of hemagglutinins, especially the specificity of binding toward the internal glycan beyond the sialyl galactose linkage. 113,114 Different virus families and their preference for carbohydrate structures as receptors for entry are presented in Table 2. Most of the members of these virus families bind to glycoepitopes containing terminal sialic acids of sulfated glycan motifs of proteoglycan chains. 115−119 Protein−glycan interactions also occur during the infection of many other pathogens. ...
Article
Glycoconjugates are major constituents of mammalian cells that are formed via covalent conjugation of carbohydrates to other biomolecules like proteins and lipids and often expressed on the cell surfaces. Among the three major classes of glycoconjugates, proteoglycans and glycoproteins contain glycans linked to the protein backbone via amino acid residues such as Asn for N-linked glycans and Ser/Thr for O-linked glycans. In glycolipids, glycans are linked to a lipid component such as glycerol, polyisoprenyl pyrophosphate, fatty acid ester, or sphingolipid. Recently, glycoconjugates have become better structurally defined and biosynthetically understood, especially those associated with human diseases, and are accessible to new drug, diagnostic, and therapeutic developments. This review describes the status and new advances in the biological study and therapeutic applications of natural and synthetic glycoconjugates, including proteoglycans, glycoproteins, and glycolipids. The scope, limitations, and novel methodologies in the synthesis and clinical development of glycoconjugates including vaccines, glyco-remodeled antibodies, glycan-based adjuvants, glycan-specific receptor-mediated drug delivery platforms, etc., and their future prospectus are discussed.
... Only a few studies have been conducted in which Neu5Gc, a common form of Sia, acts as an influenza virus receptor [61][62][63][64][65][66], while most studies have focused on Neu5Ac. Glycan microarray analyses suggested that avian, swine, canine, and equine IAVs have the ability to bind both Neu5Ac and Neu5Gc, whereas human IAVs typically do not bind Neu5Gc [67]. In human respiratory tracts, the epithelial cells do not express Neu5Gc, whereas those in swine, equine, and canine respiratory tracts do [61,65,68]. ...
Article
Full-text available
In humans and other mammals, the respiratory tract is represented by a complex network of polarized epithelial cells, forming an apical surface facing the external environment and a basal surface attached to the basement layer. These cells are characterized by differential expression of proteins and glycans, which serve as receptors during influenza virus infection. Attachment between these host receptors and the viral surface glycoprotein hemagglutinin (HA) initiates the influenza virus life cycle. However, the virus receptor binding specificities may not be static. Sialylated N-glycans are the most well-characterized receptors but are not essential for the entry of influenza viruses, and other molecules, such as O-glycans and non-sialylated glycans, may be involved in virus-cell attachment. Furthermore, correct cell polarity and directional trafficking of molecules are essential for the orderly development of the system and affect successful influenza infection; on the other hand, influenza infection can also change cell polarity. Here we review recent advances in our understanding of influenza virus infection in the respiratory tract of humans and other mammals, particularly the attachment between the virus and the surface of the polar cells and the polarity variation of these cells due to virus infection.
... For example, the analysis of stool and serum samples from Clostridium difficile patients on microarrays containing oligosaccharide epitopes of the C. difficile cell wall polysaccharides PS-I and PS-II confirmed novel vaccine candidates [213]. Glycan arrays are powerful resources when assessing bacteria or viruses binding to host carbohydrate structures, such as interactions of the influenza virus to host receptors [214][215][216][217]. Further, arrays consisting of immobilized antibodies or lectins have been developed to analyze carbohydrate-binding properties [218]. ...
Article
Carbohydrate-specific antibodies are widespread among all classes of immunoglobulins. Despite their broad occurrence, little is known about their formation and biological significance. Carbohydrate-specific antibodies are often classified as natural antibodies under the assumption that they arise without prior exposure to exogenous antigens. On the other hand, various carbohydrate-specific antibodies, including antibodies to ABO blood group antigens, emerge after the contact of immune cells with the intestinal microbiota, which expresses a vast diversity of carbohydrate antigens. Here we explore the development of carbohydrate-specific antibodies in humans, addressing the definition of natural antibodies and the production of carbohydrate-specific antibodies upon antigen stimulation. We focus on the significance of the intestinal microbiota in shaping carbohydrate-specific antibodies not just in the gut, but also in the blood circulation. The structural similarity between bacterial carbohydrate antigens and surface glycoconjugates of protists, fungi and animals leads to the production of carbohydrate-specific antibodies protective against a broad range of pathogens. Mimicry between bacterial and human glycoconjugates, however, can also lead to the generation of carbohydrate-specific antibodies that cross-react with human antigens, thereby contributing to the development of autoimmune disorders.
... They further reported another strategy to elucidate higher-level lectin binding preference by treating glycan microarray with exoglycosidases before binding assays (Klamer et al. 2017). Other valuable software/algorithms have also been developed to mine refined glycan substructures that influence protein-glycan interactions (Xuan et al. 2012; Kletter et al. 2013; Agravat et al. 2014; Zhao et al. 2015; Grant et al. 2016). All these algorithms are based on systematic analyses of existing microarray result datasets, which should be done with caution concerning diversity on array platforms, conjugation or immobilization strategies and processing protocols (Padler-Karavani et al. 2012; Wang et al. 2014). ...
Article
Glycans mediate a wide variety of biological roles via recognition by glycan-binding proteins (GBPs). Comprehensive knowledge of such interaction is thus fundamental to glycobiology. While the primary binding feature of GBPs can be easily uncovered by using a simple glycan microarray harboring limited numbers of glycan motifs, their fine specificities are harder to interpret. In this study, we prepared 98 closely related N-glycoforms that contain 5 common glycan epitopes which allowed the determination of the fine binding specificities of several plant lectins and anti-glycan antibodies. These N-glycoforms differ from each other at the monosaccharide level and were presented in an identical format to ensure comparability. With the analysis platform we used, it was found that most tested GBPs have preferences toward only one branch of the complex N-glycans, and their binding toward the epitope-presenting branch can be significantly affected by structures on the other branch. Fine specificities described here are valuable for a comprehensive understanding and applications of GBPs.
Article
Through their specific interactions with proteins, cellular glycans play key roles in a wide range of physiological and pathological processes. One of the main goals of research in the areas of glycobiology and glycomedicine is to understand glycan-protein interactions at the molecular level. Over the past two decades, glycan microarrays have become powerful tools for the rapid evaluation of interactions between glycans and proteins. In this review, we briefly describe methods used for the preparation of glycan probes and the construction of glycan microarrays. Next, we highlight applications of glycan microarrays to rapid profiling of glycan-binding patterns of plant, animal and pathogenic lectins, as well as other proteins. Finally, we discuss other important uses of glycan microarrays, including the rapid analysis of substrate specificities of carbohydrate-active enzymes, the quantitative determination of glycan-protein interactions, discovering high-affinity or selective ligands for lectins, and identifying functional glycans within cells. We anticipate that this review will encourage researchers to employ glycan microarrays in diverse glycan-related studies.
Article
N-glycans on the cell surface provide distinct signatures that are recognized by different glycan-binding proteins (GBPs) and pathogens. Most glycans in humans are asymmetric and isomeric, yet their biological functions are not well understood due to their lack of availability for studies. In this work, we have developed an improved strategy for asymmetric N-glycan assembly and diversification using designed common core substrates prepared chemically for selective enzymatic fucosylation and sialylation. The resulting 26 well-defined glycans that carry the sialic acid residue on different antennae were used in a microarray as a representative application to profile the binding specificity of hemagglutinin (HA) from the avian influenza virus (H5N2). We found distinct binding affinity for the Neu5Ac-Gal epitope linked to the N-acetylglucosamine (GlcNAc) of different branches and only a minor effect on binding for the terminal galactose on different branches. Overall, the microarray analysis showed branch-biased and context-based recognition patterns.
Article
Due to its relatively small size, homology to humans and susceptibility to human viruses, the tree shrew has become an ideal alternative animal model for the study of human viral infectious diseases. However, there is still no report of the comprehensive glycan profile of the respiratory tract tissues in tree shrews. In this study, we characterized the structural diversity of N-glycans in the respiratory tract of tree shrews using our well-established TiO2-PGC chip-Q-TOF-MS method. As a result, a total of 219 N-glycans were identified. Moreover, each identified N-glycan was quantitated by a highly sensitive and accurate MRM method, in which 13C-labeled internal standards were used to correct the inherent run-to-run variation in MS detection. Our results showed that N-glycan composition in turbinate and lung was significantly different from that in soft palate, trachea and bronchus. Meanwhile, 28 high-level N-glycans in turbinate were speculated to be correlated with the infection of H1N1 virus A/California/04/2009. This study is the first to reveal the comprehensive glycomic profile of the respiratory tract of tree shrews. Our results also help to better understand the role of glycan receptors in human influenza infection and pathogenesis.
Article
The development of a simple detection method with high sensitivity is essential for the diagnosis and surveillance of infectious diseases. Previously, we constructed a sensitive biosensor for the detection of pathological human influenza viruses using a boron-doped diamond electrode terminated with a sialyloligosaccharide receptor-mimic peptide that could bind to hemagglutinins involved in viral infection. Circulation of influenza induced by the avian virus in humans has become a major public health concern, and methods for the detection of avian viruses are urgently needed. Here, peptide density and dendrimer generation terminated on the electrode altered the efficiency of viral binding to the electrode surface, thus significantly enhancing charge-transfer resistance measured by electrochemical impedance spectroscopy. The peptide-terminated electrodes exhibited an excellent detection limit of less than one plaque-forming unit of seasonal H1N1 and H3N2 viruses. Furthermore, the improved electrode could detect avian viruses of the H5N3, H7N1, and H9N2 subtypes, showing the potential for the detection of all subtypes of influenza A virus, including new subtypes. The peptide-based electrochemical architecture provided a promising approach to biosensors for ultrasensitive detection of pathogenic microorganisms.
Article
Significance: Equine influenza viruses of the H3N8 subtype have caused outbreaks of respiratory disease in horses throughout the world since their discovery in 1963 in Florida. In 2004 an equine virus in circulation was transmitted to dogs and subsequently spread throughout the United States and to Europe. Comparative analyses of the structures of hemagglutinin glycoproteins of equine and canine viruses by X-ray crystallography locate the sites of variation on the molecules, indicate a role in determining binding specificity for an amino acid sequence difference in the receptor binding site, and describe a unique structural difference in the membrane fusion region in recent equine and canine virus HAs by comparison with all other known HAs. These differences are proposed to have facilitated cross-species transfer.
Article
Human polyomavirus 9 (HPyV9) is a closely related homologue of simian B-lymphotropic polyomavirus (LPyV). In order to define the architecture and receptor binding properties of HPyV9, we solved high-resolution crystal structures of its major capsid protein, VP1, in complex with three putative oligosaccharide receptors identified by glycan microarray screening. Comparison of the properties of HPyV9 VP1 with the known structure and glycan-binding properties of LPyV VP1 revealed that both viruses engage short sialylated oligosaccharides, but small yet important differences in specificity were detected. Surprisingly, HPyV9 VP1 preferentially binds sialyllactosamine compounds terminating in 5-N-glycolyl neuraminic acid (Neu5Gc) over those terminating in 5-N-acetyl neuraminic acid (Neu5Ac), whereas LPyV does not exhibit such a preference. The structural analysis demonstrated that HPyV9 makes specific contacts, via hydrogen bonds, with the extra hydroxyl group present in Neu5Gc. An equivalent hydrogen bond cannot be formed by LPyV VP1. Importance: The most common sialic acid in humans is 5-N-acetyl neuraminic acid (Neu5Ac), but various modifications give rise to more than 50 different sialic acid variants that decorate the cell surface. Unlike most mammals, humans cannot synthesize the sialic acid variant 5-N-glycolyl neuraminic acid (Neu5Gc) due to a gene defect. Humans can, however, still acquire this compound from dietary sources. The role of Neu5Gc in receptor engagement and in defining viral tropism is only beginning to emerge, and structural analyses defining the differences in specificity for Neu5Ac and Neu5Gc are still rare. Using glycan microarray screening and high-resolution protein crystallography, we have examined the receptor specificity of a recently discovered human polyomavirus, HPyV9, and compared it to that of the closely related simian polyomavirus LPyV. Our study highlights critical differences in the specificities of both viruses, contributing to an enhanced understanding of the principles that underlie pathogen selectivity for modified sialic acids.
Article
It is well known that influenza viruses recognize and bind terminal sialic acid on glycans that are found on the cell surface. In this work, we used a data mining technique to analyze the glycan array data of influenza viruses to find novel glycan structures other than sialic acid that may be involved in viral infection. In addition to the sialic acid structures noted previously, we noted sulfated structures in the mining results. For verification, we overexpressed the sulfotransferase that is involved in synthesizing these structures, and we performed a viral infection experiment to assess changes in infection in these cells. We found a 70-fold increase in infection in these cells compared to the control. Thus we have indeed found a novel pattern in glycan structures that may be involved in viral infection. Availability and Implementation: The Glycan Miner Tool is available from the RINGS resource at http://www.rings.t.soka.ac.jp. Contact: kkiyoko@soka.ac.jp. Supplementary Information: Supplementary information is available at http://www.rings.t.soka.ac.jp/supp/miner_virus2013/.
Article
Author Summary: Previous studies indicated that a novel influenza A virus (H17N10) was circulating in fruit bats from Guatemala (Central America). Herein, we investigated whether similar viruses are present in bat species from South America. Analysis of rectal swabs from bats sampled in the Amazon rainforest region of Peru identified another new influenza A virus from bats that is phylogenetically distinct from the one identified in Guatemala. The genes that encode the surface proteins of the new virus from the flat-faced fruit bat were designated as new subtype H18N11. Serologic testing of blood samples from several species of Peruvian bats indicated a high prevalence of antibodies to the surface proteins. Phylogenetic analyses demonstrate that bat populations from Central and South America maintain as much influenza virus genetic diversity in some gene segments as all other mammalian and avian species combined. The crystal structures of the hemagglutinin and neuraminidase proteins indicate that sialic acid is not a receptor for virus attachment nor a substrate for release, suggesting a novel mechanism of influenza A virus attachment and activation of membrane fusion for entry into host cells. In summary, our findings indicate that bats constitute a potentially important reservoir for influenza viruses.
Article
In this review we examine the hypothesis that aquatic birds are the primordial source of all influenza viruses in other species and study the ecological features that permit the perpetuation of influenza viruses in aquatic avian species. Phylogenetic analysis of the nucleotide sequence of influenza A virus RNA segments coding for the spike proteins (HA, NA, and M2) and the internal proteins (PB2, PB1, PA, NP, M, and NS) from a wide range of hosts, geographical regions, and influenza A virus subtypes supports the following conclusions. (i) Two partly overlapping reservoirs of influenza A viruses exist in migrating waterfowl and shorebirds throughout the world. These species harbor influenza viruses of all the known HA and NA subtypes. (ii) Influenza viruses have evolved into a number of host-specific lineages that are exemplified by the NP gene and include equine Prague/56, recent equine strains, classical swine and human strains, H13 gull strains, and all other avian strains. Other genes show similar patterns, but with extensive evidence of genetic reassortment. Geographical as well as host-specific lineages are evident. (iii) All of the influenza A viruses of mammalian sources originated from the avian gene pool, and it is possible that influenza B viruses also arose from the same source. (iv) The different virus lineages are predominantly host specific, but there are periodic exchanges of influenza virus genes or whole viruses between species, giving rise to pandemics of disease in humans, lower animals, and birds. (v) The influenza viruses currently circulating in humans and pigs in North America originated by transmission of all genes from the avian reservoir prior to the 1918 Spanish influenza pandemic; some of the genes have subsequently been replaced by others from the influenza gene pool in birds. (vi) The influenza virus gene pool in aquatic birds of the world is probably perpetuated by low-level transmission within that species throughout the year. (vii) There is evidence that most new human pandemic strains and variants have originated in southern China. (viii) There is speculation that pigs may serve as the intermediate host in genetic exchange between influenza viruses in avian species and humans, but experimental evidence is lacking. (ix) Once the ecological properties of influenza viruses are understood, it may be possible to interdict the introduction of new influenza viruses into humans.
Article
The Protein Data Bank (PDB; http://www.rcsb.org/pdb/) is the single worldwide archive of structural data of biological macromolecules. This paper describes the goals of the PDB, the systems in place for data deposition and access, how to obtain further information, and near-term plans for the future development of the resource.
Article
The recent emergence of a novel avian A/H7N9 influenza virus in poultry and humans in China, as well as laboratory studies on adaptation and transmission of avian A/H5N1 influenza viruses, has shed new light on influenza virus adaptation to mammals. One of the biological traits required for animal influenza viruses to cross the species barrier that has received considerable attention in animal model studies, in vitro assays, and structural analyses is receptor binding specificity. Sialylated glycans present on the apical surface of host cells can function as receptors for the influenza virus hemagglutinin (HA) protein. Avian and human influenza viruses typically have a different sialic acid (SA)-binding preference, and only a few amino acid changes in the HA protein can cause a switch from avian to human receptor specificity. Recent experiments using glycan arrays, virus histochemistry, animal models, and structural analyses of HA have added a wealth of knowledge on receptor binding specificity. Here, we review recent data on the interaction between influenza virus HA and SA receptors of the host, and the impact on virus host range, pathogenesis, and transmission. Remaining challenges and future research priorities are also discussed.
Article
An avian-like H3N2 influenza A virus (IAV) has recently caused sporadic canine influenza outbreaks in China and Korea, but the molecular mechanisms involved in the interspecies transmission of H3N2 IAV from avian to canine species are not well understood. Sequence analysis showed that residue 222 in the hemagglutinin is predominantly tryptophan (W) in closely related avian H3N2 IAV but leucine (L) in canine H3N2 IAV. In this study, reassortant viruses rH3N2-222L (canine-like) and rH3N2-222W (avian-like) with hemagglutinin mutation L222W were generated using reverse genetics to evaluate the significance of the L222W mutation on receptor binding and host tropism of H3N2 IAV. Compared to rH3N2-222W, rH3N2-222L grew more rapidly in MDCK cells and had significantly higher infectivity in primary canine tracheal epithelial cells. Tissue binding assays demonstrated that rH3N2-222L had a preference for canine tracheal tissues rather than avian tracheal tissues, whereas rH3N2-222W slightly favored avian rather than canine tracheal tissues. Glycan microarray analysis suggested that both rH3N2-222L and rH3N2-222W bound preferentially to α2,3-linked sialic acids. However, rH3N2-222W had more than 2-fold lower binding affinity than rH3N2-222L for a set of glycans with Neu5Acα2-3Galβ1-4(Fucα-)-like or Neu5Acα2-3Galβ1-3(Fucα-)-like structures. These data suggest the W to L mutation at position 222 of the hemagglutinin could facilitate infection of H3N2 IAV in dogs, possibly by increasing the binding affinities of the hemagglutinin to specific receptors with Neu5Acα2-3Galβ1-4(Fucα-) or Neu5Acα2-3Galβ1-3(Fucα-)-like structures that are present in dogs.