A two-tiered model for simulating the

ecological and evolutionary dynamics

of rapidly evolving viruses, with an

application to influenza

Katia Koelle1,2,*, Priya Khatri1, Meredith Kamradt1

and Thomas B. Kepler3

1Department of Biology, Duke University, PO Box 90338, Durham, NC 27708, USA

2Fogarty International Center, National Institutes of Health, Bethesda, MD 20892, USA

3Center for Computational Immunology, Department of Biostatistics and Bioinformatics,

Duke University Medical Center, PO Box 2734, Durham, NC 27705, USA

Understanding the epidemiological and evolutionary dynamics of rapidly evolving pathogens

is one of the most challenging problems facing disease ecologists today. To date, many math-

ematical and individual-based models have provided key insights into the factors that may

regulate these dynamics. However, in many of these models, abstractions have been made

to the simulated sequences that limit an effective interface with empirical data. This is

especially the case for rapidly evolving viruses in which de novo mutations result in antigeni-

cally novel variants. With this focus, we present a simple two-tiered ‘phylodynamic’ model

whose purpose is to simulate, along with case data, sequence data that will allow for a

more quantitative interface with observed sequence data. The model differs from previous

approaches in that it separates the simulation of the epidemiological dynamics (tier 1)

from the molecular evolution of the virus’s dominant antigenic protein (tier 2). This separ-

ation of phenotypic dynamics from genetic dynamics results in a modular model that is

computationally simpler and allows sequences to be simulated with specifications such as

sequence length, nucleotide composition and molecular constraints. To illustrate its use, we

apply the model to influenza A (H3N2) dynamics in humans, influenza B dynamics in

humans and influenza A (H3N8) dynamics in equine hosts. In all three of these illustrative

examples, we show that the model can simulate sequences that are quantitatively similar

in pattern to those empirically observed. Future work should focus on statistical estimation

of model parameters for these examples as well as the possibility of applying this model, or

variants thereof, to other host–virus systems.

Keywords: disease dynamics; viral evolution; multi-strain model;

influenza; phylodynamics

1. INTRODUCTION

The ecological and evolutionary dynamics of many

RNA viruses have been increasingly well described

over the last several decades, yet the factors driving

their dynamics are still only poorly understood. One

approach towards identifying key factors is through

the formulation of mathematical models that, when

analysed analytically or simulated, yield quantitative

predictions of the case dynamics and the evolutionary

dynamics of the viral population (Holmes & Grenfell

2009). In the case of antigenically variable viruses,

these ‘phylodynamic’ models (Grenfell et al. 2004) fre-

quently incorporate multiple antigenically distinct

strains and keep track of either the immune status or

the infection histories of individuals in the host

population.

These multi-strain models range in complexity from

the very simple and abstract (e.g. Girvan et al. 2002;

Tria et al. 2005) to the more complex and biologically

realistic (e.g. Ferguson et al. 2003; Koelle et al. 2006).

Many of them yield dynamics that are qualitatively

consistent with the dynamics they seek to reproduce.

For example, when parametrized with a short duration

of infection, the status-based multi-strain model devel-

oped by Gog & Grenfell (2002) yields self-organized

sets of strains that turn over in time, consistent with

empirical patterns of influenza. Other examples are

the phylodynamic models developed by Ferguson

et al. (2003) and Koelle et al. (2006), both of which

yield case dynamics and viral diversity patterns that

are qualitatively similar to those observed and

*Author for correspondence (katia.koelle@duke.edu).

Electronic supplementary material is available at http://dx.doi.org/

10.1098/rsif.2010.0007 or via http://rsif.royalsocietypublishing.org.

J. R. Soc. Interface

doi:10.1098/rsif.2010.0007

Published online

Received 7 January 2010

Accepted 4 March 2010


This journal is © 2010 The Royal Society


phylogenies that resemble the known topology of influ-

enza’s haemagglutinin (HA) protein. Although these

multi-strain models, among others, have been able to

simulate dynamics that are consistent with particular

features of the observed data, many of these models

embody different mechanistic hypotheses about what

factors play dominant roles in shaping the dynamics.

For example, the model by Gog & Grenfell (2002) con-

siders only strain-specific immunity, whereas the model

by Ferguson et al. (2003) considers the additional role

that generalized immunity may play in shaping the

evolutionary dynamics of influenza’s HA. The model

by Koelle et al. (2006) considers a third hypothesis:

that the evolutionary dynamics of influenza’s HA are

shaped by periodic selective sweeps occurring during

antigenic cluster transitions.

Given this growing set of phylodynamic models that

differ in their mechanistic hypotheses, determining

which model performs best when confronted statisti-

cally with observed data is now necessary. In the case

of phylodynamic models, these data come in two

forms: epidemiological (case) data and evolutionary

(sequence) data. Interfacing disease models with case

data has a long history (e.g. Bobashev et al. 2000;

Finkenstädt & Grenfell 2000; Koelle & Pascual 2004;

Xia et al. 2005; Ionides et al. 2006; King et al. 2008),

with a subset of these analyses focusing on parameter

estimation and model selection for antigenically variable

RNA viruses (Xia et al. 2005; Fraser et al. 2009). How-

ever, phylodynamic models have to date not routinely

been tested statistically against observed sequence data.

A quantitative comparison of simulated sequence

data against observed sequence data could focus on a

number of different sequence-derived patterns. These

include divergence and diversity patterns, as well as

quantitative comparisons of phylogenies reconstructed

from simulated and observed sequences. Although many

phylodynamic models have considered at least one of

these patterns (Girvan et al. 2002; Ferguson et al. 2003;

Tria et al. 2005; Koelle et al. 2006; Minayev & Ferguson

2009a,b), the comparisons against observed data have

been only qualitative in nature. The reason for this limit-

ation lies in the current inability of these models to

capture these patterns quantitatively. This does not

imply that these models are missing the relevant pro-

cesses at play; rather, the quantitative mismatch

between model-simulated sequence data and empirical

sequence data results from the way in which sequences

have been represented in these models. Specifically,

phylodynamic models to date have simplified the rep-

resentation of viral sequences by considering bitstrings

(Girvan et al. 2002; Tria et al. 2005), a subset of codons

(Ferguson et al. 2003; Koelle et al. 2006) or a limited

number of antigenic loci (Recker et al. 2007; Minayev &

Ferguson 2009a,b). These sequence representations have

made the models computationally tractable at the cost

of simulating sequences that differ in length (or in struc-

ture) from the empirical sequences with which they are

being compared. In the case of the models that simulate

a subset of codons, a quantitative comparison could of

course be made between empirical and simulated

sequences, if only a subset of the empirical sequences

were considered. However, considering a subset of sites

introduces several difficulties. First, which subset should

be used? Our understanding of which sites are important

for antigenic change is still incomplete. Second, if differ-

ent phylodynamic models represent their sequences

differently, a quantitative comparison against sequence

data would use different subsets of the data. This

would result in models not being compared against the

same sequence dataset, making difficult the process of

model selection.

To enable a quantitative comparison between simu-

lated and observed sequence data, we here develop a

new phylodynamic model that makes explicit the differ-

ence between antigenic change and genetic change, and

thereby makes it computationally feasible to model

sequences in their entirety. Specifically, the model we for-

mulate consists of two tiers. The first tier of the model

simulates the ecological dynamics of the virus and its

antigenic phenotypes. As such, it builds conceptually on

the idea that strain phenotypes (i.e. antigenic variants

or clusters), instead of genotypes, can be used as the ‘fun-

damental particle’ for modelling RNA viruses such as

influenza (Plotkin et al. 2002). More recently, Gökaydin

et al. (2007) and Ballesteros et al. (2009) have used this

phenotype-level approach to consider the invasion

dynamics of a new antigenic cluster into a host population

with a resident cluster. Most relevant to the research pre-

sented here, Koelle et al. (2009) have recently introduced

an antigenic tempo model, which, in a modified version,

we use here as the first tier of our two-tiered model.

The second tier of the model simulates the molecular

evolution of a virus’s antigenic protein. It does so by

taking as given the epidemiological dynamics simulated

in the first tier of the model. Biologically, this separation

of ecological dynamics from evolutionary dynamics is of

course absurd: the molecular changes of a virus drive the

emergence of new antigenic variants, and therewith affect

the case dynamics. However, the effect of molecular

changes on the epidemiological dynamics is indirect, with

the link being the dynamics of the antigenic phenotypes.

As such, to simulate case dynamics, a phenomenological

model that reproduces the emergence dynamics of anti-

genic variants can be considered. It is this modular

separation of phenotypic dynamics from genotypic

dynamics that simplifies the computational complexity of

the simulations and thereby allows us to simulate viral

sequences that can be statistically compared with empirical

sequence data. Below, we first describe the epidemiological

submodel and detail (in appendix A) how this model can

be mechanistically interpreted in terms of mutations that

enable immune escape. We then describe the second tier

of the model, the molecular evolution submodel. A sche-

matic overview of the two tiers of the model and the flow

of simulated data is shown in the electronic supplementary

material, figure S1. The Matlab source code is available for

download from the corresponding author’s website.

As described here, the two-tiered model assumes that

viruses evolve antigenically in a punctuated manner. As

such, mutations are assumed to be antigenically neutral

or nearly so most of the time, with only the rare

mutation resulting in a large antigenic change. This

model is, therefore, a simplification of the previously

published phylodynamic model of Koelle et al. (2006),

which hypothesizes that occasional antigenic innovations


and the selective sweeps that accompany their emergence are the two key factors that drive the evolutionary dynamics of influenza A (H3N2)’s HA in humans. The model presented here improves upon this previous model in terms of both its simplicity and its ability to reproduce the quantitative patterns of the observed sequence data.

Following the description of the model, we provide three case studies to illustrate the flexibility of the model and the diversity of dynamics that it can generate. The first application is to influenza A (H3N2) in humans, the second to influenza B in humans and the third to H3N8 in equine hosts. Our first application to influenza A (H3N2) shows that gradual antigenic evolution within antigenic clusters is necessary to reproduce the ecological and evolutionary patterns of the observed data. To our knowledge, it is the first phylodynamic model that can quantitatively reproduce the known patterns of viral diversity and divergence over time. Our second application demonstrates that the model under a different parametrization can generate the emergence and maintenance of two viral lineages, consistent with the evolutionary patterns observed for influenza B. Our third application serves to illustrate the possibility of extending the model. Specifically, we extend the two-tiered model to two patches (representative of North America and Europe) to show that it can reproduce the evolutionary dynamics of influenza A (H3N8) in equine hosts subject to transatlantic quarantine measures.

Although all three of our applications focus on influenza, this two-tiered model is not limited to this virus, as other RNA viruses (e.g. HIV and norovirus) appear to evolve by punctuated immune escape (Cobey & Koelle 2008). The model is also not limited to the assumption of punctuated antigenic evolution, as will be discussed in §4, although it is in this case that the modular advantages of this model are most clearly evident.

2. THE TWO-TIERED MATHEMATICAL

MODEL

2.1. Tier 1: the epidemiological submodel

To model the virus’s epidemiological dynamics, we

improve on the antigenic tempo model recently pre-

sented elsewhere (Koelle et al. 2009). This model

starts with a given multi-strain model formulation,

interpreting strains in terms of major antigenic variants

instead of in terms of unique genotypes. As in Koelle

et al. (2009), we use a status-based approach to model-

ling strain interactions (Gog & Grenfell 2002), which

assumes polarized immunity. The deterministic dynamics of susceptible and infected individuals belonging to a major antigenic variant i are captured by equations of the form

$$\frac{dS_i}{dt} = \mu (N - S_i) - \sum_{j=1}^{n} \beta \frac{S_i}{N} \sigma_{ij} I_j + \gamma (N - S_i - I_i) \qquad (2.1)$$

and

$$\frac{dI_i}{dt} = \beta \frac{S_i}{N} I_i - (\mu + \nu) I_i - h(t - t_i^e) I_i, \qquad (2.2)$$

where N is the population size, $\mu$ is the birth rate and the death rate, $\gamma$ is the rate of within-variant waning immunity, $\nu$ is the recovery rate, $\beta$ is the transmission rate and n is the total number of antigenic variants that have circulated in the population up to time t. The dynamics of individuals immune to variant i, $R_i$, are not shown, as they can be easily computed by $R_i = N - S_i - I_i$. As previously described in Koelle et al. (2009), the degree of immunity between variants i and j is given by $\sigma_{ij} = \theta^{l_{ij}}$, where $\theta$ is the degree of immunity between a mother–daughter variant pair and $l_{ij}$ is the antigenic kinship level between variants i and j. $h(t - t_i^e)$ is the per capita antigenic emergence rate, defined as the rate at which individuals infected with variant i give rise to a new antigenic variant. We allow this rate to depend on the age of the variant, $t - t_i^e$, where $t_i^e$ is the time at which variant i emerged in the population. Specifically, using an approach similar to the punctuated model of antigenic change detailed in Koelle et al. (2009), we model the per capita antigenic emergence rate, $h(t - t_i^e)$, phenomenologically by using a Weibull hazard function, with scale parameter $\lambda$ and shape parameter k

$$h(t - t_i^e) = \frac{k}{\lambda} \left( \frac{t - t_i^e}{\lambda} \right)^{k-1}. \qquad (2.3)$$

When $k = 1$, this function reduces to a constant rate of antigenic change, which many previous multi-strain models have assumed. However, we consider the case of $k > 1$, such that the rate of antigenic change

increases with the age of the variant. This phenomeno-

logical increase in the rate of phenotypic change can be

interpreted in several ways. First, it is consistent with

‘rules of thumb’ that have been developed to predict

the emergence of antigenically novel variants. These

rules of thumb generally specify that a certain number

of amino acid changes in epitope regions are required

to precipitate a major antigenic change (Wiley et al.

1981; Wilson & Cox 1990). Because it takes time to

accumulate these amino acid changes, an endemic

viral variant that has been circulating in a population

is more likely to give rise to a new variant the longer

it persists in the population. Second, an increase in

the rate of antigenic change with variant age can be

understood in terms of neutral networks in genotype

space (Lau & Dill 1990; Koelle et al. 2006; van

Nimwegen 2006). As an antigenic variant ages, the

sequences that it comprises are accumulating neutral

or nearly neutral mutations that are changing the gen-

etic backgrounds of the sequences. Effectively, this

exploration of sequence space can therefore lead to a

per capita rate of antigenic emergence that increases

with variant age. Third, and more formally, we can

mechanistically interpret the rate of antigenic change

increasing with the age of a variant by considering a

Markov model that considers mutation accumulation.

We outline this model in appendix A.
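As a minimal sketch of equation (2.3), the following Python function evaluates the Weibull hazard for a variant of a given age. The function name and the parameter values in the usage lines are illustrative assumptions, not values taken from the model's parametrization.

```python
def weibull_hazard(age, scale, shape):
    """Per capita antigenic emergence rate from equation (2.3).

    age   -- time since the variant emerged, t - t_i^e (same units as scale)
    scale -- Weibull scale parameter (lambda in the text)
    shape -- Weibull shape parameter (k in the text)
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    if scale <= 0 or shape <= 0:
        raise ValueError("scale and shape must be positive")
    return (shape / scale) * (age / scale) ** (shape - 1)


# Illustrative values only (assumed for this sketch).
ages = (1.0, 2.0, 4.0)
constant = [weibull_hazard(a, scale=5.0, shape=1.0) for a in ages]
increasing = [weibull_hazard(a, scale=5.0, shape=3.0) for a in ages]
```

With a shape parameter of 1 the hazard is the same at every age, whereas a shape parameter above 1 gives a hazard that grows as the variant ages, which is the case considered in the text.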

The model specified by equations (2.1)–(2.3), and

schematically shown as the first tier in the electronic

supplementary material, figure S1, differs from the orig-

inal model formulation described in Koelle et al. (2009)

in two ways. First, we allow for the possibility of waning


immunity within each antigenic variant (which occurs when $\gamma > 0$), such that the dynamics within each antigenic variant are governed by susceptible-infected-recovered-susceptible (SIRS) dynamics instead of susceptible-infected-recovered (SIR) dynamics. Our model is therefore more general, allowing for SIR dynamics when $\gamma = 0$. Second, we simplify the model by simulating it stochastically in its entirety instead of using the stochastic hybrid approach described in Koelle et al. (2009). This change to a fully stochastic simulation removes the need to simulate the additional variables that are used to determine the emergence times of new phenotypes. Table 1 maps the deterministic equations constituting the model (equations (2.1)–(2.3)) into Markov chain events and their associated transition rates, which are used to simulate the model stochastically with the Gillespie $\tau$-leap algorithm (Gillespie 2007).

The majority of the events and their rates shown in table 1 are frequently used in stochastic simulations of disease dynamics. The one exception is the antigenic emergence event, which brings phenotypic novelty into the viral population. When an antigenic emergence event occurs for variant i in the simulation, it results in an individual infected with variant j, where $j = n + 1$ and n is the number of variants that have been in the population up to time t. The event also results in a decrease in the number of individuals infected with variant i from $I_i$ to $I_i - 1$ (table 1). The number of hosts susceptible to variant j can be computed at the time of variant j’s emergence if the numbers of births, deaths and infections have been tracked over the simulation.
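The event rates of table 1 and a single leap of the simulation can be sketched as follows. This is a simplified illustration for one focal variant, not the authors' Matlab implementation: the function names, the Knuth Poisson sampler and all numerical values are assumptions, and the sketch stops at drawing event counts (it does not apply them to the state or resolve the 'possible infection' outcomes).

```python
import math
import random


def event_rates(S, I, R, mu, beta, nu, gamma, hazard):
    """Transition rates for a focal variant, following the rows of table 1."""
    N = S + I + R
    return {
        "birth": mu * N,
        "death": mu * N,
        "possible_infection": beta * I,
        "recovery": nu * I,
        "loss_of_immunity": gamma * R,
        "antigenic_emergence": hazard * I,  # hazard = h(t - t_i^e)
    }


def poisson_draw(mean, rng):
    """Knuth's Poisson sampler; adequate only for small means."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def tau_leap_counts(rates, tau, rng):
    """Number of times each event fires during a leap of length tau."""
    return {event: poisson_draw(rate * tau, rng) for event, rate in rates.items()}


# Assumed values, with rates expressed per day for this sketch.
rng = random.Random(1)
rates = event_rates(S=9000, I=100, R=900, mu=1 / 70 / 365,
                    beta=0.6, nu=0.25, gamma=0.0, hazard=1e-5)
counts = tau_leap_counts(rates, tau=0.1, rng=rng)
```

In a full simulation the counts would then be applied to the state of every variant, respecting the constraints described in the caption of table 1.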

2.2. Tier 2: the molecular evolution submodel

The second tier of the model consists of a molecular evolution submodel, which generates a set of time-stamped viral sequences from which a phylogeny can be inferred and from which diversity and divergence patterns can be constructed. This submodel uses as input the variant-specific disease dynamics from the epidemiological submodel, but does not provide any feedback to it (electronic supplementary material, figure S1). To simulate the molecular evolution submodel, a desired number of sampled sequences s is first specified. Times of isolation are then randomly assigned to each of these s sequences, taking into consideration the number of infected individuals at each time point. This is done by setting the probability of the time of isolation being day k as $I(k)/\sum_i I(i)$, where I(k) is the total number of individuals infected on day k and index i sums over all days of the simulation. Each time-stamped sequence is then probabilistically assigned an antigenic variant to which it belongs by letting the probability that the sequence belongs to variant j be given by $I_j(k)/\sum_i I_i(k)$, where k is the day of the sequence’s isolation and index i sums over all antigenic variants present on day k.

A desired nucleotide length l for the viral sequences is then specified, along with a mutation rate $\mu_{nuc}$ in units of nucleotide changes per site per year. The per-sequence mutation rate $\mu$ is given by the product $\mu_{nuc} l$. Finally, we specify a given model of sequence evolution and provide parameters associated with this model. The model of sequence evolution chosen can be extremely simple and parameter sparse. For example, Kimura’s two-parameter model requires only one parameter for the transition rate and one parameter for the transversion rate (Felsenstein 2004). Alternatively, the model chosen can be more complicated. For example, the general time reversible (GTR) model with a proportion of invariant sites (I) and a gamma

Table 1. The Markov chain events and their transition rates used to stochastically simulate the epidemiological submodel. Events are shown for a focal variant i. Events and transition rates for all other variants j in the set {1, 2, 3, ..., n} are analogous. The population size N is given by $S_i + I_i + R_i$. Several constraints exist in the system and are respected during the simulation. Because $S_j + I_j + R_j = N$ for all variants j, a birth event, occurring at rate $\mu N$, increases the number of susceptible hosts for each variant j by 1 (i.e. births do not occur independently for each variant j). Similarly, a death, also occurring at rate $\mu N$, decreases the number of hosts S, I or R to each variant j by 1. This decrease is taken from $S_j$, $I_j$ or $R_j$ with probability $S_j/N$, $I_j/N$ and $R_j/N$, respectively. The rate of possible infection with variant j (related to the force of infection) is given by $\beta I_j$. When i is equal to j, a ‘possible infection’ event results in an infection with probability $S_i/N$. When i is not equal to j, this possible infection event results in a gain of immunity through polarized cross-immunity with probability $\sigma_{ij}(S_i/N)$. Recovery of an individual infected with variant i and the loss of immunity to variant i occur at rates $\nu I_i$ and $\gamma R_i$, respectively. An antigenic emergence event results in a decrease in the number of individuals infected with variant i and the stochastic appearance of a new antigenic variant, variant $n + 1$.

| event | change | rate |
| --- | --- | --- |
| birth | $(S_i, I_i, R_i) \to (S_i + 1, I_i, R_i)$ | $\mu N$ |
| death | $(S_i, I_i, R_i) \to (S_i - 1, I_i, R_i)$ with probability $S_i/N$; $(S_i, I_i, R_i) \to (S_i, I_i - 1, R_i)$ with probability $I_i/N$; $(S_i, I_i, R_i) \to (S_i, I_i, R_i - 1)$ with probability $R_i/N$ | $\mu N$ |
| possible infection, for $i = j$ | $(S_i, I_i, R_i) \to (S_i - 1, I_i + 1, R_i)$ with probability $S_i/N$; $(S_i, I_i, R_i) \to (S_i, I_i, R_i)$ with probability $(1 - S_i/N)$ | $\beta I_j$ |
| possible infection, for $i \neq j$ | $(S_i, I_i, R_i) \to (S_i - 1, I_i, R_i + 1)$ with probability $\sigma_{ij} S_i/N$; $(S_i, I_i, R_i) \to (S_i, I_i, R_i)$ with probability $(1 - \sigma_{ij} S_i/N)$ | $\beta I_j$ |
| recovery | $(S_i, I_i, R_i) \to (S_i, I_i - 1, R_i + 1)$ | $\nu I_i$ |
| loss of immunity | $(S_i, I_i, R_i) \to (S_i + 1, I_i, R_i - 1)$ | $\gamma R_i$ |
| antigenic emergence | $(S_i, I_i, R_i) \to (S_i, I_i - 1, R_i + 1)$ and $(S_{n+1}, 0, R_{n+1}) \to (S_{n+1} - 1, 1, R_{n+1})$ | $h(t - t_i^e) I_i$ |


distribution of rate variation ($\Gamma$) would entail specifying 11 parameters: the frequencies of the four nucleotide bases, the transition rates between each pair of bases, the proportion of invariant sites and a parameter $\alpha$ that controls the shape of the gamma distribution (Felsenstein 2004).

Under a given model of sequence evolution and its associated parameter values, site-specific mutation rates are then assigned to each of the l nucleotide locations. Under Kimura’s two-parameter model, the mutation rate of each site is $\mu_{nuc}$. Under the GTR + I + $\Gamma$ model, the mutation rate at each site is assigned based on the proportion of invariant sites I and the $\alpha$ parameter of the $\Gamma$ distribution, such that the per-sequence mutation rate comes out to be $\mu$.
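The sampling step described at the start of this subsection, in which isolation days are drawn in proportion to the total number infected and each sample is then assigned to a variant in proportion to variant-specific prevalence on that day, can be sketched in Python as follows. The function names and the toy tier-1 output are assumptions made for illustration.

```python
import random


def sample_isolation_days(total_infected, n_samples, rng):
    """Draw isolation days k with probability I(k) / sum_i I(i)."""
    days = range(len(total_infected))
    return sorted(rng.choices(days, weights=total_infected, k=n_samples))


def assign_variant(infected_by_variant, day, rng):
    """Pick variant j for a given day with probability I_j(k) / sum_i I_i(k)."""
    variants = sorted(infected_by_variant)
    weights = [infected_by_variant[v][day] for v in variants]
    return rng.choices(variants, weights=weights, k=1)[0]


# Toy tier-1 output (assumed): daily infected counts for two variants.
infected_by_variant = {1: [10, 20, 5, 0], 2: [0, 0, 5, 30]}
total_infected = [sum(day) for day in zip(*infected_by_variant.values())]

rng = random.Random(42)
days = sample_isolation_days(total_infected, n_samples=20, rng=rng)
samples = [(day, assign_variant(infected_by_variant, day, rng)) for day in days]
```

Each entry of `samples` is a (day, variant) pair for one of the s sequences whose genetic content is determined later in the simulation.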

To begin the molecular evolution submodel simu-

lation, a single sequence of length l is generated, with

each site probabilistically assigned a nucleotide depend-

ing on base frequencies specified by the model of

sequence evolution. This sequence belongs to antigenic

variant i ¼ 1, and all infected individuals on day 0 of

the simulation are infected with this strain. Starting

with the first antigenic variant, all sampling times of

the (genetically yet undetermined) sequences that

belong to variant 1 are found, as are all the emergence

times of variants that were generated specifically by this

variant. Then, starting from day 0, from each day to the

next, two processes occur: mutation and transmission of

extant sequences. Mutation is simulated by first deter-

mining the number of mutations that will occur on a

particular day. This number is determined by drawing

from a Poisson distribution with mean $\mu I$, where I is

the number of individuals infected on a particular

day. These mutations occur in the viral population (of

size I) at sites that are chosen randomly, weighted by

their per site mutation rate $\mu_{nuc}$. A mutation assigned

at a chosen nucleotide site in a chosen individual results

in a nucleotide change that reflects the transition rates

given by the model of sequence evolution. Transmission

is simulated by first calculating, from one day to the

next, the number of new infections and the number of

recoveries, both only of the first variant. New infections

are then drawn from the extant pool of viral sequences

belonging to variant 1 and recoveries are chosen from

this same pool. Recoveries always occur randomly,

while new infections are chosen depending on the

selective advantage of each viral sequence.
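The daily mutation step described above can be sketched as follows, using a Kimura-style substitution rule as the model of sequence evolution. All names are hypothetical, the per-site rates are assumed to have already been converted to per-day units, and `kappa` (the weight of a transition relative to each of the two possible transversions) is an assumed way of parametrizing the two rates.

```python
import math
import random

TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def poisson_draw(mean, rng):
    """Knuth's Poisson sampler; adequate only for small means."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def mutate_population(sequences, site_rates, kappa, rng):
    """Apply one day of mutations in place; return the number of mutations.

    sequences  -- list of lists of nucleotides, one per infected individual
    site_rates -- per-site mutation rates per day (summing to the per-sequence rate)
    kappa      -- assumed weight of a transition relative to each transversion
    """
    per_sequence_rate = sum(site_rates)
    n_mutations = poisson_draw(per_sequence_rate * len(sequences), rng)
    sites = range(len(site_rates))
    for _ in range(n_mutations):
        host = rng.randrange(len(sequences))
        site = rng.choices(sites, weights=site_rates, k=1)[0]
        base = sequences[host][site]
        options = [TRANSITION[base], *TRANSVERSIONS[base]]
        sequences[host][site] = rng.choices(options, weights=[kappa, 1.0, 1.0], k=1)[0]
    return n_mutations


rng = random.Random(7)
population = [list("ACGTACGTAC") for _ in range(50)]
site_rates = [0.002] * 10  # assumed per-site daily rates
n = mutate_population(population, site_rates, kappa=2.0, rng=rng)
```

The transmission half of the daily update (drawing new infections and recoveries from the pool of extant sequences) is not covered by this sketch.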

The selective advantage of a sequence is given by $\phi k$, where k is the sequence’s nucleotide distance from the founder sequence of the variant and $\phi$ is a parameter specifying the intensity of the selective advantage a single nucleotide change confers within a single antigenic variant. In the case of purely neutral evolution within antigenic variants ($\gamma = 0$ in equation (2.1)), $\phi = 0$, such that new infections are chosen randomly from the pool of extant sequences. In the case of gradual antigenic evolution within antigenic variants ($\gamma > 0$ in equation (2.1)), $\phi > 0$, such that new infections are preferentially chosen from the set of extant sequences that have higher divergence from their founder sequence. (Clearly, there should be a quantitative relationship between the parameters $\gamma$ and $\phi$, with higher values of $\phi$ associated with higher values of $\gamma$. Unfortunately, at this point, we do not have an equation mechanistically linking the value of $\phi$ to the value of $\gamma$ as we have for linking $\lambda$ to $\mu$ (appendix A).) The implementation of the transmission step thereby captures both demographic stochasticity in disease transmission and the possibility for within-variant selection to act.
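One way to implement the preferential choice of transmitting sequences is sketched below. The text specifies only that the selective advantage scales with the nucleotide distance k from the founder; the particular weighting used here, 1 plus the advantage, is an assumption of this sketch chosen so that a zero advantage parameter yields uniform random choice. All function names are hypothetical.

```python
import random


def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def selection_weights(sequences, founder, phi):
    """Sampling weight 1 + phi * k for each sequence (assumed functional form)."""
    return [1.0 + phi * hamming(seq, founder) for seq in sequences]


def choose_transmitters(sequences, founder, phi, n_new, rng):
    """Pick the extant sequences that seed n_new infections."""
    weights = selection_weights(sequences, founder, phi)
    return rng.choices(sequences, weights=weights, k=n_new)


founder = "AAAA"
sequences = ["AAAA", "AAAT", "AATT"]
rng = random.Random(3)
neutral_picks = choose_transmitters(sequences, founder, phi=0.0, n_new=5, rng=rng)
selected_picks = choose_transmitters(sequences, founder, phi=0.5, n_new=5, rng=rng)
```

Recoveries, by contrast, would be drawn uniformly at random from the same pool, as stated in the text.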

Throughout the day-to-day simulation of mutations and transmission events, sampling times of sequences belonging to variant 1 will be reached, as will emergence times of variants that arose from variant 1. When a sampling time is reached, a viral sequence is randomly chosen from the pool of infected individuals; this simulated sequence is now one of the s sequences that can be used in inferring a phylogeny. When an emergence time is reached, a viral sequence is randomly chosen from the pool of infected individuals and mutated at a single nucleotide location; this simulated sequence is now the founder of the antigenic variant born of variant 1 at that time. Alternatively, to be strictly consistent with the Markov chain process detailed above, the viral sequence can be chosen preferentially according to its nucleotide distance k from its founder sequence.

The same iterative procedure of mutation, transmission and sequence sampling is then performed for variant 2 through to the last variant observed in the simulation. The set of sampled sequences that results from this simulation can then be compared against empirical sequence data. Although the simulated sequences will differ from the observed ones genetically, patterns of sequence evolution (e.g. divergence and diversity patterns) can be compared quantitatively across these datasets, and a phylogeny can be inferred from all s sampled sequences and compared against phylogenies inferred from empirical sequences.
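As an illustration of the divergence and diversity patterns mentioned above, the sketch below computes two simple summary statistics from a set of sampled sequences. The definitions used (mean nucleotide distance from a reference sequence, and mean pairwise nucleotide distance) are common choices assumed here for illustration; this section does not specify the exact statistics used for the comparison.

```python
from itertools import combinations


def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_divergence(sequences, reference):
    """Mean number of nucleotide differences from a reference sequence."""
    return sum(hamming(s, reference) for s in sequences) / len(sequences)


def mean_pairwise_diversity(sequences):
    """Mean number of nucleotide differences over all pairs of sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Toy sample (assumed): three sequences isolated in the same time window.
sampled = ["ACGT", "ACGA", "TCGA"]
divergence = mean_divergence(sampled, reference="ACGT")
diversity = mean_pairwise_diversity(sampled)
```

Applying the same functions to simulated and empirical sequences grouped by isolation time would give time series that can be compared directly.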

3. AN APPLICATION OF THE MODEL TO

INFLUENZA DYNAMICS IN HUMANS

To illustrate its use, we simulate the two-tiered model

described above with parameter values that are reason-

able for influenza. We use influenza as an application

because sequence data are readily available for its

dominant antigenic protein, the HA, and because the

advantages of the two-tiered model in terms of its

modular design are most evident for a system like influ-

enza, where many mutations in the HA protein have

been shown to be neutral or nearly so and single

mutations have been shown to result in large antigenic

changes (Webster & Laver 1980; Nakajima et al. 1983;

Berton et al. 1984; Nakagawa et al. 2000, 2001a,b,

2002; Smith et al. 2004).

3.1. Simulating the dynamics of influenza A

(H3N2) in humans

Influenza A subtype H3N2 has been circulating in humans

since 1968, when a reassortment event between the pre-

viously circulating influenza A subtype, H2N2, and a

avian influenza virus resulted in its pandemic spread.

From 1968 until the present, this subtype has dominated

the flu season, with annual H3N2 attack rates estimated


to be between 2 and 10 per cent. In temperate regions, H3N2’s disease dynamics are largely annual, with occasional years of low activity (figure 1a,b). The evolutionary dynamics of the virus are characterized by the emergence and replacement of antigenic clusters, occurring every 2–8 years (Plotkin et al. 2002; Smith et al. 2004) and, genetically, by the ladder-like phylogeny of its HA protein (figure 1c) (Fitch et al. 1997).

To apply the two-tiered model to influenza A (H3N2) in humans, we first modify the epidemiological submodel shown in equations (2.1) and (2.2) to incorporate a slightly larger degree of realism. Because we will compare the model simulations with the dynamics observed in a temperate region (figure 1a), we include in our simulations seasonal forcing of the transmission rate and a small immigration rate. With these modifications, equations (2.1) and (2.2) become

$$\frac{dS_i}{dt} = \mu (N - S_i) - \sum_{j=1}^{n} \beta \left(1 + \varepsilon \sin(2\pi t)\right) \frac{S_i}{N} \sigma_{ij} (I_j + \rho p_j) + \gamma (N - S_i - I_i) \qquad (3.1)$$

and

$$\frac{dI_i}{dt} = \beta \left(1 + \varepsilon \sin(2\pi t)\right) \frac{S_i}{N} (I_i + \rho p_i) - (\mu + \nu) I_i - h(t - t_i^e) I_i, \qquad (3.2)$$

where $\varepsilon$ is the strength of seasonal forcing, $\rho$ is the immigration rate and $p_i$ is the proportion of cases that belong to antigenic cluster i.
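The two modifications in equations (3.1) and (3.2) can be written as small helper functions. This is a sketch under the assumption that time is measured in years, so that the sinusoid has an annual period; the function names and numerical values are illustrative only.

```python
import math


def seasonal_transmission_rate(t_years, beta, epsilon):
    """Seasonally forced transmission rate beta * (1 + epsilon * sin(2*pi*t))."""
    return beta * (1.0 + epsilon * math.sin(2.0 * math.pi * t_years))


def effective_infecteds(I_i, p_i, rho):
    """Infecteds plus immigration term, (I_i + rho * p_i), as in equation (3.2)."""
    return I_i + rho * p_i


# Assumed values for illustration.
baseline = seasonal_transmission_rate(0.0, beta=2.0, epsilon=0.1)
peak = seasonal_transmission_rate(0.25, beta=2.0, epsilon=0.1)
trough = seasonal_transmission_rate(0.75, beta=2.0, epsilon=0.1)
pressure = effective_infecteds(I_i=100, p_i=0.5, rho=4.0)
```

Because the immigration term is proportional to the cluster's share of cases, it keeps each circulating cluster from going extinct locally without changing the relative cluster frequencies.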

We simulate this submodel under two parameter

sets: (i) purely punctuated antigenic evolution and

(ii) antigenic evolution that includes components of

both punctuated and gradual antigenic change. A

third parameter set, with only gradual antigenic evol-

ution, is considered in the electronic supplementary

material. Under the first parameter set, there is no

waning of immunity within antigenic clusters ($\gamma = 0$). In contrast, in simulations with the second parameter set, individuals can become reinfected with a given antigenic cluster after an average duration of immunity $1/\gamma$.

All other parameters of the model are chosen to be con-

sistent with values from the literature, where possible

(figure 2, legend).

Although the epidemiological submodel parametrized for purely punctuated antigenic evolution reproduces the emergence–replacement dynamics of influenza A (H3N2)’s antigenic clusters (figure 2a,c), it fails to reproduce other features of H3N2’s dynamics. Specifically, the simulated annual attack rates are generally too low and the emergence of a new antigenic cluster results in a year having an unrealistically large deviation from other years’ attack rates (figure 2a,b). However, the epidemiological submodel parametrized for a combination of punctuated antigenic evolution and gradual antigenic evolution is capable of reproducing the observed patterns of seasonal and interannual variability, as well as the

[Figure 1 image: (a) cases per 100 000 by year, (b) attack rate by year (1985–2005), (c) phylogeny with scale bar 0.02.]

Figure 1. Observed dynamics of influenza A (H3N2) in humans. (a) The case dynamics of influenza in France over the period

1984–2008 are shown in black (http://www.sentiweb.org/). Starting in 1997, FluNet data were available to estimate variant-

specific dynamics (http://gamapserver.who.int/GlobalAtlas/). The subset of cases attributed to influenza A (H3N2) is shown

in red after this date. (b) The annual attack rate of influenza in France, computed from the case dynamics shown in (a). Starting

in 1997, the attack rate is partitioned into H3N2 cases (red) and H1N1/B cases (grey). Case dynamics are shown for France

owing to the availability of data. Similar case dynamics are observed in other temperate regions, with similar estimated

attack rates. (c) The phylogeny of influenza A (H3N2)’s HA, inferred from antigenically typed sequences isolated between

1968 and 2003 (Smith et al. 2004). Sequences are coloured by antigenic cluster.


magnitude of H3N2’s attack rates (figure 3a,b), while

maintaining the ability to reproduce the emergence

and replacement of antigenic clusters every 2–8

years (figure 3c). These emergence–replacement dynamics result from the strong selective advantage of novel antigenic variants, arising from the greater number of susceptible hosts available to them (figure 3d). The results shown in figures 2 and 3 are consistent with the recent findings by Ballesteros et al. (2009) that both gradual and punctuated antigenic evolution are necessary to effectively reproduce influenza A (H3N2)’s disease dynamics. A model that considers only gradual antigenic evolution does not reproduce the observed dynamics well, yielding strictly annual cycles with very low incidence rates (electronic supplementary material).

To simulate the molecular evolution submodel, we specify the length of the HA (l = 987 nucleotides) and the HA mutation rate (μ_nuc = 5.7 × 10⁻³ nucleotide substitutions per site per year). We also use a GTR + I + Γ model of sequence evolution, with parameters that were estimated from a maximum-likelihood fit to H3N2 sequences whose phylogeny is shown in figure 1c (figure 4, legend). (Alternatively, we could use a simpler model of sequence evolution; our decision to use the GTR + I + Γ model is based on improving the realism of the model, without requiring additional free parameters.) For the model parametrized for punctuated antigenic evolution, we set f, the degree of within-cluster positive selection, to 0. For the model parametrized for a combination of punctuated and gradual antigenic evolution, we allow for some positive selection within antigenic clusters by setting f = 0.01. Setting f > 0 is qualitatively consistent with antigenic maps for H3N2 that show a scatter of points for each antigenic cluster, indicating imperfect cross-immunity between viral strains belonging to a single cluster (Smith et al. 2004). The phylogenies inferred from the simulated sequences of both models (figure 4a,b) reproduce the cactus-like topology of H3N2’s HA phylogeny, consisting of a long trunk and short side branches (figure 1c). Diversity and divergence patterns of the simulated HA sequences can also be plotted and compared against those observed empirically (figure 5a–f). The model with only punctuated antigenic evolution fails to reproduce the rapid divergence that is empirically observed (figure 5c versus 5a). In contrast, the model with a combination of gradual and punctuated evolution captures this pattern quantitatively (figure 5e versus 5a). Although the model with both gradual and punctuated evolution clearly performs better in terms of reproducing divergence patterns, both models are capable of reproducing the sequence diversity patterns of H3N2’s HA (figure 5d,f versus 5b). In the electronic supplementary material, we further considered whether the model of only gradual antigenic change could effectively reproduce the evolutionary patterns of influenza’s HA. To some

[Figure 2 image: (a) cases per 100 000, (b) attack rate and (c) proportion in cluster, each plotted against year (0–35).]

Figure 2. Epidemiological submodel simulation under a model of purely punctuated antigenic evolution. (a) Simulated case dynamics. Antigenic clusters are colour-coded. (b) Simulated annual attack rates. (c) Proportion of cases belonging to each antigenic cluster. Parameters used: N = 300 million, R0 = 2, μ = 1/70 yr⁻¹, ν = 1/3 d⁻¹, γ = 0 yr⁻¹, ε = 0.25, θ = 0.8 (Gill & Murphy 1977), r = 0.01 N person yr⁻¹ and Weibull parameters k = 2, λ = 500. The epidemiological submodel was simulated stochastically, using Gillespie’s τ-leap algorithm (Gillespie 2007) with a time interval τ of half a day.
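The τ-leap idea mentioned in the legend can be illustrated with a deliberately reduced example. The sketch below advances a single-cluster S, I, R system by one leap, drawing Poisson event counts with rates held fixed over the leap. It is an assumption-laden illustration rather than the authors' simulation code: births, deaths, seasonal forcing, immigration and antigenic emergence are all omitted, and the function name and the capping rule that keeps compartments non-negative are choices made here.

```python
import numpy as np

def tau_leap_step(S, I, R, *, beta, nu, gamma, tau, rng):
    """Advance one cluster's S, I, R counts by a single leap of length tau."""
    N = S + I + R
    # Event rates, treated as constant over the leap.
    rates = np.array([
        beta * S * I / N,  # infection: S -> I
        nu * I,            # recovery:  I -> R
        gamma * R,         # waning:    R -> S
    ])
    n_inf, n_rec, n_wane = rng.poisson(rates * tau)
    # Cap each event count by the size of its source compartment.
    n_inf = min(n_inf, S)
    n_rec = min(n_rec, I)
    n_wane = min(n_wane, R)
    return S - n_inf + n_wane, I + n_inf - n_rec, R + n_rec - n_wane
```

If rates are expressed per year, a half-day leap corresponds to `tau = 0.5 / 365`, and an infectious period of three days to `nu = 365 / 3`.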


extent, it was able to do so. However, we also elaborate

on why the molecular evolution submodel, as specified,

is not appropriate for this application and review results

from an existing model which show that gradual anti-

genic evolution leads to explosive viral diversity

(as previously also shown in Ferguson et al. (2003)),

inconsistent with a ladder-like phylogeny.

The results of the simulations shown here and in

the appendix indicate, first, that punctuated antigenic

change is an important driver of influenza A (H3N2)’s

dynamics; second, that gradual antigenic evolution

within clusters contributes to influenza A (H3N2)’s

ecological and evolutionary dynamics; and, third,

that the two-tiered model, when appropriately parametrized, can generate sequence data that quantitatively reproduce patterns in observed sequence data.

3.2. Simulating the dynamics of influenza B

in humans

We now consider the dynamics of influenza B in

humans, and whether an alternative parametrization

of the two-tiered model can yield ecological and

evolutionary dynamics consistent with this influenza

type. Unlike the largely annual epidemics of influenza

A (H3N2), influenza B epidemics occur only every 2–4

years (figure 6a) (Monto & Kioumehr 1975). Also

different from H3N2, the evolution of influenza B’s

HA is characterized by two distinct lineages, the

Yamagata lineage and the Victoria lineage (figure 6b) (Rota et al. 1992). Epidemic seasons of influenza B usually have one of these two lineages dominating, although a co-occurrence of strains from both lineages is occasionally observed (Nerome et al. 1998). Within each of these two lineages, genetic and antigenic viral turnover has been documented (Rota et al. 1992; Nerome et al. 1998; Nakagawa et al. 2001a,b).

To reflect a lower mutation rate for influenza B (when compared against the mutation rate for influenza A) (Nobusawa & Sato 2006), we reparametrized the epidemiological model with a lower value of γ, a lower value of r and a higher value of λ (figure 6, legend). A lower value of γ reflects slower within-cluster waning of immunity, owing to a lower mutation rate. This lower value of γ also reduces the equilibrium number of individuals infected with influenza B when one cluster is endemic.

[Figure 3 image: (a) cases per 100 000, (b) attack rate, (c) proportion in cluster and (d) fraction susceptible, each plotted against time (years, 0–35).]

Figure 3. Epidemiological submodel simulation under a model of punctuated and gradual antigenic evolution. (a) Simulated case

dynamics. (b) Simulated annual attack rates. (c) Proportion of cases belonging to each antigenic cluster. (d) The fraction of hosts

susceptible to each antigenic cluster, shown over the lifespan of each cluster. Parameters used were as in figure 2, with the

exception of γ, which here is γ = 1/8 yr⁻¹.


This in turn would reduce the immigration rate r of

infected individuals. Lastly, with a lower mutation rate,

we would expect a higher value of the Weibull scale parameter λ (appendix A).

In simulations of the two-tiered model with this

alternative parametrization, we find that these changes

in parameter values result in lower attack rates and

more interannual variability (results not shown), con-

sistent with the observed epidemiological dynamics.

However, the model with just these parameter changes

did not reproduce the observed pattern of lineage (and

cluster) co-circulation.

When the model is simulated with a longer duration

of infection, however, two distinct lineages emerge and

persist, with antigenic clusters replacing one another

periodically within each lineage. Furthermore, the two

lineages exhibit asynchronous dynamics, with one lineage usually dominating an influenza B season (figure 6c,d). The phylogeny inferred from the simulated sequences also reproduces the general topology of the phylogeny inferred from the empirical sequences (figure 6e).

We chose to consider a longer duration of infection because influenza B primarily affects children, and children are known to be infectious for longer periods of time than adults. Although these simulations can recover features of the observed evolutionary and ecological dynamics of influenza B, other parameter differences from influenza A (H3N2) not explored here (e.g. in the degree of cross-immunity between antigenic clusters) could result in similar dynamics. Statistical estimation of model parameters will therefore be critical for identifying the parameters that play the greatest role in shaping the dynamical differences between these two types of influenza.

3.3. Simulating the dynamics of influenza

H3N8 in equine hosts

To illustrate the ease with which the two-tiered model can

be extended, we now consider the evolutionary dynamics

of influenza (H3N8) in equine hosts. This subtype of

equine influenza A virus (EIV) was first isolated from

its hosts in 1963 in Florida (Waddell et al. 1963). Since

then, the virus has undergone many genetic and antigenic

changes that resulted in EIV outbreaks in horses around

the world (Daniels et al. 1985; Kawaoka et al. 1989;

Oxburgh et al. 1994). H3N8 is currently the only known

strain of EIV circulating in equine hosts; the H7N7 sub-

type of EIV has not been isolated in horses since 1980

(Webster 1993).

The patterns of genetic and antigenic change in

EIV resemble the evolutionary patterns of influenza

B virus in humans. Phylogenetic analysis of H3N8

from 1963 to 1986 shows a pattern of relatively

linear evolution (Oxburgh et al. 1994) with lineage

turnover and low genetic diversity (figure 7a). Some

time between 1986 and 1990, H3N8 split into two dis-

tinct lineages, referred to as the ‘European’ and

‘American’ lineages, based on their geographical

origin (figure 7a) (Lindstrom et al. 1994; Daly et al.

1996; Oxburgh et al. 1998; Oxburgh & Klingeborn

1999; Lai et al. 2004). This split was accompanied

by the extinction of the older viral lineages (Daly

et al. 1996; Oxburgh et al. 1998; Oxburgh &

Klingeborn 1999; Lai et al. 2004). After some time,

co-circulation of the two lineages was reported in European

countries, such as Sweden and the UK (Daly et al. 1996;

Oxburgh & Klingeborn 1999), but, interestingly, only a

single isolate of the European lineage was observed in

North America (the Canadian isolate A/Eq/Saskatoon/


Figure 4. Phylogenies inferred from sequences simulated by the molecular evolution submodel. (a) Phylogeny inferred from sequences simulated under a model of purely punctuated antigenic evolution (f = 0). (b) Phylogeny inferred from sequences simulated under a model of punctuated and gradual antigenic evolution (with f = 0.01). Both phylogenetic trees were inferred from 300 simulated HA sequences of length l = 987 nucleotides, spanning 35 years. In both simulations, μ_nuc = 5.7 × 10⁻³ mutations per nucleotide site per year (Fitch et al. 1997), and GTR + I + Γ parameters were set to those estimated from the H3N2 phylogeny shown in figure 1c. These were: (π_A, π_C, π_G, π_T) = (0.35, 0.22, 0.21, 0.22), transition rates: AC = 1.21, AG = 4.75, AT = 0.62, CG = 0.17, CT = 4.52, GT = 1, gamma distribution parameter α = 1.21 and proportion of invariant sites f0 = 0.28.
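To make the role of the listed base frequencies and exchange rates concrete, the sketch below assembles a general time-reversible (GTR) rate matrix from the H3N2 values in the legend. It follows the textbook construction (off-diagonal entry equal to the exchange rate times the target base frequency, rows summing to zero, scaled to one expected substitution per unit time) and is not taken from the authors' code. The function name and the A, C, G, T ordering are assumptions, and the rate-heterogeneity parts of the model (the gamma shape α and the proportion of invariant sites) are left out.

```python
import numpy as np

def gtr_rate_matrix(pi, rates):
    """Build a normalized GTR rate matrix; nucleotide order is A, C, G, T."""
    pi = np.asarray(pi, dtype=float)
    pi = pi / pi.sum()
    order = "ACGT"
    R = np.zeros((4, 4))
    for pair, value in rates.items():
        i, j = order.index(pair[0]), order.index(pair[1])
        R[i, j] = R[j, i] = value          # exchange rates are symmetric
    Q = R * pi[np.newaxis, :]              # Q[i, j] = r_ij * pi_j for i != j
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))    # each row sums to zero
    # Rescale so the expected substitution rate at stationarity equals one.
    Q /= -(pi * np.diag(Q)).sum()
    return Q

# Values copied from the figure 4 legend (H3N2).
h3n2_pi = [0.35, 0.22, 0.21, 0.22]
h3n2_rates = {"AC": 1.21, "AG": 4.75, "AT": 0.62,
              "CG": 0.17, "CT": 4.52, "GT": 1.0}
Q = gtr_rate_matrix(h3n2_pi, h3n2_rates)
```

Multiplying the normalized matrix by μ_nuc would put it on the scale of substitutions per site per year.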


90) (Lai et al. 2004). The American lineage continued to

diversify genetically, evolving into several sublineages

(Lai et al. 2004; Bryant et al. 2009). In contrast, the Euro-

pean lineage has declined in prevalence and appears to be

heading towards extinction, with only a few European isolates having been observed since 1993 (figure 7a) (Oxburgh

et al. 1998; Bryant et al. 2009).

As for flu in humans, the recurrence of H3N8 in the

horse population has been attributed to the antigenic

drift of the virus’s HA protein (Hinshaw et al. 1983;

Kawaoka et al. 1989; Berg et al. 1990; Endo et al.

1992). Haemagglutination inhibition assays of H3N8

isolates from 1963 to 1986 have indicated that the

antigenic evolution of H3N8 parallels to a large

extent the genetic evolution of the virus’s HA

(figure 7a) (Hinshaw et al. 1983; Kawaoka et al.

1989).

Although similar to influenza B’s evolutionary

dynamics, the dynamics of H3N8 in equine hosts are complicated by the international movement of race horses.

Geographical isolation, through quarantine measures,

has been proposed as an explanation for the appearance

of the distinct European and North American lineages

(Lindstrom et al. 1994; Daly et al. 1996; Lai et al.

2004), and the weakening of quarantine restrictions in

Europe could be a potential explanation for the co-

circulation of these lineages in Europe. Here, we

extend the two-tiered model to allow for the evolution-

ary dynamics of H3N8 to take place in two geographical

locations, representing Europe and North America.

Specifically, the epidemiological submodel equations

are first modified to consider two ‘patches’

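$$\frac{dS_i^{\mathrm{NA}}}{dt} = \mu (N^{\mathrm{NA}} - S_i^{\mathrm{NA}}) - \sum_{j=1}^{n} \beta \frac{S_i^{\mathrm{NA}}}{N^{\mathrm{NA}}} \sigma_{ij} I_j^{\mathrm{NA}} - \sum_{j=1}^{n} r_{\mathrm{E} \to \mathrm{NA}}\, \beta \frac{S_i^{\mathrm{NA}}}{N^{\mathrm{NA}}} \sigma_{ij} I_j^{\mathrm{E}} + \gamma (N^{\mathrm{NA}} - S_i^{\mathrm{NA}} - I_i^{\mathrm{NA}}), \tag{3.3a}$$

$$\frac{dI_i^{\mathrm{NA}}}{dt} = \beta \frac{S_i^{\mathrm{NA}}}{N^{\mathrm{NA}}} I_i^{\mathrm{NA}} + r_{\mathrm{E} \to \mathrm{NA}}\, \beta \frac{S_i^{\mathrm{NA}}}{N^{\mathrm{NA}}} I_i^{\mathrm{E}} - (\mu + \nu) I_i^{\mathrm{NA}} - h(t - t_i^{e}) I_i^{\mathrm{NA}}, \tag{3.3b}$$

$$\frac{dS_i^{\mathrm{E}}}{dt} = \mu (N^{\mathrm{E}} - S_i^{\mathrm{E}}) - \sum_{j=1}^{n} \beta \frac{S_i^{\mathrm{E}}}{N^{\mathrm{E}}} \sigma_{ij} I_j^{\mathrm{E}} - \sum_{j=1}^{n} r_{\mathrm{NA} \to \mathrm{E}}\, \beta \frac{S_i^{\mathrm{E}}}{N^{\mathrm{E}}} \sigma_{ij} I_j^{\mathrm{NA}} + \gamma (N^{\mathrm{E}} - S_i^{\mathrm{E}} - I_i^{\mathrm{E}}) \tag{3.3c}$$

and

$$\frac{dI_i^{\mathrm{E}}}{dt} = \beta \frac{S_i^{\mathrm{E}}}{N^{\mathrm{E}}} I_i^{\mathrm{E}} + r_{\mathrm{NA} \to \mathrm{E}}\, \beta \frac{S_i^{\mathrm{E}}}{N^{\mathrm{E}}} I_i^{\mathrm{NA}} - (\mu + \nu) I_i^{\mathrm{E}} - h(t - t_i^{e}) I_i^{\mathrm{E}}. \tag{3.3d}$$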

[Figure 5 image: panels (a), (c), (e) show distance from pandemic strain and panels (b), (d), (f) show pairwise diversity; x-axes are year (1970–2000) for observed sequences and years (0–30) for simulated sequences.]

Figure 5. Divergence and diversity patterns of empirical and simulated HA sequences. (a) Divergence of observed HA

sequences from influenza A (H3N2) strain BI/16398/68, coloured by antigenic cluster. (b) Diversity of observed HA

sequences over time. (c) Divergence of simulated HA sequences from the initial strain, coloured by antigenic cluster, for

the purely punctuated model of antigenic evolution. (d) Diversity of simulated HA sequences over time for the purely

punctuated model of antigenic evolution. (e) Divergence, as in (c), with sequences simulated from the model of punctuated

and gradual antigenic evolution. (f ) Diversity, as in (d), with sequences simulated from the model of punctuated and

gradual antigenic evolution. For all plots, divergence is defined as the Hamming distance from each sequence to the

reference strain, and diversity is defined as the average pairwise nucleotide distance of sequences sampled in a given year.
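The two summary statistics defined in the legend are simple to compute from aligned sequences. The following sketch is one way to do so, assuming sequences are equal-length strings and taking the pairwise nucleotide distance to be the Hamming distance; the function names are illustrative and not from the paper.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def divergence(seqs, reference):
    """Hamming distance from each sequence to the reference strain."""
    return [hamming(s, reference) for s in seqs]

def diversity(seqs):
    """Average pairwise distance among sequences sampled in one year."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0  # fewer than two sequences: no pairs to compare
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)
```

For example, `divergence(["AAAA", "AAAT", "AATT"], "AAAA")` gives the per-sequence distances from the reference, and `diversity` applied to the same three sequences averages the three pairwise distances.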


This extension introduces two more parameters: the degree of transmission reduction from the European patch to the North American patch (r_{E→NA}) and the degree of transmission reduction from the North American patch to the European patch (r_{NA→E}). In the simulated years 1963–1986, we assign both patches the same reduction in the degree of transmission (r_{E→NA} = r_{NA→E} = 1/100). During the simulated years 1986–1999, we assume that transmission between the patches ceases (r_{E→NA} = r_{NA→E} = 0), reflecting effective quarantine between the two continents. After 1999, we let r_{E→NA} remain at 0, but set r_{NA→E} back to 1/100, to reflect the hypothesized breakdown of quarantine measures in Europe.
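The quarantine scenario described above amounts to a piecewise-constant schedule for the two coupling parameters. A minimal sketch follows, with hypothetical function names; how the boundary years 1986 and 1999 themselves are treated is an assumption, since the text gives only the intervals. The second function shows how the schedule would enter the infection terms of equation (3.3b) for the North American patch.

```python
def coupling(year):
    """Return (r_E_to_NA, r_NA_to_E) for a simulated calendar year."""
    if year < 1986:
        return 0.01, 0.01   # pre-quarantine: weak coupling in both directions
    if year < 1999:
        return 0.0, 0.0     # quarantine: no transmission between patches
    return 0.0, 0.01        # quarantine weakened in Europe only

def infection_rate_na(S_na, I_na, I_e, N_na, beta, year):
    """Infection terms of (3.3b): local plus imported transmission."""
    r_e_to_na, _ = coupling(year)
    return beta * S_na / N_na * (I_na + r_e_to_na * I_e)
```

Under this schedule, European cases contribute to North American infections only before 1986, whereas North American cases contribute to European infections both before 1986 and after 1999.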

Simulations of this two-patch epidemiological sub-

model yield recurrent outbreaks of EIV in North

America and in Europe (figure 8a,c), frequently

coinciding with the appearance of a new antigenic var-

iant. These patterns are qualitatively similar to the

reported dynamics of EIV, with outbreaks occurring

every 2–8 years and frequently being associated with

new antigenic variants (Waddell et al. 1963; Livesay

et al. 1993; Chambers et al. 1994; Daly et al. 1996;

Ilobi et al. 1998; Damiani et al. 2008). We do not

make an attempt at a quantitative comparison between

model-simulated case dynamics and observed case

dynamics because H3N8 is not a notifiable disease,

and, as such, the data are not sufficient to allow for

this comparison.

Simulations of the epidemiological submodel also

show that the model can reproduce antigenic turnover

patterns similar to those observed (figure 8b,d). Prior

to 1986, both antigenic variants that arose rapidly

spread through the two patches and globally replaced

the previously endemic variant. After the initiation of

the quarantine measures in 1986, a new antigenic var-

iant arose in North America and, later, another in

Europe. The emergence of these independent variants

corresponds qualitatively to the arrival of the American

and European lineages. In 1999, after the quarantine

measures were weakened in Europe, the North American variant appeared in Europe, but did not succeed in replacing the European variant. Thus, we observe in our simulations co-circulation of the European and North American lineages in Europe. Owing to the intact quarantine restrictions in North America, only the North American lineage continues to circulate there. As time goes on, we continue to observe antigenic

[Figure 6 image: (a) cases per 100 000 by year (1998–2008), (b) phylogeny, (c) cases per 100 000 and (d) proportion in cluster against time (years, 5–30), (e) phylogeny; scale bars 0.03 and 0.02.]

Figure 6. Observed versus simulated dynamics of influenza B in humans. (a) As in figure 1a, the case dynamics of influenza in France over the period 1997–2008 are shown in black. Cases attributed to influenza B are shown in red. (b) The phylogeny of influenza B’s HA from isolates spanning the years 1973–2008. Sequences were randomly sampled from the influenza virus resource database. (c) Epidemiological submodel simulations of influenza B case dynamics under a model of both punctuated and gradual antigenic change. Antigenic clusters are colour-coded. (d) Proportion of cases shown in (c) belonging to each antigenic cluster. Parameters used: N = 300 million, R0 = 2, μ = 1/70 yr⁻¹, ν = 1/7 d⁻¹, γ = 1/15 yr⁻¹, ε = 0.25, θ = 0.80, r = 0.005 and Weibull parameters k = 2, λ = 700. (e) Phylogeny of 600 simulated HA sequences, of length l = 1041 nucleotides and spanning 30 years. The GTR + I + Γ parameters were set to those estimated from the influenza B phylogeny shown in (b). These were: (π_A, π_C, π_G, π_T) = (0.35, 0.20, 0.20, 0.25), transition rates: AC = 1.42, AG = 6.15, AT = 0.54, CG = 1.03, CT = 7.19, GT = 1, gamma distribution parameter α = 0.79 and proportion of invariant sites f0 = 0.26. Other parameters for the molecular evolution submodel were μ_nuc = 2.85 × 10⁻³ mutations per nucleotide site per year and f = 0.01.


evolution of both the North American and the

European lineages.

Following the simulation of these variant-specific case dynamics, we simulated the second tier of the model, the molecular evolution submodel, sampling sequences from both continents. The phylogeny reconstructed from the simulated sequences shows both qualitative and quantitative similarities to the tree inferred from observed H3N8 HA sequences (figure 7a,b). Qualitatively, lineage turnover and low genetic diversity are observed in both trees prior to 1986. Both phylogenies then split into two lineages in the late 1980s. In the last decade, both phylogenies show that sequences from the North American lineage were isolated in Europe. In addition to these qualitative similarities, the branch lengths of the tree inferred from simulated sequences are similar to those from the tree inferred from observed sequences.

The application and extension of the two-tiered model to include space in the simulation of equine influenza dynamics testifies to the versatility and adaptability of the model to different host systems and different ecological variables.

4. DISCUSSION

Here, we presented a two-tiered model that can be used

to simulate both the ecological and the evolutionary

dynamics of rapidly evolving RNA viruses. The

model’s novelty resides in its modular design: it separ-

ates antigenic dynamics from genotypic dynamics,

and thereby yields computationally simpler simulations

that allow for a more realistic representation of viral

sequences. At the heart of the two-tiered model is the

antigenic emergence rate, which drives the emergence

dynamics of new antigenic variants in the epidemiologi-

cal submodel. Here, we also showed that this antigenic

emergence rate, when parametrized with a shape par-

ameter k of 2, can be mechanistically interpreted in

[Figure 7 image: phylogenies (a) and (b) with labelled clades: American lineage and sublineages, European lineage, antigenic variants pre-lineage split, pre-quarantine lineages; scale bar 0.04.]

Figure 7. Phylogenies of influenza A H3N8’s HA in equine hosts, reconstructed from empirical and simulated viral sequences. (a) Phylogeny inferred from equine influenza virus H3N8 sequences isolated in Europe and North America. Split of the American and European lineages occurred between 1986 and 1990. Sequences isolated from North America are shown in blue. Sequences isolated from Europe are shown in red. Three antigenic variants have been well characterized: the original Mia/63 antigenic variant (yellow), the Ky/76 variant (purple) and an unnamed variant (orange) that appears antigenically similar to Mia/63 (Hinshaw et al. 1983). (b) Phylogeny inferred from 200 simulated H3N8 HA sequences. Prior to quarantine, there is linear evolution of EIV’s HA until the lineage split in the late 1980s. Viral isolates belonging to the simulated European lineage were only observed in the European patch, whereas viral isolates belonging to the American lineage were isolated in both the North American patch and the European patch. The GTR + I + Γ parameters used in the molecular sequence evolution model were set to those estimated from the influenza A (H3N8) phylogeny shown in (a). These were: (π_A, π_C, π_G, π_T) = (0.35, 0.21, 0.22, 0.22), transition rates: AC = 0.97, AG = 5.65, AT = 0.83, CG = 0.35, CT = 7.74, GT = 1, gamma distribution parameter α = 0.91 and proportion of invariant sites f0 = 0.17. Other parameters for the molecular evolution submodel were μ_nuc = 5.7 × 10⁻³ mutations per nucleotide site per year and f = 0.01.


terms of a model that considers neutral mutation

accumulation and the probability of immune escape

increasing linearly with the number of mutations

already accumulated (appendix A). The phenotypic

dynamics resulting from this first tier of the model are

then used as input for the second tier of the model,

the molecular evolution submodel. We showed here

that this second submodel simulates sequence data

from which quantitative indexes (divergence and diver-

sity metrics) can be computed, which can be compared

with empirical sequence data. Furthermore, phyloge-

nies can be inferred from these simulated sequences,

with branch lengths that are comparable to those

from trees inferred from empirical viral sequences. The

two-tiered model in its entirety can, therefore, be used

to generate case data and sequence data that can be

confronted statistically against empirical datasets of

these two types.
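The remark about a shape parameter k of 2 can be checked numerically. The sketch below evaluates the standard Weibull hazard, h(a) = (k/λ)(a/λ)^(k−1); that the model's emergence rate h(t − t_i^e) takes exactly this form, with a the time since a cluster's emergence, is an assumption here, since the definition lies outside this section. For k = 2 the hazard grows linearly with a, in line with an escape probability that increases linearly with the number of accumulated mutations.

```python
def weibull_hazard(age, k, lam):
    """Standard Weibull hazard (k / lam) * (age / lam) ** (k - 1), age >= 0."""
    if age < 0:
        return 0.0  # before emergence there is nothing to escape from
    return (k / lam) * (age / lam) ** (k - 1)

# With k = 2 the hazard is proportional to age: doubling age doubles it.
h_100 = weibull_hazard(100.0, k=2, lam=500.0)
h_200 = weibull_hazard(200.0, k=2, lam=500.0)
```

The units of `age` and `lam` must match; the values 100, 200 and 500 above are purely illustrative.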

The modularity of the two-tiered model is its prin-

cipal strength and will allow this framework to be

adapted to consider alternative hypotheses and to

include alternative, and potentially better or faster,

submodels. For example, the first tier of the model

uses a status-based, reduced infectivity multi-strain

model in its implementation (Gog & Grenfell 2002).

However, recent work has shown that this model, in

contrast to other, more highly dimensional, multi-

strain models, overestimates the level of herd immu-

nity to a new antigenic variant (Ballesteros et al.

2009). Owing to its modularity, the two-tiered model

can be easily modified to consider alternative epide-

miological submodels, for example, the well-known

history-based multi-strain model (Andreasen et al.

1997). Furthermore, any of these models can be

extended to consider specific questions of interest,

such as what climate variables are important drivers

of influenza’s seasonal dynamics (Shaman et al.

2009), what role population substructure plays in the

ecological and evolutionary dynamics of influenza

(Truscott et al. 2009) and how cross-immunity may

act (e.g. by its separate effects on infectiousness and

infectious period; Park et al. 2009). The only require-

ment of the epidemiological submodel is that it

generates variant-specific case dynamics, which are

used as input into the second tier of the model

(electronic supplementary material, figure S1).

[Figure 8 image: (a), (c) cases per 100 000 and (b), (d) proportion in cluster, plotted against time (years, 1965–2010).]

Figure 8. Simulated epidemiological dynamics of influenza A (H3N8) in equine hosts across North America and Europe. (a) Simulated case dynamics in North America. Antigenic clusters are colour-coded. (b) Proportion of cases in North America belonging to each antigenic cluster. (c) Simulated case dynamics in Europe. Antigenic clusters are colour-coded. (d) Proportion of cases in Europe belonging to each antigenic cluster. The black line in 1986 marks the establishment of quarantine. The black line in 1999 marks the breakdown of quarantine for horses transported from North America to Europe. Parameters used: population sizes N^NA = 9 500 000 and N^E = 4 500 000 (http://www.horsetalk.co.nz/archives/2007/09/105.shtml), R0 = 2, μ = 1/20 yr⁻¹, ν = 1/7 d⁻¹, γ = 8 yr⁻¹, ε = 0, θ = 0.6 and Weibull parameters k = 2, λ = 900. Prior to 1986, r_{E→NA} = r_{NA→E} = 0.01. Between 1986 and 1999, r_{E→NA} = r_{NA→E} = 0. After 1999, r_{NA→E} = 0.01 and r_{E→NA} = 0.


Similarly, the molecular evolution submodel, as

described, can be easily replaced with an alternative sub-

model. One possible alternative submodel that would be

computationally faster, but apply to a more limited

number of cases, might use approaches based on

coalescent theory to yield viral genealogies. A second

possibility is to replace the current molecular evolution

submodel with a model that has a mechanistic link

between the parameter f in the second tier of the

model and the parameter γ in the first tier of the

model. A third possibility would be to consider not

just the process of point mutations, but also to allow

for insertions, deletions, recombination and, for segmen-

ted viruses, reassortment. Here, the only requirement for

the second tier is that it takes in variant-specific case

data and generates time-stamped viral sequences.

Following its description, we applied the two-tiered

model to influenza A (H3N2) in humans, to influenza

B in humans and to influenza A (H3N8) in equine

hosts in order to illustrate its use. In the first appli-

cation, we showed that a model parametrized for a

combination of gradual and punctuated antigenic

change could quantitatively reproduce the ecological

and evolutionary patterns of this subtype in humans.

In contrast, and consistent with previous theoretical

findings (Ballesteros et al. 2009), a model with purely

punctuated antigenic evolution failed to capture these

patterns well. In the electronic supplementary material,

we also showed that only gradual antigenic evolution

was not consistent with all of the observed dynamics

of influenza A (H3N2). The ability of only the model

with both modes of antigenic change to quantitatively

reproduce the dynamic patterns of this variant resolves

the seemingly contradictory findings that antigenic

change occurs either in a punctuated (Smith et al.

2004; Wolf et al. 2006; Blackburne et al. 2008) or in a

gradual (Shih et al. 2007; Suzuki 2008) manner. Both

modes of antigenic evolution appear necessary: gradual

antigenic evolution is needed to reproduce the observed

periodicity of influenza’s ecological dynamics and the

rapid rate of HA divergence, while punctuated antigenic

evolution is needed to reproduce rates of divergence and

the overall ladder-like topology of influenza A (H3N2)’s

HA protein.

The application of the model to influenza B illustrated

the model’s ability to generate qualitatively different eco-

logical and evolutionary dynamics under alternative

parametrizations. This ability is critical for the model

to be effectively interfaced with different empirical data-

sets, through the development and application of new

statistical approaches. The application of the model to

influenza A (H3N8) in equine hosts served to illustrate

the ease with which the model could be extended to

accommodate further hypotheses. Specifically, we con-

sidered the hypothesis that the introduction and later

weakening of quarantine measures between North Amer-

ica and Europe played a role in shaping the evolutionary

dynamics of H3N8. Although the model realizing this

hypothesis was able to reproduce features of the ecologi-

cal and evolutionary dynamics of H3N8, alternative

hypotheses could easily be considered within this frame-

work. A statistical comparison between these models’

simulated sequences (and possibly case dynamics)

could then determine the appropriate level of support

for each of the models considered.

In our applications to flu, we parametrized the two-

tiered model to consider the effects of humoral

immune escape, driven by genetic changes in the

virus’s dominant antigenic protein. While this parame-

trization has empirical support in the case of influenza

(Smith et al. 2004), we may want to consider alternative

mechanisms of immune escape. For example, there is

evidence for positive selection of cytotoxic T lympho-

cyte escape mutants (Gog et al. 2003). Another

possible hypothesis is that generalized immunity plays

a role in shaping the ecological and evolutionary

dynamics of influenza (Ferguson et al. 2003). These hypotheses, like the others mentioned above, could easily be integrated into this two-tiered modelling framework. This integration would enable us to compare these hypotheses quantitatively, considering both incidence data and sequence data.

Although our focus here was on the ecological and

evolutionary dynamics of RNA viruses at the popu-

lation level, the two-tiered structure of the model

could also be used to consider the dynamics at another

level of organization. Specifically, while we modelled the

dynamics of susceptible, infected and recovered hosts

here, within-host dynamics could instead consider

classes of naive cells and cells that are infected with

virus of different antigenic phenotypes. In lieu of simu-

lating epidemiological dynamics, the first tier of the

model would simulate the viral load dynamics, by anti-

genic type. The second tier of the model would then be

used again to generate viral sequences that could be

compared with viral sequences isolated from a single

chronically infected host over several time points (e.g.

in the case of HIV; Shankarappa et al. 1999).

Regardless of whether the two-tiered framework pre-

sented here is applied at the within-host level or the

population level, its ability to generate both case data

and sequence data that can be statistically confronted

with empirical observations will improve our under-

standing of the key drivers of viral dynamics, and may

thereby ultimately help in their control.

We thank the three reviewers for their thoughtful comments

and suggestions and members of the Koelle research group

for detailed feedback. Support for K.K., P.K. and M.K. was provided by grant NSF-EF-08-27416 and by the RAPIDD programme of the Science and Technology Directorate, Department of Homeland Security, and the Fogarty International Center, National Institutes of Health.

APPENDIX A. A MECHANISTIC

INTERPRETATION OF THE WEIBULL

HAZARD FUNCTION

We can interpret the rate of antigenic change increasing

with the age of a variant (as specified by the Weibull

hazard function in equation (2.3)) by considering the

following Markov chain on the non-negative integers.

Denote by $P_n(t)$ the probability mass function for the compound event that, at time $t$, a viral sequence belonging to a given antigenic variant $i$ has accrued $n$ mutations from the founder sequence of the variant

14

Two-tiered phylodynamic model

K. Koelle et al.

J. R. Soc. Interface

on March 26, 2010rsif.royalsocietypublishing.org Downloaded from

Page 15

and that a new antigenic variant $j$ has not yet arisen from the mutations that have accrued in this sequence. Denote by $Q_n(t)$ the probability that, at time $t$, the viral sequence has accrued $n$ mutations and that a new antigenic variant $j$ has already arisen from these accrued mutations. Using these definitions, we can write the following master equation for considering the production of new antigenic variants

$$\frac{dP_n(t)}{dt} = m(1 - a_{n-1})P_{n-1}(t) - mP_n(t) \tag{A 1a}$$

and

$$\frac{dQ_n(t)}{dt} = m a_{n-1}P_{n-1}(t), \tag{A 1b}$$

where $m$ is the per-sequence mutation rate and $a_n$ is the probability that a mutation to an antigenically preserved sequence that has already accrued $n$ mutations will result in a new antigenic variant. As the simplest possible non-trivial assumption, we let this probability be linearly increasing with the number of mutations $n$

$$a_n = rn, \tag{A 1c}$$

where $r$ can be thought of as a parameter that captures, inversely, the antigenic protein’s robustness to genetic change. Let us consider $t^e_i$, the time of emergence of antigenic variant $i$, to be $t = 0$. At this time, any sequence belonging to cluster $i$ is in an unmutated state

$$P_0(t^e_i) = 1; \quad P_n(t^e_i) = 0 \ \text{for all } n > 0 \quad \text{and} \quad Q_n(t^e_i) = 0 \ \text{for all } n.$$

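To solve this system, we can define the generating function

$$g(u,t) = \sum_{n=0}^{\infty} u^n P_n(t). \tag{A 2}$$

Differentiating with respect to $t$, we have

$$\frac{\partial}{\partial t} g(u,t) = m \sum_{n=0}^{\infty} u^n \left[ (1 - r(n-1)) P_{n-1}(t) - P_n(t) \right]. \tag{A 3}$$

Differentiating with respect to $u$, we have

$$\frac{\partial}{\partial u} g(u,t) = \sum_{n=0}^{\infty} n u^{n-1} P_n(t), \tag{A 4}$$

so that

$$\frac{\partial}{\partial t} g(u,t) + m r u^2 \frac{\partial}{\partial u} g(u,t) = m(u - 1) g(u,t). \tag{A 5}$$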
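The step from equations (A 3) and (A 4) to equation (A 5) can be made explicit by shifting the summation index in the first term of equation (A 3). The following is a sketch of that intermediate algebra. It assumes the convention $P_{-1}(t) \equiv 0$, which is not stated in the text but is implied by the chain being defined on the non-negative integers.

```latex
\begin{align*}
\sum_{n=0}^{\infty} u^{n}\,\bigl(1 - r(n-1)\bigr)\,P_{n-1}(t)
  &= \sum_{k=0}^{\infty} u^{k+1}\,(1 - rk)\,P_{k}(t)
     \qquad (k = n-1,\ P_{-1} \equiv 0) \\
  &= u \sum_{k=0}^{\infty} u^{k} P_{k}(t)
     \;-\; r u^{2} \sum_{k=0}^{\infty} k\,u^{k-1} P_{k}(t) \\
  &= u\,g(u,t) \;-\; r u^{2}\,\frac{\partial}{\partial u} g(u,t).
\end{align*}
```

Substituting this into equation (A 3) gives $\partial g/\partial t = m\,[\,u g - r u^2\,\partial g/\partial u - g\,]$, which rearranges to equation (A 5).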

To solve this, we use the method of characteristics. Let $u = y(t,u_0)$ define a family of curves with $u_0 = y(0,u_0)$ that covers the plane such that each point $(t, u)$ in the plane lies on one and only one curve in this family and thus can be traced back uniquely to its ‘origin’ $(0,u_0)$.

We will produce solutions of equation (A 5) defined on each of these curves and then consider the curves together to form the complete solution. Define the function restricted to the curve with origin at $(0,u_0)$

$$\tilde{g}(t,u_0) \equiv g(y(t,u_0),t). \tag{A 6}$$

We now have

$$\frac{d}{dt}\tilde{g}(t,u_0) = \frac{\partial}{\partial t} g(y(t,u_0),t) + \frac{\partial}{\partial u} g(y(t,u_0),t)\,\frac{dy}{dt}(t,u_0). \tag{A 7}$$

Substituting equation (A 5) yields

$$\frac{d}{dt}\tilde{g}(t,u_0) = \frac{\partial}{\partial t} g(y(t,u_0),t) + \left[\frac{m(y-1)\,g(y(t,u_0),t) - (\partial/\partial t)\,g(y(t,u_0),t)}{m r y^2}\right]\frac{dy}{dt}(t,u_0). \tag{A 8}$$

We are free to define our curves any way we choose within the constraints given above, so we let

$$\frac{dy}{dt} = m r y^2, \tag{A 9}$$

with solution

$$y(t,u_0) = \frac{u_0}{1 - m u_0 r t}. \tag{A 10}$$

Now, substituting equation (A 9) into equation (A 8) yields

$$\frac{d}{dt}\tilde{g}(t,u_0) = m(y - 1)\,\tilde{g}(t,u_0). \tag{A 11}$$

With the additional substitution of equation (A 10), this yields

$$\frac{d}{dt}\tilde{g}(t,u_0) = m\left[\frac{u_0(1 + m r t) - 1}{1 - m u_0 r t}\right]\tilde{g}(t,u_0), \tag{A 12}$$

which itself has solution

$$\tilde{g}(t,u_0) = \tilde{g}(0,u_0)\,e^{-(1/r)\log(1 - u_0 m r t) - mt} = \tilde{g}(0,u_0)\,(1 - u_0 m r t)^{-1/r}e^{-mt}. \tag{A 13}$$

The initial condition at time $t = 0$ becomes $\tilde{g}(0,u_0) = 1$, leaving

$$\tilde{g}(t,u_0) = (1 - u_0 m r t)^{-1/r}e^{-mt}. \tag{A 14}$$

Finally, we trace back from $u_0$ to $u$, solving equation (A 10) for $u_0$

$$u_0 = \frac{u}{1 + u m r t}, \tag{A 15}$$

to get

$$g(u,t) = (1 + u m r t)^{1/r}e^{-mt}. \tag{A 16}$$

The probability of escape by time $t$ is

$$P_E(t) = 1 - g(1,t) = 1 - (1 + m r t)^{1/r}e^{-mt}. \tag{A 17}$$
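The closed-form escape probability in equation (A 17) can be checked against a direct Monte Carlo simulation of the Markov chain in equations (A 1). The sketch below is illustrative only and is not the authors' simulation code: the function names, the parameter values (m = 1, r = 0.01) and the sample size are hypothetical choices, and the escape probability per mutation is capped at 1, which the derivation itself does not do.

```python
import math
import random


def escape_probability_exact(t, m, r):
    """Equation (A 17): P_E(t) = 1 - (1 + m r t)^(1/r) exp(-m t).

    Written with log1p so that small values of r do not overflow.
    """
    return 1.0 - math.exp(math.log1p(m * r * t) / r - m * t)


def sample_escape_time(m, r, rng):
    """Draw one escape time from the chain of equations (A 1).

    Mutations arrive at rate m. A mutation hitting a sequence that has
    already accrued n mutations produces a new antigenic variant with
    probability a_n = r * n (capped at 1 here; the cap is an assumption).
    """
    n = 0      # mutations accrued so far without antigenic escape
    t = 0.0    # time since emergence of the variant
    while True:
        t += rng.expovariate(m)            # waiting time to the next mutation
        if rng.random() < min(1.0, r * n):
            return t                       # this mutation causes escape
        n += 1                             # antigenically neutral mutation


def escape_probability_mc(t, m, r, n_samples=20000, seed=1):
    """Fraction of simulated sequences that have escaped by time t."""
    rng = random.Random(seed)
    hits = sum(sample_escape_time(m, r, rng) <= t for _ in range(n_samples))
    return hits / n_samples


if __name__ == "__main__":
    m, r, t = 1.0, 0.01, 10.0              # hypothetical parameter values
    print("exact:", escape_probability_exact(t, m, r))
    print("Monte Carlo:", escape_probability_mc(t, m, r))
```

With these values the expected number of mutations by time t is far below 1/r, so the cap on a_n is essentially never reached and the two estimates should agree to within sampling error.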


Page 16

To leading order, we have

$$\log(1 - P_E(t)) = -\tfrac{1}{2}\,r m^2 t^2 + O((mrt)^3), \tag{A 18}$$

or

$$P_E(t) \approx 1 - e^{-(1/2)\,r m^2 t^2}. \tag{A 19}$$

The density function for the time of antigenic escape is just the time derivative of $P_E$. To leading order, this is

$$f_E(t) = (r m^2)\,t\,e^{-(1/2)(r m^2)t^2}, \tag{A 20}$$

which is a Weibull density function with shape parameter $k = 2$ and $\lambda = \sqrt{2/(m^2 r)}$. A mutational process for which the probability of antigenic immune escape increases linearly with the number of mutations already accrued can, therefore, be modelled phenomenologically with a hazard rate with shape parameter $k = 2$ and a value of $\lambda$ that depends on the viral mutation rate $m$ and its sensitivity to phenotypic change $r$.

Several modifications to this mechanistic model can be considered. First, instead of assuming that the probability of antigenic immune escape is linearly increasing with the number of accrued mutations (equation (A 1c)), we can consider an alternative function that also increases with the number of accrued mutations but has the desired feature of saturating at 1. One such function is $a_n = rn/(rn + 1)$. However, this function and those of a similar form do not have a closed-form solution for the probability of escape by time $t$. Although the linear function we use in the derivation of the Weibull hazard function (equation (A 1c)) allows the probability of antigenic immune escape upon the next mutation to exceed 1, we can simulate the epidemiological submodel in a parameter regime where we are ensured that this underlying probability (which we do not explicitly use in the simulations) would never get sufficiently close to 1.

Second, instead of using the hazard rate of the Weibull density function shown in equation (A 20), one could use an exact expression for the rate of generating new antigenic variants that is derived from the Markov model shown in equations (A 1). This rate would be based on the exact expression for the cumulative distribution function shown in equation (A 17). In our simulations, we instead used the Weibull functional form because it is a simple and well-known distribution that has the ageing properties that we find biologically necessary.
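The quality of the Weibull approximation in equations (A 19) and (A 20) can be gauged numerically by comparing it with the exact escape probability of equation (A 17). The following is a minimal sketch under stated assumptions: the function names and the parameter values (m = 1, r = 0.01) are hypothetical, and the hazard uses the standard Weibull form h(t) = (k/λ)(t/λ)^(k-1), which for k = 2 and λ = sqrt(2/(m² r)) reduces to r m² t.

```python
import math


def escape_cdf_exact(t, m, r):
    """Exact escape probability from equation (A 17)."""
    return 1.0 - math.exp(math.log1p(m * r * t) / r - m * t)


def weibull_scale(m, r):
    """Scale parameter lambda = sqrt(2 / (m^2 r)) that goes with shape k = 2."""
    return math.sqrt(2.0 / (m * m * r))


def weibull_cdf(t, k, lam):
    """Weibull cumulative distribution function 1 - exp(-(t / lam)^k)."""
    return 1.0 - math.exp(-((t / lam) ** k))


def weibull_hazard(t, k, lam):
    """Standard Weibull hazard h(t) = (k / lam) * (t / lam)^(k - 1)."""
    return (k / lam) * (t / lam) ** (k - 1)


if __name__ == "__main__":
    m, r = 1.0, 0.01                      # hypothetical parameter values
    lam = weibull_scale(m, r)
    print("t  exact  weibull  hazard")
    for t in (1.0, 5.0, 10.0, 20.0):
        print(t,
              round(escape_cdf_exact(t, m, r), 4),
              round(weibull_cdf(t, 2, lam), 4),
              round(weibull_hazard(t, 2, lam), 4))
```

Because the neglected terms in equation (A 18) are of order (mrt)³, the two curves agree closely while mrt is small and drift apart as the variant ages.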

REFERENCES

Andreasen, V., Lin, J. & Levin, S. A. 1997 The dynamics of co-circulating influenza strains conferring partial cross-immunity. J. Math. Biol. 35, 825–842. (doi:10.1007/s002850050079)

Ballesteros, S., Vergu, E. & Cazelles, B. 2009 Influenza A gradual and epochal evolution: insights from simple models. PLoS ONE 4, e7426. (doi:10.1371/journal.pone.0007426)

Berg, M., Desselberger, U., Abusugra, I. A., Klingeborn, B. & Linné, T. 1990 Genetic drift of equine 2 influenza A virus (H3N8), 1963–1988: analysis by oligonucleotide mapping. Vet. Microbiol. 22, 225–236. (doi:10.1016/0378-1135(90)90109-9)

Berton, M. T., Naeve, C. W. & Webster, R. G. 1984 Antigenic structure of the influenza B virus hemagglutinin: nucleotide sequence analysis of antigenic variants selected with monoclonal antibodies. J. Virol. 52, 919–927.

Blackburne, B. P., Hay, A. J. & Goldstein, R. A. 2008 Changing selective pressure during antigenic changes in human influenza H3. PLoS Pathog. 4, e1000058. (doi:10.1371/journal.ppat.1000058)

Bobashev, G. V., Ellner, S. P., Nychka, D. W. & Grenfell, B. T. 2000 Reconstructing susceptible and recruitment dynamics from measles incidence data. Math. Popul. Stud. 8, 1–29. (doi:10.1080/08898480009525471)

Bryant, N. A. et al. 2009 Antigenic and genetic variations in European and North American equine influenza virus strains (H3N8) isolated from 2006 to 2007. Vet. Microbiol. 138, 41–52. (doi:10.1016/j.vetmic.2009.03.004)

Chambers, T. M., Lai, A. C. K., Franklin, K. M. & Powell, D. G. 1994 Recent evolution of the hemagglutinin of equine-2 influenza virus in the USA. In Proc. 7th Int. Conf. Equine Infectious Diseases, Tokyo, 1994, pp. 175–180. Newmarket, UK: R & W Publications (Newmarket).

Cobey, S. & Koelle, K. 2008 Capturing escape in infectious disease dynamics. Trends Ecol. Evol. 23, 572–577. (doi:10.1016/j.tree.2008.06.008)

Daly, J. M., Lai, A. C., Binns, M. M., Chambers, T. M., Barrandeguy, M. & Mumford, J. A. 1996 Antigenic and genetic evolution of equine H3N8 influenza A viruses. J. Gen. Virol. 77, 661–671. (doi:10.1099/0022-1317-77-4-661)

Damiani, A. M. et al. 2008 Genetic characterization of equine influenza viruses isolated in Italy between 1999 and 2005. Virus Res. 131, 100–105. (doi:10.1016/j.virusres.2007.08.001)

Daniels, R. S., Skehel, J. J. & Wiley, D. C. 1985 Amino acid sequences of haemagglutinins of influenza viruses of the H3 subtype isolated from horses. J. Gen. Virol. 66, 457–464. (doi:10.1099/0022-1317-66-3-457)

Endo, A., Pecoraro, R., Sugita, S. & Nerome, K. 1992 Evolutionary pattern of the H3 haemagglutinin of equine influenza viruses: multiple evolutionary lineages and frozen replication. Arch. Virol. 123, 73–87. (doi:10.1007/BF01317139)

Felsenstein, J. 2004 Inferring phylogenies. Sunderland, MA: Sinauer Associates, Inc.

Ferguson, N. M., Galvani, A. P. & Bush, R. M. 2003 Ecological and immunological determinants of influenza evolution. Nature 422, 428–433. (doi:10.1038/nature01509)

Finkenstädt, B. & Grenfell, B. 2000 Time series modelling of childhood diseases: a dynamical systems approach. J. R. Stat. Soc. Ser. C 49, 187–205. (doi:10.1111/1467-9876.00187)

Fitch, W. M., Bush, R. M., Bender, C. A. & Cox, N. J. 1997 Long term trends in the evolution of H(3) HA1 human influenza type A. Proc. Natl Acad. Sci. USA 94, 7712–7718. (doi:10.1073/pnas.94.15.7712)

Fraser, C. et al. 2009 Pandemic potential of a strain of influenza A (H1N1): early findings. Science 324, 1557–1561. (doi:10.1126/science.1176062)

Gill, P. W. & Murphy, A. M. 1977 Naturally acquired immunity to influenza type A: a further prospective study. Med. J. Aust. 2, 761–765.

Gillespie, D. T. 2007 Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem. 58, 35–55. (doi:10.1146/annurev.physchem.58.032806.104637)

Girvan, M., Callaway, D. S., Newman, M. E. J. & Strogatz, S. H. 2002 Simple model of epidemics with pathogen mutation. Phys. Rev. E 65, 1–9.


Page 17

Gog, J. R. & Grenfell, B. T. 2002 Dynamics and selection of many-strain pathogens. Proc. Natl Acad. Sci. USA 99, 17209–17214. (doi:10.1073/pnas.252512799)

Gog, J. R., Rimmelzwaan, G. F., Osterhaus, A. D. M. E. & Grenfell, B. T. 2003 Population dynamics of rapid fixation in cytotoxic T lymphocyte escape mutants of influenza A. Proc. Natl Acad. Sci. USA 100, 11143–11147. (doi:10.1073/pnas.1830296100)

Gökaydin, D., Oliveira-Martins, J. B., Gordo, I. & Gomes, M. G. M. 2007 The reinfection threshold regulates pathogen diversity: the case of influenza. J. R. Soc. Interface 4, 137–142. (doi:10.1098/rsif.2006.0159)

Grenfell, B. T., Pybus, O. G., Gog, J. R., Wood, J. L. N., Daly, J. M., Mumford, J. A. & Holmes, E. C. 2004 Unifying the epidemiological and evolutionary dynamics of pathogens. Science 303, 327–332. (doi:10.1126/science.1090727)

Hinshaw, V. S., Naeve, C. W., Webster, R. G., Douglas, A., Skehel, J. J. & Bryans, J. 1983 Analysis of antigenic variation in equine 2 influenza A viruses. Bull. World Health Org. 61, 153–158.

Holmes, E. C. & Grenfell, B. T. 2009 Discovering the phylodynamics of RNA viruses. PLoS Comput. Biol. 5, e1000505. (doi:10.1371/journal.pcbi.1000505)

Ilobi, C. P., Nicolson, C., Taylor, J., Mumford, J. A., Wood, J. M. & Robertson, J. S. 1998 Direct sequencing of the HA gene of clinical equine H3N8 influenza virus and comparison with laboratory derived viruses. Arch. Virol. 143, 891–901. (doi:10.1007/s007050050340)

Ionides, E. L., Bretó, C. & King, A. A. 2006 Inference for nonlinear dynamical systems. Proc. Natl Acad. Sci. USA 103, 18438–18443. (doi:10.1073/pnas.0603181103)

Kawaoka, Y., Bean, W. J. & Webster, R. G. 1989 Evolution of the hemagglutinin of equine H3 influenza viruses. Virology 169, 283–292. (doi:10.1016/0042-6822(89)90153-0)

King, A. A., Ionides, E. L., Pascual, M. & Bouma, M. J. 2008 Inapparent infections and cholera dynamics. Nature 454, 877–880. (doi:10.1038/nature07084)

Koelle, K. & Pascual, M. 2004 Disentangling extrinsic from intrinsic factors in disease dynamics: a nonlinear time series approach with an application to cholera. Am. Nat. 163, 901–913. (doi:10.1086/420798)

Koelle, K., Cobey, S., Grenfell, B. & Pascual, M. 2006 Epochal evolution shapes the phylodynamics of influenza A (H3N2) in humans. Science 314, 1898–1903. (doi:10.1126/science.1132745)

Koelle, K., Kamradt, M. & Pascual, M. 2009 Understanding the dynamics of rapidly evolving pathogens through modeling the tempo of antigenic change: influenza as a case study. Epidemics 1, 129–137. (doi:10.1016/j.epidem.2009.05.003)

Lai, A. C., Rogers, K. M., Rogers, K. M. & Chamber, T. 2004 Alternate circulation of recent equine-2 influenza viruses (H3N8) from two distinct lineages in the United States. Virus Res. 100, 159–164. (doi:10.1016/j.virusres.2003.11.019)

Lau, K. F. & Dill, K. A. 1990 Theory for protein mutability and biogenesis. Proc. Natl Acad. Sci. USA 87, 638–642. (doi:10.1073/pnas.87.2.638)

Lindstrom, S., Endo, A., Pecoraro, R., Sugita, S., Hiromoto, Y. & Nerome, K. 1994 Genetic divergency of equine (H3N8) influenza viruses: cocirculation of earliest and recent strains. In Proc. 7th Int. Conf. Equine Infectious Diseases, Tokyo, 1994, p. 307. Newmarket, UK: R & W Publications (Newmarket).

Livesay, G. J., O’Neill, T., Hannant, D., Yadav, M. P. & Mumford, J. A. 1993 The outbreak of equine influenza (H3N8) in the United Kingdom in 1989: diagnostic use of an antigen capture ELISA. Vet. Rec. 133, 515–519.

Minayev, P. & Ferguson, N. 2009a Improving the realism of deterministic multi-strain models: implications for modelling influenza A. J. R. Soc. Interface 6, 509–518. (doi:10.1098/rsif.2008.0333)

Minayev, P. & Ferguson, N. 2009b Incorporating demographic stochasticity into multi-strain epidemic models: application to influenza A. J. R. Soc. Interface 6, 989–996. (doi:10.1098/rsif.2008.0467)

Monto, A. S. & Kioumehr, F. 1975 The Tecumseh study of respiratory illness. IX. Occurrence of influenza in the community, 1966–1971. Am. J. Epidemiol. 102, 553–563.

Nakagawa, N., Kubota, R., Maeda, A., Nakagawa, T. & Okuno, Y. 2000 Heterogeneity of influenza B virus strains in one epidemic season differentiated by monoclonal antibodies and nucleotide sequences. J. Clin. Microbiol. 38, 3467–3469.

Nakagawa, N., Kubota, R., Morikawa, S., Nakagawa, T., Baba, K. & Okuno, Y. 2001a Characterization of new epidemic strains of influenza B virus by using neutralizing monoclonal antibodies. J. Med. Virol. 65, 745–750. (doi:10.1002/jmv.2099)

Nakagawa, N., Kubota, R., Nakagawa, T. & Okuno, Y. 2001b Antigenic variants with amino acid deletions clarify a neutralizing epitope specific for influenza B virus Victoria group strains. J. Gen. Virol. 82, 2169–2172.

Nakagawa, N., Nukuzuma, S., Haratome, S., Go, S., Nakagawa, T. & Hayashi, K. 2002 Emergence of an influenza B virus with antigenic change. J. Clin. Microbiol. 40, 3068–3070. (doi:10.1128/JCM.40.8.3068-3070.2002)

Nakajima, S., Nakajima, K. & Kendal, A. P. 1983 Identification of the binding sites to monoclonal antibodies on A/USSR/90/77 (H1N1) hemagglutinin and their involvement in antigenic drift in H1N1 influenza viruses. Virology 131, 116–127. (doi:10.1016/0042-6822(83)90538-X)

Nerome, R., Hiromoto, Y., Sugita, S., Tanabe, N., Ishida, M., Matsumoto, M., Lindstrom, E., Takahashi, T. & Nerome, K. 1998 Evolutionary characteristics of influenza B virus since its first isolation in 1940: dynamic circulation of deletion and insertion mechanism. Arch. Virol. 143, 1569–1583. (doi:10.1007/s007050050399)

Nobusawa, E. & Sato, K. 2006 Comparison of the mutation rates of human influenza A and B viruses. J. Virol. 80, 3675–3678. (doi:10.1128/JVI.80.7.3675-3678.2006)

Oxburgh, L. & Klingeborn, B. 1999 Cocirculation of two distinct lineages of equine influenza virus subtype H3N8. J. Clin. Microbiol. 37, 3005–3009.

Oxburgh, L., Berg, M., Klingeborn, B., Emmoth, E. & Linné, T. 1994 Evolution of H3N8 equine influenza virus from 1963 to 1991. Virus Res. 34, 153–165. (doi:10.1016/0168-1702(94)90097-3)

Oxburgh, L., Akerblom, L., Fridberger, T., Klingeborn, B. & Linné, L. 1998 Identification of two antigenically and genetically distinct lineages of H3N8 equine influenza virus in Sweden. Epidemiol. Infect. 120, 61–70. (doi:10.1017/S0950268897008315)

Park, A., Daly, J. M., Lewis, N. S., Smith, D. J., Wood, J. L. N. & Grenfell, B. T. 2009 Quantifying the impact of immune escape on transmission dynamics of influenza. Science 326, 726–728. (doi:10.1126/science.1175980)

Plotkin, J. B., Dushoff, J. & Levin, S. A. 2002 Hemagglutinin sequence clusters and the antigenic evolution of influenza A virus. Proc. Natl Acad. Sci. USA 99, 6263–6268. (doi:10.1073/pnas.082110799)

Recker, M., Pybus, O. G., Nee, S. & Gupta, S. 2007 The generation of influenza outbreaks by a network of host


Page 18

immune responses against a limited set of antigenic types.

Proc. Natl Acad. Sci. USA 104, 7711–7716. (doi:10.1073/

pnas.0702154104)

Rota, P. A., Hemphill, M. L., Whistler, T., Regnery, H. L. &

Kendal, A. P. 1992 Antigenic and genetic characterization

of the haemagglutinins of recent cocirculating strains of

influenza B virus. J. Gen. Virol. 73, 2737–2742. (doi:10.

1099/0022-1317-73-10-2737)

Shaman, J., Pitzer, V., Cecile, V., Marc, L. & Bryan, G. 2009 Absolute humidity and the seasonal onset of influenza in the continental US. PLoS Curr. Influenza. See http://knol.google.com/k/plos/plos-currents-influenza/28qm4w0q65e4w/1.

Shankarappa, R. et al. 1999 Consistent viral evolutionary changes associated with the progression of human immunodeficiency virus type 1 infection. J. Virol. 73, 10489–10502.

Shih, A. C., Hsiao, T., Ho, M.-S. & Li, W.-H. 2007 Simultaneous amino acid substitutions at antigenic sites drive influenza A hemagglutinin evolution. Proc. Natl Acad. Sci. USA 104, 6283–6288. (doi:10.1073/pnas.0701396104)

Smith, D. J., Lapedes, A. S., de Jong, J. C., Bestebroer, T. M., Rimmelzwaan, G. F., Osterhaus, A. D. M. E. & Fouchier, R. A. M. 2004 Mapping the antigenic and genetic evolution of influenza virus. Science 305, 371–376. (doi:10.1126/science.1097211)

Suzuki, Y. 2008 Positive selection operates continuously on hemagglutinin during evolution of H3N2 human influenza A virus. Gene 427, 111–116. (doi:10.1016/j.gene.2008.09.012)

Tria, F., Lässig, M., Peliti, L. & Franz, S. 2005 A minimal stochastic model for influenza evolution. J. Stat. Mech. P07008, 1–11.

Truscott, J., Fraser, C., Hinsley, W., Cauchemez, S., Donnelly, C., Ghani, A. & Ferguson, N. 2009 Quantifying the transmissibility of human influenza and its seasonal variation in temperate regions. PLoS Curr. Influenza. See http://knol.google.com/k/plos/plos-currents-influenza/28qm4w0q65e4w/1.

van Nimwegen, E. 2006 Influenza escapes immunity along

neutral networks. Science 314, 1884–1886. (doi:10.1126/

science.1137300)

Waddell, G. H., Teigland, M. B. & Siegel, M. M. 1963 A new

influenza virus associated with equine respiratory disease.

J. Am. Vet. Med. Assoc. 143, 587–590.

Webster, R. G. 1993 Are equine 1 influenza viruses still

present in horses? Equine Vet. J. 25, 537–538.

Webster, R. G. & Laver, W. G. 1980 Determination of the

number of nonoverlapping antigenic areas on Hong Kong

(H3N2) influenza virus hemagglutinin with monoclonal

antibodies and the selection of variants with potential epi-

demiological significance. Virology 104, 139–148. (doi:10.

1016/0042-6822(80)90372-4)

Wiley, D. C., Wilson, I. A. & Skehel, J. J. 1981 Structural

identification of the antibody-binding sites of Hong Kong

influenza haemagglutinin and their involvement in anti-

genic variation. Nature 289, 373–378. (doi:10.1038/

289373a0)

Wilson, I. A. & Cox, N. J. 1990 Structural basis of immune recognition of influenza virus hemagglutinin. Annu. Rev. Immunol. 8, 737–787. (doi:10.1146/annurev.iy.08.040190.003513)

Wolf, Y. I., Viboud, C., Holmes, E. C., Koonin, E. V. & Lipman, D. J. 2006 Long intervals of stasis punctuated by bursts of positive selection in the seasonal evolution of influenza A virus. Biol. Direct 1, 34. (doi:10.1186/1745-6150-1-34)

Xia, Y., Gog, J. R. & Grenfell, B. T. 2005 Semiparametric estimation of the duration of immunity from infectious disease time-series: influenza as a case study. Appl. Stat. 54, 659–672. (doi:10.1111/j.1467-9876.2005.05383.x)


Page 19


Supplemental material for:

A two-tiered model for simulating the ecological and evolutionary dynamics

of rapidly evolving viruses, with an application to influenza

Katia Koelle, Priya Khatri, Meredith Kamradt, Thomas B. Kepler

Page 20


A schematic of the two-tiered model. Figure S1 below provides a schematic to assist in

the visualization of the two-tiered model’s structure and the flow of data between its

submodels.

Figure S1. A schematic of the two-tiered model. The first tier consists of an epidemiological submodel (a) that is used to simulate variant-specific case data (b). In the main text, we use the status-based reduced infectivity model as our epidemiological multi-strain model (Gog & Grenfell 2002). Gray arrows represent events. The I → R transition reflects recovery, the R → S transition reflects loss of immunity, the S → I transition reflects infection, with cross-immunity acting to transition S → R in a variant with cross-immunity. The transition I → I reflects an antigenic emergence event. Variant-specific case data (b) are used as input into the second tier of the model, the molecular evolution submodel (c). Forward simulations incorporating changes in the number of infected hosts and mutations yield sequence samples at specific timepoints (purple arrows). These sequences are used to generate viral phylogenies as well as time-dependent patterns of viral divergence and diversity (d).

Page 21


The dynamics of influenza A (H3N2) under a model of only gradual antigenic

evolution. In the main text, we considered the dynamics of influenza A (H3N2) under (i)

a model of purely punctuated antigenic evolution, and (ii) a model of antigenic evolution

that includes both punctuated and gradual antigenic change. We showed that the second

model is better able to reproduce the ecological and evolutionary dynamics of influenza

A (H3N2). Here we consider a third possibility, namely that the dynamics of this subtype

in humans is governed by only gradual antigenic evolution.

To consider this possibility, we parameterize the epidemiological submodel (equations (4) and (5) in the main text) with a waning immunity rate γ above zero and an antigenic emergence rate h(t) of zero (this is done by setting λ of the Weibull hazard function to infinity), resulting in a simple SIRS model. Specifying γ as 1/8 yr⁻¹ (Dushoff et al. 2004), simulations of the epidemiological submodel resulted in strictly annual cycles, with incidence levels significantly lower than empirically observed (Figure S2a versus Figure 1a). These epidemiological results are consistent with a previous model (the antigenic tempo model (Koelle et al. 2009)) that models gradual antigenic change mechanistically by having new amino acid variants only weakly escape herd immunity (Figure S2b). In both models, the lack of interannual variability in the simulated time series and their low attack rates yield dynamics that are inconsistent with those observed for influenza A (H3N2).
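To make the "simple SIRS model" limit concrete, the following is a minimal single-variant sketch with a waning immunity rate of 1/8 per year and sinusoidal seasonal forcing. It is not the status-based multi-strain model of the main text and does not attempt to reproduce Figure S2a. Apart from the waning rate, every value here is a hypothetical placeholder: the recovery rate `nu`, the basic reproduction number `r0`, the forcing amplitude `eps`, the initial conditions and the step size are assumptions chosen only so that the example runs.

```python
import math


def sirs_rhs(t, state, beta0, eps, nu, gamma):
    """Right-hand side of a seasonally forced SIRS model (host fractions)."""
    s, i, r = state
    beta = beta0 * (1.0 + eps * math.cos(2.0 * math.pi * t))  # t in years
    infection = beta * s * i
    return (-infection + gamma * r,      # dS/dt: infection out, waning in
            infection - nu * i,          # dI/dt: infection in, recovery out
            nu * i - gamma * r)          # dR/dt: recovery in, waning out


def rk4_step(f, t, state, dt, *args):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(t, state, *args)
    k2 = f(t + dt / 2, tuple(x + dt / 2 * k for x, k in zip(state, k1)), *args)
    k3 = f(t + dt / 2, tuple(x + dt / 2 * k for x, k in zip(state, k2)), *args)
    k4 = f(t + dt, tuple(x + dt * k for x, k in zip(state, k3)), *args)
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def simulate_sirs(years=20.0, dt=1.0 / 1460.0, gamma=1.0 / 8.0,
                  nu=365.0 / 4.0, r0=2.0, eps=0.1, s0=0.5, i0=1e-3):
    """Integrate the model; returns a list of (t, S, I, R) tuples.

    gamma is the waning immunity rate (per year) quoted in the text;
    all other defaults are hypothetical.
    """
    beta0 = r0 * nu
    state = (s0, i0, 1.0 - s0 - i0)
    trajectory = [(0.0,) + state]
    for step in range(int(round(years / dt))):
        state = rk4_step(sirs_rhs, step * dt, state, dt, beta0, eps, nu, gamma)
        trajectory.append(((step + 1) * dt,) + state)
    return trajectory


if __name__ == "__main__":
    for t, s, i, r in simulate_sirs(years=10.0)[::1460]:
        print(f"year {t:4.1f}  S={s:.3f}  I={i:.5f}  R={r:.3f}")
```

In this sketch the antigenic emergence rate is simply absent, which corresponds to the λ → ∞ limit of the Weibull hazard described above.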

Page 22


Figure S2. Simulated influenza A (H3N2) case dynamics under a model of only gradual antigenic evolution. (a) Simulations using the two-tiered model presented in the main text. Parameters are as in Figure 3, with the exception of γ, which here is 1/8 yr⁻¹, and λ, which here is infinity. (b) Simulations using the antigenic tempo model (Koelle et al. 2009), letting each amino acid variant be considered its own distinct phenotype. Reproduced from (Koelle et al. 2009). Colored lines in (b) show the dynamics of the amino acid variants.

To simulate the molecular evolution of HA under this model, we parameterize the

molecular evolution submodel with values of f above zero. Simulations across a wide

range of f values yield the divergence and diversity patterns shown in Figure S3. A comparison between divergence patterns reconstructed from simulated sequences and those empirically observed (Figure S3, top row) shows that a model of gradual antigenic evolution yields lower rates of divergence than are empirically observed. A slowdown in the rate of divergence is also evident in each of these subplots, but is absent from the empirical data’s divergence plot (Figure S3, top left subplot). A comparison between diversity plots reconstructed from simulated

sequences versus from empirical sequences also shows some differences (Figure S3,

bottom row). Specifically, with weak within-cluster selection (f = 0.01 and to a lesser

extent, f = 0.02), diversity grows to too high a level. With stronger within-cluster

selection (f = 0.1 and f = 0.3), diversity levels resemble those empirically observed,

Page 23


although diversity equilibrates to a slightly higher level in the simulated data

compared to the empirical data. One can easily argue though that high values of f

effectively act as punctuated immune escape, resulting in a rapid replacement of viral

sequences.

Figure S3. Divergence and diversity patterns for the ecological dynamics shown in Figure

S2, for various values of f. Top row: patterns of viral divergence. Bottom row: patterns

of viral diversity. The first column shows divergence and diversity patterns for empirical

sequence data. The remaining columns correspond to these patterns for various values of

f: f = 0, f = 0.01, f = 0.02, f = 0.1, and f = 0.3.
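Divergence and diversity patterns of the kind shown in Figure S3 can be computed from sequence samples in a few lines. The sketch below assumes working definitions that are not given in this section: divergence is taken as the mean Hamming distance between the sequences sampled at one time point and a reference (founder) sequence, and diversity as the mean pairwise Hamming distance among those sampled sequences. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def divergence(samples, reference):
    """Mean Hamming distance from each sampled sequence to the reference."""
    return sum(hamming(s, reference) for s in samples) / len(samples)


def diversity(samples):
    """Mean pairwise Hamming distance among sampled sequences."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0                      # fewer than two sequences sampled
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    founder = "AAAA"                    # toy reference sequence
    sampled = ["AAAT", "AATT", "AAAA"]  # toy sample from one time point
    print("divergence:", divergence(sampled, founder))
    print("diversity:", diversity(sampled))
```

Applying both functions to the samples drawn at each time point gives time series that can be set against their empirical counterparts.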

Finally, we inferred phylogenies for the sequences simulated for the above cases (f = 0, f

= 0.01, f = 0.02, f = 0.1, and f = 0.3). The resulting phylogenies are shown in Figure S4.

Clearly, with values of f above 0.01, the model with gradual antigenic evolution, as

specified, can yield ladderlike phylogenies. However, we do not think that these results

indicate that gradual antigenic evolution can easily yield ladderlike phylogenies; rather,

we think that the molecular evolution submodel we use in the second tier of our model is

not structurally appropriate to consider the case of only gradual antigenic evolution. The

reason for this is that the molecular evolution submodel assumes that sequences that are

genetically more distant from a cluster founder have higher relative fitness (when f > 0).

This is an acceptable assumption when a variant has recently emerged, because its

emergence results in a peak of cases which are genetically identical or very similar to the

cluster founder. However, if a variant persists in a host population for a longer duration

of time (the extreme of which is considered in a model of only gradual antigenic

Page 24


evolution), then the relative fitness of viral sequences is unlikely to depend on their

distances from the variant founder. Instead, their relative fitness levels are determined by

the swarm of sequences that have recently circulated in the population. We are currently

working on alternative forms of the molecular evolution submodel that take into

consideration this effect.

Figure S4. Phylogenies inferred from simulated sequences. Phylogenies shown in (a)

through (e) were inferred from simulated sequences, with f = 0, f = 0.01, f = 0.02, f = 0.1,

and f = 0.3, respectively.

An alternative approach for considering only gradual antigenic evolution is to

parameterize the two-tiered model with a γ value of zero, a low value for λ, a value of k

of one, and a very high value of θ. This parameterization yields an interpretation of an

antigenic variant as a unique amino acid sequence. Simulations of this system yield

explosive viral diversity, in contrast to the limited viral diversity observed empirically

Page 25


(Koelle et al. 2009) (Figure S5). The results shown in Figures S3 and S4 for gradual

antigenic evolution should therefore be interpreted with caution, with future work

needing to focus on a more appropriate molecular evolution submodel.

Figure S5. Strain diversity over time for a model with only gradual antigenic change.

Here, each amino acid sequence is considered its own variant. Unlike in Figure S3, viral

diversity continues to grow. Figure reproduced from (Koelle et al. 2009).

Supplemental References

Dushoff, J., Plotkin, J. B., Levin, S. A. & Earn, D. J. D. 2004 Dynamical resonance can

account for seasonality of influenza epidemics. Proceedings of the National

Academy of Sciences 101, 16915-16916.

Gog, J. R. & Grenfell, B. T. 2002 Dynamics and selection of many-strain pathogens.

Proceedings of the National Academy of Sciences 99, 17209-17214.

Koelle, K., Kamradt, M. & Pascual, M. 2009 Understanding the dynamics of rapidly

evolving pathogens through modeling the tempo of antigenic change: Influenza as

a case study. Epidemics 1, 129-137.