
A two-tiered model for simulating the ecological and evolutionary dynamics of rapidly evolving viruses, with an application to influenza

Katia Koelle1,2,*, Priya Khatri1, Meredith Kamradt1

and Thomas B. Kepler3

1Department of Biology, Duke University, PO Box 90338, Durham, NC 27708, USA

2Fogarty International Center, National Institutes of Health, Bethesda, MD 20892, USA

3Center for Computational Immunology, Department of Biostatistics and Bioinformatics,

Duke University Medical Center, PO Box 2734, Durham, NC 27705, USA

Understanding the epidemiological and evolutionary dynamics of rapidly evolving pathogens

is one of the most challenging problems facing disease ecologists today. To date, many math-

ematical and individual-based models have provided key insights into the factors that may

regulate these dynamics. However, in many of these models, abstractions have been made

to the simulated sequences that limit an effective interface with empirical data. This is

especially the case for rapidly evolving viruses in which de novo mutations result in antigeni-

cally novel variants. With this focus, we present a simple two-tiered ‘phylodynamic’ model

whose purpose is to simulate, along with case data, sequence data that will allow for a

more quantitative interface with observed sequence data. The model differs from previous

approaches in that it separates the simulation of the epidemiological dynamics (tier 1)

from the molecular evolution of the virus’s dominant antigenic protein (tier 2). This separ-

ation of phenotypic dynamics from genetic dynamics results in a modular model that is

computationally simpler and allows sequences to be simulated with specifications such as

sequence length, nucleotide composition and molecular constraints. To illustrate its use, we

apply the model to influenza A (H3N2) dynamics in humans, influenza B dynamics in

humans and influenza A (H3N8) dynamics in equine hosts. In all three of these illustrative

examples, we show that the model can simulate sequences that are quantitatively similar

in pattern to those empirically observed. Future work should focus on statistical estimation

of model parameters for these examples as well as the possibility of applying this model, or

variants thereof, to other host–virus systems.

Keywords: disease dynamics; viral evolution; multi-strain model;

influenza; phylodynamics

1. INTRODUCTION

The ecological and evolutionary dynamics of many

RNA viruses have been increasingly well described

over the last several decades, yet the factors driving

their dynamics are still only poorly understood. One

approach towards identifying key factors is through

the formulation of mathematical models that, when

analysed analytically or simulated, yield quantitative

predictions of the case dynamics and the evolutionary

dynamics of the viral population (Holmes & Grenfell

2009). In the case of antigenically variable viruses,

these ‘phylodynamic’ models (Grenfell et al. 2004) fre-

quently incorporate multiple antigenically distinct

strains and keep track of either the immune status or

the infection histories of individuals in the host

population.

These multi-strain models range in complexity from

the very simple and abstract (e.g. Girvan et al. 2002;

Tria et al. 2005) to the more complex and biologically

realistic (e.g. Ferguson et al. 2003; Koelle et al. 2006).

Many of them yield dynamics that are qualitatively

consistent with the dynamics they seek to reproduce.

For example, when parametrized with a short duration

of infection, the status-based multi-strain model devel-

oped by Gog & Grenfell (2002) yields self-organized

sets of strains that turn over in time, consistent with

empirical patterns of influenza. Other examples are

the phylodynamic models developed by Ferguson

et al. (2003) and Koelle et al. (2006), both of which

yield case dynamics and viral diversity patterns that

are qualitatively similar to those observed and

*Author for correspondence (katia.koelle@duke.edu).

Electronic supplementary material is available at http://dx.doi.org/

10.1098/rsif.2010.0007 or via http://rsif.royalsocietypublishing.org.

J. R. Soc. Interface

doi:10.1098/rsif.2010.0007

Published online

Received 7 January 2010

Accepted 4 March 2010


This journal is © 2010 The Royal Society

Downloaded from rsif.royalsocietypublishing.org on March 26, 2010


phylogenies that resemble the known topology of influ-

enza’s haemagglutinin (HA) protein. Although these

multi-strain models, among others, have been able to

simulate dynamics that are consistent with particular

features of the observed data, many of these models

embody different mechanistic hypotheses about what

factors play dominant roles in shaping the dynamics.

For example, the model by Gog & Grenfell (2002) con-

siders only strain-specific immunity, whereas the model

by Ferguson et al. (2003) considers the additional role

that generalized immunity may play in shaping the

evolutionary dynamics of influenza’s HA. The model

by Koelle et al. (2006) considers a third hypothesis:

that the evolutionary dynamics of influenza’s HA are

shaped by periodic selective sweeps occurring during

antigenic cluster transitions.

Given this growing set of phylodynamic models that

differ in their mechanistic hypotheses, determining

which model performs best when confronted statisti-

cally with observed data is now necessary. In the case

of phylodynamic models, these data come in two

forms: epidemiological (case) data and evolutionary

(sequence) data. Interfacing disease models with case

data has a long history (e.g. Bobashev et al. 2000;

Finkenstädt & Grenfell 2000; Koelle & Pascual 2004;

Xia et al. 2005; Ionides et al. 2006; King et al. 2008),

with a subset of these analyses focusing on parameter

estimation and model selection for antigenically variable

RNA viruses (Xia et al. 2005; Fraser et al. 2009). How-

ever, phylodynamic models have to date not routinely

been tested statistically against observed sequence data.

A quantitative comparison of simulated sequence

data against observed sequence data could focus on a

number of different sequence-derived patterns. These

include divergence and diversity patterns, as well as

quantitative comparisons of phylogenies reconstructed

from simulated and observed sequences. Although many

phylodynamic models have considered at least one of

these patterns (Girvan et al. 2002; Ferguson et al. 2003;

Tria et al. 2005; Koelle et al. 2006; Minayev & Ferguson

2009a,b), the comparisons against observed data have

been only qualitative in nature. The reason for this limit-

ation lies in the current inability of these models to

capture these patterns quantitatively. This does not

imply that these models are missing the relevant pro-

cesses at play; rather, the quantitative mismatch

between model-simulated sequence data and empirical

sequence data results from the way in which sequences

have been represented in these models. Specifically,

phylodynamic models to date have simplified the rep-

resentation of viral sequences by considering bitstrings

(Girvan et al. 2002; Tria et al. 2005), a subset of codons

(Ferguson et al. 2003; Koelle et al. 2006) or a limited

number of antigenic loci (Recker et al. 2007; Minayev &

Ferguson 2009a,b). These sequence representations have

made the models computationally tractable at the cost

of simulating sequences that differ in length (or in struc-

ture) from the empirical sequences with which they are

being compared. In the case of the models that simulate

a subset of codons, a quantitative comparison could of

course be made between empirical and simulated

sequences, if only a subset of the empirical sequences

were considered. However, considering a subset of sites

introduces several difficulties. First, which subset should

be used? Our understanding of which sites are important

for antigenic change is still incomplete. Second, if differ-

ent phylodynamic models represent their sequences

differently, a quantitative comparison against sequence

data would use different subsets of the data. This

would result in models not being compared against the

same sequence dataset, making difficult the process of

model selection.

To enable a quantitative comparison between simu-

lated and observed sequence data, we here develop a

new phylodynamic model that makes explicit the differ-

ence between antigenic change and genetic change, and

thereby makes it computationally feasible to model

sequences in their entirety. Specifically, the model we for-

mulate consists of two tiers. The first tier of the model

simulates the ecological dynamics of the virus and its

antigenic phenotypes. As such, it builds conceptually on

the idea that strain phenotypes (i.e. antigenic variants

or clusters), instead of genotypes, can be used as the ‘fun-

damental particle’ for modelling RNA viruses such as

influenza (Plotkin et al. 2002). More recently, Gökaydin

et al. (2007) and Ballesteros et al. (2009) have used this

phenotype-level approach to consider the invasion

dynamics of a new antigenic cluster into a host population

with a resident cluster. Most relevant to the research pre-

sented here, Koelle et al. (2009) have recently introduced

an antigenic tempo model, which, in a modified version,

we use here as the first tier of our two-tiered model.

The second tier of the model simulates the molecular

evolution of a virus’s antigenic protein. It does so by

taking as given the epidemiological dynamics simulated

in the first tier of the model. Biologically, this separation

of ecological dynamics from evolutionary dynamics is of

course absurd: the molecular changes of a virus drive the

emergence of new antigenic variants, and therewith affect

the case dynamics. However, the effect of molecular

changes on the epidemiological dynamics is indirect, with

the link being the dynamics of the antigenic phenotypes.

As such, to simulate case dynamics, a phenomenological

model that reproduces the emergence dynamics of anti-

genic variants can be considered. It is this modular

separation of phenotypic dynamics from genotypic

dynamics that simplifies the computational complexity of

the simulations and thereby allows us to simulate viral

sequences that can be statistically compared with empirical

sequence data. Below, we first describe the epidemiological

submodel and detail (in appendix A) how this model can

be mechanistically interpreted in terms of mutations that

enable immune escape. We then describe the second tier

of the model, the molecular evolution submodel. A sche-

matic overview of the two tiers of the model and the flow

of simulated data is shown in the electronic supplementary

material, figure S1. The Matlab source code is available for

download from the corresponding author’s website.

As described here, the two-tiered model assumes that

viruses evolve antigenically in a punctuated manner. As

such, mutations are assumed to be antigenically neutral

or nearly so most of the time, with only the rare

mutation resulting in a large antigenic change. This

model is, therefore, a simplification of the previously

published phylodynamic model of Koelle et al. (2006),

which hypothesizes that occasional antigenic innovations


Two-tiered phylodynamic model

K. Koelle et al.


and the selective sweeps that accompany their emergence are the two key factors that drive the evolutionary dynamics of influenza A (H3N2)’s HA in humans. The model presented here improves upon this previous model in terms of both its simplicity and its ability to reproduce the quantitative patterns of the observed sequence data.

Following the description of the model, we provide three case studies to illustrate the flexibility of the model and the diversity of dynamics that it can generate. The first application is to influenza A (H3N2) in humans, the second to influenza B in humans and the third to H3N8 in equine hosts. Our first application to influenza A (H3N2) shows that gradual antigenic evolution within antigenic clusters is necessary to reproduce the ecological and evolutionary patterns of the observed data. To our knowledge, it is the first phylodynamic model that can quantitatively reproduce the known patterns of viral diversity and divergence over time. Our second application demonstrates that the model under a different parametrization can generate the emergence and maintenance of two viral lineages, consistent with the evolutionary patterns observed for influenza B. Our third application serves to illustrate the possibility of extending the model. Specifically, we extend the two-tiered model to two patches (representative of North America and Europe) to show that it can reproduce the evolutionary dynamics of influenza A (H3N8) in equine hosts subject to transatlantic quarantine measures.

Although all three of our applications focus on influenza, this two-tiered model is not limited to this virus, as other RNA viruses (e.g. HIV and norovirus) appear to evolve by punctuated immune escape (Cobey & Koelle 2008). The model is also not limited to the assumption of punctuated antigenic evolution, as will be discussed in §4, although it is in this case that the modular advantages of this model are most clearly evident.

2. THE TWO-TIERED MATHEMATICAL

MODEL

2.1. Tier 1: the epidemiological submodel

To model the virus’s epidemiological dynamics, we

improve on the antigenic tempo model recently pre-

sented elsewhere (Koelle et al. 2009). This model

starts with a given multi-strain model formulation,

interpreting strains in terms of major antigenic variants

instead of in terms of unique genotypes. As in Koelle

et al. (2009), we use a status-based approach to model-

ling strain interactions (Gog & Grenfell 2002), which

assumes polarized immunity. The deterministic dynamics of susceptible and infected individuals belonging to a major antigenic variant $i$ are captured by equations of the form

$$\frac{dS_i}{dt} = \mu (N - S_i) - \sum_{j=1}^{n} \beta \frac{S_i}{N} \sigma_{ij} I_j + \gamma (N - S_i - I_i) \qquad (2.1)$$

and

$$\frac{dI_i}{dt} = \beta \frac{S_i}{N} I_i - (\mu + \nu) I_i - h(t - t_i^e) I_i, \qquad (2.2)$$

where $N$ is the population size, $\mu$ is the birth rate and the death rate, $\gamma$ is the rate of within-variant waning immunity, $\nu$ is the recovery rate, $\beta$ is the transmission rate and $n$ is the total number of antigenic variants that have circulated in the population up to time $t$. The dynamics of individuals immune to variant $i$, $R_i$, are not shown, as they can be easily computed by $R_i = N - S_i - I_i$. As previously described in Koelle et al. (2009), the degree of immunity between variants $i$ and $j$ is given by $\sigma_{ij} = \theta^{l_{ij}}$, where $\theta$ is the degree of immunity between a mother–daughter variant pair and $l_{ij}$ is the antigenic kinship level between variants $i$ and $j$. $h(t - t_i^e)$ is the per capita antigenic emergence rate, defined as the rate at which individuals infected with variant $i$ give rise to a new antigenic variant. We allow this rate to depend on the age of the variant, $t - t_i^e$, where $t_i^e$ is the time at which variant $i$ emerged in the population. Specifically, using an approach similar to the punctuated model of antigenic change detailed in Koelle et al. (2009), we model the per capita antigenic emergence rate, $h(t - t_i^e)$, phenomenologically by using a Weibull hazard function, with scale parameter $\lambda$ and shape parameter $k$

$$h(t - t_i^e) = \frac{k}{\lambda} \left( \frac{t - t_i^e}{\lambda} \right)^{k-1}. \qquad (2.3)$$

When $k = 1$, this function reduces to a constant rate of antigenic change, which many previous multi-strain models have assumed. However, we consider the case of $k > 1$, such that the rate of antigenic change

increases with the age of the variant. This phenomeno-

logical increase in the rate of phenotypic change can be

interpreted in several ways. First, it is consistent with

‘rules of thumb’ that have been developed to predict

the emergence of antigenically novel variants. These

rules of thumb generally specify that a certain number

of amino acid changes in epitope regions are required

to precipitate a major antigenic change (Wiley et al.

1981; Wilson & Cox 1990). Because it takes time to

accumulate these amino acid changes, an endemic

viral variant that has been circulating in a population

is more likely to give rise to a new variant the longer

it persists in the population. Second, an increase in

the rate of antigenic change with variant age can be

understood in terms of neutral networks in genotype

space (Lau & Dill 1990; Koelle et al. 2006; van

Nimwegen 2006). As an antigenic variant ages, the

sequences that it comprises are accumulating neutral

or nearly neutral mutations that are changing the gen-

etic backgrounds of the sequences. Effectively, this

exploration of sequence space can therefore lead to a

per capita rate of antigenic emergence that increases

with variant age. Third, and more formally, we can

mechanistically interpret the rate of antigenic change

increasing with the age of a variant by considering a

Markov model that considers mutation accumulation.

We outline this model in appendix A.
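As a minimal illustration of equation (2.3), the Weibull hazard can be evaluated directly. The sketch below is not the authors' Matlab code; the function name and the parameter values are hypothetical and chosen only to show that a shape parameter of 1 gives a constant emergence rate, whereas a shape parameter above 1 gives a rate that rises with variant age.

```python
def weibull_hazard(age, scale, shape):
    """Per capita antigenic emergence rate h(age), following equation (2.3).

    age   : time since the variant emerged (t - t_i^e), same units as scale
    scale : Weibull scale parameter (lambda in the text)
    shape : Weibull shape parameter (k in the text)
    """
    if age < 0 or scale <= 0 or shape <= 0:
        raise ValueError("age must be >= 0; scale and shape must be > 0")
    return (shape / scale) * (age / scale) ** (shape - 1)


# Hypothetical values, for illustration only (not taken from the paper).
ages = [0.5, 1.0, 2.0, 4.0]
constant = [weibull_hazard(a, scale=2.0, shape=1.0) for a in ages]
increasing = [weibull_hazard(a, scale=2.0, shape=3.0) for a in ages]
```

With these illustrative values, `constant` holds the same rate at every age, while `increasing` grows with age, which is the behaviour the text attributes to the case of a shape parameter greater than 1.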

The model specified by equations (2.1)–(2.3), and

schematically shown as the first tier in the electronic

supplementary material, figure S1, differs from the orig-

inal model formulation described in Koelle et al. (2009)

in two ways. First, we allow for the possibility of waning


immunity within each antigenic variant (which occurs when $\gamma > 0$), such that the dynamics within each antigenic variant are governed by susceptible-infected-recovered-susceptible (SIRS) dynamics instead of susceptible-infected-recovered (SIR) dynamics. Our model is therefore more general, allowing for SIR dynamics when $\gamma = 0$. Second, we simplify the model by simulating it stochastically in its entirety instead of using the stochastic hybrid approach described in Koelle et al. (2009). This change to a fully stochastic simulation removes the need to simulate the additional variables that are used to determine the emergence times of new phenotypes. Table 1 maps the deterministic equations constituting the model (equations (2.1)–(2.3)) into Markov chain events and their associated transition rates, which are used to simulate the model stochastically with the Gillespie $\tau$-leap algorithm (Gillespie 2007).

The majority of the events and their rates shown in table 1 are frequently used in stochastic simulations of disease dynamics. The one exception is the antigenic emergence event, which brings phenotypic novelty into the viral population. When an antigenic emergence event occurs for variant $i$ in the simulation, it results in an individual infected with variant $j$, where $j = n + 1$ and $n$ is the number of variants that have been in the population up to time $t$. The event also results in a decrease in the number of individuals infected with variant $i$ from $I_i$ to $I_i - 1$ (table 1). The number of hosts susceptible to variant $j$ can be computed at the time of variant $j$’s emergence if the numbers of births, deaths and infections have been tracked over the simulation.
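The event rates of table 1 and a single leap of a Poisson-based τ-leap scheme can be sketched as follows. This is a simplified illustration for one focal variant, not the authors' implementation: cross-immunity between variants and the application of the state changes are omitted, the function names are hypothetical, and the state and parameter values are invented for the example.

```python
import math
import random


def event_rates(S, I, R, mu, beta, nu, gamma, h):
    """Total rates of the table 1 events for one focal variant.

    h is the current value of the emergence hazard h(t - t_i^e).
    """
    N = S + I + R
    return {
        "birth": mu * N,
        "death": mu * N,
        "possible_infection": beta * I,
        "recovery": nu * I,
        "loss_of_immunity": gamma * R,
        "antigenic_emergence": h * I,
    }


def poisson(mean, rng):
    """Knuth's Poisson sampler; adequate for the small means of a short leap."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def tau_leap_counts(rates, tau, rng):
    """Number of times each event fires during a leap of length tau."""
    return {name: poisson(rate * tau, rng) for name, rate in rates.items()}


# Hypothetical state and parameter values, for illustration only.
rng = random.Random(1)
rates = event_rates(S=900, I=50, R=50, mu=0.01, beta=0.5, nu=0.2, gamma=0.1, h=0.0)
counts = tau_leap_counts(rates, tau=0.1, rng=rng)
```

A full simulation would still need to apply these counts to the state, for example by converting each possible infection into an actual infection with probability $S_i/N$ and by allocating deaths among the S, I and R classes as described in the caption of table 1.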

2.2. Tier 2: the molecular evolution submodel

The second tier of the model consists of a molecular

evolution submodel, which generates a set of time-

stamped viral sequences from which a phylogeny can

be inferred and from which diversity and divergence

patterns can be constructed. This submodel uses as

input the variant-specific disease dynamics from the

epidemiological submodel, but does not provide any

feedback to it (electronic supplementary material,

figure S1). To simulate the molecular evolution sub-

model, a desired number of sampled sequences s is

first specified. Times of isolation are then randomly

assigned to each of these s sequences, taking into con-

sideration the number of infected individuals at each

time point. This is done by setting the probability of

the time of isolation being day $k$ as $I(k)/\sum_i I(i)$, where $I(k)$ is the total number of individuals infected on day $k$ and index $i$ sums over all days of the simulation. Each time-stamped sequence is then probabilistically assigned an antigenic variant to which it belongs by letting the probability that the sequence belongs to variant $j$ be given by $I_j(k)/\sum_i I_i(k)$, where $k$ is the day of the sequence’s isolation and index $i$ sums over all antigenic variants present on day $k$.

A desired nucleotide length $l$ for the viral sequences is then specified, along with a mutation rate $\mu_{\mathrm{nuc}}$ in units of nucleotide changes per site per year. The per-sequence mutation rate $\mu$ is given by the product $\mu_{\mathrm{nuc}} l$. Finally, we specify a given model of sequence evolution and provide parameters associated with this model. The model of sequence evolution chosen can be extremely simple and parameter sparse. For example, Kimura’s two-parameter model requires only one parameter for the transition rate and one parameter for the transversion rate (Felsenstein 2004). Alternatively, the model chosen can be more complicated. For example, the general time reversible (GTR) model with a proportion of invariant sites (I) and a gamma

Table 1. The Markov chain events and their transition rates used to stochastically simulate the epidemiological submodel. Events are shown for a focal variant $i$. Events and transition rates for all other variants $j$ in the set {1, 2, 3, ..., $n$} are analogous. The population size $N$ is given by $S_i + I_i + R_i$. Several constraints exist in the system and are respected during the simulation. Because $S_j + I_j + R_j = N$ for all variants $j$, a birth event, occurring at rate $\mu N$, increases the number of susceptible hosts for each variant $j$ by 1 (i.e. births do not occur independently for each variant $j$). Similarly, a death, also occurring at rate $\mu N$, decreases the number of hosts S, I or R to each variant $j$ by 1. This decrease is taken from $S_j$, $I_j$ or $R_j$ with probability $S_j/N$, $I_j/N$ and $R_j/N$, respectively. The rate of possible infection with variant $j$ (related to the force of infection) is given by $\beta I_j$. When $i$ is equal to $j$, a ‘possible infection’ event results in an infection with probability $S_i/N$. When $i$ is not equal to $j$, this possible infection event results in a gain of immunity through polarized cross-immunity with probability $\sigma_{ij}(S_i/N)$. Recovery of an individual infected with variant $i$ and the loss of immunity to variant $i$ occur at rates $\nu I_i$ and $\gamma R_i$, respectively. An antigenic emergence event results in a decrease in the number of individuals infected with variant $i$ and the stochastic appearance of a new antigenic variant, variant $n + 1$.

| event | change | rate |
| --- | --- | --- |
| birth | $(S_i, I_i, R_i) \to (S_i + 1, I_i, R_i)$ | $\mu N$ |
| death | $(S_i, I_i, R_i) \to (S_i - 1, I_i, R_i)$ with probability $S_i/N$; $(S_i, I_i, R_i) \to (S_i, I_i - 1, R_i)$ with probability $I_i/N$; $(S_i, I_i, R_i) \to (S_i, I_i, R_i - 1)$ with probability $R_i/N$ | $\mu N$ |
| possible infection | for $i = j$: $(S_i, I_i, R_i) \to (S_i - 1, I_i + 1, R_i)$ with probability $S_i/N$; $(S_i, I_i, R_i) \to (S_i, I_i, R_i)$ with probability $(1 - S_i/N)$. For $i \neq j$: $(S_i, I_i, R_i) \to (S_i - 1, I_i, R_i + 1)$ with probability $\sigma_{ij} S_i/N$; $(S_i, I_i, R_i) \to (S_i, I_i, R_i)$ with probability $(1 - \sigma_{ij} S_i/N)$ | $\beta I_j$ |
| recovery | $(S_i, I_i, R_i) \to (S_i, I_i - 1, R_i + 1)$ | $\nu I_i$ |
| loss of immunity | $(S_i, I_i, R_i) \to (S_i + 1, I_i, R_i - 1)$ | $\gamma R_i$ |
| antigenic emergence | $(S_i, I_i, R_i) \to (S_i, I_i - 1, R_i + 1)$ and $(S_{n+1}, 0, R_{n+1}) \to (S_{n+1} - 1, 1, R_{n+1})$ | $h(t - t_i^e) I_i$ |


distribution of rate variation ($\Gamma$) would entail specifying 11 parameters: the frequencies of the four nucleotide bases, the transition rates between each pair of bases, the proportion of invariant sites and a parameter $\alpha$ that controls the shape of the gamma distribution (Felsenstein 2004).

Under a given model of sequence evolution and its associated parameter values, site-specific mutation rates are then assigned to each of the $l$ nucleotide locations. Under Kimura’s two-parameter model, the mutation rate of each site is $\mu_{\mathrm{nuc}}$. Under the GTR + I + $\Gamma$ model, the mutation rate at each site is assigned based on the proportion of invariant sites I and the $\alpha$ parameter of the $\Gamma$ distribution, such that the per-sequence mutation rate comes out to be $\mu$.
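The sampling step described earlier in this section, in which isolation days are drawn in proportion to the number of infected individuals and each sampled sequence is then assigned to a variant in proportion to that variant's prevalence on its isolation day, can be sketched as below. The function names and the toy prevalence values are hypothetical, and the sketch assumes the tier 1 output is available as daily infected counts.

```python
import random


def sample_isolation_days(infected_per_day, s, rng):
    """Draw s isolation days; day k is chosen with probability I(k) / sum_i I(i)."""
    days = range(len(infected_per_day))
    return sorted(rng.choices(days, weights=infected_per_day, k=s))


def assign_variant(infected_by_variant_on_day, rng):
    """Pick variant j with probability I_j(k) / sum_i I_i(k) for one day k.

    infected_by_variant_on_day maps a variant label to its infected count.
    """
    variants = list(infected_by_variant_on_day)
    weights = [infected_by_variant_on_day[v] for v in variants]
    return rng.choices(variants, weights=weights, k=1)[0]


# Toy tier 1 output, for illustration only: total infected on days 0..3.
rng = random.Random(0)
days = sample_isolation_days([0, 10, 0, 30], s=25, rng=rng)
variant = assign_variant({1: 0, 2: 5}, rng)
```

Days with no infected individuals are never selected, and a variant with no infected individuals on the isolation day is never assigned, which matches the proportional weighting described in the text.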

To begin the molecular evolution submodel simu-

lation, a single sequence of length l is generated, with

each site probabilistically assigned a nucleotide depend-

ing on base frequencies specified by the model of

sequence evolution. This sequence belongs to antigenic

variant $i = 1$, and all infected individuals on day 0 of

the simulation are infected with this strain. Starting

with the first antigenic variant, all sampling times of

the (genetically yet undetermined) sequences that

belong to variant 1 are found, as are all the emergence

times of variants that were generated specifically by this

variant. Then, starting from day 0, from each day to the

next, two processes occur: mutation and transmission of

extant sequences. Mutation is simulated by first deter-

mining the number of mutations that will occur on a

particular day. This number is determined by drawing

from a Poisson distribution with mean $\mu I$, where $I$ is the number of individuals infected on a particular day. These mutations occur in the viral population (of size $I$) at sites that are chosen randomly, weighted by their per site mutation rate $\mu_{\mathrm{nuc}}$. A mutation assigned

at a chosen nucleotide site in a chosen individual results

in a nucleotide change that reflects the transition rates

given by the model of sequence evolution. Transmission

is simulated by first calculating, from one day to the

next, the number of new infections and the number of

recoveries, both only of the first variant. New infections

are then drawn from the extant pool of viral sequences

belonging to variant 1 and recoveries are chosen from

this same pool. Recoveries always occur randomly,

while new infections are chosen depending on the

selective advantage of each viral sequence.
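The daily mutation step described above can be sketched as follows, assuming a Kimura two-parameter style substitution in which a mutation is a transition with a fixed probability and otherwise one of the two transversions. This is an illustrative sketch rather than the authors' code: the function names, the per-day rate argument (assumed to be the per-sequence rate already converted to daily units) and all numerical values are hypothetical.

```python
import math
import random

TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def poisson(mean, rng):
    """Knuth's Poisson sampler; adequate for small daily means."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def mutate_base(base, transition_prob, rng):
    """Return a transition with probability transition_prob, else a transversion."""
    if rng.random() < transition_prob:
        return TRANSITION[base]
    return rng.choice(TRANSVERSIONS[base])


def mutation_step(sequences, daily_rate_per_sequence, site_rates, transition_prob, rng):
    """Apply one day's mutations in place and return how many occurred."""
    n_mutations = poisson(daily_rate_per_sequence * len(sequences), rng)
    sites = range(len(site_rates))
    for _ in range(n_mutations):
        host = rng.randrange(len(sequences))  # infected individual, chosen uniformly
        site = rng.choices(sites, weights=site_rates, k=1)[0]  # weighted by site rate
        old = sequences[host]
        new_base = mutate_base(old[site], transition_prob, rng)
        sequences[host] = old[:site] + new_base + old[site + 1:]
    return n_mutations


# Toy pool in which the first four sites are invariant (rate 0).
rng = random.Random(7)
pool = ["ACGTACGT"] * 20
n = mutation_step(pool, 0.1, [0.0] * 4 + [1.0] * 4, 0.67, rng)
```

Setting a site's rate to zero is one simple way to represent invariant sites; under a model with gamma-distributed rate variation the entries of `site_rates` would instead be drawn from that distribution.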

The selective advantage of a sequence is given by $fk$, where $k$ is the sequence’s nucleotide distance from the founder sequence of the variant and $f$ is a parameter specifying the intensity of the selective advantage a single nucleotide change confers within a single antigenic variant. In the case of purely neutral evolution within antigenic variants ($\gamma = 0$ in equation (2.1)), $f = 0$, such that new infections are chosen randomly from the pool of extant sequences. In the case of gradual antigenic evolution within antigenic variants ($\gamma > 0$ in equation (2.1)), $f > 0$, such that new infections are preferentially chosen from the set of extant sequences that have higher divergence from their founder sequence. (Clearly, there should be a quantitative relationship between the parameters $\gamma$ and $f$, with higher values of $f$ associated with higher values of $\gamma$. Unfortunately, at this point, we do not have an equation mechanistically linking the value of $f$ to the value of $\gamma$ as we have for linking $\lambda$ to $\mu$ (appendix A).) The implementation of the transmission step thereby captures both demographic stochasticity in disease transmission and the possibility for within-variant selection to act.
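One way to realize the selection-weighted choice of transmitting sequences is sketched below. The text gives the selective advantage as $fk$ but does not state how it is turned into a sampling weight; the weight $1 + fk$ used here is an assumption made for illustration, chosen so that $f = 0$ recovers uniform random choice. Function names and values are hypothetical.

```python
import random


def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def selection_weights(pool, founder, f):
    """Sampling weight 1 + f*k per sequence (an assumed form, not from the paper)."""
    return [1.0 + f * hamming(seq, founder) for seq in pool]


def choose_transmitters(pool, founder, f, n_new, rng):
    """Draw the sequences that seed n_new infections, with replacement."""
    return rng.choices(pool, weights=selection_weights(pool, founder, f), k=n_new)


# Toy pool: the second sequence is one mutation away from the founder.
rng = random.Random(3)
pool = ["AAAA", "AAAT"]
neutral = selection_weights(pool, "AAAA", f=0.0)
selected = selection_weights(pool, "AAAA", f=0.5)
new_infections = choose_transmitters(pool, "AAAA", f=0.5, n_new=10, rng=rng)
```

Under this assumed weighting, sequences further from the founder are drawn more often when $f > 0$, and all sequences are equally likely when $f = 0$.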

Throughout the day-to-day simulation of mutations

and transmission events, sampling times of sequences

belonging to variant 1 will be reached, as will emergence

times of variants that arose from variant 1. When a

sampling time is reached, a viral sequence is randomly

chosen from the pool of infected individuals; this simu-

lated sequence is now one of the s sequences that can be

used in inferring a phylogeny. When an emergence time is reached, a viral sequence is randomly chosen from the pool of infected individuals and mutated at a single nucleotide location; this simulated sequence is now the founder of the antigenic variant born of variant 1 at that time. Alternatively, to be strictly consistent with the Markov chain process detailed above, the viral sequence can be chosen preferentially according to its nucleotide distance $k$ from its founder sequence.

The same iterative procedure of mutation, transmission and sequence sampling is then performed for variant 2 through to the last variant observed in the simulation. The set of sampled sequences that results from this simulation can then be compared against empirical sequence data. Although the simulated sequences will differ from the observed ones genetically, patterns of sequence evolution (e.g. divergence and diversity patterns) can be compared quantitatively across these datasets, and a phylogeny can be inferred from all $s$ sampled sequences and compared against phylogenies inferred from empirical sequences.
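Divergence and diversity summaries of a set of sampled sequences can be computed as in the sketch below. The definitions used here are common ones and are assumptions on our part, since this section does not define them: divergence is taken as the mean number of nucleotide differences from a reference sequence, and diversity as the mean pairwise number of differences among sequences sampled in the same time window. Function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def divergence(sequences, reference):
    """Mean number of nucleotide differences from the reference sequence."""
    if not sequences:
        raise ValueError("need at least one sequence")
    return sum(hamming(s, reference) for s in sequences) / len(sequences)


def diversity(sequences):
    """Mean pairwise number of nucleotide differences among the sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Toy sample, for illustration only.
sample = ["AAAA", "AAAT", "AATT"]
div = divergence(sample, reference="AAAA")
pi = diversity(sample)
```

Applying the same two functions to simulated and empirical sequences, grouped by sampling year, would give summaries that can be compared on a common scale.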

3. AN APPLICATION OF THE MODEL TO

INFLUENZA DYNAMICS IN HUMANS

To illustrate its use, we simulate the two-tiered model

described above with parameter values that are reason-

able for influenza. We use influenza as an application

because sequence data are readily available for its

dominant antigenic protein, the HA, and because the

advantages of the two-tiered model in terms of its

modular design are most evident for a system like influ-

enza, where many mutations in the HA protein have

been shown to be neutral or nearly so and single

mutations have been shown to result in large antigenic

changes (Webster & Laver 1980; Nakajima et al. 1983;

Berton et al. 1984; Nakagawa et al. 2000, 2001a,b,

2002; Smith et al. 2004).

3.1. Simulating the dynamics of influenza A

(H3N2) in humans

Influenza A subtype H3N2 has been circulating in humans

since 1968, when a reassortment event between the pre-

viously circulating influenza A subtype, H2N2, and a

swine influenza virus resulted in its pandemic spread.

From 1968 until the present, this subtype has dominated

the flu season, with annual H3N2 attack rates estimated
