
arXiv:q-bio/0703040v1 [q-bio.PE] 18 Mar 2007

Gene surfing in expanding populations

Oskar Hallatschek∗ and David R. Nelson

Lyman Laboratory of Physics, Harvard University, Cambridge, Massachusetts 02138, USA

(Dated: February 6, 2008)

Spatially resolved genetic data are increasingly used to reconstruct the migrational history of species. To assist such inference, we study, by means of simulations and analytical methods, the dynamics of neutral gene frequencies in a population undergoing a continual range expansion in one dimension. During such a colonization period, lineages can fix at the wave front by means of a “surfing” mechanism [Edmonds C.A., Lillie A.S. & Cavalli-Sforza L.L. (2004) Proc Natl Acad Sci USA 101: 975-979]. We quantify this phenomenon in terms of (i) the spatial distribution of lineages that reach fixation and, closely related, (ii) the continual loss of genetic diversity (heterozygosity) at the wave front, characterizing the approach to fixation. Our simulations show that an effective population size can be assigned to the wave that controls the (observable) gradient in heterozygosity left behind the colonization process. This effective population size is markedly higher in pushed waves than in pulled waves, and increases only sub-linearly with deme size. To explain these and other findings, we develop a versatile analytical approach, based on the physics of reaction-diffusion systems, that yields simple predictions for any deterministic population dynamics.

Population expansions in space are common events in the evolutionary history of many species [1, 2, 3, 4, 5, 6, 7] and have a profound effect on their genealogy. It is widely appreciated that any range expansion leads to a reduction of genetic diversity (“Founder Effect”) because the gene pool for the new habitat is provided only by a small number of individuals, who happen to arrive in the unexplored territory first. In many species, the genetic footprints of these pioneers are still recognizable today and provide information about the migrational history of the species. For instance, a frequently observed south-north gradient in genetic diversity (“southern richness to northern purity” [8]) in the northern hemisphere is thought to reflect the range expansions induced by the glacial cycles. In the case of humans, the genetic diversity decreases essentially linearly with increasing geographic distance from Africa [2, 3], which is indicative of the human migration out of Africa. It is hoped [9] that the observed patterns of neutral genetic diversity can be used to infer details of the corresponding colonization pathways.

Such an inference requires an understanding of how a colonization process generates a gradient in genetic diversity, and which parameters chiefly control the magnitude of this gradient. Traditional models of population genetics [10], which mainly focus on populations of constant size and distribution, apply to periods before and after a range expansion has occurred, when the population is at demographic equilibrium. However, the spatio-temporal dynamics in the transition period, on which we focus in this article, is less amenable to the standard analytical tools of population genetics, and has so far been studied mostly by means of simulations [11, 12, 13, 14, 15]. An analytical understanding, in terms of recurrence relations, is available only for a linear stepping stone model in which demes (lattice sites) are colonized one after the other, either following deterministic logistic growth [16] or instantaneously [17].

∗To whom correspondence should be addressed. E-mail: ohallats@physics.harvard.edu

Recent computer studies suggest that the neutral genetic patterns created by a propagating population wave might be understood in terms of the mechanism of “gene surfing” [13, 14]: Compared to individuals in the wake, the pioneers at the colonization front are much more successful in passing their genes on to future generations, not only because their reproduction is unhampered by limited resources but also because their progeny start out from a good position to keep up with the wave front (by means of mere diffusion). The offspring of pioneers thus tend to become pioneers of the next generation, such that they, too, enjoy abundant resources, just like their ancestors. Therefore, pioneer genes have a good chance to be carried along with the wave front and attain high frequencies, as if they “surf” on the wave. Thus, the descendants of an individual sampled from the tip of the wave have a finite probability to take over the wave front. In this case of “successful surfing”, further colonization will produce only descendants of the relevant pioneer because the wave front has been “fixed”. The process of fixation at the front of a one-dimensional population wave is illustrated in Fig. 1.

The present study hinges on the question of where lineages that reach fixation originate within the wave front. Clearly, the probability of successful surfing must increase with the proximity to the edge of the wave [14]. On the other hand, more surfing attempts originate from the bulk of the wave where the population density is larger. We show that, due to this tradeoff, the origins of successful lineages have a bell-like distribution inside the wave front. Furthermore, this ancestral probability distribution, together with the population-density profile of the population wave itself, is found to control the observable gradients in genetic diversity. The genetic pattern directly behind the moving colonization front turns out to mimic that of a small well-mixed (panmictic) population.

[Figure 1: three lattice sketches, panels (a), (b) and (c), showing the labeled individuals near the wave front; horizontal axis: site nr. i.]

FIG. 1: Illustration of gene surfing by means of three consecutive snapshots of the genetic composition at the edge of an expanding population (in one spatial dimension). (a) A neutral red mutant arises at the wave front. (b) After some time, the genetic make-up at the wave front has changed drastically due to random number fluctuations, and it is apparent that the descendants of red will take over the wave front. (c) Fixation of the descendants of red in the co-moving frame. Numbers in these sketches represent “inheritable” labels that are used in our simulations to trace back the spatial origin of individuals in the wave front. In this example, the descendants of red are associated with position “9” in the co-moving frame. The dashed blue frame indicates the co-moving simulation box.

The effective size $N_e$ of this population “bottleneck” is shown to be smaller than the typical number of individuals in the colonization front and very sensitive to the growth conditions in the very tip of the front. Colonization fronts in which individuals need to be accompanied by others in order to grow (Allee effect [18]) have a much larger effective population size than those in which individuals grow even if they are isolated from the rest.

The outline of the paper is as follows. We first introduce a stochastic computer model that we use to generate both pulled and pushed one-dimensional colonization waves. Tracer experiments within this model are then used to reveal the probability distribution of successful surfers and the decrease of genetic diversity at the colonization front. Our subsequent theoretical treatment reveals to what extent both measures are related, and how they can be predicted for continuous models with quasi-stationary demography. After a comparison between theory and simulations, we discuss the significance of our results in the light of inferring past range expansions from spatially resolved genetic data.

I. SIMULATIONS

Most models for range expansions can be classified as describing pulled or pushed population fronts [19, 20]. The distinction between the two cases corresponds to a difference in behavior. Suppose individuals need to be in proximity to other individuals in order to grow in number (Allee effect [18]). The presence of conspecifics can be beneficial due to numerous factors, such as predator dilution, antipredator vigilance, reduction of inbreeding and many others [21]. Then, the individuals in the very tip of the front contribute little to its advance, because the rate of reproduction decreases when the number density becomes too small. Consequently, the front is pushed in the sense that its time evolution is determined by the behavior of an ensemble of individuals in the boundary region. On the other hand, a population in which an individual reproduces even if it is completely isolated from the rest will be “spearheaded” by these front individuals. These pulled fronts are responsive to small changes in the frontier and, therefore, are prone to large fluctuations [20].

One might suspect that the genetic pattern left behind a population wave should reflect whether the colonization process is controlled by a small or large number of individuals. Hence, we have set up a computer model that allows us to investigate the surfing dynamics for both classes of waves.

A. Population dynamics

The population is distributed on a one-dimensional lattice, whose sites (demes) can carry at most N individuals. The algorithm effectively treats individuals (•) and vacancies (◦) as two types of particles, whose numbers must sum to N at each site. A computational time step consists of two parts: (i) a migration event, in which a randomly chosen particle exchanges place with a particle from a neighboring site. This step is independent of the involved particle types. (ii) A duplication attempt: Two particles are randomly chosen (with replacement) from the same lattice site. A duplicate of the first one replaces the second one (1st → 2nd) with probabilities based on their identities: proposed replacements ◦ → ◦, • → • and • → ◦ (growth) are realized with probability 1, whereas ◦ → • (death) is carried out only with probability 1 − s, depending on a growth parameter 0 < s < 1. This asymmetry controls the effective local growth advantage of • over ◦.
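The two-part update described above can be sketched in Python as follows. This is a minimal illustration, not the authors' simulation code: the function name `particle_model`, the reflecting lattice ends, the choice of an independently drawn deme for the duplication attempt, and the fixed (not co-moving) box are our own assumptions. Because every deme holds exactly N particles, a uniformly chosen particle of deme k is an individual with probability $n_k/N$, so only the occupation numbers need to be stored.

```python
import random


def particle_model(n_sites=40, N=50, s=0.1, steps=100_000, seed=1):
    """Hypothetical sketch of the individual/vacancy lattice model.

    n[k] is the number of individuals (•) in deme k; the remaining
    N - n[k] particles of that deme are vacancies (◦).
    """
    rng = random.Random(seed)
    n = [N if k < 10 else 0 for k in range(n_sites)]  # step-function start

    def draw_individual(k):
        # A uniformly chosen particle of deme k is an individual w.p. n[k]/N.
        return rng.random() < n[k] / N

    for _ in range(steps):
        # (i) Migration: a random particle swaps with a random particle of a
        # neighboring deme; only swaps of unlike particles change n.
        i = rng.randrange(n_sites)
        j = i + rng.choice((-1, 1))
        if 0 <= j < n_sites:  # assumption: reflecting lattice ends
            a, b = draw_individual(i), draw_individual(j)
            if a and not b:
                n[i] -= 1
                n[j] += 1
            elif b and not a:
                n[i] += 1
                n[j] -= 1

        # (ii) Duplication attempt: two particles drawn with replacement from
        # one deme; a copy of the first replaces the second.
        k = rng.randrange(n_sites)  # assumption: deme drawn independently
        first, second = draw_individual(k), draw_individual(k)
        if first and not second:
            n[k] += 1  # • -> ◦ (growth), always accepted
        elif second and not first and rng.random() < 1 - s:
            n[k] -= 1  # ◦ -> • (death), accepted with probability 1 - s
    return n


occupation = particle_model()
print(sum(occupation), occupation[:15])
```

With these defaults the occupied region spreads from the first ten demes into the empty part of the lattice; following the front with a co-moving box, as in the actual simulations, would have to be added on top of this sketch.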

In terms of individuals and vacancies instead of particles, we see that our model describes migration and local logistic growth of a population distributed over demes with carrying capacity N. Starting with a step-function initial condition, the simulation generates an expanding pulled population wave. The above algorithm represents a discretized version [22] of the stochastic Fisher-Kolmogorov equation [23] with a Moran-type breeding scheme [10]. To generate pushed waves as well, we extend our model by the following rule: In demes in which the number of individuals falls to $N_c$ or below, we set the effective linear growth rate s to zero. For $N_c > 0$, this represents a simple version of the above-mentioned Allee effect of a reduced growth rate when the population density is too small.
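A deterministic mean-field counterpart of the lattice model helps to see why the cutoff $N_c$ slows the expansion. The sketch below is our own approximation, not part of the original study: it replaces the stochastic update by the lattice logistic equation $\dot n_i = D(n_{i+1} + n_{i-1} - 2n_i) + s_{\rm eff}(n_i)\, n_i (1 - n_i/N)$ with $s_{\rm eff} = 0$ for $n_i \le N_c$, and the function name `front_position` as well as the values of D, the time step and the lattice length are illustrative assumptions. Since the cutoff can only remove growth, never add it, the front with a cutoff cannot overtake the front without one.

```python
import numpy as np


def front_position(N=1000, n_c=0, s=0.1, D=0.5, n_sites=300, dt=0.1, T=200.0):
    """Hypothetical mean-field sketch: explicit Euler integration of
    dn_i/dt = D (n_{i+1} + n_{i-1} - 2 n_i) + s_eff(n_i) n_i (1 - n_i / N),
    with s_eff = 0 wherever n_i <= n_c (Allee cutoff).
    Returns the number of filled-deme equivalents, sum(n) / N."""
    n = np.zeros(n_sites)
    n[:20] = N  # step-function initial condition
    for _ in range(int(round(T / dt))):
        lap = np.empty_like(n)
        lap[1:-1] = n[2:] + n[:-2] - 2.0 * n[1:-1]
        lap[0] = n[1] - n[0]  # zero-flux boundaries
        lap[-1] = n[-2] - n[-1]
        s_eff = np.where(n > n_c, s, 0.0)  # growth switched off at or below n_c
        n = n + dt * (D * lap + s_eff * n * (1.0 - n / N))
    return n.sum() / N


pulled = front_position(n_c=0)
pushed = front_position(n_c=10)
print(f"without cutoff: {pulled:.1f} demes, with cutoff: {pushed:.1f} demes")
```

Both fronts start from 20 filled demes; the run with $N_c = 10$ ends up behind the run with $N_c = 0$, mirroring the slower, pushed expansion in the stochastic model.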

B. Tracer dynamics

Tracer experiments within this computer model allow us to extract the genealogies of front individuals. After the population has had enough time to relax into its propagating equilibrium state, all individuals are labeled according to their current position $i \in \{1, \dots, n\}$ within the simulation box of length n, see Fig. 1a. These labels are henceforth inherited by the descendants, which thereby carry information about the spatial position of their ancestors. The randomness in the reproduction and migration processes (genetic drift) during the subsequent dynamics inevitably leads to a reduction in the diversity of labels present in the simulation box, see Fig. 1b. Labels are lost either through extinction or because they cannot keep up with the simulation box, which follows the propagating wave front [44].

In our simulation, the gradual loss of diversity of labels at the wave front is measured by the quantity

$$H(t) = \sum_{i=1}^{n} p_i(t)\,[1 - p_i(t)]\,, \qquad (1)$$

which depends on the frequency $p_i(t)$ of label i at time t after the wave has been labeled. H(t) represents the time-dependent probability that two individuals, randomly chosen (with replacement) from the bounded simulation box, carry different labels. Provided that mutations are negligible on the time scale of the range expansion, we may think of our inheritable labels as neutral genes at one particular locus (alleles). We may thus identify H(t) with the probability that two alleles randomly chosen from the front region are different, conditional on the well-mixed labeling state at t = 0 imposed by our simulation. Hence, we refer to H(t) as the time-dependent expected heterozygosity [10] at the wave front [45].
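Eq. (1) can be evaluated directly from the labels carried by the individuals in the simulation box. The helper below is a hypothetical sketch (the name `heterozygosity` and the list-of-labels input are our assumptions); it uses the identity $\sum_i p_i(1-p_i) = 1 - \sum_i p_i^2$, i.e., one minus the probability that two individuals drawn with replacement carry the same label.

```python
from collections import Counter


def heterozygosity(labels):
    """Eq. (1): H = sum_i p_i (1 - p_i) = 1 - sum_i p_i**2,
    where p_i is the frequency of label i among the individuals in the box."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Toy box of 4 demes with 3 individuals each, labeled by deme (p_i = 1/4).
box = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
print(heterozygosity(box))       # 0.75
print(heterozygosity([9] * 12))  # 0.0 once label 9 has fixed
```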

The perpetual loss of labels in our model without mutations eventually leads to the fixation of one label in the simulation box, see Fig. 1c. The value of this label indicates the origin, within the co-moving frame, of this successful “surfer”. It contributes one data point to the spatial distribution $P_i$ of individuals whose descendants came to fixation. After fixation, the algorithm proceeds with the next labeling event.
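The bookkeeping that turns repeated fixation events into the histogram $P_i$ might look as follows. This is an illustrative sketch under our own assumptions: the function names `fixed_label` and `ancestral_distribution` are hypothetical, and fixation is detected simply by checking whether a single label remains in the box.

```python
from collections import Counter


def fixed_label(labels):
    """Return the surviving label if only one label is left in the box, else None."""
    distinct = set(labels)
    return distinct.pop() if len(distinct) == 1 else None


def ancestral_distribution(fixed_labels, n):
    """Normalized histogram P_i, i = 1..n, of the labels that reached fixation."""
    counts = Counter(fixed_labels)
    total = len(fixed_labels)
    return [counts.get(i, 0) / total for i in range(1, n + 1)]


# Toy record of four labeling cycles in a box of n = 13 sites.
events = [fixed_label(b) for b in ([9, 9, 9], [8, 8], [9, 9], [12, 12, 12])]
P = ancestral_distribution(events, 13)
print(P[8])  # P_9 = 0.5
```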

C. Results

The parameters of our computer models are the deme size N, i.e. the maximal number of individuals per lattice site, the linear growth rate s per generation, and the critical occupation number $N_c$, at or below which the growth rate drops to zero (Allee effect). In our simulations, we set s = 0.1 throughout, and determine, for varying N and $N_c$, the averages of the ancestral distribution function $\langle P_i\rangle$, the scaled occupation number $\langle n_i\rangle/N$, both being functions of the lattice site i in the co-moving frame, and the time-dependent probability of non-identity, $\langle H(t)\rangle$. Here, angle brackets indicate that the enclosed quantities have been averaged in time, i.e. over many fixation events, and over multiple realizations of the same computer experiment [46].

Figure 2 illustrates the relation between the front profiles $\langle n_i\rangle/N$ and the ancestral distribution $\langle P_i\rangle$ in the co-moving frame. Whereas the wave profiles have the familiar sigmoidal shapes of reaction-diffusion waves [19, 20], the ancestral distribution functions are bell curves with most of their support beyond the inflection point of the wave front. The fact that $\langle P_i\rangle$ has a maximum inside the wave front reflects the tradeoff, mentioned earlier, between a larger fixation probability in the tip of the wave and a larger number of surfing attempts originating from the bulk. Notice from Fig. 2a that, for increasing deme size, the distribution becomes wider and shifts further into the tip of the wave, in contrast to the almost N-independent scaled wave profiles. Fig. 2b shows that the opposite effect is caused by increasing the cutoff value $N_c$, which changes the type of the wave from pulled to pushed.

Next, we measured the temporal decay of the heterozygosity H(t), defined in Eq. (1). In Fig. 3, time traces of H(t) are depicted for various parameters and show an exponential decay after an initial transient. This allows us to characterize the strength of genetic drift at the wave front by a single number, the (asymptotic) exponential decay rate, $-\partial_t \log\langle H(t)\rangle$, which can be extracted from logarithmic plots of $\langle H(t)\rangle$. By analogy with well-mixed (panmictic) populations, in which the heterozygosity decays exponentially with rate 2/N (Moran model [10]), it is convenient to express the decay rate as $2/N_e$, in terms of an effective population size $N_e$. The theoretical part below will further clarify to what extent the genetic diversity at the wave front mimics that of a population “bottleneck” of constant size $N_e$.
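Extracting $N_e$ amounts to a straight-line fit of $\log\langle H(t)\rangle$ over the asymptotic regime. The sketch below assumes that the averaged trace is available as arrays and that the start `t_min` of the exponential regime is chosen by inspection; the function name and that choice are our assumptions, and the synthetic trace (with a made-up transient) merely stands in for simulation data.

```python
import numpy as np


def effective_population_size(t, H, t_min):
    """Fit log<H(t)> = a - (2 / N_e) t for t >= t_min and return N_e."""
    t = np.asarray(t, dtype=float)
    H = np.asarray(H, dtype=float)
    mask = t >= t_min
    slope, _intercept = np.polyfit(t[mask], np.log(H[mask]), 1)
    return -2.0 / slope


# Synthetic trace: exponential decay with N_e = 360, preceded by a transient
# during which the decay is weaker (illustrative functional form only).
t = np.arange(0.0, 2000.0)
H = 0.9 * np.exp(-2.0 * t / 360.0) * (1.0 - 0.5 * np.exp(-t / 50.0))
print(round(effective_population_size(t, H, t_min=600.0)))  # prints 360
```

Starting the fit too early, inside the transient, would bias the estimate; this is why the fit window is restricted to $t \ge t_{\min}$.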

Figure 4 depicts $N_e$ as a function of the deme size N on a double logarithmic scale for $N_c = 0$ and $N_c = 10$. Naively, one might expect $N_e$ to be, roughly, the characteristic number of individuals in the width of the wave front, since these individuals contribute (by growing) to the advance of the wave. Thus, a linear relationship between deme and effective population size would not be surprising. In contrast, we find that $N_e$ increases much more slowly than linearly with increasing deme size. Furthermore, the effective population size turns out to be very sensitive to the presence of an Allee effect ($N_c > 0$), which can strongly increase the effective population size. This point is illustrated, in particular, by the inset of Fig. 4, which depicts the effective population size $N_e$ in a simulation of fixed deme size (N = 1000) and varying strength of the Allee effect ($10 < N_c < 500$).
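The sub-linear growth of $N_e$ with N can be quantified by the slope of a straight line in the log-log plot. As an illustration only, the snippet below fits the pushed-wave values ($N_c = 10$) listed in the legend of Fig. 3(b); an unweighted least-squares fit of $\log N_e$ against $\log N$ is our choice and need not coincide with how the lines in Fig. 4 were drawn, so the result should come out close to, not necessarily equal to, the slope quoted in that caption.

```python
import numpy as np

# (N, N_e) pairs for pushed waves with N_c = 10, read off the Fig. 3(b) legend.
N = np.array([100, 316, 1000, 2000, 10000], dtype=float)
N_e = np.array([390, 590, 960, 1280, 2560], dtype=float)

# Least-squares fit of log10(N_e) = slope * log10(N) + const.
slope, const = np.polyfit(np.log10(N), np.log10(N_e), 1)
print(f"N_e ~ N^{slope:.2f}")  # exponent well below 1, i.e. sub-linear
```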

[Figure 2: (a) Pulled waves ($N_c = 0$), curves for N = 100, 1000, 10000 and 31600; (b) Pushed waves ($N_c > 0$), N = 1000 with $N_c$ = 0, 10, 30 and 100. Horizontal axis: i, lattice site; curves: $\langle P_i\rangle$ and $\langle n_i\rangle/N$.]

FIG. 2: Measured distributions $\langle P_i\rangle$ (bell curves) of “successful surfers” together with the normalized occupation numbers $\langle n_i\rangle/N$ (sigmoidal curves; scaled along the vertical axis to fit the figure) as a function of the site number i in the co-moving frame; (a) for pulled waves ($N_c = 0$) with varying deme sizes N; (b) for various pushed waves ($N_c > 0$) with deme size N = 1000, compared to the corresponding pulled wave (dashed blue lines), which is also present in (a).

[Figure 3: $\log\langle H\rangle$ versus t. (a) Pulled waves ($N_c = 0$): N = $10^5$, $N_e$ = 1540; N = 36100, $N_e$ = 1070; N = 10000, $N_e$ = 700; N = 3160, $N_e$ = 550; N = 1000, $N_e$ = 360; N = 316, $N_e$ = 250; N = 100, $N_e$ = 170. (b) Pushed waves ($N_c = 10$): N = 10000, $N_e$ = 2560; N = 2000, $N_e$ = 1280; N = 1000, $N_e$ = 960; N = 316, $N_e$ = 590; N = 100, $N_e$ = 390.]

FIG. 3: The decay of genetic diversity, $\langle H(t)\rangle$, with time (in units of generations) on a log-linear scale for varying deme sizes; (a) for pulled waves ($N_c = 0$); (b) for pushed waves with $N_c = 10$. Both cases show that an asymptotic exponential decay of $\langle H(t)\rangle$ is reached after an initial transient during which the decay is weak. The duration of this transient depends on the size of the simulation box: the larger the simulation box, the longer it takes until the exponential decay is approached. The asymptotic exponential decay rate, however, has been checked to approach a constant for a sufficiently large box size. This exponential decay rate is therefore well defined and can be used to characterize the decrease of genetic diversity at the wave front. By analogy with panmictic populations, in which the heterozygosity decays exponentially with rate 2/N (Moran model), it is convenient to express the decay rate as $2/N_e$, i.e., in terms of an effective population size $N_e$, which is noted in the legends and plotted in Fig. 4.

Qualitatively, this phenomenon may be explained by the pushed nature of these waves. An Allee effect shifts the distribution $P_i$ of successful surfers away from the tip towards the wake of the wave (see Fig. 2b) and hence increases the gene pool from which the next generation of pioneers is sampled. This argument indicates a close relation between $N_e$ and $P_i$, which also emerges explicitly in the theoretical analysis below.

II. THEORY

The following employs a continuous reaction-diffusion approach to establish a theoretical basis for the relation between the neutral genetic diversity and the population dynamics in non-equilibrium situations like range expansions. It will help us to reconcile the somewhat surprising response of our simulations to parameter changes (deme size and Allee effect). Note from Fig. 2 that the changes in the ancestral distribution are dramatic, while the changes in the population profile itself are quite modest. Results obtained from our approximation scheme are tested by direct comparison of simulations and theory.

[Figure 4: main panel, $N_e$ versus N on log-log axes for $N_c = 0$ and $N_c = 10$; inset, $N_e$ versus $N_c$ for N = 1000.]

FIG. 4: The measured effective population size $N_e$ as a function of deme size N on a log-log scale for pulled waves ($N_c = 0$, asterisks) and pushed waves ($N_c = 10$, crosses). The dashed and dotted lines have slopes 0.30 and 0.42, respectively, i.e., significantly smaller than 1. Triangles represent the effective population sizes as inferred from the strong-migration approximation, Eq. (8), using the measured $\langle P\rangle$ distribution and population profiles. The inset shows the behavior of $N_e$ for varying cutoff value $N_c$ and fixed deme size N = 1000, again on a log-log scale.

A. Gene surfing

In our simulations, as well as in many other models of range expansions, a propagating population wave results from the combination of random short-range migration and logistic local growth. In the continuum limit, a general coarse-grained description of such a reaction-diffusion system of a single species is given by

$$\partial_t c(x,t) = D\,\partial_x^2 c(x,t) + v\,\partial_x c(x,t) + K(x,t)\,, \qquad (2)$$

formulated in the frame co-moving with velocity v, where c(x,t) represents the density of individuals at location x at time t and D is a diffusivity. The first two terms on the right-hand side represent the conservative part of the population dynamics, for which we make the usual diffusion assumptions [24]. The reaction term K(x,t) accounts for both deterministic and stochastic fluctuations in the number of individuals due to birth and death processes, and typically involves nonlinearities such as a logistic interaction between individuals as well as noise caused by number fluctuations. For instance, our computer model with $N_c = 0$ maps, in the continuum limit [22], to the stochastic Fisher equation, for which $K(x,t) = s\,c\,(c_\infty - c) + \epsilon\sqrt{c\,(c_\infty - c)}\,\eta$, where $\eta(x,t)$ is a Gaussian white noise process in space and time, $c_\infty \propto N$ is the carrying capacity and $\epsilon \propto \sqrt{N}$ sets the strength of the noise. We would like to stress, however, that the following analysis does not rely on a particular form of K. Therefore, we leave the reaction term unspecified.

As in our tracer experiments, let us assume that inheritable labels, representative of neutral genes, are attached to individuals within the population and ask: Given that Eq. (2) is a proper description of the population dynamics, to what extent is the dynamics of these labels determined? To answer this question, it is convenient to adopt a retrospective view of the tracer dynamics. Imagine following the ancestral line of a single label located at x backwards in time to explore which spatial route its ancestors took. This backward dynamics of a single line of descent will show drift and diffusion only; any reaction is absent because among all the individuals living at some earlier time there must be exactly one ancestor from which the chosen label has descended. We may thus describe the ancestral process of a single lineage by the probability density $G(\xi,\tau|x,t)$ that a label presently, at time t and located at x, has descended from an ancestor that lived at ξ at the earlier time τ. In this context, it is natural to choose the time as increasing towards the past, τ > t, and to consider (ξ,τ) and (x,t) as final and initial state of the ancestral trajectory, respectively. With this convention, the distribution G satisfies the initial condition $G(\xi,t|x,t) = \delta(x-\xi)$, where δ(x) is the Dirac delta function, and is normalized with respect to ξ, $\int G(\xi,\tau|x,t)\,d\xi = 1$.

Since $G(\xi,\tau|x,t)$ as a function of ξ and τ is a probability distribution function generated by a diffusion process that is continuous in space and time, we expect its dynamics to be described by a generalized diffusion equation (Fokker–Planck equation [24]). Indeed, in Appendix A we show that $G(\xi,\tau|x,t)$ obeys

$$\partial_\tau G(\xi,\tau|x,t) = -\partial_\xi J(\xi,\tau|x,t)\,, \qquad J(\xi,\tau|x,t) \equiv -D\,\partial_\xi G + \{v + 2D\,\partial_\xi \ln[c(\xi,\tau)]\}\,G\,, \qquad (3)$$

where all derivatives are taken with respect to the ancestral coordinates (ξ,τ). The drift term in Eq. (3) has two antagonistic parts. The first term, v, tends to push the lineage into the tip of the wave, and is simply a consequence of the moving frame of reference. The second term, proportional to twice the gradient of the logarithm of the density, is somewhat unusual. It accounts for the purely “entropic” fact that, since there is a forward-time flux of individuals diffusing from regions of high density to regions of low density, an ancestral line tends to drift into the wake of the wave, where the density is higher [47].
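The backward dynamics of Eq. (3) can be sampled as a Langevin equation, $d\xi = \{v + 2D\,\partial_\xi \ln c\}\,d\tau + \sqrt{2D}\,dW$. The Euler-Maruyama sketch below is illustrative and rests on assumptions that are not part of the text: we pick the deterministic pushed-wave reaction term $K = c^2(1-c)$ (with D = 1 and $c_\infty = 1$), whose exact stationary front $c(\xi) = 1/(1 + e^{\xi/\sqrt{2}})$ travels at $v = 1/\sqrt{2}$, and the function names and numerical parameters are hypothetical. For a stationary profile, setting J = 0 in Eq. (3) gives the density $\propto c^2 e^{v\xi/D}$, which for this particular front is a logistic distribution centered at the midpoint c = 1/2 with variance $2\pi^2/3$; the sampled lineages should reproduce it.

```python
import numpy as np

D = 1.0
V = 1.0 / np.sqrt(2.0)      # front speed for K = c^2 (1 - c) (illustrative choice)
KAPPA = 1.0 / np.sqrt(2.0)  # decay rate of the exact front profile


def profile(xi):
    """Exact co-moving front c(xi) = 1 / (1 + exp(KAPPA xi)); bulk at xi -> -inf."""
    return 1.0 / (1.0 + np.exp(KAPPA * xi))


def lineage_drift(xi):
    """Drift of Eq. (3): v + 2 D d/dxi ln c = v - 2 D KAPPA (1 - c)."""
    return V - 2.0 * D * KAPPA * (1.0 - profile(xi))


def sample_ancestors(n_lineages=4000, dt=0.01, tau_max=60.0, seed=0):
    """Euler-Maruyama integration of backward lineages started at xi = 0."""
    rng = np.random.default_rng(seed)
    xi = np.zeros(n_lineages)
    for _ in range(int(round(tau_max / dt))):
        noise = rng.standard_normal(n_lineages)
        xi += lineage_drift(xi) * dt + np.sqrt(2.0 * D * dt) * noise
    return xi


ancestors = sample_ancestors()
print(ancestors.mean(), ancestors.var(), 2.0 * np.pi**2 / 3.0)
```

The drift vanishes at the midpoint of the front and points towards it from both sides, which is the tug-of-war between the two terms discussed above.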

Our computer experiments measure the spatial distribution P of the individuals whose descendants came to fixation. This information is encoded in the long-time behavior of G,

$$P(\xi,\tau) = \lim_{t\to-\infty} G(\xi,\tau|x,t)\,, \qquad (4)$$