Syst. Biol. 62(6):789–804, 2013

© The Author(s) 2013. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved.

For Permissions, please email: journals.permissions@oup.com

DOI:10.1093/sysbio/syt040

Advance Access publication June 4, 2013

Bayesian Analysis of Biogeography when the Number of Areas is Large

MICHAEL J. LANDIS$^{1,*}$, NICHOLAS J. MATZKE$^{1}$, BRIAN R. MOORE$^{2}$, AND JOHN P. HUELSENBECK$^{1,3}$

$^{1}$Department of Integrative Biology, University of California, Berkeley, CA 94720-3140, USA; $^{2}$Department of Evolution and Ecology, University of California, Davis, Storer Hall, One Shields Avenue, Davis, CA 95616, USA; and $^{3}$Biology Department, King Abdulaziz University, Jeddah, Saudi Arabia

$^{*}$Correspondence to be sent to: Department of Integrative Biology, University of California, 3060 VLSB #3140, Berkeley, CA 94720-3140, USA; E-mail: mlandis@berkeley.edu.

Received 16 December 2012; reviews returned 17 April 2013; accepted 28 May 2013

Associate Editor: Peter Foster

Abstract.—Historical biogeography is increasingly studied from an explicitly statistical perspective, using stochastic models

to describe the evolution of species range as a continuous-time Markov process of dispersal between and extinction within

a set of discrete geographic areas. The main constraint of these methods is the computational limit on the number of areas

that can be speciﬁed. We propose a Bayesian approach for inferring biogeographic history that extends the application of

biogeographic models to the analysis of more realistic problems that involve a large number of areas. Our solution is based on

a “data-augmentation” approach, in which we ﬁrst populate the tree with a history of biogeographic events that is consistent

with the observed species ranges at the tips of the tree. We then calculate the likelihood of a given history by adopting a

mechanistic interpretation of the instantaneous-rate matrix, which speciﬁes both the exponential waiting times between

biogeographic events and the relative probabilities of each biogeographic change. We develop this approach in a Bayesian

framework, marginalizing over all possible biogeographic histories using Markov chain Monte Carlo (MCMC). Besides

dramatically increasing the number of areas that can be accommodated in a biogeographic analysis, our method allows the

parameters of a given biogeographic model to be estimated and different biogeographic models to be objectively compared.

Our approach is implemented in the program, BayArea. [ancestral area analysis; Bayesian biogeographic inference; data

augmentation; historical biogeography; Markov chain Monte Carlo.]

Historical biogeography—the study of the past

geographic distribution of species and the processes

that inﬂuence species distribution—remains a

difﬁcult problem in evolutionary biology. Inference of

biogeographic history is made particularly challenging

because of the many factors that influence species range,

including various geological, climatic, ecological, and

chance events. Both the diversity of factors influencing

the geographic range of a species and the uncertainty

regarding their relative importance motivate pursuit

of biogeographic inference within a solid statistical

framework. A statistical approach requires that the

assumptions of an analysis be explicitly stated through

the construction of probabilistic models that include

parameters representing processes thought to impact

the geographic distribution of species. This approach

allows for the efﬁcient estimation of model parameters

and, perhaps more importantly, the rigorous comparison

of alternative biogeographic models.

Over the past decade, several promising methods

have been proposed that cast biogeographic inference

in a statistical modeling framework. Lemmon and

Lemmon (2008) and Lemey et al. (2009; 2010) proposed

stochastic models that treat the distribution of species

as continuous variables. A few years earlier, Ree et al.

(2005) and Ree and Smith (2008) proposed stochastic

models that treat the distribution of species as a discrete

variable. For both approaches—those treating space

as a continuous or a discrete variable—parameters

are estimated using maximum likelihood or Bayesian

inference.

The discrete-space model of Ree et al. (2005) is

particularly intriguing because its basic statistical

ﬂexibility has the potential to profoundly change

biogeographic inference, but is hampered by

computational limitations. They modeled the

colonization of and local extinction within a set of

discrete areas as a continuous-time Markov process

with a state space consisting of all possible geographic-

range conﬁgurations. The machinery for computing

the likelihoods of discrete geographic ranges on

phylogenetic trees is the same as that used to calculate

the likelihood of discrete characters (e.g., nucleotide

sequences) on a tree; matrix exponentiation is used

to calculate the probability of transitions among

states/ranges along branches and the Felsenstein

(1981) pruning algorithm (also see Gallager 1962) is

used to account for different ancestral conﬁgurations

at the interior nodes of the tree. Together, matrix

exponentiation and the Felsenstein pruning algorithm

allow the likelihood to account for all possible histories

of area colonization and local extinction that could have

given rise to the observed geographic distribution of

species.

The conventional algorithms for calculating the

likelihood, however, have practical limitations. Both

matrix exponentiation and the pruning algorithm

become computationally unmanageable when the

number of areas becomes too large. Practically speaking,

this means that inference under a discrete-space model,

such as that proposed by Ree et al. (2005), is limited

to about 10 areas. With 10 areas, there are a total of

$2^{10} - 1 = 1023$ possible states (geographic ranges) and

the rate matrix of the continuous-time Markov model

is $1023 \times 1023$ in dimension. A recent implementation

of the Ree et al. (2005) method allows up to 20 areas

to be considered, but at the expense of making some

restrictive assumptions about the number of areas that

can be occupied concurrently per species (Webb and

Ree 2012). The usual method for working around the

limitations of the Ree et al. (2005) approach is to

group areas together in such a way that the biologist

considers no more than about 10 areas. This solution,

unfortunately, comes at a cost: hard-earned

species-distribution data are lumped, limiting the spatial

resolution of the inferred biogeographic history; the

inference of parameters suffers because fewer data are

available for estimation; and the complexity of the

models that can be distinguished is limited by the small

number of areas that can be considered.

In this article, we describe a computational method—

referred to as “data augmentation”—that allows the

approach proposed by Ree et al. (2005) to be extended

to hundreds or thousands of areas. The approach is

inspired by the method described by Robinson et al.

(2003) for the analysis of amino acid sequence data under

complex models of non-independence, which relies on

Markov chain Monte Carlo (MCMC; Metropolis et al.

1953; Hastings 1970) to carry out the tasks normally

accomplished by means of matrix exponentiation and

the Felsenstein pruning algorithm. The biogeographic

model described by Ree et al. (2005) explicitly considers

various scenarios by which ancestral ranges may become

subdivided during speciation and inherited by daughter

species. In contrast, the two biogeographic models that

we describe here both assume that ancestral ranges are

inherited identically: the first is a simple (null) model

in which every area has an equal rate of colonization

or extinction, and the second is a model in which rates of

colonization are distance dependent. We develop this

approach in a Bayesian statistical framework in which

model parameters are estimated using MCMC and

candidate biogeographic models are compared using

Bayes factors. We explore the statistical behavior of

this approach by means of simulation, and demonstrate

its empirical application with an analysis of Malesian

species within the ﬂowering-plant clade, Rhododendron

section Vireya.

METHODS

Statistical Inference of Biogeographic History

We are interested in modeling the biogeographic distribution of $M$ extant taxa over a geographic space that has been discretized into $N$ areas, where each taxon occurs in at least one area. The evolutionary relationships among the $M$ taxa are described by a rooted, time-calibrated phylogenetic tree that in this article is considered to be known without error. We label the tips of this tree to correspond to the observed species, $1, 2, \ldots, M$; the interior nodes of the tree are labeled in postorder sequence $M+1, M+2, \ldots, 2M$ (Fig. 1). The ancestor of node $i$ is denoted $\sigma(i)$. The most recent common ancestor of the $M$ observed species (the “root” node) is labeled $2M-1$. We also consider both the branch subtending the root node (the “stem” branch) and its immediate ancestor (the “stem” node), which is labeled $2M$. The times of the speciation events (nodes) on the tree are designated $t_1, t_2, \ldots, t_{2M}$. Typically, the species at the tips are contemporaneous and extant, such that $t_1 = t_2 = \cdots = t_M = 0$. The temporal duration of the branch below node $i$, typically in terms of millions of years, can be calculated as $T_i = t_{\sigma(i)} - t_i$.

Our use of “geographic range” refers to the pattern of presence and absence of a lineage within the set of discrete geographic areas. For the models we will explore, all geographic ranges in which at least one area is occupied are admissible (i.e., the case in which all areas are unoccupied is precluded). The occurrence of the $i$-th species in the $j$-th area is denoted $x_{i,j}$, where $x_{i,j}$ is equal to 0 or 1. Although we model geographic ranges as bit vectors, we represent them using bit strings (i.e., a sequence of zeros and ones) to simplify our notation. For example, the bit string 101 corresponds to a geographic range for a species that is present in areas 1 and 3 and absent in area 2. The biogeographic state space, $S$, includes the $2^N - 1$ geographic ranges for a model with $N$ discrete areas. For example, all allowable geographic ranges, $S$, for a model with $N = 3$ areas are

$$S = \{001, 010, 100, 011, 101, 110, 111\},$$

and the number of distinct configurations for this state space is $n(S) = 2^3 - 1 = 7$. We designate the observed geographic range for the $i$-th species as $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,N})$, where $X_{\mathrm{obs}} = (x_1, x_2, \ldots, x_M)$, and designate ancestral geographic ranges at interior nodes of the tree as $x_{M+1}, x_{M+2}, \ldots, x_{2M}$.
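As a small illustration of the bit-string notation, the state space for a handful of areas can be enumerated directly. The sketch below is only illustrative; the function names `enumerate_ranges` and `occupied_areas` are hypothetical and do not come from BayArea.

```python
from itertools import product


def enumerate_ranges(n_areas):
    """Return every admissible range as a bit string (all-absent range excluded)."""
    ranges = []
    for bits in product("01", repeat=n_areas):
        bit_string = "".join(bits)
        if "1" in bit_string:  # at least one area must be occupied
            ranges.append(bit_string)
    return ranges


def occupied_areas(bit_string):
    """Return the 1-based indices of the areas that are occupied in a range."""
    return [j + 1 for j, bit in enumerate(bit_string) if bit == "1"]


ranges = enumerate_ranges(3)
print(len(ranges))            # number of admissible ranges, 2**3 - 1
print(occupied_areas("101"))  # areas occupied by the range 101
```

Explicit enumeration of this kind is practical only for small $N$, which is the limitation the data-augmentation approach is meant to avoid.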

The “states” (geographic ranges) that we observe at the tips of the tree were generated through a potentially complicated history of colonization and local extinction. Figure 1b–d depicts examples of biogeographic histories. A “biogeographic history” is a specific sequence of colonization and/or local extinction events that could have given rise to the observed geographic ranges. An event of range expansion or contraction is denoted $x_{i,j,k}$; each event occurs on a specific branch (leading to node $i$) and involves a single area ($j$) at a point in time ($k$, indicating the relative time of the $k$-th event on branch $i$, $\tau^{(i)}_k \in \tau^{(i)}$, which we describe in more detail below). The history of range expansion or local extinction on the branch with index $i$ involving area $j$ is denoted $x_{i,j} = (x_{i,j,1}, x_{i,j,2}, \ldots, x_{i,j,F})$, where events along branch $i$ are ordered such that $x_{i,j,1}$ is the oldest and $x_{i,j,F}$ is the most recent. The collection of histories over all branches of the tree is denoted $X_{\mathrm{aug}} = (x_1, x_2, \ldots, x_{2M-1})$, representing the data-augmented biogeographic history. For example, there are 6, 6, and 12 biogeographic events for the histories shown in Figure 1b, c, and d, respectively.

FIGURE 1. An example of a tree with $M = 4$ species. A) Nodes on the tree are labeled such that the tips of the tree have the labels $1, 2, \ldots, M$ whereas the interior nodes of the tree are labeled $M+1, M+2, \ldots, 2M$. Note that in this article we also consider the “stem” branch of the tree, which connects the root node (node 7) and its immediate common ancestor (node 8). B–D) Several possible biogeographic histories (comprising 6, 6, and 12 events, respectively) that can explain the observed species ranges.

The probability of a particular biogeographic history can be calculated in a straightforward manner by assuming that the events of colonization and local extinction occur according to a continuous-time Markov chain (Ree et al. 2005). A continuous-time Markov

chain is fully described by a matrix containing the

instantaneous rates of change between all pairs of states

(geographic ranges, in this case). This instantaneous-

rate matrix, Q, has off-diagonal elements that are

all ≥ 0 and negative diagonal elements that are

speciﬁed such that each row of the matrix sums

to 0. The elements of $Q$ are parameterized by functions of $\theta$, the parameter vector, according to some dispersal model, $M$. The probability of a biogeographic history is obtained using the information on the position of colonization/extinction events on the tree and information from the instantaneous-rate matrix. Consider, for example, a case in which the process starts with a geographic range of 001 at one end of a branch, with a subsequent colonization of area one at time $t_1$ (i.e., changes from $001 \to 101$), and then remains in the geographic range 101 until the end of the branch at time $t_2$. The probability of this history is

$$
\underbrace{-q_{001,001}\, e^{-(-q_{001,001}\, t_1)}}_{\text{Waiting time for colonization}}
\times
\underbrace{-\frac{q_{001,101}}{q_{001,001}}}_{\text{Probability of colonization event}}
\times
\underbrace{e^{-(-q_{101,101}\,(t_2 - t_1))}}_{\text{Probability of no further events}}
$$
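The three factors in this expression can be evaluated numerically once the relevant entries of $Q$ are known. The sketch below is a minimal illustration; the function name and the numerical rate values are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
import math


def simple_history_probability(q_start_diag, q_event, q_end_diag, t1, t2):
    """Probability density of one change at time t1 followed by no change until t2.

    q_start_diag: diagonal entry of Q for the starting range (negative), e.g. q_{001,001}
    q_event:      off-diagonal entry for the observed change, e.g. q_{001,101}
    q_end_diag:   diagonal entry of Q for the ending range (negative), e.g. q_{101,101}
    """
    waiting_time = -q_start_diag * math.exp(q_start_diag * t1)  # exponential density at t1
    which_event = -q_event / q_start_diag                       # relative probability of this change
    no_further_events = math.exp(q_end_diag * (t2 - t1))        # survival until t2
    return waiting_time * which_event * no_further_events


# Hypothetical rate-matrix entries, for illustration only.
p = simple_history_probability(q_start_diag=-0.2, q_event=0.1, q_end_diag=-0.3, t1=1.0, t2=3.0)
print(p)
```

Note that the first two factors simplify to $q_{001,101}\, e^{q_{001,001} t_1}$, so the diagonal entry of the starting range enters only through the exponent.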

There are an inﬁnite number of biogeographic

histories that can explain the observed geographic

ranges. When calculating the probability of the observed

geographic ranges at the tips of the phylogenetic tree,

it is unreasonable to condition on a speciﬁc history

of biogeographic change. After all, the past history

of biogeographic change is not observable. Instead,

the usual approach is to marginalize over all possible

histories of biogeographic change that could give rise

to the observed geographic ranges. The standard way

to do this is to assume that events of colonization

or local extinction occur according to a continuous-

time Markov chain (Ree et al. 2005). Marginalizing

over histories of biogeographic change is accomplished

using two procedures. First, exponentiation of the

instantaneous-rate matrix, Q, gives the probability

density of all possible biogeographic changes along a

branch

$$p(y \to z; t, Q) = \left[ e^{Qt} \right]_{yz},$$

where $y$ is the ancestral geographic range, $z$ is the

current geographic range, and t is the duration of the

branch on the tree. The geographic-range transition

probabilities obtained in this way marginalize over all

possible biogeographic histories along a single branch,

but do not account for the possible combinations of

geographic ranges that can occur at internal nodes of

the phylogeny. The Felsenstein (1981) pruning algorithm

is typically used to marginalize over the different

combinations of “states” (ancestral geographic ranges)

at the interior nodes of the tree. Taken together, matrix

exponentiation and the pruning algorithm comprise the

conventional approach for calculating the probability

of observing the geographic ranges at the tips of the

tree while accounting for all of the possible ways

those observations could have been generated under the

model.

The dimensions of the instantaneous-rate matrix, $Q$, however, are $n(S) \times n(S)$, where $n(S) = 2^N - 1$, so the size of $Q$ grows exponentially with respect to the number of geographic areas, $N$. Furthermore, computing the matrix exponential is of complexity $O(n(S)^3)$ (Golub and Loan 1983). Thus, for values of $N \geq 20$, the number of computations required to exponentiate the rate matrix is quite large and computing the transition probabilities in this manner is intractable (Ree and Sanmartín 2009).
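To make the growth concrete, the short sketch below tabulates the number of states, the number of entries in $Q$, and the order-of-magnitude operation count $n(S)^3$ for a few values of $N$. The function names are hypothetical, and the cubic count is only the asymptotic order quoted above, not a measured cost.

```python
def n_states(n_areas):
    """Number of admissible ranges, n(S) = 2^N - 1."""
    return 2 ** n_areas - 1


def rate_matrix_entries(n_areas):
    """Number of entries in the n(S) x n(S) instantaneous-rate matrix."""
    return n_states(n_areas) ** 2


def expm_operation_order(n_areas):
    """Order of the operation count for matrix exponentiation, n(S)^3."""
    return n_states(n_areas) ** 3


for n in (3, 10, 20):
    print(n, n_states(n), rate_matrix_entries(n), expm_operation_order(n))
```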

Statistical phylogenetic models encounter an

analogous problem when modeling nucleotide

evolution. As Felsenstein (1981) suggests, one

might assume that each nucleotide site evolves

under mutual independence to keep the state space

small and amenable to matrix exponentiation. For

biogeographic inference, however, the assumption

of mutual independence would imply (implausibly)

that the correlative effects between areas—such as

geographic distance—are irrelevant to dispersal

processes, which renders this assumption suitable only

as a null model for testing the ﬁtness of more plausible

(e.g., distance-dependent dispersal) biogeographic

models.

Our primary motivation here is to remove the

computational constraint that precludes the elaboration

of more complex (and realistic) biogeographic models.

As a result of this focus, we leave the rigorous

comparison of inference across alternative models and

methods as an open topic for future study.

A Distance-Dependent Biogeographic Model

The instantaneous-rate matrix, Q, describes how the

geographic range of a species can evolve through

time. As with the formulation of Ree et al. (2005),

we assume that in an instant of time only a single

area can be gained or lost. In other words, each row

of $Q$ contains up to $N$ positive, non-zero entries,

which correspond to the rates at which any one of

the $N$ areas switches between absent and present (i.e.,

the $N$ $0 \to 1$ and $1 \to 0$ positive entries of the row).

Additionally, each row contains a single element on

the diagonal of the matrix, defined as $q_{i,i} = -\sum_{j \neq i} q_{i,j}$,

which ensures that each row of $Q$ sums to 0. The

remaining entries in Q have a value of 0, as they entail an

instantaneous change in geographic range involving two

or more areas. This process corresponds to a dispersal–

extinction (DE) model, which is somewhat simpliﬁed

relative to the dispersal–extinction–cladogenesis (DEC)

model (Ree et al. 2005), in that ancestral ranges are

inherited identically. However, the current framework

greatly expands the scope for the elaboration and

inclusion of more diverse and realistic speciation

scenarios.

We define a distance-dependent dispersal model, $M_D$,

where the rate of gaining a particular area (0→1)

depends on the relative proximity of available areas to

those currently occupied by a lineage. That is, the rate

of colonizing a nearby area just outside the perimeter

of the current geographic range should be greater than

the rate of colonizing a relatively remote geographic

area. The precise nature of the relationship between

geographic distance and dispersal probability might be

speciﬁed in numerous ways (see, e.g., Wallace 1887;

MacArthur and Wilson 1967; Hanski 1998). Our

distance-dependent model specifies a simple relationship in

which the probability of dispersal between two areas

is inversely related to the geographic distance between

them.

Let $q^{(a)}_{y,z}$ be the rate of change from the geographic range $y$ to the geographic range $z$, where $y$ and $z$ differ only at the single area index $a$. Note that the rate function accepts any pair of bit vectors as arguments, allowing us to later assign configurations from $x_{i,\bullet,k}$ to $y$ and $z$, $x_{i,\bullet,k}$ being the geographic range of species $i$ at time $\tau^{(i)}_k$. Also, let $\lambda_0 \in \theta$ and $\lambda_1 \in \theta$ be the respective rates at which an individual area is lost or gained within a geographic range, and $\eta(y,z,a,\beta)$ be a dispersal-rate modifier that accounts for correlative distance effects. We define the instantaneous dispersal rate as

$$
q^{(a)}_{y,z} =
\begin{cases}
\lambda_0 & \text{if } z_a = 0 \\
\lambda_1\, \eta(y,z,a,\beta) & \text{if } z_a = 1 \\
0 & \text{if } y \text{ and } z \text{ differ at more than one area} \\
0 & \text{if } y = 00\ldots0
\end{cases}
\qquad (1)
$$

and the distance-dependent dispersal-rate modifier as

$$
\eta(y,z,a,\beta) =
\left( \sum_{n=1}^{N} 1_{\{y_n = 1\}}\, d(G_n, G_a)^{-\beta} \right)
\times
\left( \frac{\sum_{m=1}^{N} 1_{\{y_m = 0\}}}{\sum_{m=1}^{N} 1_{\{y_m = 0\}} \sum_{n=1}^{N} 1_{\{y_n = 1\}}\, d(G_n, G_m)^{-\beta}} \right)
\qquad (2)
$$

where we define $1_{\{x = y\}}$ as the indicator function that equals 1 when both arguments are equal and 0 otherwise, and $d(\cdot)$ as the Great Circle distance between two geographical coordinates on the surface of a sphere, given by

$$
d(G_n, G_m) = 2r \sin^{-1}\left( \sqrt{ \sin^2\left( \frac{G_{m,\mathrm{lat}} - G_{n,\mathrm{lat}}}{2} \right) + \cos(G_{n,\mathrm{lat}}) \cos(G_{m,\mathrm{lat}}) \sin^2\left( \frac{G_{m,\mathrm{lon}} - G_{n,\mathrm{lon}}}{2} \right) } \right),
$$

where $r$ is the radius of the sphere, and $G_n$ is a vector with elements $G_{n,\mathrm{lat}}$ and $G_{n,\mathrm{lon}}$ that correspond to the latitude and longitude of the centroid of discrete area $n$. Here, we take a sphere with $r \approx 6.37 \times 10^6$ meters to approximate the size and shape of Earth.
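The Great Circle distance can be computed with the standard haversine formula. The sketch below assumes that centroid coordinates are supplied in decimal degrees; the function name and argument order are hypothetical, and the radius is the value quoted in the text.

```python
import math

EARTH_RADIUS_M = 6.37e6  # sphere radius in meters, as quoted in the text


def great_circle_distance(lat_n, lon_n, lat_m, lon_m, radius=EARTH_RADIUS_M):
    """Haversine distance between two points given in decimal degrees."""
    phi_n = math.radians(lat_n)
    phi_m = math.radians(lat_m)
    half_dlat = (phi_m - phi_n) / 2.0
    half_dlon = math.radians(lon_m - lon_n) / 2.0
    h = math.sin(half_dlat) ** 2 + math.cos(phi_n) * math.cos(phi_m) * math.sin(half_dlon) ** 2
    # Clamp to 1.0 to guard against rounding slightly above the domain of asin.
    return 2.0 * radius * math.asin(math.sqrt(min(1.0, h)))


# Two points on the equator, a quarter of the way around the sphere.
print(great_circle_distance(0.0, 0.0, 0.0, 90.0))
```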

Figure 2 will help develop intuition for how we model distance-dependent dispersal. In effect, the first term of $\eta(\cdot)$ computes the sum of inverse pairwise $\beta$-exponentiated geographic distances between the dispersal target, $a$, and all currently occupied areas of the geographic range. The second term normalizes the dispersal rate by the mean of all inverse pairwise geographic distances between all occupied–unoccupied area-pairs. This normalization ensures that the sums of dispersal rates with and without the distance-dependence modifier are equal, which helps identify and interpret parameters $\lambda_1$ and $\beta$. If $\eta(\cdot) = 1$ or $\beta = 0$, then the rate of dispersal to area $a$ equals the unmodified dispersal rate, $\lambda_1$. If $\beta > 0$, then the rate of dispersal to nearby areas is higher than that to more distant areas. Conversely, when $\beta < 0$, the rate of dispersal to more distant areas is higher than that to nearby areas. Finally, model $M_D$ is equivalent to $M_0$ when $\beta = 0$.
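The behavior of the modifier can be checked on a toy example. The sketch below follows the worked example in Figure 2, where the normalization runs over the areas that are unoccupied in the current range $y$; because $z$ is determined by $y$ and $a$, it is not passed as an argument. The function name `eta`, the 0-based area indices, and the four areas placed on a line are all assumptions made for illustration.

```python
def eta(y, a, dist, beta):
    """Distance-dependent dispersal-rate modifier for gaining area a from range y.

    y:    current range as a tuple of 0/1 values
    a:    0-based index of the target area (must be unoccupied in y)
    dist: matrix of pairwise distances, dist[n][m]
    beta: distance-power parameter
    """
    occupied = [n for n, bit in enumerate(y) if bit == 1]
    unoccupied = [m for m, bit in enumerate(y) if bit == 0]
    # First term: inverse distances from every occupied area to the target.
    to_target = sum(dist[n][a] ** (-beta) for n in occupied)
    # Second term: number of unoccupied areas over the sum across all
    # occupied-unoccupied pairs.
    all_pairs = sum(dist[n][m] ** (-beta) for m in unoccupied for n in occupied)
    return to_target * len(unoccupied) / all_pairs


# Hypothetical layout: four areas on a line, with absolute differences as distances.
coords = [0.0, 1.0, 3.0, 7.0]
dist = [[abs(c1 - c2) for c2 in coords] for c1 in coords]
y = (1, 1, 0, 0)

print(eta(y, 2, dist, beta=1.0), eta(y, 3, dist, beta=1.0))
print(eta(y, 2, dist, beta=0.0), eta(y, 3, dist, beta=0.0))
```

With $\beta > 0$ the nearer target (index 2) receives the larger modifier, and in every case the modifiers over the unoccupied areas sum to the number of unoccupied areas, which is the normalization property described above.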

FIGURE 2. Cartoon of the computation of the distance-dependent dispersal-rate modifier, $\eta(\cdot)$. Here, we are interested in computing the rate of $y = 1100$ transitioning to $z = 1101$:

$$
\eta(y = 1100, z = 1101, a = 4, \beta) =
\left[ d(G_1,G_4)^{-\beta} + d(G_2,G_4)^{-\beta} \right]
\times
\frac{2}{d(G_1,G_3)^{-\beta} + d(G_2,G_3)^{-\beta} + d(G_1,G_4)^{-\beta} + d(G_2,G_4)^{-\beta}}.
$$

The first term computes the sum of inverse distances raised to the power $\beta$ between the area of interest (i.e., 4) and all currently occupied areas (i.e., areas 1 and 2). The second term then normalizes this quantity by dividing by the sum of inverse distances raised to the power $\beta$ between all occupied–unoccupied area-pairs (i.e., the denominator), then multiplying by the number of currently unoccupied areas (i.e., 2, the numerator).

Note that the rate of gain depends on the distance-dependent correlation function $\eta(\cdot)$, but the rate of loss does not, so the distance-dependent dispersal model is not time reversible when $\beta \neq 0$. This fact has implications for evaluating the stationary frequency of geographic ranges at the root of the tree under this biogeographic model, which we detail below.

Sampling Biogeographic Histories

Our goal is to conduct inference under a dispersal

model that captures the correlative effects of geographic

distance between areas when N is large. For the

computational reasons cited above, we cannot use matrix

exponentiation to compute the likelihood under such

a biogeographic model. Instead, we adapt a Bayesian

data-augmentation approach that was introduced by

Robinson et al. (2003) to model site-dependent protein

evolution. Rather than analytically integrating over

all possible biogeographic histories using matrix

exponentiation, we numerically integrate over possible

histories using data augmentation and MCMC.

We use the stochastic character-mapping algorithm

described by Nielsen (2002) to sample biogeographic

histories under the mutual-independence model, $M_0$.

This works by first sampling a set of geographic

ranges for all internal nodes of the phylogeny and

then sampling intermediate ranges over each of

the branches connecting pairs of ancestor–descendant

nodes. Upon completion, each branch is associated

with a biogeographic history: the events comprising

this history on each branch are ordered chronologically

from past to present. Examples of such biogeographic

histories are depicted in Figure 1b–d. We describe the

process of sampling biogeographic histories in more

detail below.

We first sample a set of geographic ranges for all $M$ internal nodes from the joint posterior probability distribution of geographic-range configurations at the nodes. For tip nodes, we simply assign the observed species ranges. Next, we visit each individual branch in a pre-order traversal (moving from the root to the tips) of the tree. For each branch, we simulate a sequence of intermediate geographic ranges from the ancestral to the descendant node using rejection sampling; that is, the biogeographic history simulated along a branch must be consistent with the geographic ranges sampled/specified for the ancestor and descendant nodes of that branch. To do so, we first identify the initial geographic range at the ancestral node, the final geographic range at the descendant node, and the duration of the branch separating these two nodes. We then sample a history of dispersal events for each area under the mutual-independence model, $M_0$, under a simple instantaneous-rate matrix for a single area

$$
Q^{*} =
\begin{pmatrix}
-\lambda^{*}_{0} & \lambda^{*}_{0} \\
\lambda^{*}_{1} & -\lambda^{*}_{1}
\end{pmatrix},
$$

where $\lambda^{*}_{0}$ and $\lambda^{*}_{1}$ are the per-area rate of loss/local extinction ($1 \to 0$) and gain/colonization ($0 \to 1$), respectively. To iteratively sample the biogeographic history for each area, $j \in \{1, \ldots, N\}$, we initialize $\tau_0 = t_{\sigma(i)}$ and $k = 1$. Each iteration moves the process further along the branch by sampling a new waiting time, $\Delta\tau$, from $Q^{*}$, updating $\tau_0 = \tau_0 - \Delta\tau$, incrementing $k$, and inserting $\tau_0$ into $\tau^{(i)}$ in sorted order as we go. Each event results in the state for area $j$ changing to its complement (i.e., $0 \to 1$, or $1 \to 0$), which we record in the branch history, $x_{i,j,k}$. We continue to sample dispersal events until the time of the next event is younger than the age of the end of the branch, $\tau_0 < t_i$, whereupon we record the final event time as $\tau^{(i)}_F = t_i$. Because waiting times are exponentially distributed, and therefore continuous, any two areas undergo dispersal events at precisely the same instant with probability 0, which is consistent with the one-change-at-a-time assumption of the model.
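The per-area sampling loop can be sketched as follows. This is a minimal illustration for a single area on a single branch, assuming ages decrease from the ancestral node toward the descendant node; the function name and argument names are hypothetical, and the sketch does not include the additional check that rejects histories passing through the all-absent range, since that check involves all areas jointly.

```python
import random


def sample_area_history(start_state, end_state, t_start, t_end, loss_rate, gain_rate, rng):
    """Rejection-sample presence/absence events for one area along one branch.

    Ages run from t_start (ancestral end, older) down to t_end (descendant end).
    Returns a list of (age, new_state) events ordered from oldest to youngest.
    """
    while True:
        state, age, events = start_state, t_start, []
        while True:
            # Leave state 1 at the loss rate and state 0 at the gain rate.
            rate = loss_rate if state == 1 else gain_rate
            age -= rng.expovariate(rate)
            if age < t_end:  # next event would fall past the end of the branch
                break
            state = 1 - state  # each event flips the area to its complement
            events.append((age, state))
        if state == end_state:  # keep only histories matching the descendant node
            return events


rng = random.Random(1)
history = sample_area_history(0, 1, t_start=5.0, t_end=0.0, loss_rate=0.3, gain_rate=0.2, rng=rng)
print(history)
```

Because the per-area state is binary, a simulated history ends in the required state reasonably often, so the rejection step stays cheap in this setting.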

When the biogeographic history for area j is sampled,

we check to make sure it matches the geographic

ranges sampled at the nodes. Inconsistent histories are

rejected and resampled for each area. Additionally, we

reject and resample events that induce the forbidden

extinction configuration. For models in which the per-

site (per-area) state space is large, rejection sampling

path histories can be computationally inefficient (cf.

Minin and Suchard 2007). This is not a concern in the

present case, however, as the per-area state space is

binary (i.e., 1 or 0 for presence/absence of a species in

an area), so we opt for the simpler algorithm.

We iterate this process of simulating branch-speciﬁc

biogeographic histories for the remaining branches,

which we visit in a pre-order sequence. This results

in $\tau^{(i)}$ for each branch, an ordered vector of event

times across all N areas, enabling us to compute

the model likelihood given a sampled biogeographic

history.

Computing the Likelihood of Biogeographic Histories

Since we can compute the rate at which any area is

gained or lost given the current geographic range, we

can compute the likelihood of a sampled biogeographic

history by adopting a “mechanistic” interpretation of

the instantaneous-rate matrix, Q. In general, waiting

times between events in a continuous-time Markov

process are exponentially distributed: when the process

is in state i, the next event will occur with an

exponentially distributed waiting time, where the rate of

the exponential is equal to the overall rate of leaving state

$i$: $-q_{i,i} = \sum_{j \neq i} q_{i,j}$. Moreover, the nature of the change

at the next event is also specified by the instantaneous-

rate matrix: the relative probability that the next event

entails a change from state $i$ to state $j$ is $p(i \to j) = -q_{i,j}/q_{i,i}$.

Accordingly, the probability that the next event entails

a change from state i to state j at time t is simply

equal to the probability of any event occurring at time

t times the relative probability that the event is a change

from i to j.

In the present case, we let $x_{i,\bullet,k} = (x_{i,1,k}, x_{i,2,k}, \ldots, x_{i,N,k})$ be the state (sampled range) for lineage $i$ at time $\tau^{(i)}_k$. Then, the probability that the next event is the state change $y \to z$ at time $t$ is the product of the probability of the next sampled event occurring first among all possible events and the probability of any event occurring at time $t$, given as

$$
p(y \to z; t, \theta, M) = -\frac{q_{y,z}}{q_{y,y}} \left( -q_{y,y}\, e^{-(-q_{y,y}) t} \right) = q_{y,z}\, e^{q_{y,y} t}, \qquad (3)
$$

and the probability that no event occurs in time $t$ is given as

$$
p(y \to y; t, \theta, M) = e^{q_{y,y} t}. \qquad (4)
$$

Note that the distance-dependent dispersal model defined in (1) depends on the superscript, $(a)$, which indicates the single area that differs between ranges $y$ and $z$. Here, we suppress the superscript in the interest of simplifying the notation. Changes between ranges that differ by more than one area have a transition rate of 0 (they are prohibited under the one-change-at-a-time model), so the summation that defines $q_{y,y}$ requires only $N$ computations.

FIGURE 3. Cartoon of the likelihood terms. The biogeographic history for lineage $i$ includes the lineage start at time $\tau^{(i)}_1$, an extinction event at area 2 at time $\tau^{(i)}_2$, a dispersal event into area 3 at time $\tau^{(i)}_3$, and the lineage end at time $\tau^{(i)}_F$, with all events lying within the time interval (3.2, 9.3). The probability of a sampled geographic range at the start of the branch is conditioned on the previous (ancestral) geographic range and the time separating the geographic ranges, $\Delta\tau^{(i)}_k = \tau^{(i)}_{k-1} - \tau^{(i)}_k$. The likelihood is the product of the probabilities corresponding to each interval accounting for an area loss at time $\tau^{(i)}_2$, an area gain at time $\tau^{(i)}_3$, and no further changes occurring before the lineage terminates.

The likelihood of the biogeographic history over all branches of the phylogeny is then simply calculated as the product of all stepwise likelihoods (Fig. 3),

$$L(X_{\mathrm{obs}}, X_{\mathrm{aug}}; \theta, M) = \prod_{i} \left( \underbrace{\left( \prod_{k=2}^{F_i - 1} p\!\left(x_{i,\bullet,k-1} \to x_{i,\bullet,k};\, \Delta\tau^{(i)}_k, \theta, M\right) \right)}_{\text{stepwise changes}} \times \underbrace{p\!\left(x_{i,\bullet,F_i} \to x_{i,\bullet,F_i};\, \Delta\tau^{(i)}_{F_i}, \theta, M\right)}_{\text{no change}} \right), \tag{5}$$

where $F_i = n^{(i)}$ is the number of events on branch $i$, $\Delta\tau^{(i)}_k = \tau^{(i)}_{k-1} - \tau^{(i)}_k$ is the temporal interval between events, and $X_{\mathrm{obs}}$ are the ranges observed at the tips.
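As a rough illustration of how Equations 3 to 5 combine on a single branch, the following Python sketch accumulates the log-likelihood of one sampled history. It assumes the simplest possible rates: a constant loss rate `lam0` for occupied areas and a constant gain rate `lam1` for unoccupied areas, with no distance effect and no restriction on which ranges are allowed. The function names, the forward-running time convention, and the numerical values are hypothetical choices made for this sketch, not part of the described implementation.

```python
import math

def rate_to(range_y, area, lam0, lam1):
    """Rate of flipping `area` in range_y: loss if occupied, gain otherwise."""
    return lam0 if range_y[area] == 1 else lam1

def total_rate(range_y, lam0, lam1):
    """-q_{y,y}: sum over the N single-area changes that leave range_y."""
    return sum(rate_to(range_y, a, lam0, lam1) for a in range(len(range_y)))

def branch_log_likelihood(start_range, events, t_start, t_end, lam0, lam1):
    """Log of the per-branch term of Eq. 5 for one sampled history.

    events: list of (time, area) pairs ordered from t_start toward t_end.
    Time runs forward here (t_start < t_end), an assumption of this sketch.
    """
    current = list(start_range)
    t_prev = t_start
    log_l = 0.0
    for t_event, area in events:
        dt = t_event - t_prev
        # Eq. 3 in log form: log q_{y,z} + q_{y,y} * dt
        log_l += math.log(rate_to(current, area, lam0, lam1))
        log_l -= total_rate(current, lam0, lam1) * dt
        current[area] = 1 - current[area]  # apply the single-area change
        t_prev = t_event
    # Eq. 4: no further change before the branch ends
    log_l -= total_rate(current, lam0, lam1) * (t_end - t_prev)
    return log_l, current

# A history shaped like Fig. 3: loss of area 2, then gain of area 3
# (zero-based indices 1 and 2); event times and rates are made up.
log_l, end_range = branch_log_likelihood(
    [1, 1, 0], [(5.0, 1), (7.5, 2)], 3.2, 9.3, lam0=0.5, lam1=0.25)
```

Because each range has only N single-area neighbors, `total_rate` costs N operations per interval, which is the point made in the text about the summation.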

Markov Chain Monte Carlo

We can compute the posterior probability of a single

sampled biogeographic history as

$$p(\theta, X_{\mathrm{aug}} \mid X_{\mathrm{obs}}, M_D) \propto L(X_{\mathrm{obs}}, X_{\mathrm{aug}}; \theta, M_D)\, p(\theta).$$

We approximate the joint posterior probability density

of the biogeographic model parameters numerically

using an MCMC algorithm. The general idea is

to construct a Markov chain with a state space

comprising the possible values for the model parameters

and a stationary probability distribution that is the

target distribution of interest (i.e., the joint posterior

probability distribution of the model parameters). Draws


from the Markov chain at stationarity are valid, albeit

dependent, samples from the posterior probability

distribution of the biogeographic parameters (Tierney

1994). Accordingly, parameter estimates are based on the

frequency of samples drawn from the stationary Markov

chain.

By repeatedly sampling dispersal histories via MCMC, we numerically integrate over $X_{\mathrm{aug}}$,

$$p(\theta \mid X_{\mathrm{obs}}, M_D) \propto \int_{X_{\mathrm{aug}}} p(\theta, X_{\mathrm{aug}} \mid X_{\mathrm{obs}}, M_D).$$

To generate samples from this posterior, we rely on the

Metropolis–Hastings algorithm (Metropolis et al. 1953;

Hastings 1970). Below, we describe our MCMC proposals

for an audience that we assume has some familiarity

with MCMC.

Proposing parameters.—Our method has two pairs of parameters for each area governing the rate at which it is added to or removed from the current biogeographic range: $\lambda_0$ and $\lambda_1$, which are used when computing the likelihood under the distance-dependent model; and $\lambda^*_0$ and $\lambda^*_1$, which are used to sample biogeographic histories under the simpler mutual-independence model. All four rates must take on values > 0 and are assigned half-Cauchy(0, 1) priors. We propose changes to the dispersal-rate parameters by first randomly selecting one of the four rates (uniformly with P = 0.25), then proposing a new value for the selected rate parameter, $x' = x e^{\delta(u-0.5)}$, where $x$ is the current dispersal rate, $x'$ is the proposed dispersal rate, $\delta$ is a tuning parameter, and $u \sim \mathrm{Uniform}(0,1)$. The probability of accepting a proposed change to the dispersal-rate parameters, $\lambda_0$ and $\lambda_1$, under the distance-dependent model, $M_D$, is calculated using the Metropolis–Hastings ratio

$$R = \min\left(1,\; \frac{L(X_{\mathrm{obs}}, X_{\mathrm{aug}}; \theta', M_D)}{L(X_{\mathrm{obs}}, X_{\mathrm{aug}}; \theta, M_D)} \times \frac{p(\theta')}{p(\theta)} \times \frac{\lambda'}{\lambda}\right),$$

where the first term is the ratio of the likelihoods of the proposed and current states, the second term is the ratio of the prior probabilities of the proposed and current states, and the final term is the simplified Hastings ratio that describes the ratio of the proposal probabilities for the proposed and current states.
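The multiplicative proposal and its acceptance step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `log_like` stands in for whatever function returns the log-likelihood of the current history for a given rate, the prior is the half-Cauchy(0, 1) named in the text, and the names `propose_rate`, `mh_step_rate`, and `delta` are hypothetical.

```python
import math
import random

def half_cauchy_logpdf(x, scale=1.0):
    """Log density of a half-Cauchy(0, scale) prior evaluated at x > 0."""
    return math.log(2.0 / (math.pi * scale)) - math.log1p((x / scale) ** 2)

def propose_rate(x, delta, rng):
    """Multiplicative proposal x' = x * exp(delta * (u - 0.5)), u ~ U(0, 1)."""
    u = rng.random()
    x_new = x * math.exp(delta * (u - 0.5))
    log_hastings = math.log(x_new / x)  # Hastings ratio x'/x for this move
    return x_new, log_hastings

def mh_step_rate(x, log_like, delta, rng):
    """One Metropolis-Hastings update of a positive rate parameter.

    log_like: user-supplied function returning the log-likelihood for a rate
    (a placeholder for the history likelihood in this sketch).
    """
    x_new, log_hastings = propose_rate(x, delta, rng)
    log_r = (log_like(x_new) - log_like(x)
             + half_cauchy_logpdf(x_new) - half_cauchy_logpdf(x)
             + log_hastings)
    if rng.random() < math.exp(min(0.0, log_r)):
        return x_new, True
    return x, False
```

Working on the log scale avoids underflow when likelihoods are products over many events, and the proposal keeps the rate positive by construction, so no boundary reflection is needed.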

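To improve acceptance rates for proposed dispersal histories under the mutual-independence model, $M_0$, we infer $(\lambda^*_0, \lambda^*_1) \in \theta^*$ by conditioning the likelihood on $M_0$ instead of $M_D$, yielding the Metropolis–Hastings ratio

$$R = \min\left(1,\; \frac{L(X_{\mathrm{obs}}, X_{\mathrm{aug}}; \theta^{*\prime}, M_0)}{L(X_{\mathrm{obs}}, X_{\mathrm{aug}}; \theta^{*}, M_0)} \times \frac{p(\theta^{*\prime})}{p(\theta^{*})} \times \frac{\lambda^{*\prime}}{\lambda^{*}}\right).$$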

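We specify a Cauchy(0, 1) prior for the distance-power parameter, $\beta$, and propose new values $\beta' \sim N(\beta, \sigma)$, where $\sigma$ is a tuning parameter. The Metropolis–Hastings ratio to update $\beta$ is

$$R = \min\left(1,\; \frac{L(X_{\mathrm{obs}}, X_{\mathrm{aug}}; \theta', M_D)}{L(X_{\mathrm{obs}}, X_{\mathrm{aug}}; \theta, M_D)} \times \frac{p(\theta')}{p(\theta)} \times 1\right),$$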

where the Hastings ratio simpliﬁes to 1 owing to the

symmetry of the normal distribution. We used the

Cauchy and half-Cauchy distributions as priors because

they are weakly informative and fat-tailed, causing

our inference to prefer parameter values near 0 while

permitting parameters to take on large values should the

data prove informative.

Proposing biogeographic histories.—To update

biogeographic histories, we sample an internal node

uniformly at random and a set of areas, S, uniformly at

random. We then propose a new biogeographic history

by resampling the biogeographic histories for areas

S for incident branches using the stochastic-mapping

approach described earlier.

The Metropolis–Hastings ratio for this proposal is

$$R = \min\left(1,\; \frac{L(X_{\mathrm{obs}}, X'_{\mathrm{aug}}; \theta, M_D)}{L(X_{\mathrm{obs}}, X_{\mathrm{aug}}; \theta, M_D)} \times \frac{p(\theta)}{p(\theta)} \times \frac{L(X_{\mathrm{obs}}, X_{\mathrm{aug}}; \theta^{*}, M_0)}{L(X_{\mathrm{obs}}, X'_{\mathrm{aug}}; \theta^{*}, M_0)}\right),$$

where the first term is the likelihood ratio under the full model, $M_D$, and the final term is the proposal-density ratio that accounts for the probability of sampling the proposed biogeographic histories under the sampling model, $M_0$, using the sampling parameters, $\theta^*$. The parameters are not updated as part of this proposal, thus the ratio of prior probabilities (the middle term) may be safely omitted as it always equals 1.
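On the log scale, the acceptance ratio for a resampled history reduces to a difference of four log-likelihoods. The short Python sketch below shows that arithmetic; the function name and argument names are hypothetical, and the four inputs are assumed to have been computed elsewhere (for instance by summing per-branch terms of Equation 5 under each model).

```python
def log_accept_history(ll_full_new, ll_full_old, ll_samp_new, ll_samp_old):
    """Log acceptance probability for a proposed biogeographic history.

    ll_full_*: log-likelihoods of the proposed/current history under the
               full model (M_D in the text).
    ll_samp_*: log-likelihoods of the same histories under the sampling
               model (M_0 with the sampling parameters theta*).
    """
    log_like_ratio = ll_full_new - ll_full_old
    # Proposal-density ratio: probability of proposing the current history
    # divided by the probability of proposing the new one.
    log_proposal_ratio = ll_samp_old - ll_samp_new
    return min(0.0, log_like_ratio + log_proposal_ratio)
```

One consequence visible in this form: if the sampling model assigned exactly the same likelihoods as the full model, the two differences would cancel and every proposal would be accepted. The closer the sampling model tracks the full model, the higher the acceptance rate.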

Typically, the prior probability of each state

(geographic range) at the root is equal to the

corresponding stationary frequencies of the model.

As mentioned above, our distance-dependent dispersal

model is not time reversible, so we cannot approximate

the stationary distribution by conventional means (cf.

Robinson et al. 2003). Instead, we leverage the fact that

the stationary frequencies of states (geographic ranges)

of a model can be approximated by simulating the

continuous-time Markov process over a sufﬁciently long

branch. Accordingly, we append a long stem branch

to the root node, sample an ancestral “consensus” configuration as the ancestral state at the stem node, then simulate a biogeographic history along the stem branch that is consistent with the states at the beginning

(stem node) and end (root node) of the stem branch.

Thus, we simulate into the stationary distribution

of geographic ranges under the distance-dependent

dispersal model along the stem branch, and then sample

from the approximated stationary distribution at the

root node using the same proposal machinery as is used

for any internal node.
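The idea of simulating into the stationary distribution along a long branch can be illustrated with a forward Gillespie simulation. The sketch below is not the stem-branch procedure itself (which samples histories conditioned on both end states); it only shows, under the simplifying assumption of independent areas with constant loss rate `lam0` and gain rate `lam1`, that a long enough simulation forgets its starting range. The function name and all numerical values are hypothetical, and any restriction on empty ranges is ignored.

```python
import random

def simulate_range(start_range, duration, lam0, lam1, rng):
    """Gillespie simulation of one-area-at-a-time gain and loss events."""
    current = list(start_range)
    n_areas = len(current)
    n_occupied = sum(current)
    t = 0.0
    while True:
        loss_total = lam0 * n_occupied
        gain_total = lam1 * (n_areas - n_occupied)
        total = loss_total + gain_total
        t += rng.expovariate(total)  # waiting time to the next event
        if t > duration:
            return current
        # Pick the event type in proportion to its summed rate, then a
        # uniformly random area among those eligible for that event.
        want = 1 if rng.random() < loss_total / total else 0
        candidates = [a for a, s in enumerate(current) if s == want]
        area = rng.choice(candidates)
        current[area] = 1 - current[area]
        n_occupied += 1 if want == 0 else -1

rng = random.Random(42)
final = simulate_range([1] + [0] * 199, 50.0, lam0=0.5, lam1=0.1, rng=rng)
occupied_fraction = sum(final) / len(final)
```

For independent two-state areas the expected occupied fraction at stationarity is lam1 / (lam0 + lam1), so `occupied_fraction` should scatter around that value once the branch is long relative to 1 / (lam0 + lam1).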


Model Selection

The mutual-independence model, $M_0$, is equivalent to the distance-dependent dispersal model, $M_D$, when $\beta = 0$. Since $M_0 \subseteq M_D$, we compute Bayes factors for these nested models using the Savage–Dickey ratio (Dickey 1971; Verdinelli and Wasserman 1995), defined as

$$B_{D,0} = \frac{P_0(\beta = 0 \mid M_D)}{P(\beta = 0 \mid \lambda_0, \lambda_1, x_{\mathrm{obs}}, M_D)},$$

where $P_0(\beta = 0 \mid M_D)$ is the prior probability and $P(\beta = 0 \mid \lambda_0, \lambda_1, x_{\mathrm{obs}}, M_D)$ is the posterior probability under the more general distance-dependent dispersal model, $M_D$, at the restriction point $\beta = 0$, where $M_D$ is equivalent to the simpler mutual-independence model, $M_0$. If the posterior probability under $M_D$ at $\beta = 0$ is significantly smaller than the corresponding prior probability, then the Bayes factor supports $M_D$ (i.e., $M_D$ provides a better fit to the data). Since there is no analytical expression for the posterior probability, $P(\beta = 0 \mid \lambda_0, \lambda_1, x_{\mathrm{obs}}, M_D)$, we approximate its distribution using the non-parametric Gaussian kernel density estimation method provided by default in R (R Core Team 2012).
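The Savage–Dickey calculation can be sketched with the Python standard library alone. The version below is an approximation of the workflow described above, not a reproduction of it: it evaluates a Gaussian kernel density estimate directly at the restriction point, using a rule-of-thumb bandwidth of the form 0.9 × min(sd, IQR/1.34) × n^(−1/5), which resembles R's default bandwidth but may differ in detail (for example in how quartiles are computed). The function names are hypothetical, and `beta_samples` is assumed to hold posterior draws of the distance-power parameter.

```python
import math
import statistics

def cauchy_pdf(x, loc=0.0, scale=1.0):
    """Density of the Cauchy(loc, scale) prior."""
    return 1.0 / (math.pi * scale * (1.0 + ((x - loc) / scale) ** 2))

def rule_of_thumb_bandwidth(samples):
    """Bandwidth 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(samples)
    sd = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return 0.9 * min(sd, (q3 - q1) / 1.34) * n ** (-0.2)

def kde_at(x, samples, bandwidth):
    """Gaussian kernel density estimate evaluated at a single point x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                      for s in samples)

def savage_dickey_bf(beta_samples, restriction=0.0):
    """Bayes factor B_{D,0}: prior density over posterior density at beta = 0."""
    prior = cauchy_pdf(restriction)
    posterior = kde_at(restriction, beta_samples,
                       rule_of_thumb_bandwidth(beta_samples))
    if posterior == 0.0:
        return math.inf  # posterior density underflowed at the restriction point
    return prior / posterior
```

Values above 1 favor the distance-dependent model and values below 1 favor the mutual-independence model. When the posterior places essentially no mass near 0, the estimate depends heavily on the kernel tails and is best read as "very large" rather than as a precise number.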

Data Analysis

Simulation study.—We simulated 50 dispersal data sets

for each of eight values of $\beta$: 0, 0.25, 0.5, 1, 2, 3, 4, and 6.

These data were simulated upon a geography with 20×

30=600 uniformly spaced discrete areas positioned over

the Bay Area, California. Phylogenies were simulated

under a pure birth process with rate 1, then scaled

to have a height comparable to our empirical study

phylogeny. Dispersal and extinction rates were also

chosen to resemble the rates inferred from the empirical

analysis, but scaled to account for the increased number

of areas. We then ran independent MCMC analyses for

each data set under the distance-dependent model. To

identify the values of $\beta$ that are indistinguishable from

the mutual-independence model, we computed Bayes

factors using the Savage–Dickey ratio for all posteriors

inferred under the distance-dependent model.

We then quantiﬁed how well the posterior

probabilities of dispersal histories correspond to the true

biogeographic history known from the simulation. To

do so, we compute the posterior probability of each area

being occupied by each internal node for each analysis,

then compute the sum of squared difference between

each probability (0 ≤ P ≤ 1) and the corresponding true

history (P = 0 or 1) recorded from the simulation. As

this error term increases, the inferred ancestral ranges

at nodes may be interpreted as less accurate.
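The error term described above is a plain sum of squared differences over internal nodes and areas. A minimal Python sketch, assuming both inputs are nested lists indexed as [node][area] (a layout chosen for this example, with a hypothetical function name):

```python
def squared_error(posterior_probs, true_history):
    """Sum over internal nodes and areas of (posterior P - true state)^2.

    posterior_probs[node][area]: posterior probability of occupancy, in [0, 1].
    true_history[node][area]:    simulated truth, 0 (absent) or 1 (present).
    """
    total = 0.0
    for node_probs, node_truth in zip(posterior_probs, true_history):
        for p, truth in zip(node_probs, node_truth):
            total += (p - truth) ** 2
    return total

# Two internal nodes, two areas; values are illustrative only.
error = squared_error([[0.9, 0.1], [0.5, 1.0]], [[1, 0], [1, 1]])
```

Each area contributes independently, so the score cannot tell a reconstruction that misses by one neighboring area from one that misses by a distant area.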

Empirical study.—We applied our method to 65 species

of the plant clade Rhododendron section Vireya, which are

distributed throughout the Malesian Archipelago. We

used the species distributions and 20 discrete areas of

endemism reported by Brown et al. (2006), and the time-

calibrated phylogeny reported by Webb and Ree (2012).

To compute distances between areas, we used a single

representative coordinate per area (depicted in Fig. 8a).

To simplify the analysis, we hold the geography to be

constant throughout time.

Software configuration.—Each MCMC analysis of the simulated data ran for $10^6$ cycles, sampling parameters and node biogeographic histories every $10^3$ cycles. For the empirical data, we ran five independent MCMC analyses, each set to run for $10^9$ cycles, sampling every $10^4$ cycles. To verify that the MCMC analyses converged to the same posterior distribution, we applied the Gelman diagnostic (Gelman and Rubin 1992) provided through the coda package (Plummer et al. 2006). Results from a single MCMC analysis are presented. The methods described here have been implemented in BayArea, for which C++ source code is available for download at http://code.google.com/p/bayarea (last accessed June 28, 2013).
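For readers unfamiliar with the convergence check, the basic potential scale reduction factor compares between-chain and within-chain variance for one parameter. The Python sketch below implements that basic form for equal-length chains; it omits the degrees-of-freedom correction and the upper confidence limit that the coda package reports, so its values may differ slightly from coda's. The function name is hypothetical.

```python
import math
import statistics

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains.

    chains: list of lists, one list of sampled values per independent run.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)
```

Values close to 1 indicate that the runs are sampling similar distributions; the threshold of 1.1 used in this study is a common rule of thumb.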

RESULTS

Simulation.—For 50 phylogenies of 20 tips and a fixed geography of 600 areas (see Methods section), we simulated 50 presence–absence data matrices for eight values of $\beta$: 0, 0.25, 0.5, 1, 2, 3, 4, and 6. Distributions of the mean posterior parameter values for the 8×50 MCMC analyses are shown in Figure 4. For $\beta \le 3$, the model was able to retrieve the true simulation parameters accurately, but this accuracy degraded for $\beta \ge 4$ (see Discussion section).

Figure 5 shows that Bayes factors consistently selected the correct model when data were simulated for $\beta \ge 1$ and for $\beta = 0$. For data simulated when $0 < \beta < 1$, we observed the greatest variance in the Bayes factor credible intervals. Data simulated under conditions in which distance had a weak effect on dispersal, i.e., $\beta \le 0.25$, were typically (and appropriately) indistinguishable from the mutual-independence model.

We then compared the true biogeographic history of each simulation to the corresponding posterior

distribution of the sampled biogeographic histories.

The sum of squared differences between posterior

(estimated) and true (simulated) dispersal histories

varied little for values of $\beta \le 3$, with slight elevation in error for $\beta \ge 4$ (Fig. 6). The elevated error for large values of the distance-power parameter, $\beta$, may be caused by

the underestimated parameter values, or it may be an

artifact of our error metric; it carries an independence

assumption, so it over-penalizes distance-dependent

dispersal histories that contain an excess of “near misses”

relative to “wild misses”.

Vireya.—Bayes factors strongly favor the

distance-dependent dispersal model over the mutual-

independence model to explain the biogeographic

history of 65 rhododendron species in the section Vireya

over 20 biogeographical areas throughout Malesia. The

estimated maximum a posteriori (MAP) value of the rate of area loss was $\lambda_0 = 0.13$, the rate of area gain was $\lambda_1 = 0.013$, and the distance power was $\beta = 2.65$ (Fig. 7). Gelman–Rubin convergence values for $\lambda_0$, $\lambda_1$, and $\beta$ between all pairs of MCMC analyses were < 1.1, which is consistent with all independent MCMC runs converging to the same posterior.

FIGURE 4. Distributions of means of posteriors of simulation study. Fifty data sets were simulated for each value of $\beta \in \{0, 0.25, 0.5, 1, 2, 3, 4, 6\}$ while $\lambda_0 = 0.05$ and $\lambda_1 = 0.005$ were held constant. For each set of 50 data sets, the mean of the posterior of each parameter was computed under the distance-dependent dispersal model. Distribution means are given by a bold line, while the 25th and 75th percentiles are given by the lower and upper edges of each box, called Q1 and Q3, respectively. The lower and upper whiskers indicate Q1 − IQR and Q3 + IQR, where IQR = 1.5 × (Q3 − Q1), and circles indicate outliers. The true parameter values are given by (A,B) the horizontal dashed line, and (C) the squares.

Figure 8 shows a summary of the inferred

biogeographic history (Supplementary Fig. 1

shows the full history and observed ranges). The

per-area posterior probabilities of the ancestral

ranges strongly favor migration eastward into the

Malesian Archipelago originating from Southeast

Asia. The inferred biogeographic scenario—multiple

independent dispersal events from the Sunda Shelf

across Wallace’s Line into Wallacea—is favored over

that of a single dispersal event followed by pervasive

extinction events (Fig. 8b). Lydekker’s Line appears

to be less permeable, with only a single lineage

dispersing eastward from Wallacea across it onto

the Sahul Shelf (Fig. 8c). An interactive animation

of the ancestral range reconstruction is hosted at http://mlandis.github.com/phylowood/?url=examples/vireya.nhx (last accessed June 28, 2013).

Readers might naturally wonder how inferences

under the current method compare to those based

on alternative statistical biogeographic methods, such

as the DEC model of Ree et al. (2005). Despite

their superﬁcial similarities—both are likelihood-based

methods that rely on continuous-time Markov models

to describe the evolution of species geographic range—

the methods differ to an extent that makes it difﬁcult

to draw any meaningful comparisons. Speciﬁcally,

the two methods invoke models that differ in many

respects (see Discussion section), and are implemented

in different statistical frameworks (maximum likelihood

vs. Bayesian inference).


FIGURE 5. Distributions of Bayes factors for the simulation study. [Plot not reproduced. x-axis: simulation data set per $\beta$ value (0, 0.25, 0.5, 1, 2, 3, 4, 6); y-axis: % of simulations favoring $M_D$ (0 to 100); legend: $BF_{D0}$ support for $M_D$ (Favors $M_0$, Insubstantial, Substantial, Strong, Very strong, Decisive).] Fifty data sets were simulated for each value of $\beta \in \{0, 0.25, 0.5, 1, 2, 3, 4, 6\}$ while $\lambda_0 = 0.05$ and $\lambda_1 = 0.005$ were held constant. Columns display the frequencies of strengths of support in favor of the distance-dependent dispersal model, where strengths of support correspond to the intervals suggested by Jeffreys (1961): Favors $M_0$ on (−∞,1); Insubstantial on [1,3); Substantial on [3,10); Strong on [10,30); Very strong on [30,100); Decisive on [100,∞). Each column corresponds to the strengths of support per 50 $\beta$-valued simulations. Bayes factors generally select the correct underlying model except for $\beta = 0.25$.

FIGURE 6. Errors for inferred dispersal histories of simulation study. The sum of squared differences between the posterior probability (i.e., 0 < P < 1) and the true history (i.e., P = 0 or P = 1) for each area and each internal node were computed per simulated data set. The box plots show the distribution of these sums for each batch of 50 simulated data sets per value of $\beta \in \{0, 0.25, 0.5, 1, 2, 3, 4, 6\}$. Distribution means are given by a bold line, while the 25th and 75th percentiles are given by the lower and upper edges of each box, called Q1 and Q3, respectively. The lower and upper whiskers indicate Q1 − IQR and Q3 + IQR, where IQR = 1.5 × (Q3 − Q1), and circles indicate outliers.


FIGURE 7. Marginal posterior densities for dispersal parameters from the Malesian Rhododendron data set. MAP values (dashed gray line) for the distance-dependent dispersal model parameters are A) $\lambda_0 = 0.13$, B) $\lambda_1 = 0.013$; and C) $\beta = 2.65$. The dotted black line corresponds to the prior, $\beta \sim \mathrm{Cauchy}(0,1)$. Note that the posterior probability of $\beta = 0$ is ∼0, resulting in “Decisive” support (cf. Jeffreys 1961) for the distance-dependent dispersal model over the mutual-independence model.

FIGURE 8. Biogeographic history of Malesian Rhododendron. A) The region was parsed into 20 discrete geographic areas following Brown et al.

(2006), which straddle two important biotic boundaries—Wallace’s and Lydekker’s Lines. Each circle corresponds to a discrete area. Distances

between these areas are based on a single coordinate for each area, indicated by an “x”. Posterior probability of being present in an area is

proportional to the opacity of the circle. Occupied areas with posterior probabilities < 0.12 are masked to ease interpretation. Circles are shaded

according to their position relative to Wallace’s Line (B) or Lydekker’s Line (C). Branches are shaded by a gradient representing the sum of

posterior probabilities of being present per area for descendant–ancestor pairs. We infer a continental Asian origin for Malesian rhododendrons

with multiple dispersal events across Wallace’s Line (B) and a single dispersal event across Lydekker’s Line (C).

DISCUSSION

Historical biogeography has begun the transition

to explicitly model-based statistical inference (Ree

and Sanmartín 2009; Ronquist and Sanmartín 2011).

These methods describe the biogeographic process by means of a continuous-time Markov chain that models

the colonization of—and extinction within—a set of

discrete geographic areas, and calculate the likelihood

of the observed species geographic ranges at the tips

of the tree using matrix exponentiation (to integrate

over possible biogeographic histories along branches)

and Felsenstein’s pruning algorithm (to integrate over


possible ancestral ranges at the interior nodes of the tree).

Although this is a vigorous area of research, reliance

on matrix exponentiation ultimately entails serious

computational constraints that limit both our ability

to develop more elaborate and realistic biogeographic

models and to apply these methods to more complex

and typical empirical problems.

We offer a Bayesian solution to this constraint that

relies on data augmentation and MCMC to numerically

integrate over biogeographic histories to estimate the


joint posterior probability of the parameters given the

data. The primary implication of this approach is a

substantial increase in the number of discrete areas

that can be accommodated—by approximately two

orders of magnitude. Moreover, we propose a simple

distance-dependent dispersal model in which rates of

area colonization are a function of geographic distance.

The nature and strength of the distance effect on rates of colonization are governed by the distance-power parameter, $\beta$. When $\beta > 0$, dispersal events over long distances are penalized, whereas long-distance dispersal events are favored when $\beta < 0$. Importantly, when $\beta = 0$, the distance-dependent dispersal model collapses to the simpler mutual-independence model, and so $M_0 \subseteq M_D$.


Because the models are nested, we can use the Savage–

Dickey density ratio to compute Bayes factors for robust

model selection.

In the remainder of this section, we attempt to

develop an intuition regarding the behavior of this

new biogeographic approach, describe some of the

beneﬁts and limitations of the current implementation,

and consider how this approach might be proﬁtably

extended.

Exploring the behavior of the Bayesian biogeographic framework.—We explored the statistical behavior of our biogeographic model and inference framework via analyses of simulated and empirical data. The simulation study comprised 50 dispersal data sets for 20 taxa and 600 areas that were simulated under each of eight strengths of distance effects, $\beta$: 0, 0.25, 0.5, 1, 2, 3, 4, and 6. For $\beta \le 3$, we were generally able to infer the true parameters. However, estimation accuracy begins to suffer when $\beta \ge 4$, resulting in all parameters being slightly underestimated. Estimation accuracy is also high for inferences based on time-series data simulated under large $\beta$ values, so the poor accuracy appears to emerge from the phylogenetic structure underlying the data. Although values of $\beta \ge 4$ are greater than those we have inferred from empirical data, we advise increased caution should one’s inference lie in this range of parameters. Using the Savage–Dickey ratio to compute Bayes factors, we found our ability to select the correct model was largely determined by the strength of $\beta$ (Fig. 5). Future simulation studies should be extended to

evaluate the effects of the phylogeny on inference (tree

size, shape, uncertainty, etc.), the sensitivity of the model

to various priors, and whether extreme parameter values

introduce greater errors in ancestral geographic-range

estimates.

As currently specified, the distance-dependent dispersal-rate modifier, $\eta(\cdot)$, only changes the dispersal rate per area, but not the summed rates of colonization and extinction over the geographic range. Accordingly, the equilibrium number of occupied or unoccupied areas for the geographic range is largely determined by the ratio of $\lambda_1$ and $\lambda_0$ (the per-area rates of colonization

and extinction, respectively). When the geographic

range involves occupation of a relatively small fraction

of available areas—as occurs when the number of areas

increases—the area colonization/extinction rate ratio

becomes small in order to explain the low observed

frequencies of area occupancy at the tips of the tree.

In such situations, these relatively simple parameters

may fail to ﬁt the data well. Moreover, the size of

inferred ancestral geographic ranges (in terms of the

number of occupied areas) tends to be larger than those

observed at the tips of the tree. This phenomenon is

also characteristic of other parsimony- and likelihood-

based biogeographic methods (e.g., Ronquist 1997; Ree

et al. 2005; Clark et al. 2008; Buerki et al. 2011). One

solution to both problems would be to favor sampled

biogeographic histories with range sizes most similar to

a carrying-capacity or range-size parameter.

We demonstrated the empirical application of our

method with an analysis of the biogeographic history of

65 Vireya species distributed over 20 geographic areas

across the Malesian Archipelago (Brown et al. 2006).

Bayes factors strongly favored the distance-dependent

model, with a MAP estimate of $\beta = 2.65$ (Fig. 7). Brown et

al. offered two hypotheses for the origin of Rhododendron:

as an old genus that arose in Australia, or as a young

genus that arose in Asia. Under our model, the posterior

of sampled biogeographic histories at the root of the tree

suggests that Asia is the most probable point from which

the genus entered the Malesian archipelago (Fig. 8).

The inferred biogeographic history of Vireya involves

several episodes of dispersal across Wallace’s Line and

a single episode of dispersal across Lydekker’s Line

(Fig. 8b,c). We note two points regarding these dispersal

events. First, the earliest dispersal across Wallace’s Line

and the single dispersal across Lydekker’s Line appear

to have occurred at approximately the same time.

Adopting 55 Ma as the crown age of the Rhododendron

phylogeny (Webb and Ree 2012) implies that these

dispersal events occurred in the Late Eocene (∼40 Ma).

At that time, many of the discrete areas in the western

part of the Malesian Archipelago collectively formed

a contiguous, emergent terrestrial region, Sundaland

(Lohman et al. 2011), which may have facilitated the

easterly dispersal of Vireya species from their ancestral

range in continental Asia across Sundaland. Moreover,

the eastern border of Sundaland was not yet bounded

by a contiguous deep oceanic trench, which may

have facilitated the continued easterly dispersal from

Sundaland into Wallacea (across Wallace’s Line) and

eastward out of Wallacea (across Lydekker’s Line) into

the eastern region of the Malesian Archipelago.

The second point pertains to the apparent prevalence

of dispersal events across Wallace’s Line. The origin of Vireya in continental Asia may have permitted the accumulation of greater species diversity throughout Sundaland, west of Wallacea. This would have established a greater species-diversity gradient across Wallace’s Line than across Lydekker’s Line. Consequently, there may have been more opportunity for species to disperse across the western boundary (Wallace’s Line) into Wallacea than there has been for species to disperse across the eastern boundary (Lydekker’s Line) out of Wallacea.

Advantages and limitations of the Bayesian biogeographic

method.—Increasing the number of areas offers several

beneﬁts. The most obvious, of course, is the ability

to increase the geographic resolution of biogeographic

inference. As we increase the number of areas, discrete

biogeography better represents the continuous features

of Earth. As an example, for a clade of terrestrial species

that collectively share a global distribution, a statistical

biogeographic analysis would want to discretize the

(approximately) $1.5 \times 10^8$ km$^2$ of terrestrial space into a meaningful number of areas. With ∼15 areas (the previous limit), the average area would be comparable in size to Canada (≈$10^7$ km$^2$); for ∼1500 areas (manageable under the current approach), the average area would be comparable to the size of Ohio (≈$10^5$ km$^2$).

Second, biogeographic areas have traditionally been

deﬁned on the basis of empirical analysis. For systems

that do not have well-deﬁned biogeographic areas,

our method allows the biogeographer to agnostically

deﬁne areas according to a grid, as was done in

our simulation study. By studying the congruence

between posteriors of dispersal histories for alternatively

discretized geographies, one could determine the

optimal discretization for a particular system, including

both the number and shapes of areas. For example, a

researcher with intimate knowledge of a study system

may derive a geographic discretization that produces

radically different ancestral-range estimates than those

based on a uniformly gridded discretization. Such a

scenario suggests that one of the two discretizations

does not properly “weight” the importance of certain

geographic areas when inferring the biogeographic

history.

Although it has beneﬁts, the ability to increase the

geographic resolution also raises new issues. At highly

resolved spatial scales, for example, it may become more

difﬁcult to accurately specify the occupancy of species

within individual cells of the geographic grid. Inference

under our model conditions on the biogeographic

ranges of species at the tips of the tree, and errors in

specifying these ranges are likely to lead inference astray.

One solution to this issue would be to use species-

distribution models to ﬁrst predict the geographic ranges

of species, and then treat these estimated ranges as the

observed species’ geographic ranges (analogous to the

conventional practice of treating a multiple-sequence

alignment—an inference predicted from the raw data—

as the observations used to infer phylogeny).

Extending the Bayesian biogeographic method.—The real

beneﬁt of the Bayesian framework is the tremendous

extensibility that it affords. The current implementation

makes various restrictive assumptions. For example,

we assume a ﬁxed (and known) tree, a static geological

history, and a homogeneous environment. Below we

touch brieﬂy on three extensions that permit the

approach to accommodate phylogenetic uncertainty,

dynamic geological history, and environmental

heterogeneity.

Our implementation assumes the phylogeny is

known without error, a luxury that exists only

under simulation. The most natural way to account

for phylogenetic uncertainty would be to exploit

a distribution of time-calibrated trees (estimated

separately) as input for biogeographic inference.

This approach is straightforward for methods that

analytically integrate over biogeographic histories:

simply deﬁne an MCMC proposal to draw a new tree

from the marginal distribution of phylogenies. However,

our model entails sampling biogeographic histories for

a speciﬁc phylogeny. Accordingly, this extension will

require the use of joint proposals for both biogeographic

history and phylogeny that maintain good mixing of

the MCMC (i.e., that ensure reasonable acceptance

probabilities). This will be a challenging task.

It is important to emphasize that our empirical

analysis was conducted under the assumption of a static

geological history: we explicitly ignore the substantial

effects of tectonic drift, changes in sea level, the formation of islands, etc. This greatly simplifies the

analysis, of course, since biogeographic likelihoods

are computed by conditioning on a single, static

set of geographic distances. Ideally, paleogeographic

reconstructions would inform the changing proximity of

areas through time, and biogeographic inference would

be computed by conditioning on a temporally dynamic

geography. For example, consider the scenario in which

two continents drift apart as time advances, which may

be characterized as a time-ordered vector of maps, each

map corresponding to the geography appropriate to

each interval of geological time. Since our phylogeny

is also measured in units of absolute time, the rates

of gain and loss could easily be modiﬁed to condition

on the relevant set of geographical coordinates. In

the above scenario, distances between areas on different continents would increase with time, so dispersal

events between continents would become increasingly

unlikely.
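One way to picture the "time-ordered vector of maps" is as a lookup from an age to the distance table in force at that age. The Python sketch below is purely illustrative of that data structure: the function name, the epoch boundaries, the area labels, and the distances are all hypothetical, and ages are assumed to be measured backward from the present.

```python
import bisect

def distances_at(age, boundaries, distance_maps):
    """Return the distance table that applies at a given age.

    boundaries:    ascending ages at which each map starts to apply;
                   map i covers ages in [boundaries[i], boundaries[i + 1]).
    distance_maps: one table of pairwise distances per epoch.
    """
    if age < boundaries[0]:
        raise ValueError("age precedes the first epoch boundary")
    index = bisect.bisect_right(boundaries, age) - 1
    return distance_maps[index]

# Two areas on continents that drift apart toward the present
# (older epochs have smaller separations); values are made up.
boundaries = [0.0, 20.0, 40.0]
distance_maps = [
    {("A", "B"): 3000.0},  # 0 to 20 time units before present
    {("A", "B"): 1500.0},  # 20 to 40
    {("A", "B"): 500.0},   # 40 and older
]
table = distances_at(27.5, boundaries, distance_maps)
```

A likelihood calculation conditioned on such a structure would split any branch interval that crosses an epoch boundary and apply the appropriate distances to each piece.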

By adopting a DEC-like approach wherein

cladogenesis events differ in pattern from anagenic

dispersal and extinction events, our model would

have to deﬁne transition probabilities between larger

numbers of conﬁgurations; it is trivial to compute

the model likelihood with a model that accounts

for cladogenic events by conditioning on a single

biogeographic history, but to numerically integrate over

all possible cladogenic events via MCMC will require

sophisticated proposal distributions.

Finally, we can incorporate other features of areas

beyond their latitude and longitude—such as altitude,

climate, and ecology—that may affect dispersal rates.

Morphological evolution also has a noted role in

biogeography—Bergmann’s Rule (Freckleton et al. 2003),

traits that affect long-distance dispersal ability (Carlquist

1966), etc.—and could be jointly inferred along with

dispersal patterns (Lartillot and Poujol 2011). These

factors could variously be incorporated as parameters

to construct a suite of candidate biogeographic

models. As we demonstrated for exploring the effect

of geographic distance, marginal likelihoods under

different biogeographic models could then be computed

and Bayes factors used to identify biogeographically

important model components.
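
The model comparison step described above reduces to simple arithmetic once marginal likelihoods are available. The Python sketch below converts two log marginal likelihoods into a Bayes factor and labels it using a commonly quoted version of the Jeffreys (1961) scale. The numerical inputs are hypothetical placeholders, not results from this study, and the function names are illustrative.

```python
import math


def log10_bayes_factor(log_ml_1, log_ml_0):
    """log10 Bayes factor for model 1 over model 0.

    Both arguments are natural-log marginal likelihoods.
    """
    return (log_ml_1 - log_ml_0) / math.log(10.0)


def jeffreys_label(log10_bf):
    """Verbal label on a commonly quoted version of the Jeffreys scale."""
    if log10_bf < 0.0:
        return "favors model 0"
    if log10_bf < 0.5:
        return "barely worth mentioning"
    if log10_bf < 1.0:
        return "substantial"
    if log10_bf < 1.5:
        return "strong"
    if log10_bf < 2.0:
        return "very strong"
    return "decisive"


# Hypothetical log marginal likelihoods for two candidate models,
# e.g. with and without a distance-dependent dispersal component.
log_ml_distance = -1234.5
log_ml_no_distance = -1240.0

support = log10_bayes_factor(log_ml_distance, log_ml_no_distance)
print(round(support, 2), jeffreys_label(support))
```

Working on the log scale avoids numerical underflow, since marginal likelihoods for phylogenetic data are typically far too small to represent directly.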

Noting the simplicity of their biogeographic model,

Ree et al. (2005) drew an analogy to the earliest work on

probabilistic models of molecular evolution: the Jukes

and Cantor (1969) model. Although it admittedly offered

a rudimentary description of the process, this first model

nevertheless provided a critical proof of concept that

the problem could be profitably pursued in a statistical

framework. To extend this analogy, we believe the

current contribution resembles the subsequent paper by

Felsenstein (1981), in which he proposed the pruning

algorithm that, by virtue of conferring a tremendous

increase in computational efficiency, heralded an era of

progress in developing stochastic models for the analysis

of DNA and amino acid sequence data that has been

one of the great success stories in evolutionary biology.

We are hopeful that the small steps made here will

precipitate a similar era of productivity in the field of

biogeographic inference that will enhance our ability to

make progress on this important problem.

SUPPLEMENTARY MATERIAL

Supplementary Material, including Supplementary

figures and Vireya data files, can be found at

http://datadryad.org and in the Dryad data repository

(DOI:10.5061/dryad.8346r; last accessed June 28, 2013).

FUNDING

This research was supported by the National Science

Foundation (NSF) [DEB 0445453 to J.P.H.; DEB 0842181,

DEB 0919529 to B.R.M.] and the National Institutes of

Health (NIH) [GM-069801 to J.P.H.].

ACKNOWLEDGMENTS

We thank Richard Ree, Fredrik Ronquist, and an

anonymous reviewer for their helpful comments.

REFERENCES

Brown G., Nelson G., Ladiges P.Y. 2006. Historical biogeography of

Rhododendron Section Vireya and the Malesian Archipelago. J.

Biogeogr. 33:1929–1944.

Buerki S., Forest F., Alvarez N., Nylander J.A.A., Arrigo N., Sanmartín

I. 2011. An evaluation of new parsimony-based versus parametric

inference methods in biogeography: a case study using the globally

distributed plant family Sapindaceae. J. Biogeogr. 38:531–550.

Carlquist S. 1966. The biota of long-distance dispersal: I. Principles of

dispersal and evolution. Q. Rev. Biol. 41:247–270.

Clark J.R., Ree R.H., Alfaro M.E., King M.G., Wagner W.L., Roalson

E.H. 2008. A comparative study in ancestral range reconstruction

methods: retracing the uncertain histories of insular lineages. Syst.

Biol. 57:693–707.

Dickey J. 1971. The weighted likelihood ratio, linear hypotheses on

normal location parameters. Ann. Math. Stat. 42:204–223.

Felsenstein J. 1981. Evolutionary trees from DNA sequences: a

maximum likelihood approach. J. Mol. Evol. 17:368–376.

Freckleton R.P., Harvey P.H., Pagel M. 2003. Bergmann’s rule and body

size in mammals. Am. Nat. 161:821–825.

Gallager R.G. 1962. Low-density parity-check codes. IRE Trans. Inform.

Theory 8:21–28.

Gelman A., Rubin D.B. 1992. Inferences from iterative simulation using

multiple sequences. Stat. Sci. 7:457–511.

Golub G.H., Van Loan C.F. 1983. Matrix computations. Baltimore (MD):

Johns Hopkins University Press.

Hanski I. 1998. Metapopulation dynamics. Nature 396:41–49.

Hastings W.K. 1970. Monte Carlo sampling methods using Markov

chains and their applications. Biometrika 57:97–109.

Jeffreys H. 1961. Theory of probability. Oxford: Oxford University

Press.

Jukes T.H., Cantor C.R. 1969. Evolution of protein molecules. In: H.N.

Munro, editor. Mammalian protein metabolism. Academic Press,

New York. p. 21–123.

Lartillot N., Poujol R. 2011. A phylogenetic model for investigating

correlated evolution of substitution rates and continuous

phenotypic characters. Mol. Biol. Evol. 28:729–744.

Lemey P., Rambaut A., Drummond A.J., Suchard M.A. 2009. Bayesian

phylogeography finds its roots. PLoS Comput. Biol. 5:e1000520.

Lemey P., Rambaut A., Welch J.J., Suchard M.A. 2010. Phylogeography

takes a relaxed random walk in continuous space and time. Mol.

Biol. Evol. 27:1877–1885.

Lemmon A.R., Lemmon E.M. 2008. A likelihood framework for

estimating phylogeographic history on a continuous landscape.

Syst. Biol. 57:544–561.

Lohman D.J., de Bruyn M., Page T., von Rintelen K., Hall R., Ng

P.K.L., Shih H.-T., Carvalho G.R., von Rintelen T. 2011. Biogeography

of the Indo-Australian archipelago. Ann. Rev. Ecol. Evol. Syst.

42:205–226.

MacArthur R.H., Wilson E.O. 1967. The theory of island biogeography.

Princeton (NJ): Princeton University Press.

Metropolis N., Rosenbluth A.W., Rosenbluth M.N., Teller A.H., Teller

E. 1953. Equation of state calculations by fast computing machines.

J. Chem. Phys. 21:1087–1092.

Minin V.N., Suchard M.A. 2007. Counting labeled transitions in

continuous-time Markov models of evolution. J. Math. Biol. 56:

391–412.

Nielsen R. 2002. Mapping mutations on phylogenies. Syst. Biol.

51:729–739.

Plummer M., Best N., Cowles K., Vines K. 2006. CODA: convergence

diagnosis and output analysis for MCMC. R News. 6:7–11.

R Core Team. 2012. R: a language and environment for statistical

computing. Vienna, Austria: R Foundation for Statistical

Computing. ISBN 3-900051-07-0.

Ree R.H., Moore B.R., Webb C.O., Donoghue M.J. 2005. A likelihood

framework for inferring the evolution of geographic range on

phylogenetic trees. Evolution 59:2299–2311.

Ree R.H., Sanmartín I. 2009. Prospects and challenges for parametric

models in historical biogeographical inference. J. Biogeogr. 36:1211–

1220.

Ree R.H., Smith S.A. 2008. Maximum likelihood inference of

geographic range evolution by dispersal, local extinction, and

cladogenesis. Syst. Biol. 57:4–14.

Robinson D.M., Jones D.T., Kishino H., Goldman N., Thorne J.L. 2003.

Protein evolution with dependence among codons due to tertiary

structure. Mol. Biol. Evol. 20:1692–1704.

Ronquist F., Sanmartín I. 2011. Phylogenetic methods in biogeography.

Ann. Rev. Ecol. Evol. Syst. 42:441–464.

Ronquist F. 1997. Dispersal-vicariance analysis: a new approach to the

quantification of historical biogeography. Syst. Biol. 46:195–203.

Tierney L. 1994. Markov chains for exploring posterior distributions

(with discussion). Ann. Stat. 22:1701–1762.

Verdinelli I., Wasserman L. 1995. Computing Bayes factors using a

generalization of the Savage-Dickey density ratio. J. Am. Stat. Assoc.

90:614–618.

Wallace A.R. 1887. Oceanic islands: their physical and biological

relations. Bull. Am. Geog. Soc. 19:1–21.

Webb C.O., Ree R.H. 2012. Historical biogeography inference in

Malesia. In: Gower D., Johnson K., Richardson J., Rosen B., Ruber

L., Williams S., editors. Biotic evolution and environmental change

in Southeast Asia. Cambridge University Press, Cambridge, UK.

p. 191–215.