Ecological Applications, 18(5), 2008, pp. 1200–1211
© 2008 by the Ecological Society of America
OPTIMIZING DISPERSAL CORRIDORS FOR THE CAPE PROTEACEAE
USING NETWORK FLOW
STEVEN J. PHILLIPS,1,4 PAUL WILLIAMS,2 GUY MIDGLEY,3 AND AARON ARCHER1
1AT&T Labs–Research, 180 Park Avenue, Florham Park, New Jersey 07932 USA
2Entomology Department, Natural History Museum, Cromwell Road, London SW7 5BD UK
3Kirstenbosch Research Center, South African National Biodiversity Institute, P/Bag X7, Claremont 7735, Cape Town, South Africa
Abstract. We introduce a new way of measuring and optimizing connectivity in
conservation landscapes through time, accounting for both the biological needs of multiple
species and the social and financial constraint of minimizing land area requiring additional
protection. Our method is based on the concept of network flow; we demonstrate its use by
optimizing protected areas in the Western Cape of South Africa to facilitate autogenic species
shifts in geographic range under climate change for a family of endemic plants, the Cape
Proteaceae. In 2005, P. Williams and colleagues introduced a novel framework for this
protected area design task. To ensure population viability, they assumed each species should
have a range size of at least 100 km² of predicted suitable conditions contained in protected
areas at all times between 2000 and 2050. The goal was to design multiple dispersal corridors
for each species, connecting suitable conditions between time periods, subject to each species’
limited dispersal ability, and minimizing the total area requiring additional protection. We
show that both minimum range size and limited dispersal abilities can be naturally modeled
using the concept of network flow. This allows us to apply well-established tools from
operations research and computer science for solving network flow problems. Using the same
data and this novel modeling approach, we reduce the area requiring additional protection by
a third compared to previous methods, from 4593 km² to 3062 km², while still achieving the
same conservation planning goals. We prove that this is the best solution mathematically
possible: the given planning goals cannot be achieved with a smaller area, given our modeling
assumptions and data. Our method allows for flexibility and refinement of the underlying
climate-change, species-habitat-suitability, and dispersal models. In particular, we propose an
alternate formalization of a minimum range size moving through time and use network flow to
achieve the revised goals, again with the smallest possible newly protected area (2850 km²). We
show how to relate total dispersal distance to probability of successful dispersal, and compute
a trade-off curve between this quantity and the total amount of extra land that must be
protected.
Key words: climate change; connectivity; corridor; dispersal; minimum range size; network flow;
Proteaceae; reserve design; Western Cape, South Africa.
Habitat connectivity is a basic concept in protected
area design and more generally in landscape ecology
(Margules and Pressey 2000, Turner et al. 2001).
Ensuring habitat connectivity is an important element
of conserving biodiversity, especially since global climate
change will tend to cause many species to disperse along
climate gradients (Noss 2001, Honnay et al. 2002).
Climate change can contribute to habitat fragmentation
(Bawa and Dayanandan 1998), increasing the need for
explicit modeling of species dispersal and the spatial
arrangement of habitat (Vos et al. 2002, Del Barrio et al.
2006). Here we show that an optimization paradigm
called network flow (Ahuja et al. 1993) provides a
powerful new approach for measuring connectivity and
for multispecies optimization of protected areas and
corridors. Our approach takes into account species’
varying dispersal abilities and the spatial and temporal
distribution of suitable habitat, while making efficient
use of conservation funds by minimizing the total area
requiring new protection.
Our application of network flow falls within the
general framework of systematic conservation planning
(Margules and Pressey 2000). Our work mainly applies
to the selection of additional conservation areas to meet
the previously identified conservation goals (Stage 4 of
Margules and Pressey 2000). It also applies to the review
of existing conservation areas (Stage 3), as we use
network flow to determine species’ effective range size
over time within areas with existing protection. More
generally, we use network flow to evaluate the cost and
impact of a specific set of conservation goals. Our
findings can therefore inform the prioritization of

Manuscript received 26 March 2007; revised 19 December
2007; accepted 7 January 2008; final version received 13
February 2008. Corresponding Editor: S. S. Heppell.

Network flow is a novel approach for measuring
connectivity in a landscape, i.e., the degree to which
components of the landscape such as habitat patches or
protected areas are linked. A variety of other methods
have been used to measure connectivity, and we outline
them here in order to contrast them with network flow.
The simplest notion of connectivity forms the basis of
the patches-in-matrix model of landscape ecology
(Turner et al. 2001): for a given species, two sites are
considered connected, and therefore part of the same
patch, if there is a path between them that uses only
suitable habitat. Some species and ecological processes
are better modeled by considering non-contiguous
patches to be ‘‘functionally connected’’ if they are
separated by a small gap that the species is likely to be
able to cross (With 2002). The geographic distance of
such a gap can vary depending on the quality of
intervening areas between patches (the matrix) which
may be more or less inhospitable for the given species.
The effective distance between patches (also referred to
as ‘‘least-cost path’’ or ‘‘cost-distance’’) can be calculated
using weighted shortest-path computations (Graham
2001, Ray 2005). Similarly, patch isolation measures the
patch’s distance from other patches, and effective
isolation uses potentially varying resistance in the matrix
to weight paths between patches (Ricketts 2001). These
models all involve questions of the form ‘‘Are two
components of the landscape linked?’’ or ‘‘What is the
effective distance between components?’’, and the
answers depend only on the shortest (possibly weighted)
path between the components.
The network flow approach differs in that it addresses
the question: ‘‘What is the capacity of the connection
between two components?’’ It therefore considers the
combined contribution of multiple paths between the
components, and it is applicable when studying ecolog-
ical processes involving quantification of flows of
individuals or genes across the landscape (McRae
2006). Distance measures alone cannot effectively model
such flows. For example, two elongated patches of
suitable habitat a given distance apart in a landscape will
likely have more interchange of organisms if they are
arranged side by side rather than end to end, even
though they have the same least-cost path distance in
both cases. Therefore, there has been recent encourage-
ment for the development of methods for modeling
flows of organisms and genes directly (Forman 2002,
Vos et al. 2002).
There are some existing methods for modeling the
capacity of connections between components of the
landscape. The conductance (or equivalently, the resis-
tance) of electrical networks has been used to model
gene flow and genetic differentiation among populations
in heterogeneous landscapes (McRae 2006). Simulation
models of dispersal use random walks on unweighted
graphs (corresponding to binary habitat suitability) or
weighted graphs (for continuous habitat suitability),
measuring, for example, the time to disperse to a distant
part of the modeled landscape (Malanson 2003). This
measure (mean dispersal time) is closely related to
electrical conductance (Doyle and Snell 1984, Tetali
1991). Landscape connectivity, defined as ‘‘the degree to
which the landscape facilitates or impedes movement
among resource patches’’ (Taylor et al. 1993) focuses on
deriving a single measure of the capacity of all
connections in the landscape. It has been measured in
simulation models by randomly moving individuals and
counting successful immigrants into habitat patches, i.e.,
dispersal success; and measuring the expected time to
move between habitat patches, i.e., search time
(Tischendorf and Fahrig 2000). These methods all model
random movement of organisms or genes. In contrast,
flow-based models measure how large or efficient a flow
is possible, without assuming a particular model of
We apply network flow to the task of designing
protected areas for the Cape Proteaceae. The Proteaceae
are the most charismatic plants of the South African
Cape Floristic Region, a narrow mountainous coastal
belt with an extremely high degree of richness, ende-
mism, and diversity. For many species of Cape
Proteaceae, the greatest threat to persistence is the low
probability of dispersing from currently occupied areas
to distant areas of predicted future suitable conditions,
since many of the species are known not to disperse and
establish easily over long distances (Midgley et al. 2002).
We focus on autogenic shifts in species ranges because
the ability to actively assist species migration is limited
by lack of understanding and the complexity of the
broader ecological system (Malcolm and Pitelka 2000).
Williams et al. (2005) formalized and gave a heuristic
solution for the task of designing protected areas to
facilitate species shifts in geographic range in response to
climate change. Here we use network flow to construct a
set of protected areas achieving exactly the same
conservation goals as Williams et al. (2005), while
requiring only two-thirds as much additional protected
area. Moreover, we prove that ours is the best solution
mathematically possible, given this model and data.
Network flow thus yields a much better assessment of
the cost of planning for climate change. We also describe
a number of ways that network flow supports general-
izations of this design problem.
The overall purpose of our method is to select a set of
areas to receive additional protection so that a collection
of species can maintain viable populations, taking into
account their dispersal limitations and changing habitat
suitability under climate change. We use a quantitative
model introduced by Williams et al. (2005) and applied
to the Proteaceae of the Cape Floristic Region of South
Africa, in which viability is ensured by protecting a
minimum range size of 100 km² through time for each
species, based on the area criterion of the World
Conservation Union (IUCN); this translates to 35 1′ × 1′
cells in the Western Cape.
July 2008 1201 OPTIMIZING PROTEA DISPERSAL CORRIDORS
Selecting protected areas to allow for dispersal
To model protection through time, Williams et al.
(2005) introduced the concept of dispersal chains. For a
given species, a dispersal chain is a sequence of six cells
(possibly with repetitions), c2000, c2010, ..., c2050, such
that each cell has predicted suitable conditions in the
corresponding subscripted year (or is a presently
occupied cell, for c2000), and such that consecutive cells
in the sequence are separated in the grid by no more
than the distance that the species can disperse in a 10-
year period. A dispersal chain therefore represents a
feasible pathway for the species to disperse from cells
where it is currently found to cells where it is predicted
to have suitable conditions in 2050, using only suitable
cells along the way. A set of chains represents a
collection of independent pathways if the chains are
‘‘nonoverlapping’’ (Fig. 1). Combining the ideas of
minimum range size and dispersal chains, Williams et al.
(2005) model the task of designing dispersal corridors as
follows:
1) Find dispersal chains for all species. For each species,
the set of chains must:
a) be nonoverlapping;
b) use only untransformed cells with existing or
proposed protection;
c) have at least 35 chains, if possible; if fewer than 35
nonoverlapping chains exist, then the set should
have the maximum possible size.
2) Subject to these requirements, the number of cells
with proposed protection should be as small as
possible.
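The definition of a dispersal chain can be checked mechanically. The following is a minimal sketch, not the authors' implementation: the function name `is_dispersal_chain`, the representation of cells as (row, column) pairs, and the use of the larger of the row and column offsets as the grid distance are all assumptions made for illustration. The example data are the hypothetical species of Fig. 1 (suitable cells in row 2 that shift one column to the right each decade).

```python
def is_dispersal_chain(chain, suitable, max_dist):
    """Check the two defining conditions of a dispersal chain.

    chain    -- list of (row, col) cells, one per time slice
    suitable -- list of sets of (row, col) cells, one set per time slice
    max_dist -- cells the species can disperse per time slice (assumed to be
                measured as the larger of the row and column offsets)
    """
    if len(chain) != len(suitable):
        return False
    # Each cell must be suitable in its own time slice.
    if any(cell not in ok for cell, ok in zip(chain, suitable)):
        return False
    # Consecutive cells must be within the dispersal distance.
    return all(
        max(abs(r1 - r2), abs(c1 - c2)) <= max_dist
        for (r1, c1), (r2, c2) in zip(chain, chain[1:])
    )


# Hypothetical species of Fig. 1: years 2000, 2010, 2020, 2030.
suitable = [{(2, c) for c in range(2 + t, 5 + t)} for t in range(4)]
chain_a = [(2, 2), (2, 3), (2, 4), (2, 5)]
too_fast = [(2, 2), (2, 4), (2, 4), (2, 5)]  # jumps two cells in one decade

print(is_dispersal_chain(chain_a, suitable, max_dist=1))
print(is_dispersal_chain(too_fast, suitable, max_dist=1))
```

Repetitions are allowed by this check, since a cell that stays suitable in consecutive time slices is at distance zero from itself.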
The nonoverlapping requirement is important, as it
imposes the minimum area criterion. Without it, the set
of chains for a species may occupy much less than 35
grid cells during some time period, compromising the
species’ viability (see Discussion for more analysis of this
requirement). However, there are at least two reasonable
ways to define nonoverlapping chains:
1) No two chains use the same cell in the same year.
2) No two chains ever use the same cell, even in
different years.
Under the first definition, two nonoverlapping chains
may use the same cell, but only in different years. Fig. 1
shows an example that highlights the difference between
these two definitions. The second (stricter) definition is
the one used by Williams et al. (2005). We will focus
primarily on the first definition, and then show how our
methods extend to the second definition. The two
definitions provide different levels of resilience against
modeling errors. In the case of an incorrect prediction
that a certain cell will be suitable at some particular
future time slice, at most one chain will be affected (will
not contribute to survival of the species) under both
definitions. On the other hand, if a cell can be rendered
permanently unsuitable despite being ‘‘protected,’’ then
for the first definition, multiple chains may be affected,
possibly leaving as many as five fewer pathways for the
species to disperse to future suitable conditions (in the
case where a different chain used that cell in each of the
time slices c2010, c2020, ..., c2050).
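The difference between the two definitions can be made concrete with a short sketch. The function names below are hypothetical. Chain a is taken from the caption of Fig. 1; the full cell sequences for chains b and c are an assumption (each is taken to start one column further right and to move one column right per decade, which is consistent with the caption's statement that chain b uses cell (2,3) in year 2000).

```python
from itertools import combinations


def overlap_def1(chain_x, chain_y):
    """Definition 1: overlap means sharing a cell in the same year."""
    return any(x == y for x, y in zip(chain_x, chain_y))


def overlap_def2(chain_x, chain_y):
    """Definition 2: overlap means sharing a cell in any years."""
    return bool(set(chain_x) & set(chain_y))


# Cells used in 2000, 2010, 2020, 2030 (b and c are assumed, see above).
chains = {
    "a": [(2, 2), (2, 3), (2, 4), (2, 5)],
    "b": [(2, 3), (2, 4), (2, 5), (2, 6)],
    "c": [(2, 4), (2, 5), (2, 6), (2, 7)],
}

for x, y in combinations(sorted(chains), 2):
    print(x, y, overlap_def1(chains[x], chains[y]), overlap_def2(chains[x], chains[y]))
```

Under these assumptions, no pair overlaps by Definition 1 while every pair overlaps by Definition 2, matching the caption of Fig. 1.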
Network flow concepts
We now introduce the basic concepts of network flow
that we will need for designing dispersal corridors.
Network flow is an optimization paradigm that is widely
used in computer science and operations research
(Ahuja et al. 1993). It has seen applications in such
diverse areas as telecommunication networks, transpor-
tation networks such as roads and railways, warehous-
ing and distribution, and production line and crew
scheduling, though we believe it is new to ecology. These
applications can all be seen as having an underlying
graph (also called a network), and a commodity that
flows over the graph (for example, telephone calls routed
over a network of fiber-optic cables). Movements of
species between habitat patches in a landscape or
between protected areas can be seen in the same way,
as flows in a graph that models the spatial and temporal
relationship between the patches or protected areas. Just
as a telephone network should be designed to allow
efficient routing of phone calls, networks of protected
areas should be designed to facilitate flows of organisms.
Fig. 1. Example chains and alternative definitions of nonoverlapping chains. A small area is divided into a 3 × 8 grid, shown
for the years 2000 through 2030. Years 2040 and 2050 have been omitted for simplicity. Some of the cells, colored gray, have
suitable conditions for a hypothetical species that can disperse at most one cell per decade. For the year 2000, only the three cells in
row 2, columns 2 through 4 are suitable. Each decade, the suitable conditions are predicted to move a single cell to the right. There
are three chains pictured, labeled a, b, and c, respectively: chain a uses cell (2,2) in year 2000, cell (2,3) in year 2010, cell (2,4) in year
2020, and cell (2,5) in year 2030. These three chains are nonoverlapping according to Definition 1 of nonoverlapping, which
requires only that no two chains use the same cell in the same year. However, any two of the three chains overlap according to
Definition 2 of nonoverlapping, which requires that no two chains ever use the same cell. For example, chains a and b overlap by
Definition 2 since they both use cell (2,3): chain a uses it in year 2010, while chain b uses it in year 2000.
In this section, we give a brief introduction to two
network flow problems that will be useful for optimizing
dispersal corridors, namely ‘‘maximum flow’’ and
‘‘minimum-cost flow’’ (min-cost). For more details on
these and other flow problems and their applications, see
the comprehensive textbook by Ahuja et al. (1993). We
will use only maximum flow to address the dispersal
corridor design task introduced by Williams et al.
(2005), and mention later how min-cost flow may be
useful for variations on the model. We first define
network flow in general terms for arbitrary graphs, then
give an example where the graph represents a matrix of
cells in a landscape. For the dispersal corridor applica-
tion, we will use a more complex graph that represents a
landscape changing over time.
Consider a graph consisting of a set of nodes connected
by directed arcs, and let each arc have a nonnegative value
called its capacity (Fig. 2A). A flow from one or more
nodes in the graph (sources) to one or more other nodes
(sinks) is a function f that gives a nonnegative value f(a)
for each arc a, satisfying two constraints:
1) a capacity constraint, such that for each arc a, f(a) ≤
the capacity of a;
2) flow conservation, such that at each node other than
the sources and the sinks, the net flow is 0; i.e., the
flow into the node equals the flow out of it.
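The two constraints translate directly into a validity check. The sketch below is illustrative only: the function names are hypothetical, and the arc capacities are invented for this example (they are not the capacities of Fig. 2). The flow values correspond to one unit on each of the paths (a, b, d, e), (a, c, d, e), and (a, c, e).

```python
def is_valid_flow(capacity, flow, sources, sinks):
    """Check nonnegativity, the capacity constraint, and flow conservation."""
    if any(arc not in capacity for arc in flow):
        return False
    # Capacity constraint: 0 <= f(a) <= capacity of a, for every arc a.
    if any(not 0 <= flow.get(arc, 0) <= cap for arc, cap in capacity.items()):
        return False
    # Flow conservation: inflow equals outflow away from sources and sinks.
    net_in = {}
    for (tail, head), f in flow.items():
        net_in[tail] = net_in.get(tail, 0) - f
        net_in[head] = net_in.get(head, 0) + f
    return all(
        balance == 0
        for node, balance in net_in.items()
        if node not in sources and node not in sinks
    )


def flow_value(flow, sources):
    """Total net flow out of the source nodes."""
    out_flow = sum(f for (tail, _), f in flow.items() if tail in sources)
    in_flow = sum(f for (_, head), f in flow.items() if head in sources)
    return out_flow - in_flow


# Hypothetical capacities on a five-node, six-arc graph.
capacity = {("a", "b"): 2, ("a", "c"): 2, ("b", "d"): 1,
            ("c", "d"): 1, ("c", "e"): 1, ("d", "e"): 3}
flow = {("a", "b"): 1, ("a", "c"): 2, ("b", "d"): 1,
        ("c", "d"): 1, ("c", "e"): 1, ("d", "e"): 2}

print(is_valid_flow(capacity, flow, {"a"}, {"e"}), flow_value(flow, {"a"}))
```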
For an analogy, arcs can be thought of as pipes, and
the flow as the rate of water flowing through them. The
nodes represent junctions where pipes meet, and the flow
conservation constraint says that the amount of water
flowing into any junction must equal the amount flowing
out, unless the junction is a source (where water is being
injected) or a sink (where water is escaping from the
system). The capacity of an arc corresponds to the
maximum flow rate allowed by that pipe. Directed arcs
would correspond to pipes with unidirectional valves
allowing flow only in the direction of the arc. Of course,
ordinary water pipes allow flow in both directions, and
we can model this by a pair of arcs between the same
two nodes, pointing in opposite directions.
The value of a flow is defined as the total net flow out
of the source nodes, or equivalently, the total net flow
into the sink nodes. For our example graph, Fig. 2B
shows a flow of value 3 from node a (the source) to node
e (the sink). The most basic network flow problem is the
maximum flow problem, where we seek a flow of
maximum value, among all flows from a given set of
sources to a given set of sinks. It is easily verified that the
flow shown in Fig. 2B is a maximum flow. Each arc may
also have an associated cost, and the cost of a flow is
then defined as the sum over all arcs of cost times flow.
This leads to the minimum cost flow problem, where we
seek a flow of minimum cost, among all flows of a
specified value between a given source and sink. For
example, if the flow represents goods being trucked
across a road network, arc costs may represent road
lengths, so the cost of a flow is proportional to the
amount of fuel required to transport the goods.
A fundamental property is that if all arc capacities are
integers (and, in the case of min-cost flow, the specified
flow value is also integral), there is always a maximum
flow or min-cost flow that is integral, and that can
therefore be decomposed into a set of paths from
sources to sinks (Ahuja et al. 1993: Theorem 6.5). If the
arc capacities are all equal to 1, such an integral flow can
be decomposed into a set of arc-disjoint paths from
sources to sinks; ‘‘arc-disjoint’’ means that no arc is in
more than one path. Moreover, any set of arc-disjoint
paths from sources to sinks constitutes a flow, so we
have the following properties for any graph with unit
capacities:
1) Finding the maximum flow is equivalent to finding
the maximum number of arc-disjoint paths from
sources to sinks.
2) Finding a minimum-cost flow of integer value F is
equivalent to finding a set of F arc-disjoint paths
from sources to sinks with minimum total cost.
In our example flow of Fig. 2B, the flow as pictured is
integral. It can be decomposed into three paths from the
source to the sink, namely the paths (a, b, d, e), (a, c, d,
e), and (a, c, e). The paths are not arc-disjoint, because
the first and second share the arc (d, e) and the second
and third share the arc (a, c).
Fig. 2. (A) An example directed graph has five nodes (a, b,
..., e) and six directed arcs. Each arc is labeled with a
nonnegative “capacity.” In (B), the graph has been labeled to
show an example “flow” from a single source node (a) to a
single sink node (e). Each arc is labeled first with its capacity,
then with the flow. The flow satisfies the following defining
properties: it is nonnegative, obeys the capacity constraint on
each arc, and satisfies flow conservation (flow in equals flow
out) at each node that is not a source or sink. The flow has
value 3, since that is the net flow out of the source (node a), or
equivalently, the net flow into the sink (node e).
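The decomposition of an integral flow into paths can be sketched as a greedy procedure: repeatedly walk from the source to the sink along arcs that still carry flow, and remove one unit along the walk. This is a minimal illustration, not the authors' code; the function name is hypothetical, and the sketch assumes an integral flow with no cycles and a single source and sink. The arc flows used are those implied by the three paths listed above.

```python
def decompose_into_paths(flow, source, sink):
    """Split an integral, cycle-free flow into unit source-to-sink paths."""
    remaining = {arc: f for arc, f in flow.items() if f > 0}
    paths = []
    while True:
        path = [source]
        while path[-1] != sink:
            node = path[-1]
            # Follow any arc out of the current node that still carries flow.
            nxt = next((head for (tail, head) in sorted(remaining)
                        if tail == node and remaining[(tail, head)] > 0), None)
            if nxt is None:
                break
            path.append(nxt)
        if path[-1] != sink:
            return paths  # no flow left leaving the source
        for arc in zip(path, path[1:]):
            remaining[arc] -= 1
        paths.append(path)


# Arc flows implied by the paths (a, b, d, e), (a, c, d, e), and (a, c, e).
flow = {("a", "b"): 1, ("a", "c"): 2, ("b", "d"): 1,
        ("c", "d"): 1, ("c", "e"): 1, ("d", "e"): 2}

for p in decompose_into_paths(flow, "a", "e"):
    print(p)
```

Because arcs (a, c) and (d, e) each carry two units, they necessarily appear in two of the three paths, which is why the paths are not arc-disjoint.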
Another useful property of flows is the ‘‘max-flow
min-cut theorem,’’ which says that the maximum flow
from a set of source nodes to a set of sink nodes is equal
to the minimum capacity among all cuts. A cut means a
partition of the nodes into two subsets, one containing
all of the sources and the other all of the sinks, and the
capacity of a cut is the sum of the capacities of arcs
crossing the cut from the source side to the sink side. In
other words, the max-flow min-cut theorem says that
finding the maximum flow value is the same as finding
the bottleneck cut that most constrains flow from
sources to sinks. This equivalence may find ecological
applications, since the bottleneck that most constrains
connectivity between areas of habitat for a species often
warrants conservation attention; habitat corridors are
examples of this.
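The theorem can be illustrated with a short augmenting-path computation. The sketch below uses breadth-first augmenting paths (the Edmonds-Karp variant), which is one standard algorithm and not necessarily the one used by the authors. The function name is hypothetical and the capacities are invented for this example (they are not the capacities of Fig. 2). The bottleneck cut is read off from the nodes still reachable from the source once no augmenting path remains.

```python
from collections import deque


def max_flow_min_cut(capacity, source, sink):
    """Return (max flow value, arcs of a minimum cut, capacity of that cut)."""
    residual, neighbors = {}, {}
    for (u, v), cap in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + cap
        residual.setdefault((v, u), 0)  # reverse arc for undoing flow
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    value = 0
    while True:
        # Breadth-first search for a path with spare residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbors.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[arc] for arc in path)
        for u, v in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        value += bottleneck

    # Nodes reachable in the final residual graph form the source side.
    source_side = set(parent)
    cut = sorted(a for a in capacity if a[0] in source_side and a[1] not in source_side)
    return value, cut, sum(capacity[a] for a in cut)


# Hypothetical capacities on a five-node, six-arc graph.
capacity = {("a", "b"): 2, ("a", "c"): 2, ("b", "d"): 1,
            ("c", "d"): 1, ("c", "e"): 1, ("d", "e"): 3}
value, cut, cut_capacity = max_flow_min_cut(capacity, "a", "e")
print(value, cut, cut_capacity)
```

For these assumed capacities the maximum flow value equals the capacity of the cut formed by arcs (a, c) and (b, d), so those two arcs are the bottleneck.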
Our second example uses min-cost flow to measure
connectivity between two elongated habitat patches in a
matrix (Fig. 3). The cost of a flow from one patch to the
other is lower if the patches are arranged side by side
(Fig. 3A) than if they are arranged end to end (Fig. 3B).
Min-cost flow therefore allows us to model the
availability and length (or more generally, some cost
measure) of multiple paths between the patches, in
contrast to least-cost path measures that depend only on
a single path.
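The comparison between the two arrangements can be reproduced for a small flow value. The sketch below is an assumption-laden illustration rather than the authors' computation: it uses a successive-shortest-path algorithm, a 7 × 9 grid whose size is chosen arbitrarily, patches separated by a single matrix cell, and a hypothetical super-source and super-sink attached to the patch cells. It computes only the cost of a flow of value 3.

```python
def min_cost_flow(num_nodes, arcs, source, sink, value):
    """Minimum cost of a flow of the given value (None if infeasible).

    arcs: iterable of (tail, head, capacity, cost) with nonnegative costs.
    Uses successive shortest paths found by Bellman-Ford on the residual graph.
    """
    head, cap, cost = [], [], []
    out = [[] for _ in range(num_nodes)]
    for u, v, c, w in arcs:  # arc e and its reverse e ^ 1 are stored as a pair
        out[u].append(len(head)); head.append(v); cap.append(c); cost.append(w)
        out[v].append(len(head)); head.append(u); cap.append(0); cost.append(-w)
    sent = total = 0
    while sent < value:
        dist = [float("inf")] * num_nodes
        via = [-1] * num_nodes
        dist[source] = 0
        for _ in range(num_nodes - 1):
            changed = False
            for u in range(num_nodes):
                if dist[u] == float("inf"):
                    continue
                for e in out[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[head[e]]:
                        dist[head[e]], via[head[e]], changed = dist[u] + cost[e], e, True
            if not changed:
                break
        if via[sink] == -1:
            return None
        push, v = value - sent, sink
        while v != source:  # bottleneck along the shortest path
            push = min(push, cap[via[v]])
            v = head[via[v] ^ 1]
        v = sink
        while v != source:  # augment along the shortest path
            cap[via[v]] -= push
            cap[via[v] ^ 1] += push
            v = head[via[v] ^ 1]
        sent += push
        total += push * dist[sink]
    return total


def patch_flow_cost(rows, cols, left, right, value):
    """Unit-capacity, unit-cost arcs between 4-neighbors; patches as terminals."""
    node = lambda r, c: r * cols + c
    source, sink = rows * cols, rows * cols + 1  # hypothetical super-terminals
    arcs = [(node(r, c), node(r + dr, c + dc), 1, 1)
            for r in range(rows) for c in range(cols)
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0))
            if 0 <= r + dr < rows and 0 <= c + dc < cols]
    arcs += [(source, node(r, c), value, 0) for r, c in left]
    arcs += [(node(r, c), sink, value, 0) for r, c in right]
    return min_cost_flow(rows * cols + 2, arcs, source, sink, value)


side_by_side = patch_flow_cost(7, 9, [(2, 3), (3, 3), (4, 3)], [(2, 5), (3, 5), (4, 5)], 3)
end_to_end = patch_flow_cost(7, 9, [(3, 1), (3, 2), (3, 3)], [(3, 5), (3, 6), (3, 7)], 3)
print(side_by_side, end_to_end)
```

In both arrangements the least-cost path between the patches has length 2, yet the side-by-side arrangement admits three such paths while the end-to-end arrangement admits only one, so the flow costs differ.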
There are a number of efficient algorithms to compute
optimal solutions for the maximum flow and min-cost
flow problems (Ahuja et al. 1993). Network flow type
problems are also easily represented as linear programs,
because flow conservation and capacity requirements are
naturally represented as linear constraints. Some com-
mercial linear program solvers can take advantage of the
special structure of network flow constraints to quickly
solve large flow problems.
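As an illustration of why flow problems are linear programs, the maximum flow problem can be written as follows. The notation is an assumption introduced here and does not appear in the original: $S$ is the set of sources, $T$ the set of sinks, $u(a)$ the capacity of arc $a$, and $\delta^{+}(v)$ and $\delta^{-}(v)$ the sets of arcs leaving and entering node $v$.

```latex
\begin{align*}
\text{maximize} \quad
  & \sum_{v \in S} \Bigl( \sum_{a \in \delta^{+}(v)} f(a) - \sum_{a \in \delta^{-}(v)} f(a) \Bigr) \\
\text{subject to} \quad
  & \sum_{a \in \delta^{-}(v)} f(a) = \sum_{a \in \delta^{+}(v)} f(a)
  && \text{for each node } v \notin S \cup T, \\
  & 0 \le f(a) \le u(a)
  && \text{for each arc } a.
\end{align*}
```

The objective is the value of the flow, the equality constraints express flow conservation, and the bounds express nonnegativity and the capacity constraint; all are linear in the variables $f(a)$.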
Modeling dispersal chains as flow
Here we show how to represent nonoverlapping
dispersal chains as flows in a directed graph that
represents dispersal ability in the landscape. We focus
on the first definition of nonoverlapping (see Appendix
for modifications to represent the second definition). We
assume that each species can disperse either one cell or
three cells per time slice (as is the case with the Cape
Proteaceae; see Methods: The Cape Proteaceae). There
are therefore two versions of the graph, called G1 and G3,
corresponding to the maximum dispersal distance per
time slice being either one cell (see Fig. 4) or three cells.
The nodes of the graph consist of $2 \times \mathrm{NYEARS}$ layers
(where $\mathrm{NYEARS} = 6$), numbered starting with layer 0;
layer $l$ has a node $n_{l,c}$ for each grid cell $c$. The graph
has the following collection of arcs, each with capacity 1:
1) In-year arcs. For each $y \in \{0, \ldots, \mathrm{NYEARS} - 1\}$, and
for each grid cell $c$, there is an arc from $n_{2y,c}$ to $n_{2y+1,c}$.
The arc represents the ability of a species to establish
and persist in the cell $c$ during the $y$th time slice.
2) Between-year arcs. For each $y \in \{0, \ldots, \mathrm{NYEARS} - 2\}$,
and for each pair of cells $c_1$, $c_2$ that are within the
maximum dispersal distance (one for G1 and three for
G3), there is an arc from $n_{2y+1,c_1}$ to $n_{2y+2,c_2}$. The arc
represents the ability of a species to disperse from cell
$c_1$ to the cell $c_2$ by the end of the $y$th time slice.
Fig. 3. Example of the use of network flow to measure connectivity between two patches in a matrix. An area is divided into a
grid of cells (gridlines shown in gray). There are two 1 × 3 patches of suitable conditions for a hypothetical species (shown in black),
arranged (A) side by side or (B) end to end. The cells are in a uniform matrix that has conditions that are unsuitable for the species,
but that can be traversed. For each arrangement, the black arrows show a minimum-cost flow of value 8, for the graph defined as
follows: there is a node for each cell, with unit-capacity arcs from each cell to each of its four neighboring cells. Cells in the left
patch are sources, while cells in the right patch are sinks. If all arcs have unit cost, the flow in (A) has cost 52, while the flow in (B)
has cost 68. Both flows are maximum flows, since there is a cut of value 8 (the boundary of either patch). A minimum-cost flow of
value 3 has cost 6 for (A) vs. 10 for (B).
The nodes in layers 0 and 1 represent the time slice
2000, layers 2 and 3 represent the time slice 2010, layers
4 and 5 represent the time slice 2020, and so on. A path
through G1 (or G3) from the first layer to the last
corresponds to a dispersal chain, constrained by a
dispersal distance of 1 (or 3). For the graph G1, between-
year arcs are analogous to the arcs in Fig. 3, the
difference being that here the arcs all move forward in
time (and that there are arcs to the four diagonal spatial
neighbors in addition to the two horizontal and two
vertical neighbors).
To model a particular species s, we choose G1 or G3
according to the maximum dispersal distance for s, and
then delete nodes (and all adjacent arcs) corresponding
to cells predicted to be unsuitable for s. In symbols, if a
cell c is unsuitable for s in the ith time slice, we delete
nodes $n_{2i,c}$ and $n_{2i+1,c}$ and all adjacent arcs. We call the
resulting graph G(s). To model a set P of protected
areas, we delete from G(s) all nodes corresponding to
cells not in P. We call the resulting graph G(s, P).
For any species s and set P of cells, let the maximum
flow from the first layer to the last in G(s, P) be denoted
maxflow(s, P). This is the maximum number of disjoint
chains possible if exactly the cells in P are protected. Let
$P_0$ be the set of already-protected cells, and let N be the
set of all non-transformed cells. Define required flow(s)
for a species s to be the smaller of 35 and maxflow(s, N).
We call a set P of cells feasible for a species s if P is a
superset of $P_0$ and maxflow(s, P) ≥ required flow(s), or
equivalently, if there exists a set of required flow(s)
nonoverlapping dispersal-constrained chains within P.
A set P of cells is feasible if it is feasible for all species.
The task at hand is to find a feasible set whose size is as
small as possible.
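The quantity maxflow(s, P) can be computed on a small instance by building the layered, node-split graph explicitly. The sketch below is illustrative and not the authors' implementation: the function name `max_chains`, the super-source and super-sink nodes, and the use of the larger of the row and column offsets as grid distance are assumptions. Only nodes for cells that are both suitable and in P are created, which has the same effect as deleting the other nodes. The example uses the hypothetical species of Fig. 1.

```python
from collections import deque


def max_chains(suitable, protected, max_dist):
    """Maximum number of Definition-1 nonoverlapping chains within `protected`.

    suitable  -- list of sets of (row, col) cells, one set per time slice
    protected -- set of (row, col) cells (the set P)
    """
    cap, adj = {}, {}

    def add_arc(u, v):  # unit capacity, plus a zero-capacity reverse arc
        cap[(u, v)] = 1
        cap.setdefault((v, u), 0)
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    usable = [cells & protected for cells in suitable]
    last = len(usable) - 1
    for y, cells in enumerate(usable):
        for c in cells:
            add_arc((2 * y, c), (2 * y + 1, c))  # in-year arc (node splitting)
            if y == 0:
                add_arc("source", (0, c))
            if y == last:
                add_arc((2 * y + 1, c), "sink")
            else:
                for c2 in usable[y + 1]:
                    if max(abs(c[0] - c2[0]), abs(c[1] - c2[1])) <= max_dist:
                        add_arc((2 * y + 1, c), (2 * y + 2, c2))  # between-year arc

    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {"source": None}
        queue = deque(["source"])
        while queue and "sink" not in parent:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if "sink" not in parent:
            return flow
        v = "sink"
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


# Hypothetical species of Fig. 1 on a 3 x 8 grid, years 2000 through 2030.
grid = {(r, c) for r in range(1, 4) for c in range(1, 9)}
suitable = [{(2, c) for c in range(2 + t, 5 + t)} for t in range(4)]
print(max_chains(suitable, grid, max_dist=1))
print(max_chains(suitable, grid - {(2, 5)}, max_dist=1))
```

With every cell available, the three chains of Fig. 1 are found. Removing the single cell (2,5) from P leaves no complete chain for a species that disperses one cell per decade, which shows how sensitive maxflow(s, P) can be to the choice of P.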
A similar graph was used by Williams et al. (2005), the
essential difference being that we have doubled the
number of nodes and added in-year arcs. The reason for
doing this doubling is that network flow induces arc-
disjoint paths. Arc-disjoint paths in the doubled graph
correspond to node-disjoint paths (i.e., with no nodes
being in two paths) in the graph of Williams et al.
(2005), i.e., nonoverlapping dispersal-constrained chains.
This method of mapping node-disjoint paths into arc-
disjoint paths is called ‘‘node splitting’’ (Ahuja et al.
1993: Section 2.4). Finally, we note that there is no
restriction on multiple species occupying a single cell,
since the flow for a species is constrained only by
suitable conditions for that species, and by the set of
protected cells. The network flow for each species takes
place in a different graph tailored for that species. The
only element that ties these graphs together is that they
use the same set of protected cells.
Optimizing flow via integer programming
We now show how to use the flow-based character-
ization of dispersal chains to formulate an integer
program that minimizes the required protected area,
while meeting the conservation goals. A linear program
consists of a linear objective and a set of linear
constraints, i.e., it takes the following form:
maximize $d \cdot x$
subject to $Ax \le b$
where b and d are fixed real-valued vectors, A is a fixed
real-valued matrix, and x denotes the vector of
variables. Here, $\cdot$ denotes the vector dot product, and
the inequality $Ax \le b$ is meant to apply component-wise
to the vectors Ax and b. An integer program takes the
same form, with the added constraint that the variables
in x must have integer values, and a mixed integer
program has some subset of the variables in x
constrained to be integral.
Flow problems can be readily formulated as linear or
integer programs, because the capacity and flow
conservation constraints are all linear. We therefore
formalize the corridor design task as a mixed integer
program, with two types of variables:
1) Flow variables $f_{sa}$ for each species s and each arc a
of G1(s) or G3(s), depending on the maximum
dispersal distance of s.
2) A preserve variable $p_c$ for each non-transformed cell
c indicating its protected status, with 0 representing
unprotected, and 1 representing protected.
Fig. 4. The graph G1, representing dispersal
and persistence possibilities for a species with
maximum dispersal distance equal to 1. There are
12 layers, two for each of the years 2000, 2010,
2020, 2030, 2040, and 2050. Each layer has a
node for each non-transformed cell. An example
cell is shown as a black circle in some layers.
White circles represent cells that are neighbors of
the example cell. Representative examples of two
types of arcs are shown. An in-year arc (labeled
a) is present if the example cell is predicted to be
suitable for the species in that year. Between-year
arcs (labeled b) represent the ability of the species
to disperse from the example cell to nearby cells.
All arcs are left to right (arrowheads omitted for
clarity) and have unit capacity.
The objective is to minimize the sum of the $p_c$
variables, i.e., the total number of protected cells. The
constraints are of four types:
1) Flow conservation. For each species s, these constraints
ensure that the variables $f_{sa}$ constitute a flow.
2) Required flow. These enforce the minimum area
requirement. For each species s, the constraint is that
the flow out of the source nodes (or equivalently, into
the sink nodes) must be at least required flow(s).
3) Capacity. These ensure that the flows use only
protected cells. For each in-year arc a corresponding
to a cell c in time slice y, there is a constraint $f_{sa} \le p_c$
for each species s for which cell c is suitable in year y.
In other words, in the flow network for each species for
which this arc exists, the capacity of the arc is set to $p_c$.
4) Existing protection. For all cells c that are already
protected, $p_c = 1$.
The preserve variables $p_c$ must be
explicitly constrained to be integers. Because of the
integrality property of flows (mentioned earlier),
having integer capacities $p_c$ will automatically make
the flow variables $f_{sa}$ solve to integers.
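Collecting the objective and the four constraint types gives the following mixed integer program. This is a sketch assembled from the description above, with some notation introduced here as an assumption: $R_s$ abbreviates required flow(s), $N$ is the set of non-transformed cells, $P_0$ is the set of already-protected cells, and $\delta_s^{+}(v)$ and $\delta_s^{-}(v)$ are the arcs of $G(s)$ leaving and entering node $v$. The unit upper bound on every flow variable reflects the unit arc capacities of G1 and G3.

```latex
\begin{align*}
\text{minimize} \quad
  & \sum_{c \in N} p_c \\
\text{subject to} \quad
  & \sum_{a \in \delta_s^{-}(v)} f_{sa} = \sum_{a \in \delta_s^{+}(v)} f_{sa}
  && \text{for each species } s \text{ and each node } v \text{ of } G(s)
     \text{ outside the first and last layers}, \\
  & \sum_{c} f_{s,(n_{0,c},\, n_{1,c})} \ge R_s
  && \text{for each species } s, \\
  & f_{sa} \le p_c
  && \text{for each species } s \text{ and each in-year arc }
     a = (n_{2y,c},\, n_{2y+1,c}) \text{ of } G(s), \\
  & 0 \le f_{sa} \le 1
  && \text{for each species } s \text{ and each arc } a \text{ of } G(s), \\
  & p_c = 1
  && \text{for each } c \in P_0, \\
  & p_c \in \{0, 1\}
  && \text{for each } c \in N.
\end{align*}
```

The second constraint measures the flow out of the source nodes as the flow on the in-year arcs of the first time slice, since those are the only arcs leaving layer 0.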
The Cape Proteaceae
To demonstrate our approach, we apply it to the
Proteaceae of the Cape Floristic Region of South Africa,
using exactly the same data as Williams et al. (2005). The
relevant data on these species consist of:
1) A grid of 32400 1′ × 1′ cells covering the western part
of the Cape Floristic Region as far east as 20°48′ E, as
far north as 31°53′ S, and all the way to the coast on
the west and south (see Fig. 5).
2) The current (2000) distributions of 282 species of
Proteaceae (presence or absence in each grid cell).
3) Predicted areas of suitable conditions for each species in years 2010, 2020, 2030, 2040, and 2050, derived from Schulze and Perks (1999), based on climate predictions from the general circulation model HadCM2 (Johns et al. 1997). The climate predictions were fed into models of the species’ bioclimatic requirements (trained on the current distributions) to predict areas of future suitable conditions. The bioclimatic models use the following variables, chosen for their direct physiological impact: mean minimum temperature of the coldest month, annual sum of daily temperatures exceeding 18°C, annual potential evaporation, and summer and winter soil moisture days (days where soil moisture is above a critical level for plant growth; Midgley et al. 2002).
4) A list of 1525 ‘‘already protected’’ cells (those with adequate existing statutory protection).
5) A list of 6036 ‘‘transformed’’ cells, for which more than two-thirds of the area in the cell has been transformed to an unsuitable state by human activity.
6) For each of the 282 species, a ‘‘dispersal distance’’ of one or three cells (for ant/rodent-dispersed or wind-dispersed species, respectively), representing the distance the species can reasonably be expected to disperse in a 10-year period. If the dispersal distance is one, the species can disperse from a given cell to any of the nine cells forming a square around the given cell; if the dispersal distance is three, the species can disperse from a given cell to any of the 25 neighboring cells arranged in a diamond around the given cell. (The nine and 25 numbers include the cell itself, modeling dispersal or survival within the cell.)
FIG. 5. Map of the western part of the Cape Floristic Region of South Africa showing areas chosen to allow dispersal of 282 species of Proteaceae along at least 35 paths, where possible, within areas of predicted suitable conditions under climate change. Key: pale gray, areas with 66% or more transformation of habitat; medium gray, existing protected areas; black, goal-essential cells (required by any feasible solution); dark gray, additional cells requiring protection, as determined by solving an integer program. There are 994 dark gray or black cells, and this is the minimum possible number for which the conservation goal is achievable.
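The two dispersal neighborhoods in item 6 can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the choice of metrics is an assumption (a square of radius one in Chebyshev distance and a diamond of radius three in Manhattan distance), made because these shapes yield exactly the nine and 25 cells stated above.

```python
def dispersal_neighborhood(distance):
    """Return the (dx, dy) offsets reachable in one 10-year step.

    Assumption: distance 1 is a square (Chebyshev distance <= 1) and
    distance 3 is a diamond (Manhattan distance <= 3). These shapes
    give the 9 and 25 cells described in the text, cell itself included.
    """
    if distance == 1:
        return [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
    if distance == 3:
        return [(dx, dy)
                for dx in range(-3, 4)
                for dy in range(-3, 4)
                if abs(dx) + abs(dy) <= 3]
    raise ValueError("dispersal distance must be 1 or 3")


def reachable_cells(cell, distance, suitable_next):
    """Cells of the next time slice that a species in `cell` can reach."""
    x, y = cell
    return {(x + dx, y + dy)
            for dx, dy in dispersal_neighborhood(distance)
            if (x + dx, y + dy) in suitable_next}


print(len(dispersal_neighborhood(1)))  # 9
print(len(dispersal_neighborhood(3)))  # 25
```

Each pair returned by `reachable_cells` would correspond to one between-year arc in the flow network for that species.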
We solved the integer programs for the Cape
Proteaceae to obtain solutions with the smallest possible
number of protected cells, given the constraints (see
Appendix for computational details). For the first
definition of nonoverlapping, 994 cells are necessary
and sufficient (Fig. 5), while for the second definition,
1068 cells are required (Fig. 6). The difference in the
number of cells reflects the fact that the second
definition gives resilience to more types of uncertainty
in the input data, thus requiring more protected area for
some species. For comparison, Williams et al. (2005)
required additional protection for 1602 cells, which is
50% greater than our solution that meets the same
conservation goals (using the second definition of
nonoverlapping).
For 26 species, the required flow under the second definition of nonoverlapping is smaller than under the first definition, because fewer nonoverlapping chains are possible. For those species, therefore, a smaller minimum area is maintained over time in the solution to the integer program. The species with the largest difference is Serruria bolusii Phill. & Hutch. (Protea Atlas Code: sebolu), which has at least 35 nonoverlapping paths by Definition 1, while only 25 nonoverlapping paths that satisfy Definition 2 are possible. The largest proportional decrease is for Paranomus longicaulis Salisb. ex Kn. (Protea Atlas Code: palong), which drops from 17 nonoverlapping paths to 8 nonoverlapping paths under Definition 2.
FIG. 6. Map of the western part of the Cape Floristic Region of South Africa showing areas chosen to allow dispersal of 282 species of Proteaceae along at least 35 paths obeying the Williams et al. (2005) definition of nonoverlapping, where possible, within areas of predicted suitable conditions under climate change. Key: pale gray, areas with 66% or more transformation of habitat; dark gray, existing protected areas; black, additional cells requiring protection, as determined by solving an integer program. There are 1068 black cells, and this is the minimum possible number for which the conservation goal is achievable. Goal-essential cells could be determined as in Fig. 5, if necessary, by solving additional integer programs.
We note that the heuristic of Williams et al. (2005)
does not find the true maximum possible number of
chains for some species, including five of the 18 species
(shown in their Table 2) that are obligate dispersers
under the second definition of nonoverlapping. Our
application of network flow does find the true maxi-
mum, thus allowing us to achieve stronger conservation
goals for some of the species that are likely to be most
vulnerable to climate change.
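To illustrate how a maximum flow yields the true maximum number of chains, the following minimal Python sketch builds a small layered network with unit-capacity in-year and between-year arcs and counts augmenting paths. It is a toy under stated assumptions, not the computation used for the Cape Proteaceae: the function name `max_chains`, the `can_disperse` callback, and the one-dimensional example cells are hypothetical, and nonoverlapping is taken here to mean that no cell is shared by two chains within a time slice.

```python
from collections import defaultdict


def max_chains(suitable, can_disperse):
    """Maximum number of nonoverlapping dispersal chains, via max flow.

    suitable: list of sets of cells, one set per time slice.
    can_disperse(a, b): True if the species can move from cell a to
    cell b between consecutive time slices.
    Every arc has unit capacity, so each cell carries at most one chain
    in each time slice.
    """
    cap = defaultdict(dict)

    def add_arc(u, v):
        cap[u][v] = 1
        cap[v].setdefault(u, 0)  # residual (reverse) arc

    last = len(suitable) - 1
    for t, cells in enumerate(suitable):
        for c in cells:
            add_arc(("in", t, c), ("out", t, c))  # in-year arc
            if t == 0:
                add_arc("source", ("in", t, c))
            if t == last:
                add_arc(("out", t, c), "sink")
            else:
                for d in suitable[t + 1]:
                    if can_disperse(c, d):
                        # between-year arc
                        add_arc(("out", t, c), ("in", t + 1, d))

    def augment(u, seen):
        """Depth-first search for one augmenting path from u to the sink."""
        if u == "sink":
            return True
        seen.add(u)
        for v in list(cap[u]):
            if cap[u][v] > 0 and v not in seen and augment(v, seen):
                cap[u][v] -= 1
                cap[v][u] += 1
                return True
        return False

    flow = 0
    while augment("source", set()):
        flow += 1
    return flow


# Toy example: cells on a line, dispersal to the same or an adjacent cell.
near = lambda a, b: abs(a - b) <= 1
print(max_chains([{0, 1, 2}, {2, 3}, {3, 4}], near))  # 2
```

The residual arcs let a later search reroute an earlier chain, which is what a greedy chain-building heuristic cannot do.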
Despite being produced according to different re-
quirements, the solutions of Figs. 5 and 6 are broadly
similar: more than 80% of the newly protected cells in
Fig. 5 are also protected in Fig. 6. The two solutions
both contain cells clustered in a number of distinct areas,
especially in the hills between Hermanus and Napier, the
Piketberg, the peninsula east of Simon’s Town, and the
Langeberg Mountains. These areas were also highlight-
ed by the solution of Williams et al. (2005). However,
their solution also contains a large cluster in the
Cedarberg, roughly doubling the existing amount of
protected land, and in contrast, our solutions reveal that
very little additional protection is required in the
Cedarberg for the given conservation goals.
We have used network flow to exactly characterize the
number of nonoverlapping dispersal-constrained paths,
which captures the notion of a species’ range size during
the time that its range is experiencing a geographic shift.
Using network flow constraints in a mixed integer
program, we found the optimum configuration of
protected areas to support a minimum range size
conservation goal simultaneously for 282 species of
Cape Proteaceae, reducing by one-third the number of
cells required to achieve the same conservation goal,
compared to a previous study (Williams et al. 2005). The
same method can be applied to any other group of
species, as long as maps of predicted future suitable
conditions can be generated for them. Similarly, the
method can be used with updated future climate
scenarios, as models of climate change evolve. Our
results show that network flow can be a powerful tool
for modeling and optimizing the capacity of connections
between different components of a landscape, and we
expect that it will find more applications in landscape
ecology and reserve design.
Sources of uncertainty and potential errors
The optimization we have described depends on a
number of levels of modeling, each of which comes with
some potential sources of error. A major source of
uncertainty derives from the modeling of future climate
conditions, especially since there is little agreement
among climate models on likely future precipitation
trends, and the distribution of Cape Proteaceae is
strongly influenced by precipitation. A second major
source of uncertainty is the projection of species
distribution models onto future climate conditions.
Models produced by different modeling methods may
give very different future predictions (Pearson et al.
2006). Furthermore, we have assumed that the relation-
ship between suitability for each species and climatic
conditions is constant over time, and that generation
times for the modeled species are similar to the 10-year
duration of time slices. Continuous models were thresh-
olded to give predictions of future suitable areas, so the
predictions are sensitive to the choice of thresholding
rule. The modeling of species’ dispersal abilities and
minimum area requirements and the treatment of
habitat transformation are all quite simplistic, and open
to future revision when more information is available
(Williams et al. 2005). The similarity of our two
solutions for different formalizations of the minimum-area
requirement suggests that this detail of the approach
is not a major source of uncertainty. However, given the
significant uncertainties in climate and species models
and their interaction, and other simplistic details of our
modeling approach, the solutions that we have produced
should be considered tentative, and subject to revision as
the underlying models and data are improved. Never-
theless, our solutions identify areas that may be
important for persistence of the Cape Proteaceae, given
our current knowledge. By focusing on optimization, we
have eliminated one source of uncertainty, the uncer-
tainty due to the use of computational heuristics rather
than exact optimization in the design of dispersal
corridors. Indeed, we find that expanding the amount
of protected area in the Cedarberg is not as important as
was suggested by Williams et al. (2005), so more priority
can be given to other areas.
Choice of optimization criterion
We note that it is possible to model more complicated
dispersal mechanisms than we have considered here, for
example allowing populations to branch and merge.
However, to insure against stochastic events eliminating
small populations or rendering small areas unsuitable,
and to insure against uncertainty in the underlying data,
it is important that a minimum number of separate
populations be able to migrate along independent
pathways that maintain suitable climatic conditions.
Chains constitute the minimum requirement that en-
sures species viability through the existence of such
independent pathways. By focusing on chains, we are
primarily answering the question: ‘‘How much land
must be protected to ensure viability during climate
change?’’ However, given sufficient resources, we may
desire protected area configurations that offer more than
simple viability: we may prefer to maximize expected
population sizes, subject to the viability constraints we
have modeled here. The latter optimization problem
would probably require more explicit modeling of
populations than the present study. We may also
combine requirements on flows and chains with other
optimization criteria, such as preferring contiguous
In this paper, we are primarily trying to minimize the
number of additional grid cells requiring protection.
That is, we seek a solution of minimum cost that satisfies
the survival requirements, where each grid cell we use is
assigned a cost of one. The model could also accom-
modate different costs for each grid cell, simply by
changing the coefficient in the mixed integer program’s
objective function that corresponds to that cell. Here we
also consider attaching costs to the arcs, in order to
model how likely the species is to survive and disperse at
each step in a dispersal chain. We set the cost of each
between-year arc to be the Euclidean distance between
the endpoints of the arc; in-year arcs had cost 0. This
results in the cost of a chain being the sum of the
Euclidean distances of its implied dispersal events.
Adding the flow cost to the objective function of our
mixed integer program combines the financial cost of
creating new protected areas with the biological cost of
requiring the species to disperse further. On its face, it
seems that adding these costs is comparing apples and
oranges. This is a common difficulty in bi-criterion
optimization problems. We want to maximize the
chances of species survival, but protecting grid cells is
difficult and has other costs that push us to minimize the
number of grid cells we protect. We need to somehow
balance these two criteria. One approach is to place a
hard limit on one of the two costs while optimizing the
other. Another approach is to multiply one of the costs
by a scaling parameter and add it to the other. Both
approaches allow us to vary the appropriate parameter
over a range, in order to fill out a trade-off curve of total
dispersal distance required (and implied reduction of
survival probability) vs. grid cell protection cost; we
chose to implement the former approach. We started
with the number of protected cells at 994, the minimum
for which the goal is achievable, and then increased it in
steps of 10 from 1000 to 1340, the point at which adding
more protected cells does not help to reduce the flow
cost (Fig. 7). On the left of the plot we see that a 5%
increase in the number of protected cells allows the flow
cost to be reduced by 20%. Conversely, the right end of the
plot shows that many more cells must be protected if we
want to keep the flow cost at or near its minimum
possible value. A lower flow cost means that the species
are not required to disperse as far, so their probability of
successfully dispersing along the chains is increased.
Therefore, we conclude that a small increase above the
absolute minimum number of protected cells may be
justified, in order to increase the probability of the
species’ survival, but that maximizing survival proba-
bility (by minimizing flow cost) is not likely to be a cost-
effective use of conservation funding.
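In symbols, the flow cost and the two ways of balancing it against the number of protected cells can be sketched as follows. The notation is assumed here for illustration: $d_a$ is the cost of arc a (the Euclidean distance for a between-year arc, 0 for an in-year arc), $P$ is the set of already protected cells, $B$ is the limit on the number of newly protected cells, and $\lambda$ is the scaling parameter. Both programs keep the flow conservation, required flow, capacity, and existing protection constraints.

```latex
\begin{align*}
\text{flow cost:}\quad
  & F = \sum_{s} \sum_{a} d_a \, f_{sa} \\
\text{hard limit:}\quad
  & \min \; F \quad \text{subject to} \quad \sum_{c \notin P} p_c \le B \\
\text{scaling parameter:}\quad
  & \min \; \sum_{c \notin P} p_c + \lambda F
\end{align*}
```

Under this reading, the trade-off curve of Fig. 7 corresponds to solving the hard-limit program for B = 994 and then for B = 1000, 1010, ..., 1340.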
Variable dispersal likelihood and habitat suitability
The network flow and integer programming approach
offers some new possibilities for solving variations of the
corridor design task. We have seen this already in the
use of two notions of nonoverlapping; here we describe
some variations that use min-cost flow instead of
maximum flow. One possibility is to represent species
ranges with continuous values rather than binary ones,
denoting varying levels of suitability for the species. In
addition, a given species may be more likely to disperse
to nearby cells than to further away cells, or to cells in a
certain direction, due to prevailing winds. A simple way
of modeling varying dispersal likelihood is given by the
bi-criteria optimization described above, where there is
an implied cost for each dispersal event, proportional to
the dispersal distance. We can represent varying
probabilities of persistence and dispersal more explicitly
and with more detail by varying arc costs. For example,
we can make the cost of an in-year arc be the negative
log probability that the species would persist in the
corresponding cell for the corresponding 10-year period.
Similarly, we can make the cost of a between-year arc be
the negative log probability that the species would
disperse between the corresponding two cells in a 10-year
period. The cost of a source–sink path, being the sum of
the costs of its arcs, is then equal to the negative log
probability that the species manages to persist and then
disperse as required at each step in the corresponding
dispersal chain, assuming each of these events is
statistically independent. Finding a min-cost flow of a
particular value v would then be equivalent to finding a
set of v dispersal chains that maximizes the probability
of successful dispersal along all the chains
simultaneously. Cost here represents biological cost;
economic costs (such as variations in land price) can
also easily be
FIG. 7. Trade-off curve for the bi-criteria optimization of the number of protected cells and the flow cost. The flow cost is the sum of the Euclidean distances of the dispersal chains. The minimum possible number of protected cells is 994, while the minimum possible flow cost is 1794.37. The points on the graph show selected solutions trading off the two measures. At each point, the y-value gives the minimum possible flow cost for the given number of protected cells, if CPLEX solved the corresponding mixed integer program within a few days; otherwise the upper and lower bounds found for the minimum flow cost are shown with an error bar.
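The correspondence between additive arc costs and multiplicative probabilities can be checked with a short Python sketch. The function names and the example probabilities are hypothetical, chosen only to illustrate the arithmetic; the calculation assumes, as in the text, that the persistence and dispersal events along a chain are statistically independent.

```python
import math


def chain_cost(step_probabilities):
    """Sum of negative log probabilities along one dispersal chain."""
    return sum(-math.log(p) for p in step_probabilities)


def chain_success_probability(step_probabilities):
    """Probability that every step succeeds, assuming independence."""
    return math.exp(-chain_cost(step_probabilities))


# Hypothetical chain: persist (0.9), disperse (0.5), persist (0.8).
steps = [0.9, 0.5, 0.8]
print(round(chain_success_probability(steps), 6))  # 0.36
```

Because the logarithm is monotone, a chain with a lower total cost always has a higher probability of success, which is why minimizing cost maximizes that probability.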
Alternatively, we may be able to estimate the size of
the population that each cell can sustain in each time
slice. If we set the capacity of in-year arcs to this
estimate, then a flow can be interpreted as modeling the
dispersal of a whole population. Such a flow may no
longer be decomposable into nonoverlapping chains,
and it might use a different number of cells in each layer.
For example, it might use 35 highly suitable cells in one
year, and 50 fairly suitable cells in the next.
Another possibility for adding realism to the model is
to use what is known as generalized flow. In ordinary
network flow, the flow entering an arc is the same as the
flow leaving the arc. Generalized flow allows for there to
be multiplicative gains or losses. (For a multiplicative
loss, think of the arc as a leaky pipe.) This would be
another potential way to model the probability of
dispersal between two grid cells. Generalized flow has
also been well-studied in the operations research and
computer science communities, and there are efficient
computational techniques known for dealing with it.
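As a sketch of the idea, suppose each arc a has a gain factor $\gamma_a$ (notation assumed here for illustration): if $f_a$ units of flow enter the arc, then $\gamma_a f_a$ units leave it. Writing $\delta^-(v)$ and $\delta^+(v)$ for the arcs entering and leaving a non-terminal node v, flow conservation becomes

```latex
\[
\sum_{a \in \delta^-(v)} \gamma_a \, f_a \;=\; \sum_{a \in \delta^+(v)} f_a ,
\qquad 0 < \gamma_a \le 1 \ \text{for a lossy arc.}
\]
```

Setting every $\gamma_a = 1$ recovers ordinary network flow. One possible reading, offered as an assumption rather than a tested model, is to set $\gamma_a$ for a between-year arc to the probability of dispersal between its two cells.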
Varying conservation goals
If model outputs will be used to prioritize new areas
for conservation, it may be useful to propose more
contiguous or compact areas of new protection. This can
be achieved by adding a boundary length modifier,
which adds a penalty to the objective function for each
pair of neighboring cells with different protection status
(Possingham et al. 2000). The mixed integer program
will then compute the optimum trade-off between
boundary length and number of proposed sites. A larger
penalty value will produce more contiguous designs.
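The boundary length term can be illustrated with a small Python sketch that counts neighboring pairs of cells with different protection status and adds the penalized count to the number of protected cells. The function names, the use of edge-sharing (four-cell) neighborhoods, and the toy 3 × 3 grid are assumptions for illustration; edges on the outer border of the planning region are not counted here.

```python
def boundary_length(protected, all_cells):
    """Count pairs of edge-adjacent cells with different protection status.

    protected: set of (x, y) cells that are protected.
    all_cells: set of all (x, y) cells in the planning region.
    Assumption: neighbors are the cells sharing an edge.
    """
    count = 0
    for (x, y) in all_cells:
        # Look only right and up so each neighboring pair is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            if neighbor in all_cells:
                if ((x, y) in protected) != (neighbor in protected):
                    count += 1
    return count


def penalized_objective(protected, all_cells, penalty):
    """Number of protected cells plus penalty times boundary length."""
    return len(protected) + penalty * boundary_length(protected, all_cells)


grid = {(x, y) for x in range(3) for y in range(3)}
compact = {(0, 0), (1, 0)}    # two adjacent cells
scattered = {(0, 0), (2, 2)}  # two opposite corners
print(boundary_length(compact, grid))    # 3
print(boundary_length(scattered, grid))  # 4
```

Both designs protect two cells, but the compact one has the shorter boundary, so any positive penalty makes it the cheaper of the two.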
Similarly, it may be useful to propose protection only
for sites that are needed by multiple species. This can be
achieved by making the required flow value for each
species be a variable, rather than a constant, and adding
a penalty to the objective function for flow values that
are less than the conservation goal. The penalty can vary
per species, depending on the maximum achievable
number of chains, or other measures of the conservation
importance of each species. This will allow the mixed
integer program to find the optimal trade-off between
the conservation goals and number (or economic cost)
of proposed sites.
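One way to write this variation is sketched below. The symbols are assumptions introduced for illustration: $g_s$ is the conservation goal for species s (the constant that was previously the required flow), $r_s$ is the new variable required flow, and $w_s$ is the per-species penalty for each unit of shortfall. The flow conservation, capacity, and existing protection constraints are unchanged.

```latex
\begin{align*}
\text{minimize}\quad & \sum_{c} p_c + \sum_{s} w_s \, (g_s - r_s) \\
\text{subject to}\quad
& 0 \le r_s \le g_s && \text{for each species } s, \\
& \text{flow out of the source nodes of species } s \;\ge\; r_s
  && \text{for each species } s.
\end{align*}
```

A large $w_s$ forces the goal for species s to be met in full, while a small $w_s$ lets the program give up chains that would require protecting cells used by no other species.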
Long-range dispersal events
The potential usefulness of the approach we describe
relates to the ongoing debate about the migration
capacity of species under climate change. For example,
fossil pollen records have been interpreted as indicating
that tree populations migrated much faster during post-
glacial warming in North America and Europe than our
current knowledge of seed dispersal mechanisms would
suggest (Davis 1981, Huntley and Birks 1983). Rare and
poorly understood long-distance dispersal events could
therefore be an important factor in the ability of these
and other species’ ranges to track suitable conditions
during periods of rapid climate change (Clark 1998,
Pearson and Dawson 2005), but such events are not
included in the dispersal models considered in this
paper. However, there is recent genetic evidence that tree
migration rates at the end of the last glacial period were
much slower than fossil pollen data suggest, reducing
the need to posit rare long-distance dispersal events
(McLachlan et al. 2005). Indeed, range expansion into
areas predicted to become suitable may be slow, lagging
trailing edge population extinction (Foden et al. 2007).
It is clear that further research is warranted to
understand the full range of dispersal mechanisms. This
is especially true for any species expected to undergo
pronounced range shifts, like many Proteaceae (Midgley
et al. 2003). Meanwhile, conservation planning must be
done with respect to our current understanding of
species’ dispersal abilities, and flexible planning methods
must be developed to be able to incorporate and adapt
to future insights into species’ dispersal abilities. In
general, we prefer to err on the side of caution, and use
conservative estimates of species’ dispersal abilities.
We thank Richard Pearson for helpful comments on the
manuscript, and David Applegate for sharing his expertise on
AMPL, CPLEX, and integer programming. Two anonymous
reviewers made a number of excellent suggestions resulting in
improvements to the presentation.
Ahuja, R. K., T. L. Magnanti, and J. B. Orlin. 1993. Network
flows: theory, algorithms, and applications. Prentice Hall,
Englewood Cliffs, New Jersey, USA.
Bawa, K. S., and S. Dayanandan. 1998. Global climate change
and tropical forest genetic resources. Climatic Change
Clark, J. 1998. Why trees migrate so fast: confronting theory
with dispersal biology and the paleorecord. American
Davis, M. 1981. Quaternary history and the stability of forest
communities. Pages 132–153 in D. West, H. Shugart, and D.
Botkin, editors. Forest succession: concepts and application.
Springer-Verlag, New York, New York, USA.
Del Barrio, G., P. A. Harrison, P. Berry, N. Butt, M. Sanjuan,
R. Pearson, and T. Dawson. 2006. Integrating multiple
modelling approaches to predict the potential impacts of
climate change on species’ distributions in contrasting
regions: comparison and implications for policy. Environ-
mental Science and Policy 9:129–147.
Doyle, P., and J. Snell. 1984. Random walks and electric
networks. Mathematical Association of America, Washing-
ton, D.C., USA.
Foden, W., G. F. Midgley, G. Hughes, W. J. Bond, W. Thuiller,
M. T. Hoffman, P. Kaleme, L. G. Underhill, A. Rebelo, and
L. Hannah. 2007. A changing climate is eroding the
geographical range of the Namib Desert tree Aloe through
population declines and dispersal lags. Diversity and
Forman, R. T. 2002. Foreword. Pages vii–x in K. J. Gutzwiller,
editor. Applying landscape ecology in biological conserva-
tion. Springer-Verlag, New York, New York, USA.
Graham, C. H. 2001. Factors influencing movement patterns of
Keel-billed Toucans in a fragmented tropical landscape in
Southern Mexico. Conservation Biology 15(6):1789–1798.
Honnay, O., K. Verheyen, J. Butaye, H. Jacquemyn, B.
Bossuyt, and M. Hermy. 2002. Possible effects of habitat
fragmentation and climate change on the range of forest
plant species. Ecology Letters 5(4):525–530.
Huntley, B., and H. Birks. 1983. An atlas of past and present
pollen maps for Europe: 0–13,000 B.P. Cambridge University
Press, Cambridge, UK.
Johns, T. C., R. E. Carnell, J. F. Crossley, J. M. Gregory,
J. F. B. Mitchell, C. A. Senior, S. F. B. Tett, and R. A.
Wood. 1997. The second Hadley Centre coupled ocean-
atmosphere GCM: model description, spinup and validation.
Climate Dynamics 13(2):103–134.
Malanson, G. P. 2003. Dispersal across continuous and binary
representations of landscapes. Ecological Modelling 169:17–
Malcolm, J. R., and L. F. Pitelka. 2000. Ecosystems and global
climate change: a review of potential impacts on U.S.
terrestrial ecosystems and biodiversity. Pew Center on Global
Climate Change, Arlington, Virginia, USA.
Margules, C., and R. Pressey. 2000. Systematic conservation
planning. Nature 405:243–253.
McLachlan, J. S., J. S. Clark, and P. S. Manos. 2005. Molecular
indicators of tree migration capacity under rapid climate
change. Ecology 86:2088–2098.
McRae, B. 2006. Isolation by resistance. Evolution 60:1551–
Midgley, G. F., L. Hannah, D. Millar, M. C. Rutherford, and
L. W. Powrie. 2002. Assessing the vulnerability of species
richness to anthropogenic climate change in a biodiversity
hotspot. Global Ecology and Biogeography 11:445–451.
Midgley, G. F., L. Hannah, D. Millar, W. Thuiller, and A.
Booth. 2003. Developing regional and species-level assess-
ments of climate change impacts on biodiversity in the Cape
Floristic region. Biological Conservation 112:87–97.
Noss, R. F. 2001. Beyond Kyoto: forest management in a time
of rapid climate change. Conservation Biology 15(3):578–
Pearson, R. G., and T. P. Dawson. 2005. Long-distance plant
dispersal and habitat fragmentation: identifying conservation
targets for spatial landscape planning under climate change.
Biological Conservation 123:389–401.
Pearson, R., W. Thuiller, M. Araújo, L. Brotons, E. Martinez-
Meyer, C. McClean, L. Miles, P. Segurado, T. Dawson, and
D. Lees. 2006. Model-based uncertainty in species’ range
prediction. Journal of Biogeography 33:1704–1711.
Possingham, H., I. Ball, and S. Andelman. 2000. Mathematical
methods for identifying representative reserve networks.
Pages 291–305 in S. Ferson and M. Burgman, editors.
Quantitative methods for conservation biology. Springer-
Verlag, New York, New York, USA.
Ray, N. 2005. Pathmatrix: a geographical information system
tool to compute effective distances among samples. Molec-
ular Ecology Notes 5:177–180.
Ricketts, T. H. 2001. The matrix matters: effective isolation in
fragmented landscapes. American Naturalist 158:87–99.
Schulze, R. E., and L. A. Perks. 1999. Assessment of the impact
of climate. Final report to the South African Country Studies
Climate Change Programme. School of Bioresources Engi-
neering and Environmental Hydrology, University of Natal,
Pietermaritzburg, South Africa.
Taylor, P., L. Fahrig, K. Henein, and G. Merriam. 1993.
Connectivity is a vital element of landscape structure. Oikos
Tetali, P. 1991. Random walks and the effective resistance of
networks. Journal of Theoretical Probability 4:101–109.
Tischendorf, L., and L. Fahrig. 2000. How should we measure
landscape connectivity? Landscape Ecology 15:633–641.
Turner, M., R. Gardner, and R. O’Neill. 2001. Landscape
ecology in theory and practice. Springer, New York, New
Vos, C. C., H. Baveco, and C. J. Grashof-Bokdam. 2002.
Corridors and species dispersal. Pages 84–104 in K. J.
Gutzwiller, editor. Applying landscape ecology in biological
conservation. Springer-Verlag, New York, New York, USA.
Williams, P., L. Hannah, S. Andelman, G. Midgley, M. Araújo,
G. Hughes, L. Manne, E. Martinez-Meyer, and R. Pearson.
2005. Planning for climate change: identifying minimum-
dispersal corridors for the Cape Proteaceae. Conservation
With, K. A. 2002. Using percolation theory to assess landscape
connectivity and effects of habitat fragmentation. Pages 105–
130 in K. J. Gutzwiller, editor. Applying landscape ecology in
biological conservation. Springer-Verlag, New York, New
Modeling the second definition of nonoverlapping chains (Ecological Archives A018-043-A1).