
Evaluating Strategies for Running from the Cops

Carsten Moldenhauer and Nathan R. Sturtevant

Department of Computer Science

University of Alberta

Edmonton, AB, Canada T6G 2E8

{moldenha, nathanst}@cs.ualberta.ca

Abstract

Moving target search (MTS), or the game of cops and robbers, has a broad field of application, ranging from law enforcement to computer games. In recent years, research has focused on computing move policies for one or multiple pursuers (cops). The present work motivates extending this perspective to both sides, thus developing algorithms for the target (robber). We investigate the game with perfect information for both players and propose two new methods, named TrailMax and Dynamic Abstract TrailMax, to compute move policies for the target. Experiments are conducted by simulating games on 20 maps of the commercial computer game Baldur’s Gate and measuring survival time and computational complexity. We test seven algorithms: Cover, Dynamic Abstract Minimax, minimax, hill climbing with distance heuristic, a random beacon algorithm, TrailMax and DATrailMax. Analysis shows that our methods outperform all the other algorithms in quality, achieving up to 98% optimality, while meeting modern computer game computation time constraints.

1 Introduction

Moving target search (MTS), or the game of cops and robbers, has many applications ranging from law enforcement to video games. The game was introduced into the artificial intelligence literature by [Ishida and Korf, 1991] as a new variant of the classical search problem. Following this study, the question of how to catch a moving prey effectively has been studied extensively [Ishida and Korf, 1995; Koenig et al., 2007; Isaza et al., 2008].

In today’s computer games, the players control robbers being chased by computer generated police agents. The same game turned around, i.e. the player controlling a cop and having to chase down a computer generated robber, is far from realizable. This is because the focus in MTS research has always been on developing strong pursuit strategies. Very little is known about how to compute strategies for the target. This paper presents a systematic study of move policies for the robber, thus enabling better target modelling and widening the current focus in MTS.

The game of cops and robber has also been studied in

the mathematical literature (see [Hahn, 2007] for a survey).

Here, cops and robber alternatingly choose their initial posi-

tions at the beginning of the game and then play as in MTS.

The search time of a graph, i.e. the time needed by optimal

playing cops to catch the robber, is therefore a constant. Be-

sides bounds for the one cop and one robber problem, little is

known about this graph property. However, a first algorithm

that runs in polynomial time and which computes the search

time has been developed in [Hahn and MacGillivray, 2006].

Given this algorithm it is possible to determine optimal poli-

cies for both players.

Computer games require tight bounds on resource usage,

especially computation time. Therefore, computing optimal

policies, even though possible in polynomial time with the

above algorithm, is not practical. This gives rise to the ques-

tion of how to quickly compute approximations that yield

near-optimal move policies. In the following, we will in-

troduce a new algorithm called TrailMax and its variant Dy-

namic Abstract TrailMax to respond to this question.

As optimality has only been studied recently, previous work in MTS has been concerned with approximate solutions and has not, whether for the pursuer or the target, compared methods against optimal policies. Therefore, this paper

is the first to conduct a study of various target algorithms with

respect to their achieved suboptimality.

A precise definition of the cops and robber game consid-

ered in this work will be given in Section 2. We review exist-

ing algorithms, including Cover and Dynamic Abstract Min-

imax, and outline their strengths and weaknesses in Section

3. The new methods, TrailMax and Dynamic Abstract Trail-

Max, are introduced in Section 4. Evaluations of experiments

and extensive comparisons of various target algorithms when

playing against an optimal cop can be found in Section 5. We

wrap up with conclusions in Section 6.

2 Game Definition

The game of cops and robber is played with n cops and one robber. Cops and robber occupy vertices in a finite undirected connected graph G and are allowed to move to an adjacent vertex or remain on their current location in each turn. Turns are taken alternately, beginning with the first to last cop followed by the robber. The game is played with perfect information, i.e. the graph and all locations of all agents are known. G is called n-cop-win if n cops have a winning strategy on G.

Figure 1: Map abstraction for DAM.

Since our focus is on the target and the cop is potentially

played by a human player, we concentrate on the one cop one

robber problem here. However, all the following methods can

easily be extended to multiple cops. Furthermore, we are in-

terested in playing on typical video game maps that include

obstacles. Hence, one cop cannot catch a robber that plays

optimally when both agents move at the same speed. To enable

execution of experiments, i.e. many simulations of the game,

we have to decide between one of the three ways to guarantee

termination: the target moves suboptimally from time to time,

the game is ended after a certain number of steps, or the cop

is faster than the target. The first possibility contradicts our

wish to compute near-optimal policies for the robber. The

second choice is problematic due to the choice of timeout

conditions. Furthermore, it does not measure the full amount

of suboptimality generated by a given strategy because the

game is truncated after the timer runs out. Moreover, it is

easy to construct an algorithm that achieves optimal results in

this game: detect all cycles around obstacles of length greater than or equal to four in the map, run to a cycle where the cop cannot capture the robber before reaching the cycle, and exploit

the cycle. Therefore, we allow the cop to be faster than the

robber. For simplicity we allow the cop to make d subsequent

moves when the robber only gets one, i.e. to move to any

location within a radius of d of his current position.
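The faster cop described above can be sketched as a breadth-first search truncated at depth d. This is only an illustration: the adjacency-list dictionary, the function name `cop_moves` and the small path graph are assumptions made for the example, not part of the original work.

```python
from collections import deque

def cop_moves(graph, start, d):
    """Return all vertices within d edges of start (start included,
    because an agent may also remain on its current location)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == d:
            continue  # do not expand beyond radius d
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return set(dist)

# Hypothetical path graph 0 - 1 - 2 - 3 - 4
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(cop_moves(path_graph, 0, 2))  # vertices 0, 1 and 2
```

With d = 1 this reduces to the closed neighborhood of the cop's position, i.e. the equal-speed game.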

3 Related Work

There are only two advanced methods in the literature that try to compute move policies for the robber quickly. [Bulitko and Sturtevant, 2006] suggest using Dynamic Abstract Minimax (DAM). This algorithm assumes that various resolutions of abstract maps are available, where an abstract map is created by taking sets of states in an original map and merging them together to form a more abstract map. DAM chooses a level of abstraction to begin with, and then computes a minimax solution to a fixed depth. If the robber cannot avoid capture at that level of abstraction, computation proceeds to the next lower level. We illustrate this in Figure 1. In the abstract map two sets of 9 states have been abstracted together to form a 2-node graph. The cop can catch the robber in one move in the abstract graph, so DAM will search again on the lower level of abstraction. Assume the hierarchy has several levels of abstraction and the cop and the robber occupy distinct nodes up until level m. The original algorithm begins planning at level m. Running the experiments in Section 5 for multiple fractions of m showed that starting at level m/2 is superior. We report the experiments for the latter case.

Figure 2: Example of where the original tie breaking of the cover heuristic computation can cause the robber to remain in v instead of going to w.

If the robber can escape, an abstract goal destination is selected and projected onto the actual map. PRA* [Sturtevant and Buro, 2005] is used to compute a path to that node which is subsequently followed for one step. Since only the goal destination is projected onto the ground level, DAM can make mistakes when cycles exist in the strategy. For example, consider a cycle with five nodes and an adjacent cop and robber. The solution is to run around this cycle, but after seven steps the robber will reach the initial position of the cop. Hence, when computing with depth seven, the robber will run towards the cop. One remedy is to make DAM refine only one abstract step. However, running the experiments in Section 5 for such a variant showed that the original algorithm, despite its flaws, achieves slightly better results.

Within the present work, we use the same idea of using

abstractions for speedup. Our algorithm uses the same policy

(m/2) for selecting the first level of abstraction, solves the

problem on this level and proceeds to the next lower level if

the robber cannot survive long enough due to the computed

solution. Otherwise, the abstract solution path is refined into

a ground level path.

The Cover heuristic, as a state-of-the-art algorithm for moving target search, has been used for both the cop and the robber [Isaza et al., 2008]. This algorithm computes the number of nodes in the graph that the respective agent can get to before any other agent. It then tries to maximize this area with each move, minimizing the area the opponent can reach. The original algorithm breaks ties by assigning the nodes on the border between two covered areas to the cop. This causes the heuristic to be inaccurate even for simple problems. Isaza et al. used a notion of risk to increase the pursuer’s aggressiveness and circumvent this inaccuracy for the cop.

As an example for the robber, consider the graph in Figure

2. There are three vertices, u, v, and w. The cop starts on

u, the robber on v, and it is the robber’s turn. When the rob-

ber remains on v, v and w are considered robber cover. If he

moves to w, u and v are cop cover (due to the tie-breaking

rule) and only w is robber cover. Thus, when maximizing,

the robber prefers to stay in v, which is suboptimal. In this

work, we modify Cover to eliminate this problem. Vertices

are only declared robber cover if he is guaranteed to reach

them no matter what the cop does.

However, we found that no matter how the Cover heuristic is defined, it is easy to construct a simple example where hill climbing would fail for either of the two players. Using notions of ties and untouchable nodes can solve some of the issues but subsequently turns the heuristic into a search algorithm instead of a static heuristic. Thus, we seek to develop a more principled search method instead of trying to patch Cover.

When used for the pursuer, Cover with Risk and Abstraction (CRA) [Isaza et al., 2008] makes use of abstractions to decrease computation time and to scale to large maps. This has not been used for the robber. Since the heuristic is most accurate with full information, using abstractions only trades accuracy against speed. Within our experiments, the Cover heuristic without abstractions already performed poorly in terms of survival time against an optimal cop. Therefore, we did not extend the algorithm to incorporate abstractions.

Optimal move policies for both cops and robbers are studied by [Moldenhauer and Sturtevant, 2009]. They developed algorithms that solve one problem instance, i.e. compute optimal policies for a given initial position. Unfortunately, although well optimized, optimal algorithms do not scale to very large maps and cannot meet tight computation time constraints of modern computer games. An algorithm that solves a map, i.e. computes a strategy for cop and robber for every possible initial position, was first proposed by [Hahn and MacGillivray, 2006]. We use an improved version that has been used as a baseline in [Moldenhauer and Sturtevant, 2009] to compute optimal solutions offline and to generate a cop that moves optimally within our experiments.

4 TrailMax

We now outline our approach to computing near-optimal move policies for the robber. We will first motivate the algorithm and then provide more details. For ease of understanding, the following ideas will be developed for the game where cop and robber move at the same speed. However, all the definitions and theorems can be extended to games with different speeds.

The robber makes the assumption that the cop knows where he is going to move, i.e. that the cop will play a best response against him. Under this assumption, the robber tries to maximize the time to capture. This can also be interpreted as “running away”, i.e. taking the path that the cop takes longest to intersect. We will now formalize this idea. Let $N[v] = \{w \mid (v,w) \in E(G)\} \cup \{v\}$ denote the closed neighborhood of $v \in G$. Let

$$P(v) = \{\, p : \mathbb{N} \to V \mid p(0) = v,\ \forall i \ge 0 : p(i+1) \in N[p(i)] \,\}$$

be the set of paths starting in $v$. Given paths $p_r$ and $p_c$ for the robber and cop, respectively, that they will follow disregarding the opponent’s actions, we can compute the sum of the numbers of turns both agents take until capture occurs:

$$T(p_r, p_c) = \min\bigl( \{\, 2t \mid t \ge 0,\ p_c(t) = p_r(t) \,\} \cup \{\, 2t - 1 \mid t \ge 1,\ p_c(t) = p_r(t-1) \,\} \bigr).$$

Definition 1 (TrailMax) Let $v_r \in G$ and $v_c \in G$ be the positions of robber and cop in G. We define

$$\mathrm{TrailMax}(v_r, v_c) = \max_{p_r \in P(v_r)} \ \min_{p_c \in P(v_c)} T(p_r, p_c). \tag{1}$$

Figure 3: Smallest 1-cop-win graph where the set of moves

according to TrailMax (solid) diverges from the set of optimal

moves (dashed).
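The capture time T defined above can be evaluated directly for two fixed paths. The following is a minimal sketch under the assumption that paths are given as finite Python lists and are implicitly extended by remaining on their last vertex (which the closed neighborhood permits); the function name `capture_time` is chosen here for illustration.

```python
def capture_time(robber_path, cop_path):
    """Evaluate T(p_r, p_c) for finite paths that are extended by waiting
    on their last vertex. Returns None if the paths never meet."""
    def at(path, t):
        return path[t] if t < len(path) else path[-1]

    # Once both paths have ended, the positions no longer change, so
    # checking up to the longer path length is sufficient.
    horizon = max(len(robber_path), len(cop_path))
    candidates = []
    for t in range(horizon + 1):
        if at(cop_path, t) == at(robber_path, t):
            candidates.append(2 * t)
        if t >= 1 and at(cop_path, t) == at(robber_path, t - 1):
            candidates.append(2 * t - 1)
    return min(candidates) if candidates else None

# Robber runs from 2 to 4 on a path graph 0-1-2-3-4, cop chases from 0.
print(capture_time([2, 3, 4], [0, 1, 2, 3, 4]))  # 7
```

In the example the cop first meets the robber with its fourth move, landing on the vertex the robber occupied after three moves, so the second set in the definition contributes 2·4 − 1 = 7.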

We say G is an octile map if its vertices are positions in a

two dimensional grid and each vertex is connected to its up

to eight neighbors via the two horizontals, two verticals and

four diagonals. Within our experiments we use octile maps to

model the environment.

Recall that a graph G is called n-cop-win if n cops have a winning strategy on G for any initial position of the cops and the robber and when all agents move at the same speed.

Theorem 1 Let G be a 1-cop-win octile map. Let $v_r$ and $v_c$ be the initial positions of robber and cop. Then TrailMax($v_r$, $v_c$) returns the optimal value of the game where the cop and robber move at the same speed.

This theorem also holds when the cop is faster as described

in Section 2. However, this requires obvious adjustments of

the above definitions and is therefore omitted for readability.

Unfortunately, the theorem does not hold for general 1-cop-

win or n-cop-win graphs (n ≥ 2).

TrailMax can be used to generate move policies for the robber. For simplicity, the resulting algorithm will be referred to by the same name. Furthermore, a pair $(p_r, p_c)$ that attains the value of (1) will be called a TrailMax pair. The algorithm computes a TrailMax pair $(p_r, p_c)$ and then follows the robber’s path $p_r$ for k steps (k ≥ 1) disregarding the cop’s actions. Afterwards, TrailMax is called again and a new path $p_r$ is computed, hence our notation TrailMax(k). Unfortunately, the immediate assumption that TrailMax(1) might yield an optimal strategy for general n-cop-win graphs is not true. Figure 3 depicts an example of a 1-cop-win graph where the robber is to move and the optimal move is to remain on his current position, marked with an r. This causes the cop to commit to a direction, after which the robber can run away more effectively. However, according to TrailMax, the robber has to move to either of the indicated adjacent positions.
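The follow-for-k-steps scheme behind the notation TrailMax(k) can be sketched as a small control loop. The planner and the cop policy are passed in as functions; both stubs below (a robber that runs right on a line of five vertices and a cop that steps towards him) are hypothetical stand-ins chosen only to keep the example self-contained, and the robber is assumed to move first in each round.

```python
def follow_and_replan(plan, cop_policy, robber, cop, k, max_rounds=10_000):
    """Follow each planned path for k steps, then call the planner again.
    Returns the number of cop moves made until capture."""
    rounds = 0
    while rounds < max_rounds:
        path = plan(robber, cop)  # one planner call per k steps
        for i in range(1, k + 1):
            # Past the end of the path the robber waits on the goal vertex.
            robber = path[i] if i < len(path) else path[-1]
            if robber == cop:
                return rounds  # the robber stepped onto the cop
            cop = cop_policy(cop, robber)
            rounds += 1
            if cop == robber:
                return rounds
    return rounds

# Hypothetical stubs on the line 0-1-2-3-4.
run_right = lambda r, c: list(range(r, 5))
chase = lambda c, r: c + (r > c) - (r < c)
print(follow_and_replan(run_right, chase, robber=2, cop=0, k=2))  # 4
```

The planner is invoked only once every k rounds, which is the property the experiments later exploit when reporting amortized cost per turn.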

A TrailMax pair is efficiently computed by simultaneously

expanding vertices around the robber’s and cop’s position in

a Dijkstra-like fashion. Two priority queues are maintained,

one for the cop and one for the robber. All nodes of a given

cost for the robber are expanded first, because the robber

moves immediately after computing a policy. Node expan-

sions for the robber are checked against the cop’s expanded

nodes to test whether the cop could have already reached that

point and captured the robber. If this is the case, the node is

discarded. Otherwise, the vertex is declared as robber cover

and expanded normally. When taking a node from the queue

for the cop, it is always expanded normally.

A visualization is depicted in Figure 4. The grey area indicates the vertices that are declared robber cover but are not expanded anymore since the expansion around the cop’s position captured them in a previous turn. Computation ends when all nodes declared as robber cover have been expanded by the cop as well. The last node that is explored by the cop is the goal node the robber will run to. Path generation can be easily done by maintaining pointers to parents when expanding nodes.

Figure 4: Visualization of TrailMax’s computation. The grey area is the nodes that have been reached by the robber first, declared as robber cover but will not be expanded anymore since they were captured by the cop in a previous turn.

The above computation finds one goal vertex and a shortest path to it. The path then has to be extended by moves that make the robber remain on the goal vertex until capture. It is not hard to show that this extended shortest path is indeed a solution to (1). Note that there might be many possible goal vertices the robber could run to and many different paths to get to them that fulfill (1). Finding all such vertices is possible by remembering all robber nodes that have not been caught before the cop’s last turn of expansion. This could potentially be used to take advantage of a suboptimal cop, although we do not study this issue here.
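The simultaneous expansion described above can be sketched for the unit-cost case, where the two priority queues degenerate into level-by-level frontiers. This is an illustrative reading of the description rather than the authors' implementation: the dictionary graph representation, the function name `trailmax`, the `cop_speed` parameter and the returned round count are assumptions made for the example, and distinct start positions on a connected graph are assumed.

```python
def trailmax(graph, robber, cop, cop_speed=1):
    """Return (robber_path, rounds) on a unit-cost graph, robber moving first.
    rounds counts the cop's turns until the goal vertex is reached."""
    cop_reached = {cop}        # vertices the cop could already stand on
    cop_frontier = [cop]
    parent = {robber: None}    # robber cover, with back-pointers
    robber_frontier = [robber]
    uncaught = {robber}        # robber cover the cop has not reached yet
    goal, rounds = robber, 0
    while uncaught and cop_frontier:
        # Robber's turn: expand one level, skipping captured nodes.
        next_level = []
        for v in robber_frontier:
            if v in cop_reached:
                continue       # captured in a previous turn, not expanded
            for w in graph[v]:
                if w not in parent and w not in cop_reached:
                    parent[w] = v
                    uncaught.add(w)
                    next_level.append(w)
        robber_frontier = next_level
        # Cop's turn: always expanded normally, cop_speed levels per round.
        rounds += 1
        for _ in range(cop_speed):
            next_level = []
            for v in cop_frontier:
                for w in graph[v]:
                    if w not in cop_reached:
                        cop_reached.add(w)
                        next_level.append(w)
                        if w in uncaught:
                            uncaught.discard(w)
                            goal = w  # last robber-cover node the cop reaches
            cop_frontier = next_level
    path = []
    while goal is not None:    # follow parent pointers back to the start
        path.append(goal)
        goal = parent[goal]
    return path[::-1], rounds

# Hypothetical path graph 0 - 1 - 2 - 3 - 4
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(trailmax(line, robber=2, cop=0))  # ([2, 3, 4], 4)
```

The returned path is the shortest path to the goal vertex; the waiting moves on the goal are left implicit. Collecting every vertex removed from `uncaught` during the final cop round, instead of only the last one, would give the alternative goal vertices mentioned in the text.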

Within computer game maps, edge costs are often approx-

imated to enable faster computation. Under the assumption

that path costs can only differ by a fixed number of values, i.e.

buckets can be used within the priority queue and queue ac-

cess takes constant time, the above algorithm runs in time lin-

ear in the size of the graph. Although TrailMax already scales

well to large maps (cf. Section 5) our goal is to make com-

putation time as independent of the size of the input graph as

possible. Inspired by DAM we use abstraction to achieve this

goal. Starting at an intermediate level of abstraction of the

hierarchy relative to the cop and robber positions, TrailMax

is computed. If the solution length, computed by (1), does not exceed a certain value q, then computation proceeds to the next lower level. If it does, the computed abstract path is refined to a ground level path using PRA*’s refinement, i.e. progressively computing a path on the next lower level that only goes through nodes whose parents are either on or adjacent to the abstract path. In the following, this algorithm is called Dynamic Abstract TrailMax with threshold q and number of steps k for which the solution is followed, hence DATrailMax(q,k).
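The level-selection loop of DATrailMax can be sketched as follows. The solver is injected as a function so that the example stays self-contained; `solve_at_level`, the convention that level 0 is the ground level, and the stub solution lengths are all assumptions made for illustration, and PRA*'s path refinement is deliberately left out.

```python
def da_trailmax_plan(start_level, q, solve_at_level):
    """Descend the abstraction hierarchy until the TrailMax solution is
    longer than q, or the ground level (level 0) is reached.

    solve_at_level(level) must return (path, solution_length)."""
    level = start_level
    while True:
        path, length = solve_at_level(level)
        if length > q or level == 0:
            return level, path  # this path would then be refined (omitted)
        level -= 1

# Hypothetical solution lengths per abstraction level.
lengths = {3: 2, 2: 6, 1: 12, 0: 40}
stub_solver = lambda level: (["path at level %d" % level], lengths[level])
print(da_trailmax_plan(3, 10, stub_solver))  # (1, ['path at level 1'])
```

With q = 10 the stub hierarchy is abandoned at levels 3 and 2, where the robber does not survive long enough, and the level 1 solution is accepted.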

5 Experiments

To evaluate our algorithms we compare to the algorithms described in Section 3 and measure the quality and required computation time in terms of node expansions. We set d = 2, i.e. the cop can take two turns before the robber gets one and can thus move to any location within a radius of 2 around his current location. First experiments show that greater cop speeds yield the same trends. However, since capture occurs faster, the game becomes easier and less interesting for the robber.

Figure 5: One of the maps used in Baldur’s Gate that the experiments were conducted on. The black parts are obstacles, white is traversable.

To generate meaningful statistics we use 20 maps from the

commercial game Baldur’s Gate as a testbed. The smallest of

these maps has 2,638 vertices, the largest 22,216. A plot of

a sample map can be found in Figure 5. Furthermore, 1000

initial positions for each map are generated randomly. We

choose the selection at random because we want to explore

the performances of the algorithms for all scenarios since in

a video game, both agents could potentially be spawned any-

where in the map.

We choose octile connections for the map representation

and subsequent levels of abstraction are generated using

Clique Abstraction [Sturtevant and Buro, 2005]. To enable

effective transposition table lookups in minimax and DAM

we set all edge costs to one in all levels of abstraction. Thus,

the distance heuristic between two positions (on an abstrac-

tion or ground level) becomes the maximum norm of these

positions. Furthermore, equidistant edge costs mean we are

optimizing the number of turns both players take rather than

the distance they travel. All the tested algorithms can be used

for nonequidistant edge costs, only minimax’s and DAM’s

performance is expected to be lower.
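With unit edge costs on an octile grid, the distance heuristic is the maximum norm of the coordinate difference, since one diagonal step changes both coordinates at once. A minimal sketch, assuming positions are (x, y) integer tuples and using the illustrative name `max_norm_distance`:

```python
def max_norm_distance(a, b):
    """Number of unit-cost octile moves between grid positions a and b
    when no obstacles are in the way (maximum norm of the difference)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

print(max_norm_distance((0, 0), (3, -5)))  # 5
```

Three diagonal steps followed by two vertical steps reach (3, -5) from the origin, which matches the value 5.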

Using an improved version of the algorithm in [Hahn and

MacGillivray, 2006], the entire joint state space is solved first,

i.e. we compute the values of an optimal game for each tuple

of positions of the robber and cop. This is done in an offline

computation and is used to generate optimal move policies

for the cop as well as to know the optimal value of the game.

Generation of these offline solutions took up to 2.5 hours per

map.

We study the following target algorithms:

Cover. The target performs hill climbing using the Cover heuristic (cf. Section 3). The heuristic has to be computed in every step and for every possible move.

Greedy. The target performs hill climbing using the distance heuristic. This is extremely fast since distance evaluation is very simple.

Minimax. The target runs minimax with α-β pruning, transposition tables and distance heuristic as evaluation function. We experimented with depths from 1 to 11.

DAM. The target runs dynamic abstract minimax with α-β pruning, transposition tables and distance heuristic as evaluation function (cf. Section 3). We experimented with depths from 1 to 11. The depth is used for computation on every level of abstraction.

RandomBeacons(1-20). The target randomly distributes 40 beacons on the map. It then selects the beacon that is heuristically furthest away from the cop’s position and computes a path to this location. The path is followed k steps before computing a new path, hence RandomBeacons(k). We tested RandomBeacons(k) for k = 1,...,20.

TrailMax(1-20). We tested TrailMax(k) for k = 1,...,20.

DATrailMax(10,1-20). We tested DATrailMax(10,k) for k = 1,...,20. q = 10 was chosen by hand. The question whether there is a better setting remains for future investigation.

algorithm        optim.    nE/c      nT/c     nE/t    nT/t
Cover            61.9%     4.687   156.831    -       -
RBeacons(1)      64.3%     0.158     1.065    -       -
RBeacons(5)      65.9%     0.159     1.070    0.037   0.248
RBeacons(10)     67.4%     0.160     1.075    0.022   0.148
RBeacons(15)     68.6%     0.161     1.083    0.017   0.117
RBeacons(20)     69.5%     0.162     1.091    0.015   0.102
Greedy           76.0%     0.0002    0.001    -       -
Minimax(5)       78.7%     0.031     0.216    -       -
Minimax(7)       79.2%     0.146     1.027    -       -
Minimax(9)       79.8%     0.499     3.546    -       -
Minimax(11)      80.3%     1.354     9.709    -       -
DAM(5)           88.8%     0.039     0.238    -       -
DAM(7)           88.4%     0.123     0.752    -       -
DAM(9)           87.8%     0.323     1.985    -       -
DAM(11)          87.1%     0.729     4.476    -       -
TrailMax(1)      98.3%     0.502    16.682    -       -
TrailMax(5)      98.0%     0.520    17.301    0.108   3.598
TrailMax(10)     97.7%     0.543    18.060    0.059   1.970
TrailMax(15)     97.5%     0.565    18.769    0.043   1.433
TrailMax(20)     97.3%     0.585    19.436    0.035   1.169
DATrailMax(1)    97.0%     0.101     2.283    -       -
DATrailMax(5)    97.1%     0.104     2.342    0.023   0.515
DATrailMax(10)   97.0%     0.110     2.487    0.014   0.311
DATrailMax(15)   96.8%     0.107     2.395    0.011   0.251
DATrailMax(20)   96.7%     0.106     2.359    0.010   0.225

Table 1: Experimental results.
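The beacon selection step of RandomBeacons, as described in the list above, can be sketched in a few lines. The sketch assumes grid positions as (x, y) tuples and the maximum norm as the distance heuristic; the function names, the `rng` parameter and the tiny corridor map are illustrative assumptions, and the subsequent path computation is omitted.

```python
import random

def max_norm(a, b):
    """Distance heuristic for unit-cost octile grids."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def pick_beacon(vertices, cop, num_beacons=40, rng=None):
    """Scatter beacons uniformly at random over the traversable vertices and
    return the one that is heuristically furthest from the cop."""
    rng = rng or random.Random()
    beacons = rng.sample(vertices, min(num_beacons, len(vertices)))
    return max(beacons, key=lambda b: max_norm(b, cop))

# Hypothetical corridor of ten cells; with only ten vertices every cell
# receives a beacon, so the far end is always chosen.
corridor = [(x, 0) for x in range(10)]
print(pick_beacon(corridor, cop=(0, 0)))  # (9, 0)
```

Because the choice depends only on the heuristic distance from the cop, a cornered robber may be sent towards a beacon that lies behind the cop.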

To evaluate performance the game is simulated for each

initial position on each map. Within these simulations, the

target algorithm is called whenever a new move has to be gen-

erated. TrailMax, DATrailMax and RandomBeacons are only

called when a new path has to be computed, thus the num-

ber of turns and algorithm calls differ in this case. All other

algorithms are called once per turn and therefore these two

numbers are equal. In fact, TrailMax, DATrailMax and RandomBeacons cannot spread their computation over the turns in which the previously computed path is followed, because the future position of the cop is unknown. Nonetheless, when used in computer games, these algorithms only require computation once every k steps and therefore leave the frames during path execution available to other tasks. Therefore, we can also analyze the computation time per turn for these three methods.

Figure 6: Optimality versus node expansions per turn in one game simulation. Averaged over the number of games played in the experiments. Left bottom corner is best, right upper corner is worst. [Plot of Optimality (0.6 to 1.0) against Nodes Expanded per Turn (nE/t, logarithmic scale, 10^-4 to 10^1), with points labelled Minimax(11), TM(1), TM(20), DAM(0.5, 11), Beacons(1), Beacons(20), Greedy/Minimax(1), Cover, DATM(10, 20), DAM(0.5, 1) and DATM(10, 1).]

We are interested in the following performance measures:

• the expected survival time of the target measured in per-

centage of the optimal survival time (suboptimality),

• the number of node expansions per call to the algorithm

within one game simulation (nE/c) and

• for TrailMax, DATrailMax and RandomBeacons the

amortized number of nodes expanded per turn within

one game simulation (nE/t).

Similar measures are presented for nodes touched per call (nT/c) and per turn (nT/t). To account for variable sized maps, the numbers of nodes expanded and touched are further normalized and measured as a percentage of the map size. Nodes expanded counts how many times the neighbors of a node were generated, while nodes touched measures how many times a node is visited in memory.
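For a single simulated game, the measures above reduce to simple ratios. The following sketch shows one plausible way to compute them; the function name, the argument names and the example numbers are hypothetical, and the averaging over maps and start positions used in the experiments is not shown.

```python
def game_measures(survival, optimal_survival, expansions, calls, turns, map_size):
    """Per-game measures: optimality in percent, and node expansions per
    call and per turn, both as a percentage of the map size."""
    return {
        "optimality": 100.0 * survival / optimal_survival,
        "nE/c": 100.0 * expansions / calls / map_size,
        "nE/t": 100.0 * expansions / turns / map_size,
    }

# Hypothetical game: 20 turns played, planner called every 5 turns.
m = game_measures(survival=49, optimal_survival=50,
                  expansions=1000, calls=4, turns=20, map_size=5000)
print(m)  # {'optimality': 98.0, 'nE/c': 5.0, 'nE/t': 1.0}
```

For an algorithm that is called every turn, `calls` equals `turns` and the two expansion measures coincide.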

The results are in Table 1 and are plotted in Figure 6. The

x-axis is reversed so the best algorithms are near the origin,

with high optimality and few expansions per move. Notice

further the logarithmic scale on the number of node expansions. A Pareto-optimal boundary is formed by Greedy and

the TrailMax algorithms, meaning that all other algorithms

have both worse optimality and more node expansions per

move, on average.
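The Pareto-optimal boundary can be checked mechanically. The sketch below uses a handful of rows as read from Table 1 (optimality in percent, node expansions per turn as a percentage of map size; for algorithms called every turn the per-call value is used). The selection of rows and the function name `pareto_front` are choices made for this illustration.

```python
def pareto_front(results):
    """Return the names that no other entry dominates, where dominating means
    at least as high optimality and at most as many expansions, with one of
    the two strictly better."""
    front = []
    for name, (opt, cost) in results.items():
        dominated = any(
            o >= opt and c <= cost and (o > opt or c < cost)
            for other, (o, c) in results.items() if other != name
        )
        if not dominated:
            front.append(name)
    return front

# (optimality %, nodes expanded per turn as % of map size)
rows = {
    "Cover": (61.9, 4.687),
    "RBeacons(20)": (69.5, 0.015),
    "Greedy": (76.0, 0.0002),
    "Minimax(11)": (80.3, 1.354),
    "DAM(5)": (88.8, 0.039),
    "TrailMax(1)": (98.3, 0.502),
    "TrailMax(20)": (97.3, 0.035),
    "DATrailMax(1)": (97.0, 0.101),
    "DATrailMax(20)": (96.7, 0.010),
}
print(pareto_front(rows))
# ['Greedy', 'TrailMax(1)', 'TrailMax(20)', 'DATrailMax(20)']
```

On this subset only Greedy and members of the TrailMax family remain undominated; DAM(5), for example, is dominated by TrailMax(20), which is both closer to optimal and cheaper per turn.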

Cover clearly performs the worst. Having to compute the

heuristic in every step and for every possible move, its com-

putation time is beyond any computer game requirement. Al-

though solutions on abstractions can be computed in less

time, Cover is also the worst algorithm with respect to op-

timality and optimality decreases when using abstract solu-

tions.

Considering quality, RandomBeacons is the second worst

algorithm. This is due to the fact that it does not play very

well in the endgame, i.e. when the target is cornered and is

about to be captured. When distributing the beacons, many

of them lie in parts of the map that are heuristically far away

from the cop. Thus, the robber runs towards these positions.

Since he is cornered, this results in running into the cop.
