Evaluating Strategies for Running from the Cops
Carsten Moldenhauer and Nathan R. Sturtevant
Department of Computer Science
University of Alberta
Edmonton, AB, Canada T6G 2E8
Moving target search (MTS) or the game of cops
and robbers has a broad field of application reach-
ing from law enforcement to computer games.
In recent years, research has focused on
computing move policies for one or multiple pur-
suers (cops). The present work argues for extend-
ing this perspective to both sides, thus developing
algorithms for the target (robber). We investigate
the game with perfect information for both play-
ers and propose two new methods, named TrailMax
and Dynamic Abstract TrailMax, to compute move
policies for the target. Experiments are conducted
by simulating games on 20 maps of the commercial
computer game Baldur’s Gate and measuring sur-
vival time and computational complexity. We test
seven algorithms: Cover, Dynamic Abstract Mini-
max, minimax, hill climbing with distance heuris-
tic, a random beacon algorithm, TrailMax and DA-
TrailMax. Analysis shows that our methods outper-
form all the other algorithms in quality, achieving
up to 98% optimality, while meeting modern com-
puter game computation time constraints.
Moving target search (MTS), or the game of cops and rob-
bers, has many applications reaching from law enforcement
to video games. The game was introduced into the arti-
ficial intelligence literature by [Ishida and Korf, 1991] as
a new variant of the classical search problem. Following
this study, the question of how to catch a moving prey effec-
tively has been studied extensively [Ishida and Korf, 1995;
Koenig et al., 2007; Isaza et al., 2008].
In today’s computer games, the players control robbers be-
ing chased by computer-generated police agents. The reverse
game, i.e. the player controlling a cop and having to catch
computer-controlled robbers, is rarely realizable. This is
because the focus in MTS research has always been on de-
veloping strong pursuit strategies; very little is known about
how to compute strategies for the target.
This paper presents a systematic study of move policies for
the robber, thus enabling better target modelling and widen-
ing the current focus in MTS.
The game of cops and robber has also been studied in
the mathematical literature (see [Hahn, 2007] for a survey).
Here, cops and robber alternately choose their initial posi-
tions at the beginning of the game and then play as in MTS.
The search time of a graph, i.e. the time needed by optimally
playing cops to catch the robber, is therefore a constant. Be-
sides bounds for the one cop and one robber problem, little is
known about this graph property. However, a first polynomial-
time algorithm that computes the search time was developed
in [Hahn and MacGillivray, 2006].
Given this algorithm it is possible to determine optimal poli-
cies for both players.
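For intuition, optimal values for the one-cop, one-robber case can be computed by a standard backward-induction (value-iteration) sweep over joint states. The sketch below is illustrative only and is not the algorithm of [Hahn and MacGillivray, 2006] itself; the adjacency-dict encoding of the graph is our own assumption.

```python
from itertools import product

def capture_times(adj):
    """Value-iteration sketch of optimal play for one cop and one
    robber on an undirected graph (cop moves first each turn).
    adj: dict mapping each vertex to a list of its neighbours.
    Returns T[(c, r)] = number of cop moves an optimally playing cop
    needs with the cop at c and the robber at r (cop to move);
    float('inf') means the robber can evade forever."""
    moves = {v: adj[v] + [v] for v in adj}        # move or stay put
    INF = float('inf')
    T = {(c, r): (0 if c == r else INF) for c, r in product(adj, adj)}
    changed = True
    while changed:                                 # monotone decreasing sweep
        changed = False
        for c, r in product(adj, adj):
            if c == r:
                continue
            best = INF
            for c2 in moves[c]:                    # cop minimises ...
                if c2 == r:
                    reply = 0                      # cop steps onto the robber
                else:                              # ... robber maximises
                    reply = max(T[(c2, r2)] for r2 in moves[r] if r2 != c2)
                best = min(best, reply)
            val = best + 1 if best < INF else INF
            if val < T[(c, r)]:
                T[(c, r)] = val
                changed = True
    return T
```

On cop-win graphs every state receives a finite value (e.g. any path), while on a 4-cycle the opposite-corner states remain infinite because the robber can keep its distance forever.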
Computer games require tight bounds on resource usage,
especially computation time. Therefore, computing optimal
policies, even though possible in polynomial time with the
above algorithm, is not practical. This gives rise to the ques-
tion of how to quickly compute approximations that yield
near-optimal move policies. In the following, we will in-
troduce a new algorithm called TrailMax and its variant Dy-
namic Abstract TrailMax to respond to this question.
As optimality has only been studied recently, previous
work in MTS has been concerned with approximate solu-
tions and has not, whether for the pursuer or the target, com-
pared methods against optimal policies. Therefore, this paper
is the first to conduct a study of various target algorithms with
respect to their achieved suboptimality.
A precise definition of the cops and robber game consid-
ered in this work will be given in Section 2. We review exist-
ing algorithms, including Cover and Dynamic Abstract Min-
imax, and outline their strengths and weaknesses in Section
3. The new methods, TrailMax and Dynamic Abstract Trail-
Max, are introduced in Section 4. Evaluations of experiments
and extensive comparisons of various target algorithms when
playing against an optimal cop can be found in Section 5. We
wrap up with conclusions in Section 6.
The game of cops and robber is played with n cops and one
robber. Cops and robber occupy vertices in a finite undirected
connected graph G and may, in each turn, move to an adjacent
vertex or remain at their current location. Turns are taken
alternately, beginning with the cops (in order from first to
last), followed by the robber. The game is played with perfect
information, i.e. the graph and the locations of all agents are
known to both players.
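A minimal simulation loop for this turn order might look as follows; the policy callables are placeholders supplied by the caller (assumed to return legal moves) and are not part of the paper's method:

```python
def play(adj, cop_policy, robber_policy, cops, robber, max_turns=1000):
    """Simulate one game of n cops vs. one robber on an undirected
    graph given as an adjacency dict. Each turn, the cops move first
    (in order from first to last), then the robber. Returns the turn
    on which the robber is caught, or None if it survives the horizon.
    cop_policy(adj, cops, robber, i) -> new vertex for cop i;
    robber_policy(adj, cops, robber) -> new vertex for the robber."""
    cops = list(cops)
    for turn in range(1, max_turns + 1):
        for i in range(len(cops)):                 # cops move first, in order
            cops[i] = cop_policy(adj, cops, robber, i)
            if cops[i] == robber:
                return turn                        # captured on a cop's move
        robber = robber_policy(adj, cops, robber)  # then the robber moves
        if robber in cops:
            return turn                            # robber moved into a cop
    return None                                    # robber survived
```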
Figure 7: Optimality versus node expansions per move in one
game simulation. Plotted for all games played in the experi-
ments. Left bottom corner is best, right upper corner is worst.
As expected, minimax becomes more optimal when the
depth is increased. However, its computation time increases
exponentially. At depth seven or greater, it already expands
more nodes per call than DATrailMax.
Abstract levels have cycles in them and minimax can
find how to exploit such cycles even with shallow searches.
Hence, DAM’s computed strategies on abstract levels are
similar for different search depths. Therefore, DAM does not
significantly increase in optimality when its depth parameter
is increased.
Surprisingly, Greedy, i.e. hill climbing with a distance
heuristic, performs extremely well. Because this algorithm
requires almost no computation time, we can conclude that
Greedy is the method of choice when optimality is of minor
importance.
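As a sketch, the Greedy baseline can be realized as hill climbing on a BFS distance heuristic: the robber moves to the neighbouring vertex (or stays) that maximizes the distance to the nearest cop. The paper does not specify tie-breaking, so the tie-breaking of `max` below is an assumption:

```python
from collections import deque

def bfs_dist(adj, src):
    """Shortest-path distances (unit edge costs) from src to every
    reachable vertex of the adjacency-dict graph adj."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def greedy_robber_move(adj, cops, robber):
    """Hill climbing with a distance heuristic: pick the option
    (neighbour or current vertex) whose distance to the nearest cop
    is largest."""
    dists = [bfs_dist(adj, c) for c in cops]
    def safety(v):
        # distance from v to the closest cop (inf if unreachable)
        return min(d.get(v, float('inf')) for d in dists)
    options = adj[robber] + [robber]
    return max(options, key=safety)
```

Its per-move cost is one BFS per cop, which matches the observation above that this baseline needs almost no computation time compared to search-based policies.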
TrailMax and DATrailMax perform best with respect to op-
timality. Although DATrailMax uses TrailMax on abstract
levels, it experiences only a small reduction in optimality. In
contrast, its computation time decreases drastically (about
5× fewer node expansions per call). Notice that, although
the computation time per call is fairly high, the amortized
time per move is small and even comparable to RandomBea-
cons. When conducting experiments on relatively small maps
we found that DATrailMax expands and touches a higher per-
centage of nodes. This is because the abstraction is not as
useful and therefore the algorithm degenerates into TrailMax.
While Figure 6 shows the averaged points of all game sim-
ulations, the actual results are clouds of points where each
point represents the performance in one game. We compare
this underlying data for the two best algorithms, TrailMax
and DATrailMax, in Figure 7. The small dark points con-
tain data for DATrailMax, while the larger, light circles are
the data points for TrailMax. The x-axis is reversed and the
y-axis is logarithmic. DATrailMax is clearly faster. Trail-
Max has a slight advantage in the number of times it makes
optimal moves, resulting in slightly better optimality. No-
tice that although there are games where both algorithms per-
form poorly with respect to optimality, the majority are above
90%. Furthermore, node expansions for both algorithms are
uniformly bounded at around 7% of the size of the map.
Despite research throughout the last two decades, the focus
in moving target search has been on computing move policies
for the pursuers. In the past, very little was known about how
to compute strategies for the target. Because of computer
games’ tight computation-time requirements, optimal algo-
rithms are not a feasible approach. Therefore, fast approxi-
mations of near-
optimal behavior for the target are needed.
The present work conducts a study on such approxima-
tions and evaluates their suboptimality. We find that our new
algorithms, TrailMax and Dynamic Abstract TrailMax, pro-
vide the best performance, with near-optimal policies. Sur-
prisingly, we discover that, in our testbed, a greedy strategy
is better than most of the previous algorithms. Thus, the
present work redefines the state of the art in perfect-informa-
tion moving target search.
Future work will address how computation time can be
further reduced. The performance of the greedy algorithm,
which is the fastest approach, suggests that a greedy algo-
rithm with a better heuristic may perform well. Finally, al-
though we have focused on strategies for the robbers, similar
methodology can also be used to evaluate strategies for the
cops, and a variant of TrailMax could be used to compute
policies for the cops as well.
This research was supported by Canada’s NSERC, Alberta’s
iCORE and the German Academic Exchange Service.
[Bulitko and Sturtevant, 2006] V. Bulitko and N. Sturtevant.
State abstraction for real-time moving target pursuit: A
pilot study. AAAI Workshop on Learning for Search, 2006.
[Hahn and MacGillivray, 2006] G. Hahn and G. MacGillivray.
A note on k-cop, l-robber games on graphs.
Discrete Mathematics, 306(19-20):2492–2497, 2006.
[Hahn, 2007] G. Hahn. Cops, robbers and graphs. Tatra
Mountains Mathematical Publications, 36(2):163–176, 2007.
[Isaza et al., 2008] A. Isaza, J. Lu, V. Bulitko, and
R. Greiner. A cover-based approach to multi-agent moving
target pursuit. AIIDE, 2008.
[Ishida and Korf, 1991] T. Ishida and R. E. Korf. Moving
target search. IJCAI, pages 204–210, 1991.
[Ishida and Korf, 1995] T. Ishida and R. E. Korf. Moving-
target search: A real-time search for changing goals. IEEE
Transactions on Pattern Analysis and Machine Intelli-
gence, 17(6):609–619, 1995.
[Koenig et al., 2007] S. Koenig, M. Likhachev, and X. Sun.
Speeding up moving-target search. AAMAS, 2007.
[Moldenhauer and Sturtevant, 2009] C. Moldenhauer and
N. Sturtevant. Optimal solutions for moving target search
(extended abstract). AAMAS, 2009.
[Sturtevant and Buro, 2005] N. Sturtevant and M. Buro. Par-
tial pathfinding using map abstraction and refinement.
AAAI, pages 1392–1397, 2005.