Analyzing Complex Strategic Interactions in Multi-Agent Systems
William E. Walsh Rajarshi Das Gerald Tesauro Jeffrey O. Kephart
IBM T.J. Watson Research Center
19 Skyline Drive, Hawthorne, NY 10532, USA
{wwalsh1, rajarshi, gtesauro, kephart}@us.ibm.com
Abstract
We develop a model for analyzing complex games with repeated interactions, for which a full game-theoretic analysis is intractable. Our approach treats exogenously specified, heuristic strategies, rather than the atomic actions, as primitive, and computes a heuristic-payoff table specifying the expected payoffs of the joint heuristic strategy space. We analyze a particular game based on the continuous double auction, and compute Nash equilibria of previously published heuristic strategies. To determine the most plausible equilibria, we study the replicator dynamics of a large population playing the strategies. To account for errors in estimation of payoffs or improvements in strategies, we analyze the dynamics and equilibria based on perturbed payoffs.
Introduction
Understanding complex strategic interactions in multi-agent
systems is assuming an ever-greater importance. In the
realm of agent-mediated electronic commerce, for exam-
ple, authors have recently discussed scenarios in which self-
interested software agents execute various dynamic pricing
strategies, including posted pricing, bilateral negotiation,
and bidding. Understanding the interactions among various
strategies can be extremely valuable, both to designers of
markets (who wish to ensure economic efficiency and sta-
bility) and to designers of individual agents (who wish to
find strategies that maximize profits). More generally, by demystifying strategic interactions among agents, we can improve our ability to predict (and therefore design) the overall behavior of multi-agent systems—thus reducing one of the canonical pitfalls of agent-oriented programming (Jennings & Wooldridge 2002).
In principle, the (Bayes) Nash equilibrium is an appropri-
ate concept for understanding and characterizing the strate-
gic behavior of systems of self-interested agents. In prac-
tice, however, it is infeasible to compute Nash equilibria
for all but the very simplest interactions. For some types
of repeated interactions, such as continuous double auc-
tions (Rust, Miller, & Palmer 1993) and simultaneous as-
cending auctions (Milgrom 2000), even formulating the in-
formation structure of the extensive-form game, much less
computing the equilibrium, remains an unsolved problem.
Copyright © 2002, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
Given this state of affairs, it is typical to endow agents
with heuristic strategies, comprising hand-crafted or learned
decision rules on the underlying primitive actions as a
function of the information available to an agent. Some
strategies are justified on the basis of desirable properties
that can be proven in simplified or special-case models,
while others are based on a combination of economic intu-
ition and engineering experience (Greenwald & Stone 2001;
Tesauro & Das 2001).
In this paper, we propose a methodology for analyzing
complex strategic interactions based on high-level, heuristic
strategies. The core analytical components of our methodol-
ogy are Nash equilibrium of the heuristic strategies, dynam-
ics of equilibrium convergence, and perturbation analysis.
Equilibrium and the dynamics of equilibrium conver-
gence have been widely studied, and our adoption of these
tools is by no means unique. Yet, these approaches have
not been widely and fully applied to the analysis of heuris-
tic strategies. A typical approach to evaluating the strate-
gies has been to compare various mixtures of strategies
in a structured or evolutionary tournament (Axelrod 1997;
Rust, Miller, & Palmer 1993; Wellman et al. 2001), often
with the goal of establishing which strategy is the “best”.
Sometimes, the answer is well-defined, as in the Year 2000
Trading Agent Competition (TAC-00), in which the top sev-
eral strategies were quite similar, and clearly superior to all
other strategies (Greenwald & Stone 2001). In other cases,
including recent studies of strategies in continuous double
auctions (Tesauro & Das 2001) and in TAC-01, there does
not appear to be any one dominant strategy.
The question of which strategy is “best” is often not the
most appropriate, given that a mix of strategies may consti-
tute an equilibrium. The tournament approach itself is often
unsatisfactory because it cannot easily provide a complete
understanding of multi-agent strategic interactions, since the
tournament play is just one trajectory through an essentially
infinite space of possible interactions. One can never be cer-
tain that all possible modes of collective behavior have been
explored.
Our approach is a more principled and complete method
for analyzing the interactions among heterogeneous heuris-
tic strategies. Our methodology, described more fully in
the following sections, entails creating a heuristic-payoff
table—an analog of the usual payoff table, except that the
entries describe expected payoffs for high-level, heuristic
strategies rather than primitive actions. The heuristic-payoff
table is then used as the basis for several forms of analysis.
The next four sections of our paper, “Modeling Ap-
proach”, “Equilibrium Computation”, “Dynamic Analysis”,
and “Perturbation of Payoffs” detail the components of our
approach. After presenting the methodology, we describe its
application to two complex multi-agent games: automated
dynamic pricing (ADP) in a competing-seller scenario, and
automated bidding in the continuousdouble auction (CDA).
We conclude with a discussion of the methodology.
Modeling Approach
We start with a game that may include complex, repeated
interactions between A agents. The underlying rules of the
game are well-specified and common knowledge. The rules
specify particular actions that agents may take as a func-
tion of the state of the game. Each of the A agents has a
choice of the same S exogenously specified, heuristic strate-
gies. By strategy, we mean a policy that governs the choice
of individual actions, typically expressed as a deterministic
or stochastic mapping from the information available to the
agent to an action. For example in the CDA, typical actions
are of the form "bid b at time t", while the bidding strategies
can be complex functions, expressed in hundreds or thou-
sands of lines of code, that specify what bids are placed over
the course of trading. We say the strategies are “heuristic”
in that they are not generally the solution of (Bayes) Nash
equilibrium analysis.
A key step in our methodology is to compute a heuristic-
payoff table that specifies the expected payoff to each agent
as a function of the strategies played by all the agents. The
underlying payoffs may depend on varying types for the
agents, which may encompass, for instance, different util-
ity functions or different roles. To simplify analysis, we as-
sume that the types of agents are drawn independently from
the same distribution, as is common in the auction literature. To further improve tractability, we also assume that an agent chooses its strategy independently of its type. How-
ever, we do assume that the agent’s type and the distribution
of types are available to the strategy itself.
The heuristic-payoff table is an abstract representation of
the fundamental game in that we have reduced the model
of the game from a potentially very complex, multi-stage
game to a one-shot game in normal form, in which we treat
the choice of heuristic strategies, rather than the basic ac-
tions, as the level of decision making for strategic analysis.
We emphasize that we take the strategies as exogenous and
do not directly analyze their genesis nor their composition.
With this model, we can apply the standard game-theoretic
analysis just as we would with a normal-form game of sim-
ple actions, such as the prisoner’s dilemma.
The standard payoff table for a normal-form game requires $S^A$ entries, which can be extremely large, even when S and A are moderate. But we have restricted our analysis to symmetric games in which each agent has the same set of strategies and the same distribution of types (and hence payoffs). Hence, we can merely compute the payoff for each strategy as a function of the number of agents playing each strategy, without being concerned about the individual identities of those agents. This symmetry reduces the size of the payoff table enormously. The number of entries in the table is the number of unique ways into which a set of A identical agents may be partitioned among S strategies. This quantity can be shown to be $\binom{A+S-1}{A}$, or roughly $A^{S-1}/(S-1)!$ when $A \gg S$. In this case, changing the exponent from A to $S-1$ results in a huge reduction in the size of the payoff table. For example, in applications presented in this paper, $A = 20$, $S = 3$, and thus the symmetric payoff table contains just 231 entries—far less than the approximately $3.48 \times 10^9$ entries contained in the asymmetric payoff table.
Payoffs may be computed analytically for sufficiently
simple games and strategies. However, in games of realistic
complexity, it is necessary to measure expected payoffs in
simulation. In games with non-deterministic aspects, it will
be necessary to run many independent simulations for each payoff table entry, to ensure that the measured payoffs are as accurate as possible.
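To make this bookkeeping concrete, the following sketch (ours, not the authors' code) assembles such a table by enumerating the symmetric strategy profiles and averaging repeated simulations; simulate_profile is a hypothetical stand-in for a real game simulator.

    import itertools
    import numpy as np

    A, S = 20, 3      # agents and heuristic strategies, as in this paper
    RUNS = 2500       # independent simulations averaged per table entry

    def simulate_profile(counts, rng):
        """Hypothetical stand-in for a game simulator: given counts[k]
        agents playing strategy k, return the payoff earned by each of
        the S strategies in one simulated game."""
        return rng.random(S)  # replace with a real simulator

    rng = np.random.default_rng(0)
    table = {}
    for profile in itertools.combinations_with_replacement(range(S), A):
        counts = tuple(int(c) for c in np.bincount(profile, minlength=S))
        runs = [simulate_profile(counts, rng) for _ in range(RUNS)]
        table[counts] = np.mean(runs, axis=0)

    assert len(table) == 231  # binom(A+S-1, A) symmetric profiles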
With a payoff table in hand, one can use a variety of tech-
niques to gain insight into a system’s collective behavior.
We present three such techniques in this paper: (1) First,
we perform a static analysis, which entails computing Nash
equilibria of the payoff table. To simplify this computation,
we look only for symmetric Nash equilibria, although gen-
erally nothing precludes the existence of asymmetric equi-
libria. (2) Second, we model the dynamics of agents that pe-
riodically and asynchronously switch strategies to ones that
appear to be more successful. This helps us understand the
evolution of the mixture of strategies within the population,
providing insight into the conditions under which the various Nash equilibria may be realized. (3) Third, we suggest
techniques for understanding the strategies themselves at a
deeper level. Since our analysis is performed at the more
coarse-grained level of strategies rather than actions, it does
not directly provide insight into the behavior of the strate-
gies at the level of primitive actions. We would like to un-
derstand how to design good strategies, how the series of
primitive actions interact between strategies, how to improve
strategies, and the system-level effects of strategy improve-
ments. We have begun to address these challenges in two
ways. We analyze how the static and dynamic properties of
the system change when the values in the payoff table are
perturbed. Perturbation analysis suggests the potential ef-
fects of improving one of the strategies, as well as noise or
measurement errors in the payoffs. All of these techniques
will be elucidated in the following three sections.
Comments on Our Modeling Approach
Obtaining the payoff table is the key step in our methodol-
ogy. Given the payoff table, we need not concern ourselves
any more with the details of how it was generated; the rest
of the analyses that we perform all operate mechanically on
the table values, with no need to re-do the simulations that
generated them. Still, obtaining the payoff table at a high
degree of precision may be computationally non-trivial. A 20-agent, 3-strategy game requires the computation of 231
table entries. In our analysis of the CDA game, we averaged
the results of 2500 simulations for each payoff entry, requir-
ing about an afternoon of computation on seven 333MHz-
450MHz IBM RS6000 workstations to compute the entire
table. Computing the payoff table for our application was
feasible because the auctions and heuristic strategies require
fairly simple computations and we had sufficient computa-
tional resources available.
For some games (e.g., iterative combinatorial auctions)
and strategies (e.g., that use expensive optimization algo-
rithms), computing the payoff table may be exorbitantly ex-
pensive. Other factors, such as limited resources or timing
properties of the game, may limit our ability to fully com-
pute the payoff table. For instance, each TAC-01 game takes
exactly 15 minutes, and only two auction servers are pub-
licly available at present. Clearly, for our methodologyto be
more broadly feasible, the expense of computing the payoff
table must be addressed. Currently, we are exploring ways to
interleave Nash equilibrium computation with payoff com-
putation. We believe that this will admit methods to help
focus on the payoff entries where accuracy is most needed,
rather than performing a large number of simulations for all
entries.
Our methodology is based on heuristic strategies to ad-
dress the limitations in present game-theoretic techniques.
On the other hand, our assumptions of symmetry are made
strictly to reduce storage and computational cost. While it is
straightforward in principle to break the symmetry of type
distributions and strategy sets, the exponential growth in ta-
ble size and computation, both for determining the payoff
entries and for the equilibrium and dynamic analysis we dis-
cuss in subsequent sections, practically restricts the degree
to which we can break symmetry.
We acknowledge that, despite its expedience, the most
controversial aspect of our approach is likely to be the as-
sumption of a single set of heuristic strategies available to
all agents, which avoids the issue of how the strategies are
developed and become known to the agents. Nevertheless,
an observation of current practice suggests that a moder-
ately sized set of heuristic strategies can become available
to the population at large. Two automated bidding services
are openly available to users of eBay, and investing in a mu-
tual fund is an exact method of adopting the fund manager’s
strategy. Indeed, as automated negotiation becomes more
widespread and pervasive, we expect that the majority of
participants in the general public will adopt heuristic strate-
gies developed by others, rather than developing their own.
Admittedly, some participants with sufficient talent and re-
sources will invest in the development of highly specialized
strategies. But where there is an information flow, there is
the possibility for these strategies to spread. Many players
in TAC-01 used techniques publicly documented and shown
to be effective in TAC-00.
Another restriction we have made in our model is the as-
sumption that agents make their choices of strategy indepen-
dently of their own type. We should generally expect that
sophisticated agents would choose to condition their strat-
egy choice based on their type. (Although an individual agent will always do as well or better to condition on type, given the strategic behavior of the other agents, an equilibrium of strategies conditioned on type could actually be worse for all agents than a non-type-conditioned equilibrium.) While we could easily model type conditioning by discretizing types, the storage and computational expense quickly explodes, as with computing asymmetric equilibria.
Equilibrium Computation
At the start of the game, each of the A agents chooses to play
one of the S available pure strategies. The payoff to agent i
is a real-valued function u of the strategy played by i and
the strategies played by all other agents. As discussed in the modeling section, the payoff is the expected reward obtained when the agents play a particular combination of strategies. Because we assume symmetric strategy sets and payoffs, the payoff to an agent can be represented as the payoff to each strategy as a function of the number of agents playing each strategy.
Agent i may choose its strategies randomly according to a mixed strategy $\hat{p}_i = (\hat{p}_{i,1}, \ldots, \hat{p}_{i,S})$. Here, $\hat{p}_{i,j}$ indicates the probability that agent i plays strategy j, with the constraints that $\hat{p}_{i,j} \in [0, 1]$ and $\sum_{j=1}^{S} \hat{p}_{i,j} = 1$. The vector of all agents' mixed strategies is $\hat{p}$ and the vector of mixed strategies for all agents except i is $\hat{p}_{-i}$. We indicate by $\hat{p}_i = e_j$ the special case when agent i plays pure strategy j with probability one.

We denote by $u(e_j, \hat{p}_{-i})$ the expected payoff to an agent i for playing pure strategy j, given that all other agents play their mixed strategies $\hat{p}_{-i}$. The expected payoff to agent i of the mixed strategy is then $u(\hat{p}_i, \hat{p}_{-i}) = \sum_{j=1}^{S} u(e_j, \hat{p}_{-i})\,\hat{p}_{i,j}$.
In game theoretic analysis, it is generally assumed that rational agents would play mixed Nash equilibrium strategies, whereby no one agent can receive a higher payoff by unilaterally deviating to another strategy, given fixed opponents' strategies. Formally, probabilities $\hat{p}^*$ constitute a Nash equilibrium iff for all agents i and all $\hat{p}_i$, $u(\hat{p}^*_i, \hat{p}^*_{-i}) \ge u(\hat{p}_i, \hat{p}^*_{-i})$.
In the remainder of this paper, we restrict our attention to symmetric mixed strategy equilibria, whereby $\hat{p}_i = \hat{p}_k = \hat{p}$ for all agents i and k. We denote an arbitrary, symmetric, mixed strategy by p and the probability that a given agent plays pure strategy j by $p_j$. Nash equilibria of symmetric strategies always exist for symmetric games (Weibull 1995), and are not generally unique. We restrict our attention in this way for two reasons. First, when searching for symmetric Nash equilibria, we need only find S, rather than AS, probabilities. Second, absent a particular mechanism for breaking symmetry, it is reasonable to assume symmetry by default, as is often done in auction analysis with symmetric agent types. In particular, symmetry is consistent with the particular evolutionary game theory model we consider in the next section.
Finding Nash equilibria can be a computationally chal-
lenging problem, requiring solutions to complex, nonlin-
ear equations in the general case. The Nash equilibrium
conditions can be expressed in various equivalent formu-
lations, each suggesting different solution methods (McK-
elvey & McLennan 1996). Several solution methods are
implemented in Gambit (McKelvey, McLennan, & Turocy
2000), a freely available game solver. But because Gambit
is not able to exploit the symmetry of the games we study, it
requires the full normal form game table as input, severely
limiting the size of problemsthat can feasibly be represented
in the program.
In this work, we formulate Nash equilibrium as a minimum of a function on a polytope. Restricting ourselves to symmetric equilibria in a symmetric game, the problem is to minimize:

$$v(p) = \sum_{j=1}^{S} \left[ \max\left( u(e_j, p) - u(p, p),\, 0 \right) \right]^2 \qquad (1)$$

The mixed strategy $p^*$ is a Nash equilibrium iff it is a global minimum of v (McKelvey & McLennan 1996). Although not all minima of the function v may be global, we can validate that a minimum is global if its value is zero.
In this work we used amoeba (Press et al. 1992), a nonlinear optimizer, to find the zero-points of v in our application games. Amoeba searches an n-dimensional space using a simplex, a polyhedron with n+1 vertices. The function is evaluated at each vertex of the simplex, and the polyhedron attempts to move down the estimated gradient by a series of geometric transformations that continually strive to replace the worst-performing vertex. We repeatedly ran amoeba, restarting at random points on the mixed-strategy simplex, and stopping when it had found 30 previously-discovered equilibria in a row. Because S = 3 in our applications, we were able to plot v and verify that we had indeed found all equilibria. For games with A = 20, S = 3, and three equilibria, it took amoeba roughly ten minutes to terminate on a 450MHz IBM RS6000 machine.
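For readers without the Numerical Recipes routine, the same search can be sketched with SciPy's Nelder-Mead implementation, a descendant of the same downhill-simplex algorithm (our illustration; it assumes the expected_payoffs helper above):

    import numpy as np
    from scipy.optimize import minimize

    def v(x, table, A, S):
        """Equation (1), with the unconstrained search point mapped
        onto the unit simplex of mixed strategies."""
        p = np.abs(x) / np.abs(x).sum()
        u = expected_payoffs(p, table, A, S)
        return np.sum(np.maximum(u - p @ u, 0.0) ** 2)

    def find_symmetric_equilibria(table, A, S, tol=1e-10, seed=0):
        rng = np.random.default_rng(seed)
        equilibria, repeats = [], 0
        while repeats < 30:  # stop after 30 rediscoveries in a row
            result = minimize(v, rng.dirichlet(np.ones(S)),
                              args=(table, A, S), method="Nelder-Mead")
            p = np.abs(result.x) / np.abs(result.x).sum()
            if result.fun > tol:
                continue  # local, nonzero minimum: not an equilibrium
            if any(np.allclose(p, q, atol=1e-3) for q in equilibria):
                repeats += 1
            else:
                equilibria.append(p)
                repeats = 0
        return equilibria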
Dynamic Analysis
Nash equilibria provide a theoretically satisfying view of
ideal static properties of a multi-agent system. Yet often the
dynamic properties may be of equal or greater concern. In
actual systems, it may be unreasonable to assume that agents
have all correct, common knowledge necessary to compute
equilibria. Furthermore, even when agents have this com-
mon knowledge and the resources to compute equilibria, we
still want to address the question of which equilibrium is
chosen and how agents (implicitly) coordinate to reach it.
Many have studied models of adaptation and learning to
simultaneously address these issues. Common learning ap-
proaches adjust strategy mixes gradually to myopically im-
prove payoffs in repeated play of a game. Definitive, pos-
itive theoretical properties of equilibrium convergence for general games of A agents and S strategies have not yet been established (Fudenberg & Kreps 1993; Fudenberg & Levine 1993; Jordan 1993). For two-player, two-strategy games, iterated Gradient Ascent (Singh, Kearns, & Mansour
2000) provably generates dynamics giving an average pay-
off equivalent to some Nash equilibrium. Greenwald and
Kephart (2001) observe empirically that agents using meth-
ods based on no external regret (Freund & Schapire 1996)
and no internal regret (Foster & Vohra 1997) learning play
pure strategies in a frequency corresponding to a Nash equi-
librium. Approaches based on Q-Learning (Watkins 1989)
optimize long-term payoffs rather than only next-stage pay-
offs. Equilibrium convergence can be guaranteed for a sin-
gle Q-Learner, and for two-player games in the zero-sum
case (Littman 1994), or in the general-sum case with the use
of a Nash equilibrium solver (Hu & Wellman 1998).
For this paper, we borrow a well-developed model from
evolutionary game theory (Weibull 1995) to analyze strat-
egy choice dynamics. In contrast to the aforementioned ap-
proaches, which model repeated interactions of the same set
of players (i.e., the game players constitute the population),
we posit a large population of N agents, from which A ≪ N agents are randomly selected at each time step to play the game. At any given time, each agent in the population plays one of the S pure strategies, and the fraction of agents playing strategy j is $p_j$. (Our motivation for overloading the p notation will become evident below.) The $p_j$ values define a population vector of strategy shares p. For sufficiently large N, $p_j$ may be treated as a continuous variable.
We use the replicator dynamics formalism to model the evolution of p with time as follows:

$$\dot{p}_j = \left[ u(e_j, p) - u(p, p) \right] p_j \qquad (2)$$

where $u(p, p)$ is the population average payoff, and $u(e_j, p)$ is the average payoff to agents currently using pure strategy j. Equation 2 models the tendency of strategies with greater than average payoff to attract more followers, and strategies with less than average payoff to suffer defections.
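As an illustration (ours), a discretized form of Equation 2 can be iterated directly on the population shares; it reuses the hypothetical expected_payoffs helper from the equilibrium section, since for large N the average payoff to strategy j is its expected payoff against opponents drawn from p.

    import numpy as np

    def replicator_trajectory(p0, table, A, S, dt=0.1, steps=100000, eps=1e-12):
        """Iterate p <- p + dt * pdot, with pdot given by Equation 2,
        until the flow stalls; returns the visited population mixes."""
        p = np.asarray(p0, dtype=float)
        trajectory = [p.copy()]
        for _ in range(steps):
            u = expected_payoffs(p, table, A, S)  # u(e_j, p)
            pdot = (u - p @ u) * p                # Equation 2
            if np.linalg.norm(pdot) < eps:        # (near) a fixed point
                break
            p = np.clip(p + dt * pdot, 0.0, None)
            p /= p.sum()                          # stay on the simplex
            trajectory.append(p.copy())
        return np.array(trajectory)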
We prefer that a dynamic model assume minimal informational requirements for agents beyond their own actions and payoffs. The replicator dynamics equation implies that agents know $u(p, p)$, a rather implausible level of information. However, we can obtain the same population dynamics with a more plausible "replication by imitation" model (Weibull 1995). In that model, an agent switches to the strategy of a randomly chosen opponent who appears to be receiving a higher payoff. Alternative models in which learning at the individual level leads to replicator dynamics have been discussed in (Borgers & Sarin 1997).
We could interpret p, at a given time, as representing a symmetric mixed strategy for all N players in the game. With this interpretation, the fixed points of Equation 2 (where $\dot{p}_j = 0$ for all strategies j) correspond to Nash equilibria, and $u(p, p)$ and $u(e_j, p)$ are as defined in the equilibrium context. When strategy trajectories governed by Equation 2 converge to an equilibrium, the equilibrium is an attractor. However, these strategy trajectories do not necessarily terminate at fixed points. Indeed, there are many plausible payoff functions that generate limit cycle trajectories (Weibull 1995) or even chaotic trajectories (Sato, Akiyama, & Farmer 2001).
When multiple Nash equilibria exist, those that are attrac-
tors are clearly the only plausible equilibria within the evo-
lutionary model. With multiple attractors, those with larger
basins of attraction are more likely, assuming that every ini-
tial population state is equally likely. Alternatively, we can
use the basins of attraction to understand which initial pop-
ulation mixes will lead to which equilibrium. Strategy designers who have an interest (e.g., fame or profits for selling software implementing the strategy) in widespread adoption of their strategies could then determine how much initial adoption is necessary to lead to an equilibrium containing a favorable ratio of their strategies.
For our analysis of two particular games (in the “Applica-
tions” section) we use the heuristic payoff table and Equa-
tion 2 to generate a large number of strategy trajectories,
starting from a broad distribution of initial strategy vectors
p. For three strategy choices, the resulting flows can be plot-
ted in a two-dimensional unit simplex and have an immedi-
ate visual interpretation.
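For instance (our illustration, with our sampling choice), drawing the initial vectors from a symmetric Dirichlet distribution covers the simplex evenly, and the endpoint of each integrated trajectory identifies the attractor reached from that starting mix:

    import numpy as np

    rng = np.random.default_rng(1)
    starts = rng.dirichlet(np.ones(3), size=200)   # S = 3 strategies
    flows = [replicator_trajectory(p0, table, A=20, S=3) for p0 in starts]
    attractors = [flow[-1] for flow in flows]      # one endpoint per start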
Perturbation of Payoffs
In our model, we have assumed a fixed set of exogenously
specified strategies. But because they are heuristic, rather
than game-theoretically computed, we should generally as-
sume that there could be many variations that engender
changes in performance. Because the possible variations
could potentially be infinite, we do not have a way to ac-
count for all variations with our methodology. Still, by per-
turbing the payoff table in some meaningful ways, we can
perform some directed study of plausible effects of certain
abstract changes in strategy behavior.
A question we may ask is how improving one strategy relative to the others would affect the equilibria and dynamics. We consider a simple model in which the agents playing strategy $\sigma^+$ "steal" some fraction $\alpha \in [0, 1]$ of the payoff from the agents playing the other strategies. For each profile of strategies in the payoff table, and each strategy $\sigma \ne \sigma^+$, where $n^+$ agents play $\sigma^+$, $n$ agents play $\sigma$, and $q(\sigma)$ is the payoff of strategy $\sigma$, we change the payoffs as follows:

$$q(\sigma^+) \leftarrow \left(1 + \alpha\,\frac{\min(n, n^+)}{n^+}\right) q(\sigma^+), \qquad q(\sigma) \leftarrow \left(1 - \alpha\,\frac{\min(n, n^+)}{n}\right) q(\sigma)$$

Note that, for any profile, this perturbation conserves the total payoff to all agents.
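Applied to a heuristic-payoff table, the perturbation is a per-profile transformation. The sketch below is ours and follows the reconstruction above, applying the update once for each opposing strategy σ; for example, perturb_table(table, sigma_plus=2, alpha=0.05, S=3) would boost the third strategy by 5%.

    import numpy as np

    def perturb_table(table, sigma_plus, alpha, S):
        """Shift a fraction alpha of payoff from each other strategy
        toward sigma_plus, profile by profile (the 'stealing' model)."""
        perturbed = {}
        for counts, payoffs in table.items():
            q = np.array(payoffs, dtype=float)
            n_plus = counts[sigma_plus]
            for sigma in range(S):
                n = counts[sigma]
                if sigma == sigma_plus or n == 0 or n_plus == 0:
                    continue
                m = min(n, n_plus)
                q[sigma_plus] *= 1.0 + alpha * m / n_plus
                q[sigma] *= 1.0 - alpha * m / n
            perturbed[counts] = q
        return perturbed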
While it may not actually be possible to uniformly im-
prove a strategy as we describe, the approach is suggestive
of the type of perturbation that we might reasonably con-
sider. Other possibilities could include taking surplus from
only one opponent strategy, or uniform improvements to all
strategies (which might occur if one strategy becomes more
adept at executing win-win actions). These kinds of per-
turbations could help direct efforts to improve a strategy’s
likelihood of being adopted and the payoffs in equilibrium.
Alternatively, we might be interested in estimating the ef-
fects of unmodelled variations in the strategies throughout
the population, or performing sensitivity analysis on the es-
timates of expected payoffs. To do so, we could perturb individual payoff entries randomly, either statically or whenever used in Equation (2).
Applications
In this section we apply our methodology to two games
with complex, strategic interactions: an Automated Dy-
namic Pricing (ADP) game, and a Continuous Double Auc-
tion (CDA) game. We chose these games because of the in-
tractability of computing equilibria in the underlying games,
an existing body of literature which includes interesting
heuristic strategies, and the availability of simulators for
computing the heuristic payoff tables.
Automated Dynamic Pricing Game
Description of the ADP Game Interest in Internet com-
merce has fueled the emergence of software agents such as
shopbots that greatly facilitate comparison shopping by buy-
ers. Shopbots may also enable seller agents called pricebots
to dynamically set posted prices based on competitor be-
havior. An example is the site buy.com, which monitors
its primary competitors’ prices and automatically undercuts
the lowest. Interactions among such pricebots can generate
rather complex price dynamics.
Models of interacting pricebots using a variety of pricing
strategies have been studied in (Greenwald & Kephart 1999;
Greenwald, Kephart, & Tesauro 1999). In these models, a
set of sellers offers a single homogeneous good to a much
larger set of buyers. At random times, over the course of a
large number of discrete time steps, sellers reset their prices
and buyers attempt to purchase. A buyer wishes to purchase
one unit of the good, preferring to pay lower prices not ex-
ceeding its value, which is randomly chosen from the uni-
form distribution in the unit interval. All sellers have the
same constant productioncost, and their objective is to max-
imize the product of their per-unit profit and the number of
sales. We assume that buyers use one of two simple rules
for seller selection. A fixed 50% of buyers select a seller at random, and the rest use a shopbot to find the current lowest-price seller.
We formulate a one-shot ADP Game of heuristic strate-
gies, abstracting the underlying game of repeated price-
setting by sellers. At the start of the game, sellers choose
one of three heuristic dynamic pricing strategies to use for
the duration of the game. The “game theory” (GT) strat-
egy (Greenwald & Kephart 1999) plays a mixed-strategy
Nash equilibrium computed for the underlying game assuming that all pricing and purchasing decisions are made simultaneously. The "derivative-follower" (DF) strategy (Green-
wald & Kephart 1999) implements a simple hill-climbing
adaptation, experimenting with incremental price adjust-
ments to improve observed profitability, while ignoring as-
sumptions about buyers or competitors. The “No-Internal-
Regret” (NIR) strategy (Greenwald & Kephart 2001) adapts
learning techniques from Foster and Vohra (1997) to adaptively improve its pricing.
Following the simulation procedures in (Greenwald &
Kephart 1999; 2001), we computed heuristic payoff tables
for seller population sizes of 5 and 20 pricebots. Each table
entry indicates the time-averaged payoff over 1 million time
steps, with a single seller resetting its price at each time step.
Analysis of ADP Game Table 1 shows the Nash equilib-
ria for ADP Games with 5 and 20 pricebots. None of the
equilibria involve the DF pricing strategy, signifying its rel-
ative weakness. Among the equilibria in Table 1 only A is a
pure-strategy Nash equilibrium. It is also interesting to note
that the number of equilibria dwindles from three to one as
the size of the ADP Game increases from 5 to 20 pricebots.
[Figure 1 appears here: two unit-simplex phase portraits, (a) for 5 pricebots and (b) for 20 pricebots, with vertices GT, DF, and NIR, equilibria A, B, C in (a) and D in (b), and a grayscale legend for |ṗ| ranging from 1.7×10⁻⁶ to 1.7×10⁻².]
Figure 1: (a) Replicator dynamics for the Automated Pricing Game with 5 pricebots. Points p in the simplex represent strategy mixes, with homogeneous populations labeled at the vertices. The trajectories in the simplex describe the motion of p following Equation 2. Open circles are Nash equilibria, corresponding to fixed points of Equation 2, with labels corresponding to those in Table 1. The dashed line denotes the boundary of the two basins of attraction. The gray shading is proportional to the magnitude of ṗ. (b) Replicator dynamics for the Automated Pricing Game with 20 pricebots.
Agents  Label  p(GT)  p(DF)  p(NIR)  Payoff
5       A      1.000  0.000  0.000   0.051
5       B      0.871  0.000  0.129   0.049
5       C      0.030  0.000  0.969   0.047
20      D      0.986  0.000  0.014   0.013

Table 1: The symmetric Nash mixed-strategy equilibria for the Automated Pricing Game with 5 and 20 pricebots. Each row is an equilibrium, showing the probabilities of choosing the high-level strategies (GT, DF, and NIR), and the expected equilibrium payoff. The labels of the equilibria correspond to those shown in Figures 1(a) and 1(b).
The replicator dynamics for the ADP Game are shown in Figures 1(a) and 1(b). The strategy space is represented by a two-dimensional unit simplex with vertices corresponding to the pure strategies p = (1, 0, 0) (all GT), p = (0, 1, 0) (all DF), and p = (0, 0, 1) (all NIR). Trajectories are obtained by starting from an initial point and applying Equation 2 repeatedly until ṗ = 0.
For the 5-agent case shown in Figure 1(a), points A, B,
and C are the Nash equilibria shown in Table 1. A and C are
attractors, while B is a saddle point. Therefore only A and
C can be reached asymptotically as B is unstable to small
fluctuations. Note that C, consisting almost entirely of the
NIR strategy, has a much larger basin of attraction than the
pure-GT point A, suggesting it to be the most likely out-
come. Although it is not correct to refer to NIR as the “best”
strategy, its attraction is the strongest in the 5-agent game.
In the 20-agent case shown in Figure 1(b), however, we
find a surprising reversal: there is now only one Nash equi-
librium, consisting almost entirely of GT. NIR is much
weaker, relative to GT, as compared to the 5-agent game.
NIR nevertheless can play a crucial role in determining the global flow in the strategy simplex. While all trajectories terminate at D, a significant fraction of them pass close to the pure-NIR vertex. The light shading in this vicinity indicates a very small magnitude of ṗ. This implies that even though all-NIR is not a Nash equilibrium, the population can spend a significantly long time with most agents adopting the NIR strategy.
Perturbation analysis of the population dynamics, with 20 sellers, σ⁺ = NIR, σ = GT, and α = 0.06, results in the emergence of two new Nash equilibrium points (one an attractor and another unstable), consisting solely of a mix of GT and NIR strategies. When α = 0.0675, there is an attractor equilibrium with a majority of NIR and a basin of attraction covering more than half of the simplex. Further increasing α progressively decreases D's basin of attraction. By the point at which α increases to 0.1, D disappears completely and a single Nash equilibrium point remains near the pure strategy NIR vertex. The resulting simplex flow is similar to that in Figure 1(b), but with the positions of the pure strategy vertices GT and NIR interchanged. In short, NIR would start becoming a strong strategy with a 6.75% improvement, and would become nearly dominant with a 10% improvement.
The weakness of NIR in the 20-agent game was quite
unexpected to us, given the strength it exhibited in a pre-
vious study using up to five agents (Greenwald & Kephart
2001), and in the present 5-agent game. This demonstrates
the value of performing equilibrium and dynamic analysis
on a relatively large number of agents. The results here have
suggested to us a number of avenues for deeper study and development of the strategies, further demonstrating the value of our methodology. Perhaps NIR, which has several parameters, needs to be retuned for populations of different sizes. Alternatively, our results may show that it is more difficult for NIR to learn when playing against a greater number of agents. Whatever conclusions could be reached, we have already gained a greater understanding with our methodology than we could with a simpler analysis.
Continuous Double Auction Game
Description of the CDA Game The CDA is the predom-
inant mechanism used for trading in financial markets such
as NASDAQ and NYSE. A CDA continually accepts bids,
which either immediately match pending bids, or remain
standing until matched by later bids. Models of CDAs have
been extensively studied using both human traders (Smith
1982) and computerized traders (Rust, Miller, & Palmer
1993; Cliff & Bruten 1997; Gjerstad & Dickhaut 1998).
Based on these, we adopt a model in which agents trade in a
CDA marketplace for five consecutive trading periods, with
a fresh supply of cash or commodity provided at the start of
each period.
We implement the CDA marketplace and simulation as
described in detail by Tesauro and Das (2001), except for the
details of choosing buyer/seller roles and limit prices, as de-
scribed here. At the start of the game, half of the agents are
randomly chosen to be buyers, and the remainder are sell-
ers. Agents are given a list of ten limit prices (seller costs or
buyer values) generated from a known random distribution.
This distribution uses fixed parameters to generate lower and upper bounds on the limit prices from a uniform distribution, and then generates the limit prices using a uniform distribution between these two bounds. For each run of the game, we randomly select the integer lower bound b of all buyers' prices uniformly from [61, 160] and the upper bound from [b + 60, b + 209]. We compute the bounds similarly for sellers. The payoff for trading a unit i is $s_i = x_i - l_i$ for sellers and $s_i = l_i - x_i$ for buyers, where $x_i$ is the trade price and $l_i$ is the unit's limit price. The total payoff obtained by an agent in the CDA Game is $\sum_i s_i$. If the efficient number of trades (i.e., the number that maximizes value summed over all agents) at these limit prices is less than 10 or more than 90, we recalculate the bounds and limit prices.
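As a small worked example of this setup (ours; we assume integer-valued limit prices, which the text does not state explicitly):

    import numpy as np

    def draw_limit_prices(rng, n_units=10):
        """Draw the integer bounds, then n_units limit prices uniformly
        between them (one limit price per tradable unit)."""
        lower = rng.integers(61, 161)                  # b in [61, 160]
        upper = rng.integers(lower + 60, lower + 210)  # in [b+60, b+209]
        return rng.integers(lower, upper + 1, size=n_units)

    rng = np.random.default_rng(2)
    buyer_values = draw_limit_prices(rng)   # buyer earns l_i - x_i per trade
    seller_costs = draw_limit_prices(rng)   # seller earns x_i - l_i per trade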
We formulate a normal-form CDA Game of heuristic
strategies, abstracting the underlying game of continuous
bidding. Each agent chooses a strategy from a set of three
alternatives at the start of the game, and does not change
during the game. The “Zero-Intelligence Plus” (ZIP) strat-
egy we use (Tesauro & Das 2001) is a modified version of
that studied by Cliff and Bruten (1997). ZIP initially bids
to obtain a high surplus value (or profit) and subsequently
adjusts its bid price towards the price of any observed
trades, or in the direction of improvement when no trades
have occurred after a period of time. The Gjerstad-Dickhaut
(GD) strategy we use (Tesauro & Das 2001) is a modified version of the original (Gjerstad & Dickhaut 1998). GD calculates a heuristic "belief" function based on the history of recent market activity, and places bids to maximize the expected payoff, given the belief function. The Kaplan strategy (Rust, Miller, & Palmer 1993) withholds bids until the bid/ask spread decreases to a sufficiently small amount or the end of a period is near.

Size  Label  p(ZIP)  p(GD)  p(Kaplan)  Payoff
14    -      0.420   0.000  0.580      0.967
20    A      0.439   0.000  0.561      0.972
20    B      0.102   0.542  0.356      0.991
20    C      0.000   0.690  0.310      0.989

Table 2: The symmetric Nash mixed-strategy equilibria for the CDA Game with 14 and 20 agents. Each row is an equilibrium, showing the probabilities of choosing the high-level strategies (ZIP, GD, and Kaplan), and the expected equilibrium payoff. The labels of the equilibria correspond to those shown in Figure 2(a).
We compute the heuristic-payoff table by averaging the
results from 2500 simulations for each profile of strategies.
Analysis of the CDA Game Previous studies show that
the choice of strategy in the CDA Game from amongst the
alternatives of ZIP, GD and Kaplan is an interesting prob-
lem without an obvious solution. Kaplan was the winner
of the Santa Fe Double Auction Tournament (Rust, Miller,
& Palmer 1993). However, the Kaplan strategy does not
perform well against itself, and must be parasitic on the in-
telligent bidding behavior of other agent strategies to ob-
tain decent profits. A recent analysis (Tesauro & Das 2001)
shows that various published bidding strategies all give good
empirical performance and that none is dominant. The ho-
mogeneous pure-strategy populations are unstable to defec-
tion: all-ZIP and all-GD can be invaded by Kaplan, and all-
Kaplan can be invaded by ZIP or GD. Hence the Nash equi-
libria are difficult to compute by inspection or other simple
means.
We applied our solution method to the CDA Game with
various numbers of agents. We show the equilibria for 14
and 20 agents in Table 2. CDA Games with 6, 12, and 14
agents each result in only one Nash equilibrium, with similar mixed-strategy vectors p* that assign zero probability to choosing the GD strategy. The results were qualitatively different for CDA Games with larger populations. For 16-, 18-, and 20-agent games we found three equilibria, very similar for each number of agents. One of these equilibria matched the small-population equilibrium, and there are two additional equilibria, one involving only GD and Kaplan, and one using all three strategies.
We plot the replicator dynamics for the 20-agent CDA
Game in Figure 2(a). The strategy space is represented by
a two-dimensional unit simplex with vertices corresponding
to the pure strategies p = (1, 0, 0) (all ZIP), p = (0, 1, 0) (all GD), and p = (0, 0, 1) (all Kaplan). The points labeled A, B, and C are the Nash equilibria shown in Table 2. A and C are both attractors, while B is a saddle point, hence only A and C are realistic outcomes. We also note that A has a much larger basin of attraction than C. If the initial p is chosen randomly, the population mix is most likely to terminate at A even though C has higher population payoff.
[Figure 2 appears here: two unit-simplex phase portraits with vertices ZIP, GD, and Kaplan; (a) shows equilibria A, B, C and point D for the unperturbed CDA Game, (b) shows A′, B′, C′, D′, and E′ for the perturbed game, with a grayscale legend for |ṗ| ranging from 4×10⁻⁶ to 4×10⁻².]
Figure 2: (a) Replicator dynamics for the CDA Game. Open circles are Nash equilibria with labels corresponding to those in Table 2. Other notations are similar to those in Figure 1. (b) Replicator dynamics for the CDA Game with perturbed payoffs, in which 5% of the ZIP and Kaplan agent payoffs were shifted to GD agents.
An additional point D is shown, which is an equilibrium of ZIP and GD in the two-strategy case, but which has an incentive to defect to Kaplan in the full three-strategy game.
We also observed the replicator dynamics for the 14-agent CDA Game. There we found that the single equilibrium was a stable attractor to which all flows converged.
The gray shading in Figure 2(a), which denotes |ṗ|, indicates that the population changes much more rapidly near the lower-left corner (mostly Kaplan) than near the lower-right corner (mostly GD). This shows that there is only a slight incentive to deviate to Kaplan in an all-GD population, and a much larger incentive to switch to ZIP or GD in an all-Kaplan population. Note that |ṗ| can vary by up to four orders of magnitude. In particular, the magnitude of the flows leading away from B is much smaller than the average |ṗ| in the simplex. Hence, although B is an unstable equilibrium, the population could actually spend a relatively long time in the region of B. The change in this region could be so slow as to appear stable if the base time scale of decision making is not sufficiently fast.
We studied the sensitivity of the population dynamics for the CDA Game, shown in Figure 2(b), by simultaneously improving σ⁺ = GD, relative to σ = ZIP and σ = Kaplan, by α = 0.05. The replicator dynamics of the perturbed CDA Game show a significant change in the topology of the flows, as depicted in Figure 2(b). The right edge of the simplex, corresponding to a mix of GD agents and ZIP agents, is now stable against invasion by Kaplan agents. As a consequence, D′ is now a Nash equilibrium point, and due to the global topology of flows, a new interior equilibrium point occurs at E′. The equilibrium point C′ has moved to the vertex of pure GD. Only A′ and C′ are stable equilibria.
Although pure GD is still not dominant, nor the only attractor, C′ captures much of the simplex in its basin of attraction, making it the most likely attractor in the perturbed CDA Game. Thus, if GD could be improved to capture an extra 5% of other agents' surpluses, it would likely be widely adopted in the population. Moreover, although the payoffs are actually not perturbed at C′ (because it is a pure strategy), we measured that the payoff there is higher than at the other perturbed and non-perturbed equilibrium points. We also searched the range of improvements for GD and found that an equilibrium containing GD captures most of the simplex in its basin of attraction when α ≥ 0.0075. In short, GD would start becoming a strong strategy with as little as a 0.75% improvement, and would become nearly dominant with a 5% improvement.
Discussion
We have proposed an approach for analyzing heuristic
strategies for complex games with repeated interactions.
Our methodology treats the heuristic strategies, rather than
the component atomic actions, as primitive, and computes
expected payoffs to agents as a function of the joint strat-
egy space. With our approach, we can draw useful and gen-
eral conclusions for games that defy analysis at the level of
atomic actions.
We have shown how to apply our methodology to two
games whose complexity has thus far defied game-theoretic
analysis at the level of atomic actions. For each, we found
multiple Nash equilibria. To address the issue of how a par-
ticular equilibrium may be realized, we computed the dy-
namics of a population in terms of the change of propor-
tional shares of strategies. We argued that dynamically un-
stable equilibria will not be realized, and that the attractor
equilibrium with the largest basin of attraction was the most
likely to be played in steady-state, while noting the effects
of time scale on convergence. We also examined perturba-
tions of the expected payoffs to identify how modest im-
provements to one strategy could significantly change the
dynamic properties and the set of equilibria. For each appli-
cation we discovered interesting and surprising results not
apparent from the simpler analyses commonly applied to
heuristic strategies.
Our approach is more principled and complete than the
feasible methods commonly applied to the analysis of com-
plex, strategic interactions. Still, more work is needed to
provide a full understanding of these interactions. While
we have touched on it with perturbation analysis, a deeper
understanding of how to design and improve the strategies
themselves requires a bridging of action-level and heuristic-
level analysis. We have made some preliminary progress
along these lines in the CDA game by a direct study of the
pure payoff table. There we found that studying the region
where a strategy fares most poorly against others, combined
with a deep understanding of the strategies themselves, can
provide inspiration for strategy improvements.
Our general approach would also be improved with ad-
vances in the specific techniques employed. New learn-
ing algorithms should provide improved equilibrium con-
vergence in small populations. As mentioned above, we are
exploring techniques to minimize the amount of payoff ta-
ble computation necessary to accurately determine equilib-
ria. We hope this will make our methodology feasible for
analysis of the top TAC strategies. Additionally, we could
also perform sensitivity analysis for those games requiring
expensive simulation.
Our modeling assumptions and the computational tech-
niques we employ give us the ability to analyze relatively
large numbers of agents. This in turn allows us to observe
qualitatively different behaviors as the number of agents
grows. In some cases it may be reasonable to extrapolate our
results to even larger numbers of agents, beyond our ability
to directly compute.
The most important computational limitation of our methodology is an exponential dependence on the number of high-level strategies. This would seem to limit its applicabil-
ity to real-world domains where there are potentially many
heuristic strategies. This apparent limitation may be sur-
mountable if the numerous heuristic strategies can be placed
into a small number of broad functional groups, and if vari-
ations within a group are not as important as the total popu-
lation fraction within each group. For example, a reasonable
approximation for financial markets may be to classify the
available trading strategies as either “buy-and-hold,” “fun-
damental,” or “technical” strategies, and then carry out our
analysis. To allow for algorithmic variations within each
group, we could randomly choose between variations for
each simulation when computing the payoff table.
Acknowledgments
We thank David Parkes for his helpful comments.
References
Axelrod, R. 1997. Evolving new strategies: The evolu-
tion of strategies in the iterated prisoner’s dilemma. In The
Complexity of Cooperation. Princeton University Press.
Borgers, T., and Sarin, R. 1997. Learning through rein-
forcement and replicator dynamics. Journal of Economic
Theory 77:1–14.
Cliff, D., and Bruten, J. 1997. Minimal-intelligence agents
for bargaining behaviors in market-based environments.
Technical Report HPL-97-91, Hewlett Packard Labs.
Foster, D., and Vohra, R. 1997. Regret in the on-line deci-
sion problem. Games and Economic Behavior 21:40–55.
Freund, Y., and Schapire, R. E. 1996. Game theory, on-line prediction and boosting. In Ninth Annual Conference on Computational Learning Theory.
Fudenberg, D., and Kreps, D. M. 1993. Learning mixed
equilibria. Games and Economic Behavior 5:320–367.
Fudenberg, D., and Levine, D. K. 1993. Steady state learn-
ing and Nash equilibrium. Econometrica 61(3):547–573.
Gjerstad, S., and Dickhaut, J. 1998. Price formation in
double auctions. Games and Economic Behavior 22:1–29.
Greenwald, A., and Kephart, J. 1999. Shopbots and price-
bots. In Proceedings of Sixteenth International Joint Con-
ference on Artificial Intelligence, volume 1, 506–511.
Greenwald, A., and Kephart, J. O. 2001. Probabilistic
pricebots. In Fifth International Conference on Autonomous
Agents, 560–567.
Greenwald, A., and Stone, P. 2001. Autonomous bid-
ding agents in the trading agent competition. IEEE Internet
Computing 5(2):52–60.
Greenwald, A.; Kephart, J.; and Tesauro, G. 1999. Strate-
gic pricebot dynamics. In Proceedings of First ACM Con-
ference on E-Commerce, 58–67.
Hu, J., and Wellman, M. P. 1998. Multiagent reinforce-
ment learning: theoretical framework and an algorithm. In
Proceedings of the Fifteenth International Conference on
Machine Learning, 242–250. Morgan Kaufmann.
Jennings, N. R., and Wooldridge, M. 2002. Agent-oriented
software engineering. In Bradshaw, J., ed., Handbook of
Agent Technology. AAAI/MIT Press.
Jordan, J. S. 1993. Three problems in learning mixed-
strategy Nash equilibria. Games and Economic Behavior
5:368–386.
Littman, M. 1994. Markov games as a framework for
multi-agent reinforcement learning. In Proceedings of the
Eleventh International Conference on Machine Learning,
157–163. Morgan Kaufmann.
McKelvey, R. D., and McLennan, A. 1996. Computation of
equilibria in finite games. In Handbook of Computational
Economics, volume 1. Elsevier Science B. V.
McKelvey, R. D.; McLennan, A.; and Turocy, T. 2000.
Gambit Command Language: Version 0.96.3.
Milgrom, P. 2000. Putting auction theory to work: The
simultaneous ascending auction. The Journal of Political
Economy 108(2):245–272.
Press, W.; Teukolsky, S.; Vetterling, W.; and Flannery, B.
1992. Numerical Recipes in C. Cambridge University Press.
Rust, J.; Miller, J.; and Palmer, R. 1993. Behavior of
trading automata in a computerized double auction market.
In Friedman, D., and Rust, J., eds., The Double Auction
Market: Institutions, Theories, and Evidence. Addison-
Wesley.
Sato, Y.; Akiyama, E.; and Farmer, J. D. 2001. Chaos
in learning a simple two person game. Technical Report
01-09-049, Santa Fe Institute.
Singh, S.; Kearns, M.; and Mansour, Y. 2000. Nash con-
vergence of gradient dynamics in general-sum games. In
Proceedings of UAI-2000, 541–548. Morgan Kaufmann.
Smith, V. L. 1982. Microeconomic systems as an experi-
mental science. American Economic Review 72:923–955.
Tesauro, G., and Das, R. 2001. High-performance bidding
agents for the continuous double auction. In Third ACM
Conference on Electronic Commerce, 206–209.
Watkins, C. 1989. Learning from Delayed Rewards. Ph.D.
Dissertation, Cambridge University, Cambridge.
Weibull, J. W. 1995. Evolutionary Game Theory. The MIT
Press.
Wellman, M. P.; Wurman, P. R.; O’Malley, K.; Bangera,
R.; Lin, S.; Reeves, D.; and Walsh, W. E. 2001. Designing
the market game for a trading agent competition. IEEE
Internet Computing 5(2):43–51.