Analyzing Complex Strategic Interactions in Multi-Agent Systems
William E. Walsh Rajarshi Das Gerald Tesauro Jeffrey O. Kephart
IBM T.J. Watson Research Center
19 Skyline Drive, Hawthorne, NY 10532, USA
{wwalsh1, rajarshi, gtesauro, kephart}@us.ibm.com
Abstract

We develop a model for analyzing complex games with repeated interactions, for which a full game-theoretic analysis is intractable. Our approach treats exogenously specified, heuristic strategies, rather than the atomic actions, as primitive, and computes a heuristic-payoff table specifying the expected payoffs of the joint heuristic strategy space. We analyze a particular game based on the continuous double auction, and compute Nash equilibria of previously published heuristic strategies. To determine the most plausible equilibria, we study the replicator dynamics of a large population playing the strategies. To account for errors in estimation of payoffs or improvements in strategies, we analyze the dynamics and equilibria based on perturbed payoffs.
Introduction
Understanding complex strategic interactions in multi-agent
systems is assuming an ever-greater importance. In the
realm of agent-mediated electronic commerce, for exam-
ple, authors have recently discussed scenarios in which self-
interested software agents execute various dynamic pricing
strategies, including posted pricing, bilateral negotiation,
and bidding. Understanding the interactions among various
strategies can be extremely valuable, both to designers of
markets (who wish to ensure economic efficiency and sta-
bility) and to designers of individual agents (who wish to
find strategies that maximize profits). More generally, by demystifying strategic interactions among agents, we can improve our ability to predict (and therefore design) the overall behavior of multi-agent systems—thus reducing one of the canonical pitfalls of agent-oriented programming (Jennings & Wooldridge 2002).
In principle, the (Bayes) Nash equilibrium is an appropri-
ate concept for understanding and characterizing the strate-
gic behavior of systems of self-interested agents. In prac-
tice, however, it is infeasible to compute Nash equilibria
for all but the very simplest interactions. For some types
of repeated interactions, such as continuous double auc-
tions (Rust, Miller, & Palmer 1993) and simultaneous as-
cending auctions (Milgrom 2000), even formulating the in-
formation structure of the extensive-form game, much less
computing the equilibrium, remains an unsolved problem.
Given this state of affairs, it is typical to endow agents
with heuristic strategies, comprising hand-crafted or learned
decision rules on the underlying primitive actions as a
function of the information available to an agent. Some
strategies are justified on the basis of desirable properties
that can be proven in simplified or special-case models,
while others are based on a combination of economic intu-
ition and engineering experience (Greenwald & Stone 2001;
Tesauro & Das 2001).
In this paper, we propose a methodology for analyzing
complex strategic interactions based on high-level, heuristic
strategies. The core analytical components of our methodol-
ogy are Nash equilibrium of the heuristic strategies, dynam-
ics of equilibrium convergence, and perturbation analysis.
Equilibrium and the dynamics of equilibrium conver-
gence have been widely studied, and our adoption of these
tools is by no means unique. Yet, these approaches have
not been widely and fully applied to the analysis of heuris-
tic strategies. A typical approach to evaluating the strate-
gies has been to compare various mixtures of strategies
in a structured or evolutionary tournament (Axelrod 1997;
Rust, Miller, & Palmer 1993; Wellman et al. 2001), often
with the goal of establishing which strategy is the “best”.
Sometimes, the answer is well-defined, as in the Year 2000
Trading Agent Competition (TAC-00), in which the top sev-
eral strategies were quite similar, and clearly superior to all
other strategies (Greenwald & Stone 2001). In other cases,
including recent studies of strategies in continuous double
auctions (Tesauro & Das 2001) and in TAC-01, there does
not appear to be any one dominant strategy.
The question of which strategy is “best” is often not the
most appropriate, given that a mix of strategies may consti-
tute an equilibrium. The tournament approach itself is often
unsatisfactory because it cannot easily provide a complete
understanding of multi-agent strategic interactions, since the
tournament play is just one trajectory through an essentially
infinite space of possible interactions. One can never be cer-
tain that all possible modes of collective behavior have been
explored.
Our approach is a more principled and complete method
for analyzing the interactions among heterogeneous heuris-
tic strategies. Our methodology, described more fully in
the following sections, entails creating a heuristic-payoff
table—an analog of the usual payoff table, except that the
entries describe expected payoffs for high-level, heuristic
strategies rather than primitive actions. The heuristic-payoff
table is then used as the basis for several forms of analysis.
The next four sections of our paper, “Modeling Ap-
proach”, “Equilibrium Computation”, “Dynamic Analysis”,
and “Perturbation of Payoffs” detail the components of our
approach. After presenting the methodology, we describe its
application to two complex multi-agent games: automated
dynamic pricing (ADP) in a competing-seller scenario, and
automated bidding in the continuous double auction (CDA).
We conclude with a discussion of the methodology.
Modeling Approach
We start with a game that may include complex, repeated
interactions between A agents. The underlying rules of the
game are well-specified and common knowledge. The rules
specify particular actions that agents may take as a func-
tion of the state of the game. Each of the A agents has a
choice of the same S exogenously specified, heuristic strate-
gies. By strategy, we mean a policy that governs the choice
of individual actions, typically expressed as a deterministic
or stochastic mapping from the information available to the
agent to an action. For example, in the CDA, typical actions
are of the form “bid b at time t”, while the bidding strategies
can be complex functions, expressed in hundreds or thou-
sands of lines of code, that specify what bids are placed over
the course of trading. We say the strategies are “heuristic”
in that they are not generally the solution of (Bayes) Nash
equilibrium analysis.
A key step in our methodology is to compute a heuristic-payoff table, which specifies the expected payoff to each agent as a function of the strategies played by all the agents. The underlying payoffs may depend on varying types for the agents, which may encompass, for instance, different utility functions or different roles. To simplify analysis, we assume that the types of agents are drawn independently from the same distribution, as is common in the auction literature. To further improve tractability, we also assume that an agent chooses its strategy independently of its type. However, we do assume that the agent's type and the distribution of types are available to the strategy itself.
The heuristic-payoff table is an abstract representation of
the fundamental game in that we have reduced the model
of the game from a potentially very complex, multi-stage
game to a one-shot game in normal form, in which we treat
the choice of heuristic strategies, rather than the basic ac-
tions, as the level of decision making for strategic analysis.
We emphasize that we take the strategies as exogenous and
do not directly analyze their genesis nor their composition.
With this model, we can apply the standard game-theoretic
analysis just as we would with a normal-form game of sim-
ple actions, such as the prisoner’s dilemma.
The standard payoff table for a normal-form game requires $S^A$ entries, which can be extremely large, even when S and A are moderate. But we have restricted our analysis to symmetric games in which each agent has the same set of strategies and the same distribution of types (and hence payoffs). Hence, we can merely compute the payoff for each strategy as a function of the number of agents playing each strategy, without being concerned about the individual identities of those agents. This symmetry reduces the size of the payoff table enormously. The number of entries in the table is the number of unique ways in which a set of A identical agents may be partitioned among S strategies. This quantity can be shown to be $\binom{A+S-1}{A}$, or roughly $\frac{A^{S-1}}{(S-1)!}$ when $A \gg S$. In this case, changing the exponent from A to S − 1 results in a huge reduction in the size of the payoff table. For example, in applications presented in this paper, A = 20 and S = 3, and thus the symmetric payoff table contains just 231 entries—far less than the approximately $3.48 \times 10^9$ entries contained in the asymmetric payoff table.
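As a concrete check of this counting argument, the following short sketch (ours, not part of the paper) computes the symmetric table size:

```python
# A minimal sketch (not from the paper): the number of heuristic-payoff
# table entries for A symmetric agents choosing among S strategies.
from math import comb

def table_size(num_agents: int, num_strategies: int) -> int:
    """Count the distinct partitions of A identical agents among S strategies."""
    return comb(num_agents + num_strategies - 1, num_agents)

print(table_size(20, 3))  # 231 entries in the symmetric table
print(3 ** 20)            # 3486784401, roughly 3.48e9 asymmetric entries
```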
Payoffs may be computed analytically for sufficiently simple games and strategies. However, in games of realistic complexity, it is necessary to measure expected payoffs in simulation. In games with non-deterministic aspects, it will be necessary to run many independent simulations for each payoff-table entry, to ensure that the measured payoffs are as accurate as possible.
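For instance, a table entry could be estimated as below, where simulate_game is a hypothetical stand-in for the actual game simulator returning one payoff vector per run:

```python
# Illustrative sketch: estimating one payoff-table entry by averaging
# many independent simulations of the same strategy profile.
import statistics

def estimate_entry(counts, simulate_game, num_runs=2500):
    """Mean per-strategy payoff vector over num_runs simulations of one profile."""
    samples = [simulate_game(counts) for _ in range(num_runs)]
    return tuple(statistics.fmean(col) for col in zip(*samples))
```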
With a payoff table in hand, one can use a variety of tech-
niques to gain insight into a system’s collective behavior.
We present three such techniques in this paper: (1) First,
we perform a static analysis, which entails computing Nash
equilibria of the payoff table. To simplify this computation,
we look only for symmetric Nash equilibria, although gen-
erally nothing precludes the existence of asymmetric equi-
libria. (2) Second, we model the dynamics of agents that pe-
riodically and asynchronously switch strategies to ones that
appear to be more successful. This helps us understand the
evolution of the mixture of strategies within the population, providing insight into the conditions under which the various Nash equilibria may be realized. (3) Third, we suggest
techniques for understanding the strategies themselves at a
deeper level. Since our analysis is performed at the more
coarse-grained level of strategies rather than actions, it does
not directly provide insight into the behavior of the strate-
gies at the level of primitive actions. We would like to un-
derstand how to design good strategies, how the series of primitive actions interact between strategies, how to improve strategies, and the system-level effects of strategy improve-
ments. We have begun to address these challenges in two
ways. We analyze how the static and dynamic properties of
the system change when the values in the payoff table are
perturbed. Perturbation analysis suggests the potential ef-
fects of improving one of the strategies, as well as noise or
measurement errors in the payoffs. All of these techniques
will be elucidated in the following three sections.
Comments on Our Modeling Approach
Obtaining the payoff table is the key step in our methodol-
ogy. Given the payoff table, we need not concern ourselves
any more with the details of how it was generated; the rest
of the analyses that we perform all operate mechanically on
the table values, with no need to re-do the simulations that
generated them. Still, obtaining the payoff table at a high
degree of precision may be computationally non-trivial. A 20-agent, 3-strategy game requires the computation of 231
table entries. In our analysis of the CDA game, we averaged
the results of 2500 simulations for each payoff entry, requir-
ing about an afternoon of computation on seven 333MHz-
450MHz IBM RS6000 workstations to compute the entire
table. Computing the payoff table for our application was
feasible because the auctions and heuristic strategies require
fairly simple computations and we had sufficient computa-
tional resources available.
For some games (e.g., iterative combinatorial auctions)
and strategies (e.g., that use expensive optimization algo-
rithms), computing the payoff table may be exorbitantly ex-
pensive. Other factors, such as limited resources or timing
properties of the game, may limit our ability to fully com-
pute the payoff table. For instance, each TAC-01 game takes exactly 15 minutes, and only two auction servers are publicly available at present. Clearly, for our methodology to be more broadly feasible, the expense of computing the payoff table must be addressed. Currently, we are exploring ways to interleave Nash equilibrium computation with payoff com-
putation. We believe that this will admit methods to help
focus on the payoff entries where accuracy is most needed,
rather than performing a large number of simulations for all
entries.
Our methodology is based on heuristic strategies to ad-
dress the limitations in present game-theoretic techniques.
On the other hand, our assumptions of symmetry are made
strictly to reduce storage and computational cost. While it is
straightforward in principle to break the symmetry of type
distributions and strategy sets, the exponential growth in ta-
ble size and computation, both for determining the payoff
entries and for the equilibrium and dynamic analysis we dis-
cuss in subsequent sections, practically restricts the degree
to which we can break symmetry.
We acknowledge that, despite its expedience, the most
controversial aspect of our approach is likely to be the as-
sumption of a single set of heuristic strategies available to
all agents, which avoids the issue of how the strategies are
developed and become known to the agents. Nevertheless,
an observation of current practice suggests that a moder-
ately sized set of heuristic strategies can become available
to the population at large. Two automated bidding services
are openly available to users of eBay, and investing in a mu-
tual fund is an exact method of adopting the fund manager’s
strategy. Indeed, as automated negotiation becomes more
widespread and pervasive, we expect that the majority of
participants in the general public will adopt heuristic strate-
gies developed by others, rather than developing their own.
Admittedly, some participants with sufficient talent and re-
sources will invest in the development of highly specialized
strategies. But where there is an information flow, there is
the possibility for these strategies to spread. Many players
in TAC-01 used techniques publicly documented and shown
to be effective in TAC-00.
Another restriction we have made in our model is the assumption that agents make their choices of strategy independently of their own type. We should generally expect that sophisticated agents would choose to condition their strategy choice based on their type.¹ While we could easily model type conditioning by discretizing types, the storage and computational expense quickly explodes, as with computing asymmetric equilibria.

¹ Although an individual agent will always do as well or better to condition on type, given the strategic behavior of the other agents, an equilibrium of strategies conditioned on type could actually be worse for all agents than a non-type-conditioned equilibrium.
Equilibrium Computation
At the start of the game, each of the A agents chooses to play
one of the S available pure strategies. The payoff to agent i
is a real-valued function u of the strategy played by i and
the strategies played by all other agents. As discussed in the
modeling section, the payoff is the expected reward obtained when the agents play a particular combination of strategies. Because we assume symmetric strategy sets and payoffs, the
payoff to an agent can be represented as the payoff to each
strategy as a function of the number of agents playing each
strategy.
Agent i may choose its strategies randomly according to a mixed strategy $\hat{p}_i = (\hat{p}_{i1}, \ldots, \hat{p}_{iS})$. Here, $\hat{p}_{ij}$ indicates the probability that agent i plays strategy j, with the constraints that $\hat{p}_{ij} \in [0, 1]$ and $\sum_{j=1}^{S} \hat{p}_{ij} = 1$. The vector of all agents' mixed strategies is $\hat{p}$, and the vector of mixed strategies for all agents except i is $\hat{p}_{-i}$. We indicate by $\hat{p}_i = e_j$ the special case when agent i plays pure strategy j with probability one. We denote by $u(e_j, \hat{p}_{-i})$ the expected payoff to an agent i for playing pure strategy j, given that all other agents play their mixed strategies $\hat{p}_{-i}$. The expected payoff to agent i of the mixed strategy is then $u(\hat{p}_i, \hat{p}_{-i}) = \sum_{j=1}^{S} u(e_j, \hat{p}_{-i})\, \hat{p}_{ij}$.
In game-theoretic analysis, it is generally assumed that rational agents would play mixed Nash equilibrium strategies, whereby no one agent can receive a higher payoff by unilaterally deviating to another strategy, given fixed opponents' strategies. Formally, probabilities $\hat{p}^*$ constitute a Nash equilibrium iff for all agents i and all $\hat{p}_i$, $u(\hat{p}^*_i, \hat{p}^*_{-i}) \ge u(\hat{p}_i, \hat{p}^*_{-i})$.
In the remainder of this paper, we restrict our attention to symmetric mixed-strategy equilibria, whereby $\hat{p}_i = \hat{p}_k = p^*$ for all agents i and k. We denote an arbitrary, symmetric, mixed strategy by p and the probability that a given agent plays pure strategy j by $p_j$. Nash equilibria of symmetric
strategies always exist for symmetric games (Weibull 1995),
and are not generally unique. We restrict our attention in this
way for two reasons. First, when searching for symmetric
Nash equilibria, we need only find S, rather than AS proba-
bilities. Second, absent a particular mechanism for breaking
symmetry, it is reasonable to assume symmetry by default,
as is often done in auction analysis with symmetric agent
types. In particular, symmetry is consistent with the particu-
lar evolutionary game theory model we consider in the next
section.
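To illustrate how these expected payoffs can be computed directly from the heuristic-payoff table, the following sketch (our construction; payoff_table is a hypothetical dict mapping a count vector summing to A to a tuple of per-strategy payoffs) computes $u(e_j, p)$, the expected payoff to one agent playing pure strategy j while its A − 1 opponents each independently play the symmetric mixed strategy p, by weighting each opponent profile by its multinomial probability:

```python
# Hedged sketch: expected payoff of a pure strategy against a symmetric
# mixed strategy, computed from a heuristic-payoff table.
from itertools import combinations_with_replacement
from math import comb, prod

def opponent_profiles(num_opponents, num_strategies):
    """All count vectors of num_opponents agents over num_strategies."""
    for draw in combinations_with_replacement(range(num_strategies), num_opponents):
        counts = [0] * num_strategies
        for s in draw:
            counts[s] += 1
        yield tuple(counts)

def multinomial_prob(counts, p):
    """Probability that independent draws from p produce these counts."""
    weight, remaining = 1, sum(counts)
    for c in counts:
        weight *= comb(remaining, c)  # builds the multinomial coefficient
        remaining -= c
    return weight * prod(pk ** ck for pk, ck in zip(p, counts))

def pure_payoff(j, p, payoff_table, num_agents):
    """u(e_j, p): add our agent to each opponent profile and average."""
    total = 0.0
    for counts in opponent_profiles(num_agents - 1, len(p)):
        full = list(counts)
        full[j] += 1
        total += multinomial_prob(counts, p) * payoff_table[tuple(full)][j]
    return total
```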
Finding Nash equilibria can be a computationally chal-
lenging problem, requiring solutions to complex, nonlin-
ear equations in the general case. The Nash equilibrium
conditions can be expressed in various equivalent formu-
lations, each suggesting different solution methods (McK-
elvey & McLennan 1996). Several solution methods are
implemented in Gambit (McKelvey, McLennan, & Turocy 2000), a freely available game solver. But because Gambit is not able to exploit the symmetry of the games we study, it requires the full normal-form game table as input, severely limiting the size of problems that can feasibly be represented in the program.
In this work, we formulate Nash equilibrium as a minimum of a function on a polytope. Restricting ourselves to symmetric equilibria in a symmetric game, the problem is to minimize:

$$v(p) = \sum_{j=1}^{S} \left[ \max\big( u(e_j, p) - u(p, p),\, 0 \big) \right]^2 \qquad (1)$$

The mixed strategy $p^*$ is a Nash equilibrium iff it is a global minimum of v (McKelvey & McLennan 1996). Although not all minima of the function v may be global, we can validate that a minimum is global if its value is zero.
In this work we used amoeba (Press et al. 1992), a nonlinear optimizer, to find the zero points of v in our application games. Amoeba searches an n-dimensional space using a simplex of n + 1 vertices (a polyhedron). The function is evaluated at each vertex of the simplex, and the polyhedron attempts to move down the estimated gradient by a series of geometric transformations that continually strive to replace the worst-performing vertex. We repeatedly ran amoeba, restarting at random points on the S-dimensional simplex, and stopping when it had found 30 previously discovered equilibria in a row. Because S = 3 in our applications, we were able to plot v and verify that we had indeed found all equilibria. For games with A = 20, S = 3, and three equilibria, it took amoeba roughly ten minutes to terminate on a 450MHz IBM RS6000 machine.
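The sketch below illustrates this search, substituting SciPy's Nelder-Mead implementation for the authors' amoeba routine (both are downhill simplex searches) and reusing the pure_payoff helper sketched earlier; projecting the search point back onto the simplex inside v is our simplification:

```python
# Hedged sketch: locating symmetric Nash equilibria as zeros of v (Equation 1).
import numpy as np
from scipy.optimize import minimize

def v(p, payoff_table, num_agents):
    """Equation 1: non-negative everywhere, zero exactly at symmetric equilibria."""
    p = np.clip(p, 0.0, None)
    p = p / max(p.sum(), 1e-12)          # project back onto the simplex
    pure = [pure_payoff(j, p, payoff_table, num_agents) for j in range(len(p))]
    mixed = float(np.dot(pure, p))       # u(p, p)
    return sum(max(uj - mixed, 0.0) ** 2 for uj in pure)

def find_equilibrium(payoff_table, num_agents, num_strategies, rng):
    """One random restart of the simplex search; repeat until no new equilibria."""
    x0 = rng.dirichlet(np.ones(num_strategies))
    res = minimize(v, x0, args=(payoff_table, num_agents), method="Nelder-Mead")
    p = np.clip(res.x, 0.0, None)
    return p / p.sum(), res.fun          # res.fun == 0 marks a global minimum
```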
Dynamic Analysis
Nash equilibria provide a theoretically satisfying view of
ideal static properties of a multi-agent system. Yet often the
dynamic properties may be of equal or greater concern. In
actual systems, it may be unreasonable to assume that agents have all the correct, common knowledge necessary to compute
equilibria. Furthermore, even when agents have this com-
mon knowledge and the resources to compute equilibria, we
still want to address the question of which equilibrium is
chosen and how agents (implicitly) coordinate to reach it.
Many have studied models of adaptation and learning to
simultaneously address these issues. Common learning ap-
proaches adjust strategy mixes gradually to myopically im-
prove payoffs in repeated play of a game. Definitive, positive theoretical properties of equilibrium convergence for general games of A agents and S strategies have not yet been established (Fudenberg & Kreps 1993; Fudenberg & Levine 1993; Jordan 1993). For two-player, two-strategy games, iterated Gradient Ascent (Singh, Kearns, & Mansour 2000) provably generates dynamics giving an average payoff equivalent to some Nash equilibrium. Greenwald and Kephart (2001) observe empirically that agents using methods based on no-external-regret (Freund & Schapire 1996) and no-internal-regret (Foster & Vohra 1997) learning play pure strategies with a frequency corresponding to a Nash equilibrium. Approaches based on Q-learning (Watkins 1989) optimize long-term payoffs rather than only next-stage payoffs. Equilibrium convergence can be guaranteed for a single Q-learner, and for two-player games in the zero-sum case (Littman 1994), or in the general-sum case with the use of a Nash equilibrium solver (Hu & Wellman 1998).
For this paper, we borrow a well-developed model from
evolutionary game theory (Weibull 1995) to analyze strat-
egy choice dynamics. In contrast to the aforementioned ap-
proaches, which model repeated interactions of the same set
of players (i.e., the game players constitute the population),
we posit a large population of N agents, from which $A \ll N$ agents are randomly selected at each time step to play the game. At any given time, each agent in the population plays one of the S pure strategies, and the fraction of agents playing strategy j is $p_j$.² The $p_j$ values define a population vector of strategy shares p. For sufficiently large N, $p_j$ may be treated as a continuous variable.

² Our motivation for overloading the p notation will become evident below.
We use the replicator dynamics formalism to model the evolution of p with time as follows:

$$\dot{p}_j = \left[ u(e_j, p) - u(p, p) \right] p_j \qquad (2)$$

where $u(p, p)$ is the population average payoff, and $u(e_j, p)$ is the average payoff to agents currently using pure strategy j. Equation 2 models the tendency of strategies with greater than average payoff to attract more followers, and strategies with less than average payoff to suffer defections.
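A minimal sketch of this update (our code; the Euler step size is an arbitrary assumption) reuses the pure_payoff helper from the equilibrium section:

```python
# Sketch: one discrete Euler step of the replicator dynamics (Equation 2).
import numpy as np

def replicator_step(p, payoff_table, num_agents, dt=0.1):
    """p_dot_j = [u(e_j, p) - u(p, p)] * p_j, integrated with step size dt."""
    pure = np.array([pure_payoff(j, p, payoff_table, num_agents)
                     for j in range(len(p))])
    avg = float(pure @ p)                # population average payoff u(p, p)
    p = np.clip(p + dt * (pure - avg) * p, 0.0, None)
    return p / p.sum()                   # renormalize against numerical drift
```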
We prefer that a dynamic model assume minimal in-
formational requirements for agents beyond their own ac-
tions and payoffs. The replicator dynamics equation implies
that agents know $u(p, p)$, a rather implausible level of information. However, we can obtain the same population dynamics with a more plausible “replication by imitation” model (Weibull 1995). In that model, an agent switches to the strategy of a randomly chosen opponent who appears to be receiving a higher payoff. Alternative models in which learning at the individual level leads to replicator dynamics have been discussed by Borgers and Sarin (1997).
We could interpret p, at a given time, as representing a symmetric mixed strategy for all N players in the game. With this interpretation, the fixed points of Equation 2 (where $\dot{p}_j = 0$ for all strategies j) correspond to Nash equilibria, and $u(p, p)$ and $u(e_j, p)$ are as defined in the equilibrium context. When strategy trajectories governed by Equation 2 converge to an equilibrium, the equilibrium is an attractor. However, these strategy trajectories do not necessarily terminate at fixed points. Indeed, there are many plausible payoff functions that generate limit cycle trajectories (Weibull 1995) or even chaotic trajectories (Sato, Akiyama, & Farmer 2001).
When multiple Nash equilibria exist, those that are attrac-
tors are clearly the only plausible equilibria within the evo-
lutionary model. With multiple attractors, those with larger
basins of attraction are more likely, assuming that every ini-
tial population state is equally likely. Alternatively, we can
use the basins of attraction to understand which initial pop-
ulation mixes will lead to which equilibrium. Strategy de-
signers who have an interest (e.g., fame or profits for selling
software implementing the strategy) in widespread adoption
of their strategies could then determine how much initial
adoption is necessary to lead to an equilibrium containing
a favorable ratio of their strategies.
For our analysis of two particular games (in the “Applica-
tions” section) we use the heuristic payoff table and Equa-
tion 2 to generate a large number of strategy trajectories,
starting from a broad distribution of initial strategy vectors
p. For three strategy choices, the resulting flows can be plot-
ted in a two-dimensional unit simplex and have an immedi-
ate visual interpretation.
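A short sketch of that trajectory generation (plotting omitted; the step count and stopping tolerance are our assumptions):

```python
# Sketch: follow Equation 2 from one initial mix until the motion stalls.
import numpy as np

def trajectory(p0, payoff_table, num_agents, max_steps=20000, tol=1e-12):
    path = [np.asarray(p0, dtype=float)]
    for _ in range(max_steps):
        nxt = replicator_step(path[-1], payoff_table, num_agents)
        if np.max(np.abs(nxt - path[-1])) < tol:   # p_dot ~ 0: a fixed point
            break
        path.append(nxt)
    return np.array(path)
```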
Perturbation of Payoffs
In our model, we have assumed a fixed set of exogenously
specified strategies. But because they are heuristic, rather
than game-theoretically computed, we should generally as-
sume that there could be many variations that engender
changes in performance. Because the possible variations
could potentially be infinite, we do not have a way to ac-
count for all variations with our methodology. Still, by per-
turbing the payoff table in some meaningful ways, we can
perform some directed study of plausible effects of certain
abstract changes in strategy behavior.
A question we may ask is how improving one strategy relative to the others would affect the equilibria and dynamics. We consider a simple model in which the agents playing strategy $\bar{\sigma}$ “steal” some fraction $\alpha \in [0, 1]$ of the payoff from the agents playing the other strategies. For each profile of strategies in the payoff table, and each strategy $\sigma \ne \bar{\sigma}$, where $\bar{n}$ agents play $\bar{\sigma}$, $n$ agents play $\sigma$, and $q(\sigma)$ is the payoff of strategy $\sigma$, we change the payoffs as follows:

$$q(\bar{\sigma}) \leftarrow q(\bar{\sigma}) + \alpha \min(n/\bar{n},\, 1)\, q(\sigma)$$
$$q(\sigma) \leftarrow \big(1 - \alpha \min(\bar{n}/n,\, 1)\big)\, q(\sigma)$$

Note that, for any profile, this perturbation conserves the total payoff to all agents.
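A sketch of this perturbation as reconstructed above (same hypothetical table layout as in the earlier sketches; sigma_bar is the index of the improved strategy):

```python
# Hedged sketch: "stealing" a fraction alpha of payoff for strategy sigma_bar.
def perturb(payoff_table, sigma_bar, alpha):
    perturbed = {}
    for counts, payoffs in payoff_table.items():
        q = list(payoffs)
        n_bar = counts[sigma_bar]
        if n_bar > 0:
            for s, n in enumerate(counts):
                if s == sigma_bar or n == 0:
                    continue
                stolen = alpha * min(n, n_bar)        # capped transfer rate
                q[sigma_bar] += (stolen / n_bar) * payoffs[s]
                q[s] -= (stolen / n) * payoffs[s]
        perturbed[counts] = tuple(q)
    return perturbed  # the transfer conserves total payoff for every profile
```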
While it may not actually be possible to uniformly im-
prove a strategy as we describe, the approach is suggestive
of the type of perturbation that we might reasonably con-
sider. Other possibilities could include taking surplus from
only one opponent strategy, or uniform improvements to all
strategies (which might occur if one strategy becomes more
adept at executing win-win actions). These kinds of per-
turbations could help direct efforts to improve a strategy’s
likelihood of being adopted and the payoffs in equilibrium.
Alternatively, we might be interested in estimating the effects of unmodelled variations in the strategies throughout the population, or performing sensitivity analysis on the estimates of expected payoffs. To do so, we could perturb individual payoff entries randomly, either statically or whenever they are used in Equation 2.
Applications
In this section we apply our methodology to two games
with complex, strategic interactions: an Automated Dy-
namic Pricing (ADP) game, and a Continuous Double Auc-
tion (CDA) game. We chose these games because of the in-
tractability of computing equilibria in the underlying games,
an existing body of literature which includes interesting
heuristic strategies, and the availability of simulators for
computing the heuristic payoff tables.
Automated Dynamic Pricing Game
Description of the ADP Game Interest in Internet com-
merce has fueled the emergence of software agents such as
shopbots that greatly facilitate comparison shopping by buyers. Shopbots may also enable seller agents called pricebots
to dynamically set posted prices based on competitor be-
havior. An example is the site buy.com, which monitors
its primary competitors’ prices and automatically undercuts
the lowest. Interactions among such pricebots can generate
rather complex price dynamics.
Models of interacting pricebots using a variety of pricing
strategies have been studied in (Greenwald & Kephart 1999;
Greenwald, Kephart, & Tesauro 1999). In these models, a
set of sellers offers a single homogeneous good to a much
larger set of buyers. At random times, over the course of a
large number of discrete time steps, sellers reset their prices
and buyers attempt to purchase. A buyer wishes to purchase
one unit of the good, preferring to pay lower prices not ex-
ceeding its value, which is randomly chosen from the uni-
form distribution in the unit interval. All sellers have the
same constant productioncost, and their objective is to max-
imize the product of their per-unit profit and the number of
sales. We assume that buyers use one of two simple rules
for seller selection. A fixed 50% of buyers select a seller at random, and the rest use a shopbot to find the current lowest-price seller.
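As an illustration only (our code, not the simulator used in the cited studies), one buyer's purchase decision under this model might look like:

```python
# Sketch of the ADP buyer model: 50% shop at random, 50% via shopbot.
import random

def buyer_purchase(prices):
    """Return the chosen seller's index, or None if no purchase occurs."""
    value = random.random()                        # valuation ~ U(0, 1)
    if random.random() < 0.5:
        seller = random.randrange(len(prices))     # random seller selection
    else:
        seller = min(range(len(prices)), key=prices.__getitem__)  # lowest price
    return seller if prices[seller] <= value else None
```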
We formulate a one-shot ADP Game of heuristic strate-
gies, abstracting the underlying game of repeated price-
setting by sellers. At the start of the game, sellers choose
one of three heuristic dynamic pricing strategies to use for
the duration of the game. The “game theory” (GT) strat-
egy (Greenwald & Kephart 1999) plays a mixed-strategy
Nash equilibrium computedfor the underlyinggame assum-
ingthat all pricing and purchasingdecisions aremade simul-
taneously. The “derivative-follower” (DF) strategy (Green-
wald & Kephart 1999) implements a simple hill-climbing
adaptation, experimenting with incremental price adjust-
ments to improve observed profitability, while ignoring as-
sumptions about buyers or competitors. The “No-Internal-
Regret” (NIR) strategy (Greenwald & Kephart 2001) adapts
learning techniques from Foster Vohra (1997) to adaptively
improveits pricing.
Following the simulation procedures in (Greenwald &
Kephart 1999; 2001), we computed heuristic payoff tables
for seller population sizes of 5 and 20 pricebots. Each table entry indicates the time-averaged payoff over 1 million time steps, with a single seller resetting its price at each time step.
Analysis of ADP Game Table 1 shows the Nash equilib-
ria for ADP Games with 5 and 20 pricebots. None of the
equilibria involve the DF pricing strategy, signifying its rel-
ative weakness. Among the equilibria in Table 1 only A is a
pure-strategy Nash equilibrium. It is also interesting to note
that the number of equilibria dwindles from three to one as
the size of the ADP Game increases from 5 to 20 pricebots.
[Figure 1 appears here: two unit-simplex flow plots, (a) and (b); the legend marks equilibrium points, basin boundaries, and pure-strategy vertices, with a gray $|\dot{p}|$ scale ranging from $1.7 \times 10^{-6}$ to $1.7 \times 10^{-2}$.]

Figure 1: (a) Replicator dynamics for the Automated Pricing Game with 5 pricebots. Points p in the simplex represent strategy mixes, with homogeneous populations labeled at the vertices. The trajectories in the simplex describe the motion of p following Equation 2. Open circles are Nash equilibria, corresponding to fixed points of Equation 2, with labels corresponding to those in Table 1. The dashed line denotes the boundary of the two basins of attraction. The gray shading is proportional to the magnitude of $\dot{p}$. (b) Replicator dynamics for the Automated Pricing Game with 20 pricebots.
Agents  Label  p(GT)  p(DF)  p(NIR)  Payoff
5       A      1.000  0.000  0.000   0.051
5       B      0.871  0.000  0.129   0.049
5       C      0.030  0.000  0.969   0.047
20      D      0.986  0.000  0.014   0.013

Table 1: The symmetric Nash mixed-strategy equilibria for the Automated Pricing Game with 5 and 20 pricebots. Each row is an equilibrium, showing the probabilities of choosing the high-level strategies (GT, DF, and NIR), and the expected equilibrium payoff. The labels of the equilibria correspond to those shown in Figures 1(a) and 1(b).
The replicator dynamics for the ADP Game are shown in Figures 1(a) and 1(b). The strategy space is represented by a two-dimensional unit simplex with vertices corresponding to the pure strategies p = (1, 0, 0) (all GT), p = (0, 1, 0) (all DF), and p = (0, 0, 1) (all NIR). Trajectories are obtained by starting from an initial point and applying Equation 2 repeatedly until $\dot{p} = 0$.
For the 5-agent case shown in Figure 1(a), points A, B,
and C are the Nash equilibria shown in Table 1. A and C are
attractors, while B is a saddle point. Therefore only A and
C can be reached asymptotically as B is unstable to small
fluctuations. Note that C, consisting almost entirely of the
NIR strategy, has a much larger basin of attraction than the
pure-GT point A, suggesting it to be the most likely out-
come. Although it is not correct to refer to NIR as the “best”
strategy, its attraction is the strongest in the 5-agent game.
In the 20-agent case shown in Figure 1(b), however, we
find a surprising reversal: there is now only one Nash equi-
librium, consisting almost entirely of GT. NIR is much
weaker, relative to GT, as compared to the 5-agent game.
NIR nevertheless can play a crucial role in determining the global flow in the strategy simplex. While all trajectories terminate at D, a significant fraction of them pass close to the pure-NIR vertex. The light shading in this vicinity indicates a very small magnitude of $\dot{p}$. This implies that even though
all-NIR is not a Nash equilibrium, the population can spend
a significantly long time with most agents adopting the NIR
strategy.
Perturbation analysis of the population dynamics, with 20 sellers, $\bar{\sigma}$ = NIR, $\sigma$ = GT, and $\alpha = 0.06$, results in the emergence of two new Nash equilibrium points (one an attractor and another unstable), consisting solely of a mix of GT and NIR strategies. When $\alpha = 0.0675$, there is an attractor equilibrium with a majority of NIR and a basin of attraction covering more than half of the simplex. Further increasing $\alpha$ progressively decreases D's basin of attraction. By the point at which $\alpha$ increases to 0.1, D disappears completely and a single Nash equilibrium point remains near the pure-strategy NIR vertex. The resulting simplex flow is similar to that in Figure 1(b), but with the positions of the pure-strategy vertices GT and NIR interchanged. In short, NIR would start becoming a strong strategy with a 6.75% improvement, and would become nearly dominant with a 10% improvement.
The weakness of NIR in the 20-agent game was quite
unexpected to us, given the strength it exhibited in a pre-
vious study using up to five agents (Greenwald & Kephart
2001), and in the present 5-agent game. This demonstrates
the value of performing equilibrium and dynamic analysis
on a relatively large number of agents. The results here have suggested to us a number of avenues for deeper study and development of the strategies, further demonstrating the value of our methodology. Perhaps NIR, which has several parameters, needs to be retuned for populations of different sizes. Alternatively, our results may show that it is more difficult for NIR to learn when playing against a greater number of agents. Whatever conclusions could be reached, we have already gained a greater understanding with our methodology than we could with a simpler analysis.
Continuous Double Auction Game
Description of the CDA Game The CDA is the predom-
inant mechanism used for trading in financial markets such
as NASDAQ and NYSE. A CDA continually accepts bids,
which either immediately match pending bids, or remain
standing until matched by later bids. Models of CDAs have
been extensively studied using both human traders (Smith
1982) and computerized traders (Rust, Miller, & Palmer
1993; Cliff & Bruten 1997; Gjerstad & Dickhaut 1998).
Based on these, we adopt a model in which agents trade in a
CDA marketplace for five consecutive trading periods, with
a fresh supply of cash or commodity provided at the start of
each period.
We implement the CDA marketplace and simulation as described in detail by Tesauro and Das (2001), except for the details of choosing buyer/seller roles and limit prices, as de-
scribed here. At the start of the game, half of the agents are
randomly chosen to be buyers, and the remainder are sell-
ers. Agents are given a list of ten limit prices (seller costs or
buyer values) generated from a known random distribution.
This distribution uses fixed parameters to generate lower and upper bounds on the limit prices from a uniform distribution, and then generates the limit prices using a uniform distribution between these two bounds. For each run of the game, we randomly select the integer lower bound b of all buyers' prices uniformly from [61, 160] and the upper bound from [b + 60, b + 209]. We compute the bounds similarly for sellers. The payoff for trading a unit i is $s_i = x_i - l_i$ for sellers and $s_i = l_i - x_i$ for buyers, where $x_i$ is the trade price and $l_i$ is the unit's limit price. The total payoff obtained by an agent in the CDA Game is $\sum_i s_i$. If the efficient number of trades (i.e., the number that maximizes value summed over all agents) at these limit prices is less than 10 or more than 90, we recalculate the bounds and limit prices.
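A sketch of the limit-price generation just described (the efficient-trade check that triggers recalculation is omitted, and drawing the upper bound as an integer is our assumption):

```python
# Sketch: draw one agent's ten limit prices for a run of the CDA Game.
import random

def limit_prices(num_units=10):
    lower = random.randint(61, 160)                   # integer lower bound b
    upper = random.randint(lower + 60, lower + 209)   # upper bound in [b+60, b+209]
    return [random.uniform(lower, upper) for _ in range(num_units)]
```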
We formulate a normal-form CDA Game of heuristic
strategies, abstracting the underlying game of continuous
bidding. Each agent chooses a strategy from a set of three
alternatives at the start of the game, and does not change
during the game. The “Zero-Intelligence Plus” (ZIP) strat-
egy we use (Tesauro & Das 2001) is a modified version of
that studied by Cliff and Bruten (1997). ZIP initially bids
to obtain a high surplus value (or profit) and subsequently adjusts its bid price towards the price of any observed trades, or in the direction of improvement when no trades
have occurred after a period of time. The Gjerstad-Dickhaut
(GD) strategy we use (Tesauro & Das 2001), is a modified
version of the original (Gjerstad & Dickhaut 1998). GD cal-
culates a heuristic “belief” function based on the history of
recent market activity, and places bids to maximize the expected payoff, given the belief function. The Kaplan strategy (Rust, Miller, & Palmer 1993) withholds bids until the bid/ask spread decreases to a sufficiently small amount or the end of a period is near.

Size  Label  p(ZIP)  p(GD)  p(Kaplan)  Payoff
14    -      0.420   0.000  0.580      0.967
20    A      0.439   0.000  0.561      0.972
20    B      0.102   0.542  0.356      0.991
20    C      0.000   0.690  0.310      0.989

Table 2: The symmetric Nash mixed-strategy equilibria for the CDA Game with 14 and 20 agents. Each row is an equilibrium, showing the probabilities of choosing the high-level strategies (ZIP, GD, and Kaplan), and the expected equilibrium payoff. The labels of the equilibria correspond to those shown in Figure 2(a).
We compute the heuristic-payoff table by averaging the
results from 2500 simulations for each profile of strategies.
Analysis of the CDA Game Previous studies show that
the choice of strategy in the CDA Game from amongst the
alternatives of ZIP, GD and Kaplan is an interesting prob-
lem without an obvious solution. Kaplan was the winner
of the Santa Fe Double Auction Tournament (Rust, Miller,
& Palmer 1993). However, the Kaplan strategy does not
perform well against itself, and must be parasitic on the in-
telligent bidding behavior of other agent strategies to ob-
tain decent profits. A recent analysis (Tesauro & Das 2001) shows that various published bidding strategies all give good empirical performance and that none is dominant. The ho-
mogeneous pure-strategy populations are unstable to defec-
tion: all-ZIP and all-GD can be invaded by Kaplan, and all-
Kaplan can be invaded by ZIP or GD. Hence the Nash equi-
libria are difficult to compute by inspection or other simple
means.
We applied our solution method to the CDA Game with
various numbers of agents. We show the equilibria for 14
and 20 agents in Table 2. CDA Games with 6, 12, and 14 agents each result in only one Nash equilibrium, with similar mixed-strategy vectors $p^*$ that assign zero probability to choosing the GD strategy. The results were qualitatively different for CDA Games with larger populations. For 16, 18,
and 20 agent games we found three equilibria, very similar
for each number of agents. One of these equilibria matched
the small-population equilibrium, and there are two addi-
tional equilibria, one involving only GD and Kaplan, and
one using all three strategies.
We plot the replicator dynamics for the 20-agent CDA
Game in Figure 2(a). The strategy space is represented by
a two-dimensional unit simplex with vertices corresponding
to the pure strategies p = (1, 0, 0) (all ZIP), p = (0, 1, 0) (all GD), and p = (0, 0, 1) (all Kaplan). The points labeled A,
B, and C are the Nash equilibria shown in Table 2. A and
C are both attractors, while B is a saddle point, hence only
A and C are realistic outcomes. We also note that A has
a much larger basin of attraction than C. If the initial p is
chosen randomly, the population mix is most likely to terminate at A even though C has higher population payoff.

[Figure 2 appears here: two unit-simplex flow plots, (a) with equilibria A-D and (b) with perturbed-game equilibria A'-E'; the gray $|\dot{p}|$ scale ranges from $4 \times 10^{-6}$ to $4 \times 10^{-2}$.]

Figure 2: (a) Replicator dynamics for the CDA Game. Open circles are Nash equilibria with labels corresponding to those in Table 2. Other notations are similar to those in Figure 1. (b) Replicator dynamics for the CDA Game with perturbed payoffs, in which 5% of the ZIP and Kaplan agent payoffs was shifted to GD agents.

An
additional point D is shown which is an equilibrium of ZIP
and GD in the two-strategy case, but which has incentive to
defect to Kaplan in the full three-strategy game.
We also observed the replicator dynamics for the 14-agent CDA Game. There we found that the single equilibrium was a stable attractor to which all flows converged. The gray shading in Figure 2(a), which denotes $|\dot{p}|$, indicates that the population changes much more rapidly near
the lower-left corner (mostly Kaplan) than near the lower
right corner (mostly GD). This shows that there is only a
slight incentive to deviate to Kaplan in an all-GD popula-
tion, and a much larger incentive to switch to ZIP or GD in
an all-Kaplan population. Note that $|\dot{p}|$ can vary by up to four orders of magnitude. In particular, the magnitudes of the flows leading away from B are much smaller than the average $|\dot{p}|$ in the simplex. Hence, although B is an unstable
equilibrium, the population could actually spend a relatively
long time in the region of B. The change in this region could
be so slow as to appear stable if the base time scale of deci-
sion making is not sufficiently fast.
We studied the sensitivity of the population dynamics for the CDA Game by simultaneously improving $\bar{\sigma}$ = GD, relative to $\sigma$ = ZIP and $\sigma$ = Kaplan, by $\alpha = 0.05$. The replicator dynamics of the perturbed CDA Game, shown in Figure 2(b), exhibits a significant change in the topology of the flows. The right edge of the simplex, corresponding to a mix of GD agents and ZIP agents, is now stable against invasion by Kaplan agents. As a consequence, D′ is now a Nash equilibrium point, and due to the global topology of flows, a new interior equilibrium point occurs at E′. The equilibrium point C has moved to the vertex of pure GD (now labeled C′). Only A′ and C′ are stable equilibria.
Although pure GD is still not dominant, nor the only attractor, C′ captures much of the simplex in its basin of attraction, making it the most likely attractor in the perturbed CDA Game. Thus, if GD could be improved to capture an extra 5% of other agents' surpluses, it would likely be widely adopted in the population. Moreover, although the payoffs are actually not perturbed at C′ (because it is a pure strategy), we measured that the payoff there is higher than at the other perturbed and non-perturbed equilibrium points. We also searched the range of improvements for GD and found that an equilibrium containing GD captures most of the simplex in its basin of attraction when $\alpha \ge 0.0075$. In short, GD would start becoming a strong strategy with as little as a 0.75% improvement, and would become nearly dominant with a 5% improvement.
Discussion
We have proposed an approach for analyzing heuristic
strategies for complex games with repeated interactions.
Our methodology treats the heuristic strategies, rather than
the component atomic actions, as primitive, and computes
expected payoffs to agents as a function of the joint strat-
egy space. With our approach, we can draw useful and gen-
eral conclusions for games that defy analysis at the level of
atomic actions.
We have shown how to apply our methodology to two
games whose complexity has thus far defied game-theoretic
analysis at the level of atomic actions. For each, we found
multiple Nash equilibria. To address the issue of how a par-
ticular equilibrium may be realized, we computed the dy-
namics of a population in terms of the change of propor-
tional shares of strategies. We argued that dynamically un-
stable equilibria will not be realized, and that the attractor
equilibrium with the largest basin of attraction was the most
likely to be played in steady-state, while noting the effects
of time scale on convergence. We also examined perturba-
tions of the expected payoffs to identify how modest im-
provements to one strategy could significantly change the
dynamic properties and the set of equilibria. For each appli-
cation we discovered interesting and surprising results not
apparent from the simpler analyses commonly applied to
heuristic strategies.
Our approach is more principled and complete than the
feasible methods commonly applied to the analysis of com-
plex, strategic interactions. Still, more work is needed to
provide a full understanding of these interactions. While
we have touched on it with perturbation analysis, a deeper
understanding of how to design and improve the strategies
themselves requires a bridging of action-level and heuristic-
level analysis. We have made some preliminary progress
along these lines in the CDA game by a direct study of the
pure payoff table. There we found that studying the region where a strategy fares most poorly against others, combined with a deep understanding of the strategies themselves, can provide inspiration for strategy improvements.
Our general approach would also be improved with ad-
vances in the specific techniques employed. New learn-
ing algorithms should provide improved equilibrium con-
vergence in small populations. As mentioned above, we are
exploring techniques to minimize the amount of payoff ta-
ble computation necessary to accurately determine equilib-
ria. We hope this will make our methodology feasible for
analysis of the top TAC strategies. Additionally, we could
also perform sensitivity analysis for those games requiring
expensive simulation.
Our modeling assumptions and the computational tech-
niques we employ give us the ability to analyze relatively
large numbers of agents. This in turn allows us to observe
qualitatively different behaviors as the number of agents
grows. In some cases it may be reasonable to extrapolate our
results to even larger numbers of agents, beyond our ability
to directly compute.
The most important computational limitation of our methodology is an exponential dependence on the number of high-level strategies. This would seem to limit its applicability to real-world domains where there are potentially many heuristic strategies. This apparent limitation may be surmountable if the numerous heuristic strategies can be placed into a small number of broad functional groups, and if variations within a group are not as important as the total population fraction within each group. For example, a reasonable approximation for financial markets may be to classify the available trading strategies as either “buy-and-hold,” “fundamental,” or “technical” strategies, and then carry out our analysis. To allow for algorithmic variations within each group, we could randomly choose between variations for each simulation when computing the payoff table.
Acknowledgments
We thank David Parkes for his helpful comments.
References
Axelrod, R. 1997. Evolving new strategies: The evolu-
tion of strategies in the iterated prisoner’s dilemma. In The
Complexity of Cooperation. Princeton University Press.
Borgers, T., and Sarin, R. 1997. Learning through rein-
forcement and replicator dynamics. Journal of Economic
Theory 77:1–14.
Cliff, D., and Bruten, J. 1997. Minimal-intelligence agents
for bargaining behaviors in market-based environments.
Technical Report HPL-97-91, Hewlett Packard Labs.
Foster, D., and Vohra, R. 1997. Regret in the on-line deci-
sion problem. Games and Economic Behavior 21:40–55.
Freund, Y., and Schapire, R. E. 1996. Game theory, on-line prediction and boosting. In Ninth Annual Conference on Computational Learning Theory.
Fudenberg, D., and Kreps, D. M. 1993. Learning mixed
equilibria. Games and Economic Behavior 5:320–367.
Fudenberg, D., and Levine, D. K. 1993. Steady state learning and Nash equilibrium. Econometrica 61(3):547–573.
Gjerstad, S., and Dickhaut, J. 1998. Price formation in
double auctions. Games and Economic Behavior 22:1–29.
Greenwald, A., and Kephart, J. 1999. Shopbots and price-
bots. In Proceedings of Sixteenth International Joint Con-
ference on Artificial Intelligence, volume 1, 506–511.
Greenwald, A., and Kephart, J. O. 2001. Probabilistic
pricebots. In Fifth International Conference on Autonomous
Agents, 560–567.
Greenwald, A., and Stone, P. 2001. Autonomous bid-
ding agents in the trading agent competition. IEEE Internet
Computing 5(2):52–60.
Greenwald, A.; Kephart, J.; and Tesauro, G. 1999. Strate-
gic pricebot dynamics. In Proceedings of First ACM Con-
ference on E-Commerce, 58–67.
Hu, J., and Wellman, M. P. 1998. Multiagent reinforce-
ment learning: theoretical framework and an algorithm. In
Proceedings of the Fifteenth International Conference on
Machine Learning, 242–250. Morgan Kaufmann.
Jennings, N. R., and Wooldridge, M. 2002. Agent-oriented
software engineering. In Bradshaw, J., ed., Handbook of
Agent Technology. AAAI/MIT Press.
Jordan, J. S. 1993. Three problems in learning mixed-
strategy Nash equilibria. Games and Economic Behavior
5:368–386.
Littman, M. 1994. Markov games as a framework for
multi-agent reinforcement learning. In Proceedings of the
Eleventh International Conference on Machine Learning,
157–163. Morgan Kaufmann.
McKelvey, R. D., and McLennan, A. 1996. Computation of
equilibria in finite games. In Handbook of Computational
Economics, volume 1. Elsevier Science B. V.
McKelvey, R. D.; McLennan, A.; and Turocy, T. 2000.
Gambit Command Language: Version 0.96.3.
Milgrom, P. 2000. Putting auction theory to work: The
simultaneous ascending auction. The Journal of Political
Economy 108(2):245–272.
Press, W.; Teukolsky, S.; Vetterling, W.; and Flannery, B.
1992. Numerical Recipes in C. Cambridge University Press.
Rust, J.; Miller, J.; and Palmer, R. 1993. Behavior of
trading automata in a computerized double auction market.
In Friedman, D., and Rust, J., eds., The Double Auction
Market: Institutions, Theories, and Evidence. Addison-
Wesley.
Sato, Y.; Akiyama, E.; and Farmer, J. D. 2001. Chaos
in learning a simple two person game. Technical Report
01-09-049, Santa Fe Institute.
Singh, S.; Kearns, M.; and Mansour, Y. 2000. Nash con-
vergence of gradient dynamics in general-sum games. In
Proceedings of UAI-2000, 541–548. Morgan Kaufmann.
Smith, V. L. 1982. Microeconomic systems as an experi-
mental science. American Economic Review 72:923–955.
Tesauro, G., and Das, R. 2001. High-performance bidding
agents for the continuous double auction. In Third ACM
Conference on Electronic Commerce, 206–209.
Watkins, C. 1989. Learning from Delayed Rewards. Ph.D.
Dissertation, Cambridge University, Cambridge.
Weibull, J. W. 1995. Evolutionary Game Theory. The MIT
Press.
Wellman, M. P.; Wurman, P. R.; O’Malley, K.; Bangera,
R.; Lin, S.; Reeves, D.; and Walsh, W. E. 2001. Designing
the market game for a trading agent competition. IEEE
Internet Computing 5(2):43–51.
... Given this surprising recent result, there is an appetite for further zero intelligence ACE-style market simulation studies involving GVWY and SHVR. One compelling issue to explore is the co-adaptive dynamics of markets populated by traders that can choose to play one of the three strategies from GVWY, SHVR, and ZIC, in a manner similar to that studied by Walsh et al. (2002) who employed 'replicator dynamics' modelling techniques borrowed from theoretical evolutionary biology to explore the co-adaptive dynamics of markets populated by traders that could choose between SNPR, ZIP, and GD. ...
... A primary motivation for studying such co-evolutionary markets with adaptive PRZI traders is the desire to move beyond prior studies of markets populated by adaptive automated traders in which the "adaptation" merely involves selecting between one of typically only two or three fixed strategies (as in, for example, (Walsh et al. 2002;Vytelingum et al. 2008;Vach 2015)). The aim here is to create minimal model markets in which the space of possible ZI strategies is infinite, as a better approximation to the situation in real financial markets with high degrees of automated trading. ...
... homogeneous); or (N T − 1) :1 (i.e. OIM); or A method by which that open question could be resolved was developed by Walsh et al. (2002) who borrowed the technique of replicator dynamics analysis (RDA) from evolutionary game theory (see, for example, Maynard Smith (1982)). In a typical RDA, the population of traders is initiated with some particular ratio of the N S strategies being compared, and the traders are allowed to interact in the market as per usual, but every now and again an individual trader will be selected via some stochastic process and will be allowed to mutate its current strategy S i to one of the other available strategies S j≠i if that new strategy appears to be more profitable than S i . ...
Article
Full-text available
I introduce parameterised response zero intelligence (PRZI), a new form of zero intelligence (ZI) trader intended for use in simulation studies of the dynamics of continuous double auction markets. Like Gode and Sunder’s classic ZIC trader, PRZI generates quote prices from a random distribution over some specified domain of discretely valued allowable quote prices. Unlike ZIC, which uses a uniform distribution to generate prices, the probability distribution in a PRZI trader is parameterised in such a way that its probability mass function (PMF) is determined by a real-valued control variable s in the range [-1.0,+1.0][1.0,+1.0][-1.0, +1.0] that determines the strategy for that trader. When s=0s=0, a PRZI trader behaves identically to the ZIC strategy, with a uniform PMF; but when s≈±1s±1s \approx \pm 1 the PRZI trader’s PMF becomes maximally skewed to one extreme or the other of the price range, thereby making it more or less “urgent” in the prices that it generates, biasing the quote price distribution towards or away from the trader’s limit price. To explore the co-evolutionary dynamics of populations of PRZI traders that dynamically adapt their strategies, I show initial results from long-term market experiments in which each trader uses a simple stochastic hill-climber algorithm to repeatedly evaluate alternative s-values and choose the most profitable at any given time. In these experiments the profitability of any particular s-value may be non-stationary because the profitability of one trader’s strategy at any one time can depend on the mix of strategies being played by the other traders at that time, which are each themselves continuously adapting. Results from these market experiments demonstrate that the population of traders’ strategies can exhibit rich dynamics, with periods of stability lasting over hundreds of thousands of trader interactions interspersed by occasional periods of change. Python source code for PRZI traders, and for the stochastic hill-climber, have been made publicly available on GitHub.
... Empirical Game Theory Analysis (EGTA), deploying empirical or meta-games [15] [20], [8], [26], can be used to evaluate learning agents that interact in large-scale multiagent systems [13], [22] [23]. In our case, aiming at strategy profile rankings, we define the empirical game strategies based on agents' styles of play and train policies realizing these strategies. ...
Preprint
Game-theoretic solution concepts, such as the Nash equilibrium, have been key to finding stable joint actions in multi-player games. However, it has been shown that the dynamics of agents' interactions, even in simple two-player games with few strategies, are incapable of reaching Nash equilibria, exhibiting complex and unpredictable behavior. Instead, evolutionary approaches can describe the long-term persistence of strategies and filter out transient ones, accounting for the long-term dynamics of agents' interactions. Our goal is to identify agents' joint strategies that result in stable behavior, being resistant to changes, while also accounting for agents' payoffs, in dynamic games. Towards this goal, and building on previous results, this paper proposes transforming dynamic games into their empirical forms by considering agents' strategies instead of agents' actions, and applying the evolutionary methodology α-Rank to evaluate and rank strategy profiles according to their long-term dynamics. This methodology not only allows us to identify joint strategies that are strong through agents' long-term interactions, but also provides a descriptive, transparent framework regarding the high ranking of these strategies. Experiments report on agents that aim to collaboratively solve a stochastic version of the graph coloring problem. We consider different styles of play as strategies to define the empirical game, and train policies realizing these strategies, using the DQN algorithm. Then we run simulations to generate the payoff matrix required by α-Rank to rank joint strategies.
... This approach allows us to represent a strategy profile as a vector containing the count of players choosing each strategy, thereby reducing the number of profiles from 3^10 to C(10 + |S| − 1, 10) = 66. We then replace the original payoff matrix with a heuristic payoff table (HPT) [28, 32], where the payoffs of each meta strategy are stored as a function of the number of players using it. However, despite the players' probabilities being drawn from the same prior distribution (5), their realized access may differ, leading to variability in payoffs even for identical strategy choices. ...
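The profile count quoted above is just the number of multisets of size 10 drawn from |S| = 3 strategies, which is quick to verify:

```python
from math import comb

n_players, n_strategies = 10, 3
print(n_strategies ** n_players)                       # 59049 ordered profiles
print(comb(n_players + n_strategies - 1, n_players))   # 66 unordered profiles
```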
Preprint
The Ethereum block production process has evolved with the introduction of an auction-based mechanism known as Proposer-Builder Separation (PBS), allowing validators to outsource block production to builders and reap Maximal Extractable Value (MEV) revenue from builder bids in a decentralized market. In this market, builders compete in MEV-Boost auctions to have their blocks selected and earn potential MEV rewards. This paper employs empirical game-theoretic analysis to explore builders' strategic bidding incentives in MEV-Boost auctions, focusing on how advantages in network latency and access to MEV opportunities affect builders' bidding behaviors and auction outcomes. Our findings confirm an oligopolistic dynamic, where a few dominant builders, leveraging their advantages in latency and MEV access, benefit from an economy of scale that reinforces their market power, leading to increased centralization and reduced auction efficiency. Our analysis highlights the importance of fair MEV distribution among builders and the ongoing challenge of enhancing decentralization in the Ethereum block building market.
... A metric space can be useful in performing perturbation analysis (as hinted in Section 6.4). It is not uncommon for payoffs to be estimated from data, such as in empirical game-theoretic analysis (EGTA) (Walsh et al., 2002; Wellman, 2006). Such payoffs have uncertainty, and small changes in the payoffs can cause large changes to the resulting equilibria. ...
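A toy illustration of that sensitivity, under assumptions of my own (a hypothetical symmetric 2×2 game with two nearly tied payoff entries): re-estimating the payoffs with small noise can change which pure-strategy equilibria exist.

```python
import itertools
import random

def pure_equilibria(A, B):
    """Pure-strategy Nash equilibria of a 2x2 bimatrix game
    (A: row player's payoffs, B: column player's payoffs)."""
    return [(r, c)
            for r, c in itertools.product(range(2), range(2))
            if A[r][c] >= A[1 - r][c] and B[r][c] >= B[r][1 - c]]

base = [[1.00, 0.00], [1.01, 0.00]]   # two entries in column 0 nearly tied
for trial in range(5):
    # Perturb each estimated payoff with small Gaussian noise.
    A = [[base[r][c] + random.gauss(0, 0.02) for c in range(2)] for r in range(2)]
    B = [[A[c][r] for c in range(2)] for r in range(2)]   # symmetric game
    print(pure_equilibria(A, B))   # equilibrium set varies across draws
```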
Preprint
Equilibrium solution concepts of normal-form games, such as Nash equilibria, correlated equilibria, and coarse correlated equilibria, describe the joint strategy profiles from which no player has incentive to unilaterally deviate. They are widely studied in game theory, economics, and multiagent systems. Equilibrium concepts are invariant under certain transforms of the payoffs. We define an equilibrium-inspired distance metric for the space of all normal-form games and uncover a distance-preserving equilibrium-invariant embedding. Furthermore, we propose an additional transform which defines a better-response-invariant distance metric and embedding. To demonstrate these metric spaces we study 2×2 games. The equilibrium-invariant embedding of 2×2 games has an efficient two-variable parameterization (a reduction from eight), where each variable geometrically describes an angle on a unit circle. Interesting properties can be spatially inferred from the embedding, including: equilibrium support, cycles, competition, coordination, distances, best-responses, and symmetries. The best-response-invariant embedding of 2×2 games, after considering symmetries, rediscovers a set of 15 games, and their respective equivalence classes. We propose that this set of game classes is fundamental and captures all possible interesting strategic interactions in 2×2 games. We introduce a directed graph representation and name for each class. Finally, we leverage the tools developed for 2×2 games to develop game theoretic visualizations of large normal-form and extensive-form games that aim to fingerprint the strategic interactions that occur within.
... Several studies adopting game-theoretic methods to solve air traffic management problems can be found in the literature [11], [18]-[23]. However, these previous studies reported that finding Nash equilibria for problems involving more than two agents is complicated [24]-[27]. There is no guarantee of the existence of a finite number of Nash equilibria for an arbitrary system. ...
Article
This study introduces a decentralized traffic management algorithm for free-flowing aerial vehicles without explicit communications. The algorithm, based on the concept of Nash equilibrium, works to improve air traffic management efficiency while considering the different urgency levels of agents, such as unmanned aerial vehicles. An algorithm for a relatively simple two-agent case was developed first. Then, we extended the algorithm to a more general algorithm involving many agents. Numerical experiments comparing the performance of the proposed algorithm with the baseline adopted in current practice were conducted to demonstrate the effectiveness of the new approach.
... Critically, this policy will need to be recomputed each time the action-value is updated during training and, for large or continuous state-space Markov games, every time the agents need to take an action. Another class of multiagent algorithms is those in the space of Empirical Game Theoretic Analysis (EGTA) [60, 61], including PSRO [35, 44], JPSRO [40], and NeuPL [38, 39]. These algorithms are capable of training policies in extensive-form games, and require finding equilibria of empirically estimated normal-form games as a subroutine (the "meta-solver" step). ...
Preprint
Solution concepts such as Nash Equilibria, Correlated Equilibria, and Coarse Correlated Equilibria are useful components for many multiagent machine learning algorithms. Unfortunately, solving a normal-form game could take prohibitive or non-deterministic time to converge, and could fail. We introduce the Neural Equilibrium Solver which utilizes a special equivariant neural network architecture to approximately solve the space of all games of fixed shape, buying speed and determinism. We define a flexible equilibrium selection framework, that is capable of uniquely selecting an equilibrium that minimizes relative entropy, or maximizes welfare. The network is trained without needing to generate any supervised training data. We show remarkable zero-shot generalization to larger games. We argue that such a network is a powerful component for many possible multiagent algorithms.
Chapter
We report on a series of experiments in which we study the coevolutionary "arms-race" dynamics among groups of agents that engage in adaptive automated trading in an accurate model of contemporary financial markets. At any one time, every trader in the market is trying to make as much profit as possible given the current distribution of different other trading strategies that it finds itself pitched against in the market; but the distribution of trading strategies and their observable behaviors is constantly changing, and changes in any one trader are driven to some extent by the changes in all the others. Prior studies of coevolutionary dynamics in markets have concentrated on systems where traders can choose one of a small number of fixed pure strategies, and can change their choice occasionally, thereby giving a market with a discrete phase-space, made up of a finite set of possible system states. Here we present first results from two independent sets of experiments, where we use minimal-intelligence trading-agents but in which the space of possible strategies is continuous and hence infinite. Our work reveals that by taking only a small step in the direction of increased realism we move immediately into high-dimensional phase-spaces, which then present difficulties in visualising and understanding the coevolutionary dynamics unfolding within the system. We conclude that further research is required to establish better analytic tools for monitoring activity and progress in co-adapting markets. We have released relevant Python code as open-source on GitHub, to enable others to continue this work. Keywords: Financial markets, Agent-based computational economics, Coadaptation, Coevolution, Automated trading, Market dynamics
Article
Multiagent policy evaluation and seeking are long-standing challenges in developing theories for multiagent reinforcement learning (MARL), due to multidimensional learning goals, nonstationary environment, and scalability issues in the joint policy space. This article introduces two metrics grounded on a game-theoretic solution concept called sink equilibrium, for the evaluation, ranking, and computation of policies in multiagent learning. We adopt strict best response dynamics (SBRDs) to model selfish behaviors at a meta-level for MARL. Our approach can deal with dynamical cyclical behaviors (unlike approaches based on Nash equilibria and Elo ratings), and is more compatible with single-agent reinforcement learning than α-rank, which relies on weakly better responses. We first consider settings where the difference between the largest and second largest equilibrium metric has a known lower bound. With this knowledge, we propose a class of perturbed SBRD with the following property: only policies with maximum metric are observed with nonzero probability for a broad class of stochastic games with finite memory. We then consider settings where the lower bound for the difference is unknown. For this setting, we propose a class of perturbed SBRD such that the metrics of the policies observed with nonzero probability differ from the optimal by any given tolerance. The proposed perturbed SBRD addresses the scalability issue and opponent-induced nonstationarity by fixing the strategies of others for the learning agent, and uses empirical game-theoretic analysis to estimate payoffs for each strategy profile obtained due to the perturbation.
Article
This report describes simple mechanisms that allow autonomous software agents to engage in bargaining behaviors in market-based environments. Groups of agents with such mechanisms could be used in applications including market-based control, internet commerce, and economic modelling. After an introductory discussion of the rationale for this work, and a brief overview of key concepts from economics, work in market-based control is reviewed to highlight the need for bargaining agents. Following this, the early experimental economics work of Smith (1962) and the recent results of Gode and Sunder (1993) are described.
Article
Ph.D. thesis, King's College, Cambridge, 1989. Photocopy supplied by the British Library.
Article
We investigate the problem of learning to play the game of rock-paper-scissors. Each player attempts to improve her/his average score by adjusting the frequency of the three possible responses, using reinforcement learning. For the zero sum game the learning process displays Hamiltonian chaos. Thus, the learning trajectory can be simple or complex, depending on initial conditions. We also investigate the non-zero sum case and show that it can give rise to chaotic transients. This is, to our knowledge, the first demonstration of Hamiltonian chaos in learning a basic two-person game, extending earlier findings of chaotic attractors in dissipative systems. As we argue here, chaos provides an important self-consistency condition for determining when players will learn to behave as though they were fully rational. That chaos can occur in learning a simple game indicates one should use caution in assuming real people will learn to play a game according to a Nash equilibrium strategy.
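The cycling that underlies such results is already visible in the simplest population-level model: discrete-time replicator dynamics on zero-sum rock-paper-scissors orbit the mixed equilibrium (1/3, 1/3, 1/3) rather than converging to it. A minimal sketch follows (Euler integration, so the orbit drifts slightly over very long runs):

```python
import numpy as np

A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)   # row payoffs for R, P, S

x = np.array([0.5, 0.3, 0.2])   # initial mix of the three responses
dt = 0.01
for _ in range(50_000):
    fitness = A @ x
    x += dt * x * (fitness - x @ fitness)   # replicator update
    x = np.clip(x, 1e-12, None)
    x /= x.sum()
print(x)   # a point on a cycle around (1/3, 1/3, 1/3), not the equilibrium
```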
Article
The authors study the steady states of a system in which players learn about the strategies their opponents are playing by updating their Bayesian priors in light of their observations. Players are matched at random to play a fixed extensive-form game and each player observes the realized actions in his own matches but not the intended off-path play of his opponents or the realized actions in other matches. Because players are assumed to live finite lives, there are steady states in which learning continually takes place. If lifetimes are long and players are very patient, the steady state distribution of actions approximates those of a Nash equilibrium. Copyright 1993 by The Econometric Society.
Chapter
We review the current state of the art of methods for numerical computation of Nash equilibria for finite n-person games. Classical path-following methods, such as the Lemke-Howson algorithm for two-person games, and Scarf-type fixed-point algorithms for n-person games, provide globally convergent methods for finding a sample equilibrium. For large problems, methods which are not globally convergent, such as sequential linear complementarity methods, may be preferred on the grounds of speed. None ...
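General bimatrix games need path-following methods like Lemke-Howson, but the two-person zero-sum special case reduces to a linear program, which gives a compact worked instance of "finding a sample equilibrium". The game matrix below is an arbitrary choice for illustration:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[ 3.0, -1.0],
              [-2.0,  1.0]])   # row player's payoffs in a zero-sum game

m, n = A.shape
# Maximize v subject to (A^T x)_j >= v for all columns j, sum(x) = 1,
# x >= 0, with variable vector [x_1, ..., x_m, v].
c = np.zeros(m + 1); c[-1] = -1.0                    # linprog minimizes -v
A_ub = np.hstack([-A.T, np.ones((n, 1))])            # v - (A^T x)_j <= 0
b_ub = np.zeros(n)
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = np.array([1.0])
bounds = [(0, None)] * m + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x[:m], res.x[-1])   # equilibrium mix (3/7, 4/7), game value 1/7
```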
Conference Paper
This paper employs ideas from genetics to study the evolution of strategies in games. In complex environments, individuals are not fully able to analyze the situation and calculate their optimal strategy. Instead they can be expected to adapt their strategy over time based upon what has been effective and what has not. The genetic algorithm is demonstrated in the context of a rich social setting, the environment formed by the strategies submitted to a prisoner’s dilemma computer tournament. The results of the evolutionary process show that the genetic algorithm has a remarkable ability to evolve sophisticated and effective strategies in a complex environment.
Book
This text introduces current evolutionary game theory--where ideas from evolutionary biology and rationalistic economics meet--emphasizing the links between static and dynamic approaches and noncooperative game theory. The author provides an overview of the developments that have taken place in this branch of game theory, discusses the mathematical tools needed to understand the area, describes both the motivation and intuition for the concepts involved, and explains why and how the theory is relevant to economics.
Article
We study learning processes for finite strategic-form games, in which players use the history of past play to forecast play in the current period. In a generalization of fictitious play, we assume only that players asymptotically choose best responses to the historical frequencies of opponents' past play. This implies that if the stage-game strategies converge, the limit is a Nash equilibrium. In the basic model, play seems unlikely to converge to a mixed-strategy equilibrium, but such convergence is natural when the stage game is perturbed in the manner of Harsanyi's purification theorem. Journal of Economic Literature Classification Number: C72.
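A compact sketch of classical (simultaneous-move) fictitious play on matching pennies shows the behaviour described above: the empirical frequencies approach the mixed equilibrium (1/2, 1/2) even though the realized actions keep cycling.

```python
import numpy as np

A = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])   # row player's payoffs; column player gets -A

counts_row = np.ones(2)   # row's counts of the column player's past actions
counts_col = np.ones(2)   # column's counts of the row player's past actions
for _ in range(100_000):
    a_row = np.argmax(A @ (counts_row / counts_row.sum()))      # row best response
    a_col = np.argmax(-A.T @ (counts_col / counts_col.sum()))   # column best response
    counts_col[a_row] += 1   # column observes row's action
    counts_row[a_col] += 1   # row observes column's action
print(counts_col / counts_col.sum(), counts_row / counts_row.sum())
```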
Article
This paper discusses three problems that can prevent the convergence of learning mechanisms to mixed-strategy Nash equilibria. First, while players' expectations may converge to a mixed equilibrium, the strategies played typically fail to converge. Second, even in 2 × 2 games, fictitious play can produce a sequence of frequency distributions in which the marginal frequencies converge to equilibrium mixed strategies but the joint frequencies violate independence. Third, in a three-player matching-pennies game with a unique equilibrium, it is shown that if players learn as Bayesian statisticians then the equilibrium is locally unstable. Journal of Economic Literature Classification Numbers: C72, C73, D83.