JENA ECONOMIC
RESEARCH PAPERS
# 2010 – 082
Should I remember more than you?
- On the best response to factor-based strategies -
by
René Levínský
Abraham Neyman
Miroslav Zelený
www.jenecon.de
ISSN 1864-7057
The JENA ECONOMIC RESEARCH PAPERS is a joint publication of the Friedrich
Schiller University and the Max Planck Institute of Economics, Jena, Germany.
For editorial correspondence please contact markus.pasche@uni-jena.de.
Impressum:
Friedrich Schiller University Jena, Carl-Zeiss-Str. 3, D-07743 Jena, www.uni-jena.de
Max Planck Institute of Economics, Kahlaische Str. 10, D-07745 Jena, www.econ.mpg.de
© by the author.
Should I remember more than you?
– On the best response to factor-based strategies –
René Levínský
Max Planck Institute of Economics,
Kahlaische Straße 10, 07745 Jena, Germany,
Abraham Neyman
Institute of Mathematics and Center for the Study of Rationality,
The Hebrew University of Jerusalem
Giv’at Ram, Jerusalem 91904, Israel,
and Miroslav Zelený
Department of Mathematical Analysis,
Faculty of Mathematics and Physics, Charles University,
Sokolovská 83, 186 75 Praha, Czech Republic.
November 29, 2010
JEL classification: C73
Keywords: Bounded rationality, factor-based strategies, bounded recall strategies, finite
automata.
Corresponding author. Email: levinsky@econ.mpg.de
The second author was supported in part by Israel Science Foundation grant 1123/06.
The third author was supported by the research project MSM 0021620839 financed by MSMT.
Abstract
In this paper we offer a new approach to modeling strategies of bounded complexity, the so-called factor-based strategies. In our model, the strategy of a player in the multi-stage game does not directly map the set of histories H to the set of her actions. Instead, the player's perception of H is represented by a factor φ : H → X, where X reflects the "cognitive complexity" of the player. Formally, the mapping φ sends each history to an element of a factor space X that represents its equivalence class. The play of the player can then be conditioned just on the elements of the set X.
From the perspective of the original multi-stage game we say that a function φ from H to X is a factor of a strategy σ if there exists a function ω from X to the set of actions of the player such that σ = ω ∘ φ. In this case we say that the strategy σ is φ-factor-based. Stationary strategies, strategies played by finite automata, and strategies with bounded recall are the most prominent examples of factor-based strategies.
In the discounted infinitely repeated game with perfect monitoring, a best reply to a profile of φ-factor-based strategies need not be a φ-factor-based strategy. However, if the factor φ is recursive, namely its value φ(a_1, ..., a_t) on a finite string of action profiles (a_1, ..., a_t) is a function of φ(a_1, ..., a_{t-1}) and a_t, then for every profile of factor-based strategies there is a best reply that is a pure factor-based strategy.
We also study factor-based strategies in the more general case of stochastic games.
1. Introduction
There are two widely studied approaches to modeling strategies of bounded complexity
in (infinitely) repeated games. Aumann (1981), Lehrer (1988), and Aumann and Sorin
(1989) consider players with stationary bounded recall strategies (SBR strategies) who
have imperfect awareness of the actual stage of the game, and whose action in the current stage relies only on the t previous signals they observed and can "remember."
Neyman (1985), Rubinstein (1986), Abreu and Rubinstein (1988), and Ben-Porath (1993)
deal with (infinitely) repeated games in which players are represented by finite automata
(Moore machines). Both models provide a measure of the complexity of the strategy. In
the bounded recall approach, the complexity of a strategy is described by the “depth of
recall” 𝑡, and the complexity of a strategy played by an automaton is measured by the
minimal number of states the automaton must have to play the given strategy.
In this paper we pursue the question already raised by Kalai (1990): “What information
system (size and structure) should a player maintain when playing a strategic game?” in
the context of strategies of bounded complexity. In detail, we study the complexity of
the strategy that is the best response to a strategy with a given complexity. Abreu and
Rubinstein (1988) show that for every finite automaton A_1 in the discounted repeated game, there exists a finite automaton A_2 such that A_2 maximizes its own payoff in the game against A_1 and the number of states of A_2 is less than or equal to the number of states of A_1. Here, we address this question in the broader context of the newly defined
concept of factor-based strategies.
In our bounded rationality approach, the player is not cognitively capable of processing the set of all possible strategies as the set of all possible mappings from the set of all (finite) histories H to the set of actions. Instead, the player can base her actions only on elements x from some abstract set X, where the set X reflects the set of histories H through a mapping φ : H → X. Here, φ describes the player's capacity to differentiate between elements of H. Alternatively, we can understand X as an image (a representation) of H in the player's mind, where an element x of X represents the set of histories H_x = {h ∈ H : φ(h) = x}. Naturally, we are interested in cases where the set X is a proper factor of H.
In defining the factor-based strategies, we were originally motivated by bounded recall strategies. The player is unable to distinguish between two different histories h and h′ in the case where the two histories are identical in the last t coordinates. This fact can be easily described by φ(h) = φ(h′) = x. Our formal approach can capture much more than SBR strategies. The strategies played by finite automata are also factor-based (with finite range X).
Moreover, we can easily "translate" our model to Aumann (1976); the state space Ω corresponds to the set of histories H, and the partition 𝒫 is defined by 𝒫 = {H_x : x ∈ X}. Here we can easily see that the factor-based strategies can model a player whose cognitive failure is of a different nature than forgetfulness; e.g., a player with infinite recall who is unable to distinguish between some actions of her opponent (i.e., games with imperfect monitoring). Again, the strategies of such a player will be factor-based (possibly with infinite range X).
The concept of an agent with limited ability to distinguish between histories also reflects an older invention: the modal frame ⟨W, R⟩ of Kripke (1959). Here, the elements of W represent the "possible worlds" and the binary relation R on W is known as the accessibility relation. Identifying W with the set of histories H, and R with an equivalence relation, we match the concept of factor-based strategies with the structure of a modal frame.
With the concept of factor-based strategies in hand, we can come back to the original question "What is the complexity of the strategy that is the best response to a strategy with a given complexity?" In our model this means the following. Consider player 1, endowed with the set of actions A^1, who "lives" in the "mental world" φ : H → X and plays some strategy ω1 ∘ φ, where ω1 : X → A^1. Now consider a (general, unbounded) strategy σ2 that is the best response of player 2 to ω1 ∘ φ, and another strategy ω2 ∘ φ that is the best response to ω1 ∘ φ from the class of the "bounded" φ-based strategies. We then ask under which circumstances σ2 fares better than ω2 ∘ φ against ω1 ∘ φ. In other words: considering the mental model of my opponent represented by φ : H → X, under which conditions on φ is it really profitable for me to be "cleverer" than my opponent (i.e., to play a general σ2 that maps the whole set of histories H to my actions), and when is it enough to be just as "clever" as she is (i.e., to play using some ω2 that maps only X to the set of my actions)?
As the first (negative) result of our paper we show that in the discounted infinitely repeated game with perfect monitoring, a best reply to a profile of φ-factor-based strategies need not be a φ-factor-based strategy. We obtain our main (positive) result for φ that is recursive, i.e., if there exists a function g : X × A → X such that φ(a_1, ..., a_t) = g(φ(a_1, ..., a_{t-1}), a_t), where A is the set of action profiles in the stage game. Note that in all the examples of factor-based strategies above (finite automata, SBR strategies, imperfect monitoring) the factor φ is recursive. For every recursive factor φ we show that for any profile of factor-based strategies there is a best reply that is a pure factor-based strategy.
As a tool we use the theory of Markov decision processes (MDP), namely theorems on the existence of the best stationary strategy for a given MDP. In fact, once we rephrase our problem of finding the best reply as a question in an MDP, our results turn out to be corollaries of the results of Blackwell (1962) and Derman (1965).
This new perspective on Blackwell’s optimality also proves (and extends) the previous
results of Abreu and Rubinstein (1988). First, the statements are now proven in the same
way for behavioral automata and behavioral SBR strategies. Second, Blackwell’s theorem
gives all statements in a more robust form for patient players, namely for the whole interval
of discount factors β ∈ [β_0, 1).
All relevant notions will be defined and discussed in the next section. Section 3 introduces
the concept of factor-based strategies and presents examples. Section 4 contains the main
result and its proof. Section 5 concludes.
2. The Game Models
If X is a finite or countable set (or a measurable space), then Δ(X) denotes the set of all probabilities on X. Our results apply to a large class of multistage games with perfect
monitoring.
2.1. Supergames. We start by recalling the model of the two-person supergame with finite action sets. Let G = ⟨A^1, A^2, u^1, u^2⟩ be a stage game, where A^i is a nonempty finite set of actions for player i (i = 1, 2) and u^i : A^1 × A^2 → ℝ is the payoff function of player i. The corresponding supergame G∞ is played as follows. At each period t ∈ {1, 2, 3, ...} players 1 and 2 make simultaneous and independent moves a^i_t ∈ A^i, i = 1, 2.
A play of the supergame is a sequence of action profiles (a_t)_{t=1}^∞ with a_t = (a^1_t, a^2_t) ∈ A = A^1 × A^2, and a play (a_t)_{t=1}^∞ defines a stream (u^i(a_t))_{t=1}^∞ of payoffs to player i.
A pure strategy for player i in the supergame G∞ is a mapping σ : A^{<∞} → A^i. The player i following a pure strategy σ plays at the t-th round the action σ(a_1, ..., a_{t-1}), where (a_1, ..., a_{t-1}) ∈ A^{t-1} is the sequence of actions that have already been played.
A behavioral strategy for player i in the supergame G∞ is a mapping σ : A^{<∞} → Δ(A^i). Player i following a behavioral strategy σ plays at the t-th round an action a^i_t ∈ A^i with the probability σ(a_1, ..., a_{t-1})(a^i_t), where (a_1, ..., a_{t-1}) ∈ A^{t-1} is the sequence of actions that have already been played. Pure strategies can be viewed as a special case of behavioral strategies by identifying A^i with the Dirac measures on A^i. This point of view will be used throughout the paper.
2.2. Supergames with a time-dependent stage game. The previous concept can be generalized as follows. Let {⟨A^1(t), A^2(t), u^1(t), u^2(t)⟩} be a sequence of stage games. The corresponding game Γ is played as follows. At each period t players 1 and 2 make simultaneous and independent moves a^i_t ∈ A^i(t), i = 1, 2. These plays define a stream (u^i(t)(a_t))_{t=1}^∞ of payoffs to player i. The pure and behavioral strategies of player i in Γ are defined in a straightforward way.
2.3. Stochastic games. A two-person stochastic game with finite action sets is a 5-tuple Γ = ⟨S, A, u, p, μ⟩ such that
• a state space S is a nonempty set,
• A(z) = A^1(z) × A^2(z) is an action set: for every state z ∈ S, A^i(z) is a nonempty finite set of actions for player i (i = 1, 2) at the state z,
• u = (u^1, u^2) is a payoff function, where u^i(z, a) is the payoff function of player i (z ∈ S, a ∈ A(z)),
• p is a transition function: for each state z ∈ S and each action profile a ∈ A(z), p(z, a) ∈ Δ(S) is a probability distribution over next states; i.e., p(z, a)(z′) is the probability of moving to the state z′ if the players played a at the state z, and
• μ ∈ Δ(S) is the distribution of the initial state.
A play of the stochastic game Γ is a sequence of states and actions (z_1, a_1, ..., z_t, a_t, z_{t+1}, a_{t+1}, ...) with a_t ∈ A(z_t).
A pure strategy of player i in the stochastic game with perfect monitoring specifies her action a^i_t ∈ A^i(z_t) as a function of the past state and action profiles (z_1, a_1, ..., a_{t-1}, z_t). Similarly, a behavioral strategy of player i is a function of the past state and action profiles (z_1, a_1, ..., a_{t-1}, z_t) and specifies the probability that an action a^i_t ∈ A^i(z_t) is played. A pair of strategies σ1 and σ2 of players 1 and 2 defines a probability distribution P_{σ1,σ2} on the space of plays of the stochastic game. The expectation w.r.t. this probability distribution is denoted by E_{σ1,σ2}. Given a discount factor 0 < β < 1, the (unnormalized) β-discounted payoff to player i is defined by

V^i_β(σ1, σ2) = E_{σ1,σ2} [ Σ_{t=1}^∞ β^{t-1} u^i(z_t, a_t) ],

and the normalized β-discounted payoff to player i is defined by

v^i_β(σ1, σ2) = (1 − β) V^i_β(σ1, σ2).
This normalization ensures that if player i receives a payoff c at each period (i.e., the stream of her payoffs is constant), then v^i_β(σ1, σ2) = c.
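As a quick numerical illustration of this normalization, the following minimal Python sketch evaluates a truncated discounted sum; the payoff stream, discount factor, and truncation horizon are arbitrary choices, not taken from the paper.

```python
# Minimal sketch: normalized beta-discounted value of a (truncated) payoff stream.
# The normalization (1 - beta) * sum_t beta^(t-1) * u_t maps a constant stream c to c.

def normalized_discounted_value(stream, beta):
    """Approximate v_beta = (1 - beta) * sum_{t>=1} beta^(t-1) * u_t by truncation."""
    total = 0.0
    for t, u in enumerate(stream, start=1):
        total += beta ** (t - 1) * u
    return (1 - beta) * total

beta = 0.95
constant_stream = [3.0] * 2_000            # payoff 3 at every period (truncated)
print(normalized_discounted_value(constant_stream, beta))   # ~= 3.0, as claimed
```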
Supergames are a special case of stochastic games with a single state. Similarly, supergames with a time-dependent stage game can be viewed as stochastic games with the state space ℕ and the deterministic transition t ↦ t + 1. Thus, the normalized β-discounted payoff is well defined also for supergames (possibly with a time-dependent stage game) as long as their stage payoffs are either bounded or grow at a subexponential rate in t. Therefore, results on stochastic games will have direct consequences for them.
3. Factor-based strategies
Let H denote the set of all finite histories in a supergame G∞ (in a stochastic game, respectively), i.e., H = A^{<∞} (H = S × (A × S)^{<∞}, respectively). Let X be a set and φ be a mapping from H to X.
We say that a behavioral strategy σ is a factor-based strategy with factor φ (φ-based strategy for short) for player i in the supergame G∞ if there is a factor-action function ω : X → Δ(A^i) such that σ = ω ∘ φ. Factor φ is called recursive if there is a function g : X × A → X such that φ(a_1, ..., a_t) = g(φ(a_1, ..., a_{t-1}), a_t).
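For concreteness, here is a minimal Python sketch of how a recursive φ-based strategy can be played online: the player stores only the current factor value and updates it with g after every stage, while the stage action is drawn from ω of that value. The particular g and ω below (a factor that remembers only the last action profile, and an arbitrary mixed action rule) are illustrative choices, not taken from the paper.

```python
import random

# Minimal sketch: a recursive phi-based strategy never stores the full history,
# only the current factor value x = phi(a_1, ..., a_t), updated by x <- g(x, a_t)
# after each stage; the stage action is drawn from the mixed action omega(x).

def play_phi_based(opponent_moves, x0, g, omega):
    """Yield this player's stage actions; x0 = phi(empty history)."""
    x = x0
    for a_opp in opponent_moves:
        dist = omega(x)                                   # omega(x) in Delta(A^i)
        my_action = random.choices(list(dist), weights=list(dist.values()))[0]
        yield my_action
        x = g(x, (my_action, a_opp))                      # recursive factor update

# Illustrative (hypothetical) choices: the factor keeps only the last action
# profile, and omega plays action 2 for sure after the profile (1, 1).
g = lambda x, a: a
omega = lambda x: {2: 1.0} if x == (1, 1) else {1: 0.5, 2: 0.5}
print(list(play_phi_based(opponent_moves=[1, 2, 1], x0=None, g=g, omega=omega)))
```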
The notion of factor-based strategy for player i in the supergame Γ with a time-dependent stage game is defined analogously. The resulting probability of a^i_t depends on φ(a_1, ..., a_{t-1}) and on the actual period t. Thus the φ-based strategy σ satisfies

σ(a_1, a_2, ..., a_{t-1}) = ω(t, φ(a_1, a_2, ..., a_{t-1}))

for some ω : ℕ × X → Δ(A^i).
Further, we define a φ-based strategy for player i in the stochastic game. The choice of the distribution of the action a^i_t depends on φ(z_1, a_1, ..., z_{t-1}, a_{t-1}) and on the actual state z_t. This means that ω : S × X → Δ(A^i) and

σ(z_1, a_1, ..., z_t) = ω(z_t, φ(z_1, a_1, ..., z_{t-1}, a_{t-1})).
Factor φ in the case of a stochastic game is called recursive if there is a function g : X × S × A → X such that φ(z_1, a_1, ..., z_t, a_t) = g(φ(z_1, a_1, ..., z_{t-1}, a_{t-1}), z_t, a_t). A few classes of recursive φ-based strategies follow.
3.1. SBR strategies. Let k ∈ ℕ. By a behavioral k-SBR strategy for player i in the supergame G∞ we mean a pair (e, ω), where e = (e_1, e_2, ..., e_k) ∈ A^k and ω : A^k → Δ(A^i) is a mapping. Player i following the strategy (e, ω) plays as follows. If moves a_1, ..., a_l ∈ A have been played, then player i takes the sequence s, which is formed by the last k elements of the sequence (e_1, ..., e_k, a_1, ..., a_l), and her (l + 1)-th move is a ∈ A^i with the (conditional) probability ω(s)(a). A pure k-SBR strategy for player i in the supergame G∞ is defined in a straightforward way.
Defining φ(a_1, ..., a_l) to be the last k elements of the sequence (e_1, ..., e_k, a_1, ..., a_l), the k-SBR strategy σ defined above obeys σ = ω ∘ φ, and φ is recursive; thus σ is a recursive φ-based strategy with finite range.
We say that a behavioral (pure) strategy σ is a behavioral (pure) SBR strategy if σ is a behavioral (pure) k-SBR strategy for some k ∈ ℕ.
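The k-SBR factor and its recursive update can be sketched in the same style (Python; the value of k and the initial fill e used at the end are arbitrary illustrations):

```python
# Sketch of the k-SBR factor: phi(a_1, ..., a_l) is the tuple of the last k action
# profiles of (e_1, ..., e_k, a_1, ..., a_l), and it satisfies the recursion
# phi(a_1, ..., a_l) = g(phi(a_1, ..., a_{l-1}), a_l) with g defined below.

def make_sbr_factor(e):
    """e = (e_1, ..., e_k) is the initial fill; returns (x0, g)."""
    k = len(e)
    x0 = tuple(e)                                # phi of the empty history

    def g(x, a):
        return (x + (a,))[-k:]                   # drop the oldest profile, append the newest

    return x0, g

# Example with k = 2 and an arbitrary fill; any omega: A^k -> Delta(A^i) composed
# with this factor is a behavioral k-SBR strategy.
x0, g = make_sbr_factor(((1, 1), (1, 1)))
x = x0
for a in [(2, 1), (1, 2), (2, 2)]:
    x = g(x, a)
print(x)    # ((1, 2), (2, 2)) -- the last two action profiles
```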
3.2. Strategies with time-dependent recall. (See, e.g., Neyman and Okada, 2009.) Let k : ℕ → ℕ be a function with k(t) < t for every t ∈ ℕ. A behavioral (respectively, pure) k(t)-BR strategy is defined analogously to the above case, but the action at stage t depends on t and the last k(t) stage-actions. Let σ be such a strategy. Setting φ(a_1, ..., a_t) = (t, (a_{t−k(t)}, ..., a_{t−1})) we easily see that σ is φ-based. Moreover, φ is recursive provided k(t + 1) ≤ k(t) + 1 for every t ∈ ℕ.
3.3. Automata and behavioral automata. A behavioral automaton (for player 1 in the supergame G∞) is a quadruple ⟨M, m*, α, τ⟩, where M is a nonempty set (the state space), m* ∈ M is the initial state, α : M → Δ(A^1) is a probabilistic action function, and τ : M × A → M is a transition function. A k-state behavioral automaton is a behavioral automaton where the set M has k elements. A behavioral automaton ⟨M, m*, α, τ⟩ defines a behavioral strategy σ1 (for player 1) inductively: m_1 = m*, σ1(∅) = α(m_1), σ1(a_1, ..., a_{t-1}) = α(m_t), where m_t = τ(m_{t-1}, a_{t-1}).
A behavioral automaton ⟨M, m*, α, τ⟩ defines a recursive φ-based strategy where X = M, φ(∅) = m*, φ(a_1, ..., a_t) = τ(φ(a_1, ..., a_{t-1}), a_t), and ω = α.
A k-state (deterministic) automaton is defined by the replacement of Δ(A^1) with A^1.
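The identification of a behavioral automaton with a recursive φ-based strategy, as a minimal Python sketch; the two-state machine at the end is an arbitrary illustration, not an example from the paper.

```python
# Sketch: a behavioral automaton <M, m*, alpha, tau> induces the recursive factor
# phi(empty) = m*, phi(a_1, ..., a_t) = tau(phi(a_1, ..., a_{t-1}), a_t), and omega = alpha.

class BehavioralAutomaton:
    def __init__(self, m_star, alpha, tau):
        self.m_star, self.alpha, self.tau = m_star, alpha, tau

    def factor(self, history):
        """phi: finite history of action profiles -> current machine state."""
        m = self.m_star
        for a in history:
            m = self.tau(m, a)
        return m

    def strategy(self, history):
        """The induced behavioral strategy sigma = alpha o phi."""
        return self.alpha(self.factor(history))

# Arbitrary two-state illustration: a "cooperative" state C and a "punishing" state P.
alpha = {"C": {1: 0.9, 2: 0.1}, "P": {2: 1.0}}.get
tau = lambda m, a: "P" if a[1] == 2 else "C"       # punish whenever the opponent played 2
machine = BehavioralAutomaton("C", alpha, tau)
print(machine.strategy([(1, 1), (1, 2)]))           # {2: 1.0}
```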
3.4. Time-dependent automata. (See, e.g., Neyman, 1997.) A time-dependent action automaton is defined by replacing the action function α by a sequence of action functions α_t, t ≥ 1, where α_t defines the action at stage t. Similarly, a time-dependent transition automaton is obtained by replacing the (stationary) transition function τ with a sequence of time-dependent transitions τ_t, t ≥ 1, where τ_t defines the transition at stage t. Finally, a time-dependent (action and transition) automaton in the supergame G∞ is a quadruple ⟨M, m*, (α_t)_{t=1}^∞, (τ_t)_{t=1}^∞⟩, where M is a nonempty set (the state space), m* ∈ M is the initial state, α_t : M → Δ(A^1) is a probabilistic action function, and τ_t : M × A → M is a (deterministic) transition function. It defines a behavioral strategy σ1 (for player 1) inductively: m_1 = m*, σ1(∅) = α_1(m_1), σ1(a_1, ..., a_{t-1}) = α_t(m_t), where m_t = τ_t(m_{t-1}, a_{t-1}).
Note that a time-dependent automaton ⟨M, m*, (α_t)_{t=1}^∞, (τ_t)_{t=1}^∞⟩ defines the same strategy as the automaton ⟨M × ℕ, m**, α, τ⟩ with m** = (m*, 1), α(m, t) = α_t(m), and τ((m, t), a) = (τ_t(m, a), t + 1). Therefore, the corresponding strategy is a recursive φ-based strategy, where φ : A^{<∞} → M × ℕ is given by φ(a_1, ..., a_t) = τ(φ(a_1, ..., a_{t-1}), a_t) and ω = α.
3.5. A counterexample. Our objective is to study for what factors φ of the strategy σ1 player 2 has a φ-based best reply. First, we demonstrate that in the discounted two-person repeated game (with finitely many stage actions) there need not be such a strategy.
Let G be the stage game with stage-action sets A^1 = A^2 = {1, 2}, and the payoff function to player 2 given by u^2(1, 1) = u^2(1, 2) = 0, u^2(2, 1) = u^2(2, 2) = 1. Define the factor φ : H → X by X = {B, C} and

φ(h) = B   if h = (a_1) and a^2_1 = 1, or if h = (a_1, a_2, a_3) and a^2_3 = 2,
       C   otherwise.

Consider a φ-based strategy σ1 defined via ω1 : X → A^1, where ω1(B) = 2 and ω1(C) = 1.
Let us demonstrate that no φ-based strategy σ2 of player 2 can be a best reply to the strategy σ1. First, nonzero payoffs to player 2 are possible only in stages 2 and 4, since player 1 plays action 2 only after a history mapped to B. Suppose σ2 is φ-based with σ2 = ω2 ∘ φ. Set x = ω2(C)(1), 0 ≤ x ≤ 1; since the empty history and every history of length 2 are mapped to C, player 2 plays 1 with probability x both in period 1 and in period 3. Then V^2_β(σ1, σ2) = βx + β^3(1 − x). But the strategy σ̃2, where player 2 plays 1 in the first period and 2 in the third, yields V^2_β(σ1, σ̃2) = β + β^3 > βx + β^3(1 − x), whenever x ∈ [0, 1] and β ∈ (0, 1).
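The comparison can be checked numerically; the following minimal Python sketch simply evaluates the two (unnormalized) discounted payoffs over a grid of x and a few values of β, all of them arbitrary illustrative choices.

```python
# Sketch: verify that the best phi-based reply (play 1 with probability x after factor
# value C) earns beta*x + beta^3*(1 - x), strictly less than beta + beta^3, which the
# unconstrained reply (play 1 in period 1 and 2 in period 3) achieves.

def payoff_phi_based(x, beta):
    return beta * x + beta ** 3 * (1 - x)

def payoff_unconstrained(beta):
    return beta + beta ** 3

for beta in (0.1, 0.5, 0.9, 0.99):
    best_phi = max(payoff_phi_based(x / 100, beta) for x in range(101))
    assert best_phi < payoff_unconstrained(beta)
    print(beta, round(best_phi, 4), round(payoff_unconstrained(beta), 4))
```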
4. Main results
The main result follows.
Theorem 4.1. Let Γ = ⟨S, A, u, p, μ⟩ be a two-person stochastic game with countably many states, finitely many actions at each state, and a bounded payoff function u^2. Let σ1 be a φ-based behavioral strategy of player 1 in Γ. If φ is recursive, then the following hold.
(i) For every β ∈ (0, 1) there exists a φ-based pure strategy σ2 such that for every behavioral strategy ρ of player 2 in Γ we have v^2_β(σ1, σ2) ≥ v^2_β(σ1, ρ).
(ii) If S and the range of φ are, in addition, finite, then there is a φ-based pure strategy σ2 and a discount factor β_0 ∈ (0, 1) such that
• for every behavioral strategy ρ (of player 2 in Γ) and every β ∈ [β_0, 1), we have v^2_β(σ1, σ2) ≥ v^2_β(σ1, ρ);
• for every behavioral strategy ρ, we have

E_{σ1,σ2} [ lim inf_{n→∞} (1/n) Σ_{t=1}^n u^2(z_t, a_t) ] ≥ E_{σ1,ρ} [ lim sup_{n→∞} (1/n) Σ_{t=1}^n u^2(z_t, a_t) ];

• for every ε > 0, there exists N such that, for every behavioral strategy ρ and every n ≥ N, we have

E_{σ1,σ2} [ (1/n) Σ_{t=1}^n u^2(z_t, a_t) ] ≥ E_{σ1,ρ} [ (1/n) Σ_{t=1}^n u^2(z_t, a_t) ] − ε.
Remark 4.2. (i) Let G∞ be a supergame (a supergame with a time-dependent stage game, respectively). Since such a supergame belongs also to the class of stochastic games, Theorem 4.1 gives the following consequences in the β-discounted game G∞, β ∈ (0, 1).
a) For every behavioral k-SBR strategy σ1, there is a pure k-SBR strategy σ2 that is a best reply of player 2.
b) For every behavioral (time-dependent recall) k(t)-BR strategy σ1 with k(t + 1) ≤ k(t) + 1, there is a pure k(t)-BR strategy σ2 that is a best reply.
c) For every strategy σ1 that is defined by a k-state behavioral automaton, there is a best reply σ2 of player 2 defined by a (deterministic) k-state automaton.
d) For every strategy σ1 that is defined by a k-state time-dependent automaton, there is a best reply σ2 defined by a (deterministic) k-state time-dependent automaton.
(ii) The extension of the models from two-person games to multi-person games is straightforward, and our results on the best reply for two-person games extend to n-person games (n > 2), since players 1, 2, ..., n − 1 can be considered as one player playing actions from the space A^1 × ⋅⋅⋅ × A^{n−1}.
The main result is a simple corollary of results on Markov decision processes. By a Markov decision process (MDP, for short) we mean a one-person stochastic game; the single player is called the decision maker. We recall the definition of the MDP in the following notation: by r we denote the single-stage payoff function to the decision maker and by v_β(σ) the normalized β-discounted payoff to the decision maker when the strategy σ is played.
More precisely, by an MDP we mean a 5-tuple ⟨M, B, r, p, ν⟩ such that
• M is a nonempty countable set (the set of states),
• B(z), z ∈ M, is a nonempty finite set (the set of actions at the state z),
• r(z, a) is a real number for every z ∈ M and a ∈ B(z) (the reward function),
• p(z, a) is a probability on M for every z ∈ M and a ∈ B(z),
• ν is an initial probability on M.
One can interpret this structure as follows. The set B(z) is the set of feasible actions that can be played at state z ∈ M by the decision maker. The sequence (z_1, a_1, z_2, a_2, ...) of states and actions of the process is realized as follows. The initial state z_1 is chosen with probability ν(z_1). If the sequence (z_1, a_1, z_2, a_2, ..., z_t) has been constructed, then the decision maker plays an action a_t ∈ B(z_t) and receives the payoff r(z_t, a_t). The (conditional) probability of the next state z_{t+1} ∈ M of the process (given z_1, ..., z_t, a_t) is given by the probability distribution p(z_t, a_t).
A strategy for an MDP is a function σ that assigns to every finite sequence of states and actions s = (z_1, a_1, z_2, a_2, ..., z_t) a probability σ(s) on B(z_t). If σ(s) is always a Dirac measure, then σ is pure. By a stationary strategy for an MDP we mean a strategy depending only on the last state.
A strategy σ of the decision maker defines a probability distribution P_σ on the space of plays of the MDP. The expectation w.r.t. this probability distribution is denoted by E_σ. Given a discount factor 0 < β < 1, the normalized β-discounted payoff to the decision maker is defined by

v_β(σ) = (1 − β) E_σ [ Σ_{t=1}^∞ β^{t-1} r(z_t, a_t) ].
The key tools in our paper are results of Blackwell and Derman. Parts (ii) and (iii) of Theorem 4.4 follow implicitly from part (i) and the proof in Mertens and Neyman (1981), which shows that the stationary strategy σ that obeys (i) is ε-optimal for every ε > 0; for an explicit statement see Neyman (2003).
Theorem 4.3 (Derman, 1965). Let ⟨M, B, r, p, ν⟩ be an MDP with countably many states and finitely many actions in each state, and with a bounded reward function. Then for each β ∈ (0, 1) there is a stationary pure strategy σ such that, for every strategy ρ, we have v_β(σ) ≥ v_β(ρ).
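For a finite MDP, such a β-discounted optimal pure stationary strategy can be computed by standard value iteration. The following minimal Python sketch does this under that finiteness assumption; the tolerance, the greedy tie-breaking, and the two-state example data are illustrative choices, not taken from the paper, and the countable-state case of the theorem is of course not covered by a finite enumeration.

```python
# Sketch: beta-discounted value iteration on a finite MDP <M, B, r, p, nu>.
# It returns a pure stationary strategy sigma: state -> action that is
# beta-discounted optimal, as Theorem 4.3 guarantees in the finite case.

def solve_discounted_mdp(states, actions, r, p, beta, tol=1e-10):
    """actions[z] -> list of actions; r[(z, a)] -> reward; p[(z, a)] -> {z': prob}."""
    v = {z: 0.0 for z in states}
    while True:
        v_new = {
            z: max(r[(z, a)] + beta * sum(q * v[zp] for zp, q in p[(z, a)].items())
                   for a in actions[z])
            for z in states
        }
        if max(abs(v_new[z] - v[z]) for z in states) < tol:
            break
        v = v_new
    sigma = {
        z: max(actions[z],
               key=lambda a: r[(z, a)] + beta * sum(q * v[zp] for zp, q in p[(z, a)].items()))
        for z in states
    }
    return sigma, v

# Tiny illustrative example (two states, two actions each), not taken from the paper.
states = ["s", "t"]
actions = {"s": [0, 1], "t": [0, 1]}
r = {("s", 0): 1.0, ("s", 1): 0.0, ("t", 0): 0.0, ("t", 1): 2.0}
p = {("s", 0): {"s": 1.0}, ("s", 1): {"t": 1.0},
     ("t", 0): {"s": 1.0}, ("t", 1): {"t": 1.0}}
print(solve_discounted_mdp(states, actions, r, p, beta=0.9)[0])   # {'s': 1, 't': 1}
```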
Theorem 4.4 (Blackwell, 1962). Let ⟨M, B, r, p, ν⟩ be an MDP with finitely many states and actions. Then there is a stationary pure strategy σ and a discount factor β_0 ∈ (0, 1) such that
(i) for every strategy ρ and for every β ∈ [β_0, 1), we have v_β(σ) ≥ v_β(ρ);
(ii) for every strategy ρ, we have

E_σ [ lim inf_{n→∞} (1/n) Σ_{t=1}^n r(z_t, a_t) ] ≥ E_ρ [ lim sup_{n→∞} (1/n) Σ_{t=1}^n r(z_t, a_t) ];

(iii) for every ε > 0 there exists N such that, for every strategy ρ and every n ≥ N, we have

E_σ [ (1/n) Σ_{t=1}^n r(z_t, a_t) ] ≥ E_ρ [ (1/n) Σ_{t=1}^n r(z_t, a_t) ] − ε.
Proof of Theorem 4.1. Let σ1 be a φ-based strategy for player 1 in the stochastic game Γ and assume that φ is recursive. Thus, there exist functions ω : S × X → Δ(A^1) and g : X × S × A → X such that

σ1(z_1, a_1, ..., z_t) = ω(z_t, φ(z_1, a_1, ..., z_{t-1}, a_{t-1})),
φ(z_1, a_1, ..., z_t, a_t) = g(φ(z_1, a_1, ..., z_{t-1}, a_{t-1}), z_t, a_t).
We define an MDP 𝔐 = ⟨M, B, r, q, ν⟩ as follows:
• M = S × X,
• B(z, x) = A^2(z), (z, x) ∈ M,
• r(z, x, a^2) = Σ_{a^1 ∈ A^1(z)} u^2(z, (a^1, a^2)) ω(z, x)(a^1), (z, x) ∈ M, a^2 ∈ A^2(z),
• q(z, x, a^2)(z′, x′) = Σ_{a^1 ∈ A^1(z) : g(x, z, (a^1, a^2)) = x′} p(z, (a^1, a^2))(z′) ω(z, x)(a^1), (z, x) ∈ M, a^2 ∈ A^2(z),
• ν(z, x) = μ(z) if x = φ(∅), and ν(z, x) = 0 otherwise.
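For finite S, X, and action sets, this construction can be sketched directly in Python; the function below tabulates r and q from u^2, p, ω, and g exactly as in the display above. All names are local to the sketch, and the data format (dictionaries keyed by states and action profiles) is an arbitrary choice.

```python
from itertools import product

# Sketch: build the MDP faced by player 2 when player 1 plays the recursive
# phi-based strategy given by omega ((state, factor) -> mixed action of player 1)
# and g (factor update).  MDP states are pairs (z, x); actions are player 2's.

def build_mdp(S, X, A1, A2, u2, p, omega, g):
    """A1[z], A2[z]: action lists; u2[z][(a1, a2)]: payoff; p[z][(a1, a2)]: {z': prob}."""
    M = list(product(S, X))
    B = {(z, x): list(A2[z]) for (z, x) in M}
    r, q = {}, {}
    for (z, x) in M:
        for a2 in A2[z]:
            # expected stage payoff of player 2 against player 1's mixed action omega(z, x)
            r[(z, x, a2)] = sum(u2[z][(a1, a2)] * omega[(z, x)].get(a1, 0.0)
                                for a1 in A1[z])
            # transition distribution over (next state, next factor value)
            dist = {}
            for a1 in A1[z]:
                w = omega[(z, x)].get(a1, 0.0)
                if w == 0.0:
                    continue
                x_next = g(x, z, (a1, a2))
                for z_next, prob in p[z][(a1, a2)].items():
                    dist[(z_next, x_next)] = dist.get((z_next, x_next), 0.0) + prob * w
            q[(z, x, a2)] = dist
    return M, B, r, q
```

Feeding M, B, r, q, together with the initial distribution ν concentrated on pairs (z, φ(∅)), into a solver such as the value-iteration sketch after Theorem 4.3 yields a pure stationary τ, which translates back into the φ-based pure best reply σ2(z_1, a_1, ..., z_t) = τ(z_t, φ(z_1, a_1, ..., z_{t-1}, a_{t-1})).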
A play of 𝔐 is of the form

(z_1, x_1, a^2_1, z_2, x_2, a^2_2, ..., z_t, x_t, a^2_t, ...).
If ρ is a strategy for player 2 in Γ, then the probability measure P_{σ1,ρ} captures the probability distribution of possible plays (z_1, a_1, z_2, ...) of Γ, where players 1 and 2 follow the strategies σ1 and ρ, respectively. Thus, P_{σ1,ρ}(z_1, a_1, ..., z_t) is the probability that a play starts with the sequence (z_1, a_1, ..., z_t). Similarly, if ζ is a strategy of the decision maker in 𝔐, then the probability that a play starts with (z_1, x_1, a^2_1, ..., z_t, x_t) is denoted by P_ζ(z_1, x_1, a^2_1, ..., z_t, x_t).
Let ψ be a mapping assigning to each sequence (z_1, a_1, ..., z_t) the corresponding sequence (z_1, x_1, a^2_1, ..., z_t, x_t), where x_1 = φ(∅) and x_j = g(x_{j-1}, z_{j-1}, a_{j-1}), j = 2, ..., t.
Let ρ be a strategy of player 2 in Γ. Then we define the corresponding strategy ρ̃ in 𝔐 by

ρ̃(s̃)(a^2) = Σ_{s : ψ(s) = s̃} ρ(s)(a^2) P_{σ1,ρ}(s | s̃),

where s = (z_1, a_1, ..., z_t), s̃ = (z_1, x_1, a^2_1, ..., z_t, x_t), and P_{σ1,ρ}(s | s̃) denotes the conditional probability of s given s̃. The symbol P_{σ1,ρ}(s̃) denotes the probability that the play starts with a sequence s satisfying ψ(s) = s̃, that is,

P_{σ1,ρ}(s̃) = Σ_{s : ψ(s) = s̃} P_{σ1,ρ}(s).
Claim 4.5.
(i) For every fixed s̃ = (z_1, x_1, a^2_1, ..., z_t, x_t) we have P_{σ1,ρ}(s̃) = P_{ρ̃}(s̃).
(ii) Let β ∈ (0, 1), t ∈ ℕ, and let ρ be a strategy of player 2 in Γ. Then E_{ρ̃}(r(z_t, x_t, a^2_t)) = E_{σ1,ρ}(u^2(z_t, a_t)).
Proof of Claim. (i) We will proceed by induction on the length of s̃. Suppose that s̃ = (z_1, x_1). If x_1 = φ(∅), then we clearly have

Σ_{s : ψ(s) = s̃} P_{σ1,ρ}(s) = P_{σ1,ρ}(z_1) = μ(z_1) = P_{ρ̃}(z_1, x_1).

If x_1 ≠ φ(∅), then the equality clearly holds as well, both sides being zero. Now assume that the desired equality holds for every w̃ = (z_1, x_1, a^2_1, ..., z_t, x_t). Fix such a w̃ and consider s̃ of the form s̃ = (z_1, x_1, a^2_1, ..., z_t, x_t, a^2_t, z_{t+1}, x_{t+1}). We have

Σ_{s : ψ(s) = s̃} P_{σ1,ρ}(s)
  = Σ_{w : ψ(w) = w̃}  Σ_{a^1 ∈ A^1(z_t) : g(x_t, z_t, (a^1, a^2_t)) = x_{t+1}}  p(z_t, (a^1, a^2_t))(z_{t+1}) ω(z_t, x_t)(a^1) ρ(w)(a^2_t) P_{σ1,ρ}(w)
  = Σ_{w : ψ(w) = w̃}  q(z_t, x_t, a^2_t)(z_{t+1}, x_{t+1}) ρ(w)(a^2_t) P_{σ1,ρ}(w)
  = q(z_t, x_t, a^2_t)(z_{t+1}, x_{t+1}) ρ̃(w̃)(a^2_t) P_{σ1,ρ}(w̃)   (by the definition of ρ̃)
  = q(z_t, x_t, a^2_t)(z_{t+1}, x_{t+1}) ρ̃(w̃)(a^2_t) P_{ρ̃}(w̃)   (by the induction hypothesis)
  = P_{ρ̃}(s̃).
(ii) Let us compute

E_{σ1,ρ}(u^2(z_t, a_t)) = ∫ u^2(z_t, a_t) dP_{σ1,ρ}
  = Σ_{s = (z_1, a_1, ..., z_t)}  Σ_{a = (a^1, a^2) ∈ A(z_t)}  u^2(z_t, a) ω(z_t, x_t)(a^1) ρ(s)(a^2) P_{σ1,ρ}(s).

Using the definition of r we get

E_{σ1,ρ}(u^2(z_t, a_t)) = Σ_s  Σ_{a^2 ∈ A^2(z_t)}  r(z_t, x_t, a^2) ρ(s)(a^2) P_{σ1,ρ}(s)
  = Σ_{s̃}  Σ_{s : ψ(s) = s̃}  Σ_{a^2 ∈ A^2(z_t)}  r(z_t, x_t, a^2) ρ(s)(a^2) P_{σ1,ρ}(s)
  = Σ_{s̃}  Σ_{a^2 ∈ A^2(z_t)}  r(z_t, x_t, a^2) [ Σ_{s : ψ(s) = s̃} ρ(s)(a^2) P_{σ1,ρ}(s | s̃) ] P_{σ1,ρ}(s̃)
  = Σ_{s̃}  Σ_{a^2 ∈ A^2(z_t)}  r(z_t, x_t, a^2) ρ̃(s̃)(a^2) P_{σ1,ρ}(s̃).

Using part (i) of Claim 4.5 we conclude

E_{σ1,ρ}(u^2(z_t, a_t)) = Σ_{s̃}  Σ_{a^2 ∈ A^2(z_t)}  r(z_t, x_t, a^2) ρ̃(s̃)(a^2) P_{ρ̃}(s̃)
  = E_{ρ̃}(r(z_t, x_t, a^2_t)).
Fix β ∈ (0, 1). According to Theorem 4.3 there exists an optimal pure stationary strategy τ for the decision maker in 𝔐. Such a strategy defines a φ-based pure strategy σ2 of player 2 in Γ as follows:

σ2(z_1, a_1, ..., z_t) = τ(z_t, φ(z_1, a_1, ..., z_{t-1}, a_{t-1})).

Now assume that we have a strategy ρ of player 2 in Γ. According to Claim 4.5(ii), we have v^2_β(σ1, ρ) = v_β(ρ̃) ≤ v_β(τ) = v^2_β(σ1, σ2). Thus we get assertion (i). Assertion (ii) follows from Theorem 4.4 and Claim 4.5(ii).
5. Concluding remarks
5.1. Compact action spaces. A natural extension of our model is to consider players with compact action sets A^i. In this extension, there arises a new problem not found in games with finitely many action profiles, namely the existence of a best reply to a given strategy σ. Consider, for example, the following two-player supergame, where the set of actions of each player is the interval [0, 1] and the stage payoff of player 2 is (at any time) given by u^2(a^1, a^2) = a^1 + a^2. Now, suppose that player 1 plays the 1-SBR strategy given by

σ1(a_1, ..., a_{t-1}) = 1   if a^2_{t-1} < 1 and t > 1,
                        0   otherwise,

with e_1 = (0, 0).
This strategy is recursively factor-based. Indeed, we set X = {B, C} and φ(a_1, ..., a_{t-1}) = B if a^2_{t-1} < 1 and t > 1, φ(a_1, ..., a_{t-1}) = C otherwise; ω(B) = 1 and ω(C) = 0. Then we have σ1 = ω ∘ φ. However, in the β-discounted game there is no φ-based best reply, and any φ-based reply is dominated by (another) φ-based reply.
Of course, there does not exist any general best reply to σ1 either: player 2 would like to keep her action strictly below 1, so that player 1 plays 1 in the next period, yet as close to 1 as possible, so the supremum of her payoff is not attained. The difficulty stems from the fact that the factor φ is not continuous. However, using, e.g., Maitra (1968) one can generalize our result in part (i) of Theorem 4.1.
5.2. Public vs. private strategies. Another interpretation of the φ-based strategies is related to the imperfect monitoring literature. Setting X to be the set of all possible histories of public signals, we can identify the φ-based strategies with the so-called public strategies (see, e.g., Radner, Myerson, and Maskin, 1986). In contrast, a private strategy (see, e.g., Kandori and Obara, 2006) is a strategy where the current action depends on the history of public signals (i.e., on elements of X) and, in addition, on private signals (e.g., past private actions). Our question at the outset of this paper can then be reformulated as "Considering that my opponent is limited to public strategies only, under which conditions can I exploit my (additional) private signal?"; in other words, "Can private strategies fare better than public strategies against public strategies?" The answer is that one does not profit from the additional private signal, since the factor φ is in this situation obviously recursive.
References
Abreu, D., and A. Rubinstein (1988): “The structure of Nash equilibrium in repeated
games with finite automata,” Econometrica, 56(6), 1259–1281.
Aumann, R. J. (1976): “Agreeing to disagree,” Annals of Statistics, 4, 1236–1239.
(1981): “Survey of repeated games,” in Essays in Game Theory and Mathe-
matical Economics in Honour of Oskar Morgenstern, pp. 11–42. Wissenschaftsverlag,
Bibliographisches Institut, Mannheim.
Aumann, R. J., and S. Sorin (1989): “Cooperation and bounded recall,” Games and
Economic Behavior, 1(1), 5–39.
Ben-Porath, E. (1993): “Repeated games with finite automata,” Journal of Economic
Theory, 59, 17–32.
Blackwell, D. (1962): "Discrete dynamic programming," Annals of Mathematical Statistics, 33, 719–726.
Derman, C. (1965): “Markovian sequential control processes: Denumerable state spaces,”
Journal of Mathematical Analysis and Applications, 10, 295–302.
Kalai, E. (1990): “Bounded rationality and strategic complexity in repeated games,”
in Game Theory and Applications, ed. by T. Ichiishi, A. Neyman, and Y. Tauman, pp.
131–157. Academic Press, San Diego.
Kandori, M., and I. Obara (2006): “Efficiency in repeated games revisited: The role
of private strategies,” Econometrica, 74(2), 499–519.
Kripke, S. A. (1959): "A completeness theorem in modal logic," The Journal of Symbolic Logic, 24, 1–14.
Lehrer, E. (1988): “Repeated games with stationary bounded recall strategies,” Journal
of Economic Theory, 46(1), 130–144.
Maitra, A. (1968): "Discounted dynamic programming on compact metric spaces," Sankhyā: The Indian Journal of Statistics, Series A, 30(2), 211–216.
Mertens, J.-F., and A. Neyman (1981): “Stochastic games,” International Journal of
Game Theory, 10, 53–66.
Neyman, A. (1985): “Bounded complexity justifies cooperation in the finitely repeated
prisoners’ dilemma,” Economics Letters, 19, 227–229.
(1997): “Cooperation, repetition, and automata,” in Cooperation: Game Theo-
retic Approaches, ed. by S. Hart, and A. Mas-Colell, vol. 155 of NATO ASI Series F, pp.
233–255. Springer-Verlag, New York.
(2003): “From Markov chains to stochastic games,” in Stochastic Games and
Applications, ed. by A. Neyman, and S. Sorin, vol. 570 of NATO Science Series C, pp.
9–25. Kluwer Academic Publishers, Dordrecht.
Neyman, A., and D. Okada (2009): “Growth of strategy sets, entropy, and nonstation-
ary bounded recall,” Games and Economic Behavior, 66(1), 404–425.
Radner, R., R. Myerson, and E. Maskin (1986): “An example of a repeated part-
nership game with discounting and with uniformly inefficient equilibria,” Review of Eco-
nomic Studies, 53(172), 59–69.
Rubinstein, A. (1986): “Finite automata play the repeated prisoner’s dilemma,” Journal
of Economic Theory, 39(1), 83–96.