Operational Research (2022) 22:1675–1696
https://doi.org/10.1007/s12351-021-00625-6
REVIEW
Weighted-average stochastic games with constant payoff
Miquel Oliu-Barton1
Received: 13 July 2020 / Revised: 18 December 2020 / Accepted: 1 February 2021 /
Published online: 2 March 2021
© The Author(s), under exclusive licence to Springer-Verlag GmbH, DE part of Springer Nature 2021
Abstract
In a zero-sum stochastic game, at each stage, two opponents make decisions which determine a stage reward and the law of the state of nature at the next stage, and the aim of the players is to maximize the weighted average of the stage rewards. In this paper we solve the constant-payoff conjecture formulated by Sorin, Venel and Vigeral in 2010 for two classes of stochastic games with weighted-average rewards: (1) absorbing games, a well-known class of stochastic games where the state changes at most once during the game, and (2) smooth stochastic games, a newly introduced class of stochastic games where the state evolves smoothly under optimal play.
Keywords Stochastic game · Value · Markov chain · Constant payoff
1 Introduction
Model Stochastic games were introduced by Shapley (1953) in order to model a repeated interaction between two opponent players in a changing environment. The game proceeds in stages. At each stage \(m\) of the game, the players play a zero-sum game that depends on a state variable. Formally, knowing the current state \(k_m\), Player 1 chooses an action \(i_m\) and Player 2 chooses an action \(j_m\). Their choices occur independently and have two consequences: first, they produce a stage reward \(g(k_m, i_m, j_m)\) which is observed by the players and, second, they determine the law \(q(k_m, i_m, j_m)\) of the next period's state \(k_{m+1}\). Thus, the sequence of states follows a Markov chain controlled by the actions of both players. To any sequence of non-negative weights \(\theta = (\theta_m)\) and any initial state \(k\) corresponds the \(\theta\)-weighted average stochastic game, in which Player 1 maximizes the expectation of \(\sum_{m \ge 1} \theta_m\, g(k_m, i_m, j_m)\), given that \(k_1 = k\), while Player 2 minimizes the same amount.
A crucial aspect in this model is that the current state is commonly observed by the
players at every stage. Another one is stationarity: the transition and stage reward
functions do not change over time.
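The model above can be made concrete with a small simulation. The sketch below is purely illustrative and is not taken from the paper: the two-state absorbing game, its rewards, its absorption probabilities, and the fixed mixed strategies are made-up assumptions. It estimates the \(\theta\)-weighted payoff by Monte Carlo, with the weights \(\theta_m = \lambda(1-\lambda)^{m-1}\) that yield the \(\lambda\)-discounted evaluation.

```python
import random

def discounted_weights(lam, n):
    """First n weights theta_m = lam*(1-lam)**(m-1) of the lam-discounted game."""
    return [lam * (1 - lam) ** (m - 1) for m in range(1, n + 1)]

def simulate_payoff(lam, x, y, n_stages=400, n_runs=2000, seed=0):
    """Monte Carlo estimate of the theta-weighted payoff in a toy absorbing
    game (hypothetical numbers, for illustration only).

    State 0 is the active state; state 1 is absorbing with constant reward 1.
    Player 1 plays action 0 with probability x, Player 2 with probability y,
    independently at every stage (fixed stationary strategies).
    """
    g0 = [[0.0, 1.0], [1.0, 0.0]]     # stage rewards g(0, i, j) in state 0
    p_abs = [[0.0, 0.5], [0.5, 0.0]]  # probability of moving to the absorbing state
    rng = random.Random(seed)
    theta = discounted_weights(lam, n_stages)
    total = 0.0
    for _ in range(n_runs):
        k, payoff = 0, 0.0
        for m in range(n_stages):
            if k == 1:
                payoff += theta[m] * 1.0  # absorbed: reward 1 at every later stage
                continue
            i = 0 if rng.random() < x else 1  # Player 1 mixes (x, 1 - x)
            j = 0 if rng.random() < y else 1  # Player 2 mixes (y, 1 - y)
            payoff += theta[m] * g0[i][j]
            if rng.random() < p_abs[i][j]:    # transition drawn from q(k, i, j)
                k = 1
        total += payoff
    return total / n_runs
```

Since all rewards lie in \([0, 1]\) and the truncated weights sum to at most 1, the estimate always lies in \([0, 1]\); truncating the infinite sum at `n_stages` loses at most \((1-\lambda)^{n}\) of the total weight.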
* Miquel Oliu-Barton
miquel.oliu.barton@normalesup.org
1 Université Paris-Dauphine, Paris, France