
Operational Research (2022) 22:1675–1696

https://doi.org/10.1007/s12351-021-00625-6


REVIEW

Weighted-average stochastic games with constant payoff

Miquel Oliu-Barton¹

Received: 13 July 2020 / Revised: 18 December 2020 / Accepted: 1 February 2021 / Published online: 2 March 2021

© The Author(s), under exclusive licence to Springer-Verlag GmbH, DE part of Springer Nature 2021

Abstract

In a zero-sum stochastic game, at each stage, two opponents make decisions which determine a stage reward and the law of the state of nature at the next stage, and the aim of the players is to maximize the weighted average of the stage rewards. In this paper we solve the constant-payoff conjecture formulated by Sorin, Venel and Vigeral in 2010 for two classes of stochastic games with weighted-average rewards: (1) absorbing games, a well-known class of stochastic games where the state changes at most once during the game, and (2) smooth stochastic games, a newly introduced class of stochastic games where the state evolves smoothly under optimal play.

Keywords: Stochastic game · Value · Markov chain · Constant payoff

1 Introduction

Model. Stochastic games were introduced by Shapley (1953) in order to model a repeated interaction between two opponent players in a changing environment. The game proceeds in stages. At each stage m ∈ ℕ of the game, players play a zero-sum game that depends on a state variable. Formally, knowing the current state k_m, Player 1 chooses an action i_m and Player 2 chooses an action j_m. Their choices occur independently and have two consequences: first, they produce a stage reward g(k_m, i_m, j_m) which is observed by the players and, second, they determine the law q(k_m, i_m, j_m) of the next period's state k_{m+1}. Thus, the sequence of states follows a Markov chain controlled by the actions of both players. To any sequence of non-negative weights θ = (θ_m) and any initial state k corresponds the θ-weighted-average stochastic game, in which Player 1 maximizes the expectation of ∑_{m≥1} θ_m g(k_m, i_m, j_m), given that k_1 = k, while Player 2 minimizes the same amount. A crucial aspect of this model is that the current state is commonly observed by the players at every stage. Another is stationarity: the transition and stage-reward functions do not change over time.
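To make the model concrete, here is a minimal simulation sketch (not from the paper; all names and data structures are illustrative) of one play of a θ-weighted-average stochastic game under fixed stationary strategies, truncated to finitely many weights:

```python
import random

def simulate_weighted_payoff(states, actions1, actions2, g, q, sigma, tau,
                             k1, theta, rng=None):
    """Simulate one play under stationary strategies sigma (Player 1) and
    tau (Player 2), and return the theta-weighted average reward
    sum_m theta_m * g(k_m, i_m, j_m).

    g[(k, i, j)]     -> stage reward (a number)
    q[(k, i, j)]     -> transition law: probability weights over `states`
    sigma[k], tau[k] -> probability weights over actions1 / actions2
    theta            -> finite sequence of non-negative weights (truncation)
    """
    rng = rng or random.Random(0)
    k = k1
    total = 0.0
    for theta_m in theta:
        # both players randomize independently, knowing the current state k
        i = rng.choices(actions1, weights=sigma[k])[0]
        j = rng.choices(actions2, weights=tau[k])[0]
        total += theta_m * g[(k, i, j)]
        # the next state is drawn from the controlled transition law q(k, i, j)
        k = rng.choices(states, weights=q[(k, i, j)])[0]
    return total
```

With the standard choice θ_m = λ(1−λ)^{m−1}, this weighted average recovers the λ-discounted evaluation of the game.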

* Miquel Oliu-Barton

miquel.oliu.barton@normalesup.org

1 Université Paris-Dauphine, Paris, France
