Marc Ponsen’s research while affiliated with Maastricht University and other places


Publications (26)


Computing Approximate Nash Equilibria and Robust Best-Responses Using Sampling.
  • Article

September 2011 · 93 Reads · 39 Citations · Journal of Artificial Intelligence Research

Marc J. V. Ponsen

This article discusses two contributions to decision-making in complex partially observable stochastic games. First, we apply two state-of-the-art search techniques that use Monte-Carlo sampling to the task of approximating a Nash-Equilibrium (NE) in such games, namely Monte-Carlo Tree Search (MCTS) and Monte-Carlo Counterfactual Regret Minimization (MCCFR). MCTS has been proven to approximate a NE in perfect-information games. We show that the algorithm quickly finds a reasonably strong strategy (but not a NE) in a complex imperfect information game, i.e. Poker. MCCFR on the other hand has theoretical NE convergence guarantees in such a game. We apply MCCFR for the first time in Poker. Based on our experiments, we may conclude that MCTS is a valid approach if one wants to learn reasonably strong strategies fast, whereas MCCFR is the better choice if the quality of the strategy is most important. Our second contribution relates to the observation that a NE is not a best response against players that are not playing a NE. We present Monte-Carlo Restricted Nash Response (MCRNR), a sample-based algorithm for the computation of restricted Nash strategies. These are robust best-response strategies that (1) exploit non-NE opponents more than playing a NE and (2) are not (overly) exploitable by other strategies. We combine the advantages of two state-of-the-art algorithms, i.e. MCCFR and Restricted Nash Response (RNR). MCRNR samples only relevant parts of the game tree. We show that MCRNR learns quicker than standard RNR in smaller games. Also we show in Poker that MCRNR learns robust best-response strategies fast, and that these strategies exploit opponents more than playing a NE does.
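As an illustration of the regret-matching update that CFR-style algorithms such as MCCFR apply at every information set, the following sketch runs regret-matching self-play on rock-paper-scissors; the payoff matrix and training loop are illustrative assumptions, not the authors' poker implementation.

```python
import numpy as np

# Row player's payoff in rock-paper-scissors (zero-sum, so the column
# player's payoff is the negation).
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]])

def regret_matching(cum_regret):
    """Turn cumulative regrets into a strategy (uniform if all regrets <= 0)."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    return positive / total if total > 0 else np.full_like(cum_regret, 1.0 / len(cum_regret))

def self_play(iterations=20000):
    regret = [np.zeros(3), np.zeros(3)]
    strategy_sum = [np.zeros(3), np.zeros(3)]
    for _ in range(iterations):
        sigma = [regret_matching(regret[p]) for p in (0, 1)]
        a = [np.random.choice(3, p=sigma[p]) for p in (0, 1)]
        # Regret of each pure action against the opponent's sampled action.
        u0 = PAYOFF[:, a[1]]
        u1 = -PAYOFF[a[0], :]
        regret[0] += u0 - u0[a[0]]
        regret[1] += u1 - u1[a[1]]
        for p in (0, 1):
            strategy_sum[p] += sigma[p]
    # The *average* strategy is what approximates a Nash equilibrium.
    return [s / s.sum() for s in strategy_sum]

if __name__ == "__main__":
    print(self_play())  # both averages approach (1/3, 1/3, 1/3)
```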


MCRNR: Fast Computing of Restricted Nash Responses by Means of Sampling

April 2011 · 53 Reads · 4 Citations

This paper presents a sample-based algorithm for the computation of restricted Nash strategies in complex extensive form games. Recent work indicates that regret-minimization algorithms using selective sampling, such as Monte-Carlo Counterfactual Regret Minimization (MCCFR), converge faster to Nash equilibrium (NE) strategies than their non-sampled counterparts which perform a full tree traversal. In this paper, we show that MCCFR is also able to establish NE strategies in the complex domain of Poker. Although such strategies are defensive (i.e. safe to play), they are oblivious to opponent mistakes. We can thus achieve better performance by using (an estimation of) opponent strategies. The Restricted Nash Response (RNR) algorithm was proposed to learn robust counter-strategies given such knowledge. It solves a modified game, wherein it is assumed that opponents play according to a fixed strategy with a certain probability, or to a regret-minimizing strategy otherwise. We improve the rate of convergence of the RNR algorithm using sampling. Our new algorithm, MCRNR, samples only relevant parts of the game tree. It is therefore able to converge faster to robust best-response strategies than RNR. We evaluate our algorithm on a variety of imperfect information games that are small enough to solve yet large enough to be strategically interesting, as well as a large game, Texas Hold'em Poker.
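The modified game at the heart of RNR/MCRNR can be pictured with a small sketch: during training, the opponent follows a fixed (estimated) model with probability p and its current regret-minimizing strategy otherwise. The function name and strategy values below are illustrative placeholders, not the paper's implementation.

```python
import numpy as np

def restricted_opponent_action(fixed_model, current_strategy, p, rng=np.random):
    """Draw one opponent action in the 'restricted' game:
    with probability p follow the fixed opponent model,
    otherwise follow the regret-minimizing strategy."""
    strategy = fixed_model if rng.random() < p else current_strategy
    return rng.choice(len(strategy), p=strategy)

# Toy example: an opponent model that folds too often (0 = fold, 1 = call,
# 2 = raise), mixed with a uniform regret-minimizing strategy.
model = np.array([0.7, 0.2, 0.1])
current = np.array([1 / 3, 1 / 3, 1 / 3])
print([restricted_opponent_action(model, current, p=0.75) for _ in range(10)])
```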


Figure 2: An example of a relational decision tree encoding the differentiating function for predicting cards. Internal nodes include relational tests that partition the state space. Terminal nodes contain zero or more examples of the two distributions, effectively denoting a probability distribution over both distributions.
Integrating Opponent Models with Monte-Carlo Tree Search in Poker
  • Article
  • Full-text available

January 2010 · 1,109 Reads · 37 Citations

In this paper we apply a Monte-Carlo Tree Search implementation that is boosted with domain knowledge to the game of poker. More specifically, we integrate an opponent model in the Monte-Carlo Tree Search algorithm to produce a strong poker playing program. Opponent models allow the search algorithm to focus on relevant parts of the game tree. We use an opponent modelling approach that starts from a (learned) prior, i.e., general expectations about opponent behavior, and then learns a relational regression tree-function that adapts these priors to specific opponents. Our modelling approach can generate detailed game features or relations on-the-fly. Additionally, using a prior we can already make reasonable predictions even when limited experience is available for a particular player. We show that Monte-Carlo Tree Search with integrated opponent models performs well against state-of-the-art poker programs.
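One way an opponent model can focus the search, sketched under assumed interfaces (the Node class and the opponent_model callable are hypothetical): at decision nodes belonging to the opponent, children are sampled from the model's predicted action distribution, while the program's own nodes use a standard UCT rule.

```python
import math
import random

class Node:
    """Minimal search-tree node holding per-action statistics."""
    def __init__(self, actions):
        self.children = {a: None for a in actions}
        self.visits = {a: 0 for a in actions}
        self.values = {a: 0.0 for a in actions}
        self.total_visits = 0

def uct_action(node, c=1.4):
    """Standard UCT: exploit mean value plus an exploration bonus."""
    def score(a):
        if node.visits[a] == 0:
            return float("inf")
        exploit = node.values[a] / node.visits[a]
        explore = c * math.sqrt(math.log(node.total_visits) / node.visits[a])
        return exploit + explore
    return max(node.children, key=score)

def select_action(node, is_opponent_node, opponent_model):
    if is_opponent_node:
        # Sample from the opponent model's predicted distribution,
        # steering the search toward the parts of the tree the opponent
        # is actually expected to reach.
        actions = list(node.children)
        probs = opponent_model(actions)
        return random.choices(actions, weights=probs, k=1)[0]
    return uct_action(node)  # our own decision nodes use plain UCT

# Example: a model that expects the opponent to call most of the time.
node = Node(["fold", "call", "raise"])
model = lambda actions: [0.1, 0.7, 0.2]
print(select_action(node, is_opponent_node=True, opponent_model=model))
```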



Abstraction and Generalization in Reinforcement Learning: A Summary and Framework

May 2009 · 3,253 Reads · 51 Citations · Lecture Notes in Computer Science

In this paper we survey the basics of reinforcement learning, generalization and abstraction. We start with an introduction to the fundamentals of reinforcement learning and motivate the necessity for generalization and abstraction. Next we summarize the most important techniques available to achieve both generalization and abstraction in reinforcement learning. We discuss basic function approximation techniques and delve into hierarchical, relational and transfer learning. All concepts and techniques are illustrated with examples.
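As a concrete example of one generalization technique the survey covers, the sketch below shows linear value-function approximation for Q-learning, where Q(s, a) is a dot product of weights and state-action features rather than a table entry; the feature vectors are made-up toy values, not from the paper.

```python
import numpy as np

def q_value(weights, features):
    """Approximate Q(s, a) as a linear function of state-action features."""
    return float(np.dot(weights, features))

def td_update(weights, features, reward, next_features_per_action,
              alpha=0.1, gamma=0.95):
    """One Q-learning update: the target bootstraps on the greedy
    value of the successor state, and the error moves the weights
    along the feature vector."""
    target = reward + gamma * max(q_value(weights, f) for f in next_features_per_action)
    error = target - q_value(weights, features)
    return weights + alpha * error * features

# One illustrative update with 4-dimensional features and 2 successor actions.
w = np.zeros(4)
phi = np.array([1.0, 0.0, 0.5, 0.0])
next_phis = [np.array([0.0, 1.0, 0.0, 0.5]), np.array([0.5, 0.5, 0.0, 0.0])]
w = td_update(w, phi, reward=1.0, next_features_per_action=next_phis)
print(w)
```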


Fig. 1. The MiniGate environment.
Fig. 2. Example of a script drawn by the random script generation procedure. Note that some of the rules will never be executed. For example, the wizard cannot cast "Fireball," because he can cast only one level-3 spell, which was "Monster Summoning I." Further optimization reduces the probability of such coincidences.
Fig. 3. Diversity versus winning ratio for DS-B and DS-M, against various opponent tactics (summoning, optimized, offensive, and novice).
Effective and Diverse Adaptive Game AI

March 2009 · 1,264 Reads · 21 Citations · IEEE Transactions on Computational Intelligence and AI in Games

Adaptive techniques tend to converge to a single optimum. For adaptive game AI, such convergence is often undesirable, as repetitive game AI is considered to be uninteresting for players. In this paper, we propose a method for automatically learning diverse but effective macros that can be used as components of adaptive game AI scripts. Macros are learned by a cross-entropy method (CEM). This is a selection-based optimization method that, in our experiments, maximizes an interestingness measure. We demonstrate the approach in a computer role-playing game (CRPG) simulation with two duelling wizards, one of which is controlled by an adaptive game AI technique called "dynamic scripting." Our results show that the macros that we learned manage to increase both adaptivity and diversity of the scripts generated by dynamic scripting, while retaining playing strength.
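For readers unfamiliar with the optimizer, here is a generic cross-entropy method loop of the kind the paper builds on: sample candidates from a parameterized distribution, keep the elite fraction under the objective, and refit the distribution to the elites. The Gaussian parameterization and the toy objective standing in for an interestingness measure are assumptions for illustration only.

```python
import numpy as np

def cem(objective, dim, iterations=50, population=100, elite_frac=0.2, seed=0):
    """Generic cross-entropy method over a diagonal Gaussian."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = int(population * elite_frac)
    for _ in range(iterations):
        samples = rng.normal(mean, std, size=(population, dim))
        scores = np.array([objective(x) for x in samples])
        elites = samples[np.argsort(scores)[-n_elite:]]   # keep the best candidates
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean

# Toy objective standing in for an interestingness measure.
best = cem(lambda x: -np.sum((x - 0.5) ** 2), dim=5)
print(best)  # approaches [0.5, 0.5, 0.5, 0.5, 0.5]
```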


An evolutionary game-theoretic analysis of poker strategies

January 2009 · 582 Reads · 35 Citations · Entertainment Computing

In this paper we investigate the evolutionary dynamics of strategic behavior in the game of poker by means of data gathered from a large number of real world poker games. We perform this study from an evolutionary game theoretic perspective using two Replicator Dynamics models. First we consider the basic selection model on this data, secondly we use a model which includes both selection and mutation. We investigate the dynamic properties by studying how rational players switch between different strategies under different circumstances, what the basins of attraction of the equilibria look like, and what the stability properties of the attractors are. We illustrate the dynamics using a simplex analysis. Our experimental results confirm existing domain knowledge of the game, namely that certain strategies are clearly inferior while others can be successful given certain game conditions.
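A minimal sketch of the two replicator-dynamics models referred to above, pure selection and selection plus mutation, is given below; the payoff matrix is a toy stand-in rather than the one estimated from the poker data.

```python
import numpy as np

def replicator_step(x, A, dt=0.01, mutation=0.0):
    """One Euler step of the replicator dynamics on population shares x.
    Strategies with above-average fitness grow; a uniform mutation term
    pulls the population toward the centre of the simplex."""
    fitness = A @ x
    avg_fitness = x @ fitness
    selection = x * (fitness - avg_fitness)
    drift = mutation * (1.0 / len(x) - x)
    x_new = x + dt * (selection + drift)
    return x_new / x_new.sum()

# Toy cyclic payoffs between three strategies.
A = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])
x = np.array([0.6, 0.3, 0.1])
for _ in range(1000):
    x = replicator_step(x, A, mutation=0.01)
print(x)
```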


Figure 1: Policy with communication (relational)
Figure 2: Rewards obtained by freezing the Q-function approximation and following a greedy policy to test for optimality (averaged over 5 runs)
Learning with whom to communicate using relational reinforcement learning.

January 2009 · 97 Reads · 5 Citations

Marc J. V. Ponsen · [...]

Relational reinforcement learning (RRL) has emerged in the machine learning community as a promising new subfield of reinforcement learning (RL) (e.g. [1]). It upgrades RL techniques by using relational representations for states, actions and learned value-functions or policies, which allows more natural representations and abstractions of complex tasks. This leads to a substantial reduction of the state space, making it possible to generalize better and to infer new knowledge.
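To make the representational shift concrete, a toy sketch (with made-up predicates, not taken from the paper): the state is a set of relational facts, and a single rule with variables covers every pair of agents, which is where the state-space reduction comes from.

```python
# Relational state: a set of ground facts instead of a flat feature vector.
state = {
    ("agent", "a1"), ("agent", "a2"),
    ("has_information", "a1", "target_location"),
    ("connected", "a1", "a2"),
}

def should_communicate(state, sender, receiver):
    # Relational rule: communicate if the sender holds information and a
    # channel to the receiver exists; the same rule generalizes over any
    # pair of agents rather than being learned per concrete state.
    return (("has_information", sender, "target_location") in state
            and ("connected", sender, receiver) in state)

print(should_communicate(state, "a1", "a2"))  # True
```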



Bayes-Relational Learning of Opponent Models from Incomplete Information in No-Limit Poker.

January 2008 · 201 Reads · 14 Citations

We propose an opponent modeling approach for no-limit Texas hold'em poker that starts from a (learned) prior, i.e., general expectations about opponent behavior, and learns a relational regression tree-function that adapts these priors to specific opponents. An important asset is that this approach can learn from incomplete information (i.e. without knowing all players' hands in training games).
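The prior-plus-adaptation idea can be sketched with a much simpler stand-in than the paper's relational regression trees: keep pseudo-counts over the opponent's actions, initialize them from a population-level prior, and update them as observations of a specific opponent arrive. All names and numbers below are illustrative.

```python
import numpy as np

ACTIONS = ("fold", "call", "raise")

class OpponentModel:
    """Dirichlet-style pseudo-counts: the prior encodes general expectations,
    observations of a specific opponent gradually override it."""
    def __init__(self, prior_counts):
        self.counts = np.array(prior_counts, dtype=float)

    def observe(self, action):
        self.counts[ACTIONS.index(action)] += 1.0

    def predict(self):
        return self.counts / self.counts.sum()

model = OpponentModel(prior_counts=[4.0, 10.0, 6.0])   # prior: mostly calls
for a in ["raise", "raise", "raise", "call"]:          # this opponent raises a lot
    model.observe(a)
print(dict(zip(ACTIONS, model.predict())))
```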


Citations (23)


... Abstractions have also been utilized to reduce the memory required to store environment representations [19], [20] and to alleviate the computational complexity of evaluating cost functions in active-sensing applications [21]. However, while identifying the relevant aspects of a problem to generate task-relevant abstractions has long been considered vital to intelligent reasoning [22]- [27], the means by which they are generated has traditionally been heavily reliant on user-provided rules. ...

Reference:

Information-theoretic Abstraction of Semantic Octree Models for Integrated Perception and Planning
Abstraction and Generalization in Reinforcement Learning: A Summary and Framework
  • Citing Article
  • January 2010

Lecture Notes in Computer Science

... In addition, we require our algorithms to operate efficiently in real time (online), as opposed to algorithms that perform offline computations assuming they have access to a large number of samples of the opponent's strategy in advance [9,13]. That prior work also assumed access to historical data which included the private information of the opponents (i.e., their hole cards) even when such information was only observed by the opponent. ...

MCRNR: Fast Computing of Restricted Nash Responses by Means of Sampling
  • Citing Article
  • April 2011

... To develop advanced computer players of Big2 with challenging artificial intelligence, we studied existing algorithms in games including Monte Carlo Tree Search (MCTS [8]-[12]), Information Set Monte Carlo Tree Search (ISMCTS [13]-[17]), Big2 artificial intelligence algorithms [18]-[20], and Libratus [21], which has the highest win rate ...

Integrating Opponent Models with Monte-Carlo Tree Search in Poker

... This game has intelligent agents that are developed with hierarchical reinforcement learning approaches. Agents have been developed using the Q-learning [16] algorithm and its modified versions [17]. Civilization IV is the fourth game of the popular turn-based strategy video game series, published by 2K Games in 2005. ...

Hierarchical Reinforcement Learning with Deictic Representation in a Computer Game

... Others have applied evolutionary game models to real world data of online poker play [28,29]. The learning of the players in the data set was summarized using a handful of strategy descriptors, and the learning of agents over the course of play was analyzed. ...

The dynamics of human behaviour in poker

... As the developers prepared the evolution offline against the game agents, this led to the successful generation of robust strategies for game agents. This can improve not only the game AI domain knowledge but also the performance of game agents (Ponsen and Spronck, 2004). ...

IMPROVING ADAPTIVE GAME AI WITH EVOLUTIONARY LEARNING

... At the same time, researchers from Maastricht University, building on the links between RL methods and evolutionary game theory [Tuyls et al., 2002, 2003], studied the evolutionary dynamics of heuristic strategies for Texas Hold'em Poker [Ponsen et al., 2009] and continuous double auctions [Parsons, 2007, Kaisers et al., 2008]. Later, these works were extended by Tuyls and colleagues at DeepMind, as they appealed to EGTA to analyze and evaluate the RL breakthroughs achieved in Go, Capture the Flag, StarCraft, and other games [Balduzzi et al., 2018, Tuyls et al., 2018a,b, 2020]. ...

An evolutionary game-theoretic analysis of poker strategies
  • Citing Article
  • January 2009

Entertainment Computing