Thesis

HRLB^2: A Game AI Architecture for Believable Bots That Unifies the Elements of Flow and Reinforcement Learning

Abstract

From an artificial intelligence standpoint, creating adaptive games that maximize players' enjoyment has remained a challenge: it is unclear how to effectively design and implement adaptive game modules, or how to determine which game features should be adjusted to achieve this objective. To address these challenges, this thesis presents a generic flow framework for game AI -- FlowAI. The framework describes which modules and which gameplay features can be adapted to design a video game that facilitates the optimal experience known as flow. As a first step toward empirically evaluating FlowAI, we approach the problem of fostering immersion. In particular, we focus on designing believable behaviors for Non-Player Characters (NPCs), that is, NPCs that appear to be controlled by a human player. To achieve this goal, the thesis introduces HRLB^2 -- a model-based hierarchical reinforcement learning framework for believable bots. This novel approach is designed to overcome the two main challenges of creating human-like NPCs. The first is exploring domains with high-dimensional state-action spaces while satisfying constraints imposed by the traits that characterize human-like behavior. The second is generating diverse behaviors that also adapt to the opponent's play style. We evaluated the effectiveness of the framework in the domain of the 2D fighting game Street Fighter IV: we implemented a bot with the framework and assessed its human-likeness with a Turing test. The results of these tests demonstrate that the bot behaves in a human-like manner.
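To make the hierarchical idea concrete, the sketch below shows a two-level decision loop of the kind the abstract describes: a high-level policy picks a tactic adapted to the opponent's style, and a low-level policy picks primitive actions subject to human-likeness constraints. Every name and heuristic in it is an illustrative assumption, not the actual HRLB^2 implementation.

```python
# A minimal two-level decision loop in the spirit of HRLB^2; all names
# and heuristics below are illustrative assumptions, not the thesis code.
import random

TACTICS = ["pressure", "zone", "bait"]                    # high-level sub-goals
ACTIONS = ["forward", "back", "punch", "kick", "block"]   # primitive moves

def human_like(action, history):
    """Toy human-likeness constraint: avoid robotic spamming of one move."""
    return history[-3:].count(action) < 3

def select_tactic(opponent_style):
    # Weight tactics by how well they counter the opponent's observed
    # style; sampling (rather than argmax) keeps the behavior diverse.
    weights = [opponent_style.get(t, 1.0) for t in TACTICS]
    return random.choices(TACTICS, weights=weights, k=1)[0]

def select_action(tactic, history):
    # In a real agent the tactic would condition the low-level policy;
    # here it merely biases the choice toward attack or defence.
    pool = ["punch", "kick", "forward"] if tactic == "pressure" else ACTIONS
    legal = [a for a in pool if human_like(a, history)]
    return random.choice(legal or ACTIONS)

history = []
opponent_style = {"pressure": 2.0, "zone": 1.0, "bait": 0.5}  # estimated online
for _ in range(10):
    tactic = select_tactic(opponent_style)
    action = select_action(tactic, history)
    history.append(action)
    print(tactic, action)
```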
References
Article · Full-text available
The creation of believable behaviors for Non-Player Characters (NPCs) is key to improving the players' experience while playing a game. To achieve this objective, we need to design NPCs that appear to be controlled by a human player. In this paper, we propose a hierarchical reinforcement learning framework for believable bots (HRLB^2). This novel approach is designed to overcome two main challenges in the creation of human-like NPCs. The first is exploring domains with high-dimensional state-action spaces while satisfying constraints imposed by the traits that characterize human-like behavior. The second is generating behavior diversity while also adapting to the opponent's play style. We evaluated the effectiveness of our framework in the domain of the 2D fighting game Street Fighter IV. The results of our tests demonstrate that our bot behaves in a human-like manner.
Article · Full-text available
A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo's own move selections and also the winner of AlphaGo's games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100-0 against the previously published, champion-defeating AlphaGo.
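The training loop this abstract describes (self-play generates search policies and game outcomes, which then become the network's training targets) can be sketched as follows; the game, net, and mcts_search interfaces here are assumptions for illustration, not the published implementation.

```python
# Sketch of an AlphaGo Zero-style self-play loop; `game`, `net`, and
# `mcts_search` are assumed interfaces, not DeepMind's published code.
import random

def self_play_game(net, mcts_search, game):
    """Play one self-play game, collecting (state, search_policy) pairs."""
    examples = []
    while not game.is_over():
        pi = mcts_search(net, game)             # {move: visit-count probability}
        examples.append((game.state(), pi))
        move = random.choices(list(pi), weights=list(pi.values()), k=1)[0]
        game.play(move)
    z = game.winner()                           # +1 / -1 from first player's view
    # Label every position with the final outcome, so the value head learns
    # to predict the winner while the policy head imitates the search.
    return [(state, pi, z) for state, pi in examples]

def train_iteration(net, mcts_search, new_game, n_games=100):
    data = []
    for _ in range(n_games):
        data += self_play_game(net, mcts_search, new_game())
    net.fit(data)   # e.g. loss = (z - v)^2 - pi . log p + regularisation
    return net
```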
Conference Paper · Full-text available
Reinforcement Learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort from the human designer. To date, human factors are generally not considered in the development and evaluation of possible approaches. In this paper, we propose and evaluate a novel method, grounded in the human psychology literature, which we show to be both effective and efficient, for expert and non-expert designers alike, at injecting human knowledge to speed up tabular RL.
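The abstract does not spell out the method itself, but a common baseline for injecting designer knowledge into tabular RL is to seed the Q-table with heuristic values, as in this sketch (all names here are hypothetical):

```python
# One standard way to inject human knowledge into tabular RL: initialise
# the Q-table from a designer-provided heuristic. Illustrative only; the
# paper's own mechanism is not detailed in this abstract.
from collections import defaultdict

def make_q_table(states, actions, heuristic=None, default=0.0):
    """heuristic(s, a) -> float encodes the designer's prior preference."""
    q = defaultdict(lambda: default)
    if heuristic is not None:
        for s in states:
            for a in actions:
                q[(s, a)] = heuristic(s, a)
    return q

# Example: the designer hints that moving right is usually good, which
# biases early exploration without fixing the final learned policy.
q = make_q_table(range(5), ["left", "right"],
                 heuristic=lambda s, a: 1.0 if a == "right" else 0.0)
```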
Article · Full-text available
In this article we study the transfer-learning model of action advice under a budget. We focus on reinforcement learning teachers providing action advice to heterogeneous students playing the game of Pac-Man under a limited advice budget. First, we examine several critical factors affecting advice quality in this setting, such as the teacher's average performance, its variance, and the importance of reward discounting in advising. The experiments show the non-trivial importance of the coefficient of variation (CV), which relates the variance of a policy's returns to their mean, as a statistic for choosing policies that generate advice. Second, the article studies policy learning for distributing advice under a budget. Whereas most methods in the relevant literature rely on heuristics for advice distribution, we formulate the problem as a learning one and propose a novel RL algorithm capable of learning when to advise, adapting to the student and the task at hand. Furthermore, we argue that learning to advise under a budget is an instance of a more generic learning problem: Constrained Exploitation Reinforcement Learning.
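The CV mentioned above is simply the standard deviation of a policy's returns divided by their mean; intuitively, a low CV marks a teacher that is both strong and consistent. A small worked sketch follows (the selection rule is an illustrative assumption, not the paper's algorithm):

```python
# Coefficient of variation (CV = std / mean) over a policy's returns.
# The teacher-selection rule below is illustrative, not the paper's method.
from statistics import mean, stdev

def coefficient_of_variation(returns):
    m = mean(returns)
    return stdev(returns) / m if m else float("inf")

candidate_teachers = {
    "steady":  [900, 950, 910, 940],    # strong and stable      -> CV ~ 0.026
    "erratic": [1200, 200, 1100, 300],  # higher mean but erratic -> CV ~ 0.75
}
best = min(candidate_teachers,
           key=lambda t: coefficient_of_variation(candidate_teachers[t]))
print(best)  # -> steady
```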
Article · Full-text available
Perception is guided by the anticipation of future events. It has been hypothesized that this process may be implemented by pattern completion in early visual cortex, in which a stimulus sequence is recreated after only a subset of the visual input is provided. Here we test this hypothesis using ultra-fast functional magnetic resonance imaging to measure BOLD activity at precisely defined receptive field locations in visual cortex (V1) of human volunteers. We find that after familiarizing subjects with a spatial sequence, flashing only the starting point of the sequence triggers an activity wave in V1 that resembles the full stimulus sequence. This preplay activity is temporally compressed compared to the actual stimulus sequence and remains present even when attention is diverted from the stimulus sequence. Preplay might therefore constitute an automatic prediction mechanism for temporal sequences in V1.