Fig 1 - uploaded by Simon Lucas
Source publication
We present an application of Monte Carlo tree search (MCTS) for the game of Ms Pac-Man. Contrary to most applications of MCTS to date, Ms Pac-Man requires almost real-time decision making and does not have a natural end state. We approached the problem by performing Monte Carlo tree searches on a five player tree representation of the game with limited t...
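As a rough illustration of the approach the abstract describes (Monte Carlo tree search over a five-player tree with a limited search depth, since the game has no natural end state), the sketch below shows a depth-limited, max^n-style MCTS loop in Python. The GameState interface (legal_moves, player_to_move, apply, payoffs), the rollout depth, and the exploration constant are illustrative assumptions, not the authors' implementation.

```python
import math
import random

NUM_PLAYERS = 5      # Ms Pac-Man plus the four ghosts (assumed ordering)
ROLLOUT_DEPTH = 40   # fixed look-ahead; the game has no natural end state
UCT_C = 1.4          # exploration constant (illustrative value)

class Node:
    """One node of a max^n-style search tree: rewards are kept per player."""
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children, self.visits = [], 0
        self.reward = [0.0] * NUM_PLAYERS
        self.untried = list(state.legal_moves())      # assumed GameState API

    def select_child(self):
        # The player to move maximises their own reward component (max^n).
        p = self.state.player_to_move()
        return max(self.children,
                   key=lambda c: c.reward[p] / c.visits +
                                 UCT_C * math.sqrt(math.log(self.visits) / c.visits))

def rollout(state):
    """Random play to a fixed depth, then score every player."""
    for _ in range(ROLLOUT_DEPTH):
        state = state.apply(random.choice(state.legal_moves()))
    return state.payoffs()                            # length-NUM_PLAYERS vector

def mcts(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = node.select_child()
        # 2. Expansion: add one untried move as a child.
        if node.untried:
            move = node.untried.pop()
            node.children.append(Node(node.state.apply(move), node, move))
            node = node.children[-1]
        # 3. Simulation: depth-limited because there is no terminal state.
        payoffs = rollout(node.state)
        # 4. Backpropagation of the whole payoff vector.
        while node is not None:
            node.visits += 1
            node.reward = [r + p for r, p in zip(node.reward, payoffs)]
            node = node.parent
    # Recommend the most visited move at the root.
    return max(root.children, key=lambda c: c.visits).move
```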
Context in source publication
Context 1
... current version of the simulator is a more accurate approximation of the original game, not only at the functional but also at the cosmetic level, and includes the four original mazes. Figure 1 shows a screen shot of each level in action. Nevertheless, there are still important differences with respect to the original game: ...
Similar publications
From an AI point of view, Real-Time Strategy (RTS) games are hard because they have enormous state spaces, they are real-time and partially observable. In this paper, we explore an approach to deploy game- tree search in RTS games by using game state abstraction, and explore the effect of using different abstractions over the game state. Different...
Real-Time Strategy (RTS) games have proven to be very resilient to standard adversarial tree search techniques. Recently, a few approaches to tackle their complexity have emerged that use game state or move abstractions, or both. Unfortunately, the supporting experiments were either limited to simpler RTS environments (µRTS, SparCraft) or lack test...
From an AI point of view, Real-Time Strategy (RTS) games are hard because they have enormous state spaces, they are real-time and partially observable. In this paper, we present an approach to deploy game-tree search in RTS games by using game state abstraction. We propose a high-level abstract representation of the game state, that significantly r...
Citations
... To verify the possibility of combining a dynamic influence map with reinforcement learning, Ms. Pac-Man, a popular test environment in the field of AI [3][4][5][6][7][8], is used as the learning and evaluation environment. In this kind of environment, the complete capabilities of the dynamic influence map, which represents the dynamic information of the current state of the game, can be displayed. ...
... The larger the value of α, the greater the influence that source point a spreads in the direction of its movement. In this study, d_{a,b} is used to calculate the influence of b according to Equation (5). ...
Almost all recent deep reinforcement learning algorithms use four consecutive frames as the state space to retain the dynamic information. If the training state data constitute an image, the state space is used as the input of the neural network for training. As an AI-assisted decision-making technology, a dynamic influence map can describe dynamic information. In this paper, we propose the use of a frame image superimposed with an influence map as the state space to express dynamic information. Herein, we optimize Ape-x as a distributed reinforcement learning algorithm. Sparse reward is an issue that must be solved in refined intelligent decision making. The use of an influence map is proposed to generate the intrinsic reward when there is no external reward. The experiments conducted in this study prove that the combination of a dynamic influence map and deep reinforcement learning is effective. Compared with the traditional method that uses four consecutive frames to represent dynamic information, the score of the proposed method is increased by 11–13%, the training speed is increased by 59%, the video memory consumption is reduced by 30%, and the memory consumption is reduced by 50%. The proposed method is compared with the Ape-x algorithm without an influence map, DQN, N-Step DQN, QR-DQN, Dueling DQN, and C51. The experimental results show that the final score of the proposed method is higher than that of the compared baseline methods. In addition, the influence map is used to generate an intrinsic reward to effectively resolve the sparse reward problem.
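A minimal sketch of two ideas from the abstract above is given below: stacking a single frame with an influence-map channel as the network input, and falling back to the influence map for an intrinsic reward when no external reward is given. The grid shape, the distance decay, and the influence_map/intrinsic_reward helpers are assumptions for illustration; the paper's own equations are not reproduced here.

```python
import numpy as np

def influence_map(shape, sources, decay=0.9):
    """Spread each source's influence over the grid with exponential distance decay.
    `sources` is a list of (row, col, strength) tuples; the decay constant is assumed."""
    rows, cols = np.indices(shape)
    grid = np.zeros(shape, dtype=np.float32)
    for r, c, strength in sources:
        dist = np.abs(rows - r) + np.abs(cols - c)     # Manhattan distance
        grid += strength * (decay ** dist)
    return grid

def build_state(frame, sources):
    """Stack one grey-scale frame with its influence map as a 2-channel input,
    instead of the usual four consecutive frames."""
    imap = influence_map(frame.shape, sources)
    return np.stack([frame, imap], axis=0)             # shape: (2, H, W)

def intrinsic_reward(imap, agent_pos, external_reward):
    """When the environment gives no reward, fall back to the influence value at
    the agent's position as an intrinsic reward (illustrative rule)."""
    if external_reward != 0:
        return external_reward
    return float(imap[agent_pos])

# Toy usage: an 84x84 frame with two influence sources (one positive, one negative).
frame = np.zeros((84, 84), dtype=np.float32)
state = build_state(frame, sources=[(10, 10, 1.0), (40, 60, -0.5)])
print(state.shape)   # (2, 84, 84)
```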
... The success of MCTS in board games has encouraged researchers to apply it in other scientific fields. As a result, MCTS has been successfully applied in video games [23,24], protein folding problems [25], materials design and discovery [26,27], mixed-integer planning [28,29], and artificial general intelligence for games [30]. However, there still exist only a small number of engineering applications related to MCTS [31,32]. ...
Truss layout optimization under complex constraints has been a challenging problem for decades; it aims to find the optimal node locations, connection topology between nodes, and cross-sectional areas of connecting bars. Monte Carlo Tree Search (MCTS) is a reinforcement learning search technique well suited to solving decision-making problems. Inspired by the success of AlphaGo using MCTS, the truss layout problem is formulated as a Markov Decision Process (MDP) model, and a 2-stage MCTS-based algorithm, AlphaTruss, is proposed for generating optimal truss layouts considering topology, geometry, and bar size. In this MDP model, three sequential action sets of adding nodes, adding bars, and selecting sectional areas greatly expand the solution space, and the reward function gives feedback to actions according to both geometric stability and structural simulation. To find the optimal sequential actions, AlphaTruss solves the MDP model and gives the best decision in each design step by searching and learning through MCTS. Compared with existing results from the literature, AlphaTruss exhibits better performance in finding the truss layout with the minimum weight under stress, displacement, and buckling constraints, which verifies the validity and efficiency of the established algorithm.
... Since then, MCTS has been applied to a wide variety of computerized card games and board games such as Scrabble, Poker, Othello, and Settlers of Catan [4][5][6][7]. MCTS has also been applied to several different video games including real-time games such as Ms. Pac-Man, and multiplayer games such as Starcraft and Civilization II [1], [8], [9], [10]. These examples of the application of MCTS demonstrate that MCTS has the ability to adapt to games featuring a large number of possible actions and a complex set of system variables. ...
... Using this approach, Gelly et al. were able to develop an improved game AI for computerized Go which is capable of beating master level players on a 9 x 9 board [11]. Another focus for MCTS improvement lies in the algorithm's rollout policy which allows MCTS to focus on promising actions while ignoring unprofitable actions [9]. The default rollout policy used in MCTS consists of a set of purely random moves. ...
Monte Carlo Tree Search (MCTS) is a best-first search algorithm that has produced many breakthroughs in AI research. MCTS has been applied to a wide variety of domains including turn-based board games, real-time strategy games, multiagent systems, and optimization problems. In addition to its ability to function in a wide variety of domains, MCTS is also a suitable candidate for performance improving modifications such as the improvement of its default rollout policy. In this work, we propose an enhancement to MCTS called Multiagent Monte Carlo Tree Search (MAMCTS) which incorporates multiagent credit evaluations in the form of Difference Evaluations. We show that MAMCTS can be successfully applied to a cooperative system called Multiagent Gridworld. We then show that the use of Difference Evaluations in MAMCTS offers superior control over agent decision making compared with other forms of multiagent credit evaluations, namely Global Evaluations. Furthermore, we show that the default rollout policy can be improved using a Genetic Algorithm, with (µ + λ) selection, resulting in a 37.6% increase in overall system performance within the training domain. Finally, we show that the trained rollout policy can be transferred to more complex multiagent systems resulting in as high as a 14.6% increase in system performance compared to the default rollout policy.
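The abstract above contrasts Difference Evaluations with Global Evaluations for crediting individual agents. The sketch below shows the standard difference-reward idea, D_i = G(z) - G(z with agent i's contribution removed), on a toy coverage task; the scoring function and the "remove the agent" counterfactual are assumptions for illustration rather than the MAMCTS implementation.

```python
def global_evaluation(joint_actions, score):
    """Global evaluation: every agent receives the full system score."""
    return {agent: score(joint_actions) for agent in joint_actions}

def difference_evaluations(joint_actions, score, null_action=None):
    """Difference evaluation for agent i: D_i = G(z) - G(z with i's action
    replaced by a counterfactual 'null' action). Agents whose action actually
    helped the team receive larger credit."""
    g = score(joint_actions)
    credits = {}
    for agent in joint_actions:
        counterfactual = dict(joint_actions)
        counterfactual[agent] = null_action
        credits[agent] = g - score(counterfactual)
    return credits

# Toy scoring function: number of distinct cells covered by the team.
def coverage(joint_actions):
    return len({a for a in joint_actions.values() if a is not None})

team = {"a1": (0, 0), "a2": (0, 0), "a3": (1, 1)}   # a1 and a2 overlap on one cell
print(global_evaluation(team, coverage))       # {'a1': 2, 'a2': 2, 'a3': 2}
print(difference_evaluations(team, coverage))  # {'a1': 0, 'a2': 0, 'a3': 1}
```

In the toy run, only the agent covering a unique cell receives positive credit, while the two overlapping agents receive none; this is the kind of per-agent signal the abstract argues gives finer control than a shared global score.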
... This version came with three default ghosts teams showing different behaviours (Random, Legacy and Pincer). It was later extended by Samothrakis, Robles and Lucas [13] and modified further by Rohlfshagen [11] for use in the competition. The current version of the software bears little resemblance to the original code and is continually improved in response to comments by the competition's participants. ...
... References, by the version of the game used: Original (Screen-Capture): [9], [10], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35]; Public Variant: [36], [37], [38], [39], [40], [41], [42], [43]; Ms Pac-Man vs Ghosts engine: [12], [44], [20], [45], [46], [47], [13], [48], [49], [50], [51], [52], [53], [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65], [66], [67]; Ms Pac-Man vs Ghost Team engine: [14]; Own implementation: [68], [69], [70], [71], [72], [73], [74], [75], [76], [77], [78], [79], [80], [81], [82], [83], [84], [85], [86], [87], [88], [89], [90], [91], [92] ... in the most publications. Prior to the competitions described above, papers were largely fragmented, with each using their own, often much simplified version of the game. ...
... AI techniques used, by reference — Computational intelligence (67): Rule-based & Finite State Machines: [71], [16], [15], [72], [18], [23], [24], [9], [52], [10], [65]; Tree Search & Monte Carlo: [20], [25], [26], [74], [13], [29], [30], [49], [51], [59], [56], [61]; Evolutionary Algorithms: [68], [69], [47], [45], [46], [48], [53], [50], [58], [57], [59], [60], [63]; Neural Networks: [70], [38], [75]; Neuro-evolutionary: [12], [36], [37], [44], [28], [31], [32], [33], [77], [62], [67], [64], [43]; Reinforcement Learning: [73], [21], [19], [22], [78], [41], [42], [34], [82], [92], [35]; Other: [27], [17], [54], [79], [90], [91]. Game psychology (7): [93], [94], [95], [96], [97], [98], [99]. Psychology (3): [100], [101], [81]. Robotics (2): [102], [103]. Sociology (2): [104], [105]. Brain Computer Interfaces (3): [83], [84], [85]. Biology and Animals (1): [106]. Education (4): [102], [107], [103], [80]. Other (4): [108], [39], [109], [40]. ...
Pac-Man and its equally popular successor Ms Pac-Man are often credited with being the frontrunners of the golden age of arcade video games. Their impact goes well beyond the commercial world of video games and both games have featured in numerous academic research projects over the last two decades. In fact, scientific interest is on the rise and many avenues of research have been pursued, including studies in robotics, biology, sociology and psychology. The most active field of research is computational intelligence, not least because of popular academic gaming competitions that feature Ms Pac-Man. This paper summarises peer-reviewed research that focuses on either game (or close variants thereof) with particular emphasis on the field of computational intelligence. The potential usefulness of games like Pac-Man for higher education is also discussed and the paper concludes with a discussion of prospects for future work.
... Robles and Lucas [12] applied a tree search method to the screen-capture version of the game. Together with Samothrakis, they implemented Ghost Team agents using MCTS [13]. The same approach was used by Nguyen and Thawonmas [14] to create a full Ghost Team. ...
... Monte Carlo tree search approaches have also grown in popularity over the past few years [23][24][25][26]. Robles and Lucas [23] applied a simple tree search heuristic to a Ms. Pac-Man agent to evaluate the danger of any particular course of action. ...
... Their agents scored over 58000 points. More recently, Samothrakis et al. [25] and Pepels et al. [26] applied Monte Carlo tree search to create high-performing agents, which obtained average scores of around 81000 and 87000 points, respectively. Foderaro et al. [27,28] relied on tree searches as well. ...
Conventional reinforcement learning methods for Markov decision processes rely on weakly-guided, stochastic searches to drive the learning process. It can therefore be difficult to predict what agent behaviors might emerge. In this paper, we consider an information-theoretic approach for performing constrained stochastic searches that promote the formation of risk-averse to risk-favoring behaviors. Our approach is based on the value of information, a criterion that provides an optimal trade-off between the expected return of a policy and the policy's complexity. As the policy complexity is reduced, there is a high chance that the agents will eschew risky actions that increase the long-term rewards. The agents instead focus on simply completing their main objective in an expeditious fashion. As the policy complexity increases, the agents will take actions, regardless of the risk, that seek to decrease the long-term costs. A minimal-cost policy is sought in either case; the obtainable cost depends on a single, tunable parameter that regulates the degree of policy complexity. We evaluate the performance of value-of-information-based policies on a stochastic version of Ms. Pac-Man. A major component of this paper is demonstrating that ranges of policy complexity values yield different game-play styles and analyzing why this occurs. We show that low-complexity policies aim to only clear the environment of pellets while avoiding invulnerable ghosts. Higher-complexity policies implement multi-modal strategies that compel the agent to seek power-ups and chase after vulnerable ghosts, both of which reduce the long-term costs.
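The abstract above describes a criterion that trades expected return against policy complexity, with a single tunable parameter moving behaviour from risk-averse to risk-favouring. The sketch below illustrates that kind of trade-off with a prior-regularised softmax policy whose inverse temperature beta plays the role of the complexity knob; this specific form and the KL-based complexity measure are assumptions for illustration, not the paper's value-of-information criterion.

```python
import numpy as np

def complexity_limited_policy(q_values, prior, beta):
    """Soft policy pi(a) proportional to prior(a) * exp(beta * Q(a)).

    beta -> 0   : the policy collapses to the simple prior (low complexity,
                  default, risk-averse behaviour).
    beta -> inf : the policy becomes greedy in Q (high complexity, pursues
                  high-return actions regardless of risk)."""
    logits = np.log(prior) + beta * np.asarray(q_values, dtype=float)
    logits -= logits.max()                    # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def policy_complexity(policy, prior):
    """KL divergence from the prior, used here as the complexity measure."""
    return float(np.sum(policy * np.log(policy / prior)))

q = [1.0, 0.0, 5.0]                  # action 2 has high return but (say) high risk
prior = np.array([0.5, 0.4, 0.1])    # the default behaviour rarely takes action 2
for beta in (0.0, 0.5, 5.0):
    pi = complexity_limited_policy(q, prior, beta)
    print(beta, pi.round(3), round(policy_complexity(pi, prior), 3))
```

As beta grows, the policy shifts probability mass onto the high-return action and its complexity (distance from the prior) increases, mirroring the low-complexity versus multi-modal game-play styles discussed in the abstract.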
... According to Kehoe, the initial gaming AI creations were rule-based systems, the most basic form an intelligent system can take [9]. Games such as Pac-Man are an example of rule-based AI systems, where the four pursuing "ghosts" make navigational decisions based upon simple rules and the position of the player [10], [11]. Kehoe presents FSMs as a development of rule-based AI systems, as an FSM can evaluate many rules simultaneously and factor in the current state of the AI. ...
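The excerpt above uses Pac-Man's ghosts as an example of rule-based AI, with each ghost choosing a move from simple rules and the player's position. A minimal sketch of such a rule is below; the chase/flee rule and the move representation are illustrative assumptions, not the original arcade logic.

```python
def ghost_move(ghost_pos, player_pos, legal_moves, frightened=False):
    """Pick the legal move that minimises (or, when frightened, maximises)
    Manhattan distance to the player - a simple rule of the kind the
    excerpt describes."""
    def dist_after(move):
        nx, ny = ghost_pos[0] + move[0], ghost_pos[1] + move[1]
        return abs(nx - player_pos[0]) + abs(ny - player_pos[1])
    pick = max if frightened else min
    return pick(legal_moves, key=dist_after)

# Toy usage: ghost at (5, 5), player at (2, 5), moves are unit steps on a grid.
moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
print(ghost_move((5, 5), (2, 5), moves))          # (-1, 0): towards the player
print(ghost_move((5, 5), (2, 5), moves, True))    # (1, 0): away from the player
```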
This paper proposes a character generation approach for the M.U.G.E.N. fighting game that can create engaging AI characters using a computationally cheap process without the intervention of the expert developer. The approach uses a Genetic Programming algorithm that refines randomly generated character strategies into better ones using tournament selection. The generated AI characters were tested by twenty-seven human players and were rated according to results, perceived difficulty and how engaging the gameplay was. The main advantages of this procedure are that no prior knowledge of how to code the strategies of the AI character is needed and there is no need to interact with the internal code of the game. In addition, the procedure is capable of creating a wide diversity of players with different strategic skills, which could be potentially used as a starting point to a further adaptive process.
... As patrolling is a task with no inherent end, a different approach is needed. In order to evaluate the expected reward, the default policy is applied up to a fixed time horizon [Samothrakis et al., 2011], after which the reward for the rollout is calculated. ...
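The excerpt above notes that, for tasks with no inherent end, the default policy is run to a fixed time horizon and the rollout is scored there. A small sketch of such a fixed-horizon rollout is below; the environment callbacks, the horizon value, and the toy usage are assumptions for illustration.

```python
import random

def fixed_horizon_rollout(state, step, legal_actions, rollout_reward, horizon=50):
    """Apply the default (here: uniformly random) policy for `horizon` steps,
    then score the resulting state."""
    for _ in range(horizon):
        state = step(state, random.choice(legal_actions(state)))
    return rollout_reward(state)

# Toy usage: a 1-D random walk scored by how far right it ends up.
value = fixed_horizon_rollout(
    state=0,
    step=lambda s, a: s + a,
    legal_actions=lambda s: [-1, 1],
    rollout_reward=lambda s: s,
)
print(value)
```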
Adversarial patrolling is an algorithmic problem where a robot visits sites within a given area so as to detect the presence of an adversary. We formulate and solve a new variant of this problem where intrusion events occur at discrete locations and are assumed to be clustered in time. Unlike related formulations, we model the behaviour of the adversary using a stochastic point process known as the reactive point process, which naturally models temporally self-exciting events such as pest intrusion and weed growth in agriculture. We present an asymptotically optimal, anytime algorithm based on Monte Carlo tree search that plans the motion of a robot given a separate event detection system in order to regulate event propagation at the sites it visits. We illustrate the behaviour of our algorithm in simulation using several scenarios, and compare its performance to a lawnmower planning algorithm. Our results indicate that our formulation and solution are promising in enabling practical applications and further theoretical extensions.
... Samothrakis et al. [24] used a five-player max^n tree with limited tree search depth. The paper experimented with Monte Carlo Tree Search (MCTS) both for Ms. Pac-Man and for the Ghosts. ...
This paper introduces the revival of the popular Ms. Pac-Man Versus Ghost Team competition. We present an updated game engine with Partial Observability constraints, a new Multi-Agent Systems approach to developing Ghost agents and several sample controllers to ease the development of entries. A restricted communication protocol is provided for the Ghosts, providing a more challenging environment than before. The competition will debut at the IEEE Computational Intelligence and Games Conference 2016. Some preliminary results showing the effects of Partial Observability and the benefits of simple communication are also presented.
... Samothrakis et al. (2011) studied the effects of Monte Carlo Tree Search (MCTS) on the Ms Pac-Man agent and the ghosts by using a five-player max^n tree representation of the game (the agent plus the four ghosts). The search is performed only to a limited depth in the game tree, and significantly different sets of payoff rules were used in the tree search for the agent compared to the ghosts. ...
Artificial Intelligence (AI) techniques are successfully used and applied in a wide range of areas, including manufacturing, engineering, economics, medicine and the military. In recent years, there has been an increasing interest in Game Artificial Intelligence, or Game AI. Game AI refers to techniques applied in computer and video games, such as learning, pathfinding, planning, and many others, for creating intelligent and autonomous behaviour in game characters. The main objective of this paper is to highlight several of the most common AI techniques for designing and controlling computer-based characters to play the Ms. Pac-Man game between the years 2005-2012. Ms. Pac-Man is one of the games used as a benchmark for the comparison of autonomous controllers in a series of international Game AI competitions. An extensive content analysis was conducted through a critical review of previous literature related to the field. The findings highlight that, although various and unique techniques are available, the major limitation of previous studies for creating Ms. Pac-Man game characters is a lack of generalization capability across different game characters. The findings could provide a future direction for researchers to improve the generalization capability of AI game characters in the Game Artificial Intelligence market.