Article

World-Championship-Caliber Scrabble

Authors:
Brian Sheppard

Abstract

Computer Scrabble programs have achieved a level of performance that exceeds that of the strongest human players. Maven was the first program to demonstrate this against human opposition. Scrabble is a game of imperfect information with a large branching factor. The techniques successfully applied in two-player games such as chess do not work here. Maven combines a selective move generator, simulations of likely game scenarios, and the B∗ algorithm to produce a world-championship-caliber Scrabble-playing program.
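To make the simulation idea concrete, here is a minimal Python sketch of ranking candidate moves by averaging simulated continuations, in the spirit of the simulation stage the abstract describes but not Maven's actual code; the toy scoring game, the Gaussian follow-up model, and every name below are illustrative assumptions.

import random

# Toy stand-in for a Scrabble simulation stage (not Maven's code): each
# candidate move scores points now and leaves a rack whose quality
# influences a short simulated continuation against a sampled reply.

def simulate_continuation(my_gain, leave_quality, bag):
    """One rollout: the opponent draws a reply score from the bag, and
    our leave quality shifts our follow-up score. Returns the margin."""
    opp_reply = random.choice(bag)
    my_followup = random.gauss(leave_quality, 5.0)
    return my_gain + my_followup - opp_reply

def pick_move(candidates, bag, n_rollouts=500):
    """Rank candidate (immediate_score, leave_quality) pairs by the
    average margin over many simulated continuations."""
    def avg_margin(move):
        gain, leave = move
        return sum(simulate_continuation(gain, leave, bag)
                   for _ in range(n_rollouts)) / n_rollouts
    return max(candidates, key=avg_margin)

# A 30-point play with a poor leave vs. a 22-point play with a good leave.
bag = [10, 15, 20, 25, 30]
print(pick_move([(30, 2.0), (22, 14.0)], bag))

With these numbers the simulation prefers the 22-point play, mirroring the way simulation can favor a lower-scoring move that keeps a better rack.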


... Table 3 shows that Maven has maintained at least a slight superiority over human experts since its debut in 1986: its tournament record stands at 3500 wins and 1500 losses against opponents with an average rating of 1975. Currently, Maven and Quackle are the leading Scrabble AIs, and both have defeated the best human champions in tournaments [17]. Between 1990, when simulation became available as an analytic tool, and 1996, when simulations were first used in competitive play, human players improved their positional skills by studying simulation results [17]. ...
... Currently, Maven and Quackle are the leading Scrabble AIs, and both have defeated the best human champions in tournaments [17]. Between 1990, when simulation became available as an analytic tool, and 1996, when simulations were first used in competitive play, human players improved their positional skills by studying simulation results [17]. This may be the earliest time at which a computer program achieved world-class status over human masters in a non-trivial game of skill. ...
... MAVEN AND HUMAN EXPERTS COMPARED [17] ...
Preprint
Artificial intelligence (AI) has a long-standing and healthy relationship with games, which have become a popular application area for AI-driven research in game playing, game design, and more. With recent AI game-playing programs exceeding human capabilities, fairness becomes an important issue to address so that a game retains its attractiveness for future players. The issue is compounded in turn-based games, where the first player may have a large advantage over the subsequent player(s) (the advantage of initiative). This paper proposes an innovative way to keep a game attractive while maintaining fairness by adopting komi (a compensation system). The solution is validated by applying the system to a word anagram game, Scrabble, where fairness can be maintained based on the skill level of the players.
... However, defeating a computer AI opponent requires complex and efficient heuristics. Currently, Maven and Quackle are the leading Scrabble AIs, and both have defeated the best human champions in tournaments [20]. ...
... Between 1990, when simulation became available as an analytical tool, and 1996, when simulations were first used in competitive play, human players improved their positional skills by studying simulation results [20]. However, Table 4 shows that Maven has maintained at least a slight superiority over human experts since its debut in 1986: its tournament record stands at 3500 wins and 1500 losses against opponents with an average rating of 1975. ...
... However, Table 4 shows that Maven has maintained at least a slight superiority over human experts since its debut in 1986: its tournament record stands at 3500 wins and 1500 losses against opponents with an average rating of 1975. This may be the earliest time at which a computer program achieved world-class status over human masters in a non-trivial game of skill [20]. Table 4. MAVEN and human experts compared [20]. ...
Preprint
Games are attractive and engaging because of the complexity they pose to the player. However, popular games can lose their attractiveness when the advantage of initiative is large. Actual games played by high-performance AI such as AlphaZero suggest that the first player's advantage grows as the performance level increases, implying that games with a large advantage of initiative lose their attractiveness due to unfairness. This paper explores an innovative way to keep a game attractive. A link between the advantage of initiative and performance level is investigated using Scrabble AI. Using two measures, the advantage of initiative and game refinement, possible treatments are considered. The experimental results with Scrabble AI suggest that reducing the search space from a 15×15 to a 13×13 board is a possible enhancement.
... Maven [12] is another Scrabble AI, created by Brian Sheppard. It has been used in official licensed Hasbro Scrabble games. ...
... 2) Maven tournament statistics: Maven plays better than human experts [12], as shown in Table V. According to [12], Maven's tournament record is 3500 wins and 1500 losses against opponents with an average rating of 1975. Finally, what qualities make MAVEN such a strong player? ...
Conference Paper
Full-text available
This paper explores the advantage of initiative using Scrabble as a test bed. Recently, the list of solved two-person zero-sum games with perfect information has grown. Among them, most games are a win for the first player (i.e., the advantage of initiative), some are draws, and only a few are a win for the second player. Self-play experiments using Scrabble AIs were performed in this study. The results show that the player who established an advantage in the early opening had a higher win expectancy. This implies that the advantage of initiative should be reconsidered at all levels, including nearly perfect players. Thus, we face a new challenge: improving the rules of a game to maintain fairness. Scrabble provides an interesting example because of its randomized initial position. This discussion can be extended to other domains as AI becomes much stronger than before.
... Self-play reinforcement learning approaches have achieved high levels of performance in other games: chess [49-51], checkers [52], backgammon [53], Othello [54], Scrabble [55] and most recently poker [56]. In all of these examples, a value function was trained by regression [54-56] or temporal-difference learning [49-53] from training data generated by self-play. The trained value function was used as an evaluation function in an alpha-beta search [49-54], a simple Monte Carlo search [55, 57] or counterfactual regret minimization [56]. However, these methods used handcrafted input features [49-53, 56] or handcrafted feature templates [54, 55]. ...
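As a concrete illustration of the training loop these excerpts describe, here is a minimal TD(0) update on a linear value function in Python; the toy episode generator and all constants are assumptions made for the sketch, not the pipeline of any cited program.

import random

def td0_update(weights, features, reward, next_value, alpha=0.01, gamma=1.0):
    """One temporal-difference update on a linear value function:
    nudge the weights toward the one-step bootstrapped target."""
    value = sum(w * f for w, f in zip(weights, features))
    delta = reward + gamma * next_value - value   # TD error
    return [w + alpha * delta * f for w, f in zip(weights, features)]

# Toy self-play stand-in: states have two features, the (hidden) true
# value is their difference, and it plays the role of the bootstrapped
# target that self-play transitions would otherwise supply.
weights = [0.0, 0.0]
for _ in range(5000):
    features = [random.random(), random.random()]
    target = features[0] - features[1]
    weights = td0_update(weights, features, reward=0.0, next_value=target)
print(weights)  # approaches something like [1.0, -1.0]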
Article
Full-text available
A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo's own move selections and also the winner of AlphaGo's games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100-0 against the previously published, champion-defeating AlphaGo.
... Monte Carlo Tree Search (MCTS) has had significant success in games such as Scrabble and Hex [1,31] and most notably in computer Go [13,14,32]. In recent years, MCTS was adapted by researchers [3] to solve imperfect information games. ...
... One of the first popular extensions of MCTS to imperfect information games is an MCTS variation in which a perfect information search is performed on a determinized instance of the game [2,16,31,33]. That is, the search is performed on an instance of the game where hidden information is revealed to the players, thereby transforming the imperfect information game into a perfect information game. ...
... Many MCTS variations have been extended to imperfect information games [3]. One of the more popular approaches performs the search on a determinized instance of the game [2,16,31,33]. An imperfect information game can be converted into a perfect information game (i.e., a deterministic game) by making the game states fully observable to all players and fixing the outcomes of stochastic events. ...
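The determinization idea in these excerpts can be sketched in a few lines: sample the hidden information, evaluate each action in the resulting perfect-information instance, and average. The toy card contest below is an assumption made for illustration, not any cited system.

import random

def determinized_value(action, unseen_cards, n_samples=2000):
    """Average an action's outcome over sampled determinizations: each
    sample fixes the opponent's hidden hand, turning the decision into
    a perfect-information one (toy rule: our card wins if it beats the
    opponent's best card)."""
    wins = 0
    for _ in range(n_samples):
        opp_hand = random.sample(unseen_cards, 3)   # one determinization
        wins += 1 if action > max(opp_hand) else 0
    return wins / n_samples

my_hand = [4, 9, 12]
unseen = [1, 2, 3, 5, 6, 7, 8, 10, 11, 13]
best = max(my_hand, key=lambda a: determinized_value(a, unseen))
print(best)  # the card most likely to win across determinizations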
Article
Full-text available
Monte Carlo Tree Search (MCTS) has been extended to many imperfect information games. However, due to the added complexity that uncertainty introduces, these adaptations have not reached the same level of practical success as their perfect information counterparts. In this paper we consider the development of agents that perform well against humans in imperfect information games with partially observable actions. We introduce the Semi-Determinized-MCTS (SDMCTS), a variant of the Information Set MCTS algorithm (ISMCTS). More specifically, SDMCTS generates a predictive model of the unobservable portion of the opponent's actions from historical behavioral data. Next, SDMCTS performs simulations on an instance of the game where the unobservable portion of the opponent's actions are determined. Thereby, it facilitates the use of the predictive model in order to decrease uncertainty. We present an implementation of the SDMCTS applied to the Cheat Game, a well-known card game, with partially observable (and often deceptive) actions. Results from experiments with 120 subjects playing a head-to-head Cheat Game against our SDMCTS agents suggest that SDMCTS performs well against humans, and its performance improves as the predictive model's accuracy increases.
... For Checkers, material and degree of mobility are assessed [107]. For Scrabble, single, duplicate, and triplicate letters are analyzed [108]. After analyzing the positions, the agents are trained using TD-learning and self-play to fine-tune their performance. ...
... After analyzing the positions, the agents are trained using TD-learning and self-play to fine-tune their performance. Finally, a search algorithm is applied, such as minimax in the case of Chess, Checkers and Othello, or Monte Carlo for Scrabble [108]. [105] delved into how RL can be applied to the game of Go, which has been a challenging problem for most computer programs to master [109]. ...
Article
Full-text available
Reinforcement Learning (RL) is fast gaining traction as a major branch of machine learning, and its applications have expanded well beyond its typical usage in games. Several subfields of reinforcement learning, such as deep reinforcement learning and multi-agent reinforcement learning, are also expanding rapidly. This paper provides an extensive review of the field from the point of view of Machine Learning (ML). It begins with a historical perspective, lays out the theoretical background, and discusses core reinforcement learning problems and the approaches taken by different subfields before discussing the state of the art. A non-exhaustive list of applications of reinforcement learning is provided and their practicability and scalability assessed. The paper concludes by highlighting some open areas and issues in the field.
... In recent years, AI researchers have developed programs capable of defeating the strongest human players in the world. Superhuman-performance programs exist for popular board games such as chess, shogi and Go (AlphaZero [20]), checkers (Chinook [21]), Othello (Logistello [22]), and Scrabble (Maven [23]). ...
... Scrabble is a type of scoring game that is played on a physical or virtual board. Scrabble AI programs have achieved a level of performance that exceeds that of the strongest human players [23]. Furthermore, the game of Scrabble provided an interesting example with its randomized initial position when the advantage of initiative was reconsidered through self-play experiments [16]. ...
Article
Full-text available
The compensation system called komi has been used in scoring games such as Go. In Go, White (the second player) is at a disadvantage because Black moves first, which gives Black an advantage; indeed, the winning percentage for Black is higher. The value of komi has been re-evaluated over the years to maintain fairness, which implies that a static komi is not a sufficiently sophisticated solution. We leveraged existing komi methods in Go to study the evolution of fairness in board games and to generalize the concept of fairness to other contexts. This work revisits the notion of fairness and proposes the concept of dynamic komi for Scrabble. We introduce two approaches, static and dynamic komi, in Scrabble to mitigate the advantage of initiative (AoI) issue and to improve fairness. We found that implementing dynamic komi made the game attractive and provided direct real-time feedback, which is useful for training novice players and maintaining fairness for skilled players. A possible interpretation of physics-in-mind is also discussed for enhancing game refinement theory concerning fairness in games.
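A toy sketch of the static-versus-dynamic distinction, under simplifying assumptions of our own (the update rule and constants are illustrative, not the paper's implementation):

def dynamic_komi(score_gaps, base_komi=10, step=2):
    """Adjust the second player's compensation during play: raise komi
    while the first player leads, lower it while the second player
    leads, and never let it go negative. A static komi would simply
    return base_komi regardless of the gaps."""
    komi = base_komi
    for gap in score_gaps:            # gap = first player minus second
        if gap > 0:
            komi += step
        elif gap < 0:
            komi -= step
        komi = max(0, komi)
    return komi

print(dynamic_komi([12, 8, 15, -3, 6]))  # komi after five turns -> 16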
... Moreover, an expert-level AI player of heads-up no-limit Texas hold'em, which has more than 10^160 decision points, has been developed using tree search with bet abstraction and deep learning of counterfactual values [14]. In research other than on poker AI, an expert-level Scrabble AI has been developed using a selective move generator, simulations of likely game scenarios, and the heuristic search algorithm B* [15]. ...
... The relations between P(q, t, z) and the probabilities p_win, p_washout, p_tenpai, and p_lose are (writing 1 − p for the complement of p):

P(q, t, win) = p_win(q, t)
P(q, t, tenpai) = (1 − p_win(q, t)) · p_washout(q, t) · p_tenpai(q, t)
P(q, t, noten) = (1 − p_win(q, t)) · p_washout(q, t) · (1 − p_tenpai(q, t))
P(q, t, lose) = (1 − p_win(q, t)) · (1 − p_washout(q, t)) · p_lose(q, t)
P(q, t, other) = (1 − p_win(q, t)) · (1 − p_washout(q, t)) · (1 − p_lose(q, t))    (18)

These probabilities are inferred by logistic regression using features that are the results of value evaluations of these MDPs. To explain the features, let us introduce the following symbols: V_win(q, t) and P_win(q, t) are values from M_win, where the former is the state value of (q, null, S_Fold, t) and the latter is the probability that i in this state finally chooses an action in A_Wins; P_tenpai(q, t) is the probability that i in (q, null, S_Fold, t) of M_tenpai will have a tenpai hand when it terminates; and P_Lose(q, t) and U_LoseAverage(q, t) are the values from Eqs. (14) and (15), where the initial hand of M_fold is q and T is adjusted according to t. The features used for the regressions are as follows. ...
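A quick numerical check of the decomposition above (the complements 1 − p are an assumption in our reconstruction): the five terminal outcomes form a partition, so their probabilities must sum to one.

def outcome_probs(p_win, p_washout, p_tenpai, p_lose):
    """Terminal-outcome probabilities from the decomposition above,
    with q_* = 1 - p_* standing in for the complements."""
    q_win, q_washout = 1 - p_win, 1 - p_washout
    return {
        "win":    p_win,
        "tenpai": q_win * p_washout * p_tenpai,
        "noten":  q_win * p_washout * (1 - p_tenpai),
        "lose":   q_win * q_washout * p_lose,
        "other":  q_win * q_washout * (1 - p_lose),
    }

probs = outcome_probs(0.2, 0.3, 0.6, 0.4)
print(probs, sum(probs.values()))  # the probabilities sum to 1.0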
Preprint
We propose a method for constructing artificial intelligence (AI) for mahjong, a multiplayer imperfect information game. Since the size of the game tree is huge, constructing an expert-level AI player of mahjong is challenging. We define multiple Markov decision processes (MDPs) as abstractions of mahjong to construct effective search trees. We also introduce two methods of inferring state values of the original mahjong using these MDPs. We evaluated the effectiveness of our method using gameplays vis-à-vis the current strongest AI player.
... For instance, self-awareness of game bots is a challenging application of computational intelligence within computer science [2][3][4]. On the other side, engineers try to design a perfect player to compete against the game environment [5][6][7]. Furthermore, behavior logs collected from human players might also be a source for social scientists [8]. Consequently, several games have been used as test beds, such as Pac-Man [5], Scrabble [6], Super Mario [9], Counter-Strike [2], StarCraft [10], Flappy Bird [11,12], and Lunar Lander [13,14]. ...
Chapter
Full-text available
In this chapter, we present novel applications of Interval Type-2 (IT2) Fuzzy Logic Controllers (FLCs) in the research area of computer games. We handle two popular computer games, Flappy Bird and Lunar Lander. From a control engineering point of view, Flappy Bird can be seen as a classical obstacle avoidance problem, and Lunar Lander as a position control problem. Both games involve high levels of uncertainty and randomness, which are the main challenges of the games for the player. Thus, these two games can be seen as challenging testbeds for benchmarking IT2-FLCs, as they provide dynamic and competitive elements similar to real-world control engineering problems. As the game player can be considered the main controller in a feedback loop, we construct an intelligent control system composed of three subsystems: the reference generator, the main controller, and the game dynamics. We design and employ an IT2-FLC as the main controller in a feedback loop so as to achieve satisfactory game performance while handling the various uncertainties of the games. In this context, we briefly present the general structure and design methods of two IT2-FLCs, the Single Input and the Double Input IT2-FLC. We show that the IT2-FLC structure is capable of handling the uncertainties caused by the nature of the games by presenting both simulation and real-time game results in comparison with its Type-1 and conventional counterparts. We believe that the presented design methodology and results will provide a bridge for a wider deployment of Type-2 fuzzy logic in the area of computer games.
... MAVEN [7] is currently the best-known computer Scrabble player, developed by Brian Sheppard. MAVEN has 32 wins and 17 losses against champion-caliber opposition. ...
... The statistics show that MAVEN plays significantly better than expert players. Although many professional techniques have already been implemented in MAVEN, there are still several ways to make MAVEN even stronger, as mentioned in [7]. ...
Chapter
Full-text available
This paper explores Scrabble, a scoring board game, from the perspective of gamification. We propose the swing model, a new measurement based on game refinement theory, for assessment. The result indicates that Scrabble displays a stronger aspect of an entertaining game than of an educational game. Moreover, the present analysis reveals that increasing the number of vowel tiles would be more appropriate for beginners. Our goal is to generalize game modification to influence a game's usefulness in an educational way.
... In recent years, researchers have given more attention to computer games since they can be seen as ideal test-beds for studies, especially for computational intelligence research [1][2][3][4][5][6][7][8][9][10][11]. In this context, various games have been handled and investigated, such as Pac-Man [4], Scrabble [5], Super Mario [6], Counter-Strike [7], Unreal Tournament [8], Warcraft [9][10], TORCS (The Open Racing Car Simulator) [1,11], and Flappy Bird [12][13][14][15]. The game Flappy Bird was very popular in early 2014 [15][16]. ...
Conference Paper
Full-text available
In this study, we present a novel application of Type-2 (T2) fuzzy control to the popular video game Flappy Bird. To the best of our knowledge, our work is the first deployment of T2 fuzzy control in the computer games research area. We propose a novel T2 fuzzified Flappy Bird control system that transforms the obstacle avoidance problem of the game logic into a reference tracking control problem. The presented T2 fuzzy control structure is composed of two important blocks: the reference generator and the Single Input Interval T2 Fuzzy Logic Controller (SIT2-FLC). The reference generator is the mechanism that uses the bird's position and the pipes' positions to generate an appropriate reference signal to be tracked. Thus, a conventional fuzzy feedback control system can be defined. The generated reference signal is tracked via the presented SIT2-FLC, which can be easily tuned while also providing a certain degree of robustness to the system. We investigate the performance of the proposed T2 fuzzified Flappy Bird control system by providing comparative simulation results as well as experimental results obtained in the game environment. It is shown that the proposed system achieves satisfactory performance both in the framework of fuzzy control and in computer games. We believe that this first attempt at employing T2-FLCs in games will be an important step toward a wider deployment of T2-FLCs in the research area of computer games.
... Researchers have applied it to, e.g., Backgammon (Tesauro and Galperin, 1997), Poker (Billings et al., 1999), Bridge (Ginsberg, 2001), Scrabble (Sheppard, 2002), and Go (Brügmann, 1993; Bouzy and Helmstetter, 2004). ...
... The idea of stopping rollouts before the end of the game and backpropagating results on the basis of heuristic knowledge has been explored in Amazons (Lorentz, 2008), Lines of Action, and Breakthrough (Lorentz and Horey, 2014). To the best of our knowledge, it was first described in a naive Monte Carlo context (without tree search) by Sheppard (2002). A similar method is considered in Subsection 8.2.2, where we also introduce a hybrid algorithm replacing the evaluation function with a minimax call. ...
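The truncated-rollout idea reads cleanly in code; the toy "score race" and the logistic heuristic below are illustrative assumptions, not any engine's implementation.

import math, random

def truncated_rollout(score, horizon=4):
    """Random playout cut off after `horizon` plies: each ply adds a
    random swing, and a heuristic evaluation (the running score squashed
    to [0, 1]) is backed up instead of a terminal game result."""
    for _ in range(horizon):
        score += random.randint(-2, 2)
    return 1.0 / (1.0 + math.exp(-score))   # heuristic win estimate

def evaluate(score, n=5000):
    """Average many truncated rollouts from the same position."""
    return sum(truncated_rollout(score) for _ in range(n)) / n

print(evaluate(+3), evaluate(-3))  # a lead evaluates clearly higher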
... The approach is based on real-time planning that finds the best branch and plays the best arm within that branch. MCTS has been applied successfully to many board games, including the Asian board game Go [2] and [26]; General Game Playing, where the rules of the games used to evaluate techniques are not known in advance [9] [10] and [7]; imperfect information games, where each player independently chooses an action and these actions are applied at the same time, such as Scrabble and Bridge [3]; and the arcade Ms Pac-Man game, with repeated random sampling to obtain results [17], [20], [32], [23] and [13]. In the last few years, several integrated solutions have been proposed for computer-based rehabilitation techniques [4], [19], [38], [18], [21], [34], [36], [14], [37], [25], [8], [33], [28], [22] and [12]. ...
Preprint
Full-text available
Computational Intelligence (CI) in computer games plays an important role and can simulate various aspects of real-life problems. CI in real-time decision-making games can provide a platform for the examination of tree search algorithms. In this paper, we present a rehabilitation serious game (ReHabgame) in which the Monte-Carlo Tree Search (MCTS) algorithm is utilized. The game is designed to combat the physical impairment of post-stroke/brain-injury casualties in order to improve upper limb movement. In the course of the ReHabgame, the player chooses paths with the upper limb, according to his/her movement ability, to reach virtual goal objects. The system adjusts the difficulty level of the game based on the player's quality of activity through MCTS. It learns from the movements made by a player and generates subsequent objects for collection. The system collects orientation, muscle, and joint activity data and utilizes them to make decisions. Player data are collected through the Kinect Xbox One and the Myo armband. The results show the effectiveness of MCTS in the ReHabgame, which progresses from highly achievable paths to less achievable ones, thus configuring and personalizing the rehabilitation process.
... The landscape of board games, the majority of which are perfect information games, was previously revolutionized by the introduction of two key techniques: position evaluation and Monte Carlo tree search (MCTS) [94], [95]. These methodologies, with minor modifications, demonstrated superhuman effectiveness in board games such as chess [96], checkers [97], Othello [98], backgammon [99], and Scrabble [100]. In contrast, the application of these techniques to the game of Go, with an estimated 2.1 × 10^170 legal board configurations, only enabled performance at the amateur level [101]-[105]. ...
Preprint
Self-play, characterized by agents interacting with copies or past versions of themselves, has recently gained prominence in reinforcement learning. This paper first clarifies the preliminaries of self-play, including the multi-agent reinforcement learning framework and basic game theory concepts. It then provides a unified framework and classifies existing self-play algorithms within it. Moreover, the paper bridges the gap between the algorithms and their practical implications by illustrating the role of self-play in different scenarios. Finally, the survey highlights open challenges and future research directions in self-play. This paper is an essential guide for understanding the multifaceted landscape of self-play in RL.
... Historically, the earliest forms of simulation-based search [8,41], used for example to achieve superhuman performance in Scrabble [30], were based upon rollouts z_{s,a} that start immediately after a single action a from state s. The main idea is to estimate the action value of every action a ∈ A from the root state by the outcome of simulations starting from that action. ...
... Kobzeva (2015) argues that Scrabble is an effective tool for developing students' critical thinking as well as their language competence. Sheppard (2002) states that Scrabble is a board game in which players build words with small tiles that carry letters of varying point values. ...
Article
Full-text available
This research is motivated by the results of an analysis showing that a learning aid is needed for children aged 5-6 years to develop aspects of language development. The research aims to develop a medium suited to the characteristics of early childhood (5-6 years). The method used is Research and Development (R&D). The subjects were 15 children aged 5-6 years from an educational institution in Sukodono, Sidoarjo. The results of this study are as follows. 1) The spelling-word Scrabble medium for developing aspects of language development in early childhood (5-6 years) was developed following the DDD-E steps: Decide, Design, Develop, and Evaluate. 2) The effectiveness of the spelling-word Scrabble product is rated "Effective". It can be concluded that the spelling-word Scrabble medium is effective for developing language skills in early childhood (5-6 years). Keywords: Research and Development; Media Scrabble; Language Development; Gamification
... al., 2020; Lidiasari et al., 2017; Lin et al., 2007; Onasanya et al., 2021). In Scrabble, one of the most important aspects of the game is the precision and speed with which the players answer the question (Sheppard, 2002). ...
Article
Full-text available
Students' learning outcomes in vocabulary mastery in reading comprehension at junior high schools in Banda Aceh, Indonesia, are relatively low. To tackle the issue, the Scrabble Game Technique (hereafter, SGT) is hoped to be a game-changer. This study investigates EFL students' learning outcomes through the use of the SGT in learning English vocabulary through narrative texts with seventh-grade students at a junior high school. The aspects assessed for each type of vocabulary included nouns, verbs, pronouns, adverbs, adjectives, and conjunctions. The research design was quantitative, pre-experimental, with a one-group pre-test post-test design to measure the students' learning outcomes after three treatments with the SGT. A total of 30 seventh-grade students were selected by purposive sampling. The instrument used to collect data was a test comprising 30 questions in total: 18 multiple-choice, 6 fill-in-the-blank, and 6 match-the-word. The tests were analyzed using the right-tailed t-test after the prerequisite tests were met. The percentage of mastery of nouns and verbs in the post-test was better than in the pre-test, with improvements of 93% for nouns, 91% for verbs, 84% for pronouns, 72% for adverbs, 71% for adjectives, and 71% for conjunctions. Furthermore, the t-count was 19.68 with p = 0.05, df = 29, and t-table = 1.70. It was concluded that students' learning outcomes were better after being taught through the SGT.
... Self-play reinforcement learning has achieved professional performance in games such as chess (Baxter et al., 2000), Scrabble (Sheppard, 2002), and poker (Moravcík, 2017). Therefore, this paper adopts self-play reinforcement learning for maneuver decision-making and does not use any human knowledge. ...
Article
Full-text available
Autonomous maneuver decision-making methods for air combat often rely on human knowledge, such as advantage functions, objective functions, or dense rewards in reinforcement learning, which limits the decision-making ability of unmanned combat aerial vehicles to the scope of human experience and results in slow progress in maneuver decision-making. Therefore, a maneuver decision-making method based on deep reinforcement learning and Monte Carlo tree search is proposed to investigate whether maneuver decision-making is feasible without human knowledge or an advantage function. To this end, Monte Carlo tree search in continuous action space is proposed, and neural-network-guided Monte Carlo tree search with self-play is utilized to improve the ability of air combat agents. The method starts from random behaviors and, through self-play and without human knowledge, generates samples consisting of states, actions, and results of air combat. These samples are used to train the neural network, and the neural network with the greater winning rate is selected by simulation. This process is repeated to gradually improve maneuver decision-making ability. Simulations are conducted to verify the effectiveness of the proposed method, and the kinematic model of the missile is used in the simulations instead of the missile engagement zone to test whether the maneuver decision-making method is effective. The simulation results for both fixed and random initial states show that the proposed method is efficient and can meet the real-time requirement.
... In the last few years, several Monte-Carlo based techniques have emerged in the field of computer games. They have already been applied successfully to many games, including POKER (Billings et al. 2002) and SCRABBLE (Sheppard 2002). Monte-Carlo Tree Search (MCTS), a Monte-Carlo based technique first established in 2006, is implemented in top-rated Go programs. ...
Article
Classic approaches to game AI require either a high quality of domain knowledge or a long time to generate effective AI behaviour. These two characteristics hamper the goal of establishing challenging game AI. In this paper, we put forward Monte-Carlo Tree Search as a novel, unified framework for game AI. In the framework, randomized explorations of the search space are used to predict the most promising game actions. We demonstrate that Monte-Carlo Tree Search can be applied effectively to (1) classic board-games, (2) modern board-games, and (3) video games.
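To show the framework in miniature, here is a compact UCT-style MCTS in Python applied to a toy subtraction game (take 1-3 tokens; whoever takes the last token wins). It is a sketch under simplifying assumptions, not the paper's code.

import math, random

class Node:
    def __init__(self, tokens, player):
        self.tokens, self.player = tokens, player   # player to move
        self.children = {}                           # move -> Node
        self.visits, self.wins = 0, 0.0

def uct_child(node, c=1.4):
    """Select the child maximizing the UCT score."""
    return max(node.children.values(),
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(tokens, player):
    """Random playout; returns the winner (who takes the last token)."""
    while tokens > 0:
        tokens -= random.randint(1, min(3, tokens))
        player = 1 - player
    return 1 - player

def mcts(root, iters=4000):
    for _ in range(iters):
        node, path = root, [root]
        # Selection: descend while the node is fully expanded.
        while node.tokens > 0 and len(node.children) == min(3, node.tokens):
            node = uct_child(node)
            path.append(node)
        # Expansion: try one untried move.
        if node.tokens > 0:
            untried = [m for m in range(1, min(3, node.tokens) + 1)
                       if m not in node.children]
            move = random.choice(untried)
            node.children[move] = Node(node.tokens - move, 1 - node.player)
            node = node.children[move]
            path.append(node)
        # Simulation and backpropagation: each node's stats credit the
        # player who moved into that node.
        winner = rollout(node.tokens, node.player)
        for n in path:
            n.visits += 1
            n.wins += 1.0 if winner != n.player else 0.0

root = Node(10, player=0)
mcts(root)
# Taking 2 leaves 8, a multiple of 4, which is the winning strategy.
print(max(root.children, key=lambda m: root.children[m].visits))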
... So, they could memorize vocabulary in different ways. This is supported by Sheppard (2002): Scrabble is a board game where players build words with small tiles that carry letters of varying point values. Hapsari (2017) also noted that the Scrabble game is a very useful, easy, and entertaining way to practice any set of vocabulary. ...
Article
This study aimed at finding out whether there is a significant effect of integrating Scrabble into Numbered Heads Together at grade seven of SMP Negeri 12 Konawe Selatan. The research question was: "Is there any significant effect of integrating Scrabble into Numbered Heads Together on students' vocabulary achievement at grade seven of SMP Negeri 12 Konawe Selatan?" The hypothesis was: "There is a significant effect of integrating Scrabble into Numbered Heads Together on students' vocabulary achievement at grade seven of SMP Negeri 12 Konawe Selatan." The research used a quasi-experimental design; the population was all grade seven students at SMP Negeri 12 Konawe Selatan in the 2018/2019 academic year. The samples were class VIIE as the experimental class (28 students) and class VIIB as the control class (26 students). The research instrument was a 40-question vocabulary test. To collect the data, the researcher gave a pre-test, taught with Scrabble integrated into Numbered Heads Together, and gave a post-test. The results show that the students' mean scores in the experimental class were 55.21 on the pre-test and 75.13 on the post-test, while the mean scores in the control class were 55.61 and 64.93, respectively. The t-test shows that t-count = 5.636 is higher than t-table = 2.007 and the p-value = 0.000 is lower than α = 0.05. Therefore, the null hypothesis is rejected and the alternative hypothesis is accepted. It can be concluded that there is a significant effect of integrating Scrabble into Numbered Heads Together on students' vocabulary achievement at grade seven of SMP Negeri 12 Konawe Selatan. Keywords: Scrabble, numbered heads together, vocabulary achievement
... However, the interest of the research community has increasingly shifted over the last decade and a half from applying AI to board games toward other types of games, particularly video games. A large part of the research focuses mainly on developing AI agents for playing games, either as effectively as possible or human-like (or like a particular human) with respect to some other property [20]. ...
Conference Paper
Full-text available
Games are attractive and engaging because of the complexity they pose to the player. However, popular games can lose their attractiveness when the advantage of initiative is large. Actual games played by high-performance AI such as AlphaZero suggest that the first player's advantage grows as the performance level increases, implying that games with a large advantage of initiative lose their attractiveness due to unfairness. This paper explores an innovative way to keep a game attractive. A link between the advantage of initiative and performance level is investigated using Scrabble AI. Using two measures, the advantage of initiative and game refinement, possible treatments are considered. The experimental results with Scrabble AI suggest that reducing the search space from a 15×15 to a 13×13 board is a possible enhancement.
... Maven's [2] gameplay is subdivided into three phases: ...
Preprint
Full-text available
The current state-of-the-art Scrabble agents are not learning-based but depend on truncated Monte Carlo simulations, and the quality of such agents is contingent upon the time available for running the simulations. This thesis takes steps towards building a learning-based Scrabble agent using self-play. Specifically, we try to find a better function approximation for the static evaluation function used in Scrabble, which determines move goodness at a given board configuration. In this work, we experimented with evolutionary algorithms and Bayesian optimization to learn the weights of an approximate feature-based evaluation function. However, these optimization methods were not particularly effective, which led us to explore the problem from an imitation learning point of view. We also tried to imitate the ranking of moves produced by the Quackle simulation agent using supervised learning with a neural network function approximator that takes the raw representation of the Scrabble board as input instead of a fixed number of handcrafted features.
... Monte Carlo Tree Search: Monte Carlo Tree Search (MCTS) is a framework for finding the best decision when the graph structure is a tree. It has shown success in many game applications, such as Billings et al. [2002], Sheppard [2002], and Tesauro and Galperin [1997]. Recently, its great success in computer Go [Baier and Drake, 2010; Bouzy and Helmstetter, 2004] made it an important focus in AI research, and nowadays it has become an important tool in various fields [Kim and Kim, 2017; Mańdziuk, 2018; Sironi et al., 2018]. ...
Preprint
Recently, there has been great interest in Monte Carlo Tree Search (MCTS) in AI research. Although the sequential version of MCTS has been studied widely, its parallel counterpart still lacks systematic study. This leads us to the following questions: how can we design efficient parallel MCTS (or more general) algorithms with rigorous theoretical guarantees? Is it possible to achieve linear speedup? In this paper, we consider the search problem on a more general acyclic one-root graph (namely, Monte Carlo Graph Search (MCGS)), which generalizes MCTS. We develop a parallel algorithm (P-MCGS) to assign multiple workers to investigate appropriate leaf nodes simultaneously. Our analysis shows that the P-MCGS algorithm achieves linear speedup and that the sample complexity is comparable to its sequential counterpart.
... has been widely used to provide not optimal but still acceptable policy/value functions in many games, and achieved superhuman performance in Backgammon [14] and Scrabble [15]. But in the field of Go, the MCTS approach had only achieved amateur-level play [4]. ...
Article
Gomoku, also called Five in a Row, is one of the earliest checkerboard games invented by humans. For a long time, it has brought us countless pleasures, and as players we have developed many skills for playing it. Scientists have formalized these skills and entered them into computers so that a computer knows how to play Gomoku. However, such a computer merely follows the pre-entered skills; it does not know how to develop them by itself. Inspired by Google's AlphaGo Zero, in this thesis, by combining Monte Carlo Tree Search, deep neural networks, and reinforcement learning, we propose a system that trains machine Gomoku players without prior human skills. These are self-evolving players given no prior knowledge; they develop their own skills from scratch. We ran this system for a month and a half, during which 150 different players were generated. The later a player was generated, the stronger its abilities. During training, beginning with zero knowledge, these players developed a row-based bottom-up strategy, followed by a column-based bottom-up strategy, and finally a more flexible and intelligible strategy with a preference for the surrounding squares. Although even the latest players do not have strong capacities and thus cannot be regarded as strong AI agents, they still show the ability to learn from previous games. Therefore, this thesis shows that it is possible for a machine Gomoku player to evolve by itself without human knowledge. These players are on the right track; with continuous training, they would become better Gomoku players.
... As the tree grows larger, more accurate values are generated. The average of these rollouts can provide an effective position evaluation, achieving strong performance in games such as Backgammon (Tesauro and Galperin 1997) and Scrabble (Sheppard 2002). This paper presents research on the board game Quoridor, developing an artificial player agent using the Monte Carlo tree search algorithm and comparing it with existing agents. ...
Conference Paper
Full-text available
This paper presents a preliminary study applying Monte Carlo Tree Search (MCTS) to the board game Quoridor. Quoridor is an interesting game for the development of player agents in MCTS because it has a mechanically simple rule set; however, it has a state-space complexity similar to chess and a higher game-tree complexity. The system is shown to perform well against existing methods, defeating a set of player agents drawn from an existing digital implementation as well as a previous method using a GA.
... The idea of stopping rollouts before the end of the game and backpropagating results on the basis of heuristic knowledge has been explored in Amazons (Lorentz, 2008), Lines of Action, and Breakthrough (Lorentz & Horey, 2014). It was first described in a naive Monte Carlo context (without tree search) by Sheppard (2002). A similar method is considered in Subsection 4.2, where we also introduce a hybrid algorithm replacing the evaluation function with a minimax call. ...
Article
Full-text available
Monte-Carlo Tree Search (MCTS) has been found to show weaker play than minimax-based search in some tactical game domains. This is partly due to its highly selective search and averaging value backups, which make it susceptible to traps. In order to combine the strategic strength of MCTS and the tactical strength of minimax, MCTS-minimax hybrids have been introduced, embedding shallow minimax searches into the MCTS framework. Their results have been promising even without making use of domain knowledge such as heuristic evaluation functions. This article continues this line of research for the case where evaluation functions are available. Three different approaches are considered, employing minimax with an evaluation function in the rollout phase of MCTS, as a replacement for the rollout phase, and as a node prior to bias move selection. The latter two approaches are newly proposed. Furthermore, all three hybrids are enhanced with the help of move ordering and k-best pruning for minimax. Results show that the use of enhanced minimax for computing node priors results in the strongest MCTS-minimax hybrid investigated in the three test domains of Othello, Breakthrough, and Catch the Lion. This hybrid, called MCTS-IP-M-k, also outperforms enhanced minimax as a standalone player in Breakthrough, demonstrating that at least in this domain, MCTS and minimax can be combined into an algorithm stronger than its parts. Using enhanced minimax for computing node priors is therefore a promising new technique for integrating domain knowledge into an MCTS framework.
... It was, in fact, empirically demonstrated that multiple-step greedy policies can perform conspicuously better. Notable examples arise from the integration of RL and Monte Carlo Tree Search [4,28,23,3,25,24] or Model Predictive Control [15,6,27]. ...
Preprint
Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning via the use of Monte Carlo Tree Search or Model Predictive Control. In a recent work (Efroni et al., 2018), multiple-step greedy policies and their use in vanilla Policy Iteration algorithms were proposed and analyzed. In this work, we study multiple-step greedy algorithms in more practical setups. We begin by highlighting a counter-intuitive difficulty arising with soft-policy updates: even in the absence of approximations, and contrary to the 1-step-greedy case, monotonic policy improvement is not guaranteed unless the update stepsize is sufficiently large. Taking particular care of this difficulty, we formulate and analyze online and approximate algorithms that use such a multi-step greedy operator.
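A minimal sketch of the multiple-step greedy operator discussed here: expand h steps exhaustively, bootstrap with a value function V at the leaves, and act greedily. The chain-MDP toy and all names below are assumptions made for illustration.

def lookahead_value(state, V, h, actions, step, gamma=0.9):
    """Best h-step discounted return from `state`, bootstrapping with
    the value function V at depth h."""
    if h == 0:
        return V(state)
    return max(r + gamma * lookahead_value(s2, V, h - 1, actions, step, gamma)
               for r, s2 in (step(state, a) for a in actions))

def h_step_greedy(state, V, h, actions, step, gamma=0.9):
    """The action chosen by the h-step greedy policy at `state`."""
    def q(a):
        r, s2 = step(state, a)
        return r + gamma * lookahead_value(s2, V, h - 1, actions, step, gamma)
    return max(actions, key=q)

# Toy chain MDP: move left/right on the integers, reward = new position.
step = lambda s, a: (s + a, s + a)   # returns (reward, next_state)
V = lambda s: 0.0                    # an uninformative value function
print(h_step_greedy(0, V, h=3, actions=[-1, +1], step=step))  # -> 1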
... There, an approximate online version of multiple-step greedy improvement is implemented via Monte Carlo Tree Search (MCTS) (Browne et al., 2012). The celebrated MCTS algorithm, which instantiates several steps of lookahead improvement, encompasses additional impressive historical accomplishments dating back to the past century and the previous decade (Tesauro & Galperin, 1997; Sheppard, 2002; Bouzy & Helmstetter, 2004; Veness et al., 2009). To the best of our knowledge, and despite such empirical successes, the use of multiple-step greedy policy improvement had never been rigorously studied before. ...
Article
Full-text available
The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g., n-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has to our knowledge not been carefully analyzed yet. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions, and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms are, in fact, instances of our framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study.
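For reference, the alternation the abstract describes can be written out for a tiny two-state MDP; the transition and reward numbers below are made up for illustration, not taken from the paper.

import numpy as np

# Classic policy iteration on a 2-state, 2-action toy MDP: exact policy
# evaluation by solving a linear system, then 1-step greedy improvement.
P = np.array([                      # P[a, s, s'] transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.1, 0.9], [0.8, 0.2]],
])
R = np.array([[1.0, 0.0], [0.0, 2.0]])   # R[a, s] immediate rewards
gamma = 0.9

policy = np.zeros(2, dtype=int)
while True:
    # Evaluation: V = (I - gamma * P_pi)^-1 r_pi
    P_pi = np.array([P[policy[s], s] for s in range(2)])
    r_pi = np.array([R[policy[s], s] for s in range(2)])
    V = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
    # Improvement: act greedily with respect to the evaluated V.
    Q = R + gamma * np.einsum("ast,t->as", P, V)
    new_policy = Q.argmax(axis=0)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print(policy, V)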
... In most cases, roll-outs or lookahead strategies are used that select the opponent's actions according to how the agent itself would select actions or according to simple rules. Although roll-outs have been shown to substantially increase performance in games such as Backgammon (Tesauro and Galperin, 1997), Go (Bouzy and Helmstetter, 2004; Silver et al., 2016a), and Scrabble (Sheppard, 2002), the disadvantage of this approach is that particular weaknesses of the opponent cannot be exploited, as no true model of how the opponent selects actions is used. Opponent modelling has been studied for imperfect-information games such as poker (Ganzfried and Sandholm, 2011; Southey et al., 2005). ...
Conference Paper
Full-text available
In this paper we propose the use of vision grids as state representation to learn to play the game Tron using neural networks and reinforcement learning. This approach speeds up learning by significantly reducing the number of unique states. Furthermore, we introduce a novel opponent modelling technique, which is used to predict the opponent's next move. The learned model of the opponent is subsequently used in Monte-Carlo roll-outs, in which the game is simulated n steps ahead in order to determine the expected value of conducting a certain action. Finally, we compare the performance using two different activation functions in the multi-layer perceptron, namely the sigmoid and the exponential linear unit (Elu). The results show that the Elu activation function outperforms the sigmoid activation function in most cases. Furthermore, vision grids significantly increase learning speed, and in most cases this also increases the agent's performance compared to when the full grid is used as state representation. Finally, the opponent modelling technique allows the agent to learn a predictive model of the opponent's actions, which in combination with Monte-Carlo roll-outs significantly increases the agent's performance.
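The opponent-modelling-plus-roll-out loop can be sketched compactly; the frequency model and the toy "avoid the opponent" payoff below are assumptions, much simpler than the paper's neural network model.

import random

def learn_opponent_model(history):
    """Frequency model of the opponent: P(move) from observed play."""
    return {m: history.count(m) / len(history) for m in set(history)}

def rollout_value(my_move, model, n=2000):
    """Expected payoff of `my_move` under roll-outs that sample the
    opponent's move from the learned model (toy payoff: score 1 when
    the opponent's move differs from ours)."""
    moves, weights = zip(*model.items())
    score = 0
    for _ in range(n):
        opp = random.choices(moves, weights)[0]
        score += 1 if opp != my_move else 0
    return score / n

history = ["left"] * 8 + ["right"] * 2    # observed opponent behaviour
model = learn_opponent_model(history)
print(max(["left", "right"], key=lambda m: rollout_value(m, model)))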
... The approach is based on real-time planning that finds the best branch and plays the best arm within that branch. MCTS has been applied successfully to many board games, including the Asian board game Go [2] and [26]; General Game Playing, where the rules of the games used to evaluate techniques are not known in advance [9] [10] and [7]; imperfect information games, where each player independently chooses an action and these actions are applied at the same time, such as Scrabble and Bridge [3]; and the arcade Ms Pac-Man game, with repeated random sampling to obtain results [17], [20], [32], [23] and [13]. In the last few years, several integrated solutions have been proposed for computer-based rehabilitation techniques [4], [19], [38], [18], [21], [34], [36], [14], [37], [25], [8], [33], [28], [22] and [12]. ...
Conference Paper
Full-text available
Artificial and computational intelligence in computer games play an important role and can simulate various aspects of real-life problems. The development of artificial intelligence techniques in real-time decision-making games can provide a platform for the examination of tree search algorithms. In this paper, we present a rehabilitation system known as RehabGame in which the Monte-Carlo Tree Search algorithm is used. The objective of the game is to combat the physical impairment of stroke/brain-injury casualties in order to improve upper limb movement. In the course of a real-time rehabilitation game, the player decides on paths that his/her upper limb could take in order to reach virtual goal objects. The system is capable of adjusting the difficulty level to the player's ability by learning from the movements made and generating subsequent objects. The game collects orientation, muscle, and joint activity data and utilizes them to make decisions on game progression. Limb movements are stored in the search tree, which is used to determine the location of new target virtual fruit objects by accessing data saved from different game plays. The system monitors muscle strain and stress through the Myo armband sensor and provides the next step required for rehabilitation. The results from two samples show the effectiveness of Monte-Carlo Tree Search in the RehabGame: it is able to build coherent hand motion, progressing from highly achievable paths to less achievable ones, thus configuring and personalizing the rehabilitation process.
... In the last decade, computer games have often been used as testbeds for benchmarking computational intelligence methods, since they provide realistic, dynamic, and competitive elements [1]. Consequently, a variety of computer games have been investigated [2][3][4][5]. Games defined by physical systems, such as TORCS [6], Flappy Bird [7], and Lunar Lander [8,9], have also been investigated. Recently, computational intelligence methods have been successfully applied to real-world games [1,10]. ...
... A more universal possibility, introduced by Abramson [1], is to use Monte-Carlo simulations: from each position at depth K, we run a certain number N of uniformly random matches (or matches with a simple default policy), and we evaluate the position by the number of those matches won by J1 divided by N. Such a procedure is now called flat MCTS; see Coquelin and Munos [9] and Browne et al. [6], and also Ginsberg [14] and Sheppard [20]. ...
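For concreteness, a minimal flat Monte-Carlo evaluator might look as follows in Python; the position interface (copy, winner, legal_moves, play) is an assumed stand-in used only for illustration.

    import random

    def flat_mc_value(position, N=1000):
        # Evaluate a position by the fraction of N uniformly random
        # matches from it that end in a victory for player J1.
        wins = 0
        for _ in range(N):
            g = position.copy()
            while g.winner() is None:
                g.play(random.choice(g.legal_moves()))
            if g.winner() == "J1":
                wins += 1
        return wins / N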
Article
Full-text available
We consider a deterministic game with alternating moves and complete information, whose outcome is always a victory for one of the two opponents. We assume that this game is the realization of a random model enjoying some independence properties. We consider algorithms in the spirit of Monte-Carlo Tree Search that estimate the minimax value of a given position as accurately as possible by simulating, successively, n well-chosen matches starting from this position. We build an algorithm that is optimal step by step, in some sense: once the first n matches have been simulated, the algorithm decides, from the statistics furnished by those matches (and the prior we have on the game), how to simulate the (n+1)-th match so that the gain of information concerning the minimax value of the position under study is maximal. This algorithm is remarkably quick. We prove that our step-by-step optimal algorithm is not globally optimal and that it always converges in a finite number of steps, even if the prior we have on the game is completely irrelevant. We finally test our algorithm against MCTS on Pearl's game and, with a very simple and universal prior, on Connect Four, Hex, and some variants. Except on Pearl's game, the numerical results are rather disappointing. We do, however, exhibit some situations in which our algorithm seems efficient.
... Determinization has been called "averaging over clairvoyance" [38], where players never try to hide or gain information, because in each determinization, all information is already available. Despite these shortcomings, it has produced strong results in the past, for instance in Monte-Carlo engines for the trick-based card game Bridge [22], the card game Skat [8], Scrabble [40], and Phantom Go [9]. ...
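A minimal sketch of determinization in Python, assuming a hypothetical info_set.sample_world() helper that deals the hidden information and a caller-supplied perfect-information evaluator; both names are illustrative, not from the cited engines.

    def determinized_value(move, info_set, perfect_info_value, n_worlds=50):
        # Average the move's perfect-information value over sampled worlds:
        # "averaging over clairvoyance".
        total = 0.0
        for _ in range(n_worlds):
            world = info_set.sample_world()           # deal the hidden cards/tiles
            total += perfect_info_value(world, move)  # full-information search or rollout
        return total / n_worlds

    def best_move(info_set, moves, perfect_info_value):
        return max(moves, key=lambda m: determinized_value(m, info_set, perfect_info_value))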
Chapter
Monte-Carlo Tree Search (MCTS) is a best-first search method guided by the results of Monte-Carlo simulations. It is based on randomized exploration of the search space. Using the results of previous explorations, the method gradually builds up a game tree in memory and successively becomes better at accurately estimating the values of the most promising moves. MCTS has substantially advanced the state of the art in board games such as Go, Amazons, Hex, Chinese Checkers, Kriegspiel, and Lines of Action. This chapter gives an overview of popular and effective enhancements for board game playing MCTS agents. First, it starts by describing the structure of MCTS and giving pseudocode. It also addresses how to adjust MCTS to prove the game-theoretic value of a board position. Next, popular enhancements such as RAVE, progressive bias, progressive widening, and prior knowledge, which improve the simulation in the tree part of MCTS, are discussed in detail. Subsequently, enhancements such as MAST, N-Grams, and evaluation function-based strategies are explained for improving the simulation outside the tree. As modern computers nowadays have multiple cores, this chapter mentions techniques to parallelize MCTS in a straightforward but effective way. Finally, approaches to deal with imperfect information and stochasticity in an MCTS context are discussed as well.
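As a compact illustration of the four MCTS phases the chapter describes (selection, expansion, simulation, backpropagation), here is a UCT-style sketch in Python; the game-state interface is an assumed stand-in, and reward handling is simplified to a single root-player perspective (sign alternation per ply omitted).

    import math, random

    class Node:
        def __init__(self, state, parent=None, move=None):
            self.state, self.parent, self.move = state, parent, move
            self.children = []
            self.untried = list(state.legal_moves())
            self.visits, self.value = 0, 0.0

    def ucb(node, c=1.4):
        # UCB1 selection: exploitation term plus exploration bonus.
        return (node.value / node.visits
                + c * math.sqrt(math.log(node.parent.visits) / node.visits))

    def mcts(root_state, iterations=1000):
        root = Node(root_state)
        for _ in range(iterations):
            node = root
            while not node.untried and node.children:     # 1. selection
                node = max(node.children, key=ucb)
            if node.untried:                              # 2. expansion
                move = node.untried.pop()
                nxt = node.state.copy()
                nxt.play(move)
                child = Node(nxt, parent=node, move=move)
                node.children.append(child)
                node = child
            sim = node.state.copy()                       # 3. simulation
            while not sim.terminal():
                sim.play(random.choice(sim.legal_moves()))
            reward = sim.result()                         # root player's result
            while node is not None:                       # 4. backpropagation
                node.visits += 1
                node.value += reward
                node = node.parent
        return max(root.children, key=lambda n: n.visits).move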
... Determinization has been called "averaging over clairvoyance" [23], where players never try to hide or gain information, because in each determinization, all information is already available. Despite these shortcomings, it has produced strong results in the past, for instance in Monte-Carlo engines for the trick-based card game Bridge [15], the card game Skat [8], Scrabble [24], and Phantom Go [9]. ...
Conference Paper
This paper investigates Sequential Halving as a selection policy in the following four partially observable games: Go Fish, Lost Cities, Phantom Domineering, and Phantom Go. Additionally, H-MCTS is studied, which uses Sequential Halving at the root of the search tree and UCB elsewhere. Experimental results reveal that H-MCTS performs best in Go Fish, whereas its performance is on par in Lost Cities and Phantom Domineering. Sequential Halving as a flat Monte-Carlo Search appears to be the stronger technique in Phantom Go.
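A minimal Sequential Halving sketch in Python, as it might be used at the root: the budget is split over roughly log2(|arms|) rounds, surviving arms are sampled equally, and the weaker half is eliminated each round. The sample(arm) callback, standing in for one Monte-Carlo playout, is an assumption.

    import math

    def sequential_halving(arms, budget, sample):
        arms = list(arms)
        stats = {a: [0.0, 0] for a in arms}              # [total reward, pulls]
        rounds = max(1, math.ceil(math.log2(len(arms))))
        for _ in range(rounds):
            per_arm = max(1, budget // (len(arms) * rounds))
            for a in arms:
                for _ in range(per_arm):
                    stats[a][0] += sample(a)
                    stats[a][1] += 1
            arms.sort(key=lambda a: stats[a][0] / stats[a][1], reverse=True)
            arms = arms[:max(1, len(arms) // 2)]         # keep the better half
        return arms[0]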
... In 1993 Bernd Brügmann was the first to use Monte-Carlo evaluations in his 9×9 Go program Gobble. In the following years the technique was incorporated into stochastic games such as Backgammon [33] and imperfect-information games such as Bridge [19], Poker [5], and Scrabble [30]. ...
Article
In this article, we propose a method for constructing an artificial intelligence (AI) player for Mahjong, a multiplayer imperfect-information game. Since the size of the game tree is huge, constructing an expert-level AI player for Mahjong is challenging. We define multiple Markov decision processes (MDPs) as abstractions of Mahjong to construct effective search trees, and we introduce two methods of inferring state values of Mahjong using these MDPs. We evaluated the effectiveness of our method through gameplay against the current strongest AI player.
Chapter
The world of today has undergone a technological revolution that has drastically transformed society, and computing is now accessible to every age group. One common application of modern computing among the younger generation is computer games. Although these have some advantages, many researchers have shown that they may be somewhat harmful to the growth and development of children. In this paper, the authors examine computer game use among youth, including the games they play and how constructive games can have positive developmental implications. One popular learning game is Scrabble (a trademark of the Hasbro Corporation). The authors present the architecture of a constructive computer game, NigerScrab, a version of computer Scrabble that is both entertaining and educational and has positive impacts on youth.
Chapter
Full-text available
In the last two decades, intelligent agents have improved human lifestyles in many areas of daily activities and services. Given the importance of safety and security in e-procurement, many systems have been developed that include a trust engine. Some of the first such systems modeled trust evaluation with crisp values, but nowadays, to match real-world cases, uncertainty and imprecision must be handled using fuzzy set theory. In this paper, to minimize the number of exceptions related to suppliers, a Trust Management Agent (TMA) is used to prioritize candidate suppliers based on trust criteria. Because of the many uncertainties involved, type-2 fuzzy sets prove to be a suitable methodology for handling the trust evaluation process efficiently. In this regard, a new evaluation process based on hierarchical Linguistic Weighted Averaging (LWA) is proposed. The solution method is illustrated with a simple example, which demonstrates both the suitability and the simplicity of the proposed method for this category of problem.
Conference Paper
We propose a topology-aware heuristic that significantly reduces message latency in tree-parallel Monte-Carlo Tree Search. Many communication-aware and topology-aware mappings exist, but they are not applicable to hash-driven parallel search techniques: in hash-driven parallel search, each graph/tree node is distributed pseudo-randomly by a hash function and each edge is likewise randomly connected, so each computation cluster knows only about the tasks executing on itself, making dynamic load balancing based on the current status of the network impossible. To cope with this, we devise a heuristic based on the depth of each search tree node and the betweenness centrality of each computational cluster in the network topology. Our experimental results show that we can reduce the average message latency by 15% to 35%.
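Purely as a speculative sketch of the idea (not the authors' implementation): nodes are normally placed by hash, but shallow, heavily trafficked nodes could be steered toward clusters with high betweenness centrality. The placement policy, cutoff, and "top quarter" rule below are all hypothetical.

    import hashlib

    def home_cluster(node_key, depth, clusters, centrality, depth_cutoff=3):
        # clusters: list of cluster ids; centrality: id -> betweenness score.
        h = int(hashlib.md5(node_key.encode()).hexdigest(), 16)
        if depth <= depth_cutoff:
            # Shallow nodes generate most messages: restrict them to the
            # most central quarter of the clusters (hypothetical policy).
            ranked = sorted(clusters, key=lambda c: centrality[c], reverse=True)
            return ranked[h % max(1, len(ranked) // 4)]
        return clusters[h % len(clusters)]               # plain hash placement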
Article
Full-text available
This article introduces a class of incremental learning procedures specialized for prediction, that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.
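The core of the temporal-difference idea fits in a few lines. A tabular TD(0) update sketch in Python, with V a dict of state-value predictions (the names are illustrative only):

    def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0):
        # Move V(s) toward r + gamma * V(s_next): credit is assigned from the
        # difference between successive predictions, not the final outcome.
        target = r + gamma * V.get(s_next, 0.0)
        V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))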
Article
Full-text available
An efficient backtracking algorithm makes possible a very fast program to play the SCRABBLE® Brand Crossword Game. The efficiency is achieved by creating data structures before the backtracking search begins that serve both to focus the search and to make each step of the search fast.
Article
Computing an effective strategy in games with incomplete information is much more difficult than in games where the status of every relevant factor is known. A weighted heuristic approach selects the move in a given position that maximizes a weighted sum of known factors, where the weights have been optimized over a large random sample of games. Probabilistic search is an alternative approach that generates a random set of scenarios, simulates how plausible moves perform under each scenario, and selects the move with the "best" overall performance. This paper compares the effectiveness of these approaches for the game of Scrabble.
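The two approaches compared in this paper can be contrasted in a few lines of Python; the factor functions, weights, sample_scenario, and simulate callbacks are hypothetical stand-ins.

    def weighted_heuristic_move(moves, factors, weights):
        # (a) maximize a weighted sum of known factors of the move.
        return max(moves, key=lambda m: sum(w * f(m) for w, f in zip(weights, factors)))

    def probabilistic_search_move(moves, sample_scenario, simulate, n=100):
        # (b) simulate each plausible move under n random scenarios
        # (e.g. opponent racks) and take the best average result.
        scenarios = [sample_scenario() for _ in range(n)]
        return max(moves, key=lambda m: sum(simulate(m, s) for s in scenarios) / n)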
Article
A program that plays the SCRABBLE Crossword Game has been designed and implemented in SIMULA 67 on a DECSystem-10 and in Pascal on a CYBER 173. The heart of the design is the data structure for the lexicon and the algorithm for searching it. The lexicon is represented as a letter table, or trie, using a canonical ordering of the letters in the words rather than the original spelling. The algorithm takes the trie and a collection of letters, including blanks, and finds all words that can be formed from any combination and permutation of the letters. Words are found in approximately the order of their value in the game.
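A minimal Python rendering of the described lexicon: a trie keyed on each word's letters in canonical (sorted) order, searched with a rack that may contain blanks ('?'). This is an illustrative reconstruction, not the original SIMULA/Pascal code, and it omits the value-ordered enumeration.

    from collections import Counter

    def build_trie(words):
        root = {}
        for w in words:
            node = root
            for ch in sorted(w):                 # canonical letter ordering
                node = node.setdefault(ch, {})
            node.setdefault('$', set()).add(w)   # original spellings stored at the leaf
        return root

    def search(node, rack, found):
        # Find every word formable from the rack (a Counter; '?' is a blank).
        found |= node.get('$', set())
        for ch, child in node.items():
            if ch == '$':
                continue
            if rack[ch] > 0:
                rack[ch] -= 1; search(child, rack, found); rack[ch] += 1
            elif rack['?'] > 0:
                rack['?'] -= 1; search(child, rack, found); rack['?'] += 1
        return found

    # search(build_trie({"QUA", "AQUA"}), Counter("AQU?"), set()) -> {'QUA', 'AQUA'}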
Article
In this paper we present a new algorithm for searching trees. The algorithm, which we have named B∗, finds a proof that an arc at the root of a search tree is better than any other. It does this by attempting to find both the best arc at the root and the simplest proof, in best-first fashion. This strategy determines the order of node expansion. Any node that is expanded is assigned two values: an upper (or optimistic) bound and a lower (or pessimistic) bound. During the course of a search, these bounds at a node tend to converge, producing natural termination of the search. As long as all nodal bounds in a sub-tree are valid, B∗ will select the best arc at the root of that sub-tree. We present experimental and analytic evidence that B∗ is much more effective than present methods of searching adversary trees. The B∗ method assigns a greater responsibility for guiding the search to the evaluation functions that compute the bounds than has been done before. In this way knowledge, rather than a set of arbitrary predefined limits, can be used to terminate the search itself. It is interesting to note that the evaluation functions may measure any properties of the domain, thus resulting in selecting the arc that leads to the greatest quantity of whatever is being measured. We conjecture that this method is that used by chess masters in analyzing chess trees.
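The heart of B∗'s natural termination can be sketched in a few lines: the search stops once the pessimistic bound of some root arc dominates every other arc's optimistic bound. A hedged Python illustration; the bound values in the usage comment are placeholders.

    def b_star_proved(arcs):
        # arcs: list of (name, optimistic_bound, pessimistic_bound).
        # Returns the proven-best arc name, or None if more search is needed.
        best = max(arcs, key=lambda a: a[2])              # highest pessimistic bound
        if all(best[2] >= a[1] for a in arcs if a is not best):
            return best[0]
        return None  # otherwise expand to raise best's pessimistic bound
                     # (prove-best) or lower the rivals' optimistic bounds

    # b_star_proved([("arc1", 0.9, 0.6), ("arc2", 0.55, 0.3)]) -> "arc1"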
Article
Poker is an interesting test-bed for artificial intelligence research. It is a game of imperfect information, where multiple competing agents must deal with probabilistic knowledge, risk assessment, and possible deception, not unlike decisions made in the real world. Opponent modeling is another difficult problem in decision-making applications, and it is essential to achieving high performance in poker. This paper describes the design considerations and architecture of the poker program Poki. In addition to methods for hand evaluation and betting strategy, Poki uses learning techniques to construct statistical models of each opponent, and dynamically adapts to exploit observed patterns and tendencies. The result is a program capable of playing reasonably strong poker, but there remains considerable research to be done to play at a world-class level.
Article
Ever since the days of Shannon's proposal for a chess-playing algorithm [12] and Samuel's checkers-learning program [10], the domain of complex board games such as Go, chess, checkers, Othello, and backgammon has been widely regarded as an ideal testing ground for exploring a variety of concepts and approaches in artificial intelligence and machine learning. Such board games offer the challenge of tremendous complexity and sophistication required to play at expert level. At the same time, the problem inputs and performance measures are clear-cut and well defined, and the game environment is readily automated in that it is easy to simulate the board, the rules of legal play, and the rules regarding when the game is over and determining the outcome.
Article
M.S. thesis by Laura Jo Yedwab, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1985. Bibliography: leaves 84–86.
Article
The expected-outcome model, in which the proper evaluation of a game-tree node is the expected value of the game's outcome given random play from that node on, is proposed. Expected outcome is considered in its ideal form, where it is shown to be a powerful heuristic. The ability of a simple random sampler that estimates expected outcome to outduel a standard Othello evaluator is demonstrated. The sampler is combined with a linear regression procedure to produce efficient expected-outcome estimators. Overall, the expected-outcome model of two-player games is shown to be precise, accurate, easily estimable, efficiently calculable, and domain-independent
Article
This paper describes Goren In a Box (gib), the first bridge-playing program to approach the level of a human expert. We give a basic overview of the algorithms used, describe their strengths and weaknesses, and present the results of experiments comparing gib to both human opponents and earlier programs. Of all the classic games of skill, only card games and Go have yet to see the appearance of serious computer challengers. In Go, this appears to be because the game is fundamentally one of pattern recognition as opposed to search; the brute-force techniques that have been so successful in the development of chess-playing programs have failed almost utterly to deal with Go's huge branching factor. Indeed, the arguably strongest Go program in the world was beaten by Janice Kim in the AAAI-97 Hall of Champions after Kim had given the program a monumental 25-stone handicap. Card games appear to be different. Perhaps because they are games of imperfect information, or perhaps...
Article
This paper presents a faster algorithm that uses a GADDAG, a finite automaton that avoids the non-deterministic prefix generation of the DAWG algorithm by encoding a bidirectional path starting from each letter of each word in the lexicon. For a typical lexicon, the GADDAG is nearly five times larger than the DAWG, but generates moves more than twice as fast. This time/space trade-off is justified not only by the decreasing cost of computer memory, but also by the extensive use of move-generation in the analysis of board positions used by Gordon in the probabilistic search for the most appropriate play in a given position within realistic time constraints
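The bidirectional encoding is easy to illustrate: for each word, the GADDAG stores one path per letter, namely the reversed prefix ending at that letter, a separator, then the suffix. A small Python sketch; the separator symbol here is an arbitrary stand-in for the diamond used in the literature.

    SEP = "@"   # stands in for the diamond separator symbol

    def gaddag_paths(word):
        # One path per letter: reversed prefix + separator + suffix.
        return [word[:i][::-1] + SEP + word[i:] for i in range(1, len(word) + 1)]

    # gaddag_paths("CARE") -> ['C@ARE', 'AC@RE', 'RAC@E', 'ERAC@']

Inserting these paths into a DAWG-style automaton lets move generation start from any letter already on the board and extend in both directions, which is the source of the reported speed-up over prefix-based DAWG generation.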
Conference Paper
This paper discusses a practical framework for the semi-automatic construction of evaluation functions for games. Based on a structured evaluation-function representation, a procedure for exploring the feature space is presented that is able to discover new features in a computationally feasible way. Besides the theoretical aspects, related practical issues such as the generation of training positions, feature selection, and weight fitting in large linear systems are discussed. Finally, we present experimental results for Othello, which demonstrate the potential of the described approach. Keywords: automatic feature construction, GLEM, Othello. From the introduction: many AI systems use evaluation functions for guiding search tasks. In the context of strategy games they usually map game positions to real numbers estimating the winning chance for the player to move. Decades of research have shown how hard a problem evaluation function construction is, even when focusing on ...
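Of the practical issues listed, weight fitting in large linear systems is the most mechanical to illustrate. A hedged least-squares sketch in Python; feature extraction, the hard part discussed above, is left abstract, and all names are illustrative.

    import numpy as np

    def fit_weights(features, outcomes):
        # features: (n_positions, n_features) array; outcomes: (n_positions,).
        X = np.asarray(features, dtype=float)
        y = np.asarray(outcomes, dtype=float)
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        return w

    def evaluate(position_features, w):
        # Linear evaluation: weighted sum of the position's feature values.
        return float(np.dot(position_features, w))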
Scrabble crossword game-playing programs
  • S C Shapiro
S.C. Shapiro, Scrabble crossword game-playing programs, SIGART Newsletter 80 (1982). Reprinted in: M.A. Bramer (Ed.), Computer Game-Playing: Theory and Practice, Ellis Horwood Ltd, Chichester, UK, 1983, pp. 221–228.
A competitive Scrabble program
  • P J Turcan
P.J. Turcan, A competitive Scrabble program, SIGART Newsletter 80 (1982). Reprinted in: M.A. Bramer (Ed.), Computer Game-Playing: Theory and Practice, Ellis Horwood Ltd, Chichester, UK, 1983, pp. 209–220.
Letters beyond numbers
  • Uljee
I. Uljee, Letters beyond numbers, in: H.J. van den Herik, L.V. Allis (Eds.), Heuristic Programming in Artificial Intelligence 3, The Third Computer Olympiad, Ellis Horwood, Chichester, UK, 1992, pp. 63–66.
World-championship-caliber Scrabble (excerpt)
  • B Sheppard
B. Sheppard, Artificial Intelligence 134 (2002) 241–275: ... see how this might operate. The question is to identify the set of all squares on which the word QUA may start. Without loss of generality, assume that QUA is horizontal. To play QUA, several constraints must be satisfied simultaneously: (1) an empty square (or the edge) to the left of the Q (since Q is the first letter).
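A hedged sketch of how that first constraint might be checked in code; the board representation (a grid of '' or letters) is assumed, and the remaining constraints, which the excerpt truncates, are only marked.

    def may_start_here(board, r, c, word="QUA"):
        # Constraint (1): the square left of the Q must be empty or off the board.
        if c > 0 and board[r][c - 1] != '':
            return False
        if c + len(word) > len(board[r]):    # the word must also fit on the row
            return False
        # ... further constraints (anchor squares, existing tiles, cross-words)
        return True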
Annotated game excerpt: PILIS, PULIS, PILUS, and PURIS are all good. Adam's choice is best because there are only 2 Us left, and Adam does not want to risk getting a bad Q. When you lead the game you have to guard against extreme outcomes. MAVEN: ?AKNPRS SPANKER K5 105 292