Article

Using Resource-Limited Nash Memory to Improve an Othello Evaluation Function

Abstract

Finding the best strategy for winning a game using self-play or coevolution can be hindered by intransitivity among strategies and a changing fitness landscape. Nash Memory has been proposed as an archive for coevolution, to counter intransitivity and provide a more consistent fitness landscape. A lack of bounds on archive size might impede its use in a large, complex domain, such as the game of Othello, with strategies described by n-tuple networks. This paper demonstrates that even with a bounded-size archive, an evolving population can continue to show progress past the point where self-play no longer can. Characteristics of Nash equilibria are shown to be valuable in the measurement of performance. In addition, a technique for automated selection of features is demonstrated for the n-tuple networks.
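The abstract describes the approach only at a high level. The sketch below is a rough, illustrative reconstruction (not the author's implementation) of how one update step of a size-limited Nash Memory archive could look: candidates from the evolving population are pooled with the archive, a mixed-strategy equilibrium of the pairwise payoff matrix is computed, and the archive is truncated to a fixed capacity by keeping the strategies with the largest equilibrium probabilities. The helpers `play` and `solve_zero_sum` are assumed interfaces (a concrete LP-based `solve_zero_sum` is sketched further below, after the Nash Memory reference).

```python
import numpy as np

def nash_memory_step(archive, candidates, play, solve_zero_sum, capacity=20):
    """Illustrative update of a size-bounded Nash Memory archive.

    archive, candidates : lists of strategies (e.g. n-tuple network weight vectors)
    play(a, b)          : payoff of strategy a against strategy b, e.g. in [-1, 1]
    solve_zero_sum(P)   : mixed strategy (probability vector) forming an equilibrium
                          of the zero-sum game with payoff matrix P
    """
    pool = archive + candidates
    # Pairwise payoff matrix of the restricted game among all pooled strategies.
    payoff = np.array([[play(a, b) for b in pool] for a in pool])
    probs = solve_zero_sum(payoff)
    # Resource limit: keep only the `capacity` strategies with the largest
    # equilibrium probability (one simple way to bound the archive size).
    keep = np.argsort(probs)[::-1][:capacity]
    new_archive = [pool[i] for i in keep]
    mass = probs[keep].sum()
    new_probs = probs[keep] / mass if mass > 0 else None
    return new_archive, new_probs
```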


... Othello has for a long time been a popular benchmark for computational intelligence methods [29], [35], [5], [25], [26], [42], [40], [28], [16], [30], [18]. All strong Othello-playing programs use a variant of the minimax search [7] with a board evaluation function. ...
... Past research suggested that more can be gained by improving the latter than the former; that is why recently the focus was mostly on training 1 look-ahead (a.k.a. 1-ply) agents using either self-play [26], [18], fixed opponents [20], [16], or expert game databases [30]. Multiple ways of training the agents have been proposed: value-based temporal difference learning [25], [42], [34], (co)evolution [26], [31], [17], [15], and hybrids thereof [39], [40]. ...
Preprint
Achieving superhuman playing level by AlphaGo corroborated the capabilities of convolutional neural architectures (CNNs) for capturing complex spatial patterns. This result was to a great extent due to several analogies between Go board states and 2D images CNNs have been designed for, in particular translational invariance and a relatively large board. In this paper, we verify whether CNN-based move predictors prove effective for Othello, a game with significantly different characteristics, including a much smaller board size and complete lack of translational invariance. We compare several CNN architectures and board encodings, augment them with state-of-the-art extensions, train on an extensive database of experts' moves, and examine them with respect to move prediction accuracy and playing strength. The empirical evaluation confirms high capabilities of neural move predictors and suggests a strong correlation between prediction accuracy and playing strength. The best CNNs not only surpass all other 1-ply Othello players proposed to date but defeat (2-ply) Edax, the best open-source Othello player.
... In the experiments, we apply Co-CMA-ES to Othello, a game which has been a popular benchmark recently [33], [36], [54], [45], [43], [41]. One of the main reasons why Othello is so interesting is that the game is full of dramatic reversals caused by the rapid changes in dominance on the board. ...
... In recent years, the game of Othello has been frequently employed for evaluating both evolutionary [33], [36], [54], [41], [8], [26], [51] and temporal difference learning methods [45], and for comparing their empirical results [27], [41], [50]. The game of Othello is a deterministic, sequential, zero-sum board game played by two players using double-sided pieces with white and black face, each face assigned to one player. ...
... Nevertheless, the performance gain obtained by CMA-ES compensates for this negative effect. Therefore, in the rest of this work, we use only coevolutionary CMA-ES. 1) Systematic vs. Random Snake-Like n-Tuples: The most popular method of placing n-tuples on the board consists in randomly generating a small number of long, snake-shaped sequences [36], [50], [41]. Lucas, who introduced this method [31], created tuples by starting from a random location and taking a random walk of 6 steps in any of the 8 orthogonal or diagonal directions. ...
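As a concrete illustration of the random snake-shaped placement described in the excerpt above, the following sketch generates one such tuple by a random walk on an 8×8 board. It is an assumption-laden reconstruction (how revisited squares and board edges are handled is not specified in the excerpt), not code from the cited work.

```python
import random

# The 8 orthogonal and diagonal step directions mentioned in the excerpt.
DIRECTIONS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

def random_snake_tuple(walk_steps=6, size=8):
    """Generate a snake-shaped n-tuple (a list of board coordinates) by a random
    walk; revisited squares are kept only once, so the resulting tuple may be
    shorter than walk_steps + 1."""
    r, c = random.randrange(size), random.randrange(size)
    locations = [(r, c)]
    for _ in range(walk_steps):
        dr, dc = random.choice(DIRECTIONS)
        r = min(max(r + dr, 0), size - 1)   # clamp the walk to the board
        c = min(max(c + dc, 0), size - 1)
        if (r, c) not in locations:
            locations.append((r, c))
    return locations
```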
Article
Full-text available
One weakness of coevolutionary algorithms observed in knowledge-free learning of strategies for adversarial games has been their poor scalability with respect to the number of parameters to learn. In this paper, we investigate to what extent this problem can be mitigated by using Covariance Matrix Adaptation Evolution Strategy, a powerful continuous optimization algorithm. In particular, we employ this algorithm in a competitive coevolutionary setup, denoting this setting Co-CMA-ES. We apply it to learn position evaluation functions for the game of Othello and find out that, in contrast to plain (co)evolution strategies, Co-CMA-ES learns faster, finds superior game-playing strategies and scales better. Its advantages come out into the open especially for large parameter spaces of tens of hundreds of dimensions. For Othello, combining Co-CMA-ES with experimentally tuned, derandomized, systematic n-tuple networks significantly improved the current state of the art. Our best strategy outperforms all the other Othello 1-ply players published to date by a large margin regardless of whether the round-robin tournament among them involves a fixed set of initial positions or the standard initial position but randomized opponents. These results show a large potential of CMA-ES-driven coevolution, which could be, presumably, exploited also in other games.
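The abstract above describes the coevolutionary use of CMA-ES without spelling out the evaluation loop. The following sketch shows one plausible reading of the core idea, using the pycma library: each sampled weight vector is scored by its win rate in a round-robin tournament against the rest of the same sample, so fitness is relative (coevolutionary) rather than measured against a fixed objective. The `play_game` function and the hyperparameters are placeholders, not details from the cited paper.

```python
import numpy as np
import cma  # pycma: pip install cma

def coevolutionary_cmaes(dim, play_game, generations=100, sigma0=1.0):
    """Illustrative Co-CMA-ES loop. play_game(w1, w2) should return 1.0, 0.5 or 0.0
    for a win, draw or loss of the strategy encoded by w1 against w2."""
    es = cma.CMAEvolutionStrategy(np.zeros(dim), sigma0)
    for _ in range(generations):
        population = es.ask()
        scores = np.zeros(len(population))
        for i, wi in enumerate(population):
            for j, wj in enumerate(population):
                if i != j:
                    scores[i] += play_game(wi, wj)
        scores /= len(population) - 1          # average score per opponent
        es.tell(population, list(-scores))     # CMA-ES minimizes, so negate win rates
    return es.result.xbest
```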
... When value functions with thousands of parameters are used, then the more directed search accomplished by TDL seems preferable due to its better use of the available information [7]. A combination of TDL and coevolution can also work well [10], [11]. ...
... Evolution was via a (5 5) evolution strategy (ES) run for 150 generations, with a total of 30 000 games being run each generation for the TDL training. Note that any reasonable set of n-tuples would be fine for the current work, such as those described in [10] and [11], and players based on [10] and [11] were used for comparative evaluation in our round-robin league. ...
Article
Full-text available
This paper investigates the use of preference learning as an approach to move prediction and evaluation function approximation, using the game of Othello as a test domain. Using the same sets of features, we compare our approach with least squares temporal difference learning, direct classification, and with the Bradley–Terry model, fitted using minorization–maximization (MM). The results show that the exact way in which preference learning is applied is critical to achieving high performance. Best results were obtained using a combination of board inversion and pair-wise preference learning. This combination significantly outperformed the others under test, both in terms of move prediction accuracy, and in the level of play achieved when using the learned evaluation function as a move selector during game play.
... With the help of a game tree search algorithm, this allows for selecting the move that leads to the most favorable afterstate. Most recent works on learning Othello strategies have focused on creating board evaluation functions [24], [26] and we follow that trend in this study. Moreover, we focus our research on comparison between learning procedures rather than developing efficient tree search algorithms. ...
... This task formulation is addressed by, among others, Temporal Difference Learning (TDL) and Coevolutionary Learning (CEL), which were applied to Othello by Lucas and Runarsson [24]. Other examples of using self-learning approaches for Othello include coevolution of spatially aware MLPs [5], TD-leaf learning of structured neural networks [41], coevolutionary temporal difference learning [36], and Nash Memory applied for coevolved n-tuple networks [26]. That study inspired our previous paper [36] in which we compare these methods with their direct hybridization called Coevolutionary Temporal Difference Learning (CTDL). ...
... Kim et al. [13] trained a population of neural networks with TD(0) and used the resulting strategies as an input for the standard genetic algorithm with mutation as the only variation operator. In [26] a coevolutionary algorithm is combined with TDL used as a weight mutation operator and applied to the game of Othello. Contrary to the approach presented here that uses straightforward coevolution with no long-term memory mechanism, the authors of [26] employed the Nash Memory algorithm [10] with bounded archives. ...
Article
Full-text available
This study investigates different methods of learning to play the game of Othello. The main questions posed concern scalability of algorithms with respect to the search space size and their capability to generalize and produce players that fare well against various opponents. The considered algorithms represent strategies as n-tuple networks, and employ self-play temporal difference learning (TDL), evolutionary and coevolutionary learning, and hybrids thereof. To assess the performance, three different measures are used: score against an a priori given opponent (a fixed heuristic strategy), against opponents trained by other methods (round-robin tournament), and against the top-ranked players from the online Othello League. We demonstrate that although evolutionary-based methods yield players that fare best against a fixed heuristic player, it is the coevolutionary temporal difference learning (CTDL), a hybrid of coevolution and TDL, that generalizes better and proves superior when confronted with a pool of previously unseen opponents. Moreover, CTDL scales well with the size of representation, attaining better results for larger n-tuple networks. By showing that a strategy learned in this way wins against the top entries from the Othello League, we conclude that it is one of the best 1-ply Othello players obtained to date without explicit use of human knowledge.
... This task formulation is addressed by, among others, Temporal Difference Learning (TDL) and Coevolutionary Learning (CEL), which were applied to Othello by Lucas and Runarsson [13]. Other examples of using self-learning approaches for Othello include coevolution of spatially aware MLPs [4], coevolutionary temporal difference learning [23], and Nash Memory applied for coevolved n-tuple networks [15]. ...
... Kim et al. [8] trained a population of neural networks with TD(0) and used the resulting strategies as an input for the standard genetic algorithm with mutation as the only variation operator. In [15] a bounded-size Nash Memory archive for coevolution is combined with the TDL used as a weight mutation operator. ...
... We rely on n-tuple networks because of their appealing potential demonstrated in recent studies [15, 11] and promising results in the Othello League [12]. We start from small networks formed by 7 instances of 4-tuples (7×4), which include 567 weights. ...
Conference Paper
Full-text available
We propose Coevolutionary Gradient Search, a blueprint for a family of iterative learning algorithms that combine elements of local search and population-based search. The approach is applied to learning Othello strategies represented as n-tuple networks, using different search operators and modes of learning. We focus on the interplay between the continuous, directed, gradient-based search in the space of weights, and fitness-driven, combinatorial, coevolutionary search in the space of entire n-tuple networks. In an extensive experiment, we assess both the objective and relative performance of algorithms, concluding that the hybridization of search techniques improves the convergence. The best algorithms not only learn faster than constituent methods alone, but also produce top ranked strategies in the online Othello League.
... The effectiveness of an n-tuple network highly depends on the placement of n-tuples [7]. Typically, n-tuple architectures consist of a small number of long, randomly generated, snake-shaped n-tuples [8], [7], [9]. ...
... 3) Other Approaches: Logistello [4], the computer player which beat the human Othello world champion in 1997, used 11 n-tuples of n ∈ {3, 10}, hand-crafted by an expert. External knowledge has also been used by Manning [8], who generated a diverse 12 × 6-tuple network using the random inputs method from Breiman's Random Forests, based on a set of 10 000 labeled random games. ...
Article
Full-text available
N-tuple networks have been successfully used as position evaluation functions for board games such as Othello or Connect Four. The effectiveness of such networks depends on their architecture, which is determined by the placement of constituent n-tuples (sequences of board locations) providing input to the network. The most popular method of placing n-tuples consists in randomly generating a small number of long, snake-shaped board location sequences. In this paper, we show that learning n-tuple networks is more effective if it involves, instead, a large number of systematically placed, short, straight n-tuples. In addition, we demonstrate that a simple variant of coevolutionary learning can evolve a systematic n-tuple network with tuples of size just 2 whose performance is comparable to the best 1-ply Othello players. Our network consists of only 288 parameters, which is an order of magnitude less than the top published players to date. This indicates a need for more effective learning methods that would be capable of taking full advantage of larger networks.
... In contrast to this, RL algorithms such as temporal difference learning (TDL) learn during the lifetime of an agent as rewards are given during the task. The true picture is more subtle than this, with some evolutionary algorithms exploiting experience gained during the lifetime of an individual to bias the variation operators [1]. Nonetheless, in its purest form the only information gained in an evolutionary algorithm is at the selection stage. ...
Conference Paper
Full-text available
When learning to play a game or perform some task, it is important to learn as quickly and effectively as possible by making best use of the available information. Interesting insights can be gained by studying the learning process from an information theory perspective, and analysing the learning speed in terms of the maximum number of bits that could be learned per game/task, or per action. Previous work has applied this analysis to co-evolution and to temporal difference learning (TDL) for a simple board game with a fixed number of moves. This paper analyses a grid-world problem and calculates the upper bounds on the information rates for evolution and for TDL. The results show an interesting relationship between the upper bounds of the learning rates and the actual information acquisition rates that are achieved in practice. Also, which method works best is highly dependent on the choice of function approximator.
... 1) n-Tuple Network: One particular type of function approximator is the n-tuple network [15], which has recently been successfully applied to board games such as Othello [7], [17], [33], [12], Connect 4 [37], or Tetris [13]. ...
Preprint
2048 is an engaging single-player, nondeterministic video puzzle game, which, thanks to the simple rules and hard-to-master gameplay, has gained massive popularity in recent years. As 2048 can be conveniently embedded into the discrete-state Markov decision processes framework, we treat it as a testbed for evaluating existing and new methods in reinforcement learning. With the aim to develop a strong 2048 playing program, we employ temporal difference learning with systematic n-tuple networks. We show that this basic method can be significantly improved with temporal coherence learning, multi-stage function approximator with weight promotion, carousel shaping, and redundant encoding. In addition, we demonstrate how to take advantage of the characteristics of the n-tuple network to improve the algorithmic effectiveness of the learning process by delaying the (decayed) update and applying lock-free optimistic parallelism to effortlessly make use of multiple CPU cores. This way, we were able to develop the best known 2048 playing program to date, which confirms the effectiveness of the introduced methods for discrete-state Markov decision problems.
... Reinforcement learning (RL) has been used widely in developing self-learning agents for computer games. In [17, 18], self-play learning agents have been utilized to create strategies. A self-play learning agent for Go has been studied in [19]. ...
... Since then, RL has been employed in developing self-learning agents for computer games. Agents that create strategies by self-play learning have been studied by many researchers in the field of AI [3], [4]. Go is an example of a game for which such self-play learning agents have been studied [5]. ...
Conference Paper
Full-text available
Recent years have demonstrated the growing range of deep reinforcement learning (DRL) applications. DRL has been utilized as an AI computer player in many board games. Seejeh is an ancient board game for which, to date, no one has attempted to create an AI system that learns to play it. Seejeh is a two-player, zero-sum, discrete, finite and deterministic game of perfect information. The Seejeh board game differs from other strategic board games: it has two stages, positioning and moving. A player places two tiles per action in stage one. Unlike in Othello and Go, a player may make a sequence of moves in the second stage. In this work, we develop an automated player based solely on reinforcement learning, without human data, guidance or domain knowledge beyond the game rules. This paper presents a self-play algorithm utilizing DRL and search algorithms. The system starts with a neural network that knows nothing about the game of Seejeh. It then plays games against itself, combining this neural network with powerful search algorithms. To the best of our knowledge, we are the first to develop an agent that learns to play the game of Seejeh.
... The position evaluation in Othello has been frequently employed for evaluating both evolutionary [6,14,15,30,35,39,41] and temporal difference learning methods [36], and for comparing their empirical results [17,35,38]. The best evaluation function for Othello to date has been obtained using the Coevolutionary CMA-ES algorithm [16]. ...
Conference Paper
Among many interaction schemes in coevolutionary settings for interactive domains, the round-robin tournament provides the most precise evaluation of candidate solutions at the expense of computational effort. In order to improve the coevolutionary learning speed, we propose an interaction scheme that computes only a fraction of interactions outcomes between the pairs of coevolving individuals. The missing outcomes in the interaction matrix are predicted using matrix factorization. The algorithm adaptively decides how much of the interaction matrix to compute based on the learning speed statistics. We evaluate our method in the context of coevolutionary covariance matrix adaptation strategy (CoCMAES) for the problem of learning position evaluation in the game of Othello. We show that our adaptive interaction scheme allows to match the state-of-the-art results obtained by the standard round-robin CoCMAES while, at the same time, considerably improves the learning speed.
... 1) n-Tuple Network: One particular type of function approximator is the n-tuple network [12], which has recently been successfully applied to board games such as Othello [6], [16], [29], [10], Connect 4 [31], or Tetris [11]. ...
Article
2048 is an engaging single-player, nondeterministic video puzzle game, which, thanks to the simple rules and hard-to-master gameplay, has gained massive popularity in recent years. As 2048 can be conveniently embedded into the discrete-state Markov decision processes framework, we treat it as a testbed for evaluating existing and new methods in reinforcement learning. With the aim to develop a strong 2048 playing program, we employ temporal difference learning with systematic n-tuple networks. We show that this basic method can be significantly improved with temporal coherence learning, multi-stage function approximator with weight promotion, carousel shaping, and redundant encoding. In addition, we demonstrate how to take advantage of the characteristics of the n-tuple network to improve the algorithmic effectiveness of the learning process by delaying the (decayed) update and applying lock-free optimistic parallelism to effortlessly make use of multiple CPU cores. This way, we were able to develop the best known 2048 playing program to date, which confirms the effectiveness of the introduced methods for discrete-state Markov decision problems.
... 1. Standard WPC Heuristic Player (SWH), hand-crafted by Yoshioka et al. (1999), and often used as an opponent in Othello research (Lucas and Runarsson, 2006; Szubert et al., 2009; Manning, 2010) (Szubert et al., 2009; Szubert et al., 2011; Krawiec et al., 2011). ...
Article
Full-text available
In test-based problems, solutions produced by search algorithms are typically assessed using average outcomes of interactions with multiple tests. This aggregation leads to information loss, which can render different solutions apparently indifferent and hinder comparison of search algorithms. In this paper we introduce performance profile, a generic, domain-independent, multi-criteria performance evaluation method that mitigates this problem by characterizing performance of a solution by a vector of outcomes of interactions with tests of various difficulty. To demonstrate the usefulness of this gauge, we employ it to analyze the behavior of Othello and Iterated Prisoner's Dilemma players produced by five (co)evolutionary algorithms as well as players known from previous publications. Performance profiles reveal interesting differences between the players, which escape the attention of the scalar performance measure of expected utility. In particular, they allow us to observe that evolution with random sampling produces players coping well against the mediocre opponents, while the coevolutionary and temporal difference learning strategies play better against the high-grade opponents. We postulate that performance profiles improve our understanding of characteristics of search algorithms applied to arbitrary test-based problems, and can prospectively help designing better methods for interactive domains.
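The performance profile described above is a vector of outcomes against tests grouped by difficulty. The sketch below shows one simple concrete realization under that reading; the binning scheme, the `play` outcome convention, and the source of the difficulty ratings are assumptions made here for illustration.

```python
import numpy as np

def performance_profile(player, opponents, difficulties, play, n_bins=10):
    """Illustrative performance profile: opponents are grouped into difficulty bins
    and the player's average outcome against each bin forms the profile vector.
    play(player, opponent) is assumed to return an outcome such as 1 / 0.5 / 0."""
    difficulties = np.asarray(difficulties, dtype=float)
    edges = np.linspace(difficulties.min(), difficulties.max(), n_bins + 1)
    profile = np.full(n_bins, np.nan)       # NaN marks bins with no opponents
    for b in range(n_bins):
        lo, hi = edges[b], edges[b + 1]
        last = b == n_bins - 1
        in_bin = [o for o, d in zip(opponents, difficulties)
                  if lo <= d < hi or (last and d == hi)]
        if in_bin:
            profile[b] = np.mean([play(player, o) for o in in_bin])
    return profile
```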
... The major design choice concerns the representation of the value functions V : S → R, which are used by each of the considered algorithms to evaluate moves and, in effect, determine the game-playing policy. One particular type of such approximators are n-tuple networks [15], which have been recently successfully applied to Othello [16], [17], [18], [12] and Connect 4 [19]. ...
Conference Paper
Full-text available
The highly addictive stochastic puzzle game 2048 has recently invaded the Internet and mobile devices, stealing countless hours of players' lives. In this study we investigate the possibility of creating a game-playing agent capable of winning this game without incorporating human expertise or performing game tree search. For this purpose, we employ three variants of temporal difference learning to acquire i) action value, ii) state value, and iii) afterstate value functions for evaluating player moves at 1-ply. To represent these functions we adopt n-tuple networks, which have recently been successfully applied to Othello and Connect 4. The conducted experiments demonstrate that the learning algorithm using afterstate value functions is able to consistently produce players winning over 97% of games. These results show that n-tuple networks combined with an appropriate learning algorithm have large potential, which could be exploited in other board games.
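The abstract above distinguishes action-, state- and afterstate-value learning. The sketch below illustrates only the afterstate variant: the value of the deterministic afterstate (the board after the player's move, before the random tile appears) is moved toward the reward-plus-value of the greedy next afterstate. The `env` and `value` interfaces (compute_afterstate, add_random_tile, legal_actions, is_terminal, get, update) are invented here for illustration and are not the API of any published implementation.

```python
def td_afterstate_update(value, state, action, env, alpha=0.01):
    """One illustrative TD(0) update of an afterstate value function."""
    afterstate, reward = env.compute_afterstate(state, action)   # deterministic move result
    next_state = env.add_random_tile(afterstate)                  # stochastic part of the transition
    if env.is_terminal(next_state):
        target = 0.0
    else:
        # Evaluate the greedy 1-ply move from the next state with the current values.
        nxt = [env.compute_afterstate(next_state, a) for a in env.legal_actions(next_state)]
        target = max(r + value.get(s2) for s2, r in nxt)
    value.update(afterstate, alpha * (target - value.get(afterstate)))
    return next_state
```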
... The aim here is not to find Othello players that are strong in absolute terms, or even especially strong value functions. The best performing value functions for Othello are tabular value functions [5] or N-tuple systems [6] [7] (which are essentially equivalent to each other), but these involve learning thousands or tens of thousands of parameters. We use Othello as an interesting domain of study in which to measure performance and intransitivities in coevolution. ...
Article
Full-text available
Coevolution is a natural choice for learning in problem domains where one agent's behavior is directly related to the behavior of other agents. However, there is a known tendency for coevolution to produce mediocre solutions. One of the main reasons for this is cycling, caused by intransitivities among a set of players. In this paper, we explore the link between coevolution and games, and revisit some of the coevolutionary literature in a games and measurement context. We propose a set of measurements to identify cycling in a population and a new algorithm that tries to minimize cycling in strictly competitive (zero sum) games. We experimentally verify our approach by evolving weighted piece counter value functions to play Othello, a classic two-player perfect information board game. Our method is able to find extremely strong value functions of this type.
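Since several of the excerpts here refer to weighted piece counter (WPC) value functions, a minimal sketch of how such a function is typically evaluated may help; the board encoding (+1 own discs, -1 opponent discs, 0 empty) is a common convention assumed here, not a detail taken from the cited paper.

```python
import numpy as np

def wpc_evaluate(board, weights):
    """Weighted piece counter: the evaluation is the dot product of the board
    vector (+1 own disc, -1 opponent disc, 0 empty; 64 entries for Othello)
    with one learned weight per square."""
    board = np.asarray(board, dtype=float).ravel()
    weights = np.asarray(weights, dtype=float).ravel()
    return float(board @ weights)

# A 1-ply player would apply wpc_evaluate to every legal afterstate and pick the best one.
```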
... However, recent research has seen the advent of methods that learn autonomously, without any help of external domain knowledge. Typical examples of such methods are CEL and Temporal Difference Learning, which were applied to Othello separately [4], [19], [28] as well as in combination [27], [22]. Our work follows this direction of knowledge-free methods and focuses on CEL. ...
Conference Paper
Full-text available
Recent developments cast doubts on the effectiveness of coevolutionary learning in interactive domains. A simple evolution with fitness evaluation based on games with random strategies has been found to generalize better than competitive coevolution. In an attempt to investigate this phenomenon, we analyze the utility of random opponents for one- and two-population competitive coevolution applied to learning strategies for the game of Othello. We show that if coevolution uses a two-population setup and also engages random opponents, it is capable of producing equally good strategies as evolution with random sampling for the expected utility performance measure. To investigate the differences between the analyzed methods, we introduce performance profile, a tool that measures the player's performance against opponents of various strength. The profiles reveal that evolution with random sampling produces players coping well with mediocre opponents, but playing relatively poorly against stronger ones. This finding explains why, in the round-robin tournament, evolution with random sampling is one of the worst methods of all those considered in this study.
... The value of each input and its equivalents is used to calculate the index into the table. The value function is computed by summing all the LUT weights indexed by all n-tuples [9]. ...
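The excerpt above summarizes how an n-tuple network evaluates a position: each tuple's cell contents index a look-up table (LUT), and the indexed weights are summed. A minimal sketch of that computation follows; the 3-valued cell encoding and the omission of symmetric "equivalent" placements are simplifications made here for brevity.

```python
def ntuple_value(board, tuples, luts, n_values=3):
    """Evaluate a position with an n-tuple network.

    board  : maps a board location to its content (0 = empty, 1 = black, 2 = white)
    tuples : list of n-tuples, each a list of board locations
    luts   : one look-up table (list of weights) per n-tuple
    """
    total = 0.0
    for locations, lut in zip(tuples, luts):
        index = 0
        for loc in locations:
            index = index * n_values + board[loc]   # base-n_values index into the LUT
        total += lut[index]
    return total
```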
Conference Paper
Full-text available
Computer games have become increasingly important over time. The intelligence level of these games has improved, and humans have been defeated several times by game engines for 'Chess', 'Go', 'Othello' and 'Checkers'. 'Time per move' and 'number of wins/losses' are two different yardsticks used to measure the effectiveness of the evaluation functions used to implement the game. This paper concentrates on finding the most efficient evaluation function for Othello. Extensive experimentation (approaching 144,000 experiments collectively) has been done to measure the effectiveness of each evaluation function. The online engine is capable of playing the game up to ten levels, but in this paper we cover the comparisons up to level 4. By making comparisons based on time, complexity, wins and draws, it is easy to vindicate or reconsider the choice of evaluation function. After thorough experimentation, the 'Standard Weighted Piece Counter (S-WPC)' proved to be the best of the available strategies with respect to efficiency.
... The value of each input and its equivalents is used to calculate the index into the table. The value function is computed by summing all the LUT weights indexed by all n-tuples [9]. ...
Article
Full-text available
Games have emerged as one of the most important areas of study in the field of Artificial Intelligence. Nowadays, many game engines, such as those for Othello, Checkers and Go, have outperformed the best human players. The evaluation function is the most important part of the design of any game engine, as it reflects the quality and strength of the engine. The game of Othello has proved its prominence by being an active research area for a long time and has attracted extensive attention from researchers, knowledge engineers and game developers. Othello is not as simple as Checkers and not as complex as Chess, both in its execution time and its complexity; it is therefore an appropriate choice as a benchmark in game development. Finding a better evaluation function for Othello has long been an open research question. In this paper we compare the different available strategies at length. An online gaming engine has been developed by implementing the highly ranked evaluation functions. Extensive experimentation (approaching 144,000 experiments collectively) has been done to measure the effectiveness of each evaluation function. The online engine is capable of playing the game up to ten levels, but in this paper we cover the comparisons up to level 4. By making comparisons based on time, complexity, wins and draws, it is easy to vindicate or reconsider the choice of evaluation function. After thorough experimentation, the Multi-Layer Perceptron Neural Network (MLPNN) proved to be the best of the available strategies with respect to its win/draw comparisons.
... Recently, Fogel and colleagues used a similar method to coevolve a neural network chess player, Blondie 25, which beat Pocket Fritz 2.0 demonstrating a performance rank of 2650 [46] and won a game with Fritz 8.0 [47], one of the strongest chess programs. Some research shows the potential of coevolutionary algorithms or hybridized coevolutionary algorithms for Othello [22,86,136,93,94] or for a small-board version of Go [84,123,80]; however, particularly for the latter game, coevolutionary methods were unable to produce a player exhibiting a master level of play. Coevolutionary algorithms were also applied to non-deterministic games such as backgammon [114,5,130], Texas holdem poker [102], Blackjack [19], and robotic soccer [89,87]. ...
Thesis
Full-text available
Problems in which some elementary entities interact with each other are common in computational intelligence. This scenario, typical for coevolving artificial-life agents, learning strategies for games, and machine learning from examples, can be formalized as a test-based problem and conveniently embedded in the common conceptual framework of coevolution. In test-based problems, candidate solutions are evaluated on a number of test cases such as agents, opponents or examples. Although coevolutionary algorithms proved successful in some applications, they also turned out to have hard-to-predict dynamics and to fail to sustain progress during a run, thus being unable to obtain competitive solutions for many test-based problems. It has been recently shown that one of the reasons why coevolutionary algorithms demonstrate such undesired behavior is the aggregation of results of interactions between individuals representing candidate solutions and tests, which typically leads to characterizing the performance of an individual by a single scalar value. In order to remedy this situation, in the thesis, we make an attempt to get around the problem of aggregation using two methods. First, we introduce Fitnessless Coevolution, a method for symmetrical test-based problems. Fitnessless Coevolution plays games between individuals to settle tournaments in the selection phase and skips the typical phase of evaluation and the aggregation of results connected with it. The selection operator applies a single-elimination tournament to a randomly drawn group of individuals, and the winner of the final round becomes the result of selection. Therefore, Fitnessless Coevolution does not involve an explicit fitness measure, and no aggregation of interaction results is required. We prove that, under a condition of transitivity of the payoff matrix, the dynamics of Fitnessless Coevolution is identical to that of the traditional evolutionary algorithm. The experimental results, obtained on a diversified group of problems, demonstrate that Fitnessless Coevolution is able to produce solutions that are equally good or better than solutions obtained using fitness-based one-population coevolution with different selection methods. In a case study, we provide the complete record of the methodology that let us evolve BrilliAnt, the winner of the Ant Wars contest. We detail the coevolutionary setup that led to BrilliAnt's emergence, assess its direct and indirect human-competitiveness, and describe the behavioral patterns observed in its strategy. Second, we study the consequences of the fact that the problem of aggregation of interaction results may be got around by regarding every test of a test-based problem as a separate objective, and the whole problem as a multi-objective optimization task. Research on reducing the number of objectives while preserving the relations between candidate solutions and tests led to the notions of underlying objectives and internal problem structure, which can be formalized as a coordinate system that spatially arranges candidate solutions and tests. The coordinate system that spans the minimal number of axes determines the so-called dimension of a problem and, being an inherent property of every test-based problem, is of particular interest. We investigate in depth the formalism of the coordinate system and its properties, relate them to the properties of partially ordered sets, and design an exact algorithm for finding a minimal coordinate system.
We also prove that this problem is NP-hard and come up with a heuristic which is superior to the best algorithm proposed so far. Finally, we apply the algorithms to several benchmark problems to demonstrate that the dimension of a problem is typically much lower than the number of tests. Our work suggests that, for at least some classes of test-based problems, the dimension of a problem may be proportional to the logarithm of the number of tests. Based on the above-described theoretical results, we propose a novel coevolutionary archive method founded on the concept of coordinate systems, called Coordinate System Archive (COSA), and compare it to two state-of-the-art archive methods, IPCA and LAPCA. Using two different objective performance measures, we find that COSA is superior to these methods on a class of artificial test-based problems.
... Kim et al. [42] trained a population of neural networks with TD(0) and used the resulting strategies as an input for the standard genetic algorithm with mutation as the only variation operator. Recent work by Manning [43] demonstrated a bounded-size Nash Memory archive for coevolution [44] and employed TDL as a weight mutation operator. In [2], Singer has shown that reinforcement learning can be superior to random mutation as an exploration mechanism. ...
Conference Paper
Full-text available
In this paper we apply Coevolutionary Temporal Difference Learning (CTDL), a hybrid of coevolutionary search and reinforcement learning proposed in our former study, to evolve strategies for playing the game of Go on small boards (5×5). CTDL works by interlacing exploration of the search space provided by one-population competitive coevolution and exploitation by means of temporal difference learning. Despite using simple representation of strategies (weighted piece counter), CTDL proves able to evolve players that defeat solutions found by its constituent methods. The results of the conducted experiments indicate that our algorithm turns out to be superior to pure coevolution and pure temporal difference learning, both in terms of performance of the elaborated strategies and the computational cost. This demonstrates the existence of synergistic interplay between components of CTDL, which we also briefly discuss in this study.
... The major contribution of this paper is a novel method for what we think is the most popular solution concept, Nash equilibria in n-player zero sum games. Compared to an alternative method of finding Nash equilibria using coevolution proposed by Ficici [9] (see [17] for a real game implementation), our algorithm requires neither an archive nor the solution of linear programming problems; the end result of the algorithm is just one agent, rather than a collection of agents; we also think there are certain advantages compared to non-evolutionary methods, mostly related to imperfect recall (i.e., when the agent forgets parts of the historical events that took place in a game). ...
Conference Paper
Full-text available
Coevolutionary algorithms are plagued with a set of problems related to intransitivity that make it questionable what the end product of a coevolutionary run can achieve. With the introduction of solution concepts into coevolution, part of the issue was alleviated, however efficiently representing and achieving game theoretic solution concepts is still not a trivial task. In this paper we propose a coevolutionary algorithm that approximates behavioural strategy Nash equilibria in n-player zero sum games, by exploiting the min-max solution concept. In order to support our case we provide a set of experiments in both games of known and unknown equilibria. In the case of known equilibria, we can confirm our algorithm converges to the known solution, while in the case of unknown equilibria we can see a steady progress towards Nash.
Article
This paper describes a methodology for quickly learning to play games at a strong level. The methodology consists of a novel combination of three techniques, and a variety of experiments on the game of Othello demonstrates their usefulness. First, structures or topologies in neural network connectivity patterns are used to decrease the number of learning parameters and to deal more effectively with the structural credit assignment problem, which is to change individual network weights based on the obtained feedback. Furthermore, the structured neural networks are trained with the novel neural-fitted temporal difference (TD) learning algorithm to create a system that can exploit most of the training experiences and enhance learning speed and performance. Finally, we use the neural-fitted TD-leaf algorithm to learn more effectively when look-ahead search is performed by the game-playing program. Our extensive experimental study clearly indicates that the proposed method outperforms linear networks and fully connected neural networks or evaluation functions evolved with evolutionary algorithms.
Conference Paper
Cycling has been an obstacle to coevolution of machine-learning agents. Monotonic algorithms seek continual improvement with respect to a solution concept; seeking an agent or set of agents that approaches the true solution without cycling. Algorithms that guarantee monotonicity generally require unlimited storage. One such algorithm is the Nash Memory, which uses the Nash Equilibrium as the solution concept. The requirement for unbounded storage is an obstacle to the use of this algorithm in large applications. This paper demonstrates the performance of the Nash Memory algorithm with fixed storage in coevolving a population of moderately large agents (with knowledge represented as n-tuple networks) learning a function with a large state space (an evaluation function for the game of Othello). The success of the algorithm results from the diversity of the agents produced, and the corresponding need for improved global performance in order for agents to survive and reproduce. The algorithm can be expected to converge to a region of highest performance within the capability of the search operators.
Article
Full-text available
Following Tesauro's work on TD-Gammon, we used a 4,000 parameter feedforward neural network to develop a competitive backgammon evaluation function. Play proceeds by a roll of the dice, application of the network to all legal moves, and selection of the position with the highest evaluation. However, no backpropagation, reinforcement or temporal difference learning methods were employed. Instead we apply simple hillclimbing in a relative fitness environment. We start with an initial champion of all zero weights and proceed simply by playing the current champion network against a slightly mutated challenger and changing weights if the challenger wins. Surprisingly, this worked rather well. We investigate how the peculiar dynamics of this domain enabled a previously discarded weak method to succeed, by preventing suboptimal equilibria in a “meta-game” of self-learning.
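The abstract above gives an unusually explicit description of the learning procedure, so a small sketch is easy to give: a champion weight vector plays a slightly mutated challenger and is replaced (here, outright; the original work moved the champion only partway toward the challenger) when the challenger wins. `play_match` is a placeholder for the backgammon match outcome.

```python
import numpy as np

def hillclimb_selfplay(n_weights, play_match, generations=10_000, noise=0.05, rng=None):
    """Champion-vs-challenger hill climbing in a relative fitness environment.
    play_match(challenger, champion) is assumed to return True when the challenger
    wins the match (e.g. the majority of a small set of games)."""
    rng = np.random.default_rng() if rng is None else rng
    champion = np.zeros(n_weights)          # start from the all-zero champion
    for _ in range(generations):
        challenger = champion + noise * rng.standard_normal(n_weights)
        if play_match(challenger, champion):
            champion = challenger           # simplified: adopt the winner outright
    return champion
```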
Article
Full-text available
We review the current state of the art of methods for numerical computation of Nash equilibria for finite n-person games. Classical path following methods, such as the Lemke-Howson algorithm for two person games, and Scarf-type fixed point algorithms for n-person games provide globally convergent methods for finding a sample equilibrium. For large problems, methods which are not globally convergent, such as sequential linear complementarity methods, may be preferred on the grounds of speed. None of these methods are capable of characterizing the entire set of Nash equilibria. More computationally intensive methods, which derive from the theory of semi-algebraic sets, are required for finding all equilibria. These methods can also be applied to compute various equilibrium refinements.
Conference Paper
Full-text available
Genetic algorithms are computational models of the evolution of good solutions to problems, based on the selective reproduction of the best variants and the constant addition of random variability to the population of variants. In biological evolution, variants are inherited genotypes that are transmitted from parents to offspring. In cultural evolution, behavioral variants are transmitted from one individual to another because one individual (the learner) imitates another individual (the teacher). If the two individuals belong to successive generations, reproduction of teachers is selective, and random noise is added to the cultural transmission of behaviors from teachers to learners, good solutions to problems can evolve by cultural rather than biological evolution. We describe a model of imitation learning (inspired by Hutchins and Hazelhurst, 1995) according to which the learner learns via backpropagation using the output of the teacher in response to some shared input as its teaching input. Cultural transmission of behaviors from one generation to the next via imitation learning leads to the progressive deterioration of performance across generations. However, if only the best individuals of each generation function as teachers and, furthermore, the teaching input provided by the teacher is modified by noise before it is used by the learner, not only do culturally transmitted behaviors not deteriorate, but initially nonexistent behavioral capacities can emerge evolutionarily via pure cultural transmission, as they can be shown to emerge via genetic transmission.
Article
Full-text available
This paper investigates the use of n-tuple systems as position value functions for the game of Othello. The architecture is described, and then evaluated for use with temporal difference learning. Performance is compared with previously developed weighted piece counters and multi-layer perceptrons. The n-tuple system is able to defeat the best performing of these after just five hundred games of self-play learning. The conclusion is that n-tuple networks learn faster and better than the other more conventional approaches.
Chapter
Full-text available
All natural cognitive systems, and, in particular, our own, gradually forget previously learned information. Plausible models of human cognition should therefore exhibit similar patterns of gradual forgetting of old information as new information is acquired. Only rarely does new learning in natural cognitive systems completely disrupt or erase previously learned information; that is, natural cognitive systems do not, in general, forget ‘catastrophically’. Unfortunately, though, catastrophic forgetting does occur under certain circumstances in distributed connectionist networks. The very features that give these networks their remarkable abilities to generalize, to function in the presence of degraded input, and so on, are found to be the root cause of catastrophic forgetting. The challenge in this field is to discover how to keep the advantages of distributed connectionist networks while avoiding the problem of catastrophic forgetting. In this article the causes, consequences and numerous solutions to the problem of catastrophic forgetting in neural networks are examined. The review will consider how the brain might have overcome this problem and will also explore the consequences of this solution for distributed connectionist networks.
Article
Full-text available
This article introduces a class of incremental learning procedures specialized for prediction, that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.
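The core mechanism described above (credit assigned from the difference between successive predictions) can be stated in a few lines. The sketch below is a generic tabular TD(0) prediction update, not code from the cited article; the transition format is an assumption.

```python
def td0_prediction(values, transitions, alpha=0.1, gamma=1.0):
    """Tabular TD(0): move each state's value toward the reward plus the discounted
    value of the successor state. `values` is a dict {state: estimate};
    `transitions` is an iterable of (state, reward, next_state, done) tuples."""
    for state, reward, next_state, done in transitions:
        target = reward if done else reward + gamma * values.get(next_state, 0.0)
        values[state] = values.get(state, 0.0) + alpha * (target - values.get(state, 0.0))
    return values
```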
Conference Paper
Full-text available
We review and investigate the current status of intransitivity as a potential obstacle in coevolution. Pareto-Coevolution avoids intransitivity by translating any standard superiority relation into a transitive Pareto-dominance relation. Even for transitive problems though, cycling is possible. Recently however, algorithms that provide monotonic progress for Pareto-Coevolution have become available. The use of such algorithms avoids cycling, whether caused by intransitivity or not. We investigate this in experiments with two intransitive test problems, and find that the IPCA and LAPCA archive methods establish monotonic progress on both test problems, thereby substantially outperforming the same method without an archive.
Conference Paper
Full-text available
One problem associated with coevolutionary algorithms is that of forgetting, where one or more previously acquired traits are lost only to be needed later. We introduce a new coevolutionary memory mechanism to help prevent forgetting that is built upon game-theoretic principles, specifically Nash equilibrium. This “Nash memory” mechanism has the following properties: 1) It accumulates a collection of salient traits discovered by search, and represents this collection as a mixed strategy. 2) This mixed strategy monotonically approaches the quality of a Nash equilibrium strategy as search progresses, thus acting as a “ratchet” mechanism. 3) The memory naturally embodies the result (solution) obtained by the coevolutionary process. 4) The memory appropriately handles intransitive cycles (subject to resource limitations). We demonstrate our Nash memory using Watson and Pollack’s intransitive numbers game, and compare its performance to the conventional “Hall of Fame” memory and the more recently proposed Dominance Tournament.
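The Nash Memory described above maintains its collection of strategies as a mixed strategy, which for a finite zero-sum payoff matrix can be computed by linear programming. The sketch below shows the standard maximin LP (via scipy); it is included only to illustrate the kind of computation involved and is not taken from the cited paper.

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_mixed_strategy(payoff):
    """Maximin mixed strategy for the row player of a finite zero-sum game.
    `payoff[i][j]` is the row player's payoff when playing row strategy i
    against column strategy j. Returns (probabilities, game value)."""
    payoff = np.asarray(payoff, dtype=float)
    n_rows, n_cols = payoff.shape
    # Variables: x_1..x_n (probabilities) and v (game value, last variable).
    c = np.zeros(n_rows + 1)
    c[-1] = -1.0                                     # maximize v  <=>  minimize -v
    # For every opponent column j:  v - sum_i payoff[i, j] * x_i <= 0
    A_ub = np.hstack([-payoff.T, np.ones((n_cols, 1))])
    b_ub = np.zeros(n_cols)
    A_eq = np.hstack([np.ones((1, n_rows)), np.zeros((1, 1))])   # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n_rows + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n_rows], res.x[-1]
```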
Conference Paper
Full-text available
This paper compares the use of temporal difference learning (TDL) versus co-evolutionary learning (CEL) for acquiring position evaluation functions for the game of Othello. The paper provides important insights into the strengths and weaknesses of each approach. The main findings are that for Othello, TDL learns much faster than CEL, but that properly tuned CEL can learn better playing strategies. For CEL, it is essential to use parent-child weighted averaging in order to achieve good performance. Using this method a high quality weighted piece counter was evolved, and was shown to significantly outperform a set of standard heuristic weights
Conference Paper
Full-text available
Coevolution can be used to adaptively choose the tests used for evaluating candidate solutions. A long-standing question is how this dynamic setup may be organized to yield reliable search methods. Reliability can only be considered in connection with a particular solution concept specifying what constitutes a solution. Recently, monotonic coevolution algorithms have been proposed for several solution concepts. Here, we introduce a new algorithm that guarantees monotonicity for the solution concept of maximizing the expected utility of a candidate solution. The method, called MaxSolve, is compared to the IPCA algorithm and found to perform more efficiently for a range of parameter values on an abstract test problem.
Article
Full-text available
All natural cognitive systems, and, in particular, our own, gradually forget previously learned information. Plausible models of human cognition should therefore exhibit similar patterns of gradual forgetting of old information as new information is acquired. Only rarely does new learning in natural cognitive systems completely disrupt or erase previously learned information; that is, natural cognitive systems do not, in general, forget 'catastrophically'. Unfortunately, though, catastrophic forgetting does occur under certain circumstances in distributed connectionist networks. The very features that give these networks their remarkable abilities to generalize, to function in the presence of degraded input, and so on, are found to be the root cause of catastrophic forgetting. The challenge in this field is to discover how to keep the advantages of distributed connectionist networks while avoiding the problem of catastrophic forgetting. In this article the causes, consequences and numerous solutions to the problem of catastrophic forgetting in neural networks are examined. The review will consider how the brain might have overcome this problem and will also explore the consequences of this solution for distributed connectionist networks.
Article
Full-text available
Similarities between bootstrap aggregation (bagging) and N-tuple sampling are explored to propose a retina-free data-driven version of the N-tuple network, whose close analogies to aggregated regression trees, such as classification and regression trees (CART), lead to further architectural enhancements. Performance of the proposed algorithms is compared with the traditional versions of the N-tuple and CART networks on a number of regression problems. The architecture significantly outperforms conventional N-tuple networks while leading to more compact solutions and avoiding certain implementational pitfalls of the latter.
Conference Paper
Full-text available
This paper presents an artificial neural network with shared weights, trained to play the game of Othello by self-play with temporal difference learning (TDL). The network performs as well as the champion of the CEC 2006 Othello Evaluation Function Competition. The TDL-trained network contains only 67 unique weights compared to 2113 for the champion
Article
Full-text available
A study was conducted to find out how game-playing strategies for Othello (also known as reversi) can be learned without expert knowledge. The approach used the coevolution of a fixed-architecture neural-network-based evaluation function combined with a standard minimax search algorithm. Comparisons between evolving neural networks and computer players that used deterministic strategies allowed evolution to be observed in real-time. Neural networks evolved to outperform the computer players playing at higher ply-depths, despite being handicapped by playing black and using minimax at ply-depth of two. In addition, the playing ability of the population progressed from novice, to intermediate, and then to master's level. Individual neural networks discovered various game-playing strategies, starting with positional and later mobility. These results show that neural networks can be evolved as evaluation functions, despite the general difficulties associated with this approach. Success in this case was due to a simple spatial preprocessing layer in the neural network that captured spatial information, self-adaptation of every weight and bias of the neural network, and a selection method that allowed a diverse population of neural networks to be carried forward from one generation to the next.
Article
Full-text available
Coevolution offers adaptive methods for the selection of tests used to evaluate individuals, but the resulting evaluation can be unstable. Recently, general archive-based coevolution methods have become available for which monotonic progress can be guaranteed. The size of these archives may grow indefinitely however, thus limiting their application potential. Here, we investigate how the size of an archive for ParetoCoevolution may be limited while maintaining reliability. The LAyered Pareto-Coevolution Archive (LAPCA) is presented, and investigated in experiments. LAPCA features a tunable degree of reliability, and is found to provide reliable progress in a difficult test problem while maintaining approximately constant archive sizes.
Conference Paper
Full-text available
A general model for the coevolution of cooperating species is presented. This model is instantiated and tested in the domain of function optimization, and compared with a traditional GA-based function optimizer. The results are encouraging in two respects. They suggest ways in which the performance of GA and other EA-based optimizers can be improved, and they suggest a new approach to evolving complex structures such as neural networks and rule sets. 1 Introduction Genetic algorithms (GAs), originally conceived by Holland [10], represent a fairly abstract model of Darwinian evolution and biological genetics. They evolve a population of competing individuals using fitness-biased selection, random mating, and a gene-level representation of individuals together with simple genetic operators (typically, crossover and mutation) for modeling inheritance of traits. These GAs have been successfully applied to a wide variety of problems including multimodal function optimization, machine learn...
Article
Full-text available
Following Tesauro's work on TD-Gammon, we used a 4000-parameter feed-forward neural network to develop a competitive backgammon evaluation function. Play proceeds by a roll of the dice, application of the network to all legal moves, and choosing the move with the highest evaluation. However, no back-propagation, reinforcement or temporal difference learning methods were employed. Instead we apply simple hill-climbing in a relative fitness environment. We start with an initial champion of all zero weights and proceed simply by playing the current champion network against a slightly mutated challenger and changing weights if the challenger wins. Surprisingly, this worked rather well. We investigate how the peculiar dynamics of this domain enabled a previously discarded weak method to succeed, by preventing suboptimal equilibria in a "meta-game" of self-learning. Keywords: coevolution, backgammon, reinforcement, temporal difference learning, self-learning.
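A short sketch of the champion-versus-mutated-challenger hill climbing described above, assuming a generic zero-sum play(a, b) callable that returns +1 when a beats b; all names, rates, and the toy usage are placeholders rather than the cited implementation.

# Hill climbing in a relative fitness environment: replace the champion
# whenever a mutated challenger beats it (some variants blend instead).
import numpy as np

def hill_climb(play, n_weights=4000, sigma=0.05, generations=10000, seed=0):
    rng = np.random.default_rng(seed)
    champion = np.zeros(n_weights)            # start from all-zero weights
    for _ in range(generations):
        challenger = champion + rng.normal(scale=sigma, size=n_weights)
        if play(challenger, champion) > 0:    # challenger wins the match
            champion = challenger
    return champion

# Toy usage: "winning" means being closer to a hidden target vector.
target = np.ones(10)
best = hill_climb(lambda a, b: 1 if np.linalg.norm(a - target) < np.linalg.norm(b - target) else -1,
                  n_weights=10, generations=200)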
Article
Full-text available
The replicator equation used in evolutionary game theory (EGT) assumes that strategies reproduce in direct proportion to their payoffs; this is akin to the use of fitness-proportionate selection in an evolutionary algorithm (EA). In this paper, we investigate how various other selection methods commonly used in EAs can affect the discrete-time dynamics of EGT. In particular, we show that the existence of evolutionary stable strategies (ESS) is sensitive to the selection method used. Rather than maintain the dynamics and equilibria of EGT, the selection methods we test impose a fixed-point dynamic virtually unrelated to the payoffs of the game matrix, give limit cycles, or induce chaos. These results are significant to the field of evolutionary computation because EGT can be understood as a coevolutionary algorithm operating under ideal conditions: an infinite population, noiseless payoffs, and complete knowledge of the phenotype space. Thus, certain selection methods, which may operate effectively in simple evolution, are pathological in an ideal-world coevolutionary algorithm, and therefore dubious under real-world conditions.
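A worked illustration of the discrete-time replicator step discussed above, under the usual assumption of a positive payoff matrix A: each strategy frequency is rescaled by its payoff relative to the population average, x_i' = x_i (A x)_i / (x^T A x). The payoff values below are invented for the example.

# Discrete-time replicator dynamics (fitness-proportionate update).
import numpy as np

def replicator_step(x, A):
    payoffs = A @ x                 # expected payoff of each pure strategy
    mean = x @ payoffs              # population-average payoff
    return x * payoffs / mean

# Hawk-Dove style payoff matrix and an initially uniform mixed population.
A = np.array([[1.0, 4.0],
              [2.0, 3.0]])
x = np.array([0.5, 0.5])
for _ in range(5):
    x = replicator_step(x, A)
print(x)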
Article
Full-text available
Co-evolution can give rise to the "Red Queen effect", where interacting populations alter each other's fitness landscapes. The Red Queen effect significantly complicates any measurement of co-evolutionary progress, introducing fitness ambiguities where improvements in performance of co-evolved individuals can appear as a decline or stasis in the usual measures of evolutionary progress. Unfortunately, no appropriate measures of fitness given the Red Queen effect have been developed in artificial life, theoretical biology, population dynamics, or evolutionary genetics. We propose a set of appropriate performance measures based on both genetic and behavioral data, and illustrate their use in a simulation of co-evolution between genetically specified continuous-time noisy recurrent neural networks which generate pursuit and evasion behaviors in autonomous agents. 1 Introduction Some biologists have suggested that the 'Red Queen effect' arising from coevolutionary arms races has been a p...
Article
Using a coordinated group of simple solvers to tackle a complex problem is not an entirely new idea. Its root could be traced back hundreds of years ago when ancient Chinese suggested a team approach to problem solving. For a long time, engineers have used the divide-and-conquer strategy to decompose a complex problem into simpler sub-problems and then solve them by a group of solvers. However, knowing the best way to divide a complex problem into simpler ones relies heavily on the available domain knowledge. It is often a manual process by an experienced engineer. There have been few automatic divide-and-conquer methods reported in the literature. Fortunately, evolutionary computation provides some of the interesting avenues to automatic divide-and-conquer methods [15]. An in-depth study of such methods reveals that there is a deep underlying connection between evolutionary computation and ANN ensembles. Ideas in one area can be usefully transferred into another in producing effective algorithms. For example, using speciation to create and maintain diversity [15] had inspired the development of negative correlation learning for ANN ensembles [33], [34] and an in-depth study of diversity in ensembles [12], [51]. This paper will review some of the recent work in evolutionary approaches to designing ANN ensembles.
Article
This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.
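A minimal TD(0) prediction update with linear function approximation, illustrating the idea of assigning credit by the difference between temporally successive predictions; the step size, discount factor, and variable names are assumptions, not the article's notation.

# TD(0) update for a linear value estimate V(s) = w . phi(s).
import numpy as np

def td0_update(w, phi_t, phi_next, reward, alpha=0.01, gamma=1.0, terminal=False):
    v_t = w @ phi_t
    v_next = 0.0 if terminal else w @ phi_next
    delta = reward + gamma * v_next - v_t     # temporal-difference error
    return w + alpha * delta * phi_t          # move the earlier prediction toward the later one

# One update on random feature vectors.
rng = np.random.default_rng(0)
w = np.zeros(8)
w = td0_update(w, rng.normal(size=8), rng.normal(size=8), reward=0.0)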
Article
Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, ***, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.
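A hedged usage sketch of the construction described above (bagged trees with a random subset of features considered at each split), using scikit-learn's off-the-shelf implementation rather than the original code; the data set is synthetic.

# Random forest regression with out-of-bag error monitoring and
# internal estimates of variable importance.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(n_estimators=100, max_features="sqrt", oob_score=True)
forest.fit(X, y)
print(forest.oob_score_, forest.feature_importances_[:4])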
Article
In this study, we use temporal difference learning (TDL) to investigate the ability of 20 different artificial neural network (ANN) architectures to learn Othello game board evaluation functions. The ANN evaluation functions are applied to create a strong Othello player using only 1-ply search. In addition to comparing many of the ANN architectures seen in the literature, we introduce several new architectures that consider the game board symmetry. Both embedding the board symmetry into the network architecture through weight sharing and removing it outright via symmetry removal are explored. Experiments varying the number of inputs per game board square from one to three, the number of hidden nodes, and the number of hidden layers are also performed. We found it advantageous to consider game board symmetry through weight sharing, and that an input encoding of three inputs per square outperformed the one-input-per-square encoding commonly seen in the literature. Furthermore, architectures with only one hidden layer were strongly outperformed by architectures with multiple hidden layers. A standard weighted-square board heuristic evaluation function from the literature was used to evaluate the quality of the trained ANN Othello players. One of the ANN architectures introduced in this study, an ANN implementing weight sharing and consisting of three hidden layers, using only a 1-ply search, outperformed a weighted-square test heuristic player using a 6-ply minimax search.
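One simple way to make an evaluation respect Othello's eight board symmetries is to average a raw evaluator over all rotations and reflections; the sketch below illustrates the symmetry idea only and is not one of the weight-sharing architectures examined in the study.

# Symmetric evaluation by averaging over the 8 rotations/reflections of an 8x8 board.
import numpy as np

def symmetries(board):
    """Yield the 8 rotations/reflections of an 8x8 board."""
    b = board
    for _ in range(4):
        yield b
        yield np.fliplr(b)
        b = np.rot90(b)

def symmetric_eval(board, raw_eval):
    return np.mean([raw_eval(b) for b in symmetries(board)])

# Usage with a toy evaluator on the starting position.
board = np.zeros((8, 8))
board[3, 3] = board[4, 4] = 1
board[3, 4] = board[4, 3] = -1
print(symmetric_eval(board, lambda b: float(b[0, 0] + b.sum())))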
Chapter
Many efforts have been made to discriminate, categorize, and quantitate patterns, and to reduce them into a usable machine language. The results have ordinarily been methods or devices with a high degree of specificity. For example, some devices require a special type font; others can read only one type font; still others require magnetic ink.
Article
In this article we describe reinforcement learning, a machine learning technique for solving sequential decision problems. We describe how reinforcement learning can be combined with function approximation to get approximate solutions for problems with very large state spaces. One such problem is the board game Othello, with a state space size of approximately 10^28. We apply reinforcement learning to this problem via a computer program that learns a strategy (or policy) for Othello by playing against itself. The reinforcement learning policy is evaluated against two standard strategies taken from the literature with favorable results. We contrast reinforcement learning with standard methods for solving sequential decision problems and give some examples of applications of reinforcement learning in operations research and management science from the literature.
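A skeleton of a self-play loop in the spirit of the approach described above; the game interface (initial_state, legal_moves, apply, game_over, outcome), the greedy 1-ply move choice, and the update callback are all assumptions rather than the article's code.

# One self-play episode: pick the afterstate the current evaluator prefers,
# then hand successive states to a TD-style update (see the earlier sketch).
def self_play_episode(game, evaluate, update):
    state = game.initial_state()
    while not game.game_over(state):
        moves = game.legal_moves(state)
        nxt = max((game.apply(state, m) for m in moves), key=evaluate)  # greedy 1-ply
        reward = game.outcome(nxt) if game.game_over(nxt) else 0.0
        update(state, nxt, reward)
        state = nxt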
Article
Operations research and management science are often confronted with sequential decision making problems with large state spaces. Standard methods that are used for solving such complex problems are associated with some difficulties. As we discuss in this article, these methods are plagued by the so-called curse of dimensionality and the curse of modelling. In this article, we discuss reinforcement learning, a machine learning technique for solving sequential decision making problems with large state spaces. We describe how reinforcement learning can be combined with a function approximation method to avoid both the curse of dimensionality and the curse of modelling. To illustrate the usefulness of this approach, we apply it to a problem with a huge state space—learning to play the game of Othello. We describe experiments in which reinforcement learning agents learn to play the game of Othello without the use of any knowledge provided by human experts. It turns out that the reinforcement learning agents learn to play the game of Othello better than players that use basic strategies.
Article
The assumption that acquired characteristics are not inherited is often taken to imply that the adaptations that an organism learns during its lifetime cannot guide the course of evolution. This inference is incorrect (Baldwin, 1896). Learning alters the shape of the search space in which evolution operates and thereby provides good evolutionary paths towards sets of co-adapted alleles. We demonstrate that this effect allows learning organisms to evolve much faster than their nonlearning equivalents, even though the characteristics acquired by the phenotype are not communicated to the genotype.
Article
Ever since the days of Shannon's proposal for a chess-playing algorithm [12] and Samuel's checkers-learning program [10] the domain of complex board games such as Go, chess, checkers, Othello, and backgammon has been widely regarded as an ideal testing ground for exploring a variety of concepts and approaches in artificial intelligence and machine learning. Such board games offer the challenge of tremendous complexity and sophistication required to play at expert level. At the same time, the problem inputs and performance measures are clear-cut and well defined, and the game environment is readily automated in that it is easy to simulate the board, the rules of legal play, and the rules regarding when the game is over and determining the outcome.
Chapter
We review the current state of the art of methods for numerical computation of Nash equilibria for finite n-person games. Classical path-following methods, such as the Lemke-Howson algorithm for two-person games, and Scarf-type fixed-point algorithms for n-person games, provide globally convergent methods for finding a sample equilibrium. For large problems, methods which are not globally convergent, such as sequential linear complementarity methods, may be preferred on the grounds of speed. None ...
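For the special case of two-player zero-sum matrix games, a sample equilibrium can be computed by linear programming; the sketch below uses scipy and is offered as an illustration alongside the methods surveyed above, not as one of them (general-sum games require Lemke-Howson-style or fixed-point methods).

# Maximin mixed strategy of the row player via the standard LP reduction.
import numpy as np
from scipy.optimize import linprog

def maximin_strategy(A):
    """Row player's maximin mixed strategy for payoff matrix A (row maximizes)."""
    A = np.asarray(A, dtype=float)
    shift = A.min() - 1.0            # shift so all payoffs are positive
    B = A - shift
    m, n = B.shape
    # minimize sum(x) subject to B^T x >= 1, x >= 0; strategy = x / sum(x)
    res = linprog(c=np.ones(m), A_ub=-B.T, b_ub=-np.ones(n), bounds=[(0, None)] * m)
    x = res.x
    return x / x.sum(), 1.0 / x.sum() + shift   # (mixed strategy, game value)

strategy, value = maximin_strategy([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])  # rock-paper-scissors
print(strategy, value)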
Article
This paper presents a new evolutionary system, i.e., EPNet, for evolving artificial neural networks (ANNs). The evolutionary algorithm used in EPNet is based on Fogel's evolutionary programming (EP). Unlike most previous studies on evolving ANNs, this paper puts its emphasis on evolving ANNs' behaviors. Five mutation operators proposed in EPNet reflect such an emphasis on evolving behaviors. Close behavioral links between parents and their offspring are maintained by various mutations, such as partial training and node splitting. EPNet evolves ANNs' architectures and connection weights (including biases) simultaneously in order to reduce the noise in fitness evaluation. The parsimony of evolved ANNs is encouraged by preferring node/connection deletion to addition. EPNet has been tested on a number of benchmark problems in machine learning and ANNs, such as the parity problem, the medical diagnosis problems, the Australian credit card assessment problem, and the Mackey-Glass time series prediction problem. The experimental results show that EPNet can produce very compact ANNs with good generalization ability in comparison with other algorithms.
Conference Paper
Although both reinforcement learning and evolutionary algorithms show good results in board evaluation optimization, hybrids of the two approaches are rarely addressed in the literature. In this paper, the evolutionary algorithm is boosted with resources from reinforcement learning: 1) initialization of the population with a solution optimized by temporal difference learning, and 2) exploitation of domain knowledge extracted from reinforcement learning. Experiments on Othello game strategies show that the proposed methods can effectively search the solution space and improve performance.
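A minimal sketch of the population-seeding idea described above: initialize an evolutionary population around a weight vector already trained by temporal difference learning instead of from scratch; population size, mutation scale, and names are assumptions.

# Seed an evolutionary population with perturbed copies of a TD-learned solution.
import numpy as np

def seed_population(td_weights, pop_size=50, sigma=0.05, seed=0):
    td_weights = np.asarray(td_weights, dtype=float)
    rng = np.random.default_rng(seed)
    pop = [td_weights.copy()]                  # keep the TD solution itself
    while len(pop) < pop_size:
        pop.append(td_weights + rng.normal(scale=sigma, size=td_weights.size))
    return pop

population = seed_population(np.zeros(64))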
Conference Paper
This paper reviews some of the most popular evolutionary multiobjective optimization techniques currently reported in the literature, indicating some of their main applications, their advantages, disadvantages, and degree of applicability. Finally, some of the most promising areas of future research are briefly discussed
Conference Paper
Evolutionary algorithms (EAs) are a class of stochastic search algorithms which are applicable to a wide range of problems in learning and optimisation. They have been applied to numerous problems in combinatorial optimisation, function optimisation, artificial neural network learning, fuzzy logic system learning, etc. This paper first introduces EAs and their basic operators. Then, an overview of three major branches of EAs, i.e. genetic algorithms (GAs), evolutionary programming (EP) and evolution strategies (ESs), is given. Different search operators and selection mechanisms are described. The emphasis of the discussion is on global optimisation by EAs. The paper also presents three simple models for parallel EAs. Finally, some open issues and future research directions in evolutionary optimisation and evolutionary computation in general are discussed
Article
Using a coordinated group of simple solvers to tackle a complex problem is not an entirely new idea. Its root could be traced back hundreds of years ago when ancient Chinese suggested a team approach to problem solving. For a long time, engineers have used the divide-and-conquer strategy to decompose a complex problem into simpler sub-problems and then solve them by a group of solvers. However, knowing the best way to divide a complex problem into simpler ones relies heavily on the available domain knowledge. It is often a manual process by an experienced engineer. There have been few automatic divide-and-conquer methods reported in the literature. Fortunately, evolutionary computation provides some of the interesting avenues to automatic divide-and-conquer methods. An in-depth study of such methods reveals that there is a deep underlying connection between evolutionary computation and ANN ensembles. Ideas in one area can be usefully transferred into another in producing effective algorithms. For example, using speciation to create and maintain diversity had inspired the development of negative correlation learning for ANN ensembles, and an in-depth study of diversity in ensembles. This paper will review some of the recent work in evolutionary approaches to designing ANN ensembles.
Article
In this paper, we propose a change from a perfect paradigm to an imperfect paradigm in evolving intelligent systems. An imperfect evolutionary system (IES) is introduced as a new approach in an attempt to solve the problem of an intelligent system adapting to new challenges from its imperfect environment, with an emphasis on the incompleteness and continuity of intelligence. We define an IES as a system where intelligent individuals optimize their own utility, with the available resources, while adapting themselves to the new challenges from an evolving and imperfect environment. An individual and social learning paradigm (ISP) is presented as a general framework for developing IESs. A practical implementation of the ISP framework, an imperfect evolutionary market, is described. Through experimentation, we demonstrate the absorption of new information from an imperfect environment by artificial stock traders and the dissemination of new knowledge within an imperfect evolutionary market. Parameter sensitivity of the ISP framework is also studied by employing different levels of individual and social learning
Article
An evolutionary algorithm has taught itself how to play the game of checkers without using features that would normally require human expertise. Using only the raw positions of pieces on the board and the piece differential, the evolutionary program optimized artificial neural networks to evaluate alternative positions in the game. Over the course of several hundred generations, the program taught itself to play at a level that is competitive with human experts (one level below human masters). This was verified by playing the best evolved neural network against 165 human players on an Internet gaming zone. The neural network's performance earned a rating that was better than 99.61% of all registered players at the Website. Control experiments between the best evolved neural network and a program that relies on material advantage indicate the superiority of the neural network both at equal levels of look ahead and CPU time. The results suggest that the principles of Darwinian evolution may be usefully applied to solving problems that have not yet been solved by human expertise.
Article
This paper presents a critical review of the most important evolutionary-based multiobjective optimization techniques developed over the years, emphasizing the importance of analyzing their Operations Research roots as a way to motivate the development of new approaches that exploit the search capabilities of evolutionary algorithms. Each technique is briefly described mentioning its advantages and disadvantages, their degree of applicability and some of their known applications. Finally, the future trends in this discipline and some of the open areas of research are also addressed. Keywords: multiobjective optimization, multicriteria optimization, vector optimization, genetic algorithms, evolutionary algorithms, artificial intelligence. 1 Introduction Since the pioneer work of Rosenberg in the late 60s regarding the possibility of using genetic-based search to deal with multiple objectives, this new area of research (now called evolutionary multiobjective optimization) has grown c...
Article
[Extraction residue: front matter and table of contents of a dissertation on competitive coevolution. Recoverable headings: I Introduction (Adversarial Problems; Related Work: Related Problems and Methods, The Genetic Algorithm, Biological Motivation; Competitive Coevolution; Categories of Applications; Overview); II Theory of Coevolutionary Methods (Preliminaries).]
Article
This paper presents ideas concerning game-tree evaluation that recently improved the author's strong Othello program LOGISTELLO considerably.
Article
We analyze the performance of top-down algorithms for decision tree learning, such as those employed by the widely used C4.5 and CART software packages. Our main result is a proof that such algorithms are boosting algorithms. By this we mean that if the functions used to label the internal nodes of the decision tree can weakly approximate the unknown target function, then the top-down algorithms we study will amplify this weak advantage to build a tree achieving any desired level of accuracy. The bounds we obtain for this amplification show an interesting dependence on the splitting criterion function G used by the top-down algorithm. More precisely, if the functions used to label the internal nodes have error 1/2 − γ as approximations to the target function, then for the splitting criteria used by CART and C4.5, trees of size (1/ε)^O(1/(γ²ε²)) and (1/ε)^O(log(1/ε)/γ²) (respectively) suffice to drive the error below ε. Thus, small constant advantage over...
  • Edward P Manning
Edward P. Manning (M'07) received the B.S.E.E. degree from Rensselaer Polytechnic Institute, Troy, NY, in 1979 and the M.Eng. degree from Stevens Institute of Technology, Hoboken, NJ, in 1991. Currently, he is an Adjunct Instructor of Computer Science at Brookdale Community College, Lincroft, NJ, as well as a management consultant, following a career with Bell Laboratories.