About
101 Publications · 85,688 Reads · 2,556 Citations
Introduction
My main research areas are games and automated planning. My recent research focus is on exploration methods in heuristic search, including Monte Carlo Tree Search and random walks. I am looking for a better understanding of these techniques, especially of how they interact with the traditional components of heuristic search: search and knowledge. I enjoy building high-performance systems with my students, such as the Fuego framework for games, and planners such as MacroFF, Arvand, and Jasper.
Current institution
Additional affiliations
September 2000 - present
Education
September 1989 - May 1995
September 1983 - May 1989
Publications (101)
Boolean Satisfiability (SAT) is a well-known NP-complete problem. Despite this theoretical hardness, SAT solvers based on Conflict Driven Clause Learning (CDCL) can solve large SAT instances from many important domains. CDCL learns clauses from conflicts, a technique that allows a solver to prune its search space. The selection heuristics in CDCL p...
In conflict-directed clause learning (CDCL) SAT solving, a state-of-the-art criterion to measure the importance of a learned clause is called literal block distance (LBD), which is the number of distinct decision levels in the clause. The lower the LBD score of a learned clause, the better its quality. The learned clauses with LBD score of 2, ca...
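The LBD measure itself is simple to state. A minimal sketch of how it could be computed, assuming a DIMACS-style clause representation and a variable-to-decision-level map (both names are hypothetical, not any particular solver's API):

```python
def literal_block_distance(clause, decision_level):
    """Number of distinct decision levels among a clause's literals.

    `clause` is an iterable of integer literals (DIMACS-style, e.g. -3 is the
    negation of variable 3); `decision_level` maps each variable to the level
    at which it was assigned. Both names are illustrative.
    """
    return len({decision_level[abs(lit)] for lit in clause})

# A learned clause whose literals span only two decision levels is a "glue" clause.
levels = {1: 0, 2: 3, 3: 3, 4: 7}
assert literal_block_distance([-1, 2, 3], levels) == 2   # levels {0, 3}
assert literal_block_distance([2, -3, 4], levels) == 2   # levels {3, 7} -> glue
```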
In this paper, we investigate Exploratory Conservative Policy Optimization (ECPO), a policy optimization strategy that improves exploration behavior while assuring monotonic progress in a principled objective. ECPO conducts maximum entropy exploration within a mirror descent framework, but updates policies using reversed KL projection. This formula...
Domain-specific knowledge plays a significant role in the success of many Monte Carlo Tree Search (MCTS) programs. The details of how knowledge affects MCTS are still not well understood. In this paper, we focus on identifying the effects of different types of knowledge on the behaviour of the Monte Carlo Tree Search algorithm, using the game of Go...
A state-of-the-art criterion to evaluate the importance of a given learned clause is called Literal Block Distance (LBD) score. It measures the number of distinct decision levels in a given learned clause. The lower the LBD score of a learned clause, the better its quality. The learned clauses with LBD score of 2, called glue clauses, are known...
AlphaGo Zero pioneered the concept of two-head neural networks in Monte Carlo Tree Search (MCTS), where the policy output is used for prior action probability and the state-value estimate is used for leaf node evaluation. We propose a three-head neural net architecture with policy, state- and action-value outputs, which could lead to more efficient...
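A rough sketch of the idea of a shared trunk feeding separate policy, state-value, and action-value heads. The layer sizes and the flat board encoding below are illustrative assumptions; the architectures in this line of work are convolutional.

```python
import torch
import torch.nn as nn

class ThreeHeadNet(nn.Module):
    """Minimal sketch of a policy / state-value / action-value network."""
    def __init__(self, board_size: int = 9, hidden: int = 128):
        super().__init__()
        n_cells = board_size * board_size
        self.trunk = nn.Sequential(
            nn.Linear(n_cells, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden, n_cells)        # prior move logits
        self.state_value_head = nn.Linear(hidden, 1)          # v(s) in [-1, 1]
        self.action_value_head = nn.Linear(hidden, n_cells)   # q(s, a) per move

    def forward(self, x):
        h = self.trunk(x)
        return (self.policy_head(h),
                torch.tanh(self.state_value_head(h)),
                torch.tanh(self.action_value_head(h)))

# Usage sketch: a batch of one flattened 9x9 board.
policy_logits, v, q = ThreeHeadNet()(torch.zeros(1, 81))
```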
In this abstract, we present our study of exploring the SAT search space via random-sampling, with the goal of improving Conflict Directed Clause Learning (CDCL) SAT solvers. Our proposed CDCL SAT solving algorithm expSAT uses a novel branching heuristic expVSIDS. It combines the standard VSIDS scores with heuristic scores derived from exploration....
This paper proposes and evaluates Memory-Augmented Monte Carlo Tree Search (M-MCTS), which provides a new approach to exploit generalization in online real-time search. The key idea of M-MCTS is to incorporate MCTS with a memory structure, where each entry contains information of a particular state. This memory is used to generate an approximate va...
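A toy illustration of the memory idea: keep per-state statistics and blend them with a node's own Monte Carlo estimate. Exact-key lookup and the blending schedule below are simplifying assumptions for illustration, not the paper's formulation, which uses approximate feature-based retrieval.

```python
from collections import defaultdict

class ValueMemory:
    """Toy memory of state-value statistics, in the spirit of memory-augmented MCTS."""
    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def update(self, key, outcome):
        # Record a simulation outcome for the state identified by `key`.
        self.total[key] += outcome
        self.count[key] += 1

    def approximate_value(self, key, node_value, node_visits, prior_weight=8):
        if self.count[key] == 0:
            return node_value
        mem_value = self.total[key] / self.count[key]
        # Trust the memory more when the node itself has few visits.
        w = prior_weight / (prior_weight + node_visits)
        return w * mem_value + (1 - w) * node_value
```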
In this paper, we present our study of exploring the SAT search space via random-sampling, with the goal of improving Conflict Directed Clause Learning (CDCL) SAT solvers. Our proposed CDCL SAT solving algorithm expSAT uses a novel branching heuristic expVSIDS. It combines the standard VSIDS scores with heuristic scores derived from exploration. Ex...
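A hedged sketch of how exploration statistics might be folded into a VSIDS-style branching score. The linear combination, the weight, and all names below are illustrative assumptions, not expSAT's actual formula.

```python
def exp_vsids_score(var, vsids_activity, exploration_score, weight=0.5):
    """Illustrative combination of VSIDS activity with an exploration bonus.

    `vsids_activity` and `exploration_score` map variables to floats; the
    weighted sum is an assumption made for this sketch.
    """
    return vsids_activity[var] + weight * exploration_score.get(var, 0.0)

def pick_branching_variable(unassigned, vsids_activity, exploration_score):
    # Branch on the unassigned variable with the highest combined score.
    return max(unassigned,
               key=lambda v: exp_vsids_score(v, vsids_activity, exploration_score))
```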
Using deep convolutional neural networks for move prediction has led to massive progress in computer Go. Like Go, Hex has a large branching factor that limits the success of shallow and selective search. We show that deep convolutional neural networks can be used to produce reliable move evaluation in the game of Hex. We begin by collecting self...
Games have simple, fixed rules as well as clear results such as win, draw, or loss. However, developing algorithms for solving games has been a difficult challenge in Artificial Intelligence, because of the combinatorial complexity that the algorithms must tackle.
This chapter presents an overview of successful approaches and results accomplished t...
Proof Number search (PNS) is an effective algorithm for computing theoretical values of games with non-uniform branching factors. Focused depth-first proof number search (FDFPN) with dynamic widening was proposed for Hex where the branching factor is nearly uniform. However, FDFPN is sensitive to its heuristic move ordering function. The recent advan...
We study the problem of identifying the best action among a set of possible options when the value of each action is given by a mapping from a number of noisy micro-observables in the so-called fixed confidence setting. Our main motivation is the application to the minimax game search, which has been a major topic of interest in artificial intellig...
Recently, the Factorization Bradley-Terry (FBT) model was introduced for fast move prediction in the game of Go. It has been shown that FBT outperforms the state-of-the-art fast move prediction system Latent Factor Ranking (LFR). In this paper, we investigate the problem of integrating feature knowledge learned by the FBT model in Monte Carlo Tree Search...
ArvandHerd is a satisficing parallel planner that has been entered in the 2011 International Planning Competition (IPC 2011). It uses a portfolio-based approach where the portfolio contains four configurations of the Arvand planner and one configuration of the LAMA planner. Each processor runs a single planner, and the execution is mostly independ...
The game of Amazons is a modern board game with simple rules and nice mathematical properties. It has a high computational complexity. In 2001, the starting position on a 5 × 5 board was proven to be a first player win. The enhanced Amazons solver presented here extends previous work in the following five ways: by building more powerful endgame dat...
In recent years the Monte Carlo tree search revolution has spread from computer Go to many areas, including computer Hex. MCTS-based Hex players now outperform traditional knowledge-based alpha-beta search players, and the reigning Computer Olympiad Hex gold medallist is the MCTS player MoHex. In this paper we show how to strengthen MoHex, and obse...
Monte-Carlo Tree Search methods have led to huge progress in computer Go. Still, program performance is uneven - most current Go programs are much stronger in some aspects of the game, such as local fighting and positional evaluation, than in other aspects. Well known weaknesses of many programs include (1) the handling of several simultaneous figh...
In Monte-Carlo Tree Search, simulations play a crucial role since they replace the evaluation function used in classical game-tree search and guide the development of the game tree. Despite their importance, not too much is known about the details of how they work. This paper starts a more in-depth study of simulations, using the game of Go, and in...
For decades, researchers have taught computers to play games in order to test their cognitive abilities against those of humans. In 1997, when an IBM computer called Deep Blue beat Garry Kasparov, the reigning world champion, at chess, many people assumed that computer scientists would eventually develop artificial intelligences that could triumph...
ArvandHerd is a sequential satisficing planner that uses a portfolio consisting of LAMA and Arvand. This planner won the multi-core track of the 2011 International Planning Competition. In this paper, we describe the various components of ArvandHerd, the updates made for the 2014 competition, and the modifications that allow ArvandHerd to compete...
Random walks are a relatively new component used in several state of the art satisficing planners. Empirical results have been mixed: while the approach clearly outperforms more systematic search methods such as weighted A* on many planning domains, it fails in many others. So far, the explanations for these empirical results have been somewhat ad...
Most of the satisficing planners which are based on heuristic search iteratively improve their solution quality through an anytime approach. Typically, the lowest-cost solution found so far is used to constrain the search. This avoids areas of the state space which cannot directly lead to lower cost solutions. However, in this paper we show that wh...
Random walks have become a popular component of recent planning systems. The increased exploration is a valuable addition to more exploitative search methods such as Greedy Best First Search (GBFS). A number of successful planners which incorporate random walks have been built. The work presented here aims to exploit the experience gained from buil...
Solving games is a challenging and attractive task in the domain of Artificial Intelligence. Despite enormous progress, solving increasingly difficult games or game positions continues to pose hard technical challenges. Over the last twenty years, algorithms based on the concept of proof and disproof numbers have become dominating techniques for ga...
ArvandHerd is a parallel planner that won the multi-core sequential satisficing track of the 2011 International Planning Competition (IPC 2011). It assigns processors to run different members of an algorithm portfolio which contains several configurations of each of two different planners: LAMA-2008 and Arvand. In this paper, we demonstrate that s...
Temporal-difference learning is one of the most successful and broadly applied solutions to the reinforcement learning problem; it has been used to achieve master-level play in chess, checkers and backgammon. The key idea is to update a value function from episodes of real experience, by bootstrapping from future value estimates, and using value fu...
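The bootstrapping idea can be illustrated with a tabular TD(0) update, the simplest member of this family. This is a generic textbook sketch, not the specific algorithm of the publication; all names are illustrative.

```python
def td0_update(V, episode, alpha=0.1, gamma=1.0):
    """One pass of tabular TD(0) over an episode.

    `episode` is a list of (state, reward, next_state) transitions and `V`
    a dict of value estimates; terminal next states should map to 0.
    The key idea: the target bootstraps from the current estimate of the
    next state's value instead of waiting for the final outcome.
    """
    for s, r, s_next in episode:
        target = r + gamma * V.get(s_next, 0.0)
        V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))
    return V
```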
Automated algorithm configurators have been shown to be very effective for finding good configurations of high performance algorithms for a broad range of computationally hard problems. As we show in this work, the standard protocol for using these configurators is not always effective. We propose a simple and computationally inexpensive modificati...
Random walks are a relatively new component used in several state of the art satisficing planners. Empirical results have been mixed: while the approach clearly outperforms more systematic search methods such as weighted A* on many planning domains, it fails in many others. So far, the explanations for these empirical results have been somewhat a...
The UCT (Upper Confidence Bounds applied to Trees) algorithm has allowed for significant improvements in a number of games, most notably the game of Go. Move groups is a modification that greatly reduces the branching factor at the cost of increased search depth and as such may be used to enhance the performance of UCT. From the results of the ex...
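For reference, the UCT selection rule the abstract builds on can be sketched as follows. Attribute names and the exploration constant are illustrative assumptions; move groups would add an intermediate layer of group nodes above this selection step.

```python
import math

def uct_select(children, exploration=1.4):
    """Choose the child maximizing the UCB1-style UCT value.

    Each child is assumed to carry `visits` and `total_reward` attributes;
    unvisited children are tried first.
    """
    parent_visits = sum(c.visits for c in children)

    def uct_value(c):
        if c.visits == 0:
            return float("inf")
        return (c.total_reward / c.visits
                + exploration * math.sqrt(math.log(parent_visits) / c.visits))

    return max(children, key=uct_value)
```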
In Go and Hex, we examine the effect of a blunder — here, a random move — at various stages of a game. For each fixed move number, we run a self-play tournament to determine the expected blunder cost at that point.
The ideas of local search and random walks have been used successfully in several recent satisficing planners. Random Walk-Driven Local Search (RW-LS) is a strong new addition to this family of planning algorithms. The method uses a greedy best-first search driven by a combination of random walks and direct node evaluation. In this way, RW-LS bal...
Much recent work in satisficing planning has aimed at striking a balance between coverage - solving as many problems as possible - and plan quality. Current planners achieve near perfect coverage on the latest IPC benchmarks. It is therefore natural to investigate their scaling behavior on more difficult instances. Among state of the art planners,...
The eight papers in this special issue cover Go, Lines of Action, Hex, single-player general game playing, parallelization in Go, and analyzing game records using Monte Carlo techniques.
Arvand is a stochastic planner that uses Monte Carlo random walks (MRW) planning to balance exploration and exploitation in heuristic search. Herein, we focus on the latest developments of Arvand submitted to IPC'11: smart restarts, the online parameter learning system, and the integration of Arvand and the postprocessing system Aras.
The Monte-Carlo tree search algorithm Upper Confidence bounds applied to Trees (UCT) has become extremely popular in computer games research. The Rapid Action Value Estimation (RAVE) heuristic is a strong estimator that often improves the performance of UCT-based algorithms. However, there are situations where RAVE misleads the search whereas pure...
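One common way to blend RAVE (all-moves-as-first) statistics with a node's Monte Carlo value is a visit-dependent schedule such as the one below. This particular schedule and constant are one published variant, shown only to make the trade-off concrete, not necessarily the formula used in this paper.

```python
import math

def rave_value(q_mc, n_mc, q_amaf, n_amaf, k=1000):
    """Blend the node's Monte Carlo value with its RAVE (AMAF) value.

    beta = sqrt(k / (3 * n_mc + k)) is one published weighting schedule;
    k is a tunable assumption. q_mc / n_mc are the node's own estimate and
    visit count, q_amaf / n_amaf the all-moves-as-first statistics
    (n_amaf is unused by this particular schedule).
    """
    beta = math.sqrt(k / (3 * n_mc + k))
    return beta * q_amaf + (1 - beta) * q_mc
```

With few visits, beta is close to 1 and the (potentially misleading) AMAF estimate dominates; as n_mc grows, the node's own value takes over.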
A ubiquitous feature of planning problems -- problems involving the automatic generation of action sequences for attaining a given goal -- is the need to economize limited resources such as fuel or money. While heuristic search, mostly based on standard algorithms such as A*, is currently the superior method for most varieties of planning, its abil...
Fuego is an open-source software framework for developing game engines for full-information two-player board games, with a focus on the game of Go. It was mainly developed by the Computer Go group of the University of Alberta. Fuego includes a Go engine with a playing strength that is competitive with the top programs in 9×9 Go, and respectab...
An important part of the creation of a housing subdivision is the design and layout of sewers underneath the road. This is a challenging cost optimization problem in a continuous three-dimensional space. In this paper, heuristic-search-based techniques are proposed for tackling this problem. The result is new algorithms that can quickly find near op...
Compared to optimal planners, satisficing planners can solve much harder problems but may produce overly costly and long plans. Plan quality for satisficing planners has become increasingly important. The most recent planning competition IPC-2008 used the cost of the best known plan divided by the cost of the generated plan as an evaluation metric....
With the recent success of Monte-Carlo tree search algorithms in Go and other games, and the increasing number of cores in standard CPUs, the efficient parallelization of the search has become an important issue. We present a new lock-free parallel algorithm for Monte-Carlo tree search which takes advantage of the memory model of the IA-32 and...
Monte-Carlo tree search, especially the UCT algorithm and its enhancements, have become extremely popular. Because of the importance of this family of algorithms, a deeper understanding of when and how the different enhancements work is desirable. To avoid the hard to analyze intricacies of tournament-level programs in complex games, this work...
Search methods based on Monte-Carlo simulation have recently led to breakthrough performance improvements in difficult game-playing domains such as Go and General Game Playing. Monte-Carlo Random Walk (MRW) planning applies Monte-Carlo ideas to deterministic classical planning. In the forward chaining planner ARVAND, Monte-Carlo random walks ar...
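The core random-walk step can be sketched as follows: run short random walks from the current state, evaluate their endpoints with the heuristic, and jump to the best one. This is a minimal illustration under assumed helper functions; the parameter values and names are not ARVAND's actual interface.

```python
import random

def monte_carlo_random_walks(state, heuristic, applicable_actions, apply_action,
                             num_walks=100, walk_length=10):
    """One MRW jump: run short random walks and move to the best endpoint."""
    best_state, best_h = state, heuristic(state)
    for _ in range(num_walks):
        s = state
        for _ in range(walk_length):
            actions = applicable_actions(s)
            if not actions:          # dead end: stop this walk early
                break
            s = apply_action(s, random.choice(actions))
        h = heuristic(s)
        if h < best_h:               # lower heuristic value = closer to the goal
            best_state, best_h = s, h
    return best_state
```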
Depth-first proof-number (df-pn) search is a powerful member of the family of algorithms based on proof and disproof numbers. While df-pn has succeeded in practice, its theoretical properties remain poorly understood. This paper resolves the question of completeness of df-pn: its ability to solve any finite boolean-valued game tree search problem i...
Previous safety-of-territory solvers for the game of Go have worked on whole regions surrounded by stones of one color. Their applicability is limited to small to medium-size regions. We describe a new technique that is able to prove that parts of large regions are safe. By using pairs of dividing points, even huge regions can be divided into small...
Local search in the game of Go is easier if local areas have well-defined boundaries. An artificial boundary consists of temporarily added stones that close off an area. This paper describes a new general framework for finding boundaries in a way such that existing local search methods can be used. Furthermore, by using a revised local UCT search m...
We present a reinforcement learning architecture, Dyna-2, that encompasses both sample-based learning and sample-based search, and that generalises across states during both learning and search. We apply Dyna-2 to high performance Computer Go. In this domain the most successful planning methods are based on sample-based search algorithms, such...
The game of checkers has roughly 500 billion billion possible positions (5 × 10²⁰). The task of solving the game, determining the final result in a game with no mistakes made by either player, is daunting. Since 1989, almost continuously, dozens of computers have been working on solving checkers, applying state-of-the-art artificial intelligence te...
We explore an application to the game of Go of a reinforcement learning approach based on a linear evaluation function and large numbers of binary features. This strategy has proved effective in game playing programs and other reinforcement learning applications. We apply this strategy to Go by creating over a million features based on templates fo...
Thomsen's λ search and Nagai's depth-first proof-number (DFPN) search are two powerful but very different AND/OR tree search algorithms. Lambda Depth-First Proof Number search (LDFPN) is a novel algorithm that combines ideas from both algorithms. λ search can dramatically reduce a search space by finding different levels of threat sequences. DFPN...
Research on macro-operators has a long history in planning and other search applications. There has been a revival of interest in this topic, leading to systems that successfully combine macro-operators with current state-of-the-art planning approaches based on heuristic search. However, research is still necessary to make macros become a st...
This paper presents SAFETY SOLVER 2.0, a safety-of-territory solver for the game of Go that can solve problems in areas with open boundaries. Previous work on assessing safety of territory has concentrated on regions that are completely surrounded by stones of one player. SAFETY SOLVER 2.0 can identify open boundary problems under real game con...
Seki is a situation of coexistence in the game of Go, where neither player can profitably capture the opponent’s stones. This paper presents a new method for deciding whether an enclosed area is or can become a seki. The method combines local search with global-level static analysis. Local search is used to identify possible seki, and reasoning on...
Probabilistic combinatorial games (PCG) are a model for Go-like games recently introduced by Ken Chen. They differ from normal combinatorial games since terminal positions in each subgame are evaluated by a probability distribution. The distribution expresses the uncertainty in the local evaluation. This paper focuses on the analysis and solution me...
There are two complementary approaches to playing sums of combinatorial games. They can be characterized as local analysis and global search. Algorithms from combinatorial game theory such as Hotstrat and Thermostrat [2] exploit summary information about each subgame such as its temperature or its thermograph. These algorithms can achieve good play...
The Graph–History interaction (GHI) problem is a notorious problem that causes game-playing programs to occasionally return incorrect solutions. This paper presents a practical method to cure the GHI problem for the case of the df-pn algorithm. Results in the game of Go with the situational super-ko rule show that the overhead incurred by our metho...
Despite recent progress in AI planning, many benchmarks remain challenging for current planners. In many domains, the performance of a planner can greatly be improved by discovering and exploiting information about the domain structure that is not explicitly encoded in the initial PDDL formulation. In this paper we present and compare tw...
AI has had notable success in building high-performance game-playing programs to compete against the best human players. However, the availability of fast and plentiful machines with large memories and disks creates the possibility of solving a game. This has been done before for simple or relatively small games. In this paper, we present new ideas and al...
Despite recent progress in AI planning, many benchmarks remain challenging for current planners. In many domains, the performance of a planner can greatly be improved by discovering and exploiting information about the domain structure that is not explicitly encoded in the initial PDDL formulation. In this paper we present an automated method t...
Decomposition search is a divide and conquer approach that splits a game position into sub-positions and computes the global outcome by combining results of local searches. This approach has been shown to be successful to play endgames in the game of Go. This paper introduces dynamic decomposition search as a way of splitting a problem dynamicall...
In games research, Go is considered the classical board game that is most resistant to current AI techniques. Large-scale knowledge engineering has been considered indispensable for building state of the art programs, even for subproblems such as Life and Death, or tsume-Go. This paper describes the technologies behind TSUMEGO EXPLORER, a high-...
Most Go-playing programs use a combination of search and heuristics based on an influence function to determine whether territories are safe. However, to assure the correct evaluation of Go positions, the safety of stones and territories must be proved by an exact method. The first exact algorithm, due to Benson [1], determines the unconditional sa...
We propose an incremental algorithm for the problem of maintaining systems of difference constraints. Unlike the unidirectional approach of Ramalingam et al., it employs bidirectional search, similar to that of Alpern et al., and has a bounded worst-case runtime complexity in terms of the size of the changes. The major cha...
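For context, the underlying batch problem reduces to single-source shortest paths: each constraint x_j - x_i <= c becomes an edge (i, j) of weight c, and Bellman-Ford either yields a feasible assignment or detects infeasibility. The sketch below shows this standard batch formulation, not the incremental algorithm itself; names are illustrative.

```python
def solve_difference_constraints(num_vars, constraints):
    """Solve a system of constraints x[j] - x[i] <= c with Bellman-Ford.

    `constraints` is a list of (i, j, c) triples over variables 0..num_vars-1.
    Returns a feasible assignment, or None if the constraint graph contains a
    negative cycle (infeasible system).
    """
    dist = [0.0] * num_vars              # implicit virtual source at distance 0
    for _ in range(num_vars):            # enough relaxation rounds incl. the source
        changed = False
        for i, j, c in constraints:
            if dist[i] + c < dist[j]:
                dist[j] = dist[i] + c
                changed = True
        if not changed:
            break
    # One more pass: any further improvement means a negative cycle.
    for i, j, c in constraints:
        if dist[i] + c < dist[j]:
            return None
    return dist

# Example: x1 - x0 <= 3, x2 - x1 <= -2, x0 - x2 <= -1  ->  [-3.0, 0.0, -2.0]
solve_difference_constraints(3, [(0, 1, 3), (1, 2, -2), (2, 0, -1)])
```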
Since the state space of most games is a directed graph, many game-playing systems detect repeated positions with a transposition table. This approach can reduce search effort by a large margin. However, it suffers from the so-called Graph History Interaction (GHI) problem, which causes errors in games containing repeated positions. This paper pr...
Despite major progress in AI planning over the last few years, many interesting domains remain challenging for current planners. This paper presents component abstraction, an automatic and generic technique that can reduce the complexity of an important class of planning problems. Component abstraction uses static facts in a problem definition to...
Game-SAT is a 2-player version of SAT where two players (MAX and MIN) play on a SAT instance by alternately selecting a variable and assigning it a value true or false. MAX tries to make the formula true, while MIN tries to make it false. The Game-SAT problem is to determine the winner of a SAT instance under the rules above, assuming the perfect...
Temperature Discovery Search (TDS) is a new minimax-based game tree search method designed to compute or approximate the temperature of a combinatorial game. TDS is based on the concept of an enriched environment, where a combinatorial game G is embedded in an environment consisting of a large set of simple games of decreasing temperature. O...
The problem of path-finding in commercial computer games has to be solved in real time, often under constraints of limited memory and CPU resources. The computational effort required to find a path, using a search algorithm such as A*, increases with the size of the search space. Hence, path-finding on large maps can result in serious performance bottl...
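For reference, the baseline whose memory and CPU cost on large maps motivates such work is textbook A*. A minimal sketch, with assumed `neighbors` and `heuristic` callbacks supplied by the caller:

```python
import heapq
from itertools import count

def a_star(start, goal, neighbors, heuristic):
    """Textbook A*. `neighbors(n)` yields (next_node, step_cost) pairs and
    `heuristic(n, goal)` must not overestimate the remaining cost."""
    tie = count()                                   # tie-breaker for the heap
    frontier = [(heuristic(start, goal), next(tie), start, None)]
    came_from, g_cost = {}, {start: 0}
    while frontier:
        _, _, node, parent = heapq.heappop(frontier)
        if node in came_from:                       # already expanded
            continue
        came_from[node] = parent
        if node == goal:                            # reconstruct the path
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return list(reversed(path))
        for nxt, cost in neighbors(node):
            new_g = g_cost[node] + cost
            if new_g < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = new_g
                heapq.heappush(frontier,
                               (new_g + heuristic(nxt, goal), next(tie), nxt, node))
    return None                                     # no path exists
```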
We study the problem of incrementally maintaining a topological sorting in a large DAG. The Discovery Algorithm (DA) of Alpern et al. (Proc. 1st Annual ACM-SIAM Symp. on Discrete Algorithms, 1990, pp. 32-42) computes a cover K of nodes such that a solution to the modified problem can be found by changing node priorities within K only. It achieves a...
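As a point of comparison, the naive alternative recomputes the entire topological order after every edge insertion; the incremental algorithms discussed above restrict work to a small cover of affected nodes instead. A sketch of that O(V + E)-per-update baseline (not the Discovery Algorithm itself; names and the fixed node set are assumptions):

```python
from collections import defaultdict, deque

class NaiveDynamicTopoOrder:
    """Maintain a topological order of a DAG under edge insertions,
    recomputing the whole order (Kahn's algorithm) after each update."""
    def __init__(self, nodes):
        self.nodes = list(nodes)          # node set is fixed up front
        self.succ = defaultdict(set)

    def add_edge(self, u, v):
        self.succ[u].add(v)
        return self._recompute()          # None if the insertion created a cycle

    def _recompute(self):
        indeg = {n: 0 for n in self.nodes}
        for u in self.succ:
            for v in self.succ[u]:
                indeg[v] += 1
        queue = deque(n for n in self.nodes if indeg[n] == 0)
        order = []
        while queue:
            n = queue.popleft()
            order.append(n)
            for v in self.succ[n]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
        return order if len(order) == len(self.nodes) else None
```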
Conditional combinatorial games (CCG) are a new tool developed for describing loosely coupled games. The definition of CCG is based on the one for classical independent combinatorial games. However, play in a CCG depends on its global context: certain moves are legal only if a nonlocal context condition is currently true. Compared with independent...
Despite major progress in AI planning over the last few years, many interesting domains remain challenging for current planners. Topological abstraction can reduce planning complexity in several domains, decomposing a problem into a two-level hierarchy. This paper presents LAP, a planning model based on topological abstraction. In formalizing LAP a...
Search algorithms based on the notion of proof and disproof numbers have been shown to be effective in many games. In this paper, we modify the depth-first proof-number search algorithm df-pn, in order to apply it to the game of Go. We develop a solver for one-eye problems, a special case of enclosed tsume-Go (life and death) problems. Our results...
Heuristic search has been successful for games like chess and checkers, but seems to be of limited value in games such as Go and shogi, and puzzles such as Sokoban. Other techniques are necessary to approach the performance that humans achieve in these hard domains. This paper explores using planning as an alternative problem-solving framework for...
Computer Go is one of the biggest challenges faced by game programmers. This survey describes the typical components of a Go program, and discusses knowledge representation, search methods and techniques for solving specific subproblems in this domain. Along with a summary of the development of computer Go in recent years, areas for future research...
Victor Allis' proof-number search is a powerful best-first tree search method which can solve games by repeatedly expanding a most-proving node in the game tree. A well-known problem of proof-number search is that it does not account for the effect of transpositions. If the search builds a directed acyclic graph instead of a tree, the same node can...
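The basic proof-number bookkeeping on a tree can be sketched as below, using the standard backup rules and illustrative attribute names; the transposition (DAG) issues the abstract targets are exactly what this simple tree version ignores.

```python
def update_proof_numbers(node):
    """Back up proof / disproof numbers at an internal node.

    Each child carries `pn` and `dn`; `node.is_or` is True at the prover's
    (OR) nodes. Unknown leaves are typically initialized to (1, 1), proven
    leaves to (0, inf), and disproven leaves to (inf, 0).
    """
    if node.is_or:
        node.pn = min(c.pn for c in node.children)
        node.dn = sum(c.dn for c in node.children)
    else:
        node.pn = sum(c.pn for c in node.children)
        node.dn = min(c.dn for c in node.children)

def most_proving_child(node):
    # Descend toward the child responsible for the parent's current pn / dn.
    if node.is_or:
        return min(node.children, key=lambda c: c.pn)
    return min(node.children, key=lambda c: c.dn)
```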
Capturing races or semeai are an important element of Go strategy and tactics. We extend previous work on semeai by introducing a more general framework for analyzing semeai, based on the new concepts of conditional combinatorial games and liberty count games. We show how this framework encompasses earlier concepts such as plain liberty regions and...
Position evaluation is a critical component of Go programs. This paper describes both the exact and the heuristic methods for position evaluation that are used in the Go program Explorer, and outlines some requirements for developing better Go evaluation functions in the future.
Computer programs based on minimax search have achieved great success, solving a number of classic games including Gomoku and Nine Men's Morris, and reaching a performance that approaches or surpasses the best human players in other well-known games such as checkers, Othello and chess. All these high-performance game-playing programs use global sea...
In computer game-playing, the established method for constructing an evaluation function uses a scalar value computed as a weighted sum of features. This paper advocates the use of partial order evaluation, and describes an efficient new search method called partial order bounding (POB). Previous tree search algorithms using a partial order evaluati...
Computer Go is maybe the biggest challenge faced by game programmers. Despite considerable work and much progress in solving specific technical problems, overall playing strength of Go programs lags far behind most other games. This review summarizes the development of computer Go in recent years and points out some areas for future research.
Large-scale minimax search has been used with great success in many games, but not in Go. We investigate the reasons for the difficulty of applying minimax search to Go, using late stage endgames as a test case. Deep minimax search is the engine powering most computer programs for two-player games with perfect information. It has l...
The field of Computer Go has seen impressive progress over the last decade. However, its future prospects are unclear. This paper suggests that the obstacles to progress posed by the current structure of the community are at least as serious as the purely technical challenges. To overcome these obstacles, I develop three possible scenarios, which a...
Thermography [1] is a powerful method for analyzing combinatorial games. It has been extended to games that contain loops in their game graph by Berlekamp [2]. We survey the main ideas of this method and discuss how it applies to Go endgames. After a brief review of the methodology, we develop an algorithm for generalized thermography and describe...
Although an interactive system may be dedicated to a specific application, if it aims at a heterogeneous user community it must provide many application-independent functions, such as a user interface, an explanatory and communications component, and database functions for data structuring and visualization. In other words, every user-friendly inte...
The Smart Game Board, a software workbench dedicated to the development of game-playing programs, has been used to implement half a dozen programs that play different games. We describe its use in the development of three Go-playing programs: Explorer and its two offspring, Go Intellect and Swiss Explorer. It took four years to build and refine the...
We revisit the problem of constructing an evaluation function for game tree search. While the standard model assumes a numeric evaluation function, partial orders have some desirable properties for constructing a more meaningful evaluation. However, previous partial order tree search algorithms have been quite complex. We introduce partial orde...