JOURNAL OF LATEX CLASS FILES, VOL. 6, NO. 1, JANUARY 2007
Current Frontiers in Computer Go
Arpad Rimmel, Olivier Teytaud, Chang-Shing Lee, Shi-Jim Yen, Mei-Hui Wang, and Shang-Rong Tsai
Abstract—This paper presents the recent technical advances
in Monte-Carlo Tree Search for the Game of Go, shows the
many similarities and the rare differences between the current
best programs, and reports the results of the computer-Go event
organized at FUZZ-IEEE 2009, in which four main Go programs
played against top level humans. We see that in 9x9, computers
are very close to the best human level, and can be improved easily
for the opening book; whereas in 19x19, handicap 7 is not enough
for the computers to win against top level professional players,
due to some clearly understood (but not solved) weaknesses of
the current algorithms. Applications far from the game of Go are
also cited. Importantly, the first ever win of a computer against
a 9th Dan professional player in 9x9 Go occurred in this event.
Index Terms—Monte-Carlo Tree Search, Upper Confidence
Trees, Game of Go
I. INTRODUCTION

THE game of Go is one of the main challenges in artificial
intelligence. In particular, it is much harder than chess,
in spite of the fact that it is fully observable and has very
simple rules. Currently, the best algorithms are based on Monte-Carlo
Tree Search; they reach the professional level in
9x9 Go (the smallest, simplest form) and strong amateur level
in 19x19 Go.
During FUZZ-IEEE 2009, in Jeju Island, games were played
between four of the current best programs against a top level
professional player and a high-level amateur. We will use
the results of the different games in order to summarize the
state of the Monte-Carlo Tree Search algorithm, the main
differences between the programs and the current limitations
of the algorithm.
History of computer Go.
The ranks in the game of Go are ordered by decreasing Kyu,
increasing Dan, and then increasing professional Dans: 20Kyu
is the lowest level, 19K, 18K, ...,and 1K; 1Dan, 2D, 3D,...,
and 7D; the first professional Dan 1P is then considered as
nearly equivalent to 7D, followed by 2P, 3P, 4P,..., and 9P. The
title "top pro" is given to professional players who recently
won at least one major tournament.
9x9 Go: In 2007, MoGo won the first ever game against
a pro, Guo Juan 5P, in 9x9, in a blitz game (10 minutes per
side). This was done a second time, with long time settings, in
2008, also by MoGo and against Catalin Taranu 5P. The only
wins as black against a pro were realized by MoGo against
Catalin Taranu (5P) in Rennes (France, 2009) and against
C.-H. Chou (Taipei, 2009).

A. Rimmel and O. Teytaud are with the TAO team, Inria Saclay IDF,
LRI, UMR 8623 (CNRS - Universite Paris-Sud), Bat. 490, Universite Paris-Sud,
91405 Orsay Cedex, France (e-mail: email@example.com). Chang-Shing Lee
and Mei-Hui Wang are with the Dept. of Computer Science and Information
Engineering, National University of Tainan, Taiwan. Shi-Jim Yen is with the
Computer Science and Information Engineering Department, National
Dong Hwa University, Taiwan. Shang-Rong Tsai is with Chang Jung
Christian University, Taiwan.
19x19 Go: In 1998, Martin Müller could win against Many
Faces Of Go, one of the top programs at that time, in spite of
29 handicap stones, an incredibly big handicap, so big that it
does not make sense for human players. In 2008, MoGo won
the first ever game in 19x19 against a pro, Kim Myungwan,
8P, in Portland; however, this was with the largest usually
accepted handicap, i.e. 9 stones. CrazyStone then won against
a pro with 8 and 7 handicap stones in Tokyo (Aoba Kaori
4P, in 2008); finally, MoGo won with handicap 7 against a
top level human player, Chou-Hsun Chou (9P and winner of
the famous LG Cup in 2007), and against a 1P player with
handicap 6 in Tainan (Taiwan, 2009).
During FUZZ-IEEE 2009 there was the first win of a
computer program (the Canadian program Fuego) against a 9P
player in 9x9 as white. On the other hand, none of the programs
could win against Chou-Hsun Chou in 19x19, in spite of the
handicap 7, showing that winning with handicap 7 against a
top level player is still almost impossible for computers, in
spite of the win by MoGo a few months earlier with handicap
7. Also, during FUZZ-IEEE 2009, no program could win as
black in 9x9 Go with komi 7.5 against the top pro.
The two human players.
Chou-Hsun Chou is a top level professional player born in
Taiwan. He became professional in 1993 and reached 7P in
1997 and 9P in 1998. He won the LG Cup in 2007, beating
Hu Yaoyu 2 to 1.
Shen-Su Chang is a 6D amateur from Taiwan.
Technical terms from the game of Go.
In this section we define several Go terms. A group is a
connected set of stones (for 4-connectivity). A liberty is an
empty location, next to a group; a group is captured when it
has no more liberties; it is then removed from the board. A
group is termed dead when it is definitely going to be captured.
An atari is a situation in which a player plays a move in
the liberties of a group, so that only one liberty remains. A
semeai is a fight between two groups, each of them being
alive only if it kills the other (except in seki cases). A seki is a
situation in which two groups have common liberties and none
of the players can play in these liberties without being in self-
atari. The komi is the number of points given to white, as a
compensation for playing second. The handicap in a game is
a number of stones; with handicap N, the black player plays
N stones before white plays its first move. Even games are
games with handicap 0 and komi around 7.5 (the precise komi
depends on federations and rules). A moyo is an area of the
board where one player has a lot of influence and that could
become territory.

inria-00544622, version 1 - 9 Dec 2010
Author manuscript, published in "IEEE Transactions on Computational Intelligence and AI in Games" (2010).

The rest of this paper is organized as follows: Section II
describes the main concepts in Monte-Carlo Go. Section III
introduces the results and comments for the FUZZ-IEEE 2009
computer Go invited session. Section IV concludes.
II. MONTE CARLO TREE SEARCH ALGORITHM AND ITS IMPROVEMENTS
Section II-A describes the main concepts in Monte-Carlo
Go. Section II-B describes techniques for dealing with the
large action space. Section II-C explains how to extract
additional useful information from simulations. Section II-D
presents some expert modules useful for biasing the Monte-
Carlo part. Section II-E summarizes some known differences
between the programs.
A. Main concepts in Monte-Carlo Go
The main concepts in Monte-Carlo Tree Search were defined
in the seminal MCTS papers; one of the most well-known variants
is Upper Confidence Bounds applied to Trees. The main
idea is to construct a tree of possible futures. This tree will be
biased in order to explore more deeply moves that have good
results so far. This is done by the repetition of 4 steps as long
as there is some time left: descent, evaluation, update, growth.
In the descent part, we use the statistics of the tree to choose
new nodes until we reach a node outside of the tree. This is
done by considering that the selection of a child is a bandit
problem. In a bandit problem, you have a fixed number
of arms, each arm is associated to an unknown probability
distribution. At each turn you select an arm and receive a
reward which is drawn according to the distribution of the arm.
Your goal is to maximize your rewards. The formula used to
solve this problem is called a bandit formula and is usually
based on a compromise between exploration and exploitation;
a classical example is given below. This formula is used during
all the descent step.
In the evaluation part of the algorithm, also called playout,
the goal is to have a value for the nodes selected during
the descent part. In order to do that, legal moves are chosen
randomly (but not uniformly) until the game is finished; see Section II-D.
In the update part, the statistics of the tree are updated
according to the result of the game.
In the growth part, the node just outside of the tree selected
at the end of the descent part is added to the tree.
All algorithms based on this principle will be termed Monte-
Carlo Tree Search in the rest of this paper.
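The four steps above (descent, evaluation, update, growth) can be sketched in a few lines. The following toy illustration runs MCTS on an invented one-player game (choose 5 bits, win iff all are 1); the game, the node layout and the exploration constant are assumptions for illustration, not the implementation of any of the programs discussed here.

```python
import math
import random

C = 0.7      # exploration constant (illustrative value)
DEPTH = 5    # toy game: choose 5 bits; the game is won iff all bits are 1

class Node:
    def __init__(self):
        self.children = {}  # move -> Node
        self.n = 0          # number of simulations through this node
        self.w = 0          # number of won simulations

def ucb(parent, child):
    # Bandit formula: empirical win rate plus exploration bonus
    return child.w / child.n + C * math.sqrt(math.log(parent.n) / child.n)

def one_simulation(root):
    node, path, state = root, [root], []
    # Descent: follow the bandit formula while all moves have statistics.
    while len(state) < DEPTH and len(node.children) == 2:
        move = max(node.children, key=lambda m: ucb(node, node.children[m]))
        node = node.children[move]
        path.append(node)
        state.append(move)
    # Growth: add the node just outside the tree.
    if len(state) < DEPTH:
        move = random.choice([m for m in (0, 1) if m not in node.children])
        node.children[move] = node = Node()
        path.append(node)
        state.append(move)
    # Evaluation (playout): finish the game with random moves.
    while len(state) < DEPTH:
        state.append(random.choice((0, 1)))
    reward = 1 if sum(state) == DEPTH else 0
    # Update: propagate the result along the visited path.
    for visited in path:
        visited.n += 1
        visited.w += reward

random.seed(0)
root = Node()
for _ in range(3000):
    one_simulation(root)
best = max(root.children, key=lambda m: root.children[m].n)
```

After the loop, the most simulated first move (here, bit 1) is returned as the answer, following the usual MCTS convention of choosing the most visited child.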
An efficient way of solving the bandit problem is to choose
the move with the highest upper confidence bound. This is
done with the UCB formula. It consists in choosing the child
c of the current situation q which maximizes:

  s_q(c) + C * sqrt( log(N(q)) / n(c) )      (1)

where:
• s_q(c) = W(c)/n(c) is the score of child c of node q;
• n(c) is the number of simulations of move c;
• N(q) is the number of simulations of state q;
• W(c) is the number of won simulations of node c;
• the constant C controls the compromise between exploitation of good moves and exploration of new moves.

When another term that plays the role of exploration, like the
RAVE values, is added to the formula, the
constant C usually becomes very small or even zero:

  s_q(c) = α W(c)/n(c) + (1 − α) W_rave(c)/n_rave(c)      (2)

The "RAVE" values will be defined in Section II-C. In the rest of
this paper, we will identify the node c and the move played to
obtain c from q; this is an approximation only, as MoGo, like
many strong programs, uses a transposition table; this identification will
just clarify the equations.
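Numerically, the scores of Eq. 1 and Eq. 2 can be sketched as follows. The schedule making α grow with n(c) is an illustrative assumption (each program tunes its own), and C = 0.7 is a placeholder value, not the constant of any particular program.

```python
import math

def ucb_score(w, n, N, C=0.7):
    """Eq. 1: empirical win rate of the child plus exploration bonus."""
    if n == 0:
        return float("inf")  # unvisited moves get an infinite score
    return w / n + C * math.sqrt(math.log(N) / n)

def rave_blended_score(w, n, w_rave, n_rave, k=50):
    """Eq. 2 with C = 0: blend the tree win rate with the RAVE win rate.

    alpha -> 1 as the move accumulates its own statistics, so the RAVE
    estimate only dominates early; the n/(n+k) schedule is an assumed
    shape, not MoGo's exact formula.
    """
    alpha = n / (n + k)
    tree_part = w / n if n else 0.0
    rave_part = w_rave / n_rave if n_rave else 0.0
    return alpha * tree_part + (1 - alpha) * rave_part

# A move with 6 wins out of 10 own simulations, 40/100 in RAVE statistics:
print(round(rave_blended_score(6, 10, 40, 100), 3))  # -> 0.433
```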
When the bandit part is based on Eq. 1 or a variant of it,
the MCTS is termed UCT (Upper Confidence Trees). In the
case of Go, more sophisticated formulas are usually preferred;
nonetheless, UCT provides a very sound and principled way of
designing a general purpose MCTS. This is in particular im-
portant as MCTS is particularly well known for its efficiency
in general game playing, i.e. when the game is not known
in advance and the program must read the rules (in a given
formalism) before playing.
There are also several other modules which enhance the
performance, detailed in sections below.
B. Bandits for large action spaces: introducing a bias in the tree search
The most classical idea for choosing a move in the tree part
is to maximize the score given in Equation 1. However, Equa-
tion 1 gives score +∞ to moves which have no simulation.
This implies that if there are N legal moves at situation q, then
the first N simulations at node q will each choose a different
initial move. This is of course a poor policy. Therefore, other
solutions have been proposed: first play urgency, progressive
widening and progressive unpruning. The last two are based
on ranking heuristics, which are detailed later.
First Play Urgency: An early proposal is the "first play urgency"
(FPU): a constant score given to moves with no
simulations. The FPU can be improved, e.g. by replacing
the constant by a function of Go expertise. However, FPU
was replaced by other rules in all strong implementations
(note however that for other applications with less expertise
available, FPU might be a good rule of thumb).
Progressive widening: Progressive widening consists in
optimizing Eq. 1 only among moves with index
lower than Θ(n^K); precisely,

  decision(q, n) = argmax over { c : index(q,c) <= ceil(n^K) } of score(c)      (3)

for the n-th simulated move at situation q, where score(c) is the
quantity maximized in Eq. 1. This requires the
use of a function index(q,c), which gives a rank to each legal move
c at situation q. Usually, a prior is computed for each
c at situation q, and then index(q,c) is the rank of move
c according to this prior; therefore, what is really needed
for progressive widening is a score for each move, as for
progressive unpruning.
It has been shown that even if index(.) is a random
ranking of moves, this algorithm can provide an improvement;
in applications, K typically ranges between 1/4 and 1/2, depending on the
efficiency of the heuristic. Interestingly, with progressive
widening, UCT can be applied to problems with infinite
action space. However, in many problems, and in particular
in Go and Havannah, progressive unpruning (defined below)
performs better and has been chosen in recent implementations.
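The restriction of Eq. 3 can be sketched as follows: at the n-th simulation of q, only the moves of best heuristic rank take part in the argmax, and the candidate set widens like n^K. The ranks and scores below are made-up numbers for illustration.

```python
import math

def progressive_widening_choice(moves, index, score, n, K=0.5):
    """Pick the best-scoring move among those of rank <= ceil(n ** K)."""
    width = max(1, math.ceil(n ** K))
    candidates = [m for m in moves if index[m] <= width]
    return max(candidates, key=lambda m: score[m])

moves = ["a", "b", "c", "d"]
index = {"a": 1, "b": 2, "c": 3, "d": 4}              # heuristic rank, 1 = best prior
score = {"a": 0.40, "b": 0.55, "c": 0.70, "d": 0.90}  # bandit score of Eq. 1

print(progressive_widening_choice(moves, index, score, n=2))   # -> b
print(progressive_widening_choice(moves, index, score, n=10))  # -> d
```

Early on, the high-scoring but low-prior move "d" is simply not considered; it only enters the argmax once enough simulations have been spent at q.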
Progressive unpruning: Instead of an abrupt change as in
progressive widening, which adds new moves to the pool of
moves considered in the argmax of Eq. 3, progressive unpruning
adds a term to Eq. 1, e.g. as follows:

  s_q(c) + C * sqrt( log(N(q)) / n(c) ) + H(q,c) / (n(c) + 1)      (4)

where H(q,c) is a heuristic function for valuating move c in state
q (the precise decay of the heuristic term is implementation-dependent).
The formula above can be adapted in order to take into
account RAVE values as in Eq. 2.
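This soft alternative can be sketched as follows; the H(q,c)/(n(c)+1) decay of the heuristic term is one common choice, assumed here for illustration, and the n+1 inside the exploration term is only there to avoid a division by zero in this sketch.

```python
import math

def unpruning_score(w, n, N, H, C=0.7):
    """Eq. 4 sketch: UCB-style score plus a decaying heuristic bonus."""
    win_rate = w / n if n else 0.0
    exploration = C * math.sqrt(math.log(N) / (n + 1))
    return win_rate + exploration + H / (n + 1)

# With no simulations yet, a strong prior dominates the ordering...
fresh_strong = unpruning_score(w=0, n=0, N=10, H=2.0)
fresh_weak = unpruning_score(w=0, n=0, N=10, H=0.2)
# ...but after many simulations the heuristic bonus has faded and the
# empirical win rate takes over.
seasoned = unpruning_score(w=60, n=100, N=1000, H=0.2)
print(fresh_strong > fresh_weak)  # -> True
```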
A priori evaluation of moves: There are two main forms of
a priori evaluations of moves, cumulated in the best implementations:
• Patterns. In the case of Go, several works propose the use
of patterns extracted from a database D of professional
games for building the function index() of progressive
widening (Eq. 3) or the function H() of progressive
unpruning (Eq. 4). Complex and essentially empirical
formulas have been derived for this; they work roughly
as follows for estimating the value of a move:
– find the biggest pattern, centered on this move, which
appears in D;
– compute the empirical probability p1 for this pattern to be
played in D (the confidence of this pattern, in the
usual database terminology);
– compute the frequency p2 of this pattern in D (the support
of the pattern, in the usual database terminology, i.e.
the number of times the move was played divided
by the size of D);
– the heuristic value is then a linear compromise
between p1 and p2 (p1 being weighted much more strongly).
There is no widely accepted formula combining p1 and p2
into H(q,c); for the most important patterns (e.g.
the empty triangle, the wall, the keima and many others),
it is worth tuning the coefficients manually by tedious
experiments - the usual general formulas do not reach
state-of-the-art performance.
• Tactical and strategical rules. Important tactical or
strategical rules are used for biasing the tree search, e.g.
atari, extensions, line of influence (positive value for the
moves located on the third line), line of death (negative
value for the sides of the board). Some
papers also propose common fate graphs; however,
these common fate graphs have not been extensively
used in successful MCTS implementations, except if
one considers that the use of the notion of groups is a
particular simple form of common fate graphs.
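The confidence/support compromise described in the Patterns bullet can be sketched as follows. The weights and database counts are invented for illustration; as noted above, real programs tune the coefficients by hand.

```python
def pattern_prior(appearances, played, db_size, w1=0.9, w2=0.1):
    """Linear compromise between confidence p1 and support p2.

    p1 = how often the matched pattern is actually played when it
    appears in the database D; p2 = frequency of the played pattern
    in D. w1 >> w2 reflects that p1 is much stronger; both weights
    are hypothetical values, not those of any real program.
    """
    p1 = played / appearances   # confidence
    p2 = played / db_size       # support
    return w1 * p1 + w2 * p2

# A pattern matched 200 times in a database of 10,000 positions,
# played 150 of those times:
print(round(pattern_prior(appearances=200, played=150, db_size=10_000), 4))
```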
C. Side-information extracted from simulations
MCTS is based on a huge number of simulations. The
only information kept from these simulations is the
number of won/lost games at each situation of the tree. It is
somewhat natural to try to extract more information from
the simulations. The current main works around that are the
owner information, the rapid action value estimates, and the
criticality.
Owner information: "Owner information" is the heuristic
consisting in computing, for each location l of a board q,
with which probability it belongs (at the end of simulations
containing q) to the player whose turn it is to move. If
the probability is close to 1/3, the move is considered to
be important; in CrazyStone, the probability of the move is
increased in the tree (H(.) in Eq. 3). For example, in Fig.
1 we see the probability for a move
to be black/white at the end; this is the owner information,
and the heuristic consists in playing more often, for white
(resp. black), in locations which will be white with probability
>= 33% (resp. <= 67%).
Rapid Action Value Estimates: Rapid Action Value Estimates
(RAVE) are a heuristic value
for moves. The RAVE value for move m in situation q is

  rave(q, m) = W(q, m) / n(q, m)

if black (resp. white) is to play at q, with
• W(q,m) = number of won simulations where black (resp.
white) plays first at m after situation q;
• n(q,m) = number of simulations where black (resp. white)
plays first at m after situation q.
The important point, which makes the difference with the
classical UCT values, is that black (resp. white) plays first
(before white) at m after situation q, but not necessarily at
situation q. RAVE values are updated at each simulation, and
can only be used when a table of RAVE values is stored
in each node (this moderately extends the space complexity,
as this is just storing one more value alongside the usual
statistics). They provide a big improvement (see discussion
in section II-E).
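The "plays first at m after q, not necessarily at q" rule is what an implementation must enforce when updating the per-node RAVE tables. Here is a hedged sketch; the dictionary-based node layout and move encoding are assumptions for illustration, not any program's actual data structures.

```python
from collections import defaultdict

def rave_update(node, moves_after_q, player_to_move, winner):
    """Update the RAVE table of one node after one simulation.

    moves_after_q: (player, move) pairs played after situation q, in
    order. Every move m that the player to move at q played FIRST
    (before the opponent played there) at some point after q updates
    W(q, m) and n(q, m), not only the move actually chosen at q.
    """
    first = {}
    for player, move in moves_after_q:
        first.setdefault(move, player)  # who occupied this point first
    win = 1 if winner == player_to_move else 0
    for move, player in first.items():
        if player == player_to_move:
            node["n_rave"][move] += 1
            node["w_rave"][move] += win

node = {"n_rave": defaultdict(int), "w_rave": defaultdict(int)}
simulation = [("black", "C3"), ("white", "D4"), ("black", "E5")]
rave_update(node, simulation, player_to_move="black", winner="black")
print(node["n_rave"]["C3"], node["n_rave"]["D4"])  # -> 1 0
```

Note that "D4" contributes nothing: it was played first by white, so it only updates white's RAVE statistics in the corresponding nodes.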
Criticality: The idea of criticality is
a generalization of the owner information. Whereas the owner
information suggests playing in unsettled territory (see Fig. 1),
the criticality suggests playing in locations highly correlated
with the victory (the semeai in the upper left part of the
Figure). Formally, the criticality of a location m in a situation
q is defined as follows:

  criticality(m) = v(m)/N − ( w(m) W + b(m) B ) / N^2

where:
• v(m) is the number of simulations including situation q
won by the owner of m;
• N is the number of simulations including situation q;
• W (resp. B) is the number of simulations at q won by
white (resp. black);
• w(m) (resp. b(m)) is the number of simulations at q with
m owned by white (resp. black).

Fig. 1. Plot of the "Owner" value: blue areas (dark in black and white) are expected to belong to black. We see that the owner value suggests playing
around the frontier, in order to extend the domain owned by the player. The drawback is that in e.g. semeais the Monte-Carlo simulator is wrong (e.g. in
the upper-left part, the colors show that the territory belongs to black, while, in fact, the black group is dead and the white lives). The figure and the semeai
example on the upper-left corner are kindly provided by Rémi Coulom.
We note that the formula is symmetric with regard to black and
white. The first term increases for locations highly correlated
with victory and the second term is a normalization; the
formula is intuitively a covariance.
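In code, the covariance reading of the formula is direct; the counts below are invented for illustration.

```python
def criticality(v_m, w_m, b_m, W, B, N):
    """criticality(m) = v(m)/N - (w(m)*W + b(m)*B) / N**2."""
    return v_m / N - (w_m * W + b_m * B) / N ** 2

# 100 simulations of q; m is owned by the eventual winner in 80 of them,
# while ownership and victory are individually balanced (50/50):
print(round(criticality(v_m=80, w_m=50, b_m=50, W=50, B=50, N=100), 6))  # -> 0.3

# If ownership of m is independent of who wins, criticality vanishes:
print(criticality(v_m=50, w_m=50, b_m=50, W=50, B=50, N=100))  # -> 0.0
```

The first location behaves like the semeai of Fig. 1: whoever wins the fight for m tends to win the game, so it is worth more simulations.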
Criticality was tested without success in Zen (according to
the author’s post in the computer-Go mailing list) and provided
a very small improvement in MoGo. This might be due to
the redundancy with other heuristics (e.g. rapid action value
estimates or Go expertise); nonetheless, criticality and variants
of it are the only current tool for detecting semeais, a very
important weakness of MCTS/UCT (see section II-D).
D. Expertise in the playouts
The design of the playouts is a very sensitive part of the
algorithm. A small modification usually has a huge impact on
the performance, in one way or the other. That’s why it is very
interesting to improve it. It is also the only way to correct some
inherent problems of the UCT algorithm, as for example in the
case of the nakade (see below). However, except in some specific
cases, the reasons explaining the success of a modification
are still unknown. The current theory is that the modification
should improve the level of the Monte-Carlo simulations while
keeping the diversity and removing the undue bias. As this is
very hard to predict, all the following modifications have been
validated by numerous experiments.
Sequence-like Monte-Carlo (originating in MoGo): The
main innovation of the early versions of MoGo was the design
of the playouts. They pointed out that improving the
strength of the playouts directly could lead to a decrease of
performance for the overall algorithm. That is why, whereas
previous works on the playouts focused on increasing the
quality of the Monte-Carlo player as a standalone player,
this work designed the Monte-Carlo player from a very empirical
point of view (accepting a modification of the playouts if
the MCTS based on these playouts plays better, and not if
the playout generator plays better). All strong algorithms now
use “sequence-like” simulations, in which a move is highly
correlated to the previous move. More precisely, a move is
played in the immediate neighborhood (in 8 connectivity) of
the last move if it matches a database of handcrafted patterns,
which are reasonable for human experts. If there are several
such moves, one of them is randomly chosen and played; if
not, then a randomly chosen move on the board is played (Alg. 1).
A crucial property of the playouts is that they should be
balanced (i.e. of equal strength between black and white); this is
much more important than having a strong playout generator.
Ultimately, if the players play exactly equally well in all
situations, then the playouts are a perfect evaluation function.
The weaknesses of Monte-Carlo Tree Search (detailed later)
are in situations in which the simulations are not equilibrated;
for example, in semeais, Monte-Carlo may give around 50 %
of probability of winning the semeai to each player, even if
the semeai is a clear win for one of the players. This idea of
balancing the simulations was developed in the literature; there is a
recent effort to automate this, with not yet good
results on big boards.
A counterpart to “sequence-like” simulations is the use
of the “fill board” modifications, a kind of “Tenuki”-rule,
which switches to another (empty) part of the goban and
therefore prevents the loss of diversity in the simulations. This
modification is described in detail in the literature. It is somewhat
controversial, as this rule (i) brings very big improvements
in MoGo, (ii) is not yet tested in many implementations, and (iii)
is only efficient for long enough time settings (and can be
detrimental for short time settings).
Algorithm 1 Algorithm for choosing a move in MC simulations.
The patterns used for "sequential" moves are described
in the literature. The implementation is a bit more complicated than
that, with some more levels, as well as in Fuego; a significantly
different implementation is the one used in CrazyStone (and probably
Zen as well), which updates a complete table of probabilities
for all moves.

if the last move is an atari then
  Save the stones which are in atari if possible (this is
  checked by liberty count).
else if there is an empty location among the 8 locations around
the last move which matches a pattern then
  Sequential move: play uniformly at random in one of these locations.
else if there is a legal move then
  Legal move: play a legal move uniformly at random.
end if
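The priority chain of Algorithm 1 can be sketched as follows. The board interface (StubBoard, moves_saving_atari, matches_pattern, legal_moves) is entirely hypothetical scaffolding standing in for a real Go engine with real pattern matching and liberty counting.

```python
import random

NEIGHBORS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]

def choose_playout_move(board, last_move, rng=random):
    # 1. If the last move put one of our groups in atari, try to save it.
    saving = board.moves_saving_atari(last_move)
    if saving:
        return rng.choice(saving)
    # 2. Sequential move: an empty point among the 8 neighbors of the
    #    last move that matches one of the handcrafted patterns.
    x, y = last_move
    local = [(x + dx, y + dy) for dx, dy in NEIGHBORS_8
             if board.is_empty((x + dx, y + dy))
             and board.matches_pattern((x + dx, y + dy))]
    if local:
        return rng.choice(local)
    # 3. Otherwise play uniformly among all legal moves, or pass.
    legal = board.legal_moves()
    return rng.choice(legal) if legal else "pass"

class StubBoard:
    """Hypothetical stand-in for a real board, for demonstration only."""
    def __init__(self, empty=(), patterns=(), legal=(), saving=()):
        self._empty, self._patterns = set(empty), set(patterns)
        self._legal, self._saving = list(legal), list(saving)
    def moves_saving_atari(self, last_move): return self._saving
    def is_empty(self, p): return p in self._empty
    def matches_pattern(self, p): return p in self._patterns
    def legal_moves(self): return self._legal

board = StubBoard(empty={(4, 5)}, patterns={(4, 5)}, legal=[(1, 1), (4, 5)])
print(choose_playout_move(board, last_move=(4, 4)))  # -> (4, 5)
```

Because rule 2 only looks at the 8-neighborhood of the last move, the resulting playouts are the "sequence-like" simulations described above.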
Nakade: A nakade is a situation in which a surrounded
group has a single large internal, enclosed space in which the
player won’t be able to establish two eyes if the opponent plays
correctly. Most current Go programs do not evaluate this kind
of situation properly. It is not evaluated by the tree because
no player wants to play there (the Monte-Carlo evaluation is
the same unless many moves are played in the nakade) and it
is not correctly handled by the playouts without the addition
of a specific rule. This situation is a good example of a case
where the addition of expert knowledge in the playouts can
contribute to solving the problem. In MoGo, the rule consists
in playing at the center of three empty locations surrounded
by opponent stones. This rule is called in Algorithm 1 before
other rules. It is a simple and efficient modification but it does
not work in all cases of nakade. Examples of nakade solved
and not solved by this method are given in Fig. 2. To the best
of our knowledge, the detailed implementation of nakade rules
in other programs has not been published; in Fuego, there is a
simple rule of moving single-stone self-ataries to the adjacent location.
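For a three-point eye space, MoGo's rule (play at the center of the three empty locations surrounded by opponent stones) reduces to finding the empty point adjacent to the two others. The (row, col) encoding is an assumption for illustration.

```python
def nakade_vital_point(empty_points):
    """Center of a three-point empty space: the point adjacent to both
    others under 4-connectivity (the same connectivity used for groups).
    Returns None if no such center exists."""
    if len(empty_points) != 3:
        return None

    def adjacent(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1

    for p in empty_points:
        if all(adjacent(p, q) for q in empty_points if q != p):
            return p
    return None

print(nakade_vital_point([(2, 2), (2, 3), (2, 4)]))  # straight three -> (2, 3)
print(nakade_vital_point([(2, 2), (2, 3), (3, 3)]))  # bent three -> (2, 3)
```

Playing the returned vital point in the playouts prevents the opponent from making two eyes in that space, which is exactly the behavior the tree cannot discover on its own.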
Semeai: Semeai are situations where two opponent groups
can’t live without one killing the other or being in seki with
each other. Semeais happen often in games of Go, and the result of
the semeai (which group is alive at the end) has a huge
impact on the score. That is why it is really important for
a Go program to handle such situations correctly. However, it
often requires a very long sequence of complicated moves to
determine the result, even the order of the moves can matter. In
this case, the tree is often not deep enough to solve the semeai.
There is for the moment no good solution to handle perfectly
those situations but some modifications of the Monte-Carlo
simulations can help. For example, we introduced in MoGo
the approach move. As described on the left of Fig. 3,
black should play in B before playing in A in order to kill white;
this is an approach move. In MoGo, we improve the behavior
of the Monte-Carlo simulations by replacing self-atari moves by
a connection to another group when this is possible.
However, as shown on the right of
Fig. 3, there are still simple semeais not correctly handled by
the algorithm.
Two-liberties rules: A lot of rules in the playouts are based
on the number of liberties of a group. The basic rules, like
avoiding atari and killing group, are based on groups with one
liberty. By creating rules for groups with two liberties, we can
cover a larger number of situations and improve the quality
of the simulations. For example, the two-liberties killing rule
is "if, when removing one of the liberties, the group has no
way to escape (no move can improve the number of liberties),
then play it", and the corresponding two-liberties escape rule is
"if one group has two liberties and the opponent can play
a two-liberties killing move, then play a move that prevents
it". Those rules are only examples. They are illustrated on
Fig. 4. Similar rules are implemented in MoGo,
ManyFaces, and Fuego.
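The two-liberties killing rule quoted above can be sketched as a predicate. The escape_gain function, which in a real program would come from the engine's tactical search (counting the liberties reachable after each answer), is stubbed here with invented values.

```python
def two_liberty_kill_moves(liberties, escape_gain):
    """Moves that kill a group currently on two liberties.

    liberties: the group's two liberties. escape_gain(p): the best
    liberty count the group can reach after we fill p (a stub for a
    real engine's search). Filling p kills if the group cannot get
    back above one liberty, matching the rule quoted in the text.
    """
    if len(liberties) != 2:
        return []
    return [p for p in liberties if escape_gain(p) <= 1]

# Toy position: filling A leaves the group stuck on one liberty, while
# after filling B it can extend to three liberties; only A kills.
gain = {"A": 1, "B": 3}
print(two_liberty_kill_moves(["A", "B"], lambda p: gain[p]))  # -> ['A']
```

The matching escape rule is the mirror image: if the opponent has such a killing move available, play a move that prevents it.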
Other rules: Other classical rules consist in avoiding big
self-atari (but this can be complicated for nakade situations);
a detailed analysis of several rules (captures, extensions, dis-
tance to the borders, ladder atari and ko atari) and their relative
weights can be found in the literature. Each program has its own expert
rules, and they appear to be very implementation-dependent.
A rule that works for one program does not necessarily work
for another. Furthermore, when a program is modified, the
rules might not work any more, or at least not with the same
parameters. Therefore, using expert knowledge in the playouts
is very time-consuming in terms of experiments. However, it
is worth doing, as we can see for example with the program
Zen: it is currently ranked 2D on KGS and, according to its
creator, possesses a lot of hard-coded Go knowledge in its
playouts.
E. Differences between programs
We here briefly survey the differences between the four
computer-Go programs involved in the games against humans.
There is not a lot of public information on Zen; Zen is,
according to its author's post on the computer-Go mailing
list, based on the papers describing CrazyStone, with a lot of
expert knowledge added.
Differences in the playouts: All implementations use
sequence-like Monte-Carlo based on local patterns. The
Nakade modification described above is used in MoGo and
provides a big improvement in particular in 9x9. Fill board is
used in MoGo but not in other implementations.
Differences in the bias for the bandit part: There are three
main modifications that can be applied to the bandit part of the
algorithm: (i) Rapid Action Value Estimates, (ii) a database
of patterns, (iii) expert knowledge (patterns,
tactical and strategical rules). The CrazyStone
algorithm handles (ii) and (iii) in a unified framework.
The use of those modifications in the different programs is
presented in Table I.