Large-Scale Parallel Monte Carlo Tree Search on GPU
Kamil Rocki, Reiji Suda
Department of Computer Science
Graduate School of Information Science and Technology, The University of Tokyo
7-3-1, Hongo, Bunkyo-ku, 113-8654, Tokyo, Japan
Email: kamil.rocki, reiji@is.s.u-tokyo.ac.jp
Abstract—Monte Carlo Tree Search (MCTS) is a method
for making optimal decisions in artificial intelligence (AI)
problems, typically move planning in combinatorial games.
It combines the generality of random simulation with the
precision of tree search. This work is motivated by the
emerging GPU-based systems and their high computational
potential combined with relatively low power usage compared
to CPUs. As the problem to be solved, we chose to develop an
AI GPU (Graphics Processing Unit)-based agent for the game
of Reversi (Othello), which provides a sufficiently complex
problem for tree searching, with a non-uniform structure and
an average branching factor of over 8. We present an efficient
parallel GPU MCTS implementation based on the introduced
'block-parallelism' scheme, which combines GPU SIMD thread
groups and performs independent searches without any need
for intra-GPU or inter-GPU communication. We compare it
with a simple leaf-parallel scheme, which implies certain
performance limitations. The obtained results show that using
our GPU MCTS implementation on the TSUBAME 2.0 system,
one GPU can be compared to 100-200 CPU threads, depending
on factors such as the search time and other MCTS parameters.
We also propose and analyze simultaneous CPU/GPU execution,
which improves the overall result.
I. INTRODUCTION
Monte Carlo Tree Search (MCTS)[1][2] is a method
for making optimal decisions in artificial intelligence (AI)
problems, typically move planning in combinatorial games.
It combines the generality of random simulation with the
precision of tree search.
Research interest in MCTS has risen sharply due to its
spectacular success with computer Go and its potential
application to a number of other difficult problems. Its
application extends beyond games[6][7][8][9]. The main
advantages of the MCTS algorithm are that it does not require
any strategic or tactical knowledge about the given domain
to make reasonable decisions, and that it can be halted at
any time to return the current best estimate. Another
advantage of this approach is that the longer the algorithm
runs, the better the solution, and a time limit can be
specified to control the quality of the decisions made. It
provides relatively good results in games like Go or Chess
where standard algorithms fail. So far, research has shown
that the algorithm can be parallelized on multiple CPUs.
This work is motivated by the emerging GPU-based systems
and their high computational potential combined with
relatively low power usage compared to CPUs. As the problem
to be solved, we chose to develop an AI GPU (Graphics
Processing Unit)-based agent for the game of Reversi
(Othello), which provides a sufficiently complex problem for
tree searching, with a non-uniform structure and an average
branching factor of over 8. The importance of this research
is that if the MCTS algorithm can be efficiently parallelized
on GPU(s), it can also be applied to other similar problems
on modern multi-CPU/GPU systems such as the TSUBAME 2.0
supercomputer. Tree searching algorithms are hard to
parallelize, especially when a GPU is considered. Finding an
algorithm which is suitable for GPUs is crucial if tree
search is to be performed on recent supercomputers.
Conventional algorithms do not provide good performance
because of the limitations of the GPU's architecture and
programming scheme, in particular the boundaries on thread
communication. One of the problems is the SIMD execution
scheme within the GPU for a group of threads, which means
that a standard CPU parallel implementation such as
root-parallelism[3] fails. So far we have been able to
successfully parallelize the algorithm and run it on
thousands of CPU threads[4] using root-parallelism.
We present an efficient parallel GPU MCTS implementation
based on the introduced block-parallelism scheme, which
combines GPU SIMD thread groups and performs independent
searches without any need for intra-GPU or inter-GPU
communication. We compare it with a simple leaf-parallel
scheme, which implies certain performance limitations. The
obtained results show that using our GPU MCTS implementation
on the TSUBAME 2.0 system, one GPU's performance can be
compared to 50-100 CPU threads[4] using root-parallelism,
depending on factors such as the search time and other MCTS
parameters. The block-parallel algorithm provides better
results than the simple leaf-parallel scheme, which fails to
scale well beyond 1000 threads on a single GPU. The
block-parallel algorithm is approximately 4 times more
efficient in terms of the number of CPU threads needed to
obtain comparable results. Additionally, we are currently
testing this algorithm on more than 100 GPUs to examine its
scalability limits.
2011 IEEE International Parallel & Distributed Processing Symposium
1530-2075/11 $26.00 © 2011 IEEE
DOI 10.1109/IPDPS.2011.370
One of the reasons why this problem has not been solved
before is that the architecture is quite new and new
applications are still being developed; so far there is no
directly related work. The scale of parallelism is extreme
here (i.e., thousands of GPUs, each running on the order of
10000 threads). The published work concerns hundreds or
thousands of CPU cores at most. The existing parallel
schemes[3] rely on algorithms requiring either each thread
to execute the whole code (which does not work well, since
the GPU is a SIMD device) or synchronization/communication,
which is also not applicable.
II. MONTE CARLO TREE SEARCH
A simulation is defined as a series of random moves
performed until the end of the game is reached (until
neither of the players can move). The result of this
simulation is successful when it ends in a win and
unsuccessful otherwise. So, let every node i in the tree
store the number of simulations ti (visits) and the number
of successful simulations Si. The general MCTS algorithm
comprises 4 steps (Figure 1), which are repeated.
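Such a random playout can be sketched as follows (a minimal illustration assuming a generic two-player game interface with hypothetical `legal_moves`, `apply_move`, and `winner` callbacks; this is not the paper's Reversi implementation):

```python
import random

def simulate(state, legal_moves, apply_move, winner, player):
    """Play random moves until neither player can move; return 1 for a win, 0 otherwise."""
    current = player
    passes = 0
    while passes < 2:                    # the game ends when both players must pass
        moves = legal_moves(state, current)
        if not moves:
            passes += 1                  # current player cannot move
        else:
            passes = 0
            state = apply_move(state, random.choice(moves), current)
        current = -current               # switch player (players encoded as +1 / -1)
    return 1 if winner(state) == player else 0
```

In Reversi the pass rule works exactly this way, which is why the termination test counts two consecutive passes rather than an empty board.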
A. MCTS iteration steps
1) Selection: a node from the game tree is chosen based
on specified criteria. The value of each node is calculated
and the best one is selected. In this paper, the formula
used to calculate the node value is the Upper Confidence
Bound (UCB):

UCB_i = S_i / t_i + C * sqrt(log T / t_i)

Where:
S_i - number of successful simulations for node i
t_i - total number of simulations for node i
T - total number of simulations for the parent of node i
C - a parameter to be adjusted
Suppose that some simulations have been performed for a
node. First the average node value is taken, and then the
second term, which involves the total number of simulations
for that node and for its parent, is added. The first term
identifies the best node found so far in the analyzed tree
(exploitation), while the second term is responsible for
tree exploration: a node which has rarely been visited is
more likely to be chosen, because the value of the second
term is greater.
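The two terms can be illustrated with a small sketch (the default C = 1.0 here is an arbitrary illustrative choice, not the paper's tuned value):

```python
import math

def ucb(successes, visits, parent_visits, c=1.0):
    """UCB value of a node: exploitation term plus exploration term."""
    if visits == 0:
        return float('inf')              # unvisited nodes are selected first
    return successes / visits + c * math.sqrt(math.log(parent_visits) / visits)
```

For example, with 100 parent simulations a node visited only twice with one success scores higher than a node visited 98 times with 50 successes, even though their win rates are similar: the rarely visited node receives a larger exploration bonus.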
2) Expansion: one or more successors of the selected node
are added to the tree, depending on the strategy. This step
is not strictly defined; in our implementation we add one
node per iteration, but this number can differ.
3) Simulation: simulations are performed for the added
node(s) and the node values (successes, total) are updated.
In the CPU implementation, one simulation per iteration is
performed. In the GPU implementations, the number of
simulations depends on the number of threads and blocks and
on the method (leaf or block parallelism), e.g., 1024
simulations per iteration for a 4-block, 256-thread
configuration using the leaf parallelization method.

Figure 1. A single MCTS algorithm iteration's steps:
Selection, Expansion, Simulation, Backpropagation; repeated
until time runs out
4) Backpropagation: the parents' values are updated up to
the root node. The numbers are added, so that the root node
stores the total number of simulations and successes for all
of the nodes, and each node contains the sum of the values of
all of its successors. For the root/block-parallel methods,
the root node has to be updated by summing up results from
all other trees processed in parallel.
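The four steps above can be tied together in a compact sketch (a schematic sequential illustration, not the paper's implementation; `rollout` stands in for the simulation step and is assumed to return 1 for a success and 0 otherwise, and the fixed branching limit of 4 is an arbitrary illustrative choice):

```python
import math

class Node:
    def __init__(self, parent=None):
        self.parent, self.children = parent, []
        self.successes, self.visits = 0, 0

def ucb(node, c=1.0):
    if node.visits == 0:
        return float('inf')
    return (node.successes / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts_iteration(root, rollout, branching=4):
    # 1) Selection: descend by best UCB while the node is fully expanded
    node = root
    while len(node.children) == branching:
        node = max(node.children, key=ucb)
    # 2) Expansion: add one node per iteration (as in our implementation)
    if node.visits > 0:
        child = Node(parent=node)
        node.children.append(child)
        node = child
    # 3) Simulation: a (random) playout from the chosen node
    result = rollout()
    # 4) Backpropagation: update counts up to the root node
    while node is not None:
        node.visits += 1
        node.successes += result
        node = node.parent
    return result
```

Calling `mcts_iteration` in a loop until the time limit expires, then picking the most-visited child of the root, gives the basic sequential algorithm that the parallel schemes below build on.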
III. GPU IMPLEMENTATION
In the GPU implementation, 2 approaches are considered
and discussed. The first one (Figure 2a) is simple leaf
parallelization, where one GPU is dedicated to one MCTS tree
and each GPU thread performs an independent simulation from
the same node. Such a parallelization should provide much
better accuracy given the great number of GPU threads. The
second approach (Figure 2c) is the block parallelization
method proposed in this paper, which combines the two
aforementioned schemes. Root parallelism (Figure 2b) is an
efficient method of parallelizing MCTS on CPUs. It is more
efficient than simple leaf parallelization[3][4], because
building more trees diminishes the effect of being stuck in
a local extremum and increases the chances of finding the
true global maximum. Therefore, having n processors, it is
more efficient to build n trees rather than performing n
parallel simulations in the same node. Given that a problem
can have many local maxima, starting from one point and
performing a search might not be very accurate in the basic
MCTS case. Leaf parallelism should diminish this effect by
taking more samples from a given point. In root parallelism,
a single tree has the same properties as each tree in the
sequential approach, except that there are many trees and
the chance of finding the global maximum increases with the
number of trees. The last scheme, our proposed algorithm,
combines those two, so each search should be more accurate
and less local at the same time.
5) Leaf-parallel scheme: This is the simplest
parallelization method in terms of implementation. Here the
GPU receives a root node from the controlling CPU process
and performs n simulations, where n depends on the
dimensions of the grid (block size and number of blocks).
Afterwards, the results are written to an array in the GPU's
memory (0 = loss, 1 = victory) and the CPU reads the results
back.
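On the CPU side, gathering the leaf-parallel results then reduces to summing the result array copied back from the GPU. A schematic sketch (on the real hardware the array is filled by blocks × block_size CUDA threads simulating concurrently; here a plain loop stands in for them):

```python
def leaf_parallel_step(root_state, simulate, blocks, block_size):
    """Mimic one leaf-parallel GPU step: n = blocks * block_size simulations
    from the same root node, results stored in an array (0 = loss, 1 = win)."""
    n = blocks * block_size
    results = [simulate(root_state) for _ in range(n)]   # done in parallel on the GPU
    return sum(results), n       # successes and total, used to update the tree
```

For example, a 4-block, 256-thread configuration yields 1024 simulations per iteration, all credited to the single node being evaluated.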
Figure 2. An illustration of the considered schemes: (a)
leaf parallelism, n simulations; (b) root parallelism, n
trees; (c) block parallelism, n = blocks (trees) x threads
(simultaneous simulations)
Figure 3. Correspondence of the algorithm to the hardware:
the GPU comprises a fixed number of multiprocessors; a GPU
program is divided into a configurable number of blocks,
each consisting of a configurable number of threads grouped
into SIMD warps of 32 threads (fixed for current hardware).
Root parallelism maps to the block level, leaf parallelism
to the thread level, and block parallelism combines both.
Based on this, the obtained result is the same as in the
basic CPU version, except that the number of simulations is
greater and the accuracy is better.
6) Block-parallel scheme: To maximize the GPU's
simulation performance, some modifications had to be
introduced. In this approach the threads are grouped, and a
fixed number of them is dedicated to one tree. This method
is motivated by the hierarchical GPU architecture, where
threads form small SIMD groups called warps, which in turn
form blocks (Figure 3). It is crucial to find the best
possible job division scheme to achieve high GPU
performance. The trees are still controlled by the CPU
threads; the GPU only simulates. That means that at each
simulation step of the algorithm all the GPU threads start
and end simulating at the same time, and that there is a
sequential part of the algorithm which slightly decreases
the number of simulations per second as the number of blocks
grows. This is caused by the necessity of managing each tree
on the CPU. On the other hand, the more trees, the better
the results. In our experiments the smallest number of
threads used per tree is 32, which corresponds to the warp
size.
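The resulting thread-to-tree mapping can be sketched as follows (a schematic Python illustration; the `tree`/`slot` decomposition corresponds to what the CUDA built-ins blockIdx.x and threadIdx.x would give on the GPU):

```python
def block_parallel_layout(num_blocks, threads_per_block):
    """Map each flat GPU thread index to a (tree, simulation slot) pair:
    one tree per block, threads_per_block simultaneous simulations per tree."""
    assert threads_per_block % 32 == 0, "block size should be a multiple of the warp size"
    layout = {}
    for tid in range(num_blocks * threads_per_block):
        tree = tid // threads_per_block    # blockIdx.x on the GPU: which tree
        slot = tid % threads_per_block     # threadIdx.x on the GPU: which simulation
        layout[tid] = (tree, slot)
    return layout
```

With 4 blocks of 64 threads this gives 4 trees with 64 simultaneous simulations each, i.e. root parallelism across blocks combined with leaf parallelism within a block.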
A. Hybrid CPU-GPU processing
We observed that the trees formed by our algorithm using
GPUs are not as deep as the trees formed when CPUs and root
parallelism are used. This is caused by the time spent on
each GPU kernel execution: the CPU performs quick single
simulations, whereas the GPU needs more time but runs
thousands of threads at once. This would mean that the
results are less accurate, since the CPU tree grows faster
in the direction of the optimal solution. As a solution we
experimented with a hybrid CPU-GPU algorithm (Figure 4). In
this approach, the GPU kernel is called asynchronously and
control is given back to the CPU. The CPU then operates on
the same tree (in the case of leaf parallelism) or trees
(block parallelism) to increase their depth. This means that
while the GPU processes some data, the CPU repeats the MCTS
iterative process and checks for GPU kernel completion.

Figure 4. Hybrid CPU-GPU processing scheme: the asynchronous
kernel execution call returns control to the CPU, which can
work (expanding the tree) during the GPU kernel execution
time, until the GPU-ready event

Figure 5. Block parallelism vs. leaf parallelism, speed
(simulations/second vs. number of GPU threads; leaf
parallelism with block size 64, block parallelism with block
sizes 32 and 128)
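The overlap can be sketched with an asynchronous call (a schematic Python illustration using a background thread and an event to stand in for the asynchronous CUDA kernel launch and the GPU-ready event; the real implementation uses CUDA's asynchronous execution, not Python threading):

```python
import threading
import time

def hybrid_step(gpu_simulate, cpu_iteration):
    """Launch the GPU batch asynchronously, then keep iterating MCTS on the
    CPU on the same tree(s) until the 'kernel' signals completion."""
    result = {}
    done = threading.Event()

    def kernel():                        # stands in for the asynchronous kernel launch
        result['gpu'] = gpu_simulate()
        done.set()                       # stands in for the GPU-ready event

    threading.Thread(target=kernel).start()
    cpu_iters = 0
    while not done.is_set():             # CPU deepens the tree in the meantime
        cpu_iteration()
        cpu_iters += 1
    return result['gpu'], cpu_iters

# a long GPU batch overlapped with many short CPU iterations (sleeps stand in for work)
gpu_result, iters = hybrid_step(lambda: (time.sleep(0.05) or 512),
                                lambda: time.sleep(0.001))
```

The key point the sketch captures is that the CPU work is free: it happens entirely inside the window the GPU kernel would otherwise leave idle.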
We present the results here. Our test platform is the
TSUBAME 2.0 supercomputer, equipped with NVIDIA Tesla C2050
GPUs and Intel Xeon X5670 CPUs. We compare the speed
(Figure 5) and results (Figure 6) of leaf parallelism and
block parallelism using different block sizes; the block
size and the number of blocks correspond to the hardware's
properties. In these graphs a GPU player is playing against
one CPU core running sequential MCTS. The main observation
is that despite running fewer simulations in a given amount
of time, block parallelism yields much better results than
leaf parallelism, whose maximal winning ratio stops at
around 0.75 for 1024 threads (16 blocks of 64 threads). The
results are better when the block size is smaller (32), but
only while the number of threads is small (up to 4096, i.e.
128 blocks/trees); beyond that, the larger block case (128)
performs better. It can be observed in Figure 5 that as we
decrease the number of threads per block and at the same
time increase the number of trees, the number of simulations
per second decreases. This is due to the CPU's sequential
part.
In Figures 7 and 8 we also show a different type of
result, where the X-axis represents the current game step
and the Y-axis is the average point difference between the
two players.
In Figure 7 we observe that one GPU outperforms 256 CPUs
in terms of both intermediate and final scores. We also see
that the characteristics of the CPU and GPU results differ
slightly: the GPU is stronger at the beginning. We believe
this may be caused by the larger search space, and we
therefore conclude that the parallel effect of the GPU
weakens later in the game, as the number of distinct samples
decreases. Another reason is the aforementioned tree depth,
which is lower in the GPU case.
We also show that with our hybrid CPU/GPU approach both
the tree depth and the result improve as expected,
especially in the last phase of the game.

Figure 6. Block parallelism vs. leaf parallelism, final
result (win ratio vs. number of GPU threads; leaf
parallelism with block size 64, block parallelism with block
sizes 32 and 128)

Figure 7. GPU vs. root-parallel CPUs (point difference, our
score minus opponent's score, over game steps; 2 to 256
root-parallel CPUs and one GPU with block parallelism, block
size 128)
IV. CONCLUSION
We introduced an algorithm called block-parallelism which
allows Monte Carlo Tree Search to run efficiently on GPUs,
achieving results comparable with around a hundred CPU cores
(Figure 7). Block-parallelism is not flawless and not
completely parallel: since at most one CPU controls one GPU,
a certain part of the algorithm has to be processed
sequentially, which decreases the performance. We show that
block-parallelism performs better than leaf-parallelism on
the GPU and is probably the optimal solution unless the
hardware limitations change. We also show that using the CPU
and GPU at the same time yields better results. There are
challenges ahead, such as the unknown scalability and
universality of the algorithm. In Figure 9 we present
preliminary results of multi-GPU scaling using MPI.
V. FUTURE WORK
- Application of the algorithm to other domains. A more
general task can and should be solved by the algorithm.
- Scalability analysis. This is a major challenge and
requires analyzing a number of parameters and their effect
on the overall performance. Currently we have implemented
the MPI-GPU version of the algorithm, but the results are
inconclusive; there are several reasons why the scalability
can be limited, including Reversi itself.
Figure 8. Hybrid CPU/GPU vs. GPU-only processing (points and
tree depth over game steps)

Figure 9. Multi-GPU results based on the MPI communication
scheme (simulations/second and average point difference for
1 to 32 GPUs, 112 blocks x 64 threads)

ACKNOWLEDGEMENTS
This work is partially supported by the Core Research of
Evolutional Science and Technology (CREST) project "ULP-HPC:
Ultra Low-Power, High-Performance Computing via Modeling and
Optimization of Next Generation HPC Technologies" of the
Japan Science and Technology Agency (JST) and by a
Grant-in-Aid for Scientific Research of MEXT, Japan.
REFERENCES
[1] Monte Carlo Tree Search (MCTS) research hub,
http://www.mcts-hub.net/index.html
[2] Kocsis L., Szepesvari C.: Bandit based Monte-Carlo
Planning, Proceedings of the 15th European Conference on
Machine Learning, 2006
[3] Guillaume M.J-B. Chaslot, Mark H.M. Winands, and H. Jaap
van den Herik: Parallel Monte-Carlo Tree Search, Computers
and Games: 6th International Conference, 2008
[4] Rocki K., Suda R.: Massively Parallel Monte Carlo Tree
Search, Proceedings of the 9th International Meeting on High
Performance Computing for Computational Science, 2010
[5] Coulom R.: Efficient Selectivity and Backup Operators in
Monte-Carlo Tree Search, 5th International Conference on
Computers and Games, 2006
[6] Romaric Gaudel, Michèle Sebag: Feature Selection as a
One-Player Game, 2010
[7] Guillaume Chaslot, Steven de Jong, Jahn-Takeshi Saito,
Jos Uiterwijk: Monte-Carlo Tree Search in Production
Management Problems, 2006
[8] O. Teytaud et al.: High-dimensional planning with
Monte-Carlo Tree Search, 2008
[9] Maarten P.D. Schadd, Mark H.M. Winands, H. Jaap van den
Herik, Guillaume M.J-B. Chaslot, and Jos W.H.M (2008)
... Prior work on parallelization has given insights with regard to benefits of using today's multi-threaded processors to speed up MCTS [5]- [10]. However, due to its nature MCTS and the parallelization of it have been primarily studied for discrete state and action spaces. ...
... The parallelization of MCTS is a well researched topic. Most of this research focuses on the game of Go [5]- [10]. The baseline version of MCTS can be parallelized in are commonly referred to, namely leaf parallelization, root parallelization, and tree parallelization [6], [17]. ...
... 1) Mean: The most obvious aggregation is the mean simulation reward, which equals the cumulated sum of rewards over all time steps t over all threads ξ ∈ Ξ, see (10). ...
Preprint
Full-text available
Monte Carlo Tree Search (MCTS) has proven to be capable of solving challenging tasks in domains such as Go, chess and Atari. Previous research has developed parallel versions of MCTS, exploiting today's multiprocessing architectures. These studies focused on versions of MCTS for the discrete case. Our work builds upon existing parallelization strategies and extends them to continuous domains. In particular, leaf parallelization and root parallelization are studied and two final selection strategies that are required to handle continuous states in root parallelization are proposed. The evaluation of the resulting parallelized continuous MCTS is conducted using a challenging cooperative multi-agent system trajectory planning task in the domain of automated vehicles.
... Popular parallelization approaches include Leaf-Parallel MCTS (LeafP) [16] that parallelizes simulations at the same leaf node, and Root-Parallel MCTS (RootP) [17] that create multiple trees at different workers and aggregates their statistics of the subtrees before all the workers complete an MCTS step. [18], [19] proposed block parallelism -a combination of leafP and rootP for GPU acceleration. Recent work WU-UCT [4] proposes a variant of Tree-Parallel MCTS (TreeP), and shows that it maintains superior algorithm performance compared with the other parallelization approaches. ...
Preprint
Full-text available
Monte Carlo Tree Search (MCTS) methods have achieved great success in many Artificial Intelligence (AI) benchmarks. The in-tree operations become a critical performance bottleneck in realizing parallel MCTS on CPUs. In this work, we develop a scalable CPU-FPGA system for Tree-Parallel MCTS. We propose a novel decomposition and mapping of MCTS data structure and computation onto CPU and FPGA to reduce communication and coordination. High scalability of our system is achieved by encapsulating in-tree operations in an SRAM-based FPGA accelerator. To lower the high data access latency and inter-worker synchronization overheads, we develop several hardware optimizations. We show that by using our accelerator, we obtain up to $35\times$ speedup for in-tree operations, and $3\times$ higher overall system throughput. Our CPU-FPGA system also achieves superior scalability wrt number of parallel workers than state-of-the-art parallel MCTS implementations on CPU.
... Such tree expansion becomes computationally expensive when state-space complexity grows. For example, in Reversi the state-space complexity is 10 28 , for Tic-tac-toe it is 10 3 , and for Go it is 10 171 [7]. Thus, our main contribution of the work is to explore the potentials of both RL based motion planning and deterministic-based motion planning and how can we combine those to obtain fast and robust motion planning for quadtorors. ...
Preprint
Full-text available
Optimal trajectory planning involves obstacles avoidance in which path planning is the key to success in optimal trajectory planning. Due to the computational demands, most of the path planning algorithms can not be employed for real-time based applications. Model-based Reinforcement Learning approaches for path planning got certain success in the recent past. Yet, most of such approaches do not have deterministic output due to the nature of those approaches. We analyzed several types of reinforcement learning-based approaches for path planning. One of them is a deterministic tree-based approach and the other two approaches are based on Q-learning and approximate policy gradient, respectively. We tested preceding approaches on two different type of simulators. Each of which consists of a set of random obstacles which could be changed or moved dynamically. After analysing the result and computation time, we concluded that the deterministic tree search approach provides a highly accurate result. However, the computational time is considerably higher than the other two approaches. Finally, the comparative results are provided in terms of accuracy and computational time as evidence.
... One of them is the method that depends on the computational capability, so that method aims to discover the best moves with searching deeply and widely on search tree. Rocki [15] et. al. proposed to apply the TSUBAME 2.0 supercomputer to discovering the best moves based on Parallel Monte Carlo Tree Search. ...
Article
Full-text available
Recently, several studies have reported that the computer programs are called the game agent exceed to the ability of experts in some board games, e.g., Deep Blue, AKARA, AlphaGO, etc. Meanwhile, human beings have no advantages in terms of numerical ability compared with computers; however, experts often defeat those programs. For this, the aim of many researches for developing agents of board games is to defeat experts in all kinds of computational ways; hence, those depend on the computational capability because those apply deep look ahead search to determination of moves. By contrast to those researches, our final aims are the development of a board game agent does not require the high computational capability and of an “Enjoyable” game agent is tailored skills for a player based on “Simple structure and Algorithms.” To realize our aims, we propose to combine Self-Organizing Maps(SOM) with Reinforcement Learning. For more effective learning of the optimal moves of a board game, our proposal modifies the formula of SOM and introduces the tree search with less calculation load to determine moves in the closing stage. We conduct the two experiments; firstly, we examine the availability of our proposals. Secondly, we aim for improving the winning rate. From the results, the game agent that is developed on the basis of our proposal achieved a 60% winning rate against the opponent program by using the general personal computer. Moreover, those suggest the potential of becoming an “Enjoyable” game agent for every player with diverse skills.
... Depending on the number of parallel threads this can greatly increase the speed at which the win-rate statistics are gathered, however since each playout may take a different length of time, some efficiency is lost waiting for the longest playout to complete. Some implementations have moved these playouts to a GPU [24]. Unfortunately many of the playouts in leaf level parallelization are wasted due to their simultaneous nature: if the first eight playouts all are losses or very lowscoring, it is unlikely that the next eight will do any better, leading to an inherent limitation of this technique. ...
Article
Full-text available
Monte Carlo Tree Search (MCTS) is being effectively used in many domains, but acquiring good results from building larger trees takes time that can in many cases be impractical. In this paper we show that parallelizing the tree building process using multiple independent trees (root parallelization) can improve results when limited time is available, and compare these results to other parallelization techniques and to results obtained from running for an extended time. We obtained our results using MCTS in the domain of computer Go which has the most mature implementations. Compared to previous studies, our results are more precise and statistically significant.
... We will show the framework through a classic use case: the Tic-Tac-Toe game 27 . It is a combinatorial game that represents the group of problems to which MCTS has been most often applied 33 . Like Tic-Tac-Toe, combinatorial games have the following properties 34 :  Two players. ...
Article
Monte-Carlo methods are the basis for solving many computational problems using repeated random sampling in scenarios that may have a deterministic but very complex solution from a computational point of view. In recent years, researchers are using the same idea to solve many problems through the so-called Monte-Carlo Tree Search family of algorithms, which provide the possibility of storing and reusing previously calculated results to improve precision in the calculation of future outcomes. However, developers and researchers working in this area tend to have to carry out software developments from scratch to use their designs or improve designs previously created by other researchers. This makes it difficult to see improvements in current algorithms as it takes a lot of hard work. This work presents JGraphs, a toolset implemented in the Java programming language that will allow researchers to avoid having to reinvent the wheel when working with Monte-Carlo Tree Search. In addition, it will allow testing experiments carried out by others in a simple way, reusing previous knowledge.
... An active area of MCTS research related to our implementation is MCTS parallelization [44] [45]. In [46] the authors present three methods of parallelization which are meant to increase the speed of convergence of the MCTS algorithm. ...
Preprint
Mobile robots hold great promise in reducing the need for humans to perform jobs such as vacuuming, seeding, harvesting, painting, search and rescue, and inspection. In practice, these tasks must often be done without an exact map of the area and could be completed more quickly through the use of multiple robots working together. The task of simultaneously covering and mapping an area with multiple robots is known as multi-robot on-line coverage and is a growing area of research. Many multi-robot on-line coverage path planning algorithms have been developed as extensions of well-established off-line coverage algorithms. In this work we introduce a novel approach to multi-robot on-line coverage path planning based on a method borrowed from game theory and machine learning: Monte Carlo Tree Search. We implement a Monte Carlo Tree Search planner and compare completion times against a Boustrophedon-based on-line multi-robot planner. The MCTS planner is shown to perform on par with the conventional Boustrophedon algorithm in simulations varying the number of robots and the density of obstacles in the map. Its versatility is demonstrated by incorporating secondary objectives such as turn minimization while performing the same coverage task, which suggests it is well suited to the many multi-objective tasks that arise in mobile robotics.
Article
Optimal motion planning involves obstacle avoidance, and path planning is the key to success in optimal motion planning. Due to their computational demands, most path planning algorithms cannot be employed in real-time applications. Model-based reinforcement learning approaches to path planning have seen particular success in the recent past, yet most such approaches do not have deterministic output due to randomness. In this paper, we investigate existing reinforcement learning-based approaches for path planning and propose such an approach for path planning in 3D environments. One of these is a deterministic tree-based approach; the other two are based on Q-learning and approximate policy gradients, respectively. We tested these approaches on two different simulators, each of which contains a set of random obstacles that can be changed or moved dynamically. After analysing the results and computation time, we concluded that the deterministic tree-search approach provides highly stable results, although its computation time is considerably higher than that of the other two approaches. Finally, comparative results are provided in terms of accuracy and computation time.
Article
Robust planning under uncertainty is critical for robots in uncertain, dynamic environments, but incurs high computational cost. State-of-the-art online search algorithms, such as DESPOT, have vastly improved the computational efficiency of planning under uncertainty and made it a valuable tool for robotics in practice. This work takes one step further by leveraging both CPU and GPU parallelization in order to achieve real-time online planning performance for complex tasks with large state, action, and observation spaces. Specifically, Hybrid Parallel DESPOT (HyP-DESPOT) is a massively parallel online planning algorithm that integrates CPU and GPU parallelism in a multi-level scheme. It performs parallel DESPOT tree search by simultaneously traversing multiple independent paths using multi-core CPUs; it performs parallel Monte Carlo simulations at the leaf nodes of the search tree using GPUs. HyP-DESPOT provably converges in finite time under moderate conditions and guarantees near-optimality of the solution. Experimental results show that HyP-DESPOT speeds up online planning by up to a factor of several hundred in several challenging robotic tasks in simulation, compared with the original DESPOT algorithm. It also exhibits real-time performance on a robot vehicle navigating among many pedestrians.
Article
Antenna selection is a promising technology for reducing hardware complexity in massive multiple-input multiple-output (MIMO) systems. However, designing a near-optimal antenna selection algorithm with low search complexity remains a challenge. In this paper, we describe a self-supervised-learning Monte Carlo Tree Search (MCTS) method for the antenna selection problem in massive MIMO systems. The search for the antennas with maximal channel capacity is converted into a decision-making problem. Based on the system model of antenna selection, we map the components of the MIMO system to the basic elements of MCTS such as action, tree state, and reward. The three main search steps of MCTS, selecting, expanding, and backing up, are also redesigned for the antenna selection problem. To improve the search efficiency of the MCTS, we use a linear regression module to extract features from the channel state information (CSI) and feed its prediction to MCTS as a prior probability. Since the data and labels are generated by the MCTS process itself, the entire procedure can be considered self-supervised learning. According to the simulation results, the proposed method exhibits high search efficiency with near-optimal performance, achieving 40% and 15% improvements in outage capacity compared with random selection and greedy search selection, respectively. Its bit-error rate (BER) performance has about a 1-dB gain over the greedy search selection method. Compared with state-of-the-art capacity-optimal antenna search methods, the proposed self-supervised-learning MCTS-based algorithm reduces search complexity by about 50%.
Article
Classical search algorithms rely on the existence of a sufficiently powerful evaluation function for non-terminal states. In many task domains, the development of such an evaluation function requires substantial effort and domain knowledge, or is not even possible. As an alternative in recent years, Monte-Carlo evaluation has been successfully applied in such task domains. In this paper, we apply a search algorithm based on Monte-Carlo evaluation, Monte-Carlo Tree Search, to the task domain of production management problems. These can be defined as single-agent problems which consist of selecting a sequence of actions with side effects, leading to high quantities of one or more goal products. They are challenging and can be constructed with highly variable difficulty. Earlier research yielded an offline learning algorithm that leads to good solutions but requires a long time to run. We show that Monte-Carlo Tree Search reaches a solution in a shorter period of time than this algorithm, with improved solutions for large problems. Our findings can be generalized to other task domains.
Article
Monte Carlo Tree Search is a method for finding near-optimal solutions to large state-space problems. It is now very important to develop algorithms in such areas that can take advantage of large numbers of processors. In this paper, MCTS implementations parallelized over thousands of cores are presented and discussed. The parallelization method used is root parallelization, and the distributed scheme is implemented with MPI. The results presented are based on the rules of the game Reversi.
Conference Paper
For large state-space Markovian Decision Problems, Monte-Carlo planning is one of the few viable approaches to finding near-optimal solutions. In this paper we introduce a new algorithm, UCT, that applies bandit ideas to guide Monte-Carlo planning. In finite-horizon or discounted MDPs the algorithm is shown to be consistent, and finite sample bounds are derived on the estimation error due to sampling. Experimental results show that in several domains, UCT is significantly more efficient than its alternatives.
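The bandit-based selection rule that UCT applies at every tree node can be sketched as follows; the dictionary-based node representation and the value of the exploration constant `c` are illustrative assumptions, not details taken from the paper.

```python
import math

def uct_select(children, parent_visits, c=1.4):
    """Pick the child maximizing the UCB1 score: the empirical mean
    reward plus an exploration bonus that shrinks as the child
    accumulates visits."""
    def score(child):
        if child["visits"] == 0:
            return float("inf")              # always try unvisited children first
        exploit = child["wins"] / child["visits"]
        explore = c * math.sqrt(math.log(parent_visits) / child["visits"])
        return exploit + explore

    return max(children, key=score)

children = [
    {"move": "a", "wins": 6, "visits": 10},  # mean 0.60, well explored
    {"move": "b", "wins": 3, "visits": 4},   # mean 0.75, less explored
]
best = uct_select(children, parent_visits=14)
```

Here move b wins on both counts: a higher mean and a larger exploration bonus. In a full MCTS loop this rule is applied recursively from the root until an expandable node is reached.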
Conference Paper
Monte-Carlo Tree Search (MCTS) is a new best-first search method that started a revolution in the field of Computer Go. Parallelizing MCTS is an important way to increase the strength of any Go program. In this article, we discuss three parallelization methods for MCTS: leaf parallelization, root parallelization, and tree parallelization. To be effective, tree parallelization requires two techniques: adequate handling of (1) local mutexes and (2) virtual loss. Experiments in 13×13 Go reveal that in the program Mango root parallelization may lead to excellent results for a specific time setting and specific program parameters. However, as soon as the selection mechanism is able to handle the balance of exploitation and exploration more adequately, tree parallelization should receive attention too and could become a second choice for parallelizing MCTS. Preliminary experiments on the smaller 9×9 board provide promising prospects for tree parallelization.
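The virtual-loss technique mentioned above for tree parallelization can be sketched as follows; the `Node` class and its fields are an illustrative assumption. The idea is that a thread descending through a node temporarily books a loss there, so that concurrent threads selecting from the same parent see a lowered value and are steered toward other paths until the playout finishes.

```python
import threading

class Node:
    """Minimal tree node with per-node locking and virtual-loss support."""
    def __init__(self):
        self.lock = threading.Lock()
        self.visits = 0
        self.wins = 0
        self.virtual_losses = 0

    def add_virtual_loss(self):
        # Called while descending: pretend this path has already lost
        # once, discouraging other threads from following it meanwhile.
        with self.lock:
            self.virtual_losses += 1

    def update(self, won):
        # Called after the playout: remove the temporary loss and
        # record the real result.
        with self.lock:
            self.virtual_losses -= 1
            self.visits += 1
            self.wins += int(won)

    def value(self):
        # Win rate as seen by the selection step, including pending losses.
        with self.lock:
            n = self.visits + self.virtual_losses
            return self.wins / n if n else 0.0

node = Node()
node.visits, node.wins = 4, 3        # pretend 4 finished playouts, 3 wins
node.add_virtual_loss()              # another thread enters this node
pessimistic = node.value()           # 3 / (4 + 1) = 0.6 while the playout runs
node.update(won=True)                # that playout finished with a win
final = node.value()                 # 4 / 5 = 0.8
```

The per-node mutex corresponds to the "local mutexes" requirement from the abstract: it keeps the counters consistent without a single global lock over the whole tree.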
Article
This paper formalizes Feature Selection as a Reinforcement Learning problem, leading to a provably optimal though intractable selection policy. As a second contribution, this paper presents an approximation thereof, based on a one-player game approach and relying on the Monte-Carlo tree search UCT (Upper Confidence Tree) proposed by Kocsis and Szepesvari (2006). The Feature Uct SElection (FUSE) algorithm extends UCT to deal with i) a finite unknown horizon (the target number of relevant features); ii) the huge branching factor of the search tree, reflecting the size of the feature set. Finally, a frugal reward function is proposed as a rough but unbiased estimate of the relevance of a feature subset. A proof of concept of FUSE is shown on benchmark data sets.
Conference Paper
Monte-Carlo evaluation consists of estimating a position by averaging the outcome of several random continuations, and can serve as an evaluation function at the leaves of a min-max tree. This paper presents a new framework for combining tree search with Monte-Carlo evaluation that does not separate a min-max phase from a Monte-Carlo phase. Instead of backing up the min-max value close to the root and the average value at some depth, a more general backup operator is defined that progressively changes from averaging to min-max as the number of simulations grows. This approach provides fine-grained control of tree growth at the level of individual simulations, and allows efficient selectivity methods. The algorithm was implemented in a Go-playing program, Crazy Stone, which won the gold medal of the 9×9 Go tournament at the 11th Computer Olympiad.
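To make the "progressively changes from averaging to min-max" idea concrete, here is a toy interpolation in which a weight that grows with the simulation count shifts a node's backed-up value from the plain average of its children toward the max (the min-max limit for the player to move). This is NOT Crazy Stone's actual operator, only an illustration of the general idea; the blending schedule and constant `k` are invented.

```python
def blended_backup(child_values, n_simulations, k=100.0):
    """Toy backup operator: with few simulations the node value is
    close to the mean of the children (Monte-Carlo averaging); as
    simulations accumulate it approaches the max (min-max backup)."""
    avg = sum(child_values) / len(child_values)
    best = max(child_values)
    w = n_simulations / (n_simulations + k)   # 0 -> pure averaging, 1 -> pure min-max
    return (1.0 - w) * avg + w * best

values = [0.2, 0.5, 0.8]
early = blended_backup(values, n_simulations=10)      # close to the mean, 0.5
late = blended_backup(values, n_simulations=10000)    # close to the max, 0.8
```

The point of such an operator is that no explicit boundary between a "Monte-Carlo region" and a "min-max region" of the tree is needed; each node smoothly earns min-max treatment as its statistics become reliable.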
Guillaume M.J-B. Chaslot, Mark H.M. Winands, and H. Jaap van den Herik: Parallel Monte-Carlo Tree Search, Computers and Games: 6th International Conference, 2008
O. Teytaud et al.: High-dimensional planning with Monte-Carlo Tree Search (2008)
Maarten P.D. Schadd, Mark H.M. Winands, H. Jaap van den Herik, Guillaume M.J-B. Chaslot, and Jos W.H.M. (2008)