Review Article
Recent Advances in General Game Playing
Maciej Świechowski,1 HyunSoo Park,2 Jacek Mańdziuk,3 and Kyung-Joong Kim2
1Systems Research Institute, Polish Academy of Sciences, Ulica Newelska 6, 01-447 Warsaw, Poland
2Department of Computer Science and Engineering, Sejong University, Seoul, Republic of Korea
3Faculty of Mathematics and Information Science, Warsaw University of Technology, Ulica Koszykowa 75, 00-662 Warsaw, Poland
Correspondence should be addressed to Kyung-Joong Kim; kimkj@sejong.ac.kr
Received  January ; Revised  June ; Accepted  July 
Academic Editor: Billy Yu
Copyright © 2015 Maciej Świechowski et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The Scientific World Journal, Volume 2015, Article ID 986262, 22 pages. http://dx.doi.org/10.1155/2015/986262
The goal of General Game Playing (GGP) has been to develop computer programs that can perform well across various game types. It is natural for human game players to transfer knowledge from games they already know how to play to other similar games. GGP research attempts to design systems that work well across different game types, including unknown new games. In this review, we present a survey of recent advances (2011 to 2014) in GGP for both traditional games and video games. It is notable that research on GGP has been expanding into modern video games. Monte-Carlo Tree Search and its enhancements have been the most influential techniques in GGP for both research domains. Additionally, international competitions have become important events that promote and increase GGP research. Recently, a video GGP competition was launched. In this survey, we review recent progress in the most challenging research areas of Artificial Intelligence (AI) related to universal game playing.
1. Introduction
Games have always been an important platform for research on Artificial Intelligence (AI). Since the early days of AI, many popular board games, such as chess and checkers, have been used to demonstrate the potential of emerging AI techniques to solve combinatorial problems. Recently, some board games were declared nearly or completely solved (i.e., there are programs capable of playing a particular game optimally, and neither humans nor other computer programs can perform better) [,]. These programs are based on sophisticated tree-based search algorithms with well-designed evaluation functions, huge databases of game situations, and specially designed hardware chips. Although these programs have managed to reach world champion-level performance, it remains questionable whether they can match human-level game playing capabilities. In any event, the expansion from traditional board games to other types of complex games will continue to advance research on game AI problems.
Some research stresses the importance of human-style game playing instead of simply unbeatable performance []. For example, given a certain board configuration, human players usually do not check as many possible scenarios as computer players. However, human players are good at capturing patterns in very complex games, such as go [] or chess [,]. Generally, the automatic detection of meaningful shapes on boards is essential to successfully play games with large branching factors. The use of computational intelligence algorithms to filter out irrelevant paths at an early stage of the search process is an important and challenging research area. Finally, current research trends are attempting to imitate the human learning process in game play.
General Game Playing (GGP) was introduced to design game-playing systems with applicability to more than one specific game []. Traditionally, it is assumed that game AI programs need to play extremely well on a target game without consideration for the AI's General Game Playing ability. As a result, a world-champion level chess program, such as Deep Blue, has no idea how to play checkers or even a board game that only slightly differs from chess. This is quite opposite to humans' game-playing mechanism, which easily adapts to various types of games based on learning the rules and playing experience. In the context of GGP, the goal of an AI program is not to perfectly solve one
game but to perform well on a variety of different types of games, including games that were previously unknown. Such a goal requires a completely different research approach, which, in turn, leads to new types of competitions and general-purpose algorithms.
Unlike game-specific AI research, GGP assumes that the AI program is not tightly coupled to a game and, therefore, requires formal descriptions of games similar to game manuals for human players. The formal description of such games is the Game Description Language (GDL) []. It is a text-based, logic-based description of game rules that can be used to model a diverse array of games ranging from those as simple as Tic-Tac-Toe to those as complex as chess. GGP programs must be able to parse and understand the GDL file of a given game. Using the GDL, it is possible to define new games by slightly changing widely used common rules. This enables the definition of many games, a characteristic of GDL that is essential for measuring the performance of GGP programs. The use of GDL has become influential in GGP research through the introduction of the GGP competition. It has also led to the definition of its video game extension, the Video Game Description Language (VGDL).
Traditionally, GGP has focused primarily on two-dimensional board games inspired by chess or checkers, although several new approaches for General Video Game Playing (GVGP) have recently been introduced to expand the territory of GGP []. The goal of GVGP research is to develop computer algorithms that perform well across different types of video games. Compared with board games, video games are characterized by uncertainty, continuous game and action spaces, occasional real-time properties, and complex gaming rules. Researchers involved in GVGP have begun to define their own language, VGDL, which is the equivalent of GDL in GGP research []. Additionally, VGDL comes with a new type of competition []. This is a new field of research that bridges video game AI research and traditional GGP research.
Since its introduction, GGP research has continued to progress. The AAAI GGP competition has provided an internationally recognized venue for evaluating algorithms applied to GGP []. Based on the results of the competition, the progress in this research domain can be measured, and many promising techniques have emerged from the competition. Specifically, the use of Monte-Carlo Tree Search (MCTS) has been widely adopted in GGP research []. Recently, GGP has expanded to other video games, including grid-style two-dimensional video games and Atari video games. For this review, we focus on advances in GGP research since 2011.
This paper is organized as follows. Section 2 describes advances in the MCTS method, the state-of-the-art GGP approach, with particular focus on Monte-Carlo (MC) simulation control mechanisms. The MCTS algorithm is very well suited to the GGP domain due to its general applicability, because various games can be encountered during GGP tournaments. However, the main disadvantage of MCTS is that it makes very limited use of game-related knowledge, which may be inferred from a game description. Section 3 addresses methods that are rooted in AI and take advantage of game-specific information. In Section 4, recent advances in game rules representations and parallelization of MCTS, both of which are critical aspects of building efficient tournament-level players, are described. In Section 5, the use of GGP techniques for video games is introduced and promising research platforms are discussed. Section 6 reviews the acceleration of GGP research through international competitions. Finally, the paper is concluded with a discussion about challenges and future directions.
2. GGP-Related Advances in MCTS
2.1. MCTS Overview. Monte-Carlo Tree Search (MCTS) is the algorithm of choice for the most competitive General Game Playing agents. For a survey of MCTS, please consult []. The authors of that survey aimed to provide exhaustive knowledge about the algorithm, including its origins, mathematical foundations, the structure of the method, and numerous enhancements. A simple description of how MCTS is applied by GGP players can be found in [] for a player named Gamer. The algorithm iteratively searches the game tree, starting from the current state, in a series of iterations until the allotted time runs out. An iteration consists of the following four steps: selection, expansion, simulation, and back-propagation, depicted in Figure 1 (a minimal code sketch of these steps is given after the list).
(1) Selection Step. The algorithm starts from the root of the game tree and chooses a node within the already built part of the tree based on the nodes' statistics. Actions which have been performing better so far are tested more frequently. Typically, some kind of confidence algorithm such as Upper Confidence Bounds applied to Trees (UCT) is used, as shown in (1). The UCT algorithm is an extension of the flat Upper Confidence Bounds (UCB). Consider
$$a^{*} = \operatorname*{arg\,max}_{a \in A(s)} \left\{ Q(s,a) + C \sqrt{\frac{\ln N(s)}{N(s,a)}} \right\}, \qquad (1)$$
where $s$ is the current state, $a$ is an action in this state, $A(s)$ is the set of actions available in state $s$, $Q(s,a)$ is an assessment of performing action $a$ in state $s$, $N(s)$ is the number of previous visits of state $s$, $N(s,a)$ is the number of times action $a$ has been sampled in state $s$, and $C$ is the exploration ratio constant.
(2) Expansion Step. This means extending the tree by a new node with the first so-far-unvisited state, that is, the first state found after leaving the tree.
(3) Simulation Step. After leaving the stored fragment of the tree, a random simulation is performed until a game termination is reached.
(4) Back-Propagation Step. The scores obtained by all players in the ended game are fetched and back-propagated to all nodes visited in the selection and expansion steps.
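The following is a minimal, self-contained Python sketch of these four steps written for a toy single-player game; the game interface (legal_actions, apply, is_terminal, score) and all constants are illustrative assumptions and not part of any particular GGP player.

    import math, random

    class CountToTen:
        # Toy game used only to exercise the sketch: add 1 or 2 per move;
        # reaching exactly 10 scores 100, overshooting scores 0.
        def __init__(self, total=0):
            self.total = total
        def legal_actions(self):
            return [1, 2]
        def apply(self, action):
            return CountToTen(self.total + action)
        def is_terminal(self):
            return self.total >= 10
        def score(self):
            return 100 if self.total == 10 else 0

    class Node:
        def __init__(self, state, parent=None, action=None):
            self.state, self.parent, self.action = state, parent, action
            self.children = []
            self.untried = [] if state.is_terminal() else state.legal_actions()
            self.visits, self.value = 0, 0.0      # N(s) / N(s,a) and running mean Q

    def uct_select(node, c=40.0):
        # Selection: maximize Q(s,a) + C * sqrt(ln N(s) / N(s,a)), cf. (1).
        return max(node.children,
                   key=lambda ch: ch.value + c * math.sqrt(math.log(node.visits) / ch.visits))

    def mcts(root_state, iterations=2000):
        root = Node(root_state)
        for _ in range(iterations):
            node = root
            # 1. Selection: descend while the node is fully expanded and non-terminal.
            while not node.untried and node.children:
                node = uct_select(node)
            # 2. Expansion: add one child for a not-yet-visited action.
            if node.untried:
                action = node.untried.pop()
                child = Node(node.state.apply(action), parent=node, action=action)
                node.children.append(child)
                node = child
            # 3. Simulation: play random moves until a terminal state is reached.
            state = node.state
            while not state.is_terminal():
                state = state.apply(random.choice(state.legal_actions()))
            result = state.score()
            # 4. Back-propagation: update statistics along the selection/expansion path.
            while node is not None:
                node.visits += 1
                node.value += (result - node.value) / node.visits
                node = node.parent
        return max(root.children, key=lambda ch: ch.visits).action

    print(mcts(CountToTen()))   # prints 1 or 2; both keep the exact target of 10 reachable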
Because the origins of Monte-Carlo methods lie in statistical physics, and those of the UCT selection algorithm in the optimization of a multiarmed bandit payoff (gambling math), the success of this approach in games has been surprising.
e Scientic World Journal
Selection Expansion Simulation Back-propagation
Run continuous
l
y in t
h
e a
ll
otte
d
time
F : Four steps of the Monte-Carlo Tree Search algorithm.
signicant amount of publications in the area of MCTS, but in
this section we will focus only on papers related to the GGP.
e main reason why this method has been so successful in
a domain of universal game-playing programs is that it does
not require any game-specic knowledge such as heuristic
evaluation function for the assessment of position. e only
requirement is to be able to simulate a game and read the
results. Moreover, the MCTS is an anytime algorithm that
can be stopped at any time and return the best move so
far. It parallelizes and scales well as opposed to alpha-beta-
like methods, which provide only linear improvement with
exponential growth of the tree. A link between game-tree
properties and performance of the MCTS can be found in
[]. e authors analyze such properties as follows:
(i) Branching factor: the average number of possible moves in a state, impacting the tree width.
(ii) Tree depth: connected to the average length of a simulation from the beginning of a game to the end.
(iii) Progression towards a natural termination: each move naturally brings the state closer to a terminal one. Examples of naturally progressive games given in the paper are Connect 4, Othello, and Quarto. Often, a natural termination is featured in games where players fill a board and pieces once placed do not disappear; therefore the board eventually fills up completely. On the other hand, games without a natural termination could often go on infinitely long without an artificial termination condition such as a maximum number of steps or a maximum number of state repetitions. Examples of such games are Chess, Skirmish, and Bomberman.
(iv) Existence of optimistic moves: moves which look immediately good, that is, win the game or give a good result for the player within a few steps, provided that the opponent does not see the proper response (hence optimistic). However, if the opponent makes the right response, it usually puts him ahead of the player who made the optimistic move. Optimistic moves usually exist in games where it takes many simulations to find the correct response, compared to finding the seemingly good move.
It was found in [] that, when comparing branching factor against tree depth, neither of the two influences MCTS performance more than the other; the effect depends more on the actual rules of the game being played. Both a larger branching factor and a deeper tree slow down the process and render the MCTS assessment less accurate. Progression towards a natural termination increases the performance of MCTS, whereas the existence of optimistic moves decreases it.
2.2. Reducing the Combinatorial Complexity. The UCB algorithm was designed to work with bandits giving payoffs stochastically according to some unknown distribution. The payoff function is continuous within certain bounds. In GDL-I, while there is still randomness by means of uncertainty about which actions the other players will choose, the games are deterministic by structure.
2.2.1. Sufficiency Threshold. In [], the authors propose two optimizations, known as the moving average return function and the sufficiency threshold, to exploit the nature of games in GGP, which are characterized by determinism and a fixed number of scores available in every game. The general idea is to allocate more simulations to actions whose evaluation is not clearly converging to a score defined in the game. A second idea is presented to distinguish between two similarly evaluated best moves. In such a case, it is beneficial to allocate the whole budget to just one of these two moves. If the estimation stays high or even increases, then the move should be played. Otherwise, the second, not well-simulated one, should be played. The sufficiency threshold has been introduced by the same authors in both [,] to optimize the allocation of simulations and tackle the susceptibility of the MCTS technique to choosing optimistic moves. It is defined as a parameter $T$, which affects the exploration constant $C$ in the UCT formula introduced in (1):
$$C = \begin{cases} C, & \text{when all } Q(s,a) \le T, \\ 0, & \text{when any } Q(s,a) > T. \end{cases} \qquad (2)$$
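As an illustration, the following is a minimal sketch of how the sufficiency threshold of (2) can be plugged into the selection step; the function name, the base exploration constant, and the threshold value are illustrative assumptions.

    def exploration_constant(q_values, base_c=40.0, threshold=80.0):
        # Sufficiency threshold, cf. (2): once any action's estimate exceeds T,
        # exploration is switched off (C = 0) so the remaining simulation budget
        # concentrates on verifying the seemingly sufficient move.
        return 0.0 if any(q > threshold for q in q_values) else base_c

    # usage with the earlier MCTS sketch:
    # c = exploration_constant([ch.value for ch in node.children])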
e Scientic World Journal
if not useEarlyCutoff then
    return false
end if
if playoutSteps < minimumSteps then
    return false
end if
if isGoalStable() then
    // cutoff point: cut ← firstGoalChange + numPlayers
    return playoutSteps ≥ cut
end if
if hasTerminalInterval() then
    // cutoff point: cut ← firstTerminal + a fixed fraction of terminalInterval
    return playoutSteps ≥ cut
end if

Algorithm 1: Pseudo-code for deciding cuts for the early cutoff extension. It was taken from [].
2.2.2. Moving Average Return Function. The moving average return function, introduced in [], increases the importance of newer simulation results, as they are performed with more information and are more accurate about how the match can unfold. Let $R$ denote the result from a simulation and let $\alpha$ be the inverse of the number of simulations for small numbers of simulations and an arbitrary constant after reaching that threshold. The score update function becomes
$$Q(s,a) = Q_{\text{old}}(s,a) + \alpha \left( R - Q_{\text{old}}(s,a) \right). \qquad (3)$$
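A one-line sketch of the update in (3); the clamping constant that fixes the step size after the initial phase is an illustrative assumption.

    def marf_update(q_old, result, n_simulations, min_step=0.05):
        # Moving average return function, cf. (3): an ordinary running mean at first,
        # then a constant step size so newer, better-informed simulations keep a
        # fixed influence on Q(s, a).
        alpha = max(1.0 / n_simulations, min_step)
        return q_old + alpha * (result - q_old)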
2.2.3. Early Cutoff. The paper [] brings two new extensions to MCTS. The first one is applied in the simulation step and is called early cutoff. The idea is to terminate a simulation earlier, as opposed to running it till the end, in order to save computation time and perform more simulations. The cutoff is based on two conditions: the depth from the starting state and goal stability. The goal is stable if it can be computed in nonterminal states, changes with low variance, and is correlated with the situation in the game. The notion of goal stability was borrowed from earlier papers in GGP []. The pseudo-code for the early cutoff is given in Algorithm 1.
2.2.4. Unexplored Action Urgency. The second extension is Unexplored Action Urgency, in which there is no longer a requirement to select each action at least once. The urgency of an unexplored action is defined as follows:
$$\text{urgency} = 50 + \sqrt{\ln N(s)} \cdot \text{discount}, \qquad (4)$$
where $N(s)$ is the number of visits to a state and discount is the number of unexplored actions divided by the total number of actions available in the state.
Now, if any action's UCT value is higher than or equal to the urgency value, that action is chosen. Otherwise, the first unexplored action is chosen instead. The idea is to have the MCTS fringe simulated better than in the regular algorithm.
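A sketch of this rule; the urgency formula follows the reconstruction in (4), and the helper names and data layout are illustrative assumptions.

    import math

    def pick_with_urgency(explored_uct, unexplored, state_visits, n_total_actions):
        # explored_uct: dict action -> current UCT value; unexplored: actions not yet
        # tried in this node. cf. (4).
        if not unexplored:
            return max(explored_uct, key=explored_uct.get)
        discount = len(unexplored) / n_total_actions
        urgency = 50.0 + math.sqrt(math.log(max(state_visits, 1))) * discount
        if explored_uct:
            best = max(explored_uct, key=explored_uct.get)
            if explored_uct[best] >= urgency:
                return best
        return unexplored[0]   # otherwise take the first unexplored action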
The enhancements of Early Cutoff and Unexplored Action Urgency, introduced in [], were further included in the PhD thesis []. This thesis is also a rich source of information about CadiaPlayer.
2.3. Simulation Control Enhancements. Most of the original contributions to MCTS can be divided into three categories based on the area where they are applied: the selection step, the simulation step, or both. The last category was investigated in [] by the authors of CadiaPlayer.
2.3.1. RAVE. RAVE stands for Rapid Action Value Estimation. It was first proposed for Go, but it was included in the CadiaPlayer authors' paper [] for comparison and for testing synergy when combined with other methods in GGP. A recent paper on RAVE, from which the GGP implementations stem, is []. The aim of applying RAVE is to make the learning process faster, especially at the beginning, when the tree exploration is more chaotic. In this method, every action in the tree keeps an additional RAVE value $Q_{\text{RAVE}}$, which is updated every time the same action is played inside a simulation (not necessarily in the same state) and propagated up the tree as in the main method. In contrast to the main method, here many actions have a chance to propagate their values, not only the one which started a simulation. The obtained RAVE evaluations are linearly weighted in the UCT formula with the regular assessment as follows:
$$\beta(s) \cdot Q_{\text{RAVE}}(s,a) + \left(1 - \beta(s)\right) \cdot Q(s,a), \quad \beta(s) = \sqrt{\frac{k}{3 N(s) + k}}, \qquad (5)$$
where $N(s)$ is the number of visits to a state and $k$ is the equivalence parameter constant.
The RAVE enhancement increases results slightly or significantly in most of the tested games, with the notable exception of Skirmish.
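A sketch of the blend in (5); the equivalence parameter value is an illustrative assumption.

    import math

    def rave_blended_value(q, q_rave, state_visits, k=1000.0):
        # beta(s) starts near 1 and decays towards 0 as N(s) grows, so the rapid
        # (RAVE) estimate dominates young nodes and the regular UCT estimate takes
        # over later, cf. (5).
        beta = math.sqrt(k / (3.0 * state_visits + k))
        return beta * q_rave + (1.0 - beta) * q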
An interesting idea for an incremental improvement of the search algorithm is presented in []. The authors revisit the concept of Rapid Action Value Estimation, which is very game-dependent in terms of efficiency. They show that, with RAVE turned on, the results of games are shifted in a nearly linear way. In some games, for which RAVE is suitable, the shift is beneficial, whereas for others it is detrimental. The solution is to detect online whether it is worth using RAVE. Details of the algorithm are not included. However, the idea is to use RAVE only in subtrees of nodes where there is a correlation between the non-RAVE scores and the RAVE predictions. The RAVE value must fall within a margin outside of which moves are considered as being too optimistic or too pessimistic.
2.3.2. MAST, TO-MAST, PAST, and FAST. Four enhancements under the category of "simulation control" are investigated in the article [] and in the PhD thesis of one of the authors [].
The enhancements are as follows:
(i) Move-Average Sampling Technique (MAST): a standard UCT plus a lookup table of action assessments stored independently of the states they were played in during simulations. This enhancement is called the History Heuristic. In [], historically good actions bias future simulations according to Gibbs sampling (the Boltzmann distribution).
(ii) Tree-Only MAST (TO-MAST): the same as MAST, but statistics are updated only for actions within the constructed part of the UCT tree.
(iii) Predicate-Average Sampling Technique (PAST): the same as MAST, but here the statistics are gathered and used not only for actions but also for pairs (predicate, action), where predicates build the game states. The evaluation of a state is aggregated using the max operator over all the contained predicates.
(iv) Features-to-Action Sampling Technique (FAST): a template for the most typical way of encoding cells and pieces in GDL games is used. If it successfully detects that such objects are present in a game, then the system learns the importance of particular pieces and cells using the TD(λ) algorithm. Then an evaluation function based on a linear weighting of features and their corresponding importance is constructed. The function is used in a similar fashion as actions in MAST (it evaluates actions by their resulting states) to bias the simulation according to Gibbs sampling.
All the proposed optimizations are empirically tested with and without the usage of RAVE. It is shown that various combinations provide significant benefits for certain games. The results of RAVE/MAST and RAVE/PAST were identified as the most promising ones.
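The following sketch illustrates MAST-style simulation control: a state-independent action table biases playout moves through the Gibbs (Boltzmann) distribution. The temperature and the default score for unseen actions are illustrative assumptions.

    import math, random

    def mast_choose(legal_actions, mast_avg, tau=10.0, default=50.0):
        # Draw a playout move with probability proportional to exp(score / tau).
        weights = [math.exp(mast_avg.get(a, default) / tau) for a in legal_actions]
        return random.choices(legal_actions, weights=weights, k=1)[0]

    def mast_update(mast_avg, mast_cnt, actions_played, result):
        # After a simulation, fold its result into the global average of every action
        # that occurred in it, regardless of the state it was played in.
        for a in actions_played:
            mast_cnt[a] = mast_cnt.get(a, 0) + 1
            mast_avg[a] = mast_avg.get(a, 0.0) + (result - mast_avg.get(a, 0.0)) / mast_cnt[a]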
2.3.3. N-Grams and Last-Good Reply Policy. Another couple of enhancements to the MCTS simulation phase are described in []. The first one is the N-gram Selection Technique (NST), which extends MAST. The average players' rewards and the numbers of visits are stored here not just for actions but for longer sequences of actions called N-grams. During a game, the authors maintain sequences of lengths 1, 2, and 3 with their respective statistics. A sequence of length 1 is equivalent to the regular history heuristic. Due to maintaining longer sequences, actions are evaluated in specific contexts. The statistics are used during a playout, where the simulated player checks the database of stored sequences (starting from the longest ones) for the possibility of reconstructing a particular sequence after choosing a candidate action. Actions leading to the best evaluated sequence are chosen more often. Both Gibbs sampling and ε-greedy methods are tested. The latter outperformed the former in the empirical experiments. Only sequences of actions that appear at least a certain minimum number of times affect the simulation phase, to minimize randomly occurring noise; the authors fixed this threshold in their experiments. The second enhancement presented in [] is the Last-Good Reply Policy (LGRP), which had already been successful in Go and Havannah. The idea is to store the best countermove for a preceding move. The best countermove is defined as the one resulting in the highest reward among all players. For each move, only one best-reply move is stored, and every new one overwrites the existing one. The LGRP is used to rank the unexplored actions in the selection step and in the simulation step. Both enhancements were tested independently and in combination with other enhancements using a number of games. The best players were using either NST or LGR with NST as the fallback strategy. Both enhancements improve the performance of the baseline CadiaPlayer in certain games.
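A sketch of the Last-Good Reply bookkeeping; the reward criterion used to decide when a reply counts as "good" is an illustrative assumption.

    def lgr_update(last_good_reply, move_sequence, reward, good_threshold=100):
        # Store move -> reply pairs from simulations that ended with a sufficiently
        # high reward; a newer good reply simply overwrites the stored one.
        if reward < good_threshold:
            return
        for prev_move, reply in zip(move_sequence, move_sequence[1:]):
            last_good_reply[prev_move] = reply

    def lgr_choose(last_good_reply, prev_move, legal_actions, fallback_policy):
        # Use the remembered reply when it is legal here; otherwise fall back,
        # for example to NST or a plain random choice, as described in the text.
        reply = last_good_reply.get(prev_move)
        return reply if reply in legal_actions else fallback_policy(legal_actions)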
2.3.4. Decaying Strategies. The two mentioned simulation-control strategies, that is, the N-gram Selection Technique and the Move-Average Sampling Technique, were further optimized by using them with a certain decay factor []. Decay is a process of decreasing the importance of older statistics under the assumption that they are more likely to be outdated or to have been gathered outside the most current area of exploration in the UCT tree. The results are simply multiplied by a factor $\gamma \in [0,1]$. Three decaying methods were investigated, called Move Decay (after a move is made in the game), Batch Decay (after a fixed number of simulations), and Simulation Decay (after each simulation, but only for N-grams and Last-Good Replies which occurred in the simulation). The authors also tested a combination of Move Decay with Simulation Decay. All of them improved the performance of the respective simulation-control strategies, and Move Decay, with suitably tuned decay factors, was the best overall for the games used in this experiment.
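A sketch of Move Decay applied to such statistics; the decay factor value is illustrative.

    def decay_statistics(reward_sums, visit_counts, gamma=0.5):
        # Multiply accumulated rewards and visit counts by gamma after every real
        # move: averages are preserved, but old data weighs less in future updates.
        for key in reward_sums:
            reward_sums[key] *= gamma
            visit_counts[key] *= gamma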
2.3.5. Simulation Heuristics. An approach to optimizing the simulation phase by adopting various light-weight heuristics is investigated in MINI-Player []. The authors propose six policies, called strategies in the paper, which are used with a certain probability at each step of a simulation to pick moves for the players. Each simulation is driven by exactly one strategy (per player). The following simulation-based heuristics are proposed:
(i) Random (R): the baseline MCTS policy; fast and unbiased.
(ii) History Heuristic (HH): an established enhancement to MCTS is used here as a stand-alone simulation heuristic. Actions which are globally good (i.e., independently of a particular state) are chosen more frequently. The action-score statistics are updated after each simulation (not only those driven by HH).
(iii) Mobility (M): actions leading to states in which our player has more move options relative to the other players are favored.
(iv) Approximate Goal Evaluation (AGE): the authors propose a way of calculating a partial degree of satisfaction of a GDL goal rule. The idea is based on traversing a proof tree, called the AND-OR tree, in a recursive manner. Two types of values, the actual degree of satisfaction and a tiebreaker, are calculated and propagated bottom-up in the tree. The formula is applied to all goal rules with the highest score available to each player. AGE will choose the action which leads to a state maximizing the goal score. The idea of AGE was inspired by FluxPlayer [], but in [] the realization is vastly different on both the conceptual and technical levels.
(v) Exploration (E): this strategy introduces a measure of similarity, and thus of difference, between any two game states. First, for each action, the E strategy will look at its after-state and pick the most similar state to that after-state among the states visited before. Then, the chosen action will be the one that maximizes the difference between those two states among all available actions.
(vi) Statistical Symbol Counting (SSC): this strategy relies on building a simple evaluation function during the START CLOCK. The number of facts of each type and the quantities of each symbol appearing at a certain position index in the facts are the building blocks of the evaluation function. All these quantities are tested for correlation with the game score and assigned proportional weights. Quantities which do not change are discarded. The strategy is a simplified version of a stand-alone player discussed in [].
The strategies are evaluated online, independently for each player, in such a way that the ones which perform statistically better have a higher probability of being chosen in subsequent simulations. Three methods for heuristic evaluation were tested, and the UCB algorithm was concluded to be the most suitable one.
Another contribution of [] was a modified formula for choosing the action to play. A move is decided based on statistics gathered in the top two levels of the tree. The formula resembles a shallow min-max if our player has only one available move (min case) or each of the opponents has exactly one available move (max case). For the remaining cases, the quality of an action is computed by a linear interpolation, with a weight of 0.5, between its regular score and the minimal score among the action's child nodes. Actions leading to terminal states, for which there are no more nodes in the tree, have their average score multiplied by a fixed constant.
Although using heavier playouts results in a smaller number of simulations per second, the approach improved the baseline performance of the player performing only random simulations in the majority of the games tested in []. Moreover, the agent equipped with the simulation heuristics achieves a higher average score across the domain of tested games.
2.4. Simultaneous Moves. While the realization of an MCTS agent for the case of alternate-turn games is straightforward, things get more complicated for truly simultaneous games. In such games, the algorithm has to choose, during the selection phase, actions for each player, and more than one player can have more than one action in a state. This can be seen as a multicriteria optimization. In addition, such games are usually much more complex due to the higher effective branching factor, which comes from the multiplication of the average numbers of legal actions of each player. This problem was undertaken in [], where the following methods were tested to deal with simultaneous moves in MCTS/UCT:
(i) Decoupled UCT (DUCT): each player stores separate rewards and visit counts in their tree. Actions are chosen as if there were no joint-move dependency.
(ii) Exp3: each player stores rewards and visit counts for their own moves, but the score of each move is scaled by the probability of it having been sampled.
(iii) Regret Matching: a regret matrix is maintained by each player, storing cumulative regrets for playing an action instead of another one. The chosen move minimizes the regret.
(iv) Sequential UCT (SUCT): the game is virtually transformed into a sequential one, where players choose actions one after another and the preceding choices are known to the subsequent players, so they can respond accordingly.
The authors conclude that DUCT, with the highest overall percentage of won games, seems to be the safest choice, but SUCT is not far behind. Regret Matching does not perform well in general, but there is one identified game where it outperforms the other methods significantly.
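A sketch of decoupled selection (DUCT) at a simultaneous-move node; the data layout and constants are illustrative assumptions.

    import math

    def duct_pick_joint_move(per_player_stats, node_visits, c=40.0):
        # per_player_stats: one dict per player mapping action -> (q, n).
        # Each player runs an independent UCT selection; the joint move is the tuple
        # of the individual picks, ignoring any dependency between players' choices.
        def uct_value(entry):
            q, n = entry
            return float("inf") if n == 0 else q + c * math.sqrt(math.log(node_visits) / n)
        return tuple(max(stats, key=lambda a: uct_value(stats[a])) for stats in per_player_stats)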
2.5. Alternatives to UCT
2.5.1. Roulette Wheel Selection. The possibility of replacing the UCT algorithm in the selection phase by roulette wheel selection was investigated in the master's thesis []. The roulette wheel selector is applied there in the most straightforward way. First, the total score over all the average scores of actions is computed. Then each action, in the order of appearance, is assigned a subinterval of [0, 1] starting at the end of the previous action's interval, of length equal to the score of the action divided by the total score. As an example, consider five actions $a_1$ to $a_5$ with their respective intervals:
$$\{a_1 \to [0, 0.05],\; a_2 \to [0.05, 0.2],\; a_3 \to [0.2, 0.25],\; a_4 \to [0.25, 0.5],\; a_5 \to [0.5, 1.0]\}.$$
Next, a number called MoveSelector, from 0 to 1.0, is randomly generated. Finally, the first action whose interval's right endpoint is greater than or equal to the generated MoveSelector (i.e., the interval containing it) is chosen.
In addition, one-move wins and losses are handled separately. A one-move win is preferred over the roulette selection, and in the case of a one-move loss a random move is chosen instead (probably to avoid a division by zero). This alternative approach to balancing the exploration versus exploitation ratio was tested only in two simple games: Tic-Tac-Toe and Nim. The resulting player was not significantly better or worse than the UCT one.
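A sketch of such a selector under the cumulative-interval reading described above; helper names are illustrative.

    import random

    def roulette_choose(actions, avg_scores):
        # Each action owns a sub-interval of [0, 1] proportional to its average score;
        # a uniform draw (the "MoveSelector") picks the interval it falls into.
        total = sum(avg_scores[a] for a in actions)
        if total <= 0:
            return random.choice(actions)       # degenerate case: uniform choice
        selector, upper = random.random(), 0.0
        for a in actions:
            upper += avg_scores[a] / total      # right endpoint of this action's interval
            if selector <= upper:
                return a
        return actions[-1]                      # guard against rounding error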
2.5.2. TD-UCT. One of the recent publications [] concerns combining the UCT score with an evaluation obtained from a Temporal Difference (TD) algorithm. Three ways of aggregating the TD values are proposed:
(i) TD-UCT Single Backup: the algorithm omits bootstrapping and updates the TD values in the back-propagation phase only from the selected leaf node up to the root. There are two weighting parameters: the distance to the selected leaf node and the distance to the terminal state of the performed simulation.
(ii) TD-UCT Weighted Rewards: the simplest variant, in which the TD evaluation completely replaces the value estimate in the UCT algorithm. Rewards are weighted by the number of steps to the terminal state.
(iii) TD-UCT Merged Bootstrapping: the most complex variant. It combines TD-UCT Single Backup with fully fledged bootstrapping (updating states according to the value of the next state).
The authors chose two variants of Gomoku and three other games and report improvement over the plain UCT performance in all the tested variants. However, the first two variants do not perform well when combined with other well-known UCT enhancements such as RAVE or AMAF. The third variant, TD-UCT Merged Bootstrapping, is shown to combine with them successfully, leading to even better results.
3. AI-Based Approaches
3.1. Overview. The competitive side of General Game Playing has been dominated by the Monte-Carlo Tree Search and its optimizations, but this does not mean that methods having roots in more classical AI have been given up. We start our survey of this topic with a summary of achievements related to computational intelligence in GGP []. This work includes a historical overview of four pre-GGP attempts to create multipurpose playing programs. A summary of the first three GGP competition winners is given: ClunePlayer [], FluxPlayer [], and CadiaPlayer []. The paper contains some remarks about the possibility of adopting CI methods in GGP, as well as the authors' recent work on constructing a general state evaluation function. That approach is largely based on [] and extended in [], so we devote a separate paragraph to it.
3.2. GDL and Features. When designing programs to play a specific game, one of the common tasks is to identify characteristic features of the game. Features can encode higher-level properties of a state or can be the building blocks of which the game state is composed. Such features help to determine whether a state is good or bad and are usually used by top players in their play. In General Game Playing, no universal high-level features exist, and therefore they have to be learned online. Several articles have been published to tackle this issue. The approach of [] is based on how certain game elements are typically encoded in GDL and what the features of a heuristic evaluation function derived from those GDL expressions can be. The considered features are as follows:
(i) Solution cardinality: for example, whether there are more than a certain number of elements.
(ii) Ordered domains: for example, points.
(iii) Relative distances: for example, capture when having the same location.
(iv) Distances to a fixed fluent: for example, the timeout-termination step.
(v) Persistence: for example, fluents which, once they become true, never change.
3.3. Feature-Based Evaluation Functions
3.3.1. Game Independent Feature Learning. The paper [] presents a robust approach to feature learning named Game Independent Feature Learning (GIFL) (Figure 2). The idea is to perform random simulations and, when a simulation ends, to build a small tree around the terminal state, as shown in Figure 2. Next, differences between two consecutive states encoded in GDL are extracted as a set of predicates. The features are identified as offensive or defensive depending on whether they lead to victory or prevent a loss. A database of features is finally used to guide the UCT simulations. First, all applicable features are fetched based on predicate matching with the current and the next states. Features with the maximum value are taken immediately. If no such features exist, the applicable ones are chosen according to the probabilities computed by the Boltzmann distribution:
$$P(a) = \frac{e^{V(a)/\tau}}{\sum_{b=1}^{n} e^{V(b)/\tau}}, \qquad (6)$$
where $P(a)$ is the probability of choosing action $a$, $V(a)$ is the value of the feature corresponding to action $a$, $n$ is the number of actions, and $\tau$ is a temperature parameter fixed to a constant value.
The approach was tested on a suite of games. A significant gain was reported in a number of them, depending on the time controls for moves.
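A sketch of how such a feature database can bias a simulation via (6); the maximum-value shortcut and the temperature are illustrative assumptions.

    import math, random

    def gifl_pick(applicable, feature_value, tau=0.5, max_value=100):
        # applicable: non-empty list of (feature, action) pairs matching the
        # current/next states; fall back to a random legal move when it is empty.
        winning = [(f, a) for f, a in applicable if feature_value[f] >= max_value]
        if winning:
            return winning[0][1]          # maximum-value features are taken outright
        # Boltzmann distribution over feature values, cf. (6); shift by the maximum
        # for numerical stability.
        vmax = max(feature_value[f] for f, _ in applicable)
        weights = [math.exp((feature_value[f] - vmax) / tau) for f, _ in applicable]
        return random.choices([a for _, a in applicable], weights=weights, k=1)[0]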
3.3.2. Decision Tree Learning. The identification of predicates as features is also presented in []. Here, the concept of a feature is simplified to a single fully grounded GDL predicate such as (cell 2 2 x). The statistics of features, such as the average score, number of occurrences, mean, and high and low bounds for the score, are gathered during self-play. Predicates which are more positively correlated with a win are called subgoals. In their previous work (beyond the scope of this survey), the authors used a weighted linear combination of the features to construct a state evaluation function. Later, they switched to Decision Tree Learning. A Decision Tree is a widely adopted classifier in machine learning. The learning algorithm of choice was ID3. The agent performs random simulations and feeds the Decision Tree. Some optimizations are proposed to avoid the creation of too many classes and overfitting. During the playout, the agent computes the next state for each available action and projects the resulting state onto the decision tree. The state is decomposed into features (predicates) in order to determine which class it belongs to. The score of the class is the assessment of the state.
3.3.3. General Dynamic Evaluation Function. Another approach to constructing an evaluation function dynamically was introduced in [] and extended in []. The idea draws
e Scientic World Journal
Figure 2: Game Independent Feature Learning: a feature-learning example in which the predicates (cell 1 1 x) and (cell 3 3 x) together with the action (mark 2 2 x) form a feature with value 100, which is later used during the test phase. The figure was reproduced based on [].
Table 1
Example predicate from the rules | New predicates after generalization | New predicates after specialization
(cell 1 1 ?) | (cell ? ? ?), (cell 1 ? ?), (cell ? 1 ?) | (cell 1 1 x), (cell 1 1 o), (cell 1 1 b)
from the common and previous state-of-the-art work. Features are again predicates which are detected directly from a game description. Next, the predicates are generalized (by replacing symbols with variables) and also specialized (by replacing variables with symbols from the respective domains). Domains are detected by traversing dependency graphs of GDL fluents and variables. Table 1 presents a possible predicate together with its generalizations and specializations.
The features are analyzed in terms of their stability, which is a function of the variance $\sigma_v$ during a game sequence (vertical) and the variance $\sigma_h$ between games (horizontal). Consider
$$\text{stability} = \frac{\sigma_h}{\sigma_v + 10}. \qquad (7)$$
A linear combination of the top features by average score is introduced. The features are weighted by the product of their stability and their correlation with the game score. The evaluation function constructed in this way is used in two variants: with the MTD(f) algorithm and with the so-called Guided UCT method. The latter case involves early termination of the Monte-Carlo simulation with probability p = 0.1. In case of an early termination, the evaluation function provides scores for the players. While the approach is not yet robust enough to win the GGP competition, in some games, such as Checkers, the results are very promising.
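A sketch of a stability-and-correlation weighted evaluation function in the spirit of this approach; the stability formula follows the reconstruction in (7), and all names and the feature-selection step are illustrative assumptions.

    def stability(var_between_games, var_within_game):
        # cf. (7): features that vary between games but stay steady within a single
        # game sequence receive a higher stability score.
        return var_between_games / (var_within_game + 10.0)

    def evaluate_state(state_predicates, weighted_features):
        # weighted_features: list of (predicate, stability, correlation) selected offline;
        # the state value is the sum of stability * correlation over features present.
        return sum(s * c for p, s, c in weighted_features if p in state_predicates)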
3.4. Distance to Features. Two papers [,] investigate the concept of distance between features. In this case, the features are GDL expressions for state predicates, either fully instantiated or containing variables. In earlier work, distances between two predicates required a prior recognition of board-like elements and Cartesian board-like structures with totally ordered coordinates. In the mentioned articles, the authors show a procedure for detecting an admissible distance between two features by means of the number of steps required to make a certain feature true starting from a state in which the other feature is true. Figure 3 presents an excerpt from the game Breakthrough.
The method involves constructing a Fluent Graph from rules in Disjunctive Normal Form (DNF), which is not feasible for all games due to the complexity of the rules. Once distances are calculated, they are used inside a fuzzy evaluation of the degree to which goal rules are satisfied. The function operates on the DNF forms of the goal rules, takes the current state as the input, and returns a numerical assessment in the [0,1] interval as the output. Conjunctions are transformed to t-norms and disjunctions are transformed to t-conorms (s-norms), whereas true(P) conditions are computed based on the closest distance from the current state to the predicate P.
The paper identifies the applicable games for which the improvement is significant. Among the tested games, the method works particularly well in some, slightly above average in others, shows no gain in several, and causes a few to underperform when the distance heuristics are included.
3.5. Survey-Like Papers on Knowledge-Based Methods. The dissertation [] presents several knowledge-based methods for GGP, which are as follows: (A) an automatically generated state evaluation function which uses fuzzy logic to approximate the degree of truth of goal conditions in nonterminal states; (B) construction of a neural network to optimize the introduced evaluation function; (C) construction of new propositions as well as solving games using automated
e Scientic World Journal
Figure 3: Distances between fluents in Breakthrough, illustrated over the cellholds (column, row, black) predicates of the board. The figure was reproduced based on [].
theorem-proving techniques; (D) detection of symmetry in games; (E) factoring of composite games. This PhD thesis is also an exhaustive source of information about a player named FluxPlayer. Because of the huge volume of this source, we are unable to go into details as in the case of shorter articles.
Another PhD dissertation [] presents a systematic analysis of methods for creating evaluation functions. Many concepts overlapping with [,,] are shared. This work contains a classification of approaches by the method of aggregation, the performance, and the source the features come from. Some theoretical digressions are included.
Neural Networks. In [], the authors show how to transform the propositional logic of the GDL rules into a neural network. The rules of interest are the goal rules, in order to then approximate a goal in nonterminal states. This concept has been very popular in General Game Playing. For this purpose, a generalization of the C-IL2P algorithm, which derives from the area of Neuro-Symbolic Integration (NSI), is used. The algorithm correctly maps propositions to neurons, which results in a kind of fuzzy inference engine with learning capabilities. The algorithm is described to transform rules of the form
$$h \Leftarrow \bigotimes_{1 \le i \le k} l_i, \quad \text{with } \otimes \in \{\vee, \wedge\}, \qquad (8)$$
where $h$ is an atom and the $l_i$ are literals. A rule is represented by $k+1$ neurons, where one neuron is the head of the rule and the remaining neurons, denoted by the literals, are connected to the head. If the propositional value (e.g., the head of the rule) is true, then the neuron representing the proposition responds with an output value in $[A_{\min}, 1]$, whereas an output in $[-1, A_{\max}]$ is interpreted as false. In this work, a standard model of a neuron is defined, with real weights, a bias, unbiased and biased outputs, a bipolar activation function, and a real output.
The authors tested whether the mapping can be performed for the rules of a set of games. For some of the games, no network could be constructed. For the remaining games, the proposed approach led to a higher state resolution than when a straightforward, nongeneralized C-IL2P algorithm was used. Unfortunately, no GGP player was built on top of this algorithm, and therefore there are no results regarding playing strength.
3.6. Transfer Learning. An interesting quality attributed to human-like playing is transfer learning. It means that humans can generalize knowledge once learned about a game and use it in similar contexts if they appear in different games. Knowledge transfer is extremely difficult in General Game Playing, not only because the variety of games is practically unlimited but also because the description language is low-level and purely universal. Recall that players in General Game Playing start from scratch and there is no formally provided metainformation about what game is being played or which players are involved in the game. The article [], which uses GGP as the testing framework, concerns transfer by analogy in games. The analogy is tested by comparing GDL descriptions. Two algorithms for discovering an analogy, minimal ascension and metamapping, are introduced. The first one is related to small structural changes between the descriptions (near learning), whereas the latter is responsible for matching more complex changes (far learning). Both methods apply a static analysis of the GDL as well as a dynamic analysis during game play. The authors tested games within the same domain (prone to transfer) and with completely different domains. The approach successfully identified some common scenarios, but in general the authors conclude that there are still many limitations to transfer learning. We will not go into details here, since transfer learning is not a part of the GGP competition protocol.
4. Rules Representation and Parallelization
4.1. Overview. In this section, we focus on dealing with the rules of GGP games and on distributing computations. This includes the design of inference engines for reasoning in GDL. We limit the scope to the default version (GDL-I), which has been used in all Stanford competitions so far. In 2010, an extended specification was proposed (GDL-II) [], which allows nondeterminism and hidden information. There are many viable ways of operating with the GDL rules, such as Prolog, a custom GDL interpreter, or translation to a different representation (Table 2). A comparison between the first and the second approach is discussed in detail in []. An overview of a few available GDL reasoners is contained in []. In summary, a Prolog-based engine is relatively slow.
 e Scientic World Journal
Table 2: Approaches to operating with the GDL rules, compared in terms of feasibility and speed: full instantiation of all states, propositional network, custom GDL interpreter, Prolog interpreter, GGP Base Java Package, and translation to another representation (for which both ratings are unknown).
Because the topics in this section are mostly implementation oriented, we will focus on a general report of what has been published here.
4.2. Instantiation. Instantiation of a game description means the elimination of all variables. Such descriptions can be used for many purposes, such as solving games or inferring game states in a more robust form. The paper [] comes with two techniques of instantiation: a Prolog-based one and a manual one using dependency graphs. A top-level view of the algorithm is summarized as follows:
(1) Parse the GDL input.
(2) Create the DNF form of the bodies of all formulas.
(3) Eliminate negated atoms (in the Prolog case).
(4) Instantiate all formulas.
(5) Find groups of mutually exclusive atoms.
(6) Remove the axioms (by applying them in topological order).
(7) Generate the instantiated GDL output.
The authors report how many of the enabled games from the Dresden GGP repository they were able to instantiate using Prolog and how many using the dependency graphs.
4.3. Propositional Net GDL Interpreter. A highly optimized custom interpreter for GDL can be created using a forward-chaining technique [,]. In this approach the rules are first converted to disjunctive normal form (DNF), stratified, and ordered using a sophisticated ordering strategy based on statistics. Next, rules are assigned efficient structures which process inputs (conditions) into outputs (rule instantiations). Memory-efficient Reference Tables are designed for this task. Forward chaining means starting with the available ground data and traversing the rule base in an inverted fashion compared to the original GDL, in order to obtain the results satisfying the required rules such as legal or goal. The predecessor article [] is focused on the main algorithm and the automatic generation of OCAML functions, whereas [] introduces later optimizations.
4.4. Classical GDL Interpreter. Two GDL reasoners [,] approach the problem in a more classical way, that is, by determining the results of rules from the definition, without full instantiation or elimination of variables, with unification of variables as they appear. Some common features of the two systems include flattening a GDL description by removing all nested arguments, compilation to C++, tree-like representations to perform the resolution, a single-pass method by means of visiting each tree node only once, and optimized data structures for the containers holding the output produced by rules for conditions. For the containers, in the first approach trie-composed and tree-composed structures are used [], whereas in the second one [] the results are stored in memory in a linear fashion (native pointers) with dynamic hashing where it is beneficial. Other differences lie in the way the results are gathered and merged together, in how unification is performed along the path of resolution, and in the handling of negation and recursion.
4.5. Factorization and Decomposition. When one has a net-based reasoning system, where each input node feeds data to an output node, it is very beneficial to decompose the graph into subgraphs to avoid unnecessary data processing. Such decomposition can also be beneficial for detecting independent factors and thus reducing the complexity of the game tree. An approach to decomposition based on model checking and Answer Set Programming (ASP) can be found in [].
Another investigated optimization is based on solving the inferential frame problem, that is, extracting the exact transition function from one state to another. Normally, all state predicates are cleared during the update and the GDL specifies which predicates become true after the update. A more intuitive transformation from certain predicates to certain predicates, which not only illustrates how the state evolves in time but can also vastly improve the reasoning speed, is presented in []. A related work performing only the necessary transition from one state to another, without recomputing the whole state, is []. As in [], the rules are stratified and ordered. Then the so-called numerical model of a stratified program is constructed. Whenever a move update is performed, only the potentially affected rules are marked as "needing recomputation." The authors report an order of magnitude improvement in Connect Four.
4.6. Translation of the GDL to a Different Representation
4.6.1. Toss. The next concept under this category that we want to address is translation to a different representation. The GDL description can be translated to so-called structure rewriting rules based on first-order logic with counting []. This allows capturing the dynamics of how predicates evolve in an automata-like graph. The method is part of the Toss system [], which makes use of this structure to develop simple heuristics. Toss requires transformation of the GDL rules into type normal form (TNF) and, like all players using some kind of normal form or instantiation, it is not suitable for overly complicated game descriptions.
4.6.2. Action Language. The paper [] presents a way of embedding GDL into an action language called C+. As the name implies, the moves performed by players are the central point of action formalisms. The main result of the paper is an algorithm for building causal laws from GDL rules. The translation is proven to be always correct, so it is a nice starting point for developing action-based heuristics.
4.6.3. Planning Domain Definition Language. While in multiplayer games solving the game is possible only for trivial and, let us say, uninteresting cases such as Tic-Tac-Toe, in single-player games the goal is actually to solve the game, at least weakly. The article [] introduces a method to translate a given GDL description into the Planning Domain Definition Language in order to use methods dedicated to planning to generate a solution to a game.
4.7. Parallelization. There are various reasons behind the parallelization of General Game Playing programs, such as pushing the envelope as far as possible, exploiting the parallel nature of the Monte-Carlo simulations which are part of the state of the art, or the ultimate goal of winning the official GGP Competition. However, there have not been many articles related to parallelization in General Game Playing, which is probably due to the fact that there was already existing work tackling parallelization in Go. To our knowledge, there are two articles [,] on distributed computations strictly connected to GGP.
4.7.1. Root Parallelization. In the first one, a root parallelization scheme is proposed which involves maintaining separate instances of the game tree on different machines. Statistics of nodes near the root are aggregated with a certain frequency, once per move in this case. The authors investigate four techniques of such aggregation, known as Best (select the best evaluated move from a distributed node), Sum (sum of total scores and total visits), Sum10 (Sum performed only for the top ten best evaluated moves), and Raw (send only the average scores of moves from nodes, without weighting by the number of total visits). The best results are obtained for Sum and Sum10 and are very close. The parallelization works well for all but one tested game.
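A sketch of the Sum aggregation; the per-worker data layout is an illustrative assumption.

    def aggregate_sum(worker_root_stats):
        # worker_root_stats: one dict per worker mapping a root action -> (total_score, visits).
        merged = {}
        for stats in worker_root_stats:
            for action, (score, visits) in stats.items():
                s, v = merged.get(action, (0.0, 0))
                merged[action] = (s + score, v + visits)
        # Play the action with the best aggregated mean score.
        return max(merged, key=lambda a: merged[a][0] / max(merged[a][1], 1))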
4.7.2. Tree Parallelization. In the second article [], the authors switch to Tree Parallelism, where only one master node has access to the game tree and delegates work to subplayers. The subplayers perform one or more simulations and send back the result immediately. According to the article, Root Parallelism works better with a small number of subplayers, whereas Tree Parallelism scales better as the number of distributed nodes grows; it becomes detrimental for an even higher number of nodes in almost all tested games.
4.7.3. Centurio. The last work we mention in this section is about a player named Centurio []. It uses the Monte-Carlo Tree Search algorithm, but the realization is fairly standard, so it is not included in Section 2. Essentially, this work is a report on what Centurio is about, without novel contributions in that respect. In the case of single-player games, the GDL program is translated into an ASP program in order to solve the game. The authors chose an existing third-party ASP engine. A considerable part of the work is also dedicated to parallelization on a cluster using open-source dedicated software offering a Network-Attached Memory (NAM) implementation.
5. General Video Game Playing
Recently, GVGP has been proposed as a new research topic in the field of computational intelligence []. Although its formulation is very similar to GGP, its target has changed from traditional board games to video games. The introduction of GVGP has raised several new challenges related to the unique properties of video games. For example, video games usually do not allow very much time for players to make decisions, and this situation is exacerbated in real-time strategy games (tens of milliseconds to react). This significantly affects the possibility of using computationally expensive search techniques for GVGP. Moreover, video games typically have enemies and nonplayer characters (NPCs) that are continuously moving, and a delay in decision-making can result in significant losses within a game. Additionally, video game settings are often more closely related to real-world situations than those of board games.
Similar to GGP, algorithms applied to GVGP also need to be tested against a large number of video games. As it can be a huge burden to use many types of video games in GVGP research, the use of open game platforms is essential to increasing the speed of research. Such platforms include various video games with an API (Application Programming Interface) for the AI program. Initially, the open platforms were based on well-known emulators of early-generation console devices (Atari 2600). Because these platforms were not designed with VGDL in mind [,], it was not easy to add new games to the platform, and the AI controller usually had little idea of the representation of video games on standard platforms. The most famous open game platform is ALE (Arcade Learning Environment) [], which is based on an Atari 2600 emulator (Stella) []. It supports various classic Atari games, including Freeway and Ms. Pac-Man. On the other hand, the GVG-AI platform used for the IEEE Computational Intelligence in Games (CIG) 2014 GVGP competition was designed to support VGDL []. Moreover, the game description language (GDL) is poorly suited for video game environments because of several factors []:
(i) Nondeterministic behaviors by NPCs or elements of chance.
(ii) Simultaneous decision making by players and NPCs at any given step of the game.
(iii) Dynamics (physics, continuous or temporal effects, collisions, and interactions).
(iv) Large environments.
Although GVGP research has a relatively short history, many researchers have already applied various techniques to solve GVGP problems. These solutions have been inspired by GGP, game AI, and reinforcement learning. In this section, we will divide the GVGP problem into five subproblems: (1) search/planning algorithms, (2) learning and adaptation, (3) game state representation, (4) feature extraction and dimension reduction, and (5) objective functions. We will also discuss recent research on each of the respective subproblems. Figure  presents an overview of GVGP research areas and their flow of information processing.

Figure: Overview of the GVGP decision process and research areas.
5.1. Characteristics of GVGP Problems. First, most video games are played in real time. In contrast, traditional board games are turn-based, and the two players typically have from a few seconds to a few minutes for each turn. Naturally, AI players are also allowed some amount of time for the decision process. However, unlike board games, video games are based on the real-time processing of user inputs, and AI processing must be accomplished between rendering frames. Although the number of frames per second (fps) varies between games, it is typically high enough to provide gamers with seamless interaction. As a result, the AI may have only tens of milliseconds per frame, assuming that it can employ all of its computational resources. In a multithreading environment, the AI can work independently of the rendering engine; however, the game context changes dynamically during the AI's thinking time, forcing decisions to be made quickly. This means time is one of the most important constraints in the context of GVGP problems. As in GGP, MCTS has been widely adopted for GVGP; however, it is characterized by a very limited number of simulations and a limited search depth, which causes a horizon effect [].
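To make the time constraint concrete, the sketch below shows a frame-budgeted decision loop of the kind a Monte-Carlo style GVGP controller typically uses: it runs as many short simulations as fit into the per-frame budget and then returns the best action found so far. The budget value and the forward-model interface (copy/advance/is_game_over/get_score) are illustrative assumptions, not the API of any particular platform.

```python
import time
import random

# Minimal sketch of a frame-budgeted, flat Monte-Carlo decision loop (a simplified
# stand-in for MCTS). The forward model and the budget value are assumptions.
FRAME_BUDGET_SECONDS = 0.04

def act(state, legal_actions, rollout_depth=10):
    deadline = time.monotonic() + FRAME_BUDGET_SECONDS
    stats = {a: [0.0, 0] for a in legal_actions}   # action -> [total score, samples]
    while time.monotonic() < deadline:
        action = random.choice(legal_actions)
        sim = state.copy()
        sim.advance(action)
        for _ in range(rollout_depth - 1):         # short rollouts cause a horizon effect
            if sim.is_game_over():
                break
            sim.advance(random.choice(legal_actions))
        stats[action][0] += sim.get_score()
        stats[action][1] += 1
    # Return the action with the best average simulated score so far.
    return max(legal_actions, key=lambda a: stats[a][0] / max(stats[a][1], 1))
```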
Second, it is not easy to predict future states in video games. In traditional board games, most information is open to both players, and there is a finite number of valid moves for each piece. These characteristics make it possible to attempt to predict players' behaviors in a board game. Video games, in contrast, pose two significant challenges related to predicting the future outcome of the game. First, they use randomness to determine the appearance of obstacles and NPC behavior []. Because the game environment changes over time in a random fashion, it is difficult to predict future states from the current game context. Second, the number of possible actions per move is often infinite because the players are able to control units in any direction. Furthermore, similar to poker games, a part of the opponents' information may be hidden. For example, a fog-of-war is common in many real-time strategy games, and the vision of players is limited to the areas around allied units. As a result, a forward model used to simulate games can suffer from inherent inaccuracy.
Finally, the general definition and acquisition of relevant features for GVGP are not trivial because each game has a variety of different game objects, and the observation of objects can change with the viewpoint settings. In board games, the game space is bound to the board and all the information on the board is open to both players. In video games, there are many different types of game objects including, but not limited to, animals, monsters, items, and natural objects. It is not easy to convert the game objects in each game scene into vectors of numbers for evaluation functions. Additionally, there are many different techniques to support the generation of diverse views of game scenes. For example, there can be a third-person perspective, zooming in and out, and so on. Moreover, in some video games (in particular, commercially protected ones), direct access to gaming events is not allowed. In this case, researchers often use screen-capture-based image processing and direct memory data access to extract information from scenes.
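A very simple version of such screen scraping is sketched below: a captured frame is scanned for exact matches of a known sprite template using NumPy. Real systems are considerably more robust (tolerating palette changes, animation frames, and occlusion); the array shapes and exact-matching approach here are assumptions for illustration only.

```python
import numpy as np

# Minimal sketch of sprite detection on a captured frame (H x W x 3 RGB array).
# Real GVGP pipelines use faster, more tolerant matching; this is exact matching.
def find_sprite(frame: np.ndarray, template: np.ndarray):
    """Return (row, col) positions where the template matches the frame exactly."""
    fh, fw, _ = frame.shape
    th, tw, _ = template.shape
    hits = []
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            if np.array_equal(frame[r:r + th, c:c + tw], template):
                hits.append((r, c))
    return hits

# Example with a tiny synthetic frame and a 2x2 "sprite".
frame = np.zeros((8, 8, 3), dtype=np.uint8)
sprite = np.full((2, 2, 3), 255, dtype=np.uint8)
frame[3:5, 4:6] = 255
print(find_sprite(frame, sprite))  # -> [(3, 4)]
```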
5.2. GVGP Platforms. Because GVGP is a relatively new research topic, only a small number of platforms are available for benchmarking purposes. The oldest platform is ALE (Arcade Learning Environment), which was proposed by the games group at the University of Alberta and uses Atari 2600 games []. Recently, Perez et al. developed the GVG-AI platform inspired by ALE, and this platform was used at the GVGP competition held at the IEEE CIG conference []. In addition, there are open-source platforms such as Learnfun & Playfun (L&P) [] and Piglet [], which use software emulators. Although these two platforms have yet to be described in the literature, their design is similar to that of ALE.

Figure: Examples of games in the Arcade Learning Environment: (a) Chopper Command, (b) Freeway, (c) Ms. Pac-Man.
5.2.1. Arcade Learning Environment. ALE is an open GVGP platform developed by the games group at the University of Alberta (http://www.arcadelearningenvironment.org/). It is based on Stella, an Atari 2600 emulator, and, most importantly, AI programs can be developed for this platform. The Atari 2600 was a home video game console released in 1977, with hundreds of available games. Among these are traditional games such as Pac-Man and Space Invaders (Figure ). The main advantage of the platform is that it supports older games released in the early days of the video game industry. Due to the variety of supported games, ALE is a good platform for GVGP research. In addition, games can be simulated by storing the emulator's memory, registers, and states. For these reasons, ALE has been used extensively in GVGP and reinforcement learning research [,] and can be used for search/planning with MCTS and for model-based reinforcement learning. Additionally, the platform exposes the action inputs of an emulated joystick with one push button. Finally, the AI developer has access to the video output and memory state of the emulator.
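For readers who want to try ALE, a minimal interaction loop might look like the sketch below, assuming the ale-py Python bindings and a locally available ROM file; the exact package layout and ROM-loading call have changed across ALE versions, so treat this as indicative rather than authoritative.

```python
import random
from ale_py import ALEInterface  # assumes the ale-py bindings are installed

ale = ALEInterface()
ale.setInt("random_seed", 123)
ale.loadROM("ms_pacman.bin")        # path to a locally available ROM (assumption)

actions = ale.getLegalActionSet()   # joystick/button combinations
total_reward = 0.0
while not ale.game_over():
    a = random.choice(actions)      # a real agent would decide here
    total_reward += ale.act(a)      # apply the action, receive the reward
    screen = ale.getScreenRGB()     # raw pixel observation
    ram = ale.getRAM()              # raw console memory (128 bytes)
print("Episode reward:", total_reward)
ale.reset_game()
```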
Because the ALE platform is based on an emulator, there are some restrictions on the interface between the AI and the console game. Although there is no restriction on emulating the inputs, the gaming events themselves must be interpreted from the raw data, namely the visual output and memory data. Image processing (or vision) algorithms can be applied to a captured two-dimensional screen image to identify objects and backgrounds. Because this processing takes time, sophisticated vision algorithms cannot be used to enhance the accuracy of recognition. As a result, the game-state data obtained from a console game is uncertain, of low resolution, and only weakly structured.
5.2.2. GVG-AI Competition Platform. The GVG-AI platform was developed to promote ALE-inspired GVGP. ALE supports diverse games from the Atari 2600, and there have been successful research initiatives using the platform. However, successful ALE-based GVGP research is hindered by the difficulty of extracting game-state data from the raw data. This was an inevitable problem for the ALE platform because it is based on game emulation. Although performing GVGP with raw data is similar to human-like game play, it significantly increases the problem complexity. On the other hand, the GVG-AI platform can collect information on the current game state, such as object instances and validity checks of actions. Taken together, this allows for game simulation based on forward models. In this way, the platform avoids the technical problems of emulator-based systems and allows researchers to focus primarily on solving GVGP issues.
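The value of such a forward model is that a controller can clone the current state, apply hypothetical actions, and score the outcome without affecting the live game. The sketch below illustrates this idea with an assumed forward-model interface (copy/advance/get_score/is_game_over); these names are modeled loosely on the idea behind the GVG-AI Java API, not an exact binding.

```python
# Illustrative one-step lookahead using an assumed forward-model interface
# (copy(), advance(action), get_score(), is_game_over()); the real GVG-AI API
# is written in Java and differs in detail.
def one_step_lookahead(state, legal_actions):
    best_action, best_score = None, float("-inf")
    for action in legal_actions:
        sim = state.copy()          # clone the live state
        sim.advance(action)         # apply the action in simulation only
        # Penalize terminal states; a fuller controller would distinguish wins from losses.
        score = sim.get_score() - (1e6 if sim.is_game_over() else 0.0)
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```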
In contrast to ALE, GVG-AI utilizes VGDL, which plays a role similar to that of the GDL used in GGP [,]. In GGP, GDL is used to define game rules. Similarly, VGDL can define the rules of two-dimensional video games. Each VGDL description can be translated into a game in the GVG-AI platform. Inside the platform, a VGDL description contains the game logic required to run the game, and all the computational resources are available to the AI developer regardless of the VGDL definition. Specifically, the GVG-AI platform uses a Java port of the initial PyVGDL format, which was designed as a subset of Python.
The limitations of the GVG-AI platform are that it supports a limited number of games and that there are some restrictions on the creation of games. Although it is easy to define video games using VGDL, only a small set of such games was available at the 2014 IEEE CIG competition. Relative to ALE, this is a small number of games that does not cover the plethora of game genres (Figure ). To overcome this shortcoming, there have been efforts to automate game-level generation using PuzzleScript, which is similar to VGDL []. Because VGDL is designed to support the automatic creation of games and procedural content generation, it is expected that the GVG-AI platform will have more games in the near future []. At the moment, however, VGDL supports only a two-dimensional grid-style game environment, which hinders the creation of new types of games. Recent projects have focused on extending the original VGDL to first-person shooter games [].
Figure: Examples of games in the GVG-AI competition platform: (a) Alien, (b) Butterflies, (c) Frogs.

5.2.3. Learnfun & Playfun and Piglet. There are two additional platforms, L&P [] and Piglet [], which were developed independently. However, they were not developed specifically for GVGP or game AI research and are not well represented at academic conferences or in journals (although L&P is partially described in []). Instead, these two platforms have been posted on personal blogs, GitHub, and YouTube. The authors have made the platforms available for GVGP research, although there is little connection between them and the foci of academic GVGP research.

Figure: Examples of games available in the Learnfun & Playfun platform: (a) Super Mario Bros, (b) Karate Kid, (c) Pac-Man.
Learnfun & Playfun was implemented using a Nintendo Entertainment System (NES) emulator. The primary goal of its creator was to design software that plays games without human intervention. As a result, L&P was designed to learn how to play a game from game-play data without the need for game-specific knowledge. Although this learning process is time-intensive, the trained AI can eventually play many different types of games successfully. Additionally, L&P automatically determines which variables in memory consistently increase during recorded play and uses them as a target function. For instance, although the software may be tuned on Super Mario Bros, it can also play other games, including Hudson's Adventure Island, Pac-Man, Karate Kid, and Bubble Bobble, without significant changes in the code (Figure ).
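A toy version of that objective-discovery step is sketched below: given a sequence of RAM snapshots recorded during play, it keeps the byte addresses whose values never decrease and ranks them by how much they grow. The real Learnfun & Playfun system uses a more elaborate lexicographic-ordering scheme; this sketch is only meant to convey the idea.

```python
# Toy sketch of Learnfun-style objective discovery: find memory addresses whose
# values are non-decreasing across recorded snapshots (e.g., score counters).
def candidate_objectives(snapshots):
    """snapshots: list of equal-length byte sequences (RAM dumps over time)."""
    n_addrs = len(snapshots[0])
    candidates = []
    for addr in range(n_addrs):
        values = [snap[addr] for snap in snapshots]
        if all(b >= a for a, b in zip(values, values[1:])) and values[-1] > values[0]:
            candidates.append((values[-1] - values[0], addr))
    # Addresses that grew the most are the most promising "score-like" variables.
    return [addr for growth, addr in sorted(candidates, reverse=True)]

# Example: address 2 behaves like a score counter, address 0 fluctuates.
dumps = [bytes([5, 0, 10, 7]), bytes([3, 0, 12, 7]), bytes([9, 0, 20, 7])]
print(candidate_objectives(dumps))  # -> [2]
```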
Piglet, which is similar to L&P, is based on a Game Boy emulator (Figure ). Its author noted that he referred to the L&P work while developing Piglet. The main difference is that the target function in Piglet is based on curiosity and novelty. Whereas L&P ranks variables for the target function by looking for numbers that increase, such as item counts, scores, and positions, Piglet plays games in such a way as to maximize the number of changes in memory. This curiosity-based (novelty-seeking) approach has been used to solve deceptive problems [].
Figure: Examples of games available in the Piglet platform: (a) Legend of Zelda, (b) Super Mario.

5.3. Algorithms for GVGP. In video game AI, the controller can be seen as an agent with sensors and actuators []. It continuously collects data from the game environment using logical sensors and makes decisions for its actuators. Although many techniques have been developed for game AI, they are usually coupled to predetermined target games, making them more or less game-specific. Because video game AI is highly dependent on game-specific knowledge, it does not generalize well to other games without major revisions. This means that successful techniques used in conventional video game AI design are likely to fail when used for GVGP without explicit consideration of generalization. As in GGP, it is desirable to use domain-free knowledge (less dependent on game-specific knowledge) to train algorithms capable of automatic game analysis. From this perspective, we can divide the GVGP problem into three subparts and describe each subproblem with related works:
(i) Feature extraction and dimension-reduction techniques to improve learning efficiency.
(ii) Search/planning algorithms independent of any domain knowledge.
(iii) Efficient learning algorithms that can adapt to new environments.
5.3.1. Inputs and Feature Extraction/Reduction. There are different types of inputs available in GVGP platforms. In the GVG-AI platform, inputs are represented as structured game objects []. These are provided through an API to information on gaming objects, with some restrictions in competition mode. In this platform, the AI can determine positions, speeds, the number of items, the number of enemies, and the player position if allowed by the API. Unlike the GVG-AI platform, the ALE platform is based on emulators and therefore provides strikingly different inputs: a computer screen image (color values of pixels) and memory data. Although this is closer to human visual processing by way of raw-level inputs [], it significantly increases the complexity of preprocessing. The screen image input requires a vision algorithm to segment the gaming objects from backgrounds and heuristics to identify the different types of objects (e.g., mines or enemies) []. In addition to the screen inputs, raw memory data from the emulator can be used as an input because it contains different types of information about the current game state. One way to identify informative memory locations is to use a lexicographic ordering [].
The size of the inputs in ALE (i.e., screen images and memory) is relatively large: the screen resolution is 160 × 210 (33,600 pixels) and the memory has 1,024 bits. Because the raw data is so large, the detection of game objects in this search space is not a trivial problem. For example, to locate the player avatar, feature sets are generated by exhaustively enumerating all single patterns of several small sizes []. This produces an enormous number of feature sets. To address this complexity problem, researchers have used a tug-of-war hashing algorithm to reduce the dimensions of the input data.
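A minimal version of such sign-hash (tug-of-war) feature reduction is sketched below: each binary feature index is mapped to one of a small number of buckets with a pseudo-random ±1 sign, so a huge sparse feature vector collapses into a short dense one. The bucket count and hashing details are illustrative assumptions rather than the cited configuration.

```python
import hashlib

# Minimal sketch of tug-of-war (sign-hash) feature reduction: a huge sparse
# binary feature vector is compressed into `n_buckets` signed counters.
def sketch_features(active_feature_ids, n_buckets=256):
    buckets = [0.0] * n_buckets
    for fid in active_feature_ids:
        digest = hashlib.md5(str(fid).encode()).digest()
        bucket = int.from_bytes(digest[:4], "little") % n_buckets
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        buckets[bucket] += sign
    return buckets

# Example: compress a sparse set of very large feature ids into 256 numbers.
compressed = sketch_features([17, 93_204, 1_048_575, 42])
print(len(compressed))  # -> 256
```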
Alternatively, a different study [] proposed that the "contingency awareness" concept drawn from cognitive science be applied to GVGP. The contingency awareness concept rests on the premise that an agent can know which aspects of future observations are under its own control, whereas other aspects are determined solely by the environment. This is important in GVGP because it helps identify important areas of interest, which the authors define as contingent regions. This approach was tested on games in ALE and involved segmenting the input space into several regions, thereby reducing the amount of information processing. Similarly, in [], a large observational space was decomposed into a number of smaller, more manageable subproblems by factoring the raw inputs across a set of Atari 2600 games. Recently, Mnih et al. [] showed promising results for ALE platform games using deep learning techniques and the automatic feature extraction of high-dimensional inputs.
5.3.2. Search/Planning Methods. As in GGP, MCTS is one of the most promising techniques in GVGP research. The GVGP competition hosted at IEEE CIG 2014 showed that MCTS is also one of the most popular techniques in GVGP. In [], the authors reported that MCTS performed significantly better than a breadth-first search on the ALE platform. It is not an easy task to achieve good performance without domain-specific knowledge. Perez et al. proposed a knowledge-based (KB) enhancement of MCTS called KB Fast-Evo MCTS, which takes advantage of past experience []. They reported that the use of a KB significantly improved the performance of the algorithm on the GVG-AI platform. Alternatively, Vafadost developed techniques for temporal abstraction inside MCTS for the effective construction of medium- and long-term plans on the ALE platform []. In this study, the Variable Time Scale (VTS) approach was shown to be promising for determining a suitable time scale for taking each action.
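One simple form of such temporal abstraction is to let the search choose how long to hold a primitive action, turning each decision into an (action, duration) pair. The sketch below shows a rollout built from such macro-actions; the forward-model method names are the same illustrative assumptions used in the earlier sketches, not the cited system.

```python
import random

# Illustrative macro-action rollout: each decision picks a primitive action and
# a duration (number of frames to repeat it), a simple form of temporal abstraction.
DURATIONS = [1, 2, 4, 8, 16]  # assumed candidate time scales

def macro_rollout(state, legal_actions, max_decisions=6):
    sim = state.copy()                      # assumed forward-model interface
    for _ in range(max_decisions):
        action = random.choice(legal_actions)
        duration = random.choice(DURATIONS)
        for _ in range(duration):           # hold the action for `duration` frames
            if sim.is_game_over():
                return sim.get_score()
            sim.advance(action)
    return sim.get_score()
```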
There are several problems with the use of MCTS in GVGP. The main difficulty is the limitation on the number of simulations per frame. In the GVG-AI framework, the controller has approximately 40 ms between frames. Because the game state is updated at each frame, it is not practical to wait for multiple frames to engage in long-term planning, as it is necessary to respond with one action per frame. Additionally, waiting increases the inaccuracy of the simulations because the game state is likely to change while the simulation is running. Another difficulty when applying MCTS to the GVG-AI framework is the randomness of NPCs and enemies. Because the MCTS algorithm is based on a large number of simulations derived from the current game state, it is essential to have an accurate prediction of future game states after a finite number of actions. However, there is no way to accurately predict the future positions of game objects in video games that involve randomness (e.g., in NPC behavior and movement). As a result, MCTS must rely on uncertain predictions of future game states. Finally, the search space of such games is usually extremely large. In traditional board games, the number of possible game states after a finite number of actions is determined by the number of valid actions per player move (the branching factor). Although the GVGP platform is based on two-dimensional games, which have a limited number of valid actions per time frame, the game states are also affected by the positions and states of NPCs and enemies. Moreover, the number of movable gaming objects per frame is not small (as shown in Figure ), and this compounds to cause an exponential growth in the number of possible states.
Although MCTS has been the most successfully applied approach in GVGP research, evolutionary algorithms have also been used to tackle the problem. For example, the IEEE CIG 2014 GVGP competition featured MCTS-based entries as well as a simple genetic algorithm (GA) approach. The final rankings showed that, although the GA approach was not as competitive as MCTS, it showed potential. Based on this result, it is expected that hybridizing an evolutionary algorithm with MCTS (as in KB Fast-Evo MCTS) can yield a synergy between the two search techniques [].
5.3.3. Learning and Adaptation Methods. In learning methods, the AI controller attempts to learn how to play the game. In [,], SARSA(λ), a traditional technique for model-free reinforcement learning, was augmented with linear function approximation. The parameters of the learning algorithm were tuned by training on five games and then tested on a larger set of games. The goal of the agent was to maximize the accumulated reward observed through action-reward loops. The authors reported that the learning approach showed potential on the ALE platform, but there was room for performance improvement. Additionally, during the learning process, different types of feature representation methods were compared using the Atari 2600's screen and memory. The conclusion was that there is no single dominant learning method that covers all the games. Recently, Mnih et al. proposed the use of deep neural networks for reinforcement learning problems on the ALE platform [].
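For completeness, the sketch below shows the core update of SARSA(λ) with linear function approximation, the kind of learner used in that line of work; the feature function, step sizes, and environment loop are assumptions, not the exact configuration of the cited experiments.

```python
import numpy as np

# Minimal sketch of SARSA(lambda) with linear function approximation.
# The feature vectors (NumPy arrays of length n_features) must be supplied by
# a separate, assumed feature-extraction step.
class LinearSarsaLambda:
    def __init__(self, n_features, alpha=0.01, gamma=0.99, lam=0.9):
        self.w = np.zeros(n_features)      # weight vector
        self.e = np.zeros(n_features)      # eligibility traces
        self.alpha, self.gamma, self.lam = alpha, gamma, lam

    def q(self, features):
        return float(self.w @ features)

    def update(self, feats, reward, next_feats, terminal):
        """One SARSA(lambda) step for the transition (s,a) -> r -> (s',a')."""
        target = reward if terminal else reward + self.gamma * self.q(next_feats)
        delta = target - self.q(feats)
        self.e = self.gamma * self.lam * self.e + feats   # accumulating traces
        self.w += self.alpha * delta * self.e
        if terminal:
            self.e[:] = 0.0
```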
In a model-based approach, the goal of learning is to find a model that properly selects the next action based on the current game state. Recently, evolutionary artificial neural networks developed for GVGP have incorporated large numbers of neurons and connections [,]. In [], the authors found that Hypercube-based NeuroEvolution of Augmenting Topologies (HyperNEAT), an evolutionary neural network approach that can handle high-dimensional inputs, was promising for two games on the ALE platform. The authors preprocessed game screens using image-processing techniques to generate inputs for the neural networks, which, in turn, returned key actions. In [], the authors presented comprehensive experimental results for different types of evolutionary neural networks (NeuroEvolution, NE). They compared Conventional NE, Covariance Matrix Adaptation Evolution Strategy (CMA-ES) NE, NEAT (which evolves both network topology and weights), and HyperNEAT on a set of test games. They found that direct encoding algorithms (such as NEAT) outperformed the other methods on low-dimensional, preprocessed object and noise-screen representations, but the indirect encoding method, HyperNEAT, was promising for fully general raw-pixel representations.
5.4. Relationships between GGP and GVGP. Technically speaking, GDL could be used for video games. It allows nondeterminism (in the GDL-II variant), simultaneous actions, and any number of players. However, it was not tailored for application to video games, and there are better solutions for this task.
The primary reasons why GDL is not used in GVGP are as follows:
(i) The complexity of defining sophisticated video games, especially real-time ones with continuous (frequently occurring) events: such definitions would make GDL very bloated and unnatural. They would typically require extensive sets of rules, lots of artificial timer facts which are counterintuitive and difficult for humans to read, and many state updates accompanied only by no-op (no operation) moves from the players, because in video games states often change regardless of the players' actions. Such complex and extremely lengthy descriptions are difficult to maintain or even to understand.
(ii) The complexity of simulating video games in GDL: GDL interpreters are relatively slow. Video games typically require fast responses from the players (on the order of tens of milliseconds). Given that GDL descriptions of real-time video games would already be extremely complex, the available time could be too short to carry out even a single simulation of a video game in GDL. There are no prospects that this could change in the foreseeable future.
The distinction between GGP and GVGP is already present in the formulation of the goals of both frameworks. They were designed to work in parallel rather than to interfere with or extend each other. GGP focuses on combinatorial mind games, which may have complex rules as long as they retain their discrete nature. In such games, a state is updated much less frequently than in GVGP and usually as a direct result of the actions taken by the players. GVGP games may even have a simpler structure, but the setup is more complex, as the games are played at a much faster pace and the events causing state updates may occur at any time. Therefore, GVGP needs a dedicated framework optimized for this kind of usage.
Thanks to this division of labor between GDL and VGDL, the respective competitions (GGP and GVGP) can operate on vastly different sets of games, which differ in the properties mentioned in the previous paragraph:
(i) GGP: mostly discrete, combinatorial games, with the game state usually changing in response to players' actions, and typically at least several seconds per move.
(ii) GVGP: mostly continuous games, with frequent incremental changes of the game state, and typically tens of milliseconds per move.
Although two games that can be considered video games, namely Pac-Man and Street Fighter, have been defined in GDL, they were significantly simplified compared to their original counterparts.
6. Benchmarking and Competitions

In GGP, the official competition [,] plays an important role in encouraging and promoting research; naturally, it is also used for benchmarking. Since 2005, the competition has been associated and colocated with either the Association for the Advancement of Artificial Intelligence (AAAI) conference or the International Joint Conference on Artificial Intelligence (IJCAI). The number of participants varies between editions. The tournament consists of two phases. The preliminary phase is open to all participants, and the top eight teams advance to the finals (except for one edition, in which more teams advanced). In the finals, the best player is chosen in a double-elimination playoff format. At each phase, different types of games are used. For example, the preliminary phase may use single-agent, two-player, and multiplayer games; for the finals, only two-player games have been used so far. In the final stage, each match between two players is played in a best-of-three setting. The competition includes turn-based and simultaneous-move games, zero-sum and non-zero-sum games, and anything from simple puzzles to complex chess-like games. Some variants of popular board games have been used, such as checkers played on a cylindrical board or Tic-Tac-Toe played in parallel on multiple boards.
The GGP competition has shown steady progress in the performance of the strongest programs and has evolved to incorporate human versus machine matches (known as carbon versus silicon) after the official tournament phase. Except for the first year of the event, the machines have outperformed the human players. After several years of competition, progress has become apparent, as new players can easily beat the old ones. Several sophisticated approaches have been implemented in the GGP competition, such as game-independent heuristics (mobility, inverse mobility, and goal proximity), learning weights for game-playing heuristics, MCTS, and structural analysis and compilation []. More recently, a General Game Playing course was offered online on Coursera [], which has led to a significant increase in the number of competition participants.
The success of the annual GGP competition has inspired the GVGP community to start a similar event in 2014 []. The GVGP competition is based on VGDL and provides a set of sample games for training purposes. Additional sets of ten games each are used for the validation and testing stages; the test games are not open to the public until the day of the competition. Participants are allowed to test their algorithms on the ten validation games to obtain scores, but the VGDL files themselves are not made available. VGDL is designed for modeling two-dimensional grid environments with a protagonist, nonplayer characters, obstacles, and other objects; Pac-Man and Space Invaders are examples of games that can be modeled in VGDL. A VGDL description consists of the following components:
(i) Sprite Set: all available sprites for the game (parameters and display settings are also included).
(ii) Level Mapping: relationships between characters in the level layout and sprites.
(iii) Interaction Set: specification of the events triggered when two sprites collide in the game.
(iv) Termination Set: the end conditions of the game.
Because the VGDL description is not visible to the AI in the validation and testing stages, it is not easy to initially understand the goal and objectives of a game. As game play progresses, the AI controller must determine the goal of the game, how to increase its score, the events that occur when two sprites collide, and the nature of the sprites in the game. The organizers provide four different types of example controllers: random, one-step look ahead, genetic algorithm, and MCTS. Because controllers are allowed only about 40 ms per frame, it is important to simulate future game states efficiently.
In the  competition,  participants submitted their
entries. For comparison, the four example controllers were
included in the evaluation. e best AI player was ranked
rst in ve in ten hidden test games. Based on the description
of the winner, it used “an open loop” tree search to build
a tree representing sequences of actions. e open loop
means that no state is stored in the tree. e UCB (Upper
Condence Bounds) formula introduced a “taboo bias”
penalty for actions leading to avatar positions visited in the
recent past. Also in the  competition, the two best players
outperformedthesampleMCTS.Ta b l e  summarizes the
players, scores, ranks, and the techniques used at the IEEE
CIG  competition.
To evaluate the AI players, the organizers ran the ten hidden test games, each with five different levels and several repetitions per level, yielding a fixed total number of games per AI. To rank the AIs, the organizers used (1) the number of victories, (2) the total points, and (3) the elapsed time to complete levels. The number of victories was the most important factor, but the total number of points was used as a tiebreaker in cases of a draw. Each AI received a point score based on its ranking in each game, with the first-place entry receiving the most points and successively lower places receiving progressively fewer, down to a single point for tenth place; entries ranked below tenth received zero points.

Table: Summary of entries in the IEEE CIG 2014 competition (listed in rank order).
Rank | Entry name | Total score | Approach
1 | Adrienctx | | Open loop tree search with UCB; taboo bias
2 | JinJerry | | Multistep look forward; heuristics
3 | sampleMCTS (sample) | | MCTS
4 | Shmokin | | MCTS; hill climbing
5 | Normal MCTS | | MCTS
6 | Culim | | Online Q-learning
7 | MMbot | | MCTS
8 | TESTGAG | | GA (Genetic Algorithm)
9 | Yraid | | RBS (Rule-Based System) with GA
10 | TTompson | | Steepest-ascent hill climbing; random move; horizon-capped A* search
11 | MnMCTS | | MCTS
12 | sampleGA (sample) | | GA
13 | IdealStandard | | Find all nonlethal sprites by simulation; visit nonlethal sprites randomly
14 | random (sample) | | Random move
15 | Tichau | |
16 | Sampleonesteplookahead (sample) | | One-step look ahead
17 | levis | | Multistep look ahead
18 | LCU | |
7. Challenges

Despite the significant development of the GGP and GVGP domains in recent years, there are still many open questions and challenging issues which are worth considering as research topics. Some of them, chosen based on the subjective preferences of the authors, are listed below.

Human-Like Playing. Cognitively plausible, human-like playing [] has been a long-standing and still unsolved challenge in AI. A majority of the top players use the MCTS algorithm, which can be considered a refined brute-force approach relying on a high number of simulations. MCTS-based players tend to play many games in a similar way, which can be spotted and exploited by a human player. Extensive calculations are rarely involved in human-like playing. Instead, humans rely on intuition, creativity, experience, and the detection of visual patterns while playing [,]. Bringing all four of these concepts to GGP is a grand challenge. Tackling intuition could start from a robust method of focusing the machine (its simulations) only on certain actions and discarding unpromising ones very quickly. Detection of visual patterns could start from a method for the automatic visualization of game states in GGP.
Opponent Modeling. Opponent modeling is an important asset of game-playing programs, and the realization of this concept poses a challenge in many games. Naturally, proper opponent modeling is crucial in games in which there are many iterations of the same game against the same opponents, such as Poker []. But it is more than that; tree search algorithms quietly assume some kind of (usually rational) opponent behavior. With a proper model of its opponents, a GGP agent could prioritize exploration of those parts of the game tree associated with the actions the opponents are more likely to play and could obtain a better assessment of those states. Knowing the profile of an opponent could also alter the strategy of our player. So far, there has been only limited success in implementing this concept in GGP.
Game Description and Representation. The use of GDL as a game-defining framework makes certain approaches prohibitive. Firstly, simulations of games written in GDL are slow for several reasons. Some constructions, such as mathematical expressions (including very basic arithmetic), ordered data types, and loops, are not part of the language and have to be simulated implicitly. Secondly, suboptimal performance is the price GDL pays for its universality. Because of this universality, a GDL description contains no information about the game except for the way of computing the initial state, the legal moves, the state updates, and the verification of whether a state is terminal and, if so, what the goal values for the players are in that state. There are no clues about what kinds of objects constitute the state or what a particular fact means. Additionally, there is no way to put any metadata into a GDL game description. Furthermore, while a GDL game description can easily be used for formal simulation of a game, it is almost impossible, in the general case, to detect what the game is about and which are its crucial, underpinning concepts. On the other hand, a majority of successful game-dedicated programs (e.g., in chess, bridge, or go) rely on game-related concepts.
We would like to pose three challenges in this area:
(i) To replace GDL with a more game-oriented (while still general) description language.
(ii) To come up with an automatic way of translating rules written in GDL into a representation that is more efficient in terms of performance and access to knowledge.
(iii) To design a method of discovering game-related objects in a game written in GDL in order to build an internal game representation.
Transfer Learning. Transfer learning means reusing knowledge learned while playing one game when playing another game. In GGP, this concept can be tackled in the following three ways:
(a) Changing the GGP specification so that a unique game name is included in the GDL description: in this way, transfer learning between matches of the same game could be implemented naturally.
(b) Performing automatic mapping of equivalent games (even if the descriptions are obfuscated, the order of rules is changed, etc.): such an idea was pursued in [] by means of automatic domain mapping from a formal description of a game, which was subsequently used to transfer an evaluation function between games.
(c) Retaining the GGP specification but still enabling transfer learning: to this end, it would be necessary to extract universal high-level concepts existing in many games (to enable transfer) and to design algorithms which operate on them (to enable learning and usage).
Transfer learning can speed up the learning process, as players would not have to learn from scratch. It could also reveal insights about similarities between games, which would be especially useful for game descriptions related to or inspired by real-world problems. The concept of transfer learning clearly overlaps with human-like playing, as humans intuitively transfer game-playing experience between similar games.
The Use of Computational Intelligence. Computational Intelligence (CI) [] encompasses a variety of methods which are adaptive and general and can be used in games without any a priori knowledge. Among them, most notably, neural networks and a rich family of metaheuristics seem to be a perfect choice for multigame playing. However, CI-based methods have not yet had much success in GGP. A summary of the achievements and perspectives in this area can be found in []. Probable reasons why CI algorithms are not fuelling the state-of-the-art players are the limitations of GDL and the relatively short time available for learning in the GGP protocol. On a general note, CI-based learning methods are often too slow even with specialized game engines. The lack of game-related features in GDL also hampers the application of many CI methods. Such features could, in principle, be used as inputs to a neural network or expressed in the form of genomes in evolutionary approaches.
While GDL shortcomings generally hinder the efficient use of CI methods in competitive GGP, we believe that multigame playing (in the form of GGP or otherwise) is, nonetheless, one of the grand challenges for CI/AI [,,].
Technical Challenges. Improving the official General Game Playing Competition has been a constant challenge for the organizers. The community needs new games, a higher number of unbiased games (with equal chances of winning for all players), a better communication protocol (perhaps one not requiring participants to host their players as if they were servers), and, finally, a way to attract more participants.
The MCTS algorithm used in GGP could be improved further. In particular, the algorithm could be tweaked online to better suit the currently played game and also make use of knowledge discovered in that game. Better parallelization schemes, preferably adjusted online to the played game, and faster inference engines are relevant technical challenges as well.
Further Investigation of GGP for Video Games. Recently, GGP for video games has been introduced to the computational intelligence and games community. It is similar to traditional GGP research on board games but targets video games, which are more challenging and closer to real-world commercial games. Because it is quite a new research area, only a small number of publications are available so far compared to GGP. However, there are very interesting fundamental building blocks for future successful research, including GVGP platforms, VGDL, and the GVGP competitions. The next problems are to define more useful GVGP platforms, to extend VGDL to complex video games, to automatically create video games using VGDL, to cross-fertilize GGP and GVGP research, and to apply the results to commercial products.
8. Summary and Conclusions

In this survey, we have listed recent advances in GGP since 2011. Although we cover just four years, there has been substantial progress related to MCTS, GVGP, and competitions in this time frame. In GGP research, there have been successful papers on the hybridization of game-independent search, planning, and heuristics and on knowledge extraction from game playing. The boundary of GGP has been expanded to video games by the introduction of emulator-based platforms (e.g., ALE) and VGDL-based platforms (e.g., GVG-AI). The introduction of GGP to video games has raised several new challenges: what are the important inputs from video games (memory or screen visual inputs)? Which GGP techniques remain successful for GVGP? A new challenge is also related to the invention of VGDL and a variant of the GGP competition dedicated to video games (GVGP). Successful approaches to GVGP have revealed important new insights into the understanding of GGP for video games. They can provide useful advances in engineering and ideas for cognitive science toward understanding humans' generalization ability when playing games.
Compared to the GGP competition, which was initiated in 2005, the GVGP competition is at quite an early stage of development (it was just one year old as of 2014). The competition organizers have not yet created enough new video games and, therefore, the expression of VGDL is still limited. Also, there are not many educational resources for the GVGP competition, as opposed to the GGP competition, for which a massive open online course (MOOC) is available. Nevertheless, the progress in the GVGP field is clearly visible: a new game description language (VGDL) was specified, and several platforms, hobby-style research works, and media exposure (DeepMind [] being acquired by Google) have appeared recently. Lately, a group of leading researchers in GVGP held a meeting at Dagstuhl in Germany []. We believe that the field will soon become one of the most important areas of game AI research.
Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIP) and by a grant from the National Science Centre in Poland. M. Świechowski thanks the Foundation for Polish Science under International Projects in Intelligent Computing (MPD) and the European Union within the Innovative Economy Operational Programme and European Regional Development Fund.
References

[1] F.-H. Hsu, "IBM's Deep Blue chess grandmaster chips," IEEE Micro.
[2] J. Schaeffer, N. Burch, Y. Björnsson et al., "Checkers is solved," Science.
[3] J. Mańdziuk, "Towards cognitively plausible game playing systems," IEEE Computational Intelligence Magazine.
[4] C. Clark and A. Storkey, "Teaching deep convolutional neural networks to play go," http://arxiv.org/abs/.
[5] J. Mańdziuk, Knowledge-Free and Learning-Based Methods in Intelligent Game Playing, Studies in Computational Intelligence, Springer (series edited by J. Kacprzyk).
[6] D. B. Fogel, T. J. Hays, S. L. Hahn, and J. Quon, "The Blondie chess program competes against Fritz and a human chess master," in Proceedings of the IEEE Symposium on Computational Intelligence and Games.
[7] M. Genesereth, N. Love, and B. Pell, "General game playing: overview of the AAAI competition," AI Magazine.
[8] N. Love, T. Hinrichs, and M. Genesereth, "General game playing: game description language specification," Stanford Logic Group technical report, Computer Science Department, Stanford University, Stanford, Calif, USA.
[9] J. Levine, C. B. Congdon, M. Ebner et al., "General video game playing," in Proceedings of the Dagstuhl Seminar on Artificial and Computational Intelligence in Games.
[10] T. Schaul, "An extensible description language for video games," IEEE Transactions on Computational Intelligence and AI in Games.
[11] http://gvgai.net/.
[12] http://games.stanford.edu/index.php/ggp-competition-aaai-.
[13] C. B. Browne, E. Powley, D. Whitehouse et al., "A survey of Monte Carlo tree search methods," IEEE Transactions on Computational Intelligence and AI in Games.
[14] P. Kissmann and S. Edelkamp, "Gamer, a general game playing agent," Künstliche Intelligenz.
[15] H. Finnsson and Y. Björnsson, "Game-tree properties and MCTS performance," in Proceedings of the IJCAI-11 Workshop on General Game Playing (GIGA '11), Barcelona, Spain.
[16] S. F. Gudmundsson and Y. Björnsson, "MCTS: improved action selection techniques for deterministic games," in Proceedings of the IJCAI-11 Workshop on General Game Playing (GIGA '11), Barcelona, Spain.
[17] S. F. Gudmundsson and Y. Björnsson, "Sufficiency-based selection strategy for MCTS," in Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13).
[18] H. Finnsson, "Generalized Monte-Carlo tree search extensions for general game playing," in Proceedings of the 26th AAAI Conference on Artificial Intelligence.
[19] J. Clune, "Heuristic evaluation functions for general game playing," in Proceedings of the AAAI Conference on Artificial Intelligence.
[20] H. Finnsson and Y. Björnsson, "CadiaPlayer: search-control techniques," Künstliche Intelligenz.
[21] S. Gelly and D. Silver, "Monte-Carlo tree search and rapid action value estimation in computer Go," Artificial Intelligence.
[22] J. Mehat and J.-N. Vittaut, "Online adjustment of tree search for GGP," in Proceedings of the IJCAI-13 Workshop on General Game Playing (GIGA '13), Beijing, China.
[23] M. J. W. Tak, M. H. M. Winands, and Y. Björnsson, "N-grams and the last-good-reply policy applied in general game playing," IEEE Transactions on Computational Intelligence and AI in Games.
[24] M. Tak, M. Winands, and Y. Björnsson, "Decaying simulation strategies," in Proceedings of the IJCAI-13 Workshop on General Game Playing (GIGA '13), Beijing, China.
[25] M. Świechowski and J. Mańdziuk, "Self-adaptation of playing strategies in general game playing,"