Measuring the level of difficulty in single player video games

Maria-Virginia Aponte, Guillaume Levieux, Stéphane Natkin

Centre d'Etudes et de Recherches en Informatique du CNAM, Conservatoire National des Arts et Métiers, Paris, France
Article info
Article history:
Received 22 June 2010
Revised 8 April 2011
Accepted 9 April 2011
Available online 22 April 2011
Keywords:
Video games
Challenge
Difficulty
Learning
Evaluation
Abstract
In this paper, we discuss the interest of, and the need for, evaluating the difficulty of single player video games. We first show the importance of difficulty, drawing from semiotics to explain the link between tension-resolution cycles, challenge, and the player's enjoyment. Then, we report related work on automatic gameplay analysis. We show through a simple experiment that automatic video game analysis is both practicable and can lead to interesting results. We argue that automatic analysis tools are limited if they do not consider difficulty from the player's point of view. The last two sections provide a player- and game-design-oriented definition of the notions of challenge and difficulty in games. As a consequence, we derive the properties that a measurable definition of difficulty must fulfil.
© 2011 International Federation for Information Processing. Published by Elsevier B.V. All rights reserved.
1. Introduction
One of the fundamental issues to tackle in the design of video games is usually referred to as creating a well-shaped difficulty curve. This means that one of the core elements of a good game design is to make the game just as difficult as it has to be, so that the player feels challenged enough, but not too much. However, game creators cannot rely on strong tools to help them in this task, and there is not even a clear and accepted definition of difficulty as a measurable parameter. For now, game difficulty adjustment is a subjective and iterative process. Level and game designers create a sequence of challenges and set their parameters to match their chosen difficulty curve. Finding the right sequence and tuning every challenge rely mainly on playtesting performed by the designers. Playtesting is a heavily time-consuming task, and it is very hard for a designer to evaluate the difficulty of a challenge he created and played for many hours. Our goal is to provide a clear, general and measurable definition of difficulty in games. We must rely on accepted definitions of video games and on work relating a game's difficulty to its quality as perceived by the player. We present related work on automatic gameplay analysis, and then report a first experiment with a basic synthetic player. We then define difficulty as a function of time, taking into account the player's experience. To conclude, we propose a way to explore the link between the player's abilities and the probability of losing a challenge, providing an interesting measure for the game designer to explore the difficulty of his game's challenges.
2. Scaling the difficulty
Difficulty scaling is a fundamental part of game design [1,2]. However, this is not an obvious consequence of accepted definitions of video games. Jesper Juul has listed many of them and proposed a synthesis [3]. We start from Juul's definition to explain why difficulty scaling is so important in game design:
‘A game is a rule-based formal system with a variable and quantifiable outcome, where different outcomes are assigned different values, the player exerts effort in order to influence the outcome, the player feels attached to the outcome, and the consequences of the activity are optional and negotiable.’
This definition gives a clear, precise idea of how a game system behaves, and manages to take into account the most interesting parts of the previous definitions. But for our purpose, we must explain more precisely why difficulty is a primary component of any gameplay. The fact that the player exerts effort in order to influence the outcome, and feels attached to the outcome, is the core point. To point out the important components of a gameplay, and foremost the link between caring about difficulty and making a good game, it is necessary to coin a definition that leaves aside the game's dynamic structure and focuses on video games from the player's point of view.
Robin Hunicke describes a game using the Mechanics, Dynamics and Aesthetics (MDA) framework [4]. Mechanics are the tools we use to build a game (e.g. physics engines, pathfinding algorithms), Dynamics describes the way the Mechanics' components behave in response to the player, and Aesthetics is the desirable emotional response evoked in the player. Of course, the design goal is the Aesthetics, that is to say the player's emotions. We argue that
the difficulty of challenges greatly influences a video game's aesthetics and thus plays a central role in game design.
Umberto Eco's book The Open Work is a foundational study of interactive art's aesthetics [5]. Eco states that when we face a piece of art, we interpret it, seeking patterns and looking for information. Depending on our culture and knowledge, we will find something to grab onto within the stimulating field of the piece of art. But then we will go further, find another interpretation, and feel lost for a short moment while shaping our new pattern. Moreover, when a piece of art is interactive, the aesthetic value comes both from the resolution of tension and from the fact that this resolution is a consequence of our choices. Assuming that a video game is an open work, we can propose a similar analysis. Every time the player faces an obstacle, he gets lost for a few seconds. Then he finds and chooses a pattern, presses the right buttons, and takes pleasure both from resolving a tension and from making a choice. Thus, we can draw from Umberto Eco's work that challenge is fundamental in video games because it creates tension situations that the player has to solve, together with the opportunity for meaningful choices.
Related work on video game players' enjoyment supports our analysis and places challenge at the center of video games' aesthetics. In his book A Theory of Fun for Game Design, Raph Koster states that we have fun playing games when we discover a new pattern, i.e. a strategy that we apply to overcome a challenge [6]. Sweetser et al. see challenge as one of the most important parts of their GameFlow framework [7]. Yannakakis et al. measure player enjoyment from challenge, besides behavior and spatial diversity [8].
Mihaly Csikszentmihalyi's theory of Flow, which researchers have applied to video games as a measure of the player's enjoyment, helps us link the difficulty of a challenge to the player's enjoyment [9,10,7]. A player is in a Flow state, and thus enjoying the game, when the task is neither too hard nor too easy. It is thus not enough to create tension situations and to give the player choices to resolve this tension. A good game design must accurately scale the difficulty of a challenge to reach a tension level that leads to the player's enjoyment. Thus, a definition of a game from the Aesthetics point of view and centered on challenges could be:

‘Regarding challenges, the Aesthetics of a game is created by tension-resolution cycles, where the tension is kept under a certain threshold, and where the resolution of a cycle depends on the player's choices.’
This definition does not take into account every aspect of game aesthetics but focuses on challenge, which most studies consider a core component of a game's aesthetics. The tension situations that the player seeks and tries to solve have been created by the game designer, and the amount of tension they deliver directly stems from their complexity. As a result, difficulty scaling is a central task of good game design. Games already propose multiple difficulty levels [1], and sometimes even Dynamic Difficulty Adjustment [2], manipulating some specific parameters of the gameplay in real time [4] or automatically scaling the game AI's capacity [11]. But whichever difficulty scaling method the game designer uses, he must still tune it properly. It is sometimes really hard to guess to what extent a change in a low level parameter will just make the game a bit harder or dramatically change the gameplay [1], and tuning is one of the most time-consuming areas of game AI development [12]. This is the design process that we want to shorten, by providing tools that help game designers evaluate the impact of any difficulty scaling parameter on the final difficulty curve. It is thus fundamental to provide game designers with strong tools and a definition of difficulty as a measurable parameter.
3. Related work: testing with a synthetic player
Our goal is to evaluate a parameter, or a set of parameters, that can be considered as a measure of a game's difficulty. There are two theoretical approaches to evaluating such a parameter. The first is to find, according to the game structure, a mathematical expression of the parameter and to solve the corresponding equations. The complexity of a game and of the notion of difficulty tends to show that this approach is not practicable. The second solution is to experiment with the game and measure the parameter. To experiment with the game, we may use either a real or a synthetic player. The main advantage of using a real player is that he demonstrates human behaviors. On the other hand, he plays slowly, becomes tired, and his behavior is only known through the game interface. The synthetic player is tireless, plays quickly, and his behavior can be fully understood. The design of the synthetic player allows simulating some foreseen behaviors of a real player (risky or careful, for example) and some simple learning techniques.
Gameplay testing has already been the subject of much interesting research. Alasdair Macleod studied gameplay testing of Perudo, a bidding dice game, simulating plays with a multi-agent system [13]. He wanted to modify Perudo's gameplay to make it fairer, and added a rule which he thought would help losing players stay in the game. By running the experiment and analyzing the modified game, he obtained the counter-intuitive result that the rule was not helping the players at all. These results show that self-play testing can help to test gameplay modifications. Neil Kirby analyzed Minesweeper, replacing the player by a rule-based AI [14]. Each rule was related to a different play complexity. He found out that Minesweeper was surprisingly not as hard as he supposed it to be, since most of the board was often solved using only the simplest rule. These results point out that automated techniques can provide interesting approaches to studying video game difficulty.
Both Perudo and Minesweeper are simple games, but automated analysis can also be applied to complex off-the-shelf games. Bullen et al. used the Unreal Engine (Epic Games) and created a gameplay mutator providing sensors to log useful game events [15]. They tested Unreal Tournament 2004 (Epic Games) using partially and fully automated testing (i.e. both player-vs-AI and AI-only games). They pointed out that fully automated tests had to be done with a specific AI, because the standard AI was not aggressive enough. The fact is that the standard Unreal Tournament AI has been created to entertain the player, not to mimic his behavior, and is thus not able to fully explore the gameplay. Recently, van Lankveld et al. proposed to analyze a game's difficulty using incongruity, the distance between the actual dynamics of the game and the mental model the player has built [16]. They plan to infer the complexity of the player's mental model, and thus the difficulty of the game, by monitoring his actions. These works show that, to be useful, a synthetic player must simulate a real player in some way.
Automated game analysis can be done at several levels. Nantes et al. distinguish Entertainment Inspection (i.e. gameplay testing), Environment Integrity Inspection (i.e. sound and graphics related issues) and Software Inspection [17]. Their system targets Environment Integrity Inspection, using computer vision, and especially corner detection, to detect aliasing issues in shadow rendering. This is an approach complementary to the one we propose, and Nantes et al. acknowledge the need for analysis tools at every inspection level.
As we argued in the previous section, machine learning is particularly interesting for automated gameplay testing. If we want the synthetic player to test behaviors that were not under consideration before, it must explore the game state space by itself. Many researchers explore how machine learning can help video game development, especially concerning automated testing. Chan et al. use a genetic algorithm to create sequences of actions corresponding to unwanted behavior in FIFA-99 (EA Games) [18]. They also consider that the game dynamics is too complex to be fully formalized, due to the huge branching factor, indeterminism, and the fact that even designers never formally define it. There is thus a need to build an AI driven agent to explore this dynamics, here using evolution techniques. Spronck et al. took the same approach, making neural networks evolve to test a simplified version of the spaceships game PICOVERSE, providing an insightful analysis of its AI driven opponents [19].
Automated learning approaches become inadequate when it comes to creating characters with complex behaviors automatically from scratch, as stated by John E. Laird [20]. But many researchers use games to evaluate machine learning techniques, the game often being considered as a reference problem to be solved by the AI. Pacman (Namco) [21], for example, has been the subject of much research, applying various techniques to create synthetic players, from neural network evolution [22–24] to reinforcement learning [25], genetic programming [26] and genetic evolution of a rule-based system [27]. Yannakakis et al. [8] take another point of view: they use synthetic characters to maximize player enjoyment, and validate their measure of enjoyment, based on challenge, behavior diversity and spatial diversity. These results show that machine learning techniques can be useful when analyzing a gameplay.
4. Case study
4.1. The experiment
This section presents an analysis of a Pacman-like predator-prey computer game. We built a simplified version of Pacman, using only one ghost chasing Pacman. Both Pacman and the ghost use A* pathfinding. The whole graph and the shortest paths were built at startup and stored to save calculation power. The challenge is to eat a maximal number of pellets without being killed by the ghost. The synthetic player has a partial view of the game: it knows five parameters. The first one has four values, giving the direction of the nearest pellet. The four other dimensions describe the four Pacman directions. For each direction, Pacman knows whether he is facing a wall (0), a ghost one (1) or two (2) steps away from him, a free-of-ghosts path less than 18 steps long (3), or a free-of-ghosts path longer than 18 steps (4). We chose this game state abstraction because we consider that the main information a real player uses is the path to the nearest pellet and the safety of the four directions he can take.
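To make this abstraction concrete, the following is a minimal Python sketch of the five-feature encoding. The per-direction inputs (wall flag, ghost distance in steps, ghost-free path length) follow the description above; how they are derived from the maze is left out, and all names are illustrative rather than taken from the actual implementation.

DIRECTIONS = ("up", "down", "left", "right")

def direction_feature(is_wall, ghost_steps, free_path_steps):
    """Safety rating of one direction, as described in the text."""
    if is_wall:
        return 0                 # facing a wall
    if ghost_steps in (1, 2):
        return ghost_steps       # ghost one or two steps away
    if free_path_steps < 18:
        return 3                 # ghost-free path shorter than 18 steps
    return 4                     # ghost-free path of 18 steps or more

def encode_state(nearest_pellet_dir, per_direction_info):
    """Abstract 5-dimensional state: pellet direction plus 4 safety ratings.

    per_direction_info maps each direction to an (is_wall, ghost_steps,
    free_path_steps) triple, assumed to be computed from the maze.
    """
    return (DIRECTIONS.index(nearest_pellet_dir),
            *(direction_feature(*per_direction_info[d]) for d in DIRECTIONS))

# Example: pellet to the left, ghost two steps up, a wall to the right.
state = encode_state("left", {
    "up":    (False, 2, 30),
    "down":  (False, 99, 25),
    "left":  (False, 99, 10),
    "right": (True, 99, 0),
})
# state == (2, 2, 4, 3, 0)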
The only ghost in the maze, Blinky, has been programmed to chase the player, taking the shortest path to reach him. In Pacman's original rules, ghosts periodically enter scatter mode and stop chasing the player to go back to their respective board corner. But as a first step, we wanted to keep the rules at their minimum complexity, so that results could be more easily interpreted. The synthetic player's AI was programmed as a Markov Decision Process, using reinforcement learning with the Q-Learning algorithm with eligibility traces, Q(λ) [28]. The parameters for Q(λ) were α = 0.1, γ = 0.95 and λ = 0.90. We balanced exploration and exploitation using the ε-greedy action selection algorithm, with ε = 0.05.
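The learner itself can be sketched as follows: a tabular, Watkins-style Q(λ) with accumulating eligibility traces and ε-greedy selection, using the parameters above. The environment interface (reset/step returning the abstract state, the reward and a termination flag) is an assumption standing in for the actual simulator, not the authors' code.

import random
from collections import defaultdict

ALPHA, GAMMA, LAMBDA, EPSILON = 0.1, 0.95, 0.90, 0.05
ACTIONS = ("up", "down", "left", "right", "stay")

Q = defaultdict(float)  # Q[(state, action)] -> estimated value
E = defaultdict(float)  # eligibility traces

def choose_action(state):
    """Epsilon-greedy action selection."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def run_episode(env):
    """Play one game; env.reset()/env.step() are assumed simulator hooks."""
    E.clear()
    state = env.reset()
    action = choose_action(state)
    done = False
    while not done:
        next_state, reward, done = env.step(action)
        next_action = choose_action(next_state)
        greedy = max(ACTIONS, key=lambda a: Q[(next_state, a)])
        # TD error toward the greedy successor value (Watkins' variant).
        target = reward if done else reward + GAMMA * Q[(next_state, greedy)]
        delta = target - Q[(state, action)]
        E[(state, action)] += 1.0  # accumulating trace
        for key in list(E):
            Q[key] += ALPHA * delta * E[key]
            # Decay traces; cut them after an exploratory move.
            E[key] = GAMMA * LAMBDA * E[key] if next_action == greedy else 0.0
        state, action = next_state, next_action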
We consider the analysis of Pacman's difficulty according to a single gameplay parameter: the player's speed. The number of pellets eaten by Pacman is our difficulty evaluation function. We chose this parameter because in Pacman's original gameplay, the ghosts/Pacman relative speed is already used to scale difficulty [21]. Every 14 frames, Blinky changes its position, moving one step up, down, left or right. Each step is 16 pixels long, the distance between two pellets. Besides going up, down, left or right like the ghost, the player also has the option to do nothing and stay in the same place. We tested the synthetic player's performance for different gameplay configurations.
4.2. Results
We ran six experiments, with speed varying from 0 (i.e. Pacman and the ghost move every 14 frames) to 7 (i.e. Blinky still moves every 14 frames but Pacman moves every 7 frames). We let the synthetic player develop his policy over 50,000 games, Pacman having 3 lives per game.

Fig. 1 presents a synthetic view of these results. What we can extrapolate from them is that, when modifying Pacman's speed, the difficulty does not change linearly. There is much less difference in Pacman's score between speeds 0 and 5 than between speeds 5 and 7. Such an analysis can be useful for a game designer when tuning the difficulty: he can understand that when Pacman's speed gets close to twice the ghost's speed, the game gets much easier. Between 0 and 5, difficulty varies almost linearly.
4.3. Critical analysis
These results show that it is possible to evaluate a difficulty curve for a given game, with a given challenge whose difficulty can be tuned according to a given parameter. However, this experiment is just an attempt to describe difficulty for a specific game. It does not give us a general framework we could apply to any game to measure its difficulty. The next step is thus to find a general and precise definition of difficulty.
5. The meaning of challenge and difficulty
Up to this point we have used the term difficulty in games without providing any definition of this word. We shall not try to give a general definition of difficulty covering the wide range of psychological aspects from emotional problems to intellectual and physical challenges. Instead, we consider the notion of difficulty in the sense used in game design, that is, video games built as combinations or sequences of predefined challenges. Thus, studying the overall difficulty of a given video game involves studying the difficulty players experience when overcoming the individual challenges in that game.
Fig. 1. Pacman score at different speeds – mean value over the last 5000 games.
In this section we give a precise definition of challenges in video games and explain how the progression of difficulty along a given path of challenges can be seen as the progression of the difficulties of each challenge in the sequence the player actually attempts to overcome. We discuss the aspects that a measurable notion of challenge difficulty must involve, and we end by defining this notion as a conditional probability.
5.1. Challenges and their difficulty
In game playing, the level of difficulty is related to the player's skills or abilities to overcome a given challenge. We must first define the notion of challenge in games. Starting from Juul's definition, we can consider that a video game challenge is by itself a sub-game: a rule-based system with variable and quantified outcomes. According to the quantification, some of the outcomes may be considered as a success, others as a failure. A general definition of difficulty must allow considering either a binary (WIN, LOSE) or a discrete (from 0 to N points) quantification. Without losing generality, we consider that in all cases the designer can define a binary function that decides whether the player has won or lost, and we thus adopt a binary quantification in the rest of this section.
This leads to the two following definitions: a video game challenge is a dynamic rule-based system with two outcomes, WIN or LOSE. At a given time t, a challenge can be in one of four possible states, NOT STARTED, IN PROGRESS, WIN or LOSE, according to the automaton of Fig. 2.

A solution of a challenge is a sequence of player's actions and the corresponding game feedback that leads the automaton from the NOT STARTED state to the WIN state.
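A minimal sketch of this automaton, with the two terminal outcomes made explicit, could look as follows; the class and method names are illustrative.

from enum import Enum

class ChallengeState(Enum):
    NOT_STARTED = 0
    IN_PROGRESS = 1
    WIN = 2
    LOSE = 3

class Challenge:
    """The automaton of Fig. 2: start once, then end in exactly one of
    the two terminal states."""
    def __init__(self):
        self.state = ChallengeState.NOT_STARTED

    def start(self):
        assert self.state is ChallengeState.NOT_STARTED
        self.state = ChallengeState.IN_PROGRESS

    def finish(self, won):
        assert self.state is ChallengeState.IN_PROGRESS
        self.state = ChallengeState.WIN if won else ChallengeState.LOSE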
In games, solving a challenge may imply solving a set of sub-challenges. The structure of the game as a set of quests and levels is a logical, topological and, as a consequence, temporal combination of challenges (see [29] for a formal definition of this organization). Consider two challenges a and b and a game where the player must solve a in order to solve b; a is said to be a sub-challenge of b. When the player knows how to solve b, he knows how to solve a. As a consequence, any solution of b includes at least one of the solutions of a.
5.2. Progression of difficulty
In video games, the choice and succession of challenges are related to the flow principle explained in Section 2. In many game design books, the progression of tension cycles is presented using learning/difficulty curves as depicted in Fig. 3 [30,31]. At any time in the game, the difficulty of the next challenge must be a little higher than the current level of the player's apprenticeship. When he wins the challenge and the tension decreases, the player gains new skills and abilities. This correlated progression of skills, abilities and difficulty must be maintained all along the game.
The same idea is expressed by Jesper Juul using the notion of repertoire of methods [32]:

‘At every instant within a game, a player has created for himself a collection of methods and strategic rules which he has devised and which he applies (the player's repertoire). One strength of a good game is to constantly challenge the player, which in turn leads him to constantly find new strategies, apart from those already in the repertoire.’
Thus, we propose to study the difficulty of each challenge played in a given sequence as a means to follow the progression of difficulty along that particular path in the game.

We distinguish two ways of controlling the progression of difficulty: (a) the progression of skills and (b) the combination of challenges. The progression of skills relates the difficulty of a given challenge to the mastery of a given set of parameters. Here, difficulty relates to an idea of complexity: what type of problem is a human "processor" able to face, taking into account his level of practice? How fast can he press buttons, memorize configurations, how precise can his shot be, how long can he stay on one foot? As in any sport or mental exercise, the player's practice enhances his skills, and the same challenge can be replayed with parameters chosen to increase the level of difficulty.
The second way we identify of controlling the progression of difficulty is by combining several challenges. The solution of many game challenges relies on mastering a set of basic techniques and then trying to combine them. In strategy games, you first master the strength and movements of units, then the positioning of production units, then the technological evolution. At each level of a game, the player understands new mechanisms, then slowly combines them. It is the same with basic attacks and combos in fighting games, with group strategies in FPS and, last but not least, with the increasing complexity of puzzles in adventure games.
As an illustration, consider the three following challenges: (A) send a ball into a basket; for a given ball, the difficulty of A decreases with the size X of the basket. (B) Press a button when a given event occurs; the difficulty decreases with the accepted error E between the date of the event and the moment the button is pressed. (C) For given X and E, sending a ball into a basket when a given event occurs is more difficult than A followed by B. The point here is that challenge (C) is a combination of challenges (A) and (B), such that individually mastering the two latter allows the player to increase his skills and eventually overcome (C).

Based on these assumptions about what we could call the compositionality of challenges, we consider that in a game, the progression of difficulty relies on the use of two sets of challenges:
Fig. 2. Automaton of a challenge.
Fig. 3. Difficulty and learning curves.
- A set of basic or atomic challenges, whose complexity can be controlled through a set of parameters.
- A partially ordered set of composite challenges, built on top of atomic challenges (partially ordered in the sense of challenges that can be played following a given scenario). The solutions of composite challenges can be deduced from those of lower level challenges.
5.3. The difficulty evaluation function for a given challenge
Our aim is to give a measurable notion of the difficulty D(a) of a given challenge a. Let us set up some properties that this function must fulfil:

- D must be measurable, using for example a tool able to record the internal states of the game.
- D must allow comparing the difficulty of challenges of the same "kind" (jumping in a platform game, for example).
- D must be relative to the game history, in particular to the progression of the player's skill according to the set of challenges already overcome.
Let A be the set of all challenges that have been solved before time 0. We define LOSE(a, t) and WIN(a, t) as the following events:

LOSE(a, t) = the automaton of a reaches the state LOSE before time t, starting at time 0.
WIN(a, t) = the automaton of a reaches the state WIN before time t, starting at time 0.

We propose to define the difficulty D of a given challenge a, relative to time t, as the conditional probability:

D(a, t) = Probability{LOSE(a, t) | A}   (1)

The easiness E of a can be defined in the same way:

E(a, t) = Probability{WIN(a, t) | A}   (2)

At all times, D(a, t) + E(a, t) ≤ 1. We can also consider the steady state difficulty and easiness:

D∞(a) = lim_{t→∞} D(a, t)   and   E∞(a) = lim_{t→∞} E(a, t)   (3)

If challenge a must necessarily be finished in the game, then we have:

D∞(a) + E∞(a) = 1   (4)
These functions give us two kinds of information about the challenge difficulty. First, E∞ gives us the difficulty of the challenge in terms of the probability that a player overcomes it. But we can also be more precise and, with E(a, t), get the probability that a player overcomes the challenge before time t. We assume that designers are able to implement in the game code triggers associated with the transitions of each challenge automaton during a test performed by players. The time needed to perform a challenge, and whether the challenge was successful or not, can then be recorded.
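Assuming such triggers log, for every playtest session of a challenge, the outcome and the time at which the terminal transition fired, the empirical estimation of D(a, t) and E(a, t) is direct. The record format below is hypothetical.

def estimate_difficulty(records, t):
    """Empirical D(a, t): fraction of sessions that reached LOSE before t.
    Each record is an (outcome, t_end) pair, outcome being "win" or "lose"."""
    lost = sum(1 for outcome, t_end in records if outcome == "lose" and t_end <= t)
    return lost / len(records)

def estimate_easiness(records, t):
    """Empirical E(a, t): fraction of sessions that reached WIN before t."""
    won = sum(1 for outcome, t_end in records if outcome == "win" and t_end <= t)
    return won / len(records)

records = [("lose", 12.0), ("win", 30.5), ("lose", 48.0), ("win", 20.1)]
print(estimate_difficulty(records, 25.0))  # 0.25: one loss before t = 25
# Once every session has finished, the D and E estimates sum to 1 (Eq. (4)).
print(estimate_difficulty(records, 1e9) + estimate_easiness(records, 1e9))  # 1.0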
The definition of the difficulty of a challenge as a function of time does not mean that the difficulty of a challenge is, in general, simply related to the time spent overcoming it. The difficulty of some challenges is a decreasing function of the time to overcome them: disarm a bomb in less than one minute. It can also be an increasing function when, for example, the number of enemies increases with the time spent in a room. The time spent solving an enigma is related to its difficulty. Many other cases can be found in games. In our definition we do not assume how time influences the difficulty of a challenge. This relation is needed because we define the difficulty as a stochastic process, and the definition and parameter evaluation of such a process are related to a behavior observed over a given period. Our definition is directly inspired by the formal specification of dependability parameters such as the safety or the availability of a system. For example, S(t), the safety of a plane during a flight (the probability that a catastrophic event does not occur before time t), is not, in general, a simple function of the mission time. The asymptotic safety S∞, the probability that the flight ends safely, is the equivalent of the asymptotic difficulty D∞.
Moreover, difficulty should be defined as a stochastic process, as this is the way it is considered when building a gameplay. Indeed, shaping a player's experience, and thus crafting the right difficulty curve, implies a thorough understanding of the time it takes to win a challenge. Thus, D∞(a) is a good, synthetic estimation of challenge a's difficulty, but D(a, t) gives precious additional information to the game designer.
In an open game, there is little chance that two players will reach the same challenge following the same history. Hence A, the set of challenges already solved, can differ between two players. So, it is necessary to drive the tester with a walkthrough. In this case the difficulty D and the easiness E can be statistically estimated and, under some ergodic assumptions, D∞(a) and E∞(a) as well. This allows numerous assumptions about the learning and difficulty curves to be validated experimentally. For example, if a is a sub-challenge of b, then D∞(a) < D∞(b). In the same vein, if the player plays a twice, even if he loses the first attempt, the second one should be less difficult. Denoting (a knowing a) this second attempt:

D∞(a knowing a) ≤ D∞(a)

If a is a sub-challenge of b and the player has already played a before b, then

D∞(b knowing a) ≤ D∞(b)
Defining the difficulty of the game as a probability is also very useful, as it helps us get a precise evaluation of the challenge's difficulty which only depends on a stable, accepted variable of the gameplay: the challenge's binary result. Indeed, only taking into account the binary result would not be appropriate without a statistical approach: the tuning of a gameplay variable can lead to subtle changes in the difficulty that a binary measure is not able to represent. But from many observations of this binary result, we can get a fairly precise evaluation of the challenge's difficulty.
This section has described a general definition of difficulty for a given challenge in the context of a given path of challenges. It can be measured experimentally, using many players constrained to a specific sequence of challenges by a walkthrough. In order to simplify the formulae of the next section, we assume that any time we write D we implicitly refer to D∞; as a result, time can always be reintroduced into any of the following definitions of difficulty. The next section adopts a complementary approach, exploring the relationship between the basic behaviors the player learns and practices when playing a game and his probability of losing or winning a challenge according to his level of mastery of them.
6. Abilities and difficulty
In this section we propose a method to analyse the link between a challenge's difficulty and the abilities the player has to develop in order to win it. We call abilities the basic behaviors the player has to learn and practice in order to increase his efficiency at playing the game. In a racing game, for example, the player learns to follow the best trajectory for every type of road bend, like sharp bends or double bends. In shooting games, the player learns to be more and more accurate with each weapon, and to keep changing direction to prevent the other players from aiming at him. Correctly executed, these abilities help the player win the challenge. Our objective here is to link the player's level for a given ability to the probability of winning or losing the challenge.
We define the relation between the level of a given ability and the result of a challenge completion as a conditional probability, that is, the probability of losing a challenge knowing the player's current ability level. What we propose to measure is the player's level for one specific ability while he is playing a given challenge. Having collected many of these measures (for many players) for each level of a particular ability, we are able to calculate the probability the player has of winning the challenge knowing his level for that ability. We thus statistically estimate the relation between the ability level of the player and the challenge completion probability. To keep our proposal simple, we consider a finite and small number of levels.
This conditional probability can reveal many aspects of the gameplay. It can be used to understand which ability is the most important for winning a challenge: a strong correlation between the ability level and the probability of winning the challenge means that this ability is an important aspect of the gameplay for this challenge. It can also be used as a basis for comparing challenges (on the basis of the same ability): an important variation of the required ability level between two consecutive challenges denotes an important change in the gameplay between them. It can finally be used as a prediction tool: if we know, from statistical measures, the probability one has of winning a specific challenge for a given ability level, we can observe a player, calculate his current level for this ability, and obtain his probability of winning the challenge. This kind of prediction can be very useful when dynamically adjusting the difficulty of a game, for instance.

The remainder of this section explains how to calculate the probability the player has of losing, knowing his level for one specific ability.
6.1. Assumptions and notations
In this section, we first set up some hypotheses and notations used further on. We start by defining the player level for an ability. The ability level must be calculated for a specific ability and a specific challenge. Indeed, the player's level for an ability depends on the context and can vary between challenges: for example, the faster the targets move, the lower the player's accuracy will be. To keep the model as simple as possible, the ability level is a discrete value, for instance one of the following: {bad, average, good}. (The number of levels is not fixed and can be modified to match the needed level of description; but the more precise the model, the more data it must be fed to be statistically valid.) We thus assume the following:
- a game is made of challenges, called c ∈ C.
- the player develops abilities, called a ∈ A.
- each ability can be mastered at a certain discrete level l ∈ L. In this article, we consider L as having three levels: L = {bad, average, good}.
- the function plevel_c(a): A × C → L is the player level for ability a when the player is engaged in challenge c. The method to compute this function is given in the next section.
Given these definitions, we state the link between a challenge c and the player's level of mastery of an ability a as the conditional probability δ(c, a), given by:

δ(c, a) = Probability{LOSE(c) | plevel_c(a)}   (5)

In the next sections, we statistically evaluate, for each value of plevel_c(a) (that is, bad, average or good in this article), the probability the player has of losing the challenge. The first step, explained in the next section, is to compute the player level for a given ability a and a given challenge c, which we call plevel_c(a).
6.2. Computation of plevel_c(a)
We propose to statistically evaluate the player level for an ability from his playing history during a game session. We assume two hypotheses here: the player behavior can be recorded as a sequence of game events that we call a trace; and the game designer, as an expert, can help us determine the most important events describing the player behavior in the context of the specific ability we try to measure. We use this trace of game events to estimate the player level for a given ability. The following sections describe the traces of events and the calculation of the ability level.
6.2.1. Trace of events
We measure a player's ability for a specific challenge and ability. To do so, we record what happens in the game while the player is trying to solve the challenge, and measure the quality of his behavior along the dimension of our chosen ability. We do not record every change in the game state, but only the changes related to the player ability we want to measure. If we measure accuracy, we only need to record two specific events: when the player shoots and when he hits the target. We thus record traces of these events, defined as follows:
- a game event ge_{a,c} is an important change in the game state related to the ability a we are measuring during the challenge c.
- a challenge ends when it reaches a terminal state, that is, when the player has won or lost this challenge (see the automaton in Fig. 2).
- when the player tries to complete a challenge c, he is involved in a challenge session s_{a,c}, related to an ability a.
- a challenge session trace t_{a,c} ∈ T_{a,c} is the sequence of all game events ge_{a,c} recorded during a challenge session s_{a,c}.
- the function result(t_{a,c}): T_{a,c} → {win, lose} determines whether, at the end of a given challenge session, the player has won or lost the challenge.

A sketch of these definitions as data structures follows.
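The field names below are illustrative, since the paper does not prescribe an implementation; the trace is simply a recorded list of timestamped events plus the session's final result.

from dataclasses import dataclass, field
from typing import List

@dataclass
class GameEvent:
    """ge_{a,c}: a game state change relevant to the measured ability."""
    name: str          # e.g. "shot_fired", "target_hit"
    timestamp: float

@dataclass
class ChallengeSessionTrace:
    """t_{a,c}: all relevant events of one challenge session."""
    ability: str
    challenge: str
    events: List[GameEvent] = field(default_factory=list)
    result: str = "lose"  # "win" or "lose", set when the session ends

    def log(self, name, timestamp):
        self.events.append(GameEvent(name, timestamp))

# Usage: a shooting session where accuracy is the measured ability.
trace = ChallengeSessionTrace(ability="accuracy", challenge="arena_1")
trace.log("shot_fired", 1.2)
trace.log("target_hit", 1.3)
trace.result = "win"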
6.2.2. Abilities and subgoals
All our measures rely on one central hypothesis: we suppose that the main goal of the player is to win the challenge he is involved in. To overcome it, the player repeatedly tries to reach short term goals, or sub-goals, that allow him to move closer to his main goal. When playing a game like God of War (Sony – Santa Monica Studio), the player is slashing groups of enemies, one after the other. Each group can be considered as a challenge. But when playing, his actual goal is to execute a combo at the right moment, to kill as many enemies as possible. He will try to reach this sub-goal as often as possible, and eventually will kill all the enemies and progress to the next challenge.
We also suppose that mastering an ability eventually leads the player to reach a certain sub-goal when playing the game (we assume there is a unique sub-goal for each ability, and conversely). During one challenge session, many games involve the repetition of the basic behaviors we call abilities. Thus, the player will try, again and again, to attain the sub-goal related to the ability we measure, with a certain probability of success. During a challenge session, we count how many times the player succeeds or fails to attain this sub-goal, and thus get a statistical measure of the ability level from his many attempts to reach it.

The attainment of this sub-goal must only depend on the player's level for this ability. For example, in the Mario series (Nintendo), we cannot measure the player's ability to kill mushrooms from the goal of staying alive: it is too broad a goal, also influenced by the player's capacity to jump from one platform to another, avoid turtle shells, and many other skills. But the sub-goal of having killed a mushroom is perfect for this ability.

Given a challenge session trace, we assume we are able to identify when the player has reached the sub-goal related to the ability, and when he has failed to reach it. If we do so, we can evaluate the frequency at which the player has reached the sub-goal, and use this score as the player level for this ability. Having a trace t_{a,c} ∈ T_{a,c} of all game events happening during a session for a challenge c and ability a, we need to fulfill the following conditions in order to calculate the player level:
- each ability a ∈ A that the player can use in challenge c must correspond to a short term goal g_{a,c} of the player.
- there exists a bijective function subgoal_c(a), statically defined with the assistance of the game designer, giving the sub-goal g_{a,c} related to an ability a during a challenge c.
- we must be able to count, in a challenge session trace t_{a,c}, how many times the player has reached g_{a,c}, which we define as the function nsuccess_c(t, a).
- we must be able to count, in a challenge session trace t_{a,c}, how many times the player has failed to reach g_{a,c}, which we define as the function nfailure_c(t, a) (this hypothesis amounts to being able to recognize the patterns of events corresponding to the attainment of a given sub-goal).
If we know how many times the player succeeded or failed to reach the sub-goal related to an ability, we know the probability the player has of reaching the sub-goal:

ProbaSubgoal_c(t, a) = nsuccess_c(t, a) / (nsuccess_c(t, a) + nfailure_c(t, a))   (6)
The resulting probability is thus a score for the player's level on a given ability during a challenge session trace. To lower the complexity of the player model, we do not keep this raw value but map it onto the discrete values of L = {bad, average, good}. The mapping could, for example, be defined as follows:

plevel_c(t, a) = bad       if 0 ≤ ProbaSubgoal_c(t, a) < 0.3
                 average   if 0.3 ≤ ProbaSubgoal_c(t, a) < 0.6
                 good      if 0.6 ≤ ProbaSubgoal_c(t, a) ≤ 1      (7)
The following algorithm shows the computation of the function plevel_c(t, a). The function does not take the challenge as a parameter, as we only give it the traces recorded for a specific challenge and ability.
1  function plevel(Ability a, Trace t)
2  return Level
3  {
4    var g = subgoal(a); // Get the sub-goal related to a (traces come from one fixed challenge c)
5    var nbsuccess = 0;  // To count the successes
6    var nbtotal = 0;    // Total attempt counter
7
8    foreach subtrace ti of t where the player tries to reach g
9    {
10     if (success(ti, g)) // If the sub-goal was reached
11       nbsuccess++;
12     nbtotal++;
13   }
14
15   // Get the probability to reach the sub-goal
16   var p = nbsuccess / nbtotal;
17
18   // Map it to the discrete value of Eq. (7)
19   var level = "bad";
20   if (p >= 0.3)
21     level = "average";
22   if (p >= 0.6)
23     level = "good";
24
25   return level;
26 }
Many different means can be used to know whether a player is trying to reach a sub-goal (line 8 of the previous algorithm). We will explore different options, the first one being to manually define a state machine that recognizes when a sequence of events means that the player has succeeded or failed to reach the goal.
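As an illustration, a hand-written recognizer for the accuracy example above (sub-goal: hit the target after a shot) could scan the trace as follows. The event names and the rule that closes an attempt are assumptions, reusing the GameEvent sketch from Section 6.2.1.

def count_subgoal_attempts(events):
    """Return (nsuccess, nfailure) for the 'hit the target' sub-goal.

    Each "shot_fired" opens an attempt; the attempt succeeds if a
    "target_hit" follows, and fails if another shot comes first or the
    trace ends with the attempt still open.
    """
    nsuccess = nfailure = 0
    attempt_open = False
    for event in events:
        if event.name == "shot_fired":
            if attempt_open:
                nfailure += 1      # the previous shot never hit
            attempt_open = True
        elif event.name == "target_hit" and attempt_open:
            nsuccess += 1
            attempt_open = False
    if attempt_open:
        nfailure += 1              # the last shot never hit
    return nsuccess, nfailure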
Now that we can calculate the player level for a given ability and a recorded challenge session, we are ready to calculate δ, and will do so in the next section. We record many challenge session traces for a given challenge, group them according to the player ability level observed in each trace, and thus calculate the probability the player has of losing in each case of the conditional probability δ, that is, for each player ability level. The next section explains this part of the calculus.
6.3. Calculating δ
We are going to statistically evaluate δ using the set of traces T_{a,c} we have recorded. We partition the set T_{a,c} of all the challenge session traces recorded for challenge c and ability a into card(L) subsets, T^bad_{a,c}, T^average_{a,c} and T^good_{a,c}, using plevel_c(t, a):

- T^bad_{a,c} contains all the traces t_{a,c} ∈ T_{a,c} where plevel_c(t, a) = bad.
- T^average_{a,c} contains all the traces t_{a,c} ∈ T_{a,c} where plevel_c(t, a) = average.
- T^good_{a,c} contains all the traces t_{a,c} ∈ T_{a,c} where plevel_c(t, a) = good.
Our partition groups traces into subsets by ability level. Each of these levels is a different value of the conditional part of the probability δ. The next step is to estimate the probability to lose for each ability level.
As we said in the previous section, we also know, for any challenge session trace, whether the player has won or lost the challenge, that is, the challenge session's result result_c(t, a). We can thus count, in any subset T' of T_{a,c}, the number of traces where the player has lost, which we call nlost(T').

Given the number of lost challenge sessions in each partition subset nlost(T'), we can now get the player's probability to lose for each ability level, that is, for each subset of T_{a,c}:

ProbaLost(T') = nlost(T') / card(T')   (8)
We can then build the whole conditional probability table of the function δ for an ability a and a challenge c. We calculate, from our traces in T_{a,c}, the value ProbaLost(T') for every T' ∈ {T^bad_{a,c}, T^average_{a,c}, T^good_{a,c}}. These values correspond to the probability to lose for each player level plevel_c(t, a), and we thus know all the conditional probabilities of the function δ:

δ(c, a) = Probability{LOSE(c) | plevel_c(a)}   (9)
The next algorithm shows the computation of δ using plevel_c(t, a) defined in the previous section. It calculates the function δ for a given player level and an array of challenge session traces recorded for a specific challenge. Thus, the function need not know for which challenge the calculus is done, as it only receives the session traces of a specific challenge.
1  function delta(Ability a, Level l, Trace T[])
2  return float
3  {
4    var nblost = 0;  // To count the lost sessions
5    var nbtotal = 0; // Total counter
6
7    foreach trace t of T
8    {
9      // Take only the traces corresponding to the level we want
10     if (plevel(a, t) == l)
11     {
12       if (result(t) == "lose") // delta estimates the probability to lose
13         nblost++;
14       nbtotal++;
15     }
16   }
17
18   return nblost / nbtotal;
19 }
6.4. From local to global abilities
In the previous section, we proposed a way to measure the player's abilities to win a challenge. These abilities can be called local, because we only define and measure them in the context of a specific challenge. However, we explained in Section 5.2 that what the player learns in a challenge is often useful in the next one, and we defined a relation between atomic and higher level challenges. As a result, many abilities must be shared between challenges: if the player learns to perform a specific task to win a challenge, a higher level challenge of the same type is likely to rely on the same kind of ability, but with different difficulty settings.

Thus, it would be interesting for the game designer to define global abilities, shared between a set of challenges, and to compare the difficulty of these challenges for a given level of such a global ability. If the player must be good at aiming for any challenge of the game, it would be much more interesting to know the probability he has of losing each challenge given his general ability to aim, because we may then compare the challenges with each other.
The problem we face when defining global abilities over multiple challenges is that we will not find the same ability level (Eq. (7)) for every challenge. A player may aim poorly and be rated as bad for the aiming ability on a hard challenge, because targets move really fast, and be rated as good on a much easier challenge. The probability that Eq. (7) gives us will not be the same depending on the challenge, because the context of the challenge influences it.

To transform the local ability given by Eq. (7) into a global one, we propose to observe the local ability ratings the same player gets when playing two different challenges successively. If we make the rough approximation that the player's level has not improved much between these consecutive challenges, and thus consider it constant, the two ratings we get are measures of the same ability value within two different challenge contexts. Given many examples of consecutive local ability ratings, we may find a function mapping these local abilities to a global one. As a result, we may then define the difficulty of these challenges over the same global ability. This proposition must be further explored through experimentation, though.
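As an exploratory sketch of this calibration idea, under the constant-skill approximation one could fit an affine map between the paired local ratings of the same players on two consecutive challenges, then average the calibrated ratings. The data and the affine model below are illustrative assumptions, not a validated method.

import numpy as np

# ProbaSubgoal ratings (before discretization) of five players on two
# consecutive challenges; c2 is harder, so the same players score lower.
p_c1 = np.array([0.70, 0.55, 0.40, 0.80, 0.30])
p_c2 = np.array([0.45, 0.35, 0.20, 0.55, 0.10])

# Least-squares fit of p_c1 ~ a * p_c2 + b (player skill assumed constant).
A = np.vstack([p_c2, np.ones_like(p_c2)]).T
(a, b), *_ = np.linalg.lstsq(A, p_c1, rcond=None)

def global_ability(local_c1, local_c2):
    """Average the c1 rating with the c2 rating mapped onto c1's scale."""
    return 0.5 * (local_c1 + (a * local_c2 + b))

print(global_ability(0.60, 0.30))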
6.5. Understanding the player's abilities that really count
The automated system we propose is primarily designed to give a clear, synthetic view of a challenge's difficulty, that is, the probability the player has of losing the challenge. But it can also be very useful to understand how this difficulty is created, that is, the core components that make the game hard or easy.

Indeed, the designer has an idea of the abilities the player has to master in order to rule the game. In a shooting game, the player has to move fast, remember the map layout, aim precisely, and use certain strategies. But even if the designer can, from his thorough understanding of the gameplay he created, identify the player's abilities, he may not know exactly which of them is the most important, which of them really makes an actual player win or lose the game.
That is why, aside from calculating each challenge's ability levels, we designed our software to output a graphical view of Eq. (5), as well as useful statistical values such as a linear regression. That way, the designer can visualize the link between any ability he can evaluate as a short term goal and the difficulty of the game, and get a better understanding of the abilities that really count for a specific challenge.
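A sketch of such an output, with placeholder numbers rather than measured data, could plot the estimated probability to lose at each ability level (Eq. (5)) together with a linear fit whose slope summarizes how strongly the ability drives the challenge's difficulty:

import numpy as np
import matplotlib.pyplot as plt

levels = np.array([0, 1, 2])               # bad, average, good
proba_lost = np.array([0.85, 0.55, 0.20])  # ProbaLost per level subset (made up)

slope, intercept = np.polyfit(levels, proba_lost, 1)

plt.plot(levels, proba_lost, "o", label="measured delta(c, a)")
plt.plot(levels, slope * levels + intercept, "--",
         label=f"linear fit (slope = {slope:.2f})")
plt.xticks(levels, ["bad", "average", "good"])
plt.xlabel("plevel_c(a)")
plt.ylabel("probability to lose challenge c")
plt.legend()
plt.show()

A strongly negative slope indicates that the ability matters a lot for this challenge, while a flat line suggests it barely influences the outcome.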
7. Conclusion
There is a lack of a precise definition of difficulty in games and, more importantly, game designers lack methodologies and tools to measure it. In this paper, we perform a first step to fill this gap. Our approach is twofold: first, we propose a precise and measurable definition of difficulty for challenges in a game, based on the past experience of the player; second, we provide a method to explore one specific aspect of difficulty in challenges: the relation between what the player learns and the probability he has of overcoming a particular challenge.
In a first experiment using a basic AI driven synthetic player, we are able to extract objective difficulty measures from a simple game. As a generalization, we propose a measurable definition of difficulty based on the past experience of the player. This notion uses the set of challenges the player has tried to solve, and defines the difficulty of a challenge as the probability to lose over time. The resulting function is measurable under two hypotheses: the game design provides a clear specification of challenges, and we are able to constrain the players to a specific sequence of challenges while recording the results of their playing sessions. This first measure provides a general vision of the game's difficulty.
As a complementary measure, we explore more thoroughly what the player learns when playing a game. More precisely, we propose to measure the quality of the player's execution of some basic behaviors he learns and practices while playing, which have been previously identified by the game designer as important to overcome a given challenge. We call these basic behaviors abilities, and assume they are measurable during a game session. We then propose to calculate the relation between the level of mastery of a particular ability and the probability the player has of losing a given challenge.
These measures can provide insights to game designers when trying to assess different aspects of a gameplay. They can help them determine the importance of an ability for a specific challenge, or compare challenges on the basis of one specific ability. The next step of our research is to implement both measures in different types of games, and to experiment with them using real players.
References
[1] D. Boutros, Difficulty is difficult: designing for hard modes in games,
Gamasutra, 2008. Available from: <http://www.gamasutra.com/> (last access
01/2009).
[2] E. Adams, The designer’s notebook: difficulty modes and dynamic difficulty
adjustment, Gamasutra, 2008. Available from: <http://www.gamasutra.com/>
(last access 01/2009).
[3] J. Juul, The game, the player, the world: looking for a heart of gameness, in: M.
Copier, J. Raessens (Eds.), Level Up: Digital Games Research Conference
Proceedings, 2003, pp. 30–45.
[4] R. Hunicke, The case for dynamic difficulty adjustment in games, in: Advances
in Computer Entertainment Technology, 2005, pp. 429–433.
[5] U. Eco, L’oeuvre ouverte, Seuil, 1965.
[6] R. Koster, A Theory of Fun for Game Design, Paraglyph Press, Scottsdale,
Arizona, 2005.
[7] P. Sweetser, P. Wyeth, Gameflow: a model for evaluating player enjoyment in games, Comput. Entertain. 3 (3) (2005) 3, doi:10.1145/1077246.1077253.
[8] G.N. Yannakakis, J. Hallam, Towards optimizing entertainment in computer
games, Applied Artificial Intelligence 21 (10) (2007) 933–971.
[9] M. Csikszentmihalyi, Flow: The Psychology of Optimal Experience, Harper Perennial, 1991.
[10] W. IJsselsteijn, Y. De Kort, K. Poels, A. Jurgelionis, F. Belotti, Characterising and
measuring user experiences in digital games, in: International Conference on
Advances in Computer Entertainment Technology, 2007.
[11] G. Andrade, G. Ramalho, H. Santana, V. Corruble, Extending reinforcement
learning to provide dynamic game balancing, in: IJCAI 2005 Workshop on
Reasoning, Representation, and Learning in Computer Games, 2005, pp. 7–12.
[12] B. Scott, Architecting a game AI, in: AI Game Programming Wisdom 1, Charles
River Media, Inc., 2002.
[13] A. Macleod, Game design through self-play experiments, in: ACE ’05:
Proceedings of the 2005 ACM SIGCHI International Conference on Advances
in Computer Entertainment Technology, ACM, New York, NY, USA, 2005, pp.
421–428. doi:http://doi.acm.org/10.1145/1178477.1178572.
[14] N. Kirby, AI as gameplay analysis tool, in: Game Programming Wisdom 4,
Course Technology, Cengage Learning, 2008, pp. 39–49 (Chapter 1).
[15] T. Bullen, M.J. Katchabaw, N. Dyer-Witheford, Automating content analysis of
video games, in: Canadian Game Studies Association (CGSA) Symposium,
2006.
[16] G. van Lankveld, P. Spronck, M. Rauterberg, Difficulty scaling through
incongruity, in: Proceedings of the Fourth Artificial Intelligence and
Interactive Digital Entertainment Conference, Stanford, California, USA,
2008.
[17] A. Nantes, R. Brown, F. Maire, A framework for the semi-automatic testing of
video games, in: Artificial Intelligence and Interactive Digital Entertainment
Conference, AAAI, 2008.
[18] B. Chan, J. Denzinger, D. Gates, K. Loose, J. Buchanan, Evolutionary
behavior testing of commercial computer games, Congress on Evolutionary
Computation, CEC2004 1 (2004) 125–132, doi:10.1109/CEC.2004.1330847.
[19] P. Spronck, Evolving improved opponent intelligence, in: GAME-ON Third
International Conference on Intelligent Games and Simulation, 2002, pp.
94–98.
[20] J.E. Laird, Game Developers Magazine.
[21] J. Pittman, The pac-man dossier, Gamasutra, 2009. Available from: <http://
www.gamasutra.com/> (last access 01/2009).
[22] G.N. Yannakakis, J. Hallam, Evolving opponents for interesting interactive
computer games, in: Animals to Animats 8: Proceedings of the Eighth
International Conference on Simulation of Adaptive Behavior (SAB-04), The
MIT Press, Santa Monica, CA, USA, 2004, pp. 499–508.
[23] S.M. Lucas, Evolving a neural network location evaluator to play Ms. Pac-Man,
in: Proceedings of the 2005 IEEE Symposium on Computational Intelligence
and Games (CIG05), 2005.
[24] M. Gallagher, M. Ledwich, Evolving pac-man players: can we learn from raw
input? in: Computational Intelligence and Games (CIG), 2007.
[25] J. Bonet, C. Stauffer, Learning to play pac-man using incremental
reinforcement learning, 2001. Available from: <http://www.ai.mit.edu/
people/stauffer/Projects/PacMan> (accessed 5.12.2008).
[26] J.P. Rosca, Generality versus size in genetic programming, in: Genetic
Programming 1996: Proceedings of the First Annual Conference, MIT Press,
1996, pp. 381–387.
[27] M. Gallagher, A. Ryan, Learning to play pac-man: an evolutionary, rule-based
approach, in: Evolutionary Computation (CEC ’03), vol. 4, 2003, pp. 2462–
2469.
[28] Watkins, C.J. Learning from delayed rewards, Ph.D. Thesis, Cambridge, 1989.
[29] S. Natkin, L. Vega, A petri net model for computer games analysis, International
Journal of Intelligent Games & Simulation 3 (1) (2004) 37–44.
[30] S. Natkin, A.-M. Delocque-Fourcaud, E. Novak, Video Games and
Interactive Media: A Glimpse at New Digital Entertainment, AK Peters Ltd.,
2006.
[31] E. Byrne, Game Level Design (Game Development Series), Charles River Media,
2004. Available from: <http://www.amazon.ca/exec/obidos/redirect?tag=
citeulike09-20&amp;path=ASIN/1584503696>.
[32] J. Juul, Half-Real: Video Games between Real Rules and Fictional Worlds, The
MIT Press, 2005. Available from: <http://www.amazon.ca/exec/obidos/
redirect?tag=citeulike09-20&amp;path=ASIN/0262101106>.