
Preliminary Study of the Performance of the miniPXI when Measuring Player Experience throughout Game Development

Authors:
Aqeel Haider, KU Leuven, Belgium, aqeel.haider@kuleuven.be
Günter Wallner, Johannes Kepler University Linz, Austria, guenter.wallner@jku.at
Kathrin Gerling, Karlsruhe Institute of Technology, Germany, kathrin.gerling@kit.edu
Vero Vanden Abeele, KU Leuven, Belgium, vero.vandenabeele@kuleuven.be
ABSTRACT
Short questionnaires, using only a few items to measure user experience related constructs, have been used in a variety of domains. In the field of games, the miniPXI is such a validated short version of the Player Experience Inventory (PXI), containing 11 single items to measure 11 different PX-related constructs. Previous validations of the miniPXI were carried out in an experimental setting with existing, fully finished games. In this study, we conduct a preliminary investigation of the potential of the miniPXI to evaluate prototypes during game development. We explore differences in PX across two iterations of nine game prototypes, based on input from 16 participants. Findings suggest that the miniPXI is capable of detecting differences between the two prototype versions. In addition, at the level of individual games, the miniPXI is effective at identifying differences in nearly all PX dimensions. However, we also find limited use for the single Enjoyment item, and suggest that including alternative measures such as the Net Promoter Score may be more useful. Overall, this work suggests that the miniPXI has the potential to evaluate different iterations of game prototypes, starting from the earliest stages of game development.
CCS CONCEPTS
• Human-centered computing → User studies.
KEYWORDS
short questionnaire; player experience evaluation; games user research; validity analysis
ACM Reference Format:
Aqeel Haider, Günter Wallner, Kathrin Gerling, and Vero Vanden Abeele. 2023. Preliminary Study of the Performance of the miniPXI when Measuring Player Experience throughout Game Development. In Companion Proceedings of the Annual Symposium on Computer-Human Interaction in Play (CHI PLAY '23 Companion), October 10–13, 2023, Stratford, ON, Canada. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3573382.3616076
1 INTRODUCTION
Games User Research (GUR) plays a key role in evaluating player experience (PX) in games. Through the collection of empirical data via game evaluations, GUR offers valuable insights for game development, including factors such as the perception of challenge and engagement [5]. In the past decade, the methodologies employed in GUR have undergone rapid advancements, from biometric analysis [21] to advanced machine learning and adaptive AI on the basis of player metrics [15]. However, gathering subjective evaluations of player experience via surveys remains highly relevant. Consequently, questionnaires are a crucial element of GUR, serving as a means of capturing self-reported PX, including aspects such as overall enjoyment, mastery, or immersion [11].

A number of validated questionnaires have been developed for the purpose of evaluating PX [1, 2, 6, 7, 16, 24]. However, these questionnaires contain a large number of items, and participants perceive them as lengthy, which can be a barrier to their widespread adoption and use [14], particularly within an industry environment.
To address participant fatigue and shorten the time needed for GUR evaluations, short or single-item measures have recently emerged in the field of games research, such as the HEXAD-12, a short version of the Gamification User Types Hexad Scale [18], the GUESS-18, a shorter version of the Game User Experience Satisfaction Scale (GUESS) [17], or the miniPXI, an eleven-item measure of the Player Experience Inventory (PXI) [14]. These measures typically use only a few items per player experience construct, which has practical advantages such as quicker response times [22], lower respondent dissatisfaction [25], and fewer data omissions [8]. However, while these measures offer advantages in terms of efficiency and convenience, their reliability and validity in capturing the multifaceted nature of PX require additional investigation [14]. In particular, further research is needed to verify the extent to which limited-item measures can effectively capture the diverse and complex dimensions of PX, and the extent to which they are suitable within iterative game development processes.
We address this issue in the context of the miniPXI, a short version of the PXI [1]. Results of the validation [14] show that the miniPXI can be a valuable tool for PX evaluations where longer measures are not feasible. However, the validation took place in a somewhat artificial setting, i.e., through delayed recall of player experiences and/or the evaluation of fully finished games [14]. To date, it is not well understood how the miniPXI performs in a more realistic setting that involves immediate evaluations of intermediate game prototypes, as is typical of game development processes. We aim to close this gap through the work presented here: our study aims to gain insights into the capabilities of the miniPXI in capturing the nuances of PX when relying on immediate recall (versus delayed recall), and its potential as a tool for evaluating game prototypes (versus finished games) and identifying differences between different iterations of the same game, or across game genres.
RQ1: Does the miniPXI effectively identify differences across iterations of game prototypes?

RQ2: Can the miniPXI successfully identify variations between different game genres?
In the current study, a preliminary investigation is conducted involving 16 participants in the evaluation of two iterations of nine games. Through our analysis, we determine that the miniPXI is capable of detecting differences between the two iterations. In addition, at the level of individual games, the miniPXI is effective at identifying differences in nearly all cases. However, we also find limited use for the single Enjoyment item and suggest that alternative measures, such as the Net Promoter Score (NPS) [23], may be more useful.
2 RELATED WORK
In the following section, we summarize key aspects of evaluating player experience via single-item surveys and summarize findings with respect to their effectiveness. Additionally, we zoom in on the miniPXI as a single-item variant.
2.1 Single-item questionnaires for measuring PX
Self-report questionnaires allow players to provide 'subjective' feedback and insights into their personal experiences, thoughts, and emotions while playing a game. Therefore, a number of validated questionnaires have been created for the purpose of evaluating PX at these levels, among others, the Game Engagement Questionnaire (GEQ) [6] with 19 items, the Player Experience of Need Satisfaction questionnaire (PENS) [24] with 21 items, the Ubisoft Perceived Experience Questionnaire (UPEQ) [2] with 21 items, or the Player Experience Inventory (PXI) [1] with 33 items. Consequently, these scales, with the number of items ranging from 19 to 33, are sometimes reported as lengthy by industry partners [27] and found impractical. In contrast, single-item-per-construct measures, i.e., scales that rely on a single item to measure a construct, might be advantageous for practical user research situations. They allow for quick and efficient data collection and integrate well into iterative evaluations with tight schedules and budgets [14]. Single-item scales have also been reported as offering greater face validity, allowing scores to be more easily interpreted and compared across implementations [12]. They also exhibit less variation in adaptation across populations and contexts, minimize missing or invalid responses, and mitigate participant fatigue [13, 28]. Hence, such short measures are particularly advantageous for studies with repeated measurements (e.g., during game development iterations), in the case of autonomous questionnaire completion (e.g., online or mobile studies), or where PX is only one of many aspects to evaluate (e.g., in the context of serious games, where researchers may also want to assess persuasiveness or pedagogic qualities). However, there are also limitations to single-item scales. They capture less information than multi-item measures, which can be particularly problematic when assessing complex and ambiguous constructs [20], such as PX dimensions like flow or immersion. In addition, the absence of an internal reliability assessment and the inability to distinguish between explained and unexplained variance restricts their use for more elaborate statistical modeling.
2.2 The miniPXI as a single-item variant of the PXI questionnaire
Most recently, the miniPXI was developed, validated, and put forward as a short version of the PXI [14]. The full PXI is a validated questionnaire that measures eleven constructs (see Table 1). Five constructs sit at the level of functional consequences, focusing on immediate, tangible outcomes resulting from game design choices. Five constructs sit at the level of psychosocial consequences, exploring emotional experiences as second-order responses to game design choices. A validation study of the miniPXI provided nuanced results; reliability estimates for the PXI constructs varied, and the authors could only confirm validity for nine out of eleven constructs. Hence, the reliability and validity results indicated that the short version of the questionnaire did not perform at the same level as the full version. This validation study was also carried out in a somewhat artificial setting, asking participants to evaluate the player experience either through delayed recall of game experiences or via the evaluation of fully developed games [14]. Hence, the findings may still differ in a context in which PX is evaluated immediately after gameplay, or in a game development context, specifically during the early stages, when games are still early prototypes.
Accordingly, the question arises whether the benets of single-
item measures truly extend to the PX domain, and a better under-
standing is needed of the trade-o between practical usage benets
and scientic limitations.
3 METHODOLOGY
The PX evaluation took place as part of a university course on game design and development, in which two playtesting sessions took place about six weeks apart. The initial playtesting session featured early prototypes of the games, while the subsequent playtesting session incorporated refined prototypes that had been improved based on the feedback received from the evaluations in the first iteration.
3.1 Participants
The sample for this study consisted of 16 undergraduates enrolled in the class, with ages ranging from 21 to 32 and a median age of 24 years. Of the participants, 90.5% (n = 19) self-identified as male and 9.5% (n = 2) as female. The data analyzed for this study were gathered over the duration of a semester during two playtesting sessions. The research protocol was authorized by the institutional ethics committee, ensuring adherence to ethical standards and guidelines.

Most of the players (31.3%, n = 5) indicated that they play games between 5 and 10 hours per week. 18% (n = 3) of the participants fell into each of the three groups: 1 to 2 hours per week, 2 to 5 hours per week, and 10 to 20 hours per week. Only 12.5% (n = 2) of the participants mentioned that they play games for more than 20 hours per week. Participants also self-rated their gaming expertise on a 7-point scale from novice (1) to expert (7). The majority of participants (43.8%, n = 7) considered themselves to be at an intermediate level with a rating of 6, followed by 31.3% (n = 5) who rated themselves as moderately skilled with a rating of 5. A smaller proportion of participants (18.8%, n = 3) identified themselves as novices with a rating of 4. Only one participant (6.3%) rated themselves at the highest level of expertise with a rating of 7.
3.2 Measures
The miniPXI scale, consisting of eleven items, was used to measure eleven different constructs related to PX [14]. In addition to the miniPXI scale, the Net Promoter Score (NPS) [23] was added as a manipulation check. The NPS is a single-item metric stemming from market research that measures customer loyalty, satisfaction, and enthusiasm via the item "How likely are you to recommend this [product or service] to a friend or colleague?". It is the single-metric measure most commonly used by industry to assess user experience [3]. The item is scored on a scale of 1 (not at all) to 10 (highly likely). Further interpretation of the scoring can be carried out to segment users into promoters versus detractors and to estimate overall product growth, yet this segmentation is beyond the scope of this paper.
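For readers unfamiliar with how NPS ratings are conventionally aggregated, the following minimal R sketch illustrates one common calculation; the promoter (ratings of 9–10) and detractor (ratings of 6 or below) cut-offs are the customary market research convention and are an assumption here, since in this study only the raw 1–10 rating is analyzed.

# Conventional NPS aggregation (sketch); cut-offs are assumed, not taken from this paper
compute_nps <- function(ratings) {
  promoters  <- mean(ratings >= 9)   # share of respondents rating 9-10
  detractors <- mean(ratings <= 6)   # share of respondents rating 6 or lower
  100 * (promoters - detractors)     # NPS ranges from -100 to +100
}

compute_nps(c(9, 10, 7, 6, 8, 9, 4, 10))  # hypothetical ratings from one session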
3.3 Games
For this study, the rst session of playtesting and PX evaluation
included 15 games, created by undergraduate students, spanning
a variety of genres. Only eleven games were further developed,
and available for the second session. Two of these eleven games
were further excluded from the nal analysis because they lacked a
sucient number of players in both sessions (
𝑛<
3). Consequently,
nine games were included in the study’s nal sample. Collectively,
the individual nine games developed represent the dierent and
varying genres (see Figure 1) that typify the heterogeneity of the
game domain, and can be expected to dier on the dierent con-
structs of the miniPXI. The genres and number of players for two
iterations of the nine games analyzed were as follows:
Game #1 (Dungeon Crawler, n = 4)
Game #2 (Turn-based Strategy, n = 4)
Game #3 (Racing game, n = 3)
Game #4 (Management simulation, n = 3)
Game #5 (Cooperative puzzle game, n = 3)
Game #6 (Roguelite FPS deck builder, n = 5)
Game #7 (Platformer, n = 4)
Game #8 (Educational puzzle game, n = 5)
Game #9 (Platformer, n = 3)

Figure 1: Screenshots of the nine games developed, representing different genres.
3.4 Procedure
Students were randomly assigned to multiple games to playtest, with the assignment of games being the same in both sessions. The duration for testing a game was 20 minutes at maximum. After playing a game, participants were asked to fill out the questionnaires online, hosted on the Qualtrics survey platform. The survey included basic demographic questions (age, gender) and gaming experience, as well as the miniPXI and NPS (cf. Section 3.2).
3.5 Statistical Analysis
Linear Mixed Eect Modeling (LMEM) was used due to the nested
and unbalanced data, as there were varying numbers of players per
game across the iterations. The LMEM model was tted using the re-
stricted maximum likelihood estimation (REML) method, developed
in R(version 4.0.5) [
26
] using the lme4 package [
4
]. Additionally,
the lmerTest package [
19
] was used to obtain
𝑝
-values.
1
Each of the
miniPXI constructs was examined to determine whether there were
dierences in construct ratings across game iterations (RQ1) and
game genres (RQ2). For RQ1, the LMEMs were specied with the
miniPXI construct rating as the dependent variable, including ran-
dom eects for the player and the game genres, while the iteration
was added as a xed eect. For RQ2, similar to RQ1, the miniPXI
construct rating was used as the dependent variable, the player as a
random eect, and the game genres and iteration as xed eects. In
the case of the Net Promoter Score (NPS), the score rating ranging
from 1 to 10 was used as the dependent variable in the Linear Mixed
Eects Models (LMEMs) for both research questions. To assess the
signicance of the eects, Mixed Model ANOVA tables were gener-
ated using likelihood ratio tests [
10
]. Additionally, eect sizes are
reported in terms of conditional
𝑅2
𝑐
to measure the magnitude of
the observed eects.
4 RESULTS
The miniPXI found improvements for the dierent prototype itera-
tions, and noticeable dierences across the dierent games genres,
see gure 2. Below we discuss the results in more detail.
4.1 RQ1: Does the miniPXI eectively capture
and discern dierences, via immediate
recall of PX, across dierent iterations in
prototype development?
Dierences in construct ratings were observed between the two
iterations of prototype development for all miniPXI constructs,
see Figure 2and Table 1, RQ1, columns “Means - Iteration 1” and
“Means - Iteration 2”.
Regarding functional consequences, these observed dierences
reached the level of signicance at the composite level (
𝛽=
0
.
52
, 𝑆𝐸 =
0
.
157
, 𝑋 2(
1
,
75
)=
9
.
704
, 𝑝 =.
002) with eect size
𝑅2
𝑐
=
0
.
612. More
specically, signicant improvements were observed on the con-
structs of Ease of Control (EC) (
𝛽=
0
.
84
, 𝑆𝐸 =
0
.
272
, 𝑋 2(
1
,
75
)=
8
.
577
, 𝑝 =.
003) with eect size
𝑅2
𝑐
=
0
.
313 and Progress Feedback
(PF) (
𝛽=
0
.
68
, 𝑆𝐸 =
0
.
313
, 𝑋 2(
1
,
75
)=
4
.
557
, 𝑝 =.
033) with eect
size 𝑅2
𝑐
=0.368.
Regarding psychosocial consequences, these observed dierences
reached the level of signicance at the composite level (
𝛽=
0
.
41
, 𝑆𝐸 =
0
.
160
, 𝑋 2(
1
,
75
)=
6
.
322
, 𝑝 =.
011) with eect size
𝑅2
𝑐
=
0
.
574.
More specically, signicant improvements were observed on the
1
Detailed model specications and dataset can be found in the supplementary
materials.
CHI PLAY ’23 Companion, October 10–13, 2023, Stratford, ON, Canada Aqeel Haider et al.
Figure 1: Screenshots of the nine games developed, representing dierent genres.
constructs of Curiosity (CUR) (
𝛽=
0
.
46
, 𝑆𝐸 =
0
.
211
, 𝑋 2(
1
,
75
)=
4
.
623
, 𝑝 =.
031
, 𝑅2
𝑐
=
0
.
362) with eect size
𝑅2
𝑐
=
0
.
362, and Meaning
(MEA) (
𝛽=
0
.
65
, 𝑆𝐸 =
0
.
293
, 𝑋 2(
1
,
75
)=
4
.
727
, 𝑝 =.
031
, 𝑅2
𝑐
=
0
.
212)
with eect size 𝑅2
𝑐
=0.212.
Regarding the observed dierences for the construct of Enjoy-
ment, this did not reach the level of signicance (
𝛽=
0
.
29
, 𝑆𝐸 =
0
.
45
, 𝑋 2(
1
,
75
)=
0
.
402
, 𝑝 =.
525). In contrast, for the NPS, the ob-
served dierences across the two iterations did reach signicance
(
𝑏𝑒𝑡 𝑎 =
0
.
75
, 𝑆𝐸 =
0
.
329
, 𝑋 2(
1
,
75
)=
4
.
778
, 𝑝 =.
029) with eect
size 𝑅2
𝑐
=0.352.
4.2 RQ2: Can the miniPXI successfully identify variations, via immediate recall of PX, among different game genres?
Dierences in construct ratings were observed between the game
genres, see Figure 2and Table 1, RQ2. Regarding functional conse-
quences, these observed dierences among the nine genres reached
the level of signicance at the composite level (
𝑋2(
1
)=
25
.
779
, 𝑝 =
.
001) with eect size
𝑅2
𝑐
=
0
.
686. More specically on the constructs
of Audiovisual Appeal (
𝑋2(
8
,
75
)=
27
.
305
, 𝑝 =<.
001) with eect
size
𝑅2
𝑐
=
0
.
532,Challenge (
𝑋2(
8
,
75
)=
22
.
439
, 𝑝 =.
004) with eect
size
𝑅2
𝑐
=
0
.
534,Clarity of Goals (
𝑋2(
8
,
75
)=
20
.
063
, 𝑝 =<.
010)
with eect size
𝑅2
𝑐
=
0
.
494 and Progress Feedback (PF) (
𝑋2(
8
,
75
)=
18.261, 𝑝 =.019) with eect size 𝑅2
𝑐
=0.523.
Regarding psychosocial consequences, these observed dierences
among the nine game genres reached the level of signicance
at the composite level (
𝑋2(
8
,
75
)=
20
.
431
, 𝑝 =.
009) with eect
size
𝑅2
𝑐
=
0
.
694. More specically on the constructs of Autonomy
(
𝑋2(
1
)=
15
.
633
, 𝑝 =.
047) with eect size
𝑅2
𝑐
=
0
.
316,Immersion
(
𝑋2(
8
,
75
)=
16
.
400
, 𝑝 =.
037) with eect size
𝑅2
𝑐
=
0
.
603, and Mas-
tery (𝑋2(8,75)=22.638, 𝑝 =.003) with eect size 𝑅2
𝑐
=0.570.
Similar to the results for RQ1, the observed dierences for the con-
struct of Enjoyment did not reach signicance levels (
𝑋2(
8
,
75
)=
7
.
953
, 𝑝 =.
438). For the NPS, in contrast, the observed dierences
between games did reach signicance (
𝑋
2
(
8
,
75
)=
19
.
377
, 𝑝 =.
029)
with eect size 𝑅2
𝑐
=0.567.
5 DISCUSSION
Our work investigates the capabilities of the miniPXI in capturing the nuances of player experience (PX) and its potential as a useful tool for evaluating game prototypes and identifying differences between different game genres. Here, we discuss the findings related to each research question and highlight the potential implications of utilizing the miniPXI in similar evaluation settings.

Regarding the identification of differences across different iterations of game prototypes (RQ1), the miniPXI measured improvements in construct scores for each of the functional consequences, with two out of five reaching significance. Additionally, the miniPXI also measured improvements for the psychosocial constructs, with the exception of Autonomy. At the composite level of functional and psychosocial consequences, these reflected medium to large effect sizes, as per the guidance by Cohen [9]. Finally, the miniPXI also registered improvements in the Enjoyment construct, yet here, improvements did not reach significance. In contrast, the NPS did report a significant difference across the two iterations. These preliminary findings suggest that the miniPXI can be a valuable tool for iterative playtesting, where there is a need for rapid measurements. Interestingly, improvements in scores are more pronounced for the constructs at the level of functional consequences than for those at the level of psychosocial consequences, as can be seen from the larger effect sizes. Possibly, this can be explained by the fact that prototypes were being evaluated: differences in immediate, tangible improvements likely result in more strongly experienced improvements. Most notably, the Enjoyment construct underperformed when compared to the NPS metric.
Regarding the exploration of dierences across dierent games
genres (RQ2), the miniPXI also demonstrated its capability to detect
variations in ratings on the miniPXI constructs. At the composite
level, both functional and psychosocial consequences reached sig-
nicance and showed nearly identical medium to large eect sizes.
For the individual constructs, seven out of eleven reach signicance,
despite the games being in their initial development phases. This
miniPXI across Game Iterations CHI PLAY ’23 Companion, October 10–13, 2023, Stratford, ON, Canada
Figure 2: Construct ratings for the 9 games across two iterations ( = Iteration 1, = Iteration 2), and the eleven miniPXI
constructs (AA = Audiovisual Appeal, CH = Challenge, EC = Ease of control, CG = Clarity of goals, PF = Progress feedback, AUT
= Autonomy, CUR = Curiosity, IMM = Immersion, MAS = Mastery, MEA = Meaning, ENJ = Enjoyment).
Table 1: Mean, Standard deviations (
𝑆𝐷
), Estimate (
𝛽
) and Standard error (
𝑆𝐸
) of iteration two (RQ1 only), Chi-square tests (
𝑋2
),
degrees of freedom (
𝑑 𝑓
), sample size (
𝑁
), level of signicance (
𝑝
), and conditional
𝑅2
(
𝑅2
𝑐
) for LMEM per construct for RQ1 and
RQ2. Signicant ndings are highlighted in bold.
Construct
RQ1 RQ2
Mean (𝑆𝐷) Likelihood test Likelihood test
Iteration 1 Iteration 2 𝛽 𝑆𝐸 𝑋 2(𝑑 𝑓 , 𝑁 )𝑝 𝑅2
𝑐𝑋2(𝑑 𝑓 , 𝑁 )𝑝 𝑅2
𝑐
Functional Constructs 1.00 (1.00) 1.44 (1.04) 0.52 0.157 9.704 (1, 75) .002 0.612 25.779 (8, 75) .001 0.686
functional
Audiovisual Appeal (AA) 1.96 (1.32) 2.19 (1.02) 0.20 0.197 1.017 (1, 75) .313 0.461 27.305 (8, 75) <.001 0.532
Challenge (CH) 0.33 (1.64) 0.62 (1.65) 0.30 0.257 1.406 (1, 75) .235 0.524 22.439 (8, 75) .004 0.534
Ease of Control (EC) 1.15 (1.46) 1.58 (1.24) 0.84 0.272 8.577 (1, 75) .003 0.313 13.451 (8, 75) .090 0.407
Clarity of Goals (GR) 1.19 (1.84) 1.69 (1.23) 0.57 0.292 3.731 (1, 75) .053 0.371 20.063(8, 75) .010 0.494
Progress Feedback (PF) 0.22 (1.65) 1.00 (1.83) 0.68 0.313 4.557 (1, 75) .033 0.368 18.261 (8, 75) .019 0.523
Psychosocial Constructs 1.37 (0.94) 1.68 (1.02) 0.41 0.160 6.322 (1, 75) .011 0.574 20.431 (8, 75) .009 0.694
psychosocial
Autonomy (AUT) 1.59 (1.19) 1.58 (1.45) 0.35 0.285 1.558 (1, 75) .212 0.213 15.633 (8, 75) .047 0.316
Curiosity (CUR) 1.96 (1.13) 2.31 (0.79) 0.46 0.211 4.623 (1, 75) .031 0.362 11.328 (8, 75) .184 0.560
Immersion (IMM) 1.26 (1.83) 1.73 (1.28) 0.47 0.263 3.219 (1, 75) .072 0.458 16.400 (8, 75) .037 0.603
Mastery (MAS) 1.19 (1.39) 1.31 (1.44) 0.23 0.220 1.095 (1, 75) .295 0.519 22.638 (8, 75) .003 0.570
Meaning (MEA) 0.81 (1.27) 1.58 (1.33) 0.65 0.293 4.727 (1, 75) .031 0.212 09.669 (8, 75) .289 0.381
Enjoyment (ENJ) -0.44 (1.95) -0.08 (2.17) 0.29 0.402 0.402 (1, 75) .525 - 07.953 (8, 75) .438 0.269
NPS 7.22 (1.59) 7.97 (1.69) 0.75 0.329 4.778 (1, 75) .029 0.352 19.377 (8, 75) .013 0.567
capability may be particularly useful in academic settings, which
typically involve the development and PX evaluations of multi-
ple research-based games. Again, we observe that the Enjoyment
item underperforms; it does not reach signicance and reects the
smallest eect size of all constructs. At rst glance it could be hy-
pothesized that these may be explained by Enjoyment being an
umbrella construct for the dierent functional and psychosocial
constructs. One could reason that overall, as an ‘averaged measure’,
it simply reects similar levels of enjoyment for the nine game
CHI PLAY ’23 Companion, October 10–13, 2023, Stratford, ON, Canada Aqeel Haider et al.
genres. However, this is contradicted by the strong performance of
the NPS metric.
Across RQ1 and RQ2, these ndings suggest that the Enjoyment
item, phrased as "I had a good time playing this game" performs
suboptimally, and suggest that GUR need to reect on whether
they truly aim to gauge enjoyment or rather appreciation. Possibly,
players may still appreciate the game at large, despite the lack of a
‘good time’ during game play. This may be particularly relevant for
serious games, that aim to give players pause, or games that provide
hard fun, where frustration is part of the cycle of experiences that
players undergo. Instead of Enjoyment, incorporating NPS into PX
evaluations can assist GUR experts in gaining a comprehensive
understanding of players’ appreciation of the game. Additionally,
given the limitations of short questionnaires in capturing the multi-
faceted nature of PX [
14
], integrating NPS with short questionnaires
has the potential to complement ndings and enhance the validity
and robustness of research outcomes.
6 LIMITATIONS AND FUTURE WORK
The current study has a number of limitations and suggests possible future research directions. First, the overall sample size (16 participants) was small. Using LMEM, inferential statistical analysis on this nested and unbalanced data was carried out. Yet, future studies should aim to include larger sample sizes to improve the rigor with which the feasibility of the miniPXI is investigated. Additionally, future research may investigate the validity of shorter questionnaires by focusing on particular genres, or genres that are closely related, and by formulating hypotheses with respect to certain dimensions of player experience. Finally, incorporating existing validated scales along with the shorter scale can also aid in conducting exhaustive and rigorous validity analyses.

Currently, a comprehensive validity study employing full-scale measures is in progress, and its results will provide valuable insights into the field. Nevertheless, the findings indicate that the miniPXI demonstrates an ability to capture differences, with the exception of Enjoyment (ENJ), showcasing its potential as a tool for measuring player experience in evolving game iterations.
7 CONCLUSION
Based on the results of our preliminary study, the miniPXI questionnaire demonstrated its potential for investigating differences in player experience between iterations of game prototypes and across different game genres. Despite the limited number of participants and the use of simplified game versions, the miniPXI was able to capture variations in player experience at the level of functional and psychosocial consequences, showing medium to large effects. This indicates that the miniPXI is appropriate for evaluating individual game iterations and comparisons across game genres, as it offers time-saving benefits and a minimal evaluation setup. However, our study also showed that the Enjoyment construct performed suboptimally, and suggests that for a more holistic understanding of PX, and when aiming to identify overall player appreciation of a game, it might be more useful to add the Net Promoter Score.
REFERENCES
[1] Vero Vanden Abeele, Katta Spiel, Lennart Nacke, Daniel Johnson, and Kathrin Gerling. 2020. Development and validation of the player experience inventory: A scale to measure player experiences at the level of functional and psychosocial consequences. International Journal of Human-Computer Studies 135 (2020), 102370. https://doi.org/10.1016/j.ijhcs.2019.102370
[2] Ahmad Azadvar and Alessandro Canossa. 2018. UPEQ: Ubisoft Perceived Experience Questionnaire: A Self-Determination Evaluation Tool for Video Games. In Proceedings of the 13th International Conference on the Foundations of Digital Games (FDG '18). Association for Computing Machinery, New York, NY, USA, Article 5, 7 pages. https://doi.org/10.1145/3235765.3235780
[3] Sven Baehre, Michele O'Dwyer, Lisa O'Malley, and Nick Lee. 2022. The use of Net Promoter Score (NPS) to predict sales growth: insights from an empirical investigation. Journal of the Academy of Marketing Science 50, 1 (2022), 67–84. https://doi.org/10.1007/s11747-021-00790-2
[4] Douglas Bates, Martin Mächler, Ben Bolker, and Steve Walker. 2015. Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software 67, 1 (2015), 1–48. https://doi.org/10.18637/jss.v067.i01
[5] Regina Bernhaupt. 2015. Game User Experience Evaluation. Springer, Cham, Switzerland.
[6] Jeanne H. Brockmyer, Christine M. Fox, Kathleen A. Curtiss, Evan McBroom, Kimberly M. Burkhart, and Jacquelyn N. Pidruzny. 2009. The development of the Game Engagement Questionnaire: A measure of engagement in video game-playing. Journal of Experimental Social Psychology 45, 4 (2009), 624–634. https://doi.org/10.1016/j.jesp.2009.02.016
[7] M.-T. Cheng, H.-C. She, and L. A. Annetta. 2015. Game immersion experience: its hierarchical structure and impact on game-based science learning. Journal of Computer Assisted Learning 31, 3 (2015), 232–253. https://doi.org/10.1111/jcal.12066
[8] Sung Hyeon Cheon and Johnmarshall Reeve. 2015. A classroom-based intervention to help teachers decrease students' amotivation. Contemporary Educational Psychology 40 (2015), 99–111. https://doi.org/10.1016/j.cedpsych.2014.06.004
[9] Jacob Cohen. 2013. Statistical Power Analysis for the Behavioral Sciences. Routledge. https://doi.org/10.4324/9780203771587
[10] Ciprian M. Crainiceanu and David Ruppert. 2004. Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66, 1 (2004), 165–185.
[11] Alena Denisova, A. Imran Nordin, and Paul Cairns. 2016. The convergence of player experience questionnaires. In Proceedings of the 2016 Annual Symposium on Computer-Human Interaction in Play. Association for Computing Machinery, New York, NY, USA, 33–37. https://doi.org/10.1145/2967934.2968095
[12] Christyn L. Dolbier, Judith A. Webster, Katherine T. McCalister, Mark W. Mallon, and Mary A. Steinhardt. 2005. Reliability and Validity of a Single-Item Measure of Job Satisfaction. American Journal of Health Promotion 19, 3 (2005), 194–198. https://doi.org/10.4278/0890-1171-19.3.194
[13] Aimee L. Drolet and Donald G. Morrison. 2001. Do We Really Need Multiple-Item Measures in Service Research? Journal of Service Research 3, 3 (2001), 196–204. https://doi.org/10.1177/109467050133001
[14] Aqeel Haider, Casper Harteveld, Daniel Johnson, Max V. Birk, Regan L. Mandryk, Magy Seif El-Nasr, Lennart E. Nacke, Kathrin Gerling, and Vero Vanden Abeele. 2022. MiniPXI: Development and Validation of an Eleven-Item Measure of the Player Experience Inventory. Proc. ACM Hum.-Comput. Interact. 6, CHI PLAY, Article 244 (2022), 26 pages. https://doi.org/10.1145/3549507
[15] Yu-Guan Hsieh, Kimon Antonakopoulos, and Panayotis Mertikopoulos. 2021. Adaptive learning in continuous games: Optimal regret bounds and convergence to Nash equilibrium. In Conference on Learning Theory. PMLR, 2388–2422.
[16] Charlene Jennett, Anna L. Cox, Paul Cairns, Samira Dhoparee, Andrew Epps, Tim Tijs, and Alison Walton. 2008. Measuring and defining the experience of immersion in games. International Journal of Human-Computer Studies 66, 9 (2008), 641–661. https://doi.org/10.1016/j.ijhcs.2008.04.004
[17] Joseph R. Keebler, William J. Shelstad, Dustin C. Smith, Barbara S. Chaparro, and Mikki H. Phan. 2020. Validation of the GUESS-18: A Short Version of the Game User Experience Satisfaction Scale (GUESS). J. Usability Studies 16, 1 (2020), 49–62.
[18] Jeanine Krath, Maximilian Altmeyer, Gustavo F. Tondello, and Lennart E. Nacke. 2023. Hexad-12: Developing and Validating a Short Version of the Gamification User Types Hexad Scale. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23). Association for Computing Machinery, New York, NY, USA, Article 677, 18 pages. https://doi.org/10.1145/3544548.3580968
[19] Alexandra Kuznetsova, Per B. Brockhoff, and Rune H. B. Christensen. 2017. lmerTest Package: Tests in Linear Mixed Effects Models. Journal of Statistical Software 82, 13 (2017), 1–26. https://doi.org/10.18637/jss.v082.i13
[20] Robert Loo. 2002. A caveat on using single-item versus multiple-item scales. Journal of Managerial Psychology 17, 1 (2002), 68–75. https://doi.org/10.1108/02683940210415933
[21] Pejman Mirza-Babaei, Lennart Nacke, Geraldine Fitzpatrick, Gareth White, Graham McAllister, and Nick Collins. 2012. Biometric Storyboards: Visualising Game User Research Data. In CHI '12 Extended Abstracts on Human Factors in Computing Systems (CHI EA '12). Association for Computing Machinery, New York, NY, USA, 2315–2320. https://doi.org/10.1145/2212776.2223795
[22] Mark S. Nagy. 2002. Using a single-item approach to measure facet job satisfaction. Journal of Occupational and Organizational Psychology 75, 1 (2002), 77–86. https://doi.org/10.1348/096317902167658
[23] Frederick F. Reichheld. 2003. The one number you need to grow. Harvard Business Review 81, 12 (2003), 46–55.
[24] Richard M. Ryan, C. Scott Rigby, and Andrew Przybylski. 2006. The Motivational Pull of Video Games: A Self-Determination Theory Approach. Motivation and Emotion 30, 4 (2006), 344–360. https://doi.org/10.1007/s11031-006-9051-8
[25] Jeffrey M. Stanton, Evan F. Sinar, William K. Balzer, Amanda L. Julian, Paul Thoresen, Shahnaz Aziz, Gwenith G. Fisher, and Patricia C. Smith. 2002. Development of a compact measure of job satisfaction: The abridged Job Descriptive Index. Educational and Psychological Measurement 62, 1 (2002), 173–191.
[26] R Core Team. 2013. R: A Language and Environment for Statistical Computing.
[27] Margaret Verkuyl, Naza Djafarova, Paula Mastrilli, and Lynda Atack. 2022. Virtual Gaming Simulation: Evaluating Players' Experiences. Clinical Simulation in Nursing 63 (2022), 16–22. https://doi.org/10.1016/j.ecns.2021.11.002
[28] J. P. Wanous, A. E. Reichers, and M. J. Hudy. 1997. Overall job satisfaction: how good are single-item measures? The Journal of Applied Psychology 82, 2 (1997), 247–252. https://doi.org/10.1037/0021-9010.82.2.247
Received 2023-06-22; accepted 2023-08-03