Preliminary Study of the Performance of the miniPXI when
Measuring Player Experience throughout Game Development
Aqeel Haider
aqeel.haider@kuleuven.be
KU Leuven
Belgium
Günter Wallner
guenter.wallner@jku.at
Johannes Kepler University Linz
Austria
Kathrin Gerling
Karlsruhe Institute of Technology
Germany
kathrin.gerling@kit.edu
Vero Vanden Abeele
KU Leuven
Belgium
vero.vandenabeele@kuleuven.be
ABSTRACT
Short questionnaires, using only a few items for measuring user experience related constructs, have been used in a variety of domains. In the field of games, the miniPXI is such a validated short version of the Player Experience Inventory (PXI), containing 11 single items to measure 11 different PX-related constructs. Previous validations of the miniPXI were carried out in an experimental setting with existing, fully finished games. In this study, we conduct a preliminary investigation of the potential of the miniPXI to evaluate prototypes during game development. We explore differences in PX across two iterations of nine game prototypes, based on input from 16 participants. Findings suggest that the miniPXI is capable of detecting differences between the two prototype versions. In addition, at the level of individual games, the miniPXI is effective at identifying differences in nearly all PX dimensions. However, we also find limited use for the single Enjoyment item, and suggest that including alternative measures such as the Net Promoter Score may be more useful. Overall, this work suggests that the miniPXI has the potential to evaluate different iterations of game prototypes, starting from the earliest stages of game development.
CCS CONCEPTS
• Human-centered computing → User studies.
KEYWORDS
short questionnaire; player experience evaluation; games user research; validity analysis
ACM Reference Format:
Aqeel Haider, Günter Wallner, Kathrin Gerling, and Vero Vanden Abeele. 2023. Preliminary Study of the Performance of the miniPXI when Measuring Player Experience throughout Game Development. In Companion Proceedings of the Annual Symposium on Computer-Human Interaction in Play (CHI PLAY ’23 Companion), October 10–13, 2023, Stratford, ON, Canada. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3573382.3616076
1 INTRODUCTION
Games User Research (GUR) plays a key role in evaluating player experience (PX) in games. Through the collection of empirical data via game evaluations, GUR offers valuable insights for game development, including factors such as the perception of challenge and engagement [5]. In the past decade, the methodologies employed in GUR have undergone rapid advancements, from biometric analysis [21] to advanced machine learning and adaptive AI on the basis of player metrics [15]. However, gathering subjective evaluations of player experience via surveys remains highly relevant. Consequently, questionnaires are a crucial element of GUR, serving as a means of capturing self-reported PX, including aspects such as overall enjoyment, mastery, or immersion [11].
A number of validated questionnaires have been developed for the purpose of evaluating PX [1, 2, 6, 7, 16, 24]. However, these questionnaires contain a large number of items, and participants perceive them as lengthy, which can be a barrier to their widespread adoption and use [14], particularly within an industry environment.
To address participant fatigue and shorten the time needed for GUR evaluations, short or single-item measures have recently emerged in the field of games research, such as the HEXAD-12, a short version of the Gamification User Types Hexad Scale [18], the GUESS-18, a shorter version of the Game User Experience Satisfaction Scale (GUESS) [17], or the miniPXI, an eleven-item measure of the Player Experience Inventory (PXI) [14]. These measures typically use only a few items per player experience construct, which has practical advantages such as quicker response times [22], lower respondent dissatisfaction [25], and fewer data omissions [8]. However, while these measures offer advantages in terms of efficiency and convenience, their reliability and validity in capturing the multifaceted nature of PX require additional investigation [14]. In particular, further research is needed to verify the extent to which limited-item measures can effectively capture the diverse and complex dimensions of PX, and the extent to which they are suitable within iterative game development processes.
We address this issue in the context of the miniPXI, a short version of the PXI [1]. Results of the validation [14] show that the miniPXI can be a valuable tool for PX evaluations where longer measurements are not feasible. However, the validation took place
in a somewhat artificial setting, i.e., through delayed recall of player experiences and/or the evaluation of fully finished games [14]. To date, it is not well understood how the miniPXI performs in a more realistic setting that involves immediate evaluations of intermediate game prototypes, as is typical of game development processes. We aim to close this gap through the work presented here: our study aims to gain insights into the capabilities of the miniPXI in capturing the nuances of PX when relying on immediate recall (versus delayed recall), and into its potential as a tool for evaluating game prototypes (versus finished games) and identifying differences between iterations of the same game, or across game genres.
RQ1: Does the miniPXI effectively identify differences across iterations of game prototypes?
RQ2: Can the miniPXI successfully identify variations between different game genres?
In the current study, a preliminary investigation is conducted involving 16 participants in the evaluation of two iterations of nine games. Through our analysis, we determine that the miniPXI is capable of detecting differences between the two iterations. In addition, at the level of individual games, the miniPXI is effective at identifying differences in nearly all cases. However, we also find limited use for the single Enjoyment item and suggest that alternative measures, such as the Net Promoter Score (NPS) [23], may be more useful.
2 RELATED WORK
In the following section, we review key aspects in the evaluation of player experience via single-item surveys and summarize findings with respect to their effectiveness. Additionally, we zoom in on the miniPXI as a single-item variant.
2.1 Single-item questionnaires for measuring PX
Self-report questionnaires allow players to provide subjective feedback and insights into their personal experiences, thoughts, and emotions while playing a game. Accordingly, a number of validated questionnaires have been created for the purpose of evaluating PX at these levels, among others, the Game Engagement Questionnaire (GEQ) [6] with 19 items, the Player Experience of Need Satisfaction questionnaire (PENS) [24] with 21 items, the Ubisoft Perceived Experience Questionnaire (UPEQ) [2] with 21 items, or the Player Experience Inventory (PXI) [1] with 33 items. Consequently, these scales, with the number of items ranging from 19 to 33, are sometimes reported as lengthy by industry partners [27] and found impractical. In contrast, single-item-per-construct measures, i.e., scales that rely on a single item to measure a construct, might be advantageous for practical user research situations. They allow for quick and efficient data collection and integrate well into iterative evaluations with tight schedules and budgets [14]. Single-item scales have also been reported as offering greater face validity, allowing scores to be more easily interpreted and compared across implementations [12]. They also exhibit less variation in adaptation across populations and contexts, minimize missing or invalid responses, and mitigate participant fatigue [13, 28]. Hence, such short measures are particularly advantageous for studies with repeated measurements (e.g., during game development iterations), in the case of autonomous questionnaire completion (e.g., online or mobile studies), or where PX is only one of many aspects to evaluate (e.g., in the context of serious games, where researchers may also want to assess persuasiveness or pedagogic qualities). However, there are also limitations to single-item scales. They capture less information than multiple-item measures, which can be particularly problematic when assessing complex and ambiguous constructs [20], such as PX dimensions like flow or immersion. In addition, the absence of an internal reliability assessment and the inability to distinguish between explained and unexplained variance restricts their use for more elaborate statistical modeling.
2.2 miniPXI as a single-item variant of the PXI questionnaire
Most recently, the miniPXI was put forward as a short version of the PXI [14]. The full PXI is a validated questionnaire that measures eleven constructs (see Table 1). Five constructs sit at the level of Functional Consequences, focusing on immediate, tangible outcomes resulting from game design choices. Five constructs sit at the level of Psychosocial Consequences, exploring emotional experiences as second-order responses to game design choices. The validation study of the miniPXI provided nuanced results; reliability estimates varied across constructs, and the authors could only confirm validity for nine out of eleven constructs. Hence, the reliability and validity results indicated that the short version of the questionnaire did not perform at the same level as the full version. This validation study was also carried out in a somewhat artificial setting, asking participants to evaluate the player experience either through delayed recall of game experiences or via the evaluation of fully developed games [14]. Hence, the findings may still differ in a context in which PX is evaluated immediately after gameplay, or in a game development context, specifically during the early stages, when games are still early prototypes.
Accordingly, the question arises whether the benefits of single-item measures truly extend to the PX domain, and a better understanding is needed of the trade-off between practical usage benefits and scientific limitations.
3 METHODOLOGY
The PX evaluation took place as part of a university course on game design and development, in which two playtesting sessions were held approximately six weeks apart. The initial playtesting session featured early prototypes of the games, while the subsequent playtesting session incorporated refined prototypes that had been improved based on the feedback received from the evaluations in the first iteration.
3.1 Participants
The sample for this study consisted of 16 undergraduates enrolled in the class, with ages ranging from 21 to 32 years (median age: 24 years). Of the participants, 90.5% (n = 19) self-identified as male and 9.5% (n = 2) as female. The data analyzed for this study were gathered over the duration of a semester during two playtesting sessions. The research protocol was authorized by the institutional ethics committee, ensuring adherence to ethical standards and guidelines.
Most of the players (31.3%, n = 5) indicated that they play games between 5 and 10 hours per week. 18.8% (n = 3) of the participants fell into each of three further groups: 1 to 2 hours per week, 2 to 5 hours per week, and 10 to 20 hours per week. Only 12.5% (n = 2) of the participants mentioned that they played games for more than 20 hours per week. Participants also self-rated their gaming expertise on a 7-point scale from novice (1) to expert (7). The largest group of participants (43.8%, n = 7) rated themselves at 6, followed by 31.3% (n = 5) who rated themselves at 5. A smaller proportion of participants (18.8%, n = 3) rated themselves at 4. Only one participant (6.3%) self-rated at the highest level of expertise with a rating of 7.
3.2 Measures
The miniPXI scale, consisting of eleven items, was utilized to measure eleven different constructs related to PX [14]. In addition to the miniPXI scale, the Net Promoter Score (NPS) [23] was added as a manipulation check. The NPS is a single-item metric stemming from market research, used to measure customer loyalty, satisfaction, and enthusiasm via the item "How likely are you to recommend this [product or service] to a friend or colleague?". It is the most widely used single-item measure of user experience in industry [3]. The item is scored on a scale of 1 (not at all) to 10 (highly likely). Further interpretation of the scoring can be carried out to segment users into promoters versus detractors and to gauge overall product growth, yet this segmentation is beyond the scope of this paper.
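For illustration, the conventional promoter/detractor segmentation mentioned above can be sketched as follows. This is illustrative only: the present study analyzes the raw 1–10 item scores, and the function name and cut-offs are assumptions that simply follow the common NPS convention.

```r
# Illustrative sketch only: the conventional NPS segmentation mentioned
# above, which this study does not apply (only raw item scores are used).
# By convention, ratings of 9-10 count as promoters, 7-8 as passives,
# and 6 or below as detractors; NPS = %promoters - %detractors.
nps_score <- function(ratings) {
  promoters  <- mean(ratings >= 9) * 100   # percentage of promoters
  detractors <- mean(ratings <= 6) * 100   # percentage of detractors
  promoters - detractors                   # ranges from -100 to +100
}

nps_score(c(9, 10, 7, 6, 8, 10))  # 50 - 16.7 = 33.3
```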
3.3 Games
For this study, the first session of playtesting and PX evaluation included 15 games, created by undergraduate students, spanning a variety of genres. Only eleven games were further developed and available for the second session. Two of these eleven games were further excluded from the final analysis because they lacked a sufficient number of players in both sessions (n < 3). Consequently, nine games were included in the study's final sample. Collectively, these nine games represent different and varying genres (see Figure 1) that typify the heterogeneity of the game domain, and can be expected to differ on the different constructs of the miniPXI. The genres and number of players for the two iterations of the nine games analyzed were as follows:
Game #1 (Dungeon Crawler, n = 4)
Game #2 (Turn-based Strategy, n = 4)
Game #3 (Racing game, n = 3)
Game #4 (Management simulation, n = 3)
Game #5 (Cooperative puzzle game, n = 3)
Game #6 (Roguelite FPS deck builder, n = 5)
Game #7 (Platformer, n = 4)
Game #8 (Educational puzzle game, n = 5)
Game #9 (Platformer, n = 3)
3.4 Procedure
Students were randomly assigned multiple games to playtest, with the assignment of games being the same in both sessions. Each game was tested for a maximum of 20 minutes. After playing a game, participants were asked to fill out the questionnaires online, hosted on the Qualtrics survey platform. The survey included basic demographic questions (age, gender) and gaming experience, as well as the miniPXI and the NPS (cf. Section 3.2).
3.5 Statistical Analysis
Linear Mixed Effects Modeling (LMEM) was used due to the nested and unbalanced data, as there were varying numbers of players per game across the iterations. The LMEMs were fitted using the restricted maximum likelihood (REML) method in R (version 4.0.5) [26] using the lme4 package [4]. Additionally, the lmerTest package [19] was used to obtain p-values.¹ Each of the miniPXI constructs was examined to determine whether there were differences in construct ratings across game iterations (RQ1) and game genres (RQ2). For RQ1, the LMEMs were specified with the miniPXI construct rating as the dependent variable, including random effects for the player and the game genre, while the iteration was added as a fixed effect. For RQ2, similar to RQ1, the miniPXI construct rating was used as the dependent variable, the player as a random effect, and the game genre and iteration as fixed effects. In the case of the Net Promoter Score (NPS), the score ranging from 1 to 10 was used as the dependent variable in the LMEMs for both research questions. To assess the significance of the effects, mixed-model ANOVA tables were generated using likelihood ratio tests [10]. Additionally, effect sizes are reported in terms of the conditional R² (R²c) to measure the magnitude of the observed effects.
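To make this setup concrete, the sketch below shows how such models could be specified with lme4 and lmerTest. The data frame px and its column names (rating, iteration, player, genre), as well as the use of MuMIn for the conditional R², are illustrative assumptions on our part; the exact model specifications are provided in the supplementary materials.

```r
# Minimal sketch of the per-construct LMEMs described above. The data frame
# `px` and its columns (rating, iteration, player, genre) are hypothetical
# placeholders; the exact specifications are in the supplementary materials.
library(lme4)      # lmer() for mixed-effects models, fitted with REML
library(lmerTest)  # wraps lmer() so that summary() also reports p-values

# RQ1: iteration as fixed effect; player and genre as random intercepts.
m_rq1 <- lmer(rating ~ iteration + (1 | player) + (1 | genre), data = px)

# RQ2: genre and iteration as fixed effects; player as random intercept.
m_rq2 <- lmer(rating ~ genre + iteration + (1 | player), data = px)

# Likelihood ratio test of the iteration effect (anova() automatically
# refits both models with ML before comparing them).
m_rq1_null <- lmer(rating ~ 1 + (1 | player) + (1 | genre), data = px)
anova(m_rq1_null, m_rq1)  # yields chi-square statistics as in Table 1

# Conditional R^2 (variance explained by fixed plus random effects); one
# common option is MuMIn::r.squaredGLMM(), assumed here for illustration.
MuMIn::r.squaredGLMM(m_rq1)
```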
4 RESULTS
The miniPXI captured improvements across the different prototype iterations, as well as noticeable differences across the different game genres; see Figure 2. Below, we discuss the results in more detail.
4.1 RQ1: Does the miniPXI effectively capture and discern differences, via immediate recall of PX, across different iterations in prototype development?
Differences in construct ratings were observed between the two iterations of prototype development for all miniPXI constructs; see Figure 2 and Table 1 (RQ1, mean columns for Iterations 1 and 2).
Regarding functional consequences, these observed differences reached the level of significance at the composite level (β = 0.52, SE = 0.157, χ²(1, 75) = 9.704, p = .002) with effect size R²c = 0.612. More specifically, significant improvements were observed on the constructs of Ease of Control (EC) (β = 0.84, SE = 0.272, χ²(1, 75) = 8.577, p = .003) with effect size R²c = 0.313, and Progress Feedback (PF) (β = 0.68, SE = 0.313, χ²(1, 75) = 4.557, p = .033) with effect size R²c = 0.368.
Regarding psychosocial consequences, these observed differences reached the level of significance at the composite level (β = 0.41, SE = 0.160, χ²(1, 75) = 6.322, p = .011) with effect size R²c = 0.574.
¹ Detailed model specifications and the dataset can be found in the supplementary materials.
Figure 1: Screenshots of the nine games developed, representing different genres.
More specifically, significant improvements were observed on the constructs of Curiosity (CUR) (β = 0.46, SE = 0.211, χ²(1, 75) = 4.623, p = .031) with effect size R²c = 0.362, and Meaning (MEA) (β = 0.65, SE = 0.293, χ²(1, 75) = 4.727, p = .031) with effect size R²c = 0.212.
Regarding the observed differences for the construct of Enjoyment, these did not reach the level of significance (β = 0.29, SE = 0.45, χ²(1, 75) = 0.402, p = .525). In contrast, for the NPS, the observed differences across the two iterations did reach significance (β = 0.75, SE = 0.329, χ²(1, 75) = 4.778, p = .029) with effect size R²c = 0.352.
4.2 RQ2: Can the miniPXI successfully identify variations, via immediate recall of PX, among different game genres?
Differences in construct ratings were observed between the game genres; see Figure 2 and Table 1, RQ2. Regarding functional consequences, these observed differences among the nine genres reached the level of significance at the composite level (χ²(8, 75) = 25.779, p = .001) with effect size R²c = 0.686. More specifically, significant differences were observed on the constructs of Audiovisual Appeal (χ²(8, 75) = 27.305, p < .001) with effect size R²c = 0.532, Challenge (χ²(8, 75) = 22.439, p = .004) with effect size R²c = 0.534, Clarity of Goals (χ²(8, 75) = 20.063, p = .010) with effect size R²c = 0.494, and Progress Feedback (PF) (χ²(8, 75) = 18.261, p = .019) with effect size R²c = 0.523.
Regarding psychosocial consequences, these observed differences among the nine game genres reached the level of significance at the composite level (χ²(8, 75) = 20.431, p = .009) with effect size R²c = 0.694. More specifically, significant differences were observed on the constructs of Autonomy (χ²(8, 75) = 15.633, p = .047) with effect size R²c = 0.316, Immersion (χ²(8, 75) = 16.400, p = .037) with effect size R²c = 0.603, and Mastery (χ²(8, 75) = 22.638, p = .003) with effect size R²c = 0.570. Similar to the results for RQ1, the observed differences for the construct of Enjoyment did not reach significance (χ²(8, 75) = 7.953, p = .438). For the NPS, in contrast, the observed differences between games did reach significance (χ²(8, 75) = 19.377, p = .029) with effect size R²c = 0.567.
5 DISCUSSION
Our work investigates the capabilities of the miniPXI in capturing the nuances of player experience (PX) and its potential as a useful tool for evaluating game prototypes and identifying differences between different game genres. Here, we discuss the findings related to each research question and highlight the potential implications of utilizing the miniPXI in similar evaluation settings.
Regarding the identification of differences across different iterations of game prototypes (RQ1), the miniPXI measured improvements in construct scores for each of the functional consequences, with two out of five reaching significance. Additionally, the miniPXI also measured improvements for the psychosocial constructs, with the exception of Autonomy. At the composite level of functional and psychosocial consequences, these reflected medium to large effect sizes, as per the guidance by Cohen [9]. Finally, the miniPXI also registered improvements in the Enjoyment construct, yet here, improvements did not reach significance. In contrast, the NPS did show a significant difference across the two iterations. These preliminary findings suggest that the miniPXI can be a valuable tool for iterative playtesting, where there is a need for rapid measurements. Interestingly, improvements in scores are more pronounced for the constructs at the level of functional consequences than for those at the level of psychosocial consequences, as witnessed by the larger effect sizes. Possibly, this can be explained by the nature of the prototypes being evaluated: immediate, tangible improvements between iterations likely result in more strongly experienced improvements. Most notably, the Enjoyment construct underperformed when compared to the NPS metric.
Regarding the exploration of differences across different game genres (RQ2), the miniPXI likewise demonstrated its capability to detect variations in ratings on the miniPXI constructs. At the composite level, both functional and psychosocial consequences reached significance and showed nearly identical medium to large effect sizes. For the individual constructs, seven out of eleven reached significance, despite the games being in their initial development phases.
Figure 2: Construct ratings for the nine games across two iterations (Iteration 1 vs. Iteration 2), and the eleven miniPXI constructs (AA = Audiovisual Appeal, CH = Challenge, EC = Ease of Control, CG = Clarity of Goals, PF = Progress Feedback, AUT = Autonomy, CUR = Curiosity, IMM = Immersion, MAS = Mastery, MEA = Meaning, ENJ = Enjoyment).
Table 1: Mean and standard deviation (SD) per iteration, estimate (β) and standard error (SE) of iteration two (RQ1 only), chi-square tests (χ²) with degrees of freedom (df) and sample size (N), level of significance (p), and conditional R² (R²c) of the LMEMs per construct for RQ1 and RQ2.

| Construct | Iter. 1 Mean (SD) | Iter. 2 Mean (SD) | β | SE | RQ1 χ²(df, N) | RQ1 p | RQ1 R²c | RQ2 χ²(df, N) | RQ2 p | RQ2 R²c |
| Functional Constructs | 1.00 (1.00) | 1.44 (1.04) | 0.52 | 0.157 | 9.704 (1, 75) | .002 | 0.612 | 25.779 (8, 75) | .001 | 0.686 |
| Audiovisual Appeal (AA) | 1.96 (1.32) | 2.19 (1.02) | 0.20 | 0.197 | 1.017 (1, 75) | .313 | 0.461 | 27.305 (8, 75) | <.001 | 0.532 |
| Challenge (CH) | 0.33 (1.64) | 0.62 (1.65) | 0.30 | 0.257 | 1.406 (1, 75) | .235 | 0.524 | 22.439 (8, 75) | .004 | 0.534 |
| Ease of Control (EC) | 1.15 (1.46) | 1.58 (1.24) | 0.84 | 0.272 | 8.577 (1, 75) | .003 | 0.313 | 13.451 (8, 75) | .090 | 0.407 |
| Clarity of Goals (CG) | 1.19 (1.84) | 1.69 (1.23) | 0.57 | 0.292 | 3.731 (1, 75) | .053 | 0.371 | 20.063 (8, 75) | .010 | 0.494 |
| Progress Feedback (PF) | 0.22 (1.65) | 1.00 (1.83) | 0.68 | 0.313 | 4.557 (1, 75) | .033 | 0.368 | 18.261 (8, 75) | .019 | 0.523 |
| Psychosocial Constructs | 1.37 (0.94) | 1.68 (1.02) | 0.41 | 0.160 | 6.322 (1, 75) | .011 | 0.574 | 20.431 (8, 75) | .009 | 0.694 |
| Autonomy (AUT) | 1.59 (1.19) | 1.58 (1.45) | 0.35 | 0.285 | 1.558 (1, 75) | .212 | 0.213 | 15.633 (8, 75) | .047 | 0.316 |
| Curiosity (CUR) | 1.96 (1.13) | 2.31 (0.79) | 0.46 | 0.211 | 4.623 (1, 75) | .031 | 0.362 | 11.328 (8, 75) | .184 | 0.560 |
| Immersion (IMM) | 1.26 (1.83) | 1.73 (1.28) | 0.47 | 0.263 | 3.219 (1, 75) | .072 | 0.458 | 16.400 (8, 75) | .037 | 0.603 |
| Mastery (MAS) | 1.19 (1.39) | 1.31 (1.44) | 0.23 | 0.220 | 1.095 (1, 75) | .295 | 0.519 | 22.638 (8, 75) | .003 | 0.570 |
| Meaning (MEA) | 0.81 (1.27) | 1.58 (1.33) | 0.65 | 0.293 | 4.727 (1, 75) | .031 | 0.212 | 9.669 (8, 75) | .289 | 0.381 |
| Enjoyment (ENJ) | -0.44 (1.95) | -0.08 (2.17) | 0.29 | 0.402 | 0.402 (1, 75) | .525 | – | 7.953 (8, 75) | .438 | 0.269 |
| NPS | 7.22 (1.59) | 7.97 (1.69) | 0.75 | 0.329 | 4.778 (1, 75) | .029 | 0.352 | 19.377 (8, 75) | .013 | 0.567 |
This capability may be particularly useful in academic settings, which typically involve the development and PX evaluation of multiple research-based games. Again, we observe that the Enjoyment item underperforms; it does not reach significance and reflects the smallest effect size of all constructs. At first glance, it could be hypothesized that this may be explained by Enjoyment being an umbrella construct for the different functional and psychosocial constructs. One could reason that overall, as an ‘averaged measure’, it simply reflects similar levels of enjoyment for the nine game
genres. However, this is contradicted by the strong performance of
the NPS metric.
Across RQ1 and RQ2, these findings suggest that the Enjoyment item, phrased as "I had a good time playing this game", performs suboptimally, and suggest that GUR experts need to reflect on whether they truly aim to gauge enjoyment or rather appreciation. Possibly, players may still appreciate the game at large, despite the lack of a ‘good time’ during gameplay. This may be particularly relevant for serious games that aim to give players pause, or games that provide hard fun, where frustration is part of the cycle of experiences that players undergo. Instead of Enjoyment, incorporating the NPS into PX evaluations can assist GUR experts in gaining a comprehensive understanding of players’ appreciation of the game. Additionally, given the limitations of short questionnaires in capturing the multifaceted nature of PX [14], integrating the NPS with short questionnaires has the potential to complement findings and enhance the validity and robustness of research outcomes.
6 LIMITATIONS AND FUTURE WORK
The current study has a number of limitations and suggests possible future research directions. First, the overall sample size (16 participants) was small. Using LMEM, inferential statistical analysis of this nested and unbalanced data was carried out. Yet, future studies should aim to include larger sample sizes to improve the rigor with which the feasibility of the miniPXI is investigated. Additionally, future research may investigate the validity of shorter questionnaires by focusing on particular genres, or genres that are closely related, and by formulating hypotheses with respect to certain dimensions of player experience. Finally, incorporating existing validated scales alongside the shorter scale can also aid in conducting exhaustive and rigorous validity analyses.
Currently, a comprehensive validity study employing full-scale measures is in progress, and its results will provide valuable insights into the field. Nevertheless, the findings indicate that the miniPXI demonstrates an ability to capture differences, with the exception of the Enjoyment item, showcasing its potential as a tool for measuring player experience across evolving game iterations.
7 CONCLUSION
Based on the results of our preliminary study, the miniPXI questionnaire demonstrated its potential for investigating differences in player experience between iterations of game prototypes and across different game genres. Despite the limited number of participants and the use of simplified game versions, the miniPXI was able to capture variations in player experience at the level of functional and psychosocial consequences, showing medium to large effects. This indicates that the miniPXI is appropriate for evaluating individual game iterations and comparing across game genres, as it offers time-saving benefits and a minimal evaluation setup. However, our study also showed that the Enjoyment construct performed suboptimally, suggesting that for a more holistic understanding of PX, and when aiming to identify overall player appreciation of a game, it might be more useful to add the Net Promoter Score.
REFERENCES
[1] Vero Vanden Abeele, Katta Spiel, Lennart Nacke, Daniel Johnson, and Kathrin Gerling. 2020. Development and validation of the player experience inventory: A scale to measure player experiences at the level of functional and psychosocial consequences. International Journal of Human-Computer Studies 135 (2020), 102370. https://doi.org/10.1016/j.ijhcs.2019.102370
[2] Ahmad Azadvar and Alessandro Canossa. 2018. UPEQ: Ubisoft Perceived Experience Questionnaire: A Self-Determination Evaluation Tool for Video Games. In Proceedings of the 13th International Conference on the Foundations of Digital Games (FDG ’18). Association for Computing Machinery, New York, NY, USA, Article 5, 7 pages. https://doi.org/10.1145/3235765.3235780
[3] Sven Baehre, Michele O’Dwyer, Lisa O’Malley, and Nick Lee. 2022. The use of Net Promoter Score (NPS) to predict sales growth: insights from an empirical investigation. Journal of the Academy of Marketing Science 50, 1 (2022), 67–84. https://doi.org/10.1007/s11747-021-00790-2
[4] Douglas Bates, Martin Mächler, Ben Bolker, and Steve Walker. 2015. Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software 67, 1 (2015), 1–48. https://doi.org/10.18637/jss.v067.i01
[5] Regina Bernhaupt. 2015. Game User Experience Evaluation. Springer, Cham, Switzerland.
[6] Jeanne H. Brockmyer, Christine M. Fox, Kathleen A. Curtiss, Evan McBroom, Kimberly M. Burkhart, and Jacquelyn N. Pidruzny. 2009. The development of the Game Engagement Questionnaire: A measure of engagement in video game-playing. Journal of Experimental Social Psychology 45, 4 (2009), 624–634. https://doi.org/10.1016/j.jesp.2009.02.016
[7] M.-T. Cheng, H.-C. She, and L. A. Annetta. 2015. Game immersion experience: its hierarchical structure and impact on game-based science learning. Journal of Computer Assisted Learning 31, 3 (2015), 232–253. https://doi.org/10.1111/jcal.12066
[8] Sung Hyeon Cheon and Johnmarshall Reeve. 2015. A classroom-based intervention to help teachers decrease students’ amotivation. Contemporary Educational Psychology 40 (2015), 99–111. https://doi.org/10.1016/j.cedpsych.2014.06.004
[9] Jacob Cohen. 2013. Statistical Power Analysis for the Behavioral Sciences. Routledge. https://doi.org/10.4324/9780203771587
[10] Ciprian M. Crainiceanu and David Ruppert. 2004. Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66, 1 (2004), 165–185.
[11] Alena Denisova, A. Imran Nordin, and Paul Cairns. 2016. The convergence of player experience questionnaires. In Proceedings of the 2016 Annual Symposium on Computer-Human Interaction in Play. Association for Computing Machinery, New York, NY, USA, 33–37. https://doi.org/10.1145/2967934.2968095
[12] Christyn L. Dolbier, Judith A. Webster, Katherine T. McCalister, Mark W. Mallon, and Mary A. Steinhardt. 2005. Reliability and Validity of a Single-Item Measure of Job Satisfaction. American Journal of Health Promotion 19, 3 (2005), 194–198. https://doi.org/10.4278/0890-1171-19.3.194
[13] Aimee L. Drolet and Donald G. Morrison. 2001. Do We Really Need Multiple-Item Measures in Service Research? Journal of Service Research 3, 3 (2001), 196–204. https://doi.org/10.1177/109467050133001
[14] Aqeel Haider, Casper Harteveld, Daniel Johnson, Max V. Birk, Regan L. Mandryk, Magy Seif El-Nasr, Lennart E. Nacke, Kathrin Gerling, and Vero Vanden Abeele. 2022. MiniPXI: Development and Validation of an Eleven-Item Measure of the Player Experience Inventory. Proc. ACM Hum.-Comput. Interact. 6, CHI PLAY, Article 244 (2022), 26 pages. https://doi.org/10.1145/3549507
[15] Yu-Guan Hsieh, Kimon Antonakopoulos, and Panayotis Mertikopoulos. 2021. Adaptive learning in continuous games: Optimal regret bounds and convergence to Nash equilibrium. In Conference on Learning Theory. PMLR, 2388–2422.
[16] Charlene Jennett, Anna L. Cox, Paul Cairns, Samira Dhoparee, Andrew Epps, Tim Tijs, and Alison Walton. 2008. Measuring and defining the experience of immersion in games. International Journal of Human-Computer Studies 66, 9 (2008), 641–661. https://doi.org/10.1016/j.ijhcs.2008.04.004
[17] Joseph R. Keebler, William J. Shelstad, Dustin C. Smith, Barbara S. Chaparro, and Mikki H. Phan. 2020. Validation of the GUESS-18: A Short Version of the Game User Experience Satisfaction Scale (GUESS). Journal of Usability Studies 16, 1 (2020), 49–62.
[18] Jeanine Krath, Maximilian Altmeyer, Gustavo F. Tondello, and Lennart E. Nacke. 2023. Hexad-12: Developing and Validating a Short Version of the Gamification User Types Hexad Scale. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23). Association for Computing Machinery, New York, NY, USA, Article 677, 18 pages. https://doi.org/10.1145/3544548.3580968
[19] Alexandra Kuznetsova, Per B. Brockhoff, and Rune H. B. Christensen. 2017. lmerTest Package: Tests in Linear Mixed Effects Models. Journal of Statistical Software 82, 13 (2017), 1–26. https://doi.org/10.18637/jss.v082.i13
[20] Robert Loo. 2002. A caveat on using single-item versus multiple-item scales. Journal of Managerial Psychology 17, 1 (2002), 68–75. https://doi.org/10.1108/02683940210415933
[21] Pejman Mirza-Babaei, Lennart Nacke, Geraldine Fitzpatrick, Gareth White, Graham McAllister, and Nick Collins. 2012. Biometric Storyboards: Visualising Game User Research Data. In CHI ’12 Extended Abstracts on Human Factors in Computing Systems (CHI EA ’12). Association for Computing Machinery, New York, NY, USA, 2315–2320. https://doi.org/10.1145/2212776.2223795
[22] Mark S. Nagy. 2002. Using a single-item approach to measure facet job satisfaction. Journal of Occupational and Organizational Psychology 75, 1 (2002), 77–86. https://doi.org/10.1348/096317902167658
[23] Frederick F. Reichheld. 2003. The one number you need to grow. Harvard Business Review 81, 12 (2003), 46–55.
[24] Richard M. Ryan, C. Scott Rigby, and Andrew Przybylski. 2006. The Motivational Pull of Video Games: A Self-Determination Theory Approach. Motivation and Emotion 30, 4 (2006), 344–360. https://doi.org/10.1007/s11031-006-9051-8
[25] Jeffrey M. Stanton, Evan F. Sinar, William K. Balzer, Amanda L. Julian, Paul Thoresen, Shahnaz Aziz, Gwenith G. Fisher, and Patricia C. Smith. 2002. Development of a compact measure of job satisfaction: The abridged Job Descriptive Index. Educational and Psychological Measurement 62, 1 (2002), 173–191.
[26] R Core Team. 2013. R: A language and environment for statistical computing.
[27] Margaret Verkuyl, Naza Djafarova, Paula Mastrilli, and Lynda Atack. 2022. Virtual Gaming Simulation: Evaluating Players’ Experiences. Clinical Simulation in Nursing 63 (2022), 16–22. https://doi.org/10.1016/j.ecns.2021.11.002
[28] J. P. Wanous, A. E. Reichers, and M. J. Hudy. 1997. Overall job satisfaction: how good are single-item measures? The Journal of Applied Psychology 82, 2 (1997), 247–252. https://doi.org/10.1037/0021-9010.82.2.247
Received 2023-06-22; accepted 2023-08-03