The Future Value of Serious Games for Assessment: Where Do We Go Now?

Sebastiaan de Klerk1* and Pamela M. Kato2

1University of Twente/eX:plain, Enschede, Netherlands; s.dklerk@explain.nl
2Coventry University, Coventry, United Kingdom
*Author for correspondence

Abstract

Game-based assessments will most likely be an increasing part of testing programs in future generations because they provide promising possibilities for more valid and reliable measurement of students' skills as compared to traditional methods of assessment like paper-and-pencil tests or performance-based assessments. The current status of serious games for assessment has been highlighted from several angles in the previous articles of this special issue. Here, we synthesize the findings from the individual papers to demonstrate how to best benefit from the advantages of game-based assessment and how to address the many challenges that still remain. In the first part we discuss once more how the advantages of game-based assessment can play a role in future testing, and in the second part we address one of the most daring challenges: the psychometrics behind the game. In a short conclusion section we discuss how research and practice should shape a future generation of game-based assessment.

Keywords: Evidence-Centered Design, Game-Based Assessment, Psychometrics, Serious Game, Training and Assessment

Journal of Applied Testing Technology, Vol 18(S1), 32-37, 2017
1. Using Serious Games for Assessment
Using serious games as a tool for assessment may both expand and strengthen the domain of assessment. It is hypothesized that the domain may be expanded because serious games have the potential to reveal Knowledge, Skills, and Attributes (KSAs) of students that are "invisible" or hard to detect when assessed with more traditional assessment methods. They may strengthen the domain because the measurement of particular KSAs can be improved (i.e., increased validity and reliability) through the use of technology in serious games as compared to traditional assessments like paper-and-pencil tests or performance-based assessments (Iseli, Koenig, Lee, & Wainess, 2010; Levy, 2013; De Klerk, Eggen, & Veldkamp, 2014; Mislevy et al., 2014).
An example of how serious games can expand the domain of assessment can be found in a game-based assessment developed by CRESST (Iseli, Koenig, Lee, & Wainess, 2010). This game-based assessment was used to assess marine personnel's reactions to emergency situations. In the game, the player (i.e., the test taker) was required to react, as a virtual character, to a variety of emergencies that could occur on a marine vessel (e.g., a fire). Through an interactive interface, the marine personnel could indicate which actions they thought should be taken and which priorities should be set to achieve goals in the virtual emergency situations. All of their (re)actions in the serious game were recorded and analyzed in log files. This game-based assessment technology, and others like it, confronts test takers with a larger number of more realistic situations in which to measure behaviors that represent reactions, decisions, planning, and prioritizing than would be possible in, for example, a practical performance-based assessment. Serious games thus have the capability to present users with an expanded set of situations and contexts in which an expanded set of behaviors and constructs can be assessed compared to traditional self-report methods of assessment.
An example of how serious games can strengthen
assessments can be seen in a game developed for
formative assessment by WestEd, called SimScientists (Quellmalz, Timms, Silberglitt, & Buckley, 2012). This game-based assessment was built for students around the age of 12 and comprises a virtual environment in which students can engage in science tasks. For example, students are presented with an animation of the ocean and are required to draw a "food web" between several animals and plants by drawing arrows that connect them. Assessment is strengthened by the fact that not only the final food web is logged (i.e., the product data) but also process data like reaction times, navigation paths through the game, and the intermediate steps taken to arrive at the final product. This new information may have value with regard to the inferences made about the KSAs of students and can, for example, also be evaluated as diagnostic evidence to identify students' misconceptions. The articles in this special issue demonstrate that both effects, expansion and strengthening of measurement, can be achieved when serious game design principles and assessment design principles are combined.
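To make the product-process distinction concrete, the sketch below shows what a logged food-web task might look like. The field names and values are our own hypothetical illustration, not SimScientists' actual logging format; note how the process data capture an intermediate correction (an arrow deleted and redrawn) that is invisible in the final product.

    # Hypothetical log fragment for one food-web task (illustrative only;
    # not the actual SimScientists logging format).

    # Product data: the final state that a traditional test would score
    product_data = {
        "final_food_web": [("algae", "fish"), ("fish", "shark")],
        "correct": True,
    }

    # Process data: time-stamped actions showing how the product was
    # reached, including an intermediate misconception (arrow drawn the
    # wrong way, then corrected)
    process_data = [
        {"t": 2.4,  "action": "draw_arrow",   "from": "algae", "to": "fish"},
        {"t": 6.1,  "action": "draw_arrow",   "from": "shark", "to": "fish"},
        {"t": 9.8,  "action": "delete_arrow", "from": "shark", "to": "fish"},
        {"t": 12.3, "action": "draw_arrow",   "from": "fish",  "to": "shark"},
    ]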
There is much enthusiasm in the field of education about game-based assessment (Mislevy et al., 2014) because the current methods of assessment do not seem to fully have the power to measure all aspects of students' KSAs (De Klerk, Eggen, & Veldkamp, 2014). The results of standardized tests are, to an increasing degree, used not only to evaluate students but also to evaluate how successfully schools teach their students (Nelson, Nugent, & Rupp, 2012). Yet, if we cannot even be sure that standardized tests provide the most valid and reliable indicators of students' KSAs, then how can we see those indicators as valid and reliable for evaluating schools? As the research in this special issue shows, there might be an important role for game-based assessment in filling the reliability and validity gap created by the strong emphasis on standardized testing in education. Considering the strong improvements in statistical methodology and technology over recent years, now may be the time to capitalize on the full potential game-based assessment may provide. The articles presented in this special issue provide valuable insights regarding this potential.
With regard to statistical methodology for game-based assessment, there is much promise in the field of Educational Data Mining (EDM) (Nelson, Nugent, & Rupp, 2012; Rupp, Levy, DiCerbo et al., 2012). EDM is concerned with finding meaningful relationships in the big data logged by educational applications. Several techniques and (statistical) methodologies are grouped under the broader concept of EDM. For example, cluster analysis can be used to find clusters of data, network analysis is concerned with identifying how data elements are connected, and regression trees can be used to predict future performance (Kerr & Chung, 2012; Gobert, Sao Pedro, Baker, Toto, & Montalvo, 2012; Mislevy, Behrens, DiCerbo, & Levy, 2012). EDM techniques are for the most part exploratory techniques that are applied as a first step to find meaningful patterns in the data.
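For instance, a cluster analysis over features extracted from log files might look like the following minimal sketch; the feature set and data are invented for illustration, and scikit-learn's KMeans is just one of many possible clustering tools.

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical feature matrix extracted from game log files:
    # one row per student; columns are mean reaction time (s),
    # number of navigation steps, and proportion of correct actions.
    log_features = np.array([
        [2.1, 14, 0.90],
        [5.8, 40, 0.40],
        [2.4, 16, 0.85],
        [6.2, 35, 0.50],
    ])

    # Exploratory EDM step: group students with similar play patterns
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(log_features)
    print(kmeans.labels_)  # cluster membership for each student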
Advances have also been made in confirmatory statistical techniques. For example, Bayesian network methods are often used to make probabilistic statements about the knowledge, skills, and abilities of students based on their performance in a game-based assessment (De Klerk et al., 2015; Mislevy et al., 2014).
A Bayesian Network is a graphical structure for reasoning under uncertainty and is based on a Bayesian psychometric modeling framework (Pearl, 1988). The network consists of nodes (observable and latent variables) that are connected through arcs (arrows) which depict conditional probabilistic relationships between the variables (Neapolitan, 2003). The observable variables can be students' actions in a serious game, while the latent variables are the KSAs that are the subject of measurement. Students' actions influence the state of the KSAs through the conditional probabilities that are defined in the network. At first, the conditional probabilities can be based on subject matter expert input, and they can later be estimated from data. In that way, a Bayesian Network can be updated and improved continuously.
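As a hedged illustration of this idea, the sketch below wires one latent KSA node to two observable action nodes using the pgmpy library. The network structure, the expert-elicited probabilities, and the observed evidence are all invented for the example, and pgmpy's class names have shifted across versions, so treat the imports as an assumption rather than a fixed recipe.

    from pgmpy.models import BayesianNetwork
    from pgmpy.factors.discrete import TabularCPD
    from pgmpy.inference import VariableElimination

    # One latent KSA node ("skill") influencing two observable actions
    model = BayesianNetwork([("skill", "action1"), ("skill", "action2")])

    # Prior on the latent skill: 0 = novice, 1 = proficient
    cpd_skill = TabularCPD("skill", 2, [[0.5], [0.5]])

    # Conditional probabilities of each action given skill, initially
    # elicited from subject matter experts (illustrative numbers)
    cpd_a1 = TabularCPD("action1", 2,
                        [[0.8, 0.3],   # P(action1 = incorrect | skill)
                         [0.2, 0.7]],  # P(action1 = correct   | skill)
                        evidence=["skill"], evidence_card=[2])
    cpd_a2 = TabularCPD("action2", 2,
                        [[0.7, 0.2],
                         [0.3, 0.8]],
                        evidence=["skill"], evidence_card=[2])
    model.add_cpds(cpd_skill, cpd_a1, cpd_a2)

    # Observing two correct actions updates the belief about the KSA
    posterior = VariableElimination(model).query(
        ["skill"], evidence={"action1": 1, "action2": 1})
    print(posterior)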
Although it still is a time- and cost-intensive process to build a digital learning or assessment environment, let alone a fully immersive simulation, the technological improvements driven by the digital revolution make development of these environments more feasible. For example, in the Netherlands, the commercial educational serious game Math Garden was built by the University of Amsterdam to have children playfully develop their math skills (Straatemeier, 2014). As digital technology improves and becomes more accessible, combining commercial and educational opportunities might be a strategy for more institutions and companies to start building serious games for learning and assessment.

Another important benefit of the increased technological and statistical possibilities is that large quantities of data can be logged, processed, and analyzed.
Each interaction between student and computer (e.g., mouse clicks, keystrokes, virtual location information, spoken text data, etc.) can be recorded and analyzed, both ad hoc and over time (Nelson et al., 2012). However, as DiCerbo excellently discusses in this issue, 'handling' big data is not the highest purpose. The real challenge lies in locating and supporting the evidentiary structures between serious game performances and students' KSAs. Answering this challenge can be regarded as a key task of measurement experts.

The Evidence-Centered Design (ECD) framework (Mislevy, Steinberg, & Almond, 2003) is an important tool for building an evidentiary argument in game-based assessment (DiCerbo, 2017). When strong evidentiary arguments can be built, game-based assessment may fit very well into a new paradigm that integrates assessment and training. The concept of continuous evaluation, where assessment and training are two sides of the same coin, goes well with serious games because a typical student will likely spend more time in a serious game than on a standardized test. Furthermore, the possibilities discussed above may provide the opportunity to continually evaluate students' movements and actions in the serious game over long periods of time, making it possible to monitor students' subtle improvements in their knowledge and abilities. The extended sampling, as compared to a traditional test, and the stronger alignment between what has been taught and what is tested may ultimately provide the most reliable and valid inferences regarding students' learning outcomes. The field is in need of serious games that do this well and of studies that document their improved reliability and validity as assessment tools.
Interestingly, serious games have long been used for e-learning purposes, as it is broadly recognized that 'fun' is an important incentive for learning (Clark & Mayer, 2011; Kato, 2010). Generally speaking, learning and assessment are not yet aligned in this respect, as many people do not find it very pleasurable to take a test. For many people, taking a (standardized) test is a tense experience, and so-called test anxiety can have a negative impact on students' performance on tests (Elliot & McGregor, 1999). Shute (2011) indicates that 'testing' in a serious game can give students more of a fun feeling and may challenge them to perform at their best. When students experience this, they may "forget" that they are in a testing situation, and test anxiety may be diminished. This effect has been labeled stealth assessment by Shute. Besides standardized testing, this effect can also hold for performance-based assessment, in which students' practical performance is observed and evaluated by a rater. Game-based assessment is less obtrusive because there is no rater physically present to judge the performance, a presence which may itself affect that performance. This may further increase the validity of game-based assessment when compared to performance-based assessment. This potential effect should be investigated in future research endeavors.
An important issue is the narrow sampling of learning objectives in a (standardized) test. The use of game-based assessment can potentially increase the representativeness of the assessment in two ways: by increasing the number of tasks in the assessment, and by creating tasks that could never be tested in a paper-and-pencil test or a performance-based assessment. For example, the serious game presented in the paper by Bauer, Wylie, Jackson, Mislevy, Hoffman-John, and John (2017), called Mars Generation One, was developed for teaching and formative assessment of argumentation. Many different kinds of argument and argumentation schemes have been incorporated in the game, and as students progress through the game they will encounter all types of argumentation, claims, and rebuttals. How these were handled and used in the game is recorded and analyzed, and the results are then used for teaching, both inside and outside of the game environment. In a traditional setting, be it a standardized test or a performance-based assessment, a student could have been tested on only one or two types of argumentation. However, because the number of tasks in a serious game assessment can be increased and broadened significantly, the sampling of the domain can be widened, thus potentially improving the representativeness of the measurements and the validity of the assessment.
2. Psychometric Considerations
In one of the biggest advantages of game-based assessment also lies its greatest challenge: what to do with, and how to interpret, the enormous quantities of data that can be produced by performing a game-based assessment (Levy, 2013). In contrast to a standardized test, which only produces product data, a serious game also provides process data. Product data are the observed values that students produce by performing in a test (or game), which give an indication of their performance.
In a standardized test, this is usually the number-correct score. Process data are the actual log files collected, which, when analyzed, can show in great detail how students have reached their product data. Process data include mouse clicks, keystrokes, navigational behavior, time stamps, etc. (Rupp et al., 2012). Performance in a serious game can produce many pages of log file data in just a short period of time. The challenge is to find meaningful relationships between the data presented in the log files and the constructs to be measured in real life. As DiCerbo (2017) demonstrates in this issue, the ECD framework can be an excellent point of departure for building an evidentiary argument in which game-based assessment performance data are used as evidence about students' knowledge, skills, and abilities.
In fact, based on the ECD framework, three models have to be specified to build a strong evidentiary structure: a student model, a task model, and an evidence model (Levy, 2013). In the student model, details and specifications are provided regarding what (combination of) knowledge, skills, and abilities are being measured. These KSAs are called Student Model Variables (SMVs) and can be more generally qualified as latent variables (Mislevy et al., 2014). These variables cannot be directly observed and are the subject of indirect inference based on a statistical model (which is part of the evidence model). An example of such a variable is creativity: we cannot directly measure creativity the way we can, for example, measure somebody's height. As a result, we have to infer a person's level of creativity from observing their behavior. The challenge then, of course, is to define which type(s) of behavior(s) reveal something about a person's ability to be creative, and then to create situations that would elicit those behaviors among those who are creative (and fewer of those behaviors among those who are less creative). Shute, Ventura, Bauer, and Zapata-Rivera (2009), for example, had students play a serious game in which tasks and objectives could be completed in many different ways: some creative and some not. This illustrates that it is important to create tasks that give students the opportunity to demonstrate their SMVs.
This challenge can be met with the task model in the ECD framework. The task model specifies which tasks, assignments, or objectives students have to complete in the game. Completing these tasks, of course, has to yield data, or Observable Variables (OVs), that provide information about the latent SMVs. In contrast to a standardized test, the tasks in game-based assessments generally cannot be operationalized as questions (although questions can be part of the assessment), but are an integrated part of the game play. The challenge is to design and develop clear tasks that can be scored inside the virtual environment and provide the necessary information to make valid inferences about the status of the SMVs.
The most important model in the ECD framework could well be the evidence model. The evidence model is where theory and data come together through the two coherent processes of evidence identification and evidence accumulation (Levy, 2013). As mentioned above, serious games provide the opportunity to collect not only product data but also process data. The first challenge here, the evidence identification process, is to identify which elements in the process data provide meaningful evidence with regard to the SMVs. The identified elements are then called the Observable Variables (OVs) and will later serve as input for the psychometric model. The second process within the evidence model is evidence accumulation. When the OVs have been identified, the psychometric model serves to transform them, for each student, into a unidimensional or multidimensional score that validly represents the SMVs. In a traditional standardized test consisting of multiple-choice questions, the OVs are generally zeros and ones, resulting in relatively simple psychometric models with one or two item parameters. However, in a game-based assessment, more variables can be parameterized (i.e., as predictors) in the psychometric model. Furthermore, in traditional testing the answers students give to questions are most often considered to be independent of each other; that is, it is assumed that there is no (statistical) relationship between the answers to questions X and Y. This is generally not the case in a serious game, because the actions that somebody can perform at a certain point in time in the game are often dependent on what has been done before. Mislevy et al. (2014) refer to this phenomenon as the change state of a serious game. Finally, the relationship between SMVs and OVs is quite complex, because multiple variables in the game can be dependent upon and interact with multiple (combinations of) students' knowledge, skills, and abilities in real life. Building an evidentiary argument for a game-based assessment is a difficult and laborious process consisting of many iterations and tests. DiCerbo's article (2017) presents a nice overview of such a process.
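The following sketch illustrates the two evidence-model processes under strong simplifying assumptions: evidence identification reduces a raw event log to binary OVs, and evidence accumulation scores them with a two-parameter logistic (2PL) model. The log format, item parameters, and grid search are our own invention for illustration, and the sketch deliberately ignores the dependence between actions discussed above.

    import numpy as np

    # --- Evidence identification: reduce raw log events to binary OVs ---
    raw_log = [
        {"event": "connect_arrow", "correct": True,  "rt": 3.2},
        {"event": "connect_arrow", "correct": False, "rt": 7.9},
        {"event": "submit_web",    "correct": True,  "rt": 1.1},
    ]
    ovs = np.array([1 if e["correct"] else 0 for e in raw_log])

    # --- Evidence accumulation: score the OVs with a 2PL model ---
    # Illustrative discrimination (a) and difficulty (b) parameters
    a = np.array([1.2, 0.8, 1.5])
    b = np.array([-0.5, 0.3, 0.0])

    def log_likelihood(theta):
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))  # P(OV = 1 | theta)
        return np.sum(ovs * np.log(p) + (1 - ovs) * np.log(1 - p))

    # Crude maximum-likelihood estimate of the latent SMV over a grid
    grid = np.linspace(-4, 4, 161)
    theta_hat = grid[np.argmax([log_likelihood(t) for t in grid])]
    print(f"Estimated ability: {theta_hat:.2f}")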
3. The Future
We expect that game-based assessment will increasingly evolve into interactive and immersive virtual environments in which students can freely wander around to complete tasks and assignments, with their actions being scored on the fly and feedback being immediately available. With digital and technological applications continually improving, we may also see virtual reality moving further into the domain of educational assessment. Immersive virtual reality simulations can also offer solutions for the measurement of complex practical skills like painting, welding, and the skills of other industrial professions.
Many game-based assessments are still used only for formative purposes, providing one summary score as an indicator of a construct. With psychometric models improving (e.g., Mislevy et al., 2014), we might also see game-based assessment being used for summative or credentialing purposes in the future. Future research should therefore focus on investigating the extent to which serious games can be used in high-stakes assessment. Perhaps formative and summative assessment in serious games can be more integrated in the future. For instance, students' performance could be monitored continuously, and when they reach a certain standard they automatically enter some form of summative module, or they immediately pass the 'test' once enough information has been collected. It is further critical that these efforts to develop game-based assessments be evaluated with high standards of scientific integrity to ensure that they are valid and reliable (Kato, 2012).
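One way to picture such an integration is a simple stopping rule: formative monitoring continues until the estimate of a student's ability is precise enough to support a summative decision. The sketch below is our own illustration under strong assumptions (each scored game event is treated as a noisy normal measurement of a single ability, updated Kalman-style, with invented thresholds); a real system would use the psychometric models discussed in Section 2.

    # Illustrative stopping rule for integrating formative monitoring
    # with a summative decision. Assumes each scored in-game event is a
    # noisy normal measurement of one latent ability (a simplification).

    event_scores = [0.4, 0.9, 0.6, 0.8, 0.7, 0.9]  # invented score stream

    mu, var = 0.0, 1.0       # prior belief about the student's ability
    obs_var = 0.5            # assumed measurement noise per event
    PRECISION_CUTOFF = 0.15  # trigger summative step below this variance

    for n, score in enumerate(event_scores, start=1):
        k = var / (var + obs_var)   # weight given to the new evidence
        mu = mu + k * (score - mu)  # update the ability estimate
        var = (1 - k) * var         # posterior variance shrinks per event
        if var < PRECISION_CUTOFF:
            print(f"After {n} events: ability {mu:.2f}, "
                  "trigger summative module")
            break
    else:
        print(f"Keep monitoring: ability {mu:.2f}, variance {var:.2f}")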
4. Conclusions
The articles discussed in this special issue of JATT on Serious Games in Assessment contribute to the increasing body of research on game-based assessment. This is an interesting field of research that is full of promise and that continues to require rigorous attention from the perspectives of both research and practice.
5. References
Bauer, M., Wylie, C., Jackson, T., Mislevy, R. J., Hoffman-John, E., & John, E. (2017). Journal of Applied Testing Technology.
Clark, R. C., & Mayer, R. E. (2011). E-learning and the science of instruction. San Francisco: Pfeiffer. https://doi.org/10.1002/9781118255971
De Klerk, S., Eggen, T. J. H. M., & Veldkamp, B. P. (2014). A blending of computer-based assessment and performance-based assessment: Multimedia-Based Performance Assessment (MBPA). The introduction of a new method of assessment in Dutch Vocational Education and Training (VET). Cadmo, 22(1), 39–56. https://doi.org/10.3280/CAD2014-001006
De Klerk, S., Veldkamp, B. P., & Eggen, T. J. H. M. (2015). Psychometric analysis of the performance data of simulation-based assessment: A systematic review and a Bayesian network example. Computers & Education, 85, 23–34. https://doi.org/10.1016/j.compedu.2014.12.020
DiCerbo, K. (2017). Building the evidentiary argument in game-based assessment. Journal of Applied Testing Technology.
Elliot, A. J., & McGregor, H. (1999). Test anxiety and the hierarchical model of approach and avoidance achievement motivation. Contemporary Educational Psychology, 19, 430–446.
Gobert, J. D., Sao Pedro, M. A., Baker, R. S. J. D., Toto, E., & Montalvo, O. (2012). Leveraging educational data mining for real-time performance assessment of scientific inquiry skills within microworlds. Journal of Educational Data Mining, 4, 111–143.
Iseli, M. R., Koenig, A. D., Lee, J. J., & Wainess, R. (2010). Automated assessment of complex task performance in games and simulations (CRESST Research Rep. No. 775). Los Angeles: National Center for Research on Evaluation, Standards, and Student Testing. Retrieved from: http://www.cse.ucla.edu/products/reports/R775.pdf
Kato, P. M. (2012). Evaluating efficacy and validating health games. Games for Health: Research, Development, and Clinical Applications, 1(1), 74–76. https://doi.org/10.1089/g4h.2012.1017
Kato, P. M. (2010). Video games in health care: Closing the gap. Review of General Psychology, 14(2), 113–121. https://doi.org/10.1037/a0019441
Kerr, D., & Chung, G. K. W. K. (2012). Identifying key features of student performance in educational video games and simulations through cluster analysis. Journal of Educational Data Mining, 4(1).
Levy, R. (2013). Psychometric and evidentiary advances, opportunities, and challenges for simulation-based assessment. Educational Assessment, 18(3), 182–207. https://doi.org/10.1080/10627197.2013.814517
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). Focus article: On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1(1), 3–62. https://doi.org/10.1207/S15366359MEA0101_02
Mislevy, R. J., Behrens, J. T., DiCerbo, K., & Levy, R. (2012). Design and discovery in educational assessment: Evidence-centered design, psychometrics, and data mining. Journal of Educational Data Mining, 4, 11–48.
Mislevy, R. J., Oranje, A., Bauer, M., von Davier, A. A., Hao, J., Corrigan, S., Hoffman, E., DiCerbo, K., & John, M. (2014). Psychometric considerations in game-based assessment. New York, NY: Institute of Play.
Neapolitan, R. E. (2003). Learning Bayesian networks. New York, NY: Prentice-Hall.
Nelson, B., Nugent, B., & Rupp, A. A. (2012). On instructional utility, statistical methodology, and the added value of ECD: Lessons learned from the special issue. Journal of Educational Data Mining, 4(1), 224–230.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Francisco, CA: Morgan Kaufmann. https://doi.org/10.1016/B978-0-08-051489-5.50012-6
Quellmalz, E. S., Timms, M. J., Silberglitt, M. D., & Buckley, B. C. (2012). Science assessments for all: Integrating science simulations into balanced state science assessment systems. Journal of Research in Science Teaching, 49(3), 363–393. https://doi.org/10.1002/tea.21005
Rupp, A. A., Levy, R., DiCerbo, K., Sweet, S. J., Crawford, A. V., Calico, T., Benson, M., Fay, D., Kunze, K. L., Mislevy, R. J., & Behrens, J. T. (2012). Putting ECD into practice: The interplay of theory and data in evidence models within a digital learning environment. Journal of Educational Data Mining, 4(1), 49–110.
Shute, V. J., Ventura, M., Bauer, M. I., & Zapata-Rivera, D. (2009). Melding the power of serious games and embedded assessment to monitor and foster learning: Flow and grow. In U. Ritterfeld, M. J. Cody, & P. Vorderer (Eds.), Serious games: Mechanisms and effects (pp. 295–321). Mahwah, NJ: Routledge.
Shute, V. J. (2011). Stealth assessment in computer-based games to support learning. In S. Tobias & J. D. Fletcher (Eds.), Computer games and instruction (pp. 503–523). Charlotte, NC: Information Age Publishing.
Straatemeier, M. (2014). Math Garden: A new educational and scientific instrument. Doctoral dissertation, University of Amsterdam, The Netherlands.