Evaluation of Learning Analytics on Adaptive Learning Systems: A work in progress Systematic Review



There is currently no systematic overview of the purposes for which Learning Analytics (LA) and Learning Analytics Dashboards (LAD) are evaluated on Adaptive Learning Platforms. This work-in-progress systematic review provides the preliminary results of this endeavor. The paper establishes an overview of the current research field from two reviews. From this foundation we provide an analysis of seven papers. The preliminary results show four different purposes for evaluating LA and LAD on Adaptive Learning Platforms. These are: 1) Evaluation of LA and LAD design and framework, 2) Evaluation of LA and LAD performance, 3) Evaluation of perceived value, and 4) Evaluation of adaptivity. Examining these papers, we see that when LA and LAD are evaluated on Adaptive Learning Platforms, an evaluation method may be applied for a single purpose or for multiple purposes. These categories might change as the work in progress develops and more papers are added to the synthesis.
Evaluating Learning Analytics
of Adaptive Learning Systems: A Work
in Progress Systematic Review
Tobias Alexander Bang Tretow-Fish and Md. Saifuddin Khalid
Department of Applied Mathematics and Computer Science at the Technical
University of Denmark, Kongens Lyngby, Denmark
Abstract. There is currently no systematic overview of methods for
evaluating Learning Analytics (LA) and Learning Analytics Dashboards
(LAD) of Adaptive Learning Platforms (ALPs). 10 articles and 2 reviews
are analyzed and synthesized. Focusing on the purposes of evaluation,
methods used in the studies are grouped into five categories (C1-5): C1)
evaluation of LA and LAD design and framework, C2) evaluation of per-
formance with LA and LAD, C3) evaluation of adaptivity functions of
the system, C4) evaluation of perceived value, and C5) Evaluation of
pedagogical and didactic theory/context. While there is a relatively high
representation of evaluations in the C1-C4 categories of methods, which
contribute to the design and development of the interaction and interface
design features, the C5 category is not represented. The presence
of pedagogical and didactical theory in the LA, LAD, and ALPs is lacking.
Though traces of pedagogical theory are present, none of the studies
evaluates its impact.
Keywords: Adaptive learning platforms ·Learning analytics ·
1 Introduction
Adaptive Learning (AL) is not only a relatively new research area but also a
multi-disciplinary field involving multiple synonyms and definitions. Adaptive
learning, personalized learning, individualized learning, and customized learning
are in some way interchangeable although adaptive learning is the most fre-
quently used term of the four [13]. Various methods are applied for the design
and evaluation of adaptive or personalized activities and contents of the digital
learning platforms.
Existing reviews on Learning Analytics (LA), Learning Analytics Dashboards
(LAD), and AL have not focused on the methods used to evaluate LA and LADs
of Adaptive Learning Platforms (ALPs). For instance, the systematic literature
ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2022
Published by Springer Nature Switzerland AG 2022. All Rights Reserved
E. Brooks et al. (Eds.): DLI 2021, LNICST 435, pp. 1–16, 2022.
review [7] presents six reviews on adaptive learning and seven reviews on learning
analytics, among other types of learning technologies, but lacks focus on
the methods for the evaluation of LA or LADs. The review [9] posed several
questions, in particular about which methods have been employed for the evaluation
of the systems. The review reports that the learners play an important role in
the evaluation of intelligent tutoring systems, such as learners’ experience when
evaluating system usability. In the examined studies 5.66% of studies involving
intelligent tutoring systems were evaluated only by learner experiences, while in
combination with learners’ performance, system’s performance, or both, learners’
experiences have been used more frequently [9]. The review does not specify
what methods were used for obtaining the learner experience or what types of
usability tests were used.
The purpose of applying different evaluation methods is therefore worth
examining, to better understand both the perspectives from which systems are
evaluated and how they are evaluated.
This leads to the motivation for this systematic review. The motivation is
to synthesize the evaluation methods applied in the design, development, and
implementation of AL as they support pedagogical and learning related deci-
sions for educators and students. Likewise, we want to examine how students’
and educators’ perceptions of LAD and LA are integrated in the evaluation
methods. The study will contribute to the fields of usability engineering, user
experience, and digital learning technology. The study of the methods of evaluating
AL platforms is pivotal for improving the quality of the learning experience
and learning outcomes, educators’ teaching experience and their adoption of the
technology, and companies’ development processes and their implementation of
the right evaluation methods.
The above-mentioned scope and motivation led us to devise the research
question: How to evaluate the Learning Analytics and Learning Analytics Dashboards of
Adaptive Learning Platforms?
The desired outcomes are two sets of methods: one for evaluating the functionalities
and perceived experiences of the technological features, and another
for gathering evidence of improved learning outcomes, learning experience,
and teaching quality. The first contributes to the field of interaction design,
while the second contributes to the broader field of service design and innovation
within the education and training domain.
2 Methods
We applied two established methods: one as the protocol for the selection of
papers and one as the protocol for the process of analysis and synthesis.
2.1 Selection of Papers: PRISMA
The selection of articles was conducted according to the Preferred Reporting
Items for Systematic Reviews and Meta-Analyses (PRISMA) [11], which includes
four phases: identification, screening, eligibility, and included (see Fig. 1). Since
the aim is to review evaluation methods used for LA and LAD on ALPs, various
combinations of the following keywords are used: evaluation, adaptive learning,
learning analytics, learning analytics dashboards, assessment, etc. The searches
were restricted to peer-reviewed papers, published in English, Danish, and Nor-
wegian (considering authors’ language skills), from 2011 to the search date
September 1, 2021. In consultation with a librarian and after testing different
combinations of keywords, four databases were selected, and different combina-
tions of the keywords returned the following: Scopus [n = 75], ACM [n = 144],
ScienceDirect [n = 106], and Taylor & Francis [n = 38]. We envision further
inclusion of databases such as IEEE Xplore, JSTOR, Routledge, Springer, and
ERIC in our continued work.
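The identification phase described above can be tallied with a few lines of code. The following is a minimal sketch that only sums the per-database hits reported in the text; the variable and function names are our own, and the sum does not account for duplicates across databases, which are not reported here.

```python
# Records returned per database, as reported in the text above.
records_identified = {
    "Scopus": 75,
    "ACM": 144,
    "ScienceDirect": 106,
    "Taylor & Francis": 38,
}


def total_identified(counts):
    """Sum the per-database hits for the PRISMA identification phase."""
    return sum(counts.values())


total = total_identified(records_identified)
print(f"Records identified across {len(records_identified)} databases: {total}")
```

Keeping the counts in one mapping makes it straightforward to extend the tally when further databases are added in the continued work.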
The exclusion criteria implemented in the screening and eligibility stages are as
follows: 1) Papers that do not mention LA or LAD in relation to ALP. 2)
Papers with a focus on LA and LAD in other e-learning environments which
do not meet the requirements of adaptivity for the learning platform. 3) Papers
without empirical data examining LA or LAD on ALP. 4) Papers in conference
proceedings that were not published as part of the main conference, i.e.
workshop papers and posters.
Fig. 1. PRISMA flow-chart
The two authors screened separate databases and only the papers selected by
one author (n = 95) are included in this document. For this review, 10 articles
and 2 reviews have been included for analysis and synthesis.
2.2 Constant Comparative Analysis Method
We applied the constant comparative analysis method for the analysis and syn-
thesis [4]. The articles were encoded according to themes and then divided into
categories. During this process, the coded sections were regularly compared to
similar parts of texts containing the same codes. The intention was to create
a connection between the texts and ensure the continuity of the codes’ defini-
tions [4].
Each included paper was read with the purpose of identifying methods,
parameters, and purpose of evaluating LA and LAD. The data extracted from
the papers are tabulated to synthesize: 1) The methods used when evaluating
LA and LAD. 2) Parameters measured by the aforementioned methods to eval-
uate LA and LAD. 3) The purpose for the evaluation method applied. From
the identified purposes a thematic analysis was initiated and categories were
3 Analysis and Synthesis
In this section, we report the qualitative synthesis of the systematic review.
The evaluation methods identified in the papers are summarized in Table 1 and
grouped into five categories. C1) Evaluation of LA and LAD design and
framework - focusing on how LA and LAD are implemented on the platform.
C2) Evaluation of performance with LA and LAD - focusing on user performance
with LA and LAD statistics. C3) Evaluation of adaptivity - focusing on
if and how the adaptivity functions of the system work. C4) Evaluation of perceived
value - focusing on the perceived value for students, educators, or users. C5)
Evaluation of pedagogical and didactic theory/context - focusing on whether a
pedagogical theory is the groundwork for the LA, LAD, or framework or if there
are actionable pedagogical recommendations associated with the application.
Within each category, studies whose main focus aligns with the category are
described in depth. Several studies include multiple evaluations besides
their main focus; these papers are mentioned in each category as evaluation
3.1 C1) Evaluation of LA and LAD Design and Framework
Paper [1] focuses on evaluating LA and LAD design and framework whereas [8]
only has evaluation of LA and LAD design and framework as a part of their
[1] propose EduAdapt, an architectural model for the adaptation of learning
objects considering device characteristics, learning style and students’ contextual
information. They develop an ontology (OntoAdapt) for recommending content
to users. In particular, for EduAdapt the study investigates whether the use of the
ontology matches the learning objects adaptation scope.
The OntoAdapt [1] is an ontology which is evaluated in two phases. The first
phase covers the development of the ontology and the second phase its
application in a developed application. The first phase uses two strategies: scenarios,
and an analysis of the quality and fidelity that OntoAdapt delivers against other
ontologies. The scenarios identified different use cases from which they developed
OntoAdapt. The analysis of the quality and fidelity of OntoAdapt compared
to other ontologies was done with some evaluation metrics from full ontology
evaluation (FOEval) (coverage, richness, and level of detail) and some provided
by the software Protégé (Annotations, Object property, Data property, Properties
to the specific domain, Properties with specific range, Total number of
classes, and Total number of subclass), in order to assess the quality and fidelity
that OntoAdapt delivers in covering concepts on the associated subjects. These
metrics were complemented with the tool Manchester-OWL Ontology Metric to
validate and display statistics on OntoAdapt’s performance. These were used to
calculate Attribute Richness (AR), Relation Richness (RR), Ontology Richness
(OR), and Subclass Richness (SR).
The second phase was the testing of the ontology. A mobile application prototype
for Apple iOS mobile devices was developed, and the prototype was used
in an undergraduate course called Ubiquitous and Mobile Computing with 20
learners who used the Adapt application for 1 month. The study applied a
survey using the Felder and Silverman index of learning styles with 44 items on
a 5-point Likert scale on four dimensions (Active/Reflective, Sensing/Intuitive,
Visual/Verbal, and Sequential/Global). Afterwards a pretest on EduAdapt
was performed with 20 Learning Objectives (LOs). A post-test was then offered
after 1 month, and the students were asked to complete a survey on EduAdapt.
The survey was based on the work of a two-tier test and a usability evaluation
and was composed of 10 statements, which the students rated using a 5-point
Likert scale to measure the level of user satisfaction. These results were
evaluated for reliability with Cronbach’s alpha, and the Wilcoxon-Mann-Whitney
test was used to assess whether samples have the same distribution.
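To illustrate the reliability check mentioned above, the following is a minimal sketch of Cronbach's alpha for a Likert-scale survey. The response matrix is hypothetical and is not data from [1]; the function name is our own, and the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores) is assumed.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents and columns of item scores."""
    k = len(responses[0])                       # number of survey items
    items = list(zip(*responses))               # transpose to per-item columns
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical answers of five students to four 5-point Likert statements.
survey = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
]
alpha = cronbach_alpha(survey)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Values closer to 1 indicate that the items vary together, which is usually read as higher internal consistency of the questionnaire.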
As one of the results in this expansive study “we can highlight as the main
scientific contribution the proposal of a model for learning objects adaptation
that employs inferences and rules in an ontology considering various contexts,
including the student’s learning style” [1, p. 83].
Besides this paper, one additional paper touches upon the evaluation of LA and
LAD design and framework in its study: [8] presents a framework to frame
user requirements of an adaptive system.
3.2 C2) Evaluation of Performance with LA and LAD
Two papers focused mainly on evaluating user performance with LA and LAD
and one additional paper mentioned the measurement of performance through
LA and LAD but not as part of the main scope of its study.
Table 1. Review results

Author | Category | Evaluated | Methods | Parameters | Purpose
Di Mascio et al. (2013) [3] | C3, C4 | Adaptive | evaluation, expert think-aloud and verbal protocols, simulation and | Users’ attitudes towards the system, users’ performance and | The qualitative (Heuristic, expert evaluations etc.) are used to evaluate design choices for the system. Whereas, the simulations and system indicators are also used to evaluate the design choices but from a perspective on
et al. (2016) [2] | C3, C4 | Mechanism that adapts to | Surveys and | Adaptability and | The simulation method was used to evaluate the amount of possible outputs from the system. The surveys combined with a pilot case evaluated the perceived levels of both variability and adaptability
Tlili et al. (2019) [14] | C3, C4 | Method for modelling to | Survey and LA | personality scores | LA was used to estimate learners’ personalities and surveys were used to evaluate the validity of the personality models
et al. (2015) [5] | C3, C4 | | Surveys, pre- and scores, and user satisfaction scores | satisfaction scores and learning | How does the performs (student performance) as well as what the satisfaction with the LAD were
Nye et al. (2021) [10] | C4 | MentorPal, framework for | Formative user testing interviews, log data, pre- and post-surveys for career attitudes, and post-survey for usability | Feasibility for virtual mentoring | The LAD is evaluated on statistics which were used to check for the model quality, this was verified against users’ subjective quality assurance
et al. (2016) [1] | C1, C4 | Ontology model for LA and LAD | FOEval, user feedback, surveys, measurements of survey reliability, user scenarios, questions and usage patterns | Learners, learning objects, devices, context, context coverage, richness, and level of detail | The evaluations of the ontology go through two phases. First-phase evaluations are used for developing the ontology. Second-phase evaluations are applied to compare it with other ontologies and to evaluate how the ontology performs in a learning context
et al. (2016) [8] | C1, C4 | Teacher-led design on | questionnaire and | perceptions and | This paper proposes a methodology to requirements to a number of critical success factors in meeting the users’ expectations of the system
et al. (2014) [6] | C3, C4 | Adaptive | Subjective ratings, Linguistic Inquiry and Word Count, and Advanced Text Analyzer | User’s experienced cognitive load, Measures, and Category Features | A method for an adaptive learning design which adapts to users’ cognitive load
et al. (2015) [12] | C3, C4 | Context-aware Tutor Oriented Modeling for | design methods, data mining interviews, and SUS questionnaire | Learners’ affective state, educators’ tacit experiences, | Exploration of feedback. How or if this improves support through a
et al. (2021) [15] | C2, C4 | Student- | Pre- and post test of students’ performance and system log files | Students’ learning | Does such a system have a practical value

[5] developed a new algorithm, called the competency-based guided-learning
algorithm (CBGLA). The study aimed to develop a CBGLA-based learning sys-
tem that includes personalized learning paths which guide learners in achieving
the learning objectives. The purposes of the guided-learning functions are to
accelerate and streamline the learning process. The system was tested on six
third-year college students of electrical engineering before the experiment was
conducted on 59 third-year college students of electrical engineering [experimental
group = 29, control group = 30]. To test the effectiveness of CBGLA, a
quasi-experimental research method was employed using a non-equivalent test design.
The statistical mean, independent sample t-test, and one-way analysis of covariance
(ANCOVA) were used to investigate the participants’ learning effectiveness,
satisfaction, and three dimensions of system validity through achievement of
learning objectives, required learning time, and learning effectiveness. Learner
satisfaction was investigated using a 16-item survey on a five-point Likert scale
covering three dimensions: interface design, design of adaptive guided-learning
mechanism, and the perception of CBL. “The results of system validity exper-
iments were significantly positive. This paper also conducted learning exper-
iments to analyze learning effectiveness. Results showed that students learned
more effectively under the guidance of the CBGL system than under the instruc-
tion of a teacher. [...] However, students expressed a lower degree of satisfaction
when surveyed about their perception of CBL” [5, p. 124].
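The comparison of an experimental and a control group with an independent sample t-test can be sketched as follows. This is a minimal illustration assuming Student's pooled-variance form of the test; the scores are hypothetical and are not data from [5], and the function name is our own. The ANCOVA step reported in the study is not covered by this sketch.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, degrees_of_freedom) for Student's pooled-variance t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Hypothetical post-test scores, NOT data from the reviewed study.
experimental = [78, 85, 90, 72, 88]
control = [70, 75, 80, 68, 74]
t_stat, dof = pooled_t(experimental, control)
print(f"t = {t_stat:.3f}, df = {dof}")
```

The resulting t statistic would then be compared against the t distribution with the returned degrees of freedom to obtain a p-value.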
[15] presents the Student-Centered Online One-to-one Tutoring system
(SCOOT), which deals with the cost of one-to-one tutoring. SCOOT is pre-
sented as a supplementary service where students can ask questions outside
school to expand the flexibility of posing questions. The tutoring sessions with
SCOOT are organized in four essential components: organization of teachers, student
inquiry, pair matching mechanism, and the tutoring session. In SCOOT,
teachers and students are able to communicate online through screen sharing,
sending text and pictures, and speech. The teachers’ interface of the application
requires the teachers to log into the system and mark themselves as available for
synchronous live conversations. The students’ interface of the application requires
the students to log on and interact with the available teachers. These tutoring
sessions are initiated by the students. The study seeks to evaluate the efficiency
of SCOOT as well as to examine how students’ prior knowledge and the superficial
patterns of tutoring sessions affected their learning. The evaluation
integrates students’ learning performance and behavior log files instead of running
between-subject experiments. The study ran for 50 days with a pre-test before
and post-test afterwards. To get an in-depth understanding of how tutoring ses-
sions affected students’ learning, 40 tutoring sessions were randomly selected
based on the criteria inferred and the sessions were manually labeled in detail
with the coding schema developed from Chi’s Interactive Constructive Active
Passive (ICAP) framework. The participants consisted of 810 students in Grade
7 and 64 mathematics teachers. Tutoring sessions shorter than 1 min were
omitted. Pretest performance combined with system usage factors
was used to predict posttest performance by linear regression, carried out
with the Waikato Environment for Knowledge Analysis (WEKA). Common descriptive
statistical analysis and the Pearson correlation coefficient were computed. “The
results suggested that system interventions are needed at both the student and
teacher sides to facilitate good-quality tutoring interactions; otherwise, SCOOT
may further increase the difference between high- and low- achieving students”
[15, p. 17].
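The correlation and regression steps described for [15] can be illustrated with a short sketch. It assumes a single predictor (pretest score) for simplicity, whereas the study combined pretest performance with several system usage factors and used WEKA; the data and function names below are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


# Hypothetical pretest and posttest scores, NOT data from the reviewed study.
pretest = [40, 55, 60, 70, 85]
posttest = [50, 58, 66, 75, 88]
r = pearson_r(pretest, posttest)
slope, intercept = fit_line(pretest, posttest)
print(f"r = {r:.3f}, posttest = {slope:.2f} * pretest + {intercept:.2f}")
```

Adding system usage factors as further predictors would require multiple regression, which this single-predictor sketch deliberately leaves out.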
Apart from the two above-mentioned papers, [12] evaluates the effectiveness
of supporting the learning process by e.g. giving affective and sensory input to
help calm the user in a stressful learning context, and whether the input was
helpful in the students’ performance.
3.3 C3) Evaluation of Adaptivity
Four papers evaluated the adaptivity of LA, LAD, or framework. Two additional
papers mentioned adaptivity in their studies but did not present it as their
main focus.
For defining personality in adaptive learning systems, [14] devised an
evidence-based personality model by mapping students’ participation in 15 functionalities
of the iMoodle learning management system against Big Five Inventory
(BFI) dimensions (i.e. Extraversion, Agreeableness, Conscientiousness, Neuroticism,
and Openness). The devised method is defined as an LA approach for defining
the learners’ personality. Based on data from 50 students, the Chi-square test is used as an
assessment criterion to compare the personality levels assessed by the
LA approach with those assessed from the BFI results. Since the study is
exploratory and little information is previously known, the experimental
data obtained from their pilot experiment was validated using three methods, namely
Chi-square, 10-fold cross-validation, and Cohen’s Kappa. The study concluded
that the “LA approach with Bayesian network can model learners’ personalities
with an acceptable precision and a fair agreement compared to BFI for only
three personality dimensions, namely, extraversion, openness and neuroticism”
[14, p. 12].
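The agreement between personality levels from the LA approach and from the BFI can be quantified with Cohen's Kappa. The following is a minimal sketch with hypothetical labels, not data from [14]; the function name is our own. On the commonly used Landis and Koch scale, values between 0.21 and 0.40 are labelled fair agreement.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from the marginal label frequencies of both raters.
    expected = sum(count_a[label] * count_b[label]
                   for label in set(labels_a) | set(labels_b)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical extraversion levels for eight learners.
la_levels = ["high", "low", "high", "low", "high", "high", "low", "low"]
bfi_levels = ["high", "low", "low", "low", "high", "low", "low", "high"]
kappa = cohens_kappa(la_levels, bfi_levels)
print(f"Cohen's kappa: {kappa:.2f}")
```

Unlike raw percentage agreement, kappa corrects for the agreement that two labelings would reach by chance alone.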
To evaluate adaptability and variability of content from both simulations and
user feedback, [2] presents the Personal Health System as a part of Help4Mood,
which supports users in not relapsing into depression, thereby learning to live
with their condition. The Personal Health System is a tool that adapts
its content to users’ stamina or mood. The Personal Health System
was designed by adopting a user centred design methodology,
which involved a set of users, clinicians and caregivers. The evaluation
is done through two methodologies: one produces simulated data and the
second collects user feedback on the system. The simulation data was produced
with two categories of scenarios in mind, restrictive and flexible, both designed on
clinical requirements. Restrictive scenarios
had a high number of constraints in the relative order and dependencies
between tasks. They also had a high number of constraints in the periodicity and
priority of the tasks. The flexible scenarios, in contrast, had a small number of
constraints. The evaluation space corresponds to the multivariate combination of
answers to all questions that might make sense given the context of the user. The
simulations were done on 19 tasks and 31 subtasks, and a task could be formed
by one or more subtasks. There were 20,000 simulations of interactive sessions
(restrictive n = 10,000 and flexible n = 10,000). In this study, adaptability was
defined as how much the produced content of a session can change in relation
to current and past information and to what is inferred about the users’ condition.
Variability was defined as how the content order is offered depending both on user
actions during the interactions and restrictions defined by clinicians. The second
methodology encompassed two tests where users used the system and afterwards
answered an 11-item survey with a 3-point Likert scale on the perceived usefulness
of different functionalities of the system. Two of these items referred to
adaptability and variability. The paper concluded that “We can ensure that our
framework provides a sufficient degree of adaptive and varied sessions, allow-
ing the personalisation of the interactive sessions in order to improve the user
experience” [2, p. 90].
In a study by [6], user’s experienced cognitive load is examined to help
improve performance in complex, time-critical situations by dynamically deploying
more appropriate output strategies to reduce cognitive load. This is done
through linguistic behavioral features as indices of user’s cognitive load. A pilot
study was conducted on a paper mock-up with two teams of four participants
consisting of experts from fire management work roles. Their feedback improved
the task design as well as the interaction design. The study examined a session where
44 participants (11 teams of four operators) strategically managed
firefighting tasks as a team. Participants interacted with a multi-touch tabletop
screen that displays the fire management tasks and related information. All
participants had general knowledge about firefighting, but none had ever participated
in any actual firefighting or firefighting training exercises, or used any fire
management system before. Task design was set up with three different levels
of task complexity or cognitive load. The levels were low, medium, and high (in
the analysis, low and medium were combined into a single low category). Data consisted
of participants’ voices, which were captured with wireless close-talk microphones
and recorded with the audio recording tool WaveSurfer, and footage from two video
cameras which were used to record the operators’ interactions. Further data consisted of logs of
interactions with the touch table, including operators’ touch positions and dragging
behavior, as well as a survey on the self-rated perception of task difficulty
as individuals and as a team on two separate 9-point Likert scales. The survey
also contained an open question for general comments on task complexity, use
of policy documents, and any communication issues. The analysis was done on
observations, data transcription, feature extraction, and statistical analyses of
the linguistic features with Linguistic Inquiry and Word Count and Advanced Text
Analyzer to investigate the variations in their behavior under different task load
levels. In conclusion the paper states that: “An interaction system that is able
to analyze users’ speech and linguistic patterns to determine their current cogni-
tive load could dynamically adapt its response to minimise the users’ extraneous
cognitive load and help them maintain task performance” [6, p. 362].
[12] presents an Ambient Intelligence Context-aware Affective Recommender
Platform (AICARP) that applies Tutor Oriented Recommendations Modeling
for Educational Systems (TORMES) elicitation methodology to sense changes
in learners’ affective state. AICARP delivers interactive context-aware affective
educational recommendations through complementary sensory
communication channels. The recommendations are given to make users
adjust breathing, stress, etc. To evaluate the TORMES methodology, problem
scenarios were used to identify the necessary requirements or user goals while
taking into account the context of use elicited in the previous activity. Problem
scenarios were used to develop solution scenarios that solved or avoided the problems
posed by delivering interactive recommendations. To specify these solutions,
the recommendation modeling works with five dimensions: recommended action
(what), recommendation rules (when and who), justification of the recommendation
(why), recommendation format (how and where), and recommendation
attributes (which). Evaluation of the scenarios was carried out by applying the
user-centered design method Wizard of Oz. In this empirical study, a psycho-educational
expert with experience in supporting learners face-to-face and online
acted as the Wizard. Video of the participant and affective data (pulse, skin temperature,
skin resistance, and skin conductance) were visualized to the Wizard, who
in turn generated the associated recommended action (e.g. the green LED and the
buzzer playing a pure tone). The study had six participants, one of them being
visually impaired. Before the study began, participants completed the General
Self-Efficacy Scale (GSE), the Big Five Inventory (BFI), and the Positive and
Negative Affect Schedule (PANAS). As part of the study, participants had to
complete two tasks. Each of the tasks involved speaking for 5 min while being
recorded with the webcam. Before talking, participants had 1 min to think about
what to say. Data consisted of AICARP system data (the previously mentioned
physiological data), recordings from a webcam (facial expressions and voice),
recordings from a video-camera (body movements), and time-stamped notes by
an observer. The impact of the elicited interactive recommendation on the learner
was evaluated at the end of the experiment by means of a questionnaire and
an interview. The questionnaire was the System Usability Scale (SUS), with 10 items
on a 5-point Likert scale. The interview consisted of five open questions with the
goal of understanding participants’ opinions of their interaction with the system
regarding perception, intrusiveness, and utility. A Chi-square test was conducted
to determine whether there was independence between the usability of the system
and the effectiveness of the recommendations perceived by the participants.
Answers given in the open questions were coded categorically. To verify these
categories, chi-square tests were again applied. The results cannot be regarded as
representative due to a very low sample size. The study concludes that “[...] this
research opens a new avenue in related literature which focuses on managing the
recommendation opportunities that an ambient intelligent scenario can provide
to tackle affective issues during the language learning process when preparing
for the oral examination of a second language learning course” [12, p. 50].
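The SUS questionnaire mentioned above is scored with a fixed procedure: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a score from 0 to 100. The following is a minimal sketch of that procedure; the responses are hypothetical and are not data from [12], and the function name is our own.

```python
def sus_score(responses):
    """Compute the SUS score (0-100) from ten responses on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1      # positively worded (odd) items
        else:
            total += 5 - response      # negatively worded (even) items
    return total * 2.5


# Hypothetical responses of one participant to SUS items 1-10.
participant = [4, 2, 5, 1, 4, 2, 4, 2, 5, 1]
print(f"SUS score: {sus_score(participant)}")
```

With only six participants, as in the study above, individual SUS scores are better read case by case than averaged into a single figure.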
In addition to these papers, a number of papers mention evaluation methods
that evaluate adaptivity. [3] presents usability associated with adaptivity and
[5] the effectiveness of adaptivity.
3.4 C4) Evaluation of Perceived Value
Two papers had their main focus on evaluating users’ perceived value.
All of the remaining 8 studies reviewed in this paper
had in one way or another included users’ perceived value as a feature of their
[3] describes the development of the TERENCE system’s Graphical User
Interface (GUI) prototypes through evidence based and user-centered design,
where they identified users’ requirements and context of use by involving users
and domain experts. The first group were learners (n = 170), described here
as 7–8 year old primary-school students who are poor comprehenders
and hard of hearing or deaf. The second group were educators (n = 10),
who were primary school teachers, support teachers, and parents of learners.
The last group were experts (n = 10), psychologists and linguists who
designed and developed the learning material. In evaluating the ALS TERENCE,
the GUI was assessed on two levels: the Learner GUI and
the Expert/Educator GUI. Three evaluations were done, and the first two
were done with experts. The purpose of the expert evaluations was to assess
whether the learning material was adequate for the learners and to evaluate the
usability of the prototypes, in particular whether the interfaces followed standard
visual design guidelines, whether the interfaces supported the user’s next step to
achieve the task, and whether the interfaces provided appropriate feedback. The
prototypes were evaluated using heuristic evaluation, expert review, and cognitive
walkthrough. Further evaluations were conducted with end users. The purpose was
to provide indications related to the pedagogical effectiveness of the prototypes
and to evaluate their usability. The methods were observation, think-aloud,
verbal protocols, and controlled experiment. The paper announces upcoming
analyses of a large-scale evaluation with 900 end users; the initial findings and
analysis are not included here. Their findings inform an expansion of usability
testing to also include timing and focus of users’ participation as well as system
performance during the execution of users’ tasks [3, p. 5-7].
[10] presents a virtual mentor system called MentorPal. In this empirical
study, the system gives career advice to high school students (n = 31) who are
attending STEM internships and considering STEM careers. They participated
in 3 sessions with MentorPal, in which they completed a pre-survey, interacted
with MentorPal for 25–30 min, and then completed a post-survey. Researchers
unobtrusively observed the students during usage and were available to help if
needed. The career advice focused on STEM careers in the Navy. The system
works as follows: the student asks one of four recorded, virtually represented
mentors about the mentor’s career via Free Text, Speech Input, or Topic
Buttons, to get a better understanding of how the career aligns with the
student’s interests and goals. MentorPal responds with the best-suited answer.
The primary pedagogical technique encouraged during recording of the mentors
was the use of anecdotes and narrative. MentorPal was developed with three
parameters in focus: Conversational Flow, Video-Chat Authenticity, and Low
Cost. MentorPal’s performance was evaluated through pre- and post-
surveys. Usability was evaluated with a Unified Theory of Acceptance and Use of
Technology (UTAUT) survey with 6 items on a 6-point Likert scale. To evaluate
change in attitude towards specific careers, a survey was generated from
variants of the CAPA Career Confidence Inventory and the CAPA Interest
Inventory, which resulted in respectively 50 items, based on the approximately
400 CAPA items, on a 5-point Likert scale. The results were tested and
evaluated through traditional classifier statistics. These were used to check
the direction of increases or decreases in model quality with 5-fold
cross-validation accuracy scores, and this was verified against subjective
quality assurance testing. Their findings were limited in sample size, sample
diversity, and impact, but one of the clear conclusions was that: “A panel of
four mentors (even one
hypothetically optimized through hindsight) is insufficient to cover either the
main career interests or diversity representation of even 31 students. So, future
research should investigate how students respond to self-reported or automati-
cally personalized panels drawn from a larger set of mentors representing broader
career choices and backgrounds” [10, p. 39].
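The 5-fold cross-validation accuracy scores mentioned above can be illustrated with a short sketch. It is not the procedure used in [10]: the one-dimensional nearest-centroid classifier is a stand-in for whatever model is being checked, and all function names and the toy data are assumptions made for the example. The data are split into five folds, each fold is held out once while the classifier is fitted on the rest, and the accuracy on the held-out fold is recorded.

```python
def k_fold_indices(n_items, k):
    """Split range(n_items) into k contiguous folds whose sizes differ by at most one."""
    if k < 2 or k > n_items:
        raise ValueError("k must be between 2 and the number of items")
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_centroids(xs, ys):
    """Stand-in classifier: the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def cross_val_accuracy(xs, ys, k=5):
    """Return one held-out accuracy score per fold."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        centroids = fit_centroids([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(predict(centroids, xs[i]) == ys[i] for i in fold)
        scores.append(correct / len(fold))
    return scores


# Toy data, invented for illustration: one feature and two interleaved classes,
# so that every contiguous fold contains both classes.
xs = [0.1, 0.9, 0.2, 0.8, 0.15, 0.85, 0.3, 0.7, 0.25, 0.75]
ys = ["a", "b"] * 5

scores = cross_val_accuracy(xs, ys, k=5)
print("per-fold accuracy:", scores)
print("mean accuracy:", sum(scores) / len(scores))
```

Comparing the mean of the per-fold scores before and after a change to the model gives the direction of the increase or decrease in quality. With a sample as small as 31 students the folds hold only six or seven items each, so individual fold scores are coarse.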
In addition, several papers mention the evaluation of perceived value.
[1,2,12,14,15] present the evaluation of perceived value as a method for
further informing the performance of LA. [2,12] use evaluation of perceived
value to evaluate adaptability and variability and to assess the usability of the
LA, [5] presents it to assess satisfaction levels with LA, [6] to estimate the
perceived level of cognitive load, [15] to assess the practical value of the LA,
and [1,8] developing the
3.5 C5) Evaluation of Pedagogical and Didactic Theory/Context
Three papers mentioned pedagogical theory as a contextual factor for their
studies, but none of the studies evaluated pedagogical or didactic theory in LA,
LAD, or frameworks. The three papers were the following. [3] had a second
iteration of expert evaluation, which involved 10 learning experts. As they
applied the TERENCE system to their prototype, they included a pedagogical
direction described as the pedagogical stimulation plan. The user evaluation,
with approximately 170 users, assessed the pedagogical effectiveness of the
prototypes, their usability, and whether expectations of the pedagogical
stimulation plan were met. This was done through observational methods,
think-aloud, verbal protocols, and controlled experiment. [1] reviewed other
works on ontologies that had a pedagogical approach. These were compared to
the adaptation of their own ontology to learning styles, but their own ontology
was not assessed on any pedagogical parameters. [5] used competency-based
learning to develop their CBGLA algorithm, but the study mentions neither how
CBGLA could or should be implemented in a pedagogical context nor how
CBGLA resulted in the development of users’ competencies.
4 Conclusion and Discussion
In this work-in-progress systematic literature review, we identified 12 relevant
papers, synthesized 10 empirical papers, and covered two reviews as part of the
introduction to establish the scope of the paper. The methods that directly
or indirectly contribute to the evaluation of LA or LAD of ALPs are grouped
into five categories: C1) evaluation of LA and LAD design and framework, C2)
evaluation of performance with LA and LAD, C3) evaluation of adaptivity
functions of the system, C4) evaluation of perceived value, and C5) evaluation
of pedagogical and didactic theory/context. Figure 2 shows the number of papers
covering the methods under the five categories as the central focus of their study.
Fig. 2. Distribution of evaluation categories
Pedagogical and didactic theory/context (C5) occurred as a theme in multiple
papers, but none of the papers evaluated the impact of an LA or LAD of an
ALP. LA and LAD are rarely examined in an educational context as learning
tools that inform students or educators in making pedagogical or didactic
choices framed by a pedagogical or didactic theory.
We found a lack of pedagogical theories and concepts such as motivation,
engagement, gamification, and nudging, to mention a few. For future studies,
we raise two questions: how do we improve learning and teaching quality with
LA if there is no learning theory attached to the data collection and
presentation? And how can LA and LAD lead to better learning or teaching if
there are no actions associated with the data, rather than just a presentation
of learning objectives’ difficulty, time spent on the platform, or active users?
Pedagogy and didactics need to be connected with LA and LAD of ALPs to
support teachers
and students as they focus on cognitive and meta-cognitive impact, behavioral
change, and social learning activities.
Broadly, we see assessments with ontologies, frameworks, methodologies,
experimental designs, mathematical models, and LA statistics, which are almost
all the building blocks of an LA. Only evaluations of the visualization and the
pedagogical elements are absent.
References
1. Abech, M., et al.: A model for learning objects adaptation in light of
mobile and context-aware computing. Pers. Ubiquit. Comput. 20(2), 167–
184 (2016). ISSN: 1617–4909.
2. Bresó, A., et al.: A novel approach to improve the planning of adaptive and
interactive sessions for the treatment of major depression. Int. J. Hum. Comput.
Stud. 87, 80–91 (2016). ISSN: 1095–9300.
3. Di Mascio, T., et al.: Design choices: affected by user feedback? Affected by system
performances? Lessons learned from the TERENCE project. In: ACM Interna-
tional Conference Proceeding Series, pp. 16–19, September 2013.
4. Hewitt-Taylor, J.: Use of constant comparative analysis in qualitative research.
Nurs. Stand. (Royal College of Nursing (Great Britain): 1987) 15, 39–42 (2001).
5. Hsu, W.-C., et al.: A competency-based guided-learning algorithm applied on
adaptively guiding e-learning. Interact. Learn. Environ. 23(1), 106–125 (2015).
ISSN: 1744–5191.
6. Asif Khawaja, M., et al.: Measuring cognitive load using linguistic features: impli-
cations for usability evaluation and adaptive interaction design. Int. J. Hum. Com-
put. Interact. 30(5), 343–368 (2014). ISSN: 1532–7590.
7. Martin, F., Dennen, V.P., Bonk, C.J.: A synthesis of systematic review research on
emerging learning environments and technologies. Educ. Technol. Res. Dev. 68(4),
1613–1633 (2020). ISSN: 1042–1629.
8. Mavroudi, A., et al.: Teacher-led design of an adaptive learning environment.
Interact. Learn. Environ. 24(8), 1996–2010 (2016). ISSN: 1049–4820. https://
9. Mousavinasab, E., et al.: Intelligent tutoring systems: a systematic review of char-
acteristics, applications, and evaluation methods. Interact. Learn. Environ. 29(1),
142–163 (2021).
10. Nye, B.D., et al.: Feasibility and usability of MentorPal, a framework
for rapid development of virtual mentors. J. Res. Technol. Educ. 53(1),
21–43 (2021).
11. Page, M.J., et al.: The PRISMA 2020 statement: an updated guideline for reporting
systematic reviews. BMJ 372, n71 (2021). ISSN: 1756–1833.
12. Santos, O.C., Boticario, J.G., Rodriguez-Sanchez, M.C.: Toward interactive
context-aware affective educational recommendations in computer-assisted
language learning. New Rev. Hypermedia Multimed. (2015). ISSN: 1361–4568.
13. Shemshack, A., Spector, J.M.: A systematic literature review of personalized learn-
ing terms. Smart Learn. Environ. 7(1), 1–20 (2020).
14. Tlili, A., et al.: Automatic modeling learner’s personality using learning analytics
approach in an intelligent Moodle learning platform. Interact. Learn. Environ.
(2019). ISSN: 1744–5191.
15. Zhang, L., et al.: Evaluation of a student-centered online one-to-one tutor-
ing system. Interact. Learn. Environ. 0(0), 1–19 (2021). https://doi.