Exploring the Application of Process Mining to
Support Self-Regulated Learning
An Initial Analysis with Video Lectures
Manuel Caeiro Rodríguez, Martín Llamas Nistal,
Fernando A. Mikic Fonte
Department of Telematic Engineering
University of Vigo, Vigo, Spain
{mcaeiro; martin; mikic}
Manuel Lama Penín, Manuel Mucientes Molina
C. Singular de Investigación en Tecnoloxías da Información
University of Santiago de Compostela, Spain
{manuel.lama; manuel.mucientes}
Abstract- Self-regulated learning involves students taking
responsibility for their own learning. Self-regulated learners
usually adopt a variety of learning strategies and behaviors, such as
performing forethought-performance-reflection cycles or working
regularly and in sequence over time, that eventually enable them
to achieve more significant and longer-lasting learning. In this
paper, we explore whether these behaviors and strategies can be
analyzed by applying process mining techniques to the events
registered during the performance of learning activities. The
discovery of the underlying processes followed by students can open
new approaches to studying the self-regulated strategies that
students actually use. The paper reviews the techniques and tools
available for the process mining of events related to self-regulated
learning and describes some initial works in this area. Furthermore,
as an initial empirical study, we analyze the process followed by
students when viewing the videos provided in a first-year
engineering subject. The results are studied in relation to the
grades obtained by the students. They show that the students who
obtained the best grades follow more varied routes than the students
who obtained the worst grades. In addition, the best students view
videos more regularly from week to week, mainly at the beginning of
the term, while the worst students view the videos mainly in the
second part of the term.
Keywords- Self-Regulated Learning, Process Mining, Event
Sequence Analysis
In recent years, the use of Information and Communication
Technologies (ICT) to support our working and social activities has
increased significantly and has brought about many changes. Teaching
and learning activities are no exception to this general trend. At
all levels, from kindergartens to higher education institutions,
teachers and students have adopted new tools and changed their
practices and approaches, in accordance with both the technology and
the pedagogy used. In any case, it is important to keep in mind that
these changes are driven not only by the advantages of the new
technologies, but also by changes in educational needs and goals:
what has to be learnt has changed. Nowadays, the acquisition of
purely theoretical knowledge is not recognized as the most valuable
outcome. On the contrary, the development of other human-related
competencies and skills, such as creativity, problem-solving,
critical thinking or teamwork, is demanded [1]. In this context, the
new educational goals demand more active and participative
pedagogical approaches that place the learner at the center of the
teaching and learning process. ICT seems well suited to many of the
needs of this active scenario, providing tools that allow greater
student participation. These tools can be used to actively involve
students in simulations, experiments, discussions, etc. Furthermore,
the new learning environments supported by technology can provide
other advantages, as they can be more flexible and support
personalization [2].
In any case, despite the changes in the technological and
learning domains, something remains: learning is produced inside the
mind of the learner. According to cognitive load theory, learning is
the result of a cognitive effort [3]. Often, the main responsibility
for learning is placed outside the learner: on the teachers, the
tools, the methods, etc. Nevertheless, particularly according to
Self-Regulated Learning (SRL) theory, the learner is the key person
in this process and his/her participation is essential for success.
Moreover, SRL also identifies some strategies that self-regulated
students usually adopt during their learning experiences [4]. Such
strategies are not fixed procedures, but some common sequences of
activities can generally be identified. They are not new recipes
but, to a large extent, well-known practices, such as mixing
different kinds of study activities with rest and leisure periods,
working on a daily basis, performing recall, self-assessment and
review activities, and keeping up motivation and self-reflection.
Usually, the most successful students perform these activities
naturally. The development of SRL strategies is particularly
important among first-year college students, because the appropriate
management of these strategies can make a big difference in the final
results.
The purpose of this paper is to analyze the possible use of
process mining techniques to explore students’ event logs and
check if the results obtained can be related to SRL. A basic idea
is to identify desirable behaviors. Process mining techniques
have been developed in the field of business processes to enable
the discovery of process models from event logs. They have
been very successful in different domains, including educational
contexts, where they are known as educational process mining [5]. In this
paper, we explore how these and other techniques related to the
analysis of activities and processes can be used to support the
study of SRL processes. In addition, an initial explorative
978-1-5386-2957-4/18/$31.00 ©2018 IEEE 17-20 April, 2018, Santa Cruz de Tenerife, Canar
Islands, Spain
2018 IEEE Global En
Education Conference (EDUCON)
Page 1772
analysis of events related to the visualization of video lectures in
a first-year degree course is also presented, showing how
process mining techniques can be applied.
The rest of the paper is organized as follows. The next section
introduces some of the main ideas of the SRL theory and
approach, focusing mainly on the intended plan of activities.
Then, section III introduces technologies used to analyze
sequences of events, particularly process mining. In addition,
some works about the use of process mining to analyze SRL in
students are reviewed. Then, in section IV we present the initial
analysis from the event log of our students. The paper ends with
an analysis of the results and conclusions in section VI.
Self-Regulated Learning (SRL) is a theory about learning
that draws on ideas and contributions from different fields [6]:
pedagogy, psychology, philosophy, etc. It considers learning to be a
high-level process involving many components: cognitive,
metacognitive, motivational, situational, behavioral, etc. A main
idea of SRL is that the responsibility for learning lies with the
learner. From this point of view, teachers have an important role in
providing learners with the appropriate tools, resources and
guidance, but learners have to assume that they are in control and
take the decisions that affect their own learning. It is a complex
trade-off, but this view focuses attention on the autonomy of the
student and on his/her freedom to organize, manage and perform
different kinds of activities over time.
There are different models of SRL, proposed by a variety of
authors. Among them, we regard Zimmerman and Pintrich as two of the
most important ones. In the words of Zimmerman, “Self-regulations
refer to self-generated thoughts, feelings and behaviors that are
oriented to attaining goals” [4]. Pintrich affirms that SRL is “an
active, constructive process whereby learners set goals for their
learning and then attempt to monitor, regulate, and control their
cognition, motivation, and behavior, guided and constrained by their
goals and the contextual features in the environment” [7]. Despite
differences among authors, SRL generally identifies a cycle of
activities that can be observed in self-regulated learners [8]:
forethought-performance-reflection. Of course, these activities can
be carried out in many different ways, because they are at a high
cognitive level. For example, forethought can be performed by writing
a plan on paper, organizing the weekly schedule, thinking about what
to do, sharing the study plan with a friend, etc. In addition,
self-regulated learners usually perform more, and more varied,
activities than other students, in many cases because they approach
different learning problems with different strategies.
The analysis of the activities performed by students, and of
the sequencing of those activities in accordance with SRL theory,
has been approached by several authors. The following studies have
been found in the literature:
In [9] activity transition graphs are used to analyze
differences among 8 students in the way in which they
regulated their learning over time. Learners who used
a high variety of activities and learners who
followed specific sequences were clearly
distinguished. This study does not conclude that
learners performing a specific sequence were more
successful, but it considers that learners who
perform more metacognitive activities get better
learning outcomes.
In [10] concordance analysis was used to conclude
that high and low performing students showed
differences in learning sequences. This study is
based on the analysis of frequencies and patterns of
self-regulatory activities. In particular, there were
clear differences in the performance of testing and
monitoring activities, which were performed in a
more flexible way by successful students.
More recently, in [11], students’ activities captured in event
logs are analyzed with process mining techniques to discover SRL
processes and temporal patterns. The processes, patterns and
frequencies of the most successful and least successful students
were analyzed to identify differences between them. The activities
were classified in accordance with high-level SRL activities:
Metacognition, such as Orientation, Planning, Goal Setting,
Evaluation and Monitoring; Cognition, such as Reading, Repeating,
Search and Elaboration; Organization and Motivation. Events
corresponding to these activity types were collected. The process
mining techniques and tools used were Fuzzy Miner and ProM, cf.
section III.A. ProM was used to check the conformance of the event
logs to some proposed process models, and a Fitness metric was
defined to measure the similarity of a set of traces to such a
reference model. In addition, differences in the frequencies of SRL
events were also studied. A main finding of this study was that
successful students show more learning and regulation events. In
particular, the temporal patterns of students’ spontaneous learning
steps differed between the two groups of students. For successful
students, more regulation event types are identified: orientation
and planning activities occur before information processing
activities. Furthermore, these students also constantly monitor
different learning events and perform evaluation activities. In
general, the process model of successful students corresponds
well to current theories of SRL. On the contrary, the temporal
patterns of less successful students resemble a surface approach
to learning. Preparation and evaluation activities are partly
missing, and repeating activities is more frequent than
performing different types of activities aimed at a deeper
processing of the information. In this way, this study
demonstrates how theoretical models and assumptions about
SRL can be tested with process mining methods.
Despite these studies, the identification of patterns of
activities that can be used as predictors or indicators of SRL has
not been established. Students show a variety of behaviors
regarding the sequential organization of their learning and
regulation activities. In general, this is a complex challenge
because it depends on the pedagogical approach (behaviorism,
constructivism), on the educational context (individual,
collaborative), on individual preferences, etc. In addition,
activities can be considered at different levels of granularity,
from micro to macro activities, and these differences are difficult
to capture in event logs. It is also complex to establish a
correspondence between micro and macro activities.
In any case, although there is no assumption about desired or
preferred activities, some ideas about the sequencing of activities
are generally accepted. For example, spreading activities over time
by working on a weekly basis is considered good practice. It is also
generally assumed that SRL students have some kind of plan for action
and perform their activities in accordance with that plan, which
involves different kinds of activities:
forethought-performance-reflection. As a result, another generally
accepted idea is that students who get better marks usually follow an
SRL approach characterized by the performance of certain activities,
while students who perform badly carry out a different set of
activities [11].
During the performance of any teaching and learning process
many events are produced. Indeed, large datasets involving a
sequence of events generated in a specific order can be available
and can be analyzed using specific techniques developed in the
data analytics field [12].
In this paper, we focus the attention on the analysis of event
logs related to the activities performed during teaching and
learning processes. Different methods for sequential and
temporal analysis of SRL data can be found in the literature [13,
14]. Nevertheless, in recent years new methods that take process
models into account have been developed [5]: Process
Mining (PM), Sequence Pattern Mining (SPM), Intention
Mining (IM) and Graph Mining (GM).
A. Process Mining
Process Mining (PM) is one of the techniques that can be
used to analyze event sequences and provide indicators of
interest related to them. More specifically, the main goal of
process mining is to discover, monitor and improve real processes by
extracting knowledge from the event logs they produce [15, 16]. The
main difference between process mining and other techniques is the
assumption of a latent or implicit model. The concept of process
model is very abstract, involving activities and transitions, but it
does not refer to any kind of cognitive construct. The models are
obtained from events collected during the actual execution of the
processes, whether implicit or explicit. In this way, PM provides
insights into what is actually happening.
The application of PM techniques requires the registration of
events related to some activity and to a particular case (e.g. a
learner’s experience during a course). An event log is a collection
of cases, i.e. events from different students, each of which can be
seen as a sequence of events. Generally, event logs include
additional information about the process and the context in which
they are generated, and this information can be used when applying
the techniques: Resource, the student who participates in the
teaching/learning activity and whose events are tracked; Timestamp,
the indication of when a student starts and finishes an activity,
such as viewing a video; and any other data that may be relevant,
such as the grades obtained during the process, the device used, etc.
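As an illustration of the event log structure just described, the following minimal Python sketch groups raw events into one time-ordered trace per case. The `Event` fields, the `build_traces` helper and the sample events are assumptions introduced here for illustration; they are not taken from the study's data or tooling.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    case_id: str         # the case, e.g. one student's experience in a course
    activity: str        # what happened, e.g. the video that was watched
    timestamp: datetime  # when it happened


def build_traces(events):
    """Group events by case and order each case by time: one trace per case."""
    by_case = defaultdict(list)
    for event in events:
        by_case[event.case_id].append(event)
    return {
        case_id: [e.activity for e in sorted(case_events, key=lambda e: e.timestamp)]
        for case_id, case_events in by_case.items()
    }


# Hypothetical events, invented for illustration only.
log = [
    Event("s2", "video_A", datetime(2013, 11, 17, 20, 5)),
    Event("s1", "video_B", datetime(2013, 11, 18, 23, 55)),
    Event("s1", "video_A", datetime(2013, 11, 16, 23, 30)),
]
traces = build_traces(log)
print(traces)  # {'s2': ['video_A'], 's1': ['video_A', 'video_B']}
```

Additional attributes such as grades or the device used could be added as further fields of the hypothetical `Event` record.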
PM includes a variety of techniques for different purposes:
automatic discovery of processes, model conformance
verification, social network and organizational mining, automatic
building of simulation models, prediction of cases,
recommendations based on historic data, etc. Fig. 1 shows the
three main types of process mining techniques [17]:
Discovery involves the production of a model from
an event log without using any a priori
information. Generally, Petri nets are used as the
reference notation for the models, but other
notations are also in use: Business Process Model
and Notation (BPMN), Event-driven Process Chain
(EPC) or Unified Modeling Language (UML) activity
diagrams. Furthermore, in some cases other types of
diagrams are used to represent the social networks
associated with process models.
Model conformance verification involves the
verification of an existing process model by
comparing an event log with it. This can be used to
check whether reality, as recorded in an event log,
conforms to the model. Different types of models
can be considered: procedural, organizational,
declarative process, business rules/policies, etc.
Enhancement involves the extension or improvement
of an existing process model using the information
contained in an event log. The analysis can
identify possible problems, such as bottlenecks or
parts of the process model that are never
activated. This information can be used to improve
the process model and alleviate these issues.
Fig. 1. Main components and interactions involved in PM [17]
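The difference between discovery and conformance verification can be sketched with a deliberately simplified model. In the Python sketch below, the "model" is just a directly-follows graph rather than a Petri net, and the fitness measure (the share of traces whose every step is an arc of the model) is an assumption made here for illustration, not the metric used by ProM or in [11]. The traces are invented.

```python
from collections import Counter


def discover_dfg(traces):
    """Discovery: count how often activity a is directly followed by activity b."""
    dfg = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def trace_fitness(traces, allowed_arcs):
    """Conformance: share of traces whose every step is an arc of the model."""
    if not traces:
        return 0.0
    fitting = sum(
        all(step in allowed_arcs for step in zip(trace, trace[1:]))
        for trace in traces
    )
    return fitting / len(traces)


# Hypothetical traces, invented for illustration only.
reference_log = [["plan", "read", "review"], ["plan", "read", "read", "review"]]
model = discover_dfg(reference_log)

observed_log = [["plan", "read", "review"], ["read", "plan", "review"]]
print(model[("plan", "read")])             # 2
print(trace_fitness(observed_log, model))  # 0.5
```

In the same simplified setting, enhancement would correspond to inspecting arcs of the model that the observed log never uses.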
Several different algorithms are available to perform the
techniques related to process mining [5]:
The α-algorithm can be used to produce a Petri net
from an event log and it can deal with concurrency
[17]. It is simple, but many of its fundamentals have
been embedded in more complex and robust
techniques. The input of the α-algorithm is an
event log (L) over a set of activities, each trace
being a sequence (<a1, a2,…, an>). The algorithm
consists of 8 steps: (i) each activity in the event
log corresponds to a transition; (ii) find the set
of start activities, i.e. the first elements of any
sequence; (iii) find the set of end activities, i.e.
the last elements of any sequence; (iv) find pairs
of sets of activities, i.e. (A, B), such that every
element of A and every element of B are causally
related; (v) delete all previously found pairs
(A, B) that are not maximal; (vi) each remaining
pair (A, B) is a place (P) in the Petri net; add an
initial and a final place; (vii) connect each place
(P) by an arc with each element of its set A of
source transitions and with each element of its set
B of target transitions; in addition, add an arc
from the initial place to each start transition and
another arc from each end transition to the final
place; and (viii) the final model is made up of all
the places, transitions and arcs defined.
The Fuzzy Miner algorithm uses event log data to
generate a complete model made up of nodes
(activities) and edges (relations between activities)
by taking the relative importance and the temporal
order of all events into account. The algorithm uses
two basic metrics: significance and correlation
[18]. Note that they do not directly correspond to
the well-known statistical measures of the same
name. Significance measures the relative importance
of the occurrence of events and of relations among
events. More frequent events are assessed as more
significant. Correlation is calculated only for edges,
indicating how closely related two events that
follow one another are. As a final step, the
following rules are applied to simplify the model by
deciding which nodes and edges are included in the
process model: events that are highly significant
are preserved; events that are less significant but
highly correlated are aggregated; and events that
are less significant and lowly correlated are
abstracted. The model simplification can be
influenced through parameter settings, for example
cut-off values. Similarly, edge filtering is also
used to bring structure to the model.
The Heuristic Miner algorithm uses likelihood,
calculating the frequencies of the relations among
tasks (e.g., causal dependency, loops, etc.) and
constructing tables and graphs with the
dependency/frequency data. This algorithm has a low
sensitivity to noise and incompleteness in logs.
Genetic algorithms provide models built on
causal matrices with input and output dependencies
for each activity. These algorithms behave well in
the presence of problems such as noise, incomplete
data, non-free-choice constructs, hidden activities,
concurrency and duplicate activities. Among these
algorithms, ProDiGen [19] is the one that presents
the best results for both structured and
unstructured processes. It guides its search towards
complete, precise and simple models, using a
hierarchical fitness function that takes into
account completeness, precision and simplicity, and
it uses heuristics to optimize the genetic operators
(a crossover operator that selects the crossover
point from a Probability Density Function generated
from the errors of the mined model, and a mutation
operator guided by the causal dependencies of the
log).
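The eight steps of the α-algorithm listed above can be sketched in Python as follows. This is a minimal illustration for small logs (it enumerates all subsets of activities, which is exponential), not a production implementation. As in the standard formulation of the algorithm, step (iv) here also requires that the activities inside A, and inside B, never directly follow one another. The function name, the place labels "start" and "end", and the sample log are assumptions introduced for illustration.

```python
from itertools import chain, combinations


def alpha_miner(log):
    """Return (transitions, places, arcs) of the Petri net mined from a list of traces."""
    transitions = sorted({a for trace in log for a in trace})   # step i
    starts = {trace[0] for trace in log if trace}               # step ii
    ends = {trace[-1] for trace in log if trace}                # step iii

    follows = {(a, b) for trace in log for a, b in zip(trace, trace[1:])}
    causal = {(a, b) for (a, b) in follows if (b, a) not in follows}

    def unrelated(group):
        # No activity of the group directly follows another one (or itself).
        return all((a, b) not in follows for a in group for b in group)

    def subsets(items):
        return chain.from_iterable(
            combinations(items, n) for n in range(1, len(items) + 1)
        )

    candidates = [                                              # step iv
        (frozenset(A), frozenset(B))
        for A in subsets(transitions) if unrelated(A)
        for B in subsets(transitions) if unrelated(B)
        if all((a, b) in causal for a in A for b in B)
    ]
    maximal = [                                                 # step v
        (A, B) for (A, B) in candidates
        if not any((A, B) != (C, D) and A <= C and B <= D for (C, D) in candidates)
    ]

    # Step vi: one place per maximal pair, plus an initial and a final place.
    # The labels "start" and "end" assume no activity uses those names.
    places = ["start"] + maximal + ["end"]
    arcs = [("start", t) for t in sorted(starts)]               # step vii
    for place in maximal:
        A, B = place
        arcs += [(a, place) for a in sorted(A)] + [(place, b) for b in sorted(B)]
    arcs += [(t, "end") for t in sorted(ends)]
    return transitions, places, arcs                            # step viii


# Hypothetical log with three traces over activities a..e.
transitions, places, arcs = alpha_miner([list("abcd"), list("acbd"), list("aed")])
print(transitions)  # ['a', 'b', 'c', 'd', 'e']
print(len(places))  # 6
```

In this sample log, b and c appear in both orders, so they are treated as concurrent and end up in different places, while e is an alternative to both.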
Two of the best-known software tools for Process Mining are Disco
and ProM. Disco is based on Fuzzy Miner, but it has been further
developed in many ways. ProM has more than 600 available plugins
and dozens of model types. This software package is maintained at
the University of Eindhoven and provides access to a large range of
process mining and analysis tools. Other tools are [20]: Celonis
Discovery, Perceptive Process Mining, QPR ProcessAnalyzer, Aris
Business Process Analysis, Fujitsu Process Analytics, XMAnalyzer,
StereoLOGIC Discovery Analyst and the ProDiGen platform. In
particular, the ProDiGen platform [21] includes algorithms for
automatically extracting frequent and infrequent behavioral patterns
from the process models obtained by the discovery algorithm [22].
In recent years, process mining has been increasingly
used in research on the use of ICT to support teaching and
learning activities [5, 23]. In general, it is assumed that the
events produced by a teaching or learning experience correspond to
one or more processes, which can be linked to some kind of implicit
or latent process model.
B. Sequence Pattern Mining
Sequence pattern analysis involves the study of event
sequences to identify patterns or indicators of interest [24]. It
provides tools to automatically obtain models based on
sequences of events, such as the events produced during a
teaching and learning process. The models obtained can be used
to replicate any possible sequence of events produced during the
process. Moreover, metrics which assess the skills using the
obtained models can also be developed, indicating for example
if a student is more or less self-regulated.
There are different methods to analyze event series,
depending on the goal:
If the goal is to measure a sequence’s dependency on
a past interval of the sequence, time series methods
should be applied.
If the goal is to classify sequences in accordance
with some scheme of categories, Markovian methods are
usually involved, aiming to fit sequences of
categories by estimating transition probabilities.
If the goal is to discover whether a transition from
one particular state to another occurs, and the time
taken to do so, event history methods can be used
[26].
Sequence Pattern Mining (SPM) is usually applied for the
discovery of common sub-sequences, namely, to find whether any
specific order of events occurs [24, 27, 28]. Some SPM
techniques are Lag Sequential Analysis (LAS), t-pattern analysis
and Markov models. It is very important to be aware that the
application of these techniques is recognized as technically
feasible for short series of events, but if a whole process needs to
be analyzed, a different kind of approach is required [29]. For
example, the application of LAS has already been described in the
psychological literature [30]. That work applies the method to a
sequence view of events, rather than a process view. In this case,
the amount of data required grows at such a rate that statistical
testing is usually not possible on any sequence longer than two or
three events.
SPM techniques can be related to Episode Mining (EP).
The difference between the two methods is that in SPM the goal is to
identify the most frequent event patterns in a set of event
sequences, whereas in EP the goal is to discover the most
frequently used event patterns within a single event sequence.
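The distinction between the two goals can be illustrated with a small Python sketch. It is restricted to contiguous patterns of a fixed length (n-grams) for brevity, whereas real SPM and episode mining algorithms usually also allow gaps or time windows; the function names, thresholds and sample sequences are assumptions made for illustration.

```python
from collections import Counter


def ngrams(sequence, n):
    """All contiguous sub-sequences of length n, in order of appearance."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def frequent_patterns(sequences, n, min_support):
    """SPM view: patterns that occur in at least min_support different sequences."""
    support = Counter()
    for sequence in sequences:
        support.update(set(ngrams(sequence, n)))  # count each sequence once
    return {p: s for p, s in support.items() if s >= min_support}


def frequent_episodes(sequence, n, min_count):
    """Episode view: patterns that occur at least min_count times in one sequence."""
    counts = Counter(ngrams(sequence, n))
    return {p: c for p, c in counts.items() if c >= min_count}


# Hypothetical sequences of learning events, invented for illustration only.
sequences = [
    ["plan", "read", "test", "read", "test"],
    ["read", "test", "review"],
    ["plan", "read", "review"],
]
patterns = frequent_patterns(sequences, 2, min_support=2)
episodes = frequent_episodes(sequences[0], 2, min_count=2)
print(patterns)  # the pairs (plan, read) and (read, test), each with support 2
print(episodes)  # {('read', 'test'): 2}
```

Note that the pair (read, test) occurs twice in the first sequence but contributes only once to its support across sequences.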
C. Intention Mining
Intention Mining (IM) is a newly emerged field related to the
analysis of event sequences. Nevertheless, it does not focus
directly on the processes of activities, but on the reasoning
behind such activities [31]. This is especially interesting in
the case of SRL processes, because of the importance of the
forethought-performance-reflection cycle. Nevertheless, to the best
of our knowledge, these techniques have not yet been applied to
learning and teaching processes.
D. Graph Mining
The goal of Graph Mining (GM) is to find all frequent subgraphs
in a large graph or collection of graphs. This can be
related to PM if process graphs are considered. Nevertheless, the
approach is quite different: GM follows a geometry-oriented
approach, trying to find topological substructures in graph
data [32].
We have carried out an initial empirical study that tries to relate
the application of process mining techniques to the development
of SRL behaviors. For this study, we use data obtained from a
kind of flipped learning experience, in which students had the
opportunity to watch videos of recorded lectures as a
complement to other teaching and learning activities. In this
way, events related to video viewing were collected.
Using this data, we try to investigate if students’ activity related
to video visualizations is produced in accordance to certain
The study was performed with data collected from students
of a first-year bachelor’s degree course in Computer Architecture
in the academic year 2013/2014. A kind of flipped-classroom
approach was implemented in this course, and students were asked to
watch the recorded video lectures as a complement to the
traditional classroom activities, which also involved lectures and
problem solving. A total of 180 students were involved, of whom 80
passed the final exam and 100 failed.
Students were asked to watch the videos over the term in a
specific order, in accordance with the development of the
lectures in the classroom. A total of 21 videos were available.
Nevertheless, no control was established to ensure or force students
to watch the videos. As a result, students had complete freedom:
some students watched all the videos while others did not watch
any. In addition, the order in which they watched the videos was
completely free.
Data about the time at which each student viewed a video
was collected. TABLE 1 shows an extract of the data set. The
total number of logged events corresponding to video views
is 3,161. Data was pseudonymized using keys for students.
The analysis of the data required some initial processing.
Initially, the logs for each video were stored in separate files,
and they were joined into a single document for processing. Next,
consecutive views of the same video by the same student were
removed in order to clarify the results of the analysis. For
example, the sequence "ADMmasEjercicios (25/11/13 20:31)
ADMmasEjercicios (25/11/13 20:35) ADMmasEjercicios (25/11/13
20:48)" was reduced to "ADMmasEjercicios (25/11/13 20:31)
ADMmasEjercicios (25/11/13 20:48)".
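The exact reduction rule is not spelled out in the text. The Python sketch below assumes one possible reading that reproduces the example above: within each run of consecutive views of the same video by one student, only the first and the last event are kept. Both the rule and the `collapse_runs` helper are assumptions made for illustration; other readings (for instance, dropping views that follow the previous one within a few minutes) would also match the example.

```python
from itertools import groupby


def collapse_runs(events):
    """Keep only the first and last event of each run of the same video.

    `events` is one student's time-ordered list of (video, timestamp) pairs.
    """
    reduced = []
    for _, run in groupby(events, key=lambda event: event[0]):
        run = list(run)
        reduced.append(run[0])
        if len(run) > 1:
            reduced.append(run[-1])
    return reduced


events = [
    ("ADMmasEjercicios", "25/11/13 20:31"),
    ("ADMmasEjercicios", "25/11/13 20:35"),
    ("ADMmasEjercicios", "25/11/13 20:48"),
]
reduced = collapse_runs(events)
print(reduced)
# [('ADMmasEjercicios', '25/11/13 20:31'), ('ADMmasEjercicios', '25/11/13 20:48')]
```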
Students were divided into two groups: Q1, the most
successful students, who obtained a mark in the final exam greater
than 7.5; and Q4, the least successful ones, with marks
between 0 and 2.5. The assumption is that the most successful
students watched the videos in the order in which they were
proposed and over the whole term. On the contrary, the least
successful students would watch the videos randomly and
mainly at the end of the term. We wanted to test these theoretical
assumptions by analyzing the temporal patterns obtained from
our data sets.
TABLE 1. Extract of the data set
Student ID   Date & Time               Video
101092       16 November 2013 23:30    Algorítmez (2/4)
101127       17 November 2013 20:05    Algorítmez (1/4)
101072       18 November 2013 18:26    ADM y más ejercicios
101167       18 November 2013 23:55    Algorítmez (1/4)
101167       18 November 2013 23:55    Algorítmez (2/4)
101167       19 November 2013 0:06     Algorítmez (1/4)
101167       19 November 2013 0:08     Algorítmez (2/4)
101116       19 November 2013 0:40     ADM y más ejercicios
101026       19 November 2013 12:29    Algorítmez (2/4)
101058       20 November 2013 20:33    Algorítmez (2/4)
C. Process Analysis
Figures 2 and 3 show the process models mined from the events of
the Q1 and Q4 students, grouped according to the final marks
obtained. These models, obtained through the ProDiGen platform, show
the routes of activities performed by students, in our case the
viewing of recorded lectures. A color notation is used, as in a heat
map, to represent how often each activity was performed: darker
colors identify the activities performed more often, and lighter
colors the less frequent ones. The numbers on the arrows indicate the
number of students that followed that route.
Notice that in Q1 there are more than 50 traces that go from
the start of the process to other activities, whereas in Q4 there
are just 17. Clearly, the number of students that obtained a good
mark was much larger than the number that obtained a bad mark.
Fig. 2. Process map mined from the events of the (Q1) most successful students
A second observation from the diagrams is that in Q1
there is a greater variety of routes than in Q4. This is a bit
tricky, because what really happens is that the number of samples
available in Q4 is much smaller than in Q1, indeed much less than
one fourth. As a result, the number of alternative routes in Q4 is
smaller. In any case, there are some parts of the Q4 process map
where many alternative routes can be found:
“EjerciciosPuntoFlotante” and “ADMmasEjercicios”. These
videos are lectures on solving problems about key topics of the
subject. These alternatives can also be seen for the Q1 students.
From our knowledge of the subject, it is clear that students are very
interested in these activities. In conclusion, it could be
considered that when students are interested in some activity, they
reach it by many different routes, maybe visiting and revisiting it
after working on different topics.
Another point in which the Q1 process map differs from the Q4
process map is that some sequences can be clearly identified.
For example: “Algoritmez34”-“Algoritmez44”-“Punto-Frotante”
and “ModelovonNewmann14”-“ModelovonN...24”-
“ModelovonNeumann34”-“Simplez412”. This sequencing matches the
order in which the lectures were provided during the term. In the
case of the Q4 students, these sequences cannot be observed. This
is a kind of confirmation of our hypothesis: “most successful
students watch the videos in the order in which they were proposed
over the whole term”.
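The hypothesis about viewing order could also be checked numerically rather than by visual inspection. The sketch below is one possible indicator, not a measure used in this study: the share of consecutive video views that move to the next lecture in the proposed order. The function name `in_order_ratio` and the lecture labels are hypothetical.

```python
def in_order_ratio(trace, lecture_order):
    """Share of consecutive views that advance to the next lecture.

    trace: list of video names in the order a student watched them.
    lecture_order: list of video names in the order they were proposed.
    Returns a float in [0, 1], or None if the trace has no usable step.
    """
    position = {name: i for i, name in enumerate(lecture_order)}
    steps = [
        (position[a], position[b])
        for a, b in zip(trace, trace[1:])
        if a in position and b in position
    ]
    if not steps:
        return None
    forward = sum(1 for i, j in steps if j == i + 1)
    return forward / len(steps)

# Hypothetical lecture order and traces, for illustration only.
order = ["L1", "L2", "L3", "L4"]
sequential = in_order_ratio(["L1", "L2", "L3", "L4"], order)
scattered = in_order_ratio(["L3", "L1", "L2", "L4"], order)
print(sequential, scattered)
```

Averaging this ratio per quartile would give a single figure to compare Q1 and Q4, although with as few Q4 traces as reported here it should be read cautiously.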
Fig. 3. Process map mined from the events of the (Q4) least successful students
D. Dotted Chart Analysis
The previous diagram type provides a view of the activities
performed and the order in which they were performed.
Nevertheless, we were also interested in the time at which
activities took place, and in their distribution over time. To
this end we performed a dotted chart analysis. The dotted chart is
similar to a Gantt chart. In this kind of diagram a dot is plotted
for each event, showing the spread of events over time. The time
perspective can be observed along the X axis, while the event
types are represented by different colors or shapes of the dots.
Users or cases are represented along the Y axis. To generate this
kind of diagram we used the Dotted Chart Analysis tool available
in ProM [33].
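To make the construction of a dotted chart concrete, the following sketch computes the coordinates of the dots from an event log: days since the first event on the X axis, one row per student on the Y axis, and the activity name as the attribute that would select the dot color. It is a simplified illustration and not the ProM implementation; the function name `dotted_chart_points` and the sample timestamps are assumptions.

```python
from datetime import datetime

def dotted_chart_points(events):
    """Turn an event log into (x, y, activity) dots.

    events: iterable of (student_id, ISO-8601 timestamp, activity).
    x is the time in days since the earliest event in the log.
    y is the student's row, with rows sorted by each student's first event.
    """
    parsed = [(s, datetime.fromisoformat(ts), a) for s, ts, a in events]
    start = min(ts for _, ts, _ in parsed)

    # Earliest event per student, used to order the rows.
    first_seen = {}
    for s, ts, _ in parsed:
        if s not in first_seen or ts < first_seen[s]:
            first_seen[s] = ts
    ordered = sorted(first_seen, key=lambda s: (first_seen[s], s))
    rows = {s: i for i, s in enumerate(ordered)}

    return sorted(
        ((ts - start).total_seconds() / 86400, rows[s], a)
        for s, ts, a in parsed
    )

# Hypothetical events, for illustration only.
sample = [
    ("s1", "2017-09-11T10:00:00", "VideoA"),
    ("s2", "2017-09-12T10:00:00", "VideoA"),
    ("s1", "2017-09-18T10:00:00", "VideoB"),
]
points = dotted_chart_points(sample)
print(points)
```

The resulting list can be passed to any scatter-plot routine, mapping the third element of each tuple to a color or marker shape.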
978-1-5386-2957-4/18/$31.00 ©2018 IEEE
17-20 April, 2018, Santa Cruz de Tenerife, Canary Islands, Spain
2018 IEEE Global Engineering Education Conference (EDUCON)
Page 1777
Figures 4 and 5 show the dotted charts obtained for the Q1
and Q4 students. A main difference between the two figures is the
number of students included. While 43 different students are
represented in Fig. 4, just 13 out of 45 are represented in Fig. 5.
From the analysis of the two charts we can see that Q1 students
watched videos mainly during the first part of the course, until
mid-November, while Q4 students watched videos over the whole
term. This can be considered to contradict our initial
hypothesis, because we assumed that the best students would
watch the videos over the whole term. A different
explanation, in view of the results obtained, could be that the
most diligent students begin their “homework” earlier, watching
the videos from the beginning of the course, and then
decide whether the videos are a valuable complement to their
study. By contrast, the least diligent students watch the videos
later in the term.
Fig. 4. Dotted chart mined from the events of the (Q1) most successful students
Fig. 5. Dotted chart from the events of the (Q4) least successful students
Another observation from the dotted charts is that Q1
students watch videos more frequently than Q4 students: roughly on
a weekly basis for Q1 and once every two weeks for Q4. In any
case, this is a broad observation and a more detailed and precise
study needs to be performed.
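One way such a more precise study could quantify viewing regularity is the median number of days between consecutive viewing days for each student. The sketch below illustrates this idea only; the function name `median_gap_days` and the dates are hypothetical and do not come from the study data.

```python
from datetime import date
from statistics import median

def median_gap_days(view_dates):
    """Median gap, in days, between consecutive distinct viewing days.

    view_dates: iterable of datetime.date objects, one per viewing event.
    Returns None when fewer than two distinct days are available.
    """
    days = sorted(set(view_dates))
    if len(days) < 2:
        return None
    gaps = [(b - a).days for a, b in zip(days, days[1:])]
    return median(gaps)

# Hypothetical viewing days, for illustration only.
weekly = median_gap_days(
    [date(2017, 9, 4), date(2017, 9, 11), date(2017, 9, 18), date(2017, 9, 25)]
)
biweekly = median_gap_days(
    [date(2017, 9, 4), date(2017, 9, 18), date(2017, 10, 2)]
)
print(weekly, biweekly)
```

Comparing the distribution of this value between Q1 and Q4 would turn the "weekly versus every two weeks" impression into a measurable indicator.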
Process mining comprises a broad set of algorithms and
techniques for sequence and temporal data analysis. This paper
provides a brief introduction to this field and considers the
application of these techniques to analyze students’ behavior in
accordance with SRL theory. This is a very initial study, but it
seems clear that the approach can provide many results and that
few works have been carried out in this direction to date.
Process mining is a quite recent discipline in which new
algorithms and techniques are being developed continuously.
Nowadays, it is attracting the attention of researchers in many
domains, including the e-learning area, to identify and analyze
the processes, explicit or latent, followed during the development
of different activities. When people are involved, processes are
usually not fixed and deterministic, but a view of what happens
in the real world can help to understand what is happening and
in which ways reality differs from what we plan or expect.
In this way, process mining can be a very useful tool and play
a key role in the support of human activities.
Similarly, SRL theory has been developed and formulated
during recent years. There are no magic recipes or clear
specifications of what learners have to do to become
successful SRL students. Nevertheless, the theory clearly
identifies high-level activities and a cycle related to the
behavior of these students: forethought-performance-reflection. It
would be very interesting if some kind of technology, such as
process mining, could provide indicators of whether a
particular student is behaving more or less in an SRL way. This
could help to identify students that need support and to provide
some recommendations.
In this paper, we have explored these general ideas. The
analyses provided do not offer any conclusive results, but they
show the complexities underlying this approach. Process maps
and dotted charts offer two different views of the activities of
the students and, although they do not provide clear
answers, both of them can help us to understand what is
happening. We think this information is valuable and plan to
continue working towards the definition of clearer indicators
and references.
A main concern about this paper is whether using micro-
level tasks such as video visualizations is enough to identify the
metacognitive activities underlying SRL. It is not clear whether
video visualizations really reflect the learning plan of the students.
This piece of research is supported by the research network
TELGalicia 3.0 (ED431D 2017/12) funded by the Galician
Regional Government.
[1] O. H. Graven and L. MacKinnon. “A survey of current
state-of-the-art support for lifelong learning”. 6th International
Conference on Information Technology Based Higher Education
and Training 2005, Santo Domingo, Dominican Republic, 2005,
pp. 9-9. ITHET '05. IEEE, F2C/19-F2C/25.
[2] D. Dagger, A. O’Connor, S. Lawless, E. Walsh, and V. P.
Wade. “Service-oriented e-learning platforms: From monolithic
systems to flexible services”. IEEE Internet Computing, vol. 11,
no. 3, 2007.
[3] J. Sweller. “Cognitive load theory, learning difficulty, and
instructional design”. Learning and Instruction, vol. 4, no. 4,
1994, pp. 295-312.
[4] B. J. Zimmerman. “Becoming a Self-Regulated Learner: An
Overview”. Theory Into Practice, vol. 41, no. 2, 2002, pp. 64-70.
[5] A. Bogarín, R. Cerezo, and C. Romero. “A survey on
educational process mining”. Wiley Interdisciplinary Reviews:
Data Mining and Knowledge Discovery, 2017.
[6] M. Boekaerts. “Self-regulated learning: a new concept embraced
by researchers, policy makers, educators, teachers, and
students”. Learning and Instruction, vol. 7, no. 2, 1997, pp. 161-
[7] P. Pintrich. “A Conceptual Framework for Assessing Student
Motivation and Self-Regulated Learning in College Students”.
Educational Psychology Review, vol. 16, no. 4, 2004.
[8] B. J. Zimmerman. “Attaining self-regulation: a social cognitive
perspective”. In M. Boekaerts, P. R. Pintrich, and M. Zeidner
(Eds.) “Handbook of self-regulation”, 2000, pp. 13-39, San
Diego, Academic.
[9] A. F. Hadwin, J. C. Nesbit, D. Jamieson-Noel, J. Code and P. H.
Winne. “Examining trace data to explore self-regulated learning”.
Metacognition and Learning, vol. 2, no. 2, 2007, pp. 107-124.
[10] F. de Jong. “Task and student dependency in using
self-regulation activities: consequences for process-oriented
instruction”. In F. de Jong and B. Van Hout-Wolters (Eds.).
“Process-oriented instruction: Verbal and pictorial aid and
comprehension strategies”, 1994, pp. 87-99, Amsterdam: VU
University Press.
[11] M. Bannert, P. Reimann, and C. Sonnenberg. “Process mining
techniques for analysing patterns and strategies in students’
self-regulated learning”. Metacognition and Learning, vol. 9, no. 2,
2014, pp. 161-185.
[12] C. Romero and S. Ventura. “Educational data science in massive
open online courses”. WIREs Data Mining and Knowledge
Discovery, vol. 7, 2017, pp. 1-12.
[13] P. M. Sanderson and C. Fisher. “Exploratory sequential data
analysis: foundations”. Human-Computer Interaction, vol. 9,
1994, pp. 251-317.
[14] A. Langley. “Strategies for theorizing from process data”.
Academy of Management Review, vol. 24, no.
4, 1999, pp. 691-710.
[15] W. M. P. Van der Aalst, A. Adriansyah, A. K. A. De Medeiros,
F. Arcieri, T. Baier, T. Blickle, and A. Burattin. “Process mining
manifesto”. International Conference on Business Process
Management, BPM’11, 2011, pp. 169-194, Berlin, Germany:
Springer Heidelberg.
[16] C. Romero, S. Ventura, M. Pechenizkiy and R. Baker (Eds.).
Handbook of educational data mining. 2010. Boca Raton:
[17] W. M. P. Van der Aalst. “Process mining: Discovery,
conformance and enhancement of business processes”, 2010.
Berlin, Germany: Springer-Verlag.
[18] C. Günther and W. M. P. Van der Aalst. “Fuzzy Mining: adaptive
process simplification based on multi-perspective metrics”. In
G. Alonso, P. Dadam, and M. Rosemann (Eds.) International
Conference on Business Process Management, BPM’07, 2007,
pp. 328-343. Berlin, Germany: Springer.
[19] B. Vázquez-Barreiros, M. Mucientes, M. Lama. “ProDiGen:
Mining complete, precise and minimal structure process models
with a genetic algorithm”. Information Sciences, vol. 294, 2015,
[20] W. M. P. Van der Aalst. “Process Mining: Data science in
action”, 2016, Berlin, Germany: Springer.
[22] D. Chapela, M. Mucientes, and M. Lama. “Discovering
Infrequent Behavioral Patterns in Process Models”. 15th
International Conference on Business Process Management
(BPM 2017), pp. 324-340, Berlin, Germany, Springer.
[23] P. Reimann and K. Yacef. “Using process mining for
understanding learning”. In R. Luckin, S. Puntambekar, P.
Goodyear, B. Grabowski, J. D. M. Underwood, and N. Winters
(Eds.) Handbook of design in educational technology, 2013.
New York, USA: Routledge.
[24] R. Agrawal and R. Srikant. “Mining sequential patterns”.
Proceedings of the Eleventh IEEE International Conference on
Data Engineering, 1995, pp. 3-14. Taipei, Taiwan.
[25] J. D. Hamilton. “Time series analysis”, vol. 2, 1994. Princeton,
New Jersey, USA: Princeton University Press.
[26] R. Steele. “Event history analysis”, 2005. ESRC National Centre
for Research Methods.
[27] D. Perera, J. Kay, I. Koprinska, K. Yacef, and O. Zaiane.
“Clustering and sequential pattern mining of online
collaborative learning data”. IEEE Transactions on Knowledge
and Data Engineering, vol. 21, no. 6, 2009, pp. 759-772.
[28] M. Zhou, Y. X. Xu, J. C. Nesbit, and P. H. Winne. “Sequential
pattern analysis of learning logs: Methodology and
applications”. In C. Romero, S. Ventura, M. Pechenizkiy and R.
Baker (Eds.) Handbook of educational data mining. 2010. Boca
Raton: Chapman&Hall/CRC.
[29] P. Reimann, J. Frerejean, and K. Thompson. “Using process
mining to identify models of group decision making in chat
data”. Proceedings of the 9th International Conference on
Computer Supported Collaborative Learning, vol. 1, 2009, pp.
98-107, Rhodes, Greece.
[30] R. Bakeman and J. M. Gottman. “Observing interaction: an
introduction to sequential analysis”, 2nd edition, 1997.
Cambridge: Cambridge University Press.
[31] G. Khodabandelou, C. Hug, R. Deneckere, and C. Salinesi.
“Process mining versus intention mining”. Enterprise,
Business-Process and Information Systems Modeling, 2013, pp. 466-480.
Berlin, Germany: Springer Heidelberg.
[32] T. Washio and H. Motoda. “State of the art of graph-based data
mining”. ACM SIGKDD Explorations Newsletter, vol. 5, 2003,
pp. 59-68.