Temporal Dynamics of MOOC Learning Trajectories
Saman Rizvi
Institute of Educational Technology
The Open University, Milton Keynes,
United Kingdom
saman.rizvi@open.ac.uk
Bart Rienties
Institute of Educational Technology
The Open University, Milton Keynes,
United Kingdom
bart.rienties@open.ac.uk
Jekaterina Rogaten
Institute of Educational Technology
The Open University, Milton Keynes,
United Kingdom
jekaterina.rogaten@open.ac.uk
ABSTRACT
Massive Open Online Courses (MOOCs) are a relatively new
online learning phenomenon: in 2017, more than 81 million
learners followed around 9,400 courses offered by more than
800 universities. Learners’ retention has been one of the
most vital issues associated with MOOC learning. A large body of
literature can be found addressing various aspects of retention.
However, few studies have examined the temporal aspects of
learning processes, and why some learners complete only a few
learning activities before dropping out, while others persist over
time. Little is known about the nature and level of participation, or
learners’ progression in ordered learning activities in MOOCs, i.e.,
learners’ learning pathways. This study aims to fill this gap in
knowledge by analyzing an Open University MOOC offered via
the FutureLearn platform. Using exploratory methods associated with
Educational Process Mining (EPM) on system logs, the study
explored the self-allocated time that 2,086 learners assigned to a
variety of learning activities. Learners’ activities were mapped to
identify common and distinct learning pathways. Analyses were
performed on two distinct groups of learners: Completers and Non-
Completers. Using the measure of relative frequencies, the study
compared participatory behaviors of both groups with expected
learning behavior for all types of learning activities. We also
explored typical weekly performance, and identified and mapped the
most significant temporal learning pathways of subgroups of learners.
The results indicated that at least one main and dominating pathway
existed, but the paths of dominant subgroups of Completers and
Non-Completers remained noticeably distinct. We conclude the paper
with practical implications and limitations of using process mining
methods for temporal behavioral modeling in educational domains.
Future research directions and potential benefits of such temporal
modeling are also discussed.
CCS Concepts
• Information systems ➝ Information systems applications -
Data mining • Applied computing ➝ E-learning.
Keywords
Educational Process Mining; MOOCs; Temporal Modelling;
Behavioral Analysis.
1. INTRODUCTION
Learning in Massive Open Online Courses (MOOCs) generates
voluminous data in the form of logs stored in system databases. To
date, only a very small fraction of that data has been explored in
systematic MOOC research. In log-based behavioral modeling in
educational domains, researchers’ main focus has been on ITS
(Intelligent Tutoring System) or Learning Management System
(LMS), with very few attempts to analyze in-depth paths within
MOOCs log data [1].
Possibly one of the most controversial debates in both residential
and online education is how to define success in learning. Previous
research [2] has shown that assessment results or
participation levels alone do not represent “academic success” in
MOOCs. Academic success and failure may be partly hidden in
learners’ journeys through their respective learning activities, and
in interactions with a variety of learning resources. Recent research
[1, 9] also suggested that academic grades do not always evidence
learning. Instead, learning itself is processual, and is guided by
learners’ intentions. The processual nature of learning may be
observed and measured via interaction and engagement with a
variety of learning and assessment activities (e.g., audio, video,
discussion, quiz, article). Thus, a comprehensive log data
exploration is needed to understand learners’ behavioral patterns
and their temporal learning choices in MOOC learning
environments. The term temporal dynamics, as used in this
research, has a twofold meaning: engagement-duration, and
sequential progression through various activities.
The structure, curriculum, and learning activity design within
MOOCs have lately been topics of interest for both researchers and
providers. However, research on pedagogical aspects of MOOC
learning environments [4] is still in its early stages. In contrast,
retention in MOOCs has been extensively studied, but it remains
one of the largest concerns for major providers like Coursera,
FutureLearn, and edX. An average MOOC duration is four to six
weeks, and learners’ dropout rate increases significantly after a
couple of weeks [14, 15]. In related online learning literature,
design aspects of the online learning environment have been found
to influence learners’ retention strongly. Generally, the issues of
success in online learning are closely linked to the learning design,
where learning design can be described as the process of designing
pedagogically informed learning activities to support learners while
remaining aligned with the curriculum. Numerous studies [6, 7]
have found a strong link between learning design and successful
learning outcomes or academic achievement. Overall, course
structure, duration, learning/instructional design, length and
duration of course contents are some of the course-level factors that
can be altered to improve retention. Therefore, MOOC developers
are experimenting with a variety of content types, moving away
from predominantly video-based MOOCs [9].
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. Copyrights for
components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to
post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from Permissions@acm.org.
DATA '18, October 1–2, 2018, Madrid, Spain © 2018 Association for
Computing Machinery. ACM ISBN 978-1-4503-6536-9/18/10…$15.00
https://doi.org/10.1145/3279996.3280035
One way to approach this problem is to analyze log data from
different MOOC designs. The black box of log data storage can be
scrutinized to understand learners’ temporal behaviors across a
range of different learning designs. For instance, in a large-scale
study of four edX MOOCs, [10] found that participants exhibited a
pattern of ‘non-linear navigation through the course materials’.
While answering the question on whether the navigation strategies
of MOOC learners differ by demographics, it was reported that
certificate-earners remained inclined towards the application of
non-linear navigation strategies and that “certificate earners
repeated visiting prior sequences three times as often, presumably
to review older content.” [10]. Hence, the research suggests distinct
navigational strategies, and that clicking (or not clicking) the
activities as “completed” represented two distinct psychological
dispositions: one when a learner might be inclined to attain a
certificate, and the other when a learner showed no intention to get a
certificate yet continued to learn.
It is worth mentioning here that FutureLearn’s policy on the certificate
of participation allows non-linear navigation through the
activities. However, a learner must mark at least 50% of the
course steps as complete and attempt every test question to get a
certificate of participation. This leads to the learner classification
used in the current study, where analyses were performed on two
distinct groups of learners: Completers, those who marked all their
activities as completed, and Non-Completers, those who never
marked any of their activities as completed. This also reinforces the
idea that modifications or alterations in learning designs should be
based on knowledge extracted from the system logs, such as access
frequency, duration, transition between activities, behavior of
clicking the activity as completed, and dominant group progression
(i.e., a relatively large number of learners following the same
navigational pattern).
2. THEORETICAL BACKGROUND
The self-allocated time that learners assign to a variety of learning
activities, as they sequentially progress in a course, has long been
a topic of interest in interdisciplinary and educational research.
Previously, [11] proposed an experimentalist approach towards
learning designs and framed the science of education as essentially a
design science. The research encouraged experimenting with innovative
designs in technology-enhanced learning with a “goal to compare
different designs to see what effects what”.
In his seminal work, [12] claimed that learning comprises
the active processes of filtering, selecting, organizing, and
integrating new information. At present, MOOC developers like
FutureLearn and edX are trying to optimize the design of MOOCs
to lessen the cognitive load for learners (such as topic difficulty
level, information or task presentation, permanence of knowledge
acquired), while making absorption of textual, visual or auditory
information natural and easy for learners. Additionally, it is
common for learners to distribute their time across different learning
activities to get the maximum (subjective) benefit in a limited time
frame [13], and this engagement-duration is recorded in the system
logs.
This provides a strong theoretical and empirical rationale for the
exploration of self-allocated time learners assign to a variety of
MOOC activities from a predetermined learning design comprising
textual and visual media. This study is a first step towards such
exploration, based on the theoretical models of learning
processes and multimedia usage by [12], and on the educational
design science principles by [11] for the proposed alterations and
adjustments in MOOC learning designs.
3. RESEARCH QUESTIONS
The main aim of this research is to understand how MOOC learners
distribute their learning time. This temporal behavior
mapping study examines learners’ journeys using the
processual nature of learning activities at three levels of granularity:
activity type, week-wise performance, and dominant group
progression in individual activities. Therefore, our main research
questions are:
RQ1. In terms of self-allocated learning time assigned to a
particular learning or assessment activity, to what extent does
participatory behavior vary between Completers and Non-
Completers?
RQ2. To what extent do temporal learning paths differ between
Completers and Non-Completers?
RQ3. Can learners’ subgroups be identified based on the sequence
of learning activities accessed?
4. METHODOLOGY
4.1 Data selection
In terms of the total number of enrolled learners, FutureLearn has
recently been ranked as the largest MOOC provider in Europe and
the 4th largest in the world [14]. In our current study, we used the
learners’ data log from one of the Open University (OU) science
MOOCs offered via FutureLearn’s platform in 2017. The
course followed a design structure comprising a variety of steps
from FutureLearn step categories: Article, Discussion, Peer
Review, Quiz, Test, Video/Audio, and Exercise. This four-week
course comprised 68 activities in total.
From the logs, we used the following attributes for a total of 2,086
learners enrolled in the MOOC: anonymized learner ID, week
number, learning activity type, learning activity, and timestamps.
In our dataset, 449 learners marked each of their activities as
completed (Completers), and 805 learners never marked any of
their activities as completed (Non-Completers).
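As an illustration of this grouping, the following minimal sketch labels learners from a simplified log. The row layout (learner ID, step, completion flag) and the function name are assumptions made for this example and do not reflect the actual FutureLearn export. The sketch also assumes that "each of their activities" means every step the learner accessed. Note that 449 Completers and 805 Non-Completers leave 832 of the 2,086 learners in neither group, which the sketch labels "Other".

```python
from collections import defaultdict

def classify_learners(rows):
    """Label learners from (learner_id, step, marked_complete) rows.

    The row layout is hypothetical and chosen only for this sketch.
    """
    accessed = defaultdict(set)   # steps each learner opened
    completed = defaultdict(set)  # steps each learner marked as complete
    for learner_id, step, marked_complete in rows:
        accessed[learner_id].add(step)
        if marked_complete:
            completed[learner_id].add(step)

    labels = {}
    for learner_id, steps in accessed.items():
        done = completed[learner_id]
        if done == steps:
            labels[learner_id] = "Completer"      # marked every accessed step
        elif not done:
            labels[learner_id] = "Non-Completer"  # never marked any step
        else:
            labels[learner_id] = "Other"          # marked some, but not all
    return labels
```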
4.2 Methods
The sheer volume of data already available, and the data
constantly being generated by the online learning system, require
advanced analysis methods which are scalable, comprehensible,
and yet easy for a non-technical stakeholder to implement.
Therefore, to develop learners’ temporal navigational
patterns, this study employed methods typically associated with
Educational Process Mining (EPM). Process Mining is a set of
emerging data mining techniques aimed at extracting
process-related knowledge from event logs. EPM is the application of
Process Mining techniques in the educational domain [3]. In
Process Mining, the term Variant refers to an end-to-end sequence
of activities followed by a significant number of cases. For
example, Figure 1 clarifies the concept of this term and represents
a learning trajectory followed by a subgroup of seven learners.
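The notion of a Variant can be sketched in a few lines: order each learner's events by time, then count how many learners share exactly the same end-to-end sequence. The tuple layout and the function name below are hypothetical and serve only to illustrate the concept; they are not taken from the tool used in this study.

```python
from collections import Counter, defaultdict

def trace_variants(events):
    """Count learners per distinct end-to-end activity sequence.

    events: iterable of (learner_id, timestamp, activity) tuples,
    a hypothetical layout chosen for this sketch.
    """
    traces = defaultdict(list)
    # Sort by learner and then by time so each trace is in access order.
    for learner_id, _timestamp, activity in sorted(events, key=lambda e: (e[0], e[1])):
        traces[learner_id].append(activity)
    # Each distinct sequence is one variant; its count is the subgroup size.
    return Counter(tuple(sequence) for sequence in traces.values())
```

Calling `trace_variants(events).most_common(3)` would then list the three most frequent variants together with the number of learners following each.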
We evaluated overall participatory behavior in the MOOC, i.e.,
events over time and active learners over time. First, we evaluated
participatory behavior for Completers and Non-Completers in
different types of learning activities, such as Video, Article, and
Quiz. Based on the observed relative frequency of access for the
two groups, we examined if such behavior was aligned with
expected learning behavior, i.e., the respective learning design.
To understand how and why learners distributed their
time across a variety of learning activities, we compared the mean and
median duration of activities for the group of Completers. Second,
we mapped and compared mainstream weekly and dropout learning
traces. Third, for both groups, we compared temporal learning
pathways of dominant subgroups of learners (Variants) who
followed a particular learning trajectory or pathway.
Figure 1. List of the 524 types of learning sessions obtained
from the log. Type 27 shows four end-to-end interactions
(events), with the time associated with the duration of the
session (variant 27: learning path of a subgroup of 7 learners).
To construct the process maps or learning trajectories, we used
the Disco tool, which implements an extended and improved version
of the Fuzzy Miner algorithm [15]. This algorithm creates detailed
yet uncomplicated process maps and can easily identify infrequent
variants.
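To give a sense of the raw material behind a process map, the sketch below counts how often one activity is immediately followed by another across all traces, and then drops rare transitions. This is neither the Fuzzy Miner algorithm nor Disco's implementation; it is a much simpler directly-follows count, with hypothetical function names, offered only as an illustration.

```python
from collections import Counter

def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in traces:
        # Pair each activity with its successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

def filter_edges(counts, min_count):
    """Keep only transitions observed at least min_count times."""
    return {edge: n for edge, n in counts.items() if n >= min_count}
```

The threshold in `filter_edges` is a crude stand-in for the map simplification that dedicated process mining tools perform with richer significance metrics.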
5. RESULTS AND DISCUSSION
Analyzing the number of learning events and active cases for both
groups revealed interesting results. The results from the
comparison of events over time and active cases over time (see
Figure 2) showed that there were slight, but noticeable differences
between Completers and Non-Completers.
Figure 2. Learning Events (per hour) for Completers and
Non-Completers
Both groups displayed a significant increase in their participation
in learning activities as the course progressed, i.e., a large number
of learning events started to occur later in the course, and not from the
very start. A significant increase was noticed in the number of active
cases, and hence the number of learning events, during the first half of
the MOOC offering period. After a peak busy period (the
peak number of active learners per hour), overall learning activity
gradually diminished, reflecting a vital issue associated with
retention (see Figure 2). However, some interested learners
continued until the end, and most of them marked activities as
“completed” (Completers). At the beginning, most learners
got engaged and then decided whether or not to actively continue.
It was noticed that some Completers tended to register and join
much later and mostly continued to complete the course. Against
our expectations, the findings suggest that both Completers and
Non-Completers continued to join the course many days or even
weeks after the start date.
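The two measures discussed above, events per hour and active learners per hour, can be derived from timestamps alone. The following sketch assumes ISO-8601 timestamp strings and a hypothetical (learner ID, timestamp) layout; it illustrates the computation and is not the procedure used by the process mining tool.

```python
from collections import defaultdict
from datetime import datetime

def hourly_activity(events):
    """Return {hour: (event_count, active_learner_count)}.

    events: iterable of (learner_id, ISO-8601 timestamp string) pairs,
    a hypothetical layout chosen for this sketch.
    """
    counts = defaultdict(int)
    learners = defaultdict(set)
    for learner_id, stamp in events:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(stamp).replace(minute=0, second=0, microsecond=0)
        counts[hour] += 1
        learners[hour].add(learner_id)
    return {hour: (counts[hour], len(learners[hour])) for hour in sorted(counts)}
```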
The next phase of this analysis used relative frequency of different
types of learning and assessment activities, such as Audio, Video,
Discussion, Quiz, and Test. While comparing the relative
frequency of learning activities in both groups’ logs with expected
frequency (i.e., actual activity distribution in that particular MOOC
learning design), it was noticed that the frequencies of accessed
events (for Non-Completers) and frequencies of completed events
(for Completers) were in line with the course design. For typical
learning activities such as Video, Article or Discussion, both
groups followed the course distribution. However, the relative
frequency of assessment activities, such as Test or Quiz, was much
lower than expected in both groups (Figure 3). This observed
behavior points towards a persistent interest in learning activities
and not in assessment activities. However, the relative frequency
should be seen in relation to the observed time spent on these
activities.
Figure 3. Relative frequency of learning events
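The comparison between observed and expected relative frequencies can be sketched as follows. The expected distribution is taken here as the share of each activity type among the steps of the course design, and the observed distribution as the share of each type among logged events. Both function names and the list-of-labels input are assumptions made for this example.

```python
from collections import Counter

def relative_frequencies(labels):
    """Return each label's share of the total as a fraction."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def deviation_from_design(design_types, observed_types):
    """Observed share minus designed share, per activity type in the design."""
    expected = relative_frequencies(design_types)
    observed = relative_frequencies(observed_types)
    # A negative value means the type was accessed less often than designed.
    return {t: observed.get(t, 0.0) - expected[t] for t in expected}
```

A negative deviation for Quiz or Test would correspond to the lower-than-expected participation in assessment activities described above.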
Evaluation of the mean and median time taken by Completers to
complete an activity showed contrasting results (see Figure 4). A
significant difference was noticed between participation in learning
activities and assessment activities. Relative frequency of learning
activities was in line with the learning design, and relative
frequency of assessment activities was lower than expected. The
group of Completers spent a short time on typical learning activities
such as video, article or discussion and more time on assessment
activities, such as quiz and test.
One reason for this contrasting behavior could be that learners
already had expertise in certain topics and, after skimming
through the material to refresh their memory, paid
more attention to quizzes and tests. Another reason could be
FutureLearn’s policy regarding the certificate of participation, which
allows learners to choose the order in which they visit the steps
and whether to mark a step as complete. A learner must mark
at least 50% of the course steps as complete and attempt every test
question to get a certificate of participation.
Figure 4. Duration of Completers' learning events
Understandably, a few outlier learners might have skewed the
results of the duration of activities, making large mean values less
representative of an average Completer’s behavior (Figure 4). As
for the short median time, one can assume that increasing the playback
rate decreases the overall time spent learning from videos. Still,
the short median duration of 2 min 27 sec, for example, does not
seem enough for article reading. The pattern of short median time
for learning activities and long median time for assessment
activities (like quiz and test) signifies that Completers spent less
time on some learning activities, marked the complete button in
haste, and spent relatively more time on assessment activities to get
them done correctly.
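The effect of outliers on the mean can be seen in a small example. The durations below are invented for illustration (they are not values from the study's logs): five learners spend two to three minutes on a step, and one leaves the page open for two hours.

```python
from statistics import mean, median

def summarize_durations(seconds):
    """Return the mean and median of a list of durations in seconds."""
    return {"mean": mean(seconds), "median": median(seconds)}

# Hypothetical durations; the last value mimics an idle browser tab.
durations = [120, 135, 147, 150, 160, 7200]
summary = summarize_durations(durations)

# The mean is pulled far above every typical value by the single outlier,
# while the median stays within the range of the typical learners.
print(summary)
```

This is why the median is the more informative summary when a few sessions are much longer than the rest.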
Next, we constructed process maps to understand weekly
performance or traces for both groups (Figure 5 and Figure 6). In
week 1 we noticed a significantly large number of activities for both
groups, which gradually declined, revealing a vital issue associated
with retention in MOOCs.
Figure 5. Weekly trace/dropouts for Completers
In both cases, learners started learning by accessing resources from
week one or week two, while the resources were repeatedly
accessed. An interesting trend was that most of the learners who
marked each activity as completed either lost interest and left the
course in the first week or steadily continued till the end while
following the course design they were expected to follow
(Figure 5). On the other hand, Non-Completers dropped out in the
first or second week. Very few Non-Completers continued
accessing the course resources; however, they never marked any of
their activities as completed (Figure 6).
Figure 6. Weekly trace/dropouts for Non-Completers
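A rough numerical counterpart to these weekly traces is the week in which each learner's activity ends. The sketch below assumes a hypothetical (learner ID, week number) layout and simply takes the highest week with any logged event; it is an illustration and does not reproduce the process maps.

```python
from collections import Counter

def last_active_week(events):
    """Map each learner to the highest course week with a logged event.

    events: iterable of (learner_id, week_number) pairs,
    a hypothetical layout chosen for this sketch.
    """
    last = {}
    for learner_id, week in events:
        last[learner_id] = max(week, last.get(learner_id, week))
    return last

def learners_ending_in_week(events):
    """Count how many learners have their final event in each week."""
    return Counter(last_active_week(events).values())
```

Learners whose final event falls in the last course week persisted to the end, so only counts for earlier weeks should be read as dropouts.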
We then extended the granularity to individual activities (see
Figure 7). The analysis involved identification of mainstream trace
variants from the logs. Overall, results were consistent with
previous findings, whereby the main variant groups of Completers
completed most activities and performed steadily till the end, or else
left after completing a few or all activities from week 1 (see top three
variants). This behavioral trace is also suggestive of the importance
of the first few activities in any MOOC design.
Moreover, it was found that the top variant for Non-Completers
was one where a large number of learners dropped out after
accessing a single activity (the first activity of the course, which is a
video: 1.1 video). In the next two variants for Non-Completers,
learners accessed one or two more activities before finally leaving
the course. Interestingly, Completers’ participation remained
relatively consistent in all activities, which was apparent from the
relative frequencies, while Non-Completers stayed active for just
the first few activities and their activity diminished with time.
Figure 7. Comparison of frequent Variants in Completers and
Non-Completers
In conclusion, the results of this study support the following
propositions. First, MOOC learning is processual, and learning
trajectories can be mapped and compared with the learning designs
(the pathways a learner is expected to follow). Second,
engagement-duration is a key temporal aspect which should not be
left unexplored or unmapped, as the results suggest that
engagement-duration is one critical part of learners’ experience with
learning resources. Finally, learners’ progress remained linked with course
design. As such, mapping end-to-end learning processes could lead
to a better understanding of learners' dispositions.
6. IMPLICATIONS FOR FUTURE WORK
To improve the generalizability of findings from this study,
deeper and broader research is required across more MOOC
designs and platforms. An important research direction could be an
investigation of the temporal breakdown of learning trajectories of
MOOC learners. This research direction also
involves the provision of an intuitive, detailed, and easy-to-follow
model visualization for learners’ overall participation and variant
groups’ behavior. One approach could be performing behavioral
modeling at different levels of granularity (activity type, weekly
behavior, and variant activity). However, it has been reported in
other studies that using Process Mining methods in combination
with other analytics techniques, such as manual or automated
clustering during preprocessing, produces context-aware
behavioral models [3]. This approach can depict natural behavior
more accurately because clustering finds natural groupings in data.
Another potential future research direction could be one where
researchers utilize not only the log data but also other contextual or
interaction information typically not captured in event log data,
such as learners’ demographics, discussion text or learning
outcomes. Additionally, drawing sequences of learning events is
important, but the inclusion of the time dimension in learners’
pathway mapping is equally crucial. We intend to extend our
research along these lines and plan to perform comparative process
mining on different variants or learners’ subgroups within the
course or variants from different courses. It is our belief that the
processual nature of learning activities provides a basis for informed
alterations in learning designs and that this could potentially improve
retention in MOOCs as a consequence.
7. ACKNOWLEDGMENTS
This work was supported and funded by the Leverhulme Trust,
Open World Learning.
8. REFERENCES
[1] L. Juhaňák, J. Zounek, and L. Rohlíková, “Using process
mining to analyze students’ quiz-taking behavior patterns in
a learning management system,” Computers in Human
Behavior, 2017.
[2] S. Joksimović et al., “How Do We Model Learning at Scale?
A Systematic Review of Research on MOOCs,” Review of
Educational Research, p. 0034654317740335, 2017.
[3] A. Bogarín, R. Cerezo, and C. Romero, “A survey on
educational process mining,” Wiley Interdisciplinary
Reviews: Data Mining and Knowledge Discovery, vol. 8, no.
1, 2018.
[4] S. Sergis, D. G. Sampson, and L. Pelliccione, “Educational
Design for MOOCs: Design Considerations for
Technology-Supported Learning at Large Scale,” in Open
Education: from OERs to MOOCs, Springer, 2017, pp. 39–
71.
[5] M. G. Gomez-Zermeno and L. Aleman de La Garza,
“Research Analysis on MOOC Course Dropout and
Retention Rates,” Turkish Online Journal of Distance
Education, vol. 17, no. 2, pp. 3–14, 2016.
[6] M. Kloft, F. Stiehler, Z. Zheng, and N. Pinkwart, “Predicting
MOOC dropout over weeks using machine learning
methods,” in Proceedings of the EMNLP 2014 Workshop on
Analysis of Large Scale Social Interaction in MOOCs, 2014,
pp. 60–65.
[7] B. Rienties and L. Toetenel, “The impact of learning design
on student behaviour, satisfaction and performance: A
cross-institutional comparison across 151 modules,”
Computers in Human Behavior, vol. 60, pp. 333–341, 2016.
[8] Q. Nguyen, B. Rienties, and L. Toetenel, “Mixing and
matching learning design and learning analytics,” in
International Conference on Learning and Collaboration
Technologies, 2017, pp. 302–316.
[9] D. Jansen and R. Schuwer, “Institutional MOOC strategies
in Europe,” Status Report Based on a Mapping Survey
Conducted in October-December 2014, 2015.
[10] P. J. Guo and K. Reinecke, “Demographic differences in
how students navigate through MOOCs,” in Proceedings of
the first ACM conference on Learning @ Scale,
2014, pp. 21–30.
[11] A. Collins, “Toward a design science of education,” in New
directions in educational technology, Springer, 1992, pp.
15–22.
[12] R. E. Mayer, The Cambridge handbook of multimedia
learning. Cambridge University Press, 2005.
[13] A. Wigfield and J. S. Eccles, “Expectancy–value theory of
achievement motivation,” Contemporary educational
psychology, vol. 25, no. 1, pp. 68–81, 2000.
[14] D. Shah, “Monetization Over Massiveness: Breaking Down
MOOCs by the Numbers in 2016,” EdSurge. Available
online: https://www.edsurge.com/ (accessed on 25 July
2017), 2016.
[15] C. W. Günther and W. M. Van Der Aalst, “Fuzzy mining–
adaptive process simplification based on multi-perspective
metrics,” in International Conference on Business Process
Management, 2007, pp. 328–343.
Authors’ background
Name: Saman Rizvi; Title: PhD Candidate; Research Field: Learning Analytics and Educational Data Mining
Name: Bart Rienties; Title: Professor; Research Field: Learning Analytics
Name: Jekaterina Rogaten; Title: Senior Lecturer; Research Field: Learning Analytics