This manuscript is a post-print version of the document published in:
Alexandron, G., Yoo, L. Y., Ruipérez-Valiente, J. A., Lee, S. and Pritchard, D. E. Are MOOC Learning Analytics Results Trustworthy? With Fake Learners, They Might Not Be! International Journal of Artificial Intelligence in Education. 2019. 10.1007/s40593-019-00183-1
https://link.springer.com/article/10.1007%2Fs40593-019-00183-1
© 2019 Springer
Are MOOC Learning Analytics Results Trustworthy?
With Fake Learners, They Might Not Be!
Giora Alexandron, Weizmann Institute of Science
giora.alexandron@weizmann.ac.il
Lisa Y Yoo, Massachusetts Institute of Technology
lyy@mit.edu
José A. Ruipérez-Valiente, Massachusetts Institute of Technology
jruipere@mit.edu
Sunbok Lee, University of Houston
sunboklee@outlook.com
David E. Pritchard, Massachusetts Institute of Technology
dpritch@mit.edu
Abstract. The rich data that Massive Open Online Course (MOOC) platforms collect on the behavior of millions of users provide a unique opportunity to study human learning and to develop data-driven methods that can address the needs of individual learners. This type of research falls into the emerging field of learning analytics. However, learning analytics research tends to ignore the issue of the reliability of results that are based on MOOC data, which is typically noisy and generated by a largely anonymous crowd of learners. This paper provides evidence that learning analytics in MOOCs can be significantly biased by users who abuse the anonymity and open nature of MOOCs, for example by setting up multiple accounts; the bias stems from their number and from their aberrant behavior. We identify these users, denoted fake learners, using dedicated algorithms. The methodology for measuring the bias caused by fake learners’ activity combines the ideas of Replication Research and Sensitivity Analysis. We replicate two highly-cited learning analytics studies with and without fake learners’ data, and compare the results. While in one study the results were relatively stable against fake learners, in the other, removing the fake learners’ data significantly changed the results. These findings raise concerns regarding the reliability of learning analytics in MOOCs, and highlight the need to develop more robust, generalizable, and verifiable research methods.
Keywords. Learning Analytics, MOOCs, Replication Research, Sensitivity Analysis, Fake Learners
PREFACE: THE BEGINNING OF THIS RESEARCH
During 2015 we were working on recommendation algorithms in MOOCs. However, we ran into a
strange phenomenon – the most successful learners seemed to have very little interest in the course
materials (explanation pages, videos), and they mainly concentrated on solving assessment items. As
a result, the recommendation algorithms sometimes recommended skipping resources that we thought
should be very useful. The hypothesis that these are learners who already know the material (e.g., Physics
teachers) did not match the demographic data that we had on these users.
One day, we received a strange email from one of the users. The user complained about a certain
question, claiming that a response that had been accepted as correct a week earlier was now being rejected
by the system as incorrect. Since it was a parameterized question (randomized per user), we suspected that the user viewed
it from two different accounts. Connecting this with the strange pattern of users who achieved high
performance without using the resources, we realized that we had bumped into a large-scale phenomenon of
Copying Using Multiple Accounts (CUMA) (Alexandron et al., 2017, 2015a; Ruipérez-Valiente et al.,
2016).
Our research on detecting and preventing CUMA started as a spin-off of the unexplained bias in
predictive modeling. Three years later, we return to investigate the bias issue and its effect on learning
analytics results.
INTRODUCTION
Modern digital learning environments collect rich data that can be used to improve the design of these
environments, and to develop ‘intelligent’ mechanisms to address the needs of learners, instructors, and
content developers (Siemens, 2013; U.S. Department of Education, Office of Educational Technology,
2012). MOOCs, which collect fine-grained data on the behavior of millions of learners, provide “un-
paralleled opportunities to perform data mining and learning experiments” (Champaign et al., 2014) (p.
1). A partial list of studies includes comparing active vs. passive learning (Koedinger et al., 2015), how
students use videos (Kim et al., 2014a,b; Chen et al., 2016), which instructional materials are helpful
(Alexandron et al., 2015b; MacHardy and Pardos, 2015), recommending content to learners in real time
(Pardos et al., 2017; Rosen et al., 2017), providing analytics to instructors (Ruipérez-Valiente et al.,
2017c), and predicting drop-out (Xing et al., 2016). These studies fall into the emerging dis-
ciplines of Learning Analytics (LA), Educational Data Mining (EDM), and Artificial Intelligence in
Education (AIED). A recent review of the existing literature on the application of data science methods
to MOOCs can be found in (Romero and Ventura, 2017). While ‘Big Data in Education’ is mostly as-
sociated with MOOCs and other learning at scale applications, it is also very relevant to other widely
distributed platforms, such as Moodle (Luna et al., 2017).
Such research uses data-intensive methods that draw on machine learning, data mining, and arti-
ficial intelligence. These methods seek to extract meaningful structures from the data, which can have
predictive value in inferring the future behaviors of students (Qiu et al., 2016). Thus, their reliability is
highly determined by the quality of the data. Yudelson et al. (2014) showed that models fitted on high
quality data can outperform models that are fitted on a larger dataset that is of lower quality – a sort of
the ‘ed-tech variant’ of Peter Norvig’s “More data beats clever algorithms, but better data beats more
data” 1.
1https://www.azquotes.com/author/49745-Peter_Norvig
One of the main issues that can affect the quality of the data is noise, which is anything that is not the
‘true’ signal (Silver, 2012). Noise that appears as outliers can be identified and removed relatively easily
using outlier detection methodologies (Hodge and Austin, 2004). However, by definition, when there
are too many ‘outliers’ in a certain direction, they are not ‘outliers’ anymore, rendering outlier detection
methods ineffective. The same holds for data coming from complex, semi-structured or unstructured
domains, such as MOOCs, in which a plethora of behaviors are expected (Qiu et al., 2016). In such
cases, there can be numerous outliers in various directions that are caused by genuine learning activity.
Eventually, unfiltered noise can significantly affect various analytics computed on the data. For example,
Du et al. (2018) discussed how learning analytics perform differently on specific subgroups in MOOCs,
and proposed a methodology to discover such subgroups, which is based on Exceptional Model Mining
(EMM). The focus of our paper is not on discovering subgroups, but on showing that a certain subgroup
(fake learners) has exceptional behavior that can bias analytics. In the future, it can be interesting to
check if EMM techniques can identify subgroups of users who are actually fake learners.
In addition to the issue of noise, there is the issue of prior assumptions on the model. Educational
data mining methods make such assumptions on the process that generated the data, either for choosing
the objectives to optimize as proxies of the ‘true’ goal, for feature engineering, or for data abstraction
(Perez et al., 2017). Thus, making wrong assumptions on the process can largely affect the validity of
the results. Being able to model various subgroups is crucial for designing tailored pedagogic interven-
tions (Kiernan et al., 2001). This is especially true for the diverse population of the global classroom –
MOOCs. We note that making no ‘modeling’ assumptions and relying on Big Data alone, instead of as a
supplement to traditional data analysis methodologies (‘Big Data Hubris’), can also lead to large errors,
as in the case of Google Flu Trends (Lazer et al., 2014).
Not only is the accuracy of machine learning (ML) models highly affected by such issues;
validating these models is also a scientific and technological challenge (Seshia and Sadigh, 2016). ML mod-
els are mathematical functions that are learned from the data using statistical learning methods (Hastie
et al., 2001). Due to their probabilistic nature and complex internal structure, it is difficult to question
their reasoning (Krause et al., 2016). This is even more the case when such models are encapsulated
within artificial intelligence or analytics solutions that use them as ‘black-boxes’ for automatic or semi-
automatic decision making. This is a major concern of the Big Data era (Müller et al., 2016), which is
highly relevant to the educational domain as well (Pardo et al., 2016; O’Neil, 2017).
Fake Learners. Typically, data-driven education research makes the implicit assumption that the data that
are used represent genuine learning behavior. However, recent studies revealed that MOOCs can contain
large groups of users who abuse the system in order to receive certificates with less effort (Alexandron
et al., 2015a; Ruipérez-Valiente et al., 2016; Alexandron et al., 2017; Northcutt et al., 2016; Ruipérez-
Valiente et al., 2017a). As these users do not rely on ‘learning’ to achieve high performance, we denote
these users as fake learners (and use the term true learners to describe the genuine ones).
Fake learners introduce noise to statistical models that seek to make sense of learning data. The
effect of this noise depends mainly on the amount of fake learners, and on how aberrant their behavior is.
In order to remove their effect by filtering them out from the data, they must first be detected. However,
this typically involves sophisticated algorithms that are currently not available as ‘off-the-shelf’ tools
that can be used to clean the data. Also, while this study relies on algorithms for detecting two types
of fake learning methods – Copying Using Multiple Accounts (CUMA) and unauthorized collaboration
(Ruipérez-Valiente et al., 2017a), there might be more types of fake learners who currently sneak under
the radar.
The risk that the aberrant behavior of CUMA users, combined with their prevalence (>10% of certificate earners), can bias learning analytics results was raised in our previous work (Alexandron et al., 2017), but remained an open question. The goal of the current research is to address the bias issue
directly. In this paper we demonstrate for the first time, to the best of our knowledge, that fake learners
can significantly affect learning analytics results. We also provide a sort of ‘future projection’ of what
would happen if the number of fake learners increases (a likely consequence of MOOC certificates gaining
more value due to the current transition to MOOC-based degrees (Reich and Ruipérez-Valiente, 2019)).
The findings that we report here significantly extend a preliminary report from this work (Alexandron
et al., 2018).
Research Questions. We study the following Research Questions (RQs):
1. (RQ1) Is there a significant difference between the ‘fake’ and ‘true’ learners with respect to various
performance measures, and to the amount of use of the course instructional materials?
2. (RQ2) Can this difference bias the results of learning analytics models in a significant way?
Replication Research. Our research approach combines the ideas of Sensitivity Analysis (Saltelli et al.,
2000) and Replication Research (Open Science Collaboration, 2015). We pick two highly-cited learning analytics MOOC studies and evaluate how sensitive findings obtained with a similar methodology on new data from our MOOC are to fake learners. This is done by replicating each of the studies with and without fake learners’ data, and comparing the results. This is a partial rather than a full replication.
Specifically, we focus on the parts that study the characteristics of effective learning behaviors.
The studies that we replicate are “Correlating Skill and Improvement in 2 MOOCs with a Student’s
Time on Task” (Champaign et al., 2014), and “Learning is Not a Spectator Sport: Doing is Better than
Watching for Learning from a MOOC” (Koedinger et al., 2015). They appeared at the First (2014) and
the Second (2015) ACM Conference on Learning@Scale 2, which is a premier venue for interdisciplinary
research at the intersection of the learning sciences and computer science, with specific focus on large
scale learning environments such as MOOCs.
Contribution. This study provides the first evidence, to the best of our knowledge, that learning analytics
research in MOOCs can be significantly biased by the aberrant behavior of users who abuse the open
nature of MOOCs. This issue raises concerns regarding the reliability of learning analytics in MOOCs,
and calls for more robust, generalizable, and verifiable research methods. A systematic approach for
addressing this issue, within the conceptual framework of Educational Open Science (van der Zee and
Reich, 2018), is large-scale replication research. A significant stride towards making such research
technologically feasible is made by the MOOC Replication Framework (MORF) (Gardner et al., 2018).
2https://learningatscale.acm.org/
MATERIALS AND METHODS
In this section we describe the experimental setup, the data, and the data mining algorithms. Some
of the methodological content of this section has been reused from (Ruipérez-Valiente et al., 2016;
Alexandron et al., 2017).
Experimental Setup
The Course. The context of this research is MITx Introductory Physics MOOC 8.MReVx, offered
on edX in Summer 2014 3. The course covers the standard topics of a college introductory mechanics
course with an emphasis on problem solving and concept interrelation. It consists of 12 mandatory and
2 optional weekly units. A typical unit contains three sections: instructional e-text/video pages (with
interspersed concept questions, also known as checkpoints), homework, and a quiz. Altogether, the
course contains 273 e-text pages, 69 videos, and about 1000 problems.
Research Population. The research population consists of 478 certificate earners, out of the 13,500 users
who registered for the course. Overall, 502 users earned a certificate, but we removed beta-testers, users
who help validate content before the course is published, from the analysis. Gender distribution was 83%
males, 17% females. Education distribution was 37.7% secondary or less, 34.5% College Degree, and
24.9% Advanced Degree. Geographic distribution includes US (27% of participants), India (18%), UK
(3.6%), Brazil (2.8%), and others (total of 152 countries).
Data. The data for this study consists of learners’ clickstream data, which mainly include video events
(play, pause, etc.), responses to assessment items, and navigation to course pages, yielding about half a
million data points. In addition, we use the course structure files, which hold information that describes
the course elements and the relations between them (e.g., the page in which a question resides).
Fake Learners: Definition and Detection
We define fake learners as users who apply unauthorized methods to improve their grade. This definition
emphasizes the fact that the apparent behavior of fake learners does not explain significant aspects of their
performance (which can be achieved without ‘learning’, at least of Physics, in the case of our course), and that it is
systematic and goal-oriented (as opposed to ‘gaming the system’ (Baker et al., 2008), for example). On a
few occasions we also use the term ‘cheating’, but as a general issue, and we deliberately avoid referring
to ‘fake learners’ as ‘cheaters’. This is because the question of what should be regarded as ‘cheating’ in
MOOCs is an issue that requires a discussion that is out of the scope of this paper, going well beyond the
technical question of whether the user broke the edX “Terms of Service & Honor Code”4.
Currently, we have means to identify two types of such methods:
3 https://courses.edx.org/courses/MITx/8.MReVx/2T2014/course
4 https://www.edx.org/edx-terms-service
Copying Using Multiple Accounts (CUMA): This refers to users who maintain multiple accounts:
a master account that receives the credit, and harvesting account/s that are used to collect the correct
answers (typically by relying on the fact that many questions provide the full answer, or at least
true/false feedback, after the maximum number of attempts is exhausted) (Alexandron et al., 2015a;
Ruipérez-Valiente et al., 2016). We note that in this method the multiple accounts are used by the
same person. We use the algorithm described in (Alexandron et al., 2017). The algorithm detects
65 master accounts out of the 478 certificate earners (as noted in (Alexandron et al., 2017), the
algorithm is designed to provide a lower bound on the true number of master accounts). Among
masters and harvesters, only the master accounts are considered as fake learners (the harvesting
accounts are not certified).
Collaborators: This definition refers to MOOC learners who collaborate with peers and submit
a significant portion of their assignments together. This is explicitly forbidden by the edX Honor
Code, “unless collaboration on an assignment is explicitly permitted”, which was not the case.
To detect such collaboration, we use the algorithm of (Ruipérez-Valiente et al., 2017a) 5, which
uses dissimilarity metrics to find accounts that tend to submit their assignments in close proximity
in time (a schematic sketch of this idea is given below). Overall, the algorithm identifies 20 (4%) of the certificate earners as submitting a
significant portion of their assignments with peers.
As there are users who use both methods, we give the CUMA algorithm priority when conducting
analyses that require assigning a user to one of the groups (‘CUMA Users’ or ‘Collaborators’), as it represents a more specific behavioral pattern. Among the unauthorized collaborators, 11 also used CUMA.
Hereafter, we refer to the 9 accounts that were not CUMA users as ‘collaborators’.
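To make the close-submission idea concrete, the following is a minimal sketch of one possible time-proximity heuristic. It is not the published algorithm of (Ruipérez-Valiente et al., 2017a), whose source code is linked in the footnote; the DataFrame layout, the column names (user_id, item_id, timestamp), and the thresholds are illustrative assumptions.

```python
import itertools

import pandas as pd


def close_submitter_pairs(events: pd.DataFrame,
                          min_shared_items: int = 50,
                          max_median_gap_sec: float = 120.0) -> pd.DataFrame:
    """Flag pairs of accounts whose first submissions to shared items are
    consistently close in time. `events` has one row per submission with
    columns user_id, item_id, timestamp (datetime)."""
    first = (events.sort_values("timestamp")
                   .groupby(["user_id", "item_id"], as_index=False)
                   .first())  # each user's first submission per item
    per_user = {u: g.set_index("item_id")["timestamp"]
                for u, g in first.groupby("user_id")}
    flagged = []
    for u1, u2 in itertools.combinations(per_user, 2):
        shared = per_user[u1].index.intersection(per_user[u2].index)
        if len(shared) < min_shared_items:
            continue
        gaps = (per_user[u1].loc[shared] - per_user[u2].loc[shared]).abs()
        median_gap = gaps.dt.total_seconds().median()
        if median_gap <= max_median_gap_sec:
            flagged.append((u1, u2, len(shared), median_gap))
    return pd.DataFrame(flagged, columns=["user_a", "user_b",
                                          "shared_items", "median_gap_sec"])
```

The actual detector uses dissimilarity metrics over submission times rather than a single median-gap threshold; the sketch is only meant to illustrate the kind of signal it exploits.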
Measures of Learners’ Performance
To date, there is no standard, well-accepted method to evaluate the performance of MOOC learners.
In the MOOC literature, the most common measures are probably the grade and the binary certification status
(yes/no). We use these, in addition to more robust methods that draw on Psychometrics and Item
Response Theory (IRT) (Meyer and Zhu, 2013; Champaign et al., 2014). The measures that we use are
listed below.
Grade: Total points earned in the course (60 points is the threshold for certification). The main
issue with this measure is that it is very sensitive to which and how many items a learner attempts.
In addition, it does not consider the attempt in which the learner succeeded (most items allow
multiple attempts). Also, the variability that we observed on this measure was very low. Due to
these limitations, we find this measure less useful as a valid and reliable measurement. However, it
is the most common measure, and the one that edX instructors receive.
Proportion Correct on First Attempt (CFA): The proportion of items, among the items that the
student attempted, that were answered correctly on the first attempt. While CFA is a simple and
straightforward approximation of students’ performance, which in this MOOC is highly correlated
with more robust measures (e.g., IRT), it is also very sensitive to which items a learner chooses to
attempt (choosing easy items will lead to a higher CFA).
5 source code: https://github.com/jruiperezv/close_submitters_algorithm
Ability: The student’s ability, estimated using a 2PL IRT model. The model is fitted on the first-attempt matrix of
the certified users (N=502), using an item set that contains questions attempted by at least 50% of
these users. We chose IRT because students’ IRT ability scores are known to be independent of
the problem sets each student tried to solve (De Ayala, 2009). Missing items are treated using
mean imputation (Donders et al., 2006). The model is fitted on a standard laptop using R’s TAM
package 6.
Weekly Improvement: Per student, this is computed as the slope of the regression line fitted to the
weekly IRT ability measures (namely, the result of fitting a 2PL IRT model on each week of the course
separately) (Champaign et al., 2014). One of the important issues that must be addressed when
calculating the IRT slopes is setting up a common scale across the weekly IRT scores. IRT is a
latent variable model, and a latent variable does not have any inherent scale; therefore, each IRT
estimation defines its own scale for the latent variable. Equating is the process of transforming a
set of scores from one scale to another. We used mean-and-sigma equating to set up a common
scale across the weekly IRT scores. The equated IRT slope captures the change in a student’s relative
performance during the course. For example, a student who has average performance in all the
weeks will have 0 relative improvement.
Mean Time on Task: The average time the student spent on an item. For items with multiple attempts, it is
the sum of the times of all attempts. The time for each attempt is operationalized as
the delta between the time of the attempt and the time of the previous action (navigating into the
page, submitting a previous item on the same page, or a previous submission to this item in the case
of multiple attempts); we do not accumulate durations of over 15 minutes, assuming that the user
disengaged from the system (Champaign et al., 2014). Time on task, or response time, plays an
important role in cognitive ability measurement (Goldhammer, 2015). Kyllonen and Zu (Kyllonen
and Zu, 2016) stated that “A recurring question has been whether speed and level are simply
two interchangeable measures of the same underlying ability or whether they represent different
constructs” (p. 2). A schematic sketch of how some of these measures can be computed is given below.
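To make these definitions concrete, below is a schematic sketch of how CFA, the capped time-on-task, and the equated weekly-improvement slope could be computed from tabular data. It is not the code used in this study; the column names, the simplified per-submission time gaps, and the linear mean-and-sigma transformation onto the week-1 scale are all simplifying assumptions (the weekly 2PL ability estimates themselves are assumed to be given, e.g., from R’s TAM package).

```python
import numpy as np
import pandas as pd


def correct_on_first_attempt(events: pd.DataFrame) -> pd.Series:
    """CFA per user: fraction of attempted items answered correctly on attempt 1.
    Assumed columns: user_id, item_id, attempt_number, correct (bool)."""
    first_attempts = events[events["attempt_number"] == 1]
    return first_attempts.groupby("user_id")["correct"].mean()


def mean_time_on_task(events: pd.DataFrame, cap_minutes: float = 15.0) -> pd.Series:
    """Simplified time on task: mean gap between consecutive actions per user,
    discarding gaps above the cap (assumed disengagement)."""
    events = events.sort_values(["user_id", "timestamp"])
    gaps = events.groupby("user_id")["timestamp"].diff().dt.total_seconds()
    gaps = gaps.where(gaps <= cap_minutes * 60)  # NaN out long gaps
    return gaps.groupby(events["user_id"]).mean()


def weekly_improvement(weekly_theta: pd.DataFrame) -> pd.Series:
    """Slope of a per-student regression over weekly ability estimates, after a
    simple mean-and-sigma transformation of each week onto the week-1 scale.
    weekly_theta: rows = students, columns = weeks (2PL theta estimates)."""
    ref_mean = weekly_theta.iloc[:, 0].mean()
    ref_std = weekly_theta.iloc[:, 0].std()
    equated = weekly_theta.apply(
        lambda col: (col - col.mean()) / col.std() * ref_std + ref_mean)
    weeks = np.arange(equated.shape[1])
    return equated.apply(lambda row: np.polyfit(weeks, row.to_numpy(), 1)[0], axis=1)
```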
RESULTS
The results are organized into three subsections. The first subsection provides descriptive statistics that
demonstrate the differences between fake and true learners with respect to fundamental behavioral char-
acteristics. The second and third subsections present the results of replicating (Koedinger et al., 2015)
and (Champaign et al., 2014), respectively.
6https://cran.r-project.org/web/packages/TAM/TAM.pdf
Differences in Behavioral Characteristics
Time on Course Resources. First, we measure the amount of time that fake learners spent on different
course resources, compared to true learners. We consider:
Reading time: Time that the user spent on explanatory pages.
Watching time: Time that the user spent on videos.
Homework time: Time spent in pages that contain homework items.
Figure 1 presents the time that fake and true learners spent on each resource type. As can be seen,
fake learners spent significantly less time on each type of instructional resource. This is confirmed
with a two-sided Mann-Whitney U test (n_fake = 72, n_true = 406, p < 0.01). For Watching Time,
the median values for fake and true learners are 0.3 and 1.6 hours, respectively (U = 21,040); for Reading
Time, the median values are 7.2 and 15.9 hours, respectively (U = 21,499); for Homework Time,
the median values are 7.7 and 16.0 hours, respectively (U = 21,934).
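A hedged sketch of this kind of comparison is shown below, using SciPy’s Mann-Whitney U test; the `learners` DataFrame and its columns (`is_fake` plus one column per time measure) are hypothetical names, not the actual dataset.

```python
from scipy.stats import mannwhitneyu


def compare_groups(learners, metric: str) -> dict:
    """Two-sided Mann-Whitney U test of a time measure, fake vs. true learners."""
    fake = learners.loc[learners["is_fake"], metric].dropna()
    true = learners.loc[~learners["is_fake"], metric].dropna()
    u_stat, p_value = mannwhitneyu(fake, true, alternative="two-sided")
    return {"metric": metric, "U": u_stat, "p": p_value,
            "median_fake": fake.median(), "median_true": true.median()}


# Example usage (assuming a `learners` DataFrame with these columns):
# for measure in ["watching_time", "reading_time", "homework_time"]:
#     print(compare_groups(learners, measure))
```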
Fig. 1. Time on instructional resources (log seconds) by learner group (fake vs. true), shown separately for Watching Time, Reading Time, and Homework Time.
Proportion of Items Solved. The proportion of assessment items that true and fake learners attempted
(successfully or not) is another metric on which we compare the behavior of the groups. A main reason
is that solving assessment items, especially ones that do not contribute much to the final grade (available
as formative assessment) or items attempted after the learner has already secured enough points to receive a certificate, is a clear
indication of motivation to learn.
The course contains mainly three types of assessment items: Checkpoint, Homework, and Quiz (see
Subsection Experimental Setup). They are analyzed separately because of their different characteristics
with respect to weight (points for solving them), and the ease of getting the correct answer without
effort (e.g., whether the ‘show answer’ option is enabled for most of them after exhausting the possible
attempts).
We assume that fake learners would factor that into their decision of whether to spend time on these
items. For example, since Checkpoint items have low weight, we assume that fake learners would show
less interest in solving them. Quiz items have high weight, but are harder to copy (no ‘show answer’,
only true/false feedback). Homework offers relatively high weight and have ‘show answer’ enabled,
which probably makes them ideal for fake learners (high ‘return on investment’).
Figure 2 presents the proportion of items solved by each group. As in the case of the time spent
on resources (previous subsection), there is a clear difference between the groups, with fake learners
attempting fewer items. This is confirmed with a two-sided Mann-Whitney U test (n_fake = 72, n_true = 406,
p < 0.05). For Checkpoint items, the median values for fake and true learners are 0.69 and 0.75,
respectively (U = 18,442); for Quiz items, the median values are 0.58 and 0.64,
respectively (U = 18,752); for Homework items, the median values are 0.47 and
0.48, respectively (U = 17,088).
Interestingly, and as we suspected, the difference between the groups is smaller on Homework items,
which offer fake learners the appealing combination of high weight and a ‘show answer’ feature.
This is further discussed in the Discussion section.
Performance Measures. Figure 3 illustrates the differences between fake and true learners with respect
to the measures of learners’ performance that were defined in the Methods section.
Fake learners are significantly faster than true learners (median time-on-task is 97 vs. 150 seconds,
respectively; p < 0.001). Another measure on which they do better than true learners is weekly
improvement. This means that fake learners tend to improve (relative to the other learners) during the
course. This is in line with the findings of (Alexandron et al., 2017), which reported that the number of
items that CUMA users copied tended to increase significantly as the course progressed.
On the other metrics (grade, ability, CFA) the two populations do not differ significantly (fake
learners have a higher CFA and a lower grade, but with p > 0.05). However, on these metrics we do find
a significant difference within the fake learners cohort, between the CUMA users and the
collaborators. This is demonstrated in Figure 3. CUMA users have a higher grade (0.85 vs. 0.77), ability
(0.21 vs. 0.66), and CFA (0.79 vs. 0.67) than collaborators, all with significant p-values.
Summary of the differences. Overall, we see that fake learners spent much less time on course resources,
and attempted fewer items. In the case of response time, we see that fake learners solve exercises much
faster. Regarding success metrics, fake learners have a higher weekly improvement. On the other success
metrics (grade, ability, and CFA), on average there is no significant difference between true and fake
learners.
Fig. 2. Proportion of items solved by category (Checkpoint, Quiz, Homework), for fake vs. true learners.
Replication Study 1
Next, we examine the effect of the differences in the behavioral metrics presented above on the findings
reported in “Learning is Not a Spectator Sport: Doing is Better than Watching for Learning from a
MOOC” (Koedinger et al., 2015). Specifically, we concentrated on the third research question (RQ)
– “What variations in course feature use (watching videos, reading text, or doing activities) are most
associated with learning? And can we infer causal relationships?”.
The analysis for this RQ is presented in the section titled “Variation in Course Feature Use Predict
Differences in Learning Outcomes” (starting on p. 116, left column). It has two parts – “Exploratory Data
Analysis”, and “Causal Analysis”, which we refer to as Subsections Analysis 1A and 1B, respectively.
Analysis 1A
Fig. 3. Performance measures (grade, ability, CFA, weekly improvement, and time on task) by learner group (fake vs. true).
This analysis characterizes students on three behavioral dimensions by performing a median split on each
of the metrics – amount of videos played, number of pages visited, and number of activities started. A
learner who is on the upper half of each split is referred to as ‘Watcher’, ‘Reader’, and ‘Doer’, respectively. This split yields 8 (2^3) subgroups.
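As an illustration, the median split described above can be reproduced with a few lines of code. The sketch below is not the original authors’ implementation, and the column names (`videos_played`, `pages_visited`, `activities_started`, `irt_ability`) are assumptions.

```python
import pandas as pd


def label_profiles(learners: pd.DataFrame) -> pd.DataFrame:
    """Median split on three behavioral metrics, yielding 2^3 = 8 subgroups."""
    labeled = learners.copy()
    for column, label in [("videos_played", "watcher"),
                          ("pages_visited", "reader"),
                          ("activities_started", "doer")]:
        labeled[label] = labeled[column] > labeled[column].median()
    return labeled


# Mean outcome per subgroup, e.g. IRT ability:
# label_profiles(learners).groupby(["doer", "watcher", "reader"])["irt_ability"].mean()
```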
The subgroups are compared on two global performance measures – ‘Quiz Total’ and ‘Final Exam’.
The conclusions regarding the quizzes are that “Doers do well on the quizzes [...] even without being on
the high half of reading or watching”, “doing the activities may be sufficient to do well on the quizzes”.
Regarding the final exam, it is found that doing is most important (“a higher final exam score is more
typical of those on the higher half of doing”), but that doing is further enhanced by watching, reading,
or both. Altogether, the title of this analysis is that “Doing, not Watching, Better Predicts Learning”,
which supports the phrase “Doing is Better than Watching” in the title of the paper.
Replicating Analysis 1A on 8.MReVx 2014
In order to conduct this analysis on the data of 8.MReVx, we need to make a few adjustments. The
definitions of Watcher, Reader, and Doer remain the same. Doer is computed based on the number of
items started, but this raises an issue, as the grade and the IRT ability – the outcome measures – are also
based on the items solved. To make these measures as independent as possible, we base the definition
of Doer on checkpoint items (items within the units). This reflects the nature of ‘active learning by
doing’, which we interpret as the main idea behind the doing profile, while using items whose direct
contribution to the grade and IRT ability is minor.
The second major decision to make was what the global measure of learning should be. In the
original paper, Quiz Total and Final Exam are used. In 8.MReVx there is no Final Exam (there is a post-
test that was taken by a very small number of learners), and the grade is not sufficient for this purpose,
as the variability of the grade is very low, and its distribution among the learner profiles is quite uniform.
Thus, we use IRT as a global measure of performance in the course.
The results of the analysis, with and without fake learners, are presented in Figure 4. The left figure
demonstrates the analysis for all learners (including fake learners). Within this figure, the leftmost, red
bar represents the ability of ‘Doers who are neither Watchers nor Readers’. This bar is the highest,
meaning that this is the most successful group of learners. It is in line with the finding of (Koedinger
et al., 2015) that Doers can do well without watching videos or reading explanations (the original figure
from (Koedinger et al., 2015) is presented in Appendix 3, Figure 4a).
The right figure presents the results of the same analysis without fake learners (only true learners).
As can be seen, the performance of ‘Doers who are neither Watchers nor Readers’ drops sharply from
mean IRT ability of 0.41 to mean IRT ability of 0.14. This change is statistically significant (using
Bootstrap hypothesis testing; see below). Without fake learners, ‘Doers who are also Watchers’
become the most successful group (there is a small decrease in the performance of this group, which is
statistically insignificant).
To verify that the effect is not an artifact, we use Bootstrap hypothesis testing (MacKinnon, 2009).
Denote the group of all learners by L, and the size of the group of the ‘true’ learners by n. We estimate
the sampling distribution of the mean IRT ability of ‘Doers who are neither Watchers nor Readers’ and
of ‘Doers who are also Watchers’ using 1000 bootstrap samples of size n drawn from L. The results show that the
change to the mean IRT ability of ‘Doers who are neither Watchers nor Readers’ is statistically significant
(p < 0.05), and that the change to the mean IRT ability of ‘Doers who are also Watchers’ is
insignificant. A figure demonstrating the sampling distribution for both groups is provided in Appendix 1.
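The sketch below illustrates one way such a check could be implemented; it is a simplified reading of the procedure, with the DataFrame, the subgroup flag column, and the outcome column all assumed names.

```python
import numpy as np
import pandas as pd


def bootstrap_subgroup_means(all_learners: pd.DataFrame, subgroup_col: str,
                             outcome_col: str, n: int, n_boot: int = 1000,
                             seed: int = 0) -> np.ndarray:
    """Sampling distribution of a subgroup's mean outcome, estimated from
    n_boot bootstrap samples of size n drawn (with replacement) from all
    learners L."""
    rng = np.random.default_rng(seed)
    means = np.empty(n_boot)
    for b in range(n_boot):
        sample = all_learners.sample(n=n, replace=True,
                                     random_state=int(rng.integers(2**31 - 1)))
        means[b] = sample.loc[sample[subgroup_col], outcome_col].mean()
    return means


# The mean observed for true learners only is then compared against this
# distribution, e.g. via the fraction of bootstrap means at least as extreme.
```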
Doers still do better in all combinations, but the original conclusion that Doers can do well without
watching videos or reading explanations becomes debatable when removing fake learners.
Replicating Analysis 1B
In the original analysis, Tetrad, a tool for causal inference, was used to evaluate whether associations
between key variables – pre-test, use of course materials (doing, watching, reading), and outcomes (quiz
total, final test), are potentially causal (the original graph from (Koedinger et al., 2015) is presented in
Appendix 3, Figure 4b).
We replicated the same analysis using Tetrad on the data of 8.MReVx with and without fake learners.
As a ‘pre-test’, we used the IRT score of the first week. The results are presented in Figure 5. The left
figure demonstrates the graph for all learners (including fake learners). The graph on the right is the
result after removing fake learners, namely, for true learners only.
As can be seen, the causal graph changes significantly after removing the fake learners’ data. First,
when removing fake learners (moving from the left to the right figure), two causal links disappear
(watching → quizScore and reading → IRT). Interestingly, the original reading → IRT link
has a negative weight, meaning that with fake learners, reading is found to have a negative effect on IRT
ability. The weights of the other edges change, but we do not interpret this as a qualitative difference.
Fig. 4. Mean IRT ability by learner type (Neither, Reader, Watcher, Read&Watcher; Doer vs. Nondoer): (a) all learners (including fake learners); (b) true learners only (fake learners removed).
(We note that we could not use sampling techniques to evaluate the statistical significance of the change
in the Tetrad results, as we did not find a way to run Tetrad’s execution engine as a run-time library. The
tool version that we use is the one that was used in the original paper (Koedinger et al., 2015), which is
an old version that to the best of our knowledge does not support such usage.)
Future Projection
To evaluate the effect that an increase in the percentage of fake learners could potentially have on the
analytics bias, we repeat the analyses of Subsections Analysis 1A and 1B after increasing the amount of
fake learners from 15% to 26%. This increase is achieved by simply duplicating the fake learners’
data, in order to maintain a similar multivariate distribution with respect to the behavioral characteristics
that we measure.
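A sketch of this manipulation is given below; the `learners` DataFrame and the `is_fake` flag are assumed names used only for illustration.

```python
import pandas as pd


def inflate_fake_learners(learners: pd.DataFrame, factor: int = 2) -> pd.DataFrame:
    """Duplicate the fake learners' rows (factor - 1) extra times, preserving
    their multivariate behavioral distribution while raising their share."""
    fake_rows = learners[learners["is_fake"]]
    extra = pd.concat([fake_rows] * (factor - 1), ignore_index=True)
    return pd.concat([learners, extra], ignore_index=True)


# With 72 fake learners out of 478 (~15%), duplication gives 144 / 550 (~26%).
```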
The rationale for this ‘simulation’ analysis is twofold. First, the current figure of 15% fake learners is
a lower bound, and we assume that the actual number of fake learners is higher. Second, it seems
reasonable to assume that cheating in MOOCs will increase as the result of MOOC certificates gaining
more value (Alexandron et al., 2017).
The effect on the bias is presented in Appendix 2. The effect on the bias of the Tetrad analysis
is incremental (same edges, with slight change of weights; see Appendix 2, Figure 3). The effect on
Analysis 1A is significant, with “Doers who are neither Watchers nor Readers” becoming by far the most
successful group (see Appendix 2, Figure 2).
Fig. 5. Tetrad results: (a) all learners (including fake learners); (b) true learners only (fake learners removed).
Summary of results - Replication Study 1
Based on the results of Subsections 1A and 1B, we conclude that the analysis of “What variations in
course feature use are most associated with learning? And can we infer causal relationships?” – RQ3
from the paper (Koedinger et al., 2015) – changed in a meaningful way when replicated on the data of
8.MReVx with and without fake learners.
Replication Study 2
Another educational data mining study of the relation between which course materials learners use,
and their success in the course, was presented in (Champaign et al., 2014). The research objective is
to understand the effectiveness of online learning materials, with the goal of improving the design of
interactive learning environments. Among the “most striking features” that emerged from their analysis
were (p. 18) “the large number of negative correlations between time spent on resource use and skill
level in 6.002x” and “the significant negative correlations between relative skill increase and time spent
on any of the available instructional resources in 6.002x, accompanied by only one significant positive
correlation” (the figure from the original paper is presented in Appendix 3, Figure 5).
As in the case of (Koedinger et al., 2015), what attracted our attention was the negative correla-
tion between performance and the use of (some of the) instructional resources. Again, does this mean that the
learning materials are unhelpful?
Thus, we replicate the analysis within Subsection “Correlations of Skill and Learning with Instruc-
tional Resource Use” (starts at p. 17). We note that the research found significant differences in the same
correlations between two different MITx MOOCs (8.MReV 2013 and 6.002x 2012 – the first MOOC of-
fered by MITx). Since these correlations seem to be course specific, our focus when replicating this
analysis is not whether we receive the same results, but whether the results that we receive remain the
same with and without fake learners.
The results are presented in Figure 6 (the figure adopts the visualization used in (Champaign et al.,
2014)). It shows the relation between the amount of time spent on various course resources, and certain
performance metrics. For each pie, the outer circle is the whole group, and the inner is the same measure
after removing fake learners from the data. The angle of each piece represents the size of the correlation.
A clockwise angle represents a positive correlation (colored green), and a counterclockwise angle represents
a negative correlation (colored red). Gray means p > 0.05. The difference between
the angle of the outer circle and the angle of the inner one is the effect of fake learners’ data on the
correlation.
Let us examine the correlations with p < 0.05 (colored red/green). With respect to
Grade vs. Homework Time and Grade vs. Reading Time, there is almost no effect (the angles of the inner and
outer pieces are almost identical). With respect to Ability vs. Homework Time and Ability vs. Reading Time,
we see a negative correlation, which is reduced when removing fake learners. With respect to weekly
improvement vs. homework time, we see a positive correlation, which increases when removing fake
learners.
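The sensitivity check itself can be sketched as follows, computing each resource-outcome correlation on all learners and again on true learners only. The choice of Pearson correlation and all column names are assumptions of this sketch, not necessarily those of the original analysis.

```python
from scipy.stats import pearsonr

RESOURCES = ["homework_time", "reading_time", "watching_time"]
OUTCOMES = ["grade", "ability", "weekly_improvement"]


def correlation_table(learners):
    """Correlation (and p-value) between each time-on-resource measure and
    each performance measure."""
    rows = []
    for resource in RESOURCES:
        for outcome in OUTCOMES:
            r, p = pearsonr(learners[resource], learners[outcome])
            rows.append({"resource": resource, "outcome": outcome, "r": r, "p": p})
    return rows


# Assuming a `learners` DataFrame with an `is_fake` flag:
# all_corr = correlation_table(learners)                         # outer circles
# true_corr = correlation_table(learners[~learners["is_fake"]])  # inner circles
```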
Summary of results - Replication Study 2
For the three metrics that changed when removing fake learners’ data – Ability vs. Homework Time,
Ability vs. Reading Time, and Weekly Improvement vs. Homework Time – we see that after removing
fake learners’ data, the correlation moved in the positive direction. However, in none of the correlations
did we see a qualitative change, such as a negative correlation becoming positive.
DISCUSSION
Our results show that fake learners interact with the course materials in a very different way than true
learners. For example, they attempt fewer questions and show minimal interest in the instructional ma-
terials. On the other hand, they exhibit high performance on various metrics – their IRT ability, weekly
improvement, and CFA are slightly higher, and their time-on-task is significantly shorter. This is not
surprising, as these users use means other than learning to achieve these results.
Fig. 6. Effect of fake learners on the correlations between performance measures (Grade, Ability, Weekly Improvement) and time on course resources (Homework Time, Reading Time, Watching Time).
Due to their aberrant behavior and their prevalence in the MOOC that we study (15% of the certificate earners), we suspected that fake learners can bias the results of learning analytics, especially
those that deal with the effective use of course resources. Our findings show that this is not a hypothetical
risk. We use the methodology of Replication Research as a means to focus our investigation on analytics
that are acknowledged by the learning analytics research community as meaningful insights into what
constitute effective learning behavior in MOOCs. Among the two studies that we replicated, one was
relatively stable against fake learners, but on the other, the findings were biased in a significant way by
fake learners’ data.
Since the primary motivation for fake learning in 8.MReV is receiving a certificate with low effort
(Alexandron et al., 2017), it is reasonable to assume that the same behavioral pattern of high performance
and low resource use would characterize such learners in other MOOCs. Depending on the prevalence of
fake learners, MOOC-based learning analytics research is vulnerable to bias. However, the number of
fake learners within MOOCs is still an open question.
The amount of CUMA in 8.MReV seems to generalize to other MOOCs (Alexandron et al., 2017),
and we do not see reason to assume otherwise for collaborators. Obviously, there could be other fake
learning methods, but we have not worked on new detection algorithms yet. Over time, we can expect
that the percentage of fake learners, and subsequently their effect, may rise with the increase in the value
of MOOC certificates, unless proper actions are taken against this. To evaluate the effect of an increase in
the number of fake learners, we conducted a simple ‘simulation’ analysis, illustrating the hypothetical
bias caused by a 2X increase in the percentage of fake learners (see Subsection Future Projection). The
results demonstrate that with 2X fake learners we can expect to see a significant increase in the bias.
Our research supplies some evidence that fake learners can lead to wrong inference on analytics
aiming to address questions such as ‘what are effective learning strategies?’, ‘which types of resources
are helpful?’, and so on. While ‘global’ correlations seem to be less vulnerable, studying selected cohorts
like ‘efficient learners’ (e.g., ones who are fast and successful) would be very prone to bias due to fake
learners’ activity.
It is important to emphasize that the bias is due to the fact that fake learners introduce noise
into the data. Such noise can affect various types of computational models – for example, consider a
recommendation engine that sequences content to learners in real-time. Such engines typically rely on
machine learning models that are fitted to learners’ data. Biased data can lead to modeling ‘noise’ instead
of real predictor-outcome relationships. Since such machine learning models are typically encapsulated
within ‘policy’ layers that use ‘business’ (in this case, pedagogy) logic to translate prediction into action,
validating the recommendations becomes an extremely difficult task (Krause et al., 2016). As a thought
experiment, imagine two competing MOOC content recommendation engines: an adversary engine that
sends learners to random pages that they have not seen, and a Zone of Proximal Development (ZPD)
engine that is tuned to challenge learners while keeping them within the ZPD. Now, assume that we have
a sequence of pages that both engines recommended during a 15-minute activity. What is the chance
that an expert would identify which one is the adversary, and which one is the ZPD, without knowing the
nitty-gritty of the ZPD engine?
Fake Accounts in Social Networks. Malicious use of fake accounts is a common issue in social networks.
Above all, the Facebook–Cambridge Analytica data scandal 7 has brought to public attention the issue of
fake accounts, and how they can be used in malicious ways and on large-scale to collect data and affect
social trends. Partly as the result of this ‘wake-up call’, Twitter recently announced that it has shut down
70 million fake/suspicious accounts since May 2018 8. This (negative) similarity between MOOCs and
social networks sheds some light on the fact that MOOCs are a learning environment which is also a
global social platform.
Limitations. The main limitations of our research are that it is based on data from one MOOC,
and on detection algorithms that detect only a subset of the fake learners in the course. Future research
can expand this analysis to multiple MOOCs, and hopefully by then there will be algorithms for de-
tecting other fake learning methods. In addition, it would be interesting to evaluate the effect on a wider
set of learning analyses and machine learning models.
7 https://en.wikipedia.org/wiki/Facebook%E2%80%93Cambridge_Analytica_data_scandal
8 https://www.nytimes.com/2018/07/11/technology/twitter-fake-followers.html
Summary and Conclusions
This study follows Replication Research methodology, and uses Sensitivity Analysis techniques to study
how learning analytics can be biased by noisy data that include a significant amount of fake learners –
learners who use illegitimate techniques to improve their grade. These users exhibit learning behaviors
that are very different from those of ‘true’ learners, and achieve high performance. This can bias the
analytics towards falsely identifying non-learning behaviors as effective learning strategies.
Our findings provide the first evidence, to the authors’ knowledge, of how non-learning behaviors
that are not modeled can significantly bias learning analytics results. The findings also point to the fact
that cheating in educational settings can have consequences that go well beyond the issue of academic
dishonesty.
To date, the issue of the reliability of learning analytics has received little attention within the learn-
ing analytics research community. As a first step, it is important to acknowledge that this is a real
concern. Conveying this message is one of the major goals of this paper.
In order to address this issue, it is important to adopt more robust techniques for evaluating and
validating learning analytics research, for example by encouraging and facilitating replication research
at scale, and by developing advanced verification techniques. Another direction to take from this research
is to develop detection methods that can generalize across platforms and course designs, e.g. by using
ML (Ruipérez-Valiente et al., 2017b) or anomaly detection techniques (Alexandron et al., 2019).
In the verification domain, much can be learned from the hardware verification industry, which
makes extensive use of sophisticated simulation methods, and from recent developments in the area of
verification of autonomous vehicles, which deal with verifying complicated artificial intelligence sys-
tems.
ACKNOWLEDGEMENTS
GA’s research is supported by the Israeli Ministry of Science and Technology under project no. 713257.
REFERENCES
G. Alexandron, J. A. Ruipérez-Valiente, and D. E. Pritchard. Evidence of MOOC Students Using Mul-
tiple Accounts To Harvest Correct Answers, 2015a. Learning with MOOCs II, 2015.
G. Alexandron, Q. Zhou, and D. Pritchard. Discovering the Pedagogical Resources that Assist Stu-
dents in Answering Questions Correctly — A Machine Learning Approach. Proceedings of the 8th
International Conference on Educational Data Mining, pages 520–523, 2015b.
G. Alexandron, J. A. Ruipérez-Valiente, Z. Chen, P. J. Muñoz-Merino, and D. E. Pritchard. Copy-
ing@Scale: Using Harvesting Accounts for Collecting Correct Answers in a MOOC. Computers and
Education, 108:96–114, 2017.
G. Alexandron, J. A. Ruipérez-Valiente, S. Lee, and D. E. Pritchard. Evaluating the robustness of learning
analytics results against fake learners. In Proceedings of the Thirteenth European Conference on
Technology Enhanced Learning. Springer, 2018.
G. Alexandron, J. A. Ruipérez-Valiente, and D. E. Pritchard. Towards a General Purpose Anomaly
Detection Method to Identify Cheaters in Massive Open Online Courses. Proceedings of the 12th
International Conference on Educational Data Mining, 2019.
R. Baker, J. Walonoski, N. Heffernan, I. Roll, A. Corbett, and K. Koedinger. Why Students Engage in
"Gaming the System" Behavior in Interactive Learning Environments. Journal of Interactive Learning
Research, 19(2):162–182, 2008.
J. Champaign, K. F. Colvin, A. Liu, C. Fredericks, D. Seaton, and D. E. Pritchard. Correlating skill and
improvement in 2 MOOCs with a student’s time on tasks. Proceedings of the first ACM conference on
Learning @ scale conference - L@S ’14, (March):11–20, 2014.
Z. Chen, C. Chudzicki, D. Palumbo, G. Alexandron, Y.-J. Choi, Q. Zhou, and D. E. Pritchard. Re-
searching for better instructional methods using AB experiments in MOOCs: results and challenges.
Research and Practice in Technology Enhanced Learning, 11(1):9, 2016.
R. De Ayala. The Theory and Practice of Item Response Theory. Methodology in the social sciences.
Guilford Publications, 2009.
A. R. T. Donders, G. J. Van Der Heijden, T. Stijnen, and K. G. Moons. A gentle introduction to imputation
of missing values. Journal of clinical epidemiology, 59(10):1087–1091, 2006.
X. Du, W. Duivesteijn, M. Klabbers, and M. Pechenizkiy. Elba: Exceptional learning behavior analysis.
In Educational Data Mining, pages 312–318, 2018.
J. Gardner, C. Brooks, J. M. L. Andres, and R. Baker. Morf: A framework for mooc predictive modeling
and replication at scale. arXiv preprint arXiv:1801.05236, 2018.
F. Goldhammer. Measuring ability, speed, or both? challenges, psychometric solutions, and what can
be gained from experimental control. Measurement: Interdisciplinary Research and Perspectives, 13
(3-4):133–164, 2015.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in
Statistics. Springer New York Inc., New York, NY, USA, 2001.
V. Hodge and J. Austin. A survey of outlier detection methodologies. Artificial intelligence review, 22
(2):85–126, 2004.
M. Kiernan, H. C. Kraemer, M. A. Winkleby, A. C. King, and C. B. Taylor. Do logistic regression and
signal detection identify different subgroups at risk? implications for the design of tailored interven-
tions. Psychological Methods, 6(1):35, 2001.
J. Kim, P. J. Guo, C. J. Cai, S.-W. D. Li, K. Z. Gajos, and R. C. Miller. Data-driven interaction techniques
for improving navigation of educational videos. Proceedings of the 27th annual ACM symposium on
User interface software and technology - UIST ’14, pages 563–572, 2014a.
J. Kim, P. J. Guo, D. T. Seaton, P. Mitros, K. Z. Gajos, and R. C. Miller. Understanding in-video dropouts
and interaction peaks in online lecture videos. 2014b.
K. R. Koedinger, E. A. McLaughlin, J. Kim, J. Z. Jia, and N. L. Bier. Learning is Not a Spectator Sport:
Doing is Better than Watching for Learning from a MOOC. In Proceedings of the Second (2015) ACM Conference on Learning @ Scale, pages 111–120, 2015.
J. Krause, A. Perer, and K. Ng. Interacting with predictions: Visual inspection of black-box machine
learning models. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Sys-
tems, pages 5686–5697. ACM, 2016.
P. Kyllonen and J. Zu. Use of response time for measuring cognitive ability. Journal of Intelligence, 4
(4):14, 2016.
D. Lazer, R. Kennedy, G. King, and A. Vespignani. The parable of google flu: traps in big data analysis.
Science, 343(6176):1203–1205, 2014.
J. M. Luna, C. Castro, and C. Romero. Mdm tool: A data mining framework integrated into moodle.
Computer Applications in Engineering Education, 25(1):90–102, 2017.
Z. MacHardy and Z. A. Pardos. Toward the evaluation of educational videos using bayesian knowledge
tracing and big data. In Proceedings of the Second (2015) ACM Conference on Learning @ Scale,
L@S ’15, pages 347–350. ACM, 2015.
J. G. MacKinnon. Bootstrap Hypothesis Testing, chapter 6, pages 183–213. John Wiley & Sons, Ltd, 2009.
J. P. Meyer and S. Zhu. Fair and equitable measurement of student learning in moocs: An introduction to
item response theory, scale linking, and score equating. Research & Practice in Assessment, 8:26–39,
2013.
O. Müller, I. Junglas, J. v. Brocke, and S. Debortoli. Utilizing big data analytics for information systems
research: challenges, promises and guidelines. European Journal of Information Systems, 25(4):289–
302, 2016.
C. G. Northcutt, A. D. Ho, and I. L. Chuang. Detecting and preventing "multiple-account" cheating in
massive open online courses. Comput. Educ., 100(C):71–80, Sept. 2016.
C. O’Neil. Weapons of math destruction: How big data increases inequality and threatens democracy.
Broadway Books, 2017.
Open Science Collaboration. Estimating the reproducibility of psychological science. Science, 349
(6251), 2015. ISSN 0036-8075.
A. Pardo, N. Mirriahi, R. Martinez-Maldonado, J. Jovanovic, S. Dawson, and D. Gašević. Generating
actionable predictive models of academic performance. In Proceedings of the Sixth International
Conference on Learning Analytics & Knowledge, pages 474–478. ACM, 2016.
Z. A. Pardos, S. Tang, D. Davis, and C. V. Le. Enabling Real-Time Adaptivity in MOOCs with a Personalized Next-Step Recommendation Framework. In Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale, L@S '17, 2017. ISBN 9781450344500. doi: 10.1145/3051457.3051471.
S. Perez, J. Massey-Allard, D. Butler, J. Ives, D. Bonn, N. Yee, and I. Roll. Identifying productive
inquiry in virtual labs using sequence mining. In E. André, R. Baker, X. Hu, M. M. T. Rodrigo,
and B. du Boulay, editors, Artificial Intelligence in Education, pages 287–298, Cham, 2017. Springer
International Publishing.
J. Qiu, J. Tang, T. X. Liu, J. Gong, C. Zhang, Q. Zhang, and Y. Xue. Modeling and predicting learning behavior in MOOCs. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, pages 93–102. ACM, 2016.
J. Reich and J. A. Ruipérez-Valiente. The MOOC pivot. Science, 363(6423):130–131, 2019.
C. Romero and S. Ventura. Educational data science in massive open online courses. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2017. doi: 10.1002/widm.1187.
Y. Rosen, I. Rushkin, A. Ang, C. Fredericks, D. Tingley, and M. J. Blink. Designing adaptive assessments in MOOCs. In Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale, L@S '17, pages 233–236, 2017. ISBN 978-1-4503-4450-0.
J. A. Ruipérez-Valiente, G. Alexandron, Z. Chen, and D. E. Pritchard. Using Multiple Accounts for
Harvesting Solutions in MOOCs. Proceedings of the Third (2016) ACM Conference on Learning @
Scale - L@S ’16, pages 63–70, 2016.
J. A. Ruipérez-Valiente, S. Joksimović, V. Kovanović, D. Gašević, P. J. Muñoz-Merino, and C. Delgado Kloos. A data-driven method for the detection of close submitters in online learning environments. In Proceedings of the 26th International Conference on World Wide Web Companion, pages 361–368, 2017a.
J. A. Ruipérez-Valiente, P. J. Muñoz-Merino, G. Alexandron, and D. E. Pritchard. Using Machine Learning to Detect ‘Multiple-Account’ Cheating and Analyze the Influence of Student and Problem Features. IEEE Transactions on Learning Technologies, 14(8):1–11, 2017b.
J. A. Ruipérez-Valiente, P. J. Muñoz-Merino, J. A. Gascón-Pinedo, and C. D. Kloos. Scaling to massiveness with ANALYSE: A learning analytics tool for Open edX. IEEE Transactions on Human-Machine Systems, 47(6):909–914, 2017c.
A. Saltelli, K. Chan, E. M. Scott, et al. Sensitivity Analysis, volume 1. Wiley, New York, 2000.
S. A. Seshia and D. Sadigh. Towards verified artificial intelligence. CoRR, abs/1606.08514, 2016. URL
http://arxiv.org/abs/1606.08514.
G. Siemens. Learning Analytics: The Emergence of a Discipline. American Behavioral Scientist, 57(10):1380–1400, 2013.
N. Silver. The signal and the noise: why so many predictions fail–but some don’t. Penguin, 2012.
U.S. Department of Education, Office of Educational Technology. Enhancing Teaching and Learning
Through Educational Data Mining and Learning Analytics: An Issue Brief, 2012.
T. van der Zee and J. Reich. Open education science. AERA Open, 4(3):2332858418787466, 2018.
W. Xing, X. Chen, J. Stein, and M. Marcinkowski. Temporal predication of dropouts in MOOCs: Reaching the low hanging fruit through stacking generalization. Computers in Human Behavior, 58:119–129, 2016.
M. Yudelson, S. Fancsali, S. Ritter, S. Berman, T. Nixon, and A. Joshi. Better data beats big data. In
Educational Data Mining 2014, 2014.
Appendices
Appendix 1: Sampling Distribution of Mean IRT Ability
Figure 1. Sampling distribution of mean IRT ability for (a) Doers who are neither Watchers nor Readers and (b) Doers who are also Watchers. The dashed vertical lines mark the 95% confidence interval, and the vertical blue lines mark the mean value without fake learners.
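The distributions in Figure 1 can be approximated with a standard nonparametric bootstrap of the mean ability estimate. The Python sketch below is illustrative only and is not the code used in the study; the column names, the flag marking detected fake learners, and the number of resamples are assumptions introduced for the example.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def bootstrap_mean_ability(abilities, n_resamples=10_000):
    """Return bootstrap means of IRT ability and a 95% percentile confidence interval."""
    n = len(abilities)
    means = np.array([rng.choice(abilities, size=n, replace=True).mean()
                      for _ in range(n_resamples)])
    ci_low, ci_high = np.percentile(means, [2.5, 97.5])
    return means, (ci_low, ci_high)

# Hypothetical data: one IRT ability estimate per learner, plus a flag
# marking learners identified as fake by the detection algorithms.
learners = pd.DataFrame({
    "ability": rng.normal(0.0, 1.0, size=500),
    "is_fake": rng.random(500) < 0.15,
})

means, ci = bootstrap_mean_ability(learners["ability"].to_numpy())
mean_without_fakes = learners.loc[~learners["is_fake"], "ability"].mean()

# If the mean computed without fake learners falls outside the bootstrap CI
# of the full sample, their influence on this statistic is non-negligible.
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f}); mean without fake learners: {mean_without_fakes:.3f}")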
Appendix 2: Replication Study 1 with 2X Simulated Fake Learners
Figure 2. IRT results (a) with simulated 2× fake learners; (b) original data (all learners); (c) true learners (fake learners removed).
Figure 3. Tetrad results (a) with simulated 2× fake learners; (b) original data (all learners); (c) true learners (fake learners removed).
Appendix 3: Figures from Original Papers
Figure 4. Original figures from Koedinger et al. (2015): (a) final grade by learner type; (b) causal model generated by Tetrad.
Figure 5. Original figure from Champaign et al. (2014).