The Changing Patterns of MOOC Discourse
Nia M. M. Dowell
Department of Psychology
Institute for Intelligent Systems
University of Memphis
Memphis, United States
ndowell@memphis.edu
Christopher Brooks
School of Information
University of Michigan
Ann Arbor, United States
brooksch@umich.edu
Vitomir Kovanović
School of Informatics
University of Edinburgh
Edinburgh, United Kingdom
v.kovanovic@ed.ac.uk
Srećko Joksimović
Moray House School of Education
University of Edinburgh
Edinburgh, United Kingdom
s.joksimovic@ed.ac.uk
Dragan Gašević
Schools of Education and Informatics
University of Edinburgh
Edinburgh, United Kingdom
dragan.gasevic@ed.ac.uk
ABSTRACT
There is an emerging trend in higher education for the
adoption of massive open online courses (MOOCs). How-
ever, despite this interest in learning at scale, there has been
limited work investigating how MOOC participants have
changed over time. In this study, we explore the temporal
changes in MOOC learners’ language and discourse charac-
teristics. In particular, we demonstrate that there is a clear
trend within a course for language in discussion forums to
be both more on-topic and more reflective of deep learning
in subsequent offerings. We measure this in two
ways, and demonstrate this trend through several repeated
analyses of different courses in different domains. While
not all courses show a statistically significant increase, the
majority do, providing evidence that MOOC learner
populations are changing as the educational phenomenon
matures.
Author Keywords
MOOCs; learning at scale; discussion forums; on-topic dis-
cussion; discourse complexity.
INTRODUCTION
Early research on the MOOC phenomenon saw significant
investment in understanding the makeup of the learner pop-
ulation, largely through demographic [1], performance, and
activity-based measures [2]. With the phenomenon now in its
fifth year, we provide here a retrospective analysis of how
learner engagement within MOOCs has changed based on
the form of learner discussion. In particular, we demon-
strate here that discussions have (a) become more focused
or on-topic over time, and (b) the linguistic features that
characterize MOOC learners’ discourse has become more
complex over time.
This discovery has significant implications for instructional
design and course iteration, as well as for future research in
learning at scale. For instance, if students in future offerings
of a course form a more selective population, and this
population tends towards more complex and on-topic
discussions, course
designers may focus future development efforts on expand-
ing the disciplinary depth of assessments, or introducing
additional depth-based learning activities (e.g. honor tracks
in the Coursera platform). Researchers, meanwhile, need to
be aware of not only the intra-course difference, especially
when doing repeated trials and quasi-experimental designs,
but also the inter-course difference when attempting to gen-
eralize findings. As we show, the population characteristics
of a MOOC in its first offering are not the same as those of
a population in the same course but in subsequent offerings,
and direct comparisons (at least with respect to discourse)
cannot be made.
METHODS
For this analysis, we chose five MOOCs on the Coursera
platform which ran for several sessions (N= 59,017 partici-
pants). We worked with instructional designers to ensure
that each of the courses chosen experienced minimal
changes between course offerings, limited to corrections
and minor additions of content. The instructors had con-
sistent involvement in the course across subsequent offer-
ings. Each course was different with respect to the first ses-
sion start date, the length of the course, the instructor, learn-
ing objectives, participants, and domain being taught. The
courses chosen had all been run between six and ten times
(x̄ = 8.2, σ = 2.05), and the data from all offerings were
included.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. Copyrights for com-
ponents of this work owned by others than ACM must be honored. Ab-
stracting with credit is permitted. To copy otherwise, or republish, to post
on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from Permissions@acm.org.
L@S 2017, April 20-21, 2017, Cambridge, MA, USA
©2017 ACM. ISBN 978-1-4503-4450-0/17/04 $15.00
DOI: http://dx.doi.org/10.1145/3051457.3054005
A mixed-effects modeling approach was adopted for all
analyses due to the structure of the data (e.g., courses over
time) [7]. Mixed-effects models include a combination of
fixed and random effects and can be used to assess the in-
fluence of the fixed effects (e.g. time) on dependent varia-
bles after accounting for any extraneous random effects
(e.g. individual participant differences). The primary anal-
yses focused on characterizing MOOC participants'
discourse features over time. We were particu-
larly interested in changes in discourse features related to
message relevance (measured by the relevance of students’
messages in the discussion with the course video tran-
scripts)1 and linguistic complexity (measured through Coh-
Metrix’s [4] Flesch-Kincaid reading level measure [3]).
Therefore, we developed two mixed-effects models, with
message relevance level and Flesch-Kincaid reading level
as the dependent variables, and time and course as the inde-
pendent variables.
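The two dependent measures in the study were derived with a custom LSA space and Coh-Metrix, respectively; neither pipeline is reproduced here. As a rough, hypothetical stand-in, the underlying ideas can be sketched in Python: a term-frequency cosine similarity in place of LSA-based relevance, and the standard Flesch-Kincaid grade-level formula.

```python
import math
import re
from collections import Counter

def cosine_relevance(post, transcript):
    """Crude stand-in for the LSA-based relevance score: cosine
    similarity between bag-of-words term-frequency vectors. (The
    study instead projected texts into an LSA space built from the
    instructor video transcripts.)"""
    tf1 = Counter(re.findall(r"[a-z']+", post.lower()))
    tf2 = Counter(re.findall(r"[a-z']+", transcript.lower()))
    dot = sum(tf1[w] * tf2[w] for w in tf1)
    norm = (math.sqrt(sum(v * v for v in tf1.values()))
            * math.sqrt(sum(v * v for v in tf2.values())))
    return dot / norm if norm else 0.0

def flesch_kincaid_grade(words, sentences, syllables):
    """Standard Flesch-Kincaid grade-level formula."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical post/transcript pair and document counts:
relevance = cosine_relevance(
    "entropy increases in an isolated system",
    "the second law says entropy of an isolated system increases")
grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
```

A higher `relevance` score indicates a post closer to the course material; a higher `grade` indicates more linguistically complex writing.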
In addition to the models with time and course as fixed ef-
fects, null models with the random effect of participant but
no fixed effects were also constructed. A comparison of the
null random-effects-only model with the fixed-effect mod-
els allowed us to determine whether MOOC participants'
discourse has changed over time above and beyond
individual participant differences. Akaike Information Criteri-
on (AIC), Log Likelihood (LL) and a likelihood ratio test
were used to determine the best fitting and most parsimoni-
ous model. We also estimated effect sizes for each model
using the pseudo-R2 method suggested by
Nakagawa and Schielzeth [5]. For mixed-effects models, R2
can be characterized into two varieties: marginal R2 and
conditional R2. Marginal R2 (R2m) is associated with vari-
ance explained by fixed factors, and conditional R2 (R2c)
can be interpreted as the variance explained by the entire
model, namely random and fixed factors. Both R2m and R2c
convey relevant information regarding the model fit and
variance explained, and so we report both here. The NLME
package in R [6] was used to perform all the required com-
putation.
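The model-comparison quantities named above (AIC, the likelihood ratio statistic, and the Nakagawa-Schielzeth pseudo-R2) reduce to simple formulas. A minimal sketch, with hypothetical variance components and log-likelihoods chosen purely so the outputs line up with the on-topic model's reported values (R2m = .16, R2c = .38, χ2 = 9277.32); these inputs are not from the study.

```python
def pseudo_r2(var_fixed, var_random, var_resid):
    """Nakagawa & Schielzeth (2013) pseudo-R2 for mixed models.
    var_fixed:  variance of the fixed-effect predictions
    var_random: summed variance of the random effects
    var_resid:  residual variance"""
    total = var_fixed + var_random + var_resid
    r2_marginal = var_fixed / total                     # fixed effects only
    r2_conditional = (var_fixed + var_random) / total   # fixed + random
    return r2_marginal, r2_conditional

def likelihood_ratio(ll_null, ll_full):
    """LR statistic for nested models; compared against a chi-square
    distribution with df = difference in parameter counts."""
    return 2.0 * (ll_full - ll_null)

def aic(ll, k):
    """Akaike Information Criterion: lower indicates a better,
    more parsimonious fit. k = number of estimated parameters."""
    return 2.0 * k - 2.0 * ll

# Hypothetical inputs for illustration only:
r2m, r2c = pseudo_r2(var_fixed=0.16, var_random=0.22, var_resid=0.62)
lr = likelihood_ratio(ll_null=-12000.0, ll_full=-7361.34)
a = aic(ll=-10.0, k=3)
```

In practice these quantities would be extracted from fitted model objects (e.g., via `logLik` and `AIC` in R) rather than computed by hand.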
RESULTS AND DISCUSSION
The likelihood ratio tests indicated that both the on-topic
discussion and Flesch-Kincaid models yielded a significantly
better fit than the null random-effects-only models, with
χ2(9) = 9277.32, p = .001, R2m = .16, R2c = .38 for the
on-topic model, and χ2(9) = 3024.47, p = .0001, R2m = .05,
R2c = .37 for the Flesch-Kincaid model.

1 Relevance was determined by building a custom LSA
space using the instructor video transcripts for the course as
source data. The on-topicness of a student's post was then
calculated by computing the semantic similarity between the
post and the LSA space.

Several conclusions can be drawn from this initial model fit
evaluation and inspection of R2 variance. First, the model
comparisons imply
that temporality and course features were able to add a sig-
nificant improvement in characterizing the trend of both
MOOC participants’ rate of on-topic posting and linguistic
complexity, above and beyond individual participant differ-
ences. Second, for the on-topic model, time, course, and
individual participant features explained about 38% of the
predictable variance, with 16% of the variance being ac-
counted for by the time and course features alone. How-
ever, for the Flesch-Kincaid model, time and course fea-
tures explained only 5% of the variance in grade
level. This difference suggests that temporal changes and
course are more accurate at characterizing changes in
MOOC participants' on-topic discussion than in their
linguistic complexity. Table 1 shows the
coefficients for the main effects of each course and course
by time interactions. To assess course-time interactions, a
reference category was selected for the categorical predictor
variable of course (i.e., Thermodynamics) for both models.
The main effect coefficients for each course in Table 1 rep-
resent the difference in the intercepts between a given
course and the reference course, Thermodynamics,
when the time variable is at its mean value. However, be-
cause we are more interested in the temporal changes in on-
topic discussion and linguistic complexity, these main ef-
fects are of less relevance for the current research. The in-
teraction coefficients for the on-topic model indicate that
four of the five MOOC courses are increasing in on-topic
discussion over time, as compared to the Thermodynamics
reference course.
                                   On-Topic Model    Flesch-Kincaid Model
Variable                           β         SE      β         SE
Main Effects
 Thermodynamics                    0.72***   0.007   7.18***   0.11
 Fantasy & Science Fiction         0.07***   0.007   -0.12     0.11
 Instructional Methods             0.15***   0.010   2.04***   0.16
 Finance                           -0.13***  0.007   -1.10***  0.11
 Model Thinking                    -0.07***  0.007   -0.44***  0.11
Interactions
 Thermodynamics × Time             -0.005    0.006   0.01      0.09
 Fantasy & Science Fiction × Time  0.03***   0.006   0.11      0.10
 Instructional Methods × Time      0.02*     0.009   0.22      0.14
 Finance × Time                    0.03***   0.006   0.21**    0.09
 Model Thinking × Time             0.02***   0.006   0.37***   0.09

Table 1. All-learner mixed-effects model coefficients for pre-
dicting changes in on-topic discussion and Flesch-Kincaid
grade level over time. Note: * p < .09; ** p < .05; *** p < .001.
Fixed-effect coefficient (β). Standard error (SE). N = 59,017.
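With reference coding, a course's fitted trajectory combines the reference intercept and slope with that course's offsets. A hypothetical sketch of this arithmetic using Table 1's on-topic coefficients, assuming the Thermodynamics × Time row is the reference slope and time is standardized (an interpretation sketch, not the study's model code):

```python
# (intercept offset, slope offset) relative to Thermodynamics,
# from the on-topic column of Table 1:
COEF_ON_TOPIC = {
    "Thermodynamics": (0.0, 0.0),
    "Fantasy & Science Fiction": (0.07, 0.03),
    "Instructional Methods": (0.15, 0.02),
    "Finance": (-0.13, 0.03),
    "Model Thinking": (-0.07, 0.02),
}
BASE_INTERCEPT = 0.72   # Thermodynamics intercept
BASE_SLOPE = -0.005     # Thermodynamics x Time slope

def fitted_on_topic(course, std_time):
    """Predicted on-topic score for a course at a standardized time,
    combining reference coefficients with the course's offsets."""
    d_int, d_slope = COEF_ON_TOPIC[course]
    return (BASE_INTERCEPT + d_int) + (BASE_SLOPE + d_slope) * std_time

# Finance starts below Thermodynamics but trends upward over time:
finance_now = fitted_on_topic("Finance", 0.0)    # 0.72 - 0.13 = 0.59
finance_later = fitted_on_topic("Finance", 1.0)  # 0.59 + 0.025 = 0.615
```

This illustrates why the interaction rows, not the main effects, carry the temporal story: the main effects shift intercepts, while the interactions shift slopes.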
For the Flesch-Kincaid model, we see two of the courses
have increased in linguistic complexity, as compared to the
Thermodynamics reference course. We further probed the
Fantasy & Science Fiction and Instructional Methods
courses to see if the temporal trend for linguistic complex-
ity was significant when not compared to the reference
category. Specifically, we constructed additional models
regressing Flesch-Kincaid reading level on time for the
Fantasy & Science Fiction and Instructional Methods courses
separately. This analysis revealed that linguistic complexity
was indeed increasing significantly for both the Fantasy &
Science Fiction with χ2(1) = 11.57, p < .001, β = .12, p <
.001, and the Instructional Methods course with χ2(1) =
8.04, p < .01, β = .24, p < .01.
These temporal changes in on-topic discussion and linguis-
tic complexity are depicted in Figures 1 and 2, respectively.
Note that while the standardized time variable was used in
the analysis, the relationship is plotted across years in the
figures below to aid visualization. Figure 1
illustrates the temporal trend of on-topic discussion, which
appears to increase with subsequent offerings of a course,
for all courses but thermodynamics. Figure 2 shows that the
temporal trend of grade reading level likewise appears to
increase with subsequent offerings of a course, for all
courses but thermodynamics.
CONCLUSIONS
This paper shares some of our initial explorations of issues
associated with discussion forums of course-based Massive
Open Online Courses. In this work, we have demonstrated
the increasing relevance and linguistic complexity of
MOOC discussion fora over subsequent offerings. While
not all courses have the same amplitude of increase, there is
a general trend seen in all courses except for one, an intro-
ductory thermodynamics course. We have not addressed the
question as to why discourse patterns are changing in
MOOCs. It may be that the population for subsequent offer-
ings is more niche, and new courses are generally taken by
the broadest (in terms of interest) population. It could be an
effect of habitual course takers: there are several anecdotes,
which we aim to explore more fully, of learners retaking
courses despite having passed them, either to sign up as a
formal mentor for the course or to engage in continued on-
topic learning with new cohorts. Or it could be an effect of
the MOOC phenomenon in general, with a steadily increas-
ing user base and distribution of new courses. In our future
work, we will also explore additional MOOC participant
population characteristics, and incorporate the total number
of posts per learner into the models.
Figure 1. Linear mixed-effect model fitted estimates for on-topic discussion over time for five MOOC courses.
Figure 2. Linear mixed-effect model fitted estimates for Flesch-Kincaid Grade level over time for each of the five
MOOC courses.
ACKNOWLEDGMENTS
This research was supported in part by the National Science
Foundation under Grant No. BCC 14-517. Any opinions,
findings, and conclusions or recommendations expressed in
this material are those of the authors and do not necessarily
reflect the views of these funding agencies.
REFERENCES
[1] Emanuel, E.J. 2013. Online education: MOOCs taken
by educated few. Nature. 503, 7476 (2013), 342.
[2] Kizilcec, R.F. et al. 2013. Deconstructing Disengage-
ment: Analyzing Learner Subpopulations in Massive
Open Online Courses. Proceedings of the Third Inter-
national Conference on Learning Analytics and
Knowledge (New York, NY, USA, 2013), 170–179.
[3] Klare, G.R. 1974. Assessing readability. Reading Re-
search Quarterly. 10 (1974), 62–102.
[4] McNamara, D.S. et al. 2014. Automated evaluation of
text and discourse with Coh-Metrix. Cambridge Uni-
versity Press.
[5] Nakagawa, S. and Schielzeth, H. 2013. A general and
simple method for obtaining R2 from generalized line-
ar mixed-effects models. Methods in Ecology and Evo-
lution. 4, 2 (Feb. 2013), 133–142.
[6] Pinheiro, J. et al. 2016. nlme: Linear and nonlinear
mixed effects models.
[7] Pinheiro, J.C. and Bates, D.M. 2000. Mixed-effects
models in S and S-Plus. Springer.