The Changing Patterns of MOOC Discourse
Nia M. M. Dowell
Department of Psychology
Institute for Intelligent Systems
University of Memphis
Memphis, United States
ndowell@memphis.edu
Christopher Brooks
School of Information
University of Michigan
Ann Arbor, United States
brooksch@umich.edu
Vitomir Kovanović
School of Informatics
University of Edinburgh
Edinburgh, United Kingdom
v.kovanovic@ed.ac.uk
Srećko Joksimović
Moray House School of Education
University of Edinburgh
Edinburgh, United Kingdom
s.joksimovic@ed.ac.uk
Dragan Gašević
Schools of Education and Informatics
University of Edinburgh
Edinburgh, United Kingdom
dragan.gasevic@ed.ac.uk
ABSTRACT
There is an emerging trend in higher education for the
adoption of massive open online courses (MOOCs). How-
ever, despite this interest in learning at scale, there has been
limited work investigating how MOOC participants have
changed over time. In this study, we explore the temporal
changes in MOOC learners’ language and discourse charac-
teristics. In particular, we demonstrate that there is a clear
trend within a course for language in discussion forums to
become both more on-topic and more reflective of deep learning in
subsequent offerings of a course. We measure this in two
ways, and demonstrate this trend through several repeated
analyses of different courses in different domains. While
not all courses show an increase beyond statistical significance,
the majority do, providing evidence that MOOC
learner populations are changing as the educational
phenomenon matures.
Author Keywords
MOOCs; learning at scale; discussion forums; on-topic dis-
cussion; discourse complexity.
INTRODUCTION
Early research on the MOOC phenomenon saw significant
investment in understanding the makeup of the learner pop-
ulation, largely through demographic [1], performance, and
activity-based measures [2]. With the phenomenon now in its
fifth year, we provide here a retrospective analysis of how
learner engagement within MOOCs has changed based on
the form of learner discussion. In particular, we demon-
strate here that discussions have (a) become more focused
or on-topic over time, and (b) the linguistic features that
characterize MOOC learners' discourse have become more
complex over time.
This discovery has significant implications for instructional
design and course iteration, as well as for future research
in learning at scale. For instance, if the students in future
offerings of a course are a more selective population,
and this population tends
towards more complex and on-topic discussions, course
designers may focus future development efforts on expand-
ing the disciplinary depth of assessments, or introducing
additional depth-based learning activities (e.g. honor tracks
in the Coursera platform). Researchers, meanwhile, need to
be aware of not only the intra-course difference, especially
when doing repeated trials and quasi-experimental designs,
but also the inter-course difference when attempting to gen-
eralize findings. As we show, the population characteristics
of a MOOC in its first offering are not the same as those of
a population in the same course but in subsequent offerings,
and direct comparisons (at least with respect to discourse)
cannot be made.
METHODS
For this analysis, we chose five MOOCs on the Coursera
platform which ran for several sessions (N= 59,017 partici-
pants). We worked with instructional designers to ensure
that each of the courses chosen experienced minimal
changes between course offerings, limited to corrections
and minor additions of content. The instructors had con-
sistent involvement in the course across subsequent offer-
ings. Each course was different with respect to the first ses-
sion start date, the length of the course, the instructor, learn-
ing objectives, participants, and domain being taught. The
courses chosen had all been run between six and ten times
(M = 8.2, SD = 2.05), and the data from all offerings was
included.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. Copyrights for
components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to
post on servers or to redistribute to lists, requires prior specific
permission and/or a fee. Request permissions from Permissions@acm.org.
L@S 2017, April 20-21, 2017, Cambridge, MA, USA
©2017 ACM. ISBN 978-1-4503-4450-0/17/04 $15.00
DOI: http://dx.doi.org/10.1145/3051457.3054005
A mixed-effects modeling approach was adopted for all
analyses due to the structure of the data (e.g., courses over
time) [7]. Mixed-effects models include a combination of
fixed and random effects and can be used to assess the in-
fluence of the fixed effects (e.g. time) on dependent varia-
bles after accounting for any extraneous random effects
(e.g. individual participant differences). The primary analyses
focused on the characteristics of MOOC participants'
discourse over time. We were particularly
interested in changes in discourse features related to
message relevance (measured by the relevance of students’
messages in the discussion with the course video tran-
scripts)1 and linguistic complexity (measured through Coh-
Metrix’s [4] Flesch-Kincaid reading level measure [3]).
Therefore, we developed two mixed-effects models, with
message relevance level and Flesch-Kincaid reading level
as the dependent variables, and time and course as the inde-
pendent variables.
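As described in footnote 1, message relevance was operationalized as LSA-based semantic similarity between a student's post and a space built from the instructor video transcripts. The following is a toy sketch of that general recipe in Python with numpy; the `lsa_relevance` helper, its raw term counts, and the max-cosine scoring rule are illustrative assumptions, not our implementation, which used the full transcript corpus and tuned dimensionality:

```python
import numpy as np

def lsa_relevance(post, transcripts, k=2):
    """Toy on-topicness score: best cosine match between a post and
    transcript chunks in a k-dimensional LSA space (hypothetical helper)."""
    vocab = sorted({w for d in transcripts for w in d.lower().split()})
    idx = {w: i for i, w in enumerate(vocab)}

    def term_vec(text):
        # Raw term counts over the transcript vocabulary; unseen words are ignored.
        v = np.zeros(len(vocab))
        for w in text.lower().split():
            if w in idx:
                v[idx[w]] += 1.0
        return v

    # Term-document matrix over transcript chunks, reduced by truncated SVD.
    td = np.column_stack([term_vec(d) for d in transcripts])
    u, s, vt = np.linalg.svd(td, full_matrices=False)
    u_k, s_k, vt_k = u[:, :k], s[:k], vt[:k, :]

    doc_vecs = (s_k[:, None] * vt_k).T   # transcript chunks in the latent space
    q = term_vec(post) @ u_k             # fold the post into the same space

    def cos(a, b):
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        return 0.0 if na == 0.0 or nb == 0.0 else float(a @ b) / (na * nb)

    return max(cos(q, d) for d in doc_vecs)
```

Posts sharing no vocabulary with the course transcripts score zero, while posts that reuse course terminology score close to one.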
In addition to constructing the models with the on-topic
discussion and Flesch-Kincaid reading level as fixed ef-
fects, null models with the random effects (participant) but
no fixed effects were also constructed. A comparison of the
null random-effects-only model with the fixed-effect models
allowed us to determine whether MOOC participants'
discourse has changed over time above and beyond the
participants' individual differences. Akaike Information Criteri-
on (AIC), Log Likelihood (LL) and a likelihood ratio test
were used to determine the best fitting and most parsimoni-
ous model. In addition, we also estimate effect sizes for
each model, using a pseudo R2 method, as suggested by
Nakagawa and Schielzeth [5]. For mixed-effects models, R2
can be characterized into two varieties: marginal R2 and
conditional R2. Marginal R2 (R2m) is associated with vari-
ance explained by fixed factors, and conditional R2 (R2c)
can be interpreted as the variance explained by the entire
model, namely random and fixed factors. Both R2m and R2c
convey relevant information regarding the model fit and
variance explained, and so we report both here. The NLME
package in R [6] was used to perform all the required com-
putation.
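The Nakagawa-Schielzeth pseudo-R2 values reduce to simple variance ratios once the fixed-effect, random-effect, and residual variance components are extracted from a fitted model. A minimal sketch of that computation (the variance components below are hypothetical, chosen only for illustration):

```python
def pseudo_r2(var_fixed, var_random, var_resid):
    """Nakagawa & Schielzeth (2013) pseudo-R2 for a linear mixed model.
    var_fixed:  variance of the fixed-effect predictions
    var_random: summed variance of the random effects
    var_resid:  residual variance
    """
    total = var_fixed + var_random + var_resid
    r2_marginal = var_fixed / total                    # fixed effects only
    r2_conditional = (var_fixed + var_random) / total  # fixed + random effects
    return r2_marginal, r2_conditional

# Hypothetical variance components, for illustration only:
r2m, r2c = pseudo_r2(var_fixed=0.16, var_random=0.22, var_resid=0.62)
```

Because the random-effect variance enters only the conditional ratio, R2c is always at least as large as R2m, which is the pattern seen in both of our models.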
RESULTS AND DISCUSSION
The likelihood ratio tests indicated that both the on-topic
discussion and Flesch-Kincaid models yielded a significantly
better fit than the null random-effects-only models, with
χ2(9) = 9277.32, p = .001, R2m = .16, R2c = .38 for the
on-topic model, and χ2(9) = 3024.47, p = .0001, R2m = .05,
R2c = .37 for the Flesch-Kincaid model.

1 Relevance was determined by building a custom LSA space
using the instructor video transcripts for the course as source
data. The on-topicness of a student's post was then calculated
by computing the semantic similarity between the post and the
LSA space.

Several conclusions can be drawn from this initial model fit
evaluation and inspection of the R2 variance. First, the model comparisons imply
that temporality and course features were able to add a sig-
nificant improvement in characterizing the trend of both
MOOC participants’ rate of on-topic posting and linguistic
complexity, above and beyond individual participant differ-
ences. Second, for the on-topic model, time, course, and
individual participant features explained about 38% of the
predictable variance, with 16% of the variance being accounted
for by the time and course features alone. However,
for the Flesch-Kincaid model, time and course features
were only able to explain a total of 5% of the variance in
grade level. The observed difference in variance suggests that
temporal changes and course features are more accurate at
characterizing changes in MOOC participants' on-topic
discussion than their linguistic complexity. Table 1 shows the
coefficients for the main effects of each course and course
by time interactions. To assess course-time interactions, a
reference category was selected for the categorical predictor
variable of course (i.e., Thermodynamics) for both models.
The main effect coefficients for each course in Table 1 rep-
resent the difference in the intercepts between a given
course and the reference course, Thermodynamics,
when the time variable is at its mean value. However, because
we are more interested in the temporal changes in on-topic
discussion and linguistic complexity, these main effects
are of less relevance for the current research. The interaction
coefficients for the on-topic model indicate that
four of the five MOOC courses are increasing in on-topic
discussion over time, as compared to the Thermodynamics
reference course.
                                     On-Topic Model     Flesch-Kincaid Model
Variable                             β         SE       β         SE
Main Effects
  Thermodynamics                      0.72***  0.007     7.18***  0.11
  Fantasy & Science Fiction           0.07***  0.007    -0.12     0.11
  Instructional Methods               0.15***  0.010     2.04***  0.16
  Finance                            -0.13***  0.007    -1.10***  0.11
  Model Thinking                     -0.07***  0.007    -0.44***  0.11
Interactions
  Thermodynamics * Time              -0.005    0.006     0.01     0.09
  Fantasy & Science Fiction * Time    0.03***  0.006     0.11     0.10
  Instructional Methods * Time        0.02*    0.009     0.22     0.14
  Finance * Time                      0.03***  0.006     0.21**   0.09
  Model Thinking * Time               0.02***  0.006     0.37***  0.09

Table 1. All-learner mixed-effects model coefficients for predicting
changes in on-topic discussion and Flesch-Kincaid grade level over
time. Note: * p < .09; ** p < .05; *** p < .001. β = fixed-effect
coefficient; SE = standard error. N = 59,017.
For the Flesch-Kincaid model, we see two of the courses
have increased in linguistic complexity, as compared to the
Thermodynamics reference course. We further probed the
Fantasy & Science Fiction and Instructional Methods
courses to see if the temporal trend for linguistic complexi-
ty was significant when it is not being compared to the ref-
erence category. Specifically, we constructed additional
models by regressing Flesch-Kincaid grade level on time, for
the Fantasy & Science Fiction and Instructional Methods courses
separately. This analysis revealed that linguistic complexity
was indeed increasing significantly for both the Fantasy &
Science Fiction with χ2(1) = 11.57, p < .001, β = .12, p <
.001, and the Instructional Methods course with χ2(1) =
8.04, p < .01, β = .24, p < .01.
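For nested models differing by a single parameter, as in the per-course follow-up tests above, the likelihood ratio statistic and its p-value come from a short computation: χ2 = 2(LL_full - LL_null), referred to a chi-square distribution with one degree of freedom. A self-contained sketch (the log-likelihood values shown are hypothetical, not our fitted models):

```python
import math

def lrt_chi2_df1(ll_null, ll_full):
    """Likelihood ratio test for nested models that differ by one parameter."""
    chi2 = 2.0 * (ll_full - ll_null)
    # Survival function of chi-square with df = 1: P(X > chi2) = erfc(sqrt(chi2/2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Hypothetical log-likelihoods chosen to reproduce a chi2 of 11.57:
chi2, p = lrt_chi2_df1(ll_null=-2300.0, ll_full=-2294.215)
```

A chi2(1) statistic of 11.57 yields p well below .001, consistent with the significance level reported for the Fantasy & Science Fiction follow-up model.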
These temporal changes in on-topic discussion and linguis-
tic complexity are depicted in Figures 1 and 2, respectively.
Note that the standardized time variable was used in the
analysis; however, the relationship is plotted across years in
the figures below to aid visualization. Figure 1
illustrates the temporal trend of on-topic discussion, which
appears to increase with subsequent offerings of a course,
for all courses but Thermodynamics. Figure 2 shows that the
temporal trend of grade reading level also appears to increase
with subsequent offerings of a course, for all courses except
Thermodynamics.
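For reference, the Flesch-Kincaid grade level plotted in Figure 2 maps average sentence length and average syllables per word onto a U.S. school grade: 0.39 x (words/sentences) + 11.8 x (syllables/words) - 15.59. A minimal sketch using a naive vowel-group syllable heuristic (Coh-Metrix's actual implementation is considerably more sophisticated):

```python
import re

def count_syllables(word):
    """Approximate syllables as runs of vowels (crude heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """FK grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / max(1, len(sentences))
            + 11.8 * syllables / max(1, len(words)) - 15.59)
```

Short, monosyllabic sentences can score below grade zero, while dense polysyllabic prose scores many grades higher, which is why the measure serves as a rough proxy for linguistic complexity.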
CONCLUSIONS
This paper shares some of our initial explorations of issues
associated with discussion forums of course-based Massive
Open Online Courses. In this work, we have demonstrated
the increasing relevance and linguistic complexity of
MOOC discussion fora over subsequent offerings. While
not all courses have the same amplitude of increase, there is
a general trend seen in all courses except for one, an intro-
ductory thermodynamics course. We have not addressed the
question as to why discourse patterns are changing in
MOOCs. It may be that the population for subsequent offer-
ings is more niche, and new courses are generally taken by
the broadest (in terms of interest) population. It could be an
effect of habitual course takers: there are several anecdotes,
which we aim to explore more fully, of learners retaking
courses despite having passed them, either to sign up as a
formal mentor for the course, or to engage in continued on-
topic learning with new cohorts. Or it could be an effect of
the MOOC phenomenon in general, with a steadily increasing
user base and distribution of new courses. In our future
work, we will also explore additional MOOC participant
population characteristics, and incorporate the total number
of posts per learner into the models.
Figure 1. Linear mixed-effect model fitted estimates for on-topic discussion over time for five MOOC courses.
Figure 2. Linear mixed-effect model fitted estimates for Flesch-Kincaid Grade level over time for each of the five
MOOC courses.
ACKNOWLEDGMENTS
This research was supported in part by the National Science
Foundation under Grant No. BCC 14-517. Any opinions,
findings, and conclusions or recommendations expressed in
this material are those of the authors and do not necessarily
reflect the views of these funding agencies.
REFERENCES
[1] Emanuel, E.J. 2013. Online education: MOOCs taken
by educated few. Nature. 503, 7476 (2013), 342.
[2] Kizilcec, R.F. et al. 2013. Deconstructing Disengage-
ment: Analyzing Learner Subpopulations in Massive
Open Online Courses. Proceedings of the Third Inter-
national Conference on Learning Analytics and
Knowledge (New York, NY, USA, 2013), 170–179.
[3] Klare, G.R. 1974. Assessing readability. Reading Re-
search Quarterly. 10 (1974–1975), 62–102.
[4] McNamara, D.S. et al. 2014. Automated evaluation of
text and discourse with Coh-Metrix. Cambridge Uni-
versity Press.
[5] Nakagawa, S. and Schielzeth, H. 2013. A general and
simple method for obtaining R2 from generalized line-
ar mixed-effects models. Methods in Ecology and Evo-
lution. 4, 2 (Feb. 2013), 133–142.
[6] Pinheiro, J. et al. 2016. nlme: Linear and nonlinear
mixed effects models.
[7] Pinheiro, J.C. and Bates, D.M. 2000. Mixed-effects
models in S and S-Plus. Springer.