Towards Automated Content Analysis of Discussion Transcripts: A Cognitive Presence Case


Vitomir Kovanović
School of Informatics
The University of Edinburgh
Edinburgh, UK

Srećko Joksimović
Moray House School of Education
The University of Edinburgh
Edinburgh, UK

Zak Waters
Queensland University of Technology
Brisbane, Australia

Dragan Gašević
Moray House School of Education and School of Informatics
The University of Edinburgh
Edinburgh, UK

Kirsty Kitto
Queensland University of Technology
Brisbane, Australia

Marek Hatala
School of Interactive Arts and Technology
Simon Fraser University
Burnaby, Canada

George Siemens
LINK Research Lab
University of Texas at Arlington
Arlington, USA
ABSTRACT

In this paper, we present the results of an exploratory study that examined the problem of automating content analysis of student online discussion transcripts. We looked at the problem of coding discussion transcripts for the levels of cognitive presence, one of the three main constructs in the Community of Inquiry (CoI) model of distance education. Using Coh-Metrix and LIWC features, together with a set of custom features developed to capture discussion context, we developed a random forest classification system that achieved 70.3% classification accuracy and 0.63 Cohen’s κ, which is significantly higher than the values reported in previous studies. Besides the improvement in classification accuracy, the developed system is also less sensitive to overfitting, as it uses only 205 classification features – around 100 times fewer than similar systems based on bag-of-words features. We also provide an overview of the classification features most indicative of the different phases of cognitive presence, which gives additional insight into the nature of the cognitive presence learning cycle. Overall, our results show the great potential of the proposed approach, with the added benefit of providing further characterization of the cognitive presence coding scheme.
Keywords: Community of Inquiry (CoI) model, content analysis, content analytics, online discussions, text classification
1. INTRODUCTION

Online discussions are commonly used in modern higher education, both for blended and fully online learning [42]. In distance
education, given the absence of face-to-face interactions, online
discussions represent an important component of the whole edu-
cational experience. This is especially important for the social-
constructivist pedagogies which emphasize the value of social con-
struction of knowledge through interactions and discussions among
a group of learners [3]. In this regard, the Community of Inquiry
(CoI) model [23,24] represents perhaps one of the best researched
and validated models of online and distance education, focused on
explaining important dimensions – also known as presences – that
shape students’ online learning experience.
The most commonly used approaches to the analysis of online
discussion transcripts are based on the quantitative content analysis
(QCA) [12,54,50,15]. According to Krippendorff [37], content analysis is “a research technique for making replicable and valid inferences from texts (or other meaningful matter) to the contexts of their use” [p. 18]. In the case of the study presented in this paper, the context is online learning environments. QCA is a well-defined research technique commonly used in social science research, and
it makes use of specifically designed coding schemes to analyze text
artifacts with respect to the defined research goals and objectives.
For instance, the CoI model defines a set of coding schemes which
are used by the educational researchers to assess the levels of three
CoI presences.
In the domain of educational research, QCA of student discussion data has mainly been used for retrospective research after courses are over, without an impact on the courses’ learning outcomes [53]. In the field of content analytics [36] – which focuses
on building analytical models based on the learning content includ-
ing student-produced content such as online discussion messages –
there have been some attempts to automate some of those coding
schemes. Most notable are the efforts of McKlin [44] and Corich
et al. [11] on automation of the CoI coding schemes, which served
as a starting point for our research in the area [35,62]. One of
the main challenges for automation of content analysis is the fact
that the most important constructs from the educational perspective
(e.g., student group learning progress, motivation, engagement, so-
cial climate) are latent constructs not explicitly present in the dis-
cussion transcripts. This means the assessment of these constructs
requires human interpretation and judgment.
This paper presents the results of a study that explored the use of
content analytics for automating content analysis of student online
discussions based on the CoI coding schemes. We focused on au-
tomation of the content analysis of cognitive presence, one of the
main constructs in the CoI model. By building upon the existing
work in the fields of text mining and text classification and our previous work in this area [35,62], we developed a random forest classifier which makes use of a novel set of classification features and provides a classification accuracy of 70.3% and Cohen’s κ of 0.63 in our cross-validation testing. In this paper, we describe the developed classifier and the adopted classification features. We also
report on the findings of the empirical evaluation of the classifier
and critically discuss the findings.
2.1 The Community of Inquiry (CoI) model
The Community of Inquiry (CoI) model is a widely researched
model that explains different dimensions of social learning in on-
line learning communities [23,24]. Central to the model are the
three constructs, also known as presences, which together provide
a comprehensive understanding of learning processes [23,24]:
1) Cognitive presence which is the central construct in the CoI
model and describes different phases of student knowledge con-
struction within a learning community [24].
2) Social presence captures different social relationships within a
learning community that have a significant impact on the success
and quality of the learning process [51].
3) Teaching presence explains the role of instructors during the
course delivery as well as their role in the course design and
preparation [4].
The focus of this study is on the analysis of cognitive presence,
which is defined by Garrison et al. [24] as “an extent to which the
participants in any particular configuration of a community of in-
quiry are able to construct meaning through sustained communication” [p. 11]. Cognitive presence is grounded in the constructivist
views of Dewey [14] and is “the element in this [CoI] model that
is most basic to success in higher education” [23, p89]. Cogni-
tive presence is operationalized by the practical inquiry model [24],
which defines the following four phases:
1) Triggering event: In this phase, an issue, dilemma or problem
is identified. In the case of a formal educational context, those
are often explicitly defined by the instructors; however, they can
also be initiated by other discussion participants [24].
2) Exploration: This phase is characterized by the transition be-
tween the private world of reflective learning and the shared
world of social construction of knowledge [24]. Questioning,
brainstorming and information exchange are the main activities
which characterize this phase [24].
3) Integration: In this phase, students move between reflection
and discourse. The phase is characterized by the synthesis of
the ideas generated in the exploration phase. The synthesis ulti-
mately leads to the construction of meaning [24]. From a teach-
ing perspective, this is the most difficult phase to detect from
the discussion transcripts, as the integration of ideas is often not
clearly identifiable.
4) Resolution: In this phase, students resolve the original prob-
lem or dilemma that started the learning cycle. In the formal
educational setting, this is typically achieved through vicarious hypothesis testing or consensus building within a learning
community [24].
The CoI model defines its own multi-dimensional content analy-
sis schemes [23,24] and a 34-item Likert-scale survey instrument [5]
which are used for the assessment of the three presences. The model
has gained considerable attention in the research community, resulting in a fairly large number of replication studies and empirical
validations (for an overview, see [25]), including studies of the
interaction dynamics between the three presences [26]. In general,
the model has been shown to be robust, and its coding scheme ex-
hibits sufficient levels of inter-rater reliability for it to be considered
a valid construct [25].
While the CoI model has proven to be a very useful model for the assessment of social distance learning, there are several practical issues that still remain open. First, the use of the CoI coding schemes requires a substantial amount of manual work, which is very time-consuming and requires trained coders. For example, to
code the dataset used in this study, two experienced coders spent
around 130 hours each to manually code 1,747 messages [27]. The
coding process started with the calibration of the use of the coding
scheme which was then followed by the independent coding, and
finally reconciliation of the coding disagreements.
One major consequence of the manual coding of messages is that the CoI model has been used mostly for research purposes and not for the real-time monitoring of students’ learning progress and guiding instructional interventions. This is not unique to the CoI model and is very common with most content analysis schemes used in
education. The lack of automated content analysis approaches has
been identified by Donnelly and Gardner [15] as one of the main rea-
sons why transcript analysis techniques have had almost zero impact
on educational practice. The development of the CoI survey instru-
ment [5] is one attempt to eliminate, or at least lessen, the need for the manual content analysis of discussion transcripts. Still, the
instrument is based on self-reported survey data, which makes it not well suited for the real-time monitoring and guidance of student learning.

In order to enable broader adoption of the CoI model, the
coding process needs to be automated and this is precisely the goal
of the current study. While this study focuses on automation of cod-
ing online discussion transcripts for the levels of cognitive presence,
a more general goal is to automate coding for all three presences, which would enable a more comprehensive view of social learning phenomena and the development of more sophisticated social
learning environments [60]. This in turn could be used by the in-
structors to inform their interventions leading to better achievement
of learning objectives. From the standpoint of self-regulated learn-
ing research [8] – a major theory in contemporary education – in
order to regulate their own learning effectively, learners need real-
time feedback, which is an “inherent catalyst” for all self-regulated
activities [8]. If learners are provided with timely feedback on their own learning and the learning of their peers, they will be in a better position to regulate their own learning activities.
2.2 Automating Cognitive Presence Analysis
Several studies have investigated automating content analysis us-
ing the cognitive presence coding scheme. A study by McKlin [44]
describes a system built using feed-forward, back-propagation ar-
tificial neural network that was trained on a single semester worth
of discussion messages (N=1,997). The classification features were the counts of words in each of the 182 word categories defined in the General Inquirer category model [52]. McKlin [44]
also used a binary indicator of whether a message is a reply to another
message, as triggering events are more likely to be the discussion
starters and thus not replies to other messages. Finally, McKlin [44]
defined custom categories of words and phrases, which are thought
to be indicative of the different phases of cognitive presence and
included counts of words in those categories as additional classification features. For example, the “indicative words” category contains
“compared to”, “I agree”, “that reminds me of”, and “thanks” as it is
hypothesized that integration messages would contain a larger number of these phrases in order to connect the message with the previously given information. Unfortunately, these additional coding categories are described only very briefly, and thus it is not possible to replicate them and evaluate their usability in future studies. McKlin’s
findings show that the classification system overgeneralized the exploration phase and undergeneralized the integration phase. Furthermore, given the very low frequency of messages in the resolution
phase (i.e., <1% and only 3 messages in total in their data set), the
neural network developed by McKlin simply ignored the resolution
category and never predicted the resolution phase for any message
in the corpus. Overall, they reported a Holsti’s Coefficient of Reliability [31] of 0.69 and a Cohen’s κ of 0.31, which shows some potential of the proposed approach, with much room for improvement in order to reach the reliability levels commonly found between two independent coders – usually a Cohen’s κ of at least 0.70 [29].
Following the work of McKlin [44], a study by Corich et al. [11] presented ACAT, a general classification framework, also based on word-count features, that can support coding schemes other than cognitive presence. In order to use ACAT, users are required to provide a set of labeled training examples, which are used
for training of classification models. Furthermore, as ACAT does
not specify a particular set of word categories that are used as classi-
fication features, users are required to provide definitions (i.e., cate-
gory name and list of words) that are used as classification features.
Interestingly, the ACAT system was also evaluated on the problem of coding cognitive presence in the CoI model. However,
instead of classifying each message to one of the four phases of
cognitive presence, Corich et al. [11] classified each sentence of each message into one of the four cognitive presence levels. This poses some
theoretical challenges as the CoI coding schemes are originally de-
signed to be used for message-level content analysis. The dataset
used by Corich et al. [11] consists of 484 sentences originating from 74 discussion messages, and they report a Holsti’s coefficient of reliability of 0.71 in their best test case. However, given that their report did not provide sufficient details about the classification scheme used, in terms of the specific indicators for each category of cognitive presence, nor did it discuss the types of features that were used for classification, it is hard to evaluate the significance of their results.
Besides the studies by McKlin [44] and Corich et al. [11], we
should also mention our previous work in this domain. A study
by Kovanović et al. [35] investigated the use of Support Vector Ma-
chines (SVMs) [59] classification for the automation of cognitive
presence coding using a bag-of-words approach based on N-gram and Part-of-Speech (POS) N-gram features. Using a 10-fold cross-validation, a Cohen’s κ of 0.41 was achieved – which is higher than the values reported in the previous studies [44,11].
Several challenges related to the classification of online discus-
sion messages based on cognitive presence were observed in our existing work [35]. First, the distribution of classes in the used dataset
(i.e., phases of cognitive presence) was uneven, which is in agree-
ment with the findings commonly reported in the literature [25].
This poses some challenges to the classification accuracy. This was
already seen in the McKlin [44] study whose classifier completely
ignored the resolution phase (as only three messages were coded
as being in resolution phase). Secondly, the use of bag-of-words
features (i.e., n-grams, POS n-grams, and back-off n-grams) cre-
ates a very large feature space (i.e., more than 20,000 features) rel-
ative to the number of classification instances (i.e., 1,747) which
poses a challenge of over-fitting. Next, the use of bag-of-words features makes the classification system highly domain-dependent, as
the space of bag-of-words features is defined based on the training
set. For instance, a classification system trained on an introductory programming course would likely have a bigram feature “java programming” which is highly specific to a particular domain and would
impede the performance of the classifier in other domains. Finally,
given that each message belongs to a discussion and represents a
part of the overall conversation, the context of the previous mes-
sages in the discussion thread is very important. For example, given
the structure and cyclic nature of the inquiry process, it is highly unlikely that a discussion would start with a resolution message, or that the first response to a triggering message will be an integration
message [27]. These “dependencies” between discussion messages
are not taken into account when each message is classified independently of other messages in the discussion.
In order to address the challenge of isolated classification of dis-
cussion messages, Waters et al. [62] developed a structured classi-
fication system using conditional random fields (CRFs) [38]. This
classifier makes a prediction for the whole sequence of messages within a discussion, taking into account the ordering of messages within
a discussion thread. Using a 10-fold cross-validation, the devel-
oped classifier achieved a Cohen’s κ of 0.48, which is significantly higher than the Cohen’s κ of 0.41 reported by [35], showing the promise of the structured classification approach. However, there are still a couple of unresolved issues which warrant further investigation. First of all, although the classification accuracy is improved, it is still far below the Cohen’s κ of 0.7 which is considered the norm for assessing the quality of coding in the CoI research community [29]. Secondly, CRFs are black-box classification methods [28]
that are hard to interpret, which limits their potential use for under-
standing how cognitive presence is captured in the discourse.
3.1 Data set
The dataset used in this study is the same dataset that was used in the studies by Kovanović et al. [35] and Waters et al. [62]. The data comes from a master’s-level, research-intensive course in software engineering offered through a fully online instructional condition at a Canadian open public university. The dataset consists
of six offerings of the course between 2008 and 2011, with a total of 81 students who produced 1,747 discussion messages (Table 1). On average, each offering of the course had 13–14 students (SD = 5.1) who produced on average 291 messages, albeit with a large variation in the number of messages per course offering (SD = 192.4). The whole dataset was coded by two expert coders for the four levels of cognitive presence, enabling a supervised learning approach. The inter-rater agreement was excellent (percent agreement = 98.1%, Cohen’s κ = 0.974), with a total of only 33 disagreements.
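The two agreement statistics reported above are standard and easy to reproduce. The following sketch is purely illustrative (it is not the authors' implementation): percent agreement is the observed-agreement term, while Cohen's κ corrects it for the agreement two raters would reach by chance.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two raters.

    Undefined (division by zero) in the degenerate case where both
    raters always assign one and the same single label."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, assuming the raters label independently.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)
```

For two identical label sequences (over more than one category), the function returns 1.0; κ shrinks toward 0 as agreement approaches the chance level.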
Table 2 shows the distribution of the four phases of cognitive presence. In addition to the four categories of cognitive presence, we included the category “other”, which is used for messages that did not exhibit signs of any phase of cognitive presence. The most frequent messages were exploration messages (39% of messages), while the least frequent were the resolution messages (6% of messages). This
large difference between the frequencies of the four phases was expected. It is consistent with the previous studies of cognitive presence [26], which found that a majority of students were not progressing to the later stages of integration and resolution.

Table 1: Course offerings statistics

Offering       Student count   Message count
Winter 2008    15              212
Fall 2008      22              633
Summer 2009    10              243
Fall 2009      7               63
Winter 2010    14              359
Winter 2011    13              237
Average (SD)   13.5 (5.1)      291.2 (192.4)
Total          81              1,747

Table 2: Distribution of cognitive presence phases

ID   Phase              Messages   (%)
0    Other              140        8.0%
1    Triggering Event   308        17.6%
2    Exploration        684        39.2%
3    Integration        508        29.1%
4    Resolution         107        6.1%
     Average (SD)       349.4 (245.7)   20.0% (10.0%)
     Total              1,747      100%

While
there are various interpretations for this pattern, including the va-
lidity of the model, the design and expectations of the courses –
i.e., not requiring students to move to those phases – seems to be
the most compelling reason, as shown by its growing acceptance
in the literature [25]. Psychologically, if students are going through
the four phases of the practical inquiry model that underlies the cog-
nitive presence construct, it does seem reasonable that students will
spend more time exploring and hypothesizing different solutions,
before they could come up with a final resolution [2,27]. More-
over, as discussions were designed to occur between the third and
the fifth week of the course, students did not typically move to the
resolution phase this early in the course. Specifically, the discus-
sions were organized to provide the students with opportunities to
discuss ideas that would inform the individual research projects that
they planned for the later stages of the course.
3.2 Feature Extraction
While the majority of the previous work related to text classi-
fication is based on lexical N-gram features (e.g., unigrams, bi-
grams, trigrams) and similar features (e.g., POS bigrams, depen-
dency triplets), we eventually decided not to include N-gram and
similar features described in the Kovanović et al. [35] study for sev-
eral reasons. First of all, the use of those features inflates the fea-
ture space, generating thousands of features even for small datasets.
This strongly increases the chances of over-fitting the training data.
Secondly, the use of those features is also very “dataset dependent”,
as data itself defines the classification space. Thus, it is hard to
define a fixed set of classification features in advance, as the par-
ticular choice of words in the training documents will define what
features are used for classification (i.e., what N-gram variables are
extracted). Finally and most importantly, given that N-grams and
other simple text mining features are not based on any existing the-
ory of human cognition related to the CoI model, it is hard to un-
derstand what they might theoretically mean. Given that our goal
is also to understand how cognitive presence is captured within
discourse, we focused our work on extracting features which are
strongly theory-driven and based on empirical studies. In total, we
extracted 205 classification features, which are described in the remainder of this subsection.
3.2.1 LIWC features
In this study, we used the LIWC (Linguistic Inquiry and Word
Count) tool [57], to extract a large number of word counts which
are indicative of different psychological processes (e.g., affective,
cognitive, social, perceptual). Our previous research [32] showed
that different linguistic features operationalized through the LIWC
word categories offer distinct proxies of cognitive presence.
In contrast to extracting N-grams, which produce a very large
number of independent features, LIWC provides us with exactly 93
different word counts, which are all based on extensive empirical research [cf. 58]. LIWC features essentially “merge” related – and
domain-independent – N-gram features together to produce more
meaningful classification features. We used the 2015 version of the
LIWC software package, which also provides four high-level aggre-
gate measures of i) analytical thinking, ii) social status, confidence,
and leadership, iii) authenticity, and iv) emotional tone.
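LIWC itself is a commercial, closed dictionary, but the general form of its features – per-category word rates normalized by message length – can be sketched as follows. The three toy categories below are hypothetical stand-ins for the 93 empirically derived LIWC 2015 categories, and the matching is simplified (real LIWC also supports word-stem wildcards).

```python
import re

# Toy word categories standing in for the LIWC dictionaries (hypothetical;
# the real LIWC 2015 dictionary defines 93 empirically derived categories).
CATEGORIES = {
    "cognitive": {"think", "know", "because", "consider"},
    "social":    {"we", "they", "talk", "share"},
    "affective": {"happy", "worry", "great", "hate"},
}

def liwc_style_counts(message):
    """Per-category word rates, as a percentage of all tokens in the
    message -- the normalization LIWC uses when reporting its counts."""
    tokens = re.findall(r"[a-z']+", message.lower())
    total = len(tokens) or 1  # avoid division by zero on empty messages
    return {cat: 100.0 * sum(t in words for t in tokens) / total
            for cat, words in CATEGORIES.items()}
```

For the message "We think this works because we share ideas", the sketch reports a cognitive-word rate of 25% (2 of 8 tokens) and a social-word rate of 37.5%.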
3.2.2 Coh-Metrix features
To extract classification features, we also used Coh-Metrix [30,45], a computational linguistics tool that provides 108
different metrics of text coherence (i.e., co-reference, referential,
causal, spatial, temporal, and structural cohesion), linguistic com-
plexity, text readability, and lexical category use. Coh-Metrix has been extensively used in a large number of studies to measure subtle differences in different forms of text and discourse and is currently
used by the Common Core initiative to analyze learning texts in K-
12 education [45].
Coh-Metrix has previously been used in the domain of social learning to measure student performance [16] and the development
of social ties [33,34] based on the language used in the discourse.
For example, a study by Dowell et al. [16] showed that character-
istics of the discourse – as measured by Coh-Metrix – were able
to account for 21% of the variability in the performance of active
MOOC students. Students performed significantly better when they engaged in exploratory-style discourse, with high levels of deep cohesion and the use of simple syntactic structures and abstract language. Given that the existing CoI coding schemes prescribe different indicators of important socio-cognitive processes in the discourse, Coh-Metrix provides a valuable set of metrics that can be easily extracted and used for automation of the CoI coding schemes.
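Coh-Metrix's actual indices are considerably more sophisticated (they rely on syntactic parsers, LSA, and psycholinguistic databases), but the flavor of a referential-cohesion measure can be conveyed with a crude, self-contained sketch that scores content-word overlap between adjacent sentences; the tiny stop-word list is an illustrative placeholder.

```python
import re

def referential_cohesion(text):
    """Crude referential-cohesion proxy: mean Jaccard overlap of
    content words between adjacent sentences. Coh-Metrix's real
    referential cohesion indices are far more elaborate."""
    stop = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it"}
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    bags = [{w for w in re.findall(r"[a-z]+", s.lower())} - stop
            for s in sents]
    if len(bags) < 2:
        return 0.0  # cohesion between sentences needs >= 2 sentences
    overlaps = [len(bags[i] & bags[i + 1]) / (len(bags[i] | bags[i + 1]) or 1)
                for i in range(len(bags) - 1)]
    return sum(overlaps) / len(overlaps)
```

A message that keeps referring to the same entities ("The model uses features. The features improve the model.") scores 0.5 here, while sentences with no shared content words score 0.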
3.2.3 Discussion context features
Drawing on the study by Waters et al. [62], we also focused on
incorporating more context information in our feature space. Thus,
we included all features (except unigrams) which were used in the
Waters et al. study. Those included:
Number of replies: An integer variable indicating the number
of replies a given message received.
Message depth: An integer variable showing the position of a message within a discussion.
Cosine similarity to previous/next message: The rationale be-
hind these features is to capture how much a message builds
on the previously presented information.
Start/end indicators: Simple 0/1 indicator variables showing
whether a message is first/last in the discussion.
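Assuming a discussion thread is represented simply as an ordered list of message texts (a simplification: true reply counts would require the reply tree, which this sketch omits), the remaining context features above can be sketched as:

```python
import math
from collections import Counter

def cosine(text_a, text_b):
    """Cosine similarity between bag-of-words term-count vectors."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def context_features(thread):
    """Context features for each message in a thread (ordered list of
    texts): depth, start/end indicators, and similarity to the
    previous/next message."""
    feats = []
    for i, text in enumerate(thread):
        feats.append({
            "depth": i,  # position of the message within the discussion
            "is_first": int(i == 0),
            "is_last": int(i == len(thread) - 1),
            "sim_prev": cosine(text, thread[i - 1]) if i > 0 else 0.0,
            "sim_next": (cosine(text, thread[i + 1])
                         if i < len(thread) - 1 else 0.0),
        })
    return feats
```

A message that closely paraphrases its predecessor gets a `sim_prev` near 1.0, while a topic shift pushes it toward 0 – exactly the "builds on previous information" signal the cosine features are meant to capture.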
As the CoI model – from the perspective of educational psychology
– is a process model [25], students’ cognitive presence is viewed as
being developed over time through discourse and reflection. There-
fore, in order to reach higher levels of cognitive presence, students need to either: i) construct knowledge in the shared world through
the exchange of a certain number of discussion messages, or ii) construct knowledge in their own private world of reflective learning. Given the social-constructivist view of learning in the CoI
model, we can expect that the distribution of messages exhibiting
the characteristics of the different phases of cognitive presence will
tend to change over time, as the students progress through those
phases. Thus, we can expect that triggering and exploration mes-
sages will be more frequent in the early stages of the discussions,
while integration and resolution messages will be more common in
the later stages.
3.2.4 LSA similarity
Messages belonging to different phases of cognitive presence are characterized by various socio-cognitive processes [24]. The trig-
gering phase introduces a certain topic in a tentative form, presenting concepts that might not be completely developed, while the
exploration phase further elaborates on various approaches to the
inquiry initiated in the triggering phase. More precisely, the explo-
ration phase introduces new ideas, divergent from the community,
or even several contrasting topics within the same message [49].
On the other hand, the integration phase assumes a continuous pro-
cess of reflection and integration, which leads to the construction
of meaning from the introduced ideas [24]. Finally, the resolu-
tion phase presents explicit guidelines for applying knowledge con-
structed through the inquiry process [24,49]. Based on these in-
sights, we assumed that information presented in the various stages
of the learning process might have an important influence on mes-
sage comprehension. Still, given the differences among the learners
and their learning habits, we did not expect this to be manifested as
a general rule, but more as a slight tendency which would be useful
in combination with the other classification features.
Following the approach suggested by Foltz et al. [20], we used
LSA with the sentence as a unit of analysis to define a single vari-
able lsa.similarity, which represents the average sentence sim-
ilarity (i.e., coherence) within a message. As LSA determines the
coherence based on the semantic relatedness between terms (i.e.,
terms that tend to occur in a similar context) [13], we first had to
define a semantic space in which the similarity estimates are given.
Bearing in mind that different discussions might relate to different concepts, we decided to create a separate semantic space for
each discussion. We identified the most important concepts from the first message in a discussion with the semantic annotation tool TAGME [19], and then each identified concept was linked to an
appropriate Wikipedia page from which we extracted information
about that concept [19]. Given that previous studies [55,22] showed
that Wikipedia can be used for estimation of semantic similarity be-
tween different concepts, we used information from the extracted
pages to construct the semantic space on which LSA similarity of
the concepts is calculated.
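A minimal sketch of this procedure, assuming NumPy is available and substituting a toy background corpus for the Wikipedia pages retrieved via TAGME: build a term-document matrix over the background documents, take a truncated SVD to obtain latent term vectors, embed each sentence as the average of its term vectors, and average the cosine similarity of consecutive sentences within the message.

```python
import re
import numpy as np

def lsa_message_coherence(sentences, space_docs, k=2):
    """Average cosine similarity between consecutive sentences of a
    message, computed in a k-dimensional LSA space built from
    `space_docs` (in the study: text of Wikipedia pages for the
    TAGME-detected concepts; here: any small background corpus)."""
    vocab = sorted({w for d in space_docs
                    for w in re.findall(r"[a-z]+", d.lower())})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-document matrix of the background semantic space.
    tdm = np.zeros((len(vocab), len(space_docs)))
    for j, d in enumerate(space_docs):
        for w in re.findall(r"[a-z]+", d.lower()):
            tdm[index[w], j] += 1
    # Truncated SVD yields the latent term space.
    u, s, _ = np.linalg.svd(tdm, full_matrices=False)
    terms_k = u[:, :k] * s[:k]  # term vectors, scaled by singular values

    def embed(sentence):
        words = [w for w in re.findall(r"[a-z]+", sentence.lower())
                 if w in index]
        if not words:
            return np.zeros(k)
        return terms_k[[index[w] for w in words]].mean(axis=0)

    vecs = [embed(s_) for s_ in sentences]
    sims = []
    for a, b in zip(vecs, vecs[1:]):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        sims.append(float(a @ b / denom) if denom else 0.0)
    return sum(sims) / len(sims) if sims else 0.0
```

Two sentences drawing on the same latent topic in the background space score close to 1, which is the "average sentence similarity within a message" that `lsa.similarity` encodes.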
3.2.5 Number of named entities
Based on the work described in [47] and our previous study [35],
we hypothesized that messages belonging to the different phases of cognitive presence would contain different counts of named entities
(e.g., named objects such as people, organizations, and geographi-
cal locations). The basis for this is taken from the definition of the
cognitive presence construct [24]. Exploration messages are char-
acterized by the brainstorming and exploration of new ideas, and
thus, those messages are expected to contain more named entities
than integration and resolution messages. Given the subject of the
course in which the data for this study were collected, we extracted
from each message a number of entities that are related to the com-
puter science category of Wikipedia by using the DBPedia Spotlight
annotation tool [46].
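As a toy illustration of this feature (not the actual DBpedia Spotlight pipeline), a gazetteer-based count against a small, entirely hypothetical list of computer-science concept labels might look like this:

```python
# Minimal stand-in for the entity-count feature: count occurrences of known
# concept labels in a message. The gazetteer below is a hypothetical example;
# the study resolved entities against Wikipedia via DBpedia Spotlight.
import re

CS_CONCEPTS = {"neural network", "operating system", "database", "compiler"}  # hypothetical

def count_entities(message, gazetteer=CS_CONCEPTS):
    text = message.lower()
    return sum(len(re.findall(re.escape(label), text)) for label in gazetteer)

n = count_entities("A compiler translates code; a database stores it. The compiler matters.")
# counts: "compiler" x2 + "database" x1 -> 3
```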
3.3 Data preprocessing
As the first step in our analysis, we addressed the problem of the
unequal number of messages across the five classification categories
(i.e., the four phases of cognitive presence and “other”). Class
imbalance can have very negative effects on the results of
classification analyses [56]. Generally speaking, there are two pos-
sible ways of addressing this problem [10]: i) cost-sensitive clas-
sification, in which different penalties are assigned for misclassifi-
cation of instances from different categories (higher penalties for
smaller classes), and thus forcing the algorithm to put more em-
phasis on properly recognizing smaller classes; and ii) resampling
methods, either by oversampling smaller classes, undersampling
large classes, or through a combination of these two approaches.
Given that cost-sensitive classification is typically used for two-class
problems (“positive” vs. “negative”) in which correctly classifying
one of the classes is the primary goal (e.g., patients
with a disease, fraudulent banking transactions), it makes sense there to
assign different misclassification costs, since correctly identifying the
“negative” class is less important. However, in our case, we are equally
interested in all five classes (four cognitive presence categories and
the other messages), as they represent different phases in student
learning cycles and it is not immediately clear whether misclassi-
fication of resolution messages is “worse” than misclassification of
triggering event messages. Thus, in our study, we used resampling
techniques and in particular a very popular SMOTE algorithm [9],
which is a hybrid approach that combines oversampling the minor-
ity class with undersampling of the majority class.
One interesting property of SMOTE is that instead of simply re-
sampling minority class instances – which would generate exact
copies of the existing data points – it generates new synthetic in-
stances which are “similar” to the existing instances but not exactly
the same. In an n-dimensional feature space, for every data point
X = {f1, f2, ..., fn} of the class Ci that is selected for resampling,
SMOTE:
1) Finds the K (in our case five) nearest neighboring instances from
the class Ci. As the distances between the original Ci data points
are known in advance, the lists of K nearest neighbors for all
instances in the Ci class are calculated and stored in an N × K ma-
trix (where N is the number of data points in the Ci class).
2) Randomly picks one of the identified neighbors (Y).
3) Generates a new data point Z as:
Z = X + rand(0, 1) × (Y − X)
where rand(0, 1) is a function returning a random number
between 0 and 1.
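The three steps above can be sketched in a few lines of Python. This is an illustration assuming plain Euclidean distances over minority-class points only; the study itself used Weka's SMOTE implementation.

```python
# Sketch of the SMOTE synthesis step: each synthetic point lies on the
# segment between a minority instance X and one of its k nearest minority
# neighbours Y, i.e. Z = X + rand(0, 1) * (Y - X).
import numpy as np

def smote_samples(X_min, n_new, k=5, rng=None):
    rng = rng or np.random.default_rng(0)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]  # the N x K neighbour index matrix
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))         # pick a minority point X
        j = rng.choice(neighbours[i])        # pick one of its neighbours Y
        gap = rng.random()                   # rand(0, 1)
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(out)

minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_samples(minority, n_new=3, k=2)
```

Because each synthetic point is a convex combination of two existing minority points, it always falls inside the minority class's convex hull.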
Figure 1 shows the results of applying SMOTE to our dataset.
As our original dataset consists of 1,747 messages, the class distri-
bution would be uniform if each of the classes contained approxi-
mately 350 messages (i.e., 1,747/5 ≈ 350). Thus, we first used the
SMOTE oversampling procedure explained previously to generate an
additional 210, 42, and 243 instances of the “Other”, “Triggering”, and
“Resolution” classes, respectively, increasing each of these three
classes to 350 messages. We then undersampled the “Exploration” and
“Integration” categories, removing 334 “Exploration” and 158
“Integration” messages, so that these two classes also contained 350
messages each. Overall, after applying SMOTE, the new dataset consists of
1,750 messages, with each of the five categories of messages repre-
sented by exactly 350 messages.
Besides compensating for the class imbalance problem, we also re-
moved the two duplicate features that were provided by both LIWC
and Coh-Metrix: i) the total number of words in a message, and
ii) the average number of words in a sentence. We decided to re-
move the LIWC values and use only the ones provided by Coh-Metrix.
Figure 1: SMOTE preprocessing for class balancing. Dark blue
– original instances which are preserved, light blue – synthetic
instances, red – original instances which are removed.
The primary reason for using the Coh-Metrix features is consistency:
there are small differences in how the two systems handle corner
cases (e.g., hyphenated words, punctuation marks), and given that
Coh-Metrix provides an additional set of metrics (e.g., number of
sentences, number of paragraphs), we wanted consistent
calculations for all of the included metrics.
3.4 Model Selection and Evaluation
To build our classifier, we used random forests [7], a state-of-the-art
tree-based classification technique. A large comparative analysis
of 179 general-purpose (i.e., not domain-specific, offline, and un-
structured) classification algorithms on 121 different datasets used
in previously published studies by Fernández-Delgado et al. [18]
found that random forests were the top-performing classification al-
gorithm, matched only by Gaussian kernel SVMs. Random forests
are an ensemble tree-based method that combines bagging (bootstrap
aggregating) with the random-subspace idea to create a robust
classification system that has low variance without increased
bias [18]. Random forests work by creating a large number of trees,
with the final prediction decided by a majority voting
scheme. Each tree is constructed on a different bootstrap sample
(a sub-sample of the same size, drawn with replacement) and evaluated on
data points that did not enter the bootstrap sample (in general, around
one third of the training dataset). In addition, each tree does
not use the complete feature set, but rather a random selection of N
attributes (i.e., a subspace), which is then used for growing the in-
dividual tree without any pruning. Random forests are a widely used
technique that can handle large datasets with thousands of features.
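The ingredients just described – bootstrap-sampled trees, a random feature subspace per split, majority voting, and out-of-bag evaluation – can be illustrated with scikit-learn. The study itself used the randomForest R package; the synthetic dataset and parameter values below are assumptions for illustration only.

```python
# Illustrative random forest with an out-of-bag (OOB) error estimate.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
forest = RandomForestClassifier(
    n_estimators=200,   # number of trees (ntree in the R package)
    max_features=4,     # features tried at each split (mtry)
    oob_score=True,     # evaluate each tree on its out-of-bag points
    random_state=0,
).fit(X, y)
oob_error = 1 - forest.oob_score_  # misclassification rate on out-of-bag data
```

The OOB error gives a nearly free generalization estimate, because each tree is scored only on the roughly one third of points absent from its bootstrap sample.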
It is important to note that random forests can also be used to
measure the importance of individual classification features. While the
importance of individual classification features can be calculated in
many different ways [41], one popular measure is Mean Decrease
Gini (MDG), which is based on the reduction in the Gini impurity mea-
sure. Generally speaking, the Gini impurity index measures to what
extent the data points in a given tree node belong to the same class (i.e.,
how “clean” the node is). For every internal (split) node we can
measure the decrease in Gini impurity, which shows how useful a
given tree node is for separating the data (i.e., how much it reduces
the impurity of the resulting groups of data). For random forests, the
MDG measure for a feature Xj is calculated as the mean decrease in
Gini impurity over all tree nodes where the feature Xj is used.
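The quantities behind MDG can be sketched in a few lines of Python; this is an illustration of the definitions above, not part of the study's pipeline.

```python
# Gini impurity of a node, and the impurity decrease achieved by one split.
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_decrease(parent, left, right):
    """Weighted reduction in impurity from splitting `parent` into two children."""
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)

# A perfectly separating split removes all impurity:
parent = np.array([0, 0, 0, 1, 1, 1])
drop = gini_decrease(parent, parent[:3], parent[3:])
# gini(parent) = 0.5 and both children are pure, so the decrease is 0.5
```

MDG for a feature then averages such decreases over every forest node that splits on that feature.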
As there are two parameters used for the configuration of random
forests (i.e., ntree – the number of trees constructed, and mtry – the
number of randomly selected features), we used cross-validation
to select the optimal random forest parameters. As the performance
of random forests typically stabilizes after a certain number of trees
are built, we decided to build a large ensemble of 1,000 trees to
make sure that convergence was reached. Thus, we focused on se-
lecting the optimal number of features used in every tree (i.e., the mtry
parameter). We used a 10-fold cross validation and repeated it 10
Figure 2: Random forest parameter tuning results.
times in order to reduce variability and get more accurate estimates
of cross validated performance. In each run of the cross validation,
we examined 20 different values for the mtry parameter: {2, 12, 23,
34, 44, 55, 66, 76, 87, 98, 108, 119, 130, 140, 151, 162, 172, 183,
194, 205}. The exact set of these values was obtained using the
var_seq function from R’s caret package.
Before training and evaluating our classification models, we split the
data into 75% for model training and 25% for testing. We used strat-
ified sampling, so that the class distribution in both sub-samples is the
same. We selected the best mtry value using the 10 repetitions of
the 10-fold cross validation and then reported the classification ac-
curacy of the best performing model on the testing data.
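A scaled-down sketch of this protocol – a stratified 75/25 split followed by repeated cross-validation over candidate mtry values (max_features in scikit-learn) – might look as follows. The dataset, grid, and fold counts are reduced illustrative assumptions, not the study's actual settings.

```python
# Stratified hold-out split plus repeated CV tuning of the mtry parameter.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (GridSearchCV, RepeatedStratifiedKFold,
                                     train_test_split)

X, y = make_classification(n_samples=400, n_features=30, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)  # stratified 75/25 split

search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid={"max_features": [2, 5, 10, 20]},       # candidate mtry values
    cv=RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0),
).fit(X_tr, y_tr)
test_accuracy = search.score(X_te, y_te)  # accuracy of the best model on hold-out data
```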
3.5 Implementation
We implemented our classifier in the R and Java programming
languages using several software packages:
– for feature extraction, we used the Coh-Metrix [45,30] and LIWC
2015 [58] software packages;
– for developing the random forest classifier, we used the randomForest
R package [40];
– for running repeated cross validation and aggregating model per-
formance, we used the caret R package [21];
– for running the SMOTE algorithm, we used the Weka [63] Java
package; and
– for the calculation of the LSA similarity measure, we used the Text Min-
ing Library for LSA (TML)1.
The complete dataset for the study and the source code of the implementation
are publicly available at the
_classification repository.
3.6 Limitations
The major limitations of our approach relate to the size of
our dataset. Although we have six course offerings, they are all
from the same course at a single university, and together with the
particular details of the adopted pedagogical and instructional approach,
this might have an effect on the generalizability of our
classification model. Thus, in our future work, we plan to test the
generalization power of our classifier on a different dataset, which
would preferably also account for other important confounding vari-
ables recognized in research of the CoI model such as subject do-
main [6], level of education (i.e., undergraduate vs. graduate) [26],
and mode of instruction (blended vs. fully online vs. MOOC) [61].
4.1 Model training and evaluation
Figure 2 shows the results of our model selection and evaluation
procedure. The best classification accuracy of 0.72 (SD = 0.04)
and 0.65 Cohen’s κ (SD = 0.05) was obtained with an mtry value of
12, which means that each decision tree takes into account only
Table 3: Random forest parameter tuning results
            mtry   Accuracy      Kappa
Min         194    0.68 (0.04)   0.59 (0.04)
Max         12     0.72 (0.04)   0.65 (0.05)
Difference         0.04          0.06
Figure 3: Best random forest configuration performance.
12 out of the 205 features. The difference between the best- and worst-
performing configurations was 0.06 Cohen’s κ (Table 3), which sug-
gests that parameter optimization plays an important role in the final
classifier performance. Looking at the best performing configura-
tion (Figure 3), we can see that the use of 1,000 trees in the ensem-
ble resulted in reasonably stable error rates, with an average out-of-
bag (OOB) error rate of 0.29 (i.e., the average misclassification rate
for data points in the cases when they were not used in the bootstrap
samples). As expected, the highest error rates were associated with
the undersampled classes (i.e., exploration and integration) and the
smallest with the classes that were most heavily oversampled (i.e.,
resolution and “other”).
Following the model building, we evaluated its performance on
the hold-out 25% of the data. Our random forest classifier obtained
70.3% classification accuracy (95% CI [0.66, 0.75]) and 0.63 Co-
hen’s κ, which are significant improvements over the 0.41 and 0.48
reported in the Kovanović et al. [35] and Waters et al. [62] studies, re-
spectively. Table 4 shows the confusion matrix obtained on the test-
ing dataset. We can see that the most significant misclassifications
are between exploration and integration messages, which are the hard-
est to distinguish. This was already observed in [62], where most
of the misclassifications were related to exploration and integration
messages.
4.2 Variable importance analysis
Figure 4 shows the variable importance measures for all 205
classification features. The median MDG score was 4.43, with
most of the features having smaller MDG scores and only a few fea-
tures having very high MDG scores. Table 5 shows the values of the top
20 variables based on their MDG scores and their average values in
each class (i.e., cognitive presence phase). We can see that the most
important variable was cm.DESWC, i.e., the number of words in
a message; that is, the longer the message, the more likely it
was to be in the later phases of the cognitive presence
cycle. Also, the number of paragraphs, number of sentences, and
Table 4: Confusion matrix for the best performing model
Actual       Other  Triggering  Explorat.  Integrat.  Resolut.
Other          79        2          2          2          2
Triggering      5       67          9          6          0
Exploration     9       15         35         27          1
Integration     2        2         23         44         16
Resolution      0        0          4          2         81
Figure 4: Variable importance by Mean Decrease Gini measure.
Blue line separates top twenty features.
average sentence length showed similar trends, with higher values
being associated with the later phases of cognitive presence.
The most important Coh-Metrix features were related to lexical
diversity of the student vocabulary with the highest lexical diver-
sity being displayed by “other” messages. Standard deviation of
the number of syllables – which is an indicator of the use of words
of different lengths – had the strongest association with the trig-
gering event phase. In contrast, the givenness (i.e., how much of
the information in text is previously given) had the highest associ-
ation with the resolution phase messages. Finally, the low Flesch-
Kincaid Grade level readability score and the low overlap between
verbs used had the strongest association with “other” messages (i.e.,
messages without traces of cognitive presence).
The most important LIWC features were i) the number of ques-
tion marks used, which was strongly associated with the trigger-
ing event phase, ii) the number of first person pronouns, which was
highly associated with the other (i.e., non-cognitive presence) mes-
sages, and iii) the use of money-related words, which was mostly associ-
ated with the integration and resolution phases.
Message context features also scored high, with message depth
being higher for the later stages of cognitive presence, and highest
for “other” messages. A similar trend was observed for similarity
with the previous message, which was highest for the integration
and resolution messages and lowest for the triggering event mes-
sages. In contrast, similarity with the next message and number of
replies were highest for triggering events and lowest for the “other”
messages. It is interesting to note that both LSA similarity and the
number of named entities obtained high MDG scores. The number
of named entities was the second most important feature and was
highly associated with the later stages of the cognitive presence cy-
cle. A similar trend was also observed for LSA similarity; however,
its importance was much lower.
Based on the testing results of the developed classifier, we can see
that the use of the LIWC and Coh-Metrix features, together with
a small number of thread-based context features could be used to
provide reasonably high classification performance. The obtained
Cohen’s κ value of 0.63 falls in the range of “substantial” inter-
rater agreement [39], and is just slightly below the 0.70 Cohen’s κ
that the CoI research community commonly uses as a threshold
before coding results are considered valid.
We can also see that parameter tuning plays an important role
in optimizing the classifier performance, as the different classifier
configurations obtained results differing by up to 0.06 Cohen’s κ and
0.04 classification accuracy (Table 3).
Given that the same dataset was used as in the [35] and [62] stud-
ies, it is possible to directly compare the results of the classification
algorithms. The obtained Cohen’s κ is 0.15 and 0.22 higher than
the values reported by Waters et al. [62] and Kovanović et al. [35], re-
spectively. Furthermore, the resulting feature space is much smaller,
Table 5: Twenty most important variables and their mean scores for messages in different phases of cognitive presence
                                                                        Cognitive presence phase
#   Variable           Description                       MDG    Other           Triggering      Exploration     Integration      Resolution
1   cm.DESWC           Number of words                   32.91  55.41 (61.06)   80.91 (41.56)   117.71 (67.23)  183.30 (102.94)  280.68 (189.62)
2   ner.entity.cnt     Number of named entities          26.41  13.44 (15.36)   21.67 (10.55)   28.84 (16.93)   44.75 (24.85)    64.18 (32.54)
3   cm.LDTTRa          Lexical diversity, all words      21.98  0.85 (0.12)     0.77 (0.09)     0.71 (0.10)     0.65 (0.09)      0.58 (0.09)
4   message.depth      Position within discussion        19.09  2.39 (1.13)     1.00 (0.90)     1.84 (0.97)     1.87 (0.94)      2.00 (0.68)
5   cm.LDTTRc          Lexical diversity, content words  17.12  0.95 (0.06)     0.90 (0.06)     0.86 (0.08)     0.82 (0.07)      0.78 (0.07)
6   cm.LSAGN           Avg. givenness of each sentence   16.63  0.10 (0.07)     0.14 (0.06)     0.18 (0.07)     0.21 (0.06)      0.24 (0.06)
7   liwc.QMark         Number of question marks          16.59  0.27 (0.85)     1.84 (1.63)     0.92 (1.26)     0.58 (0.82)      0.38 (0.55)
8   message.sim.prev   Similarity with previous message  16.41  0.20 (0.17)     0.06 (0.13)     0.22 (0.21)     0.30 (0.24)      0.39 (0.19)
9   cm.LDVOCD          Lexical diversity, VOCD           15.43  12.92 (33.93)   28.99 (50.61)   53.57 (54.68)   83.47 (43.00)    97.16 (28.95)
10                     Number of money-related words     14.38  0.21 (0.69)     0.32 (0.74)     0.32 (0.75)     0.65 (1.12)      0.99 (1.04)
11  cm.DESPL           Avg. number of paragraphs sent.   12.47  4.26 (2.98)     6.37 (2.76)     7.49 (4.11)     10.17 (5.64)     14.05 (8.88)
12                     Similarity with next message      11.74  0.08 (0.14)     0.34 (0.40)     0.20 (0.22)     0.22 (0.24)      0.22 (0.23)
13  message.reply.cnt  Number of replies                 11.67  0.42 (0.67)     1.44 (1.89)     0.82 (1.70)     1.10 (2.66)      0.84 (1.24)
14  cm.DESSC           Sentence count                    11.67  4.28 (3.17)     6.36 (2.75)     7.49 (4.11)     10.17 (5.64)     14.29 (10.15)
15  lsa.similarity     Avg. LSA sim. between sentences   9.69   0.29 (0.27)     0.47 (0.23)     0.54 (0.23)     0.62 (0.20)      0.67 (0.17)
16  cm.DESSL           Avg. sentence length              9.60   11.88 (6.82)    13.62 (5.85)    16.69 (6.54)    19.36 (8.39)     21.73 (8.61)
17  cm.DESWLsyd        SD of word syllables count        8.92   0.98 (0.69)     1.33 (0.70)     0.98 (0.18)     0.97 (0.14)      0.97 (0.11)
18  liwc.i             Number of FPS pronouns            8.84   4.33 (3.53)     2.82 (2.06)     2.37 (1.94)     2.51 (1.65)      2.19 (1.23)
19  cm.RDFKGL          Flesch-Kincaid Grade Level        8.29   7.68 (4.28)     10.30 (3.50)    10.19 (3.11)    11.13 (3.46)     11.99 (3.37)
20  cm.SMCAUSwn        WordNet overlap between verbs     8.14   0.38 (0.25)     0.48 (0.20)     0.51 (0.13)     0.50 (0.10)      0.47 (0.06)
MDG - Mean decrease Gini impurity index, FPS - first person singular
with only 205 classification features in total, which is around 100 times
smaller than the number of bag-of-words features used by the Kovanović
et al. [35] classifier. This limits the chances of over-fitting the train-
ing data and also improves the performance of the classifier. This
is particularly important for the prospective use of the classifier in
different subject domains and pedagogical contexts.
Another important finding of this study is the list of important
classification features. We see that a small subset of features is
highly predictive of the different phases of cognitive presence, while
a majority of the features have a much lower predictive power (Fig-
ure 4). It is interesting to note that most of the discussion context
features (except the discussion start/end indicators) obtained high
importance scores, indicating the value of providing contextual in-
formation to the classification algorithm. In our future work, we will
focus on investigation of the additional features that would provide
even more contextualized information to the classifier.
It is important to note that the list of the most important vari-
ables is aligned with the conceptions of cognitive presence in the
existing CoI literature. If we look at the messages in the four phases
of cognitive presence, we can see that the higher levels of cognitive
presence are associated with messages that are i) generally longer,
with more sentences and paragraphs, ii) adopt more complex lan-
guage with generally longer sentences, iii) include more named en-
tities (e.g., names of different constructs, theories, people, compa-
nies, and geographical locations), iv) have lower lexical diversity,
v) occur later in the discussion, vi) have higher givenness of the
information, higher coherence, and higher verb overlap, vii) use
fewer question marks and first-person singular pronouns, viii) ex-
hibit higher similarity with the previous messages, and ix) more
frequently use money-related terms. Interestingly, the feature of the
highest importance is the simple word count, implying that the
longer the message, the more likely it is to be in the higher levels of the
cognitive presence cycle. This is also consistent with the findings of a pre-
vious study with the same dataset [32]. Joksimović et al. [32] found
that word count was the only LIWC 2007 variable that yielded sta-
tistically significant differences among all four cognitive presence
categories. This is not entirely surprising, as similar findings are
reported by essay grading studies, which found that the strongest pre-
dictor of the final essay grade is the length of the essay [48].
Looking at the non-cognitive or “other” messages, we can see
that they are characterized by large lexical diversity. This is
expected, as non-cognitive messages tend to be shorter (i.e., fewer
words, paragraphs, and sentences) and more informal. Higher lev-
els of lexical diversity are known to be associated with very short
texts or texts of low cohesion [1]. As “other” messages often are
not related to the course topic, they also tend to have a lower num-
ber of named entities, and lower givenness and verb overlap. Such
messages also tend to adopt a simpler language, as indicated by the
lowest scores on the Flesch-Kincaid grade level. “Other” messages
also tend to occur more frequently near the end of the discussion,
as indicated by their high values for the message.depth feature, and
are also more often related to the expression of personal information,
as indicated by the highest values for the use of first-person singu-
lar pronouns. This is expected as many discussions would typically
finish with students thanking each other for their contributions.
The contributions of this paper are twofold. First, we developed a clas-
sifier for coding student discussion transcripts for the levels of cog-
nitive presence with a much higher performance (0.63 Cohen’s κ)
than previously reported [35,62] in studies with the same
dataset. The performance of the developed classifier is in the range
generally considered to be a substantial level of agree-
ment [39]. We can see that the proposed approach, which is based
on the use of Coh-Metrix, LIWC, and discussion context features,
shows great promise for providing a fully automated system for
coding cognitive presence. The feature space that is used is also
much smaller, which limits the chances for over-fitting the data and
makes the developed classifier more generalizable to other contexts.
Secondly, we can see a particular subset of classification features
that are very highly predictive of the different phases of cognitive
presence. The most predictive feature is simple word count, which
implies that the longer the message is, the higher the chances are
for the message to display higher levels of cognitive presence. We
also identified several additional features which are also highly pre-
dictive of the cognitive presence phase, in particular the number
of named entities that are used (higher values are associated with
integration and resolution phase) and lexical diversity (lower val-
ues are associated with “other” and triggering messages). We also
see that the features that provide information on the discussion context
(i.e., similarity with the previous/next message, order in the discus-
sion thread, and number of replies) are highly valuable and provide
important information to the classification algorithm.
In our future work, we will focus on exploring additional fea-
tures for improving the classification performance [43]. The study
presented in this paper and our previous work [35] indicate that con-
textual features have a significant effect on classification accuracy
and we will examine additional features of this kind. As our results
reveal that the number of named entities has a significant effect on
classification accuracy, we will further explore similar features,
such as concept maps [64], which would provide additional infor-
mation about relationships between important concepts discussed
in text-based messages. Finally, we will look at the different data
preprocessing steps, including the use of the different algorithms
for resolving the class imbalance problem. As we also observed
that some of the students used direct quotes of other students’ mes-
sages, which can cause problems for many of the text metrics that
we used for classification, we will further examine the effects of
quotation on the final classification accuracy.
Finally, following the results presented in [17], we are explor-
ing ideas for the development of a system that would – besides class
labels – provide the associated probabilities. Such a classifier could
be used to develop a semi-automated classification system in which
only the part of the data for which probabilities are sufficiently high
would be automatically classified, and the rest would be manually
classified. This would be advantageous, as the desired combined
accuracy of automatic–manual coding could be reached by setting
a corresponding probability threshold. Even for high levels
of accuracy, a large majority of the data would be classified automati-
cally, eliminating a large part of the manual work. Besides its use
for coding discussion transcripts for research purposes, such a sys-
tem could be used, for example, to provide a real-time overview of
the progress of a group of students and to point out the students for
whom the progress estimates are uncertain.
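A minimal sketch of such a probability-thresholded routing scheme follows; all names, the synthetic data, and the 0.8 threshold are illustrative assumptions, not from the paper.

```python
# Route predictions: accept automatic labels only where the forest's class
# probability clears a threshold; send the rest to manual coding.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_tr, X_new, y_tr, _ = train_test_split(X, y, test_size=0.5, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

proba = forest.predict_proba(X_new)
confidence = proba.max(axis=1)            # highest class probability per message
THRESHOLD = 0.8                           # tuned against the desired combined accuracy
auto_mask = confidence >= THRESHOLD       # classified automatically
manual_idx = np.where(~auto_mask)[0]      # indices routed to human coders
auto_labels = forest.classes_[proba.argmax(axis=1)][auto_mask]
```

Raising the threshold trades a larger manual-coding workload for higher expected accuracy of the automatically assigned labels.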
[1] Coh-Metrix 3.0 indices.
[2] Z. Akyol, J. B. Arbaugh, M. Cleveland-Innes, D. R. Garrison, P. Ice,
J. C. Richardson, and K. Swan. A response to the review of the com-
munity of inquiry framework. Journal of Distance Education, 23(2),
[3] T. Anderson and J. Dron. Three generations of distance education
pedagogy. The International Review of Research in Open and Distance
Learning, 12(3):80–97, 2010.
[4] T. Anderson, L. Rourke, D. R. Garrison, and W. Archer. Assessing
teaching presence in a computer conferencing context. Journal of
Asynchronous Learning Networks, 5:1–17, 2001.
[5] J. Arbaugh, M. Cleveland-Innes, S. R. Diaz, D. R. Garrison, P. Ice,
J. C. Richardson, and K. P. Swan. Developing a community of inquiry
instrument: Testing a measure of the community of inquiry framework
using a multi-institutional sample. The Internet and Higher Education,
11(3–4):133–136, 2008.
[6] J. B. Arbaugh, A. Bangert, and M. Cleveland-Innes. Subject matter
effects and the community of inquiry (coi) framework: An exploratory
study. The Internet and Higher Education, 13(1):37–44, 2010.
[7] L. Breiman. Random Forests. Machine Learning, 45(1):5–
32, Oct. 2001. ISSN 0885-6125, 1573-0565. doi: 10.1023/A:
1010933404324. URL
[8] D. L. Butler and P. H. Winne. Feedback and self-regulated learning:
A theoretical synthesis. Review of Educational Research, 65(3):245–
281, 1995.
[9] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. Smote:
synthetic minority over-sampling technique. Journal of artificial in-
telligence research, pages 321–357, 2002.
[10] N. V. Chawla, N. Japkowicz, and A. Kotcz. Editorial: special issue
on learning from imbalanced data sets. ACM Sigkdd Explorations
Newsletter, 6(1):1–6, 2004.
[11] S. Corich, K. Hunt, and L. Hunt. Computerised content analysis for
measuring critical thinking within discussion forums. Journal of e-
Learning and Knowledge Society, 2(1), 2012.
[12] B. De Wever, T. Schellens, M. Valcke, and H. Van Keer. Content anal-
ysis schemes to analyze transcripts of online asynchronous discussion
groups: A review. Computers & Education, 46(1):6–28, 2006.
[13] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and
R. Harshman. Indexing by latent semantic analysis. Journal of the
American Society for Information Science, 41(6):391–407, 1990.
[14] J. Dewey. My pedagogical creed. School Journal, 54(3):77–80, 1897.
[15] R. Donnelly and J. Gardner. Content analysis of computer conferenc-
ing transcripts. Interactive Learning Environments, 19(4):303–315,
[16] N. Dowell, O. Skrypnyk, S. Joksimović, A. C. Graesser, S. Dawson,
D. Gašević, P. de Vries, T. Hennis, and V. Kovanović. Modeling Learn-
ers’ Social Centrality and Performance through Language and Dis-
course. In Submitted to the 8th International Conference on Educa-
tional Data Mining (EDM 2015), Madrid, Spain, June 2015.
[17] P. Dönmez, C. Rosé, K. Stegmann, A. Weinberger, and F. Fischer.
Supporting CSCL with automatic corpus analysis technology. In Pro-
ceedings of th 2005 conference on Computer support for collaborative
learning: learning 2005: the next 10 years!, page 125–134, 2005.
[18] M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim. Do
we need hundreds of classifiers to solve real world classification prob-
lems? The Journal of Machine Learning Research, 15(1):3133–3181,
[19] P. Ferragina and U. Scaiella. Fast and accurate annotation of short texts
with wikipedia pages. Software, IEEE, 29(1):70–75, 2012.
[20] P. W. Foltz, W. Kintsch, and T. K. Landauer. The measurement of
textual coherence with latent semantic analysis. Discourse Processes,
25:285–307, 1998.
[21] M. Kuhn, with contributions from J. Wing, S. Weston, A. Williams,
C. Keefer, A. Engelhardt, T. Cooper, Z. Mayer, B. Kenkel, the R Core
Team, M. Benesty, R. Lescarbeau, A. Ziem, L. Scrucca, Y. Tang, and
C. Candan. caret: Classification and Regression Training, 2015. R
package version 6.0-
[22] E. Gabrilovich and S. Markovitch. Computing Semantic Relatedness
Using Wikipedia-based Explicit Semantic Analysis. In Proceedings
of the 20th International Joint Conference on Artifical Intelligence,
IJCAI’07, pages 1606–1611, San Francisco, CA, USA, 2007. Morgan
Kaufmann Publishers Inc. URL
[23] D. R. Garrison, T. Anderson, and W. Archer. Critical inquiry in a text-
based environment: Computer conferencing in higher education. The
Internet and Higher Education, 2(2–3):87–105, 1999.
[24] D. R. Garrison, T. Anderson, and W. Archer. Critical thinking, cogni-
tive presence, and computer conferencing in distance education. Amer-
ican Journal of Distance Education, 15(1):7–23, 2001.
[25] D. R. Garrison, T. Anderson, and W. Archer. The first decade of the
community of inquiry framework: A retrospective. The Internet and
Higher Education, 13(1–2):5–9, 2010.
[26] R. Garrison, M. Cleveland-Innes, and T. S. Fung. Exploring causal
relationships among teaching, cognitive and social presence: Student
perceptions of the community of inquiry framework. The Internet and
Higher Education, 13(1–2):31–36, 2010.
[27] D. Gašević, O. Adesope, S. Joksimović, and V. Kovanović. Externally-
facilitated regulation scaffolding and role assignment to develop cog-
nitive presence in asynchronous online discussions. The Internet and
Higher Education, 24:53–65, Jan. 2015. doi: 10.1016/j.iheduc.2014.
[28] L. Getoor. Introduction to Statistical Relational Learning. MIT Press,
2007. ISBN 978-0-262-07288-5.
[29] P. Gorsky, A. Caspi, I. Blau, Y. Vine, and A. Billet. Toward a coi
population parameter: The impact of unit (sentence vs. message) on
the results of quantitative content analysis. The International Review
of Research in Open and Distributed Learning, 13(1):17–37, 2011.
[30] A. C. Graesser, D. S. McNamara, and J. M. Kulikowich. Coh-
Metrix Providing Multilevel Analyses of Text Characteristics. Edu-
cational Researcher, 40(5):223–234, June 2011. ISSN 0013-189X,
1935-102X. doi: 10.3102/0013189X11413260. URL http://edr.
[31] O. R. Holsti. Content analysis for the social sciences and humanities.
[32] S. Joksimović, D. Gašević, V. Kovanović, O. Adesope, and M. Hatala. Psychological characteristics in cognitive presence of communities of inquiry: A linguistic analysis of online discussions. The Internet and Higher Education, 22:1–10, July 2014. doi: 10.1016/j.iheduc.2014.03.001.
[33] S. Joksimović, N. Dowell, O. Skrypnyk, V. Kovanović, D. Gašević,
S. Dawson, and A. C. Graesser. Exploring the Accumulation of So-
cial Capital in cMOOC Through Language and Discourse. Journal of
Educational Data Mining, (submitted), 2015.
[34] S. Joksimović, V. Kovanović, J. Jovanović, A. Zouaq, D. Gašević, and
M. Hatala. What Do cMOOC Participants Talk About in Social Me-
dia?: A Topic Analysis of Discourse in a cMOOC. In Proceedings of
the Fifth International Conference on Learning Analytics And Knowl-
edge, LAK ’15, pages 156–165, New York, NY, USA, 2015. ACM.
ISBN 978-1-4503-3417-4. doi: 10.1145/2723576.2723609.
[35] V. Kovanović, S. Joksimović, D. Gašević, and M. Hatala. Automated
Content Analysis of Online Discussion Transcripts. In Proceedings
of the Workshops at the LAK 2014 Conference co-located with 4th In-
ternational Conference on Learning Analytics and Knowledge (LAK
2014), Indianapolis, IN, Mar. 2014.
[36] V. Kovanović, S. Joksimović, D. Gašević, M. Hatala, and G. Siemens.
Content Analytics: the definition, scope, and an overview of published
research. In C. Lang and G. Siemens, editors, Handbook of Learning
Analytics. 2015.
[37] K. H. Krippendorff. Content Analysis: An Introduction to Its Method-
ology. Sage Publications, 2003.
[38] J. Lafferty, A. McCallum, and F. C. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning (ICML 2001), 2001.
[39] J. R. Landis and G. G. Koch. The measurement of observer agreement for categorical data. Biometrics, 33(1):159–174, Mar. 1977.
[40] A. Liaw and M. Wiener. Classification and regression by randomForest. R News, 2(3):18–22, 2002.
[41] G. Louppe, L. Wehenkel, A. Sutera, and P. Geurts. Understanding
variable importances in forests of randomized trees. In Advances in
Neural Information Processing Systems, pages 431–439, 2013.
[42] R. Luppicini. Review of computer mediated communication research
for education. Instructional Science, 35(2):141–185, 2007.
[43] E. Mayfield and C. Penstein-Rosé. Using feature construction to avoid
large feature spaces in text classification. In Proceedings of the 12th
annual conference on Genetic and evolutionary computation, pages 1299–1306, 2010.
[44] T. McKlin. Analyzing Cognitive Presence in Online Courses Using
an Artificial Neural Network. PhD thesis, Georgia State University,
College of Education, Atlanta, GA, United States, 2004.
[45] D. S. McNamara, A. C. Graesser, P. M. McCarthy, and Z. Cai. Auto-
mated Evaluation of Text and Discourse with Coh-Metrix. Cambridge
University Press, Mar. 2014.
[46] P. N. Mendes, M. Jakob, A. García-Silva, and C. Bizer. DBpedia spot-
light: shedding light on the web of documents. In Proceedings of the
7th International Conference on Semantic Systems, pages 1–8, 2011.
[47] J. Mu, K. Stegmann, E. Mayfield, C. Rosé, and F. Fischer. The
ACODEA framework: Developing segmentation and classification
schemes for fully automatic analysis of online discussions. Interna-
tional Journal of Computer-Supported Collaborative Learning, 7(2):
285–305, 2012.
[48] E. B. Page and N. S. Petersen. The computer moves into essay grading:
Updating the ancient test. Phi Delta Kappan, 76(7):561, Mar. 1995.
[49] C. L. Park. Replicating the Use of a Cognitive Presence Measurement
Tool. Journal of Interactive Online Learning, 8:140–155, 2009.
[50] L. Rourke, T. Anderson, D. R. Garrison, and W. Archer. Methodolog-
ical issues in the content analysis of computer conference transcripts.
International Journal of Artificial Intelligence in Education (IJAIED),
12:8–22, 2001. Part II of the Special Issue on Analysing Educational
Dialogue Interaction (editor: Rachel Pilkington).
[51] L. Rourke, T. Anderson, D. R. Garrison, and W. Archer. Assessing so-
cial presence in asynchronous text-based computer conferencing. The
Journal of Distance Education / Revue de l’Éducation à Distance, 14
(2):50–71, 2007.
[52] P. J. Stone, D. C. Dunphy, and M. S. Smith. The general inquirer: A computer approach to content analysis. MIT Press, Cambridge, MA, 1966.
[53] J.-W. Strijbos. Assessment of (computer-supported) collaborative
learning. IEEE Transactions on Learning Technologies, 4(1):59–73, 2011.
[54] J.-W. Strijbos, R. L. Martens, F. J. Prins, and W. M. G. Jochems. Con-
tent analysis: what are they talking about? Computers & Education,
46(1):29–48, 2006.
[55] M. Strube and S. P. Ponzetto. WikiRelate! Computing Semantic Re-
latedness Using Wikipedia. In Proceedings of the 21st National Con-
ference on Artificial Intelligence - Volume 2, AAAI’06, pages 1419–
1424, Boston, Massachusetts, 2006. AAAI Press. ISBN 978-1-57735-
281-5.
[56] P.-N. Tan, V. Kumar, and M. Steinbach. Introduction to Data Mining.
Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA,
2005. ISBN 0-321-32136-7.
[57] Y. R. Tausczik and J. W. Pennebaker. The Psychological Meaning of
Words: LIWC and Computerized Text Analysis Methods. Journal of
Language and Social Psychology, 29(1):24–54, 2010.
[58] Y. R. Tausczik and J. W. Pennebaker. The Psychological Meaning
of Words: LIWC and Computerized Text Analysis Methods. Jour-
nal of Language and Social Psychology, 29(1):24–54, Mar. 2010.
ISSN 0261-927X, 1552-6526. doi: 10.1177/0261927X09351676.
[59] V. N. Vapnik. Statistical Learning Theory. Wiley-Interscience, 1st edition, 1998.
[60] J. Vassileva. Toward social learning environments. IEEE Transactions
on Learning Technologies, 1(4):199–214, 2008.
[61] N. Vaughan and D. R. Garrison. Creating cognitive presence in a
blended faculty development community. The Internet and Higher
Education, 8(1):1–12, 2005.
[62] Z. Waters, V. Kovanović, K. Kitto, and D. Gašević. Structure mat-
ters: Adoption of structured classification approach in the context of
cognitive presence classification. In Proceedings of the 11th Asia In-
formation Retrieval Societies Conference, AIRS 2015, 2015.
[63] I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Ma-
chine Learning Tools and Techniques. Morgan Kaufmann, 3rd edition, 2011.
[64] A. Zouaq and R. Nkambou. Building domain ontologies from text for
educational purposes. IEEE Transactions on Learning Technologies,
1(1):49–62, 2008.