Structure Matters: Adoption of Structured Classification Approach in the Context of Cognitive Presence Classification


Zak Waters¹, Vitomir Kovanović², Kirsty Kitto¹, and Dragan Gašević²
¹ Queensland University of Technology,
Brisbane, Australia
² The University of Edinburgh,
Edinburgh, United Kingdom
Abstract. Within online learning communities, receiving timely and
meaningful insights into the quality of learning activities is an important
part of an effective educational experience. Commonly adopted methods –
such as the Community of Inquiry framework – rely on manual coding of
online discussion transcripts, which is a costly and time-consuming
process. There are several efforts underway to enable the automated
classification of online discussion messages using supervised machine
learning, which would enable the real-time analysis of interactions
occurring within online learning communities. This paper investigates
the importance of incorporating features that utilise the structure of
online discussions for the classification of “cognitive presence” – the central
dimension of the Community of Inquiry framework focusing on the quality
of students’ critical thinking within online learning communities. We
implemented a Conditional Random Field classification solution, which
incorporates structural features that may be useful in increasing
classification performance over other implementations. Our approach leads to
an improvement in classification accuracy of 5.8% over existing
techniques when tested on the same dataset, with a precision and recall
of 0.630 and 0.504 respectively.
Keywords: Text Classification, Conditional Random Fields, Online Learning,
Online Discussions
1 Introduction
The classification of social interactions occurring among individuals who partic-
ipate in an online community is an important research problem. Not all partici-
pant contributions have the same value, with some being more thoughtful than
others. This problem is particularly important in an educational domain, where
online discussions are often being used to support both fully online and blended
models of learning [7]. A substantial body of research aims to foster higher-order
thinking among students in online learning communities. One prominent frame-
work for approaching this problem is the Community of Inquiry (CoI) model [8]
which describes the important dimensions of learning in online communities,
and provides a quantitative coding scheme for their assessment. This coding
scheme provides a method for categorising various interactions between partic-
ipants within a particular online community, which is traditionally conducted
by two human “coders” who manually label discussion messages for post hoc
analysis.
Despite wide adoption by online education researchers, coding online discus-
sion transcripts is a manual and labor-intensive task, often requiring several
coders to dedicate significant amounts of time to code each of the discussion
messages. This approach i) does not allow real-time feedback on the
quality of learning interactions, and ii) limits the wider adoption of the CoI
framework by educational practitioners. This problem makes the task an ideal
candidate for automation, and a number of approaches aimed at automating the
process of coding transcripts using machine learning techniques are in develop-
ment [22,2,17]. While these approaches have produced promising results, their
text classification models currently make class predictions on a per-message ba-
sis, using only features derived from a single post, without consideration of the
context of a post or of the preceding classification sequence. Given that human
coders take discussion context into account during the classification process, and
that the underlying construct of cognitive presence develops over time [9,7], it
seems likely that structural classification features can be used to model context
in a similar fashion, and that these might improve classification accuracy.
This paper presents the preliminary results of an alternate approach to the au-
tomated analysis of online discussions within online learning communities using
Conditional Random Fields (CRFs) [26], which is a novel extension of previous
work that aims to automate the text-classification of online discussions using the
CoI framework. Our results show that the use of structural features in combination
with a CRF model produces a higher classification accuracy than currently
available methods. Section 2 briefly introduces the CoI model and examines
current approaches to analysing community participants’ “cognitive presence”.
Related applications of CRFs to online discussions are also reviewed. Section 3
outlines our approach, which aims to improve on existing approaches by com-
bining structural features with a Linear-Chain CRF model. The results of this
experiment are presented in section 4, where they are compared against current
approaches and human accuracies. Structural features and their potential use
across a number of contexts and discussion media are discussed in section 5,
along with the limitations of the current study, which form the basis of the fu-
ture work directions. Finally, the research and key contributions are summarised
in section 6.
2 Background Work
2.1 The Community of Inquiry (CoI) framework
Overview. The Community of Inquiry (CoI) framework [8,7] proposes three
important dimensions (presences) of inquiry-based online learning:
1. Teaching presence defines the role of instructors before and for the dura-
tion of a course, consisting of i) direct instruction, ii) course facilitation, and
iii) course organization and design.
2. Social presence provides insights into the social climate between course
participants. It consists of i) affective communication, ii) group cohesion,
and iii) interactivity of communication.
3. Cognitive presence is a central component of the framework and defines
phases in the development of cognitive and deep thinking skills in an online
learning community [8].
The CoI framework defines multi-dimensional content analysis schemes [4]
for the coding of student discussion messages, which are the main unit of analysis
used to assess the level of the three presences. This framework has gained consid-
erable attention in the educational research community, with a large number of
replication studies and empirical validations (cf. [10,9]). Overall, the CoI frame-
work and its coding schemes show sufficient levels of robustness (see section 3.1
for an example) resulting in widespread adoption of the framework in the online
education research community [10].
Of particular interest is the level of cognitive presence exhibited by the com-
munity members, due to its indication of their critical thinking. It is defined as
the “extent to which the participants in any particular configuration of a com-
munity of inquiry are able to construct meaning through sustained communica-
tion.” [8, p11], and is operationalized through a practical inquiry model which
defines the four phases of the inquiry process that occurs during learning [8]:
1. Triggering: In the first phase, students are faced with some problem or
dilemma which triggers a learning cycle. This typically results in messages
asking questions and expressing a sense of puzzlement.
2. Exploration: This phase is primarily characterized by the exploration –
both individually and in groups – of different ideas and solutions to the problem
at hand. Brainstorming, questioning, leaping to conclusions, and information
exchange are the primary activities in the exploration phase.
3. Integration: After exploring different ideas, students synthesize the rele-
vant ideas which ultimately leads to construction of meaning [8]. From the
perspective of an instructor, this is the most difficult phase to detect as
integration of ideas is often not clearly visible in discussion transcripts.
4. Resolution: In the final phase, students apply the newly constructed knowl-
edge to the original problem, typically in the form of hypothesis testing or
the building of a consensus.
Challenges of CoI framework adoption. One of the biggest practical chal-
lenges in adoption of the CoI framework – and other transcript analysis methods
– is that it requires experienced coders and substantial labor-intensive work to
code (i.e. categorise) discussion messages for the levels of three presences [17,4].
As such, it is argued that this and similar approaches have had very little practical
impact upon current educational practices [4]. To enable a more proactive
use of the Community of Inquiry framework by course instructors, there is a
need for automated content analysis of online discussions that would provide
instructors with real-time feedback about student learning activities [15].
2.2 Automated classification of student discussion messages
Despite the labor intensive nature of manually coding online discussion messages,
human coders that categorise online discussion messages into the phases of cogni-
tive presence typically achieve very high intersubjective agreements. Moreover,
the high levels of agreement among coders suggest that humans can identify
the latent phases of cognitive presence from text-based discussions with relative
ease. On the other hand, using machine learning to classify student messages
in a similar manner is a challenging task. Where humans construct meaning
from text using various inferences and abstractions that manifest as complex
higher-order cognitive processes, machine learning approaches require meticu-
lously constructed feature spaces, which are representative of the problem task.
Kovanović et al. [17] presented an approach to classifying cognitive presence
from online discussions, using a Support Vector Machine (SVM) classification
model, which achieved a classification accuracy of 58.4%. While the results of
this work are promising, the overall performance of this approach is substantially
less accurate than what can be achieved by human coders, which provides
further evidence of the overall complexity of this task. In this approach,
Kovanović et al. [17] made use of lexical features, derived from the content of
each individual discussion message, that are prominent within the literature. These
features consisted of various N-grams, POS tags, named entity counts and dependency
tuples, as well as intuitive features such as whether a post or reply is the
first in a discussion thread. In contrast, human coders typically utilise contextual
information when making their coding decisions, such as the structure
of the discussion or the sequence in which discussion messages appear. Because
of this, it is worth investigating whether structural features of a discussion,
together with considering discussion messages in sequence, may further improve
classification performance.
Beyond the CoI framework, many studies have acknowledged that accounting
for the relationships between individual messages and the latent structure of
discussions may improve classification performance for transcript analysis
[25,5,23]. Specifically, Ravi and Kim [23] suggest that features derived from
a previous message can be a positive indicator for the classification of the next
post in a discussion. Other related work in threaded-discussion classification
increasingly seeks to incorporate the structural features of discussions
[6,28,14]. The most commonly used structural features include a post’s position
relative to others in a discussion, whether a post is the first or the last in a
thread, how similar a post is to its neighbours, and how many replies a post
accrued. For this study, we attempt to account for the latent structure between
posts in a discussion by incorporating these features into a Conditional Random
Field approach.
2.3 Conditional Random Fields for Automated Detection of
Cognitive Presence
We have implemented a Conditional Random Field (CRF) classification model [26]
to annotate posts within a discussion with the phases of cognitive presence. Un-
like traditional text classification methods, Conditional Random Fields consider
the label sequence of a data set. Because of this, Conditional Random Fields
have found numerous applications in natural language processing (NLP) tasks,
such as part-of-speech (POS) tagging [18], document segmentation and sum-
marisation [24], as well as gene prediction from biological sequence data [3].
Recent related research has extended CRFs to online forum discussions,
where posts and interactions between participants are sequential in nature. Wang
et al. [28] applied CRFs to discussion forums to learn the reply structure of forum
interactions. This was achieved by using rich features that capture both short
and long range dependencies within posts of an online discussion such as the
lexical content similarity between two neighbouring posts. Similarly, FitzGerald
et al. [6] combined the lexical features of posts, such as the word and sentence
count of a post, with a Linear-Chain CRF to detect high quality comments in blog
discussions. Moreover, FitzGerald et al. [6] postulate that there exist
sequential dependencies between posts in a forum, which emphasises the usefulness
of structural features derived from the entire discussion, as well as lexical
features from a single post. To date, CRF classification has not been applied to
the problem of automating the detection of cognitive presence in online discussion
transcripts. Here, we show that taking this step improves the accuracy of
classification when compared with the current best practices.
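For reference, the standard linear-chain CRF from the literature can be written as below. This is the textbook form rather than anything specific to our implementation, and the notation is assumed for illustration only: x is the sequence of per-message feature vectors, y the sequence of phase labels, f_k the feature functions, λ_k their learned weights, and Z(x) the normalising constant.

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Big)
```

Because each feature function may depend on both the previous label and the current label, the predicted phase of one message can influence the phase predicted for the next, which a per-message classifier cannot express.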
3 Methods
3.1 Dataset
The data used in this study comes from six offerings of a fully-online masters-level
research-oriented course in software engineering at a Canadian public university.
This is the same dataset as was used in the study by Kovanović et al. [17], which
makes for a more accurate and direct comparison between the two different
classification approaches. In total, the data consists of 1,747 messages produced by
81 students. Each message was coded by two experienced coders who achieved
an excellent level of coding agreement (Cohen’s Kappa of 0.97). Cohen’s Kappa is
a measure commonly used to assess inter-rater reliability between coders using a
quantitative categorisation scheme. Table 1 shows the distribution of messages
in different phases of cognitive presence. The course structure and
organization are explained in detail in Kovanović et al. [16] and Gašević et al. [12].
Table 1. Cognitive Presence Coding

ID  Phase                                   Messages  (%)
0   Other (no signs of cognitive presence)  140       8.01%
1   Triggering Event                        308       17.63%
2   Exploration                             684       39.17%
3   Integration                             508       29.08%
4   Resolution                              107       6.12%
    All phases                              1747      100%
3.2 Classifier Implementation
For this study, we implemented a Linear-Chain Conditional Random Field (LCCRF)
model to predict the phases of cognitive presence occurring in online
discussions. This LCCRF was implemented in Java using the Mallet library [21],
which is a widely used open source toolkit for machine learning. This library was
extended as needed to suit our experimental requirements.
3.3 Data Preprocessing
In this dataset, online discussions form a tree-like hierarchical structure (i.e.,
each discussion message can receive replies which can also receive replies). This
presents a problem; in order to train and test our LCCRF implementation, the
structure of the data must be linear, as opposed to the current tree structure. In
order to obtain appropriate sequences of data, sub-threads were extracted such
that every sequence of posts from the root node to every leaf node in a tree
was obtained. To obtain reliable results, these sub-threads must be remerged
after classification to produce one classification per message in a discussion; this
remerging process is described in section 4.1. While other CRF models
accept hierarchical structures (e.g., Tree-Structured and Hierarchical
CRFs), we chose a linear-chain model over other approaches due to the size
constraints imposed by the dataset, which had only 84 coded discussion threads
in total to use for training and testing a tree-structured model. Breaking these
up into linear chains produced more message sequences that could be used to
train our linear model.
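The extraction of root-to-leaf sequences can be sketched as follows. This is a minimal Python illustration and not the Java/Mallet code used in the study; the `parent_of` mapping, the message ids, and the function name are hypothetical.

```python
from collections import defaultdict


def root_to_leaf_paths(parent_of):
    """Return every root-to-leaf message sequence in a reply tree.

    parent_of maps a message id to the id of the message it replies to
    (None for the post that starts the thread).
    """
    children = defaultdict(list)
    roots = []
    for msg, parent in parent_of.items():
        if parent is None:
            roots.append(msg)
        else:
            children[parent].append(msg)

    paths = []
    stack = [[root] for root in reversed(roots)]
    while stack:
        path = stack.pop()
        replies = children[path[-1]]
        if not replies:
            paths.append(path)  # reached a leaf: one complete linear sequence
        # Push in reverse so replies are visited in their original order.
        for reply in reversed(replies):
            stack.append(path + [reply])
    return paths


# Hypothetical thread: m1 has replies m2 and m3; m2 has replies m4 and m5.
thread = {"m1": None, "m2": "m1", "m3": "m1", "m4": "m2", "m5": "m2"}
paths = root_to_leaf_paths(thread)
```

In this toy thread the root post `m1` appears in all three extracted sequences, which is why messages near the top of a discussion are classified more than once.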
In addition to the extraction of linear sequences, the discussion threads in
the data set were split into two sets: one for training and testing the CRF model,
and the other for validation, from which our results are derived. Overall, the
threads were split 70/20/10% for training, testing and validation, respectively.
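A thread-level split of this kind might be sketched as below. The random seed, the rounding of partition sizes, and the function name are assumptions made for illustration; the exact assignment of threads used in the study is not specified here. Splitting whole threads rather than individual sequences keeps every sub-thread of a discussion in the same partition.

```python
import random


def split_threads(thread_ids, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle thread ids and cut them into train/test/validation lists."""
    ids = sorted(thread_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    n_train = round(fractions[0] * len(ids))
    n_test = round(fractions[1] * len(ids))
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    validation = ids[n_train + n_test:]  # whatever remains
    return train, test, validation


# 84 threads, identified here simply by the integers 0..83.
train, test, validation = split_threads(range(84))
```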
3.4 Classification Features
Many of the features used for the purpose of this study were extracted using
the various functionalities of the Stanford CoreNLP Java library [20], and are
derived from the related work in our literature review. Each post in the discussion
is described by a feature vector that attempts to encapsulate both lexical and
structural features. In addition to word unigrams, lexical features were derived
from the text content of a post itself, and structural features were used to indicate
where a post resides in the context of the entire discussion thread. These features
are presented below:
1. Entity Count is the number of entities within a post as found by the
Stanford CoreNLP Named Entity Recognition (NER) tool. The rationale
behind using this feature is that discussion participants posting exploration
comments are more likely to introduce a number of entities through their
exploration of ideas.
2. First Post and Last Post are boolean features that are set to true when a
post is the first and last in a discussion respectively. These features represent
the implicit structure of the discussion, where it is intuitive to believe that
most Triggering phases occur at the start of a discussion.
3. Comment Depth is the number assigned to a post based on its chronolog-
ical order within a discussion thread.
4. Post Similarity of the previous and next post in a discussion is calculated
by obtaining the cosine similarity of two TF-IDF weighted vectors. The
post similarity features assist in incorporating the local structure of the
discussions, where it is expected that some phases of cognitive presence differ
significantly from one another, and some only slightly.
5. Word and Sentence counts capture the number of words and sentences
within a particular post. It is expected that when a discussion is reaching
the integration and resolution phases, there is a lot more content due to the
synthesis and integration of ideas.
6. Number of Replies to a post, which provides the classifier with the
intuition that the earlier phases of cognitive presence (Triggering and
Exploration) generally receive more replies than the later phases (Integration
and Resolution). This feature also helps model the implicit structure within a
discussion, giving the classifier an indication of how large the discussion is.
These features form a feature vector for each message in a discussion thread.
Because our classifier is sequential, these feature vectors are combined to form
a feature vector sequence used in Mallet for training and testing our CRF clas-
sification model.
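Several of the structural features listed above can be sketched in a few lines of Python. This is an illustration only, not the CoreNLP/Mallet pipeline used in the study: tokenisation is a plain whitespace split, the TF-IDF weights use a smoothed IDF computed over the sequence itself rather than over the whole corpus, and the entity, sentence and reply counts are omitted. All function and key names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """One TF-IDF weighted sparse vector (a dict) per tokenised post."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF (an assumption) so terms found in every post keep a weight.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def structural_features(posts):
    """Feature dicts for one linear sequence of posts given as plain strings."""
    vectors = tfidf_vectors([post.lower().split() for post in posts])
    last = len(posts) - 1
    features = []
    for i, post in enumerate(posts):
        features.append({
            "first_post": i == 0,
            "last_post": i == last,
            "depth": i,  # position of the post within the sequence
            "word_count": len(post.split()),
            "sim_prev": cosine(vectors[i], vectors[i - 1]) if i > 0 else 0.0,
            "sim_next": cosine(vectors[i], vectors[i + 1]) if i < last else 0.0,
        })
    return features


posts = [
    "How should we test legacy code?",
    "Maybe unit tests around legacy code first",
    "I agree",
]
feats = structural_features(posts)
```

Each dictionary corresponds to one message; the list as a whole plays the role of the feature vector sequence handed to the sequence classifier.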
4 Results
The aim of this study was to investigate whether classifying posts in sequence,
with the addition of structural features, improves upon the current approach to
identifying cognitive presence in online learning discussions. In order to evaluate
the effectiveness of our approach we use Cohen’s Kappa, which is a metric often
used for judging the reliability of a categorisation scheme. Cohen’s Kappa is
advantageous as it allows for a genuine comparison between the performance of
human coders and our approach. A comparison between this experiment and the
approach with the current highest accuracy is described in Table 2.
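For readers unfamiliar with the metric, Cohen’s Kappa compares the observed agreement p_o between two sets of labels with the agreement p_e expected by chance, as (p_o − p_e) / (1 − p_e). The sketch below is a plain Python illustration with made-up labels, not the evaluation code or data used in this study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa between two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up phase ids for eight messages: human coding vs. classifier output.
gold = [1, 1, 2, 2, 3, 3, 2, 1]
pred = [1, 2, 2, 2, 3, 2, 2, 1]
kappa = cohens_kappa(gold, pred)
```

Unlike raw accuracy, the measure discounts agreement that a classifier would reach simply by favouring the most frequent phase.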
Before remerging the discussion threads, the CRF model achieved an accuracy
of 67.2%, with a precision and recall of 0.515 and 0.620 respectively and an
F-measure of 0.562. Because sub-threads were extracted for this experiment
(detailed in section 3.3), messages found earlier in the discussion threads were
classified multiple times. As a result, these figures are optimistically
high: repeated correct classifications of the same message inflate the overall
classification accuracy. This problem was addressed by re-merging the discussion
threads back into their original hierarchical form using a majority vote mechanism.
Table 2. Comparison of Results

Approach               Cohen’s Kappa  Accuracy
Kovanović et al. [17]  0.410          58.4%
LCCRF                  0.482          64.2%
Human                  0.97           NA
4.1 Re-merging Discussion Threads
As mentioned earlier in section 3.3, every message sequence from a root post to
every leaf node in a discussion was extracted to produce an appropriate linear
sequence to train the LCCRF. This means that the earlier posts in a discussion
may have been classified multiple times. Furthermore, the predicted phase need
not necessarily be the same for these multiple classifications; a post that was
classified as Triggering in one sequence might be classified as Exploration in
the next sequence that it appears in. In order to obtain one classification result
for each message in a threaded discussion, the sub-threads were remerged using
a majority vote mechanism. This method of remerging posts results in a final
accuracy of 64.2% for the validation set. A large majority of posts that were
classified multiple times belonged to the Triggering label, but many of these
multiple classifications were correctly identified. Thus, the resulting small drop
in performance is representative of the general classification accuracy obtained
by the LCCRF. It seems that this implementation performs well at this type
of classification task, with an overall precision and recall of 0.630 and 0.504
respectively and an F-measure of 0.559. Moreover, our implementation achieves a
Cohen’s Kappa value of 0.482, which gives us a comparison with the human
coding according to this widely used metric for judging the overall reliability of
a coding or categorisation scheme. Table 2 demonstrates that while an improvement
has been obtained, more work needs to be completed before we can be sure
that an automated approach is performing at a level similar to human coders in
this task.
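The majority vote used for remerging can be illustrated with the short Python sketch below. The message ids and predicted phase ids are invented, and the tie-breaking rule (choosing the lowest phase id among equally frequent labels) is an assumption made so that the example is deterministic; how ties were resolved in the experiment is not stated here.

```python
from collections import Counter, defaultdict


def remerge_by_majority(sequences, predictions):
    """Collapse per-sequence predictions into one label per message.

    sequences:   lists of message ids, one list per root-to-leaf sub-thread
    predictions: lists of predicted phase ids, aligned with `sequences`
    """
    votes = defaultdict(list)
    for seq, labels in zip(sequences, predictions):
        for msg, label in zip(seq, labels):
            votes[msg].append(label)

    merged = {}
    for msg, labels in votes.items():
        counts = Counter(labels)
        top = max(counts.values())
        # Tie-break assumption: lowest phase id among the most frequent labels.
        merged[msg] = min(label for label, c in counts.items() if c == top)
    return merged


# m1 is classified three times (twice as phase 1, once as phase 2).
sequences = [["m1", "m2", "m4"], ["m1", "m2", "m5"], ["m1", "m3"]]
predictions = [[1, 2, 3], [1, 2, 2], [2, 2]]
merged = remerge_by_majority(sequences, predictions)
```

After merging, every message carries exactly one predicted phase, so accuracy and Kappa can be computed over messages rather than over repeated sequence positions.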
5 Discussion
Our LCCRF approach shows promise for the automated classification of cog-
nitive presence in discussions occurring within an online learning community.
Moreover, the results of this work show a modest improvement over the work
conducted by Kovanović et al. [17], who presented an accuracy of 58.4% as seen
in Table 2. The key difference between these two approaches is clear: our approach
considers discussion messages in sequence, modelled via the CRF, utilising features
that attempt to convey the context of the discussion. In contrast, the work
presented by Kovanović et al. [17] considers each message separately, relying
primarily on lexical features and an SVM.
These results suggest that a CRF utilising structural features is well suited
to this text classification task. Using this approach, the classifier may more ap-
propriately model the dependencies between messages in online discussions. The
structurally oriented feature-set allows for a contrast between posts that would
otherwise contain very similar lexical features. By combining these features, the
probabilistic CRF implementation appears to better model the dependencies
between posts, leading to increased predictive performance. This improvement
provides preliminary evidence of how modelling the structure of discussions, and
considering discussion posts in sequence may be an important factor in further
improving the automated detection of cognitive presence. Further studies using
our approach will seek to confirm this theory by exploring alternate features and
CRF implementations.
5.1 Limitations and future work
One key limitation of this work is contextual: our results may be biased towards
the single course from which the dataset was derived. Moreover, there are
a number of different platforms in which online learning discussions can take
place. For example, a learning community using social media may be more informal
in nature than one conducted in an institution’s formal discussion forum.
A model trained on one community may not produce reliable results for
another community. Future research needs to consider data sets from courses in
other subject areas and delivery modes (e.g., blended learning). One potential
advantage of a structural approach is that it may perform more consistently
across different datasets. A classification based upon structural features is more
likely to prove robust under changed conditions than specific lexical character-
istics, and so there is the possibility that the CRF approach will achieve better
performance at text annotation across multiple discussion groups and fora. Fur-
ther research and new datasets will be required to investigate whether this claim
holds merit.
Other approaches to move towards automating the coding process will be inves-
tigated as future work. Because this approach uses a linear-chain model, some
dependencies between messages in an online discussion may be missed. However,
this linear model allows for the implementation of coding practice rules used by
various CoI coding schemes, such as “coding up” – i.e., when a message has
traces of two phases of cognitive presence, it is coded with the higher phase [16].
Despite this, approaches that might better model dependencies across hierarchical
structures, such as a tree CRF, may further improve on our current accuracy.
As seen in Table 1, the distribution of phases (class labels) in our dataset is
largely uneven. This disparity between the individual phases of cognitive presence
is reflected in the predictive performance of our classifier, where the least
represented phases are classified correctly less often than their better
represented counterparts. Unfortunately, collaboration within online learning
communities commonly takes this form, where learners typically do not progress
to the resolution phase of cognitive presence [11,12]. Future attempts at au-
tomation may benefit from a method of accounting for this uneven distribution
of class labels.
For automated classification to replace the current approach of manually
hand-coding transcripts of online learning communities, we aim to achieve a
Cohen’s Kappa value above 0.80, which indicates an almost perfect agreement among
coders according to the Landis and Koch [19] interpretation of Cohen’s Kappa. Our CRF
approach achieved a Kappa value of 0.482, which indicates a moderate agreement,
and will require further improvement before machine learning techniques
can replace hand coders. Future work will aim to further improve our classifier’s
performance. Specifically, we plan to: (i) evaluate
our model on another, larger dataset with a more even distribution of
phases; (ii) seek additional features that may improve upon our current accuracies,
such as Coh-Metrix [13] and features derived from the Linguistic Inquiry
and Word Count (LIWC) framework [27], which are commonly used to characterise
cognitive processing associated with comprehending and producing text
and discourse; and (iii) better model the dependencies between threaded
discussions using a Tree-Structured CRF model.
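The Landis and Koch [19] interpretation referred to above assigns a verbal label to ranges of Kappa values. A small Python helper, with a hypothetical function name, makes the bands explicit.

```python
def landis_koch_label(kappa):
    """Verbal strength-of-agreement label for a Kappa value (Landis and Koch)."""
    if kappa < 0:
        return "poor"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


lccrf_label = landis_koch_label(0.482)  # Kappa of the LCCRF approach
human_label = landis_koch_label(0.97)   # Kappa between the two human coders
```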
6 Conclusion
In this work, we presented a new approach to automating the detection of the
four phases of cognitive presence arising in online discussions. By reconceptualis-
ing online discussions as a sequence prediction problem, we predicted a sequence
of labels (i.e. the phases of cognitive presence) for a sequence of messages. This
allowed us to use a linear chain Conditional Random Field model for classifi-
cation, which incorporates structural features of online discussions rather than
just the lexical features that have previously been applied to solving this prob-
lem. This approach to automating the detection of cognitive presence has shown
promise, with moderate improvements over alternative approaches with an ac-
curacy of 64.2% and a Cohen’s Kappa value of 0.482. However, classification
accuracies are not yet high enough to replace the current approach of manually
coding transcripts. Further improving this model is a priority for future work
where we aim to further evaluate the model on alternative datasets, investigate
additional features, and attempt to better model the dependencies between posts
using a tree-structured CRF model.
References
[1] Arbaugh, J., Cleveland-Innes, M., Diaz, S.R., Garrison, D.R., Ice, P.,
Richardson, J.C., Swan, K.P.: Developing a community of inquiry instrument:
Testing a measure of the Community of Inquiry framework using a
multi-institutional sample. The Internet and Higher Education 11(3–4),
133–136 (2008)
[2] Corich, S., Hunt, K., Hunt, L.: Computerised Content Analysis for Measur-
ing Critical Thinking within Discussion Forums. Journal of e-Learning and
Knowledge Society 2(1), 47 – 60 (2012)
[3] Culotta, A., Kulp, D., McCallum, A.: Gene prediction with conditional
random fields (2005)
[4] Donnelly, R., Gardner, J.: Content analysis of computer conferencing tran-
scripts. Interactive Learning Environments 19(4), 303–315 (2011)
[5] Feng, D., Shaw, E., Kim, J., Hovy, E.: An intelligent discussion-bot for
answering student queries in threaded discussions. In: Proceedings of the
11th international conference on Intelligent user interfaces. pp. 171–177.
ACM (2006)
[6] FitzGerald, N., Carenini, G., Murray, G., Joty, S.: Exploiting conversational
features to detect high-quality blog comments. In: Advances in Artificial
Intelligence, pp. 122–127. Springer (2011)
[7] Garrison, D.R.: Thinking Collaboratively: Learning in a Community of In-
quiry. Routledge, New York, NY (2015)
[8] Garrison, D.R., Anderson, T., Archer, W.: Critical Thinking, Cognitive
Presence, and Computer Conferencing in Distance Education. American
Journal of Distance Education 15(1), 7–23 (2001)
[9] Garrison, D.R., Anderson, T., Archer, W.: The first decade of the com-
munity of inquiry framework: A retrospective. The Internet and Higher
Education 13(1–2), 5–9 (2010)
[10] Garrison, D.R., Arbaugh, J.: Researching the community of inquiry frame-
work: Review, issues, and future directions. The Internet and Higher Edu-
cation 10(3), 157–172 (Jan 2007)
[11] Garrison, R., Cleveland-Innes, M., Fung, T.S.: Exploring causal relation-
ships among teaching, cognitive and social presence: Student perceptions of
the community of inquiry framework. The Internet and Higher Education
13(1–2), 31–36 (2010)
[12] Gaˇsevi´c, D., Adesope, O., Joksimovi´c, S., Kovanovi´c, V.: Externally-
facilitated regulation scaffolding and role assignment to develop cognitive
presence in asynchronous online discussions. The Internet and Higher Ed-
ucation 24, 53–65 (2015)
[13] Graesser, A.C., McNamara, D.S., Kulikowich, J.M.: Coh-metrix providing
multilevel analyses of text characteristics. Educational Researcher 40(5),
223–234 (2011)
[14] Jin, W.: Blog comments classification using tree structured conditional ran-
dom fields. Ph.D. thesis, University of British Columbia (Vancouver (2012)
[15] Kovanovic, V., Gasevic, D., Hatala, M.: Learning Analytics for Communities
of Inquiry. Journal of Learning Analytics 1(3), 195–198 (2014)
[16] Kovanovi´c, V., Gaˇsevi´c, D., Joksimovi´c, S., Hatala, M., Adesope, O.: Ana-
lytics of communities of inquiry: Effects of learning technology use on cog-
nitive presence in asynchronous online discussions. The Internet and Higher
Education 27, 74–89 (2015)
[17] Kovanovi´c, V., Joksimovi´c, S., Gaˇsevi´c, D., Hatala, M.: Automated Content
Analysis of Online Discussion Transcripts. In: Proceedings of the Workshops
at the LAK 2014 Conference co-located with 4th International Conference
on Learning Analytics and Knowledge (LAK 2014). Indianapolis, IN (Mar
[18] Lafferty, J., McCallum, A., Pereira, F.C.N.: Conditional random fields:
Probabilistic models for segmenting and labeling sequence data (2001)
[19] Landis, J.R., Koch, G.G.: The measurement of observer agreement for cat-
egorical data. biometrics pp. 159–174 (1977)
[20] Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., Mc-
Closky, D.: The Stanford CoreNLP natural language processing toolkit.
In: Proceedings of 52nd Annual Meeting of the Association for Com-
putational Linguistics: System Demonstrations. pp. 55–60 (2014), http:
[21] McCallum, A.K.: Mallet: A machine learning for language toolkit. Tech.
rep. (2002),
[22] McKlin, T., Harmon, S., Evans, W., Jones, M.: Cognitive presence in web-
based learning: A content analysis of students’ online discussions. In: IT
Forum. vol. 60 (2002)
[23] Ravi, S., Kim, J.: Profiling student interactions in threaded discussions with
speech act classifiers. Frontiers in Artificial Intelligence and Applications
158, 357 (2007)
[24] Shen, D., Sun, J.T., Li, H., Yang, Q., Chen, Z.: Document summarization
using conditional random fields. In: IJCAI International Joint Conference
on Artificial Intelligence. pp. 2862–2867 (2007)
[25] Soller, A., Lesgold, A.: A computational approach to analyzing online
knowledge sharing interaction. In: Proceedings of Artificial Intelligence in
Education. pp. 253–260 (2003)
[26] Sutton, C., McCallum, A.: An introduction to conditional random fields.
Foundations and Trends in Machine Learning 4(4), 267–373 (2011)
[27] Tausczik, Y.R., Pennebaker, J.W.: The psychological meaning of words:
Liwc and computerized text analysis methods. Journal of language and
social psychology 29(1), 24–54 (2010)
[28] Wang, H., Wang, C., Zhai, C.X., Han, J.: Learning online discussion struc-
tures by conditional random fields. In: SIGIR’11 - Proceedings of the 34th
International ACM SIGIR Conference on Research and Development in In-
formation Retrieval. pp. 435–444. Beijing (2011)
... Several automated and semi-automated methods have been proposed in the recent literature (Barbosa et al., 2020; Ferreira et al., 2018; Kovanović et al., 2014b, 2016; Neto et al., 2018; Rolim et al., 2019; Waters et al., 2015). While early approaches applied a combination of word- and phrase-count features with black-box machine learning algorithms (i.e., Neural Networks and Support Vector Machines) (Kovanović, Joksimović, et al., 2014; McKlin, 2004), recent studies proposed the adoption of features based on psychological processes, writing cohesiveness, and discussion structure combined with decision tree algorithms (Barbosa et al., 2020; Kovanović, Joksimović, et al., 2014; Neto et al., 2018). ...
... To overcome these problems, the previous methods reported in the literature used domain-independent features and decision trees, especially the random forests algorithm. In this direction, Kovanović et al. (2016), Neto et al. (2018), and Barbosa et al. (2020) adopted random forests and features based on linguistic resources (Coh-metrix (McNamara et al., 2014) and LIWC (Tausczik & Pennebaker, 2010)), latent semantic analysis, named entity recognition, and discussion structures (Waters et al., 2015) to identify the phases of cognitive presence. These studies reached Cohen's κ of 0.63 (Kovanović et al., 2016), 0.72 (Neto et al., 2018), and 0.53 (Barbosa et al., 2020) when applied to discussion messages written in English, Portuguese and across these two languages, respectively. ...
... We used the metrics of accuracy, Cohen's κ, F1 score, weighted-averaged F1 score, and error rate of each class to measure the performance of our CNN and RF classifiers. Accuracy score and Cohen's κ are commonly used to evaluate the performance of a supervised machine-learning classifier (i.e., input variables have been pre-classified) in educational research (Barbosa et al., 2020; Kovanović et al., 2014b, 2016; Neto et al., 2018; Wang et al., 2015; Waters et al., 2015). Accuracy score is defined as the number of true positives (TP_c) plus the number of true negatives (TN_c) over the sum of all the true and false predictions for each class c. ...
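As a minimal sketch of the two metrics named in this excerpt, the following Python computes overall accuracy, one-vs-rest accuracy for a single class, and Cohen's κ from two label sequences. The function names and the toy labels are illustrative assumptions, not taken from any of the cited studies; the formulas are the standard ones (per-class accuracy as correct one-vs-rest decisions over all items, and κ = (p_o - p_e) / (1 - p_e)).

```python
from collections import Counter

def accuracy(gold, pred):
    """Overall accuracy: fraction of items whose prediction matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def per_class_accuracy(gold, pred, c):
    """One-vs-rest accuracy for class c: (TP_c + TN_c) / all items."""
    tp = sum(g == c and p == c for g, p in zip(gold, pred))
    tn = sum(g != c and p != c for g, p in zip(gold, pred))
    return (tp + tn) / len(gold)

def cohens_kappa(gold, pred):
    """Cohen's kappa: agreement corrected for chance, (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) when chance agreement p_e equals 1.
    """
    n = len(gold)
    p_o = accuracy(gold, pred)
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    # Chance agreement: product of the two marginal label proportions, summed over labels.
    p_e = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical toy labels, for illustration only.
gold = ["trigger", "explore", "explore", "integrate", "resolve", "explore"]
pred = ["trigger", "explore", "integrate", "integrate", "explore", "explore"]

print(round(accuracy(gold, pred), 3))      # 0.667
print(round(cohens_kappa(gold, pred), 3))  # 0.5
```

On this toy data the raw agreement is 4/6 while κ is only 0.5, which illustrates why both figures are usually reported together when classes are imbalanced.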
This paper proposes the adoption of a deep learning method to automate the categorisation of online discussion messages according to the phases of cognitive presence, a fundamental construct from the widely used Community of Inquiry (CoI) framework of online learning. We investigated not only the performance of a deep learning classifier but also its generalisability and interpretability, using explainable artificial intelligence algorithms. In the study, we compared a Convolutional Neural Network (CNN) model with the previous approaches reported in the literature based on random forest classifiers and linguistic features of psychological processes and cohesion. The CNN classifier trained and tested on the individual data set reached results up to Cohen's κ of 0.528, demonstrating a similar performance to those of the random forest classifiers. Also, the generalisability outcomes of the CNN classifiers across two disciplinary courses were similar to the results of the random forest approach. Finally, the visualisations of explainable artificial intelligence provide novel insights into identifying the phases of cognitive presence by word-level relevant indicators, as a complement to the feature importance analysis from the random forest. Thus, we envisage combining the deep learning method and the conventional machine learning algorithms (e.g. random forest) as complementary approaches to classify the phases of cognitive presence.
... A template for automatic classification of posts according to the four phases of cognitive presence using Conditional Random Fields (CRFs) was proposed by Waters et al. [10]. The approach is based on the implementation of the Linear-Chain Conditional Random Field (LCCRF). ...
... 3) Context features discussion: Other context features proposed by Waters et al. [10] and used in other previous work [12], [13], [22] were incorporated into the feature set of this work. In total, six features were added: 1) Number of replies: an integer variable indicating the number of responses that a given message received; 2) Message Depth: an integer variable showing the position of a message within a discussion tree; 3-4) Cosine similarity to previous/next messages: a real variable showing how much the current message is based on the information previously/later posted; and 5-6) Start/end indicators: an indicator (0/1) that shows if a message is the first/last in a discussion thread. ...
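The six context features listed above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation used in any of the cited studies: the message format (`id`, `parent`, `text`), the function names, whitespace tokenisation, a root depth of 0, and taking "previous/next" to mean the adjacent message in posting order are all assumptions made here.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def context_features(thread):
    """thread: list of dicts with 'id', 'parent' (None for the root), 'text', in posting order."""
    by_id = {m["id"]: m for m in thread}
    # Number of direct replies each message received.
    replies = Counter(m["parent"] for m in thread if m["parent"] is not None)
    # Assumed tokenisation: lower-cased whitespace split.
    bows = [Counter(m["text"].lower().split()) for m in thread]
    last = len(thread) - 1
    rows = []
    for i, m in enumerate(thread):
        # Depth: number of parent links between this message and the thread root.
        depth, parent = 0, m["parent"]
        while parent is not None:
            depth += 1
            parent = by_id[parent]["parent"]
        rows.append({
            "id": m["id"],
            "num_replies": replies[m["id"]],
            "depth": depth,
            "sim_prev": cosine(bows[i], bows[i - 1]) if i > 0 else 0.0,
            "sim_next": cosine(bows[i], bows[i + 1]) if i < last else 0.0,
            "is_first": int(i == 0),
            "is_last": int(i == last),
        })
    return rows

# Hypothetical four-message thread, for illustration only.
thread = [
    {"id": "m1", "parent": None, "text": "what is a crf"},
    {"id": "m2", "parent": "m1", "text": "a crf is a model"},
    {"id": "m3", "parent": "m1", "text": "thanks"},
    {"id": "m4", "parent": "m2", "text": "which model"},
]
rows = context_features(thread)
for row in rows:
    print(row)
```

A production feature extractor would more likely use TF-IDF or LSA vectors for the similarity features; raw term counts are used here only to keep the sketch dependency-free.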
... The evaluation of the classifier also showed that the optimization of the parameter mtry (i.e., the number of attributes used in each tree of the forest) improved the final result by 0.05 in the value of Cohen's κ and by 0.04 in accuracy [9], [22]. It is important to highlight that the methodology proposed in this study, the adoption of features based on the structure of the text and discussions instead of traditional text content features, achieved better results in the classification of cognitive presence when compared to previous work [7]-[10], [12], [13], [22], [26]. On the other hand, the literature suggests that in order to build a classifier generic enough to fit different contexts, a larger dataset than the one used in the current study is needed [1], [44]. ...
This paper investigates the impact of the use of data from different educational contexts in the automatic classification of online discussion messages according to cognitive presence, an essential construct of the community of inquiry model. In particular, this paper analyzed online discussion messages written in Brazilian Portuguese from two different courses that were from different subject areas (Biology and Technology) and had different teaching presences in the online discussions. The study explored a set of 127 features of online discussion messages and a random forest classifier to automatically recognize the phases of the cognitive presence in online discussion messages. The results showed that the classifier achieved better performance whenever applied to the entire dataset. It reveals that when a classifier is created for a specific course it is not generic enough to be applied to a course from a different field of knowledge. The results also showed the importance of the features that were predictive of the phases of the cognitive presence in the educational context. Based on the findings of this study, future work should adopt the same feature set as used in the current study, but it should train the classifier of the cognitive presence on datasets in subject areas related to the topic of the discussions.
... Many studies have shown the practical value and benefits of the CoI model for supporting and developing online learning experiences, and for promoting student engagement and learning outcomes [23]. As the assessment of the CoI presences requires a significant amount of manual coding of online discussion transcripts, several automated and semi-automated approaches have been proposed [11,17,30,31,37,38,41,44,54]. Most of the automated approaches focused on cognitive presence and categorizing students' discussion messages into one of the four phases of the cognitive presence cycle [21]. ...
... Most of the automated approaches focused on cognitive presence and categorizing students' discussion messages into one of the four phases of the cognitive presence cycle [21]. While early automation approaches used simple, dictionary-based techniques to specify indicators of different phases of cognitive presence [37,38], later studies [30,31,54] adopted commonly used text classification and natural language processing (NLP) techniques to discover important indicators of different cognitive presence phases automatically. While the use of classical NLP techniques provided good results, early work in this area also pointed out some important challenges in automating cognitive presence classification [13,30]. ...
... Similar challenges of high class imbalance were observed, as well as issues with high dimensionality of the feature space based on bag-of-words techniques. To alleviate some of these issues, several improvements were proposed [31,54]. For example, conditional random fields, which are a popular structured classification technique, were adopted by Waters et al. [54], achieving Cohen's κ of 0.48. ...
This paper presents a study that examined automated cross-language classification of online discussion messages for the levels of cognitive presence, a key construct from the widely used Community of Inquiry (CoI) model of online learning. Specifically, we examined the classification of 1,500 Portuguese language discussion messages using a classifier trained on a corpus of 1,747 English language discussion messages. In the study, a random forest classifier was developed using a small set of 108 validated indicators of psychological processes, linguistic coherence, and online discussion structure. The classifier obtained 67% accuracy and Cohen’s κ of 0.32, showing a moderate level of inter-rater agreement above chance and the general viability of the proposed approach. Most importantly, the findings suggest that certain aspects of the cognitive presence construct are highly generalizable and transfer across different languages. Finally, the paper also presents a novel method for addressing the class imbalance problem using a generic algorithm heuristic technique, which provided substantial improvements over the use of an imbalanced dataset. Results and practical implications are further discussed.
... The most widely used model is the measurement rubric of cognitive presence in the Community of Inquiry (CoI) model established by Garrison, Anderson and Archer [11]. There are already promising studies on automated classification of discussion messages based on Garrison et al.'s measurement rubric of cognitive presence [4,8,18,20,23,29]. These studies of asynchronous discussion classification tend to cater only for online for-credit undergraduate courses. ...
... They reported that the inter-rater reliability achieved an agreement of 98.1% and Cohen's κ of 0.974 between two raters. The same dataset has been reused in another four studies of automated cognitive presence classification for over five years [8,12,20,29]. Earlier studies of automated classifiers of cognitive presence did not report the inter-rater reliability of manual classification work [4,23]. ...
... Additionally, these classified messages will be our initial training data for machine learning algorithms since the final goal of our project is to develop automated cognitive classifiers. The automated classifiers can largely improve the efficiency and quality of assessing cognitive presence in massive data sets of online discussions and have great potentials for real-time remediation of student learning in Previous automated classifier studies of cognitive presence indicate that manual classification results achieved 100% reliability through negotiation between raters before automated model training [8,18,20,29]. It is because the machine-learning algorithms they utilised, such as Support Vector Machine [18], Conditional Random Fields [29], Random Forest [8,20], need a single and confirmed category of each discussion message. ...
This paper reports on early stages of a machine learning research project, where phases of cognitive presence in MOOC discussions were manually coded in preparation for training automated cognitive classifiers. We present a manual-classification rubric combining Garrison, Anderson and Archer’s (2001) coding scheme with Park’s (2009) revised version for a target MOOC. The inter-rater reliability between two raters achieved 95.4% agreement with a Cohen’s weighted kappa of 0.96, demonstrating our classification rubric is plausible for the target MOOC dataset. The classification rubric, originally intended for for-credit, undergraduate courses, can be applied to a MOOC context. We found that the main disagreements between two raters lay on adjacent cognitive phases, implying that additional categories may exist between cognitive phases in such MOOC discussion messages. Overall, our results suggest a reliable rubric for classifying cognitive phases in discussion messages of the target MOOC by two raters. This indicates we are in a position to apply machine learning algorithms which can also cater for data with inter-rater disagreements in future automated classification studies.
... Corich et al. [6] developed an automated content analysis tool and used it to classify forum messages into one of the four levels of cognitive presence. Waters et al. [25] looked at predicting the level of cognitive presence for entire chains of messages, instead of treating messages in isolation, since cognitive presence is expected to develop over time. Kovanović et al. [15] developed a model that is able to identify the level of cognitive presence with 70.3% accuracy, compared to gold-standard human annotation. ...
... Sometimes a message can show indications of two distinct phases of cognitive presence. The coding scheme indicates that these should be coded with the higher phase [25]. This is sometimes referred to as coding up. ...
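The "coding up" rule described above amounts to taking the maximum over an ordered set of phases. The sketch below assumes the usual ordering of the four cognitive presence phases (triggering event, exploration, integration, resolution); the `"other"` label for messages showing no indicators, and the names `PHASES` and `code_up`, are assumptions introduced for illustration.

```python
# Phases ordered from lowest to highest; "other" (no indicators) is an assumed label.
PHASES = ["other", "triggering event", "exploration", "integration", "resolution"]
RANK = {phase: i for i, phase in enumerate(PHASES)}

def code_up(indicated_phases):
    """Return the highest phase among those a message shows indicators of."""
    if not indicated_phases:
        return "other"
    return max(indicated_phases, key=RANK.__getitem__)

print(code_up(["exploration", "integration"]))  # integration
```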
... One approach to exploiting the temporal and contextual aspect of discussion threads was explored by Waters et al. [25] using conditional random fields (CRFs). Again, the data set used was the same as in the present study. ...
The widespread use of online discussion forums in educational settings provides a rich source of data for researchers interested in how collaboration and interaction can foster effective learning. Such online behaviour can be understood through the Community of Inquiry framework, and the cognitive presence construct in particular can be used to characterise the depth of a student's critical engagement with course material. Automated methods have been developed to support this task, but many studies used small data sets, and there have been few replication studies. In this work, we present findings related to the robustness and generalisability of automated classification methods for detecting cognitive presence in discussion forum transcripts. We closely examined one published state-of-the-art model, comparing different approaches to managing unbalanced classes in the data. By demonstrating how commonly-used data preprocessing practices can lead to over-optimistic results, we contribute to the development of the field so that the results of automated content analysis can be used with confidence.
... Recent studies examined the use of different features and classifiers. For instance, Kovanović et al. [28] developed an approach that relies on features based on Coh-Metrix [23], LIWC [22], LSA similarity, named entities, and discussion context [30] instead of word counts used by previous works. Moreover, they applied a random forest algorithm to classify the messages according to the categories of Cognitive Presence. ...
... The results obtained indicate the potential of using XGBoost and AdaBoost. Moreover, the proposed approach reached the values of κ which are better than those of the classifiers of Cognitive Presence developed for English [28], [30]. ...
... Many studies have proposed methods for automatic classification of discussion transcripts according to the coding schemes for the cognitive and social presences [4,15,28,31,38,48]. Most of the recent approaches focused on the adoption of a relatively small set of features representative of relevant psychological processes, linguistic properties, and writing cohesiveness in order to classify students' messages automatically. ...
... Therefore, the recent literature proposes the adoption of a combination of (i) textual features reflective of psychological processes, writing cohesiveness and discussion structure, and (ii) decision tree classifiers [7], which allow, due to their white box nature, for the analysis of the influence of the different features on the final classification results. Kovanović et al. [28] and Neto et al. [38] adopted random forest classifiers and features based on Coh-metrix [37], LIWC [46], latent semantic analysis (LSA)-based similarity, named entities, and discussion context [48], to identify phases of cognitive presence for messages written in English (0.63 Cohen's κ) and Portuguese (0.72 Cohen's κ). Ferreira et al. [15] proposed a similar approach, using the same features and classifiers as used in [28,38], to develop three binary classifiers for automatic coding of a discussion message based on the three categories of social presence. ...
... Kovanović et al. [24] examined the use of a combination of bag-of-words (n-gram) and Part-of-Speech (POS) n-gram features for classifying cognitive presence using the Support Vector Machines (SVMs) classifier, achieving 0.41 Cohen's κ. Kovanović et al. [25] and Neto et al. [35] adopted features based on Coh-metrix [32], LIWC [44], latent semantic analysis (LSA) similarity, named entities, and discussion context [45], to identify phases of cognitive presence for messages written in English (0.63 Cohen's κ) and Portuguese (0.72 Cohen's κ). Besides, the authors applied a random forest classifier [6], which also allowed for the analysis of the influence of the different features on the final classification results. ...
... Although we did not find any other related work which performed a similar analysis of social presence to compare to, it is important to mention that the approach presented here reached accuracy results better than the classifiers of cognitive presence developed for English [24,25,45]. ...
... Sometimes a message can show indications of two distinct phases of cognitive presence. The coding scheme indicates that these should be coded with the higher phase [14]. This is sometimes referred to as coding up. ...
... The number of direct replies to the message. Messages relating to triggering events and exploration are expected to generate more replies than those in deeper phases [14]. ...
This paper describes work in progress to answer the question of how we can identify and model the depth and quality of student participation in class discussion forums using the content of the discussion forum messages. We look at two widely-studied frameworks for assessing critical discourse and cognitive engagement: the ICAP and Community of Inquiry (CoI) frameworks. Our goal is to discover where they agree and where they offer complementary perspectives on learning. In this study, we train predictive classifiers for both frameworks on the same data set in order to discover which attributes are most predictive and how those correlate with the framework labels. We find that greater depth and quality of participation is associated with longer and more complex messages in both frameworks, and that the threaded reply structure matters more than temporal order. We find some important differences as well, particularly in the treatment of messages of affirmation.
... Finally, Waters et al. [83] implement a machine learning approach to predict students' critical thinking levels in formal online discussions according to CoI. In their study, they adopt word count, post similarity, chronological order, and other features to build a model that achieves a moderate level of accuracy. ...
Learners' progress within Computer-Supported Collaborative Learning (CSCL) environments is typically measured via analysis and interpretation of quantitative web interaction measures. However, the usefulness of these 'proxies for learning' is questioned as they do not necessarily reflect critical thinking-an essential component of collaborative learning. Research indicates that pedagogical content analysis schemes have value in measuring critical discourse in small scale, formal, online learning environments, but research using these methods on high volume, informal, Massive Open Online Course (MOOC) forums is less common. The challenge in this setting is to develop valid and reliable indicators that operate successfully at scale. In this paper, we test two established coding schemes used for pedagogical content analysis of online discussions in a large-scale review of MOOC comment data. Pedagogical Scores (PS) are derived from manual ratings applied to comments by raters and correlated with automatically derived linguistic and interaction indicators. Results show that the content analysis methods are reliable, and are very strongly correlated with each other, suggesting that their specific format is not significant in this setting. In addition, the methods are strongly associated with relevant linguistic indicators of higher levels of learning and have weaker correlations with other linguistic and interaction metrics. This suggests promise for further research using Machine Learning techniques, with the goal of providing realistic feedback to instructors, learners and learning designers.
This paper describes doctoral research that focuses on the development of a learning analytics framework for inquiry-based digital learning. This research builds on the Community of Inquiry model (CoI) as a foundation commonly used in research and practice of digital learning and teaching. Specifically, the main contributions of this research are: i) the development of a novel text classification algorithm for (semi)automated message classification which enables easier adoption of the CoI model, ii) understanding of the relationships between different socio-technological interactions and the dimensions of the CoI model.
This paper describes a study that looked at the effects of different technology-use profiles on educational experience within communities of inquiry, and how they are related to the students’ levels of cognitive presence in asynchronous online discussions. Through clustering of students (N=81) in a graduate distance education engineering course, we identified six different profiles: 1) task-focused users, 2) content-focused no users, 3) no users, 4) highly intensive users, 5) content-focused intensive users, and 6) socially-focused intensive users. The identified profiles differ significantly in terms of their use of the learning platform and their levels of cognitive presence, with large effect sizes of 0.54 and 0.19 multivariate η2, respectively. Given that several profiles are associated with higher levels of cognitive presence, our results suggest multiple ways for students to be successful within communities of inquiry. Our results also emphasize a need for different instructional support and pedagogical interventions for different technology-use profiles.
We describe the design and use of the Stanford CoreNLP toolkit, an extensible pipeline that provides core natural language analysis. This toolkit is quite widely used, both in the research NLP community and also among commercial and government users of open source NLP technology. We suggest that this follows from a simple, approachable design, straight-forward interfaces, the inclusion of robust and good quality analysis components, and not requiring use of a large amount of associated baggage.
This paper describes a study that looked at the effects of different teaching presence approaches in communities of inquiry, and ways in which student–student online discussions with high levels of cognitive presence can be designed. Specifically, this paper proposes that high-levels of cognitive presence can be facilitated in online courses, based on the community of inquiry model, by building upon existing research in i) self-regulated learning through externally-facilitated regulation scaffolding and ii) computer-supported collaborative learning through role assignment. We conducted a quasi-experimental study in a fully-online course (N = 82) using six offerings of the course. After performing a quantitative content analysis of online discussion transcripts, a multilevel linear modeling analysis showed the significant positive effects of both externally-facilitated regulation scaffolding and role assignment on the level of cognitive presence. Specifically, the results showed that externally-facilitated regulation scaffolding had a higher effect on cognitive presence than extrinsically induced motivation through grades. The results showed the effectiveness of role assignment to facilitate a high-level of cognitive presence. More importantly, the results showed a significant effect of the interaction between externally-facilitated regulation scaffolding and role assignment on cognitive presence. The paper concludes with a discussion of practical and theoretical implications.
Computer analyses of text characteristics are often used by reading teachers, researchers, and policy makers when selecting texts for students. The authors of this article identify components of language, discourse, and cognition that underlie traditional automated metrics of text difficulty and their new Coh-Metrix system. Coh-Metrix analyzes texts on multiple measures of language and discourse that are aligned with multilevel theoretical frameworks of comprehension. The authors discuss five major factors that account for most of the variance in texts across grade levels and text categories: word concreteness, syntactic simplicity, referential cohesion, causal cohesion, and narrativity. They consider the importance of both quantitative and qualitative characteristics of texts for assigning the right text to the right student at the right time.
Since its publication in The Internet and Higher Education, Garrison, Anderson, and Archer's [Garrison, D. R., Anderson, T., & Archer, W. (2000). Critical inquiry in a text-based environment: Computer conferencing in higher education. The Internet and Higher Education, 2(2–3), 87–105.] community of inquiry (CoI) framework has generated substantial interest among online learning researchers. This literature review examines recent research pertaining to the overall framework as well as to specific studies on social, teaching, and cognitive presence. We then use the findings from this literature to identify potential future directions for research. Some of these research directions include the need for more quantitatively-oriented studies, the need for more cross-disciplinary studies, and the opportunities for identifying factors that moderate and/or extend the relationship between the framework's components and online course outcomes.
This article describes a practical approach to judging the nature and quality of critical discourse in a computer conference. A model of a critical community of inquiry frames the research. A core concept in defining a community of inquiry is cognitive presence. In turn, the practical inquiry model operationalizes cognitive presence for the purpose of developing a tool to assess critical discourse and reflection. The authors present encouraging empirical findings related to an attempt to create an efficient and reliable instrument to assess the nature and quality of critical discourse and thinking in a text‐based educational context. Finally, the authors suggest that cognitive presence (i.e., critical, practical inquiry) can be created and supported in a computer‐conference environment with appropriate teaching and social presence.