International Journal of Evaluation and Research in Education (IJERE)
Vol. 13, No. 1, February 2024, pp. 80~90
ISSN: 2252-8822, DOI: 10.11591/ijere.v13i1.26024
Journal homepage: http://ijere.iaescore.com
Bloom-epistemic and sentiment analysis hierarchical
classification in course discussion forums
Hapnes Toba1, Yolanda Trixie Hernita2, Mewati Ayub1, Maresha Caroline Wijanto2
1Master of Computer Science Study Program, Faculty of Information Technology, Maranatha Christian University, Bandung, Indonesia
2Bachelor of Informatics Study Program, Faculty of Information Technology, Maranatha Christian University, Bandung, Indonesia
Article Info
ABSTRACT
Article history:
Received Nov 21, 2022
Revised Apr 3, 2023
Accepted May 27, 2023
Online discussion forums are widely used for active textual interaction
between lecturers and students, and to see how the students have progressed
in a learning process. The objective of this study is to compare appropriate
machine-learning models to assess sentiments and Bloom’s epistemic
taxonomy based on textual comments in educational discussion forums. The
proposed method is called the hierarchical approach of Bloom-Epistemic
and Sentiment Analysis (BE-Sent). The research methodology consists of
three main steps. The first step is the data collection from the internal
discussion forum and YouTube comments of a Web Programming channel.
The next step is text preprocessing to annotate the text and clear unimportant
words. Furthermore, with the text dataset that has been successfully cleaned,
sentiment analysis and epistemic categorization will be done in each
sentence of the text. Sentiment analysis is divided into three categories:
positive, negative, and neutral. Bloom’s epistemic is divided into six
categories: remembering, understanding, applying, analyzing, evaluating,
and creating. This research has succeeded in producing a course learning
subsystem that assesses opinions based on text reviews of discussion forums
according to the category of sentiment and epistemic analysis.
Keywords:
Bloom taxonomy
Course learning system
Discussion forum
Machine-learning
Sentiment analysis
This is an open access article under the CC BY-SA license.
Corresponding Author:
Hapnes Toba
Computer Science Study Program, Faculty of Information Technology, Maranatha Christian University
Jalan Suria Sumantri No. 65, Bandung 40164, West Java, Indonesia
Email: hapnestoba@it.maranatha.edu
1. INTRODUCTION
Online discussion forums are among the media people currently use to communicate with one another. Through interaction in discussion forums, members can readily develop a cooperative attitude and think critically [1]. A discussion forum is a positive tool for communicating with one another to share ideas and opinions. One example of the
benefits of having a discussion forum for online learning is in the academic world. Learning activities can be
more effective because students can solve problems through group discussions with lecturer observations
during group discussions [2]. However, in the current e-learning environment, discussion forums have not
been optimally used. Therefore, there is a need for collaboration between lecturers and students, as members
of a social network, so that the discussion forum can run well [3]. In the discussion forums, lecturers can
observe textual interactions between students and students can use the discussion forums in communicating
with other students in the group. Students can express their opinions to each other, find solutions to problems,
and develop each other's abilities, attitudes, and forms of positive behavior [4].
Discussion forums are part of the collaborative learning strategy. The purpose of such a
collaborative learning model is to improve the ability of those who do not understand a study material
perfectly. Students can share and interact with each other with different thoughts, opinions, and
interpretations of learning materials and assignments. Discussion forum datasets are useful for analyzing each
interaction between group members [5]. Given this situation, our study uses data from chat history in online
learning environments.
One way to determine the attitude or nature of participants in the discussion through the history of
textual interactions is to categorize the conversation history by sentiment [2], [6]. The purpose of sentiment analysis is to analyze the nature of each text message; the result is primarily divided into three groups, namely negative, neutral, and positive. The benefit of this analysis is that it makes it possible to determine to what extent opinions are dispersed among the participants in the discussion to find solutions in a learning phase [7], [8]. Bloom's taxonomy may also serve as a standard for assessing the outcomes of the discussion forum; it is divided into six categories, namely remembering, understanding, applying, analyzing, evaluating, and creating [9], [10].
In learning environments, the automatic analysis of opinions [5]–[7] and Bloom's epistemic taxonomy [11]–[13] will be valuable for lecturers to adjust or extend the study materials based on students' comments. However, in recent years, the two approaches have evolved independently. To the best of our knowledge, this research fills the gap by combining epistemic analysis and sentiment analysis in one framework. The researchers will also demonstrate the features of machine learning algorithms that are useful for each step of the analysis.
2. RESEARCH METHOD
2.1. Research framework
In general, the research framework involves four main phases. Figure 1 illustrates the research steps.
The initial stage is the stage of collecting data from primary data sources, i.e., user comments in the
Indonesian language from a programming course channel on YouTube. The data is gathered and will be
processed during the annotation phase. The annotation step represents the data preprocessing and data
labeling step. Data preprocessing is a necessary part of the data mining process; it ensures that the data can be used in the form required by the subsequent steps.
During the process of extracting sentences or words from the dataset, there are several stages of
preprocessing, namely tokenization, filtering, and labeling [14]. After the preprocessing steps, the dataset will
be annotated. Each text that appears in the forum will be annotated manually by three annotators (human
assessment) depending on the sentiment class and epistemic category of the Bloom taxonomy [15], [16].
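The tokenization and filtering stages can be sketched as follows (a minimal illustration in Python; the stopword list and regular expression are placeholders, not the paper's actual pipeline):

```python
import re

# Illustrative subset of Indonesian stopwords (placeholder, not the real list)
STOPWORDS = {"yang", "di", "ke", "dan", "atau", "itu", "ini"}

def preprocess(chat: str) -> list[str]:
    # Tokenization: lowercase and split on non-alphanumeric characters
    tokens = re.findall(r"[a-z0-9]+", chat.lower())
    # Filtering: drop stopwords and single-character tokens
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]

tokens = preprocess("Ingin bertanya mengapa jquery tidak berfungsi di semua web")
```

The cleaned token lists produced here would then be labeled by the human annotators.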
In the model creation phase, the concept of a two-step hierarchical classification is used to predict Bloom's epistemic and sentiment analysis (BE-Sent) on the forum datasets [17]. There are therefore two
groups of classes: sentiment and epistemic. This research compares the random forest (RF) and long-short
term memory (LSTM) methods that are used to learn model scenarios. During the evaluation phase of the
model, the calculation of model performance is carried out by calculating the accuracy and confusion matrix
analysis.
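A minimal sketch of this two-step hierarchy, assuming a sentiment model that routes each chat to a per-sentiment epistemic model (the example chats, labels, and features are invented placeholders, not the paper's actual setup):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

chats = ["terima kasih tutornya", "jquery tidak berfungsi",
         "cek urutan tag script", "mantap sekali materinya"]
sentiments = ["positive", "negative", "neutral", "positive"]
epistemic = ["applying", "applying", "analyzing", "understanding"]

# Step 1: one sentiment classifier over all chats
sent_clf = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=0))
sent_clf.fit(chats, sentiments)

# Step 2: one epistemic classifier per sentiment class
epi_clfs = {}
for s in set(sentiments):
    idx = [i for i, lab in enumerate(sentiments) if lab == s]
    clf = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=0))
    clf.fit([chats[i] for i in idx], [epistemic[i] for i in idx])
    epi_clfs[s] = clf

def predict(chat: str) -> tuple[str, str]:
    # Route the chat through the sentiment model, then the matching epistemic model
    s = sent_clf.predict([chat])[0]
    return s, epi_clfs[s].predict([chat])[0]
```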
2.2. Contributions
The key contributions to this research include the following: i) We offer a new approach to
analyzing the learning progress of students through interaction in discussion forums. To achieve this, we
introduce the BE-Sent machine-learning model to predict Bloom’s epistemic taxonomy and combine it with
sentiment analysis to predict the students’ opinions regarding the subject discussed in the forum; ii) We
conducted a thorough analysis to identify challenging cases in the grouping of Bloom categories with the
machine learning model mentioned; iii) We show how the BE-Sent model can be integrated into a course
learning system (CLS).
2.3. Sentiment analysis
Sentiment analysis is also referred to as opinion mining [7]. It is a field of study that analyzes the
opinions, judgments, and emotions of people towards different aspects such as products, services,
organizations, individuals, problems, events, topics, and their related attributes [6]. Sentiment analysis has
the objective to identify and evaluate a portion of text that expresses positive sentiments, negative sentiments,
or neutral sentiments.
Positive sentiment is expressed when the text states happiness, approval, or agreement. For example,
“I had a fantastic experience programming in Java language”, would be a positive sentiment. Negative
sentiment is expressed when the text states sadness, disapproval, or disagreement. For example, “I was
extremely not satisfied with the service at the faculty”, would be a negative sentiment. Neutral sentiment is
expressed when the text states neither positive nor negative emotion. For example, “The source code is long”,
would be a neutral sentiment.
Sentiment analysis thus aims to find the value of emotional polarity in the text so that the polarity in
each discussion conversation text can be assessed and classified. The generic approach used to analyze
sentiment can be divided into three categories, namely machine learning [18], lexicon-based [19], and hybrid
approach [20]. The categories are based on the nature of the model forming.
Sentiment analysis based on machine learning is a computational approach that determines the
sentiment expressed in a piece of text by using machine learning algorithms. In this approach, a model is
trained on a large dataset of labeled text samples (e.g., positive, negative, neutral) to learn patterns and
relationships between the words, phrases, and sentiments. During the testing phase, the model is applied to
classify new, unseen text data into one of the predefined sentiment categories. There are several machine
learning algorithms used in sentiment analysis, such as naïve Bayes, support vector machines (SVM), and
neural networks. In lexicon-based sentiment analysis, a list of words and phrases that have been annotated
with sentiment polarity information (e.g., positive, negative, or neutral) is created, like a dictionary. To
determine the sentiment of a piece of text, the words in the text are matched with the words in the sentiment
lexicon. Finally, their sentiment polarity scores are aggregated to compute the overall sentiment of the text.
A hybrid approach to sentiment analysis aggregates multiple methods to improve the accuracy and
robustness of sentiment analysis. For example, a hybrid approach can use a combination of lexicon-based
sentiment analysis and machine learning-based sentiment analysis to take advantage of the strengths of both
methods. The lexicon-based approach can provide fast and broad sentiment information, while the machine
learning approach can provide more nuanced and context-aware sentiment information. The hybrid approach
can achieve precision and robustness superior to one or the other method alone and is widely used in practical
applications of sentiment analysis.
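The lexicon-based approach described above can be sketched in a few lines (the lexicon here is a toy example, not the one used in any real system):

```python
# Toy polarity lexicon: positive words score +1, negative words -1.
# Treating "not" as a negative-polarity token is a crude simplification.
LEXICON = {"fantastic": 1, "happy": 1, "helpful": 1,
           "not": -1, "sad": -1, "disappointed": -1}

def lexicon_sentiment(text: str) -> str:
    # Aggregate the polarity scores of matched words to classify the text
    score = sum(LEXICON.get(w, 0) for w in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, "The source code is long" matches no lexicon entries and therefore scores as neutral, as in the example above.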
Figure 1. The general research framework and the concept of two-step hierarchical classification during
model creation
2.4. Bloom’s taxonomy for epistemic analysis
Bloom's taxonomy was developed in 1956 by Bloom [21]. A taxonomy is a system that underlies
classification based on scientific research data. The epistemic cognitive domain of Bloom’s taxonomy is an
aspect of ability related to aspects of knowledge, reasoning, or thought which is divided into six categories,
as given in Table 1 [22]. Bloom’s taxonomy is identified with some influential verbs at each of its levels.
Table 1. Bloom's taxonomy
Remembering: The ability in the form of memory to remember (recall) or recognize (recognition) such as terminology, definitions, facts, ideas, methodologies, basic principles, and so on.
Understanding: The ability to understand and capture the meaning of what is learned; the ability to decipher and change data presented in a certain form to another form.
Applying: The ability to apply a method to handle a new case; the application of an idea, procedure, theory, and so on.
Analyzing: The ability to break complex information into small parts and relate it to other information so that it can be understood better.
Evaluating: The ability to recognize data or information that must be obtained to produce a solution that is needed.
Creating: The ability to judge an argument regarding something that is understood, done, analyzed, and produced.
2.5. Random forest classifier
As an advanced machine-learning algorithm, the RF classifier can be used for sentiment analysis; it is based on the decision tree algorithm [23] and can perform both classification and regression. Formally, an RF is a combination of several ‘good’ decision tree models that are unified into one large model. RF implements bootstrap sampling to build prediction trees. Each decision tree predicts with a random subset of predictors, and the accuracy improves as the number of trees grows.
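A toy RF text classifier in scikit-learn illustrating the bootstrap ensemble just described (the training texts and labels are invented for illustration):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

texts = ["terima kasih sangat membantu", "tidak berfungsi error",
         "mantap sekali materinya", "gagal lagi tidak bisa"]
labels = ["positive", "negative", "positive", "negative"]

vec = TfidfVectorizer()
X = vec.fit_transform(texts)
# bootstrap=True: each tree is grown on a bootstrap resample of the training
# set; predictions typically stabilize as the number of trees grows
rf = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=42)
rf.fit(X, labels)
pred = rf.predict(vec.transform(["materi sangat membantu"]))
```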
2.6. Long short-term memory
Long short-term memory (LSTM) is one of the powerful classification methods in deep learning.
LSTM architecture has been developed as a solution to the problem of vanishing and exploding gradients
encountered in recurrent neural networks (RNNs) [24]. A vanishing gradient occurs when weight updates become so small that the weights barely change across training steps, so the model never reaches a better or convergent result. Conversely, when gradient values grow too large, the weight values in several layers also grow, and the optimization algorithm diverges; this is called an exploding gradient.
LSTM has a chain-like structure where in each cell there are three gates, namely the forget gate,
input gate, and output gate. LSTM has been proven to be effective for text mining solutions [25]. A
simplified form of an LSTM architecture can be seen in Figure 2. A forget gate manages the deletion of previously stored memory; it determines whether information from the current input and the previous output is allowed to continue to the next process. This layer produces an output between 0 and 1: an output of 0 means the information will be forgotten, while an output of 1 means the information is retained and allowed to pass.
The input gate determines which parts are to be updated and manages the storage of new
information. The input gate has two parts, i.e., the neuron layer with the sigmoid activation function, and the
neuron layer with the tanh activation function. The output gate decides what is to be produced as the final decision in each neuron and is implemented in two steps: first, the sigmoid activation function decides which values from the memory cell will be emitted; second, the memory cell value is scaled using the tanh activation function.
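The gate mechanics described in this subsection are commonly written as the standard LSTM equations (general formulation, not taken from the paper):

```latex
f_t &= \sigma\!\left(W_f\,[h_{t-1}, x_t] + b_f\right)          && \text{(forget gate)} \\
i_t &= \sigma\!\left(W_i\,[h_{t-1}, x_t] + b_i\right)          && \text{(input gate)} \\
\tilde{c}_t &= \tanh\!\left(W_c\,[h_{t-1}, x_t] + b_c\right)   && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t               && \text{(memory cell update)} \\
o_t &= \sigma\!\left(W_o\,[h_{t-1}, x_t] + b_o\right)          && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t)                                    && \text{(hidden state)}
```

Here \(\sigma\) is the logistic sigmoid, \(\odot\) denotes element-wise multiplication, and \([h_{t-1}, x_t]\) is the concatenation of the previous hidden state and the current input.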
Figure 2. A generic layout of an LSTM cell architecture
2.7. YouTube application programming interface
The YouTube application programming interface (API) is a public service provided by YouTube that lets programmers interact with video resources on YouTube channels [26]. Developers need to configure a project in the Google Cloud Console to use the YouTube API [27]. An API is a very useful mechanism for connecting applications or websites.
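A hedged sketch of collecting comments with the official google-api-python-client (the video ID and API key are placeholders; the network call itself is shown commented out, since it requires credentials):

```python
def comment_request_params(video_id: str, max_results: int = 100) -> dict:
    # Parameters for the YouTube Data API v3 commentThreads.list endpoint
    return {
        "part": "snippet,replies",
        "videoId": video_id,
        "maxResults": max_results,
        "textFormat": "plainText",
    }

# Actual call (not executed here; YOUTUBE_API_KEY is a placeholder):
# from googleapiclient.discovery import build
# youtube = build("youtube", "v3", developerKey=YOUTUBE_API_KEY)
# response = youtube.commentThreads().list(**comment_request_params("VIDEO_ID")).execute()
# comments = [item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
#             for item in response["items"]]
```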
3. RESULTS AND DISCUSSION
The dataset is extracted from a publicly available discussion forum, i.e., the Pasundan University
Web Programming Course on the YouTube Channel [28]. The extracted textual comments are taken from
sessions 10 to 12. The general statistics for the dataset can be found in Table 2.
Table 2. General statistics of the dataset
Number of videos: 6
Number of main forums: 3,281
Number of threads (reply): 1,113
Number of chats: 4,396
Number of words: 51,202
A discussion example, some replies, and the annotated BE-Sent are given in Table 3. The annotation is carried out by three students with a 0.72 inter-annotator agreement. For each chat, the annotator searches for the main word or phrase as an indicator to categorize it into sentiment (positive, negative, or neutral) and Bloom's categories: remembering, understanding, applying, analyzing, evaluating, and creating. For instance, in Table 3, the first chat contains the phrase tidak berfungsi (does not work) and the word coba (to try). Based on that, the chat is annotated with negative sentiment and Bloom's applying category.
Table 3. Forum example and the annotation
Main: "Ingin bertanya mengapa jquery tidak berfungsi di semua web padahal kode yang sudah coba dimasukkan sudah sama dengan materi parallax mohon bantuannya" ("I want to ask why the jquery script does not work on all web applications, even though the codes are the same as the parallax material, please help.") | Sentiment = negative; Epistemic = applying
Reply: "Cek urutan tag script jquery harus lebih dulu dari bootstrap" ("Please check the order of the jquery script tags. Those must appear before the bootstrap.") | Sentiment = neutral; Epistemic = analyzing
Reply: "Baik dicoba" ("Noted, I will try the method.") | Sentiment = positive; Epistemic = analyzing
Main: "Terima kasih tutornya" ("Thank you for the tutorial.") | Sentiment = positive; Epistemic = applying
Reply: "Sama-sama, semoga bermanfaat" ("You're welcome, I hope it will be helpful.") | Sentiment = neutral; Epistemic = evaluating
Further statistics of the training data are presented in Table 4. These statistics will be important to
analyze whether or not we need to balance the distribution of classes. Based on the facts in Table 4, the
synthetic minority oversampling technique (SMOTE) algorithm is applied to deal with imbalanced data
during training. In total, our dataset consists of 4,396 chats to train and validate the models, and 100 chats to
test models. During the training phase, we split the dataset into a hold-out composition of 70% train and 30%
validation. We choose a 5-fold cross-validation (CV) setting as a baseline and compare the performance to
the trained models. For the final evaluation, we collect a new dataset taken from the discussion forum in our
internal CLS. There are 100 chats randomly chosen from the undergraduate ‘Introduction of Web
Programming’ informatics subject during the even semester of the 2020/2021 academic year.
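To give a concrete sense of how SMOTE-style oversampling balances the minority classes, here is a simplified, self-contained interpolation sketch (the paper presumably used a library implementation such as imbalanced-learn's SMOTE; the two-feature data below is invented):

```python
import numpy as np

def smote_like_oversample(X_min: np.ndarray, n_new: int, rng=None) -> np.ndarray:
    # Simplified SMOTE: synthesize points by interpolating between a minority
    # sample and its nearest minority neighbor (illustration only)
    rng = np.random.default_rng(0) if rng is None else rng
    synthetic = []
    for _ in range(n_new):
        i = int(rng.integers(len(X_min)))
        # Nearest neighbor of X_min[i] among the other minority samples
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        d[i] = np.inf
        j = int(np.argmin(d))
        gap = rng.random()  # random point on the segment between the two samples
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_min = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # toy minority class
X_new = smote_like_oversample(X_min, n_new=5)
```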
Table 4. Training data distribution (total = 4,396 chats for each task)
Sentiment: Positive 1,742 (39.63%); Neutral 2,332 (53.05%); Negative 322 (7.32%)
Bloom: Remember 36 (0.82%); Understand 2,688 (61.15%); Apply 1,599 (36.37%); Analyze 24 (0.55%); Evaluate 24 (0.55%); Create 25 (0.57%)
3.1. Machine-learning performance
Several machine-learning experiments are carried out in this research: a 5-fold CV as the baseline, a
hold-out of 70% training and 30% validation scenario, and a testing scenario from our internal discussion
forum. The CV scenario is an acceptable method to tune and predict a global expectation of the model
performance since there will be some training-testing iterations with randomly chosen instances from the
dataset. The result of the CV experiment can be seen in Table 5.
Based on the baseline results of the 5-fold CV experiments in Table 5, the accuracy of the RF models is statistically significantly better than that of the LSTM models (p=0.05). Typically, an RF can trace the order of terms' occurrences; thus, RF models determine the order of important terms and assign them to the appropriate class. In addition, an LSTM model has an element of randomness because of its artificial neural network characteristics and is therefore not always capable of tracing the order of occurrences of terms.
Table 5. The baseline: 5-fold cross-validation accuracy performance (%)
Iteration | Sentiment RF | Sentiment LSTM | Epistemic RF | Epistemic LSTM
1 | 85.9 | 84.7 | 81.7 | 81.2
2 | 83.0 | 82.9 | 80.4 | 76.9
3 | 82.9 | 81.7 | 82.4 | 79.7
4 | 84.0 | 81.9 | 83.4 | 81.7
5 | 84.6 | 82.3 | 81.4 | 78.2
Mean | 84.1 | 82.7 | 81.9 | 79.5
Std. dev. | 1.3 | 1.2 | 1.1 | 2.0
However, an LSTM model may be more flexible in general cases, as will be seen in the additional
test dataset. An excerpt of an RF decision tree can be seen in Figure 3. In this example, we can track how the
RF tree simulates the flow of a conversation. The branching of the tree is calculated by employing the Gini
information gain [24], as shown in Table 6.
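For reference, the Gini values in Table 6 follow the standard impurity definition for decision trees (general formula, not specific to this paper):

```latex
G(t) = 1 - \sum_{k} p_k^2,
\qquad
\Delta G(s, t) = G(t) - \frac{n_L}{n}\,G(t_L) - \frac{n_R}{n}\,G(t_R)
```

where \(p_k\) is the proportion of class \(k\) in node \(t\), and \(t_L\), \(t_R\) are the child nodes produced by split \(s\) with \(n_L\) and \(n_R\) samples out of \(n\). Values near zero, as in the right branch of Table 6, indicate nearly pure nodes.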
Figure 3. An RF Bloom decision tree excerpt for the ‘understanding’ classification
Table 6. An excerpt of the RF Bloom decision tree branch for the ‘understanding’ classification
Level 0, Root (Gini 0.161): "Ok trimakasih, ka" ("Ok, thank you, sister.")
Level 1, Left (Gini 0.160): "Terimakasih ilmunya membantu pemula" ("Thank you for the knowledge. It is very helpful for beginners.")
Level 2, Left (Gini 0.160): "Bagus banget channel pengen programming" ("It is a very nice channel. Motivate me to learn to program.")
Level 3, Left (Gini 0.159): "Bahasa pemrograman dibenci javascript ups lupa titik koma" ("I hate the javascript programming language. I always forget the semi-colon.")
Level 4, Left (Gini 0.152): "Bang maaf programmer butuhkan perusahaan" ("Excuse me, brother. As a programmer, I need a work placement.")
Level 5, Left (Gini 0.151): "Mantap banget bang pemula semoga ilmu bermanfaat" ("It is very cool, bro. Hopefully, the knowledge will be useful for beginners.")
Level 4, Right (Gini 0.078): "Mengerti" ("I understand.")
Level 5, Right (Gini 0.071): "Sehat dosen terima kasih ilmu penuh dedikasi" ("For all lecturers, thank you for your knowledge and dedication. Keep healthy.")
Further investigations during the experiments can be followed in Table 7. A hold-out composition of
70% training and 30% validation proportion is applied to the dataset. This table shows that each of the
‘sentiment only’ and ‘epistemic only’ models outperforms the multilabel approach. This suggests that each classification problem has specific characteristics that call for a dedicated model. Based on these results, we propose a hierarchical approach: the sentiment is classified first, and based on the predicted positive, negative, or neutral sentiment, a further epistemic classification will
be performed (Figure 1). The accuracy performance of the hierarchical classification method is comparable
to the specific classification of sentiment and epistemic. This suggests that a hierarchical classification would
be preferable in our case. The traversal from the root to the leaves can be followed in Table 6.
Table 7. Accuracy performance using 70%-30% hold-out validation scenarios (in %) of the proposed method
Methods | 5-CV RF | RF | 5-CV LSTM | LSTM
Sentiment only | 84.1 +/- 1.3 | 85.70 | 81.2 +/- 1.2 | 82.30
Epistemic only | 82.7 +/- 1.1 | 82.10 | 76.4 +/- 1.1 | 77.30
Multilabel | 62.1 +/- 1.1 | 63.20 | 54.3 +/- 1.1 | 53.40
Two-step hierarchical | 83.1 +/- 1.2 | 84.35 | 78.2 +/- 1.2 | 79.35
The next experiment involves testing the models with another equivalent dataset from our in-house
CLS. The dataset consists of 100 randomly selected chats out of 14 regular course sessions. Comparable to
the external dataset (YouTube), the chats within the internal dataset are mostly manually classified as understanding (58%) and applying (42%). This implies that internal chats also tend to correspond to Bloom's taxonomy levels typical of undergraduate studies. The objective of this experiment is to evaluate
how convergent the general (i.e., publicly available) discussion forums are compared to the internal one. The
result can be seen in Table 8.
From Table 8, it can be deduced that the performance on the test dataset decreased significantly. This is caused by unseen terms in the internal dataset: internal chats tend to contain more formal greetings and questions regarding specific tasks or assignments. Another aspect is that the internal chats reflect high student engagement; while some students are more active than others, they tend to be more supportive and participatory.
Table 8. Accuracy performance on the internal forum discussion test dataset
Methods | RF (%) | LSTM (%)
Sentiment only | 67 | 53
Epistemic only | 45 | 55
Hierarchical | 62 | 49
The LSTM models are trained by using the default Keras library and hyper-parameters. We optimize some of the hyper-parameters with a random search strategy; our focus is to find the optimal number of training epochs. During modeling, a sequential bidirectional LSTM approach is performed [18].
The first layer of the LSTM model is the word embedding layer which uses a 32-length vector
composed by using the GloVe 500 words dimensionality reduction algorithm [5]. The next layer is the LSTM
layer which has 100 neurons (each neuron is a single LSTM cell in Figure 2), with two hidden bidirectional
layers. These layers serve for modeling memory cells. After that, the dense layer is constructed as an output
layer with a sigmoid function. This sigmoid function is used to provide final rating labels. The LSTM
architecture is shown in Figure 4.
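The described architecture can be reconstructed approximately in Keras as follows (VOCAB_SIZE, SEQ_LEN, and NUM_CLASSES are assumptions, since the paper does not report them; only the 32-dimension embedding, the 100-unit bidirectional LSTM layers, and the sigmoid output follow the text):

```python
import numpy as np
from tensorflow.keras import layers, models

VOCAB_SIZE, SEQ_LEN, NUM_CLASSES = 5000, 50, 3  # placeholders

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 32),                               # word embedding layer
    layers.Bidirectional(layers.LSTM(100, return_sequences=True)),  # first hidden bidirectional layer
    layers.Bidirectional(layers.LSTM(100)),                         # second hidden bidirectional layer
    layers.Dense(NUM_CLASSES, activation="sigmoid"),                # final rating labels
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
probs = model.predict(np.zeros((1, SEQ_LEN), dtype="int32"), verbose=0)
```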
The training-validation curves during the LSTM modeling phase are illustrated in Figures 5 and 6.
Figure 5 shows the excerpt of LSTM’s sentiment analysis during training validation and Figure 6 shows the
epistemic performance of the LSTM. The number of optimum training epochs during sentiment analysis
training is six, and seven for the epistemic analysis. The performance of the test-validation dataset is lower
than the accuracy of the training. By further evaluation, the models seem to suffer from overfitting. This fact
suggests that LSTM models are not general enough to catch variations of words occurring in chat histories.
Based on the results in Figures 5 and 6, a deeper experiment that varies the length of the sentences in the forums would be important. This would help establish the robustness of the number of time steps, i.e., the length of the sentiment sentences in our case, when forming LSTM models [25]. Another
possibility to improve the LSTM models is to identify specific verbs according to Bloom’s taxonomy from an
extensive lexicon as guidance during classification in an encode-decoder scenario [29].
Figure 4. The overall architecture of the LSTM training model
Figure 5. Sentiment analysis LSTM training-
validation curve
Figure 6. Epistemic analysis LSTM training-
validation curve
3.2. Classification ambiguity
In this subsection, we further analyze the epistemic model performance by using the test dataset,
which includes 100 chats from our internal CLS. The dataset is mainly annotated in the understanding (58%)
and applying (42%) epistemic class. In our opinion, these two classes will also reflect the competencies of
undergraduate students in general, and thus it is worth exploring how the RF and LSTM performed on the
dataset. The accuracy performance of these two models can be seen in the confusion matrix in Table 9.
An interesting fact in Table 9 is that the RF model fits the ‘understanding’ class, while the LSTM
model fits the ‘applying’ class. The false-positive rate of the RF model is statistically significantly higher
than that of the LSTM model. In our observation, this implies that the LSTM has more ability to catch the
randomness of word occurrences [24]. On the other hand, the RF model has more ability to predict the
occurrences of some regularities in word occurrences [23]. This fact can be inferred from the first row in Table 9, which shows that more cases are classified correctly. Some examples that show the cases of
randomness in epistemic analysis and the regularity of sentiment analysis can be seen in Table 10.
Table 9. Confusion matrix of RF and LSTM accuracy in the understanding and applying classes (%)
RF | Understanding | Applying
Understanding | 36 | 22
Applying | 33 | 9
LSTM | Understanding | Applying
Understanding | 32 | 26
Applying | 19 | 23
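As a consistency check, the RF counts in Table 9 (read as rows = annotated class, columns = predicted class, out of 100 test chats) reproduce the 45% epistemic-only RF accuracy reported in Table 8, e.g. with scikit-learn:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Label vectors reconstructed from the RF cell counts in Table 9
y_true = ["understanding"] * 58 + ["applying"] * 42
y_pred = (["understanding"] * 36 + ["applying"] * 22     # true understanding (58)
          + ["understanding"] * 33 + ["applying"] * 9)   # true applying (42)

cm = confusion_matrix(y_true, y_pred, labels=["understanding", "applying"])
acc = accuracy_score(y_true, y_pred)  # (36 + 9) / 100 = 0.45
```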
Table 10 shows the examples of specific words that indicate their occurrences in specific sentiment
analysis classes. On the other hand, the same words could be classified into different classes of epistemic
analysis. Based on this observation, the LSTM model might be enhanced by first identifying important verbs
to indicate a specific level in Bloom's taxonomy [29]. An example of model integration in a real-case scenario is depicted in Figure 7. The sentences in the yellow boxes of Figure 7 are the English translations of the discussion interactions. The predicted sentiment labels are: netral = neutral; negatif = negative; positif = positive. The predicted Bloom labels are: app = applying; rem = remembering; und = understanding.
Table 10. Examples of word randomness and regularity occurrences in epistemic and sentiment analysis
Formal greetings: "Selamat pagi/siang/sore", "Terima kasih" ("Good morning/afternoon/evening.", "Thank you.") | Sentiment: positive | Epistemic: understanding and applying
Negation: "Bukan hal itu", "Tidak bisa" ("Not that kind of thing.", "It cannot be done.") | Sentiment: negative | Epistemic: understanding
Technological terms: "Unduh", "Alat edit teks" ("To download.", "Text editor.") | Sentiment: neutral | Epistemic: understanding and applying
Figure 7. Subsystem response according to the chat history in a discussion forum
3.3. Subsystem integration
Based on the RF and LSTM models that have been deployed, a new subsystem is constructed to
improve an engagement-based learning management system (LMS) [30]. We integrated the RF model to
predict the sentiment analysis, and the LSTM model to predict the epistemic. Following this enhancement,
the lecturers can analyze the discussion forum in a deeper sense. With this functionality, they might estimate
how far students have progressed within a session. Figure 7 shows an example of how the subsystem
responded in a discussion forum. In the near future, we also plan to improve the LSTM model for epistemic
analysis by incorporating word class identification, i.e., verb classification in Bloom’s taxonomy.
4. CONCLUSION
In this paper, the researchers have demonstrated the process of developing machine-learning models
for epistemic and sentiment analysis. Two machine-learning algorithms were applied (RF
and LSTM). To combine the two algorithms, a two-stage classification concept is used: the first stage
addresses the sentiment aspect and the second the epistemic aspect. The observations showed that the RF model
tends to be more appropriate for analyzing sentiment because it can capture the occurrence of regular words.
In contrast, the LSTM model is better suited to predicting epistemic classes, whose indicative
words tend to occur more randomly. The trained models have successfully been implemented in an
engagement-based LMS to help lecturers observe students' learning progress through the conversational
history in discussion forums.
ACKNOWLEDGEMENTS
The research presented in this paper was supported by the Research Institute and Community
Service (LPPM) at Maranatha Christian University, Bandung, Indonesia.
REFERENCES
[1] I. Galikyan, W. Admiraal, and L. Kester, “MOOC discussion forums: The interplay of the cognitive and the social,” Computers &
Education, vol. 165, p. 104133, May 2021, doi: 10.1016/j.compedu.2021.104133.
[2] Z. Liu et al., “Exploring the Relationship Between Social Interaction, Cognitive Processing and Learning Achievements in a
MOOC Discussion Forum,” Journal of Educational Computing Research, vol. 60, no. 1, pp. 132–169, Mar. 2022, doi:
10.1177/07356331211027300.
[3] W. Zou, X. Hu, Z. Pan, C. Li, Y. Cai, and M. Liu, “Exploring the relationship between social presence and learners’ prestige in
MOOC discussion forums using automated content analysis and social network analysis,” Computers in Human Behavior,
vol. 115, p. 106582, Feb. 2021, doi: 10.1016/j.chb.2020.106582.
[4] H. Toba, M. Ayub, M. C. Wijanto, R. Parsaoran, and A. Sani, “Groups Allocation Based on Sentiment-Epistemic Analysis in
Online Learning Environment,” in 2021 International Conference on Data and Software Engineering (ICoDSE), IEEE, Nov.
2021, pp. 1–6. doi: 10.1109/ICoDSE53690.2021.9648426.
[5] A. Onan and M. A. Toçoğlu, “Weighted word embeddings and clustering‐based identification of question topics in MOOC
discussion forum posts,” Computer Applications in Engineering Education, vol. 29, no. 4, pp. 675–689, Jul. 2021, doi:
10.1002/cae.22252.
[6] O. Grljević, Z. Bošnjak, and A. Kovačević, “Opinion mining in higher education: a corpus-based approach,” Enterprise
Information Systems, vol. 16, no. 5, May 2022, doi: 10.1080/17517575.2020.1773542.
[7] M. Misuraca, G. Scepi, and M. Spano, “Using Opinion Mining as an educational analytic: An integrated strategy for the analysis
of students’ feedback,” Studies in Educational Evaluation, vol. 68, p. 100979, Mar. 2021, doi: 10.1016/j.stueduc.2021.100979.
[8] N. Kaliappen, W. N. A. Ismail, A. B. A. Ghani, and D. Sulisworo, “Wizer.me and Socrative as innovative teaching method tools:
Integrating TPACK and Social Learning Theory,” International Journal of Evaluation and Research in Education (IJERE),
vol. 10, no. 3, p. 1028, Sep. 2021, doi: 10.11591/ijere.v10i3.21744.
[9] A. Krouska, C. Troussas, and M. Virvou, “Computerized Adaptive Assessment Using Accumulative Learning Activities Based on
Revised Bloom’s Taxonomy,” in Knowledge-Based Software Engineering: 2018. JCKBSE 2018. Smart Innovation, Systems and
Technologies, vol. 108. Springer, Cham, 2019, pp. 252–258. doi: 10.1007/978-3-319-97679-2_26.
[10] N. Barari, M. RezaeiZadeh, A. Khorasani, and F. Alami, “Designing and validating educational standards for E-teaching in virtual
learning environments (VLEs), based on revised Bloom’s taxonomy,” Interactive Learning Environments, vol. 30, no. 9,
pp. 1640–1652, Oct. 2022, doi: 10.1080/10494820.2020.1739078.
[11] V. Echeverría, J. C. Gomez, and M. F. Moens, “Automatic labeling of forums using Bloom’s taxonomy,” in Lecture Notes in
Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8346
LNAI, no. PART 1, 2013, pp. 517–528. doi: 10.1007/978-3-642-53914-5_44.
[12] J. S. Wong, B. Pursel, A. Divinsky, and B. J. Jansen, “Analyzing MOOC discussion forum messages to identify cognitive learning
information exchanges,” in Proceedings of the Association for Information Science and Technology, 2015, pp. 1–10. doi:
10.1002/pra2.2015.145052010023.
[13] T. O’Riordan, D. E. Millard, and J. Schulz, “Is critical thinking happening? Testing content analysis schemes applied to MOOC
discussion forums,” Computer Applications in Engineering Education, vol. 29, no. 4, pp. 690–709, 2021, doi: 10.1002/cae.22314.
[14] S. Pradha, M. N. Halgamuge, and N. Tran Quoc Vinh, “Effective text data preprocessing technique for sentiment analysis in
social media data,” Proceedings of 2019 11th International Conference on Knowledge and Systems Engineering, KSE 2019, 2019,
doi: 10.1109/KSE.2019.8919368.
[15] R. Meissner, D. Jenatschke, and A. Thor, “Evaluation of Approaches for Automatic E-Assessment Item Annotation with Levels
of Bloom’s Taxonomy,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and
Lecture Notes in Bioinformatics), vol. 12511 LNCS, pp. 57–69, 2021, doi: 10.1007/978-3-030-66906-5_6.
[16] A. S. de Oliveira and L. N. Lenartovicz, “School Evaluation: A Possible Dialogue between the History and Epistemology,” Open
Journal of Social Sciences, vol. 06, no. 08, pp. 179–189, 2018, doi: 10.4236/jss.2018.68014.
[17] H. Toba, Z.-Y. Ming, M. Adriani, and T.-S. Chua, “Discovering high quality answers in community question answering archives
using a hierarchy of classifiers,” Information Sciences, vol. 261, pp. 101–115, Mar. 2014, doi: 10.1016/j.ins.2013.10.030.
[18] T. U. Tran, H. T. T. Hoang, P. H. Dang, and M. Riveill, “Toward a multitask aspect-based sentiment analysis model using deep
learning,” IAES International Journal of Artificial Intelligence (IJ-AI), vol. 11, no. 2, pp. 516–524, Jun. 2022, doi:
10.11591/ijai.v11.i2.pp516-524.
[19] M. Maree, M. Eleyat, S. Rabayah, and M. Belkhatir, “A hybrid composite features based sentence level sentiment analyzer,” IAES
International Journal of Artificial Intelligence (IJ-AI), vol. 12, no. 1, pp. 284–294, Mar. 2023, doi: 10.11591/ijai.v12.i1.pp284-294.
[20] F. Iqbal et al., “A Hybrid Framework for Sentiment Analysis Using Genetic Algorithm Based Feature Reduction,” IEEE Access,
vol. 7, pp. 14637–14652, 2019, doi: 10.1109/ACCESS.2019.2892852.
[21] B. S. Bloom, “Changes in evaluation methods,” in Research and development and school change, Routledge, 2020. [Online].
Available: https://www.taylorfrancis.com/chapters/edit/10.4324/9780203781555-6/changes-evaluation-methods-benjamin-bloom
[22] M. A. Aripin, R. Hamzah, P. Setya, M. H. M. Hisham, and M. I. Mohd Ishar, “Unveiling a new taxonomy in education field,”
International Journal of Evaluation and Research in Education (IJERE), vol. 9, no. 3, pp. 524–530, Sep. 2020, doi:
10.11591/ijere.v9i3.20458.
[23] P. Karthika, R. Murugeswari, and R. Manoranjithem, “Sentiment Analysis of Social Media Network Using Random Forest
Algorithm,” in 2019 IEEE International Conference on Intelligent Techniques in Control, Optimization and Signal Processing
(INCOS), IEEE, Apr. 2019, pp. 1–5. doi: 10.1109/INCOS45849.2019.8951367.
[24] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. 1997,
doi: 10.1162/neco.1997.9.8.1735.
[25] Z. Jin, Y. Yang, and Y. Liu, “Stock closing price prediction based on sentiment analysis and LSTM,” Neural Computing and
Applications, vol. 32, no. 13, pp. 9713–9729, Jul. 2020, doi: 10.1007/s00521-019-04504-2.
[26] P. Sivakumar and J. Ekanayake, “Predicting ratings of YouTube videos based on the user comments,” Preprint. 2021. [Online].
Available: http://drr.vau.ac.lk/handle/123456789/205 (accessed: Nov. 09, 2022).
[27] “YouTube Data API,” Google Developers. [Online]. Available: https://developers.google.com/youtube/v3 (accessed: Nov. 09,
2022).
[28] “Web Programming UNPAS - YouTube.” [Online]. Available: https://www.youtube.com/c/WebProgrammingUNPAS/about
(accessed: Nov. 09, 2022).
[29] B. D. Wijanarko, Y. Heryadi, H. Toba, and W. Budiharto, “Automated question generating method based on derived keyphrase
structures from bloom’s taxonomy,” ICIC Express Letters, vol. 14, no. 11, pp. 1059–1067, 2020, doi: 10.24507/icicel.14.11.1059.
[30] M. Ayub, H. Toba, M. C. Wijanto, R. Parsaoran, A. Sani, and Y. T. Hernita, “The impact of developing a blended learning sub-
system on students’ online learning engagement,” Journal of Technology and Science Education, vol. 11, no. 2, pp. 556–568,
2021, doi: 10.3926/jotse.1196.
BIOGRAPHIES OF AUTHORS
Hapnes Toba graduated in 2002 with a Master of Science from the Delft
University of Technology in the Netherlands and completed his doctoral degree in computer
science at Universitas Indonesia in 2015. He is an associate professor in the area of artificial
intelligence and is interested in information retrieval, natural language processing,
educational data mining, and computer vision. He has been a faculty member in the Faculty
of Information Technology at Maranatha Christian University since 2003. He is also an
active board member of the Indonesian Computational Language Association (INACL) and
serves as the chair of the Information and Communication Technology Forum of the
Association of Christian Universities and Colleges in Indonesia. He can be contacted by
email at: hapnestoba@it.maranatha.edu.
Yolanda Trixie Hernita is an alumna of Maranatha Christian University. She
majored in Informatics Engineering and is very interested in web programming and data
analytics. Apart from the academic field, she also joined the Voice of Maranatha (VOM)
organization on campus from 2018 to 2019. She currently serves as a data
analyst at a national banking organization. She can be contacted by email at:
1872045@maranatha.ac.id.
Mewati Ayub graduated with a Bachelor of Informatics from Bandung
Institute of Technology (ITB) in 1986 and completed her master’s degree at Bandung
Institute of Technology in 1996, and her doctoral degree at Bandung Institute of Technology
in 2006. She has been working as a faculty member in the Faculty of Information
Technology at Maranatha Christian University since 2006. Her specialty is in the field of
educational technology, software engineering, and data mining. She can be contacted by
email at: mewati.ayub@it.maranatha.edu.
Maresha Caroline Wijanto is an alumna of the Faculty of Information
Technology, Maranatha Christian University, and graduated from the Bandung Institute of
Technology (ITB) with her Master's degree in Computer Science. She joined the Faculty of
Information Technology at Maranatha Christian University in 2010. Her specialty is in the
field of Natural Language Processing, Machine Learning, and Data Mining. She can be
contacted by email at: maresha.cw@it.maranatha.edu.