Establishing a Formal Benchmarking Process for Sentiment Analysis for the Bangla Language
AKM Shahariar Azad Rabby¹, Aminul Islam¹, and Fuad Rahman²
¹ Apurba Technologies, Dhaka, Bangladesh
{rabby,aminul}@apurbatech.com
² Apurba Technologies, Sunnyvale, CA, USA
fuad@apurbatech.com
Abstract. Tracking sentiments is a critical task in many natural language processing applications. A lot of work has been done on many leading languages in the world, such as English. However, in many languages, such as Bangla, sentiment analysis is still in early development. Most of the research on this topic suffers from three key issues: (a) the lack of standardized publicly available datasets, (b) the subjectivity of the reported results, which generally manifests as a lack of agreement on core sentiment categorizations, and finally, (c) the lack of an established framework where these efforts can be compared to a formal benchmark. Thus, this seems to be an opportune moment to establish a benchmark for sentiment analysis in Bangla. With that goal in mind, this paper presents benchmark results of ten different sentiment analysis solutions on three publicly available Bangla sentiment analysis corpora. As part of the benchmarking process, we have optimized these algorithms for the task at hand. Finally, we establish and present sixteen different evaluation metrics for benchmarking these algorithms. We hope that this paper will jumpstart an open and transparent benchmarking process, one that we plan to update every two years, to help validate newer and novel algorithms that will be reported in this area in the future.
Keywords: Sentiment analysis · NLP · Bangla sentiment corpus · Annotation · Benchmarking
1 Introduction
The explosion of information technology, especially the use of social media, has
resulted in a vast amount of content that is thrown at human beings at any given
moment. A lot of this content is tied to social, political, and economic interests,
publishers of all of which have a vested interest in tracking whether the audience likes
the content or not. For instance, data-driven trend analysis is an essential part of
modern politics and advertising. Less dramatic, but equally critical applications of
sentiment analysis are customer reviews on online shopping sites or opinion mining on
newspapers to gauge public sentiment on national security issues, just to name a few.
© Springer Nature Switzerland AG 2021
K. Arai et al. (Eds.): FTC 2020, AISC 1289, pp. 428–448, 2021.
https://doi.org/10.1007/978-3-030-63089-8_28
Bangla is spoken as the first language by almost 200 million people worldwide, 160 million of whom hold Bangladeshi citizenship. But Natural Language Processing (NLP) development for the Bangla language is in its very early stages, and there is not yet enough labeled data to work with for the language. Because of this scarcity of labeled data and standardized corpora, little work has been reported in this space.
Recently, a sentiment analysis corpus of about 10,000 sentences was made public by Apurba Technologies [1]. We searched and located two additional, albeit smaller, open-sourced datasets in this space [2]. We built ten different sentiment analysis algorithms using Machine Learning (ML), statistical modeling, and other methods. This paper benchmarks these ten algorithms on the above-mentioned three annotated corpora.
The paper is arranged as follows. We begin by reviewing the existing state of the art of sentiment analysis in Bangla (which, as stated already, is not very rich), and the principal issue that becomes crystal clear is that whatever efforts have been reported on this topic, it is absolutely impossible to compare them, since they use different datasets and the datasets reported are almost never available to other researchers. As a natural segue from this topic, we then present how we combined all the possible sources of sentiment corpora available publicly and built a large dataset. We then move to designing 14 different metrics that form the benchmarking framework. We then describe 10 different sentiment analysis algorithms that have been reported in the literature. Although this list is not exhaustive in any sense, it does cover the majority of the work ever reported in this space. We not only implemented these algorithms, we also fine-tuned the parameters for optimizing each of these solutions. Finally, these 10 algorithms were benchmarked by the 14 different metrics identified earlier. The paper ends with a discussion on the reported work.
2 Brief Background
There are three classification levels in sentiment analysis: document-level, sentence-level, and aspect-level. At the document level, overall sentiment is assessed based on the complete text. Sentence-level analysis aims to classify the sentiment expressed in each sentence. The first step is to identify whether the sentence is subjective or objective. If the sentence is subjective, sentence-level analysis will determine whether the sentence expresses positive or negative opinions [3]. In aspect-based sentiment analysis, sentiments are assessed on aspects or points of view of a topic, especially with multi-clausal sentences. For the rest of this paper, we will exclusively focus on sentence-level sentiment analysis.
Machine learning techniques for sentiment analysis are getting better, especially vector representation models, some of which can extract semantics that help to understand the intent of a message [4]. Many machine learning and deep learning techniques have been reported for identifying and classifying sentiment polarity in a document or sentence. Existing research demonstrates that Long Short-Term Memory networks (LSTMs) are capable of learning the context and inherent meaning of a word and provide more accurate results for sentiments [5]. Classification algorithms such as Random Forest, the Decision Tree Classifier, and the k-nearest neighbors (KNN) algorithm are suitable for classification based on feature sets. Naive Bayes works based on Bayes' theorem of a probability distribution. Convolutional Neural Networks (CNNs), a commonly used tool in deep learning, work well for sentiment analysis, as their standard architecture can map sentences of variable length into fixed-size scattered vectors [6].¹

¹ Recently, many pre-trained language models such as BERT [29], ELMo [30], and XLNet have been reported to achieve promising results on several NLP tasks, including sentiment analysis. However, these models mainly target the English language, not Bangla.
Table 1. Bangla sentiment analysis - previous work

Paper title | Authors | Year | Method | Dataset | Size | Accuracy | Availability
Sentiment Analysis of Bengali Comments with Word2Vec and Sentiment Information of Words [7] | Md. Al-Amin, Md. Saiful Islam, Shapan Das Uzzal | 2017 | word2vec and sentiment extraction of words | Self-collected comments data | 15,000 comments | 75.5% | Not publicly available
Performing Sentiment Analysis in Bangla Microblog Posts [8] | Shaika Chowdhury, Wasifa Chowdhury | 2014 | Support Vector Machine (SVM) and Maximum Entropy (MaxEnt) | Bangla tweets | 1,300 tweets | SVM 88%, MaxEnt 88% | Not publicly available
Sentiment Analysis for Bengali Newspaper Headlines [9] | Mohammad Samman Hossain, Israt Jahan Jui, Afia Zahin Suzana | 2017 | Support Vector Machine, Logistic Regression, etc. | Self-collected news headline data | 15,325 headlines | LR 75.91%, SVM 79.56%, Tree 76.64% | Not publicly available
Sentiment Analysis on Bangla and Romanized Bangla Text (BRBT) using Deep Recurrent Models [10] | Asif Hassan, Mohammad Rashedul Amin, Abul Kalam Al Azad, Nabeel Mohammed | 2016 | LSTM, using two types of loss functions: binary cross-entropy and categorical cross-entropy | Self-collected Bangla web crawl | 10,000 Bangla text samples | 78% | Not publicly available
Exploring Word Embedding for Bangla Sentiment Analysis [11] | Sakhawat Hosain Sumit, Md. Zakir Hossan, Tareq Al Muntasir, Tanvir Sourov | 2018 | Word embedding methods (Word2vec Skip-Gram and Continuous Bag of Words) with an additional Word-to-Index model for SA in the Bangla language | Bangla Sentiment Dataset | 1,899,094 sentences, 23,506,262 words, 394,297 unique words | 83.79% | Not publicly available
Sentiment Analysis of Bangla Microblogs Using Adaptive Neuro Fuzzy System [12] | Md. Asimuzzaman, Pinku Deb Nath, Farah Hossain, Asif Hossain, Rashedur M. Rahman | 2017 | Fuzzy rules to represent semantic rules that are simple but greatly influence the actual polarity of the sentences | Bangla tweets collected using Twitter APIs | NA | 0.0529 (MSE) | Not publicly available
An Automated System of Sentiment Analysis from Bangla Text using Supervised Learning Techniques [13] | Rashedul Amin Tuhin, Bechitra Kumar Paul, Faria Nawrine, Mahbuba Akter, Amit Kumar Das | 2019 | Naïve Bayes classification algorithm and a topical approach to extract the emotion | Self-collected | 7,500 Bangla sentences | Above 90% | Not publicly available
Extracting Severe Negative Sentence Pattern from Bangla Data via Long Short-term Memory Neural Network [14] | Abdul Hasib Uddin, Sumit Kumar Dam, Abu Shamim Mohammad Arif | 2019 | Long Short-Term Memory (LSTM) neural networks for analyzing negative sentences in Bangla | Dataset from Hassan, Asif, et al. | 9,337 posts | 84.4% | Not publicly available
Design an Empirical Framework for Sentiment Analysis from Bangla Text using Machine Learning [15] | Nusrath Tabassum, Muhammad Ibrahim Khan | 2019 | Random Forest classifier to classify sentiments | Self-collected | 1,050 Bangla texts | 87% | Not publicly available
Sentiment Analysis for Bangla Sentences using Convolutional Neural Network [16] | Md. Habibul Alam, Md. Mizanur Rahoman, Md. Abul Kalam Azad | 2017 | Model generated by a neural network variant called a Convolutional Neural Network | Self-collected | 850 Bangla comments from different sources | 99.87% | Not publicly available
Sentiment Mining from Bangla Data using Mutual Information [17] | Animesh Kumar Paul, Pintu Chandra Shill | 2016 | Mutual Information (MI) for the feature selection process and Multinomial Naive Bayes (MNB) for the classification | Generated from Amazon's Watches English dataset | 68,356 translated reviews | 88.54% | Not publicly available
Detecting Multilabel Sentiment and Emotions from Bangla YouTube Comments [18] | Nafis Irtiza Tripto, Mohammed Eunus Ali | 2018 | Deep learning based models to classify a Bangla sentence with a three-class scheme | Self-collected YouTube comments | 15,689 YouTube comments | 65.97% (three labels), 54.24% (five labels) | Not publicly available
Detecting Sentiment from Bangla Text using Machine Learning Technique and Feature Analysis [19] | Muhammad Mahmudun Nabi, Md. Altaf, Sabir Ismail | 2016 | Tf-idf used to extract different features and give more accurate results | Collected from various social sites | 1,500 short Bangla comments | 83% | Not publicly available
N-Gram Based Sentiment Mining for Bangla Text Using Support Vector Machine [20] | SM Abu Taher, Kazi Afsana Akhter, K.M. Azharul Hasan | 2018 | One vector containing more than one word, using N-grams | Collected from different sources | 9,500 comments | 89.271% | Not publicly available
Sentiment Analysis of Bangla Song Review - A Lexicon Based Backtracking Approach [21] | Tapasy Rabeya, Narayan Ranjan Chakraborty, Sanjida Ferdous, Manoranjan Dash, Ahmed Al Marouf | 2019 | A backtracking algorithm whose heart is a sentiment lexicon | Collected from YouTube | 201 comments | 70% | Not publicly available
Sentiment Extraction from Bangla Text: A Character Level Supervised Recurrent Neural Network Approach [22] | Mohammad Salman Haydar, Mustakim Al Helal, Syed Akhter Hossain | 2018 | Represents Bangla sentences based on characters and extracts information from the characters using an RNN | Collected from Facebook using the Facebook Graph API | 45,000 | 80% | Not publicly available
Sentiment Analysis on the Facebook Group using Lexicon-Based Approach [23] | Sanjida Akter, Muhammad Tareq Aziz | 2016 | Naïve Bayes and a dictionary-based approach used for lexicon-based sentiment analysis | Collected from a Facebook group | 9,000 words | 73% | Not publicly available
Sentiment Analysis of Bengali Texts on Online Restaurant Reviews Using Multinomial Naïve Bayes [24] | Omar Sharif, Mohammed Moshiul Hoque, Eftekhar Hossain | 2019 | Multinomial Naive Bayes used for sentiment analysis | Self-collected | 1,000 restaurant reviews | 80.48% | Not publicly available
Table 1 shows the state of the art of Bangla sentiment analysis research.
One observation that is painfully plain in this table is that all of the authors of these papers spent valuable time building and annotating their own datasets. What is even more alarming is that none of these datasets was then made publicly available. This has made it impossible to compare the validity and relative strengths or weaknesses of any of these solutions, making the task of establishing a benchmark framework impossible.
3 Dataset
In this research, we used three different datasets. The first dataset is our own, which we previously published [1]; it represents the largest open-access sentiment analysis dataset for Bangla, with 9,630 samples. The second is the ABSA Sports dataset [2], with 2,979 samples. The third and final dataset [2] is the ABSA Restaurant dataset, with 2,059 samples. All datasets have three sentiment categorizations: positive, negative, and neutral. For simplicity, we excluded all of the neutral data from our datasets. After eliminating the neutral samples, the Apurba, ABSA Sports, and ABSA Restaurant datasets have 7,293, 2,718, and 1,808 positive and negative samples, respectively.
The proposed benchmarking system has four stages: data collection, data pre-processing, training, and evaluation.
3.1 Dataset Collection
The Apurba dataset was collected from a popular online news portal, Prothom Alo, tagged manually, and checked twice for validation. The dataset is open-source for all types of non-commercial usage, intended for educational and research use. The other two datasets can easily be obtained from GitHub. We also merged these three datasets into a mixed dataset.
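To make the merging step concrete, the following is a minimal sketch of how the three corpora could be combined into the mixed dataset; the file names and the "text"/"label" column names are illustrative assumptions, not the corpora's actual schemas.

```python
# Minimal sketch of building the merged ("mixed") dataset; file names and the
# "text"/"label" column names are illustrative assumptions.
import pandas as pd

frames = []
for path in ["apurba.csv", "absa_sports.csv", "absa_restaurant.csv"]:
    df = pd.read_csv(path)
    df = df[df["label"] != "neutral"]                      # drop neutral samples
    df["label"] = df["label"].map({"positive": 0,          # positive class -> 0
                                   "negative": 1})         # negative class -> 1
    frames.append(df[["text", "label"]])

mixed = pd.concat(frames, ignore_index=True)               # the mixed dataset
print(mixed["label"].value_counts())
```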
3.2 Data Pre-processing
Data cannot be used as-is in most machine learning algorithms; it needs to be processed before anything else can be done.
In this research, we took the text and its annotated sentiment values. We excluded the neutral samples and represent the positive class with 0 and the negative class with 1. We removed all unnecessary characters, including punctuation, URLs, extra white space, emoticons, symbols, pictographs, transport and map symbols, iOS flags, digits, and 123 other characters. After all these steps, the preprocessed dataset looks as shown in Fig. 1.
Tokenization is the task of separating a given sentence into individual words, which are then known as tokens. Tokenizers accomplish this by locating word boundaries: the ending point of one word and the beginning of the next. We tokenize each sentence based on white space. The next step is removing stop-words, which are commonly used words (such as "a" or "and") that our algorithm ignores. Figure 2 shows a typical example of these steps.
We then prepare a term frequency-inverse document frequency (tf-idf) vectorization, which creates a sparse matrix containing a vector representation of our data. The tf-idf output is used as a weighting factor to measure how important a word is to a document in a given corpus.
We then split our data into two portions: 80% for training and 20% for testing model performance. Figure 3 shows the flowchart of these pre-processing steps.
Fig. 1. Processed dataset sample
Fig. 2. Pre-processing steps
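The sketch below illustrates the pre-processing pipeline just described (character cleaning, whitespace tokenization, stop-word removal, tf-idf vectorization, and the 80/20 split). The regular expressions and the tiny stop-word set are simplified placeholders rather than our exact rules; `mixed` refers to the merged DataFrame sketched in Sect. 3.1.

```python
# Simplified pre-processing sketch: cleaning, whitespace tokenization,
# stop-word removal, tf-idf vectorization, and an 80/20 train/test split.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

BANGLA_STOPWORDS = {"ও", "এবং", "কিন্তু"}                  # placeholder stop-word list

def clean(text):
    text = re.sub(r"http\S+", " ", text)                   # remove URLs
    text = re.sub(r"[^\u0980-\u09FF\s]", " ", text)        # keep Bangla characters only
    return re.sub(r"\s+", " ", text).strip()               # collapse extra white space

def tokenize(text):
    return [tok for tok in text.split() if tok not in BANGLA_STOPWORDS]

texts = [clean(t) for t in mixed["text"]]                  # `mixed` from the Sect. 3.1 sketch
labels = mixed["label"].values

vectorizer = TfidfVectorizer(tokenizer=tokenize, lowercase=False)
X = vectorizer.fit_transform(texts)                        # sparse tf-idf matrix
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42)             # 80% train / 20% test
```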
4 Benchmarking Indices
Sensitivity analysis is a model that determines how target variables are affected based
on changes in other variables known as input variables. This model, also referred to as
what-if or simulation analysis, is a way to predict the outcome of a decision given a
certain range of variables. By creating a given set of variables, an analyst can determine
how changes in one variable affect the outcome. We have used a set of universally standardized indices for validating the algorithms, including the Confusion Matrix (CM), True Positive Rate (TPR), True Negative Rate (TNR), False Negative Rate (FNR), False Positive Rate (FPR), Positive Predictive Value (PPV), Negative Predictive Value (NPV), False Discovery Rate (FDR), False Omission Rate (FOR), Accuracy (ACC), F1 Score, R2 Score, Receiver Operating Characteristic (ROC), and Area Under the Curve (AUC) [24–28].
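All of these indices can be derived from the four cells of the binary confusion matrix. As a worked example (not the exact evaluation code used here), the snippet below reproduces the Multinomial Naive Bayes figures for the Apurba dataset from Tables 2 and 3, reading matrix rows as actual classes and columns as predicted classes, with the class coded 1 treated as positive.

```python
# Deriving the benchmarking indices from a 2x2 confusion matrix.
# Values: Multinomial Naive Bayes on the Apurba test set (Table 2).
import numpy as np

cm = np.array([[342, 264],        # [TN, FP]
               [195, 658]])       # [FN, TP]
tn, fp, fn, tp = cm.ravel()

tpr = tp / (tp + fn)              # True Positive Rate (recall): 77.14
tnr = tn / (tn + fp)              # True Negative Rate: 56.44
fnr = fn / (fn + tp)              # False Negative Rate
fpr = fp / (fp + tn)              # False Positive Rate
ppv = tp / (tp + fp)              # Positive Predictive Value (precision): 71.37
npv = tn / (tn + fn)              # Negative Predictive Value: 63.69
fdr = fp / (fp + tp)              # False Discovery Rate
fomr = fn / (fn + tn)             # False Omission Rate
acc = (tp + tn) / cm.sum()        # Accuracy: 68.54
f1 = 2 * ppv * tpr / (ppv + tpr)  # F1 score: 74.14
print(f"ACC={acc:.2%}  TPR={tpr:.2%}  TNR={tnr:.2%}  F1={f1:.2%}")
```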
5 Sentiment Analysis Algorithms
We used ten different algorithms: Multinomial Naive Bayes, Bernoulli Naive Bayes, Logistic Regression, Random Forest, Decision Tree Classifier, K-Nearest Neighbors Classifier (KNN), Support Vector Machine (SVM), Ada-Boost Classifier, Extreme Gradient Boosting (XGBoost), and Long Short-Term Memory (LSTM). LSTM achieves the best performance among them. We used K-fold cross-validation and grid search to find the best parameters for all of our algorithms.
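As an illustration of this tuning step (a sketch, not our exact search script), the following runs a cross-validated grid search over the Naive Bayes smoothing parameter alpha on the tf-idf features; the fold count and grid values are assumptions.

```python
# Grid search with k-fold cross-validation over the smoothing parameter alpha.
# X_train / y_train come from the tf-idf split sketched in Sect. 3.2.
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB

param_grid = {"alpha": [round(0.1 * i, 1) for i in range(1, 11)]}   # 0.1 ... 1.0
search = GridSearchCV(MultinomialNB(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_)        # e.g. alpha = 0.9 turned out best (see Sect. 6.1)
print(f"mean CV accuracy: {search.best_score_:.4f}")
```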
5.1 Multinomial Naive Bayes
Multinomial Naive Bayes estimates the conditional probability of a particular word given a class as the relative frequency of term t in samples belonging to class c. Multinomial Naive Bayes simply assumes a multinomial distribution for all the pairs, which seems to be a reasonable assumption in some cases, especially for word counts in documents.
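In the standard formulation (with Lidstone/Laplace smoothing controlled by the alpha parameter tuned in Sect. 6.1), this estimate is

```latex
\hat{P}(t \mid c) \;=\; \frac{N_{tc} + \alpha}{\sum_{t' \in V} N_{t'c} + \alpha \, \lvert V \rvert},
```

where N_tc is the number of occurrences of term t in training samples of class c and V is the vocabulary.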
Fig. 3. Flowchart of the pre-processing steps
5.2 Bernoulli Naive Bayes
The Bernoulli Naive Bayes classifier assumes that all our features are binary, that is, they take only two values. This is similar to Multinomial Naive Bayes, but the predictors are Boolean variables. The parameters that we use to predict the class variable take only two values, yes or no; for example, whether a word occurs in the text or not.
5.3 Logistic Regression
Logistic Regression is the primary statistical method for modeling a binary dependent variable. In this technique, models try to find the probability of each class. Logistic Regression is an ML classification algorithm that is used to predict the probability of a categorical dependent variable. In logistic regression, the dependent variable is a binary variable that contains data coded as either 1 (yes, success, etc.) or 0 (no, failure, etc.). In other words, the logistic regression model predicts P(Y = 1) as a function of X.
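In its standard form, with feature vector x, weights w, and bias b, the model is

```latex
P(Y = 1 \mid \mathbf{x}) \;=\; \sigma(\mathbf{w}^{\top}\mathbf{x} + b) \;=\; \frac{1}{1 + e^{-(\mathbf{w}^{\top}\mathbf{x} + b)}}.
```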
5.4 Random Forest
A forest usually consists of many trees; in a random forest, a large number of individual decision trees operate as an ensemble. Every decision tree gives its vote to a particular class, and the class that gets the most votes is selected as the model prediction.
5.5 Decision Tree Classifier
A decision tree is the purest form of classification algorithm. A decision tree contains nodes, edges, and leaf nodes for classification. Decision trees consist of: (a) nodes, which test for the value of a particular attribute, (b) edges/branches, which correspond to the outcome of a test and connect to the next node or leaf, and (c) leaf nodes, which are terminal nodes that predict the outcome (such as class labels or class distribution).
5.6 KNN Classifier
In the field of AI, the k-nearest neighbors algorithm is a non-parametric technique used for classification. It is easy to implement, but its major problem is that it becomes slow as the amount of data increases.
5.7 SVM Classifier
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm builds an optimal hyperplane that separates new examples into their constituent classes. In two-dimensional space, this hyperplane is a line dividing a plane into two parts, with each class lying on either side.
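In the standard (hard-margin) formulation, with labels y_i in {-1, +1}, the separating hyperplane is found by

```latex
\min_{\mathbf{w},\, b} \ \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1 \ \ \forall i,
\qquad f(\mathbf{x}) = \operatorname{sign}(\mathbf{w}^{\top}\mathbf{x} + b).
```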
5.8 Ada-Boost Classifier
The general idea behind boosting methods is to train predictors sequentially, each trying to correct its predecessor. The basic concept behind Ada-Boost is to set the weights of classifiers and to train the data samples in each iteration so as to ensure accurate predictions, even for unusual observations.
5.9 XGBoost
XGBoost is a decision-tree-based ensemble ML algorithm that uses a gradient boosting framework. Gradient-boosted models are attractive because they can increase accuracy over a traditional statistical or conditional model and can be applied to both of the primary types of targets, categorical and continuous.
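As a usage sketch (not the tuned configuration reported in Sect. 6.9), XGBoost exposes a scikit-learn-compatible interface and accepts the sparse tf-idf matrix directly; the hyperparameters below are illustrative defaults.

```python
# Fitting XGBoost on the tf-idf features from Sect. 3.2; hyperparameters are
# illustrative, not the tuned values used in the benchmark.
from xgboost import XGBClassifier

model = XGBClassifier(n_estimators=200, max_depth=6,
                      learning_rate=0.1, eval_metric="logloss")
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.4f}")   # held-out 20%
```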
5.10 LSTM
Long Short-Term Memory (LSTM) networks are a modified version of recurrent neural networks that enable the storage of past information in memory. The vanishing gradient problem of RNNs is mitigated here. LSTM is well suited to classifying, analyzing, and forecasting time series, even when there are time lags of unknown duration.
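For reference, the standard LSTM cell [32] uses input, forget, and output gates to control what is written to, kept in, and read from the cell state c_t, which is what lets it carry information across long sequences:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad
h_t = o_t \odot \tanh(c_t).
\end{aligned}
```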
6 Performance
6.1 Multinomial Naive Bayes
We found that if the alpha value is set to 0.9, Multinomial Naive Bayes reaches a maximum accuracy of 76.65%. Table 2 shows the performance of Multinomial Naive Bayes, and Table 3 shows the sensitivity analysis for this algorithm.
Table 2. Multinomial Naive Bayes performance
Dataset | CM | ACC | ROC AUC
Apurba | [[342, 264], [195, 658]] | 68.54% | 73.05%
ABSA sports | [[38, 72], [55, 379]] | 76.65% | 67.93%
ABSA restaurant | [[225, 37], [52, 48]] | 75.41% | 72.64%
All datasets | [[566, 466], [271, 1061]] | 68.82% | 73.05%
6.2 Bernoulli Naive Bayes
For all datasets, we found that an alpha value of 0.8 gave the best performance. Table 4 shows the performance, and Table 5 shows the sensitivity analysis for Bernoulli Naive Bayes.
6.3 Logistic Regression
Table 6 shows the performance, and Table 7 shows the sensitivity analysis for Logistic Regression.
Table 3. Sensitivity analysis of multinomial Naive Bayes
Dataset TPR TNR FNR FPR PPV NPV FDR FOR F1
Apurba 77.14 56.44 22.86 43.56 71.37 63.69 28.63 36.31 74.14
ABSA sports 87.33 34.55 12.67 65.45 84.04 40.86 15.96 59.14 85.65
ABSA restaurant 48.0 85.88 52.0 14.12 56.47 81.23 43.53 18.77 51.89
All dataset 79.65 54.84 20.35 45.16 69.48 67.62 30.52 32.38 74.22
Table 4. Bernoulli Naive Bayes performance
Dataset | CM | ACC | ROC AUC
Apurba | [[342, 264], [195, 658]] | 69.16% | 73.27%
ABSA sports | [[23, 87], [20, 414]] | 80.33% | 70.50%
ABSA restaurant | [[225, 37], [52, 48]] | 71.82% | 73.64%
All datasets | [[566, 466], [271, 1061]] | 67.98% | 73.54%
Table 5. Sensitivity analysis of Bernoulli Naive Bayes
Dataset TPR TNR FNR FPR PPV NPV FDR FOR F1
Apurba 78.19 56.44 21.81 43.56 71.64 64.77 28.36 35.23 74.78
ABSA sports 92.86 23.64 7.14 76.36 82.75 45.61 17.25 54.39 87.51
ABSA restaurant 25.0 89.69 75.0 10.31 48.08 75.81 51.92 24.19 32.89
All dataset 80.56 51.74 19.44 48.26 68.3 67.34 31.7 32.66 73.92
6.4 Random Forest
Table 8 shows the performance, and Table 9 shows the sensitivity analysis for the Random Forest model.
Table 6. Logistic Regression performance
Dataset | CM | ACC | ROC AUC
Apurba | [[338, 268], [203, 650]] | 67.72% | 72.51%
ABSA sports | [[23, 87], [20, 414]] | 80.33% | 70.50%
ABSA restaurant | [[237, 25], [66, 34]] | 74.86% | 75.39%
All datasets | [[566, 466], [276, 1056]] | 68.61% | 74.30%
Table 7. Sensitivity analysis of logistic regression
Dataset TPR TNR FNR FPR PPV NPV FDR FOR F1
Apurba 76.2 55.78 23.8 44.22 70.81 62.48 29.19 37.52 73.4
ABSA sports 95.39 20.91 4.61 79.09 82.63 53.49 17.37 46.51 88.56
ABSA restaurant 34.0 90.46 66.0 9.54 57.63 78.22 42.37 21.78 42.77
All dataset 79.28 54.84 20.72 45.16 69.38 67.22 30.62 32.78 74.0
Table 8. Random Forest performance
Dataset | CM | ACC | ROC AUC | F1 | Precision | Recall
Apurba | [[340, 266], [309, 544]] | 60.59% | 65.56% | 65.42% | 67.16% | 63.77%
ABSA sports | [[47, 63], [41, 393]] | 80.88% | 73.30% | 88.31% | 86.18% | 90.55%
ABSA restaurant | [[240, 22], [75, 25]] | 73.20% | 70.00% | 34.01% | 53.19% | 25%
All datasets | [[629, 403], [387, 945]] | 66.58% | 71.36% | 70.52% | 70.10% | 70.94%
Table 9. Sensitivity Analysis of Random Forest
Dataset TPR TNR FNR FPR PPV NPV FDR FOR F1
Apurba 64.71 59.08 35.29 40.92 69.0 54.32 31.0 45.68 66.79
ABSA sports 88.71 43.64 11.29 56.36 86.13 49.48 13.87 50.52 87.4
ABSA restaurant 28.0 91.98 72.0 8.02 57.14 77.0 42.86 23.0 37.58
All dataset 68.77 62.02 31.23 37.98 70.03 60.61 29.97 39.39 69.39
6.5 Decision Tree Classifier
Table 10 shows the performance, and Table 11 shows the sensitivity analysis of the Decision Tree Classifier.
6.6 K-NN Classifier
Table 12 shows the performance, and Table 13 shows the sensitivity analysis of KNN.
Table 10. Decision Tree performance
Dataset | CM | ACC | ROC AUC | F1 | Precision | Recall
Apurba | [[316, 290], [341, 512]] | 56.75% | 57.11% | 61.87% | 63.84% | 60.02%
ABSA sports | [[49, 61], [73, 361]] | 75.37% | 65.88% | 84.34% | 85.55% | 83.18%
ABSA restaurant | [[216, 46], [55, 45]] | 72.10% | 65.13% | 47.12% | 49.45% | 45%
All datasets | [[601, 431], [492, 840]] | 60.96% | 60.99% | 64.54% | 66.09% | 63.06%
Table 11. Sensitivity analysis of decision tree
Dataset TPR TNR FNR FPR PPV NPV FDR FOR F1
Apurba 58.85 55.61 41.15 44.39 65.11 48.98 34.89 51.02 61.82
ABSA sports 83.18 47.27 16.82 52.73 86.16 41.6 13.84 58.4 84.64
ABSA restaurant 41.0 82.06 59.0 17.94 46.59 78.47 53.41 21.53 43.62
All dataset 63.21 60.95 36.79 39.05 67.63 56.21 32.37 43.79 65.35
Table 12. K-NN Classifier performance
Dataset | CM | ACC | ROC AUC
Apurba | [[293, 313], [308, 545]] | 57.44% | 57.42%
ABSA sports | [[25, 85], [29, 405]] | 79.04% | 66.31%
ABSA restaurant | [[236, 26], [77, 23]] | 71.55% | 63.69%
All datasets | [[500, 532], [368, 964]] | 61.92% | 63.10%
6.7 SVM Classifier
Table 14 shows the performance, and Table 15 shows the sensitivity analysis of the
SVM.
6.8 Ada-Boost Classifier
We got the best accuracy for Ada-Boost when the number of estimators was set to 50. Table 16 shows the performance, and Table 17 shows the sensitivity analysis of the Ada-Boost Classifier.
Table 13. Sensitivity analysis of KNN
Dataset TPR TNR FNR FPR PPV NPV FDR FOR F1
Apurba 63.89 48.35 36.11 51.65 63.52 48.75 36.48 51.25 63.71
ABSA sports 93.32 22.73 6.68 77.27 82.65 46.3 17.35 53.7 87.66
ABSA restaurant 23.0 90.08 77.0 9.92 46.94 75.4 53.06 24.6 30.87
All dataset 72.37 48.45 27.63 51.55 64.44 57.6 35.56 42.4 68.18
Table 14. SVM performance
Dataset | CM | ACC | ROC AUC
Apurba | [[293, 313], [308, 545]] | 66.83% | 72.24%
ABSA sports | [[25, 85], [29, 405]] | 70.77% | 69.37%
ABSA restaurant | [[236, 26], [77, 23]] | 69.89% | 72.87%
All datasets | [[500, 532], [368, 964]] | 67.94% | 73.95%
Table 15. Sensitivity analysis of SVM
Dataset TPR TNR FNR FPR PPV NPV FDR FOR F1
Apurba 69.75 62.71 30.25 37.29 72.47 59.56 27.53 40.44 71.09
ABSA sports 75.81 50.91 24.19 49.09 85.9 34.78 14.1 65.22 80.54
ABSA restaurant 62.0 72.9 38.0 27.1 46.62 83.41 53.38 16.59 53.22
All dataset 70.35 64.83 29.65 35.17 72.08 62.88 27.92 37.12 71.2
6.9 XGBoost
Table 18 shows the performance, and Table 19 shows the sensitivity analysis of
XGBoost.
Table 16. ADA Boost performance
Dataset | CM | ACC | ROC AUC
Apurba | [[293, 313], [308, 545]] | 64.22% | 65.92%
ABSA sports | [[25, 85], [29, 405]] | 79.42% | 66.74%
ABSA restaurant | [[236, 26], [77, 23]] | 73.20% | 69.38%
All datasets | [[500, 532], [368, 964]] | 65.44% | 70.44%
Table 17. Sensitivity analysis of ADA Boost
Dataset TPR TNR FNR FPR PPV NPV FDR FOR F1
Apurba 82.77 38.12 17.23 61.88 65.31 61.11 34.69 38.89 73.01
ABSA sports 96.77 11.82 3.23 88.18 81.24 48.15 18.76 51.85 88.33
ABSA restaurant 18.0 93.89 82.0 6.11 52.94 75.0 47.06 25.0 26.87
All Dataset 82.88 42.93 17.12 57.07 65.21 66.02 34.79 33.98 72.99
Table 18. XGBoost performance
Dataset | CM | ACC | ROC AUC
Apurba | [[291, 315], [140, 713]] | 68.81% | 65.80%
ABSA sports | [[15, 95], [16, 418]] | 79.60% | 54.97%
ABSA restaurant | [[244, 18], [67, 33]] | 76.52% | 63.06%
All datasets | [[490, 542], [185, 1147]] | 69.25% | 66.80%
Table 19. Sensitivity Analysis of XGBoost
Dataset TPR TNR FNR FPR PPV NPV FDR FOR F1
Apurba 83.59 48.02 16.41 51.98 69.36 67.52 30.64 32.48 75.81
ABSA sports 96.31 13.64 3.69 86.36 81.48 48.39 18.52 51.61 88.28
ABSA restaurant 33.0 93.13 67.0 6.87 64.71 78.46 35.29 21.54 43.71
All dataset 86.11 47.48 13.89 52.52 67.91 72.59 32.09 27.41 75.94
6.10 LSTM
In word2vec [31], vector representations help to capture closer relationships among words. Deep learning models such as LSTMs can remember important information across long stretches of sequences [32]. Semantic understanding, that is, meaning based on context, is important for getting the actual sentiment of a sentence [4]. Hence, an LSTM model with word2vec embeddings has been implemented to obtain results on the newly published corpora. The implementation details are listed below, followed by a code sketch:
Word embedding: word2vec
Window size: 2
Minimum word count: 4 (words occurring fewer than 4 times are ignored)
Dimensionality of the word vectors: 100
Embedding layer dropout: 50%
LSTM layer dropout: 20%
Recurrent dropout: 20%
Dimensionality of the output space: 100
Activation function: sigmoid
Optimizer: Adam
Loss function: binary cross-entropy
Number of epochs: 10
Batch size: 100
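A minimal sketch of this setup follows, interpreting the dropout settings above as fractions (0.5 and 0.2) and assuming a fixed maximum sequence length; `texts` and `labels` refer to the pre-processed data from Sect. 3.2, and the embedding matrix is initialized from the trained word2vec vectors.

```python
# Word2vec + LSTM sketch matching the listed settings; sequence length and
# dropout interpretation (50 -> 0.5, 20 -> 0.2) are assumptions.
import numpy as np
from gensim.models import Word2Vec
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SpatialDropout1D, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN = 50                                        # assumed maximum sequence length

# 1. word2vec embeddings: window 2, 100 dimensions, minimum count 4
sentences = [t.split() for t in texts]
w2v = Word2Vec(sentences, vector_size=100, window=2, min_count=4)

# 2. integer-encode the sentences and build the embedding matrix
tok = Tokenizer()
tok.fit_on_texts(texts)
seqs = pad_sequences(tok.texts_to_sequences(texts), maxlen=MAX_LEN)
emb = np.zeros((len(tok.word_index) + 1, 100))
for word, idx in tok.word_index.items():
    if word in w2v.wv:
        emb[idx] = w2v.wv[word]

# 3. LSTM classifier: sigmoid output, Adam optimizer, binary cross-entropy
model = Sequential([
    Embedding(emb.shape[0], 100, weights=[emb], input_length=MAX_LEN),
    SpatialDropout1D(0.5),                          # embedding layer dropout
    LSTM(100, dropout=0.2, recurrent_dropout=0.2),  # output dimensionality 100
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(seqs, np.asarray(labels), epochs=10, batch_size=100, validation_split=0.2)
```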
Table 20 shows the performance, and Table 21 shows the sensitivity analysis for each dataset. The model does not work well for the two ABSA datasets because of the lack of enough data in both classes, so the model was biased for those two datasets. Figure 4 shows the proposed LSTM model.
Table 20. LSTM performance
Dataset | CM | ACC | ROC AUC
Apurba | [[361, 245], [175, 678]] | 69.52% | 69.53%
ABSA sports | [[0, 110], [0, 434]] | 79.77% | 50%
ABSA restaurant | [[262, 0], [100, 0]] | 72.38% | 50%
All datasets | [[579, 453], [181, 1151]] | 73.18% | 71.26%
7 Discussion
In this section, we will benchmark the ten algorithms. Table 22 shows the comparison
of all the algorithms on all the datasets.
The algorithms are sorted based on their performance on the merged dataset. According to this evaluation, LSTM performs the best, followed by XGBoost, Multinomial Naive Bayes, and so on.
Fig. 4. Proposed LSTM architecture
Table 21. Sensitivity Analysis of LSTM
Dataset TPR TNR FNR FPR PPV NPV FDR FOR F1
Apurba 79.25 60.56 20.75 39.44 73.88 67.46 26.12 32.54 76.47
ABSA sports 100 0 0 100 79.78 – 20.22 – –
ABSA restaurant 0 100 100 0 – 72.38 – 27.62 –
All dataset 82.81 62.5 17.19 37.5 74.03 73.80 25.97 26.20 78.17
Table 22. Benchmark comparison - 1
Algorithm Acc Apurba Acc Sports Acc Restaurant Acc All Data
LSTM 69.52% 79.77% 72.38% 73.18%
XGBoost 68.81% 79.60% 76.52% 69.25%
Multinomial Naive Bayes 68.54% 76.65% 75.42% 68.82%
Logistic Regression 67.72% 80.33% 74.86% 68.61%
Bernoulli Naive Bayes 69.16% 80.33% 71.82% 67.98%
SVM 66.83% 70.77% 69.89% 67.94%
Random Forest 60.59% 80.88% 73.20% 66.58%
ADA Boost 64.22% 79.42% 73.20% 65.44%
K-NN Classifier 57.44% 79.04% 71.55% 61.92%
Decision Tree Classifier 56.75% 75.37% 72.10% 60.96%
Note that although LSTM performs best on the combined dataset, it was beaten by
Random Forest on the Sports and by XGBoost on the Restaurant datasets, respectively,
as noted by the highlighted cells in Table 22. Another point to note is that Bernoulli
Naive Bayes is twice in the second-best position: on the Apurba and the Sports
datasets, as indicated by the gray cells in Table 22.
To rank these algorithms based on how consistent they are, we start by assigning positions 1, 2, ..., 10 on each dataset, and then adding up their ranks across the datasets. The algorithm with the smallest sum can be ranked as the most consistent, assuming the degree of difficulty of each dataset is the same, which, admittedly, we cannot know for sure. But it still gives us a sense of how they perform over a range of different problem domains. Table 23 shows this revised ranking. It indicates that LSTM and XGBoost are tied in first place, followed by another tie between Multinomial Naive Bayes and Logistic Regression. The Decision Tree Classifier is again at the bottom of this table.
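The rank-sum computation itself is straightforward; the sketch below illustrates it on three of the algorithms using the accuracies from Table 22 (the full table is ranked the same way).

```python
# Consistency ranking: rank algorithms per dataset by accuracy (1 = best),
# then sum the ranks; accuracies taken from Table 22 (subset for brevity).
import pandas as pd

acc = pd.DataFrame(
    {"Apurba":     [69.52, 68.81, 56.75],
     "Sports":     [79.77, 79.60, 75.37],
     "Restaurant": [72.38, 76.52, 72.10],
     "All":        [73.18, 69.25, 60.96]},
    index=["LSTM", "XGBoost", "Decision Tree"])

ranks = acc.rank(ascending=False, method="min")     # per-dataset rank, 1 = highest accuracy
print(ranks.sum(axis=1).sort_values())              # smaller sum = more consistent
```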
Since LSTM leads the ranking in both tables, we should take a closer look at this algorithm. LSTM is a deep learning algorithm and therefore has a different way of learning from data. The other models are classification algorithms using various types of features. As described earlier, LSTM learns context or semantic meaning from word2vec, while the rest of the models work on the frequency of a given word in an encoded vector representation. As the combined dataset contains only about 12,000 records, this is not enough to obtain consistent and accurate output, especially for LSTM, since it is learning the context or semantic lexicon; it needs more data to perform better. We tested the LSTM model with parameter tuning, input shuffling, and changes to the input size, and found that it sometimes produces very different outputs for small changes in parameter values.
Table 23. Benchmark comparison - 2
Algorithm | Rank (Apurba) | Rank (Sports) | Rank (Restaurant) | Rank (All data) | Sum of rankings | Overall ranking
LSTM | 1 | 3 | 5 | 1 | 10 | 1st
XGBoost | 3 | 4 | 1 | 2 | 10 | 1st
Multinomial Naive Bayes | 4 | 7 | 2 | 3 | 16 | 2nd
Logistic Regression | 5 | 2 | 3 | 4 | 14 | 2nd
Bernoulli Naive Bayes | 2 | 2 | 7 | 5 | 16 | 3rd
SVM | 6 | 9 | 9 | 6 | 30 | 6th
Random Forest | 8 | 1 | 4 | 7 | 20 | 4th
ADA Boost | 7 | 5 | 4 | 8 | 24 | 5th
K-NN Classifier | 9 | 6 | 8 | 9 | 32 | 7th
Decision Tree Classifier | 10 | 8 | 6 | 10 | 34 | 8th
8 Conclusion and Future Work
This paper presents a detailed benchmarking of ten sentiment-analysis algorithms on
three publicly available Bangla datasets. One of the core issues that we face in Bangla
natural language processing research is the unavailability of standard datasets. In other
languages, such as English or Chinese, this is not a concern. The absence of a standard,
publicly available dataset means that every researcher has to first collect and label the
data before any training can take place. And since each new algorithm is evaluated on a
different dataset, it is also virtually impossible to compare the different approaches in
terms of their accuracy and quality. We hope that this paper will alleviate those
problems to some degree. Since we have fine-tuned the algorithms for these particular
datasets, researchers in the future can improve on these algorithms by comparing their
performance against these benchmarked datasets, which will aid in the overall
improvement in the development of NLP tools for Bangla.
One of the essential factors in sentiment analysis that has not been addressed in this
paper is multi-aspect sentence evaluation. In a sentence, there might be multiple clauses,
and different clauses may have different sentiments. For example, examine the following
quote: "Sakib's batting was good, but he did not bowl well." Here, we need to take the sentiment based on the aspects of batting and bowling. The same goes for customer reviews: a product may be bad or good from different perspectives. So, a future task would be to extend these benchmarking models for aspect-based sentiment analysis. For sentiment analysis, there are some smarter and more complicated models, such as CNN-LSTM, where the dimensional approach can provide more fine-grained sentiment analysis [14]. We decided not to include those models since we wanted to start the benchmarking with the fundamental, commonly used algorithms, especially within the nascent Bangla NLP domain. In the next iteration of this research, we plan to include some of these more advanced models. Finally, the size of the datasets used in this benchmarking is still minimal. We hope that other researchers will come forward and fill this gap by publicly offering larger labeled datasets for Bangla sentiment analysis.
References
1. Rahman, F., Khan, H., Hossain, Z., Begum, M., Mahanaz, S., Islam, A., Islam, A.: An annotated Bangla sentiment analysis corpus. In: 2019 International Conference on Bangla Speech and Language Processing (ICBSLP) (2020)
2. Rahman, M., Kumar Dey, E.: Datasets for aspect-based sentiment analysis in Bangla and its baseline evaluation. Data 3(2), 15 (2018)
3. Medhat, W., Hassan, A., Korashy, H.: Sentiment analysis algorithms and applications: a survey (2014)
4. LeCun, Y., Bengio, Y.: Convolutional networks for images, speech, and time series. In: The Handbook of Brain Theory and Neural Networks, vol. 3361, no. 10 (1995)
5. Le, M., Postma, M., Urbani, J., Vossen, P.: A deep dive into word sense disambiguation with LSTM. In: Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA, pp. 354–356. Association for Computational Linguistics, August 2018
6. Sentiment analysis using deep learning techniques: a review. Int. J. Adv. Comput. Sci. Appl.
7. Al-Amin, M., Islam, M.S., Uzzal, S.D.: Sentiment analysis of Bengali comments with word2vec and sentiment information of words. In: 2017 International Conference on Electrical, Computer and Communication Engineering (ECCE), pp. 186–190. IEEE, February 2017
8. Chowdhury, S., Chowdhury, W.: Performing sentiment analysis in Bangla microblog posts. In: 2014 International Conference on Informatics, Electronics & Vision (ICIEV), pp. 1–6. IEEE, May 2014
9. Hossain, M.S., Jui, I.J., Suzana, A.Z.: Sentiment analysis for Bengali newspaper headlines. Doctoral dissertation, BRAC University (2017)
10. Hassan, A., Amin, M.R., Mohammed, N., Azad, A.K.A.: Sentiment analysis on Bangla and Romanized Bangla text (BRBT) using deep recurrent models. arXiv:1610.00369 (2016)
11. Sumit, S.H., Hossan, M.Z., Al Muntasir, T., Sourov, T.: Exploring word embedding for Bangla sentiment analysis. In: 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), pp. 1–5. IEEE, September 2018
12. Asimuzzaman, M., Nath, P.D., Hossain, F., Hossain, A., Rahman, R.M.: Sentiment analysis of Bangla microblogs using adaptive neuro fuzzy system. In: 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, pp. 1631–1638 (2017)
13. Tuhin, R.A., Paul, B.K., Nawrine, F., Akter, M., Das, A.K.: An automated system of sentiment analysis from Bangla text using supervised learning techniques. In: 2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS), pp. 360–364. IEEE (2019)
14. Uddin, A.H., Dam, S.K., Arif, A.S.M.: Extracting severe negative sentence pattern from Bangla data via long short-term memory neural network. In: 2019 4th International Conference on Electrical Information and Communication Technology (EICT), pp. 1–6. IEEE, December 2019
15. Tabassum, N., Khan, M.I.: Design an empirical framework for sentiment analysis from Bangla text using machine learning. In: 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), pp. 1–5. IEEE, February 2019
16. Alam, M.H., Rahoman, M.M., Azad, M.A.K.: Sentiment analysis for Bangla sentences using convolutional neural network. In: 2017 20th International Conference of Computer and Information Technology (ICCIT), pp. 1–6. IEEE, December 2017
17. Paul, A.K., Shill, P.C.: Sentiment mining from Bangla data using mutual information. In: 2016 2nd International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE), pp. 1–4. IEEE, December 2016
18. Tripto, N.I., Ali, M.E.: Detecting multilabel sentiment and emotions from Bangla YouTube comments. In: 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), pp. 1–6. IEEE, September 2018
19. Taher, S.A., Akhter, K.A., Hasan, K.A.: N-gram based sentiment mining for Bangla text using support vector machine. In: 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), pp. 1–5. IEEE, September 2018
20. Rabeya, T., Chakraborty, N.R., Ferdous, S., Dash, M., Al Marouf, A.: Sentiment analysis of Bangla song review - a lexicon based backtracking approach. In: 2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), pp. 1–7. IEEE, February 2019
21. Haydar, M.S., Al Helal, M., Hossain, S.A.: Sentiment extraction from Bangla text: a character level supervised recurrent neural network approach. In: 2018 International Conference on Computer, Communication, Chemical, Material and Electronic Engineering (IC4ME2), pp. 1–4. IEEE, February 2018
22. Akter, S., Aziz, M.T.: Sentiment analysis on Facebook group using lexicon-based approach. In: 2016 3rd International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), pp. 1–4. IEEE, September 2016
23. Sharif, O., Hoque, M.M., Hossain, E.: Sentiment analysis of Bengali texts on online restaurant reviews using multinomial Naïve Bayes. In: 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), pp. 1–6. IEEE, May 2019
24. Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006). https://doi.org/10.1016/j.patrec.2005.10.010
25. Powers, D.M.W.: Evaluation: from precision, recall and F-measure to ROC, informedness, markedness & correlation. J. Mach. Learn. Technol. 2(1), 37–63 (2011)
26. Ting, K.M.: Encyclopedia of Machine Learning. Springer (2011). ISBN 978-0-387-30164-8
27. Brooks, H., Brown, B., Ebert, B., Ferro, C., Jolliffe, I., Koh, T.-Y., Roebber, P., Stephenson, D.: WWRP/WGNE Joint Working Group on Forecast Verification Research. Collaboration for Australian Weather and Climate Research. World Meteorological Organisation (2015). Accessed 17 July 2019
28. Chicco, D., Jurman, G.: The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 21(6) (2020). https://doi.org/10.1186/s12864-019-6413-7. PMC 6941312. PMID 31898477
29. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. CoRR, vol. abs/1810.04805 (2018)
30. Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., Zettlemoyer, L.: Deep contextualized word representations. In: Proceedings of NAACL (2018)
31. Mikolov, T., Chen, K., Corrado, G.S., Dean, J.: Efficient estimation of word representations in vector space. CoRR, vol. abs/1301.3781 (2013)
32. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)