
Sentiment Analysis of Bangla text using Gated Recurrent Neural Network

Nasif Alvi1, Kamrul Hasan Talukder1, Abdul Hasib Uddin1
1Khulna University, Khulna, Bangladesh
Emails: nasif.cse12@gmail.com, khtalukder@gmail.com, abdulhasibuddin@gmail.com
Abstract. Sentiment analysis is a fundamental part of Natural Language
Processing. There are numerous works on this topic in English and other
languages; however, it is still a comparatively new practice in Bangla. The
absence of a suitable Bangla corpus is the primary obstacle for sentiment
analysis tasks in Bangla. Long Short-Term Memory (LSTM) is a common technique
for resolving sentiments from datasets containing a large amount of text data,
whereas the Gated Recurrent Unit (GRU) is very efficient for datasets with a
small amount of text data. In this manuscript, we present a 5-layered GRU
neural network model, each layer comprising 48 neurons, and apply the model to
an existing Bangla corpus. We implemented 10-fold cross-validation and repeated
the same process three times. Each time, we took the averages of the ten
validation accuracies and losses and compared the results with the
state-of-the-art published outcome (77.85% highest accuracy) for Bi-directional
LSTM (BLSTM). The highest average accuracy of our model is 78.41%, while the
lowest is 76.34%.
Keywords: Sentiment analysis, Natural Language Processing, Corpus, Neural
Network, Text data, LSTM, GRU, BLSTM.
1 Introduction
Sentiment Analysis (SA) is a technique for finding and classifying the opinions
expressed in a piece of text by computational means, especially in order to decide
whether the attitude of the writer toward a particular topic, product and so on is
positive, negative or neutral. Sentiment analysis is frequently applied to opinions,
sentiments and subjective texts. SA offers detailed information about public opinion,
since it covers many forms of posts, ratings and feedback. SA is an established tool
for forecasting a variety of important events, such as the box office performance of
films as well as general and local elections. Public reviews are used to evaluate a
particular entity such as a person, product or venue, and can be found on various
websites such as Amazon and Yelp. Opinions can be assigned to positive, negative or
neutral classes as well as to broader groups. SA can automatically discover the
direction expressed in user feedback or opinions, that is, whether the user had a
satisfying, positive experience or a bad one. In sentiment classification, people's
views and emotions are analyzed, and such programs are used in social media and in
virtually any system. Views and emotions reflect the values, choices and actions of
individuals. With these techniques it is possible for corporations to make policy
decisions. In recent years, a great number of individuals have been sharing their
opinions or thoughts through the internet using Bangla [1].
The growth of restaurants across numerous online channels has been observable over
the last few years. Websites have become the most common forum where restaurants are
rated on the basis of customer opinions. Such online customer feedback represents
consumer sentiment and reflects a restaurant's overall quality. The contact between
customers and owners through the online portal makes it possible to examine
customers' responses and insights. It is therefore necessary to be able to measure
consumer opinion in order to improve quality according to the demands of consumers.
Further research can benefit from models trained on labelled data. Existing works
include a CNN model for aspect extraction from Bangla reviews and combined machine
learning models for predicting reviews. Sentiment analysis has already become a
common way of predicting consumer ratings. Little research has been performed on
Bangla text in particular, and what exists has not been very effective. The authors
of [2] developed a sentiment analysis model for restaurant rating on the basis of
food price, quality, operation, ambience and special meaning.
SA is basically an application of natural language processing. It is also known as
opinion mining or sentiment extraction. At present, sentiment analysis broadly refers
to the use of text analysis, computational linguistics, natural language processing
and biometrics for the systematic detection, retrieval, quantification and study of
affective states and subjective information. Moreover, recent advances in machine
learning research, especially deep learning based methods such as the recurrent
neural network (RNN), take advantage of the ability to infer decisions by formulating
a graph in SA [3].
Micro-blogging platforms such as Twitter, YouTube, Facebook, etc. have now become
very popular for social connections. Through social media, people communicate their
sadness, which can be studied to determine the reasons behind their depression. Most
studies of emotions and depression rely on questionnaires and academic interviews in
non-Bengali languages, especially English. These conventional approaches are not
always sufficient for identifying human depression. The aim of Artificial
Intelligence is to mimic human behavior and then evaluate it. Machine learning and
deep learning approaches are nowadays widely used for analyzing human behavior and
human sentiment. Detecting emotion and analyzing sentiment has become an important
task, for which several types of learning methods are used. The classification of
feelings and emotions can be studied further from two separate viewpoints: detection
from image data and detection from textual data. Sentiment analysis covers, in a
general way, the whole field of positive, negative and neutral sentiment
classification. Emotions such as happiness, grief, depression and disgust are very
deep feelings that are often harder to analyze. Some of these emotions are stronger
than others, and studying them requires high-level clinical expertise and more
specialized empirical methods. For this reason, sentiment analysis is of foremost
importance [4].
In this study, we used a Gated Recurrent Unit (GRU) network for sentiment analysis
on 7,000 Bangla text samples. The process was evaluated with 10-fold
cross-validation, and the highest average accuracy obtained for the sentiment
analysis was 78.41%.
2 Related Work
Hoque et al. [3] examined the performance of various ML approaches along with doc2vec
for classifying the sentiment of Bengali natural language. They trained a doc2vec
model using a corpus of seven thousand Bangla sentences, with 120-dimensional feature
vectors and two classes of data: positive and negative. They then applied several ML
algorithms (LR, SGD, SVM, K-Neighbors Classifier, DT, LDA, SM, BLSTM and GaussianNB)
for the analysis, among which BLSTM achieved the highest accuracy. The data was
randomly split into 80% for training and the remaining 20% for testing.
Uddin et al. [5] proposed a depression detection method based on a Gated Recurrent
Unit model. All of the data was collected from Bangla text on Twitter, Facebook and
other sources. They collected 5,000 Bangla texts from Twitter and 210 depressive
Bangla statements from native Bengali speakers using Google Forms. There were 4
hyper-parameters: the number of GRU layers (5), the batch size (10), the number of
epochs (5) and the GRU size (64, 128, 256, 512 and 1024).
Hossain et al. [6] proposed a combined CNN-LSTM model to conduct sentiment analysis
on online restaurant reviews. They used 80% of the dataset for training, with a
convolution size of 256 and an LSTM size of 128. They collected 1000 reviews of
restaurants listed on online platforms such as FoodPanda and Shohoz Food; the dataset
had two columns, review and category. The average recall, precision and F1-score
values were 0.70, 0.70 and 0.71, respectively.
Sharfuddin et al. [1] performed sentiment classification of Bangla text using an RNN
with BLSTM (Bidirectional LSTM). Their dataset initially contained around 15,000
comments collected from Facebook, of which they kept 10,000 comments consisting of
5,000 negative and 5,000 positive comments. All symbols, emojis, stickers and numbers
were removed in order to work on plain Bangla text.
Hasan et al. [7] developed a model that detects sentiment in Bangla text using
contextual valency analysis. In that work, they used WordNet to obtain the senses of
each word according to its part of speech (POS) and SentiWordNet to obtain the prior
valence of each word. They then computed the total positivity, negativity and
neutrality of a sentence or document with respect to its overall sense. They created
an XML file to store each Bangla word and its associated POS, and collected the
opinions of 20-30 people about the sentiment of each passage.
Tripto et al. [8] introduced a comprehensive set of methods to identify sentiment and
extract emotions from Bangla texts. They used LSTM, SVM, NB and CNN classifiers on a
dataset of Bangla sentences with three-class sentiment labels (positive, neutral,
negative) and five-class sentiment labels (strongly positive, positive, neutral,
negative, strongly negative), along with six basic emotions (anger, fear, disgust,
sadness, joy and surprise). They evaluated the performance of the model using a new
dataset of Bangla, Romanized Bangla and English comments from various types of
YouTube videos. Their methods showed 54.24% and 65.97% accuracy for three-label and
five-label sentiment, respectively.
Al-Amin et al. [9] studied an approach to sentiment classification and sentiment
extraction of words and Bangla comments with word2vec. The dataset contained
multi-line comments and 16,000 single-line Bangla comments gathered from popular
blogging websites. Every comment was tagged as either positive or negative by
surveying the opinions of various kinds of people. They trained on 90% of the tagged
comments, chosen randomly, and used the remaining 10% for testing.
In our previous work [10] we classified English tweets into five categories: happy,
surprise, sad, disgust and neutral. We used a total of 4000 tweets as our dataset:
3750 as the training set and 250 as the test set. Using a unigram model and a unigram
model with POS tags, accuracies of 66% and 64.8% were achieved, respectively.
3 Methodology
The flowchart of the research methodology is shown in Fig. 1.
3.1 Dataset Collection:
The dataset for sentiment analysis of Bangla text was collected from Hoque et al. [3]
and contains positive and negative sentiment. The total number of samples is 7000,
of which 3500 are positive and the remaining 3500 are negative.
3.2 Features Extraction:
To extract the features, the “integer encoding” method was used in this study.
Integer values have a natural ordered relationship with one another, and machine
learning algorithms may be able to understand and exploit this relationship. The
total length of the dataset was found to be 21889. The vector size was set equal to
the maximum sentence length. Zero padding was then used to keep the length of every
text the same.
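The encoding and padding steps described above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the whitespace tokenizer, the function names `build_vocab` and `encode_and_pad`, the choice to pad at the end of each sequence, and the romanized toy sentences are all assumptions made for the example.

```python
def build_vocab(texts):
    """Map each distinct token to a positive integer; 0 is reserved for padding."""
    vocab = {}
    for text in texts:
        for token in text.split():  # assumption: whitespace tokenization
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def encode_and_pad(texts, vocab):
    """Integer-encode every text and right-pad with zeros to the longest one."""
    encoded = [[vocab[token] for token in text.split()] for text in texts]
    max_len = max(len(seq) for seq in encoded)
    return [seq + [0] * (max_len - len(seq)) for seq in encoded]


# Hypothetical toy sentences, not taken from the corpus.
samples = ["ami khub khushi", "ami khushi na", "kharap"]
vocab = build_vocab(samples)
padded = encode_and_pad(samples, vocab)
print(padded)
```

After this step every text has the same length, equal to the longest sentence, which is what a recurrent layer with a fixed input length expects.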
Fig. 1. Working procedure of our system
3.3 Dataset Training:
10-fold cross-validation is one of the most popular techniques for training and
evaluating a model on a dataset. It is a resampling procedure that evaluates
predictive models by partitioning the original sample into a training set to build
the model and a test set to evaluate it. It shuffles the dataset randomly, splits it
into 10 groups and finally summarizes the skill of the model using the sample of
model evaluation scores. In this study, 10-fold cross-validation (6300 training
samples and 700 test samples in each fold) was used, and it was repeated 3 times.
Each time the training data was shuffled to learn efficiently.
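To make the 6300/700 split concrete, the following sketch partitions 7000 sample indices into 10 folds. It is an illustration under stated assumptions rather than the procedure used in the experiments: the helper name `k_fold_indices` and the seed value are hypothetical, and the standard library is used in place of any particular machine learning toolkit.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices once, then yield (train, test) index lists per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder when n_samples is not divisible by k.
        stop = start + fold_size if fold < k - 1 else n_samples
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test


folds = list(k_fold_indices(7000, k=10, seed=0))
fold_sizes = [(len(train), len(test)) for train, test in folds]
print(fold_sizes[0])
```

Repeating the procedure three times, as described above, would amount to calling the helper with three different seeds, so that each run sees a different shuffle.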
3.4 Gated Recurrent Unit Network (GRU):
The GRU network is a simplified structure of the recurrent neural network (RNN).
When the input sequence grows beyond a certain length, a plain RNN can no longer
connect to the relevant earlier information. The GRU network is aimed at solving the
long-range dependency and vanishing gradient problems of the RNN. With its smaller
gate structure and better efficiency, the GRU network has been chosen directly for
tasks such as gear pitting fault diagnosis. Like the GRU, the long short-term memory
(LSTM) is a recurrent unit used in RNNs. LSTM and GRU share the goal of tracking
long-term dependencies effectively while mitigating the vanishing/exploding gradient
problems [11]. The GRU model adapts to dependencies on a variety of time scales by
arranging recurrent units that regulate the flow of information with gate units
[12].
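The gating idea can be illustrated with a single GRU time step. The sketch below follows the commonly used formulation with an update gate z and a reset gate r; the paper itself does not give the equations, so the parameter names (`W_z`, `U_z`, `b_z`, and so on) and the helper functions are assumptions for illustration. Note that some libraries swap the roles of z and 1 - z in the final interpolation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, U, h, b):
    """Compute W @ x + U @ h + b for plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(W[i], x))
        + sum(u * hi for u, hi in zip(U[i], h))
        + b[i]
        for i in range(len(b))
    ]


def gru_step(x, h, p):
    """One GRU time step; p holds weights W_*, U_* and biases b_* per gate."""
    z = [sigmoid(v) for v in affine(p["W_z"], x, p["U_z"], h, p["b_z"])]  # update gate
    r = [sigmoid(v) for v in affine(p["W_r"], x, p["U_r"], h, p["b_r"])]  # reset gate
    gated_h = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the old state
    cand = [math.tanh(v) for v in affine(p["W_h"], x, p["U_h"], gated_h, p["b_h"])]
    # The update gate interpolates between the old state and the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


# Toy parameters: hidden size 2, input size 1, all weights zero.
hidden, n_in = 2, 1
params = {}
for gate in ("z", "r", "h"):
    params["W_" + gate] = zeros(hidden, n_in)
    params["U_" + gate] = zeros(hidden, hidden)
    params["b_" + gate] = [0.0] * hidden

h_new = gru_step([1.0], [1.0, -1.0], params)
print(h_new)
```

With all weights at zero both gates sit at 0.5 and the candidate state is 0, so the new state is simply half of the old one. Trained weights move the gates toward 0 or 1, which is how the unit decides how much past information to keep.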
In our case, we used 5 GRU layers, each containing 48 neurons. A flatten operation
then converts the whole output matrix into a 1D vector before the dense layer, which
serves as the output layer. The dense layer has 2 neurons, corresponding to positive
sentiment and negative sentiment. The “tanh” activation function was used in the
hidden layers and the “softmax” function in the final layer. To reduce the loss or
error rate, the categorical cross-entropy loss function was used.
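The output stage described above, two softmax neurons scored with categorical cross-entropy, can be sketched in a few lines. This is only an illustration of the two formulas: the function names, the example scores and the ordering of the classes as [positive, negative] are assumptions, not details taken from the implementation.

```python
import math


def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot, probs):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)


# Assumed class order: [positive, negative]; the true label here is positive.
probs_uncertain = softmax([0.0, 0.0])    # no preference between the classes
loss_uncertain = categorical_crossentropy([1, 0], probs_uncertain)
probs_confident = softmax([3.0, -3.0])   # strongly favours the positive class
loss_confident = categorical_crossentropy([1, 0], probs_confident)
print(loss_uncertain, loss_confident)
```

A network that cannot tell the two classes apart outputs probabilities of 0.5 each and incurs a loss of ln 2, roughly 0.693. That value is a useful reference point when reading validation losses for a two-class problem such as those in Table 1.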
4 Result and Analysis
We applied 10-fold cross-validation three times on our dataset to compare the
results. In each run we shuffled the training dataset so that the network was trained
properly.
Table 1 shows the validation accuracies and validation losses along with the number
of epochs for each fold in the three runs. Fig. 2 and Fig. 3 present the validation
accuracy and validation loss of each fold in all three runs. The lowest validation
accuracy was found in fold10 in all three runs. On the other hand, we achieved the
highest validation accuracy of 90.71% in fold3 of the first run, using 13 epochs. We
achieved average accuracies of 78.41%, 78.04% and 76.34% in the three runs,
respectively. Fig. 4 presents the average accuracy and average loss of all three
runs.
Table 1. Results of tuning GRU Hyper-parameter
Implementation no. | RUN 1: No. of epochs | RUN 1: Validation accuracy | RUN 1: Validation loss | RUN 2: No. of epochs | RUN 2: Validation accuracy | RUN 2: Validation loss | RUN 3: No. of epochs | RUN 3: Validation accuracy | RUN 3: Validation loss
Fold1 | 9 | 76.14% | 0.5033 | 10 | 73.71% | 0.5760 | 9 | 72.14% | 0.5551
Fold2 | 11 | 85.14% | 0.3924 | 10 | 84.86% | 0.3868 | 10 | 85.71% | 0.3614
Fold3 | 13 | 90.71% | 0.2626 | 12 | 89.86% | 0.2828 | 15 | 90.00% | 0.2892
Fold4 | 11 | 84.71% | 0.3270 | 12 | 85.29% | 0.3138 | 11 | 84.86% | 0.3477
Fold5 | 10 | 86.86% | 0.3193 | 11 | 87.00% | 0.3458 | 11 | 86.29% | 0.3570
Fold6 | 10 | 86.71% | 0.3429 | 10 | 84.57% | 0.3582 | 12 | 85.57% | 0.3554
Fold7 | 10 | 82.14% | 0.4216 | 9 | 81.43% | 0.4031 | 9 | 81.14% | 0.4154
Fold8 | 7 | 62.14% | 0.6760 | 8 | 63.86% | 0.6811 | 7 | 61.57% | 0.6781
Fold9 | 13 | 78.71% | 0.5899 | 8 | 71.71% | 0.5966 | 8 | 70.00% | 0.6233
Fold10 | 6 | 50.86% | 0.6971 | 8 | 58.14% | 0.6883 | 6 | 46.14% | 0.6939
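The reported averages can be reproduced directly from the per-fold validation accuracies in Table 1. The short check below is an illustrative sketch and not part of the original experiments; the variable names are hypothetical, and the numbers are copied from the table.

```python
# Validation accuracies (%) per fold, copied from Table 1.
accuracies = {
    "RUN 1": [76.14, 85.14, 90.71, 84.71, 86.86, 86.71, 82.14, 62.14, 78.71, 50.86],
    "RUN 2": [73.71, 84.86, 89.86, 85.29, 87.00, 84.57, 81.43, 63.86, 71.71, 58.14],
    "RUN 3": [72.14, 85.71, 90.00, 84.86, 86.29, 85.57, 81.14, 61.57, 70.00, 46.14],
}

# Mean accuracy per run, rounded to two decimals as in the text.
averages = {run: round(sum(vals) / len(vals), 2) for run, vals in accuracies.items()}
# Fold number (1-based) with the lowest accuracy in each run.
worst_fold = {run: vals.index(min(vals)) + 1 for run, vals in accuracies.items()}

print(averages)
print(worst_fold)
```

The per-run means agree with the averages quoted in the text, and the weakest fold in every run is fold 10.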
Fig. 2. Validation accuracy of 10-fold cross-validation in the three runs
Fig. 3. Validation loss of 10-fold cross-validation in the three runs
Fig. 4. Graphical view of (a) average accuracy and (b) average loss in the three runs
Table 2 shows the comparison of our system with Hoque et al. [3]. For their analysis,
Hoque et al. [3] randomly split their data into 80% as the training set and 20% as
the test set. However, a single random split is not the most standard way to train
and evaluate a model, because it does not ensure that all the data participates in
training. Hence, in our analysis, we applied 10-fold cross-validation three times,
shuffling the dataset each time, to achieve a more reliable result. 10-fold
cross-validation is one of the most popular and accepted techniques, and it ensures
that all the data participates in training the model. Our highest average accuracy is
78.41% and our lowest average accuracy is 76.34%. On the other hand, Hoque et al. [3]
achieved a highest accuracy of 77.85% using BLSTM and a lowest accuracy of 59.21%
using GaussianNB.
Table 2. Comparison of our system with Hoque et al. [3]
Accuracy | Our system | Hoque et al. [3]
Highest accuracy | 78.41% | 77.85%
Lowest accuracy | 76.34% | 59.21%
5 Conclusion
There is little research on sentiment analysis of Bangla text, and for this reason
datasets of Bangla text are rarely available. Sentiment analysis is an emerging
research topic, and we should try to build more accurate models for native languages.
In our research we used an existing dataset and trained a GRU model on it. Our system
achieves a higher best average accuracy (78.41%) than the previously reported result
(77.85%). Some limitations remain in our research. In future, we want to apply more
preprocessing techniques and other feature extraction methods to our data to get
better results. We also want to use other classification algorithms to compare
performance, and to continue our study toward multi-class sentiment analysis or
emotion analysis.
Acknowledgement:
This research work has been funded by Information and Communication Technology
(ICT) Division, Ministry of Post, Telecommunication, and Information Technology,
Government of the People’s Republic of Bangladesh through ICT fellowship.
References:
1. A. Aziz Sharfuddin, M. Nafis Tihami and M. Saiful Islam, "A Deep Recurrent Neural
Network with BiLSTM model for Sentiment Classification," International Conference on
Bangla Speech and Language Processing (ICBSLP), Sylhet, 2018, pp. 1-4, doi:
10.1109/ICBSLP.2018.8554396.
2. N. Hossain, M. R. Bhuiyan, Z. N. Tumpa and S. A. Hossain, "Sentiment Analysis of
Restaurant Reviews using Combined CNN-LSTM," 11th International Conference on
Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India,
2020, pp. 1-5, doi: 10.1109/ICCCNT49239.2020.9225328.
3. M. T. Hoque, A. Islam, E. Ahmed, K. A. Mamun and M. N. Huda, "Analyzing
Performance of Different Machine Learning Approaches With Doc2vec for Classifying
Sentiment of Bengali Natural Language," International Conference on Electrical,
Computer and Communication Engineering (ECCE), Cox's Bazar, Bangladesh, 2019, pp.
1-5, doi: 10.1109/ECACE.2019.8679272.
4. X. Wang, C. Zhang, Y. Ji, L. Sun, L. Wu and Z. Bao, "A Depression Detection Model
Based on Sentiment Analysis in Micro-blog Social Network", Lecture Notes in
Computer Science, 2013, pp. 201-213. Available: 10.1007/978-3-642-40319-4_18.
5. A. H. Uddin, D. Bapery and A. S. Mohammad Arif, "Depression Analysis of Bangla Social
Media Data using Gated Recurrent Neural Network," 1st International Conference on
Advances in Science, Engineering and Robotics Technology (ICASERT), Dhaka,
Bangladesh, 2019, pp. 1-6, doi: 10.1109/ICASERT.2019.8934455.
6. N. Hossain, M. R. Bhuiyan, Z. N. Tumpa and S. A. Hossain, "Sentiment Analysis of
Restaurant Reviews using Combined CNN-LSTM," ICCCNT, Kharagpur, India, 2020, pp.
1-5, doi: 10.1109/ICCCNT49239.2020.9225328.
7. K. M. A. Hasan, Mosiur Rahman and Badiuzzaman, "Sentiment detection from Bangla text
using contextual valency analysis," 2014 17th International Conference on Computer and
Information Technology (ICCIT), Dhaka, 2014, pp. 292-295, doi:
10.1109/ICCITechn.2014.7073151.
8. N. Irtiza Tripto and M. Eunus Ali, "Detecting Multilabel Sentiment and Emotions from
Bangla YouTube Comments," ICBSLP, Sylhet, 2018, pp. 1-6, doi:
10.1109/ICBSLP.2018.8554875.
9. M. Al-Amin, M. S. Islam and S. Das Uzzal, "Sentiment analysis of Bengali comments with
Word2Vec and sentiment information of words," ECCE, Cox's Bazar, 2017, pp. 186-190,
doi: 10.1109/ECACE.2017.7912903.
10. A. Z. Riyadh, N. Alvi and K. H. Talukder, "Exploring human emotion via Twitter," ICCIT,
Dhaka, 2017, pp. 1-5, doi: 10.1109/ICCITECHN.2017.8281813.
11. X. Li, J. Li, Y. Qu and D. He, "Gear Pitting Fault Diagnosis Using Integrated CNN and
GRU Network with Both Vibration and Acoustic Emission Signals", Applied Sciences,
2019, vol. 9, no. 4, p. 768. Available: 10.3390/app9040768.
12. B. Liu, C. Fu, A. Bielefield and Y. Liu, "Forecasting of Chinese Primary Energy
Consumption in 2021 with GRU Artificial Neural Network", Energies, 2017, vol. 10, no.
10, p. 1453. Available: 10.3390/en10101453.
... The study offered a comprehensive comparison of Recurrent Neural Network (RNN) performance on both raw and translated texts (Bengali and English), achieving accuracy rates of 82.9% for the Bengali texts and 84.5% for the English translations. Also in 2021, Alvi et al [12] investigated the sentiment analysis of Bangla text using a Gated Recurrent Neural Network (GRNN). They applied a 10-fold cross-validation method, repeated three times, to ensure robust results. ...
Conference Paper
Full-text available
Depression is a worldwide epidemic and is of special concern due to its impact on mental health and the need for comprehensive studies for better detection and treatment. As the severity of depression grows among the Bengali population, yet the identification is relatively uncommon in this wider community, it appears to be in high demand. In our research, an efficient deep hybrid learning is instituted to identify levels of depression in Bangla text. We gathered 3000 Bengali Facebook posts for our dataset labeled for three levels of depression, mild, moderate, and severe. A bidirectional Long Short-Term Memory (bi-LSTM) architecture and a Random Forest classifier have been applied. We followed some pre-processing steps for these texts, which involved cleaning, correcting mistakes, removing stop words, segmentation, and embedding the text using Word2Vec to turn the text into vectors. The bi-LSTM model performed at an accuracy of 90.5% but this was improved to 93.67% when combined with Random Forest by using Softmax generated class scores as features. The aspects of our proposed hybrid bi LSTM-RF techniques suggest that it could be more precise and effective for depression detection in the less researched Bangla language. These results provide insight into the application of deep learning with classical classifiers for sentiment classification in the mental health field and enhancing the treatment of depression by early detection.
... Deep recurrent models are commonly employed in the context of Bangla sentiment classification problems within the realm of deep learning methodologies [17]. Transfer learning models such as LSTM [18], and GRU [19] are also popular for this task. Transformer models, like as Bangla-BERT [20], have become crucial instruments in the progression of sentiment analysis in the Bangla language. ...
Conference Paper
Sentiment analysis, which involves the identification and assessment of the emotional state of the author, constitutes an integral component within the broader context of text analysis. This approach is crucial since it facilitates a holistic understanding of users’ emotions, perspectives, and preferences. The advent of large language models (LLMs), exemplified by LLaMA, has significantly expanded the accessibility of state-of-the-art model applications, including sentiment analysis. Nevertheless, the potential of LLMs in addressing various sentiment analysis challenges has not been extensively explored. This work explores the feasibility of leveraging LLaMA, a Large Language Model (LLM), for sentiment analysis in Bangla, a low-resource language. A dataset of 1,000 Bangla customer reviews was used to fine-tune the model, achieving an overall accuracy of 89.76%, outperforming existing models such as Bangla-BERT by 1%. The study highlights the use of parameter-efficient fine-tuning techniques (LORA and PEFT) to reduce computational overhead, making it suitable for resource-constrained environments. The findings demonstrate the potential of LLMs in advancing sentiment analysis for low-resource languages.
... They proved by experiments that BERT outperform Word2Vec, FastText, GloVe feature extractor techniques. Another DL based study was performed in Alvi et al. (2022) using LSTM, GRU and BLSTM classifiers along with 10-fold cross validation and achieved highest 78.41% accuracy score. ...
Article
Full-text available
In this modern technologically advanced world, Sentiment Analysis (SA) is a very important topic in every language due to its various trendy applications. But SA in Bangla language is still in a dearth level. This work focuses on examining different hybrid feature extraction techniques and learning algorithms on Bangla Document level Sentiment Analysis using a new comprehensive dataset (BangDSA) of 203,493 comments collected from various microblogging sites. The proposed BangDSA dataset approximately follows the Zipf’s law, covering 32.84% function words with a vocabulary growth rate of 0.053, tagged both on 15 and 3 categories. In this study, we have implemented 21 different hybrid feature extraction methods including Bag of Words (BOW), N-gram, TF-IDF, TF-IDF-ICF, Word2Vec, FastText, GloVe, Bangla-BERT etc with CBOW and Skipgram mechanisms. The proposed novel method (Bangla-BERT+Skipgram), skipBangla-BERT outperforms all other feature extraction techniques in machine leaning (ML), ensemble learning (EL) and deep learning (DL) approaches. Among the built models from ML, EL and DL domains the hybrid method CNN-BiLSTM surpasses the others. The best acquired accuracy for the CNN-BiLSTM model is 90.24% in 15 categories and 95.71% in 3 categories. Friedman test has been performed on the obtained results to observe the statistical significance. For both real 15 and 3 categories, the results of the statistical test are significant.
... In [14], sentiment analysis on KN95 mask reviews was conducted using TF-IDF vectorization and classifiers (Support Vector Machine, Gaussian Naïve Bayes, and Multinomial Naïve Bayes), with Gaussian Naïve Bayes demonstrating superior accuracy, recall, and F1-score. Paper [15] introduced a five-layered GRU neural network model surpassing the state-of-the-art Bidirectional LSTM (BLSTM) result. The research by author Naimul Hossain [16] has accrued 94.22% accuracy using a combined CNN-LSTM model. ...
Conference Paper
Sentiment analysis is a technique that combines machine learning and natural language processing to identify the emotional attitude of a text. This is a very active research area in recent years. Bengali is the fifth most spoken Indo-European language in the world. Many people in Bangladesh use news portals and social media to gather information on various topics. We used a publicly available dataset from Kaggle. This data set consists of more negative reviews than positive reviews. We try to experiment with this dataset with different models, such as traditional ML models and deep learning models like CNN, LSTM, and the transformer model (Bangla-BERT-base). The Bangla-BERT-base achieved a notable 96% accuracy through 10-fold cross-validation. Several other performance measures are also used to evaluate our model.
... Learning. From DL approaches, LSTM, hybrid LSTM, and BiLSTM (Bidirectional Long Short-Term Memory), Gated Recurrent Unit (GRU) played the pivotal role in giving a better performance (in terms of accuracy) in this linguistic research [12,25,56,57,98,107,108,128]. An accuracy level of 77.85% [56] and 91.35% [57] were achieved respectively using BiLSTM. ...
Article
Full-text available
The effortless expansion of Internet access has eventually transformed the dissemination behavior towards E-Mode. Thus the usage of online or, more specifically, ‘Digital’ texts has expanded abruptly. ‘Bangla’, the seventh most spoken language globally, has no different nature. Communication in the Bangla language has also been exposed on the Internet, which describes the feelings of individuals in any specific context. These enormously generated data from diverse sources have drawn the interest of the researchers working in the Natural Language Processing domain. Despite its relatively complicated structure, a lesser amount of annotated data, as well as a limited number of frameworks and approaches, exist. This lacking of resources has kept several stones unturned in this diverse, emotion-rich and widely spoken language. To bridge the lacking and absence of resources, this article aims to provide a generalized deduced working procedure in this domain. To do so, the existing research work in the domain of sentiment analysis using Bangla text has been collected, evaluated and summarized. Also, in this article, the techniques used in pre-processing, feature extraction, and eventually used algorithms have been identified and discussed. Considering these facts, this research work sketches a tentative blueprint of sentiment analysis using Bangla text. Additionally, this article discusses existing regional language corpora such as Tamil, Urdu, and Hindi, as well as English and methodologies used to extract emotional essence from Bangla language comparing other languages. That will assist in determining the probable chosen path of exploring Bangla in a more deeper aspect. Moreover, this work has deduced and presented a generalized framework that will direct aspiring researchers to decide the pathway of choosing data vis-à-vis methodologies based on their interests.
... From DL approaches, LSTM, hybrid LSTM, and BiLSTM (Bidirectional Long Short-Term Memory), Gated Recurrent Unit (GRU) played the pivotal role in giving a better performance (in terms of accuracy) in this linguistic research P r e -P r i n t A c c e p t e d o n A C M T A L L I P [24,25,39,46,64,[105][106][107]. An accuracy level of 77.85% [24] and 91.35% [46] were achieved respectively using BiLSTM. ...
Preprint
Full-text available
The effortless expansion of Internet access has eventually transformed the dissemination behavior towards E-Mode. Thus the usage of online or, more specifically, ‘Digital’ texts has expanded abruptly. ‘Bangla’, the seventh most spoken language globally, has no different nature. Communication in the Bangla language has also been exposed on the Internet, which describes the feelings of individuals in any specific context. These enormously generated data from diverse sources have drawn the interest of the researchers working in the Natural Language Processing domain. Despite its relatively complicated structure, a lesser amount of annotated data, as well as a limited number of frameworks and approaches, exist. This lacking of resources has kept several stones unturned in this diverse, emotion-rich and widely spoken language. To bridge the lacking and absence of resources, this article aims to provide a generalized deduced working procedure in this domain. To do so, the existing research work in the domain of sentiment analysis using Bangla text has been collected, evaluated and summarized. Also, in this article, the techniques used in pre-processing, feature extraction, and eventually used algorithms have been identified and discussed. Considering these facts, this research work sketches a tentative blueprint of sentiment analysis using Bangla text. Additionally, this article discusses existing regional language corpora such as Tamil, Urdu, and Hindi, as well as English and methodologies used to extract emotional essence from Bangla language comparing other languages. That will assist in determining the probable chosen path of exploring Bangla in a more deeper aspect. Moreover, this work has deduced and presented a generalized framework that will direct aspiring researchers to decide the pathway of choosing data vis-à-vis methodologies based on their interests.
Conference Paper
Every day, thousands of research papers are produced, and among all of these research works, computer science is the most continually evolving field. Thus, a large number of academics, research institutions, and funding bodies benefit from knowing which research fields are popular in this specific field of study. In this regard, we have produced a deep learning-based framework to estimate the future paths of computer science research by forecasting the number of articles that will be published. The recommended strategy shows the best prediction results compared with the baseline approaches, with an RMSE of 1483.23 and an R-squared value of 0.9854.
Conference Paper
In this modern, technologically advanced world, Sentiment Analysis (SA) is a very important topic in every language due to its various trendy applications. But SA in the Bangla language is still scarce. This work focuses on examining different hybrid feature extraction techniques for Bangla SA using a new comprehensive dataset of 203,493 comments collected from various microblogging sites. In this study, we have implemented 21 different hybrid feature extraction methods including Bag of Words (BOW), N-gram, TF-IDF, TF-IDF-ICF, Word2Vec, FastText, GloVe, Bangla-BERT, etc., with CBOW and Skipgram mechanisms. The proposed novel method (Bangla-BERT+Skipgram) outperforms all other feature extraction techniques in machine learning (ML), ensemble learning (EL) and deep learning (DL) approaches. The (Bangla-BERT+Skipgram) method achieved the highest accuracies of 92.37%, 92.55% and 95.71% in ML, EL and DL algorithms, respectively.
Preprint
Text classification is an essential and well-known topic in Artificial Intelligence and a core task of Natural Language Processing (NLP). Because of the abundance of textual documents in Bangla, text classification has become a crucial subject. At the same time, NLP in Bangla is not as developed as it is in English, and little study has been done in the context of the Bengali language, which is among the most widely used languages in the world. As a consequence, it is past time to address this issue to enable effective information management and data structuring. A Bangla sentence in a text document can be assertive, interrogative, imperative, optative, or exclamatory. Numerous machine learning (ML) and deep learning (DL) algorithms are applied to categorize the sentences in the text document using the dataset. Our dataset is unique in that it was created by hand while keeping Bangla's sentence structure and origin in perspective. Among the ML techniques, two stand out: RN and DT provide the highest accuracy at 89.42%. Among the DL strategies, between LSTM and RNN, LSTM exhibits the higher accuracy at 88.2%. Our experiment also benefits NLP by detecting the expression of textual data; in future work, hybrid approaches will be applied and our dataset enlarged to improve the interaction between Bangla and the NLP field.
Conference Paper
A combination of machine learning and natural language processing is applied to analyze the sentiment of particular sentences. A lot of work has been done in this area in recent times. The restaurant business has always been popular in Bangladesh. These businesses are now leaning towards online delivery services, and the overall quality of restaurants is now judged by customer reviews. People try to understand the quality of a restaurant from the reviews of other customers. Organizing these customer opinions in a structured way and understanding customers' perceptions, reviews and reactions is the main aim of our work. Collecting data was the first step of this work. We then built a dataset harvested from websites and applied a deep learning technique to it. In this research, a combined CNN-LSTM architecture was used on our dataset and achieved an accuracy of 94.22%. We also used some other performance metrics to evaluate our model.
Conference Paper
Nowadays, micro-blogging sites like Twitter, Facebook, YouTube, etc., have become very popular for social interactions. People are expressing their depression over social media, which can be analyzed to identify the causes behind their depression. Most research on emotion and depression analysis is based on questionnaires and academic interviews in non-Bengali languages, especially English. These traditional methods are not always suitable for detecting human depression. In this paper, we introduce a Gated Recurrent Neural Network based depression analysis approach for Bangla social media data. We collected Bangla data from Twitter, Facebook and other sources. We selected four hyper-parameters, namely, number of Gated Recurrent Unit (GRU) layers, layer size, batch size and number of epochs, and present step-by-step tuning for these hyper-parameters. The results show the effects of these tuning steps and how the steps can be beneficial in configuring GRU models to gain high accuracy on a significantly smaller data set. This work will help psychologists and the concerned authorities of society detect depression among Bangla-speaking social media users. It will also help researchers implement Natural Language Processing tasks with Deep Learning methods.
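The step-by-step tuning of the four hyper-parameters named above can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the function `tune_stepwise`, the candidate values, and the scoring function `toy_score` are all hypothetical, and `toy_score` is an invented stand-in for training a GRU model and returning its validation accuracy.

```python
def tune_stepwise(search_space, evaluate, start):
    """Tune one hyper-parameter at a time, keeping the best value found
    for each before moving on to the next (a coordinate-wise search)."""
    best = dict(start)
    best_score = evaluate(best)
    for name, candidates in search_space.items():
        for value in candidates:
            trial = {**best, name: value}
            score = evaluate(trial)
            if score > best_score:
                best, best_score = trial, score
    return best, best_score


def toy_score(cfg):
    """Hypothetical stand-in for 'train a GRU model and return validation
    accuracy'. The peak location below is made up for illustration only."""
    return -(
        (cfg["gru_layers"] - 3) ** 2
        + ((cfg["layer_size"] - 64) / 16) ** 2
        + ((cfg["batch_size"] - 32) / 16) ** 2
        + ((cfg["epochs"] - 10) / 5) ** 2
    )


# Candidate values are assumptions, not taken from the paper.
search_space = {
    "gru_layers": [1, 2, 3, 4, 5],
    "layer_size": [32, 48, 64, 128],
    "batch_size": [16, 32, 64],
    "epochs": [5, 10, 20],
}
start = {"gru_layers": 1, "layer_size": 32, "batch_size": 16, "epochs": 5}

best, best_score = tune_stepwise(search_space, toy_score, start)
print(best, best_score)
```

In practice `toy_score` would be replaced by a function that trains and validates a model. A coordinate-wise search like this needs far fewer training runs than a full grid, but it can miss interactions between hyper-parameters.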
Article
This paper deals with the gear pitting fault diagnosis problem and presents a method that integrates convolutional neural network (CNN) and gated recurrent unit (GRU) networks with vibration and acoustic emission signals to solve it. The presented method first trains a one-dimensional CNN with acoustic emission signals and a GRU network with vibration signals. Then the gear pitting fault features obtained by the two networks are concatenated to form a deep learning structure for gear pitting fault diagnosis. Seven different gear pitting conditions are used to test the feasibility of the presented method. The diagnosis results show that the accuracy of the presented method reaches above 98% with only a relatively small number of training samples. In comparison with the results using a CNN or GRU network alone, the presented method gives more accurate diagnosis results. By comparing the results for different loads and learning rates, the robustness of the presented method for gear pitting fault diagnosis is demonstrated. Moreover, the presented deep structure can be easily extended to additional sensor input signals for gear pitting fault diagnosis in the future.
Conference Paper
Sentiment analysis has become a key research area in natural language processing due to its wide range of practical applications, which include opinion mining, emotion extraction, trend prediction in social media, etc. Though sentiment analysis in the English language has been extensively studied in recent years, little research has been done in the context of the Bangla language, one of the most spoken languages in the world. In this paper, we present a comprehensive set of techniques to identify sentiment and extract emotions from Bangla texts. We build deep learning based models to classify a Bangla sentence with a three-class (positive, negative, neutral) and a five-class (strongly positive, positive, neutral, negative, strongly negative) sentiment label. We also build models to extract the emotion of a Bangla sentence as any one of the six basic emotions (anger, disgust, fear, joy, sadness and surprise). We evaluate the performance of our model using a new dataset of Bangla, English and Romanized Bangla comments from different types of YouTube videos. Our proposed approach shows 65.97% and 54.24% accuracy in three-label and five-label sentiment classification, respectively. We also show that the performance of our model is better for domain- and language-specific texts.
Article
The forecasting of energy consumption in China is a key requirement for achieving national energy security and energy planning. In this study, multi-variable linear regression (MLR) and support vector regression (SVR) were utilized with a gated recurrent unit (GRU) artificial neural network to establish a forecasting model of Chinese energy consumption. The derived model was validated through four economic variables: the gross domestic product (GDP), population, imports, and exports. The performance of various forecasting models was assessed via MAPE and RMSE, and three scenarios were configured based on different sources of variable data. In predicting Chinese energy consumption from 2015 to 2021, results from the established GRU model, which had the highest predictive accuracy, showed that Chinese energy consumption would be likely to fluctuate from 2954.04 Mtoe to 5618.67 Mtoe in 2021.
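For readers unfamiliar with the two error measures mentioned above, the standard definitions of MAPE (mean absolute percentage error) and RMSE (root mean squared error) can be sketched as below. The function names and the sample values are illustrative assumptions and are not taken from the study.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.
    Assumes no value in `actual` is zero."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))


# Made-up values for illustration only (not data from the study).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(f"MAPE = {mape(actual, predicted):.2f}%")
print(f"RMSE = {rmse(actual, predicted):.3f}")
```

MAPE is scale-free, which makes it convenient for comparing forecasts across series of different magnitude, while RMSE penalizes large individual errors more heavily.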
Conference Paper
The vector representation of Bengali words using the word2vec model (Mikolov et al. (2013)) plays an important role in Bengali sentiment classification. It is observed that words from the same context stay closer in the vector space of the word2vec model and are more similar to each other than to other words. In this article, a new approach to sentiment classification of Bengali comments using word2vec and sentiment extraction of words is presented. Combining the word2vec word co-occurrence score with the sentiment polarity score of the words, the accuracy obtained is 75.5%.
Conference Paper
Datasets originating from social networks are valuable to many fields such as sociology and psychology. But support from a technical perspective is far from sufficient, and specific approaches are urgently needed. This paper applies data mining to the field of psychology for detecting depressed users in social network services. Firstly, a sentiment analysis method is proposed that uses vocabulary and man-made rules to calculate the depression inclination of each micro-blog. Secondly, a depression detection model is constructed based on the proposed method and 10 features of depressed users derived from psychological research. Then 180 users and 3 kinds of classifiers are used to verify the model, whose precisions are all around 80%. Also, the significance of each feature is analyzed. Lastly, an application is developed based on the proposed model for online mental health monitoring. This study is supported by some psychologists and, in turn, assists them from a data-centric perspective.