An Ensemble Based Approach for Sentiment Classification in Asian Regional Language
Mahesh B. Shelke1, Jeong Gon Lee2,*, Sovan Samanta3, Sachin N. Deshmukh1, G. Bhalke Daulappa4, Rahul B. Mannade5 and Arun Kumar Sivaraman6
1 Department of Computer Science and Information Technology, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, Maharashtra, 431004, India
2 Division of Applied Mathematics, Wonkwang University, 460, Iksan-daero, Iksan-Si, Jeonbuk, 54538, Korea
3 Department of Mathematics, Tamralipta Mahavidyalaya, Tamluk, West Bengal, 721636, India
4 Department of Electronics and Telecommunication Engineering, AISSMSCOE, Pune, Maharashtra, 411001, India
5 Department of Information Technology, Government College of Engineering, Aurangabad, Maharashtra, 431005, India
6 School of Computer Science and Engineering, Vellore Institute of Technology, Chennai, 600127, India
*Corresponding Author: Jeong Gon Lee. Email: jukolee@wku.ac.kr
Received: 30 January 2022; Accepted: 23 March 2022
Abstract: In today's digital world, millions of individuals are linked to one another via the Internet and social media. This opens up new avenues for information exchange. Sentiment analysis (SA) has received a lot of attention during the last decade. In this study, we analyse the challenges of SA in Marathi, one of the Asian regional languages, by providing a benchmark setup in which we first produced an annotated dataset composed of Marathi text acquired from microblogging websites such as Twitter. We also chose domain experts to manually annotate Marathi microblogging posts with positive, negative, and neutral polarity. In addition, to show the efficient use of the annotated dataset, an ensemble-based model for sentiment analysis was created. Compared with the other machine learning classifiers, the ensemble classifier achieved better performance with 10-fold cross-validation (cv): an accuracy of 97.77% and an F-score of 97.89%.
Keywords: Sentiment analysis; machine learning; lexical resource; ensemble classifier
1 Introduction
In this digital age, millions of people are connected to one another through Web 2.0 and social networking. This allows for a new way of exchanging knowledge with other people. Social networking sites, e-commerce websites, blogs, and other similar platforms allow users to instantly generate creative content, thoughts, and opinions, leading to the production of huge amounts of data every day. Sentiment analysis and opinion mining have grown into a challenging and dynamic field of research for both resourced and under-resourced languages. The term sentiment refers to a broad concept that encompasses sentiment, evaluation, appraisal, or attitude toward a piece of information that demonstrates the author's point of view.
This work is licensed under a Creative Commons Attribution 4.0 International License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the original
work is properly cited.
Computer Systems Science & Engineering
DOI: 10.32604/csse.2023.027979
Article
Tech Science Press
Sentiment analysis is also described by the terms opinion mining and emotional intelligence. It is the systematic process of extracting useful knowledge from unstructured and unorganized text in various social platforms and online sources, such as chats on Twitter, WhatsApp, and Facebook, as well as online blogs and comments. Opinion mining involves building automated systems that use machine learning methods to perform this extraction.
The number of Marathi internet users and the amount of Marathi web content have grown tremendously. Because Marathi is still an under-resourced language in the field of sentiment analysis, there have been few attempts to perform SA in Marathi. Users express their opinions in a variety of ways, including bilingual text, transliterated words, emoticons, spelling variations, incorrect linguistic structures, and many others [1]. This makes sentiment analysis a difficult field of research, particularly for Indian languages. It also creates an opportunity to develop Marathi resources and research in the field of sentiment analysis.
The major contributions of this research work are the development and evaluation of lexical resources for sentiment analysis in Marathi. Few lexical resources, libraries, and corpora are available for Marathi, indicating that the language has not been well explored in the field of sentiment analysis. In this research, we present an ensemble-based model for predicting the sentiment of Marathi texts by integrating the outputs of machine learning-based models. To develop a benchmark dataset, we manually annotated the Twitter dataset with the help of human annotators (domain experts) who are senior researchers in Marathi, and we used Fleiss' kappa to measure the agreement among these annotators. Lastly, all classification algorithms are evaluated and discussed. In addition, an annotated dataset of Marathi tweets with positive, negative, and neutral sentiment orientations was created.
2 Related Work
In recent years, only a few Indian languages have been studied, including Hindi, Telugu, Tamil, and Bengali. However, as digital literacy in India grows and technology makes it easier to create content in Indian languages, countries like India will be able to create more content in regional languages on the Internet.
Authors have proposed an ensemble-based model for sentiment analysis of Persian text [2-4]. Sentiment analysis is performed using deep learning and shallow approaches, and the accuracy achieved in experimentation is up to 79.68% [5]. Researchers proposed an ensemble-based recommender system for hotel reviews and also categorized aspects [6]. They used an ensemble of binary classifiers based on the BERT technique, with Word2Vec, subjectivity score, and Term Frequency-Inverse Document Frequency (TF-IDF) as features, and achieved an F-score of 84% and an accuracy of 93.26% [7]. In a proposed ensemble model for feature extraction, the author considered Information Gain (IG), Gini Index, and Chi-Square, used Sequential Minimal Optimization (SMO), Multinomial Naïve Bayes (MNB), and Random Forest (RF) as machine learning algorithms, and considered a multi-domain dataset.
The researchers studied the use of Naive Bayes (NB) and Support Vector Machine (SVM) for machine learning-based sentiment classification of movie reviews [8-12]. Sentiment analysis is treated there as a two-class classification problem comprising positive and negative classes; this kind of study may be used to classify textual information, and feature selection affects classifier performance.
The authors of [13] proposed an enhanced one-vs-one (OVO) technique in which the weight of each binary classifier is computed from the K nearest neighbours and the class centre of each category in the training sample set. The information gain (IG) approach is used to identify the key features for multi-class sentiment analysis; a binary SVM classifier is then trained on the extracted features for every pair of sentiment categories.
2458 CSSE, 2023, vol.44, no.3
Ensemble approaches, as an alternative to using each individual learning algorithm alone, employ many learning algorithms to achieve greater efficiency [14]. The performance of deep learning techniques can be improved by combining them with standard approaches based on manually acquired features [15].
Machine learning-based techniques have played a significant role in Natural Language Processing [16]. Machine learning techniques are divided into two classes: supervised and unsupervised learning. For the task of sentiment analysis, supervised algorithms such as Support Vector Machine (SVM), Maximum Entropy, and Naïve Bayes (NB) are mostly preferred [17-19]. This includes feature-based sentiment analysis and summarization.
3 System Development
This section describes the corpus creation process, pre-processing, manual annotation, and the evaluation of agreement among the human annotators using Fleiss' kappa [20], followed by the proposed ensemble-based model for sentiment classification.
3.1 Corpus Creation
3.1.1 Corpus Acquisition
We extracted publicly available Marathi tweets from Twitter with the help of the Twitter API. Initially, we collected 1493 general Marathi tweets.
3.1.2 Data Preprocessing
Initially, the data were pre-processed into the necessary form through the following steps:
Identified and eliminated duplicate and irrelevant tweets manually.
Identified and transliterated English words present in tweets into Marathi manually.
Removed stop words.
Performed lemmatization to find the root word.
Removed any incorrect punctuation, smileys, hashtags, or photo tags.
Removed complicated sentences, since they are inappropriate for performing sentiment analysis.
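The automatable parts of these steps can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the stop-word set is a tiny hypothetical sample, the function names are invented for the example, and the manual steps (transliteration, relevance filtering, removal of complicated sentences) and lemmatization are left out.

```python
import re

# Hypothetical, deliberately tiny stop-word sample; a real Marathi list is far longer.
STOP_WORDS = {"आणि", "हा", "ही", "हे", "आहे"}

URL_RE = re.compile(r"https?://\S+")
# \S+ rather than \w+, because \w does not match Devanagari vowel signs.
TAG_RE = re.compile(r"[#@]\S+")
# Keep Devanagari characters (excluding the danda punctuation marks) and whitespace.
NON_TEXT_RE = re.compile(r"[^\u0900-\u0963\u0966-\u097F\s]")


def clean_tweet(text):
    """Strip URLs, hashtags/mentions, emojis, punctuation, and stop words."""
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = NON_TEXT_RE.sub(" ", text)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


def preprocess(tweets):
    """Clean every tweet and drop empty results and exact duplicates."""
    seen, cleaned = set(), []
    for tweet in tweets:
        text = clean_tweet(tweet)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

Duplicates are detected after cleaning, so two tweets that differ only in a hashtag or emoji collapse into one entry.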
3.1.3 Data Annotation
We chose three domain experts, senior scholars with a Ph.D. in Marathi, to perform the manual data annotation. We asked them to tag the Marathi tweets dataset with 1, 0, and -1 to represent the positivity, neutrality, and negativity of Marathi tweets.
3.2 Feature Extraction
Supervised machine learning methods generate output for test data by learning from a pre-defined set of features in the training samples [21]. Because machine learning methods cannot work directly on raw text, feature extraction methods are required to transform text into a vector of features. In this research work, we consider unigrams with Term Frequency-Inverse Document Frequency (TF-IDF) for feature extraction. Mostly, unigrams, i.e., single words, hold important opinions and emotions [22]. For example, in "Camera of this mobile is good", the word "good" expresses an opinion about the camera. It is therefore important to consider the Unigram + TF-IDF model for feature extraction.
The unigram word vectors obtained during the initial stage are used to build a matrix containing all of the tweets, and the unigrams recovered from the matrix are handled as features. The TF-IDF feature matrix is constructed with the features as columns and the tweets as rows. The Lexical TF-IDF is calculated by multiplying each feature column of the TF-IDF feature matrix by its sentiment score. This matrix is used to train the supervised machine learning algorithms.
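A minimal sketch of this feature construction is given below. Several choices are assumptions made for illustration: the TF-IDF variant (relative term frequency times log(N/df)), the toy English tokens, the hypothetical sentiment lexicon, and the default weight of 1.0 for words missing from the lexicon. The paper does not specify any of these.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocabulary, matrix) with tweets as rows and unigrams as columns."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many tweets each unigram occurs.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([
            (counts[tok] / len(doc)) * math.log(n_docs / df[tok]) if doc else 0.0
            for tok in vocab
        ])
    return vocab, rows


def lexical_tfidf(vocab, rows, lexicon, default=1.0):
    """Multiply each feature column by the sentiment score of its unigram."""
    scores = [lexicon.get(tok, default) for tok in vocab]  # default is an assumption
    return [[value * score for value, score in zip(row, scores)] for row in rows]
```

For example, `lexical_tfidf(vocab, rows, {"good": 1.0, "bad": -1.0})` flips the sign of the column for "bad" and leaves the other columns unchanged.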
3.3 Sentiment Classification Approach
To learn and classify, machine learning algorithms rely on a training set that contains the input feature vectors and their class labels. Using this training set, a classification model was created to classify the input material into positive and negative classes [23]. The extracted feature sets are used to train the classifier to determine whether a review in the data set is positive or negative. Ensemble techniques are a type of machine learning methodology that integrates numerous base models to create a single best prediction model.
3.3.1 Logistic Regression (LR)
Logistic regression estimates probabilities using a logistic function, which is the cumulative logistic distribution, to assess the association between a categorical dependent variable and one or more independent variables [24-28]. Logistic regression is a linear approach; however, the logistic function is used to modify the predictions. It is a statistical technique for assessing a dataset that has one or more independent variables that affect the outcomes.
Instead of fitting a regression line, logistic regression fits an "S"-shaped logistic function whose output lies between two limiting values (0 and 1). Logistic regression starts with a conventional linear regression and then applies a sigmoid to the linear regression result. The regression is expressed in Eq. (1) and the logistic function in Eq. (2).
$$z = w_0 x_0 + w_1 x_1 + w_2 x_2 + \dots + w_N x_N \tag{1}$$
where $w_i$ indicates the weights and $x_i$ represents the independent variables.
$$h(z) = \frac{1}{1 + \exp(-z)} \tag{2}$$
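As a small worked illustration of Eqs. (1) and (2), the sketch below computes the linear score and passes it through the logistic function. The weights, the feature values, and the 0.5 decision threshold are arbitrary assumptions chosen for the example; they are not fitted values from this study.

```python
import math


def linear_score(weights, features):
    """Eq. (1): z = w_0 x_0 + w_1 x_1 + ... + w_N x_N."""
    return sum(w * x for w, x in zip(weights, features))


def sigmoid(z):
    """Eq. (2): h(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features, threshold=0.5):
    """Return (label, probability); label is 1 when h(z) reaches the threshold."""
    probability = sigmoid(linear_score(weights, features))
    return (1 if probability >= threshold else 0), probability


# x_0 = 1.0 plays the role of the bias input.
label, probability = predict([0.5, 2.0, -1.0], [1.0, 0.8, 0.3])
```

Here z = 0.5 + 1.6 - 0.3 = 1.8, so h(z) is roughly 0.858 and the example is assigned to class 1.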
3.3.2 Stochastic Gradient Descent (SGD)
Stochastic Gradient Descent (SGD) is a straightforward but highly efficient method for fitting linear classifiers and regressors under convex loss functions. SGD has been applied effectively to large-scale and sparse machine learning problems, such as text categorization and NLP. Given the sparsity of the data, the classifiers in this module can efficiently scale to problems with more than 10^5 training instances and more than 10^5 features. The class SGDClassifier provides a simple stochastic gradient descent learning process that supports various classification loss functions and penalties. An SGDClassifier trained with the hinge loss has a decision boundary comparable to that of a linear SVM.
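To make the update rule concrete, the following pure-Python sketch trains a linear classifier with SGD on the L2-regularised hinge loss. It is not the scikit-learn SGDClassifier implementation; the learning rate, regularisation strength, epoch count, and toy data are assumptions chosen only for illustration.

```python
import random


def train_sgd_hinge(X, y, epochs=50, lr=0.1, alpha=0.01, seed=0):
    """Fit weights w and bias b; labels in y must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Gradient step for the L2 penalty: shrink the weights slightly.
            w = [wj * (1.0 - lr * alpha) for wj in w]
            # Hinge loss is active only when the margin y * score is below 1.
            if y[i] * score < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


X = [[2.0, 0.0], [1.5, 0.2], [0.0, 2.0], [0.1, 1.5]]
y = [1, 1, -1, -1]
w, b = train_sgd_hinge(X, y)
```

Only samples that violate the margin change the weights, which is why the resulting boundary behaves like that of a linear SVM.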
3.3.3 Support Vector Machine (SVM)
The Support Vector Machine (SVM) is a well-known supervised machine learning model for
categorization and prediction of different datasets. Several studies claim that SVM is a fairly accurate
approach for text categorization. It is also often used in sentiment analysis.
For example, if we have a dataset that has been pre-labelled into two categories, positive and negative, as in Fig. 1, we may train a model to classify real-time data into these two categories. This is precisely how SVM operates. We train the model on a dataset so that it can evaluate and classify unknown data into the categories that were present in the training set.
3.3.4 Naive Bayes (NB)
The Naive Bayes classifier is a prominent supervised classifier that can assign positive, negative, and neutral sentiments to content. To classify words into their respective categories, the Naive Bayes classifier employs conditional probability. The advantage of using Naive Bayes for text classification is that it requires only a small dataset for training. The raw data is pre-processed by removing stop words, punctuation marks, extra spaces, and special symbols, and by transliterating words from other languages. Human annotators manually tag the words with positive, negative, and neutral labels.
It can be beneficial for determining the likelihood of each statement using sentiment. In this technique, each attribute helps in selecting which label should be allocated to the emotion value of each phrase. The Naive Bayes classifier starts by computing the prior probability of each label, which is derived from the occurrence of each labelled statement in the training data set. Eq. (3) describes Bayes' rule.
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \tag{3}$$
where A is a particular class, B is the sentence that needs to be classified, P(A) is the prior probability of the class, P(B) is the probability of the sentence, P(B | A) is the likelihood, and P(A | B) is the posterior probability.
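The sketch below applies Eq. (3) in the usual multinomial Naive Bayes form. It assumes Laplace (add-one) smoothing and whitespace tokenisation, neither of which is stated in the paper, and it uses toy English sentences. P(B) is omitted because it is the same for every class and does not change which class scores highest.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log P(A) + sum of log P(word | A)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in class_counts.items():
        logp = math.log(n_docs / total_docs)  # log prior P(A)
        total_words = sum(word_counts[label].values())
        for word in doc.split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing avoids zero likelihoods.
                logp += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


model = train_nb(
    ["good camera", "great phone", "bad battery", "poor screen"],
    [1, 1, -1, -1],
)
```

Working in log space keeps the product of many small probabilities from underflowing on longer sentences.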
3.3.5 K-Nearest Neighbour
K-Nearest Neighbours (KNN) is an important classification technique in machine learning. It is a supervised learning algorithm that is widely used in text classification. It is extensively applicable in real-world circumstances since it is non-parametric, which means it makes no underlying assumptions regarding the data distribution. We are given some previous data (also known as training data) in which data points are assigned to categories based on a characteristic; a new point is then assigned to the category that is most common among its nearest training points.
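A minimal sketch of this idea follows. The Euclidean distance, the value k = 3, and the two-dimensional toy points are assumptions for illustration; in the setting of this paper the points would be TF-IDF feature vectors.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to x.
    distances = sorted(
        (math.dist(point, x), label) for point, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [-1, -1, -1, 1, 1, 1]
```

There is no training step: all work happens at prediction time, which is why the method makes no assumption about how the data are distributed.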
3.3.6 Ensemble Classifier
The purpose of ensemble techniques is to integrate the predictions of numerous base estimators built with a specific learning algorithm in order to increase the classifier's accuracy and resilience. The idea behind the Voting Classifier is to integrate conceptually distinct machine learning classifiers and predict the class labels using a majority vote or the average predicted probability (soft vote). Such a classifier can be effective for balancing out the individual flaws of a set of equally well-performing classifiers.
Fig. 2 shows an ensemble-based sentiment classification approach using supervised machine learning algorithms. The algorithms implemented in this research work are Support Vector Machine (SVM), Naïve Bayes (NB), k-Nearest Neighbour (KNN), Neural Network, Decision Tree (DT), Logistic Regression (LR), Stochastic Gradient Descent (SGD), and the proposed ensemble-based model.
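The accuracy-weighted voting used by the proposed ensemble can be sketched in Python as follows. This is an interpretation of the algorithm listing in this section, under stated assumptions: each base classifier is modelled as a callable that returns True for a positive prediction, each weight is the classifier's accuracy divided by the sum of all accuracies, and the toy classifiers and accuracy values are hypothetical.

```python
def ensemble_scores(tweet, classifiers, accuracies):
    """Return (pos_score, neg_score) for one tweet.

    classifiers: non-empty list of callables returning True for 'positive'.
    accuracies:  accuracy of each classifier, in the same order.
    """
    votes = [clf(tweet) for clf in classifiers]
    pos_count = sum(votes)
    neg_count = len(votes) - pos_count

    # Share of classifiers voting for each class.
    prob_pos = pos_count / (pos_count + neg_count)
    prob_neg = neg_count / (pos_count + neg_count)

    # Weight of each classifier: its accuracy over the sum of all accuracies.
    total_accuracy = sum(accuracies)
    weights = [acc / total_accuracy for acc in accuracies]

    pos_score = sum(w * prob_pos for w, vote in zip(weights, votes) if vote)
    neg_score = sum(w * prob_neg for w, vote in zip(weights, votes) if not vote)
    return pos_score, neg_score


# Hypothetical stand-ins for trained base classifiers.
toy_classifiers = [
    lambda text: "good" in text,
    lambda text: "bad" not in text,
    lambda text: False,
]
pos, neg = ensemble_scores("good camera", toy_classifiers, [0.9, 0.8, 0.7])
```

Taking the class with the larger score as the final label is one natural reading; the listing itself only returns the two scores.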
Figure 1: Optimal hyperplane
Algorithm: Sentiment analysis using the proposed ensemble-based algorithm
Input: An annotated Marathi tweet dataset
    A list of tweets T containing positive and negative sentences,
    T = {T_p1, T_p2, ..., T_pi, T_n1, T_n2, ..., T_ni},
    where T_pi denotes the positive sentences and T_ni denotes the negative sentences.
    Pos_count and Neg_count are the positive count and negative count, respectively.
Output: Pos_score_i and Neg_score_i, which contain the sentiment scores
Sentiment_polarity_score(dataset)
1. for each T_i in T do
2.     Pos_count_i = 0;
3.     Neg_count_i = 0;
4.     for each classifier C_i in ensemble_classifier do
5.         if C_i predicts positive then
6.             Pos_count_i += 1;
7.         else
8.             Neg_count_i += 1;
9.         [if-else end]
10.    [for end]
11. [for end]
12. for each classifier C_i in ensemble_classifier do
13.    Weight_Ci = accuracy_Ci / (sum of the accuracies of all learning classifiers in the ensemble classifier)
14. [for end]
15. for each T_i in T do
16.    Pos_score_i = 0;
17.    Neg_score_i = 0;
18.    for each classifier C_i in ensemble_classifier do
19.        if C_i predicts positive then
20.            Pos_score_i += Weight_Ci * prob(pos_i);
21.        else
22.            Neg_score_i += Weight_Ci * prob(neg_i);
(Continued)
Algorithm: (continued)
23.        [if-else end]
24.    [for end]
25.    return Pos_score_i, Neg_score_i
26. [for end]
Calculating probability:
$$\mathrm{prob}(pos_i) = \frac{Pos\_count_i}{Pos\_count_i + Neg\_count_i}$$
$$\mathrm{prob}(neg_i) = \frac{Neg\_count_i}{Pos\_count_i + Neg\_count_i}$$
Figure 2: Ensemble based sentiment classification approach
4 Performance Evaluation
4.1 Data Annotation: Inter-annotator Agreement Score
We employed the Fleiss' kappa inter-annotator agreement score to evaluate the agreement among annotators in the manual data annotation. The Fleiss' kappa score is calculated using the formula below (Wik21).
$$\kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e} \tag{4}$$
where the factor $1 - \bar{P}_e$ represents the degree of agreement that is attainable above chance, and $\bar{P} - \bar{P}_e$ gives the degree of agreement that was actually achieved above chance. If the evaluators are in complete agreement, κ = 1, and if there is no agreement among the evaluators (other than what would be expected by chance), κ = 0. For the Marathi tweets dataset, the inter-annotator agreement score is κ = 0.957, which indicates almost perfect agreement. Tab. 1 shows the inter-annotator agreement scores, and Tab. 2 shows the statistics for the Marathi tweets dataset after preprocessing and data annotation. The inter-annotator agreement scores and the dataset statistics are shown graphically in Figs. 3 and 4, respectively.
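For reference, Fleiss' kappa can be computed from a table of per-item category counts as sketched below. The function name and the toy rating table are assumptions for illustration; the rating table is not taken from the annotated Marathi dataset.

```python
def fleiss_kappa(ratings):
    """ratings[i][j] = number of annotators who put item i in category j.

    Every item must be rated by the same number of annotators (at least 2).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])

    # Proportion of all assignments that fall in each category.
    p_category = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Extent of agreement on each individual item.
    p_item = [
        (sum(count * count for count in row) - n_raters)
        / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_item) / n_items              # mean observed agreement
    p_e = sum(p * p for p in p_category)       # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Toy table: 4 tweets, 3 annotators, categories (positive, negative, neutral).
toy_ratings = [[2, 1, 0], [0, 3, 0], [3, 0, 0], [1, 2, 0]]
kappa = fleiss_kappa(toy_ratings)
```

In the toy table the annotators agree fully on two tweets and split 2 to 1 on the other two, which gives a kappa of one third.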
Table 1: Inter-annotator agreement score
| Annotator (i with j) | Fleiss' kappa score |
|---|---|
| A12 | 0.953 |
| A23 | 0.965 |
| A13 | 0.954 |
| Avg. agreement score | 0.957 |
Table 2: The statistics for the Marathi tweets dataset after preprocessing and data annotation
| Sr. no. | Statistics | No. of tweets |
|---|---|---|
| 1 | Initial | 1493 |
| 2 | After preprocessing | 1069 |
| 3 | Positive | 627 |
| 4 | Negative | 430 |
| 5 | Neutral | 12 |
Figure 3: Graphical representation of inter-annotator agreement score (Fleiss' kappa)
Figure 4: Graphical representation of statistics for the Marathi tweets dataset
4.2 Performance of Sentiment Classification Approach
We concentrated on a three-class problem in the experiment: positivity, neutrality, and negativity. Using the Twitter API, we retrieved Marathi tweets. The Marathi tweets dataset is classified into three groups depending on the sentiment expressed in the statements: a tweet is labelled 1 if the expressed attitude is positive, 0 if it is neutral, and -1 if it is negative. The dataset is partitioned in a 75:25 ratio into training and testing sets. The dataset is subjected to different preprocessing methods, including data cleaning; removal of URLs, hashtags, unnecessary blank spaces, emojis, and stop words; and lemmatization. k-fold cross-validation with k = 5 and k = 10 was also employed.
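The k-fold procedure can be illustrated with a small index generator. The sketch assumes contiguous, unshuffled folds; the paper does not state whether the folds were shuffled or stratified, so this is only one possible arrangement.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [
        n_samples // k + (1 if fold < n_samples % k else 0) for fold in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 1069 is the number of tweets remaining after preprocessing (Tab. 2).
folds = list(k_fold_indices(1069, 10))
```

Each tweet appears in exactly one test fold, so every sample is used for testing once and for training k - 1 times.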
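The evaluation metrics used are F-score and accuracy, which are calculated as below.
$$\mathrm{Recall} = \frac{TP_{Sentiment}}{TP_{Sentiment} + FN_{Sentiment}} \tag{5}$$
$$\mathrm{Precision} = \frac{TP_{Sentiment}}{TP_{Sentiment} + FP_{Sentiment}} \tag{6}$$
$$\mathrm{F\text{-}Score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{7}$$
$$\mathrm{Accuracy} = \frac{TP_{Sentiment} + TN_{Sentiment}}{TP_{Sentiment} + FN_{Sentiment} + FP_{Sentiment} + TN_{Sentiment}} \tag{8}$$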
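Eqs. (5) to (8) translate directly into code. The sketch below treats one label as the positive class and everything else as negative; the function name and the toy label lists are assumptions for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute recall, precision, F-score, and accuracy for one target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    recall = tp / (tp + fn) if tp + fn else 0.0            # Eq. (5)
    precision = tp / (tp + fp) if tp + fp else 0.0         # Eq. (6)
    f_score = (
        2 * precision * recall / (precision + recall)      # Eq. (7)
        if precision + recall else 0.0
    )
    accuracy = (tp + tn) / (tp + fn + fp + tn)             # Eq. (8)
    return {
        "recall": recall,
        "precision": precision,
        "f_score": f_score,
        "accuracy": accuracy,
    }


metrics = classification_metrics(
    [1, 1, 1, -1, -1, -1, 1, -1],
    [1, 1, -1, -1, -1, 1, 1, -1],
)
```

In this toy case there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four metrics equal 0.75.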
We analysed comparative results from the base classifiers, the majority voting ensemble, and the developed ensemble classifier. The proposed ensemble classifier's performance is compared to that of the individual conventional classifiers and the majority voting ensemble classifier. Tab. 3 displays the results. On the Marathi dataset, the proposed ensemble classifier outperformed the stand-alone classifiers and the majority voting ensemble classifier.
A classification model may be assessed using a variety of metrics, the most basic of which are accuracy and F-score. Tab. 3 shows the performance evaluation of the individual classifiers with k-fold validation. A graphical representation of the performance evaluation of the individual classifiers with k-fold validation is shown in Figs. 5 and 6.
Table 3: Performance evaluation of individual classifier with k-fold validation (Unigram + TF-IDF features)
| Sr. no. | Classifier | Accuracy (k = 5) | F-score (k = 5) | Accuracy (k = 10) | F-score (k = 10) |
|---|---|---|---|---|---|
| 1 | SVM | 92.46% | 96.00% | 91.89% | 95.65% |
| 2 | Multinomial Naïve Bayes | 90.76% | 95.15% | 89.53% | 94.44% |
| 3 | K-Nearest Neighbour | 91.98% | 95.70% | 89.63% | 94.35% |
| 4 | Neural Network | 93.40% | 96.44% | 92.83% | 96.01% |
| 5 | Decision Tree | 91.71% | 91.84% | 90.90% | 92.94% |
| 6 | Logistic Regression | 90.76% | 95.15% | 88.97% | 94.15% |
| 7 | Stochastic Gradient Descent | 95.47% | 97.37% | 96.13% | 97.62% |
| 8 | Ensemble Classifier | 96.77% | 98.73% | 97.77% | 97.89% |
We performed 5-fold cross-validation (cv) on the dataset. For the individual classifiers Support Vector Machine (SVM), Multinomial Naïve Bayes (MNB), K-Nearest Neighbour (KNN), Neural Network (ANN), Decision Tree (DT), Logistic Regression (LR), and Stochastic Gradient Descent (SGD), we obtained accuracies of 92.46%, 90.76%, 91.98%, 93.40%, 91.71%, 90.76%, and 95.47%, respectively; the ensemble classifier performed better, with an accuracy of 96.77% and an F-score of 98.73%. With 10-fold cross-validation on the dataset, the individual classifiers SVM, MNB, KNN, ANN, DT, LR, and SGD obtained accuracies of 91.89%, 89.53%, 89.63%, 92.83%, 90.90%, 88.97%, and 96.13%, respectively; the ensemble classifier again performed better, with an accuracy of 97.77% and an F-score of 97.89% on the Marathi tweets dataset.
4.3 Result Discussions
This is the first attempt to develop and evaluate a machine learning-based ensemble classifier for Marathi. Because there are no prior results for the same language, we compared our model with results for Hindi and Konkani, since these languages have been studied for sentiment analysis using machine learning algorithms and are also written in the Devanagari script. For Hindi, the authors employed machine learning techniques such as Naive Bayes, Decision Tree, and Support Vector Machine (SMO) using the Weka tool and reached accuracies of 50.95%, 54.48%, and 51.07% on an electronics product review dataset [25]. For Konkani, the authors used a dataset of Konkani poetry with Naive Bayes classification and attained an accuracy of 82.67% [26-28]. In comparison, our ensemble-based classifier obtained better classification results of 96.77% and 97.77% for 5-fold and 10-fold cv, respectively.
Figure 5: Graphical representation of performance evaluation of individual classifier with 5-fold validation
Figure 6: Graphical representation of performance evaluation of individual classifier with 10-fold validation
5 Conclusions
This research work presents a benchmarked technique for sentiment analysis of Marathi, an Asian regional language. For this purpose, we created an annotated corpus of Marathi tweets and performed manual data annotation with the help of domain experts, who labelled the tweets as positive, neutral, or negative with polarity scores of 1, 0, and -1. To evaluate the manually annotated corpus, we used Fleiss' kappa (inter-annotator agreement score) and achieved an average kappa score of κ = 0.957, which indicates almost perfect agreement among the annotators. In the ensemble-based sentiment classification experiments, the ensemble classifier outperformed the other machine learning classifiers on the Marathi tweets dataset, with an accuracy of 96.77% and an F-score of 98.73% for 5-fold cross-validation (cv), and an accuracy of 97.77% and an F-score of 97.89% for 10-fold cross-validation.
Acknowledgement: The authors wish to express their thanks to one and all who supported them during
this work.
Funding Statement: This paper was supported by Wonkwang University in 2022.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
References
[1] R. Biswarup, G. Avishek and S. Ram, "An ensemble-based hotel recommender system using sentiment analysis and aspect categorization of hotel reviews," Applied Soft Computing, vol. 98, no. 17, pp. 106-119, 2021.
[2] K. Dashtipour, C. Ieracitano, M. Carlo, A. Raza and A. Hussain, "An ensemble based classification approach for Persian sentiment analysis," in Progresses in Artificial Intelligence and Neural Systems, Smart Innovation, Systems and Technologies, Springer, Singapore, vol. 184, no. 3, pp. 207-215, 2021.
[3] M. Ghosh and G. Sanyal, "An ensemble approach to stabilize the features for multi-domain sentiment analysis using supervised machine learning," Journal of Big Data, vol. 5, no. 1, pp. 123-138, 2018.
[4] K. Sarkar and M. Bhowmick, "Sentiment polarity detection in Bengali tweets using multinomial naïve Bayes and support vector machines," in IEEE Calcutta Conf. (CALCON), India, pp. 31-35, 2017.
[5] A. Kannan, G. Mohanty and R. Mamidi, "Towards building a sentiwordnet for Tamil," in Proc. of the 13th Int. Conf. on Natural Language Processing, India, pp. 30-35, 2016.
[6] M. G. Jhanwar and A. Das, "An ensemble model for sentiment analysis of Hindi-English code-mixed data," in Workshop on Humanizing AI (HAI). Stockholm, Sweden: IJCAI, 2018.
[7] R. Gayathri, R. Vincent, M. Rajesh, A. K. Sivaraman and A. Muralidhar, "Web-acl based dos mitigation solution for cloud," Advances in Mathematics Scientific Journal, vol. 9, no. 7, pp. 5105-5113, 2020.
[8] D. M. Mathews and S. Abraham, "Twitter data sentiment analysis on a Malayalam dataset using rule-based approach," In: S. N., P. L., N. H., H. P., and N. N. (Ed.), Emerging Research in Computing, Information, Communication and Applications, Springer, Singapore, vol. 906, pp. 407-415, 2019.
[9] A. Oscar, C. P. Ignacio, J. Fernando and I. Carlos, "Enhancing deep learning sentiment analysis with ensemble techniques in social applications," Expert Systems with Applications, vol. 77, no. 12, pp. 236-246, 2017.
[10] M. Ganga, N. Janakiraman, A. K. Sivaraman, A. Balasundaram, R. Vincent et al., "Survey of texture based image processing and analysis with differential fractional calculus methods," in Int. Conf. on System, Computation, Automation and Networking (ICSCAN), IEEE Xplore, Puducherry, India, pp. 1-6, 2021.
[11] B. Pang, L. Lee and S. Vaithyanathan, "Thumbs up? Sentiment classification using machine learning techniques," in Proc. of the ACL-02 Conf. on Empirical Methods in Natural Language Processing (EMNLP '02), USA, pp. 79-86, 2002.
[12] D. Kothandaraman, A. Balasundaram, R. Dhanalakshmi, A. K. Sivaraman, S. Ashokkumar et al., "Energy and bandwidth based link stability routing algorithm for IoT," Computers Materials & Continua, vol. 70, no. 2, pp. 3875-3890, 2021.
[13] P. Sharma and T. S. Moh, "Prediction of Indian election using sentiment analysis on Hindi twitter," in IEEE Int. Conf. on Big Data (Big Data), Japan, pp. 1966-1971, 2016.
[14] Y. Qiang, Z. Zhang and R. Law, "Sentiment classification of online reviews to travel destinations by supervised machine learning approaches," Expert Systems with Applications, vol. 36, no. 3, pp. 6527-6535, 2009.
[15] A. Balasundaram, G. Dilip, M. Manickam, A. K. Sivaraman, K. Gurunathan et al., "Abnormality identification in video surveillance system using dct," Intelligent Automation & Soft Computing, vol. 32, no. 2, pp. 693-704, 2021.
[16] S. Rushdi, "Experiments with SVM to classify opinions in different domains," Expert Systems with Applications, vol. 38, no. 12, pp. 14799-14804, 2011.
[17] A. Q. Md, D. Agrawal, M. Mehta, A. K. Sivaraman and K. F. Tee, "Time optimization of unmanned aerial vehicles using an augmented path," Future Internet, MDPI, vol. 13, no. 12, 308, pp. 1-13, 2021.
[18] S. S. Prasad, J. Kumar, D. K. Prabhakar and S. Tripathi, "Sentiment mining: An approach for Bengali and Tamil tweets," in Ninth Int. Conf. on Contemporary Computing (IC3 (pp. 1-4), Noida, IEEE, pp. 263-278, 2016.
[19] S. Rani and P. Kumar, "A journey of Indian languages over sentiment analysis: A systematic review," Springer, vol. 6, no. 2, pp. 1415-1462, 2018.
[20] V. C. Joshi and V. M. Vekariya, "An approach to sentiment analysis on Gujarati tweets," Advances in Computational Sciences and Technology, vol. 10, no. 5, pp. 1487-1493, 2017.
[21] R. Gayathri, A. Magesh, A. Karmel, R. Vincent and A. K. Sivaraman, "Low cost automatic irrigation system with intelligent performance tracking," Journal of Green Engineering, vol. 10, no. 12, pp. 13224-13233, 2020.
[22] Y. Sharma, V. Mangat and M. Kaur, "A practical approach to sentiment analysis of Hindi tweets," in 1st Int. Conf. on Next Generation Computing Technologies (NGCT-2015), Dehradun, India, vol. 19, no. 2, pp. 677-680, 2015.
[23] R. Vincent, P. Bhatia, M. Rajesh, A. K. Sivaraman and M. S. S. Al Bahri, "Indian currency recognition and verification using transfer learning," International Journal of Mathematics and Computer Science, vol. 15, no. 4, pp. 1279-1284, 2020.
[24] L. Yang, B. Jian-Wu and F. Zhi-Ping, "A method for multi-class sentiment classification based on an improved one-vs-one (OVO) strategy and the support vector machine (SVM) algorithm," Information Science, vol. 23, no. 8, pp. 38-52, 2017.
[25] M. S. Akhtar, A. Ekbal and P. Bhattacharyya, Aspect based sentiment analysis: Category detection and sentiment classification for Hindi. Vol. 23. Cham: Springer, pp. 246-257, 2018.
[26] A. Rajan and A. Salgaonkar, Sentiment analysis for Konkani language: Konkani poetry, a case study. vol. 9. Singapore: Springer, pp. 321-329, 2020.
[27] W. Sun, G. Z. Dai, X. R. Zhang, X. Z. He and X. Chen, "TBE-Net: A three-branch embedding network with part-aware ability and feature complementary learning for vehicle re-identification," IEEE Transactions on Intelligent Transportation Systems, vol. 32, no. 9, pp. 1-13, 2021.
[28] W. Sun, L. Dai, X. R. Zhang, P. S. Changa and X. Z. He, "RSOD: Real-time small object detection algorithm in UAV-based traffic monitoring," Applied Intelligence, vol. 83, no. 12, pp. 1-16, 2021.