Sentiment Analysis Techniques for Social Media Data: A Review

Authors:
  • iNurture Education Solution Ltd

Dipti Sharma1, Dr. Munish Sabharwal2, Dr. Vinay Goyal3, and Dr. Mohit Vij4
1, 2, 3 Chandigarh University, Mohali, Punjab, India
4 Skyline University, Dubai, UAE
1dipi.sharma150@gmail.com, 2smunish.cse@cumail.in
3hod.cse@cumail.in, 4dr.mohit.vij@gmail.com
Abstract. The world is becoming more digital day by day. Users of social websites generate a large amount of data that plays an essential role in decision making. It is impossible to read all of this text, so sentiment analysis makes the task easier by assigning a polarity to the text and classifying it into positive and negative classes. The classification task can be performed using different algorithms, which result in different levels of accuracy. The purpose of this survey is to provide an overview of various methods that deal with sentiment analysis. The review also presents a comparative analysis of various sentiment analysis techniques with their performance measurements.
Keywords: Sentiment Analysis, Opinion Mining, Decision Making.
1 Introduction
Sentiment analysis (SA) is the process of studying public opinion about an entity. The term opinion mining is used interchangeably with SA. An opinion is a person's judgment regarding an entity; it varies from one person to another and reflects the choice of the opinion holder [1]. In this era, social media is an important platform for communication and interaction. Many people also find new information on social media, which has made it a treasure trove of information.
Sentiment analysis plays an important role in decision making and in recommender systems [2]. Decision making includes purchasing a product or making an investment. Users are always interested in the experience of their peers when making an investment or purchasing a product. Nowadays there are so many reviews on social media that an investor or a buyer cannot read them all. Sentiment analysis makes this task easier because it describes the polarity of a review, so that a buyer can directly know whether a given review is positive or negative without reading the whole text, which helps in decision making. The three levels of SA are aspect level, sentence level, and document level [3].
To classify sentiment, the following steps need to be followed: data collection, data pre-processing, feature extraction, sentiment classification, and evaluation. Data is collected from various sources in raw form. To find the sentiment, it must be brought into a structured form, which is done by pre-processing the data. After pre-processing, feature extraction is performed. Once the features of the data have been extracted, sentiment classification is performed. Different approaches can be used for classification: lexicon-based, machine learning, and hybrid methods. Section 2 describes the basic steps of SA. Sections 3 and 4 discuss feature extraction and sentiment classification methods, respectively. The comparison table, the discussion, and the conclusion and future work are presented in Sections 5, 6, and 7, respectively.
2 Sentiment Analysis
Sentiment analysis (SA), also commonly known as opinion mining or contextual mining, is used in Natural Language Processing (NLP), computational linguistics, and text analysis to identify, systematically extract, and quantify subjective information [5]. Sentiment analysis is widely applied to the voice of the customer, such as reviews or responses about a product or item. For example, a customer who wants to buy an item online generally reads reviews of that item before buying it, which helps in making the right decision about it [6] [7].
Sentiment analysis uses three terms to define sentiment: the object about which the opinion is given, the features of that object, and the opinion holder who gives an opinion about the object. Sentiment analysis handles various challenges, such as identifying the object, extracting features, and finding the orientation of the opinion. Sentiment analysis performs the classification task at three levels:
Document level
Sentence level
Feature level or aspect level
Fig. 1. Sentiment Analysis Process (stages shown: review data, data pre-processing, feature extraction and selection, features, sentiment classification, sentiment polarity)
Document-level classification is used where the task is to find the overall polarity of a topic irrespective of the opinion holder. Document-level sentiment analysis assumes that the document expresses an opinion about a single entity. This is true for product reviews, movie reviews, etc., where a document expresses an opinion about a single movie or a single product. A sentence is a shorter form of a document, as a collection of sentences makes a document [3]. Sentence-level classification assumes that each sentence holds a single opinion. Here classification includes two subtasks: subjectivity detection and opinion detection. At the feature level or aspect level, the various features of an object are analysed. For example, suppose a customer buys a Samsung mobile phone and observes that the camera quality of the phone is fair but the sound quality is not. Aspect-level analysis is performed to analyse such individual aspects of an entity.
Sentiment analysis includes data pre-processing, feature selection, and classification, and then finds the polarity of the data, as shown in Fig. 1. Data pre-processing includes tokenization, stop word removal, stemming, lemmatization, etc. Tokenization is the task of breaking a sequence of text into individual words called tokens. Stop words are words (is, am, are, in, to, etc.) that do not hold any opinion, so it is beneficial to remove them. Stemming is the task of converting a word’s variant forms to its base form, for example ‘helping’ to ‘help’.
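The pre-processing steps described above can be sketched in a few lines of Python. This is only a minimal illustration: the stop-word list and the suffix rules are small hypothetical examples chosen for this sketch, not the resources used by any of the surveyed works, and a real system would normally rely on an established stemmer or lemmatizer.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"is", "am", "are", "in", "to", "the", "a", "an", "of", "and"}

# Suffixes stripped by this toy stemmer, checked in this order (hypothetical rules).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip one known suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("The camera is helping to capture pictures"))
# ['camera', 'help', 'capture', 'pictur']
```

As the last token shows, a stem produced by suffix stripping need not be a dictionary word; lemmatization is the step that maps a word to a valid base form.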
3 Features Extraction in Sentiment Analysis
Extracting features from text is a basic task in sentiment analysis. In this step, the text is converted into a feature vector using a data-driven approach. Some features commonly used in sentiment analysis (SA) are term presence vs. term frequency, n-gram features, parts of speech, and term position [8].
3.1 Term Presence Vs Term Frequency
“Term frequency” counts how many times a term occurs in the corpus. “Term presence” is a binary-valued feature that indicates whether or not the term occurs in the sentence: 1 represents the presence of the term and 0 represents its absence. Pang and Lee [9] report that term presence is more important than term frequency in sentiment analysis. Rare words have also been observed to carry more reliable information than frequent words. This observation is associated with hapax legomena, that is, words that occur only once in a corpus.
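The difference between the two feature types can be made concrete with a small sketch. The documents and helper names below are hypothetical and serve only to show how a frequency vector, a presence vector, and the hapax legomena of a toy corpus could be computed.

```python
from collections import Counter


def build_vocabulary(documents):
    """Return the sorted list of distinct tokens across all documents."""
    return sorted({token for doc in documents for token in doc.split()})


def term_frequency_vector(document, vocabulary):
    """Count how often each vocabulary term occurs in the document."""
    counts = Counter(document.split())
    return [counts[term] for term in vocabulary]


def term_presence_vector(document, vocabulary):
    """Record 1 if the term occurs in the document, otherwise 0."""
    tokens = set(document.split())
    return [1 if term in tokens else 0 for term in vocabulary]


def hapax_legomena(documents):
    """Return the terms that occur exactly once in the whole corpus."""
    counts = Counter(token for doc in documents for token in doc.split())
    return sorted(term for term, count in counts.items() if count == 1)


docs = ["good good movie", "bad movie"]
vocab = build_vocabulary(docs)
print(vocab)                                  # ['bad', 'good', 'movie']
print(term_frequency_vector(docs[0], vocab))  # [0, 2, 1]
print(term_presence_vector(docs[0], vocab))   # [0, 1, 1]
print(hapax_legomena(docs))                   # ['bad']
```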
3.2 N-gram Features
N-gram features are widely used in NLP. A sequence of n terms that occur together in a text is known as an n-gram. When a single term is taken as a feature it is a unigram; for two terms it is a bigram. Pang and Lee [9] report experiments in which unigrams outperform bigrams for sentiment polarity classification, whereas Dave et al. [10] found that bigrams and trigrams performed better.
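As a brief illustration, n-grams can be generated by sliding a window of n tokens over a tokenized sentence. The function name and the example sentence are assumptions made for this sketch.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "not a good movie".split()
print(ngrams(tokens, 1))  # [('not',), ('a',), ('good',), ('movie',)]
print(ngrams(tokens, 2))  # [('not', 'a'), ('a', 'good'), ('good', 'movie')]
print(ngrams(tokens, 3))  # [('not', 'a', 'good'), ('a', 'good', 'movie')]
```

Bigrams and trigrams keep some word order, which is why a phrase such as "not a good" can be captured as one feature, at the cost of a much larger and sparser feature space.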
3.3 Parts of Speech (POS) Tagging
In the English language, verbs, adjectives, and adverbs mainly carry a person's opinion. POS tagging helps to find these words in a corpus. Adjectives, adverbs, and verbs can be considered as features, and irrelevant words can be removed from the corpus so that the vocabulary size is reduced.
3.4 Negation
When a negation word appears with a word that expresses a positive opinion, it inverts the polarity from positive to negative. For example, in ‘not good movie’ the word ‘good’ has positive polarity, but ‘not’ changes the polarity to negative.
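One common way to expose negation to a classifier is to mark the words that follow a negation word, so that ‘good’ and a negated ‘good’ become different features. The sketch below assumes a small hypothetical list of negation words and a fixed scope of one following token; both are illustrative choices, not something prescribed by the works reviewed here.

```python
# Hypothetical negation word list used only for this illustration.
NEGATION_WORDS = {"not", "no", "never"}


def mark_negation(tokens, scope=1):
    """Prefix the `scope` tokens that follow a negation word with 'NOT_'."""
    marked = []
    remaining = 0  # how many upcoming tokens are still inside the negation scope
    for token in tokens:
        if token.lower() in NEGATION_WORDS:
            marked.append(token)
            remaining = scope
        elif remaining > 0:
            marked.append("NOT_" + token)
            remaining -= 1
        else:
            marked.append(token)
    return marked


print(mark_negation("not good movie".split()))
# ['not', 'NOT_good', 'movie']
```

With this marking, a downstream classifier or lexicon can assign ‘NOT_good’ a negative polarity while ‘good’ stays positive.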
4 Sentiment Analysis Methods
Sentiment analysis methods are machine learning based, lexicon based, or hybrid. The machine learning method uses a labeled dataset in which the polarity of each sentence is already given. Features are extracted from that dataset, and those features help to classify the polarity of an unknown input sentence. Machine learning methods are divided into supervised learning and unsupervised learning.
Fig. 2. Sentiment Analysis Methods
4.1 Supervised learning
This approach is used when labeled data is available for training the model. Supervised learning involves two steps: training the model and prediction [11]. During training, the dataset with its labels is fed to the classification algorithm, which produces a model as output. Test data is then fed into the model to predict the category. Various supervised classification algorithms are described below:
Naïve Bayes. It is a probabilistic classification algorithm. It treats each word as independent and does not consider the position of a term in the sentence. Naïve Bayes is based on Bayes' theorem, which is used to calculate the probability of a label given each term:

$$p(\text{label} \mid \text{feature}) = \frac{p(\text{label})\, p(\text{feature} \mid \text{label})}{p(\text{feature})} \qquad (1)$$

p(label) is the prior probability of the label in the dataset. p(feature | label) is the probability of the feature given the label (the likelihood). p(feature) is the probability that the feature occurs. Goel, A. et al. [12] used the SentiWordNet lexicon with Naïve Bayes, which improved the classification of a Twitter dataset because it provides positive and negative scores for tweets.
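A minimal sketch of a word-count Naïve Bayes classifier is given below to make the use of Bayes' theorem concrete. It is an illustration under stated assumptions, not the implementation used in [12]: the class name and training sentences are hypothetical, add-one (Laplace) smoothing is added so that unseen word and label pairs do not produce zero probabilities, and p(feature) is omitted because it is the same for every label and does not change which label scores highest.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Toy multinomial Naive Bayes over whitespace-separated tokens."""

    def fit(self, documents, labels):
        self.label_counts = Counter(labels)    # used for p(label)
        self.word_counts = defaultdict(Counter)  # used for p(feature | label)
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(doc.split())
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        self.total = len(labels)
        return self

    def log_scores(self, document):
        """Return log p(label) + sum of log p(word | label) for each label."""
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / self.total)
            n_words = sum(self.word_counts[label].values())
            for word in document.split():
                if word not in self.vocabulary:
                    continue  # ignore words never seen during training
                # Add-one smoothing (an assumption of this sketch).
                likelihood = (self.word_counts[label][word] + 1) / (n_words + vocab_size)
                score += math.log(likelihood)
            scores[label] = score
        return scores

    def predict(self, document):
        scores = self.log_scores(document)
        return max(scores, key=scores.get)


train_docs = ["good great movie", "great acting", "bad boring movie", "boring plot"]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("great movie"))   # pos
print(model.predict("boring movie"))  # neg
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.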
Bayesian Network. Because the Naïve Bayes classifier treats each word as independent, it cannot capture semantic relationships between words, whereas a Bayesian network can. A Bayesian network explicitly models the dependency of words on each other. It represents dependencies as a directed acyclic graph in which each node represents a word as a variable and the edges represent the dependencies between the variables. Al-Smadi et al. [13] used Bayesian networks as a sentiment classifier and found results that were competitive with, and sometimes better than, those of other classifiers.
Support Vector Machine (SVM). SVM was originally introduced to solve binary classification problems. It focuses on determining the best hyperplane, which acts as a separator that defines the decision boundary between data points from different classes. The hyperplane should be selected so that it maintains the maximum distance (margin) between the support vectors of the two classes, as shown in Fig. 3. SVM can handle both linear and non-linear classification tasks.
Zainuddin, N. & Selamat, A. [14] used SVM for classification with various weighting schemes such as TF-IDF, term occurrence, and binary occurrence. They used chi-square feature selection for dimensionality reduction and noise removal. Their experiments showed that using chi-square feature selection with SVM improves accuracy.
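To illustrate the chi-square feature selection mentioned above, the following sketch scores a term against a class using the standard 2x2 contingency form of the statistic, N(AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D)). The function name and the toy documents are assumptions for this example; it is not the exact procedure of [14].

```python
def chi_square(documents, labels, term, target_label):
    """Chi-square score of `term` with respect to `target_label`.

    a: documents of the class that contain the term
    b: documents of other classes that contain the term
    c: documents of the class without the term
    d: documents of other classes without the term
    """
    a = b = c = d = 0
    for doc, label in zip(documents, labels):
        has_term = term in doc.split()
        in_class = label == target_label
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


docs = ["good movie", "good plot", "bad movie", "bad plot"]
labels = ["pos", "pos", "neg", "neg"]
print(chi_square(docs, labels, "good", "pos"))   # 4.0
print(chi_square(docs, labels, "movie", "pos"))  # 0.0
```

A term such as ‘good’, which appears only in positive documents, receives a high score, while ‘movie’, which is spread evenly over both classes, scores zero and could be dropped to reduce dimensionality.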
Artificial Neural Network. An Artificial Neural Network (ANN) mimics the neuron structure of the human brain. The basic unit of a neural network is the neuron. An ANN comprises an input layer, a hidden layer, and an output layer. A vector a(i), which denotes the frequency of words in a document, is given as input to a neuron. Each neuron has a corresponding weight A, which is used to calculate its function. The linear function used by the neural network is x(i) = A · a(i). The sign of x(i) is used to determine the class.
In artificial neural networks, training the model consists of two steps: forward propagation and backward propagation. In forward propagation, the input given at the input layer of neurons is multiplied by the weights, which are initially random numbers. Activation functions are used to normalize the output value to between 0 and 1. The output is then compared with the target value; if there is a difference (error) between the two values, backward propagation is performed. During backward propagation, the input is multiplied by the error value so that the weights can be adjusted. Hence learning depends on the error. The authors of [15] used a neural network for face classification, which achieved a high accuracy rate.
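The forward and backward steps described above can be sketched for a single neuron. This is a deliberately simplified illustration with no hidden layer: the sigmoid activation, the learning rate, the number of epochs, the fixed random seed, and the two-feature word-count inputs are all assumptions made for this example.

```python
import math
import random


def sigmoid(x):
    """Squash a real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(weights, bias, a):
    """Forward propagation: weighted sum of the inputs, then activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, a)) + bias)


def train_neuron(inputs, targets, epochs=2000, learning_rate=0.5, seed=0):
    """Train one sigmoid neuron by repeatedly correcting its error."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    weights = [rng.uniform(-0.5, 0.5) for _ in inputs[0]]  # random initial weights
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):
        for a, target in zip(inputs, targets):
            error = target - forward(weights, bias, a)
            # Backward step: each input is multiplied by the error to adjust its weight.
            weights = [w + learning_rate * error * x for w, x in zip(weights, a)]
            bias += learning_rate * error
    return weights, bias


# Each input holds the counts of the words "good" and "bad" in a document.
inputs = [[2, 0], [1, 0], [0, 1], [0, 2]]
targets = [1, 1, 0, 0]  # 1 = positive, 0 = negative
weights, bias = train_neuron(inputs, targets)
print(forward(weights, bias, [1, 0]) > 0.5)  # True
print(forward(weights, bias, [0, 1]) > 0.5)  # False
```

A multi-layer network applies the same idea layer by layer, propagating the error backwards through the hidden layer.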
Vega, L. & Mendez-Vazquez, A. [16] proposed a Dynamic Neural Network (DNN) model that uses competitive and Hebbian learning for the learning process. They compared baseline approaches with the DNN and showed that the DNN performs better than the baseline methods. Patil, S. et al. [17] proposed a technique that uses latent semantic analysis (LSA) with a convolutional neural network (CNN). LSA is a technique for converting words to vectors. Weighting in LSA is performed with the TF-IDF algorithm. Their model achieves 87% accuracy.
Decision tree. It is a tree-like structure in which the non-terminal nodes represent features and the terminal nodes represent labels. The path through the tree is chosen on the basis of a condition at each node. This is a recursive process that ultimately reaches a terminal node, which assigns a label to the input.
The main challenge in building a decision tree is to decide which attribute should be chosen as the root node. This can be solved using a statistical measure such as information gain or the Gini index. A decision tree is a good method for sentiment analysis because it also provides good results on large amounts of data. Commonly used decision tree algorithms are CART, CHAID, and C5.0.
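The choice of the root attribute can be illustrated with a short sketch that computes entropy, the Gini index, and the information gain of splitting on the presence or absence of a word. The helper names and the toy reviews are hypothetical and are not taken from any of the cited algorithms.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def information_gain(documents, labels, word):
    """Entropy reduction obtained by splitting on whether `word` is present."""
    present = [l for d, l in zip(documents, labels) if word in d.split()]
    absent = [l for d, l in zip(documents, labels) if word not in d.split()]
    remainder = sum(
        len(subset) / len(labels) * entropy(subset)
        for subset in (present, absent)
        if subset
    )
    return entropy(labels) - remainder


def best_root(documents, labels):
    """Pick the word with the highest information gain as the root test."""
    vocabulary = sorted({w for d in documents for w in d.split()})
    return max(vocabulary, key=lambda w: information_gain(documents, labels, w))


docs = ["good movie", "good plot", "good acting", "bad movie", "boring plot"]
labels = ["pos", "pos", "pos", "neg", "neg"]
print(best_root(docs, labels))  # good
```

Here the word ‘good’ separates the two classes perfectly, so its information gain equals the entropy of the whole label set and it is selected as the root.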
A decision tree divides the training data hierarchically. The data is divided using a condition on an attribute value, here whether a word is present or absent. The division process continues until the terminal nodes represent a small number of features that are used for the classification task. Kotenko, I. et al. [18] used a decision tree to block inappropriate content on web sites. They used TF-IDF to weight words, which indicates the importance of a word, and a binomial classifier, which indicates whether or not a word belongs to a specific category.
Fig. 3. Structures of SVM, ANN and Decision tree respectively
Rule-Based Classifier. The model produced by a rule-based classifier takes the form of a set of rules. Predictions for new data are derived from these rules. Rules always have the form of an antecedent and a consequent. The IF part (antecedent) on the left-hand side represents the conditions, while the right-hand side (consequent) represents the predicted class. The form of a rule is shown below [19].
{w1 ∧ w2 ∧ w3} → {+|-}
A word in a rule expresses a sentiment, as shown below.
{Good} → {+}, {Bad} → {-}
In text classification, the IF part represents the feature set, which may be term presence, and the THEN part represents the label. The rule-based classifier uses two measures to define a rule: confidence and support. Support is the number of instances in the training dataset that are related to the rule. Confidence is the conditional probability of the label given that the feature set occurs.
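Support and confidence can be computed directly from a labeled training set, as the sketch below shows. It assumes that the support of a rule is the number of training instances that contain every word of the antecedent and carry the label of the consequent; the function name and the toy data are hypothetical.

```python
def rule_support_and_confidence(documents, labels, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support:    instances containing all antecedent words AND labeled `consequent`
    confidence: support divided by the instances containing all antecedent words
    """
    required = set(antecedent)
    matched_labels = [
        label
        for doc, label in zip(documents, labels)
        if required <= set(doc.split())
    ]
    support = sum(1 for label in matched_labels if label == consequent)
    confidence = support / len(matched_labels) if matched_labels else 0.0
    return support, confidence


docs = ["good movie", "good plot", "good but boring", "bad movie"]
labels = ["+", "+", "-", "-"]

support, confidence = rule_support_and_confidence(docs, labels, {"good"}, "+")
print(support, round(confidence, 3))  # 2 0.667

print(rule_support_and_confidence(docs, labels, {"bad"}, "-"))  # (1, 1.0)
```

Rules whose support or confidence falls below chosen thresholds would typically be discarded before the rule set is used for prediction.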
Buddeewong, S. and Kreesuradej, W. [20] proposed the Association Rule-based Text Classifier (ARTC) algorithm. They built two itemsets: one for words that do not overlap with other classes and another for words that do overlap with other classes. They then generated rules from the frequent itemsets using the Apriori algorithm. They reported an accuracy of 95.08%.
4.2 Unsupervised learning
This method is used when reliable labeled data is difficult to obtain. It is easier to collect unlabeled data than labeled data. A sentence is categorized on the basis of keyword lists for each category. Domain-dependent data is easier to analyse using the unsupervised approach. Unnisa, M. et al. [21] performed sentiment analysis using the unsupervised approach, in which tweets were clustered into positive and negative clusters using spectral clustering. In their experiments, spectral clustering outperformed Naïve Bayes, SVM, and Maximum Entropy.
4.3 Lexicon based method
Words that express opinion are the most important for sentiment analysis. A positive opinion is the desired label, whereas a negative opinion is the undesired label for an entity. A lexicon is a collection of predefined words in which a polarity score is associated with each word. It is the simplest approach to sentiment classification. The classifier makes use of a lexicon and performs word matching, which is used to categorize a sentence. The performance of this classification approach depends on the lexicon size. Two approaches are used under the lexicon-based method, as explained in the subsections below.
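The word-matching idea behind the lexicon-based method can be sketched as follows. The lexicon below is a tiny hypothetical example with made-up scores, not an extract from SentiWordNet or any other resource, and the "neutral" outcome for a zero score is an assumption added for this sketch.

```python
# Hypothetical toy lexicon: word -> polarity score.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}


def lexicon_score(sentence, lexicon=LEXICON):
    """Sum the polarity scores of all words found in the lexicon."""
    return sum(lexicon.get(token, 0.0) for token in sentence.lower().split())


def classify(sentence, lexicon=LEXICON):
    """Label a sentence by the sign of its total polarity score."""
    score = lexicon_score(sentence, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("a great movie"))                  # positive
print(classify("terrible plot but good acting"))  # negative
print(classify("the movie"))                      # neutral
```

The last example shows why the lexicon size matters: a sentence whose opinion words are missing from the lexicon cannot be assigned a polarity.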
Dictionary-Based Approach. In the dictionary-based approach, some words are selected as seed words, and these words are used to find synonyms in order to enlarge the word set. Online dictionaries are used for the expansion. Seed words are opinion words that are unique and important in a corpus. The seed words and the newly added words are used as features for performing sentiment analysis. Various dictionaries are available, such as WordNet, SentiWordNet, SentiFul, and SenticNet.
Park, S. & Kim, Y. [22] proposed a method for building a thesaurus lexicon using the dictionary-based approach. They used three online dictionaries to build the thesaurus and stored only the words that co-occurred, which enhanced the reliability of the lexicon. The lexicon is expanded using synonyms and antonyms of the seed words. The expanded thesaurus enhances the accuracy of the classification task. They selected the seed words using the TF-IDF method. They also noted that this approach is time-consuming.
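The expansion of seed words through synonyms and antonyms can be sketched as a breadth-first search over a dictionary. The synonym and antonym tables below are small hypothetical stand-ins for an online dictionary, and the depth limit is an assumption; this is not the procedure of [22].

```python
from collections import deque

# Hypothetical stand-ins for an online dictionary.
SYNONYMS = {"good": ["fine", "nice"], "nice": ["pleasant"], "bad": ["poor"]}
ANTONYMS = {"good": ["bad"], "pleasant": ["unpleasant"]}


def expand_lexicon(seeds, synonyms=SYNONYMS, antonyms=ANTONYMS, max_depth=2):
    """Grow a polarity lexicon from seed words.

    Synonyms inherit the polarity of the word they came from;
    antonyms receive the opposite polarity.
    """
    lexicon = dict(seeds)
    queue = deque((word, 0) for word in seeds)
    while queue:
        word, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the chosen number of steps
        polarity = lexicon[word]
        related = [(s, polarity) for s in synonyms.get(word, [])]
        related += [(a, -polarity) for a in antonyms.get(word, [])]
        for new_word, new_polarity in related:
            if new_word not in lexicon:
                lexicon[new_word] = new_polarity
                queue.append((new_word, depth + 1))
    return lexicon


print(expand_lexicon({"good": 1}))
# {'good': 1, 'fine': 1, 'nice': 1, 'bad': -1, 'pleasant': 1, 'poor': -1}
```

Limiting the depth keeps the expansion close to the seed words, since polarity tends to drift as the chain of synonyms grows longer.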
Corpus-Based Approach. The corpus-based approach finds not only the label of a word but also its context-specific orientation. First a list of seed words is prepared, and then the syntactic patterns of these words are used to find new subjective words in the corpus. A syntactic pattern here means words that occur together with each other. This approach further works in two ways:
Statistical approach.
Semantic approach.
The authors of [23] demonstrate both lexicon-based approaches. They observed that the corpus-based approach with SVM provides high accuracy for light-stemmed data. They also reported that the accuracy of the lexicon-based approach increases as the lexicon grows.
5 Comparison of Different Sentiment Analysis Techniques
The work performed by a number of researchers in the field of sentiment analysis, and the performance they obtained using various techniques, is described in the table below.
Table 1. Comparison of different analysis techniques.

| Author | Description | Technique used | Dataset | Performance measurement |
|---|---|---|---|---|
| Singh, V. et al. (2013) [24] | The authors implemented the SentiWordNet approach with different variations of linguistic features, scoring schemes and aggregation thresholds for sentiment analysis. | Lexicon based & machine learning | Movie review data | SentiWordNet (SWN) = 65%; Naïve Bayes = 82%; SVM = 77% |
| Luo, Z. et al. (2013) [25] | The authors demonstrate how to exploit social and structural textual information of tweets to improve Twitter-based opinion retrieval. | Support Vector Machine | Twitter | Accuracy = 82.52% |
| Socher, R. et al. (2013) [26] | The authors introduced a Sentiment Treebank as a resource for sentiment detection and evaluation. They used a Recursive Neural Tensor Network on the Treebank. | Deep learning | Stanford Sentiment Treebank | Recognition rate = 80.70% |
| Wanxiang, C. et al. (2015) [27] | The authors proposed a framework that adds a sentiment sentence compression (Sent Comp) step before performing aspect-based sentiment analysis. They applied a discriminative conditional random field model, with certain special features, to automatically compress sentiment sentences. | Lexicon based approach | Chinese blog dataset | 88.78% accuracy for No_comp_ssc; 88.04% accuracy for Manual_comp_ssc; 87.95% accuracy for auto_comp_ssc |
| Yan, X. et al. (2015) [28] | The authors developed a Tibetan sentence sentiment tendency judgment system based on maximum entropy and tested it on a corpus containing 10000 Tibetan sentiment sentences. | Maximum Entropy classifier | Blogs | F-value = 82.8% |
| Sharma, Y. et al. (2015) [29] | The authors proposed sentiment analysis of Hindi tweets. | Lexicon based approach | Tweets on JAIHIND and #worldcup2015 | 73.53% accuracy for “JAIHIND”; 81.97% accuracy for “#worldcup2015” |
| Zimbra, D. et al. (2016) [30] | The authors proposed a method for brand-related sentiment analysis using feature engineering and an artificial neural network. | Artificial Neural Network | Twitter dataset | 86% accuracy for the three-class problem; 85% accuracy for the five-class problem |
| Kale, S. et al. (2017) [31] | The authors applied semantic analysis in their work and also compared algorithms. | Naïve Bayes and Maximum Entropy | Tweets | 63.9% accuracy for Naïve Bayes; 27.8% accuracy for Maximum Entropy |
| Jianqiang, Z. et al. (2017) [32] | The authors performed Twitter sentiment analysis by introducing word embeddings obtained by unsupervised learning and integrating them into a deep convolution neural network. | Deep Convolution Neural Network | Stanford Twitter Sentiment dataset | Accuracy = 87.36% |
| Alshari, E. et al. (2018) [33] | The authors used SentiWordNet (SWN) to find the polarity of non-opinion words and proposed a new method, Senti2Vec. | Lexicon based approach | Movie Review dataset | 85.4% accuracy for positive data; 83.9% accuracy for negative data |
| Bandana, R. (2018) [34] | The author described heterogeneous features, both machine learning based and lexicon based, and supervised learning algorithms such as Linear Support Vector Machine and Naïve Bayes for the proposed model. | Hybrid approach (SentiWordNet + Naïve Bayes + Support Vector Machine) | Movie Review dataset | Using 250 training and 100 testing samples: 89% for Naïve Bayes, 76% for SVM. Using 300 training and 150 testing samples: 84% for Naïve Bayes, 79% for SVM |
| Ghosh, M., & Sanyal, G. (2018) [35] | The authors used three feature selection techniques (information gain, chi-square and Gini index) in combination in order to increase the performance of four classifiers. | Sequential Minimal Optimization + Multinomial Naïve Bayes + Random forest + Logistic regression | Movie (IMDb), electronics product, kitchenware | 90.18 (F-measure) for SMO; 88.18 accuracy for MNB; 87.73 accuracy for RF; 87.32 accuracy for LR |
| Sumit, S. et al. (2018) [36] | The authors experimented with sentiment analysis in the Bangla language using continuous bag of words and Word2Vec skip-gram word embedding methods with a word to index model, and also compared them. | Artificial Neural Network | Bangladeshi Facebook pages | 83.79% accuracy for Skip-gram; 82.57% accuracy for CBOW; 54.40% accuracy for Word to Index |
6 Discussion
This review covers the basic concepts of sentiment analysis and the methods used for classification. It has explored various sentiment analysis methods together with their performance parameters. It has been observed that high classification accuracy depends on the quality of the selected features and on the classification algorithm used. In recent years, much work has been done to find semantic relationships using word embedding methods and to perform classification using artificial neural networks [32][36]. Semantic relationships need to be examined because related words mostly express the same polarity. SVM and Naïve Bayes are used by researchers as reference models for comparison with their proposed work. These two algorithms provide high accuracy when combined with feature selection techniques.
The lexicon-based approach is used by researchers to solve sentiment analysis problems because it is scalable and computationally efficient. This approach can solve highly complex tasks and has performed very well in the experiments reported in [27] [29] [33]. It is also observed that researchers mostly used the SentiWordNet lexicon [33] [34] to find the polarity scores of words. The datasets used in the analysis are mostly movie reviews and tweets. Most researchers performed SA on English text, but some researchers [27] [29] [36] used non-English languages, which also gave comparable results. Hence, with respect to the above discussion, we can say that the better the domain-dependent inputs (dataset, lexicon, feature extraction/selection, and classification algorithms), the better the output (results) that can be achieved.
7 Conclusion and future work
This paper discussed methods for sentiment classification and compared the algorithms that different researchers have applied to different datasets, along with their performance measures. It is concluded that Naïve Bayes and SVM are the most frequently used algorithms for classification. These two algorithms are used by researchers for comparison with their proposed work. These studies make it clear that further advances in sentiment classification and feature selection algorithms are still required, and hence this remains an open area of research.
For sentiment analysis, data is taken from blogs and from websites such as Facebook, Twitter, Amazon, Flipkart, etc. People freely express their views on these media about topics, products, and politics. By analysing these reviews, one can extract information about one's own area and make improvements. Although much research has been done in the field of sentiment analysis, it still faces many challenges. Sometimes people express their views in a sarcastic way that is hard to detect. Because of these challenges, sentiment analysis remains an area of research. To improve classification results, deeper, context-based data analysis is required.
References
1. Liu, B. “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data”. Springer, 2006.
2. Tatemura, J. “Virtual reviewers for collaborative exploration of movie reviews”. In Proceedings of the 5th International Conference on Intelligent User Interfaces, pp. 272-275, 2000, ACM.
3. Liu, B. “Sentiment Analysis and Opinion Mining”. In Synthesis Lectures on Human Language Technologies, pp. 1-167, 2012.
4. Maks, I., & Vossen, P. “A lexicon model for deep sentiment analysis and opinion mining applications”. In Decision Support Systems, vol. 53, pp. 680-688, 2012, Springer.
5. Contratres, F. G., Alves-Souza, S. N., Filgueiras, L. V. L., & DeSouza, L. S. “Sentiment Analysis of Social Network Data for Cold-Start Relief in Recommender Systems”. In World Conference on Information Systems and Technologies, pp. 122-132, 2018, Springer, Cham.
6. Neri, F., Aliprandi, C., Capeci, F., Cuadros, M., & By, T. “Sentiment Analysis on Social Media”. In 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 919-926, 2012.
7. Etter, M., Colleoni, E., Illia, L., Meggiorin, K., & D’Eugenio, A. “Measuring organizational legitimacy in social media: Assessing citizens’ judgments with sentiment analysis”. In Business & Society, vol. 57(1), pp. 60-97, 2018.
8. Mejova, Y., & Srinivasan, P. “Exploring feature definition and selection for sentiment classifiers”. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, 2011.
9. Pang, B., & Lee, L. “Opinion Mining and Sentiment Analysis”. In Foundations and Trends in Information Retrieval, vol. 2, pp. 1-135, 2008.
10. Dave, K., Lawrence, S., & Pennock, D. “Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews”. 2003.
11. Kaur, J., & Sabharwal, M. “Spam Detection in Online Social Networks Using Feed Forward Neural Network”. In RSRI Conference on Recent Trends in Science and Engineering, vol. 2, pp. 69-78, 2018.
12. Goel, A., Gautam, J., & Kumar, S. “Real time sentiment analysis of tweets using Naive Bayes”. In 2nd International Conference on Next Generation Computing Technologies (NGCT), pp. 257-216, 2016, IEEE.
13. Al-Smadi, M., Al-Ayyoub, M., Jararweh, Y., & Qawasmeh, O. “Enhancing aspect-based sentiment analysis of Arabic hotels’ reviews using morphological, syntactic and semantic features”. In Inf. Process. Manag., 2018.
14. Zainuddin, N., & Selamat, A. “Sentiment analysis using Support Vector Machine”. In International Conference on Computer, Communications, and Control Technology (I4CT), pp. 333-337, 2014, IEEE.
15. Sachdeva, K., Kaur, A., & Sabharwal, M. “Face Recognition using Neural Network with SURF Technique”. In International Conference on Futuristic Trends in Computing and Networks, vol. 2(1), pp. 256-261, 2018.
16. Vega, L., & Mendez-Vazquez, A. “Dynamic Neural Networks for Text Classification”. In International Conference on Computational Intelligence and Applications (ICCIA), pp. 6-11, 2016, IEEE.
17. Patil, S., Gune, A., & Nene, M. “Convolutional neural networks for text categorization with latent semantic analysis”. In International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS), pp. 499-503, 2017, IEEE.
18. Kotenko, I., Chechulin, A., & Komashinsky, D. “Evaluation of text classification techniques for inappropriate web content blocking”. In 8th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), pp. 412-417, 2015, IEEE.
19. Xia, R., Xu, F., Yu, J., Qi, Y., & Cambria, E. “Polarity shift detection, elimination and ensemble: a three-stage model for document-level sentiment analysis”. In Information Processing and Management, vol. 52, pp. 36-45, 2016.
20. Buddeewong, S., & Kreesuradej, W. “A new association rule-based text classifier algorithm”. In 17th IEEE International Conference on Tools with Artificial Intelligence (ICTAI’05), 2005.
21. Unnisa, M., Ameen, A., & Raziuddin, S. “Opinion Mining on Twitter Data using Unsupervised Learning Technique”. In International Journal of Computer Applications (0975-8887), vol. 148, 2016.
22. Park, S., & Kim, Y. “Building thesaurus lexicon using dictionary-based approach for sentiment classification”. In IEEE 14th International Conference on Software Engineering Research, Management and Applications (SERA), 2016.
23. Abdulla, N. A., Ahmed, N. A., Shehab, M. A., & Al-Ayyoub, M. “Arabic sentiment analysis: Lexicon-based and corpus-based”. In IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT), 2013.
24. Singh, V. K., Piryani, R., Uddin, A., & Waila, P. “Sentiment analysis of Movie reviews
and Blog posts”. In 3rd IEEE International Advance Computing Conference (IACC), pp.
893-898, 2013.
25. Luo, Z., Osborne, M., & Wang, T. “An effective approach to tweets opinion retrieval”. In
Springer journal on World Wide Web, pp. 545566, 2013.
26. Socher R.et al. "Recursive deep models for semanticcompositionality over a sentiment
Treebank." In Proceedings of the Conference on Empirical Methods in Natural Language
Processing (EMNLP), pp. 1631-1642, 2013.
27. Che, W., Zhao, Y., Guo, H., Su, Z., & Liu, T. “Sentence Compression for Aspect-Based
Sentiment Analysis”. In IEEE/ACM Transactions on Audio, Speech, and Language Processing,
Vol. 23, pp. 2111-2124, 2015.
28. Yan, X., & Huang, T. “Tibetan Sentence Sentiment Analysis Based on the Maximum
Entropy Model”. In 10th International Conference on Broadband and Wireless Computing,
Communication and Applications (BWCCA), pp. 594-597, 2015, IEEE.
29. Sharma, Y., Mangat, V., & Kaur, M. “A practical approach to Sentiment Analysis of Hindi
tweets”. In 1st International Conference on Next Generation Computing Technologies
(NGCT), pp. 677-680, 2015, IEEE.
30. Zimbra, D., Ghiassi, M., & Lee, S. “Brand-Related Twitter Sentiment Analysis Using Fea-
ture Engineering and the Dynamic Architecture for Artificial Neural Networks”. In 49th
Hawaii International Conference on System Sciences (HICSS), pp. 1930-1938, 2016,
IEEE.
31. Kale, S., & Padmadas, V. “Sentiment Analysis of Tweets Using Semantic Analysis”. In In-
ternational Conference on Computing, Communication, Control, and Automation
(ICECUBE), 2017, IEEE.
32. Jianqiang, Z., Xiaolin, G., & Xuejun, Z. “Deep Convolution Neural Networks for Twitter
Sentiment Analysis”. In IEEE Access, pp. 23253-23260, 2018.
33. Alshari, E. M., Azman, A., Doraisamy, S., Mustapha, N., & Alkeshr, M. “Effective Method
for Sentiment Lexical Dictionary Enrichment Based on Word2Vec for Sentiment Analysis”.
In Fourth International Conference on Information Retrieval and Knowledge Management
(CAMP), 2018.
34. Bandana, R. “Sentiment Analysis of Movie Reviews Using Heterogeneous Features”. In
2nd International Conference on Electronics, Materials Engineering & Nano-Technology,
2018, IEEE.
35. Ghosh, M., & Sanyal, G. “An ensemble approach to stabilize the features for multi-domain
sentiment analysis using supervised machine learning”. In Journal of Big Data, vol. 5(1),
2018, Springer.
36. Sumit, S. H., Hossan, M. Z., Muntasir, T. A., & Sourov, T. “Exploring Word Embedding
For Bangla Sentiment Analysis”. In International Conference on Bangla Speech and
Language Processing (ICBSLP), 2018, IEEE.