INTERNATIONAL JOURNAL OF MULTIDISCIPLINARY SCIENCES AND ENGINEERING, VOL. 8, NO. 4, JUNE 2017
[ISSN: 2045-7057] www.ijmse.org 28
Abstract: Sentiment analysis and opinion mining are closely
coupled. Extensive research is being carried out in these areas
using different methodologies, which identify the sentiment in a
given text as positive, negative or neutral. Tweets, Facebook
posts, user comments about particular topics, and reviews of
products, software and movies can be a good source of
information. Business executives can apply sentiment analysis
techniques to such data for future planning and forecasting.
Because the data is obtained from multiple sources and depends
directly on users, who can be from any part of the world, noise
is a common issue: spelling mistakes, grammatical errors and
improper punctuation. Different approaches are available for
sentiment analysis that can automatically sort and categorize
the data. These approaches are mainly categorized as machine
learning based, lexicon based and hybrid. A hybrid approach
combines the machine learning and lexicon based approaches and
generally yields better results. In this research work, different
hybrid techniques and tools are discussed and analyzed from
different aspects.
Keywords: Hybrid Technique for Sentiment Analysis, Opinion
Mining, Polarity Detection and Social Media
I. INTRODUCTION
The combination of the lexicon based approach and the
machine learning approach has improved classification
performance compared to either approach alone. Due to the rapid
growth and globalization of the internet, millions of users come
online daily and the amount of user-generated information and
data is increasing at the same pace. The internet has become
necessary for many services and businesses in our daily lives. A
lot of textual data is generated by people using social websites
such as Facebook and Twitter in the form of posts and tweets.
Many websites and blogs today contain a section for user
comments or feedback, so valuable information can also be taken
from these sites to learn the sentiments of users about a
particular topic, or their feedback on a new product or
software. Extracting sentiments from such data can yield
valuable information about a particular topic, movie, product or
service [1]. Several tools and techniques are available nowadays
to extract sentiments from the provided data and classify them
as positive, negative or neutral. Tools and techniques from the
lexicon based approach use domain specific dictionaries and
lexicons as the major source of lookup for sentiment
classification [2]. These lexicons have predefined semantic
orientations that are later compared with the input data set for
classification, as explained by [1]-[7]. The machine learning
based approach, on the other hand, uses supervised learning
algorithms such as Naive Bayes and Support Vector Machine, which
are trained on a labeled data set [8]-[10]. The inputs are then
classified on the basis of the trained model as positive,
negative or any other sentiment [11], [12]. The hybrid approach
uses the combination of both the lexicon based approach and the
machine learning approach. The basic goal of this combination is
to yield the best results by using the effective feature sets of
both lexicon based and machine learning based techniques, and to
overcome the deficiencies and limitations of both approaches.
Many researchers have combined different lexicon based and
machine learning based techniques to build better and more
effective hybrid tools [13]-[17]. In this research, we study,
analyze and compare different hybrid tools and techniques for
sentiment classification, and discuss the feature sets and
accuracies of the studied approaches.
II. HYBRID TOOLS AND TECHNIQUES
A. pSenti
pSenti is a concept-level sentiment analysis tool presented by
[18] that combines lexicon based and learning based sentiment
classification methods. Compared to pure lexicon based methods,
pSenti achieved greater accuracy in sentiment strength detection
and polarity classification. Compared against pure machine
learning based methods, on the other hand, it yielded slightly
lower accuracy. Extensive experiments were performed on two
datasets, the CNet Software Reviews Dataset and the IMDB Movie
Reviews Dataset, to evaluate the proposed approach. The learning
based approach used in the proposed method is responsible not
only for small tasks such as adjusting sentiment values or
detecting sentiment words, but also for evaluating all aspects
of the sentiment system.
Hybrid Tools and Techniques for Sentiment Analysis: A Review
Munir Ahmad1, Shabib Aftab2, Iftikhar Ali3 and Noureen Hameed4
1-4Department of Computer Science, Virtual University of Pakistan
1munirahmad@gmail.com, 2shabib.aftab@gmail.com
The main component of the system measures the given
opinionated text and gives the output in terms of collective
sentiment, such as customer feedback. The final results are
shown as a real-valued score between -1 and +1 that can be
converted at a later stage into either a positive/negative label
or a rating of 1-5 stars. One advantage of the proposed approach
is that the system can be extended at any time by adding new
linguistic rules or by expanding the sentiment lexicon. The
proposed system is also not sensitive to changes in topic. It
works better than SentiStrength [5] and the lexicon-only method,
but its accuracy is slightly lower than that of the
learning-only method.
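The conversion from a real-valued score to a label or star rating can be sketched as follows. This is only an illustration: the function names, the linear rescaling and the optional neutral band are assumptions made here, not the mapping actually used by pSenti.

```python
def to_polarity(score: float, neutral_band: float = 0.0) -> str:
    """Map a score in [-1, 1] to a polarity label.

    `neutral_band` is a hypothetical tolerance around zero; with the
    default of 0.0 only an exact zero is reported as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def to_stars(score: float) -> int:
    """Linearly rescale a score in [-1, 1] to a rating from 1 to 5 stars."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    # -1 maps to 1 star, 0 to 3 stars and +1 to 5 stars.
    return int(round(1 + (score + 1.0) * 2.0))
```

A linear rescaling is the simplest choice; a deployed system could equally fit the star thresholds on labeled reviews.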
B. Combining Lexicon Based and Learning Based
methods for Twitter Sentiment Analysis
For entity level sentiment analysis, [19] used an augmented
lexicon based method. First, they obtained additional opinion
indicators, i.e. words and symbols, by applying the Chi-square
test to the results of the lexicon based method. Additional
opinionated tweets were then identified with the help of these
new indicators. For entities in the newly identified tweets, a
sentiment classifier is employed to assign sentiment polarity.
The output of the lexicon method serves as the training data for
the classifier, so the whole process requires no manual labeling
except for the test set. This research used five datasets based
on the query entities Obama, Harry Potter, Tangled, iPad and
Packer. The proposed method achieved 85.4% accuracy on the five
datasets used in this research. The proposed technique (LMS)
showed a relative improvement over the lexicon based method. It
performed worse than the pure learning based technique, but has
the advantage that it does not require pre-labeled data. The
proposed approach is therefore easy to implement, at some cost
in performance.
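The Chi-square test for finding opinion indicators can be illustrated with a 2x2 contingency table of term presence against the polarity assigned by the lexicon method. The sketch below is a minimal version under stated assumptions: the function names are hypothetical, and the 3.841 threshold (the critical value for one degree of freedom at the 0.05 level) is chosen here for illustration rather than taken from [19].

```python
def chi_square(n11: int, n10: int, n01: int, n00: int) -> float:
    """Chi-square statistic for a 2x2 table.

    n11: positive tweets containing the term
    n10: negative tweets containing the term
    n01: positive tweets without the term
    n00: negative tweets without the term
    """
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    # An empty row or column carries no evidence of association.
    return numerator / denominator if denominator else 0.0


def is_opinion_indicator(stat: float, critical: float = 3.841) -> bool:
    """Treat a term as an indicator when the statistic exceeds the threshold."""
    return stat > critical
```

A term that occurs equally often in both classes scores 0, while a term concentrated in one class scores high and becomes a candidate indicator.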
C. SAIL
Another hybrid methodology was developed by [15]. This study
proposed a system for Twitter and SMS sentiment analysis based
on a hierarchical model, an affective lexicon and a language
modeling approach. The language model did not perform well
alone, but performance improved when it was used together with
the lexicon based model. The hierarchical model proved very
successful using n-grams, affective ratings and part-of-speech
tags. The proposed tool uses an affective lexicon that was
automatically generated from massive corpora of raw web data.
Words and bigrams are used to calculate affective ratings and
statistics. For the unconstrained data, the lexicon models were
combined with a learning classifier based on Max-Ent language
models trained primarily on a large external dataset. These two
classification methods are then combined to produce the final
results. The combination of the two proved to be effective and
yielded better results.
D. NILC_USP
The researchers in [14] describe the NILC_USP system from
SemEval-2013 and propose a three-part classification process
that combines the rule based approach, the lexicon based
approach and the machine learning based approach. The proposed
algorithm has five steps.
Normalization: The first step normalizes the given input
dataset and can also be referred to as pre-processing. It cleans
and normalizes the input text by performing the following
operations:
- Hashtags, URLs and mentions are converted to a
consistent set of codes
- Emoticons are categorized by the expression they
depict (happy, sad, laugh, etc.) and assigned
corresponding codes
- Exaltation signals, such as multiple exclamation
marks, are detected and marked
- Misspelled words are corrected
- Part-of-speech tagging is performed
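The first three normalization operations can be sketched with regular expressions. The code names (`URL`, `MENTION`, `HASHTAG`, `EXALT`, `EMO_*`) and the small emoticon table are assumptions made for illustration, not the codes used by NILC_USP. Spelling correction and part-of-speech tagging are left out because they need external resources.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
EXALT_RE = re.compile(r"!{2,}")  # two or more exclamation marks

# Hypothetical emoticon classes; a real table would be much larger.
EMOTICON_CODES = {
    ":)": "EMO_HAPPY",
    ":-)": "EMO_HAPPY",
    ":(": "EMO_SAD",
    ":-(": "EMO_SAD",
    ":D": "EMO_LAUGH",
}


def normalize(text: str) -> str:
    """Replace URLs, mentions, hashtags, exaltation and emoticons with codes."""
    # URLs are handled first so that "://" is never mistaken for an emoticon.
    text = URL_RE.sub("URL", text)
    text = MENTION_RE.sub("MENTION", text)
    text = HASHTAG_RE.sub("HASHTAG", text)
    text = EXALT_RE.sub(" EXALT ", text)
    # Longer emoticons are replaced before shorter ones.
    for emoticon in sorted(EMOTICON_CODES, key=len, reverse=True):
        text = text.replace(emoticon, f" {EMOTICON_CODES[emoticon]} ")
    return " ".join(text.split())
```

For example, `normalize("@bob loved it!!! :) http://t.co/abc #win")` yields a string made only of plain words and codes, which the later classifiers can treat as ordinary tokens.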
Rule Based Classifier: In this step the pre-processed text is
handed over to the rule based classifier. The only rule applied
by this classifier uses the emoticons present in the given text.
Empirically, the presence of positive emoticons in sentences and
tweets was found to indicate overall positivity in the text.
Likewise, the presence of negative emoticons indicates
negativity in the given text. This step returns the number of
occurrences of positive and negative emoticons.
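A counting rule of this kind is short enough to show in full. The emoticon sets and the whitespace tokenization below are assumptions for illustration only.

```python
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}


def emoticon_rule(text: str) -> tuple[int, int]:
    """Return (positive_count, negative_count) of emoticons in the text."""
    tokens = text.split()  # assumes emoticons are separated by whitespace
    positive = sum(token in POSITIVE_EMOTICONS for token in tokens)
    negative = sum(token in NEGATIVE_EMOTICONS for token in tokens)
    return positive, negative
```

Returning both counts, rather than a single label, leaves the final decision to whichever component combines the classifiers.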
Lexicon-based Classifier: The proposed system uses the lexicon
provided by SentiStrength [5]. This lexicon provides an emotion
vocabulary, an emoticon list, and lists of negation and boosting
words. The algorithm calculates the semantic orientation of
every word in the given text. The polarity of a word is
decreased if the word is negated and increased if the word is
intensified. The classifier then labels the text as positive,
negative or neutral.
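The interplay of lexicon scores, negation and boosting words can be sketched as below. The word lists and weights are toy values invented for this example (they are not the SentiStrength lexicon), and negation is modelled as a sign flip, which is one common choice rather than the documented behaviour of the system in [14].

```python
import re

LEXICON = {"good": 2, "love": 3, "bad": -2, "awful": -3}  # toy values
NEGATIONS = {"not", "never", "no"}
BOOSTERS = {"very": 1, "really": 1, "extremely": 2}


def lexicon_classify(text: str) -> str:
    """Label text by summing word polarities with negation and boosting."""
    total = 0
    negate = False
    boost = 0
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATIONS:
            negate = True
            continue
        if token in BOOSTERS:
            boost += BOOSTERS[token]
            continue
        score = LEXICON.get(token, 0)
        if score:
            # Boosting increases the magnitude and keeps the sign.
            score += boost if score > 0 else -boost
            total += -score if negate else score
        # Modifiers only apply to the next non-modifier word.
        negate, boost = False, 0
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"
```

With these toy lists, "very good" is labelled positive, "not good" negative, and a sentence with no lexicon words neutral.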
Machine Learning Classifier: Machine learning classifiers use
labeled examples to learn how to classify the given text. The
SVM algorithm provided by CLiPS Pattern was used. In the
proposed model, the classifier used bag of words, part-of-speech
sets and the presence of negation in the sentence as its feature
set.
The results of this study showed that a hybrid classifier can
improve on the individual rule based, lexicon based and machine
learning methods by exploiting the advantages of each technique.
E. Combining Lexicon based and Learning based
approaches for improved performance and convenience in
sentiment classification
[16] proposed a hybrid approach to improve the performance of
the sentiment analysis process. The algorithm was implemented in
Python. After pre-processing, the proposed algorithm consists of
three parts. The first part refers to the lexicon based model
and deals with finding the optimum parameters for the
classifier. The second part refers to the learning based model
and deals with determining which model performs better. Lastly,
the third part refers to the hybrid model, which analyzes and
decides the optimal MID ratio.
The Lexicon-Based Model: The lexicon based model does not
require a training set. It requires only a lexicon, from which
the classifier fetches the sentiment and negation words, and the
test set on which the classifier runs. The lexicon used in the
proposed research work was AFINN, as described by [20].
The Learning Based Model: This stage uses the scikit-learn
framework, which provides a pipeline structure that allows
several transformations to be applied to the data in sequence,
ending in a final model that classifies the data. By replacing
the modeling part of the pipeline, different classifiers can be
tested to evaluate which one yields the best results. The
researchers tested three classifiers: Multinomial Naive Bayes,
Bernoulli Naive Bayes and SVM.
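The idea of swapping the modeling step while keeping the rest of the evaluation fixed can be shown without any external library. The sketch below is a standard-library stand-in, not the scikit-learn pipeline used in [16]: `TinyMultinomialNB` and `accuracy` are hypothetical names, and any object exposing the same `fit` and `predict` methods could be passed to `accuracy` in its place.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial Naive Bayes over whitespace tokens with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        # Words never seen in training are ignored.
        tokens = [t for t in doc.lower().split() if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_label in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            log_prob = math.log(n_label / n_docs)  # class prior
            for token in tokens:
                count = self.word_counts[label][token]
                log_prob += math.log((count + self.alpha) / denom)
            scores[label] = log_prob
        return max(scores, key=scores.get)


def accuracy(model, docs, labels) -> float:
    """Fraction of documents for which the model predicts the given label."""
    hits = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return hits / len(docs)
```

Comparing classifiers then amounts to calling `accuracy` once per candidate model on the same held-out data.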
F. A Hybrid approach for sentiment classification of
Egyptian Dialect Tweets
A hybrid approach was proposed by [21] to improve the
performance of sentiment analysis for the Arabic language. This
study focused on sentiment classification of tweets in the
Egyptian dialect. Arabic is one of the most widely used
languages on the web [22]. Many researchers have worked on
Arabic sentiment analysis with different data sets, tools and
algorithms [23].
The researchers carried out the following steps to implement
the hybrid technique:
- Step 1: The features to be used by the machine
learning approach are identified and separated.
- Step 2: The annotated corpus to be used for training
and validation of the best classifier at different
corpus sizes is built by the system.
- Step 3: Sentiment lexicons of different sizes are built
using the annotated corpus.
- Step 4: These different approaches are combined and
tested for better and optimized results.
- Step 5: A simple, straightforward method is devised
to detect negations in the hybrid approach.
The results obtained by this study using the hybrid approach
showed better performance than other sentence-level
classification systems.
G. Sentiment Analysis: A Review and Comparative
Analysis of Web Services
The authors in [24] compared 15 sentiment analysis techniques
and tools, many of which were based on the hybrid approach
(combining machine learning based and lexicon based algorithms).
According to the researchers, tools like Alchemy and Semantria
can be used for any kind of text classification, even if the
texts are large. These tools can be a good option if the text is
ironic. Other tools such as Wingify and Viralheat may not be
good options because of their less effective results; however,
further testing of these tools on different data sets is needed.
The authors point out that sentiment analysis involves many
interlinked and closely coupled tasks that are difficult to
separate clearly, as most of them are quite close to each other
and share common aspects. Some of the important tasks are as
follows:
- Sentiment Classification: Each text, sentence or document
expresses some sentiment, which may be positive, negative or
neutral. Identifying this sentiment is sometimes referred to as
sentiment orientation or sentiment polarity detection, as
described by [25].
- Subjectivity Classification: An objective sentence contains
factual information, while a subjective sentence contains
opinions, emotions, beliefs, etc. Subjectivity detection is a
crucial task in sentiment analysis. It is deemed to be even more
complex than ordinary sentiment classification (positive,
negative or neutral), as explained by [26]-[29].
- Opinion Summarization: This task summarizes the opinions
within a text and detects the major features of an object
discussed in one or multiple documents, as explained by [30].
Other than these three, there are tasks such as opinion
retrieval [31], sarcasm and irony detection [32] and
others [33].
H. Alchemy API
Alchemy API [34] is offered as a service and is used for
enriching text content through automated tagging, semantic
analysis and semantic mining. It is a hybrid tool based on NLP
and machine learning algorithms. It offers features such as
named entity extraction, concept tagging, keyword extraction,
sentiment analysis, relation extraction, automatic language
identification, structured data extraction and many others [35].
IBM acquired Alchemy API in 2015, and this technology is now a
core component of the cognitive APIs offered on IBM’s Watson
Developer Cloud. All the services are accessed via an HTTP REST
interface, and SDKs are available for languages such as Java, C#
and Perl. The usage of Alchemy API for enterprise grade text
analysis is explained in [36]. It classifies the sentiment of
the analyzed text into three categories: positive, negative and
neutral. The degree of sentiment is measured in the range
[-1, 1], and the API supports the English and German languages.
It can perform sentiment analysis at the document, entity or
keyword level, and it is able to detect directional sentiment
for subject-action-object relations.
I. Building Large-Scale Twitter-Specific Sentiment
Lexicon: A Regression Learning Approach
The study [37] proposed TS-Lex, a large scale Twitter specific
lexicon based on a representation learning approach. The
proposed methodology comprised two parts. In the first part, a
representation learning algorithm was used to learn phrase
embeddings, which were later used as features for
classification. In the second part, a seed expansion algorithm
was used. This algorithm expands a small list of sentiment seeds
to obtain the training data for building the phrase-level
classifier. Specifically, a tailored neural architecture was
introduced that integrates the sentiment information of tweets
into its hybrid loss function, and it was used to learn
sentiment-specific phrase embeddings (SSPE). The sentiment
information for SSPE was obtained from the positive and negative
emoticons in the tweets; no manual annotation was performed. To
collect further training data, similar phrases from Urban
Dictionary were used to expand a small list of sentiment seeds,
which were then used to build the phrase-level classifier.
Experimental results showed that TS-Lex outperformed previously
introduced sentiment lexicons and further improved the
top-performing system in SemEval 2013 through feature
combination.
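Using emoticons in place of manual annotation is a form of distant supervision, and the labeling step can be sketched as follows. The emoticon lists are assumptions, as is the choice to strip the emoticons from the text and to discard tweets with mixed or missing signals; the procedure in [37] may differ in these details.

```python
POSITIVE_EMOTICONS = (":)", ":-)", ":D")
NEGATIVE_EMOTICONS = (":(", ":-(")


def distant_label(tweet: str):
    """Return (text_without_emoticons, label), or None if no clear signal."""
    has_positive = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_negative = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_positive == has_negative:
        return None  # either no emoticon at all or conflicting emoticons
    label = "positive" if has_positive else "negative"
    text = tweet
    for emoticon in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        # Remove the emoticon so a model cannot simply memorize it.
        text = text.replace(emoticon, " ")
    return " ".join(text.split()), label
```

Applied to a large tweet stream, a filter like this yields noisy but plentiful labeled examples at no annotation cost.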
J. Sentiment Analysis on Twitter
A hybrid approach was proposed by [38] in which both corpus
based and dictionary based approaches were used to detect the
semantic orientation of opinion words in a Twitter dataset. To
obtain the sentiment polarity, opinion words (a combination of
adjectives, verbs and adverbs) were extracted from the dataset.
The score of adjectives was calculated using a log-linear
classifier, whereas the score of verbs and adverbs was
calculated using a seed word list. Verbs and adverbs not
recognized by WordNet were rejected because they may not be
legitimate words. The corpus based approach was then used to
find the semantic orientation of adjectives, while the
dictionary based method was used to find the semantic
orientation of verbs and adverbs. Words whose orientation could
not be calculated were removed from the opinion word list. An
emotion intensifier was applied through a linear equation and
the overall sentiment of the tweet was calculated. A case study
of a tweet was presented to illustrate the effectiveness of the
suggested method. The experimental results indicated that the
proposed system can recognize semantic orientation, although
they give only a partial view. The study recommends more
research using larger samples to validate or invalidate these
findings.
K. Sentiment Analysis using Sentiment Features
The study [39] proposed a hybrid approach for Twitter
sentiment analysis. Sentiment lexicons were used to generate a
new feature set, and this feature set was used to train a linear
SVM classifier. The results showed that the suggested hybrid
method outperformed the state-of-the-art unigram baseline. The
study concluded that, for sentiment analysis, sentiment features
are a better choice than conventional text processing features.
All the features can be computed in a very short time, and they
perform better than the unigram feature set. The proposed system
has low memory and time complexity because of its very small
feature set. The SVM unigram model with emoticons and stop words
was selected as the baseline because it performed better than
all other combinations. This baseline SVM achieved an overall
accuracy of 86.7%; it performed better than Naïve Bayes, which
in turn performed better than MaxEnt. The proposed method
achieved an accuracy of 89.13%, a significant margin over the
baseline.
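The contrast between a compact sentiment feature vector and a vocabulary-sized unigram vector can be made concrete. The four features and the toy word lists below are assumptions chosen for illustration; they are not the feature set defined in [39].

```python
POSITIVE_WORDS = {"great", "good", "love", "excellent"}
NEGATIVE_WORDS = {"terrible", "bad", "hate", "poor"}


def sentiment_features(text: str) -> list[float]:
    """Build a fixed-length feature vector from lexicon matches.

    Features: positive count, negative count, their difference,
    and the share of tokens that carry sentiment.
    """
    tokens = text.lower().split()
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    n_tokens = len(tokens) or 1  # avoid division by zero on empty input
    return [positive, negative, positive - negative, (positive + negative) / n_tokens]
```

The vector has four entries regardless of corpus size, whereas a unigram representation grows with the vocabulary, which is the source of the low memory and time cost noted above.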
L. Sentiment Analysis using Support Vector Machines
with diverse information sources
Tony Mullen and Nigel Collier [40] worked on sentiment
analysis using support vector machines with diverse information
sources. They introduced an approach that uses SVMs to classify
text as positive or negative. The SVM is a powerful and
well-known tool for classifying vectors of real-valued features.
The proposed method was applied to a movie review data set from
Epinions.com, and the results showed that hybrid SVMs, which
combine unigram-style feature-based SVMs with SVMs based on
real-valued favorability measures, gave better results. The
approach emphasizes the use of a variety of information sources,
and the SVM serves as an ideal tool for bringing these sources
together. The researchers used different techniques for
assigning semantic values to words and phrases in the text. They
concluded that these word and phrase features worked more
effectively than the older bag-of-words approach. The model is
further combined with unigram models, which have shown effective
results in the past, as explained by [41].
M. Improving Twitter Sentiment Analysis with Topic-
Based Mixture Modeling and Semi-Supervised Training
Multiple approaches to improving Twitter sentiment analysis
were studied by Bing Xiang and Liang Zhou [42]. They proposed
improving Twitter sentiment analysis with a topic-based mixture
modeling approach together with semi-supervised training. They
first built a state-of-the-art baseline with a rich feature set.
A topic-based sentiment mixture model was then built, with the
topic-specific data generated in a semi-supervised training
framework. The topic information is generated through topic
modeling based on an application of LDA (Latent Dirichlet
Allocation). The proposed approach performed better than the top
system in SemEval-2013 in terms of averaged F-score. Several
experiments were carried out on data from Task B of Sentiment
Analysis in Twitter in SemEval-2013. They used data labeled as
positive, negative and neutral to tune the parameters and
features of the classifier. Experiments showed that weighting
adds a 2% improvement and that the universal sentiment model
achieved an average F-score of 69.7 with all features combined.
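One way to read a topic-based mixture model is that each topic has its own sentiment classifier and the final distribution is the topic-weighted average, P(s | x) = sum over topics t of P(t | x) * P(s | x, t). The sketch below implements only that combination step under this assumption; the topic posteriors and per-topic outputs are made-up inputs, and the exact formulation in [42] may differ.

```python
def mixture_posterior(topic_probs, topic_sentiment_probs):
    """Combine per-topic sentiment distributions using topic weights.

    topic_probs: list of P(topic | tweet), summing to 1
    topic_sentiment_probs: one dict per topic mapping label -> probability
    """
    if len(topic_probs) != len(topic_sentiment_probs):
        raise ValueError("need exactly one sentiment distribution per topic")
    labels = topic_sentiment_probs[0].keys()
    return {
        label: sum(
            weight * dist[label]
            for weight, dist in zip(topic_probs, topic_sentiment_probs)
        )
        for label in labels
    }


def predict_label(topic_probs, topic_sentiment_probs):
    """Pick the label with the highest mixed probability."""
    mixed = mixture_posterior(topic_probs, topic_sentiment_probs)
    return max(mixed, key=mixed.get)
```

In this reading, the topic posteriors would come from an LDA model and each per-topic distribution from a classifier trained on that topic's data.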
N. MSA-COSRs
Multi-aspect sentiment analysis of Chinese online social
reviews, based on topic modeling and the HowNet lexicon, was
studied by Xianghua et al. [43]. The authors proposed an
efficient way to automatically find the aspects under discussion
in Chinese social reviews. They called this approach Multi-aspect
Sentiment Analysis for Chinese Online Social Reviews
(MSA-COSRs). They first applied the Latent Dirichlet Allocation
(LDA) model to find the multi-aspect global topics of social
reviews, and then extracted the local topics and the sentiment
associated with them. Multi-aspect analysis is composed of two
subtasks: the first is extracting the aspects and the second is
calculating the sentiment orientation of each aspect. The
trained LDA model identified the aspects of local topics, and
the sentiment polarity of the associated text was classified
using the HowNet lexicon. This approach identifies multiple
fine-grained topics and their associated sentiments, which helps
in studying sentiment orientation more accurately. Despite the
success of this method, it is difficult to train the LDA model
for a suitable topic. Experimental results showed that the
proposed model not only obtains good topic partitioning results
but also helps improve sentiment analysis accuracy.
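The two subtasks, assigning text to an aspect and then scoring its polarity with a lexicon, can be sketched in a few lines. This is a deliberately simplified stand-in: hand-written keyword sets replace the topics an LDA model would learn, and a toy English polarity table replaces the HowNet lexicon, so every name and value below is hypothetical.

```python
# Keyword sets stand in for LDA topics; values are illustrative only.
ASPECT_KEYWORDS = {
    "service": {"service", "staff", "waiter"},
    "food": {"food", "dish", "taste"},
}
# Toy polarity table standing in for a sentiment lexicon.
POLARITY = {"friendly": 1, "delicious": 1, "slow": -1, "bland": -1}


def aspect_sentiment(sentences):
    """Sum lexicon polarity per aspect over the sentences that mention it."""
    totals = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Subtask 1: pick the aspect whose keywords overlap the sentence most.
        overlaps = {
            aspect: len(keywords.intersection(tokens))
            for aspect, keywords in ASPECT_KEYWORDS.items()
        }
        aspect = max(overlaps, key=overlaps.get)
        if overlaps[aspect] == 0:
            continue  # the sentence mentions no known aspect
        # Subtask 2: score the sentence with the polarity table.
        score = sum(POLARITY.get(token, 0) for token in tokens)
        totals[aspect] = totals.get(aspect, 0) + score
    return totals
```

Keeping the two subtasks separate means the aspect model and the lexicon can be replaced independently.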
III. DISCUSSION
Sentiment analysis and classification partially depend on how
clearly sentiments are separated in the text, reviews, comments
or other input datasets. The lexicon based approach works better
when there is a clear boundary between positive and negative
sentiments within the input dataset. When there is no clear
boundary between the sentiments in the target dataset, the
machine learning based approach works better. One of the main
reasons for poor sentiment separation in text obtained from web
sources such as Facebook posts, tweets, and product and movie
reviews is that this is user-entered data and may contain wrong
punctuation, grammatical mistakes, and fuzzy and noisy text. We
have discussed different hybrid techniques in this paper which
generally performed better than the purely lexicon based or
learning based techniques. Its ease of implementation also makes
the hybrid approach a substantial and effective option for
sentiment analysis. A comparison of the feature lists and the
results obtained on different data sets has been compiled and
presented in this research for a better understanding of the
hybrid approach and for future reference.
IV. CONCLUSION
Many studies are available on hybrid methods for sentiment
classification, but comprehensive and compact information on
this particular topic was lacking. In our research we have
discussed different hybrid techniques and tools and compared
their reported outcomes and results. Our study will help
researchers gain a better view of the hybrid approach to
sentiment classification. A comparative analysis of the
techniques on different datasets is also provided in this
research and can be further extended.
Table 1: Tools / Techniques, Features and their accuracy
REFERENCES
[1] S. J. M. Modha, Jalaj S., Gayatri S. Pandi, “Automatic
Sentiment Analysis for Unstructured Data,” Int. J. Adv. Res.
Comput. Sci. Softw. Eng., vol. 3, no. 12, pp. 91-97, 2013.
[2] F. M. Kundi, A. Khan, S. Ahmad, and M. Z. Asghar,
“Lexicon-Based Sentiment Analysis in the Social Web,” J.
Basic. Appl. Sci. Res, vol. 4, no. 6, pp. 238-248, 2014.
[3] M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede,
“Lexicon-Based Methods for Sentiment Analysis,” Comput.
Linguist., vol. 37, no. 2, pp. 267-307, 2011.
[4] M. Ahmad, S. Aftab, S. S. Muhammad, and U. Waheed,
“Tools and Techniques for Lexicon Driven Sentiment
Analysis: A Review,” Int. J. Multidiscip. Sci. Eng., vol. 8, no.
1, pp. 17-23, 2017.
[5] M. Thelwall, K. Buckley, G. Paltoglou, and D. Cai,
“Sentiment Strength Detection in Short Informal Text,” J. Am.
Soc. Inf. Sci. Technol., vol. 61, no. 12, pp. 2544-2558,
2010.
[6] M. Thelwall and K. Buckley, “Topic-based sentiment analysis
for the social web: The role of mood and issue-related words,”
J. Am. Soc. Inf. Sci. Technol., vol. 64, no. 8, pp. 1608-1617,
2013.
[7] X. Ding, B. Liu, and P. S. Yu, “A
holistic lexicon-based approach to opinion mining,” Proc. Int.
Conf. Web search web data Min. - WSDM ’08, p. 231, 2008.
[8] J. Fang and B. Chen, “Incorporating Lexicon Knowledge into
SVM Learning to Improve Sentiment Classification,” Proc.
Work. Sentim. Anal. where AI meets Psychol., pp. 94-100,
2011.
[9] N. Vasfisisi, M. Reza, and F. Derakhshi, “Text Classification
with Machine Learning Algorithms,” J. Basic. Appl. Sci. Res,
vol. 3, pp. 31-35, 2013.
[10] M. Ahmad, S. Aftab, S. S. Muhammad, and S. Ahmad,
“Machine Learning Techniques for Sentiment Analysis: A
Review,” Int. J. Multidiscip. Sci. Eng., vol. 8, no. 3, pp.
27-32, 2017.
[11] P. Goncalves, B. Fabrício, A. Matheus, and C. Meeyoung,
“Comparing and Combining Sentiment Analysis Methods,”
Proc. first ACM Conf. Online Soc. networks, pp. 27-38, 2013.
[12] J. Khairnar and M. Kinikar, “Machine Learning Algorithms
for Opinion Mining and Sentiment Classification,” Int. J. Sci.
Res. Publ., vol. 3, no. 6, pp. 1-6, 2013.
[13] R. Prabowo and M. Thelwall, “Sentiment analysis: A
combined approach,” J. Informetr., vol. 3, no. 2, pp. 143-157,
2009.
[14] P. P. Balage Filho and T. A. S. Pardo, “NILC_USP: A
Hybrid System for Sentiment Analysis in Twitter Messages,”
in Second Joint Conference on Lexical and Computational
Semantics (*SEM), Volume 2: Proceedings of the Seventh
International Workshop on Semantic Evaluation (SemEval
2013), 2013, vol. 2, no. SemEval, pp. 568-572.
[15] N. Malandrakis, A. Kazemzadeh, A. Potamianos, and S.
Narayanan, “SAIL: A hybrid approach to sentiment analysis,”
vol. 2, no. SemEval, pp. 438-442, 2013.
[16] F. Sommar and M. Wielondek, “Combining Lexicon- and
Learning-based Approaches for Improved Performance and
Convenience in Sentiment Classification,” 2015.
[17] S. Tan, Y. Wang, and X. Cheng, “Combining learn-based and
lexicon-based techniques for sentiment detection without
using labeled examples,” Proc. 31st Annu. Int. ACM SIGIR
Conf. Res. Dev. Inf. Retr. SIGIR 08, p. 743, 2008.
[18] A. Mudinas, D. Zhang, and M. Levene, “Combining lexicon
and learning based approaches for concept-level sentiment
analysis,” Proc. First Int. Work. Issues Sentim. Discov. Opin.
Min. - WISDOM ’12, pp. 18, 2012.
[19] L. Zhang, R. Ghosh, M. Dekhil, M. Hsu, and B. Liu,
“Combining lexicon-based and learning-based methods for
Twitter sentiment analysis,” Int. J. Electron. Commun. Soft
Comput. Sci. Eng., vol. 89, pp. 18, 2015.
[20] F. Å. Nielsen, “A new ANEW: Evaluation of a word list for
sentiment analysis in microblogs,” in CEUR Workshop
Proceedings, 2011, vol. 718, pp. 9398.
[21] A. Shoukry and A. Rafea, “A hybrid approach for sentiment
classification of Egyptian dialect tweets,” Proc. - 1st Int. Conf.
Arab. Comput. Linguist. Adv. Arab. Comput. Linguist. ACLing
2015, pp. 78-85, 2016.
[22] M. Elhawary and M. Elfeky, “Mining Arabic business
reviews,” in Proceedings - IEEE International Conference on
Data Mining, ICDM, 2010, pp. 1108-1113.
[23] R. Duwairi, M. N. Al-refai, and N. Khasawneh, “Feature
Reduction Techniques for Arabic Text Categorization,” J. Am.
Soc. Inf. Sci., vol. 60, no. 11, pp. 2347-2352, 2009.
[24] J. Serrano-Guerrero, J. A. Olivas, F. P. Romero, and E.
C. Herrera-Viedma, “Sentiment analysis: A review and
comparative analysis of web services,” Inf. Sci. (Ny)., vol. 311,
pp. 18-38, 2015.
[25] L. C. Yu, J. L. Wu, P. C. Chang, and H. S. Chu, “Using a
contextual entropy model to expand emotion words and their
intensity for the sentiment classification of stock market
news,” Knowledge-Based Syst., vol. 41, pp. 89-97, 2013.
[26] L. Barbosa and J. Feng, “Robust Sentiment Detection on
Twitter from Biased and Noisy Data,” Coling, no. August, pp.
36-44, 2010.
[27] A. Esuli and F. Sebastiani, “SENTIWORDNET: A Publicly
Available Lexical Resource for Opinion Mining,” Proc. 5th
Conf. Lang. Resour. Eval., pp. 417-422, 2006.
[28] I. Maks and P. Vossen, “A lexicon model for deep sentiment
analysis and opinion mining applications,” in Decision
Support Systems, 2012, vol. 53, no. 4, pp. 680-688.
[29] S. Baccianella, A. Esuli, and F. Sebastiani, “SentiWordNet
3.0: An Enhanced Lexical Resource for Sentiment Analysis and
Opinion Mining,” Analysis, vol. 0, pp. 1-12,
2010.
[30] D. Wang, S. Zhu, and T. Li, “SumView: A Web-based engine
for summarizing product reviews and customer opinions,”
Expert Systems with Applications, vol. 40, no. 1, pp. 27-33,
2013.
[31] L. Guo and X. Wan, “Exploiting syntactic and semantic
relationships between terms for opinion retrieval,” Journal of
the American Society for Information Science and Technology,
vol. 63, no. 11, pp. 2269-2282, 2012.
[32] A. Reyes and P. Rosso, “Making objective decisions from
subjective data: Detecting irony in customer reviews,” in
Decision Support Systems, 2012, vol. 53, no. 4, pp. 754-760.
[33] J. Savoy, “Authorship Attribution Based on Specific
Vocabulary,” ACM Trans. Inf. Syst., vol. 30, no. 2, Art. no.
12, pp. 1-30, 2012.
[34] “AlchemyAPI.” [Online]. Available:
https://www.ibm.com/watson/alchemy-api.html.
[35] K. Shaalan and H. Raza, “NERA: Named entity recognition
for Arabic,” J. Am. Soc. Inf. Sci. Technol., vol. 60, no. 8, pp.
1652-1663, 2009.
[36] J. Turian, “Using AlchemyAPI for Enterprise-Grade
Text Analysis,” 2013.
[37] D. Tang, F. Wei, B. Qin, M. Zhou, and T. Liu, “Building
Large-Scale Twitter-Specific Sentiment Lexicon: a
Representation Learning Approach,” Proc. 25th Int. Conf.
Comput. Linguist. (COLING 2014), pp. 172-182, 2014.
[38] A. Kumar and T. M. Sebastian,
“Sentiment Analysis on Twitter,” IJCSI Int. J. Comput. Sci.
Issues, vol. 9, no. 4, pp. 372-378, 2012.
[39] S. A. Bahrainian and A. Dengel, “Sentiment Analysis using
sentiment features,” Proc. - 2013 IEEE/WIC/ACM Int. Jt.
Conf. Web Intell. Intell. Agent Technol. - Work. WI-IATW
2013, vol. 3, pp. 26-29, 2013.
[40] T. Mullen and N. Collier, “Sentiment analysis using support
vector machines with diverse information sources,” Conf.
Empir. Methods Nat. Lang. Process., pp. 412-418, 2004.
[41] B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up?:
sentiment classification using machine learning techniques,”
Proc. Conf. Empir. Methods Nat. Lang. Process., pp. 79-86,
2002.
[42] B. Xiang, “Improving Twitter Sentiment Analysis with Topic-
Based Mixture Modeling and Semi-Supervised Training,”
ACL, pp. 434-439, 2014.
[43] F. Xianghua, L. Guo, G. Yanyan, and W. Zhiqiang, “Multi-
aspect sentiment analysis for Chinese online social reviews
based on topic modeling and HowNet lexicon,” Knowledge-
Based Syst., vol. 37, pp. 186-195, 2013.