A Comparative Study on Twitter Sentiment
Analysis: Which Features are Good?
Fajri Koto, and Mirna Adriani
Faculty of Computer Science, University of Indonesia
Depok, Jawa Barat, Indonesia 16423
fajri91@ui.ac.id, mirna@cs.ui.ac.id
http://www.cs.ui.ac.id
Abstract. In this paper we investigate Sentiment Analysis on the well-known
social medium Twitter. Although previous work on Twitter Sentiment Analysis
has proposed interesting features, no comparative study has established which
features perform best. In total we used 9 feature sets (41 attributes)
comprising punctuation, lexical, part of speech, emoticon, SentiWord lexicon,
AFINN lexicon, Opinion lexicon, Senti-Strength method, and Emotion lexicon
features. We analyzed the features by conducting supervised classification
for each feature set, followed by feature selection, in both the subjectivity
and polarity domains. Across four different datasets, the results reveal that
the AFINN lexicon and the Senti-Strength method are the best current
approaches for Twitter Sentiment Analysis.
Keywords: Twitter, Sentiment Analysis, Comparative Study, Polarity,
Subjectivity
1 Introduction
In general, the goal of Sentiment Analysis is to determine the polarity of
natural language text by performing supervised and/or unsupervised
classification. Sentiment classification can be roughly divided into two
categories: subjectivity and polarity [1]. The difference between
subjectivity and polarity classification lies in the classes involved in the
training and testing stages. Subjectivity classification comprises the
subjective and objective classes [2], whereas polarity classification
involves the positive, negative, and neutral classes [3].
Many approaches [4-12] have addressed sentiment classification over
Twitter1. However, no previous study has compared which features perform
well for Sentiment Analysis. This information is valuable, especially for
today's businesses that rely on social media analysis. Driven by this fact,
we first derive all possible features and then investigate each case by
performing supervised classification for each feature set.
1 http://www.twitter.com
Table 1. List of all feature sets for Twitter Sentiment Analysis

Punctuation [3] (5 attributes, range = {0,1,..,n}):
  Number of "!", "?", ".", ",", and special characters in a tweet.

Lexical (9 attributes, range1 = {0,1,..,n}, range2 = {false,true}):
  1) tweetLength, #lowercase, #uppercase, aggregates {min, max, avg} of
     #letterInWord, #hashtag: the corresponding counts.
  2) haveRT: true if the tweet contains the "RT" phrase, false otherwise.

Part of Speech, extracted with NLTK Python [16] (8 attributes,
range1 = {0,1,..,n}, range2 = {false,true}):
  1) #noun, #verb, #adjective, #adverb, #pronoun: number of the
     corresponding POS tags in a tweet.
  2) hasComparative, hasSuperlative, hasPastParticiple: true if the tweet
     contains a comparative/superlative adjective or adverb, or a past
     participle; false otherwise.

Emoticon, obtained from [3][5] and Wikipedia (1 attribute,
range = {-n,..,0,..,n}):
  emoticonScore: initialized to 0; increased by 1 for each positive
  emoticon and decreased by 1 for each negative emoticon.

SentiWord Lex. [8] (2 attributes, range = {0,1,..,n}):
  sumpos, sumneg: sums of the scores of the positive or negative words
  that match the lexicon.

AFINN Lex. [9][10] (2 attributes, range1 = {0,1,..,n},
range2 = {-n,..,-1,0}):
  1) APO: sum of the scores of the positive words that match the lexicon.
  2) ANE: sum of the scores of the negative words that match the lexicon.

Opinion Lex. (OL) (4 attributes, range = {0,1,..,n}):
  1) Wilson (positive words, negative words) [6]
  2) Bing Liu (positive words, negative words) [7]
  Sums of the scores of the positive or negative words that match each
  lexicon.

Senti-Strength (SS) [12] (2 attributes, range1 = {-5,-4,..,-1},
range2 = {1,2,..,5}):
  1) ssn: method score for the negative category.
  2) ssp: method score for the positive category.

NRC Emotion Lex. [11][13][14] (8 attributes, range = {0,1,..,n}):
  joy, trust, sadness, anger, surprise, fear, disgust, anticipation:
  number of words that match the corresponding emotion class word list.
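As an illustration of two of the lexicon-based features in Table 1, the emoticon score and the AFINN sums (APO, ANE) can be sketched as follows. This is a minimal sketch: the tiny emoticon sets and AFINN dictionary below are illustrative stand-ins, not the full resources used in the paper.

```python
# Illustrative stand-ins for the real emoticon lists and AFINN lexicon.
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}
AFINN = {"good": 3, "great": 3, "bad": -3, "awful": -3}

def emoticon_score(tweet):
    """+1 per positive emoticon, -1 per negative one, starting from 0."""
    score = 0
    for token in tweet.split():
        if token in POSITIVE_EMOTICONS:
            score += 1
        elif token in NEGATIVE_EMOTICONS:
            score -= 1
    return score

def afinn_features(tweet):
    """APO: sum of positive AFINN scores; ANE: sum of negative ones."""
    apo = ane = 0
    for token in tweet.lower().split():
        value = AFINN.get(token, 0)
        if value > 0:
            apo += value
        else:
            ane += value
    return apo, ane
```

The other lexicon features (SentiWord, Opinion lexicon, NRC Emotion) follow the same pattern of matching tokens against a word list and summing or counting.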
2 Experiment with Feature of Sentiment Analysis
Table 2. Balanced Dataset

Subjectivity   Sanders   HCR    OMD   SemEval
#neutral          1190   280    800      2256
#objective        1190   280    800      2256
#total            2380   560   1600      4512

Polarity       Sanders   HCR    OMD   SemEval
#negative          555   368    800       896
#positive          555   368    800       896
#total            1110   736   1600      1792
The experiment was conducted in two sentiment domains: polarity and
subjectivity. We used 4 different datasets: 1) Sanders [1], 2) Health Care
Reform (HCR) [15], 3) Obama-McCain Debate (OMD) [15], and 4) International
Workshop SemEval-2013 (SemEval)2 data (see Table 2). In total we used 9
feature sets (41 attributes) comprising punctuation, lexical, part of
speech, emoticon, SentiWord lexicon, AFINN lexicon, Opinion lexicon,
Senti-Strength method, and Emotion lexicon features (see Table 1). The
preprocessing stage was adjusted to the type of feature and comprises:
removing usernames, URLs, the RT phrase, special characters, and stopwords;
converting to lowercase; and stemming and lemmatization. In the first
experiment, we conducted binary classification for each feature set on each
dataset. We then performed feature selection over all feature sets (merging
the features into a single set of 41 attributes) on all datasets.
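The cleaning steps above can be sketched as a small pipeline. This is a hedged sketch: the regular expressions and the stopword list are illustrative assumptions, and stemming/lemmatization (which the paper applies via NLTK) is omitted for brevity.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "at", "on"}  # illustrative sample

def preprocess(tweet):
    """Clean a raw tweet following the steps described in the text."""
    tweet = re.sub(r"@\w+", "", tweet)          # remove usernames
    tweet = re.sub(r"https?://\S+", "", tweet)  # remove URLs
    tweet = re.sub(r"\bRT\b", "", tweet)        # remove the RT phrase
    tweet = tweet.lower()                       # convert to lowercase
    tweet = re.sub(r"[^a-z0-9\s#]", "", tweet)  # remove special characters
    return [t for t in tweet.split() if t not in STOPWORDS]
```

For example, `preprocess("RT @user Check http://t.co/x The movie is great!")` yields the tokens `["check", "movie", "great"]`.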
The results of these experiments are summarized in Table 3 and Table 4. The
letters A, B, C, and D in both tables denote the Naive Bayes, Neural
Network, SVM, and Linear Regression classifiers, respectively. In the first
experiment (see Table 3), we identified the top-5 feature sets by accuracy
on each dataset. For both classification tasks, AFINN, Senti-Strength, and
the Opinion lexicon are the feature sets most often found in the top-5 on
each dataset, whereas the well-known SentiWord lexicon is unable to beat
them. This suggests that SentiWord is not well suited to Twitter Sentiment
Analysis. Our results also show that emotion and punctuation are good
features for Twitter Sentiment Analysis, especially in polarity
classification.
Table 4 shows the result of our second experiment, feature selection. The
column for each classifier (A, B, C, and D) gives the number of attributes
from each feature set that were retained by feature selection. The table
reveals that punctuation, AFINN, and Senti-Strength are the most frequently
selected features in both subjectivity and polarity classification. This is
consistent with the previous experiment and confirms that AFINN and
Senti-Strength are currently the best features for Twitter Sentiment
Analysis. They are therefore well suited as baselines for future work on
Twitter Sentiment Analysis.
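The paper does not specify which feature selection algorithm was applied, so the following is only a hypothetical stand-in: a simple filter-style ranking that scores each attribute of the merged 41-attribute set by the absolute difference of its per-class means and returns the attributes in decreasing order of that score.

```python
def mean(xs):
    return sum(xs) / len(xs)

def rank_attributes(rows, labels):
    """Rank attributes by how well they separate the two classes.

    rows   : list of attribute vectors (one per tweet)
    labels : parallel list of 0/1 class labels (e.g. negative/positive)
    Returns attribute indices, most discriminative first.
    """
    n_attr = len(rows[0])
    scores = []
    for j in range(n_attr):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        # Larger gap between class means = more discriminative attribute.
        scores.append(abs(mean(pos) - mean(neg)))
    return sorted(range(n_attr), key=lambda j: scores[j], reverse=True)
```

A wrapper-style selection (re-training each classifier on candidate subsets) would follow the same interface but score subsets by held-out accuracy instead.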
3 Conclusion and Future Work
In this work, we conducted a comparative study of various features for
Twitter Sentiment Analysis using four different datasets and nine feature
sets. Our
2 http://www.cs.york.ac.uk/semeval-2013/
Table 3. Classification result for each feature set
Feature
Subjectivity
SemEval Sanders HCR OMD
A B C D A B C D A B C D A B C D
Punct. 56.4 56.7 55.4 57.4 57.6 57.3 57.3 59.3 56.1 59.6 58.6 62.7 56.1 62.1 56.4 60.5
Lexical 51.3 51.8 51.7 54.9 56.7 55.9 55.9 55.4 59.1 59.5 59.3 58.8 52.8 71.8 58.4 68.3
POS 52.4 56.3 55.1 57.4 60.6 60.2 61.4 61.9 59.6 57.0 59.6 59.3 51.1 49.9 50.7 50.8
Emoticon 54.7 53.0 53.4 53.4 53.4 51.3 50.0 48.2 50.7 51.6 51.3 50.5 50.6 49.8 49.8 50.4
SentiWord 58.6 60.6 60.2 60.4 60.4 62.6 60.7 61.1 56.1 57.4 56.3 54.3 50.8 50.2 50.6 47.6
AFINN 64.3 68.8 68.7 68.8 61.2 65.1 64.0 64.8 60.7 63.2 62.3 61.9 51.1 52.1 50.8 51.3
OL 62.0 62.4 62.3 62.7 60.5 63.8 58.9 63.1 61.6 60.5 59.6 61.6 66.4 66.3 63.9 66.1
SS 63.6 66.8 65.9 65.9 62.7 64.5 63.8 64.6 60.9 58.9 60.2 60.7 55.9 55.1 56.4 55.9
Emotion 57.0 58.1 58.6 59.1 58.2 56.3 57.2 57.1 56.1 59.8 52.5 55.9 51.2 50.1 51.0 50.6
Feature
Polarity
SemEval Sanders HCR OMD
A B C D A B C D A B C D A B C D
Punct. 61.8 59.8 61.5 62.1 57.7 57.5 56.2 57.8 63.6 62.2 60.2 64.7 58.8 59.1 59.9 59.2
Lexical 54.3 55.9 54.5 56.4 59.6 59.4 58.7 62.7 54.2 59.6 52.8 60.5 50.5 53.6 55.2 55.3
POS 55.7 55.7 56.1 55.8 57.6 62.6 61.0 60.6 49.9 49.5 50.8 49.1 57.6 56.9 57.0 56.5
Emoticon 55.0 55.3 55.1 55.1 52.8 52.1 52.2 53.3 51.4 49.6 48.8 49.6 49.6 50.3 50.6 50.6
SentiWord 60.9 60.7 60.6 60.4 58.7 56.1 59.3 56.2 56.3 54.3 54.5 55.0 52.7 52.1 52.7 53.9
AFINN 74.3 75.2 75.2 75.2 69.8 70.9 70.6 71.1 60.5 58.7 60.3 60.1 62.7 62.8 62.5 62.8
OL 68.5 70.2 70.1 69.8 70.0 68.2 69.2 70.2 59.6 59.3 61.0 61.7 60.3 62.9 61.1 58.9
SS 72.9 75.2 73.0 74.9 72.3 71.8 72.2 72.3 59.7 60.5 59.7 58.7 62.5 62.6 61.7 62.5
Emotion 66.4 66.2 66.4 68.5 65.7 65.1 63.9 66.5 55.8 52.6 55.3 55.9 59.1 57.9 57.6 59.2
Table 4. Feature selection result
Feature #Attr Subjectivity Polarity
A B C D A B C D
Punct. 5 1 3 1 1 2 2 2 2
Lexical 9 4 1 2 - 1 2 3 1
POS 8 2 - 1 - 3 - 1 -
Emoticon 1 - 1 - - - 1 1 1
SentiWord 2 1 1 1 - - 1 1 -
AFINN 2 2 1 2 2 1 2 2 2
OL 4 1 2 1 1 - - 2 1
SS 2 2 2 2 2 1 1 1 1
Emotion 8 - 3 2 - - - 2 1
Accuracy 65.5 67.4 63.4 66.0 71.5 73.9 73.5 75.0
experiment reveals that AFINN and Senti-Strength are currently the best
features for Twitter Sentiment Analysis. According to the results, other
features such as punctuation, the Opinion lexicon, and emotion are also
worth considering. Future research may investigate newly proposed features
as they are released.
References
1. Bravo-Marquez, F., Mendoza, M., Poblete, B.: Combining strengths, emotions and
polarities for boosting Twitter sentiment analysis. In: Proceedings of the Second
International Workshop on Issues of Sentiment Discovery and Opinion Mining, 2
(2013).
2. Raaijmakers, S., Kraaij, W.: A Shallow Approach to Subjectivity Classification. In:
ICWSM (2008)
3. Aisopos, F., Papadakis, G., Tserpes, K., Varvarigou, T.: Content vs. context for
sentiment analysis: a comparative analysis over microblogs. In: Proceedings of the
23rd ACM conference on Hypertext and social media, pp. 187-196 (2012)
4. Go, A., Bhayani R., Huang L.: Twitter sentiment classification using distant super-
vision. In: CS224N Project Report, Stanford (2009)
5. Agarwal, A., Xie, B., Vovsha, I., Rambow, O., Passonneau, R.: Sentiment analysis
of twitter data. In: Proceedings of the Workshop on Languages in Social Media, pp.
30–38 (2011)
6. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level
sentiment analysis. In: Proceedings of the conference on human language technology
and empirical methods in natural language processing, pp. 347–354 (2005)
7. Liu, B., Hu, M., Cheng, J.: Opinion observer: analyzing and comparing opinions on
the web. In: Proceedings of the 14th international conference on World Wide Web,
pp. 342-351 (2005)
8. Baccianella, S., Esuli, A., Sebastiani, F.: SentiWordNet 3.0: An Enhanced Lexical
Resource for Sentiment Analysis and Opinion Mining. In: LREC, Vol. 10, pp. 2200–
2204 (2010)
9. Bradley, M. M., Lang, P. J.: Affective norms for English words (ANEW): Instruction
manual and affective ratings. In: Technical Report C-1, The Center for Research in
Psychophysiology, University of Florida, pp. 1–45 (1999)
10. Nielsen, F. A.: A new ANEW: Evaluation of a word list for sentiment analysis in
microblogs. In: arXiv preprint arXiv:1103.2903 (2011)
11. Mohammad, S. M., Turney, P. D.: Crowdsourcing a word-emotion association lexicon.
In: Computational Intelligence, 29(3), pp. 436–465 (2013)
12. Thelwall, M., Buckley, K., Paltoglou, G.: Sentiment strength detection for the
social web. In: Journal of the American Society for Information Science and Tech-
nology, 63(1), pp. 163–173 (2012)
13. Ekman, P.: An argument for basic emotions. Cognition and Emotion, 6(3-4), pp.
169–200 (1992)
14. Plutchik, R.: The psychology and biology of emotion. HarperCollins College Pub-
lishers (1994)
15. Speriosu, M., Sudan, N., Upadhyay, S., Baldridge, J.: Twitter polarity classification
with label propagation over lexical links and the follower graph. In: Proceedings of
the First workshop on Unsupervised Learning in NLP, pp. 53–63 (2011)
16. Bird, S.: NLTK: the natural language toolkit. In: Proceedings of the COLING/ACL
on Interactive presentation sessions, pp. 69–72 (2006)