A Comparative Study on Twitter Sentiment
Analysis: Which Features are Good?
Fajri Koto, and Mirna Adriani
Faculty of Computer Science, University of Indonesia
Depok, Jawa Barat, Indonesia 16423
fajri91@ui.ac.id, mirna@cs.ui.ac.id
http://www.cs.ui.ac.id
Abstract. In this paper we investigate Sentiment Analysis on the well-known
social medium Twitter. The literature shows that several works on Twitter
Sentiment Analysis have proposed interesting features, but no comparative
study has identified which features perform best. In total we used 9 feature
sets (41 attributes) comprising punctuation, lexical, part of speech, emoticon,
SentiWord lexicon, AFINN lexicon, Opinion lexicon, Senti-Strength method,
and Emotion lexicon. Feature analysis was done by conducting supervised
classification for each feature set, followed by feature selection in the
subjectivity and polarity domains. Using four different datasets, the results
reveal that the AFINN lexicon and the Senti-Strength method are the best
current approaches for Twitter Sentiment Analysis.
Keywords: Twitter, Sentiment Analysis, Comparative Study, Polarity,
Subjectivity
1 Introduction
In general, the goal of Sentiment Analysis is to determine the polarity of
natural language text by performing supervised and/or unsupervised
classification. Sentiment classification can be roughly divided into two
categories: subjectivity and polarity [1]. The difference between subjectivity
and polarity classification lies in the classes used in the training and
testing stages: subjectivity classification comprises the subjective and
objective classes [2], whereas polarity classification involves the positive,
negative, and neutral classes [3].
Many approaches [4-12] have been proposed to classify sentiment on Twitter1.
However, no previous comparative study shows which features perform well for
Sentiment Analysis, even though this information is valuable, especially for
today's businesses that rely on social media analysis. Driven by this fact, we
first derive all possible features and then investigate them by performing
supervised classification for each feature set.
1 http://www.twitter.com
Table 1. List of all feature sets for Twitter Sentiment Analysis
- Punctuation [3] (5 attributes; range = {0,1,..,n}): number of "!", "?",
  ".", ",", and special characters in a tweet.
- Lexical (9 attributes; range1 = {0,1,..,n}, range2 = {false,true}):
  1) tweetLength, #lowercase, #uppercase, aggregate {min, max, avg} of
  #letterInWord, #hashtag: the corresponding counts; 2) haveRT: true if the
  tweet contains the "RT" phrase, false otherwise.
- Part of Speech (8 attributes; extracted by NLTK Python [16];
  range1 = {0,1,..,n}, range2 = {false,true}): 1) #noun, #verb, #adjective,
  #adverb, #pronoun: number of corresponding POS tags in a tweet;
  2) hasComparative, hasSuperlative, hasPastParticiple: true if the tweet
  contains a comparative/superlative adjective or adverb, or a past
  participle, false otherwise.
- Emoticon (1 attribute; obtained from [3][5] and Wikipedia;
  range = {-n,..,0,..,n}): emoticonScore: starting from 0, the score is
  increased by +1 for each positive emoticon and decreased by -1 for each
  negative emoticon.
- SentiWord Lexicon [8] (2 attributes; range = {0,1,..,n}): sumpos, sumneg:
  sums of the scores of the positive or negative words that match the lexicon.
- AFINN Lexicon [9][10] (2 attributes; range1 = {0,1,..,n},
  range2 = {-n,..,-1,0}): 1) APO: sum of the scores of the positive words
  that match the lexicon; 2) ANE: sum of the scores of the negative words
  that match the lexicon.
- Opinion Lexicon (OL) (4 attributes; range = {0,1,..,n}): 1) Wilson
  (positive words, negative words) [6]; 2) Bingliu (positive words, negative
  words) [7]: sums of the scores of the positive or negative words that match
  the lexicon.
- Senti-Strength (SS) [12] (2 attributes; range1 = {-5,-4,..,-1},
  range2 = {1,2,..,5}): 1) ssn: method score for the negative category;
  2) ssp: method score for the positive category.
- NRC Emotion Lexicon [11][13][14] (8 attributes; range = {0,1,..,n}): joy,
  trust, sadness, anger, surprise, fear, disgust, anticipation: number of
  words that match the corresponding emotion class word list.
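The lexicon-style feature sets in Table 1 all reduce to simple token lookups. A minimal sketch in Python, using toy stand-in lexicons (hypothetical subsets chosen here for illustration; the paper uses the full AFINN word list and the emoticon lists drawn from [3][5] and Wikipedia):

```python
# Toy stand-in lexicons; the real AFINN list scores thousands of words
# with integers in [-5, +5].
AFINN = {"good": 3, "love": 3, "bad": -3, "awful": -3}
POS_EMOTICONS = {":)", ":-)", ":D"}
NEG_EMOTICONS = {":(", ":-("}

def emoticon_score(tokens):
    """emoticonScore: start at 0, +1 per positive emoticon, -1 per negative."""
    return sum((t in POS_EMOTICONS) - (t in NEG_EMOTICONS) for t in tokens)

def afinn_features(tokens):
    """APO: sum of positive word scores; ANE: sum of negative word scores."""
    scores = [AFINN[t] for t in tokens if t in AFINN]
    apo = sum(s for s in scores if s > 0)
    ane = sum(s for s in scores if s < 0)
    return apo, ane

tokens = "i love this :) but the service was awful :(".split()
print(emoticon_score(tokens))   # 0  (+1 for ":)", -1 for ":(")
print(afinn_features(tokens))   # (3, -3)
```

The other lexicon features (SentiWord, Opinion lexicon, NRC Emotion) follow the same match-and-sum or match-and-count pattern over their respective word lists.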
2 Experiments with Features for Sentiment Analysis
Table 2. Balanced Dataset
Subjectivity   Sanders   HCR   OMD    SemEval
#neutral       1190      280   800    2256
#objective     1190      280   800    2256
#total         2380      560   1600   4512

Polarity       Sanders   HCR   OMD    SemEval
#negative      555       368   800    896
#positive      555       368   800    896
#total         1110      736   1600   1792
The experiment was conducted in two sentiment domains: polarity and
subjectivity. Four different datasets were used in this work (see Table 2):
1) Sanders [1], 2) Health Care Reform (HCR) [15], 3) Obama-McCain Debate
(OMD) [15], and 4) International Workshop SemEval 2013 (SemEval)2 data.
In total we used 9 feature sets (41 attributes) comprising punctuation,
lexical, part of speech, emoticon, SentiWord lexicon, AFINN lexicon, Opinion
lexicon, Senti-Strength method, and Emotion lexicon (see Table 1). The
preprocessing stage was adjusted to the type of feature; it comprises removing
usernames, URLs, the RT phrase, special characters, and stopwords; converting
to lowercase; and stemming and lemmatization. In the first experiment, we
conducted binary classification for each feature set on each dataset. We then
performed feature selection over all feature sets (merging the features into a
single set of 41 attributes) on all rows of the datasets.
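The preprocessing steps above can be sketched as follows. This is an illustrative sketch only: the regular expressions and the toy stopword list are our own choices, and the stemming/lemmatization step is omitted.

```python
import re

# Toy stopword subset for illustration; a real run would use a full list.
STOPWORDS = {"the", "a", "an", "is", "to"}

def preprocess(tweet):
    """Remove usernames, URLs, the RT phrase, special characters and
    stopwords; lowercase the rest (stemming/lemmatization omitted)."""
    t = re.sub(r"@\w+", " ", tweet)          # usernames
    t = re.sub(r"https?://\S+", " ", t)      # URLs
    t = re.sub(r"\bRT\b", " ", t)            # RT phrase
    t = t.lower()
    t = re.sub(r"[^a-z0-9\s#]", " ", t)      # special characters
    return [w for w in t.split() if w not in STOPWORDS]

print(preprocess("RT @user: The movie is GOOD! http://t.co/x #film"))
# → ['movie', 'good', '#film']
```

Note that which steps apply depends on the feature: emoticon features, for example, must run before special characters are stripped.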
The results of these experiments are summarized in Table 3 and Table 4. The
letters A, B, C, and D in both tables denote the Naive Bayes, Neural Network,
SVM, and Linear Regression classifiers, respectively. In the first experiment
(see Table 3), the colored cells mark the top-5 features by accuracy. For both
classification tasks, AFINN, Senti-Strength, and the Opinion lexicon are the
feature sets most often found in the top-5 on each dataset, whereas the
well-known SentiWord lexicon is unable to beat them. This affirms that
SentiWord is not well suited to Twitter Sentiment Analysis. Our results also
show that emotion and punctuation are good features for Twitter Sentiment
Analysis, especially in polarity classification.
Table 4 shows the result of our second experiment, feature selection. The
column for each classifier (A, B, C, and D) lists how many attributes from
each feature set were retained by feature selection. The table reveals that
punctuation, AFINN, and Senti-Strength are the most frequently selected
features in both subjectivity and polarity classification. This is consistent
with the previous experiment and affirms that AFINN and Senti-Strength are
currently the best features for Twitter Sentiment Analysis; they therefore
make very good baselines for future work on the task.
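The paper does not specify which feature-selection algorithm produced Table 4, so as an illustration only, here is a simple univariate filter (score each attribute by how far apart its class-conditional means are, relative to the overall spread), one common way to rank attributes before keeping the top ones:

```python
from statistics import mean, pstdev

def univariate_scores(rows, labels):
    """Score each feature column by |mean(class 1) - mean(class 0)| / spread.
    rows: list of feature vectors; labels: 0/1 class per row."""
    n_feat = len(rows[0])
    scores = []
    for j in range(n_feat):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        spread = pstdev(pos + neg) or 1.0   # avoid division by zero
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    return scores

# Toy data: feature 0 separates the classes, feature 1 is constant noise.
X = [[3, 5], [4, 5], [-3, 5], [-4, 5]]
y = [1, 1, 0, 0]
s = univariate_scores(X, y)
print(s.index(max(s)))  # 0: the discriminative feature ranks first
```

Wrapper methods (e.g. greedy forward selection with the target classifier) are an equally plausible reading of the paper's setup; the filter above is just the simplest concrete instance.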
3 Conclusion and Future Work
In this work, a comparative study of various features for Twitter Sentiment
Analysis was conducted using four different datasets and nine feature sets. Our
2 http://www.cs.york.ac.uk/semeval-2013/
Table 3. Classification result for each feature set
Subjectivity
Feature      SemEval: A B C D     Sanders: A B C D     HCR: A B C D     OMD: A B C D
Punct. 56.4 56.7 55.4 57.4 57.6 57.3 57.3 59.3 56.1 59.6 58.6 62.7 56.1 62.1 56.4 60.5
Lexical 51.3 51.8 51.7 54.9 56.7 55.9 55.9 55.4 59.1 59.5 59.3 58.8 52.8 71.8 58.4 68.3
POS 52.4 56.3 55.1 57.4 60.6 60.2 61.4 61.9 59.6 57.0 59.6 59.3 51.1 49.9 50.7 50.8
Emoticon 54.7 53.0 53.4 53.4 53.4 51.3 50.0 48.2 50.7 51.6 51.3 50.5 50.6 49.8 49.8 50.4
SentiWord 58.6 60.6 60.2 60.4 60.4 62.6 60.7 61.1 56.1 57.4 56.3 54.3 50.8 50.2 50.6 47.6
AFINN 64.3 68.8 68.7 68.8 61.2 65.1 64.0 64.8 60.7 63.2 62.3 61.9 51.1 52.1 50.8 51.3
OL 62.0 62.4 62.3 62.7 60.5 63.8 58.9 63.1 61.6 60.5 59.6 61.6 66.4 66.3 63.9 66.1
SS 63.6 66.8 65.9 65.9 62.7 64.5 63.8 64.6 60.9 58.9 60.2 60.7 55.9 55.1 56.4 55.9
Emotion 57.0 58.1 58.6 59.1 58.2 56.3 57.2 57.1 56.1 59.8 52.5 55.9 51.2 50.1 51.0 50.6
Polarity
Feature      SemEval: A B C D     Sanders: A B C D     HCR: A B C D     OMD: A B C D
Punct. 61.8 59.8 61.5 62.1 57.7 57.5 56.2 57.8 63.6 62.2 60.2 64.7 58.8 59.1 59.9 59.2
Lexical 54.3 55.9 54.5 56.4 59.6 59.4 58.7 62.7 54.2 59.6 52.8 60.5 50.5 53.6 55.2 55.3
POS 55.7 55.7 56.1 55.8 57.6 62.6 61.0 60.6 49.9 49.5 50.8 49.1 57.6 56.9 57.0 56.5
Emoticon 55.0 55.3 55.1 55.1 52.8 52.1 52.2 53.3 51.4 49.6 48.8 49.6 49.6 50.3 50.6 50.6
SentiWord 60.9 60.7 60.6 60.4 58.7 56.1 59.3 56.2 56.3 54.3 54.5 55.0 52.7 52.1 52.7 53.9
AFINN 74.3 75.2 75.2 75.2 69.8 70.9 70.6 71.1 60.5 58.7 60.3 60.1 62.7 62.8 62.5 62.8
OL 68.5 70.2 70.1 69.8 70.0 68.2 69.2 70.2 59.6 59.3 61.0 61.7 60.3 62.9 61.1 58.9
SS 72.9 75.2 73.0 74.9 72.3 71.8 72.2 72.3 59.7 60.5 59.7 58.7 62.5 62.6 61.7 62.5
Emotion 66.4 66.2 66.4 68.5 65.7 65.1 63.9 66.5 55.8 52.6 55.3 55.9 59.1 57.9 57.6 59.2
Table 4. Feature selection result
Feature     #Attr    Subjectivity: A B C D    Polarity: A B C D
Punct. 5 1 3 1 1 2 2 2 2
Lexical 9 4 1 2 - 1 2 3 1
POS 8 2 - 1 - 3 - 1 -
Emoticon 1 - 1 - - - 1 1 1
SentiWord 2 1 1 1 - - 1 1 -
AFINN 2 2 1 2 2 1 2 2 2
OL 4 1 2 1 1 - - 2 1
SS 2 2 2 2 2 1 1 1 1
Emotion 8 - 3 2 - - - 2 1
Accuracy 65.5 67.4 63.4 66.0 71.5 73.9 73.5 75.0
experiment reveals that AFINN and Senti-Strength are currently the best
features for Twitter Sentiment Analysis. According to the results, other
features such as punctuation, the Opinion lexicon, and emotion are also worth
considering. Future research may investigate new feature ideas as they are
released.
References
1. Bravo-Marquez, F., Mendoza, M., Poblete, B.: Combining strengths, emotions and
polarities for boosting Twitter sentiment analysis. In: Proceedings of the Second
International Workshop on Issues of Sentiment Discovery and Opinion Mining, 2
(2013).
2. Raaijmakers, S., Kraaij, W.: A Shallow Approach to Subjectivity Classification. In:
ICWSM (2008)
3. Aisopos, F., Papadakis, G., Tserpes, K., Varvarigou, T.: Content vs. context for
sentiment analysis: a comparative analysis over microblogs. In: Proceedings of the
23rd ACM conference on Hypertext and social media, pp. 187-196 (2012)
4. Go, A., Bhayani R., Huang L.: Twitter sentiment classification using distant super-
vision. In: CS224N Project Report, Stanford (2009)
5. Agarwal, A., Xie, B., Vovsha, I., Rambow, O., Passonneau, R.: Sentiment analysis
of twitter data. In: Proceedings of the Workshop on Languages in Social Media, pp.
30–38 (2011)
6. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level
sentiment analysis. In: Proceedings of the conference on human language technology
and empirical methods in natural language processing, pp. 347–354 (2005)
7. Liu, B., Hu, M., Cheng, J.: Opinion observer: analyzing and comparing opinions on
the web. In: Proceedings of the 14th international conference on World Wide Web,
pp. 342-351 (2005)
8. Baccianella, S., Esuli, A., Sebastiani, F.: SentiWordNet 3.0: An Enhanced Lexical
Resource for Sentiment Analysis and Opinion Mining. In: LREC, Vol. 10, pp. 2200–
2204 (2010)
9. Bradley, M. M., Lang, P. J.: Affective norms for English words (ANEW): Instruction
manual and affective ratings. In: Technical Report C-1, The Center for Research in
Psychophysiology, University of Florida, pp. 1–45 (1999)
10. Nielsen, F. A.: A new ANEW: Evaluation of a word list for sentiment analysis in
microblogs. In: arXiv preprint arXiv:1103.2903 (2011)
11. Mohammad, S. M., Turney, P. D.: Crowdsourcing a word-emotion association
lexicon. In: Computational Intelligence, 29(3), pp. 436–465 (2013)
12. Thelwall, M., Buckley, K., Paltoglou, G.: Sentiment strength detection for the
social web. In: Journal of the American Society for Information Science and Tech-
nology, 63(1), pp. 163–173 (2012)
13. Ekman, P.: An argument for basic emotions. Cognition and Emotion, 6(3-4), pp.
169–200 (1992)
14. Plutchik, R.: The psychology and biology of emotion. HarperCollins College Pub-
lishers (1994)
15. Speriosu, M., Sudan, N., Upadhyay, S., Baldridge, J.: Twitter polarity classification
with label propagation over lexical links and the follower graph. In: Proceedings of
the First workshop on Unsupervised Learning in NLP, pp. 53–63 (2011)
16. Bird, S.: NLTK: the natural language toolkit. In: Proceedings of the COLING/ACL
on Interactive presentation sessions, pp. 69–72 (2006)