A Comparative Study on Twitter Sentiment
Analysis: Which Features are Good?
Fajri Koto and Mirna Adriani
Faculty of Computer Science, University of Indonesia
Depok, Jawa Barat, Indonesia 16423
fajri91@ui.ac.id, mirna@cs.ui.ac.id
http://www.cs.ui.ac.id
Abstract. In this paper we investigate Sentiment Analysis on the well-known social medium Twitter. The literature shows that previous work on Twitter Sentiment Analysis has introduced interesting feature ideas, but no comparative study has yet established which features perform best. In total we used 9 feature sets (41 attributes) comprising punctuation, lexical, part-of-speech, emoticon, SentiWord lexicon, AFINN lexicon, Opinion lexicon, Senti-Strength method, and Emotion lexicon features. We analyzed the features by running supervised classification on each feature set separately, followed by feature selection, in both the subjectivity and the polarity domain. Using four different datasets, the results reveal that the AFINN lexicon and the Senti-Strength method are currently the best approaches for Twitter Sentiment Analysis.
Keywords: Twitter, Sentiment Analysis, Comparative Study, Polarity,
Subjectivity
1 Introduction

In general, the goal of Sentiment Analysis is to determine the polarity of natural language text by performing supervised and/or unsupervised classification. Sentiment classification can be roughly divided into two categories: subjectivity and polarity [1]. The difference between the two lies in the classes involved in the training and testing stages: subjectivity classification distinguishes the subjective and objective classes [2], whereas polarity classification involves the positive, negative and neutral classes [3].

Many approaches [4-12] have been proposed to classify sentiment on Twitter^1. However, no previous study has compared them to show which features work well for Sentiment Analysis, even though this information is valuable, especially for today's businesses that rely on social media analysis. Driven by this fact, we first derive all candidate features and then investigate them by performing supervised classification on each feature set.

^1 http://www.twitter.com
Table 1. List of all feature sets for Twitter Sentiment Analysis

Punctuation [3]; 5 attributes; range = {0,1,..,n}.
  Number of "!", "?", ".", ",", and special characters: the count of the
  corresponding punctuation in a tweet.

Lexical; 9 attributes; range1 = {0,1,..,n}, range2 = {false,true}.
  1) tweetLength, #lowercase, #uppercase, aggregate {min, max, avg} of
     #letterInWord, #hashtag: the corresponding counts for the tweet.
  2) haveRT: true if the tweet contains the "RT" phrase, false otherwise.

Part of Speech, extracted with NLTK Python [16]; 8 attributes;
range1 = {0,1,..,n}, range2 = {false,true}.
  1) #noun, #verb, #adjective, #adverb, #pronoun: number of occurrences of
     the corresponding POS tag in a tweet.
  2) hasComparative, hasSuperlative, hasPastParticiple: true if the tweet
     contains a comparative/superlative adjective or adverb, or a past
     participle; false otherwise.

Emoticon, obtained from [3][5] and Wikipedia; 1 attribute;
range = {-n,..,0,..,n}.
  emoticonScore: starting from 0, the score is increased by 1 for each
  positive emoticon and decreased by 1 for each negative emoticon.

SentiWord Lex. [8]; 2 attributes; range = {0,1,..,n}.
  sumpos, sumneg: sum of the scores of the positive or negative words that
  match the lexicon.

AFINN Lex. [9][10]; 2 attributes; range1 = {0,1,..,n}, range2 = {-n,..,-1,0}.
  1) APO: sum of the scores of the positive words that match the lexicon.
  2) ANE: sum of the scores of the negative words that match the lexicon.

Opinion Lex. (OL); 4 attributes; range = {0,1,..,n}.
  1) Wilson (positive words, negative words) [6] and
  2) Bingliu (positive words, negative words) [7]: sum of the scores of the
     positive or negative words that match the lexicon.

Senti-Strength (SS) [12]; 2 attributes; range1 = {-5,-4,..,-1},
range2 = {1,2,..,5}.
  1) ssn: method score for the negative category.
  2) ssp: method score for the positive category.

NRC Emotion Lex. [11][13][14]; 8 attributes; range = {0,1,..,n}.
  joy, trust, sadness, anger, surprise, fear, disgust, anticipation: number
  of words that match the corresponding emotion class word list.
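To make the attribute definitions concrete, the following minimal sketch (in Python, which the paper already uses for POS tagging via NLTK [16]) computes three of the feature sets above: the punctuation counts, emoticonScore, and the AFINN attributes APO and ANE. The function names are ours, and the emoticon and AFINN entries are tiny placeholders rather than the full lexicons from [3][5], Wikipedia, and [9][10].

```python
# Sketch of three feature sets from Table 1; lexicons below are placeholders.
POS_EMOTICONS = {":)", ":-)", ":D"}                      # placeholder list
NEG_EMOTICONS = {":(", ":-(", ":'("}                     # placeholder list
AFINN = {"good": 3, "love": 3, "bad": -3, "awful": -3}   # placeholder scores

def punctuation_features(tweet):
    """Counts of '!', '?', '.', ',' and other special characters."""
    special = sum(1 for c in tweet
                  if not c.isalnum() and not c.isspace() and c not in "!?.,")
    return [tweet.count("!"), tweet.count("?"),
            tweet.count("."), tweet.count(","), special]

def emoticon_score(tweet):
    """emoticonScore: starts at 0; +1 per positive, -1 per negative emoticon."""
    tokens = tweet.split()
    return (sum(t in POS_EMOTICONS for t in tokens)
            - sum(t in NEG_EMOTICONS for t in tokens))

def afinn_features(tweet):
    """APO/ANE: sums of positive and negative word scores in the lexicon."""
    scores = [AFINN[w] for w in tweet.lower().split() if w in AFINN]
    return (sum(s for s in scores if s > 0),   # APO, range {0,1,..,n}
            sum(s for s in scores if s < 0))   # ANE, range {-n,..,-1,0}

tweet = "so good :) !!"
print(punctuation_features(tweet), emoticon_score(tweet), afinn_features(tweet))
```

The remaining lexicon-based sets (SentiWord, Opinion, NRC Emotion) follow the same matching pattern with their respective word lists.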
2 Experiments with Sentiment Analysis Features
Table 2. Balanced datasets

Subjectivity   Sanders   HCR   OMD    SemEval
#neutral       1190      280   800    2256
#objective     1190      280   800    2256
#total         2380      560   1600   4512

Polarity       Sanders   HCR   OMD    SemEval
#negative      555       368   800    896
#positive      555       368   800    896
#total         1110      736   1600   1792
The experiments were conducted in two sentiment domains: polarity and subjectivity. Four different datasets were used in this work: 1) Sanders [1], 2) Health Care Reform (HCR) [15], 3) Obama-McCain Debate (OMD) [15], and 4) the International Workshop SemEval-2013 (SemEval)^2 data (see Table 2). In total we used 9 feature sets (41 attributes) comprising punctuation, lexical, part-of-speech, emoticon, SentiWord lexicon, AFINN lexicon, Opinion lexicon, Senti-Strength method, and Emotion lexicon features (see Table 1). The preprocessing stage was adjusted to the type of feature and comprises removing usernames, URLs, the RT phrase, special characters, and stopwords; converting to lowercase; and stemming and lemmatization (a sketch of these steps is given below). In the first experiment, we conducted binary classification for each feature set on each dataset. We then performed feature selection over all feature sets (merging the features into a single set of 41 attributes) on all rows of the datasets.

^2 http://www.cs.york.ac.uk/semeval-2013/
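A minimal sketch of these preprocessing steps, assuming NLTK [16] for stemming; the stopword list here is a small placeholder, and which steps are applied in practice depends on the feature set being extracted:

```python
import re
from nltk.stem import PorterStemmer   # the paper uses NLTK [16]

STOPWORDS = {"a", "an", "the", "is", "to", "with"}   # placeholder list
stemmer = PorterStemmer()

def preprocess(tweet, remove_stopwords=True, stem=True):
    t = re.sub(r"@\w+", "", tweet)            # remove usernames
    t = re.sub(r"http\S+|www\.\S+", "", t)    # remove URLs
    t = re.sub(r"\bRT\b", "", t)              # remove the RT phrase
    t = re.sub(r"[^A-Za-z\s]", "", t)         # remove special characters
    tokens = t.lower().split()                # convert to lowercase, tokenize
    if remove_stopwords:
        tokens = [w for w in tokens if w not in STOPWORDS]
    if stem:
        tokens = [stemmer.stem(w) for w in tokens]
    return tokens

print(preprocess("RT @user: The weather is AWESOME today! http://t.co/x"))
```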
The results of these experiments are summarized in Table 3 and Table 4. The letters A, B, C, and D in both tables denote the Naive Bayes, Neural Network, SVM, and Linear Regression classifiers, respectively; a minimal version of this classification setup is sketched below. In the first experiment (see Table 3), we identified the top-5 feature sets on each dataset according to accuracy. For both classification tasks, AFINN, Senti-Strength, and the Opinion lexicon are the feature sets most often found in the top-5 on each dataset, whereas the well-known SentiWord lexicon cannot beat them. This affirms that SentiWord is not well suited to Twitter Sentiment Analysis. Our results also show that emotion and punctuation are good features for Twitter Sentiment Analysis, especially in polarity classification.
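The paper does not name its classification toolkit or evaluation protocol, so the sketch below is only a stand-in: scikit-learn with a simple train/test split, GaussianNB playing the role of classifier A (Naive Bayes), and random placeholder data where an extracted feature set would go.

```python
# Stand-in for experiment 1: binary classification on one feature set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.random((500, 5))      # placeholder: e.g. the 5 punctuation attributes
y = rng.integers(0, 2, 500)   # placeholder: 0 = objective, 1 = subjective

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = GaussianNB().fit(X_tr, y_tr)            # classifier A (Naive Bayes)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```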
Table 4 shows the result of our second experiment, feature selection (a stand-in implementation is sketched below). The column for each classifier (A, B, C, and D) gives the number of attributes from each feature set that were retained by feature selection. The table reveals that punctuation, AFINN, and Senti-Strength are the most frequently selected features in both subjectivity and polarity classification. This agrees with the previous experiment and affirms that AFINN and Senti-Strength are currently the best features for Twitter Sentiment Analysis. They are therefore well suited as baselines for future work on Twitter Sentiment Analysis.
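The selection algorithm itself is not specified in the paper; the sketch below uses univariate selection (scikit-learn's SelectKBest) purely as an illustration of picking attributes from the merged 41-attribute set.

```python
# Illustrative feature selection over the merged 41-attribute set.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.random((500, 41))     # placeholder: all 9 feature sets merged
y = rng.integers(0, 2, 500)   # placeholder subjectivity or polarity labels

selector = SelectKBest(f_classif, k=10).fit(X, y)
print("selected attribute indices:", np.flatnonzero(selector.get_support()))
```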
Table 3. Classification accuracy (%) for each feature set
(A = Naive Bayes, B = Neural Network, C = SVM, D = Linear Regression)

Subjectivity
Feature   |      SemEval      |      Sanders      |        HCR        |        OMD
          | A    B    C    D  | A    B    C    D  | A    B    C    D  | A    B    C    D
Punct.    |56.4 56.7 55.4 57.4|57.6 57.3 57.3 59.3|56.1 59.6 58.6 62.7|56.1 62.1 56.4 60.5
Lexical   |51.3 51.8 51.7 54.9|56.7 55.9 55.9 55.4|59.1 59.5 59.3 58.8|52.8 71.8 58.4 68.3
POS       |52.4 56.3 55.1 57.4|60.6 60.2 61.4 61.9|59.6 57.0 59.6 59.3|51.1 49.9 50.7 50.8
Emoticon  |54.7 53.0 53.4 53.4|53.4 51.3 50.0 48.2|50.7 51.6 51.3 50.5|50.6 49.8 49.8 50.4
SentiWord |58.6 60.6 60.2 60.4|60.4 62.6 60.7 61.1|56.1 57.4 56.3 54.3|50.8 50.2 50.6 47.6
AFINN     |64.3 68.8 68.7 68.8|61.2 65.1 64.0 64.8|60.7 63.2 62.3 61.9|51.1 52.1 50.8 51.3
OL        |62.0 62.4 62.3 62.7|60.5 63.8 58.9 63.1|61.6 60.5 59.6 61.6|66.4 66.3 63.9 66.1
SS        |63.6 66.8 65.9 65.9|62.7 64.5 63.8 64.6|60.9 58.9 60.2 60.7|55.9 55.1 56.4 55.9
Emotion   |57.0 58.1 58.6 59.1|58.2 56.3 57.2 57.1|56.1 59.8 52.5 55.9|51.2 50.1 51.0 50.6

Polarity
Feature   |      SemEval      |      Sanders      |        HCR        |        OMD
          | A    B    C    D  | A    B    C    D  | A    B    C    D  | A    B    C    D
Punct.    |61.8 59.8 61.5 62.1|57.7 57.5 56.2 57.8|63.6 62.2 60.2 64.7|58.8 59.1 59.9 59.2
Lexical   |54.3 55.9 54.5 56.4|59.6 59.4 58.7 62.7|54.2 59.6 52.8 60.5|50.5 53.6 55.2 55.3
POS       |55.7 55.7 56.1 55.8|57.6 62.6 61.0 60.6|49.9 49.5 50.8 49.1|57.6 56.9 57.0 56.5
Emoticon  |55.0 55.3 55.1 55.1|52.8 52.1 52.2 53.3|51.4 49.6 48.8 49.6|49.6 50.3 50.6 50.6
SentiWord |60.9 60.7 60.6 60.4|58.7 56.1 59.3 56.2|56.3 54.3 54.5 55.0|52.7 52.1 52.7 53.9
AFINN     |74.3 75.2 75.2 75.2|69.8 70.9 70.6 71.1|60.5 58.7 60.3 60.1|62.7 62.8 62.5 62.8
OL        |68.5 70.2 70.1 69.8|70.0 68.2 69.2 70.2|59.6 59.3 61.0 61.7|60.3 62.9 61.1 58.9
SS        |72.9 75.2 73.0 74.9|72.3 71.8 72.2 72.3|59.7 60.5 59.7 58.7|62.5 62.6 61.7 62.5
Emotion   |66.4 66.2 66.4 68.5|65.7 65.1 63.9 66.5|55.8 52.6 55.3 55.9|59.1 57.9 57.6 59.2
Table 4. Feature selection result: number of attributes from each feature
set retained, per classifier

Feature   #Attr |   Subjectivity    |     Polarity
                | A    B    C    D  | A    B    C    D
Punct.      5   | 1    3    1    1  | 2    2    2    2
Lexical     9   | 4    1    2    -  | 1    2    3    1
POS         8   | 2    -    1    -  | 3    -    1    -
Emoticon    1   | -    1    -    -  | -    1    1    1
SentiWord   2   | 1    1    1    -  | -    1    1    -
AFINN       2   | 2    1    2    2  | 1    2    2    2
OL          4   | 1    2    1    1  | -    -    2    1
SS          2   | 2    2    2    2  | 1    1    1    1
Emotion     8   | -    3    2    -  | -    -    2    1
Accuracy        |65.5 67.4 63.4 66.0|71.5 73.9 73.5 75.0
3 Conclusion and Future Work

In this work, a comparative study of various features for Twitter Sentiment Analysis was carried out using four different datasets and nine feature sets. Our experiments reveal that AFINN and Senti-Strength are currently the best features for Twitter Sentiment Analysis. According to the results, other features such as punctuation, the Opinion lexicon, and the emotion lexicon are also worth considering. Future research may revisit this comparison as new feature ideas are released and investigated.
References
1. Bravo-Marquez, F., Mendoza, M., Poblete, B.: Combining strengths, emotions and polarities for boosting Twitter sentiment analysis. In: Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining, 2 (2013)
2. Raaijmakers, S., Kraaij, W.: A Shallow Approach to Subjectivity Classification. In:
ICWSM (2008)
3. Aisopos, F., Papadakis, G., Tserpes, K., Varvarigou, T.: Content vs. context for sentiment analysis: a comparative analysis over microblogs. In: Proceedings of the 23rd ACM Conference on Hypertext and Social Media, pp. 187–196 (2012)
4. Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using distant supervision. In: CS224N Project Report, Stanford (2009)
5. Agarwal, A., Xie, B., Vovsha, I., Rambow, O., Passonneau, R.: Sentiment analysis
of twitter data. In: Proceedings of the Workshop on Languages in Social Media, pp.
30–38 (2011)
6. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 347–354 (2005)
7. Liu, B., Hu, M., Cheng, J.: Opinion observer: analyzing and comparing opinions on the web. In: Proceedings of the 14th International Conference on World Wide Web, pp. 342–351 (2005)
8. Baccianella, S., Esuli, A., Sebastiani, F.: SentiWordNet 3.0: An Enhanced Lexical
Resource for Sentiment Analysis and Opinion Mining. In: LREC, Vol. 10, pp. 2200–
2204 (2010)
9. Bradley, M. M., Lang, P. J.: Affective norms for English words (ANEW): Instruction
manual and affective ratings. In: Technical Report C-1, The Center for Research in
Psychophysiology, University of Florida, pp. 1–45 (1999)
10. Nielsen, F. A.: A new ANEW: Evaluation of a word list for sentiment analysis in microblogs. In: arXiv preprint arXiv:1103.2903 (2011)
11. Mohammad, S. M., Turney, P. D.: Crowdsourcing a word-emotion association lexicon. In: Computational Intelligence, 29(3), pp. 436–465 (2013)
12. Thelwall, M., Buckley, K., Paltoglou, G.: Sentiment strength detection for the
social web. In: Journal of the American Society for Information Science and Tech-
nology, 63(1), pp. 163–173 (2012)
13. Ekman, P.: An argument for basic emotions. Cognition and Emotion, 6(3-4), pp.
169–200 (1992)
14. Plutchik, R.: The psychology and biology of emotion. HarperCollins College Pub-
lishers (1994)
15. Speriosu, M., Sudan, N., Upadhyay, S., Baldridge, J.: Twitter polarity classification with label propagation over lexical links and the follower graph. In: Proceedings of the First Workshop on Unsupervised Learning in NLP, pp. 53–63 (2011)
16. Bird, S.: NLTK: the natural language toolkit. In: Proceedings of the COLING/ACL
on Interactive presentation sessions, pp. 69–72 (2006)