Université d’Aix-Marseille
Ecole Doctorale 184
Faculté des Sciences et Technique
LSIS UMR CNRS 7296 / Dimag
LIF UMR 7279
CLEO-OpenEdition
Thesis presented to obtain the degree of Doctor
Speciality: Computer Science
Sentiment Analysis in Social Media
Hussam Hamdan
Defended on 01/12/2015 before the jury:
Pr. Patrice Bellot, Université Aix-Marseille, Thesis supervisor
Pr. Frédéric Béchet, Université Aix-Marseille, Thesis supervisor
Pr. Béatrice Daille, Université Nantes, Jury president
HDR. Patrick Paroubek, Université Paris 11 (Paris Sud), Reviewer
Pr. Jacques Savoy, Université Neuchâtel (Switzerland), Examiner
MCF-HDR. Julien Velcin, Université Lyon 2, Reviewer
Acknowledgements
This thesis is part of the AgoraWeb project, funded by Région PACA (Provence-Alpes-Côte d'Azur). The AgoraWeb project aims at analyzing the tweets and online reviews related to the books of the OpenEdition Books platform, in order to use this social information to help a book recommendation system. Three laboratories are involved in this project:
1. Le Centre pour l'édition électronique ouverte (CLEO).
2. Le Laboratoire des Sciences de l'Information et des Systèmes (LSIS).
3. Le Laboratoire d'Informatique Fondamentale de Marseille (LIF).
I would like to thank professors Patrice Bellot and Frédéric Béchet, who gave me the opportunity to do my PhD under their supervision. I would like to express my thanks to all members of the three laboratories with whom I had the chance to work and discuss.
I would like to give special thanks to Dr. Patrick Paroubek, Dr. Julien Velcin, Dr. Béatrice Daille, and Dr. Jacques Savoy for being on my thesis committee.
I wish to express my gratitude to my family and my friends who have supported me during my PhD research.
Table of Contents
Abstract
1 Context and Motivations
   1.1 Introduction
   1.2 Context and Motivations
   1.3 Research Objectives
   1.4 Thesis Organization
2 Sentiment Analysis and Machine Learning
   2.1 Introduction
   2.2 Opinion Definition
   2.3 Sentiment Analysis Levels
   2.4 Sentiment Analysis Tasks
   2.5 Sentiment Analysis Approaches
      2.5.1 Lexicon-Based Approach
      2.5.2 Supervised Approach
   2.6 Machine Learning
      2.6.1 Text Representation
      2.6.2 Support Vector Machines (SVM)
      2.6.3 Logistic Regression (LR)
      2.6.4 Conditional Random Fields (CRF)
      2.6.5 Classifier Evaluation
         2.6.5.1 Precision, Recall, F-measure and Accuracy
         2.6.5.2 Micro/Macro Measures
   2.7 Summary and Discussion
3 Supervised Metrics for Term Weighting in Sentiment Analysis
   3.1 Introduction
   3.2 Related Work
   3.3 Term Weighting Metrics
      3.3.1 Local Weight
      3.3.2 Global Weight
      3.3.3 Normalization
      3.3.4 Score Aggregation
   3.4 Datasets
      3.4.1 Twitter Dataset
      3.4.2 Restaurant and Laptop Reviews Datasets
   3.5 Experiments
      3.5.1 Experiment Setup
      3.5.2 Experiment Evaluations
   3.6 Conclusion and Future Work
4 Feature Extraction for Twitter Sentiment Analysis
   4.1 Introduction
   4.2 Problem Formulations
   4.3 Overview of the Proposed Approach
   4.4 Feature Extraction
      4.4.1 Word ngrams
      4.4.2 Negation Features
      4.4.3 Twitter Dictionary
      4.4.4 Z Score Features
      4.4.5 Semantic Features
         4.4.5.1 Brown Dictionary Features
         4.4.5.2 Topic Features
         4.4.5.3 Semantic Role Labeling Features
      4.4.6 Sentiment Lexicons
         4.4.6.1 Manually Constructed Sentiment Lexicons
         4.4.6.2 Automatically Constructed Sentiment Lexicons
      4.4.7 Our Sentiment Lexicon
   4.5 Experiments and Results
      4.5.1 Twitter Dataset
      4.5.2 Experiment Setup
      4.5.3 Results
   4.6 Ranking Twitter Terms According to their Positivity
      4.6.1 Score Computing
   4.7 Conclusion and Future Work
5 From Term Polarity to Text Polarity Detection
   5.1 Introduction
   5.2 Related Work
   5.3 Correlation-Based Classification Model
   5.4 Experiments and Evaluations
      5.4.1 Training and Testing Data
      5.4.2 Correlation Metrics Evaluation
      5.4.3 Correlation-Based Model Evaluation
   5.5 Conclusion and Future Work
6 Aspect-Based Sentiment Analysis
   6.1 Introduction
   6.2 Related Work
   6.3 Opinion Target Extraction in Restaurant Reviews
      6.3.1 Dataset
      6.3.2 Opinion Target Extraction
         6.3.2.1 Experiments and Results
      6.3.3 Sentiment Polarity
         6.3.3.1 Experiments and Results
   6.4 Opinion Target Extraction in Book Reviews
      6.4.1 Book Review Corpus Annotation
      6.4.2 Opinion Target Extraction
      6.4.3 Experiments and Results
      6.4.4 Sentiment Polarity
      6.4.5 Experiments
   6.5 Conclusion and Future Work
Conclusion and Future Work
Appendices
7 Online Book Review Classification
   7.1 Introduction
   7.2 System Description
   7.3 Annotated Corpus
      7.3.1 Feature Extraction
      7.3.2 Experiments and Evaluation
   7.4 Conclusion
8 Using DFR for Book Recommendation
   8.1 Introduction
   8.2 Dataset
   8.3 Information Retrieval Model
   8.4 Experiments and Results
      8.4.1 Experiment Setup
      8.4.2 Results
   8.5 Conclusion and Perspectives
Abstract
In this thesis, we address the problem of sentiment analysis. More specifically, we are interested in analyzing the sentiment expressed in social media texts such as tweets, customer reviews about restaurants, laptops or hotels, and scholarly book reviews written by experts.
We focus on two main tasks: sentiment polarity detection, in which we aim to determine the polarity (positive, negative or neutral) of a given text, and opinion target extraction, in which we aim to extract the opinion targets towards which people tend to express their opinions (e.g. food, pizza and service are opinion targets in restaurant reviews).
Our main objective is to construct state-of-the-art systems for these two tasks. Therefore, for evaluation purposes, we have participated in the International Workshop on Semantic Evaluation (SemEval), where we chose two tasks: (1) Sentiment analysis in Twitter, in which we seek to determine the polarity of a tweet, and (2) Aspect-Based sentiment analysis, which aims to extract the opinion targets in restaurant reviews and then to determine the polarity of each target. We have also applied and evaluated our methods on a French book review corpus constructed by the OpenEdition 1 team, in which we also extract the opinion targets and their polarities.
1. http://www.openedition.org/
Our proposed methods are supervised for both tasks:
1. For sentiment polarity detection, we address three points: term weighting, feature extraction and the classification method. We first study several supervised term weighting metrics and analyze the behavior of the metrics which give good performance. Then, we enrich the document representation by extracting several groups of features. As the features extracted from sentiment lexicons seem to be the most influential ones, we propose a new metric, called natural entropy, to construct an automatic sentiment lexicon from a noisily labeled Twitter corpus, and we combine the features extracted from this lexicon to improve the performance. The evaluation demonstrates that this rich feature extraction process can produce a state-of-the-art system in sentiment analysis. After these experiments with term weighting and feature extraction using classic classification methods such as Support Vector Machines and Logistic Regression, we found that it is difficult to understand the decisions of those classic methods. Therefore, we propose a simple and interpretable model for estimating the polarity of a text. This new model relies on a bottom-up approach, going from word polarity to text polarity detection. Our first experiments show that this new model seems promising and could outperform the classic methods.
2. For opinion target extraction, we adopt a Conditional Random Field model with a feature extraction process; most of the extracted features have proved their usefulness in entity extraction problems. We applied this model to extract the opinion targets in English restaurant reviews and French scholarly book reviews.
Chapter 1
Context and Motivations
Contents
1.1 Introduction
1.2 Context and Motivations
1.3 Research Objectives
1.4 Thesis Organization
1.1 Introduction
The automatic detection of sentiment or opinion is the main subject of this PhD thesis. The main concern is the detection of sentiment in social media texts. This involves diverse subtasks, such as identifying the polarity of a given text (positive, negative, or neutral) and extracting the attributes (or aspects) of a given product or service in order to determine their polarities.
There are two main approaches to sentiment analysis. The first is the lexicon-based approach, in which we first determine the polarity of each word using different unsupervised techniques, then aggregate the scores of the words composing the text in order to get the polarity score of the text. The second one is supervised, in which we need a labeled corpus for training a classification method that builds a classification model, which is then used to predict the polarity of new texts.
Sentiment analysis can be done at different levels of text granularity: document, sentence, phrase, clause and word. Our main work can be seen as sentence- or phrase-level sentiment analysis. We try to determine the polarity of posts in Twitter (sentence level) and of reviews of restaurants, laptops, hotels and French books, which can be seen as sentence- or clause-level sentiment analysis because we first segment each review into sentences and then either determine the polarity of each sentence or extract the opinion targets from each sentence and determine the polarity of each opinion target in its context.
The approaches to solve the problems discussed above encompass a variety of research areas such as Machine Learning, Natural Language Processing and Information Retrieval. The problem of sentiment analysis has been widely studied and different approaches have been applied. The focus of this thesis is to build a state-of-the-art system which can predict the sentiment of a new text in an effective manner. Therefore, we adopt a supervised classification approach and propose different techniques, such as term weighting and feature extraction, to improve it and obtain such a system. Furthermore, we propose a new model based on a correlation metric which can avoid some drawbacks of the classic supervised methods. The main evaluation of system performance is done through participating in SemEval-2015 1, in addition to the evaluation on the book review corpus constructed by OpenEdition 2.
The remainder of this chapter is organized as follows: Section 2 presents the context and motivations for this work. Research objectives are presented in Section 3, followed by an overview of the thesis in Section 4.
1.2 Context and Motivations
Sentiment analysis of user-generated text is interesting for many practical reasons.
It could be exploited in marketing, politics and social analysis. For example, a manu-
facturer or a service provider could be interested in collecting the customers’ opinions
1. http://alt.qcri.org/semeval2015/
2. http://www.openedition.org/
about their products or services.
Users and customers have become more and more interested in other people's opinions. For example, someone who wants to buy a book, book a hotel or a restaurant, or watch a movie will be interested in knowing the opinions of other people in order to make a decision. Thus, sentiment analysis can be helpful for recommendation purposes. While well-known search engines such as Google retrieve the information which is relevant to a user's query, a more intelligent search can take the crowd's opinion into account, which makes the recommendations more appropriate.
This research is part of the AgoraWeb project, which aims to analyze the reviews of books in order to capture the sentiment expressed towards each book, which may play an important role in a book recommendation system. Figure 1.1 illustrates the main phases of the AgoraWeb project.
AgoraWeb starts from the book list available on the OpenEdition Books platform. Then, a query composed of each book title and some keywords is launched on Google (see Chapter 8, which explains this system). AgoraWeb next retrieves all scholarly book reviews related to each book from Google's results, and it also takes the related reviews available on the OpenEdition Books platform 3. It also collects the tweets which contain a link to these books. Then, a sentiment analysis is performed to determine the polarity of each tweet, and the sentiment of each aspect or facet in the book reviews, which requires extracting the aspects first. In this way, we can understand the opinions of people towards each book, which may be useful for a book recommendation system.
Thus, our research concerns the detection of sentiment in Twitter and aspect-based
sentiment analysis in which we should extract the opinion targets and their sentiments.
We are interested in evaluating our methods in different domains like restaurant, laptop
and hotel reviews in addition to tweets and book reviews.
3. OpenEdition is the umbrella portal for OpenEdition Books, Revues.org, Hypotheses and Calenda, four platforms dedicated to electronic resources in the social and human sciences (books, journals, research blogs, and academic announcements respectively). OpenEdition Books is the newest OpenEdition platform; it distributes reference books from publishers in the social and human sciences. Its aim is to build an international library in the digital humanities, while encouraging publishers to adopt Open Access in the long term. By spring 2014, 1,271 books were available on this platform.
Figure 1.1 – AgoraWeb project.
1.3 Research Objectives
The main objective of this thesis is to build a state-of-the-art sentiment analysis
system. Many research papers have reported that supervised classification methods outperform the lexicon-based or unsupervised ones. The research in supervised methods for sentiment analysis has addressed three points:
1. Term Weighting: researchers normally use a bag of words to represent the document, each document being a weighted vector of its terms. They often use the term frequency or binary weights, but several studies have focused on how the classification performance can be improved by using more complex weighting schemas such as Pair-wise Mutual Information and Information Gain.
2. Document Representation: the question is whether we can represent a document by its terms only or whether we need to extract additional features in order to improve the classification performance. This is the objective of the feature extraction and feature selection processes.
3. Classification Method: many methods have been applied using different term weights and document representations. The most used methods are Support Vector Machines, Logistic Regression, Maximum Entropy, Conditional Random Fields and Naive Bayes.
In this thesis we are interested in two main tasks : sentiment analysis in Twitter and
Aspect-Based sentiment analysis. Aspect-Based sentiment analysis consists of several subtasks; we focus on extracting the opinion targets and detecting their sentiment polarities. Therefore, we can distinguish between two different but integrated goals:
— Building a sentiment analysis system for tweets, customer reviews and scholarly French book reviews.
— Extracting opinion target expressions in English restaurant reviews and French book reviews.
Figure 1.2 illustrates how Aspect-Based sentiment analysis can be decomposed into two tasks: (1) opinion target extraction, in which we are interested in extracting the appropriate representation for each term in the sentence in order to tag the sentence, and (2) sentiment polarity detection, for which we work in three directions.
Figure 1.2 – Our research directions in Twitter and Aspect-Based sentiment analysis.
For building a sentiment analysis system, we follow three main directions in supervised sentiment analysis. We first propose different global and local weighting schemas and study their effects on the system performance (Chapter 3). Second, we propose to enrich the document representation by extracting different groups of features and evaluate the influence of each feature group (Chapters 4 and 6). Third, we propose a simple new supervised classifier based on a term weighting schema; this model produces interpretable results, which gives us the possibility to understand its decisions (Chapter 5). These three directions aim to learn a classification model which will be used to predict the polarity of new texts.
For extracting the opinion targets, we also propose to use a supervised tagging method with a feature extraction process. Thus, we work on enriching the representation of each term in each document in order to get an effective system (Chapter 6 deals with this problem in restaurant and book reviews). A sequential tagging method is used to learn a tagging model which can tag the terms in new texts.
1.4 Thesis Organization
The rest of this thesis consists of five chapters and the appendices, which contain two further chapters.
Chapter 2 gives an overview of sentiment analysis: the definition of opinion, the sentiment analysis tasks and levels, and the two main approaches to sentiment analysis (supervised and lexicon-based). It also describes the machine learning concepts which will be used in this thesis; three supervised methods will be presented: Support Vector Machines (SVM), Logistic Regression (LR) and Conditional Random Fields (CRF).
Chapter 3 focuses on the supervised weighting metrics which will be used to weight the terms, using these weights instead of term frequency, and studies the effect of term weighting on sentiment classification in short texts when using SVM.
Chapter 4 presents our work on Twitter sentiment analysis. We describe our system for sentiment classification; this system uses a logistic regression classifier with several types of features in order to construct a state-of-the-art sentiment analyzer. This system participated in SemEval-2015 and ranked third out of 40 teams. We then propose a new method to construct an automatic sentiment lexicon and use this new lexicon in our system, which makes it more effective. As sentiment lexicons have demonstrated an influential impact on sentiment analysis, we have also participated in another SemEval task which aims at ranking Twitter terms according to their association with positive sentiment; our system, which combines the scores from different lexicons, ranked first and second out of 11 teams according to the Spearman and Kendall metrics, respectively.
Chapter 5 presents a new model for sentiment analysis in short texts. After our experiments on term weighting with SVM (Chapter 3) and feature extraction with Logistic Regression (Chapter 4), we found that it is difficult to interpret the decisions of these classifiers. Therefore, we propose a simple model based on the supervised weighting metrics. This model goes from computing the polarity of each term using an annotated corpus to the polarity of a text by aggregating the scores of its terms. Our experimental results show that this model could outperform classic models such as SVM and Logistic Regression.
Chapter 6 is dedicated to Aspect-Based sentiment analysis, where we first extract the opinion target expressions present in restaurant reviews and then determine the polarity of each target. A Conditional Random Field model has been used for opinion target extraction, and a supervised system based on logistic regression has been used for sentiment polarity. The two systems participated in SemEval-2015 and obtained good rankings. We also adopt the same methods to extract the opinion targets and their polarities in French book reviews extracted from the OpenEdition Books platform.
In the conclusion, we present a summary of the research that has been done in this thesis and the major scientific contributions in the domain of sentiment analysis.
We also present two chapters in the appendices :
Chapter 7 presents an online book review finder which we built for collecting book reviews on the Web. From a predefined list of books, we generate a query for each book and launch it on Google; we take the first 20 retrieved pages and determine whether or not they are book reviews by using a supervised book review classification system constructed with a corpus annotated by the OpenEdition team.
Chapter 8 describes our participation in the INEX 4 Social Book Search track 2014. We proposed to use a Divergence From Randomness (DFR) model implemented in Terrier 5 to rank the relevant books for a user query; our team was ranked third with this system.
4. http://inex.mmci.uni-saarland.de/tracks/books/
5. http://terrier.org/
Chapter 2
Sentiment Analysis and Machine
Learning
Abstract
In this chapter, we define opinion and present the sentiment analysis tasks and approaches. A literature overview of the two main approaches in sentiment analysis will be covered. A definition of machine learning and the supervised classification methods will also be presented. These methods will be used during this research to implement our sentiment analysis systems.
Contents
2.1 Introduction
2.2 Opinion Definition
2.3 Sentiment Analysis Levels
2.4 Sentiment Analysis Tasks
2.5 Sentiment Analysis Approaches
   2.5.1 Lexicon-Based Approach
   2.5.2 Supervised Approach
2.6 Machine Learning
   2.6.1 Text Representation
   2.6.2 Support Vector Machines (SVM)
   2.6.3 Logistic Regression (LR)
   2.6.4 Conditional Random Fields (CRF)
   2.6.5 Classifier Evaluation
      2.6.5.1 Precision, Recall, F-measure and Accuracy
      2.6.5.2 Micro/Macro Measures
2.7 Summary and Discussion
2.1 Introduction
In this chapter, we formally define the problem of sentiment analysis, or opinion mining. We also present the key tasks of sentiment analysis, review the literature of the sentiment analysis domain and present the machine learning techniques which will be exploited during this research.
When we think about how an automatic program can determine the sentiment of a text (positive or negative), the first idea which comes to mind is to count the number of positive words and negative ones in the text; if the positive ones are more numerous than the negative ones, the text is attributed a positive sentiment, otherwise a negative one. For this method we need to know the sentiment of the words in the text; therefore we should have a predefined list of positive words such as (happy, good, great) and a list of negative words such as (bad, sad, terrible).
We can suppose that the opinionated words are the main key to sentiment detection. In fact, there are two main approaches in sentiment analysis. The first one, called the lexicon-based approach, requires the sentiment score of each word in advance in order to get the sentiment score of the whole text. The second approach works the other way around: it requires a set of labeled texts, starts from the texts and learns the sentiment score of each word in order to use these scores as weights to get the sentiment of new texts. This approach is called supervised.
We will cover these two main approaches in this chapter. Since the supervised approach is widely used and the more precise one, we will present the supervised techniques which we use for constructing the different sentiment analysis systems in this research.
The remainder of this chapter is organized as follows: Section 2 defines the opinion, Section 3 exposes the different levels in sentiment analysis, Section 4 describes the different tasks in sentiment analysis, while Section 5 discusses the two main approaches in sentiment analysis. Section 6 presents the machine learning concepts and the techniques used for supervised classification and how to evaluate them, and Section 7 summarizes and discusses the main ideas of this chapter.
2.2 Opinion Definition
Opinion itself is a broad concept. In general, the opinion is a quintuple (Liu, 2012):

$(e_i, a_{ij}, s_{ijkl}, h_k, t_l)$

where $e_i$ is the name of an entity, an entity being a thing of interest about which data is to be held (i.e. restaurant, laptop, book, hotel); $a_{ij}$ is an aspect of the entity; $s_{ijkl}$ is the sentiment expressed on the aspect; $h_k$ is the opinion holder who has expressed his opinion towards that aspect; and $t_l$ is the time when the opinion was expressed by the holder. The sentiment $s_{ijkl}$ can be positive, negative, or neutral, or expressed with different strength/intensity levels, e.g., 1 to 5 stars as used by most review sites on the Web.
For example, let’s suppose that a user, Spamx, wrote the following tweet :
Spamx : 5.2.2014
Great laptop that offers many great features !
In this tweet, laptop is the entity, features is an aspect of laptop, the sentiment towards
the aspect is positive because of the adjective great which indicates a positive sentiment,
Spamx is the opinion holder and the opinion time is 5.2.2014.
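As a concrete illustration, the quintuple can be encoded as a small data structure. This is only a sketch: the `Opinion` type and its field names below are ours, not part of Liu's formal definition.

```python
from collections import namedtuple

# A minimal, illustrative encoding of the opinion quintuple (e, a, s, h, t).
Opinion = namedtuple("Opinion", ["entity", "aspect", "sentiment", "holder", "time"])

# The tweet example above, expressed as a quintuple.
example = Opinion(
    entity="laptop",
    aspect="features",
    sentiment="positive",   # triggered by the adjective "great"
    holder="Spamx",
    time="5.2.2014",
)

print(example)
```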
Several studies have extended this definition; they have shown that the aspect of an entity may itself be another entity, which leads to a hierarchy of aspects and entities (Kim et al., 2013).
The previous definition tries to define the opinion in general, but in practice many applications focus on the sentiment of a given text without any concern for the other components of the opinion. Recently, there has been more interest in determining the sentiment expressed on a topic or an aspect, which is normally referred to as Aspect-Based sentiment analysis, or Feature-Based sentiment analysis as it was called in early studies (Hu et Liu, 2004a). The extraction of the opinion holder is of little interest when we are interested in the collective opinion about an object. The opinion time can be important if we study how the opinion changes over time, which may be interesting in the politics and marketing domains.
2.3 Sentiment Analysis Levels
Many applications need to determine the sentiment of a text which may be a docu-
ment, sentence, paragraph, clause or just a word. Four main levels of text granularities
have been investigated in sentiment analysis :
1. Document Level : the analysis at this level aims to determine whether the whole
document expresses a positive or negative sentiment. This level assumes that each
document is about one entity, as in movie reviews. Thus, it is not applicable if the
document discusses multiple entities.
2. Sentence Level: the analysis at this level determines whether each sentence is positive, negative or neutral. Little work has also been done at the clause level (Liu, 2012).
3. Entity and Aspect Level: the aspect level performs fine-grained analysis. First, the aspect should be extracted, then the sentiment towards this aspect is determined. For example, food and service are aspects of a restaurant.
4. Word Level : the analysis at this level goes to the word. It determines whether the
word implies a negative, positive or neutral opinion. Normally, the word polarities
are used for sentiment tasks at higher levels. The task at this level can be seen as
sentiment lexicon construction.
2.4 Sentiment Analysis Tasks
Sentiment analysis tasks are derived from the five components of the opinion definition. Six tasks can be mentioned:
1. Entity Extraction and Categorization: the entity can be expressed using different spellings. For example, "Motorola" may be written as "Moto" or "Mot.". This task extracts all entity expressions in the document and groups the synonymous entity expressions into entity clusters or categories, where each entity category indicates a unique entity (Kim et al., 2013).
2. Aspect Extraction and Categorization : each aspect of an entity can be expressed
using different writings. This task extracts all aspect expressions and groups them
into clusters where each cluster represents a unique aspect. For example, Pizza,
fish, salad can be categorized into the cluster (aspect) Food (Kim et al., 2013).
3. Aspect Sentiment Classification : this task determines whether an opinion on an
aspect is positive, negative or neutral (Hu et Liu, 2004a).
4. Entity Sentiment Classification : this task determines whether an opinion on an
entity or entity category is positive, negative or neutral (Kim et al., 2013).
5. Opinion Holder Extraction and Categorization : this task extracts the opinion
holder from text or structured data (Kim et Hovy, 2006).
6. Time Extraction and Standardization : this task extracts the time when the opi-
nion is given (O’Connor et al., 2010).
2.5 Sentiment Analysis Approaches
There are two principally different approaches to opinion mining: lexicon-based and supervised. The lexicon-based or unsupervised approach starts from the word level in order to compute the polarity of the text. This approach depends on a sentiment lexicon to get the word polarity scores. The supervised approach, in contrast, starts from the text level and learns a model which assigns a polarity score to the whole text; this approach needs a labeled corpus to learn the model.
Since these two approaches are the most used in sentiment analysis, we cover the
literature survey from these two perspectives.
2.5.1 Lexicon-Based Approach
Unsupervised or lexicon-based approaches decide the polarity of a document based on sentiment lexicons. The sentiment of a text is a function of the words it has in common with the sentiment lexicons.
This function can be, for example, the number of positive words divided by the number of negative ones: if the ratio is greater than 1, the text is positive; if it is equal to 1, the text is considered neutral; otherwise it is negative.
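As a toy illustration of this ratio-based function, the following sketch uses two small hand-written word lists; in practice the lists would come from a sentiment lexicon, and the tokenization would be more careful.

```python
# A minimal sketch of the ratio-based lexicon function described above.
POSITIVE = {"happy", "good", "great"}
NEGATIVE = {"bad", "sad", "terrible"}

def lexicon_polarity(text):
    tokens = text.lower().split()
    n_pos = sum(1 for t in tokens if t in POSITIVE)
    n_neg = sum(1 for t in tokens if t in NEGATIVE)
    if n_neg == 0:                      # avoid division by zero
        return "positive" if n_pos > 0 else "neutral"
    ratio = n_pos / n_neg
    if ratio > 1:
        return "positive"
    if ratio == 1:
        return "neutral"
    return "negative"

print(lexicon_polarity("great laptop but a terrible battery"))  # -> neutral
```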
Another function, which has been widely used, is the Semantic Orientation (SO) of the adjectives, phrases or syntactic patterns discovered in the text (Turney, 2002). SO is a measure of sentiment in text and will be discussed later in this chapter.
Much of the first lexicon-based research has focused on using adjectives as indi-
cators of the semantic orientation of text (Hatzivassiloglou et McKeown, 1997; Hu et
Liu, 2004b). First, a list of adjectives and corresponding semantic orientation values is
compiled into a dictionary. Then, for any given text, all adjectives are extracted and an-
notated with their semantic orientation value, using the dictionary scores. The semantic
orientation scores are in turn aggregated into a single score for the text.
(Taboada et al., 2011) proposed another function called SO-CAL (Semantic Orienta-
tion CALculator) which uses dictionaries of words annotated with their semantic orien-
tation (polarity and strength), and incorporates intensification and negation. Figure 2.1
illustrates the main stages of lexicon-based approach.
Figure 2.1 – Lexicon-based sentiment analysis approach.
Thus, the sentiment lexicon is the most important part of this approach. Three dif-
ferent ways can be used to construct such lexicons :
1. Manual Approach
The manual approach depends on human effort: annotators are asked to label several words. Normally, they are asked to decide whether a word is positive, negative or neutral whatever the context. Some lexicons take the context into account; the annotators are asked to give the sentiment of a word in a given context. Other lexicons focus on sentiment strength; they indicate the strength by using additional categories such as (strong, weak), which results in five labels (strong positive, strong negative, neutral, weak positive, weak negative). The sentiment strength can also be expressed by giving a polarity score to each word, which makes the annotation more complex.
Several manually created sentiment resources have been constructed and successfully applied in sentiment analysis. For example, the General Inquirer 1 has sentiment labels for about 3,600 terms (Stone et al., 1966). Bing Liu's Opinion Lexicon (Hu et Liu, 2004a) consists of about 6,800 manually labeled words. The MPQA Subjectivity Lexicon, which draws from the General Inquirer and other sources, has sentiment labels for about 8,000 words (Wiebe et al., 2005). The NRC Emotion Lexicon has sentiment and emotion labels for about 14,000 words (Mohammad et Turney, 2010); these labels (joy, sadness, anger, fear, surprise, anticipation, trust, and disgust) were compiled through Mechanical Turk annotations.
The main disadvantage of the manual approach is that it is costly and time-
consuming.
2. Dictionary-Based Approach
This approach uses a few seed sentiment words to bootstrap from, based on the synonym and antonym links present in a dictionary such as WordNet. This method works as follows:
A small set of sentiment words (seeds) with known positive or negative orienta-
tions is first collected manually. The algorithm then extends this set by searching
in WordNet or another dictionary for their synonyms and antonyms. The newly
found words are added to the seed list. The next iteration starts. The iterative
process ends when no more new words can be found (Liu, 2012).
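A rough sketch of this iterative expansion, using WordNet through NLTK, is given below. It assumes the nltk package and its wordnet corpus are installed, and it uses a fixed number of iterations in place of the stopping criterion described above.

```python
from nltk.corpus import wordnet as wn

def expand_seeds(pos_seeds, neg_seeds, iterations=2):
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(iterations):
        new_pos, new_neg = set(pos), set(neg)
        for word in pos:
            for synset in wn.synsets(word):
                for lemma in synset.lemmas():
                    new_pos.add(lemma.name().lower())        # synonyms keep the orientation
                    for ant in lemma.antonyms():
                        new_neg.add(ant.name().lower())      # antonyms flip it
        for word in neg:
            for synset in wn.synsets(word):
                for lemma in synset.lemmas():
                    new_neg.add(lemma.name().lower())
                    for ant in lemma.antonyms():
                        new_pos.add(ant.name().lower())
        pos, neg = new_pos, new_neg
    return pos, neg

pos, neg = expand_seeds({"good", "happy"}, {"bad", "sad"})
print(len(pos), len(neg))
```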
This approach quickly finds a large number of sentiment words with their orientations, but many errors can be found in the resulting list. SentiWordNet is an example of using this approach on WordNet (Baccianella et al., 2010). Opinion Digger (Moghaddam et Ester, 2010) also uses a similar approach.
1. http://www.wjh.harvard.edu/~inquirer/
The main disadvantage is that this approach is context and domain independent.
3. Corpus-Based Approach
This approach relies on the following statement : "A document should be positive
(or negative) if it contains many positive (or negative) words, and a word should
be positive (or negative) if it appears in many positive (or negative) documents".
The corpus-based approach has been used in two scenarios :
(a) Given a seed list of known sentiment words, discover other sentiment words
and their orientations from a domain corpus.
(b) Given a set of noisy or automatically collected labeled data, discover the
polarity of each word.
A corpus and some seed adjective sentiment words were used by (Hatzivassiloglou et McKeown, 1997) to find additional sentiment adjectives in the corpus. Their method exploits a set of linguistic rules to identify more adjective sentiment words and their orientations from the corpus. One of the rules is about the conjunction AND, which says that conjoined adjectives usually have the same orientation. Rules were also designed for other connectives, i.e., OR, BUT, EITHER–OR, and NEITHER–NOR.
Turney (2002) estimated the sentiment orientation of the extracted phrases using
the Pointwise Mutual Information (PMI). The sentiment orientation of a phrase
is computed based on its association with the positive reference word "excellent"
and the negative reference word "poor".
Turney et Littman (2003) also used the sentiment orientation to compute the polarity of a given word. They computed the orientation of the word from the strength of its association with a set of positive words (good, nice, excellent, positive, fortunate, correct, and superior), minus the strength of its association with a set of negative words (bad, nasty, poor, negative, unfortunate, wrong, and inferior). The association strength is measured using Point-wise Mutual Information (PMI). They used the AltaVista search engine to estimate the probability of finding the word Excellent (or any other seed word) together with a word w. The PMI and the sentiment orientation SO are then:

$$PMI(w, Excellent) = \log \frac{p(Excellent, w)}{p(Excellent) \cdot p(w)}$$

$$SO(w) = PMI(w, Excellent) - PMI(w, Bad)$$
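The following sketch estimates a PMI-based SO from plain co-occurrence counts in a toy corpus instead of search-engine hit counts; the corpus and the reference words below are illustrative only, not data used in this thesis.

```python
import math
from collections import Counter

corpus = [
    "excellent food and excellent service",
    "the service was bad and the food was poor",
    "great value excellent staff",
]

docs = [doc.split() for doc in corpus]
word_df = Counter(w for doc in docs for w in set(doc))          # document frequency
n_docs = len(docs)

def pmi(word, ref):
    co = sum(1 for doc in docs if word in doc and ref in doc)   # co-occurrence count
    if co == 0 or word_df[word] == 0 or word_df[ref] == 0:
        return 0.0
    p_joint = co / n_docs
    return math.log2(p_joint / ((word_df[word] / n_docs) * (word_df[ref] / n_docs)))

def so(word):
    return pmi(word, "excellent") - pmi(word, "bad")

print(so("staff"), so("poor"))   # staff > 0, poor < 0 on this toy corpus
```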
Mohammad (2012) collected a set of 775,000 tweets to generate a large word-sentiment association lexicon; a tweet was considered positive if it contained one of 32 positive hashtagged seed words, and negative if it contained one of 36 negative hashtagged seed words; the association score for a term was calculated using SO. Mohammad et al. (2013) used a similar method on the sentiment-140 corpus (Go et al., 2009), a collection of 1.6 million tweets that contain positive and negative emoticons; the tweets are labeled positive or negative according to the emoticons. We proposed (Hamdan et al., 2015c) a new metric called Natural Entropy, instead of PMI, in order to construct a sentiment lexicon from the sentiment-140 corpus.
2.5.2 Supervised Approach
The supervised approach is a machine learning approach (see the following Section).
Sentiment classification can be seen as a text classification problem. For classifying a document into different topics, e.g., politics, economics, science and sports, topic-related words are the key features, while in sentiment classification the opinion words which indicate positive or negative opinions, like good, bad, happy, sad, are more important (Liu, 2012; Pang et al., 2002).
In general, supervised text classification consists of two phases: training and predicting.
In the training phase, a corpus of annotated documents is needed; a document representation process extracts the features, and these features are presented to a supervised classification method to learn a classification model which will predict the labels of new documents.
In the predicting phase, a new document is presented to the same document representation process to extract the features which represent the document, and the classification model is used to predict the label of the document. Figure 2.2 shows these two phases.
Figure 2.2 – Training and predicting phases of a text classification system.

The research papers in sentiment classification have mainly focused on these two steps: document representation and classification methods.
Document Representation: the document refers to any text (long or short); choosing the useful features that represent the document is important for any classification system. Features can be extracted or learned. The feature extraction process extracts some useful features to enrich the document representation. These traditional hand-crafted features often require expensive human labor and often rely on a domain expert. Therefore, feature learning can be an alternative solution: a set of techniques that learns a transformation of the input data into a representation, or a composition of the input, that can be effectively exploited in machine learning tasks.
Classification Method : As we previously mentioned, Sentiment classification is
essentially a text classification problem. Therefore, any existing supervised learning
method can be applied, e.g., Naive Bayes, Support Vector Machines (SVM), Logistic
Regression (LR), k-nearest neighbors (k-NN), Maximum Entropy, Conditional Random
Field (CRF) and Neural Networks.
While some papers have extended the bag-of-words representation by adding different types of features (Hamdan et al., 2013, 2015c; Mohammad et al., 2013; Pang et al., 2002), others have proposed different weighting schemas to weight the features, such as PMI, Information Gain and chi-square ($\chi^2$) (Deng et al., 2014; Martineau et Finin, 2009; Paltoglou et Thelwall, 2010). Recently, after the success of deep learning techniques in many classification systems, several studies have learned the features instead of extracting them (Severyn et Moschitti, 2015; Socher et al., 2013).
The work of Pang et al. (2002) was the first to apply this approach to classify movie reviews into two classes, positive or negative. They tested several classifiers (Naive Bayes, SVM, Maximum Entropy) with several features and reported that the best performance was given by SVM with a unigram text representation.
Later on, many studies have proposed different features and some feature selection methods to choose the best feature set. Many features have been exploited (a small extraction sketch follows the list):
— Terms and their weights: the features are the unigrams or n-grams with the associated frequency or weight given by a weighting schema like TF-IDF or PMI.
— Part of Speech (POS): words can indicate different sentiments according to their parts of speech (POS). Some papers treated the adjectives as special features.
— Sentiment Lexicons: the words and expressions which express an opinion have been used to add additional features, such as the number of positive and negative terms.
— Sentiment Shifters: the terms that are used to change the sentiment orientation, from positive to negative or vice versa, such as not and never. Taking these features into account can improve the sentiment classification.
— Semantic Features: named entities, concepts and topics have been extracted to capture the semantics of the text.
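A minimal sketch combining a few of these feature families (binary unigrams, sentiment-lexicon counts and a negation flag) might look as follows; the word lists are illustrative placeholders, not the lexicons used in this thesis.

```python
POSITIVE = {"good", "great", "happy"}
NEGATIVE = {"bad", "sad", "terrible"}
SHIFTERS = {"not", "never", "no"}

def extract_features(text):
    tokens = text.lower().split()
    features = {f"unigram={t}": 1 for t in tokens}                      # terms, binary weights
    features["lex_pos_count"] = sum(t in POSITIVE for t in tokens)      # sentiment lexicon
    features["lex_neg_count"] = sum(t in NEGATIVE for t in tokens)
    features["has_negation"] = int(any(t in SHIFTERS for t in tokens))  # sentiment shifter
    return features

print(extract_features("the food was not good"))
```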
Many systems based on feature extraction have achieved state-of-the-art performance in competitions like SemEval 2. For example, Mohammad et al. (2013) used an SVM model with several types of features, including terms, POS and sentiment lexicons, on a Twitter data set. We (Hamdan et al., 2015a,b,c) have also shown the importance of feature extraction with a logistic regression classifier on Twitter and on restaurant and laptop reviews. We extracted terms, sentiment lexicon features and some semantic features like topics (Chapters 4 and 5 discuss this). We have also proposed to extract concepts from DBpedia (Hamdan et al., 2013) and some statistical features computed using the Z score (Hamdan et al., 2014a).
2. https://www.cs.york.ac.uk/semeval-2013/task2.html
While feature extraction has received a lot of attention, some studies have focused on term weighting: they use a unigram representation with a weighting schema that exploits the category information in the labeled data. For example, Martineau et Finin (2009) proposed the Delta TF-IDF weighting, in which the final term weight is the difference between the TF-IDF in the positive class and in the negative class. Later on, Paltoglou et Thelwall (2010) studied some variants of the classic TF-IDF scheme adapted to sentiment analysis. Several recent studies have discussed supervised weighting schemas (see Chapter 3).
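A small sketch of a Delta TF-IDF style weight is shown below; it follows one common formulation (term count times the difference of class-specific IDFs), and details such as smoothing and the log base vary across papers.

```python
import math

def delta_tfidf(tf, df_pos, df_neg, n_pos, n_neg, smoothing=0.5):
    """Delta TF-IDF weight of a term in one document: its term frequency times
    the difference between the IDF computed on the positive training documents
    and the IDF computed on the negative ones."""
    idf_pos = math.log2(n_pos / (df_pos + smoothing))
    idf_neg = math.log2(n_neg / (df_neg + smoothing))
    return tf * (idf_pos - idf_neg)

# A term occurring twice in a document, seen in 40 of 1000 positive training
# documents but only 5 of 1000 negative ones; the magnitude grows with how
# unevenly the term is spread over the classes, and the sign encodes the class.
print(delta_tfidf(tf=2, df_pos=40, df_neg=5, n_pos=1000, n_neg=1000))
```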
Recently, some research papers have applied deep learning techniques to sentiment classification. Socher et al. (2013) proposed to use a recursive neural network to capture the compositionality of phrases; their model outperforms other traditional models, which opened the door to adopting these techniques in many later studies.
Tang et al. (2014) combined hand-crafted features with learned features. They used a neural network to learn sentiment-specific word embeddings, with a noisily labeled Twitter corpus used for training the network; they then combined hand-crafted features with these word embeddings to produce a state-of-the-art system for sentiment analysis in Twitter.
Kim (2014) proposed a simple convolutional neural network with one layer of convolution which performs remarkably well. These results add to the well-established evidence that unsupervised pre-training of word vectors is an important ingredient in deep learning for natural language processing. Severyn et Moschitti (2015) applied the same kind of deep convolutional neural network to Twitter sentiment analysis; they also used a noisily labeled corpus besides the training data provided by SemEval (see 4.5.1), and their system was one of the best systems in SemEval-2015 for sentiment analysis in Twitter.
As the supervised approach has proved its performance, we have adopted it in this
research. The following section will present the machine learning concepts and describe
the supervised methods that we employed.
2.6 Machine Learning
Machine learning is defined as a set of methods that can automatically detect patterns in data, and then use the uncovered patterns to predict future data, or to perform other kinds of decision making under uncertainty (Mohri et al., 2012). Machine learning is usually divided into two main types: Supervised Learning and Unsupervised Learning.
1. Supervised Learning
In the predictive or supervised learning approach, the goal is to learn a mapping from inputs $x$ to outputs $y$, given a labeled set of input-output pairs:

$$D = \{(x_i, y_i)\}_{i=1}^{N}$$

Here $D$ is called the training set, and $N$ is the number of training examples. Each training input $x_i$ is an $m$-dimensional vector of numbers. These dimensions are called features or attributes. Similarly, the form of the output or response variable can in principle be anything, but most methods assume that $y_i$ is a categorical or nominal variable from some finite set, $y_i \in \{1, \dots, C\}$, or that $y_i$ is a real-valued scalar. When $y_i$ is categorical, the problem is known as classification or pattern recognition, and when $y_i$ is real-valued, the problem is known as regression. In classification, if $C = 2$, this is called binary classification, in which we often assume $y_i \in \{0, 1\}$; if $C > 2$, this is called multi-class classification.
One way to formalize the problem is as function approximation. We assume $y = f(x)$ for some unknown function $f$, and the goal of learning is to estimate the function $f$ given a labeled training set, and then to make predictions using $\hat{y} = \hat{f}(x)$ (we use the hat symbol to denote an estimate). The main goal is to make predictions on new inputs, meaning ones that we have not seen before (this is called generalization), since predicting the response on the training set is easy (we can just look up the answer).
2. Unsupervised Learning
The second main type of machine learning is the descriptive or unsupervised learning approach. Here we are only given inputs, $D = \{x_i\}_{i=1}^{N}$, and the goal is to find interesting patterns in the data. This is sometimes called knowledge discovery. This is a much less well-defined problem, since we are not told what kinds of patterns to look for, and there is no obvious error metric to use (unlike supervised learning, where we can compare our prediction of $y$ for a given $x$ to the observed value).
There is a third type of machine learning, known as reinforcement learning, which is
somewhat less commonly used. This is useful for learning how to act or behave when
given occasional reward or punishment signals. (For example, consider how a baby
learns to walk.)
Supervised learning is the form of machine learning most widely used in practice. In this thesis we use three supervised classification methods: SVM, Logistic Regression and Conditional Random Fields. The first two methods will be used to classify a document, while the last one will be used for tagging a sequential input in order to extract the opinion target expressions. In the following subsections, we present how the text can be represented before using a machine learner; then these three methods are described, and the evaluation measures used to assess the classification performance are discussed.
2.6.1 Text Representation
Text data is unstructured; it has to be transformed into structured data in order to be treated by any algorithm. To do this, many preprocessing techniques can be applied. After converting the unstructured data into structured data, we need an effective document representation model to build an efficient classification system. Bag-of-Words (BoW) is one of the basic methods of document representation. The BoW is used to form a vector representing a document using the frequency count of each term in the document. This method of document representation is called the Vector Space Model (VSM) (Salton et al., 1975).
The vector space model (VSM) represents documents as vectors in an $m$-dimensional space. Let us denote the set of documents by $D = \{d_1, d_2, \dots, d_n\}$ and the vocabulary (the features) by $F = \{f_1, f_2, \dots, f_m\}$. The document $d_j$ is represented by a bag-of-words vector $d_j = (w_{1j}, w_{2j}, \dots, w_{mj})$, where $w_{ij}$ stands for the weight of feature $f_i$ in document $d_j$. The simplest way of document encoding is to use binary weights (1 if the term is present in the document, 0 if it is absent). Term frequency is widely used to weight the terms. To improve the performance, more complex term weighting schemes can be used. The weighting schema can be unsupervised, such as IDF (Inverse Document Frequency), or supervised, such as PMI (pair-wise mutual information) or IG (Information Gain) (Lan et al., 2009) (Chapter 3 will discuss this issue).
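As a brief illustration of the BoW/VSM representation, the following sketch uses scikit-learn's CountVectorizer; the library choice and the two toy documents are assumptions, since the thesis does not prescribe a particular implementation.

```python
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "great food and great service",
    "the service was terrible",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(documents)   # sparse document-term matrix (n_docs x vocab size)

print(vectorizer.get_feature_names_out())
print(X.toarray())
```

Passing binary=True to CountVectorizer would give the presence/absence weights mentioned above, and TfidfVectorizer would apply an IDF weighting instead of raw term frequency.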
Unfortunately, the BoW/VSM representation schema has its limitations (Bloehdorn et Hotho, 2004). Some of them are:
— High dimensionality of the representation: the number of dimensions over the whole corpus is large, which leads to sparse document vector representations.
— Loss of correlation with adjacent words: this representation does not capture co-occurrence relations. For example, Multi-Word Expressions with a meaning of their own, like "European Union", are chunked into pieces with possibly very different meanings.
— Synonymy: since the BoW representation does not connect synonyms, different words which express the same concept or the same meaning are considered different dimensions.
— Polysemy: the same word can express different meanings depending on the context. Since the BoW does not capture the meaning, the same word with two different meanings is considered a single term. This problem is addressed by semantic (word sense) disambiguation.
— Lack of generalization: it does not capture the semantic relationships that exist among the terms in a document. There is no way to generalize similar terms like "beef" and "pork" to their common hypernym "meat".
To overcome these problems, feature selection is used to reduce the dimensionality, and feature extraction has been widely used to overcome the remaining limitations. Feature extraction can add new features to the terms, which leads to a richer document representation that may overcome these limitations and improve the classification performance.
2.6.2 Support Vector Machines (SVM)
The Support Vector Machines (SVM) (Boser et al., 1992; Cortes et Vapnik, 1995) are
supervised learning models. Given a set of training examples, each marked as belonging to one of two classes, the SVM algorithm builds a model that assigns new examples to one of the classes. The SVM model represents the examples as points in space (the feature space). In cases where the examples are linearly separable, there are many potential hyperplanes which can separate the classes. SVM seeks to construct the best hyperplane. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of the two classes (the so-called functional margin), since in general the larger the margin, the lower the generalization error of the classifier.
Given two classes of examples that are linearly separable, the hyperplane that separates the examples, $\vec{w} \cdot \vec{x} - b = 0$, represents the classification model, as illustrated in Figure 2.3, where $\vec{x}$ represents the vector of features and $\vec{w}$ represents the vector of weights corresponding to each feature.
Figure 2.3 – SVM classifier.
SVMs are naturally two-class classifiers. Nevertheless, many works have adapted them to multi-class classification using a set of one-versus-all classifiers.
In addition to performing linear classification, SVMs can efficiently perform non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
Using SVM requires some choices: the choice of the parameter C and of the similarity function (the kernel). The parameter C relates to the cost function (it gives weight to the data): when C is large, the variance is increased (the model tries to fit the training data as closely as possible), with a risk of over-fitting which makes the model bad for generalization. When C is too small, we risk under-fitting. The kernel can be linear or non-linear.
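A short sketch of such a classifier with scikit-learn is shown below; the library, the tiny training set and the value C=1.0 are illustrative assumptions. LinearSVC also handles the multi-class case with one-versus-rest classifiers, as mentioned above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

train_texts = ["great food", "terrible service", "really great value", "very bad staff"]
train_labels = ["positive", "negative", "positive", "negative"]

# Linear SVM on TF-IDF features; C controls the fit/regularization trade-off.
model = make_pipeline(TfidfVectorizer(), LinearSVC(C=1.0))
model.fit(train_texts, train_labels)
print(model.predict(["the food was great"]))
```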
2.6.3 Logistic Regression (LR)
Linear regression attempts to model the relationship between a dependent (the
class) Y and one or more independent variables (the features) X by fitting a linear
equation to observed data (James et al., 2014). A linear regression hyperplane has an
equation of the form :
Y=–+—X
where X is the an independent variables and Y is the dependent variable. The slope of
the hyperplane is —, and –is the intercept (the value of Y when X = 0).
Linear regression can produce any real value. For classification, we need a model that
predicts the probability of the dependent variable (the class), which lies in the
interval [0, 1]. Therefore, the logistic regression model was proposed; it predicts the
probability of an outcome that can only take two values. That means it models
P(x) = P(y=1|x), where we suppose y=1 is the positive class and y=0 the negative one.
This can be written as :
$$P(x) = p(y=1 \mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}} = \frac{1}{1 + e^{-(\alpha + \beta x)}}$$
where p(y=1|x) is the probability of the positive class given the independent variables,
e is the base of the natural logarithm (about 2.718), and $\alpha$ and $\beta$ are the
parameters of the model. Figure 2.4 illustrates the linear and logistic regression
functions in two dimensions (i.e. one independent variable).
Figure 2.4 – Logistic and Linear regression functions.
2.6.4 Conditional Random Fields (CRF)
A conditional random field is a model for labeling sequences of tokens with tags
drawn from a finite set (Sutton et McCallum, 2012). Typical applications include
part-of-speech tagging and named-entity extraction. A sequence tagging problem can be
viewed as a sequence classification problem where the categories are sequences of tags.
If there are K different tags, a sequence of length N has up to $K^N$ possible sequences of
tags.
A CRF over a sequence of observations x assumes that there is a hidden sequence of
states y. The linear-chain CRF is illustrated in Figure 2.5, where :
$x = \langle x_1, x_2, \ldots, x_T \rangle$ : observation sequence.
$y = \langle y_1, y_2, \ldots, y_T \rangle$ : hidden state sequence.
Figure 2.5 – The linear-chain CRF
A CRF models the conditional distribution p(y|x) over the hidden sequence y given the
observation sequence x. This model is trained to label an unknown observation sequence
by selecting the hidden sequence that maximizes p(y|x). The conditional distribution
takes the form :
$$p_\theta(y \mid x) = \frac{1}{Z_\theta(x)} \exp\left\{ \sum_{t=1}^{T} \sum_{k=1}^{K} \theta_k f_k(y_{t-1}, y_t, x_t) \right\}$$
$\{f_k\}_{1 \le k \le K}$ is an arbitrary set of feature functions and $\{\theta_k\}_{1 \le k \le K}$ are the associated real-valued
parameter values. This form is referred to as the linear-chain CRF, although it is still
more general, as $y_t$ and $x_t$ need not be composed directly of the individual sequence
tokens, but may be based on sub-sequences (e.g., trigrams) or other localized
characteristics. We will denote by $\mathcal{Y}$ and $\mathcal{X}$, respectively, the sets in which $y_t$ and $x_t$ take
their values. The normalization factor is defined by :
$$Z_\theta(x) = \sum_{y \in \mathcal{Y}^T} \exp\left\{ \sum_{t=1}^{T} \sum_{k=1}^{K} \theta_k f_k(y_{t-1}, y_t, x_t) \right\}$$
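To make the two formulas above concrete, the following toy sketch (hypothetical tag set, feature functions and weights) computes the unnormalized score and, by brute-force enumeration over all candidate sequences, the conditional probability p_θ(y|x). Real CRF implementations compute Z_θ(x) with dynamic programming (the forward algorithm) rather than enumeration; this is only a didactic illustration.

```python
import itertools
import numpy as np

TAGS = ["POS", "NEG"]            # hypothetical tag set Y
START = "<s>"                    # dummy value for y_0

def features(y_prev, y_t, x_t):
    """Feature functions f_1..f_K evaluated at one position of the chain."""
    return np.array([
        1.0 if x_t == "good" and y_t == "POS" else 0.0,   # lexical feature
        1.0 if x_t == "bad" and y_t == "NEG" else 0.0,    # lexical feature
        1.0 if y_prev == y_t else 0.0,                    # transition feature
    ])

theta = np.array([2.0, 2.0, 0.5])            # hypothetical weights theta_k

def score(y_seq, x_seq):
    """Unnormalized log-score: sum_t sum_k theta_k f_k(y_{t-1}, y_t, x_t)."""
    total, y_prev = 0.0, START
    for y_t, x_t in zip(y_seq, x_seq):
        total += theta @ features(y_prev, y_t, x_t)
        y_prev = y_t
    return total

def prob(y_seq, x_seq):
    """p_theta(y|x): normalize by Z_theta(x), summing over all |Y|^T sequences."""
    Z = sum(np.exp(score(cand, x_seq))
            for cand in itertools.product(TAGS, repeat=len(x_seq)))
    return np.exp(score(y_seq, x_seq)) / Z

x = ["good", "good", "bad"]
best = max(itertools.product(TAGS, repeat=len(x)), key=lambda y: score(y, x))
print(best, round(prob(best, x), 3))
```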
2.6.5 Classifier Evaluation
The evaluation of a classification system is usually done experimentally. Experimental
evaluation gives an estimate of the classifier's effectiveness on the testing data set
and makes it possible to statistically compare the performance of different classifiers.
Normally, we have two separate annotated data sets for training and testing. During
the training phase, the classification algorithm learns a classification model, or
classifier, which is then applied to predict the class of new, unlabeled documents. The
testing data set is used to evaluate the performance of this classifier.
If we do not have two separate data sets for training and testing, we can use n-fold
cross-validation, where the annotated data set is divided into n equal parts. The
classifier is then trained n times, each time on n−1 different parts, and tested on the
left-out part. The performance is averaged over the n runs.
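As an illustration, the following sketch (toy corpus, assuming scikit-learn is available) runs 5-fold cross-validation of a linear SVM text classifier and averages the scores over the five runs.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical annotated corpus with two classes.
texts = ["good movie", "great plot", "nice acting", "good acting", "great movie",
         "awful movie", "bad plot", "poor acting", "bad acting", "awful plot"]
labels = ["pos", "pos", "pos", "pos", "pos",
          "neg", "neg", "neg", "neg", "neg"]

# 5-fold cross-validation : the classifier is trained five times, each time
# on four folds and tested on the held-out fold; scores are then averaged.
model = make_pipeline(TfidfVectorizer(), LinearSVC(C=1.0))
scores = cross_val_score(model, texts, labels, cv=5, scoring="f1_macro")
print(scores, scores.mean())
```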
2.6.5.1 Precision, Recall, F-measure and Accuracy
Let us run the classifier on the test documents and consider a particular class ci (the
test documents belonging to this class are called positive examples, while the documents
of the other classes are considered negative examples). The outcomes of the classifier
can be described using four terms :
— True positives tp : the number of correctly classified documents that belong to the
positive class.
— True negatives tn : the number of correctly classified documents that do not belong
to the positive class.
— False positives fp : the number of documents incorrectly assigned to the positive
class.
— False negatives fn : the number of documents that belong to the positive class but
were not assigned to it.
These four outcomes are the basis of the evaluation measures : Precision, Recall,
$F_\beta$-Measure and Accuracy. They can be arranged in a 2×2 contingency
table, or confusion matrix, as illustrated in Table 2.1.
Class ci                 Positive examples   Negative examples
Classified as positive   tp                  fp
Classified as negative   fn                  tn
Table 2.1 – Contingency table or confusion matrix.
The precision for class ci is the ratio of documents correctly assigned to ci out
of all documents assigned to ci :
$$\text{precision} = \frac{tp}{tp + fp}$$
The recall for class ci is the ratio of documents correctly assigned to ci out of all
documents that actually belong to ci :
$$\text{recall} = \frac{tp}{tp + fn}$$
The $F_\beta$-Measure is a weighted harmonic mean of Precision and Recall and is usually used
with $\beta = 1$. It is calculated as follows :
$$F_\beta = \frac{(1 + \beta^2) \cdot \text{Precision} \cdot \text{Recall}}{\beta^2 \cdot \text{Precision} + \text{Recall}}$$
The values of the F-measure lie in the interval [0, 1] ; the higher the F-measure, the
better the classification performance. If $\beta = 1$, precision and recall are weighted evenly,
and the measure is called the F1-measure :
$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
Another important measure is accuracy, which takes into account all four outcomes.
Accuracy is the ratio of correctly classified documents, over all classes, out of all documents :
$$\text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn}$$
In some experimental results, we multiply the F1 score by 100 so that its value lies in the
interval [0, 100].
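The four measures can be computed directly from the counts, as in the following sketch (hypothetical counts for one class) :

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_measure(p, r, beta=1.0):
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical outcome counts for one class.
tp, tn, fp, fn = 40, 45, 10, 5
p, r = precision(tp, fp), recall(tp, fn)
print(round(p, 3), round(r, 3), round(f_measure(p, r), 3),
      round(accuracy(tp, tn, fp, fn), 3))
# -> 0.8 0.889 0.842 0.85
```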
2.6.5.2 Micro/Macro Measures
In text classification with a set of categories C = {c1, c2, ..., cn}, the performance
is evaluated using Precision, Recall or F1-Measure for each category, and the evaluation
results must then be averaged across the different categories. We refer to the counts of
true positives, true negatives, false positives and false negatives for the category ci
as tpi, tni, fpi and fni respectively. In Micro-averaging, categories participate in
the average proportionally to the number of their positive examples (Sebastiani, 2002).
This applies to both MicroAvgPrecision and MicroAvgRecall.
$$\text{MicroAvgPrecision} = \frac{\sum_{i=1}^{|C|} tp_i}{\sum_{i=1}^{|C|} (tp_i + fp_i)}$$
$$\text{MicroAvgRecall} = \frac{\sum_{i=1}^{|C|} tp_i}{\sum_{i=1}^{|C|} (tp_i + fn_i)}$$
On the other hand, in Macro-averaging all categories count the same : frequent and
infrequent categories participate equally in MacroAvgPrecision and MacroAvgRecall
(Sebastiani, 2002).
$$\text{MacroAvgPrecision} = \frac{\sum_{i=1}^{|C|} \frac{tp_i}{tp_i + fp_i}}{|C|}$$
$$\text{MacroAvgRecall} = \frac{\sum_{i=1}^{|C|} \frac{tp_i}{tp_i + fn_i}}{|C|}$$
MicroAvgF1 and MacroAvgF1 are calculated according to the following equations :
$$\text{MicroAvgF1} = \frac{2 \cdot \text{MicroAvgPrecision} \cdot \text{MicroAvgRecall}}{\text{MicroAvgPrecision} + \text{MicroAvgRecall}}$$
$$\text{MacroAvgF1} = \frac{2 \cdot \text{MacroAvgPrecision} \cdot \text{MacroAvgRecall}}{\text{MacroAvgPrecision} + \text{MacroAvgRecall}}$$
In fact, Micro-averaging favors classifiers with good behavior on categories that are
heavily populated with documents, while Macro-averaging favors those with good behavior
on poorly populated categories. In general, developing classifiers that behave well on
poorly populated categories is very challenging, therefore most research uses
Macro-averaging for evaluation (Sebastiani, 2002).
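The following sketch (hypothetical gold labels and predictions, assuming scikit-learn is available) illustrates the difference : because the frequent class dominates the pooled counts, the micro-averaged F1 is noticeably higher than the macro-averaged F1.

```python
from sklearn.metrics import f1_score

# Hypothetical gold labels and predictions for a three-class problem in which
# the "neu" class is far more frequent than "pos" and "neg".
y_true = ["neu"] * 8 + ["pos", "pos", "neg", "neg"]
y_pred = ["neu"] * 8 + ["neu", "pos", "neu", "neg"]

# Micro-averaging pools the counts over all classes (dominated by "neu"),
# macro-averaging gives every class the same weight.
print(f1_score(y_true, y_pred, average="micro"))   # ~0.83
print(f1_score(y_true, y_pred, average="macro"))   # ~0.74
```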
2.7 Summary and Discussion
This chapter introduced the definition of opinion and the main concepts in sentiment
analysis and machine learning. The main distinction was made between two approaches
for sentiment analysis : the supervised approach, which requires labeled data, and the
lexicon-based (unsupervised) approach.
The lexicon-based approach depends on sentiment lexicons, which can be constructed
in three different ways. The first method is manual annotation, in which humans
annotate terms according to the sentiment they express. The second is the dictionary-based
method, which relies on a lexical resource such as WordNet and a small set of positive
and negative words in order to derive the sentiment of all words in the resource. The third
method is corpus-based, which relies on the co-occurrence between terms and some
opinionated words (such as good, bad, happy), or on noisy labeled data. The manual
approach is time-consuming and requires human effort, while the main drawback of the
dictionary-based method is the difficulty of finding domain-specific sentiment words.
Corpus-based methods solve this problem, but require a large corpus, preferably in
the same domain, in order to infer the polarity of as many words as possible.
The supervised approach normally gives better performance because it learns from
labeled data that exhibits the same relationship we seek to predict in new data. However,
a model learned from a specific corpus often cannot be applied to a different corpus : we
need to train one model for predicting the sentiment of tweets and another one for the
sentiment of movie reviews. The main difficulty of this approach is the lack of linguistic
explanations of the weighting and importance of words in the model, which makes it
difficult to interpret the results produced by the model.
The supervised approach can also exploit sentiment lexicons by incorporating features
extracted from the lexicons, which makes this approach more reliable and precise.
We presented the three supervised methods which we will exploit during this re-
search and the measures which will be used to evaluate the performance of the systems
based on these methods.
Chapitre 3
Supervised Metrics for Term Weighting
in Sentiment Analysis
Abstract
Term weighting metrics assign weights to terms in order to discriminate the im-
portant terms from the less crucial ones. Due to this characteristic, these metrics have
attracted growing attention in Text Classification and recently in Sentiment Analysis.
Using the weights given by such metrics could lead to more accurate document re-
presentation which may improve the performance of the classification. While previous
studies have focused on testing different weighting metrics for two-class, document-level
sentiment analysis, this study examines whether weighting metrics can be effective for
three-class, short-text sentiment analysis. We present an empirical study of fifteen
global supervised weighting metrics combined with four local weighting metrics adopted from
Information Retrieval. We also analyze the behavior of each metric : we study how each
metric distributes the terms and deduce some characteristics which may distinguish the
good metrics from the bad ones. The evaluation has been done using a Support Vector
Machine classifier on three different datasets : Twitter, restaurant and laptop reviews.
Contents
3.1 Introduction ............................. 37
3.2 Related Work ............................. 39
3.3 Term Weighting Metrics ...................... 40
3.3.1 Local Weight ............................ 41
3.3.2 Global Weight ........................... 41
3.3.3 Normalization ............................ 45
3.3.4 Score Aggregation ......................... 45
3.4 Datasets ................................ 46
3.4.1 Twitter Dataset ........................... 46
3.4.2 Restaurant and Laptop Reviews Datasets ............ 47
3.5 Experiments ............................. 47
3.5.1 Experiment Setup ......................... 47
3.5.2 Experiment Evaluations ...................... 48
3.6 Conclusion and Future Work ................... 54
3.1 Introduction
Polarity classification is the basic task of sentiment analysis, in which the polarity of a
given text must be determined, i.e. whether the expressed opinion is positive, negative
or neutral. This analysis can be done at different levels of granularity : Document
Level, Sentence Level or Aspect Level. Different approaches have been
proposed for accomplishing this task : lexicon-based and supervised. Supervised
methods have been widely used and have achieved good results since 2002.
In this chapter, we study how term weighting can affect the performance of a supervised
sentiment analysis system, which is the first direction for improving a sentiment
analyzer (see Section 1.3).
Document representation is a critical component in Sentiment Analysis, just as in
Information Retrieval and Text Classification. The Vector Space Model is one of the most
popular models : each document is seen as a vector of independent features or terms,
and each term is assigned a weight according to a weighting schema or metric.
The basic weighting schema uses binary weights (w = 1 if the term is present in the
document, and w = 0 if not). A better and much-referenced weighting schema is the
tf (Term Frequency) or the tf*idf (Term Frequency * Inverse Document Frequency) schema.
Many other schemas have been proposed with the aim of making text classifiers more accurate.
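As an illustration, the following sketch (toy documents, assuming scikit-learn is available) builds the binary, tf and tf*idf representations mentioned above.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the food was great", "the food was not great", "service was slow"]

# Binary weights : w = 1 if the term occurs in the document, 0 otherwise.
binary_bow = CountVectorizer(binary=True).fit_transform(docs)

# Raw term-frequency (tf) weights.
tf_bow = CountVectorizer().fit_transform(docs)

# tf*idf weights : terms frequent in a document but rare in the corpus
# receive the highest weights.
tfidf = TfidfVectorizer().fit_transform(docs)

print(binary_bow.toarray())
print(tfidf.toarray().round(2))
```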
In sentiment analysis, the early work by Pang et al. (2002) reported that the binary
weighting schema outperforms term frequency. Recent research has focused on more
complex term weighting schemas, usually called supervised weighting metrics
as they exploit the categorical information. Some metrics have been adopted from
information retrieval, such as DeltaIDF (Martineau et Finin, 2009; Paltoglou et Thelwall,
2010) ; later on, several metrics were proposed, including metrics adopted from
information theory and widely used in text classification, such as information gain and
mutual information (Deng et al., 2014). Recently, Wu et Gu (2014) also tested several
methods adopted from information retrieval and information theory, and proposed
a new metric called natural entropy (nent) inspired by information theory.
Just as in Information Retrieval, the term weight depends on three factors :
— Local factor : which is a function of term frequency tf within the document.
— Global factor : which is a function of term frequency at the corpus level such as
the document frequency df.
— Normalization factor : which normalizes the weights in each document ; the
normalization can also be applied to the local and global factors.
This general definition of term weight is used in (Paltoglou et Thelwall, 2010), while
Deng et al. (2014) and Wu et Gu (2014) consider that a supervised term weighting
schema is based on two basic factors : the importance of a term in a document (ITD) and
the importance of a term for expressing sentiment (ITS). ITD corresponds exactly to the
local factor, and ITS to the global factor in the general definition of term weighting.
We can also distinguish between the unsupervised weighting methods which only
use the distribution of the term in the corpus for global weight without any category
information just like in information retrieval, and supervised weighting methods which
use the available category information for more efficient estimation of term importance.
Thus, for each term we get one score per category ; the final score is then a function
of these scores, such as the maximum, the sum or the weighted sum of the term scores
over the categories, as in the sketch below.
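The sketch below illustrates this idea on a toy corpus (the per-category scoring function is a simple smoothed log-ratio chosen for illustration only; it is not one of the fifteen metrics studied in this chapter) : each term receives one score per category, and the scores are aggregated with the maximum.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled corpus.
docs = [("great food loved it", "positive"),
        ("terrible service bad food", "negative"),
        ("food was ok", "neutral")]

categories = {label for _, label in docs}
df_in = defaultdict(Counter)    # document frequency of a term inside category c
df_out = defaultdict(Counter)   # document frequency of a term outside category c

for text, label in docs:
    for term in set(text.split()):
        for c in categories:
            (df_in if c == label else df_out)[c][term] += 1

def category_score(term, c):
    """Per-category global score of a term (smoothed log-ratio, illustrative)."""
    return math.log((df_in[c][term] + 0.5) / (df_out[c][term] + 0.5))

def global_weight(term):
    """Aggregate the per-category scores, here with the maximum."""
    return max(category_score(term, c) for c in categories)

for term in ["great", "bad", "food"]:
    print(term, round(global_weight(term), 3))
```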
This chapter presents an empirical study of four local weighting schemas and fifteen
supervised global weighting schemas. These metrics are evaluated on three datasets
provided in SemEval tasks : Sentiment Analysis in Twitter (Nakov et al., 2013) and
Aspect-Based Sentiment Analysis (Pontiki et al., 2015). In the context of sentiment analysis,
several studies have evaluated some schemas, but they all evaluated their schemas using
binary classification (whether a given text is positive or negative) and at document level
(movie reviews) ; they reported results on different datasets, but they did not explain the
results or the behavior of each metric in order to understand why their proposed metrics
improve the performance. In all these studies, a Support Vector Machine (SVM) classifier
has been used as the algorithm for evaluating the metrics.
The intuition behind using these metrics is that a supervised weighting schema may
give a more realistic representation of a document, which may improve the performance
of an SVM classifier.
In this chapter, we study the impact of term weighting on short-text sentiment analysis
(sentence or aspect level) and on a three-class classification problem (positive,
negative, neutral). We also go beyond the empirical study to understand the behavior
of each metric and to deduce some characteristics which may distinguish the good metrics
from the bad ones. Therefore, we formulate our study to address three questions :
1. Are the global weighting metrics useful for sentiment analysis ?
2. If a global weighting metric is useful, are the local metrics useful ?
3. What makes a global metric useful and how can we interpret its performance ?
The remainder of this chapter is organized as follows. Section 2 outlines existing work
on supervised weighting metrics in sentiment analysis. Section 3 describes the
term weighting metrics. The datasets are presented in Section 4. Our experiments and
analysis are discussed in Section 5, and future work is presented in Section 6.
3.2 Related Work
Term weighting is the task of assigning scores to terms ; these scores measure the
importance of a term with respect to the target task. Many term weighting methods
have been proposed for Information Retrieval, all based on Salton's definition (Salton et
Buckley, 1988), where the term weight is a function of three factors : term frequency,
inverse document frequency, and normalization.
While the term weighting methods in Information Retrieval are unsupervised, many
supervised methods have been proposed in Text Classification, and they have proved their
efficiency in many studies (Debole et Sebastiani, 2003; Sebastiani, 2002; Ren et Sohrab,
2013; Forman, 2003; Savoy, 2013).
Supervised classification methods have been widely used for sentiment analysis.
Early work by Pang et al. (2002) reported that SVM outperforms other classifiers and that
the binary term representation outperforms term frequency ; thus, the following research
has used the binary representation. Recently, research has focused on more efficient
term weighting methods to improve the performance of sentiment analysis. Martineau
et Finin (2009) proposed Delta tf*idf, in which the final term weight is the difference