Semantic Web-based Social Media Analysis
Liviu-Adrian Cotfas1,2, Camelia Delcea2, Antonin Segault1, Ioan Roxin1
1Franche-Comté University, Montbéliard, France
{liviu-adrian.cotfas,ioan.roxin}@univ-fcomte.fr,
antonin.segault@edu.univ-fcomte.fr
2Bucharest University of Economic Studies, Bucharest, Romania
camelia.delcea@csie.ase.ro
Abstract. With the ever-growing usage of microblogging services, such as Twitter,
millions of users share opinions daily on virtually everything. Making sense of
this huge amount of data using sentiment and emotion analysis can provide
invaluable benefits to companies trying to better understand what the public thinks
about their services and products. While the vast majority of current research
focuses on improving the algorithms for sentiment and emotion evaluation, the
present paper underlines the importance of using a semantic-based approach for
modeling the analysis results, the emotions and the social media specific concepts.
Moreover, by storing the results as structured data, the possibilities offered by
semantic web technologies, such as inference and access to the vast knowledge in
Linked Open Data, can be fully exploited. The paper also presents a novel semantic
social media analysis platform, which is able to properly emphasize the users’
complex feelings, such as happiness, affection, surprise, anger or sadness.
Keywords: ontology, emotion analysis, sentiment analysis, semantic web, Twitter,
social media analysis
1 Introduction
The last few years have witnessed remarkably fast growth in the usage of social
media networks. The most commonly used micro-blogging service, Twitter¹, which
allows users to broadcast 140-character status messages, also known as tweets,
has over 240 million monthly active users, who post more than 500 million tweets every
day, as reported in April 2014. Many of these messages contain sentiment and emotion
indications regarding almost any topic, turning Twitter into a rich data source
for analyzing the public’s opinion. Moreover, several existing studies have already
shown that users frequently express opinions regarding products and services [1] in
their tweets. Correctly extracting these opinions could provide invaluable information
to companies willing to better understand their customers’ needs. Compared to
traditional marketing studies, which can take time and involve high costs, social media
emotion and sentiment analysis offers the promise of obtaining almost real-time
opinions from huge numbers of actual or potential customers.
¹ http://www.twitter.com
Sentiment and emotion analysis are growing areas of Natural Language Processing,
commonly used to get insights from customer reviews, blogs and more recently from
social media messages. They require a multidisciplinary approach, combining elements
from fields such as linguistics, psychology and artificial intelligence. Among the tasks
to which they have already been applied, we can mention analyzing customers’ opinions
[2, 3], analyzing the public’s opinion during crises [4], predicting the outcome of
political elections [5] and even predicting stock market evolution [6].
Sentiment analysis is used to determine whether a text expresses a positive, negative
or neutral perception [7, 8], also known as polarity. Besides simply determining the
perception, some papers also investigate how the strength of the perception should be
evaluated [9], thus providing a more in-depth understanding of the user’s actual feelings.
While knowing the perception of the user is definitely important, analyzing the
categories of emotions contained in Twitter messages using emotion analysis can provide
even more information, by putting the focus on the actual feelings, such as joy, surprise,
sadness or anger.
Aspect-based sentiment and emotion analysis completes the picture by associating the
determined perceptions and emotions with particular properties of the analyzed entities,
thus taking into consideration the fact that users frequently express different and
sometimes even contradictory feelings regarding the various features and characteristics
of a product or service, also called facets [10]. Detailed surveys of existing sentiment
and emotion analysis approaches are presented in [11-13].
While various social media sentiment and emotion analysis approaches have been
proposed and evaluated in the scientific literature, only a few papers partially address
an equally important aspect: the manner in which the extracted tweets, together with
their associated data, the analyzed entities and their facets, as well as the results of
the analysis, could be stored in a standardized, easily interchangeable and extensible
way. Semantic web technologies not only meet these requirements but also, through
techniques such as interlinking and the reuse of classes and properties from well-known
ontologies [14], make it possible to build bridges towards the huge amount of knowledge
available in Linked Open Data. Therefore, innovative social media analysis platforms can
be developed, capable of providing increasingly deeper insights into the customers’ real
opinions.
In this paper, an end-to-end semantic approach, TweetOntoSense, is proposed. It
uses ontologies to model the various emotions expressed in social media messages, the
analyzed entities and their facets, the results of the analysis, as well as the various
Twitter related concepts. To the best of our knowledge, there are no current scientific
publications or commercial systems proposing a fully semantic social media analysis
approach. The other contributions of the paper are the TweetOntoSense ontology and the
Twitter ontology. By storing the extracted information as triples, advanced analysis can
be performed using the technologies associated with the semantic web.
The paper is organized as follows. In the second section, a survey of existing ontology
based approaches found in the scientific literature is provided. The third section
presents the Emotions, Twitter and TweetOntoSense ontologies, which form the basis
of the proposed approach. The fourth section of the paper includes the steps needed to
perform sentiment and emotion analysis. The fifth section shows how the extracted
information can be further exploited using semantic web inference and SPARQL
(SPARQL Protocol and RDF Query Language) queries, to create the basis for developing
an advanced social media analysis platform. The last section summarizes the paper
and introduces some of the future research directions.
2 Ontology Based Approaches
According to [15], ontologies are defined as a “formal, explicit specification of a shared
conceptualization”. They formally represent knowledge as a hierarchy of concepts, using
a shared vocabulary to denote the types, properties and interrelationships of those
concepts. Ontologies have become the means of choice for representing knowledge, as
they both provide a common understanding of concepts and are machine processable.
Existing ontology based social media sentiment and emotion analysis approaches
can be classified, with respect to their usage of ontologies, into:
approaches modelling only the possible sentiments and emotions;
approaches modelling only the analyzed entities and their facets.
2.1 Approaches modelling only the possible sentiments and emotions
An approach for modelling the possible sentiments and emotions found in Twitter
messages is proposed in [16].
[Figure: the root concept “emotion” is divided into the positive, negative and unexpected categories, which group the emotions anger, disgust, sadness, fear, joy, love and surprise.]
Fig. 1. Ontology for emotion representation used in [16]
The ontology includes seven basic emotions, composed of the six Ekman emotions
and the additional “love” emotion. The emotions are structured in the positive, negative
and unexpected categories, as shown in Fig. 1, corresponding to the possible sentiments.
A potential downside of the approach is the limited set of emotions, which might not
be able to capture all the shades of the opinions expressed in social media messages.
As shown in the third section of the paper, our approach relies on a more complex
ontology, which structures emotions in a multi-level hierarchy in which emotions become
more fine-grained with every level.
2.2 Approaches modelling the analyzed entities and their facets
Such approaches take into consideration the fact that users express opinions about the
various characteristics of the analyzed entities and not only about the product or service
as a whole. Modelling the analyzed entities and their facets using an ontology is
investigated in [2], where the authors show how this approach could be applied for evaluating
the public’s sentiments on the different characteristics of several popular smartphones.
An extract from the proposed ontology is presented in Fig. 2.
[Figure: concepts linked by SubConcept-Of relations: smartphone; nokia_lumia, htc_one, apple_iphone; lumia_display, lumia_windows, lumia_processor, lumia_microusb, lumia_battery, lumia_camera.]
Fig. 2. Ontology for object facet representation used in [2]
However, the paper does not deal with emotion analysis, does not propose how the
results of the sentiment analysis could be represented, and does not propose a general
entity-facet ontology that could be used to build a semantic social media analysis platform.
2.3 Representing the results of the aspect emotion analysis
An important step towards representing the results of the emotion analysis process in a
standardized and largely accepted format is the general-purpose emotion annotation and
representation language Emotion Markup Language (EmotionML) [17]. It is a W3C
recommendation for representing emotion-related states in data processing systems and
provides twelve vocabularies for appraisals, categories and dimensions, further
described in [18].
An ontology-based approach for representing sentiment analysis results is Marl [19], a
vocabulary designed to annotate and describe subjective opinions expressed on the web.
The Onyx ontology is a more recent development that extends the approach proposed in
Marl towards representing emotion analysis results. It aims to provide a simple
means to describe emotion analysis processes and results using semantic web
technologies [20]. It is organized around the onyx:EmotionAnalysis, onyx:EmotionSet and
onyx:Emotion classes and reuses several properties and classes, such as prov:Activity
and prov:Entity, from the W3C Provenance Ontology [21]. However, neither Marl nor
Onyx is able, on its own, to provide a complete description for both sentiment and
emotion analysis results. Moreover, they were not specifically designed for analyzing
social media messages and cannot capture all the associated details, such as the Twitter
account, its followers and other information that is highly relevant in social media
analysis.
3 Twitter Sentiment and Emotion Analysis Ontologies
The concepts needed in order to perform sentiment and emotion analysis on Twitter
messages can be grouped in three main categories:
concepts that express human emotions;
concepts that describe Twitter specific knowledge;
concepts that provide a connection between the Twitter message, the expressed
emotions, and the analyzed entity and its facets.
As shown in the previous section, while existing studies focus on the different
components required for building a fully semantic approach for aspect emotion and
sentiment social media analysis, currently there are no end-to-end semantic solutions.
For the first category of concepts, the ones describing emotions, several existing
ontologies were found in the scientific literature, while for the last two categories no
appropriate ontology was identified. Therefore, as an initial step, a Twitter ontology
modelling the relations between users, tweets and their associated properties had to be
created. Afterwards, an aspect sentiment and emotion analysis ontology, named
TweetOntoSense, which connects the expressed sentiments and emotions, the Twitter
messages and the analyzed entities and their facets, was defined.
The main concepts from the three ontologies are shown in Fig. 3, together with the
object properties that connect them. The following subsections describe in further
detail the proposed ontologies, used to enable social media analysis using semantic web
technologies.
[Figure: the Emotion Ontology (em:Emotion, em:Neutral), the Twitter Ontology (tw:Tweet, tw:TwitterAccount, sioc:has_creator) and the TweetOntoSense Ontology (twos:AnalysisResult, twos:TweetEmotionSet, twos:TweetSentimentSet, twos:TweetEmotion), with the object properties twos:analyzedTweet, twos:hasEmotions, twos:hasSentiments, twos:hasTweetEmotion and twos:hasEmotion.]
Fig. 3. Ontology-based aspect sentiment and emotion analysis
3.1 Emotion Ontology
Several emotion ontologies, such as the ones proposed in [16], [22] and [23], currently
exist. Among them, the emotional categories ontology presented in [23] has been chosen
because, besides being inspired by recognized psychological models, it also structures
the different human emotions in a taxonomy. The nine top-level emotions in the
ontology, as well as the second-level emotions associated with the concept of “Anger”, are
shown in Fig. 4.
The ontology contains for each class a number of individuals, representing words
associated with the particular type of emotion. In order to obtain a better coverage of
the words used to express emotions, we have chosen to enrich the ontology using some
of the values in the corresponding WordNet synsets [24]. Fig. 5 shows the WordNet
synset for the word “fear”, corresponding to the concept of “Fear” in the emotion
categories ontology.
Even though the ontology currently supports only English and Spanish, it can easily
be extended with other languages as shown in [25], where the ontology was extended
to include concepts in Italian. Thus, tweets in other languages can be more precisely
analyzed, without having to resort to automatic translation services. This can prove
highly important in many situations, as almost 49% of all the Twitter messages are
written in languages other than English.
The em prefix is used in the rest of the paper to denote classes or properties belonging
to this ontology.
[Figure: Emotion is a subclass of Thing; its nine top-level subclasses are Affection, Anger, Bravery, Disgust, Fear, Happiness, Neutral, Sadness and Surprise; the subclasses shown for Anger are Sulking, Fury, Hostility, Indignation, Envy, Annoyance and Frustration.]
Fig. 4. Ontology of emotions [23]
Fig. 5. WordNet synset for “fear”
3.2 Twitter Ontology
When producing semantic data, a good practice is to reuse classes and properties from
existing ontologies [14], as it facilitates mappings with other ontologies, such as the
ones in the Linked Open Data² project.
Since the existing Twitter REST API ontology presented in [26] does not provide any
mappings to well-known ontologies, a new Twitter ontology is proposed, whose main
classes and properties are shown in Fig. 6. It reuses well-known vocabularies such as
Dublin Core³ (prefix dcterms), FOAF⁴ (prefix foaf), SIOC⁵ (prefix sioc) and Basic Geo
WGS84⁶ (prefix geo), and also facilitates social media network analysis using SPARQL
queries. The tw prefix is used in the rest of the paper to denote classes or properties
belonging to this ontology.
[Figure: tw:Tweet is a subclass of sioc:Post and tw:TwitterAccount is a subclass of sioc:UserAccount; a tweet individual has a date (dcterms:created), a text (sioc:content) and a creator (sioc:has_creator), the latter being a user individual of type tw:TwitterAccount.]
Fig. 6. Twitter ontology
As shown in [27], several generic, widely used vocabularies for annotating the data
extracted from social media networks currently exist. One of the best known is the
Friend of a Friend (FOAF) ontology, used to represent people and their relationships.
The proposed Twitter ontology reuses from FOAF the foaf:accountName and the
foaf:homepage properties. Another widely used ontology is the Semantically-Interlinked
Online Communities (SIOC) ontology, dedicated to the description of information
exchanges in online communities such as blogs and forums, from which the
proposed ontology reuses several properties, including sioc:has_topic, sioc:content and
sioc:links_to. Moreover, the tw:Tweet and tw:TwitterAccount classes are derived from
the sioc:Post and sioc:UserAccount classes, defined in the SIOC ontology.
² http://linkeddata.org
³ http://dublincore.org/
⁴ http://xmlns.com/foaf/spec/
⁵ http://sioc-project.org/
⁶ http://www.w3.org/2003/01/geo/
The Dublin Core ontology provides terms to declare a large variety of document
metadata, from which the dcterms:created and dcterms:language properties have been
reused, in order to specify the date when the tweet was published and the language of
the tweet. The Basic Geo WGS84 vocabulary provides the necessary properties for
describing the location associated with a tweet, through geo:lat and geo:long.
The information associated with a tweet can thus be represented as follows.
# Prefix declarations (rdf, owl, xsd, dcterms, geo, sioc, tw) are omitted.
<http://twitter.com/13006812/status/454515103182774272>
    rdf:type tw:Tweet, owl:NamedIndividual ;
    dcterms:created "2014-04-11T07:03:39Z"^^xsd:dateTime ;
    tw:hasFavoriteCount 3 ;
    geo:long "6.79" ;
    geo:lat "47.52" ;
    sioc:has_creator <https://twitter.com/twitterAccount1> ;
    sioc:content "tweet content" ;
    sioc:has_topic "hashtag" ;
    dcterms:language [ rdf:value "eng"^^dcterms:RFC4646 ] .
3.3 TweetOntoSense Ontology
The application specific ontology, shown in Fig. 7, describes the analyzed entities, like
products, services or events, together with their facets and the detected emotions. The
twos prefix is used in the rest of the paper to denote classes or properties belonging to
this ontology.
[Figure: classes em:Emotion, tw:Tweet, twos:AnalysisResult, twos:TweetEmotionSet, twos:TweetSentimentSet, twos:TweetEmotion, twos:TweetSentiment, twos:AnalyzedEntity and twos:AnalyzedEntityFacet, with the object properties twos:analyzedTweet, twos:hasEmotions, twos:hasSentiments, twos:hasTweetEmotion, twos:hasTweetSentiment, twos:hasEmotion, twos:hasAnalyzedEntity, twos:hasAnalyzedEntityFacet and twos:hasFacet.]
Fig. 7. Main classes in the TweetOntoSense ontology
The main classes around which the ontology is built are twos:AnalysisResult,
twos:AnalyzedEntity and twos:AnalyzedEntityFacet. The twos:Entity class serves as a base
class for twos:AnalyzedEntity and twos:AnalyzedEntityFacet and defines the
twos:hasQueryTerm data property, containing the keywords or hashtags that will be used
to retrieve the analyzed tweets. The analyzed entity is modeled by the
twos:AnalyzedEntity class, representing the particular product, service or event for which
social media analysis is performed. An alternative approach for modelling the analyzed
entities is presented in [28].
Given the fact that people usually express opinions not only about the concept, but
also about its characteristics, known as facets [2], the twos:AnalyzedEntityFacet class
models the relevant characteristics. Finally, the twos:AnalysisResult class provides the
necessary link with the Twitter ontology, previously described. It includes the
sentiment analysis results, represented by twos:TweetSentimentSet, and the emotion analysis
results, represented by twos:TweetEmotionSet. The twos:TweetSentiment class is used
to represent the detected sentiment and stores the associated strength through the
twos:hasSentimentStrength property. The twos:TweetEmotion class provides the link with
the detected emotion from the Emotion Ontology and stores the strength of the detected
emotion through the twos:hasEmotionStrength property.
4 Ontology based Sentiment and Emotion Analysis
This section shows how the proposed ontologies can be used to perform automatic
semantic web-based sentiment and emotion analysis of social media messages.
Fig. 8. Sentiment and emotion analysis steps
Extracting sentiments and emotions from tweets is known to be a challenging task
for several reasons. Among the difficulties encountered while performing aspect
sentiment and emotion analysis are the huge variety of topics covered, the informality
of the language, as well as the extensive usage of abbreviations and emoticons. Besides
this, the concise nature of Twitter messages can be considered both an advantage and
a drawback. Further reasons are explained in [29, 30].
The steps used for sentiment and emotion analysis are shown in Fig. 8 and further
described in the subsections below.
4.1 Tweet Retrieval
First, the tweets are retrieved using the Twitter Public Stream API, using as track
parameters all the combinations between the keywords associated with the individuals
belonging to twos:AnalyzedEntity and the corresponding individuals from the
twos:AnalyzedEntityFacet class.
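The construction of the track parameters can be sketched as follows. This is only a minimal illustration: the query terms are hypothetical twos:hasQueryTerm values, and it assumes the convention of the streaming API in which a space inside a phrase requires both words to be present, while commas separate alternative phrases.

```python
from itertools import product


def build_track_terms(entity_terms, facet_terms):
    """Combine every entity keyword with every facet keyword.

    Each resulting phrase contains one entity term and one facet term,
    separated by a space (assumed to mean "both words must occur").
    """
    return [f"{entity} {facet}" for entity, facet in product(entity_terms, facet_terms)]


# Hypothetical query terms for one analyzed entity and its facets.
entity_terms = ["lumia", "#lumia"]
facet_terms = ["battery", "display", "camera"]

# Phrases are joined with commas (assumed to mean "any of these phrases").
track = ",".join(build_track_terms(entity_terms, facet_terms))
print(track)
```

Generating the combinations from the ontology individuals keeps the retrieval step in sync with the entities and facets that are later used for aspect analysis.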
Given the fact that unexpected or important events can immediately lead to a huge
number of tweets being written every minute, the retrieved tweets are first stored in a
high-performance non-relational database and are only afterwards analyzed. Based on
the in-depth comparison of existing non-relational databases provided in [31] and on
our preliminary tests, Apache Cassandra has been chosen for the proposed platform.
4.2 Language Identification
An accurate identification of the language used to write the tweet is highly important
given the fact that many natural language processing algorithms and linguistic
resources can only be used with the language for which they were created, with additional
customizations required for other languages.
Previously, the language had to be determined using language detection algorithms
adapted for social media, such as the one presented in [32], which includes a modified
version of the original TextCat identification algorithm described in [33]. The algorithm
uses n-gram frequency models to discriminate between the different languages.
Currently, the response received from the Twitter API also includes a field with the detected
language.
While adapting the required algorithms and linguistic resources for each language
holds the promise of providing more accurate results, automatic translation, such as the
one provided by the Google Translate API, can also be used for translating the text of
tweets written in other languages.
The Twitter and TweetOntoSense ontologies presented in this paper are language
independent. However, the Emotion Ontology currently includes emotion words only
for English and Spanish. MultiWordNet can be used to populate the ontology with
emotion words for additional languages.
4.3 Preprocessing
The second step represents the preprocessing phase, in which tokenization,
normalization and stemming are applied, as shown in Fig. 9. A comprehensive discussion
regarding the role of preprocessing can be found in [34].
[Figure: Twitter message, followed by 1. Tokenization, 2. Normalization and 3. Stemming, resulting in the preprocessed tweet text.]
Fig. 9. Preprocessing steps
Given the fact that many users write messages using casual language, the
normalization process includes:
Removing duplicated letters, which frequently occur in Twitter messages to emphasize
a particular word, in order not to interfere with the stemmer. For example, the
first tweet in the Sentiment140 corpus⁷, presented in [30], is:
“I loooooooovvvvvveee my Kindle2. Not that the DX is cool, but the 2 is fantastic in its own right.”
Converting all-caps words to lower case. While it can be argued that further
information regarding the intensity of a sentiment could be extracted from the use of
all-caps [16], in this paper it has been chosen to only focus on extracting the associated
emotions. An example of a tweet that uses all-caps is:
“My Kindle2 came and I LOVE it! :)”
Replacing hashtags with the corresponding words.
Replacing abbreviations with the corresponding regular words taken from the
Internet Lingo Dictionary. An example of a tweet that uses abbreviations is:
“AHH YES LOL IMA TELL MY HUBBY TO GO GET ME SUM MCDONALDS”
Replacing emoticons with the corresponding emotions from the ontology of
emotional categories. The Internet Lingo Dictionary⁸ has been used to gather the
emoticons, together with their meaning, although other sources such as the Smiley
Ontology could prove equally useful. A similar set of emoticons is used in [30], where
they are only divided into emoticons expressing positive and negative feelings.
Table 1 includes the emoticons that are mapped to the word happiness
during the preprocessing phase.
⁷ http://www.sentiment140.com/
Table 1. Emoticons mapped to happiness
:)   : )   :-)   :-))   :-)))   ;)   ;-)   ˆ_ˆ   :-D   :D   =D   C:   =)
Table 2 includes the emoticons that are mapped to the word surprise during the
preprocessing phase.
Table 2. Emoticons mapped to surprise
:0
Table 3 includes the emoticons that are mapped to the word sadness during the
preprocessing phase.
Table 3. Emoticons mapped to sadness
:-(   :(   :((   : (   D:   Dx   ‘n’   :\   /:   ):-/   :’   =’[   :_(   /T_T   TOT   ;_;   (:-(
The last operation of the preprocessing phase consists of applying the Porter stemmer
to the resulting sequence of words.
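The normalization operations described in this subsection can be sketched as follows. The lookup dictionaries are small hypothetical excerpts, collapsing a run of three or more identical letters to a single letter is an assumption (the exact rule is not detailed here), and hashtags are simply stripped of the leading "#". Stemming is left out, as it would be applied afterwards with a Porter stemmer implementation.

```python
import re

# Hypothetical excerpts of the lookup resources (assumptions for illustration).
ABBREVIATIONS = {"lol": "laughing out loud"}
EMOTICONS = {
    ":)": "happiness", ":-)": "happiness", ":D": "happiness",
    ":0": "surprise",
    ":(": "sadness", ":-(": "sadness",
}


def squeeze_repeats(word):
    """Collapse runs of three or more identical letters into one letter."""
    return re.sub(r"([A-Za-z])\1{2,}", r"\1", word)


def normalize(tweet):
    """Apply the normalization operations token by token."""
    normalized = []
    for token in tweet.split():
        if token in EMOTICONS:                      # emoticon -> emotion word
            normalized.append(EMOTICONS[token])
            continue
        if token.startswith("#") and len(token) > 1:
            token = token[1:]                       # hashtag -> plain word
        if len(token) > 1 and token.isupper():
            token = token.lower()                   # all-caps word -> lower case
        token = squeeze_repeats(token)
        normalized.append(ABBREVIATIONS.get(token.lower(), token))
    return " ".join(normalized)


print(normalize("I loooooooovvvvvveee my Kindle2."))
print(normalize("My Kindle2 came and I LOVE it! :)"))
print(normalize("AHH YES LOL #kindle"))
```

Matching emoticons as whole whitespace-separated tokens avoids false matches inside strings such as times or URLs, at the cost of missing emoticons that are glued to a word.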
4.4 Sentiment and Emotion Identification
In the last step, sentiments and emotions are extracted from the preprocessed tweets.
The proposed ontologies and approach can easily be used with more advanced aspect
sentiment and emotion mining algorithms, like the ones presented in [11, 13, 20].
⁸ http://www.netlingo.com/smileys.php
As the novelty of the proposed approach lies in the ontology-based analysis of tweets
preceding and following the sentiment analysis phase, we have chosen a simple
sentiment and emotion mining approach, which only focuses on extracting explicit
sentiments and emotions.
Thus, emotions are determined by comparing the processed tweet with the stemmed
versions of the individuals in the enriched ontology of emotion categories. Sentiments
are determined by grouping the emotions into positive, negative and neutral ones. The
resulting knowledge is saved in the triple store for further analysis using SPARQL
queries, as shown in the fifth section of the paper. Even though the proposed emotion
analysis approach is relatively simple, it has been found to provide fairly good results
when tested on a publicly available corpus⁹ containing 5513 tweets collected for the
search terms “Microsoft”, “Apple”, “Twitter” and “Google”, which were annotated
with the following sentiment labels: positive, negative, neutral and irrelevant. From the
above mentioned corpus, only the positive and negative tweets were analyzed, as they
are the ones that could express emotions. Thus, a subset of 973 tweets was selected for
further analysis, representing 17.64% of the initial set.
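The lexicon lookup described above can be sketched as follows. The word-to-emotion and emotion-to-sentiment mappings are hypothetical, since the actual ones come from the enriched emotion ontology, and simple_stem is only a crude stand-in for the Porter stemmer that conflates more words than the real algorithm would.

```python
import re

# Hypothetical lexicon: emotion word -> top-level emotion class (assumption).
EMOTION_WORDS = {"love": "Affection", "like": "Affection", "hate": "Anger", "upset": "Sadness"}
# Hypothetical grouping of emotion classes into sentiments (assumption).
POLARITY = {"Affection": "positive", "Happiness": "positive",
            "Anger": "negative", "Sadness": "negative", "Neutral": "neutral"}


def simple_stem(word):
    """Crude suffix stripping, used here instead of the Porter stemmer."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# The lexicon is stemmed with the same function as the tweets.
STEMMED_LEXICON = {simple_stem(word): emotion for word, emotion in EMOTION_WORDS.items()}


def detect(tweet):
    """Return the sorted emotions and sentiments explicitly expressed in a tweet."""
    stems = [simple_stem(token) for token in re.findall(r"[a-z]+", tweet.lower())]
    emotions = sorted({STEMMED_LEXICON[stem] for stem in stems if stem in STEMMED_LEXICON})
    sentiments = sorted({POLARITY[emotion] for emotion in emotions})
    return emotions, sentiments


print(detect("I loved the new display but hated the battery"))
```

Because only explicit emotion words are matched, negation and irony are not handled, which is consistent with the deliberately simple approach adopted here.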
Table 4. Emotion analysis results on the analyzed corpus
          Apple   Google   Microsoft   Twitter   Total
like         13        5           7         9      35
love          7        6           6         6      25
hate          6        1           2         1      10
hope          3        0           1         0       4
upset         1        0           0         0       1
Total        31       12          16        16      75
After comparing these tweets with the words included in the emotion ontology, 75
tweets were found to express emotions, the most frequent ones being “like” (35), “love”
(25), “hate” (10) and “hope” (4). An overview of the results is given in Table 4. The
results, grouped by the topmost emotions in the ontology, are shown in Fig. 10.
Analyzing the emotions at different levels of specificity can more easily be performed thanks
to the hierarchical organization of the various emotions.
Uncovering a significant number of emotions using a simple detection approach
proves once more that users frequently express opinions in social media messages.
Aspect opinion mining can be performed by comparing the preprocessed tweets with
the twos:AnalyzedEntityFacet individuals associated with the twos:AnalyzedEntity for
which the tweet has been retrieved.
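A minimal sketch of this facet matching is given below. The facet names reuse the labels from Fig. 2 purely for illustration, and the query terms attached to each facet are hypothetical twos:hasQueryTerm values.

```python
# Hypothetical facets of one analyzed entity with their query terms (assumption).
FACET_TERMS = {
    "lumia_battery": ["battery"],
    "lumia_display": ["display", "screen"],
    "lumia_camera": ["camera"],
}


def detect_facets(preprocessed_tweet, facet_terms):
    """Return the facets whose query terms occur as tokens in the tweet."""
    tokens = set(preprocessed_tweet.lower().split())
    return sorted(
        facet
        for facet, terms in facet_terms.items()
        if tokens & {term.lower() for term in terms}
    )


print(detect_facets("the screen is great but the battery is weak", FACET_TERMS))
```

Each detected facet can then be attached to the emotions found in the same tweet, so that results can be grouped per facet rather than per entity only.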
⁹ http://www.sananalytics.com/lab/twitter-sentiment/
Fig. 10. Emotion analysis results on the analyzed corpus
5 Social Media Analysis
The proposed semantic analysis approach can be used to develop an end-to-end ontol-
ogy-based social media analysis platform. A complete semantic approach is provided
through:
modelling the analyzed emotions using the selected Emotion Ontology;
modelling the Twitter related data using the proposed Twitter Ontology;
modelling the analyzed entities, representing for example products or services, their
facets and the analysis results using the proposed TweetOntoSense Ontology.
The approach offers multiple advantages, including the possibility to exploit the vast
amount of information readily available in the Linking Open Data Cloud using the
technologies associated with the semantic web.
Moreover, using semantic web inference, new relations between the collected
information can be discovered automatically. For example, if the em:offended emotion
is associated with a tweet during the emotion identification phase, the inference
engine also associates the more general em:Indignation and em:Anger emotions. Fig. 11
shows the hierarchy relation between the three emotions in the Emotion Ontology.
Therefore, emotion analysis can easily be performed at various granularity levels.
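The effect of this kind of inference can be illustrated with a small sketch that follows the class hierarchy upwards from an individual. It only mimics what a reasoner derives from subclass relations; the dictionaries are a hand-written excerpt of the hierarchy, and a single parent per class is assumed.

```python
# Hand-written excerpt of the emotion hierarchy (assumption: one parent per class).
SUBCLASS_OF = {"em:Indignation": "em:Anger", "em:Anger": "em:Emotion"}
INSTANCE_OF = {"em:offended": "em:Indignation"}


def inferred_types(individual):
    """Return the asserted class of an individual followed by all its ancestors."""
    types = []
    current = INSTANCE_OF.get(individual)
    while current is not None:
        types.append(current)
        current = SUBCLASS_OF.get(current)
    return types


print(inferred_types("em:offended"))
```

With these inferred types available, a query for a general class such as em:Anger also matches tweets annotated with any of its more specific emotions.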
SPARQL queries provide the necessary means for performing advanced analysis,
while their structured results can easily be processed for creating meaningful charts and
data tables in the user interface. For example, the following query retrieves from the
triple store all the studied entities together with the detected emotions.
SELECT ?analyzedEntity ?emotion
WHERE
{
  ?analysisResult rdf:type twos:AnalysisResult ;
                  twos:hasEmotions ?tweetEmotionSet .
  ?tweetEmotionSet twos:hasTweetEmotion ?tweetEmotion .
  ?tweetEmotion twos:hasEmotion ?emotion ;
                twos:hasAnalyzedEntity ?analyzedEntity .
}
ORDER BY ASC(?analyzedEntity)
[Figure: Emotion is a subclass of Thing, Anger is a subclass of Emotion and Indignation is a subclass of Anger; the individual offended has rdf:type Indignation.]
Fig. 11. Emotion hierarchy
Also using inference, the query below returns the users that have written tweets which
express emotions derived from em:Happiness, ordered by the influence of each user,
measured as the number of followers. Influencers, defined as users with a large number
of followers, can thus be easily determined for each type of emotion.
SELECT ?user ?tweetContent ?followerCount
WHERE
{
  ?analysisResult rdf:type twos:AnalysisResult ;
                  twos:hasEmotions ?tweetEmotionSet ;
                  twos:analyzedTweet ?tweet .
  ?tweetEmotionSet twos:hasTweetEmotion ?tweetEmotion .
  ?tweetEmotion twos:hasEmotion ?emotion .
  ?tweet sioc:content ?tweetContent ;
         sioc:has_creator ?user .
  ?user tw:hasFollowerCount ?followerCount .
  ?emotion rdf:type em:Happiness .
}
ORDER BY DESC(?followerCount)
The architecture of a semantic social media platform using the proposed approach is
shown in Fig. 12. The social media analysis dashboard, representing the user interface
of the platform, communicates through a common REST approach with a Web API,
labeled TweetOntoSense API in the figure. The API performs the necessary SPARQL
queries on the semantic database and returns the results to the social media analysis
dashboard in JavaScript Object Notation (JSON) format. By reusing classes from the
Linking Open Data Cloud, advanced analysis can be performed that taps into the vast
information available in knowledge bases such as DBpedia.
[Figure: the Social Media Analysis Dashboard exchanges HTTP requests and responses with the TweetOntoSense API, which exchanges SPARQL query requests and responses with the semantic data (TweetOntoSense, Twitter, Emotion, Linking Open Data Cloud).]
Fig. 12. Social media analysis platform
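The SPARQL request issued by the API layer in Fig. 12 can be sketched as follows. This is a minimal illustration, assuming the semantic database exposes an endpoint that follows the SPARQL 1.1 Protocol (query passed in the query parameter of a GET request, results negotiated as application/sparql-results+json). The endpoint URL and the function name are hypothetical, and the PREFIX declarations are omitted because the namespace URIs are not given in this section.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical endpoint; the real address of the semantic database is not known.
ENDPOINT = "http://localhost:3030/tweetontosense/sparql"

# First query of this section. PREFIX declarations for rdf: and twos: would
# have to be prepended before sending it to a real endpoint.
QUERY = """
SELECT ?analyzedEntity ?emotion
WHERE
{
  ?analysisResult rdf:type twos:AnalysisResult;
                  twos:hasEmotions ?tweetEmotionSet.
  ?tweetEmotionSet twos:hasTweetEmotion ?tweetEmotion.
  ?tweetEmotion twos:hasEmotion ?emotion;
                twos:hasAnalyzedEntity ?analyzedEntity.
}
ORDER BY ASC(?analyzedEntity)
"""


def build_sparql_request(endpoint, query):
    """Build (without sending) a SPARQL 1.1 Protocol GET request asking for JSON."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(
        url,
        headers={"Accept": "application/sparql-results+json"},
        method="GET",
    )


if __name__ == "__main__":
    request = build_sparql_request(ENDPOINT, QUERY)
    print(request.get_method(), request.full_url[:60] + "...")
    # Decoding the URL again recovers the original query text.
    sent_query = parse_qs(urlsplit(request.full_url).query)["query"][0]
    print(sent_query == QUERY)  # True
```

Passing the request to urllib.request.urlopen would execute it against a running endpoint; the JSON body of the response could then be forwarded to the dashboard unchanged or after aggregation.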
6 Concluding Remarks
The present paper proposes a novel ontology-based approach to social media sentiment and
emotion analysis that better captures the wide array of feelings expressed in the
millions of tweets published every day. While existing approaches only associate simple
positive, negative or neutral perceptions, TweetOntoSense paves the way towards
fine-grained analysis using semantic web technologies, thus unlocking a vast amount of
emotional information that has previously been unavailable to companies and public
authorities trying to better understand their customers’ opinions through social media
analysis. A Twitter ontology reusing classes and properties from well-known
ontologies is also proposed.
Among the further research directions, we consider both extending the proposed
approach to other online social media networks, such as Facebook, LinkedIn and Google+,
and analyzing how the expressed emotions change over time as a result of
changes in user perception. The proposed ontologies will be available for download at
https://github.com/lcotfas/TweetOntoSense.
Acknowledgments. The study was produced as part of the SCOPANUM research
project, supported by grants from CSFRS (http://csfrs.fr/) and a doctoral grant from Pays
de Montbéliard Agglomération (http://www.agglo-montbeliard.fr/). The authors also
acknowledge the support of the Leverhulme Trust International Network research project
"IN-2014-020".
References
1. Pak, A., Paroubek, P.: Twitter as a Corpus for Sentiment Analysis and Opinion
Mining. In: Proceedings of the Seventh International Conference on Language Re-
sources and Evalua-tion. pp. 13201326. , Valletta (2010).
2. Kontopoulos, E., Berberidis, C., Dergiades, T., Bassiliades, N.: Ontology-based
sentiment analysis of twitter posts. Expert Syst. Appl. 40, 40654074 (2013).
3. Delcea, C., Cotfas, L.-A., Paun, R.: Understanding Online Social Networks’ Users
A Twitter Approach. In: Hwang, D., Jung, J.J., and Nguyen, N.-T. (eds.) Com-
putational Collective Intelligence. Technologies and Applications. pp. 145153.
Springer International Publishing, Cham (2014).
4. Torkildson, M.K., Starbird, K., Aragon, C.: Analysis and Visualization of Senti-
ment and Emotion on Crisis Tweets. In: Luo, Y. (ed.) Cooperative Design, Visu-
alization, and Engineering. pp. 6467. Springer International Publishing, Cham
(2014).
5. Rill, S., Reinel, D., Scheidt, J., Zicari, R.V.: PoliTwi: Early detection of emerging
political topics on twitter and the impact on concept-level sentiment analysis.
Knowl.-Based Syst. 69, 2433 (2014).
6. Khadjeh Nassirtoussi, A., Aghabozorgi, S., Ying Wah, T., Ngo, D.C.L.: Text min-
ing for market prediction: A systematic review. Expert Syst. Appl. 41, 76537670
(2014).
7. Ghiassi, M., Skinner, J., Zimbra, D.: Twitter brand sentiment analysis: A hybrid
system using n-gram analysis and dynamic artificial neural network. Expert Syst.
Appl. 40, 62666282 (2013).
8. Mostafa, M.M.: More than words: Social networks’ text mining for consumer
brand sentiments. Expert Syst. Appl. 40, 42414251 (2013).
9. Thelwall, M., Buckley, K., Paltoglou, G.: Sentiment strength detection for the so-
cial web. J. Am. Soc. Inf. Sci. Technol. 63, 163173 (2012).
10. Robaldo, L., Di Caro, L.: OpinionMining-ML. Comput. Stand. Interfaces. 35,
454469 (2013).
11. Medhat, W., Hassan, A., Korashy, H.: Sentiment analysis algorithms and applica-
tions: A survey. Ain Shams Eng. J. 5, 10931113 (2014).
12. Tsytsarau, M., Palpanas, T.: Survey on mining subjective data on the web. Data
Min. Knowl. Discov. 24, 478514 (2012).
13. Liu, B.: Sentiment Analysis and Opinion Mining. Synth. Lect. Hum. Lang. Tech-
nol. 5, 1167 (2012).
14. Shadbolt, N., Berners-Lee, T., Hall, W.: The Semantic Web Revisited. IEEE Intell.
Syst. 21, 96101 (2006).
15. Borst, W.N.: Construction of engineering ontologies for knowledge sharing and
reuse. Universiteit Twente (1997).
16. Roberts, K., Roach, M., Johnson, J.: EmpaTweet: Annotating and Detecting Emo-
tions on Twitter. In: Proceedings of the Eighth International Conference on Lan-
guage Resources and Evaluation. 38063813, Istanbul (2012).
17. Baggia, P., Burkhardt, F., Pelachaud, C., Peter, C., Zovato, E.: Emotion Markup
Language (EmotionML) 1.0, http://www.w3.org/TR/emotionml/.
18. Ashimura, K., Baggia, P., Burkhardt, F., Oltramari, A., Peter, C., Zovato, E.: Vo-
cabularies for EmotionML, http://www.w3.org/TR/2012/NOTE-emotion-voc-
20120510/.
19. Westerski, A., Iglesias Fernandez, C.A., Tapia Rico, F.: Linked opinions: Describ-
ing sentiments on the structured web of data. Presented at the 4th international
workshop Social Data on the Web , Bonn, Germany (2011).
20. Tsytsarau, M., Palpanas, T.: Survey on mining subjective data on the web. Data
Min. Knowl. Discov. 24, 478514 (2012).
21. Khalid Belhajjame, James Cheney, David Corsar, Daniel Garijo, Stian Soiland-
Reyes, Stephan Zednik, Jun Zhao: PROV-O: The PROV Ontology,
http://www.w3.org/TR/prov-o/.
22. J. Hastings, W. Ceusters, B. Smith, K. Mulligan: Dispositions and processes in the
Emo-tion Ontology. In: Proceedings of ICBO 2011 (2011).
23. Francisco, V., Hervás, R., Peinado, F., Gervás, P.: EmoTales: creating a corpus of
folk tales with emotional annotations. Lang. Resour. Eval. 46, 341381 (2012).
24. Montejo-Ráez, A., Martínez-Cámara, E., Martín-Valdivia, M.T., Ureña-López,
L.A.: Ranked WordNet graph for Sentiment Polarity Classification in Twitter.
Comput. Speech Lang. 28, 93107 (2014).
25. M Baldoni, C Baroglio, V Patti, P Rena: From tags to emotions: Ontology-driven
sen-timent analysis in the social semantic web. Presented at the Intelligenza Arti-
ficiale (2012).
26. Togias, K., Kameas, A.: An Ontology-Based Representation of the Twitter REST
API. Presented at the November (2012).
27. Breslin, J.G., Passant, A., Decker, S.: The Social Semantic Web. Springer (2009).
28. Fornara, N., Ježić, G., Kušek, M., Lovrek, I., Podobnik, V., Tržec, K.: Semantics
in Multi-agent Systems. In: Ossowski, S. (ed.) Agreement Technologies. pp. 115
136. Springer Netherlands (2013).
29. D. Maynard, K. Bontcheva, D. Rout: Challenges in developing opinion mining
tools for social media. In: Proceedings of the Eighth International Conference on
Language Resources and Evaluation. pp. 1522. , Istanbul (2012).
30. E. Kouloumpis, T. Wilson, J. Moore: Twitter sentiment analysis: The good the bad
and the omg! In: Proceedings of the Fifth International Conference on Weblogs
and Social Media. pp. 538541. , Barcelona (2011).
31. T Rabl, S Gómez-Villamor, M Sadoghi, V Muntés-Mulero, H.-A. Jacobsen, S
Mankov-skii: Solving big data challenges for enterprise application performance
management. In: Proceedings of the VLDB Endowmen. pp. 17241735 (2012).
32. Carter, S., Weerkamp, W., Tsagkias, M.: Microblog language identification: over-
coming the limitations of short, unedited and idiomatic text. Lang. Resour. Eval.
47, 195215 (2013).
33. Cavnar, W., Trenkle, J.: N-gram-based text categorization. In: Proceedings of
SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Re-
trieval (1994).
34. Bao, Y., Quan, C., Wang, L., Ren, F.: The Role of Pre-processing in Twitter Sen-
timent Analysis. In: Intelligent Computing Methodologies. pp. 615624. Springer
(2014).