All content in this area was uploaded by Liviu-Adrian Cotfas on Sep 29, 2017
Semantic Web-based Social Media Analysis
Liviu-Adrian Cotfas¹,², Camelia Delcea², Antonin Segault¹, Ioan Roxin¹
¹ Franche-Comté University, Montbéliard, France
{liviu-adrian.cotfas,ioan.roxin}@univ-fcomte.fr,
antonin.segault@edu.univ-fcomte.fr
² Bucharest University of Economic Studies, Bucharest, Romania
camelia.delcea@csie.ase.ro
Abstract. With the ever-growing usage of microblogging services such as Twitter,
millions of users share opinions daily on virtually everything. Making sense of
this huge amount of data using sentiment and emotion analysis can provide
invaluable benefits to companies trying to better understand what the public thinks
about their services and products. While the vast majority of current research
focuses on improving the algorithms for sentiment and emotion evaluation, the
present paper underlines the importance of using a semantic-based approach for
modeling the analysis results, the emotions and the social media specific concepts.
Moreover, by storing the results as structured data, the possibilities offered by
semantic web technologies, such as inference and access to the vast knowledge in
Linked Open Data, can be fully exploited. The paper also presents a novel semantic
social media analysis platform, which is able to properly emphasize users' complex
feelings, such as happiness, affection, surprise, anger or sadness.
Keywords: ontology, emotion analysis, sentiment analysis, semantic web, Twitter,
social media analysis
1 Introduction
The last few years have witnessed a remarkably fast-paced growth in the usage of social
media networks. The most commonly used micro-blogging service, Twitter
(http://www.twitter.com), which allows users to broadcast 140-character status messages,
also known as tweets, has over 240 million monthly active users, who post more than
500 million tweets every day, as reported in April 2014. Many of these messages contain
sentiment and emotion indications regarding almost any topic, turning Twitter into a
rich data source for analyzing the public's opinion. Moreover, existing studies have
already shown that users frequently express opinions regarding products and services [1]
in their tweets. Correctly extracting these opinions could provide invaluable
information to companies willing to better understand their customers' needs. Compared
to traditional marketing studies, which can take time and involve high costs, social
media emotion and sentiment analysis offers the promise of obtaining almost real-time
opinions from huge numbers of actual or potential customers.
Sentiment and emotion analysis are growing areas of Natural Language Processing,
commonly used to get insights from customer reviews, blogs and, more recently, from
social media messages. They require a multidisciplinary approach, combining elements
from fields such as linguistics, psychology and artificial intelligence. Among the tasks
to which they have already been applied, we can mention analyzing customers' opinions
[2, 3], analyzing the public's opinion during crises [4], predicting the outcome of
political elections [5] and even predicting stock market evolution [6].
Sentiment analysis is used to determine whether a text expresses a positive, negative
or neutral perception [7, 8], also known as polarity. Besides simply determining the
perception, some papers also investigate how the strength of the perception should be
evaluated [9], thus providing a more in-depth understanding of the user’s actual feel-
ings.
While knowing the perception of the user is definitely important, analyzing the cat-
egories of emotions contained in Twitter messages using emotion analysis can provide
even more information, by putting the focus on the actual feelings, such as joy, surprise,
sadness or anger.
Aspect-based sentiment and emotion analysis are able to complete the picture by
associating the determined perceptions and emotions with particular properties of the
analyzed entities, thus taking into consideration the fact that users frequently express
different and sometimes even contradictory feelings regarding the various features and
characteristics of a product or service, also called facets [10]. Detailed surveys of
existing sentiment and emotion analysis approaches are presented in [11–13].
While various social media sentiment and emotion analysis approaches have been
proposed and evaluated in the scientific literature, only a few papers partially address
an equally important aspect: the manner in which the extracted tweets, together with
their associated data, the analyzed entities and their facets, as well as the results of
the analysis, could be stored in a standardized, easily interchangeable and extensible
way. Semantic web technologies can not only meet these requirements, but also create
bridges towards the huge amount of knowledge available in Linked Open Data, by employing
techniques such as interlinking and reusing classes and properties from well-known
ontologies [14]. Therefore, innovative social media analysis platforms can be developed,
capable of providing increasingly deeper insights into the customers' real opinions.
In this paper, an end-to-end semantic approach, TweetOntoSense, is proposed, which
uses ontologies to model the various emotions expressed in social media messages, the
analyzed entities and their facets, the results of the analysis, as well as the various
Twitter-related concepts. To the best of our knowledge, there are no current scientific
publications or commercial systems proposing a fully semantic social media analysis
approach. The other contributions of the paper are the TweetOntoSense ontology and the
Twitter ontology. By storing the extracted information as triples, advanced analysis can
be performed using the technologies associated with the semantic web.
The paper is organized as follows. In the second section, a survey of existing
ontology-based approaches found in the scientific literature is provided. The third
section presents the Emotions, Twitter and TweetOntoSense ontologies, which form the
basis of the proposed approach. The fourth section of the paper includes the steps needed
to perform sentiment and emotion analysis. The fifth section shows how the extracted
information can be further exploited using semantic web inference and SPARQL
(SPARQL Protocol and RDF Query Language) queries, to create the basis for developing
an advanced social media analysis platform. The last section summarizes the paper and
introduces some of the future research directions.
2 Ontology Based Approaches
According to [15], ontologies are defined as a “formal, explicit specification of a
shared conceptualization”. They formally represent knowledge as a hierarchy of concepts,
using a shared vocabulary to denote the types, properties and interrelationships of those
concepts. Ontologies have become the means of choice for representing knowledge, as they
both provide a common understanding of concepts and are machine processable.
Existing ontology-based social media sentiment and emotion analysis approaches
can be classified, with respect to their usage of ontologies, into:
• approaches modelling only the possible sentiments and emotions;
• approaches modelling only the analyzed entities and their facets.
2.1 Approaches modelling only the possible sentiments and emotions
An approach for modelling the possible sentiments and emotions found in Twitter mes-
sages is proposed in [16].
[Figure: tree with the root "emotion", the categories "positive", "negative" and
"unexpected", and the leaves "anger", "disgust", "sadness", "fear", "joy", "love" and
"surprise"]
Fig. 1. Ontology for emotion representation used in [16]
The ontology includes seven basic emotions, composed from the six Ekman emotions
and the additional “love” emotion. The emotions are structured in the positive, negative
and unexpected categories, as shown in Fig. 1, corresponding to the possible senti-
ments.
A potential downside of the approach is the limited set of emotions, which might not
be able to capture all the shades of the opinions expressed in social media messages.
As shown in the third section of the paper, our approach relies on a more complex
ontology, which structures emotions in a multi-level hierarchy in which emotions become
more fine-grained with every level.
2.2 Approaches modelling the analyzed entities and their facets
Such approaches take into consideration the fact that users express opinions about the
various characteristics of the analyzed entities and not only about the product or service
as a whole. Modelling the analyzed entities and their facets using an ontology is inves-
tigated in [2], where the authors show how this approach could be applied for evaluating
the public’s sentiments on the different characteristics of several popular smartphones.
An extract from the proposed ontology is presented in Fig. 2.
[Figure: concept hierarchy linked by "SubConcept-Of" relations, with the nodes
smartphone, nokia_lumia, htc_one, apple_iphone, lumia_display, lumia_windows,
lumia_processor, lumia_microusb, lumia_battery and lumia_camera]
Fig. 2. Ontology for object – facet representation used in [2]
However, the paper does not deal with emotion analysis, does not propose how the results
of the sentiment analysis could be represented, and does not propose a general
entity-facet ontology that could be used to build a semantic social media analysis
platform.
2.3 Representing the results of the aspect emotion analysis
An important step towards representing the results of the emotion analysis process in a
standardized and largely accepted format is represented by the general-purpose emotion
annotation and representation language Emotion Markup Language – EmotionML [17].
It is a W3C recommendation for representing emotion related states in data processing
systems and provides twelve vocabularies for appraisals, categories and dimensions,
further described in [18].
An ontology-based approach for representing sentiment analysis results is provided by
Marl [19], a vocabulary designed to annotate and describe subjective opinions expressed
on the web. The Onyx ontology is a more recent development, building on the approach
proposed in Marl, towards representing emotion analysis results. It aims to provide a
simple means to describe emotion analysis processes and results using semantic web
technologies [20]. It is organized around the onyx:EmotionAnalysis, onyx:EmotionSet and
onyx:Emotion classes and reuses several properties and classes, such as prov:Activity
and prov:Entity, from the W3C Provenance Ontology [21]. However, neither Marl nor Onyx
is able to provide a complete description for both sentiment and emotion analysis
results. Moreover, they were not specifically designed for analyzing social media
messages and cannot capture all the associated details, such as the Twitter account, its
followers and other information that is highly relevant in social media analysis.
3 Twitter Sentiment and Emotion Analysis Ontologies
The concepts needed in order to perform sentiment and emotion analysis on Twitter
messages can be grouped in three main categories:
• concepts that express human emotions;
• concepts that describe Twitter-specific knowledge;
• concepts that provide a connection between the Twitter message, the expressed
emotions and the analyzed entity and its facets.
As shown in the previous section, while various existing studies focus on the different
components required for building a fully semantic approach for aspect emotion and
sentiment social media analysis, currently there are no end-to-end semantic solutions.
While for the first category of concepts, the ones describing emotions, several exist-
ing ontologies were found in the scientific literature, for the last two categories no ap-
propriate ontology was identified. Therefore, as an initial step, a Twitter ontology mod-
elling the relations between users, tweets and their associated properties had to be cre-
ated. Afterwards, an aspect sentiment and emotion analysis ontology, named Tweet-
OntoSense, which connects the expressed sentiments and emotions, the twitter mes-
sages and the analyzed entities and their facets was defined.
The main concepts from the three ontologies are shown in Fig. 3, together with the
object properties that connect them. The following subsections describe in further
detail the proposed ontologies, used to enable social media analysis using semantic web
technologies.
[Figure: the Emotion Ontology (em:Emotion, em:Neutral), the TweetOntoSense Ontology
(twos:AnalysisResult, twos:TweetEmotionSet, twos:TweetSentimentSet, twos:TweetEmotion)
and the Twitter Ontology (tw:Tweet, tw:TwitterAccount), connected through the
twos:analyzedTweet, twos:hasEmotions, twos:hasSentiments, twos:hasTweetEmotion,
twos:hasEmotion and sioc:has_creator properties]
Fig. 3. Ontology-based aspect sentiment and emotion analysis
3.1 Emotion Ontology
Several emotion ontologies, such as the ones proposed in [16], [22] and [23], currently
exist. Among them, the emotional categories ontology presented in [23] has been chosen,
as besides being inspired by recognized psychological models, it also structures the
different human emotions in a taxonomy. The nine top-level emotions in the ontology, as
well as the second-level emotions associated with the concept of “Anger”, are shown in
Fig. 4.
For each class, the ontology contains a number of individuals, representing words
associated with the particular type of emotion. In order to obtain a better coverage of
the words used to express emotions, we have chosen to enrich the ontology using some
of the values in the corresponding WordNet synsets [24]. Fig. 5 shows the WordNet
synset for the word “fear”, corresponding to the concept of “Fear” in the emotion
categories ontology.
Even though the ontology currently supports only English and Spanish, it can easily
be extended with other languages, as shown in [25], where the ontology was extended
to include concepts in Italian. Thus, tweets in other languages can be more precisely
analyzed, without having to resort to automatic translation services. This can prove
highly important in many situations, as almost 49% of all Twitter messages are
written in languages other than English.
The em prefix is used in the rest of the paper to denote classes or properties belonging
to this ontology.
[Figure: Thing → Emotion → top-level emotions Affection, Anger, Bravery, Disgust, Fear,
Happiness, Neutral, Sadness and Surprise; Anger → Sulking, Fury, Hostility, Indignation,
Envy, Annoyance and Frustration; all linked by owl:subClassOf]
Fig. 4. Ontology of emotions [23]
Fig. 5. WordNet synset for “fear”
3.2 Twitter Ontology
When producing semantic data, a good practice is to reuse classes and properties from
existing ontologies [14], as it facilitates mappings with other ontologies, such as the
ones in the Linked Open Data² project.
Therefore, given the fact that the existing Twitter REST API ontology presented in
[26] does not provide any mappings to well-known ontologies, a new Twitter ontology is
proposed, whose main classes and properties are shown in Fig. 6. It both reuses
well-known vocabularies, such as Dublin Core³ (prefix dcterms), FOAF⁴ (prefix foaf),
SIOC⁵ (prefix sioc) and Basic Geo WGS84⁶ (prefix geo), and facilitates social media
network analysis using SPARQL queries. The tw prefix is used in the rest of the
paper to denote classes or properties belonging to this ontology.
[Figure: tw:Tweet is a subclass of sioc:Post and tw:TwitterAccount is a subclass of
sioc:UserAccount; a tweet individual (rdf:type tw:Tweet) has a date (dcterms:created),
a text (sioc:content) and a user (sioc:has_creator) of rdf:type tw:TwitterAccount]
Fig. 6. Twitter ontology
As shown in [27], several generic, widely used vocabularies for annotating the data
extracted from social media networks currently exist. One of the best-known is the
Friend of a Friend (FOAF) ontology, used to represent people and their relationships.
The proposed Twitter ontology reuses from FOAF the foaf:accountName and the
foaf:homepage properties. Another widely used ontology is the Semantically-Interlinked
Online Communities (SIOC) ontology, dedicated to the description of information
exchanges in online communities such as blogs and forums, from which the proposed
ontology reuses several properties, including sioc:has_topic, sioc:content and
sioc:links_to. Moreover, the tw:Tweet and tw:TwitterAccount classes are derived from
the sioc:Post and sioc:UserAccount classes, defined in the SIOC ontology.
² http://linkeddata.org
³ http://dublincore.org/
⁴ http://xmlns.com/foaf/spec/
⁵ http://sioc-project.org/
⁶ http://www.w3.org/2003/01/geo/
The Dublin Core ontology provides terms to declare a large variety of document's
metadata, from which the dcterms:created and dcterms:language properties have been
reused, in order to specify the date when the tweet was published and the language of
the tweet. The Basic Geo WGS84 vocabulary provides the necessary properties for de-
scribing the location associated with a tweet, through geo:lat and geo:long.
The information associated with a tweet can thus be represented as follows.
<http://twitter.com/13006812/status/454515103182774272>
    rdf:type tw:Tweet, owl:NamedIndividual ;
    dcterms:created "2014-04-11T07:03:39Z"^^xsd:dateTime ;
    tw:hasFavoriteCount 3 ;
    geo:long "6.79" ;
    geo:lat "47.52" ;
    sioc:has_creator <https://twitter.com/twitterAccount1> ;
    sioc:content "tweet content" ;
    sioc:has_topic "hashtag" ;
    dcterms:language [ rdf:value "eng"^^dcterms:RFC4646 ] .
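As a rough illustration of how such a description could be generated from the fields of a retrieved tweet, the following Python sketch emits a subset of the statements above as N-Triples. The TW namespace IRI and the tweet_to_ntriples helper are placeholders introduced only for this example, since the namespace of the Twitter ontology is not given here; the RDF, XSD, Dublin Core, SIOC and Basic Geo namespaces are the published ones.

```python
# Namespaces: TW is a hypothetical placeholder; the others are the published IRIs.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
DCTERMS = "http://purl.org/dc/terms/"
SIOC = "http://rdfs.org/sioc/ns#"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
TW = "http://example.org/twitter#"  # assumption, not the real ontology IRI


def literal(value, datatype=None):
    """Format a plain or typed N-Triples literal with basic escaping."""
    escaped = (str(value).replace("\\", "\\\\")
               .replace('"', '\\"').replace("\n", "\\n"))
    return f'"{escaped}"' + (f"^^<{datatype}>" if datatype else "")


def tweet_to_ntriples(tweet):
    """Return one N-Triples line per statement describing the tweet."""
    subject = f"<{tweet['url']}>"
    statements = [
        (RDF + "type", f"<{TW}Tweet>"),
        (DCTERMS + "created", literal(tweet["created"], XSD + "dateTime")),
        (SIOC + "has_creator", f"<{tweet['creator']}>"),
        (SIOC + "content", literal(tweet["text"])),
        (GEO + "lat", literal(tweet["lat"])),
        (GEO + "long", literal(tweet["long"])),
    ]
    return [f"{subject} <{predicate}> {obj} ." for predicate, obj in statements]


# Field values copied from the example above.
example = {
    "url": "http://twitter.com/13006812/status/454515103182774272",
    "created": "2014-04-11T07:03:39Z",
    "creator": "https://twitter.com/twitterAccount1",
    "text": "tweet content",
    "lat": "47.52",
    "long": "6.79",
}
for line in tweet_to_ntriples(example):
    print(line)
```

N-Triples is used here only because it needs no prefix handling; a platform built on an RDF library would more likely serialize through that library.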
3.3 TweetOntoSense Ontology
The application specific ontology, shown in Fig. 7, describes the analyzed entities, like
products, services or events, together with their facets and the detected emotions. The
twos prefix is used in the rest of the paper to denote classes or properties belonging to
this ontology.
[Figure: twos:AnalysisResult is linked to tw:Tweet (twos:analyzedTweet), to
twos:TweetEmotionSet (twos:hasEmotions) and to twos:TweetSentimentSet
(twos:hasSentiments); the sets contain twos:TweetEmotion (twos:hasTweetEmotion) and
twos:TweetSentiment (twos:hasTweetSentiment) individuals; twos:TweetEmotion is linked to
em:Emotion (twos:hasEmotion); both are linked to twos:AnalyzedEntity
(twos:hasAnalyzedEntity) and twos:AnalyzedEntityFacet (twos:hasAnalyzedEntityFacet);
twos:AnalyzedEntity is linked to twos:AnalyzedEntityFacet (twos:hasFacet)]
Fig. 7. Main classes in the TweetOntoSense ontology
The main classes around which the ontology is built are twos:AnalysisResult,
twos:AnalyzedEntity and twos:AnalyzedEntityFacet. The twos:Entity class serves as a base
class for twos:AnalyzedEntity and twos:AnalyzedEntityFacet and defines the
twos:hasQueryTerm data property, containing the keywords or hashtags that will be used
to retrieve the analyzed tweets. The analyzed entity is modeled by the
twos:AnalyzedEntity class, representing the particular product, service or event for
which social media analysis is performed. An alternative approach for modelling the
analyzed entities is presented in [28].
Given the fact that people usually express opinions not only about the concept, but
also about its characteristics, known as facets [2], the twos:AnalyzedEntityFacet class
models the relevant characteristics. Finally, the twos:AnalysisResult class provides the
necessary link with the Twitter ontology described previously. It includes the sentiment
analysis results, represented by twos:TweetSentimentSet, and the emotion analysis
results, represented by twos:TweetEmotionSet. The twos:TweetSentiment class is used
to represent the detected sentiment and stores the associated strength through the
twos:hasSentimentStrength property. The twos:TweetEmotion class provides the link with
the detected emotion from the Emotion Ontology and stores the strength of the detected
emotion through the twos:hasEmotionStrength property.
4 Ontology based Sentiment and Emotion Analysis
The section shows how the proposed ontologies can be used to perform automatic se-
mantic web-based sentiment and emotion social media analysis.
Fig. 8. Sentiment and emotion analysis steps
Extracting sentiments and emotions from tweets is known to be a challenging task
for several reasons. Among the difficulties encountered while performing aspect
sentiment and emotion analysis, we can mention the huge variety of topics covered, the
informality of the language, as well as the extensive usage of abbreviations and
emoticons. Besides this, the concise nature of Twitter messages can be considered
both an advantage and a drawback. Further reasons are explained in [29, 30].
The steps used for sentiment and emotion analysis are shown in Fig. 8 and further
described in the subsections below.
4.1 Tweet Retrieval
First, the tweets are retrieved using the Twitter Public Stream API, using as track pa-
rameters all the combinations between the keywords associated with the individuals
belonging to twos:AnalyzedEntity and the corresponding individuals from the twos:An-
alyzedEntityFacet class.
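The way the track parameter could be assembled from the stored query terms is sketched below in Python. The build_track_terms helper and the sample keywords are hypothetical; the sketch assumes the documented behaviour of the streaming API's track parameter, where a space inside a phrase acts as a logical AND and a comma between phrases acts as a logical OR.

```python
from itertools import product


def build_track_terms(entity_terms, facet_terms):
    """Combine every entity keyword with every facet keyword.

    Each resulting phrase matches only tweets containing both terms,
    assuming that a space inside a track phrase means logical AND.
    """
    return [f"{entity} {facet}"
            for entity, facet in product(entity_terms, facet_terms)]


# Hypothetical query terms, as they might be stored in twos:hasQueryTerm.
entity_terms = ["lumia", "#lumia"]
facet_terms = ["battery", "display", "camera"]

phrases = build_track_terms(entity_terms, facet_terms)
track = ",".join(phrases)  # a comma between phrases means logical OR
print(track)
```

Since the number of phrases grows with the product of the two term lists, a real deployment would also need to respect the limit the API places on the number of tracked phrases.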
Given the fact that unexpected or important events can immediately lead to a huge
number of tweets being written every minute, the retrieved tweets are first stored in a
high-performance non-relational database and are only afterwards analyzed. Based on
the in-depth comparison of existing non-relational databases provided in [31] and on
our preliminary tests, Apache Cassandra has been chosen for the proposed platform.
4.2 Language Identification
An accurate identification of the language used to write the tweet is highly important
given the fact that many natural language processing algorithms and linguistic re-
sources can only be used with the language for which they were created, with additional
customizations required for other languages.
Previously, the language had to be determined using language detection algorithms
adapted for social media, such as the one presented in [32], which includes a modified
version of the original TextCat identification algorithm described in [33]. The algorithm
uses n-gram frequency models to discriminate between the different languages.
Currently, the response received from the Twitter API also includes a field with the
detected language.
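The n-gram frequency idea can be sketched as follows. This is a simplified Python illustration of rank-ordered character n-gram profiles compared with an "out-of-place" distance, not the modified algorithm used in [32]; the function names, the tiny training sentences and the parameter values are assumptions chosen for brevity.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top_k=300):
    """Ranked list of the most frequent character n-grams (n = 1..max_n)."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Most frequent first; ties broken alphabetically to stay deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    return ranked[:top_k]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the language profile
    receive the maximum penalty."""
    rank = {gram: r for r, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(abs(r - rank[gram]) if gram in rank else max_penalty
               for r, gram in enumerate(doc_profile))


def identify(text, profiles):
    """Return the language whose profile is closest to the text."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Toy training data; real profiles are built from large corpora.
EN_SAMPLE = "the weather is nice today and the battery of this phone is great"
ES_SAMPLE = "el tiempo es agradable hoy y la batería de este teléfono es genial"
profiles = {"en": ngram_profile(EN_SAMPLE), "es": ngram_profile(ES_SAMPLE)}

print(identify(EN_SAMPLE, profiles))
```

With profiles this small the classifier is only reliable on text close to the training sentences, which is one reason short, informal tweets call for adapted models.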
While adapting the required algorithms and linguistic resources for each language
holds the promise of providing more accurate results, automatic translation, such as the
one provided by the Google Translate API, can also be used for translating the text of
the tweets written in other languages.
The Twitter and TweetOntoSense ontologies presented in this paper are language
independent. However, the Emotion Ontology currently includes emotion words only
for English and Spanish. MultiWordNet can be used to populate the ontology with emo-
tion words for additional languages.
4.3 Preprocessing
The second step represents the preprocessing phase in which tokenization, normaliza-
tion and stemming are applied, as shown in Fig. 9. A comprehensive discussion regard-
ing the role of preprocessing can be found in [34].
Given the fact that many users write messages using casual language, the
normalization process includes:
• Removing duplicated letters, which frequently occur in Twitter messages to emphasize
a particular word, so that they do not interfere with the stemmer. For example, the
first tweet in the Sentiment140 corpus⁷, presented in [30], is:
“I loooooooovvvvvveee my Kindle2. Not that the DX is cool, but the 2 is fantastic in its
own right.”
• Converting all-caps words to lower case. While it can be argued that further
information regarding the intensity of a sentiment could be extracted from the use of
all-caps [16], this paper only focuses on extracting the associated emotions. An
example of a tweet that uses all-caps is:
“My Kindle2 came and I LOVE it! :)”.
[Figure: Twitter message → 1. Tokenization → 2. Normalization → 3. Stemming →
Preprocessed tweet text]
Fig. 9. Preprocessing steps
⁷ http://www.sentiment140.com/
• Replacing hashtags with the corresponding words.
• Replacing abbreviations with the corresponding regular words taken from the
Internet Lingo Dictionary. An example of a tweet that uses abbreviations is:
“AHH YES LOL IMA TELL MY HUBBY TO GO GET ME SUM MCDONALDS”
• Replacing emoticons with the corresponding emotions from the ontology of emotional
categories. The Internet Lingo Dictionary⁸ has been used to gather the emoticons,
together with their meaning, although other sources, such as the Smiley Ontology,
could prove equally useful. A similar set of emoticons is used in [30], with the
mention that they are only divided into emoticons for expressing positive and
negative feelings. Table 1 includes the emoticons that are mapped to the word
happiness during the preprocessing phase.
Table 1. Emoticons mapped to happiness
:)
: )
:-)
:-))
:-)))
;)
;-)
ˆ_ˆ
:-D
:D
=D
C:
=)
Table 2 includes the emoticons that are mapped to the word surprise during the prepro-
cessing phase.
Table 2. Emoticons mapped to surprise
:0
Table 3 includes the emoticons that are mapped to the word sadness during the prepro-
cessing phase.
Table 3. Emoticons mapped to sadness
:-(
:(
:((
: (
D:
Dx
‘n’
:\
/:
):-/
:’
=’[
:_(
/T_T
TOT
;_;
(:-(
The last operation of the preprocessing phase consists in applying the Porter stemmer
on the resulting sequence of words.
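The normalization operations described in this subsection can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the emoticon excerpt is taken from Tables 1 and 3, the abbreviation expansions are hypothetical stand-ins for dictionary entries, runs of three or more identical letters are collapsed to a single letter (a simplification), and stemming is left out because the Porter stemmer is not part of the Python standard library.

```python
import re

# Illustrative excerpts only. The emoticons come from Tables 1 and 3; the
# abbreviation expansions are assumptions standing in for dictionary entries.
EMOTICONS = {":)": "happiness", ":-)": "happiness", ":D": "happiness",
             ":(": "sadness", ":((": "sadness", ";_;": "sadness"}
ABBREVIATIONS = {"lol": "laughing out loud", "hubby": "husband"}

# Longest emoticons first, so that ":((" is not consumed as ":(" plus "(".
_EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True)))
# A letter (not a digit or underscore) repeated three or more times.
_REPEATS_RE = re.compile(r"([^\W\d_])\1{2,}")


def normalize(text):
    """Apply the normalization operations to a raw tweet and return tokens."""
    text = _EMOTICON_RE.sub(lambda m: f" {EMOTICONS[m.group(0)]} ", text)
    tokens = []
    for token in text.split():
        token = token.lstrip("#")                # hashtag -> plain word
        token = _REPEATS_RE.sub(r"\1", token)    # "loooove" -> "love"
        if token.isupper():                      # all-caps -> lower case
            token = token.lower()
        token = ABBREVIATIONS.get(token.lower(), token)
        tokens.append(token)
    return tokens


print(normalize("My Kindle2 came and I LOVE it! :)"))
print(normalize("I loooove my #Kindle2 LOL"))
```

Collapsing repeats to one letter would also shorten legitimately doubled letters when they are stretched (for example a stretched "cool"), so a production version might collapse to two letters and consult a dictionary instead.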
4.4 Sentiment and Emotion Identification
In the last step, sentiments and emotions are extracted from the preprocessed tweets.
The proposed ontologies and approach can easily be used with more advanced aspect
sentiment and emotion mining algorithms, like the ones presented in [11, 13, 20].
⁸ http://www.netlingo.com/smileys.php
As the novelty of the proposed approach lies in the ontology-based analysis of tweets
preceding and following the sentiment analysis phase, we have chosen a simple
sentiment and emotion mining approach, which only focuses on extracting explicit
sentiments and emotions.
Thus, emotions are determined by comparing the processed tweet with the stemmed
versions of the individuals in the enriched ontology of emotion categories. Sentiments
are determined by grouping the emotions into positive, negative and neutral ones. The
resulting knowledge is saved in the triple store for further analysis using SPARQL
queries, as shown in the fifth section of the paper. Even though the proposed emotion
analysis approach is relatively simple, it has been found to provide fairly good results
when tested on a publicly available corpus⁹ containing 5513 tweets collected for the
search terms “Microsoft”, “Apple”, “Twitter” and “Google”, which were annotated
with the following sentiment labels: positive, negative, neutral and irrelevant. From the
above-mentioned corpus, only the positive and negative tweets were analyzed, as they
are the ones that could express emotions. Thus, a subset of 973 tweets was selected for
further analysis, representing 17.64% of the initial set.
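A minimal Python sketch of this lexicon-based detection is given below. The word-to-class assignments in LEXICON and the polarity grouping in POLARITY are assumptions made for illustration and are not taken from the emotion ontology; the function names are likewise hypothetical.

```python
# Hypothetical excerpt of the enriched emotion lexicon: stemmed word -> class.
# "happi" is the Porter stem of "happiness". The assignments and the
# polarity grouping below are assumptions, not the ontology's content.
LEXICON = {"love": "Affection", "like": "Affection", "hate": "Anger",
           "happi": "Happiness", "upset": "Sadness"}
POLARITY = {"Affection": "positive", "Happiness": "positive",
            "Anger": "negative", "Sadness": "negative", "Neutral": "neutral"}


def detect_emotions(stemmed_tokens):
    """Return the emotion classes whose lexicon words occur in the tweet."""
    return [LEXICON[token] for token in stemmed_tokens if token in LEXICON]


def detect_sentiment(emotions):
    """Group the detected emotions into a single polarity label."""
    positive = sum(POLARITY[e] == "positive" for e in emotions)
    negative = sum(POLARITY[e] == "negative" for e in emotions)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


tokens = ["i", "love", "my", "kindl"]   # a preprocessed, stemmed tweet
emotions = detect_emotions(tokens)
print(emotions, detect_sentiment(emotions))
```

Because only explicit emotion words are matched, negation and sarcasm are ignored, which is consistent with the deliberately simple approach described above.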
Table 4. Emotion analysis results on the analyzed corpus
Emotion   Apple   Google   Microsoft   Twitter   Total
like         13        5           7         9      35
love          7        6           6         6      25
hate          6        1           2         1      10
hope          3        0           1         0       4
upset         1        0           0         0       1
Total        31       12          16        16      75
After comparing these tweets with the words included in the emotion ontology, 75
tweets were found to express emotions, the most frequent ones being “like” (35), “love”
(25), “hate” (10) and “hope” (4). An overview of the results is given in Table 4. The
results, grouped by the topmost emotions in the ontology, are shown in Fig. 10.
Analyzing the emotions at different levels of specificity can more easily be performed
thanks to the hierarchical organization of the various emotions.
Revealing a significant number of emotions using a simple detection approach
proves once more that users frequently express opinions in social media messages.
Aspect opinion mining can be performed by comparing the preprocessed tweets with
the twos:AnalyzedEntityFacet individuals associated with the twos:AnalyzedEntity for
which the tweet has been retrieved.
⁹ http://www.sananalytics.com/lab/twitter-sentiment/
Fig. 10. Emotion analysis results on the analyzed corpus
5 Social Media Analysis
The proposed semantic analysis approach can be used to develop an end-to-end
ontology-based social media analysis platform. A complete semantic approach is provided
through:
• modelling the analyzed emotions using the selected Emotion Ontology;
• modelling the Twitter-related data using the proposed Twitter Ontology;
• modelling the analyzed entities, representing for example products or services, their
facets and the analysis results using the proposed TweetOntoSense Ontology.
The approach offers multiple advantages, including the possibility to exploit the vast
amount of information readily available in the Linking Open Data cloud using the
technologies associated with the semantic web.
Moreover, using semantic web inference, new relations between the collected
information can be discovered automatically. For example, if the em:offended emotion is
associated with a tweet during the emotion identification phase, the inference engine
also associates the more general em:indignation and em:anger emotions. Fig. 11
shows the hierarchy relation between the three emotions in the Emotion Ontology.
Therefore, emotion analysis can easily be performed at various granularity levels.
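The effect of this kind of subclass inference can be imitated in a few lines of Python. The sketch below is not an inference engine; it simply walks a child-to-parent map whose entries are an excerpt of the hierarchy in Fig. 4 and Fig. 11, and the function names are hypothetical.

```python
# Class hierarchy excerpt taken from Fig. 4 and Fig. 11 (child -> parent).
SUBCLASS_OF = {"Indignation": "Anger", "Fury": "Anger",
               "Anger": "Emotion", "Happiness": "Emotion"}
# Individuals and their asserted class (rdf:type).
TYPE_OF = {"offended": "Indignation"}


def inferred_classes(individual):
    """Return the asserted class followed by all of its superclasses."""
    classes = []
    current = TYPE_OF.get(individual)
    while current is not None:
        classes.append(current)
        current = SUBCLASS_OF.get(current)
    return classes


def top_level_emotion(individual):
    """Return the class directly below Emotion, used for coarse grouping."""
    classes = inferred_classes(individual)
    return classes[-2] if len(classes) >= 2 else None


print(inferred_classes("offended"))   # ['Indignation', 'Anger', 'Emotion']
print(top_level_emotion("offended"))  # Anger
```

Grouping results by top_level_emotion corresponds to the coarse view of the results, while the full list supports analysis at any intermediate level.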
SPARQL queries provide the necessary means for performing advanced analysis,
while their structured results can easily be processed to create meaningful charts and
data tables in the user interface. For example, the following query retrieves from the
triple store all the studied entities together with the detected emotions.
SELECT ?analyzedEntity ?emotion
WHERE
{
  ?analysisResult rdf:type twos:AnalysisResult ;
      twos:hasEmotions ?tweetEmotionSet .
  ?tweetEmotionSet twos:hasTweetEmotion ?tweetEmotion .
  ?tweetEmotion twos:hasEmotion ?emotion ;
      twos:hasAnalyzedEntity ?analyzedEntity .
}
ORDER BY ASC(?analyzedEntity)
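The structured result of such a query can be turned into chart-ready counts with very little code. The Python sketch below assumes the endpoint returns the standard SPARQL JSON results layout (a "head" with variable names and "results.bindings" rows); the sample response, its IRIs and the helper names are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical response in the standard SPARQL JSON results layout.
response = {
    "head": {"vars": ["analyzedEntity", "emotion"]},
    "results": {"bindings": [
        {"analyzedEntity": {"type": "uri", "value": "http://example.org/entity/Apple"},
         "emotion": {"type": "uri", "value": "http://example.org/em#love"}},
        {"analyzedEntity": {"type": "uri", "value": "http://example.org/entity/Apple"},
         "emotion": {"type": "uri", "value": "http://example.org/em#love"}},
        {"analyzedEntity": {"type": "uri", "value": "http://example.org/entity/Apple"},
         "emotion": {"type": "uri", "value": "http://example.org/em#hate"}},
        {"analyzedEntity": {"type": "uri", "value": "http://example.org/entity/Google"},
         "emotion": {"type": "uri", "value": "http://example.org/em#like"}},
    ]},
}


def local_name(iri):
    """Keep only the part of an IRI after the last '#' or '/'."""
    return iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def emotions_per_entity(result):
    """Count how often each emotion was detected for each analyzed entity."""
    table = defaultdict(Counter)
    for row in result["results"]["bindings"]:
        entity = local_name(row["analyzedEntity"]["value"])
        emotion = local_name(row["emotion"]["value"])
        table[entity][emotion] += 1
    return table


for entity, counts in emotions_per_entity(response).items():
    print(entity, dict(counts))
```

The resulting nested counts map directly onto a grouped bar chart or a data table with one row per entity.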
[Figure: offended (rdf:type Indignation); Indignation, Anger, Emotion and Thing linked
upwards by owl:subClassOf]
Fig. 11. Emotion hierarchy
Also relying on inference, the query below returns the users who have written tweets
that express emotions derived from em:Happiness, ordered by the influence of each user,
measured as the number of followers. Influencers, defined as users with a large number
of followers, can thus be easily determined for each type of emotion.
SELECT ?user ?tweetContent ?followerCount
WHERE
{
  ?analysisResult rdf:type twos:AnalysisResult ;
      twos:hasEmotions ?tweetEmotionSet ;
      twos:analyzedTweet ?tweet .
  ?tweetEmotionSet twos:hasTweetEmotion ?tweetEmotion .
  ?tweetEmotion twos:hasEmotion ?emotion .
  ?tweet sioc:content ?tweetContent ;
      sioc:has_creator ?user .
  ?user tw:hasFollowerCount ?followerCount .
  ?emotion rdf:type em:Happiness .
}
ORDER BY DESC(?followerCount)
The architecture of a semantic social media platform using the proposed approach is
shown in Fig. 12. The social media analysis dashboard, representing the user interface
of the platform, communicates through a common REST approach with a Web API,
labeled TweetOntoSense API in the figure. The API performs the necessary SPARQL
queries on the semantic database and returns the results to the social media analysis
dashboard in JavaScript Object Notation (JSON) format. By reusing classes from the
Linking Open Data Cloud, advanced analyses can be performed that tap into the vast
information available in knowledge bases such as DBpedia.
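The exchange between the API and the semantic database can be illustrated by the request the API might prepare. The Python sketch below follows the SPARQL 1.1 Protocol convention of passing the query in a "query" parameter of a GET request and asking for JSON results through the Accept header; the endpoint URL and the build_sparql_request helper are hypothetical, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint of the semantic database; adjust to the actual store.
ENDPOINT = "http://localhost:3030/tweetontosense/sparql"


def build_sparql_request(query, endpoint=ENDPOINT):
    """Prepare (without sending) a SPARQL Protocol GET request for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request("SELECT ?s WHERE { ?s ?p ?o } LIMIT 1")
print(request.get_method(), request.full_url)
# Sending it would be: json.load(urllib.request.urlopen(request))
```

Keeping the SPARQL queries behind the Web API means the dashboard only deals with plain JSON over HTTP and never needs to know the structure of the triple store.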
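[Figure: the Social Media Analysis Dashboard exchanges HTTP requests and responses with
the TweetOntoSense API, which exchanges SPARQL query requests and responses with the
semantic database holding the TweetOntoSense, Twitter and Emotion ontologies, linked to
the Linking Open Data Cloud]
Fig. 12. Social media analysis platform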
6 Concluding Remarks
The present paper proposes a novel ontology-based approach to social media sentiment
and emotion analysis that better captures the wide array of feelings expressed in the
millions of tweets published every day. While most existing approaches only assign
simple positive, negative or neutral polarities, TweetOntoSense paves the way towards
fine-grained analysis using semantic web technologies, thus unlocking a vast amount of
emotional information that has previously been unavailable to companies and public
authorities trying to better understand their customers’ opinions through social media
analysis. A Twitter ontology reusing classes and properties from well-known ontologies
is also proposed.
Among further research directions, we consider extending the proposed approach to
other online social media networks, such as Facebook, LinkedIn and Google+, and
analyzing how the expressed emotions change over time as a result of changes in
user perception. The proposed ontologies will be available for download at
https://github.com/lcotfas/TweetOntoSense.
Acknowledgments. The study was produced as part of the SCOPANUM research pro-
ject, supported by grants from CSFRS (http://csfrs.fr/), and a doctoral grant from Pays
de Montbéliard Agglomération (http://www.agglo-montbeliard.fr/). The authors also
acknowledge the support of Leverhulme Trust International Network research project
"IN-2014-020".
References
1. Pak, A., Paroubek, P.: Twitter as a Corpus for Sentiment Analysis and Opinion
Mining. In: Proceedings of the Seventh International Conference on Language Re-
sources and Evalua-tion. pp. 1320–1326. , Valletta (2010).
2. Kontopoulos, E., Berberidis, C., Dergiades, T., Bassiliades, N.: Ontology-based
sentiment analysis of twitter posts. Expert Syst. Appl. 40, 4065–4074 (2013).
3. Delcea, C., Cotfas, L.-A., Paun, R.: Understanding Online Social Networks’ Users
– A Twitter Approach. In: Hwang, D., Jung, J.J., and Nguyen, N.-T. (eds.) Com-
putational Collective Intelligence. Technologies and Applications. pp. 145–153.
Springer International Publishing, Cham (2014).
4. Torkildson, M.K., Starbird, K., Aragon, C.: Analysis and Visualization of Senti-
ment and Emotion on Crisis Tweets. In: Luo, Y. (ed.) Cooperative Design, Visu-
alization, and Engineering. pp. 64–67. Springer International Publishing, Cham
(2014).
5. Rill, S., Reinel, D., Scheidt, J., Zicari, R.V.: PoliTwi: Early detection of emerging
political topics on twitter and the impact on concept-level sentiment analysis.
Knowl.-Based Syst. 69, 24–33 (2014).
6. Khadjeh Nassirtoussi, A., Aghabozorgi, S., Ying Wah, T., Ngo, D.C.L.: Text min-
ing for market prediction: A systematic review. Expert Syst. Appl. 41, 7653–7670
(2014).
7. Ghiassi, M., Skinner, J., Zimbra, D.: Twitter brand sentiment analysis: A hybrid
system using n-gram analysis and dynamic artificial neural network. Expert Syst.
Appl. 40, 6266–6282 (2013).
8. Mostafa, M.M.: More than words: Social networks’ text mining for consumer
brand sentiments. Expert Syst. Appl. 40, 4241–4251 (2013).
9. Thelwall, M., Buckley, K., Paltoglou, G.: Sentiment strength detection for the so-
cial web. J. Am. Soc. Inf. Sci. Technol. 63, 163–173 (2012).
10. Robaldo, L., Di Caro, L.: OpinionMining-ML. Comput. Stand. Interfaces. 35,
454–469 (2013).
11. Medhat, W., Hassan, A., Korashy, H.: Sentiment analysis algorithms and applica-
tions: A survey. Ain Shams Eng. J. 5, 1093–1113 (2014).
12. Tsytsarau, M., Palpanas, T.: Survey on mining subjective data on the web. Data
Min. Knowl. Discov. 24, 478–514 (2012).
13. Liu, B.: Sentiment Analysis and Opinion Mining. Synth. Lect. Hum. Lang. Tech-
nol. 5, 1–167 (2012).
14. Shadbolt, N., Berners-Lee, T., Hall, W.: The Semantic Web Revisited. IEEE Intell.
Syst. 21, 96–101 (2006).
15. Borst, W.N.: Construction of engineering ontologies for knowledge sharing and
reuse. Universiteit Twente (1997).
16. Roberts, K., Roach, M., Johnson, J.: EmpaTweet: Annotating and Detecting Emo-
tions on Twitter. In: Proceedings of the Eighth International Conference on Lan-
guage Resources and Evaluation. 3806–3813, Istanbul (2012).
17. Baggia, P., Burkhardt, F., Pelachaud, C., Peter, C., Zovato, E.: Emotion Markup
Language (EmotionML) 1.0, http://www.w3.org/TR/emotionml/.
18. Ashimura, K., Baggia, P., Burkhardt, F., Oltramari, A., Peter, C., Zovato, E.: Vo-
cabularies for EmotionML, http://www.w3.org/TR/2012/NOTE-emotion-voc-
20120510/.
19. Westerski, A., Iglesias Fernandez, C.A., Tapia Rico, F.: Linked opinions: Describ-
ing sentiments on the structured web of data. Presented at the 4th international
workshop Social Data on the Web , Bonn, Germany (2011).
20. Tsytsarau, M., Palpanas, T.: Survey on mining subjective data on the web. Data
Min. Knowl. Discov. 24, 478–514 (2012).
21. Khalid Belhajjame, James Cheney, David Corsar, Daniel Garijo, Stian Soiland-
Reyes, Stephan Zednik, Jun Zhao: PROV-O: The PROV Ontology,
http://www.w3.org/TR/prov-o/.
22. J. Hastings, W. Ceusters, B. Smith, K. Mulligan: Dispositions and processes in the
Emo-tion Ontology. In: Proceedings of ICBO 2011 (2011).
23. Francisco, V., Hervás, R., Peinado, F., Gervás, P.: EmoTales: creating a corpus of
folk tales with emotional annotations. Lang. Resour. Eval. 46, 341–381 (2012).
24. Montejo-Ráez, A., Martínez-Cámara, E., Martín-Valdivia, M.T., Ureña-López,
L.A.: Ranked WordNet graph for Sentiment Polarity Classification in Twitter.
Comput. Speech Lang. 28, 93–107 (2014).
25. M Baldoni, C Baroglio, V Patti, P Rena: From tags to emotions: Ontology-driven
sen-timent analysis in the social semantic web. Presented at the Intelligenza Arti-
ficiale (2012).
26. Togias, K., Kameas, A.: An Ontology-Based Representation of the Twitter REST
API. Presented at the November (2012).
27. Breslin, J.G., Passant, A., Decker, S.: The Social Semantic Web. Springer (2009).
28. Fornara, N., Ježić, G., Kušek, M., Lovrek, I., Podobnik, V., Tržec, K.: Semantics
in Multi-agent Systems. In: Ossowski, S. (ed.) Agreement Technologies. pp. 115–
136. Springer Netherlands (2013).
29. D. Maynard, K. Bontcheva, D. Rout: Challenges in developing opinion mining
tools for social media. In: Proceedings of the Eighth International Conference on
Language Resources and Evaluation. pp. 15–22. , Istanbul (2012).
30. E. Kouloumpis, T. Wilson, J. Moore: Twitter sentiment analysis: The good the bad
and the omg! In: Proceedings of the Fifth International Conference on Weblogs
and Social Media. pp. 538–541. , Barcelona (2011).
31. T Rabl, S Gómez-Villamor, M Sadoghi, V Muntés-Mulero, H.-A. Jacobsen, S
Mankov-skii: Solving big data challenges for enterprise application performance
management. In: Proceedings of the VLDB Endowmen. pp. 1724–1735 (2012).
32. Carter, S., Weerkamp, W., Tsagkias, M.: Microblog language identification: over-
coming the limitations of short, unedited and idiomatic text. Lang. Resour. Eval.
47, 195–215 (2013).
33. Cavnar, W., Trenkle, J.: N-gram-based text categorization. In: Proceedings of
SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Re-
trieval (1994).
34. Bao, Y., Quan, C., Wang, L., Ren, F.: The Role of Pre-processing in Twitter Sen-
timent Analysis. In: Intelligent Computing Methodologies. pp. 615–624. Springer
(2014).