Twitter Sentiment Analysis: A Case Study in the Automotive Industry


Sarah E. Shukri, Business Information Technology Department, The University of Jordan, Amman, Jordan
Rawan I. Yaghi, Business Information Technology Department, The University of Jordan, Amman, Jordan
Ibrahim Aljarah, Business Information Technology Department, The University of Jordan, Amman, Jordan
Hamad Alsawalqah, Computer Information Systems Department, The University of Jordan, Amman, Jordan
Abstract: Sentiment analysis is one of the fastest growing
areas; it uses natural language processing, text mining, and
computational linguistics to extract useful information that
supports decision making. In recent years, social media websites
have spread widely, and their user numbers are increasing
rapidly. The automotive industry is one of the largest economic
sectors in the world, with more than 90 million cars and
vehicles. It is highly competitive and requires sellers and
automotive companies to analyze and attend carefully to
consumers’ opinions in order to achieve a competitive advantage
in the market. Analysing consumers’ opinions using social media
data can be an effective way for automotive companies to enhance
their marketing targets and objectives. In this paper, a
sentiment analysis of a case study in the automotive industry is
presented. Text mining and sentiment analysis are used to analyze
unstructured tweets on Twitter and to extract the polarity and
emotion classifications for automotive brands such as Mercedes,
Audi, and BMW. The emotion classification results show that the
“joy” category is better for BMW than for Mercedes and Audi, and
that the sadness percentage is larger for Audi and Mercedes than
for BMW. Furthermore, the polarity classification shows that BMW
has 72% positive tweets, compared with 79% for Mercedes and 83%
for Audi. In addition, BMW has 8% negative polarity, compared
with 18% for Mercedes and 16% for Audi.
Keywords: Sentiment Analysis; Twitter; Automotive;
I. Introduction
Others’ opinions have always been an important piece of
information for consumers when it is time to make a buying
decision. Long before awareness of the World Wide Web became
widespread, people often relied on their friends’
recommendations and specialized magazines or websites as the
main sources of information. But with the growth of the web over
the last decade, social media now provides new tools to
efficiently create and share useful information [1]. This has
made it possible to find experiences and opinions almost
everywhere (blogs, forums, social networks, news portals,
content-sharing sites, etc.). Research indicates that using
social media sites is considered the best way to grow a business
in terms of money, time, effort, and other resources [2].
Although these opinions are meant to be helpful, their massive
availability and unstructured nature make it difficult for
companies to benefit from them. To solve this issue, a number of
techniques for analysing data generated by users on social media
sites have been developed. Sentiment analysis, also known as
opinion mining, is one such recent technique. It uses natural
language processing, text mining, and computational linguistics
to extract useful information and knowledge from source data.
The purpose of sentiment analysis is to classify the polarity of
a source text as positive, neutral, or negative. Text mining is
a crucial step in sentiment analysis, in which unstructured data
are analysed and scored based on how closely they relate to a
specific concept, so that they can be classified later based on
their scores [3].
The automotive industry is one of the largest and most highly
competitive economic sectors in the world. Due to the strong
competition, automotive companies are moving toward using social
media sites to reach more customers and advertise their products
in a considerably short time.
Twitter is one of the fastest growing social media websites in
the world. It is a microblogging service that enables users to
tweet on any topic with a maximum length of 140 characters. As
of June 2015, Twitter has more than 500 million users, of which
more than 302 million are active. With an average of 500 million
tweets created daily, Twitter has become one of the greatest
sources of information available on the Internet [4]. Thus,
Twitter data can be very useful for automotive marketers,
because it can be used to mine consumers’ opinions and reviews
in the automotive industry using sentiment analysis. This can
provide useful insights that help companies create a competitive
advantage over their competitors.
2015 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT)
978-1-4799-7431-3/15/$31.00 ©2015 IEEE
This research applies sentiment analysis to analyse people’s
opinions and reviews about three automotive companies:
Mercedes, Audi, and BMW. To do so, tweets are extracted from
Twitter and processed using text mining techniques. These tweets
are then classified based on the sentiment expressed in their
text [5]. In the end, tweets are classified into three
categories: positive sentiment, negative sentiment, or neutral
sentiment. As attempts to apply sentiment analysis in the
automotive industry are, to the best of our knowledge, very few
[10, 11], the results of this research can provide further
insights into the importance of analysing consumers’ reviews and
opinions in this industry.
The remainder of this paper is organized as follows: Section II
presents related work. Section III presents the methodology.
Section IV presents a demonstration of the method on the case
study and discusses the results. Section V concludes the paper
with a summary and an outlook on future research directions.
II. Related Work
With the explosion of Web 2.0 platforms, social media sites have
become a huge source of consumer voices. Capturing and analyzing
public opinions from social media sites has recently enjoyed a
huge burst of research activity. One of the resulting emerging
fields is sentiment analysis [1, 5]. Subsequently, hundreds of
papers have been published on the subject. Among these papers,
we focus on those most related to the work presented in this
paper, as follows:
In [6], the authors analyzed three of the most popular companies
in the pizza industry using text mining. The authors studied
information from social media sites about the users of those
companies and their competitors. The goal was to help those
companies improve their services and strategies to attract more
customers. They found that social media sites play an important
role in creating competitive advantage. The authors recommended
that a good understanding and use of social media users’
information can improve the relationship of companies with their
users, improve their service levels, and improve the quality of
their decisions.
Another work [7] presented a new approach to provide decision
support for vehicle defect discovery. The authors applied
techniques such as text mining and sentiment analysis to popular
social media communities. Their focus was on improving vehicle
quality management by analyzing social media. They found that a
good analysis of social media data can improve automotive
quality management strategies.
In an attempt to overcome the challenges that developers may
face while building opinion mining tools, the authors in [8]
developed a rule-based approach that can analyze the language
used on social media sites.
In [9], we can find a case study that applies sentiment analysis
to Twitter. The authors presented a method for performing
sentiment analysis and opinion mining using tweets. The first
step in the presented method is collecting the corpus and
preparing it for analysis, while the second is building a model
that classifies the tweets by sentiment (positive, negative, and
neutral) using the Naive Bayes (NB) algorithm.
Another work [10] introduced the J.D. Power and Associates
(JDPA) Sentiment Corpus. The JDPA corpus consists of users’ blog
posts containing opinions about automobiles. Moreover, the
authors presented statistics including inter-annotator agreement
and catalogued components of sentiment that occur naturally.
The authors in [11] used sentiment analysis to analyze a data
set of around 730,000 tweets published over a period of 19
weeks. Within this data set, they analyzed the tweets dealing
with the corporate crisis of Toyota in 2010. Their focus was on
the dynamics of discussions in social media, in order to reflect
the sentiments within these discussions. The authors identified
and investigated specific stages of communication, which they
called “quiet stages” and “peaks”.
III. Methodology
As the usage of social media sites grows and extends, companies
can use these sites to assess their position in the market as
well as that of their competitors. This can be done by studying
the data generated by users on these sites, which reveal users’
opinions and comments about the companies’ products or services.
Thus, in this paper we study the automotive industry in social
media and try to answer the following questions:
· What is the rate of using these companies’ data by users?
· What is the percentage of negative reviews and comments
compared to the positive ones?
· Which company is the leader in the automotive sector based on
the polarity classification of reviews and comments?
While social media provides great user engagement and leads to
an incredibly high level of communication between the user and
the seller, there are still some industries that do not engage
in social media. The automotive industry represents a great
example of engagement in social media. As published in the 2014
CMO Council report, roughly 1 out of 4 car buyers (23%)
discussed other users’ experiences and reviews before purchasing
their car; 38% of car customers said that they will use social
media for their next purchase; and 84% of car customers use
Facebook, with 24% of them using social media sites to purchase
their last car. Between October 2012 and April 2013, clicks on
automotive ads on Facebook increased sharply, jumping from 16%
to 39%.
In this paper, we first discuss the level of social media
engagement of the three automotive manufacturers. We extracted
the engagement percentages from the Talkwalker API. Since BMW,
Mercedes, and Audi are among the largest automotive brands in
Europe, it is important to discuss the level of their engagement
in social media. Figure 1 shows the engagement percentages on
different social media sites.
As Figure 1 shows, BMW has the largest engagement percentage on
Twitter, at 62%. Mercedes has the largest engagement percentages
through online news, blogs, and other sources, at 18%, 6%, and
30%, respectively. Audi also has a higher engagement percentage
on Twitter than Mercedes: 59% versus 47%.
Figure 1. Social Media Sites engagement percentage
A. Data collection
In this paper, we collected data from Twitter using the Twitter
API. The corpus contains 3000 tweets, which were extracted using
R.
B. Data pre-processing
Tweets are filtered to keep only those written in English. The
corpus covers three car brands: Mercedes, Audi, and BMW, each
represented by 1000 tweets. The tweets are extracted using a
search query consisting of the “@” symbol followed by the brand
name. To build a sound experiment, the dataset for each brand
was extracted from Twitter pages and users. We then prepared the
extracted datasets by cleaning them of unnecessary elements such
as retweet markers, username symbols, hashtags, numbers,
punctuation, stop words, extra whitespace, and HTML links. In
this paper, we applied the following text mining pre-processing
steps:
· Tokenization: reads the text to be mined, removes all tabs and
punctuation between words, and replaces them with white space,
· Filtering: removes words such as stop words, extremely
frequent words, and rarely occurring words,
· Lemmatization: transforms all verbs to the infinitive form and
all nouns to the singular form,
· Stemming: returns all words to their basic forms by removing
the plural ‘s’ from nouns and the ‘ing’ from verbs.
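The cleaning and pre-processing steps listed above can be sketched roughly as follows. This is a minimal illustration in Python rather than the R code used for the study; the stop-word list, the function names, and the suffix rules are assumptions made for the example. The "#" symbol is stripped while the hashtag word is kept, which is one possible reading of the cleaning step, and lemmatization is left out because it needs a dictionary resource.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "in",
              "and", "you", "can", "such"}

def clean_tweet(text):
    """Lowercase a tweet and strip links, retweet markers, usernames,
    hashtag symbols, digits and punctuation."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # HTML links
    text = re.sub(r"\brt\b", " ", text)         # retweet marker
    text = re.sub(r"@\w+", " ", text)           # usernames
    text = re.sub(r"[^a-z\s]", " ", text)       # '#', numbers, punctuation
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace

def tokenize(text):
    """Split the cleaned tweet on white space."""
    return clean_tweet(text).split()

def remove_stop_words(tokens):
    """Filtering step: drop tokens found in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def crude_stem(token):
    """Naive stand-in for a stemmer: strip a trailing 'ing' or plural 's'."""
    if token.endswith("ing") and len(token) > 5:
        return token[:-3]
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token

def preprocess(text):
    """Tokenize, filter and stem one tweet."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]

if __name__ == "__main__":
    print(preprocess("RT @BMW: Nice cars, you can try https://t.co/abc 2015!"))
```

A production pipeline would normally rely on an established stemmer and stop-word list instead of these hand-written rules.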
C. Sentiment Analysis Models
We used the Naïve Bayes (NB) classification algorithm to
classify polarity and emotions in the sentiment analysis. The NB
algorithm is simple, easy to implement, and efficient, with
acceptable accuracy. Two sentiment models are investigated,
based on a polarity lexicon [13] and an emotions lexicon [14].
The NB algorithm is a simple probabilistic model that assumes
all data attributes are independent given the class. It uses
Bayes’ theorem to solve classification problems by calculating
the maximum posterior probability of the class label given the
attribute set. Bayes’ theorem is given by the following
equation:
\[ P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)} \]
where C is the class label, X is the attribute set, P(C) is the
prior probability of the class, P(X|C) is the conditional
probability of the attributes given the class, and P(X) is the
probability of the attribute set.
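To make the role of the independence assumption concrete, the decision rule can be written out as below. This is a sketch in standard notation; the symbols x_1, ..., x_n for the individual attributes of X and \hat{C} for the predicted class are introduced here for illustration. Because P(X) is the same for every class, it can be dropped when classes are compared.

```latex
P(X \mid C) = \prod_{i=1}^{n} P(x_i \mid C)
\qquad\Longrightarrow\qquad
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```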
The first sentiment model uses an NB classifier that is trained
on the training data set and makes use of Wiebe's polarity
lexicon [13]. The training data set is annotated with three
classes: positive, neutral, and negative tweets.
The NB polarity classifier uses the polarity lexicon by matching
tweet words against lexicon words. Once the training process is
finished, the second step is to test the model using a testing
data set, which is not labeled. The testing process is used to
assess the accuracy of the built model. The last step is to
validate the model and extract the polarity percentages for the
three categories: positive, negative, and neutral.
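As an illustration of how an NB polarity classifier assigns a label, the following minimal sketch trains a multinomial NB model with add-one (Laplace) smoothing on a handful of made-up, already tokenized tweets. It is not the authors' implementation: the training examples, the function names, and the choice to learn word counts from labeled data instead of using the polarity lexicon [13] are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label) pairs.
    Returns per-class document counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(tokens, model):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)            # log prior P(C)
        total_words = sum(word_counts[label].values())
        for t in tokens:
            if t not in vocab:
                continue                                  # skip unseen words
            # add-one smoothed estimate of P(word | class)
            p = (word_counts[label][t] + 1) / (total_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set (not taken from the paper's corpus).
TRAIN = [
    (["nice", "car"], "positive"),
    (["excellent", "suv", "beautiful", "car"], "positive"),
    (["bad", "car"], "negative"),
    (["worst", "decision"], "negative"),
    (["new", "car", "preview"], "neutral"),
]
MODEL = train_nb(TRAIN)

if __name__ == "__main__":
    print(predict_nb(["beautiful", "car"], MODEL))
    print(predict_nb(["bad", "decision"], MODEL))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which is why the scores are summed rather than multiplied.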
The second NB classifier is trained on the training data set and
makes use of the Strapparava emotions lexicon [14]. The training
data set is annotated with seven classes: anger, disgust, fear,
joy, sadness, surprise, and unknown. As in the polarity
classification, the classifier relies on matching tweet words
against the emotions lexicon words.
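The lexicon matching idea can be illustrated with a simplified sketch. It replaces the NB scoring with a plain majority vote over lexicon matches, and the mini-lexicon below is hypothetical (the study uses the WordNet-Affect lexicon [14]). A tweet with no lexicon match falls into the "unknown" class.

```python
from collections import Counter

# Hypothetical mini-lexicon mapping words to emotions (an assumption for this sketch).
EMOTION_LEXICON = {
    "proud": "joy", "amazing": "joy", "happy": "joy",
    "sorry": "sadness", "sad": "sadness",
    "rubbish": "disgust", "crap": "disgust",
    "angry": "anger", "afraid": "fear", "wow": "surprise",
}

def classify_emotion(tokens):
    """Label a tokenized tweet with the emotion that has the most lexicon matches."""
    hits = Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)
    if not hits:
        return "unknown"          # no word of the tweet appears in the lexicon
    return hits.most_common(1)[0][0]

if __name__ == "__main__":
    print(classify_emotion(["proud", "to", "own", "an", "audi"]))
    print(classify_emotion(["new", "series", "coupe"]))
```

With a matching scheme of this kind, short tweets frequently contain no lexicon word at all, so a sizeable "unknown" class is to be expected.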
IV. Case Study and Results
The tweets collected about BMW, Mercedes, and Audi contain the
@BMW, @Mercedesbenz, and @Audi tags, respectively. Each tweet is
analysed and classified as positive, negative, or neutral based
on a query term and polarity classification. Table I, Table II,
and Table III contain some sample tweets about BMW, Mercedes,
and Audi, respectively, together with their polarity
classifications.
Table I. Sample tweets about BMW and their polarity classification
· #BMW Nice car, you can try
· Elegance and sportiness united in one vehicle: the new #BMW #series Coupé
· such a bad car #BMW
Table II. Sample tweets about Mercedes and their polarity classification
· @MercedesBenz Intelligent innovation and safety as never before. Preview of the future of the #EClass
· Amazing @MercedesBenz 300 SLR
· @MercedesBenz That's not what we'd expect. Please contact your local Workshop so that our Technicians inspect the issue.
Table III. Sample tweets about Audi and their polarity classification
· @audi Probably one of my worst decisions was buying an
· Proud to own an Audi @audi
· @audi Sorry RPM but this is rubbish. There is so much great motor sport happening and you dish up crap
· @Audi Excellent SUV from Audi! Beautiful Car!
The polarity classification for BMW, Mercedes, and Audi is shown
in Figure 2. The figure shows that BMW has 72% positive tweets,
compared with 79% for Mercedes and 83% for Audi. Furthermore,
BMW has 8% negative polarity, compared with 18% for Mercedes and
16% for Audi. This gives a good indication to customers seeking
to buy cars from manufacturers that have good reviews and
comments from owners, and it indicates to competitors that Audi
is a strong competitor.
Figure 2. Polarity classification for BMW, Mercedes, and Audi
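The percentages reported for each brand are class shares over that brand's classified tweets. A minimal sketch of this computation is given below; the function name and the label list are hypothetical. The sample labels are constructed to reproduce the 72% positive and 8% negative shares reported for BMW, and the remaining 20% is assumed here to be neutral.

```python
from collections import Counter

CLASSES = ("positive", "negative", "neutral")

def polarity_percentages(labels):
    """Return the share of each polarity class, in percent."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return {c: 100.0 * counts[c] / total for c in CLASSES}

# Made-up labels for 100 tweets (illustration only, not the paper's data).
SAMPLE_LABELS = ["positive"] * 72 + ["negative"] * 8 + ["neutral"] * 20

if __name__ == "__main__":
    print(polarity_percentages(SAMPLE_LABELS))
```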
Figure 3 shows the emotion classification results for the three
automotive companies. For BMW, 79% of tweets are labeled
“Unknown”, 5% “Joy”, 0.5% “Surprise”, 9% “Sadness”, 0% “Fear”,
5.5% “Anger”, and 1% “Disgust”. For Mercedes, 56.6% are labeled
“Unknown”, 31.9% “Joy”, 0.5% “Surprise”, 4.1% “Sadness”, 0.4%
“Fear”, 6.4% “Anger”, and 0.1% “Disgust”. For Audi, 63.2% are
labeled “Unknown”, 10% “Joy”, 17.7% “Surprise”, 5.1% “Sadness”,
0.2% “Fear”, 1.3% “Anger”, and 2.4% “Disgust”. These results
give a good indicator for customers seeking to buy cars and help
them make the right decision. We can note that the “joy”
category was better for BMW than for Mercedes and Audi. This may
be because positive reviews do not always have to be labeled
“Joy”; other categories can also be regarded as positive when
they have no negative implication.
Figure 3. Emotion classification for BMW, Mercedes, and Audi
V. Conclusion
Sentiment analysis is considered one of the most attractive
fields to study and apply in various sectors. In this paper,
sentiment analysis models are applied to three leading
automotive companies to extract the polarity and emotions
(opinions) of customers about each company, which is very useful
information for marketing. The results showed that Audi’s
positive polarity (83%) was higher than that of the other
companies. On the other hand, the negative polarity of Audi
(16%) is lower than that of Mercedes (18%). This means that, for
example, offers on Audi’s page would circulate to a higher
number of satisfied people than offers on the BMW and Mercedes
pages.
We can conclude that Audi users are more satisfied than the
users of the other two brands. This will help users who are
willing to buy a car to compare the three companies based on
previous users' opinions. In addition, the emotion
classification results were consistent with the polarity
classifications and give more information about each polarity
class.
References
[1] Cambria, Erik, et al. "New avenues in opinion mining and sentiment
analysis." IEEE Intelligent Systems 2 (2013): 15-21.
[2] Edosomwan, Simeon, et al. "The history of social media and its
impact on business." Journal of Applied Management and
Entrepreneurship 16.3 (2011): 79-91.
[3] Li, Nan, and Desheng Dash Wu. "Using text mining and sentiment
analysis for online forums hotspot detection and forecast." Decision
Support Systems 48.2 (2010): 354-368.
[4] Lima, Ana CES, and Leandro N. de Castro. "Automatic sentiment
analysis of Twitter messages." Computational Aspects of Social
Networks (CASoN), 2012 Fourth International Conference on. IEEE,
[5] Pang, Bo, and Lillian Lee. "Opinion mining and sentiment
analysis." Foundations and Trends in Information Retrieval 2.1-2
(2008): 1-135.
[6] He, Wu, Shenghua Zha, and Ling Li. "Social media competitive
analysis and text mining: A case study in the pizza
industry." International Journal of Information Management 33.3
(2013): 464-472.
[7] Abrahams, Alan S., et al. "Vehicle defect discovery from social
media." Decision Support Systems 54.1 (2012): 87-97.
[8] Maynard, Diana, Kalina Bontcheva, and Dominic Rout. "Challenges
in developing opinion mining tools for social media." Proceedings of
the @NLP can u tag #usergeneratedcontent (2012): 15-22.
[9] Pak, Alexander, and Patrick Paroubek. "Twitter as a Corpus for
Sentiment Analysis and Opinion Mining." LREC. Vol. 10. 2010.
[10] Kessler, Jason S., and Nicolas Nicolov. "The JDPA Sentiment Corpus
for the Automotive Domain."
[11] Stieglitz, Stefan, and Nina Krüger. "Analysis of sentiments in
corporate Twitter communication: A case study on an issue of
Toyota." Analysis 1 (2011): 1-2011.
[12] Rish, Irina. "An empirical study of the naive Bayes classifier." IJCAI
2001 workshop on empirical methods in artificial intelligence. Vol.
3. No. 22. IBM New York, 2001.
[13] Wilson, Theresa, Janyce Wiebe, and Paul Hoffmann. "Recognizing
contextual polarity in phrase-level sentiment analysis." Proceedings
of the conference on human language technology and empirical
methods in natural language processing. Association for
Computational Linguistics, 2005.
[14] Strapparava, Carlo, and Alessandro Valitutti. "WordNet Affect: an
Affective Extension of WordNet." LREC. Vol. 4. 2004.
2015 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT)
... It is one of the technological breakthroughs derived from environmental awareness and is predicted to be a "hot product" for multiple brands in the next few years. The significant increase in public interest eventually intrigues investors, which drives the industry to move forward [1] and efficiently apply social media advertisement to reach target customers [8]. Notable brands that have already been producing electric cars are Volkswagen, General Motor, and Chrysler. ...
... 638.361 tweets contain the keywords "#volkswagen", "#vw", "#@gm", "#generalmotors", "#chrysler" by the streaming process, which enables us to gain the real-time data provided by Twitter Application Programming Interface (API) [8]. The headline news is collected by a streaming method as well by using Python. ...
... Sentiment analysis refers to the use of natural language processing, text analysis, and linguistics to form valuable insights from data to help people make decisions [51]. Sentiment analysis allows for the bulk extraction of sentiment tendencies expressed by authors of Weibo and converts unstructured Weibo texts into structured sentiment indices for introduction into the model. ...
Full-text available
Intelligent vehicles refer to a new generation of vehicles with automatic driving functions that is gradually becoming an intelligent mobile space and application terminal by carrying advanced sensors and other devices and using new technologies, such as artificial intelligence. Firstly, the traditional autoregressive intelligent vehicle sales prediction model based on historical sales is established. Secondly, the public opinion data and online search index data are selected to establish a sales prediction model based on online public opinion and online search index. Then, we consider the influence of KOL (Key Opinion Leader), a sales prediction model based on KOL online public opinion andonline search index is established. Finally, the model is further optimized by using the deep learning algorithm LSTM (Long Short-Term Memory network), and the LSTM sales prediction model based on KOL online public opinion and online search index is established. The results show that the consideration of the online public opinion and search index can improve the prediction accuracy of intelligent vehicle sales, and the public opinion of KOL plays a greater role in improving the prediction accuracy of sales than that of the general public. Deep learning algorithms can further improve the prediction accuracy of intelligent vehicle sales.
... Specifically, using evaluations on social media due to the quick advancement of information technology. By using social media platforms, such as Twitter and Facebook, users can freely express their opinions and share their ideas, thoughts, experiences, and feelings (Shukri et al. 2015;Felt 2016;Wang et al. 2017;Poecze et al. 2018;Choi et al. 2020). ...
Full-text available
The present study aims to create a framework that analyses user posts related to a product of interest on social networking platforms. More precisely, by applying information mining techniques, posts are categorised according to the intention they express, the sentiment polarisation, and the type of opinion. The model operates based on linguistic rules, machine learning, and combinations. Six different methodologies are implemented to extract intent, sentiment, and type of opinion from a tweet. The final model automatically detects intention to buy or not to buy the product, intention to compare the product with other competitors, and finally, intention to search for information about the product. It then categorises the text according to the sentiment and depending on their expressed opinion. The dataset comprises tweets for each day of the iPhone 5’s life cycle, corresponding to 365 days. Additionally, it demonstrated that the business’s external or internal decisions affect the public purchasing audience’s opinions, sentiments, and intentions expressed on social media. Lastly, as a Business Intelligence tool, the framework recognises and analyses these points, which contribute substantially to the company’s decision-making through the findings.
... Shukri et al. [34] used Wilson et al. [35] polarity lexicon for classifying polarity, while Strapparava et al. [36] used an emotion lexicon for classifying emotions. The case study was related to polarity and emotion classification with regard to the automation industry. ...
... They post status (messages) called tweets. It is a short message with a length of 140 characters [7] initially which has upgraded the character length now. Hence, Twitter data is also used as the base to classify Build vs. Buy. ...
Full-text available
Over the past few decades, multiple software development process models, tools, and techniques have been used by practitioners. Despite using these techniques, most software development organizations still fail to meet customer's needs within time and budget. Time overrun is one of the major reasons for project failure. There is a need to come up with a comprehensive solution that would increase the chances of project success. However, the "make vs. buy" decision can be helpful for "in time" software development. Social media have become a popular platform for discussion of all sorts of topics, so software development is no exception. Software developers discuss all the pros and cons of making vs. buy decisions on Twitter and other social media platforms. Twitter trending is a typical feature that evaluates the level of popularity of a specific event on online networking. A mixed-method approach comprising of interviews of software industry experts and Twitter data extraction is applied to scrutinize the effective decision of software build vs. buy decision. The findings of the analysis show that software makes vs. buy decisions depend on several factors including cost, development technology, software development team skills, and time. Based on the finding of the study a framework is proposed for the decision to build versus buy in Small and medium-sized enterprises (SMEs). Furthermore, the framework has been designed to statistically indicate make versus buy decisions of the organization and to suggest appropriate choices based on different parameters.
... Development of NLP dashboard, web application and Cloud deployment API will serve as master source for the enterprises to identify the topic models and sentiments of electric vehicles and to take business decisions" [9][10][11][12]. ...
Full-text available
Twitter is a well-known social media tool for people to communicate their thoughts and feelings about products or services. In this project, I collect electric vehicles related user tweets from Twitter using Twitter API and analyze public perceptions and feelings regarding electric vehicles. After collecting the data, To begin with, as the first step, I built a pre-processed data model based on natural language processing (NLP) methods to select tweets. In the second step, I use topic modeling, word cloud, and EDA to examine several aspects of electric vehicles. By using Latent Dirichlet allocation, do Topic modeling to infer the various topics of electric vehicles. The topic modeling in this study was compared with LSA and LDA, and I found that LDA provides a better insight into topics, as well as better accuracy than LSA.In the third step, the "Valence Aware Dictionary (VADER)" and "sEntiment Reasoner (SONAR)" are used to analyze sentiment of electric vehicles, and its related tweets are either positive, negative, or neutral. In this project, I collected 45000 tweets from Twitter API, related hashtags, user location, and different topics of electric vehicles. Tesla is the top hashtag Twitter users tweeted while sharing tweets related to electric vehicles. Ekero Sweden is the most common location of users related to electric vehicles tweets. Tesla is the most common word in the tweets related to electric vehicles. Elon-musk is the common bi-gram found in the tweets related to electric vehicles. 47.1% of tweets are positive, 42.4% are neutral, and 10.5% are negative as per VADER Finally, I deploy this project work as a fully functional web app.
... Furthermore, by making use of social networks' data, companies may identify these opinions and make decisions based on them to improve in target areas. Such approaches have been made between apparel brands [8], in the automotive industry [9], and in food chains [10], among others. ...
Conference Paper
Full-text available
The amount of information that social networks can shed on a certain topic is exponential compared to conventional methods. As new COVID-19 vaccines are approved by COFEPRIS in Mexico, society is acting differently by showing approval or rejection of some of these vaccines on social networks. Data analytics has opened the possibility to process, explore, and analyze a large amount of information that comes from social networks and evaluate people's sentiments towards a specific topic. In this analysis, we present a Sentiment Analysis of tweets related to COVID-19 vaccines in Mexico. The study involves the exploration of Twitter data to evaluate if there are preferences between the different vaccines available in Mexico and what patterns and behaviors can be observed in the community based on their reactions and opinions. This research will help to provide a first understanding of people's opinions about the available vaccines and how these opinions are built to identify and avoid possible misinformation sources.
Conference Paper
Full-text available
Social media is critical in today's world for exchanging information and disseminating ideas. A person's emotional impact has a significant impact on their day-today life. Sentiment analysis is a form of text mining that locates and pulls out subjective information from sources, allowing a company to track discussions online and monitor social sentiment about their brand, product, or service. Simply put, sentiment analysis helps determine the author's attitude towards a topic. Positive, neutral, or negative pieces of writing are classified by sentiment analysis software. Deep learning algorithms and various functions of natural language processing helps to interpret the written or spoken sentiments regarding a topic. An ecosystem where millions of bytes of data are produced daily has enabled sentiment analysis to be a key tool for interpreting these huge chunks of data. The purpose of this work is to conduct a sentiment analysis on "tweets" by making use of a variety of machine learning algorithms. The study will make an attempt to categorise the polarity of the tweet as either positive, negative, or neutral. In the event that a tweet has only positive, negative, or neutral components, the label assigned to the tweet will be determined by the sentiment that predominates.
Sentiment analysis, also called opinion mining, studies people's opinions towards products and services. Opinions are very important, as organizations always want to know what the public thinks about their products and services. People give their opinions via social media. With the advent of social media such as Twitter, Facebook, blogs, and forums, sentiment analysis has become important in every field, including automobiles, medicine, film, fashion, the stock market, mobile phones, and insurance. Analyzing and predicting opinions is called sentiment analysis. It is performed on opinion words, using either classification methods or sentiment lexicons. This chapter compares different methods of solving the sentiment analysis problem, their algorithms, merits and demerits, and applications, and also investigates different research problems in sentiment analysis.
Social media popularity and importance are on the increase, as people use it for various types of social interaction across multiple channels. This systematic review focuses on the evolving research area of Social Opinion Mining, tasked with the identification of multiple opinion dimensions, such as subjectivity, sentiment polarity, emotion, affect, sarcasm and irony, from user-generated content represented across multiple social media platforms and in various media formats, like text, image, video and audio. Through Social Opinion Mining, natural language can be understood in terms of the different opinion dimensions, as expressed by humans. This contributes towards the evolution of Artificial Intelligence, which in turn helps the advancement of several real-world use cases, such as customer service and decision making. A thorough systematic review was carried out on Social Opinion Mining research, which totals 485 published studies and spans a period of twelve years between 2007 and 2018. The in-depth analysis focuses on the social media platforms, techniques, social datasets, language, modality, tools and technologies, and other aspects derived. Social Opinion Mining can be utilised in many application areas, such as marketing, advertising and sales for product/service management, and in multiple domains and industries, such as politics, technology, finance, healthcare, sports and government. The latest developments in Social Opinion Mining beyond 2018 are also presented together with future research directions, with the aim of leaving a wider academic and societal impact in several real-world applications.
Knowing about communication of specific issues in social media has become increasingly important for the reactive and proactive stakeholder-communication of enterprises. Tools have been designed to monitor social media sites and to aggregate data of discussions in social media. However, these tools do not consider the dynamics of discussions and are not able to reflect sentiments within these discussions. In our contribution, we address these aspects by analyzing a data set of around 730,000 Tweets published in a time frame of 19 weeks. Within this data set, we analyzed those Tweets dealing with the corporate crisis of Toyota in 2010. We classified sentiments by using a linguistic approach. In this context, we identified and investigated specific stages of communication ("quiet stages" and "peaks"). Additionally, our study concentrates on the sentiments found in Tweets of the ten most active participants of the discussion.
While much work has recently focused on the analysis of social media in order to get a feel for what people think about current topics of interest, many challenges remain. Text mining systems originally designed for more regular kinds of texts, such as news articles, may need to be adapted to deal with Facebook posts, tweets, etc. In this paper, we discuss a variety of issues related to opinion mining from social media, and the challenges they impose on a Natural Language Processing (NLP) system, along with two example applications we have developed in very different domains. In contrast with the majority of opinion mining work, which uses machine learning techniques, we have developed a modular rule-based approach which performs shallow linguistic analysis and builds on a number of linguistic subcomponents to generate the final opinion polarity and score.
In this paper we present a linguistic resource for the lexical representation of affective knowledge. This resource (named WORDNET-AFFECT) was developed starting from WORDNET, through the selection and labeling of the synsets representing affective concepts. Affective computing is advancing as a field that allows a new form of human-computer interaction, in addition to the use of natural language. There is a wide perception that the future of human-computer interaction is in themes such as entertainment, emotions, aesthetic pleasure, motivation, attention, engagement, etc. Studying the relation between natural language and affective information and dealing with its computational treatment is becoming crucial. For the development of WORDNET-AFFECT, we considered as a starting point WORDNET DOMAINS (Magnini and Cavaglia, 2000), a multilingual extension of WordNet, developed at ITC-irst. In WORDNET DOMAINS each synset has been annotated with at least one domain label (e.g. SPORT, POLITICS, MEDICINE), selected from a set of about two hundred labels hierarchically organized. A domain may include synsets of different syntactic categories: for instance the domain MEDICINE groups together senses from Nouns, such as doctor#1 (i.e. the first sense of the word doctor) and hospital#1, and from Verbs such as operate#7. For WORDNET-AFFECT, our goal was to have an additional hierarchy of "affective domain labels", independent from the domain hierarchy, with which the synsets representing affective concepts are annotated.
This paper presents a rich annotation scheme for mentions, co-reference, meronymy, sentiment expressions, modifiers of sentiment expressions including neutralizers, negators, and intensifiers, and describes a large corpus annotated with this scheme. We describe how this corpus relates to recent, state-of-the-art work in sentiment analysis, and define the various annotation types, provide examples, and show statistics on occurrence and inter-annotator agreement. This resource is the largest sentiment-topical corpus to date and is publicly available. It helps quantify sentiment phenomena, and allows for the construction of advanced sentiment systems and enables direct comparison of different algorithms.
The naive Bayes classifier greatly simplifies learning by assuming that features are independent given class. Although independence is generally a poor assumption, in practice naive Bayes often competes well with more sophisticated classifiers. Our broad goal is to understand the data characteristics which affect the performance of naive Bayes. Our approach uses Monte Carlo simulations that allow a systematic study of classification accuracy for several classes of randomly generated problems. We analyze the impact of the distribution entropy on the classification error, showing that low-entropy feature distributions yield good performance of naive Bayes. We also demonstrate that naive Bayes works well for certain nearly-functional feature dependencies, thus reaching its best performance in two opposite cases: completely independent features (as expected) and functionally dependent features (which is surprising). Another surprising result is that the accuracy of naive Bayes is not directly correlated with the degree of feature dependencies measured as the class-conditional mutual information between the features. Instead, a better predictor of naive Bayes accuracy is the amount of information about the class that is lost because of the independence assumption.
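The independence assumption described above can be made concrete with a minimal sketch of a multinomial naive Bayes text classifier. This is an illustration only, not the experimental setup of the study summarised above: the function names `train` and `predict`, the add-one (Laplace) smoothing, and the four toy training documents are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs is a list of (tokens, label) pairs.

    Returns document counts per class, word counts per class, and the vocabulary.
    """
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(tokens, model):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior of the class
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Independence assumption: each word contributes its own factor,
            # with add-one smoothing to avoid zero probabilities.
            count = word_counts[label][tok]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training data, invented for this example.
toy_docs = [
    ("great engine love it".split(), "pos"),
    ("smooth ride great".split(), "pos"),
    ("bad brakes noisy".split(), "neg"),
    ("noisy cabin bad seats".split(), "neg"),
]
model = train(toy_docs)
print(predict("great smooth engine".split(), model))
```

Because the per-word factors are simply summed in log space, the classifier never models how words depend on one another, which is exactly the simplification whose consequences the abstract examines.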
This chapter presents a rich annotation scheme for mentions, co-reference, meronymy, sentiment expressions, modifiers of sentiment expressions including neutralizers, negators, and intensifiers, and describes a large corpus annotated with this scheme. We define the various annotation types, provide examples, and show statistics on occurrence and inter-annotator agreement. This resource is the largest sentiment-topical corpus to date and is publicly available. It helps quantify sentiment phenomena, and allows for the construction of advanced sentiment systems and enables direct comparison of different algorithms.
The distillation of knowledge from the Web—also known as opinion mining and sentiment analysis—is a task that has recently raised growing interest for purposes such as customer service, predicting financial markets, monitoring public security, investigating elections, and measuring a health-related quality of life. This article considers past, present, and future trends of sentiment analysis by delving into the evolution of different tools and techniques—from heuristics to discourse structure, from coarse- to fine-grained analysis, and from keyword- to concept-level opinion mining.
A pressing need of vehicle quality management professionals is decision support for the vehicle defect discovery and classification process. In this paper, we employ text mining on a popular social medium used by vehicle enthusiasts: online discussion forums. We find that sentiment analysis, a conventional technique for consumer complaint detection, is insufficient for finding, categorizing, and prioritizing vehicle defects discussed in online forums, and we describe and evaluate a new process and decision support system for automotive defect identification and prioritization. Our findings provide managerial insights into how social media analytics can improve automotive quality management.