International Journal of Recent Technology and Engineering (IJRTE)
ISSN: 2277-3878, Volume-7 Issue-4S, November 2018
Published By:
Blue Eyes Intelligence Engineering
& Sciences Publication
Retrieval Number: E1989017519
Twitter Sentimental Analysis
Shobana G, Vigneshwara B, Maniraj Sai A.
Abstract: Social media now plays an important role in exchanging
data and sharing thoughts, and a person's emotional state has a
significant effect on their day-to-day life. Sentiment analysis
is the process of analyzing the opinions a person expresses and
the polarity of those opinions. Twitter is a major platform for
sharing thoughts, opinions, and sentiments on different
occasions. Twitter sentiment analysis is a method of analyzing
the emotions expressed in tweets (messages posted by users on
Twitter). Tweets are useful for extracting users' sentiment
values; the data indicate polarity as positive, negative, or
neutral. The analysis focuses on a person's tweets and on
hashtags in order to understand how people view each situation.
This paper analyzes the accounts of famous people (e.g.,
@realdonaldtrump) and hashtags (e.g., #IPL2018) to understand
the mindset of people at the time the person tweeted or reacted
to an incident. The proposed system analyzes sentiment using
Python, the Twitter API, and TextBlob (a library for processing
text). The results help in analyzing posts with better accuracy.
Keywords: @realdonaldtrump, #IPL2018, TextBlob (library for
processing text).
I. INTRODUCTION
In recent years, young people have been moving towards social
media such as Google Plus, WhatsApp, Facebook, and Twitter.
Social media platforms in turn keep these users engaged by
surfacing trending topics within seconds. People now raise
social issues on these platforms through comments, reviews,
posts, hashtags, emojis, and so on; such posts are followed by
many people and quickly become popular. Social media also gives
businesses a tremendous opportunity to connect with consumers
easily. People rely heavily on user-generated content, such as
online comments, when making decisions. For example, someone
who wants to buy a product typically first searches for its
reviews online and discusses it on social media. Both the
content shown for the product and the discussion about it on
social media are taken into account, and this can decide
whether a business succeeds. Sentiment analysis (SA) automates
the analysis of the reviews and comments that people post on
social media. It indicates whether the reaction in each
scenario is favourable or unfavourable, using social media
posts and tags. In this way we can learn how people are
reacting to events currently taking place in the world.
Revised Version Manuscript Received on 25 November 2018.
Shobana G, Assistant Professor, Department of Computer Science and
Engineering, Kumaraguru College of Technology, Coimbatore (Tamil
Nadu), India.
Vigneshwara B, UG Scholars, Department of Computer Science and
Engineering, Kumaraguru College of Technology, Coimbatore (Tamil
Nadu), India.
Maniraj Sai A, UG Scholars, Department of Computer Science and
Engineering, Kumaraguru College of Technology, Coimbatore (Tamil
Nadu), India.
Fig 1.1 System Architecture
The system architecture, shown in Fig. 1.1, consists of the
following components: tweet extraction from Twitter, data
preprocessing, feature extraction, and a training set defined
for the analysis. The training set is a predefined set of
tweets labelled positive or negative, which is used to train a
classifier such as naive Bayes or a support vector machine
(SVM). The classifier then classifies new tweets according to
the training set and outputs the polarity of each tweet.
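The paper does not give the classifier implementation, so the following is only a minimal sketch of the naive Bayes option named above, assuming whitespace tokenization and add-one smoothing. The four training tweets and the function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_tweets):
    """Collect per-class document and word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_tweets:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior for the text."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothed word likelihood.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set of labelled tweets.
training_set = [
    ("what a great match", "positive"),
    ("i love this team", "positive"),
    ("terrible decision by the umpire", "negative"),
    ("i hate this result", "negative"),
]
model = train_naive_bayes(training_set)
print(classify("love this great team", model))
print(classify("i hate this terrible decision", model))
```

A real training set would need far more tweets than this; the sketch only shows how the predefined labelled set drives the polarity assigned to a new tweet.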
In this paper, we analyze the microblogging service Twitter and
classify tweets as expressing positive, negative, or neutral
sentiment. We explore a method for building such data using
Twitter hashtags (e.g., #bestfeeling, #DonaldTrump, #love) to
identify positive, negative, and neutral tweets for training
three-way sentiment classifiers. Tweets and hashtags are
therefore essential for analyzing what individual people think.
II. LITERATURE SURVEY
Sentiment analysis can be performed on text at several levels.
The first is the document level [3], where the classification
task determines the class of an object based on its attributes
(Turney, 2002; Pang and Lee, 2004). The second is the sentence
level [5], where each sentence is classified as expressing
negative, positive, or neutral sentiment (Hu and Liu, 2004; Kim
and Hovy, 2004). The third is the phrase level [4], where the
task is to determine whether an expression is neutral or polar
and then to disambiguate the polarity of the polar expressions
(Wilson et al., 2005; Agarwal et al., 2009). Twitter sentiment
has also been studied by Bermingham and Smeaton (2010) and Pak
and Paroubek (2010). Go et al. (2009) used distant supervision
to obtain sentiment data [8]. In this technique, tweets
containing positive emoticons such as ":)" and ":-)" are
treated as positive examples, and tweets containing negative
emoticons such as ":(" and ":-(" are treated as negative
examples. They built models using the Naive Bayes algorithm to
analyze the text, and the results were reported and visualized.
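The emoticon-based labelling described above can be sketched as follows. This is an illustrative reading of the idea, not the code of Go et al.; the function names and the decision to discard tweets with mixed emoticons are assumptions.

```python
POSITIVE_EMOTICONS = (":)", ":-)")
NEGATIVE_EMOTICONS = (":(", ":-(")


def emoticon_label(tweet):
    """Return 'positive', 'negative', or None when there is no clear signal."""
    has_positive = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_negative = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_positive == has_negative:
        return None  # no emoticon at all, or both kinds present
    return "positive" if has_positive else "negative"


def strip_emoticons(tweet):
    """Remove the emoticons so a classifier cannot simply memorize them."""
    for emoticon in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        tweet = tweet.replace(emoticon, "")
    return " ".join(tweet.split())


def build_training_set(tweets):
    """Keep only tweets with a noisy emoticon label, as (text, label) pairs."""
    training_set = []
    for tweet in tweets:
        label = emoticon_label(tweet)
        if label is not None:
            training_set.append((strip_emoticons(tweet), label))
    return training_set


print(build_training_set(["Great win :)", "so sad :-(", "no signal here"]))
```

The labels are noisy by construction, since an emoticon does not always match the sentiment of the text around it.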
They used unigrams, which capture single words occurring in the
text, and bigrams, which capture pairs of adjacent words, along
with part-of-speech (POS) features to analyze the tweets. The
unigram features gave the better results, while the bigram and
POS features did not improve the analysis.
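As a small illustration of the two feature types, the sketch below extracts unigrams and bigrams from a tokenized tweet. The helper name `ngrams` and the whitespace tokenization are assumptions made for the example.

```python
def ngrams(tokens, n):
    """Return all runs of n adjacent tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "not a good day".split()
unigrams = ngrams(tokens, 1)  # single words
bigrams = ngrams(tokens, 2)   # pairs of adjacent words
print(unigrams)
print(bigrams)
```

Bigrams can capture context that unigrams miss, although the study summarized above did not find that they improved results.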
Pak and Paroubek (2010) [2] collected tweets as data under a
similar distant supervision paradigm in order to build a model
for analysis. They performed a classification task that
separates subjective from objective text. Subjective
information is obtained from user tweets containing text,
images, or symbols, as in Go et al. (2009) [8]. Objective
information is obtained from well-known newspapers such as
"Times of India" and "Washington Post". The information taken
for analysis is a random sample of streaming tweets collected
using queries. In recent years, numerous papers have examined
Twitter sentiment and buzz [1], [2], [4] (Jansen et al. 2009;
Pak and Paroubek 2010; O'Connor et al. 2010; Tumasjan et al.
2010; Bifet and Frank 2010; Barbosa and Feng 2010; Davidov,
Tsur, and Rappoport 2010). Other researchers have started to
explore the use of part-of-speech features, but results remain
mixed. This area offers many interesting opportunities to
develop innovative applications, because the success of many
businesses depends on information available from online sources
such as blogs, Twitter, and other social networks.
Barbosa and Feng (2010) [6] analyzed sentiment classification
on Twitter data. After collecting test tweets, they used syntax
features of the tweets, including symbols, retweets, emoticons,
hashtags, links, and punctuation such as exclamation marks and
semicolons, in combination with other features to identify the
polarity of words.
Kamps et al. (2004) [7] analyzed the data using a lexical
database, that is, a description of lexemes. They used the
WordNet lexical database to derive the emotional content of a
word. A distance metric between words was used to determine the
semantic polarity of adjectives.
Researchers have also tried different ways of analyzing tweets.
The main approaches are: machine learning, which uses Naive
Bayes, maximum entropy, and SVM classifiers; semantic
orientation based on WordNet, which extracts synonyms and
similarity for the content features; lexicon-based analysis
applied to a dataset of preprocessed tweets; and hybrid
approaches, in which supervised machine learning and
lexicon-based methods are combined to improve sentiment
classification performance.
Gamon (2004) [9] performed sentiment analysis on feedback data
from a Global Support Services survey. The study examined the
role of features such as part-of-speech tags. It showed that
classifier accuracy depends on factors such as feature
selection, and demonstrated that abstract linguistic analysis
features contribute to classification accuracy.
Devaki P. et al. (2017) [15] analyzed Twitter data for an
election. The analysis indicates the popularity of parties in
the election based on positive tweets. The system uses a Naive
Bayes classifier to classify tweets as positive or negative.
A comparative study of existing opinion mining techniques
covers machine learning, Interdependent Latent Dirichlet
Allocation, and lexicon-based approaches, together with
cross-domain and cross-lingual methods and some evaluation
metrics. Concept-level sentence analysis uses a combined
lexicon-based and learning-based approach. The study found that
machine learning methods such as Support Vector Machine and
Naive Bayes have the highest accuracy and can be regarded as
baseline learning methods, while lexicon-based methods are very
effective in some cases.
More research is needed to determine whether the POS
features are just of poor quality due to the results of the
tagger or whether POS features are just less useful for
sentiment analysis in this domain. Features from an existing
sentiment lexicon were somewhat useful in conjunction with
microblogging features, but the microblogging features (i.e.,
the presence of intensifiers and positive/negative/neutral
emoticons and abbreviations) were clearly the most useful.
In this paper, we perform extensive feature analysis of tweets
using hashtags and IDs, and build classification models.
III. METHODOLOGY
In this method we use TextBlob to find the polarity of the text
(positive, negative, or neutral). The tweets are imported from
Twitter using the API provided by the Twitter Developer
platform. From this API, various fields such as tweets, source,
retweets, likes, language, and user can be extracted. After
collecting these data, we can analyze the thoughts of various
famous people on an event or occasion.
Fig 3.1 Architectural Flow of Twitter Analysis
Figure 3.1 shows the flow: tweets for an ID are extracted from
Twitter through the API, and the extracted data are then
preprocessed. Preprocessing excludes unwanted fields and
separates out the fields that are important for analysis. Once
the fields are extracted and separated, a CSV file is created.
From this CSV, the message length, likes, and retweets for the
ID are extracted and various results are derived. The scraped
tweets are then classified as positive, negative, or neutral.
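The classification step can be sketched as below. `TextBlob(text).sentiment.polarity` is TextBlob's polarity score in the range -1.0 to 1.0; the zero threshold and the helper names are assumptions, since the paper does not state the exact cut-offs it used.

```python
def polarity_label(polarity):
    """Map a polarity score in [-1.0, 1.0] to a three-way label."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def textblob_polarity(text):
    """Score text with TextBlob (requires `pip install textblob`)."""
    # Imported lazily so the rest of the sketch runs without TextBlob.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def classify_tweets(tweets, scorer=textblob_polarity):
    """Return (tweet, label) pairs using the given polarity scorer."""
    return [(tweet, polarity_label(scorer(tweet))) for tweet in tweets]


# Stand-in scorer with made-up scores, so the example runs without TextBlob.
demo_scores = {"good day": 0.7, "bad day": -0.7, "a day": 0.0}
print(classify_tweets(list(demo_scores), scorer=demo_scores.get))
```

Calling `classify_tweets(tweets)` without a `scorer` uses TextBlob itself. Passing the scorer as a parameter keeps the thresholding logic separate from the library that produces the score.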
A. Dataset Description
In the proposed system, we use a dataset called
twitterdataset.csv. It contains the following fields: Tweets,
Len, ID, Date, Source, Likes, RT's (Retweets), and SA
(Sentiment Analysis).
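A minimal sketch of how such a CSV could be produced from raw tweet records is shown below. The column names follow the field list above (the SA column would be added after scoring); the raw record keys (`text`, `id`, `created_at`, `source`, `favorite_count`, `retweet_count`) and the sample tweet are assumptions made for the example.

```python
import csv
import io

FIELDS = ["Tweets", "Len", "ID", "Date", "Source", "Likes", "RT's"]


def to_row(raw):
    """Keep only the fields needed for analysis from one raw tweet record."""
    return {
        "Tweets": raw["text"],
        "Len": len(raw["text"]),
        "ID": raw["id"],
        "Date": raw["created_at"],
        "Source": raw["source"],
        "Likes": raw["favorite_count"],
        "RT's": raw["retweet_count"],
    }


def write_csv(raw_tweets, handle):
    """Write the selected fields of every tweet to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for raw in raw_tweets:
        writer.writerow(to_row(raw))


# Hypothetical raw record; the unwanted "lang" field is dropped by to_row.
raw_tweets = [{
    "text": "Hello world",
    "id": 1,
    "created_at": "2018-11-01",
    "source": "Twitter for iPhone",
    "favorite_count": 10,
    "retweet_count": 3,
    "lang": "en",
}]
buffer = io.StringIO()
write_csv(raw_tweets, buffer)
print(buffer.getvalue())
```

To write a file instead of an in-memory buffer, pass a handle opened with `open("twitterdataset.csv", "w", newline="")`.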
B. Software Description
In the system, outputs such as tables, bar graphs, and line
graphs are generated with the help of Spyder and Jupyter
Notebook. The libraries and data types used are pandas, NumPy,
matplotlib (pyplot), list, and dictionary. Pandas is used to
load the CSV file into a data frame. NumPy is one of the
essential libraries for scientific computing in Python. It
provides a high-performance multidimensional array object and
tools for working with these arrays. Python includes several
built-in container types: lists, dictionaries, sets, and
tuples. A list is the Python equivalent of an array, but is
resizable and can contain elements of different types. A
dictionary stores (key, value) pairs, like a Map in Java or an
object in JavaScript. The Python library TextBlob is used for
processing textual data. It provides an API for natural
language processing (NLP) tasks such as part-of-speech tagging,
noun phrase extraction, sentiment analysis, classification,
translation, and more. Tweepy, an open-source library, is used
for accessing the Twitter API.
C. Data Analysis and Visualization
On Twitter, famous personalities tweet their thoughts and
opinions on an occasion. From these tweets, the importance of
the occasion and the polarity of the tweets are analyzed. Some
of the analyses performed on the dataset are as follows.
Calculate the average length of the tweets and visualize the average length for a period.
Visualize the favorites and retweets for each personality.
Visualize the various sources of the tweets.
Calculate the polarity of the tweets given by the person.
Visualize the polarity of the tweets (positive, negative, neutral).
Compare the tweet polarity of various famous personalities.
Analysis to calculate the average length of the tweet
and visualize the average length for a period.
The average length of a person's tweets is calculated using the
mean function. The average length of the tweets is then
visualized over a period.
Fig 3.2: Average length of Tweets for a given period
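The averaging step can be sketched with the standard library alone. The paper uses pandas, where the same result would come from a mean over the length column; grouping by calendar month and the `YYYY-MM-DD` date format are assumptions, as the paper does not state which period it used.

```python
from collections import defaultdict
from statistics import mean


def average_length(rows):
    """Mean tweet length in characters over all rows."""
    return mean(len(row["Tweets"]) for row in rows)


def average_length_by_month(rows):
    """Mean tweet length per month, keyed by the 'YYYY-MM' part of Date."""
    lengths_by_month = defaultdict(list)
    for row in rows:
        lengths_by_month[row["Date"][:7]].append(len(row["Tweets"]))
    return {month: mean(lengths)
            for month, lengths in sorted(lengths_by_month.items())}


# Made-up rows using the dataset's Tweets and Date fields.
sample_rows = [
    {"Tweets": "abcd", "Date": "2018-10-03"},
    {"Tweets": "abcdef", "Date": "2018-10-20"},
    {"Tweets": "ab", "Date": "2018-11-02"},
]
print(average_length(sample_rows))
print(average_length_by_month(sample_rows))
```

The per-month dictionary is the series that a line graph such as Fig 3.2 would plot.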
Analysis to visualize the favorites and retweets for each
personality.
To visualize the favorites and retweets for each personality,
the number of favorites and RT's is read from the CSV. The data
are then visualized and compared across personalities.
Fig 3.3: Narendra Modi Favorites and retweets
Fig 3.4 Rahul Gandhi Favorites and retweets
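Before plotting, the favorites and retweets have to be summarized per personality. The sketch below shows one way to do that under stated assumptions: the summary statistics chosen (totals and maxima), the placeholder names `person_a` and `person_b`, and the numbers are all invented for illustration.

```python
def engagement_summary(rows):
    """Total and maximum Likes and RT's over one personality's rows."""
    likes = [int(row["Likes"]) for row in rows]
    retweets = [int(row["RT's"]) for row in rows]
    return {
        "total_likes": sum(likes),
        "max_likes": max(likes),
        "total_retweets": sum(retweets),
        "max_retweets": max(retweets),
    }


# Values are strings because that is how csv.DictReader returns them.
rows_by_person = {
    "person_a": [{"Likes": "120", "RT's": "30"}, {"Likes": "80", "RT's": "50"}],
    "person_b": [{"Likes": "40", "RT's": "10"}, {"Likes": "60", "RT's": "5"}],
}
summaries = {person: engagement_summary(rows)
             for person, rows in rows_by_person.items()}
for person, summary in summaries.items():
    print(person, summary)
```

Each summary dictionary can then be passed to a plotting library to draw one chart per personality.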
Analysis to visualize the various sources of the tweets.
Tweets are posted through various sources such as a media
source, an iPhone, or an Android phone. The number of tweets
from each source used by a user is visualized, so that the
source most used by each personality can be identified.
Fig 3.5. Source for Narendra Modi Tweets
Fig 3.6 Sources for Rahul Gandhi Tweets
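Counting tweets per source is a simple tally. The sketch below uses `collections.Counter`; the source strings are made-up examples of what the Source field might contain.

```python
from collections import Counter


def source_counts(rows):
    """Number of tweets posted from each source."""
    return Counter(row["Source"] for row in rows)


sample_sources = [
    {"Source": "Twitter for iPhone"},
    {"Source": "Twitter for Android"},
    {"Source": "Twitter for iPhone"},
    {"Source": "Media Studio"},
]
counts = source_counts(sample_sources)
most_used_source, most_used_count = counts.most_common(1)[0]
print(counts)
print(most_used_source, most_used_count)
```

The counts map directly onto the slices of a pie chart or the bars of a bar graph.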
Analysis to calculate the polarity of the tweets given by
the person
Using TextBlob, calculate the polarity of each tweet, then
calculate the average polarity of the tweets and infer the
person's overall polarity level. This makes it possible to find
the polarity of the tweets on an occasion. The tweets at each
of the three levels (positive, negative, neutral) are
calculated and stored in lists.
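A minimal sketch of this step, assuming the polarity scores have already been computed (for example with TextBlob) and that zero is the boundary between the three levels. The scores below are invented.

```python
from statistics import mean


def split_by_polarity(scored_tweets):
    """Split (text, polarity) pairs into positive, negative, neutral lists."""
    positive = [text for text, polarity in scored_tweets if polarity > 0]
    negative = [text for text, polarity in scored_tweets if polarity < 0]
    neutral = [text for text, polarity in scored_tweets if polarity == 0]
    return positive, negative, neutral


def average_polarity(scored_tweets):
    """Mean polarity over all tweets of one person."""
    return mean(polarity for _, polarity in scored_tweets)


scored = [("t1", 0.5), ("t2", -0.5), ("t3", 0.0), ("t4", 0.5)]
positive, negative, neutral = split_by_polarity(scored)
print(positive, negative, neutral)
print(average_polarity(scored))
```

A positive average suggests the person's tweets lean positive overall, but the three lists give a fuller picture than the single number.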
Analysis to visualize the polarity percent
From the calculated polarity, we can visualize the polarity
through charts. The polarity is compared across personalities
to identify the personality who posts the highest share of
positive tweets.
Fig 3.7 Polarity level of Narendra Modi tweet.
Comparison of the tweets polarity of various famous
personalities
From the calculated polarity levels of the various
personalities, we can compare whether their tweets are mostly
positive, negative, or neutral. The polarity of each
personality's tweets is visualized and compared.
Fig 3.8 Comparison of the Polarity Levels
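The comparison can be sketched by turning each personality's labels into percentages and picking the one with the largest positive share. The placeholder names `person_a` and `person_b` and their labels are invented, and each label list is assumed to be non-empty.

```python
LEVELS = ("positive", "negative", "neutral")


def polarity_percentages(labels):
    """Percentage of tweets at each polarity level (labels must be non-empty)."""
    total = len(labels)
    return {level: 100.0 * labels.count(level) / total for level in LEVELS}


def most_positive(labels_by_person):
    """Personality whose tweets have the highest positive percentage."""
    return max(
        labels_by_person,
        key=lambda person: polarity_percentages(labels_by_person[person])["positive"],
    )


labels_by_person = {
    "person_a": ["positive", "positive", "negative", "neutral"],
    "person_b": ["positive", "negative", "negative", "neutral"],
}
for person, labels in labels_by_person.items():
    print(person, polarity_percentages(labels))
print(most_positive(labels_by_person))
```

The percentage dictionaries are the values a grouped bar chart of polarity levels would show side by side.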
IV. CONCLUSION
Twitter sentiment analysis falls under the category of text and
opinion mining. It focuses on analyzing the sentiment of tweets
and feeding the data to a machine learning model to train it
and check its accuracy, so that the model can be used in future
according to the results. It comprises steps such as data
collection, text preprocessing, sentiment detection, sentiment
classification, and training and testing the model. This
research topic has evolved during the last decade, with models
reaching an accuracy of almost 85%-90%. However, it still lacks
diversity in the data. It also has many application issues with
slang and abbreviated words. Many analyzers do not perform well
when the number of classes is increased. In addition, it has
not yet been tested how accurate the model will be for topics
other than the one under consideration. Hence sentiment
analysis has considerable scope for future development.
REFERENCES
1. Jansen, B. J.; Zhang, M.; Sobel, K.; and Chowdury, A. (2009),
“Twitter power: Tweets as electronic word of mouth”, Journal of the
American Society for Information Science and Technology
60(11):2169–2188.
2. Pak, A., and Paroubek, P. (2010), “Twitter as a corpus for sentiment
analysis and opinion mining”. In Proc. of LREC.
3. Pang, B., and Lee, L. (2008), “Opinion mining and sentiment
analysis”, Foundations and Trends in Information Retrieval
2(1–2):1–135.
4. Wilson, T.; Wiebe, J.; and Hoffmann, P. (2009), “Recognizing
contextual polarity: An exploration of features for phrase-level
sentiment analysis”, Computational Linguistics 35(3):399–433.
5. M. Hu and B. Liu (2004), “Mining and summarizing customer reviews”,
KDD.
6. L. Barbosa, J. Feng. “Robust Sentiment Detection on Twitter from
Biased and Noisy Data”. COLING 2010: Poster Volume, pp. 36-44.
7. J. Kamps, M. Marx, R. J. Mokken, and M. De Rijke, “Using wordnet
to measure semantic orientations of adjectives,” 2004.
8. Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment
classification using distant supervision. Technical report, Stanford.
9. David Zimbra, M. Ghiassi and Sean Lee, “Brand-Related Twitter
Sentiment Analysis using Feature Engineering and the Dynamic
Architecture for Artificial Neural Networks”, IEEE 1530-1605, 2016.
10. Varsha Sahayak, Vijaya Shete and Apashabi Pathan, “Sentiment
Analysis on Twitter Data”, (IJIRAE) ISSN: 2349-2163, January 2015.
11. Peiman Barnaghi, John G. Breslin and Parsa Ghaffari, “Opinion
Mining and Sentiment Polarity on Twitter and Correlation between
Events and Sentiment”, 2016 IEEE Second International Conference
on Big Data Computing Service and Applications.
12. Mondher Bouazizi and Tomoaki Ohtsuki, “Sentiment Analysis: from
Binary to Multi-Class Classification”, IEEE ICC 2016 SAC Social
Networking, ISBN 978-1-4799-6664-6.
13. Nehal Mamgain, Ekta Mehta, Ankush Mittal and Gaurav Bhatt,
“Sentiment Analysis of Top Colleges in India Using Twitter Data”,
(IEEE) ISBN -978-1-5090-0082-1, 2016.
14. “Twitter Sentimental analysis for Realdonaldtrump”,
https://www.geeksforgeeks.org/twitter-sentiment-analysis-using-python
15. Devaki P, Ilakiya J, Indumathi R and Arul Priya M,
“Geospatially and literally analysing tweets”, Journal of Advanced
Research in Dynamical and Control Systems, Volume 9, Special
Issue 14, 2017, pp. 1002-1009.