Sentiment Analysis using Sentiwordnet and
Machine Learning Approach
(Indonesia general election opinion from the twitter
content)
Eka Miranda
Information Systems Department,
School of Information Systems
Bina Nusantara University
Jakarta, Indonesia 11480
ekamiranda@binus.ac.id
Mediana Aryuni
Information Systems Department,
School of Information Systems
Bina Nusantara University
Jakarta, Indonesia 11480
mediana.aryuni@binus.ac.id
Edwin Satya Surya
Information Systems Department,
School of Information Systems
Bina Nusantara University
Jakarta, Indonesia 11480
edwin.surya@binus.ac.id
Ricky Hariyanto
Information Systems Department,
School of Information Systems
Bina Nusantara University
Jakarta, Indonesia 11480
ricky.hariyanto@binus.ac.id
Abstract— Sentiment analysis, the computational process of identifying and categorizing opinions expressed in a piece of text, can be employed to extract insight and to determine the writer's opinion toward a particular topic. Most sentiment analysis work has targeted English text. Although a plethora of sentiment analysis methods has been reported, the task remains an interesting open question for Indonesian text. The development of machine learning models and the broad accessibility of Twitter data in recent years have led many researchers to adopt machine learning models to solve the sentiment analysis problem. The objective of this study is to build a sentiment analysis model using SentiWordNet and machine learning for Indonesian general election opinion in Indonesian-language Twitter content. The collected tweet data comprised the username and the tweet itself. The tweets concerned the 2019 general election figures Joko Widodo and Prabowo Subianto and were collected from November 13, 2018, to January 11, 2019, during the campaign period. The tweets were written in Indonesian. The results show that sentiment analysis with the Naïve Bayes classification method achieved 74.94% accuracy for the Joko Widodo topic and 71.37% accuracy for the Prabowo topic.
Keywords—sentiment analysis, SentiWordNet, twitter, naïve
bayes classification
I. INTRODUCTION
In the modern era, communication and socialization can take place indirectly through new media, namely social media. Social media, often called a social network, is a tool that people can use to communicate with each other without direct interaction between individuals [1]. About 120 million Indonesians use mobile devices, such as smartphones or tablets, to access social media (a penetration rate of 45%). Online activity through social media reaches 37% of the Indonesian population in a week [2]. Over the past two years, 90% of all the data in the world was produced from the internet [3]. Every day, the data created around the world increases by 2.5 quintillion bytes. This volume comes from a combination of many sources, including search engines such as Google and Yahoo, electronic mail, digital photos, the internet of things, and social media [4]. Facebook produced 6 billion bytes of data per day, and Twitter produced 500 million bytes of data per day [4]. Sentiment analysis, the computational process of identifying and categorizing opinions expressed in a piece of text, can be employed to extract insight and to determine the writer's opinion toward a particular topic [5]. Sentiment analysis is one of the most appealing research topics in computer science [6]. Sentiment analysis, or opinion mining, is one of the main tasks of NLP (Natural Language Processing) [7].
Zou et al. proposed a new method that combines social context and topic context to analyze microblog sentiment [8]. Fang and Zhan presented a sentiment polarity categorization process to analyze product review data [7]. Naiknaware et al. proposed social media sentiment analysis of online posts from Twitter using machine learning classifiers [9]. Hausler et al. examined the relationship between news-based sentiment, captured through a machine learning approach, and the US securitized and direct commercial real estate markets [10].
Social media has a great impact on various issues in the community, including among Indonesian people. The 2019-2024 Indonesian presidential election is currently the most appealing issue for Indonesians, and predicting the winner has become a hot and attractive topic of discussion among Indonesian citizens on social media. The candidate teams (Joko Widodo and Prabowo) have exploited information from various media, including social media, to gain insight into citizen opinion about the candidates. The collective opinion produced on social media (Twitter) directly influences the image of a brand or public figure. A brand or public-figure image is an important issue related to its credibility. Positive sentiment toward the brand or public figure has a positive impact on the person or the brand, and vice versa [2].
Research conducted by the Indonesian Press Council and the Danish embassy analyzed Twitter account profiles and the correspondence between a person's leaning toward a presidential candidate and the content of that person's tweets and retweets [11]. Citizen sentiment about the Indonesian presidential candidates could be considered as an input for campaign strategy. Politicians and candidates recognize the important impact of social media, and the internet and social media are potential campaign tools.
978-1-7281-3333-1/19/$31.00 ©2019 IEEE
2019 International Conference on Information Management and Technology (ICIMTech)
62
19-20 August 2019, Jakarta & Bali, Indonesia
Social media and the internet can be used as tools to deliver and communicate a candidate's work plan to the wider community [12]. Rich data about the Indonesian presidential candidates is available on social media (Twitter), so these data could be used to predict citizen sentiment about the candidates. One method for analyzing such text data is sentiment analysis.
Although many studies have proposed sentiment analysis for text content analysis, to the best of our knowledge only a few of them have used Indonesian text, especially for the Indonesian general election (presidential election) topic. Most sentiment analysis work has targeted English text. Hence, the objective of this study is to build a sentiment analysis model using SentiWordNet and machine learning for the Indonesian general election (presidential election) topic in Indonesian-language Twitter content. Such a sentiment analysis model could be used as an alternative text analysis tool to a survey approach.
II. LITERATURE STUDY
A. Sentiment Analysis
Opinion mining (sometimes called sentiment analysis) refers to the use of natural language processing (NLP), text analysis, and computational linguistics to identify, extract, quantify, and analyze the content of text information. Sentiment analysis is broadly used to analyze customer opinion, for instance, reviews and survey responses from social media [13].
Sentiment classification is the process of detecting whether a text expresses a positive or a negative opinion about an issue or topic. Sentiment classification has become a popular technique for Twitter content analysis. Tweet sentiment classification has been widely applied in various fields, such as politics, social issues, market-share research, and others [14].
The sentiment analysis process uses Natural Language Processing techniques (stemming, part-of-speech tagging) as well as additional resources (a thesaurus, a dictionary of sentiments or emotions) [15]. Sentiment analysis tasks are of two types: opinion mining and emotion mining. A taxonomy of sentiment analysis tasks is shown in Figure 1. This study addresses the opinion mining task.
Fig. 1. Taxonomy of sentiment analysis tasks [16]
B. Classification Type
Sentiment analysis can be performed with two approaches: the lexicon-based approach and the machine learning approach.
a. Machine learning approach
The machine learning approach can be grouped into two main types: supervised and unsupervised. The success of both depends primarily on selecting and extracting a suitable set of features to identify the sentiment [17].
b. Lexicon-based approach
The lexicon-based approach mainly depends on a sentiment dictionary. This can be a set of terms identified, compiled, and developed for traditional communication genres (for example, the OpinionFinder lexicon). It can also be a more complex structure, such as an ontology or a dictionary that measures the semantic orientation of words or phrases.
The sentiment classification types are shown in Figure 2. This study used the machine learning approach, with the SentiWordNet lexicon used to assign the class labels.
Fig. 2. The sentiment classification type [18]
C. Term Weighting
Term weighting is the process of calculating the weights
for each term in the vocabulary. There are two types of the
word weighting scheme namely supervised and unsupervised
word weighting. This study was used unsupervised word
weighting. Unsupervised word weighting is a weighting
scheme that does not depend on the data category. This
method namely: Term Frequency, Inverse Document
Frequency, Term Frequency-Inverse Document Frequency
and Binary. tf–idf (term frequency-inverse document
frequency), is a numerical statistic that is proposed to show
the importance of a word in a document [19]. tf-idf formula
is shown in equation 1 and 2.
W(d, t) = T F(d, t) (1)
d is the document and t is the word. N is the number of
documents, and d
t
is a document that contains the word t.
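Equations 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration that assumes raw term counts for TF(d, t) and the natural logarithm for the IDF. The toy documents are hypothetical. Note that scikit-learn's TfidfVectorizer, the class this study uses later, applies a smoothed IDF by default, ln((1 + N) / (1 + d_t)) + 1, and L2-normalizes each row, so its values differ from this unsmoothed form.

```python
import math
from collections import Counter


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Return W(d, t) = TF(d, t) * log(N / d_t) for every term of every document.

    documents: tokenized documents (lists of words).
    TF(d, t) is the raw count of t in d; d_t is the number of documents containing t.
    """
    n_docs = len(documents)
    # d_t: document frequency, counting each term at most once per document
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        term_freq = Counter(doc)
        weights.append(
            {t: tf * math.log(n_docs / doc_freq[t]) for t, tf in term_freq.items()}
        )
    return weights


# Hypothetical, already pre-processed tweets
docs = [
    ["good", "candidate", "good", "plan"],
    ["bad", "candidate"],
    ["good", "debate"],
]
for w in tf_idf(docs):
    print({t: round(v, 3) for t, v in w.items()})
```

A term that appears in every document gets a weight of 0, because log(N / N) = 0. Words that are rare across tweets therefore dominate the vector.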
D. SentiWordNet
SentiWordNet (version 1.0) is a lexical resource in which every synset of WordNet (version 2.0) is associated with three numerical scores: Obj(s), Pos(s), and Neg(s).
These scores define how objective, positive, and negative the terms contained in the synset are. Each score ranges from 0.0 to 1.0, and the three scores sum to 1.0 for each synset (a synset is a set of one or more synonyms). SentiWordNet uses a random-walk algorithm. The intuition behind the random-walk stage is that if two synsets or words share the same context, they tend to have the same sentiment. Each synset is associated with a positive or a negative context. The more relationships a synset has with a positive context, the greater its positive value, and vice versa [20].
E. Naïve Bayes Classifier
The Naïve Bayes classifier (NBC) is a machine learning method that uses probability and statistical calculations to predict future probabilities based on previous experience. This research used the Naïve Bayes classifier because the method requires only a small amount of training data to estimate the parameters needed for classification [21]. Naïve Bayes is a conditional probability-based model for classification. A problem instance is represented by a vector $\mathbf{x} = (x_1, \ldots, x_n)$ of $n$ features (independent variables). The model assigns to the instance a probability $p(C_k \mid \mathbf{x})$ for each of the $K$ possible outcomes or classes $C_k$. Bayes' theorem calculates this conditional probability using the formula in equation 3 [21].
$$p(C_k \mid \mathbf{x}) = \frac{p(C_k)\, p(\mathbf{x} \mid C_k)}{p(\mathbf{x})} \quad (3)$$
Under the "naive" assumption that the features are conditionally independent given the class, $p(\mathbf{x} \mid C_k) = \prod_{i=1}^{n} p(x_i \mid C_k)$.
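A minimal from-scratch sketch of a multinomial Naïve Bayes classifier over word tokens follows. It scores each class by log p(C_k) + Σ_i log p(x_i | C_k), which is the numerator of Bayes' theorem under the conditional-independence assumption. The denominator p(x) is dropped because it is the same for every class. The multinomial variant and add-one (Laplace) smoothing are assumptions for illustration, since the text does not state which Naïve Bayes variant was used. The class name and training tweets are hypothetical.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Hypothetical from-scratch multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "MultinomialNaiveBayes":
        self.classes = sorted(set(labels))
        n_docs = len(docs)
        class_counts = Counter(labels)
        # log prior p(C_k): fraction of training documents in class k
        self.log_prior = {c: math.log(class_counts[c] / n_docs) for c in self.classes}
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.vocab = {w for doc in docs for w in doc}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_likelihood(self, word: str, c: str) -> float:
        # p(x_i | C_k) with add-one smoothing so a word unseen in one class
        # does not drive that class's probability to zero
        count = self.word_counts[c][word]
        return math.log((count + 1) / (self.total_words[c] + len(self.vocab)))

    def predict(self, doc: list[str]) -> str:
        # argmax_k [log p(C_k) + sum_i log p(x_i | C_k)]; words never seen in
        # training are skipped, and p(x) is omitted because it is constant across classes
        scores = {
            c: self.log_prior[c]
            + sum(self.log_likelihood(w, c) for w in doc if w in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


train_docs = [["good", "plan"], ["great", "leader"], ["bad", "promise"], ["corrupt", "bad"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = MultinomialNaiveBayes().fit(train_docs, train_labels)
print(model.predict(["good", "leader"]))  # prints: positive
```

Working in log space avoids numerical underflow when a tweet contains many words, because products of small probabilities become sums.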
III. DATA AND METHODS
A. Dataset
The tweet data was retrieved from Twitter using the technology that Twitter provides free of charge to anyone who wants to retrieve its data. Two tools were used for this purpose. The first was Twitter Application Programming Interface (API) credentials. The second was a Python script using the Tweepy library that pulls the data and writes it to a CSV file with the detailed data: the date the tweet was taken, the user's username, and the tweet itself. The tweets concerned the 2019 general election figures Joko Widodo and Prabowo Subianto. The data were collected from November 13, 2018, to January 11, 2019, during the campaign period. The tweets were written in Indonesian.
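The storage side of the collection script can be sketched with Python's standard csv module. This is a hypothetical sketch, not the authors' script. Retrieving the tweets with Tweepy is left out because it requires API credentials. The record keys ("date", "username", "text") are assumed names that mirror the details listed above; they are not the authors' actual column names.

```python
import csv
import io
from typing import Iterable, TextIO

# Hypothetical column names mirroring the stored details:
# the date the tweet was taken, the user's username, and the tweet itself.
FIELDS = ["date", "username", "text"]


def write_tweets_csv(tweets: Iterable[dict], out: TextIO) -> int:
    """Write tweet records to CSV and return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for tweet in tweets:
        # Keep only the three stored fields and ignore any other keys
        writer.writerow({field: tweet[field] for field in FIELDS})
        count += 1
    return count


# Usage with an in-memory buffer; a real run would open a file with newline="".
sample = [
    {"date": "2018-11-13", "username": "user_a", "text": "Debat capres malam ini", "lang": "id"},
]
buffer = io.StringIO()
print(write_tweets_csv(sample, buffer))
print(buffer.getvalue())
```

DictWriter quotes fields that contain commas or quotes. This matters for tweet text, which often contains both.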
B. Methods
This study consists of four main steps: data pre-processing, determining class labels with SentiWordNet, sentiment analysis classification, and evaluating the results.
1. Data pre-processing
This step consists of:
a. Tweet selection
This step removes duplicate tweets, retweets, and tweets that consist solely of hashtags (#) or links.
b. Normalize the sentences
This step corrects word spelling and normalizes words based on an Indonesian dictionary.
c. Translate tweets from Indonesian into English
d. Data cleansing
This step removes special characters (hashtags, periods, commas), usernames, URLs, HTML tags, one-character words, excess spaces, and numbers.
e. Tokenization and case conversion
This step parses sentences, paragraphs, or documents into smaller parts called tokens (individual words). All tokens are then converted to lower case to eliminate differences between lower-case and upper-case forms.
f. Remove non-alphabetic tokens
2. Determine class label with SentiWordNet
This step takes every tweet, calculates the value of each word in the tweet based on SentiWordNet, and combines these values to determine the positive or negative value of the tweet.
3. Sentiment analysis classification
a. Term weighting
The TF-IDF method vectorizes each tweet, converting its text into numbers.
b. Feature selection
Bigrams (a type of N-gram) were used as the feature selection method (word selection). Bigrams group the words of a tweet into pairs of consecutive words, with the aim of achieving more accurate sentiment analysis results. Subsequently, k-fold cross-validation was employed to determine the training data and the test data. This study used 10-fold cross-validation, a common choice of k. In each round, nine folds were used to train the classifier and the remaining fold was used for testing.
c. Sentiment analysis classification
The Naïve Bayes method was employed for this classification.
4. Evaluate the result
The results were evaluated using a confusion matrix.
The research framework is shown in Figure 3.
Fig. 3. Research framework
IV. RESULT AND DISCUSSION
A. Data pre-processing
The cleansing and normalization process removed duplicate tweets and tweets that contain only a username, an emoticon, or a hashtag. This process produced 10661 tweets for the Joko Widodo topic and 6152 tweets for the Prabowo topic. The Python Google Translate library had limited capability for translating Indonesian into English. Therefore, all text resulting from the cleansing and normalization process was translated into English through Google Translate and the Cloud Translation API from Google Cloud Platform. Data cleansing itself consisted of the following operations:
- removing usernames, URLs, HTML tags, special characters, and one-character words;
- eliminating excess spaces;
- tokenization and case conversion (converting tweets into lower case);
- removing stop words;
- lemmatization and POS tagging.
Lemmatization was performed by calling the WordNetLemmatizer function. Part-of-speech tagging was then performed to identify each word in the sentences as a verb, adjective, adverb, or noun. The script for lemmatization and POS tagging is shown in Figure 4.
Fig. 4. Lemmatization and POS Tagging script
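The rule-based cleansing and tokenization steps can be approximated with regular expressions. The sketch below works under stated assumptions and covers only these steps:
- removing usernames, URLs, HTML tags, special characters, numbers, one-character words, and excess spaces;
- lower-casing and tokenizing;
- dropping stop words.

The patterns and the tiny stop-word list are illustrative assumptions. Normalization, translation, lemmatization, and POS tagging are not reproduced here, because the study performed them with external tools (Google Translate and a WordNet lemmatizer).

```python
import re

# Illustrative stop-word list (assumption); a real pipeline would use a full English list
STOP_WORDS = {"the", "a", "an", "is", "and", "to", "of", "for", "in"}

# Assumed patterns for the cleansing rules
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USERNAME_RE = re.compile(r"@\w+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
NON_ALPHA_RE = re.compile(r"[^A-Za-z\s]")  # special characters, digits, '#', punctuation


def clean_tweet(text: str) -> list[str]:
    """Cleanse, lower-case and tokenize an (already translated) English tweet."""
    text = URL_RE.sub(" ", text)
    text = USERNAME_RE.sub(" ", text)
    text = HTML_TAG_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    # split() tokenizes on whitespace and collapses excess spaces
    tokens = text.lower().split()
    # drop one-character words and stop words
    return [t for t in tokens if len(t) > 1 and t not in STOP_WORDS]


print(clean_tweet("@user_a The debate is GREAT!! 100% <b>agree</b> https://t.co/xyz #Pemilu2019"))
# prints: ['debate', 'great', 'agree', 'pemilu']
```

URLs are removed before the non-alphabetic filter runs. Otherwise fragments such as "https" and "co" would survive as tokens.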
B. Determine class label with SentiWordNet
Class labels are determined in two stages: splitting the text and lemmatizing the split text. Splitting parses the tweets into words, using the space character as the delimiter. Lemmatization ensures that the split words are reduced to their base forms, without additional letters (suffixes). Subsequently, a score-analysis program designed in this study accumulates the positive, neutral, and negative scores from the SentiWordNet dictionary for each word of a tweet sent by a Twitter user. The scripts to accumulate the total positive, negative, and neutral scores are shown in Figures 5-7, respectively.
Fig. 5. Script to accumulate positive score
Each tweet is labeled by comparing its accumulated scores:
- if the accumulated positive score is greater than the accumulated negative score, the tweet is recognized as positive sentiment;
- if the accumulated negative score is greater than the accumulated positive score, the tweet is recognized as negative sentiment;
- if the two accumulated scores are equal, the tweet is recognized as neutral sentiment.

The neutral score was calculated using the formula in equation 4.
$$\text{Neutral score} = 1 - (\text{Positive score} + \text{Negative score}) \quad (4)$$
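The labeling rule and the neutral-score formula above can be sketched as follows. This is a minimal sketch under stated assumptions:
- LEXICON is a tiny hypothetical stand-in for SentiWordNet, with made-up (positive, negative) scores per lemma;
- each word's neutral score is 1 minus the sum of its positive and negative scores;
- words missing from the lexicon are skipped.

The real SentiWordNet assigns scores per synset, so an actual implementation must also choose or average the synsets for each word.

```python
# Hypothetical (positive, negative) scores per lemma; NOT real SentiWordNet values
LEXICON = {
    "good": (0.75, 0.0),
    "honest": (0.625, 0.0),
    "bad": (0.0, 0.625),
    "lie": (0.0, 0.5),
    "debate": (0.0, 0.0),
}


def score_tweet(tokens: list[str]) -> dict[str, float]:
    """Accumulate positive, negative and neutral scores over the words of a tweet."""
    totals = {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    for word in tokens:
        if word not in LEXICON:
            continue  # assumption: unknown words contribute nothing
        pos, neg = LEXICON[word]
        totals["positive"] += pos
        totals["negative"] += neg
        totals["neutral"] += 1 - (pos + neg)  # per-word neutral score
    return totals


def label_tweet(tokens: list[str]) -> str:
    """Positive if accumulated positive > negative, negative if the reverse, else neutral."""
    totals = score_tweet(tokens)
    if totals["positive"] > totals["negative"]:
        return "positive"
    if totals["negative"] > totals["positive"]:
        return "negative"
    return "neutral"


print(label_tweet(["good", "honest", "debate"]))  # positive
print(label_tweet(["bad", "lie"]))                # negative
print(label_tweet(["debate"]))                    # neutral
```

As the rule is stated, the accumulated neutral score does not affect the label. A tweet is neutral only when its positive and negative totals tie, which includes tweets with no scored words.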
Fig. 6. Script to accumulate negative score
Fig. 7. Script to accumulate neutral score
Table I shows the classification result for the Joko Widodo and Prabowo topics. Data pre-processing produced 10661 and 6152 tweets for the two topics, respectively. For the Joko Widodo topic, 69.54% of the tweets have a positive score, 10.64% a neutral score, and 19.82% a negative score. For the Prabowo topic, 64.29% have a positive score, 10.71% a neutral score, and 25% a negative score. These accumulated scores were used as the reference to define the sentiment of each tweet.
TABLE I. THE CLASSIFICATION RESULT
| Subject / Sentiment | Positive | Neutral | Negative |
|---|---|---|---|
| Prabowo Subianto | 3955 | 659 | 1538 |
| Joko Widodo | 7414 | 1134 | 2113 |
C. Sentiment Analysis Classification
This step began with the term weighting process, which used the TfidfVectorizer class from the scikit-learn (sklearn) library. Two of its parameters were used: (1) the stop_words parameter, to remove stop words, and (2) the ngram_range parameter, to select the word range for the n-gram process. The data were then divided into training and testing sets. The KFold function with n_splits set to 10 produced, in each round, nine folds of the dataset as training data and one fold as testing data. Afterwards, sentiment analysis classification was performed using the Naïve Bayes classification method. The tweet data of each topic (Joko Widodo and Prabowo) were used to train a Naïve Bayes classifier. Once the classifier had been trained, the predicted sentiment label was appended to each tweet and the data were stored in an array. Finally, the actual sentiment labels and the predicted labels were used to calculate the accuracy of the Naïve Bayes classifier based on the confusion matrix.
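The classification stage described above can be sketched with scikit-learn. This is an illustrative sketch, not the authors' script. It makes the following assumptions:
- MultinomialNB is assumed as the Naïve Bayes variant;
- ngram_range=(1, 2) (unigrams plus bigrams) is assumed, because the exact range is not reported;
- shuffle and random_state are added for reproducibility;
- the tweets and labels are hypothetical placeholders for the translated, SentiWordNet-labeled data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.naive_bayes import MultinomialNB


def cross_validate_nb(tweets: list[str], labels: list[str], n_splits: int = 10) -> float:
    """Return the mean accuracy of TF-IDF + Naive Bayes over k folds."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    accuracies = []
    for train_idx, test_idx in kfold.split(tweets):
        # Fit the vectorizer on the training folds only, then reuse it on the test fold
        vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
        x_train = vectorizer.fit_transform([tweets[i] for i in train_idx])
        x_test = vectorizer.transform([tweets[i] for i in test_idx])
        model = MultinomialNB().fit(x_train, [labels[i] for i in train_idx])
        predicted = model.predict(x_test)
        accuracies.append(accuracy_score([labels[i] for i in test_idx], predicted))
    return sum(accuracies) / len(accuracies)


# Hypothetical translated tweets with SentiWordNet-derived labels
tweets = [
    "great leader with a good plan", "honest and good candidate",
    "bad promise again", "corrupt and bad government",
    "good economy plan", "terrible bad campaign",
] * 5
labels = ["positive", "positive", "negative", "negative", "positive", "negative"] * 5
print(f"mean accuracy: {cross_validate_nb(tweets, labels):.2%}")
```

Fitting the vectorizer inside each fold is a design choice of this sketch. It keeps the test fold's vocabulary and document frequencies out of the training features. Averaging the per-fold accuracies mirrors the 10-fold evaluation described above.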
Accuracy was calculated from the confusion matrix by dividing the number of correct predictions by the total number of predictions made. Recall is the proportion of actual positives that were identified correctly. Precision is the proportion of positive identifications that were actually correct. The F1 score is the harmonic mean of precision and recall. The confusion matrices for the classification results are shown in Table II and Table III. Their fractional entries appear to be counts averaged over the ten test folds.
TABLE II. THE CONFUSION MATRIX: JOKO WIDODO TOPIC
| Actual class / Predicted class | Positive | Negative | Neutral |
|---|---|---|---|
| Positive | 733.50 | 7.50 | 0.40 |
| Negative | 159.30 | 51.80 | 0.20 |
| Neutral | 97.10 | 2.70 | 13.60 |
TABLE III. THE CONFUSION MATRIX: PRABOWO TOPIC
| Actual class / Predicted class | Positive | Negative | Neutral |
|---|---|---|---|
| Positive | 385.20 | 9.00 | 1.30 |
| Negative | 115.40 | 37.00 | 1.40 |
| Neutral | 45.50 | 3.50 | 16.90 |
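The evaluation metrics can be computed directly from a confusion matrix whose rows are actual classes and whose columns are predicted classes. The standard-library sketch below applies them to the Table II values. Accuracy comes out at about 74.94%, matching the reported value. The macro-averaged recall, precision, and F1 computed from this pooled matrix differ somewhat from the reported averages. Those averages were presumably computed per fold and then averaged, so treat these three figures as approximations.

```python
# Table II (Joko Widodo topic): rows are actual classes, columns are predicted
# classes, both in the order positive, negative, neutral.
TABLE_II = [
    [733.50, 7.50, 0.40],
    [159.30, 51.80, 0.20],
    [97.10, 2.70, 13.60],
]


def evaluate(matrix: list[list[float]]) -> dict[str, float]:
    """Accuracy plus macro-averaged recall, precision and F1 from a confusion matrix."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    recalls, precisions, f1s = [], [], []
    for i in range(k):
        actual = sum(matrix[i])                          # row sum: actual members of class i
        predicted = sum(matrix[r][i] for r in range(k))  # column sum: predicted as class i
        recall = matrix[i][i] / actual if actual else 0.0
        precision = matrix[i][i] / predicted if predicted else 0.0
        # F1 is the harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        recalls.append(recall)
        precisions.append(precision)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "macro_recall": sum(recalls) / k,
        "macro_precision": sum(precisions) / k,
        "macro_f1": sum(f1s) / k,
    }


for name, value in evaluate(TABLE_II).items():
    print(f"{name}: {value:.2%}")
```

The pooled matrix shows why recall is low: most actual negative and neutral tweets are predicted as positive, the majority class.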
Sentiment analysis with the Naïve Bayes classification method achieved the following results:
- Joko Widodo topic: 74.94% accuracy, 45.23% average recall, 84.61% average precision, and a 46.62% average F1 score;
- Prabowo topic: 71.37% accuracy, 49.02% average recall, 77.7% average precision, and a 50.58% average F1 score.

The supervised classification method required a corpus (words with their positive and negative weights) to label the tweets. This research used the English-language SentiWordNet corpus to label the tweets, while the tweets used in this study were written in Indonesian. This research also experimented with an Indonesian-language corpus (Barasa) to classify tweets and found two issues: a lack of Python and GitHub documentation for the Indonesian language, and imbalanced sentiment weights for the words. Because of these two issues, this research used the SentiWordNet dictionary and translated the Indonesian tweets into English.
Unfortunately, not all tweets were written in formal Indonesian; almost all were written in informal Indonesian. Translating informal Indonesian into English was another challenge, because no informal Indonesian dictionary was available. In addition, the accuracy of the classifier depends on the data (the time period of data collection and the amount of data). The data were collected during the campaign period, and collecting data in a different time period would likely produce a different accuracy result.
V. CONCLUSION AND FUTURE WORK
This study developed a sentiment analysis model using SentiWordNet and a machine learning approach (Naïve Bayes classification) for Indonesian general election opinion from Twitter content. The Naïve Bayes classifier achieved 74.94% accuracy for the Joko Widodo topic and 71.37% accuracy for the Prabowo topic in analyzing the sentiment of Indonesian-language tweets about the Indonesian general election. A detailed workflow was introduced that includes four main steps: data pre-processing, determining class labels with SentiWordNet, sentiment analysis classification, and evaluating the results. Operationally, the tweets were cleansed and normalized and then translated into English through Google Translate and the Cloud Translation API from Google Cloud Platform. Text splitting, lemmatization of the split text, and SentiWordNet were employed to determine the class label of each tweet. Tweet sentiment classification was performed using the Naïve Bayes classifier. The results of this research could be used as an alternative text analysis tool to a survey approach for the Indonesian general election (presidential election) topic and could be considered as an input for campaign strategy.
ACKNOWLEDGMENT
We gratefully acknowledge the support of the Research and Technology Transfer Office (TO), Bina Nusantara University, which helped fund this paper.
REFERENCES
[1] I. Adiyana and R. F. Hakim, “Implementasi text mining pada mesin
pencarian Twitter untuk menganalisis topik – topik terkait KPK dan
Jokowi”, Prosiding Seminar Nasional Matematika dan Pendidikan
Matematika UMS 2015, pp. 570-581, Yogyakarta: Universitas
Muhammadiyah Surakarta, 2015.
[2] W. K. Pertiwi, “Riset ungkap pola pemakaian medsos orang
Indonesia”, Kompas, March 1, 2018. [Online]. Available: Kompas,
https://tekno.kompas.com. [Accessed: Oct. 14, 2018].
[3] B. Marr, “How much data do we create every day? The mind-blowing
stats everyone should read”, May 21, 2018. [Online]. Available:
https://www.forbes.com. [Accessed: Oct. 13, 2018].
[4] Yudhianto, “Just a day, 2.5 quintillion bytes are created around the
world,” Dec. 10, 2016. [Online]. Available:
https://inet.detik.com/consumer/d-3367959/cuma-sehari-25-quintillion-byte-tercipta-di-seluruh-dunia. [Accessed: Oct. 13, 2018].
[5] F. Thia, “An overview of future data trends that have an impact on
business,” Aug. 21, 2018. [Online]. Available:
https://sains.kompas.com/read/2018/08/21/205840823/gambaran-tren-masa-depan-data-yang-berdampak-pada-bisnis. [Accessed: Oct. 13, 2018].
[6] M. V. Mäntylä, D. Graziotin, and M. Kuutila, “The evolution of
sentiment analysis—A review of research topics, venues, and top
cited papers”, Computer Science Review, Volume 27, pp. 16-32, Feb.
2018.
[7] X. Fang and J. Zhan, “Sentiment analysis using product review data”,
Journal of Big Data, 2:5, pp. 1-14, 2015.
[8] X. Zou, J. Yang, and J. Zhang, “Microblog sentiment analysis using
social and topic context”, Plos One, 13:2, pp. 1-24, 2018.
[9] B. Naiknaware, B. Kushwaha, and S. Kawathekar, “Social Media
Sentiment Analysis using Machine Learning Classifiers”,
International Journal of Computer Science and Mobile Computing,
Vol.6 Issue.6, pp. 465-472, Jun. 2017.
[10] J. Hausler, J. Ruscheinsky, and M. Lang, “News-based sentiment
analysis in real estate: a machine learning approach”, Journal of
Property Research, Vol. 35, No. 4, pp. 344–371, 2018.
[11] E. F. Kusuma, “Bagaimana peran Twitter mempengaruhi politik
Indonesia?”, June 16, 2015. [Online]. Available: Inet Detik,
https://inet.detik.com. [Accessed: Oct. 16, 2018].
[12] S. Adi, M. Wulandari, A. K. Mardiana, and A. Muzakki, “Survei:
topik dan tren analisis sentimen pada media online”, Seminar
Nasional Teknologi Informasi dan Multimedia 2018, pp. 55 – 60,
Yogyakarta: Universitas AMIKOM Yogyakarta, 2018.
[13] S. Saad and B. Saberi, “Sentiment Analysis or Opinion Mining: A
Review”, International Journal on Advanced Science, Engineering
and Information Technology, Vol. 7, No. 5, pp. 1-7, 2017.
[14] J. D. Novaković, A. Veljović, S. S. Ilić, Z. M. Papić, “Evaluation of
Classification Models in Machine Learning”, Theory and
Applications of Mathematics & Computer Science, Vol. 7, No. 1, pp.
39-46, 2017.
[15] W. Medhat, A. Hassan, and H. Korashy, “Sentiment analysis
algorithms and applications: A survey”, Ain Shams Engineering
Journal, Vol. 5, Issue 4, pp. 1093–1113, 2014.
[16] A. Yadollahi, A. G. Shahraki, and O. R. Zaiane, “Current state of text
sentiment analysis from opinion to emotion mining”, ACM Computing Surveys, Vol. 50,
No. 2, Article 25, pp. 1-33, 2017.
[17] D. Tsarev, M. Petrovskiy, and I. Mashechkin, “Supervised and
Unsupervised Text Classification via Generic Summarization”,
International Journal of Computer Information Systems and Industrial
Management Applications, Volume 5, pp. 509-515, 2013.
[18] S. Symeonidis, “5 Things You Need to Know about Sentiment
Analysis and Classification”, Democritus University of Thrace,
[Online]. Available: KDnuggets,
https://www.kdnuggets.com/2018/03/5-things-sentiment-analysis-classification.html. [Accessed: Apr. 12, 2019].
[19] S. Qaiser and R. Ali, “Text Mining: Use of TF-IDF to Examine the
Relevance of Words to Documents”, International Journal of
Computer Applications, Volume 181, No.1, pp. 1-5, Jul. 2018.
[20] S. Baccianella, A. Esuli, and F. Sebastiani, “SentiWordNet 3.0: An
Enhanced Lexical Resource for Sentiment Analysis and Opinion
Mining”, Proceedings of the International Conference on Language
Resources and Evaluation, LREC 2010, 17-23 May 2010, Valletta,
Malta, pp. 2200-2204, 2010.
[21] G. Kaur and E. N. Oberai, “A Review Article On Naïve Bayes
Classifier With Various Smoothing Techniques”, International
Journal of Computer Science and Mobile Computing, Vol. 3, Issue.
10, pp. 864 – 868, Oct. 2014.
978-1-7281-3333-1/19/$31.00 ©2019 IEEE
2019 International Conference on Information Management and Technology (ICIMTech)
67
19-20 Au
g
ust 2019, Jakarta & Bali, Indonesia
... Nandi et al. (Pal Nandi et al. 2022) used a fuzzy and fast Fourier transform in sentiment classification. This paper shows a different way to use the sentiment of the text using Senti-WordNet score (Firmanto and Sarno 2018;Gohil and Patel 2019;Husnain et al. 2021;Miranda et al. 2019;Rao et al. 2020). These scores are very efficiently used for sentiment analysis purposes (Agushaka et al. 2023) for extensive text analysis. ...
Article
Full-text available
Fake news creates social turbulence, which may hamper our social or economic equilibrium. Researchers have harnessed machine learning (ML) and deep learning (DL) algorithms to combat this challenge, particularly in disparate environments. Numerous techniques have been created to classify false news based on various textual features, including deep learning, machine learning, and evolutionary methodologies. Although fake news sentiment analysis is not entirely new, sentiment score-based artificial news analysis is rarely used. Our method incorporates machine learning techniques and deep learning techniques, such as LSTM-BiLSTM, with SentiWordNet parser-obtained sentiment scores. This integration improves feature sets and enables a more detailed analysis of emotional context. This research pioneers using machine learning along with deep learning techniques based on sentiment scores, an innovative approach within the field. Our research substantially improves the detection of false news. Recall and F-measure are significantly enhanced using machine learning techniques with the COVID-19 dataset. Moreover, sentiment-based deep learning techniques used for both the LIAR and COVID-19 datasets surpass previous benchmarks, obtaining a remarkable accuracy improvement of over 15% on the LIAR dataset compared to existing literature. This pioneering sentiment score-based approach enhances fake news detection accuracy, offering a potent tool to counter misinformation and safeguard societal equilibrium.
... Context-aware sentiment analysis has emerged as a crucial area of study, especially in multilingual and culturally diverse contexts. Miranda et al. (2019) undertook the task of crafting a sentiment analysis model tailored to Indonesian Twitter data, focusing on opinions related to the Indonesian general election. By combining SentiWordNet with machine learning techniques, their research emphasized the significance of context-aware sentiment analysis in comprehending sentiments within diverse linguistic and cultural settings. ...
... Existing studies have either used a combination of lexical semantic dictionaries with ML or hybrid ML hybrid deep learning algorithms for Twitter sentiment analysis as shown in Table 1. Techniques based on SentiWordNet, WordNet and ML suffer from several challenges including design complexities due to data dimentionality because of data sparsity (Miranda et al., 2019;Asgarian et al., 2018;Khan et al., 2017;Dehkharghani et al., 2016). Techniques based on MLs and hybrid MLs (Dessì et al., 2021;Qureshi and Sabih, 2021;Baydogan and Alatas, 2021;Kapil and Ekbal, 2020;Mossie and Wang, 2020), among other shortcomings include their inability to capture underlying semantics of words. ...
Chapter
Demystifying Emerging Trends in Machine Learning (Volume 2) offers a deep dive into emerging and trending topics in the field of machine learning (ML). This edited volume showcases several machine learning methods for a variety of tasks. A key focus of this volume is the application of text classification for cybersecurity, E-commerce, sentiment analysis, public health and web content analysis. The 49 chapters highlight a wide variety of machine learning methods including SVNs, K-Means Clustering, CNNs, DCNNs, among others. Each chapter includes accessible information through summaries, discussions and reference lists. This comprehensive volume is essential for students, researchers, and professionals eager to understand the emerging trends reshaping machine learning today.
Chapter
This book series aims to provide a forum for researchers from both academia and industry to share their latest research contributions in the area of computing technologies and Data Sciences and thus to exchange knowledge with the common goal of shaping the future. The best way to create memories is to gather and share ideas, creativity and innovations.
Chapter
The election campaign provides the evaluation and experience of the voters. Analysing an election campaign involves following its various twists and turns to monitor and evaluate the electoral situation. India is one of the largest democracies, with diverse languages, races, and policies. Through manual processing during the election campaign, the government can control the situation. Voters' opinions are a key factor in determining election results, so it is necessary to process them to obtain a clear view of the election. To extract knowledge from voters' opinions, machine learning (ML)-based techniques are used to classify voters' opinions about political parties and their candidates. Sentiment analysis is key to identifying opinions about parties and to estimating voters' positive and negative opinions. This paper presents a survey of ML and classification techniques in the natural language processing (NLP)-based election campaign process. NLP is effective for processing people's opinions, and within NLP, sentiment analysis is the key step in identifying voters' opinions about political parties and candidates. The estimation is based on evaluating the election campaign to compute voters' opinions. Opinions about the candidates and the candidates' views are evaluated in the analysis.
Keywords: Election campaign; Machine learning; Natural language processing; Sentiment analysis
Chapter
Ternary Quantum-Dot Cellular Automata (TQCA) is a developing nanotechnology that guarantees lower power consumption, smaller size, and higher speed compared with current transistor technology. In this article, we propose a novel architecture for level-sensitive scan design (LSSD) in TQCA. These circuits are useful for building numerous logical and functional circuits. Simulation results for the proposed TQCA circuits are obtained using the QCADesigner tool. To meet a particular specification, the parameter values must be found using the Schrödinger equation; here, we optimize the different parameters in the Schrödinger equation.
Keywords: TQCA; LSSD; Quantum phenomenon for combinational as well as sequential logic; J-K flip-flop; Schrödinger equation; Energy; Power
Article
This paper discusses the use of TF-IDF (term frequency-inverse document frequency) for examining the relevance of keywords to documents in a corpus. The study focuses on how the algorithm can be applied to a number of documents. First, the working principle of TF-IDF and the steps to follow when implementing it are elaborated. Second, to verify the findings from executing the algorithm, results are presented, and the strengths and weaknesses of the TF-IDF algorithm are compared. This paper also discusses how these weaknesses can be tackled. Finally, the work is summarized and future research directions are discussed.
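The TF-IDF weighting described above can be illustrated with a short sketch. It assumes one common textbook variant: tf is the raw count normalised by document length, and idf(t) = log(N / df(t)) with a natural logarithm and no smoothing. The abstract does not say which variant the paper uses, and libraries often add smoothing terms. The function name `tf_idf` and the three toy documents are illustrative only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), where df(t) is the number of documents containing t
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term at most once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "the election debate",
    "the election result",
    "the football result",
]
for w in tf_idf(docs):
    print({t: round(v, 3) for t, v in w.items()})
```

A term that occurs in every document, such as "the" here, gets an idf of log(1) = 0 and therefore zero weight. A term unique to one document, such as "debate", gets the highest weight. This is the relevance-ranking behaviour the abstract refers to.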
Article
Analyzing massive volumes of user-generated microblogs is crucial in many fields and has attracted many researchers. However, processing such noisy and short microblogs is very challenging. Most prior works use only text to identify sentiment polarity and assume that microblogs are independent and identically distributed, ignoring the fact that microblogs are networked data. As a result, their performance is often unsatisfactory. Inspired by two sociological theories (sentimental consistency and emotional contagion), in this paper we propose a new method that combines social context and topic context to analyze microblog sentiment. In particular, unlike previous work that uses direct user relations, we introduce structure similarity context into social contexts and propose a method to measure structure similarity. We also introduce topic context to model the semantic relations between microblogs. Social context and topic context are combined through the Laplacian matrix of the graph built from these contexts, and Laplacian regularization is added to the microblog sentiment analysis model. Experimental results on two real Twitter datasets demonstrate that our proposed model can consistently and significantly outperform baseline methods.
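The Laplacian regularization mentioned above can be shown with a generic graph-Laplacian smoothness penalty. This sketch assumes the unnormalised Laplacian L = D - W and the quadratic penalty f^T L f on per-microblog sentiment scores f. It does not reproduce the paper's structure-similarity or topic-context weights: the 3-node weight matrix `W` is invented, and the full model would add such a term, scaled by some coefficient, to its loss. The code computes only the penalty itself and checks the standard identity f^T L f = 1/2 * sum_ij W_ij (f_i - f_j)^2.

```python
def laplacian(weights):
    """Unnormalised graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]

def laplacian_penalty(weights, scores):
    """Quadratic form f^T L f used as a Laplacian regulariser."""
    L = laplacian(weights)
    n = len(scores)
    return sum(scores[i] * L[i][j] * scores[j] for i in range(n) for j in range(n))

def pairwise_penalty(weights, scores):
    """Equivalent form: 1/2 * sum_ij W_ij (f_i - f_j)^2."""
    n = len(scores)
    return 0.5 * sum(
        weights[i][j] * (scores[i] - scores[j]) ** 2 for i in range(n) for j in range(n)
    )

# Hypothetical graph: microblogs 0 and 1 are strongly linked, microblog 2 only weakly.
W = [
    [0.0, 1.0, 0.1],
    [1.0, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
consistent = [1.0, 1.0, -1.0]    # linked microblogs share a polarity
inconsistent = [1.0, -1.0, -1.0]  # strongly linked microblogs disagree
print(round(laplacian_penalty(W, consistent), 6))    # 0.8
print(round(laplacian_penalty(W, inconsistent), 6))  # 4.4
```

The penalty is small when strongly connected microblogs receive similar sentiment scores. This is how such a term encodes the sentimental-consistency and emotional-contagion assumptions.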
Article
Opinion Mining (OM) or Sentiment Analysis (SA) can be defined as the task of detecting, extracting and classifying opinions about something. It is a natural language processing (NLP) task used to track the public mood toward a certain law, policy, marketing campaign, etc. It involves developing methods to collect and examine comments and opinions about legislation, laws, policies, etc., that are posted on social media. Information extraction is very important because it is a very useful technique, but it is also a challenging task. That is, extracting sentiment about an object across the Web requires automated opinion-mining systems. Existing techniques for sentiment analysis include machine learning (supervised and unsupervised) and lexicon-based approaches. The main aim of this paper is therefore to present a survey of sentiment analysis (SA) and opinion mining (OM) approaches and the various techniques used in this field. It also discusses the application areas and challenges of sentiment analysis, with insight into previous researchers' work.
Article
Sentiment analysis, or opinion mining, is one of the major tasks of NLP (Natural Language Processing), and it has gained much attention in recent years. In this paper, we aim to tackle the problem of sentiment polarity categorization, which is one of the fundamental problems of sentiment analysis. A general process for sentiment polarity categorization is proposed, with detailed process descriptions. The data used in this study are online product reviews collected from Amazon.com. Experiments for both sentence-level and review-level categorization are performed, with promising outcomes. Finally, we give insight into our future work on sentiment analysis.
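Sentiment polarity categorization, as described above, is often demonstrated with a multinomial Naive Bayes baseline. The sketch below shows that standard technique with add-one (Laplace) smoothing. It is not the process proposed in the abstract, whose details are not given here. The four review snippets and their labels are invented, and the function names are illustrative.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Multinomial Naive Bayes training with add-one (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = len(docs)
    model = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        log_prior = math.log(class_counts[label] / n_docs)
        # log P(w | class), smoothed over the shared vocabulary
        log_likelihood = {
            w: math.log((word_counts[label][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
        model[label] = (log_prior, log_likelihood)
    return model

def predict_nb(model, doc):
    """Return the label with the highest log posterior; unseen words are skipped."""
    scores = {}
    for label, (log_prior, log_likelihood) in model.items():
        scores[label] = log_prior + sum(
            log_likelihood[w] for w in doc.lower().split() if w in log_likelihood
        )
    return max(scores, key=scores.get)

reviews = [
    "great product works well",
    "excellent quality great value",
    "broke after one day",
    "terrible quality waste of money",
]
polarity = ["positive", "positive", "negative", "negative"]
model = train_nb(reviews, polarity)
print(predict_nb(model, "great quality"))   # positive
print(predict_nb(model, "terrible waste"))  # negative
```

Working in log space avoids numerical underflow when many word probabilities are multiplied. Smoothing keeps a word seen in only one class from forcing a zero probability in the other.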
Article
Sentiment analysis is one of the fastest-growing research areas in computer science, which makes it challenging to keep track of all the activity in the area. We present a computer-assisted literature review in which we use both text mining and qualitative coding to analyze 6,996 papers from Scopus. We find that the roots of sentiment analysis lie in studies of public opinion analysis at the beginning of the 20th century and in the text subjectivity analysis performed by the computational linguistics community in the 1990s. However, the outbreak of computer-based sentiment analysis occurred only with the availability of subjective texts on the Web. Consequently, 99% of the papers have been published after 2004. Sentiment analysis papers are scattered across multiple publication venues, and the combined number of papers in the top-15 venues represents only ca. 30% of the total. We present the top-20 cited papers from Google Scholar and Scopus and a taxonomy of research topics. In recent years, sentiment analysis has shifted from analyzing online product reviews to social media texts from Twitter and Facebook. Many topics beyond product reviews, such as stock markets, elections, disasters, medicine, software development and cyberbullying, extend the use of sentiment analysis.
Article
Sentiment Analysis (SA) is an ongoing field of research in text mining. SA is the computational treatment of opinions, sentiments and subjectivity of text. This survey paper gives a comprehensive overview of the latest updates in this field. Many recently proposed algorithm enhancements and various SA applications are investigated and briefly presented in this survey. These articles are categorized according to their contributions to the various SA techniques. Fields related to SA that have recently attracted researchers (transfer learning, emotion detection, and building resources) are discussed. The main target of this survey is to give a nearly complete picture of SA techniques and related fields, with brief details. The main contributions of this paper include the sophisticated categorization of a large number of recent articles and an illustration of recent research trends in sentiment analysis and its related areas.
Conference Paper
In this work we present SENTIWORDNET 3.0, a lexical resource explicitly devised for supporting sentiment classification and opinion mining applications. SENTIWORDNET 3.0 is an improved version of SENTIWORDNET 1.0, a lexical resource publicly available for research purposes, currently licensed to more than 300 research groups and used in a variety of research projects worldwide. Both SENTIWORDNET ...
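A lexical resource of this kind is typically consumed by looking up per-synset scores and aggregating them into a word-level polarity. The sketch below reads the tab-separated layout used, as we understand it, by the SentiWordNet 3.0 text distribution: POS, ID, PosScore, NegScore, SynsetTerms (space-separated `lemma#sense`), Gloss, with `#` comment lines. It derives the objective score as 1 - (PosScore + NegScore). The three sample lines are invented placeholders (fake IDs and scores), not real SentiWordNet entries. Averaging (pos - neg) over all senses is only one simple aggregation heuristic; weighting by sense rank is a common alternative.

```python
from collections import defaultdict

# Three invented lines in the SentiWordNet 3.0 column layout
# (POS, ID, PosScore, NegScore, SynsetTerms, Gloss). IDs and scores are placeholders.
SAMPLE = (
    "# POS\tID\tPosScore\tNegScore\tSynsetTerms\tGloss\n"
    "a\t00000001\t0.75\t0\tgood#1 fine#3\thaving desirable qualities\n"
    "a\t00000002\t0\t0.625\tbad#1\thaving undesirable qualities\n"
    "a\t00000003\t0.125\t0.25\tfine#1\tplaceholder gloss\n"
)

def parse_sentiwordnet(text):
    """Map (lemma, pos) -> list of (sense_number, pos_score, neg_score, obj_score)."""
    entries = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip comment/header lines and blank lines
        pos_tag, _synset_id, pos_score, neg_score, terms, _gloss = line.split("\t", 5)
        p, n = float(pos_score), float(neg_score)
        obj = 1.0 - (p + n)  # positive, negative and objective scores sum to one
        for term in terms.split():
            lemma, sense = term.rsplit("#", 1)
            entries[(lemma, pos_tag)].append((int(sense), p, n, obj))
    return entries

def word_polarity(entries, lemma, pos_tag):
    """Average (pos - neg) over all senses of a lemma; 0.0 if the lemma is absent."""
    senses = entries.get((lemma, pos_tag), [])
    if not senses:
        return 0.0
    return sum(p - n for _, p, n, _ in senses) / len(senses)

entries = parse_sentiwordnet(SAMPLE)
print(word_polarity(entries, "fine", "a"))  # 0.3125
```

Word-level scores like these can then be summed over a text, or fed as features to a classifier, as in the SentiWordNet-plus-ML studies cited in this section.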
Article
This paper examines the relationship between news-based sentiment, captured through a machine learning approach, and the US securitised and direct commercial real estate markets. We contribute to the literature on text-based sentiment analysis in real estate by creating and testing various sentiment measures using trained support vector networks. Using a vector autoregressive framework, we find that the constructed sentiment indicators predict the total returns of both markets. The results show that our sentiment measures lead the markets, even after controlling for macroeconomic factors and other established sentiment proxies. Furthermore, empirical evidence suggests a shorter response time of the indirect market relative to the direct one. The findings make a valuable contribution to real estate research and industry participants, as we demonstrate the successful application of a sentiment-creation procedure that enables short and flexible aggregation periods. To the best of our knowledge, this is the first study to apply a machine learning approach to capture textual sentiment relevant to US real estate markets.
Article
Sentiment analysis from text consists of extracting information about opinions, sentiments, and even emotions conveyed by writers towards topics of interest. It is often equated with opinion mining, but it should also encompass emotion mining. Opinion mining involves the use of natural language processing and machine learning to determine a writer's attitude towards a subject. Emotion mining uses similar technologies but is concerned with detecting and classifying writers' emotions toward events or topics. Textual emotion-mining methods have various applications, including gaining information about customer satisfaction, helping to select teaching materials in e-learning, recommending products based on users' emotions, and even predicting mental-health disorders. In surveys on sentiment analysis, which are often old or incomplete, the strong link between opinion mining and emotion mining is understated. This motivates the need for a different and new perspective on the sentiment analysis literature, with a focus on emotion mining. We present state-of-the-art methods and propose the following contributions: (1) a taxonomy of sentiment analysis; (2) a survey of polarity classification methods and resources, especially those related to emotion mining; (3) a complete survey of emotion theories and emotion-mining research; and (4) some useful resources, including lexicons and datasets.