Journal of Information and Telecommunication
ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/tjit20
Social media sentiment analysis based on COVID-19
László Nemes & Attila Kiss
To cite this article: László Nemes & Attila Kiss (2020): Social media sentiment analysis based on
COVID-19, Journal of Information and Telecommunication, DOI: 10.1080/24751839.2020.1790793
To link to this article: https://doi.org/10.1080/24751839.2020.1790793
© 2020 The Author(s). Published by Informa
UK Limited, trading as Taylor & Francis
Group
Published online: 14 Jul 2020.
Department of Information Systems, ELTE Eötvös Loránd University, Budapest, Hungary
ABSTRACT
In today's world, social media is everywhere, and everybody comes into contact with it
every day. With social media data, we can now carry out a wide range of analyses and
statistics. Within the scope of this article, we analyse the sentiments and
manifestations (comments, hashtags, posts, tweets) of the users of the Twitter social
media platform, based on the main trends (by keyword, which is mostly the 'covid' and
coronavirus theme in this article), with Natural Language Processing and with Sentiment
Classification using a Recurrent Neural Network. We analyse, compile and visualize
statistics, and summarize them for further processing. The trained model works much more
accurately, with a smaller margin of error, in determining emotional polarity in today's
'modern', often ambiguous tweets, especially with an RNN. We use these freshly scraped
data collections (by the keyword's theme) with the RNN model that we have created and
trained to determine what emotional manifestations occurred on a given topic in a given
time interval.
ARTICLE HISTORY
Received 20 May 2020
Accepted 30 June 2020
KEYWORDS
natural language processing; recurrent neural network; sentiment analysis; social media;
visualization
1. Introduction
The main goal is to train a model for sentiment prediction by looking at correlations
between words, and to tag text with a positive or negative sentiment.
In today's world, social media platforms like Twitter are of immense importance to
people's everyday lives. We definitely have to deal with the manifestations on these
platforms, and as machine learning and natural language processing (NLP) become more and
more popular and important, we have to analyse and research the emotions expressed on
these platforms.
There are many ways to approach the topic, from 'pure' dictionary-based analysis to more
'serious' deep learning with neural networks. By building learning algorithms and
classifiers, we strive to label the relevant tweets with the appropriate emotional polarity.
As we mentioned at the beginning of the introduction, the main objective of this article
is to develop a model for predicting emotions by focusing on the relationship between
words, and thus to label specific entries. As opposed to the usual 'positive' and
'negative' decomposition, we use a much wider scale for more accurate forecasting. The
focus is not on a larger dataset: instead, the properly trained model analyses a newly
mined dataset that matches the current trend (coronavirus themes now) and a chosen
dataset size (the number of scraped tweets), which narrows a larger amount of data down
to a narrower topic. In this way, we do not only indicate whether the data is positive or
negative, we also provide a more detailed breakdown of the emotional levels. This can
provide more accurate data than analysing larger datasets, as fresh mining is always
available, so the final result can be obtained much faster and more accurately than with
earlier larger samples and other polls.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
CONTACT Attila Kiss kiss@inf.elte.hu
We also compare our model with other third-party options to see how small details play a
very important role in proper categorization when a properly trained Recurrent Neural
Network model is applied to different messages. Thus, as mentioned earlier, we focus on
specific topics, analyse a given number of messages (tweets), and look for the particular
emotional outcomes related to the topic. We expect a more accurate and detailed emotional
analysis and categorization for a current topic, which can provide a more stable and
accurate basis for various sociological and other studies. It also provides a different
approach to research on the pandemic, focusing on rapidly changing human mood and
opinion, such as the changes and manifestations of human moods on social media (Twitter)
in a given period of the coronavirus.
The model was built and trained using the libraries and capabilities provided by
TensorFlow, and the analysis is carried out with a Recurrent Neural Network (RNN). The
rest of this article contains sections on the structure and use of the encoder, the model
and the results.
2. Related works
Balahur (2013) performs an emotional analysis of Twitter datasets using unigrams and
bigrams (n-grams) and supervised learning with simple Support Vector Machines. Based on
the results, we can conclude that, first, the best features for emotional analysis are
unigrams and bigrams together. Second, generalizations using unique tags, emotive words
and modifiers strongly improve the performance of emotion classification (joy, happiness,
sadness, fear, etc.). In another article, Jianqiang and Xiaolin (2018) introduce a word
embedding method based on unsupervised learning on large Twitter corpora; the method uses
hidden contextual semantic relationships and co-occurrence statistics between tweets and
words. These word embeddings are combined with n-gram features and word sentiment
polarity score features to form a set of emotional features for tweets. This feature set
is integrated into a deep convolutional neural network.
The method described by Ortis et al. (2018) uses text extracted from the description of
different images instead of classic user entries. It then defines a multimodal embedding
space based on the text properties. The emotional examination is performed by a
supervised Support Vector Machine.
Leskovec (2011) explores techniques for modelling, analysing, and optimizing social
media. The study first shows how to collect large amounts of social media data. It then
discusses methods for obtaining and tracking information and how to build forecasting
models for information dissemination and inclusion. Finally, it discusses methods for
monitoring the flow of emotions across the network and the development of polarization.
Mikolov et al. (2010) use a Recurrent Neural Network, which is intentionally run multiple
times, for statistical language modelling, where the goal is to predict the next word in
textual data given its context. The experiments show a significant reduction of the word
error rate. In addition, Mikolov et al. (2011) show that the recurrent neural network
language model (RNN LM) significantly outperforms many competitive language modelling
techniques, and they present approaches that result in more than 15-fold acceleration in
both the training and testing phases. Finally, they discuss options for reducing the
parameters of the models. The resulting RNN model is thus smaller, faster in both
training and testing, and may be more accurate than the baseline. Another article
presents SummaRuNNer (Nallapati et al., 2017), a Recurrent Neural Network (RNN) based,
interpretable neural sequence model proposed for the extractive summarization of
documents. It performs better than, or comparably to, state-of-the-art deep learning models.
Following this, Liu et al. (2016) introduce learning several related tasks together using
a multi-task learning framework. Based on the recurrent neural network, three different
information-sharing mechanisms are proposed to model text with task-specific and shared
layers. Text classification experiments show that the proposed models can improve the
performance of a task using other related tasks.
In another work, Arras et al. (2017) presented a simple and effective strategy for
extending the Layer-wise Relevance Propagation (LRP) procedure to recurrent architectures
such as LSTMs, by proposing a rule for propagating relevance through multiplicative
interactions. The extended LRP version was applied to a bidirectional LSTM model for the
emotional prediction of sentences, to see whether the relevance of the resulting words is
reliable and what speaks for or against the classifier's decision for a particular class.
It performs better than gradient-based decomposition.
From a different perspective, Muhammad et al. (2016) present SmartSA, a lexicon-based
sentiment classification system for social media genres, which integrates strategies for
capturing context in two different ways: the interaction of terms with their local
context and with their global context. They also present a hybridization method for a
general-purpose lexicon, SentiWordNet, with genre-specific vocabulary.
Neri et al. (2012) describe an emotional analysis study of more than 1000 Facebook posts
based on the news summaries of Rai, the Italian public broadcasting service, versus the
emerging and more dynamic private company La7. The study maps its results against
observations made by the Osservatorio di Pavia, an Italian research institute
specializing in theoretical media analysis.
Along with the growth of web content, there is an increasing amount of hate speech on
various platforms. Schmidt and Wiegand (2017) discuss natural language processing methods
that can provide a suitable filtering tool for it. It is shown that character-level
approaches work better than token-level approaches, and that lexical resources, such as a
list of slurs, can help, but usually only in combination with other features.
Additionally, Pandey et al. (2017) introduce a new metaheuristic method (CSK) based on
K-means and cuckoo search. The method provides a new way to find optimal cluster heads
based on the sentimental content of the Twitter dataset.
Wang and Li (2015) extend the significant advances in text-based emotional prediction
tasks to the higher-level task of predicting the emotion behind images. They show that
neither visual nor textual features alone are sufficient for accurate emotional tagging.
Experiments with two large datasets show that the proposed method significantly improves
on the existing state-of-the-art methods.
Finally, Xu et al. (2019) propose a new Hierarchical Deep Fusion (HDF) model for
exploring the cross-modal relationships between images, text, and their social
relationships, which, with their complementary features, make emotional analysis more
effective. Visual content is combined with various semantic fragments of textual content
using a three-level hierarchical LSTM (H-LSTM) to learn the inter-modal correlation of
image and text at different levels.
3. DataSet/DataFrame building for the analysis
3.1. Existing dataset usage
Of course, we also have the option to use data from external sources that was previously
built from tweets on specific topics (possibly a huge mixed tweet collection or some more
specific collection), but in this case we have to keep in mind that these data may not be
up to date. Such a dataset is a previously compiled collection, and there are several
sources where such datasets can be accessed and downloaded.
In principle, this would not be a problem, but under the circumstances we try to rely on
the most up-to-date data for the test dataset. However, an existing dataset may be
suitable for comparing to what extent the writing trend of a given circle influences the
outcome of the analysis.
3.2. Build dataSet using Twitter API for scraping
Using the Twitter developer tools, we build a test dataset with a scraping script, which
compiles our data collection from tweets into a topic-based dataset for the given keyword
and number of tweets to scrape. Before it is used for our analysis, this data can also be
submitted to a completely different, non-RNN-based test, as the dataset construction also
supports a completely different, traditional analysis, for example Excel-based processing
(not deep learning).
Of the methods that perform the scraping and cleaning, the main one is the
'datasetbuilding' method. Its parameters are a keyword for the current scrape, a tweet
count limit (how many tweets we need on this theme), the date interval (the time period
from which we would like to extract data on the topic), and the language, for which we
used English in all cases. For the scraping, we used the tweepy library for the Twitter
API. In addition, we perform extra cleaning with the 'cleantweet' method (Listing 1).
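Since the listing itself is not reproduced in the text, the following is only a minimal sketch of what such a cleaning and dataset-building step might look like. The function names `clean_tweet_sketch` and `build_rows`, the regular expressions and the row layout are all assumptions made for illustration, not the authors' code. The raw tweets are passed in as plain strings, so the sketch does not call the Twitter API.

```python
import re

# Hypothetical patterns; the paper does not specify its cleaning rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+:?")
RETWEET_RE = re.compile(r"^RT\s+")


def clean_tweet_sketch(text: str) -> str:
    """Strip retweet markers, links and mentions; keep hashtag words."""
    text = RETWEET_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = text.replace("#", "")          # "#covid" -> "covid"
    return re.sub(r"\s+", " ", text).strip()


def build_rows(raw_tweets, keyword: str, limit: int):
    """Collect up to `limit` cleaned, non-empty tweets for one keyword."""
    rows = []
    for raw in raw_tweets:
        if len(rows) >= limit:
            break
        cleaned = clean_tweet_sketch(raw)
        if cleaned:
            rows.append({"keyword": keyword, "text": cleaned})
    return rows
```

In a real run, `raw_tweets` would come from the scraper for the chosen keyword, date interval and language; the resulting rows could then be loaded into a DataFrame or exported for spreadsheet processing.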
However, we would like to use the Recurrent Neural Network that we have built, and to
apply our already trained model to the test dataset that we freshly scrape and mine. The
scraping script mentioned above makes this possible, because the dataset has undergone
proper formatting and cleaning.
Overall, after compiling the dataset itself, we have the opportunity to use this data in
a completely different, traditional (Excel) analysis as well. The script supports these
systems and structures in an orderly, uninterrupted manner and also runs the analysis.
The analysis will focus primarily on one specific topic, the coronavirus. Figure 1 shows
that there are many other possibilities and methods for analysing this dataset.
4. Different ways of sentiment analysis
As we mentioned, there are several different approaches to natural language processing
and sentiment analysis. They can be separated into two categories: first, the classic
dictionary style, which is not the most modern approach, and second, the deep learning
possibilities. The one we use in this analysis is the Recurrent Neural Network.
In classical dictionary-based analysis, we have a pre-set vocabulary where each word has
a value that expresses whether the effect of the word is positive or rather negative.
Accordingly, the sentences are decomposed so that each word is identified, and then,
according to our dictionary, we assign to each word the value of its effect. In the most
general case, the sum of these values gives the emotional value of the sentence. Of
course, we can run into a lot of problems here, as negations, double negations, turns of
phrase and word combinations that affect emotion cannot be detected. This is why the
field has shifted towards deep learning, using properly trained models.
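The weakness described above can be made concrete with a small sketch. The lexicon below and the function name `lexicon_score` are invented for illustration and do not come from any real sentiment dictionary; the point is only that a plain sum of word values ignores negation.

```python
import re

# Toy lexicon with made-up values (an assumption, not a real resource).
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}


def lexicon_score(sentence: str) -> float:
    """Sum the dictionary values of the words; unknown words count as 0."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(LEXICON.get(word, 0.0) for word in words)


# Both sentences receive the same score, because "not" has no entry
# and word order plays no role in a plain sum.
plain = lexicon_score("The news is good")
negated = lexicon_score("The news is not good")
```

Here `plain` and `negated` are equal, although the second sentence expresses the opposite sentiment. A sequence model can, in principle, learn that "not" changes the meaning of the word that follows it.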
Figure 1. Part of the freshly mined DataSet.
Listing 1. Part of the Twitter dataset builder.
4.1. Deep learning RNN
We build and use a Recurrent Neural Network (RNN).
What is a Recurrent Neural Network (RNN)?¹ A neural network that is intentionally run
multiple times, where parts of each run feed into the next run. Specifically, hidden
layers from the previous run provide part of the input to the same hidden layer in the
next run. Recurrent neural networks are particularly useful for evaluating sequences, so
that the hidden layers can learn from previous runs of the neural network on earlier
parts of the sequence.
For example, consider a recurrent neural network that runs four times. Notice that the
values learned in the hidden layers from the first run become part of the input to the
same hidden layers in the second run. Similarly, the values learned in the hidden layer
on the second run become part of the input to the same hidden layer in the third run. In
this way, the recurrent neural network gradually trains and predicts the meaning of the
entire sequence rather than just the meaning of individual words.
The advantages of the RNN are that it can process inputs of any length, the size of the
model does not increase with the size of the input, the calculation takes historical
information into account, and the weights are shared across time. Of course, it should be
noted that the main general counter-argument is that the calculation is slow.
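The recurrence described in this section can be illustrated with a deliberately tiny sketch: a single hidden unit with scalar weights and a tanh activation. The function name `rnn_forward` and the weight values are assumptions chosen for illustration; a real model uses weight matrices and learns them during training.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run one scalar recurrent unit over a sequence.

    The same three parameters (w_x, w_h, b) are reused at every step,
    and each hidden state depends on the previous one.
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# The parameter count stays at three whether the sequence has 2 or 200 items.
short_run = rnn_forward([1.0, 2.0], w_x=0.5, w_h=0.5, b=0.0)
long_run = rnn_forward([1.0, 2.0, 3.0, 4.0], w_x=0.5, w_h=0.5, b=0.0)
```

The first two states of `long_run` equal those of `short_run`, since each state only depends on the inputs seen so far. The loop is also inherently sequential, which is one reason the calculation is slow.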
4.2. RNN model build and train
The tools provided by Keras and TensorFlow were used to build the model. We created a
Sequential model by passing a list of layer instances to the constructor. The first layer
is the Embedding layer, which can be used for neural networks on text data. It requires
that the input data be integer encoded, so that each word is represented by a unique
integer. The embedding layer is initialized with random weights and learns an embedding
for all of the words in the training dataset. Then we used the Bidirectional wrapper for
RNNs. Next are the Dense and Dropout layers. A dense layer is a classic fully connected
neural network layer: each input node is connected to each output node. A dropout layer
is similar, except that when the layer is used, the activations are set to zero for some
random nodes. This is a way to prevent overfitting.
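A layer stack of this kind might be sketched as follows. The layer sizes, the LSTM cell inside the Bidirectional wrapper, the dropout rate, the sigmoid output and the helper names `integer_encode` and `build_model` are assumptions made for illustration; the text only names the layer types. TensorFlow is imported inside `build_model`, so the encoding helper can be tried without it.

```python
def integer_encode(texts):
    """Map each distinct word to a unique integer (0 is kept for padding)."""
    vocab = {}
    encoded = []
    for text in texts:
        ids = []
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
            ids.append(vocab[word])
        encoded.append(ids)
    return encoded, vocab


def build_model(vocab_size, embedding_dim=64, rnn_units=64, dropout_rate=0.5):
    """Embedding -> Bidirectional RNN -> Dense -> Dropout -> Dense(1)."""
    import tensorflow as tf  # requires TensorFlow to be installed

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(rnn_units)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(dropout_rate),
        # A single sigmoid unit yields a score between 0 and 1.
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model


encoded, vocab = integer_encode(["stay safe", "stay home"])
```

With this toy encoder, `vocab_size` would be `len(vocab) + 1` to account for the padding index. In practice a subword encoder shipped with the training dataset may be used instead.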
We also save our trained models in .h5 format together with the actual training date, so
that we can reuse them if needed. We also have the possibility to load these trained
models and use them on the newly scraped data. (There is a separate menu option to use a
retrained model or a previous model, where we use the name of the .h5 file to refer to it.)
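Saving and reloading by date could look roughly like the sketch below. The file name pattern and the helper names `model_filename`, `save_model` and `load_saved_model` are hypothetical; only the .h5 format and the use of the training date come from the text. `model.save` and `tf.keras.models.load_model` are the standard Keras calls for this.

```python
from datetime import date


def model_filename(prefix="rnn_sentiment", day=None):
    """Build a file name such as 'rnn_sentiment_2020-05-14.h5'."""
    day = day or date.today()
    return f"{prefix}_{day.isoformat()}.h5"


def save_model(model, day=None):
    """Save a trained Keras model; the .h5 suffix selects the HDF5 format."""
    path = model_filename(day=day)
    model.save(path)
    return path


def load_saved_model(path):
    """Load a previously saved model by the name of its .h5 file."""
    import tensorflow as tf  # requires TensorFlow to be installed

    return tf.keras.models.load_model(path)
```

Keeping the date in the file name makes it straightforward to choose between a freshly retrained model and an earlier one.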
There are numerous ways to use train and test datasets before the trained model is used
on a real dataset. TensorFlow gives us numerous datasets, for example
'imdb_reviews/subwords8k' and 'civil_comments'. We can split such a dataset into train
and test datasets and use them for the 'compile' and 'fit' model calls, and of course we
can use our own datasets for this train and test phase as well. In the case of models
trained on external datasets, we can talk about continuous learning, since the model is
trained on another dataset and we use the result for our own actual datasets.
For display, we use the matplotlib.pyplot package: our model walks through the given
dataset and uses the predict method. Accordingly, we categorize how positive or negative
the emotional value of each tweet or sentence is, and visualize the results with a
colourful plot.
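The step from per-tweet labels to a chart can be sketched as below. The function names `category_shares` and `plot_shares` and the choice of a pie chart are assumptions, since the text only says that the results are visualized with a colourful plot. matplotlib is imported inside the plotting function, so the share calculation can be run without it.

```python
from collections import Counter


def category_shares(labels):
    """Return the percentage of tweets per category, rounded to 2 decimals."""
    total = len(labels)
    if total == 0:
        return {}
    counts = Counter(labels)
    return {label: round(100.0 * n / total, 2) for label, n in counts.items()}


def plot_shares(shares, title):
    """Draw the category shares as a pie chart (chart type is an assumption)."""
    import matplotlib.pyplot as plt  # requires matplotlib to be installed

    fig, ax = plt.subplots()
    ax.pie(list(shares.values()), labels=list(shares.keys()),
           autopct="%1.2f%%")
    ax.set_title(title)
    plt.show()


example = category_shares(
    ["positive", "positive", "negative", "weakly positive"]
)
```

Percentages of this kind are what the result figures in the following sections report for each category.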
4.3. RNN analysis themes and results
To examine and compare the model, we use the coronavirus topic and different numbers of
fresh data items that we mine. This is the most prominent and up-to-date topic of our
time, and the recent data mining results have a lot of potential, as there is no
pre-compiled dataset here and rapid changes can occur. In addition, comparisons are made
with several third-party applications, and we also compare with traditional, classical
analysis to see what differences and conclusions can be drawn about efficiency, accuracy,
and speed in different cases.
We expect that the model we have developed and trained in detail can provide more
accurate results for today's online communication forms, difficult sentences with
multiple meanings and unique topics than a traditional or third-party application, which
also works accurately but with larger error ranges than our more carefully prepared model.
4.3.1. Comparison with old-fashioned research work
Traditional polling or purely human work (tracking, data collection, analysis) is
time-consuming. The result would be very accurate, but by the time the report is
completed, the conclusion may be outdated and the result no longer relevant. Thus,
compared with any process based on human labour, data mining and scraping can be a huge
step forward for producing a test dataset. In this way, the process takes less time, and
we are also able to use a number of other third-party tools to speed up our processes.
In essence, there are very large differences between traditionally supported analyses and
analyses supported by scraping and other dataset compilation options, and the difference
is found in time and accuracy. In the traditional research process, people do the
analysis, so the accuracy of the tweet polarity would be really good, but it cannot cover
such a large sample in such a short time. The results of automated analysis may therefore
be much more relevant, despite a perhaps larger error factor, as up-to-date and fast
results as well as partial results can be obtained. This is all the more true of
specially trained neural networks, whose results and speed cannot be matched by the speed
of human work.
4.3.2. Comparison with third-party sentiment analysers such as TextBlob
As mentioned earlier, for the analyses we use the coronavirus theme, which dominates
social media platforms.
The pre-measurement expectations are as follows. We would like more accurate results with
fewer or even zero neutral expressions, as neutral data would greatly distort the real
picture, so we would like to minimize the neutral category as much as possible. Based on
small details and twisted manifestations, we expect some cases to move from the neutral
space in the negative or positive direction. For both TextBlob and the RNN, we apply the
same categorical distribution to the different levels of feeling.
Thus, we scraped a different number of tweets in each analysis and compared the results
on the test datasets. (We use the freshly scraped tweet dataset on both the trained RNN
model and TextBlob.)
The large-scale presence of the given topic on Twitter was already visible in the first
rounds, and it greatly influences the results. The first difference between the models
came out on the smaller samples of 10 and 20 tweets. Using the functions of TextBlob, you
can see how many different and cluttering tweets direct the end result of the analysis
towards the neutral category, and we often get a smaller but positive end result, which
of course was also the case with our own model (with a smaller or zero neutral segment
and a better distributed area).
Looking at the test datasets currently analysed against the background of this
phenomenon, it was noticeable that the age group that is currently most active on Twitter
is young or younger. Thus, school closures appear as a positive phenomenon in smaller
samples and move the end result in a slightly positive and neutral direction. In
addition, hospital donations also move the end result in a positive direction. Alongside
the negative tweets about deaths, tweets about these donations and cohesion are much more
present even in small samples; of course, the current scrape has a great influence here
through the data it collects. In addition, the factuality of newscasts also reinforces
the neutral, weakly positive or weakly negative slices. In most cases, a simple statement
cannot be emotionally shifted in any direction.
Other third-party models will not be discussed in detail, as an analyser based on a
simple dictionary already gave completely misleading results on tweets that reported
positive or negative outcomes of the disease. As Figure 2 shows, TextBlob and our own
well-trained model were able to filter out these turns of phrase and manifestations
really accurately. (The RNN may look more significant, but we cannot prove this 100% at
this point; however, the RNN has no neutral section most of the time, which improves the
analysis.) The amount of test data will be the main influencing factor.
Note: The RNN model was trained on an IMDb review dataset (using the shuffle method on
the train and test dataset sections as well). We then use the freshly scraped dataset as
the test dataset with this trained model.
In the figures (for this run, the keyword was 'covid'), we can see that the RNN managed
to categorize all tweets without giving a neutral result, so we conclude that the model
was 'better' defined in the smaller details and categorized the tweets based on them. Our
model stands out in the strongly positive and the weakly negative sections, which is a
good indicator of the division of the topic and the abundance of interactions on it. Of
course, it can be noticed that on social media platforms positive manifestations continue
to dominate, which is also driven by partial results, but it is also realistic that there
are negative voices and different perspectives. TextBlob also deviates in the positive
direction, like our model, so both results tipped in the same direction, but a larger
neutral value can also be noticed in this case in addition to the negative
manifestations. Overall, the categorization of both models can be realistic; the
difference is to be found primarily in the detail handling of the models, which our model
hopefully handled better even with so little test data. Figure 2 was produced from this
DataSet (Figure 3).
We continue to compare TextBlob and our own RNN model to see how they perform on larger
and larger test datasets, and how accurately, with fewer erroneous results, they handle
double negations and other 'slang' and general manifestations and reports.
Between 24 April 2020 and 25 April 2020, on the sample of 50, we can see an increasing
distance towards the two extremes. In the case of the RNN model (Figure 4(b)), again,
tweets did not fall into the neutral category; they were subdivided into weakly negative
and weakly positive parts, as opposed to TextBlob (Figure 4(a)), where there is a more
significant neutral unit. In addition, there is a kind of progress towards extremism,
from which it can be concluded that people are already starting to get 'bored' of this
whole topic, the daily numbers and the situation itself. Of course, the high divisions
can be inferred from the different policy reactions and announcements and the tweets that
respond to them, which express either a sympathetic or a dissenting opinion about the
situation. (Looking at the dataset, we can see a strong wave of manifestations about the
decisions and political influence of the WHO, which probably amplified the positive and
negative opinions.)
Increasing the dataset to 200 tweets but still using the keyword 'covid', we can see that
the division is still similar. A kind of increase in the positive direction can be
detected, but in this case it can be explained by the increase in the number of tweets
for both models. (There is a difference in the strength of positivity between the two
results.) The increased number of tweets shows that mostly positive expression, support
and 'hope', that is, a greater extent of positivity, is still highly present in social
media, which was expected, but negative messages are also present in significant amounts
on the subject (Figure 5).
Figure 2. Analysis of a sample of 20 tweets by TextBlob and RNN, using the 'covid' keyword. (a) TextBlob result, (b) RNN result.
Figure 3. 15 April 2020 DataSet.
Using the keyword 'coronavirus' and a much larger dataset, the result is very similar to
the trends so far. There is a smaller increase in both the positive and negative
directions, and we can see only smaller movements in the strength of positivity or
negativity (Figure 6).
Figure 4. Analysis of a sample of 50 tweets by TextBlob and RNN, using the 'covid' keyword. (a) TextBlob result, (b) RNN result.
Figure 5. Analysis of a sample of 200 tweets by TextBlob and RNN, using the 'covid' keyword. The time period is between 24 April 2020 and 25 April 2020. (a) TextBlob result, (b) RNN result.
Our model did not place any tweet in the neutral section, which makes it easier to see
differences of opinion. It should also be mentioned that our model scores tweets between
0 and 1, while TextBlob scores them between -1 and 1. The categories are defined
accordingly, so that a few small details of a tweet are able to move it into another
category. Because of these small details, we can say that, by avoiding neutrality, we can
perhaps get a more comprehensive picture (Figure 7).
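One way to put the two score ranges on a common footing is sketched below. The seven category names follow the wording used in the text, but the thresholds, the width of the neutral band and the function names `rescale_unit_score` and `categorize_polarity` are assumptions; the article does not state its cut-off values.

```python
def rescale_unit_score(score: float) -> float:
    """Map a score from the range [0, 1] onto the range [-1, 1]."""
    return 2.0 * score - 1.0


def categorize_polarity(polarity: float, neutral_band: float = 0.05) -> str:
    """Assign one of seven categories to a polarity in [-1, 1].

    The cut-offs 0.3 and 0.6 and the neutral band are illustrative only.
    """
    magnitude = abs(polarity)
    if magnitude <= neutral_band:
        return "neutral"
    side = "positive" if polarity > 0 else "negative"
    if magnitude < 0.3:
        return f"weakly {side}"
    if magnitude < 0.6:
        return side
    return f"strongly {side}"


# A library score in [-1, 1] (for example TextBlob's sentiment.polarity)
# can be passed in directly; a model output in [0, 1] is rescaled first.
library_label = categorize_polarity(0.04)
model_label = categorize_polarity(rescale_unit_score(0.52), neutral_band=0.0)
```

With a neutral band of zero, only a score of exactly 0.5 would be labelled neutral, which mirrors the observation that small decimal differences decide between weakly positive and weakly negative.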
People are divided on this topic, just like in most other cases nowadays. They argue
about different topics, try to draw conclusions, and try to convince others of their
position. In most cases, strong negativity or positivity is small, meaning that most
people are not completely biased towards one side, but there is a visible percentage who
are totally biased, which is normal today. For smaller DataSets, these extremes are
somewhat more pronounced, which is also normal.
Examining the period between 13 May 2020 and 14 May 2020, again using the keyword
'covid', we obtained a result (with 200 tweets) very similar to the previous month's 200
tweets. Overall, movement can be observed in the categories delimiting strength (weakly,
strongly) within the positive and negative sides. So, it can be concluded that the RNN
model (Figure 8(b)) continues to deliver significant results with small changes over time
and still without categorizing anything into the neutral section. Looking at the program
logs, we saw that in some cases decimal values decided that a particular text was weakly
positive or weakly negative and not neutral. The weakly positive share in this case is
7.50% and the weakly negative share is 10.50%. Based on the results of the RNN model, it
can be said that positivity is still more present in social media in the case of
pandemic-related manifestations. Based on the result of TextBlob (Figure 8(a)), we see a
similar result in the positive direction, but with a significant 30% of neutral data, and
the weakly positive section is 36.50% against the RNN's 7.50%.
Figure 6. Analysis of a sample of 500 tweets by TextBlob and RNN, using the 'coronavirus' keyword. The time period is between 24 April 2020 and 25 April 2020. (a) TextBlob result, (b) RNN result.
Figure 7. 24 April 2020 Part of the dataSet of the 'coronavirus' keyword.
Overall, the RNN chart provides (for us) a much more realistic and thorough picture of
current emotional levels, with minimal or even zero neutral results.
If we increase the number of tweets to 500 in the same time period, the RNN model
shows a strengthening in the negative section (the plain negative category, not counted
together with the strongly and weakly negative ones), which can also be said for the result
of TextBlob.
In our TextBlob (a) (Figure 9) analysis, we again see a neutral value of 29%, in addition
to a weakly negative value of 17.80%. For the RNN model (b) (Figure 9), the neutral
result is again 0% and only 8.60% is weakly negative. Overall, comparing the categorical
values of the two analyses, a shift towards the positive can again be observed, but this end
result is divided in a completely different way in the two models. In the case of the RNN, a
positive value of 24.80% can be observed alongside a negative value of 22.40%, which is a
balanced division in which positive manifestations are slightly more frequent in the sample
of 500 tweets. In contrast, in the TextBlob analysis the weakly positive value of 35.20%
dominates, while the positive value is 11% and the negative value is 4%.
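The percentages reported for each category are simple shares of the sample, and the two models can be compared category by category in percentage points. The following is a minimal sketch, assuming every tweet already carries one of seven category labels. The names `CATEGORIES`, `category_distribution` and `percentage_point_gap` are hypothetical helpers introduced here for illustration.

```python
from collections import Counter

# Assumed label set: three strength levels per side plus neutral.
CATEGORIES = [
    "strongly positive", "positive", "weakly positive",
    "neutral",
    "weakly negative", "negative", "strongly negative",
]


def category_distribution(labels):
    """Return the percentage share of each category in a list of labels."""
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    total = len(labels)
    return {c: round(100.0 * counts[c] / total, 2) for c in CATEGORIES}


def percentage_point_gap(dist_a, dist_b):
    """Return dist_a minus dist_b for every category, in percentage points."""
    return {c: round(dist_a[c] - dist_b[c], 2) for c in CATEGORIES}
```

For a sample of 500 tweets, a share of 8.60% corresponds to 43 tweets, and a share of 29% corresponds to 145 tweets.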
The reactions to and evaluations of various political announcements and decisions
provoke significant activity, after the announcement, from people who argue and talk about
the effects on social media, thus drastically increasing the number of tweets related to
the topic. A similar reaction has been seen for various international events on this
subject, especially after the details have been described. (There is a visible shift in the
positive and negative directions, sometimes from the neutral, but there are also some changes
in the strength distribution of positivity and negativity itself.)
Figure 8. Analysis of a sample of 200 tweets by TextBlob and RNN, using the 'covid' keyword. The time
period is between 13 May 2020 and 14 May 2020. (a) TextBlob result, (b) RNN result.
How long the 'covid' and 'coronavirus' topics will remain dominant on the entire Internet,
no one knows. Even if a vaccine becomes available, the topic is still expected to stay with
us for a significant period of time, and it will continue to dominate the various community
platforms with its subsequent effects (Figure 10).
Figure 9. Analysis of a sample of 500 tweets by TextBlob and RNN, using the 'covid' keyword. The time
period is between 13 May 2020 and 14 May 2020. (a) TextBlob result, (b) RNN result.
Figure 10. 13 May 2020. Part of the dataset for the 'covid' keyword.
5. Conclusion and future work
In this work, we use a Recurrent Neural Network (RNN) to classify emotions in tweets. We
developed a model to analyse the emotional nature of various tweets, using the recurrent
neural network for emotional prediction, searching for connections between words, and
marking them with positive or negative emotions. Instead of simple positive and
negative extremes, we classified the various texts into a much more finely graded set of
emotional strength classes (weakly positive/negative, strongly positive/negative). This
has been combined with a special keyword-based data scraper, so that we can apply our
trained RNN model to these specific, freshly scraped datasets. As a result, we get an
emotional classification related to specific topics: what kind of tweets they were, what
emotional class they belong to, and what the distribution on that topic is at the emotional
level within the given time interval. In the article, we focused mostly on the coronavirus and
the related emotional changes and fluctuations, and it was shown that an overall positive
manifestation remained present on social media during this pandemic, of course alongside
negative and other manifestations. Over time, positivity has strengthened, but there is also
a stronger negative section, which is natural. In line with our expectations, positive
manifestations remained on this topic, sometimes with a higher and sometimes with a smaller
percentage. It can be seen that the recurrent neural network provides good performance and
prediction in text classification. The RNN model produced a smaller share of neutral results,
or reduced it to zero entirely, which shows that our model is able to make a decision and
categorize in some direction even on the basis of small details. Our comparisons were made
mainly against TextBlob, which also worked very well and delivered stable results, but
there were many times when its neutral results were above 30%, unlike with our RNN model,
and such results are not as useful for further evaluations as those of our RNN model.
The classification of emotions for both models (TextBlob, RNN) was properly segmented.
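The combination of a keyword-based scraper with a classifier can be sketched as a small pipeline: select tweets by keyword and time interval, classify each one, and report the distribution. This is a minimal sketch under stated assumptions, not the implementation used in the study. The `Tweet` type, `select_tweets`, `summarize` and the keyword-lookup `toy_classifier` (a stand-in for a trained model) are all hypothetical, and the interval is treated as half-open.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Tweet:
    text: str
    created_at: datetime


def select_tweets(tweets: Iterable[Tweet], keyword: str,
                  start: datetime, end: datetime, limit: int) -> List[Tweet]:
    """Keep at most `limit` tweets mentioning `keyword` within [start, end)."""
    needle = keyword.lower()
    selected = [t for t in tweets
                if needle in t.text.lower() and start <= t.created_at < end]
    return selected[:limit]


def summarize(tweets: Iterable[Tweet],
              classify: Callable[[str], str]) -> Dict[str, float]:
    """Classify every tweet and return the percentage share of each label."""
    counts = Counter(classify(t.text) for t in tweets)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100.0 * n / total, 2) for label, n in counts.items()}


def toy_classifier(text: str) -> str:
    """Hypothetical stand-in for a trained model: a two-word lookup."""
    lowered = text.lower()
    if "good" in lowered:
        return "positive"
    if "bad" in lowered:
        return "negative"
    return "neutral"
```

Passing the classifier as a parameter keeps the selection and aggregation steps independent of the model, so the same summary can be produced for two different classifiers on an identical sample.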
For future work and further development, it may be advisable to create an interface that
offers better visualization and user interaction, and which can be supplemented with
sophisticated database management for archiving, tracking, and exploring data, and for
extending it to other areas. We can further expand the analysis by introducing various
classifications and clusters as well as other data analyses. Allowing examinations and
comparisons from a new perspective, in addition to emotional analyses, may even provide an
opportunity to further support current results and compare the conclusions. In addition,
future work includes adopting or refactoring for upcoming TensorFlow features and keeping
the implementation up to date.
Note
1. https://developers.google.com/machine-learning/glossary/#recurrent_neural_network
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
The project has been supported by the European Union, co-financed by the European Social Fund
[grant number EFOP-3.6.3-VEKOP-16-2017-00002].
Notes on contributors
László Nemes received the B.Sc. degree in computer science from Eötvös Loránd University in 2020
and is currently pursuing an M.Sc. degree. He is a Demonstrator with the Department of Media and
Educational Technology, Eötvös Loránd University.
Attila Kiss was born in 1960. In 1985 he graduated (MSc) as a mathematician from Eötvös Loránd
University in Budapest, Hungary. He defended his PhD in the field of database theory in 1991. Since
2010 he has been working as the head of the Information Systems Department at Eötvös Loránd
University. His scientific research focuses on database theory and practice, security, semantic web,
big data, data mining, artificial intelligence and bioinformatics. He was the supervisor of seven PhD
students. He has more than 145 scientific publications.
ORCID
László Nemes https://orcid.org/0000-0001-6167-9369
Attila Kiss https://orcid.org/0000-0001-8174-6194
References
Arras, L., Montavon, G., Müller, K. R.., & Samek, W.2017). Explaining recurrent neural network predictions
in sentiment analysis. Preprint arXiv:1706.07206.
Balahur, A. (2013). Sentiment analysis in social media texts. In Proceedings of the 4th workshop on
computational approaches to subjectivity, sentiment and social media analysis (pp. 120128).
Jianqiang, Z., Xiaolin, G., & Xuejun, Z. (2018). Deep convolution neural networks for twitter sentiment
analysis. IEEE Access,6, 2325323260. https://doi.org/10.1109/ACCESS.2017.2776930
Leskovec, J. (2011). Social media analytics: Tracking, modeling and predicting the ow of information
through networks. In Proceedings of the 20th international conference companion on world wide
web (pp. 277278).
Liu, P., Qiu, X., & Huang, X. (2016). Recurrent neural network for text classication with multi-task learn-
ing. Preprint arXiv:1605.05101.
Mikolov, T., Karaát, M., Burget, L., Černock, J., & Khudanpur, S. (2010). Recurrent neural network
based language model. In Eleventh annual conference of the international speech communication
association.
Mikolov, T., Kombrink, S., Burget, L., Černock, J., & Khudanpur, S. (2011). Extensions of recurrent
neural network language model. In 2011 IEEE international conference on acoustics, speech and
signal processing (ICASSP) (pp. 55285531).
Muhammad, A., Wiratunga, N., & Lothian, R. (2016). Contextual sentiment analysis for social media
genres. Knowledge-Based Systems,108,92101. https://doi.org/10.1016/j.knosys.2016.05.032
Nallapati, R., Zhai, F., & Zhou, B. (2017). Summarunner: A recurrent neural network based sequence model
for extractive summarization of documents. In Thirty-rst AAAI conference on articial intelligence.
Neri, F., Aliprandi, C., Capeci, F., Cuadros, M., & By, T. (2012). Sentiment analysis on social media. In
2012 IEEE/ACM international conference on advances in social networks analysis and mining
(pp. 919926).
Ortis, A., Farinella, G. M., Torrisi, G., & Battiato, S. (2018). Visual sentiment analysis based on on objec-
tive text description of images. In 2018 international conference on content-based multimedia index-
ing (CBMI) (pp. 16).
Pandey, A. C., Rajpoot, D. S., & Saraswat, M. (2017). Twitter sentiment analysis using hybrid cuckoo
search method. Information Processing & Management 53(4), 764779. https://doi.org/10.1016/j.
ipm.2017.02.004
Schmidt, A., & Wiegand, M. (2017). A survey on hate speech detection using natural language pro-
cessing. In Proceedings of the fth international workshop on natural language processing for
social media (pp. 110).
Wang, Y., & Li, B. (2015). Sentiment analysis for social media images. In 2015 IEEE international con-
ference on data mining workshop (ICDMW) (pp. 15841591).
Xu, J., Huang, F., Zhang, X., Wang, S., Li, C., Li, Z., & He, Y. (2019). Sentiment analysis of social images
via hierarchical deep fusion of content and links. Applied Soft Computing,80, 387399. https://doi.
org/10.1016/j.asoc.2019.04.010
JOURNAL OF INFORMATION AND TELECOMMUNICATION 15