Journal of Information and Telecommunication
Journal homepage: https://www.tandfonline.com/loi/tjit20
Social media sentiment analysis based on
COVID-19
László Nemes & Attila Kiss
To cite this article: László Nemes & Attila Kiss (2020): Social media sentiment analysis based on
COVID-19, Journal of Information and Telecommunication, DOI: 10.1080/24751839.2020.1790793
To link to this article: https://doi.org/10.1080/24751839.2020.1790793
© 2020 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
Published online: 14 Jul 2020.
Social media sentiment analysis based on COVID-19
László Nemes and Attila Kiss
Department of Information Systems, ELTE Eötvös Loránd University, Budapest, Hungary
ABSTRACT
In today’s world, social media is everywhere, and everybody comes into contact with it every day. With social media data, we can carry out a great deal of analysis and statistics nowadays. Within the scope of this article, we collect and analyse the sentiments and manifestations (comments, hashtags, posts, tweets) of the users of the Twitter social media platform, based on the main trends (by keyword, which is mostly the ‘covid’ and coronavirus theme in this article), using Natural Language Processing and sentiment classification with a Recurrent Neural Network (RNN). We analyse, compile, visualize and summarize the statistics for further processing. The trained model determines the emotional polarity of today’s ‘modern’, often ambiguous tweets much more accurately and with a smaller margin of error, especially with the RNN. We use freshly scraped data collections (on the keyword’s theme) with the RNN model we created and trained to determine what emotional manifestations occurred on a given topic in a given time interval.
ARTICLE HISTORY
Received 20 May 2020
Accepted 30 June 2020
KEYWORDS
natural language processing; recurrent neural network; sentiment analysis; social media; visualization
1. Introduction
The main goal is to train a model for sentiment prediction by looking for correlations between words and tagging each text with a positive or negative sentiment.
In today’s world, social media platforms like Twitter are of immense importance to people’s everyday lives. We definitely have to deal with the manifestations on these platforms, and as machine learning and natural language processing (NLP) become more and more popular and important, we have to analyse and research the emotions expressed on these platforms.
There are many ways to approach a topic, from ‘pure’ dictionary-based analysis to ‘more serious’ deep learning with neural networks. By building learning algorithms and classifiers, we strive to label the relevant tweets with the appropriate emotional polarity.
As mentioned at the beginning of the introduction, the main objective of this article is to develop a model that predicts emotions by focusing on the relationships between words, thus labelling specific entries. In contrast to the usual ‘positive’ and ‘negative’ decomposition, we obtain a much wider scale for more accurate forecasting. Rather than relying on a large pre-existing dataset, the properly trained model analyses
CONTACT Attila Kiss kiss@inf.elte.hu
a newly mined dataset that matches the current trend (coronavirus themes at the time of writing) and a chosen dataset size (the number of scraped tweets), narrowing a large volume of data down to a specific topic. In this way, we do not merely label the data as positive or negative; we also provide a more detailed breakdown of the emotional levels. Because fresh mining is always available, this can yield much faster and more accurate final results than earlier, larger samples and other polls.
We also compare our model with third-party options to see how important small details are for the proper categorization of different messages when using a properly trained Recurrent Neural Network model. Thus, as mentioned earlier, we focus on specific topics, analysing a given number of messages (tweets) and observing the particular emotional outcomes related to the topic. We expect a more accurate and detailed categorization of the emotions related to a current topic, which can provide a more stable and accurate basis for various sociological and other studies. It also provides a different approach to research on the pandemic, focusing on rapidly changing human moods and opinions, such as their changes and manifestations on social media (Twitter) in a given period of the coronavirus.
The model was built and trained using the libraries and capabilities provided by TensorFlow, analysing with a Recurrent Neural Network (RNN). The rest of this article contains sections on the structure and use of the encoder, the model and the results.
2. Related works
Balahur (2013) performs sentiment analysis of Twitter datasets using unigram and bigram (n-gram) features and supervised learning with simple Support Vector Machines. Based on the results, we can conclude, on the one hand, that the best features for sentiment analysis are unigrams and bigrams together. Second, we can see that generalizations using unique tags, emotive words and modifiers strongly improve the performance of emotion rating (joy, happiness, sadness, fear, etc.). In another article, Jianqiang and Xiaolin (2018) introduce a word-embedding method based on unsupervised learning from large Twitter corpora; the method uses hidden contextual semantic relationships and co-occurrence statistics between tweets and words. These word embeddings are combined with n-gram features and word sentiment-polarity scores to form a set of tweet emotional features, which is integrated into a deep convolutional neural network.
The method described by Ortis et al. (2018) uses text extracted from the descriptions of different images instead of classic user entries, then defines a multimodal embedding space based on the text properties. The sentiment analysis is performed by a supervised Support Vector Machine.
The study by Leskovec (2011) explores techniques for modelling, analysing, and optimizing social media. First, it shows how to collect large amounts of social media data, then discusses methods for obtaining and tracking information and for building forecasting models of information dissemination and adoption. Finally, it discusses methods for monitoring the flow of emotions across the network and the development of polarization.
The Recurrent Neural Network of Mikolov et al. (2010) is intentionally run multiple times; the goal of statistical language modelling is to predict the next word in textual data given its context, and the experiments show a significant reduction in word error rate. In addition, Mikolov et al. (2011) show that the recurrent neural network language model (RNN LM) significantly outperforms many competitive language modelling techniques, and present approaches that yield more than 15-fold acceleration in both the training and testing phases. Finally, they discuss options for reducing the number of model parameters. The resulting RNN model is thus smaller, faster in both training and testing, and may be more accurate than the baseline. In another article, we can look at SummaRuNNer (Nallapati et al., 2017), an interpretable, Recurrent Neural Network (RNN) based sequence model proposed for extractive document summarization, which performs better than, or comparably to, state-of-the-art deep learning models.
Following this, Liu et al. (2016) introduce learning several related tasks together using a multitask learning framework. Based on the recurrent neural network, three different information-sharing mechanisms are proposed to model text with task-specific and shared layers. Experiments on text classification tasks show that the proposed models can improve a task by exploiting other related tasks.
In another work, Arras et al. (2017) presented a simple and effective strategy for extending the Layer-wise Relevance Propagation (LRP) procedure to recurrent architectures such as LSTMs, by proposing a rule for propagating relevance through multiplicative interactions. The extended LRP version was applied to a bidirectional LSTM model for the sentiment prediction of sentences, to see whether the relevance assigned to words is reliable, which words speak for or against a particular class in the classifier’s decision, and how this performs better than gradient-based decomposition.
From a different perspective, we can look at SmartSA, a lexicon-based sentiment classification system for social media genres by Muhammad et al. (2016), which integrates contextual information in two different ways: the interaction of terms with their local context and with their global context. They also present a method for hybridizing a general-purpose lexicon, SentiWordNet, with genre-specific vocabulary.
We can also highlight the sentiment analysis study by Neri et al. (2012), covering more than 1000 Facebook posts about news summaries from Rai, the Italian public broadcasting service, versus the emerging and more dynamic private company La7. The study maps its results onto observations made by the Osservatorio di Pavia, an Italian research institute specializing in theoretical media analysis.
Along with the growth of web content, there is an increasing amount of hate speech on various platforms, for which natural language processing provides a suitable filtering tool, as surveyed by Schmidt and Wiegand (2017). It is shown that character-level approaches work better than token-level approaches, and that lexical resources, such as a list of slurs, can help classification, but usually only in combination with other features.
Additionally, Pandey et al. (2017) introduce a new metaheuristic method (CSK) based on K-means and cuckoo search, which provides a new way to find optimal cluster heads based on the sentimental content of a Twitter dataset.
Wang and Li (2015) extend significant advances in text-based sentiment prediction to the higher-level task of predicting the emotion behind images. They show that visual and textual features alone are not sufficient for accurate emotional tagging. Experiments with two large datasets show that the proposed method significantly improves on existing state-of-the-art methods.
Finally, Xu et al. (2019) propose a new Hierarchical Deep Fusion (HDF) model for explor-
ing the transverse relationship between images, text, and their social relationships, which,
with their complementary features, make emotional analysis more effective. Visual content
is combined with various semantic fragments of textual content using three-level hierarch-
ical LSTM (H-LSTM) to learn the inter-modal correlation of image and text at different
levels.
3. DataSet/DataFrame building for the analysis
3.1. Existing dataset usage
Of course, we also have the option to use data from external sources, previously built from tweets on specific topics (possibly a huge mixed tweet collection or a more specific one), but in that case we have to keep in mind that these data may not be up to date. Several sources exist where such previously compiled datasets can be accessed and downloaded. Basically, this would not be a problem, but under the circumstances we try to rely on the most up-to-date data for the test dataset. However, such datasets may be suitable for comparing to what extent the writing trend of a given circle influences the outcome of the analysis.
3.2. Building a dataset using the Twitter API for scraping
Using the Twitter developer tools, we build a test dataset with a scraping script, which compiles our data collection from tweets into a topic-based dataset for a given keyword and a given number of scraped tweets. Before the data are used for analysis, we also have the possibility to submit them to completely different, non-RNN-based testing, as the dataset construction also supports entirely different, traditional analysis, for example Excel-based processing (not deep learning).
Regarding the methods that perform the scraping and cleaning, our main method is ‘datasetbuilding’; its parameters are a keyword for the current scrape, a tweet count limit (how many tweets we need on this theme), the date interval from which we would like to extract data on the topic, and of course the language, which was English in all cases. For the scraping, we also used the tweepy library for the Twitter API, and we perform the ‘extra’ cleaning with the ‘cleantweet’ method (Listing 1).
We would like to use the Recurrent Neural Network we have built and run the test dataset (which we scrape and mine freshly) through our already trained model. The scraping script mentioned above makes this possible, because the dataset has undergone proper formatting and cleaning.
Overall, after compiling the dataset itself, we also have the opportunity to use the data in a completely different, traditional (Excel) analysis; these structures are supported by the script in an orderly, uninterrupted manner, which also runs the analysis. The analysis focuses primarily on a separate specific topic, which is the coronavirus. As Figure 1 shows, there are many other possibilities and methods for analysing this dataset.
4. Different ways of sentiment analysis
As mentioned, there are several different approaches to Natural Language Processing and sentiment analysis. If we separate them into two categories, the first is the classic dictionary style, which is not the most modern way, as opposed to the deep learning possibilities; what we also use in this analysis is the Recurrent Neural Network.
In classical dictionary-based analysis, we have a pre-set vocabulary where each word has a value indicating whether its effect is positive or rather negative. Accordingly, the sentences are decomposed so that each word is identified, and then, according to our dictionary, we assign the given value to the effect that word has. In the most general case, the sum of these values gives the emotional value of the sentence. Of course, we can run into a lot of problems here, as negations, double negations, turns of phrase and word combinations that affect emotion cannot be detected. This is why the field has shifted towards deep learning, using properly trained models.
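The dictionary approach, and its blind spot for negation, can be sketched in a few lines. The word list and weights here are toy values for illustration, not a real lexicon such as SentiWordNet:

```python
# A toy lexicon: each word carries a positive or negative weight.
LEXICON = {"good": 1.0, "great": 2.0, "happy": 1.0,
           "bad": -1.0, "terrible": -2.0, "sad": -1.0}


def lexicon_score(sentence):
    """Sum the lexicon values of every known word in the sentence."""
    words = sentence.lower().split()
    return sum(LEXICON.get(w.strip(".,!?"), 0.0) for w in words)
```

Note that `lexicon_score("not bad at all")` returns −1.0, even though a human reads the phrase as mildly positive; this is exactly the kind of error that motivates the move to trained models.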
Figure 1. Part of the freshly mined dataset.
Listing 1. Part of the Twitter dataset builder.
4.1. Deep learning – RNN
We use and build a Recurrent Neural Network (RNN). What is a Recurrent Neural Network? A neural network that is intentionally run multiple times, where parts of each run feed into the next run. Specifically, hidden layers from the previous run provide part of the input to the same hidden layer in the next run. Recurrent neural networks are particularly useful for evaluating sequences, so that the hidden layers can learn from previous runs of the neural network on earlier parts of the sequence.
For example, consider a recurrent neural network that runs four times. The values learned in the hidden layers on the first run become part of the input to the same hidden layers on the second run; similarly, the values learned on the second run become part of the input on the third run. In this way, the recurrent neural network gradually trains on and predicts the meaning of the entire sequence rather than just the meaning of individual words.
Further advantages of the RNN are that it can process inputs of any length, the size of the model does not grow with the size of the input, the computation takes historical information into account, and the weights are shared across time. Of course, it should be noted that the main general counter-argument is that the computation is slow.
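The recurrence described above can be sketched with a single-unit toy cell. The scalar weights are arbitrary illustration values; a real RNN cell uses weight matrices over vector-valued inputs and hidden states:

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One unrolled step: the new hidden state mixes the current input
    with the hidden state carried over from the previous run."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def rnn_scan(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Run the same cell over the whole sequence, feeding each hidden
    state into the next step; the final state summarizes the sequence."""
    h = 0.0
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)
    return h
```

Because the same weights are reused at every step, the model size stays constant no matter how long the input sequence is, which is exactly the advantage noted above.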
4.2. RNN model build and train
The tools provided by Keras and Tensorflow were used to build the model. Where we
created a Sequential model by passing a list of layer instances to the constructor and
the first layer is the Embedding layer, which can be used for neural networks on text
data. It requires that the input data be integer encoded, so that each word is represented
by a unique integer. The embedding layer is initialized with random weights and will learn
an embedding for all of the words in the training dataset. Then we used Bidirectional
wrapper for RNNs. Next is the Dense and Dropout layers. A dense layer is a classic fully con-
nected neural network layer, each input node is connected to each output node. A
dropout layer is similar except that when the layer is used, the activations are set to
zero for some random nodes. This is a way to prevent overfitting.
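A model with this layer stack might be assembled as follows. The layer sizes and the vocabulary size are assumptions, since the article does not give exact hyperparameters:

```python
import tensorflow as tf

VOCAB_SIZE = 8185  # assumed size of the subword encoder's vocabulary

# Sequential model built from a list of layer instances, as described above.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),               # integer ids -> dense vectors
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)), # bidirectional RNN wrapper
    tf.keras.layers.Dense(64, activation="relu"),            # fully connected layer
    tf.keras.layers.Dropout(0.5),                            # randomly zero activations
    tf.keras.layers.Dense(1, activation="sigmoid"),          # sentiment score in [0, 1]
])
model.compile(loss="binary_crossentropy",
              optimizer="adam", metrics=["accuracy"])
```

The final sigmoid output matches the 0-to-1 scoring described later, which is then bucketed into the weakly/strongly positive and negative categories.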
We also save our trained models in .h5 format, tagged with the actual training date, so that we can reuse them if needed. We also have the possibility to load these trained models and use them on newly scraped data. (There is a separate menu option to use a re-trained model or a previous one, referred to by the name of its .h5 file.)
There are numerous ways to use train and test datasets before applying the trained model to a real dataset. TensorFlow provides numerous datasets, for example ‘imdb_reviews/subwords8k’ and ‘civil_comments’. We can split them into train and test sets for the ‘compile’ and ‘fit’ model calls, and of course we can use our own datasets for the training and testing phases as well. In the case of models trained on external datasets, we can talk about ‘continuous learning’, since one dataset is used to fit the model and the result is applied to our own actual datasets.
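The save-and-reload round trip might look like this. A tiny stand-in model is used in place of the trained RNN, and the date-stamped file name pattern is our assumption of how the menu option refers to a model:

```python
from datetime import date

import tensorflow as tf

# Stand-in for the trained RNN; only the save/load mechanics matter here.
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)),
                             tf.keras.layers.Dense(1)])
model.compile(loss="mse", optimizer="adam")

# Save in .h5 format, tagged with the actual training date, for later reuse.
path = f"rnn_model_{date.today().isoformat()}.h5"
model.save(path)

# Later (e.g. via the menu option), reload the model by its file name
# and apply it to the freshly scraped data.
reloaded = tf.keras.models.load_model(path)
```

Because the weights are stored with the architecture, the reloaded model produces the same predictions as the original without retraining.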
For display, we use the matplotlib.pyplot package: our model walks through the given dataset using the predict method. Accordingly, we categorize how positive or negative the emotional value of each tweet or sentence is, and visualize the results with a colourful plot.
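The categorization and plotting step could be sketched as follows. The score thresholds are our assumption, since the article only states that the model's scores lie between 0 and 1:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt


def categorize(score):
    """Map a model score in [0, 1] onto the emotional-strength scale
    (thresholds are illustrative, not taken from the article)."""
    if score >= 0.9:
        return "strongly positive"
    if score >= 0.6:
        return "positive"
    if score > 0.5:
        return "weakly positive"
    if score == 0.5:
        return "neutral"
    if score > 0.4:
        return "weakly negative"
    if score > 0.1:
        return "negative"
    return "strongly negative"


def plot_distribution(scores, filename="sentiment.png"):
    """Count each category and visualize the result as a colourful pie chart."""
    counts = {}
    for s in scores:
        label = categorize(s)
        counts[label] = counts.get(label, 0) + 1
    plt.pie(list(counts.values()), labels=list(counts.keys()), autopct="%1.2f%%")
    plt.savefig(filename)
```

With thresholds like these, a score lands in the neutral bucket only at exactly 0.5, which is consistent with the near-zero neutral share reported for the RNN below.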
4.3. RNN analysis – themes and results
To examine and compare the model, we use the coronavirus topic (the most prominent and up-to-date topic of our time; recent data mining has a lot of potential here, as there is no pre-compiled dataset and rapid changes keep the topic current) and different amounts of freshly mined data. In addition, we make comparisons with several third-party applications and with traditional, classical analysis, to see what differences and conclusions can be drawn about efficiency, accuracy, and speed in different cases.
We expect that the model we have trained, developed and taught in detail can provide more accurate results for today’s online communication patterns, difficult multi-meaning sentences and unique topics than a traditional approach or a third-party application, which also works but with larger error ranges than our more carefully prepared model.
4.3.1. Comparison with old-fashioned research work
Traditional polling, or purely human tracking, data collection and analysis, is time-consuming. The result would be very accurate, but by the time the report is completed, the conclusion may be outdated and the result no longer relevant. Thus, wherever human labour can be replaced, data mining and scraping for a test dataset can be a huge step forward: the process takes less time, and we are also able to use a number of third-party tools to speed it up further.
In essence, we can discover incomparably large differences between traditionally supported analyses and analyses supported by scraping and other dataset-compilation options, as the difference is found in both time and accuracy. It should also be mentioned that in the traditional research process, people do the analysis, so the accuracy of the tweet polarity would be really good, but they cannot cover such a large sample, or not in such a short time. Hence, despite a perhaps larger error factor, the automated results may be much more relevant for analyses, since up-to-date and fast results, as well as partial results, can be obtained, not to mention specially well-trained neural networks, whose speed cannot be matched by human work.
4.3.2. Comparison with third-party sentiment analysers like TextBlob
As mentioned earlier, for the analyses we use the coronavirus theme, which dominates social media platforms.
Our pre-measurement expectations are as follows: we would like more accurate results with few or even zero neutral expressions, as neutral data would greatly distort the real picture, so we want to minimize the neutral category as much as possible. Based on the small details and twisted manifestations, we expect some cases to move from the neutral space in the negative or positive direction. For both TextBlob and the RNN, we apply the same categorical distribution over the different levels of sentiment.
Thus, we scraped a different number of tweets for each analysis and compared the results on the test datasets (freshly scraped tweet datasets used with both the trained RNN model and TextBlob).
The large-scale presence of the given topic on Twitter was already visible in the first rounds, and it greatly influences the results. Initially, the first differences between the models appeared on smaller samples of 10 and 20 tweets. Using the functions of TextBlob, you can see how many different and cluttered tweets push the end result of the analysis towards the neutral category, and we often get a smaller but positive end result, which of course was also the case with our own model (with a smaller or zero neutral segment and a better-distributed area).
Looking at the currently analysed test datasets against the background of this phenomenon, it was noticeable that the age group most active on Twitter is young or younger. Thus, school closures appear as a positive phenomenon in smaller samples, with a small positive and neutral contribution to the end result. In addition, hospital donations also move the end result in a positive direction; alongside the negative tweets about deaths, tweets about these donations and about cohesion are strongly present even in small samples, although the influence of the current scrape on what data it collects is great. Moreover, the factuality of newscasts reinforces the neutral or weakly positive or weakly negative slices, as simple statements cannot, in most cases, be emotionally shifted in any direction.
Other third-party models will not be discussed in detail, as an analyser based on a simple dictionary already gave completely misleading results on tweets reporting positive or negative disease outcomes on a given topic. As in Figure 2, both TextBlob and our own well-trained model were able to filter out these turns of phrase and manifestations quite accurately. (The RNN may look more significant, but we cannot prove it 100%; however, the RNN has no neutral section most of the time, which adds more value to the analysis.) The amount of test data will be the main influencing factor.
Note: the RNN model was trained on an IMDB review dataset (using the shuffle method on the test and train dataset sections as well); we then use the freshly scraped dataset as the test dataset with this trained model.
As the figures show (for this run, the keyword was ‘covid’), the RNN managed to categorize all tweets without giving a neutral result, so we conclude that the model was ‘better’ attuned to the smaller details and categorized based on them. Our model stands out in the strongly positive and weakly negative sections, which is a good indicator of the division of the topic and the abundance of interactions on it. Of course, it can be noticed that positive manifestations continue to dominate on social media platforms, which is also reflected in the partial results, but it is also realistic that there are negative voices and different perspectives. TextBlob also deviates in the positive direction like our model, and both results tipped in the same direction, but in TextBlob’s case a larger neutral value can also be noticed, in addition to the negative manifestations. Overall, the categorization of both models can be realistic; the difference is to be found primarily in how the models handle details, which our model hopefully handled better even with so little test data. Figure 2 was produced from the dataset in Figure 3.
We continue to compare TextBlob and our own RNN model on larger and larger test datasets, checking how accurately they perform and how few erroneous results they give on double negations, slang and other common manifestations and reports.
Between 24 April 2020 and 25 April 2020, on the sample of 50 tweets, we can see increasing movement towards the two extremes. In the case of the RNN model (b) (Figure 4), again, no tweets fell into the neutral category; they were subdivided into weakly negative and weakly positive parts, as opposed to TextBlob (a) (Figure 4), where there is a more significant neutral unit. In addition, there is a kind of progress towards the extremes, from which it can be concluded that people are already starting to ‘get bored’ of this whole topic, the daily numbers and the situation itself. Of course, the strong division can be attributed to the various policy reactions and announcements and the tweets responding to them, which express either sympathetic or dissenting opinions about the situation. (Looking at the dataset, we can see a strong wave of manifestations about the decisions and political influence of the WHO, which probably amplified both positive and negative opinions.)
Increasing the dataset (to 200 tweets) while still using the keyword ‘covid’, we can see that the division is still similar. A kind of increase in the positive direction can be detected, but this can be explained by the increased amount of tweets in the case of both models. (There is
Figure 2. Analysis of a sample of 20 tweets by TextBlob and RNN, using the ‘covid’ keyword. (a) TextBlob result, (b) RNN result.
Figure 3. 15 April 2020 dataset.
a difference in the strength of positivity between the two results.) The increased number of tweets shows that ‘positive expression’, support and ‘hope’, a greater extent of positivity, is still highly present in social media, which was expected, but negative messages are also present in significant amounts on the subject (Figure 5).
Using the keyword ‘coronavirus’ and a much larger dataset, the result is very similar to the trends so far: a smaller increase in both the positive and negative directions, with only minor movements in the strength of positivity or negativity (Figure 6).
Our model did not place any tweet in the neutral section, which makes it easier to see differences of opinion. It should also be mentioned that our model evaluates tweets between 0 and 1, while TextBlob scores between −1 and 1; the categories are defined accordingly, so that a few small details of a tweet are able to move it into another category. Because
Figure 4. Analysis of a sample of 50 tweets by TextBlob and RNN, using the ‘covid’ keyword. (a) TextBlob result, (b) RNN result.
Figure 5. Analysis of a sample of 200 tweets by TextBlob and RNN, using the ‘covid’ keyword. The time period is between 24 April 2020 and 25 April 2020. (a) TextBlob result, (b) RNN result.
of these small details, we can say that we perhaps get a more comprehensive picture by avoiding neutrality (Figure 7).
People are divided on this topic, as in most other cases nowadays. They clash over different topics, try to draw conclusions, and try to convince others of their side. In most cases, strong negativity or positivity is rare, meaning most people are not completely biased towards one side, but there is a visible percentage who are totally biased, which is normal today. For smaller datasets, these tendencies are somewhat amplified, which is also normal.
Examining the period between 13 May 2020 and 14 May 2020, again using the keyword ‘covid’, we obtained a result (with 200 tweets) very similar to the previous month’s 200 tweets. Overall, movement can be observed between the categories delimiting strength (weakly, strongly) within the positive and negative sides. So, it can be concluded that the RNN model (b) (Figure 8) continues to deliver significant results with small
Figure 6. Analysis of a sample of 500 tweets by TextBlob and RNN, using the ‘coronavirus’ keyword. The time period is between 24 April 2020 and 25 April 2020. (a) TextBlob result, (b) RNN result.
Figure 7. 24 April 2020, part of the dataset for the ‘coronavirus’ keyword.
changes over time, still without any categorization into the neutral section. Looking at the program logs, we saw that in some cases small decimal differences decided whether a particular text was weakly positive or negative rather than neutral. The weakly positive share in this case is 7.50% and the weakly negative is 10.50%. Based on the results of the RNN model, it can be said that positivity is still more present on social media in the case of pandemic-related manifestations. Based on the result of TextBlob (a) (Figure 8), we see a similar result in the positive direction, but with a significant 30% of neutral data, and a weakly positive section of 36.50% against the RNN’s 7.50%.
Overall, the RNN chart provides a much more realistic and thorough picture of current emotional levels (for us), with minimal or even zero neutral results.
If we increase the number of tweets to 500 in the same time period, then in the case of the RNN model we can observe a strengthening in the negative section (the simple negative category, not counting the strongly and weakly negative ones), which can also be said of the TextBlob result.
In our TextBlob (a) (Figure 9) analysis, we again see a 29% neutral value, in addition to a weakly negative value of 17.80%. For the RNN model (b) (Figure 9), the neutral result is again 0% and only 8.60% is weakly negative. Overall, comparing the categorical values of the two analyses, the positive shift can be seen again, but this end result is divided in a completely different way in the two models. In the case of RNN, a positive value of 24.80% can be observed alongside a negative value of 22.40%, a proportionate division in which the positive manifestations in the sample of 500 tweets are slightly more frequent. In contrast, in the TextBlob analysis, the weakly positive value of 35.20% dominates, while the positive value is 11% and the negative value is 4%.
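The percentage breakdowns compared above are simple relative frequencies over the classified sample. A minimal helper for producing such a distribution from a list of category labels could look as follows (the function name is our own, hypothetical):

```python
from collections import Counter

def category_distribution(labels):
    """Return the percentage share of each sentiment category in a
    classified sample (e.g. 500 tweets), rounded to two decimals."""
    total = len(labels)
    return {category: round(100.0 * count / total, 2)
            for category, count in Counter(labels).items()}
```

For a sample of 500 labels, a category with 43 occurrences would come out as 8.6%, matching the granularity of the figures quoted above.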
The reactions to various political announcements and decisions provoke significant activity from people who argue and talk about their effects on social media, drastically increasing the number of tweets related to the topic. A similar reaction has been shown for various international events, especially after their details have been made public. (There is a visible shift in the positive and negative directions, sometimes away from the neutral, but there are also changes in the strength distribution of positivity and negativity itself.)
Figure 8. Analysis of a sample of 200 tweets by TextBlob and RNN, using the ‘covid’ keyword. The time period spans 13 May 2020 to 14 May 2020. (a) TextBlob result, (b) RNN result.
12 L. NEMES AND A. KISS
No one knows how long the ‘covid’ and ‘coronavirus’ topics will remain dominant across the Internet. Even once a vaccine becomes available, the topic is still expected to stay with us for a significant period of time and to dominate the various community platforms with its subsequent effects (Figure 10).
5. Conclusion and future work
In this work, we used a Recurrent Neural Network (RNN) to classify emotions in tweets. We developed a model to analyse the emotional nature of various tweets, using the recurrent neural network for emotional prediction, searching for connections between words and marking them with positive or negative emotions. Instead of simple positive and negative extremes, we classified the various texts into a much more articulated
class of emotional strength (weakly positive/negative, strongly positive/negative).
Figure 9. Analysis of a sample of 500 tweets by TextBlob and RNN, using the ‘covid’ keyword. The time period spans 13 May 2020 to 14 May 2020. (a) TextBlob result, (b) RNN result.
Figure 10. Part of the dataset of the ‘covid’ keyword, 13 May 2020.
This has been combined with a keyword-based data scraper, so we can apply our
trained RNN model to these specific, freshly scraped datasets. As a result, we get an emotional classification related to specific topics: what kinds of tweets there were, which emotional class they belong to, and what the distribution on that topic is at the emotional level within the given time interval. In this article, we focused mostly on the coronavirus and the related emotional changes and fluctuations, and it was shown that an overall positive manifestation and presence remained on social media surfaces during the pandemic, alongside negative and other manifestations, of course. Over time, positivity strengthened, but a stronger negative component naturally appeared as well. In line with our expectations, the topic retained its positive manifestations, sometimes with a higher and sometimes with a smaller percentage. It can be seen that the
recurrent neural network provides good performance and prediction in text classification.
The RNN model yielded a smaller amount of neutral results, or reduced them to zero entirely, which shows that our model is able to make a decision and categorize in some direction even on the basis of small details. Our comparisons were made mainly against TextBlob, which also worked very well and delivered stable results, but its neutral results were often above 30%, and those we cannot use as usefully for further evaluations as we can the RNN output. The classification of emotions was properly segmented for both models (TextBlob, RNN).
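To illustrate the recurrent mechanism behind such a text classifier, the following is a minimal Elman-style forward pass in pure Python. It is a didactic sketch under our own assumptions (caller-supplied toy weights, a final tanh read as a polarity-like score); the model used in this article is a trained TensorFlow RNN, not this code.

```python
import math

def rnn_score(sequence, W_xh, W_hh, W_hy):
    """One forward pass of a toy Elman RNN over a sequence of token
    feature vectors; the final tanh-squashed scalar in (-1, 1) plays
    the role of a sentiment polarity score. Weights are supplied by
    the caller; no training is performed here."""
    h = [0.0] * len(W_hh)  # hidden state, one value per hidden unit
    for x in sequence:     # one recurrence step per token vector
        h = [math.tanh(
                sum(W_xh[i][j] * x[j] for j in range(len(x))) +
                sum(W_hh[i][k] * h[k] for k in range(len(h))))
             for i in range(len(h))]
    return math.tanh(sum(W_hy[k] * h[k] for k in range(len(h))))
```

Because the hidden state is carried across steps, the score for a token depends on the tokens before it, which is the property that lets a recurrent classifier exploit connections between words.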
For future work and further development, it may be advisable to create an interface that better visualizes the results and interacts with users, supplemented with sophisticated database management for archiving, tracking, and exploring data in other areas. We can further expand the analysis by introducing additional classifications and clusterings as well as other data analyses. Allowing examinations and comparisons from a new perspective, in addition to the emotional analyses, may even provide an opportunity to further support the current results and compare the conclusions. In addition, future TensorFlow features could be implemented or refactored to keep the system up to date.
Note
1. https://developers.google.com/machine-learning/glossary/#recurrent_neural_network
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
The project has been supported by the European Union, co-financed by the European Social Fund
[grant number EFOP-3.6.3-VEKOP-16-2017-00002].
Notes on contributors
László Nemes received the B.Sc. degree in computer science from Eötvös Loránd University in 2020 and is currently pursuing an M.Sc. degree. He is a Demonstrator with the Department of Media and Educational Technology, Eötvös Loránd University.
Attila Kiss was born in 1960. In 1985 he graduated (MSc) as a mathematician from Eötvös Loránd University in Budapest, Hungary. He defended his PhD in the field of database theory in 1991. Since 2010 he has been the head of the Information Systems Department at Eötvös Loránd University. His scientific research focuses on database theory and practice, security, the semantic web, big data, data mining, artificial intelligence and bioinformatics. He has supervised seven PhD students and has more than 145 scientific publications.
ORCID
László Nemes https://orcid.org/0000-0001-6167-9369
Attila Kiss https://orcid.org/0000-0001-8174-6194
References
Arras, L., Montavon, G., Müller, K. R., & Samek, W. (2017). Explaining recurrent neural network predictions in sentiment analysis. arXiv preprint arXiv:1706.07206.
Balahur, A. (2013). Sentiment analysis in social media texts. In Proceedings of the 4th workshop on
computational approaches to subjectivity, sentiment and social media analysis (pp. 120–128).
Jianqiang, Z., Xiaolin, G., & Xuejun, Z. (2018). Deep convolution neural networks for twitter sentiment analysis. IEEE Access, 6, 23253–23260. https://doi.org/10.1109/ACCESS.2017.2776930
Leskovec, J. (2011). Social media analytics: Tracking, modeling and predicting the flow of information
through networks. In Proceedings of the 20th international conference companion on world wide
web (pp. 277–278).
Liu, P., Qiu, X., & Huang, X. (2016). Recurrent neural network for text classification with multi-task learning. arXiv preprint arXiv:1605.05101.
Mikolov, T., Karafiát, M., Burget, L., Černockỳ, J., & Khudanpur, S. (2010). Recurrent neural network
based language model. In Eleventh annual conference of the international speech communication
association.
Mikolov, T., Kombrink, S., Burget, L., Černockỳ, J., & Khudanpur, S. (2011). Extensions of recurrent
neural network language model. In 2011 IEEE international conference on acoustics, speech and
signal processing (ICASSP) (pp. 5528–5531).
Muhammad, A., Wiratunga, N., & Lothian, R. (2016). Contextual sentiment analysis for social media genres. Knowledge-Based Systems, 108, 92–101. https://doi.org/10.1016/j.knosys.2016.05.032
Nallapati, R., Zhai, F., & Zhou, B. (2017). Summarunner: A recurrent neural network based sequence model
for extractive summarization of documents. In Thirty-first AAAI conference on artificial intelligence.
Neri, F., Aliprandi, C., Capeci, F., Cuadros, M., & By, T. (2012). Sentiment analysis on social media. In
2012 IEEE/ACM international conference on advances in social networks analysis and mining
(pp. 919–926).
Ortis, A., Farinella, G. M., Torrisi, G., & Battiato, S. (2018). Visual sentiment analysis based on objective text description of images. In 2018 international conference on content-based multimedia indexing (CBMI) (pp. 1–6).
Pandey, A. C., Rajpoot, D. S., & Saraswat, M. (2017). Twitter sentiment analysis using hybrid cuckoo search method. Information Processing & Management, 53(4), 764–779. https://doi.org/10.1016/j.ipm.2017.02.004
Schmidt, A., & Wiegand, M. (2017). A survey on hate speech detection using natural language processing. In Proceedings of the fifth international workshop on natural language processing for social media (pp. 1–10).
Wang, Y., & Li, B. (2015). Sentiment analysis for social media images. In 2015 IEEE international conference on data mining workshop (ICDMW) (pp. 1584–1591).
Xu, J., Huang, F., Zhang, X., Wang, S., Li, C., Li, Z., & He, Y. (2019). Sentiment analysis of social images
via hierarchical deep fusion of content and links. Applied Soft Computing,80, 387–399. https://doi.
org/10.1016/j.asoc.2019.04.010