Journal of Information and Telecommunication
Prediction of stock values changes using
sentiment analysis of stock news headlines
László Nemes & Attila Kiss
To cite this article: László Nemes & Attila Kiss (2021): Prediction of stock values changes using
sentiment analysis of stock news headlines, Journal of Information and Telecommunication
To link to this article: https://doi.org/10.1080/24751839.2021.1874252
© 2021 The Author(s). Published by Informa
UK Limited, trading as Taylor & Francis
Group
Published online: 01 Feb 2021.
László Nemes (a) and Attila Kiss (a, b)
(a) Department of Information Systems, ELTE Eötvös Loránd University, Budapest, Hungary; (b) Department of Informatics, J. Selye University, Komárno, Slovakia
ABSTRACT
Predicting the stock market values of companies, especially global companies, is an interesting and attractive topic. In this article, we study changes in stock values and predict them using freshly scraped economic news about the companies. We focus on the headlines of economic news and use several different tools for the sentiment analysis of these headlines. We consider BERT as the baseline, compare its results with those of three other tools, VADER, TextBlob and a Recurrent Neural Network (RNN), and then compare the sentiment results with the stock changes over the same period. BERT and the RNN were much more accurate: unlike the other two tools, they were able to determine the emotional values without a neutral category. By comparing these results with the movement of stock market values over the same time periods, we can identify the moment at which a change in stock values occurred using sentiment analysis of economic news headlines. Using correlation matrices, we also found a significant difference between the models in terms of the effect of emotional values on changes in stock market values.
ARTICLE HISTORY
Received 22 November 2020
Accepted 7 January 2021
KEYWORDS
Sentiment analysis; BERT;
recurrent neural network;
stock values; dataset
building
1. Introduction
A popular goal is to develop or use a model for sentiment prediction by looking for connections between words and marking them with positive or negative sentiment. There are many ways to perform sentiment analysis today. Some tools, such as TextBlob, are almost completely ready to use in a given context. Other options allow us to create our own models and train them on our own data. Sentiment analysis with BERT is one of the most powerful tools available, but we can also build a Recurrent Neural Network (RNN) or use the SentimentIntensityAnalyzer of the NLTK toolkit with the VADER lexicon.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/
licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.
CONTACT Attila Kiss kiss@inf.elte.hu
The stock market is one of the most important economic participants, and many people try to interpret and explain its movements in many ways. In this article, we use different tools for sentiment analysis, focusing on economic news and, more specifically, only on the headlines of economic news. In today's communication and news consumption, the headlines of articles play an even more important role than before. We apply sentiment analysis to the headlines about a particular company or companies to determine the effect of the headlines on the stock market. The question is how much effect an economic headline has without the context of the full article, if it has any measurable effect at all. We found that it does. We therefore characterize the different impacts and their perceived significance with a new, specific approach.
Data is an important pillar of the analysis. First, we need the headlines of economic news, which we use for sentiment analysis. Second, we need stock market data for each company. There are many possibilities for data collection and analysis, ranging from traditional dictionary-based labelling performed by humans to more sophisticated neural networks that determine the polarity of each economic news headline and label it with the appropriate sentiment. For stock market data, numerous tools are available, some of which can retrieve company-specific data, which is important for us. In both cases, we work with the most up-to-date data possible, based on the information provided by the companies. Both the economic news headlines and the stock value data cover the time period specified by the news, so the results of the sentiment analysis and the range of stock market data correspond to each other.
The analysis consists of the following steps. First, we collect economic news headlines for each company and collect stock market data according to the timestamps of those headlines.
Then we prepare the data and apply different sentiment analysis tools, such as an RNN or NLTK with the VADER lexicon. The RNN model was built and trained using the libraries and capabilities provided by TensorFlow. Finally, we manage the data, compare the stock market data with the sentiment data through visualization and explanation, and show how economic news headlines can affect stock market changes and the public.
2. Related works
Devlin et al. (2018) introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers, designed to pretrain deep bidirectional representations from unlabelled text by jointly conditioning on both left and right context in all layers. The new possibilities and results of this model enable even low-resource tasks to benefit from deep bidirectional architectures. This model has become one of the most significant tools in natural language processing.
Wang et al. (2020) present a public sentiment analysis during the outbreak that can provide insightful information for making appropriate public health responses. They analyze posts from Sina Weibo, a popular Chinese social media site, where the unsupervised BERT model is adopted to classify sentiment categories (positive, neutral and negative) and a TF-IDF (term frequency-inverse document frequency) model is used to summarize the topics of posts. Analyzing posts with negative sentiment on social media could contribute to understanding the public's experiences and offers examples for other countries. The analyses provide insights into the evolution of social sentiment over time and the topics connected to negative sentiment on social media sites. The BERT classification model and the TF-IDF topic extraction model delivered results with considerable accuracy.
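The TF-IDF weighting used for topic summarization can be illustrated with a minimal pure-Python sketch. The variant below (term frequency as count divided by document length, inverse document frequency as the natural logarithm of N divided by the document frequency) is one common textbook form and an assumption here, not necessarily the exact formula of Wang et al.; the three posts are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (tf = count / length, idf = ln(N / df))."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(weights, k=2):
    """Summarize a document by its k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


posts = [  # hypothetical posts
    "masks sold out masks everywhere",
    "hospital beds sold out",
    "schools closed hospital busy",
]
for post_weights in tf_idf(posts):
    print(top_terms(post_weights))
```

Terms that occur in every document receive an idf of zero, so they never appear among the summary terms.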
Big data is a very popular and powerful tool nowadays. Lee (2020) explores the initial impact of COVID-19 sentiment on the US stock market using big data, notably the Daily News Sentiment Index (DNSI) and Google Trends data on coronavirus-related searches. The goal is to investigate the correlation between COVID-19 sentiment and 11 selected sector indices of the United States (US) stock market over a specified time period. Any positive or negative public sentiment related to a stock market crisis can have a ripple effect on the decisions of investors. The results reveal distinct effects of COVID-19 sentiment across industries and separate them into different correlation groups.
Khedr and Yaseen (2017) aim to construct an effective model that predicts future stock market trends with a small error ratio and improves prediction accuracy. The prediction model is based on sentiment analysis and historical stock prices and uses the K-NN and naïve Bayes algorithms to obtain the final results. The model has two stages. The first stage uses the naïve Bayes algorithm to determine whether the news polarity is positive or negative; the second stage takes the output of the first stage, together with the processed historical numeric data, as input to predict the future stock trend using the K-NN algorithm.
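The two-stage idea can be sketched in plain Python: a multinomial naïve Bayes classifier assigns a polarity to a headline, and that polarity, together with a numeric market feature, becomes part of the input of a K-NN trend predictor. This is a toy illustration, not Khedr and Yaseen's implementation; the training headlines, the feature pair (news polarity, previous-day return) and the history are all hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Stage 1: multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        def log_posterior(c):
            total = sum(self.word_counts[c].values())
            score = math.log(self.priors[c])
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][word] + 1) / (total + len(self.vocab)))
            return score

        return max(self.classes, key=log_posterior)


def knn_predict(train_x, train_y, query, k=3):
    """Stage 2: majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Stage 1 on hypothetical labelled headlines
nb = NaiveBayes().fit(
    ["profit beats estimates", "shares surge on strong demand",
     "revenue falls short", "shares drop after weak outlook"],
    ["positive", "positive", "negative", "negative"],
)
polarity = 1 if nb.predict("strong profit growth") == "positive" else -1

# Stage 2: features = (news polarity, previous-day return in %), hypothetical history
train_x = [(1, 0.5), (1, 1.2), (1, -0.3), (-1, -0.8), (-1, 0.2), (-1, -1.5)]
train_y = ["up", "up", "up", "down", "down", "down"]
print(knn_predict(train_x, train_y, (polarity, 0.4)))  # up
```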
Streaming data, collected in real time, prove to be a rich source for data analysis. Their major characteristics, accessibility and availability, help in proper analysis and prediction. Das et al. (2018) present an analysis aimed at financial decisions such as stock market prediction, predicting the potential prices of a company's stock using Twitter data.
The project of Kalyani et al. (2016) takes data such as financial news articles about a company and predicts its future stock trend by classifying news sentiment, assuming that news articles have an impact on the stock market. This is an attempt to study the relationship between news and stock trends. For this, they used a dictionary-based approach: dictionaries of positive and negative words are created from general and finance-specific sentiment-carrying words. Based on this data, they implemented classification models. The results show that Random Forest (RF) and Support Vector Machine (SVM) classifiers perform well in all tests.
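A dictionary-based polarity score can be sketched in a few lines of plain Python. The word lists below are small hypothetical stand-ins for the general and finance-specific dictionaries, and the score (positive minus negative hits, divided by all hits) is one simple choice assumed for illustration; Kalyani et al. then feed such features to classifiers such as RF and SVM, which are not shown.

```python
# Hypothetical word lists; a real system would use general and finance-specific dictionaries.
POSITIVE_WORDS = {"gain", "growth", "beat", "surge", "profit", "upgrade"}
NEGATIVE_WORDS = {"loss", "decline", "miss", "drop", "lawsuit", "downgrade"}


def dictionary_polarity(headline):
    """Return (#positive - #negative) / #sentiment words, or 0.0 if no dictionary word occurs."""
    tokens = [token.strip(".,!?:;\"'").lower() for token in headline.split()]
    pos = sum(token in POSITIVE_WORDS for token in tokens)
    neg = sum(token in NEGATIVE_WORDS for token in tokens)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


def dictionary_label(headline):
    score = dictionary_polarity(headline)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(dictionary_label("Chipmaker shares surge after profit beat"))  # positive
print(dictionary_label("Retailer reports loss, stock drop continues"))  # negative
```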
Mikolov et al. (2011) present several modifications of the original recurrent neural network language model (RNN LM). This model had been shown to significantly outperform many competitive language modelling techniques in terms of accuracy, but its remaining problem is computational complexity. Their modifications make the model more than 15 times faster in both the training and testing phases. The resulting RNN model can be smaller, faster and more accurate than the baseline.
Another paper introduces SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model. Nallapati et al. (2016) propose a highly interpretable neural sequence model for extractive document summarization that allows intuitive visualization, and show that it performs better than or comparably to state-of-the-art deep learning models.
Following this line, Liu et al. (2016) present a multi-task learning framework, based on recurrent neural networks, that jointly learns across multiple related tasks. They propose three different mechanisms for sharing information to model text with task-specific and shared layers; the three models differ in how information is shared between the tasks.
Let us look at other approaches. Balahur (2013) presents a sentiment analysis method designed specifically for Twitter data, taking into account its structure, length and specific language. The author shows that using generalized features significantly improves sentiment classification results. The method uses unigram and bigram (n-gram) features and supervised learning with simple Support Vector Machines. Based on the results, the best features for sentiment analysis are unigrams and bigrams together. Generalizations using unique tags, emotive words and modifiers also strongly improve sentiment classification performance.
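Unigram and bigram features, combined with a simple generalization step, can be extracted as shown in this minimal sketch. The generalization rules (replacing user mentions and links with unique tags) are hypothetical examples rather than Balahur's exact rules, and the SVM that would consume these counts is not shown. Bigrams such as "not good" capture negation that the unigrams "not" and "good" alone cannot.

```python
from collections import Counter


def generalize(token):
    """Hypothetical generalization rules: replace user mentions and links with unique tags."""
    if token.startswith("@"):
        return "USER_TAG"
    if token.startswith("http"):
        return "URL_TAG"
    return token


def ngram_features(text, n_values=(1, 2)):
    """Count the unigrams and bigrams of a lowercased, whitespace-tokenized text."""
    tokens = [generalize(token) for token in text.lower().split()]
    features = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


features = ngram_features("@trader not good at all http://example.com")
print(features["not good"], features["USER_TAG not"], len(features))  # 1 1 11
```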
SmartSA, by Muhammad et al. (2016), is a lexicon-based sentiment classification system for social media genres. It integrates strategies to capture contextual polarity in two ways: the interaction of terms with their textual neighbourhood (local context) and the text genre (global context). They also introduce an approach to hybridize a general-purpose lexicon with genre-specific vocabulary and sentiment. Results on diverse social media show that these local and global context strategies significantly improve sentiment classification and are complementary in combination.
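Two of these ideas can be illustrated with a small lexicon-based scorer: a hybrid lexicon formed by merging a general-purpose lexicon with genre-specific entries, and a simple form of local context in which a nearby negator flips a term's polarity. SmartSA's actual strategies are richer; the lexicons, negator list and window size below are hypothetical.

```python
GENERAL_LEXICON = {"good": 1.0, "strong": 0.8, "bad": -1.0, "weak": -0.8}
GENRE_LEXICON = {"bullish": 1.0, "bearish": -1.0}  # hypothetical genre-specific vocabulary
LEXICON = {**GENERAL_LEXICON, **GENRE_LEXICON}  # hybrid lexicon; genre entries win on conflicts
NEGATORS = {"not", "no", "never"}


def contextual_score(text, window=3):
    """Sum lexicon scores, flipping a term's polarity if a negator occurs in the preceding `window` tokens."""
    tokens = text.lower().split()
    score = 0.0
    for i, token in enumerate(tokens):
        if token in LEXICON:
            negated = any(t in NEGATORS for t in tokens[max(0, i - window):i])
            score += -LEXICON[token] if negated else LEXICON[token]
    return score


print(contextual_score("analysts are bullish"))  # 1.0
print(contextual_score("not a very strong quarter"))  # -0.8
```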
Arras et al. (2017) introduce a simple yet effective strategy for extending the layer-wise relevance propagation (LRP) procedure to recurrent architectures (LSTMs) by proposing a rule to backpropagate relevance through multiplicative interactions. They applied the extended LRP to a bidirectional LSTM model for the sentiment prediction of sentences.
When studying the influence of market characteristics on stock prices, traditional neural network algorithms may predict the stock market incorrectly, since the random selection of initial weights can easily lead to incorrect predictions. Building on the development of word vectors in deep learning, Pang et al. (2020) introduce the concept of a stock vector. The input is not only a single index or a single stock, but multi-stock, high-dimensional historical data. They propose a deep long short-term memory (LSTM) neural network with an embedding layer and an LSTM neural network with an automatic encoder (autoencoder) to predict the stock market.
Billah et al. (2016) present an improved Levenberg-Marquardt (LM) training algorithm. Given historical stock market data from the Dhaka Stock Exchange, such as the opening, highest and lowest prices and the total shares traded, a neural network trained with the improved LM algorithm can predict the likely day-end closing stock price with less memory and time.
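For reference, the standard Levenberg-Marquardt weight update for neural network training (not the improved variant of Billah et al., whose modifications are not reproduced here) is

```latex
\mathbf{w}_{k+1} = \mathbf{w}_k - \left(\mathbf{J}_k^{\top}\mathbf{J}_k + \mu_k \mathbf{I}\right)^{-1} \mathbf{J}_k^{\top}\mathbf{e}_k
```

where e_k is the vector of network errors on the training samples, J_k is the Jacobian of e_k with respect to the weights w_k, and μ_k > 0 is the damping factor. For large μ_k the step approaches a short gradient-descent step, (1/μ_k) J_k^T e_k, while for μ_k close to zero it approaches the Gauss-Newton step; the damping factor is typically decreased after a successful step and increased after a failed one.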
3. DataFrame building
3.1. Options for building DataFrames of the news headlines and stock values
There are several ways to approach building the data structure. Primarily, we consider the headlines of economic news. Of course, a data collection can be compiled by human effort according to specific conditions, such as gathering economic news titles filtered by a given company name, starting from the oldest available titles until a certain limit is reached. The analysis could also use data from a previously archived collection, but our main goal is to use the most up-to-date data possible. Human effort could likewise be used to collect the stock market values of companies, but today many economic portals, libraries and frameworks are available to fully automate this process. Here, automation plays an even more important role than in collecting the economic news titles: the error factor can be significantly reduced when compiling the companies' economic data, and the source and values of the stock data are easier to manage this way than the economic news titles.
3.2. DataFrame of the headlines of economic news
As mentioned earlier, the main goal for the economic news headlines is to use the most up-to-date data possible. All data collection and management is automated. The user can specify the portal used as the news source; we used data from finviz.com for our analyses. Before collecting the data, the user can enter the stock exchange ticker symbols of the companies whose recent economic news should be collected for analysis, and more than one company can be specified by passing a list as a parameter. The function manages the appropriate timestamps (news publication times), separates the news by company and saves a backup to a CSV file. The application uses this freshly compiled data for further analysis (sentiment analysis, comparisons and other possibilities). It is important to mention that the news timestamps are also used when compiling the stock market data, so that the analyses cover the same time period. Thus, the economic news headlines define the interval for the subsequent stock market data collection, separately for each company.
from urllib.request import urlopen, Request
import datetime

import pandas as pd
from bs4 import BeautifulSoup

website_url = 'https://finviz.com/quote.ashx?t='
company_tickers = ['AMD', 'AMZN', 'FB', 'GOOG']
news_tables = {}
parsed_data = []

# Download the news table from each company's quote page
for ticker in company_tickers:
    url = website_url + ticker
    req = Request(url=url, headers={'user-agent': 'my-scrape'})
    response = urlopen(req)
    html = BeautifulSoup(response, 'html.parser')
    news_data = html.find(id='news-table')
    news_tables[ticker] = news_data

# Parse every row into (ticker, date, time, headline)
for ticker, news_table in news_tables.items():
    date = None
    for row in news_table.findAll('tr'):
        title = row.a.text
        date_data = row.td.text.strip().split(' ')
        if len(date_data) == 1:
            # Only a time is shown: the row shares the date of the previous row
            time = date_data[0][0:7]
        else:
            date = datetime.datetime.strptime(date_data[0], '%b-%d-%y').strftime('%Y/%m/%d')
            time = date_data[1][0:7]
        parsed_data.append([ticker, date, time, title])

dataset = pd.DataFrame(parsed_data, columns=['Company', 'Date', 'Time', 'News Headline'])
Listing 1. Part of the Economic news headlines dataframe builder
The code snippet in Listing 1 implements part of the data collection for economic news headlines. Here, website_url is the portal from which we process the news, and company_tickers is the list of stock market ticker symbols of the companies for which we want to compile data. The code uses BeautifulSoup, urlopen and Request to perform the scraping. For other source pages, this processing stage has to be adapted to scrape data from the given page
(Figure 1).
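The date handling in Listing 1 can be checked offline with the following sketch, which isolates the same logic in a function. The cell format ("Nov-13-20 04:05PM" for a row that starts a new day, "09:30AM" for a later row of the same day) is assumed from the parsing code in Listing 1, and the rows are hypothetical placeholders rather than scraped data.

```python
import datetime


def parse_news_rows(rows):
    """Turn (date_cell, headline) pairs into (date, time, headline) records.

    A cell such as 'Nov-13-20 04:05PM' carries a date and a time; a cell such as
    '09:30AM' carries only a time, so the date of the previous row is reused.
    """
    records = []
    date = None
    for cell, headline in rows:
        parts = cell.strip().split(" ")
        if len(parts) == 1:
            time = parts[0][0:7]
        else:
            date = datetime.datetime.strptime(parts[0], "%b-%d-%y").strftime("%Y/%m/%d")
            time = parts[1][0:7]
        records.append((date, time, headline))
    return records


sample_rows = [  # hypothetical rows in the format handled by Listing 1
    ("Nov-13-20 04:05PM", "Headline A"),
    ("09:30AM", "Headline B"),
    ("Nov-12-20 06:15PM", "Headline C"),
]
for record in parse_news_rows(sample_rows):
    print(record)
```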
3.3. DataFrame of the company-specific stock values
The Yahoo_fin tool was used to collect the companies' stock values. The data is separated by company and restricted to the intervals of the previously compiled economic news headlines. This makes it possible to analyze and compare economic news headlines and stock market data for a given period. This data collection and management also makes it possible to perform individual and aggregate analyses (Figure 2).
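Restricting the stock data to the interval defined by the headlines can be sketched as follows. The sketch assumes that the interval runs from the oldest to the newest headline date and that the price series is a list of (date, adjusted close) pairs; the dates and prices are hypothetical placeholders, and the download itself is not shown.

```python
import datetime


def news_interval(headline_dates):
    """The study interval spans the oldest to the newest headline date."""
    return min(headline_dates), max(headline_dates)


def prices_in_interval(prices, start, end):
    """Keep the (date, adjusted_close) pairs that fall inside [start, end]."""
    return [(day, close) for day, close in prices if start <= day <= end]


d = datetime.date
headline_dates = [d(2020, 10, 28), d(2020, 11, 2), d(2020, 10, 30)]  # hypothetical
prices = [  # hypothetical placeholder values
    (d(2020, 10, 27), 1.0),
    (d(2020, 10, 28), 2.0),
    (d(2020, 11, 2), 3.0),
    (d(2020, 11, 3), 4.0),
]
start, end = news_interval(headline_dates)
print(prices_in_interval(prices, start, end))
```

Running the filter once per company keeps each company's price series aligned with its own headlines, which supports both the individual and the aggregate analyses.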
4. Sentiment analysis with different tools
As mentioned earlier, there are many possibilities for sentiment analysis, from human-labelled data to various deep learning methods. Here, we compare the possibilities offered by TextBlob, NLTK with the VADER lexicon, an RNN and BERT. The main goal is to analyze the economic news headlines about different companies and determine whether their sentiment is positive, negative or possibly neutral. A key factor is to minimize neutral values. Note that we have much less influence over external third-party tools than over our own models, such as the RNN.
In the sentiment analysis, each economic news headline of each company is labelled with the sentiment it carries, and its polarity value is also recorded. With the help of these data, we can perform a number of further analyses and comparisons. The main direction is to compare the specific companies with their stock market values over the period determined by the economic news, thereby assessing and presenting the emotional impact of economic news headlines on stock market changes and showing how powerful headlines can be on their own, without the full content.
Figure 1. Part of the economic news headlines dataframe.
In addition, our goal is to determine the strength and accuracy of the different sentiment analysis tools in the given context. BERT is used as a reference tool to see how close the results of the other tools are to its results. A more detailed analysis of stock market values and sentiment values (polarity and sentiment label) is done using the results of TextBlob, NLTK with the VADER lexicon, and the RNN.
4.1. TextBlob
TextBlob is a powerful NLP library for Python that is built on NLTK and provides an easy-to-use interface to the NLTK library. It can perform a variety of NLP tasks, ranging from part-of-speech tagging to sentiment analysis and from language translation to text classification, but we focus on sentiment analysis. In sentiment analysis, we determine the polarity value of each sentence, which lies between -1 and 1. We then label the data with the appropriate sentiment (positive, negative or neutral). For other tools, the polarity value may be on a different scale, so the labelling needs to account for these differences in further analysis.
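TextBlob exposes the score as TextBlob(text).sentiment.polarity. The labelling step can then be sketched as below, assuming (the exact rule is not stated) that only the neutral point itself counts as neutral. The neutral_value parameter is our addition, showing how the same rule can be adjusted for a tool whose scale has a different neutral point, such as 0.5 on a 0 to 1 scale. TextBlob returns a polarity of 0.0 when none of its lexicon words occur in the text, which may contribute to a large neutral share for short headlines.

```python
def label_polarity(polarity, neutral_value=0.0):
    """Map a polarity score to a sentiment label; only the exact neutral point counts as neutral."""
    if polarity > neutral_value:
        return "positive"
    if polarity < neutral_value:
        return "negative"
    return "neutral"


# TextBlob scores lie in [-1, 1] with 0.0 as the neutral point
print([label_polarity(p) for p in (0.35, 0.0, -0.2)])  # ['positive', 'neutral', 'negative']
```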
Figure 3 shows the sentiment values separated by company. The neutral category dominates and no other category comes close to it, so the outcome of the analysis of these economic news headlines is very uncertain. In the case of AMD, Figure 3(a) shows 63 headlines rated as neutral, 31 as positive and 5 as negative. In the case of FB (Facebook), there are 80 neutral, 13 positive and 6 negative headlines.
In the aggregate result in Figure 3(b), 75.25 percent of the headlines are neutral, 20.25 percent are positive and only 4.50 percent are negative.
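The percentage shares reported in such figures follow directly from the label counts. As a check, the sketch below uses the AMD counts from Figure 3(a) (63 neutral, 31 positive and 5 negative headlines) as input; the helper function itself is our illustration.

```python
from collections import Counter


def label_shares(labels):
    """Percentage of each sentiment label, rounded to two decimals."""
    counts = Counter(labels)
    return {label: round(100 * n / len(labels), 2) for label, n in counts.items()}


# AMD counts from Figure 3(a): 63 neutral, 31 positive, 5 negative
amd_labels = ["neutral"] * 63 + ["positive"] * 31 + ["negative"] * 5
print(label_shares(amd_labels))  # {'neutral': 63.64, 'positive': 31.31, 'negative': 5.05}
```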
The goal is to minimize neutral values by using a more accurate analysis, reducing the inaccuracy that neutral values introduce into the stock market comparisons.
Figure 4 shows the results divided into days within the interval. The results are aggregated, giving a normalized value of how positive or negative the overall day was for the company. Because more than 75 percent of the values are neutral, the days are visibly shifted in the positive direction, which can greatly distort the real results. Where a company has no coloured column for a given day, there was no economic news headline about that company on that day. Values above zero indicate the positive section and values below zero the negative section.
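The daily aggregation can be sketched as grouping the headline scores by company and date. The paper does not specify how the daily value is normalized; averaging the polarity scores of the day is assumed here as one plausible choice, and the records are hypothetical.

```python
from collections import defaultdict


def daily_sentiment(records):
    """Average polarity per (company, date); records are (company, date, polarity) tuples."""
    grouped = defaultdict(list)
    for company, date, polarity in records:
        grouped[(company, date)].append(polarity)
    return {key: sum(values) / len(values) for key, values in grouped.items()}


records = [  # hypothetical polarity scores
    ("AMD", "2020/10/27", 0.5),
    ("AMD", "2020/10/27", 0.0),
    ("AMD", "2020/10/27", -0.2),
    ("FB", "2020/10/27", 0.0),
]
print(daily_sentiment(records))
```

A company-day with no headlines simply has no key in the result, which corresponds to a missing column in the daily chart.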
Figure 2. Part from the AMD Stock Values DataFrame.
Note that during this period there was a large amount of news about AMD launching new CPUs and GPUs in October, which was also significantly positive. This may be explained by the CPU and GPU events at the end of October, and the results reflect the effect of these events.
4.2. NLTK -- VADER lexicon
NLTK stands for Natural Language Toolkit. It is one of the most powerful NLP libraries and contains packages that help machines understand human language and respond to it appropriately. Again, we focus on sentiment analysis, here with the SentimentIntensityAnalyzer. The polarity value of the sentences ranges between -1 and 1, just as in TextBlob, and the data labelling process (positive, negative or neutral) is similar to that of the previous tool. In this section we use the VADER lexicon. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and it also works well on texts from other domains.
Figure 3. Company-specific results of the sentiment analysis using TextBlob. The time period is from 2020-10-27 to 2020-11-14. (a) Results by Companies and (b) Aggregate Sentiment Result.
Figure 4. TextBlob analysis results separated by days. The time period is from 2020-10-27 to 2020-11-14.
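With NLTK, the scores come from SentimentIntensityAnalyzer().polarity_scores(text), which returns a dict with the keys 'neg', 'neu', 'pos' and 'compound' (the lexicon is first downloaded with nltk.download('vader_lexicon')). The paper does not state its labelling thresholds; the sketch below assumes the common convention from the VADER documentation, in which a compound score of at least 0.05 is positive and at most -0.05 is negative. Unlike the exact-zero rule for TextBlob, this leaves a small neutral band around zero.

```python
def vader_label(compound, threshold=0.05):
    """Label a VADER compound score using the +/-threshold convention (assumed, not from the paper)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


print([vader_label(c) for c in (0.4404, 0.02, -0.3182)])  # ['positive', 'neutral', 'negative']
```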
As Figure 5(a) shows, the neutral value dominates in all cases among the sentiment results separated by company. Next to it, Figure 5(b) shows that the aggregate sentiment result still contains significant neutral values among the economic news headlines. This level of neutral values affects the comparisons and analyses with the subsequent stock market changes. Compared to the results of TextBlob, however, the neutral values have been significantly reduced, and we expect this to have a significant effect in further analysis, yielding more accurate and realistic results with fewer neutral values. In Figure 5(b), 51.50 percent of the total result is neutral, 31.50 percent positive and 17 percent negative. Of the positive and negative categories, the positive strongly dominates, but the large neutral share still makes the result somewhat uncertain.
Figure 6 shows the results divided into days within the interval. The results are aggregated, giving a normalized value of how positive or negative the overall day was for the company. A day as a whole cannot be neutral, because the other headlines of the day move it in some direction, and the polarity scores of the headlines labelled neutral also move the result slightly. Values above zero indicate the positive section and values below zero the negative section.
4.3. Recurrent neural network (RNN)
In traditional neural networks, all inputs and outputs are independent of each other. In recurrent neural networks, however, the output of the previous step is fed into the input of the current step.
In short, a Recurrent Neural Network is a neural network that is intentionally run multiple times, where parts of each run feed into the next run. Specifically, the hidden layers of the previous run provide part of the input to the same hidden layer in the next run. Recurrent neural networks are particularly useful for evaluating sequences, because the hidden layers can learn from previous runs of the network on earlier parts of the sequence.
Figure 5. Company-specific results of the sentiment analysis using NLTK -- VADER Lexicon. The time period is from 2020-10-27 to 2020-11-14. (a) Results by Companies and (b) Aggregate Sentiment Result.
For example, the figure from Google in Figure 7 shows a recurrent neural network that runs four times. Notice that the values learned in the hidden layers during the first run become part of the input to the same hidden layers in the second run. Similarly, the values learned in the hidden layer during the second run become part of the input to the same hidden layer in the third run. In this way, the recurrent neural network gradually trains and predicts the meaning of the entire sequence rather than just the meaning of individual words.
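The recurrence described above, h_t = tanh(W_x x_t + W_h h_(t-1) + b), can be written out in plain Python. The sketch runs a small Elman-style RNN over a sequence of word vectors and maps the final hidden state to a score between 0 and 1 with a sigmoid. The weights and two-dimensional "word vectors" are hypothetical, and the paper's TensorFlow model is not reproduced here.

```python
import math


def rnn_forward(sequence, w_x, w_h, b, w_out, b_out):
    """Run a simple (Elman) RNN over a sequence of input vectors.

    h_t = tanh(W_x x_t + W_h h_{t-1} + b); the final hidden state is mapped
    to a score in (0, 1) with a sigmoid.
    """
    hidden = [0.0] * len(b)
    for x in sequence:
        hidden = [
            math.tanh(
                sum(w_x[i][j] * x[j] for j in range(len(x)))
                + sum(w_h[i][j] * hidden[j] for j in range(len(hidden)))
                + b[i]
            )
            for i in range(len(b))
        ]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical weights: 2-dimensional inputs, 2 hidden units
w_x = [[0.5, -0.3], [0.2, 0.8]]
w_h = [[0.1, 0.4], [-0.6, 0.3]]
b = [0.0, 0.1]
w_out = [1.2, -0.7]
b_out = 0.0

words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-ins for word embeddings
forward = rnn_forward(words, w_x, w_h, b, w_out, b_out)
backward = rnn_forward(list(reversed(words)), w_x, w_h, b, w_out, b_out)
print(forward, backward)  # word order changes the result
```

The same weight matrices are reused at every step, so the number of parameters does not grow with the length of the input sequence.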
One advantage of the RNN model is that it can process inputs of any length. An RNN is designed to retain information over time, which is very helpful for time series prediction. Even if the input is longer, the model size does not increase. However, there are also disadvantages: because of its recurrent nature, computation is slow, and training RNN models can be difficult.
Note that here, in contrast to the models mentioned earlier, the polarity value of the sentences ranges between 0 and 1.
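To compare the RNN scores with the -1 to 1 scales of TextBlob and VADER, a linear rescaling is one option. The paper does not state how the 0 to 1 scores were labelled or rescaled; the sketch assumes that the output is read as the probability of a positive headline, with 0.5 as the decision threshold, which is also why no neutral category arises.

```python
def to_signed_scale(score):
    """Map a score in [0, 1] (0.5 = neutral point) onto the [-1, 1] scale of TextBlob and VADER."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return 2.0 * score - 1.0


def rnn_label(score):
    """Binary label: the RNN output is treated as the probability of a positive headline."""
    return "positive" if score >= 0.5 else "negative"


print([to_signed_scale(s) for s in (0.0, 0.5, 0.75, 1.0)])  # [-1.0, 0.0, 0.5, 1.0]
print(rnn_label(0.62), rnn_label(0.31))  # positive negative
```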
Figure 8 shows the results of the RNN separated by company, together with the aggregate result, as before. A significant difference from the previous results of TextBlob and NLTK with the VADER lexicon is that the neutral category was completely eliminated: all news headlines were categorized as either positive or negative. The neutral category had been shrinking across the models: it was huge for TextBlob, significantly reduced by NLTK with the VADER lexicon, and avoided entirely by the RNN model.
Figure 8(a) shows how the positive and negative news headlines are distributed among the companies. In the case of AMD, the result is quite balanced, with 51 positive and 48 negative values. In Figure 8(b), the total result is 58.50 percent positive, 41.50 percent negative and 0 percent neutral, which is now the key point.
In Figure 9, the categorization of days as positive or negative differs entirely from the previous figures, because the polarity values of the RNN model range between 0 and 1. Therefore, a traditional bar chart shows the aggregated polarity values for each day.
Figure 6. NLTK -- VADER Lexicon analysis results separated by days. The time period is from 2020-10-27 to 2020-11-14.
Note: The RNN model was trained on an IMDB review dataset² (the training and test sets were shuffled). We then used the freshly scraped dataset as the test dataset for this trained model.
4.4. Bidirectional encoder representations from transformers (BERT)
Unlike traditional NLP models that follow a unidirectional approach, that is, reading the text either from left to right or from right to left, BERT reads the entire sequence of words at once. BERT makes use of the Transformer, which is essentially a mechanism for building relationships between the words in the dataset. In its simplest form, the Transformer consists of two processing parts: an encoder, which reads the input text, and a decoder, which produces the predictions. Because the main goal of BERT is to create a pretrained language representation model, it uses only the encoder. BERT is a remarkable breakthrough in the field of NLP.
Figure 7. A recurrent neural network.¹
Figure 8. Company-specific results of the sentiment analysis using RNN. The time period is from 2020-10-27 to 2020-11-14. (a) Results by Companies and (b) Aggregate Sentiment Result.
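The core operation of the Transformer encoder that BERT builds on is scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V, in which every word attends to every other word of the sequence in both directions at once. The pure-Python sketch below uses hypothetical vectors and sets Q = K = V for simplicity; in BERT, the queries, keys and values are learned linear projections of the token embeddings, and the operation is repeated over multiple heads and layers, which this sketch omits.

```python
import math


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, for lists of row vectors."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))])
    return outputs, weights


# Hypothetical 2-dimensional vectors for a three-word input
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = self_attention(tokens, tokens, tokens)
for row in attn:
    print([round(w, 3) for w in row])
```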
As mentioned earlier, BERT is used as the reference result. Figure 10 shows the results obtained by BERT. Without a neutral category, it categorized each economic news headline as either positive or negative. Figure 10(a) shows that the result of our RNN model is quite encouraging, as it has no neutral category either and the values for some companies are quite close to those of BERT. Figure 10(b) shows the overall result: 50.50 percent positive and 49.50 percent negative, compared to 58.50 percent positive and 41.50 percent negative for the RNN model; neutral is 0 percent in both cases.
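Comparing only the aggregate shares does not show whether two models agree on individual headlines. A per-headline agreement rate against the BERT reference is a simple complementary check; the sketch below is our illustration, not a measure reported in the paper, and the labels are hypothetical.

```python
def agreement_rate(labels_a, labels_b):
    """Fraction of headlines that receive the same label from both models."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same headlines")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def positive_share(labels):
    """Percentage of headlines labelled positive."""
    return 100 * labels.count("positive") / len(labels)


# Hypothetical labels for the same four headlines
bert = ["positive", "negative", "negative", "positive"]
rnn = ["positive", "negative", "positive", "positive"]
print(agreement_rate(rnn, bert))  # 0.75
print(positive_share(rnn) - positive_share(bert))  # 25.0
```

Two models can have identical positive shares while disagreeing on many individual headlines, so the agreement rate is the stricter of the two checks.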
We expected the model we trained ourselves to give more accurate and more reliable results than the other tools on the same data set. More specifically, we expected the RNN model to assign sentiment labels more accurately, with a smaller error rate, than NLTK with the VADER lexicon or TextBlob, and the results confirmed this expectation. It should be emphasized that NLTK performed much better than initially expected: in later analyses, despite its neutral values, it gave a much "finer" result than TextBlob, whose result remained "raw" because of its large share of neutral values. The RNN model places no headline in the neutral category. Based on how the other tools compare with BERT, we expect the RNN and NLTK tools to give more accurate results when their output is compared with stock market values.
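The difference in neutral shares comes from how each tool's numeric score is turned into a label. The sketch below is an illustration under stated assumptions, not the study's code: in practice the scores would come from NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` and TextBlob's `TextBlob(text).sentiment.polarity`, while the exact thresholds the study used, and the 0.5 cut-off on the RNN's sigmoid output, are assumed here. The ±0.05 VADER cut-off follows the convention recommended in the VADER documentation.

```python
def vader_label(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a label.

    The +/-0.05 cut-off is the convention suggested by VADER's authors;
    the thresholds used in the study are an assumption here.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def textblob_label(polarity):
    """Map a TextBlob polarity in [-1, 1]; exactly 0.0 is treated as neutral."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


def binary_label(prob_positive, cutoff=0.5):
    """A sigmoid-output classifier (e.g. a binary RNN) has no neutral class."""
    return "positive" if prob_positive >= cutoff else "negative"


# Hypothetical scores for the same weakly worded headline.
print(vader_label(0.02), textblob_label(0.0), binary_label(0.55))
# neutral neutral positive
```

A headline containing none of TextBlob's lexicon words receives a polarity of 0.0, which is one way a lexicon tool accumulates a large neutral share, whereas the binary classifier is forced to choose a side for every headline.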
Figure 9. RNN analysis results separated by days. The time period is between 2020-10-27 and 2020-11-14.
5. Sentiment and stock value analysis
Having completed the sentiment analyses for a given interval, we can compare them with stock market changes over the same interval. During the sentiment analysis, the word "realistic" was used to refer to a smaller neutral category, that is, fewer neutral values in the output of the analysis. It denotes a better and "stronger" analytical model, one that was able to assign positive or negative labels to news headlines that the previous analytical model had categorized as neutral. In this way we reduced the potential for error and skewed results.
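Comparing sentiment with stock values over the same interval requires matching the two daily series, even though news is also published on days when the market is closed. The article does not describe how this was handled; the sketch below simply keeps the dates present in both series, which is an assumption (carrying weekend headlines forward to the next trading day would be an equally plausible choice). All values are hypothetical.

```python
from datetime import date


def align_by_date(daily_sentiment, daily_close):
    """Pair each day's sentiment with that day's closing price.

    Both arguments map datetime.date -> float. Only dates present in both
    series are kept, so headlines from weekends and market holidays are
    dropped; this handling is an assumption, not the study's stated method.
    """
    common = sorted(set(daily_sentiment) & set(daily_close))
    return [(day, daily_sentiment[day], daily_close[day]) for day in common]


# Hypothetical values: 2020-11-07 is a Saturday, so it has no closing price.
sentiment = {date(2020, 11, 6): 0.4, date(2020, 11, 7): -0.1, date(2020, 11, 9): 0.2}
close = {date(2020, 11, 6): 100.0, date(2020, 11, 9): 98.5}
for row in align_by_date(sentiment, close):
    print(row)
```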
We can say that economic news does have an impact on stock market shifts: at times certain news items affect later movements, and at other times the news describes a shift that is already under way, which reinforces the change.
In the present case, our main study focuses on the headlines of economic news about the companies specified earlier as parameters, without the context of the full articles. A headline, which aims to draw people's attention and generate clicks on the full content, is often worded in a sharp, "eye-catching" way. The question is how much impact, if any, these economic news headlines have on stock market changes. Our results indicate that they do have an impact.
Figure 11 shows AMD's stock value changes during the study period, plotting the daily closing value (adjusted closing price) against the date.
Figure 12 shows the results of the different sentiment models (TextBlob, NLTK-VADER Lexicon and RNN) for the same period, broken down by day.
These are the same results as in the previous sentiment analyses, but they are now displayed on a different diagram so that they can be compared with the stock market data. Significant differences can be observed between the models, especially in certain parts of the period. One of the most striking is the negative news stream around 2020-11-11: all three models detect a negative trend, but its extent differs significantly. Comparing these results with stock market changes helps us see whether stock market movements are reflected in the sentiment diagrams. The share of neutral values plays a significant role in the accuracy of the models. As mentioned earlier, when the daily result is calculated (whether a day is positive or negative overall), the polarity values of neutral headlines are also counted, since they belong to that day; they therefore influence the positive or negative shift of the day, but they distort the result. In contrast, a model with no neutral values can be expected to be much more accurate.
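How neutral headlines distort a day's overall result depends on how the daily result is computed, which the article does not spell out. Assuming the daily result is the mean of the per-headline polarity scores, the sketch below shows the effect: near-zero neutral scores pull a clearly negative day toward zero. The `drop_neutral` option and its 0.05 threshold are hypothetical, added only for comparison; the day label and scores are made up.

```python
from collections import defaultdict
from statistics import mean


def daily_mean(scored_headlines, drop_neutral=False, threshold=0.05):
    """Average per-headline polarity scores for each publication day.

    scored_headlines: iterable of (day, score) pairs.
    With drop_neutral=True, scores with |score| < threshold are skipped;
    a day whose headlines are all neutral then disappears from the result.
    """
    by_day = defaultdict(list)
    for day, score in scored_headlines:
        if drop_neutral and abs(score) < threshold:
            continue
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


# Hypothetical day: one clearly negative headline and three neutral ones.
headlines = [("day-1", -0.6), ("day-1", 0.0), ("day-1", 0.0), ("day-1", 0.0)]
print(daily_mean(headlines))                     # neutral scores dilute the day
print(daily_mean(headlines, drop_neutral=True))  # only the opinionated headline
```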
Figure 10. Company-specific results of the sentiment analysis using BERT. The time period is between 2020-10-27 and 2020-11-14. (a) Results by Companies and (b) Aggregate Sentiment Result.
Figure 11. AMD stock value changes.
Figure 12. Sentiment analysis of different models by daily separation. (a) TextBlob, (b) NLTK-VADER Lexicon and (c) RNN.
Figure 13 shows the normalized results, where for each model we can see how the stock market value changed during the period and how the daily results derived from the economic news headlines relate to the stock market movements. The graphs still use the AMD data. Panel (a) shows the TextBlob results together with the stock market values. For TextBlob, the share of neutral values was 75.25 percent, which is also reflected in the large "vibration" (fluctuation) of the emotional results. On this normalized graph of headline sentiment and stock values, the fundamental changes, that is, the decreasing and increasing trends, can be read. Between 2020-11-04 and 2020-11-06, a strong decrease in emotional values can be observed, together with a smaller "break", or correction, in the stock market values. Around 2020-11-09 and 2020-11-10, a significant break point can be observed in both the stock market developments and the emotional results.
Overall, the impact is visible and the major rises and declines can be traced on the chart, but the detail is questionable. Panel (b) shows the normalized NLTK-VADER Lexicon results. Here the share of neutral values was reduced to 51.50 percent. The result is surprising at first: the figure clearly shows a more detailed "co-movement" of stock and emotional values, and the changes between 2020-11-09 and 2020-11-11 track each other fully in sync. The RNN results in panel (c), where the share of neutral values was 0 percent, differ significantly from the previous ones. Most notably, the two series do not follow each other in sync, and on some days there is a significant difference between the emotional and stock market results. It can be assumed that in this case the emotional values have a smaller effect on the results of the same day, and perhaps a kind of periodic "prediction" can be observed. The significant positive result between 2020-11-04 and 2020-11-06 is one of the most striking. Before turning to the correlation matrices, all we can state is that the influence of emotional values decreases significantly over this stock market period. An emotional decrease or increase followed by a stock shift can be observed, but the overall influence has decreased significantly. This can be explained by the role of neutral values and by "realism": when examining the influence of news headlines, we cannot expect as much impact as from full economic articles and analyses. In all three cases, these effects are also analyzed with correlation matrices.
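To plot sentiment scores and prices in one diagram, both series must be brought to a common scale. The article does not state which normalization it used; the sketch below assumes min-max scaling to [0, 1]. The `co_movement` helper is a hypothetical way to put a number on the "co-movement" described above (the share of day-to-day steps in which both series rise or fall together); it is not a measure used in the article. All values are made up.

```python
def min_max(values):
    """Rescale a series linearly onto [0, 1] so different units share one axis."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a flat series carries no movement
    return [(v - lo) / (hi - lo) for v in values]


def co_movement(a, b):
    """Share of day-to-day steps in which both series move in the same direction."""
    steps = list(zip(zip(a, a[1:]), zip(b, b[1:])))
    same = sum(1 for (a0, a1), (b0, b1) in steps if (a1 - a0) * (b1 - b0) > 0)
    return same / len(steps)


# Hypothetical daily values (not the AMD data used in the article).
sentiment = min_max([0.10, -0.20, 0.35, 0.30])
close = min_max([100.0, 97.0, 103.0, 101.0])
print(sentiment)
print(close)
print(co_movement(sentiment, close))  # 1.0: every step points the same way
```

Min-max scaling preserves the direction of every daily change, so the co-movement share is the same before and after normalization; normalization only makes the two curves visually comparable.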
Figure 14 shows that for TextBlob, Compound (the sentiment result) has a huge effect on the opening, closing, lowest and highest values of the stock market alike, which is a very distorted result: such a large impact is almost unthinkable. As mentioned earlier, this situation can also be traced back to the large share of neutral values.
Figure 13. Normalized results of the sentiment and stock values. (a) TextBlob, (b) NLTK-VADER Lexicon and (c) RNN.
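Each cell of the correlation matrices in Figures 14 to 16 is a correlation coefficient between two daily series, such as Compound and the closing price. The sketch below computes a Pearson correlation matrix in plain Python, assuming Pearson's coefficient was used (the article does not name the method or the tool); with pandas, `DataFrame.corr()` produces the same kind of matrix and uses Pearson by default. The column values are hypothetical, constructed so that Close rises and Volume falls exactly linearly with Compound.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; assumes neither series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of equal-length columns."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}


# Hypothetical daily rows: Close = 80 + 20 * Compound, Volume = 12 - 20 * Compound.
data = {
    "Compound": [0.1, 0.3, -0.2, 0.4],
    "Close": [82.0, 86.0, 76.0, 88.0],
    "Volume": [10.0, 6.0, 16.0, 4.0],
}
for row, cells in correlation_matrix(data).items():
    print(row, {col: round(r, 2) for col, r in cells.items()})
```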
Figure 15 shows the NLTK-VADER Lexicon correlation matrix, in which the Compound values decrease for almost all stock values compared with the previous (TextBlob) correlation matrix. Given the "synchronized result" seen in the earlier diagrams, a significant effect was expected in the matrix as well, but these results may still seem excessive, considering that we examine economic news headlines on a company-specific basis.
The RNN correlation matrix in Figure 16 is completely different and, again, surprising. Compared with the previous models, the correlation of Compound with the opening, closing, lowest and highest values has decreased significantly, while, unlike before, its effect on another value, the volume, has increased drastically. Volume is the amount of an asset or security that changes hands over some period of time, often over the course of a day.
The RNN correlation matrix and its Compound values provide a kind of explanation for the diagrams seen earlier. The previous models showed a significant effect on the opening, closing, lowest and highest values, whereas the RNN shows a completely different result. Overall, we can say that the headlines themselves have a significant effect on the change in stock market values, and the volume stands out: it alone received a significant, unexpectedly high value in the RNN model. The data from the study period may also play a role in this, but the result is thought-provoking. Nor is it unique: we obtained a similar result for another company, Google (GOOG), from measurements between 2020-10-27 and 2020-11-16, as shown in Figure 17.
Figure 14. Correlation matrix of TextBlob.
Figure 15. Correlation matrix of NLTK-VADER Lexicon.
Figure 16. Correlation matrix of RNN.
6. Conclusion and future work
In this work, we used different sentiment analysis tools to analyze and classify the emotional content of economic news headlines and examined their impact on stock market value changes, even without the context of the full articles. Emotions were classified into the usual positive, negative and neutral categories. Neutral categories appeared for the TextBlob and NLTK-VADER Lexicon tools, but not for the Recurrent Neural Network (RNN). The results of the various sentiment analyses were compared with the result of BERT as a benchmark. As we expected, the RNN model that we developed and trained outperformed the other sentiment analysis tools and gave a result quite close to BERT's; as with BERT, it produced no neutral emotional values. In analyzing the emotional results and stock market changes, we compared the daily emotional values with the stock market values for the given period. The resulting diagrams allow the emotional results to be read alongside stock market movements and corrections, but we detected differences depending on the share and effect of the neutral values of the different models. In further analysis, we detected significant differences in the correlation matrices. For TextBlob, Compound (the emotional result) had a significant effect on the opening, closing, highest and lowest values of the stock exchange; NLTK-VADER Lexicon gave similar results, but significantly reduced compared with the previous model. The RNN model produced completely different values. Its diagram of emotional values and stock market changes also showed a smaller effect, which was confirmed by the correlation matrix, and it had a significant effect on the Volume value compared with the other models. Overall, economic news headlines have an impact on stock market values even without their textual context, and significant differences can be observed between different sentiment analysis tools. However, the stock market impact also depends on how the data in the given study period were affected.
Figure 17. Correlation matrix of RNN with another company (GOOG).
Future work could include further expansion of the analyses and the addition of new features, as well as the inclusion of other tools to compare stock market predictions across different sentiment analysis tools. This could be packaged in an easy-to-use form by developing a platform that incorporates future changes of TensorFlow into the current model.
Notes
1. https://developers.google.com/machine-learning/glossary/#recurrent_neural_network
2. https://www.tensorflow.org/datasets/catalog/imdb_reviews
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
The project has been supported by the European Union, co-financed by the European Social Fund (EFOP-3.6.3-VEKOP-16-2017-00002).
Notes on contributors
László Nemes received the B.Sc. degree in computer science from Eötvös Loránd University in 2020 and is currently pursuing an M.Sc. degree. He is a Demonstrator with the Department of Media and Educational Technology, Eötvös Loránd University.
Attila Kiss was born in 1960. In 1985 he graduated (MSc) as a mathematician from Eötvös Loránd University in Budapest, Hungary. He defended his PhD in the field of database theory in 1991. Since 2010 he has been the head of the Information Systems Department at Eötvös Loránd University, and since 2015 he has also been teaching at J. Selye University, Komárno, Slovakia. His scientific research focuses on database theory and practice, security, semantic web, big data, data mining, artificial intelligence and bioinformatics. He has been the supervisor of 7 PhD students and has more than 160 scientific publications.
ORCID
László Nemes http://orcid.org/0000-0001-6167-9369
Attila Kiss http://orcid.org/0000-0001-8174-6194
References
Arras, L., Montavon, G., Müller, K. R.., & Samek, W (2017). Explaining recurrent neural network pre-
dictions in sentiment analysis. arXiv preprint arXiv:1706.07206.
JOURNAL OF INFORMATION AND TELECOMMUNICATION 19
Balahur, A. (2013, 14 June). Sentiment analysis in social media texts. Proceedings of the 4th Workshop
on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Atlanta, GA
(pp. 120128). Association for Computational Linguistics.
Billah, M., Waheed, S., & Hanifa, A (2016, December 810). Stock market prediction using an improved
training algorithm of neural network. 2nd International Conference on Electrical, Computer &
Telecommunication Engineering (ICECTE), Rajshahi, Bangladesh (pp. 14). IEEE. https://doi.org/
10.1109/ICECTE.2016.7879611
Das, S., Behera, R. K., & Rath, S. K. (2018). Real-time sentiment analysis of twitter streaming data for
stock prediction. Procedia Computer Science,132, 956964. https://doi.org/10.1016/j.procs.2018.
05.111
Devlin, J., Chang, M. W.., Lee, K., & Toutanova, K (2018). Bert: Pre-training of deep bidirectional trans-
formers for language understanding. arXiv preprint arXiv:1810.04805.
Kalyani, J., Bharathi, P., & Jyothi, P. (2016). Stock trend prediction using news sentiment analysis.
arXiv preprint arXiv:1607.01958.
Khedr, A. E., & Yaseen, N. (2017). Predicting stock market behavior using data mining technique and
news sentiment analysis. International Journal of Intelligent Systems and Applications,9(7), 22.
https://doi.org/10.5815/ijisa
Lee, H. S. (2020). Exploring the initial impact of COVID-19 sentiment on US stock market using big
data. Sustainability,12(16), 6648. https://doi.org/10.3390/su12166648
Liu, P., Qiu, X., & Huang, X (2016). Recurrent neural network for text classication with multi-task
learning. arXiv preprint arXiv:1605.05101.
Mikolov, T., Kombrink, S., Burget, L., Černock, J., & Khudanpur, S (2011, May 2227). Extensions of
recurrent neural network language model. IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), Prague, Czech Republic (pp. 55285531). IEEE. https://doi.org/
10.1109/ICASSP.2011.5947611
Muhammad, A., Wiratunga, N., & Lothian, R. (2016). Contextual sentiment analysis for social media
genres. Knowledge-based Systems,108,92101. https://doi.org/10.1016/j.knosys.2016.05.032
Nallapati, R., Zhai, F., & Zhou, B (2016). Summarunner: A recurrent neural network based sequence
model for extractive summarization of documents. arXiv preprint arXiv:1611.04230.
Pang, X., Zhou, Y., Wang, P., Lin, W., & Chang, V. (2020). An innovative neural network approach for
stock market prediction. The Journal of Supercomputing,76(3), 20982118. https://doi.org/10.
1007/s11227-017-2228-y
Wang, T., Lu, K., Chow, K. P., & Zhu, Q. (2020). COVID-19 sensing: Negative sentiment analysis on
social media in China via bert model. IEEE Access,8, 138162138169. https://doi.org/10.1109/
Access.6287639
20 LÁSZLÓ NEMES AND A. KISS
... Headlines of financial news also regarding world-known companies were collected from Finviz in [9]. VADER, TextBlob, BERT and a recurrent neural network were compared in their extraction of the sentiment to be correlated with the market moves. ...
... Firstly, we should employ other means for establishing the ground truth of the sentiment scores for the texts. Besides VADER, other options like TextBlob and BERT [9] are intended to be used in the future and the results from all will be compared. ...
... It was also observed that lexicon-based techniques have been utilized in a number of sectors to extract a score from textual data and identify the most positive and negative sentences, including users' tweets during Covid-19 about online learning (Mujahid et al., 2021), tweets (Bhaumik and Yadav, 2021), news (Nemes & Kiss, 2021), stock market (Oliveira et al., 2016), but not in shopping apps reviews. As a result, in the current study, we used two lexicon-based algorithms to find sentiment ratings and top positive and unfavorable reviews. ...
Chapter
The goal of this study is to apply machine learning (ML) approaches to assess user sentiment and predict review ratings for Bangladeshi shopping apps. The data for this study was obtained from the Google Play Store reviews of 15 Bangladeshi shopping apps. The AFINN and VADER sentiment algorithms were used to assess the filtered summary phrases as positive, neutral, or negative sentiments after cleaning. The present study additionally employed five supervised machine learning approaches to divide users' assessments of shopping apps into three sentiment groups. According to the findings of this survey, the majority of ratings for shopping apps were positive. While all five machine learning approaches (SVC, k-neighbors classifier, logistic regression, decision tree classifier, and random forest classifier) can properly categorize review text into sentiment classes, the random forest classifier outperforms in terms of high accuracy. This study adds to the literature on customer sentiment and aids app marketers in understanding how consumers feel about apps.
... The results revealed that the BERT-based model achieved an agreement rate of 71%, 10% higher than VADER that achieved an agreement score of 61%. The results are aligned with results from past studies conducted by other researchers (Crocamo, Viviani, Famiglini, Bartoli, Pasi and Carrà, 2021;Nemes and Kiss, 2021). Therefore, we decided to use the BERT-based algorithm for our sentiment analysis experiments. ...
Preprint
Full-text available
Current research on users` perspectives of cyber security and privacy related to traditional and smart devices at home is very active, but the focus is often more on specific modern devices such as mobile and smart IoT devices in a home context. In addition, most were based on smaller-scale empirical studies such as online surveys and interviews. We endeavour to fill these research gaps by conducting a larger-scale study based on a real-world dataset of 413,985 tweets posted by non-expert users on Twitter in six months of three consecutive years (January and February in 2019, 2020 and 2021). Two machine learning-based classifiers were developed to identify the 413,985 tweets. We analysed this dataset to understand non-expert users` cyber security and privacy perspectives, including the yearly trend and the impact of the COVID-19 pandemic. We applied topic modelling, sentiment analysis and qualitative analysis of selected tweets in the dataset, leading to various interesting findings. For instance, we observed a 54% increase in non-expert users` tweets on cyber security and/or privacy related topics in 2021, compared to before the start of global COVID-19 lockdowns (January 2019 to February 2020). We also observed an increased level of help-seeking tweets during the COVID-19 pandemic. Our analysis revealed a diverse range of topics discussed by non-expert users across the three years, including VPNs, Wi-Fi, smartphones, laptops, smart home devices, financial security, and security and privacy issues involving different stakeholders. Overall negative sentiment was observed across almost all topics non-expert users discussed on Twitter in all the three years. Our results confirm the multi-faceted nature of non-expert users` perspectives on cyber security and privacy and call for more holistic, comprehensive and nuanced research on different facets of such perspectives.
... To perform SA, several lexicons are accessible Customer sentiment analysis (Preethi et al., 2015); lexicon-based techniques are simple to apply, and lexicon-based approaches have been used in many recent and similar studies (Machov a et al., 2020;Yang et al., 2020). Additionally, it was discovered that lexicon-based approaches have been used in a variety of fields to gain the score from the textual data and find top positive and negative sentences, including the tweets of a general user (Bhaumik and Yadav, 2021;Oyebode), tweets about online education during COVID-19 (Mujahid et al., 2021), stock market (Oliveira et al., 2016), news (Nemes and Kiss, 2021), e-mails (Borg and Boldt, 2020) and halal food (Mostafa, 2018;Mostafa, 2020), but not halal restaurants. Hence, in the current work, we used two lexicon-based approaches to find sentiment scores and top positive and negative reviews of halal restaurants. ...
Article
Purpose There is a strong prerequisite for organizations to analyze customer review behavior to evaluate the competitive business environment. The purpose of this study is to analyze and predict customer reviews of halal restaurants using machine learning (ML) approaches. Design/methodology/approach The authors collected customer review data from the Yelp website. The authors filtered the reviews of only halal restaurants from the original data set. Following cleaning, the filtered review texts were classified as positive, neutral or negative sentiments, and those sentiments were scored using the AFINN and VADER sentiment algorithms. Also, the current study applies four machine learning methods to classify each review toward halal restaurants into its sentiment class. Findings The experiment showed that most of the customer reviews toward halal restaurants were positive. The authors also discovered that all of the methods (decision tree, linear support vector machine, logistic regression and random forest classifier) can correctly classify the review text into sentiment class, but logistic regression outperforms the others in terms of accuracy. Practical implications The results facilitate halal restaurateurs in identifying customer review behavior. Social implications Sentiment and emotions, according to appraisal theory, form the basis for all interactions, facilitating cognitive functions and supporting prospective customers in making sense of experiences. Emotion theory also describes human affective states that determine motives and actions. The study looks at how potential customers might react to a halal restaurant’s consensus on social media based on reviewers’ opinions of halal restaurants because emotions can be conveyed through reviews. Originality/value This study applies machine learning approaches to analyze and predict customer sentiment based on the review texts toward halal restaurants.
... Users' tweets about online learning (Mujahid et al., 2021), tweets (Bhaumik and Yadav, 2021), news (Nemes & Kiss, 2021), and stock market (Oliveira et al., 2016) were among the sectors where lexiconbased techniques were used to extract a score from textual data and identify the most positive and negative sentences, but not in blended learning platform app reviews. As a consequence, we employed two lexicon-based algorithms to determine sentiment ratings and the top positive and negative reviews in the current study. ...
Chapter
Understanding how to assess the learners' evaluation has become an essential topic for both academics and practitioners as blended mobile learning applications have proliferated. This study examines users' sentiment and predicts the review rating of the blended learning platform app using machine learning (ML) techniques. The data for this study came from Google Play Store reviews of the Google Classroom app. The VADER and AFINN sentiment algorithms were used to determine if the filtered summary sentences were positive, neutral, or negative. In addition, five supervised machine learning algorithms were used to differentiate user evaluations of the Google Classroom app into three sentiment categories in the current study. According to the results of this investigation, the majority of reviews for this app were negative. While all five machine learning algorithms are capable of correctly categorizing review text into sentiment ratings, the random logistic regression outperforms in terms of accuracy.
... In [36] researchers demonstrate how economic news headlines can influence stock market fluctuations. They used BERT as a benchmark and correlate the sentiment results to stock fluctuations over the same period using three other tools such as VADER, TextBlob, and a Recurrent Neural Network. ...
Conference Paper
Due to market volatility, forecasting stock market trend is one of the most complicated tasks. There is a heated debate about the effectiveness of predicting market movement based on public sentiments expressed in written news, whether through social media, web pages, or financial news. However, past researches ignore a crucial source of knowledge, which is videos news, that is typically introduced by market specialists. This research examined the reliability of using the sentiment of video news sites to forecast the stock price. To determine the strength of the causal relationship between stock market prices and video sentiments, we applied Granger causality analysis and Pearson correlation coefficient tests. We also investigated the use of TextBlob API versus the efficiency of Google Cloud Natural Language API to find sentiment polarity scores for video news. Various models were evaluated for Sentiment Analysis of S&P 500 stock using LR, SVM, LSTM, and CNN models. Finally, we utilized the most effective sentiment analysis tool to train our ML classification model. This research is unique because it identifies and tests the question. Can we build an effective prediction model based on video news sentiment or can we add video news sentiment as a new feature to our future prediction model? The experimental findings demonstrate that there is a causal connection between video news sentiment and stock market fluctuation. The findings also revealed that when using the Google Cloud Natural Language API for sentiment analysis, the model showed a correlation between the video news and the company's price movements. Index Terms-market prediction, video news, machine learning, causality test.
... The models to perform sentiment analysis range from lexicon based approaches to sequence-to-sequence models to transformers in the current era. The proposed model in this paper uses a modified version of BERT(Bi-Directional Encoder Representations from Transformers), the state of the art model for NLP related tasks [13,31]. ...
Preprint
Full-text available
The stock market has been a popular topic of interest in the recent past. The growth in the inflation rate has compelled people to invest in the stock and commodity markets and other areas rather than saving. Further, the ability of Deep Learning models to make predictions on the time series data has been proven time and again. Technical analysis on the stock market with the help of technical indicators has been the most common practice among traders and investors. One more aspect is the sentiment analysis - the emotion of the investors that shows the willingness to invest. A variety of techniques have been used by people around the globe involving basic Machine Learning and Neural Networks. Ranging from the basic linear regression to the advanced neural networks people have experimented with all possible techniques to predict the stock market. It's evident from recent events how news and headlines affect the stock markets and cryptocurrencies. This paper proposes an ensemble of state-of-the-art methods for predicting stock prices. Firstly sentiment analysis of the news and the headlines for the company Apple Inc, listed on the NASDAQ is performed using a version of BERT, which is a pre-trained transformer model by Google for Natural Language Processing (NLP). Afterward, a Generative Adversarial Network (GAN) predicts the stock price for Apple Inc using the technical indicators, stock indexes of various countries, some commodities, and historical prices along with the sentiment scores. Comparison is done with baseline models like - Long Short Term Memory (LSTM), Gated Recurrent Units (GRU), vanilla GAN, and Auto-Regressive Integrated Moving Average (ARIMA) model.
Article
Sentiment Analysis deals with the computational treatment of opinions of expressed in written handbooks. The addition of the formerly mature semantic technologies to the field has proven to increase the results delicacy. In this a semantically- enhanced methodology for the reflection of sentiment opposition in fiscal news is presented. The term" Sentiment Analysis" was first defined in 2003 by Nasukawa and Yi as “ determining the subjectivity opposition ( positive or negative) and opposition strength ( explosively positive, mildly positive, weakly positiveetc.) of a given review textbook; in other words- determining the opinion of the pen.” Turney’s pioneering work on Sentiment Analysis applied an unsupervised approach to classify review data into positive class and negative class. The sum aggregate of information entered by the investors is reflected through the stock price of the enterprises. Through this process, information is converted from a textual form to a numerical form. This process of conversion is veritably useful, because it allows information to be fluently epitomized and enables us to compare the sentiments of news with the request returns. There may be variations about the exact meaning of a piece of news, but there can not be any variation about request returns. The fiscal news that makes a positive impact on the stock request returns is good and the bone that makes a negative impact on stock request returns is bad. In comparison to the work done in sentiment bracket applied to the review sphere or product reviews, veritably little work has been done in the field of operation of these ways in the fiscal sphere using unsupervised approach. This paper tries to address this exploration gap. The overall purpose of the study is to propose a semantic exposure grounded unsupervised approach for chancing sentiments strength of fiscal textbook.
Conference Paper
Full-text available
The COVID-19 pandemic has signified the interconnected nature of our world demonstrating that no one is safe until everyone is safe. The social and economic turmoil caused by the pandemic is devastating and revealing a dramatic loss of human life worldwide and presents a prodigious challenge to food systems, public health, and work worldwide. The vaccination programs are of utmost priority for every institution but there is a clear divide among people on efficacies and application of the offered vaccines. Today, the world has access to high-performance wireless internet due to 5G technologies which can enable systems to fetch billions of records from social media within a blink of an eye. The internet revolution has opened a new door of opportunities. This study aims to come up with a system that can utilize 5G technologies to access the data from social media to create awareness, prevent and control the impact of the pandemic by assessing the people's sentiments towards the COVID vaccines. People's sentiments are classified from not afraid to afraid divulging a total of three classes. The dataset is extracted from Twitter. The study has three main objectives 1) data collection and preprocessing 2) analyzing public sentiments, 3) evaluating the performance of Machine Learning (ML) classifiers. The results show that majority of people belong to the neutral class which indicates that they are still doubtful if they should be vaccinated or not. There is an urgent need for vaccine awareness programs to prevent COVID.
Article
This study explores the initial impact of COVID-19 sentiment on the US stock market using big data. Using the Daily News Sentiment Index (DNSI) and Google Trends data on coronavirus-related searches, this study investigates the correlation between COVID-19 sentiment and 11 selected sector indices of the United States (US) stock market over the period from 21 January 2020 to 20 May 2020. While extensive research on sentiment analysis for predicting stock market movement uses Twitter data, little of it has used DNSI or Google Trends data. In addition, this study examines whether changes in DNSI predict US industry returns differently by estimating a time series regression model with the excess returns of each industry as the dependent variable. The excess returns are obtained from the Fama-French three-factor model. The results of this study offer a comprehensive view of the initial impact of COVID-19 sentiment on the US stock market by industry, and furthermore suggest strategic investment planning that takes time lags into account, by visualizing changes in the correlation level across different time lags.
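The time-lag idea above amounts to correlating a sentiment series on day t with returns on day t + k for several lags k. A minimal sketch, assuming two aligned daily series of equal length; the function names and the toy data are hypothetical and not taken from the study, which also used regression on Fama-French excess returns.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(sentiment: Sequence[float], returns: Sequence[float],
                       lag: int) -> float:
    """Correlate sentiment on day t with returns on day t + lag (lag >= 0)."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if len(sentiment) != len(returns):
        raise ValueError("series must be aligned and of equal length")
    if lag == 0:
        return pearson(sentiment, returns)
    return pearson(sentiment[:-lag], returns[lag:])


if __name__ == "__main__":
    # Toy data: returns simply echo half of the previous day's sentiment.
    sentiment = [0.1, -0.3, 0.5, 0.2, -0.4, 0.6, 0.0]
    returns = [0.0] + [0.5 * s for s in sentiment[:-1]]
    for k in range(3):
        print(k, round(lagged_correlation(sentiment, returns, k), 3))
```

Scanning k and plotting the resulting coefficients is one simple way to visualize how the correlation level changes with the time lag.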
Article
Coronavirus disease 2019 (COVID-19) poses massive challenges for the world. Public sentiment analysis during the outbreak provides insightful information for making appropriate public health responses. On Sina Weibo, a popular Chinese social media platform, posts with negative sentiment are valuable for analyzing public concerns. A total of 999,978 randomly selected COVID-19-related Weibo posts from 1 January 2020 to 18 February 2020 are analyzed. Specifically, the unsupervised BERT (Bidirectional Encoder Representations from Transformers) model is adopted to classify sentiment categories (positive, neutral, and negative), and the TF-IDF (term frequency-inverse document frequency) model is used to summarize the topics of posts. Trend analysis and thematic analysis are conducted to identify characteristics of negative sentiment. In general, the fine-tuned BERT performs sentiment classification with considerable accuracy. In addition, topics extracted by TF-IDF precisely convey the characteristics of posts regarding COVID-19. As a result, we observed that people are concerned with four aspects of COVID-19: the virus origin (Gamey Food, 3.08%; Bat, 2.70%; Conspiracy Theory, 1.43%), symptoms (Fever, 2.13%; Cough, 1.19%), production activity (Go to Work, 1.94%; Resume Work, 1.12%; School New Semester Beginning, 1.06%), and public health control (Temperature Taking, 1.39%; Coronavirus Cover-up, 1.26%; City Shutdown, 1.09%). The results from Weibo posts offer constructive insights for public health responses, suggesting that transparent information sharing and scientific guidance might help alleviate public concerns.
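TF-IDF surfaces topic words by rewarding terms that are frequent in a post but rare across the collection. The sketch below is one common variant (tf = count / document length, idf = ln(N / df)) with naive whitespace tokenization; the study's exact weighting and its Chinese-text segmentation are not given in the abstract, so both are assumptions here, and the toy posts are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Per-document TF-IDF with tf = count / doc length and idf = ln(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # each document counts a term at most once
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        scores.append({term: (c / length) * math.log(n_docs / df[term])
                       for term, c in counts.items()})
    return scores


def top_terms(docs: List[str], k: int = 3) -> List[str]:
    """Sum TF-IDF over all documents and return the k highest-scoring terms."""
    totals = Counter()
    for doc_scores in tfidf(docs):
        totals.update(doc_scores)  # Counter.update adds the float scores
    return [term for term, _ in totals.most_common(k)]


if __name__ == "__main__":
    posts = ["covid fever cough", "covid work resume", "covid city shutdown"]
    print(top_terms(posts, 3))  # "covid" appears everywhere, so it scores 0
```

A term that occurs in every post gets idf = ln(1) = 0, which is why ubiquitous words drop out of the summary.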
Article
Microblogging platforms like Twitter can convey short messages to direct contacts, but also to other potentially interested users. They are actively used both by individual users and by whole organizations and companies. This paper describes some results we obtained from the Social Network and Sentiment Analysis of a Twitter channel related to a pop music event. Apart from the particular results, a methodology and some guidelines for the automatic classification of Twitter content are discussed.
Article
In this study, an attempt has been made to support financial decisions such as stock market prediction, that is, predicting the potential prices of a company's stock. To serve this need, Twitter data has been used to score the impression that the public holds of a particular firm. Streaming data proves to be a perennial source for data analysis, since it is collected in real time. Streaming data deals with the continuous flow of data carrying information from sources such as websites, mobile phone applications, server logs, social websites, and trading floors. The major characteristics of such data, namely its accessibility and availability, help in the proper analysis and prediction of user behavior in a ceaseless manner. A classification model built from historical data can be continuously refined to give even more accurate results, since its outcome is always compared with the next tick of the clock. Spark Streaming has been used to process the huge volume of data, and data ingestion tools such as the Twitter API and Apache Flume have been implemented for the analysis.
Article
This paper aims to develop an innovative neural network approach to achieve better stock market predictions. Data were obtained from the live stock market for real-time and offline analysis, and the results of visualizations and analytics are used to demonstrate the Internet of Multimedia Things (IMMT) for stock analysis. When studying the influence of market characteristics on stock prices, traditional neural network algorithms may predict the stock market incorrectly, since the random selection of initial weights makes them prone to incorrect predictions. Building on the development of word vectors in deep learning, we demonstrate the concept of a "stock vector." The input is no longer a single index or a single stock index, but multi-stock high-dimensional historical data. We propose a deep long short-term memory (LSTM) neural network with an embedded layer and an LSTM neural network with an automatic encoder to predict the stock market. In these two models, we use the embedded layer and the automatic encoder, respectively, to vectorize the data, in a bid to forecast the stock via the LSTM network. The experimental results show that the deep LSTM with the embedded layer performs better. Specifically, the accuracies of the two models are 57.2% and 56.9%, respectively, for the Shanghai A-shares composite index, and 52.4% and 52.5%, respectively, for individual stocks. We demonstrate research contributions in IMMT for neural-network-based financial analysis.
Article
Stock market prediction has become an attractive research topic due to its important role in the economy and the benefits it offers. There is an imminent need to uncover the future behavior of the stock market in order to avoid investment risks. The large amount of data generated by the stock market is considered a treasure of knowledge for investors. This study aims to construct an effective model that predicts future stock market trends with a small error ratio and improves prediction accuracy. The prediction model is based on sentiment analysis of financial news and historical stock market prices. This model provides better accuracy than all previous studies by considering multiple types of news related to the market and the company, together with historical stock prices. A dataset containing stock prices from three companies is used. The first step analyzes news sentiment to obtain the text polarity using the naïve Bayes algorithm; this step achieved prediction accuracy ranging from 72.73% to 86.21%. The second step combines news polarities and historical stock prices to predict future stock prices, which improved the prediction accuracy to as much as 89.80%.
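The first step above, assigning a polarity to news text with naïve Bayes, can be sketched as a multinomial naïve Bayes classifier with Laplace smoothing. The abstract does not specify the variant, features, or preprocessing, so the class below, its whitespace tokenization, the choice to ignore unseen words, and the toy headlines are all assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import List


class NaiveBayesPolarity:
    """Multinomial naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayesPolarity":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Total number of tokens seen per class.
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_logp = None, -math.inf
        for c in self.class_counts:
            logp = math.log(self.class_counts[c] / n_docs)  # log prior
            for tok in text.lower().split():
                if tok not in self.vocab:
                    continue  # unseen words are ignored in this sketch
                logp += math.log((self.word_counts[c][tok] + self.alpha)
                                 / (self.totals[c] + self.alpha * v))
            if logp > best_logp:
                best, best_logp = c, logp
        return best


if __name__ == "__main__":
    model = NaiveBayesPolarity().fit(
        ["profits beat expectations", "shares surge on strong earnings",
         "company reports heavy losses", "shares plunge after weak guidance"],
        ["positive", "positive", "negative", "negative"],
    )
    print(model.predict("strong profits"), model.predict("heavy losses"))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.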
Conference Paper
Recently, a technique called Layer-wise Relevance Propagation (LRP) was shown to deliver insightful explanations in the form of input space relevances for understanding feed-forward neural network classification decisions. In the present work, we extend the usage of LRP to recurrent neural networks. We propose a specific propagation rule applicable to multiplicative connections as they arise in recurrent network architectures such as LSTMs and GRUs. We apply our technique to a word-based bi-directional LSTM model on a five-class sentiment prediction task, and evaluate the resulting LRP relevances both qualitatively and quantitatively, obtaining better results than a gradient-based related method which was used in previous work.
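The two ingredients mentioned above can be sketched in a few lines: the epsilon rule that redistributes relevance through a linear layer in proportion to each input's contribution, and a rule for multiplicative connections in which the signal receives all the relevance and the gate receives none. This is a simplified sketch, not the authors' code: the weight layout, the stabilizer value, and dropping the bias's share of relevance are assumptions.

```python
from typing import List, Sequence, Tuple


def lrp_linear(z_in: Sequence[float], weights: Sequence[Sequence[float]],
               bias: Sequence[float], relevance_out: Sequence[float],
               eps: float = 1e-3) -> List[float]:
    """Epsilon-LRP for z_out[j] = sum_i z_in[i] * weights[i][j] + bias[j].

    The bias share of relevance is simply dropped here (a simplification).
    """
    n_in, n_out = len(z_in), len(relevance_out)
    z_out = [sum(z_in[i] * weights[i][j] for i in range(n_in)) + bias[j]
             for j in range(n_out)]
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        # Stabilizer: push the denominator away from zero, keeping its sign.
        denom = z_out[j] + (eps if z_out[j] >= 0 else -eps)
        for i in range(n_in):
            relevance_in[i] += z_in[i] * weights[i][j] / denom * relevance_out[j]
    return relevance_in


def lrp_multiplicative(relevance_out: float) -> Tuple[float, float]:
    """For z = gate * signal: returns (gate relevance, signal relevance)."""
    return 0.0, relevance_out


if __name__ == "__main__":
    r_in = lrp_linear([1.0, 2.0], [[0.5, -1.0], [0.25, 1.0]], [0.0, 0.0], [1.0, 2.0])
    print(r_in, sum(r_in))  # the sum stays close to the 3.0 that flowed in
```

With zero bias and a small epsilon, relevance is approximately conserved from layer to layer, which is the property that makes the resulting word-level relevances interpretable.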
Conference Paper
Predicting closing stock prices accurately is a challenging task. Computer-aided systems such as the Artificial Neural Network (ANN) and the Adaptive Neuro Fuzzy Inference System (ANFIS) have proved to be helpful tools for stock prediction. Recent research shows that ANFIS gives better results than neural networks for stock prediction. In this paper, an improved Levenberg-Marquardt (LM) training algorithm for artificial neural networks is proposed. Given historical stock market data from the Dhaka Stock Exchange, such as the opening price, highest price, lowest price, and total shares traded, the improved LM algorithm can predict the likely day-end closing stock price with less memory and time. Moreover, the improved LM algorithm predicts the day-end stock price with 53% less error than ANFIS and traditional LM. It also requires 30% less time and 54% less memory than traditional LM, and 47% less time and 59% less memory than ANFIS.
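For readers unfamiliar with Levenberg-Marquardt, the sketch below shows the textbook update, solving (JᵀJ + μI)δ = Jᵀr and adapting the damping μ after each step, on a two-parameter curve fit y = a·exp(b·x). It is not the paper's improved variant and it does not train a neural network; the model, the starting point, the damping schedule, and the synthetic data are assumptions chosen so the example stays small.

```python
import math
from typing import Sequence, Tuple


def model(x: float, a: float, b: float) -> float:
    return a * math.exp(b * x)


def sse(xs: Sequence[float], ys: Sequence[float], a: float, b: float) -> float:
    """Sum of squared residuals."""
    return sum((y - model(x, a, b)) ** 2 for x, y in zip(xs, ys))


def levenberg_marquardt(xs, ys, a: float, b: float, mu: float = 1e-2,
                        iters: int = 200, tol: float = 1e-12) -> Tuple[float, float]:
    cost = sse(xs, ys, a, b)
    for _ in range(iters):
        # Accumulate J^T J (2x2, symmetric) and J^T r for the current parameters.
        a11 = a12 = a22 = g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            ja, jb = e, a * x * e  # partial derivatives of the model w.r.t. a, b
            r = y - a * e          # residual
            a11 += ja * ja
            a12 += ja * jb
            a22 += jb * jb
            g0 += ja * r
            g1 += jb * r
        # Solve (J^T J + mu I) delta = J^T r by Cramer's rule.
        m11, m22 = a11 + mu, a22 + mu
        det = m11 * m22 - a12 * a12
        d0 = (g0 * m22 - a12 * g1) / det
        d1 = (m11 * g1 - a12 * g0) / det
        new_cost = sse(xs, ys, a + d0, b + d1)
        if new_cost < cost:
            a, b = a + d0, b + d1
            improvement = cost - new_cost
            cost = new_cost
            mu /= 10.0  # step accepted: behave more like Gauss-Newton
            if improvement < tol:
                break
        else:
            mu *= 10.0  # step rejected: behave more like gradient descent
            if mu > 1e15:
                break
    return a, b


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    ys = [2.0 * math.exp(0.5 * x) for x in xs]  # synthetic, noise-free data
    print(levenberg_marquardt(xs, ys, 1.0, 0.0))  # should approach (2.0, 0.5)
```

The damping term is what lets LM interpolate between fast Gauss-Newton steps near a minimum and cautious gradient-like steps far from it; training a network with LM applies the same update to the network weights, with J taken over the network's outputs.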
Article
We present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model for extractive summarization of documents, and show that it achieves performance better than or comparable to the state of the art. Our model has the additional advantage of being very interpretable, since it allows visualization of its predictions broken up by abstract features such as information content, salience, and novelty. Another novel contribution of our work is the abstractive training of our extractive model, which can train on human-generated reference summaries alone, eliminating the need for sentence-level extractive labels.