Predicting the Stock Market Behavior Using Historic Data Analysis and News Sentiment Analysis in R

A. C. Jishag, A. P. Athira, Muchintala Shailaja and S. Thara
Abstract Predicting the stock market has always been an attractive topic, mainly because of its importance to the economic and financial sectors. Yet stock market prediction remains a challenging exercise, even for the brightest and sharpest minds in the business, because of the complexity and dynamic character of the data it deals with. The large volume of data generated by the stock market is considered a treasure house of knowledge for investors, and several studies have attempted to predict stock market trends from it. Uncovering the behavior of stock market data is therefore imperative if investors are to avoid future investment risks. Here we take a different approach to this problem by combining two components: sentiment analysis of stock-related news reports and historic data analysis. The primary aim of this study was to construct an efficient model that predicts stock market trends with the minimum possible error and the maximum possible accuracy. The model achieved notably better accuracy than the models created in previous studies. Two datasets were used: a historical dataset containing the stock values of over ten companies in previous years, and a sentiment dataset containing stock market news reports from social media and other online sources. The first step was to analyze the stock reports and classify each as having positive or negative sentiment, using the lexicon method of text sentiment classification. Predictions at this stage achieved an accuracy of 67.14%. The second step used the ts and ARIMA functions to predict the stock trend from the historical dataset. In the final step, the results of both components were combined to predict future stock prices, which improved the prediction accuracy to 89.80%.
Keywords Historic data analysis · Lexicon method · Sentiment analysis · Stock market prediction
A. C. Jishag · A. P. Athira · M. Shailaja · S. Thara
Amrita Vishwa Vidyapeetham, Amritapuri, India
e-mail: jishagac@gmail.com
S. Thara
e-mail: thara.amrita06@gmail.com
© Springer Nature Singapore Pte Ltd. 2020
A. K. Luhach et al. (eds.), First International Conference on Sustainable Technologies
for Computational Intelligence, Advances in Intelligent Systems and Computing 1045,
https://doi.org/10.1007/978-981-15-0029-9_56
1 Introduction
The stock market, also known as a share market or an equity market, is a digital marketplace where sellers and buyers from various parts of the globe meet for economic transactions. These transactions are carried out in discrete units of stocks or shares, which represent ownership claims on business organizations. The shares may be securities listed on public stock exchanges, or they may be traded privately. Examples of privately traded shares include those of private companies, which are sold to investors or buyers through various equity crowdfunding platforms.
The complex behavior and volatility of the data make decision-making in the stock market tedious work for investors. This points to an important need to explore and uncover the large amount of valuable data generated every day by the stock market. Investors look for ways of predicting future stock prices, because such predictions help them determine the best time to carry out a transaction, and a transaction carried out at an opportune time can yield the best return on an investment. Since the latter part of the last century, stock trading has been carried out either digitally or physically. Stock market prediction gives investors strong insights into future stock behavior and so helps them make investment decisions.
Stock market prediction has always been of immense interest to investors. Stock markets produce large amounts of data, which are considered a treasure of knowledge for investors. This volume is also the main difficulty in predicting the stock market: stock values are vulnerable to any incident in the economic market, and prediction becomes more complicated when all of the factors affecting stock prices have to be taken into account. In this study, we aim to construct an effective and efficient model that predicts future stock market trends with the minimum error and the maximum possible accuracy. The prediction model is based on two components: sentiment analysis and historical data analysis. The sentiment analysis component finds the polarity of stock market-related news articles from various sources, and the historical data analysis component fits a function to the stock prices of previous years.
2 Literature Survey
Stock market prediction is a method of anticipating upcoming fluctuations in the stock prices of a company. Considerable research has gone into this field. Some studies focused on improving prediction accuracy through sentiment analysis of news related to stock trends, while others focused on predicting price differentials across different phases. Studies have reported a strong correlation between stock-related news and fluctuations in stock prices.
2.1 Studies that Rely on the Online Media Stock Information
Bing in [1] proposed an algorithm for accurately predicting stock price movements by studying social media data. Bing used several NLP routines together with data mining techniques and successfully deduced a relationship between public sentiment values and numeric stock prices in multilayer hierarchical structures. He also found a relationship between the lower layers and the top layer of the unstructured data.
Cara in [2] proposed a model to predict the Indonesian stock market from online articles using sentiment analysis. This study had three main objectives: predicting stock price fluctuations, computing margin percentages, and predicting future stock prices. He made use of five supervised learning algorithms: support vector machine (SVM), decision tree, random forest, naive Bayes, and neural networks. Cara observed that the random forest and naive Bayes algorithms performed much better than their counterparts. A conspicuous limitation of this study, however, was that the derived prediction model factored in only the stock prices of the five preceding records.
Hana and Hasan's approach in [3] predicted the direction of stock trends (a rise or a fall) by combining hourly reports from online Web sites and breaking news on financial channels with one-hour charts of stock prices. The research demonstrated that logistic regression together with keyword features performed well in this prediction.
2.2 Studies Related to News Analysis
Patrick et al. in [4] deployed several text mining techniques to perform sentiment analysis in the financial domain, integrating word-level resources to understand and study stock market reports. Their study focused on German-language text, which was analyzed for sentiment at various levels. Stock values were compared with the measured sentiment in order to identify investors, and the results were later used to publish investment recommendations intended to help avoid future investment risks.
Shynkevich in [5] used multiple kernel learning to investigate two different classes of news articles related to stock prices: stock-specific and sub-industry-specific articles. Shynkevich observed that these two classes, combined with historical stock prices, could enhance the accuracy of stock trend prediction. He made use of the open and close price attributes. He also showed that the SVM and K-NN algorithms led to sharp drops in prediction accuracy. Rule mining was also used to explore the stock market data and generate rules for predicting stock prices, and the naive Bayes algorithm was used in this study to predict the class labels.
Hoang and Phayung proposed a model in [11] to identify upcoming stock trends on stock exchanges in Vietnam, using news articles from various publications. The authors combined a general SVM with a linear SVM in their predictions, based on the closing stock prices.
Jageshwar and Shagufa in [7] superimposed financial news reports on the observed daily changes in stock prices. The main intent of this scheme was to increase prediction accuracy by combining elements of technical analysis with rule-based classifiers.
In [8], Ruchi and Gandhi conceived a model to predict trends in the stock market by employing non-quantifiable information from articles. The authors developed a behavioral model based on statistical parameters, which helped increase the prediction accuracy.
Saadi in [9] observed a relationship in economic news reports using time series methods. He applied ten time series methods to closing stock market values and then combined them with machine learning algorithms such as SVM and K-NN.
In [10], Kim studied stock market values with a focus on opinion mining. The study found a strong correlation between news reports and stock prices.
Abdulla in [6] scrutinized the Bangladesh stock market, deriving the information for the research with NLP and text mining techniques. This study used an information parser algorithm together with OpenNLP to familiarize investors with the risks of buying or selling a particular stock.
Thara and Krishna in [12] used random Fourier features for aspect-based sentiment identification. The study made a selective choice among the relevant features and showed a considerable improvement in the accuracy of detecting aspect-based polarities, with the model yielding an accuracy of 90%.
In [13], Thara and Sidharth focused on classifying data into different polarity categories and presented comparative classification results. The study concluded that SVM with polynomial and RBF kernels outperformed all other algorithms.
Veena et al. in [14] identified and characterized cells as malignant or non-malignant in association with mini-chromosomes, and assessed the potential of this association as a biomarker for malignant and benign lung tissue. They used the lexicon method as a factor in classification.
Priyanka et al. in [15] conducted a study on methanolic extracts of turmeric cultivars, which pointed to statistically significant radical scavenging activity. The authors established that one variety, Prathibha, displayed a very significant curing method. This study also relied on SVM and random forest routines for classification.
3 Proposed Method
The stock market is an ever-vibrant sector of financial management. Many Web sites and agencies promise highly accurate stock predictions to attract customers. In theory, however, predicting stock market performance is an impossible task: the sheer volume of stock transactions and the spontaneity of the factors affecting the data bear testimony to this. Anything and everything can affect stock prices, from unexpected changes in the business climate to the outlandish mood swings of capricious CEOs.
Several attempts have been made in the past to improve the accuracy of predicting stock prices and the overall performance of stock exchanges. Most of these models aimed to predict the future performance of stocks from the analysis of historic data, and none of them achieved a prediction accuracy beyond 70–75%. To reduce prediction errors, researchers turned to machine learning and to sentiment analysis of stock reports on social media. Such measures rarely achieved their objectives because of uncontrolled trolling, unverified reports on social media, and the polarized viewpoints of bloggers.
This paper describes a model that combines these two ideas: sentiment analysis of social media reports and analytics of historic stock prices. Figure 1 shows a step-by-step block diagram of the proposed algorithm.
Fig. 1 Data flow diagram of the proposed algorithm
3.1 Data Description
Data from over ten companies traded on the NASDAQ stock exchange in New York, USA, were used for this study. NASDAQ is one of the world's electronic marketplaces for buying and selling stocks and securities, and its composite index is widely regarded as a benchmark for US technology stocks. Two types of dataset were collected for this study: a news sentiment dataset and a historic numeric dataset. The historic dataset consists of stock-related numeric data from NASDAQ. The sentiment dataset was collected from online sources such as nasdaq.com, the Wall Street Journal, and Yahoo Finance, which post daily news about stock exchanges and the companies traded on them. The data taken from these sources were articles about stock dividends, company splits, mergers and acquisitions, and reports from financial experts. The news articles culled for the present research were specific to the companies selected for this study.
3.2 Proposed Model Description
This section describes details of each component of the model proposed in this paper.
The model consists of two components: sentiment analysis and historic data analysis.
(1) Sentiment Analysis: The sentiment analysis component analyzes stock-related news. Its primary objective is to classify each report as either positive or negative. Several preprocessing routines are applied to the stock reports, after which the news data are classified using the lexical approach. The preprocessing steps are described below:
Text preprocessing
Tokenization: The process of splitting the textual content of a document into atomic components called tokens. In this study, tokenization was used to split the stock reports into meaningful words.
Data Standardization: The process used in data analytics to bring all the data within a dataset into a common format. In this study, data standardization was used to convert all the words in the stock reports to lower case.
Stop-word removal: Certain words in stock reports, such as "the", "a", and "of", convey no significant meaning in the context of stock prices. Stop-word removal is the process of removing these words from the stock report or document.
Stemming: The process of removing suffixes such as -ed, -ing, and -ion to reduce a word to its root. This reduces the complexity of the document and the processing time, which improves the model's performance.
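The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the authors' R implementation. `STOP_WORDS` is a small hypothetical sample list. `naive_stem` is a hypothetical crude stripper for the suffixes named above, standing in for a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list (real lists are much longer).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "for", "is", "are", "was"}

# Suffixes mentioned in the text; the three-letter ones are tried first.
SUFFIXES = ("ing", "ion", "ed")


def naive_stem(word: str) -> str:
    """Strip one known suffix if a stem of at least 3 letters remains.

    A crude stand-in for a proper stemmer (e.g. Porter), for illustration only.
    """
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(report: str) -> list[str]:
    # Data standardization: convert everything to lower case.
    text = report.lower()
    # Remove punctuation and numbers: keep only letters and whitespace.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Tokenization: split on runs of whitespace (this also strips extra spaces).
    tokens = text.split()
    # Stop-word removal, then stemming of the remaining tokens.
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("Shares of the company jumped 5% after the merger was announced."))
# ['shares', 'company', 'jump', 'after', 'merger', 'announc']
```

As the output shows, stems such as "announc" need not be dictionary words. They only need to map related word forms to the same token.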
Lexicon Method There are two main approaches to building a sentiment analysis model: the machine learning approach and the lexical approach. In the machine learning approach, older posts and tweets from online platforms are used as a training set to train a model, and the upcoming stock trend is predicted by passing the latest tweets through this model. In the lexicon method, by contrast, a predefined library of English words, each categorized as either positive or negative, is used. Each word is assigned a weight according to its positivity or negativity in the context of the stock market. Each word in an online post or tweet is compared against these positive and negative lists, and the overall sentiment of the post is computed. The lexicon method is far simpler and faster to evaluate than the machine learning approach, although the latter provides a higher degree of accuracy. The algorithm discussed in this paper uses the simpler lexicon method.
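A minimal Python sketch of lexicon scoring follows. The paper does not publish its lexicon, so `LEXICON` and its weights are hypothetical. The tokens are assumed to be already tokenized and lower-cased, and they are left unstemmed here so that the lexicon can hold whole words. Each token's weight is summed, and the sign of the total gives the polarity.

```python
# Hypothetical stock-market lexicon: positive weights for bullish words,
# negative weights for bearish ones. Real lexicons hold thousands of entries.
LEXICON = {
    "gain": 1.0, "profit": 1.5, "growth": 1.0, "upgrade": 2.0, "beat": 1.0,
    "loss": -1.5, "decline": -1.0, "downgrade": -2.0, "lawsuit": -1.5, "miss": -1.0,
}


def sentiment_score(tokens):
    """Sum the lexicon weights of all tokens; unknown tokens contribute 0."""
    return sum(LEXICON.get(t, 0.0) for t in tokens)


def classify(tokens):
    """Label a report positive or negative from the sign of its score.

    A score of exactly 0 is labelled neutral here. The paper uses only two
    classes, so how it handles ties is an open assumption.
    """
    score = sentiment_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify(["analysts", "upgrade", "stock", "despite", "lawsuit"]))  # positive
print(classify(["quarterly", "loss", "and", "revenue", "decline"]))      # negative
```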
Data obtained from the online sources were preprocessed before being subjected to sentiment analysis. The raw data were first split into tokens (words) by tokenization. The tokens were then recast into a consistent format by converting all upper-case letters to lower case. Next, punctuation was handled and numbers were removed. Finally, extra white space was stripped, leaving the data ready for sentiment analysis.
N-Gram: This algorithm is applied to the preprocessed data just before sentiment analysis, to compensate for the low accuracy of the lexicon method. It extracts every sequence of n adjacent words (or letters) in the dataset, thereby enhancing the accuracy.
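The following Python sketch shows word n-gram extraction. It also shows one hypothetical way that bigrams can help a lexicon: a phrase entry such as "not profitable" overrides the positive weight of "profitable" on its own. The two lexicons and the phrase-first matching rule are assumptions for illustration, because the paper does not describe how its n-grams are scored.

```python
def ngrams(tokens, n):
    """Return every run of n adjacent tokens, in order, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical lexicons: "profitable" alone is positive,
# but the phrase "not profitable" is negative.
UNIGRAMS = {"profitable": 1.0, "strong": 1.0, "weak": -1.0}
BIGRAMS = {("not", "profitable"): -1.0}


def score_with_bigrams(tokens):
    """Score a token list, letting a matching bigram replace its two words."""
    score, i = 0.0, 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in BIGRAMS:
            score += BIGRAMS[pair]
            i += 2  # both words are consumed by the phrase
        else:
            score += UNIGRAMS.get(tokens[i], 0.0)
            i += 1
    return score


tokens = ["unit", "is", "not", "profitable"]
print(ngrams(tokens, 2))           # [('unit', 'is'), ('is', 'not'), ('not', 'profitable')]
print(score_with_bigrams(tokens))  # -1.0 (a unigram-only score would be +1.0)
```

If "not" were on the stop-word list, stop-word removal would delete the negation before the bigrams are formed. The content and order of the preprocessing steps therefore matter for this technique.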
Once the sentiment analysis routines have been applied, the positive and negative sentiment values are plotted, as shown in Fig. 2.
Fig. 2 Final result from sentiment analysis
(2) Historic Data Analysis: Historic data analysis is an indispensable part of stock market prediction. It processes the past stock trend and models a graph predicting the upcoming market trend. The algorithm discussed in this paper used the autoregressive integrated moving average (ARIMA) and time series (ts) functions in RStudio for this purpose.
Time Series Function The time series function, or ts function, is used to create time series objects. These are vectors or matrices with class "ts" that represent data sampled at equally spaced points in time. In this paper, the ts function takes the historic dataset and indexes the data by time. This helps in creating a function that follows the existing stock trend, and the resulting object is passed to the ARIMA function.
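The idea behind a ts object can be illustrated with a small Python analogue. The `TimeSeries` class below is hypothetical. It only mirrors what R's `ts(data, start, frequency)` records: the values, the time of the first observation, and the number of observations per unit of time. From these, the time of observation i follows as start + i/frequency. In R itself, monthly prices would be declared with a call such as `ts(prices, start = c(2015, 1), frequency = 12)`.

```python
from dataclasses import dataclass


@dataclass
class TimeSeries:
    """Values sampled at equally spaced points in time (cf. R's ts objects)."""
    values: list[float]
    start: float        # time of the first observation, e.g. 2015.0
    frequency: int = 1  # observations per unit of time: 12 = monthly, 4 = quarterly

    def times(self) -> list[float]:
        # Observation i (0-based) lies at start + i / frequency.
        return [self.start + i / self.frequency for i in range(len(self.values))]


quarterly = TimeSeries(values=[101.2, 103.5, 99.8, 104.1], start=2015.0, frequency=4)
print(quarterly.times())  # [2015.0, 2015.25, 2015.5, 2015.75]
```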
ARIMA Function ARIMA stands for autoregressive integrated moving average. It is a class of models that captures a suite of standard temporal structures in time series data. Several definitions of ARIMA exist, since it combines three components: autoregression (AR), integration (differencing, I), and the moving average (MA). Different definitions of ARMA models also use different signs for the AR and/or MA coefficients; the definition used here is
$$X_t = a_1 X_{t-1} + \cdots + a_p X_{t-p} + e_t + b_1 e_{t-1} + \cdots + b_q e_{t-q} \qquad (1)$$
where $p$ and $q$ are the orders of the AR and MA parts, $a_1, \dots, a_p$ and $b_1, \dots, b_q$ are their coefficients, and $e_t$ is a white-noise error term. For ARIMA models with differencing, the differenced series follows a zero-mean ARMA model. In this study, the ARIMA function is used to identify a function that can be plotted as a graph of the historic trend, as in Fig. 3.
Fig. 3 Forecasted output from the ARIMA function
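To make equation (1) and the role of differencing concrete, the Python sketch below handles the simplest non-trivial case, an ARIMA(1,1,0) model with p = 1, d = 1 and q = 0. The series is differenced once. A zero-mean AR(1) coefficient a1 is then estimated by least squares on the differences, and forecasts are integrated back into price levels. This is only an illustration with invented toy prices. R's `arima()` estimates parameters by maximum likelihood by default and supports arbitrary orders, including MA terms.

```python
def fit_arima_110(series):
    """Estimate a1 in d_t = a1 * d_{t-1} + e_t, where d_t = x_t - x_{t-1}.

    Zero-mean AR(1) on the once-differenced series, fitted by ordinary
    least squares with no intercept (matching the zero-mean assumption).
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(d_prev * d for d_prev, d in zip(diffs, diffs[1:]))
    den = sum(d_prev * d_prev for d_prev in diffs[:-1])
    if den == 0:
        return 0.0  # flat series: there is no autocorrelation to estimate
    return num / den


def forecast_arima_110(series, a1, steps):
    """Forecast future levels: predict the next differences, then integrate."""
    last_level = series[-1]
    last_diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        last_diff = a1 * last_diff           # expected next difference (e_t = 0)
        last_level = last_level + last_diff  # undo the differencing
        forecasts.append(last_level)
    return forecasts


prices = [100.0, 104.0, 106.0, 107.0, 107.5, 107.75]  # invented toy closing prices
a1 = fit_arima_110(prices)
print(a1)                                 # 0.5
print(forecast_arima_110(prices, a1, 2))  # [107.875, 107.9375]
```

Real price series rarely follow such an exact recursion. In R, `arima(x, order = c(1, 1, 0))` fits this model structure, and `predict()` on the fitted model returns the forecasts together with standard errors.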
Finally, the results of the sentiment analysis and the historic data analysis are combined and presented in a Shiny application. From the user's perspective, this offers an easy way to understand both the historic and the sentiment aspects of a company in one place.
4 Experimentation and Result
This section describes the results of the experiments performed in this study to predict stock market trends using sentiment analysis and historic data analysis. The trials fall into two phases. The initial phase covers the sentiment analysis component, which classified the stock reports. The subsequent phase covers the historic data analysis component, which processed the historic stock values and fitted a function to them. Stock values of ten companies were considered in both phases.
4.1 Results of Sentiment Analysis
Table 1 shows the results of the experiments on the sentiment analysis component. Each value is the cumulative prediction accuracy of the sentiment analysis component for one company.
The results of these experiments indicate that the model proposed in this paper achieves better accuracy than the stock market models from previous studies. Its execution time was also considerably lower than that of existing models. To our knowledge, no earlier sentiment analysis system for the stock market had surpassed an accuracy of 80%, and almost all stock market news sentiment analysis models reached accuracies in the range of 71–75%. We obtained accuracies ranging from 71.95 to 86.21%.
Three algorithms were compared on the datasets of the selected companies to ensure high accuracy in the results: the naive Bayes classifier, K-nearest neighbors (K-NN), and support vector machine (SVM).
The results of this comparison are summarized in Table 2, which gives the accuracy obtained by each classifier on each company's dataset. The results clearly show that the naive Bayes algorithm outperformed both the SVM classifier and the K-NN algorithm on textual data. The SVM classifier registered the lowest accuracy of the three on our experimental dataset.
Table 1 Sentiment analysis result for each company
Company        Yahoo   Msft    Fb      AAL     Apple   Adobe
Accuracy (%)   86.21   72.73   82.76   80.52   76.32   81.22
Table 2 Accuracy of each classifier on each dataset
Company   K-NN (%)   SVM (%)   Naive Bayes (%)
Yahoo     78.75      58.62     86.17
Msft      69.12      54.48     76.18
Fb        75.15      70.14     79.00
AAL       80.73      62.49     81.58
Apple     71.88      49.76     84.56
Adobe     70.01      60.09     76.11
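Naive Bayes gave the best accuracy on the textual data in Table 2. A compact multinomial naive Bayes classifier with Laplace (add-one) smoothing is therefore sketched below in plain Python. The tiny training set is invented purely for illustration and is not the paper's data. The paper also does not state which naive Bayes variant or smoothing it used.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit multinomial naive Bayes: log class priors and per-class word counts."""
    classes = set(labels)
    priors = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)
    vocab = {w for tokens in docs for w in tokens}
    return priors, counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log posterior (add-one smoothing)."""
    priors, counts, vocab = model
    best_class, best_score = None, -math.inf
    for c in sorted(priors):  # sorted for deterministic tie-breaking
        total = sum(counts[c].values())
        score = priors[c]
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Invented toy training data (already preprocessed token lists).
docs = [
    ["profit", "beat", "estimate"], ["record", "profit", "growth"],
    ["loss", "widen", "lawsuit"], ["shares", "fall", "loss"],
]
labels = ["positive", "positive", "negative", "negative"]
model = train_nb(docs, labels)
print(predict_nb(model, ["profit", "growth"]))  # positive
print(predict_nb(model, ["lawsuit", "loss"]))   # negative
```

Working in log probabilities avoids numerical underflow when a report contains many words.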
4.2 Results of Historic Data Analysis Component
Table 3 shows the prediction accuracy with and without the historic data component. The results show that historic data analysis gives a large boost to the accuracy of the prediction model. This suggests a strong relation between the history of the stock prices and the news reports about those stocks. Few researchers have combined these two components for stock price prediction, and most of them achieved accuracies ranging from 74 to 81% in predicting stock performance trends.
This study shows that the sentiment analysis component alone cannot identify the future direction of the stock market. The sentiment analysis component alone achieves an accuracy of 59.18%, whereas combining it with the analysis of past stock data achieved an accuracy of up to 89.79%. The prediction model mainly depends on the K-NN algorithm, which earlier proved to be the most efficient when dealing with textual and numeric data.
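The paper does not specify how the sentiment and historic outputs are encoded as K-NN input. One plausible reading is shown below as a hypothetical Python sketch. Each trading day is described by two features: the lexicon sentiment score of that day's news and the ARIMA-forecast change in price. Each day is labelled by whether the price actually went up or down. The feature choice, k = 3, and the training points are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to `query`.

    `train` is a list of ((sentiment_score, arima_forecast_change), label)
    pairs, and distances are Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training days: (news sentiment score, ARIMA forecast change).
train = [
    ((2.0, 0.8), "up"), ((1.5, 0.3), "up"), ((0.5, 1.0), "up"),
    ((-1.0, -0.5), "down"), ((-2.5, 0.2), "down"), ((-0.5, -1.2), "down"),
]
print(knn_predict(train, (1.2, 0.5)))    # up
print(knn_predict(train, (-1.5, -0.4)))  # down
```

The two features can have very different ranges. Standardizing them, for example to zero mean and unit variance, before computing distances is usually advisable so that neither feature dominates the vote.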
Thus, the results so far indicate that the model described in this paper is an effective and efficient way to predict future stock trends, with higher accuracy than existing models. To verify this, the accuracy of the proposed model is compared with that obtained by existing models and by models from previous research. The results in Table 4 demonstrate that our proposed model outperforms all the previous studies and the models currently in the market.
Table 3 Accuracy with and without the historic component for the proposed model
Measurement     Sentiment component alone   Sentiment and historic component
Accuracy (%)    67.14                       89.80
Table 4 Comparison of the proposed model with the existing models
Previous studies                  Obtained accuracy (%)
Model by Bing [1]                 76.12
Cara's model [2]                  60.39–67.73
Shynkevich's model [5]            79.59
Model proposed by Phayung [11]    75
Kim's model [10]                  60–65
Model proposed in this paper      89.80
5 Conclusion
This study investigated the effect of combining two types of analysis for predicting future stock trends: news sentiment analysis and historical stock data analysis.
The proposed model showed a considerable increase in prediction accuracy when the historic analysis was combined with the sentiment component. The sentiment analysis component used news reports about various companies collected from social media and other online platforms. The historical stock data component used each company's historical data trend for previous years. The first stage of the proposed model polarized the news reports as positive or negative using the lexicon method. The second stage predicted a future stock trend using the ARIMA and ts functions on the historic stock dataset for each company, obtained from NASDAQ. The outputs of the first and second stages were then used as input for predicting the future stock market with the K-NN algorithm. The proposed model achieved an accuracy of 89.80%. The results also point toward a strong relation between the sentiment of stock reports and the historic stock market trend. The model could be further refined by including machine learning or neural network algorithms to recognize the underlying emotions when classifying news as positive or negative.
References
1. Bing, L.I., Ou, C.: Public sentiment analysis in Twitter data for prediction of a company's stock price movements. In: IEEE 11th International Conference on e-Business Engineering (2014)
2. Yahya Eru Cara, B.D.T.: Stock price prediction using linear regression based on sentiment analysis. In: International Conference on Advanced Computer Science and Information Systems, pp. 147–154 (2015)
3. Hana Alostad, H.D.: Directional prediction of stock prices using breaking news on Twitter. In: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pp. 0–7 (2015)
4. Patrick Uhr, M.F., Zenkert, J.: Sentiment analysis in financial market. In: IEEE International Conference on Systems, Man, and Cybernetics, pp. 912–917 (2014)
5. Shynkevich, Y., McGinnity, T.M., Coleman, S., Belatreche, A.: Stock price prediction based on stock-specific and sub-industry-specific news articles (2015)
6. Abdullah, S.S., Rahaman, M.S., Rahman, M.S.: Analysis of stock market using text mining and natural language processing. In: 2013 International Conference on Informatics, Electronics and Vision, pp. 1–6 (2013)
7. Price, S.M., Shriwas, J., Farzana, S.: Using text mining and rule based technique for prediction. Int. J. Emerg. Technol. Adv. Eng. 4(1) (2014)
8. Desai, R.: Stock market prediction using data mining 1, 2(2), 2780–2784 (2014)
9. Journal, I., Social, O.F., Studies, H.: Time series analysis on stock market for text mining 6(1), 69–91 (2014)
10. Kim, Y., Jeong, S.R., Ghani, I.: Text opinion mining to analyze news for stock market prediction. Int. J. Adv. Soft Comput. Its Appl. 6(1), 1–13 (2014)
11. Thanh, H.T.P., Meesad, P.: Stock market trend prediction based on text mining of corporate web and time series data. J. Adv. Comput. Intell. Intell. Informatics 18(1) (2014)
12. Thara, S., Athul Krishna, N.S.: Aspect sentiment identification using random Fourier features. Int. J. Intell. Syst. Appl. 10(9), 32–39
13. Thara, S., Sidharth, S.: Aspect based sentiment classification: SVD features. In: 2017 International Conference on Advances in Computing, Communications and Informatics, ICACCI 2017, pp. 2370–2374 (2017)
14. Rafeek, R., Remya, R.: Detecting contextual word polarity using aspect-based sentiment analysis and logistic regression. In: ICSTM 2017 Proceedings, IEEE International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (2017)
15. Vijayan, V.K., Bindu, K.R., Parameswaran, L.: A comprehensive study of text classification algorithms. In: 2017 International Conference on Computing and Network Communications, CoCoNet 2015 (2015)
... The researchers at [32] conducted a study to validate the effectiveness of various models for complex wind speed forecasting and specifically explored using mixed deep learning algorithms applied to wind speed data. A further study that used electrical load forecasting data [33] demonstrated similar findings, as it evaluated and empirically analyzed the most efficient deep learning models suitable for predicting short-term load. ...
Article
Full-text available
This study uses real-world illustrations to explore the application of deep learning approaches to predict economic information. In this, we investigate the effect of deep learning model architecture and time-series data properties on prediction accuracy. We aim to evaluate the predictive power of several neural network models using a financial time-series dataset. These models include Convolutional RNNs, Convolutional LSTMs, Convolutional GRUs, Convolutional Bi-directional RNNs, Convolutional Bi-directional LSTMs, and Convolutional Bi-directional GRUs. Our main objective is to utilize deep learning techniques for simultaneous predictions on multivariable time-series datasets. We utilize the daily fluctuations of six Asian stock market indices from 1 April 2020 to 31 March 2024. This study’s overarching goal is to evaluate deep learning models constructed using training data gathered during the early stages of the COVID-19 pandemic when the economy was hit hard. We find that the limitations prove that no single deep learning algorithm can reliably forecast financial data for every state. In addition, predictions obtained from solitary deep learning models are more precise when dealing with consistent time-series data. Nevertheless, the hybrid model performs better when analyzing time-series data with significant chaos.
... Fluctuations in stock prices reflect not only the market's expectations of companies but also the market's expectations of the overall economic environment. Therefore, forecasting stock prices helps investors, policymakers, and market regulators to make more informed decisions [1]. Early forecasting methods mainly include fundamental analysis and technical analysis. ...
Article
Full-text available
With the rapid development of financial markets, accurate stock price prediction is significant to investors and financial institutions. Many researchers proposed stock price prediction models, including linear models, random forests, and LSTMs. However, few studies have comprehensively compared the three models. This study aims to fill this gap by analysing the forecasting effectiveness of different models through empirical studies. This research is to explore the application of linear models, random forests, and LSTM models in predicting stock prices and analyse and compare the principles, advantages and disadvantages, and the scope of application of these three models. According to the analysis, they all have their scope of application and limitations in different situations. In practical application, the appropriate model can be chosen for prediction and analysis according to the specific data sets and research purpose. Meanwhile, it is also possible to try to integrate and improve different models to get better prediction results. In addition, the influence of data quality and completeness, feature selection and extraction from the prediction results should be noted to improve the prediction accuracy and stability of the model. In conclusion, this thesis provides some references and lessons for related studies and practical applications by analysing and comparing the applications of LSTM, linear models, and random forests in predicting stock prices.
... The trust value of the source user v_s towards the destination user is derived by iteratively multiplying the trust values along the path. If several possible paths exist between these nodes, the path with the highest trust rating is preferred. To compute trust, [10] used the basic idea of how much two users A and B trust, or are trusted by, other users. It also shows how the trust a user places in its reciprocal neighbours differs from the trust it receives from them. ...
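The path-based trust rule in this excerpt (multiply trust along a path, prefer the path with the highest product) can be illustrated with a Dijkstra-style search. This is a minimal sketch, not the cited work's implementation: it assumes trust values lie in [0, 1] and are stored in a hypothetical adjacency dict. Because multiplying by a value in [0, 1] never increases the running product, always expanding the most-trusted frontier node yields the highest-trust path.

```python
import heapq


def best_trust_path(graph, source, target):
    """Return (trust, path) for the source-to-target path whose product of
    edge trust values is highest.

    graph: hypothetical adjacency dict {user: {neighbour: trust}}, with
    every trust value assumed to lie in [0, 1].
    """
    # heapq is a min-heap, so trust is negated to pop the highest first.
    heap = [(-1.0, source, [source])]
    best = {source: 1.0}
    while heap:
        neg_trust, user, path = heapq.heappop(heap)
        trust = -neg_trust
        if user == target:
            return trust, path
        if trust < best.get(user, 0.0):
            continue  # stale entry: a more trusted path to this user exists
        for neighbour, edge_trust in graph.get(user, {}).items():
            # Trust propagates by multiplication along the path.
            candidate = trust * edge_trust
            if candidate > best.get(neighbour, 0.0):
                best[neighbour] = candidate
                heapq.heappush(heap, (-candidate, neighbour, path + [neighbour]))
    return 0.0, []  # target unreachable


# Hypothetical trust network: A->B->D gives 0.45, A->C->D gives 0.54.
trust_graph = {
    "A": {"B": 0.9, "C": 0.6},
    "B": {"D": 0.5},
    "C": {"D": 0.9},
}
print(best_trust_path(trust_graph, "A", "D"))
```

The search returns the route through C, since 0.6 × 0.9 exceeds 0.9 × 0.5 even though the first hop to B is more trusted.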
... In this work, they tried to determine whether comments or reviews about a hotel posted by a user are fake or honest, whereas in the latter work, they tried to examine deceptive data within standard datasets. The work in [8] addresses stock market prediction. The works in [9], [10] and [12] belong to the Natural Language Processing domain. ...
... A. C. Jishag et al. suggest that the N-gram method can be applied before the Lexicon method in sentiment analysis to improve accuracy [7]. Thara S. et al. show that the detection of aspect-based sentiment polarities can be improved by mapping features to lower dimensions [8] and that Singular Value Decomposition can capture latent relationships in the data [9]. ...
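The idea of running an N-gram pass before lexicon scoring can be illustrated with a small sketch. Everything here is an assumption for illustration and not the method of [7]: the toy lexicons, the choice of bigrams as the N-gram size, and the rule that a matched bigram consumes both of its words so they are not scored again as unigrams.

```python
import re

# Hypothetical toy lexicons; real systems use curated word lists.
BIGRAM_LEXICON = {"not good": -1, "not bad": 1, "record high": 1, "sharp fall": -1}
UNIGRAM_LEXICON = {"good": 1, "gain": 1, "profit": 1, "bad": -1, "loss": -1, "fall": -1}


def lexicon_score(text):
    """Score text by matching bigrams first, then the remaining unigrams."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = 0
    i = 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and bigram in BIGRAM_LEXICON:
            score += BIGRAM_LEXICON[bigram]
            i += 2  # both words are consumed by the bigram
        else:
            score += UNIGRAM_LEXICON.get(tokens[i], 0)
            i += 1
    return score


def classify(text):
    """Map the score to a label; a zero score is treated as neutral (an assumption)."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("The stock is not good"))  # negative
print(classify("Sales were not bad"))     # positive
```

Without the bigram pass, "not good" would be scored through the unigram "good" alone and the first headline would come out positive, which is the kind of error an N-gram step is meant to catch.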
Conference Paper
Abstract—The ongoing pandemic has had many impacts on human life. Social media has become more popular during the pandemic, and people share their thoughts and emotions there. Because of social distancing norms and other preventive measures, people are connecting with each other through these platforms. This gave us an opportunity to study the overall psychological reaction of the public to the disease. In this paper, we use tweets associated with COVID-19 to measure region-wise sentiment, and these tweets are used to train the model. We then compute the average emotion conveyed in each region. This can help validate the psychological impact that COVID-19 had on people and help governments take effective action. The output, together with the COVID-19 guidelines and data, is presented on a website.
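Computing "the average emotion conveyed by each region" amounts to grouping per-tweet sentiment scores by region and averaging them. A minimal sketch, assuming each tweet has already been scored by the trained model on a hypothetical numeric scale and tagged with a hypothetical region label:

```python
from collections import defaultdict


def average_sentiment_by_region(scored_tweets):
    """Return {region: mean sentiment} from (region, score) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for region, score in scored_tweets:
        totals[region] += score
        counts[region] += 1
    return {region: totals[region] / counts[region] for region in totals}


# Hypothetical scores; in practice these would come from the trained classifier.
tweets = [("North", 0.5), ("North", -0.25), ("South", -0.5), ("South", -0.25)]
print(average_sentiment_by_region(tweets))
# {'North': 0.125, 'South': -0.375}
```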
Article
Diabetes is largely caused by elevated blood glucose levels. With the growth of machine learning techniques, we are able to look for a solution to this problem: we have developed a system based on data mining that can predict whether a patient has diabetes. Predicting the disease early allows patients to be treated before it becomes critical. Data mining can extract hidden knowledge from large volumes of diabetes-related data. Data science techniques can benefit other scientific fields by shedding new light on common questions; one such task is helping to make predictions from medical data. Machine learning is a newer scientific field within data science that deals with how machines learn from experience. The main aim of this project is to introduce a method that performs early prediction of diabetes in a patient with better accuracy by combining the results of various machine learning techniques. The main motivation for this project is to present a diabetes prediction model for predicting the occurrence of diabetes.
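One common way to combine the results of several machine learning techniques is majority voting over their predictions. The abstract does not say which combination rule is used, so this sketch is an assumption: it takes hypothetical 0/1 predictions (1 = diabetic) from several already-trained models and returns the per-patient majority label.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per patient.

    predictions_per_model: list of equally long lists, one list per model.
    Ties go to the label seen first for that patient, because
    Counter.most_common orders equal counts by first occurrence.
    """
    combined = []
    for labels in zip(*predictions_per_model):
        label, _ = Counter(labels).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of three models for four patients.
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 0]
model_c = [0, 1, 1, 0]
print(majority_vote([model_a, model_b, model_c]))  # [1, 1, 1, 0]
```

Using an odd number of binary classifiers avoids ties entirely; weighted or probability-averaging schemes are common alternatives.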
Article
The objective of the paper was to show the effectiveness of random Fourier features for detecting sentiment polarities. The method presented shows that the detection of aspect-based polarities can be improved by selecting relevant features and mapping them to lower dimensions. In this study, random Fourier features were constructed from the polarity data. A regularized least-squares strategy was adopted to fit a model and perform polarity detection. Experiments used 10-fold cross-validation. The proposed method with random Fourier features achieved 90% accuracy, improving on conventional classifiers. Precision, recall, and F-measure were used in the empirical evaluation.
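Random Fourier features approximate an RBF kernel with an explicit map z(x) = sqrt(2/D) · cos(Wx + b), with rows of W drawn from N(0, σ⁻² I) and b uniform on [0, 2π]; a regularized (ridge) least-squares fit is then linear in z. The sketch below is a minimal pure-Python illustration of that combination. The paper's feature extraction, feature count D, kernel width σ, and regularization weight are not given, so all settings, helper names, and the toy data here are assumptions; labels are encoded as +1/-1.

```python
import math
import random


def make_rff(dim_in, num_features, sigma, seed=0):
    """Return a map x -> z(x) approximating an RBF kernel of width sigma."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim_in)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def transform(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return transform


def solve(a, y):
    """Solve a @ x = y by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [y_i] for row, y_i in zip(a, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(features, labels, lam):
    """Regularized least squares: w = (Z^T Z + lam * I)^-1 Z^T y."""
    d = len(features[0])
    gram = [[sum(z[i] * z[j] for z in features) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(z[i] * y for z, y in zip(features, labels)) for i in range(d)]
    return solve(gram, rhs)


def predict(weights, z):
    """Polarity is the sign of the linear score: +1 positive, -1 negative."""
    return 1 if sum(w * z_i for w, z_i in zip(weights, z)) >= 0 else -1


# Hypothetical 2-D feature vectors for positive (+1) and negative (-1) aspects.
train_x = [[1.0, 1.2], [1.3, 0.9], [0.8, 1.1], [-1.0, -0.8], [-1.2, -1.1], [-0.9, -1.3]]
train_y = [1, 1, 1, -1, -1, -1]

rff = make_rff(dim_in=2, num_features=50, sigma=1.0)
train_z = [rff(x) for x in train_x]
w = fit_ridge(train_z, train_y, lam=0.1)
print([predict(w, rff(x)) for x in [[1.1, 1.0], [-1.1, -1.0]]])
```

The appeal of this design is that the ridge system is D × D regardless of how many training examples there are, whereas exact kernel ridge regression needs an n × n kernel matrix.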
Conference Paper
Accurate forecasting of upcoming trends in the capital markets is extremely important for algorithmic trading and investment management. Before making a trading decision, investors estimate, from the available information, the probability that a given news item will influence the market. The release of a breaking news article often triggers speculation among traders and results in price movements. The publication of news articles influences the state of the market, which makes them a powerful source of data for financial forecasting. Recently, researchers have developed trend and price prediction models based on information extracted from news articles. However, no previous research has investigated the advantages of using news articles with different levels of relevance to the target stock. This study uses the multiple kernel learning technique to combine information extracted from stock-specific and sub-industry-specific news articles to predict upcoming price movements. News articles are divided into these two categories based on their relevance to the target stock and are analyzed by separate kernels. The experimental results show that using both categories of news improves prediction accuracy compared with methods based on a single news category.
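In multiple kernel learning, each news category gets its own kernel and the classifier works with a weighted combination K = β·K_stock + (1 − β)·K_industry. The sketch below is a simplified illustration under several assumptions: β is fixed by hand instead of being learned (real MKL learns the weights jointly with the classifier), both categories use an RBF kernel, a kernel perceptron stands in for the SVM that MKL is typically paired with, and the feature vectors and labels (+1 = price up, -1 = price down) are hypothetical.

```python
import math

BETA = 0.7  # hypothetical fixed weight on the stock-specific kernel


def rbf(x, y, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def combined_kernel(i_feats, j_feats, beta):
    """Weighted sum of a stock-news kernel and a sub-industry-news kernel."""
    (stock_i, industry_i), (stock_j, industry_j) = i_feats, j_feats
    return beta * rbf(stock_i, stock_j) + (1 - beta) * rbf(industry_i, industry_j)


def train_kernel_perceptron(samples, labels, beta, epochs=10):
    """Kernel perceptron on the combined kernel; returns per-sample mistake counts."""
    n = len(samples)
    gram = [[combined_kernel(samples[i], samples[j], beta) for j in range(n)]
            for i in range(n)]
    alpha = [0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alpha[j] * labels[j] * gram[j][i] for j in range(n))
            if labels[i] * score <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha


def predict(samples, labels, alpha, beta, new_sample):
    """Sign of the kernel expansion over the training samples."""
    score = sum(a * y * combined_kernel(s, new_sample, beta)
                for a, y, s in zip(alpha, labels, samples))
    return 1 if score > 0 else -1


# Each sample: (stock-specific news features, sub-industry news features).
train = [([0.9, 0.8], [0.7]), ([0.8, 0.6], [0.5]),
         ([-0.7, -0.9], [-0.6]), ([-0.8, -0.5], [-0.4])]
labels = [1, 1, -1, -1]

alpha = train_kernel_perceptron(train, labels, BETA)
print([predict(train, labels, alpha, BETA, s)
       for s in [([0.85, 0.7], [0.6]), ([-0.75, -0.7], [-0.5])]])  # [1, -1]
```

Setting BETA to 1.0 or 0.0 recovers a single-category model, which is the baseline the abstract compares against.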
Conference Paper
Stock price prediction is a difficult task, since the price depends heavily on the demand for the stock, and no single variable can precisely predict the daily demand for a stock. However, the Efficient Market Hypothesis (EMH) states that stock prices also depend significantly on new information. One of many information sources is people's opinions on social media. People's opinions about a company's products may determine the company's reputation and thus affect people's decisions to buy its stock. When opinions are used as primary data, they must be analyzed appropriately. A well-known example of using opinions as data is sentiment analysis. Sentiment analysis is the process of determining the emotion or feeling expressed in people's opinions about something, in this case the products of particular companies. Several studies have used sentiment analysis to predict stock prices. Bollen concluded in his research that people's opinions on social media such as Twitter can predict daily DJIA movements with 87.6% accuracy. This shows that there is a relationship between sentiment and stock prices. Our purpose in this research is to predict the Indonesian stock market using simple sentiment analysis. Naive Bayes and Random Forest algorithms are used to classify tweets in order to calculate the sentiment regarding a company. The results of the sentiment analysis are used to predict the company's stock price, and we use linear regression to build the prediction model. Our experiments show that the prediction models using the previous stock price and hybrid features as predictors give the best predictions, with coefficients of determination of 0.9989 and 0.9983 respectively.
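The regression step can be sketched as ordinary least squares with two predictors, the previous day's closing price and a daily sentiment score, followed by the coefficient of determination R² = 1 − SS_res / SS_tot. This only loosely mirrors the abstract's "previous stock price and hybrid feature" setup: the data are synthetic, the helper names are hypothetical, and the normal-equation solver is a plain-Python stand-in for a statistics library.

```python
def solve(a, y):
    """Solve a @ x = y by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [y_i] for row, y_i in zip(a, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, targets):
    """OLS with intercept via the normal equations (X^T X) beta = X^T y."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(x, targets)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(rows, targets, coef):
    """Coefficient of determination of the fitted linear model."""
    preds = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]
    mean = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, preds))
    ss_tot = sum((t - mean) ** 2 for t in targets)
    return 1 - ss_res / ss_tot


# Synthetic daily data: previous close, sentiment score, deterministic noise.
prev_close = [100, 102, 101, 105, 107, 106, 110, 112]
sentiment = [0.2, -0.1, 0.5, 0.3, -0.4, 0.6, 0.1, -0.2]
noise = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.0, 0.2]
rows = list(zip(prev_close, sentiment))
next_close = [2 + 0.98 * p + 4 * s + e for p, s, e in zip(prev_close, sentiment, noise)]

coef = fit_ols(rows, next_close)
print("intercept, price, sentiment:", [round(c, 3) for c in coef])
print("R^2:", round(r_squared(rows, next_close, coef), 4))
```

Because the previous close already explains most of the next close, R² values close to 1 are expected in this setting; they say more about price persistence than about the added value of the sentiment feature.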