Conference Paper

Predict Market Movements Based on the Sentiment of Financial Video News Sites


Abstract

Because of market volatility, forecasting stock market trends is one of the most complicated tasks. There is a heated debate about the effectiveness of predicting market movement from public sentiment expressed in written news, whether through social media, web pages, or financial news. However, past research ignores a crucial source of knowledge: video news, which is typically presented by market specialists. This research examined the reliability of using the sentiment of video news sites to forecast stock prices. To determine the strength of the causal relationship between stock market prices and video sentiment, we applied Granger causality analysis and Pearson correlation coefficient tests. We also compared the TextBlob API with the Google Cloud Natural Language API for computing sentiment polarity scores of video news. We evaluated LR, SVM, LSTM, and CNN models for sentiment analysis of S&P 500 stock data. Finally, we used the most effective sentiment analysis tool to train our ML classification model. This research is unique because it identifies and tests the following question: can we build an effective prediction model based on video news sentiment, or can we add video news sentiment as a new feature to a future prediction model? The experimental findings demonstrate a causal connection between video news sentiment and stock market fluctuation. The findings also show that, when the Google Cloud Natural Language API was used for sentiment analysis, the model showed a correlation between the video news and the company's price movements.

Index Terms: market prediction, video news, machine learning, causality test.
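As an illustration of the correlation analysis mentioned in the abstract, the following is a minimal sketch in plain Python of a lagged Pearson correlation between a daily sentiment series and a daily return series. It is not the authors' implementation, and it is not a Granger causality test, which compares nested autoregressive models (for example via `grangercausalitytests` in statsmodels). The function names `pearson` and `lagged_correlation` and the toy data are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(sentiment, returns, lag):
    """Correlate sentiment on day t with the return on day t + lag."""
    if lag < 0 or lag > len(sentiment) - 2:
        raise ValueError("lag must leave at least 2 overlapping points")
    if lag == 0:
        return pearson(sentiment, returns)
    # Drop the last `lag` sentiment values and the first `lag` returns
    # so that index i pairs sentiment[i] with returns[i + lag].
    return pearson(sentiment[:-lag], returns[lag:])


# Hypothetical toy data (not from the paper): each return equals twice the
# previous day's sentiment score, so the lag-1 correlation is perfect.
sentiment = [0.2, -0.1, 0.4, 0.0, -0.3, 0.5, 0.1, -0.2]
returns = [0.0, 0.4, -0.2, 0.8, 0.0, -0.6, 1.0, 0.2]

for lag in range(3):
    print(lag, round(lagged_correlation(sentiment, returns, lag), 3))
```

A high correlation at a positive lag only suggests that sentiment moves ahead of prices. Supporting the kind of causal claim made in the abstract would still require a formal test such as Granger causality.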

Article
Full-text available
This paper examines the impact of the sentiments of OPEC news on stock market prices of public listed oil and gas companies in Bursa Malaysia. We used data of stock market prices from randomly selected oil and gas companies for the period of 2012 to 2017. For the methodology, we first established a supervised machine learning algorithm-based news classifier to classify the OPEC news following its sentiments. We developed a financial news sentiment classifier by combining machine learning algorithms and lexicon-based labelling methods. We then applied the event study method to investigate how stock market prices react to OPEC news’ sentiment. The results showed a negative correlation between OPEC news sentiment and stock market prices of oil and gas companies during the event window based on each OPEC news release date. The results further showed that the stock market prices do not react to OPEC news sentiment on event day. These findings should provide some guides to stock investors on the movement of the selected stock market prices of energy sector companies during the event window period.
Article
Full-text available
Stock price prediction can be made more efficient by considering the price fluctuations and understanding people’s sentiments. A limited number of models understand financial jargon or have labelled datasets concerning stock price change. To overcome this challenge, we introduced FinALBERT, an ALBERT based model trained to handle financial domain text classification tasks by labelling Stocktwits text data based on stock price change. We collected Stocktwits data for over ten years for 25 different companies, including the major five FAANG (Facebook, Amazon, Apple, Netflix, Google). These datasets were labelled with three labelling techniques based on stock price changes. Our proposed model FinALBERT is fine-tuned with these labels to achieve optimal results. We experimented with the labelled dataset by training it on traditional machine learning, BERT, and FinBERT models, which helped us understand how these labels behaved with different model architectures. Our labelling method’s competitive advantage is that it can help analyse the historical data effectively, and the mathematical function can be easily customised to predict stock movement.
Article
Full-text available
The prediction and speculation about the values of the stock market especially the values of the worldwide companies are a really interesting and attractive topic. In this article, we cover the topic of the stock value changes and predictions of the stock values using fresh scraped economic news about the companies. We are focussing on the headlines of economic news. We use numerous different tools to the sentiment analysis of the headlines. We consider BERT as the baseline and compare the results with three other tools, VADER, TextBlob, and a Recurrent Neural Network, and compare the sentiment results to the stock changes of the same period. The BERT and RNN were much more accurate, these tools were able to determine the emotional values without neutral sections, in contrast to the other two tools. Comparing these results with the movement of stock market values in the same time periods, we can establish the moment of the change occurred in the stock values with sentiment analysis of economic news headlines. Also we discovered a significant difference between the different models in terms of the effect of emotional values on the change in the value of the stock market by the correlation matrices.
Article
Full-text available
Outbreak and spread of the Covid-19 pandemic have touched to the core of our sentiments. Indian stock market has seen a roller coaster ride so far this year amid the Covid-19 pandemic. Sentiments have turned out to be a significant influence on the movement of the Indian stock market and pandemic has only added more steam. This study with the limelight on the Covid-19 pandemic is an endeavour to investigate the classification accuracy of selected ML algorithms under natural language processing for sentiment analysis and prediction for the Indian stock market. The study proposes the framework for sentiment analysis and prediction for the Indian stock market where six ML algorithms are put to test. Consequently, the study highlights the superior algorithms based on accuracy results. These superior algorithms can be potent input to build robust prediction models as a logical next step.
Article
Full-text available
The stock market is very unstable and volatile due to several factors such as public sentiments, economic factors and more. Several Petabytes volumes of data are generated every second from different sources, which affect the stock market. A fair and efficient fusion of these data sources (factors) into intelligence is expected to offer better prediction accuracy on the stock market. However, integrating these factors from different data sources as one dataset for market analysis is seen as challenging because they come in a different format (numerical or text). In this study, we propose a novel multi-source information-fusion stock price prediction framework based on a hybrid deep neural network architecture (Convolution Neural Networks (CNN) and Long Short-Term Memory (LSTM)) named IKN-ConvLSTM. Precisely, we design a predictive framework to integrate stock-related information from six (6) heterogeneous sources. Secondly, we construct a base model using CNN, and random search algorithm as a feature selector to optimise our initial training parameters. Finally, a stacked LSTM network is fine-tuned by using the tuned parameter (features) from the base-model to enhance prediction accuracy. Our approach's emperical evaluation was carried out with stock data (January 3, 2017, to January 31, 2020) from the Ghana Stock Exchange (GSE). The results show a good prediction accuracy of 98.31%, specificity (0.9975), sensitivity (0.8939%) and F-score (0.9672) of the amalgamated dataset compared with the distinct dataset. Based on the study outcome, it can be concluded that efficient information fusion of different stock price indicators as a single data source for market prediction offer high prediction accuracy than individual data sources.
Article
Full-text available
This study explores the initial impact of COVID-19 sentiment on US stock market using big data. Using the Daily News Sentiment Index (DNSI) and Google Trends data on coronavirus-related searches, this study investigates the correlation between COVID-19 sentiment and 11 select sector indices of the Unites States (US) stock market over the period from 21st of January 2020 to 20th of May 2020. While extensive research on sentiment analysis for predicting stock market movement use tweeter data, not much has used DNSI or Google Trends data. In addition, this study examines whether changes in DNSI predict US industry returns differently by estimating the time series regression model with excess returns of industry as the dependent variable. The excess returns are obtained from the Fama-French three factor model. The results of this study offer a comprehensive view of the initial impact of COVID-19 sentiment on the US stock market by industry and furthermore suggests the strategic investment planning considering the time lag perspectives by visualizing changes in the correlation level by time lag differences.
Article
Full-text available
Stock prediction is a challenging task concerned by researchers due to its considerable returns. It is difficult because of the high randomness in the stock market. Stock price movement is mainly related to the capital situation and hot events. In recent years, researchers improved prediction accuracy with news and social media. However, the existing methods do not take into account the different influences of events. To solve this problem, we propose a multi-element hierarchical attention capsule network, which consists of two components. The former component, multi-element hierarchical attention, quantifies the importance of valuable information contained in multiple news and social media through its weights assignment process. And the latter component, capsule network, learns more context information from the events through its vector representation in the hidden layer. Moreover, we construct a combined data set to maintain the complementarity between social media and news. Finally, we achieve better results than baselines, and experiments show that our model improves prediction accuracy by quantifying the different influences of events.
Article
Full-text available
Stock price movement prediction plays important roles in decision making for investors. It was usually regarded as a binary classification task. In this paper, a recurrent convolutional neural kernel (RCNK) model was proposed, which learned complementary features from different sources of data, namely, historical price data and text data in the message board, to predict the stock price movement. It integrated the advantage of technical analysis and sentiment analysis. Different from previous studies, the text data was treated as sequential data and utilized the RCNK model to train sentiment embeddings with the temporal features. Besides, in the classification section of the model, the explicit kernel mapping layer was used to replace several full-connected layers. This operation reduced the parameters of the model and the risk of overfitting. In order to test the impact of treating the sentiment data as sequential data, the effectiveness of explicit kernel mapping layer and the usefulness integrating the technical analysis and sentiment analysis, the proposed model was compared with the other two deep learning models (recurrent convolutional neural network model and convolutional neural kernel model) and the models with only one source of data as input. The result showed that the proposed model outperformed the other models.
Article
Full-text available
Predicting the stock market remains a challenging task due to the numerous influencing factors such as investor sentiment, firm performance, economic factors and social media sentiments. However, the profitability and economic advantage associated with accurate prediction of stock price draw the interest of academicians, economic, and financial analyst into researching in this field. Despite the improvement in stock prediction accuracy, the literature argues that prediction accuracy can be further improved beyond its current measure by looking for newer information sources particularly on the Internet. Using web news, financial tweets posted on Twitter, Google trends and forum discussions, the current study examines the association between public sentiments and the predictability of future stock price movement using Artificial Neural Network (ANN). We experimented the proposed predictive framework with stock data obtained from the Ghana Stock Exchange (GSE) between January 2010 and September 2019, and predicted the future stock value for a time window of 1 day, 7 days, 30 days, 60 days, and 90 days. We observed an accuracy of (49.4-52.95 %) based on Google trends, (55.5-60.05 %) based on Twitter, (41.52-41.77 %) based on forum post, (50.43-55.81 %) based on web news and (70.66-77.12 %) based on a combined dataset. Thus, we recorded an increase in prediction accuracy as several stock-related data sources were combined as input to our prediction model. We also established a high level of direct association between stock market behaviour and social networking sites. Therefore, based on the study outcome, we advised that stock market investors could utilise the information from web financial news, tweet, forum discussion, and Google trends to effectively perceive the future stock price movement and design effective portfolio/investment plans.
Article
Full-text available
This paper proposes and analyzes a methodology of forecasting movements of the analysts’ net income estimates and those of stock prices. We achieve this by applying natural language processing and neural networks in the context of analyst reports. In the pre-experiment, we applied our method to extract opinion sentences from the analyst report while classifying the remaining partsas non-opinion sentences. Then, we performed two additional experiments. First, we employed our proposed method for forecasting the movements of analysts’ net income estimates by inputting the opinion and non-opinion sentences into separate neural networks. Besides the reports, we inputted the trend of the net income estimate to the networks. Second, we employed our proposed method for forecasting the movements of stock prices. Consequently, we found differences between security firms, which depend on whether analysts’ net income estimates tend to be forecasted by opinions or facts in the context of analyst reports. Furthermore, the trend of the net income estimate was found to be effective for the forecast as well as an analyst report. However, in experiments of forecasting movements of stock prices, the difference between opinion sentences and non-opinion sentences was not effective.
Article
Full-text available
Due to its dynamics, non-linearity and complexity nature, stock market is inherently difficult to predict. One of the attractive objectives is to predict stock market movement direction by using public sentiments analysis. However, there is an active debate about the usefulness of this approach and the strength of causality between stock market trends and sentiments. The opinions of researchers range from rejecting the relationship to confirming a clear causality between sentiments and trading in stock markets. Nevertheless, many advanced computational methods have adopted sentiment-based features, yet did not attain maturity and performance. In this paper, we are contributing constructively in this debate by empirically investigating the predictability of stock market movement direction using an enhanced method of sentiments analysis. Precisely, we experiment on stock prices history, sentiments polarity, subjectivity, N-grams, customized text-based features in addition to features lags that are used for a finer-grained analysis. Five research questions have been investigated towards answering issues associated with stock market movement prediction using sentiment analysis. We have collected and studied the stocks of ten influential companies belonging to different stock domains in NASDAQ. Our analysis approach is complemented by a sophisticated causality analysis, an algorithmic feature selection and a variety of machine learning techniques including regularized models stacking. A comparison of our approach with other sentiment-based stock market prediction approaches including Deep learning, establishes that our proposed model is performing adequately and predicting stock movements with a higher accuracy of 60%.
Article
Full-text available
Sentiment analysis has become a key technology to gain insight from social networks. The field has reached a level of maturity that paves the way for its exploitation in many different fields such as marketing, health, banking or politics. The latest technological advancements, such as deep learning techniques, have solved some of the traditional challenges in the area caused by the scarcity of lexical resources. In this Special Issue, different approaches that advance this discipline are presented. The contributed articles belong to two broad groups: technological contributions and applications.
Article
Full-text available
The stock market is a key pivot in every growing and thriving economy, and every investment in the market is aimed at maximising profit and minimising associated risk. As a result, numerous studies have been conducted on the stock-market prediction using technical or fundamental analysis through various soft-computing techniques and algorithms. This study attempted to undertake a systematic and critical review of about one hundred and twenty-two (122) pertinent research works reported in academic journals over 11 years (2007–2018) in the area of stock market prediction using machine learning. The various techniques identified from these reports were clustered into three categories, namely technical, fundamental, and combined analyses. The grouping was done based on the following criteria: the nature of a dataset and the number of data sources used, the data timeframe, the machine learning algorithms used, machine learning task, used accuracy and error metrics and software packages used for modelling. The results revealed that 66% of documents reviewed were based on technical analysis; whiles 23% and 11% were based on fundamental analysis and combined analyses, respectively. Concerning the number of data source, 89.34% of documents reviewed, used single sources; whiles 8.2% and 2.46% used two and three sources respectively. Support vector machine and artificial neural network were found to be the most used machine learning algorithms for stock market prediction.
Article
Full-text available
Recent advances in the integration of deep recurrent neural networks and statistical inferences have paved new avenues for joint modeling of moments of random variables, which is highly useful for signal processing, time series analysis, and financial forecasting. However, introducing explicit knowledge as exogenous variables has received little attention. In this paper, we propose a novel model termed sentiment-aware volatility forecasting (SAVING), which incorporates market sentiment for stock return fluctuation prediction. Our framework provides an ensemble of symbolic and sub-symbolic AI approaches, that is, including grounded knowledge into a connectionist neural network. The model aims at producing a more accurate estimation of temporal variances of asset returns by better capturing the bi-directional interaction between movements of asset price and market sentiment. The interaction is modeled using Variational Bayes via the data generation and inference operations. We benchmark our model with 9 other popular ones in terms of the likelihood of forecasts given the observed sequence. Experimental results suggest that our model not only outperforms pure statistical models, e.g., GARCH and its variants, Gaussian-process volatility model, but also outperforms the state-of-the-art autoregressive deep neural nets architectures, such as the variational recurrent neural network and the neural stochastic volatility model.
Article
Full-text available
Forecasting time series data is an important subject in economics, business, and finance. Traditionally, there are several techniques to effectively forecast the next lag of time series data such as univariate Autoregressive (AR), univariate Moving Average (MA), Simple Exponential Smoothing (SES), and more notably Autoregressive Integrated Moving Average (ARIMA) with its many variations. In particular, ARIMA model has demonstrated its outperformance in precision and accuracy of predicting the next lags of time series. With the recent advancement in computational power of computers and more importantly developing more advanced machine learning algorithms and approaches such as deep learning, new algorithms are developed to forecast time series data. The research question investigated in this article is that whether and how the newly developed deep learning-based algorithms for forecasting time series data, such as "Long Short-Term Memory (LSTM)", are superior to the traditional algorithms. The empirical studies conducted and reported in this article show that deep learning-based algorithms such as LSTM outperform traditional-based algorithms such as ARIMA model. More specifically, the average reduction in error rates obtained by LSTM is between 84 - 87 percent when compared to ARIMA indicating the superiority of LSTM to ARIMA. Furthermore, it was noticed that the number of training times, known as "epoch" in deep learning, has no effect on the performance of the trained forecast model and it exhibits a truly random behavior.
Article
Full-text available
News reports have become an imperative conduit of public information. Several recent studies have used news data from public media to investigate the impact of news on stock market returns. This study analyses the usefulness of news for predicting stock returns in the Taiwan stock market. We employ text mining of economic news, transform documents using a keyword matrix, and then convert the results into news variables. Subsequently, together with other quantitative variables, we construct a fixed effect model to investigate the behaviours of stock market returns in 20 subsectors from January 2008 to December 2014. Empirical analysis reveals that the news variables provide useful information for predicting Taiwan stock market returns, although the out-sample performance is only marginally improved. We also discover an asymmetric effect of economic news for predicting stock market returns. The prediction accuracy is higher when the stock market is booming than when it is glooming.
Article
Full-text available
Traditional stock market prediction approaches commonly utilize the historical price-related data of the stocks to forecast their future trends. As the Web information grows, recently some works try to explore financial news to improve the prediction. Effective indicators, e.g., the events related to the stocks and the people's sentiments towards the market and stocks, have been proved to play important roles in the stocks' volatility, and are extracted to feed into the prediction models for improving the prediction accuracy. However, a major limitation of previous methods is that the indicators are obtained from only a single source whose reliability might be low, or from several data sources but their interactions and correlations among the multi-sourced data are largely ignored. In this work, we extract the events from Web news and the users' sentiments from social media, and investigate their joint impacts on the stock price movements via a coupled matrix and tensor factorization framework. Specifically, a tensor is firstly constructed to fuse heterogeneous data and capture the intrinsic relations among the events and the investors' sentiments. Due to the sparsity of the tensor, two auxiliary matrices, the stock quantitative feature matrix and the stock correlation matrix, are constructed and incorporated to assist the tensor decomposition. The intuition behind is that stocks that are highly correlated with each other tend to be affected by the same event. Thus, instead of conducting each stock prediction task separately and independently, we predict multiple correlated stocks simultaneously through their commonalities, which are enabled via sharing the collaboratively factorized low rank matrices between matrices and the tensor. Evaluations on the China A-share stock data and the HK stock data in the year 2015 demonstrate the effectiveness of the proposed model.
Conference Paper
Full-text available
In this paper, we introduce a new prediction model depend on Bidirectional Gated Recurrent Unit (BGRU). Our predictive model relies on both online financial news and historical stock prices data to predict the stock movements in the future. Experimental results show that our model accuracy achieves nearly 60% in S&P 500 index prediction whereas the individual stock prediction is over 65%.
Book
Full-text available
The key component in forecasting demand and consumption of resources in a supply network is an accurate prediction of real-valued time series. Indeed, both service interruptions and resource waste can be reduced with the implementation of an effective forecasting system. Significant research has thus been devoted to the design and development of methodologies for short term load forecasting over the past decades. A class of mathematical models, called Recurrent Neural Networks, are nowadays gaining renewed interest among researchers and they are replacing many practical implementations of the forecasting systems, previously based on static methods. Despite the undeniable expressive power of these architectures, their recurrent nature complicates their understanding and poses challenges in the training procedures. Recently, new important families of recurrent architectures have emerged and their applicability in the context of load forecasting has not been investigated completely yet. This work performs a comparative study on the problem of Short-Term Load Forecast, by using different classes of state-of-the-art Recurrent Neural Networks. The authors test the reviewed models first on controlled synthetic tasks and then on different real datasets, covering important practical cases of study. The text also provides a general overview of the most important architectures and defines guidelines for configuring the recurrent networks to predict real-valued time series.
Article
Full-text available
Opinion Mining (OM) or Sentiment Analysis (SA) can be defined as the task of detecting, extracting and classifying opinions on something. It is a type of the processing of the natural language (NLP) to track the public mood to a certain law, policy, or marketing, etc. It involves a way that development for the collection and examination of comments and opinions about legislation, laws, policies, etc., which are posted on the social media. The process of information extraction is very important because it is a very useful technique but also a challenging task. That mean, to extract sentiment from an object in the web-wide, need to automate opinion-mining systems to do it. The existing techniques for sentiment analysis include machine learning (supervised and unsupervised), and lexical-based approaches. Hence, the main aim of this paper presents a survey of sentiment analysis (SA) and opinion mining (OM) approaches, various techniques used that related in this field. As well, it discusses the application areas and challenges for sentiment analysis with insight into the past researcher's works.
Article
Full-text available
Stock market prediction has become an attractive investigation topic due to its important role in economy and beneficial offers. There is an imminent need to uncover the stock market future behavior in order to avoid investment risks. The large amount of data generated by the stock market is considered a treasure of knowledge for investors. This study aims at constructing an effective model to predict stock market future trends with small error ratio and improve the accuracy of prediction. This prediction model is based on sentiment analysis of financial news and historical stock market prices. This model provides better accuracy results than all previous studies by considering multiple types of news related to market and company with historical stock prices. A dataset containing stock prices from three companies is used. The first step is to analyze news sentiment to get the text polarity using naïve Bayes algorithm. This step achieved prediction accuracy results ranging from 72.73% to 86.21%. The second step combines news polarities and historical stock prices together to predict future stock prices. This improved the prediction accuracy up to 89.80%.
Article
Full-text available
The World Wide Web such as social networks, forums, review sites and blogs generate enormous heaps of data in the form of users views, emotions, opinions and arguments about different social events, products, brands, and politics. Sentiments of users that are expressed on the web has great influence on the readers, product vendors and politicians. The unstructured form of data from the social media is needed to be analyzed and well-structured and for this purpose, sentiment analysis has recognized significant attention. Sentiment analysis is referred as text organization that is used to classify the expressed mind-set or feelings in different manners such as negative, positive, favorable, unfavorable, thumbs up, thumbs down, etc. The challenge for sentiment analysis is lack of sufficient labeled data in the field of Natural Language Processing (NLP). And to solve this issue, the sentiment analysis and deep learning techniques have been merged because deep learning models are effective due to their automatic learning capability. This Review Paper highlights latest studies regarding the implementation of deep learning models such as deep neural networks, convolutional neural networks and many more for solving different problems of sentiment analysis such as sentiment classification, cross lingual problems, textual and visual analysis and product review analysis, etc.
Article
Full-text available
The key component in forecasting demand and consumption of resources in a supply network is an accurate prediction of real-valued time series. Indeed, both service interruptions and resource waste can be reduced with the implementation of an effective forecasting system. Significant research has thus been devoted to the design and development of methodologies for short term load forecasting over the past decades. A class of mathematical models, called Recurrent Neural Networks, are nowadays gaining renewed interest among researchers and they are replacing many practical implementation of the forecasting systems, previously based on static methods. Despite the undeniable expressive power of these architectures, their recurrent nature complicates their understanding and poses challenges in the training procedures. Recently, new important families of recurrent architectures have emerged and their applicability in the context of load forecasting has not been investigated completely yet. In this paper we perform a comparative study on the problem of Short-Term Load Forecast, by using different classes of state-of-the-art Recurrent Neural Networks. We test the reviewed models first on controlled synthetic tasks and then on different real datasets, covering important practical cases of study. We provide a general overview of the most important architectures and we define guidelines for configuring the recurrent networks to predict real-valued time series.
Article
In this paper, a novel neural network is proposed, which can automatically learn and recall contents from texts, and answer questions about the contents in either a large corpus or a short piece of text. The proposed neural network combines parse trees, semantic networks, and inference models. It contains layers corresponding to sentences, clauses, phrases, words and synonym sets. The neurons in the phrase layer and the word layer are labeled with their parts of speech and their semantic roles. The proposed neural network is automatically organized to represent the contents of a given text. Its carefully designed structure and algorithms enable it to use the labels and the synonym-set neurons to build relationships between sentences about similar things. The experiments show that the proposed neural network with the labels and the synonym sets performs better than variants that lack the labels or the synonym sets while keeping the other parts and the algorithms the same. In the experiments, the proposed neural network also shows its ability to tolerate noise, to answer factoid questions, and to solve single-choice questions in an exercise book for non-native English learners.
Article
The ability to exploit public sentiment in social media is increasingly considered an important tool for market understanding, customer segmentation and stock price prediction for strategic marketing planning and manoeuvring. This evolution of technology adoption is energised by the healthy growth in big data frameworks, which has made applications based on Sentiment Analysis (SA) in big data common for businesses. However, few works have studied the gaps in SA application in big data. The contribution of this paper is two-fold: (i) this study reviews the state of the art of SA approaches, including sentiment polarity detection, SA features (explicit and implicit), sentiment classification techniques and applications of SA, and (ii) this study reviews the suitability of SA approaches for application in big data frameworks, highlights the gaps and suggests future works that should be explored. SA studies are predicted to expand into approaches that are scalable and highly adaptable to source variation, velocity and veracity, to maximise value mining for the benefit of the users. © 2016 Nurfadhlina Mohd Sharef, Harnani Mat Zin and Samaneh Nadali.
Article
Social media are increasingly reflecting and influencing behavior of other complex systems. In this paper we investigate the relations between a well-known micro-blogging platform Twitter and financial markets. In particular, we consider, in a period of 15 months, the Twitter volume and sentiment about the 30 stock companies that form the Dow Jones Industrial Average (DJIA) index. We find a relatively low Pearson correlation and Granger causality between the corresponding time series over the entire time period. However, we find a significant dependence between the Twitter sentiment and abnormal returns during the peaks of Twitter volume. This is valid not only for the expected Twitter volume peaks (e.g., quarterly announcements), but also for peaks corresponding to less obvious events. We formalize the procedure by adapting the well-known "event study" from economics and finance to the analysis of Twitter data. The procedure allows to automatically identify events as Twitter volume peaks, to compute the prevailing sentiment (positive or negative) expressed in tweets at these peaks, and finally to apply the "event study" methodology to relate them to stock returns. We show that sentiment polarity of Twitter peaks implies the direction of cumulative abnormal returns. The amount of cumulative abnormal returns is relatively low (about 1-2%), but the dependence is statistically significant for several days after the events.
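The two quantities this abstract relies on, a Pearson correlation between time series and the cumulative abnormal return around an event, can be sketched in a few lines. This is only an illustrative sketch, not the cited authors' procedure: the function names are hypothetical, and the abnormal return is assumed to follow the simple market-adjusted model (stock return minus market return), which the abstract does not specify.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def cumulative_abnormal_returns(stock_returns, market_returns):
    """Running sum of abnormal returns over an event window.

    Assumption: market-adjusted model, AR_t = R_stock,t - R_market,t.
    """
    car, total = [], 0.0
    for stock_r, market_r in zip(stock_returns, market_returns):
        total += stock_r - market_r
        car.append(total)
    return car
```

In an event study of this kind, the event window would start at a detected volume peak, and the sign of the final cumulative value would be compared with the prevailing sentiment at that peak.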
Article
Recently, social media, particularly microblogs, have become highly valuable information resources for many investors. Previous studies examined general stock market movements, whereas in this paper, USD/TRY currency movements based on the change in the number of positive, negative and neutral tweets are analyzed. We investigate the relationship between Twitter content categorized as sentiments, such as Buy, Sell and Neutral, with USD/TRY currency movements. The results suggest that there exists a relationship between the number of tweets and the change in USD/TRY exchange rate.
Article
The Efficient Market Hypothesis states that stock market changes reflect the arrival of new information through external events and news. Thus, many recent studies in the literature evaluate the impact of Sentiment Analysis (SA) applied to social media and news on the stock market. However, these studies generally do not present investment strategies that take advantage of sentiments in new publications considering the correlation between news and the stock market, especially when the news is written in Portuguese. This paper proposes investment strategies based on Sentiment Analysis of financial news applied to the Brazilian stock market. To this end, the following activities were performed: (i) identifying the most suitable Artificial Neural Network (ANN) architecture to perform Sentiment Analysis on financial news in Brazilian Portuguese; (ii) studying the correlation between the predominant sentiment in financial news of three major Brazilian news portals through the Granger causality test; (iii) proposing two categories of investment strategies based on Sentiment Analysis, considering both negative and positive financial news; and (iv) applying the proposed strategies to the Brazilian stock market. Experiments were conducted with financial news from the most popular Brazilian online news sources and the results showed: (i) the most appropriate ANN to perform SA in Portuguese is the Convolutional Neural Network; (ii) there is a significant influence of the predominant daily news sentiment on the stock market; and (iii) investment strategies based on Sentiment Analysis can be profitable in both the short and the long term, surpassing the Random Walk and Buy & Hold strategies.
Article
In the financial sector, the stock market and its trends are highly volatile in nature. Recent studies have shown that news articles and social media analysis can have an immense impact on investors' opinion toward financial markets. Thus, the purpose of this study is to explore the relationship between news sentiment and stock market movement using information from different news agencies, business magazines, and financial portals. This study offers an application of the Bayesian structural time (BST) series model that is more transparent and facilitates better handling of uncertainty than the autoregressive integrated moving average (ARIMA) model and the vector autoregression (VAR) method by using prior information about the structure of the model. One of the main pitfalls of this model is the presumption of linearity. The long short-term memory (LSTM) model is a nonlinear model that can capture various nonlinear structures present in the data set. We propose a hybrid model, which combines the LSTM model with the BST model along with the regression component that captures information from different news sources to identify market predictors. The proposed model detects unusual behavior or anomalous pattern of the stock price movement, which makes our model superior compared to the traditional methods. Our new hybrid model accumulates error with lower rates (3.5%) and shows a remarkable performance over some of the other existing hybrid models, such as AR-MLP, ARIMA-LSTM, and VAR-LSTM model.
Article
This paper investigates whether and how investor sentiment affects stock market volatility forecasting from a non‐linear theory perspective. By using a novel dataset that contains massive articles about stock market analysis obtained from a Chinese investors' community, we construct four sentiment indices to measure investor sentiment by applying textual analysis techniques. Differing from the developed market, we find that the investor sentiment from an emerging market causes stock volatility by a non‐linear pattern rather than a linear style. Furthermore, we show that the investor sentiment improves stock volatility prediction based on the long short‐term memory model. And the predictability is still significant after considering another sentiment proxy variable. Finally, we demonstrate that this improvement of predictive performance is meaningful from an economic point of view.
Article
In this paper, we use the Twitter based happiness index as a proxy for investor sentiment in order to examine whether happiness influences future market volatility of country VIX indexes. Our sample includes the major stock markets of the USA, Canada, UK, Germany, France, Netherlands, Switzerland, Japan, China, Hong Kong, India, Brazil, South Korea, and South Africa. Using linear and nonlinear causality tests, we find that Twitter happiness significantly causes the future volatility of the sample countries. The robustness checks show no divergence from our primary findings and provide strong evidence of a nonlinear relationship between investor sentiment and future stock market volatility.
Article
Stock prediction via market data analysis is an attractive research topic. Both stock prices and news articles have been employed in the prediction processes. However, how to combine technical indicators from stock prices with news sentiments from textual news articles, and enable the prediction model to learn sequential information within time series in an intelligent way, is still an unsolved problem. In this paper, we build a stock prediction system and propose an approach that 1) represents numerical price data by technical indicators via technical analysis, and represents textual news articles by sentiment vectors via sentiment analysis, 2) sets up a layered deep learning model to learn the sequential information within the market snapshot series, which is constructed from the technical indicators and news sentiments, and 3) sets up a fully connected neural network to make stock predictions. Experiments have been conducted on more than five years of Hong Kong Stock Exchange data using four different sentiment dictionaries, and the results show that 1) the proposed approach outperforms the baselines on both validation and test sets using two different evaluation metrics, 2) models incorporating prices and news sentiments outperform models that use only technical indicators or only news sentiments, at both the individual stock level and the sector level, and 3) among the four sentiment dictionaries, the finance domain-specific sentiment dictionary (Loughran–McDonald Financial Dictionary) models the news sentiments better, bringing larger prediction performance improvements than the other three dictionaries.
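The idea of a "market snapshot" that joins a technical indicator with a dictionary-based news sentiment can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the six-word lexicon is a toy stand-in rather than the Loughran–McDonald dictionary, and a single moving-average feature replaces the richer indicator set the abstract implies.

```python
# Toy sentiment lexicon (hypothetical; NOT the Loughran-McDonald dictionary).
TOY_LEXICON = {
    "gain": 1, "profit": 1, "growth": 1,
    "loss": -1, "decline": -1, "risk": -1,
}


def simple_moving_average(prices, window):
    """Simple moving average for every full window in the price series."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return [
        sum(prices[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


def lexicon_sentiment(text, lexicon=TOY_LEXICON):
    """Mean polarity of the lexicon words found in the text (0.0 if none)."""
    tokens = [token.strip(".,;:!?").lower() for token in text.split()]
    hits = [lexicon[token] for token in tokens if token in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def market_snapshot(prices, window, headline):
    """Feature vector: [last price relative to its SMA, headline sentiment]."""
    sma = simple_moving_average(prices, window)[-1]
    return [prices[-1] / sma - 1.0, lexicon_sentiment(headline)]
```

A sequence of such snapshots, one per trading day, is the kind of input a sequential model could then consume.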
Article
In this research, we proposed a text analysis system to predict stock market movements using news and social media data. It is a scalable prediction system for sparse and high-dimensional feature sets. Using the developed system, we collected 12,560 articles from the New York Times covering a one-year period, and 2,854,333 tweets from Twitter covering a 4-month period. We analysed the collected data using entity extraction, sentiment analysis and topic modelling techniques. We applied our feature set creation and elastic net regression based training method. The analyses have been used to train different prediction models. Using these trained prediction models, we predicted stock market movements for the Dow Jones Index and showed that the proposed method can make promising predictions. In different sets of experiments, highly accurate (up to 70.90% accuracy) predictions are made by the proposed approach. These predicted values also correlated (up to 0.2315 correlation coefficient value) with real Dow Jones Index values. Further, we report performance comparison results for various prediction models that we trained with different sets of features to analyse the importance of time interval and feature space size. Our test results show that it is possible to make reasonable stock movement predictions by integrating news and related social media data, analysing them using named entity extraction, sentiment analysis and topic modelling techniques, together with prediction models that use features created from these analysis results.
Article
We employ multiple heterogeneous data sources, including historical transaction data, technical indicators, stock posts, news and Baidu index, to predict the directions of stock price movements. We focus on the distinctive predicting patterns of active and inactive stocks, and we examine the predictive power of support vector machine (SVM) at different levels of activity for a single stock. We construct a total of 14 data source combinations from the above 5 heterogeneous data sources, and choose three forecasting horizons, namely 1 day, 2 days and 3 days, so that we can investigate the forecast effects of stock price movements in the China A-share market under different data source combinations and forecasting horizons. It is concluded that the optimal data source combinations of active and inactive stocks are different. Active stocks achieve the highest accuracy when combining multiple non-traditional data sources, while inactive stocks obtain the highest accuracy when combining traditional data sources with non-traditional data sources. We further divide each stock into inactive periods, active periods and very active periods, and compare the forecast effects of the same stocks in different periods. We conclude that, for most combinations of data sources, the more active the stock is, the higher the accuracy we achieve, which indicates that our approach is more powerful for predicting the price movements of stocks in active and very active periods.
Article
The search for models to predict the prices of financial markets is still a highly researched topic, despite major related challenges. The prices of financial assets are non-linear, dynamic, and chaotic; thus, they are financial time series that are difficult to predict. Among the latest techniques, machine learning models are some of the most researched, given their capabilities for recognizing complex patterns in various applications. With the high productivity in the machine learning area applied to the prediction of financial market prices, objective methods are required for a consistent analysis of the most relevant bibliography on the subject. This article proposes the use of bibliographic survey techniques that highlight the most important texts for an area of research. Specifically, these techniques are applied to the literature about machine learning for predicting financial market values, resulting in a bibliographical review of the most important studies about this topic. Fifty-seven texts were reviewed, and a classification was proposed for markets, assets, methods, and variables. Among the main results, of particular note is the greater number of studies that use data from the North American market. The most commonly used models for prediction involve support vector machines (SVMs) and neural networks. It was concluded that the research theme is still relevant and that the use of data from developing markets is a research opportunity.
Article
Recently, due to their ability to deal with sequences of different lengths, neural networks have achieved great success in sentiment classification. They are widely used for this task, especially long short-term memory networks. However, one of the remaining challenges is to model long texts so as to exploit the semantic relations between sentences in document-level sentiment classification. Existing neural network models are not powerful enough to capture enough sentiment messages from relatively long time-steps. To address this problem, we propose a new neural network model (SR-LSTM) with two hidden layers. The first layer learns sentence vectors to represent the semantics of sentences with a long short-term memory network, and in the second layer, the relations of sentences are encoded in the document representation. Further, we also propose an approach to improve it, which first cleans the datasets and removes sentences with less emotional polarity to provide a better input for our model. The proposed models outperform the state-of-the-art models on three publicly available document-level review datasets.
Article
We explore the ability of sentiment metrics, extracted from micro-blogging sites, to predict stock markets. We also address sentiments’ predictive time-horizons. The data concern bloggers’ feelings about five major stocks. Taking independent bullish and bearish sentiment metrics, granular to two minute intervals, we model their ability to forecast stock price direction, volatility, and traded volume. We find evidence of a causal link from sentiments to stock price returns, volatility and volume. The predictive time-horizon is minutes, rather than hours or days. We argue that diverse and high volume sentiment is more predictive of price volatility and traded volume than near-consensus is predictive of price direction. Causality is ephemeral. In this sense, the crowd is more a hasty mob than a source of wisdom.
Chapter
Economics was conceived as early as the classical period as a science of causes. The philosopher–economists David Hume and J. S. Mill developed the conceptions of causality that remain implicit in economics today. This article traces the history of causality in economics and econometrics, showing that different approaches can be classified on two dimensions: process versus structural approaches, and a priori versus inferential approaches. The variety of modern approaches to causal inference is explained and related to this classification. Causality is also examined in relationship to exogeneity and identification.
Article
Given the close contact between international financial markets, the contagion effect across markets is becoming increasingly obvious. In this paper, which uses principal component analysis to build a Chinese stock market investor sentiment index and further applies a structural vector autoregression (SVAR) model, we analyze the contagion effect of international crude oil price fluctuations on Chinese stock market investor sentiment. The results show that international crude oil price fluctuations significantly Granger cause Chinese stock market investor sentiment; in the long term, if the international crude oil price fluctuates by 1%, stock market sentiment will negatively fluctuate 3.9400%. From the perspective of short-term efficacy, if the international crude oil price fluctuates by 1%, stock market investor sentiment in the same period will negatively fluctuate 1.0223%. International crude oil prices made a greater early contribution to investor sentiment and showed a rapid growth trend, with a contribution of 2.8076% in the first period and 8.1955% in the second. The growth rate then slows and eventually stabilizes at the 25% level; the average contagion delay for international crude oil price fluctuation to affect investor sentiment is 8 months.
Conference Paper
Much research has been done on stock market prediction. While many free and premium data sources are available today, and many new machine learning algorithms have been proposed, most of these studies do not focus on utilizing these data sources and algorithms. This paper proposes a stock market prediction service framework that allows users to choose different data sources and machine learning techniques. In particular, while most existing prediction approaches are based on neural networks, support vector machines or Naive Bayes, we illustrate the flexibility of our framework by including metric learning based methods to predict stock movement.
Article
Correlation between two variables or parameters plays a very significant role in statistics. Furthermore, the accuracy in the measurement of the correlation depends upon the data collected for the set of discourse. It is quite evident that in many cases the data collected for various statistical measures are full of uncertainties. A dual hesitant fuzzy set (DHFS) is a generalized form of a hesitant fuzzy set (HFS) and negates the effects of uncertainty inherent in the collected data. In the present paper, the concept of HFS has been replaced with DHFS and the correlation between two DHFSs is obtained. A formula for the correlation coefficient between two DHFSs has been derived. The proposed method was used to determine the coefficient of correlation between different parameters of water in four different lakes in Rajasthan, India.