Article
To read the full-text of this research, you can request a copy directly from the authors.

Abstract

Predicting returns in the stock market is usually posed as a forecasting problem where prices are predicted. Intrinsic volatility in the stock market across the globe makes the task of prediction challenging. Consequently, forecasting and diffusion modeling undermines a diverse range of problems encountered in predicting trends in the stock market. Minimizing forecasting error would minimize investment risk. In the current work, we pose the problem as a direction-predicting exercise signifying gains and losses. We develop an experimental framework for the classification problem which predicts whether stock prices will increase or decrease with respect to the price prevailing n days earlier. Two algorithms, random forests and gradient boosted decision trees (using XGBoost), facilitate this connection by using ensembles of decision trees. We test our approach and report the accuracies for a variety of companies as improvement over existing predictions. A novelty of the current work is the selection of technical indicators and their use as features, with high accuracy for medium to long-run prediction of stock price direction.
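The direction-prediction target described in the abstract can be illustrated with a minimal sketch. The function name `direction_labels`, the +1/-1 encoding, and the choice to count an unchanged price as a decrease are assumptions made for illustration; they are not taken from the paper.

```python
def direction_labels(prices, n):
    """Label day t with +1 if the price n days later is higher, else -1.

    Assumption: ties (no price change) are labelled -1.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of days")
    # The last n days have no price n days ahead, so they get no label.
    return [1 if prices[t + n] > prices[t] else -1
            for t in range(len(prices) - n)]


# Hypothetical closing prices, for illustration only.
closes = [100.0, 101.5, 99.0, 102.0, 103.5, 98.0]
labels = direction_labels(closes, n=2)
print(labels)
```

Labels built this way could then serve as the target of a classifier such as a random forest, with technical indicators computed up to day t as the features.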

... In this section, data input, machine learning techniques, long-short-term memory, and related work are presented. Basak et al. (2019) divided the data input methods into two categories, including structured and unstructured data. The structured input data include daily stock information, technical and economic indicators, and financial reports. ...
... By predicting the Kuala Lumpur Composite Index, the persistent homogeneity analysis of the input data is helpful to increase the possibility of stock prediction. Basak et al. (2019) performed stock market analysis by using three different machine learning classifiers. They believe that the difference in company background also affects the accuracy of the prediction. ...
Article
Full-text available
Since the beginning of stock trading, investors and researchers have tried to find effective ways to predict the direction of stock prices on the next day. However, predicting stock prices is a hard task because there exist many factors that may affect the next day’s stock prices. Recently, investors and researchers have adopted machine learning techniques with technical indicator analysis to make predictions. But the prediction accuracy is unsatisfactory. Thus, this study aims to examine the problem of stock price prediction with time series and proposes an effective way to filter out the datasets, which consists of three key steps: First, a time series model with long-short-term memory (LSTM) was used to identify the possible problems and solve them in the process of stock price prediction. Second, the results of LSTM were adopted to influence the stock trend prediction by a variety of machine learning techniques, including random forest, support vector machine, and light gradient boosting machine (LightGBM). Third, a novel method was proposed to select suitable datasets through buying or selling. Hereby, FTSE TWSE Taiwan 50 Index stocks were used as training and testing datasets, respectively. Some important days were selected for prediction and decision making. The experimental results show that the highest prediction accuracy of 86% and the average prediction accuracy of 82% have been obtained. Consequently, compared with other existing methods, the accuracy of predicting stock price trend with our proposed approach has been significantly improved. Reference No.: 2024-UAAI-0010R1/243667820 Subject Area: Intelligent Investment System
... This underscores that forecasting green asset prices is imperative for making well-informed investment decisions about this growing asset class. There is a growing literature showing that forecasting stock price direction achieves higher accuracy than predicting actual stock prices (Pönkä, 2016; Basak et al., 2019; Lohrmann and Luukka, 2019). Moreover, existing studies have explored various macroeconomic factors that deeply influenced the performance of green asset prices. ...
... Accurate price predictions in these areas are essential for making informed investment decisions and aligning portfolios with environmentally and socially conscious goals. Traditional forecasting methods have their limitations, but the integration of advanced ML techniques has emerged as a powerful tool to automate feature selection and capture complex relationships, offering improved accuracy and decision-making support (Basak et al., 2019; Ampomah et al., 2020; Ghosh et al., 2022). In fact, Mehrjoo et al. (2014) aim to develop a mechanism for deriving efficient frontier (EF) on the basis of the new model. ...
Article
Purpose: This study investigates clean energy, commodities, green bonds and environmental, social and governance (ESG) index prices forecasting and assesses the predictive performance of various factors on these asset prices, used for the development of a robust forecasting support decision model using machine learning (ML) techniques. More specifically, we explore the impact of the financial stress on forecasting price.
Design/methodology/approach: We utilize feature selection techniques to evaluate the predictive efficacy of various factors on asset prices. Moreover, we have developed a forecasting model for these asset prices by assessing the accuracy of two ML models: specifically, the deep learning long short-term memory (LSTM) neural networks and the extreme gradient boosting (XGBoost) model. To check the robustness of the study results, the authors referred to bootstrap linear regression as an alternative traditional method for forecasting green asset prices.
Findings: The results highlight the significance of financial stress in enhancing price forecast accuracy, with the financial stress index (FSI) and panic index (PI) emerging as primary determinants. In terms of the forecasting model's accuracy, our analysis reveals that the LSTM outperformed the XGBoost model, establishing itself as the most efficient algorithm among the two tested.
Practical implications: This research enhances comprehension, which is valuable for both investors and policymakers seeking improved price forecasting through the utilization of a predictive model.
Originality/value: To the authors' best knowledge, this marks the inaugural attempt to construct a multivariate forecasting model. Indeed, the development of a robust forecasting model utilizing ML techniques provides practical value as a decision support tool for shaping investment strategies.
... Moreover, the famous Efficient Market Hypothesis (EMH) stipulates that stock prices already contain and reflect all available information, and therefore in theory there exist no techniques that can produce excess economic profits in the long run [3]. For a long time, there has been a debate about whether the daily stock price is predictable, given its intrinsically chaotic, non-parametric properties [4]. The once-dominant theory of EMH met with skepticism when more and more economists came to believe that at least some predictable patterns that could lead to excess market profit exist [4]. ...
... For a long time, there has been a debate about whether the daily stock price is predictable, given its intrinsically chaotic, non-parametric properties [4]. The once-dominant theory of EMH met with skepticism when more and more economists came to believe that at least some predictable patterns that could lead to excess market profit exist [4]. Furthermore, significant incidents in the financial sector have brought the spotlight onto the significance of diversity. ...
Article
Full-text available
Portfolio optimization is a long-standing topic in the financial field, which aims to maximize returns while minimizing risks. It is widely used in production and daily life, such as stock investment, production optimization, and engineering models. This article selects data from 10 stocks, namely KO, PG, PEP, CL, MDLZ, STZ, PM, KMB, GIS, and K, from January 24, 2024, to June 14, 2024. Firstly, this article uses the ARIMA model to learn the first 70% of the data and predict the last 30% of the data, and then uses portfolio strategy to evaluate the results. This article obtained the maximum Sharpe ratio model on the Monte Carlo efficient frontier. By comparing the maximum Sharpe ratio model with the S&P 500 index within the same time frame, this article concludes that the market performs better. This article provides different portfolio allocation strategies for risk averse investors seeking stable positive returns in turbulent markets.
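As a rough illustration of a Monte Carlo search for a maximum Sharpe ratio portfolio, the sketch below samples random long-only weights and keeps the best one. It is not the cited article's procedure: the function names, the per-period (non-annualized) Sharpe ratio, the zero risk-free rate, and the toy return data are all assumptions.

```python
import random
import statistics


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return over its sample std dev."""
    excess = [r - risk_free for r in returns]
    sd = statistics.stdev(excess)
    return statistics.mean(excess) / sd if sd > 0 else float("-inf")


def portfolio_returns(asset_returns, weights):
    """asset_returns holds one row per period, with one return per asset."""
    return [sum(w * r for w, r in zip(weights, row)) for row in asset_returns]


def max_sharpe_monte_carlo(asset_returns, trials=2000, seed=0):
    """Sample random long-only weights (summing to 1) and keep the best."""
    rng = random.Random(seed)
    n_assets = len(asset_returns[0])
    best_weights, best_sharpe = None, float("-inf")
    for _ in range(trials):
        raw = [rng.random() for _ in range(n_assets)]
        total = sum(raw)
        weights = [x / total for x in raw]
        s = sharpe_ratio(portfolio_returns(asset_returns, weights))
        if s > best_sharpe:
            best_weights, best_sharpe = weights, s
    return best_weights, best_sharpe


# Hypothetical per-period returns for three assets, for illustration only.
toy_returns = [
    [0.010, 0.004, -0.002],
    [-0.005, 0.006, 0.003],
    [0.012, -0.001, 0.004],
    [0.003, 0.005, -0.001],
]
weights, sharpe = max_sharpe_monte_carlo(toy_returns)
print(weights, sharpe)
```

Random search only approximates the optimum; a quadratic-programming solver would locate the maximum Sharpe portfolio more precisely.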
... Compared with other statistical methods, random forest can significantly reduce the probability of misidentifying a normal company as a fraudulent company. The random forest model can also evaluate the importance of variables and find that the asset-liability ratio (DEQUTY) is the most important variable, which is of great significance for capital formation [4]. Moreover, in the article Predicting the direction of stock market prices using random forest, the authors propose to treat the stock market prediction problem as a classification problem and use the random forest algorithm in ensemble learning to minimize the prediction error [4]. ...
... The random forest model can also evaluate the importance of variables and find that the asset-liability ratio (DEQUTY) is the most important variable, which is of great significance for capital formation [4]. Moreover, in the article Predicting the direction of stock market prices using random forest, the authors propose to treat the stock market prediction problem as a classification problem and use the random forest algorithm in ensemble learning to minimize the prediction error [4]. In addition, the paper titled A study on predicting loan default based on the random forest algorithm uses Lending Club's loan data set in the first quarter of 2019, deals with the data imbalance by the feature engineering and SMOTE methods, and then constructs the loan default prediction model by using the Random Forest algorithm. ...
Article
Full-text available
Accurately predicting the movement of stock prices can help people make more informed investment decisions and thus obtain higher returns. They can also assess market trends, develop investment strategies and provide investment advice. In this paper, we used 5 models including Random Forest, XGBoost, ANN, RNN, LSTM to predict and verify the fit of 3 companies (AMZN, BABA and MSFT). It is found that LSTM and random forest model can predict well in most cases. The development of the financial industry does have some shortcomings, and the future financial field will be a field full of challenges and opportunities, so some machine learning and deep learning methods can be used to solve the prediction and modeling problems of financial aspects such as the stock market.
... The research explores the challenges of predicting stock prices amidst market volatility by framing the problem as a directional prediction exercise, leveraging classification algorithms, random forests, and gradient-boosted decision trees. The incorporation of technical indicators and feature engineering contributes to enhanced accuracy, particularly for medium to long-term predictions, reinforcing the importance of refining models to tackle intricate market dynamics [27]. A proposed hybrid GA-XGBoost prediction system further supports the need for advanced feature engineering, data preparation, and optimal feature selection to improve prediction performance, emphasizing the ongoing necessity for model development in addressing complex financial data characteristics [28]. ...
Article
Investment in the capital market has become a lifestyle for millennials in Indonesia, as seen from the increasing number of SID (Single Investor Identification) from 2.4 million in 2019 to 10.3 million in December 2022. The increase is due to various reasons, starting from the Covid-19 pandemic, which limited the space for social interaction, and the easy way to invest in the capital market through various e-commerce platforms. These investors generally use fundamental and technical analysis to maximize profits and minimize the risk of loss in stock investment. These methods may lead to problems where subjectivity and different interpretations appear in the process. Additionally, these methods are time consuming due to the need for deep research on financial statements, economic conditions and company reports. Machine learning that utilizes historical stock price data, which is time-series data, is one of the methods that can be used for stock price forecasting. This paper proposes XGBoost optimized by Particle Swarm Optimization (PSO) for stock price forecasting. XGBoost is known for its ability to make predictions accurately and efficiently. PSO is used to optimize the hyper-parameter values of XGBoost. Optimizing the hyper-parameters of the XGBoost algorithm using the Particle Swarm Optimization (PSO) method achieved the best performance when compared with standard XGBoost, Long Short-Term Memory (LSTM), Support Vector Regression (SVR) and Random Forest. The RMSE, MAE and MAPE results show the lowest values for the proposed method, which are 0.0011, 0.0008, and 0.0772%, respectively. Meanwhile, the reaches the highest value. It is seen that the PSO-optimized XGBoost is able to predict the stock price with a low error rate, and can be a promising model to be implemented for stock price forecasting. This result shows the contribution of the proposed method.
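The three error metrics named in this abstract follow standard definitions, sketched below. The function names and toy numbers are illustrative assumptions, and `mape` assumes no actual value is zero.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values nonzero)."""
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


# Hypothetical prices and forecasts, for illustration only.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
print(rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

RMSE and MAE depend on the scale of the data, so they are only comparable across models evaluated on the same (identically scaled) series, whereas MAPE is scale-free.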
... Sentiment analysis of news articles, social media, or financial reports can provide insights into market sentiment and investor behavior, offering an additional layer of information. can also provide unique features that may reveal emerging trends or anomalies in market activity. The iterative nature of feature engineering is another important consideration (Basak et al., 2019). As and continuously evolving, the relevance of features may change over time. ...
Chapter
Full-text available
Machine learning has rapidly transformed various sectors, and the stock market is no exception. Traditionally dominated by human intuition and fundamental analysis, stock market forecasting has increasingly integrated sophisticated algorithms and data-driven for every scenario. In the stock market, forecasting aims to predict future price movements or trends based on historical data and other relevant inputs. Machine learning enhances this process by leveraging algorithms that, correlations, and anomalies within large datasets. Unlike traditional statistical methods, which may rely on linear assumptions and require extensive domain expertise, machine learning to new data more flexibly. These models include various techniques such as which identifies hidden structures or patterns in unlabeled data. Supervised learning models, such as linear regression, decision trees, and neural networks, have gained prominence in stock market forecasting. Linear regression predicts future values based on linear relationships between variables.
... These particular algorithms for example, networks of neurones called "neural networks"; logistics regression, which generates a binary value to forecast the stock's future movement K-Nearest Neighbour, which employs SVMs, probability, and retrieves the k nearest data points using distance measurements. Random forest is our preferred method going on to SVMs [23]. Alternative methods for training adversarial networks have included using Ensemble LSTM with CNN on stock indices. ...
Conference Paper
Full-text available
These days, the majority invest their money in financial products. Every country on Earth is affected by the financial operations. A stock market is any platform that enables the issuance and businesses of stocks in an organized fashion, this can also occur physically or digitally. Uncertainty is introduced into stock prices by the numerous rules and regulations that regulate the financial processes inherent to the stock market. There are many risks in the financial market, but at the same time, it also offers many opportunities for profits. The proposed system consists of the following steps: preprocessing, feature selection, and model training. Data preparation is all about the enhancement of the selected data. Feature selection in stock market movements forecasting is a process of selecting the relevant features/variables from the dataset. It helps to reduce dimensionality, prevents overfitting, and improves forecast
... Tan et al. [8], using Chinese stock market data, selected two feature spaces for price trend prediction, assessing the robustness of the RF model and verifying the potential for significant excess returns in these paradigms. Basak et al. [9] concentrated on the classification problem of stock price prediction, employing random forests and gradient-boosted decision trees to create a framework for predicting stock price direction, which successfully demonstrated efficiency. Moreover, their selection of indicators and features contributed to improved accuracy in medium- and long-term stock price predictions. ...
Article
Full-text available
Issues related to asset pricing and return forecasting have consistently been prominent research topics in both the financial industry and academia. With the rapid advancement of science, technology, and artificial intelligence, machine learning has garnered extensive attention from scholars for its robust self-learning and adaptive capabilities, as well as its exceptional advantages in large-scale data processing. This paper provides a detailed overview of the application of emerging machine learning methods in asset pricing and offers an in-depth analysis of machine learning's contributions to this field based on existing research. This research finds that machine learning has introduced profound changes and progressive innovations in asset pricing models. Additionally, this paper examines the limitations of machine learning methods, objectively highlighting current shortcomings such as poor interpretability and model overfitting. Finally, this paper proposes future directions for improving machine learning models to advance the financial field further.
... Machine learning algorithms have gained significant traction in stock price prediction due to their ability to handle complex, non-linear relationships in financial data. Basak et al. (2019) used random forests and gradient-boosted decision trees (XGBoost) to predict the direction of stock market prices, achieving high accuracy for medium to long-run predictions. Bazrkar and Hosseini (2023) utilized PSO parameter adjustment to enhance stock market prediction using a Support Vector Machine (SVM). ...
Article
Full-text available
Stock price prediction remains a complex challenge in financial markets. This study introduces a novel Long Short-Term Memory (LSTM) model optimized by Sand Cat Swarm Optimization (SCSO) for stock price prediction. The research evaluates multiple algorithms including ANN, LSTM variants, Auto-ARIMA, Gradient Boosted Trees, DeepAR, N-BEATS, N-HITS, and the proposed LSTM-SCSO using DAX index data from 2018 to 2023. Model performance was assessed through Mean Squared Error, Mean Absolute Error, Mean Absolute Percentage Error, and out-of-sample R² metrics. Statistical significance was validated using Model Confidence Set analysis with 5000 bootstrap replications. Results demonstrate LSTM-SCSO's superior performance across all evaluation metrics. The model achieved an annualized return of 66.25% compared to the DAX index's 47.45%, with a Sharpe ratio of 2.9091. The integration of technical indicators and macroeconomic variables enhanced the model's predictive capabilities. These findings establish LSTM-SCSO as an effective tool for stock price prediction, offering practical value for investment decision-making.
... Machine learning technology is the current frontier technology in the field of computing, which has a strong ability to capture and analyze data [18]. Currently, machine learning has been increasingly and deeply applied in stock prediction research such as tree models [19], support vector machines (SVM) [20,21], artificial neural networks (ANN) [22,23], and convolutional neural networks (CNN) [24] and so on. All these machine learning models are capable of handling high-dimensional nonlinear data. ...
Article
Full-text available
Stock trading signal prediction is very important for investors’ trading decisions. However, since the stock market is a complex and nonlinear system, stock trading is frequent and complex. Human beings cannot integrate all the relevant information in time and make the right decisions by their brains alone. Machine learning can mimic the brain, learn from experience, and discover the connection between different things, thus realizing correct prediction and decision-making. Therefore, this study proposes a fusion of interpretable embedded multicriteria feature cross-selection engineering to capture effective features. Meanwhile, an optimized neural network prediction model is proposed in which the Bayesian optimization (BO) algorithm searches for hyperparameter combinations. The methods are as follows: (1) Daily stock prices are categorized into four types of key points for stock trading signals based on the time series extreme point algorithm. (2) A more comprehensive range of impact factors is constructed. Starting from the stock’s historical trading data, based on the stock’s trend, volatility, and turnover flow, five categories of technical indicators are constructed: Overlap Study, Momentum Indicator, Momentum Indicator, Volatility Indicator, and Price Conversion. (3) A feature cross-selection method with multiple feature screening criteria is constructed to find the optimal feature influencing factors from different evaluation dimensions. (4) The hyper-parameters of the Artificial Neural Network (ANN) are optimized using the Bayesian optimization algorithm. The optimized ANN is then used to model the data and obtain predictions. Twenty stocks were randomly selected from the Shanghai Stock Exchange and Shenzhen Stock Exchange as experimental data to verify the validity of the model. The accuracy of the model proposed in this paper is 54.83%, 55.46%, and 54.70% for stocks with upward, steady, and downward trends respectively. The accuracy is on average 7.93%, 8.09%, and 8.09% higher than the comparison model. The return on investment through the predicted results of the model is 21.87%, 7.76%, and −3.51% respectively, which is better than the other comparative models. It can be seen from the experiments that the feature cross-selection method with multi-feature screening criteria can help the model to better find the optimal feature influencing factors, which helps to improve the accuracy of prediction. The Bayesian optimization algorithm contributes to the performance improvement of the ANN. After modeling the features using the Bayesian optimized ANN, the stock trading signal prediction model proposed in this paper is significantly better than other prediction models.
... Precise information, such as companies' stock prices, is necessary for businesses and investors to make informed investment decisions; therefore this topic is crucial. Their understanding of potential returns and risks is aided by stock price predictions [8]. Controlling share price volatility is important for a company, as it affects investment strategies and company selection for investment purposes [9,10]. ...
Article
The stocks of nickel mining companies in Indonesia are among the most influential stocks in the global economy. Understanding their price movements is complicated and requires advanced processing techniques. This research provides investors with better information regarding nickel mining companies and prediction techniques using prediction models, allowing investors to create more successful strategies with lower investment risks, and thus allowing small investors to compete with institutional players. This study focuses on comparing the accuracy of two stock prediction models, i.e. Bidirectional Long Short-Term Memory (BiLSTM) and Random Forest (RF), in predicting the stock prices of a nickel mining company in Indonesia, Antam (Aneka Tambang), using a dataset from Yahoo Finance with several forecast time frames for comparison. The evaluation metrics we use to compare the accuracy of the two models are Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE). The results of this study showed that the BiLSTM model has a lower error compared to the RF model, although in terms of training time and computational resource usage, the RF model shows a much better efficiency than the BiLSTM model. The RMSE shows that the BiLSTM model error is 0.0568633, while the RF model error is 0.0746966 for the short forecast time frame. For the long forecast time frame, the BiLSTM model error is 0.0753416, while the RF model error is 0.1549706. The RF model's performance deteriorates significantly as the forecast time frame lengthens, indicating a strong inverse correlation between the model's accuracy and the length of the forecast time window. The results demonstrate the exceptional capability of the BiLSTM model in making the right decision in predicting the stock price of nickel mining companies.
... Research by Baur, Hong, and Lee (2018) indicates that the volatility of cryptocurrencies is higher than that of traditional assets, which can lead to substantial gains or losses. This volatility is often driven by market sentiment, heavily influenced by news and media coverage (Khaidem, Saha, and Dey 2016). ...
Article
Full-text available
This study represents a pioneering investigation into cryptocurrency news's repercussions on publicly traded companies' corporate earnings. Leveraging advanced Generative AI (GenAI) models and the BERT framework for sentiment analysis, we meticulously integrated comprehensive data from the Financial Modeling Prep API to employ a rigorous event study methodology alongside advanced machine learning algorithms. Noteworthy insights were derived from the BERT model, shedding light on the rationales behind abnormal returns and facilitating an in-depth analysis of material and immaterial impacts. The study's findings underscore the significant impact of both positive and negative cryptocurrency news on cumulative abnormal returns (CAR), particularly within firms deeply entrenched in crypto activities. Notably, deliberate news, including official announcements, exerts a more pronounced influence than unintentional market reactions. This innovative approach furnishes actionable insights for financial services, investment management, and corporate communication, providing a framework for enhancing predictive models, investment decisions, and risk management strategies.
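The cumulative abnormal return (CAR) used in event studies can be sketched as follows. This is a generic market-model illustration rather than the cited study's pipeline: the function name, the default `alpha=0.0` and `beta=1.0` (which reduce to a simple market-adjusted return), and the toy numbers are assumptions.

```python
def cumulative_abnormal_return(stock_returns, market_returns,
                               alpha=0.0, beta=1.0):
    """Sum of abnormal returns over an event window.

    Abnormal return = actual return - (alpha + beta * market return).
    In practice alpha and beta are estimated on a pre-event window; the
    defaults here are placeholders, not estimates.
    """
    abnormal = [r - (alpha + beta * m)
                for r, m in zip(stock_returns, market_returns)]
    return sum(abnormal)


# Hypothetical daily returns over a three-day event window.
stock = [0.02, -0.01, 0.03]
market = [0.01, 0.00, 0.01]
print(cumulative_abnormal_return(stock, market))
```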
... Similarly, Patel et al. (2015) demonstrate that Random Forest and Gradient Boosting Machines outperform Artificial Neural Networks (ANN) and SVM in predicting stock market index movements. Additionally, Khaidem et al. (2016) apply Random Forest for stock market prediction and observe superior performance compared to Naive Bayes and SVM. Long Short-Term Memory (LSTM) networks, a type of recurrent neural network capable of capturing long-term dependencies, have shown promise in stock market forecasting. ...
Article
Full-text available
Modelling water futures is challenging due to the dynamics of several variables and non-linearity. The traditional models are often inefficient to capture such hidden patterns. Thus, utilizing datasets that include daily price and volume information of water futures, this study evaluates the performance of SVM, Random Forest Regressor, LSTM, GBM, and XGBoost models. The findings indicate that XGBoost outperforms other models in accuracy; however, all models tested perform well and provide accurate predictions. Consequently, it makes valuable contributions to the field of modelling financial products based on water-adjacent entities. Furthermore, it underscores the potential for machine learning in enhancing market predictions for emerging and niche sectors that have thus far been overlooked by market participants but will increasingly become more important as climate change progresses. Lastly, as futures markets for water as a commodity mature (with higher trading volume of futures contracts), this study will serve as a benchmark for better modelling of water futures.
... To overcome these challenges, Machine Learning (ML) models have gained prominence in financial time series forecasting due to their ability to learn from data, interpretability and lack of assumptions explained by Makridakis et al. (2018). Various ML models, including Artificial Neural Networks (ANNs)/Multilayer Perceptron (MLP) [Haykin (2009)], Support Vector Regression (SVR) [Henrique et al. (2018)], Random Forest (RF) [Nti et al. (2019)], eXtreme Gradient Boosting (XGBoost) [Basak et al. (2019)] and ensemble models such as stacking [Jiang et al. (2020)] and bagging [Wang et al. (2009)] have been utilized in financial time series forecasting. ML models, being data-driven and adaptable, offer advantages over traditional model-based approaches. ...
Article
Full-text available
Accurate prediction of agricultural prices is crucial due to their complex and nonlinear nature. Due to the perishable nature of TOP (Tomato, Onion and Potato) vegetable produce, prices fluctuate based on supply and demand. It is necessary to forecast harvest period TOP prices, so growers can make informed production decisions and farmers can plan their market situation to enhance their profits. This research introduces novel Deep Learning (DL) models based on hidden states to enhance the precision of TOP price forecasting. The Hidden Markov Model (HMM) is employed to identify hidden states and uncover underlying patterns in TOP price data. The hidden states identified by HMM serve as a feature extraction technique and are utilized in four DL models, viz., Multilayer Perceptron (MLP), Recurrent Neural Networks (RNN), Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTM). The integration of HMM with DL aims to improve forecasting accuracy compared to HMM and traditional DL models. The models are evaluated using a real dataset from Azadpur Mandi in Delhi, providing practical insights into forecasting accuracy. The performance of the models is evaluated using standard metrics such as Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE) and Mean Absolute Error (MAE). Additionally, the Diebold-Mariano (DM) test has been conducted to compare the accuracy of the proposed approach with baseline DL models. The findings demonstrate that the hybrid approach of Hidden Markov (HM) combined with DL models yields superior forecasting performance compared to existing models.
... Since it is the most basic model of ML, there are plenty of studies implementing Linear Regression to predict specific stock data (Deenadayalan and Vaishnavi, 2021) or general movements caused by macroeconomic changes (Smith and Rajan, 2017). When it comes to Random Forest, direction stock market prices (Khaidem et al., 2016). Tanuwijaya and Hansun (2019) used a KNN model to forecast the LQ45 stock index. ...
Chapter
Forecasting the future direction of stock indices has received significant attention from researchers and investors. Due to the complexity of information, it is very difficult to predict future stock market price behavior. In this paper, we determine the best machine learning model by forecasting the Borsa Istanbul Stock Exchange participation index with Artificial Intelligence (AI). Six different machine learning algorithms are used to predict the prices of a participation index: Linear Regression, LSTM, KNN, Auto-ARIMA, Gradient Boosting and Random Forest. Models were built by using the closing rates of the Participation Index between November 2015 and June 2020, and the last 30 days' rate was forecasted. As a main finding, the best model was determined according to the accuracy results of the various models. Of the six different machine learning models, the LSTM model provides the most accurate result. This is one of the first studies on predicting a participation index by applying six different machine learning models.
... For example, a study by Dietterich et al. showed that support vector machine (SVM) models based on technical indicators have high accuracy in predicting stock movements. The study utilized a variety of technical indicators, such as Relative Strength Index (RSI) and Smoothed Moving Average (SMA), to construct feature sets to predict the direction of the stock prices of Apple and other companies, and achieved good results [5][6]. ...
Article
Full-text available
This paper investigates the performance of a variety of machine learning models for Apple stock price prediction, covering linear regression, ridge regression, Lasso regression, SVM, weighted average, stacked model, and random forest methods. The dataset contains daily closing prices of Apple stock for the period from 2010 to 2024, using data from the first 13 years for model training and data from the last year for testing. The study results show that, after repeated parameter tuning and testing, all models except Random Forest exhibit good prediction results, with simple models such as linear regression and ridge regression performing particularly well with fewer features, while the Random Forest model exhibits severe overfitting/underfitting problems. This study provides an empirical reference for the application of machine learning in financial time series forecasting, which can help to improve financial forecasting ability and investment decision-making in the future.
... To address this limitation, data-driven approaches, including machine learning and deep learning, have been considered. Among machine learning methods, support vector machines (SVMs) [18] and decision trees [19] have demonstrated superior performance compared to traditional approaches in handling non-linearity, high-dimensional data, and small sample sizes. Furthermore, deep learning models inspired by biological systems have gained prominence in financial forecasting. ...
Preprint
The significant fluctuations in stock index prices in recent years highlight the critical need for accurate forecasting to guide investment and financial strategies. This study introduces a novel composite forecasting framework that integrates variational mode decomposition (VMD), PatchTST, and adaptive scale-weighted layer (ASWL) to address these challenges. Utilizing datasets of four major stock indices--SP500, DJI, SSEC, and FTSE--from 2000 to 2024, the proposed method first decomposes the raw price series into intrinsic mode functions (IMFs) using VMD. Each IMF is then modeled with PatchTST to capture temporal patterns effectively. The ASWL module is applied to incorporate scale information, enhancing prediction accuracy. The final forecast is derived by aggregating predictions from all IMFs. The VMD-PatchTST-ASWL framework demonstrates significant improvements in forecasting accuracy compared to traditional models, showing robust performance across different indices. This innovative approach provides a powerful tool for stock index price forecasting, with potential applications in various financial analysis and investment decision-making contexts.
... This system uses two SVM estimators to predict the upper and lower limits of price oscillations, then constructs a trading strategy based on these constrained predictions, achieving leading results on the S&P 500 dataset. At the same time, other machine learning methods such as k-nearest neighbours (KNN) [22], random forests (RF) [23], and decision trees (DT) [24] are also frequently applied in stock market forecasting. Compared to traditional statistical models, machine learning demonstrates a stronger nonlinear modelling capability, enabling it to capture complex dynamic patterns in stock price time series effectively. ...
... Besides LSTM, we also compared multiple regression models for stock index prediction. The regression methods used include Lasso, ElasticNet, RandomForestRegressor, LinearRegression, Ridge, SGDRegressor, KNeighborsRegressor, DecisionTreeRegressor, and BayesianRidge [41,42,43]. ...
Preprint
Full-text available
We propose and experimentally demonstrate an innovative stock index prediction method using a weighted optical reservoir computing system. We construct fundamental market data combined with macroeconomic data and technical indicators to capture the broader behavior of the stock market. Our approach shows significantly higher performance than state-of-the-art methods such as linear regression, decision trees, and neural network architectures including long short-term memory. It captures the market's high volatility and nonlinear behaviors well despite limited data, demonstrating great potential for real-time, parallel, multi-dimensional data processing and predictions.
... For example, L Khaidem et al. successfully used ensemble learning to improve the performance of random forest for stock prediction. As for logistic regression, J Gong and S Sun introduced innovative feature index variables into the prediction model and proposed a special optimization process to select optimizing regression parameters [10][11]. In 2012, A Upadhyay et al. used seven independent financial ratios to construct a Multinomial Logistic Regression method [12]. ...
Article
Full-text available
Due to its chaotic and highly volatile character, as well as many other real-world uncertainties, predicting stock market prices is always a challenging goal. Because of these characteristics, the task can be regarded as a classification problem, to which many machine learning methods can be applied. Among these methods, the deep neural network has been a popular and widely noted one in recent years. This is mainly because of its advantages over more conventional machine learning methods: its highly complex nonlinearity and deep nonlinear topologies allow it to describe complex situations appropriately. Later, the advantages of reinforcement learning were added to improve feature dimensions, and the deep reinforcement learning method was proposed to improve performance. Deep reinforcement learning combines the advantages of deep learning and reinforcement learning; this paper discusses its characteristics and advantages, and finally its limitations and future development.
... Recent research has identified momentum, economic, and psychological behaviour factors as viable sources for stock price forecasting (Gray and Vogel, 2016;Lo et al., 2000;Christoffersen and Diebold, 2006;Moskowitz et al., 2012). Furthermore, recent research on stock price predictability suggests that forecasting stock price direction is more successful than forecasting actual stock prices (Basak et al., 2019;Nyberg, 2011;Nyberg and Pönkä, 2016;Ballings et al., 2015;Lohrmann and Luukka, 2019). ...
... A Random Forest is a collection of decision trees constructed mathematically, with each tree having a random sample of data and a random subset of attributes. The aggregate of the predictions from each individual tree is the final forecast, which is usually determined by average (for regression) or by a majority vote (for classification) [8]. Random Forests offer remarkable advantages including exceptional predictive accuracy, reduced risk of overfitting through aggregation, adeptness with large datasets and feature-rich data, feature importance insights, and resilience to noisy data. ...
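The mechanism described in this snippet (a bootstrap sample of rows and a random subset of attributes per tree, then a majority vote) can be sketched in a few lines of plain Python. This is only an illustrative toy under stated assumptions, not the method of any cited paper: each "tree" is a one-split stump with a midpoint threshold, the attribute subset has size one, and the names `fit_stump`, `fit_forest` and `predict_forest` are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Fit a one-split 'tree' on one feature (assumed simplification).

    The threshold is the midpoint between the smallest and largest value
    of the feature in the sample; each side predicts its majority label.
    """
    values = [r[feature] for r in rows]
    threshold = (min(values) + max(values)) / 2
    left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
    fallback = Counter(labels).most_common(1)[0][0]
    left_label = Counter(left).most_common(1)[0][0] if left else fallback
    right_label = Counter(right).most_common(1)[0][0] if right else fallback
    return feature, threshold, left_label, right_label


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(n_features)         # random attribute subset of size 1
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest


def predict_forest(forest, row):
    """Aggregate the individual predictions by majority vote (classification)."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]
```

For regression, the vote in `predict_forest` would be replaced by the average of the individual tree outputs. In practice a library implementation with full decision trees would be used instead of stumps.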
Article
Full-text available
The inherent uncertainties of market dynamics, such as economic data, geopolitics, and natural calamities, make stock market prediction extremely difficult. One increasingly effective method for handling this complexity is machine learning. Using data from the world's largest e-commerce and technology company, Amazon, this study concentrated on supervised machine learning models for stock market prediction. The most successful model was Support Vector Machine (SVM), which achieved an amazing prediction accuracy of 89.11%. Furthermore, Principal Component Analysis (PCA) significantly improved Random Forest's accuracy, enhancing it from 75.25% to 87.13%. In addition, the results show that the SVM outperforms the random forest regardless of whether PCA is applied. These results underscore SVM's importance in stock price prediction and PCA's value in enhancing Random Forest's performance. This research provides valuable insights into machine learning's role in financial forecasting, empowering investors and decision-makers to make informed choices in the ever-evolving stock market landscape.
Article
Full-text available
Stock trend prediction is a significant challenge due to the inherent uncertainty and complexity of stock market time series. In this study, we introduce an innovative dual-branch network model designed to effectively address this challenge. The first branch constructs recurrence plots (RPs) to capture the nonlinear relationships between time points from historical closing price sequences and computes the corresponding recurrence quantification analysis measures. The second branch integrates transposed transformers to identify subtle interconnections within the multivariate time series derived from stocks. Features extracted from both branches are concatenated and fed into a fully connected layer for binary classification, determining whether the stock price will rise or fall the next day. Our experimental results based on historical data from seven randomly selected stocks demonstrate that our proposed dual-branch model achieves superior accuracy (ACC) and F1-score compared to traditional machine learning and deep learning approaches. These findings underscore the efficacy of combining RPs with deep learning models to enhance stock trend prediction, offering considerable potential for refining decision-making in financial markets and investment strategies.
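To make the recurrence-plot idea in this abstract concrete, the following is a minimal sketch of a thresholded recurrence matrix and one recurrence quantification measure, the recurrence rate. It is an illustration under assumptions, not the authors' pipeline: the series is compared point by point without any time-delay embedding, and the function names and the threshold `eps` are hypothetical.

```python
def recurrence_plot(series, eps):
    """Binary recurrence matrix: entry (i, j) is 1 when points i and j are within eps.

    Assumption: scalar observations are compared directly; an actual study
    may first embed the series into state vectors.
    """
    n = len(series)
    return [
        [1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


def recurrence_rate(rp):
    """Fraction of recurrent points in the matrix (main diagonal included)."""
    n = len(rp)
    return sum(sum(row) for row in rp) / (n * n)


closing_prices = [1.0, 1.1, 2.0, 1.05]  # made-up values for illustration
rp = recurrence_plot(closing_prices, eps=0.2)
print(recurrence_rate(rp))
```

The matrix is symmetric with ones on the diagonal, so it can be treated as an image and passed to an image-style feature extractor.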
Preprint
Full-text available
The rapid economic growth in recent years has led to a surge in stock market participation, necessitating accurate stock price prediction to mitigate investment risks and maximize returns. However, stock price’s dynamic nature and intrinsic volatility pose significant challenges to traditional statistical and machine learning (ML) models, which often struggle with overfitting, poor robustness, and limited generalization. To address these challenges, this study introduces EvoBagNet, an evolutionary Bagging ensemble learning model specifically designed for robust and high-accuracy stock price prediction. The proposed framework integrates nine state-of-the-art ML techniques, encompassing tree-based methods, neural networks, and both boosting and bagging ensemble approaches, applied to recent datasets from nine prominent IT sector companies. Extensive experimentation was conducted using multiple train-test split ratios to evaluate the scalability and adaptability of the models under diverse scenarios. The performance of EvoBagNet was assessed against six evaluation metrics, revealing its superior accuracy and robustness compared to alternative models. EvoBagNet demonstrated exceptional prediction accuracy across all datasets, achieving performance scores of 97.0%±0.7, 98.3%±0.5, 97.3%±0.8, 97.4%±0.6, 97.0%±1.0, 98.6%±0.4, 98.8%±0.4, 91.7%±1.2, and 98.4%±0.3 for Tech Mahindra, Mindtree, Infosys, Wipro, TCS, Mphasis, L&T Tech, HCL, and Coforge, respectively. These results highlight EvoBagNet’s potential as a powerful tool for stock price forecasting, offering significant implications for informed investment strategies and financial decision-making. This study underscores EvoBagNet’s effectiveness in addressing the limitations of traditional ML models, paving the way for its application in dynamic and high-stakes financial markets.
Preprint
Full-text available
Non-Fungible Tokens (NFTs) have garnered significant attention as an emerging digital asset class with unique properties that cannot be replicated. This study analyzes the graphical factors affecting the pricing of NFTs represented by CryptoPunks using decision trees, random forests and XGB regression methods. It reveals that various image attributes of CryptoPunks exhibit significant variability and exert an influence on their prices. These findings provide valuable insights into the pricing dynamics of NFTs and shed light on the key attributes that impact their value in the market.
Preprint
Full-text available
This paper provides an empirical study exploring the application of deep learning algorithms -- Multilayer Perceptron (MLP), Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and Transformer -- in constructing long-short stock portfolios. Two datasets comprising randomly selected stocks from the S&P 500 and NASDAQ indices, each spanning a decade of daily data, are utilized. The models predict daily stock returns based on historical features such as past returns, Relative Strength Index (RSI), trading volume, and volatility. Portfolios are dynamically adjusted by longing stocks with positive predicted returns and shorting those with negative predictions, with asset weights optimized using Mean-Variance Optimization. Performance is evaluated over a two-year testing period, focusing on return, Sharpe ratio, and maximum drawdown metrics. The results demonstrate the efficacy of deep learning models in enhancing long-short stock portfolio performance.
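The long-short rule in this abstract (long the stocks with positive predicted returns, short those with negative predictions) can be illustrated with a small sketch. Note the loudly stated assumption: the abstract optimizes weights with Mean-Variance Optimization, whereas this sketch swaps in equal weights within each leg to stay short. The function name and tickers are hypothetical.

```python
def long_short_weights(predicted_returns):
    """Map predicted returns to portfolio weights.

    Assumption: equal weighting within the long and short legs, used here
    in place of the Mean-Variance Optimization mentioned in the abstract.
    """
    longs = [t for t, r in predicted_returns.items() if r > 0]
    shorts = [t for t, r in predicted_returns.items() if r < 0]
    weights = {t: 0.0 for t in predicted_returns}  # zero prediction: no position
    for t in longs:
        weights[t] = 1.0 / len(longs)    # long leg sums to +1
    for t in shorts:
        weights[t] = -1.0 / len(shorts)  # short leg sums to -1
    return weights


# Made-up predictions for four hypothetical tickers.
print(long_short_weights({"A": 0.01, "B": -0.02, "C": 0.03, "D": 0.0}))
```

When both legs are non-empty, the weights sum to zero, which gives a dollar-neutral portfolio.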
Conference Paper
Full-text available
Predicting stock market prices accurately is a complicated task, and forecasting stock prices accurately and reliably continues to be a major challenge for researchers in the financial sector. An accurate prediction of the closing price of a company's stock helps both the company and the buyer to make financially sound decisions. Though several researchers have come up with numerous models and solutions, the quest for precise and accurate prediction continues. This paper focuses on the evaluation of prediction methods in the financial ecosystem of the stock market, considering the closing price of the stock. The historical stock price data is extracted from the financial website Yahoo Finance. The traditional methods of statistical analysis, like the moving average and the weighted and exponential moving averages, are extended to the autoregressive integrated moving average (ARIMA), and their performance is compared and evaluated against the fusion of ARIMA and support vector regression (SVR) analysis for prediction of the stock closing price. ARIMA exploits the statistical properties of the data to predict future values from past values, while support vector regression (SVR) is a supervised learning algorithm that aims to minimize the prediction errors to fit a plane to the feature space. The model's validity is evaluated based on the measures of mean absolute error (MAE), mean squared error (MSE) and root mean squared error (RMSE). The fusion of models has an improvement over the other models, with an RMSE of 0.017842 and an R squared value of 0.944. The values of RMSE and R squared obtained indicate that the fusion of ARIMA and SVR has proved to be an excellent prediction model for the data considered.
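For reference, the three error measures named in this abstract have standard definitions that can be written out directly. The sketch below uses made-up numbers and a hypothetical function name; it does not reproduce the paper's data or results.

```python
import math


def error_metrics(actual, predicted):
    """Return (MAE, MSE, RMSE) for two equal-length sequences."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
    mse = sum(e * e for e in errors) / len(errors)   # mean squared error
    rmse = math.sqrt(mse)                            # root mean squared error
    return mae, mse, rmse


# Made-up closing prices and forecasts, for illustration only.
mae, mse, rmse = error_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(mae, mse, rmse)
```

RMSE is in the same units as the price series, while MSE is in squared units, so RMSE and MAE are usually the easier pair to compare across models.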
Article
Full-text available
This investigation aimed to utilize machine learning algorithms for predicting stock market movements in Iran. The study centered on three specific sectors - diversified finance, information technology (IT), and metals - within the Tehran Stock Exchange. Ten years of historical data were analyzed, incorporating ten technical indicators. To achieve this goal, six machine learning models were deployed: Support Vector Regression (Linear), Support Vector Regression (RBF), Linear Regression, Random Forests, K-Nearest Neighbours (KNN), and Decision Trees.
Preprint
Full-text available
Stock trend forecasting, a challenging problem in the financial domain, involves extensive data and related indicators. Relying solely on empirical analysis often yields unsustainable and ineffective results. Machine learning researchers have demonstrated that the application of the random forest algorithm can enhance predictions in this context, playing a crucial auxiliary role in forecasting stock trends. This study introduces a new approach to stock market prediction by integrating sentiment analysis using the FinGPT generative AI model with the traditional Random Forest model. The proposed technique aims to optimize the accuracy of stock price forecasts by leveraging the nuanced understanding of financial sentiments provided by FinGPT. We present a new methodology called "Sentiment-Augmented Random Forest" (SARF), which incorporates sentiment features into the Random Forest framework. Our experiments demonstrate that SARF outperforms conventional Random Forest and LSTM models with an average accuracy improvement of 9.23% and lower prediction errors in predicting stock market movements.
Article
Full-text available
This research paper explores the integration of Artificial Intelligence (AI) and Machine Learning (ML) models into Oracle Planning and Budgeting Cloud (PBCS) system to enhance forecasting accuracy and optimize scenario planning. The study investigates how predictive analytics and real-time data processing can be leveraged to automate and improve financial planning processes. Through a comprehensive analysis of current methodologies and emerging AI technologies, this paper aims to bridge the research gap in understanding AI's impact on forecasting reliability, particularly in fluctuating market conditions. The findings suggest that AI-driven forecasting models can significantly improve prediction accuracy and enable more dynamic and responsive scenario planning in planning and budgeting systems.
Preprint
Full-text available
Value at Risk (VaR) and stress testing are two of the most widely used approaches in portfolio risk management to estimate potential market value losses under adverse market moves. VaR quantifies potential loss in value over a specified horizon (such as one day or ten days) at a desired confidence level (such as the 95th percentile). In scenario design and stress testing, the goal is to construct extreme market scenarios, such as those involving severe recession or a specific event of concern (such as a rapid increase in rates or a geopolitical event), and quantify the potential impact of such scenarios on the portfolio. The goal of this paper is to propose an approach for incorporating prevailing market conditions in stress scenario design and estimation of VaR so that they provide more accurate and realistic insights about portfolio risk over the near term. The proposed approach is based on historical data where historical observations of market changes are given more weight if a certain period in history is "more similar" to the prevailing market conditions. Clusters of market conditions are identified using a Machine Learning approach called Variational Inference (VI), where for each cluster future changes in portfolio value are similar. The VI-based algorithm uses optimization techniques to obtain analytical approximations of the posterior probability density of cluster assignments (market regimes) and probabilities of different outcomes for changes in portfolio value. The Covid-related volatile period around the year 2020 is used to illustrate the performance of the proposed approach and, in particular, to show how VaR and stress scenarios adapt quickly to changing market conditions. Another advantage of the proposed approach is that classification of market conditions into clusters can provide useful insights about portfolio performance under different market conditions.
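The VaR definition in this abstract (a loss quantile over a horizon at a chosen confidence level) can be illustrated with plain historical simulation. This is a minimal sketch under assumptions: all observations are weighted equally, so it does not implement the similarity weighting or the Variational Inference clustering the paper proposes, and `historical_var` is a hypothetical name.

```python
import math


def historical_var(pnl, confidence=0.95):
    """Historical-simulation VaR from a list of profit-and-loss observations.

    Returns the smallest loss L such that at least `confidence` of the
    observed losses are <= L. Assumption: equal weights on all observations.
    """
    losses = sorted(-x for x in pnl)  # losses as positive numbers, ascending
    k = math.ceil(confidence * len(losses)) - 1
    return losses[max(k, 0)]


# Made-up one-day P&L history: losses of 1, 2, ..., 100 currency units.
pnl_history = [-float(i) for i in range(1, 101)]
print(historical_var(pnl_history, confidence=0.95))
```

A weighted variant would sort the losses together with their weights and pick the first loss at which the cumulative weight reaches the confidence level.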
Chapter
Every business has objectives to attain, along with earning profits. A business needs finances and also faces difficulties in meeting its financial requirements. If ignored, these challenges could affect the business. It is crucial for the business to keep a regular check, which necessitates the right analytical methods. A large body of literature has focused on the traditional methods of predicting financial distress. Recently, a few studies have emerged using modern methods. This study reviews the literature on modern methods used in predicting financial distress. It adopts the scoping review structure developed by Arksey and O'Malley, and its main aim is to showcase the importance of predicting financial distress with modern methods through the machine learning approach. It also aims to highlight the drawbacks of statistical methods in predicting financial distress and to cover the reasons for them.
Article
Full-text available
This research aims to explore the efficacy of machine learning techniques, specifically Random Forest modeling, in forecasting economic growth. The research problem lies in the challenge of accurately predicting economic trends, which is crucial for effective policy formulation and decision-making. The study follows a structured methodology comprising data collection, preprocessing, feature selection, model training, and validation. Results demonstrate the effectiveness of Random Forest modeling in capturing the intricate patterns of economic data and outperforming traditional forecasting methods. This approach offers promising prospects for enhancing the accuracy and reliability of economic growth forecasts, thereby facilitating informed decision-making processes in various sectors.
Article
Full-text available
Artificial Intelligence (AI) has been developing rapidly in recent years, and the application of AI in various fields has gradually emerged. This paper focuses on exploring the cross-fertilization of AI with the Chinese stock market. The study adopts 7 types of factors, totaling 29 factor indicators, covering value, valuation, leverage, financial quality, growth, technology, and risk factors. After data preprocessing, feature engineering, and other steps on the factors, 8 machine learning algorithms are used to construct corresponding quantitative trading strategies on CSI 300 constituent stocks. By comparing the results of the various models, it is found that the trading strategies constructed by machine learning algorithms can obtain significant excess returns in the Chinese market. Except for Gaussian Naive Bayes, the other seven model strategy returns beat the benchmark returns, among which XGBoost performs the best, achieving a 20.10% return and the smallest drawdown. This paper has a positive impact on advancing the research on the intersection of machine learning and quantitative investment.
Article
Full-text available
Prediction in the stock market is an interesting and challenging research topic in machine learning. A large body of research has been conducted on stock market prediction using different machine learning classifiers. This research paper presents a detailed study of data from the London, New York, and Karachi stock exchange markets to predict the future trend in these markets. In this study, we have applied machine learning classifiers before and after applying principal component analysis (PCA) and reported the errors and accuracy of the algorithms before and after applying PCA. The performance of the selected algorithms has been compared using the accuracy measure over the selected datasets.
Article
Full-text available
Prediction of stock price return is a highly complicated and very difficult task because there are many factors that may influence stock prices. An accurate prediction of the movement direction of a stock index is crucial for investors to make effective market trading strategies. However, because of the high nonlinearity of the stock market, it is difficult to reveal the underlying regularities with traditional forecasting methods. In response to this difficulty, data mining techniques have been introduced and applied to this financial prediction. This study attempted to develop three models and compared their performances in predicting the direction of movement in the daily Tehran Stock Exchange (TSE) index. The models are based on three classification techniques: Decision Tree, Random Forest and Naïve Bayesian Classifier. Ten microeconomic variables and three macroeconomic variables were chosen as inputs of the proposed models. Experimental results show that the Decision Tree model (80.08%) performed better than Random Forest (78.81%) and the Naïve Bayesian Classifier (73.84%).
Article
Full-text available
Variable annuities (VAs) with embedded guarantees have rapidly grown in popularity around the world in recent years. Valuation of VAs has been studied extensively in past decades. However, most of the studies focus on a single contract. These methods cannot be extended to value or to manage the risk of a large variable annuity portfolio due to the computational complexity. In this paper, we propose an efficient moment matching machine learning method to solve this problem. This method is shown to be a good candidate for risk management in terms of the speed and the complexity of computing the annual dollar deltas, VaRs and CVaRs for a large variable annuity portfolio whose contracts span a period of 25 years. Our method has two stages. First, we select a small number of contracts and propose a moment matching Monte Carlo method based on the Johnson curve, rather than the well known nested simulations, to compute the annual dollar deltas, VaRs and CVaRs for each selected contract. Then, these computed results are used as a training set for well known machine learning methods, such as regression trees, neural networks and so on. Afterwards, the annual dollar deltas, VaRs and CVaRs for the entire portfolio are predicted through the trained machine learning method. Compared to other existing methods [BKR08, Gan13, GL15], our method is very efficient and accurate. Finally, our test results support our claims.
Article
Full-text available
In the business sector, it has always been a difficult task to predict the exact daily price of the stock market index; hence, there is a great deal of research being conducted regarding the prediction of the direction of stock price index movement. Many factors such as political events, general economic conditions, and traders’ expectations may have an influence on the stock market index. There are numerous research studies that use similar indicators to forecast the direction of the stock market index. In this study, we compare two basic types of input variables to predict the direction of the daily stock market index. The main contribution of this study is the ability to predict the direction of the next day’s price of the Japanese stock market index by using an optimized artificial neural network (ANN) model. To improve the prediction accuracy of the trend of the stock market index in the future, we optimize the ANN model using genetic algorithms (GA). We demonstrate and verify the predictability of stock price direction by using the hybrid GA-ANN model and then compare the performance with prior studies. Empirical results show that the Type 2 input variables can generate a higher forecast accuracy and that it is possible to enhance the performance of the optimized ANN model by selecting input variables appropriately.
Article
Full-text available
Predicting financial market changes is an important issue in time series analysis, receiving increasing attention in the last two decades. A combined prediction model, based on artificial neural networks (ANNs) with principal component analysis (PCA) for financial time series forecasting, is presented in this work. In the modeling step, technical analysis was conducted to select technical indicators. Then the PCA approach was applied to extract the principal components from the variables for the training step. Finally, the ANN-based model called NARX was used to train on the data and perform the time series forecast. The TAL1T stock of the Nasdaq OMX Baltic stock exchange was used as a case study. The mean square error (MSE) measure was used to evaluate the performance of the proposed model. The experimental results lead to the conclusion that the proposed model can be successfully used as an alternative method to standard statistical techniques for financial time series forecasting.
Article
Full-text available
We document significant “time series momentum” in equity index, currency, commodity, and bond futures for each of the 58 liquid instruments we consider. We find persistence in returns for one to 12 months that partially reverses over longer horizons, consistent with sentiment theories of initial under-reaction and delayed over-reaction. A diversified portfolio of time series momentum strategies across all asset classes delivers substantial abnormal returns with little exposure to standard asset pricing factors and performs best during extreme markets. Examining the trading activities of speculators and hedgers, we find that speculators profit from time series momentum at the expense of hedgers.
Article
Full-text available
Financial time series are complex, non-stationary and deterministically chaotic. Technical indicators are used with Principal Component Analysis (PCA) in order to identify the most influential inputs in the context of the forecasting model. Neural networks (NN) and support vector regression (SVR) are used with different inputs. Our assumption is that the future value of a stock price depends on the financial indicators although there is no parametric model to explain this relationship. This relationship comes from the technical analysis. Comparison shows that SVR and MLP networks require different inputs. In addition, the MLP networks outperform the SVR technique.
Article
Full-text available
The use of Support Vector Machines (SVMs) is studied in financial forecasting by comparing it with a multi-layer perceptron trained by the Back Propagation (BP) algorithm. SVMs forecast better than BP based on the criteria of Normalised Mean Square Error (NMSE), Mean Absolute Error (MAE), Directional Symmetry (DS), Correct Up (CP) trend and Correct Down (CD) trend. S&P 500 daily price index is used as the data set. Since there is no structured way to choose the free parameters of SVMs, the generalisation error with respect to the free parameters of SVMs is investigated in this experiment. As illustrated in the experiment, they have little impact on the solution. Analysis of the experimental results demonstrates that it is advantageous to apply SVMs to forecast the financial time series.
Conference Paper
Full-text available
Stock market prediction is an attractive research problem. News content is one of the most important factors influencing the market. Considering the impact of news when analyzing stock market behavior leads to more precise predictions and, as a result, more profitable trades. So far, various prototypes have been developed which consider the impact of news in stock market prediction. In this paper, the main components of such forecasting systems are introduced. In addition, different developed prototypes are introduced, and the ways in which their main components are implemented are compared. Based on the studied attempts, potential future research activities are suggested.
Article
Full-text available
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
Article
The efficient market hypothesis states that asset prices in financial markets should reflect all available information; as a consequence, prices should always be consistent with ‘fundamentals’. In this paper, we discuss the main ideas behind the efficient market hypothesis, and provide a guide as to which of its predictions seem to be borne out by empirical evidence, and which do not. In examining the empirical evidence, we concentrate on the stock and foreign exchange markets. The efficient market hypothesis is almost certainly the right place to start when thinking about asset price formation. The evidence suggests, however, that it cannot explain some important and worrying features of asset market behaviour. Most importantly for the wider goal of efficient resource allocation, financial market prices appear at times to be subject to substantial misalignments, which can persist for extended periods of time.
Conference Paper
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
Article
In this paper, customer restaurant preference is predicted based on social media location check-ins. Historical preferences of the customer and the influence of the customer's social network are used in combination with the customer's mobility characteristics as inputs to the model. As the popularity of social media increases, more and more customer comments and feedback about products and services are available online. This not only becomes a way of sharing information among friends in the social network but also forms a new type of survey which can be utilized by businesses to improve their existing products, services, and market analysis. Approximately 121,000 Foursquare restaurant check-ins in the Greater New York City area are used in this research. Artificial neural network (ANN) and support vector machine (SVM) models are developed to predict the customers' behavior regarding restaurant preferences. ANN provides 93.13% average accuracy across the investigated customers, compared to only 54.00% for SVM with a sigmoid kernel function.
Article
In this paper we introduce a calibration procedure for validating agent-based models. Starting from the well-known financial model of Brock and Hommes (1998), we show how an appropriate calibration enables the model to describe price time series. We formulate the calibration problem as a nonlinear constrained optimization that can be solved numerically via a gradient-based method. The calibration results show that the simplest version of the Brock and Hommes model, with two trader types, fundamentalists and trend-followers, replicates nicely the price series of four different market indices: the S&P 500, the Euro Stoxx 50, the Nikkei 225 and the CSI 300. We show how the parameter values of the calibrated model are important in interpreting the trader behavior in the different markets investigated. These parameters are then used for price forecasting. To further improve the forecasting, we modify our calibration approach by increasing the trader information set. Finally, we show how this new approach improves the model's ability to predict market prices.
Article
We consider three sets of phenomena that feature prominently in the financial economics literature: (1) conditional mean dependence (or lack thereof) in asset returns, (2) dependence (and hence forecastability) in asset return signs, and (3) dependence (and hence forecastability) in asset return volatilities. We show that they are very much interrelated and explore the relationships in detail. Among other things, we show that (1) volatility dependence produces sign dependence, so long as expected returns are nonzero, so that one should expect sign dependence, given the overwhelming evidence of volatility dependence; (2) it is statistically possible to have sign dependence without conditional mean dependence; (3) sign dependence is not likely to be found via analysis of sign autocorrelations, runs tests, or traditional market timing tests because of the special nonlinear nature of sign dependence, so that traditional market timing tests are best viewed as tests for sign dependence arising from variation in expected returns rather than from variation in volatility or higher moments; (4) sign dependence is not likely to be found in very high-frequency (e.g., daily) or very low-frequency (e.g., annual) returns; instead, it is more likely to be found at intermediate return horizons; and (5) the link between volatility dependence and sign dependence remains intact in conditionally non-Gaussian environments, for example, with time-varying conditional skewness and/or kurtosis.
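Point (1) of this abstract can be made concrete with a short worked equation. The block below is a sketch under a simplifying assumption that the abstract does not require: the return is conditionally Gaussian with constant mean \mu and time-varying conditional volatility \sigma_{t+1|t}. The symbols are introduced here for illustration only.

```latex
% Assumption (illustrative): R_{t+1} \mid \Omega_t \sim N(\mu, \sigma_{t+1|t}^2), with constant \mu.
\begin{align*}
\Pr(R_{t+1} > 0 \mid \Omega_t)
  &= 1 - \Pr\!\left( \frac{R_{t+1} - \mu}{\sigma_{t+1|t}} \le -\frac{\mu}{\sigma_{t+1|t}} \,\Big|\, \Omega_t \right) \\
  &= 1 - \Phi\!\left( -\frac{\mu}{\sigma_{t+1|t}} \right)
   = \Phi\!\left( \frac{\mu}{\sigma_{t+1|t}} \right).
\end{align*}
% If \mu \neq 0, the probability of a positive return moves with \sigma_{t+1|t},
% so forecastable volatility implies a forecastable sign.
% If \mu = 0, the probability is \Phi(0) = 1/2 regardless of volatility.
```

Here \Phi denotes the standard normal distribution function. The conditional mean stays constant throughout, which also illustrates point (2): sign dependence can arise without conditional mean dependence.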
Article
Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent "boosting" paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such "TreeBoost" models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire and Friedman, Hastie and Tibshirani are discussed.
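The least-squares case described in this abstract can be sketched in a few lines. The example below is a minimal illustration, not the paper's algorithm as published: it uses one-split regression stumps on a single feature, a fixed learning rate, and hypothetical data. For squared-error loss the negative gradient is simply the residual, so each round fits a stump to the current residuals.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by minimising squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Stagewise additive model: start from the mean, then add shrunken stumps."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # Negative gradient of squared-error loss = residuals.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict

# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8]
model = gradient_boost(xs, ys)

mean_y = sum(ys) / len(ys)
mse_before = sum((y - mean_y) ** 2 for y in ys) / len(ys)
mse_after = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(mse_before, mse_after)
```

The training error after boosting is lower than that of the constant mean predictor; the shrinkage factor (`learning_rate`, a name chosen here) trades speed of fit against robustness.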
Article
This paper examines the relationship between stock prices and commodity prices and whether this can be used to forecast stock returns. As both prices are linked to expected future economic performance they should exhibit a long-run relationship. Moreover, changes in sentiment towards commodity investing may affect the nature of the response to disequilibrium. Results support cointegration between stock and commodity prices, while Bai–Perron tests identify breaks in the forecast regression. Forecasts are computed using a standard fixed (static) in-sample/out-of-sample approach and by both recursive and rolling regressions, which incorporate the effects of changing forecast parameter values. A range of model specifications and forecast metrics are used. The historical mean model outperforms the forecast models in both the static and recursive approaches. However, in the rolling forecasts, those models that incorporate information from the long-run stock price/commodity price relationship outperform both the historical mean and other forecast models. Of note, the historical mean still performs relatively well compared to standard forecast models that include the dividend yield and short-term interest rates but not the stock/commodity price ratio. Copyright © 2014 John Wiley & Sons, Ltd.
Article
The technology for building knowledge-based systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete. A reported shortcoming of the basic algorithm is discussed and two means of overcoming it are compared. The paper concludes with illustrations of current research directions.
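ID3 chooses the attribute to split on by information gain, the reduction in label entropy obtained by partitioning the examples on that attribute. The sketch below shows only that selection step on a tiny hypothetical dataset; the attribute names and rows are invented for illustration, and the recursive tree construction is omitted.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after splitting on `attribute`."""
    total = len(labels)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical examples: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

best = max(["outlook", "windy"], key=lambda a: information_gain(rows, labels, a))
print(best)
```

In a full implementation the chosen attribute becomes a tree node and the procedure recurses on each subset until the labels are pure or no attributes remain.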
Article
Financial engineering, such as trading decision making, is an emerging research area with great commercial potential. Successful stock buying or selling generally occurs near a price trend turning point. Traditional technical analysis relies on statistics (i.e., technical indicators) to predict the turning points of a trend. However, these indicators cannot guarantee prediction accuracy in a chaotic domain. In this paper, we propose an intelligent financial trading system based on a new approach: learning a trading strategy with a probabilistic model from a high-level representation of the time series, namely turning points and technical indicators. The main contributions of this paper are two-fold. First, we utilize a high-level representation (turning points and technical indicators). A high-level representation has several advantages, such as being insensitive to noise and intuitive to humans, yet it has rarely been used in past research. Technical indicators encode the knowledge of professional investors and can generally characterize the market. Second, by combining the high-level representation with a probabilistic model, the randomness and uncertainty of the chaotic system are further reduced. In this way, we achieve great results (in comprehensive experiments on S&P 500 components) in a chaotic domain where prediction was previously thought impossible.
Article
The efficient market hypothesis gives rise to forecasting tests that mirror those adopted when testing the optimality of a forecast in the context of a given information set. However, there are also important differences arising from the fact that market efficiency tests rely on establishing profitable trading opportunities in ‘real time’. Forecasters constantly search for predictable patterns and affect prices when they attempt to exploit trading opportunities. Stable forecasting patterns are therefore unlikely to persist for long periods of time and will self-destruct when discovered by a large number of investors. This gives rise to non-stationarities in the time series of financial returns and complicates both formal tests of market efficiency and the search for successful forecasting approaches.
Article
Several empirical studies have documented that the signs of excess stock returns are, to some extent, predictable. In this paper, we consider the predictive ability of the binary dependent dynamic probit model in predicting the direction of monthly excess stock returns. The recession forecast obtained from the model for a binary recession indicator appears to be the most useful predictive variable, and once it is employed, the sign of the excess return is predictable in-sample. The new dynamic “error correction” probit model proposed in the paper yields better out-of-sample sign forecasts, with the resulting average trading returns being higher than those of either the buy-and-hold strategy or trading rules based on ARMAX models.
Article
This paper reports on an experiment of learning and forecasting on the foreign exchange market by means of an Artificial Intelligence methodology (a ‘Classifier System’) which simulates learning and adaptation in complex and changing environments. The experiment has been run for two different exchange rates, the US dollar-D mark rate and the US dollar-yen rate, representative of two possibly different market environments. A fictitious “artificial agent” is first trained on a monthly data base from 1973 to 1990, and then tested out-of-sample from 1990 to 1992. Its forecasting performance is then compared with the performance of decision rules which follow the prescription of various economic theories on exchange rate behaviour, and the performance of forecasts given by VAR estimations of the exchange-rate's determinants.
Article
Traditionally, the autoregressive integrated moving average (ARIMA) model has been one of the most widely used linear models in time series forecasting. However, the ARIMA model cannot easily capture nonlinear patterns. Support vector machines (SVMs), a novel neural network technique, have been successfully applied to nonlinear regression estimation problems. Therefore, this investigation proposes a hybrid methodology that exploits the unique strengths of the ARIMA model and the SVM model in stock price forecasting problems. Real data sets of stock prices were used to examine the forecasting accuracy of the proposed model. The results of computational tests are very promising.
Conference Paper
We provide a method for estimating the generalization error of a bag using out-of-bag estimates. In bagging, each predictor (single hypothesis) is learned from a bootstrap sample of the training examples; the output of a bag (a set of predictors) on an example is determined by voting. The out-of-bag estimate is based on recording the votes of each predictor on those training examples omitted from its bootstrap sample. Because no additional predictors are generated, the out-of-bag estimate requires considerably less time than 10-fold cross-validation. We address the question of how to use the out-of-bag estimate to estimate generalization error. Our experiments on several datasets show that the out-of-bag estimate and 10-fold cross-validation have very inaccurate (much too optimistic) confidence levels. We can improve the out-of-bag estimate by incorporating a correction.
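The basic out-of-bag procedure described here can be sketched as follows. This is a minimal illustration under stated assumptions: the base predictor is a one-feature threshold classifier, the data are synthetic, and the correction mentioned at the end of the abstract is not implemented. Each predictor votes only on the training examples left out of its bootstrap sample.

```python
import random
from collections import Counter

def train_stump(sample):
    """Pick the threshold with the fewest training errors; predict class 1 when x > threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        errors = sum((x > threshold) != bool(y) for x, y in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def oob_error(data, n_predictors=25, seed=0):
    """Out-of-bag error: majority vote over predictors that did not see the example."""
    rng = random.Random(seed)
    n = len(data)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_predictors):
        indices = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        threshold = train_stump([data[i] for i in indices])
        in_bag = set(indices)
        for i, (x, _) in enumerate(data):
            if i not in in_bag:  # record votes only for omitted examples
                votes[i][int(x > threshold)] += 1
    # Score only the examples that received at least one out-of-bag vote.
    scored = [(votes[i].most_common(1)[0][0], y) for i, (_, y) in enumerate(data) if votes[i]]
    return sum(pred != y for pred, y in scored) / len(scored)

# Synthetic, linearly separable data: label is 1 when x >= 1.0.
data = [(x / 10, int(x >= 10)) for x in range(20)]
error = oob_error(data)
print(error)
```

No additional predictors or held-out folds are needed, which is the source of the time saving over 10-fold cross-validation noted in the abstract.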
Article
In this paper we investigate ways to use prior knowledge and neural networks to improve multivariate prediction ability. Daily stock prices are predicted as a complicated real-world problem, taking non-numerical factors such as political and international events into account. We have studied types of prior knowledge which are difficult to insert into initial network structures or to represent in the form of error measurements. We make use of prior knowledge of stock price predictions and newspaper information on domestic and foreign events. Event-knowledge is extracted from newspaper headlines according to prior knowledge. We choose several economic indicators, also according to prior knowledge, and input them together with event-knowledge into neural networks. The use of event-knowledge and neural networks is shown to be effective experimentally: the prediction error of our approach is smaller than that of multiple regression analysis at the 5% level of significance. © 1997 by John Wiley & Sons, Ltd.
Article
The key to successful stock market forecasting is achieving the best results with the minimum required input data. Given stock market model uncertainty, soft computing techniques are viable candidates for capturing nonlinear stock market relations and returning significant forecasting results without necessarily requiring prior knowledge of the statistical distributions of the input data. This paper surveys more than 100 related published articles that focus on neural and neuro-fuzzy techniques derived and applied to forecast stock markets. Classifications are made in terms of input data, forecasting methodology, performance evaluation and performance measures used. The surveyed papers show that soft computing techniques are widely accepted for studying and evaluating stock market behavior.
Article
Research on this project was supported by a grant from the National Science Foundation. I am indebted to Arthur Laffer, Robert Aliber, Ray Ball, Michael Jensen, James Lorie, Merton Miller, Charles Nelson, Richard Roll, William Taylor, and Ross Watts for their helpful comments.
Article
We study approximately 5.0 million stock picks submitted by individual users to the “CAPS” website run by the Motley Fool company (www.caps.fool.com). These picks prove to be surprisingly informative about future stock prices. Shorting stocks with a disproportionate number of negative picks and buying stocks with a disproportionate number of positive picks yields a return of over 12% per annum over the sample period. Negative picks mostly drive these results; they strongly predict future stock price declines. Returns to positive picks are statistically indistinguishable from the market. A Fama–French decomposition suggests that stock-picking rather than style factors largely produced these results.
Article
The efficient market hypothesis has been widely tested and, with few exceptions, found consistent with the data in a wide variety of markets: the New York and American Stock Exchanges, the Australian, English, and German stock markets, various commodity futures markets, the Over-the-Counter markets, the corporate and government bond markets, the option market, and the market for seats on the New York Stock Exchange. Yet, in a manner remarkably similar to that described by Thomas Kuhn in his book, The Structure of Scientific Revolutions, we seem to be entering a stage where widely scattered and as yet incohesive evidence is arising which seems to be inconsistent with the theory. As better data become available (e.g., daily stock price data) and as our econometric sophistication increases, we are beginning to find inconsistencies that our cruder data and techniques missed in the past. It is evidence which we will not be able to ignore. The purpose of this special issue of the Journal of Financial Economics is to bring together a number of these scattered pieces of anomalous evidence regarding Market Efficiency. As Ball (1978) points out in his survey article: taken individually many scattered pieces of evidence on the reaction of stock prices to earnings announcements which are inconsistent with the theory don't amount to much. Yet viewed as a whole, these pieces of evidence begin to stack up in a manner which make a much stronger case for the necessity to carefully review both our acceptance of the efficient market theory and our methodological procedures.
Article
This paper investigates the predictability of spot foreign exchange rate returns from past buy-sell signals of the simple technical trading rules by using the nearest neighbors and the feedforward network regressions. The optimal choices for nearest neighbors, hidden units in a feedforward network and the training set are determined by the cross validation method which minimizes the mean square error. Although this method is computationally expensive the results indicate that it has the advantage of avoiding overfitting in noisy environments and indicate that simple technical rules provide significant forecast improvements for the current returns over the random walk model.
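Choosing the number of nearest neighbors by cross-validated mean square error can be sketched as below. The example assumes leave-one-out cross validation, a single input feature and hypothetical data; the abstract does not specify the cross validation scheme, and the function names are introduced here for illustration.

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def loo_mse(data, k):
    """Leave-one-out mean square error for a given k (assumed CV scheme)."""
    errors = []
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        errors.append((knn_predict(train, x, k) - y) ** 2)
    return sum(errors) / len(errors)

def choose_k(data, candidates):
    """Return the candidate k with the smallest cross-validated MSE."""
    return min(candidates, key=lambda k: loo_mse(data, k))

# Hypothetical data on a straight line y = 2x.
data = [(float(x), 2.0 * x) for x in range(10)]
best_k = choose_k(data, [1, 2, 5])
print(best_k, loo_mse(data, best_k))
```

On this toy line, two neighbors average out symmetrically around interior points, whereas one neighbor is always off by one step and five neighbors over-smooth the endpoints. The repeated refitting for every held-out point and every candidate is what makes the approach computationally expensive.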
Article
Technical analysis, also known as 'charting,' has been a part of financial practice for many decades, but this discipline has not received the same level of academic scrutiny and acceptance as more traditional approaches such as fundamental analysis. One of the main obstacles is the highly subjective nature of technical analysis: the presence of geometric shapes in historical price charts is often in the eyes of the beholder. In this paper, we propose a systematic and automatic approach to technical pattern recognition using nonparametric kernel regression, and we apply this method to a large number of U.S. stocks from 1962 to 1996 to evaluate the effectiveness of technical analysis. By comparing the unconditional empirical distribution of daily stock returns to the conditional distribution (conditioned on specific technical indicators such as head-and-shoulders or double bottoms), we find that over the 31-year sample period, several technical indicators do provide incremental information and may have some practical value. Copyright The American Finance Association 2000.
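Nonparametric kernel regression smooths a price series by replacing each point with a weighted average of its neighbours. The sketch below uses the Nadaraya-Watson form with a Gaussian kernel; the kernel choice, the bandwidth value and the price series are assumptions made for illustration, and the pattern-detection step built on top of the smoothed curve is not shown.

```python
from math import exp

def kernel_smooth(xs, ys, x0, bandwidth):
    """Nadaraya-Watson estimate at x0: Gaussian-weighted average of ys."""
    weights = [exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Hypothetical daily closing prices, for illustration only.
prices = [10.0, 10.4, 10.1, 10.9, 11.5, 11.2, 10.6, 10.8]
days = list(range(len(prices)))

# An arbitrary bandwidth; in practice it would be chosen by cross validation or similar.
smoothed = [kernel_smooth(days, prices, d, bandwidth=1.5) for d in days]
print([round(s, 3) for s in smoothed])
```

Local extrema of the smoothed curve can then be compared against geometric definitions of patterns, which is what removes the subjectivity of reading shapes off a raw chart.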
Article
Revolutions often spawn counterrevolutions and the efficient market hypothesis in finance is no exception. The intellectual dominance of the efficient-market revolution has more recently been challenged by economists who stress psychological and behavioral elements of stock-price determination and by econometricians who argue that stock returns are, to a considerable extent, predictable. This survey examines the attacks on the efficient market hypothesis and the relationship between predictability and efficiency. I conclude that our stock markets are more efficient and less predictable than many recent academic papers would have us believe.
Conference Paper
In this paper, we describe NewsCATS (news categorization and trading system), a system implemented to predict stock price trends for the time immediately after the publication of press releases. NewsCATS consists mainly of three components. The first component retrieves relevant information from press releases through the application of text preprocessing techniques. The second component sorts the press releases into predefined categories. Finally, appropriate trading strategies are derived by the third component by means of the earlier categorization. The findings indicate that a categorization of press releases is able to provide additional information that can be used to forecast stock price trends, but that an adequate trading strategy is essential for the results of the categorization to be fully exploited.
Conference Paper
We develop a prediction system useful in forecasting the mid-term price trend in the Taiwan stock market (Taiwan stock exchange weighted stock index, abbreviated as TSEWSI). The system is based on a recurrent neural network trained by using features extracted from ARIMA analyses. By differencing the raw data of the TSEWSI series and then examining the autocorrelation and partial autocorrelation function plots, the series can be identified as a nonlinear version of ARIMA(1,2,1). Neural networks trained by using second difference data are shown to give better predictions than those trained by using raw data. During backpropagation training, in addition to the traditional error modification term, we also feed back the difference of two successive predictions in order to adjust the connection weights. Empirical results show that the networks trained using 4-year weekly data are capable of predicting the market trend up to 6 weeks ahead with acceptable accuracy.
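The middle index in ARIMA(1,2,1) means the series is differenced twice before the autoregressive and moving-average terms are fitted. The sketch below shows only that differencing step on a hypothetical quadratic series, chosen because its second difference is constant; it is not the TSEWSI data.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: y[t] - y[t-1]."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

# Hypothetical series with a quadratic trend: t^2 + 3t + 5.
weekly_index = [t * t + 3.0 * t + 5.0 for t in range(8)]

first_diff = difference(weekly_index, order=1)   # still trending: 2t + 4
second_diff = difference(weekly_index, order=2)  # constant: trend removed
print(first_diff)
print(second_diff)
```

Each round of differencing shortens the series by one observation, so a second-differenced series has two fewer points than the raw data.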
Conference Paper
A discussion is presented of a buying- and selling-time prediction system for stocks on the Tokyo Stock Exchange and the analysis of its internal representation. The system is based on modular neural networks. The authors developed a number of learning algorithms and prediction methods for the TOPIX (Tokyo Stock Exchange Prices Indexes) prediction system. The prediction system achieved accurate predictions, and the simulation of stock trading showed an excellent profit.
Article
Matplotlib is a 2D graphics package used for Python for application development, interactive scripting, and publication-quality image generation across user interfaces and operating systems. The latest release of matplotlib runs on all major operating systems, with binaries for Macintosh's OS X, Microsoft Windows, and the major Linux distributions. Matplotlib has a Matlab emulation environment called PyLab, which is a simple wrapper of the matplotlib API. Matplotlib provides access to basic GUI events such as button_press_event, mouse_motion_event and can also be registered with those events to receive callbacks. Event handling code written in matplotlib works across many different GUIs. It supports toolkits for domain specific plotting functionality that is either too big or too narrow in purpose for the main distribution. Matplotlib has three basic API classes, including, FigureCanvasBase, RendererBase and Artist.
Article
The topic of data warehousing encompasses architectures, algorithms, and tools for bringing together selected data from multiple databases or other information sources into a single repository, called a data warehouse, suitable for direct querying or analysis. In recent years data warehousing has become a prominent buzzword in the database industry, but attention from the database research community has been limited. In this paper we motivate the concept of a data warehouse, we outline a general data warehousing architecture, and we propose a number of technical issues arising from the architecture that we believe are suitable topics for exploratory research. 1 Introduction Providing integrated access to multiple, distributed, heterogeneous databases and other information sources has become one of the leading issues in database research and industry [6]. In the research community, most approaches to the data integration problem are based on the following very general two-step ...
Article
This paper presents statistical investigations regarding the predictability of stock returns. The examined data covers 207 stocks on the Swedish stock market for the time period 1987-1996. The results show trend behavior and autocorrelation values that are stable even when the entire time interval is broken down to yearly intervals.
Technical Analysis Power Tools for Active Investors
  • G Appel
Appel, G. (2005). Technical Analysis Power Tools for Active Investors. ISBN:0-13-147902-4.
Pharmaceutical Companies in the Economic Storm Navigating from a Position of Strength
  • P Behner
  • D Schwarting
  • S Vallerien
  • M Ehrhardt
  • C Beever
  • D Rollmann
Behner, P., Schwarting, D., Vallerien, S., Ehrhardt, M., Beever, C. & Rollmann, D. (2009). Pharmaceutical Companies in the Economic Storm Navigating from a Position of Strength. Technical Report: Booz & Co Analysis.
XGBoost: A Scalable Tree Boosting System
  • T Chen
  • C Guestrin
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16). doi:10.1145/2939672.2939785
Machine Learning in Stock Price Trend Forecasting
  • Y Dai
  • Y Zhang
Dai, Y. & Zhang Y. (2013). Machine Learning in Stock Price Trend Forecasting. Stanford University, http://cs229.stanford.edu/proj2013/DaiZhang-MachineLearningInStockPriceTrendForecasting.pdf