Article

Improving Financial Trading Decisions Using Deep Q-learning: Predicting the Number of Shares, Action Strategies, and Transfer Learning

Authors: Gyeeun Jeong and Ha Young Kim

Abstract

We study trading systems using reinforcement learning with three newly proposed methods to maximize total profits and reflect real financial market situations while overcoming the limitations of financial data. First, we propose a trading system that can predict the number of shares to trade. Specifically, we design an automated system that predicts the number of shares by adding a deep neural network (DNN) regressor to a deep Q-network, thereby combining reinforcement learning and a DNN. Second, we study various action strategies that use Q-values to analyze which action strategies are beneficial for profits in a confused market. Finally, we propose transfer learning approaches to prevent overfitting from insufficient financial data. We use four different stock indices—the S&P500, KOSPI, HSI, and EuroStoxx50—to experimentally verify our proposed methods and then conduct extensive research. The proposed automated trading system, which enables us to predict the number of shares with the DNN regressor, increases total profits by four times in S&P500, five times in KOSPI, 12 times in HSI, and six times in EuroStoxx50 compared with the fixed-number trading system. When the market situation is confused, delaying the decision to buy or sell increases total profits by 18% in S&P500, 24% in KOSPI, and 49% in EuroStoxx50. Further, transfer learning increases total profits by twofold in S&P500, 3 times in KOSPI, twofold in HSI, and 2.5 times in EuroStoxx50. The trading system with all three proposed methods increases total profits by 13 times in S&P500, 24 times in KOSPI, 30 times in HSI, and 18 times in EuroStoxx50, outperforming the market and the reinforcement learning model.
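The proposed architecture pairs a Q-network for the discrete action with a separate regressor for trade size. A minimal sketch of that pairing, assuming a PyTorch implementation; the layer sizes, the 200-feature state, and all names are illustrative assumptions, not the authors' published code:

```python
import torch
import torch.nn as nn

STATE_DIM = 200   # assumed: window of past prices / technical indicators
N_ACTIONS = 3     # buy, hold, sell

class QNetwork(nn.Module):
    """Deep Q-network: maps a market state to a Q-value per action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, N_ACTIONS),
        )

    def forward(self, s):
        return self.net(s)

class ShareRegressor(nn.Module):
    """DNN regressor: maps (state, chosen action) to a share count."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + N_ACTIONS, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Softplus(),   # keep the count non-negative
        )

    def forward(self, s, a_onehot):
        return self.net(torch.cat([s, a_onehot], dim=-1))

q_net, regressor = QNetwork(), ShareRegressor()
state = torch.randn(1, STATE_DIM)                     # dummy market state
action = q_net(state).argmax(dim=-1)                  # greedy DQN action
a_onehot = nn.functional.one_hot(action, N_ACTIONS).float()
n_shares = regressor(state, a_onehot).round()         # predicted trade size
```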


... (4) Formulate a Reward Function: provides numerical feedback to the agent in response to its preceding action. The survey tabulates the value-based methods across its sample: Deep Q-Network variants appear in 37 publications (among them [163]), Q-Learning in 36, SARSA in 10, and other value-based methods in 6. ...
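The snippet above concerns the reward-function step. A minimal profit-based sketch of such feedback, as one common choice rather than any surveyed paper's exact definition; the position convention and cost_rate are assumptions:

```python
def reward(position: int, price_t: float, price_next: float,
           cost_rate: float = 0.001) -> float:
    """Numerical feedback for the preceding action.

    position: -1 (short), 0 (flat), or +1 (long) held over the step;
    cost_rate: assumed proportional transaction-cost penalty.
    """
    pnl = position * (price_next - price_t)       # mark-to-market profit
    cost = cost_rate * abs(position) * price_t    # crude trading-cost term
    return pnl - cost
```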
... Table 2 presents features and data used in the top fifteen most cited papers in our sample (as of October 2022), which are also ordered chronologically; its row for [163] lists daily frequency and several features. State representation commonly incorporates discrete state, technical analysis, pricing data, macroeconomic indicators, sentiment data, current position, and Limit Order Book (LOB) data. Few publications experiment with or compare different state configurations. ...
... A comparison table in the survey lists, among others, Liang et al. [207] (Equities, 5), Xiong et al. [339] (Equities, 3), and Jeong and Kim [163] (Equity Index, 3). Here γ ∈ [0, 1) represents the discount rate determining the agent's foresight: with γ = 0, the agent prioritises immediate rewards, whereas γ close to 1 emphasises future rewards significantly. ...
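The effect of the discount rate is visible in the standard one-step Q-learning target r + γ · max_a' Q(s', a'); a toy NumPy sketch:

```python
import numpy as np

def td_target(reward, q_next, gamma=0.99, done=False):
    """One-step Q-learning target: r + gamma * max over next-state Q-values."""
    return reward + (0.0 if done else gamma * np.max(q_next))

q_next = np.array([0.2, 1.5, -0.3])        # Q(s', a') for three actions
print(td_target(1.0, q_next, gamma=0.0))   # 1.0   -> purely myopic agent
print(td_target(1.0, q_next, gamma=0.99))  # 2.485 -> future rewards matter
```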
Preprint
Full-text available
Reinforcement Learning (RL) has experienced significant advancement over the past decade, prompting a growing interest in applications within finance. This survey critically evaluates 167 publications, exploring diverse RL applications and frameworks in finance. Financial markets, marked by their complexity, multi-agent nature, information asymmetry, and inherent randomness, serve as an intriguing test-bed for RL. Traditional finance offers certain solutions, and RL advances these with a more dynamic approach, incorporating machine learning methods, including transfer learning, meta-learning, and multi-agent solutions. This survey dissects key RL components through the lens of Quantitative Finance. We uncover emerging themes, propose areas for future research, and critique the strengths and weaknesses of existing methods.
... Deep reinforcement learning (DRL) incorporates DL to solve the policy while dealing with complex, high-dimensional RL problems. For portfolio optimization problems, DRL provides a framework to allocate assets to maximize the investment return dynamically [18]–[28]. Jeong and Kim [19] presented an automated trading methodology that utilizes the deep Q-learning (DQN) [29] algorithm to identify the optimal number of stocks to trade. They also addressed the overfitting problem of DL models by a transfer learning approach. ...
Article
Full-text available
Portfolio optimization is a widely studied topic in quantitative finance. Recent advances in portfolio optimization have shown promising capabilities of deep reinforcement learning algorithms to dynamically allocate funds across various potential assets to meet the objectives of prospective investors. The reward function plays a crucial role in providing feedback to the agent and shaping its behavior to attain the desired goals. However, choosing an optimal reward function poses a significant challenge for risk-averse investors aiming to maximize returns while minimizing risk or pursuing multiple investment objectives. In this study, we attempt to develop a risk-adjusted deep reinforcement learning (RA-DRL) approach leveraging three DRL agents trained using distinct reward functions, namely, log returns, differential Sharpe ratio, and maximum drawdown to develop a unified policy that incorporates the essence of these individual agents. The actions generated by these agents are then fused by employing a convolutional neural network to provide a single risk-adjusted action. Instead of relying solely on a singular reward function, our approach integrates three different functions aiming at diverse objectives. The proposed approach is tested on daily data of four real-world stock market instances: Sensex, Dow, TWSE, and IBEX. The experimental results demonstrate the superiority of our proposed approach based on several risk and return performance metrics when compared with base DRL agents and benchmark methods.
... Li et al. (2019) explored the investment strategy of the stock market based on a DRL model, highlighting the benefits of automated decision-making mechanisms in financial investments. Jeong et al. (2019) proposed methods to improve financial trading decisions using deep Q-learning, combining reinforcement learning with a deep neural network to predict the number of shares accurately. Similarly, Liu et al. (2021) discussed the application of DRL in stock trading strategies and stock forecasting, emphasizing the reliability and advantages of the model compared to traditional approaches. ...
... Similarly, Thilakarathna et al. (2020) proposed a Markov decision process model for financial trading tasks and solved it using deep recurrent Q-network algorithms. Jeong et al. (2019) focused on improving financial trading decisions through deep Q-learning, specifically predicting the number of shares, action strategies, and transfer learning. Thanh et al. (2022) developed a time-driven feature-aware jointly DRL approach for financial signal representation and algorithmic trading. ...
Article
Full-text available
This study addresses the complexities of carbon quota trading markets amidst global warming concerns, proposing a deep reinforcement learning (DRL)-based decision-making model to enhance trading strategies. Acknowledging the limitations of conventional methods in navigating volatile carbon prices, policy shifts, and informational disparities, the research integrates DRL's advanced capabilities. It commences with an overview of DRL principles and its successful applications, followed by an analysis of market dynamics and trading nuances. A DRL model is then formulated, delineating state-action spaces and a tailored reward function for optimized learning within the carbon trading context. Model refinement involves hyperparameter tuning for superior performance. The summary concludes with an evaluation of the model's efficacy, highlighting its adaptability and computational demands, while outlining avenues for further enhancement and real-world implementation to combat climate change through improved carbon market operations.
... Reward functions, which are a core component of reinforcement learning, have a decisive impact on the learning process and the development of the final strategy. Traditional reinforcement learning methods typically rely on predefined static reward functions [1]–[10]. Although these approaches can achieve the desired goal to some extent, their limitations become apparent when faced with unstable and complex environments such as financial markets. ...
... The design of reward functions is a critical aspect of RL as it directly influences the agent's behavior and the overall performance of the learning algorithm. The design of reward functions can vary significantly depending on the specific objectives and preferences of the investors involved in trading strategies [1]–[5], [7], [9], [10], [20]–[26]. ...
Article
Full-text available
Reinforcement Learning (RL) is increasingly being applied to complex decision-making tasks such as financial trading. However, designing effective reward functions remains a significant challenge. Traditional static reward functions often fail to adapt to dynamic environments, leading to inefficiencies in learning. This paper presents a novel approach, called Self-Rewarding Deep Reinforcement Learning (SRDRL), which integrates a self-rewarding network within the RL framework. The SRDRL mechanism operates in two primary phases: First, supervised learning techniques are used to learn from expert knowledge by employing advanced time-series feature extraction models, including TimesNet and WFTNet. This step refines the self-rewarding network parameters by comparing predicted rewards with expert-labeled rewards, which are based on metrics such as Min-Max, Sharpe Ratio, and Return. In the second phase, the model selects the higher value between the expert-labeled and predicted rewards as the RL reward, storing it in the replay buffer. This combination of expert knowledge and predicted rewards enhances the performance of trading strategies. The proposed implementation, called Self-Rewarding Double DQN (SRDDQN), demonstrates that the self-rewarding mechanism improves learning and optimizes trading decisions. Experiments conducted on datasets including DJI, IXIC, and SP500 show that SRDDQN achieves a cumulative return of 1124.23% on the IXIC dataset, significantly outperforming the next best method, Fire (DQN-HER), which achieved 51.87%. SRDDQN also enhances the stability and efficiency of trading strategies, providing notable improvements over traditional RL methods. The integration of a self-rewarding mechanism within RL addresses a critical limitation in reward function design and offers a scalable, adaptable solution for complex, dynamic trading environments.
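The selection step at the heart of the self-rewarding mechanism described above reduces to taking the larger of the expert-labeled and predicted rewards before storage. A minimal sketch of that step; reward_net and buffer are hypothetical names, not from the SRDDQN code:

```python
def self_reward_step(buffer, reward_net, s, a, s_next, expert_reward):
    """Store one transition using the higher of the two reward signals."""
    predicted = float(reward_net(s, a))    # self-rewarding network estimate
    r = max(expert_reward, predicted)      # keep the higher value as reward
    buffer.append((s, a, r, s_next))       # replay buffer for DQN updates
    return r
```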
... several researchers have developed transfer learning models in order to overcome the financial data scarcity issue [21,43,48,50,57,59,62,66,73,75,76]; for example, starting from the observation that common financial market characteristics explain the capability of technical indicators to provide working financial forecast signals, some scholars used transfer learning models to extract the general pattern underlying financial data of different securities markets; a recent study [24], highlighting that <<current approaches in trading strategies treat each market or asset in isolation, with few use cases of transfer learning in the financial literature>>, showed that a transfer learning model, which can learn a trading rule directly from a large-scale stock dataset and fine-tune it on a dataset that has the trading rule included, <<improves financial performance when compared to training a neural network from scratch>> [24]; moreover, another recent study stated that <<transfer learning based on 2 data sets is superior than other base-line methods>> [23]. ...
... As future research directions, the researchers highlighted the need to benchmark the benefits of their regression approach against state-of-the-art methods, comparing their model with existing methods in order to measure the improvements and advantages offered by their proposed approach. In 5-26 [72] Man et al. have highlighted the influence of market sentiment on various aspects of financial markets, such as price trends, trading volumes, volatility, and risks; as a result, they emphasized that several trading strategies developed on the basis of the findings of financial sentiment analysis (FSA) lead to significant returns. The researchers also reported examples of the topic's relevance, highlighting that major news vendors, such as Thomson Reuters, have incorporated sentiment analysis into their services (Thomson Reuters News Analytics - TRNA scores) to assess the polarity of news content; moreover, they pointed out the importance of Natural Language Processing techniques for the topic, because these techniques facilitate the analysis of financial text data. As possible future directions for FSA research, the paper pointed out the relevance of combining information from different data sources to enhance analysis, and the importance of transfer learning as a path to more effective sentiment analysis. In 5-27 [73] Jeong and Kim have pointed out that stock trading presents several challenges because financial data is often limited, volatile, and contains uncertain patterns, making decision-making difficult. To address this, the researchers proposed a transfer-learning-based reinforcement learning model aimed at addressing the problem of insufficient financial data by selecting component stocks based on their relationship with an index stock; thus, Jeong and Kim highlighted that this approach prevents overfitting and provides useful information, leading to a better understanding of the financial data. ...
Preprint
Full-text available
The literature highlights that financial time series data pose significant challenges for accurate stock price prediction, because these data are characterized by noise and susceptibility to news; traditional statistical methodologies make assumptions, such as linearity and normality, which are not suitable for the non-linear nature of financial time series; machine learning methodologies, on the other hand, are able to capture non-linear relationships in the data. To date, neural networks are considered the main machine learning tool for financial price prediction. Transfer learning, as a method aimed at transferring knowledge from source tasks to target tasks, can be a very useful methodological tool for achieving better financial prediction capability. Current reviews of this body of knowledge focus mainly on neural network architectures for financial prediction, with very little emphasis on transfer learning; this paper therefore goes deeper into the topic by developing a systematic review of the application of transfer learning to financial market predictions and of the challenges and potential future directions of transfer learning methodologies for stock market predictions.
... In this approach, the target Q-value used to update the critic network is calculated by taking the minimum value among the Q-values predicted by multiple critic networks for a given state-action pair. Focusing on the minimum Q-value ensures that the learning process remains conservative, thus avoiding potential overestimation if all critic networks have an upward bias (Jeong & Kim, 2019). This conservative estimation helps stabilize the learning process, especially in environments where an excessively high Q-value can lead to poor policy decisions, resulting in increased robustness and reliability of the learned policy. ...
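A minimal sketch of that conservative target, taking the minimum over several critics in the style of clipped double-Q estimation (PyTorch; the critics' call signature is an assumption):

```python
import torch

def conservative_target(critics, reward, s_next, a_next, gamma=0.99):
    """Target Q-value using the minimum prediction across critic networks."""
    q_values = torch.stack([c(s_next, a_next) for c in critics])
    return reward + gamma * q_values.min(dim=0).values   # pessimistic bootstrap
```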
Article
Full-text available
This study proposes a reinforcement-learning-based forex trading model that leverages the Multi-Critic Deep Deterministic Policy Gradient (MC-DDPG) algorithm. The model is trained on historical forex data, technical indicators, and macroeconomic news to respond effectively to favorable announcements and enhance profitability while minimizing risk. The forex trading model was trained with five years of historical forex data on the EURUSD, GBPUSD, and AUDUSD currency pairs, leading technical indicators such as RSI, CCI, ATR, and MACD as trend filters, and 77 macroeconomic news items from the US economic calendar in real-time signals that have an impact on currency movements. We calculate high and medium signals for the effects of macroeconomic news on the forex market. The performance of the model was evaluated using error and investment metrics and compared with that of other reinforcement learning models. The empirical results of this study further emphasize the advantages of MC-DDPG. Extensive backtesting from the beginning of 2022 to mid-2024 revealed superior performance compared with other reinforcement learning models, such as SAC, TD3, A2C, and standard DDPG. The MC-DDPG model achieved a cumulative profit of 47.93%, with a win rate of 67.16% and an average profit expectancy of 94.65 points per trade. Maintaining the risk in check with a relatively low maximum drawdown of 2.68% indicates a balanced approach between profitability and risk control. The ability of MC-DDPG to adapt to market conditions and consistently outperform baseline models highlights its potential for real-world forex trading applications. The MC-DDPG model effectively selected 49 macroeconomic news items that contributed to higher returns and showed outstanding potential for improving trading performance, thus providing traders with a more adaptive tool in the face of market dynamics.
... Several prior studies have developed different methods to improve decisions on financial trading. Jeong & Kim used deep Q-learning to improve trading decisions, predict the number of shares to trade, and came up with a transfer learning algorithm to avoid overfitting [2]. Approaches using Asynchronous Advantage Actor-Critic (A3C) and Deep Q-Network (DQN) with Stacked Denoising Auto Encoders (SDAEs) showed better results than the Buy & Hold strategy [3]. ...
Article
Full-text available
In the last couple of years, stock trading has gained much popularity because of its promising returns. However, most investors do not pay attention to the risks of trading without analysis, which can lead to a big loss. To reduce these risks, some try their luck with automated, pre-programmed trading systems called Expert Advisors. The current study examines the application of DRL for automated assistance in trading, with an emphasis on decision-making enhancement, particularly the use of DRL to realize high asset returns with low risk exposure. Concretely, the two DRL methods applied in this work are A2C and PPO. In systematic testing, the A2C method produced a Sharpe ratio of 1.6009 with a cumulative return of 1.4468, while the PPO method achieved a Sharpe ratio of 1.7628 with a cumulative return of 1.4767. Both were fine-tuned for optimal learning rates and cut-loss and take-profit ratios, showing great promise for tuning trading strategies and improving trading performance. The research leverages these DRL techniques to arrive at trading strategies that balance profit and risk, while underlining the promise of advanced algorithms in automated stock trading.
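The Sharpe ratio and cumulative return quoted above are standard metrics; a minimal NumPy computation from per-period returns, assuming daily data and a 252-day annualization factor:

```python
import numpy as np

def sharpe_ratio(returns, risk_free=0.0, periods=252):
    """Annualized Sharpe ratio from per-period simple returns."""
    excess = np.asarray(returns) - risk_free / periods
    return np.sqrt(periods) * excess.mean() / excess.std(ddof=1)

def cumulative_return(returns):
    """Total compounded return over the whole series."""
    return np.prod(1.0 + np.asarray(returns)) - 1.0
```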
... 2 Related Work RL in Finance. RL has shown promise for financial decision-making, spanning Q-learning approaches for Sharpe ratio maximization (Gao and Chan, 2000), dynamic asset allocation (Jangmin et al., 2006), deep Q-learning (Jeong and Kim, 2019), tabular SARSA (de Oliveira et al., 2020), policy-based portfolio optimization (Shi et al., 2019), and actor-critic methods (Liu et al., 2018; Ye et al., 2020) enhanced by adversarial training (Liang et al., 2018) and transformer-based architectures (Huang et al., 2024). Recent research efforts in RL for financial applications have been greatly aided by open-source frameworks like FinRL (Liu et al., 2021, 2022), which standardize implementations and provide reproducible benchmarks. ...
Preprint
Full-text available
Large language models (LLMs) fine-tuned on multimodal financial data have demonstrated impressive reasoning capabilities in various financial tasks. However, they often struggle with multi-step, goal-oriented scenarios in interactive financial markets, such as trading, where complex agentic approaches are required to improve decision-making. To address this, we propose FLAG-Trader, a unified architecture integrating linguistic processing (via LLMs) with gradient-driven reinforcement learning (RL) policy optimization, in which a partially fine-tuned LLM acts as the policy network, leveraging pre-trained knowledge while adapting to the financial domain through parameter-efficient fine-tuning. Through policy gradient optimization driven by trading rewards, our framework not only enhances LLM performance in trading but also improves results on other financial-domain tasks. We present extensive empirical evidence to validate these enhancements.
... In tackling the optimal execution problem, researchers have deployed various RL techniques, including temporal-difference learning, policy gradient, and REINFORCE [5]. Among these, Q-learning emerges as the most popular RL algorithm [6], notably applied to optimal execution challenges in finance [7], [8]. Lin [9] presents a pioneering deep reinforcement learning (DRL) framework that leverages an advanced variant of the Deep Q-Network (DQN) algorithm, incorporating Double DQN, Dueling Network, and Noisy Nets to minimize trade execution costs. ...
Article
Full-text available
Stock trading execution is a critical component in the complex financial market landscape, and the development of a robust trade execution framework is essential for financial institutions pursuing profitability. This paper presents the Federated Proximal Policy Optimization (FPPO) algorithm, an adaptive trade execution framework that leverages joint reinforcement learning. The FPPO algorithm demonstrates significant improvements in model performance across various stocks, with average returns enhanced by 3% to 15%. It also exhibits superior performance in key metrics such as the reward function value, showcasing its effectiveness in different financial contexts. The paper further explores the model’s performance under the FPPO algorithm with varying numbers of client nodes and different risk preferences, underscoring the importance of these factors in model construction. The results substantiate the FPPO algorithm’s capability to safeguard privacy, ensure high performance, and enable the creation of personalized trading models in the optimal trade execution problem. This positions investors to gain a competitive edge in the dynamic and complex financial markets.
... RL designs intelligent agents that map states to actions through interaction with their environment, to maximize the total reward (Jeong and Kim 2019). A sequence of states and actions from the initial point to the terminal state is called an episode (Sutton and Barto 2018). ...
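A generic rollout of the episode notion defined above, from the initial state to the terminal state; this assumes a simplified Gym-style interface in which env.step returns (next_state, reward, done), and env and agent are placeholders:

```python
def run_episode(env, agent):
    """Roll out one episode and return the total (undiscounted) reward."""
    state, total_reward, done = env.reset(), 0.0, False
    while not done:
        action = agent.act(state)                # map state -> action
        state, reward, done = env.step(action)   # environment feedback
        total_reward += reward                   # the quantity to maximize
    return total_reward
```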
Article
Full-text available
Portfolio management involves choosing and actively overseeing various investment assets to meet an investor’s long-term financial goals, considering their risk tolerance and desired return potential. Traditional methods, like mean–variance analysis, often lack the flexibility needed to navigate the complexities of today’s financial markets. Recently, Deep Reinforcement Learning (DRL) has emerged as a promising approach, enabling continuous adjustments to investment strategies based on market feedback without explicit price predictions. This paper presents a comprehensive literature review of DRL applications in portfolio management, aimed at finance researchers, data scientists, AI experts, FinTech engineers, and students seeking advanced portfolio optimization methodologies. We also conducted an experimental study to evaluate five DRL algorithms—Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC), and Twin Delayed DDPG (TD3)—in managing a portfolio of 30 Dow Jones Industrial Average (DJIA) stocks. Their performance is compared with the DJIA index and traditional strategies, demonstrating DRL’s potential to improve portfolio outcomes while effectively managing risk.
... In one study, final decisions were made using results obtained from an ensemble of multiple agents trained with varying numbers of training iterations, aiming to minimize issues such as overfitting, which can occur with machine learning classifiers [45]. In another study using DQN, transfer learning approaches were proposed to prevent overfitting caused by insufficient financial data [46]. There are also studies using the DQN method for trading in foreign exchange markets [47]. ...
Article
Full-text available
Algorithmic trading, also known as quantitative trading, plays an important role in the finance and technology industries. Replacing some of the analytical methods that stock market investors use in their trading, algorithmic trading draws on advances in machine learning and deep learning, exploiting deep learning's ability to extract meaning from complex data. In this study, a framework called ensemble learning was combined with reinforcement learning, a deep learning method. The trained reinforcement learning agent was tested on 2011 data for the Standard & Poor's 500 (GSPC) index and achieved a profit margin of $2,258.27. In the proposed structure, a Long Short-Term Memory (LSTM) agent processed the time-series data, while a Convolutional Neural Network (CNN) agent used an image generated from these data as input. The predictions obtained from these inputs were combined, and the final result was derived from a Deep Q-Network (DQN) model, thereby forming the ensemble learning structure. The results show that, compared with similar studies in the literature, predictions made using the ensemble learning method yield higher gains than the individual predictions made by the agents and methods.
... These aid learning in complex environments, particularly those with continuous or high-dimensional action spaces, and complement various reinforcement learning methods applied in finance, robotics, and games [6, 10, 17]. ...
Chapter
In a complex and changeable stock market, algorithmic stock trading has firmly established itself as a fundamental aspect of the present-day financial market, where most transactions are now fully automated. Additionally, Deep Reinforcement Learning (DRL) agents, renowned for their exceptional performance in intricate games such as chess and Go, are increasingly impacting the stock market. In this paper, we examine the potential of deep reinforcement learning to optimize the portfolio returns of 15 Asian stocks. We model stock trading as a Markov decision process problem because of its stochastic and interactive nature. Furthermore, we train a deep reinforcement learning agent using three actor-critic-based algorithms: proximal policy optimization (PPO), advantage actor-critic (A2C), and deep deterministic policy gradient (DDPG). We tested the algorithm on Asian stocks to see how well it performed pre-COVID, during COVID, and post-COVID. The trading agent’s performance using various reinforcement learning algorithms is assessed and compared to the traditional min-variance portfolio allocation strategy. The proposed three individual algorithms are above the minimum variance in risk-adjusted return as evaluated by the Sharpe ratio.
... The model has outperformed its benchmark and produced stable risk-adjusted returns for stocks and stock index futures. Jeong and Kim [18] [22]. The futures price time series is numerically sorted on a 15-day period. ...
Preprint
Full-text available
The forecasting and early warning of agricultural product price time series is an important task in the fields of streaming data event analysis and agricultural data mining. Existing forecasting methods for agricultural product price time series suffer from low precision and low efficiency. To solve these problems, we propose a forecasting model selection method based on time-series image encoding. Specifically, we use Gramian Angular Fields (GAF), Markov Transition Fields (MTF), and Recurrence Plots (RP) to encode time series as images while retaining all information about the event. We then propose an information fusion feature augmentation method (IFFA) to combine time-series images. The time-series combined images (TSCI) are input into a CNN forecasting-model-selection classifier. Finally, we introduce transfer learning to optimize the selection method for agricultural product price time-series forecasting models, which effectively reduces the overfitting caused by insufficient data or unbalanced samples in real data sets. Experimental results show that, compared with existing methods, our IFFA-TSCI-CNN time-series classification method has great advantages in efficiency and accuracy.
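Of the three encodings named in this abstract, the Gramian Angular Field is compact enough to sketch directly in NumPy; this follows the standard GASF construction (rescale to [-1, 1], angular encoding, pairwise cosine sums) and is an illustration, not the paper's implementation:

```python
import numpy as np

def gramian_angular_field(x):
    """Encode a 1-D time series as a Gramian Angular Summation Field image."""
    x = np.asarray(x, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1   # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1.0, 1.0))            # angular encoding
    return np.cos(phi[:, None] + phi[None, :])        # pairwise cosine sums

image = gramian_angular_field(np.sin(np.linspace(0, 6, 64)))
print(image.shape)  # (64, 64) image for a 64-step series
```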
... Traditional time series methods employ statistical forecasting techniques, such as the Autoregressive Integrated Moving Average (ARIMA) model, which perform well for stationary linear data but fail to capture the dynamics of non-stationary multivariate time series. With the gradual development of artificial intelligence technology, machine learning methods have started to be applied to the prediction of non-stationary financial data, including decision tree forecasting [1], SVM prediction [2], neural network forecasting [3], gradient boosting [4], reinforcement learning prediction [5], and deep learning prediction [6]. ...
Article
Full-text available
Financial time series data are characterized by non-linearity, non-stationarity, and stochastic complexity, so predicting such data presents a significant challenge. This paper proposes a novel hybrid model for financial forecasting based on CEEMDAN-SE and ARIMA-CNN-LSTM. With the help of the CEEMDAN-SE method, the original data are decomposed into several IMFs and reconstructed via sample entropy into a lower-complexity stationary high-frequency component and a low-frequency component. The high-frequency component is predicted by the ARIMA statistical forecasting model, while the low-frequency component is predicted by a neural network model combining CNN and LSTM. Compared to some classical prediction models, our algorithm exhibits superior performance in terms of three evaluation indexes, namely RMSE, MAE, and MAPE, effectively enhancing model accuracy while reducing computational overhead.
... When using machine learning algorithms, investors need to combine their investment philosophy and risk preference to develop a trading strategy that suits them. Compared with the traditional way of securities trading, algorithmic trading means investors use computer programs to automatically split orders and submit them step by step according to the corresponding models and principles. This way of order splitting and sequential submission effectively reduces the impact of investor orders on the securities market, reduces market-impact costs, improves the liquidity of the securities market, and captures many short-lived trading opportunities in a timely manner [15]–[20]. Machine learning optimization algorithms can thus be used to automatically improve securities trading strategies and obtain the optimal algorithmic trading strategy for an investor in a specific securities market environment. ...
Article
Full-text available
Automation in securities trading offers advantages over human subjective trading, such as immunity to subjective emotional factors, high efficiency, and the ability to monitor multiple stocks simultaneously, making it a cutting-edge development path in the securities trading industry. In this paper, we first apply the concept of time-frequency decomposition, gradually moving from the first-order moments of securities prices to the higher-order moments. We then combine this with the EMD time-frequency decomposition method to analyze the securities price sequence and extract the characteristics of the securities price fluctuations. Finally, we use the differential long- and short-term memory network to construct an automatic optimization trading system. We compare the system’s performance with traditional technical analysis indexes, as well as the annualized returns of PPO and A2C models on various securities, to verify its performance under unilateral rising, oscillating rising, and plummeting quotes. Finally, we conducted a live test on 1000 GEM stocks. The system in this paper outperforms all traditional technical indicators, with an average annualized return of 71.85% at the lowest and 127.27% at the highest among 5 securities, demonstrating excellent performance. In the three quotes of Ningde Times, Aier Dental, and Goldfish that are rising one way, rising and falling over time, and rising again, the annualized returns of this paper’s system are 77.13%, 67.16%, and 12.66%, which are higher than those of the PPO and A2C models.
... Relative latency competition predictions are supported empirically. In order to optimize overall earnings and replicate actual financial market conditions, [5] investigates trading systems that use reinforcement learning together with three novel techniques. In the first approach, a deep neural network regressor is used to estimate how many shares will be traded. ...
... Various DL methods allowed researchers to train ANN techniques that enhance the use of RL, and popular algorithms such as Policy Gradients (PG), Advantage Actor-Critic (A2C), and Deep Q-Learning (DQL) (Mnih et al., 2015, 2016; Sutton et al., 1999; Watkins & Dayan, 1992) have emerged as important advances in the discovery of investment policies in the stock and futures markets (Deng et al., 2017; Dixon et al., 2020; Moody & Saffell, 2001; Pendharkar & Cusatis, 2018) as well as in the cryptocurrency market (Sattarov et al., 2020). Jeong and Kim (2019) implemented a DQL RL model to improve financial trading decisions by adjusting the number of shares in the portfolio and implementing transfer learning to deal with insufficient data and avoid overfitting. Likewise, Zarkias et al. (2019) executed a trading strategy applying DQL to predict the Euro-Dollar exchange rates using a trailing stop strategy. ...
Article
Full-text available
While machine learning's role in financial trading has advanced considerably, algorithmic transparency and explainability challenges still exist. This research enriches prior studies focused on high‐frequency financial data prediction by introducing an explainable reinforcement learning model for portfolio management. This model transcends basic asset prediction, formulating concrete, actionable trading strategies. The methodology is applied in a custom trading environment mimicking the CAC‐40 index's financial conditions, allowing the model to adapt dynamically to market changes based on iterative learning from historical data. Empirical findings reveal that the model outperforms an equally weighted portfolio in out‐of‐sample tests. The study offers a dual contribution: it elevates algorithmic planning while significantly boosting transparency and interpretability in financial machine learning. This approach tackles the enduring ‘black‐box’ issue and provides a holistic, transparent framework for managing investment portfolios.
... Additionally, DQL's strategic decision-making prowess was applied in healthcare to optimize the security and privacy of healthcare data in IoT systems, focusing on authentication, malware, and DDoS attack mitigation, and evaluating performance through metrics like energy consumption and accuracy [51]. In the financial sector, the study [52] introduced an automated trading system that combines reinforcement learning with a deep neural network to predict share quantities and employs transfer learning to overcome data limitations. This approach significantly boosts profits across various stock indices, outperforming traditional systems in volatile markets. ...
Article
Full-text available
In the complex and dynamic landscape of cyber threats, organizations require sophisticated strategies for managing Cybersecurity Operations Centers and deploying Security Information and Event Management systems. Our study enhances these strategies by integrating the precision of well-known biomimetic optimization algorithms—namely Particle Swarm Optimization, the Bat Algorithm, the Gray Wolf Optimizer, and the Orca Predator Algorithm—with the adaptability of Deep Q-Learning, a reinforcement learning technique that leverages deep neural networks to teach algorithms optimal actions through trial and error in complex environments. This hybrid methodology targets the efficient allocation and deployment of network intrusion detection sensors while balancing cost-effectiveness with essential network security imperatives. Comprehensive computational tests show that versions enhanced with Deep Q-Learning significantly outperform their native counterparts, especially in complex infrastructures. These results highlight the efficacy of integrating metaheuristics with reinforcement learning to tackle complex optimization challenges, underscoring Deep Q-Learning’s potential to boost cybersecurity measures in rapidly evolving threat environments.
... The global transition to sustainable energy sources necessitates the development of mechanisms like green certificates (GCs) to incentivize renewable energy production. Scholars from China, Europe, America, and other regions have extensively researched issues related to the market mechanisms and models of GCs [5]–[19], technological innovations including blockchain and artificial intelligence platform technologies [20]–[37], and policies, economic strategies, and market changes [10], [19], [38]–[45]. ...
Article
Full-text available
Given the complexity of issuing, verifying, and trading green power certificates in China, along with the challenges posed by policy changes, ensuring that China’s green certificate market trading system receives proper mechanisms and technical support is crucial. This study presents a green power certificate trading (GC-TS) architecture based on an equilibrium strategy, which enhances the quoting efficiency and multi-party collaboration capability of green certificate trading by introducing Q-learning, smart contracts, and effectively integrating a multi-agent trading Nash strategy. Firstly, we integrate green certificate trading with electricity and carbon asset trading, constructing pricing strategies for the green certificate, carbon, and electricity trading markets; secondly, we design a certificate-electricity-carbon efficiency model based on ensuring the consistency of green certificates, green electricity, and carbon markets; then, to achieve diversified green certificate trading, we establish a multi-agent reinforcement learning game equilibrium model. Additionally, we propose an integrated Nash Q-learning offer with a smart contract dynamic trading joint clearing mechanism. Experiments show that trading prices have increased by 20%, and the transaction success rate by 30 times, with an analysis of trading performance from groups of 3, 5, 7, and 9 trading agents exhibiting high consistency and redundancy. Compared with models integrating smart contracts, it possesses a higher convergence efficiency of trading quotes.
Article
Artificial Intelligence (AI) approaches have been increasingly used in financial markets as technology advances. In this research paper, we conduct a Systematic Literature Review (SLR) that studies financial trading approaches through AI techniques, reviewing 143 research articles that implemented AI techniques in financial trading markets. It presents several findings and observations after reviewing the papers from the following perspectives: the financial trading market and asset type, the trading analysis type considered along with the AI technique, the AI techniques utilized in the trading market, and the estimation and performance metrics of the proposed models. The selected research articles were published between 2015 and 2023, and this review addresses four research questions. After analyzing the selected research articles, we observed 8 financial markets used in building predictive models. Moreover, we found that technical analysis is more widely adopted than fundamental analysis. Furthermore, 16% of the selected research articles entirely automate the trading process. In addition, we identified 40 different AI techniques used as standalone and hybrid models. Among these techniques, deep learning techniques are the most frequently used in financial trading markets. Building prediction models for financial markets using AI is a promising field of research, and academics have already deployed several machine learning models. As a result of this evaluation, we provide recommendations and guidance to researchers.
Article
The present study examines the factors influencing the optimization of FinTech in financial markets using a hybrid approach. FinTech, as one of the fundamental transformations in the financial world, has significantly altered financial interactions, particularly through the application of advanced technologies such as artificial intelligence, blockchain, and big data processing. The objective of this research is to identify and model the components that affect FinTech optimization in financial markets and to employ optimization models to enhance their performance. Initially, to identify the key components, a qualitative method and meta-synthesis technique were utilized. In this phase, documents and articles related to FinTech and the banking industry were reviewed, and data extracted from other studies were analyzed. These data encompass various dimensions, including cybersecurity, artificial intelligence indicators, system scalability, and the economic and social impacts of financial technologies. After identifying the main components, the quantitative phase of the research involved modeling FinTech optimization based on these components using mathematical programming methods. The proposed optimization models were implemented using GAMS software with the epsilon constraint algorithm and MATLAB software with the Non-Dominated Sorting Genetic Algorithm II (NSGA-II). The results obtained from the optimization models indicate that improvements in the identified components can lead to increased efficiency and reduced costs in FinTech systems. This study also demonstrates that employing a hybrid approach and simultaneously analyzing qualitative and quantitative data can facilitate a more accurate simulation of FinTech performance in financial markets and offer more effective solutions for improving financial processes.
Article
Currently, the stock market is attractive, and it is challenging to develop an efficient investment model with high accuracy because share values change for political, economic, and social reasons. This article presents an innovative proposal for a short-term, automatic investment model that reduces capital loss during trading by applying a reinforcement learning (RL) model. We also propose an adaptable data-window structure to enhance the learning and accuracy of investment agents in three markets: crude oil, gold, and the Euro. In addition, the RL model employs an actor-critic neural network with rectified linear unit (ReLU) neurons to generate specialized investment agents, enabling more efficient trading, minimizing investment losses across different time periods, and reducing the model's learning time. In the test phase, with varying initial conditions, the proposed RL model reduced average losses by 0.03% on the Euro, 0.25% on gold, and 0.13% on crude oil.
Chapter
Stock trading strategy plays an important role in financial investment. However, it is challenging to come up with an optimal profit-making portfolio in a volatile market. In this work, we propose ensemble methods that use several deep reinforcement learning (DRL) architectures to train on dynamic markets and learn complex trading strategies to achieve maximum returns on investment. We propose three ensemble strategies drawn from four RL actor-critic algorithms: Twin Delayed Deep Deterministic Policy Gradient (TD3), Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC). The three ensembles are: (i) PPO, TD3, and DDPG; (ii) SAC, PPO, and TD3; (iii) DDPG, SAC, and PPO. We compare their performance with that of the state-of-the-art ensemble method, namely Advantage Actor-Critic (A2C), PPO, and DDPG. The ensemble techniques adapt to various market conditions by utilizing the best aspects of their constituent algorithms. The effectiveness of these ensembles is demonstrated on 30 Sensex stocks with sufficient liquidity and 30 Dow Jones Industrial Average (DJIA) stocks. The Sharpe ratio and maximum drawdown are employed to evaluate the performance of the ensemble methods.
Chapter
Full-text available
Over the past few decades, the financial industry has shown a keen interest in using computational intelligence to improve various financial processes. As a result, a range of models have been developed and published in numerous studies. However, in recent years, deep learning (DL) has gained significant attention within the field of machine learning (ML) due to its superior performance compared to traditional models. There are now several different DL implementations being used in finance, particularly in the rapidly growing field of Fintech. DL is being widely utilized to develop advanced banking services and investment strategies. This chapter provides a comprehensive overview of the current state-of-the-art in DL models for financial applications. The chapter is divided into categories based on the specific sub-fields of finance, and examines the use of DL models in each area. These include algorithmic trading, price forecasting, credit assessment, and fraud detection. The chapter aims to provide a concise overview of the various DL models being used in these fields and their potential impact on the future of finance.
Article
Full-text available
ABSTRACT Objective: To evaluate the effectiveness of deep learning models and their extensions with conditional volatility models in predicting the volatility of the S&P/BVL Peru General Index. Methods: The study adopted a non-experimental, cross-sectional, analytical design to forecast the volatility of the S&P/BVL Peru General Index, analyzing 5,807 observations from January 3, 2000 to October 16, 2023. The index's historical data were processed using computational techniques in Python. Statistical tests such as the augmented Dickey-Fuller test and the Jarque-Bera test were performed. To achieve stationarity, differencing had to be applied after the unit-root tests. Model accuracy was compared using Diebold-Mariano and Wilcoxon signed-rank tests. Results: The GJR-GARCH, linear-kernel SVR, and neural network models demonstrated superior performance in terms of lower MSE, with SVR standing out in particular with an MAE of 0.00049. Conclusion: The GJR-GARCH and linear-kernel SVR models are highly effective for predicting the index's volatility, and their implementation in risk management strategies is recommended. Keywords: econometrics, risk management, financial market, learning, enterprise.
Article
In the contemporary digitalization landscape and technological advancement, the auction industry undergoes a metamorphosis, assuming a pivotal role as a transactional paradigm. Functioning as a mechanism for pricing commodities or services, the procedural intricacies and efficiency of auctions directly influence market dynamics and participant engagement. Harnessing the advancing capabilities of artificial intelligence (AI) technology, the auction sector proactively integrates AI methodologies to augment efficacy and enrich user interactions. This study delves into the intricacies of the price prediction challenge within the auction domain, introducing a sophisticated RL-GRU framework for price interval analysis. The framework commences by adeptly conducting quantitative feature extraction of commodities through GRU, subsequently orchestrating dynamic interactions within the model’s environment via reinforcement learning techniques. Ultimately, it accomplishes the task of interval division and recognition of auction commodity prices through a discerning classification module. Demonstrating precision exceeding 90% across publicly available and internally curated datasets within five intervals and exhibiting superior performance within eight intervals, this framework contributes valuable technical insights for future endeavours in auction price interval prediction challenges.
Chapter
In this article, we propose a reinforcement learning method for developing a stock trading strategy while optimizing investment return. We use the Advantage Actor-Critic (A2C) algorithm across all SET 50 stocks to train a deep reinforcement learning agent that produces this trading strategy. Our empirical findings demonstrate that, in terms of Sharpe ratio and cumulative returns, the proposed reinforcement learning technique beats both the SET 50 Average index and the conventional min-variance portfolio allocation strategy.
Chapter
Artificial intelligence (AI) has been a platform of immense assistance in developing and simplifying discoveries in medical science. Environmentalists have also been researching this concept to benefit the environment and establish multidimensional discoveries in clean energy. An increase in greenhouse gases (GHGs) is caused by most human developmental activities. Direct or indirect emission of GHGs by a person, group, event, or any other activity contributes to the carbon footprint. According to the Environmental Protection Agency, USA, the major sources of increasing GHGs are transportation (29%), electricity (28%), industry (22%), commercial and residential (12%), and agriculture (9%). Vigorous efforts are required to control the increasing GHGs by developing and implementing policies and utilizing new technologies. In this time of challenges presented by climate change, technological advancements in artificial intelligence (AI) and digital assistance have made a significant impact on people's lifestyles. AI-based technologies to monitor, predict, and reduce GHG emissions may help achieve a cleaner environment. This article aims to describe different AI-based approaches to minimize carbon footprints, as well as discuss the role of AI in various industries and its economic and societal outcomes. Specifically, we have attempted to fill the research gaps by investigating existing opportunities in the field of AI toward reducing GHG emissions.
Article
Full-text available
Recent advances in the field of machine learning have yielded novel research perspectives in behavioural economics and financial market microstructure studies. In this paper we study the impact of individual trader learning characteristics on markets using a stock market simulator designed with a multi-agent architecture. Each agent, representing an autonomous investor, trades stocks through reinforcement learning, using a centralized double-auction limit order book. This approach allows us to study the impact of individual trader traits on the whole stock market at the mesoscale in a bottom-up approach. We chose to test three trader trait aspects: agent learning rate increases, herding behaviour, and random trading. As hypothesized, we find that larger learning rates significantly increase the number of crashes. We also find that herding behaviour undermines market stability, while random trading tends to preserve it.
Article
Full-text available
Financial portfolio management is the process of constant redistribution of a fund into different financial products. This paper presents a financial-model-free reinforcement learning framework to provide a deep machine learning solution to the portfolio management problem. The framework consists of the Ensemble of Identical Independent Evaluators (EIIE) topology, a Portfolio-Vector Memory (PVM), an Online Stochastic Batch Learning (OSBL) scheme, and a fully exploiting and explicit reward function. The framework is realized in three instances in this work: a Convolutional Neural Network (CNN), a basic Recurrent Neural Network (RNN), and a Long Short-Term Memory (LSTM). They are, along with a number of recently reviewed or published portfolio-selection strategies, examined in three back-test experiments with a trading period of 30 minutes in a cryptocurrency market. Cryptocurrencies are electronic and decentralized alternatives to government-issued money, with Bitcoin as the best-known example. All three instances of the framework monopolize the top three positions in all experiments, outdistancing the other compared trading algorithms. Even with a high commission rate of 0.25% in the backtests, the framework is able to achieve at least 4-fold returns in 30 days.
Article
Full-text available
Banana is one of the most consumed fruits in Brazil and an important source of minerals, vitamins and carbohydrates in the human diet. The characterization of superior banana genotypes allows identifying those with nutritional quality for cultivation and for integration into genetic improvement programs. However, identification and quantification of provitamin carotenoids are hampered by the cost of instruments and reagents for chemical analyses, and may become unworkable if the number of samples to be analyzed is high. Thus, the objective was to verify the potential of indirect phenotyping of vitamin A content in banana through artificial neural networks (ANNs) using colorimetric data. Fifteen banana cultivars with four replications were evaluated, totaling 60 samples. For each sample, colorimetric data were obtained and the vitamin A content was estimated in the ripe banana pulp. For the prediction of vitamin A content from colorimetric data, multilayer perceptron ANNs were used. Ten network architectures were tested, each with a single hidden layer. The network selected by the best fit (lowest mean squared error) had four neurons in the hidden layer, enabling high efficiency in the prediction of vitamin A (r² = 0.98). The colorimetric parameters a* and Hue angle were the most important in this study. High-scale indirect phenotyping of vitamin A by ANNs on banana pulp is possible and feasible.
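As a rough illustration of this indirect-phenotyping setup, the sketch below fits a single-hidden-layer MLP regressor to synthetic stand-in data; the feature count, the synthetic target, and the scikit-learn choice are assumptions for illustration, not the study's pipeline.

    # A minimal sketch: an MLP with one hidden layer of 4 neurons mapping
    # colorimetric inputs (e.g., L*, a*, b*, Hue) to a vitamin A value.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.random((60, 4))                    # 60 samples x 4 colorimetric features
    y = 3.0 * X[:, 1] + 0.1 * rng.random(60)   # synthetic stand-in target

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = MLPRegressor(hidden_layer_sizes=(4,), max_iter=5000, random_state=0)
    model.fit(X_tr, y_tr)
    print("R^2 on held-out samples:", model.score(X_te, y_te))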
Article
Full-text available
Mini-batch optimization has proven to be a powerful paradigm for large-scale learning. However, state-of-the-art parallel mini-batch algorithms assume synchronous operation or cyclic update orders. When worker nodes are heterogeneous (due to different computational capabilities or different communication delays), synchronous and cyclic operations are inefficient, since they leave workers idle waiting for the slower nodes to complete their computations. In this paper, we propose an asynchronous mini-batch algorithm for regularized stochastic optimization problems with smooth loss functions that eliminates idle waiting and allows workers to run at their maximal update rates. We show that by suitably choosing the step-size values, the algorithm achieves a rate of the order O(1/√T) for general convex regularization functions, and the rate O(1/T) for strongly convex regularization functions, where T is the number of iterations. In both cases, the impact of asynchrony on the convergence rate of our algorithm is asymptotically negligible, and a near-linear speedup in the number of workers can be expected. Theoretical results are confirmed in real implementations on a distributed computing infrastructure.
Article
Full-text available
This chapter presents the historical context for the current state of financial information and risk management. In lieu of a comprehensive history, the authors discuss several broad historical themes in risk and finance: institutionalization, technology, globalization, and complexity, including the rise of risk management professionals. Emblematic events are used to illustrate the evolution of the financial markets and risk management.
Article
Full-text available
The explosion of algorithmic trading has been one of the most prominent recent trends in the financial industry. Algorithmic trading consists of automated trading strategies that attempt to minimize transaction costs by optimally placing transaction orders. The key ingredient of many of these strategies is intra-daily volume predictions. This work proposes a dynamic model for intra-daily volume forecasting that captures salient features of the series, such as intra-daily periodicity and volume asymmetry. Results show that the proposed methodology is able to significantly outperform common volume forecasting methods and delivers significantly more precise predictions in a VWAP tracking trading exercise.
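For reference, the VWAP benchmark that such strategies track is simply the volume-weighted average of trade prices over the day. The sketch below uses illustrative prints; the input values are made up.

    # Volume-weighted average price: sum(p_i * v_i) / sum(v_i).
    def vwap(prices, volumes):
        total_volume = sum(volumes)
        return sum(p * v for p, v in zip(prices, volumes)) / total_volume

    print(vwap([100.0, 100.5, 99.8], [200, 150, 400]))  # -> 99.99...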
Article
Full-text available
Brain-computer interaction (BCI) and physiological computing are terms that refer to using processed neural or physiological signals to influence human interaction with computers, environment, and each other. A major challenge in developing these systems arises from the large individual differences typically seen in the neural/physiological responses. As a result, many researchers use individually-trained recognition algorithms to process this data. In order to minimize time, cost, and barriers to use, there is a need to minimize the amount of individual training data required, or equivalently, to increase the recognition accuracy without increasing the number of user-specific training samples. One promising method for achieving this is collaborative filtering, which combines training data from the individual subject with additional training data from other, similar subjects. This paper describes a successful application of a collaborative filtering approach intended for a BCI system. This approach is based on transfer learning (TL), active class selection (ACS), and a mean squared difference user-similarity heuristic. The resulting BCI system uses neural and physiological signals for automatic task difficulty recognition. TL improves the learning performance by combining a small number of user-specific training samples with a large number of auxiliary training samples from other similar subjects. ACS optimally selects the classes to generate user-specific training samples. Experimental results on 18 subjects, using both k-nearest neighbors and support vector machine classifiers, demonstrate that the proposed approach can significantly reduce the number of user-specific training data samples. This collaborative filtering approach will also be generalizable to handling individual differences in many other applications that involve human neural or physiological data, such as affective computing.
Conference Paper
Full-text available
Electronic markets have emerged as popular venues for the trading of a wide variety of financial assets, and computer-based algorithmic trading has also asserted itself as a dominant force in financial markets across the world. Identifying and understanding the impact of algorithmic trading on financial markets has become a critical issue for market operators and regulators. We propose to characterize traders’ behavior in terms of the reward functions most likely to have given rise to the observed trading actions. Our approach is to model trading decisions as a Markov Decision Process (MDP), and use observations of an optimal decision policy to find the reward function. This is known as Inverse Reinforcement Learning (IRL). Our IRL-based approach to characterizing trader behavior strikes a balance between two desirable features in that it captures key empirical properties of order book dynamics and yet remains computationally tractable. Using an IRL algorithm based on linear programming, we are able to achieve more than 90% classification accuracy in distinguishing high frequency trading from other trading strategies in experiments on a simulated E-Mini S&P 500 futures market. The results of these empirical tests suggest that high frequency trading strategies can be accurately identified and profiled based on observations of individual trading actions.
Article
Full-text available
This paper reviews previous and current research on the relation between price changes and trading volume in financial markets, and makes four contributions. First, two empirical relations are established: volume is positively related to the magnitude of the price change and, in equity markets, to the price change per se. Second, previous theoretical research on the price-volume relation is summarized and critiqued, and major insights are emphasized. Third, a simple model of the price-volume relation is proposed that is consistent with several seemingly unrelated or contradictory observations. And fourth, several directions for future research are identified.
Article
Full-text available
This paper introduces adaptive reinforcement learning (ARL) as the basis for a fully automated trading system application. The system is designed to trade foreign exchange (FX) markets and relies on a layered structure consisting of a machine learning algorithm, a risk management overlay and a dynamic utility optimization layer. An existing machine-learning method called recurrent reinforcement learning (RRL) was chosen as the underlying algorithm for ARL. One of the strengths of our approach is that the dynamic optimization layer makes a fixed choice of model tuning parameters unnecessary. It also allows for a risk-return trade-off to be made by the user within the system. The trading system is able to make consistent gains out-of-sample while avoiding large drawdowns.
Article
Full-text available
Associative memory operating in a real environment must perform well in online incremental learning and be robust to noisy data because noisy associative patterns are presented sequentially in a real environment. We propose a novel associative memory that satisfies these requirements. Using the proposed method, new associative pairs that are presented sequentially can be learned accurately without forgetting previously learned patterns. The memory size of the proposed method increases adaptively with learning patterns. Therefore, it suffers neither redundancy nor insufficiency of memory size, even in an environment in which the maximum number of associative pairs to be presented is unknown before learning. Noisy inputs in real environments are classifiable into two types: noise-added original patterns and faultily presented random patterns. The proposed method deals with two types of noise. To our knowledge, no conventional associative memory addresses noise of both types. The proposed associative memory performs as a bidirectional one-to-many or many-to-one associative memory and deals not only with bipolar data, but also with real-valued data. Results demonstrate that the proposed method's features are important for application to an intelligent robot operating in a real environment. The originality of our work consists of two points: employing a growing self-organizing network for an associative memory, and discussing what features are necessary for an associative memory for an intelligent robot and proposing an associative memory that satisfies those requirements.
Article
Full-text available
This paper investigates the relationship between aggregate stock market trading volume and the serial correlation of daily stock returns. For both stock indexes and individual large stocks, the first-order daily return autocorrelation tends to decline with volume. The paper explains this phenomenon using a model in which risk-averse “market makers” accommodate buying or selling pressure from “liquidity” or “noninformational” traders. Changing expected stock returns reward market makers for playing this role. The model implies that a stock price decline on a high-volume day is more likely than a stock price decline on a low-volume day to be associated with an increase in the expected stock return.
Article
Full-text available
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it (1) learns rapidly with second-order learning methods based on incremental training, (2) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, (3) adjusts its weighting kernels based on only local information in order to minimize the danger of negative interference of incremental learning, (4) has a computational complexity that is linear in the number of inputs, and (5) can deal with a large number of—possibly redundant—inputs, as shown in various empirical evaluations with up to 90-dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.
Article
Full-text available
This paper investigates a method of forecasting stock price differences on artificially generated price series data using neuro-fuzzy systems and neural networks. As trading profits are more important to an investor than statistical performance, this paper proposes a novel rough-set-based neuro-fuzzy stock trading decision model, called stock trading using rough-set-based pseudo outer-product (RSPOP), which synergizes the price difference forecast method with a forecast-bottleneck-free trading decision model. The proposed stock trading with forecast model uses the pseudo-outer-product-based fuzzy neural network using the compositional rule of inference [POPFNN-CRI(S)], with fuzzy rules identified by the RSPOP algorithm, as the underlying predictor model, together with simple moving-average trading rules in the stock trading decision model. Experimental results using the proposed stock trading with RSPOP forecast model on real-world stock market data are presented. Trading profits, in terms of portfolio end values, are benchmarked against stock trading with a dynamic evolving neural-fuzzy inference system (DENFIS) forecast model, stock trading without a forecast model, and stock trading with an ideal forecast model. Experimental results showed that the proposed model identified rules with greater interpretability and yielded significantly higher profits than the stock trading with DENFIS forecast model and the stock trading without forecast model.
Article
Intelligent agents are often used in professional portfolio management. The use of intelligent agents in personal retirement portfolio management is not investigated in the past. In this research, we consider a two-asset personal retirement portfolio and propose several reinforcement learning agents for trading portfolio assets. In particular, we design an on-policy SARSA (λ) and an off-policy Q(λ) discrete state and discrete action agents that maximize either portfolio returns or differential Sharpe ratios. Additionally, we design a temporal-difference learning, TD(λ), agent that uses a linear valuation function in discrete state and continuous action settings. Using two different two-asset portfolios, the first asset being the S&P 500 Index and the second asset being either a broad bond market index or a 10-year U.S. Treasury note (T-note), we test the performance of different agents on different holdout (test) samples. The results of our experiments indicate that the high-learning frequency (i.e., adaptive learning) TD(λ) agent consistently beats both the single asset stock and bond cumulative returns by a significant margin.
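A tabular SARSA(λ) agent of the on-policy kind described above can be sketched briefly. In this minimal sketch the environment interface (reset/step), the state and action counts, and the hyper-parameters are hypothetical stand-ins, not the paper's design.

    import numpy as np

    def sarsa_lambda(env, n_states, n_actions, episodes=500,
                     alpha=0.1, gamma=0.99, lam=0.9, eps=0.1):
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            E = np.zeros_like(Q)                      # eligibility traces
            s = env.reset()
            a = np.random.randint(n_actions) if np.random.rand() < eps else int(Q[s].argmax())
            done = False
            while not done:
                s2, r, done = env.step(a)             # hypothetical env API
                a2 = np.random.randint(n_actions) if np.random.rand() < eps else int(Q[s2].argmax())
                delta = r + gamma * Q[s2, a2] * (not done) - Q[s, a]
                E[s, a] += 1.0                        # accumulating trace
                Q += alpha * delta * E                # update all traced pairs
                E *= gamma * lam                      # decay traces
                s, a = s2, a2
        return Q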
Article
Dynamic control theory has long been used in solving optimal asset allocation problems, and a number of trading decision systems based on reinforcement learning methods have been applied in asset allocation and portfolio rebalancing. In this paper, we extend the existing work on recurrent reinforcement learning (RRL) and build an optimal variable-weight portfolio allocation under a coherent downside risk measure, the expected maximum drawdown, E(MDD). In particular, we propose a recurrent reinforcement learning method, with a coherent risk-adjusted performance objective function, the Calmar ratio, to obtain both buy and sell signals and asset allocation weights. Using a portfolio consisting of the most frequently traded exchange-traded funds, we show that the expected-maximum-drawdown-based objective function yields superior return performance compared with previously proposed RRL objective functions (i.e., the Sharpe ratio and the Sterling ratio), and that variable-weight RRL long/short portfolios outperform equal-weight RRL long/short portfolios under different transaction cost scenarios. We further propose an adaptive E(MDD)-risk-based RRL portfolio rebalancing decision system with a transaction cost and market condition stop-loss retraining mechanism, and we show that the proposed portfolio trading system responds better to transaction cost effects and consistently outperforms hedge fund benchmarks.
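The two ingredients of that objective can be written down compactly. The sketch below computes maximum drawdown of an equity curve and a Calmar ratio under one common convention (annualized return divided by maximum drawdown); conventions vary, and the paper's exact definitions may differ in detail.

    import numpy as np

    def max_drawdown(equity):
        # Largest peak-to-trough decline of the equity curve, as a fraction.
        equity = np.asarray(equity, dtype=float)
        peaks = np.maximum.accumulate(equity)
        return float(np.max((peaks - equity) / peaks))

    def calmar_ratio(equity, periods_per_year=252):
        # Annualized return over maximum drawdown (one common variant).
        equity = np.asarray(equity, dtype=float)
        years = len(equity) / periods_per_year
        annualized = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
        return annualized / max_drawdown(equity)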
Article
We explore the use of deep learning hierarchical models for problems in financial prediction and classification. Financial prediction problems – such as those presented in designing and pricing securities, constructing portfolios, and risk management – often involve large data sets with complex data interactions that currently are difficult or impossible to specify in a full economic model. Applying deep learning methods to these problems can produce more useful results than standard methods in finance. In particular, deep learning can detect and exploit interactions in the data that are, at least currently, invisible to any existing financial economic theory.
Article
This Article argues that the rise of algorithmic trading undermines efficient capital allocation in securities markets. It is a bedrock assumption in theory that securities prices reveal how effectively public companies utilize capital. This conventional wisdom rests on the straightforward premise that prices reflect available information about a security and that investors look to prices to decide where to invest and whether their capital is being productively used. Unsurprisingly, regulation relies pervasively on prices as a proxy for the allocative efficiency of investor capital. Algorithmic trading weakens the ability of prices to function as a window into allocative efficiency. This Article develops two lines of argument. First, algorithmic markets evidence a systemic degree of model risk: the risk that stylized programming and financial modeling fails to capture the messy details of real-world trading. By design, algorithms rely on pre-set programming and modeling to function. Traders must predict how markets might behave and program their algorithms accordingly in advance of trading, and this anticipatory dynamic creates steep costs. Building algorithms capable of predicting future markets presents a near-impossible proposition, making gaps and errors inevitable. These uncertainties create incentives for traders to focus efforts on markets where prediction is likely to be most successful, i.e., short-term markets that have limited relevance for capital allocation. Second, informed traders, long regarded as critical to filling gaps in information and supplying markets with insight, have fewer incentives to participate in algorithmic markets and to correct these and other informational deficits. Competing with high-speed, algorithmic counterparts, informed traders can see lower returns from their engagement. When informed traders lose interest in bringing insights to securities trading, prices are less rich as a result. This argument has significant implications for regulation that views prices as providing an essential window into allocative efficiency. Broad swaths of regulation across corporate governance and securities regulation rely on prices as a mechanism to monitor and discipline public companies. As algorithmic trading creates costs for capital allocation, this reliance must also be called into question. In concluding, this Article outlines pathways for reform to better enable securities markets to fulfill their fundamental purpose: efficiently allocating capital to the real economy.
Article
Can we train the computer to beat experienced traders in financial asset trading? In this paper, we address this challenge by introducing a recurrent deep neural network (NN) for real-time financial signal representation and trading. Our model is inspired by two biologically related learning concepts: deep learning (DL) and reinforcement learning (RL). In the framework, the DL part automatically senses the dynamic market condition for informative feature learning. Then, the RL module interacts with the deep representations and makes trading decisions to accumulate ultimate rewards in an unknown environment. The learning system is implemented in a complex NN that exhibits both deep and recurrent structures. Hence, we propose a task-aware backpropagation-through-time method to cope with the gradient vanishing issue in deep training. The robustness of the neural system is verified on both stock and commodity futures markets under broad testing conditions.
Article
The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
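The core of the deep Q-network update can be sketched as follows. The target-network call, the replay-batch layout, and the hyper-parameters here are generic assumptions rather than the paper's implementation; the sketch only shows how TD targets and epsilon-greedy actions are formed.

    import numpy as np

    def q_learning_targets(q_target_net, batch, gamma=0.99):
        # TD targets: r + gamma * max_a' Q_target(s', a'), zeroed at terminals.
        states, actions, rewards, next_states, dones = batch
        next_q = q_target_net(next_states).max(axis=1)
        return rewards + gamma * next_q * (1.0 - dones)

    def epsilon_greedy(q_values, eps):
        # Explore with probability eps, otherwise act greedily on Q-values.
        if np.random.rand() < eps:
            return np.random.randint(len(q_values))
        return int(np.argmax(q_values))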
Article
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and is based on adaptive estimates of lower-order moments of the gradients. The method is computationally efficient, has modest memory requirements, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The method exhibits invariance to diagonal rescaling of the gradients by adapting to the geometry of the objective function. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. We demonstrate that Adam works well in practice when compared experimentally to other stochastic optimization methods.
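The update itself is compact. The sketch below follows the standard form of the algorithm, with bias-corrected first- and second-moment estimates and the commonly cited default hyper-parameters.

    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # t is the 1-based step count, needed for bias correction.
        m = b1 * m + (1 - b1) * grad             # first-moment (mean) estimate
        v = b2 * v + (1 - b2) * grad ** 2        # second-moment estimate
        m_hat = m / (1 - b1 ** t)                # bias-corrected moments
        v_hat = v / (1 - b2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v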
Article
For many years economists, statisticians, and teachers of finance have been interested in developing and testing models of stock price behavior. One important model that has evolved from this research is the theory of random walks. This theory casts serious doubt on many other methods for describing and predicting stock price behavior – methods that have considerable popularity outside the academic world. For example, we shall see later that if the random walk theory is an accurate description of reality, then the various "technical" or "chartist" procedures for predicting stock prices are completely without value. In general the theory of random walks raises challenging questions for anyone who has more than a passing interest in understanding the behavior of stock prices. Unfortunately, however, most discussions of the theory have appeared in technical academic journals and in a form which the non-mathematician would usually find incomprehensible. This article describes, briefly and simply, the theory of random walks and some of the important issues it raises concerning the work of market analysts. To preserve brevity some aspects of the theory and its implications are omitted. More complete (and also more technical) discussions of the theory of random walks are available elsewhere; hopefully the introduction provided here will encourage the reader to examine one of the more rigorous and lengthy works listed at the end of this article. Common Techniques for Predicting Stock Market Prices: In order to put the theory of random walks into perspective we first discuss, in brief and general terms, the two approaches to predicting stock prices that are commonly espoused by market professionals. These are (1) "chartist" or "technical" theories and (2) the theory of fundamental or intrinsic value analysis. The basic assumption of all the chartist or technical theories is that history tends to repeat itself, i.e., past patterns of price behavior in individual securities will tend to recur in the future. Thus the way to predict stock prices (and, of course, increase one's potential gains) is to develop a familiarity with past patterns of price behavior in order to recognize situations of likely recurrence. Eugene F. Fama is Assistant Professor of Finance, Graduate School of Business, The University of Chicago.
Article
Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success--they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
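The procedure the paper advocates is simple to state in code: sample configurations independently from the search space and keep the best by validation score. In the sketch below, the search-space entries (a log-uniform learning rate, etc.) are illustrative choices, and train_and_score is a hypothetical callback returning a validation score.

    import random

    def random_search(train_and_score, n_trials=50, seed=0):
        rng = random.Random(seed)
        best_cfg, best_score = None, float("-inf")
        for _ in range(n_trials):
            cfg = {
                "lr": 10 ** rng.uniform(-5, -1),          # log-uniform learning rate
                "hidden_units": rng.choice([64, 128, 256, 512]),
                "dropout": rng.uniform(0.0, 0.5),
            }
            score = train_and_score(cfg)                  # hypothetical callback
            if score > best_score:
                best_cfg, best_score = cfg, score
        return best_cfg, best_score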
Article
We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
Article
The authors examine transaction costs associated with algorithmic trading, based on a sample of 2.5 million orders, of which one million are executed via algorithmic means. The data permit a comparison of algorithmic executions with a broader universe of trades, as well as across multiple providers of model-based trading services. Algorithmic trading is found to be a cost-effective technique, based on a measure of implementation shortfall. The superiority of algorithm performance applies only for order sizes up to 10% of average daily volume, however. Algorithmic trading performance relative to a commonly used volume participation benchmark is also quite good, although certainty of outcome declines sharply with the size of the order. A clear link between performance and variability in performance relative to both benchmarks appears to be lacking. Although rough equality across providers is observed on average, this equality of performance breaks down quickly as order size grows.
Article
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
Article
The paper deals with the problem of discrete-time delta hedging and discrete-time option valuation under the Black–Scholes model. Since hedging in the Black–Scholes model is continuous, hedging errors appear when it is applied to discrete trading. The hedging error is considered, and a discrete-time adjusted Black–Scholes–Merton equation is derived. By anticipating the time sensitivity of delta, discrete-time delta hedging can in many cases be improved, and more accurate delta values, dependent on the length of the rebalancing intervals, can be obtained. As an application, discrete-time trading with transaction costs is considered. An explicit solution of the option valuation problem is given, and a closed-form delta value for a European call option with transaction costs is obtained.
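For orientation, the quantity being rebalanced at discrete intervals is the option delta. The sketch below shows only the standard continuous-time Black–Scholes delta for a European call, Delta = N(d1); the paper's transaction-cost-adjusted delta is not reproduced here.

    # Standard Black-Scholes call delta: N(d1) with
    # d1 = (ln(S/K) + (r + sigma^2/2) T) / (sigma sqrt(T)).
    from math import log, sqrt
    from statistics import NormalDist

    def call_delta(S, K, r, sigma, T):
        d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
        return NormalDist().cdf(d1)

    print(call_delta(S=100, K=100, r=0.01, sigma=0.2, T=0.5))  # ~0.54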
Article
A major assumption in many machine learning and data mining algorithms is that the training and future data must be in the same feature space and have the same distribution. However, in many real-world applications, this assumption may not hold. For example, we sometimes have a classification task in one domain of interest, but we only have sufficient training data in another domain of interest, where the latter data may be in a different feature space or follow a different data distribution. In such cases, knowledge transfer, if done successfully, would greatly improve the performance of learning by avoiding much expensive data-labeling efforts. In recent years, transfer learning has emerged as a new learning framework to address this problem. This survey focuses on categorizing and reviewing the current progress on transfer learning for classification, regression, and clustering problems. In this survey, we discuss the relationship between transfer learning and other related machine learning techniques such as domain adaptation, multitask learning and sample selection bias, as well as covariate shift. We also explore some potential future issues in transfer learning research.
Article
It has long been recognized that trading volume provides valuable information for understanding stock price movement. As such, equivolume charting was developed to consider how stocks appear to move in a volume frame of reference as opposed to a time frame of reference. Two technical indicators, namely the volume adjusted moving average (VAMA) and the ease of movement (EMV) indicator, are derived from equivolume charting. This paper explores the profitability of stock trading by using a neural network model developed to assist the trading decisions of the VAMA and EMV. The generalized regression neural network (GRNN) is chosen and applied to past S&P 500 index data. For the VAMA, the GRNN is used to predict the future stock prices, as well as the future width of the equivolume boxes typically utilized on an equivolume chart, for calculating the future value of the VAMA. For the EMV, the GRNN is also used to predict the future value of the EMV. The idea is to further exploit the equivolume potential by using a forecasting system to predict the future equivolume measurements, allowing investors to enter or exit trades earlier. The results show that stock trading using the neural network with the VAMA and EMV outperforms stock trading generated from the VAMA and EMV without neural network assistance, the simple moving averages (MA) in isolation, and the buy-and-hold trading strategy.
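As a point of reference, one common form of the ease of movement (EMV) indicator is sketched below: the move of the bar's midpoint divided by a "box ratio" of volume to price range. Scaling conventions vary across sources, so the paper's exact variant may differ.

    def emv(high, low, prev_high, prev_low, volume, scale=1e8):
        # Midpoint move between consecutive bars...
        midpoint_move = (high + low) / 2.0 - (prev_high + prev_low) / 2.0
        # ...divided by a box ratio of (scaled) volume to the bar's range.
        box_ratio = (volume / scale) / (high - low)
        return midpoint_move / box_ratio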
Article
The stock market, which has been investigated by various researchers, is a rather complicated environment. Most research has considered only technical indexes (quantitative factors) rather than qualitative factors, e.g., political effects. However, the latter play a critical role in the stock market environment. Thus, this study develops a genetic-algorithm-based fuzzy neural network (GFNN) to formulate the knowledge base of fuzzy inference rules that can measure the qualitative effect on the stock market. This effect is then further integrated with the technical indexes through an artificial neural network (ANN). An example based on the Taiwan stock market is used to assess the proposed intelligent system. Evaluation results indicate that the neural network considering both the quantitative and qualitative factors outperforms the neural network considering only the quantitative factors, both in the clarity of buy-sell points and in buy-sell performance.
Article
Trading in stock market indices has gained unprecedented popularity in major financial markets around the world. However, predicting a stock price index is a very difficult problem because of the complexity of stock market data. This study proposes a stock trading model based on chaotic analysis and a piecewise nonlinear model. The core of the model is composed of four phases: the first determines the time-lag size of the input variables using chaotic analysis; the second detects successive change-points in the stock market data; the third forecasts the change-point group with backpropagation neural networks (BPNs); and the final phase forecasts the output with a BPN. The experimental results are encouraging and show the usefulness of the proposed model with respect to profitability.
Conference Paper
We present the first large-scale empirical application of reinforcement learning to the important problem of optimized trade execution in modern financial markets. Our experiments are based on 1.5 years of millisecond time-scale limit order data from NASDAQ, and demonstrate the promise of reinforcement learning methods to market microstructure problems. Our learning algorithm introduces and exploits a natural "low-impact" factorization of the state space.
Conference Paper
Restricted Boltzmann machines were developed using binary stochastic hidden units. These can be generalized by replacing each binary unit by an infinite number of copies that all have the same weights but have progressively more negative biases. The learning and inference rules for these "Stepped Sigmoid Units" are unchanged. They can be approximated efficiently by noisy, rectified linear units. Compared with binary units, these units learn features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset. Unlike binary units, rectified linear units preserve information about relative intensities as information travels through multiple layers of feature detectors.
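One way to write the noisy rectified linear approximation mentioned here is max(0, x + noise), with the noise variance given by the logistic sigmoid of the input. The sketch below is a minimal rendering of that approximation; the function name and the use of NumPy are assumptions for illustration.

    import numpy as np

    def noisy_relu(x, rng=np.random.default_rng(0)):
        # Approximates the sum of tied-weight sigmoid copies:
        # max(0, x + N(0, sigmoid(x))).
        sigma = 1.0 / (1.0 + np.exp(-x))           # logistic sigmoid of the input
        noise = rng.normal(0.0, np.sqrt(sigma))    # Gaussian noise, variance sigmoid(x)
        return np.maximum(0.0, x + noise)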
Conference Paper
We describe and analyze an online algorithm for supervised learning of pseudo-metrics. The algorithm receives pairs of instances and predicts their similarity according to a pseudo-metric. The pseudo-metrics we use are quadratic forms parameterized by positive semi-definite matrices. The core of the algorithm is an update rule that is based on successive projections onto the positive semi-definite cone and onto half-space constraints imposed by the examples. We describe an efficient procedure for performing these projections, derive a worst-case mistake bound on the similarity predictions, and discuss a dual version of the algorithm in which it is simple to incorporate kernel operators. The online algorithm also serves as a building block for deriving a large-margin batch algorithm. We demonstrate the merits of the proposed approach by conducting experiments on the MNIST dataset and on document filtering.
Conference Paper
A novel stochastic adaptation of the recurrent reinforcement learning (RRL) methodology is applied to daily, weekly, and monthly stock index data, and compared to results obtained elsewhere using genetic programming (GP). The data sets used have been considered a challenging test for algorithmic trading. It is demonstrated that RRL can reliably outperform buy-and-hold for the higher-frequency data, in contrast to GP, which performed best for monthly data.
Article
We describe, analyze, and experiment with a framework for empirical loss minimization with regularization. Our algorithmic framework alternates between two phases. On each iteration we first perform an unconstrained gradient descent step. We then cast and solve an instantaneous optimization problem that trades off minimization of a regularization term while keeping close proximity to the result of the first phase. This view yields a simple yet effective algorithm that can be used for batch penalized risk minimization and online learning. Furthermore, the two-phase approach enables sparse solutions when used in conjunction with regularization functions that promote sparsity, such as ℓ1. We derive concrete and very simple algorithms for minimization of loss functions with ℓ1, ℓ2, squared ℓ2, and ℓ∞ regularization. We also show how to construct efficient algorithms for mixed-norm ℓ1/ℓq regularization. We further extend the algorithms and give efficient implementations for very high-dimensional data with sparsity. We demonstrate the potential of the proposed framework in a series of experiments with synthetic and natural data sets.
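For the ℓ1 case, the second phase described above has a closed form. The sketch below shows the gradient step followed by soft-thresholding, which is what produces the sparse solutions the abstract mentions; names and step sizes are illustrative.

    import numpy as np

    def forward_backward_l1(w, grad, eta=0.1, lam=0.01):
        # Phase 1: unconstrained gradient step.
        w = w - eta * grad
        # Phase 2: closed-form proximal step for l1 (soft-thresholding),
        # which zeroes out small coordinates and yields sparse iterates.
        return np.sign(w) * np.maximum(np.abs(w) - eta * lam, 0.0)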
Article
Stock trading is an important decision-making problem that involves both stock selection and asset management. Though many promising results have been reported for predicting prices, selecting stocks, and managing assets using machine-learning techniques, considering all of them is challenging because of their complexity. In this paper, we present a new stock trading method that incorporates dynamic asset allocation in a reinforcement-learning framework. The proposed asset allocation strategy, called meta policy (MP), is designed to utilize the temporal information from both stock recommendations and the ratio of the stock fund over the asset. Local traders are constructed with pattern-based multiple predictors and used to decide the purchase money per recommendation. Formulating the MP in the reinforcement learning framework is achieved by a compact design of the environment and the learning agent. Experimental results
Article
Stock trading systems to assist decision-making are an emerging research area with great commercial potential. Successful trading operations should occur near the reversal points of price trends. Traditional technical analysis, which usually appears as various trading rules, aims to locate the peaks and bottoms of trends and is widely used in the stock market. Unfortunately, it is not convenient to apply technical analysis directly, since selecting appropriate rules for an individual share depends on a person’s experience. In this paper, we enhance conventional technical analysis with genetic algorithms by learning trading rules from history for each individual stock, and then combine different rules with an Echo State Network to provide trading suggestions. Numerous experiments on S&P 500 components demonstrate that, whether in a bull or a bear market, our system significantly outperforms the buy-and-hold strategy. Especially in a bear market, where the S&P 500 index declines substantially, our system still profits.
Article
The synthesis of fuzzy logic and methods of the Dempster–Shafer theory (the so-called rule-base evidential reasoning) has proved to be a powerful tool for building expert and decision-making systems. Nevertheless, there are two limitations of such approaches that reduce their ability to deal with the uncertainties that decision makers often meet in practice. The first limitation is that, in the framework of known approaches to rule-base evidential reasoning, a degree of belief can be assigned only to a particular hypothesis, not to a group of them, whereas assigning a belief mass to a group of events is a key principle of the Dempster–Shafer theory. The second limitation concerns the observation that in many real-world decision problems we deal with different sources of evidence, and their combination is needed. The known methods for rule-base evidential reasoning do not provide a technique for combining evidence from different sources. In the current paper, a new approach free of these limitations is proposed. Its advantages are demonstrated using simple numerical examples and a stock trading expert system developed, optimized, and tested on real data from the Warsaw Stock Exchange.
Article
Based on the principles of technical analysis, this paper proposes an artificial intelligence model that employs the Adaptive Network Fuzzy Inference System (ANFIS) supplemented by reinforcement learning (RL) as a non-arbitrage algorithmic trading system. The novel intelligent trading system is capable of identifying a change in a primary trend for trading and investment decisions. It dynamically determines the periods for momentum and moving averages using the RL paradigm, and appropriately shifts the cycle using ANFIS-RL to address the delay in the predicted cycle. This is used as a proxy to determine the best point in time to go LONG, and vice versa for SHORT. When this is coupled with a group of stocks, we derive a simple form of “riding the cycles – waves”. These are the derived features of the underlying stock movement. It provides a learning framework to trade on cycles. Initial experimental results are encouraging. Firstly, the proposed framework is able to outperform DENFIS and RSPOP in terms of true error and correlation. Secondly, based on test trading with five US stocks, the proposed trading system is able to beat the market by about 50 percentage points over a period of 13 years.
Article
Mini-batch algorithms have been proposed as a way to speed up stochastic convex optimization problems. We study how such algorithms can be improved using accelerated gradient methods. We provide a novel analysis, which shows how standard gradient methods may sometimes be insufficient to obtain a significant speed-up, and propose a novel accelerated gradient algorithm which addresses this deficiency, enjoys a uniformly superior guarantee, and works well in practice.
Article
Algorithmic trading has sharply increased over the past decade. Equity market liquidity has improved as well. Are the two trends related? For a recent five-year panel of New York Stock Exchange (NYSE) stocks, we use a normalized measure of electronic message traffic (order submissions, cancellations, and executions) as a proxy for algorithmic trading, and we trace the associations between liquidity and message traffic. Based on within-stock variation, we find that algorithmic trading and liquidity are positively related. To sort out causality, we use the start of autoquoting on the NYSE as an exogenous instrument for algorithmic trading. Previously, specialists were responsible for manually disseminating the inside quote. As stocks were phased in gradually during early 2003, the manual quote was replaced by a new automated quote whenever there was a change to the NYSE limit order book. This market structure change provides quicker feedback to traders and algorithms and results in more message traffic. For large-cap stocks in particular, quoted and effective spreads narrow under autoquote and adverse selection declines, indicating that algorithmic trading does causally improve liquidity.