Deep Reinforcement Learning Models for
Automated Stock Trading
Sarthak Singh1, Vedank Goyal2, Sarthak Goel3, and Prof. H.C. Taneja4
1 Department of Applied Mathematics, Delhi Technological University
sarthak.snhsingh104@gmail.com
2 Department of Applied Mathematics, Delhi Technological University
vedankgoyal99@gmail.com
3 Department of Applied Mathematics, Delhi Technological University
sarthak.goel01@gmail.com
4 Department of Applied Mathematics, Delhi Technological University
hctaneja@dce.ac.in
Abstract. This paper proposes an automated trading strategy using reinforcement
learning. The stock market has become one of the largest financial institutions, and
these institutions embrace machine learning solutions based on artificial intelligence
for market monitoring, credit quality, fraud detection, and many other areas. We aim
to provide an efficient and effective solution that overcomes the drawbacks of manual
trading by building a trading bot. In this paper, we propose a stock trading strategy
that uses reinforcement learning algorithms to maximize profit. The strategy employs
three actor-critic models: Advantage Actor-Critic (A2C), Twin Delayed DDPG (TD3),
and Soft Actor-Critic (SAC), and picks the most suitable model for the current market
situation. The performance of our trading bot is evaluated and compared with
Markowitz portfolio theory.
Keywords: Trading, Trading bot, Portfolio, Reinforcement Learning, Markowitz
Efficient Frontier Model, Actor-Critic model, Twin Delayed DDPG, Soft Actor-Critic
1 Introduction
People have traded stocks for years in order to make the best use of their idle wealth.
Trading involves considerable risk, so traders continuously look for ways to reduce
these risks and maximize returns [1]. Stock trading allows companies to raise capital
from the investing public to fund their growth and operations. It may generate a
greater rate of return, but a large amount of risk is still involved.
The stock market is extremely volatile and can fluctuate drastically within the space
of a few seconds. Investors can participate in trading around the world at any time of
the day, and they often regret not buying or selling a particular stock at a particular
moment. Because of this behavior of stocks, large investment firms and private banks
try to create strategies to predict the stock market, but the general public does not have
access to these tools [2]. A trading bot overcomes the limitations that a manual trader
faces. Although it can perform calculations much faster than humans, a trading bot fails to
replicate the intelligence of humans when it comes to decision making [3]. Recently,
machine learning and deep learning algorithms have been widely used to build models
that reduce risk in the financial market [4].
We aim to provide an efficient and effective solution that overcomes the drawbacks
of manual trading by building a trading bot [5]. Our strategy employs three actor-critic
models: Advantage Actor-Critic (A2C) [6], [7], Twin Delayed DDPG (TD3) [8], [9],
and Soft Actor-Critic (SAC) [10]. The resulting trading bot is able to trade automatically
on its own under different market conditions and user approaches throughout the trading
period, with continuous adjustments to ensure the best return for the specified period.
2 Problem Description
Our stock trading problem is modeled as a Markov Decision Process. It consists of an
agent and an environment: the agent is the trading bot, and the environment is the
financial market that the bot observes. The agent observes the market and chooses a
trading action from the action space.
2.1 MDP for Trading
The Markov Decision Process consists of a state vector S = [p, c, b], an action space A
containing the actions buy, sell, and hold, a transition model T(s, a, s'), a reward
function R(s, a), and a policy π(s) → a.
The trading bot tries to maximize the cumulative reward it will receive in the future.
Fig. 1 shows the framework for this process.
Fig. 1. Framework for MDP
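As a concrete illustration of this MDP, the sketch below implements a single-stock Gym-style environment whose state is [price, shares held, cash balance] (one plausible reading of S = [p, c, b]) and whose action is a continuous value in [-1, 1], interpreted as the fraction of cash to invest or of holdings to sell. The class name, initial balance, and reward shaping are illustrative assumptions, not the exact environment used in this paper.

```python
import gym
import numpy as np
from gym import spaces


class SimpleTradingEnv(gym.Env):
    """Minimal single-stock trading MDP sketch (illustrative, not the paper's environment)."""

    def __init__(self, prices, initial_balance=10_000.0):
        super().__init__()
        self.prices = np.asarray(prices, dtype=np.float32)
        self.initial_balance = initial_balance
        # Action: fraction of cash to invest (>0) or of holdings to sell (<0).
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        # State S = [price, shares held, cash balance].
        self.observation_space = spaces.Box(low=0.0, high=np.inf, shape=(3,), dtype=np.float32)
        self.reset()

    def reset(self):
        self.t = 0
        self.shares = 0
        self.balance = self.initial_balance
        return self._state()

    def _state(self):
        return np.array([self.prices[self.t], self.shares, self.balance], dtype=np.float32)

    def _portfolio_value(self):
        return self.balance + self.shares * self.prices[self.t]

    def step(self, action):
        value_before = self._portfolio_value()
        price = float(self.prices[self.t])
        a = float(np.clip(np.asarray(action).flatten()[0], -1.0, 1.0))
        if a > 0:                                   # buy: spend a fraction of cash
            bought = int((a * self.balance) // price)
            self.shares += bought
            self.balance -= bought * price
        elif a < 0:                                 # sell: liquidate a fraction of holdings
            sold = int(-a * self.shares)
            self.shares -= sold
            self.balance += sold * price
        self.t += 1
        done = self.t >= len(self.prices) - 1
        # Reward R(s, a): change in total portfolio value after acting and moving to t+1.
        reward = self._portfolio_value() - value_before
        return self._state(), reward, done, {}
```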
2.2 Financial Indicators
Different types of financial indicators were used to represent a company's financial
position: profitability ratios, market liquidity ratios, efficiency ratios, leverage
financial ratios, and market valuation ratios.
2.3 Assumptions and Constraints
The trading bot does not affect the financial market in any way.
All allowed actions lead to a non-negative balance.
Events that could collapse the stock market, such as war or a sovereign debt crisis, are not considered.
3 Trading Agent
The proposed strategy employs three actor-critic models: Advantage Actor-Critic (A2C),
Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC). It picks the most suitable
model for the current market situation.
3.1 Advantage Actor-Critic (A2C)
A2C is an actor-critic algorithm designed to improve the policy gradient. Instead of
estimating the value function directly, A2C estimates the advantage function in order
to reduce the variance of the policy gradients [11]. The advantage function measures
how much better a given action is than the average action in that state, which makes
the model more robust. A2C uses several copies of the same agent, each interacting
with the environment independently, and combines their updates synchronously.
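For reference, the advantage function that A2C estimates can be written in its standard form (a general definition, not a formula reproduced from this paper), where the one-step temporal-difference estimate avoids learning a separate Q-function:

```latex
A^{\pi}(s_t, a_t) = Q^{\pi}(s_t, a_t) - V^{\pi}(s_t)
                 \approx r_t + \gamma V^{\pi}(s_{t+1}) - V^{\pi}(s_t)
```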
3.2 Twin Delayed DDPG (TD3)
TD3, or Twin Delayed DDPG, is an off-policy algorithm that operates on continuous
action spaces. In the Bellman error loss function, TD3 uses the minimum of the two
Q-functions it maintains. TD3 adds noise to the actions at training time to encourage
greater exploration of the policy space, and it updates the target networks and the
policy less frequently than the Q-functions [12].
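In standard TD3 notation (again, a general formulation rather than one given in this paper), the clipped double-Q target with target policy smoothing reads:

```latex
y = r + \gamma \min_{i=1,2} Q_{\theta'_i}\!\big(s',\, \pi_{\phi'}(s') + \epsilon\big),
\qquad \epsilon \sim \operatorname{clip}\big(\mathcal{N}(0, \sigma), -c, c\big)
```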
3.3 Soft Actor-Critic (SAC)
SAC, or Soft Actor-Critic, is an off-policy algorithm that forms a bridge between
stochastic policy optimization and DDPG. Unlike TD3, it is built on a stochastic
policy, although it does incorporate the clipped double-Q trick; because the stochastic
policy is optimized directly [13], SAC ends up benefiting from an effect similar to
target policy smoothing. SAC uses entropy regularization and maximizes a trade-off
between expected return and entropy.
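The entropy-regularized objective that SAC maximizes can be stated in its standard form (a general formulation, not taken from this paper), where the temperature α controls the trade-off between return and entropy:

```latex
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t} \gamma^{t}
\Big( r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right]
```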
3.4 Ensemble Strategy
To build a highly robust trading model, the three models above are combined into a
single ensemble and allowed to trade based on the Sharpe ratio. One agent performs
well in a bull run but poorly in a bear run, another does the opposite, and a third is
better suited to high volatility. The trading agent is therefore chosen to maximize
risk-adjusted returns: the agent with the highest Sharpe ratio over the evaluation
window is selected.
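A minimal sketch of this selection rule follows: every candidate agent is scored by the Sharpe ratio of its validation returns, and the highest-scoring agent trades the next quarter. The helper names and the 252-day annualization factor are assumptions for illustration.

```python
import numpy as np


def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio of a series of daily returns (risk-free rate assumed zero)."""
    r = np.asarray(daily_returns, dtype=float)
    if r.std() == 0:
        return 0.0
    return np.sqrt(periods_per_year) * r.mean() / r.std()


def select_agent(validation_returns):
    """validation_returns: dict mapping agent name -> daily returns on the validation window."""
    scores = {name: sharpe_ratio(r) for name, r in validation_returns.items()}
    best = max(scores, key=scores.get)   # highest Sharpe ratio trades the next quarter
    return best, scores


# Example call (with hypothetical return series for the three agents):
# best, scores = select_agent({"A2C": a2c_returns, "TD3": td3_returns, "SAC": sac_returns})
```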
4 Architecture
There are three layers in the FinRL library: the stock trading application, which is the
application layer; the DRL trading agent, which is the agent layer; and the stock market
environment, which is the finance market environment layer. Each layer has several
modules, each defining a separate function, and the modules are relatively independent.
Any module from any layer can be selected on the basis of one's stock trading strategy.
Fig. 2 shows how the agent layer interacts with the finance market environment in such
a way that the reward function is maximized.
Fig. 2. Architecture
4.1 Environment
This trading environment is based on OpenAI Baselines [14] and Stable Baselines [15].
It can simulate live stock market trading with real market data following the principle
of time-driven simulation. Stable Baselines is an improved implementation of RL
algorithms built on top of the OpenAI Baselines. To define the parameters of the
environment, the stock dimension and the state space were specified; the stock
dimension is equal to the number of unique tickers used to train the model.
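To make the layering concrete, the sketch below wires the three Stable Baselines3 agents to a Gym-style trading environment (here, the illustrative SimpleTradingEnv sketched in Section 2.1 rather than FinRL's full environment). The price file, training budget, and hyperparameters are placeholders, not the paper's settings.

```python
import numpy as np
from stable_baselines3 import A2C, TD3, SAC

# Illustrative: daily closing prices for one ticker loaded from a CSV file.
prices = np.loadtxt("close_prices.csv", delimiter=",")
env = SimpleTradingEnv(prices)   # the sketch environment from Section 2.1

# One model per algorithm; hyperparameters are library defaults, not the paper's.
agents = {
    "A2C": A2C("MlpPolicy", env, verbose=0),
    "TD3": TD3("MlpPolicy", env, verbose=0),
    "SAC": SAC("MlpPolicy", env, verbose=0),
}

for name, model in agents.items():
    model.learn(total_timesteps=50_000)        # training budget is illustrative
    model.save(f"{name.lower()}_trading_agent")
```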
4.2 Data
Daily stock price data are downloaded from Yahoo Finance using the Yahoo Finance
API, and the extracted data are stored in a dataframe using the Python library Pandas,
as shown in Fig. 3. The Dow Jones Index constituents from WRDS are used for this
model. The training set covers the ten-year period from 01/01/2009 to 31/12/2019, and
the trade set covers the period from 01/01/2020 to 31/10/2021.
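A sketch of this data step, using the yfinance package to pull daily price data into a pandas DataFrame, is given below; the ticker list is truncated for brevity, and column handling may differ slightly across yfinance versions.

```python
import yfinance as yf

# Subset of the Dow Jones 30 constituents (truncated for brevity).
tickers = ["AAPL", "MSFT", "JPM", "GS", "KO"]

# Training window 01/01/2009-31/12/2019 and trading window 01/01/2020-31/10/2021
# (the 'end' date in yfinance is exclusive).
train_df = yf.download(tickers, start="2009-01-01", end="2020-01-01")
trade_df = yf.download(tickers, start="2020-01-01", end="2021-11-01")

print(train_df["Close"].head())   # one closing-price column per ticker
```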
5 Performance Evaluation
5.1 Agent Selection
As shown in Table 1, SAC has the best Sharpe ratio (-0.71) of the three models for the
first quarter, so SAC is used as the trading agent for the period 2020-01 to 2020-03.
SAC again has the best Sharpe ratio (1.20) for the next quarter, so SAC is used as the
trading agent for 2020-04 to 2020-06. TD3 has the best Sharpe ratio (0.81) for the
following quarter, so TD3 is used as the trading agent for 2020-07 to 2020-09.
Table 1. Sharpe Ratio of Trading Agents
Period               A2C     TD3     SAC     Selected Agent
2020-01 to 2020-03   -0.92   -0.75   -0.71   SAC
2020-04 to 2020-06    1.10    1.17    1.20   SAC
2020-07 to 2020-09    0.58    0.81    0.36   TD3
2020-10 to 2020-12    1.27    1.10    0.72   A2C
2021-01 to 2021-03    0.90    1.36    1.47   SAC
2021-04 to 2021-06   -0.002   0.86    1.34   SAC
2021-07 to 2021-09   -0.46   -0.19    0.12   SAC
2021-10 to 2021-11    2.66    1.38    3.66   SAC
5.2 Metric Used for Performance Evaluation
We use several metrics to evaluate the performance of the models, because a single
metric alone cannot be trusted to give a complete picture of a model's performance
and output. These metrics are annualized return, annualized volatility, Sharpe ratio,
and cumulative return.
Table 2. Performance Evaluation
01/01/2020-31/10/2021   Ensemble Strategy   A2C      TD3      SAC      MPT
Cumulative Return       17.56%              11.85%   20.07%   15.60%   7.89%
Sharpe Ratio            0.27                0.17     0.24     0.19     0.09
Annual Return           3.4%                2.9%     3.7%     3.5%     1%
Annual Volatility       9.76%               15.92%   14.35%   16.38%   24.67%
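The four metrics in Table 2 can be computed from a series of daily portfolio returns roughly as follows (a sketch assuming 252 trading days per year and a zero risk-free rate):

```python
import numpy as np


def evaluate(daily_returns, periods_per_year=252):
    """Compute cumulative return, annualized return/volatility, and Sharpe ratio."""
    r = np.asarray(daily_returns, dtype=float)
    cumulative = np.prod(1.0 + r) - 1.0
    annual_return = (1.0 + cumulative) ** (periods_per_year / len(r)) - 1.0
    annual_volatility = r.std() * np.sqrt(periods_per_year)
    sharpe = np.sqrt(periods_per_year) * r.mean() / r.std() if r.std() > 0 else 0.0
    return {
        "Cumulative Return": cumulative,
        "Annual Return": annual_return,
        "Annual Volatility": annual_volatility,
        "Sharpe Ratio": sharpe,
    }
```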
6 Conclusion
In this paper, the deep reinforcement learning algorithms Advantage Actor-Critic (A2C),
Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC) were explored. To adapt to
continuously changing market conditions, an ensemble strategy was proposed to
optimize the annual return and reduce risk. Table 1 shows how this strategy picks the
best model out of the three algorithms used, and Table 2 shows that the proposed
strategy outperforms Markowitz portfolio theory: the annual return of our ensemble
strategy is 3.4%, whereas for Markowitz portfolio theory it is just 1%.
For future work, there is considerable scope for testing the model on various
parameters. In this paper we kept some parameters constant and did not vary them.
The rebalancing period of the bot could be treated as a parameter: we used a quarter
of the financial year, but this could be tested against a different number of trading
days. We could also vary the number of tickers in the portfolio to obtain further
results. Machine learning techniques could be used to optimize the agent selection
process, and transaction costs could be added to the state space.
References
[1] Demirgüç-Kunt A, Levine R. Stock market development and financial intermediaries: stylized facts. The World Bank Economic Review. 1996 May 1;10(2):291-321.
[2] Goriaev A, Zabotkin A. Risks of investing in the Russian stock market: Lessons of the first decade. Emerging Markets Review. 2006 Dec 1;7(4):380-97.
[3] Mathur M, Mhadalekar S, Mhatre S, Mane V. Algorithmic Trading Bot. In: ITM Web of Conferences 2021 (Vol. 40, p. 03041). EDP Sciences.
[4] Yang H, Liu XY, Wu Q. A practical machine learning approach for dynamic stock recommendation. In: 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE) 2018 Aug 1 (pp. 1693-1697). IEEE.
[5] Kanade P, Singh S, Rajoria S, Veer P, Wandile N. Machine learning model for stock market prediction. International Journal for Research in Applied Science and Engineering Technology. 2020;8(6):209-16.
[6] Sagiraju K, Shashi M. Reinforcement Learning Algorithms for Automated Stock Trading. Advances in Dynamical Systems and Applications (ADSA). 2021;16(2):1019-32.
[7] AbdelKawy R, Abdelmoez WM, Shoukry A. A synchronous deep reinforcement learning model for automated multi-stock trading. Progress in Artificial Intelligence. 2021 Mar;10(1):83-97.
[8] Park DY, Lee KH. Practical algorithmic trading using state representation learning and imitative reinforcement learning. IEEE Access. 2021 Nov 13;9:152310-21.
[9] Wang B, Zhang X. Deep Learning Applying on Stock Trading. 2021 (pp. 1-6).
[10] Kathirgamanathan A, Twardowski K, Mangina E, Finn DP. A centralized soft actor critic deep reinforcement learning approach to district demand side management through CityLearn. In: Proceedings of the 1st International Workshop on Reinforcement Learning for Energy Management in Buildings & Cities 2020 Nov 17 (pp. 11-14).
[11] Grondman I, Busoniu L, Lopes GA, Babuska R. A survey of actor-critic reinforcement learning: Standard and natural policy gradients. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews). 2012 Dec 21;42(6):1291-307.
[12] Cui Y, Huang X, Wu D, Zheng H. Machine Learning based Resource Allocation Strategy for Network Slicing in Vehicular Networks. In: 2020 IEEE/CIC International Conference on Communications in China (ICCC) 2020 Aug 9 (pp. 454-459). IEEE.
[13] Gros S, Zanon M. Reinforcement learning based on MPC and the stochastic policy gradient method. In: 2021 American Control Conference (ACC) 2021 May 25 (pp. 1947-1952). IEEE.
[14] Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W. OpenAI Gym. arXiv preprint arXiv:1606.01540. 2016 Jun 5.
[15] Raffin A, Hill A, Ernestus M, Gleave A, Kanervisto A, Dormann N. Stable Baselines3.