DeepFolio: Convolutional Neural Networks for
Portfolios with Limit Order Book Data
Aiusha Sangadiev‡‡∗, Rodrigo Rivera-Castro‡‡† , Kirill Stepanov‡‡‡,
Andrey Poddubny‡‡§, Kirill Bubenchikov‡‡, Nikita Bekezin‡‡‖,
Polina Pilyugina∗∗ and Evgeny Burnaev††
Skoltech
Moscow, Russia
‡‡Equal Contribution
Email: aiusha.sangadiev@skoltech.ru, rodrigo.riveracastro@skoltech.ru, kirill.stepanov@skoltech.ru,
§andrey.poddubny@skoltech.ru, kirill.bubenchikov@skoltech.ru, ‖nikita.bekezin@skoltech.ru,
∗∗polina.pilyugina@skoltech.ru, ††e.burnaev@skoltech.ru
Abstract—This work proposes DeepFolio, a new model for deep portfolio management based on data from limit order books (LOB). DeepFolio addresses shortcomings of the state-of-the-art model for predicting price movements from LOB data. Our evaluation consists of two scenarios using a large dataset of millions of time series. The improvements deliver superior results in cases of both abundant and scarce data. The experiments show that DeepFolio outperforms the state-of-the-art on the FI-2010 LOB benchmark. Further, we use DeepFolio for optimal portfolio allocation of crypto-assets with rebalancing. For this purpose, we use two loss functions: Sharpe-ratio loss and minimum volatility risk. We show that DeepFolio outperforms portfolio allocation techniques widely used in the literature.
Index Terms—Investment Portfolios, Big Data Mining, Cryptoassets, Convolutional Neural Networks
I. INTRODUCTION
More than half of the financial world uses electronic Limit
Order Books (LOBs). LOBs are a store of records of all transactions [1], [2]. A limit order is a request to transact with a financial instrument at a price not exceeding a threshold [3].
Usually, traders set so-called buy limit orders below the current market price; they represent the maximum price the trader is willing to pay. Conversely, traders set sell limit orders above the current market price; they act as the minimum price at which the trader is willing to sell. LOBs are also gaining popularity in the relatively new and rapidly developing crypto-asset market. The novelty of this market leads to low liquidity and increased stochastic behavior of crypto-asset prices [4]. It is easy to
see the drivers behind the increasing popularity of LOBs. Our
example in Figure 1 shows how traders control the price of the
transaction and the logic behind a LOB. First, a passive order
for one ETH crypto asset at 260 USDT arrives. Similarly, a
retail order to sell three crypto-assets at 300 USDT appears.
The sell order matches with three passive orders to buy. Second,
a trade executes at 300 USDT, and the LOB removes the buy
orders.
Sections I, II, III, IV were supported by the Ministry of Education and
Science of the Russian Federation (Grant no. 14.756.31.0001). Other sections
were supported by the Mexican National Council for Science and Technology
(CONACYT), 2018-000009-01EXTF-00154.
Fig. 1. Example of a LOB for the ETH crypto-asset
Accordingly, modeling LOBs with mathematical methods is
a challenging task. Typically, researchers resort to using the
autoregressive integrated moving average model (ARIMA) [5]. Alternatively, the vector autoregressive model (VAR) [6] is a
popular choice. One of the benefits of the VAR is that it can
display the direction of transactions. However, LOB data is highly stochastic and the time series are non-stationary, which adds noise to the data. This setting makes building dedicated models and processing the data demanding. Another
limitation of those techniques is that they make assumptions
on the data. To overcome these limitations, [7] suggests a
state-of-the-art model called DeepLOB.
In this work, we propose a LOB-based approach to predict
price trends of crypto-assets. Leveraging deep neural networks, we call our approach DeepFolio. Our proposal achieves superior
results and addresses some of the problems of DeepLOB.
Moreover, we go a step further and use DeepFolio to build
investment portfolios. Thus, this work adds a new entry to the
”deep portfolio” literature.
The remainder of the paper is structured as follows. We introduce relevant literature in Section II and our data processing in Section III. We then present our methodology in Section IV. Section V describes our experiments and the results of our method, comparing it with a range of baseline algorithms. In Section VI, we summarise our findings and discuss possible future work.
II. RELATED WORK
A. Deep learning and LOBs
There have been previous attempts to work with limit order
book data using machine learning methods. For example, in [8], the authors extract features using principal component analysis (PCA).
Furthermore, in a second step, they use linear discriminant
analysis (LDA). However, these techniques are suited only to static statistical features and are not designed to capture the temporal dynamics of the data. Another critical point is that these
models make inherent assumptions about the data. As a result,
these techniques yield lower efficiency.
Besides [7], there are several works in the literature. Their
focus is on the application of deep learning and neural
networks. They use them to process limit order book data,
and then to classify price trends. Along this line of work,
one of the most notable entries is [9]. The authors propose to
use a fully convolutional neural network (FCNN). With the
FCNN, they extract features and perform trend classification.
This approach shows significant improvements over more
conventional methods such as support vector machines (SVM).
Another example of using deep learning as a classifier for
LOB data is [10]. In this work, the authors apply an LSTM
to perform trend forecasting based on LOB data. Finally,
[7] combines these two approaches to create a mixed CNN
and LSTM neural network. With LOB data, the approach
delivers state-of-the-art results in the classification of trends.
Significantly, it outperforms approaches using pure CNNs or
LSTMs.
B. Markowitz mean-variance model
The Markowitz mean-variance model is a classic approach.
Portfolio managers use it widely for portfolio building. The
central assumption underlying this theory is that the investor
has two choices. She will try to maximize profits at a given
level of risk or minimize risk at a given level of profit. The Markowitz model builds a broad array of possible portfolios to reach these goals and then chooses one of them through optimization on the risk-return curve. To build
the space of possible portfolios, Markowitz leverages three elements: a class of assets, a vector of average expected returns, and a covariance matrix [11]. With
this, the Markowitz model constructs an array of portfolios
with various profitability-risk ratios [11]. Since the analysis
builds on two criteria, the manager selects the portfolios based
on three choices:
1) She searches for effective, i.e., non-improvable, solutions.
2) She chooses a main criterion, e.g., minimum profitability, using the other criteria as constraints.
3) She defines a ”super criterion,” such as a superposition of the previous two options.
In this work, the criteria for choosing the optimal portfolio are the maximum Sharpe ratio [12], a standard metric for assessing ”optimality”, and the minimum volatility risk.
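To make the two criteria concrete, the following sketch (our illustration, not part of the original method description) selects maximum-Sharpe and minimum-volatility portfolios from a vector of expected returns and a covariance matrix; the long-only bounds, the zero risk-free rate, and the function names are assumptions for the example.

```python
# Illustrative sketch of the two portfolio-selection criteria; not the paper's code.
import numpy as np
from scipy.optimize import minimize

def max_sharpe_weights(mu, cov, risk_free=0.0):
    """Long-only weights maximizing the Sharpe ratio (by minimizing its negative)."""
    n = len(mu)
    def neg_sharpe(w):
        ret = w @ mu - risk_free
        vol = np.sqrt(w @ cov @ w)
        return -ret / vol
    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)   # weights sum to 1
    bounds = [(0.0, 1.0)] * n                                   # long-only assumption
    res = minimize(neg_sharpe, np.full(n, 1.0 / n), bounds=bounds, constraints=cons)
    return res.x

def min_volatility_weights(mu, cov):
    """Long-only weights minimizing portfolio volatility."""
    n = len(mu)
    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    bounds = [(0.0, 1.0)] * n
    res = minimize(lambda w: np.sqrt(w @ cov @ w), np.full(n, 1.0 / n),
                   bounds=bounds, constraints=cons)
    return res.x
```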
III. DATA
A. FI-2010 dataset
This dataset is the first public marked-up dataset of high-
frequency financial markets [13]. It is well suited for assessing and benchmarking forecasting methods. It consists of normalized representations of time-series data from five stocks of the NASDAQ Nordic stock market, resulting in approximately 4,000,000 time-series samples covering ten consecutive days. The dataset provides three different normalizations: z-score, min-max, and decimal precision normalization. Due to its richness and relevance, it is a good benchmark for LOB-based deep learning models [7].
B. Crypto-assets dataset
Limit order books for crypto-assets are not readily available.
Hence, we assemble the datasets using the public API of
Binance [14]. Binance is a relevant market for the trade of
crypto-assets. In our dataset, the time length of the collected
data is one year. It starts on February 27, 2019, and has an
hour resolution. The data consists of orders defined by bid or ask labels, time steps, volumes, and prices. We divide the orders into asks and bids and take the ten best asks, the ten best bids, and their respective volumes within a five-minute interval. As a result, we obtain 40 values for a single time step: 20 ask and bid prices and 20 corresponding volumes. The percentage of missing values is less than 6%, and they are distributed evenly across the dataset. For data imputation, we consider methods relying on neighboring values, such as the simple arithmetic or root-mean-square average of prices connected to an order volume. However, such averaging is likely to distort the data. For this reason, we use the propagation of the last observed value as an additional imputation technique. Moreover, we normalize the data using
the dynamic z-normalization, see Equation 1.
$$ z = \frac{x - \mu}{\sigma} \quad (1) $$

We use the mean µ and the standard deviation σ of the previous five days to normalize the values of the current day. In the financial time series literature, dynamic normalization is a reasonable choice, since financial time series are usually affected by regime shifts [7].
In particular, we can represent crypto-assets’ prices as a sum.
For [15], the sum consists of the primary trend plus some noise, or of long-term and short-term volatility. Along these lines, the dynamic normalization keeps the data within an appropriate range. If we applied z-normalization over the whole dataset, we would destroy the underlying data patterns. Finally, for
each point in the dataset, we establish a mid-price outlined
in Equation 2. It is the average between the best ask and the
best bid. Throughout this work, we use mid-prices for further
calculations.
$$ p_t = \frac{p_a^{(1)}(t) + p_b^{(1)}(t)}{2} \quad (2) $$
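A minimal preprocessing sketch for the two steps above, assuming the LOB snapshots sit in a pandas DataFrame and that columns such as ask_price_1 and bid_price_1 hold the best quotes (the column names and the rolling-window length in rows are our assumptions):

```python
# Illustrative preprocessing sketch; column names and the window length are assumed.
import pandas as pd

def dynamic_z_normalize(df: pd.DataFrame, window: int) -> pd.DataFrame:
    """Normalize each row with the mean/std of the previous `window` rows (Eq. 1).
    `window` would roughly be the number of 5-minute steps in five days."""
    mu = df.rolling(window).mean().shift(1)      # statistics computed from the past only
    sigma = df.rolling(window).std().shift(1)
    return (df - mu) / sigma

def mid_price(df: pd.DataFrame) -> pd.Series:
    """Mid-price as the average of the best ask and best bid (Eq. 2)."""
    return (df["ask_price_1"] + df["bid_price_1"]) / 2.0
```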
After that, we generate three labels indicating price movements: increase, decrease, or uncertainty. The third label is assigned whenever an increase or decrease is too small to confirm a trend. Since financial data is inherently noisy and highly stochastic, we use label smoothing strategies. For this purpose, we calculate m^-, see Equation 3, and m^+, see Equation 4. These values denote the averages of the previous and next k mid-prices. We then calculate the ”smoothed labels” l_t, outlined in Equation 5 and Equation 6, respectively. These values show relative changes in the asset and its trend, taking into account a k-point smoothing.
$$ m^-(t) = \frac{1}{k} \sum_{i=0}^{k} p_{t-i} \quad (3) $$

$$ m^+(t) = \frac{1}{k} \sum_{i=0}^{k} p_{t+i} \quad (4) $$

$$ l_t = \frac{m^+(t) - p_t}{p_t} \quad (5) $$

$$ l_t = \frac{m^+(t) - m^-(t)}{m^-(t)} \quad (6) $$
For the final label distribution, we set a threshold α equal to 0.001. Changes of 0.1% are sufficiently large to indicate a price movement. If l_t > α, the label signals an increase. Otherwise, if l_t < -α, the price is decreasing. We consider the interval [-α, α] to be an intermediate value of l_t; in this case, there is no increase or decrease in price, as the changes are insignificant for this range of values. Figure 2 presents this logic for the crypto-asset BTC. In our example, the green background represents a buy signal, red a sell signal, and white a hold signal.
Fig. 2. An example of the labeling process for the crypto-asset BTC. Green
represents buy, red sell and white hold.
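As an illustration only (the array handling and the alignment of the k-step windows are our assumptions), the smoothed labels of Equations 3-6 and the threshold rule can be sketched as:

```python
# Illustrative sketch of the smoothed labeling; follows the prose description of
# "previous and next k mid-prices" rather than the exact summation indices.
import numpy as np

def smoothed_labels(mid_prices: np.ndarray, k: int, alpha: float = 0.001) -> np.ndarray:
    """Return +1 (up), -1 (down), or 0 (hold) per time step."""
    labels = np.zeros(len(mid_prices), dtype=int)
    for t in range(k, len(mid_prices) - k):
        m_minus = mid_prices[t - k:t].mean()         # average of the previous k mid-prices (Eq. 3)
        m_plus = mid_prices[t + 1:t + k + 1].mean()  # average of the next k mid-prices (Eq. 4)
        l_t = (m_plus - m_minus) / m_minus           # relative trend change (Eq. 6)
        if l_t > alpha:
            labels[t] = 1
        elif l_t < -alpha:
            labels[t] = -1
    return labels
```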
IV. MODEL ARCHITECTURE
A. CNN+RNN
The first module of DeepFolio consists of three main blocks.
The first block is a fully convolutional neural network (FCNN).
The second block is an Inception block, and the third is an LSTM network. The input to this network has three dimensions: batch size, sequence length, and features.
Hence, we consider this module to be a ”CNN+RNN.”
The FCNN block has three sub-blocks. In the first sub-block, we have a strided convolutional layer with a kernel size of 1×2; thus, it performs convolutions strictly over LOB levels. Two convolutional layers follow in the second sub-block; due to their kernel sizes of 4×1, they capture short-term time dependencies. In the last sub-block of the FCNN, the kernel size expands to 1×10, so it performs convolutions over the remaining elements in the feature dimension.
Similarly, we employ an Inception block [16]. It enables us
to capture dynamic behaviors over multiple time scales. This
block is equivalent to performing multiple moving averages
over different periods. For financial time series analysis, we
can use it to capture the time-series momentum.
The last LSTM block captures long-term temporal depen-
dencies in the data. We feed its output into a fully-connected
layer with a softmax activation function. It has three outputs
to produce probabilities of having one of three possible labels.
They are a negative price trend (-1), a neutral trend (0), and a positive trend (+1).
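A compact sketch of this ”CNN+RNN” module in PyTorch is shown below; the channel widths, paddings, and hidden size are illustrative assumptions, and the Inception block is omitted for brevity:

```python
# Illustrative sketch of the CNN+RNN module; not the paper's exact configuration.
import torch
import torch.nn as nn

class CNNRNN(nn.Module):
    def __init__(self, n_classes: int = 3, hidden: int = 64):
        super().__init__()
        self.fcnn = nn.Sequential(
            # sub-block 1: strided 1x2 convolution over LOB levels
            nn.Conv2d(1, 16, kernel_size=(1, 2), stride=(1, 2)), nn.LeakyReLU(0.01),
            # sub-block 2: two 4x1 convolutions for short-term time dependencies
            nn.Conv2d(16, 16, kernel_size=(4, 1), padding=(2, 0)), nn.LeakyReLU(0.01),
            nn.Conv2d(16, 16, kernel_size=(4, 1), padding=(1, 0)), nn.LeakyReLU(0.01),
            # sub-block 3: 1x10 convolution over the remaining feature dimension
            nn.Conv2d(16, 16, kernel_size=(1, 10)), nn.LeakyReLU(0.01),
        )
        self.lstm = nn.LSTM(input_size=16, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, time, 40 LOB features)
        z = self.fcnn(x)                      # (batch, 16, time, remaining features)
        z = z.mean(dim=3).permute(0, 2, 1)    # collapse features, keep the time axis
        out, _ = self.lstm(z)                 # long-term temporal dependencies
        return torch.softmax(self.head(out[:, -1]), dim=-1)  # label probabilities
```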
B. Problems with the ”CNN+RNN” module
a) Extreme sensitivity to initial model weight allocation:
Empirical observations show that using ”He uniform” is
suboptimal. Practitioners use it to initialize weights of the
convolutional and recurrent layers [17]. Nevertheless, both
in the case of the weight matrices and the biases, the model
”dies.” It happens early in the training process and results in
a lack of learning. A better option is to use Glorot uniform [18] for the weight matrices of the CNN and the input weight matrix of the LSTM, orthogonal initialization [19] for the recurrent weight matrix of the LSTM, and zero initialization for all biases. In Figure 8, we
show this effect. In the figure, ”default” stands for the default
initialization. The second label, ”initialization,” represents our
proposed allocation.
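A sketch of this initialization scheme (PyTorch is our choice of framework; the helper below and its application via Module.apply are assumptions, not the paper's code):

```python
# Illustrative weight-initialization sketch for the proposed allocation.
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    if isinstance(module, nn.Conv2d):
        nn.init.xavier_uniform_(module.weight)        # Glorot uniform for CNN weights
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, (nn.LSTM, nn.GRU)):
        for name, param in module.named_parameters():
            if "weight_ih" in name:
                nn.init.xavier_uniform_(param)        # Glorot uniform for input weights
            elif "weight_hh" in name:
                nn.init.orthogonal_(param)            # orthogonal for recurrent weights
            elif "bias" in name:
                nn.init.zeros_(param)                 # zero initialization for biases

# model.apply(init_weights)  # applied recursively to every sub-module
```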
b) Slow learning process at the beginning of the training:
This effect is especially noticeable with the crypto-asset data.
Compared to the benchmark dataset, FI-2010, it is a smaller
dataset. Figure 9 depicts that it takes more than 30 epochs
before proper training starts.
c) Worse depth-wise scalability: It stems from the first
two problems. Unfortunately, the original model offers worse
depth-wise scalability. An increase in depth hampers the
training process even further.
C. ResCNN+GRU
In [20], the authors propose using residual connections.
The motivation is to improve the learning process of deep
convolutional networks. Residual connections allow for better
gradient flows through the layers. Inspired by this, we introduce
blocks with residual connections into the network. Our objective
is to extend the depth of the network. We also want to
improve problems associated with gradient flows and vanishing
gradients. In Figure 3, we present a general architecture for
DeepFolio.
(Pipeline: Input → ResCNN+GRU → Label → LSTM → Dense → Output)
Fig. 3. General architecture of DeepFolio
Figure 4 depicts the structure of the used residual block. It consists of three stacked 3×1 convolutions. A leaky rectified linear unit is the activation function [21]. The leaky ReLU also serves as a shortcut connection. Our observation is that batch normalization improves the convergence speed dramatically. This aligns with similar results from [22] and [20]. However, at
the same time, it hampers the network’s ability to learn ”deeper”
patterns. Other works using deep learning for financial data
do not use batch normalization. Examples of this are [7], [9], and [23]. We assume that batch normalization might be a
”smoother.” As a consequence, it might affect deeper patterns
in the financial time-series data.
Fig. 4. Structure of a residual block
Figure 5 presents a comparison of two networks using the
same dataset. One has batch normalization, and the other does
not. The validation loss is higher with batch normalization. Hence, we do not use it in the residual blocks.
Fig. 5. Comparison of validation loss (upper) and validation accuracy (lower)
for DeepFolio with batch norms and without on FI-2010 dataset with prediction
horizon k= 1
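A minimal sketch of such a residual block without batch normalization (channel count, padding, and the leaky-ReLU slope are illustrative assumptions):

```python
# Illustrative residual block: three stacked 3x1 convolutions, leaky-ReLU
# activations, a shortcut connection, and no batch normalization.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0)), nn.LeakyReLU(0.01),
            nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0)), nn.LeakyReLU(0.01),
            nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0)),
        )
        self.act = nn.LeakyReLU(0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.body(x) + x)   # shortcut (residual) connection
```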
Fig. 6. Structure of the ”inception v2” module
In Figure 6, we use the ”inception v2” architecture. One can consider it an alternative to the canonical inception block; it was first proposed in [24]. The authors replace the 5×5 kernel with two consecutive smaller 3×3 kernels. This approach improves metrics and computational speed.
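For illustration, the replacement of one 5×5 kernel by two stacked 3×3 convolutions can be sketched as a small branch module (channel widths and activations are our assumptions):

```python
# Illustrative "inception v2" branch: two consecutive 3x3 convolutions covering
# the same receptive field as a single 5x5 kernel.
import torch
import torch.nn as nn

class InceptionV2Branch(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.LeakyReLU(0.01),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.LeakyReLU(0.01),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.branch(x)  # same receptive field as one 5x5 convolution
```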
For most tasks, the gated recurrent unit (GRU) performs on par with the LSTM. We draw this conclusion from empirical observations and a numerical comparison of GRU versus LSTM. However, GRUs offer additional benefits: they have a more straightforward structure.
Fig. 7. The network architecture of DeepFolio
Fig. 8. Training loss curves for both models with different initial weights
allocations on the FI-2010 dataset with prediction horizon k= 1
This allows them to generalize better in cases of limited data. Our
architectural choices are visible in Figure 7. We present the
full architecture of the ResCNN+GRU module of DeepFolio.
A problem of [7] is its initial weight allocation dependency.
In Figure 8, we can see that our ResCNN+GRU module solves
it. It is mostly indifferent to the weight allocations. Further, it
trains well for both cases.
Another problem of [7] is noticeable in the crypto-asset dataset. We run both models for the dataset of the crypto-asset BTC. Our prediction horizon is k = 1 to see the performance
Fig. 9. Comparison of training losses of DeepLOB and ResCNN+GRU of
DeepFolio on the Bitcoin dataset with prediction horizon k= 1
of DeepFolio. DeepLOB takes more than 30 epochs for the loss to start dropping. In contrast, our model starts training at around epoch 8-9. Visually, we confirm in Figure 9 that
the problem disappears.
D. Portfolio optimization model
The predicted labels of DeepFolio are convenient for the
development of trading strategies. However, in this work, we go a step further than price prediction and trading strategies.
Our objective is to generate investment portfolios of crypto-
assets. For this, we build a crypto-asset portfolio consisting of
4 crypto-assets. We also perform weight rebalancing every 50
minutes. Rather than strictly building portfolios with historical data, we use our predictions. Hence, we pick a rebalancing period that is less frequent than the predictive horizon of k = 1, i.e., 5 minutes. Nevertheless,
we still maintain reliable performance. To allocate portfolio
weights, the model essentially has a two-step structure. First,
we feed the input data to the LSTM network. Then, we pass the
LSTM outputs through the fully connected layer with softmax
activation. LSTMs are very efficient tools for modeling time
series and especially financial data. Our innovation is that we
use price movement labels to perform rebalancing. Traditionally,
the literature works with the price and returns history.
The training scheme follows this algorithm. First, the input for the LSTM layers with 64 units consists of
price movement indicators. These are the labels from the
ResCNN+GRU module. In our case, the period consists of 50
minutes. Second, we pass the predictions through softmax. With
this, we can get portfolio weights and use them to optimize
the objective function. Third, we run an Adam optimizer with
a learning rate of 0.001. We use this to train our network and
set the batch size to 64. Fourth, after we train the network,
we use the input predicted labels. The ResCNN+GRU module
generates them for intervals of 50 minutes. As a result, we obtain the rebalanced portfolio weights. Fifth, we move ahead to the next 50-minute interval. Again, we feed the input
with predictions from the ResCNN+GRU module and update
weights. Finally, we repeat this process for the whole test set.
In this work, we evaluate two different loss functions:
1) Maximization of the Sharpe ratio, proposed in [7]:
$$ L_{SR} = \frac{E[R]}{\mathrm{std}(R)}, \qquad E[R] = E\left[\sum_{i=1}^{n} w_{i,t-1} r_{i,t}\right], $$
where r_{i,t} = (p_{i,t} - p_{i,t-1})/p_{i,t-1} is the return of asset i and std is the standard deviation. The Sharpe ratio is essentially a form of risk-adjusted return. It assesses the ”optimality” of the portfolio; portfolios with a higher Sharpe ratio are considered more optimal.
2) Minimization of portfolio volatility (risk):
$$ L_{V} = \mathrm{std}(R). $$
It corresponds to the minimization of volatility, which is equal to reducing portfolio risks.
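A sketch of this two-step allocation module and the Sharpe-ratio objective follows; the layer sizes, the minus sign turning the maximization into a minimizable loss, and the small epsilon in the denominator are our assumptions:

```python
# Illustrative sketch of the allocation module: LSTM over predicted price-movement
# labels, a softmax layer producing portfolio weights, and a negative-Sharpe loss.
import torch
import torch.nn as nn

class AllocationNet(nn.Module):
    def __init__(self, n_assets: int = 4, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_assets, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_assets)

    def forward(self, labels: torch.Tensor) -> torch.Tensor:
        # labels: (batch, period_steps, n_assets) predicted movement indicators
        out, _ = self.lstm(labels)
        return torch.softmax(self.fc(out[:, -1]), dim=-1)   # portfolio weights sum to 1

def sharpe_loss(weights: torch.Tensor, returns: torch.Tensor) -> torch.Tensor:
    """Negative Sharpe ratio of realized portfolio returns (minimized by Adam)."""
    portfolio_returns = (weights * returns).sum(dim=-1)      # r_{i,t} weighted by w_{i,t-1}
    return -portfolio_returns.mean() / (portfolio_returns.std() + 1e-8)

# Training sketch: optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```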
V. EXPERIMENTS
We evaluate our model and compare its performance with
the state-of-the-art. Besides, we also consider two more
baseline models. They are a CNN [9] and an LSTM [10].
For DeepLOB, we follow the indications in its respective
publication strictly. To train the ResCNN+GRU module of
DeepFolio, we use an Adam optimizer. We set its learning rate at 0.01 and its ε to 1. To avoid overfitting, we apply early stopping with checkpointing, which saves the model weights each time the performance metric improves. Our
performance metrics are accuracy for FI-2010 and F1 score for
the crypto-asset. On each iteration, we seek to improve them
on the validation set. If we do not observe changes after 20
epochs, the training stops.
L2-regularization helps us tackle overfitting. It is especially relevant for the ResCNN+GRU module of DeepFolio, which can sometimes overfit the training data, e.g., the validation loss starts to grow steadily. We
suppose that this is due to the deeper architecture with more
parameters.
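The training procedure sketched from the description above (the function signature, the cross-entropy criterion on logits, and the maximum epoch count are assumptions on our part):

```python
# Illustrative training loop: Adam (lr 0.01, eps 1), checkpointing on improvement,
# and early stopping after 20 epochs without progress on the validation metric.
import copy
import torch

def train(model, loader, val_metric_fn, epochs=200, patience=20, lr=0.01):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, eps=1.0)
    criterion = torch.nn.CrossEntropyLoss()            # assumes the model returns logits
    best_metric, best_state, stale_epochs = float("-inf"), None, 0
    for epoch in range(epochs):
        model.train()
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        metric = val_metric_fn(model)                   # accuracy (FI-2010) or weighted F1
        if metric > best_metric:                        # checkpoint the best weights
            best_metric, best_state = metric, copy.deepcopy(model.state_dict())
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break                                   # early stopping
    model.load_state_dict(best_state)
    return model
```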
A. FI-2010
For the FI-2010 dataset, we divide its ten days into three parts. We use seven days for training and two days for validation. The remaining day serves as the test set.
We use 40 features from the dataset. They account for the ten
levels of ask prices, bid prices, and quantities. The last five
features are labels. Respectively, they account for the prediction horizons k = 1, 2, 3, 5, and 10. We use only k = 1, 5, 10 for
comparison. These labels represent three different horizons.
They are short-term predictions, mid-term predictions, and
long-term predictions. We also employ a sliding time window
of length T = 100 with a batch size equal to 64. The input
to the network has a size (64, 1, 100, 40). In this case, the
second dimension is an auxiliary ”channel” dimension.
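For illustration, the sliding-window construction with T = 100 can be sketched as follows (the label alignment with the most recent step is our assumption):

```python
# Illustrative sliding-window construction mirroring the (batch, 1, 100, 40) input shape.
import numpy as np

def make_windows(features: np.ndarray, labels: np.ndarray, T: int = 100):
    """features: (n_steps, 40); labels: (n_steps,). Returns windows and their labels."""
    X, y = [], []
    for t in range(T, len(features) + 1):
        X.append(features[t - T:t])          # the last T LOB snapshots
        y.append(labels[t - 1])              # label of the most recent step
    X = np.asarray(X)[:, None, :, :]         # add the auxiliary "channel" dimension
    return X, np.asarray(y)                  # X shape: (n_windows, 1, T, 40)
```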
In Table I, we see the benefits of our model. Both DeepLOB
and DeepFolio massively outperform both baseline models.
TABLE I
EXPERIMENTAL RESULTS FOR THE FI-2010 BENCHMARK DATASET WITH DIFFERENT PREDICTION HORIZONS k
Model       Accuracy %   Precision %   Recall %   F1 %
Prediction horizon k = 10
CNN         41.23        44.54         45.89      38.40
LSTM        38.31        25.12         33.32      18.63
DeepLOB     77.39        80.72         77.39      77.11
DeepFolio   79.51        82.18         79.51      79.22
Prediction horizon k = 5
CNN         58.11        50.76         55.25      50.67
LSTM        50.60        16.87         33.33      22.40
DeepLOB     74.26        77.58         74.26      73.70
DeepFolio   75.03        77.66         75.03      74.51
Prediction horizon k = 1
CNN         77.88        75.53         60.56      65.12
LSTM        66.93        22.31         33.33      26.73
DeepLOB     81.80        83.02         81.80      80.88
DeepFolio   82.44        83.98         82.44      81.29
The difference grows further as the length of the prediction horizon, k, grows. DeepFolio also outperforms DeepLOB on
all metrics. The performance gap between these two models
also grows with the length of k. The architecture of DeepFolio
captures the long-term relations in the data better.
B. Crypto-asset dataset
We consider two different cases for the crypto-asset dataset.
The first setup is a conventional one. We train a separate
network for each crypto-asset. Then, we validate and test only
on the respective crypto-assets. The second approach combines
three crypto-assets into one dataset. They are BTC, LTC, and
ETH. We do the training on this combined dataset. Separately,
we perform testing on each crypto-asset. That way, we can
assess the models’ ability to generalize. Also, we intentionally
hold out Ripple (XRP) entirely. We aim to additionally back-test
the models. We want to evaluate their generalization ability to
do transfer learning. For both approaches, we use a sliding time
window of T = 60 and a batch size of 64. For the first case, we
employ a 70-15-15 split of the datasets. Respectively, we use
70% for training and 15% for validation and test. An additional
characteristic is that the datasets are unbalanced. Hence, we
focus on the weighted F1 score to assess the performance of
the models.
In Table II, we see that both DeepLOB and DeepFolio outperform the baselines, which show worse performance by a large margin on all metrics. This becomes especially evident when we move to longer prediction horizons, where the metrics of the baseline methods drop rapidly. DeepLOB and DeepFolio also experience a decrease in metrics; nevertheless, it is not as severe as for the baseline models. Directly comparing DeepFolio and DeepLOB, we see that DeepFolio achieves superior scores across all metrics. However, the gap
between them is narrow. To better investigate the results, we provide the confusion matrices for the four prediction horizons: Figure 11 for k = 1, Figure 12 for k = 5, Figure 13 for k = 10, and Figure 14 for k = 20.
TABLE II
THE RESULTS OF EXPERIMENTS ON CRYPTO-ASSETS WITH THE FIRST SETUP FOR DIFFERENT PREDICTION HORIZONS k
Model       Accuracy %   Precision %   Recall %   F1 %
Prediction horizon k = 1
CNN         77.88        75.53         60.56      65.12
LSTM        68.68        22.89         33.33      27.14
DeepLOB     81.02        89.21         81.02      81.89
DeepFolio   84.84        89.11         84.84      84.32
Prediction horizon k = 5
CNN         76.32        38.45         42.78      40.45
LSTM        40.02        13.34         33.33      19.05
DeepLOB     64.68        69.19         64.68      65.04
DeepFolio   65.17        69.73         65.17      65.48
Prediction horizon k = 10
CNN         23.37        26.35         17.88      20.41
LSTM        13.51        17.15         9.77       11.37
DeepLOB     59.81        63.15         59.81      58.32
DeepFolio   60.28        66.60         60.28      60.87
Prediction horizon k = 20
CNN         21.60        20.27         14.98      15.91
LSTM        14.40        9.95          9.93       9.94
DeepLOB     53.09        67.33         53.09      55.63
DeepFolio   55.43        67.43         55.43      57.91
For the second setup, we split the dataset in the following
way. First, we take each crypto-asset from the (BTC, LTC, ETH)
trio separately. Then, we perform an 80-10-10 train-validation-
test split. After that, we concatenate the train parts of the
crypto-assets. With this, we form a single dataset. We repeat
the same process for the validation, while we keep the test
sets separate. The main goal of this setup is to check whether
models can extract general LOB patterns. Our inspiration is
the work of [25]. To further test the networks’ ability, we
perform transfer learning. We select the XRP crypto-asset
for this task. We feed its entire dataset into models that did not previously see the XRP data. For this setup, we
exclude baseline models. Their performance is limited, even
when dealing with individual crypto-assets. Thus, we focus primarily on DeepLOB and DeepFolio.
Table III shows that both models have strong generalizing abilities. However, DeepFolio outperforms in the majority of the cases, and the gaps this time are wider, at about 2-3% on average. The transfer learning results are also robust, which means that neural networks are indeed capable of learning general LOB patterns rather than merely adapting to the data. Overall, in both setups, DeepFolio outperforms.
C. Portfolio
To evaluate our portfolio model performance, we estimate
the portfolio value using [26] and define it as
$$ p_t = p_{t-1} \, \frac{r_t}{r_{t-1}} \cdot w_{t-1}, $$

where p_{t-1} is the portfolio value at the beginning of period t, r_t corresponds to the price vector at time t (the division is element-wise), and w_{t-1} is the portfolio weight vector at the beginning of period t. We
rebalance every 50 minutes and do not consider transaction
costs.
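The recursion above can be sketched directly (variable names and the initial portfolio value of 1.0 are our assumptions):

```python
# Illustrative sketch of the portfolio-value recursion with 50-minute rebalancing
# and no transaction costs.
import numpy as np

def portfolio_values(prices: np.ndarray, weights: np.ndarray, p0: float = 1.0) -> np.ndarray:
    """prices: (T, n_assets); weights: (T, n_assets), weights[t-1] held over period t."""
    values = [p0]
    for t in range(1, len(prices)):
        price_relatives = prices[t] / prices[t - 1]              # r_t / r_{t-1}, element-wise
        values.append(values[-1] * price_relatives @ weights[t - 1])
    return np.asarray(values)
```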
TABLE III
RESULTS FOR THE TRANSFER LEARNING SETUP FOR DIFFERENT PREDICTION HORIZONS k USING THE CRYPTO-ASSET DATASET
Model Accuracy % Precision % Recall % F1 %
Prediction horizon k = 1
DeepLOB 86.69% 91.67% 86.69% 87.44%
DeepFolio 89.9% 93.14% 89.9% 90.4%
Prediction horizon k = 5
DeepLOB 66.43% 70.04% 66.43% 66.97%
DeepFolio 66.46% 69.1% 66.46% 66.75%
Prediction horizon k = 10
DeepLOB 56.96% 71.53% 56.96% 58.39%
DeepFolio 61.57% 66.35% 61.57% 62.33%
Prediction horizon k = 20
DeepLOB 57.6% 69.29% 57.6% 57.6%
DeepFolio 59.81% 67.08% 59.81% 60.75%
Fig. 10. Cumulative returns in logarithmic scale for the various portfolio
strategies
In Figure 10, we see the various portfolio strategies. It
displays the cumulative log-returns. Here, 1/n is the naive equal-weights portfolio. Markowitz SR corresponds to the Markowitz model with the Sharpe ratio criterion, and Markowitz MV to the mean-variance variant. DeepFolio SR uses the Sharpe ratio as a loss function, whereas DeepFolio MV uses volatility instead. DeepFolio with the Sharpe ratio has the best
performance on the test dataset. Moreover, the testing period
starts around February 2020. In this period, the crisis induced
by COVID-19 hits the global markets.
Table IV presents a global comparison of results. We want
a full understanding of each method’s performance. For this,
we compare the following parameters. First, we consider the
expected and mean returns. Second, we look at the standard deviation of portfolio returns and the Sharpe ratio. Third, we have the ratio between positive and negative returns for the
test period. We can see that all reallocation strategies work
well. Nevertheless, DeepFolio with Sharpe Ratio shows the
best values for all parameters. The only exception is in the
case of the standard deviation.
VI. CONCLUSIONS AND DISCUSSION
We propose DeepFolio to address problems in the state-of-
the-art. Our model surpasses its performance on the benchmark
TABLE IV
EXPERIMENTAL RESULTS FOR DIFFERENT ALGORITHMS ON THE CRYPTO-ASSETS DATASET
Expected Return Mean Return Standard Deviation Sharpe Ratio +/-
Markowitz, SR 1.152323 0.016102 0.006193 0.025998 1.108280
Markowitz, MV 1.159850 0.016629 0.005987 0.027774 1.081761
1/n 1.281730 0.027101 0.006736 0.040234 1.130901
DeepFolio, SR 1.467931 0.040126 0.007561 0.053069 1.118012
DeepFolio, MV 1.280971 0.025986 0.006225 0.041746 1.113636
dataset. We observe similar behavior for the crypto-asset dataset, despite the latter being scarcer and favoring smaller models. We also show that DeepFolio is capable of learning general patterns in the LOB data rather than merely adapting to the data at hand; we demonstrate this through transfer learning on a previously unseen crypto-asset. We generate price movement predictions from LOBs and show that they can also be used for short-term portfolio allocation.
We equip these portfolios with rebalancing strategies. Such an approach overcomes the pitfalls of classical methods of portfolio optimization. Also, we test the model with two different loss functions: the maximization of the Sharpe ratio and the minimization of volatility. Extensive tests show that DeepFolio with the Sharpe ratio performs the best, outperforming all other approaches. Portfolio managers can use the results of this
work for a myriad of assets. For assets with high liquidity, we
expect a better performance. They are less prone to stochastic
fluctuations. In conclusion, our approach serves as a building
block for an automated portfolio building and optimization
framework.
REFERENCES
[1] I. Rosu et al., “Liquidity and information in order driven markets,” Chicago Booth School of Business, 2010.
[2] C. A. Parlour and D. J. Seppi, “Limit order markets: A survey,” Handbook of Financial Intermediation and Banking, vol. 5, pp. 63–95, 2008.
[3] J. Murphy, “Technical analysis of the futures markets, new,” 1986.
[4] C. Carrie, “The new electronic trading regime of dark books, mashups and algorithmic trading,” Trading, vol. 2006, no. 1, pp. 14–20, 2006.
[5] A. A. Ariyo, A. O. Adewumi, and C. K. Ayo, “Stock price prediction using the ARIMA model,” in 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation. IEEE, 2014, pp. 106–112.
[6] E. Zivot and J. Wang, “Vector autoregressive models for multivariate time series,” Modeling Financial Time Series with S-PLUS, pp. 385–429, 2006.
[7] Z. Zhang, S. Zohren, and S. Roberts, “DeepLOB: Deep convolutional neural networks for limit order books,” IEEE Transactions on Signal Processing, vol. 67, no. 11, pp. 3001–3012, 2019.
[8] N. Passalis, A. Tefas, J. Kanniainen, M. Gabbouj, and A. Iosifidis, “Temporal bag-of-features learning for predicting mid price movements using high frequency limit order book data,” IEEE Transactions on Emerging Topics in Computational Intelligence, 2018.
[9] A. Tsantekidis, N. Passalis, A. Tefas, J. Kanniainen, M. Gabbouj, and A. Iosifidis, “Forecasting stock prices from the limit order book using convolutional neural networks,” in 2017 IEEE 19th Conference on Business Informatics (CBI). IEEE, Jul. 2017. [Online]. Available: https://doi.org/10.1109/cbi.2017.23
[10] ——, “Using deep learning to detect price change indications in financial markets,” in 2017 25th European Signal Processing Conference (EUSIPCO), 2017.
[11] H. M. Markowitz, “Portfolio selection,” Fi, 1978.
[12] W. F. Sharpe, “Mutual fund performance,” The Journal of Business, vol. 39, no. 1, pp. 119–138, 1966.
[13] A. Ntakaris, M. Magris, J. Kanniainen, M. Gabbouj, and A. Iosifidis, “Benchmark dataset for mid-price forecasting of limit order book data with machine learning methods,” Journal of Forecasting, vol. 37, no. 8, pp. 852–866, 2018.
[14] “Official documentation for the Binance APIs and streams.” [Online]. Available: https://github.com/binance-exchange/binance-official-api-docs
[15] C. Conrad, A. Custovic, and E. Ghysels, “Long- and short-term cryptocurrency volatility components: A GARCH-MIDAS analysis,” Journal of Risk and Financial Management, vol. 11, p. 23, 2018.
[16] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1–9.
[17] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” 2015.
[18] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS 2010). Society for Artificial Intelligence and Statistics, 2010.
[19] A. M. Saxe, J. L. McClelland, and S. Ganguli, “Exact solutions to the nonlinear dynamics of learning in deep linear neural networks,” 2013.
[20] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv preprint arXiv:1512.03385, 2015.
[21] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in ICML Workshop on Deep Learning for Audio, Speech and Language Processing, 2013.
[22] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, F. Bach and D. Blei, Eds., vol. 37. Lille, France: PMLR, Jul. 2015, pp. 448–456.
[23] F. Feng, X. He, X. Wang, C. Luo, Y. Liu, and T.-S. Chua, “Temporal relational ranking for stock prediction,” ACM Transactions on Information Systems (TOIS), vol. 37, no. 2, p. 27, 2019.
[24] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” 2015.
[25] J. Sirignano and R. Cont, “Universal features of price formation in financial markets: perspectives from deep learning,” 2018.
[26] Z. Jiang, D. Xu, and J. Liang, “A deep reinforcement learning framework for the financial portfolio management problem,” arXiv e-prints, p. arXiv:1706.10059, Jun. 2017.
Fig. 11. Confusion Matrix for crypto-assets. Prediction horizon (k) equals 1. From left to right: ”BTC”, ”LTC”, ”ETH”, ”XRP”
Fig. 12. Confusion Matrix for crypto-assets. Prediction horizon (k) equals 5. From left to right: ”BTC”, ”LTC”, ”ETH”, ”XRP”
Fig. 13. Confusion Matrix for crypto-assets. Prediction horizon (k) equals 10. From left to right: ”BTC”, ”LTC”, ”ETH”, ”XRP”
Fig. 14. Confusion Matrix for crypto-assets. Prediction horizon (k) equals 20. From left to right: ”BTC”, ”LTC”, ”ETH”, ”XRP”