In this paper, a novel algorithm is developed for electronic trading on financial time series. The new method uses quantization and volatility information together with FeedForward Neural Networks (FFNN) for achieving High Frequency Trading (HFT). The proposed procedures are based on estimating the Forward Conditional Probability Distribution (FCPD) of the quantized return values. From past samples, the Conditional Expected Value (CEV) can be learned, from which FCPD can be obtained by using a special encoding scheme. Based on this estimation, a trading signal is triggered if the probability of price change becomes significant as measured by a quadratic criterion. Due to the encoding scheme and quantization, the complexity of learning and estimation has been reduced for HFT. Extensive numerical analysis has been performed on financial time series and the new method has proven to be profitable on mid-prices. In order to beat the secondary effects, we focus on the most liquid assets, on which we managed to achieve positive profits.
Trading by estimating the quantized forward distribution
Attila Ceffer, Norbert Fogarasi and Janos Levendovszky
Department of Networked Systems and Services, Budapest University of Technology and
Compiled January 29, 2018
1. Introduction
The selection of portfolios which are optimal in terms of risk-adjusted returns has
been an intensive area of research in recent decades (Anagnostopoulos and Mamanis
2011). Furthermore, the main focus of portfolio optimization tends to move towards
the application of High Frequency Trading (HFT) when a huge amount of financial
data is taken into account within a very short time interval and trading with the opti-
mized portfolio is also to be performed at high frequency within these intervals (Chan
2013). HFT presents a challenge to both algorithmic and architectural development,
because of the need for developing algorithms running fast on specific architectures
(e.g. GPGPU, FPGA chipsets) where speed is the most important attribute. On the
other hand, profitable portfolio optimization and trading needs the evaluation of rather
complex goal functions with different constraints which sometimes cast the problem
in the NP hard domain (d’Aspremont 2007; Sipos and Levendovszky 2013; Fogarasi
and Levendovszky 2013). As a result, the computational paradigms emerging from
the field of neural computing, which support fast parallel implementation, are often
used in the field of algorithmic trading (Kaastra and Boyd 1996; Saad, Prokhorov, and
Wunsch 1998; Levendovszky and Kia 2013).
CONTACT Attila Ceffer. Email:
In this paper, trading is done based on the estimated Forward Conditional Prob-
ability Distribution (FCPD). Parts of this work have already been presented at the
Financial Markets and Nonlinear Dynamics conference in Paris 2017 (Ceffer and
Levendovszky 2017). However, this paper is a considerable extension of those results with
respect to time series autocorrelation analysis, trading with portfolios and other
enhancements in the formalism.
Since FCPD takes its values on the possible asset prices (or return values), the
number of probabilities to be estimated explodes exponentially with respect to the
length of the memory. As a result, in the case of numerous return values and for the
sake of accurate estimations, we need a very large training set, which hinders HFT
due to the low speed of learning. In order to speed up data collection and learning
(using a small number of samples), we need to quantize the asset prices. We quantize
the change of the prices (returns), which varies in a smaller interval. In the paper, we
use the Lloyd-Max algorithm for quantization to attain a good trading performance.
After quantizing the returns, we train a FFNN to estimate the FCPD and this can
then provide the necessary trading signal.
In order to beat the bid-ask spread, we have introduced further adjustments in the
algorithm by considering the estimated volatility of the asset and by choosing more
liquid assets for trading. With these adjustments, we could secure positive profit in
the presence of bid-ask spread on EUR/USD currency exchange rates.
The material summarized above is organized as follows:
in Section 2, the theoretical background of trading by FFNN is outlined;
in Section 3, encoding schemes are introduced to obtain FCPD;
in Section 4, the trading strategy is defined and explained;
in Section 5, the computational model of the trading algorithm is mapped out;
in Section 6, we validate the methodology on real historical financial time series.
2. Theoretical background - trading with FFNN
Let us assume that $x(t)$ is the return of a financial instrument (e.g. foreign exchange
rate) at time instant $t$. Based on historical observations of the financial instrument
values, one can construct a training set containing some samples followed by the
observed forward return, given as follows:
$$\tau^{(K)} = \{(\mathbf{x}_k, x(k+1)),\ k = L, \dots, K-1\},$$
where $K$ is the number of samples available for training, $L$ is the memory (time lag) of the
process and $\mathbf{x}_k = (x(k-L+1), \dots, x(k))$ are observed samples.
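The construction of the training set can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `build_training_set` and the sample values are hypothetical, and indices are 0-based, so the first usable window ends at position L-1 rather than L.

```python
from typing import List, Tuple


def build_training_set(x: List[float], L: int) -> List[Tuple[Tuple[float, ...], float]]:
    """Pair each window of L consecutive returns with the return that follows it."""
    pairs = []
    # k is the 0-based index of the most recent return inside the window
    for k in range(L - 1, len(x) - 1):
        window = tuple(x[k - L + 1 : k + 1])  # (x(k-L+1), ..., x(k))
        pairs.append((window, x[k + 1]))      # target is the forward return x(k+1)
    return pairs


# Hypothetical return series, used only to show the shape of the output
returns = [0.1, -0.2, 0.3, 0.0, -0.1]
training_set = build_training_set(returns, L=3)
```

With five observations and L = 3, only two pairs can be formed, which shows how a longer memory shrinks the usable training set.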
There are several other adaptive architectures used for prediction, however FFNNs
are proven to exhibit universal approximator capabilities (Hornik, Stinchcombe, and
White 1989). As a result, after learning, FFNNs can capture the CEV which is the
optimal solution of the nonlinear regression problem in mean square (Doob 1953).
Let us then construct a FFNN based predictor
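$$\tilde{x}(t+1) = \varphi\Big(\sum_{i} w_{i}\,\varphi\Big(\sum_{j} w_{ij} \cdots \varphi\Big(\sum_{n} w_{mn}\, x_n\Big)\Big)\Big) = Net(\Theta, \mathbf{x}_t), \qquad (1)$$
where $\mathbf{x}_t = (x(t-L+1), \dots, x(t))$.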
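After learning, we obtain
$$\Theta^{(K)}_{opt} : \min_{\Theta} \frac{1}{K} \sum_{k=L}^{K-1} \big(x(k+1) - Net(\Theta, \mathbf{x}_k)\big)^2, \qquad (2)$$
where
$$\lim_{K \to \infty} \frac{1}{K} \sum_{k=L}^{K-1} \big(x(k+1) - Net(\Theta, \mathbf{x}_k)\big)^2 = E\big(x(t+1) - Net(\Theta, \mathbf{x}_t)\big)^2 \qquad (3)$$
as shown in (Doob 1953) and
$$\min_{\Theta} E\big(x(t+1) - Net(\Theta, \mathbf{x}_t)\big)^2 \ \Rightarrow\ Net(\Theta, \mathbf{x}_t) = E\big(x(t+1) \mid \mathbf{x}_t\big), \qquad (4)$$
and the FFNN will provide the optimal non-linear prediction of the CEV
$$Net(\Theta, \mathbf{x}_t) = E\big(x(t+1) \mid \mathbf{x}_t\big), \qquad (5)$$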
(for further details see (Hornik, Stinchcombe, and White 1989; Haykin 1998; Funahashi
3. Coding scheme to obtain FCPD from CEV
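In order to obtain the FCPD of the asset, let us encode the possible values of the
return into an orthonormal vector set:
$$r^{(l)}_i = \delta_{li} = \begin{cases} 1 & \text{if } i = l \\ 0 & \text{otherwise} \end{cases}$$
and rewrite the training set according to the encoding mechanism:
$$\tau^{(K)} = \{(\mathbf{r}_{k+1}, \mathbf{x}_k),\ k = 1, \dots, K\},$$
where $\mathbf{r}_{k+1} = \mathbf{r}^{(l)}$ if $x(k+1) = q_l$. Then by minimizing the error function
$$\frac{1}{K} \sum_{k=1}^{K} \|\mathbf{r}_{k+1} - Net(\Theta, \mathbf{x}_k)\|^2 \approx E\|\mathbf{r} - Net(\Theta, \mathbf{x})\|^2, \qquad (6)$$
one will obtain $Net(\Theta^{(K)}_{opt}, \mathbf{x}) = E(\mathbf{r} \mid \mathbf{x})$, where due to the encoding, component $l$ of
the conditional expected value will yield the corresponding conditional probability as
$$E_l(\mathbf{r} \mid \mathbf{x}) = \sum_{i=1}^{M} r^{(i)}_l\, P(\mathbf{r}^{(i)} \mid \mathbf{x}) = P(\mathbf{r}^{(l)} \mid \mathbf{x}) = P(x(t+1) = q_l \mid \mathbf{x}). \qquad (7)$$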
Figure 1. The neural architecture to estimate FCPD (block labels: FeedForward Neural Network, Forward Conditional Probability Distribution).
This allows the construction of a FFNN for the estimation of the FCPD. The proposed
architecture is shown in Figure 1.
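The effect of the coding scheme can be checked on a toy case. The sketch below is not the FFNN of the paper: it swaps in a lookup table with one free output vector per observed history, for which the mean-square-optimal output is simply the average of the one-hot targets. The function names, the 0-based labels and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

History = Tuple[int, ...]


def one_hot(label: int, M: int) -> List[float]:
    """Encode a quantization label in {0, ..., M-1} as a unit vector."""
    return [1.0 if i == label else 0.0 for i in range(M)]


def conditional_mean_of_codes(
    samples: Sequence[Tuple[History, int]], M: int
) -> Dict[History, List[float]]:
    """Average the one-hot targets per history: the least-squares optimum of a lookup table."""
    sums: Dict[History, List[float]] = defaultdict(lambda: [0.0] * M)
    counts: Dict[History, int] = defaultdict(int)
    for history, next_label in samples:
        code = one_hot(next_label, M)
        for i in range(M):
            sums[history][i] += code[i]
        counts[history] += 1
    # Component l of the mean equals the relative frequency of label l after this history
    return {h: [s / counts[h] for s in sums[h]] for h in sums}


# Hypothetical (history, next label) pairs with memory L = 1 and M = 3 levels
samples = [((0,), 1), ((0,), 1), ((0,), 2), ((0,), 0), ((1,), 2)]
fcpd = conditional_mean_of_codes(samples, M=3)
```

Each output vector sums to one and its components are the empirical conditional probabilities, which is what a trained network approximates when the number of histories is too large for a table.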
4. The trading algorithm
Having obtained the FCPD, we can now turn our attention to developing a trading
strategy which utilizes it. However, first we observe that the method outlined above re-
quires a high complexity neural network as the dimension of the output y=N et (Θ,x)
is dim(y) = Mwhich is the number of possible returns. Unfortunately, a high com-
plexity FFNN with many outputs contains a large number of free parameters which,
in turn, requires a large learning set to train. This will prevent fast execution of the
strategy and, as a result, hinders the ability of the method to be used in HFT. Thus,
the present effort in this section is focused on decreasing the number of outputs, which
will also reduce the number of free parameters to be optimized. In order to achieve this,
we quantize the time series.
Let us define a quantization of the returns as in (Lloyd 2006). Let $\{Q_1, Q_2, \dots, Q_M\}$
be a class of sets (intervals) and $\{q_1, q_2, \dots, q_M\}$ be a corresponding set of quanta. We
associate with a partition $\{Q_i\}$ a label function $\gamma(x)$ defined for all real values $x$ such
that
$$\gamma(x) = i \ \text{ if } x \text{ lies in } Q_i. \qquad (8)$$
Define $\hat{r}_i(t) := \gamma(r_i(t))$ and $\hat{x}(t) := \gamma(x(t))$, the labels of the asset returns and
portfolio return under the given quantization. Assuming we have $L$ past observations
of $x$, we can define the conditional probability function
$$P_{\mathbf{x}}(i,t) := P\big(\hat{x}(t+1) = i \mid \hat{x}(t) = x_1, \dots, \hat{x}(t-L+1) = x_L\big), \quad i = 1, \dots, M, \qquad (9)$$
where history vector $\mathbf{x} = (x_1, \dots, x_L)$ contains the quantization labels of historical
observations of $x$. Let us define a constant $\varepsilon > 0$ corresponding to frictional transaction
costs of long and short positions and let $j_U := \gamma(\varepsilon)$ be the upper and $j_L := \gamma(-\varepsilon)$ be
the lower tolerance label and define $\delta > 0$ as the minimum trading probability limit.
We can now define a trading strategy using FCPD as follows:
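Algorithm 1 Trading algorithm
while $t < T$ do
    if not InstrumentAtHand and $\sum_{i \ge j_U} P_{\mathbf{x}}(i,t) - \sum_{i \le j_L} P_{\mathbf{x}}(i,t) > \delta$ then
        Buy the instrument
    if InstrumentAtHand and $\sum_{i \ge j_U} P_{\mathbf{x}}(i,t) < \sum_{i \le j_L} P_{\mathbf{x}}(i,t)$ then
        Sell the instrument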
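The decision rule of Algorithm 1 can be sketched in a few lines. The sketch assumes 0-based labels, assumes that the upward probability mass covers labels at or above the upper tolerance label and the downward mass covers labels at or below the lower one, and uses hypothetical names (`trade_signals`, `j_lower`, `j_upper`) and made-up probabilities.

```python
from typing import List, Sequence


def trade_signals(
    fcpd: Sequence[Sequence[float]], j_lower: int, j_upper: int, delta: float
) -> List[str]:
    """Return one action per time step; fcpd[t][i] is the estimated probability of label i at t+1."""
    holding = False
    actions = []
    for probs in fcpd:
        p_up = sum(probs[j_upper:])          # mass on labels >= upper tolerance label
        p_down = sum(probs[: j_lower + 1])   # mass on labels <= lower tolerance label
        if not holding and p_up - p_down > delta:
            holding = True
            actions.append("buy")
        elif holding and p_up < p_down:
            # a buy in the same step implies p_up > p_down, so elif matches two separate tests
            holding = False
            actions.append("sell")
        else:
            actions.append("hold")
    return actions


# Hypothetical forward distributions over M = 5 labels
example_fcpd = [
    [0.1, 0.1, 0.2, 0.3, 0.3],
    [0.2, 0.2, 0.2, 0.2, 0.2],
    [0.3, 0.3, 0.2, 0.1, 0.1],
    [0.2, 0.2, 0.2, 0.2, 0.2],
]
actions = trade_signals(example_fcpd, j_lower=1, j_upper=3, delta=0.2)
```

The asymmetry is deliberate in this reading of the algorithm: entering requires the upward mass to exceed the downward mass by the margin delta, while exiting only requires the balance to turn negative.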
The returns of financial time series can be approximated by Gaussian random variables.
However, equidistant quantization is only optimal for samples following a uniform
distribution. In order to overcome this shortcoming, the expected value of the squared
quantization error (i.e. the squared difference between original and quantized signals)
can be reduced by applying non-equidistant quantization. By allowing a larger
quantization error for values which occur less frequently and a smaller error for values
which occur more often, the overall error can be made smaller (Roe 2006; Lloyd 2006). In this way,
one can obtain a more accurate estimation of the FCPD, which may yield better trading
decisions. To determine the optimal quantization levels, we used the Lloyd-Max
algorithm.
The Lloyd-Max algorithm:
(1) use an initial set of representative levels: $q_i$, $i = 1, 2, \dots, M$
(2) assign each sample $x(t)$ in training set $\tau^{(K)}$ to the closest representative $q_i$:
$C_i = \{x \in \tau^{(K)} : Q(x) = i\}$, $i = 1, 2, \dots, M$
(3) calculate new representative levels:
$q_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$, $i = 1, 2, \dots, M$
(4) repeat 2. and 3. until no further distortion reduction (or applying a stopping
criterion)
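The four steps above can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: the function names, the tolerance-based stopping rule, the handling of empty cells and the sample data are assumptions.

```python
from typing import List, Sequence


def nearest_level(x: float, q: Sequence[float]) -> int:
    """Index of the representative level closest to x (ties go to the lower index)."""
    return min(range(len(q)), key=lambda i: (x - q[i]) ** 2)


def lloyd_max(
    samples: Sequence[float], levels: Sequence[float], max_iter: int = 100, tol: float = 1e-12
) -> List[float]:
    """Iterate assignment and centroid steps until the distortion stops decreasing."""
    q = sorted(levels)                                   # step 1: initial representative levels
    prev_distortion = float("inf")
    for _ in range(max_iter):
        cells: List[List[float]] = [[] for _ in q]
        for x in samples:                                # step 2: assign to closest level
            cells[nearest_level(x, q)].append(x)
        # step 3: move each level to the mean of its cell (an empty cell keeps its level)
        q = [sum(c) / len(c) if c else q[i] for i, c in enumerate(cells)]
        distortion = sum((x - q[i]) ** 2 for i, c in enumerate(cells) for x in c) / len(samples)
        if prev_distortion - distortion <= tol:          # step 4: stop when nothing is gained
            break
        prev_distortion = distortion
    return q


# Hypothetical samples forming two clusters, quantized with M = 2 levels
levels = lloyd_max([0.0, 0.1, 0.2, 1.0, 1.1, 1.2], levels=[0.0, 0.5])
```

Starting from the levels 0.0 and 0.5, the second level moves to the centre of the upper cluster after a single pass, and the next pass confirms that the distortion no longer decreases.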
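Our simulations have shown that by running the Lloyd-Max algorithm, the quantization
error drops to one tenth of that obtained with equidistant quantization, due to
minimizing the objective function (the mean squared quantization error)
$$\frac{1}{K} \sum_{i=1}^{M} \sum_{x \in C_i} (x - q_i)^2.$$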
4.1. Trading in high volatility periods
Based on the predicted standard deviation we can improve the trading efficiency. Each
time the neural network gives an entry signal (either long or short), we calculate the
standard deviation from the forward conditional distribution as
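$$\sigma(t) = \sqrt{\sum_{i=1}^{M} P_{\mathbf{x}}(i,t)\, q_i^2 - \Big(\sum_{i=1}^{M} P_{\mathbf{x}}(i,t)\, q_i\Big)^2}$$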
If the standard deviation reaches a given threshold (high volatility), we enter into
the trade, otherwise (low volatility), we stay away from the market. The modified
trading algorithm is shown in Algorithm 2.
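Algorithm 2 Trading algorithm with volatility filter
while $t < T$ do
    if not InstrumentAtHand and $\sum_{i \ge j_U} P_{\mathbf{x}}(i,t) - \sum_{i \le j_L} P_{\mathbf{x}}(i,t) > \delta$ and $\sigma(t) > \eta$ then
        Buy the instrument
    if InstrumentAtHand and $\sum_{i \ge j_U} P_{\mathbf{x}}(i,t) < \sum_{i \le j_L} P_{\mathbf{x}}(i,t)$ then
        Sell the instrument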
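The volatility filter can be sketched as a function of the estimated forward distribution and the quantization levels. The sketch assumes that the standard deviation is taken over the quantum values weighted by their estimated probabilities; the function names and the numbers are hypothetical.

```python
import math
from typing import Sequence


def predicted_std(probs: Sequence[float], levels: Sequence[float]) -> float:
    """Standard deviation of the quantized forward return under the estimated distribution."""
    mean = sum(p * q for p, q in zip(probs, levels))
    second_moment = sum(p * q * q for p, q in zip(probs, levels))
    # clamp at zero to absorb tiny negative values caused by rounding
    return math.sqrt(max(second_moment - mean * mean, 0.0))


def passes_volatility_filter(probs: Sequence[float], levels: Sequence[float], eta: float) -> bool:
    """Entry signals are kept only if the predicted standard deviation exceeds eta."""
    return predicted_std(probs, levels) > eta


# Hypothetical levels and two forward distributions: one spread out, one concentrated
levels = [-1.0, 0.0, 1.0]
sigma_wide = predicted_std([0.5, 0.0, 0.5], levels)
sigma_flat = predicted_std([0.0, 1.0, 0.0], levels)
```

A distribution concentrated on the middle level gives zero predicted volatility and is filtered out for any positive threshold, whereas mass on the outer levels passes the filter.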
4.2. Determining the memory of the process (time lag)
Since our main concern is to decrease the complexity of FFNN used for estimating
FCPD, we also would like to minimize the number of inputs. However, the number of
inputs is determined by the memory of the random process to be predicted. This needs
the estimation of the "model degree" or "memory" of the predicted process in terms
of estimating the number of past values used for prediction. The process memory is
determined based on the autocorrelation measured in the financial time series.
In order to determine the autocorrelation of the input financial time series, we
performed the following off-line analysis. Having loaded the entire training data set
and computed the returns as described in equation (11), we examined the plot of the
return time series. We concluded that it appears to fluctuate around a constant mean,
thus no further transformation is necessary for the autocorrelation analysis.
We then plotted the sample autocorrelation function (ACF) of the computed return
data for time lags 0 to 20 (note that the autocorrelation measure is normalized so that
time lag 0 has autocorrelation of 1). We also plotted approximate upper and lower
confidence bounds (horizontal lines) under the hypothesis that the underlying process is
Gaussian white noise (Box, Reinsel, and Jenkins 1994). The results for the
minute by minute foreign exchange rate time series are shown in Figure 2. We observe
that the sample ACF has significant autocorrelation for lags 1, 2 and 3, but drops off
for larger time lags. Therefore a process memory parameter of L= 3 is a good choice
for these data sets.
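The off-line analysis described above can be sketched with the standard sample autocorrelation estimator. Several choices here are assumptions and not taken from the analysis itself: the bound of 1.96 divided by the square root of the sample size (an approximate 95% level for white noise), the rule of taking the last lag in the initial run of significant lags, and all function names.

```python
import math
from typing import List, Sequence


def sample_acf(x: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelation for lags 0..max_lag, normalized so that lag 0 equals 1."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / c0
        for lag in range(max_lag + 1)
    ]


def white_noise_bound(n: int, z: float = 1.96) -> float:
    """Approximate confidence bound for the ACF of Gaussian white noise of length n."""
    return z / math.sqrt(n)


def choose_memory(x: Sequence[float], max_lag: int) -> int:
    """Largest lag L such that lags 1..L all exceed the white-noise bound in magnitude."""
    acf = sample_acf(x, max_lag)
    bound = white_noise_bound(len(x))
    memory = 0
    for lag in range(1, max_lag + 1):
        if abs(acf[lag]) <= bound:
            break
        memory = lag
    return memory


# Hypothetical, strongly autocorrelated series used only to exercise the functions
alternating = [1.0, -1.0] * 50
acf = sample_acf(alternating, max_lag=5)
```

For real return data the selected memory would be read off in the same way as from the plotted ACF, by finding where the bars fall back inside the confidence band.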
4.3. Portfolio optimization
In this section, we explain how to select a portfolio from a universe of assets which is
optimal for trading.
Figure 2. Sample autocorrelation of AUD/USD, EUR/USD, GBP/USD, NZD/USD minute by minute close
price return data as a function of the time lag. (Four panels: EUR/USD, GBP/USD, AUD/USD and
NZD/USD Sample Autocorrelation Function; horizontal axis: time lag from 0 to 20; vertical axis:
Sample Autocorrelation.)
Let us assume that there is a vector valued random asset price process (e.g. the
values of currency foreign exchange rates), which is denoted by $\mathbf{s}(t) = (s_1(t), \dots, s_n(t))$
for $n \ge 1$ assets. The return series of $\mathbf{s}(t)$ is defined as
$$r_i(t) = \frac{s_i(t)}{s_i(t-1)} - 1. \qquad (11)$$
A portfolio of assets is defined by a portfolio vector $\mathbf{w}(t) = (w_1(t), \dots, w_n(t))$ which
yields a linear combination of asset values at time $t$:
$$p(t) := \sum_{i=1}^{n} w_i(t)\, s_i(t) = \mathbf{w}(t)^T \mathbf{s}(t), \qquad (12)$$
and implies a return of the portfolio from time $t-1$ to time $t$:
$$x(t) := \sum_{i=1}^{n} w_i(t)\, r_i(t) = \mathbf{w}(t)^T \mathbf{r}(t). \qquad (13)$$
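As a small worked illustration of equations (11) and (13), the sketch below computes simple returns from a price series and combines the returns of two assets with a weight vector. The function names and all numbers are hypothetical.

```python
from typing import List, Sequence


def asset_returns(prices: Sequence[float]) -> List[float]:
    """Simple returns r(t) = s(t) / s(t-1) - 1 for t = 1, ..., len(prices) - 1."""
    return [prices[t] / prices[t - 1] - 1.0 for t in range(1, len(prices))]


def portfolio_return(weights: Sequence[float], returns_t: Sequence[float]) -> float:
    """Portfolio return at one time step as the weighted sum of the asset returns."""
    return sum(w * r for w, r in zip(weights, returns_t))


# Hypothetical prices of one asset and a hypothetical two-asset portfolio
r = asset_returns([100.0, 110.0, 99.0])
x_t = portfolio_return([0.5, 0.5], [0.1, -0.1])
```

Here the price first rises by 10% and then falls by 10%, and an equally weighted portfolio of two assets with opposite returns has zero return at that time step.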
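Our approach will be to select a portfolio which optimizes the following objective:
$$\mathbf{w}_{opt} : \max_{\mathbf{w}} \Big( \sum_{i \ge j_U} P_{\mathbf{x}}(i,t) - \sum_{i \le j_L} P_{\mathbf{x}}(i,t) \Big). \qquad (14)$$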
Given that Px(i, t), as defined in equation 9, is estimated for each portfolio at each
time step using a high-dimensional FFNN, this can be considered a High-dimensional,
Expensive (computationally), Black-box (HEB) optimization problem. There are a
number of different ways to deal with such problems; a good survey of the different
methods is given in (Shan and Wang 2010). In Section 5, we will explain how we have tackled
this complex problem.
5. Computational approach to modelling and optimization
Our computational framework is shown in the block diagram of Figure 3 and detailed
below:
(1) In the case of a single asset, compute the returns from the historical time series;
(2) In the case of multiple time series, construct a starting portfolio for the
optimization and compute its returns from the historical time series;
(3) Quantize the return series of the asset;
(4) Fit a FFNN to the time series of the asset by using the coding scheme as detailed
in Section 3;
(5) Evaluate the objective function by estimating the FCPD according to the
identified model given the portfolio;
(6) In the case of portfolio selection, continue the numerical optimization process
until the optimal portfolio is obtained according to the objective function;
(7) Form a trading signal based on the price behaviour of the asset as per Algorithm
2 to decide which trading action is to be performed.
Figure 3. Computational approach (block diagram; legible block labels: Single asset time series,
Quantization, Train FFNN, Estimate, Trading, Results).
Finally, one can carry out a performance analysis by testing and evaluating various
numerical indicators for the sake of comparing the profitability of the different methods
(Section 6 contains further details).
In the absence of analytical solutions for the constrained optimization problem posed
in (14), we use simulated annealing (SA) to obtain good quality heuristic solutions.
SA is a stochastic search algorithm for finding the global optimum in a large search
space; it is fully described in the related papers (Kirkpatrick, Gelatt, and Vecchi 1983; Salamon,
Frost, and Sibani 2002). There are of course several other stochastic search algorithms
(for example genetic algorithms, tabu search, GRASP and pattern search) which could also
be used, but SA has been successfully applied to similar problems (Armananzas and
Lozano 2005; Fogarasi and Levendovszky 2013; Sipos and Levendovszky 2013). At
each step of the algorithm, we consider a neighbouring state $\mathbf{w}'$ of the current state
$\mathbf{w}$ and probabilistically decide between moving the system to $\mathbf{w}'$ or staying at $\mathbf{w}$. The
transition probability depends on a temperature parameter $T$ which is decreased during the
procedure (also referred to as cooling). Convergence to the globally optimal
solution has been proven as long as the cooling schedule is sufficiently slow (Geman
and Geman 1984).
In our application, the negative energy function J(w) is equivalent to the objective
function defined as follows:
$$J(\mathbf{w}) = \max \Big( \sum_{i \ge j_U} P_{\mathbf{x}}(i,t) - \sum_{i \le j_L} P_{\mathbf{x}}(i,t) \Big).$$
Heuristic search procedures that aspire to find global optimal solutions to hard
combinatorial optimization problems usually require some type of diversification to
overcome local optimality. One way to achieve diversification is to re-start the search
from a new solution once a region has been extensively explored. This strategy is
referred to as the multi-start method (Martí 2003). A detailed analysis of the various
types of multi-start strategies can be found in (Martí, Resende, and Ribeiro 2013).
We perform parallel and independent simulated annealing in a given number of randomly
selected subspaces, following the treatment of (Ram, Sreenivas, and Subramaniam
1996). More precisely, we use closed orthants in $\mathbb{R}^l$, constraining each coordinate
of the subspace to be nonnegative or nonpositive. This is motivated by our investiga-
tion which has shown that SA converges faster and provides more reliable results in
these regions. However, an exhaustive search in every subspace is computationally not
feasible in case of a larger number of assets.
The neighbour function in each iteration generates a new portfolio on the L1-ball,
the distance of which from the previous one depends on the current temperature $T$.
In each of the selected subspaces let $\mathbf{w}$ be an arbitrary initialization vector, and a new
vector $\mathbf{w}'$ is generated randomly subject to the aforementioned neighbour function.
We automatically accept the new vector if $J(\mathbf{w}') < J(\mathbf{w})$. In case $J(\mathbf{w}') \ge J(\mathbf{w})$, we
apply random acceptance with probability $e^{-\frac{J(\mathbf{w}') - J(\mathbf{w})}{T}}$. The sampling is then continued
while decreasing parameter $T$ to zero. Finally, the last state vectors obtained in the
corresponding subspaces are compared, and the one minimizing $J(\mathbf{w})$ is the identified
optimal portfolio vector.
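The multi-start annealing procedure can be sketched as follows. The sketch follows the minimizing convention of the acceptance rule described above and makes several assumptions that are not taken from the text: a Gaussian perturbation whose scale equals the temperature, renormalization to unit L1 norm, a geometric cooling schedule, a fixed random seed and a toy quadratic function in place of the FFNN-based objective. All names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def neighbour(w: Vector, signs: Sequence[int], T: float, rng: random.Random) -> Vector:
    """Perturb w with a step scaled by T, staying in the orthant given by signs (+1 or -1)."""
    cand = [s * abs(v + rng.gauss(0.0, T)) for v, s in zip(w, signs)]
    norm = sum(abs(v) for v in cand)
    return [v / norm for v in cand] if norm > 0 else list(w)  # unit L1 norm


def anneal(J: Callable[[Vector], float], signs: Sequence[int], T0: float = 1.0,
           cooling: float = 0.95, steps: int = 200, seed: int = 0) -> Tuple[Vector, float]:
    """Simulated annealing inside one closed orthant; returns the last state and its value."""
    rng = random.Random(seed)
    w = [s / len(signs) for s in signs]
    current, T = J(w), T0
    for _ in range(steps):
        cand = neighbour(w, signs, T, rng)
        value = J(cand)
        # always accept improvements; accept a worse state with probability exp(-(increase)/T)
        if value < current or rng.random() < math.exp(-(value - current) / T):
            w, current = cand, value
        T *= cooling  # geometric cooling towards zero
    return w, current


def multi_start(J: Callable[[Vector], float], orthants: Sequence[Sequence[int]]) -> Tuple[Vector, float]:
    """Run independent annealing in each orthant and keep the state with the smallest J."""
    return min((anneal(J, signs) for signs in orthants), key=lambda result: result[1])


def toy_J(w: Vector) -> float:
    """Hypothetical stand-in for the real objective, used only to exercise the search."""
    return (w[0] - 0.7) ** 2 + (w[1] + 0.3) ** 2


orthants = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
best_w, best_J = multi_start(toy_J, orthants)
```

In the real setting each evaluation of the objective requires fitting and querying the network, so the number of orthants and annealing steps would be limited by that cost rather than by the search itself.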
6. Performance analysis
An extensive back-testing framework has been created to handle trading actions on
various input data and provide numerical results for the sake of comparing different
methods on different time series (either historical foreign exchange rates or artificially
generated data).
At first, we investigate the estimation performance of the proposed model on gen-
erated data. For the sake of comparison, we used several FFNNs, each with different
number of neurons in the hidden layer. In order to ensure the universal approximator
capability, we used sigmoid activation function in the hidden layer, and linear in the
output layer. To prevent overfitting, we used the "early stopping" method. The results
showed that FFNN can successfully predict the forward distribution; furthermore, in
some cases it is more accurate than the standard histogram method (simply calculating
the relative frequencies). Table 1 below shows the out-of-sample mean squared
error between predicted and real FCPD. Due to the use of early stopping, in-sample
MSE is similar to out-of-sample MSE.
MSE          L=1        L=2        L=3        L=4
N=1          0.00296    0.0089     0.00153    0.00425
N=3          0.00741    0.00867    0.00162    0.00249
N=5          0.00624    0.00697    0.00157    0.00201
N=8          0.00624    0.00613    0.00144    0.00194
N=10         0.00624    0.00486    0.00174    0.00188
N=20         0.00624    0.00412    0.00178    0.00181
N=30         0.00624    0.00412    0.00189    0.00196
N=50         0.00624    0.00412    0.00224    0.00215
N=100        0.00624    0.00412    0.00224    0.00215
N=200        0.00624    0.00412    0.00224    0.00215
Histogram    0.00624    0.00412    0.00224    0.00215
Table 1. Mean squared error of the FCPD as a function of the number of neurons (N) and the time lag (L)
on generated data
By extensive numerical simulations, we found that the optimal number of neurons
in the hidden layer is 10, because it is a good tradeoff between the quality and training
time. Adding more neurons does not improve the approximation quality measured by
the mean squared error, but it increases training time significantly (Table 1).
For a detailed comparative analysis, the following performance measures were cal-
culated for each experiment on the corresponding time series:
Profit gained, that is the money realized by the agent on top of the 10,000 USD
starting balance;
Maximum drawdown, that is the maximum loss from a peak to a trough of the
Number of trades, that is the number of trades the agent made in the evaluation
Winning rate, that is the ratio of the total number of winning trades to the
number of all trades;
Average trade duration, that is the average holding period of an asset or portfolio
in minutes.
Average profit per trade (in points, which is the smallest possible price change),
that is the amount realized by the agent divided by the number of trades.
Sharpe ratio, that is the excess return divided by the standard deviation of
an asset or portfolio, assuming the risk free rate is 0%. This measure is not
annualized, but reflects the Sharpe ratio for the return per minute.
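Some of the measures listed above can be computed as sketched below. The sketch makes assumptions beyond the definitions given here: drawdown is expressed relative to the running peak of the balance, a trade with exactly zero profit does not count as winning, and the Sharpe ratio uses the population standard deviation. The function names and numbers are hypothetical.

```python
import statistics
from typing import Sequence


def max_drawdown(balance: Sequence[float]) -> float:
    """Largest relative loss from a running peak to a later trough of the balance."""
    peak, worst = balance[0], 0.0
    for b in balance:
        peak = max(peak, b)
        worst = max(worst, (peak - b) / peak)
    return worst


def winning_rate(trade_profits: Sequence[float]) -> float:
    """Ratio of the number of winning trades to the number of all trades."""
    return sum(1 for p in trade_profits if p > 0) / len(trade_profits)


def sharpe_ratio(returns: Sequence[float]) -> float:
    """Mean return divided by its standard deviation, with a 0% risk free rate, not annualized."""
    return statistics.mean(returns) / statistics.pstdev(returns)


# Hypothetical balance path, trade profits and per-minute returns
dd = max_drawdown([100.0, 120.0, 90.0, 130.0])
wr = winning_rate([1.0, -1.0, 2.0, 0.0])
sr = sharpe_ratio([0.01, 0.03])
```

In the example the balance falls from a peak of 120 to 90, giving a drawdown of 25% even though the final balance is a new high.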
In this section, we show the numerical results obtained on the following foreign
exchange data sets: EUR/USD, GBP/USD, AUD/USD and NZD/USD
minute by minute data from 2016.01.01 to 2016.12.31, including bid and ask prices in
order to take transaction costs into account. In each case, the length of memory of the
neural network was $L = 3$ and we used the last 5 days of data observations from the
past to fit FFNNs, while we retrained the network after 1 day of data. Based on Table
1 we set the number of neurons to $N = 10$ and the number of quantization levels to
$M = 5$ in each simulation.
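The retraining schedule described above (fit on the last 5 days, trade for 1 day, then refit) corresponds to a rolling walk-forward split. The sketch below is an illustration only: the function name is hypothetical, and the figure of 1,440 samples per day assumes a continuous 24-hour session of minute data, which is not stated in the text.

```python
from typing import List, Tuple


def walk_forward_windows(n_samples: int, train_len: int, test_len: int) -> List[Tuple[range, range]]:
    """Rolling (training, trading) index ranges; the window advances by test_len each time."""
    windows = []
    start = 0
    while start + train_len + test_len <= n_samples:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        windows.append((train, test))
        start += test_len
    return windows


MINUTES_PER_DAY = 1440  # assumption: one sample per minute over a 24-hour session

# Small hypothetical example: 10 samples, train on 5, trade on the next 1
small = walk_forward_windows(10, train_len=5, test_len=1)

# Schedule for the setting in the text, for a hypothetical 30 days of minute data
schedule = walk_forward_windows(30 * MINUTES_PER_DAY, 5 * MINUTES_PER_DAY, MINUTES_PER_DAY)
```

Because each trading window lies strictly after its training window, the network never sees the data on which it is evaluated.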
The starting balance of the agent was 10,000 USD, but on each trade, it used a
notional amount of 100,000 USD, using a leverage of 10:1 as is customary in foreign
exchange trading.
The trading results with and without considering the bid-ask spread are shown in
Table 2.
                            Mid-price      Bid-ask prices
Profit                      2 989 USD      -7 092 USD
Maximum Drawdown            33.3%          71.7%
Number of trades            5 280          5 280
Average profit per trade    0.56           -1.34
Average trade duration      279.45 min     279.45 min
Winning ratio               49.8%          34.1%
Sharpe ratio                0.0286         -0.0123
Table 2. Results on historical foreign exchange data sets of the trading algorithm with portfolio optimization
with and without considering the bid-ask spread.
As observed, the trading algorithm managed to achieve positive profits on mid-
prices. Since the average profit per trade is negative when trading on the bid and ask
prices, further enhancements (parameter tuning, more complex trading strategies, etc.)
are needed to ensure profitability.
In order to demonstrate that the trading algorithm is profitable even when the bid-
ask spread is taken into account, we focused on single-asset trading for the EUR/USD
foreign exchange rate where the bid-ask spread was the most narrow. Table 3 shows
single-asset trading results for the same time period.
                            Mid-price      Bid-ask prices
Profit                      5 781 USD      2 695 USD
Maximum Drawdown            32.9%          39.3%
Number of trades            493            493
Average profit per trade    11.72          5.46
Average trade duration      549.35 min     549.35 min
Winning ratio               55.26%         50.6%
Sharpe ratio                0.056          0.036
Table 3. Results of the single asset trading algorithm on historical EUR/USD foreign exchange rate with
and without considering the bid-ask spread.
We have developed a novel trading method based on estimating the FCPD of single
assets or portfolios by a FFNN. In order to minimize the complexity of FFNN and
support HFT, a new coding scheme has been introduced to map CEV into FCPD.
Trading was done by using a probabilistic condition indicating the trend of the price
change based on the FCPD. To improve profitability, the time lag was estimated
based on the autocorrelation pattern and trading was done only in high volatility
periods. The paper also dealt with portfolio optimization with respect to the new
objective function.
Having fine tuned the model parameters on generated data, the numerical results
demonstrated that the new methods were able to yield consistent profits on the
mid-prices of high frequency foreign exchange (EUR/USD, GBP/USD, AUD/USD,
NZD/USD) historical data. However, when the bid-ask spread was also taken into ac-
count, the algorithm achieved positive profits only for EUR/USD, where the spreads
were most narrow.
Directions for future research include testing the method on other asset classes
(e.g., fixed income, equities, exchange traded funds) and considering larger universes
of assets to pick sparse, optimal portfolios for trading. Enhancements can also be made
to the determination of the time lag parameter (e.g., dynamic autocorrelation analysis
for each time series segment), and more complex trading strategies could be introduced
which can hold multiple portfolios or consider model parameters such as the bid-ask
spread or estimated volatility in determining the trade size.
