Trading by estimating the quantized forward distribution
Attila Ceffer, Norbert Fogarasi and Janos Levendovszky
Department of Networked Systems and Services, Budapest University of Technology and
Economics
CONTACT: Attila Ceffer. Email: ceffer@hit.bme.hu
ARTICLE HISTORY
Compiled January 29, 2018
ABSTRACT
In this paper, a novel algorithm is developed for electronic trading on financial
time series. The new method uses quantization and volatility information together
with FeedForward Neural Networks (FFNN) for achieving High Frequency Trading
(HFT). The proposed procedures are based on estimating the Forward Conditional
Probability Distribution (FCPD) of the quantized return values.
From past samples, the Conditional Expected Value (CEV) can be learned, from
which FCPD can be obtained by using a special encoding scheme. Based on this
estimation, a trading signal is triggered if the probability of price change becomes
significant as measured by a quadratic criterion. Due to the encoding scheme and
quantization, the complexity of learning and estimation has been reduced for HFT.
Extensive numerical analysis has been performed on financial time series and the
new method has proven to be profitable on mid-prices. In order to overcome secondary
effects such as the bid-ask spread, we focus on the most liquid assets, on which we managed to achieve positive
profits.
KEYWORDS
algorithmic trading; neural networks; conditional probability distribution;
quantization
1. Introduction
The selection of portfolios which are optimal in terms of risk-adjusted returns has
been an intensive area of research in recent decades (Anagnostopoulos and Mamanis
2011). Furthermore, the main focus of portfolio optimization tends to move towards
the application of High Frequency Trading (HFT) when a huge amount of financial
data is taken into account within a very short time interval and trading with the opti-
mized portfolio is also to be performed at high frequency within these intervals (Chan
2013). HFT presents a challenge to both algorithmic and architectural development,
because of the need for developing algorithms running fast on specific architectures
(e.g. GPGPU, FPGA chipsets) where speed is the most important attribute. On the
other hand, profitable portfolio optimization and trading needs the evaluation of rather
complex goal functions with different constraints which sometimes cast the problem
in the NP-hard domain (d’Aspremont 2007; Sipos and Levendovszky 2013; Fogarasi
and Levendovszky 2013). As a result, the computational paradigms emerging from
the field of neural computing, which support fast parallel implementation, are often
used in the field of algorithmic trading (Kaastra and Boyd 1996; Saad, Prokhorov, and
Wunsch 1998; Levendovszky and Kia 2013).
In this paper, trading is done based on the estimated Forward Conditional Prob-
ability Distribution (FCPD). Parts of this work have already been presented at the
Financial Markets and Nonlinear Dynamics conference in Paris 2017 (Ceffer and Lev-
endovszky 2017). However, this paper is a considerable extension of those results with
respect to time series autocorrelation analysis, trading with portfolios and other en-
hancements in the formalism.
Since FCPD takes its values on the possible asset prices (or return values), the
number of probabilities to be estimated explodes exponentially with respect to the
length of the memory. As a result, in the case of numerous return values and for the
sake of accurate estimations, we need a very large training set, which hinders HFT
due to the low speed of learning. In order to speed up data collection and learning
(using a small number of samples), we need to quantize the asset prices. We quantize
the change of the prices (returns), which varies in a smaller interval. In the paper, we
use the Lloyd-Max algorithm for quantization to attain a good trading performance.
After quantizing the returns, we train a FFNN to estimate the FCPD and this can
then provide the necessary trading signal.
In order to beat the bid-ask spread, we have introduced further adjustments in the
algorithm by considering the estimated volatility of the asset and by choosing more
liquid assets for trading. With these adjustments, we could secure positive profit in
the presence of bid-ask spread on EUR/USD currency exchange rates.
The material summarized above is organized as follows:
in Section 2, the theoretical background of trading by FFNN is outlined;
in Section 3, encoding schemes are introduced to obtain FCPD;
in Section 4, the trading strategy is defined and explained;
in Section 5, the computational model of the trading algorithm is mapped out;
in Section 6, we validate the methodology on real historical financial time series.
2. Theoretical background - trading with FFNN
Let us assume that x(t) is the return of a financial instrument (e.g. a foreign exchange
rate) at time instant t. Based on historical observations of the financial instrument
values, one can construct a training set containing samples followed by the observed
forward return, given as follows: $\tau^{(K)} = \{(\mathbf{x}_k, x(k+1)),\ k = L, \ldots, K-1\}$, where
$K$ is the number of samples available for training, $L$ is the memory (time lag) of the
process and $\mathbf{x}_k = (x(k-L+1), \ldots, x(k))$ are the observed samples.
There are several other adaptive architectures used for prediction; however, FFNNs
have been proven to exhibit universal approximation capabilities (Hornik, Stinchcombe, and
White 1989). As a result, after learning, FFNNs can capture the CEV, which is the
optimal solution of the nonlinear regression problem in mean square (Doob 1953).
Let us then construct a FFNN based predictor
\[
\tilde{x}(t+1) = \varphi\left(\sum_i w^{(L)}_i\,\varphi\left(\sum_j w^{(L-1)}_{ij} \cdots\ \varphi\left(\sum_m w^{(1)}_{nm}\, x(t-m)\right)\right)\right) = Net\left(\Theta, \mathbf{x}_t\right), \qquad (1)
\]
where $\mathbf{x}_t = (x(t-L+1), \ldots, x(t))$.
After learning, we obtain
\[
\Theta^{(K)}_{opt}:\ \min_{\Theta} \frac{1}{K}\sum_{k=1}^{K} \left(x(k+1) - Net\left(\Theta, \mathbf{x}_k\right)\right)^2 \qquad (2)
\]
because
\[
\lim_{K\to\infty} \frac{1}{K}\sum_{k=1}^{K} \left(x(k+1) - Net\left(\Theta, \mathbf{x}_k\right)\right)^2 = E\left(x(t+1) - Net\left(\Theta, \mathbf{x}_t\right)\right)^2 \qquad (3)
\]
as shown in (Doob 1953), and
\[
\min_{\Theta} E\left(x(t+1) - Net\left(\Theta, \mathbf{x}_t\right)\right)^2 \ \Rightarrow\ Net\left(\Theta, \mathbf{x}_t\right) = E\left(x(t+1) \mid \mathbf{x}_t\right), \qquad (4)
\]
so the FFNN will provide the optimal non-linear prediction of the CEV
\[
Net\left(\Theta, \mathbf{x}_t\right) = E\left(x(t+1) \mid \mathbf{x}_t\right) \qquad (5)
\]
(for further details see (Hornik, Stinchcombe, and White 1989; Haykin 1998; Funahashi 1989)).
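As an illustration of the above, the following minimal sketch fits a FFNN predictor of the conditional expected value on a synthetic return series. The synthetic data, the network size and the use of scikit-learn's MLPRegressor (sigmoid hidden layer, linear output) are assumptions made for the example, not part of the paper's experimental setup.

```python
# Minimal sketch of the CEV predictor of Section 2, eq. (1)-(5).
# Assumptions: a synthetic return series stands in for real data;
# sklearn's MLPRegressor plays the role of Net(Theta, x_t).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1e-3, size=5000)        # placeholder return series x(t)
L = 3                                        # memory (time lag) of the process

# Training set tau(K) = {(x_k, x(k+1))}, x_k = (x(k-L+1), ..., x(k))
X = np.array([x[k - L + 1:k + 1] for k in range(L - 1, len(x) - 1)])
y = x[L:]

# Sigmoid hidden layer, linear output -> nonlinear regression of E(x(t+1) | x_t)
net = MLPRegressor(hidden_layer_sizes=(10,), activation='logistic',
                   early_stopping=True, max_iter=2000, random_state=0)
net.fit(X, y)
x_hat = net.predict(X[-1:])                  # one-step-ahead prediction
print("predicted next return:", x_hat[0])
```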
3. Coding scheme to obtain FCPD from CEV
In order to obtain the FCPD of the asset, let us encode the possible values of the
return into an orthonormal vector set:
\[
q_l \rightarrow \mathbf{r}^{(l)}:\quad r^{(l)}_i = \delta_{li} = \begin{cases} 1 & \text{if } i = l \\ 0 & \text{otherwise} \end{cases}
\]
and rewrite the training set according to the encoding mechanism:
\[
\tau^{(K)} = \left\{\left(\mathbf{r}_{k+1}, \mathbf{x}_k\right),\ k = 1, \ldots, K\right\},
\]
where $\mathbf{r}_{k+1} = \mathbf{r}^{(l)}$ if $x(k+1) = q_l$. Then by minimizing the error function
\[
\frac{1}{K}\sum_{k=1}^{K} \left\| \mathbf{r}_{k+1} - Net\left(\Theta, \mathbf{x}_k\right) \right\|^2 \approx E\left\| \mathbf{r} - Net\left(\Theta, \mathbf{x}\right) \right\|^2, \qquad (6)
\]
one will obtain $Net\left(\Theta^{(K)}_{opt}, \mathbf{x}\right) = E\left(\mathbf{r} \mid \mathbf{x}\right)$, where, due to the encoding, component $l$ of the conditional expected value will yield the corresponding conditional probability as
\[
E_l\left(\mathbf{r} \mid \mathbf{x}\right) = \sum_{i=1}^{M} r^{(i)}_l\, P\left(\mathbf{r}^{(i)} \mid \mathbf{x}\right) = \sum_{i=1}^{M} \delta_{li}\, P\left(\mathbf{r}^{(i)} \mid \mathbf{x}\right) = P\left(\mathbf{r}^{(l)} \mid \mathbf{x}\right) = P\left(x(t+1) = q_l \mid \mathbf{x}\right) = P_l. \qquad (7)
\]
Figure 1. The neural architecture to estimate the FCPD: the inputs x(t), x(t-1), ..., x(t-L+1) feed a FeedForward Neural Network whose outputs P_1, P_2, ..., P_M form the Forward Conditional Probability Distribution.
This allows the construction of a FFNN for the estimation of the FCPD; the proposed
architecture is shown in Figure 1.
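The encoding scheme can be sketched as follows: quantized returns are one-hot encoded and a multi-output FFNN is fitted so that output l approximates P(x(t+1) = q_l | x_t). The synthetic data, the equidistant bins (the paper uses Lloyd-Max quantization, introduced in Section 4) and the scikit-learn model are illustrative assumptions.

```python
# Sketch of the encoding scheme of Section 3: a multi-output FFNN trained on
# one-hot targets so that output l approximates P(x(t+1) = q_l | x_t).
# Assumptions: equidistant bins for brevity, synthetic data, M = 5, L = 3.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1e-3, size=5000)
L, M = 3, 5

# Quantization labels gamma(x) in {0, ..., M-1} (equidistant bins for illustration)
edges = np.linspace(x.min(), x.max(), M + 1)[1:-1]
labels = np.digitize(x, edges)

X = np.array([x[k - L + 1:k + 1] for k in range(L - 1, len(x) - 1)])
R = np.eye(M)[labels[L:]]                    # one-hot targets r_{k+1}

net = MLPRegressor(hidden_layer_sizes=(10,), activation='logistic',
                   early_stopping=True, max_iter=2000, random_state=0)
net.fit(X, R)

p = net.predict(X[-1:])[0]
p = np.clip(p, 0.0, None)
p = p / p.sum()                              # FCPD estimate P_1, ..., P_M
print("estimated FCPD:", np.round(p, 3))
```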
4. The trading algorithm
Having obtained the FCPD, we can now turn our attention to developing a trading
strategy which utilizes it. However, first we observe that the method outlined above re-
quires a high-complexity neural network, as the dimension of the output $\mathbf{y} = Net(\Theta, \mathbf{x})$
is $\dim(\mathbf{y}) = M$, which is the number of possible return values. Unfortunately, a high com-
plexity FFNN with many outputs contains a large number of free parameters which,
in turn, requires a large learning set to train. This will prevent fast execution of the
strategy and, as a result, hinders the ability of the method to be used in HFT. Thus,
the present effort in this section is focused on decreasing the number of outputs, which
will also reduce the number of free parameters to be optimized. In order to achieve this,
we quantize the time series.
Let us define a quantization of the returns as in (Lloyd 2006). Let $\{Q_1, Q_2, \ldots, Q_M\}$
be a class of sets (intervals) and $\{q_1, q_2, \ldots, q_M\}$ be a corresponding set of quanta. We
associate with a partition $\{Q_i\}$ a label function $\gamma(x)$ defined for all real values $x$ such
that
\[
\gamma(x) = i \quad \text{if } x \text{ lies in } Q_i. \qquad (8)
\]
Define $\hat{r}_i(t) := \gamma(r_i(t))$ and $\hat{x}(t) := \gamma(x(t))$, the labels of the asset returns and of the
portfolio return under the given quantization. Assuming we have $L$ past observations
of $x$, we can define the conditional probability function
\[
P_{\mathbf{x}}(i, t) := P\left(\hat{x}(t+1) = i \mid \hat{x}(t) = x_1, \ldots, \hat{x}(t-L+1) = x_L\right), \quad i = 1, \ldots, M, \qquad (9)
\]
where the history vector $\mathbf{x} = (x_1, \ldots, x_L)$ contains the quantization labels of the historical
observations. Let us define a constant $\varepsilon > 0$ corresponding to the frictional transaction
costs of long and short positions, let $j_U := \gamma(\varepsilon)$ be the upper and $j_L := \gamma(-\varepsilon)$ be
the lower tolerance label, and define $\delta > 0$ as the minimum trading probability limit.
We can now define a trading strategy using FCPD as follows:
Algorithm 1 Trading algorithm
while t < T do
    if not InstrumentAtHand and Σ_{i>j_U} P_x(i, t) − Σ_{i<j_L} P_x(i, t) > δ then
        Buy the instrument
    if InstrumentAtHand and Σ_{i>j_U} P_x(i, t) < Σ_{i<j_L} P_x(i, t) then
        Sell the instrument
    t ← t + 1
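A minimal sketch of the decision rule of Algorithm 1 is given below; the FCPD vector, the tolerance labels and the threshold value are hypothetical inputs chosen only for illustration.

```python
# Sketch of the trading rule in Algorithm 1. Assumptions: `fcpd` is the
# estimated distribution P_x(., t) over M labels (0-indexed here),
# j_upper / j_lower are the tolerance labels gamma(eps) / gamma(-eps),
# and delta is the minimum trading probability limit.
import numpy as np

def trading_signal(fcpd, j_lower, j_upper, delta, instrument_at_hand):
    p_up = fcpd[j_upper + 1:].sum()          # probability mass above the upper label
    p_down = fcpd[:j_lower].sum()            # probability mass below the lower label
    if not instrument_at_hand and p_up - p_down > delta:
        return "buy"
    if instrument_at_hand and p_up < p_down:
        return "sell"
    return "hold"

# Example with M = 5 labels and tolerance labels 1 and 3
fcpd = np.array([0.05, 0.10, 0.20, 0.25, 0.40])
print(trading_signal(fcpd, j_lower=1, j_upper=3, delta=0.2, instrument_at_hand=False))
```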
The returns of financial time series can be approximated by Gaussian random vari-
ables. However, equidistant quantization is only optimal for samples following uniform
distribution. In order to overcome this shortcoming, the expected value of the squared
quantization error (i.e. the squared difference between original and quantized signals)
can be reduced by applying non-equidistant quantization. By allowing a larger
quantization error on values that occur rarely and a smaller error on values that occur
frequently, the overall error can be made smaller (Roe 2006; Lloyd 2006). In this way,
one can obtain a more accurate estimation of the FCPD, which may yield better trad-
ing decisions. To determine the optimal quantization levels, we used the Lloyd-Max
algorithm.
The Lloyd-Max algorithm:
(1) use an initial set of representative levels $q_i$, $i = 1, 2, \ldots, M$;
(2) assign each sample $x(t)$ in the training set $\tau^{(K)}$ to the closest representative $q_i$: $C_i = \{x \in \tau^{(K)}: Q(x) = i\}$, $i = 1, 2, \ldots, M$;
(3) calculate new representative levels:
\[
q_i = \frac{1}{|C_i|} \sum_{x \in C_i} x, \quad i = 1, 2, \ldots, M;
\]
(4) repeat steps 2 and 3 until there is no further distortion reduction (or a stopping criterion is met).
Our simulations have shown that by running the Lloyd-Max algorithm, the quantization
error drops to roughly one tenth of that obtained with equidistant quantization, as a
result of minimizing the objective function
\[
\sum_{i=1}^{M-1} \int_{C_i}^{C_{i+1}} (x - q_i)^2\, p(x)\, dx.
\]
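For illustration, a simple Lloyd-Max quantizer over an empirical sample could look as follows; the synthetic Gaussian returns and the convergence test are assumptions of the sketch, not the exact configuration used in the paper.

```python
# Sketch of the Lloyd-Max quantizer of Section 4. Assumptions: synthetic
# Gaussian returns; iteration stops when the representative levels stop moving.
import numpy as np

def lloyd_max(samples, M, n_iter=100, tol=1e-12):
    # (1) initial representative levels (equidistant over the sample range)
    q = np.linspace(samples.min(), samples.max(), M)
    for _ in range(n_iter):
        # (2) assign each sample to the closest representative level
        idx = np.argmin(np.abs(samples[:, None] - q[None, :]), axis=1)
        # (3) new representatives = mean of each cell C_i (keep old level if empty)
        new_q = np.array([samples[idx == i].mean() if np.any(idx == i) else q[i]
                          for i in range(M)])
        # (4) stop when distortion no longer decreases (levels have converged)
        if np.max(np.abs(new_q - q)) < tol:
            break
        q = new_q
    return np.sort(q)

rng = np.random.default_rng(2)
returns = rng.normal(0.0, 1e-3, size=10000)
print("quantization levels:", lloyd_max(returns, M=5))
```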
4.1. Trading in high volatility periods
Based on the predicted standard deviation we can improve the trading efficiency. Each
time the neural network gives an entry signal (either long or short), we calculate the
standard deviation from the forward conditional distribution as
\[
\sigma(t) = \sqrt{\sum_{i=1}^{M} P_i\, q_i^2 - \left(\sum_{i=1}^{M} P_i\, q_i\right)^{2}}. \qquad (10)
\]
If the standard deviation reaches a given threshold (high volatility), we enter into
the trade, otherwise (low volatility), we stay away from the market. The modified
trading algorithm is shown in Algorithm 2.
Algorithm 2 Trading algorithm with volatility filter
while t < T do
    if not InstrumentAtHand and Σ_{i>j_U} P_x(i, t) − Σ_{i<j_L} P_x(i, t) > δ and σ(t) > η then
        Buy the instrument
    if InstrumentAtHand and Σ_{i>j_U} P_x(i, t) < Σ_{i<j_L} P_x(i, t) then
        Sell the instrument
    t ← t + 1
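A sketch of the volatility filter, computing σ(t) from the FCPD according to equation (10), might look as follows; the example FCPD, quantization levels and the threshold η are hypothetical values chosen only for illustration.

```python
# Sketch of the volatility filter of Section 4.1: the standard deviation of the
# quantized forward return computed from the FCPD, eq. (10). The threshold eta
# is an assumed value used only for this example.
import numpy as np

def fcpd_std(fcpd, quanta):
    mean = np.dot(fcpd, quanta)
    second_moment = np.dot(fcpd, quanta ** 2)
    return np.sqrt(second_moment - mean ** 2)

fcpd = np.array([0.05, 0.10, 0.20, 0.25, 0.40])      # example FCPD P_1..P_M
quanta = np.array([-2e-4, -1e-4, 0.0, 1e-4, 2e-4])   # example quantization levels q_i
eta = 1e-4                                           # assumed volatility threshold
sigma = fcpd_std(fcpd, quanta)
print("sigma(t) =", sigma, "-> trade" if sigma > eta else "-> stay out")
```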
4.2. Determining the memory of the process (time lag)
Since our main concern is to decrease the complexity of FFNN used for estimating
FCPD, we also would like to minimize the number of inputs. However, the number of
inputs is determined by the memory of the random process to be predicted. This needs
the estimation of the ”model degree” or ”memory” of the predicted process in terms
of estimating the number of past values used for prediction. The process memory is
determined based on the autocorrelation measured in the financial time series.
In order to determine the autocorrelation of the input financial time series, we
performed the following off-line analysis. Having loaded the entire training data set
and computed the returns as described in equation (11), we examined the plot of the
return time series. We concluded that it appears to fluctuate around a constant mean,
thus no further transformation is necessary for the autocorrelation analysis.
We then plotted the sample autocorrelation function (ACF) of the computed return
data for time lags 0 to 20 (note that the autocorrelation measure is normalized so that
time lag 0 has autocorrelation of 1). We also plotted approximate upper and lower
confidence bounds (horizontal lines) under the hypothesis that the underlying is a
Gaussian white noise process (Box, Reinsel, and Jenkins 1994). The results for the
minute by minute foreign exchange rate time series are shown in Figure 2. We observe
that the sample ACF has significant autocorrelation for lags 1, 2 and 3, but drops off
for larger time lags. Therefore, a process memory parameter of L = 3 is a good choice
for these data sets.
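The lag selection described above can be reproduced with a short autocorrelation check such as the sketch below; the synthetic return series stands in for the minute-by-minute FX data, and 1.96/sqrt(n) is the usual approximate white-noise confidence bound.

```python
# Sketch of the time-lag selection of Section 4.2: sample autocorrelation of the
# return series with an approximate 95% white-noise confidence bound.
import numpy as np

def sample_acf(x, max_lag=20):
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(3)
returns = rng.normal(0.0, 1e-3, size=5000)     # placeholder for real FX returns
acf = sample_acf(returns)
bound = 1.96 / np.sqrt(len(returns))           # approximate white-noise confidence bound
significant = [k for k in range(1, len(acf)) if abs(acf[k]) > bound]
print("significant lags:", significant)        # on the real FX data lags 1-3 were significant
```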
4.3. Portfolio optimization
In this section, we explain how to select a portfolio from a universe of assets which is
optimal for trading.
Figure 2. Sample autocorrelation of AUD/USD, EUR/USD, GBP/USD, NZD/USD minute by minute close
price return data as a function of the time lag (four panels, lags 0 to 20, with approximate white-noise confidence bounds).
Let us assume that there is a vector valued random asset price process (e.g. the
values of currency foreign exchange rates), which is denoted by $\mathbf{s}(t) = (s_1(t), \ldots, s_n(t))$
for $n \geq 1$ assets. The return series of $\mathbf{s}(t)$ is defined as
\[
r_i(t) = \frac{s_i(t)}{s_i(t-1)} - 1. \qquad (11)
\]
A portfolio of assets is defined by a portfolio vector $\mathbf{w}(t) = (w_1(t), \ldots, w_n(t))$, which
yields a linear combination of asset values at time $t$:
\[
p(t) := \sum_{i=1}^{n} w_i(t)\, s_i(t) = \mathbf{w}(t)^{T} \mathbf{s}(t), \qquad (12)
\]
and implies a return of the portfolio from time $t-1$ to time $t$:
\[
x(t) := \sum_{i=1}^{n} w_i(t)\, r_i(t) = \mathbf{w}(t)^{T} \mathbf{r}(t). \qquad (13)
\]
Our approach will be to select a portfolio which optimizes the following objective
function:
\[
\mathbf{w}_{opt}:\ \max_{\mathbf{w}} \left( \sum_{i > j_U} P_{\mathbf{x}}(i, t) - \sum_{i < j_L} P_{\mathbf{x}}(i, t) \right)^{2}. \qquad (14)
\]
Given that $P_{\mathbf{x}}(i, t)$, as defined in equation (9), is estimated for each candidate portfolio at each
time step using a high-dimensional FFNN, this can be considered a High-dimensional,
Expensive (computationally), Black-box (HEB) optimization problem. There are a
number of different ways to deal with such problems; a good survey of the different
methods is given in (Shan and Wang 2010). In Section 5, we will explain how we have tackled
this complex problem.
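As a sketch of how the objective (14) could be evaluated for a candidate portfolio, the snippet below computes the portfolio return series from equations (11)-(13) and plugs an FCPD estimate into the squared difference of tail probabilities. The `estimate_fcpd` callable and the toy price data are assumptions; in the paper the FCPD would come from the FFNN of Section 3 fitted to the quantized portfolio returns.

```python
# Sketch of the portfolio objective (14). Assumptions: `estimate_fcpd` is a
# hypothetical callable returning P_x(., t) for the quantized portfolio return.
import numpy as np

def portfolio_returns(prices, w):
    # eq. (11)-(13): per-asset returns and the portfolio return series
    r = prices[1:] / prices[:-1] - 1.0
    return r @ w

def objective(prices, w, estimate_fcpd, j_lower, j_upper):
    x = portfolio_returns(prices, w)
    fcpd = estimate_fcpd(x)                  # FCPD of the quantized portfolio return
    return (fcpd[j_upper + 1:].sum() - fcpd[:j_lower].sum()) ** 2

# Toy usage with a dummy FCPD estimator (uniform distribution over M = 5 labels)
rng = np.random.default_rng(4)
prices = np.cumprod(1 + rng.normal(0, 1e-3, size=(1000, 4)), axis=0)
w = np.array([0.4, 0.3, 0.2, 0.1])
print(objective(prices, w, lambda x: np.full(5, 0.2), j_lower=1, j_upper=3))
```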
5. Computational approach to modelling and optimization
Our computational framework is shown in the block diagram of Figure 3 and detailed
below:
In the case of a single asset, compute the returns from the historical time series;
In the case of multiple time series, construct a starting portfolio for the opti-
mization and compute its returns from the historical time series;
Quantize the return series of the asset;
Fit a FFNN to the time series of the asset by using the coding scheme as detailed
in section 3;
Evaluate the objective function by estimating the FCPD according to the iden-
tified model given the portfolio;
In the case of portfolio selection, continue the numerical optimization process
until the optimal portfolio is obtained according to the objective function;
Form a trading signal based on the price behaviour of the asset as per Algorithm
2 to decide which trading action is to be performed.
Figure 3. Computational approach: the single-asset or multidimensional time series is quantized, a FFNN is trained, the FCPD is estimated and the objective function is evaluated; in the portfolio case this loop (select portfolio) is repeated until the stopping criterion is met, after which trading results are produced.
Finally, one can carry out a performance analysis by testing and evaluating various
numerical indicators for the sake of comparing the profitability of the different methods
(Section 6 contains further details).
In the absence of analytical solutions for the constrained optimization problem posed
in (14), we use simulated annealing (SA) to obtain good quality heuristic solutions.
SA is a stochastic search algorithm for finding the global optimum in a large space
fully described in the related papers (Kirkpatrick, Gelatt, and Vecchi 1983; Salamon,
Frost, and Sibani 2002). There are, of course, several other stochastic search algorithms
(for example genetic algorithms, tabu search, GRASP and pattern search) which could also
be used, but SA has been successfully applied to similar problems (Armananzas and
Lozano 2005; Fogarasi and Levendovszky 2013; Sipos and Levendovszky 2013). At
each step of the algorithm, we consider a neighbouring state w’ of the current state
w and probabilistically decide between moving the system to the new state or staying. The transition
probability depends on a temperature parameter T, which decreases during the
procedure (also referred to as cooling). Convergence to the globally optimal
solution has been proven as long as the cooling schedule is sufficiently slow (Geman
and Geman 1984).
In our application, the energy function $J(\mathbf{w})$ to be minimized is the negative of the objective
function defined in (14):
\[
J(\mathbf{w}) = -\left( \sum_{i > j_U} P_{\mathbf{x}}(i, t) - \sum_{i < j_L} P_{\mathbf{x}}(i, t) \right)^{2}. \qquad (15)
\]
Heuristic search procedures that aspire to find global optimal solutions to hard
combinatorial optimization problems usually require some type of diversification to
overcome local optimality. One way to achieve diversification is to re-start the search
from a new solution once a region has been extensively explored. This strategy is
referred to as the multi-start method (Martí 2003). A detailed analysis of the various
types of multi-start strategies can be found in (Martí, Resende, and Ribeiro 2013).
We perform parallel and independent simulated annealing in a given number of ran-
domly selected subspaces, following the treatment of (Ram, Sreenivas, and Subrama-
niam 1996). More precisely, we use closed orthants in $\mathbb{R}^l$, constraining each coordinate
of the subspace to be nonnegative or nonpositive. This is motivated by our investiga-
tion which has shown that SA converges faster and provides more reliable results in
these regions. However, an exhaustive search in every subspace is computationally not
feasible in case of a larger number of assets.
The neighbour function in each iteration generates a new portfolio on the L1-ball,
the distance of which from the previous one depends on the current temperature T.
In each of the selected subspaces let $\mathbf{w}$ be an arbitrary initialization vector; a new
vector $\mathbf{w}'$ is then generated randomly subject to the aforementioned neighbour function.
We automatically accept the new vector if $J(\mathbf{w}') < J(\mathbf{w})$. In case $J(\mathbf{w}') \geq J(\mathbf{w})$, we
apply random acceptance with probability $e^{-\frac{J(\mathbf{w}') - J(\mathbf{w})}{T}}$. The sampling is then continued
while decreasing the parameter $T$ to zero. Finally, the last state vectors obtained in the
corresponding subspaces are compared, and the one minimizing $J(\mathbf{w})$ is the identified
optimal portfolio vector.
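A compact sketch of the simulated annealing loop described above is given below; the single nonnegative orthant, the geometric cooling schedule and the placeholder energy function are simplifying assumptions, not the exact configuration used in the paper.

```python
# Sketch of the simulated annealing search of Section 5. Assumptions: one
# nonnegative orthant, L1-normalized weights, geometric cooling, and a
# hypothetical `energy` callable such as the negative squared objective.
import numpy as np

def neighbour(w, temperature, rng):
    # random step whose size shrinks with the temperature, renormalized to the L1-ball
    step = rng.normal(0.0, temperature, size=w.shape)
    w_new = np.abs(w + step)                 # stay in the nonnegative orthant
    return w_new / np.abs(w_new).sum()

def simulated_annealing(energy, n_assets, t0=1.0, cooling=0.99, n_steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    w = np.full(n_assets, 1.0 / n_assets)
    e, t = energy(w), t0
    for _ in range(n_steps):
        w_new = neighbour(w, t, rng)
        e_new = energy(w_new)
        # accept improvements always, deteriorations with Boltzmann probability
        if e_new < e or rng.random() < np.exp(-(e_new - e) / t):
            w, e = w_new, e_new
        t *= cooling
    return w, e

# Toy usage: placeholder energy (negative of a squared spread of two weights)
energy = lambda w: -float((w[0] - w[1]) ** 2)
print(simulated_annealing(energy, n_assets=4))
```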
6. Performance analysis
An extensive back-testing framework has been created to handle trading actions on
various input data and provide numerical results for the sake of comparing different
methods on different time series (either historical foreign exchange rates or artificially
generated data).
At first, we investigate the estimation performance of the proposed model on generated
data. For the sake of comparison, we used several FFNNs, each with a different
number of neurons in the hidden layer. In order to ensure the universal approximation
capability, we used a sigmoid activation function in the hidden layer and a linear one in the
output layer. To prevent overfitting, we used the "early stopping" method. The results
showed that the FFNN can successfully predict the forward distribution; furthermore, in
some cases it is more accurate than the standard histogram method (simply calcu-
lating the relative frequencies). Table 1 below shows the out-of-sample mean squared
error between predicted and real FCPD. Due to the use of early stopping, in-sample
MSE is similar to out-of-sample MSE.
MSE L=1 L=2 L=3 L=4
N=1 0.00296 0.0089 0.00153 0.00425
N=3 0.00741 0.00867 0.00162 0.00249
N=5 0.00624 0.00697 0.00157 0.00201
N=8 0.00624 0.00613 0.00144 0.00194
N=10 0.00624 0.00486 0.00174 0.00188
N=20 0.00624 0.00412 0.00178 0.00181
N=30 0.00624 0.00412 0.00189 0.00196
N=50 0.00624 0.00412 0.00224 0.00215
N=100 0.00624 0.00412 0.00224 0.00215
N=200 0.00624 0.00412 0.00224 0.00215
Histogram 0.00624 0.00412 0.00224 0.00215
Table 1. Mean squared error of the FCPD as a function of the number of neurons (N) and the time lag (L)
on generated data
By extensive numerical simulations, we found that the optimal number of neurons
in the hidden layer is 10, because it offers a good tradeoff between approximation quality and training
time. Adding more neurons does not improve the approximation quality measured by
the mean squared error, but it increases training time significantly (Table 1).
For a detailed comparative analysis, the following performance measures were cal-
culated for each experiment on the corresponding time series:
Profit gained, that is the money realized by the agent on top of the 10,000 USD
starting balance;
Maximum drawdown, that is the maximum loss from a peak to a trough of the
balance;
Number of trades, that is the number of trades the agent made in the evaluation
period;
Winning rate, that is the ratio of the total number of winning trades to the
number of all trades;
Average trade duration, that is the average holding period of an asset or portfolio
in minutes.
Average profit per trade (in points, which is the smallest possible price change),
that is the amount realized by the agent divided by the number of trades;
Sharpe ratio, that is the excess return divided by the standard deviation of the returns of
an asset or portfolio, assuming a risk-free rate of 0%. This measure is not
annualized, but reflects the Sharpe ratio of the per-minute returns (a short computation sketch of these measures is given below).
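A minimal sketch of how these measures can be computed from a balance series is shown below; the per-trade profits are randomly generated placeholder values used only to make the example self-contained.

```python
# Minimal sketch of the performance measures of Section 6 (profit, maximum
# drawdown, winning rate, per-period Sharpe ratio). Assumptions: `balance` is
# the running account balance and `trade_pnl` holds the profit of each trade.
import numpy as np

def max_drawdown(balance):
    peaks = np.maximum.accumulate(balance)
    return np.max((peaks - balance) / peaks)           # largest peak-to-trough loss

def sharpe_ratio(returns, risk_free=0.0):
    excess = returns - risk_free
    return excess.mean() / excess.std()                # per-period, not annualized

rng = np.random.default_rng(5)
trade_pnl = rng.normal(1.0, 50.0, size=500)            # placeholder per-trade profits in USD
balance = 10_000 + np.cumsum(trade_pnl)
returns = np.diff(balance) / balance[:-1]

print("profit:", balance[-1] - 10_000)
print("max drawdown:", max_drawdown(balance))
print("winning rate:", np.mean(trade_pnl > 0))
print("Sharpe ratio:", sharpe_ratio(returns))
```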
In this section, we show the numerical results obtained on the following foreign
exchange data sets:
AUD/USD,
EUR/USD,
GBP/USD and
NZD/USD
minute by minute data from 2016.01.01 to 2016.12.31 including bid and ask prices, in
order to take transaction costs into account. In each case, the memory length of the
neural network was L = 3 and we used the preceding 5 days of observations
to fit the FFNNs, retraining the network after each day of data. Based on Table
1, we set the number of neurons to N = 10 and the number of quantization levels to
M = 5 in each simulation.
The starting balance of the agent was 10,000 USD, but on each trade, it used a
notional amount of 100,000 USD, using a leverage of 10:1 as is customary in foreign
exchange trading.
The trading results with and without considering the bid-ask spread are shown in
Table 2.
Mid-price Bid-ask prices
Profit 2 989 USD -7 092 USD
Maximum Drawdown 33.3% 71.7%
Number of trades 5 280 5 280
Average profit per trade (points) 0.56 -1.34
Average trade duration 279.45 min 279.45 min
Winning ratio 49.8% 34.1%
Sharpe ratio 0.0286 -0.0123
Table 2. Results on historical foreign exchange data sets of the trading algorithm with portfolio optimization
with and without considering the bid-ask spread.
As observed, the trading algorithm managed to achieve positive profits on mid-
prices. Since the average profit per trade is negative when trading on the bid and ask
prices, further enhancements (parameter tuning, a more complex trading strategy, etc.)
are needed to ensure profitability.
In order to demonstrate that the trading algorithm is profitable even when the bid-
ask spread is taken into account, we focused on single-asset trading for the EUR/USD
foreign exchange rate, where the bid-ask spread was the narrowest. Table 3 shows
single-asset trading results for the same time period.
Mid-price Bid-ask prices
Profit 5 781 USD 2 695 USD
Maximum Drawdown 32.9% 39.3%
Number of trades 493 493
Average profit per trade (points) 11.72 5.46
Average trade duration 549.35 min 549.35 min
Winning ratio 55.26% 50.6%
Sharpe ratio 0.056 0.036
Table 3. Results of the single asset trading algorithm on historical EUR/USD foreign exchange rate with
and without considering the bid-ask spread.
Conclusions
We have developed a novel trading method based on estimating the FCPD of single
assets or portfolios by a FFNN. In order to minimize the complexity of the FFNN and
to support HFT, a new coding scheme has been introduced to map the CEV into the FCPD.
Trading was done by using a probabilistic condition, based on the FCPD, indicating the
trend of the price change. To improve profitability, the time lag was estimated
from the autocorrelation pattern and trading was performed only in high-volatility
periods. The paper also dealt with portfolio optimization with respect to the new
objective function.
After fine-tuning the model parameters on generated data, the numerical results
demonstrated that the new methods were able to yield consistent profits on the
mid-prices of high frequency foreign exchange (EUR/USD, GBP/USD, AUD/USD,
NZD/USD) historical data. However, when the bid-ask spread was also taken into ac-
count, the algorithm achieved positive profits only for EUR/USD, where the spreads
were most narrow.
Directions for future research include testing the method on other asset classes
(e.g., fixed income, equities, exchange traded funds) and considering larger universes
of assets to pick sparse, optimal portfolios for trading. Enhancements can also be made
to the determination of the time lag parameter (e.g., dynamic autocorrelation analysis
for each time series segment) and more complex trading strategies could be introduced
which can hold multiple portfolios or consider model parameters such as the bid-ask
spread or estimated volatility in determining the trade size.
References
Anagnostopoulos, K.P., and G. Mamanis. 2011. “The mean–variance cardinality constrained
portfolio optimization problem: An experimental evaluation of five multiobjective
evolutionary algorithms.” Expert Systems with Applications 38 (11): 14208–14217.
http://www.sciencedirect.com/science/article/pii/S0957417411007603.
Armananzas, Ruben, and Jose A. Lozano. 2005. “A Multiobjective Approach to the Portfolio
Optimization Problem.” In 2005 IEEE Congress on Evolutionary Computation (CEC’2005),
Vol. 2, Edinburgh, Scotland, September, 1388–1395. IEEE Service Center.
Box, George E. P., Gregory C. Reinsel, and Gwilym M. Jenkins. 1994. Time series analysis :
forecasting and control. Englewood Cliffs, NJ: Prentice-Hall.
Ceffer, Attila, and Janos Levendovszky. 2017. “Trading by estimating the forward distribu-
tion using quantization and volatility information.” Proceedings, 3rd Financial Markets and
Nonlinear Dynamics (FMND 2017) .
Chan, Ernie. 2013. Algorithmic Trading: Winning Strategies and Their Rationale. 1st ed. Wiley
Publishing.
d’Aspremont, Alexandre. 2007. “Identifying Small Mean Reverting Portfolios.” CoRR
abs/0708.3048.
Doob, J.L. 1953. Stochastic Processes. Wiley Publications in Statistics. John Wiley & Sons.
Fogarasi, Norbert, and Janos Levendovszky. 2013. “Sparse, mean reverting portfolio selection
using simulated annealing.” Algorithmic Finance 2 (3-4): 197–211.
Funahashi, K. 1989. “On the Approximate Realization of Continuous Mappings by Neural
Networks.” Neural Netw. 2 (3): 183–192. http://dx.doi.org/10.1016/0893-6080(89)90003-8.
Geman, Stuart, and Donald Geman. 1984. “Stochastic Relaxation, Gibbs Distributions, and
the Bayesian Restoration of Images.” IEEE Trans. Pattern Anal. Mach. Intell. 6 (6): 721–
741. http://dx.doi.org/10.1109/TPAMI.1984.4767596.
Haykin, Simon. 1998. Neural Networks: A Comprehensive Foundation. 2nd ed. Upper Saddle
River, NJ, USA: Prentice Hall PTR.
Hornik, Kurt, Maxwell Stinchcombe, and Halbert White. 1989. “Multilayer feedfor-
ward networks are universal approximators.” Neural Networks 2 (5): 359 – 366.
http://www.sciencedirect.com/science/article/pii/0893608089900208.
Kaastra, Iebeling, and Milton Boyd. 1996. “Designing a neural network for forecasting financial
and economic time series.” Neurocomputing 10 (3): 215–236. Financial Applications, Part II.
Kirkpatrick, S., C. D. Gelatt, and M. P. Vecchi. 1983. “Optimization by Simulated Annealing.”
Science 220 (4598): 671–680. http://science.sciencemag.org/content/220/4598/671.
Levendovszky, Janos, and Farhad Kia. 2013. “Prediction based high frequency trading on
financial time series.” Periodica Polytechnica Electrical Engineering and Computer Science
56 (1): 29–34. https://pp.bme.hu/eecs/article/view/7165.
Lloyd, S. 2006. “Least Squares Quantization in PCM.” IEEE Trans. Inf. Theor. 28 (2): 129–
137. http://dx.doi.org/10.1109/TIT.1982.1056489.
Martí, Rafael. 2003. Multi-Start Methods, 355–368. Boston, MA: Springer US.
Martí, Rafael, Mauricio G.C. Resende, and Celso C. Ribeiro. 2013. “Multi-start methods for
combinatorial optimization.” European Journal of Operational Research 226 (1): 1 – 8.
http://www.sciencedirect.com/science/article/pii/S0377221712007394.
Ram, D.Janaki, T.H. Sreenivas, and K.Ganapathy Subramaniam. 1996. “Parallel Simulated
Annealing Algorithms.” Journal of Parallel and Distributed Computing 37 (2): 207 – 212.
Roe, G. 2006. “Quantizing for Minimum Distortion (Corresp.).” IEEE Trans. Inf. Theor. 10
(4): 384–385. http://dx.doi.org/10.1109/TIT.1964.1053693.
Saad, E. W., D. V. Prokhorov, and D. C. Wunsch. 1998. “Comparative study of stock trend
prediction using time delay, recurrent and probabilistic neural networks.” IEEE Transac-
tions on Neural Networks 9 (6): 1456–1470.
Salamon, Peter, Richard Frost, and Paolo Sibani. 2002. Facts, Conjectures, and Improvements
for Simulated Annealing. Philadelphia, PA, USA: Society for Industrial and Applied Math-
ematics.
Shan, Songqing, and G. Gary Wang. 2010. “Survey of modeling and optimization
strategies to solve high-dimensional design problems with computationally-expensive
black-box functions.” Structural and Multidisciplinary Optimization 41 (2): 219–241.
http://dx.doi.org/10.1007/s00158-009-0420-2.
Sipos, I. Robert, and Janos Levendovszky. 2013. “Optimizing sparse mean reverting portfolios.”
Algorithmic Finance 2 (2): 127–139.