Forecasting in Big Data Environments
Ali Habibnia
Department of Statistics
London School of Economics
May 27th, 2017 (K. N. Toosi University of Technology)
High-dimensional Nonlinear Time Series Analysis
- Nonlinear Forecasting Using a Large Number of Predictors
- Forecasting in Big Data Environments with a Shrinkage Estimation of Skip-layer Neural Networks
- Past, Present and Future of Testing for Nonlinearity in Time Series
Motivation and Inspirations
Designing novel statistical techniques to model the complexity of large datasets.
- Curse of dimensionality → blessing of dimensionality
- Relaxing unrealistic assumptions of the classical models
- A resurgence in the field of artificial neural networks
Research Question
Is it possible to forecast with a high-dimensional panel of predictors while accounting for nonlinear dynamics among variables?
- Curse of dimensionality
  - Feature extraction
  - Feature selection
- To model complex and nonlinear data
  - Parametric nonlinear regression models
  - Nonparametric and semiparametric models
  - ANN, kernel-based methods & tree-based regression models
Nonlinearity in (Financial) Series
Nonlinearity Between (Financial) Series
Nonlinear Forecasting Using a Large Number of Predictors
Linear (statistical) factor models:
Given a high-dimensional matrix of stationary time series (e.g. financial returns), denoted by $x_{it}$ ($i = 1, \dots, m$; $t = 1, \dots, T$):
- Factor estimation step (PCA, MLE, Kalman filter, ...). PCA finds the projection such that the best linear reconstruction of the data is as close as possible to the original data:
$$x_{it} = \lambda_i' u_t + \xi_{it}$$
- Forecasting step (see the sketch below):
$$\hat{x}_{iT+1|T} = \hat{\beta}_i' \hat{u}_T$$
$$\hat{x}_{iT+1|T} = \hat{\lambda}_i' u_{T+1|T}$$
$$\hat{x}_{iT+1|T} = \hat{\lambda}_i' u_{T+1|T} + \hat{\xi}_{iT+1|T}$$
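As a rough illustration of the two steps above, here is a minimal Python/NumPy sketch (not the code used in the talk): factors are extracted by PCA via an SVD of the centred panel, and each series is forecast by regressing $x_{i,t+1}$ on the estimated factors, i.e. $\hat{x}_{iT+1|T} = \hat{\beta}_i' \hat{u}_T$. Variable names and the number of factors are illustrative.

```python
import numpy as np

def linear_factor_forecast(X, r):
    """One-step-ahead forecasts from a linear factor model.
    X : T x m array of stationary returns (rows are time periods).
    r : number of factors to extract by PCA.
    Returns an m-vector of forecasts x_hat_{T+1|T}."""
    Xc = X - X.mean(axis=0)                       # centre the panel
    # Factor estimation step: PCA via SVD of the centred data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    factors = U[:, :r] * S[:r]                    # T x r estimated factors u_hat_t
    loadings = Vt[:r].T                           # m x r loadings (the lambda_i)
    # Forecasting step: regress x_{i,t+1} on u_hat_t  (beta_i' u_hat_T)
    beta, *_ = np.linalg.lstsq(factors[:-1], Xc[1:], rcond=None)   # r x m
    return X.mean(axis=0) + factors[-1] @ beta    # forecasts for all m series
```

The loadings are computed only to show where $\lambda_i$ would come from; the predictive regression alone produces the forecast.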
Neural Networks: one of the oldest and one of the newest areas
- The formulation of a multilayer feedforward neural network model with more than one hidden layer (hidden layers $h = 1, \dots, M$) can be generalized to
$$y_t = \Phi(x; w) = \phi_k\!\left[\sum_{h=1}^{M} \phi_h\!\left(\cdots\,\phi_j\!\left(\sum_{i=1}^{m} x_{it} w_{ij}\right)\cdots\right) w_{hk}\right] + \varepsilon_t$$
A generic forward-pass sketch is given below.
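The nested sums above are just repeated applications of "affine map, then transfer function". A generic forward pass in Python/NumPy might look like the following (a sketch; layer sizes, weights, and transfer functions are placeholders, and bias terms are omitted for brevity):

```python
import numpy as np

def feedforward(x, weights, phis, phi_out=lambda z: z):
    """Forward pass of a multilayer feedforward network y_t = Phi(x; w).
    x       : m-vector of inputs x_t
    weights : list of weight matrices, one per layer (input -> first hidden,
              ..., last hidden -> output)
    phis    : list of hidden-layer transfer functions phi_h (e.g. np.tanh)
    phi_out : output transfer function phi_k (identity here)."""
    a = x
    for W, phi in zip(weights[:-1], phis):
        a = phi(a @ W)                  # phi_h(sum_i a_i w_ih) for each hidden layer
    return phi_out(a @ weights[-1])     # phi_k applied to the output layer
```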
- To show that neural network models can be seen as a generalization of linear models, we allow for direct connections from the input variables to the output layer and assume that the output transfer function $\phi_k(\cdot)$ is linear; the model then becomes
$$y_t = \sum_{i=1}^{m} x_{it} w_{ik} + \sum_{j=1}^{J} \phi_j\!\left(\sum_{i=1}^{m} x_{it} w_{ij}\right) w_{jk} + \varepsilon_t$$
(a code sketch of this skip-layer forward pass follows).
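A minimal sketch of this skip-layer forward pass (assuming a tanh hidden transfer function and ignoring bias terms; names are illustrative):

```python
import numpy as np

def skip_layer_forward(x, W_ij, w_jk, w_ik, phi=np.tanh):
    """Single-hidden-layer network with skip-layer connections.
    x    : m-vector of inputs x_t
    W_ij : m x J input-to-hidden weights
    w_jk : J-vector of hidden-to-output weights
    w_ik : m-vector of direct input-to-output (skip-layer) weights."""
    linear_part = x @ w_ik               # sum_i x_it w_ik : the linear model part
    hidden = phi(x @ W_ij)               # phi_j(sum_i x_it w_ij)
    nonlinear_part = hidden @ w_jk       # sum_j phi_j(.) w_jk
    return linear_part + nonlinear_part  # y_t (the error term is added in the model)
```

Setting every $w_{jk}$ to zero recovers a plain linear regression, which is exactly the sense in which the model nests the linear case.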
Nonlinear generalisation of factor models
- Factor estimation using neural network PCA
Figure: Schematic diagram of the standard autoassociative neural network architecture for calculating nonlinear principal component analysis (NLPCA).
Nonlinear generalisation of factor models
- Factor estimation using neural network PCA (a training sketch follows the equations)
$$y_t^{(x)} = \phi_j^{(x)}\!\left(z^{(x)}\right), \qquad u = \phi_k^{(x)}\!\left(y_t^{(x)}\right)$$
$$y_t^{(u)} = \phi_j^{(u)}\!\left(z^{(u)}\right), \qquad \tilde{x}_t = \phi_k^{(u)}\!\left(y_t^{(u)}\right)$$
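One way to read these four equations is as the encoder (mapping and bottleneck layers) and decoder (demapping and output layers) of an autoassociative network trained to reproduce its own input. Below is a minimal PyTorch sketch under that reading; the layer sizes, optimiser, and number of epochs are illustrative, not the settings used in the talk.

```python
import torch
import torch.nn as nn

# Autoassociative (bottleneck) network for NLPCA: m -> hidden -> r -> hidden -> m.
# The r-dimensional bottleneck activations play the role of the nonlinear factors u_t.
m, hidden, r = 418, 20, 3            # illustrative sizes

encoder = nn.Sequential(nn.Linear(m, hidden), nn.Tanh(), nn.Linear(hidden, r))
decoder = nn.Sequential(nn.Linear(r, hidden), nn.Tanh(), nn.Linear(hidden, m))

def train_nlpca(X, epochs=200, lr=1e-3):
    """X: T x m float tensor of centred returns. Minimises the reconstruction MSE
    and returns the estimated nonlinear factors u_t."""
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        u = encoder(X)                 # nonlinear factors u_t
        x_tilde = decoder(u)           # reconstruction x_tilde_t
        loss = ((x_tilde - X) ** 2).mean()
        loss.backward()
        opt.step()
    return encoder(X).detach()
```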
- Nonlinear forecasting step (see the sketch below)
$$\hat{x}_{iT+1|T} = \Phi\!\left(\hat{u}^{(NL)}_{T}\right)$$
$$\hat{x}_{iT+1|T} = \phi_k^{(u_{NL})}\!\left(\phi_j^{(u_{NL})}\!\left(u^{(NL)}_{T+1|T}\right)\right)$$
$$\hat{x}_{iT+1|T} = \phi_k^{(u_{NL})}\!\left(\phi_j^{(u_{NL})}\!\left(u^{(NL)}_{T+1|T}\right)\right) + \hat{\xi}^{(NL)}_{iT+1|T}$$
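The slides do not spell out how $u^{(NL)}_{T+1|T}$ is obtained, so the sketch below simply fits a first-order linear autoregression to the estimated factors and then maps the factor forecast back through the decoder, i.e. $\phi_k^{(u_{NL})}(\phi_j^{(u_{NL})}(u^{(NL)}_{T+1|T}))$. It reuses the encoder/decoder objects from the previous sketch; the VAR(1) choice is an assumption.

```python
import numpy as np
import torch

def nonlinear_factor_forecast(U, decoder):
    """U : T x r NumPy array of estimated nonlinear factors (e.g. the output of
    train_nlpca converted with .numpy()). Returns the m-vector x_hat_{T+1|T}."""
    A, *_ = np.linalg.lstsq(U[:-1], U[1:], rcond=None)   # VAR(1): u_{t+1} ~ u_t A
    u_next = U[-1] @ A                                    # u_hat_{T+1|T}
    with torch.no_grad():
        x_next = decoder(torch.tensor(u_next, dtype=torch.float32))
    return x_next.numpy()                                 # x_hat_{iT+1|T} for all i
```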
Shrinkage Estimation of Skip-layer Neural Networks
$$y_t = \Phi(x; w) = \sum_{i=1}^{m} x_{it} w_{ik} + \sum_{j=1}^{J} \phi_j\!\left(\sum_{i=1}^{m} x_{it} w_{ij}\right) w_{jk} + \varepsilon_t$$
Figure: A single-hidden-layer neural network with skip-layer connections (inputs $x_{it}, \dots, x_{mt}$; hidden units $j$; output unit $k$; input-to-hidden weights $w_{ij}$, hidden-to-output weights $w_{jk}$, and skip-layer weights $w_{ik}$).
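The shrinkage idea is to estimate the skip-layer network by minimising the in-sample squared error plus a penalty on all of the weights, so that irrelevant inputs and hidden units are shrunk towards zero. The slides do not state the exact penalty, so the sketch below uses an L1 penalty on every weight as one illustrative choice; names are placeholders.

```python
import numpy as np

def shrinkage_loss(y, X, W_ij, w_jk, w_ik, lam=0.1, phi=np.tanh):
    """Penalised objective for the skip-layer network.
    y   : T-vector of targets y_t
    X   : T x m matrix of inputs
    lam : shrinkage parameter (chosen e.g. by cross-validation)."""
    y_hat = X @ w_ik + phi(X @ W_ij) @ w_jk        # skip-layer forward pass
    mse = np.mean((y - y_hat) ** 2)                # fit term
    all_w = np.concatenate([W_ij.ravel(), w_jk.ravel(), w_ik.ravel()])
    return mse + lam * np.sum(np.abs(all_w))       # L1 shrinkage on every weight
```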
Empirical analysis
- The data are daily returns of $m = 418$ equities in the S&P 500 index from 04.01.2005 through 31.12.2014.
- We calculate 1-step-ahead (here, one-day-ahead) forecasts of the targets ($\hat{x}_{i,t+1|t}$, the return series to be forecast) based on a rolling (moving) estimation window, as in the sketch below.
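A minimal sketch of the rolling-window exercise (the window length and the inner forecasting function are placeholders; any of the factor-model sketches above could be plugged in as forecast_fn):

```python
import numpy as np

def rolling_forecasts(X, window, forecast_fn):
    """1-step-ahead forecasts from a rolling (moving) estimation window.
    X           : T x m panel of daily returns
    window      : length of the estimation window
    forecast_fn : callable mapping a window x m array to an m-vector forecast."""
    T, m = X.shape
    forecasts = np.full((T, m), np.nan)
    for t in range(window, T):
        # Forecast of x_t made using only the window ending at t-1
        forecasts[t] = forecast_fn(X[t - window:t])
    return forecasts
```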
Fraction of the variance explained by the first three PCs
Comparison of linear and nonlinear factor models based on the performance of the portfolio simulation during an out-of-sample period

Table:
Model          Portfolio Return   Sharpe Ratio
FM(u_t)        4.35%              9.1770
FM(u_{t+1})    7.51%              17.4927
NLFM(u_t)      7.87%              25.0019
NLFM(u_{t+1})  7.41%              18.8963
Comparison of linear and nonlinear factor models and the benchmark models based on the performance of the portfolio simulation

(a) Linear and nonlinear factor models against an investment in the S&P 500 index
Comparison of linear and nonlinear factor models and the benchmark models based on the performance of the portfolio simulation

(b) Linear and nonlinear factor models against a random walk
Comparison of linear and nonlinear factor models, and the models with only one nonlinear step, based on the performance of the portfolio simulation

Model                                 Portfolio Return   Sharpe Ratio
Linear FM                             7.51%              17.4927
Nonlinear FM                          7.87%              25.0019
Nonlinear in factor estimation step   7.61%              18.6259
Nonlinear in forecast equation step   6.83%              17.8577
Comparison of linear and nonlinear factor models and the hybrid model based on the performance of the portfolio simulation during an out-of-sample period

Table:
Model          Portfolio Return   Sharpe Ratio
Linear FM      7.51%              17.4927
Nonlinear FM   7.87%              25.0019
Hybrid model   9.32%              19.2152
How can we achieve a hierarchical order? Scholz, Fraunholz, and Selbig (2008)
- A hierarchical order essentially yields uncorrelated components.
- In principle, hierarchy can be achieved in two strongly related ways: either by a constraint on the variance in the component space or by a constraint on the squared reconstruction error in the original space.
- The solution is to use only one network with a hierarchy of subnetworks.
- Both the error $E_1$ of the subnetwork with one component and the error $E_{1,2}$ of the total network with two components are estimated in each iteration. The network weights are then adapted at once with regard to the total hierarchical error $E_H = E_1 + E_{1,2}$ (see the sketch after this list).
- Minimising the hierarchical error: $E_H = E_1 + E_{1,2}$.
- To find the optimal network weights for a minimal error in the h-NLPCA, the conjugate gradient descent algorithm is used.
- The gradient is the sum of the individual gradients.
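In code, the hierarchical error is just the sum of the reconstruction errors obtained when the decoder is fed only the first component and then the first two components. A minimal sketch, reusing the encoder/decoder objects from the NLPCA sketch above (the conjugate-gradient optimiser mentioned in the slides is not reproduced here):

```python
import torch

def hierarchical_error(X, encoder, decoder):
    """E_H = E_1 + E_{1,2}: reconstruction error with one component plus
    reconstruction error with the first two components."""
    u = encoder(X)
    mask1 = torch.zeros_like(u); mask1[:, 0] = 1.0     # keep only component 1
    mask12 = torch.zeros_like(u); mask12[:, :2] = 1.0  # keep components 1 and 2
    E1 = ((decoder(u * mask1) - X) ** 2).mean()
    E12 = ((decoder(u * mask12) - X) ** 2).mean()
    return E1 + E12                                    # minimised jointly each iteration
```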
How to get the variance that is explained by NLPCA? Monahan (2001)
- Nonlinear PCA works by minimising the mean squared error (MSE), which can be seen as minimising the remaining variance and hence indirectly maximising the variance explained (covered) by the components.
- In NLPCA the explained variance of a component is estimated as the variance of the data reconstructed using only one (or more than one) component, normalised by the total variance; a sketch follows.
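A sketch of that calculation, again reusing the encoder/decoder objects from the NLPCA sketch: reconstruct the data from the first k components only and compare the variance of the reconstruction with the total variance.

```python
import torch

def explained_variance(X, encoder, decoder, k):
    """Share of total variance explained by the first k nonlinear components."""
    with torch.no_grad():
        u = encoder(X)
        mask = torch.zeros_like(u)
        mask[:, :k] = 1.0                     # use only the first k components
        X_rec = decoder(u * mask)
        return (X_rec.var(dim=0).sum() / X.var(dim=0).sum()).item()
```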
How to find the optimal number of factors? Scholz (2012)
- When we use hierarchical NLPCA, the factors are ranked/ordered as in classical linear PCA (by their explained variance), so the number of factors can be chosen to reach a desired explained variance.
- Use missing-data validation to estimate the best number of factors. This is motivated by the idea that only the model of optimal complexity is able to predict missing values with the highest accuracy (a sketch follows this list).
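A simplified sketch of the missing-data validation idea: hold out a random fraction of entries, fit the network on the observed entries only, and score the candidate number of factors by the reconstruction error on the held-out entries. Note this is a simplification of Scholz (2012), where an inverse-NLPCA variant handles the missing values; build_nlpca is a hypothetical helper returning a fresh encoder/decoder pair with an r-dimensional bottleneck.

```python
import torch

def missing_data_score(X, build_nlpca, r, frac=0.1, epochs=200, lr=1e-3):
    """Validation error for a candidate number of factors r (lower is better).
    X : T x m float tensor of centred returns."""
    mask = torch.rand_like(X) < frac                   # entries treated as missing
    encoder, decoder = build_nlpca(r)                  # hypothetical helper
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        X_rec = decoder(encoder(X))
        loss = ((X_rec - X)[~mask] ** 2).mean()        # fit the observed entries only
        loss.backward()
        opt.step()
    with torch.no_grad():
        X_rec = decoder(encoder(X))
        return ((X_rec - X)[mask] ** 2).mean().item()  # error on held-out entries
```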