Content uploaded by Ali Habibnia on Aug 17, 2017. Content may be subject to copyright.

Research Question

Is it possible to forecast with a high-dimensional panel of predictors while accounting for nonlinear dynamics among the variables?

- Curse of dimensionality
  - Feature extraction
  - Feature selection
- To model complex and nonlinear data
  - Parametric nonlinear regression models
  - Nonparametric and semiparametric models
  - ANN, kernel-based methods & tree-based regression models

Nonlinear Forecasting Using a Large Number of Predictors

Linear (statistical) factor models:

Given a high-dimensional panel of stationary time series (e.g. financial returns), denoted by $x_{it}$ ($i = 1, \ldots, m$; $t = 1, \ldots, T$):

- Factor estimation step (PCA, MLE, Kalman filter, ...): PCA finds the projection such that the best linear reconstruction of the data is as close as possible to the original data,

  $x_{it} = \lambda_i' u_t + \xi_{it}$

- Forecasting step:

  $\hat{x}_{i,T+1|T} = \hat{\beta}_i' \hat{u}_T$

  $\hat{x}_{i,T+1|T} = \hat{\lambda}_i' \hat{u}_{T+1|T}$

  $\hat{x}_{i,T+1|T} = \hat{\lambda}_i' \hat{u}_{T+1|T} + \hat{\xi}_{i,T+1|T}$
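The two-step procedure above can be sketched with a linear PCA factor model. This is a minimal illustration, not the authors' implementation; the panel `X`, the number of factors `r`, and the one-step-ahead regression are all assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical panel: T observations of m stationary series (e.g. returns).
T, m, r = 200, 30, 3
X = rng.standard_normal((T, m))

# --- Factor estimation step (PCA) ---
# Principal components of the demeaned panel serve as factor estimates u_t.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
u = Xc @ Vt[:r].T            # estimated factors u_t, shape (T, r)
lam = Vt[:r].T               # estimated loadings lambda_i, shape (m, r)

# --- Forecasting step ---
# Regress x_{i,t+1} on the factors u_t, then forecast
# x_hat_{i,T+1|T} = beta_i' u_T for every series i.
Y = Xc[1:]                   # targets x_{t+1}
U_lag = u[:-1]               # regressors u_t
beta, *_ = np.linalg.lstsq(U_lag, Y, rcond=None)   # shape (r, m)
x_hat_T1 = u[-1] @ beta      # forecast for each series, shape (m,)
```

In practice the factors could equally be estimated by MLE or a Kalman filter, as the slide notes; PCA is used here only because it is the simplest of the listed options.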

Neural Networks: one of the oldest and one of the newest areas

- The formulation of a multilayer feedforward neural network with more than one hidden layer (hidden layers $h = 1, \ldots, M$) can be generalized to

  $y_t = \Phi(x; w) = \phi_k\left[\sum_{h=1}^{M} \phi_h\left(\cdots\phi_j\left(\sum_{i=1}^{m} x_{it}\, w_{ij}\right)\cdots\right) w_{hk}\right] + \varepsilon_t$
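The nested composition in the formula above is just a sequence of weighted sums passed through transfer functions. A minimal forward pass, assuming tanh hidden transfer functions and a linear output (both common but not specified in the slide), and arbitrary layer sizes chosen for illustration:

```python
import numpy as np

def forward(x, weights, phi=np.tanh):
    """Forward pass of a multilayer feedforward net:
    repeatedly apply a hidden transfer function phi to weighted
    sums of the previous layer, then a linear output layer."""
    a = x
    for W in weights[:-1]:
        a = phi(W @ a)        # hidden-layer transfer functions phi_h
    return weights[-1] @ a    # output layer (phi_k taken linear here)

rng = np.random.default_rng(1)
m, h1, h2 = 5, 8, 4           # input size and two hidden-layer widths
weights = [rng.standard_normal((h1, m)) * 0.1,
           rng.standard_normal((h2, h1)) * 0.1,
           rng.standard_normal((1, h2)) * 0.1]
x = rng.standard_normal(m)
y = forward(x, weights)       # scalar output y_t (shape (1,))
```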

- To show that neural network models can be seen as a generalization of linear models, we allow direct connections from the input variables to the output layer and assume that the output transfer function $\phi_k(\cdot)$ is linear; the model then becomes

  $y_t = \sum_{i=1}^{m} x_{it} w_{ik} + \sum_{j=1}^{J} \phi_j\left(\sum_{i=1}^{m} x_{it} w_{ij}\right) w_{jk} + \varepsilon_t$
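The nesting-of-linear-models point can be verified numerically: with direct input-output connections and a linear output, setting the nonlinear branch's output weights to zero recovers a pure linear model. A sketch (weights and sizes are arbitrary illustrations):

```python
import numpy as np

def nn_with_skip(x, w_direct, W_in, w_out, phi=np.tanh):
    """Single-hidden-layer net with direct input->output connections
    and a linear output transfer function."""
    linear_part = w_direct @ x          # sum_i x_it w_ik (direct connections)
    hidden = phi(W_in @ x)              # phi_j(sum_i x_it w_ij)
    return linear_part + w_out @ hidden # + sum_j phi_j(...) w_jk

rng = np.random.default_rng(2)
m, J = 5, 3
x = rng.standard_normal(m)
w_direct = rng.standard_normal(m)
W_in = rng.standard_normal((J, m))
w_out = np.zeros(J)                     # switch the nonlinear part off

# With w_out = 0 the model collapses to the linear model y = w_direct' x:
assert np.isclose(nn_with_skip(x, w_direct, W_in, w_out), w_direct @ x)
```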

Comparison of linear and nonlinear factor models, and of the models with only one nonlinear step, based on the performance of the portfolio simulation:

Model                                  Portfolio Return   Sharpe Ratio
Linear FM                              7.51%              17.4927
Nonlinear FM                           7.87%              25.0019
Nonlinear in factor estimation step    7.61%              18.6259
Nonlinear in forecast equation step    6.83%              17.8577
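For reference, a Sharpe ratio in its basic form is mean excess return over the standard deviation of excess returns; the table's exact convention (annualization factor, risk-free rate) is not stated, so this sketch uses the unscaled version with an illustrative return series:

```python
import numpy as np

def sharpe_ratio(returns, rf=0.0):
    """Basic Sharpe ratio: mean excess return divided by the sample
    standard deviation of excess returns (no annualization applied)."""
    excess = np.asarray(returns) - rf
    return excess.mean() / excess.std(ddof=1)

# Illustrative period returns (not the returns behind the table above):
r = np.array([0.01, -0.005, 0.02, 0.003, 0.012])
s = sharpe_ratio(r)
```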

How can we achieve a hierarchical order? Scholz, Fraunholz, and Selbig (2008)

- A hierarchical order essentially yields uncorrelated components.
- In principle, hierarchy can be achieved in two strongly related ways: either by a constraint on the variance in the component space or by a constraint on the squared reconstruction error in the original space.
- The solution is to use only one network with a hierarchy of subnetworks.
- Both the error $E_1$ of the subnetwork with one component and the error $E_{1,2}$ of the total network with two components are estimated in each iteration. The network weights are then adapted at once with regard to the total hierarchical error $E_H = E_1 + E_{1,2}$.
- To find the optimal network weights minimizing the hierarchical error in the h-NLPCA, the conjugate gradient descent algorithm is used; the gradient is the sum of the individual gradients.
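The hierarchical error $E_H = E_1 + E_{1,2}$ can be sketched directly: reconstruct the data once with the first component only, once with both components, and sum the two squared reconstruction errors. Here a linear PCA reconstruction stands in for the NLPCA subnetworks purely for illustration:

```python
import numpy as np

def pca_reconstruct(X, k):
    """Reconstruction of X from its first k principal components
    (a linear stand-in for the NLPCA subnetwork with k components)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu + (X - mu) @ Vt[:k].T @ Vt[:k]

def hierarchical_error(X, reconstruct):
    """Hierarchical error E_H = E_1 + E_{1,2}:
    E_1   -- squared reconstruction error using component 1 only,
    E_1,2 -- squared reconstruction error using components 1 and 2."""
    E1 = np.mean((X - reconstruct(X, 1)) ** 2)
    E12 = np.mean((X - reconstruct(X, 2)) ** 2)
    return E1 + E12

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 6))
EH = hierarchical_error(X, pca_reconstruct)
```

In the actual h-NLPCA, both error terms are computed from the same network's subnetworks in each iteration and the weights are updated against their summed gradient; the sketch only shows how the two terms combine.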

How to get the variance that is explained by NLPCA? Monahan, A. H. (2001)

- Nonlinear PCA works by minimizing the mean squared error (MSE), which can be seen as minimizing the remaining variance and hence indirectly maximizing the variance explained (covered) by the components.
- In NLPCA, the explained variance of a component is estimated as the variance of the data reconstructed using one or more components, normalized by the total variance.
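A sketch of that estimate, measuring explained variance indirectly through the remaining (residual) variance, as the first bullet suggests. A linear PCA reconstruction again stands in for the NLPCA network; the function itself applies to any reconstruction:

```python
import numpy as np

def explained_variance(X, X_hat):
    """Fraction of total variance covered by the components that
    produced the reconstruction X_hat: one minus the remaining
    (residual) variance normalized by the total variance."""
    total = np.sum(np.var(X, axis=0))
    remaining = np.sum(np.var(X - X_hat, axis=0))
    return 1.0 - remaining / total

# Illustration with a 2-component linear PCA reconstruction:
rng = np.random.default_rng(4)
X = rng.standard_normal((200, 5))
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
X_hat = mu + (X - mu) @ Vt[:2].T @ Vt[:2]
ev = explained_variance(X, X_hat)   # fraction in (0, 1)
```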

How to find the optimal number of factors? Scholz, M. (2012)

- When we use hierarchical NLPCA, the factors are ranked/ordered as in classical linear PCA (by desired explained variance).
- Use missing-data validation to estimate the best number of factors. This is motivated by the idea that only the model of optimal complexity is able to predict missing values with the highest accuracy.
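The missing-data validation idea can be sketched as follows: hide a random fraction of entries, fit a k-factor model on the rest, and score how well the hidden entries are predicted; the k with the lowest error wins. An iterated-SVD imputation serves here as a simple linear stand-in for the NLPCA model, and the panel, masking fraction, and iteration count are all illustrative assumptions:

```python
import numpy as np

def missing_data_score(X, k, frac=0.1, seed=0):
    """Missing-data validation score for k factors: hide a fraction of
    entries, impute them with a rank-k model (iterated SVD), and return
    the mean squared prediction error on the hidden entries."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < frac            # entries to hide
    X_obs = X.copy()
    X_obs[mask] = np.nan
    # Initialize missing values with column means, then iterate:
    col_mean = np.nanmean(X_obs, axis=0)
    X_fill = np.where(mask, col_mean, X_obs)
    for _ in range(50):
        mu = X_fill.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_fill - mu, full_matrices=False)
        X_hat = mu + (U[:, :k] * s[:k]) @ Vt[:k] # rank-k reconstruction
        X_fill = np.where(mask, X_hat, X_obs)    # refill hidden entries
    return np.mean((X_hat[mask] - X[mask]) ** 2)

# Panel with two true factors plus noise; k = 2 should predict the
# hidden entries better than an underfitted k = 1 model:
rng = np.random.default_rng(5)
F = rng.standard_normal((150, 2))
L = rng.standard_normal((2, 10))
X = F @ L + 0.1 * rng.standard_normal((150, 10))
scores = {k: missing_data_score(X, k) for k in (1, 2, 3, 4)}
```

A model that is too simple cannot reconstruct the hidden structure, while an overly complex one fits noise; the validation error on hidden entries penalizes both, which is exactly the motivation stated above.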