
Pitfalls of the Cholesky Decomposition for forecasting

multivariate volatility

M. D. Heiden∗

July 23, 2015

Abstract: This paper studies the pitfalls of applying the Cholesky decomposition for forecasting multivariate volatility. We analyze one of the main issues arising in empirical applications of the decomposition: the sensitivity of the forecasts to the order of the variables in the covariance matrix. We find that, despite being frequently used to guarantee positive semi-definiteness and symmetry of the forecasts, the Cholesky decomposition has to be used with caution, as the ordering of the variables leads to significant differences in forecast performance. A possible solution is provided by studying an alternative, the matrix exponential transformation. We show that, in combination with an empirical bias correction, the forecasting accuracy of the two decompositions does not differ significantly. This makes the matrix exponential a valuable option, especially in larger dimensions.

Keywords: realized covariances; realized volatility; Cholesky decomposition; forecasting

JEL Classification Numbers: C1, C53, C58

∗Department of Statistics, Faculty of Business and Economics, University of Augsburg, Germany. Email: moritz.heiden@wiwi.uni-augsburg.de. Phone: +49 821 598-4027. The author gratefully acknowledges the support from the German Research Foundation (DFG) Project "Wishart Processes in Statistics and Econometrics: Theory and Applications".


1 Introduction

Forecasts of the covariance matrix are a crucial ingredient of many economic applications in asset and risk management. The concept of using realized covariances as a proxy for the unobservable volatility process has spread widely since high-frequency financial data became available and more accessible. Recent approaches to forecasting the realized covariance (RCOV) matrix are based upon uni- or multivariate time series models. To ensure mathematical validity of the forecasted RCOV matrix, such as symmetry and positive semi-definiteness, either parameter restrictions or decompositions are used. The latter are preferred to guarantee parsimony, especially if dimensions are large.

Recent multivariate approaches that ensure symmetry and positive semi-definiteness of the RCOV matrix include the Wishart Autoregressive (WAR) model proposed by Gouriéroux et al. (2009) and its dynamic generalization, the Conditional Autoregressive Wishart model by Golosnoy et al. (2012). Chiriac (2010) shows that the WAR estimation is very sensitive to assumptions on the underlying data, causing degenerate Wishart distributions and affecting the estimation results. Consequently, Chiriac & Voev (2011) choose the route of transformation and base their Vector Autoregressive Fractionally Integrated Moving Average (VARFIMA) model on a Cholesky decomposition (CD) of the covariance matrix. Bauer & Vorkink (2011) instead transform the covariance matrix by the matrix exponential transformation (MET) and use a factor model approach for the individual components. Andersen et al. (2006) and Colacito et al. (2011) modify the Dynamic Conditional Correlation (DCC) model of Engle (2002), splitting up variances and covariances in the modeling process. A similar approach is implemented in Halbleib & Voev (2011a), where the authors suggest a mixed data sampling method based on low-frequency estimators for the correlations.

Usually, the choice of method can be motivated by the unique properties of the respective decomposition. For example, while the elements of the CD are explicitly linked to the entries of the original RCOV matrix, this is not the case for the MET. On the other hand, neither the CD nor the MET separates variances and covariances. This can be achieved by applying a DCC-type decomposition, allowing for more flexibility in the modeling process. Moreover, as Halbleib & Voev (2011a) point out, high-frequency estimators for the whole covariance matrix are often noisy, and simple methods to reduce microstructure noise, such as sparse sampling (see Andersen et al. (2003)), are not applicable in large dimensions.

Overall, due to its simplicity, the CD remains the most frequently used method in the literature, despite the problem that each permutation of the elements in the original matrix yields a different decomposition. The problem is well known in the literature on Vector Autoregression (VAR), where the VAR is usually identified using the corresponding CD to derive the dynamic response of each variable to an orthogonal shock (see e.g. Sims (1980)). Keating (1996) shows that the ordering of the variables is crucial for obtaining structural impulse responses, which is only possible if the system of equations is partially recursive. The gravity of the problem for VAR-based approaches has also been pointed out by Klößner & Wagner (2014), who analyze the extent to which measured spillovers are influenced by the order of the variables.

In this paper, we focus on the impact of the ordering of the assets in the original covariance matrix on the forecasts if a CD is used. Since the number of possible permutations grows very fast with an increasing number of assets, it is computationally burdensome to calculate and compare forecasts from every ordering. Therefore, we evaluate the predictive accuracy of all 720 permutations based on a small data set of six assets. Analyzing the loss distributions of two established loss functions, we find differences of up to 18% between the average loss of the best and worst model. Using the Model Confidence Set framework of Hansen et al. (2011), we show that these loss differences are indeed statistically significant, meaning that an arbitrary ordering may result in suboptimal forecasts and hence poor model choices. Additionally, we take a look at the impact of a simple empirical bias correction, as the forecasts from both the CD and the MET are biased by construction. We show that using the ordering-invariant MET and applying the bias correction provides a possible solution to the ordering problem, as its forecasts are not significantly different from those of the best CD.


2 Decomposition of the realized covariance matrix

Let $R_t$ be the $N \times 1$ vector of log returns of a portfolio of $N$ stocks over each of $T$ periods (days):

$$R_t = p(t) - p(t-1),$$

with $p(t) = (p_{1t}, \dots, p_{Nt})'$ being the vector of log prices at time $t \in [1, \dots, T]$.

Assuming there are $M$ equally spaced intra-day observations, the $i$-th intra-day return for the $t$-th period is

$$r_{i,t} \equiv p\Big((t-1) + i\tfrac{1}{M}\Big) - p\Big((t-1) + (i-1)\tfrac{1}{M}\Big), \quad (1)$$

with $i = 1, \dots, M$. According to Barndorff-Nielsen & Shephard (2002), the $N \times N$ RCOV matrix for the $t$-th period is then defined as

$$Y_t = \sum_{i=1}^{M} r_{i,t} r_{i,t}', \quad (2)$$

which is a consistent estimator for the conditional variance-covariance matrix of the log returns, $\mathrm{Var}[R_t \mid \mathcal{F}_{t-1}] = \Sigma_t$. The estimator can be refined to reduce market microstructure noise (e.g. Hayashi & Yoshida (2005); Zhang et al. (2005)) and to account for jumps (Christensen & Kinnebrock, 2010). The issue of asynchronicity of the data can be addressed by methods such as linear or previous-tick interpolation (Dacorogna, 2001) and subsampling (Zhang, 2011), which are easy to implement in empirical work. More complex procedures are often based on multivariate realized kernels (see e.g. Barndorff-Nielsen et al. (2008)). However, as Halbleib & Voev (2011a) point out, these methods are still limited in application, as they may lead to data loss or do not guarantee positive definiteness.


2.1 Cholesky decomposition

The CD decomposes a real, positive definite¹ matrix into the product of a real upper triangular matrix and its transpose (Brezinski, 2006). The Cholesky decomposition of the naturally symmetric and positive semi-definite $Y_t$, with $P_t$ being an upper triangular matrix, yields:

$$Y_t = \begin{pmatrix} y_{11,t} & y_{12,t} & \cdots & y_{1N,t} \\ y_{12,t} & y_{22,t} & \cdots & y_{2N,t} \\ \vdots & \vdots & \ddots & \vdots \\ y_{1N,t} & \cdots & \cdots & y_{NN,t} \end{pmatrix} \quad (3)$$

$$= \begin{pmatrix} p_{11,t} & 0 & \cdots & 0 \\ p_{12,t} & p_{22,t} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ p_{1N,t} & p_{2N,t} & \cdots & p_{NN,t} \end{pmatrix} \begin{pmatrix} p_{11,t} & p_{12,t} & \cdots & p_{1N,t} \\ 0 & p_{22,t} & \cdots & p_{2N,t} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & p_{NN,t} \end{pmatrix} \quad (4)$$

$$= P_t' P_t.$$

The elements $p_{ij,t}$, $i, j = 1, \dots, N$, $i \le j$, are real and can be calculated recursively by

$$p_{ij,t} = \begin{cases} \dfrac{1}{p_{ii,t}} \left( y_{ij,t} - \sum_{k=1}^{i-1} p_{ki,t}\, p_{kj,t} \right) & \text{for } i < j, \\[1ex] \sqrt{y_{jj,t} - \sum_{k=1}^{j-1} p_{kj,t}^{2}} & \text{for } i = j, \\[1ex] 0 & \text{for } i > j. \end{cases} \quad (5)$$

In reverse, the realized covolatilities can be expressed in terms of the Cholesky elements:

$$y_{ij,t} = \sum_{\ell=1}^{\min\{i,j\}} p_{\ell i,t}\, p_{\ell j,t}. \quad (6)$$

¹Or positive semi-definite if the condition of strict positivity for the diagonal elements of the triangular matrix is dropped.
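To make the recursion concrete, the following is a minimal Python sketch of equation (5), computing the upper triangular factor $P$ with $Y = P'P$; the function name and the use of numpy are illustrative and not part of the original text.

```python
import numpy as np

def cholesky_upper(Y):
    """Upper triangular P with Y = P'P, computed via the recursion (5)."""
    N = Y.shape[0]
    P = np.zeros((N, N))
    for j in range(N):
        for i in range(j + 1):
            s = sum(P[k, i] * P[k, j] for k in range(i))
            if i == j:
                P[i, j] = np.sqrt(Y[j, j] - s)        # diagonal case i = j
            else:
                P[i, j] = (Y[i, j] - s) / P[i, i]     # off-diagonal case i < j
    return P
```

For any positive definite input, `np.allclose(P.T @ P, Y)` then holds.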


Since in practice modeling is carried out on the elements of the CD, one of the problems visible in equation 5 is the influence of the ordering of the variables in the covariance matrix. Consider, for example, swapping the positions of the first and second asset in the return vector. As a result, the elements in the first and second row of the matrix in equation 3 swap positions. Due to the recursive calculation of the elements in $P_t$, the corresponding Cholesky elements in the first and second row of $P_t$ will not merely be swapped, but completely change magnitude. Using the CD for an $N \times N$ portfolio, there are $N!$ possible permutations of the stocks in the matrix, resulting in different decompositions that are nonlinearly related to each other. Hence, the resulting time series of Cholesky elements $p_{ij,t}$ differ between the decompositions. For all models based on the CD, this may lead to varying model choices, parameter estimates and forecasts.
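This sensitivity is easy to illustrate numerically. The following minimal sketch uses a hypothetical 3×3 RCOV matrix and shows that permuting the assets changes the magnitudes of the Cholesky elements rather than merely permuting them:

```python
import numpy as np

Y = np.array([[1.0, 0.8, 0.3],
              [0.8, 2.0, 0.5],
              [0.3, 0.5, 1.5]])            # a symmetric, positive definite RCOV matrix

P = np.linalg.cholesky(Y).T                # upper triangular factor, Y = P'P

perm = [1, 0, 2]                           # swap the first two assets
P_perm = np.linalg.cholesky(Y[np.ix_(perm, perm)]).T

print(P)        # Cholesky elements under the original ordering
print(P_perm)   # the elements change magnitude, not merely position
```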

Another issue arises in obtaining forecasts $\hat{Y}_{t+1}$. Being a quadratic transformation of the forecast $\hat{P}_{t+1}$, the forecast $\hat{Y}_{t+1}$ may not be unbiased, even if the forecasts for $\hat{P}_{t+1}$ are. This problem is further illustrated in section 2.4.

Furthermore, an often desirable feature of covariance forecasting, namely the separation of variance and covariance dynamics, cannot be achieved by applying the CD directly to the covariance matrix. However, it is possible to first apply a DCC-type decomposition and a CD on the correlation matrix thereafter. In general, the nonlinear dependence of the elements in the decomposition can also be an advantage, as the dependency structure between the Cholesky elements can be studied and used for forecasting, see e.g. Brechmann et al. (2015).

2.2 Matrix exponential transformation

For the covariance matrix, the matrix exponential transformation (MET) was introduced together with the matrix logarithm function by Chiu et al. (1996). In mathematics, both operators are frequently used for solving first-order differential systems, see e.g. Bellman (1997).


For any real, symmetric matrix $A_t$, the matrix exponential transformation performs a power series expansion, resulting in a real, positive (semi-)definite matrix, in our case $Y_t$:

$$Y_t = \operatorname{Exp}(A_t) = \sum_{s=0}^{\infty} \frac{1}{s!} A_t^{s}, \quad (7)$$

with $A_t^{0}$ being the identity matrix of size $N \times N$ and $A_t^{s}$ the $s$-fold standard matrix product of $A_t$.

In reverse, a real, symmetric matrix $A_t$ can be obtained from $Y_t$ by the inverse of the matrix exponential function, the matrix logarithm $\operatorname{logm}(\cdot)$:

$$A_t = \begin{pmatrix} a_{11,t} & a_{12,t} & \cdots & a_{1N,t} \\ a_{12,t} & a_{22,t} & \cdots & a_{2N,t} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1N,t} & \cdots & \cdots & a_{NN,t} \end{pmatrix} = \operatorname{logm}(Y_t). \quad (8)$$

Again, a reasonable practical approach would be to model and forecast the elements $a_{ij,t}$, $i, j = 1, \dots, N$, and obtain valid covariance forecasts via equation 7. However, due to the power series expansion, the relationship between $Y_t$ and $A_t$ is not straightforward to interpret (see e.g. Asai et al. (2006)) and, similar to the CD in section 2.1, covariances and variances cannot be estimated separately. By applying models to $A_t$, and therefore doing the estimation and forecasting in log-volatility space, the retransformed forecasts for $Y_{t+1}$ will be biased by Jensen's inequality. The problem and possible solutions are illustrated in section 2.4.

Nevertheless, the MET has several advantages, especially related to factor models, where a certain factor structure is analyzed by principal component methods. It can be shown that under several conditions, in our case symmetry and positive semi-definiteness of $Y_t$, applying the matrix logarithm to obtain $A_t$ corresponds to decomposing $Y_t$ into its eigenvalues and eigenvectors (see Chiu et al. (1996)). Hence, the $A_t^{s}$ can be obtained more easily via the eigenvectors than by matrix multiplication as in equation 7. Further, as principal component analysis of the matrix $Y_t$ is also based upon the eigenvalue decomposition, restrictions on the structure of the covariance matrix models can be implemented directly while constructing the $A_t^{s}$, see e.g. Chiu et al. (1996) or Bauer & Vorkink (2011).
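The following is a minimal sketch of the matrix logarithm via the eigenvalue decomposition described above, together with a check that the transformation, unlike the CD, is invariant to the ordering of the assets; the matrix values are hypothetical.

```python
import numpy as np

def logm_sym(Y):
    """Matrix logarithm of a symmetric positive definite matrix via eigh."""
    eigvals, V = np.linalg.eigh(Y)         # Y = V diag(eigvals) V'
    return V @ np.diag(np.log(eigvals)) @ V.T

Y = np.array([[1.0, 0.8, 0.3],
              [0.8, 2.0, 0.5],
              [0.3, 0.5, 1.5]])

A = logm_sym(Y)                            # log-volatility space, equation (8)

perm = [1, 0, 2]                           # swap the first two assets
A_perm = logm_sym(Y[np.ix_(perm, perm)])

# unlike the Cholesky elements, the log-space elements are merely permuted
assert np.allclose(A_perm, A[np.ix_(perm, perm)])
```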

2.3 HAR model

One of the simplest and yet most successful univariate models for volatility forecasting is the Heterogeneous Autoregressive (HAR) model of Corsi (2009). It is inspired by the Heterogeneous Market Hypothesis (Müller et al., 1993), which amongst other things assumes that market participants act on different time horizons (dealing frequencies) due to their individual preferences, and therefore create volatility specifically on these horizons. Since in practice volatility over longer time intervals has a stronger influence on volatility over shorter time intervals than conversely (Corsi, 2009), the HAR models volatility by an additive cascade of volatility components in an autoregressive framework.

This leads to the following model for the daily realized volatilities $x_t$:

$$x_t = c + \beta^{(d)} x_{t-1}^{(d)} + \beta^{(w)} x_{t-1}^{(w)} + \beta^{(m)} x_{t-1}^{(m)} + \varepsilon_t, \qquad \varepsilon_t \overset{iid}{\sim} (0, \sigma^2), \quad (9)$$

where $x_t^{(\cdot)}$ is the realized volatility over the corresponding period of interest, one day (d), one week (w) and one month (m), defined as $x_t^{(d)} = x_t$, $x_t^{(w)} = 5^{-1} \sum_{i=1}^{5} x_{t-i+1}$ and $x_t^{(m)} = 22^{-1} \sum_{i=1}^{22} x_{t-i+1}$.
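As a concrete illustration, a minimal sketch of estimating equation (9) by OLS and producing a one-step-ahead forecast follows; the function name and input series are hypothetical.

```python
import numpy as np

def har_forecast(x):
    """Fit the HAR model (9) by OLS and return a one-step-ahead forecast."""
    x = np.asarray(x, dtype=float)
    d = x[21:-1]                                                             # daily lag x_{t-1}
    w = np.array([x[t - 4:t + 1].mean() for t in range(21, len(x) - 1)])     # weekly average
    m = np.array([x[t - 21:t + 1].mean() for t in range(21, len(x) - 1)])    # monthly average
    X = np.column_stack([np.ones_like(d), d, w, m])
    y = x[22:]                                                               # targets x_t
    beta = np.linalg.lstsq(X, y, rcond=None)[0]                              # OLS estimates
    z = np.array([1.0, x[-1], x[-5:].mean(), x[-22:].mean()])                # latest regressors
    return z @ beta
```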

The main advantages of the HAR are that it is easy to estimate within an OLS framework, its parameters are directly interpretable, and it reproduces volatility characteristics such as long memory without a fractional integration component. The latter is especially interesting, as the long-memory property could also stem from multifractal scaling², which can be captured by an additive component model such as the HAR, whereas fractionally integrated models imply univariate scaling (Andersen & Bollerslev, 1996). Under the Heterogeneous Market Hypothesis, multifractal scaling possesses a clear economic justification which is directly interpretable in the HAR framework due to the simple parameter structure (Corsi, 2009).

²The underlying process scales differently for various time horizons.

Regarding forecasting, standard methods for a general ARMA framework can be used to produce direct or iterated forecasts of the conditional volatility. In contrast to the conventional HAR model above, which is applied directly to a time series of realized volatilities, we use the model on the time series of the elements of the CD or the MET, replacing the components $x_t$ and $x_t^{(\cdot)}$ with the respective $p_{ij,t}$ or $a_{ij,t}$ from equations 4 and 8.

2.4 Forecasting and bias correction

To obtain forecasts of the RCOV matrix $\hat{Y}_{t+1}$, the forecasts $\hat{p}_{ij,t}$ or $\hat{a}_{ij,t}$ are obtained from the HAR model in section 2.3 and retransformed to the covariance domain via equations 6 and 7, respectively.

For the CD, this last transformation is nonlinear and induces a theoretical bias, which is derived in Chiriac & Voev (2011) and can be expressed by the covariances of the forecast errors $u_{\cdot,t+1}$ of the HAR model:

$$E[\hat{y}_{ij,t+1} - y_{ij,t+1}] = \sum_{\ell=1}^{\min\{i,j\}} E[u_{\ell i,t+1}\, u_{\ell j,t+1}]. \quad (10)$$

However, since we estimate the models independently of each other, this expression is not feasible, as we cannot consistently estimate the covariance matrix of the forecast errors. A heuristic approach to obtain unbiased predictions is suggested in Chiriac & Voev (2011) and further studied in Halbleib & Voev (2011b). In the original approach, due to the larger distortion of the volatilities, bias correction is only carried out on the series of realized volatilities $\hat{y}_{ii,t}$, $i \in 1, \dots, N$. However, as implied by equation 10, all elements of $\hat{Y}_{t+1}$ will be biased. Hence, an adaptation of the approach of Chiriac & Voev (2011) that corrects volatility and covariance forecasts can be obtained by:

$$\hat{y}_{(\text{corrected}),ij,t+1} = \hat{y}_{ij,t+1} \cdot \underset{t=1,\dots,n}{\operatorname{median}} \left( \frac{y_{ij,t}}{\hat{y}_{ij,t}} \right).$$

Note that the window length $n$ on which the median is estimated controls the trade-off between the bias and the precision of the correction. Since we are interested in the general effect of the bias correction, we simply estimate the median in the bias correction factor on a window length equal to our estimation window for the HAR model in section 3.
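A minimal sketch of this correction is given below; the function and array names are hypothetical.

```python
import numpy as np

def bias_correct(y_hat_next, y_window, y_hat_window):
    """Scale the forecast by the element-wise median ratio of realized to
    forecasted covariances over the estimation window.

    y_hat_next   : (N, N) forecast of the RCOV matrix for t+1
    y_window     : (n, N, N) realized covariance matrices, t = 1..n
    y_hat_window : (n, N, N) corresponding in-sample forecasts
    """
    factor = np.median(y_window / y_hat_window, axis=0)   # median_t (y / y_hat)
    return y_hat_next * factor
```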

In the case of the MET, the analytical bias correction is more complicated, but can be derived if $\hat{A}_t$ and the estimated residuals $\hat{\varepsilon}_t$ are both normally distributed, see Bauer & Vorkink (2011) for a detailed discussion. However, since normality is often not satisfied empirically, Bauer & Vorkink (2011) suggest an approach similar to Chiriac & Voev (2011). Their method decomposes the forecasted matrix of realized covariances $\hat{Y}_{t+1}$ into correlations and volatilities, bias correcting only the latter and leaving the correlations intact. For comparability, we apply our correction above, which works well in our empirical application for both the CD and the MET, see section 3. Note that bias correcting not only the volatilities but also the covariances bears the risk of the corrected RCOV matrix forecast no longer being positive semi-definite. In our application, however, this never occurs.

2.5 Loss functions and the MCS

According to Patton & Sheppard (2009), two issues are of major importance when comparing forecasts of the covariance matrix: first, tests have to be robust to noise in the volatility proxy, and second, they should require only minimal assumptions on the distribution of the returns. Therefore, we rely on the method of Hansen et al. (2011), using a model confidence set (MCS) approach based upon different loss functions to evaluate the multivariate volatility forecasts. This framework fulfills the requirements of Patton & Sheppard (2009) and has the advantage that we can conveniently compare forecasts from many models without using a benchmark. Furthermore, the MCS does not necessarily select a single best model, but allows for the possibility of equal forecasting ability across models. Hence, a model is only removed from the MCS if it is significantly inferior to other models, making the MCS more robust for comparing volatility forecasts.

For our approach, we choose two loss functions that satisfy the conditions of Hansen & Lunde (2006) for producing a consistent ranking in the multivariate case. Consistency in the context of loss functions means that the true ranking of the covariance models is preserved regardless of whether the true conditional covariance or an unbiased covariance proxy is used (Hansen & Lunde, 2006). For the comparison of forecasts of the whole covariance matrix, Laurent et al. (2013) present two families of loss functions that yield a consistent ordering. The first family, called p-norm loss functions, can be written as

$$L\big(Y_t, \hat{Y}_t\big)_p = \left( \sum_{i,j=1}^{N} |y_{ij,t} - \hat{y}_{ij,t}|^{p} \right)^{1/p}, \quad (11)$$

where $\hat{Y}_t$ is the forecast from our model for the actual RCOV matrix $Y_t$, which we use as a proxy for the unobservable covariance matrix $\Sigma_t$. The respective elements of the matrices are denoted by $y_{ij,t}$ and $\hat{y}_{ij,t}$. From this class, we consider the commonly used multivariate equivalent of the mean squared error (MSE) loss, $L\big(Y_t, \hat{Y}_t\big)_2^2$.

The second family, called eigenvalue loss functions, is based upon the square root of the largest eigenvalue of the matrix $(Y_t - \hat{Y}_t)^2$. We consider a special case of this family, the so-called James-Stein loss (James & Stein, 1961), usually referred to as the multivariate quasi-likelihood (QLIKE) loss function:

$$L\big(Y_t, \hat{Y}_t\big) = \operatorname{tr}\big(\hat{Y}_t^{-1} Y_t\big) - \ln\big|\hat{Y}_t^{-1} Y_t\big| - N, \quad (12)$$

where $N$ is the number of assets.
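A minimal sketch of both loss functions is given below; inputs are hypothetical $N \times N$ matrices, and the QLIKE implementation assumes the standard form with the inverse forecast, as in equation (12).

```python
import numpy as np

def mse_loss(Y, Y_hat):
    """Multivariate MSE: squared Frobenius (p = 2) loss of equation (11)."""
    return float(np.sum((Y - Y_hat) ** 2))

def qlike_loss(Y, Y_hat):
    """James-Stein / QLIKE loss of equation (12)."""
    N = Y.shape[0]
    M = np.linalg.solve(Y_hat, Y)              # \hat{Y}_t^{-1} Y_t
    _, logdet = np.linalg.slogdet(M)           # log determinant of the product
    return float(np.trace(M) - logdet - N)
```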

While both the MSE and the QLIKE loss function determine the optimal forecast based on the conditional expectation, Clements et al. (2009) point out that, compared to the MSE, the QLIKE has greater power in distinguishing between volatility forecasts within the MCS framework. As pointed out in Laurent et al. (2013), the QLIKE penalizes underpredictions more heavily than overpredictions. West et al. (1993) show that this is also relevant from an investor's point of view, as an underestimation of variances leads to lower expected utility than an equal amount of overestimation. Hence, for a risk-averse investor, punishing underpredictions more heavily seems rational when evaluating forecasting accuracy.

For the MCS approach, we start with the full set of candidate models $\mathcal{M}_0 = \{1, \dots, m_0\}$. For all models, the loss differential between each pair of models is computed based upon one of our loss functions $L^k$, $k = 1$ (MSE), $2$ (QLIKE), so that for models $i$ and $j$, $i, j = 1, \dots, m_0$, and every time point $t = 1, \dots, T$ we get:

$$d_{ij,t}^{k} = L^k\big(Y_t, \hat{Y}_{i,t}\big) - L^k\big(Y_t, \hat{Y}_{j,t}\big). \quad (13)$$

At each step of the evaluation, the hypothesis

$$H_0: \; E[d_{ij,t}^{k}] = 0, \quad \forall\, i > j \in \mathcal{M}, \quad (14)$$

is tested for a subset of models $\mathcal{M} \subseteq \mathcal{M}_0$, where $\mathcal{M} = \mathcal{M}_0$ in the initial step. If $H_0$ is rejected at a given significance level $\alpha$, the worst performing model is removed from the set. To give an impression of the scale of rejection, for each loss function and model, the respective $\alpha$ at which the model would be removed from the MCS can be computed.

This process continues until a set of models remains that cannot be rejected. Similar to Hansen et al. (2011), we use the range statistic to evaluate $H_0$, which can be written as:

$$T_R = \max_{i,j \in \mathcal{M}} |t_{ij,k}| = \max_{i,j \in \mathcal{M}} \frac{\big|\bar{d}_{ij,k}\big|}{\sqrt{\widehat{\operatorname{var}}\big(\bar{d}_{ij,k}\big)}}, \quad (15)$$

where $\bar{d}_{ij,k} = \frac{1}{T} \sum_{t=1}^{T} d_{ij,t}^{k}$ and $\widehat{\operatorname{var}}(\bar{d}_{ij,k})$ is obtained from a block-bootstrap procedure, see Hansen et al. (2011), which we implement with 10000 replications and a block length varying from 20 to 50 to check the robustness of the results.

The worst performing model to be removed from the set $\mathcal{M}$ is selected as the model $i$ with

$$i = \arg\max_{i \in \mathcal{M}} \frac{\bar{d}_{i,k}}{\sqrt{\widehat{\operatorname{var}}\big(\bar{d}_{i,k}\big)}}, \quad (16)$$

where $\bar{d}_{i,k} = \frac{1}{m-1} \sum_{j \in \mathcal{M}} \bar{d}_{ij,k}$ and $m$ is the number of models in the current set $\mathcal{M}$.
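The following minimal sketch compresses a single elimination step into the range statistic of equation (15), the worst-model rule of equation (16), and a simple moving-block bootstrap; the function name, the array layout, and the bootstrap details are illustrative assumptions rather than the exact implementation of Hansen et al. (2011).

```python
import numpy as np

def mcs_step(losses, n_boot=1000, block=20, seed=0):
    """One elimination step: worst model (eq. 16) and p-value of H0 (eq. 14).

    losses : (T, m) array of per-period losses for m candidate models.
    """
    rng = np.random.default_rng(seed)
    T, m = losses.shape
    d = losses[:, :, None] - losses[:, None, :]        # (T, m, m) loss differentials
    d_bar = d.mean(axis=0)

    # moving-block bootstrap of the mean differentials
    n_blocks = -(-T // block)                          # ceil(T / block)
    boot = np.empty((n_boot, m, m))
    for b in range(n_boot):
        starts = rng.integers(0, T - block + 1, size=n_blocks)
        idx = (starts[:, None] + np.arange(block)).ravel()[:T]
        boot[b] = d[idx].mean(axis=0)
    var_ij = boot.var(axis=0)                          # bootstrap variance of d_bar

    off = ~np.eye(m, dtype=bool)                       # skip the zero diagonal
    T_R = np.max(np.abs(d_bar[off]) / np.sqrt(var_ij[off]))   # range statistic (15)

    # bootstrap distribution of the range statistic under H0 (recentered)
    t_boot = np.abs(boot - d_bar)[:, off] / np.sqrt(var_ij[off])
    p_value = float((t_boot.max(axis=1) >= T_R).mean())

    d_i = d_bar.sum(axis=1) / (m - 1)                  # average differentials (16)
    var_i = (boot.sum(axis=2) / (m - 1)).var(axis=0)
    worst = int(np.argmax(d_i / np.sqrt(var_i)))
    return worst, p_value
```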

3 Empirical study

3.1 Data and descriptive statistics

The dataset stems from the New York Stock Exchange (NYSE) Trade and Quotations (TAQ) database and corresponds to the one used in Chiriac & Voev (2011). It was obtained from the Journal of Applied Econometrics Data Archive. The original data file consists of all tick-by-tick bid and ask quotes on six stocks listed on the NYSE, the American Stock Exchange (AMEX) and the National Association of Securities Dealers Automated Quotation system (NASDAQ). The sample ranges from 9:30 EST until 16:00 EST over the period January 1, 2000 to July 30, 2008 and covers 2156 trading days. The included stocks are American Express Inc. (AXP), Citigroup (C), General Electric (GE), Home Depot Inc. (HD), International Business Machines (IBM) and JPMorgan Chase & Co. (JPM). The original tick-by-tick data has previously been transformed as follows. To obtain synchronized and regularly spaced observations, the previous-tick interpolation method of Dacorogna (2001) is used.

Then, log-midquotes are constructed from the bid and ask quotes by geometric averaging. $M = 78$ equally spaced 5-minute return vectors $r_{i,t}$ are computed from the log-midquotes. Daily open-to-close returns are computed as the difference between the log-midquotes at the end and the beginning of each day.

For each daily period $t = 1, \dots, 2156$, the series of daily RCOV matrices is constructed as in section 2 by summing the outer products of the 5-minute return vectors:

$$Y_t = \sum_{i=1}^{M} r_{i,t} r_{i,t}'. \quad (17)$$

This approach is further refined by a subsampling procedure to make the RCOV estimates more robust to microstructure noise and non-synchronicity (see Zhang (2011)). From the original data, 30 regularly $\Delta$-spaced subgrids are constructed with $\Delta = 300$ seconds, starting at seconds $1, 11, 21, \dots, 291$. For each subgrid, the log-midquotes are constructed and the RCOV matrix is obtained according to equation 17. The RCOV matrices are then averaged over the subgrids. To avoid noise from measuring overnight volatility, all computations are applied to open-to-close data. For the descriptive statistics and estimation purposes, all daily and intradaily returns are scaled by 100, so that the values refer to percentage returns.
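A minimal sketch of this subsampled estimator is given below; it assumes a hypothetical array of log-midquotes sampled once per second over the trading day.

```python
import numpy as np

def subsampled_rcov(log_prices, delta=300, n_subgrids=30):
    """Average the RCOV estimator (17) over n_subgrids offset subgrids.

    log_prices : (n_seconds, N) log-midquotes, one row per second.
    """
    step = delta // n_subgrids                  # 10-second offsets for the defaults
    N = log_prices.shape[1]
    rcov = np.zeros((N, N))
    for s in range(n_subgrids):
        grid = log_prices[s * step :: delta]    # one Delta-spaced subgrid
        r = np.diff(grid, axis=0)               # 5-minute return vectors
        rcov += r.T @ r                         # equation (17) on this subgrid
    return rcov / n_subgrids
```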

At each point in time $t$, we apply either the CD or the MET to the obtained RCOV matrix. Additionally, we take the logarithm of the elements on the diagonal to ensure positivity of the elements of the decomposition when applying the time-series models. Since the ordering of the assets in the original RCOV matrix is relevant for the CD, we use the basic alphabetic ordering of the individual stocks from section 3.1 for the initial descriptive analysis of the elements of the CD.

In general, the elements of both decompositions exhibit the same characteristics as the realized covariances, such as volatility clustering, right skewness, excess kurtosis and high levels of autocorrelation, see tables 2 and 3. All series appear to be stationary based on the Augmented Dickey-Fuller test.

3.2 Optimal ordering

If the ordering is indeed crucial for the forecast performance, the question arises whether there is any possibility to determine the optimal position of an asset in the original return vector before evaluating all permutations. According to equation 6, the forecasts in column $j$, $\{\hat{y}_{ij}\}_{i=1,\dots,j}$, only depend on the forecasted entries of the Cholesky matrix $P$ up to column $j$, $\{\hat{p}_{\ell i}\}_{i \le j}$. Hence, if an asset is moved from position $i = 1$ in the return vector to a position $i > 1$, the number of forecasted Cholesky elements that enter the calculation of the covolatility forecast increases with every increase in position. Intuitively, assets that are more correlated with each other should be placed after assets that are less correlated, so that their dependence is picked up by the Cholesky elements. Similarly, in the estimation of structural VARs, variables are often ordered by their degree of exogeneity from most to least exogenous, see e.g. Bernanke & Blinder (1992); Keating (1996). However, the CD is only useful for identifying the structural relationship under rather restrictive conditions, e.g. in the case of VAR modeling, if the underlying relationship is recursive. Based on our data set of equity returns, we cannot impose a structural relationship by means of economic theory. Nevertheless, we analyze the correlation structure of the realized variances of the six assets to identify possible linkages that might be helpful in ordering the assets. The full-sample correlation matrix of the time series of realized variances for the natural alphabetic ordering of the assets is given in figure 1.

In the left panel, the ordering of the elements in the return vector is used, while in the right panel the correlations are ordered based on the angular positions of the eigenvectors of the correlation matrix. This method is sometimes called "correlation ordering" (Friendly & Kwan, 2003) and places similar variables contiguously. The correlation matrix on the right shows which assets should be grouped together. Note that the correlations are not sorted by size, e.g. from highest to lowest average correlation. We now proceed to analyze two questions. First, do different orderings indeed yield significantly different forecasts? Second, does ordering the variables in the return vector according to the rule of correlation ordering produce superior forecasts?

[Figure 1 appears here: two heatmaps of the full-sample correlation matrix of the realized variances, with the correlation coefficients (in percent) printed inside the squares. The left panel uses the alphabetic order AXP, C, GE, HD, IBM, JPM; the right panel uses the eigenvector-angle order IBM, HD, GE, AXP, C, JPM.]

Figure 1: Correlation matrix of the original time series of realized variances. On the left, correlations are ordered by the alphabetic order in the return vector. On the right, correlations are ordered based on the angular positions of the eigenvectors of the correlation matrix. The estimate of the corresponding correlation coefficient is given inside each square.
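For illustration, a minimal sketch of such a correlation ordering follows; the angle convention mirrors one common implementation of the angular-order-of-eigenvectors idea and is an assumption, not taken from the paper.

```python
import numpy as np

def correlation_order(corr):
    """Order variables by the angular positions of the two leading eigenvectors."""
    _, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order
    e1, e2 = eigvecs[:, -1], eigvecs[:, -2]    # components of the two largest
    angles = np.arctan2(e2, e1)                # angular position of each variable
    return np.argsort(angles)                  # permutation grouping similar variables
```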

3.3 Modeling and forecasting procedure

For each decomposition and permutation of the assets, we apply the HAR model from section 2.3 to each time series of CD or MET elements. Since the MET is independent of the chosen permutation, we can use the resulting model as a benchmark. For the CD, we obtain 21 different models for each of the 720 permutations. We retain the last 200 observations of the dataset for one-step-ahead out-of-sample forecasting and estimate the models based on a moving window of 1956 observations. Forecasts of the RCOV matrix are then generated according to section 2.4.

First, for each permutation we evaluate the forecasts by means of the multivariate loss functions from section 2.5. For the CD and both loss functions, we take the average loss over time for each permutation to obtain a distribution of losses, see figure 2. The corresponding descriptive statistics are given in the upper half of table 4.

Figure 2: Average (over time) MSE (left) and QLIKE (right) loss density for all permutations. The red line marks the mean value.

The loss density of the MSE is multimodal and left skewed, with an average loss of 271.71. In comparison, the average loss of the MET is 334.21, which is 16% larger than the maximum

MSE loss of the CD. The difference between the largest and smallest average loss is 8%. The QLIKE loss density is more symmetric and only slightly right skewed, with an average loss of 0.60, compared to an average loss of 0.71 for the MET. Again, the MET QLIKE loss is 16% larger than the largest QLIKE loss of the CD. The standard deviation of the QLIKE losses is significantly smaller (p < 0.01) than that of the MSE losses, based on the Brown-Forsythe test. Still, the difference between the largest and smallest average loss is roughly 5%. Ranking the models from best to worst (smallest to largest average loss over time), we find that the ordering is not consistent across the loss functions. Evaluating the model performance over time instead of taking averages, the most frequent best model is the same for both loss functions, being best in 4 of the 200 out-of-sample forecasts. The most frequent worst model, on the other hand, differs between the two loss functions. For the MSE, one particular ordering is the worst model in 12 out of the 200 forecasts. In the case of the QLIKE, the most frequent worst ordering has the highest loss in 3 out of 200 times.

Returning to the question of whether the method of correlation ordering in section 3.2 is helpful in determining the best model ex ante, we list the worst and best orderings based on the average loss for both loss functions in table 1. To simplify the notation, we rename the assets by their position in the alphabetic return vector, namely AXP = 1, C = 2, GE = 3, HD = 4, IBM = 5 and JPM = 6.

         worst            best             ex-ante          ex-ante vs best
MSE      "3 2 1 6 4 5"    "5 4 3 1 6 2"    "5 4 3 1 2 6"    1.002
QLIKE    "3 1 5 4 2 6"    "6 2 3 4 5 1"    "5 4 3 1 2 6"    1.506

Table 1: Orderings with the highest and lowest average losses (without bias correction) based on the respective loss function. Ex-ante gives the order proposed by the method of correlation ordering. Additionally, the average loss of the ex-ante model relative to the best model is listed.

Surprisingly, the best model under the MSE loss function nearly coincides with the model

suggested by the method of correlation ordering, with only assets 2 and 6 switching positions. For the QLIKE loss, only asset 3 occupies the same position in the best model as in the ex-ante ordering. Regarding the size of the average loss, the ex-ante model's losses are only 0.2% larger than those of the best model based on the MSE loss, whereas for the QLIKE loss function the ex-ante losses are 50% larger. We statistically evaluate these differences in section 3.4. However, based on the mixed results from both loss functions, we cannot unambiguously establish a link between correlation ordering and forecasting results. Additionally, as pointed out before, the model ranking is highly time-varying. Evaluating the models at every point in time reveals that the ex-ante model has the lowest loss at exactly one point in time for both loss functions. Again, it seems that neither the ex-ante nor any other ordering consistently delivers the best forecasts.

Figure 3: Average (over time) MSE (left) and QLIKE (right) loss density for all permutations with bias correction. The red line marks the mean value.

In the case of the bias correction, the average loss densities for all permutations are significantly different (p < 0.01) from the ones without bias correction, based on the Kolmogorov-Smirnov (KS) test. In general, the bias correction decreases the average loss, see figure 3. Descriptive statistics are given in the lower half of table 4. Most notably, the standard deviation increases for the MSE, while in the case of the QLIKE the distribution becomes more right skewed. As a result, the difference between the largest and smallest average loss increases for both loss functions, to 17% (MSE) and 18% (QLIKE), respectively. The bias corrected average MET loss is 112.64 for the MSE and 0.18 for the QLIKE. Hence, the MET benefits heavily from the bias correction, making it a possible alternative to the CD to circumvent the ordering problem.

For each permutation, we test the distribution of losses over time of the bias corrected vs. the non-bias corrected forecasts using the KS test. In all cases, the loss distributions are significantly different at a level p < 0.01 and the mean loss (over time) of the bias corrected distribution is smaller than that of the non-bias corrected one. For the MSE, the worst model with bias correction is the same as the worst model without bias correction. Otherwise, the best and worst models are not the same as in the case of no bias correction. As before, the ranking of the average losses from best to worst is not consistent across the loss functions. Comparing the losses over time reveals a similar behavior as before, where the most frequent best and worst model varies across time.

3.4 Statistically testing forecast performance

To evaluate the significance of the loss differences across time, we test the losses of the permutations using the MCS procedure introduced in section 2.5. We are interested in several questions. First of all, are the forecasts from the models that are best and worst based upon the average loss significantly different from each other? Second, how well does the bias adjusted MET model perform compared to the best ordering? And third, is the ex-ante ordering significantly worse than the best model?

Starting with the first question, we find that for both loss functions the worst model can be rejected from the MCS at an α = 1% level of significance. In the case of bias correction, α decreases further. As mentioned in the literature, the QLIKE is also more discerning, leading to slightly lower levels of significance in both cases compared to the MSE. Comparing the non-corrected vs. the bias corrected forecasts, we find that the bias correction leads to significantly better forecasts for both loss functions (α = 1%). Overall, since the differences between the forecasts are indeed statistically significant, choosing the "wrong" ordering may lead to poor forecast performance, no matter which loss function is chosen.

Next, we consider only the case of bias correction. As we have seen, the MET average losses were well within the range of the average CD losses. If the MET losses are not significantly different from those of the best CD model, the MET with bias correction could be a valid alternative to avoid the ordering problem of the CD. The MET forecasts can only be rejected from the MCS at an α = 50% significance level for the MSE and an α = 69% significance level for the QLIKE. Hence, the forecasts from the best CD model and the MET are not significantly different from each other at any reasonable level of confidence.

Comparing the losses of the ex-ante ordering with the best model under the respective loss function, we find that for the QLIKE the losses are significantly different (α < 1%), while for the MSE the ex-ante model cannot be rejected from the MCS (α = 9%). Hence, deciding upon the ordering in advance does not yield a clear recommendation. The danger of arbitrarily choosing an ordering that might lead to poor forecasts and hence poor model choices cannot be assessed ex ante based on the methodology of correlation ordering.

4 Conclusion

In this paper, we empirically analyzed several issues arising from using the Cholesky decomposition (CD) for forecasting the realized covariance (RCOV) matrix. We studied the impact of the order of the variables in the covariance matrix on volatility forecasting, finding that different orderings do indeed lead to significantly different forecasts based on an MCS approach. Deciding upon the ordering in advance based on the angular positions of the eigenvectors of the correlation matrix does not lead to unambiguously better forecasting results. Further, we find that the best and worst models are not consistent over time, so that a clear recommendation as to which order to use is not at hand, even if forecasts are performed stepwise. A frequently used method of bias correction improves forecasting accuracy, but it also widens the difference between the best and worst model, so that the ordering problem worsens. On the other hand, bias corrected forecasts from another decomposition, the matrix exponential transformation (MET), show equal predictive ability and do not suffer from the ordering problem. Thus, for empirical applications, two conclusions can be drawn. If a reasonable order can be imposed on the elements of the covariance matrix, or if the connections between the elements of the decomposed covariance matrix are of interest, the CD is a rational choice. Otherwise, the application of the MET together with a bias correction is advised, be it for comparative reasons or simply to avoid the time consuming process of estimating all possible permutations of the CD.


A Appendix

A.1 Tables and figures

min    max    mean   sd     skew   kurt   ADF p-val   ACF lag 1   ACF lag 2

p11,t -1.31 2.03 0.30 0.58 0.10 2.23 0.01 0.88 0.86

p12,t -0.21 7.63 0.71 0.66 2.83 16.71 0.01 0.76 0.70

p22,t -1.13 2.16 0.2 0.53 0.29 2.33 0.01 0.89 0.87

p13,t -0.5 3.99 0.52 0.43 2.17 11.41 0.01 0.65 0.59

p23,t -0.28 2.68 0.37 0.3 1.99 9.69 0.01 0.61 0.57

p33,t -1.14 1.73 0.04 0.48 0.31 2.40 0.01 0.86 0.82

p14,t -0.7 3.71 0.55 0.46 2.01 9.44 0.01 0.64 0.59

p24,t -0.38 2.4 0.36 0.3 1.77 8.79 0.01 0.46 0.46

p34,t -0.53 2.76 0.28 0.26 1.69 10.30 0.01 0.5 0.45

p44,t -0.97 1.75 0.29 0.42 0.36 2.74 0.01 0.81 0.77

p15,t -0.43 3.09 0.45 0.35 1.96 9.95 0.01 0.53 0.49

p25,t -1.22 4.68 0.31 0.27 3.42 39.36 0.01 0.47 0.39

p35,t -0.43 2.13 0.26 0.22 1.82 10.20 0.01 0.44 0.42

p45,t -0.48 1.71 0.15 0.17 1.36 10.53 0.01 0.18 0.14

p55,t -1.11 1.56 0.00 0.46 0.58 2.88 0.01 0.86 0.82

p16,t -0.29 8.16 0.72 0.66 2.88 18.55 0.01 0.73 0.65

p26,t -0.19 5.77 0.58 0.43 2.26 15.50 0.01 0.64 0.6

p36,t -0.36 2.22 0.22 0.22 1.86 10.20 0.01 0.32 0.3

p46,t -0.91 1.19 0.14 0.18 0.63 6.63 0.01 0.17 0.11

p56,t -0.87 1.37 0.14 0.19 1.25 8.48 0.01 0.18 0.15

p66,t -1.21 2.2 0.14 0.52 0.3 2.39 0.01 0.89 0.86

Table 2: Descriptive statistics for the time series of the elements of the (alphabetic) Cholesky decomposition. Diagonal (log) time series are written in bold. Additionally, the p-value of the ADF test and the magnitudes of the first and second autocorrelation coefficients are given.


min    max    mean   sd     skew   kurt   ADF p-val   ACF lag 1   ACF lag 2

a11,t -2.72 3.63 0.32 1.13 0.13 2.19 0.01 0.88 0.86

a12,t -0.35 0.94 0.3 0.16 0.26 3.32 0.01 0.44 0.43

a22,t -2.28 4.35 0.3 1.07 0.33 2.36 0.01 0.9 0.87

a13,t -0.24 0.67 0.24 0.14 -0.08 3.06 0.01 0.28 0.26

a23,t -0.24 0.71 0.27 0.14 -0.02 2.9 0.01 0.38 0.29

a33,t -2.36 3.52 0.09 0.96 0.3 2.43 0.01 0.85 0.82

a14,t -0.34 0.73 0.2 0.13 0.04 3.08 0.01 0.24 0.2

a24,t -0.25 0.66 0.22 0.13 0.07 3 0.01 0.25 0.27

a34,t -0.27 0.66 0.22 0.13 -0.09 3.13 0.01 0.3 0.28

a44,t -2.01 3.63 0.65 0.84 0.35 2.77 0.01 0.81 0.76

a15,t -0.34 0.62 0.21 0.13 -0.18 3.27 0.01 0.22 0.18

a25,t -0.28 0.8 0.23 0.13 -0.04 3.17 0.01 0.29 0.23

a35,t -0.19 0.67 0.26 0.14 -0.06 2.79 0.01 0.31 0.29

a45,t -0.31 0.65 0.21 0.13 -0.08 3.18 0.01 0.21 0.17

a55,t -2.21 3.51 0.1 0.91 0.57 2.93 0.01 0.85 0.81

a16,t -0.16 0.99 0.29 0.16 0.55 3.7 0.01 0.45 0.39

a26,t -0.11 1.13 0.42 0.18 0.5 3.5 0.01 0.53 0.51

a36,t -0.32 0.63 0.23 0.13 0.02 2.97 0.01 0.21 0.2

a46,t -0.32 0.75 0.2 0.13 0.05 3.25 0.01 0.24 0.2

a56,t -0.33 0.62 0.21 0.13 0.01 3.07 0.01 0.2 0.17

a66,t -2.35 4.85 0.47 1.07 0.25 2.42 0.01 0.89 0.85

Table 3: Descriptive statistics for the time series of the elements of the matrix exponential transformation. Diagonal (log) elements are written in bold. Additionally, the p-value of the ADF test and the magnitudes of the first and second autocorrelation coefficients are given.


min max mean sd skew kurt median max/min MET alphabetic ex-ante

without bias correction

MSE 265.82 287.43 271.71 4.65 0.75 2.92 271.08 1.08 334.21 269.27 282.71

QLIKE 0.58 0.61 0.60 0.01 −0.28 3.47 0.60 1.05 0.71 0.59 0.60

with bias correction

MSE 130.29 152.47 136.43 5.30 0.79 2.62 134.48 1.17 112.64 130.52 149.19

QLIKE 0.18 0.22 0.21 0.01 −0.97 3.54 0.21 1.18 0.18 0.21 0.19

Table 4: Descriptive statistics for the CD losses over all permutations. max/min is the ratio of the average loss of the worst model to the average loss of the best model. As a comparison, the losses of the (ordering-invariant) MET and the losses of the alphabetic and ex-ante correlation orderings are given.


References

Andersen, T. & Bollerslev, T. (1996). Heterogeneous information arrivals and return

volatility dynamics: Uncovering the long-run in high frequency returns. NBER Working

Papers 5752, National Bureau of Economic Research, Inc.

Andersen, T. G., Bollerslev, T., Christoffersen, P. F. & Diebold, F. X. (2006).

Volatility and correlation forecasting. Handbook of Economic Forecasting 1(05), 777–878.

Andersen, T. G., Bollerslev, T., Diebold, F. X. & Labys, P. (2003). Modeling

and forecasting realized volatility. Econometrica 71(2), 579–625.

Asai, M., McAleer, M. & Yu, J. (2006). Multivariate stochastic volatility: A review.

Econometric Reviews 25(2-3), 145–175.

Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A. & Shephard, N. (2008). Multivariate realised kernels: consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. Economics Series Working Papers 397, University of Oxford, Department of Economics.

Barndorff-Nielsen, O. E. & Shephard, N. (2002). Estimating quadratic variation

using realized variance. Journal of Applied Econometrics 17(5), 457–477.

Bauer, G. H. & Vorkink, K. (2011). Forecasting multivariate realized stock market

volatility. Journal of Econometrics 160(1), 93–101.

Bellman, R. (1997). Introduction to matrix analysis, vol. 19. Society for Industrial Math-

ematics.

Bernanke, B. S. & Blinder, A. S. (1992). The Federal Funds Rate and the Channels

of Monetary Transmission. American Economic Review 82(4), 901–21.

Brechmann, E. C., Heiden, M. & Okhrin, Y. (2015). A multivariate volatility vine

copula model. Econometric Reviews (forthcoming).


Brezinski, C. (2006). The life and work of André Cholesky. Numerical Algorithms 43(3), 279–288.

Chiriac, R. (2010). A note on estimating Wishart autoregressive model. Ecares working

papers, ULB – Universite Libre de Bruxelles.

Chiriac, R. & Voev, V. (2011). Modelling and forecasting multivariate realized volatility.

Journal of Applied Econometrics 26(6), 922–947.

Chiu, T. Y. M., Leonard, T. & Tsui, K. W. (1996). The matrix-logarithmic covariance

model. Journal of the American Statistical Association 91(433), 198–210.

Christensen, K. & Kinnebrock, S. (2010). Pre-averaging estimators of the ex-post covariance matrix in noisy diffusion models with non-synchronous data. Journal of Econometrics 159(1), 116–133.

Clements, A., Doolan, M., Hurn, S. & Becker, R. (2009). Evaluating multivariate

volatility forecasts. NCER Working Paper Series 41, National Centre for Econometric

Research.

Colacito, R., Engle, R. F. & Ghysels, E. (2011). A component model for dynamic

correlations. Journal of Econometrics 164(1), 45–59.

Corsi, F. (2009). A simple approximate long-memory model of realized volatility. Journal

of Financial Econometrics 7(2), 174–196.

Dacorogna, M. M. (2001). An introduction to high-frequency finance. Academic Press.

Engle, R. F. (2002). Dynamic conditional correlation: A simple class of multivariate

generalized autoregressive conditional heteroskedasticity models. Journal of Business &

Economic Statistics 20(3), 339–350.


Friendly, M. & Kwan, E. (2003). Effect ordering for data displays. Computational Statistics & Data Analysis 43(4), 509–539. URL http://ideas.repec.org/a/eee/csdana/v43y2003i4p509-539.html.

Golosnoy, V., Gribisch, B. & Liesenfeld, R. (2012). The conditional autoregressive

Wishart model for multivariate stock market volatility. Journal of Econometrics 167(1),

211–223.

Gouriéroux, C., Jasiak, J. & Sufana, R. (2009). The Wishart autoregressive process of multivariate stochastic volatility. Journal of Econometrics 150(2), 167–181.

Halbleib, R. & Voev, V. (2011a). Forecasting covariance matrices: A mixed frequency

approach. CREATES Research Papers 2011-03, School of Economics and Management,

University of Aarhus.

Halbleib, R. & Voev, V. (2011b). Forecasting multivariate volatility using the VARFIMA

model on realized covariance Cholesky Factors. Journal of Economics and Statistics

(Jahrbuecher fuer Nationaloekonomie und Statistik) 231(1), 134–152.

Hansen, P. R. & Lunde, A. (2006). Consistent ranking of volatility models. Journal of

Econometrics 131(1-2), 97–121.

Hansen, P. R., Lunde, A. & Nason, J. M. (2011). The Model Confidence Set. Econometrica 79(2), 453–497.

Hayashi, T. & Yoshida, N. (2005). On covariance estimation of non-synchronously observed diffusion processes. Bernoulli 11(2), 359–379.

James, W. & Stein, C. (1961). Estimation with quadratic loss. Proc. Fourth Berkeley Symp. on Math. Statist. and Prob. (1), 361–379.

Keating, J. W. (1996). Structural information in recursive VAR orderings. Journal of

Economic Dynamics and Control 20(9-10), 1557–1580.


Klößner, S. & Wagner, S. (2014). Exploring all VAR orderings for calculating spillovers? Yes, we can! A note on Diebold and Yilmaz (2009). Journal of Applied Econometrics 29(1), 172–179.

Laurent, S., Rombouts, J. V. K. & Violante, F. (2013). On loss functions and rank-

ing forecasting performances of multivariate volatility models. Journal of Econometrics

173(1), 1–10.

Müller, U. A., Dacorogna, M. M., Dave, R. D., Pictet, O. V., Olsen, R. B. & Ward, J. (1993). Fractals and intrinsic time - a challenge to econometricians. Working Papers 1993-08-16, Olsen and Associates.

Patton, A. J. & Sheppard, K. (2009). Evaluating volatility and correlation forecasts.

In: Handbook of Financial Time Series (Mikosch, T., Kreiss, J.-P., Davis, R. A. &

Andersen, T. G., eds.). Springer Berlin Heidelberg, pp. 801–838.

Sims, C. A. (1980). Macroeconomics and reality. Econometrica 48(1), 1–48.

West, K. D., Edison, H. J. & Cho, D. (1993). A utility-based comparison of some

models of exchange rate volatility. Journal of International Economics 35(1-2), 23–45.

Zhang, L. (2011). Estimating covariation: Epps effect, microstructure noise. Journal of Econometrics 160(1), 33–47.

Zhang, L., Ait-Sahalia, Y. & Mykland, P. A. (2005). A tale of two time scales:

Determining integrated volatility with noisy high-frequency data. Journal of the American

Statistical Association 100, 1394–1411.
