Pitfalls of the Cholesky Decomposition for forecasting
multivariate volatility
M. D. Heiden
July 23, 2015
Abstract This paper studies the pitfalls of applying the Cholesky decomposition for fore-
casting multivariate volatility. We analyze the impact of one of the main issues in empirical
application of using the decomposition: The sensitivity of the forecasts to the order of the
variables in the covariance matrix. We find that despite being frequently used to guarantee
positive semi-definiteness and symmetry of the forecasts, the Cholesky decomposition has
to be used with caution, as the ordering of the variables leads to significant differences in
forecast performance. A possible solution is provided by studying an alternative, the matrix
exponential transformation. We show that in combination with empirical bias correction,
forecasting accuracy of both decompositions does not significantly differ. This makes the
matrix exponential a valuable option, especially in larger dimensions.
Keywords: realized covariances; realized volatility; Cholesky decomposition; forecasting
JEL Classification Numbers: C1, C53, C58
Department of Statistics, Faculty of Business and Economics, University of Augsburg, Germany. Phone: +49 821 598-4027. The author gratefully acknowledges the support from the German Research Foundation (DFG) Project “Wishart Processes in Statistics and Econometrics: Theory and Applications”.
1 Introduction
Forecasts of the covariance matrix are a crucial ingredient of many economic applications
in asset and risk management. The concept of using realized covariances as a proxy for the unobservable volatility process has spread widely with the availability and improved accessibility of high-frequency data in finance. Recent approaches to forecasting the realized
covariance (RCOV) matrix are based upon uni- or multivariate time series models. To ensure
mathematical validity of the forecasted RCOV matrix, such as symmetry and positive semi-definiteness, either parameter restrictions or decompositions are used. The latter are preferred to guarantee parsimony, especially if dimensions are large.
Latest multivariate approaches that ensure symmetry and positive semi-definiteness of the
RCOV matrix include the Wishart Autoregressive (WAR) model proposed by Gouriéroux et al. (2009) and its dynamic generalization, the Conditional Autoregressive Wishart by Golosnoy et al. (2012). Chiriac (2010) shows that the WAR estimation is very sensitive to assumptions on the underlying data, causing degenerate Wishart distributions and affecting the estimation results. Consequently, Chiriac & Voev (2011) choose the way of
transformation and base their Vector Autoregressive Fractionally Integrated Moving Aver-
age (VARFIMA) model on a Cholesky decomposition (CD) of the covariance matrix. Bauer
& Vorkink (2011) instead transform the covariance matrix by using the matrix exponential
transformation (MET) and use a factor model approach for the individual components. An-
dersen et al. (2006) and Colacito et al. (2011) modify the Dynamic Conditional Correlation
(DCC) model of Engle (2002), splitting up variances and covariances in the modeling process. A similar approach is implemented in Halbleib & Voev (2011a): the authors suggest a mixed data sampling method based on low-frequency estimators for the correlations.
Usually, the choice of the method can be motivated by the unique properties of the respective decomposition. For example, while the elements of the CD are explicitly linked to the entries of the original RCOV matrix, this is not the case for the MET. On the other hand, neither the CD nor the MET separates variances and covariances. This can be achieved by
applying a DCC type decomposition, allowing for more flexibility in the modeling process.
Moreover, as Halbleib & Voev (2011a) point out, high-frequency estimators for the whole
covariance matrix are often noisy and simple methods to reduce microstructure noise, such
as sparse-sampling (see Andersen et al. (2003)), are not applicable in large dimensions.
Overall, due to its simplicity, the CD remains the most frequently used method in the literature, despite the problem that each permutation of the elements in the original matrix yields a different decomposition. The problem is well known in the literature on Vector Autoregression (VAR), where the VAR is usually identified using the corresponding CD to derive the dynamic response of each variable to an orthogonal shock (see e.g. Sims (1980)). Keating (1996) shows that the ordering of the variables is crucial to obtain structural impulse responses, which is only possible if the system of equations is partially recursive. The gravity of the problem for VAR based approaches has also been pointed out by Klößner & Wagner (2014), who analyze the extent to which measuring spillovers is influenced by the order of the variables.
In this paper, we focus on the impact of the ordering of the assets in the original covariance
matrix on the forecasts, if a CD is used. Since the number of possible permutations grows very fast with an increasing number of assets, it is computationally burdensome to calculate and compare forecasts from each ordering. Therefore, we evaluate the predictive accuracy of
all 720 permutations based on a small data set of six assets. Analyzing the loss distributions
of two established loss functions, we find differences of up to 18% between the average loss
of the best and worst model. Using the Model Confidence Set framework of Hansen et al.
(2011), we show that these loss differences are indeed statistically significant, meaning that
an arbitrary ordering may result in suboptimal forecasts and hence poor model choices.
Additionally, we take a look at the impact of a simple empirical bias correction, as the forecasts from both the CD and the MET are biased by construction. We show that using the ordering-invariant MET and applying the bias reduction provides a possible solution to the ordering problem, as its forecasts are not significantly different from those of the best CD.
2 Decomposition of the realized covariance matrix
Let R_t be the N×1 vector of log returns over each period of T days. For a portfolio consisting of N stocks, let

p(t) = (p_{1t}, \ldots, p_{Nt})'

be the vector of log prices at time t \in [1, \ldots, T]. Assume there are M equally spaced intra-day observations; the i-th intra-day return for the t-th period is

r_{i,t} = p\left((t-1) + \frac{i}{M}\right) - p\left((t-1) + \frac{i-1}{M}\right),

with i = 1, \ldots, M. According to Barndorff-Nielsen & Shephard (2002), the N×N RCOV matrix for the t-th period is then defined as

Y_t = \sum_{i=1}^{M} r_{i,t} r_{i,t}',

which is a consistent estimator for the conditional variance-covariance matrix of the log returns, \mathrm{Var}[R_t | \mathcal{F}_{t-1}] = \Sigma_t. The estimator can be refined to reduce market microstructure
noise (e.g. Hayashi & Yoshida (2005); Zhang et al. (2005)) and account for jumps (Chris-
tensen & Kinnebrock, 2010). The issue of asynchronicity of the data can be addressed by
methods such as linear or previous-tick interpolation (Dacorogna, 2001) and subsampling
(Zhang, 2011), which are easy to implement in empirical work. More complex procedures
are often based on the use of multivariate realized kernels (see e.g. Barndorff-Nielsen et al.
(2008)). However, as Halbleib & Voev (2011a) point out, these methods are still limited in
application as they may lead to data loss or do not guarantee positive definiteness.
2.1 Cholesky decomposition
The CD decomposes a real, positive definite¹ matrix into the product of a real upper triangular matrix and its transpose (Brezinski, 2006).
The Cholesky decomposition of the naturally symmetric and positive semi-definite Y_t, with P_t being an upper triangular matrix, yields Y_t = P_t' P_t:

\begin{pmatrix} y_{11,t} & y_{12,t} & \cdots & y_{1N,t} \\ y_{12,t} & y_{22,t} & \cdots & y_{2N,t} \\ \vdots & & \ddots & \vdots \\ y_{1N,t} & y_{2N,t} & \cdots & y_{NN,t} \end{pmatrix} = \begin{pmatrix} p_{11,t} & 0 & \cdots & 0 \\ p_{12,t} & p_{22,t} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ p_{1N,t} & p_{2N,t} & \cdots & p_{NN,t} \end{pmatrix} \begin{pmatrix} p_{11,t} & p_{12,t} & \cdots & p_{1N,t} \\ 0 & p_{22,t} & \cdots & p_{2N,t} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & p_{NN,t} \end{pmatrix}
The elements p_{ij,t}, i, j = 1, \ldots, N, i \le j, are real and can be calculated recursively by

p_{ij,t} = \begin{cases} \dfrac{1}{p_{ii,t}} \left( y_{ij,t} - \sum_{k=1}^{i-1} p_{ki,t} p_{kj,t} \right) & \text{for } i < j \\[4pt] \sqrt{ y_{jj,t} - \sum_{k=1}^{j-1} p_{kj,t}^2 } & \text{for } i = j \\[4pt] 0 & \text{for } i > j \end{cases}
In reverse, the realized covolatilities can be expressed in terms of the Cholesky elements as

y_{ij,t} = \sum_{k=1}^{\min(i,j)} p_{ki,t} p_{kj,t}.
¹ Or positive semi-definite if the condition of strict positivity for the diagonal elements of the triangular matrix is dropped.
Since in practice, modeling is carried out on the elements of the CD, one of the problems depicted in equation 5 is the influence of the ordering of the variables in the covariance matrix. Consider, for example, that we swap the position of the first and second asset in the return vector. As a result, the elements in the first and second row of the matrix in equation 3 will change their positions. Due to the recursive calculation of the elements in P_t, the corresponding Cholesky elements in the first and second row of P_t will not merely be swapped, but completely change magnitude. Using the CD for an N×N portfolio, there are N! possible permutations of the stocks in the matrix, resulting in different decompositions that are nonlinearly related to each other. Hence, the resulting time-series of Cholesky elements p_{ij,t} differ between the decompositions. For all models based on the CD, this may lead to varying model choices, parameter estimates and also forecasts.
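This sensitivity is easy to verify numerically. The following sketch uses an illustrative 3×3 covariance matrix (not taken from the paper's data) and numpy's Cholesky routine:

```python
import numpy as np

# Illustrative 3x3 realized covariance matrix (not from the paper's data)
Y = np.array([[1.0, 0.5, 0.3],
              [0.5, 2.0, 0.6],
              [0.3, 0.6, 1.5]])

# numpy returns the lower triangular factor L with Y = L L' (i.e. L = P_t')
P1 = np.linalg.cholesky(Y)

# Swap the first and second asset: permute rows and columns of Y
perm = [1, 0, 2]
Y_perm = Y[np.ix_(perm, perm)]
P2 = np.linalg.cholesky(Y_perm)
```

Comparing the sorted absolute entries of P1 and P2 shows that the permuted decomposition is not a reshuffling of the original one: the elements change magnitude, consistent with the nonlinear relation described above.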
Another issue arises in obtaining forecasts for \hat{Y}_{t+1}. Being a quadratic transformation of the forecast \hat{P}_{t+1}, the forecast \hat{Y}_{t+1} may not be unbiased, even if the forecasts for \hat{P}_{t+1} are. This problem is further illustrated in section 2.4.
Furthermore, an often desirable feature of covariance forecasting, namely the separation of variance and covariance dynamics, cannot be achieved by using the CD directly on the covariance matrix. However, it is possible to first apply a DCC decomposition approach and a CD on the correlation matrix thereafter. In general, the nonlinear dependence of the elements in the decomposition can also be an advantage, as the dependency structure between the Cholesky elements can be studied and used for forecasting, see e.g. Brechmann et al. (2015).
2.2 Matrix exponential transformation
For the covariance matrix, the matrix exponential transformation (MET) was introduced
together with the matrix logarithm function by Chiu et al. (1996). In mathematics, both
operators are frequently used for solving first-order differential systems, see e.g. Bellman
For any real, symmetric matrix A_t, the matrix exponential transformation performs a power series expansion, resulting in a real, positive (semi-)definite matrix, in our case Y_t:

Y_t = \mathrm{Exp}(A_t) = \sum_{s=0}^{\infty} \frac{1}{s!} A_t^s,

with A_t^0 being the identity matrix of size N×N, and A_t^s being the s-fold standard matrix product of A_t.
In reverse, a real, symmetric matrix A_t can be obtained from Y_t by the inverse of the matrix exponential function, the matrix logarithm function, \mathrm{logm}(\cdot):

A_t = \mathrm{logm}(Y_t) = \begin{pmatrix} a_{11,t} & a_{12,t} & \cdots & a_{1N,t} \\ a_{12,t} & a_{22,t} & \cdots & a_{2N,t} \\ \vdots & & \ddots & \vdots \\ a_{1N,t} & a_{2N,t} & \cdots & a_{NN,t} \end{pmatrix}
Again, a reasonable practical approach would be to model and forecast the elements a_{ij,t}, i, j = 1, \ldots, N, and obtain valid covariance forecasts by equation 7. However, due to the power series expansion, the relationship between Y_t and A_t is not straightforward to interpret (see e.g. Asai et al. (2006)) and, similar to the CD in section 2.1, covariances and variances cannot be estimated separately. By applying models to A_t, and therefore doing the estimation and forecasting in the log-volatility space, the retransformed forecasts for Y_{t+1} will be biased by Jensen's inequality. The problem and possible solutions are illustrated in section 2.4.
Nevertheless, the MET has several advantages, especially related to factor models, where a certain factor structure is analyzed by principal component methods. It can be shown that under several conditions, as for example in our case symmetry and positive semi-definiteness of Y_t, applying the matrix logarithm function yielding A_t corresponds to decomposing Y_t into its eigenvalues and eigenvectors (see Chiu et al. (1996)). Hence, the A_t^s can be obtained more easily via the eigenvectors than by matrix multiplication as in equation 7. Further, as principal component analysis of the matrix Y_t is also based upon eigenvalue decomposition, restrictions on the structure of the covariance matrix models can be directly implemented while constructing the A_t^s, see e.g. Chiu et al. (1996) or Bauer & Vorkink (2011).
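The eigendecomposition route can be sketched numerically (illustrative matrix, not the paper's data). Unlike the CD, reordering the assets merely reorders the elements of A_t in the same way:

```python
import numpy as np

def logm_sym(Y):
    """Matrix logarithm of a symmetric positive definite matrix
    via its eigenvalues and eigenvectors."""
    w, V = np.linalg.eigh(Y)
    return V @ np.diag(np.log(w)) @ V.T

def expm_sym(A):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.exp(w)) @ V.T

Y = np.array([[1.0, 0.5, 0.3],
              [0.5, 2.0, 0.6],
              [0.3, 0.6, 1.5]])
A = logm_sym(Y)        # log-volatility space
Y_back = expm_sym(A)   # retransform

# Permutation equivariance: permuting Y just permutes A identically
perm = [1, 0, 2]
A_perm = logm_sym(Y[np.ix_(perm, perm)])
```

Since both functions act on the eigenvalues only, conjugation by a permutation matrix passes straight through, which is exactly why the MET is free of the ordering problem.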
2.3 HAR model
One of the simplest and yet most successful univariate models for volatility forecasting is the Heterogeneous Autoregressive (HAR) model of Corsi (2009). It is inspired by the Heterogeneous Market Hypothesis (Müller et al., 1993), which amongst other things assumes that market participants act on different time horizons (dealing frequencies) due to their individual preferences, and therefore create volatility specifically on these horizons. Since in practice, volatility over longer time intervals has a stronger influence on volatility over shorter time intervals than conversely (Corsi, 2009), the HAR models volatility by an additive cascade of volatility components in an autoregressive framework.
This leads to the following model for the daily realized volatilities x_t:

x_t = c + \beta^{(d)} x_{t-1}^{(d)} + \beta^{(w)} x_{t-1}^{(w)} + \beta^{(m)} x_{t-1}^{(m)} + \varepsilon_t, \quad \varepsilon_t \sim (0, \sigma^2), \qquad (9)

where x_t^{(\cdot)} is the realized volatility over the corresponding period of interest, one day (1d), one week (1w) and one month (1m), with x_t^{(d)} = x_t, x_t^{(w)} = 5^{-1} \sum_{i=1}^{5} x_{t-i+1} and x_t^{(m)} = 22^{-1} \sum_{i=1}^{22} x_{t-i+1}.
The main advantages of the HAR are that it is easy to estimate within an OLS framework, its parameters are directly interpretable, and it reproduces volatility characteristics such as long memory without a fractional integration component. The latter is especially interesting, as the long-memory property could also stem from multifractal scaling², which can be captured by an additive component model such as the HAR, whereas fractionally integrated models imply univariate scaling (Andersen & Bollerslev, 1996). Under the HMH hypothesis, multifractal scaling possesses a clear economic justification which is directly interpretable in the HAR framework, due to the simple parameter structure (Corsi, 2009).

² The underlying process scales differently for various time horizons.
Regarding forecasting, standard methods for a general ARMA framework can be used to produce direct or iterated forecasts of the conditional volatility. In contrast to the conventional HAR model above, which is directly applied to a time-series of realized volatilities, we use the model on the time-series of the elements of the CD or the MET, by replacing the components x_t and x_t^{(\cdot)} with the respective p_{ij,t} or a_{ij,t} from equations 4 and 8.
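As a sketch, the HAR regression in equation 9 can be estimated by plain OLS on any such element series; the series below is simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.abs(rng.standard_normal(1000)) + 0.1   # stand-in volatility series

def har_design(x):
    """Regressand x_t and regressors [1, x_{t-1}^{(d)}, x_{t-1}^{(w)}, x_{t-1}^{(m)}]."""
    rows, y = [], []
    for t in range(22, len(x)):
        xd = x[t - 1]              # daily component
        xw = x[t - 5:t].mean()     # weekly (5-day) average
        xm = x[t - 22:t].mean()    # monthly (22-day) average
        rows.append([1.0, xd, xw, xm])
        y.append(x[t])
    return np.array(rows), np.array(y)

X, y = har_design(x)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimates c, b_d, b_w, b_m
fitted_last = X[-1] @ beta                    # one illustrative fitted value
```

In the paper's setting this regression would be run separately on each p_{ij,t} or a_{ij,t} series, and the one-step-ahead forecast built from the latest daily, weekly and monthly averages.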
2.4 Forecasting and bias correction
To obtain forecasts for the RCOV matrix \hat{Y}_{t+1}, the forecasts \hat{p}_{ij,t} or \hat{a}_{ij,t} are obtained by the HAR model in section 2.3 and retransformed by equation 5 or 8, respectively.
For the CD, this last transformation is nonlinear and induces a theoretical bias, which is derived in Chiriac & Voev (2011) and can be expressed by the covariances of the forecast errors u_{\cdot,t+1} of the HAR model:

E[\hat{y}_{ij,t+1} - y_{ij,t+1}] = -\sum_{k=1}^{\min(i,j)} \mathrm{Cov}(u_{ki,t+1}, u_{kj,t+1}). \qquad (10)
However, since we estimate the models independently of each other, this expression is not feasible, as we cannot consistently estimate the covariance matrix of the forecast errors. A heuristic approach to obtain unbiased predictions is suggested in Chiriac & Voev (2011) and further studied in Halbleib & Voev (2011b). In the original approach, due to the larger distortion of the volatilities, bias correction is only carried out on the series of realized volatilities \hat{y}_{ii,t}, i = 1, \ldots, N. However, as implied by equation 10, all elements of \hat{Y}_{t+1} will be biased. Hence, an adaption of the approach of Chiriac & Voev (2011) that corrects volatility and covariance forecasts can be obtained by:

\hat{y}_{(corrected),ij,t+1} = \hat{y}_{ij,t+1} \cdot \underset{t=1,\ldots,n}{\mathrm{median}} \left( \frac{y_{ij,t}}{\hat{y}_{ij,t}} \right).
Note that the window length n on which the median is estimated controls the trade-off between the bias and the precision of the correction. Since we are interested in the general effect of the bias correction, we simply estimate the median in the bias correction factor on a window length equal to our estimation window for the HAR model in section 3.
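The median-ratio correction factor is a one-liner; the following toy sketch (the function name and window handling are illustrative) shows it pulling a systematically overpredicting forecast back in line:

```python
import numpy as np

def bias_correct(y_hat_next, y_realized, y_forecast):
    """Scale the new forecast by the median ratio of realized to forecasted
    values over the estimation window (illustrative sketch)."""
    factor = np.median(y_realized / y_forecast)
    return y_hat_next * factor

# Toy window: past forecasts overpredict by a factor of 2,
# so the median ratio 0.5 halves the new forecast
corrected = bias_correct(2.0,
                         np.array([1.0, 2.0, 3.0]),   # realized y_ij,t
                         np.array([2.0, 4.0, 6.0]))   # forecasted y_ij,t
```

Because the median is robust to outliers, a few extreme ratio observations in the window barely move the correction factor.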
In the case of the MET, the analytical bias correction is more complicated, but can be derived if \hat{A}_t and the estimated residuals \hat{\varepsilon}_t are both normally distributed, see Bauer & Vorkink (2011) for a detailed discussion. However, since normality is empirically often not satisfied, Bauer & Vorkink (2011) suggest an approach similar to Chiriac & Voev (2011). Their method decomposes the forecasted matrix of realized covariances \hat{Y}_{t+1} into correlations and volatilities, bias correcting only the latter and leaving the correlations intact. For comparative reasons, we apply our method from section 2.4, which works well in our empirical application for both the CD and the MET, see section 3. Note that bias correcting not only the volatilities but also the covariances bears the risk of the corrected RCOV matrix forecast no longer being positive semi-definite. However, in our application, this is never the case.
2.5 Loss functions and the MCS
According to Patton & Sheppard (2009), two issues are of major importance when com-
paring forecasts of the covariance matrix. First, tests have to be robust to noise in the
volatility proxy and second, they should only require minimal assumptions on the distribu-
tion of the returns. Therefore, we rely on the method of Hansen et al. (2011) using a model
confidence set (MCS) approach based upon different loss functions to evaluate the multi-
variate volatility forecasts. This framework fulfills the requirements of Patton & Sheppard
(2009) and has the advantage that we can conveniently compare forecasts from many models
without using a benchmark. Furthermore, the MCS does not necessarily select a single best
model but allows for the possibility of equality of the models' forecasting ability. Hence, a
model is only removed from the MCS if it is significantly inferior to other models, making
the MCS more robust in comparing volatility forecasts.
For our approach, we choose two loss functions that satisfy the conditions of Hansen & Lunde (2006) for producing a consistent ranking in the multivariate case. Consistency in the context of loss functions means that the true ranking of the covariance models is preserved, regardless of whether the true conditional covariance or an unbiased covariance proxy is used (Hansen & Lunde, 2006). For the comparison of forecasts of the whole covariance matrix, Laurent et al. (2013) present two families of loss functions that yield a consistent ordering. The first family, called p-norm loss functions, can be written as

L(Y_t, \hat{Y}_t) = \left( \sum_{i=1}^{N} \sum_{j=1}^{N} |y_{ij,t} - \hat{y}_{ij,t}|^p \right)^{1/p},

where \hat{Y}_t is the forecast from our model for the actual RCOV matrix Y_t, which we use as a proxy for the unobservable covariance matrix \Sigma_t. The respective elements of the matrices are denoted by y_{ij,t} and \hat{y}_{ij,t}. From this class, we consider the commonly used multivariate equivalent of the mean squared error (MSE) loss, the squared norm with p = 2:

L_{MSE}(Y_t, \hat{Y}_t) = \sum_{i=1}^{N} \sum_{j=1}^{N} (y_{ij,t} - \hat{y}_{ij,t})^2.
The second family, called eigenvalue loss functions, is based upon the square root of the largest eigenvalue of the matrix (Y_t - \hat{Y}_t)^2. We will consider a special case of this family, the so-called James-Stein loss (James & Stein, 1961), which is usually referred to as the Multivariate Quasi Likelihood (QLIKE) loss function:

L_{QLIKE}(Y_t, \hat{Y}_t) = \mathrm{tr}(\hat{Y}_t^{-1} Y_t) - \ln|\hat{Y}_t^{-1} Y_t| - N, \qquad (12)

where N is the number of assets.
While both the MSE and the QLIKE loss function determine the optimal forecasts based on the conditional expectation, Clements et al. (2009) point out that, compared to the MSE, the QLIKE has greater power in distinguishing between volatility forecasts based on the MCS framework. As pointed out in Laurent et al. (2013), the QLIKE penalizes underpredictions more heavily than overpredictions. West et al. (1993) show that this is also relevant from an investor's point of view, as an underestimation of variances leads to lower expected utility than an equal amount of overestimation. Hence, for a risk averse investor, punishing underpredictions more heavily seems to be rational when evaluating forecasting accuracy.
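Both losses can be sketched directly from their definitions; a toy example makes the asymmetry of the QLIKE visible:

```python
import numpy as np

def mse_loss(Y, Y_hat):
    """Multivariate MSE: sum of squared element-wise forecast errors."""
    return np.sum((Y - Y_hat) ** 2)

def qlike_loss(Y, Y_hat):
    """Multivariate QLIKE: tr(Yhat^{-1} Y) - ln|Yhat^{-1} Y| - N."""
    N = Y.shape[0]
    M = np.linalg.solve(Y_hat, Y)          # Yhat^{-1} Y
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - N

Y = np.eye(2)
# A perfect forecast gives zero loss under both criteria;
# underprediction (0.5*Y) is penalized more than overprediction (2*Y)
under = qlike_loss(Y, 0.5 * Y)
over = qlike_loss(Y, 2.0 * Y)
```

Here `under` exceeds `over` even though both forecasts miss the truth by the same multiplicative factor, mirroring the risk-averse-investor argument above.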
For the MCS approach, we start with the full set of candidate models M_0 = \{1, \ldots, m_0\}. For all models, the loss differential between each pair of models is computed based upon one of our loss functions L_k, k = 1 (MSE), 2 (QLIKE), so that for models i and j, i, j = 1, \ldots, m_0, and every time point t = 1, \ldots, T, we get

d_{ij,kt} = L_k(Y_t, \hat{Y}_{i,t}) - L_k(Y_t, \hat{Y}_{j,t}). \qquad (13)

At each step of the evaluation, the hypothesis

H_0: E[d_{ij,kt}] = 0, \quad \forall i > j \in M, \qquad (14)

is tested for a subset of models M \subseteq M_0, where M = M_0 for the initial step. If the H_0 is rejected at a given significance level \alpha, the worst performing model is removed from the set. To give an impression of the scale of rejection, for each loss function and model, the respective \alpha at which the model would be removed from the MCS can be computed.
This process continues until a set of models remains that cannot be rejected. Similar to Hansen et al. (2011), we use the range statistic to evaluate the H_0, which can be written as

T_R = \max_{i,j \in M} |t_{ij,k}| = \max_{i,j \in M} \frac{|\bar{d}_{ij,k}|}{\sqrt{\widehat{\mathrm{var}}(\bar{d}_{ij,k})}}, \qquad (15)

where \bar{d}_{ij,k} = \frac{1}{T} \sum_{t=1}^{T} d_{ij,kt} and \widehat{\mathrm{var}}(\bar{d}_{ij,k}) is obtained from a block-bootstrap procedure, see Hansen et al. (2011), which we implement with 10,000 replications and a block length varying from 20 to 50 to check the robustness of the results.
The worst performing model to be removed from the set M is selected as the model i with

i = \arg\max_{i \in M} \frac{\bar{d}_{i,k}}{\sqrt{\widehat{\mathrm{var}}(\bar{d}_{i,k})}}, \qquad (16)

where \bar{d}_{i,k} = \frac{1}{m-1} \sum_{j \in M} \bar{d}_{ij,k} and m is the number of models in the actual set M.
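The range statistic and its bootstrap variance can be sketched as follows; the loss series are simulated and the replication count is kept small for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
T, m = 200, 3
losses = rng.gamma(2.0, 1.0, size=(T, m))   # simulated loss series for m models

def block_bootstrap_var(d, block=20, reps=500):
    """Variance of the mean loss differential from a moving-block bootstrap:
    resample whole blocks of length `block` and recompute the mean."""
    n_blocks = len(d) // block
    starts = rng.integers(0, len(d) - block + 1, size=(reps, n_blocks))
    means = np.array([np.concatenate([d[s:s + block] for s in row]).mean()
                      for row in starts])
    return means.var()

# Range statistic: the largest absolute t-statistic over all model pairs
t_max = 0.0
for i in range(m):
    for j in range(i + 1, m):
        d = losses[:, i] - losses[:, j]      # loss differential series
        t_stat = abs(d.mean()) / np.sqrt(block_bootstrap_var(d))
        t_max = max(t_max, t_stat)
```

Block resampling preserves the serial dependence of the loss differentials, which is why the paper varies the block length from 20 to 50 as a robustness check.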
3 Empirical study
3.1 Data and descriptive statistics
The dataset stems from the New York Stock Exchange (NYSE) Trade and Quotations
(TAQ) and corresponds to the one used in Chiriac & Voev (2011). It was obtained from the
Journal of Applied Econometrics Data Archive. The original data file consists of all tick-by-
tick bid and ask quotes on six stocks listed on the NYSE, American Stock Exchange (AMEX)
and the National Association of Security Dealers Automated Quotation system (NASDAQ).
The sample ranges from 9:30 EST until 16:00 EST over the period January 1, 2000 to July 30, 2008 and consists of 2156 trading days. Included individual stocks are American Express
Inc. (AXP), Citigroup (C), General Electric (GE), Home Depot Inc. (HD), International
Business Machines (IBM) and JPMorgan Chase&Co (JPM). The original tick-by-tick data
has previously been transformed as follows. To obtain synchronized and regularly spaced
observations, the previous-tick interpolation method of Dacorogna (2001) is used.
Then, log-midquotes are constructed from the bid and ask quotes by geometric averaging.
M= 78 equally spaced 5-minute return vectors ri,t are computed from the log-midquotes.
Daily open-to-close returns are computed as the difference in the log-midquote at the end
and the beginning of each day.
For each daily period t = 1, \ldots, 2156, the series of daily RCOV matrices is constructed as in section 2 by summing up the squared 5-minute return vectors:

Y_t = \sum_{i=1}^{M} r_{i,t} r_{i,t}'. \qquad (17)
This approach is further refined by a subsampling procedure to make the RCOV estimates
more robust to microstructure noise and non-synchronicity (see Zhang (2011)). From the
original data, 30 regularly ∆-spaced subgrids are constructed with ∆ = 300 seconds, start-
ing at seconds 1,11,21,...,291. For each subgrid, the log-midquotes are constructed and
the RCOV matrix is obtained on each subgrid according to equation 17. Then, the RCOV
matrices are averaged over the subgrids. To avoid noise by measuring overnight volatilities,
all computations are applied to open-to-close data. For the descriptive statistics and esti-
mation purposes, all daily and intradaily returns are scaled by 100, so that the values refer
to percentage returns.
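The daily estimator in equation 17 is just a sum of outer products, sketched here on simulated 5-minute returns (the subsampling step would average such matrices over the 30 shifted grids):

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 78, 6
r = rng.standard_normal((M, N)) * 0.1   # simulated 5-minute percentage returns

# Realized covariance: sum over the day of outer products r_i r_i'
Y = r.T @ r

# Symmetric and positive semi-definite by construction
eigvals = np.linalg.eigvalsh(Y)
```

Since Y is a Gram matrix of the intraday return vectors, its eigenvalues are non-negative, which is precisely the property the decompositions of section 2 are meant to preserve in the forecasts.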
At each point of time t, we apply either the CD or the MET on the obtained RCOV
matrix. Additionally, we take the logarithm of the elements on the diagonal, to ensure
positivity of the elements of the decomposition when applying the time-series models. Since
the ordering of the assets in the original RCOV matrix is relevant for the CD, we refer to
the basic alphabetic ordering of the individual stocks in section 3.1 for the initial descriptive
analysis of the elements of the CD.
In general, the elements of both decompositions exhibit the same characteristics as the
realized covariances, such as volatility clustering, right skewness, excess kurtosis and high
levels of autocorrelation, see tables 2 and 3. All series appear to be stationary based on the Augmented Dickey-Fuller test.
3.2 Optimal ordering
If the ordering is indeed crucial for the forecast performance, the question arises whether there is any possibility to determine the optimal position of an asset in the original return vector before evaluating all permutations. According to equation 6, the forecasts in column j, \{\hat{y}_{ij}\}_{i=1,\ldots,j}, only depend on the forecasted entries of the Cholesky matrix \hat{P} up to column j. Hence, if an asset is for example moved from position i = 1 in the return vector to a position i > 1, the number of forecasted Cholesky elements that enter the calculation of the covolatility forecast increases with every increase in the position. Intuitively, assets that are more correlated with each other should be placed after assets that are less correlated, so that their dependence is picked up by the Cholesky elements. Similarly, in the estimation of structural VARs, variables are often ordered by their degree of exogeneity from most to least exogenous, see e.g. Bernanke & Blinder (1992); Keating (1996). However, the CD is only useful for identifying the structural relationship under rather restrictive conditions, e.g. in case of VAR modeling if the underlying relationship is recursive. Based on our data set of equity returns, we cannot impose a structural relationship by means of economic theory.
Nevertheless, we analyze the correlation structure of the realized variances of the six assets to identify possible linkages that might be helpful in ordering the assets. The full-sample correlation matrix of the time-series of realized variances for the natural alphabetic ordering of the assets is given in figure 1. On the left side, the ordering of the elements in the return vector is used, while on the right side the correlations are ordered based on the angular positions of the eigenvectors of the correlation matrix. This method is sometimes called “correlation ordering” (Friendly & Kwan, 2003) and places similar variables contiguously. The correlation matrix on the right shows which assets should be grouped together. Note that the correlations are not sorted by size, e.g. from highest to lowest average correlation. We now proceed to analyze two questions. First, do different orderings indeed yield significantly different forecasts? Second, does ordering the variables in the return vector similar to the rule of correlation ordering produce superior forecasts?
Figure 1: Correlation matrix of the original time-series of realized variances. On the left,
correlations are ordered by the alphabetic order in the return vector. On the right, correla-
tions are ordered based on the angular positions of the eigenvectors of the correlation matrix.
The estimate of the corresponding correlation coefficient is given inside the square.
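One common way to implement such an angular ordering (our reading of Friendly & Kwan (2003); the exact angle convention used here is an assumption for illustration) is to sort the variables by the angles formed by their loadings on the two leading eigenvectors:

```python
import numpy as np

def correlation_order(C):
    """Order variables by the angular positions of their loadings on the
    two leading eigenvectors of the correlation matrix (illustrative sketch)."""
    w, V = np.linalg.eigh(C)        # eigenvalues in ascending order
    e1, e2 = V[:, -1], V[:, -2]     # eigenvectors of the two largest eigenvalues
    angles = np.arctan2(e2, e1)     # angular position of each variable
    return np.argsort(angles)

# Toy correlation matrix: variables 0 and 1 are strongly related
C = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
order = correlation_order(C)
```

In this toy matrix the two highly correlated variables end up adjacent in the returned ordering, which is the grouping behaviour the method is designed to produce.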
3.3 Modeling and forecasting procedure
For each decomposition and permutation of the assets, we apply the HAR model from
section 2.3 on each time-series of CD or MET elements. Since the MET is independent of
the chosen permutation, we can use the resulting model as a benchmark model. For the CD,
we obtain 21 different models for each one of the 720 permutations. We retain the last 200 observations of the dataset for one-step-ahead out-of-sample forecasting and estimate the models based on a moving window of 1956 observations. Forecasts of the RCOV matrix
are then generated according to section 2.4.
First, for each permutation we evaluate the forecasts by means of the multivariate loss
functions from section 2.5. For the CD and both loss functions, we take the average loss over
time for each permutation to obtain a distribution of losses, see figure 2. The corresponding
descriptive statistics are given in the upper half of table 4.
Figure 2: Average (over time) MSE (left) and QLIKE (right) density for all permutations. Red line is the mean value.

The loss density of the MSE is multimodal and left skewed with an average loss of 271.71. In comparison, the average loss of the MET is 334.21, which is 16% larger than the maximum MSE loss of the CD. The difference between the largest and smallest average loss is 8%. The
QLIKE loss density is more symmetric and only slightly right skewed, with an average loss of 0.60, compared to an average loss of 0.71 for the MET. Again, the MET QLIKE loss is 16% larger than the largest QLIKE loss of the CD. The standard deviation of the QLIKE losses is significantly smaller (p < 0.01) than for the MSE loss, based on the Brown-Forsythe test. Still, the difference between the largest and smallest average loss is roughly 5%. Ranking the models from best to worst (smallest to largest average loss over time), we find that the ordering is not consistent across the loss functions. Evaluating the model performance over time instead of taking averages, the most frequent best model is identical for both loss functions, having the lowest loss in 4 of the 200 out-of-sample forecasts. The most frequent worst model, on the other hand, differs between the loss functions. For the MSE, one particular ordering is the worst model in 12 out of the 200 forecasts. In case of the QLIKE, the most frequent worst ordering has the highest loss in 3 out of 200 times.
To come back to the question whether the method of correlation ordering in section 3.2 is helpful in determining the best model ex-ante, we list the worst and best orderings based on the average loss for both loss functions in table 1. To simplify the notation, we rename the assets by their position in the alphabetic return vector, namely AXP = 1, C = 2, GE = 3, HD = 4, IBM = 5 and JPM = 6.
         worst           best            ex-ante         ex-ante vs best
MSE      “3 2 1 6 4 5”   “5 4 3 1 6 2”   “5 4 3 1 2 6”   1.002
QLIKE    “3 1 5 4 2 6”   “6 2 3 4 5 1”   “5 4 3 1 2 6”   1.506

Table 1: Orderings with the highest and lowest average losses (without bias correction) based on the respective loss function. Ex-ante gives the order proposed by the method of correlation ordering. Additionally, the average loss of the ex-ante model relative to the best model is listed.

Surprisingly, the best model under the MSE loss function nearly coincides with the model suggested by the method of correlation ordering, with only assets 2 and 6 switching positions.
For the QLIKE loss, only asset 3 occupies the same position in the best model as in the
ex-ante ordering. Regarding the size of the average loss, the ex-ante model's losses are only
0.2% larger than those of the best model under the MSE loss, whereas under the QLIKE loss
function the ex-ante losses are 50% larger. We statistically evaluate these differences in
section 3.4. However, based on the mixed results from both loss functions, we cannot
unambiguously establish a link between correlation ordering and forecasting results.
Additionally, as pointed out before, the model ranking is highly time-varying. Evaluating the
models at every point in time reveals that the ex-ante model has the lowest loss at exactly
one point in time for both loss functions. Again, it seems that neither the ex-ante nor any
other ordering consistently delivers the best forecasts.
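The order dependence underlying these results is easy to verify numerically: the Cholesky factor of a reordered covariance matrix is not simply a relabeling of the original factor, so element-wise forecasting models built on the factor produce different forecasts for each ordering. A small sketch with a hypothetical 3x3 covariance matrix:

```python
import numpy as np

# A small symmetric positive definite covariance matrix (hypothetical values)
S = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])

# Permutation matrix swapping assets 1 and 3
P = np.eye(3)[[2, 1, 0]]

L1 = np.linalg.cholesky(S)            # factor in the original order
L2 = np.linalg.cholesky(P @ S @ P.T)  # factor after reordering

# The reordered factor is not a relabeling of the original one:
print(np.allclose(P @ L1 @ P.T, L2))  # -> False
```

With n assets there are n! such factorizations, which is why all 720 permutations have to be estimated for the six assets considered here.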
Figure 3: Average (over time) MSE (left) and QLIKE (right) density for all permutations
with bias correction. Red line is the mean value.
In case of the bias correction, the average loss densities for all permutations are signifi-
cantly different (p < 0.01) from those without bias correction based on the Kolmogorov-
Smirnov (KS) test. In general, the bias correction decreases the average loss, see figure
3. Descriptive statistics are given in the lower half of table 4. Most notably, the standard
deviation increases for the MSE, while in case of the QLIKE the distribution becomes
more right-skewed. As a result, the difference between the largest and smallest average loss
increases to 17% for the MSE and 18% for the QLIKE. The bias corrected average MET
loss is 112.64 for the MSE and 0.18 for the QLIKE. Hence, the MET benefits heavily from
the bias correction, making it a possible alternative to the CD to circumvent the ordering
problem.
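The bias correction itself is specified in the methodology section of the paper; purely as an illustration, a common empirical variant rescales each forecast by the in-sample mean ratio of realized to forecast values (the data below are hypothetical placeholders, not the paper's):

```python
import numpy as np

def empirical_bias_factor(realized, forecast):
    """Multiplicative in-sample bias factor: mean ratio of realized to
    forecast values, computed element-wise over the estimation window."""
    return np.mean(realized / forecast, axis=0)

# Hypothetical in-sample data where the forecasts understate the truth by 20%
rng = np.random.default_rng(0)
truth = 1.0 + 0.1 * rng.standard_normal((250, 3))
fcst = truth / 1.2
b = empirical_bias_factor(truth, fcst)
print(b)  # -> approximately [1.2, 1.2, 1.2]
```

Out-of-sample forecasts are then multiplied by `b` before the loss is evaluated; such a correction is needed because the nonlinear back-transformation from the decomposed elements to the covariance matrix introduces a Jensen-type bias.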
For each permutation, we test the distribution of losses over time of the bias corrected
versus the non-bias corrected forecasts using the KS test. In all cases, the loss distributions
are significantly different at a level of p < 0.01 and the mean loss (over time) of the bias
corrected distribution is smaller than that of the non-bias corrected one. For the MSE, the
worst model with bias correction coincides with the worst model without bias correction;
in all other cases, the best and worst models differ from those without bias correction.
As before, the ranking of the average losses from best to worst is not consistent across the
loss functions. Comparing the losses over time reveals a similar behavior as before, with
the most frequent best and worst model varying across time.
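Such a comparison of two loss distributions can be sketched with `scipy.stats.ks_2samp`; the loss series below are simulated placeholders, not the paper's data:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
# Hypothetical daily losses for uncorrected vs bias corrected forecasts,
# with the corrected losses smaller on average
loss_plain = rng.gamma(shape=2.0, scale=1.0, size=200)
loss_bc = rng.gamma(shape=2.0, scale=0.25, size=200)

stat, pval = ks_2samp(loss_plain, loss_bc)
print(pval < 0.01)  # the two loss distributions differ significantly
```

The KS test compares the full empirical distribution functions, so it also picks up differences in dispersion and skewness that a simple comparison of mean losses would miss.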
3.4 Statistically testing forecast performance
To evaluate the significance of the loss differences across time, we test the losses of the
permutations using the MCS procedure introduced in section 2.5. We are interested in several
questions. First of all, are the forecasts from the models which are best and worst based
upon the average loss significantly different from each other? Second, how well does the
bias adjusted MET model perform compared to the best ordering and third, is the ex-ante
ordering significantly worse than the best model?
Starting with the first question, we find that for both loss functions the worst model
can be rejected from the MCS at an α = 1% level of significance. In case of bias correction,
the level of significance decreases further. As noted in the literature, the QLIKE is also more
discerning, leading to slightly lower levels of significance in both cases compared to the MSE.
Comparing the non-corrected versus the bias corrected forecasts, we find that the bias
correction leads to significantly better forecasts for both loss functions (α = 1%). Overall,
since the differences between the forecasts are indeed statistically significant, choosing the
“wrong” ordering may lead to poor forecast performance, no matter which loss function is
chosen.
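The elimination logic of the MCS can be sketched in stylized form. The sketch below replaces the bootstrapped distribution of the test statistics in Hansen, Lunde & Nason (2011) with a fixed normal critical value, so it illustrates the idea rather than reproducing the actual procedure:

```python
import numpy as np

def mcs_simplified(losses, z=2.58):
    """Stylized MCS elimination: while any pairwise loss differential is
    'significant' (|t| > z), drop the model with the largest mean loss.
    The actual MCS bootstraps the distribution of the statistics instead."""
    models = list(range(losses.shape[1]))
    while len(models) > 1:
        L = losses[:, models]
        T = L.shape[0]
        tmax = 0.0
        for i in range(len(models)):
            for j in range(i + 1, len(models)):
                d = L[:, i] - L[:, j]
                t = abs(d.mean()) / (d.std(ddof=1) / np.sqrt(T))
                tmax = max(tmax, t)
        if tmax <= z:
            break  # no significant differences left: this is the model set
        worst = models[int(np.argmax(L.mean(axis=0)))]
        models.remove(worst)
    return models

# Hypothetical loss series: model 2 is clearly worse than models 0 and 1
rng = np.random.default_rng(42)
losses = 0.1 * rng.standard_normal((200, 3)) + np.array([1.0, 1.0, 2.0])
print(mcs_simplified(losses))  # model 2 is eliminated
```

In the paper, the full bootstrap-based MCS of section 2.5 is applied to the losses of all orderings; the sketch only conveys the sequential elimination idea.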
Next, we only consider the case of bias correction. As we have seen, the MET average
losses were well within the range of the average CD losses. If the MET losses are not
significantly different from those of the best CD model, the MET with bias correction could
be a valid alternative to avoid the ordering problem of the CD. The MET forecasts can only
be rejected from the MCS at an α = 50% significance level for the MSE and an α = 69%
significance level for the QLIKE. Hence, the forecasts from the best CD model and the MET
are not significantly different from each other at any reasonable level of confidence.
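The reason the MET avoids the ordering problem can be verified directly: the matrix logarithm commutes with a symmetric permutation of the variables, so models built element-wise on the log-space matrix are unaffected by the order of the variables. A small check with a hypothetical covariance matrix:

```python
import numpy as np
from scipy.linalg import expm, logm

S = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
P = np.eye(3)[[1, 2, 0]]  # reorder the assets

A1 = logm(S)              # matrix logarithm of the covariance
A2 = logm(P @ S @ P.T)    # matrix logarithm after reordering

# The MET commutes with reordering: relabeling the assets simply
# relabels the elements of the log-space matrix.
print(np.allclose(P @ A1 @ P.T, A2))  # -> True
```

Forecasts of `A` mapped back via `expm` are therefore identical up to relabeling for every ordering, which is exactly the invariance the CD lacks.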
Comparing the losses of the ex-ante ordering with those of the best model under the
respective loss function, we find that for the QLIKE the losses are significantly different
(α < 1%), while for the MSE the ex-ante model cannot be rejected from the MCS (α = 9%).
Hence, deciding on the ordering in advance does not yield a clear recommendation. The
danger of arbitrarily choosing an ordering that leads to poor forecasts, and hence poor
model choices, cannot be assessed ex-ante using the methodology of correlation ordering.
4 Conclusion
In this paper, we empirically analyzed several issues arising from using the Cholesky
decomposition (CD) for forecasting the realized covariance (RCOV) matrix. We studied
the impact of the order of the variables in the covariance matrix on volatility forecasting,
finding that different orderings do indeed lead to significantly different forecasts based on
a MCS approach. Initially deciding upon the ordering based on the angular positions of
the eigenvectors of the correlation matrix does not lead to unambiguously better results
in forecasting. Further, we find that the best and worst models are not consistent over
time, so that a clear recommendation as to which order to use is not at hand, even if forecasts
are performed stepwise. A frequently used method of bias correction improves forecasting
accuracy, but at the same time widens the difference between the best and worst model, so
that the ordering problem worsens. In contrast, bias corrected forecasts from another
decomposition, the matrix exponential transformation (MET), show equal predictive ability
and do not suffer from the ordering problem. Thus, for empirical application, two conclusions
can be drawn. If a reasonable order can be imposed on the elements of the covariance matrix,
or if the connection between the elements of the decomposed covariance matrix is of interest,
the CD is a rational choice. Otherwise, the application of the MET together with a bias
correction is advised, be it for comparative reasons or simply to avoid the time consuming
process of estimating all possible permutations of the CD.
A Appendix
A.1 Tables and figures
min max mean sd skew kurt pval ADF acf l=1 acf l=2
p11,t -1.31 2.03 0.30 0.58 0.10 2.23 0.01 0.88 0.86
p12,t -0.21 7.63 0.71 0.66 2.83 16.71 0.01 0.76 0.70
p22,t -1.13 2.16 0.2 0.53 0.29 2.33 0.01 0.89 0.87
p13,t -0.5 3.99 0.52 0.43 2.17 11.41 0.01 0.65 0.59
p23,t -0.28 2.68 0.37 0.3 1.99 9.69 0.01 0.61 0.57
p33,t -1.14 1.73 0.04 0.48 0.31 2.40 0.01 0.86 0.82
p14,t -0.7 3.71 0.55 0.46 2.01 9.44 0.01 0.64 0.59
p24,t -0.38 2.4 0.36 0.3 1.77 8.79 0.01 0.46 0.46
p34,t -0.53 2.76 0.28 0.26 1.69 10.30 0.01 0.5 0.45
p44,t -0.97 1.75 0.29 0.42 0.36 2.74 0.01 0.81 0.77
p15,t -0.43 3.09 0.45 0.35 1.96 9.95 0.01 0.53 0.49
p25,t -1.22 4.68 0.31 0.27 3.42 39.36 0.01 0.47 0.39
p35,t -0.43 2.13 0.26 0.22 1.82 10.20 0.01 0.44 0.42
p45,t -0.48 1.71 0.15 0.17 1.36 10.53 0.01 0.18 0.14
p55,t -1.11 1.56 0.00 0.46 0.58 2.88 0.01 0.86 0.82
p16,t -0.29 8.16 0.72 0.66 2.88 18.55 0.01 0.73 0.65
p26,t -0.19 5.77 0.58 0.43 2.26 15.50 0.01 0.64 0.6
p36,t -0.36 2.22 0.22 0.22 1.86 10.20 0.01 0.32 0.3
p46,t -0.91 1.19 0.14 0.18 0.63 6.63 0.01 0.17 0.11
p56,t -0.87 1.37 0.14 0.19 1.25 8.48 0.01 0.18 0.15
p66,t -1.21 2.2 0.14 0.52 0.3 2.39 0.01 0.89 0.86
Table 2: Descriptive statistics for the time-series of the elements of the (alphabetic) Cholesky
decomposition. Diagonal (log) time-series are written in bold. Additionally, the p-value of
the ADF test and the first and second autocorrelation coefficients are reported.
min max mean sd skew kurt pval ADF acf l=1 acf l=2
a11,t -2.72 3.63 0.32 1.13 0.13 2.19 0.01 0.88 0.86
a12,t -0.35 0.94 0.3 0.16 0.26 3.32 0.01 0.44 0.43
a22,t -2.28 4.35 0.3 1.07 0.33 2.36 0.01 0.9 0.87
a13,t -0.24 0.67 0.24 0.14 -0.08 3.06 0.01 0.28 0.26
a23,t -0.24 0.71 0.27 0.14 -0.02 2.9 0.01 0.38 0.29
a33,t -2.36 3.52 0.09 0.96 0.3 2.43 0.01 0.85 0.82
a14,t -0.34 0.73 0.2 0.13 0.04 3.08 0.01 0.24 0.2
a24,t -0.25 0.66 0.22 0.13 0.07 3 0.01 0.25 0.27
a34,t -0.27 0.66 0.22 0.13 -0.09 3.13 0.01 0.3 0.28
a44,t -2.01 3.63 0.65 0.84 0.35 2.77 0.01 0.81 0.76
a15,t -0.34 0.62 0.21 0.13 -0.18 3.27 0.01 0.22 0.18
a25,t -0.28 0.8 0.23 0.13 -0.04 3.17 0.01 0.29 0.23
a35,t -0.19 0.67 0.26 0.14 -0.06 2.79 0.01 0.31 0.29
a45,t -0.31 0.65 0.21 0.13 -0.08 3.18 0.01 0.21 0.17
a55,t -2.21 3.51 0.1 0.91 0.57 2.93 0.01 0.85 0.81
a16,t -0.16 0.99 0.29 0.16 0.55 3.7 0.01 0.45 0.39
a26,t -0.11 1.13 0.42 0.18 0.5 3.5 0.01 0.53 0.51
a36,t -0.32 0.63 0.23 0.13 0.02 2.97 0.01 0.21 0.2
a46,t -0.32 0.75 0.2 0.13 0.05 3.25 0.01 0.24 0.2
a56,t -0.33 0.62 0.21 0.13 0.01 3.07 0.01 0.2 0.17
a66,t -2.35 4.85 0.47 1.07 0.25 2.42 0.01 0.89 0.85
Table 3: Descriptive statistics for the time-series of the elements of the matrix exponential
transformation. Diagonal (log) elements are written in bold. Additionally, the p-value of
the ADF test and the first and second autocorrelation coefficients are reported.
min max mean sd skew kurt median max/min MET alphabetic ex-ante
without bias correction
MSE 265.82 287.43 271.71 4.65 0.75 2.92 271.08 1.08 334.21 269.27 282.71
QLIKE 0.58 0.61 0.60 0.01 0.28 3.47 0.60 1.05 0.71 0.59 0.60
with bias correction
MSE 130.29 152.47 136.43 5.30 0.79 2.62 134.48 1.17 112.64 130.52 149.19
QLIKE 0.18 0.22 0.21 0.01 0.97 3.54 0.21 1.18 0.18 0.21 0.19
Table 4: Descriptive statistics for the CD losses over all permutations. Max/min is the ratio of the average loss of the worst
model to the average loss of the best model. For comparison, the losses of the (ordering invariant) MET and of the
alphabetic and ex-ante correlation orderings are given.