
arXiv:1712.02138v2 [q-fin.ST] 10 May 2018

A cluster driven log-volatility factor model: a deepening on the

source of the volatility clustering

Anshul Verma∗1, R. J. Buonocore†1, and T. Di Matteo‡1,2,3

1Department of Mathematics, King’s College London, The Strand, London, WC2R 2LS,

UK

2Department of Computer Science, University College London, Gower Street, London,

WC1E 6BT, UK

3Complexity Science Hub Vienna, Josefstaedter Strasse 39, A-1080 Vienna

May 11, 2018

Abstract

We introduce a new factor model for log volatilities that performs dimensionality reduction

and considers contributions globally through the market, and locally through cluster structure

and their interactions. We do not assume a priori the number of clusters in the data, instead

using the Directed Bubble Hierarchical Tree (DBHT) algorithm to ﬁx the number of factors. We

use the factor model and a new integrated non-parametric proxy to study how volatilities contribute to volatility clustering. Globally, only the market contributes to the volatility clustering. Locally, for some clusters, the cluster itself contributes statistically to volatility clustering. This

is signiﬁcantly advantageous over other factor models, since the factors can be chosen statisti-

cally, whilst also keeping economically relevant factors. Finally, we show that the log volatility

factor model explains a similar amount of memory to a Principal Components Analysis (PCA)

factor model and an exploratory factor model.

1 Introduction

Volatilities are an important input for the estimation of risk [1] and for models that aim to model price dynamics and determine what the rational, fair price should be under such models [2, 3]. However, the

eﬀect of volatility clustering, and particularly its unclear link with how volatilities are correlated

with each other, complicates this process. This causes a problem due to the high dimensionality of

the correlation matrix between the log volatilities that is also subject to noise [4], which makes it

diﬃcult to identify meaningful information about what drives the volatility and volatility clustering.

This problem is also relevant in multivariate volatility modelling since most popular methods such

as multivariate General Autoregressive Conditional Heteroskedasticity (GARCH) [5], stochastic

covariance [6] and realised covariance [7] suﬀer from the curse of dimensionality and an increase in

the number of parameters needed. One such way of tackling this problem is through dimensionality

reduction, which is a general class of methods that aims to reduce high dimensional datasets to

∗Corresponding author, anshul.verma@kcl.ac.uk, +447740779724

†riccardo_junior.buonocore@kcl.ac.uk, +447549919717

‡tiziana.di_matteo@kcl.ac.uk, +4402078482223


a reduced form which is a faithful representation of the original dataset [8], and is also related to

noise reduction of the dataset.

One such method of dimensionality reduction of correlation matrices is Principal Component

Analysis (PCA) [9]. It aims to transform the original correlation matrix into an orthogonal basis.

For square correlation matrices, which are those that we consider in this paper, this essentially

means calculating the eigenvalues and their respective eigenvectors. The ﬁrst eigenvector (called

the ﬁrst principal component) has the highest variance and explains most of the variability in the

data, the second eigenvector (called the second principal component) has the second highest variance

and explains less variability than the ﬁrst principal component, and so on. The method has been

applied to ﬁnance mainly through portfolio optimisation to produce sets of orthogonal portfolios

[10]. A paper which uses PCA in the context of volatility modelling is [11], where the author extracts

the ﬁrst few principal components and uses them to calibrate a multivariate GARCH model, with

a further extension proposed in [12]. The main drawback of PCA is that it is not clear how many

principal components, i.e. factors, to keep, as either too many principal components are kept or

the methods used to select the components are heuristic and subjective in nature [9]. In [13], the authors suggest keeping the number of principal components according to the Marchenko-Pastur distribution, with a further refinement made in [14] and previously in [15]; however, in [16] it is pointed out that valuable information may still be lost.

A highly related class of methods in dimensionality reduction are called factor models [17, 18, 19,

20]. Factor models describe the dynamical evolution of time series by assuming that there exist common factors which drive each asset through its sensitivity, often called responsiveness, to changes in the value of these factors. Dimensionality reduction is then achieved in the description of the time series, since the number of factors is smaller than the number of stocks. Factor models have widespread

use in ﬁnance due to their relative (or at least superﬁcial) simplicity in comparison to other models

of returns series [17, 19, 21, 22, 20]. Factor models can be split into two varieties: exploratory, which

assume no underlying structure in the data, and confirmatory, which test relationships between

known factors [23].

However, similarly to PCA a question of how we should choose the factors arises. One such

answer can be categorised by assuming that we have some prior knowledge of the factors. The

simplest and earliest factor model which falls under this category is the Capital Asset Pricing Model

(CAPM)[17, 24, 25, 26]. It emerges from the extremely popular Markowitz scheme of portfolio

optimisation [27], which says it is better to spread an investment across a class of stocks in order

to reduce the total risk of the portfolio. CAPM develops this further by saying that the non-diversifiable risk, or systematic risk, comes from the stock's exposure to changes in the market and

the corresponding sensitivity to this change.

A very well known factor model which has multiple factors, rather than just one like CAPM, is

the 3-factor Fama-French factor model [28, 19, 21, 29, 30]. In this factor model, the ﬁrst factor comes

again from the exposure to the market risk with two extra factors: the small minus big (SMB) and

the high minus low (HML)[19, 21]. The SMB factor follows the observation by Fama and French

that stocks with a smaller market cap, which is the market value of the stock used as a proxy of size,

tend to give additional returns. Similarly, the HML factor represents the book/market ratio, i.e.

the ratio of the total value of the assets owned by the company associated to a stock relative to

the stock’s market value, and is positively correlated with additional returns. The aim of the HML

factor is to evaluate whether stocks have been undervalued by the market, where the book/market

ratio exceeds 1, and thus have the potential for larger returns. Recently, the Fama-French model

has been extended to include 5 factors [31]. The arbitrage pricing theory (APT) is also a more

generalised multi factor model, except it states that returns are a linear function of macro economic

factors [18, 32]. In APT however, there is no indication of exactly how many and what factors


should be included, which then introduces an ad-hoc nature to the types and numbers of factors

included in the model.

The above factor models share the fact that the number and nature of the factors are somewhat

exogenous in the sense that they are determined by economic intuition on what should drive financial returns. Unfortunately, it has been pointed out that there is weak evidence for CAPM [28], for both the Fama-French 3- and 5-factor models, and for some manifestations of the APT [33, 34, 35, 36], underlining the issue that these factors cannot explain the cross dependence of assets. Instead, there

is a strand of literature which invokes factors that are extracted from the ﬁnancial data itself thus

meaning that the factors are endogenous [37, 38, 20]. In essence, it has been shown that the col-

lective action of assets is what induces the factors, giving support to this type of determination of

factors [37], an approach we shall adopt here. Another diﬀerence is that the above factor models

are mainly applied to returns rather than volatilities.

In this paper, we instead build a new factor model of log volatilities that aims to reduce the

dimensionality by considering contributions globally from the market and more locally to the clusters

and their interactions. The number of factors is ﬁxed by the Directed Bubble Hierarchical Tree

(DBHT) clustering algorithm [39, 40], which therefore means we make no prior assumption on the

number of clusters and thus the number of factors to be considered. Using this factor model between

volatilities, we aim to study the link between the univariate volatility clustering and the multivariate

correlation structure of volatilities. We will see that whilst over the entire market the only signiﬁcant

contributor that aﬀects the memory is the market, individual clusters may have diﬀerent properties

where the cluster contributions and interactions are more signiﬁcant. This oﬀers a method to

statistically select factors based on memory reduction. We also note that the clusters which significantly reduce their own memory are mostly made up of stocks from particular industries,

oﬀering an economic interpretation for the makeup of the cluster modes. We can thus select the

factors in a statistical manner like in PCA, but also retain the appealing economic interpretation

like in CAPM and Fama-French.

The structure of the paper is as follows: Section 2 describes the dataset, Section 3 introduces a new factor model for log volatilities, Section 4 describes how we select factors based on their memory reduction using a new non-parametric integrated proxy for the strength of the volatility clustering, and in Section 5 we explore how the empirical link between volatility clustering strength and volatility cross correlation can be explained. In Section 6, we reveal how each cluster has an economic interpretation in terms of its identified dominant ICB supersector. Section 7 compares our factor

model to a PCA inspired factor model and an exploratory factor analysis model in terms of their

memory reduction performance. Section 8 reports the dynamic stability of the factor model. Finally,

we draw some conclusions in Section 9.

2 Dataset

The dataset we shall use consists of the daily closing prices of 1270 stocks in the New York Stock

Exchange (NYSE), National Association of Securities Dealers Automated Quotations (NASDAQ)

and American Stock Exchange (AMEX) from 01/01/2000 to 12/05/2017, which makes 4635 points

for each price time series. As anticipated in the introduction, we perform cross correlation analysis.

We therefore make sure that the stocks are aligned through the data cleaning procedure described

in A.1, which leaves our dataset with N = 1202 stocks. We calculate the log-returns time series of a given stock i, r_i(t), defined as:

r_i(t) = ln p_i(t+1) − ln p_i(t),    (1)


where p_i(t) is the price time series of stock i, and r_i(t) is a time series of length T = 4634. After standardising r_i(t) so that it has zero mean and a variance of 1, we define the proxy we shall use for the volatility as ln|r_i(t)|, i.e. the log absolute value of returns [41].
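In code, the construction above can be sketched as follows. This is a minimal illustration (`log_volatility_proxy` is a hypothetical helper name), and it assumes no exactly-zero returns, so that the logarithm is finite:

```python
import numpy as np

def log_volatility_proxy(prices):
    """Standardised log-returns and the log-volatility proxy ln|r_i(t)|.

    prices: 1-D array of daily closing prices p_i(t) for one stock.
    Assumes no zero returns, so that ln|r_i(t)| is finite.
    """
    r = np.diff(np.log(prices))                   # r_i(t) = ln p_i(t+1) - ln p_i(t)
    r = (r - r.mean()) / r.std()                  # zero mean, unit variance
    omega = np.log(np.abs(r))                     # volatility proxy ln|r_i(t)|
    omega = (omega - omega.mean()) / omega.std()  # standardised, as in Section 3
    return r, omega
```

For a price series of 4635 points this yields return and volatility series of length 4634.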

3 Log-volatility factor model

In this section we describe a new factor model for log volatilities, which we shall use to uncover the

relationship between the univariate volatility clustering eﬀect and the cross correlations between

volatilities. Let us recall that a general factor model is given by:

r_i(t) = Σ_{p=1}^{P} [β_{ip} f_p(t) + α_{ip}] + ε_i(t),    (2)

where r_i(t) are the log returns for asset i and f_p are the p = 1, 2, ..., P factors. β_{ip} is the respective sensitivity/responsiveness, which quantifies how r_i(t) reacts to changes in f_p, α_{ip} is the intercept, and ε_i(t) are residual terms with zero mean. Firstly, we define the log volatility term we want to study. Most stochastic volatility models (where the volatility is assumed to be random and not constant) assume that the returns for stock i follow an evolution according to [42]

r_i(t) = δ(t) e^{ω_i(t)},    (3)

where δ(t) is a white noise with finite variance and ω_i(t) are the log volatility terms. The exponential term encodes the structure of the volatility and how it contributes to the overall size of the return. Taking the absolute value of (3) and the log of both sides, Eq. (3) becomes

ln|r_i(t)| = ln|δ(t)| + ω_i(t),    (4)

from which we see that working with ln|r_i(t)| has the added benefit of making the proxy for volatility, ω_i(t), additive, which in turn makes the volatility more suitable for factor models. Since δ(t) is a random scale factor that is applied to all stocks, we can set it to 1, so that ω_i(t) = ln|r_i(t)|. We also standardise ln|r_i(t)| to a mean of 0 and standard deviation 1, as is performed in [43].

In the following subsections, we describe our factor model which considers contributions from

the market mode, clusters and interactions, and their corresponding ﬁtting procedures.

3.1 Market Mode

The log volatility term ω_i(t) in Eq. (4) can be modelled as

ω_i(t) = β_{i0} I_0(t) + α_{i0} + c_i(t),    (5)

where β_{i0} is the responsiveness of stock i with respect to changes in I_0(t), defined as

I_0(t) = Σ_{i=1}^{N} ξ_i ln|r_i(t)|,    (6)

with the pseudo-index ξ_i being the weight of stock i for the market mode. α_{i0} in eq. (5) is the excess

volatility compared to the market, I0(t). We note that the factor model in eq. (5) is in analogous

form to the general factor model in eq. (2). The ﬁrst two terms of eq. (5) represent the market

factor, which is the widely observed eﬀect of the market aﬀecting all stocks i.e. the co-movement


of all stocks at once [44, 13, 43]. We see from eq. (5) that performing the linear regression of ω_i(t) against I_0(t) gives β_{i0} and α_{i0}, so that c_i(t) becomes the residue after performing the regression.
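A minimal sketch of this calibration step, assuming equal weights ξ_i = 1/N for the market mode (the authors' weighted scheme of Appendix A.2 is not reproduced here):

```python
import numpy as np

def market_mode_regression(omega, xi=None):
    """Fit the market mode model of eq. (5): omega_i(t) = beta_i0*I0(t) + alpha_i0 + c_i(t).

    omega: (N, T) array of standardised log-volatility series.
    xi:    optional length-N market-mode weights (eq. 6); equal weights
           1/N are used when omitted (the weighted scheme is not shown).
    Returns beta (N,), alpha (N,) and the residual series c (N, T).
    """
    N, T = omega.shape
    xi = np.full(N, 1.0 / N) if xi is None else np.asarray(xi)
    I0 = xi @ omega                            # market mode I0(t), eq. (6)
    X = np.column_stack([I0, np.ones(T)])      # regressors: I0 and an intercept
    coef, *_ = np.linalg.lstsq(X, omega.T, rcond=None)
    beta, alpha = coef[0], coef[1]
    c = omega - np.outer(beta, I0) - alpha[:, None]  # residuals c_i(t)
    return beta, alpha, c
```

By construction, each residual series c_i(t) is orthogonal to the fitted market mode.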

In table 1, we show two examples of the regression coeﬃcients for the market mode for two selected

stocks, Coca Cola Enterprises (KO) and Transocean (RIG). We report the values of β_{i0} and α_{i0}

for the weighted scheme and for the equal weights scheme detailed in A.2, along with their p values

for the null hypothesis of each of the coefficients being 0. As we can see from Table 1, at the 5% level, the null hypothesis is rejected for all β_{i0} for both weighting schemes, which means that we can conclude that the β_{i0} are significant. For the α_{i0}, the null hypothesis is rejected for both stocks in the equal weights case, and in the weighted case it is rejected only for RIG; for these cases we can conclude that the α_{i0} are non-zero.

(a) weighted modes

        β_{i0}        α_{i0}
KO      0.0310 (0)    0.0015 (0.4764)
RIG     0.0248 (0)    0.1972 (0)

(b) equal weights

        β_{i0}        α_{i0}
KO      1.1564 (0)    -0.0690 (0.0017)
RIG     0.9041 (0)    0.1426 (0)

Table 1: This table shows the responsiveness to the market mode I_0(t), β_{i0}, and the corresponding excess volatility α_{i0} for stocks KO and RIG, calibrated as detailed in section 3.1. The p values shown in brackets are for the null hypothesis that β_{i0} and α_{i0} are 0. Table 1a is for the

weighted scheme and Table 1b for equal weights, which are detailed in A.2.

3.2 DBHT output

Since c_i(t) is the residue after performing the regression in eq. (5), it represents the volatility that is not explained by the market. We can therefore further define c_i(t) as:

c_i(t) = β_{ik} I_k(t) + Σ_{k′=1}^{n−1} β_{ik′} I_{k′}(t) + ε_i(t),    (7)

where β_{ik} is the responsiveness for the mode I_k(t) of the cluster k which i is a member of. In the sum in eq. (7), the β_{ik′} are the responsiveness to changes in the I_{k′}(t), which are the cluster modes of the clusters k′ ≠ k, i.e. the clusters i is not a member of. In eq. (7) the first term is the cluster factor and it represents the co-movement of the stock with its cluster. Like eq. (5), eq. (7) is an analogous form to eq. (2). The sum in eq. (7) represents the interactions the stock i has with other clusters, where the strength of the interactions is quantified and defined through the β_{ik′}.

The next step of the calibration procedure concerns the identiﬁcation of the clusters, which is

relevant for the c_i(t) term defined in eq. (7). Now, we need to find what the cluster structure is, which we do by first calculating G, the cross correlation matrix between the c_i(t), defined as

G_ij = (1/T) Σ_{t=1}^{T} c_i(t) c_j(t).    (8)

We then apply the clustering algorithm to G. We use the clustering algorithm after removing

the market mode since this gives a more stable clustering [45]. We shall use the Directed Bubble

Hierarchical Tree, DBHT [39, 40, 46], to ﬁnd the cluster membership of stocks. DBHT is used

because as compared to other hierarchical clustering algorithms it provides the best performance in

terms of information retrieval [40]. Using the DBHT algorithm also means that we make no prior


assumption on exactly how many factors for the clusters should be included, instead extracting

them directly from the data. We can see from Table 2 that the DBHT algorithm identifies a total of K = 29 clusters, with the largest cluster comprising 172 stocks and the smallest comprising 5 stocks. The average cluster size is 41.4.
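The computation of G and the clustering step can be sketched as below. DBHT itself is not part of standard scientific Python, so this illustration substitutes scipy's average-linkage hierarchical clustering; unlike DBHT, it requires the number of clusters to be supplied by hand:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_residuals(c, n_clusters):
    """Cluster residual series via their cross-correlation matrix G (eq. 8).

    c: (N, T) array of standardised residuals c_i(t).
    The paper uses DBHT, which fixes the number of clusters itself; this
    stand-in uses average-linkage clustering with n_clusters given.
    """
    N, T = c.shape
    G = (c @ c.T) / T                                 # G_ij = (1/T) sum_t c_i(t) c_j(t)
    d = np.sqrt(np.clip(2.0 * (1.0 - G), 0.0, None))  # correlation -> distance
    condensed = d[np.triu_indices(N, k=1)]            # condensed form for linkage
    Z = linkage(condensed, method="average")
    return G, fcluster(Z, t=n_clusters, criterion="maxclust")
```

The distance transform √(2(1 − G_ij)) is the standard way to turn a correlation matrix into a metric before hierarchical clustering.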

3.3 Cluster modes and interactions

Once the number and composition of each cluster is identiﬁed, we can associate a factor to each

cluster k. The interactions are then characterised through the responsiveness β_{ik′}, where k ≠ k′, i.e. how c_i(t) changes with respect to I_{k′}(t). We define the cluster mode for cluster k, I_k(t), again as a weighted average of the volatilities of the assets in k:

I_k(t) = Σ_{i∈cluster k} ξ_{ik} c_i(t),    (9)

where ξ_{ik} is the weight for stock i, which is in cluster k. From eq. (7), we see that, similarly to the market mode case, we can determine β_{ik}, β_{ik′} and α_{ik}, α_{ik′} by linearly regressing c_i(t) against I_k(t) and I_{k′}(t). We use elastic net regression [47] to find β_{ik} and β_{ik′} in order to take into account the possibility of I_k(t) and I_{k′}(t) being correlated, whilst also allowing some of the β_{ik′} to be 0, as i may not interact with cluster k′. More details about elastic net regression are provided in appendix A.3.
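A sketch of this elastic net step using scikit-learn; the penalty parameters `alpha` and `l1_ratio` are illustrative choices, not the authors' calibration:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def fit_cluster_and_interactions(c_i, modes):
    """Regress one residual series c_i(t) on the cluster modes I_k(t), as in eq. (7).

    c_i:   length-T residual series for stock i.
    modes: (K, T) array of cluster modes (own cluster plus interactions).
    Elastic net handles correlated modes and can shrink some interaction
    betas exactly to zero; the penalty values here are illustrative.
    """
    model = ElasticNet(alpha=0.01, l1_ratio=0.5, fit_intercept=True, max_iter=10000)
    model.fit(modes.T, c_i)                  # rows = time points, columns = modes
    return model.coef_, model.intercept_     # beta_ik for each mode k, and the intercept
```

The l1 part of the penalty is what allows some β_{ik′} to be exactly zero, matching the idea that a stock need not interact with every cluster.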

4 Empirical link between volatility clustering and volatility cross

correlation

As anticipated in the introduction, we choose which factors are relevant for the decomposition in

Eq. (7), by measuring what the impact is of each cluster on the volatility clustering. Before turning

our attention to this analysis, let us introduce the volatility clustering proxy we use in the rest of

the paper.

4.1 Volatility Clustering

Volatility clustering is one of the so called stylised facts of ﬁnancial data, and expresses the idea

that returns are not independent since volatilities are autocorrelated [48, 49]. The autocorrelation

function (ACF) κ(L) is defined as

κ(L) = corr(ln|r(t+L)|, ln|r(t)|)    (10)
     = ⟨ln|r(t+L)| ln|r(t)|⟩ / σ²,    (11)

where ⟨...⟩ denotes the expectation, L is the lag and σ² is the variance of the process ln|r(t)|,

and note that we use log absolute value returns as a proxy for volatility. The interpretation of this

result is that large changes in returns are usually followed by other large changes in returns, or that

the returns retain a memory of previous values [50]. For this reason, volatility clustering can also

be called the memory effect. κ(L) has also been assumed to follow a power law decay:

κ(L) ∼ L^{−β_vol},    (12)

where β_vol describes the strength of the memory effect. A lower value of β_vol indicates that more memory of past values is kept. To compute β_vol we transform eq. (12) into log-log scale and compute


[Figure 1: log-log plots of the empirical ACF κ(L) with linear best fits. (a) Coca Cola Enterprises Inc., β_vol = 0.4544; (b) Transocean, β_vol = 0.3975.]

Figure 1: Empirical ACF of the log absolute value returns (blue solid lines) for Coca Cola Co. (KO)

in ﬁgure 1a and Transocean (RIG) in ﬁgure 1b in log-log scale. The linear best ﬁt is also shown in

red dashed lines.

the slope of the linear best ﬁt, which gives us the exponent βvol. We shall compute βvol using the

Theil-Sen procedure rather than using standard least squares since it is more robust to outliers [51].
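The Theil-Sen fit of β_vol can be sketched as follows. `max_lag` is an illustrative choice (the paper does not state the lag range used here), and lags with a non-positive sample ACF are skipped since their log is undefined:

```python
import numpy as np
from scipy.stats import theilslopes

def beta_vol(x, max_lag=250):
    """Estimate the power-law exponent beta_vol of the ACF decay (eq. 12).

    x: log-absolute-return series ln|r(t)|.
    Fits log kappa(L) vs log L with the robust Theil-Sen estimator.
    """
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()
    lags = np.arange(1, max_lag + 1)
    kappa = np.array([np.mean(x[L:] * x[:-L]) for L in lags])  # sample ACF
    keep = kappa > 0                    # log only defined for positive values
    slope, intercept, lo, hi = theilslopes(np.log(kappa[keep]), np.log(lags[keep]))
    return -slope                       # kappa(L) ~ L**(-beta_vol)
```

Theil-Sen takes the median of pairwise slopes, so a few noisy ACF points at long lags do not dominate the fit the way they would under least squares.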

We report in figure 1 the function κ(L) for Coca Cola Enterprises Inc. in figure 1a and Transocean in figure 1b, both in log-log scale, with the linear best fit also plotted. We define the entries E_ij of the empirical volatility cross correlation E as

E_ij = Σ_{t=1}^{T} ln|r_i(t)| ln|r_j(t)|.    (13)

The proxy used for the volatility cross correlation is the average cross correlation for stock i, ρ_i^vol, defined as

ρ_i^vol = (1/(N−1)) Σ_{j≠i} E_ij.    (14)
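A sketch of eqs. (13)-(14) in code; note that E is normalised here by T so that, for standardised series, its entries are sample correlations (eq. (13) writes the plain sum):

```python
import numpy as np

def avg_cross_correlation(omega):
    """Average volatility cross correlation rho_i^vol of eqs. (13)-(14).

    omega: (N, T) array of standardised ln|r_i(t)| series.
    """
    N, T = omega.shape
    E = (omega @ omega.T) / T        # E_ij for standardised series
    np.fill_diagonal(E, 0.0)         # exclude the j = i term
    return E.sum(axis=1) / (N - 1)   # rho_i^vol, eq. (14)
```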

Using the proxies for volatility clustering and the volatility cross correlation, [52] finds a negative relationship between ρ_i^vol and β_i^vol, which we confirm in figure 2 holds on our data set of daily data using ln|r(t)|, rather than the original high frequency data and |r(t)| used in [52]. The main

consequence of this result is that the more the volatility of a stock i is linked to other stocks, the stronger the memory effect, and thus the more information the stock retains about previous values of volatility, linking the strength of volatility clustering with the cross correlation matrix between

volatilities.

4.2 Non parametric memory proxy

As already mentioned, the βvol power law exponent that is ﬁtted to the autocorrelation function

of the absolute returns is a proxy for the strength of the memory eﬀect: the lower the beta the

stronger the memory eﬀect. The use of the power law to quantify the memory eﬀect is parametric

as we assume the tail decays as a power law through the exponent β. The autocorrelation function

itself can be noisy due to its slow convergence [48], which can be seen in ﬁgure 1. In light of this,


[Figure 2: scatter plot of ρ_i^vol (x axis) against β_i^vol (y axis).]

Figure 2: Negative dependence between ρ_i^vol and β_i^vol. The negative relationship was tested using a one-sided Spearman's rank correlation at the 5% level; the null hypothesis of there being no correlation was rejected, which confirms the result of [52] on our data.
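The one-sided test just described can be sketched as follows (illustrative: scipy's `spearmanr` returns a two-sided p value, which is halved for the one-sided alternative; the input arrays would come from the earlier computations):

```python
import numpy as np
from scipy.stats import spearmanr

def one_sided_spearman_negative(rho_vol, beta_vols, level=0.05):
    """One-sided Spearman test for a negative relationship, as used for Fig. 2.

    rho_vol, beta_vols: per-stock values of rho_i^vol and beta_i^vol.
    Returns the Spearman correlation, the one-sided p value, and whether
    the null of no correlation is rejected at the given level.
    """
    corr, p_two = spearmanr(rho_vol, beta_vols)
    p_one = p_two / 2 if corr < 0 else 1 - p_two / 2  # halve for the one-sided test
    return corr, p_one, p_one < level
```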

we instead introduce a new model free proxy, η, by integrating the autocorrelation function over

time lags L up to L_cut, which we define as the standard Bartlett cut at the 5% level [53]:

η = ∫_{L=1}^{L_cut} κ(L) dL,    (15)

where κ(L) is the empirical autocorrelation function of the log absolute returns as a function of the lag L. With this proxy, the larger the value of η, the greater the degree of the memory effect (in the

β_vol proxy this corresponds to lower values of the exponent). The median value of η reported across all stocks is 20.7318 ± 8.6901, where the error is computed across all stocks using the median absolute deviation (MAD) for η_i, defined as

MAD = median(|η_i − median(η_i)|).    (16)
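A minimal reading of the η proxy in code, taking the Bartlett cut as the first lag at which the sample ACF falls inside the ±1.96/√T band (one plausible interpretation of the "standard Bartlett cut at the 5% level"):

```python
import numpy as np

def eta_proxy(x):
    """Integrated memory proxy eta of eq. (15) with a Bartlett cut.

    x: log-absolute-return series ln|r(t)|.
    Sums the sample ACF from lag 1 up to the first lag at which it falls
    inside the 5% Bartlett band +/- 1.96/sqrt(T). Returns eta and L_cut.
    """
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()
    T = len(x)
    band = 1.96 / np.sqrt(T)             # 5% Bartlett significance band
    eta, L = 0.0, 1
    while L < T - 1:
        kappa = np.mean(x[L:] * x[:-L])  # sample ACF at lag L
        if abs(kappa) < band:            # cut reached: ACF no longer significant
            break
        eta += kappa                     # discrete version of the integral in eq. (15)
        L += 1
    return eta, L
```

A persistent series accumulates many significant ACF terms before hitting the band, giving both a larger L_cut and a larger η than a short-memory series.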

We have also plotted β_vol as a memory effect proxy vs η in figure 3a, which as expected shows a decreasing relationship between η and the β_vol memory proxy, which is the one used in the literature, since a larger memory effect means a higher η but a lower β_vol. This shows that η is coherent with β_vol and thus can be used as a proxy for the strength of the memory effect.

Figure 3b, which is a plot of ρ_i^vol vs η, confirms the main result of [52] using η instead of β_vol; this was tested using Spearman's rank correlation at the 5% level, where the null hypothesis of there being no correlation between ρ_i^vol and η, versus the alternative hypothesis of a significant positive relationship, was rejected. Our proxy can therefore also confirm the result of [52].

Plotting L_cut vs η in figure 4a reveals that processes with strong short memory will have a lower L_cut and thus lower η, whilst processes with a long memory component will have higher L_cut and


[Figure 3: (a) scatter of β_vol (x axis) vs η (y axis); (b) scatter of ρ_i^vol (x axis) vs η (y axis).]

Figure 3: In figure 3a we plot the β_vol power law exponent proxy for the strength of the memory effect vs η, the integrated proxy. In figure 3b we plot the relationship between ρ_i^vol and η defined in the text. The decreasing relationship in figure 3a and the increasing relationship in figure 3b were tested using Spearman's rank correlation at the 5% level, and the null hypothesis of no correlation was rejected in both cases.

[Figure 4: (a) scatter of L_cut vs η; (b) scatter of L_cut vs β_vol.]

Figure 4: The figure on the left is a plot of L_cut vs η for all stocks; the figure on the right is L_cut vs β_vol for all stocks. The increasing relationship shown in figure 4a and the decreasing relationship shown in figure 4b are tested using Spearman's rank correlation; the correlations are 0.7871 and −0.4271 respectively, which are statistically significant at the 5% level.


η. This is important since volatility clustering is a result of long memory present in time series.

An analogous plot of L_cut vs β_vol in figure 4b shows the expected decrease in β_vol as L_cut increases, but the relationship is not as strong as that of L_cut vs η (an absolute Spearman correlation value of 0.4271 vs 0.7871, tested at the 5% level). A consequence of this is that η can better distinguish between short and long memory processes as compared to β_vol.

5 Memory ﬁltration

In this section, by means of the factor model introduced in eqs. (5)-(7) and also by means of the η proxy introduced in the previous subsection, we want to understand the origin of the empirical link

between the memory strength and the volatility cross-correlation. This analysis will in turn be also

fundamental for the cluster mode selection in our model. The main intuition is that the market

mode, the cluster mode and the interaction modes all bring relevant information about the memory

of a certain stock’s time-series.

5.1 Assessing the memory contributions

Let us here describe the method we use in order to understand the contribution to the memory of each term in the factor model in eqs. (5)-(7). For every time-series, say for stock i, we follow a step-by-step procedure, measuring the value of the proxy η_i the following four times:

1. on the plain time-series: η_{i,PL};

2. on the residual time-series once the market mode is removed: η_{i,MM};

3. on the residual time-series once the market mode and the cluster mode (of the cluster the stock belongs to) are removed: η_{i,CM};

4. on the residual time-series once the market, cluster and interaction modes are all removed: η_{i,IM}.

In order to make a quantitative comparison, the next step consists in assessing the memory reduction after each removal. We do so by computing the ratio of two subsequently computed values of η_i. For stock i we thus have that

1. η_{i,MM}/η_{i,PL} defines the reduction in memory induced by the market mode;

2. η_{i,CM}/η_{i,MM} defines the reduction in memory induced by the cluster mode once the market mode is removed;

3. η_{i,IM}/η_{i,CM} defines the reduction in memory induced by the interaction mode once the market mode and the cluster mode are removed.

According to the deﬁnition, if a ratio is below one it means that a memory reduction has occurred

via the corresponding removal. In order to understand what is the average behaviour of these ratios

we take the median of each of them computed on all stocks. So, for example, the average reduction

of memory induced by the market mode on a given set of stocks is median(η_{i,MM}/η_{i,PL}), computed over the index i. As for an error to associate to this measure, we use the Median Absolute Deviation

the index i. As for an error to associate to this measure we used the Median Average Deviation


[54], defined for η_{i,MM}/η_{i,PL} as

MAD(η_{i,MM}/η_{i,PL})    (17)
= median( |η_{i,MM}/η_{i,PL} − median(η_{i,MM}/η_{i,PL})| ),    (18)

and similarly for η_{i,CM}/η_{i,MM} and η_{i,IM}/η_{i,CM}. Both the median and the MAD were chosen because of their robustness against outliers. We regard as significant a reduction of memory on a given set of stocks for which the median plus the MAD of the ratio is below one.
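The median/MAD significance criterion of this subsection can be sketched as:

```python
import numpy as np

def median_reduction(eta_after, eta_before):
    """Median memory-reduction ratio with its MAD error (Section 5.1, eqs. 17-18).

    eta_after, eta_before: per-stock eta values measured after and before
    removing one mode (e.g. eta_{i,MM} and eta_{i,PL}).
    Returns the median ratio, its MAD, and whether the reduction is
    significant under the criterion median + MAD < 1.
    """
    ratio = np.asarray(eta_after, dtype=float) / np.asarray(eta_before, dtype=float)
    med = np.median(ratio)
    mad = np.median(np.abs(ratio - med))   # Median Absolute Deviation, eq. (18)
    return med, mad, med + mad < 1.0
```

Note that an unchanged set of η values gives a median ratio of exactly 1 with zero MAD, which correctly fails the criterion.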

5.2 Whole market analysis: ﬁnding the main source of memory

We apply here the procedure described in the previous subsection to our dataset described in Section

2. For completeness, in Fig. 5 we report the result of our analysis for both the unweighted and the

weighted schemes. Figure 5a reports the value of the ratios along with the errors (black vertical

bars). We observe that in all cases the average value plus the error stays below one, which means

that every term gives a meaningful contribution to the overall memory. However, we also notice that, in particular for the reduction coming from the cluster mode, there is a large variability among

stocks. Figure 5b reports the same result but showing what is the contribution of each removal

with respect to the overall memory. According to our analysis, the majority of the contribution comes from the market mode, which is then the main source of memory for the volatility. We also

plot in figure 6 the cumulative distribution of the fraction of stocks with at most the percentage of memory left reported on the x axis, after all contributions are removed. For example, from figure 6 we find that 90% of all stocks have at most 16.7% of their memory unexplained by all the contributions. We also

note here that there is little diﬀerence in ﬁgure 6 between the weighted and unweighted versions so

we shall herein use the unweighted scheme for most of the analysis. This analysis establishes that

there is indeed a link between the log volatility and volatility clustering.

5.3 Cluster-by-cluster analysis: selection criterion for factors

In this subsection, instead of aggregating the result of the memory reduction over the whole market,

we specialize and check what happens to the memory on a cluster-by-cluster basis. For brevity, we

only discuss in detail the case of cluster 12 and cluster 22, as deﬁned by the DBHT algorithm

discussed in section 3.2, since they are quite informative about the diﬀerent behaviour one can

ﬁnd at a cluster level. We repeat then the same analysis we performed in the previous subsection

but report the behaviour of these two particular clusters. In ﬁgure 7 we report the result of our

analysis for the unweighted scheme. Figure 7a reports the value of the ratios along with the errors

(black vertical bars). Diﬀerently for the whole dataset, we see that from ﬁgure 7a, the cluster mode

removes the vast majority of the memory for cluster 12, without any contribution coming from the

market mode or from the interactions. Instead for cluster 22, we see from ﬁgure 7a that the market

is the major contributor to the memory, whereas the cluster mode is reducing some the remaining

memory to some extent and the interactions are again not giving much contribution. Figure 7b

reports the same kind of result but relatively to the overall memory. These results suggest that a

local analysis reveals a richer behaviour in how the terms in our log volatility factor model aﬀect

the memory eﬀect, showing that there is also a link between the correlation structure of the log

volatilities and the memory effect. Given these results, we argue that a good criterion for selecting

statistically meaningful factors, among all cluster modes, to be included in the deﬁnition of our


[Figure 5: (a) bar chart of the median memory-reduction ratios for MM, CM and IM under the equal weights and weighted schemes; (b) stacked bars of the percentage contribution to memory of MM, CM, IM and the residual for each scheme.]

Figure 5: Results for the procedure described in section 5.1 across all stocks in the market. Figure 5a is the median of the ratio of the memory proxies for, starting from the left, η_{i,MM}/η_{i,PL}, η_{i,CM}/η_{i,MM} and η_{i,IM}/η_{i,CM}, computed over the whole market. The blue bars are for the equal weights scheme and the yellow bars are for the weighted scheme. The black vertical bars represent the errors in the stocks' memory reduction over the whole market, calculated using eq. (18) and its equivalents for the other ratios. In figure 5b we plot the contribution to the memory effect of the market (MM), cluster (CM) and interactions (IM) as a percentage with respect to the overall memory. The residual is the remaining percentage of memory that is unexplained by the contributors. The values are computed over the whole market. The left column is for the equal weights scheme and the right column is for the weighted scheme.


Figure 6: Cumulative distribution, across stocks, of the fraction of residual memory left after all contributors of the model (market mode, cluster mode and interactions) are removed. The red line is for the weighted modes and the blue line for the equal weighted modes.



Figure 7: The same set of graphs as Fig. 5, except using the equal weights scheme and taking only stocks belonging to clusters 12 and 22. In figure 7a we plot the median ratios, from left to right, η_{i,MM}/η_{i,PL}, η_{i,CM}/η_{i,MM} and η_{i,IM}/η_{i,CM}, computed over the stocks in cluster 12 for the blue bars and over the stocks in cluster 22 for the yellow bars. The black vertical bars represent the errors on the memory reduction across stocks in clusters 12 and 22, calculated using eq. (18) and its equivalents for the other ratios. In figure 7b we plot the contribution to the memory effect of the market (MM), cluster (CM) and interactions (IM) as a percentage of the overall memory. The residual is the remaining percentage of memory that is unexplained by the contributors. The values are computed over all stocks in cluster 12 for the left column and over all stocks in cluster 22 for the right column. Equal weighted modes are used.


Figure 8: Composition of DBHT clusters in terms of ICB supersectors. The x axis labels the clusters of DBHT and the y axis is the number of stocks in each cluster. The colours represent the particular ICB supersector given in the key: Automobiles & Parts, Banks, Basic Resources, Chemicals, Construction & Materials, Financial Services, Food & Beverage, Health Care, Industrial Goods & Services, Insurance, Media, Oil & Gas, Personal & Household Goods, Real Estate, Retail, Technology, Telecommunications, Travel & Leisure, Utilities.

factor model, is to choose those which achieve a significant reduction (in the sense of Section 5.1) of the memory of the stocks within their cluster. Table 2 summarizes the results of this procedure, reporting in the first column the cluster number k (as given by the DBHT algorithm). The second column contains the number of stocks in each cluster, and in the fourth column we show whether the cluster mode significantly reduces the memory of the stocks within that cluster. We find that, out of 29 clusters, 7 do not contribute significantly to the memory and, according to our criterion, are thus discarded. The fifth, sixth, seventh and eighth columns of table 2 are the fractional contributions that the market, cluster, interactions and residuals make to the overall memory in the cluster. Comparing the last four columns of table 2, we see that there is significant heterogeneity in the contributions the market and cluster make to the cluster's overall memory, which highlights the importance of including cluster factors in our factor model.
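The selection criterion just described can be expressed in a few lines; what follows is a minimal sketch with illustrative numbers, assuming the memory-reduction ratio and its error have already been estimated per cluster (the function name and values are hypothetical, not taken from our computations).

```python
# Sketch of the cluster-mode selection criterion: keep a cluster factor only if
# removing the cluster mode reduces the memory significantly, i.e. the ratio of
# the memory proxy after/before removal is below 1 by more than its error.

def select_clusters(ratios, errors, n_sigma=1.0):
    """Return the cluster labels whose memory-reduction ratio is
    significantly below 1 (ratio + n_sigma * error < 1)."""
    return [k for k in ratios if ratios[k] + n_sigma * errors[k] < 1.0]

# Illustrative values: cluster 12's mode removes most of its memory,
# cluster 8's does not (cf. Table 2, where cluster 8 is marked 'F').
ratios = {8: 1.02, 12: 0.10, 22: 0.85}
errors = {8: 0.05, 12: 0.04, 22: 0.06}

kept = select_clusters(ratios, errors)
print(kept)  # clusters whose mode contributes significantly: [12, 22]
```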

6 Economic interpretation of the clusters

Up to now, we have focused on determining the clusters via statistical tools. In this section we show that the clusters also have an economic interpretation. In figure 8, we show the composition of each cluster identified through DBHT using the Industrial Classification Benchmark (ICB) supersector classification of common industries, with each colour representing a different supersector. In particular, we observe from figure 8 that clusters are dominated by a particular supersector. For example, clusters 12 and 22 show the presence of dominant supersectors: the real estate sector for cluster 12 and the technology sector for cluster 22. In order to check that these identifications of dominant sectors are meaningful, we used the same hypothesis test as in [56, 40], which tests the null hypothesis that the cluster has merely randomly been


k no. stocks dom. supersector cluster sig market cluster interac resid

1 68 OG (0) T 0.000 0.758 0.055 0.187

2 26 OG (0) T 0.000 0.653 0.097 0.250

3 12 FS (0) T 0.387 0.463 0.041 0.110

4 39 U (0) T 0.855 0.032 0.024 0.090

5 13 BR (0) T 0.727 0.199 0.016 0.058

6 11 IGS (0.089957) T 0.719 0.073 0.026 0.182

7 23 FS (0) T 0.721 0.127 0.053 0.100

8 17 FB (0) F 0.818 0.000 0.021 0.161

9 9 HC (0) T 0.923 0.029 0.001 0.047

10 24 IGS (0.355912) T 0.471 0.403 0.028 0.098

11 11 HC (0) F 0.890 0.000 0.018 0.093

12 32 RE (0) T 0.000 0.977 0.005 0.018

13 30 FS (0) T 0.662 0.226 0.019 0.093

14 144 RE (0) T 0.574 0.272 0.049 0.105

15 77 HC (0) T 0.769 0.093 0.012 0.127

16 5 TL (0) T 0.968 0.012 0.003 0.016

17 66 B (0) T 0.733 0.149 0.040 0.078

18 111 B (0) T 0.833 0.088 0.024 0.055

19 15 PHG (0) T 0.781 0.134 0.031 0.054

20 8 TL (0) F 0.965 0.000 0.002 0.033

21 172 T (0) T 0.684 0.221 0.013 0.082

22 118 T (0) T 0.836 0.071 0.020 0.073

23 14 I (0) F 0.951 0.000 0.007 0.042

24 12 IGS (0.003514) T 0.911 0.050 0.005 0.034

25 17 C (0) T 0.956 0.005 0.003 0.035

26 31 R (0) T 0.900 0.036 0.008 0.057

27 43 IGS (0) F 0.945 0.000 0.005 0.049

28 37 R (0) F 0.940 0.000 0.003 0.057

29 15 IGS (0) F 0.954 0.000 0.003 0.044

Table 2: The cluster number k is shown in the first column and the number of stocks in the second column. The third column gives the dominant ICB supersector (abbreviated to the first letters of each word in the supersector name; the supersectors are listed in figure 8). In brackets in the third column we give the p value, to 6 decimal places, of the hypothesis test of whether the dominant supersector can be meaningfully identified from the cluster [55]. The fourth column details whether the cluster mode significantly reduces the memory for that cluster. The fifth, sixth, seventh and eighth columns are the fractional contributions (to 3 decimal places) that the market, cluster, interactions and residual respectively make to the total memory.


assigned its supersector classifications, using the hypergeometric distribution, versus the alternative hypothesis that the supersector is indeed dominating the cluster. Starting from a significance level of 5%, we additionally used a conservative Bonferroni correction for multiple hypothesis testing [57], dividing the significance level by N_cl N_ICB, where N_cl is the number of clusters identified through DBHT and N_ICB is the number of ICB supersectors. This reduces the level of significance to 9.0×10^{-5}; the p values are reported to six decimal places. Table 2 details the results of applying this process to all clusters, with the dominant supersector denoted in the third column. We see from Table 2 that in 26 clusters the cluster can indeed be matched to its dominating supersector, and of the clusters that significantly contribute to their own memory (see section 5.3), 19 correspond to their dominating supersector. This opens the possibility of a further refinement of the factor model between the log volatilities: choosing the cluster modes which reduce the memory statistically significantly after the market mode is removed, whilst also having the economic interpretation of being dominated by particular supersectors.
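The over-expression test and the Bonferroni-corrected threshold can be sketched as follows; this is a minimal sketch using scipy's hypergeometric distribution, with illustrative counts rather than our dataset's, assuming the test statistic is the number of stocks of the candidate supersector found in the cluster.

```python
from scipy.stats import hypergeom

def supersector_pvalue(N, K, n, k):
    """P(X >= k) under the null that the n stocks of a cluster are drawn
    at random from N stocks, K of which belong to the supersector."""
    return hypergeom.sf(k - 1, N, K, n)

# Bonferroni-corrected significance level: 0.05 / (N_cl * N_ICB)
N_cl, N_ICB = 29, 19           # clusters found by DBHT, ICB supersectors
alpha = 0.05 / (N_cl * N_ICB)  # ~9.0e-5, as in the text

# Illustrative cluster: 32 stocks, 30 of them Real Estate, in a hypothetical
# market of 1200 stocks containing 180 Real Estate stocks.
p = supersector_pvalue(N=1200, K=180, n=32, k=30)
dominant = p < alpha           # the supersector dominates the cluster
```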

Moreover, comparing clusters which are dominated by the same ICB supersector in table 2, we see that the groups of clusters k = 1, 2 and k = 17, 18, which are dominated by the Oil and Gas and Banks supersectors respectively, have similar contributions from the market, clusters and interactions. However, there are instances where clusters dominated by the same supersector do not have similar contributions. For example, clusters k = 12, 14 are both dominated by the Real Estate supersector, but for k = 12 the market does not statistically contribute to the memory, whilst for k = 14 it does. This could be an indication of markets moving away from clearly defined industrial supersectors, which was also noted in [55], and emphasises why we have used the clustering algorithm DBHT, rather than taking the industrial classifications directly.

7 Comparison with PCA and Exploratory Factor Analysis

In this section we compare the memory reduction performance of our model with a well-established PCA-inspired factor model [58] and a factor model driven by exploratory factor analysis. Firstly, we explain the importance of the PCA factor model. PCA gives a set of orthogonal eigenvectors that define mutually linearly uncorrelated portfolios, which can be used to help define factor models by assigning each principal component a separate factor. However, as we have pointed out, it is difficult to decide how many principal components should be kept. In our analysis, the number of principal components we keep in the PCA factor model is fixed to be the same as the number of factors in our factor model, i.e. 23. PCA aims to explain the diagonal terms, in the orthogonal basis, of the correlation matrix E, which is the correlation matrix between the ln |r_i(t)|. Exploratory factor analysis (FA), on the other hand, is more general and aims to explain the off-diagonal terms of E, using the general linear model in (2). Again, there are problems selecting exactly how many factors should be included [59], but we fix the number of factors in the FA model to be equal to the number of factors in our log volatility factor model, i.e. 23. After extracting the factors, we apply a varimax rotation [60], which is commonly applied in factor analysis to improve interpretability. In figure 9 we plot the cumulative distribution function of how much residual memory is left after removal of the factors for the log volatility factor model, the FA model and the PCA factor model, as a percentage of the total memory before removal.
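The construction of the PCA benchmark can be sketched as follows; a minimal sketch on synthetic data, assuming the log-volatility series are standardised and that each retained principal component of E defines one factor whose contribution is regressed out (an illustration of the construction, not the implementation of [58]).

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, n_factors = 500, 40, 23        # observations, stocks, retained components

# Synthetic standardised log-volatility panel (T x N)
X = rng.standard_normal((T, N))
X = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via the eigendecomposition of the correlation matrix E
E = (X.T @ X) / T
eigval, eigvec = np.linalg.eigh(E)   # eigenvalues in ascending order
V = eigvec[:, ::-1][:, :n_factors]   # top n_factors principal components

# Factor time series and residuals after removing the PCA factors
F = X @ V                            # (T x n_factors) factor scores
residual = X - F @ V.T               # part unexplained by the factors
```

In the paper, the memory proxy η is then recomputed on the residuals to measure the unexplained memory.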

We see from figure 9 that 90% of all stocks have at most 16.7% residual memory left for the log volatility factor model, whereas for the PCA factor model 90% of all stocks have at most 12.7% residual memory left, which means that the PCA factor model and the log volatility factor model both explain the memory with similar efficiency. For the exploratory factor model, we see that 90% of all stocks have at most 21.8% of their memory left, which is worse than the log volatility factor model



Figure 9: Empirical cumulative distribution function of the unexplained residual memory for the log volatility factor model (blue line), the PCA factor model (black), where we only take the first 23 principal components, and the exploratory factor analysis, where we use 23 factors and a varimax rotation.


and the PCA factor model, but still has a comparable performance. We can therefore conclude that the log volatility factor model explains a similar amount of memory to the other two models, even after fixing the number of factors to be the same in the PCA and exploratory factor models.

8 Dynamic Stability of Clusters and their Memory Properties

So far, the results presented are based on static correlation matrices, computed across the whole time period considered in our dataset. A natural question then arises about whether the results presented in sections 5 and 6 are dynamically stable. First, we divide each stock's time series into 50 rolling windows of length 1600, which gives a shift of 56 days for each window [55]. For every window, we then perform the same analysis as is done in section 3. That is, for each rolling window m = 1, 2, ..., 50 we remove the market mode computed on that time window, and then compute the corresponding correlation matrix G_m and its clustering Y_m using the DBHT algorithm. To assess whether the clusters themselves are dynamically stable, we use a procedure similar to the one presented in section 6, which is also carried out in [55]. Specifically, for each time window m, we use the hypergeometric test to see if each of the clusters in the static clustering X is statistically similar to a cluster in Y_m, recording the number of time windows where there is a possible match. This is recorded in the blue bars in figure 10. We also calculate whether each of the clusters in the static clustering X can still statistically reduce its own memory in every time window m, and measure the total number of time windows where this happens, which is plotted in the red bars in figure 10. These two numbers give a measure of persistence of both the appearance and the statistical memory reduction properties of each cluster.
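The rolling-window construction can be sketched as follows; a minimal sketch on synthetic data, assuming the full sample has exactly the length implied by 50 windows of 1600 observations shifted by 56 days (the DBHT clustering of each window is not sketched here).

```python
import numpy as np

window_len, shift, n_windows = 1600, 56, 50
T = window_len + (n_windows - 1) * shift  # 1600 + 49*56 = 4344 observations

rng = np.random.default_rng(1)
X = rng.standard_normal((T, 10))          # synthetic panel of 10 stocks

# Correlation matrix G_m on each rolling window m = 0, ..., 49;
# in the paper each G_m is then clustered with the DBHT algorithm.
G = [np.corrcoef(X[m * shift : m * shift + window_len], rowvar=False)
     for m in range(n_windows)]
```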

As we can see from figure 10, most clusters are quite stable, appearing in most time windows. The exceptions to this are clusters 6, 10 and 24, which interestingly can all be identified with the Industrial Goods and Services supersector (from table 2), and cluster 16, which is quite a small cluster with only 5 stocks and thus more likely to be unstable in time due to its small size. From figure 10, we can also conclude that the memory filtration properties of the clusters identified in section 5.3 are stable in time. This is because figure 10 indicates high persistence, in the memory sense (red bars), of the static clusters from table 2 that statistically contribute to their own memory (for example clusters 2 and 12). On the other hand, we see low persistence in the memory sense for clusters that do not contribute to their own memory in the static case (for example clusters 27, 28 and 29).

9 Conclusion

We proposed a new factor model for the log-volatility, discussing how each term of the model affects the stylized fact of volatility clustering. This reduces the information present in the linear correlation between the log volatilities to a global factor, which is the so-called market mode, and to local factors, which are the cluster modes and the interactions. Using a new non-parametric, integrated proxy for the volatility clustering, we found that there is indeed a link between the correlation structure of the log volatilities and the volatility clustering. First, the dataset was examined globally, which revealed the market to account for the majority of the volatility clustering effect present in our dataset. However, a local cluster-by-cluster analysis instead reveals significant variability: in some clusters, the cluster mode itself may be contributing to the volatility clustering. This enabled us to select only statistically relevant cluster factors, further reducing the information in the correlation between the log volatilities and the number of factors. From this reduced set of factors, we can select factors that have an economic interpretation through the identification of their dominant ICB supersector, which decreases the number of relevant factors further. This is significantly



Figure 10: The blue bars are the number of time windows where a cluster k (whose identities are detailed in table 2) can be statistically identified with a cluster in Y_m, the clustering computed over 50 rolling windows of length 1600. The red bars are the number of time windows where a cluster in X can statistically reduce its own memory on the rolling time window m.


advantageous over other potential factor models that could be used for the log volatility, such as PCA and exploratory factor analysis, since we do not subjectively select the number of factors, and also because the factors have a clearer economic interpretation through the identification of their dominant ICB supersector. A comparison of the log volatility factor model with PCA and an exploratory factor model reveals that they explain a similar amount of memory in the dataset. Both the clusters and their reported memory filtration were also found to be dynamically stable.

This work is particularly relevant for the field of volatility modelling, since most multivariate models, such as multivariate extensions of GARCH, stochastic covariance and realised covariance models, suffer from the curse of dimensionality and an increase in the number of parameters. The log volatility factor model presented here could be used to help reduce the number of parameters needed for these models through the identification of a reduced set of factors given by the procedure in this paper.

A Appendix

A.1 Data cleaning process

Our dataset cannot be used as it is, since the price time-series are not aligned, due to the fact that some stocks have not been traded on certain days. In order to overcome this issue, we apply a data cleaning procedure which allows us to keep as many stocks as possible. For example, we do not want to remove a stock just because it was not traded on a few days in the given time-span. The main idea is to fill the gaps by dragging the last available price forward, assuming that a gap in the price time-series corresponds to a zero log-return. At the same time, we do not want to drag too many prices, because a time-series filled with zeros would not be statistically significant. In light of this, we remove from our dataset the time-series which are too short. The detailed procedure goes as follows:

1. Remove from the dataset the price time-series with length less than p times the longest one;

2. Find the common earliest day among the remaining time-series;

3. Create a reference time-series of dates when at least one of the stocks has been traded starting

from the earliest common date found in the previous step;

4. Compare the reference time-series of dates with the time-series of dates of each stock and ﬁll

the gaps dragging the last available price.

In this paper we chose p = 0.90, thus keeping as many time-series as possible unmodified. However, the results do not change if we pick a higher value of p.
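Steps 2 to 4 above amount to aligning every stock to a reference calendar and forward-filling the gaps; here is a minimal sketch in plain Python with toy prices (the function name and data layout are hypothetical, not the code used for the paper).

```python
# Sketch of steps 2-4: align each stock's {date: price} series to the union of
# trading dates and drag the last available price into the gaps.

def align_and_fill(series_by_stock):
    # Step 2: common earliest day among the remaining time-series
    start = max(min(s) for s in series_by_stock.values())
    # Step 3: reference dates where at least one stock traded, from that day on
    calendar = sorted({d for s in series_by_stock.values() for d in s if d >= start})
    # Step 4: fill each stock's gaps by dragging the last available price
    filled = {}
    for stock, s in series_by_stock.items():
        out, last = [], None
        for d in calendar:
            last = s.get(d, last)
            out.append(last)
        filled[stock] = out
    return calendar, filled

prices = {
    "AAA": {1: 10.0, 2: 10.5, 4: 10.4},   # not traded on day 3
    "BBB": {2: 20.0, 3: 20.2, 4: 20.1},   # starts trading later, on day 2
}
calendar, filled = align_and_fill(prices)
# "AAA" is dragged on day 3, which corresponds to a zero log-return
```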

A.2 Weighting schemes

Here we define the two types of weighting schemes used in this paper for the ξ_i and ξ_ik defined in (6) and (9) respectively. The first weighting scheme is based on the eigenspectrum of E and G. It is useful now to explain the financial interpretation of an eigenvector v of E, with entries v_i and eigenvalue λ. The v_i can be seen as the weights of a portfolio defined by v. Measuring the risk from the volatility of the portfolio via its variance, we see it is given by:

\frac{1}{T} \sum_{t} \left( \sum_{i} v_i \ln |r_i(t)| \right)^2 = \sum_{ij} v_i v_j E_{ij} = \lambda \qquad (19)


Hence λ represents the risk from the volatility of the portfolio given by v. We set ξ_i = v_i, where now v_i is the ith entry of the eigenvector corresponding to the largest eigenvalue of the empirical correlation matrix E. This is called the market eigenvalue, as it represents all stocks moving together [13]; the corresponding eigenvector defines the portfolio of stocks whose risk, given by the eigenvalue, is that of the market volatility mode. We could have also used a real index to determine the weights, e.g. the Dow Jones, but [45] showed that this does not remove the influence of modes from returns as effectively as a pseudo-index.

The weights ξ_ik are established in a similar way to the market mode case, by considering only the part of G which corresponds to members of the cluster. We define a submatrix of G:

G^{(k)} = \{G\}_{(i,j) \in \mathrm{cluster}\ k} \qquad (20)

where \{\ldots\}_{(i,j) \in \mathrm{cluster}\ k} refers to keeping only the elements of the matrix for which i and j are stocks in cluster k. Thus G^{(k)} is the square submatrix of G corresponding to cluster k. This submatrix is the correlation matrix of a market which consists only of stocks that are part of cluster k.

Hence, in exactly the same way as for the market eigenvalue, the largest eigenvalue of G^{(k)} represents the stocks of the cluster moving together, the value of the eigenvalue being the risk of the cluster market portfolio, and the related eigenvector giving the weights of such a portfolio. Therefore, the weights ξ_ik for cluster k are determined by setting ξ_ik = v_i^{(k)}, the ith entry of the eigenvector corresponding to the largest eigenvalue of G^{(k)}. This weighting scheme is compared to the case of equal weights, where ξ_i = 1/N and ξ_ik = 1/m_k, in figures 5a, 5b and 6; thereafter the equal weights scheme is used.
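The eigenvector weighting scheme can be sketched as follows; a minimal sketch with synthetic data, assuming E is the empirical correlation matrix of the ln |r_i(t)| (an illustration, not the code used for the paper).

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 1000, 8

# Synthetic log-volatility panel with a common component, so that a
# 'market' eigenvalue emerges in its correlation matrix E.
common = rng.standard_normal((T, 1))
X = 0.6 * common + rng.standard_normal((T, N))
E = np.corrcoef(X, rowvar=False)

# Market-mode weights: eigenvector of the largest eigenvalue of E
eigval, eigvec = np.linalg.eigh(E)   # ascending eigenvalue order
lam, v = eigval[-1], eigvec[:, -1]
v = v * np.sign(v.sum())             # fix the sign convention (v is defined up to +/-1)

# Portfolio risk of the market mode equals the eigenvalue, as in eq. (19)
risk = v @ E @ v
```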

A.3 Elastic Net Regression

Elastic net regression is used to find the values of β_ik and β_ik′ using Eq. (7); further details of the method are provided in this appendix. Elastic net regression [47] is a hybrid of ridge regularisation and lasso regression, thus providing a way of dealing with correlated explanatory variables (in our case I_k(t) and I_k′(t)) while also performing feature selection, which discards the non-interacting clusters I_k′(t) that ridge regularisation alone would retain. Elastic net regression solves the regularised minimisation problem

\min_{\beta_i} \frac{1}{T} \sum_{t=1}^{T} \left( c_i(t) - I(t)^{\dagger} \beta_i \right)^2 + \lambda P_a(\beta_i) \qquad (21)

where β_i is the vector of loadings (β_{i1}, β_{i2}, \ldots, β_{iK})^{\dagger}, I(t) is the matrix consisting of columns (I_1(t), I_2(t), \ldots, I_{N_{cl}}(t)), and λ and a are hyperparameters. P_a(β_i) is defined as

P_a(\beta_i) = \sum_{j=1}^{M} \left( (1-a) \frac{\beta_{ij}^2}{2} + a |\beta_{ij}| \right) \qquad (22)

The first term in the sum of Eq. (22) is the L2 penalty of ridge regularisation and the second term is the L1 penalty of lasso regression. Hence, if a = 0 elastic net reduces to ridge regression and if a = 1 it becomes lasso, with a value between the two controlling the extent to which one is preferred to the other. The hyperparameters a, controlling the balance of lasso versus ridge, and λ, for the ridge, are determined using 10 cross-validated fits [47], picking the pair (a, λ) that gives the minimum prediction error. We show the values of β_ik, and test the significance of the predictor I_k(t) at the 5% level using the significance test outlined in [61], in Table 3, where the p value is shown in brackets.
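As an illustration of the objective in (21)-(22), here is a minimal proximal-gradient (ISTA) solver written from scratch; it is a sketch of the elastic net penalty on synthetic data, not the cross-validated procedure of [47].

```python
import numpy as np

def elastic_net(I, c, lam, a, n_iter=5000):
    """Minimise (1/T)||c - I b||^2 + lam*((1-a)||b||^2/2 + a*||b||_1)
    by proximal gradient descent (soft-thresholding handles the L1 part)."""
    T, K = I.shape
    b = np.zeros(K)
    # Step size = 1 / Lipschitz constant of the smooth part of the objective
    step = 1.0 / (2 * np.linalg.norm(I, 2) ** 2 / T + lam * (1 - a))
    for _ in range(n_iter):
        grad = -2.0 / T * I.T @ (c - I @ b) + lam * (1 - a) * b   # smooth part
        z = b - step * grad
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam * a, 0.0)  # L1 prox
    return b

rng = np.random.default_rng(3)
T, K = 400, 6
I = rng.standard_normal((T, K))
beta_true = np.array([1.0, 0.0, 0.0, -0.5, 0.0, 0.0])
c = I @ beta_true + 0.1 * rng.standard_normal(T)

beta = elastic_net(I, c, lam=0.1, a=0.9)   # near-lasso: inactive loadings go to 0
```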


β_ik     weighted     equal weights
KO       0.9431 (0)   0.8997 (0)
RIG      0.9041 (0)   1.1265 (0)

Table 3: This table shows the responsiveness β_ik to the cluster mode I_k(t), calibrated as detailed in section 3.3. The p values shown in brackets test the significance of the predictor given by the cluster mode I_k(t). The first column of values is for the weighted scheme and the second for equal weights, as detailed in A.2.


Figure 11: Heat map of the correlation matrix G, with the stocks reordered to correspond to their cluster number from table 2. The colour legend for the heat map is given to the right of the figure.

A.4 Visualisation of Residuals and Factors

We can represent the correlation matrix G defined in eq. (8) as a heat map, which is shown in figure 11 with the stocks reordered according to their cluster number k given by table 2. From figure 11, we see the clusters of the correlation matrix, which are given by the square blocks along the diagonal that are more densely populated by higher correlation values. We also see the interactions between the clusters, which are represented by the rectangular blocks of higher correlation values away from the main diagonal.

In order to provide a visualisation of the factors, we plot in figure 12 the time series of the market mode I_0(t) and of two particular cluster modes I_k(t) for k = 1, 12, where the subscripts of the cluster modes indicate the particular clusters from table 2. We see from figure 12 that the time series encode important information regarding market conditions. In the plot of I_0(t) in figure 12a, the two periods of high volatility indicated by the red and black dashed lines represent the Great Financial Crisis of 2008 and the Eurozone debt crisis (note that the extreme low volatility seen before 2002 was caused by the American stock exchanges being shut down after the September 11th terrorist attacks). The time series of I_1(t) in figure 12b again shows a high


[Figure 12 comprises three panels plotting the mode time series over 2000-2017: (a) I_0(t), (b) I_1(t), (c) I_12(t).]

Figure 12: Time series of the market mode I_0(t) in (a), and of the cluster modes I_k(t) for k = 1, 12 (see table 2) in (b) and (c) respectively, where the subscripts of the cluster modes refer to the clusters given in table 2. The red dashed lines in these plots mark the outbreak of the Great Financial Crisis of 2008. The black dashed line in figure 12a marks a portion of the Eurozone debt crisis. The light blue dashed line in figure 12b marks a period of low global demand for oil and gas.

volatility period during the financial crisis, but we also see another high volatility phase, denoted by the light blue dashed line. This represents the volatility in the oil and gas markets caused by low demand, which makes sense since table 2 shows that cluster 1 represents the Oil and Gas ICB supersector.

A.5 Smoothness of η

We plot η as a function of the upper limit of the integral in eq. (15), where the upper limit L′ is allowed to vary in the interval [1, L_cut]. As we can see from both plots in figure 13, the line is much smoother, showing that the η proxy is much more robust with respect to the noisy signal of the empirical ACF. This offers an advantage of using η rather than β_vol, which is more sensitive to the noise in the ACF and gives poor fits to the ACF in log-log scale, as can be seen from the examples in figure 1.
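The curves of figure 13 can be sketched numerically; eq. (15) defines η precisely, so as a stand-in this minimal sketch accumulates the empirical ACF of the absolute returns up to lag L′, on synthetic data with volatility clustering.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Biased sample autocorrelation of x at lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.dot(x, x)
    return np.array([np.dot(x[:-L], x[L:]) / var for L in range(1, max_lag + 1)])

def eta_curve(x, L_cut):
    """eta as a function of the upper limit L': the ACF accumulated over [1, L']."""
    return np.cumsum(sample_acf(x, L_cut))

# Synthetic returns with volatility clustering: log-volatility follows an AR(1)
rng = np.random.default_rng(4)
T, phi = 5000, 0.95
logvol = np.zeros(T)
for t in range(1, T):
    logvol[t] = phi * logvol[t - 1] + 0.2 * rng.standard_normal()
r = np.exp(logvol) * rng.standard_normal(T)   # returns with clustered volatility

eta = eta_curve(np.abs(r), L_cut=100)         # analogue of figure 13's curves
```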


[Figure 13 comprises two panels plotting η against the lag L′: (a) Coca Cola Enterprises Inc., (b) Transocean.]

Figure 13: Integrated proxy η as a function of the lag L′, where η is integrated over [1, L′] up to L′ = L_cut. Fig. 13a is for Coca Cola Enterprises Inc. and fig. 13b for Transocean.

References

[1] Jean-Philippe Bouchaud and Marc Potters. Theory of ﬁnancial risk and derivative pricing:

from statistical physics to risk management. Cambridge university press, 2009.

[2] John Hull and Alan White. The pricing of options on assets with stochastic volatilities. The

journal of ﬁnance, 42(2):281–300, 1987.

[3] John C Hull. Options, futures, and other derivatives. Pearson Education India, 2006.

[4] Joël Bun, Jean-Philippe Bouchaud, and Marc Potters. Cleaning large correlation matrices:

tools from random matrix theory. Physics Reports, 666:1–109, 2017.

[5] Luc Bauwens, Sébastien Laurent, and Jeroen VK Rombouts. Multivariate garch models: a

survey. Journal of applied econometrics, 21(1):79–109, 2006.

[6] Peter K Clark. A subordinated stochastic process model with ﬁnite variance for speculative

prices. Econometrica: journal of the Econometric Society, pages 135–155, 1973.

[7] Torben G Andersen, Tim Bollerslev, Francis X Diebold, and Paul Labys. Modeling and forecasting realized volatility. Econometrica, 71(2):579–625, 2003.

[8] Laurens Van Der Maaten, Eric Postma, and Jaap Van den Herik. Dimensionality reduction: a comparative review. J Mach Learn Res, 10:66–71, 2009.

[9] Ian T Jolliﬀe. Principal component analysis and factor analysis. In Principal component

analysis, pages 115–128. Springer, 1986.

[10] J Darbyshire. The volatility surface: a practitioner’s guide, volume 357. Aitch & Dee Limited,

2017.

[11] Carol Alexander. Principal component models for generating large garch covariance matrices.

Economic Notes, 31(2):337–359, 2002.


[12] Kun Zhang and Laiwan Chan. Eﬃcient factor garch models and factor-dcc models. Quantitative

Finance, 9(1):71–91, 2009.

[13] Vasiliki Plerou, Parameswaran Gopikrishnan, Bernd Rosenow, Luis A Nunes Amaral, Thomas

Guhr, and H Eugene Stanley. Random matrix approach to cross correlations in ﬁnancial data.

Physical Review E, 65(6):066126, 2002.

[14] Satya N Majumdar and Pierpaolo Vivo. Number of relevant directions in principal component

analysis and wishart random matrices. Physical review letters, 108(20):200601, 2012.

[15] Donald A Jackson. Stopping rules in principal components analysis: a comparison of heuristical

and statistical approaches. Ecology, 74(8):2204–2214, 1993.

[16] Giacomo Livan, Simone Alfarano, and Enrico Scalas. Fine structure of spectral properties

for random correlation matrices: An application to ﬁnancial markets. Physical Review E,

84(1):016113, 2011.

[17] William F Sharpe. Capital asset prices: A theory of market equilibrium under conditions of

risk. The journal of ﬁnance, 19(3):425–442, 1964.

[18] Richard Roll and Stephen A Ross. An empirical investigation of the arbitrage pricing theory.

The Journal of Finance, 35(5):1073–1103, 1980.

[19] Eugene F Fama and Kenneth R French. Common risk factors in the returns on stocks and

bonds. Journal of ﬁnancial economics, 33(1):3–56, 1993.

[20] Rémy Chicheportiche and J-P Bouchaud. A nested factor model for non-linear dependencies

in stock returns. Quantitative Finance, 15(11):1789–1804, 2015.

[21] Eugene F Fama and Kenneth R French. Multifactor explanations of asset pricing anomalies.

The journal of ﬁnance, 51(1):55–84, 1996.

[22] Charles Engel, Nelson C Mark, and Kenneth D West. Factor model forecasts of exchange rates.

Econometric Reviews, 34(1-2):32–55, 2015.

[23] Bruce Thompson. Exploratory and conﬁrmatory factor analysis: Understanding concepts and

applications. American Psychological Association, 2004.

[24] Robert C Merton. An intertemporal capital asset pricing model. Econometrica: Journal of the

Econometric Society, pages 867–887, 1973.

[25] Michael Zabarankin, Konstantin Pavlikov, and Stan Uryasev. Capital asset pricing model

(capm) with drawdown measure. European Journal of Operational Research, 234(2):508–517,

2014.

[26] Nicholas Barberis, Robin Greenwood, Lawrence Jin, and Andrei Shleifer. X-CAPM: An extrapolative capital asset pricing model. Journal of Financial Economics, 115(1):1–24, 2015.

[27] Harry Markowitz. Portfolio selection. The journal of ﬁnance, 7(1):77–91, 1952.

[28] Eugene F Fama and Kenneth R French. The cross-section of expected stock returns. the

Journal of Finance, 47(2):427–465, 1992.


[29] Gregory Connor, Matthias Hagmann, and Oliver Linton. Efficient semiparametric estimation of the Fama–French model and extensions. Econometrica, 80(2):713–754, 2012.

[30] Robert Faff, Philip Gharghori, and Annette Nguyen. Non-nested tests of a GDP-augmented Fama–French model versus a conditional Fama–French model in the Australian stock market. International Review of Economics & Finance, 29:627–638, 2014.

[31] Eugene F Fama and Kenneth R French. A five-factor asset pricing model. Journal of Financial Economics, 116:1–22, 2015.

[32] Nai-Fu Chen, Richard Roll, and Stephen A Ross. Economic forces and the stock market. Journal of Business, pages 383–403, 1986.

[33] Marc R Reinganum. The arbitrage pricing theory: some empirical results. The Journal of Finance, 36(2):313–321, 1981.

[34] Robert Faff. A simple test of the Fama and French model using daily data: Australian evidence. Applied Financial Economics, 14(2):83–92, 2004.

[35] Robert R Grauer and Johannus A Janmaat. Cross-sectional tests of the CAPM and Fama–French three-factor model. Journal of Banking & Finance, 34(2):457–470, 2010.

[36] François-Eric Racicot and William F Rentz. Testing Fama–French's new five-factor asset pricing model: evidence from robust instruments. Applied Economics Letters, 23(6):444–448, 2016.

[37] Yannick Malevergne and D Sornette. Collective origin of the coexistence of apparent random matrix theory noise and of factors in large sample correlation matrices. Physica A: Statistical Mechanics and its Applications, 331(3):660–668, 2004.

[38] Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna. Hierarchically nested factor model from multivariate data. EPL (Europhysics Letters), 78(3):30006, 2007.

[39] Won-Min Song, Tiziana Di Matteo, and Tomaso Aste. Hierarchical information clustering by means of topologically embedded graphs. PLoS One, 7(3):e31929, 2012.

[40] Nicolo Musmeci, Tomaso Aste, and Tiziana Di Matteo. Relation between financial market structure and the real economy: comparison between clustering methods. PLoS One, 10(3):e0116201, 2015.

[41] Stephen J Taylor. Modeling stochastic volatility: A review and comparative study. Mathematical Finance, 4(2):183–204, 1994.

[42] F Jay Breidt, Nuno Crato, and Pedro De Lima. The detection and estimation of long memory in stochastic volatility. Journal of Econometrics, 83(1-2):325–348, 1998.

[43] Ajay Singh and Dinghai Xu. Random matrix application to correlations amongst the volatility of assets. Quantitative Finance, 16(1):69–83, 2016.

[44] Laurent Laloux, Pierre Cizeau, Jean-Philippe Bouchaud, and Marc Potters. Noise dressing of financial correlation matrices. Physical Review Letters, 83(7):1467, 1999.

[45] Christian Borghesi, Matteo Marsili, and Salvatore Miccichè. Emergence of time-horizon invariant correlation structure in financial returns by subtraction of the market mode. Physical Review E, 76(2):026104, 2007.


[46] N. Musmeci, T. Aste, and T. Di Matteo. Interplay between past market correlation structure changes and future volatility outbursts. Scientific Reports, 6:36320, 2016.

[47] Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320, 2005.

[48] Rama Cont. Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance, 2001.

[49] Anirban Chakraborti, Ioane Muni Toke, Marco Patriarca, and Frédéric Abergel. Econophysics review: II. Agent-based models. Quantitative Finance, 11(7):1013–1041, 2011.

[50] Benoit B Mandelbrot. The variation of certain speculative prices. In Fractals and Scaling in Finance, pages 371–418. Springer, 1997.

[51] Henri Theil. A rank-invariant method of linear and polynomial regression analysis. In Henri Theil's Contributions to Economics and Econometrics, pages 345–381. Springer, 1992.

[52] S Micciche. Empirical relationship between stocks cross-correlation and stocks volatility clustering. Journal of Statistical Mechanics: Theory and Experiment, 2013(05):P05015, 2013.

[53] George EP Box, Gwilym M Jenkins, Gregory C Reinsel, and Greta M Ljung. Time Series Analysis: Forecasting and Control, page 33. John Wiley & Sons, 2015.

[54] Lothar Sachs. Applied Statistics: A Handbook of Techniques. Springer Science & Business Media, 2012.

[55] Nicolo Musmeci, Tomaso Aste, and Tiziana Di Matteo. Risk diversification: a study of persistence with a filtered correlation-network approach. Journal of Network Theory in Finance, 1(1):77–98, 2015.

[56] Michele Tumminello, Salvatore Micciche, Fabrizio Lillo, Jyrki Piilo, and Rosario N Mantegna. Statistically validated networks in bipartite complex systems. PLoS One, 6(3):e17994, 2011.

[57] William Feller. An Introduction to Probability Theory and Its Applications, volume 2. John Wiley & Sons, 2008.

[58] Ian T Jolliffe. A note on the use of principal components in regression. Applied Statistics, pages 300–303, 1982.

[59] Kristopher J Preacher, Guangjian Zhang, Cheongtag Kim, and Gerhard Mels. Choosing the optimal number of factors in exploratory factor analysis: A model selection perspective. Multivariate Behavioral Research, 48(1):28–56, 2013.

[60] Dennis Child. The Essentials of Factor Analysis. A&C Black, 2006.

[61] Richard Lockhart, Jonathan Taylor, Ryan J Tibshirani, and Robert Tibshirani. A significance test for the lasso. Annals of Statistics, 42(2):413, 2014.
