arXiv:1712.02138v2 [q-fin.ST] 10 May 2018
A cluster driven log-volatility factor model: a deepening on the
source of the volatility clustering
Anshul Verma1, R. J. Buonocore1, and T. Di Matteo1,2,3
1Department of Mathematics, King’s College London, The Strand, London, WC2R 2LS,
2Department of Computer Science, University College London, Gower Street, London,
3Complexity Science Hub Vienna, Josefstaedter Strasse 39, A-1080 Vienna, Austria
May 11, 2018
We introduce a new factor model for log volatilities that performs dimensionality reduction
and considers contributions globally through the market, and locally through cluster structure
and their interactions. We do not assume a-priori the number of clusters in the data, instead
using the Directed Bubble Hierarchical Tree (DBHT) algorithm to fix the number of factors. We
use the factor model and a new integrated non parametric proxy to study how volatilities con-
tribute to volatility clustering. Globally, only the market contributes to the volatility clustering.
Locally for some clusters, the cluster itself contributes statistically to volatility clustering. This
is significantly advantageous over other factor models, since the factors can be chosen statisti-
cally, whilst also keeping economically relevant factors. Finally, we show that the log volatility
factor model explains a similar amount of memory to a Principal Components Analysis (PCA)
factor model and an exploratory factor model.
1 Introduction
Volatilities are an important input for the estimation of risk [1] and for models that aim to dynamically model prices and to determine what the rational, fair price should be under such models [2, 3]. However, the
effect of volatility clustering, and particularly its unclear link with how volatilities are correlated
with each other, complicates this process. This causes a problem due to the high dimensionality of
the correlation matrix between the log volatilities that is also subject to noise [4], which makes it
difficult to identify meaningful information about what drives the volatility and volatility clustering.
This problem is also relevant in multivariate volatility modelling since most popular methods such
as multivariate General Autoregressive Conditional Heteroskedasticity (GARCH) [5], stochastic
covariance [6] and realised covariance [7] suffer from the curse of dimensionality and an increase in
the number of parameters needed. One such way of tackling this problem is through dimensionality
reduction, which is a general class of methods that aims to reduce high dimensional datasets to
a reduced form which is a faithful representation of the original dataset [8], and is also related to
noise reduction of the dataset.
One such method of dimensionality reduction of correlation matrices is Principal Component
Analysis (PCA) [9]. It aims to transform the original correlation matrix into an orthogonal basis.
For square correlation matrices, which are those that we consider in this paper, this essentially
means calculating the eigenvalues and their respective eigenvectors. The first eigenvector (called the first principal component) is associated with the largest eigenvalue and explains most of the variability in the data; the second eigenvector (called the second principal component) explains the second largest share of variability, and so on. The method has been
applied to finance mainly through portfolio optimisation to produce sets of orthogonal portfolios
[10]. A paper which uses PCA in the context of volatility modelling is [11], where the author extracts
the first few principal components and uses them to calibrate a multivariate GARCH model, with
a further extension proposed in [12]. The main drawback of PCA is that it is not clear how many principal components, i.e. factors, to keep: either too many components are retained, or the methods used to select them are heuristic and subjective in nature [9]. In [13], the authors suggest keeping the number of principal components indicated by the Marchenko-Pastur distribution, with a further refinement made in [14] and previously in [15]; however, it is pointed out in [16] that valuable information may still be lost.
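The Marchenko-Pastur selection rule mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the refined procedures of [14, 15]; the function name and toy data are ours:

```python
import numpy as np

def mp_significant_eigenvalues(returns):
    """Eigenvalues of the correlation matrix above the Marchenko-Pastur edge.

    returns: (T, N) array of standardised return series. Eigenvalues
    larger than lambda_+ = (1 + sqrt(N/T))**2 are treated as signal;
    the rest are compatible with pure noise.
    """
    T, N = returns.shape
    corr = np.corrcoef(returns, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)
    lambda_plus = (1.0 + np.sqrt(N / T)) ** 2
    return eigvals[eigvals > lambda_plus]

# One-factor toy data: the "market" eigenvalue clearly exceeds the MP edge
rng = np.random.default_rng(0)
market = rng.standard_normal((2000, 1))
data = 0.5 * market + rng.standard_normal((2000, 50))
sig = mp_significant_eigenvalues(data)
```

For pure-noise data the returned array is typically empty, which is exactly the criticism of [16]: weak but genuine factors may sit below the edge.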
A highly related class of methods in dimensionality reduction are called factor models [17, 18, 19,
20]. Factor models are used to describe the dynamical evolution of time series, assuming that there exist common factors that drive each asset through its sensitivity, often called responsiveness, to changes in the value of these factors. Dimensionality reduction is then achieved because the number of factors is smaller than the number of stocks. Factor models have widespread
use in finance due to their relative (or at least superficial) simplicity in comparison to other models
of returns series [17, 19, 21, 22, 20]. Factor models can be split into two varieties: exploratory, which
assume no underlying structure to the data, and confirmatory, which tests relationships between
known factors [23].
However, similarly to PCA, the question of how to choose the factors arises. One category of answers assumes that we have some prior knowledge of the factors. The
simplest and earliest factor model which falls under this category is the Capital Asset Pricing Model
(CAPM)[17, 24, 25, 26]. It emerges from the extremely popular Markowitz scheme of portfolio
optimisation [27], which says it is better to spread an investment across a class of stocks in order
to reduce the total risk of the portfolio. CAPM develops this further by stating that the non-diversifiable, or systematic, risk comes from the stock's exposure to changes in the market and the corresponding sensitivity to this change.
A very well known factor model which has multiple factors, rather than just one like CAPM, is
the 3-factor Fama-French factor model [28, 19, 21, 29, 30]. In this factor model, the first factor comes
again from the exposure to the market risk with two extra factors: the small minus big (SMB) and
the high minus low (HML)[19, 21]. The SMB factor follows the observation by Fama and French
that stocks with a smaller market cap, which is the market value of the stock used as a proxy of size,
tend to give additional returns. Similarly, the HML factor represents the book/market ratio, i.e.
the ratio of the total value of the assets owned by the company associated to a stock relative to
the stock’s market value, and is positively correlated with additional returns. The aim of the HML
factor is to evaluate whether stocks have been undervalued by the market, where the book/market
ratio exceeds 1, and thus have the potential for larger returns. Recently, the Fama-French model
has been extended to include 5 factors [31]. The Arbitrage Pricing Theory (APT) is a more generalised multi-factor model, which states that returns are a linear function of macroeconomic factors [18, 32]. In APT, however, there is no indication of exactly how many and which factors should be included, which introduces an ad-hoc character to the types and numbers of factors included in the model.
The above factor models share the fact that the number and nature of the factors are somewhat
exogenous in the sense that they are determined by economic intuition on what should drive financial returns. Unfortunately, it has been pointed out that there is weak evidence for CAPM [28], for both the Fama-French 3- and 5-factor models, and for some manifestations of the APT [33, 34, 35, 36], underlining the issue that these factors cannot explain the cross dependence of assets. Instead, there
is a strand of literature which invokes factors that are extracted from the financial data itself thus
meaning that the factors are endogenous [37, 38, 20]. In essence, it has been shown that the col-
lective action of assets is what induces the factors, giving support to this type of determination of
factors [37], an approach we shall adopt here. Another difference is that the above factor models
are mainly applied to returns rather than volatilities.
In this paper, we instead build a new factor model of log volatilities that aims to reduce the
dimensionality by considering contributions globally from the market and more locally to the clusters
and their interactions. The number of factors is fixed by the Directed Bubble Hierarchical Tree
(DBHT) clustering algorithm [39, 40], which therefore means we make no prior assumption on the
number of clusters and thus the number of factors to be considered. Using this factor model between
volatilities, we aim to study the link between the univariate volatility clustering and the multivariate
correlation structure of volatilities. We will see that whilst over the entire market the only significant
contributor that affects the memory is the market, individual clusters may have different properties
where the cluster contributions and interactions are more significant. This offers a method to
statistically select factors based on memory reduction. We also note that the clusters which significantly reduce their own memory are mostly made up of stocks from particular industries,
offering an economic interpretation for the makeup of the cluster modes. We can thus select the
factors in a statistical manner like in PCA, but also retain the appealing economic interpretation
like in CAPM and Fama-French.
The structure of the paper is as follows: Section 2 describes the dataset, Section 3 introduces a new factor model for log volatilities, Section 4 describes how we select factors based on their memory reduction using a new non-parametric integrated proxy for the strength of the volatility clustering, and in Section 5 we explore how the empirical link between volatility clustering strength and volatility cross correlation can be explained. In Section 6, we reveal how each cluster has an economic interpretation in terms of its identified dominant ICB supersector. Section 7 compares our factor
model to a PCA inspired factor model and an exploratory factor analysis model in terms of their
memory reduction performance. Section 8 reports the dynamic stability of the factor model. Finally,
we draw some conclusions in Section 9.
2 Dataset
The dataset we shall use consists of the daily closing prices of 1270 stocks in the New York Stock
Exchange (NYSE), National Association of Securities Dealers Automated Quotations (NASDAQ)
and American Stock Exchange (AMEX) from 01/01/2000 to 12/05/2017, which makes 4635 points
for each price time series. As anticipated in the introduction, we perform cross correlation analysis.
We therefore make sure that the stocks are aligned through the data cleaning procedure described
in A.1, which leaves our dataset with N= 1202 stocks. We calculate the log-returns time series of
a given stock i, r_i(t), defined as:

r_i(t) = ln p_i(t+1) − ln p_i(t),    (1)

where p_i(t) is the price time series of stock i, and r_i(t) is a time series of length T = 4364. After standardising r_i(t) so that it has zero mean and a variance of 1, we define the proxy we shall use for the volatility as ln|r_i(t)|, i.e. the log absolute value of returns [41].
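As a minimal sketch of this construction (the function name and the simulated price series are our illustrative assumptions, not part of the paper), the returns of eq. (1) and the standardised log-volatility proxy can be computed as:

```python
import numpy as np

def log_volatility_proxy(prices):
    """Eq. (1) log-returns and the standardised proxy ln|r(t)|.

    prices: 1-D array of daily closing prices for one stock.
    Real data may need zero returns removed before taking logs.
    """
    r = np.diff(np.log(prices))                   # eq. (1)
    r = (r - r.mean()) / r.std()                  # zero mean, unit variance
    omega = np.log(np.abs(r))                     # log absolute value of returns
    return r, (omega - omega.mean()) / omega.std()

# Toy geometric-random-walk price series
prices = np.exp(np.cumsum(np.random.default_rng(1).normal(0.0, 0.01, 500)))
r, omega = log_volatility_proxy(prices)
```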
3 Log-volatility factor model
In this section we describe a new factor model for log volatilities, which we shall use to uncover the
relationship between the univariate volatility clustering effect and the cross correlations between
volatilities. Let us recall that a general factor model is given by:
r_i(t) = Σ_{p=1}^{P} [β_{ip} f_p(t) + α_{ip}] + ε_i(t),    (2)

where r_i(t) are the log returns for asset i and f_p are the p = 1, 2, ..., P factors. β_{ip} is the respective sensitivity/responsiveness, which quantifies how r_i(t) reacts to changes in f_p, α_{ip} is the intercept and ε_i(t) are residual terms with zero mean. Firstly, we define the log volatility term we want to
study. Most stochastic volatility models (where the volatility is assumed to be random and not
constant) assume that the returns for stock i follow an evolution according to [42]

r_i(t) = δ(t) e^{ω_i(t)},    (3)

where δ(t) is a white noise with finite variance and ω_i(t) are the log volatility terms. The exponential term encodes the structure of the volatility and how it contributes to the overall size of the return. Taking the absolute value and the log of both sides, Eq. (3) becomes

ln|r_i(t)| = ln|δ(t)| + ω_i(t),    (4)

from which we see that working with ln|r_i(t)| has the added benefit of making the proxy for volatility, ω_i(t), additive, which in turn makes the volatility more suitable for factor models. Since δ(t) is a random scale factor that is applied to all stocks, we can set it to 1, so that ω_i(t) = ln|r_i(t)|. We also standardise ln|r_i(t)| to a mean of 0 and standard deviation of 1, as is performed in [43].
In the following subsections, we describe our factor model which considers contributions from
the market mode, clusters and interactions, and their corresponding fitting procedures.
3.1 Market Mode
The log volatility term ω_i(t) in Eq. (4) can be modelled as

ω_i(t) = β_{i0} I_0(t) + α_{i0} + c_i(t),    (5)

where β_{i0} is the responsiveness of stock i with respect to changes in I_0(t), defined as

I_0(t) = Σ_i ξ_i ln|r_i(t)|,    (6)

with the pseudo-index ξ_i being the weight of stock i for the market mode. α_{i0} in Eq. (5) is the excess volatility compared to the market, I_0(t). We note that the factor model in Eq. (5) is analogous in form to the general factor model in Eq. (2). The first two terms of Eq. (5) represent the market factor, which is the widely observed effect of the market affecting all stocks, i.e. the co-movement of all stocks at once [44, 13, 43]. We see from Eq. (5) that performing the linear regression of ω_i(t) against I_0(t) gives β_{i0} and α_{i0}, so that c_i(t) becomes the residue after performing the regression.
In Table 1, we show two examples of the regression coefficients for the market mode for two selected stocks, Coca Cola Enterprises (KO) and Transocean (RIG). We report the values of β_{i0} and α_{i0} for the weighted scheme and for the equal weights scheme detailed in A.2, along with their p-values for the null hypothesis of each coefficient being 0. As we can see from Table 1, at the 5% level, the null hypothesis is rejected for all β_{i0} under both weighting schemes, which means that we can conclude that the β_{i0} are significant. For the α_{i0}, the null hypothesis is rejected for both stocks in the equal weights case, and in the weighted case it is rejected only for RIG; for these cases we can conclude that the α_{i0} are non-zero.
(a) weighted modes
      β_{i0} (p)      α_{i0} (p)
KO    0.0310 (0)      0.0015 (0.4764)
RIG   0.0248 (0)      0.1972 (0)

(b) equal weights
      β_{i0} (p)      α_{i0} (p)
KO    1.1564 (0)      -0.0690 (0.0017)
RIG   0.9041 (0)      0.1426 (0)

Table 1: This table shows the responsiveness to the market mode I_0(t), β_{i0}, and the corresponding excess volatility α_{i0} for stocks KO and RIG, calibrated as detailed in Section 3.1. The p-values shown in brackets are for the null hypothesis that β_{i0} and α_{i0}, respectively, are 0. Table 1a is for the weighted scheme and Table 1b for equal weights, which are detailed in A.2.
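The market-mode calibration of Section 3.1 can be sketched under the equal-weights scheme as follows. The function name and toy panel are our illustrative assumptions; the paper's weighted scheme would replace the simple mean with the ξ_i-weighted average:

```python
import numpy as np

def market_mode_regression(omega):
    """Fit eq. (5) for every stock under the equal-weights scheme.

    omega: (N, T) array of standardised log volatilities.
    Returns beta_i0, alpha_i0 and the residuals c_i(t).
    """
    N, T = omega.shape
    I0 = omega.mean(axis=0)                     # eq. (6) with xi_i = 1/N
    X = np.column_stack([np.ones(T), I0])       # columns: intercept, market mode
    coef, *_ = np.linalg.lstsq(X, omega.T, rcond=None)
    alphas, betas = coef[0], coef[1]
    residuals = omega - (np.outer(betas, I0) + alphas[:, None])
    return betas, alphas, residuals

# Toy panel with a common volatility component
rng = np.random.default_rng(2)
common = rng.standard_normal(1000)
omega = 0.8 * common + rng.standard_normal((20, 1000))
betas, alphas, c = market_mode_regression(omega)
```

By construction, the OLS residuals c_i(t) are orthogonal to the market mode I_0(t), which is what makes the subsequent cluster analysis on the residuals meaningful.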
3.2 DBHT output
Since c_i(t) is the residue after performing the regression in Eq. (5), it represents the volatility that is not explained by the market. We can therefore further decompose c_i(t) as:

c_i(t) = β_{ik} I_k(t) + Σ_{k'≠k} β_{ik'} I_{k'}(t) + ε_i(t),    (7)

where β_{ik} is the responsiveness to the cluster mode I_k(t) of the cluster k of which i is a member. In the sum in Eq. (7), the β_{ik'} are the responsiveness to changes in the I_{k'}(t), which are the cluster modes of the clusters k' ≠ k, i.e. the clusters i is not a member of. In Eq. (7) the first term is the cluster factor and it represents the co-movement of the stock with its cluster. Like Eq. (5), Eq. (7) is analogous in form to Eq. (2). The sum in Eq. (7) represents the interactions the stock i has with other clusters, where the strength of the interactions is quantified and defined through the β_{ik'}.
The next step of the calibration procedure concerns the identification of the clusters, which is relevant for the c_i(t) term defined in Eq. (7). We need to find what the cluster structure is, which we do by first calculating G, the cross correlation matrix between the c_i(t), defined as

G_{ij} = corr(c_i(t), c_j(t)).    (8)

We then apply the clustering algorithm to G. We use the clustering algorithm after removing the market mode since this gives a more stable clustering [45]. We shall use the Directed Bubble Hierarchical Tree, DBHT [39, 40, 46], to find the cluster membership of stocks. DBHT is used because, compared to other hierarchical clustering algorithms, it provides the best performance in terms of information retrieval [40]. Using the DBHT algorithm also means that we make no prior assumption on exactly how many factors for the clusters should be included, instead extracting them directly from the data. We can see from Table 2 that the DBHT algorithm identifies a total of K = 29 clusters, with the largest cluster comprising 172 stocks and the smallest cluster comprising 5 stocks. The average cluster size is 41.4.
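A sketch of this clustering step is given below. Since no reference DBHT implementation is assumed here, we substitute scipy's average-linkage hierarchical clustering on the correlation distance as a stand-in; unlike DBHT, this stand-in requires the number of clusters as an input rather than fixing it from the data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_residuals(residuals, n_clusters):
    """Cluster stocks on the residual correlation matrix G of eq. (8).

    residuals: (N, T) array of market-mode residuals c_i(t).
    Uses average-linkage hierarchical clustering on the correlation
    distance d_ij = sqrt(2 (1 - G_ij)) as a stand-in for DBHT.
    """
    G = np.corrcoef(residuals)
    D = np.sqrt(np.clip(2.0 * (1.0 - G), 0.0, None))
    np.fill_diagonal(D, 0.0)
    iu = np.triu_indices_from(D, k=1)           # condensed form for linkage
    Z = linkage(D[iu], method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Two planted clusters driven by two independent factors
rng = np.random.default_rng(3)
f1, f2 = rng.standard_normal((2, 2000))
resid = np.vstack([0.9 * f1 + 0.4 * rng.standard_normal((10, 2000)),
                   0.9 * f2 + 0.4 * rng.standard_normal((10, 2000))])
labels = cluster_residuals(resid, n_clusters=2)
```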
3.3 Cluster modes and interactions
Once the number and composition of each cluster is identified, we can associate a factor to each cluster k. The interactions are then characterised through the responsiveness β_{ik'} with k' ≠ k, i.e. how c_i(t) changes with respect to I_{k'}(t). We define the cluster mode for cluster k, I_k(t), again as a weighted average of the volatilities of the assets in k:

I_k(t) = Σ_{i ∈ cluster k} ξ_{ik} c_i(t),    (9)

where ξ_{ik} is the weight of stock i, which is in cluster k. From Eq. (7), we see that, similarly to the market mode case, we can determine β_{ik}, the β_{ik'} and the corresponding intercepts by linearly regressing c_i(t) against I_k(t) and the I_{k'}(t). We use elastic net regression [47] to find β_{ik} and the β_{ik'}, to take into account the possibility of I_k(t) and the I_{k'}(t) being correlated, whilst also allowing some of the β_{ik'} to be 0, as i may not interact with cluster k'. More details about elastic net regression are provided in Appendix A.3.
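A sketch of the elastic net calibration of eq. (7) for a single stock, using scikit-learn's ElasticNet. The hyperparameters, names and toy data below are our illustrative assumptions, not those used in the paper:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def fit_cluster_betas(c_i, I_own, I_other):
    """Elastic-net fit of eq. (7) for a single stock.

    c_i:     (T,) residual series after market-mode removal.
    I_own:   (T,) the stock's own cluster mode I_k(t).
    I_other: (T, K-1) the remaining cluster modes I_k'(t).
    The l1 part can shrink interaction betas exactly to zero, while
    the l2 part copes with correlated cluster modes.
    """
    X = np.column_stack([I_own, I_other])
    model = ElasticNet(alpha=0.01, l1_ratio=0.5).fit(X, c_i)
    return model.coef_[0], model.coef_[1:], model.intercept_

# Toy check: c_i loads only on its own cluster mode
rng = np.random.default_rng(4)
I_own = rng.standard_normal(2000)
I_other = rng.standard_normal((2000, 2))
c_i = 0.9 * I_own + 0.3 * rng.standard_normal(2000)
b_own, b_inter, a = fit_cluster_betas(c_i, I_own, I_other)
```

The l1/l2 mix is exactly what the text motivates: correlated cluster modes are handled by the ridge part, while irrelevant interactions are driven to zero by the lasso part.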
4 Empirical link between volatility clustering and volatility cross correlation
As anticipated in the introduction, we choose which factors are relevant for the decomposition in
Eq. (7), by measuring what the impact is of each cluster on the volatility clustering. Before turning
our attention to this analysis, let us introduce the volatility clustering proxy we use in the rest of
the paper.
4.1 Volatility Clustering
Volatility clustering is one of the so called stylised facts of financial data, and expresses the idea
that returns are not independent since volatilities are autocorrelated [48, 49]. The autocorrelation
function (ACF), κ(L), is defined as

κ(L) = corr(ln|r(t+L)|, ln|r(t)|)    (10)
     = ⟨ln|r(t+L)| ln|r(t)|⟩ / σ²,    (11)

where ⟨...⟩ denotes the expectation, L is the lag and σ² is the variance of the process ln|r(t)| (here standardised to zero mean), and note that we use log absolute value returns as a proxy for volatility. The interpretation of this result is that large changes in returns are usually followed by other large changes in returns, or that the returns retain a memory of previous values [50]. For this reason, volatility clustering can also be called the memory effect. κ(L) has also been assumed to follow a power law decay:

κ(L) ∼ L^(−β_vol),    (12)

where β_vol describes the strength of the memory effect. A lower value of β_vol indicates that more memory of past values is kept. To compute β_vol we transform Eq. (12) into log-log scales and compute
(a) Coca Cola Enterprises Inc., β_vol = 0.4544    (b) Transocean, β_vol = 0.3975

Figure 1: Empirical ACF of the log absolute value returns (blue solid lines) for Coca Cola Co. (KO) in Figure 1a and Transocean (RIG) in Figure 1b, in log-log scale. The linear best fit is shown as red dashed lines.
the slope of the linear best fit, which gives us the exponent βvol. We shall compute βvol using the
Theil-Sen procedure rather than using standard least squares since it is more robust to outliers [51].
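A sketch of the β_vol estimation via the Theil-Sen estimator. The maximum lag, the function name and the toy series (an AR(1) log volatility in the spirit of eq. (3)) are our assumptions:

```python
import numpy as np
from scipy.stats import theilslopes

def beta_vol(series, max_lag=100):
    """Power-law exponent beta_vol of eq. (12), via a robust log-log fit.

    series: standardised log-absolute-return series ln|r(t)|.
    Fits log kappa(L) against log L with the Theil-Sen estimator
    and returns minus the slope.
    """
    x = series - series.mean()
    lags = np.arange(1, max_lag + 1)
    acf = np.array([np.corrcoef(x[:-L], x[L:])[0, 1] for L in lags])
    keep = acf > 0                        # the log needs positive ACF values
    slope, intercept, *_ = theilslopes(np.log(acf[keep]), np.log(lags[keep]))
    return -slope

# Toy long-memory-like series: AR(1) log volatility driving eq. (3)
rng = np.random.default_rng(5)
T = 5000
omega = np.zeros(T)
for t in range(1, T):
    omega[t] = 0.98 * omega[t - 1] + rng.normal(0.0, 0.2)
lv = np.log(np.abs(np.exp(omega) * rng.standard_normal(T)))
lv = (lv - lv.mean()) / lv.std()
b = beta_vol(lv)
```

Theil-Sen takes the median of pairwise slopes, which is why it is less sensitive than least squares to the noisy large-lag tail of the empirical ACF.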
We report in Figure 1 the function κ(L) for Coca Cola Enterprises Inc. in Figure 1a and Transocean in Figure 1b, both in log-log scale, with the linear best fit also plotted. We define the entries E_{ij} of
the empirical volatility cross correlation matrix E as

E_{ij} = corr(ln|r_i(t)|, ln|r_j(t)|).    (13)

The proxy used for the volatility cross correlation is the average cross correlation for stock i, ρ^vol_i, defined as

ρ^vol_i = (1/(N−1)) Σ_{j≠i} E_{ij}.    (14)
Using the proxies for volatility clustering and the volatility cross correlation, [52] finds a negative relationship between ρ^vol_i and β^vol_i, which we confirm holds on our data set of daily data, and using ln|r(t)| rather than the original high frequency data and |r(t)| used in [52], in Figure 2. The main consequence of this result is that the more the volatility of a stock i is linked to other stocks, the stronger the memory effect, and thus the more information it retains about previous values of volatility, linking the strength of volatility clustering with the cross correlation matrix between volatilities.
4.2 Non parametric memory proxy
As already mentioned, the β_vol power law exponent fitted to the autocorrelation function of the absolute returns is a proxy for the strength of the memory effect: the lower the β_vol, the stronger the memory effect. The use of the power law to quantify the memory effect is parametric, as we assume the tail decays as a power law with exponent β_vol. The autocorrelation function itself can be noisy due to its slow convergence [48], which can be seen in Figure 1. In light of this,
Figure 2: Negative dependence between ρ^vol_i and β^vol_i. The negative relationship was tested using a one-sided Spearman's rank correlation at the 5% level; the null hypothesis of no correlation was rejected, which confirms the result of [52] on our data.
we instead introduce a new model-free proxy, η, by integrating the autocorrelation function over time lags L up to L_cut, which we define as the standard Bartlett cut at the 5% level [53]:

η = ∫_0^{L_cut} κ(L) dL,    (15)

where κ(L) is the empirical autocorrelation function of the log absolute returns as a function of the lag L. With this proxy, the larger the value of η, the greater the degree of the memory effect (in the β_vol proxy this corresponds to smaller values of the exponent). The median value reported across all stocks is 20.7318 ± 8.6901, where the error is computed across all stocks using the median absolute deviation (MAD) of the η_i, defined as

MAD = median(|η_i − median(η_i)|).    (16)
We have also plotted β_vol as a memory proxy vs η in Figure 3a, which as expected shows a decreasing relationship between η and the β_vol memory proxy, the one used in the literature, since a larger memory effect means a higher η but a lower β_vol. This shows that η is coherent with β_vol and can thus be used as a proxy for the strength of the memory effect.
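A sketch of the integrated proxy η with the Bartlett cut, treating the integral in eq. (15) as a discrete sum over unit lags; the function name and toy series are ours:

```python
import numpy as np

def eta_proxy(series):
    """Integrated-ACF memory proxy eta of eq. (15).

    Sums the empirical ACF over lags up to the Bartlett cut: the first
    lag whose ACF falls inside the 5% noise band +/- 1.96/sqrt(T).
    Returns (eta, L_cut).
    """
    x = (series - series.mean()) / series.std()
    T = len(x)
    band = 1.96 / np.sqrt(T)
    eta, L = 0.0, 1
    while L < T - 1:
        acf_L = np.corrcoef(x[:-L], x[L:])[0, 1]
        if abs(acf_L) < band:          # Bartlett cut reached
            break
        eta += acf_L                   # discretised integral, dL = 1
        L += 1
    return eta, L

# A persistent AR(1) series keeps far more memory than white noise
rng = np.random.default_rng(6)
white = rng.standard_normal(5000)
ar = np.zeros(5000)
for t in range(1, 5000):
    ar[t] = 0.95 * ar[t - 1] + rng.standard_normal()
eta_w, cut_w = eta_proxy(white)
eta_a, cut_a = eta_proxy(ar)
```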
Figure 3b, which is a plot of ρ^vol_i vs η, confirms the main result of [52] using η instead of β_vol. This was tested using Spearman's rank correlation at the 5% level; the null hypothesis of no correlation between ρ^vol_i and η was rejected against the alternative hypothesis of a significant positive relationship. Our proxy can therefore also confirm the result of [52].
Plotting L_cut vs η in Figure 4a reveals that processes with strong short memory will have a lower L_cut and thus lower η, whilst processes with a long memory component will have higher L_cut and
(a) β_vol vs η    (b) ρ^vol_i vs η

Figure 3: In Figure 3a we plot the β_vol power law exponent proxy for the strength of the memory effect vs η, the integrated proxy. In Figure 3b we plot the relationship between ρ^vol_i and η defined in the text. The decreasing relationship in Figure 3a and the increasing relationship in Figure 3b were tested using Spearman's rank correlation at the 5% level, and the null hypothesis of no correlation was rejected in both cases.
(a) L_cut vs η    (b) L_cut vs β_vol

Figure 4: The figure on the left is a plot of L_cut vs η for all stocks. The figure on the right is L_cut vs β_vol for all stocks. The increasing relationship shown in Figure 4a and the decreasing relationship shown in Figure 4b were tested using Spearman's rank correlation; the coefficients are 0.7871 and -0.4271 respectively, both statistically significant at the 5% level.
η. This is important since volatility clustering is a result of long memory present in time series.
An analogous plot of Lcut vs βvol in figure 4b shows the expected decrease in βvol as Lcut increases,
but the relationship is not as strong as that of Lcut vs η(an absolute Spearman correlation value
of 0.4271 vs 0.7871 tested at the 5% level). A consequence of this is that ηcan better distinguish
between short and long memory processes as compared to βvol.
5 Memory filtration
In this section, by means of the factor model introduced in Eqs. (5)-(7) and of the η proxy introduced in the previous subsection, we want to understand the origin of the empirical link between the memory strength and the volatility cross-correlation. This analysis will in turn also be fundamental for the cluster mode selection in our model. The main intuition is that the market mode, the cluster mode and the interaction modes all bring relevant information about the memory of a certain stock's time series.
5.1 Assessing the memory contributions
Let us here describe the method we use in order to understand the contribution to the memory of each term in the factor model in Eqs. (5)-(7). For every time series, say for stock i, we follow a step-by-step procedure, measuring the value of the proxy η_i four times:
1. on the plain time series, η_{i,PL};
2. on the residual time series once the market mode is removed, η_{i,MM};
3. on the residual time series once the market mode and the cluster mode (of the cluster the stock belongs to) are removed, η_{i,CM};
4. on the residual time series once the market, cluster and interaction modes are all removed, η_{i,IM}.
The next step consists in assessing the memory reduction after each removal. We do so by computing the ratio of two subsequently computed values of η_i. For stock i we thus have that:
1. η_{i,MM}/η_{i,PL} defines the reduction in memory induced by the market mode;
2. η_{i,CM}/η_{i,MM} defines the reduction in memory induced by the cluster mode once the market mode is removed;
3. η_{i,IM}/η_{i,CM} defines the reduction in memory induced by the interaction mode once the market mode and the cluster mode are removed.
According to this definition, if a ratio is below one, a memory reduction has occurred via the corresponding removal. In order to understand the average behaviour of these ratios, we take the median of each of them computed over all stocks. So, for example, the average reduction of memory induced by the market mode on a given set of stocks is median(η_{i,MM}/η_{i,PL}), computed over the index i. As the error to associate to this measure we use the Median Absolute Deviation [54], defined for η_{i,MM}/η_{i,PL} as

MAD(η_{i,MM}/η_{i,PL})    (17)
  = median(|η_{i,MM}/η_{i,PL} − median(η_{i,MM}/η_{i,PL})|),    (18)

and similarly for η_{i,CM}/η_{i,MM} and η_{i,IM}/η_{i,CM}. Both the median and the MAD were chosen because of their robustness against outliers. We regard as significant a reduction of memory on a given set of stocks for which the median plus the MAD of the ratio is below one.
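The significance rule above can be sketched as follows (the function name and toy numbers are ours):

```python
import numpy as np

def memory_reduction(eta_before, eta_after):
    """Median +/- MAD of the per-stock memory-reduction ratios.

    eta_before, eta_after: arrays of the eta proxy measured before and
    after removing a mode (e.g. eta_PL and eta_MM). The removal is
    regarded as significant when median + MAD < 1.
    """
    ratios = eta_after / eta_before
    med = np.median(ratios)
    mad = np.median(np.abs(ratios - med))
    return med, mad, (med + mad) < 1.0

# Toy example: the "market mode" removes most of the memory
rng = np.random.default_rng(7)
eta_pl = rng.uniform(10, 30, 500)
eta_mm = eta_pl * rng.uniform(0.2, 0.6, 500)
med, mad, significant = memory_reduction(eta_pl, eta_mm)
```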
5.2 Whole market analysis: finding the main source of memory
We apply here the procedure described in the previous subsection to our dataset described in Section
2. For completeness, in Fig. 5 we report the result of our analysis for both the unweighted and the
weighted schemes. Figure 5a reports the value of the ratios along with the errors (black vertical
bars). We observe that in all cases the average value plus the error stays below one, which means that every term gives a meaningful contribution to the overall memory. However, we also notice that, in particular for the reduction coming from the cluster mode, there is a large variability among stocks. Figure 5b reports the same result, but showing the contribution of each removal with respect to the overall memory. According to our analysis, the majority of the contribution comes from the market mode, which is then the main source of memory for the volatility. We also plot in Figure 6 the cumulative fraction of stocks with at most the percentage of memory left reported on the x axis, after all contributions are removed. For example, from Figure 6 we find that 90% of all stocks have only 16.7% of their memory unexplained by all the contributions. We also note that there is little difference in Figure 6 between the weighted and unweighted versions, so we shall herein use the unweighted scheme for most of the analysis. This analysis establishes that there is indeed a link between the log volatility correlation structure and volatility clustering.
5.3 Cluster-by-cluster analysis: selection criterion for factors
In this subsection, instead of aggregating the result of the memory reduction over the whole market,
we specialize and check what happens to the memory on a cluster-by-cluster basis. For brevity, we
only discuss in detail the case of cluster 12 and cluster 22, as defined by the DBHT algorithm
discussed in section 3.2, since they are quite informative about the different behaviour one can
find at a cluster level. We repeat then the same analysis we performed in the previous subsection
but report the behaviour of these two particular clusters. In figure 7 we report the result of our
analysis for the unweighted scheme. Figure 7a reports the value of the ratios along with the errors
(black vertical bars). Differently from the whole dataset, we see from Figure 7a that the cluster mode removes the vast majority of the memory for cluster 12, without any contribution coming from the market mode or from the interactions. For cluster 22, instead, we see from Figure 7a that the market is the major contributor to the memory, whereas the cluster mode reduces the remaining memory to some extent and the interactions again give little contribution. Figure 7b reports the same kind of result, but relative to the overall memory. These results suggest that a local analysis reveals a richer behaviour in how the terms in our log volatility factor model affect the memory effect, showing that there is also a link between the correlation structure of the log volatilities and the memory effect. Given these results, we argue that a good criterion for selecting statistically meaningful factors, among all cluster modes, to be included in the definition of our
Figure 5: Results for the procedure described in Section 5.1 across all stocks in the market. Figure 5a shows the median of the ratios of the memory proxies, from the left, η_{i,MM}/η_{i,PL}, η_{i,CM}/η_{i,MM} and η_{i,IM}/η_{i,CM}, computed over the whole market. The blue bars are for the equal weights scheme and the yellow bars are for the weighted scheme. The black vertical bars represent the errors among the stocks' memory reductions, calculated using Eq. (18) and its equivalents for the other ratios. In Figure 5b we plot the contribution to the memory effect of the market (MM), cluster (CM) and interactions (IM) as a percentage of the overall memory. The residual is the remaining percentage of memory that is unexplained by the contributors. The values are computed over the whole market. The left column is for the equal weights scheme and the right column is for the weighted scheme.
Figure 6: Cumulative distribution of the fraction of stocks which have a given fraction of residual
memory left after all contributors of the model (market mode, cluster mode and interactions) are
removed. The red line is for the weighted modes and the blue line for the equal weighted modes.
Figure 7: The same set of graphs as Fig. 5 except using the equal weights scheme and taking only
stocks belonging to clusters 12 and 22. In figure 7a we plot the median ratio of, starting from the left,
η_{i,MM}/η_{i,PL}, η_{i,CM}/η_{i,MM} and η_{i,IM}/η_{i,CM}, computed over the stocks in cluster 12 for the blue bars and over stocks in
cluster 22 for the yellow bars. Equal weighted modes are used. The black vertical bars represent the
errors among the stocks' memory reduction applied to stocks in clusters 12 and 22, which is calculated
by using eq. (18) and its equivalents for the other ratios. In figure 7b we plot the contribution to
the memory effect of the market (MM), cluster (CM) and interactions (IM) as a percentage with
respect to the overall memory. The residual is the remaining percentage of memory that is unexplained
by the contributors. The values are computed over all stocks in cluster 12 for the left column and
over all stocks in cluster 22 for the right column. Equal weighted modes are used.
[Figure 8 bar chart: x axis — cluster no.; y axis — no. stocks; colour key includes Automobiles & Parts, Basic Resources, Construction & Materials, Financial Services, Food & Beverage, Health Care, Industrial Goods & Services, Oil & Gas, Personal & Household Goods, Real Estate and Travel & Leisure.]
Figure 8: Composition of DBHT clusters in terms of ICB supersectors. The x axis labels the clusters
identified by DBHT and the y axis gives the number of stocks in each cluster. The colours represent the
particular ICB supersector given in the key.
factor model, is to choose those which achieve a significant reduction (in the sense of Section 5.1)
to the memory of the stocks within their cluster. Table 2 summarizes the results of this procedure,
reporting in the first column the cluster number k(as given by the DBHT algorithm). The second
column contains the number of stocks in each cluster and in the fourth column we show if the cluster
mode reduces the memory of the stocks within that cluster significantly. We find that,
out of 29 clusters, 7 do not contribute significantly to the memory and are thus discarded
according to our criterion. The fifth, sixth, seventh and eighth columns of table 2 are the fractional
contributions that the market, cluster, interactions and residuals make to the overall memory in the
cluster. Comparing the last four columns in table 2, we see that there is significant heterogeneity
in the amount of contribution the market and cluster make to the cluster's overall memory, which
highlights the importance of the inclusion of cluster factors in our factor model.
6 Economic interpretation of the clusters
Up till now, we have focused on determining the clusters via statistical tools. In this section we
show that the clusters also have an economic interpretation. In figure 8, we show the
composition of each cluster identified through DBHT using the Industrial Classification Benchmark
(ICB) supersector classification of common industries, with each colour representing a different su-
persector. In particular, from figure 8 we observe that clusters are dominated by a particular
supersector. For example, we see from figure 8 that clusters 12 and 22 show the presence of domi-
nant supersectors: the real estate sector for cluster 12 and the technology sector for cluster 22. In order
to check that these identifications of dominant sectors are meaningful, we used the same hypothesis
test as in [56, 40], which tests the null hypothesis that the cluster has merely randomly been
k no. stocks dom. supersector cluster sig market cluster interac resid
1 68 OG (0) T 0.000 0.758 0.055 0.187
2 26 OG (0) T 0.000 0.653 0.097 0.250
3 12 FS (0) T 0.387 0.463 0.041 0.110
4 39 U (0) T 0.855 0.032 0.024 0.090
5 13 BR (0) T 0.727 0.199 0.016 0.058
6 11 IGS (0.089957) T 0.719 0.073 0.026 0.182
7 23 FS (0) T 0.721 0.127 0.053 0.100
8 17 FB (0) F 0.818 0.000 0.021 0.161
9 9 HC (0) T 0.923 0.029 0.001 0.047
10 24 IGS (0.355912) T 0.471 0.403 0.028 0.098
11 11 HC (0) F 0.890 0.000 0.018 0.093
12 32 RE (0) T 0.000 0.977 0.005 0.018
13 30 FS (0) T 0.662 0.226 0.019 0.093
14 144 RE (0) T 0.574 0.272 0.049 0.105
15 77 HC (0) T 0.769 0.093 0.012 0.127
16 5 TL (0) T 0.968 0.012 0.003 0.016
17 66 B (0) T 0.733 0.149 0.040 0.078
18 111 B (0) T 0.833 0.088 0.024 0.055
19 15 PHG (0) T 0.781 0.134 0.031 0.054
20 8 TL (0) F 0.965 0.000 0.002 0.033
21 172 T (0) T 0.684 0.221 0.013 0.082
22 118 T (0) T 0.836 0.071 0.020 0.073
23 14 I (0) F 0.951 0.000 0.007 0.042
24 12 IGS (0.003514) T 0.911 0.050 0.005 0.034
25 17 C (0) T 0.956 0.005 0.003 0.035
26 31 R (0) T 0.900 0.036 0.008 0.057
27 43 IGS (0) F 0.945 0.000 0.005 0.049
28 37 R (0) F 0.940 0.000 0.003 0.057
29 15 IGS (0) F 0.954 0.000 0.003 0.044
Table 2: Table showing the cluster no. k in the first column and the number of stocks in the
second column. In the third column, we have the dominant ICB supersector (abbreviated to the
first letters in each supersector, which are listed in figure 8). In brackets in the third column we
have the p value of the hypothesis test which tests whether the most dominant supersector can
be meaningfully identified from the cluster [55], which are given to 6 decimal places. The fourth
column details whether the cluster mode significantly reduces the memory for that cluster. The
fifth, sixth, seventh and eighth columns are the fraction of contribution (to 3 decimal places) that
the market, cluster, interactions and residual make respectively to the total memory.
assigned supersector classifications, using the hypergeometric distribution, versus the alternative hy-
pothesis that the supersector is indeed dominating the cluster. Starting from a significance level of
5%, we additionally used a conservative Bonferroni correction for multiple hypothesis testing [57]
of 0.05/(N_cl × N_ICB) to reduce the level of significance, where N_cl is the number of clusters identified
through DBHT and N_ICB is the number of ICB supersectors. This reduces the level of significance
to 9.0 × 10^{-5}; we report the p values to six decimal places.
this process to all clusters, with the dominant supersector denoted in the third column. We see from
Table 2 that 26 clusters can indeed be matched to their dominating supersector, and that,
of the clusters that significantly contribute to their own memory (see section 5.3), 19 correspond to
their dominating supersector. This opens the possibility of choosing cluster modes for a further re-
finement of the factor model between log volatilities: choosing the cluster modes which reduce the
memory statistically significantly after the market mode is removed, whilst also having an economic
interpretation of being dominated by particular supersectors.
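The over-representation test can be sketched with scipy's hypergeometric distribution. The cluster counts below are illustrative stand-ins, not values from the dataset, while the Bonferroni level follows the text (N_ICB = 19 reproduces the quoted 9.0 × 10^{-5}):

```python
from scipy.stats import hypergeom

def dominant_sector_pvalue(N, K, n, k):
    """P-value for finding at least k stocks of a supersector inside a
    cluster of size n, when the market holds N stocks of which K belong
    to that supersector and assignments are random (null hypothesis)."""
    return hypergeom.sf(k - 1, N, K, n)  # sf(k-1) = P(X >= k)

# Bonferroni-corrected significance level 0.05/(N_cl * N_ICB); with the
# paper's 29 clusters, N_ICB = 19 supersectors gives the quoted 9.0e-5
alpha = 0.05 / (29 * 19)

# Illustrative counts (not from the dataset): a 32-stock cluster holding
# 28 of the market's 180 Real Estate stocks, out of 1200 stocks in total
p = dominant_sector_pvalue(N=1200, K=180, n=32, k=28)
print(p < alpha)  # True: the supersector dominates this hypothetical cluster
```

Note scipy's parameter order: `hypergeom.sf(k-1, M, n, N)` takes the population size, the number of success states and the number of draws, in that order.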
Moreover, after comparing clusters which are dominated by the same ICB supersector in table
2, we see that the groups of clusters k = 1, 2 and k = 17, 18, which are dominated by the Oil and
Gas and Banks supersectors respectively, have similar contributions for the market, clusters and
interactions. However, there are instances where clusters dominated by the same supersector do not
have similar contributions. For example, clusters k = 12, 14 are both dominated by the Real Estate
supersector, but for k = 12 the market does not statistically contribute to the memory, whilst for
k = 14 it does. This could be an indication of markets moving away from clearly defined industrial
supersectors, which was also noted in [55], and emphasises why we have used the clustering algorithm
DBHT, rather than taking the industrial classifications directly.
7 Comparison with PCA and Exploratory Factor Analysis
In this section we compare the memory reduction performance of our model with a well established
PCA-inspired factor model [58] and an exploratory factor analysis driven factor model. Firstly, we
explain the importance of the PCA factor model. PCA gives a set of orthogonal
eigenvectors that define mutually linearly uncorrelated portfolios, which can be used to help define
factor models by assigning each principal component a separate factor. However, as we have pointed
out, it is difficult to decide how many principal components we should keep. In our analysis, the
number of principal components we keep in the PCA factor model shall be fixed to be the same as
the number of factors in our factor model, i.e. 20. PCA aims to explain the diagonal terms, in the
orthogonal basis, of the correlation matrix E, which is the correlation matrix between the ln |r_i(t)|.
Exploratory factor analysis (FA), on the other hand, is more general and aims to explain the off-
diagonal terms of E, using the general linear model in (2). Again, there are problems in selecting
exactly how many factors should be included [59], but we fix the number of factors in the FA model
to be equal to the number of factors in our log volatility factor model, i.e. 20. After extracting
the factors, we apply a varimax rotation of the factors [60], which is commonly applied in factor
analysis to improve interpretability. In figure 9 we plot the cumulative distribution function of
how much residual memory is left after removal of the factors for the log volatility factor model, FA
model and PCA factor model as a percentage of the total memory before removal.
We see from figure 9 that 90% of all stocks have a maximum of only 16.7% residual memory
left for the log volatility factor model, whereas 90% of all stocks have a maximum of 12.7% of
residual memory left for the PCA factor model, which means that the PCA factor model and the
log volatility factor model both explain the memory with similar efficiency. For the exploratory
factor model, we see that 90% of all stocks have 21.8% of their memory left, which is worse than the log volatility factor model
Figure 9: Empirical cumulative distribution function of the unexplained residual memory for the
log volatility factor model (blue line), the PCA factor model (black), where we only take the first 23 principal components,
and the exploratory factor analysis, where we use 23 factors and a varimax rotation.
and the PCA factor model, but still has a comparable performance. We can therefore conclude that
the log volatility factor model explains the same amount of memory as the other two models, even
after fixing the amount of factors to be the same in the PCA and exploratory factor model.
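The comparison can be sketched with scikit-learn, whose FactorAnalysis supports a varimax rotation (`rotation="varimax"`, available from scikit-learn 0.24). The random panel below is a stand-in for the demeaned log-volatility series, so only the fitting recipe, not the numbers, is meaningful:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(0)
T, N, K = 500, 40, 20  # observations, stocks, retained factors

# Synthetic stand-in for the log-volatility panel: K common factors + noise
X = rng.normal(size=(T, K)) @ rng.normal(size=(K, N)) + 0.5 * rng.normal(size=(T, N))

# PCA factor model: keep the first K principal components, take residuals
pca = PCA(n_components=K).fit(X)
resid_pca = X - pca.inverse_transform(pca.transform(X))

# Exploratory factor analysis with a varimax rotation of the loadings
fa = FactorAnalysis(n_components=K, rotation="varimax").fit(X)
resid_fa = X - (fa.transform(X) @ fa.components_ + fa.mean_)

# Fraction of variance left after removing the factors, per model
print(resid_pca.var() / X.var(), resid_fa.var() / X.var())
```

On this synthetic panel both residual fractions are small; in the paper the analogous comparison is done on the residual memory rather than the residual variance.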
8 Dynamic Stability of Clusters and their Memory Properties
So far the results that have been presented are based on static correlation matrices, which are
computed across the whole time period considered in our dataset. A natural question then arises
about whether the results presented in sections 5 and 6 are dynamically stable. First, we divide
each stock's time series into 50 rolling windows of length 1600, which gives a shift of 56 days for
each window [55]. For every window, we then perform the same analysis as is done in section 3.
That is, for each rolling window m = 1, 2, ..., 50 we remove the market mode computed on that
time window, and then compute the corresponding correlation matrix G_m and its clustering Y_m
using the DBHT algorithm. To test whether the clusters themselves are dynamically stable, we
use a similar procedure to the one presented in section 6, which is also carried out in [55]. Specifically,
for each time window m, we use the hypergeometric test to see if each of the clusters in the static
clustering X is statistically similar to a cluster in Y_m, recording the number of time windows
where there is a possible match. This is recorded in the blue bars in figure 10. We also calculate
whether each of the clusters in Y_m can still statistically reduce their memory in every time window
m, and measure the total number of time windows where this happens, which is plotted in the red
bars in figure 10. These two numbers give a measure of persistence of both the appearance and
statistical memory reduction properties for each cluster.
As we can see from figure 10, most clusters are quite stable, appearing in most time windows. The
exceptions to this are clusters 6, 10, 24, which interestingly can all be identified with the Industrial
Goods and Services supersector (from table 2), and cluster 16, which is quite a small cluster with
only 5 stocks and thus more likely to be unstable in time due to its small size. From figure 10, we
can also conclude that the memory filtration properties of the clusters identified in section 5.3 are
stable in time. This is because figure 10 indicates high persistence, in the memory sense (red bars),
of the static clusters from table 2 that statistically contribute to their memory (for example clusters
2 and 12). On the other hand, we see low persistence in the memory sense for clusters that do not
contribute to their own memory in the static case (for example clusters 27, 28, 29).
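The windowing and the match test can be sketched as follows. The total series length of 4344 days is inferred from the stated 50 windows of length 1600 with a 56-day shift, and the cluster sets are placeholders:

```python
import numpy as np
from scipy.stats import hypergeom

def window_starts(T, length=1600, n_windows=50):
    """Start indices of n_windows rolling windows of the given length,
    shifted evenly across a series of T observations."""
    shift = (T - length) // (n_windows - 1)
    return [m * shift for m in range(n_windows)]

def is_match(static_cluster, window_cluster, N, alpha):
    """Hypergeometric test for whether a static cluster and a window
    cluster overlap more than random relabelling would produce."""
    k = len(static_cluster & window_cluster)
    p = hypergeom.sf(k - 1, N, len(static_cluster), len(window_cluster))
    return p < alpha

starts = window_starts(T=4344)             # 1600 + 49*56 = 4344 trading days
print(len(starts), starts[1] - starts[0])  # 50 windows, 56-day shift

# Placeholder stock-index sets: an overlap of 3 out of 1200 stocks between
# two 4-stock clusters is far beyond chance, so this counts as a match
matched = is_match({0, 1, 2, 3}, {0, 1, 2, 9}, N=1200, alpha=0.05)
```

The persistence counts of figure 10 are then the number of windows m for which `is_match` (blue bars) or the memory-reduction test (red bars) succeeds.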
9 Conclusion
We proposed a new factor model for the log-volatility, discussing how each term of the model affects
the stylized fact of volatility clustering. The model reduces the information present in the linear
correlation between the log volatilities to a global factor, which is the so-called market mode, and
to local factors, which are the cluster modes and the interactions. Using a new
non parametric, integrated proxy for the volatility clustering, we found that there is indeed a link
between the volatility and volatility clustering. First, the dataset was examined globally, which
revealed the market to account for the majority of the volatility clustering effect present in our
dataset. However, a local cluster-by-cluster analysis instead reveals significant variability: in some
clusters, the cluster mode itself may be contributing to the volatility clustering. This enabled us to
select only statistically relevant cluster factors, reducing the information in the correlation between
the log volatilities and the number of factors further. From this reduced set of factors, we can
select factors that have an economic interpretation through the identification of their dominant
ICB supersector, which decreased the number of relevant factors even more. This is significantly
Figure 10: The blue bars are the number of time windows where a cluster k (whose identities are
detailed in table 2) can be statistically identified with a cluster in Y_m, which is the clustering computed
over 50 rolling windows of length 1600. The red bars are the number of time windows where a
cluster in X can statistically reduce its own memory on the rolling time window m.
advantageous over other potential factor models that could be used for the log volatility, such as PCA and
exploratory factor analysis, since we do not subjectively select the number of factors, and also because
the factors have a clearer economic interpretation through the identification of their dominant ICB
supersector. A comparison of the log volatility factor model with PCA and an exploratory factor
model reveals that they explain the same amount of memory in the dataset. Both the clusters and
their reported memory filtration were also found to be dynamically stable.
This work is particularly relevant for the field of volatility modelling, since most multivariate
models, such as multivariate extensions of GARCH, stochastic covariance and realised covariance
models, suffer from the curse of dimensionality and an increase in the number of parameters. The log
volatility factor model presented here could be used to help reduce the number of parameters needed
for these models through the identification of a reduced set of factors given by the procedure in this
paper.
A Appendix
A.1 Data cleaning process
Our dataset cannot be used as it is, since the price time-series are not aligned, which is due to the
fact that some stocks have not been traded on certain days. In order to overcome this issue, we apply
a data cleaning procedure which allows us to keep as many stocks as possible. For example, we do
not want to remove a stock just because it was not traded on a few days in the given time-span. The
main idea is to fill the gaps by dragging the last available price, assuming that a gap in the price
time-series corresponds to a zero log-return. At the same time, we do not want to drag too many
prices, because a time-series filled with zeros would not be statistically significant. In light of this,
we remove from our dataset the time-series which are too short. The detailed
procedure goes as follows:
1. Remove from the dataset the price time-series with length less than ptimes the longest one;
2. Find the common earliest day among the remaining time-series;
3. Create a reference time-series of dates when at least one of the stocks has been traded starting
from the earliest common date found in the previous step;
4. Compare the reference time-series of dates with the time-series of dates of each stock and fill
the gaps dragging the last available price.
In this paper we chose p = 0.90, thus keeping as many time-series as possible unmodified. However,
the results do not change if we pick a higher value of p.
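Steps 1-4 can be sketched with pandas; the function and the toy series are ours, not the paper's:

```python
import pandas as pd

def clean_prices(price_series, p=0.90):
    """Steps 1-4 of the cleaning procedure: drop series shorter than p
    times the longest one, build the union calendar from the latest
    common start date, and drag the last available price over the gaps
    (so that a gap corresponds to a zero log-return)."""
    longest = max(len(s) for s in price_series.values())
    kept = {k: s for k, s in price_series.items() if len(s) >= p * longest}
    start = max(s.index[0] for s in kept.values())       # common earliest day
    calendar = sorted(d for d in set().union(*(s.index for s in kept.values()))
                      if d >= start)                     # reference dates
    return pd.DataFrame({k: s.reindex(calendar).ffill() for k, s in kept.items()})

# Two toy price series with different calendars and a missing day each
a = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0],
              index=pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03",
                                    "2020-01-06", "2020-01-07"]))
b = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0],
              index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06",
                                    "2020-01-07", "2020-01-08"]))
prices = clean_prices({"A": a, "B": b})
print(prices.loc["2020-01-08", "A"])  # 5.0: dragged from the last trade
```

The dragged price on 2020-01-08 for stock A yields a zero log-return, as the procedure intends.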
A.2 Weighting schemes
Here we shall define the two types of weighting schemes used in this paper for the ξ_i and ξ_ik defined
in (6) and (9) respectively. The first weighting scheme is based on the eigenspectrum of E and
G. It is useful now to explain the financial interpretation of an eigenvector v of E, with entries v_i
and eigenvalue λ: the v_i can be seen as weights for a portfolio defined by v. Measuring the risk from
the volatility of the portfolio via its variance, we see it is given by:

(1/T) Σ_t ( Σ_i v_i ln |r_i(t)| )² = Σ_{i,j} v_i v_j E_{ij} = λ    (19)
Hence λ represents the risk from the volatility of the portfolio given by v. We set ξ_i = v_i, where
now v_i is the ith entry of the eigenvector corresponding to the largest eigenvalue of the empirical
correlation matrix E. This is called the market eigenvalue, as it represents all stocks moving together
[13], and its eigenvector is also the portfolio of stocks whose corresponding eigenvalue gives the risk
of the market volatility mode. We could have also used a real index to determine the weights, e.g. the
Dow Jones, but [45] showed that this does not effectively remove the influence of modes from returns
compared to a pseudo-index.
The weights ξ_ik are established in a similar way to the market mode case, by
considering only the part of G which corresponds to members of the cluster. We define a submatrix
of G:

G^{(k)} = {G}_{(i,j) ∈ cluster k},    (20)

where {...}_{(i,j) ∈ cluster k} refers to keeping only the elements of the matrix for which i and j are stocks
in cluster k. Thus G^{(k)} is the square submatrix of G corresponding to cluster k. This submatrix
is the correlation matrix of a market which consists only of stocks which are part of cluster k.
Hence, in exactly the same way as the market eigenvalue, the largest eigenvalue of G^{(k)} represents
the stocks of the cluster moving together, the value of the eigenvalue being the risk of the cluster market
portfolio, and the related eigenvector giving the weights of such a portfolio. Therefore, the
weights ξ_ik for cluster k are determined by setting ξ_ik = v_i^{(k)}, which is the ith entry of the
eigenvector corresponding to the largest eigenvalue of G^{(k)}. This weighting scheme is
compared to the case of equal weights, where ξ_i = 1/N and ξ_ik = 1/m_k, in figures 5a, 5b and 6;
thereafter the equal weights scheme is used.
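Both weightings reduce to extracting the leading eigenvector of a correlation matrix; a numpy sketch on a toy matrix (the matrix and the cluster membership are illustrative):

```python
import numpy as np

def mode_weights(C):
    """Entries of the eigenvector belonging to the largest eigenvalue of a
    correlation matrix C, used as the weights xi_i (on E) or xi_ik (on G^(k))."""
    _, eigvecs = np.linalg.eigh(C)    # eigenvalues in ascending order
    v = eigvecs[:, -1]
    return v if v.sum() >= 0 else -v  # fix the arbitrary overall sign

# Toy correlation matrix standing in for E
C = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
xi = mode_weights(C)
risk = xi @ C @ xi                    # portfolio variance of eq. (19)
print(np.isclose(risk, np.linalg.eigvalsh(C).max()))  # True: equals lambda

# Cluster weights: same recipe on the submatrix G^(k) of eq. (20)
members = [0, 2]                      # hypothetical stocks in cluster k
Gk = C[np.ix_(members, members)]
xi_k = mode_weights(Gk)
```

Since a correlation matrix with positive entries has a leading eigenvector of uniform sign, the weights are all positive after the sign fix.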
A.3 Elastic Net Regression
Elastic net regression is used to find the values of the β_ik using Eq. (7). Further details of the
use of this method are provided in this appendix. Elastic net regression [47] is a hybrid of
ridge regularisation and lasso regression, thus providing a way of dealing with correlated explanatory
variables (in our case the cluster modes I_k(t)) and also performing feature selection, which takes
into account non-interacting clusters I_k(t) that ridge regularisation would ignore. Elastic net
regression solves the constrained minimisation problem
min_{β_i} Σ_{t=1}^{T} ( c_i(t) − I(t) β_i )² + λ P_a(β_i),    (21)

where β_i is the vector of loadings given by (β_{i1}, β_{i2}, ..., β_{iK}), I(t) is the matrix consisting of
columns (I_1(t), I_2(t), ..., I_{N_cl}(t)), and λ and a are hyperparameters. P_a(β_i) is defined as

P_a(β_i) = Σ_{j=1}^{N_cl} ( (1 − a) β_{ij}²/2 + a |β_{ij}| ).    (22)
The first term in the sum of Eq. (22) is the L2 penalty of ridge regularisation and the
second term is the L1 penalty of lasso regression. Hence if a = 0 then elastic
net reduces to ridge regression and if a = 1 then elastic net becomes lasso, with a value between
the two controlling the extent to which one is preferred over the other. The determination of the
hyperparameters a, controlling the extent of lasso vs ridge, and λ, the overall penalty strength,
is done using 10 cross-validated fits [47], picking the pair (a, λ) that gives the minimum prediction
error. We show the values of β_ik and test the significance of the predictor I_k(t) at the 5% level
in Table 3, where the p value is shown in brackets, using the significance test outlined in [61].
KO    0.9431 (0)    0.8997 (0)
RIG   0.9041 (0)    1.1265 (0)
Table 3: This table shows the responsiveness β_ik to the cluster mode I_k(t), calibrated as detailed
in section 3.3. P values shown in brackets test the significance of the predictor given by the cluster
mode I_k(t). The first column of values is for the weighted scheme and the second is for equal
weights, which are detailed in A.2.
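The calibration can be sketched with scikit-learn's ElasticNetCV, whose `l1_ratio` plays the role of a and whose `alpha` plays the role of λ (up to scaling conventions, its penalty matches Eq. (22)). The design below is synthetic, standing in for the cluster modes and c_i(t):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
T, K = 600, 10  # observations and candidate cluster modes

# Synthetic design: the response, standing in for c_i(t), loads only on
# modes 0 and 3, so elastic net should zero out the non-interacting ones
I = rng.normal(size=(T, K))
c = 0.8 * I[:, 0] + 0.5 * I[:, 3] + 0.3 * rng.normal(size=T)

# 10-fold cross-validation picks (l1_ratio, alpha) = (a, lambda) jointly,
# minimising the prediction error, as in the procedure described above
enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=10).fit(I, c)
selected = np.flatnonzero(np.abs(enet.coef_) > 0.05)
print(selected)  # the interacting modes survive the selection
```

The feature-selection behaviour visible here is exactly why elastic net, rather than ridge alone, is used to discard non-interacting clusters.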
Figure 11: Heat map of the correlation matrix G with the stocks reordered to correspond to
their cluster no. from table 2. The colour legend for the heat map is given to the right of the figure.
A.4 Visualisation of Residuals and Factors
We can represent the correlation matrix G defined in eq. (8) as a heat map, which is shown in
figure 11 with the stocks reordered according to their cluster no. k given by table 2. From figure 11,
we see the clusters of the correlation matrix, which are given by the square blocks along the diagonal
that are more populated by higher correlation values. We also see the interactions between the
clusters, which are represented by the rectangular blocks of higher correlation values away from the
main diagonal.
In order to provide a visualisation of the factors, we plot in figure 12 the time series of the market
mode I_0(t) and the two particular cluster modes I_k(t) for k = 1, 12, where the subscript of the
cluster modes indicates the particular clusters we are using from table 2. We see from figure
12 that the time series encode important information regarding market conditions. In the plot for
I_0(t) in figure 12a, the two periods of high volatility indicated by the red and black dashed lines
represent the Great Financial Crisis of 2008 and the Eurozone Debt Crisis (note that the extreme
low volatility seen before 2002 was caused by the American stock exchanges being shut down due
to the September 11th terrorist attacks). The time series of I_1(t) in figure 12b again shows a high
Figure 12: Time series of the market mode I_0(t) in (a), and of the cluster modes I_k(t) for k = 1, 12
(see table 2) respectively in (b) and (c), where the subscripts of the cluster modes refer to the clusters
given in table 2. The red dashed lines in these plots refer to the outbreak of the Great Financial
Crisis of 2008. The black dashed line in figure 12a marks a portion of the Eurozone debt crisis. The
light blue dashed line in figure 12b marks a period of low global demand for oil and gas supplies.
volatility period during the financial crisis, but we also see another high volatility phase, denoted
by the light blue dashed line. This represents the volatility in the oil and gas markets caused by
low demand, and makes sense since table 2 shows that cluster 1 represents the Oil and Gas ICB
supersector.
A.5 Smoothness of η
We plot η as a function of the upper limit L′ of the integral in eq. (15), where L′ is
allowed to vary in the interval [1, L_cut]. As we can see from both plots in figure 13, the line is much
smoother, showing that the η proxy is much more robust with respect to the noisy signal of the
empirical ACF. This offers an advantage of using η rather than β_vol, which is more sensitive to the
noise in the ACF and gives poor fits to the ACF in log-log scale, as can be seen from the examples
in figure 1.
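The construction can be sketched as a running integral of the sample ACF; this paraphrases eq. (15), whose exact normalisation may differ, and the AR(1) series is an illustrative stand-in for a log-volatility proxy:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of x for lags 1..max_lag."""
    x = x - x.mean()
    var = x @ x
    return np.array([x[:-k] @ x[k:] / var for k in range(1, max_lag + 1)])

def eta_curve(x, L_cut):
    """Integrated proxy eta as a function of the upper limit L': a running
    integral of the ACF over [1, L'], one value per L' up to L_cut."""
    return np.cumsum(sample_acf(x, L_cut))

rng = np.random.default_rng(2)
# Persistent AR(1) series standing in for a log-volatility proxy
x = np.zeros(3000)
for t in range(1, 3000):
    x[t] = 0.95 * x[t - 1] + rng.normal()
curve = eta_curve(x, 150)  # smooth in L', unlike the noisy ACF itself
```

The cumulative sum averages out the lag-by-lag noise of the empirical ACF, which is the smoothness visible in figure 13.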
Figure 13: Integrated proxy η as a function of the upper limit L′, where η is integrated over [1, L′]
for L′ up to L_cut. Fig. 13a is for Coca Cola Co. and fig. 13b for Transocean.
[1] Jean-Philippe Bouchaud and Marc Potters. Theory of financial risk and derivative pricing:
from statistical physics to risk management. Cambridge university press, 2009.
[2] John Hull and Alan White. The pricing of options on assets with stochastic volatilities. The
journal of finance, 42(2):281–300, 1987.
[3] John C Hull. Options, futures, and other derivatives. Pearson Education India, 2006.
[4] Joël Bun, Jean-Philippe Bouchaud, and Marc Potters. Cleaning large correlation matrices:
tools from random matrix theory. Physics Reports, 666:1–109, 2017.
[5] Luc Bauwens, Sébastien Laurent, and Jeroen VK Rombouts. Multivariate garch models: a
survey. Journal of applied econometrics, 21(1):79–109, 2006.
[6] Peter K Clark. A subordinated stochastic process model with finite variance for speculative
prices. Econometrica: journal of the Econometric Society, pages 135–155, 1973.
[7] Torben G Andersen, Tim Bollerslev, Francis X Diebold, and Paul Labys. Modeling and fore-
casting realized volatility. Econometrica, 71(2):579–625, 2003.
[8] Laurens Van Der Maaten, Eric Postma, and Jaap Van den Herik. Dimensionality reduction: a
comparative review. J Mach Learn Res, 10:66–71, 2009.
[9] Ian T Jolliffe. Principal component analysis and factor analysis. In Principal component
analysis, pages 115–128. Springer, 1986.
[10] J Darbyshire. The volatility surface: a practitioner’s guide, volume 357. Aitch & Dee Limited,
[11] Carol Alexander. Principal component models for generating large garch covariance matrices.
Economic Notes, 31(2):337–359, 2002.
[12] Kun Zhang and Laiwan Chan. Efficient factor garch models and factor-dcc models. Quantitative
Finance, 9(1):71–91, 2009.
[13] Vasiliki Plerou, Parameswaran Gopikrishnan, Bernd Rosenow, Luis A Nunes Amaral, Thomas
Guhr, and H Eugene Stanley. Random matrix approach to cross correlations in financial data.
Physical Review E, 65(6):066126, 2002.
[14] Satya N Majumdar and Pierpaolo Vivo. Number of relevant directions in principal component
analysis and wishart random matrices. Physical review letters, 108(20):200601, 2012.
[15] Donald A Jackson. Stopping rules in principal components analysis: a comparison of heuristical
and statistical approaches. Ecology, 74(8):2204–2214, 1993.
[16] Giacomo Livan, Simone Alfarano, and Enrico Scalas. Fine structure of spectral properties
for random correlation matrices: An application to financial markets. Physical Review E,
84(1):016113, 2011.
[17] William F Sharpe. Capital asset prices: A theory of market equilibrium under conditions of
risk. The journal of finance, 19(3):425–442, 1964.
[18] Richard Roll and Stephen A Ross. An empirical investigation of the arbitrage pricing theory.
The Journal of Finance, 35(5):1073–1103, 1980.
[19] Eugene F Fama and Kenneth R French. Common risk factors in the returns on stocks and
bonds. Journal of financial economics, 33(1):3–56, 1993.
[20] Rémy Chicheportiche and J-P Bouchaud. A nested factor model for non-linear dependencies
in stock returns. Quantitative Finance, 15(11):1789–1804, 2015.
[21] Eugene F Fama and Kenneth R French. Multifactor explanations of asset pricing anomalies.
The journal of finance, 51(1):55–84, 1996.
[22] Charles Engel, Nelson C Mark, and Kenneth D West. Factor model forecasts of exchange rates.
Econometric Reviews, 34(1-2):32–55, 2015.
[23] Bruce Thompson. Exploratory and confirmatory factor analysis: Understanding concepts and
applications. American Psychological Association, 2004.
[24] Robert C Merton. An intertemporal capital asset pricing model. Econometrica: Journal of the
Econometric Society, pages 867–887, 1973.
[25] Michael Zabarankin, Konstantin Pavlikov, and Stan Uryasev. Capital asset pricing model
(capm) with drawdown measure. European Journal of Operational Research, 234(2):508–517,
[26] Nicholas Barberis, Robin Greenwood, Lawrence Jin, and Andrei Shleifer. X-capm: An extrap-
olative capital asset pricing model. Journal of Financial Economics, 115(1):1–24, 2015.
[27] Harry Markowitz. Portfolio selection. The journal of finance, 7(1):77–91, 1952.
[28] Eugene F Fama and Kenneth R French. The cross-section of expected stock returns. the
Journal of Finance, 47(2):427–465, 1992.
[29] Gregory Connor, Matthias Hagmann, and Oliver Linton. Efficient semiparametric estimation
of the fama–french model and extensions. Econometrica, 80(2):713–754, 2012.
[30] Robert Faff, Philip Gharghori, and Annette Nguyen. Non-nested tests of a gdp-augmented
fama–french model versus a conditional fama–french model in the australian stock market.
International Review of Economics & Finance, 29:627–638, 2014.
[31] Eugene F Fama and Kenneth R French. A five-factor asset pricing model. Journal of Financial
Economics, 116:1–22, 2015.
[32] Nai-Fu Chen, Richard Roll, and Stephen A Ross. Economic forces and the stock market.
Journal of business, pages 383–403, 1986.
[33] Marc R Reinganum. The arbitrage pricing theory: some empirical results. The Journal of
Finance, 36(2):313–321, 1981.
[34] Robert Faff. A simple test of the fama and french model using daily data: Australian evidence.
Applied Financial Economics, 14(2):83–92, 2004.
[35] Robert R Grauer and Johannus A Janmaat. Cross-sectional tests of the capm and fama–french
three-factor model. Journal of banking & Finance, 34(2):457–470, 2010.
[36] François-Eric Racicot and William F Rentz. Testing fama–french’s new five-factor asset pricing
model: evidence from robust instruments. Applied Economics Letters, 23(6):444–448, 2016.
[37] Yannick Malevergne and D Sornette. Collective origin of the coexistence of apparent random
matrix theory noise and of factors in large sample correlation matrices. Physica A: Statistical
Mechanics and its Applications, 331(3):660–668, 2004.
[38] Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna. Hierarchically nested factor
model from multivariate data. EPL (Europhysics Letters), 78(3):30006, 2007.
[39] Won-Min Song, Tiziana Di Matteo, and Tomaso Aste. Hierarchical information clustering by
means of topologically embedded graphs. PLoS One, 7(3):e31929, 2012.
[40] Nicolo Musmeci, Tomaso Aste, and Tiziana Di Matteo. Relation between financial market struc-
ture and the real economy: comparison between clustering methods. PloS one, 10(3):e0116201,
[41] Stephen J Taylor. Modeling stochastic volatility: A review and comparative study.
Mathematical Finance, 4(2):183–204, 1994.
[42] F Jay Breidt, Nuno Crato, and Pedro De Lima. The detection and estimation of long memory
in stochastic volatility. Journal of econometrics, 83(1-2):325–348, 1998.
[43] Ajay Singh and Dinghai Xu. Random matrix application to correlations amongst the volatility
of assets. Quantitative Finance, 16(1):69–83, 2016.
[44] Laurent Laloux, Pierre Cizeau, Jean-Philippe Bouchaud, and Marc Potters. Noise dressing of
financial correlation matrices. Physical Review Letters, 83(7):1467, 1999.
[45] Christian Borghesi, Matteo Marsili, and Salvatore Miccichè. Emergence of time-horizon in-
variant correlation structure in financial returns by subtraction of the market mode. Physical
Review E, 76(2):026104, 2007.
[46] Nicolo Musmeci, Tomaso Aste, and Tiziana Di Matteo. Interplay between past market correlation
structure changes and future volatility outbursts. Scientific Reports, 6:36320, 2016.
[47] Hui Zou and Trevor Hastie. Regularization and variable selection via the elastic net. Journal
of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320, 2005.
[48] Rama Cont. Empirical properties of asset returns: stylized facts and statistical issues.
Quantitative Finance, 1(2):223–236, 2001.
[49] Anirban Chakraborti, Ioane Muni Toke, Marco Patriarca, and Frédéric Abergel. Econophysics
review: II. Agent-based models. Quantitative Finance, 11(7):1013–1041, 2011.
[50] Benoit B Mandelbrot. The variation of certain speculative prices. In Fractals and Scaling in
Finance, pages 371–418. Springer, 1997.
[51] Henri Theil. A rank-invariant method of linear and polynomial regression analysis. In Henri
Theil’s contributions to economics and econometrics, pages 345–381. Springer, 1992.
[52] Salvatore Miccichè. Empirical relationship between stocks cross-correlation and stocks volatility
clustering. Journal of Statistical Mechanics: Theory and Experiment, 2013(05):P05015, 2013.
[53] George EP Box, Gwilym M Jenkins, Gregory C Reinsel, and Greta M Ljung. Time series
analysis: forecasting and control, page 33. John Wiley & Sons, 2015.
[54] Lothar Sachs. Applied statistics: a handbook of techniques. Springer Science & Business
Media, 2012.
[55] Nicolo Musmeci, Tomaso Aste, and Tiziana Di Matteo. Risk diversification: a study of per-
sistence with a filtered correlation-network approach. Journal of Network Theory in Finance,
1(1):77–98, 2015.
[56] Michele Tumminello, Salvatore Miccichè, Fabrizio Lillo, Jyrki Piilo, and Rosario N Mantegna.
Statistically validated networks in bipartite complex systems. PLoS One, 6(3):e17994, 2011.
[57] William Feller. An introduction to probability theory and its applications, volume 2. John
Wiley & Sons, 2008.
[58] Ian T Jolliffe. A note on the use of principal components in regression. Applied Statistics,
pages 300–303, 1982.
[59] Kristopher J Preacher, Guangjian Zhang, Cheongtag Kim, and Gerhard Mels. Choosing
the optimal number of factors in exploratory factor analysis: A model selection perspective.
Multivariate Behavioral Research, 48(1):28–56, 2013.
[60] Dennis Child. The essentials of factor analysis. A&C Black, 2006.
[61] Richard Lockhart, Jonathan Taylor, Ryan J Tibshirani, and Robert Tibshirani. A significance
test for the lasso. Annals of Statistics, 42(2):413, 2014.