Ensembles of Portfolio Rules
Federico Nardari1and Rainer Alexander Schüssler2
1University of Melbourne
2University of Rostock
June 29, 2024
Abstract
We propose an ensemble framework for combining heterogeneous portfolio
rules that cannot be accommodated by previously proposed combination methods.
Using our approach, researchers and investors can take advantage of established
and ongoing advances in portfolio choice by diversifying the idiosyncratic risks of
alternative rules. Our ensemble approach maximizes the utility jointly generated by
the candidate portfolio rules, while allowing learning about their time-varying
relative performance. Based on out-of-sample evaluations of over forty years, we
document substantial utility gains in extensive applications to cross-sections of
stocks and to market timing.
Keywords: Portfolio choice; Combination of estimators; Ensemble learning; Estimation
risk
JEL classifications: G11, C10
∗ We are grateful to Philipp Adämmer, Michael Brandt, Lin William Cong, Christoph Frey, Antonio Gargano, Vasyl Golosnoy, Bruce Grundy, Moritz Heiden, Sebastian Heiden, Philipp Haid, Joachim Inkmann, Philipp Kaufmann, Patrick Kelly, Henri Nyberg, Yarema Okhrin, Winfried Pohlmeier, Rafael Weißbach, Qi Zeng, Guofu Zhou, and participants of the 1st FinEML Conference in Rotterdam, the 17th International Conference on Computational and Financial Econometrics in Berlin, and the research seminars at Goethe University Frankfurt, University of Innsbruck, University of Münster, University of Rostock, University of Hagen, University of Lugano and University of Konstanz for valuable comments and discussions.
† E-mail: federico.nardari@unimelb.edu.au; Phone: +61 3 9035 4133. The University of Melbourne, Faculty of Business and Economics, Department of Finance. Level 11, 198 Berkeley Street, 3010 Victoria, Australia.
‡ E-mail: rainer.schuessler@uni-rostock.de; Phone: +49 381 498-4316. University of Rostock, Faculty of Economics and Social Sciences, Department of Economics. Ulmenstraße 69, 18057 Rostock, Germany.
1 Introduction
Over time, many ingenious portfolio rules (PRs) have been devised. A variety of techniques
have been proposed for a cross-section of risky assets, many of which are designed to
address the empirical shortcomings of the seminal Markowitz (1952) Mean-Variance
(MV) framework: these contributions include shrinkage approaches (see, e.g., Ledoit and Wolf 2004; Barroso and Saxena 2022), strategies that exploit a factor structure implied by asset pricing models (see, e.g., MacKinlay and Pástor 2000), volatility timing strategies (see, e.g., Kirby and Ostdiek 2012), parametric portfolio policies (see, e.g., Brandt et al. 2009; DeMiguel et al. 2020), risk-parity strategies (see, e.g., Maillard et al. 2010; Roncalli 2014) and approaches that exploit asset characteristics using machine learning techniques (see, e.g., Gu et al. 2020; Liu and Zhou 2024; Chen et al. 2024; Cong et al. 2024). In addition, DeMiguel et al. (2009) and Duchin and Levy (2009) show that the 1/N rule, which avoids estimation error by ignoring sample information, outperforms many optimization-based rules in challenging out-of-sample (OOS) settings.
Similarly, many PRs have been proposed for optimal market timing, i.e., the allocation
between an aggregate equity portfolio and a risk-free asset. While some of these rules
use macroeconomic data and financial ratios (see, e.g., Rapach et al. 2010; Ferreira and Santa-Clara 2011; Dangl and Halling 2012; Johannes et al. 2014), others rely on forward-looking information from option prices (see, e.g., Pyun 2019), or exploit long-short return anomalies in the cross-section of stocks by using machine learning methods and shrinkage techniques (Dong et al., 2022).
Each of the above PRs (and any other PR) is defined by the information set it uses
and how it maps information into asset weights for a given asset universe. Thus, each PR
is a specialized way of searching the asset universe. Different PRs have different strengths
and limitations, and there are economic and statistical reasons for combining them.
Economically, combinations of PRs can diversify across their idiosyncratic risks, including
estimation risk. The intuition is similar to diversification across assets for investors with a
concave utility function. Moreover, because different PRs use different sets of information
and/or different methods for processing information into portfolio weights, combinations
of PRs can capture complementary aspects of the return generation process, which is
particularly important given the notoriously low signal-to-noise ratio of asset returns.
Statistically, PRs can be thought of as estimators. In various contexts, combinations of
estimators have been shown to be theoretically appealing and to perform well in empirical
applications. Overall, there are good reasons for combining PRs rather than relying on
one particular PR and rejecting all alternatives.
Indeed, the existing literature has developed several combination approaches aimed at
controlling estimation error and consequently improving OOS performance. However,
existing approaches are applicable to rather specific and limited sets of PRs, typically
within the MV or Global Minimum Variance (GMV) framework, with the addition of the 1/N rule (see, e.g., Kan and Zhou 2007; Tu and Zhou 2011; Kan et al. 2022; Lassance et al. 2023). Moreover, in order to determine the optimal combination, they usually rely on specific distributional assumptions about the generating process of asset returns.
As a result, existing combination strategies provide a rather narrow set of tools for
diversifying across PRs and thus for improving asset allocation performance. To the
best of our knowledge, there is no utility-based optimization framework for combining
multiple PRs that rely on heterogeneous information sets and/or very different methods
for mapping information into asset weights, while bypassing the estimation of the return
moments of the PRs.¹ For example, for a cross-section of assets there is currently no
utility maximization framework for combining a shrinkage-based approach such as, e.g.,
the method of Barroso and Saxena (2022), with, say, a volatility timing strategy and with
rules based on cross-sectional characteristics. Similarly, it is not obvious how to combine
market timing rules based on point forecasts for the equity premium with rules based
on density forecasts and/or with others that exploit cross-sectional characteristics. In
particular, existing combination methods can themselves be considered as candidate PRs
1
In our proposed approach, we treat as given the mapping between information signals and asset
weights that each candidate PR implies. As shown in Section 3.1, it is straightforward to back out the
asset weights implied by combining the PRs.
2
and thus be combined with other PRs. Existing approaches do not allow for such an
additional layer. Finally, there is no optimization framework to adapt the combination to
changing market environments, as certain (combinations of) PRs may outperform at
certain points in time, while others may shine at other points in time. Overall, the existing literature does not make it possible to fully exploit the relative strengths of the many solutions that have been or will be proposed. In Section 2 we relate our approach to the existing literature.

¹ In our proposed approach, we treat as given the mapping between information signals and asset weights that each candidate PR implies. As shown in Section 3.1, it is straightforward to back out the asset weights implied by combining the PRs.
The goal of this work is to address these shortcomings. In particular, our primary goal
is to develop an overarching optimization framework for integrating heterogeneous PRs
that alternative combination methods cannot accommodate. In developing our framework,
we develop ways to mitigate estimation error at the combination stage.² Our framework can be viewed as an outer layer in which candidate PRs, regardless of their design, can be combined. Consequently, our framework allows researchers to take full advantage of existing and ongoing advances in asset allocation.

² In fact, it appears that the lack of effective methods to limit estimation error may be the primary reason why previous research has focused on combining only two PRs. Adding more PRs to the combination does not necessarily result in empirical gains. As Tu and Zhou (2011) note, “Theoretically, if the true optimal combination coefficients are known, combining more than two rules must dominate combining any subset of them. However, the true optimal combination coefficients are unknown and have to be estimated. As more rules are combined, more combination coefficients need to be estimated and the estimation errors can grow. Hence, combining more than two rules may not improve the performance.”
Importantly, our proposed framework should not be seen as a replacement for the
above-mentioned combination approaches, which target specific PRs (e.g., MV or GMV)
by exploiting a specialized structure. Rather, these combination rules themselves can be
added to the pool of candidate PRs and combined with other differently designed rules
(e.g., volatility timing or risk parity) within our ensemble. Section 5 elaborates on this key aspect and provides empirical support.
The investor in our framework is endowed with power utility preferences and has
access to a library of candidate PRs. In each period, they choose a combination of PRs
that would have maximized their pseudo OOS utility.³ In determining the combination of
PRs that optimizes utility, the framework retains many attractive features. Although
previously proposed combination strategies share some of these features, no other method
we are aware of possesses all of them. Specifically, our approach:
i) Relies on the pseudo OOS returns of the candidate PRs. Since the optimal combination
of PRs is based on OOS utility gains, all that is needed to implement our approach is a
record of the assigned asset weights of the PRs and the subsequent pseudo OOS returns.
This allows the combination of many heterogeneous PRs, since each of them produces a
weight vector for the allocation across assets, while avoiding the problem of predicting the
moments of the PRs’ returns. As a result, our setup requires the estimation of fewer
parameters and thus reduces the estimation risk at the combination stage. However, we
emphasize that the candidate PRs themselves may or may not use estimated (conditional
or unconditional) moments of asset returns to construct portfolios.
ii) Is an ensemble framework. Our approach assigns combination weights based on the
realized pseudo OOS utility of the combined PRs, rather than their individually generated
utility. As an analogy to building a sports team, our combination framework does not
necessarily include the best individual players, but rather builds the best possible team.
The ensemble view automatically accounts for the (time-varying) interdependencies
among the OOS returns of the PRs. These interdependencies include correlations and
higher order co-moments. There is no conceptual limit to the number of PRs included in
the ensemble.
iii) Allows for adaptive learning. In our approach, economic gains in the recent past can be emphasized over profitability in the more distant past by using a weighting factor. This allows adaptive learning about the optimal combination weights and allows for
rapid shifts when empirically justified. At the level of asset returns, Farmer et al. (2023) find short stretches of predictability for (aggregate) stock returns by a given predictor that are interspersed with long periods showing no evidence of predictability. Similarly, our modeling approach is designed to capture PRs, or combinations of PRs, that perform well “locally” in time.

³ We focus on economic utility in the objective function rather than on a statistical criterion. It is well known that statistical and economic evaluation criteria are not necessarily closely related. Leitch and Tanner (1991) show that accurate predictions in terms of statistical criteria such as the root mean squared error can lead to unprofitable portfolio allocations. Cenesizoglu and Timmermann (2012) corroborate this finding in an application to equity premium forecasts, finding only a weak relationship between economic utility measures and statistical forecast accuracy.
iv) Does not assume a specific data generating process (DGP) for asset returns or for the
PRs’ returns. To determine the combination weights, we do not make any assumptions
about the return-generating process.⁴ Thus, our combination framework can be viewed as a controlling instance: if a candidate PR is grossly misspecified and has nothing to contribute to the ensemble, it will not be selected to be part of the combination. We describe our methodology in Section 3.

⁴ Note, however, that the candidate PRs may or may not make assumptions about the return-generating process.
In Section 4 we apply our combination framework to two classic portfolio choice problems. The first involves allocating across restricted universes of US stocks, and the second involves allocating between the S&P 500 index and Treasury bills.
diversification benefits, we build a pool of heterogeneous candidate PRs, including both
established and emerging PRs. Based on over forty years of OOS evaluations, we find
substantial utility gains from combining PRs. The utility generated by our combination
is either higher than that of any candidate PR or approximately equal to that of the
(ex-post) best-performing candidate PR. Our approach also appears to add value relative
to previously proposed combination methods. In addition, we empirically demonstrate the
virtues of the ensemble framework and adaptive learning. Combination weights change
rapidly over time, documenting that different (combinations of) PRs work well at different
points in time. We perform deeper analyses to shed light on the mechanisms at work in
generating utility gains and the potential for expanding the pool of candidate PRs. These
analyses show that our proposed combination method, by maximizing utility, chooses a
combination of PRs that balances the predictive power for asset returns and the ability
to anticipate their variance; further, average utility gains increase with the number of
PRs combined, suggesting further room for improvement by increasing the number of
candidate PRs.
Researchers and investors who rely on asset pricing rationales or empirical regularities
to make asset allocation decisions face several dimensions of uncertainty. First, there is
uncertainty about which asset pricing rationale or empirical regularity to rely on. Second,
regardless of whether the former or the latter is relied upon, in most cases they are only
suggestive of which variables (e.g., predictive variables for different return moments)
might be relevant for portfolio construction. Third, there is uncertainty about which
econometric or machine learning techniques should be used to translate information into
portfolio weights. Our ensemble approach helps researchers and investors to deal with
these multiple dimensions of uncertainty.
The applications presented in our empirical work are intended to illustrate the efficacy
of our methodological framework. It is not our intention to promote any particular
candidate PR, and we acknowledge that other researchers or investors may prefer to use
our approach with alternative sets of candidate PRs. The primary contributions of our
study are methodological in nature. In particular, we present a framework that: (a)
allows for a comprehensive exploitation of the relative merits of the multitude of proposed
solutions to portfolio choice problems; and (b) allows for the assessment of the incremental
empirical merits (or lack thereof) of newly proposed PRs. Because our ensemble approach
represents a novel integration of conceptually distinct PRs within a general framework, we
argue that it has the potential to transform the way researchers and investors approach
portfolio choice in the future. Rather than seeking an overall superior PR relative to
competing approaches, researchers can instead focus on PRs that provide complementary
information to the ensemble, with the goal of increasing economic gains.
2 Relation to the literature
Our work relates to two main strands of the portfolio choice literature. First, our work
shares common ground with combination approaches of PRs. Along this line, Kan and
Zhou (2007), Tu and Zhou (2011) and Kan et al. (2022) developed combination strategies
to maximize expected OOS performance under estimation risk for MV portfolio choice
problems. Kan and Zhou (2007) derived an optimal three-fund rule consisting of the
risk-free asset, the sample MV portfolio, and the sample minimum-variance portfolio,
that maximizes the expected OOS utility. Based on the intuition that a simple method
and a sophisticated method could optimize the bias-variance trade-off, Tu and Zhou
(2011) combine this three-fund portfolio with the 1/N rule. Kan et al. (2022) explore the case where no risk-free asset is available, and Lassance et al. (2023) also consider combining the 1/N rule with the sample MV portfolio.
The optimal combination rules derived in the works cited above have provided valuable analytical insights into portfolio construction under estimation error by relying on the assumption that asset returns are independently and identically distributed (iid) multivariate normal. In contrast, our proposed framework makes no assumptions about
the return generating process. More importantly, our approach is not restricted to PRs
of certain designs, such as, e.g., MV-based rules, and allows the combination of PRs
that could not be combined with existing methods due to their heterogeneity. Our
method is not a competing approach to these works, but is complementary to them:
combination methods themselves represent PRs and can be included as candidate PRs in
our combination approach, provided that they are applicable to the investment problem
at hand. We include previously proposed combination methods, such as those proposed by Kan et al. (2022), in our empirical analysis and show that they produce improved performance when combined with additional PRs; see Section 5 for a detailed discussion of how our approach relates conceptually and empirically to existing PRs.
Paye (2012) considers combinations of PRs as one possible strategy to reduce estimation
risk associated with MV approximations to the economic value of PRs under general
utility specifications such as power utility. He finds that combining estimators can
significantly reduce estimation risk. Paye (2012) determines the combination weights
based on a resampling approach, assuming iid returns, and considers equal combination
weights as an alternative. Although it is not the focus of our paper, we consider, among others, PRs based on MV approximations as candidate PRs for power utility optimization.
While MV approximations may be valuable because estimates of higher moments are not
available in some settings, the quadratic utility underlying MV preferences has some
counterintuitive properties such as increasing absolute risk aversion. Thus, in addition to
taking into account preferences about higher-order moments, evaluating PRs based on
MV in a power utility framework is also desirable because of the more intuitive properties
of power utility. A further combination approach has been suggested by Pettenuzzo and Ravazzolo (2016). While they propose combining predictive densities based on their weighted individual past performance, we combine PRs based on their jointly generated past performance.
Overall, we push the boundaries of combining PRs by providing a framework that
(i) is designed to mitigate estimation risk by avoiding estimation of the PRs’ return
(co-)moments; (ii) can combine PRs of arbitrary design; (iii) incorporates additional
appealing features such as adaptive learning about the combination weights, an ensemble
perspective for assigning combination weights, and a direct focus on the investor’s utility.
Second, our work is related to approaches that directly optimize economic utility
rather than taking a two-step approach with the need for estimated moments of returns.
These include parametric portfolio policies (Brandt et al., 2009; DeMiguel et al., 2020), a boosting approach (Nevasalmi and Nyberg, 2021), a subset combination approach (Maasoumi et al., 2022), a genetic programming approach (Liu and Zhou, 2024), and an approach that uses deep reinforcement learning (Cong et al., 2024). The above-mentioned techniques involve optimizing economic utility for specific portfolio choice problems at the individual asset level, while our approach is about maximizing utility one
level up by combining PRs. Our work also complements these approaches in that they
can be included as candidate PRs and combined with other PRs in our framework.
3 Methodology
3.1 Basic structure
Suppose that we have a set of $M$ candidate PRs, indexed as $m = 1, \ldots, M$. For a typical point in time $s$, each PR assigns weights to the $N$ assets, indexed as $n = 1, \ldots, N$,⁵ based on information observed through $s-1$, the date of portfolio construction. We denote the (exogenously) assigned asset weights of the $m$-th PR for time $s$ as $\omega^{(-s)}_{m,s}$, where $\omega^{(-s)}_{m,s}$ is an $N \times 1$ column vector, $\big(\omega^{(-s)}_{m,s,1}, \ldots, \omega^{(-s)}_{m,s,N}\big)'$. The superscript $(-s)$ indicates that information revealed at time $s$ is not available to determine the portfolio allocation at time $s-1$.
The $N \times 1$ column vector of gross asset returns measured over the period $[s-1:s]$ (i.e., one month in our applications) is given as $\widetilde{R}_s = \big(\widetilde{R}_{s,1}, \ldots, \widetilde{R}_{s,N}\big)'$, where $\widetilde{R}_{s,n} = 1 + \widetilde{r}_{s,n}$,⁶ and $\widetilde{r}_s = (\widetilde{r}_{s,1}, \ldots, \widetilde{r}_{s,N})'$. Then, the pseudo OOS gross return of the $m$-th PR at time $s$ can be expressed as:

$$R_{m,s} = \omega^{(-s)\prime}_{m,s} \widetilde{R}_s. \qquad (1)$$
The investor's optimization problem is to maximize the conditional expected utility of the portfolio's (gross) return $R_{p,t}$, based on the information through time $t-1$, as a function of the combination weights $\{w_{m,t}\}_{m=1}^{M}$ assigned to the PRs:

$$\arg\max_{\{w_{m,t}\}_{m=1}^{M}} \mathbb{E}_{t-1}\left[U(R_{p,t})\right] = \arg\max_{\{w_{m,t}\}_{m=1}^{M}} \mathbb{E}_{t-1}\left[U\!\left(\sum_{m=1}^{M} w_{m,t} R_{m,t}\right)\right], \qquad (2)$$

where $U(\cdot)$ denotes utility. The conditional expectation of the utility jointly generated by the returns of the PRs, as expressed in Equation (2), cannot be computed in general unless one imposes additional restrictions on the PRs' return-generating process. As a workaround, we technically treat the combination weights as constant through time.

⁵ In this paper, we only consider PRs that allocate across the same investment opportunity set. However, our framework allows for PRs that allocate across different investment opportunity sets with partial or no overlap of assets. For example, one PR could allocate across different stocks, while another PR could allocate across commodities.

⁶ Depending on the portfolio choice problem at hand, returns can be defined as raw (total) returns or excess returns.
Thus, the combination weights that maximize the conditional expected utility at a given point in time are the same for all previous times, and we can rewrite Equation (2) as an unconditional optimization problem. Suppose we are at time $t-1$ and have access to a track record of pseudo OOS returns generated by the PRs, spanning the interval between $\tau$ and $t-1$. We can then replace the expected utility in Equation (2) with its sample counterpart, i.e., the sum of period-by-period realized utilities. The optimization problem then becomes:

$$w^*_t = \arg\max_{\{w_m\}_{m=1}^{M}} \sum_{s=\tau}^{t-1} U\!\left(\sum_{m=1}^{M} w_m R_{m,s}\right), \qquad (3)$$
where $w_t = (w_{1,t}, \ldots, w_{M,t})'$ and $w^*_{\tau:t} = w^*_t$. This unconditional formulation of the optimization problem bypasses the need to estimate (co)moments of the PRs' returns, similar to the framework of Brandt et al. (2009).⁷
If there is some persistence in economic states, more recent data are likely to contain
more relevant predictive information than older data because they come from a more
similar market or economic environment. To account for such plausible economic dynamics,
we allow realized joint utilities to receive different weights in the optimization. Specifically,
we maximize the weighted past performance jointly generated by the PRs:
$$w^*_t = \arg\max_{\{w_m\}_{m=1}^{M}} \sum_{s=\tau}^{t-1} \alpha^{t-1-s} \cdot U\!\left(\sum_{m=1}^{M} w_m R_{m,s}\right), \qquad (4)$$

subject to

$$\sum_{m=1}^{M} w_m = 1; \quad w_m \geq 0, \quad m = 1, \ldots, M, \qquad (5)$$
where $\alpha$ denotes a (fixed) forgetting factor to weight past profitability, and the constraints (5) impose a convex combination of the candidate PRs. By allowing exponential down-weighting of past performance (and repeating the optimization at each point in time), we select the combination weights in an adaptive manner. In this way, we incorporate the possibility to learn about the relative strengths of the candidate PRs over specific time periods, allowing for more rapid weight changes than under the standard unweighted formulation in (3).⁸ We will discuss the forgetting factor and the weight constraints in more detail in Sections 3.3 and 3.4, respectively.

⁷ As discussed earlier in Section 2, while we share common ground with Brandt et al. (2009) in this regard, our framework differs from theirs in that our goal is to allocate combination weights across PRs rather than to estimate coefficients associated with asset characteristics and map these coefficients to individual asset weights.

⁸ The discounting we use may be reminiscent of reinforcement learning. However, in reinforcement learning, future rewards are discounted, whereas in our approach, past utility is discounted to reduce its influence on the optimal combination weights relative to more recently generated utility. Furthermore, our optimization approach is technically defined as a one-period problem, where, in each period, we maximize the (discounted) utility up to the given point in time. Reinforcement learning seems promising for dynamic optimization problems, especially if one takes a largely data-driven approach. As discussed in Section 2, PRs based on reinforcement learning at the asset level can be added to the pool and combined with heterogeneous PRs.
For an investor with power utility preferences,⁹ we can state the optimization problem (4) more specifically as:

$$w^*_t = \arg\max_{\{w_m\}_{m=1}^{M}} \sum_{s=\tau}^{t-1} \alpha^{t-1-s} \cdot \frac{\left(\sum_{m=1}^{M} w_m R_{m,s}\right)^{1-\gamma}}{1-\gamma}, \qquad (6)$$

where $\gamma$ denotes the relative risk aversion coefficient.¹⁰ Alternative utility functions are possible in our framework. In particular, our approach can be used with any utility function that can be formulated as the discounted sum of additive utilities. For example, one could use a mean-variance utility function with discounting or a downside risk formulation.

⁹ By assuming power utility preferences at the level of combining PRs, preferences about higher-order moments and tail risk properties are taken into account. This is true even if the candidate PRs in the library do not optimize for power utility but instead use MV approximations, or if the allocation does not rely on an optimization framework at all. If we required candidate PRs to optimize power utility in the first place, we would reject a large portion of promising PRs from the outset. PRs that are not designed to maximize power utility preferences can still contribute to the ensemble.

¹⁰ We note that power utility fails to exist if the gross return approaches zero; that is, $U \to -\infty$ as $R \to 0$. The PRs we use in our empirical work avoid extreme returns and, hence, this is empirically not a concern. To theoretically ensure that power utility exists, we would have to restrict attention to candidate PRs that put appropriate constraints on the asset weights.
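To make this concrete, the following is a minimal sketch of how the problem in (6), subject to the constraints in (5), could be solved numerically. The function name, the simulated data, and the use of SciPy's SLSQP solver are our own illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def optimal_combination_weights(pr_returns, alpha=0.95, gamma=3.0):
    """Sketch of (6): maximize discounted power utility of the combined
    pseudo-OOS gross returns, subject to the convex-combination
    constraints (5). Setting alpha = 1 recovers the unweighted problem (3).

    pr_returns : (T, M) array of pseudo-OOS gross returns R_{m,s}
    alpha      : forgetting factor
    gamma      : relative risk aversion coefficient
    """
    T, M = pr_returns.shape
    # Discount factors alpha^(t-1-s): the most recent observation gets weight 1.
    discounts = alpha ** np.arange(T - 1, -1, -1)

    def neg_objective(w):
        combined = pr_returns @ w                        # combined gross returns
        utility = combined ** (1.0 - gamma) / (1.0 - gamma)
        return -np.sum(discounts * utility)

    constraints = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * M                            # w_m >= 0
    w0 = np.full(M, 1.0 / M)                             # start from equal weights
    result = minimize(neg_objective, w0, method="SLSQP",
                      bounds=bounds, constraints=constraints)
    return result.x

# Example with simulated gross returns for M = 5 candidate PRs
rng = np.random.default_rng(0)
R = 1.0 + rng.normal(0.005, 0.04, size=(120, 5))
print(np.round(optimal_combination_weights(R), 3))
```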
For certain purposes, such as executing trades and calculating transaction costs, it is necessary to know the asset weights that result from combining the PRs. With the optimized combination weights in hand, we can back out the implied weights of the $N$ assets. These weights $\omega^*_s$ are linear combinations of the asset weights determined by the PRs (summarized in the matrix $\Omega_s$) and the optimized combination weights $w^*_s$:

$$\underset{[N \times 1]}{\omega^*_s} = \underset{[N \times M]}{\Omega_s} \cdot \underset{[M \times 1]}{w^*_s}, \qquad (7)$$

where

$$\Omega_s = \begin{pmatrix} \omega^{(-s)}_{m=1,s,n=1} & \ldots & \omega^{(-s)}_{m=M,s,n=1} \\ \vdots & \ddots & \vdots \\ \omega^{(-s)}_{m=1,s,n=N} & \ldots & \omega^{(-s)}_{m=M,s,n=N} \end{pmatrix}.$$
The usefulness of recovering the individual asset weights is easily seen when the positions for a given asset implied by the candidate PRs partially or fully offset each other. An execution desk trades the individual asset positions implied by the combination, rather than the positions implied by the different PRs individually, thereby saving transaction costs.
3.2 Our combination framework as a stacking algorithm
Our proposed combination (6) can be classified as a stacking algorithm. Stacking is a
well-studied meta-learning algorithm for combining estimators in the machine learning
and statistics literature (Wolpert, 1992; LeBlanc and Tibshirani, 1996; Breiman, 1996; Yang, 2001; Van der Laan et al., 2007; Polley and Van Der Laan, 2010). Stacking
algorithms have been developed to minimize cross-validated risk defined by some statistical
criterion. We adapt this method to maximize cross-validated utility instead of optimizing
some statistical loss criterion and extend it to allow exponential down-weighting of
older performance to obtain combination weights based on the local performance of
(combinations of) PRs.
Stacking is an ensemble method; that is, it assesses the cross-validated risk/utility of
the combined candidate estimators (here, the PRs) rather than assessing their risk/utility
from a stand-alone perspective. Thus, the combination weights assigned according to (6) are based on an ensemble perspective that implicitly accounts for time-varying interdependencies among PR returns. With power utility preferences, the entire joint distribution of PR returns is automatically exploited in (6) when optimizing the
Another important feature of stacking is that it uses cross-validation to avoid
overfitting. Our combination is based on pseudo OOS returns. To account for the
time-series structure of the data, standard K-fold cross-validation cannot be applied. Our
approach is similar to leave-one-out cross-validation by omitting information revealed at time $s$ for portfolio construction at time $s-1$; see (1). Figure 1 illustrates the general mechanism of leave-one-out cross-validation. In our context, at each point in time and for
mechanism of leave-one-out cross-validation. In our context, at each point in time and for
a given PR, the blue dots represent the information used for the next period’s portfolio
allocation, and the red dots represent the implied pseudo OOS (gross) returns. Our
approach is based on maximizing the utility generated by the red dots.
Figure 1: Schematic illustration of leave-one-out cross-validation. The illustration is adapted from Hyndman and Athanasopoulos (2018).
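In code, the pseudo-OOS record underlying Equation (1) and the cross-validation scheme of Figure 1 could be generated as in the sketch below; get_weights is a hypothetical placeholder for any candidate PR's mapping from past data to asset weights.

```python
import numpy as np

def pseudo_oos_returns(gross_asset_returns, get_weights, warmup=120):
    """Build the record R_{m,s} of Eq. (1): the weights for month s use only
    data through s-1 (blue dots), so the realized portfolio return at s is
    out-of-sample (red dot)."""
    T, _ = gross_asset_returns.shape
    record = []
    for s in range(warmup, T):
        w = get_weights(gross_asset_returns[:s])    # information through s-1 only
        record.append(w @ gross_asset_returns[s])   # realized pseudo-OOS return
    return np.array(record)

# Example: the 1/N rule as the plug-in candidate PR
one_over_n = lambda past: np.full(past.shape[1], 1.0 / past.shape[1])
rng = np.random.default_rng(1)
R_assets = 1.0 + rng.normal(0.005, 0.05, size=(240, 10))
print(pseudo_oos_returns(R_assets, one_over_n)[:3])
```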
If we were to include information revealed at time $s$ for the allocation at time $s-1$, the resulting returns would be in-sample returns. In such an environment, the combination
resulting returns would be in-sample returns. In such an environment, the combination
would typically assign all weight to the PR with the highest in-sample returns. However,
PRs with high in-sample returns may produce poor OOS results due to overfitting.
Stacking is a genuine combination rather than a selection method. This means
that, even asymptotically, positive combination weights can be spread among different
PRs instead of being assigned to the most successful PR in the library. This feature is
attractive for the realistic case where none of the candidates in the library captures the
true data generating process. However, if a single candidate PR dominates any possible
combination, that PR will receive the entire weight. Therefore, selection is nested as a special case.
Although stacking does not impose any restrictions on the combination weights per se,
convex combinations of estimators have been found to provide greater stability of the final estimator (see, e.g., Breiman, 1996; Van der Laan et al., 2007).
Stacking algorithms have a strong statistical foundation. Under certain conditions,
Van der Laan et al. (2007) established their asymptotic oracle performance, which means
that the learning algorithm asymptotically performs exactly as well (with respect to the
defined evaluation criterion) as the best possible ex-post choice for a given data set
among the set of weighted combinations of the estimators. Beyond these theoretical
results, learning algorithms based on stacking have been shown to be adaptive and robust
estimators for small samples in both artificial and real data sets (Wolpert, 1992; Breiman, 1996; LeBlanc and Tibshirani, 1996; Van der Laan et al., 2007; Polley and Van Der Laan, 2010). In most cases, they perform as well as or even better than the ex-post best candidate
estimator. As a stacking algorithm, our combination framework relies on a methodology
with excellent statistical properties.
3.3 The forgetting factor
The exponential forgetting factor $\alpha \leq 1$ in (6) emphasizes the recent history of past performance. In our empirical work, we adaptively choose the value of $\alpha$ from the grid $S_\alpha = \{0.90 : 0.01 : 1.00\}$. Data-adaptive estimation of the forgetting factor follows Giraitis et al. (2013), who find exponential down-weighting with a data-driven forgetting factor to be the most robust choice for accounting for structural change in time series across extensive simulations and empirical exercises. Beckmann et al. (2020) find substantial empirical gains from data-adaptive estimation of the exponential discount factor in the context of adaptive model choice.
The lower the value of $\alpha$, the more we down-weight performance in the more distant past. For example, when working with monthly data, if $\alpha = 0.99$, economic utility three years ago receives about 70% as much weight as economic utility last month. We take $\alpha = 0.90$ as the lower limit of the grid since this value implies extremely fast forgetting: if $\alpha = 0.90$, utility three years ago gets only about 2% as much weight as utility last period. The effective window size is $1/(1-\alpha)$ and thus 10 months in the case of $\alpha = 0.90$.¹¹
FLEXPOOL denotes the combination where the value of $\alpha$ is determined in each period from the grid $S_\alpha$. In the empirical analysis, we consider two additional benchmark combinations: the first is STATPOOL, where we set $\alpha = 1$. The second assigns equal weights to the PRs, regardless of their past performance.
By allowing the down-weighting of older performance, the combination weights can be quickly adjusted to changing environments when empirically justified. Another advantage of the forgetting-factor approach is that changing dynamics are parsimoniously captured by a single parameter. This makes the approach less prone to estimation error than parameter-rich alternatives such as regime-switching models. For each point in time, we choose the optimal time-dependent value $\alpha_t$ from the grid $S_\alpha$ as the one that has produced the highest utility in the past, from $\tau^*$ to $t-1$:

$$\alpha^*_t = \arg\max_{\alpha \in S_\alpha} \sum_{s=\tau^*}^{t-1} U\!\left[w^*_{t-1}(\alpha)' R_s\right], \qquad (8)$$
where $\tau^* = \tau + \tau_0$, and $\tau_0$ denotes the number of observations set aside for the initial optimization of the combination weights, $R_s = (R_{1,s}, R_{2,s}, \ldots, R_{M,s})'$, and $w^*_{t-1}(\alpha)$ denotes the optimal combination weights according to (6), conditional on a given value of $\alpha$. Note that we use down-weighting when optimizing the combination weights in (6) for a given value of $\alpha$. However, we do not use down-weighting to choose between different values of $\alpha$ based on the recursive evaluation in (8).¹²
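A sketch of the recursive grid search in (8) follows. The helper weights_for_alpha stands in for re-solving (6) on data available before each period s for a given α; both the helper and the refitting frequency are our illustrative assumptions.

```python
import numpy as np

ALPHA_GRID = np.arange(0.90, 1.0 + 1e-9, 0.01)   # S_alpha = {0.90:0.01:1.00}

def select_alpha(pr_returns, weights_for_alpha, gamma=3.0, tau_star=60):
    """Sketch of (8): pick the alpha whose implied combination weights would
    have produced the highest cumulative realized power utility from
    tau_star to t-1. No down-weighting is used at this selection stage."""
    T = pr_returns.shape[0]
    best_alpha, best_score = None, -np.inf
    for alpha in ALPHA_GRID:
        score = 0.0
        for s in range(tau_star, T):
            w = weights_for_alpha(alpha, s)          # fitted on data before s
            combined = w @ pr_returns[s]
            score += combined ** (1.0 - gamma) / (1.0 - gamma)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha
```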
Rolling windows can be seen as an ad hoc alternative to exponential down-weighting
for accommodating structural breaks by allowing rapid changes in combination weights.
However, exponential down-weighting with forgetting factors estimated from the data is a more sophisticated and robust choice.

¹¹ Note that $\sum_{s=0}^{\infty} \alpha^s = \frac{1}{1-\alpha}$ for $\alpha < 1$. The upper bound $\alpha = 1$ implies no down-weighting of older data, so standard recursive window estimation is nested as a special case.

¹² The use of recursive window estimation to choose between different values of the forgetting factor follows Beckmann et al. (2020) and Adämmer and Schüssler (2020), among others, and is guided by our desire to keep the framework as parsimonious as possible. Adding another forgetting factor to choose between different values is easily done (Bernaciak and Griffin, 2024).
3.4 Weight restrictions and additional regularization
As mentioned above, one motivation for imposing a convex combination of PRs is the
stability of the stacking algorithm. Another motivation is that we want to ensure that any
restrictions on asset weights imposed at the level of the candidate PRs (e.g., no short
selling, restrictions on sector weights, etc.) also hold at the level of the combined PRs.
Although our proposed combination approach uses pseudo OOS returns and is
parsimoniously parameterized, the estimation risk of the combination weights could still
be a concern in finite samples. There is no guarantee that the optimized combination
weights will outperform simple benchmarks such as equally weighted PRs. With finite
samples, it is not necessarily the case that the more candidate PRs are included, the
better the performance. Increasing the number of candidate PRs increases the potential
for further diversification gains, but also increases the estimation risk due to the increased
number of combination weights. The number of PRs that should be included in the
library is an empirical question and will depend on the return-risk profiles of the included
PRs and the interdependence structure of their returns. Regarding the estimation risk of
the combination weights, it is convenient that the regularization can be applied directly to
the combination weights in our framework.
As a regularization strategy, one can, for example, impose an $\ell_0$-constraint on the combination weights:

$$\|w\|_0 = k, \qquad (9)$$

where $\|w\|_0 = \sum_{m=1}^{M} \mathbb{1}(w_m \neq 0)$ counts the number of non-zero combination weights, and $\mathbb{1}(\cdot)$ denotes the indicator function. The tuning parameter $k$, $k \leq M$, controls the size of the subset of PRs that are combined. Empirically, the tuning parameter $k$ can be
determined by the researcher in a data-adaptive manner by choosing the value of $k$ that would have maximized the pseudo-real-time OOS performance. One could go a step further in terms of regularization and even avoid estimating combination weights by imposing, in addition to the constraint (9), that all non-zero weights are equal:

$$w_m \in \left\{0, \tfrac{1}{k}\right\}, \quad m = 1, \ldots, M. \qquad (10)$$
The use of equally weighted subsets of PRs, i.e., the use of constraint (10) in conjunction with constraint (9), is motivated by the empirical success of equally weighted subsets in combination approaches related to forecast combinations; see, e.g., Dong et al. (2022).¹³ As a computationally simpler regularization alternative, the $\ell_1$-constraint can be used instead of the $\ell_0$-constraint (9). When the $\ell_1$-constraint is combined with the constraint that all non-zero weights are equal (10), this regularization strategy is the partially-egalitarian lasso proposed by Diebold and Shin (2019) and can be regarded as a shortcut to equally weighted subsets. In principle, there is no upper bound on the number of PRs that can be considered.
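As an illustration of the equally weighted subset strategy implied by imposing (9) together with (10), a brute-force sketch (function and variable names are ours):

```python
import numpy as np
from itertools import combinations

def best_equal_weight_subset(pr_returns, k, alpha=0.95, gamma=3.0):
    """Impose (9) and (10): exactly k non-zero combination weights, all equal
    to 1/k. Each of the C(M, k) subsets is scored by its discounted realized
    utility, and the best subset is returned with its weight vector."""
    T, M = pr_returns.shape
    discounts = alpha ** np.arange(T - 1, -1, -1)
    best_subset, best_score = None, -np.inf
    for subset in combinations(range(M), k):
        combined = pr_returns[:, subset].mean(axis=1)       # equal 1/k weights
        score = np.sum(discounts * combined ** (1.0 - gamma) / (1.0 - gamma))
        if score > best_score:
            best_subset, best_score = subset, score
    weights = np.zeros(M)
    weights[list(best_subset)] = 1.0 / k
    return best_subset, weights
```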
In the main analyses of our empirical applications with five to six PRs, we imposed only convex combination weights; see constraint (5). Economic gains increased on average with the number of PRs included; see Figure 4 for our application to a cross-section of the largest 50 stocks and Figure 7 for the market timing application. We also experimented with the additional weight constraint (9), and with constraint (9) in conjunction with constraint (10), where we chose $k$ in a pseudo-real-time, data-driven manner. We did not find any empirical gain from using additional regularization in our empirical settings, but it may be very beneficial in applications with more candidate PRs.

¹³ A complementary robustness strategy to additional weight constraints could be to use bootstrapped data up to a given point in time to estimate the combination weights at that point in time; see, e.g., Bonaccolto and Paterlini (2020) and Kazak and Pohlmeier (2023). While such a strategy could still generate adaptive combination weights, the ability to rapidly change the combination weights would be partially lost. However, block bootstrap methods could be extended to allow for exponential down-weighting of performance longer ago by introducing an additional tuning parameter.
4 Empirical work
4.1 Application to a cross-section of stocks
4.1.1 Investment universe and empirical study design
The investment universe in this application consists of the largest 50 US stocks. Their monthly excess returns are constructed from CRSP data. We use data from 1957:01 to 2020:12 and only include stocks listed on the NYSE, NASDAQ, or AMEX with a code of 10 or 11. At the beginning of a given month $t$, the largest 50 stocks (in terms of market value) with non-missing monthly returns in the previous 120 months make up the investment universe.¹⁴ ¹⁵ Note that the largest 50 stocks may change from month to month, so the investment universe is dynamic.

¹⁴ In the (rare) cases where a stock's return is missing for month $t$, we set the excess return to zero.

¹⁵ The choice of a rolling estimation window of 120 months follows, among others, DeMiguel et al. (2009) and Kan et al. (2022).
Each candidate PR that we maintain in our library must assign weights to the 50 stocks at the beginning of each month. We obtain the first OOS portfolio returns in 1967:01. We reserve the first 60 OOS returns for the initial optimization of the PR weights according to (6) and another 60 months for the initial tuning of the forgetting factor $\alpha$ according to (8). Our first OOS evaluation takes place in January 1977. We then move forward and run the optimization based on an extended sample of 61 OOS portfolio returns, also choosing the value of the forgetting factor based on one additional observation. We proceed recursively and end up with an evaluation sample that spans the period from 1977:01 to 2020:12. We consider a power utility investor with a relative risk aversion of $\gamma = 3$. Our setup only considers risky assets. If we wanted to include a risk-free asset, we could do so by adding a candidate PR represented by a vector of zeros, since the returns in this application are defined as excess returns.
4.1.2 Candidate PRs
We consider the following five candidate PRs:
• 1/N: This PR assigns equal weights to all assets. The 1/N rule does not use any sample information and thus avoids estimation error. It has been found to outperform a wide range of estimated optimal portfolios in many empirical data sets (DeMiguel et al., 2009; Yuan and Zhou, 2022).
• Volatility timing (VOLTIME): Kirby and Ostdiek (2012) propose a volatility timing strategy where the weights are given by:

$$\omega_{t+1,n} = \frac{\left(1/\widehat{\sigma}^2_{t+1,n}\right)^{\eta}}{\sum_{n=1}^{N} \left(1/\widehat{\sigma}^2_{t+1,n}\right)^{\eta}}, \quad n = 1, \ldots, N, \qquad (11)$$

where $\widehat{\sigma}^2_{t+1,n}$ denotes the estimated conditional variance of the $n$-th risky asset at time $t+1$, using a rolling window of past returns from $t-119$ to $t$. This PR ignores any sample information about conditional means and covariances. The tuning parameter $\eta \geq 0$ controls timing aggressiveness. Kirby and Ostdiek (2012) consider the values $\eta = 1, 2$, and 4. We set $\eta = 4$; see the code sketch after this list.
• Maximizing expected OOS utility: Kan et al. (2022) develop combination portfolios that have the highest expected OOS utility for an MV investor in a setting without a risk-free asset. Their proposed approach is to combine the GMV portfolio with a sample zero-investment portfolio, taking into account estimation risk to control the exposure to the sample zero-investment portfolio. When the exposure to the sample zero-investment portfolio is set to zero, the GMV is nested as a special case. The combination method proposed by Kan et al. (2022) can be used together with refined estimates of expected returns and expected (co-)variances to form optimal portfolios, e.g., by using shrinkage estimators or a single-factor structure. We consider the following two specifications of their approach:¹⁶

– Kan et al. (2022) combined with MacKinlay and Pástor (2000) (KWZ-MP): MacKinlay and Pástor (2000) exploit the implications of an asset pricing model with a single risk factor for estimating expected returns. This reduces the number of parameters that need to be estimated, thereby reducing estimation risk.¹⁷

– Kan et al. (2022) combined with Ledoit and Wolf (2004) (KWZ-LW): Ledoit and Wolf (2004) propose a shrinkage estimator of the covariance matrix that involves a linear combination of the sample covariance matrix and the identity matrix.¹⁸ ¹⁹
• Galton-Shrinkage (GALTON): Barroso and Saxena (2022) propose a shrinkage estimator that uses the structure of past OOS forecast errors to correct the expected returns and expected (co-)variances used as inputs for portfolio optimization. They use the cleansed inputs to compute the Galton MV portfolio, whose weights are the result of simple Markowitz optimization applied to the corrected inputs.²⁰
The key formula for correcting the optimization inputs is:

$$Z_t = \widehat{g}_0 + \widehat{g}_1 Z_{t-1}, \qquad (12)$$

where, for each variable $Z$ of interest (mean returns, variances, or pairwise correlations), $Z_{t-1}$ denotes its historical estimate at time $t-1$, computed from a rolling window of 60 observations, and $Z_t$ is the cleansed portfolio input for time $t$. Fama-MacBeth regressions are used to estimate the Galton shrinkage coefficients $g_0$ and $g_1$ for the means, variances and pairwise correlations. This is done using a large estimation universe consisting of the set of the largest 500 US stocks at each point in time from which to learn.²¹ To run the Fama-MacBeth regressions, we set a window of 12 ex-post realizations, and to initialize the Galton coefficients, we set aside an additional learning period of 108 months.²² ²³
The slope coefficient in (12) controls the amount of shrinkage. At one extreme, if its estimated value is 1, the corrected input equals the historical estimate, i.e., the uncorrected estimate. At the other extreme, if its estimated value is 0, the historical estimates are found to be completely unreliable, and the corrected inputs are set to the grand mean of the returns, variances, or pairwise correlations observed up to time $t$.²⁴ Let $g_{1,\text{mean}}$, $g_{1,\text{var}}$ and $g_{1,\text{corr}}$ denote the Galton slope coefficients for the means, variances, and correlations, respectively. Different extreme values of $g_{1,\text{mean}}$, $g_{1,\text{var}}$, and $g_{1,\text{corr}}$ produce well-known strategies as special cases, namely the 1/N portfolio for $g_{1,\text{mean}} = g_{1,\text{var}} = g_{1,\text{corr}} = 0$, the sample GMV for $g_{1,\text{mean}} = 0,\ g_{1,\text{var}} = g_{1,\text{corr}} = 1$, and the sample Markowitz portfolio for $g_{1,\text{mean}} = g_{1,\text{var}} = g_{1,\text{corr}} = 1$.

¹⁶ We use estimation windows of 120 months and set the risk aversion coefficient to 3 in both PRs.

¹⁷ The weights for this rule are given by Equation 51 in Kan et al. (2022).

¹⁸ The weights for this rule are given by Equation 43 in Kan et al. (2022).

¹⁹ KWZ-MP and KWZ-LW are combination rules based on the assumption that returns are independently and identically distributed multivariate normal. While this assumption may not be literally true, these PRs may still contribute to the ensemble. Our data-driven framework can measure the extent to which these PRs provide incremental empirical value to the other candidates in the pool.

²⁰ The weights for this rule are computed according to Equation 7 in Barroso and Saxena (2022).

²¹ Barroso and Saxena (2022) consider both larger and smaller estimation universes and find similar results.

²² See Equations 9 to 13 in Barroso and Saxena (2022) for details on estimating the Galton coefficients.

²³ In the notation of Barroso and Saxena (2022), $H = 60$, $E = 12$ and $L = 108$.

²⁴ Note that we restrict the slope coefficients to lie between 0 and 1.
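As referenced in the VOLTIME bullet above, the following is a minimal sketch of the volatility timing weights in (11); the sample variance over the rolling window is our stand-in for the conditional variance estimator.

```python
import numpy as np

def voltime_weights(returns_window, eta=4.0):
    """Volatility timing weights of Kirby and Ostdiek (2012), Eq. (11):
    each asset's weight is proportional to (1 / estimated variance)^eta,
    normalized to sum to one.

    returns_window : (120, N) rolling window of past returns (t-119 to t)
    """
    sigma2_hat = returns_window.var(axis=0, ddof=1)  # per-asset variance estimate
    raw = (1.0 / sigma2_hat) ** eta
    return raw / raw.sum()

# Example: five assets with different volatilities; low-vol assets get more weight
rng = np.random.default_rng(2)
window = rng.normal(0.005, 0.04, size=(120, 5)) * np.array([1.0, 2.0, 1.0, 3.0, 1.0])
print(np.round(voltime_weights(window), 3))
```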
4.1.3 Baseline results
Table 1 shows the results for the candidate PRs and the combined PRs. Given our focus on economic utility, the certainty equivalent return (CER) is a natural choice for measuring portfolio performance. We report (monthly) CER values²⁵ without transaction costs as well as with proportional transaction costs (CER_TC) of 20 basis points (bps).²⁶ We further report the (monthly) Sharpe ratio without transaction costs (SR) and with proportional transaction costs (SR_TC) of 20 bps.²⁷ As a measure of downside risk, we report the maximum drawdown after transaction costs of 20 bps (MaxDD_TC), given its relevance for asset managers and fiduciaries, as emphasized by van Hemert et al. (2020), among others. Finally, we report the average monthly turnover (Avg. TO).

²⁵ CER values are computed over the evaluation sample from $\tau^{**}$ to $T$ as
$$CER = \left\{(1-\gamma)\,\frac{1}{T-\tau^{**}+1}\sum_{s=\tau^{**}}^{T} U\!\left[w^*_s(\alpha^*_s)' R_s\right]\right\}^{\frac{1}{1-\gamma}} - 1. \qquad (13)$$

²⁶ The choice of 20 bps follows Kan et al. (2022). In Appendix C we report results for transaction costs of 50 bps and explore a strategy to mitigate transaction costs.

²⁷ CER values are more appropriate as an evaluation metric than the Sharpe ratio in our power utility framework, which aims to exploit time-varying investment opportunities; see, e.g., Bianchi and Guidolin (2014). Nevertheless, we report the Sharpe ratio because of its popularity in evaluating the performance of asset allocation strategies.
The main results from Table 1 can be summarized as follows. FLEXPOOL generated the highest CER value and Sharpe ratio before and after transaction costs, both compared to each candidate PR and compared to the alternative combination schemes. FLEXPOOL performed significantly better than STATPOOL in terms of CER values and the SR. This finding illustrates the importance of emphasizing recent utility when assigning combination weights. Distant utility was substantially down-weighted throughout the sample, with a forgetting factor $\alpha$ that varied between 0.93 and 0.96 (see Figure 2). FLEXPOOL also generated considerably higher utility gains than equally weighted PRs. In terms of downside risk, VOLTIME had the lowest maximum drawdown (0.3688) and FLEXPOOL was a close second (0.3790).

If we compare FLEXPOOL with each candidate PR and with the alternative combined PRs, the outperformance in terms of CERs and Sharpe ratios is significant at least at the 10% level according to the test of Diebold and Mariano (1995) and the test of Ledoit and Wolf (2008), respectively.
Figure 2: Evolution of the selected value of the forgetting factor $\alpha$ in FLEXPOOL.
What combination weights were assigned to the different PRs, and how did they change over time? Figure 3 provides the answers. The subplot in the upper left corner of Figure 3 shows the weight shares of the PRs averaged over the evaluation sample. The blue (red) bars represent the weight shares of FLEXPOOL (STATPOOL). The remaining subplots show the evolution of the PR weights over time. The blue (red) lines show the combination weights of FLEXPOOL (STATPOOL). STATPOOL essentially split the combination weights between GALTON and KWZ-MP, while the weights in FLEXPOOL were broadly distributed, with weight shares between 13.68% (GALTON) and 25.09% (KWZ-MP) over the evaluation sample. Interestingly, GALTON received the lowest average weight in FLEXPOOL, even though it was the candidate PR with the highest CER value over the evaluation sample. This result is a manifestation of the ensemble view, where (time-varying) interdependencies among PR returns are taken into account. The optimal combination weights of STATPOOL are more persistent than those of FLEXPOOL, where the combination weights change rapidly and where often the entire weight is assigned to one candidate PR. For example, VOLTIME received a high weight after the burst of the dotcom bubble and its aftermath, as well as during the subprime crisis. In the relatively calm period of the mid to late 1990s, the 1/N rule prevailed. Next, we conduct more in-depth analyses to shed more light on the mechanisms at work that produce the utility gains of FLEXPOOL.
Table 1: Summary of results for a cross-section of the 50 largest assets.
The table reports our results for the evaluation sample from 1977:01 to 2020:12. It includes monthly CER values without transaction costs and with proportional transaction costs (CER_TC) of 20 bps for a power utility investor with relative risk aversion of $\gamma = 3$. As additional performance measures, the table shows the monthly Sharpe ratio before transaction costs (SR) and after 20 bps of proportional transaction costs (SR_TC), and the maximum drawdown for the transaction-cost-adjusted returns (MaxDD_TC). Avg. TO is the average turnover over the evaluation sample.

Candidate PRs      CER     CER_TC  SR      SR_TC   MaxDD_TC  Avg. TO
1/N                0.0035  0.0033  0.1479  0.1442  0.5410    0.0782
VOLTIME            0.0051  0.0049  0.1954  0.1898  0.3688    0.1015
KWZ-MP             0.0046  0.0041  0.1790  0.1663  0.4976    0.2464
KWZ-LW             0.0045  0.0034  0.1772  0.1469  0.5037    0.5717
GALTON             0.0052  0.0046  0.2029  0.1853  0.4608    0.3100

Combined PRs
FLEXPOOL           0.0068  0.0060  0.2390  0.2168  0.3790    0.4184
STATPOOL           0.0045  0.0040  0.1832  0.1681  0.4758    0.2650
EQUAL WEIGHTS      0.0050  0.0046  0.1992  0.1878  0.4023    0.1964
Figure 3: Combination weights.
The subplot in the upper left corner shows the weight shares of FLEXPOOL (blue bars) and STATPOOL (red bars), averaged over the evaluation sample from 1977:01 to 2020:12. The remaining subplots (1/N, VOLTIME, GALTON, KWZ-MP, KWZ-LW) show the evolution of the combination weights of the candidate PRs over time. The blue (red) lines show the combination weights in FLEXPOOL (STATPOOL).
4.1.4 Deeper analyses
Relationship between the number of combined PRs and economic utility
How does the performance of FLEXPOOL depend on the number of PRs combined? So far, we have only reported the results for the case where we combine all five PRs considered. How would the results change if we combined subsets of two, three, or four PRs? Figure 4 shows the CER values as a function of the number of PRs combined. The blue diamonds indicate the CER value produced by a particular subset of combined PRs. For example, in the case of subsets of two combined PRs, there are $\binom{5}{2} = 10$ possible combinations. The red square shows the average CER value for a given number of combined PRs. Figure 4 illustrates that, on average, the CER values increase with the number of combined PRs.
Figure 4: CER values as a function of the number of combined PRs using FLEXPOOL.
The blue diamonds show the generated CER values of all possible combinations for a given number of combined PRs. The red square represents the average CER value for a given number of combined PRs.
The highest CER value (0.0058) in the subsets of two PRs is achieved by the combination of GALTON and KWZ-LW; the lowest performance (0.0032) is generated by the combination of the 1/N rule with KWZ-LW. In the subsets of three rules, the highest CER value (0.0069) is achieved by the combination of VOLTIME, KWZ-MP and KWZ-LW. The lowest performance (0.0047) among subsets of three combined rules is generated by the 1/N rule, GALTON and KWZ-MP. In the subsets of four combined PRs, the highest CER (0.0068) is obtained by omitting the 1/N rule, and the lowest performance (0.0054) is obtained by omitting GALTON. Note that the CER value for every subset of four PRs is higher than that of the best candidate PR (0.0052).

The finding that the CER values increase on average with the number of PRs combined illustrates the benefits of diversification across more than just two PRs. The benefits of combination are particularly encouraging given the positively correlated returns of the PRs, with empirical correlation coefficients ranging from 0.64 (for 1/N with KWZ-MP) to 0.84 (for GALTON with VOLTIME).
Predictive power and risk management
Each PR, regardless of how it is constructed, provides a record of asset weights and
26
implied OOS returns. To delve deeper into the mechanics of our combination framework,
we exploit the record of asset weights by analyzing their relationship to the implied OOS
returns. Specifically, based on Frahm (2015), we look at statistics on the predictive power
and risk management of the candidate and combined PRs. As a proxy for predictive
power, we estimate Spearman's rank correlation coefficient $\hat{\rho}_{SP}\left(\omega^{**}, \tilde{\tilde{r}}\right)$ as a robust correlation measure, where
$$\omega^{**} = \begin{pmatrix} \omega^*_{1977:01} \\ \vdots \\ \omega^*_{2020:12} \end{pmatrix} \quad \text{and} \quad \tilde{\tilde{r}} = \begin{pmatrix} \tilde{r}_{1977:01} \\ \vdots \\ \tilde{r}_{2020:12} \end{pmatrix},$$
and where $\omega^{**}$ denotes the asset weights implied by the PRs and the combination weights computed according to (7),^{28} stacked from the beginning to the end of the evaluation sample. With $N = 50$ assets and an evaluation sample of 528 months (1977:01 to 2020:12), the vector $\omega^{**}$ has length $50 \times 528 = 26{,}400$. Similarly, $\tilde{\tilde{r}}$ denotes the stacked pseudo-OOS (excess) returns generated by the $N = 50$ assets.

28 For candidate PRs and the equally weighted benchmark combination, the optimal weights $w^*_s$ in (7) are replaced by assigning the full weight to the respective candidate PR or equal weights, respectively.
The intuition behind the rank correlation $\hat{\rho}_{SP}(\omega^{**}, \tilde{\tilde{r}})$ is the following: a PR will assign a positive (negative) weight if the expected return of an asset is positive (negative). Thus, $\hat{\rho}_{SP}(\omega^{**}, \tilde{\tilde{r}})$ approximates the overall predictive power of a PR. A high positive correlation indicates high predictive power.
With respect to risk management, proxied by a PR's ability to control the variance of its returns, a PR assigns a low (high) squared weight $\omega^{*,2}_{s,n}$ to the $n$-th asset if the $n$-th asset's expected squared return is high (low) for time $s$. Similarly, for a pair of assets $p$ and $q$ ($p \neq q$), a PR takes a high (low) cross-exposure $\omega^*_{s,p}\,\omega^*_{s,q}$ when the product of the associated asset returns is expected to be low (high). Based on this intuition, we compute Spearman's rank correlation $\hat{\rho}_{SP}\left(\omega^{***}, \tilde{\tilde{\tilde{r}}}\right)$, where
$$\omega^{***} = \begin{pmatrix}
\omega^*_{1977:01,n=1} \times \omega^*_{1977:01,n=1} \\
\vdots \\
\omega^*_{1977:01,n=1} \times \omega^*_{1977:01,n=50} \\
\vdots \\
\omega^*_{1977:01,n=50} \times \omega^*_{1977:01,n=50} \\
\vdots \\
\omega^*_{2020:12,n=1} \times \omega^*_{2020:12,n=1} \\
\vdots \\
\omega^*_{2020:12,n=1} \times \omega^*_{2020:12,n=50} \\
\vdots \\
\omega^*_{2020:12,n=50} \times \omega^*_{2020:12,n=50}
\end{pmatrix}
\quad \text{and} \quad
\tilde{\tilde{\tilde{r}}} = \begin{pmatrix}
\tilde{r}_{1977:01,n=1} \times \tilde{r}_{1977:01,n=1} \\
\vdots \\
\tilde{r}_{1977:01,n=1} \times \tilde{r}_{1977:01,n=50} \\
\vdots \\
\tilde{r}_{1977:01,n=50} \times \tilde{r}_{1977:01,n=50} \\
\vdots \\
\tilde{r}_{2020:12,n=1} \times \tilde{r}_{2020:12,n=1} \\
\vdots \\
\tilde{r}_{2020:12,n=1} \times \tilde{r}_{2020:12,n=50} \\
\vdots \\
\tilde{r}_{2020:12,n=50} \times \tilde{r}_{2020:12,n=50}
\end{pmatrix}.$$
The rank correlation $\hat{\rho}_{SP}\left(\omega^{***}, \tilde{\tilde{\tilde{r}}}\right)$ approximates the ability of a PR to control the variance of the generated returns and can thus be seen as a proxy for risk management. The more negative the correlation, the better the risk management of the PR.
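The following sketch illustrates how both rank correlations could be computed from the stacked weights and returns; it assumes the weights and pseudo-OOS returns are available as (T, N) arrays and is an illustrative reading of the construction above, not the authors' code.

```python
# Sketch: predictive-power and risk-management proxies via Spearman's
# rank correlation. weights and returns are (T, N) arrays of portfolio
# weights and pseudo-OOS excess returns (T = 528 months, N = 50 assets).
import numpy as np
from scipy.stats import spearmanr

def predictive_power(weights, returns):
    # stack period by period into long vectors, as in the definition of
    # omega** and r~~, and correlate weights with realized returns
    rho, pval = spearmanr(weights.ravel(), returns.ravel())
    return rho, pval

def risk_management(weights, returns):
    # all products w_{s,p} * w_{s,q} and r_{s,p} * r_{s,q}, including the
    # squared terms (p = q), as in the definition of omega*** and r~~~
    T = weights.shape[0]
    ww = np.concatenate([np.outer(weights[s], weights[s]).ravel() for s in range(T)])
    rr = np.concatenate([np.outer(returns[s], returns[s]).ravel() for s in range(T)])
    rho, pval = spearmanr(ww, rr)
    return rho, pval   # more negative means better variance control
```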
Table 2 summarizes the results for predictive power and risk management. FLEXPOOL has by far the highest predictive power, with an estimated rank correlation coefficient of 0.0162, which is different from zero at the 1% significance level.^{29} Interestingly, FLEXPOOL achieves significant predictive power even though none of the candidate PRs have significant predictive power when measured over the entire evaluation sample. The key is to quickly shift the combination weights to (combinations of) PRs with local predictive power. VOLTIME has by far the best risk management. In maximizing economic utility, FLEXPOOL implicitly strikes a balance between predictive power and risk management, partially sacrificing VOLTIME's risk management to achieve better predictive power.

29 As might be expected, the magnitudes of the correlations are fairly small, consistent with a low degree of predictability.
Table 2: Predictive power and risk management.
Spearman's rank correlation ρ̂_SP(ω**, r̃̃) approximates predictive power, and Spearman's rank correlation ρ̂_SP(ω***, r̃̃̃) approximates risk management. The p-values pertaining to the null that the correlation coefficient is zero are shown in parentheses next to the correlation estimates.

                   ρ̂_SP(ω**, r̃̃)       ρ̂_SP(ω***, r̃̃̃)
Candidate PRs
1/N                     −                    −
VOLTIME           −0.0014 (0.8259)    −0.0525 (0.0000)
KWZ-MP             0.0042 (0.4932)    −0.0070 (0.2552)
KWZ-LW             0.0030 (0.6234)     0.0088 (0.1527)
GALTON             0.0074 (0.2283)     0.0076 (0.2162)
Combined PRs
FLEXPOOL           0.0162 (0.0085)    −0.0141 (0.0223)
STATPOOL           0.0037 (0.5506)     0.0003 (0.9617)
EQUAL WEIGHTS      0.0064 (0.2996)    −0.0126 (0.0401)
4.1.5 Additional PRs and alternative asset universe
In Appendix A we present an application to a cross-section of the largest 500 US stocks and include an alternative set of PRs.

4.1.6 Performance in different economic and market conditions
In Appendix B we provide further analyses with respect to combination weights and utility gains in different economic and market conditions, as well as a subsample analysis.
4.2 Application to market timing
4.2.1 Investment universe and empirical study design
In this application we consider an investor endowed with power utility preferences and a relative risk aversion of $\gamma = 3$ who can allocate their wealth between the S&P 500 index and three-month US Treasury bills each month. We constrain the weight allocated to stocks to be in the range $[0; 1.5]$, thus ensuring that each PR under consideration obeys these weight constraints. Our evaluation period ranges from 1977:01 to 2020:12. Each PR generates its first OOS return in 1967:01, and we use 60 months of OOS returns for the initial optimization of the combination weights. We reserve another 60 observations for the initial tuning of the forgetting factor $\alpha$. CER values are calculated based on total returns in this application.
4.2.2 Candidate PRs
We consider a diverse set of six different PRs. The first three PRs are based on strategies that exploit Bayesian predictive densities of the next period's excess return $y$, that is, the return on the S&P 500 (including dividends) in excess of the risk-free rate $r^f$. Bayesian predictive densities of excess returns are attractive choices as a basis for market timing decisions because of their ability to accommodate parameter and model uncertainty as well as the use of time-varying parameters (TVP) and stochastic volatility (SV). In the context of return predictability, Bayesian predictive densities have been used by Dangl and Halling (2012), Johannes et al. (2014) and Pettenuzzo and Ravazzolo (2016), among others. While the first three PRs in our library differ with respect to specific choices that are relevant for computing their respective Bayesian predictive densities, we can present them all in canonical form. These PRs solve the investment problem by directly maximizing the conditional expected utility of the next period's wealth $W_{t+1}$:
$$\arg\max_{\omega_{t+1}\in[0;1.5]} E_t\left[U(W_{t+1})\,|\,\mathcal{D}_t\right] = \arg\max_{\omega_{t+1}\in[0;1.5]} \int \frac{\tilde{R}_{t+1}^{1-\gamma}}{1-\gamma}\, p(y_{t+1}|\mathcal{D}_t)\, dy_{t+1}, \qquad (14)$$
where $p(y_{t+1}|\mathcal{D}_t)$ denotes a Bayesian predictive density for the excess return $y$ in $t+1$ based on the information set available at time $t$. The information set $\mathcal{D}_t$ includes the returns and predictors that can be observed up to $t$ as well as the choices of the prior in $t = 0$. Since the optimal weight under power utility does not depend on the wealth level, we can set $W_t = 1$ and proceed with the gross returns in (14). Let $\tilde{R}_{t+1}$ denote the total gross return in $t+1$, where the total return includes the excess return $y$ and the risk-free rate $r^f$. Let $\omega_{t+1}$ be the weight given to the risky asset for time $t+1$. We maximize the conditional expected utility by approximating (14), based on $B = 100{,}000$ potential realizations $y^{(b)}_{\mathrm{draw},t+1}$, $b = 1,\dots,B$, of the excess return at $t+1$ drawn from the predictive density $p(y_{t+1}|\mathcal{D}_t)$:
$$\arg\max_{\omega_{t+1}\in[0;1.5]} \frac{1}{B}\sum_{b=1}^{B} \frac{\left[\omega_{t+1}\left(1 + r^f_{t+1} + y^{(b)}_{\mathrm{draw},t+1}\right) + \left(1-\omega_{t+1}\right)\left(1 + r^f_{t+1}\right)\right]^{1-\gamma}}{1-\gamma}. \qquad (15)$$
We set $\gamma = 3$.^{30}
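A minimal sketch of the approximation (15) follows, assuming the $B$ draws from the predictive density are already available; the grid search over the admissible weight range is an assumption, as the paper does not specify the numerical optimizer.

```python
# Sketch of (15): pick the risky weight on a grid over [0, 1.5] that
# maximizes average power utility across B simulated excess returns.
# draws is a length-B array of y^(b); rf is the risk-free rate for t+1.
import numpy as np

def optimal_weight(draws, rf, gamma=3.0, grid=np.linspace(0.0, 1.5, 151)):
    best_w, best_u = 0.0, -np.inf
    for w in grid:
        # combined gross return: w in the risky asset, 1-w at the risk-free rate
        gross = w * (1.0 + rf + draws) + (1.0 - w) * (1.0 + rf)
        utility = np.mean(gross ** (1.0 - gamma) / (1.0 - gamma))
        if utility > best_u:
            best_w, best_u = w, utility
    return best_w
```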
To obtain a Bayesian predictive density for the excess returns, we
have to impose some structure on the return generating process. We assume the dynamics
of the excess return to be given by TVP regression models with the following structure:
$$y_{t+1} = X_t'\theta_t + \varepsilon_{t+1}, \qquad \varepsilon_{t+1} \sim N(0, \upsilon_{t+1}), \qquad (16)$$
$$\theta_t = \theta_{t-1} + \xi_t, \qquad \xi_t \sim N(0, \Xi_t), \qquad (17)$$

where $X_t$ denotes the vector of predictive variables observed in $t$. This vector contains a subset of twelve predictor variables from Welch and Goyal (2008), depending on the specific setting.^{31} Let $\theta_t$ denote the vector of (unobserved) time-varying coefficients.

30 Note that the risk aversion coefficient for candidate PRs may be different from the value of the risk aversion coefficient used to optimize the combination weights in (6).

31 The predictors are the dividend yield, the dividend-payout ratio, the earnings-to-price ratio, the sum of squared daily returns on the S&P 500 index as a measure of stock variance, the book-to-market ratio, the net equity expansion, the Treasury bill rate, the long-term government bond yield, the long-term government bond return, the default return spread, the default yield spread and inflation (lagged by one additional month). We use the predictors from 1927:01 through 2020:11. We downloaded the data from Amit Goyal's homepage: http://www.hec.unil.ch/agoyal/. See Welch and Goyal (2008) for a more detailed description of the variables.
The observation error $\varepsilon_{t+1}$ is assumed to be normally distributed with mean zero and unknown, time-varying variance $\upsilon_{t+1}$. The time-varying coefficients are assumed to evolve according to a multivariate random walk without drift. We initialize the coefficients $\theta_0$ with a diffuse conditional normal prior centered around zero.
The random shocks $\xi$ are assumed to be multivariate normal with unknown, time-varying system covariance matrix $\Xi_t$. Conditional on the observational variance and the system covariance, standard Bayesian methods for state-space models using the Kalman filter can be applied to estimate the coefficients $\theta_t$ and to compute the predictive distribution of the returns. However, the observation variance and the system covariance are unknown. We use a forgetting factor approach to model their dynamics, where the value of the forgetting factor $\delta$ controls the dynamics of the coefficients, and the value of the forgetting factor $\kappa$ controls the dynamics of the observational variance. If we set $\delta = 1$, all available historical observations will be equally weighted in the updating process, resulting in constant coefficients. If we set $\delta < 1$, older observations are exponentially down-weighted. The lower we choose the value of $\delta$, the more we down-weight older observations.
Similarly, $\kappa$ controls the dynamics of the observation variance. If we set $\kappa = 1$, we get a constant variance. Using a conjugate specification with an inverse-gamma prior on the observation variance and a conditional normal prior on the coefficients, along with fixed values of the forgetting factors $\delta$ and $\kappa$, we obtain a t-distributed predictive density $p(y_{t+1}|\mathcal{D}_t)$ that takes the uncertainty in the coefficients and in the observational variance into account. A minimal sketch of this filtering recursion follows.
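The sketch assumes the standard variant in which $1/\delta$ inflates the state covariance in the prediction step and $\kappa$ drives an exponentially weighted estimate of the observation variance; the initialization constants are illustrative, and the conjugate inverse-gamma layer that delivers the t-distributed predictive density is omitted for brevity.

```python
# Sketch of a TVP regression filtered with forgetting factors, in the
# spirit of (16)-(17): delta replaces an explicit system covariance
# (P is inflated by 1/delta each step), kappa drives an EWMA estimate
# of the observation variance. X is (T, K), y is (T,).
import numpy as np

def tvp_filter(X, y, delta=0.98, kappa=0.97, prior_var=100.0):
    T, K = X.shape
    theta = np.zeros(K)                  # diffuse prior centered at zero
    P = prior_var * np.eye(K)            # illustrative prior variance
    v = np.var(y)                        # illustrative initial obs. variance
    preds = np.zeros(T)
    for t in range(T):
        P = P / delta                    # forgetting: inflate state covariance
        preds[t] = X[t] @ theta          # one-step-ahead point prediction
        e = y[t] - preds[t]
        S = X[t] @ P @ X[t] + v          # prediction variance
        gain = P @ X[t] / S
        theta = theta + gain * e         # Kalman update of the coefficients
        P = P - np.outer(gain, X[t] @ P)
        v = kappa * v + (1 - kappa) * e**2   # EWMA observation variance
    return preds, theta
```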
Our PRs based on Bayesian predictive densities include the following three setups, which differ with respect to the included predictors and the considered values of the forgetting factors $\delta$ and $\kappa$:
• LARGE-TVP-SV:
This multivariate setup includes all of the twelve considered predictors from Welch and Goyal (2008) and uses Bayesian model averaging (BMA) (Raftery et al., 1997) to assign weights to the predictive densities, which are based on different specifications of the coefficients' dynamics. The dynamics are controlled by the value of the forgetting factor $\delta$, which is chosen from the grid $S_\delta = \{0.96; 0.97; 0.98; 0.99; 1.00\}$, including constant coefficients as a special case. Thus, the five individual models $M_j$, $j = 1,\dots,5$, in this setup are defined by different values of $\delta$. Since conditional heteroskedasticity is a well-known stylized fact for asset returns, we set the forgetting factor $\kappa = 0.97$ for the observational variance, following the choice of RiskMetrics (J.P.Morgan/Reuters, 1996) for monthly data. A priori, we assign equal weights to the five predictive densities. After computing the weights of the predictive densities at each point in time using Bayes' rule (see the sketch after this list), asset allocation decisions can be made based on the mixture t-distribution using the approximation (15).
• BMA-TVP-CV:
The second setup is based on the setup suggested by Dangl and Halling (2012). With a set of twelve available predictors, there are $2^{12}$ different combinations of predictors that are either included in or excluded from the vector of predictors $X$. The value of the forgetting factor $\delta$ to control the dynamics of the coefficients is again chosen from the grid $S_\delta = \{0.96; 0.97; 0.98; 0.99; 1.00\}$. Thus, $5 \times 2^{12} = 20{,}480$ different models $M_j$, $j = 1,\dots,20{,}480$, defined by different subsets of included predictors and values of $\delta$, are at our disposal. Dangl and Halling (2012) use a constant variance (CV). To mimic their choice in this regard, we set $\kappa = 1.00$. A priori, we assign equal weights to the 20,480 predictive densities and update their weights using BMA.^{32}
• UNIV-TVP-SV:
Univariate TVP-SV models are a common choice for modeling the dynamics of aggregate stock returns (Johannes et al., 2014; Pettenuzzo and Ravazzolo, 2016). This setup uses only univariate (UNIV) predictive regressions, including one of the twelve predictors in each of the regressions. It also considers the grid $S_\delta = \{0.96; 0.97; 0.98; 0.99; 1.00\}$ for $\delta$, sets $\kappa = 0.97$, and weights all predictive densities equally.

32 While this setup closely follows Dangl and Halling (2012), there are slight implementation differences. For example, Dangl and Halling (2012) include the cross-sectional beta premium of Polk et al. (2006) as a predictor, while we do not include it since the data are only available through 2002.
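The recursive BMA weight update referenced in the LARGE-TVP-SV setup can be sketched as follows, assuming the one-step-ahead predictive likelihoods of the individual models have already been computed; this is an illustrative rendering of Bayes' rule, not the authors' code.

```python
# Sketch of recursive Bayesian model averaging: densities is a (T, J)
# array whose (t, j) entry is the predictive likelihood p_j(y_t | D_{t-1})
# of model j for the realized return at t. Weights start equal.
import numpy as np

def bma_weights(densities):
    T, J = densities.shape
    w = np.full(J, 1.0 / J)        # equal prior weights
    path = np.zeros((T, J))
    for t in range(T):
        w = w * densities[t]       # Bayes' rule: prior times likelihood
        w = w / w.sum()            # normalize to a probability vector
        path[t] = w
    return path
```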
The following three PRs rely on a MV framework to construct portfolios. The PRs differ in how they compute the estimated excess return $\hat{y}_{t+1}$. The weight assigned to the S&P 500 index is computed as follows:
$$\omega_{t+1}^{[0;1.5]} = \frac{1}{\gamma}\,\frac{\hat{y}_{t+1}}{\hat{\sigma}^2_{t+1}}, \qquad (18)$$
where $\hat{\sigma}^2_{t+1}$ is the estimate of the variance, calculated over a rolling window of 60 months, and the superscript indicates that the weight is truncated to the interval $[0; 1.5]$. The risk aversion $\gamma$ is set to 3 (a sketch of this rule follows the list below). We consider the following PRs:
• Sum-of-the-parts method (SOP):
Imposing economic constraints, Ferreira and Santa-Clara (2011) predict aggregate stock returns as the sum of the dividend-price ratio and the long-run historical average of earnings growth. Unlike predictive regressions, there are no parameters to estimate and thus no estimation error.
• Combination of forecasts (CF):
Rapach et al. (2010) propose an equally weighted combination of point forecasts,
where each point forecast is based on univariate predictive regressions with constant
coefficients and one of the predictors proposed in Welch and Goyal (2008). Note
that we use monthly data, while Rapach et al. (2010) use quarterly data and
consider 15 instead of 12 predictors.
• Prevailing Historical Mean (PHM):
This PR uses the prevailing historical mean of the excess returns as a point forecast.
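A minimal sketch of the weight rule (18), assuming the variance is estimated over a rolling 60-month window of past returns and the weight is truncated to $[0; 1.5]$:

```python
# Sketch of the mean-variance timing rule (18): the weight on the S&P 500
# is the forecast-to-variance ratio scaled by 1/gamma, truncated to the
# admissible range. y_hat is the point forecast of next month's excess
# return; past_returns is the rolling 60-month estimation window.
import numpy as np

def mv_weight(y_hat, past_returns, gamma=3.0, lo=0.0, hi=1.5):
    var_hat = np.var(past_returns, ddof=1)    # rolling-window variance
    w = y_hat / (gamma * var_hat)
    return float(np.clip(w, lo, hi))          # enforce the weight bounds
```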
Our library of PRs contains heterogeneous asset allocation approaches. The PRs use
different information sets and different ways of mapping information into asset weights.
The most obvious difference between them is that some rely on (differently designed) Bayesian predictive densities, while others rely on different strategies for generating point forecasts within a MV specification.
The PRs LARGE-TVP-SV, BMA-TVP-CV, UNIV-TVP-SV and CF agree on the predictor variables in Welch and Goyal (2008) as the information set, which consists of a set of variables that the academic literature has suggested as predictors of the equity premium. However, the four PRs use quite different econometric approaches to exploit the predictors considered in Welch and Goyal (2008), spanning univariate regressions with constant or time-varying coefficients, constant and stochastic volatility, and different types of shrinkage or no shrinkage at all (LARGE-TVP-SV). These different approaches, among many others, reflect the uncertainty in translating asset pricing rationales or empirical regularities in the predictability of return moments into portfolio choices, even when the information set is agreed upon. The academic literature recognizes that the choice of the econometric approach to capitalize on the predictability of stock market returns can have a significant impact on the economic gains derived; see, for example, Cederburg et al. (2023), who, among other things, examine the influence of constant versus stochastic volatility in forecasting models on economic gains.
SOP (Ferreira and Santa-Clara, 2011) uses a different information set than the four PRs above and exploits the different time-series persistence of the components in its sum-of-the-parts approach, a specific empirical pattern used to improve return predictions. PHM does not use any information from predictors at all. These PRs, and potentially many others, provide different translations into portfolio choice of asset pricing rationales or of strategies informed by empirical regularities. For example, time-varying coefficient models are consistent with the implications of asset pricing rationales that use time-varying risk aversion to generate time-varying risk premiums; see, e.g., Campbell and Cochrane (1999). Our ensemble approach allows for a range of possible specifications and is therefore well suited to dealing with the uncertainty in this translation.
4.2.3 Results
Table 3 reports the results. It shows CER values without transaction costs as well as with proportional transaction costs (CER^{TC}) of 20 bps. We also report the (monthly) Sharpe ratio without transaction costs (SR) and with proportional transaction costs (SR^{TC}) of 20 bps. The $R^2_{OOS}$-statistic (Campbell and Thompson, 2008) compares the point forecast accuracy of a given approach to the PHM benchmark. It measures the proportional reduction in the sample mean squared forecast error compared to the prevailing historical mean benchmark. Thus, a positive $R^2_{OOS}$-statistic indicates that the mean squared forecast error of the given approach is lower than that of the PHM. As a proxy for point forecasting ability, we report $\hat{\rho}_{SP}(\omega^{**}, y)$, the Spearman rank correlation coefficient between the weights assigned to the risky asset and the realized excess returns, where $\omega^{**}$ denotes the stacked weights of the risky asset over the evaluation sample and $y$ denotes the vector of realized excess returns over the evaluation sample. As a proxy for risk management, we report Spearman's rank correlation coefficient $\hat{\rho}_{SP}(\omega^{**,2}, y^2)$.
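For concreteness, the $R^2_{OOS}$-statistic can be computed as in the following sketch, assuming the point forecast series are aligned with the realized excess returns:

```python
# Sketch of the Campbell-Thompson out-of-sample R^2: the proportional
# reduction in mean squared forecast error relative to the prevailing
# historical mean (PHM). y holds realized excess returns, forecasts the
# model's point forecasts, phm the PHM forecasts.
import numpy as np

def r2_oos(y, forecasts, phm):
    mse_model = np.mean((np.asarray(y) - np.asarray(forecasts)) ** 2)
    mse_phm = np.mean((np.asarray(y) - np.asarray(phm)) ** 2)
    return 1.0 - mse_model / mse_phm   # positive: model beats the PHM
```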
The empirical results can be summarized as follows. FLEXPOOL generated the highest CER values and Sharpe ratios among the combined PRs. It demonstrated both high predictive power and strong risk management. LARGE-TVP-SV had the strongest predictive power, while SOP had the best risk management. The evolution of the combination weights is shown in Figure 5. Most of the time, LARGE-TVP-SV received a high weight (and often the entire weight). However, SOP, with its strong risk management, was selected in three turbulent periods: in September and October 1998, after the strongly negative returns in August 1998, a period associated with the Russian currency crisis and the collapse of Long-Term Capital Management; from 2000:12 to 2003:10, a period associated with the burst of the dotcom bubble; and from 2020:04 to 2020:07, after the large drop due to the COVID-19 pandemic in March 2020. Similar to our first application, the results document FLEXPOOL's ability to automatically balance the predictive power and risk management of the candidate PRs while maximizing economic utility.
The value of the forgetting factor $\alpha$ was chosen to be 0.96 according to (8) over the entire evaluation period. The emphasis on recent economic utility gains resulted in a faster adjustment of the combination weights compared to STATPOOL (see Figure 5). The aggregate weight over time assigned by FLEXPOOL to the risky asset, i.e., the S&P 500 index, is shown in Figure 6. In particular, during periods of economic turbulence, such as the burst of the dotcom bubble in the early 2000s and the Great Financial Crisis, there are extended periods of low weights assigned to the risky asset. In terms of downside risk, FLEXPOOL has the lowest maximum drawdown (0.2848) among all candidate and combined PRs and less than half of the maximum drawdown produced by the PHM (0.6601). If we compare FLEXPOOL with each candidate PR and alternative combined PRs, the outperformance in terms of CERs and Sharpe ratios is significant at least at the 10% level according to the test of Diebold and Mariano (1995) and the test of Ledoit and Wolf (2008), respectively, except for LARGE-TVP-SV and STATPOOL.
Among the candidate PRs, LARGE-TVP-SV generated by far the highest CER value and SR despite its low $R^2_{OOS}$-statistic of $-0.1133$. This result is reminiscent of the findings by Cenesizoglu and Timmermann (2012) and Leitch and Tanner (1991) that the point forecast accuracy of a model and the value it generates, measured by an economic criterion, can diverge strongly. Therefore, the $R^2_{OOS}$-statistic may be a poor indicator to guide portfolio decisions. LARGE-TVP-SV overfits the data because it uses many predictors, time-varying coefficients and no shrinkage mechanism, resulting in the low $R^2_{OOS}$-statistic. As a complex model, LARGE-TVP-SV is strong at capturing the structure of returns, with a high predictive correlation compared to shrunk models, where the signals are partially muted. However, the high variance of the forecasts based on LARGE-TVP-SV leads to a low $R^2_{OOS}$-statistic. This is not detrimental in terms of utility, since the weight restrictions on the risky asset (no short selling, up to 50% leverage) prevent excessive portfolio weights. The $R^2_{OOS}$-statistic is therefore inappropriate for assessing the economic benefits of a PR; see also, e.g., Kelly et al. (2024).
Table 3: Summary of results for market timing.
The table shows our results for the evaluation sample from 1977:01 to 2020:12. It shows monthly CERs without transaction costs as well as with proportional transaction costs (CER^{TC}) of 20 bps for a power utility investor with relative risk aversion of γ = 3. We also report the (monthly) Sharpe ratio without transaction costs (SR) and with proportional transaction costs (SR^{TC}) of 20 bps, and the maximum drawdown for the transaction-cost-adjusted returns (MaxDD^{TC}). As a measure of the accuracy of the point forecasts, we report the R²_OOS-statistic. Predictive power and risk management ability are proxied by Spearman's rank correlations ρ̂_SP(ω**, y) and ρ̂_SP(ω**,2, y²), respectively. P-values are shown in parentheses.

                 |      Economic Evaluation Criteria        |          Statistical Properties
Candidate PRs    | CER     CER^TC  SR      SR^TC   MaxDD^TC | R²_OOS    ρ̂_SP(ω**,y)       ρ̂_SP(ω**,2,y²)
LARGE-TVP-SV     | 0.0096  0.0090  0.2048  0.1908  0.3459   | −0.1133   0.1139 (0.0088)   −0.0295 (0.4989)
BMA-TVP-CV       | 0.0067  0.0065  0.1411  0.1363  0.4099   | −0.0388   0.0145 (0.7390)   −0.0786 (0.0714)
UNIV-TVP-SV      | 0.0068  0.0065  0.1451  0.1394  0.3497   | −0.0090   0.0176 (0.4150)   −0.0885 (0.0422)
SOP              | 0.0071  0.0070  0.1560  0.1522  0.5832   |  0.0003   0.0614 (0.1592)   −0.1213 (0.0530)
CF               | 0.0069  0.0068  0.1458  0.1433  0.6462   |  0.0010   0.0143 (0.7433)   −0.0115 (0.7928)
PHM              | 0.0064  0.0064  0.1389  0.1382  0.6601   |  0.0000  −0.0256 (0.5578)   −0.0302 (0.4880)
Combined PRs
FLEXPOOL         | 0.0096  0.0091  0.2063  0.1965  0.2848   | −0.0502   0.0888 (0.0413)   −0.0979 (0.0245)
STATPOOL         | 0.0089  0.0084  0.1929  0.1800  0.3042   | −0.0978   0.0929 (0.0329)   −0.0292 (0.5030)
EQUAL WEIGHTS    | 0.0078  0.0077  0.1672  0.1631  0.4496   |  0.0035   0.0577 (0.1859)   −0.0844 (0.0525)
Similar to the results in our first application, we find that predictive power and risk management (positive values of $\hat{\rho}_{SP}(\omega^{**}, y)$ and negative values of $\hat{\rho}_{SP}(\omega^{**,2}, y^2)$) align well with the (ranking of the) CER values and the SRs. While our approach of directly optimizing utility at the level of PRs captures the strong economic performance of LARGE-TVP-SV, combination approaches based on statistical measures such as $R^2_{OOS}$-statistics would not be able to do so. In Appendix B we provide further economic analysis regarding portfolio composition and economic gains in different economic and market conditions, as well as a subsample analysis.
[Figure 5 here. Panels: LARGE-TVP-SV, BMA-TVP-CV, UNIV-TVP-SV, SOP, CF and PHM; x-axis 1977:01 to 2017:01; y-axis 0 to 1.]

Figure 5: Evolution of combination weights.
The subplots show the evolution of the combination weights. The blue (red) lines show the combination weights in FLEXPOOL (STATPOOL).
[Figure 6 here. x-axis 1977:01 to 2017:01; y-axis: weight of the risky asset, 0 to 1.5.]

Figure 6: Evolution of the aggregate position in the S&P 500 index.
BMA-TVP-CV and UNIV-TVP-SV shrink the coefficients towards zero by using subsets of the predictors. As expected, the point forecast accuracy in terms of $R^2_{OOS}$-statistics is higher for these approaches than for LARGE-TVP-SV due to their shrinkage mechanisms. However, the predictive power of BMA-TVP-CV and UNIV-TVP-SV as measured by the rank correlation $\hat{\rho}_{SP}(\omega^{**}, y)$ is significantly lower, as are their CERs and SRs. Similarly, the equally weighted PRs, SOP and CF, achieve decent point prediction accuracy but are clearly inferior to LARGE-TVP-SV in terms of CER values and SRs. PHM received temporarily high weights in the relatively calm mid to late 1990s. This result is consistent with the finding in our first application, where 1/N received high weights during this period. Thus, it appears that simple PRs tend to be favored in calm periods, while flexible PRs are picked in more turbulent periods.
Figure 7 shows the CER values as a function of the number of combined PRs. The blue diamonds show the CER values produced by a particular subset of combined PRs, and the red squares show the average CER values for a given number of combined PRs. As was the case in our first application, the average CER values increase as a function of the number of combined PRs.
The results of this application illustrate the importance of how asset pricing rationales are translated into portfolio choice. PRs based on combinations of univariate predictive regressions, i.e., UNIV-TVP-SV and CF, produce relatively low CERs and play essentially no role in the combination. This result confirms the finding of Welch and Goyal (2008), who find no utility gains (relative to the PHM) for any of the predictors when evaluated individually. Similarly, Goyal et al. (2023) dismiss most of the predictors advanced after Welch and Goyal (2008) in terms of economic utility based on individual evaluation. However, as shown in Table 3, LARGE-TVP-SV contributed strong predictive signals and played a key role in the combination, demonstrating that exploiting multivariate information with time-varying coefficients and stochastic volatility can be highly beneficial in terms of economic utility. SOP and PHM, which are based on a different information set, also played a role in the ensemble.
[Figure 7 here. x-axis: number of combined PRs (1 to 6); y-axis: CER.]

Figure 7: CER values as a function of the number of combined PRs using FLEXPOOL.
The blue diamonds show the CER values of all possible combinations for a given number of combined PRs. The red square represents the average CER value for a given number of combined PRs.
While SOP, with its good risk management, was selected mainly in recessions, PHM was selected mainly in expansions and calm economic episodes.
4.2.4 Alternative settings and additional PRs
In addition to the results presented so far, we explored three alternative empirical settings. First, we added the buy-and-hold strategy (without leverage) as another candidate PR to the library. It yielded a CER value of 0.0073 and a Sharpe ratio of 0.1515. We found that our results were largely unchanged when the buy-and-hold strategy was added to the ensemble. The other two alternative settings used data available only over a shorter time period.
We examined the utility gains from combining PRs based on backward-looking data with PRs based on forward-looking data. As a representative of a PR that uses forward-looking data, we chose the strategy of Pyun (2019), which provides OOS forecasts of the equity premium based on the variance risk premium. This approach exploits the relationship between the market risk premium and the price of variance risk via the variance risk exposure. The point forecasts are available from 1990:02 to 2019:12.^{33} We used a rolling window of 60 months to compute the variance estimate as an additional input to the MV specification (18) and imposed the same weight restrictions (no short selling, up to 50% leverage) as in our previous analysis. We combined this PR with the LARGE-TVP-SV rule as a representative of a PR using backward-looking data and computed results for the evaluation period from 2000:01 to 2019:12. The PR based on the forward-looking data produced a CER value of 0.0078 and a SR of 0.2100. LARGE-TVP-SV generated a CER value of 0.0070 and a SR of 0.1955. With FLEXPOOL, the combination of both PRs slightly improved the results, with a CER value of 0.0079 and a Sharpe ratio of 0.2146. To put this into perspective, PHM achieved a CER value of 0.0014 and a Sharpe ratio of 0.0821 over this truncated sample.
We also examined whether adding a PR based on the approach recently proposed by Dong et al. (2022) could add value relative to LARGE-TVP-SV. Dong et al. (2022) propose a novel approach that uses 100 cross-sectional anomaly portfolio returns as predictors for point forecasts of aggregate excess returns. These forecasts are available from 1975:01 to 2017:12.^{34} For this shortened period, we combined the strategy of Dong et al. (2022) with LARGE-TVP-SV and computed results for the evaluation sample from 1985:01 to 2017:12. Using the MV specification (18), we chose the setting where the elastic net is used as the shrinkage technique for computing expected excess returns and used a rolling window of 60 months for estimating the variance. We imposed the same weight restrictions (no short selling, up to 50% leverage) as in our previous analysis. LARGE-TVP-SV produced a CER value of 0.0082 and the approach of Dong et al. (2022) produced a CER value of 0.0093. The combined PRs produced a CER value of 0.0098 when using FLEXPOOL, confirming the usefulness of complex PRs for increasing economic utility.
33 We downloaded the data from Sungjune Pyun's homepage: https://sjpyun.github.io/research.html.

34 We downloaded the forecasts from Dave Rapach's homepage: https://sites.google.com/slu.edu/daverapach/publications.
5 The relative strengths of FLEXPOOL
As a stacking method that can combine heterogeneous estimators (i.e., PRs in our case), FLEXPOOL aims to diversify their idiosyncratic risks and exploit their individual strengths. Existing combination methods such as the rules of Tu and Zhou (2011) or Kan et al. (2022) are limited in both the number and the type of PRs they can combine. In contrast, FLEXPOOL is able to combine a large number of heterogeneous PRs. Consequently, a notable strength of FLEXPOOL is its ability to combine PRs that may come from different domains.
It is important to note that FLEXPOOL should not be seen as a substitute for combination rules based on a specific structure, such as those derived from asset pricing models. To illustrate, consider the KWZ-MP rule of Kan et al. (2022) (described in Section 4.1.1). This PR maximizes the expected OOS utility, and the weight vector is given by
$$\hat{w}_{q,t} = \hat{w}^{MP}_{g,t} + \frac{g_3(\hat{\psi}^2_t)}{\gamma}\,\hat{w}^{MP}_{z,t};$$
see Formula 51 in Kan et al. (2022). In essence, the KWZ-MP rule combines the GMV portfolio ($\hat{w}^{MP}_{g,t}$) with the long-short zero-investment portfolio ($\hat{w}^{MP}_{z,t}$). The parameter $g_3(\hat{\psi}^2_t)$ is estimated from the data using a variety of inputs, including estimates of the means and the covariance matrix. The factor structure implied by MacKinlay and Pástor (2000) is used to inform this estimation process and, ultimately, to minimize the estimation error for the combination of the two specific building blocks, $\hat{w}^{MP}_{g,t}$ and $\hat{w}^{MP}_{z,t}$. FLEXPOOL, on the other hand, estimates the parameter $g_3(\hat{\psi}^2_t)$ using a simple linear combination approach, rather than relying on additional data or exploiting the factor structure. (A modified version of) FLEXPOOL chooses the weight of the zero-investment portfolio in order to maximize the pseudo-OOS utility.^{35}
35 In this particular configuration, FLEXPOOL had to be tuned by setting the weight of the GMV portfolio to 1 and by limiting $g_3(\hat{\psi}^2_t)$ to the range between 0 and 1. This was done to ensure that the weights of the resulting portfolio add up to 1. The limiting cases are the GMV portfolio (if $g_3(\hat{\psi}^2_t)$ is 0) and the plug-in portfolio (if $g_3(\hat{\psi}^2_t)$ is 1). Although this setup differs slightly from the typical FLEXPOOL configuration used in this paper, which considers a convex combination of candidate PRs, the outlined optimization approach for the weight of the zero-investment portfolio is very similar to the way FLEXPOOL approaches the combination problem.
Empirically, the utility obtained from the portfolio with the estimated weight for the long-short zero-investment portfolio in FLEXPOOL is lower than that of the KWZ-MP rule: CER = 0.0046 and CER^{TC} = 0.0041 for KWZ-MP vs. CER = 0.0044 and CER^{TC} = 0.0039 for FLEXPOOL.^{36} It should come as no surprise that KWZ-MP outperforms FLEXPOOL in this particular situation, since it is optimized for the two PRs at hand. However, within the KWZ-MP method, there is no way to combine the KWZ-MP rule with additional PRs such as VOLTIME or GALTON shrinkage. Here, the generality of our approach comes in handy. As shown in our application to a cross-section of 50 stocks in Section 4.1, when we combine KWZ-MP with 1/N, VOLTIME and GALTON, the resulting performance of FLEXPOOL exceeds that of KWZ-MP by almost 20 basis points in CER per month after transaction costs: CER^{TC} = 0.0060 for FLEXPOOL vs. CER^{TC} = 0.0041 for KWZ-MP (see Table 1), with a much lower maximum drawdown. Therefore, FLEXPOOL should be considered as a complementary method to existing combination rules, not as a competing approach that aims to replace them.
The relative strength of FLEXPOOL lies in combining specialized and heterogeneous PRs ("strong learners"), rather than in replacing existing combination methods; this reflects the core functionality of stacking algorithms. Stacking has been shown to have good properties both asymptotically and in finite samples (see, e.g., Wolpert, 1992; Breiman, 1996; Van der Laan et al., 2007; Polley and Van Der Laan, 2010; Wang et al., 2023). It has been used primarily in situations where heterogeneous candidate estimators such as random forests, neural networks and polynomial linear models are aggregated into a final estimator. Simple estimators such as regression trees ("weak learners") are typically aggregated (e.g., via random forests or boosting) before entering the stacking algorithm. While the candidate estimators in the ensemble may be complex in order to capture hidden structures in the data, the aggregation function in stacking algorithms is typically linear, and the weights attached to the candidate estimators are bounded between 0 and 1 and add up to one (i.e., a convex combination).^{37}

36 As in Section 4.1, the largest 50 stocks were used as the asset universe, and the same time frame was used for estimation and evaluation. The evaluation sample covers the period from 1977:01 to 2020:12.
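To make the combination step concrete, the following is a minimal sketch of such a convex stacking step: weights on the simplex are chosen to maximize a discounted average of the realized power utility of the candidate PRs' pseudo-OOS returns. The paper's exact objective and tuning are given by (6)-(8), so this is an illustrative approximation rather than the authors' implementation.

```python
# Sketch of a convex stacking step: combination weights on the simplex
# maximize discounted average power utility of the candidate PRs'
# pseudo-OOS returns. pr_returns is (T, J) with rows ordered oldest
# first; alpha is a forgetting factor that down-weights older months.
import numpy as np
from scipy.optimize import minimize

def stacking_weights(pr_returns, gamma=3.0, alpha=0.96):
    T, J = pr_returns.shape
    discounts = alpha ** np.arange(T - 1, -1, -1)   # newest month gets weight 1

    def neg_utility(w):
        gross = 1.0 + pr_returns @ w                 # combined gross returns
        u = gross ** (1.0 - gamma) / (1.0 - gamma)   # power utility
        return -np.sum(discounts * u) / np.sum(discounts)

    constraints = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * J                        # convex combination
    w0 = np.full(J, 1.0 / J)
    return minimize(neg_utility, w0, bounds=bounds, constraints=constraints).x
```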
All in all, FLEXPOOL is not intended to "re-invent the wheel" in the sense of replacing existing PRs that specifically combine different building blocks in a targeted manner, as the KWZ-MP rule does. Such rules can be added to the pool of candidate PRs and combined in FLEXPOOL. Due to the nature of stacking algorithms, combining a set of relatively simple PRs solely within the FLEXPOOL framework may not be an optimal strategy if a targeted combination approach has already been developed for these simple PRs.
As a second illustrative example of this aspect, suppose our pool of candidate PRs consists of only three simple rules, namely 1/N, (the simple plug-in version of the) Markowitz portfolio and GMV. Although the GALTON shrinkage method of Barroso and Saxena (2022) is not a combination method in the strict sense, its solution to the asset allocation problem is spanned by 1/N, (the simple plug-in version of the) Markowitz portfolio and GMV as limiting cases; see our description in Section 4.1.2. The GALTON shrinkage method uses the OOS forecast errors to determine the optimal degree of shrinkage for the means and (co-)variances using shrinkage targets and involves estimating several parameters from the data within the shrinkage framework. Thus, GALTON uses sophisticated mechanisms to process the data with shrinkage targets and exploits specific information to shrink means, variances and covariances, rather than using a convex combination as in FLEXPOOL. Table 4 compares the empirical performance of GALTON and FLEXPOOL.^{38} GALTON outperformed FLEXPOOL on both the pre- and post-transaction-cost performance measures. However, once again, within the framework of Barroso and Saxena (2022), there is no way to combine the GALTON rule with additional and potentially very different PRs.
37 While nonlinear and generally more complicated aggregation functions are theoretically possible, they have not been widely adopted, and the convex combination has been found to provide stability (see, e.g., Breiman, 1996; Van der Laan et al., 2007; Polley and Van Der Laan, 2010). In the case of combining PRs, there is an additional motivation to use the convex combination: it ensures that any restrictions on asset weights imposed at the level of the candidate PRs (e.g., no short selling, restrictions on sector weights, etc.) also hold at the level of the combined PRs.

38 As in Section 4.1, the largest 50 stocks were used as the asset universe, and the same time frame was used for estimation and evaluation. The evaluation sample covered the period from 1977:01 to 2020:12.
When we use FLEXPOOL to combine GALTON with 1/N, VOLTIME, KWZ-MP and KWZ-LW, the utility gains are substantially higher than when relying on GALTON alone, with FLEXPOOL producing an additional 14 basis points per month in CER after TC (see Table 1).^{39}

39 One might wonder how the results are affected if we exclude 1/N from the pool of PRs in Table 1, since it can be argued that 1/N is also used as a building block for GALTON. Omitting 1/N from the pool of candidates in Table 1 changes the results very little, with CER = 0.0068 and CER^{TC} = 0.0059.
Table 4: Comparison of FLEXPOOL and GALTON in a pool of simple rules.
The table shows our results for the evaluation sample from 1977:01 to 2020:12. It includes monthly CER values without transaction costs and with proportional transaction costs (CER^{TC}) of 20 bps for a power utility investor with relative risk aversion of γ = 3. As another performance measure, the table shows the monthly Sharpe ratio before transaction costs (SR) and after proportional transaction costs of 20 bps (SR^{TC}). Avg. TO is the average turnover.

              CER      CER^TC    SR       SR^TC    Avg. TO
1/N          0.0035   0.0033   0.1479   0.1442    0.0782
Markowitz   −0.1453  −0.3653   0.0364  −0.1425   99.0332
GMV          0.0032   0.0008   0.1400   0.0849    1.1543
GALTON       0.0052   0.0046   0.2029   0.1853    0.3100
FLEXPOOL     0.0038   0.0035   0.1512   0.1433    0.1082
Nevertheless, FLEXPOOL outperformed the individual PRs, which reiterates one key point: if a specialized combination method exists for simple and homogeneous rules, then our approach should treat it as a candidate PR, without including the simple rules in the pool. In the absence of such combination methods, FLEXPOOL, as a general-purpose approach, is well suited to generate economic gains relative to relying on any single PR, however selected.
6 Concluding remarks
We have introduced an ensemble framework for combining heterogeneous PRs. The proposed combination strategy allows researchers to exploit the myriad of existing PRs in a utility maximization framework, while diversifying away the idiosyncratic risks of candidate PRs and retaining many attractive properties. Our approach is able to combine the virtues of PRs regardless of their design and without making distributional assumptions about the data generating process of the PRs' returns.
Extensive applications to cross-sections of stocks and market timing have documented
the expediency of our approach. The combined PRs achieved out-of-sample certainty
equivalent returns that were either higher than those of any of the candidate PRs or
roughly equal to those of the ex-post best candidate PR. From an ensemble perspective,
the candidate PR with the highest individual utility did not necessarily receive the
highest weight in the combination. Rapidly changing combination weights played an
important role in improving OOS utility by capturing the time-varying performance of the
PRs. Detailed analyses showed how the flexible combination strikes a balance between
predicting the level of asset returns and anticipating their variance. Furthermore, the
analyses showed that, on average, utility gains increased with the number of candidate
PRs—even without additional regularization of the combination weights to reduce
estimation risk at the combination stage.
By using our ensemble approach, researchers and investors can address their uncertainty
about which asset pricing rationale or empirical regularity to consider for informing their
portfolio decisions, and about how to translate that information into portfolio choice. The
overarching contribution of our study is its potential to change the way we approach
portfolio choice problems: instead of striving to find a single best PR, our framework
allows an extensive library of candidate PRs to contribute their strengths in an ensemble,
similar to optimizing a combination of assets. While the search for new candidate PRs
will continue, relying on improved techniques and novel data sources, our framework will
also provide a tool for assessing the incremental empirical merits (or lack thereof) of
newly proposed PRs.
References
Adämmer, P. and Schüssler, R. A. (2020). Forecasting the equity premium: mind the
news! Review of Finance, 24(6):1313–1355.
Avramov, D., Cheng, S., and Metzker, L. (2023). Machine learning vs. economic restrictions:
Evidence from stock return predictability. Management Science, 69(5):2587–2619.
Baker, M., Bradley, B., and Wurgler, J. (2011). Benchmarks as limits to arbitrage:
Understanding the low-volatility anomaly. Financial Analysts Journal, 67(1):40–54.
Barroso, P. and Saxena, K. (2022). Lest we forget: Learn from out-of-sample forecast
errors when optimizing portfolios. The Review of Financial Studies, 35(3):1222–1278.
Beckmann, J., Koop, G., Korobilis, D., and Schüssler, R. A. (2020). Exchange rate
predictability and dynamic bayesian learning. Journal of Applied Econometrics,
35(4):410–421.
Bernaciak, D. and Griffin, J. E. (2024). A loss discounting framework for model averaging
and selection in time series models. International Journal of Forecasting, Forthcoming.
Bianchi, D. and Guidolin, M. (2014). Can long-run dynamic optimal strategies outperform
fixed-mix portfolios? evidence from multiple data sets. European Journal of Operational
Research, 236(1):160–176.
Blitz, D. and Van Vliet, P. (2007). The volatility effect: Lower risk without lower return.
Journal of Portfolio Management, pages 102–113.
Bonaccolto, G. and Paterlini, S. (2020). Developing new portfolio strategies by aggregation.
Annals of Operations Research, 292(2):933–971.
Brandt, M. W., Santa-Clara, P., and Valkanov, R. (2009). Parametric portfolio policies:
Exploiting characteristics in the cross-section of equity returns. The Review of Financial
Studies, 22(9):3411–3447.
Breiman, L. (1996). Stacked regressions. Machine Learning, 24(1):49–64.
Breiman, L. (2001). Random forests. Machine learning, 45:5–32.
Campbell, J. Y. and Cochrane, J. H. (1999). By force of habit: A consumption-based
explanation of aggregate stock market behavior. Journal of Political Economy,
107(2):205–251.
Campbell, J. Y. and Thompson, S. B. (2008). Predicting excess stock returns out of
sample: Can anything beat the historical average? The Review of Financial Studies,
21(4):1509–1531.
Cederburg, S., Johnson, T. L., and O’Doherty, M. S. (2023). On the economic significance
of stock return predictability. Review of Finance, 27(2):619–657.
Cenesizoglu, T. and Timmermann, A. (2012). Do return prediction models add economic
value? Journal of Banking & Finance, 36(11):2974–2987.
Chen, L., Pelger, M., and Zhu, J. (2024). Deep learning in asset pricing. Management
Science, 70(2):714–750.
Chordia, T., Subrahmanyam, A., and Tong, Q. (2014). Have capital market anomalies
attenuated in the recent era of high liquidity and trading activity? Journal of
Accounting and Economics, 58(1):41–58.
Cong, L. W., Tang, K., and Wang, J. (2024). Goal-oriented portfolio management through
transformer-based reinforcement learning. Social Science Research Network, (3554486).
Dangl, T. and Halling, M. (2012). Predictive regressions with time-varying coefficients.
Journal of Financial Economics, 106(1):157–181.
Daniel, K. and Moskowitz, T. J. (2016). Momentum crashes. Journal of Financial
Economics, 122(2):221–247.
DeMiguel, V., Garlappi, L., and Uppal, R. (2009). Optimal versus naive diversification: How inefficient is the 1/N portfolio strategy? The Review of Financial Studies, 22(5):1915–1953.
DeMiguel, V., Martin-Utrera, A., Nogales, F. J., and Uppal, R. (2020). A transaction-cost
perspective on the multitude of firm characteristics. The Review of Financial Studies,
33(5):2180–2222.
Diebold, F. X. and Mariano, R. S. (1995). Comparing predictive accuracy. Journal of
Business & Economic Statistics, 20(1):134–144.
Diebold, F. X. and Shin, M. (2019). Machine learning for regularized survey forecast
combination: Partially-egalitarian lasso and its derivatives. International Journal of
Forecasting, 35(4):1679–1691.
Dong, X., Li, Y., Rapach, D. E., and Zhou, G. (2022). Anomalies and the expected
market return. The Journal of Finance, 77(1):639–681.
Duchin, R. and Levy, H. (2009). Markowitz versus the talmudic portfolio diversification
strategies. Journal of Portfolio Management, 35:71–74.
Farmer, L. E., Schmidt, L., and Timmermann, A. (2023). Pockets of predictability. The
Journal of Finance, 78(3):1279–1341.
Ferreira, M. A. and Santa-Clara, P. (2011). Forecasting stock market returns: The sum of
the parts is more than the whole. Journal of Financial Economics, 100(3):514–537.
Frahm, G. (2015). A theoretical foundation of portfolio resampling. Theory and Decision,
79(1):107–132.
Frazzini, A., Israel, R., and Moskowitz, T. J. (2018). Trading costs. Available at SSRN
3229719.
Gârleanu, N. and Pedersen, L. H. (2013). Dynamic trading with predictable returns and
transaction costs. The Journal of Finance, 68(6):2309–2340.
Giraitis, L., Kapetanios, G., and Price, S. (2013). Adaptive forecasting in the presence of
recent and ongoing structural change. Journal of Econometrics, 177(2):153–170.
Goyal, A., Welch, I., and Zafirov, A. (2023). A comprehensive 2022 look at the empirical
performance of equity premium prediction. Review of Financial Studies, Forthcoming.
Gu, S., Kelly, B., and Xiu, D. (2020). Empirical asset pricing via machine learning. The
Review of Financial Studies, 33(5):2223–2273.
Haugen, R. A. and Heins, A. J. (1975). Risk and the rate of return on financial assets:
Some old wine in new bottles. Journal of Financial and Quantitative Analysis,
10(5):775–784.
Huang, D., Jiang, F., Tu, J., and Zhou, G. (2015). Investor sentiment aligned: A powerful
predictor of stock returns. The Review of Financial Studies, 28(3):791–837.
Hyndman, R. J. and Athanasopoulos, G. (2018). Forecasting: principles and practice.
OTexts.
Jegadeesh, N. and Titman, S. (1993). Returns to buying winners and selling losers:
Implications for stock market efficiency. The Journal of Finance, 48(1):65–91.
Johannes, M., Korteweg, A., and Polson, N. (2014). Sequential learning, predictability,
and optimal portfolio returns. The Journal of Finance, 69(2):611–644.
J.P.Morgan/Reuters (1996). Riskmetrics—technical document. Technical report.
Kan, R., Wang, X., and Zhou, G. (2022). Optimal portfolio choice with estimation risk:
No risk-free asset case. Management Science, 68(3):2047–2068.
Kan, R. and Zhou, G. (2007). Optimal portfolio choice with parameter uncertainty.
Journal of Financial and Quantitative Analysis, 42(3):621–656.
Kazak, E. and Pohlmeier, W. (2023). Bagged pretested portfolio selection. Journal of
Business & Economic Statistics, 41(4):1116–1131.
Kelly, B., Malamud, S., and Zhou, K. (2024). The virtue of complexity in return prediction.
The Journal of Finance, 79(1):459–503.
Kirby, C. and Ostdiek, B. (2012). It’s all in the timing: simple active portfolio strategies
that outperform naive diversification. Journal of Financial and Quantitative Analysis,
47(2):437–467.
Lassance, N., Vanderveken, R., and Vrins, F. (2023). On the combination of naive and
mean-variance portfolio strategies. Journal of Business & Economic Statistics, pages
1–15.
LeBlanc, M. and Tibshirani, R. (1996). Combining estimates in regression and classification.
Journal of the American Statistical Association, 91(436):1641–1650.
Ledoit, O. and Wolf, M. (2004). A well-conditioned estimator for large-dimensional
covariance matrices. Journal of Multivariate Analysis, 88(2):365–411.
Ledoit, O. and Wolf, M. (2008). Robust performance hypothesis testing with the sharpe
ratio. Journal of Empirical Finance, 15(5):850–859.
Leitch, G. and Tanner, J. E. (1991). Economic forecast evaluation: profits versus the
conventional error measures. The American Economic Review, pages 580–590.
Liu, Y. and Zhou, G. (2024). Optimal portfolio choice with economic constraints: A
genetic programming approach. Available at SSRN.
Maasoumi, E., Tong, G., Wen, X., and Wu, K. (2022). Portfolio choice with subset
combination of characteristics. Available at SSRN.
MacKinlay, A. and Pástor, L. (2000). Asset pricing models: Implications for expected
returns and portfolio selection. The Review of Financial Studies, 13(4):883–916.
Maillard, S., Roncalli, T., and Teiletche, J. (2010). On the properties of equally weighted
risk contribution portfolios. Journal of Portfolio Management, 36:60–70.
Markowitz, H. (1952). Portfolio selection. Journal of Finance, 7:77–91.
Moritz, B. and Zimmermann, T. (2016). Tree-based conditional portfolio sorts: The
relation between past and future stock returns. Available at SSRN 2740751.
Nevasalmi, L. and Nyberg, H. (2021). Moving forward from predictive regressions:
Boosting asset allocation decisions. Available at SSRN 3623956.
Novy-Marx, R. (2014). Understanding defensive equity. Technical report, National Bureau
of Economic Research.
Paye, B. S. (2012). The economic value of estimated portfolio rules under general utility
specifications. Available at SSRN 1645419.
Pettenuzzo, D. and Ravazzolo, F. (2016). Optimal portfolio choice under decision-based
model combinations. Journal of Applied Econometrics, 31(7):1312–1332.
Polk, C., Thompson, S., and Vuolteenaho, T. (2006). Cross-sectional forecasts of the
equity premium. Journal of Financial Economics, 81(1):101–141.
Polley, E. C. and Van Der Laan, M. J. (2010). Super learner in prediction.
Pyun, S. (2019). Variance risk in aggregate stock returns and time-varying return
predictability. Journal of Financial Economics, 132(1):150–174.
Raftery, A. E., Madigan, D., and Hoeting, J. A. (1997). Bayesian model averaging for linear
regression models. Journal of the American Statistical Association, 92(437):179–191.
Rapach, D. E., Strauss, J. K., and Zhou, G. (2010). Out-of-sample equity premium
prediction: Combination forecasts and links to the real economy. The Review of
Financial Studies, 23(2):821–862.
Roncalli, T. (2014). Introduction to risk parity and budgeting. Financial Mathematics
Series.
Tu, J. and Zhou, G. (2011). Markowitz meets talmud: A combination of sophisticated
and naive diversification strategies. Journal of Financial Economics, 99(1):204–215.
Van der Laan, M. J., Polley, E. C., and Hubbard, A. E. (2007). Super learner. Statistical
Applications in Genetics and Molecular Biology, 6(1).
van Hemert, O., Ganz, M., Harvey, C. R., Rattray, S., Martin, E. S., and Yawitch, D.
(2020). Drawdowns. The Journal of Portfolio Management, 46(8):34–50.
Wang, X., Hyndman, R. J., Li, F., and Kang, Y. (2023). Forecast combinations: An over
50-year review. International Journal of Forecasting, 39(4):1518–1547.
Welch, I. and Goyal, A. (2008). A comprehensive look at the empirical performance of
equity premium prediction. The Review of Financial Studies, 21(4):1455–1508.
Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2):241–259.
Yang, Y. (2001). Adaptive regression by mixing. Journal of the American Statistical
Association, 96(454):574–588.
Yuan, M. and Zhou, G. (2022). Why naive diversification is not so naive, and how to beat
it? Journal of Financial and Quantitative Analysis, pages 1–32.
A Alternative asset universe and portfolio rules
In this section we explore our ensemble approach for an alternative asset universe and
alternative portfolio rules. We use the empirical setting from Section 4.1, but use the
largest 500 US stocks instead of the largest 50 stocks, and consider an alternative set of
candidate PRs. The PRs in this application use long-only positions. Several PRs we had used in Section 4.1 (i.e., KWZ-MP, KWZ-LW, GALTON) are not applicable, due to the higher number of model parameters to be estimated for 500 stocks and/or because they use short positions. In this setting we use five candidate PRs, where 1/N and VOLTIME are the same as in Section 4.1, plus three additional PRs:
• VOL-LONG:
Each month, the 500 stocks are sorted according to their volatility over the last twelve months, and an equally weighted long position is taken in the lowest decile, i.e., the 50 stocks with the lowest variance over the last year. This PR is an implementation designed to capture the low-risk anomaly that has been documented in the literature; see, e.g., Haugen and Heins (1975), Blitz and Van Vliet (2007), and Baker et al. (2011).
• MOM-LONG:
The empirical finding that past winners in the stock market outperform past losers dates back to Jegadeesh and Titman (1993) and is known as the momentum anomaly. As is common in the literature, we sort the stocks by their realized returns over months $t-12$ to $t-2$ and take an equally weighted long position in the top decile (i.e., the 50 stocks with the highest realized returns over months $t-12$ to $t-2$). Contrary to the usual implementation of momentum-based sorting strategies, we do not take a short position in the lowest decile due to the long-only restriction in this setup.
• RANDOM FOREST:
Moritz and Zimmermann (2016) and Gu et al. (2020) have shown the merits of random forests (Breiman, 2001), a collection of regression trees, for capturing the nonlinear relationship between predictors and future stock returns. We use 100 trees, where each tree uses a maximum of 15 splits, and the number of predictors to sample at each node is equal to one third of the total number of predictors. Each tree predicts next month's stock returns using the last 24 monthly returns relative to their peers (sorted into deciles). In each month, we use the last 12 months for each of the 500 stocks as a training sample.
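A minimal sketch of this configuration using scikit-learn is shown below; the library choice is an assumption (the paper does not name its implementation), and max_leaf_nodes = 16 is used to cap each tree at 15 splits.

```python
# Sketch of the RANDOM FOREST rule's settings: 100 trees, at most 15
# splits per tree, one third of the predictors sampled at each split,
# and the last 24 monthly decile-ranked returns as predictors.
from sklearn.ensemble import RandomForestRegressor

forest = RandomForestRegressor(
    n_estimators=100,         # 100 trees
    max_leaf_nodes=16,        # a binary tree with 16 leaves has 15 splits
    max_features=1.0 / 3.0,   # one third of the predictors per split
    random_state=0,
)
# forest.fit(X_train, y_train)      # 12-month rolling panel of 500 stocks
# y_pred = forest.predict(X_next)   # next month's relative-return forecast
```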
The choice of this setting was motivated by the following considerations. First,
we want to see how a machine learning-based PR contributes to the ensemble. We
chose random forests with past returns as predictors to keep the required training time
comparatively low and to obtain a long OOS evaluation period (starting from 1977:01,
as for the other methods and applications). Neural networks based on firm characteristics,
as used in Gu et al. (2020) and Chen et al. (2024), would require much more training
time than random forests based on past return data. Second, inspired by Avramov
et al. (2023), we impose economic constraints to evaluate whether adding a machine
learning-based method can generate utility gains in such an environment. Avramov
et al. (2023) note that in recent years, anomalous return patterns detected by machine
learning-based strategies have been largely confined to difficult-to-arbitrage stocks and
to short positions. Therefore, we restrict ourselves to a setting with liquid large-cap
stocks and long-only positions.
The results are summarized in Table A1. FLEXPOOL generates the highest CERs
before transaction costs. However, due to its high turnover, FLEXPOOL is slightly
outperformed by several other PRs after transaction costs. In this setting, our transaction
cost mitigation strategy does not work as well as for the two applications in the main
paper (see Table A13 in Appendix C) due to the lower autocorrelation of the combination
weights and the high turnover of one candidate PR, i.e., RANDOM FOREST.
One solution could be to reduce the turnover of machine learning-based PRs before
they enter the ensemble. In this direction, adding machine learning strategies that keep
turnover relatively low as candidate PRs (see, e.g., DeMiguel et al. (2020), Liu and
Zhou (2024), and Cong et al. (2024)) seems a fruitful avenue.
Table A1: Summary of results for a cross-section of the 500 largest stocks.
The table shows our results for the evaluation sample from 1977:01 to 2020:12. It includes monthly CER
values without transaction costs and with proportional transaction costs (CER^TC) of 20 bps for a power
utility investor with relative risk aversion of γ = 3. As additional performance measures, the table
shows the monthly Sharpe ratio before transaction costs (SR) and after proportional transaction costs of
20 bps (SR^TC), and the maximum drawdown after transaction costs of 20 bps (MaxDD^TC). Avg. TO is
the average turnover over the evaluation sample.
Candidate PRs CER CER^TC SR SR^TC MaxDD^TC Avg. TO
1/N 0.0044 0.0042 0.1675 0.1639 0.5372 0.0789
VOLTIME 0.0046 0.0044 0.1870 0.1811 0.4230 0.0994
VOL-LONG 0.0055 0.0047 0.2115 0.1862 0.4367 0.4457
MOM-LONG 0.0044 0.0032 0.1684 0.1468 0.5828 0.6245
RANDOM FOREST 0.0026 −0.0004 0.1398 0.0892 0.7551 1.5350
Combined PRs
FLEXPOOL 0.0066 0.0041 0.2085 0.1593 0.4813 1.2468
STATPOOL 0.0050 0.0036 0.1810 0.1505 0.4753 0.7139
EQUAL WEIGHTS 0.0053 0.0042 0.1928 0.1675 0.4964 0.5202
B FLEXPOOL’s performance in different economic
and market conditions
Portfolio composition in different economic and market conditions.
First, we run a time-series regression of the portfolio composition in FLEXPOOL on a
number of indicators of the economic and market environment:
$$\text{Share\_Candidate\_PR}_t = \beta_0 + \beta_1\,\text{NegRet}_{t-1} + \beta_2\,\text{Rec}_{t-1} + \beta_3\,\text{HighVol}_{t-1} + \beta_4\,\text{HighSent}_{t-1} + e_t, \qquad \text{(A.1)}$$
where $\text{Share\_Candidate\_PR}_t$ denotes the relative share of a candidate PR. $\text{NegRet}_{t-1}$ is
a dummy variable that takes the value 1 if the S&P 500 return in a given
month is negative, and 0 otherwise. $\text{Rec}_{t-1}$ is a dummy variable that takes the value 1 if
a given month falls in a recession regime as classified by the National Bureau of Economic
Research (NBER), and 0 otherwise. $\text{HighVol}_{t-1}$ is a dummy variable that takes the value 1
if the realized variance (computed using daily S&P 500 returns) is above the median
realized variance over the entire evaluation sample (1977:01-2020:12), and 0
otherwise. $\text{HighSent}_{t-1}$ is a dummy variable that takes the value 1 if the investor sentiment
index of Huang et al. (2015) is above its median over the entire
evaluation sample (1977:01-2020:12), and 0 otherwise.
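For concreteness, the following Python sketch shows how a regression like Equation (A.1) with HAC-robust standard errors (as used for Tables A2, A5, and A8 below) can be estimated. The simulated inputs, variable names, and the 12-month lag truncation are our illustrative assumptions, not the paper's implementation.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
T = 528  # 1977:01 to 2020:12
df = pd.DataFrame({
    "share": rng.uniform(0, 1, T),  # placeholder for a candidate PR's share
    "NegRet": rng.integers(0, 2, T),
    "Rec": rng.integers(0, 2, T),
    "HighVol": rng.integers(0, 2, T),
    "HighSent": rng.integers(0, 2, T),
})
# Equation (A.1) uses the state dummies lagged by one month
for col in ["NegRet", "Rec", "HighVol", "HighSent"]:
    df[col + "_lag"] = df[col].shift(1)

res = smf.ols("share ~ NegRet_lag + Rec_lag + HighVol_lag + HighSent_lag",
              data=df).fit(cov_type="HAC", cov_kwds={"maxlags": 12})
print(res.summary())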
CER differences in different economic and market conditions.
Second, we run a time-series regression of the difference between the monthly CER
achieved by FLEXPOOL and that achieved by one of the candidate PRs on a set of
indicators that proxy for economic and market conditions:
$$\Delta \text{CER}_t = \beta_0 + \beta_1\,\text{NegRet}_t + \beta_2\,\text{Rec}_t + \beta_3\,\text{HighVol}_t + \beta_4\,\text{HighSent}_t + e_t, \qquad \text{(A.2)}$$
where $\Delta \text{CER}_t$ denotes the difference between the CER achieved by FLEXPOOL and the
CER achieved by one of the candidate PRs (in bps). CERs are computed for a power
utility investor with risk aversion coefficient $\gamma = 3$.
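The regression in Equation (A.2) differs only in that the dependent variable is a monthly CER difference in basis points and the state dummies enter contemporaneously; a minimal sketch, again with illustrative simulated inputs:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
T = 528
df = pd.DataFrame({
    "dCER_bps": rng.normal(0, 50, T),  # placeholder for CER(FLEXPOOL) - CER(PR)
    "NegRet": rng.integers(0, 2, T),
    "Rec": rng.integers(0, 2, T),
    "HighVol": rng.integers(0, 2, T),
    "HighSent": rng.integers(0, 2, T),
})
res = smf.ols("dCER_bps ~ NegRet + Rec + HighVol + HighSent",
              data=df).fit(cov_type="HAC", cov_kwds={"maxlags": 12})
print(res.params)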
Subsample analysis.
Third, we show subsample results (CER and CER^TC) for the pre-2001 and post-2001
samples, where the choice of the split point (January 2001) follows Avramov et al. (2023)
and is rationalized by the decimalization in January 2001, which significantly reduced
trading costs.40
40 Chordia et al. (2014) argue that decimalization has led to increased liquidity and lower trading
costs, which in turn has increased price efficiency and lowered the profitability of anomaly-based
trading strategies.
B.1 Application to a cross-section of the largest 50 stocks
Table A2 reports the estimated coefficients for the portfolio composition in different
economic and market conditions according to Equation (A.1), based on HAC-robust
standard errors. Figure 3, which shows the evolution of the weights assigned to the candidate PRs over time,
had already indicated that 1/N is an attractive choice in calm periods. The portfolio
composition analysis supports this finding, showing that 1/N was largely avoided when
current S&P 500 returns were negative and/or a recession was underway. On average,
FLEXPOOL assigned 6.7 percentage points less weight to 1/N when market returns were
negative than when market returns were positive, and 21.0 percentage points less weight
to 1/N in recessions than in expansions, ceteris paribus. KWZ-MP was mainly selected
when current returns were negative and volatility was low, while KWZ-LW was mainly
selected in the opposite scenario. Table A3 reports the estimated coefficients for the CER
differences between FLEXPOOL and those achieved by one of the candidate PRs in
different economic states according to Equation (A.2). A clear message from Table A3 is
that 1/N significantly underperforms FLEXPOOL when market returns are negative, on
average by more than 225 bps per month.
Table A4 reports the results for CER and CER^TC for the pre-2001 sample from
1977:01 to 2000:12 and the post-2001 sample from 2001:01 to 2020:12. The results
suggest that the performance of the PRs across different economic and market conditions
is dominated by an overarching trend: for most PRs, economic gains are significantly smaller
in the more recent subsample. In the pre-2001 period, KWZ-MP and GALTON were
the most successful candidate PRs individually. However, FLEXPOOL managed to
achieve higher CER (CER^TC) values than both candidates, and it also performed much
better than STATPOOL and EQUAL WEIGHTS. In the post-2001 period, the relative
performance of the candidate PRs changed markedly compared to the pre-2001 sample.
For example, KWZ-MP and GALTON, the best candidate PRs in the pre-2001 sample,
were significantly outperformed by VOLTIME and especially by KWZ-LW in the post-2001
sample. FLEXPOOL adjusted the weights of KWZ-MP and GALTON accordingly
in the post-2001 sample: they dropped from over 45% to 1.21% for KWZ-MP and from
over 19% to less than 7% for GALTON. Conversely, FLEXPOOL significantly
increased the weights for VOLTIME and KWZ-LW. This pattern confirms our previous
analysis that FLEXPOOL is successful in shifting weights to (combinations of) PRs that
perform well locally in time; see our analysis for predictive power and risk management
in Section 4.1.4. VOLTIME's performance is stable in both subsamples, confirming its previously
documented robustness across different market states and time periods; see, e.g., Blitz and
Van Vliet (2007) and Novy-Marx (2014). In the post-2001 sample, FLEXPOOL performed
roughly on par with the best candidate PR, i.e., KWZ-LW, while STATPOOL and
EQUAL WEIGHTS significantly underperformed KWZ-LW.
Table A2: Portfolio composition in different economic and market conditions.
The table reports the estimates of the slope coefficients for the time series regression in Equation
(A.1) for the evaluation sample from 1977:01 to 2020:12. *, **, and *** indicate statistical significance
at the 10%, 5%, and 1% levels, respectively.
Candidate PR NegRet Rec HighVol HighSent
1/N −0.067∗∗ −0.210∗∗ 0.006 0.072
VOLTIME 0.019 −0.020 0.061 −0.135
KWZ-MP 0.103∗∗ 0.028 −0.166∗∗ 0.044
KWZ-LW −0.058∗∗ 0.082 0.098∗∗ −0.072
GALTON 0.003 0.119 0.002 0.091
Table A3: CER differences (in bps) in different economic and market conditions.
The table reports the estimates of the slope coefficients for the time series regression in Equation
(A.2) for the evaluation sample from 1977:01 to 2020:12. CERs are computed for a power utility
investor with risk aversion coefficient γ = 3. *, **, and *** indicate statistical significance at the 10%,
5%, and 1% levels, respectively.
NegRet Rec HighVol HighSent
1/N 225.351∗∗∗ 46.401 7.027 −33.825
VOLTIME 8.964 4.141 29.146 14.69
KWZ-MP −63.523∗∗ 27.01 −9.639 −15.657
KWZ-LW 6.673 8.265 −5.020 6.282
GALTON −37.537∗∗ −1.331 −4.447 −4.920
Table A4: Subsample analysis.
The table reports the subsample results (CER and CER^TC) for the pre-2001 sample from 1977:01 to
2000:12 and the post-2001 sample from 2001:01 to 2020:12. Avg. share is the average share of a PR
in a subsample.
Pre-2001 Post-2001
CER CER^TC Avg. Share CER CER^TC Avg. Share
Candidate PR
1/N 0.0040 0.0038 0.2540 0.0029 0.0028 0.1234
VOLTIME 0.0053 0.0051 0.0770 0.0048 0.0047 0.4595
KWZ-MP 0.0066 0.0059 0.4515 0.0023 0.0020 0.0121
KWZ-LW 0.0032 0.0018 0.0230 0.0061 0.0053 0.3373
GALTON 0.0065 0.0058 0.1945 0.0037 0.0032 0.0677
Combined PR
FLEXPOOL 0.0075 0.0066 x 0.0060 0.0052 x
STATPOOL 0.0055 0.0048 x 0.0035 0.0031 x
EQUAL WEIGHTS 0.0056 0.0052 x 0.0043 0.0040 x
B.2 Application to a cross-section of the largest 500 stocks
Table A5 reports the estimated coefficients for the portfolio composition in different
economic and market conditions according to Equation (A.1), based on HAC-robust
standard errors. The patterns for 1/N are consistent with those found in the application
to the 50 largest stocks. FLEXPOOL assigns higher weights to VOLTIME during periods
of negative market returns and high volatility, consistent with VOLTIME's good risk
management documented in Section 4.1.4. Conversely, MOM-LONG is primarily selected
during periods of positive market returns and expansions, a pattern that also holds for
RANDOM FOREST.
Analyzing the return patterns of RANDOM FOREST, we find that its returns are highly
correlated with MOM-LONG both in terms of level and volatility, compared to the
returns of the other PRs in the pool.41
41 The empirical correlation coefficient between the returns of MOM-LONG and RANDOM
FOREST is 0.85.
Table A6 reports the estimated coefficients for the CER differences between FLEXPOOL
and those achieved by each of the candidate PRs in different economic states according
to Equation (A.2). VOLTIME and VOL-LONG performed relatively well in periods
of negative market returns. By contrast, FLEXPOOL significantly outperformed
1/N, MOM-LONG and RANDOM FOREST when market returns were negative. The
performance of MOM-LONG and RANDOM FOREST (given the similarity of their
return patterns) across market conditions is not too surprising: momentum strategies
are documented to perform better in bull markets than in bear markets, as frequent
trend reversals make momentum riskier and momentum crashes more likely in market
downturns; see, e.g., Daniel and Moskowitz (2016).
Table A7 reports CER and CER^TC for the pre-2001 sample from 1977:01 to 2000:12
and the post-2001 sample from 2001:01 to 2020:12. While economic gains have declined
for most PRs in the post-2001 period, the performance of VOLTIME and VOL-LONG is
stable over time, confirming the robustness of the low-volatility anomaly; see, e.g., Blitz
and Van Vliet (2007); Novy-Marx (2014). MOM-LONG and RANDOM FOREST still had
strong episodes in the post-2001 period, but suffered large losses at various points, which
significantly worsened their CERs. In particular, RANDOM FOREST had high turnover,
which increased the difference between CER and CER^TC. The strong bull-market
episodes of MOM-LONG and RANDOM FOREST and their relatively low correlation
with VOLTIME and VOL-LONG may explain why FLEXPOOL still gave significant
weight to these two PRs in the post-2001 period.42
Consistent with the results of Avramov et al. (2023), we find that the machine
learning-based method did not perform well in the post-2001 period in an environment
where economic constraints hold. However, this result by no means disqualifies machine
learning, as other techniques and/or information sets may lead to very different results.43
FLEXPOOL performed better than any candidate PR or other combined PR in the
pre-2001 sample; in the post-2001 sample it performed roughly on par with VOLTIME.
42 The correlation of these strategies with VOLTIME and VOL-LONG was between 0.48 and
0.58.
43 For example, emerging machine learning-based approaches that directly optimize an economic
criterion at the asset level, include economic constraints, and curb turnover appear promising;
see, e.g., Liu and Zhou (2024) and Cong et al. (2024).
Table A5: Portfolio composition in different economic and market conditions.
The table shows the estimates of the slope coefficients for the time series regression in Equation (A.1)
for the evaluation sample from 1977:01 to 2020:12. *, **, and *** indicate statistical significance at the
10%, 5%, and 1% levels, respectively.
Candidate PR NegRet Rec HighVol HighSent
1/N −0.029 −0.049∗∗∗ −0.009 −0.018
VOLTIME 0.180∗∗∗ 0.003 0.107∗∗ −0.038
VOL-LONG 0.115∗∗∗ 0.087 −0.001 0.037
MOM-LONG −0.083∗∗ −0.115∗0.004 0.024
RANDOM FOREST −0.182∗∗∗ 0.074 −0.101∗∗ −0.006
Table A6: CER differences (in bps) in different economic and market conditions.
The table reports the estimates of the slope coefficients for the time series regression in Equation
(A.2) for the evaluation sample from 1977:01 to 2020:12. CERs are computed for a power utility
investor with risk aversion coefficient γ = 3. *, **, and *** indicate statistical significance at the 10%,
5%, and 1% levels, respectively.
NegRet Rec HighVol HighSent
1/N 72.651∗∗ 64.390 25.634 56.519∗∗
VOLTIME −253.042∗∗∗ 92.260 −12.396 82.330∗∗
VOL-LONG −197.188∗∗ 69.239 10.547 74.777∗∗
MOM-LONG 145.088∗∗∗ 72.001 41.769 45.278
RANDOM FOREST 231.480∗∗∗ −19.788 84.426∗∗∗ 37.568
Table A7: Subsample analysis.
The table reports the subsample results (CER and CER^TC) for the pre-2001 sample from 1977:01 to
2000:12 and the post-2001 sample from 2001:01 to 2020:12. Avg. share indicates the average share
of a PR in a subsample.
Pre-2001 Post-2001
CER CER^TC Avg. Share CER CER^TC Avg. Share
Candidate PR
1/N 0.0044 0.0042 0.0450 0.0043 0.0042 0.0841
VOLTIME 0.0045 0.0043 0.2022 0.0047 0.0045 0.2095
VOL-LONG 0.0054 0.0045 0.1886 0.0057 0.0048 0.1800
MOM-LONG 0.0058 0.0046 0.3199 0.0028 0.0015 0.2637
RANDOM FOREST 0.0055 0.0024 0.2444 −0.0007 −0.0038 0.2627
Combined PR
FLEXPOOL 0.0078 0.0054 x 0.0051 0.0025 x
STATPOOL 0.0054 0.0036 x 0.0046 0.0035 x
EQUAL WEIGHTS 0.0062 0.0052 x 0.0042 0.0031 x
B.3 Application to market timing
Table A8 reports the estimated slope coefficients for the portfolio composition in different
economic and market conditions according to Equation (A.1), based on HAC-robust
standard errors. Our previous analyses in Section 4.2.3 already indicated that HM is
largely avoided in times of market turmoil and that SOP, with its good risk management,
was favorably selected in times of economic distress. The results in Table A8 confirm these
findings. Table A9 reports the estimated coefficients for the CER differences between
FLEXPOOL and those achieved by one of the candidate PRs in different economic and
market conditions according to Equation (A.2). In periods of negative market returns,
FLEXPOOL outperforms all candidate PRs except LARGE-TVP-SV and SOP, the latter
of which is favorably selected in recessions.
Table A10 reports CER and CER^TC for the pre-2001 sample from 1977:01 to 2000:12
and the post-2001 sample from 2001:01 to 2020:12. We also report the results for the
BUY-and-HOLD strategy for both subsamples to put the results into perspective. Notably,
CERs were significantly higher in the pre-2001 period for all candidate PRs (including the
BUY-and-HOLD strategy) and also for the combined PRs, suggesting that performance
across market conditions was dominated by an overarching trend: market returns offered
a less attractive risk-return profile in the post-2001 period. In both subsamples, FLEXPOOL
performed comparatively well, roughly on par with LARGE-TVP-SV, the best
candidate PR. The relatively good performance of LARGE-TVP-SV confirms our finding
from Section 4.2.3 that its high predictive power is an important feature (see Table 3),
and the subsample analysis shows that this is not limited to short episodes.
Table A8: Portfolio composition in different economic and market conditions.
The table reports the estimates of the coefficients for the time-series regression in Equation (A.1)
for the evaluation sample from 1977:01 to 2020:12. *, **, and *** indicate statistical significance at the
10%, 5%, and 1% levels, respectively.
Candidate PR NegRet Rec HighVol HighSent
LARGE-TVP-SV 0.000 −0.033 −0.100 −0.163
BMA-TVP-CV 0.053 −0.062 0.011 0.049
UNIV-TVP-SV −0.027 0.005 −0.037 −0.038
SOP 0.033 0.313∗0.097 0.022
CF 0.000 −0.001 0.000 0.001
HM −0.062∗−0.222∗∗ 0.028 0.129
Table A9: CER differences (in bps) in different economic and market conditions.
The table reports the estimates of the coefficients for the time-series regression in Equation (A.2) for
the evaluation sample from 1977:01 to 2020:12. CERs are computed for a power utility investor with
risk aversion coefficient γ = 3. *, **, and *** indicate statistical significance at the 10%, 5%, and 1%
levels, respectively.
Candidate PR NegRet Rec HighVol HighSent
LARGE-TVP-SV −39.950 13.505 −15.060 −2.929
BMA-TVP-CV 116.176∗∗∗ 62.216 −50.715 17.666
UNIV-TVP-SV 91.197∗∗∗ 71.958 15.349 24.713
SOP −104.640∗∗ 116.042 −9.594 25.267
CF 125.335∗∗∗ 115.318 −28.607 −0.235
PHM 224.678∗∗∗ 113.957 −29.060 11.228
Table A10: Subsample analysis.
The table reports the subsample results (CER and CER^TC) for the pre-2001 sample from 1977:01 to
2000:12 and the post-2001 sample from 2001:01 to 2020:12. Avg. share is the average weight of a
candidate PR in a subsample.
Pre-2001 Post-2001
CER CER^TC Avg. Share CER CER^TC Avg. Share
Candidate PR
LARGE-TVP-SV 0.0115 0.0108 0.4900 0.0073 0.0068 0.6547
BMA-TVP-CV 0.0085 0.0082 0.1354 0.0047 0.0045 0.1035
UNIV-TVP-SV 0.0089 0.0086 0.0002 0.0043 0.0040 0.0485
SOP 0.0104 0.0102 0.0751 0.0032 0.0031 0.1773
CF 0.0108 0.0107 0.0102 0.0023 0.0022 0.0000
PHM 0.0098 0.0098 0.2892 0.0023 0.0023 0.0160
Combined PR
FLEXPOOL 0.0120 0.0116 x 0.0068 0.0064 x
STATPOOL 0.0107 0.0101 x 0.0067 0.0063 x
EQUAL WEIGHTS 0.0106 0.0104 x 0.0045 0.0044 x
BUY-and-HOLD 0.0098 0.0098 x 0.0042 0.0042 x
C Transaction cost mitigation strategy
In our combination framework, the transaction costs implied by the candidate PRs may
offset each other to some extent; see Equation (7). In this section, inspired by Gârleanu
and Pedersen (2013), we explore an additional mechanism to mitigate the negative effect
of transaction costs by trading a linear combination of the target portfolio (implied
by the combination weights $w_t^*$ from the optimization; see Equation (4)) and the
current portfolio composition (implied by $w_{t-1}^{\text{implemented}}$):
$$w_t^{\text{implemented}} = c \cdot w_t^* + (1 - c)\, w_{t-1}^{\text{implemented}}, \qquad \text{(A.3)}$$
where $c$ controls the proportions of $w_t^*$ and $w_{t-1}^{\text{implemented}}$.44 The higher the value of $c$,
the more weight is given to the target portfolio.
44 While the general idea of trading a mixture of the current and the optimal portfolio is based on
Gârleanu and Pedersen (2013), the implementation details in our framework differ from their dynamic
optimization framework with predictable returns.
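As a minimal illustration of Equation (A.3), the following Python sketch applies the partial-adjustment rule and shows that it mechanically scales the rebalancing trade by the factor c; the drift of weights between rebalancing dates is ignored here for simplicity, and the function name is ours.

import numpy as np

def mitigated_weights(w_target, w_current, c=0.1):
    # Equation (A.3): trade only a fraction c of the way toward the target
    return c * np.asarray(w_target) + (1.0 - c) * np.asarray(w_current)

# Toy usage: with c = 0.1, only 10% of the full rebalancing trade is executed
w_current = np.array([0.5, 0.3, 0.2])
w_target = np.array([0.2, 0.2, 0.6])
w_impl = mitigated_weights(w_target, w_current, c=0.1)
full_trade = np.abs(w_target - w_current).sum()  # traded amount under c = 1.0
slow_trade = np.abs(w_impl - w_current).sum()    # equals c * full_trade
print(w_impl, full_trade, slow_trade)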
For our three applications, Tables A11, A12, and A13 report the results for the
transaction cost mitigation strategy. While our primary interest is in how FLEXPOOL
performs in the different configurations, we also report the performance of the candidate
and combined PRs for perspective. The tables show the results for the evaluation sample
from 1977:01 to 2020:12 for c = 1.0 and c = 0.1 in Equation (A.3), and for linear
transaction costs of 20 bps and 50 bps.45 The case c = 1.0 is the standard version of our
ensemble approach and is repeated here for comparison. We note that 20 bps, and
especially 50 bps, are conservative choices for transaction costs, since we only use liquid
large-cap stocks or the S&P 500 index in our applications. Frazzini et al. (2018) find
that a linear transaction cost of 10 basis points is most representative for large
institutional investors.
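For reference, a small sketch of how after-cost figures of this kind can be computed, assuming the standard certainty-equivalent definition for power (CRRA) utility and a proportional-cost deduction of tc times turnover per month; these conventions are our assumptions, not necessarily the exact implementation behind the tables.

import numpy as np

def cer(returns, gamma=3.0):
    # Certainty-equivalent return for power utility u(1+r) = (1+r)^(1-gamma)/(1-gamma)
    r = np.asarray(returns)
    mean_u = np.mean((1.0 + r) ** (1.0 - gamma)) / (1.0 - gamma)
    return ((1.0 - gamma) * mean_u) ** (1.0 / (1.0 - gamma)) - 1.0

def after_cost_returns(gross, turnover, tc=0.0020):
    # Proportional costs: deduct tc * turnover each month (20 bps = 0.0020)
    return np.asarray(gross) - tc * np.asarray(turnover)

# Toy usage: CER before and after 20 bps proportional costs
rng = np.random.default_rng(0)
r = rng.normal(0.006, 0.04, 528)
to = rng.uniform(0.0, 0.3, 528)
print(cer(r), cer(after_cost_returns(r, to)))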
When applied to a cross-section of the largest 50 stocks, FLEXPOOL still slightly
outperforms the best candidate PR, VOLTIME, after transaction costs of 50 bps for
c = 1.0 in terms of CER^TC (0.0047 vs. 0.0046); see Table A11. Using the transaction cost
mitigation strategy with c = 0.1, FLEXPOOL's CER^TC increases to 0.0054. Empirically,
trading only slowly towards the target portfolio (c = 0.1) works well in this application
due to the high autocorrelation of the optimal combination weights (across the PRs, the
average first-order autocorrelation is 0.89).
When allocating c = 10% to the target portfolio and 90% to the current portfolio, the
CER of FLEXPOOL remains almost the same (0.0066 vs. 0.0068), but turnover is
significantly reduced from 0.4184 to 0.2517, and thus transaction costs fall as well,
leading to an increased CER^TC. When applied to market timing, average monthly
turnover is reduced from 0.2012 to 0.1678, and CER^TC remains the same as without
the transaction cost mitigation strategy; see Table A12.
When applied to a cross-section of the largest 500 stocks, the transaction cost mitigation
strategy does not work well because the autocorrelation of the combination weights of
the PRs is comparatively low (across the PRs, the average first-order autocorrelation is
0.29), leading to a reduction in CER from 0.0066 to 0.0046. Another problem is that the
RANDOM FOREST candidate PR itself has high turnover, which still leads to relatively
high turnover in FLEXPOOL, even when switching between the candidate PRs is
reduced by the transaction cost mitigation strategy.
45 We report the results for c = 0.1 for all applications and note that sensitivity checks for
c = 0.05 and c = 0.20 yield similar results and are therefore not reported.
Table A11: Transaction cost mitigation strategy for the application to the largest 50 stocks.
The table shows the results for the evaluation sample from 1977:01 to 2020:12 for c = 1.0 and
c = 0.1 in Equation (A.3). It includes the monthly CER values without transaction costs and with
proportional transaction costs (CER^TC) of 20 bps and 50 bps for a power utility investor with relative
risk aversion of γ = 3. Avg. TO indicates the average monthly turnover.
Candidate PRs CER CER^TC (20 bps) CER^TC (50 bps) Avg. TO
1/N 0.0035 0.0033 0.0031 0.0782
VOLTIME 0.0051 0.0049 0.0046 0.1015
KWZ-MP 0.0046 0.0041 0.0034 0.2464
KWZ-LW 0.0045 0.0034 0.0017 0.5717
GALTON 0.0052 0.0046 0.0037 0.3100
Combined PRs
c = 1.0
FLEXPOOL 0.0068 0.0060 0.0047 0.4184
STATPOOL 0.0045 0.0040 0.0032 0.2650
EQUAL WEIGHTS 0.0050 0.0046 0.0040 0.1964
c = 0.1
FLEXPOOL 0.0066 0.0061 0.0054 0.2525
STATPOOL 0.0045 0.0040 0.0032 0.2575
EQUAL WEIGHTS 0.0050 0.0046 0.0040 0.1964
Table A12: Transaction cost mitigation strategy for the application to market timing.
The table shows the results for the evaluation sample from 1977:01 to 2020:12 for c = 1.0 and
c = 0.1 in Equation (A.3). It includes the monthly CER values without transaction costs and with
proportional transaction costs (CER^TC) of 20 bps and 50 bps for a power utility investor with relative
risk aversion of γ = 3. Avg. TO indicates the average monthly turnover.
Candidate PRs CER CER^TC (20 bps) CER^TC (50 bps) Avg. TO
LARGE-TVP-SV 0.0096 0.0090 0.0081 0.2902
BMA-TVP-CV 0.0067 0.0065 0.0062 0.1077
UNIV-TVP-SV 0.0068 0.0065 0.0061 0.1301
SOP 0.0071 0.0070 0.0068 0.0675
CF 0.0069 0.0068 0.0066 0.0567
PHM 0.0064 0.0064 0.0063 0.0181
Combined PRs
c = 1.0
FLEXPOOL 0.0096 0.0091 0.0085 0.2012
STATPOOL 0.0089 0.0084 0.0076 0.2573
EQUAL WEIGHTS 0.0078 0.0077 0.0074 0.0843
c = 0.1
FLEXPOOL 0.0093 0.0091 0.0085 0.1678
STATPOOL 0.0089 0.0085 0.0074 0.2484
EQUAL WEIGHTS 0.0078 0.0077 0.0074 0.0843
Table A13: Transaction cost mitigation strategy for the application to the largest 500 stocks.
The table shows the results for the evaluation sample from 1977:01 to 2020:12 for c = 1.0 and
c = 0.1 in Equation (A.3). It includes the monthly CER values without transaction costs and with
proportional transaction costs (CER^TC) of 20 bps and 50 bps for a power utility investor with relative
risk aversion of γ = 3. Avg. TO indicates the average monthly turnover.
Candidate PRs CER CER^TC (20 bps) CER^TC (50 bps) Avg. TO
1/N 0.0044 0.0042 0.0040 0.0789
VOLTIME 0.0046 0.0044 0.0041 0.0994
VOL-LONG 0.0055 0.0047 0.0033 0.4457
MOM-LONG 0.0044 0.0032 0.0013 0.6245
RANDOM FOREST 0.0026 −0.0004 −0.0051 1.5350
Combined PRs
c = 1.0
FLEXPOOL 0.0066 0.0041 0.0003 1.2468
STATPOOL 0.0050 0.0036 0.0014 0.7139
EQUAL WEIGHTS 0.0053 0.0042 0.0027 0.5202
c = 0.1
FLEXPOOL 0.0046 0.0033 0.0012 0.6779
STATPOOL 0.0051 0.0036 0.0015 0.7070
EQUAL WEIGHTS 0.0053 0.0042 0.0027 0.5202