Forecast accuracy and economic gains from Bayesian model averaging using time-varying weights
ABSTRACT Several Bayesian model combination schemes, including some novel approaches that simultaneously allow for parameter uncertainty, model uncertainty and robust time-varying model weights, are compared in terms of forecast accuracy and economic gains using financial and macroeconomic time series. The results indicate that the proposed time-varying model weight schemes outperform other combination schemes in terms of predictive and economic gains. In an empirical application using returns on the S&P 500 index, time-varying model weights provide improved forecasts with substantial economic gains in an investment strategy including transaction costs. Another empirical example refers to forecasting US economic growth over the business cycle. It suggests that time-varying combination schemes may be very useful in business cycle analysis and forecasting, as these may provide an early indicator for recessions. Copyright © 2009 John Wiley & Sons, Ltd.
2009 | 10
Lennart Hoogerheide, Richard Kleijn, Francesco Ravazzolo, Herman K. van Dijk and Marno Verbeek
Working Paper
Research Department
Working papers from Norges Bank, from 1992/1 to 2009/2 can be ordered by e-mail:
servicesenter@norges-bank.no
or from Norges Bank, Subscription service
P.O. Box 1179 Sentrum
N-0107 Oslo, Norway.
Tel. +47 22 31 63 83, Fax. +47 22 41 31 05
Working papers from 1999 onwards are available on www.norges-bank.no
Norges Bank’s working papers present research projects and reports (not usually in their final form) and are intended
inter alia to enable the author to benefit from the comments of colleagues and other interested parties.
Views and conclusions expressed in working papers are the responsibility of the authors alone.
ISSN 1502-8143 (online)
ISBN 978-82-7553-507-6 (online)
Lennart Hoogerheide¹, Richard Kleijn², Francesco Ravazzolo³,
Herman K. van Dijk¹, Marno Verbeek⁴
Key words: forecast combination, Bayesian model averaging, time varying model
weights, portfolio optimization, business cycle.
¹ Econometric and Tinbergen Institutes, Erasmus University Rotterdam, The Netherlands.
² PGGM, Zeist, The Netherlands.
³ Norges Bank. Correspondence to: Francesco Ravazzolo, Norges Bank, Research Department,
Bankplassen 2, 0107 Oslo, Norway. E-mail: francesco.ravazzolo@norges-bank.no
⁴ Rotterdam School of Management, Erasmus University Rotterdam, The Netherlands.
1 Introduction
When an extensive set of forecasts of some future economic event is available, decision
makers usually attempt to discover which is the best forecast, then accept this and discard
the other forecasts. However, the discarded forecasts may have some independent valuable
information and including them in the forecasting process may provide more accurate results.
An important explanation is related to the fundamental assumption that in most cases
one cannot identify a priori the exact true economic process or the forecasting model
that generates smaller forecast errors than its competitors. Different models may play a –
possibly temporary – complementary role in approximating the data generating process. In
these situations, forecast combinations are viewed as a simple and effective way to obtain
improvements in forecast accuracy.
Since the seminal article of Bates and Granger (1969) several papers have shown that
combinations of forecasts can outperform individual forecasts in terms of loss functions. For
example, Stock and Watson (2004) find that for predicting output growth in seven countries
forecast combinations generally perform better than forecasts based on single models. Mar-
cellino (2004) has extended this analysis to a large European data set with broadly the same
conclusion. However, several alternative combination schemes are available and it is not
clear which is the best scheme, either in a frequentist or Bayesian framework. For example,
Hendry and Clements (2004) and Timmermann (2006) show that simple combinations¹ often
give better performance than more sophisticated approaches. Further, using a frequentist
approach, Granger and Ramanathan (1984) propose the use of coefficient regression meth-
ods, Hansen (2007) introduces a Mallows’ criterion, which can be minimized to select the
empirical model weights, and Terui and Van Dijk (2002) generalize the least squares model
weights by reformulating the linear regression model as a state space specification where the
weights are assumed to follow a random walk process. Guidolin and Timmermann (2007)
propose a different time varying weight combination scheme where weights have regime
switching dynamics. Stock and Watson (2004) and Timmermann (2006) use the inverse
mean square prediction error (MSPE) over a set of the most recent observations to compute
model weights. In a Bayesian framework, Madigan and Raftery (1994) revitalize the concept
of Bayesian model averaging (BMA) and apply it in an empirical application dealing with
Occam’s Window. Recent applications suggest its relevance for macroeconomics (Fernández,
Ley, and Steel, 2001 and Sala-i-Martin, Doppelhofer, and Miller, 2004). Strachan and Van
Dijk (2008) compute impulse response paths and effects of policy measures using BMA in
the context of a large set of vector autoregressive models. Geweke and Whiteman (2006)
apply BMA using predictive likelihoods instead of marginal likelihoods.
¹ Simple combinations are defined as combinations with model weights that do not involve unknown
parameters to be estimated; arithmetic averages constitute a simple example. Complex combinations are
defined as combinations that rely on estimating weights that depend on the full variance-covariance matrix
and, possibly, allow for time varying model weights.
This paper contributes to the research on forecast combinations by investigating sev-
eral Bayesian combination schemes. We propose three schemes that allow for parameter
uncertainty, model uncertainty and time varying model weights simultaneously. These ap-
proaches can be considered Bayesian extensions of the combination scheme of Terui and Van
Dijk (2002).
We provide two empirical illustrations. The results indicate that time varying model
weight schemes outperform other averaging schemes in terms of predictive and economic
gains. The first empirical example deals with forecasting the returns on the S&P 500 index
by combining individual forecasts from four competing models. The first model assumes
that a set of financial and macroeconomic variables that are related to the business cycle
have explanatory power. The second model is based on the popular market saying “Sell in
May and go away”, also known as the “Halloween indicator”, see for example Bouman and
Jacobsen (2002). Low predictability of stock market return data is well documented, see for
example Marquering and Verbeek (2004) and so is structural instability in this context, see
for example Pesaran and Timmermann (2002) and Ravazzolo, Paap, Van Dijk, and Franses
(2007). The third and fourth models are (robust) stochastic volatility models. As an investor
is particularly interested in the economic value of a forecasting scheme, we test our findings
in an active short-term investment exercise, with an investment horizon of one month. The
forecast combination schemes with time-varying model weights provide the highest economic
gains. The second empirical example refers to forecasting US economic growth over the
business cycle, where we consider combinations of forecasts from six well-known time series
models: an autoregressive model, two random walk models (with and without drift), an error
correction model and two (robust) stochastic volatility models. It suggests that time varying
weighting schemes may provide an early indicator for recessions.
The contents of this paper are organized as follows. In Section 2 we describe the different
forecast combination schemes. In Section 3 we give results from an empirical application
to US stock index returns which show that forecast combinations give economic gains. In
Section 4 we report results from macroeconomic forecasts using US GDP growth. Section 5
concludes.
2 Forecast combination schemes
Bayesian approaches have been widely used to construct forecast combinations, see for ex-
ample Leamer (1978), Hodges (1987), Draper (1995), Min and Zellner (1993), and Strachan
and Van Dijk (2008). In the Bayesian model averaging approach one derives the posterior
density for any individual model and combines these to compute a predictive density of the
event of interest. The predictive density then accounts for model uncertainty by averaging
over the posterior probabilities of individual models. Since the output is a complete density,
not only point forecasts but also distribution and quantile forecasts can be easily derived.
We discuss four Bayesian forecast combination schemes. The first scheme is a standard ap-
proach known as Bayesian model averaging, the other three schemes obtain model weights
as parameters to be estimated in linear and nonlinear regressions.
2.1 Scheme 1: Bayesian Model Averaging (BMA)
The predictive density of the variable $y$ at time $T+1$, $y_{T+1}$, given the data up to time $T$,
$D_T$, is computed by averaging over the conditional predictive densities given the individual
models with the posterior probabilities of these models as weights:
$$p(y_{T+1} \mid D_T) = \sum_{i=1}^{n} p(y_{T+1} \mid D_T, m_i)\, P(m_i \mid D_T) \tag{1}$$
where $n$ is the number of individual models; $p(y_{T+1} \mid D_T, m_i)$ is the conditional predictive
density given $D_T$ and model $m_i$; $P(m_i \mid D_T)$ is the posterior probability for model $m_i$. The
conditional predictive density given $D_T$ and model $m_i$ is defined as:
$$p(y_{T+1} \mid D_T, m_i) = \int p(y_{T+1} \mid D_T, m_i, \theta_i)\, p(\theta_i \mid D_T, m_i)\, d\theta_i \tag{2}$$
where $p(y_{T+1} \mid D_T, m_i, \theta_i)$ is the conditional predictive density of $y_{T+1}$ given $D_T$, the model
$m_i$ and parameters $\theta_i$; $p(\theta_i \mid D_T, m_i)$ is the posterior density for parameters $\theta_i$ in model $m_i$.
The posterior probability for model $m_i$, $P(m_i \mid D_T)$, can be computed in several ways.
Madigan and Raftery (1994) define it as:
$$P(m_i \mid D_T) = \frac{p(y_{1:T} \mid m_i)\, P(m_i)}{\sum_{j=1}^{n} p(y_{1:T} \mid m_j)\, P(m_j)} \tag{3}$$
where $y_{1:T} = \{y_t\}_{t=1}^{T}$; $P(m_i)$ is the prior probability for model $m_i$; and $p(y_{1:T} \mid m_i)$ is the
marginal likelihood for model $m_i$ given by:
$$p(y_{1:T} \mid m_i) = \int p(y_{1:T} \mid \theta_i, m_i)\, p(\theta_i \mid m_i)\, d\theta_i \tag{4}$$
with $p(\theta_i \mid m_i)$ the prior density for the parameters $\theta_i$ in model $m_i$. The integral in equation
(4) can be evaluated analytically in the case of linear models, but not for more complex forms.
Chib (1995), for example, has derived a method to compute the expression also for nonlinear
examples. Laplace methods can also be used, see for example Planas, Rossi, and Fiorentini
(2008). A comparative study of Monte Carlo methods for marginal likelihood evaluation,
among which importance sampling and bridge sampling, is given by Ardia, Hoogerheide,
and Van Dijk (2009).
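Once the marginal likelihoods in (4) are available, the weights in (3) and the mean of the mixture in (1) are simple to compute. The sketch below is illustrative only: the function names, the equal prior model probabilities and the numerical values are assumptions, not taken from the paper. It works with log marginal likelihoods and subtracts the maximum before exponentiating, a standard guard against numerical underflow.

```python
import math

def bma_weights(log_marginal_likelihoods, prior_probs=None):
    """Posterior model probabilities as in eq. (3), from log marginal likelihoods."""
    n = len(log_marginal_likelihoods)
    if prior_probs is None:
        prior_probs = [1.0 / n] * n  # assumption: equal prior model probabilities
    log_terms = [lml + math.log(p)
                 for lml, p in zip(log_marginal_likelihoods, prior_probs)]
    shift = max(log_terms)  # subtract the maximum to avoid underflow in exp()
    unnorm = [math.exp(v - shift) for v in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def bma_predictive_mean(model_means, weights):
    """Mean of the mixture in eq. (1): weighted average of the model predictive means."""
    return sum(w * m for w, m in zip(weights, model_means))

# Hypothetical log marginal likelihoods and predictive means for three models.
log_ml = [-1050.2, -1048.7, -1053.1]
weights = bma_weights(log_ml)
combined_mean = bma_predictive_mean([0.4, 0.1, -0.2], weights)
```

Because only differences in log marginal likelihoods matter, adding the same constant to every entry of `log_ml` leaves the weights unchanged.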
Geweke and Whiteman (2006) propose a BMA scheme based on the idea that a model
is as good as its predictions. The predictive density of $y_{T+1}$ conditional on $D_T$ has the same
form as equation (1), but the posterior probability of model $m_i$ conditional on $D_T$ is now
computed as:
$$P(m_i \mid D_T) = \frac{p(y_T \mid D_{T-1}, m_i)\, P(m_i)}{\sum_{j=1}^{n} p(y_T \mid D_{T-1}, m_j)\, P(m_j)} \tag{5}$$
where $p(y_T \mid D_{T-1}, m_i)$ is the predictive likelihood for model $m_i$, i.e. the density derived by
substituting the realized value $y_T$ into the predictive density of $y_T$ conditional on $D_{T-1}$
given model $m_i$. Mitchell and Hall (2005) discuss the relation of the predictive likelihood to
the Kullback-Leibler Information Criterion, and consequently to the frequentist combination
scheme based on recursive log-score weights, see for example Kascha and Ravazzolo (2008).
We apply BMA using (5) with $p(y_T \mid D_{T-1}, m_i)$ replaced by its product over $T-k$
observations $p(y_{k+1} \mid D_k, m_i) \times \ldots \times p(y_T \mid D_{T-1}, m_i)$, where for increasing $T$ we hold constant
the length $k$ of the ‘initial period’ of data $D_k$ that are only used for deriving posterior
distributions.² That is, for forecasts of $y_{T+1}$ in later periods the predictive likelihoods and
model weights are based on an expanding window of data. The densities $p(y_t \mid D_{t-1}, m_i)$ are
evaluated as follows. First, parameters $\theta_i$ are simulated from the conditional distribution
on $D_{t-1}$. Second, draws $y_t$ are simulated conditionally on the $\theta_i$ draws and $D_{t-1}$. Third, a
kernel smoothing technique is used to estimate the density of $y_t$ in model $m_i$ at its realized
value. The performance of alternative approaches for computing predictive likelihoods in
our time varying model combination schemes is left as a topic for future research.
In all models, we specify uninformative proper priors for the parameters $\theta_i$. The use
of predictive likelihoods rather than marginal likelihoods helps us to avoid the inference
problems due to the Bartlett paradox.
² We choose k = 12 for our applications involving monthly data.
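The third step of the evaluation procedure and the product of predictive likelihoods over the expanding window can be sketched as follows. This is a minimal illustration under stated assumptions: the Gaussian kernel with Silverman's rule-of-thumb bandwidth is a choice made here (the text only says "a kernel smoothing technique"), the function names are hypothetical, and the predictive draws are generated from toy normal distributions instead of from estimated models.

```python
import math
import random
import statistics

def kde_at(draws, point):
    """Gaussian kernel density estimate of `draws`, evaluated at `point`.
    Bandwidth: Silverman's rule of thumb (an assumption of this sketch)."""
    n = len(draws)
    h = 1.06 * statistics.stdev(draws) * n ** (-0.2)
    total = sum(math.exp(-0.5 * ((point - d) / h) ** 2) for d in draws)
    return total / (n * h * math.sqrt(2.0 * math.pi))

def predictive_likelihood_weights(draws_by_model, realized, prior_probs=None):
    """Model weights as in eq. (5), with the predictive likelihood replaced by
    its product over the evaluation periods.
    draws_by_model[i][t] holds simulated draws of y_t from p(y_t | D_{t-1}, m_i);
    realized[t] is the observed y_t."""
    n = len(draws_by_model)
    if prior_probs is None:
        prior_probs = [1.0 / n] * n  # assumption: equal prior model probabilities
    log_scores = []
    for i in range(n):
        log_pl = 0.0
        for t, y_t in enumerate(realized):
            # floor the estimate so that log() is defined even far in the tails
            density = max(kde_at(draws_by_model[i][t], y_t), 1e-300)
            log_pl += math.log(density)
        log_scores.append(log_pl + math.log(prior_probs[i]))
    shift = max(log_scores)  # normalise on the log scale to avoid underflow
    unnorm = [math.exp(v - shift) for v in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Toy data: model A's predictive density matches the data, model B's is shifted.
rng = random.Random(1)
realized = [rng.gauss(0.0, 1.0) for _ in range(24)]
draws_a = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in realized]
draws_b = [[rng.gauss(2.0, 1.0) for _ in range(500)] for _ in realized]
pl_weights = predictive_likelihood_weights([draws_a, draws_b], realized)
```

In an actual application the draws for each period would come from the first two steps (posterior parameter draws, then draws of $y_t$ given those parameters), and the weights would be recomputed each time a new observation is added.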
2.2 Combination schemes using estimated regression coefficients as model weights
The next three combination schemes estimate the weights $w_i$ of the models $m_i$ ($i = 1,\ldots,n$)
in regression form. We assume that the data $y_t$ satisfy the linear equation
$$y_t = w_0 + \sum_{i=1}^{n} w_i y_{t,i} + u_t, \qquad u_t \sim N(0,\sigma^2)\ \text{i.i.d.}, \qquad t = 1,2,\ldots,T \tag{6}$$
where $y_{t,i}$ has the predictive density $p(y_t \mid D_{t-1}, m_i)$ of $y_t$ given $D_{t-1}$ in model $m_i$. Clear
differences with the BMA approach are that a constant term $w_0$ is added, and that there is no
restriction that all weights must be non-negative and adding to 1.³ Therefore, the weights
$w_i$ ($i = 1,\ldots,n$) cannot be interpreted as model probabilities. Define the model weight
vector $w = (w_0, w_1, \ldots, w_n)'$. We propose three novel sampling algorithms for simulating model
weight vectors $w$ given the data $y_{1:T}$ and the predictive densities $p(y_t \mid D_{t-1}, m_i)$ ($t = 1,\ldots,T$).
³ Granger and Ramanathan (1984) explain that the constant term must be added to avoid biased forecasts.
They also conclude that this strategy is often more accurate than using restricted least squares weights.
Scheme 2: Model weights from Ordinary Least Squares in a linear model (LIN)
A set of model weight vectors $w^s$ ($s = 1,\ldots,S$) is generated by simulating independently
$S$ sets of $T \times n$ draws $y^s_{t,i}$ from the predictive densities $p(y_t \mid D_{t-1}, m_i)$ ($t = 1,\ldots,T$; $i =
1,\ldots,n$), and performing an Ordinary Least Squares (OLS) regression in the model
$$y_t = w_0 + \sum_{i=1}^{n} w_i y^s_{t,i} + u^s_t, \qquad u^s_t \sim N(0,\sigma^2), \qquad t = 1,2,\ldots,T \tag{7}$$
for each simulated set $s = 1,\ldots,S$. It is well-known that in a linear model as (7) the OLS
estimator $w^s$ is the posterior mean of $w$ under a flat prior. The generated model weights $w^s$
are used to combine draws $y^s_{T+1,i}$ ($i = 1,\ldots,n$) from the predictive densities $p(y_{T+1} \mid D_T, m_i)$
into ‘combined draws’ $\tilde{y}^s_{T+1}$:
$$\tilde{y}^s_{T+1} = w^s_0 + \sum_{i=1}^{n} w^s_i y^s_{T+1,i} \tag{8}$$
The median of $\tilde{y}^s_{T+1}$ ($s = 1,\ldots,S$) is our point forecast $\hat{y}_{T+1}$ for $y_{T+1}$, where the median
is preferred over the mean because it is more robust to extreme draws. This approach can
be considered as an extension of the idea of Granger and Ramanathan (1984) to combine
point forecasts using weights that minimize a square loss function, to making use of Bayesian
density forecasts. The model weights minimize the distance between the vector of observed
values $y_{1:T}$ and the space spanned by the constant vector and the vectors of ‘predicted’ values
$y^s_{1:T,i}$ ($i = 1,\ldots,n$).
The ‘combined draws’ $\tilde{y}^s_{T+1}$ are interpreted as draws from a ‘shrunk’ predictive density
that aims at describing the central part of the predictive density, taking into account the
parameter and model uncertainty.
The assumption that the error term $u^s_t$ in (7) has constant variance $\sigma^2$ and no serial
correlation over $t$, and has a normal distribution, is arguably violated. However, violations
of this assumption have no dire consequences for the performance of the proposed point
forecast $\hat{y}_{T+1}$. Roughly stated, the OLS estimator’s frequentist property of consistency in
combination with taking the median of a large set of ‘combined draws’ $\tilde{y}^s_{T+1}$ implies that OLS
is still a usable approach. For example, the use of Generalized Least Squares (GLS) methods
would not yield substantially different forecasts $\hat{y}_{T+1}$. The impact of this assumption on the
‘shrunk’ predictive density is arguably small; a closer look at this issue is left as a topic for
further research.
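The LIN algorithm (simulate S sets of draws, run one OLS regression per set as in (7), combine the out-of-sample draws as in (8), take the median) can be sketched with NumPy as below. The function name, array layout and toy data are assumptions made for illustration; in particular, the predictive draws are generated from simple normal distributions here, whereas in the paper they come from the individual models' predictive densities.

```python
import numpy as np

def lin_combination_forecast(y, draws_in, draws_out):
    """Scheme 2 (LIN) sketch.
    y:         (T,) observed values y_1..y_T
    draws_in:  (S, T, n) draws y^s_{t,i} from p(y_t | D_{t-1}, m_i)
    draws_out: (S, n) draws y^s_{T+1,i} from p(y_{T+1} | D_T, m_i)
    Returns the point forecast, the S combined draws and the (S, n+1) weights."""
    S, T, n = draws_in.shape
    weights = np.empty((S, n + 1))
    combined = np.empty(S)
    for s in range(S):
        X = np.column_stack([np.ones(T), draws_in[s]])       # constant plus n models
        weights[s], *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS as in eq. (7)
        combined[s] = weights[s, 0] + draws_out[s] @ weights[s, 1:]  # eq. (8)
    return np.median(combined), combined, weights

# Toy data: model 1's predictive draws track the data, model 2's are pure noise.
rng = np.random.default_rng(0)
T, n, S = 120, 2, 200
signal = rng.normal(size=T)
y = signal + 0.3 * rng.normal(size=T)
draws_in = np.empty((S, T, n))
draws_in[:, :, 0] = signal + 0.5 * rng.normal(size=(S, T))
draws_in[:, :, 1] = rng.normal(size=(S, T))
draws_out = np.column_stack([1.0 + 0.5 * rng.normal(size=S), rng.normal(size=S)])
point_forecast, combined, weights = lin_combination_forecast(y, draws_in, draws_out)
```

In this toy setting the average weight on the informative model is well below one because the regressors are noisy draws, not point forecasts; this is one way to see why the combined draws describe a ‘shrunk’ density.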
Scheme 3: Time-varying weights (TVW)
The complementary roles of different models in approximating the data generating process
may differ over time. Therefore, substantially better forecasts may be obtained by extending
(6) to allow the model weights $w_i$ ($i = 1,\ldots,n$) to change over time, resulting in
$$y_t = w_{t,0} + \sum_{i=1}^{n} w_{t,i} y_{t,i} + u_t, \qquad u_t \sim N(0,\sigma^2), \qquad t = 1,2,\ldots,T. \tag{9}$$
Terui and Van Dijk (2002) have proposed a method that extends the linear weight
combination of point forecasts to time-varying weights. We extend their approach by making use
of Bayesian density forecasts, taking into account parameter uncertainty. As Terui and Van
Dijk (2002) we assume that the model weights $w_t = (w_{t,0}, w_{t,1}, \ldots, w_{t,n})'$ ($t = 1,\ldots,T$) evolve
over time in the following fashion:
$$w_t = w_{t-1} + \xi_t, \qquad \xi_t \sim N(0,\Sigma). \tag{10}$$
We restrict the covariance matrix $\Sigma$ of the ‘weight innovations’ $\xi_t$ to be a diagonal matrix.
The assumed independence of the weight innovations does not rule out that a posteriori
there will be coinciding (large) changes of model weights. It means that this dependence
is not imposed a priori. Including correlations in the weights would make the estimation
procedure computationally more difficult, and guessing in the correlation structure can be
dangerous, possibly resulting in a poor forecasting scheme. Still, we intend to analyze the
extension of our scheme to non-diagonal $\Sigma$ in future research.
As in scheme 2, our algorithm results in a set of generated model weights $w^s_{T+1}$ ($s =
1,\ldots,S$) given the data $y_{1:T}$ and draws $y^s_{t,i}$ simulated from the predictive densities $p(y_t \mid D_{t-1}, m_i)$
($t = 1,\ldots,T$). The generated model weights $w^s_{T+1}$ are used to transform draws $y^s_{T+1,i}$
($i = 1,\ldots,n$) from the predictive densities $p(y_{T+1} \mid D_T, m_i)$ into ‘combined draws’ $\tilde{y}^s_{T+1}$:
$$\tilde{y}^s_{T+1} = w^s_{T+1,0} + \sum_{i=1}^{n} w^s_{T+1,i}\, y^s_{T+1,i} \tag{11}$$
where the median of $\tilde{y}^s_{T+1}$ ($s = 1,\ldots,S$) is our point forecast $\hat{y}_{T+1}$ for $y_{T+1}$. In scheme 3,
a Kalman filter algorithm (see for example Harvey (1993)) having the interpretation of a
Bayesian learning approach is used to iteratively update the subsequent model weights $w^s_t$
($t = 1,\ldots,T+1$) in the model given by
$$y_t = w^s_{t,0} + \sum_{i=1}^{n} w^s_{t,i}\, y^s_{t,i} + u^s_t, \qquad u^s_t \sim N(0,\sigma^2), \qquad t = 1,2,\ldots,T \tag{12}$$
and (10). We fix the values of $\sigma^2$ and the diagonal elements of $\Sigma$. A Bayesian can interpret
these assumptions as having priors on $\sigma^2$ and $\Sigma$ with zero variances. For each $s$ the
parameters $\sigma^2$ and $\Sigma$ could also be estimated by maximum likelihood or MCMC methods, but we
discard this to reduce computational time.⁴
The model weights $w^s_t$ incorporate a trade-off between minimizing the differences between
the observed values $y_{1:T}$ and linear combinations of ‘predicted’ values $y^s_{1:T,i}$ ($i = 1,\ldots,n$),
and constructing a ‘smooth’ path of weights $w^s_t$ over time.
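For one simulated set of draws, the state-space model (12) with random-walk weights (10) and fixed $\sigma^2$ and diagonal $\Sigma$ can be filtered with the standard Kalman recursions. The sketch below is a minimal illustration under stated assumptions: the function name, the starting values (zero intercept, equal weights, identity initial covariance) and the toy data are choices made here, and the filtered mean is used as the weight vector for $T+1$, which the text does not spell out.

```python
import numpy as np

def tvw_filter(y, x, sigma2, Sigma_diag, w0=None, P0_scale=1.0):
    """Kalman filter for y_t = w_{t,0} + sum_i w_{t,i} x_{t,i} + u_t with
    random-walk weights w_t = w_{t-1} + xi_t, xi_t ~ N(0, diag(Sigma_diag)).
    y: (T,), x: (T, n). Returns the (T, n+1) filtered weight path and the
    predicted mean weight vector for T+1 (equal to the last filtered mean)."""
    T, n = x.shape
    X = np.column_stack([np.ones(T), x])
    # assumed starting values: zero intercept, equal weights on the n models
    w = np.r_[0.0, np.full(n, 1.0 / n)] if w0 is None else np.asarray(w0, float)
    P = P0_scale * np.eye(n + 1)
    Q = np.diag(Sigma_diag)
    path = np.empty((T, n + 1))
    for t in range(T):
        P = P + Q                      # prediction step: random walk adds Sigma
        xt = X[t]
        v = y[t] - xt @ w              # one-step-ahead prediction error
        F = xt @ P @ xt + sigma2       # its variance
        K = P @ xt / F                 # Kalman gain
        w = w + K * v                  # updated (filtered) weights
        P = P - np.outer(K, xt @ P)    # updated covariance
        path[t] = w
    return path, w

# Toy data: model 1 is the relevant one in the first half, model 2 in the second.
rng = np.random.default_rng(0)
T = 200
x = rng.normal(size=(T, 2))
y = np.where(np.arange(T) < 100, x[:, 0], x[:, 1]) + 0.2 * rng.normal(size=T)
path, w_next = tvw_filter(y, x, sigma2=0.04, Sigma_diag=[0.001, 0.001, 0.001])
```

In the scheme described above this recursion would be run once for each of the S simulated sets, and `w_next` would be applied to the corresponding draws $y^s_{T+1,i}$ as in (11). Larger entries in `Sigma_diag` relative to `sigma2` make the weights adapt faster but produce a less smooth path.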
⁴ In the financial application (with n = 4 models) we set $\sigma^2$ equal to its OLS estimate in (6) allowing it
to change with $s$. The $(n+1) \times 1$ vector diag($\Sigma$) of diagonal elements of $\Sigma$ is set as $(0.1, 0.01\iota_n')'$ with $\iota_n$
the $n \times 1$ vector consisting of ones, to have (small) signal-to-noise ratios in the range from 0.01 to 0.005. For
robustness we have tried different values of $\sigma^2$ and $\Sigma$ with signal-to-noise ratios ranging from 0.0001 to 0.1, all
resulting in qualitatively equal results. In the macroeconomic application we set diag($\Sigma$) = $(0.01, 0.005\iota_n')'$.
Scheme 4: Robust time-varying weights (RTVW)
Recently, a new specification has been developed that makes parameter estimation in case of
instability over time more robust to prior assumptions, see for example Giordani and Villani
(2008) and Groen, Paap, and Ravazzolo (2009) for applications. We extend scheme 3 with
time-varying model weights following the same reasoning. The weight innovations are then
equal to the latent variables $\xi_{t,i}$ ($i = 0,1,\ldots,n$) only with probability $\pi_i$ and set equal to 0
with probability $1 - \pi_i$. That is, equation (10) becomes
$$w_t = w_{t-1} + k_t \odot \xi_t, \qquad \xi_t \sim N(0,\Sigma) \tag{13}$$
with $k_t = (k_{0,t}, k_{1,t}, \ldots, k_{n,t})'$, where each element $k_{i,t}$ of the vector $k_t$ is an unobserved 0/1
variable with $P[k_{i,t} = 1] = \pi_i$. The Hadamard product $\odot$ refers to element-by-element
multiplication. $\Sigma$ is again restricted to be a diagonal matrix.
The model (12)-(13) is estimated following Gerlach, Carter, and Kohn (2000), estimating
$k_t$ by deriving its posterior density conditional on $\sigma^2$ and $\Sigma$, but not on $w_t$. Then, we apply
the Kalman Filter to estimate the latent factors $w_t$. We set $\sigma^2$ and the diagonal elements of
$\Sigma$ to the same fixed values as for scheme 3.
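To see what the mixture innovations in (13) imply for the weight paths, one can simulate the weight process directly. The sketch below only simulates from (13); it does not implement the Gerlach, Carter, and Kohn (2000) sampler used for estimation. The function name and the values of $\pi_i$ are hypothetical; the diagonal of $\Sigma$ is the one reported for the financial application.

```python
import numpy as np

def simulate_rtvw_weights(T, w_init, Sigma_diag, pi, rng):
    """Simulate w_t = w_{t-1} + k_t * xi_t (element-wise product), eq. (13),
    with xi_t ~ N(0, diag(Sigma_diag)) and P[k_{i,t} = 1] = pi_i."""
    w = np.asarray(w_init, float).copy()
    sd = np.sqrt(np.asarray(Sigma_diag, float))
    pi = np.asarray(pi, float)
    path = np.empty((T, w.size))
    for t in range(T):
        k = rng.random(w.size) < pi   # 0/1 indicators k_t
        xi = rng.normal(0.0, sd)      # independent innovations (diagonal Sigma)
        w = w + k * xi                # a weight moves only when its indicator is 1
        path[t] = w
    return path

rng = np.random.default_rng(0)
w_init = [0.0, 0.25, 0.25, 0.25, 0.25]        # intercept plus n = 4 model weights
Sigma_diag = [0.1, 0.01, 0.01, 0.01, 0.01]    # diag(Sigma) of the financial application
# pi = 0.05 is a hypothetical value: weights stay constant in most periods
sparse_path = simulate_rtvw_weights(400, w_init, Sigma_diag, [0.05] * 5, rng)
# pi = 1 reproduces the random walk of scheme 3, eq. (10)
dense_path = simulate_rtvw_weights(400, w_init, Sigma_diag, [1.0] * 5, rng)
```

With small $\pi_i$ the simulated weights are piecewise constant with occasional jumps, while $\pi_i = 1$ gives weights that move every period.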
3 Financial application
In our first application we investigate the forecasting performance and economic gains ob-
tained by applying the four forecast combination schemes to the case of US stock index
returns, the continuously compounded monthly return on the S&P 500 index in excess of
the 1-month T-Bill rate, from January 1966 to December 2008, for a total of 516 observations.
We use n = 4 individual models. The first model is based on the idea that a set of finan-
cial and macroeconomic variables contains potentially relevant factors for forecasting stock
returns. Among others, Pesaran and Timmermann (1995), Cremers (2002), Marquering and
Verbeek (2004) have shown that such variables can have predictive power. We include as
predictors the S&P 500 index dividend yield defined as the ratio of dividends over the previ-
ous twelve months and the current stock price, the 3-month T-Bill rate, the monthly change
in the 3-month T-bill rate, the term spread defined as the difference between the 10-year
T-bond rate and the 3-month T-bill rate, the credit spread defined as the difference between
Moody’s Baa and Aaa yields, the yield spread defined as the difference between the Federal
funds rate and the 3-month T-bill rate, the annual inflation rate based on the producer price
index (PPI) for finished goods, the annual growth rate of industrial production, and the
annual growth rate of the monetary base measure M1. We take into account the typical
publication lag of macroeconomic variables in order to avoid look-ahead bias and we include
inflation, the growth rates of industrial production and the monetary base with a two-month
lag. As the financial variables are promptly available, these are included with a one-month
lag. We label this forecasting model “Leading indicator” (LI).
The second forecasting model is a simple linear regression model with a constant and a
dummy for November-April. It is based on the popular market saying “Sell in May and go
away”, also known as the “Halloween indicator” (HI) which is based on the assumption that
stock returns can be predicted simply by deterministic time patterns. This suggests to buy
stock in November and sell it in May. Bouman and Jacobsen (2002) show that this strategy
has predictive power.
The third model allows for a well-known stylized fact on excess returns, time-varying
volatility. We apply a stochastic volatility (SV) model with time varying mean:
$$r_t = \mu_t + \sigma_t u_t, \qquad u_t \sim N(0,1) \tag{14}$$
$$\mu_t = \mu_{t-1} + \xi_{1,t}, \qquad \xi_{1,t} \sim N(0,\tau_1^2) \tag{15}$$
$$\ln(\sigma_t^2) = \ln(\sigma_{t-1}^2) + \xi_{2,t}, \qquad \xi_{2,t} \sim N(0,\tau_2^2) \tag{16}$$
The fourth model is a robust extension of the SV model that allows for parameter
instability as in Giordani and Kohn (2008). In this robust stochastic volatility (RSV) model the
time-varying mean and volatility are given by
$$r_t = \mu_t + \sigma_t u_t, \qquad u_t \sim N(0,1) \tag{17}$$
$$\mu_t = \mu_{t-1} + K_{1,t}\,\xi_{1,t}, \qquad \xi_{1,t} \sim N(0,\tau_1^2) \tag{18}$$
$$\ln(\sigma_t^2) = \ln(\sigma_{t-1}^2) + K_{2,t}\,\xi_{2,t}, \qquad \xi_{2,t} \sim N(0,\tau_2^2) \tag{19}$$
where $K_{j,t}$ ($j = 1,2$; $t = 1,\ldots,T$) is an unobserved 0/1 variable with $P[K_{j,t} = 1] = \pi_{j,RSV}$.
The LI and HI specifications are linear models, therefore standard Bayesian derivations
apply to these, see for example Koop (2003). For estimation of the SV and RSV models we
refer to Giordani, Kohn, and Van Dijk (2007).
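The data generating processes (14)-(16) and (17)-(19) are easy to simulate, which can help build intuition for the difference between the SV and RSV specifications. The sketch below is illustrative only: the function name and all parameter values are hypothetical, and it simulates from the models without performing the Bayesian estimation referred to above. The SV model is obtained as the special case in which both indicator probabilities equal one.

```python
import math
import random

def simulate_rsv(T, mu0, log_var0, tau1, tau2, pi1=1.0, pi2=1.0, seed=0):
    """Simulate returns from the RSV model (17)-(19).
    With pi1 = pi2 = 1 the indicators are always 1, giving the SV model (14)-(16)."""
    rng = random.Random(seed)
    mu, log_var = mu0, log_var0
    returns, means, vols = [], [], []
    for _ in range(T):
        K1 = 1 if rng.random() < pi1 else 0   # does the mean move this period?
        K2 = 1 if rng.random() < pi2 else 0   # does the log variance move?
        mu += K1 * rng.gauss(0.0, tau1)
        log_var += K2 * rng.gauss(0.0, tau2)
        sigma = math.exp(0.5 * log_var)
        returns.append(mu + sigma * rng.gauss(0.0, 1.0))
        means.append(mu)
        vols.append(sigma)
    return returns, means, vols

# Hypothetical parameter values for monthly excess returns.
sv_returns, sv_means, sv_vols = simulate_rsv(516, 0.005, math.log(0.045 ** 2), 0.001, 0.1)
rsv_returns, rsv_means, rsv_vols = simulate_rsv(
    516, 0.005, math.log(0.045 ** 2), 0.001, 0.1, pi1=0.02, pi2=0.05)
```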
3.1 Evaluation
We evaluate the statistical accuracy of the individual models and the four forecast
combination schemes in terms of the root mean square prediction error (RMSPE), and in terms of
the percentage of correctly predicted signs (Sign Ratio). Moreover, as an investor is more interested in the
economic value of a forecasting model than its precision, we test our conclusions in an active
short-term investment exercise, with an investment horizon of one month. The investor’s
portfolio consists of a stock index and riskfree bonds only. At the start of each month $T+1$,
the investor decides upon the fraction of her portfolio to be invested in stocks, $pw_{T+1}$, based
upon a forecast of the excess stock return $r_{T+1}$. The investor is assumed to maximize a
power utility function with coefficient of relative risk aversion $\gamma$:
$$u(W_{T+1}) = \frac{W_{T+1}^{1-\gamma}}{1-\gamma}, \qquad \gamma > 1, \tag{20}$$
where $W_{T+1}$ is the wealth at the end of period $T+1$, which is equal to
$$W_{T+1} = W_T\big((1 - pw_{T+1})\exp(r_{f,T+1}) + pw_{T+1}\exp(r_{f,T+1} + r_{T+1})\big), \tag{21}$$
where $W_T$ denotes initial wealth, and where $r_{f,T+1}$ is the riskfree rate.
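The two statistical accuracy measures named at the start of this subsection can be computed as in the sketch below. The function names and the numbers are hypothetical; in particular, counting a forecast as correct when forecast and realization are both positive or both non-positive is an assumption about how zeros are treated.

```python
import math

def rmspe(forecasts, realized):
    """Root mean square prediction error of point forecasts."""
    squared = [(f - r) ** 2 for f, r in zip(forecasts, realized)]
    return math.sqrt(sum(squared) / len(squared))

def sign_ratio(forecasts, realized):
    """Percentage of periods in which the forecast has the same sign as the realization."""
    hits = sum(1 for f, r in zip(forecasts, realized) if (f > 0) == (r > 0))
    return 100.0 * hits / len(realized)

# Hypothetical monthly excess return forecasts and realizations, in percent.
forecasts = [0.5, -0.2, 0.1, 0.3]
realized = [1.0, 0.4, -0.3, 0.2]
print(rmspe(forecasts, realized), sign_ratio(forecasts, realized))
```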
Without loss of generality we set initial wealth equal to one, $W_T = 1$, such that the
investor’s optimization problem is given by
$$\max_{pw_{T+1}} E_T\big(u(W_{T+1})\big) = \max_{pw_{T+1}} E_T\left(\frac{\big((1 - pw_{T+1})\exp(r_{f,T+1}) + pw_{T+1}\exp(r_{f,T+1} + r_{T+1})\big)^{1-\gamma}}{1-\gamma}\right), \tag{22}$$
where $E_T$ is the conditional expectation given information $D_T$ at time $T$. How this
expectation is computed depends on how the predictive density for the excess returns is computed.
If we generally denote this density as $p(r_{T+1} \mid D_T)$, the investor solves the following problem:
$$\max_{pw_{T+1}} \int u(W_{T+1})\, p(r_{T+1} \mid D_T)\, dr_{T+1}. \tag{23}$$
The integral in (23) is approximated by generating $G$ independent draws $\{r^g_{T+1}\}_{g=1}^{G}$ from the
predictive density $p(r_{T+1} \mid D_T)$, and then using a numerical optimization method to maximize
the quantity:
$$\frac{1}{G}\sum_{g=1}^{G}\left(\frac{\big((1 - pw_{T+1})\exp(r_{f,T+1}) + pw_{T+1}\exp(r_{f,T+1} + r^g_{T+1})\big)^{1-\gamma}}{1-\gamma}\right) \tag{24}$$
We do not allow for short-sales or leveraging, constraining $pw_{T+1}$ to be in the $[0,1]$ interval
(see Barberis (2000)).
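The Monte Carlo approximation (24) and the constrained maximization can be sketched as follows. The text does not say which numerical optimizer is used; this sketch swaps in a plain grid search over $[0,1]$, which respects the no-short-sales and no-leverage constraint by construction. The function name, grid size and the normal predictive draws are assumptions for illustration.

```python
import numpy as np

def optimal_stock_weight(excess_return_draws, rf, gamma, grid_size=101):
    """Maximise the Monte Carlo estimate (24) of expected power utility over
    a grid of stock weights pw in [0, 1] (grid search is a stand-in optimizer)."""
    r = np.asarray(excess_return_draws, float)
    grid = np.linspace(0.0, 1.0, grid_size)
    # end-of-period wealth for every (weight, draw) pair, eq. (21) with W_T = 1
    wealth = ((1.0 - grid[:, None]) * np.exp(rf)
              + grid[:, None] * np.exp(rf + r[None, :]))
    expected_utility = (wealth ** (1.0 - gamma) / (1.0 - gamma)).mean(axis=1)
    return grid[np.argmax(expected_utility)], expected_utility

# Hypothetical predictive draws for the monthly excess return.
rng = np.random.default_rng(0)
draws = rng.normal(0.005, 0.045, size=5000)
pw, eu = optimal_stock_weight(draws, rf=0.003, gamma=6.0)
```

A finer grid, or a bounded scalar optimizer, would refine the solution; the grid keeps the sketch dependency-free beyond NumPy and makes the objective easy to inspect.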
We include eight cases in the empirical analysis below. We consider an investor who
obtains a forecast of the excess stock return rT+1from the n = 4 individual models (denoted
LI, HI, SV and RSV) described above. Then, we consider combination forecasts using the
four schemes (BMA, LIN, TVW and RTVW) from section 2, where all the individual models
are combined.
We evaluate the different investment strategies by computing the ex post annualized mean
portfolio return, the annualized standard deviation, the annualized Sharpe ratio and the total
utility. Utility levels are computed by substituting the realized return of the portfolios at
time $T+1$ into (20). Total utility is then obtained as the sum of $u(W_{T+1})$ across all $T^*$
investment periods $T+1 = T_0+1,\ldots,T_0+T^*$, where the first investment decision is made at the
end of period $T_0$. In order to compare alternative strategies we compute the multiplication
factor of wealth that would equate their average utilities. For example, suppose we compare
two strategies A and B. The wealth provided at time $T+1$ by the two resulting portfolios is
denoted as $W_{A,T+1}$ and $W_{B,T+1}$, respectively. We then determine the value of $\Delta$ such that
$$\sum_{T=T_0}^{T_0+T^*-1} u(W_{A,T+1}) = \sum_{T=T_0}^{T_0+T^*-1} u\big(W_{B,T+1}/\exp(\Delta)\big). \tag{25}$$
Following Fleming, Kirby, and Ostdiek (2001), we interpret ∆ as the maximum performance
fee the investor would be willing to pay to switch from strategy A to strategy B. For com-
parison of multiple investment strategies, it is useful to note that – under a power utility
specification – the performance fee an investor is willing to pay to switch from strategy
A to strategy B can also be computed as the difference between the performance fees of
these strategies with respect to a third strategy C.⁵ We use this property below to infer the
added value of strategies based on individual models and combination schemes by comput-
ing ∆ with respect to three static benchmark strategies: holding stocks only (∆s), holding
a portfolio consisting of 50% stocks and 50% bonds (∆m), and holding bonds only (∆b).
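Under the power utility (20), equation (25) can be solved for $\Delta$ in closed form: since $u(W/\exp(\Delta)) = \exp(\Delta(\gamma-1))\,u(W)$, it follows that $\Delta = \ln\big(\sum_T u(W_{A,T+1}) / \sum_T u(W_{B,T+1})\big)/(\gamma-1)$. The sketch below uses this expression and checks numerically the additivity property invoked above; the function names and the wealth series are hypothetical, and any annualization or conversion to basis points is left out.

```python
import math

def power_utility(wealth, gamma):
    """Power utility, eq. (20)."""
    return wealth ** (1.0 - gamma) / (1.0 - gamma)

def performance_fee(wealth_a, wealth_b, gamma):
    """Delta solving eq. (25): the fee for switching from strategy A to strategy B."""
    total_a = sum(power_utility(w, gamma) for w in wealth_a)
    total_b = sum(power_utility(w, gamma) for w in wealth_b)
    return math.log(total_a / total_b) / (gamma - 1.0)

# Hypothetical end-of-month wealth per unit invested for three strategies.
wealth_a = [1.01, 0.98, 1.02]
wealth_b = [1.03, 0.99, 1.04]
wealth_c = [1.00, 1.00, 1.00]   # benchmark strategy C
gamma = 6.0

fee_ab = performance_fee(wealth_a, wealth_b, gamma)
fee_via_c = (performance_fee(wealth_c, wealth_b, gamma)
             - performance_fee(wealth_c, wealth_a, gamma))
```

Here `fee_ab` and `fee_via_c` agree up to rounding, which is the property used to report fees relative to the three static benchmarks.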
Finally, the portfolio weights in the active investment strategies change every month,
and the portfolio must be rebalanced accordingly. Hence, transaction costs play a non-
trivial role and should be taken into account when evaluating the relative performance of
different strategies. Rebalancing the portfolio at the start of month T + 1 means that the
weight invested in stocks is changed from pwT to pwT+1. We assume that transaction costs
amount to a fixed percentage c on each traded dollar. Setting the initial wealth WT equal
to 1 for simplicity, transaction costs at time T + 1 are equal to
$$c_{T+1} = 2c\,|pw_{T+1} - pw_T| \tag{26}$$
where the multiplication by 2 follows from the fact that the investor rebalances her invest-
ments in both stocks and bonds. The net excess portfolio return is then given by rT+1−cT+1.
We apply a scenario with transaction costs of 0.1%.
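Equation (26) translates directly into code. In the sketch below the function names, the starting weight before the first rebalancing and the return series are hypothetical; the cost rate of 0.1% is the one stated above.

```python
def transaction_costs(stock_weights, c, initial_weight=0.0):
    """Cost 2c|pw_{T+1} - pw_T| for each rebalancing, eq. (26).
    initial_weight is the (assumed) stock weight held before the first decision."""
    costs = []
    previous = initial_weight
    for weight in stock_weights:
        costs.append(2.0 * c * abs(weight - previous))
        previous = weight
    return costs

def net_excess_returns(excess_returns, costs):
    """Net excess portfolio return: excess return minus that month's transaction cost."""
    return [r - k for r, k in zip(excess_returns, costs)]

weights = [0.5, 0.7, 0.7, 0.2]    # hypothetical monthly stock weights
costs = transaction_costs(weights, c=0.001, initial_weight=0.5)
net = net_excess_returns([0.010, -0.004, 0.006, 0.002], costs)
```

Months in which the stock weight is unchanged incur no cost, so strategies with smoother weight paths are penalized less.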
⁵ This follows from the fact that combining (25) for the comparisons of strategies A and B with
C, $\sum_T u(W_{C,T+1}) = \sum_T u(W_{A,T+1}/\exp(\Delta_A))$ and $\sum_T u(W_{C,T+1}) = \sum_T u(W_{B,T+1}/\exp(\Delta_B))$, gives
$\sum_T u(W_{A,T+1}/\exp(\Delta_A)) = \sum_T u(W_{B,T+1}/\exp(\Delta_B))$. Using the power utility specification in (20), this
can be rewritten as $\sum_T u(W_{A,T+1}) = \sum_T u(W_{B,T+1}/\exp(\Delta_B - \Delta_A))$.
3.2 Empirical Results
The analysis for the active investment strategies is implemented for the period from
January 1987 until December 2008, involving $T^* = 264$ one-month-ahead excess stock return
forecasts. The individual models are estimated recursively using an expanding window of
observations. The initial 12 predictions for each individual model are used as a training
period for the combination schemes and for making the first combined prediction. The investment
strategies are implemented for a level of relative risk aversion of γ = 6.⁶
Before we analyze the performance of the different portfolios, we summarize the statistical
accuracy of the excess return forecasts. All the individual models give similar RMSPE
statistics in Table 1, with the RSV model giving marginally the smallest value and the LI model
the highest. The sign ratio is the highest for the SV model, but hardly exceeds 60%, indicating low
predictability. Due to this low predictability, small differences in RMSPE may have substantial
economic value. We investigate this in the portfolio exercise. The SV model gives the highest
Sharpe ratio, realized final utility and comparison fees ∆ among the individual models. The
TVW and RTVW combination schemes, however, provide much higher statistics; in particular
RTVW outperforms all the other models in terms of Sharpe ratio and realized utility
value, and all three ∆’s are positive. Figure 1 can help to explain these findings. Individual
models allocate too low a weight to the risky asset, resulting in low portfolio returns. BMA
has a similar problem. The LIN, TVW and RTVW combinations allocate higher weights to
the stock asset, but RTVW is the only scheme that drastically reduces this weight in bear
market periods such as the bursting of the internet bubble in 2001-2003 or the recent financial crisis
in the second part of 2007 and 2008. Panel C in Table 1 shows that the findings
are similar when medium transaction costs are taken into account.
The good performance of RTVW as compared to LIN and TVW shows that its robust
flexible structure pays off. The higher portfolio weight of stock in bull markets for RTVW,
as compared to the individual models and BMA, is due to the ‘shrunk’ predictive density.
This ‘shrunk’ excess return distribution is not so much ‘compressed’ that the risky asset’s
portfolio weight switches from 0% to 100% when its mean changes from negative to positive
values. Rather, the parameter and model uncertainty that are incorporated in this ‘shrunk’
predictive density imply an investment strategy with a smooth, ‘moderate’, yet flexible
⁶ We also implement exercises with γ = 4 and γ = 8. Results are qualitatively similar and available upon
request.