Article

Approximately Normal Tests for Equal Predictive Accuracy in Nested Models


... The models in competition are the continuous-time three- and five-factor AFDNS models, the four- and five-factor CKLS models on one side, and the more parsimonious univariate and vector autoregressive (AR and VAR) models, and the random walk process, on the other side. The out-of-sample model performances are evaluated using formal statistical tests including the equal predictability tests of Diebold and Mariano (1995) and Clark and West (2007), as well as the superior predictive ability (SPA) test of Hansen (2005) and the model confidence set of Hansen et al. (2011) (hereafter, MCS). ...
... We test the statistical significance of the out-of-sample forecasts using more formal procedures such as the Diebold-Mariano (DM) test (Diebold and Mariano 1995) for any two sets of forecasts and the Clark and West (2007) (CW) test for nested models. The CW test provided an adjustment for the DM test such that the test statistic had an approximately zero mean under the null hypothesis. ...
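The adjustment described in this excerpt can be illustrated with a minimal sketch. It assumes one-step-ahead forecasts, so a plain sample standard deviation is used instead of an autocorrelation-robust variance; the function name `clark_west_stat` and the argument names are hypothetical and do not come from any of the cited papers.

```python
import math
from statistics import mean, stdev


def clark_west_stat(actual, pred_small, pred_large):
    """MSPE-adjusted statistic for a nested (small) vs. nesting (large) model.

    Assumption: one-step-ahead forecasts, so no HAC correction is applied.
    """
    f = []
    for y, p_small, p_large in zip(actual, pred_small, pred_large):
        e_small = y - p_small            # forecast error of the parsimonious model
        e_large = y - p_large            # forecast error of the larger model
        adj = (p_small - p_large) ** 2   # adjustment term for estimation noise
        f.append(e_small ** 2 - (e_large ** 2 - adj))
    # t-type statistic: sqrt(P) * mean(f) / sample std(f)
    return math.sqrt(len(f)) * mean(f) / stdev(f)


# Toy data, invented purely for illustration.
stat = clark_west_stat([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [1.0, 2.0, 2.0])
print(round(stat, 3))
```

A large positive value favors the larger model; the statistic is compared with one-sided standard normal critical values.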
... As a result, the forecasting literature has not yet found a consensus regarding the selection of the most suitable measure. To test the statistical significance of the forecasting accuracy across the models, we have employed more advanced and well-established techniques such as the Diebold and Mariano (1995), Clark and West (2007), the superior predictive ability test of Hansen (2005), and the model confidence set of Hansen et al. (2011). Our empirical results from these four formal statistical tests significantly contribute to ...
Article
In this paper, we present a robust predictive comparison of several continuous-time multi-factor models in the context of interbank rates. Recognizing the specific dynamics of the short-term segment of the yield curve, we examine the U.S. money market by extending two continuous-time frameworks with different factor structures, the Chan-Karolyi-Longstaff-Sanders (CKLS) model and the arbitrage-free dynamic Nelson-Siegel (AFDNS) model. A battery of formal forecasting accuracy tests is employed to select a subset of superior predictive models. Despite a better goodness-of-fit measure, additional factors improve the forecasting performance only for the CKLS family. With implications for monetary policy formulation, we found evidence of two separate maturity segments as the three-factor AFDNS and the five-factor CKLS models outperform parsimonious benchmarks in predicting the interbank rates for very short maturities. Our comparative forecasting results are re-confirmed with stronger out-of-sample performance for the five-factor CKLS model when the post global financial crisis sub-sample is analyzed.
... This is motivated by the ongoing global pandemic, COVID-19, in predicting sector stock returns. Design/methodology/approach: The study considers estimation of dynamic panel data with dynamic common correlated effects estimator and two pair-wise forecast measures, namely Campbell and Thompson (2008) and Clark and West (2007) tests in dealing with the nested predictive models. Findings: The results show that pandemic uncertainty has a negative and statistically significant effect on the different sector returns, implying that sector stock returns decline as the pandemic outbreak becomes more pronounced. ...
... Having established evidence that indicates predictability of sector stock returns due to infectious disease driven market uncertainty, we now evaluate the forecast performance of our predictive model in comparison with the benchmark model, the historical average. As earlier stated, we adopt the RMSE and, more formally, the C-W (Clark and West, 2007) [4] test statistic to establish the statistical significance of the forecast evaluation procedure, which measures the significance of the difference in the forecast errors of two competing models presented for both in-sample and out-of-sample estimation results. The in-sample predictability evaluation, as described in the methodology section, is performed using 50% of the entire data sample for the in-sample forecasts and the remaining half for the out-of-sample forecasts. ...
... The Clark and West test measures the significance of the difference in the forecast errors of two competing models. The null hypothesis of a zero coefficient is rejected if this statistic is greater than +1.282 (for a one-sided 0.10 test), +1.645 (for a one-sided 0.05 test) and +2.00 (for a one-sided 0.01 test) (see Clark and West, 2007). Values in square brackets ...
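As a rough illustration of the one-sided decision rule quoted above, the sketch below computes one-sided p-values and critical values from the standard normal distribution, which is the reference distribution implied by an "approximately normal" test. The helper names are hypothetical. Note that the standard normal quantile for a one-sided 0.01 test is about 2.326, so the +2.00 threshold quoted in the excerpt is not reproduced by this calculation.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def one_sided_p_value(stat):
    """Upper-tail p-value of a statistic under a standard normal reference."""
    return 1.0 - _STD_NORMAL.cdf(stat)


def one_sided_critical_value(alpha):
    """Critical value c such that P(Z > c) = alpha for standard normal Z."""
    return _STD_NORMAL.inv_cdf(1.0 - alpha)


for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(one_sided_critical_value(alpha), 3))
```

The null is rejected at level `alpha` when the statistic exceeds `one_sided_critical_value(alpha)`, or equivalently when `one_sided_p_value(stat) < alpha`.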
Article
Purpose: In this paper, the author examines the role of uncertainty due to pandemic on the predictability of sectoral stock returns in South Africa. This is motivated by the ongoing global pandemic, COVID-19, in predicting sector stock returns. Design/methodology/approach: The study considers estimation of dynamic panel data with dynamic common correlated effects estimator and two pair-wise forecast measures, namely Campbell and Thompson (2008) and Clark and West (2007) tests in dealing with the nested predictive models. Findings: The results show that pandemic uncertainty has a negative and statistically significant effect on the different sector returns, implying that sector stock returns decline as the pandemic outbreak becomes more pronounced. While the single predictor model consistently outperforms the historical average model both in-sample and out-of-sample, controlling for the effect of other macroeconomic variables improves the forecast accuracy of infectious diseases uncertainty. These results are robust to both the in-sample and out-of-sample forecast periods, outliers and heterogeneity. These results have implications for portfolio diversification strategies, which we set aside for future research. Originality/value: The empirical literature is satiated with studies on how news can predict economic and financial variables; however, the role of uncertainty due to infectious diseases in stock return predictability, especially at the sectoral level, is less studied. This is the main contribution of the study.
... The overall lack of statistical significance should not undermine the fact that most OOS R^2 statistics are positive because the estimation procedure effectively shortens the already short 11-year sample period to 6 years, thus reducing the power of the tests. To circumvent this issue, I pool country-specific Clark-West (2007) mean squared prediction error f-statistics and run global panel regressions, thus increasing the statistical power of the tests by combining information from different regions. Columns 1 to 3 of panel B ...
... The OOS predictions are conducted for the period from January 2011 to December 2016 and are based on expanding windows with an initial estimation period from July 2006 to December 2010. The statistical significance of the OOS R^2 is derived using Clark and West's (2007) method and is based on heteroscedasticity- and autocorrelation-robust Newey-West standard errors with 12 lags in panel A, and on Driscoll-Kraay standard errors that are robust to cross-country correlations, heteroscedasticity, and autocorrelation in panel B. *p < .1; **p < .05; ...
Article
I find that short interest significantly and negatively predicts aggregate stock returns in 24 of 32 countries examined. This predictability survives out-of-sample tests, persists outside of recessions, and is not subsumed by other well-known return predictors. The results indicate that short interest contains valuable information for forecasting international market returns that is distinct and more powerful than that of other available predictors. However, the predictive power of short interest varies over time and across regions. It is higher around economic downturns when margin requirements tighten and in regions where short selling is constrained by regulations or equity lending market frictions. (JEL G12, G14, G15, G17) Received August 5, 2022; editorial decision March 5, 2023 by Editor Marcin Kacperczyk. Authors have furnished an Internet Appendix, which is available on the Oxford University Press Web site next to the link to the final published paper online.
... If R_OOS^2 > 0, then the strategy forecast outperforms the historical average forecast in a mean squared error sense. To assess whether the proposed forecasting strategy has a significantly lower MSPE than the HM benchmark, we test the null hypothesis R_OOS^2 ≤ 0 against the alternative R_OOS^2 > 0, using the test by Clark and West (2007). Calculated over the entire post-holdout OOS period from 1999:01 to 2018:12 (240 months), R_OOS^2 is 6.52% with an associated p-value of 0.003, based on the test by Clark and West (2007). ...
... To assess whether the proposed forecasting strategy has a significantly lower MSPE than the HM benchmark, we test the null hypothesis R_OOS^2 ≤ 0 against the alternative R_OOS^2 > 0, using the test by Clark and West (2007). Calculated over the entire post-holdout OOS period from 1999:01 to 2018:12 (240 months), R_OOS^2 is 6.52% with an associated p-value of 0.003, based on the test by Clark and West (2007). Interestingly, this R_OOS^2 statistic is much greater than the predictive power of established macroeconomic and financial variables (see, e.g., Goyal and Welch, 2008). ...
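The out-of-sample R^2 referred to in these excerpts is commonly computed as one minus the ratio of the model's squared forecast errors to those of the benchmark. The sketch below assumes that standard definition; the function name `r2_oos` and the toy numbers are hypothetical and are not taken from the cited studies.

```python
def r2_oos(actual, pred_model, pred_benchmark):
    """Out-of-sample R^2: 1 - SSE(model) / SSE(benchmark).

    A positive value means the model has a lower mean squared
    prediction error than the benchmark over the evaluation period.
    """
    sse_model = sum((y - p) ** 2 for y, p in zip(actual, pred_model))
    sse_bench = sum((y - b) ** 2 for y, b in zip(actual, pred_benchmark))
    return 1.0 - sse_model / sse_bench


# Toy data, invented purely for illustration.
value = r2_oos([1.0, 2.0, 3.0], [1.0, 2.0, 2.0], [0.0, 0.0, 0.0])
print(round(value, 4))
```

A positive value alone does not establish significance, which is why the excerpts pair the statistic with the Clark and West (2007) test.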
Preprint
We introduce a novel strategy to predict monthly equity premia that is based on extracted news from more than 700,000 newspaper articles, which were published in The New York Times and Washington Post between 1980 and 2018. We propose a flexible data-adaptive switching approach to map a large set of different news-topics into forecasts of aggregate stock returns. The information that is embedded in our extracted news is not captured by established economic predictors. Compared to the prevailing historical mean between 1999 and 2018, we find large out-of-sample (OOS) gains with an R_OOS^2 of 6.52% and sizeable utility gains for a mean-variance investor. The empirical results indicate that geopolitical news are at times more valuable than economic news to predict the equity premium and we also find that forecasting gains arise in down markets. JEL classifications: G11, G12, G17, C53
... The l-steps-ahead forecast $\hat{s}_{t+h|t}$ is obtained by appropriate substitution based on the conditional volatility specification and the forecast errors are given by ... For forecast evaluation, we use both the mean squared forecast error (MSE) and the mean absolute forecast error (MAE) criteria. The null hypothesis of equality of forecast performance from different models is tested in a pairwise comparison using the Diebold and Mariano (1995) (DM) test and the modified DM type test statistics for nested models of Clark and West (2007), depending on the models to be compared. Furthermore, we use the superior predictive ability (SPA) test introduced by Hansen (2005) that allows for the simultaneous test of n similar null hypotheses against a group of alternatives. ...
... Table 4 contains results of pairwise forecast comparisons, for the four models, with the Diebold and Mariano (1995) test using both squared forecast error and absolute forecast error loss functions. For the cases of nested models, we apply the modified Diebold-Mariano test by Clark and West (2007). Note that in the hierarchy of GARCH type models the simpler ones are always nested in the more complex ones and historical volatility is nested in all time series models. ...
Article
The financial crisis has fueled interest in alternatives to traditional asset classes that might be less affected by large market gyrations and, thus, provide for a less volatile development of a portfolio. One attempt at selecting stocks that are less prone to extreme risks is obeyance of Islamic Sharia rules. In this light, we investigate the statistical properties of the Dow Jones Islamic Finance (DJIM) index and explore its volatility dynamics using a number of up-to-date statistical models allowing for long memory and regime-switching dynamics. We find that the DJIM shares all stylized facts of traditional asset classes, and estimation results and forecasting performance for various volatility models are also in line with prevalent findings in the literature. Overall, the relatively new Markov-switching multifractal model performs best under the majority of time horizons and loss criteria. Long memory GARCH-type models always improve upon the short-memory GARCH specification and additionally allowing for regime changes can further improve their performance.
... The most widely used tests for evaluating the statistical difference among competing forecasting models are the Diebold and Mariano (1995) test, the Equal Predictive Accuracy test of Clark and West (2007), the Reality Check for Data Snooping of White (2000), the Superior Predictive Ability of Hansen (2005) and the Model Confidence Set of Hansen et al. (2011). Each method has its pros and cons, and the Diebold and Mariano test is best ...
Article
The paper provides a disaggregated mixed-frequency framework for the estimation of GDP. The GDP is disaggregated into components that can be forecasted based on information available at higher sampling frequency, i.e., monthly, weekly, or daily. The model framework is applied for Greek GDP nowcasting. The results provide evidence that the more accurate nowcasting estimations require (i) the disaggregation of GDP, (ii) the use of a multilayer mixed-frequency framework, and (iii) the inclusion of financial information on a daily frequency. The simulation study provides evidence in favor of the disaggregation into components despite the inclusion of multiple sources of forecast errors.
... We use a rolling window for estimating the predictive regressions. The R_OOS^2 is computed as in Campbell and Thompson (2008); p-values for R_OOS^2 are computed as in Clark and West (2007). In Panel A, the out-of-sample period starts in 1990; in Panel B, the out-of-sample period starts in 2000. ...
Article
We study the drift and cyclical components in U.S. Treasury bonds. We find that bond yields are drifting because they reflect the drift in monetary policy rates. Empirically, modeling the monetary policy drift using demographics and productivity trends, plus long-term inflation expectations, leads to cyclical deviations of bond prices from their drift that predict bond returns in- and out-of-sample. These bond cycles can be interpreted as term premia or/and temporary deviations from rational expectations in a behavioral framework. Through the lens of our model, we detect a significant role of the latter in determining the cyclical properties of yields with short maturities.
... Out-of-sample test. Note: This table presents the forecast evaluation of four linear asset pricing models for the excess returns of the value-weighted AH-sorted quintile portfolios (AH1 through AH5). Out-of-sample R^2 values comparing the forecast errors of the AH+FF5 model to the other three benchmark models with the Clark and West (2007) p-values in parentheses. Panel A presents the values of the one-month-ahead prediction. ...
Article
The paper investigates how the availability heuristic of individual stocks affects equity returns, where the availability heuristic is measured by the gap between the fractal dimension and the rational case (1.5). Our evidence supports that the availability heuristic can positively predict the short-term expected excess returns and negatively predict the long-term expected excess returns. Further evidence from out-of-sample tests confirms the predictive ability of the availability heuristic. Our findings provide new insight into the understanding of stock returns from behavioral finance.
... The results for investing directly in decile portfolios are summarized in Exhibit 2. Panels A and B report ... [Footnote 15] In results not shown, we implement statistical forecast evaluation using Campbell and Thompson's (2008) out-of-sample R2 (R_OS^2) together with a test proposed in Clark and West (2007). Neither single predictor nor combination or multivariate forecast models deliver a statistically significant reduction in forecast errors. ...
... They consider the situation with two regression models-one being nested in the other-where the parameters are estimated by least squares and the mean squared (prediction) error is used as criterion function. The observation made in Clark and West (2007) is that MSPE is expected to be smaller for parsimonious models. This motivates a correction of a particular test. ...
Article
We consider the case where a parameter, $\theta$, is estimated by maximizing a criterion function, $Q(X, \theta)$. The estimate is then used to evaluate the criterion function with the same data, $X$, as well as with an independent data set, $Y$. The in-sample fit and out-of-sample fit relative to that of $\theta_0$, the "true" parameter, are given by $T_{x,x} = Q(X, \hat{\theta}_x) - Q(X, \theta_0)$ and $T_{y,x} = Q(Y, \hat{\theta}_x) - Q(Y, \theta_0)$. We derive the limit distribution of $(T_{x,x}, T_{y,x})$ for a large class of criterion functions and show that $T_{x,x}$ and $T_{y,x}$ are strongly negatively related. The implication is that good in-sample fit translates directly into poor out-of-sample fit. This result forms the basis for a unified framework for discussing aspects of model selection, model averaging, and the effects of data mining. The limit distribution can also be used to motivate a particular form of shrinkage, called qrinkage, where in-sample parameter estimates are modified to offset the overfit of the criterion function, hence the name. This form of shrinkage is particularly simple in the context of regression models, such as the factor-based forecasting models.
Article
We examine the predictability of the model‐free implied volatility from swaptions on future realized volatility of the underlying swap rates. The model‐free implied volatility demonstrates significant predictability on future realized volatility of swap rates along a wide cross‐section of tenors. The predictive power of the model‐free implied volatility is superior to the predictability of lagged realized volatility and generalized autoregressive conditional heteroskedasticity‐type conditional volatility. The superior predictive power of the model‐free implied volatility also holds out‐of sample, in different market states and with longer forecasting horizons.
Article
We examine the importance of volatility and jump risk in the time‐series prediction of S&P 500 index option returns. The empirical analysis provides a different result between call and put option returns. Both volatility and jump risk are important predictors of put option returns. In contrast, only volatility risk is consistently significant in the prediction of call option returns over the sample period. The empirical results support the theory that there is option risk premium associated with volatility and jump risk, and reflect the asymmetry property of S&P 500 index distribution.
Article
This paper analyses to what extent a selection of leading indicators is able to forecast U.S. recessions, by means of both dynamic probit models and Support Vector Machine (SVM) models, using monthly data from January 1871 to June 2016. The results suggest that the probit models predict U.S. recession periods more accurately than SVM models up to six months ahead, while the SVM models are more accurate over longer horizons. Furthermore, SVM models appear to distinguish between recessions and tranquil periods better than probit models do. Finally, the most accurate forecasting models are those that include oil, stock returns and the term spread as leading indicators.
Article
To be efficient, logistics operations in e-commerce require warehousing and transportation resources to be aligned with sales. Customer orders must be fulfilled with short lead times to ensure high customer satisfaction, and the costly under-utilization of workers must be avoided. To approach this ideal, forecasting order quantities with high accuracy is essential. Many drivers of online sales, including seasonality, special promotions and public holidays, are well known, and they have been frequently incorporated into forecasting approaches. However, the impact of weather on e-commerce operations has not been rigorously analyzed. In this paper, we integrate weather data into the sales forecasting of the largest European online fashion retailer. We find that sunshine, temperature, and rain have a significant impact on daily sales, particularly in the summer, on weekends, and on days with extreme weather. Using weather forecasts, we have significantly improved sales forecast accuracy. We find that including weather data in the sales forecast model can lead to fewer sales forecast errors, reducing them by, on average, 8.6% to 12.2% and up to 50.6% on summer weekends. In turn, the improvement in sales forecast accuracy has a measurable impact on logistics and warehousing operations. We quantify the value of incorporating weather forecasts in the planning process for the order fulfillment center workforce and show how their incorporation can be leveraged to reduce costs and increase performance. With a perfect information planning scenario, excess costs can be reduced by 11.6% compared with the cost reduction attainable with a baseline model that ignores weather information in workforce planning.
Article
This paper shows that for five small commodity-exporting countries that have adopted inflation targeting monetary policies, world commodity price aggregates have predictive power for their CPI and PPI inflation, particularly once possible structural breaks are taken into account. This conclusion is robust to using either disaggregated or aggregated commodity price indexes (although the former perform better), the currency denomination of the commodity prices, and to using mixed-frequency data. In pseudo out-of-sample forecasting, commodity indexes outperform the random walk and AR processes, although the improvements over the latter are sometimes modest.
Article
This chapter summarizes recent literature on asymptotic inference about forecasts. Both analytical and simulation based methods are discussed. The emphasis is on techniques applicable when the number of competing models is small. Techniques applicable when a large number of models is compared to a benchmark are also briefly discussed.
Article
This paper estimates the output gap and potential output growth for Chile over 1986-2007 using three different methodologies: (i) a production function, (ii) a Kalman filter approach (univariate and multivariate), and (iii) a structural VAR. The output gap estimates are highly consistent with one another. The methods suggest that at the start of the sample the economy was overheated, with considerable positive gaps. From 1993 until the Asian Crisis, the gap was positive but small. The gap then turns negative with a gentle tendency to close, and becomes positive during 2007. To evaluate the different gap measures, we compare how close the real-time estimates are to the ex-post ones and how much the gap measures help predict future inflation. The methods yield similar estimates of potential output growth. For the full period, trend output growth is estimated at around 5.5%. However, there are marked differences across subperiods, showing a slowdown in the period after the 1999 recession.
Article
We propose a new model for multivariate forecasting which combines the Generalized Dynamic Factor Model (GDFM) and the GARCH model. The GDFM, applied to a huge number of series, captures the multivariate information and disentangles the common and the idiosyncratic part of each series of returns. In this financial analysis, both these components are modeled as a GARCH. We compare GDFM+GARCH and standard GARCH performance on samples up to 475 series, predicting both levels and volatility of returns. While results on levels are not significantly different, on volatility the GDFM+GARCH model outperforms the standard GARCH in most cases. These results are robust with respect to different volatility proxies.
Article
The output gap (measuring the deviation of output from its potential) is a crucial concept in the monetary policy framework, indicating demand pressure that generates inflation. The output gap is also an important variable in itself, as a measure of economic fluctuations. However, its definition and estimation raise a number of theoretical and empirical questions. This paper evaluates a series of univariate and multivariate methods for extracting the output gap, and compares their value added in predicting inflation. The multivariate measures of the output gap have by far the best predictive power. This is particularly interesting, as they use information from data that are not revised in real time. We therefore compare the predictive power of alternative indicators that are less revised in real time, such as the unemployment rate and other business cycle indicators. Some of the alternative indicators do as well as, or better than, the multivariate output gaps in predicting inflation. As uncertainties are particularly pronounced at the end of the calculation periods, assessment of pressures in the economy based on the uncertain output gap could benefit from being supplemented with alternative indicators that are less revised in real time.