Journal of Business & Economic Statistics
ISSN: 0735-0015 (Print) 1537-2707 (Online)
To cite this article: Marcelo C. Medeiros, Gabriel F. R. Vasconcelos, Álvaro Veiga & Eduardo Zilberman (2019): Forecasting Inflation in a Data-Rich Environment: The Benefits of Machine Learning Methods, Journal of Business & Economic Statistics, DOI: 10.1080/07350015.2019.1637745
To link to this article: https://doi.org/10.1080/07350015.2019.1637745
Accepted author version posted online: 01 Jul 2019. Published online: 19 Aug 2019.
Supplementary materials for this article are available online at http://tandfonline.com/r/JBES
Forecasting Inflation in a Data-Rich Environment: The Benefits of Machine Learning Methods

Marcelo C. MEDEIROS
Department of Economics, Pontifical Catholic University of Rio de Janeiro, Rua Marquês de São Vicente 225, Gávea, Rio de Janeiro 22451-900, Brazil (mcm@econ.puc-rio.br)

Gabriel F. R. VASCONCELOS
Department of Economics, University of California, Irvine, 3201 Social Sciences Plaza B, Irvine, CA 92617 (gabriel.vasconcelos@uci.edu)

Álvaro VEIGA
Department of Electrical Engineering, Pontifical Catholic University of Rio de Janeiro, Rua Marquês de São Vicente 225, Gávea, Rio de Janeiro 22451-900, Brazil (alvf@ele.puc-rio.br)

Eduardo ZILBERMAN
Central Bank of Chile and Department of Economics, Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Agustinas 1180, Santiago 867, Chile (ezilberman@bcentral.cl)
Inflation forecasting is an important but difficult task. Here, we explore advances in machine learning
(ML) methods and the availability of new datasets to forecast U.S. inflation. Despite the skepticism in the
previous literature, we show that ML models with a large number of covariates are systematically more
accurate than the benchmarks. The ML method that deserves more attention is the random forest model,
which dominates all other models. Its good performance is due not only to its specific method of variable
selection but also the potential nonlinearities between past key macroeconomic variables and inflation.
Supplementary materials for this article are available online.
KEY WORDS: Big data; Inflation forecasting; LASSO; Machine learning; Random forests.
1. INTRODUCTION
It is difficult to overemphasize the importance of forecasting
inflation in rational economic decision-making. Many contracts
concerning employment, sales, tenancy, and debt are set in
nominal terms. Therefore, inflation forecasting is of great
value to households, businesses, and policymakers. In addition,
central banks rely on inflation forecasts not only to inform
monetary policy but also to anchor inflation expectations and
thus enhance policy efficacy. Indeed, as part of an effort
to improve economic decision-making, many central banks
release inflation forecasts on a regular basis.
Despite the benefits of forecasting inflation accurately,
improving simple models has proved challenging. As Stock
and Watson (2010) emphasized, “it is exceedingly difficult
to improve systematically upon simple univariate forecasting
models, such as the Atkeson and Ohanian (2001) random
walk model [...] or the time-varying unobserved components
model in Stock and Watson (2007).” This conclusion is
supported by a large literature (Faust and Wright 2013), but this literature has largely ignored the recent machine learning (ML) and "big data" boom in economics.1 With a few exceptions, previous works either considered a restrictive set of variables or were based on a small set of factors computed from a larger number of predictors known as "diffusion indexes" (Stock and Watson 2002). In addition, most of these works focused on a time period when inflation was very persistent, which favors models that treat inflation as nonstationary.

1 See Mullainathan and Spiess (2017) for discussions of ML methods and big data in economics. In this article, an ML model is any statistical model that can handle a large set of covariates, describe nonlinear mappings nonparametrically, or both. Some of these methods even predate "machines."
“Big data” and ML methods are not passing fads, and
investigating whether the combination of the two is able to pro-
vide more accurate forecasts is of paramount importance. Gu,
Kelly, and Xiu (2018), for example, showed that ML methods coupled with hundreds of predictors substantially improve out-of-sample stock return predictions. In a similar spirit, despite
the previous skepticism, we argue that these methods lead to
more accurate inflation forecasts. We find that the gains of using
ML methods can be as large as 30% in terms of mean squared errors.2 Moreover, this new set of models can help uncover the main predictors for future inflation, possibly shedding light on the drivers of price dynamics.

© 2019 American Statistical Association
Journal of Business & Economic Statistics, 2019, DOI: 10.1080/07350015.2019.1637745
Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/r/jbes.
These findings are practically important in light of the large
forecast errors that many central banks, international institutions, and other forecasters have made recently. A striking example is the European Central Bank (ECB), whose projections have been systematically and substantially above realized inflation. Effective monetary policy depends on accurate
inflation forecasts; otherwise, the policy stance will be tighter or
looser than necessary. In addition, systematic forecasting errors
may undermine central banks’ credibility and their ability
to anchor inflation expectations. Furthermore, Svensson and
Woodford (2004) argued that optimal monetary policy can
be implemented through an inflation forecast-targeting regime
in which central banks target inflation forecasts over a given
horizon. Taken together, these arguments suggest potentially
large welfare costs associated with failures to forecast inflation.
Not surprisingly, such recent failures have fostered a debate on
inflation forecasting practices.3 This article contributes to this
debate by providing a comprehensive guide and assessment of
ML methods for one important case study, US inflation.
1.1. Main Takeaways
We contribute to the literature in a number of ways. First,
contrary to the previous evidence in Stock and Watson (1999,
2007), Atkeson and Ohanian (2001), and many others, our
results show that it is possible to consistently beat univariate
benchmarks for inflation forecasting, namely, random walk
(RW), autoregressive (AR), and unobserved components
stochastic volatility (UCSV) models. We consider several ML
models in a data-rich environment from FRED-MD, a monthly
database compiled by McCracken and Ng (2016), to forecast
U.S. consumer price index (CPI) inflation during more than 20
years of out-of-sample observations. We show that the gains
can be as large as 30% in terms of mean squared errors. Our
results are valid for different subsamples. Forecasting inflation
is important to inform rational economic decision-making.
However, as these decisions are made in real time, we check the
robustness of our results by considering a real-time experiment
from 2001 to 2015. The superiority of ML methods persists
even in real time.
2 To gauge the relevance of such gains, consider the rough welfare calculations
in Dellas et al. (2018). The authors use a textbook macro model with nominal
rigidities, in which inflation forecasting errors lead to inefficient output gap
volatility. If relative risk aversion is equal to two, agents are willing to
forgo 0.16% (0.34%) of their steady state consumption to avoid a forecasting
deterioration of 20% (50%) in terms of mean squared errors. These figures
are relatively large. As a ground for comparison, Lucas (1987), who originally
proposed this method to measure the welfare cost of business cycles, found it
to be 0.10%.
3 The roles of ML methods and big data were discussed at the Norges Bank's
workshop on “big data, machine learning and the macroeconomy” in 2017, as
well as at the ECB workshop on “economic forecasting with large datasets” in
2018. Banca D’Italia and Deutsche Bundesbank promoted similar workshops
in 2018 and 2019, respectively. Finally, staff of central banks have just started
to produce working papers with applications that combine big data, ML and
forecasting, for example, Chakraborty and Joseph (2017) from Bank of England
and Hall (2018) from Kansas City Fed.
Second, we highlight the main set of variables responsible
for these forecast improvements. Our results indicate that this
set of variables is not sparse, which corroborates the findings
of Giannone, Lenza, and Primiceri (2018). Indeed, we find
that ML models that do not impose sparsity are the best-
performing ones. By contrast, the high level of aggregation
of factor models, which have been among the most popular
models for macroeconomic forecasting, is not adequate. Fur-
thermore, either replacing standard principal component factors
with target factors, as advocated by Bai and Ng (2008), or using
boosting to select factors as in Bai and Ng (2009) improves the
results only marginally.
Finally, we pay special attention to a particular ML model,
the random forest (RF) of Breiman (2001), which systemati-
cally outperforms the benchmarks and several additional ML
and factor methods: the least absolute shrinkage and selection
operator (LASSO) family, which includes LASSO, adaptive
LASSO (adaLASSO), elastic net (ElNet), and the adaptive
elastic net (adaElNet); ridge regression (RR); Bayesian vec-
tor autoregressions (BVAR); and principal component fac-
tors, target factors (T. Factor), and linear ensemble methods
such as bagging, boosted factors (B. Factor), complete subset
regressions (CSR), and jackknife model averaging (JMA).
RF models are highly nonlinear nonparametric models that
have a tradition in statistics but have only recently attracted
attention in economics. This late success is partly due to
the new theoretical results developed by Scornet, Biau, and
Vert (2015) and Wager and Athey (2018). As robustness
checks, we also compare the RF models with four nonlinear
alternatives: deep neural networks (Deep NN), boosted trees
(BTrees), and a polynomial model estimated either by LASSO
or adaLASSO. We also compare our results to a linear model
estimated with nonconcave penalties (SCAD) as in Fan and Li
(2001).
There are several reasons why our results differ from those
of Stock and Watson (1999, 2007) and Atkeson and Ohanian (2001). The combination of ML methods with a large
dataset not considered by these authors is an obvious one.
The choice of a different sample can also explain the dif-
ferences. The above papers work in an environment where
inflation is clearly integrated, whereas inflation is stationary
in the period considered in this article. This fact alone makes
both the RW and UCSV models less attractive, as clearly
they are not suitable for stationary data. Nevertheless, if the
gains of the ML methods are due solely to the fact that
inflation is much less persistent, we would observe compet-
itive performance of the AR or factor models, which is not
the case. Although the performance of the RW and UCSV
models improves when accumulated inflation is considered,
the ML methods still achieve superior results. By construction,
accumulated inflation is much more persistent compared to
the monthly figures. Given all of these reasons, ML methods
coupled with large datasets merit serious consideration for
forecasting inflation.
To open the black box of ML methods, we compare the
variables selected by the adaLASSO, RR, and RF models.
Following McCracken and Ng (2016), we classify the variables
into eight groups: (i) output and income; (ii) labor market;
(iii) housing; (iv) consumption, orders, and inventories; (v)
money and credit; (vi) interest and exchange rates; (vii) prices;
and (viii) stock market. In addition, we consider AR terms
and the factors computed from the full set of predictors.
The most important variables for RR and RF models are
stable across horizons but are quite different between the
two specifications. For RR, AR terms, prices and employ-
ment are the most important predictors, resembling a sort
of backward-looking Phillips curve, whereas RF models give
more importance to disaggregated prices, interest-exchange
rates, employment and housing. The RF model resembles a
nonlinear Phillips curve in which the backward-looking terms
are disaggregated. The adaLASSO selection is quite different
across forecasting horizons and is, by construction and in
opposition to RF and RR models, sparse. Only AR terms retain
their relative importance independent of the horizon, and prices
gradually lose their relevance up to 6 months ahead but
partially recover for longer horizons. Output-income variables
are more important for medium-term forecasts. Finally, none
of the three classes of models selects either factors or stocks,
not even RR or RF, which produce nonsparse solutions.
This result may indicate that the high level of cross-section
aggregation of the factors is one possible cause for its poor
performance.
To disentangle the effects of variable selection from nonlin-
earity, we consider two alternative models. The first uses the
variables selected by RF and estimates a linear specification
by OLS. The second method estimates RF with only the
regressors selected by adaLASSO. Both models outperform
RF only for one-month-ahead forecasting. For longer horizons,
the RF model is still the winner, which provides evidence that
both nonlinearity and variable selection play a key role in the
superiority of the RF model.
There are many sources of nonlinearities that could justify
the superiority of the RF model. For instance, the relationship
between inflation and employment is nonlinear to the extent
that it depends on the degree of slackness in the economy.
Another source of nonlinearity is economic uncertainty, as this
uncertainty increases the option value of economic decision
delays if they have an irreversible component (Bloom 2009).
For example, if it is expensive to dismiss workers, hiring should be nonlinear in uncertainty. In addition, this real option argument also makes households and businesses less sensitive to changes in economic conditions when uncertainty is high. Hence, the responses of employment and inflation to interest rate decisions are arguably nonlinear in uncertainty. The presence of a zero lower bound on nominal interest rates and the implications of this bound for unconventional monetary policy are another source of nonlinearity among inflation, employment, and interest rate variables (Eggertsson and Woodford 2003).
Finally, to the extent that houses serve as collateral for loans, housing interacts with monetary policy (Iacoviello 2005) and financial intermediation (Mian and Sufi 2009). As in the Great Recession, a housing bubble can form, resulting in a deep credit crash
(Shiller 2014). Needless to say, these interactions are highly
nonlinear and arguably have nonlinear effects on inflation,
employment and interest rates. In line with these arguments, we
show that the gains of the RF model are larger during recessions
and periods of high uncertainty, especially during and after the
Great Recession.
1.2. A Brief Comparison to the Literature
The literature on inflation forecasting is vast, and there is
substantial evidence that models based on the Phillips curve do
not provide good forecasts. Although Stock and Watson (1999)
showed that many production-related variables are potential
predictors of US inflation, Atkeson and Ohanian (2001) showed
that in many cases, the Phillips curve fails to beat even sim-
ple naive models. These results inspired researchers to seek
different models and variables to improve inflation forecasts.
Among the variables used are financial variables (Forni et al.
2003), commodity prices (Chen, Turnovsky, and Zivot 2014),
and expectation variables (Groen, Paap, and Ravazzolo 2013).
However, there is no systematic evidence that these models
outperform the benchmarks.
Recently, due to advancements in computational power,
theoretical developments in ML, and the availability of large
datasets, researchers have started to consider the usage of
high-dimensional models in addition to the well-established
(dynamic) factor models. However, most of these studies have
either focused only on a very small subset of ML models or
presented a restrictive analysis. For example, Inoue and Kilian
(2008) considered bagging, factor models, and other linear
shrinkage estimators to forecast US inflation with a small set of
real economic activity indicators. Their study is more limited
than ours both in terms of the pool of models and richness
of the set of predictors. Nevertheless, the authors are among
the few voices suggesting that ML techniques can deliver
nontrivial gains over univariate benchmarks. Medeiros and
Mendes (2016) provided evidence that LASSO-based models
outperform both factor and AR benchmarks in forecasting US
CPI. However, their analysis was restricted to a single ML
method for only one-month-ahead forecasting.
The literature has mainly explored linear ML models. One
explanation for this limitation is that several of the papers in
the early days considered only univariate nonlinear models that
were, in most cases, outperformed by simple benchmarks; see
Teräsvirta, van Dijk, and Medeiros (2005). An exception is
Nakamura (2005), who showed that neural networks outper-
form univariate autoregressive models for short horizons.
Recently, Garcia, Medeiros, and Vasconcelos (2017) applied
a large number of ML methods, including RFs, to real-time
inflation forecasting in Brazil. The results indicated a superior-
ity of the CSR method of Elliott, Gargano, and Timmermann
(2013). However, an important question is whether this is
a particular result for Brazil or if similar findings can be
replicated for the U.S. economy. The first difference between
the results presented here and those in Garcia, Medeiros, and
Vasconcelos (2017) is that the RF model robustly outperforms
its competitors and CSR does not perform particularly well.
With respect to the set of models considered, on the one hand, in
this article we employ more ML models than Garcia, Medeiros,
and Vasconcelos (2017), but on the other hand, we do not have a
real-time daily database of inflation expectations as in the case
of Brazil. We also provide a much richer discussion of variable
selection and the nature of the best-performing models.
Finally, it is important to contextualize our work in light
of the criticisms of Makridakis, Spiliotis, and Assimakopoulos
(2018) with respect to the ability of ML methods to produce
reliable forecasts. The methods considered here are much
different than those in the study of Makridakis, Spiliotis, and
Assimakopoulos (2018). While we consider modern ML tools
such as RF, shrinkage, and bagging, the authors focus on simple
regression trees, shallow neural networks, and support vector
machines. Furthermore, the models considered in Makridakis,
Spiliotis, and Assimakopoulos (2018) are univariate in the
sense that no other variables apart from lags are used as
predictors.
1.3. Organization of the Article
Section 2 gives an overview of the data. Section 3 describes
the forecasting methodology. Section 4 describes the models
used in the article. Section 5 provides the main results. Sec-
tion 6 concludes. The online supplementary materials provide
additional results. Tables and figures labeled with an “S” refer
to this supplement.
2. DATA
Our data consist of variables from the FRED-MD database,
which is a large monthly macroeconomic dataset designed for
empirical analysis in data-rich environments. The dataset is
updated in real time through the FRED database and is available
from Michael McCracken's webpage.4 For further details, we
refer to McCracken and Ng (2016).
We use the vintage as of January 2016. Our sample extends
from January 1960 to December 2015 (672 observations), and
only variables with all observations in the sample period are
used (122 variables). In addition, we include as potential pre-
dictors the four principal component factors computed from this
set of variables. We consider four lags of all variables, as well
as four autoregressive terms. Hence, the analysis contemplates
508 potential predictors. The out-of-sample window is from
January 1990 to December 2015. All variables are transformed
to achieve stationarity as described in the supplementary materials. πt is the inflation in month t, computed as πt = log(Pt) − log(Pt−1), where Pt is a given price index in period t. The baseline
price index is the CPI, but in the supplementary materials we
report results for the PCE and the core CPI inflation. Figure S.1
in this supplement displays the evolution of inflation measures
during the full sample period. The noncore inflation measures
have a large outlier in November 2008 associated with a
large decline in oil prices. This outlier can certainly affect the
estimation of the models considered in this article. To attenuate its effects, we include a dummy variable for November 2008 in all models estimated after that date. We do not include any
look-ahead bias, as the dummy is added only after the outlier
has been observed by the econometrician.
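For concreteness, the price transformation above can be sketched numerically. The snippet below uses a made-up price series, not the actual CPI data:

```python
import numpy as np

# Hypothetical monthly price index levels P_t (CPI-like, illustrative only)
P = np.array([100.0, 100.5, 101.0, 100.8, 101.5])

# Monthly inflation: pi_t = log(P_t) - log(P_{t-1})
pi = np.diff(np.log(P))

# Accumulated inflation over the last h months is the sum of monthly log-changes
h = 3
pi_acc = pi[-h:].sum()   # equals log(P[-1]) - log(P[-1-h])
```

Because inflation is a log-difference, accumulated inflation over h months is simply the sum of the h monthly figures.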
We compare performance across models in the out-of-
sample window and in two different subsample periods,
namely, January 1990 to December 2000 (132 observations)
and January 2001 to December 2015 (180 observations). The
first sample corresponds to a period of low inflation volatility
(σ = 0.17%), while in the second sample, inflation is more volatile (σ = 0.32%). However, on average, inflation is higher during 1990–2000 than 2001–2015 and much more persistent as well. Relative to the 1990–2000 period, inflation was more volatile near the recession in the early 1990s. Table S.9 in the supplementary material provides descriptive statistics and gives an overview of the economic scenario in each subsample.

4 https://research.stlouisfed.org/econ/mccracken/fred-databases/.
As a robustness check, we also report the results of a real-
time experiment using our second subsample, from 2001 to
2015. We choose this particular period because the real-time
vintages are easily available from the FRED-MD database. For
the real-time experiment, the number of potential regressors
varies according to the vintage.
3. METHODOLOGY
Consider the following model:

πt+h = Gh(xt) + ut+h,  h = 1, ..., H,  t = 1, ..., T,  (1)

where πt+h is the inflation in month t + h; xt = (x1t, ..., xnt)′ is an n-vector of covariates possibly containing lags of πt and/or common factors, as well as a large set of potential predictors; Gh(·) is the mapping between covariates and future inflation; and ut+h is a zero-mean random error. The target function Gh(xt) can be a single model or an ensemble of different specifications. There is a different mapping for each forecasting horizon.
The direct forecasting equation is given by

π̂t+h|t = Ĝh,t−Rh+1:t(xt),  (2)

where Ĝh,t−Rh+1:t is the target function estimated with data from time t − Rh + 1 up to t, and Rh is the window size, which varies according to the forecasting horizon and the number of lagged variables in the model. We consider direct forecasts, as we do not make any attempt to predict the covariates. The only exception is the BVAR model, where joint forecasts for all predictors are computed in a straightforward manner following the procedure described in Bańbura, Giannone, and Reichlin (2010).
The forecasts are based on a rolling-window framework
of fixed length. However, the actual in-sample number of
observations depends on the forecasting horizon. For example,
for the 1990–2000 period, the number of observations is Rh = 360 − h − p − 1, where p is the number of lags in the model. For 2001–2015, Rh = 492 − h − p − 1. We choose to work in a
rolling-window scheme for two reasons: to attenuate the effects
of potential structural breaks and outliers and to avoid problems
of running superior predictive performance tests among nested
models; see the discussion in Giacomini and White (2006).
For instance, with a rolling-window scheme, the unconditional
Giacomini–White (GW) test is equivalent to the traditional
Diebold–Mariano (DM) test.
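The rolling-window direct-forecasting scheme can be sketched generically. In the code below, simulated data and a trivial window-mean "model" stand in for the actual specifications; the function names and data are illustrative assumptions, not the authors' code:

```python
import numpy as np

def rolling_direct_forecasts(y, X, h, R, fit_predict):
    # At origin t, train on the R most recent pairs (X[s], y[s + h]) with
    # s <= t - h (fixed-length rolling window) and forecast y[t + h] from X[t].
    T = len(y)
    preds = {}
    for t in range(R + h - 1, T - h):
        s_end = t - h                 # last regressor date whose target is known
        s_start = s_end - R + 1       # rolling window of fixed length R
        Xtr = X[s_start:s_end + 1]
        ytr = y[s_start + h:s_end + h + 1]
        preds[t + h] = fit_predict(Xtr, ytr, X[t])
    return preds

# Trivial stand-in "model": forecast with the in-window mean of the target
mean_model = lambda Xtr, ytr, xnew: ytr.mean()

rng = np.random.default_rng(0)
y = rng.normal(size=60)
X = y.reshape(-1, 1)                  # trivial covariate set, for the sketch only
out = rolling_direct_forecasts(y, X, h=1, R=20, fit_predict=mean_model)
```

Because the window has fixed length R, early observations are discarded as the origin moves forward, which is what attenuates the effect of structural breaks and outliers.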
In addition to three benchmark specifications (RW, AR, and
UCSV models), we consider factor-augmented AR models,
sparsity-inducing shrinkage estimators (LASSO, adaLASSO,
ElNet, and adaElNet), other shrinkage methods that do not
induce sparsity (RR and BVAR with Minnesota priors), aver-
aging (ensemble) methods (bagging, CSR, and JMA) and
RF. Bagging and CSR can be viewed as nonsparsity-inducing
shrinkage estimators. With respect to the factor-augmented AR
Medeiros et al.: Forecasting Inflation in a Data-Rich Environment: The Benefits of Machine Learning Methods 5
models, we consider in addition to the standard factors com-
puted with principal component analysis a set of target factors
as in Bai and Ng (2008) and boosted factors as in Bai and Ng
(2009). We also include in the comparison three different model
combination schemes, namely, the simple average, the trimmed
average, and the median of the forecasts. For both the shrinkage
and factor-based methods, the predictors are standardized before estimation.
We find that RF, a highly nonlinear method, robustly outper-
forms the other methods. To disentangle the role of variable
selection from nonlinearity, we also consider a linear model
where the regressors are selected by RFs (RF/ordinary least
squares, OLS) and an RF model with regressors preselected by
adaLASSO (adaLASSO/RF).
Forecasts for the accumulated inflation over the following
3, 6, and 12 months are computed, with the exception of the
RW and UCSV models, by aggregating the individual forecasts
for each horizon. In the case of the RW and UCSV models,
a different specification is used to construct the forecasts (see
below).
4. MODELS
4.1. Benchmark Models
The first benchmark is the RW model, where for h = 1, ..., 12 the forecasts are π̂t+h|t = πt. For the accumulated h-month forecast, we set π̂t+1:t+h|t = πt−(h−1):t, where πt−(h−1):t is the accumulated inflation over the previous h months.
The second benchmark is the autoregressive (AR) model of order p, where p is determined by the Bayesian information criterion (BIC) and the parameters are estimated by OLS. The forecast equation is π̂t+h|t = φ̂0,h + φ̂1,h πt + ··· + φ̂p,h πt−p+1.
There is a different model for each horizon. The accumulated
forecasts are computed by aggregating the individual forecasts.
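A minimal sketch of this direct AR benchmark, with BIC lag selection, is given below. The simulated series and p_max are illustrative assumptions; for simplicity, the estimation samples compared by the BIC differ slightly across p, a shortcut the sketch flags in a comment:

```python
import numpy as np

def fit_ar_direct(pi, h, p_max=6):
    # Direct h-step AR(p): regress pi_{t+h} on (1, pi_t, ..., pi_{t-p+1}).
    # p is chosen by BIC (note: samples differ slightly across p in this sketch).
    best = None
    for p in range(1, p_max + 1):
        rows = np.arange(p - 1, len(pi) - h)
        Xp = np.column_stack([np.ones(len(rows))] +
                             [pi[rows - j] for j in range(p)])
        yp = pi[rows + h]
        beta, *_ = np.linalg.lstsq(Xp, yp, rcond=None)
        resid = yp - Xp @ beta
        T = len(yp)
        bic = T * np.log(resid @ resid / T) + (p + 1) * np.log(T)
        if best is None or bic < best[0]:
            best = (bic, p, beta)
    _, p, beta = best
    return p, beta

# Toy AR(1)-type "inflation" series (illustrative only)
rng = np.random.default_rng(1)
pi = rng.normal(size=200)
for t in range(1, len(pi)):
    pi[t] += 0.5 * pi[t - 1]

p, beta = fit_ar_direct(pi, h=1)
lags = np.array([pi[-1 - j] for j in range(p)])   # pi_T, ..., pi_{T-p+1}
forecast = beta[0] + beta[1:] @ lags
```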
Finally, the third benchmark is the UCSV model, described as follows:

πt = τt + e^(ht/2) εt,  τt = τt−1 + ut,  ht = ht−1 + vt,

where {εt} is a sequence of independent and normally distributed random variables with zero mean and unit variance, and ut and vt are also normal with zero mean and variances given by inverse-gamma priors. τ1 ∼ N(0, Vτ) and h1 ∼ N(0, Vh), where Vτ = Vh = 0.12. The model is estimated by Markov chain Monte Carlo (MCMC) methods. The h-steps-ahead forecast is computed as π̂t+h|t = τ̂t|t. For accumulated forecasts, the UCSV is estimated with the accumulated h-month inflation as the dependent variable.
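The UCSV law of motion is easy to simulate, which clarifies the roles of the two random-walk state equations. The following is a toy simulation with illustrative innovation standard deviations, not the MCMC estimation used in the article:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 500
sig_u, sig_v = 0.1, 0.1       # state innovation std devs (illustrative values)

tau = np.zeros(T)             # stochastic trend: tau_t = tau_{t-1} + u_t
hvol = np.zeros(T)            # log-volatility:   h_t = h_{t-1} + v_t
pi = np.zeros(T)              # pi_t = tau_t + exp(h_t / 2) * eps_t

for t in range(1, T):
    tau[t] = tau[t - 1] + sig_u * rng.normal()
    hvol[t] = hvol[t - 1] + sig_v * rng.normal()
    pi[t] = tau[t] + np.exp(hvol[t] / 2) * rng.normal()

# The point forecast at any horizon is the current trend estimate;
# here we read off the simulated trend directly.
forecast = tau[-1]
```

The trend τt carries all forecastable variation, which is why the same value τ̂t|t serves as the forecast at every horizon.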
4.2. Shrinkage
In this article, we estimate several shrinkage estimators for linear models where Gh(xt) = β′h xt and

β̂h = argmin_βh { Σ_{t=1}^{T−h} (πt+h − β′h xt)² + Σ_{i=1}^{n} p(βh,i; λ, ωi) }.  (3)

Here, p(βh,i; λ, ωi) is a penalty function that depends on the penalty parameter λ and on a weight ωi > 0. We consider different choices for the penalty function.
4.2.1. Ridge Regression (RR). RR was proposed by Hoerl and Kennard (1970). The penalty is given by

Σ_{i=1}^{n} p(βh,i; λ, ωi) := λ Σ_{i=1}^{n} β²h,i.  (4)

RR has the advantage of having an analytic solution that is easy to compute, and it also shrinks the coefficients associated with less-relevant variables to nearly zero. However, the coefficients rarely reach exactly zero for any value of λ.
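A minimal sketch of the closed-form ridge solution on simulated data (the λ values are illustrative):

```python
import numpy as np

def ridge(X, y, lam):
    # Closed-form ridge solution: beta = (X'X + lam * I)^{-1} X'y
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
beta_true = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=100)

b_small = ridge(X, y, lam=0.1)
b_large = ridge(X, y, lam=1e6)   # heavy shrinkage: near, but not exactly, zero
```

With a very large λ, every coefficient is pushed toward zero, but none reaches it exactly, which is the key contrast with the LASSO family below.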
4.2.2. Least Absolute Shrinkage and Selection Operator (LASSO). LASSO was proposed by Tibshirani (1996), where the penalty is

Σ_{i=1}^{n} p(βh,i; λ, ωi) := λ Σ_{i=1}^{n} |βh,i|.  (5)

LASSO shrinks the coefficients of irrelevant variables to zero. However, model selection consistency is achieved only under very stringent conditions.
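LASSO has no closed form, but a standard way to compute it is coordinate descent with soft-thresholding. The sketch below (simulated data, illustrative λ) is one such generic implementation, not the authors' code:

```python
import numpy as np

def soft_threshold(z, g):
    return np.sign(z) * np.maximum(np.abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Coordinate descent for (1/2) * ||y - X b||^2 + lam * sum_i |b_i|
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]   # partial residual excluding j
            b[j] = soft_threshold(X[:, j] @ r_j, lam) / (X[:, j] @ X[:, j])
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
beta_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=100)
b = lasso_cd(X, y, lam=20.0)
```

Unlike ridge, the soft-threshold step sets the coefficients of irrelevant regressors exactly to zero.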
4.2.3. Adaptive Least Absolute Shrinkage and Selection Operator (adaLASSO). adaLASSO was proposed by Zou (2006) to achieve model selection consistency. adaLASSO uses the same penalty as LASSO with the inclusion of a weighting parameter that comes from a first-step estimation. The penalty is given by

Σ_{i=1}^{n} p(βh,i; λ, ωi) := λ Σ_{i=1}^{n} ωi |βh,i|,  (6)

where ωi = |β*h,i|⁻¹ and β*h,i is the coefficient from the first-step estimation. adaLASSO can handle many more variables than observations and works well in non-Gaussian environments and under heteroscedasticity (Medeiros and Mendes 2016).
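The two-step nature of adaLASSO can be sketched as follows. For illustration we assume a ridge first step to build the weights (any consistent first-step estimator would do); data and tuning values are made up:

```python
import numpy as np

def weighted_lasso_cd(X, y, lam, w, n_iter=200):
    # Coordinate descent with per-coefficient penalties lam * w[j] * |b_j|
    p = X.shape[1]
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]   # partial residual excluding j
            z = X[:, j] @ r_j
            b[j] = np.sign(z) * max(abs(z) - lam * w[j], 0.0) / (X[:, j] @ X[:, j])
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
beta_true = np.array([2.0, 0.0, 0.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=100)

# First step: ridge estimate gives the adaptive weights w_i = 1 / |beta*_i|
b_first = np.linalg.solve(X.T @ X + 0.1 * np.eye(8), X.T @ y)
w = 1.0 / np.abs(b_first)

# Second step: weighted LASSO; large first-step coefficients are penalized less
b_ada = weighted_lasso_cd(X, y, lam=2.0, w=w)
```

Variables with small first-step coefficients receive huge weights and are eliminated, while large coefficients suffer almost no shrinkage.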
4.2.4. Elastic Net (ElNet). ElNet is a generalization that includes LASSO and RR as special cases. It is a convex combination of the ℓ1 and ℓ2 norms (Zou and Hastie 2005). The ElNet penalty is defined as

Σ_{i=1}^{n} p(βh,i; λ, ωi) := αλ Σ_{i=1}^{n} β²h,i + (1 − α)λ Σ_{i=1}^{n} |βh,i|,  (7)

where α ∈ [0, 1]. We also consider an adaptive version of ElNet (adaElNet), which works in the same way as adaLASSO.
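A coordinate-descent sketch under the parameterization of Equation (7), in which α = 1 recovers ridge and α = 0 recovers LASSO (illustrative data and tuning values, not the authors' implementation):

```python
import numpy as np

def elnet_cd(X, y, lam, alpha, n_iter=200):
    # Coordinate descent for the objective
    #   sum_t (y_t - x_t'b)^2 + alpha*lam*sum_i b_i^2 + (1-alpha)*lam*sum_i |b_i|
    # (the article's parameterization: alpha weights the ridge part).
    p = X.shape[1]
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]   # partial residual excluding j
            z = X[:, j] @ r_j
            thr = (1.0 - alpha) * lam / 2.0    # soft threshold from the l1 part
            b[j] = (np.sign(z) * max(abs(z) - thr, 0.0)
                    / (X[:, j] @ X[:, j] + alpha * lam))
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
beta_true = np.array([1.5, 0.0, 0.0, -1.0, 0.0, 0.0])
y = X @ beta_true + 0.1 * rng.normal(size=100)

b_mix = elnet_cd(X, y, lam=40.0, alpha=0.5)   # both shrinkage and selection
```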
4.3. Factor Models
The idea of the factor model is to extract common compo-
nents from all predictors, thus reducing the model dimension.
Factors are computed as principal components of a large set of
variables ztsuch that Ft=Azt, where Ais a rotation matrix
and Ftis the vector of the principal components. Consider
Equation (1). In this case, xtis given by πtj,j=0, 1, 2, 3
plus ftj,j=0, 1, 2, 3, where ftis the vector with the first four
principal components of zt. The theory behind factor models
can be found in Bai and Ng (2003).
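Factor extraction via principal components can be sketched as follows, using simulated data with two true common factors (an eigendecomposition of the standardized predictors' covariance matrix; illustrative only):

```python
import numpy as np

def principal_component_factors(Z, k):
    # Standardize, then extract the first k principal components as factors.
    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    cov = Zs.T @ Zs / len(Zs)           # covariance matrix of the predictors
    eigval, eigvec = np.linalg.eigh(cov)
    A = eigvec[:, ::-1][:, :k]          # loadings of the k largest eigenvalues
    return Zs @ A                       # rows are F_t = A'z_t

rng = np.random.default_rng(0)
T, n = 200, 30
common = rng.normal(size=(T, 2))        # two true common factors
load = rng.normal(size=(2, n))
Z = common @ load + 0.5 * rng.normal(size=(T, n))

F = principal_component_factors(Z, k=4)
```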
4.3.1. Target Factors. To improve the forecasting performance of factor models, Bai and Ng (2008) proposed targeting the predictors. If many variables in zt are irrelevant predictors of πt+h, factor analysis using all variables may result in noisy factors with poor forecasting ability. The idea is to compute the principal components using only the variables with high predictive power for future inflation. Let zi,t, i = 1, ..., q, be the candidate variables and wt a set of controls. We follow Bai and Ng (2008) and use lagged values of πt as controls. The procedure is as follows.

1. For i = 1, ..., q, regress πt+h on wt and zi,t and compute the t-statistic of the coefficient on zi,t.
2. Choose a significance level α and select all variables that are significant according to the computed t-statistics.
3. Let zt(α) denote the selected variables from Steps 1–2. Estimate the factors F̂t from zt(α) by principal components.
4. Regress πt+h on wt and f̂t−j, j = 0, 1, 2, 3, where f̂t ⊆ F̂t. The number of factors in f̂t is selected using the BIC.
The same procedure was used by Medeiros and Vasconcelos
(2016) to forecast US CPI inflation. The authors showed that
the target factors slightly reduce the forecasting errors.
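Steps 1-3 of the targeting procedure can be sketched as follows. This is a NumPy illustration with hypothetical names and with simulated noise standing in for the lagged-inflation controls; Step 4 (the BIC-selected forecasting regression) is omitted for brevity.

```python
import numpy as np

def tstat_last(y, X):
    """OLS t-statistic of the last regressor, with an intercept prepended."""
    Xc = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ b
    s2 = resid @ resid / (len(y) - Xc.shape[1])
    V = s2 * np.linalg.inv(Xc.T @ Xc)
    return b[-1] / np.sqrt(V[-1, -1])

def target_factors(y_ahead, W, Z, k=4, tcrit=1.96):
    """Keep the candidates whose t-statistic, controlling for W, is
    significant, then extract principal components from the kept set."""
    keep = [i for i in range(Z.shape[1])
            if abs(tstat_last(y_ahead, np.column_stack([W, Z[:, i]]))) > tcrit]
    Zk = Z[:, keep]
    Zk = (Zk - Zk.mean(0)) / Zk.std(0)
    _, vecs = np.linalg.eigh(Zk.T @ Zk / len(Zk))
    return Zk @ vecs[:, ::-1][:, :k], keep

rng = np.random.default_rng(1)
Z = rng.normal(size=(300, 12))
W = rng.normal(size=(300, 1))          # stand-in for the lagged-inflation controls
y_ahead = Z[:, 0] + Z[:, 1] + 0.5 * rng.normal(size=300)
F, keep = target_factors(y_ahead, W, Z)
```

In this toy example the two genuinely predictive candidates survive the pretest, so the factors are extracted from a much cleaner panel.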
4.3.2. Factor Boosting. The optimal selection of factors for predictive regressions is an open problem in the literature. Even if the factor structure is clear in the data, it is not obvious that only the most relevant factors should be included in the predictive regression. We adopt the boosting algorithm as in Bai and Ng (2009) to select the factors and the number of lags in the model. Define z_t \in R^q as the set of all n factors computed from the original n variables plus four lags of each factor; therefore, q = 5n.
The algorithm is defined as follows:

1. Let \hat{\pi}_{t,0} = \bar{\pi} for each t, where \bar{\pi} = \frac{1}{T}\sum_{i=1}^{T} \pi_i.
2. For m = 1, ..., M:
   (a) Compute \hat{u}_t = \pi_t - \hat{\pi}_{t,m-1}.
   (b) For each candidate variable i = 1, ..., q, regress the current residual on z_{i,t} to obtain \hat{b}_i, and compute \hat{e}_{t,i} = \hat{u}_t - z_{i,t}\hat{b}_i. Calculate SSR_i = \hat{e}_i'\hat{e}_i.
   (c) Select i^*_m as the index of the variable that delivers the smallest SSR, and define \hat{\phi}_{t,m} = z_{i^*_m,t}\hat{b}_{i^*_m}.
   (d) Update \hat{\pi}_{t,m} = \hat{\pi}_{t,m-1} + v\hat{\phi}_{t,m}, where v is the step length. We set v = 0.2.
3. Stop the algorithm after the Mth iteration or when the BIC starts to increase.
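The steps above amount to component-wise least-squares boosting and can be sketched compactly. This NumPy illustration uses our own function name and simulated candidates, fixes the number of iterations, and omits the BIC early-stopping rule.

```python
import numpy as np

def cw_boost(y, Z, v=0.2, M=60):
    """Component-wise least-squares boosting: start from the mean; at each
    iteration regress the residual on every candidate column, keep the
    column with the smallest SSR, and move a step v toward its fit."""
    fit = np.full(len(y), y.mean())
    for _ in range(M):
        u = y - fit
        b = (Z * u[:, None]).sum(0) / (Z ** 2).sum(0)   # per-column OLS slopes
        ssr = ((u[:, None] - Z * b) ** 2).sum(0)        # SSR_i for each candidate
        i = int(ssr.argmin())
        fit = fit + v * b[i] * Z[:, i]                  # small step toward winner
    return fit

rng = np.random.default_rng(2)
Z = rng.normal(size=(250, 30))
y = 3.0 * Z[:, 5] + 0.2 * rng.normal(size=250)
fit = cw_boost(y, Z)
```

Because only one candidate moves per iteration and the step is small, the procedure performs implicit variable selection while shrinking the fit, which is why an information criterion can be used as a stopping rule.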
4.4. Ensemble Methods
Ensemble forecasts are constructed as a (weighted) average of the predictions of an ensemble of methods.
4.4.1. Bagging. The term "bagging" comes from bootstrap aggregation, which was proposed by Breiman (1996). The idea is to combine forecasts from several unstable models estimated on different bootstrap subsamples. Normally, there is much more to gain from combining models when they are very different. The bagging steps are as follows:
1. For each bootstrap sample, run an OLS regression with
all candidate variables and select those with an absolute t-
statistic above a certain threshold c.
2. Estimate a new regression only with the variables selected in
the previous step.
3. The coefficients from the second regression are finally used
to compute the forecasts on the actual sample.
4. Repeat the first three steps for B bootstrap samples and compute the final forecast as the average of the B forecasts.
Note that in our case, the number of observations may be smaller than the number of variables, which makes the regression in the first step infeasible. We solve this issue by, for each bootstrap subsample, randomly dividing the variables into groups and running the pretesting step within each group.
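The four bagging steps can be sketched as follows. This is a NumPy illustration with simulated data: the function name is ours, the bootstrap is the plain iid one, and the variable-grouping device for n > T is omitted.

```python
import numpy as np

def bagging_forecast(y, X, x_new, B=100, c=1.96, seed=0):
    """Bagging with a pretesting step: on each bootstrap sample, run a full
    OLS, keep the regressors with |t| > c, refit on the kept set, and
    average the B forecasts at the point x_new."""
    rng = np.random.default_rng(seed)
    T, n = X.shape
    fcasts = []
    for _ in range(B):
        idx = rng.integers(0, T, T)              # bootstrap sample
        Xb, yb = X[idx], y[idx]
        b, *_ = np.linalg.lstsq(Xb, yb, rcond=None)
        resid = yb - Xb @ b
        s2 = resid @ resid / (T - n)
        se = np.sqrt(s2 * np.diag(np.linalg.pinv(Xb.T @ Xb)))
        keep = np.where(np.abs(b / se) > c)[0]   # pretest at threshold c
        if keep.size == 0:
            fcasts.append(yb.mean())
            continue
        b2, *_ = np.linalg.lstsq(Xb[:, keep], yb, rcond=None)
        fcasts.append(float(x_new[keep] @ b2))
    return float(np.mean(fcasts))

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + 0.5 * rng.normal(size=200)
f = bagging_forecast(y, X, np.array([1.0, 0.0, 0.0, 0.0, 0.0]))
```

Averaging over bootstrap samples smooths out the hard thresholding of the pretest, which is the source of bagging's variance reduction.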
4.4.2. Complete Subset Regressions (CSR). CSR was developed by Elliott, Gargano, and Timmermann (2013, 2015). The motivation is that selecting the optimal subset of x_t to predict \pi_{t+h} by testing all possible combinations of regressors is computationally very demanding and, in most cases, infeasible. Suppose that we have n candidate variables. The idea is to select a number q \leq n of variables and run regressions on all possible subsets of q out of the n variables. The final forecast is the average over all forecasts.

CSR handles only a small number of variables well: for large sets, the number of regressions to be estimated increases rapidly. For example, with n = 25 and q = 4, there are 12,650 regressions. Therefore, we adopt a pretesting procedure similar to that used with the target factors. We start by fitting a regression of \pi_{t+h} on each of the candidate variables (including lags) and saving the t-statistic of each variable.5 The t-statistics are ranked by absolute value, and we select the \tilde{n} variables that rank highest. The CSR forecast is then computed on these variables.
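The core averaging step can be sketched as follows; the pretesting that first shrinks the candidate set to \tilde{n} variables is omitted, and the function name and simulated design are our own. With n = 25 and q = 4 this same loop would run the 12,650 regressions mentioned above.

```python
import numpy as np
from itertools import combinations

def csr_forecast(y, X, x_new, q=2):
    """Complete subset regression: run an OLS (with intercept) on every
    size-q subset of the candidates and average the resulting forecasts."""
    preds = []
    for cols in combinations(range(X.shape[1]), q):
        cols = list(cols)
        Xs = np.column_stack([np.ones(len(y)), X[:, cols]])
        b, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        preds.append(b[0] + x_new[cols] @ b[1:])
    return float(np.mean(preds)), len(preds)

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 5))
y = X[:, 0] + 0.3 * rng.normal(size=300)
f, n_models = csr_forecast(y, X, np.array([1.0, 0.0, 0.0, 0.0, 0.0]))
```

Note the built-in shrinkage: since only 4 of the 10 size-2 subsets contain the relevant regressor, the averaged forecast is pulled toward zero, a deliberate bias-variance trade-off in CSR.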
4.4.3. Jackknife Model Averaging (JMA). JMA is a method to combine forecasts from different models. Instead of using a simple average, JMA uses leave-one-out cross-validation to estimate optimal weights; see Hansen and Racine (2012) and Zhang, Wan, and Zou (2013).

Suppose we have M candidate models to average over, and write the forecast of model m as \hat{\pi}^{(m)}_{t+h}, m = 1, ..., M. The final forecast is

\hat{\pi}_{t+h} = \sum_{m=1}^{M} \omega_m \hat{\pi}^{(m)}_{t+h},

where 0 \leq \omega_m \leq 1 for all m \in \{1, ..., M\} and \sum_{m=1}^{M} \omega_m = 1.
The JMA procedure is as follows:

1. For each observation (x_t, \pi_{t+h}): (a) Estimate all candidate models leaving the selected observation out of the estimation. Since we are in a time-series framework with lags in the model, we also remove the four observations before and the four observations after (x_t, \pi_{t+h}). (b) Compute the forecasts from each model for the observations that were removed in the previous step.
2. Choose the weights that minimize the cross-validation errors subject to the constraints described above.

Each candidate model in the JMA has four lags of inflation and four lags of a single candidate variable.
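Given the matrix of leave-one-out predictions, the weight-selection step is a quadratic program over the simplex. The sketch below solves it by projected gradient descent, a simple stand-in for the solvers used in the JMA literature; the function names, the projection routine, and the two-model example are ours.

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection onto the simplex {w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1.0) > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def jma_weights(cv_preds, y, iters=2000):
    """Choose simplex-constrained weights minimizing the cross-validation
    error ||y - cv_preds @ w||^2. cv_preds[t, m] holds model m's prediction
    for observation t computed with that observation held out."""
    M = cv_preds.shape[1]
    w = np.full(M, 1.0 / M)
    step = 1.0 / (2.0 * np.linalg.norm(cv_preds, 2) ** 2)  # 1/Lipschitz
    for _ in range(iters):
        grad = 2.0 * cv_preds.T @ (cv_preds @ w - y)
        w = proj_simplex(w - step * grad)
    return w

rng = np.random.default_rng(6)
y = rng.normal(size=150)
cv_preds = np.column_stack([y + 0.1 * rng.normal(size=150),   # accurate model
                            y + 1.0 * rng.normal(size=150)])  # noisy model
w = jma_weights(cv_preds, y)
```

As expected, nearly all of the weight goes to the model with the smaller cross-validation error rather than being split evenly.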
5We do not use a fixed set of controls, wt, in the pretesting procedure like we
did for the target factors.
Medeiros et al.: Forecasting Inflation in a Data-Rich Environment: The Benefits of Machine Learning Methods
4.5. Random Forests
The RF model was proposed by Breiman (2001) to reduce the variance of regression trees and is based on bootstrap aggregation (bagging) of randomly constructed regression trees. In turn, a regression tree is a nonparametric model that approximates an unknown nonlinear function with local predictions, using recursive partitioning of the space of the covariates (Breiman 1996).
To understand how a regression tree works, an example from Hastie, Tibshirani, and Friedman (2001) is useful. Consider a regression problem in which X_1 and X_2 are explanatory variables, each taking values in some given interval, and Y is the dependent variable. We first split the space into two regions at X_1 = s_1, and then the region to the left (right) of X_1 = s_1 is split at X_2 = s_2 (X_1 = s_3). Finally, the region to the right of X_1 = s_3 is split at X_2 = s_4. As illustrated in the left plot of Figure 1, the final result is a partition into five regions: R_k, k = 1, ..., 5. In each region R_k, the model predicts Y with a constant c_k, which is estimated as the sample average of the realizations of Y that "fall" within region R_k. A key advantage of this recursive binary partition is that it can be represented as a single tree, as illustrated in the right plot of Figure 1. Each region corresponds to a terminal node of the tree. Given a dependent variable \pi_{t+h}, a set of predictors x_t, and a number of terminal nodes K, the splits are determined to minimize the sum of squared errors of the regression model

\pi_{t+h} = \sum_{k=1}^{K} c_k I_k(x_t; \theta_k),

where I_k(x_t; \theta_k) is an indicator function such that

I_k(x_t; \theta_k) = 1 if x_t \in R_k(\theta_k), and 0 otherwise.

Here, \theta_k is the set of parameters that define the kth region. I_k(x_t; \theta_k) is in fact a product of indicator functions, each of which defines one of the splits that yields the kth region.
RF is a collection of regression trees, each estimated on a bootstrap sample of the original data. Since we are dealing with time series, we use a block bootstrap. Suppose there are B bootstrap samples. For each sample b, b = 1, ..., B, a tree with K_b regions is estimated on a randomly selected subset of the original regressors. K_b is determined so as to leave a minimum number of observations in each region. The final forecast is the average of the forecasts of each tree applied to the original data:

\hat{\pi}_{t+h} = \frac{1}{B} \sum_{b=1}^{B} \sum_{k=1}^{K_b} \hat{c}_{k,b} I_{k,b}(x_t; \hat{\theta}_{k,b}).
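As an illustration, the sketch below fits a forest on a simulated nonlinear design using scikit-learn, a stand-in for the R randomForest package reported in Section 4.7. The tuning values mirror those described later in the paper; the data-generating process is ours, and scikit-learn draws iid bootstrap samples rather than the block-bootstrap ones used for the time-series application.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# 500 trees, at least five observations per leaf, and one third of the
# regressors tried at each split, as in the tuning discussion of Section 4.7.
rng = np.random.default_rng(7)
X = rng.normal(size=(400, 8))
y = np.sin(2.0 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=400)
rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=5,
                           max_features=1 / 3, random_state=0).fit(X, y)
r2 = rf.score(X, y)  # in-sample fit of the averaged ensemble
```

Because each split conditions on only a random third of the regressors, the trees are decorrelated, and averaging them captures the nonlinear signal without the variance of a single deep tree.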
4.6. Hybrid Linear-Random Forest Models

RF/OLS and adaLASSO/RF are adaptations specifically designed to disentangle the relative importance of variable selection and nonlinearity in forecasting US inflation. RF/OLS is estimated using the following steps:

1. For each bootstrap sample b = 1, ..., B:
   (a) Grow a single tree with k nodes (we used k = 20) and save the N_k split variables.
   (b) Run an OLS regression on the selected split variables.
   (c) Compute the forecast \hat{\pi}^b_{t+h}.
2. The final forecast is \hat{\pi}_{t+h} = B^{-1}\sum_{b=1}^{B} \hat{\pi}^b_{t+h}.
The main objective of RF/OLS is to check the performance
of a linear model using variables selected by the RF model.
If the results are very close to those of the RF model, we
understand that nonlinearity is not an issue, and the RF model
is superior solely because of variable selection. However, if
we see some improvement compared to other linear models,
especially bagging,6 but RF/OLS is still less accurate than RF,
we have evidence that both nonlinearity and variable selection
play important roles.
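The RF/OLS steps can be sketched as follows, using scikit-learn trees to read off split variables. The function name and simulated linear design are ours, and where the paper's description leaves the estimation sample for the OLS step ambiguous, the sketch reuses the same bootstrap sample.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def rf_ols_forecast(y, X, x_new, B=50, k=20, seed=0):
    """RF/OLS sketch: for each bootstrap sample, grow a k-leaf tree, take
    its split variables, fit an OLS on them, and average the B forecasts."""
    rng = np.random.default_rng(seed)
    T = len(y)
    fcasts = []
    for b in range(B):
        idx = rng.integers(0, T, T)
        tree = DecisionTreeRegressor(max_leaf_nodes=k, random_state=b)
        tree.fit(X[idx], y[idx])
        # internal nodes store their split variable; leaves are coded -2
        feats = np.unique(tree.tree_.feature[tree.tree_.feature >= 0])
        Xs = np.column_stack([np.ones(T), X[idx][:, feats]])
        beta, *_ = np.linalg.lstsq(Xs, y[idx], rcond=None)
        fcasts.append(float(beta[0] + x_new[feats] @ beta[1:]))
    return float(np.mean(fcasts))

rng = np.random.default_rng(8)
X = rng.normal(size=(300, 6))
y = 2.0 * X[:, 0] + 0.3 * rng.normal(size=300)
f = rf_ols_forecast(y, X, np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0]))
```

On a purely linear design the hybrid recovers the OLS forecast, which is exactly the behavior the comparison with RF is meant to exploit.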
The second adapted model is adaLASSO/RF, where we
use adaLASSO for variable selection and then estimate a
fully grown RF with the variables selected by adaLASSO. If
adaLASSO/RF performs similarly to RF, we understand that
the variable selection in RF is less relevant and nonlinearity is
more important.
6Bagging and RF are bootstrap-based models; the former is linear, and the latter
is nonlinear.
Figure 1. Example of a regression tree. Reproduction of part of Figure 9.2 in Hastie, Tibshirani, and Friedman (2001).
adaLASSO/RF and RF/OLS together create an “if and only
if” situation where we test the importance of variable selection
and nonlinearity from both sides. The results indicate that
nonlinearity and variable selection are both important to explain
the performance of RF.
4.7. Computer Codes and Tuning Parameters

All ML methods are estimated in R using, in most cases, standard and well-established packages. For the linear ML methods, the following packages are used: HDEconometrics and glmnet. The RF models are estimated using the randomForest package. The deep networks and boosted trees used for robustness checks are estimated, respectively, with the h2o and xgboost packages.7 The R codes are available online at https://github.com/gabrielrvsc/ForecastInflation.8
For all methods within the LASSO family, the penalty parameter \lambda is chosen by the BIC, as advocated by Kock and Callot (2015) and Medeiros and Mendes (2016). The \alpha parameter for the ElNet penalty in (7) is set to 0.5. We also tried selecting it by the BIC, but the results are quite similar and the computational burden is much higher. The weights of the adaptive versions of LASSO and ElNet are chosen as \omega_i = \left(|\hat{\beta}_i| + \frac{1}{T}\right)^{-1}, where \hat{\beta}_i is the estimate from the nonadaptive version of the method. The additional term \frac{1}{T} gives variables excluded by the initial estimator a second chance for inclusion in the model.
The numbers of factors and lags in the factor models are set to four. Data-driven methods for selecting the numbers of factors and lags usually yield smaller quantities, delivering in most cases worse forecasts. For this reason, we fix both at four. For the target factors, we adopt the 5% significance level (\alpha = 0.05). For the factor-boosting algorithm, we make the following choices. The maximum number of iterations is set to M = 10 \times the number of candidate variables (approximately 5000). However, the boosting algorithm is stopped whenever the BIC of the model starts to increase. The shrinkage parameter is set to v = 0.2, as advocated by Bai and Ng (2009).
The number of bootstrap replications for the bagging method
is B=100. We experimented with other values, and the results
are very stable. The pretesting procedure is conducted at the 5%
level as in Inoue and Kilian (2008). Given the large number of
variables, the pretesting was performed in two steps. First, for
each bootstrap sample, we divide the variables into 10 groups
and perform pretesting on each group. Then, selected variables
from each group are used in the next pretesting, which selects
the final variables.
For the CSR, we set \tilde{n} = 20 and q = 4. These choices are made to avoid a huge computational burden. We tried varying both \tilde{n} and q, and the results do not differ much. As in the
7HDEconometrics is available at https://github.com/gabrielrvsc/HDeconometrics/. The remaining packages are available at https://cran.r-project.org/web/packages/.
8See also http://www.econ.puc-rio.br/mcm/codes/.
bagging algorithm and the target factors, the initial pretesting
is carried out at the 5% level.
Each individual tree in the RF model is grown until there are
only five observations in each leaf. The proportion of variables
randomly selected in each split is 1/3. This is the default setting
in the randomForest package in R. The number of bootstrap
samples, B, is set to 500. We also experimented with other
combinations of parameters, and the final results are reasonably
stable.
5. RESULTS
The models are compared according to three different statistics, namely, the root mean squared error (RMSE), the mean absolute error (MAE), and the median absolute deviation from the median (MAD), which are defined as follows:

RMSE_{m,h} = \sqrt{\frac{1}{T - T_0 + 1}\sum_{t=T_0}^{T} \hat{e}^2_{t,m,h}},

MAE_{m,h} = \frac{1}{T - T_0 + 1}\sum_{t=T_0}^{T} |\hat{e}_{t,m,h}|, and

MAD_{m,h} = median|\hat{e}_{t,m,h} - median(\hat{e}_{t,m,h})|,

where \hat{e}_{t,m,h} = \pi_t - \hat{\pi}_{t,m,h} and \hat{\pi}_{t,m,h} is the inflation forecast for month t made by model m with information up to t - h. The first two measures are the usual ones in the forecasting literature. MAD, which is less commonly used, is robust to outliers and asymmetries. Reporting both MAE and MAD in addition to RMSE is important for confirming that the results are not due to a few large forecasting errors.
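In code, the three criteria reduce to a few lines. This NumPy sketch evaluates them on toy forecast errors; the per-horizon bookkeeping and the normalization by the random-walk benchmark used in the tables are omitted.

```python
import numpy as np

def rmse(e): return float(np.sqrt((e ** 2).mean()))  # root mean squared error
def mae(e):  return float(np.abs(e).mean())          # mean absolute error
def mad(e):  return float(np.median(np.abs(e - np.median(e))))  # median abs. dev.

e = np.array([1.0, -1.0, 3.0, -3.0])   # toy forecast errors
```

On these errors RMSE exceeds MAE, since squaring up-weights the two large errors, while MAD ignores them entirely, which is precisely why reporting all three guards against results driven by a few outliers.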
To test whether the forecasts from distinct models are
different, we consider a number of tests: the model confidence
sets (MCSs) of Hansen, Lunde, and Nason (2011), the superior
predictive ability (SPA) tests of Hansen (2005), and the multi-
horizon SPA test of Quaedvlieg (2017).
5.1. Overview
Table 1 reports a number of summary statistics across all
forecasting horizons. The results concern the sample from
1990 to 2015. Additional results for the different subsamples
can be found in Tables S.10 and S.11 in the supplementary
materials. Columns (1), (2), and (3) report the average RMSE,
the average MAE, and the average MAD. Columns (4), (5), and
(6) report, respectively, the maximum RMSE, MAE, and MAD
over the forecasting horizons. Columns (7), (8), and (9) report,
respectively, the minimum RMSE, MAE, and MAD over the
15 different horizons considered. We normalize these statistics
for the benchmark RW model to one. Columns (10), (11), and
(12) report the number of times (across horizons) each model
achieved the lowest RMSE, MAE, and MAD, respectively.
Columns (13) and (14) show the average p-values of the SPA
test proposed by Hansen (2005). The SPA test of Hansen (2005)
compares a collection of models against a benchmark where the
null hypothesis is that no other model in the pool of alternatives
has superior predictive ability. In the present context, for each
forecasting horizon, we run Hansen’s SPA test by setting each
Table 1. Forecasting results: summary statistics for the out-of-sample period from 1990 to 2015
Forecasting precision / superior predictive ability / model confidence set
Columns: (1)-(3) ave. RMSE, MAE, MAD; (4)-(6) max. RMSE, MAE, MAD; (7)-(9) min. RMSE, MAE, MAD; (10)-(12) # of horizons with min. RMSE, MAE, MAD; (13)-(14) ave. SPA p-value (sq., abs.); (15)-(16) ave. MCS Tmax p-value (sq., abs.); (17) multi-horizon MCS p-value (sq.).
Model (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15) (16) (17)
RW 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0 0 0 0.02 0.00 0.13 0.07 0.00
AR 0.84 0.86 0.78 1.22 1.22 1.16 0.75 0.76 0.60 0 0 0 0.05 0.01 0.48 0.21 0.00
UCSV 0.82 0.82 0.85 0.95 0.91 1.04 0.77 0.78 0.76 0 0 0 0.07 0.04 0.52 0.37 0.00
LASSO 0.78 0.79 0.74 0.98 1.04 0.91 0.73 0.71 0.61 0 0 0 0.14 0.17 0.72 0.56 0.73
adaLASSO 0.78 0.78 0.76 0.96 0.96 0.99 0.72 0.71 0.63 0 0 0 0.15 0.31 0.68 0.66 0.18
ElNet 0.78 0.80 0.73 0.98 1.05 0.89 0.73 0.71 0.61 0 0 2 0.15 0.15 0.72 0.55 0.89
adaElnet 0.78 0.78 0.76 0.96 0.97 0.96 0.73 0.71 0.61 0 0 2 0.18 0.33 0.70 0.67 0.05
RR 0.76 0.77 0.79 0.89 0.93 1.03 0.70 0.71 0.67 0 0 0 0.43 0.46 0.77 0.69 0.40
BVAR 0.80 0.82 0.80 1.07 1.09 1.14 0.74 0.73 0.64 0 0 0 0.12 0.13 0.66 0.51 0.04
Bagging 0.79 0.83 0.90 0.83 0.89 1.29 0.74 0.78 0.76 0 0 0 0.20 0.06 0.63 0.39 0.04
CSR 0.82 0.82 0.80 1.13 1.11 1.09 0.76 0.74 0.67 0 0 0 0.13 0.06 0.58 0.37 0.00
JMA 0.86 0.91 1.00 0.99 0.99 1.39 0.76 0.83 0.78 0 0 0 0.06 0.01 0.29 0.08 0.00
Factor 0.84 0.87 0.87 1.17 1.21 1.25 0.78 0.78 0.71 0 0 0 0.04 0.01 0.34 0.13 0.00
T. Factor 0.83 0.88 0.89 1.17 1.23 1.26 0.77 0.80 0.70 0 0 0 0.01 0.00 0.30 0.09 0.00
B. Factor 0.83 0.90 0.99 1.17 1.32 1.60 0.74 0.75 0.73 0 0 0 0.02 0.00 0.58 0.25 0.00
RF 0.73 0.73 0.70 0.84 0.81 0.84 0.68 0.67 0.58 11 13 8 0.94 0.95 0.95 0.97 1
Mean 0.77 0.77 0.77 0.95 0.97 1.01 0.71 0.70 0.66 0 0 0 0.37 0.40 0.75 0.64 0.89
T.Mean 0.77 0.77 0.75 0.95 0.96 0.95 0.71 0.70 0.62 0 0 0 0.35 0.48 0.74 0.71 0.83
Median 0.77 0.77 0.75 0.94 0.97 0.99 0.71 0.70 0.63 0 0 0 0.32 0.44 0.73 0.70 0.89
RF/OLS 0.77 0.78 0.81 0.94 0.97 1.05 0.71 0.72 0.63 1 1 0 0.46 0.47 0.76 0.70 0.96
adaLASSO/RF 0.75 0.75 0.73 0.85 0.82 0.87 0.70 0.68 0.58 3 1 3 0.53 0.59 0.82 0.79 0.89
NOTE: The table reports for each model a number of different summary statistics across all the forecasting horizons, including the accumulated three-, six-, and twelve-month horizons. Columns (1), (2), and (3) report the average root mean square error
(RMSE), the average mean absolute error (MAE), and the average median absolute deviation (MAD). Columns (4), (5), and (6) report, respectively, the maximum RMSE, MAE, and MAD over the forecasting horizons. Columns (7), (8), and (9) report,
respectively, the minimum RMSE, MAE, and MAD over the 15 different horizons considered. Columns (10), (11), and (12) report the number of times (across horizons) each model achieved the lowest RMSE, MAE, and MAD, respectively. Columns
(13) and (14) show the average p-values of the superior predictive ability (SPA) test proposed by Hansen (2005). Columns (15) and (16) present for square and absolute losses, the average p-values for the model confidence sets (MCSs) based on the tmax
statistic as described in Hansen, Lunde, and Nason (2011). Column (17) displays the p-value of the multi-horizon MCS proposed by Quaedvlieg (2017). The test is based on the squared errors only.
one of the models as the benchmark. A rejection of the null
indicates that the reference model is outperformed by one or
more competitors. Columns (15) and (16) present for square
and absolute losses, respectively, the average p-values for the
MCSs based on the tmax statistic as described in Hansen,
Lunde, and Nason (2011). An MCS is a set of models that is
built such that it will contain the best model with a given level of
confidence. An MCS is analogous to a confidence interval for a
parameter. The MCS procedure also yields p-values for each of
the models considered. The best model has the highest p-value.
We construct MCSs for each forecasting horizon. To build a
more general MCS that contemplates all horizons, column (17)
displays the p-value of the multi-horizon MCS proposed by
Quaedvlieg (2017). The test is based on the squared errors only.
The following facts emerge from the tables: (1) ML models
and the use of a large set of predictors systematically improve
the quality of inflation forecasts over traditional benchmarks.
This is a robust and statistically significant result. (2) The RF
model outperforms all the other alternatives in terms of point
statistics. The superiority of RF is due both to the variable selec-
tion mechanism induced by the method as well as the presence
of nonlinearities in the relation between inflation and its predic-
tors. RF has the lowest RMSEs, MAEs, and MADs across the
horizons and the highest MCS p-values. The RF model also has
the highest p-values in the SPA test and the multi-horizon MCS.
The improvements over the RW in terms of RMSE, MAE, and
MAD are almost 30% and are more pronounced during the
second subsample, when inflation volatility is much higher.
(3) Shrinkage methods also produce more precise forecasts
than the benchmarks. Sparsity-inducing methods are slightly
worse than nonsparsity-inducing shrinkage methods. Overall,
the forecasting performance among shrinkage methods is very
similar, and ranking them is difficult. (4) Factor models are
strongly outperformed by other methods. The adoption of
boosting and target factors improves the quality of the forecasts
only marginally. The poor performance of factor models is
more pronounced during the first subsample (low-volatility
period). (5) CSR and JMA do not perform well either and
are comparable to the factor models. (6) Forecast combination
schemes (simple average, trimmed average and median) do not
bring any significant improvements in any of the performance
criteria considered. (7) Among the benchmark models, both AR
and UCSV outperform the RW alternative. Furthermore, the
UCSV model is slightly superior to the AR specification.
5.2. Random Forests Versus Benchmarks
Tables 2–4 show the results of the comparison between RF
and the benchmarks. Table 2 presents the RMSE, MAE, and
MAD ratios of the AR, UCSV, and RF models with respect to
the RW alternative for all forecasting horizons as well as for the
accumulated forecasts over 3, 6, and 12 months. The models
with the smallest ratios are highlighted in bold. It is clear that
the RF model has the smallest ratios for all horizons.
To check whether this is a robust finding across the out-
of-sample period, we compute rolling RMSEs, MAEs, and
MADs over windows of 48 observations. Table 3 shows the
frequency with which each model achieved the lowest RMSEs,
MAEs, and MADs as well as the frequency with which each
model was the worst-performing alternative among the four
competitors. The RF model is the winning specification and
is superior to the competitors for the majority of periods,
including the Great Recession. By contrast, the RW model
delivers the worst forecasts most of the time. Figures S.4–S.6 in
the supplementary materials show the rolling RMSEs, MAEs,
and MADs over the out-of-sample period. The performance
of the RW deteriorates as the forecasting horizon increases.
However, the accomplishments of the RF seem rather robust.
Table 4 reports the p-values of the unconditional Giacomini
and White (2006) test for superior predictive ability for squared
and absolute errors and the multi-horizon superior predictive
ability test of Quaedvlieg (2017). The latter test compares all
horizons jointly. Rejections of the null mean that the forecasts
are significantly different. It is clear that the RF has forecasts
that are significantly different from and superior to the three
benchmarks.
5.3. The Full Picture
In this section, we compare all models. Table 5 presents the
results for the full out-of-sample period, whereas Tables S.12
and S.13 present the results for the 1990–2000 and 2001–
2015 periods, respectively. The tables report the RMSEs and,
in parentheses, the MAEs for all models relative to the RW
specification. The error measures were calculated from 132
(180) rolling windows covering the 1990–2000 (2001–2015)
period. Values in bold denote the most accurate model in each
horizon. Cells in gray (blue) show the models included in the
50% MCS using the squared error (absolute error) as the loss
function. The MCSs were constructed based on the maximum
t-statistic. The last column in the table reports the number of
forecast horizons in which the model was included in the MCS
for the square (absolute) loss. The last two rows in the table
report the number of models included in the MCS for the square
and absolute losses.
We start by analyzing the full out-of-sample period.
Apart from a few short horizons, where either RF/OLS or
adaLASSO/RF is the winning model, RF delivers the smallest
ratios in most of the cases. RF is followed closely by shrinkage
models, where RR seems to be superior to the other alternatives.
RR, RF, and the hybrid linear-RF models are the only ones
included in the MCS for all horizons. Neither RF nor RR
impose sparsity, which may corroborate the conclusions of
Giannone, Lenza, and Primiceri (2018), who provide evidence
against sparsity in several applications. Factor models have
very poor results and are almost never included in the MCS.
When factors are combined with boosting, there is a small
gain, but the results are still greatly inferior to those from
the RF and shrinkage models. This is particularly curious as
there is a correspondence between the factor models and RR:
RR predictions are weighted combinations of all principal
component factors of the set of predictors. Several reasons
might explain the difference. The first potential explanation
is a lack of a clear factor structure in the regressors. This
is not the case as shown in Figure S.2 in the supplementary
materials, where we display the eigenvalues of the correlation
Table 2. Forecasting results: RMSE, MAE and MAD Ratios (1990–2015)
Panel (a): RMSE ratio
Forecasting horizon
Model 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m
AR 0.90 0.81 0.79 0.81 0.79 0.79 0.78 0.76 0.78 0.82 0.84 0.75 0.86 0.97 1.22
UCSV 0.95 0.82 0.80 0.81 0.78 0.78 0.78 0.78 0.77 0.80 0.83 0.78 0.87 0.86 0.91
RF 0.84 0.73 0.71 0.74 0.71 0.72 0.72 0.71 0.72 0.76 0.77 0.68 0.71 0.71 0.77
Panel (b): MAE ratio
Forecasting horizon
Model 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m
AR 0.87 0.79 0.78 0.81 0.80 0.81 0.78 0.76 0.81 0.85 0.86 0.76 0.89 1.04 1.22
UCSV 0.91 0.82 0.79 0.80 0.80 0.79 0.80 0.79 0.78 0.80 0.85 0.78 0.86 0.90 0.89
RF 0.81 0.72 0.71 0.75 0.73 0.73 0.70 0.68 0.72 0.75 0.77 0.67 0.74 0.77 0.77
Panel (c): MAD ratio
Forecasting horizon
Model 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m
AR 0.74 0.70 0.82 0.82 0.83 0.75 0.66 0.68 0.77 0.70 0.77 0.60 0.87 1.16 0.89
UCSV 0.88 0.77 0.83 0.91 0.88 0.79 0.76 0.83 0.86 0.83 0.88 0.78 0.87 1.04 0.88
RF 0.70 0.63 0.77 0.84 0.75 0.73 0.65 0.64 0.73 0.69 0.71 0.58 0.71 0.80 0.59
NOTE: The table reports, for each forecasting horizon, the root mean squared error (RMSE), mean absolute error (MAE), and median absolute deviation from the median (MAD) ratios with respect to the random walk model for the full out-of-sample
period (1990–2015). The last three columns represent, respectively, the ratios for the accumulated three, six, and twelve-month forecasts. The statistics for the best-performing model are highlighted in bold.
Table 3. Forecasting results: ranking of models (1990–2015)
Panel (a): Lowest rolling RMSE
Forecasting horizon
Model 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m
RW 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.00 0.00 0.06 0.08
AR 0.08 0.05 0.00 0.16 0.01 0.01 0.10 0.18 0.12 0.13 0.11 0.00 0.00 0.00 0.00
UCSV 0.02 0.05 0.21 0.10 0.18 0.11 0.00 0.03 0.19 0.11 0.09 0.00 0.21 0.31 0.24
RF 0.89 0.90 0.79 0.74 0.81 0.88 0.90 0.79 0.69 0.76 0.75 1.00 0.78 0.63 0.68
Panel (b): Lowest rolling MAE
Forecasting horizon
Model 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m
RW 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.09 0.00 0.03 0.11 0.04
AR 0.17 0.03 0.00 0.05 0.00 0.02 0.10 0.14 0.13 0.06 0.05 0.02 0.00 0.00 0.00
UCSV 0.15 0.18 0.26 0.23 0.15 0.15 0.00 0.07 0.24 0.23 0.09 0.02 0.21 0.24 0.20
RF 0.68 0.79 0.74 0.72 0.85 0.82 0.90 0.79 0.63 0.71 0.76 0.95 0.76 0.65 0.76
Panel (c): Lowest rolling MAD
Forecasting horizon
Model 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m
RW 0.14 0.00 0.03 0.00 0.09 0.05 0.00 0.00 0.03 0.05 0.03 0.05 0.02 0.15 0.02
AR 0.23 0.16 0.23 0.23 0.12 0.23 0.26 0.32 0.11 0.15 0.42 0.23 0.04 0.03 0.04
UCSV 0.04 0.19 0.27 0.33 0.09 0.12 0.03 0.01 0.22 0.09 0.03 0.02 0.10 0.05 0.05
RF 0.59 0.65 0.46 0.44 0.69 0.60 0.70 0.67 0.64 0.71 0.52 0.70 0.84 0.77 0.89
Table 3. Continued
Panel (d): Highest rolling RMSE
Forecasting horizon
Model 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m
RW 0.82 0.98 0.94 0.99 0.98 1.00 1.00 1.00 0.86 0.80 0.71 0.85 0.61 0.26 0.00
AR 0.00 0.00 0.06 0.01 0.02 0.00 0.00 0.00 0.14 0.19 0.29 0.15 0.38 0.74 0.97
UCSV 0.18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.02 0.00 0.03
RF 0.00 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00
Panel (e): Highest rolling MAE
Forecasting horizon
Model 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m
RW 0.90 0.94 0.86 1.00 0.99 0.97 1.00 0.94 0.82 0.73 0.69 0.77 0.63 0.26 0.03
AR 0.08 0.03 0.13 0.00 0.01 0.03 0.00 0.06 0.18 0.27 0.28 0.23 0.36 0.74 0.86
UCSV 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.11
RF 0.00 0.02 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.00 0.00 0.00 0.00
Panel (f): Highest rolling MAD
Forecasting horizon
Model 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m
RW 0.68 0.94 0.85 0.80 0.80 0.66 0.81 0.94 0.77 0.80 0.79 0.92 0.68 0.30 0.31
AR 0.05 0.03 0.12 0.10 0.10 0.07 0.04 0.02 0.10 0.03 0.02 0.01 0.18 0.57 0.51
UCSV 0.22 0.03 0.03 0.10 0.04 0.26 0.14 0.04 0.11 0.14 0.19 0.07 0.12 0.13 0.17
RF 0.05 0.00 0.00 0.00 0.06 0.02 0.01 0.00 0.02 0.02 0.00 0.00 0.02 0.01 0.00
NOTE: The table reports the frequency with which each model achieved the best (worst) performance statistics over a rolling window period of four years (48 observations). The last three columns represent, respectively, the ratios for the accumulated
three, six, and twelve-month forecasts. The statistics for the model with the highest figures are highlighted in bold.
Table 4. Forecasting results: superior predictive ability test (1990–2015)
Panel (a): Squared errors
Giacomini–White test—Forecasting horizon Quaedvlieg test
Model 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m Unif. Avg.
RW 0.00 0.00 0.00 0.00 0.01 0.01 0.01 0.00 0.00 0.03 0.02 0.00 0.00 0.05 0.05 0.00 0.00
AR 0.00 0.01 0.02 0.04 0.02 0.02 0.06 0.08 0.05 0.06 0.01 0.00 0.01 0.02 0.02 0.00 0.00
UCSV 0.00 0.00 0.01 0.06 0.06 0.02 0.00 0.00 0.00 0.04 0.00 0.00 0.04 0.06 0.07 0.00 0.00
Panel (b): Absolute errors
Giacomini–White test—Forecasting horizon Quaedvlieg test
Model 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m Unif. Avg.
RW 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.05 0.05 0.00 0.00
AR 0.00 0.00 0.00 0.01 0.00 0.00 0.01 0.02 0.01 0.00 0.00 0.00 0.01 0.02 0.02 0.00 0.00
UCSV 0.00 0.00 0.01 0.03 0.00 0.01 0.00 0.00 0.01 0.05 0.01 0.00 0.04 0.06 0.07 0.00 0.00
NOTE: The table reports the p-values of the unconditional Giacomini–White test for superior predictive ability between the random forest model and each of the benchmark models for each forecasting horizon as well as for the three accumulated horizons. The test is based on the full out-of-sample period. Panel (a) presents the results for squared errors, while panel (b) shows the results for absolute errors. The GW statistics are computed with heteroscedasticity and autocorrelation consistent (HAC) variances with the quadratic spectral kernel and bandwidth selected by Andrews' automatic method. The table also reports the p-values of the uniform and average multi-horizon superior predictive ability tests proposed by Quaedvlieg (2017).
matrix of regressors over the forecasting period. As shown
in the figure, there is a small number of dominating factors.
Second, there might be factors that explain only a small portion
of the total variance of the regressors but have high predictive
power on inflation. Again, we do not think this is the case,
as target factors as well as boosting are specifically designed
to enhance the quality of the predictions but do not bring any
visible improvement in this case. Furthermore, we allow the
ML methods to select factors as well, and as shown below,
they are never selected. Finally, we believe the most probable
explanation is that although sparsity can be questioned, factor
models are an overly aggregated representation of the potential
predictors. The results of JMA are not encouraging either.
All of the competing models outperform RW for almost all
horizons. Finally, the forecast combination does not provide
any significant gain due to the empirical fact that most of the
forecasts are positively correlated; see Figure S.3.
To check whether this is a robust finding across the out-of-
sample period, we compute rolling RMSEs, MAEs, and MADs
over windows of 48 observations as shown in Figures S.7–
S.18 in the supplementary materials. The results corroborate
the superiority of the RF model, particularly for long as well as
aggregated horizons.
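The rolling accuracy measures can be computed directly from the sequence of out-of-sample forecast errors; a minimal sketch of the rolling RMSE (rolling MAE and MAD are analogous, and the toy error sequence below is illustrative):

```python
import numpy as np

def rolling_rmse(errors, window=48):
    """RMSE over sliding windows of forecast errors (windows of 48
    observations, as in Figures S.7-S.18 of the supplement)."""
    e = np.asarray(errors, dtype=float)
    return np.array([np.sqrt(np.mean(e[i:i + window] ** 2))
                     for i in range(len(e) - window + 1)])

errs = np.array([1.0, -1.0] * 30)           # toy alternating error sequence
r = rolling_rmse(errs)
print(len(r), r[0])                         # 13 windows, each with RMSE 1.0
```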
Focusing now on the two subsamples, the following conclu-
sions stand out from the tables in the supplementary materials.
The superiority of RF is more pronounced during the 2001–
2015 period, when inflation is much more volatile. During this
period, RF achieves the smallest RMSE and MAE ratios for
almost all horizons. From 1990 to 2000, the linear shrinkage
methods slightly outperform RF for short horizons. However,
RF dominates for long horizons and for the twelve-month
forecasts. Among the shrinkage models and during the first
period, there is no clear evidence of a single winner. Depending
on the horizon, different models perform the best. Another
salient fact is that there are fewer models included in the MSC
during the first subsample.
Finally, we test whether the superiority of the RF model
depends on the state of the economy. We consider two cases,
namely, recessions versus expansions according to the NBER
classification and high versus low macroeconomic uncertainty.9
We consider the following regression:

$e^2_{t+h,\text{RF}} - e^2_{t+h,\text{other}} = \alpha_0 I_{t+h} + \alpha_1 (1 - I_{t+h}) + \text{error},\qquad (8)$

where $e^2_{t+h,\text{RF}}$ is the squared forecasting error of the RF for
horizon $h$, $e^2_{t+h,\text{other}}$ is the squared forecasting error of the
competing model, and $I_{t+h}$ is an indicator function that equals one
for periods of recession (or high macroeconomic uncertainty).
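Because the regressors in Equation (8) are two mutually exclusive dummies and there is no intercept, OLS reduces to state-specific averages of the squared-error differences. A minimal sketch with simulated data (all numbers are illustrative; the paper's inference additionally requires proper standard errors):

```python
import numpy as np

def state_dependent_loss_gap(e2_rf, e2_other, indicator):
    """OLS estimates of alpha_0 and alpha_1 in Equation (8).

    With two mutually exclusive dummies and no intercept, the
    estimates are the mean squared-error difference within each
    state: alpha_0 for I=1 (recession / high uncertainty) and
    alpha_1 for I=0. Negative values favor the RF model.
    """
    d = np.asarray(e2_rf) - np.asarray(e2_other)
    mask = np.asarray(indicator, dtype=bool)
    return d[mask].mean(), d[~mask].mean()

# toy data: RF errors smaller in both states, more so in "recessions"
rng = np.random.default_rng(0)
recession = rng.random(312) < 34 / 312      # roughly 34 of 312 months
diff = np.where(recession, -0.4, -0.1) + 0.05 * rng.standard_normal(312)
a0, a1 = state_dependent_loss_gap(1.0 + diff, np.ones(312), recession)
print(a0, a1)                               # both negative; a0 larger in magnitude
```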
The results are presented in Tables S.14 and S.15 in the supple-
mentary materials for periods of expansion versus recession and
high versus low macroeconomic uncertainty, respectively. The
9. Periods of high (low) macroeconomic uncertainty are those where uncertainty
is higher (lower) than the historical average. Since the results barely change if
we consider either financial or real, rather than macroeconomic, uncertainty,
we do not report them for brevity. They are available upon request. These
measures of macroeconomic, financial and real uncertainty are computed
as in Jurado, Ludvigson, and Ng (2015) and are the conditional volatility
of the unforecastable part of macroeconomic, financial and firm-level vari-
ables, respectively. They are available at Sydney C. Ludvigson’s webpage
(https://www.sydneyludvigson.com/).
Medeiros et al.: Forecasting Inflation in a Data-Rich Environment: The Benefits of Machine Learning Methods 15
Table 5. Forecasting errors for the CPI from 1990 to 2015
Consumer price index 1990–2015
Forecasting horizon
RMSE/(MAE) 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m
RMSE count
(MAE count)
RW 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1
(1.00) (1.00) (1.00) (1.00) (1.00) (1.00) (1.00) (1.00) (1.00) (1.00) (1.00) (1.00) (1.00) (1.00) (1.00) (0)
AR 0.90 0.81 0.79 0.81 0.79 0.79 0.78 0.76 0.78 0.82 0.84 0.75 0.86 0.97 1.22 8
(0.87) (0.79) (0.78) (0.81) (0.80) (0.81) (0.78) (0.76) (0.81) (0.85) (0.86) (0.76) (0.89) (1.04) (1.22) (0)
UCSV 0.95 0.82 0.80 0.81 0.78 0.78 0.78 0.78 0.77 0.80 0.83 0.78 0.87 0.86 0.91 9
(0.91) (0.82) (0.79) (0.80) (0.80) (0.79) (0.80) (0.79) (0.78) (0.80) (0.85) (0.78) (0.86) (0.90) (0.89) (4)
LASSO 0.83 0.75 0.73 0.76 0.74 0.75 0.75 0.73 0.75 0.80 0.82 0.73 0.75 0.79 0.98 13
(0.82) (0.74) (0.73) (0.78) (0.77) (0.75) (0.74) (0.71) (0.76) (0.81) (0.84) (0.74) (0.79) (0.89) (1.04) (11)
adaLASSO 0.84 0.76 0.74 0.77 0.75 0.75 0.76 0.75 0.76 0.80 0.83 0.72 0.76 0.80 0.96 13
(0.81) (0.75) (0.72) (0.77) (0.75) (0.74) (0.73) (0.71) (0.75) (0.79) (0.84) (0.73) (0.79) (0.86) (0.96) (13)
ElNet 0.83 0.75 0.73 0.76 0.75 0.74 0.75 0.74 0.76 0.81 0.82 0.73 0.75 0.79 0.98 13
(0.82) (0.74) (0.73) (0.78) (0.78) (0.76) (0.75) (0.71) (0.77) (0.81) (0.85) (0.75) (0.78) (0.89) (1.05) (11)
adaElnet 0.84 0.75 0.73 0.77 0.75 0.75 0.75 0.74 0.76 0.80 0.81 0.73 0.76 0.81 0.96 13
(0.82) (0.74) (0.72) (0.76) (0.75) (0.74) (0.73) (0.71) (0.75) (0.79) (0.83) (0.75) (0.79) (0.86) (0.97) (13)
RR 0.85 0.73 0.72 0.75 0.74 0.75 0.75 0.73 0.74 0.77 0.78 0.70 0.73 0.77 0.89 14
(0.83) (0.72) (0.72) (0.77) (0.76) (0.76) (0.73) (0.71) (0.74) (0.77) (0.79) (0.71) (0.77) (0.86) (0.93) (14)
BVAR 0.86 0.76 0.75 0.77 0.74 0.76 0.77 0.76 0.77 0.82 0.83 0.74 0.79 0.85 1.07 12
(0.87) (0.73) (0.75) (0.79) (0.78) (0.78) (0.76) (0.76) (0.81) (0.83) (0.85) (0.76) (0.82) (0.93) (1.09) (9)
Bagging 0.83 0.76 0.76 0.80 0.78 0.79 0.83 0.81 0.78 0.82 0.83 0.74 0.74 0.76 0.82 12
(0.84) (0.78) (0.79) (0.87) (0.86) (0.85) (0.83) (0.80) (0.80) (0.84) (0.86) (0.78) (0.79) (0.89) (0.88) (7)
CSR 0.85 0.77 0.76 0.79 0.77 0.79 0.79 0.77 0.79 0.83 0.84 0.76 0.79 0.87 1.13 10
(0.84) (0.76) (0.75) (0.79) (0.79) (0.79) (0.76) (0.74) (0.79) (0.83) (0.84) (0.77) (0.82) (0.94) (1.11) (4)
JMA 0.99 0.82 0.84 0.85 0.84 0.81 0.91 0.86 0.84 0.95 0.92 0.80 0.76 0.80 0.88 4
(0.99) (0.85) (0.89) (0.94) (0.96) (0.90) (0.91) (0.87) (0.93) (0.96) (0.96) (0.83) (0.86) (0.96) (0.91) (0)
Factor 0.87 0.78 0.78 0.79 0.78 0.78 0.80 0.81 0.82 0.84 0.84 0.78 0.82 0.90 1.17 4
(0.88) (0.80) (0.80) (0.82) (0.82) (0.80) (0.78) (0.80) (0.87) (0.87) (0.87) (0.82) (0.89) (1.02) (1.21) (1)
T. Factor 0.88 0.79 0.78 0.80 0.77 0.79 0.79 0.80 0.80 0.82 0.83 0.78 0.82 0.91 1.17 3
(0.87) (0.82) (0.81) (0.84) (0.83) (0.84) (0.80) (0.80) (0.84) (0.87) (0.86) (0.80) (0.90) (1.06) (1.23) (0)
16 Journal of Business & Economic Statistics, Month 2019
Table 5. Continued
Consumer price index 1990–2015
Forecasting horizon
RMSE/(MAE) 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m
RMSE count
(MAE count)
B. Factor 0.95 0.77 0.76 0.78 0.77 0.79 0.79 0.78 0.79 0.83 0.84 0.74 0.82 0.91 1.17 10
(0.96) (0.80) (0.81) (0.85) (0.84) (0.86) (0.84) (0.82) (0.85) (0.86) (0.86) (0.75) (0.92) (1.13) (1.32) (1)
RF 0.84 0.73 0.71 0.74 0.71 0.72 0.72 0.71 0.72 0.76 0.77 0.68 0.71 0.71 0.77 15
(0.81) (0.72) (0.71) (0.75) (0.73) (0.73) (0.70) (0.68) (0.72) (0.75) (0.77) (0.67) (0.74) (0.77) (0.77) (15)
Mean 0.83 0.75 0.73 0.76 0.74 0.74 0.75 0.74 0.75 0.77 0.78 0.71 0.75 0.80 0.95 14
(0.81) (0.74) (0.73) (0.76) (0.76) (0.75) (0.73) (0.71) (0.75) (0.76) (0.78) (0.70) (0.78) (0.87) (0.97) (14)
T. Mean 0.84 0.74 0.73 0.75 0.74 0.74 0.75 0.73 0.74 0.78 0.79 0.71 0.75 0.79 0.95 14
(0.81) (0.74) (0.72) (0.76) (0.75) (0.74) (0.72) (0.70) (0.74) (0.77) (0.79) (0.70) (0.78) (0.86) (0.96) (14)
Median 0.84 0.75 0.72 0.76 0.74 0.74 0.75 0.73 0.74 0.78 0.79 0.71 0.75 0.79 0.94 14
(0.81) (0.74) (0.72) (0.76) (0.76) (0.74) (0.73) (0.70) (0.74) (0.77) (0.79) (0.71) (0.78) (0.86) (0.97) (14)
RF/OLS 0.81 0.73 0.72 0.75 0.74 0.75 0.75 0.74 0.74 0.78 0.79 0.71 0.73 0.78 0.94 14
(0.79) (0.73) (0.72) (0.76) (0.76) (0.76) (0.73) (0.72) (0.75) (0.78) (0.81) (0.72) (0.78) (0.86) (0.97) (14)
adaLASSO/RF 0.85 0.76 0.72 0.73 0.73 0.72 0.72 0.71 0.72 0.79 0.82 0.70 0.73 0.73 0.80 15
(0.82) (0.73) (0.72) (0.74) (0.74) (0.73) (0.71) (0.68) (0.72) (0.79) (0.82) (0.68) (0.76) (0.80) (0.82) (15)
RMSE count 14 15 13 17 19 19 17 15 17 18 19 7 13 17 5
MAE count (12) (13) (12) (11) (10) (12) (14) (12) (10) (15) (16) (7) (13) (13) (4)
NOTE: The table shows the root mean squared error (RMSE) and, in parentheses, the mean absolute errors (MAE) for all models relative to the random walk (RW). The error measures were calculated from 132 rolling windows covering the
1990–2000 period and 180 rolling windows covering the 2001–2015 period. Values in bold show the most accurate model in each horizon. Cells in gray (blue) show the models included in the 50% model confidence set (MCS) using the squared error
(absolute error) as loss function. The MCSs were constructed based on the maximum t-statistic. The last column in the table reports in how many horizons the row model was included in the MCS for square (absolute) loss. The last two rows in the table
report how many models were included in the MCS for square and absolute losses.
tables report the estimated values of α0 and α1 in Equation (8),
as well as their respective standard errors. For conciseness, we display
only the results for the most relevant models.
Inspecting the tables, it is clear that the majority of the
statistics are negative, which indicates that the RF model is
superior to its competitors. Of the 72 entries in each table,
the values of the statistics are positive only in four and seven
cases in Tables S.14 and S.15, respectively. However, the
differences are not statistically significant during recessions.
This result is not surprising, as only 34 of the 312 out-of-sample
observations are labeled as recessions. Nevertheless, the mag-
nitudes of the differences are much higher during recessions.
Turning attention to periods of low and high macroeconomic
uncertainty, it is evident that the RF model is statistically
superior to the benchmarks and the differences are higher
in periods of greater uncertainty. The gains from using RF
are particularly large during and after the Great Recession
(see Figures S.4–S.18). As argued above, both the degrees of
slackness and uncertainty might be sources of nonlinearities
in the economy. The fact that the RF model outperforms
competitors in these states of the economy suggests that allow-
ing for nonlinearities is key to improving macroeconomic
forecasts.
5.4. Real-Time Experiment
To test the robustness of our results, we also conduct an
experiment with the real-time database provided by FRED-MD.
Due to data limitations, we compute the real-time forecasts only
for the second subsample considered here. We compare the fol-
lowing models against the three benchmarks: RR, adaLASSO,
RF, RF/OLS, and adaLASSO/RF. Table 6 reports, for each
forecasting horizon, the RMSE, MAE, and MAD ratios with
respect to the RW model. The following conclusions emerge
from the table. The ML methods clearly outperform the three
benchmarks. Furthermore, for the three accumulated horizons
considered, RF is the best model. For the monthly forecasts,
both RF and adaLASSO/RF achieve the best performance for
most of the horizons. For one-month-ahead, RR seems to be
the best model. The results are robust to the choice between the
RMSE and MAE criteria. For MAD, RR is also competitive.
These results corroborate our conclusions that ML methods
should be considered seriously to forecast inflation.
5.5. Opening the Black Box: Variable Selection
We compare the predictors selected by some of the ML
methods: adaLASSO, RR, and RF. We select these three mod-
els for two reasons. First, they are generally the three best-
performing models. Second, they have quite different charac-
teristics. While adaLASSO is a true sparsity-inducing method,
RR and RF models are only approximately sparse. In addition,
RR is a linear model, whereas RF is a highly nonlinear specifi-
cation.
This analysis is straightforward for sparsity-inducing shrink-
age methods such as adaLASSO, as the coefficients of poten-
tially irrelevant variables are automatically set to zero.10 For
the other ML methods, the analysis is more complex. To
ensure that the results among models are comparable, we adopt
the following strategy. For RR and adaLASSO, the relative
importance measure is computed as the average coefficient
size (multiplied by the respective standard deviations of the
regressors). To measure the importance of each variable for the
RF models, we use out-of-bag (OOB) samples.11 When the bth
tree is grown, the OOB samples are passed down the tree, and
the prediction accuracy is recorded. Then, the values of the jth
variable are randomly permuted in the OOB sample, and the
accuracy is again computed. The decrease in accuracy due to
the permutation is averaged over all trees and is the measure of
the importance of the jth variable in RF.
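Both importance measures can be sketched on simulated data. The paper computes the RF measure on out-of-bag samples; here scikit-learn's `permutation_importance` on a held-out split serves as a stand-in, and the data, model settings, and feature indices are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 6))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.standard_normal(300)

# linear importance: average |coefficient| times regressor std dev,
# rescaled to sum to one (as for RR/adaLASSO in the text)
ridge = Ridge(alpha=1.0).fit(X, y)
lin_imp = np.abs(ridge.coef_) * X.std(axis=0)
lin_imp /= lin_imp.sum()

# permutation importance: shuffle one column at a time and record the
# drop in accuracy (held-out split here; OOB samples in the paper)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[:200], y[:200])
perm = permutation_importance(rf, X[200:], y[200:], n_repeats=10, random_state=0)
rf_imp = perm.importances_mean / perm.importances_mean.sum()

# the linear measure misses the x0*x1 interaction; the forest does not
print(lin_imp.round(2), rf_imp.round(2))
```

On data like this, the ridge measure concentrates on the linear term (feature 2) while the forest also credits the interacting features, illustrating why the two methods can rank the same predictors quite differently.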
Due to space constraints, we cannot show the relative impor-
tance for each variable, each lag, each horizon, or each esti-
mation window. Therefore, as described in the supplementary
materials and following McCracken and Ng (2016), we cat-
egorize the variables, including lags, into the following eight
groups: (i) output and income; (ii) labor market; (iii) housing;
(iv) consumption, orders, and inventories; (v) money and credit;
(vi) interest and exchange rates; (vii) prices; and (viii) stock
market. We also consider two additional groups, namely, the
principal component factors computed from the full set of
potential predictors and autoregressive terms. Furthermore, the
results are averaged across all estimation windows.
Figure 2 shows the importance of each variable group for the
adaLASSO, RR, and RF methods for each horizon. The values
in the plots are rescaled to sum to one.
The set of the most relevant variables for the RR and
RF models is quite stable across forecasting horizons but is
remarkably different between them. While for RR, AR terms,
prices and employment are the most important predictors, RF
models give more importance to prices, interest-exchange rates,
employment and housing. For adaLASSO, selection is quite
different across forecasting horizons, and only AR terms retain
their relative importance independent of the horizon. Other
prices gradually lose their relevance until up to six months
ahead and partially recover relevance when longer horizons are
considered. Output-income variables are more important for
medium-term forecasts. Finally, none of the three classes of
models selects either factors or stocks. This result may indicate
that the high level of cross-section aggregation of the factors is
responsible for the poor performance of factor models.
To compare the degree of sparsity between adaLASSO and
RF, we report word clouds of the selected variables in the
supplementary materials. A word cloud is an image composed
of the names of variables selected by a specific model across the
estimation windows; in the word cloud, the size of each word
indicates its frequency or importance. The names displayed in
the clouds are as defined in the third columns of Tables S.1–S.8.
10. Medeiros and Mendes (2016) showed, for example, that under sparsity
conditions, the adaLASSO model selection is consistent for high-dimensional
time series models in very general settings, that is, the method correctly selects
the “true” set of regressors.
11. For a given data point (y_t, x_t), the OOB sample is the collection of all
bootstrap samples that do not include (y_t, x_t).
Table 6. Real-time forecasting results: RMSE, MAE, and MAD ratios
Panel (a): RMSE ratio
Forecasting horizon
Model 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m
AR 0.91 0.81 0.77 0.78 0.79 0.80 0.80 0.77 0.79 0.82 0.83 0.77 0.84 0.95 1.21
UCSV 0.97 0.82 0.79 0.80 0.78 0.79 0.81 0.80 0.79 0.81 0.81 0.77 0.86 0.88 0.87
RR 0.84 0.74 0.72 0.75 0.76 0.78 0.79 0.76 0.77 0.79 0.80 0.74 0.72 0.80 0.96
adaLASSO 0.93 0.92 1.08 1.01 1.56 0.96 0.96 0.92 0.88 0.93 0.94 0.87 0.86 1.03 1.11
RF 0.88 0.74 0.70 0.72 0.72 0.75 0.76 0.74 0.74 0.79 0.80 0.72 0.69 0.64 0.65
RF/OLS 0.84 0.75 0.72 0.75 0.76 0.78 0.78 0.76 0.76 0.78 0.79 0.73 0.73 0.79 0.95
adaLASSO/RF 0.85 0.73 0.70 0.71 0.74 0.73 0.75 0.72 0.74 0.77 0.78 0.72 0.70 0.72 0.74
Panel (b): MAE ratio
Forecasting horizon
Model 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m
AR 0.87 0.80 0.73 0.75 0.80 0.81 0.79 0.75 0.77 0.80 0.84 0.78 0.83 0.98 1.19
UCSV 0.91 0.82 0.77 0.77 0.79 0.80 0.81 0.80 0.79 0.80 0.83 0.78 0.85 0.91 0.86
RR 0.81 0.72 0.68 0.72 0.77 0.80 0.77 0.73 0.76 0.76 0.82 0.74 0.72 0.81 0.94
adaLASSO 0.88 0.91 0.89 0.98 1.14 1.00 1.00 0.94 0.88 0.92 1.01 0.92 0.87 1.11 1.09
RF 0.83 0.72 0.66 0.72 0.74 0.76 0.75 0.72 0.72 0.75 0.81 0.73 0.68 0.65 0.63
RF/OLS 0.82 0.73 0.69 0.74 0.77 0.80 0.79 0.74 0.75 0.76 0.80 0.74 0.74 0.80 0.90
adaLASSO/RF 0.81 0.71 0.65 0.68 0.73 0.74 0.75 0.70 0.72 0.75 0.80 0.74 0.69 0.71 0.71
Panel (c): MAD ratio
Forecasting horizon
Model 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m
AR 0.80 0.78 0.66 0.80 0.76 0.76 0.73 0.74 0.81 0.77 0.83 0.73 0.83 1.13 0.95
UCSV 0.86 0.82 0.82 0.81 0.80 0.76 0.90 0.86 0.84 0.75 0.92 0.70 0.93 1.03 0.88
RR 0.77 0.70 0.60 0.76 0.74 0.75 0.75 0.72 0.77 0.69 0.75 0.67 0.72 0.90 0.75
adaLASSO 0.75 0.94 0.86 1.01 1.07 1.04 1.09 1.05 1.05 0.90 1.08 0.94 0.85 1.22 0.97
RF 0.74 0.72 0.62 0.73 0.75 0.77 0.75 0.73 0.76 0.65 0.81 0.68 0.72 0.73 0.53
RF/OLS 0.77 0.71 0.66 0.76 0.77 0.78 0.81 0.77 0.83 0.72 0.85 0.69 0.76 0.84 0.77
adaLASSO/RF 0.70 0.68 0.60 0.66 0.74 0.73 0.78 0.64 0.71 0.76 0.81 0.67 0.69 0.80 0.56
NOTE: The table reports, for each forecasting horizon, the root mean squared error (RMSE), mean absolute error (MAE), and median absolute deviation from the median (MAD) ratios with respect to the random walk model for the period 2001–2015.
The last three columns represent, respectively, the ratios for the accumulated three, six, and twelve-month forecasts. The statistics for the best-performing model are highlighted in bold. The forecasts are computed in real-time.
Figure 2. Variable importance. The picture shows the importance of each variable group for the adaLASSO, RR, and RF methods for all
the twelve forecasting horizons. For all different methods, the values in the plots are rescaled to sum to one. For RR and adaLASSO, the relative
importance measure is computed as the average coefficient size (multiplied by the respective standard deviations of the regressors). To measure
the importance of each variable for the RF models, we use out-of-bag (OOB) samples. When the bth tree is grown, the OOB samples are passed
down the tree and the prediction accuracy is recorded. Then, the values of the jth variable are randomly permuted in the OOB sample, and the
accuracy is again computed. The decrease in accuracy due to the permutation is averaged over all trees and is the measure of the importance of
the jth variable in the RF.
It is evident that the RF models are much less sparse than
adaLASSO.
RR, adaLASSO, and RF select different variables, which
suggests nontrivial interactions among variable selection, spar-
sity, and nonlinearity. If the econometrician is only interested
in forecasting, variable selection is less of a concern. One
should select (a combination of) the best-performing models. If
she/he is also interested in the precise mechanisms underlying
price dynamics, careful identification schemes should be con-
sidered, which is beyond the scope of this article. Nonetheless,
compared to adaLASSO and RR, the best-performing method,
RF, uses disaggregated inflation as a substitute for lags of
CPI inflation, thus resembling an unusual Phillips curve with
heterogeneous backward-looking terms. This finding may shed
light on price dynamics and can be useful for future research
aimed at uncovering such mechanisms.
5.6. Opening the Black Box: Nonlinearities
The role of nonlinearity in the relative performance
of the RF model is highlighted in the plots of recursive
RMSEs, MAEs, and MADs in Figures S.7–S.18. First, RF
is compared with the adaLASSO/RF and RF/OLS models
in Figures S.16–S.18. For h=1, the performances of
the three models are almost identical, indicating that there
are no benefits in introducing nonlinearity for very short-
term forecasts. On the other hand, the superiority of the
RF model becomes more evident for longer horizons and
for the accumulated inflation over six and twelve months.
Furthermore, the results do not seem to be due to a few
outliers, as both the rolling MAEs and MADs, which are less
sensitive to extreme observations, confirm the outperformance
of the RF model. These findings are also corroborated when
Table 7. Forecasting results (alternative models)
Panel (a): RMSE ratio
Forecasting horizon
Model 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m
RF 0.84 0.73 0.71 0.74 0.71 0.72 0.72 0.71 0.72 0.76 0.77 0.68 0.71 0.71 0.77
SCAD 0.85 0.76 0.77 0.79 0.77 0.77 0.81 0.77 0.78 0.83 0.84 0.73 0.75 0.79 0.96
BTrees 0.88 0.78 0.74 0.76 0.73 0.74 0.74 0.73 0.73 0.77 0.79 0.71 0.74 0.70 0.71
Deep NN 1.00 0.84 0.78 0.80 0.78 0.85 0.79 0.77 0.78 0.84 0.81 0.75 0.81 0.84 0.90
LASSO 0.89 0.76 0.74 0.76 0.74 0.72 0.74 0.75 0.75 0.79 0.83 0.74 0.79 0.83 1.04
adaLASSO 0.91 0.73 0.71 0.73 0.71 0.70 0.72 0.73 0.74 0.76 0.80 0.69 0.74 0.73 0.80
Panel (b): MAE ratio
Forecasting horizon
Model 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m
RF 0.81 0.72 0.71 0.75 0.73 0.73 0.70 0.68 0.72 0.75 0.77 0.67 0.74 0.77 0.77
SCAD 0.84 0.77 0.78 0.81 0.81 0.78 0.80 0.74 0.79 0.84 0.87 0.76 0.80 0.87 0.99
BTrees 0.84 0.75 0.74 0.78 0.76 0.77 0.74 0.70 0.73 0.75 0.78 0.70 0.77 0.75 0.71
Deep NN 0.97 0.84 0.83 0.85 0.84 0.87 0.79 0.79 0.83 0.85 0.84 0.79 0.83 0.93 0.91
LASSO 0.94 0.78 0.75 0.79 0.79 0.74 0.74 0.76 0.80 0.82 0.89 0.78 0.88 1.01 1.21
adaLASSO 0.86 0.71 0.71 0.75 0.73 0.71 0.69 0.71 0.75 0.76 0.82 0.69 0.75 0.80 0.85
Panel (c): MAD ratio
Forecasting horizon
Model 1 2 3 4 5 6 7 8 9 10 11 12 3m 6m 12m
RF 0.70 0.63 0.77 0.84 0.75 0.73 0.65 0.64 0.73 0.69 0.71 0.58 0.71 0.80 0.59
SCAD 0.80 0.75 0.79 0.83 0.89 0.77 0.71 0.69 0.80 0.78 0.84 0.70 0.82 1.02 0.81
BTrees 0.73 0.70 0.79 0.86 0.80 0.78 0.69 0.71 0.76 0.68 0.77 0.65 0.79 0.88 0.61
Deep NN 0.85 0.73 0.90 0.97 0.94 0.83 0.78 0.86 0.95 0.83 0.87 0.83 0.82 1.15 0.87
LASSO 0.83 0.68 0.70 0.80 0.73 0.68 0.66 0.64 0.76 0.73 0.73 0.60 0.77 0.88 0.65
adaLASSO 0.81 0.65 0.78 0.81 0.81 0.76 0.63 0.63 0.81 0.78 0.82 0.66 0.75 0.90 0.76
NOTE: The table reports, for each forecasting horizon, the root mean squared error (RMSE), mean absolute error (MAE), and median absolute deviation from the median (MAD) ratios with respect to the random walk model for the full out-of-sample period
(1990–2015). The last three columns represent, respectively, the ratios for the accumulated three, six, and twelve-month forecasts. The statistics for the best-performing model are highlighted in bold.
RF is compared with the other linear models; see Figures
S.7–S.15.
Although it is clear that nonlinearity is present in the dynamics
of inflation, it is very difficult to fully uncover the nonlinear
mechanism driving the forecasts: in our experiment, we estimated
12 different RF models (one for each horizon) for each rolling
window, and since we have 312 windows, there is a total of 3744
different models to analyze.
5.7. Robustness: Alternative Nonlinear Models and
Other Penalties
As a robustness check, we compute forecasts from other
nonlinear alternatives. The first is a boosted regression tree
(BTree) as in Friedman (2001). The second one is based
on the results of Gu, Kelly, and Xiu (2018) and is a Deep
Neural Network (DeepNN) with three hidden layers and 32,
16, and 8 rectified linear units (ReLU) in each layer, respec-
tively. The DeepNN model is estimated by the stochastic
gradient descent algorithm. We also consider two second-order
polynomial models with interaction terms estimated either by
LASSO or adaLASSO. Finally, in addition to these nonlinear
alternatives, we estimate a linear model with a SCAD penalty.
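Two of these alternatives can be sketched with scikit-learn. The layer sizes, activation, and optimizer follow the text; the paper does not specify a library, and the simulated data and remaining hyperparameters below are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(2)
X = rng.standard_normal((400, 20))              # stand-in predictor set
y = np.tanh(X[:, 0]) + 0.3 * X[:, 1] + 0.1 * rng.standard_normal(400)

# DeepNN: three hidden layers with 32, 16, and 8 ReLU units,
# estimated by stochastic gradient descent, as described in the text
deep_nn = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 16, 8), activation="relu",
                 solver="sgd", learning_rate_init=0.01, max_iter=2000,
                 random_state=0),
)
deep_nn.fit(X[:300], y[:300])

# second-order polynomial with interaction terms, estimated by LASSO
poly_lasso = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, include_bias=False),
    Lasso(alpha=0.05, max_iter=10000),
)
poly_lasso.fit(X[:300], y[:300])

rmse_nn = np.sqrt(np.mean((deep_nn.predict(X[300:]) - y[300:]) ** 2))
rmse_pl = np.sqrt(np.mean((poly_lasso.predict(X[300:]) - y[300:]) ** 2))
print(rmse_nn, rmse_pl)
```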
Table 7 reports, for each forecasting horizon, the RMSE, the
MAE, and the MAD ratios with respect to the RW model for
the full out-of-sample period (1990–2015). In terms of RMSE,
RF is the best model in 11 of 15 cases. Among these, in
five cases there is a tie with the polynomial model estimated
with adaLASSO. In general, the DeepNN model has the worst
performance. SCAD is also outperformed by the RF model.
The results for MAE are quite similar. On the other hand, there
are some differences when MAD is considered. Although the
RF model is still the best alternative, the polynomial model
estimated with LASSO outperforms the one estimated with
adaLASSO.
These results corroborate the superiority of RF. Furthermore,
the fact that the polynomial models with interactive terms are
competitive sheds some light on the nature of the nonlinearity
captured by the tree-based models. We should bear in mind
that regression trees are models that explore interactions among
covariates.
6. CONCLUSIONS
We show that with the recent advances in ML methods
and the availability of new and rich datasets, it is possible
to improve inflation forecasts. Models such as LASSO, RF,
and others are able to produce more accurate forecasts than
the standard benchmarks. These results highlight the benefits
of ML methods and big data for macroeconomic forecasting.
Although our article focuses on inflation forecasting in the US,
one can easily apply ML methods to forecast other macroeco-
nomic series in a variety of countries.
The RF model deserves special attention, as it robustly
delivers the smallest errors. Its good performance is due to both
potential nonlinearities and its variable selection mechanism.
The selection of variables for RF models is stable across
horizons. These variables are mostly selected from the follow-
ing groups: prices, exchange and interest rates, and the housing
and labor markets. Although it is difficult to disentangle the
precise sources of the nonlinearities that the RF model uncov-
ers, the variable selection may shed light on them. In fact, there
are many theoretical arguments justifying nonlinear relation-
ships among inflation, interest rate, labor market outcomes,
and housing. For example, the relationship between inflation
and employment depends on the degree of slackness in the
economy. Uncertainty might also induce nonlinearities. Finally,
part of the out-of-sample window encompasses quarters when
the zero lower bound on nominal interest rates is binding, which
is another source of nonlinearity. This out-of-sample window
also encompasses a period in which a housing bubble led to a
credit crunch, events with highly nonlinear consequences.
RF is also the winning method in the periods of expansion
and recession as well as in the periods of low uncertainty and
high uncertainty. The gains from using RF are larger in periods
of recession and high uncertainty. RF also outperforms other
methods during and after the Great Recession, when uncer-
tainty skyrocketed and when the zero lower bound was binding.
Taken together, these results suggest that the relationships
among key macroeconomic variables can be highly nonlinear. If
this is the case, the linear methods applied in the profession not
only to forecast variables but also to achieve other objectives
such as approximate DSGE models might lead to inaccurate
results, especially during bad states of the economy.
SUPPLEMENTARY MATERIALS
The supporting material consists of a complete description on the data and
required transformations, descriptive statistics, more detailed results for the
CPI inflation as well as results concerning the PCE and the CPI core inflation
measures.
ACKNOWLEDGMENTS
We are very grateful to the coeditor, Christian B. Hansen; the associate
editor; and two anonymous referees for very helpful comments. We thank
Federico Bandi, Anders B. Kock, and Michael Wolf for comments during
the workshop “Trends in Econometrics: Machine Learning, Big Data, and
Financial Econometrics,” Rio de Janeiro, October 2017. The participants
of the “Third International Workshop in Financial Econometrics,” Arraial
d’Ajuda, October 2017, the workshop “Big Data, Machine Learning and
the Macroeconomy,” Oslo, October 2017, and the National Symposium on
Probability and Statistics (SINAPE), São Pedro, September 2018, are also
gratefully acknowledged. We also thank Alberto Cavallo for an insightful
discussion during the workshop “Measuring and Analyzing the Economy using
Big Data” at the Central Bank of Brazil, November 2017. We also acknowledge
the suggestions received during seminars at the Università Ca’Foscari, Venice;
at the School of Applied Mathematics at FGV/RJ; and at the Department
of Economics at Duke University. Special thanks also to Tim Bollerslev, Jia
Li, Andrew Patton, Rogier Quaedvlieg, Peter R. Hansen, Lucciano Villacorta,
Mark Watson, and Brian Weller for inspiring discussions. Finally, we are
grateful to Michael McCracken for his help with the real-time dataset. The
views expressed herein do not necessarily reflect the position of the Central
Bank of Chile.
FUNDING
The research of the first author is partially supported by CNPq and FAPERJ.
[Received November 2018. Revised April 2019.]
REFERENCES
Atkeson, A., and Ohanian, L. (2001), “Are Phillips Curves Useful for Forecast-
ing Inflation?,” Federal Reserve Bank of Minneapolis Quarterly Review,
25, 2–11. [1,2,3]
Bai, J., and Ng, S. (2003), “Inferential Theory for Factor Models of Large
Dimensions,” Econometrica, 71, 135–171. [5]
——— (2008), “Forecasting Economic Time Series Using Targeted Predic-
tors,” Journal of Econometrics, 146, 304–317. [2,5,6]
——— (2009), “Boosting Diffusion Indexes,” Journal of Applied Econometrics, 24, 607–629. [2,5,8]
Bańbura, M., Giannone, D., and Reichlin, L. (2010), “Large Bayesian Vector
Autoregressions,” Journal of Applied Econometrics, 25, 71–92. [4]
Bloom, N. (2009), “The Impact of Uncertainty Shocks,” Econometrica,
77, 623–685. [3]
Breiman, L. (1996), “Bagging Predictors,” Machine Learning, 24, 123–140.
[6,7]
——— (2001), “Random Forests,” Machine Learning, 45, 5–32. [2,7]
Chakraborty, C., and Joseph, A. (2017), “Machine Learning at Central Banks,”
Working Paper 674, Bank of England Staff Working Paper. [2]
Chen, Y.-C., Turnovsky, S., and Zivot, E. (2014), “Forecasting Inflation Using
Commodity Price Aggregates,” Journal of Econometrics, 183, 117–134. [3]
Dellas, H., Gibson, H., Hall, S., and Tavlas, G. (2018), “The Macroeconomic
and Fiscal Implications of Inflation Forecast Errors,” Journal of Economic
Dynamics and Control, 93, 203–217. [2]
Eggertsson, G. B., and Woodford, M. (2003), “Zero Bound on Interest Rates
and Optimal Monetary Policy,” Brookings Papers on Economic Activity,
2003, 139–233. [3]
Elliott, G., Gargano, A., and Timmermann, A. (2013), “Complete Subset
Regressions,” Journal of Econometrics, 177, 357–373. [3,6]
——— (2015), “Complete Subset Regressions With Large-Dimensional Sets
of Predictors,” Journal of Economic Dynamics and Control, 54, 86–110.
[6]
Fan, J., and Li, R. (2001), “Variable Selection via Nonconcave Penalized
Likelihood and Its Oracle Properties,” Journal of the American Statistical
Association, 96, 1348–1360. [2]
Faust, J., and Wright, J. (2013), “Forecasting Inflation,” in Handbook of
Economic Forecasting (Vol. 2A), eds. G. Elliott and A. Timmermann,
Amsterdam: Elsevier. [1]
Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2003), “Do Financial
Variables Help Forecasting Inflation and Real Activity in the Euro Area?,”
Journal of Monetary Economics, 50, 1243–1255. [3]
Friedman, J. (2001), “Greedy Function Approximation: A Gradient Boosting
Machine,” The Annals of Statistics, 29, 1189–1232. [21]
Garcia, M., Medeiros, M., and Vasconcelos, G. (2017), “Real-Time Inflation
Forecasting With High-Dimensional Models: The Case of Brazil,” International
Journal of Forecasting, 33, 679–693. [3]
Giacomini, R., and White, H. (2006), “Tests of Conditional Predictive Ability,”
Econometrica, 74, 1545–1578. [4,10]
Giannone, D., Lenza, M., and Primiceri, G. (2018), “Economic Predictions
With Big Data: The Illusion of Sparsity,” Working Paper, Northwestern
University. [2,10]
Groen, J., Paap, R., and Ravazzolo, F. (2013), “Real-Time Inflation Forecasting
in a Changing World,” Journal of Business and Economic Statistics, 31,
29–44. [3]
Gu, S., Kelly, B., and Xiu, D. (2018), “Empirical Asset Pricing With Machine
Learning,” Working Paper, University of Chicago. [1,21]
Hall, A. S. (2018), “Machine Learning Approaches to Macroeconomic Fore-
casting,” The Federal Reserve Bank of Kansas City Economic Review, 103,
63. [2]
Hansen, B. E., and Racine, J. S. (2012), “Jackknife Model Averaging,” Journal
of Econometrics, 167, 38–46. [6]
Hansen, P. (2005), "A Test for Superior Predictive Ability," Journal of Business and Economic Statistics, 23, 365–380. [8,9]
Hansen, P. R., Lunde, A., and Nason, J. M. (2011), “The Model Confidence
Set,” Econometrica, 79, 453–497. [8,9,10]
Hastie, T., Tibshirani, R., and Friedman, J. (2001), The Elements of Statistical
Learning: Data Mining, Inference and Prediction, New York: Springer. [7]
Hoerl, A. E., and Kennard, R. W. (1970), “Ridge Regression: Biased Estimation
for Nonorthogonal Problems,” Technometrics, 12, 55–67. [5]
Iacoviello, M. (2005), “House Prices, Borrowing Constraints, and Monetary
Policy in the Business Cycle,” American Economic Review, 95, 739–764.
[3]
Inoue, A., and Kilian, L. (2008), "How Useful Is Bagging in Forecasting Economic Time Series? A Case Study of U.S. CPI Inflation," Journal of the American Statistical Association, 103, 511–522. [3,8]
Jurado, K., Ludvigson, S., and Ng, S. (2015), "Measuring Uncertainty," American Economic Review, 105, 1177–1215. [14]
Kock, A., and Callot, L. (2015), “Oracle Inequalities for High Dimensional
Vector Autoregressions,” Journal of Econometrics, 186, 325–344. [8]
Lucas, R. (1987), Models of Business Cycles, Oxford: Blackwell. [2]
Makridakis, S., Spiliotis, E., and Assimakopoulos, V. (2018), “Statistical and
Machine Learning Forecasting Methods: Concerns and Ways Forward,”
PLoS One, 13, e0194889. [3,4]
McCracken, M., and Ng, S. (2016), “FRED-MD: A Monthly Database for
Macroeconomic Research,” Journal of Business and Economic Statistics,
34, 574–589. [2,4,17]
Medeiros, M., and Mendes, E. (2016), "ℓ1-Regularization of High-Dimensional Time-Series Models With Non-Gaussian and Heteroskedastic Errors," Journal of Econometrics, 191, 255–271. [3,5,8,17]
Medeiros, M., and Vasconcelos, G. (2016), “Forecasting Macroeconomic
Variables in Data-Rich Environments,” Economics Letters, 138, 50–52.
[6]
Mian, A., and Sufi, A. (2009), “The Consequences of Mortgage Credit Expan-
sion: Evidence From the U.S. Mortgage Default Crisis,” Quarterly Journal
of Economics, 124, 1449–1496. [3]
Mullainathan, S., and Spiess, J. (2017), “Machine Learning: An Applied
Econometric Approach,” Journal of Economic Perspectives, 31, 87–106.
[1]
Nakamura, E. (2005), "Inflation Forecasting Using a Neural Network," Economics Letters, 86, 373–378. [3]
Quaedvlieg, R. (2017), “Multi-Horizon Forecast Comparison,” Working Paper,
Erasmus School of Economics. [8,9,10,14]
Scornet, E., Biau, G., and Vert, J.-P. (2015), “Consistency of Random Forests,”
The Annals of Statistics, 43, 1716–1741. [2]
Shiller, R. J. (2014), "Speculative Asset Prices," American Economic Review, 104, 1486–1517. [3]
Stock, J., and Watson, M. (1999), “Forecasting Inflation,” Journal of Monetary
Economics, 44, 293–335. [2,3]
——— (2002), “Macroeconomic Forecasting With Diffusion Indexes,” Journal
of Business and Economic Statistics, 20, 147–162. [1]
——— (2007), "Why Has U.S. Inflation Become Harder to Forecast?," Journal of Money, Credit and Banking, 39, 3–33. [1,2]
——— (2010), “Modeling Inflation After the Crisis,” Technical Report,
National Bureau of Economic Research. [1]
Svensson, L. E. O., and Woodford, M. (2004), “Implementing Optimal Policy
Through Inflation-Forecast Targeting,” in The Inflation-Targeting Debate,
eds. B. S. Bernanke and M. Woodford, Chicago: University of Chicago
Press, pp. 19–92. [2]
Teräsvirta, T., van Dijk, D., and Medeiros, M. (2005), "Linear Models, Smooth Transition Autoregressions and Neural Networks for Forecasting Macroeconomic Time Series: A Reexamination" (with discussion), International Journal of Forecasting, 21, 755–774. [3]
Tibshirani, R. (1996), "Regression Shrinkage and Selection via the LASSO," Journal of the Royal Statistical Society, Series B, 58, 267–288. [5]
Wager, S., and Athey, S. (2018), "Estimation and Inference of Heterogeneous Treatment Effects Using Random Forests," Journal of the American Statistical Association, 113, 1228–1242. [2]
Zhang, X., Wan, A. T., and Zou, G. (2013), "Model Averaging by Jackknife Criterion in Models With Dependent Data," Journal of Econometrics, 174, 82–94. [6]
Zou, H. (2006), “The Adaptive Lasso and Its Oracle Properties,” Journal of the
American Statistical Association, 101, 1418–1429. [5]
Zou, H., and Hastie, T. (2005), “Regularization and Variable Selection via the
Elastic Net,” Journal of the Royal Statistical Society, Series B, 67, 301–320.
[5]
... Set two, composed mostly of univariate models, although inferior in performance to group 1, exhibits stable predictive ability. Finally, group 3, made up chiefly of ARIMA models with leading variables, shows predictive performance inferior to the ... Following Medeiros et al. (2019), Table 1 presents summary statistics of predictive performance when all forecast horizons are considered. Columns (1), (2), and (3) report the average RMSE, MAE, and MAD, respectively. ...
... Traditionally, when evaluating the predictive ability of forecasting models, the rolling-window approach is the preferred one (Medeiros et al., 2019; Medeiros, Vasconcelos and Freitas, 2015; Panagiotelis et al., 2019; Tarassow, 2019), since it attenuates problems related to outliers and structural breaks. In addition, it produces a series of forecast errors at different forecast horizons, used in the empirical evaluation of the models considered, similar to what is observed in the previous sections. ...
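The rolling-window evaluation scheme referred to above can be sketched in a few lines of Python (a minimal hypothetical illustration, not code from any of the cited papers): re-estimate on a fixed-length window of history, forecast one step ahead, record the error, and slide the window forward.

```python
def rolling_window_errors(y, window, forecast=lambda history: history[-1]):
    """One-step-ahead forecast errors over a rolling estimation window.

    For each origin t, the forecaster sees only y[t-window:t] and predicts
    y[t]; the resulting error series feeds RMSE/MAE/MAD comparisons.
    The default forecaster is a random walk (predict the last value).
    """
    errors = []
    for t in range(window, len(y)):
        history = y[t - window:t]
        errors.append(y[t] - forecast(history))
    return errors

# Random-walk benchmark on a toy series
errs = rolling_window_errors([1.0, 2.0, 2.0, 3.0, 5.0], window=2)
print(errs)  # [0.0, 1.0, 2.0]
```

Any competing model plugs in through the `forecast` argument, so all models are scored on exactly the same error series.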
Research
Full-text available
In this study, we investigate the ability of leading variables, among them hospitalizations for assault, to forecast the number of homicides in Brazil. The main objective of this research is to fill the gap created by the lag with which homicide figures are released in the country, thereby allowing up-to-date conjunctural analyses. To this end, using a rolling-window scheme and the model confidence set (MCS) approach, we investigate whether leading-variable models display predictive performance superior to a set of univariate models. Applying the MCS approach, considering different evaluation statistics, loss functions, and estimation windows, we find strong evidence that the leading variables used provide additional informational content for forecasting Brazilian criminal dynamics, with leading-variable models systematically outperforming univariate models. On average, the best leading-variable models deliver improvements of 60% relative to the random walk benchmark in terms of root mean squared error (RMSE), mean absolute error (MAE), and mean absolute deviation from the mean (MAD).
... Finally, the theory and use of machine learning techniques in economics are discussed in more detail in Varian (2014), Kleinberg et al (2015) and Mullainathan and Spiess (2017). Random forests were introduced by Breiman (2001), and have been shown to perform particularly well in economic forecasting (see for instance the comprehensive analysis for inflation in Medeiros et al (2021)). ...
... At least as long as p < 4 is excluded. This is in line with the conclusion of Medeiros et al (2021) for inflation prediction. Predictor importance factors are based on how much each factor contributed to reduce RMSEs after the splits in each tree. ...
Article
This study analyses oil price movements through the lens of an agnostic random forest model, which is based on 1,000 regression trees. It shows that this highly disciplined, yet flexible computational model reduces in-sample root mean square errors (RMSEs) by 65% relative to a standard linear least square model that uses the same set of 11 explanatory factors. In forecasting exercises the RMSE reduction ranges between 51% and 68%, highlighting the relevance of non-linearities in oil markets. The results underscore the importance of incorporating financial factors into oil models: US interest rates, the dollar and the VIX together account for 39% of the models’ RMSE reduction in the post-2010 sample, rising to 48% in the post-2020 sample. If COVID-19 is also considered as a risk factor, these shares become even larger.
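The role of non-linearities can be seen in a toy example (ours, not the paper's model): on a purely quadratic relationship the best straight line is flat and uninformative, while even a single tree split, the building block a random forest averages over, cuts the in-sample squared error substantially.

```python
def ols_sse(x, y):
    """In-sample sum of squared errors of the least-squares straight line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

def stump_sse(x, y):
    """In-sample SSE of the best single split (a depth-1 regression tree),
    found by exhaustive search over candidate thresholds."""
    best = float("inf")
    for s in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= s]
        right = [yi for xi, yi in zip(x, y) if xi > s]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((yi - ml) ** 2 for yi in left) + sum((yi - mr) ** 2 for yi in right)
        best = min(best, sse)
    return best

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [xi ** 2 for xi in x]              # quadratic: zero linear signal
print(ols_sse(x, y), stump_sse(x, y))  # 14.0 9.0
```

Deeper trees, and averages of many randomized trees, shrink the residual error further; the straight line cannot improve at all on this target.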
... We provide compelling evidence that the random forest model consistently beats the benchmarks and other ML tools in failure prediction. The model's superior performance stems from its variable selection mechanism and its ability to capture wider underlying patterns and relationships in the dataset (Medeiros et al., 2021). In addition, we find that the model's predictive ability differs when we account for crisis/noncrisis periods and for firm- and industry-level heterogeneity. ...
Article
Full-text available
In this study we investigate the ability of machine-learning techniques to predict firm failures, and we compare them against alternatives. Using data on business and financial risks on UK firms over 1994-2019, we document that machine-learning models are systematically more accurate than a discrete hazard benchmark. We conclude that the random forest model outperforms other models in failure prediction. In addition, we show that the improved predictive power of the random forest model relative to its counterparts, persists when we consider extreme economic events as well as firm and industry heterogeneity. Finally, we find that financial factors affect failure probabilities.
Article
We present a hierarchical architecture based on recurrent neural networks for predicting disaggregated inflation components of the Consumer Price Index (CPI). While the majority of existing research is focused on predicting headline inflation, many economic and financial institutions are interested in its partial disaggregated components. To this end, we developed the novel Hierarchical Recurrent Neural Network (HRNN) model, which utilizes information from higher levels in the CPI hierarchy to improve predictions at the more volatile lower levels. Based on a large dataset from the US CPI-U index, our evaluations indicate that the HRNN model significantly outperforms a vast array of well-known inflation prediction baselines. Our methodology and results provide additional forecasting measures and possibilities to policy and market makers on sectoral and component-specific price changes.
Article
We propose an out-of-sample prediction approach that combines unrestricted mixed-data sampling with machine learning (mixed-frequency machine learning, MFML). We use the MFML approach to generate a sequence of nowcasts and backcasts of weekly unemployment insurance initial claims based on a rich trove of daily Google Trends search volume data for terms related to unemployment. The predictions are based on linear models estimated via the LASSO and elastic net, nonlinear models based on artificial neural networks, and ensembles of linear and nonlinear models. Nowcasts and backcasts of weekly initial claims based on models that incorporate the information in the daily Google Trends search volume data substantially outperform those based on models that ignore the information. Predictive accuracy increases as the nowcasts and backcasts include more recent daily Google Trends data. The relevance of daily Google Trends data for predicting weekly initial claims is strongly linked to the COVID-19 crisis.
Article
Machine learning has recently entered the mortality literature in order to improve the forecasts of stochastic mortality models. This paper proposes to use two pure, tree-based machine learning models: random forests and gradient boosting, based on the differenced log-mortality rates to produce more accurate mortality forecasts. These forecasts are compared with forecasts from traditional, stochastic mortality models and with forecasts from random forests and gradient boosting variants of the stochastic models. The comparisons are based on the Model Confidence Set procedure. The results show that the pure, tree-based models significantly outperform all other models in the majority of cases considered. To address the lack of interpretability issue associated with machine learning models, we demonstrate how to extract information about the relationships uncovered by the tree-based models. For this purpose, we consider variable importance, partial dependence plots, and variable split conditions. Results from the in-sample fit suggest that tree-based models can be very useful tools for detecting patterns within and between variables that are not commonly identifiable with traditional methods.
Article
Full-text available
Adopting the conservation of resources theory, this research explores the joint influence of multiple factors affecting the choice of housing, an instance of a high involvement product. We develop an innovative Machine Learning approach to identify the most significant set of factors affecting consumers' housing choices. It was found that housing choices were primarily bound by energy resources and secondarily by other personal, conditional, and object resources. The study identified 25 key factors, many of which previous studies have not explored. The new factors include government payments, vehicle possession, time living with children, marriage duration, current location duration, and health conditions. The results suggest that joint effects of factors are more prominent than individual effects in influencing the choice of high‐involvement products. Further, the study suggests that ML methods are more robust than traditional methods and can be applied to analyse other types of high involvement products. The findings can assist real estate investors, policymakers and other stakeholders to understand sophisticated market behaviours to develop better and tailored housing strategies.
Article
In this paper, we assess whether using non-linear dimension reduction techniques pays off for forecasting inflation in real-time. Several recent methods from the machine learning literature are adopted to map a large dimensional dataset into a lower-dimensional set of latent factors. We model the relationship between inflation and the latent factors using constant and time-varying parameter (TVP) regressions with shrinkage priors. Our models are then used to forecast monthly US inflation in real-time. The results suggest that sophisticated dimension reduction methods yield inflation forecasts that are highly competitive with linear approaches based on principal components. Among the techniques considered, the Autoencoder and squared principal components yield factors that have high predictive power for one-month- and one-quarter-ahead inflation. Zooming into model performance over time reveals that controlling for non-linear relations in the data is of particular importance during recessionary episodes of the business cycle or the current COVID-19 pandemic.
Article
This paper examines several ways to extract timely economic signals from newspaper text and shows that such information can materially improve forecasts of macroeconomic variables including GDP, inflation, and unemployment. Our text is drawn from three popular UK newspapers that collectively represent UK newspaper readership in terms of political perspective and editorial style. Exploiting newspaper text can improve economic forecasts both unconditionally and when conditioning on other relevant information, but the performance of the latter varies according to the method used. Incorporating text into forecasts by combining counts of terms with supervised machine learning delivers the highest forecast improvements relative to existing text‐based methods. These improvements are most pronounced during periods of economic stress when, arguably, forecasts matter most.
Chapter
Forecasts of CPI inflation are critical in many public policy areas and private business planning. Many alternative approaches for selecting CPI forecasting models have been proposed. The standard practice in CPI forecasting is to pursue a winner-take-all perspective by which, for each dataset, a single model, believed to be the best, is selected from a set of competing approaches. However, model combination methods are becoming a common alternative to using a single time series method. We propose and apply a flexible Bayesian model averaging (BMA) approach to CPI inflation models to mitigate conceptual uncertainty and improve short-term out-of-sample forecasting accuracy. The model space includes novel machine learning and deep learning algorithms and traditional univariate seasonal time series methods. The empirical results on United States and Euro Area data reveal that BMA increases the predictive accuracy of CPI inflation forecasts in short-term exercises. The reduced out-of-sample forecast errors of BMA may be explained by its flexibility and capacity to select models that capture the diversity and complexity of inflation determinants and to estimate model weights that reflect the out-of-sample accuracy of a model.
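One simple way to operationalize accuracy-based model weights, a crude stand-in (purely for illustration) for the posterior model probabilities used in BMA, is to make each model's weight inversely proportional to its out-of-sample mean squared error:

```python
def inverse_mse_weights(errors_by_model):
    """Combination weights proportional to 1/MSE of each model's
    out-of-sample forecast errors; the weights sum to one."""
    mses = [sum(e * e for e in errs) / len(errs) for errs in errors_by_model]
    inv = [1.0 / m for m in mses]
    total = sum(inv)
    return [w / total for w in inv]

# Model A has MSE 1, model B has MSE 4 -> A receives 4x the weight
print(inverse_mse_weights([[1.0, -1.0], [2.0, -2.0]]))  # [0.8, 0.2]
```

The combined forecast is then the weight-sum of the individual model forecasts; more accurate models dominate the average without any single model being discarded outright.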
Article
Full-text available
Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions.
Article
Full-text available
The accuracy of inflation forecasts has important implications for macroeconomic stability and real interest rates in economies with nominal rigidities. Erroneous forecasts destabilize output, undermine the conduct of monetary policy under inflation targeting and affect the cost of both short and long-term government borrowing. We propose a new method for forecasting inflation that combines individual forecasts using time-varying-coefficient estimation along with an alternative method based on neural nets. Its application to forecast data from the US and the euro area produces superior performance relative to the standard practice of using individual or linear combinations of individual forecasts, especially during periods marked by structural changes.
Article
We propose a new method for estimation in linear models. The ‘lasso’ minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree‐based models are briefly described.
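The mechanism that produces exactly-zero coefficients is easiest to see in the orthonormal-design special case, where the lasso solution is obtained by soft-thresholding each OLS coefficient (a textbook identity, sketched here for illustration):

```python
def soft_threshold(b, lam):
    """Lasso coefficient under an orthonormal design: shrink the OLS
    estimate b toward zero by lam, snapping to exactly 0.0 when |b| <= lam.
    """
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

ols = [3.0, -0.5, 1.25, 0.125]
lasso = [soft_threshold(b, 1.0) for b in ols]
print(lasso)  # [2.0, 0.0, 0.25, 0.0] -- small coefficients become exactly zero
```

This is the sense in which the lasso combines subset selection (exact zeros) with ridge-like shrinkage (every surviving coefficient is pulled toward zero).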
Article
Many scientific and engineering challenges---ranging from personalized medicine to customized marketing recommendations---require an understanding of treatment effect heterogeneity. In this paper, we develop a non-parametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm. Given a potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect, and have an asymptotically Gaussian and centered sampling distribution. We also discuss a practical method for constructing asymptotic confidence intervals for the true treatment effect that are centered at the causal forest estimates. Our theoretical results rely on a generic Gaussian theory for a large family of random forest algorithms that, to our knowledge, is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference. In experiments, we find causal forests to be substantially more powerful than classical methods based on nearest-neighbor matching, especially as the number of covariates increases.
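The idea of leaf-level treatment effects can be caricatured in a few lines (our toy sketch, not the authors' honest-splitting algorithm): partition the sample on a covariate and, within each leaf, estimate the effect as the treated-minus-control mean outcome. A causal forest averages many such randomized, honestly estimated partitions.

```python
def leaf_effects(x, w, y, split):
    """Depth-1 'causal tree': estimate a treatment effect per leaf as the
    difference in mean outcomes between treated (w=1) and control (w=0)
    units falling in that leaf."""
    effects = []
    for in_leaf in (lambda xi: xi <= split, lambda xi: xi > split):
        treated = [yi for xi, wi, yi in zip(x, w, y) if in_leaf(xi) and wi == 1]
        control = [yi for xi, wi, yi in zip(x, w, y) if in_leaf(xi) and wi == 0]
        effects.append(sum(treated) / len(treated) - sum(control) / len(control))
    return effects

x = [0, 0, 0, 0, 1, 1, 1, 1]   # covariate defining the two leaves
w = [1, 0, 1, 0, 1, 0, 1, 0]   # treatment indicator
y = [2.0, 1.0, 2.0, 1.0, 5.0, 1.0, 5.0, 1.0]
print(leaf_effects(x, w, y, split=0.5))  # [1.0, 4.0]: the effect is heterogeneous
```

The two leaves recover different effects (1.0 versus 4.0), which is precisely the heterogeneity a causal forest is designed to detect and attach confidence intervals to.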
Article
Machines are increasingly doing "intelligent" things. Face recognition algorithms use a large dataset of photos labeled as having a face or not to estimate a function that predicts the presence y of a face from pixels x. This similarity to econometrics raises questions: How do these new empirical tools fit with what we know? As empirical economists, how can we use them? We present a way of thinking about machine learning that gives it its own place in the econometric toolbox. Machine learning not only provides new tools, it solves a different problem. Specifically, machine learning revolves around the problem of prediction, while many economic applications revolve around parameter estimation. So applying machine learning to economics requires finding relevant tasks. Machine learning algorithms are now technically easy to use: you can download convenient packages in R or Python. This also raises the risk that the algorithms are applied naively or their output is misinterpreted. We hope to make them conceptually easier to use by providing a crisper understanding of how these algorithms work, where they excel, and where they can stumble—and thus where they can be most usefully applied.