International Journal of Forecasting 39 (2023) 1145–1162
Forecasting CPI inflation components with Hierarchical
Recurrent Neural Networks✩
Oren Barkan a, Jonathan Benchimol b, Itamar Caspi b, Eliya Cohen c,
Allon Hammer c, Noam Koenigstein c,∗
a Department of Computer Science, The Open University, Israel
b Research Department, Bank of Israel, Israel
c Iby and Aladar Fleischman Faculty of Engineering, Tel Aviv University, Israel
article info
Keywords:
Inflation Forecasting
Disaggregated Inflation
Consumer Price Index
Machine Learning
Gated Recurrent Unit
Recurrent Neural Networks
abstract
We present a hierarchical architecture based on recurrent neural networks for predicting
disaggregated inflation components of the Consumer Price Index (CPI). While the
majority of existing research is focused on predicting headline inflation, many economic
and financial institutions are interested in its partial disaggregated components. To this
end, we developed the novel Hierarchical Recurrent Neural Network (HRNN) model,
which utilizes information from higher levels in the CPI hierarchy to improve predictions
at the more volatile lower levels. Based on a large dataset from the US CPI-U index, our
evaluations indicate that the HRNN model significantly outperforms a vast array of well-
known inflation prediction baselines. Our methodology and results provide policymakers and market makers with additional forecasting measures and possibilities regarding sectoral and component-specific price changes.
©2022 The Authors. Published by Elsevier B.V. on behalf of International Institute of
Forecasters. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/by/4.0/).
1. Introduction
The consumer price index (CPI) is a measure of the av-
erage change over time in the prices paid by a representa-
tive consumer for a common basket of goods and services.
The CPI attempts to quantify and measure the average
cost of living in a given country by estimating the pur-
chasing power of a single unit of currency. Therefore, it is
the key macroeconomic indicator for measuring inflation
(or deflation). As such, the CPI is a major driving force in
the economy, influencing a plethora of market dynamics.
In this work, we present a novel model based on recurrent
neural networks (RNNs) for forecasting disaggregated CPI
inflation components.
✩The views expressed in this paper are those of the authors and
do not necessarily reflect the views of the Bank of Israel.
∗Corresponding author.
E-mail address: noamk@tauex.tau.ac.il (N. Koenigstein).
In the mid-1980s, many advanced economies began
a major process of disinflation known as the Great
Moderation. This period was characterized by steady low
inflation and moderate yet steady economic growth (Faust & Wright, 2013). Later, the Global Financial Crisis (GFC) of 2008, and more recently the economic effects of the Covid-19 pandemic, were met with unprecedented monetary policies, potentially altering the underlying inflation dynamics worldwide (Bernanke et al., 2018; Gilchrist et al., 2017; Woodford, 2012). While economists still
debate the underlying forces that drive inflation, all agree
on the importance and value of contemporary inflation re-
search, measurements, and estimation. Moreover, the CPI
is a composite index comprising an elaborate hierarchy
of sub-indexes each with its own dynamics and driving
forces. Hence, in order to better understand inflation dy-
namics, it is useful to deconstruct the CPI index and look
into the specific disaggregated components underneath
the main headline.
In the US, the CPI is calculated and reported by the
Bureau of Labor Statistics (BLS). It represents the cost of
a basket of goods and services across the country on a
monthly basis. The CPI is a hierarchical composite index
system that partitions all consumer goods and services
into a hierarchy of increasingly detailed categories. In
the US, the top CPI headline is composed of eight major
sector indexes: (1) Housing, (2) Food and Beverages, (3)
Medical Care, (4) Apparel, (5) Transportation, (6) Energy,
(7) Recreation, and (8) Other Goods and Services. Each
sector is composed of finer and finer sub-indexes until the
entry levels or ‘‘leaves’’ are reached. These entry-level in-
dexes represent concrete measurable products or services
whose price levels are being tracked. For example, the
White Bread entry is classified under the following eight-level hierarchy: All Items → Food and Beverages → Food at Home → Cereals and Bakery Products → Cereals and Cereal Products → Bakery Products → Bread → White Bread.
The ability to accurately estimate the upcoming disag-
gregated inflation rate is of high interest to policymakers
and market players: Inflation forecasting is a critical tool for adjusting monetary policies around the world (Friedman, 1961). Central banks predict future inflation trends to justify interest rate decisions and to control and maintain inflation around their targets. A better understanding of upcoming inflation dynamics at the component level can help inform decision-makers in setting optimal monetary policy (Ida, 2020). Predicting disaggregated
inflation rates is also important to fiscal authorities that
wish to forecast sectoral inflation dynamics to adjust so-
cial security payments and assistance packages to specific
industrial sectors. In the private sector, investors in fixed-
income markets wish to estimate future sectoral inflation
in order to foresee upcoming trends in discounted real
returns. Additionally, some private firms need to pre-
dict specific inflation components in order to forecast
price dynamics and mitigate risks accordingly. Finally,
both government and private debt levels and interest
payments heavily depend on the expected path of infla-
tion. These are just a few examples that emphasize the
importance of disaggregated inflation forecasting.
Most existing inflation forecasting models attempt to
predict the headline CPI while implicitly assuming that
the same approach can be effectively applied to its dis-
aggregated components (Faust & Wright, 2013). However, as we show below, and in line with the literature, the disaggregated components are more volatile and harder to predict. Moreover, changes in the CPI components are more prevalent at the lower levels than at the main categories. As a result, lower hierarchy levels often have fewer historical measurements for training modern machine learning algorithms.
In this work, we present the hierarchical recurrent
neural network (HRNN) model, a novel model based on
RNNs that utilizes the CPI’s inherent hierarchy for im-
proved predictions at its lower levels. The HRNN is a
hierarchical arrangement of RNNs analogous to the CPI’s
hierarchy. This architecture allows information to prop-
agate from higher to lower levels in order to mitigate
volatility and information sparsity that otherwise im-
pedes advanced machine learning approaches. Hence, a
key advantage of the HRNN model stems from its supe-
riority at inflation predictions at lower levels of the CPI
hierarchy. Our evaluations indicate that the HRNN out-
performs many existing baselines at inflation forecasting
of different CPI components below the top headline and
across different time horizons.
Finally, our data and code are publicly available on GitHub¹ to enable reproducibility and foster future evaluations of new methods. By doing so, we comply with the call to make data and algorithms more open and transparent to the community (Makridakis et al., 2018, 2020).
The remainder of the paper is organized as follows. Section 2 presents a literature review of baseline inflation forecasting models and machine learning models. Section 3 explains RNN methodologies. Our novel HRNN model is presented in Section 4. Section 5 describes the price data and data transformations. In Section 6, we present our results and compare them to alternative approaches. Finally, we conclude in Section 7 by discussing potential implications of the current research and several future directions.
2. Related work
While inflation forecasting is a challenging task of high
importance, the literature indicates that significant im-
provement upon basic time-series models and heuristics
is hard to achieve. Indeed, Atkeson & Ohanian (2001)
found that forecasts based on simple averages of past
inflation were more accurate than all other alternatives,
including the canonical Phillips curve and other forms
of structural models. Similarly, Stock & Watson (2007,
2010) provided empirical evidence for the superiority of
univariate models in forecasting inflation during the Great
Moderation period (1985 to 2007) and during the re-
covery following the GFC. More recently, Faust & Wright
(2013) conducted an extensive survey of inflation fore-
casting methods and found that a simple ‘‘glide path’’
prediction from the current inflation rate performs as well
as model-based forecasts for long-run inflation rates and
often outperforms them.
Recently, an increasing amount of effort has been di-
rected towards the application of machine learning
models for inflation forecasting. For example, Medeiros
et al. (2021) compared inflation forecasting with several
machine learning models such as lasso regression, ran-
dom forests, and deep neural networks. However,
Medeiros et al. (2021) mainly focused on using exogenous
features such as cash and credit availability, online prices,
housing prices, consumer data, exchange rates, and inter-
est rates. When exogenous features are considered, the
emphasis shifts from learning the endogenous time series
patterns to effectively extracting the predictive informa-
tion from the exogenous features. In contrast to Medeiros
et al. (2021), we preclude the use of any exogenous
features and focus on harnessing the internal patterns of
¹ The code and data are available at https://github.com/AllonHammer/CPI_HRNN.
the CPI series. Moreover, unlike previous works that dealt
with estimating the main headline, this work is focused
on predicting the disaggregated indexes that comprise the
CPI.
In general, machine learning methods flourish where
data are found in abundance and many training examples
are available. Unfortunately, this is not the case with CPI
inflation data. While a large number of relevant exogenous features exist, there are only 12 monthly readings annually. Hence, the amount of available training exam-
ples is limited. Furthermore, Stock & Watson (2007) show
that statistics such as the average inflation rate, condi-
tional volatility, and persistency levels are shifting in time.
Hence, inflation is a non-stationary process, which further
limits the amount of relevant historical data points.
Goulet Coulombe et al. (2022), Chakraborty & Joseph (2017), Athey (2018), and Mullainathan & Spiess (2017) present comprehensive surveys of general machine learning applications in economics. Here, we do not
attempt to cover the plethora of research employing ma-
chine learning for economic forecasting. Instead, we focus
on models that apply neural networks to CPI forecasting
in the next section.
This paper joins several studies that apply neural net-
work methods to the specific task of inflation forecasting:
Nakamura (2005) employed a simple feed-forward network to predict quarterly CPI headline values, with special emphasis placed on early stopping methodologies in order to prevent over-fitting. Their evaluations are based on US CPI data from 1978–2003, and predictions are compared against several autoregressive (AR) baselines. Our evaluations, presented in Section 6, confirm the finding of Nakamura (2005) that a fully connected network is indeed effective at predicting the headline CPI. However, when the CPI components are considered, we show that the model in this work demonstrates superior accuracy.
Choudhary & Haider (2012) used several neural net-
works to forecast monthly inflation rates in 28 countries
in the Organisation for Economic Cooperation and De-
velopment (OECD). Their findings showed that, on aver-
age, neural network models were superior in 45% of the
countries while simple AR models of order one (AR1) per-
formed better in 23% of the countries. They also proposed
combining an ensemble of multiple networks arithmeti-
cally for further accuracy.
Chen et al. (2001) explored semi-parametric nonlinear
autoregressive models with exogenous variables (NLARX)
based on neural networks. Their investigation covered
a comparison of different nonlinear activation functions
such as the sigmoid activation, radial basis activation, and
ridgelet activation.
McAdam & McNelis (2005) explored thick neural net-
work models that represent trimmed-mean forecasts from
several models. By combining the network with a linear
Phillips curve model, they predict the CPI for the US,
Japan, and Europe at different levels.
In contrast to the aforementioned works, our model
predicts monthly CPI values in all hierarchy levels. We
utilize information patterns from higher levels of the CPI
hierarchy in order to assist the predictions at lower levels.
Such predictions are more challenging due to the inherent
noise and information sparsity at the lower levels. More-
over, the HRNN model in this work is better equipped
to harness sequential patterns in the data by employing
recurrent neural networks. Finally, we exclude the use of
exogenous variables and rely solely on historical CPI data
to focus on internal CPI pattern modeling.
Almosova & Andresen (2019) employed long short-term memory (LSTM) networks for inflation forecasting. They compared their approach to multiple baselines such as autoregressive models, random walk models, seasonal autoregressive models, Markov switching models, and fully connected neural networks. At all time horizons, the root mean squared forecast error of their LSTM model was approximately one-third that of the random walk model and significantly lower than those of the other baselines.
As we explain in Section 3.3, our model uses gated recurrent units (GRUs), which are similar to LSTMs.
Unlike Almosova & Andresen (2019) and Zahara et al.
(2020), a key contribution of our model stems from its
ability to propagate useful information from higher levels
in the hierarchy down to the nodes at lower levels. By
ignoring the hierarchical relations between the different
CPI components, our model is reduced to a set of simple,
unrelated GRUs. This setup is similar to Almosova & An-
dresen (2019), as the difference between LSTMs and GRUs
is negligible. In Section 6, we perform an ablation study
in which the HRNN ignores the hierarchical relations and
is reduced to a collection of independent GRUs, similar
to the model in Almosova & Andresen (2019). Our eval-
uations indicate that this approach is not optimal at any
level of the CPI hierarchy.
3. Recurrent neural networks
Before describing the HRNN model in detail, we briefly
overview the main RNN approaches. RNNs are neural net-
works that model sequences of data in which each value is
assumed to be dependent on previous values. Specifically,
RNNs are feed-forward networks augmented by imple-
menting a feedback loop (Mandic & Chambers, 2001). As such, RNNs introduce a notion of time to the standard feed-forward neural network and excel at modeling temporal dynamic behavior (Chung et al., 2014). Some RNN units retain an internal memory state from previous time steps, representing an arbitrarily long context window. Many RNN implementations have been proposed and studied in the past. A comprehensive review and comparison of the different RNN architectures is available in Chung et al. (2014) and Lipton et al. (2015). In this section, we cover the three most popular units: basic RNNs, long short-term memory (LSTM), and gated recurrent units (GRUs).
3.1. Basic recurrent neural networks
Let $\{x_t\}_{t=1}^{T}$ be the model's input time series consisting of $T$ samples. Similarly, let $\{s_t\}_{t=1}^{T}$ be the model's outputs, consisting of $T$ samples from the target time series. The model's input at time $t$ is $x_t$, and its output (prediction) is $s_t$. The following equation defines a basic RNN unit:
$$s_t = \tanh\left(x_t u + s_{t-1} w + b\right), \tag{1}$$
Fig. 1. Illustration of a basic RNN unit.
Each line carries an entire vector, from the output of one node to the
inputs of others. The yellow box is a learned neural network layer.
where $u$, $w$, and $b$ are the model's parameters, and $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ is the hyperbolic tangent function. The model's output from the previous period, $s_{t-1}$, is used as an additional input to the model at time $t$, along with the current input $x_t$. The linear combination $x_t u + s_{t-1} w + b$ is the argument of a hyperbolic tangent activation function, allowing the unit to model nonlinear relations between inputs and outputs. Different implementations may employ other activation functions, e.g., the sigmoid function, other logistic functions, or a rectified linear unit (ReLU) (Ramachandran et al., 2017). Fig. 1 depicts a basic RNN unit.
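To make Eq. (1) concrete, the following minimal sketch (Python with NumPy; the parameter values are illustrative placeholders, not trained weights) unrolls a scalar basic RNN over a toy series of monthly change rates:

```python
import numpy as np

def rnn_forward(x, u, w, b):
    """Unroll the basic RNN of Eq. (1): s_t = tanh(x_t*u + s_{t-1}*w + b)."""
    s = 0.0          # initial state s_0
    outputs = []
    for x_t in x:
        s = np.tanh(x_t * u + s * w + b)
        outputs.append(s)
    return np.array(outputs)

# Illustrative usage on a toy series of monthly change rates.
print(rnn_forward(np.array([0.2, -0.1, 0.3, 0.1]), u=0.5, w=0.8, b=0.0))
```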
3.2. Long short-term memory networks
Basic RNNs suffer from the "short-term memory" problem: they utilize data from recent history to forecast, but if a sequence is long enough, they cannot carry relevant information from earlier periods to later ones, e.g., relevant patterns from the same month in previous years. Long short-term memory networks (LSTMs) mitigate the "short-term memory" problem by introducing gates that enable the preservation of relevant "long-term memory" and its combination with the most recent data (Hochreiter & Schmidhuber, 1997). The introduction of LSTMs paved the way for significant strides forward in various fields, such as natural language processing, speech recognition, and robot control (Yu et al., 2019).
An LSTM unit has the ability to ‘‘memorize’’ or ‘‘forget’’
information through the use of a special memory cell state,
carefully regulated by three gates: an input gate, a forget
gate, and an output gate. The gates regulate the flow of
information into and out of the memory cell state. An
LSTM unit is defined by the following set of equations:
$$\begin{aligned} i &= \sigma\left(x_t u^{i} + s_{t-1} w^{i} + b^{i}\right),\\ f &= \sigma\left(x_t u^{f} + s_{t-1} w^{f} + b^{f}\right),\\ o &= \sigma\left(x_t u^{o} + s_{t-1} w^{o} + b^{o}\right),\\ \tilde{c} &= \tanh\left(x_t u^{c} + s_{t-1} w^{c} + b^{c}\right),\\ c_t &= f \times c_{t-1} + i \times \tilde{c},\\ s_t &= o \times \tanh(c_t), \end{aligned} \tag{2}$$
where $\sigma(x) = \frac{1}{1+e^{-x}}$ is the sigmoid (logistic) activation function; $u^{i}$, $w^{i}$, and $b^{i}$ are the learned parameters that control the input gate $i$; $u^{f}$, $w^{f}$, and $b^{f}$ are the learned parameters that control the forget gate $f$; $u^{o}$, $w^{o}$, and $b^{o}$ are the learned parameters that control the output gate $o$; and $\tilde{c}$ is the new candidate activation for the cell state, determined by the parameters $u^{c}$, $w^{c}$, and $b^{c}$. The cell state $c_t$ is itself updated by the linear combination $c_t = f \times c_{t-1} + i \times \tilde{c}$, where $c_{t-1}$ is the previous value of the cell state. The input gate $i$ determines which parts of the candidate $\tilde{c}$ should be used to modify the memory cell state, and the forget gate $f$ determines which parts of the previous memory $c_{t-1}$ should be discarded. Finally, the updated cell state $c_t$ is "squashed" through a nonlinear hyperbolic tangent, and the output gate $o$ determines which parts of it should be presented in the output $s_t$. Fig. 2 depicts an LSTM unit.
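The following sketch implements one application of Eq. (2) for scalar inputs (the parameter dictionary and its placeholder values are ours for illustration; in practice the parameters are learned):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, s_prev, c_prev, p):
    """One application of the LSTM equations (2) for scalar inputs;
    p holds the learned parameters u, w, b of each gate."""
    i = sigmoid(x_t * p["ui"] + s_prev * p["wi"] + p["bi"])        # input gate
    f = sigmoid(x_t * p["uf"] + s_prev * p["wf"] + p["bf"])        # forget gate
    o = sigmoid(x_t * p["uo"] + s_prev * p["wo"] + p["bo"])        # output gate
    c_tilde = np.tanh(x_t * p["uc"] + s_prev * p["wc"] + p["bc"])  # candidate
    c_t = f * c_prev + i * c_tilde                                 # new cell state
    s_t = o * np.tanh(c_t)                                         # new output
    return s_t, c_t

params = {k: 0.1 for k in ("ui", "wi", "bi", "uf", "wf", "bf",
                           "uo", "wo", "bo", "uc", "wc", "bc")}
print(lstm_step(0.2, 0.0, 0.0, params))
```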
3.3. Gated recurrent units
A gated recurrent unit (GRU) improves upon the LSTM unit by dropping the cell state in favor of a simpler unit that requires fewer learnable parameters (Dey & Salem, 2017). GRUs employ only two gates instead of three: an update gate and a reset gate. Using fewer parameters, GRUs are faster and more efficient to train, especially when training data are limited, as is the case with inflation predictions and particularly disaggregated inflation components.
The following set of equations defines a GRU unit:
$$\begin{aligned} z &= \sigma\left(x_t u^{z} + s_{t-1} w^{z} + b^{z}\right),\\ r &= \sigma\left(x_t u^{r} + s_{t-1} w^{r} + b^{r}\right),\\ v &= \tanh\left(x_t u^{v} + (s_{t-1} \times r)\, w^{v} + b^{v}\right),\\ s_t &= z \times v + (1 - z) \times s_{t-1}, \end{aligned} \tag{3}$$
where $u^{z}$, $w^{z}$, and $b^{z}$ are the learned parameters that control the update gate $z$, and $u^{r}$, $w^{r}$, and $b^{r}$ are the learned parameters that control the reset gate $r$. The candidate activation $v$ is a function of the input $x_t$ and the previous output $s_{t-1}$, and is controlled by the learned parameters $u^{v}$, $w^{v}$, and $b^{v}$. Finally, the output $s_t$ combines the candidate activation $v$ and the previous state $s_{t-1}$, controlled by the update gate $z$. Fig. 3 depicts a GRU unit.
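Analogously, a minimal scalar implementation of Eq. (3), again with illustrative placeholder parameters, is:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, s_prev, p):
    """One application of the GRU equations (3) for scalar inputs."""
    z = sigmoid(x_t * p["uz"] + s_prev * p["wz"] + p["bz"])        # update gate
    r = sigmoid(x_t * p["ur"] + s_prev * p["wr"] + p["br"])        # reset gate
    v = np.tanh(x_t * p["uv"] + (s_prev * r) * p["wv"] + p["bv"])  # candidate
    return z * v + (1.0 - z) * s_prev                              # new output s_t
```

Note the parameter count: three triplets (u, w, b) instead of the LSTM's four, and no separate cell state.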
GRUs enable the ‘‘memorization’’ of relevant informa-
tion patterns with significantly fewer parameters com-
pared to LSTMs (see Fig. 2). Hence, GRUs constitute the
basic unit for our novel HRNN model described in Sec-
tion 4.
4. Hierarchical recurrent neural networks
The disaggregated components at lower levels of the CPI hierarchy (e.g., newspapers, medical care, etc.) suffer from missing data as well as higher volatility in change rates. The HRNN exhibits a network graph in which each node is associated with an RNN unit that models the inflation rate of a specific (sub-)index (node) in the full CPI hierarchy. The HRNN's unique architecture allows it to propagate information from RNN nodes at higher levels to lower levels in the CPI hierarchy, coarse to fine grained, via a chain of hierarchical informative priors over the RNNs' parameters. This unique property of the HRNN materializes in better predictions for nodes at lower levels of the hierarchy, as we show in Section 6.
Fig. 2. Illustration of an LSTM unit.
Each line carries an entire vector, from the output of one node to the inputs of others. The pink circles represent point-wise operations, while the
yellow boxes are learned neural network layers. Lines merging denote concatenation, while a line forking denotes its content being copied and the
copies going to different locations.
Fig. 3. Illustration of a GRU unit.
Each line carries an entire vector, from the output of one node to the inputs of others. The pink circles represent point-wise operations, while the
yellow boxes are learned neural network layers. Lines merging denote concatenation, while a line forking denotes its content being copied and the
copies going to different locations.
4.1. Model formulation
Let $I = \{n\}_{n=1}^{N}$ be an enumeration of the nodes in the CPI hierarchy graph. In addition, we define $\pi_n \in I$ as the parent node of node $n$. For example, if the nodes $n = 5$ and $n = 19$ represent the indexes of Tomatoes and Vegetables, respectively, then $\pi_5 = 19$ (i.e., the parent node of Tomatoes is Vegetables).
For each node $n \in I$, we denote by $x_t^n \in \mathbb{R}$ the observed random variable that represents the CPI value of node $n$ at timestamp $t \in \mathbb{N}$. We further denote $X_t^n \triangleq (x_1^n, \dots, x_t^n)$, where $1 \le t \le T_n$ and $T_n$ is the last timestamp for node $n$. Let $g : \mathbb{R}^m \times \Omega \to \mathbb{R}$ be a parametric function representing an RNN node in the hierarchy. Specifically, $\mathbb{R}^m$ is the space of parameters that control the RNN unit, $\Omega$ is the input time series space, and the function $g$ predicts a scalar value for the next value of the input series. Hence, our goal is to learn the parameters $\theta_n \in \mathbb{R}^m$ such that $g(\theta_n, X_t^n) = x_{t+1}^n$ for $X_t^n \in \Omega$, $\forall n \in I$ and $1 \le t < T_n$.
We proceed by assuming a Gaussian error on $g$'s predictions and obtain the following expression for the likelihood of the observed time series:
$$p\left(X_{T_n}^n \mid \theta_n, \tau_n\right) = \prod_{t=1}^{T_n} p\left(x_t^n \mid X_{t-1}^n, \theta_n, \tau_n\right) = \prod_{t=1}^{T_n} \mathcal{N}\left(x_t^n;\, g(\theta_n, X_{t-1}^n),\, \tau_n^{-1}\right), \tag{4}$$
where $\tau_n^{-1} \in \mathbb{R}$ is the variance of $g$'s errors.
Next, we define a hierarchical network of normal priors over the nodes' parameters that attaches each node's parameters to those of its parent node. The hierarchical priors follow
$$p\left(\theta_n \mid \theta_{\pi_n}, \tau_{\theta_n}\right) = \mathcal{N}\left(\theta_n;\, \theta_{\pi_n},\, \tau_{\theta_n}^{-1} I\right), \tag{5}$$
where $\tau_{\theta_n}$ is a configurable precision parameter that determines the "strength" of the relation between node $n$'s parameters and the parameters of its parent $\pi_n$. Higher values of $\tau_{\theta_n}$ strengthen the attachment between $\theta_n$ and its prior $\theta_{\pi_n}$.
The precision parameter $\tau_{\theta_n}$ can be seen as a global hyperparameter of the model to be optimized via cross-validation. However, different nodes in the CPI hierarchy have varying degrees of correlation with their parent nodes. Hence, the value of $\tau_{\theta_n}$ in the HRNN is given by
$$\tau_{\theta_n} = e^{\alpha + C_n}, \tag{6}$$
where $\alpha$ is a hyperparameter, and $C_n = \rho\left(X_{T_n}^n, X_{T_{\pi_n}}^{\pi_n}\right)$ is the Pearson correlation coefficient between the time series of $n$ and its parent $\pi_n$.
Importantly, Eq. (5) describes a novel prior relationship between the parameters of a node and its parent in the hierarchy that grows increasingly stronger with the historical correlation between the two series. This ensures that a child node $n$ is kept close to its parent node $\pi_n$ in terms of the squared Euclidean distance in parameter space, especially if the two are highly correlated. Note that in the case of the root node (the headline CPI), $\pi_n$ does not exist, and hence we set a normal non-informative regularization prior with zero mean and unit variance.
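A minimal sketch of Eq. (6) follows (Python with NumPy; aligning the two series on their overlapping window is our assumption, and the default alpha = 1.5 is the value used in the experiments of Section 6):

```python
import numpy as np

def prior_precision(child_series, parent_series, alpha=1.5):
    """Eq. (6): tau_theta_n = exp(alpha + C_n), where C_n is the Pearson
    correlation between a node's series and its parent's series."""
    t = min(len(child_series), len(parent_series))  # align on the overlap (assumption)
    c_n = np.corrcoef(child_series[-t:], parent_series[-t:])[0, 1]
    return np.exp(alpha + c_n)
```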
Let us now denote the aggregation of all series from all levels by $X = \{X_{T_n}^n\}_{n \in I}$. Similarly, we denote by $\theta = \{\theta_n\}_{n \in I}$ and $T = \{\tau_n\}_{n \in I}$ the aggregations of all the RNN parameters and precision parameters from all levels, respectively. Note that $X$ (the data) is observed, $\theta$ denotes unobserved learned variables, and $T$ is determined by Eq. (6). The hyperparameter $\alpha$ from Eq. (6) is set by a cross-validation procedure.
With these definitions at hand, we now proceed with the Bayes rule. From Eqs. (4) and (5), we extract the posterior probability:
$$p(\theta \mid X, T) = \frac{p(X \mid \theta, T)\, p(\theta)}{p(X)} \propto \prod_{n \in I} \prod_{t=1}^{T_n} \mathcal{N}\left(x_t^n;\, g(\theta_n, X_{t-1}^n),\, \tau_n^{-1}\right) \prod_{n \in I} \mathcal{N}\left(\theta_n;\, \theta_{\pi_n},\, \tau_{\theta_n}^{-1} I\right). \tag{7}$$
HRNN optimization follows a maximum a posteriori (MAP) approach. Namely, we wish to find optimal parameter values $\theta^*$ such that
$$\theta^* = \underset{\theta}{\arg\max}\; \log p(\theta \mid X, T). \tag{8}$$
Note that the objective in Eq. (8) depends on the parametric function $g$. The HRNN is a general framework that can use any RNN, e.g., a simple RNN, LSTM, GRU, etc. In this work, we chose $g$ to be a scalar GRU because GRUs are capable of long-term memory but with fewer parameters than LSTMs. Hence, each node $n$ is associated with a GRU with its own parameters: $\theta_n = [u_n^z, u_n^r, u_n^v, w_n^z, w_n^r, w_n^v, b_n^z, b_n^r, b_n^v]$. Then, $g(\theta_n, X_t^n)$ is computed by $t$ successive applications of the GRU to $x_i^n$ with $1 \le i \le t$, according to Eq. (3). Finally, the HRNN optimization proceeds with stochastic gradient ascent over the objective in Eq. (8). Fig. 4 depicts the entire HRNN architecture.
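To fix ideas, the sketch below writes out the negative log of the posterior in Eq. (7), up to additive constants: squared one-step errors weighted by the precision $\tau_n$, plus a hierarchical penalty pulling each $\theta_n$ toward $\theta_{\pi_n}$. Maximizing Eq. (8) is equivalent to minimizing this quantity. The function names and dict-based bookkeeping are our illustration, not the released implementation:

```python
import numpy as np

def neg_log_posterior(theta, series, parent, tau, tau_theta, g):
    """Negative log of Eq. (7), up to additive constants.
    theta:     dict node -> parameter vector (np.ndarray)
    series:    dict node -> observed series X^n (np.ndarray)
    parent:    dict node -> parent node, or None for the headline root
    tau:       dict node -> error precision tau_n
    tau_theta: dict node -> prior precision tau_theta_n from Eq. (6)
    g:         callable(theta_n, history) -> one-step forecast"""
    loss = 0.0
    for n, x in series.items():
        # Gaussian likelihood: tau_n / 2 times the sum of squared errors.
        for t in range(1, len(x)):
            err = x[t] - g(theta[n], x[:t])
            loss += 0.5 * tau[n] * err ** 2
        # Hierarchical prior of Eq. (5); the root gets a standard-normal prior.
        if parent[n] is not None:
            loss += 0.5 * tau_theta[n] * np.sum((theta[n] - theta[parent[n]]) ** 2)
        else:
            loss += 0.5 * np.sum(theta[n] ** 2)
    return loss
```

Stochastic gradient ascent on the log posterior is then equivalent to gradient descent on this loss.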
4.2. HRNN inference
In machine learning, after the model's parameters have been estimated in the training process, the model can be applied to make predictions in a process known as inference. In our case, equipped with the MAP estimate $\theta^*$, inference with the HRNN model proceeds as follows: Given a sequence of historical CPI values $X_t^n$ for node $n$, we predict the next CPI value $y_{t+1}^n = g(\theta_n, X_t^n)$, as explained in Section 4.1. This type of prediction is for next month's CPI, namely, horizon $h = 0$. In this work, we also tested the ability of the model to perform predictions for further horizons $h \in \{0, \dots, 8\}$. The $h$-horizon predictions are obtained in a recursive manner, whereby each predicted value $y_t^n$ is fed back as an input for the prediction of $y_{t+1}^n$. As expected, Section 6 shows that the forecasting accuracy gradually degrades as the horizon $h$ increases.
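A minimal sketch of this recursive procedure (the forecast function g and its parameters are placeholders):

```python
import numpy as np

def forecast_recursive(g, theta_n, history, max_horizon=8):
    """Recursive multi-step forecasting: each prediction is appended to the
    history and fed back as an input for the next horizon."""
    hist = list(history)
    preds = []
    for _ in range(max_horizon + 1):   # horizons h = 0, ..., max_horizon
        y = g(theta_n, np.asarray(hist))
        preds.append(y)
        hist.append(y)                 # feed the prediction back
    return preds
```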
5. Dataset
This work is based on monthly CPI data released by the US Bureau of Labor Statistics (BLS). In what follows, we discuss the dataset's characteristics and our pre-processing procedures. For the sake of reproducibility, the final version of the processed data is available in our HRNN code repository.
5.1. The US consumer price index
The official CPI for each month is released by the BLS several days into the following month. Prices are collected in 75 urban areas throughout the US from about 24,000 retail and service establishments. Housing and rent rates are collected from about 50,000 landlords and tenants across the country. The BLS releases two different measurements according to urban demographics:
1. The CPI-U represents the CPI for urban consumers and covers approximately 93% of the total population. The CPI items and their relative weights are derived from their estimated expenditures in the Consumer Expenditure Survey. These items and their weights are updated each year in January.
2. The CPI-W represents the CPI for urban wage earn-
ers and clerical workers and covers about 29% of
the population. This index is focused on households
with at least 50% of income coming from clerical or
wage-paying jobs, and at least one of the house-
hold’s earners must have been employed for at
least 70% of the year. The CPI-W indicates changes
in the cost of benefits, as well as future contract
obligations.
Fig. 4. Illustration of the full HRNN model.
In this work, we focus on CPI-U, as it is generally con-
sidered the best measure for the average cost of living
in the US. Monthly CPI-U data per product are gener-
ally available from January 1994. Our samples thus span
from January 1994 to March 2019. Note that throughout
the years, new indexes were added, and some indexes
have been omitted. Consequently, hierarchies can change,
which contributes to the challenge of our exercise.
5.2. The CPI hierarchy
The CPI-U is an eight-level-deep hierarchy compris-
ing 424 different nodes (indexes). Level 0 represents the
headline CPI, or the aggregated index of all components.
An index at any level is associated with a weight be-
tween 0 and 100, which represents its contribution to the
headline CPI at level 0. Level 1 consists of the eight main
aggregated categories or sectors: (1) Food and Beverages,
(2) Housing, (3) Apparel, (4) Transportation, (5) Medical
Care, (6) Recreation, (7) Education and Communication,
and (8) Other Goods and Services. Mid-levels (2–5) consist
of more specific aggregations, e.g., Energy Commodities,
Household Insurance, etc. The lower levels (6–8) consist
of fine-grained indexes, e.g., Apples, Bacon and Related
Products, Eyeglasses and Eye Care, Tires, Airline Fares, etc.
Tables 7 and 8 (in the Appendix) depict the first three levels of the CPI hierarchy (levels 0–2).
5.3. Data preparation
We used publicly available data from the BLS website.²
However, the BLS releases hierarchical data on a monthly
basis in separate files. Hence, separate monthly files from
January 1994 until March 2019 were processed and aggre-
gated to create a single repository. Moreover, the format
of these files has changed over the years (e.g., txt, pdf,
and csv formats were all in use) and significant effort was
made to parse the changing formats from different time
periods.
The hierarchical CPI data are released as monthly index values. We transformed the CPI values into monthly logarithmic change rates as follows: Denote by $x_t$ the CPI value (of any node) at month $t$. The logarithmic change rate at month $t$ is denoted by $\mathrm{rate}(t)$ and given by
$$\mathrm{rate}(t) = 100 \times \log\frac{x_t}{x_{t-1}}. \tag{9}$$
Unless otherwise mentioned, the remainder of the paper
relates to monthly logarithmic change rates, as in Eq. (9).
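A minimal sketch of this transformation, together with the chronological 70/30 train–test split described below (the toy index values are illustrative):

```python
import numpy as np

def log_change_rates(index_values):
    """Eq. (9): monthly logarithmic change rates, in percent."""
    x = np.asarray(index_values, dtype=float)
    return 100.0 * np.log(x[1:] / x[:-1])

def chronological_split(series, train_frac=0.7):
    """Keep the first (early in time) 70% for training, the rest for testing."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

rates = log_change_rates([255.0, 255.6, 256.1, 255.9, 256.4])
train, test = chronological_split(rates)
```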
We split the data into a training dataset and a test dataset as follows: For each time series, we kept the first (early in time) 70% of the measurements for the training dataset. The remaining 30% of the measurements were removed from the training dataset and used to form the test dataset. The training dataset was used to train the HRNN model as well as the other baselines. The test dataset was used for evaluations. The results in Section 6 are based on this split.

² www.bls.gov/cpi.

Table 1
Descriptive statistics.
Dataset          # of monthly    Mean   STD    Min      Max      # of      Avg. measurements
                 measurements                                    indexes   per index
Headline only    303             0.18   0.33   −1.93    1.22     1         303
Level 1          6742            0.17   0.96   −18.61   11.32    34        198.29
Level 2          6879            0.12   1.10   −19.60   16.81    46        149.54
Level 3          7885            0.17   1.31   −34.23   16.37    51        121.31
Level 4          7403            0.08   1.97   −35.00   28.17    58        107.89
Level 5          10,809          0.01   1.43   −21.04   242.50   92        87.90
Level 6          7752            0.09   1.49   −11.71   16.52    85        86.13
Level 7          4037            0.11   1.53   −11.90   9.45     50        80.74
Level 8          595             0.08   1.56   −5.27    5.02     7         85.00
Full hierarchy   52,405          0.10   1.75   −35.00   242.50   424       123.31
Notes: General statistics of the headline CPI and CPI-U for each level in the hierarchy and for the full hierarchy of indexes.
Table 1 summarizes the number of data points and
general statistics of the CPI time series after applying
Eq. (9). When comparing the headline CPI with the full
hierarchy, we see that at lower levels, the standard devia-
tion (STD) is significantly higher and the dynamic range is
larger, implying much more volatility. The average num-
ber of measurements per index decreases at the lower
levels of the hierarchy, as not all indexes are available for
the entire period.
Fig. 5 depicts box plots of the CPI change rate distri-
butions at different levels. The boxes depict the median
value and the upper 75th and lower 25th percentiles. The
whiskers indicate the overall minimum and maximum
rates. Fig. 5 further emphasizes that the change rates are
more volatile as we go down the CPI hierarchy.
High dynamic range, high standard deviation, and less
training data are all indicators of the difficulty of making
predictions inside the hierarchy. Based on this informa-
tion, we can expect that the disaggregated component
predictions inside the hierarchy will be more difficult than
the headline.
Finally, Fig. 6 depicts a box plot of the CPI change rate
distribution for different sectors. We notice that some sec-
tors (e.g., apparel and energy) suffer from higher volatility
than others. As expected, predictions for these sectors will
be more difficult.
6. Evaluation and results
We evaluated the HRNN and compared it with well-known baselines for inflation prediction as well as some alternative machine learning approaches. We use the following notation: Let $x_t$ be the CPI log-change rate at month $t$. We consider models for $\hat{x}_t$, an estimate of $x_t$ based on historical values. Additionally, we denote by $\varepsilon_t$ the estimation error at time $t$. In all cases, the $h$-horizon forecasts were generated by recursively iterating the one-step forecasts forward. Hyperparameters were set through a ten-fold cross-validation procedure.
6.1. Baseline models
We compared the HRNN with the following CPI pre-
diction baselines:
1. Autoregression (AR) – The AR($\rho$) model estimates $\hat{x}_t$ based on the previous $\rho$ months as follows: $\hat{x}_t = \alpha_0 + \sum_{i=1}^{\rho} \alpha_i x_{t-i} + \varepsilon_t$, where $\{\alpha_i\}_{i=0}^{\rho}$ denotes the model's parameters (a least-squares sketch of this baseline appears after this list).
2. Phillips curve (PC) – The PC($\rho$) model is an extension of AR($\rho$) that adds the unemployment rate $u_t$ at month $t$ to the CPI forecasting model as follows: $\hat{x}_t = \alpha_0 + \sum_{i=1}^{\rho} \alpha_i x_{t-i} + \beta u_{t-1} + \varepsilon_t$, where $\{\alpha_i\}_{i=0}^{\rho}$ and $\beta$ are the model's parameters.
3. Vector autoregression (VAR) – The VAR($\rho$) model is a multivariate generalization of AR($\rho$). It is frequently used to model two or more time series together. VAR($\rho$) estimates next month's values of $k$ time series based on their historical values from the previous $\rho$ months as follows: $\hat{X}_t = A_0 + \sum_{i=1}^{\rho} A_i X_{t-i} + \epsilon_t$, where $X_t$ denotes the last $\rho$ values from $k$ different time series at month $t$, and $\hat{X}_t$ denotes the model's estimates of these values; $\{A_i\}_{i=0}^{\rho}$ denotes $(k \times k)$ matrices of parameters, and $\epsilon_t$ is a vector of error terms.
4. Random walk (RW) – We consider the RW($\rho$) model of Atkeson & Ohanian (2001). RW($\rho$) is a simple yet effective model that predicts next month's CPI as the average of the last $\rho$ months: $\hat{x}_t = \frac{1}{\rho}\sum_{i=1}^{\rho} x_{t-i} + \varepsilon_t$.
5. Autoregression in gap (AR-GAP) – The AR-GAP model subtracts a fixed inflation trend before predicting the inflation gap (Faust & Wright, 2013). The inflation gap is defined as $g_t = x_t - \tau_t$, where $\tau_t$ is the inflation trend at time $t$, which represents a slowly varying local mean. This trend value is estimated using RW($\rho$) as follows: $\tau_t = \frac{1}{\rho}\sum_{i=1}^{\rho} x_{t-i}$. By accounting for the local inflation trend $\tau_t$, the model attempts to increase stationarity in $g_t$ and estimates it by $\hat{g}_t = \alpha_0 + \sum_{i=1}^{\rho} \alpha_i g_{t-i} + \varepsilon_t$, where $\{\alpha_i\}_{i=0}^{\rho}$ denotes the model's parameters. Finally, $\tau_t$ is added back to $\hat{g}_t$ to obtain the final inflation prediction: $\hat{x}_t = \hat{g}_t + \tau_t$.
6. Logistic smooth transition autoregressive model (LSTAR) – The LSTAR model is an extension of AR that allows for changes in the model parameters according to a transition variable $F(t; \gamma, c)$. LSTAR($\rho, c, \gamma$) consists of two AR($\rho$) components that describe two trends in the data (high and low) and a nonlinear transition function that links them as follows:
$$\hat{x}_t = \alpha_0 + \sum_{i=1}^{\rho} \alpha_i x_{t-i}\left(1 - F(t; \gamma, c)\right) + \beta_0 + \sum_{i=1}^{\rho} \beta_i x_{t-i}\, F(t; \gamma, c) + \varepsilon_t, \tag{10}$$
where $F(t; \gamma, c) = \frac{1}{1+e^{-\gamma(t-c)}}$ is a first-order logistic transition function that depends on the location parameter $c$ and a smoothing parameter $\gamma$. The location parameter $c$ can be interpreted as the threshold between the two AR($\rho$) regimes, in the sense that the logistic function changes monotonically from 0 to 1 as $t$ increases and balances symmetrically at $t = c$ (van Dijk et al., 2002). The model's parameters are $\{\alpha_i\}_{i=0}^{\rho}$ and $\{\beta_i\}_{i=0}^{\rho}$, while $\gamma$ and $c$ are hyperparameters.
7. Random forests (RF) – The RF($\rho$) model is an ensemble learning method that builds a set of decision trees (Song & Ying, 2015) in order to mitigate overfitting and improve generalization (Breiman, 2001). At prediction time, the average prediction of the individual trees is returned. The inputs to the RF($\rho$) model are the last $\rho$ samples, and the output is the predicted value for the next month.
8. Gradient boosted trees (GBT) – The GBT($\rho$) model (Friedman, 2002) is based on an ensemble of decision trees that are trained in a stage-wise fashion, similar to other boosting models (Schapire, 1999). Unlike RF($\rho$), which averages the predictions of several decision trees, the GBT($\rho$) model trains each tree to minimize the remaining residual error of all previous trees. At prediction time, the sum of the predictions of all the trees is returned. The inputs to the GBT($\rho$) model are the last $\rho$ samples, and the output is the predicted value for the next month.
9. Fully connected neural network (FC) – The FC($\rho$) model is a fully connected neural network with one hidden layer and a ReLU activation (Ramachandran et al., 2017). The output layer employs no activation, formulating a regression problem with a squared-loss optimization. The inputs to the FC($\rho$) model are the last $\rho$ samples, and the output is the predicted value for the next month.
10. Deep neural network (Deep-NN) – The Deep-NN($\rho$) model is a deep neural network consisting of ten layers with 100 neurons each, as in Olson et al. (2018), which was shown to perform well for inflation prediction (Goulet Coulombe, 2020). We used the original setup of Olson et al. (2018) and tuned its hyperparameters as follows: the learning rate was set to lr = 0.005, training lasted 50 epochs (instead of 200), and the ELU activation functions (Clevert et al., 2016) were replaced by ReLU activation functions. These changes yielded more accurate predictions; hence, we decided to include them in all our evaluations. The inputs to the Deep-NN($\rho$) model are the last $\rho$ samples, and the output is the predicted value for the next month.
11. Deep neural network with unemployment (Deep-NN + Unemployment) – Similar to PC($\rho$), which extends AR($\rho$) by including unemployment data, the Deep-NN($\rho$) + Unemployment model extends Deep-NN($\rho$) by including the last $\rho$ samples of the unemployment rate $u_t$. For the hyperparameters, we used values identical to those of Deep-NN($\rho$).
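As referenced in the AR item above, the following is a minimal sketch of an AR($\rho$) baseline fitted by ordinary least squares, with the recursive h-horizon forecasting used for all models (our illustrative implementation, not the exact evaluation code):

```python
import numpy as np

def fit_ar(x, rho):
    """Fit AR(rho) by ordinary least squares on a 1-D np.ndarray x:
    x_t = a_0 + sum_i a_i * x_{t-i} + eps_t."""
    rows = [np.r_[1.0, x[t - rho:t][::-1]] for t in range(rho, len(x))]
    coefs, *_ = np.linalg.lstsq(np.asarray(rows), x[rho:], rcond=None)
    return coefs                        # [a_0, a_1, ..., a_rho]

def ar_forecast(coefs, history, max_horizon):
    """Iterate one-step AR forecasts forward, feeding predictions back."""
    rho = len(coefs) - 1
    hist = list(history)
    preds = []
    for _ in range(max_horizon + 1):
        y = coefs[0] + float(np.dot(coefs[1:], hist[-rho:][::-1]))
        preds.append(y)
        hist.append(y)
    return preds
```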
6.2. Ablation models
In order to demonstrate the contribution of the hier-
archical component of the HRNN model, we conducted
an ablation study that considered simpler alternatives
to the HRNN based on GRUs without the hierarchical
component:
1. Single GRU (S-GRU) – The S-GRU($\rho$) is a single GRU that receives the last $\rho$ values as inputs in order to predict the next value. In S-GRU($\rho$), a single GRU is used for all the time series that comprise the CPI hierarchy. This baseline enjoys all the benefits of a GRU but assumes that the different components of the CPI behave similarly and that a single unit is sufficient to model all the nodes.
2. Independent GRUs (I-GRU) – In I-GRU($\rho$), we trained a different GRU($\rho$) unit for each CPI node. The S-GRU and I-GRU approaches represent two extremes: the first attempts to model all the CPI nodes with a single model, while the second treats each node separately. I-GRU($\rho$) is equivalent to a variant of the HRNN that ignores the hierarchy by setting the precision parameter $\tau_{\theta_n} = 0$ for all $n \in I$; that is, it trains independent GRUs, one for each index in the hierarchy.
3. k-nearest neighbors GRU (KNN-GRU) – In order to demonstrate the contribution of the hierarchical structure of the HRNN, we devised the KNN-GRU($\rho$) baseline. KNN-GRU attempts to utilize information from multiple Pearson-correlated CPI nodes without employing the hierarchical informative priors. Hence, KNN-GRU presents a simpler alternative to the HRNN that replaces the hierarchical structure with elementary vector GRUs as follows: First, the $k$ nearest neighbors of each CPI node were found using the Pearson correlation measure. Then, separate vector GRU($\rho$) units were trained for each CPI aggregate along with its $k$ most similar nodes, using the last $\rho$ values of node $n$ and its $k$-nearest nodes. By doing so, the KNN-GRU($\rho$) baseline utilizes the benefits of GRU units together with relevant information coming from correlated nodes.
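A minimal sketch of the neighbor-selection step used by this baseline (assuming aligned, equal-length series; this is our reconstruction for illustration, not the released code):

```python
import numpy as np

def k_nearest_nodes(target, candidates, k=5):
    """Rank candidate series by Pearson correlation with the target series
    and return the positions of the k most correlated ones."""
    corrs = [np.corrcoef(target, c)[0, 1] for c in candidates]
    return list(np.argsort(corrs)[::-1][:k])
```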
6.3. Evaluation metrics
Following Aparicio & Bertolotto (2020) and Faust & Wright (2013), we report results in terms of three evaluation metrics (a computational sketch follows this list):
1. Root mean squared error (RMSE) – The RMSE is given by
$$\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(x_t - \hat{x}_t\right)^2}, \tag{11}$$
where $x_t$ is the monthly change rate for month $t$, and $\hat{x}_t$ is the corresponding prediction.
2. Pearson correlation coefficient – The Pearson correlation coefficient $\phi$ is given by
$$\phi = \frac{\mathrm{COV}(X_T, \hat{X}_T)}{\sigma_{X_T} \times \sigma_{\hat{X}_T}}, \tag{12}$$
where $\mathrm{COV}(X_T, \hat{X}_T)$ is the covariance between the series of actual values and the predictions, and $\sigma_{X_T}$ and $\sigma_{\hat{X}_T}$ are the standard deviations of the actual values and the predictions, respectively.
3. Distance correlation coefficient – In contrast to the Pearson correlation measure, which detects linear associations between two random variables, the distance correlation measure can also detect nonlinear correlations (Székely et al., 2007; Zhou, 2012). The distance correlation coefficient $r_d$ is given by
$$r_d = \frac{\mathrm{dCov}(X_T, \hat{X}_T)}{\sqrt{\mathrm{dVar}(X_T) \times \mathrm{dVar}(\hat{X}_T)}}, \tag{13}$$
where $\mathrm{dCov}(X_T, \hat{X}_T)$ is the distance covariance between the series of actual values and the predictions, and $\mathrm{dVar}(X_T)$ and $\mathrm{dVar}(\hat{X}_T)$ are the distance variances of the actual values and the predictions, respectively.
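The sketch below computes the three metrics with NumPy; the distance-correlation helper follows the sample definition of Székely et al. (2007) using doubly centered pairwise-distance matrices (our implementation for illustration):

```python
import numpy as np

def rmse(x, x_hat):
    """Eq. (11)."""
    return np.sqrt(np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2))

def pearson(x, x_hat):
    """Eq. (12)."""
    return np.corrcoef(x, x_hat)[0, 1]

def distance_correlation(x, y):
    """Eq. (13), via doubly centered pairwise-distance matrices."""
    def centered(a):
        a = np.asarray(a, dtype=float)
        d = np.abs(a[:, None] - a[None, :])          # pairwise distances
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    A, B = centered(x), centered(y)
    dcov2 = max((A * B).mean(), 0.0)                 # squared distance covariance
    dvar2 = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2 / dvar2) if dvar2 > 0 else 0.0
```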
Table 2
Average results on disaggregated CPI components.
Model name                        RMSE per horizon (AR(1) = 1.00)      Correlation (at horizon = 0)
                                  0     1     2     3     4     8      Pearson   Distance
AR(1) 1.00 1.00 1.00 1.00 1.00 1.00 0.06 0.05
AR(2) 1.00 1.00 1.00 1.00 1.00 1.00 0.08 0.06
AR(3) 1.00 1.00 1.00 1.00 1.00 1.00 0.08 0.06
AR(4) 1.00 1.00 1.00 1.00 1.00 1.00 0.09 0.07
AR-GAP(3) 1.00 1.00 1.00 1.00 1.00 1.00 0.08 0.06
AR-GAP(4) 1.00 1.00 1.00 1.00 1.00 1.00 0.09 0.07
RW(4) 1.00 1.00 1.00 1.00 1.00 1.00 0.05 0.04
Phillips(4) 1.00 1.00 1.00 1.00 0.98 1.00 0.06 0.04
VAR(1) 1.03 1.03 1.04 1.03 1.04 1.05 0.04 0.03
VAR(2) 1.03 1.03 1.04 1.03 1.04 1.05 0.06 0.03
VAR(3) 1.03 1.03 1.03 1.03 1.04 1.05 0.06 0.03
VAR(4) 1.02 1.03 1.03 1.03 1.03 1.04 0.07 0.04
LSTAR(ρ=4, c=2, γ=0.3) 1.04 1.07 1.07 1.07 1.08 1.1 0.09 0.07
GBT(4) 0.83 0.83 0.83 0.84 0.84 0.86 0.18 0.27
RF(4) 0.84 0.85 0.86 0.86 0.86 0.87 0.19 0.29
FC(4) 1.03 1.03 1.04 1.04 1.04 1.05 0.12 0.09
Deep-NN(4) 0.90 0.90 0.90 0.90 0.91 0.91 0.13 0.22
Deep-NN(4) + Unemployment 0.85 0.85 0.85 0.85 0.85 0.86 0.12 0.22
S-GRU(4) 1.02 1.06 1.06 1.07 1.04 1.12 0.10 0.08
I-GRU(4) 0.83 0.84 0.85 0.85 0.86 0.89 0.17 0.13
KNN-GRU(1) 0.91 0.93 0.96 0.97 0.96 0.96 0.19 0.15
KNN-GRU(2) 0.90 0.93 0.95 0.97 0.96 0.96 0.20 0.15
KNN-GRU(3) 0.89 0.92 0.95 0.96 0.96 0.95 0.20 0.15
KNN-GRU(4) 0.89 0.91 0.95 0.95 0.95 0.95 0.20 0.15
HRNN(1) 0.79 0.79 0.81 0.81 0.81 0.83 0.23 0.28
HRNN(2) 0.78 0.79 0.81 0.81 0.80 0.82 0.22 0.29
HRNN(3) 0.79 0.78 0.80 0.81 0.81 0.81 0.23 0.30
HRNN(4) 0.78 0.78 0.79 0.79 0.79 0.80 0.24 0.29
Notes: Average results across all 424 inflation indexes that make up the headline CPI. The RMSE results are relative to the AR(1) model and normalized according to its results, i.e., RMSE_Model / RMSE_AR(1). The results are statistically significant according to a Diebold–Mariano test with p < 0.02.
6.4. Results
The HRNN model is unique in its ability to utilize information from higher levels in the CPI hierarchy in order to make predictions at lower levels. Therefore, we provide results for each level of the CPI hierarchy—overall, 424 disaggregated indexes spread across eight hierarchy levels. For the sake of completeness, we also provide results for the headline CPI index by itself. It is important to note that in this case, the HRNN model cannot utilize its hierarchical mechanism and has no advantage over the alternatives, so we do not expect it to outperform them.
Table 2 shows the average results from all the disaggregated indexes in the CPI hierarchy. We present prediction results for horizons of 0, 1, 2, 3, 4, and 8 months. The results are relative to the AR(1) model and normalized according to RMSE_Model / RMSE_AR(1). In the HRNN, we set $\alpha = 1.5$, and the KNN-GRU($\rho$) models were based on $k = 5$ nearest neighbors.
Table 2 shows that the different versions of the HRNN model repeatedly outperform the alternatives at every horizon. Notably, the HRNN is superior to I-GRU, emphasizing the importance of using hierarchical information and the superiority of the HRNN over regular GRUs. Additionally, the HRNN is superior to the different KNN-GRU models, validating the specific way the HRNN employs informative priors based on the CPI hierarchy. These results are statistically significant according to Diebold & Mariano (1995) pairwise tests for a squared loss-differential with p-values below 0.02. Additionally, we performed a model confidence set (MCS) test (Hansen et al., 2011) for the leading models: RF(4), Deep-NN(4), Deep-NN(4) + Unemployment, GBT(4), I-GRU(4), HRNN(1), HRNN(2), HRNN(3), and HRNN(4). The MCS test removed all the baselines and left only the four HRNN variants, with HRNN(4) as the leading model (p_HRNN(4) = 1.00).
For the sake of completeness, we also provide results for predictions of the headline CPI index. Table 3 summarizes these results. When considering only the headline, the hierarchical mechanism of the HRNN is redundant and the model is identical to a single GRU. In this case, we do not observe much advantage to employing the HRNN model. In contrast, we see an advantage for the other deep learning models, such as FC(4) and Deep-NN(4) + Unemployment, which outperform the more traditional approaches.
Table 4 lists the results of the best model, HRNN(4),
across all hierarchies (1–8, excluding the headline). We
include the results of the best ablation model, the
I-GRU(4) model, for comparison. The results are averaged
over all disaggregated components and normalized by the
AR(1) model RMSE, as before. As evident from Table 4,
the HRNN model shows the best relative performance at
the lower levels of the hierarchy where the CPI indexes
are more volatile and the hierarchical priors are most
effective.
Table 3
CPI headline only.
Model name                        RMSE per horizon (AR(1) = 1.00)      Correlation (at horizon = 0)
                                  0     1     2     3     4     8      Pearson   Distance
AR(1) 1.00 1.00 1.00 1.00 1.00 1.00 0.29 0.22
AR(2) 1.00 0.97 0.99 1.01 1.00 0.98 0.32 0.24
AR(3) 1.00 0.98 0.98 1.00 0.96 0.97 0.33 0.25
AR(4) 1.00 0.95 0.95 0.96 0.93 0.96 0.33 0.25
AR-GAP(3) 1.00 0.98 0.98 1.00 0.96 0.97 0.33 0.25
AR-GAP(4) 0.99 0.95 0.95 0.96 0.93 0.96 0.33 0.25
RW(4) 1.05 0.98 0.99 1.01 0.97 0.96 0.23 0.2
Phillips(4) 0.93 0.94 0.95 0.95 0.93 0.95 0.33 0.25
LSTAR(ρ=4, c=2, γ=0.3) 0.98 0.95 0.95 0.97 0.95 0.95 0.32 0.24
RF(4) 1.05 1.06 1.03 1.07 1.04 1.03 0.27 0.28
GBT(4) 0.97 0.99 0.93 0.95 0.93 0.93 0.25 0.35
FC(4) 0.92 0.94 0.94 0.96 0.93 0.94 0.33 0.25
Deep-NN(4) 0.94 0.97 0.96 0.98 0.94 0.92 0.31 0.32
Deep-NN(4) + Unemployment 1.00 0.97 0.92 0.94 0.92 0.91 0.37 0.32
HRNN(4)/GRU(4) 1.00 0.97 0.99 0.99 0.96 0.99 0.35 0.37
Notes: Prediction results for the CPI headline index alone. The RMSE results are relative to the AR(1) model and normalized according to its results, i.e., RMSE_Model / RMSE_AR(1).
Table 4
HRNN(4) vs. I-GRU(4) at different levels of the CPI hierarchy with respect to AR(1).
Hierarchy level    HRNN(4)                                            I-GRU(4)
                   RMSE per horizon (AR(1) = 1.00)   Correlation      RMSE per horizon (AR(1) = 1.00)   Correlation
                   0     2     4     8    Pearson  Distance           0     2     4     8    Pearson  Distance
Level 1 0.95 0.97 0.99 1.00 0.33 0.37 0.98 0.98 0.99 0.97 0.25 0.38
Level 2 0.91 0.90 0.91 0.91 0.30 0.35 0.90 0.92 0.94 0.93 0.26 0.34
Level 3 0.79 0.79 0.80 0.81 0.21 0.31 0.82 0.89 0.94 0.94 0.23 0.37
Level 4 0.77 0.77 0.76 0.77 0.26 0.32 0.84 0.87 0.90 0.92 0.20 0.33
Level 5 0.79 0.77 0.77 0.80 0.21 0.31 0.85 0.89 0.89 0.93 0.22 0.29
Level 6 0.75 0.76 0.81 0.81 0.19 0.23 0.85 0.89 0.90 0.92 0.21 0.21
Level 7 0.75 0.78 0.77 0.80 0.17 0.17 0.87 0.89 0.92 0.94 0.18 0.15
Level 8 0.72 0.78 0.77 0.78 0.10 0.23 0.89 0.90 0.92 0.94 0.10 0.12
Notes: The RMSE results are relative to the AR(1) model and normalized according to its results, i.e., RMSE_Model / RMSE_AR(1).
Table 5
HRNN(4) vs. I-GRU(4) results for different CPI sectors with respect to AR(1).
Industry sector    HRNN(4)                                            I-GRU(4)
                   RMSE per horizon (AR(1) = 1.00)   Correlation      RMSE per horizon (AR(1) = 1.00)   Correlation
                   0     2     4     8    Pearson  Distance           0     2     4     8    Pearson  Distance
Apparel 0.83 0.87 0.84 0.88 0.04 0.19 0.88 0.88 0.85 0.92 0.05 0.23
Energy 0.94 0.96 0.99 0.98 0.34 0.32 0.94 0.98 1.02 0.99 0.18 0.28
Food & Beverages 0.72 0.73 0.75 0.76 0.22 0.13 0.80 0.80 0.81 0.82 0.18 0.22
Housing 0.79 0.80 0.82 0.82 0.17 0.24 0.77 0.79 0.82 0.82 0.18 0.27
Medical Care 0.79 0.82 0.81 0.82 0.03 0.17 0.79 0.83 0.83 0.84 0.08 0.15
Recreation 0.99 0.99 1.00 1.00 0.05 0.17 1.00 0.99 1.00 1.00 −0.07 0.17
Services 0.90 0.92 0.95 0.94 0.04 0.15 0.89 0.94 0.95 0.96 0.02 0.21
Transportation 0.83 0.84 0.85 0.85 0.27 0.28 0.82 0.85 0.86 0.88 0.26 0.36
Notes: The RMSE results are relative to the AR(1) model and normalized according to its results, i.e., RMSE_Model / RMSE_AR(1).
Table 5 compares the results of HRNN(4) across different sectors. Again, we include the results of the I-GRU(4) model for comparison. The results are averaged over all disaggregated components and presented as normalized gains with respect to the AR(1) model, as before. The best relative improvement of the HRNN(4) model appears to be in the Food and Beverages group. This can be explained by the fact that the Food and Beverages sub-hierarchy is the deepest and most elaborate hierarchy in the CPI tree. When the hierarchy is deeper and more elaborate, the advantages of the HRNN are emphasized.
Finally, Fig. 7 provides specific examples of three dis-
aggregated indexes: Tomatoes, Bread, and Information
Technology. The solid red line presents the actual CPI val-
ues. The dashed green line presents HRNN(4) predictions,
while the dotted blue line presents I-GRU(4) predictions.
Fig. 7. Examples of HRNN(4) predictions for disaggregated indexes.
These indexes are located at the bottom of the CPI hierarchy and suffer from relatively high volatility. The HRNN(4) model seems to track and predict the trends of the real index accurately and often performs better than I-GRU(4).
As can be seen, I-GRU's predictions appear to be more conservative than HRNN's. At first, this may appear counterintuitive, as the HRNN has more regularization than I-GRU. However, this additional regularization is informative regularization coming from the parameters of the upper levels in the CPI hierarchy, which allows the HRNN model to be more expressive without overfitting. In contrast, in order to ensure that I-GRU does not overfit the training data, its other regularization techniques, such as the learning rate hyperparameter and the early stopping procedure, prevent the I-GRU model from becoming overconfident. Figs. 9 and 10 in the Appendix provide additional examples for a large variety of disaggregated CPI components.
6.5. HRNN dynamics
In what follows, we take a closer look at several char-
acteristics of the HRNN model that result from the non-
stationary nature of the CPI. The HRNN model is a deep
learning hierarchical model that requires substantial train-
ing time depending on the available hardware. In this
work, the HRNN model was trained once using the train-
ing dataset and evaluated on the test dataset, as explained
above. In order to investigate the potential benefit from
retraining the HRNN every quarter, we performed the
following experiment: For a test-set period from 2001–
2018, we retrained HRNN(4) after each quarter, each
time adding the hierarchical CPI values of the last three
months. Fig. 8 presents the results of this experiment. The
dashed green line presents the RMSE of HRNN(4) with
the regular training used in this work, while the dotted
blue line presents the results of retraining the HRNN every
quarter. As expected, in most cases, retraining the model
with additional data from the recent period improves the results. However, this improvement is moderate, and the overall model quality is about the same.

Fig. 8. Effect of retraining HRNN(4) each quarter.

Table 6
Average results on disaggregated CPI components prior to the GFC.
Model name                        RMSE per horizon (AR(1) = 1.00)      Correlation (at horizon = 0)
                                  0     1     2     3     4     8      Pearson   Distance
AR(1) 1.00 1.00 1.00 1.00 1.00 1.00 0.07 0.05
AR(2) 1.00 1.00 1.00 1.00 1.00 1.00 0.08 0.06
AR(3) 1.00 1.00 1.00 1.00 1.00 1.00 0.09 0.07
AR(4) 1.00 1.00 1.00 1.00 1.00 1.00 0.09 0.07
AR-GAP(3) 1.00 1.00 1.00 1.00 1.00 1.00 0.09 0.07
AR-GAP(4) 1.00 1.00 1.00 1.00 1.00 1.00 0.10 0.07
RW(4) 1.00 1.00 1.00 1.00 1.00 1.00 0.05 0.04
Phillips(4) 1.00 1.00 0.99 0.99 1.00 1.00 0.05 0.03
VAR(1) 1.04 1.04 1.04 1.05 1.05 1.06 0.04 0.03
VAR(2) 1.03 1.04 1.04 1.04 1.05 1.05 0.05 0.03
VAR(3) 1.03 1.03 1.03 1.04 1.04 1.05 0.06 0.03
VAR(4) 1.02 1.03 1.03 1.03 1.03 1.04 0.06 0.04
LSTAR(ρ=4, c=2, γ=0.3) 1.05 1.06 1.05 1.08 1.09 1.10 0.08 0.06
RF(4) 0.92 0.91 0.91 0.92 0.92 0.95 0.2 0.29
GBT(4) 0.91 0.92 0.91 0.93 0.92 0.97 0.18 0.34
FC(4) 0.99 0.99 1.00 1.00 1.02 1.05 0.11 0.08
Deep-NN(4) 0.94 0.95 0.94 0.94 0.94 0.95 0.15 0.32
Deep-NN(4) + Unemployment 0.92 0.92 0.94 0.95 0.93 0.95 0.2 0.35
S-GRU(4) 1.05 1.09 1.09 1.10 1.09 1.10 0.09 0.07
I-GRU(4) 0.86 0.90 0.90 0.92 0.93 0.94 0.33 0.35
KNN-GRU(1) 0.94 0.96 0.96 0.96 0.97 0.98 0.10 0.07
KNN-GRU(2) 0.94 0.96 0.95 0.96 0.97 0.98 0.11 0.08
KNN-GRU(3) 0.93 0.96 0.95 0.96 0.96 0.98 0.11 0.08
KNN-GRU(4) 0.93 0.96 0.96 0.95 0.96 0.97 0.12 0.09
HRNN(1) 0.85 0.89 0.90 0.92 0.91 0.94 0.23 0.27
HRNN(2) 0.84 0.89 0.90 0.92 0.91 0.94 0.24 0.25
HRNN(3) 0.84 0.89 0.89 0.92 0.91 0.93 0.28 0.34
HRNN(4) 0.83 0.88 0.88 0.91 0.90 0.93 0.35 0.37
Notes: Average results across all 424 inflation indexes that make up the headline CPI. In contrast to Table 2, here we focus on results up to the GFC of 2008. The RMSE results are relative to the AR(1) model and normalized according to its results, i.e., RMSE_Model / RMSE_AR(1). The results are statistically significant according to a Diebold–Mariano test with p < 0.05.
To study the effect of the GFC on the HRNN's performance, we removed the data from 2008 onward and repeated the experiment of Table 2 using only the data from 1997 up to 2008. The results are summarized in Table 6. In terms of RMSE, the relative gains of the HRNN in Table 2 range from 0.78 to 0.80, whereas in Table 6 they range from 0.83 to 0.93. This indicates that during the turmoil of the GFC, when the demand for reliable and precise forecasting tools is heightened, the HRNN's forecasting abilities remained robust. In fact, its forecasting advantage over the AR(1) baseline was somewhat larger during the GFC.
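For reference, the two evaluation measures reported alongside Tables 2 and 6 can be computed along the following lines. This is a sketch under simplifying assumptions: the Diebold-Mariano statistic below uses squared-error loss differentials with a plain variance estimator, whereas a full implementation would typically apply a HAC correction for autocorrelated differentials.

import numpy as np
from scipy import stats

def relative_rmse(pred_model, pred_ar1, actual):
    # RMSE of a candidate model normalized by the RMSE of the AR(1) baseline;
    # values below 1.00 indicate an improvement over AR(1).
    rmse = lambda p: np.sqrt(np.mean((np.asarray(p) - np.asarray(actual)) ** 2))
    return rmse(pred_model) / rmse(pred_ar1)

def diebold_mariano(err_model, err_baseline):
    # Simplified DM test on squared-error loss differentials.
    d = np.asarray(err_model) ** 2 - np.asarray(err_baseline) ** 2
    dm_stat = d.mean() / np.sqrt(d.var(ddof=1) / len(d))
    p_value = 2 * (1 - stats.norm.cdf(abs(dm_stat)))  # two-sided normal approximation
    return dm_stat, p_value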
7. Concluding remarks
Policymakers have a wide range of predictive tools at their disposal to forecast headline inflation: survey data, expert forecasts, inflation swaps, and economic and econometric models.
Table 7
Indexes, levels 0 and 1.
Level Index Parent
0 All items –
1 All items less energy All items
1 All items less food All items
1 All items less food and energy All items
1 All items less food and shelter All items
1 All items less food, shelter, and energy All items
1 All items less food, shelter, energy, and used cars and trucks All items
1 All items less homeowners costs All items
1 All items less medical care All items
1 All items less shelter All items
1 Apparel All items
1 Apparel less footwear All items
1 Commodities All items
1 Commodities less food All items
1 Durables All items
1 Education and communication All items
1 Energy All items
1 Entertainment All items
1 Food All items
1 Food and beverages All items
1 Fuels and utilities All items
1 Household furnishings and operations All items
1 Housing All items
1 Medical care All items
1 Nondurables All items
1 Nondurables less food All items
1 Nondurables less food and apparel All items
1 Other goods and services All items
1 Other services All items
1 Recreation All items
1 Services All items
1 Services less medical care services All items
1 Services less rent of shelter All items
1 Transportation All items
1 Utilities and public transportation All items
Note: The levels and parents of indexes may change over time.
However, policy institutions lack models and data to assist with forecasting CPI components, which are essential for a deeper understanding of the underlying dynamics. Understanding disaggregated inflation trends can provide insight into the nature of future inflation pressures, their transitory factors (seasonal factors, energy, etc.), and other factors that influence market-makers and the conduct of monetary policy, among other decision-makers. Hence, our hierarchical approach uses endogenous historical data to forecast the CPI at the disaggregated level, rather than forecasting headline inflation directly, even though the latter approach performs well (Ibarra, 2012).
The business cycle plays an important role in inflation
dynamics, particularly through specific product classes.
CPI inflation dynamics are sometimes driven by compo-
nents unrelated to central bank policy objectives, such as
food and energy prices. A disaggregated CPI forecast pro-
vides a more accurate picture of the sources and features
of future inflation pressures in the economy, which in turn improves the efficiency of policymakers' responses. Indeed, forecasting sectoral inflation may improve the solution to the optimization problem faced by the central bank (Ida, 2020).
While similar headline inflation forecasts may correspond to various underlying economic factors, a disaggregated perspective allows these forecasts to be understood and analyzed at the sectoral or component level. Instead of disaggregating inflation in order to forecast headline inflation (Stock & Watson, 2020), our approach allows policy- and market-makers to forecast specific sector and component prices, for which information is scarce: almost no component- or sector-specific survey forecasts, expert forecasts, or market-based forecasts exist. For instance, a central bank could use such a model to account for components that contribute to inflation but are unrelated to its primary inflation objectives (e.g., military goods, food, cigarettes, and energy), thereby improving its final assessment of its inflation forecasts. Sector-specific inflation forecasts can also inform economic policy recommendations at the sectoral level, and market-makers can better direct and tune their investment strategies (Swinkels, 2018).
Traditional approaches to inflation forecasting often rely on a theoretical or linear model, which inevitably biases the estimated forecasts. Our approach may overcome the usual shortcomings of traditional forecasts, giving policymakers new insights from a different angle. Disaggregated forecasts include explanatory variables with hierarchies that reduce measurement errors at the component level. Additionally, our model structure attenuates component-specific residuals at each level and sector, resulting in improved forecasting. For all these reasons, we believe the HRNN can be a valuable tool for asset managers, policy institutions, and market-makers that lack the component-specific price forecasts critical to their decision processes.
Table 8
Indexes, level 2.
Level Index Parent
2 All items less food and energy All items less energy
2 Apparel commodities Apparel
2 Apparel services Apparel
2 Commodities less food Commodities
2 Commodities less food and beverages Commodities
2 Commodities less food and energy commodities All items less food and energy
2 Commodities less food, energy, and used cars and trucks Commodities
2 Communication Education and communication
2 Domestically produced farm food Food and beverages
2 Education Education and communication
2 Energy commodities Energy
2 Energy services Energy
2 Entertainment commodities Entertainment
2 Entertainment services Entertainment
2 Food Food and beverages
2 Food at home Food
2 Food away from home Food
2 Footwear Apparel
2 Fuels and utilities Housing
2 Homeowners costs Housing
2 Household energy Fuels and utilities
2 Household furnishings and operations Housing
2 Infants’ and toddlers’ apparel Apparel
2 Medical care commodities Medical care
2 Medical care services Medical care
2 Men’s and boys’ apparel Apparel
2 Nondurables less food Nondurables
2 Nondurables less food and apparel Nondurables
2 Nondurables less food and beverages Nondurables
2 Nondurables less food, beverages, and apparel Nondurables
2 Other services Services
2 Personal and educational expenses Other goods and services
2 Personal care Other goods and services
2 Pets, pet products and services Recreation
2 Photography Recreation
2 Private transportation Transportation
2 Public transportation Transportation
2 Rent of shelter Services
2 Services less energy services All items less food and energy
2 Services less medical care services Services
2 Services less rent of shelter Services
2 Shelter Housing
2 Tobacco and smoking products Other goods and services
2 Transportation services Services
2 Video and audio Recreation
2 Women’s and girls’ apparel Apparel
Note: Levels and parents of indexes have changed over the years.
The HRNN model was designed for predicting disaggregated CPI components. However, we believe its merits may extend to other hierarchical time series, such as GDP. In future work, we plan to investigate the performance of the HRNN model on additional hierarchical time series. Moreover, in this paper we focused mainly on endogenous models that do not consider other economic variables. The HRNN can naturally be extended to include such variables as side information by changing the input of the GRU components to a multi-dimensional time series instead of a one-dimensional one, as sketched below. We plan to experiment with additional side information that can potentially improve prediction accuracy; in particular, we will experiment with online price data, as in Aparicio & Bertolotto (2020). Finally, we will try to replace the RNNs in the model with neural self-attention (Shaw et al., 2018). Hopefully, this will lead to improved accuracy and better explainability through the analysis of attention scores (Hsieh et al., 2021).
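To illustrate the proposed extension, the following PyTorch sketch shows a GRU component whose input is a multi-dimensional series. The layer sizes and feature count are illustrative assumptions rather than the configuration used in this paper.

import torch
import torch.nn as nn

class GRUWithSideInfo(nn.Module):
    # With n_features == 1 this reduces to the endogenous, univariate case;
    # n_features > 1 appends side information (e.g., online price data).
    def __init__(self, n_features, hidden_size=16):
        super().__init__()
        self.gru = nn.GRU(input_size=n_features, hidden_size=hidden_size,
                          batch_first=True)
        self.head = nn.Linear(hidden_size, 1)  # next-period inflation forecast

    def forward(self, x):            # x: (batch, time, n_features)
        _, h = self.gru(x)
        return self.head(h[-1])      # forecast from the final hidden state

# Example: 8 component series, 12 monthly observations, 3 input features.
model = GRUWithSideInfo(n_features=3)
y_hat = model(torch.randn(8, 12, 3))  # shape: (8, 1)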
Declaration of competing interest
The authors declare that they have no known com-
peting financial interests or personal relationships that
could have appeared to influence the work reported in
this paper.
Appendix. Additional tables and figures
See Figs. 9 and 10 and Tables 7 and 8.
Fig. 9. Additional examples of HRNN(4) predictions for disaggregated indexes.
Fig. 10. Additional examples of HRNN(4) predictions for disaggregated indexes.
Indexes in Figs. 9 and 10 were selected from different hierarchies and sectors.
References
Almosova, A., & Andresen, N. (2019). Nonlinear inflation forecasting with recurrent neural networks: Technical report, European Central Bank (ECB).
Aparicio, D., & Bertolotto, M. I. (2020). Forecasting inflation with online prices. International Journal of Forecasting, 36(2), 232–247.
Athey, S. (2018). The impact of machine learning on economics. In The economics of artificial intelligence: An agenda (pp. 507–547). University of Chicago Press.
Atkeson, A., & Ohanian, L. E. (2001). Are Phillips curves useful for forecasting inflation? Federal Reserve Bank of Minneapolis Quarterly Review, 25(1), 2–11.
Bernanke, B. S., Laubach, T., Mishkin, F. S., & Posen, A. S. (2018). Inflation targeting: Lessons from the international experience. Princeton, NJ: Princeton University Press.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Chakraborty, C., & Joseph, A. (2017). Machine learning at central banks. Bank of England Working Papers, No. 674.
Chen, X., Racine, J., & Swanson, N. R. (2001). Semiparametric ARX neural-network models with an application to forecasting inflation. IEEE Transactions on Neural Networks, 12(4), 674–683.
Choudhary, M. A., & Haider, A. (2012). Neural network models for inflation forecasting: An appraisal. Applied Economics, 44(20), 2631–2635.
Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
Clevert, D. A., Unterthiner, T., & Hochreiter, S. (2016). Fast and accurate deep network learning by exponential linear units (ELUs). arXiv: Learning.
Dey, R., & Salem, F. M. (2017). Gate-variants of gated recurrent unit (GRU) neural networks. In 2017 IEEE 60th international midwest symposium on circuits and systems (pp. 1597–1600). IEEE.
Diebold, F. X., & Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business & Economic Statistics, 13(3), 253–263.
van Dijk, D., Terasvirta, T., & Franses, P. H. (2002). Smooth transition autoregressive models — A survey of recent developments. Econometric Reviews, 21(1), 1–47.
Faust, J., & Wright, J. H. (2013). Forecasting inflation. In G. Elliott, C. Granger, & A. Timmermann (Eds.), Handbook of economic forecasting: Vol. 2 (pp. 2–56). Elsevier.
Friedman, M. (1961). The lag in effect of monetary policy. Journal of Political Economy, 69, 447.
Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367–378.
Gilchrist, S., Schoenle, R., Sim, J., & Zakrajšek, E. (2017). Inflation dynamics during the financial crisis. American Economic Review, 107(3), 785–823.
Goulet Coulombe, P. (2020). To bag is to prune. arXiv e-prints, arXiv–2008.
Goulet Coulombe, P., Leroux, M., Stevanovic, D., & Surprenant, S. (2022). How is machine learning useful for macroeconomic forecasting? Journal of Applied Econometrics, in press.
Hansen, P. R., Lunde, A., & Nason, J. M. (2011). The model confidence set. Econometrica, 79(2), 453–497.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Hsieh, T.-Y., Wang, S., Sun, Y., & Honavar, V. (2021). Explainable multivariate time series classification: A deep neural network which learns to attend to important variables as well as time intervals. In Proceedings of the 14th ACM international conference on web search and data mining (pp. 607–615).
Ibarra, R. (2012). Do disaggregated CPI data improve the accuracy of inflation forecasts? Economic Modelling, 29(4), 1305–1313.
Ida, D. (2020). Sectoral inflation persistence and optimal monetary policy. Journal of Macroeconomics, 65(C).
Lipton, Z. C., Berkowitz, J., & Elkan, C. (2015). A critical review of recurrent neural networks for sequence learning. CoRR.
Makridakis, S., Assimakopoulos, V., & Spiliotis, E. (2018). Objectivity, reproducibility and replicability in forecasting research. International Journal of Forecasting, 34(4), 835–838.
Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2020). The M4 competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, 36(1), 54–74.
Mandic, D., & Chambers, J. (2001). Recurrent neural networks for prediction: Learning algorithms, architectures and stability. Wiley.
McAdam, P., & McNelis, P. (2005). Forecasting inflation with thick models and neural networks. Economic Modelling, 22(5), 848–867.
Medeiros, M., Vasconcelos, G., Veiga, A., & Zilberman, E. (2021). Forecasting inflation in a data-rich environment: The benefits of machine learning methods. Journal of Business & Economic Statistics, 39(1).
Mullainathan, S., & Spiess, J. (2017). Machine learning: An applied econometric approach. Journal of Economic Perspectives, 31(2), 87–106.
Nakamura, E. (2005). Inflation forecasting using a neural network. Economics Letters, 86(3), 373–378.
Olson, M., Wyner, A. J., & Berk, R. (2018). Modern neural networks generalize on small data sets. In Proceedings of the 32nd international conference on neural information processing systems (pp. 3623–3632).
Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions. CoRR.
Schapire, R. E. (1999). A brief introduction to boosting. In Proceedings of the 16th international joint conference on artificial intelligence (IJCAI'99), Vol. 2 (pp. 1401–1406). San Francisco, CA: Morgan Kaufmann Publishers Inc.
Shaw, P., Uszkoreit, J., & Vaswani, A. (2018). Self-attention with relative position representations. arXiv preprint arXiv:1803.02155.
Song, Y.-Y., & Ying, L. (2015). Decision tree methods: Applications for classification and prediction. Shanghai Archives of Psychiatry, 27(2), 130.
Stock, J. H., & Watson, M. W. (2007). Why has US inflation become harder to forecast? Journal of Money, Credit and Banking, 39, 3–33.
Stock, J. H., & Watson, M. W. (2010). Modeling inflation after the crisis: Technical report, National Bureau of Economic Research.
Stock, J. H., & Watson, M. W. (2020). Trend, seasonal, and sectorial inflation in the euro area. In G. Castex, J. Galí, & D. Saravia (Eds.), Central banking, analysis, and economic policies book series: Vol. 27, Changing inflation dynamics, evolving monetary policy (pp. 317–344). Central Bank of Chile.
Swinkels, L. (2018). Simulating historical inflation-linked bond returns. Journal of Empirical Finance, 48(C), 374–389.
Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769–2794.
Woodford, M. (2012). Inflation targeting and financial stability. Sveriges Riksbank Economic Review, 1, 7–32.
Yu, Y., Si, X., Hu, C., & Zhang, J. (2019). A review of recurrent neural networks: LSTM cells and network architectures. Neural Computation, 31(7), 1235–1270.
Zahara, S., & Ilmiddaviq, M. (2020). Consumer price index prediction using long short term memory (LSTM) based cloud computing. Journal of Physics: Conference Series, 1456, Article 012022.
Zhou, Z. (2012). Measuring nonlinear dependence in time-series: A distance correlation approach. Journal of Time Series Analysis, 33(3), 438–457.