Forecasting CPI Inflation Components with Hierarchical Recurrent Neural Networks

Oren Barkan^a, Jonathan Benchimol^b, Itamar Caspi^b, Eliya Cohen^c, Allon Hammer^c, Noam Koenigstein^c

^a Ariel University
^b Bank of Israel
^c Tel Aviv University
Abstract
We present a hierarchical architecture based on recurrent neural networks for predicting disaggregated inflation components of the Consumer Price Index (CPI). While the majority of existing research is focused on predicting headline inflation, many economic and financial institutions are interested in its partial disaggregated components. To this end, we developed the novel Hierarchical Recurrent Neural Network (HRNN) model, which utilizes information from higher levels in the CPI hierarchy to improve predictions at the more volatile lower levels. Based on a large dataset from the US CPI-U index, our evaluations indicate that the HRNN model significantly outperforms a vast array of well-known inflation prediction baselines. Our methodology and results provide additional forecasting measures and possibilities to policy and market makers on sectoral and component-specific price changes.
Keywords: Inflation forecasting, Disaggregated inflation, Consumer Price Index, Machine learning, Gated Recurrent Unit, Recurrent Neural Networks.
JEL Classification: C45, C53, E31, E37.
Email address: noamk@tauex.tau.ac.il (Noam Koenigstein)
International Journal of Forecasting, forthcoming June 20, 2022
1. Introduction
The Consumer Price Index (CPI) is a measure of the average change over time in the prices paid by a representative consumer for a common basket of goods and services. The CPI attempts to quantify and measure the average cost of living in a given country by estimating the purchasing power of a single unit of currency. Therefore, it is the key macroeconomic indicator for measuring inflation (or deflation). As such, the CPI is a major driving force in the economy, influencing a plethora of market dynamics. In this work, we present a novel model based on Recurrent Neural Networks (RNNs) for forecasting disaggregated CPI inflation components.

In the mid-1980s, many advanced economies began a major process of disinflation known as the "great moderation". This period was characterized by steady low inflation and moderate yet steady economic growth (Faust and Wright, 2013). Later, the Global Financial Crisis (GFC) of 2008, and more recently the economic effects of the Covid-19 pandemic, were met with unprecedented monetary policies, potentially altering the underlying inflation dynamics worldwide (Woodford, 2012; Gilchrist et al., 2017; Bernanke et al., 2018). While economists still debate the underlying forces that drive inflation, all agree on the importance and value of contemporary inflation research, measurement, and estimation. Moreover, the CPI is a composite index comprised of an elaborate hierarchy of sub-indexes, each with its own dynamics and driving forces. Hence, in order to better understand inflation dynamics, it is useful to deconstruct the CPI index and look into the specific disaggregated components "underneath" the main headline.
In the US, the Consumer Price Index (CPI) is calculated and reported by the Bureau of Labor Statistics (BLS). It represents the cost of a basket of goods and services across the country on a monthly basis. The CPI is a hierarchical composite index system that partitions all consumer goods and services into a hierarchy of increasingly detailed categories. In the US, the top CPI headline is composed of eight major sector indexes: (1) Housing, (2) Food and Beverages, (3) Medical Care, (4) Apparel, (5) Transportation, (6) Energy, (7) Recreation, and (8) Other goods and services. Each sector is composed of finer and finer sub-indexes until the entry levels or "leaves" are reached. These entry-level indexes represent concrete measurable products or services whose price levels are being tracked. For example, the White Bread entry is classified under the following eight-level hierarchy: All Items → Food and Beverages → Food at Home → Cereals and Bakery Products → Cereals and Cereal Products → Bakery Products → Bread → White Bread.
The ability to accurately estimate the upcoming disaggregated inflation rate is of high interest to policymakers and market players: Inflation forecasting is a critical tool in adjusting monetary policies around the world (Friedman, 1961). Central banks predict future inflation trends to justify interest rate decisions and to control and maintain inflation around its target. A better understanding of upcoming inflation dynamics at the component level can help inform and elucidate decision-makers for optimal monetary policy (Ida, 2020). Predicting disaggregated inflation rates is also important to fiscal authorities that wish to forecast sectoral inflation dynamics to adjust social security payments and assistance packages to specific industrial sectors. In the private sector, investors in fixed-income markets wish to estimate future sectoral inflation in order to foresee upcoming trends in discounted real returns. Additionally, some private firms need to predict specific inflation components in order to forecast price dynamics and mitigate risks accordingly. Finally, both government and private debt levels and interest payments heavily depend on the expected path of inflation. These are just a few examples that emphasize the importance of disaggregated inflation forecasting.
Most existing inflation forecasting models attempt to predict the headline CPI while implicitly assuming the same approach can be effectively applied to its disaggregated components (Faust and Wright, 2013). However, as we show later, and in line with the literature, the disaggregated components are more volatile and harder to predict. Moreover, changes in the CPI components are more prevalent at the lower levels than at the main categories. As a result, lower hierarchy levels often have fewer historical measurements for training modern machine learning algorithms.
In this work, we present the Hierarchical Recurrent Neural Network (HRNN) model, a novel model based on RNNs that utilizes the CPI's inherent hierarchy for improved predictions at its lower levels. HRNN is a hierarchical arrangement of RNNs analogous to the CPI's hierarchy. This architecture allows information to propagate from higher to lower levels in order to mitigate the volatility and information sparsity that otherwise impede advanced machine learning approaches. Hence, a key advantage of the HRNN model stems from its superiority at inflation prediction at lower levels of the CPI hierarchy. Our evaluations indicate that HRNN outperforms many existing baselines at inflation forecasting of different CPI components below the top headline and across different time horizons.
Finally, our data and code are publicly available on GitHub^1 to enable reproducibility and foster future evaluations of new methods. By doing so, we comply with the call to make data and algorithms more open and transparent to the community (Makridakis et al., 2018, 2020).
The remainder of the paper is organized as follows. Section 2 presents a literature review of baseline inflation forecasting models and machine learning models. Section 3 explains recurrent neural network methodologies. Our novel HRNN model is presented in Section 4. Section 5 describes the price data and data transformations. In Section 6, we present our results and compare them to alternative approaches. Finally, we conclude in Section 7 by discussing potential implications of the current research and several future directions.
2. Related Work
While inflation forecasting is a challenging task of high importance, the literature indicates that significant improvement upon basic time-series models and heuristics is hard to achieve. Indeed, Atkeson et al. (2001) found that forecasts based on simple averages of past inflation were more accurate than all other alternatives, including the canonical Phillips curve and other forms of structural models. Similarly, Stock and Watson (2007, 2010) provide empirical evidence for the superiority of univariate models in forecasting inflation during the great moderation period (1985 to 2007) and during the recovery ensuing the GFC. More recently, Faust and Wright (2013) conducted an extensive survey of inflation forecasting methods and found that a simple "glide path" prediction from the current inflation rate performs as well as model-based forecasts for long-run inflation rates and often outperforms them.

^1 The code and data are available at https://github.com/AllonHammer/CPI_HRNN
Recently, an increasing amount of effort has been directed towards the application of machine learning models to inflation forecasting. For example, Medeiros et al. (2021) compared inflation forecasting with several machine learning models such as lasso regression, random forests, and deep neural networks. However, Medeiros et al. (2021) mainly focused on using exogenous features such as cash and credit availability, online prices, housing prices, consumer data, exchange rates, and interest rates. When exogenous features are considered, the emphasis shifts from learning the endogenous time-series patterns to effectively extracting the predictive information from the exogenous features. In contrast to Medeiros et al. (2021), we preclude the use of any exogenous features and focus on harnessing the internal patterns of the CPI series. Moreover, unlike previous works that dealt with estimating the main headline, this work is focused on predicting the disaggregated indexes that comprise the CPI.

In general, machine learning methods flourish where data is found in abundance and many training examples are available. Unfortunately, this is not the case with CPI inflation data. While a large number of relevant exogenous features exist, there are only twelve monthly readings annually. Hence, the amount of available training examples is limited. Furthermore, Stock and Watson (2007) show that statistics such as the average inflation rate, conditional volatility, and persistency levels shift over time. Hence, inflation is a non-stationary process, which further limits the amount of relevant historical data points.
Goulet Coulombe et al. (2022), Mullainathan and Spiess (2017), Athey (2018), and Chakraborty and Joseph (2017) present comprehensive surveys of general machine learning applications in economics. Here, we do not attempt to cover the plethora of research employing machine learning for economic forecasting. Instead, in what follows, we focus on models that apply neural networks to CPI forecasting.
This paper joins several studies that apply neural network methods to the specific task of inflation forecasting: Nakamura (2005) employed a simple feed-forward network to predict quarterly CPI headline values, placing special emphasis on early stopping methodologies to prevent overfitting. Their evaluations are based on US CPI data during 1978-2003, and predictions are compared against several autoregressive (AR) baselines. Presented in Section 6, our evaluations confirm the finding of Nakamura (2005) that a fully connected network is indeed effective at predicting the headline CPI. However, when the CPI components are considered, we show that the model in this work demonstrates superior accuracy.
Choudhary and Haider (2012) used several neural networks to forecast monthly inflation rates in 28 countries of the Organisation for Economic Co-operation and Development (OECD). Their findings showed that, on average, neural network models were superior in 45% of the countries, while simple AR models of order one (AR(1)) performed better in 23% of the countries. They also proposed combining an ensemble of multiple networks arithmetically for further accuracy.
Chen et al. (2001) explored semi-parametric nonlinear autoregressive models with exogenous variables (NLARX) based on neural networks. Their investigation covered a comparison of different nonlinear activation functions such as the Sigmoid, radial basis, and Ridgelet activations.
McAdam and McNelis (2005) explored Thick Neural Network models that represent "trimmed mean" forecasts from several models. By combining the network with a linear Phillips Curve model, they predict the CPI for the US, Japan, and Europe at different levels.

In contrast to the aforementioned works, our model predicts monthly CPI values at all hierarchy levels. We utilize information patterns from higher levels of the CPI hierarchy in order to assist the predictions at lower levels. Such predictions are more challenging due to the inherent noise and information sparsity at the lower levels. Moreover, the HRNN model in this work is better equipped to harness sequential patterns in the data by employing Recurrent Neural Networks. Finally, we exclude the use of exogenous variables and rely solely on historical CPI data in order to focus on modeling internal CPI patterns.
Almosova and Andresen (2019) employed long short-term memory networks (LSTMs) for inflation forecasting. They compared their approach to multiple baselines such as autoregressive models, random walk models, seasonal autoregressive models, Markov switching models, and fully-connected neural networks. At all time horizons, the root mean squared forecast error of their LSTM model was approximately one-third that of the random walk model, and their forecasts were significantly more accurate than the other baselines.
As we explain in Section 3.3, our model uses Gated Recurrent Units (GRUs), which are similar to LSTMs. Unlike Almosova and Andresen (2019) and Zahara et al. (2020), a key contribution of our model stems from its ability to propagate useful information from higher levels in the hierarchy down to the nodes at lower levels. When the hierarchical relations between the different CPI components are ignored, our model reduces to a set of simple unrelated GRUs. This setup is similar to Almosova and Andresen (2019), as the difference between LSTMs and GRUs is negligible. In Section 6, we perform an ablation study in which HRNN ignores the hierarchical relations and is reduced to a collection of independent GRUs, similar to the model in Almosova and Andresen (2019). Our evaluations indicate that this approach is not optimal at any level of the CPI hierarchy.
3. Recurrent Neural Networks
Before describing the HRNN model in detail, we briefly overview the main RNN approaches. RNNs are neural networks that model sequences of data in which each value is assumed to be dependent on previous values. Specifically, RNNs are feed-forward networks augmented with a feedback loop (Mandic and Chambers, 2001). As such, RNNs introduce a notion of time to standard feed-forward neural networks and excel at modeling temporal dynamic behavior (Chung et al., 2014). Some RNN units retain an internal memory state from previous time steps, representing an arbitrarily long context window. Many RNN implementations have been proposed and studied in the past. A comprehensive review and comparison of the different RNN architectures is available in Lipton et al. (2015) and Chung et al. (2014). In this section, we cover the three most popular units: the basic RNN, Long Short-Term Memory (LSTM), and the Gated Recurrent Unit (GRU).
Figure 1. An illustration of a basic RNN unit.
Each line carries an entire vector, from the output of one node to the inputs of others. The yellow box is a learned neural network layer.
3.1. Basic Recurrent Neural Networks
Let $\{x_t\}_{t=1}^T$ be the model's input time series consisting of $T$ samples. Similarly, let $\{s_t\}_{t=1}^T$ be the model's results consisting of $T$ samples from the target time series. Namely, the model's input at $t$ is $x_t$, and its output (prediction) is $s_t$. The following equation defines a basic RNN unit:

$$s_t = \tanh(x_t u + s_{t-1} w + b), \tag{1}$$

where $u$, $w$, and $b$ are the model's parameters and $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ is the hyperbolic tangent function. Namely, the model's output from the previous period $s_{t-1}$ is used as an additional input to the model at time $t$, along with the current input $x_t$. The linear combination $x_t u + s_{t-1} w + b$ is the argument of a hyperbolic tangent activation function, allowing the unit to model nonlinear relations between inputs and outputs. Different implementations may employ other activation functions, e.g., the Sigmoid function, some logistic functions, or a Rectified Linear Unit (ReLU) function (Ramachandran et al., 2017). Figure 1 depicts an illustration of a basic RNN unit.
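As a concrete illustration, the recurrence in Equation (1) can be sketched in a few lines of NumPy. This is a minimal scalar sketch with arbitrary, untrained parameter values (the choices of u, w, and b below are illustrative), not the implementation used in this paper:

```python
import numpy as np

def rnn_forward(x, u, w, b):
    """Run the basic RNN of Equation (1) over a series x.

    s_t = tanh(x_t * u + s_{t-1} * w + b), with s_0 = 0.
    Returns the sequence of outputs (predictions) s_1..s_T.
    """
    s_prev = 0.0
    outputs = []
    for x_t in x:
        s_t = np.tanh(x_t * u + s_prev * w + b)
        outputs.append(s_t)
        s_prev = s_t  # the feedback loop: the output feeds the next step
    return np.array(outputs)

# Toy usage with arbitrary (untrained) parameters:
series = np.array([0.1, 0.2, -0.1, 0.3])
print(rnn_forward(series, u=0.5, w=0.8, b=0.0))
```

Note that the tanh activation keeps every output in the interval (-1, 1), which is why activations such as ReLU are sometimes preferred when outputs are unbounded.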
3.2. Long Short-Term Memory Networks
Basic RNNs suffer from the "short-term memory" problem: they utilize data from recent history to forecast, but if a sequence is long enough, they cannot carry relevant information from earlier periods to later ones, e.g., relevant patterns from the same month in previous years. Long Short-Term Memory networks (LSTMs) mitigate the "short-term memory" problem by introducing gates that enable the preservation of relevant "long-term memory" and combine it with the most recent data (Hochreiter and Schmidhuber, 1997). The introduction of LSTMs paved the way for significant strides forward in various fields such as natural language processing, speech recognition, robot control, and more (Yu et al., 2019).

An LSTM unit has the ability to "memorize" or "forget" information through the use of a special memory cell state, carefully regulated by three gates: an input gate, a forget gate, and an output gate. The gates regulate the flow of information into and out of the memory cell state. An LSTM unit is defined by the following set of equations:

$$\begin{aligned}
i &= \sigma(x_t u_i + s_{t-1} w_i + b_i), \\
f &= \sigma(x_t u_f + s_{t-1} w_f + b_f), \\
o &= \sigma(x_t u_o + s_{t-1} w_o + b_o), \\
\tilde{c} &= \tanh(x_t u_c + s_{t-1} w_c + b_c), \\
c_t &= f \times c_{t-1} + i \times \tilde{c}, \\
s_t &= o \times \tanh(c_t),
\end{aligned} \tag{2}$$

where $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the sigmoid or logistic activation function. $u_i$, $w_i$, and $b_i$ are the learned parameters that control the input gate $i$; $u_f$, $w_f$, and $b_f$ are the learned parameters that control the forget gate $f$; and $u_o$, $w_o$, and $b_o$ are the learned parameters that control the output gate $o$. $\tilde{c}$ is the new candidate activation for the cell state, determined by the parameters $u_c$, $w_c$, and $b_c$. The cell state itself $c_t$ is updated by the linear combination $c_t = f \times c_{t-1} + i \times \tilde{c}$, where $c_{t-1}$ is the previous value of the cell state. The input gate $i$ determines which parts of the candidate $\tilde{c}$ should be used to modify the memory cell state, and the forget gate $f$ determines which parts of the previous memory $c_{t-1}$ should be discarded. Finally, the recently updated cell state $c_t$ is "squashed" through a nonlinear hyperbolic tangent, and the output gate $o$ determines which parts of it should be presented in the output $s_t$. Figure 2 depicts an illustration of an LSTM unit.
Figure 2. An illustration of an LSTM unit.
Each line carries an entire vector, from the output of one node to the inputs of others. The pink circles represent pointwise operations, while the yellow boxes are learned neural network layers. Lines merging denote concatenation, while a line forking denotes its content being copied and the copies going to different locations.
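The gate arithmetic of Equation (2) can be sketched as a single scalar LSTM step. The parameter dictionary p and its key names below are our own illustrative convention, not taken from the paper's code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, s_prev, c_prev, p):
    """One scalar LSTM step following Equation (2).

    p holds the twelve learned parameters (u, w, b per gate and candidate).
    Returns the new output s_t and the new cell state c_t.
    """
    i = sigmoid(x_t * p["u_i"] + s_prev * p["w_i"] + p["b_i"])        # input gate
    f = sigmoid(x_t * p["u_f"] + s_prev * p["w_f"] + p["b_f"])        # forget gate
    o = sigmoid(x_t * p["u_o"] + s_prev * p["w_o"] + p["b_o"])        # output gate
    c_tilde = np.tanh(x_t * p["u_c"] + s_prev * p["w_c"] + p["b_c"])  # candidate
    c_t = f * c_prev + i * c_tilde  # mix the old memory with the candidate
    s_t = o * np.tanh(c_t)          # expose part of the memory as the output
    return s_t, c_t
```

Running the series through repeated calls of lstm_step, carrying (s_t, c_t) forward, yields the output sequence.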
3.3. Gated Recurrent Unit
A Gated Recurrent Unit (GRU) improves upon the LSTM unit by dropping the cell state in favor of a more simplified unit that requires fewer learnable parameters (Dey and Salemt, 2017). A GRU employs only two gates instead of three: an update gate and a reset gate. Using fewer parameters, GRUs are faster and more efficient, especially when training data is limited, as in the case of inflation predictions and particularly disaggregated inflation components.

The following set of equations defines a GRU unit:

$$\begin{aligned}
z &= \sigma(x_t u_z + s_{t-1} w_z + b_z), \\
r &= \sigma(x_t u_r + s_{t-1} w_r + b_r), \\
v &= \tanh(x_t u_v + (s_{t-1} \times r) w_v + b_v), \\
s_t &= z \times v + (1 - z) \times s_{t-1},
\end{aligned} \tag{3}$$

where $u_z$, $w_z$, and $b_z$ are the learned parameters that control the update gate $z$, and $u_r$, $w_r$, and $b_r$ are the learned parameters that control the reset gate $r$. The candidate activation $v$ is a function of the input $x_t$ and the previous output $s_{t-1}$, and is controlled by the learned parameters $u_v$, $w_v$, and $b_v$. Finally, the output $s_t$ combines the candidate activation $v$ and the previous state $s_{t-1}$, controlled by the update gate $z$. Figure 3 depicts an illustration of a GRU unit.

GRUs enable the "memorization" of relevant information patterns with significantly fewer parameters compared to LSTMs. Hence, GRUs constitute the basic unit for our novel HRNN model, described in Section 4.
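A scalar GRU step following Equation (3), together with its unrolled application to a series, can be sketched as follows. The parameter naming is again an illustrative assumption, and the unrolled function plays the role of the parametric predictor used later in Section 4:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, s_prev, p):
    """One scalar GRU step following Equation (3)."""
    z = sigmoid(x_t * p["u_z"] + s_prev * p["w_z"] + p["b_z"])        # update gate
    r = sigmoid(x_t * p["u_r"] + s_prev * p["w_r"] + p["b_r"])        # reset gate
    v = np.tanh(x_t * p["u_v"] + (s_prev * r) * p["w_v"] + p["b_v"])  # candidate
    return z * v + (1.0 - z) * s_prev  # interpolate old state and candidate

def gru_predict(x_series, p):
    """Successive GRU applications over a series; the final state is the
    one-step-ahead prediction."""
    s = 0.0
    for x_t in x_series:
        s = gru_step(x_t, s, p)
    return s
```

The update gate z makes the trade-off explicit: z near 1 overwrites the state with the candidate, while z near 0 carries the previous state through almost unchanged.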
Figure 3. An illustration of a GRU unit.
Each line carries an entire vector, from the output of one node to the inputs of others. The pink circles represent pointwise operations, while the yellow boxes are learned neural network layers. Lines merging denote concatenation, while a line forking denotes its content being copied and the copies going to different locations.
4. Hierarchical Recurrent Neural Networks
The disaggregated components at lower levels of the CPI hierarchy (e.g., newspapers, medical care, etc.) suffer from missing data as well as higher volatility in change rates. HRNN exhibits a network graph in which each node is associated with an RNN unit that models the inflation rate of a specific (sub-)index (node) in the "full" CPI hierarchy. HRNN's unique architecture allows it to propagate information from RNN nodes at higher levels to lower levels in the CPI hierarchy, coarse to fine-grained, via a chain of hierarchical informative priors over the RNNs' parameters. This unique property of HRNN materializes in better predictions for nodes at lower levels of the hierarchy, as we show later in Section 6.
4.1. Model Formulation
Let $\mathcal{I} = \{n\}_{n=1}^N$ be an enumeration of the nodes in the CPI hierarchy graph. In addition, we define $\pi_n \in \mathcal{I}$ as the parent node of the node $n$. For example, if the nodes $n = 5$ and $n = 19$ represent the indexes of tomatoes and vegetables, respectively, then $\pi_5 = 19$, i.e., the parent node of tomatoes is vegetables.

For each node $n \in \mathcal{I}$, we denote by $x_t^n \in \mathbb{R}$ the observed random variable that represents the CPI value of the node $n$ at timestamp $t \in \mathbb{N}$. We further denote $X_t^n \triangleq (x_1^n, ..., x_t^n)$, where $1 \le t \le T_n$ and $T_n$ is the last timestamp for node $n$. Let $g : \mathbb{R}^m \times \Omega \to \mathbb{R}$ be a parametric function representing an RNN node in the hierarchy. Specifically, $\mathbb{R}^m$ is the space of parameters that control the RNN unit, $\Omega$ is the input time series space, and the function $g$ predicts a scalar value for the next value of the input series. Hence, our goal is to learn the parameters $\theta_n \in \mathbb{R}^m$ s.t. for $X_t^n \in \Omega$,

$$g(\theta_n, X_t^n) = x_{t+1}^n, \quad \forall n \in \mathcal{I}, \text{ and } 1 \le t < T_n.$$
We proceed by assuming a Gaussian error on $g$'s predictions and receive the following expression for the likelihood of the observed time series:

$$p(X_{T_n}^n \mid \theta_n, \tau_n) = \prod_{t=1}^{T_n} p(x_t^n \mid X_{t-1}^n, \theta_n, \tau_n) = \prod_{t=1}^{T_n} \mathcal{N}\left(x_t^n;\, g(\theta_n, X_{t-1}^n),\, \tau_n^{-1}\right), \tag{4}$$

where $\tau_n^{-1} \in \mathbb{R}$ is the variance of $g$'s errors.
Next, we define a hierarchical network of normal priors over the nodes' parameters that attach each node's parameters to those of its parent node. The hierarchical priors follow:

$$p(\theta_n \mid \theta_{\pi_n}, \tau_{\theta_n}) = \mathcal{N}\left(\theta_n;\, \theta_{\pi_n},\, \tau_{\theta_n}^{-1} I\right), \tag{5}$$

where $\tau_{\theta_n}$ is a configurable precision parameter that determines the "strength" of the relation between node $n$'s parameters and the parameters of its parent $\pi_n$. Higher values of $\tau_{\theta_n}$ strengthen the attachment between $\theta_n$ and its prior $\theta_{\pi_n}$.

The precision parameter $\tau_{\theta_n}$ can be seen as a global hyper-parameter of the model to be optimized via cross-validation. However, different nodes in the CPI hierarchy have varying degrees of correlation with their parent nodes. Hence, the value of $\tau_{\theta_n}$ in HRNN is given by:

$$\tau_{\theta_n} = e^{\alpha + C_n}, \tag{6}$$

where $\alpha$ is a hyper-parameter and $C_n = \rho(X_{T_n}^n, X_{T_{\pi_n}}^{\pi_n})$ is the Pearson correlation coefficient between the time series of $n$ and its parent $\pi_n$.
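Equation (6) ties the prior precision to the parent-child correlation. A minimal sketch, assuming the two series are already aligned to the same months and of equal length (the value of alpha is illustrative):

```python
import numpy as np

def prior_precision(child_series, parent_series, alpha):
    """Precision of the hierarchical prior, Equation (6): tau = exp(alpha + C_n).

    C_n is the Pearson correlation between a node's series and its parent's.
    A highly correlated child receives a tight prior (large precision), pulling
    its parameters toward the parent's; an uncorrelated child is left looser.
    """
    c_n = np.corrcoef(child_series, parent_series)[0, 1]
    return np.exp(alpha + c_n)

# A perfectly correlated child gets a tighter prior than an anti-correlated one:
parent = np.array([0.1, 0.2, 0.15, 0.3, 0.25])
print(prior_precision(2.0 * parent, parent, alpha=0.0))  # corr = +1
print(prior_precision(-parent, parent, alpha=0.0))       # corr = -1
```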
Importantly, Equation (5) describes a novel prior relationship between the parameters of a node and its parent in the hierarchy that "grows" increasingly stronger according to the historical correlation between the two series. This ensures that a child node $n$ is kept close to its parent node $\pi_n$ in terms of squared Euclidean distance in the parameter space, especially if they are highly correlated. Note that in the case of the root node (the headline CPI), $\pi_n$ does not exist, and hence we set a normal non-informative regularization prior with zero mean and unit variance.

Figure 4. An illustration of the full HRNN model.
The illustrated hierarchy spans Level 0 (the CPI headline), Level 1 (e.g., Apparel, Food, Energy), Level 2 (e.g., Men's apparel, Footwear, Fruits and Vegetables), and Level 3 (e.g., Fruits, Vegetables).
Let us now denote the aggregation of all series from all levels by $X = \{X_{T_n}^n\}_{n \in \mathcal{I}}$. Similarly, we denote by $\theta = \{\theta_n\}_{n \in \mathcal{I}}$ and $\mathcal{T} = \{\tau_n\}_{n \in \mathcal{I}}$ the aggregation of all the RNN parameters and precision parameters from all levels, respectively. Note that $X$ (the data) is observed, $\theta$ are unobserved learned variables, and $\mathcal{T}$ are determined by Equation (6). The hyper-parameter $\alpha$ from Equation (6) is set by a cross-validation procedure.

With these definitions at hand, we now proceed with the Bayes rule. From Equation (4) and Equation (5), we extract the posterior probability:

$$p(\theta \mid X, \mathcal{T}) = \frac{p(X \mid \theta, \mathcal{T})\, p(\theta)}{p(X)} \propto \prod_{n \in \mathcal{I}} \prod_{t=1}^{T_n} \mathcal{N}\left(x_t^n;\, g(\theta_n, X_{t-1}^n),\, \tau_n^{-1}\right) \prod_{n \in \mathcal{I}} \mathcal{N}\left(\theta_n;\, \theta_{\pi_n},\, \tau_{\theta_n}^{-1} I\right). \tag{7}$$
HRNN optimization follows a Maximum A-Posteriori (MAP) approach. Namely, we wish to find optimal parameter values $\theta^*$ such that:

$$\theta^* = \underset{\theta}{\operatorname{argmax}} \; \log p(\theta \mid X, \mathcal{T}). \tag{8}$$

Note that the objective in Equation (8) depends on the parametric function $g$. HRNN is a general framework that can use any RNN, e.g., a simple RNN, LSTM, GRU, etc. In this work, we chose $g$ to be a scalar GRU because GRUs are capable of long-term memory but with fewer parameters than LSTMs. Hence, each node $n$ is associated with a GRU with its own parameters: $\theta_n = [u_n^z, u_n^r, u_n^v, w_n^z, w_n^r, w_n^v, b_n^z, b_n^r, b_n^v]$. Then, $g(\theta_n, X_t^n)$ is computed by $t$ successive applications of the GRU to $x_i^n$ with $1 \le i \le t$ according to Equation (3). Finally, the HRNN optimization proceeds with stochastic gradient ascent over the objective in Equation (8). Figure 4 depicts an illustration of the entire HRNN architecture.
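Up to additive constants, the log of the posterior in Equation (7) decomposes into a Gaussian likelihood term per node plus a quadratic penalty pulling each node's parameters toward its parent's. A sketch of the resulting (negated) MAP objective follows; the data layout (dicts keyed by node index) is our own illustrative convention, not the paper's implementation:

```python
import numpy as np

def neg_log_posterior(errors_by_node, tau, theta, parent, tau_theta):
    """Negative log posterior of Equation (7), up to additive constants.

    errors_by_node[n]: one-step prediction errors x_t - g(theta_n, X_{t-1})
    tau[n]:            noise precision of node n
    theta[n]:          parameter vector of node n's GRU
    parent[n]:         index of n's parent, or None for the root
    tau_theta[n]:      prior precision from Equation (6)

    Minimizing this quantity is equivalent to the MAP objective of Equation (8).
    """
    loss = 0.0
    for n, errs in errors_by_node.items():
        # Gaussian likelihood term: precision-weighted squared errors.
        loss += 0.5 * tau[n] * np.sum(errs ** 2)
        # Hierarchical prior term: pull theta_n toward its parent's parameters
        # (or toward zero for the root, per the non-informative root prior).
        prior_mean = (theta[parent[n]] if parent[n] is not None
                      else np.zeros_like(theta[n]))
        loss += 0.5 * tau_theta[n] * np.sum((theta[n] - prior_mean) ** 2)
    return loss
```

In practice, this scalar would be minimized by stochastic gradient descent (equivalently, gradient ascent on the log posterior), with the errors recomputed from the GRUs at each step.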
4.2. HRNN Inference
In machine learning, after the model's parameters have been estimated in the training process, the model can be applied to make predictions in a process known as inference. In our case, equipped with the MAP estimate $\theta^*$, inference with the HRNN model is achieved as follows: Given a sequence of historical CPI values $X_t^n$ for node $n$, we predict the next CPI value $y_{t+1}^n = g(\theta_n, X_t^n)$, as explained in Section 4.1. This type of prediction is for next month's CPI, namely, horizon $h = 0$. In this work, we also test the ability of the model to perform predictions for further horizons $h \in \{0, ..., 8\}$. The $h$-horizon predictions are obtained in a recursive manner, whereby each predicted value $y_t^n$ is fed back as an input for the prediction of $y_{t+1}^n$. As expected, Section 6 shows that forecasting accuracy gradually degrades as the horizon $h$ increases.
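The recursive multi-horizon procedure can be sketched as follows; the stand-in model used in the usage example is illustrative, not the trained GRU:

```python
import numpy as np

def forecast_horizons(g, theta_n, history, h_max):
    """Recursive multi-horizon forecasting as described above.

    Each prediction is appended to the input window and fed back in,
    producing one forecast per horizon h = 0..h_max.
    """
    window = list(history)
    forecasts = []
    for _ in range(h_max + 1):
        y_next = g(theta_n, window)
        forecasts.append(y_next)
        window.append(y_next)  # feed the prediction back as an input
    return forecasts

# Toy usage with a stand-in "model" that predicts the mean of its inputs:
mean_model = lambda theta, xs: float(np.mean(xs))
print(forecast_horizons(mean_model, None, [0.1, 0.3], h_max=2))
```

Because each step consumes its own previous prediction, any one-step error compounds across horizons, which is consistent with the degradation in accuracy reported for larger h.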
5. Dataset
This work is based on monthly CPI data released by the US Bureau of Labor Statistics (BLS). In what follows, we discuss the dataset's characteristics and our pre-processing procedures. For the sake of reproducibility, the final version of the processed data is available in our HRNN code.
5.1. The US Consumer Price Index
The official CPI of each month is released by the BLS several days into the following month. The price tags are collected in 75 urban areas throughout the US from about 24,000 retail and service establishments. The housing and rent rates are collected from about 50,000 landlords and tenants across the country. The BLS releases two different measurements according to urban demographics:

1. The CPI-U represents the CPI for urban consumers and covers approximately 93% of the total population. The CPI items and their relative weights are derived from their estimated expenditure according to the Consumer Expenditure Survey. These items and their weights are updated each year in January.

2. The CPI-W represents the CPI for urban wage earners and clerical workers and covers about 29% of the population. This index is focused on households with at least 50 percent of income coming from clerical or wage-paying jobs, where at least one of the household's earners must have been employed for at least 70% of the year. The CPI-W indicates changes in the cost of benefits, as well as future contract obligations.

In this work, we focus on the CPI-U, as it is generally considered the best measure for the average cost of living in the US. Monthly CPI-U data per product is generally available from January 1994. Our samples thus span from January 1994 to March 2019. Note that throughout the years, new indexes were added, and some indexes have been omitted. Consequently, hierarchies can change, which contributes to the challenge of our exercise.
5.2. The CPI Hierarchy
The CPI-U is an eight-level-deep hierarchy comprising 424 different nodes (indexes). Level 0 represents the headline CPI, or the aggregated index of all components. An index at any level is associated with a weight between 0 and 100, which represents its contribution to the headline CPI at level 0. Level 1 consists of the 8 main aggregated categories or sectors: (1) "Food and Beverages", (2) "Housing", (3) "Apparel", (4) "Transportation", (5) "Medical Care", (6) "Recreation", (7) "Education and Communication", and (8) "Other Goods and Services". Mid-levels (2-5) consist of more specific aggregations, e.g., "Energy Commodities", "Household Insurance", etc. The lower levels (6-8) consist of fine-grained indexes, e.g., "Apples", "Bacon and Related Products", "Eyeglasses and Eye Care", "Tires", "Airline fares", etc. Tables 7 and 8 (in Appendix A) depict the first three hierarchies of the CPI (levels 0-2).
5.3. Data Preparation
We used publicly available data from the BLS website^2. However, the BLS releases hierarchical data on a monthly basis in separate files. Hence, separate monthly files from January 1994 until March 2019 were processed and aggregated to create a single repository. Moreover, the format of these files has changed over the years (e.g., txt, pdf, and csv formats were all in use), and a significant effort was made in order to parse the changing formats from different time periods.
The hierarchical CPI data is released in terms of monthly index values. We transformed the CPI values to monthly logarithmic change rates as follows: We denote by $x_t$ the CPI value (of any node) at month $t$. The logarithmic change rate at month $t$ is denoted by $rate(t)$ and given by:

$$rate(t) = 100 \times \log \frac{x_t}{x_{t-1}}. \tag{9}$$

Unless otherwise mentioned, the remainder of the paper relates to monthly logarithmic change rates as in Equation (9).
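Equation (9) translates directly into code; a minimal NumPy sketch:

```python
import numpy as np

def log_change_rate(index_values):
    """Monthly logarithmic change rates per Equation (9):
    rate(t) = 100 * log(x_t / x_{t-1}).

    Takes T index values and returns T-1 change rates.
    """
    x = np.asarray(index_values, dtype=float)
    return 100.0 * np.log(x[1:] / x[:-1])

# An index moving 100 -> 101 yields roughly a 1% monthly change rate:
rates = log_change_rate([100.0, 101.0, 100.0])
print(rates)
```

A convenient property of log changes is that they sum across months: a move up and back down nets out to zero, unlike simple percentage changes.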
We split the data into a training dataset and a test dataset as follows: For each time series, we kept the first (early in time) 70% of the measurements for the training dataset. The remaining 30% of the measurements were removed from the training dataset and used to form the test dataset. The training dataset was used to train the HRNN model as well as the other baselines. The test dataset was used for evaluations. The results in Section 6 are based on this split.

Table 1 summarizes the number of data points and general statistics of the CPI time series after applying Equation (9). When comparing the headline CPI with the full

^2 www.bls.gov/cpi
Table 1: Descriptive Statistics

Data set        # Monthly      Mean   STD    Min      Max     # of     Avg. Measurements
                Measurements                                  Indexes  per Index
Headline Only   303            0.18   0.33   -1.93    1.22    1        303
Level 1         6742           0.17   0.96   -18.61   11.32   34       198.29
Level 2         6879           0.12   1.10   -19.60   16.81   46       149.54
Level 3         7885           0.17   1.31   -34.23   16.37   51       121.31
Level 4         7403           0.08   1.97   -35.00   28.17   58       107.89
Level 5         10809          0.01   1.43   -21.04   242.50  92       87.90
Level 6         7752           0.09   1.49   -11.71   16.52   85       86.13
Level 7         4037           0.11   1.53   -11.90   9.45    50       80.74
Level 8         595            0.08   1.56   -5.27    5.02    7        85.00
Full Hierarchy  52405          0.10   1.75   -35.00   242.50  424      123.31

Notes: General statistics of the headline CPI and CPI-U for each level in the hierarchy and the full hierarchy of indexes.
hierarchy, we see that at lower levels the standard deviation (STD) is significantly higher and the dynamic range is larger, implying much more volatility. The average number of measurements per index decreases at the lower levels of the hierarchy, as not all indexes are available for the entire period.
Figure 5 depicts box plots of the CPI change-rate distributions at different levels. The boxes depict the median value and the upper 75th and lower 25th percentiles. The whiskers indicate the overall minimum and maximum rates. Figure 5 further emphasizes that the change rates become more volatile as we go down the CPI hierarchy. A high dynamic range, a high standard deviation, and less training data are all indicators of the difficulty of making predictions inside the hierarchy. Based on this information, we can expect that predicting the disaggregated components inside the hierarchy will be more difficult than predicting the headline.

Finally, Figure 6 depicts box plots of the CPI change-rate distribution for different sectors. We notice that some sectors (e.g., apparel and energy) suffer from higher volatility than others. As expected, predictions for these sectors will be more difficult.
6. Evaluation and Results

We evaluate HRNN and compare it with well-known baselines for inflation prediction as well as some alternative machine learning approaches. We use the following notation: let $x_t$ be the CPI log-change rate at month $t$. We consider models for $\hat{x}_t$, an estimate for $x_t$ based on historical values. Additionally, we denote by $\varepsilon_t$ the estimation error at time $t$. In all cases, the $h$-horizon forecasts were generated by recursively iterating the one-step forecasts forward. Hyperparameters were set through a 10-fold cross-validation procedure.
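The recursive multi-horizon scheme described above can be sketched as follows; `one_step_model` is a placeholder for any of the one-step forecasters considered in this section (here a toy RW(4)-style average, purely for illustration):

```python
import numpy as np

def recursive_forecast(history, one_step_model, horizon):
    """Produce forecasts for horizons 0..horizon by iterating a one-step
    model forward, feeding each prediction back in as the newest value."""
    window = list(history)
    preds = []
    for _ in range(horizon + 1):
        x_hat = one_step_model(window)
        preds.append(x_hat)
        window.append(x_hat)   # the forecast becomes part of the history
    return preds

# Toy one-step model: average of the last 4 observations (RW(4)-style).
rw4 = lambda w: float(np.mean(w[-4:]))
preds = recursive_forecast([0.2, 0.1, 0.3, 0.2], rw4, horizon=2)
```

The same loop applies unchanged to any of the models below, since each produces a single one-step-ahead estimate.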
6.1. Baseline Models

We compare HRNN with the following CPI prediction baselines:

1. Autoregression (AR) - The AR($\rho$) estimates $\hat{x}_t$ based on the previous $\rho$ months as follows: $\hat{x}_t = \alpha_0 + \sum_{i=1}^{\rho} \alpha_i x_{t-i} + \varepsilon_t$, where $\{\alpha_i\}_{i=0}^{\rho}$ are the model's parameters.
Figure 5. Box plots of monthly inflation rate per hierarchy level. [Figure: box plots for hierarchy levels 0-8; y-axis: monthly rate.]

Figure 6. Box plots of monthly inflation rate per sector. [Figure: box plots for the food and beverages, transport, housing, apparel, services, energy, medical care, and recreation sectors; y-axis: monthly rate.]
2. Phillips Curve (PC) - A PC($\rho$) is an extension of AR($\rho$) that considers the unemployment rate $u_t$ at month $t$ in the CPI forecasting model, as follows: $\hat{x}_t = \alpha_0 + \sum_{i=1}^{\rho} \alpha_i x_{t-i} + \beta u_{t-1} + \varepsilon_t$, where $\{\alpha_i\}_{i=0}^{\rho}$ and $\beta$ are the model's parameters.
3. Vector Autoregression (VAR) - The VAR($\rho$) model is a multivariate generalization of AR($\rho$). It is frequently used to model two or more time series together. VAR($\rho$) estimates next month's values of $k$ time series based on their historical values from the previous $\rho$ months as follows: $\hat{X}_t = A_0 + \sum_{i=1}^{\rho} A_i X_{t-i} + \epsilon_t$, where $X_t$ are the last $\rho$ values from the $k$ different time series at month $t$, $\hat{X}_t$ are the model's estimates of these values, $\{A_i\}_{i=0}^{\rho}$ are $(k \times k)$ matrices of parameters, and $\epsilon_t$ is a vector of error terms.
4. Random Walk (RW) - We consider the RW($\rho$) model of Atkeson and Ohanian (2001). RW($\rho$) is a simple, yet effective, model that predicts next month's CPI as an average of the last $\rho$ months: $\hat{x}_t = \frac{1}{\rho}\sum_{i=1}^{\rho} x_{t-i} + \varepsilon_t$.
5. Auto Regression in Gap (AR-GAP) - The AR-GAP model subtracts a fixed inflation trend before predicting the inflation gap (Faust and Wright, 2013). The inflation gap is defined as $g_t = x_t - \tau_t$, where $\tau_t$ is the inflation trend at time $t$, which represents a slowly-varying local mean. This trend value is estimated using RW($\rho$) as follows: $\tau_t = \frac{1}{\rho}\sum_{i=1}^{\rho} x_{t-i}$. By accounting for the local inflation trend $\tau_t$, the model attempts to increase stationarity in $g_t$ and estimates it by $\hat{g}_t = \alpha_0 + \sum_{i=1}^{\rho} \alpha_i g_{t-i} + \varepsilon_t$, where $\{\alpha_i\}_{i=0}^{\rho}$ are the model's parameters. Finally, $\tau_t$ is added back to $\hat{g}_t$ to obtain the final inflation prediction: $\hat{x}_t = \hat{g}_t + \tau_t$.
6. Logistic Smooth Transition Auto Regressive Model (LSTAR) - The LSTAR is an extension of AR that allows for changes in the model parameters according to a transition variable $F(t; \gamma, c)$. LSTAR($\rho, c, \gamma$) consists of two AR($\rho$) components that describe two trends in the data (high and low), and a nonlinear transition function that links them as follows:

$$\hat{x}_t = \left(\alpha_0 + \sum_{i=1}^{\rho} \alpha_i x_{t-i}\right)\left(1 - F(t; \gamma, c)\right) + \left(\beta_0 + \sum_{i=1}^{\rho} \beta_i x_{t-i}\right) F(t; \gamma, c) + \varepsilon_t, \qquad (10)$$

where $F(t; \gamma, c) = \frac{1}{1 + e^{-\gamma(t - c)}}$ is a first-order logistic transition function that depends on the location parameter $c$ and a smoothing parameter $\gamma$. The location parameter $c$ can be interpreted as the threshold between the two AR($\rho$) regimes, in the sense that the logistic function changes monotonically from 0 to 1 as $t$ increases and balances symmetrically at $t = c$ (van Dijk et al., 2002). The model's parameters are $\{\alpha_i\}_{i=0}^{\rho}$ and $\{\beta_i\}_{i=0}^{\rho}$, while $\gamma$ and $c$ are hyperparameters.
7. Random Forests (RF) - The RF($\rho$) model is an ensemble learning method which builds a set of decision trees (Song and Ying, 2015) in order to mitigate overfitting and improve generalization (Breiman, 2001). At prediction time, the average prediction of the individual trees is returned. The inputs to the RF($\rho$) model are the last $\rho$ samples and the output is the predicted value for the next month.
8. Gradient Boosted Trees (GBT) - The GBT($\rho$) model (Friedman, 2002) is based on an ensemble of decision trees which are trained in a stage-wise fashion similar to other boosting models (Schapire, 1999). Unlike RF($\rho$), which averages the predictions of several decision trees, GBT($\rho$) trains each tree to minimize the remaining residual error of all previous trees. At prediction time, the sum of the predictions of all the trees is returned. The inputs to the GBT($\rho$) model are the last $\rho$ samples and the output is the predicted value for the next month.
9. Fully Connected Neural Network (FC) - The FC($\rho$) model is a fully connected neural network with one hidden layer and a ReLU activation (Ramachandran et al., 2017). The output layer employs no activation, formulating a regression problem with a squared-loss optimization. The inputs to the FC($\rho$) model are the last $\rho$ samples and the output is the predicted value for the next month.
10. Deep Neural Network (DeepNN) - The DeepNN($\rho$) model is a deep neural network consisting of 10 layers with 100 neurons each, as in Olson et al. (2018), which was shown to perform well for inflation prediction (Goulet Coulombe, 2020). We used the original setup of Olson et al. (2018) and tuned its hyperparameters as follows: the learning rate was set to $lr = 0.005$, training lasted 50 epochs (instead of 200), and the ELU activation functions (Clevert et al., 2016) were replaced by ReLU activation functions. These changes yielded more accurate predictions, hence we decided to include them in all our evaluations. The inputs to the DeepNN($\rho$) model are the last $\rho$ samples and the output is the predicted value for the next month.
11. Deep Neural Network with Unemployment (DeepNN + Unemployment) - Similar to PC($\rho$), which extends AR($\rho$) by including unemployment data, the DeepNN($\rho$) + Unemployment model extends DeepNN($\rho$) by including the last $\rho$ samples of the unemployment rate $u_t$. In terms of hyperparameters, we used identical values as in DeepNN($\rho$).
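As a concrete reference, the simpler univariate baselines above (AR, RW, and AR-GAP) can be sketched with ordinary least squares in numpy. This is an illustrative re-implementation under our own design choices, not the exact code used in the evaluations:

```python
import numpy as np

def fit_ar(series, rho):
    """Least-squares AR(rho): x_t = a0 + sum_i a_i * x_{t-i} + noise."""
    series = np.asarray(series, dtype=float)
    y = series[rho:]
    X = np.column_stack([np.ones(len(y))] +
                        [series[rho - i:len(series) - i] for i in range(1, rho + 1)])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs                       # [a0, a1, ..., a_rho]

def predict_ar(coefs, recent):
    """One-step AR forecast; recent[-1] is the latest observation x_{t-1}."""
    rho = len(coefs) - 1
    lags = np.asarray(recent, dtype=float)[-rho:][::-1]   # x_{t-1}, x_{t-2}, ...
    return float(coefs[0] + np.dot(coefs[1:], lags))

def predict_rw(recent, rho):
    """RW(rho): the average of the last rho observations."""
    return float(np.mean(np.asarray(recent, dtype=float)[-rho:]))

def predict_ar_gap(series, rho):
    """AR-GAP: remove the rolling RW(rho) trend, fit AR on the gap,
    then add the current trend estimate back."""
    series = np.asarray(series, dtype=float)
    trend = np.array([series[t - rho:t].mean() for t in range(rho, len(series))])
    gap = series[rho:] - trend
    coefs = fit_ar(gap, rho)
    return predict_ar(coefs, gap) + series[-rho:].mean()
```

Each function consumes only the history of one series, matching the endogenous setup of the paper; the tree and neural baselines additionally need a library such as scikit-learn and are omitted here.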
6.2. Ablation Models

In order to demonstrate the contribution of the hierarchical component of the HRNN model, we conducted an ablation study that considered "simpler" alternatives to HRNN based on GRUs without the hierarchical component:

1. Single GRU (SGRU) - The SGRU($\rho$) is a single GRU unit that receives the last $\rho$ values as inputs in order to predict the next value. In SGRU($\rho$), a single GRU is used for all the time series that comprise the CPI hierarchy. This baseline utilizes all the benefits of a GRU but assumes that the different components of the CPI behave similarly and a single unit is sufficient to model all the nodes.
2. Independent GRUs (IGRUs) - In IGRUs($\rho$), we trained a different GRU($\rho$) unit for each CPI node. The SGRU and IGRU approaches represent two extremes: the first attempts to model all the CPI nodes with a single model, while the second treats each node separately. IGRUs($\rho$) is equivalent to a variant of HRNN that ignores the hierarchy by setting the precision parameter $\tau_{\theta_n} = 0 \;\; \forall n \in \mathcal{I}$. Namely, this is a simple variant of HRNN that trains independent GRUs, one for each index in the hierarchy.
3. K-Nearest Neighbors GRU (KNN-GRU) - In order to demonstrate the contribution of the hierarchical structure of HRNN, we devised the KNN-GRU($\rho$) baseline. KNN-GRU attempts to utilize information from multiple Pearson-correlated CPI nodes without employing the hierarchical informative priors. Hence, KNN-GRU presents a "simpler" alternative to HRNN that replaces the hierarchical structure with elementary vector GRUs as follows: first, the $k$ nearest neighbors of each CPI node were found using the Pearson correlation measure. Then, separate vector GRU($\rho$) units were trained for each CPI aggregate along its $k$ most similar nodes, using the last $\rho$ values of node $n$ and its $k$ nearest nodes. By doing so, the KNN-GRU($\rho$) baseline was able to utilize both the benefits of GRU units together with relevant information that comes from correlated nodes.
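The neighbor-selection step of the KNN-GRU baseline can be sketched as follows. We rank candidates by plain (signed) Pearson correlation, which is one reasonable reading of the text; the paper does not specify whether absolute correlations were used:

```python
import numpy as np

def k_nearest_by_pearson(target, candidates, k):
    """Indices of the k candidate series with the highest Pearson
    correlation to the target series."""
    target = np.asarray(target, dtype=float)
    corrs = np.array([np.corrcoef(target, np.asarray(c, dtype=float))[0, 1]
                      for c in candidates])
    return sorted(np.argsort(corrs)[-k:].tolist())

target = [1.0, 2.0, 3.0, 4.0]
candidates = [[1.0, 2.0, 3.0, 4.0],    # perfectly correlated
              [4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated
              [1.0, 2.0, 2.0, 4.0]]    # strongly correlated
neighbors = k_nearest_by_pearson(target, candidates, k=2)
```

The target series and its selected neighbors would then be stacked into a (k+1)-dimensional input sequence for the vector GRU.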
6.3. Evaluation Metrics

Following Faust and Wright (2013) and Aparicio and Bertolotto (2020), we report results in terms of three evaluation metrics:

1. Root Mean Squared Error (RMSE) - The RMSE is given by:

$$RMSE = \sqrt{\frac{1}{T}\sum_{t=1}^{T} (x_t - \hat{x}_t)^2}, \qquad (11)$$

where $x_t$ is the monthly change rate for month $t$, and $\hat{x}_t$ is the corresponding prediction.
2. Pearson Correlation Coefficient - The Pearson correlation coefficient $\phi$ is given by:

$$\phi = \frac{COV(X_T, \hat{X}_T)}{\sigma_{X_T} \times \sigma_{\hat{X}_T}}, \qquad (12)$$

where $COV(X_T, \hat{X}_T)$ is the covariance between the series of actual values and the predictions, and $\sigma_{X_T}$ and $\sigma_{\hat{X}_T}$ are the standard deviations of the actual values and the predictions, respectively.
3. Distance Correlation Coefficient - In contrast to the Pearson correlation measure, which detects linear associations between two random variables, the distance correlation measure can also detect nonlinear correlations (Székely et al., 2007; Zhou, 2012). The distance correlation coefficient $r_d$ is given by:

$$r_d = \frac{dCov(X_T, \hat{X}_T)}{\sqrt{dVar(X_T) \times dVar(\hat{X}_T)}}, \qquad (13)$$

where $dCov(X_T, \hat{X}_T)$ is the distance covariance between the series of actual values and the predictions, and $dVar(X_T)$ and $dVar(\hat{X}_T)$ are the distance variances of the actual values and the predictions, respectively.
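All three metrics can be computed directly with numpy; the distance-correlation sketch below follows the standard double-centering construction of Székely et al. (2007) for univariate series:

```python
import numpy as np

def rmse(actual, pred):
    """Equation (11)."""
    a, p = np.asarray(actual, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

def pearson(actual, pred):
    """Equation (12)."""
    return float(np.corrcoef(actual, pred)[0, 1])

def distance_correlation(actual, pred):
    """Equation (13), via double-centered pairwise distance matrices."""
    def centered(v):
        v = np.asarray(v, float)
        d = np.abs(v[:, None] - v[None, :])            # pairwise |v_j - v_k|
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    A, B = centered(actual), centered(pred)
    dcov2 = (A * B).mean()                             # squared distance covariance
    dvar2_a, dvar2_b = (A * A).mean(), (B * B).mean()  # squared distance variances
    return float(np.sqrt(dcov2 / np.sqrt(dvar2_a * dvar2_b)))
```

A quick way to see the difference between the last two metrics: for y = x^2 on a symmetric grid, the Pearson coefficient is zero while the distance correlation is strictly positive.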
6.4. Results

The HRNN model is unique in its ability to utilize information from higher levels in the CPI hierarchy in order to make predictions at lower levels. Therefore, we provide results for each level of the CPI hierarchy: overall, 424 disaggregated indexes belonging to 8 different hierarchy levels. For the sake of completeness, we also provide results for the headline CPI index by itself. It is important to note that in this case, the HRNN model
Table 2: Average Results on Disaggregated CPI Components

Model                          RMSE per horizon (AR(1)=1.00)         Correlation (at horizon=0)
Name                            0     1     2     3     4     8      Pearson   Distance
AR(1)                          1.00  1.00  1.00  1.00  1.00  1.00     0.06      0.05
AR(2)                          1.00  1.00  1.00  1.00  1.00  1.00     0.08      0.06
AR(3)                          1.00  1.00  1.00  1.00  1.00  1.00     0.08      0.06
AR(4)                          1.00  1.00  1.00  1.00  1.00  1.00     0.09      0.07
AR-GAP(3)                      1.00  1.00  1.00  1.00  1.00  1.00     0.08      0.06
AR-GAP(4)                      1.00  1.00  1.00  1.00  1.00  1.00     0.09      0.07
RW(4)                          1.00  1.00  1.00  1.00  1.00  1.00     0.05      0.04
Phillips(4)                    1.00  1.00  1.00  1.00  0.98  1.00     0.06      0.04
VAR(1)                         1.03  1.03  1.04  1.03  1.04  1.05     0.04      0.03
VAR(2)                         1.03  1.03  1.04  1.03  1.04  1.05     0.06      0.03
VAR(3)                         1.03  1.03  1.03  1.03  1.04  1.05     0.06      0.03
VAR(4)                         1.02  1.03  1.03  1.03  1.03  1.04     0.07      0.04
LSTAR(ρ=4, c=2, γ=0.3)         1.04  1.07  1.07  1.07  1.08  1.10     0.09      0.07
GBT(4)                         0.83  0.83  0.83  0.84  0.84  0.86     0.18      0.27
RF(4)                          0.84  0.85  0.86  0.86  0.86  0.87     0.19      0.29
FC(4)                          1.03  1.03  1.04  1.04  1.04  1.05     0.12      0.09
DeepNN(4)                      0.90  0.90  0.90  0.90  0.91  0.91     0.13      0.22
DeepNN(4) + Unemployment       0.85  0.85  0.85  0.85  0.85  0.86     0.12      0.22
SGRU(4)                        1.02  1.06  1.06  1.07  1.04  1.12     0.10      0.08
IGRU(4)                        0.83  0.84  0.85  0.85  0.86  0.89     0.17      0.13
KNN-GRU(1)                     0.91  0.93  0.96  0.97  0.96  0.96     0.19      0.15
KNN-GRU(2)                     0.90  0.93  0.95  0.97  0.96  0.96     0.20      0.15
KNN-GRU(3)                     0.89  0.92  0.95  0.96  0.96  0.95     0.20      0.15
KNN-GRU(4)                     0.89  0.91  0.95  0.95  0.95  0.95     0.20      0.15
HRNN(1)                        0.79  0.79  0.81  0.81  0.81  0.83     0.23      0.28
HRNN(2)                        0.78  0.79  0.81  0.81  0.80  0.82     0.22      0.29
HRNN(3)                        0.79  0.78  0.80  0.81  0.81  0.81     0.23      0.30
HRNN(4)                        0.78  0.78  0.79  0.79  0.79  0.80     0.24      0.29

Notes: Average results across all 424 inflation indexes that make up the headline CPI. The RMSE results are relative to the AR(1) model and normalized according to its results, i.e., $RMSE_{Model}/RMSE_{AR(1)}$. Results are statistically significant according to the Diebold-Mariano test with $p < 0.02$.
cannot utilize its hierarchical mechanism and has no advantage over the alternatives, so
we do not expect it to outperform.
Table 2 depicts the average results over all the disaggregated indexes in the CPI hierarchy. We present prediction results for horizons of 0, 1, 2, 3, 4, and 8 months. The results are relative to the AR(1) model and normalized according to $RMSE_{Model}/RMSE_{AR(1)}$. In HRNN we set $\alpha = 1.5$, and the vector-GRU baselines (KNN-GRU($\rho$)) were based on $k = 5$ nearest neighbors.
Table 2 shows that the different versions of the HRNN model repeatedly outperform the alternatives at every horizon. Notably, HRNN is superior to IGRU, which emphasizes the importance of using hierarchical information and the superiority of HRNN over regular GRUs. Additionally, HRNN is also superior to the different KNN-GRU models, which emphasizes the specific way HRNN employs informative priors based on the CPI hierarchy. These results are statistically significant according to Diebold and Mariano (1995) pairwise tests for a squared loss-differential, with p-values below 0.02. Additionally, we performed a Model Confidence Set (MCS) test (Hansen et al.,
Table 3: CPI Headline Only

Model                          RMSE per horizon (AR(1)=1.00)         Correlation (at horizon=0)
Name                            0     1     2     3     4     8      Pearson   Distance
AR(1)                          1.00  1.00  1.00  1.00  1.00  1.00     0.29      0.22
AR(2)                          1.00  0.97  0.99  1.01  1.00  0.98     0.32      0.24
AR(3)                          1.00  0.98  0.98  1.00  0.96  0.97     0.33      0.25
AR(4)                          1.00  0.95  0.95  0.96  0.93  0.96     0.33      0.25
AR-GAP(3)                      1.00  0.98  0.98  1.00  0.96  0.97     0.33      0.25
AR-GAP(4)                      0.99  0.95  0.95  0.96  0.93  0.96     0.33      0.25
RW(4)                          1.05  0.98  0.99  1.01  0.97  0.96     0.23      0.20
Phillips(4)                    0.93  0.94  0.95  0.95  0.93  0.95     0.33      0.25
LSTAR(ρ=4, c=2, γ=0.3)         0.98  0.95  0.95  0.97  0.95  0.95     0.32      0.24
RF(4)                          1.05  1.06  1.03  1.07  1.04  1.03     0.27      0.28
GBT(4)                         0.97  0.99  0.93  0.95  0.93  0.93     0.25      0.35
FC(4)                          0.92  0.94  0.94  0.96  0.93  0.94     0.33      0.25
DeepNN(4)                      0.94  0.97  0.96  0.98  0.94  0.92     0.31      0.32
DeepNN(4) + Unemployment       1.00  0.97  0.92  0.94  0.92  0.91     0.37      0.32
HRNN(4) / GRU(4)               1.00  0.97  0.99  0.99  0.96  0.99     0.35      0.37

Notes: Prediction results for the CPI headline index alone. The RMSE results are relative to the AR(1) model and normalized according to its results, i.e., $RMSE_{Model}/RMSE_{AR(1)}$.
2011) for the leading models: RF(4), DeepNN(4), DeepNN(4) + Unemployment, GBT(4), IGRU(4), HRNN(1), HRNN(2), HRNN(3), and HRNN(4). The MCS procedure removed all the baselines and left only the four HRNN variants, with HRNN(4) as the leading model ($p_{HRNN(4)} = 1.00$).
For the sake of completeness, we also provide results for predictions of the headline CPI index. Table 3 summarizes these results. When considering only the headline, the hierarchical mechanism of HRNN is redundant and the model is identical to a single GRU unit. In this case, we do not observe much advantage in employing the HRNN model. In contrast, we see an advantage for the other deep learning models, such as FC(4) and DeepNN(4) + Unemployment, which outperform the more "traditional" approaches.
Table 4 depicts the results of HRNN(4), the best model, across all hierarchy levels (1-8, excluding the headline). Additionally, we included the results of the best ablation model, IGRU(4), for comparison. Results are averaged over all disaggregated components and normalized by the AR(1) model RMSE as before. As evident from Table 4, the HRNN model shows the best relative performance at the lower levels of the hierarchy, where the CPI indexes are more volatile and the hierarchical priors are most effective.
Table 5 compares the results of HRNN(4) across different sectors. Again, we included the results of the IGRU(4) model for comparison. The results are averaged over all disaggregated components and presented as normalized gains with respect to the AR(1) model as before. The best relative improvement of the HRNN(4) model appears to be in the Food and Beverages group. This can be explained by the fact that the Food and Beverages sub-hierarchy is the deepest and most elaborate hierarchy in the CPI tree. When the hierarchy is deeper and more elaborate, HRNN's advantages are emphasized.
Table 4: HRNN(4) vs. IGRU(4) at different levels of the CPI hierarchy with respect to AR(1)

                 HRNN(4)                                |  IGRU(4)
Hierarchy   RMSE per horizon       Correlation          |  RMSE per horizon       Correlation
Level       (AR(1)=1.00)           (at horizon=0)       |  (AR(1)=1.00)           (at horizon=0)
             0     2     4     8   Pearson  Distance    |   0     2     4     8   Pearson  Distance
Level 1     0.95  0.97  0.99  1.00  0.33     0.37       |  0.98  0.98  0.99  0.97  0.25     0.38
Level 2     0.91  0.90  0.91  0.91  0.30     0.35       |  0.90  0.92  0.94  0.93  0.26     0.34
Level 3     0.79  0.79  0.80  0.81  0.21     0.31       |  0.82  0.89  0.94  0.94  0.23     0.37
Level 4     0.77  0.77  0.76  0.77  0.26     0.32       |  0.84  0.87  0.90  0.92  0.20     0.33
Level 5     0.79  0.77  0.77  0.80  0.21     0.31       |  0.85  0.89  0.89  0.93  0.22     0.29
Level 6     0.75  0.76  0.81  0.81  0.19     0.23       |  0.85  0.89  0.90  0.92  0.21     0.21
Level 7     0.75  0.78  0.77  0.80  0.17     0.17       |  0.87  0.89  0.92  0.94  0.18     0.15
Level 8     0.72  0.78  0.77  0.78  0.10     0.23       |  0.89  0.90  0.92  0.94  0.10     0.12

Notes: The RMSE results are relative to the AR(1) model and normalized according to its results, i.e., $RMSE_{Model}/RMSE_{AR(1)}$.
Table 5: HRNN(4) vs. IGRU(4) results for different CPI sectors with respect to AR(1)

                     HRNN(4)                                |  IGRU(4)
Industry        RMSE per horizon       Correlation          |  RMSE per horizon       Correlation
Sector          (AR(1)=1.00)           (at horizon=0)       |  (AR(1)=1.00)           (at horizon=0)
                 0     2     4     8   Pearson  Distance    |   0     2     4     8   Pearson  Distance
Apparel         0.83  0.87  0.84  0.88  0.04     0.19       |  0.88  0.88  0.85  0.92  0.05     0.23
Energy          0.94  0.96  0.99  0.98  0.34     0.32       |  0.94  0.98  1.02  0.99  0.18     0.28
Food & bev.     0.72  0.73  0.75  0.76  0.22     0.13       |  0.80  0.80  0.81  0.82  0.18     0.22
Housing         0.79  0.80  0.82  0.82  0.17     0.24       |  0.77  0.79  0.82  0.82  0.18     0.27
Medical care    0.79  0.82  0.81  0.82  0.03     0.17       |  0.79  0.83  0.83  0.84  0.08     0.15
Recreation      0.99  0.99  1.00  1.00  0.05     0.17       |  1.00  0.99  1.00  1.00  0.07     0.17
Services        0.90  0.92  0.95  0.94  0.04     0.15       |  0.89  0.94  0.95  0.96  0.02     0.21
Transportation  0.83  0.84  0.85  0.85  0.27     0.28       |  0.82  0.85  0.86  0.88  0.26     0.36

Notes: The RMSE results are relative to the AR(1) model and normalized according to its results, i.e., $RMSE_{Model}/RMSE_{AR(1)}$.
Finally, Figure 7 depicts specific examples of three disaggregated indexes: Tomatoes, Bread, and Information Technology. The solid red line presents the actual CPI values. The dashed green line presents the HRNN(4) predictions, while the dotted blue line presents the IGRU(4) predictions. These indexes are located at the bottom of the CPI hierarchy and suffer from relatively high volatility. The HRNN(4) model seems to track and predict the trends of the real index accurately, and it often performs better than IGRU(4). As can be seen, IGRU's predictions appear to be more "conservative" than HRNN's. At first, this may appear counterintuitive, as HRNN has more regularization than IGRU. However, this additional regularization is informative regularization coming from the parameters of the upper levels in the CPI hierarchy, which allows the HRNN model to be more expressive without overfitting. In contrast, in order to ensure that IGRU does not overfit the training data, its other regularization techniques, such as the learning-rate hyperparameter and the early-stopping procedure, prevent the IGRU model from becoming overconfident. Figure 9 and Figure 10 in Appendix A depict
additional examples for a large variety of disaggregated CPI components.
Figure 7. Examples of HRNN(4) predictions for disaggregated indexes.
6.5. HRNN Dynamics

In what follows, we take a closer look at several characteristics of the HRNN model that result from the non-stationary nature of the CPI. The HRNN model is a deep-learning hierarchical model that requires substantial training time, depending on the available hardware. In this work, the HRNN model was trained once using the training dataset and evaluated on the test dataset, as explained earlier. In order to investigate the potential benefit from retraining HRNN every quarter, we performed the following experiment: for a test-set period from 2001 to 2018, we retrained HRNN(4) after each quarter, each time adding the hierarchical CPI values of the last 3 months. Figure 8 presents the results of this experiment. The dashed green line presents the RMSE
of HRNN(4) with the “regular” training used in this work, while the dotted blue line
presents the results of retraining HRNN every quarter. As expected, in most cases,
retraining the model with additional data from the recent period improves the results.
However, this improvement is moderate and the overall model quality is about the
same.
Figure 8. The Eﬀect of Quarterly Retraining HRNN(4).
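The quarterly-retraining experiment above can be sketched as a walk-forward loop. `fit` and `predict` below are hypothetical placeholders standing in for HRNN training and inference (here a trivial historical-mean model, purely for illustration):

```python
def quarterly_retrain_eval(series, first_train_end, fit, predict):
    """Walk forward through the test period, refitting every 3 months on
    all data observed so far; returns the one-step-ahead errors."""
    model = fit(series[:first_train_end])
    errors = []
    for t in range(first_train_end, len(series)):
        months_in = t - first_train_end
        if months_in > 0 and months_in % 3 == 0:
            model = fit(series[:t])        # add the last quarter's data
        errors.append(series[t] - predict(model, series[:t]))
    return errors

# Trivial stand-in model: predict the historical mean.
fit = lambda history: sum(history) / len(history)
predict = lambda model, history: model
errs = quarterly_retrain_eval(list(range(10)), first_train_end=6,
                              fit=fit, predict=predict)
```

Substituting a costlier `fit` makes the trade-off in the text concrete: refits happen only four times a year, yet each one folds the most recent quarter into the training window.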
In order to study the effect of the Global Financial Crisis (GFC) on HRNN's performance, we removed the data from 2008 onward and repeated the experiment of Table 2, using only the data from 1997 up to 2008. The results of this experiment are summarized in Table 6. In terms of RMSE, the gains of HRNN in Table 2 vary from 0.78 up to 0.80, in contrast to Table 6, where the gains vary from 0.83 to 0.93, revealing that during the turmoil of the GFC, when the demand for reliable and precise forecasting tools is enhanced, HRNN's forecasting abilities remain robust. In fact, its forecasting superiority was somewhat enhanced during the GFC period when compared to the AR(1) baseline.
7. Concluding Remarks

Policymakers have a wide range of predictive tools at their disposal to forecast headline inflation: survey data, expert forecasts, inflation swaps, economic and econometric models, etc. However, policy institutions lack models and data to assist with forecasting the CPI components, which is essential for a deeper understanding of the underlying dynamics. An understanding of disaggregated inflation trends can provide insight into the nature of future inflation pressures, their transitory factors (seasonal factors, energy, etc.), and other factors that influence market makers and the conduct of monetary policy, among other decision-makers. Hence, our hierarchical approach uses endogenous historical data to forecast the CPI at the disaggregated level, rather than forecasting headline inflation, even if the latter performs well (Ibarra, 2012).
The business cycle plays an important role in inflation dynamics, particularly through specific product classes. CPI inflation dynamics are sometimes driven by components unrelated to central bank policy objectives, such as food and energy prices. A disaggregated CPI forecast provides a more accurate picture of the
Table 6: Average Results on Disaggregated CPI Components Prior to the GFC

Model                          RMSE per horizon (AR(1)=1.00)         Correlation (at horizon=0)
Name                            0     1     2     3     4     8      Pearson   Distance
AR(1)                          1.00  1.00  1.00  1.00  1.00  1.00     0.07      0.05
AR(2)                          1.00  1.00  1.00  1.00  1.00  1.00     0.08      0.06
AR(3)                          1.00  1.00  1.00  1.00  1.00  1.00     0.09      0.07
AR(4)                          1.00  1.00  1.00  1.00  1.00  1.00     0.09      0.07
AR-GAP(3)                      1.00  1.00  1.00  1.00  1.00  1.00     0.09      0.07
AR-GAP(4)                      1.00  1.00  1.00  1.00  1.00  1.00     0.10      0.07
RW(4)                          1.00  1.00  1.00  1.00  1.00  1.00     0.05      0.04
Phillips(4)                    1.00  1.00  0.99  0.99  1.00  1.00     0.05      0.03
VAR(1)                         1.04  1.04  1.04  1.05  1.05  1.06     0.04      0.03
VAR(2)                         1.03  1.04  1.04  1.04  1.05  1.05     0.05      0.03
VAR(3)                         1.03  1.03  1.03  1.04  1.04  1.05     0.06      0.03
VAR(4)                         1.02  1.03  1.03  1.03  1.03  1.04     0.06      0.04
LSTAR(ρ=4, c=2, γ=0.3)         1.05  1.06  1.05  1.08  1.09  1.10     0.08      0.06
RF(4)                          0.92  0.91  0.91  0.92  0.92  0.95     0.20      0.29
GBT(4)                         0.91  0.92  0.91  0.93  0.92  0.97     0.18      0.34
FC(4)                          0.99  0.99  1.00  1.00  1.02  1.05     0.11      0.08
DeepNN(4)                      0.94  0.95  0.94  0.94  0.94  0.95     0.15      0.32
DeepNN(4) + Unemployment       0.92  0.92  0.94  0.95  0.93  0.95     0.20      0.35
SGRU(4)                        1.05  1.09  1.09  1.10  1.09  1.10     0.09      0.07
IGRU(4)                        0.86  0.90  0.90  0.92  0.93  0.94     0.33      0.35
KNN-GRU(1)                     0.94  0.96  0.96  0.96  0.97  0.98     0.10      0.07
KNN-GRU(2)                     0.94  0.96  0.95  0.96  0.97  0.98     0.11      0.08
KNN-GRU(3)                     0.93  0.96  0.95  0.96  0.96  0.98     0.11      0.08
KNN-GRU(4)                     0.93  0.96  0.96  0.95  0.96  0.97     0.12      0.09
HRNN(1)                        0.85  0.89  0.90  0.92  0.91  0.94     0.23      0.27
HRNN(2)                        0.84  0.89  0.90  0.92  0.91  0.94     0.24      0.25
HRNN(3)                        0.84  0.89  0.89  0.92  0.91  0.93     0.28      0.34
HRNN(4)                        0.83  0.88  0.88  0.91  0.90  0.93     0.35      0.37

Notes: Average results across all 424 inflation indexes that make up the headline CPI. In contrast to Table 2, here we focus on results up to the GFC of 2008. The RMSE results are relative to the AR(1) model and normalized according to its results, i.e., $RMSE_{Model}/RMSE_{AR(1)}$. Results are statistically significant according to the Diebold-Mariano test with $p < 0.05$.
sources and features of future inflation pressures in the economy, which in turn improves the efficiency of policymakers' responses. Indeed, forecasting sectoral inflation may improve the optimization problem faced by the central bank (Ida, 2020).

While similar headline inflation forecasts may correspond to various underlying economic factors, a disaggregated perspective allows understanding and analyzing the decomposition of these inflation forecasts at the sectoral or component level. Instead of disaggregating inflation to forecast the headline inflation (Stock and Watson, 2020), our approach allows policy and market makers to forecast specific sector and component prices, where information is less available: almost no component- or sector-specific survey forecasts, expert forecasts, or market-based forecasts exist. For instance, a central bank could use such modeling features to consider components that contribute to inflation (military, food, cigarettes, and energy) but are unrelated to its primary inflation objectives, improving the final assessment of its inflation forecasts. Sector-specific inflation forecasts should also inform economic policy recommendations at the sectoral
level, and market makers can better direct and tune their investment strategies (Swinkels,
2018).
In traditional approaches to inflation forecasting, a theoretical or a linear model is often used, which inevitably biases the estimated forecasts. Our novel approach may overcome the usual shortcomings of traditional forecasts, giving policymakers new insights from a "different angle". Disaggregated forecasts include explanatory variables with hierarchies that reduce measurement errors at the component level. Additionally, our model structure attenuates component-specific residuals derived from each level and sector, resulting in improved forecasting. For all these reasons, we believe that HRNN can become a valuable tool for asset managers, policy institutions, and market makers lacking the component-specific price forecasts critical to their decision processes.
The HRNN model was designed for predicting disaggregated CPI components; however, we believe its merits may prove useful in the prediction of other hierarchical time series, such as GDP. In future work, we plan to investigate the performance of the HRNN model on additional hierarchical time series. Moreover, in this paper we focused mainly on endogenous models that do not consider other economic variables. HRNN can naturally be extended to include different variables as side information by changing the input of the GRU components to a multi-dimensional time series (instead of a 1-dimensional vector). In future work, we plan to experiment with additional side information that can potentially improve the prediction accuracy. In particular, we plan to experiment with online price data, as in Aparicio and Bertolotto (2020). Finally, we also plan to try to replace the RNNs in the model with neural self-attention (Shaw et al., 2018). Hopefully, this should lead to improved accuracy and better explainability through the analysis of attention scores (Hsieh et al., 2021).
References
Almosova, A., Andresen, N., 2019. Nonlinear inﬂation forecasting with recurrent neural networks.
Technical Report. European Central Bank (ECB).
Aparicio, D., Bertolotto, M.I., 2020. Forecasting inﬂation with online prices. International Journal of
Forecasting 36, 232–247.
Athey, Susan, 2018. The impact of machine learning on economics, in: The Economics of Artiﬁcial
Intelligence: An Agenda. University of Chicago Press, pp. 507–547.
Atkeson, A., Ohanian, L.E., 2001. Are Phillips curves useful for forecasting inflation? Federal Reserve Bank of Minneapolis Quarterly Review 25, 2–11.
Bernanke, B.S., Laubach, T., Mishkin, F.S., Posen, A.S., 2018. Inﬂation targeting: lessons from the
international experience. Princeton, NJ: Princeton University Press.
Breiman, L., 2001. Random forests. Machine Learning 45, 5–32.
Chakraborty, C., Joseph, A., 2017. Machine learning at central banks. Bank of England working papers,
number 674 .
Chen, X., Racine, J., Swanson, N.R., 2001. Semiparametric ARX neural-network models with an application to forecasting inflation. IEEE Transactions on Neural Networks 12, 674–683.
Choudhary, M.A., Haider, A., 2012. Neural network models for inﬂation forecasting: an appraisal.
Applied Economics 44, 2631–2635.
Chung, J., Gulcehre, C., Cho, K., Bengio, Y., 2014. Empirical evaluation of gated recurrent neural networks
on sequence modeling. arXiv preprint arXiv:1412.3555 .
Clevert, D.A., Unterthiner, T., Hochreiter, S., 2016. Fast and accurate deep network learning by exponential
linear units (elus). arXiv: Learning .
Dey, R., Salem, F.M., 2017. Gate-variants of gated recurrent unit (GRU) neural networks, in: 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), IEEE. pp. 1597–1600.
Diebold, F.X., Mariano, R.S., 1995. Comparing Predictive Accuracy. Journal of Business & Economic
Statistics 13, 253–263.
van Dijk, D., Terasvirta, T., Franses, P.H., 2002. Smooth transition autoregressive models: a survey of recent developments. Econometric Reviews 21, 1–47.
Faust, J., Wright, J.H., 2013. Forecasting Inﬂation, in: Elliott, G., Granger, C., Timmermann, A. (Eds.),
Handbook of Economic Forecasting. Elsevier. volume 2 of Handbook of Economic Forecasting. chapter 1,
pp. 2–56.
Friedman, J.H., 2002. Stochastic gradient boosting. Computational Statistics & Data Analysis 38, 367–378.
Friedman, M., 1961. The Lag in Eﬀect of Monetary Policy. Journal of Political Economy 69, 447–447.
Gilchrist, S., Schoenle, R., Sim, J., Zakrajšek, E., 2017. Inﬂation dynamics during the ﬁnancial crisis.
American Economic Review 107, 785–823.
Goulet Coulombe, P., 2020. To bag is to prune. arXiv eprints , arXiv–2008.
Goulet Coulombe, P., Leroux, M., Stevanovic, D., Surprenant, S., 2022. How is machine learning useful
for macroeconomic forecasting? Journal of Applied Econometrics, forthcoming .
Hansen, P.R., Lunde, A., Nason, J.M., 2011. The model conﬁdence set. Econometrica 79, 453–497.
Hochreiter, S., Schmidhuber, J., 1997. Long shortterm memory. Neural Computation 9, 1735–1780.
Hsieh, T.Y., Wang, S., Sun, Y., Honavar, V., 2021. Explainable multivariate time series classiﬁcation:
A deep neural network which learns to attend to important variables as well as time intervals, in:
Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pp. 607–615.
Ibarra, R., 2012. Do disaggregated CPI data improve the accuracy of inﬂation forecasts? Economic
Modelling 29, 1305–1313.
Ida, D., 2020. Sectoral inﬂation persistence and optimal monetary policy. Journal of Macroeconomics 65.
Lipton, Z.C., Berkowitz, J., Elkan, C., 2015. A critical review of recurrent neural networks for sequence
learning. CoRR .
Makridakis, S., Assimakopoulos, V., Spiliotis, E., 2018. Objectivity, reproducibility and replicability in
forecasting research. International Journal of Forecasting 34, 835–838.
Makridakis, S., Spiliotis, E., Assimakopoulos, V., 2020. The M4 competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting 36, 54–74.
Mandic, D., Chambers, J., 2001. Recurrent neural networks for prediction: learning algorithms,
architectures and stability. Wiley.
McAdam, P., McNelis, P., 2005. Forecasting inﬂation with thick models and neural networks. Economic
Modelling 22, 848–867.
Medeiros, M., Vasconcelos, G., Veiga, A., Zilberman, E., 2021. Forecasting inflation in a data-rich
environment: the benefits of machine learning methods. Journal of Business & Economic Statistics 39.
Mullainathan, S., Spiess, J., 2017. Machine learning: an applied econometric approach. Journal of
Economic Perspectives 31, 87–106.
Nakamura, E., 2005. Inﬂation forecasting using a neural network. Economics Letters 86, 373–378.
Olson, M., Wyner, A.J., Berk, R., 2018. Modern neural networks generalize on small data sets, in:
Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp.
3623–3632.
Ramachandran, P., Zoph, B., Le, Q.V., 2017. Searching for activation functions. CoRR.
Schapire, R.E., 1999. A brief introduction to boosting, in: Proceedings of the 16th International Joint
Conference on Artificial Intelligence - Volume 2, Morgan Kaufmann Publishers Inc., San Francisco,
CA, USA. pp. 1401–1406.
Shaw, P., Uszkoreit, J., Vaswani, A., 2018. Self-attention with relative position representations. arXiv
preprint arXiv:1803.02155.
Song, Y.Y., Ying, L., 2015. Decision tree methods: applications for classification and prediction. Shanghai
Archives of Psychiatry 27, 130.
Stock, J.H., Watson, M.W., 2007. Why has U.S. inflation become harder to forecast? Journal of Money,
Credit and Banking 39, 3–33.
Stock, J.H., Watson, M.W., 2010. Modeling inﬂation after the crisis. Technical Report. National Bureau of
Economic Research.
Stock, J.H., Watson, M.W., 2020. Trend, Seasonal, and Sectoral Inflation in the Euro Area, in: Castex, G.,
Galí, J., Saravia, D. (Eds.), Changing Inflation Dynamics, Evolving Monetary Policy. Central Bank of
Chile. volume 27 of Central Banking, Analysis, and Economic Policies Book Series. chapter 9, pp. 317–344.
Swinkels, L., 2018. Simulating historical inflation-linked bond returns. Journal of Empirical Finance 48,
374–389.
Székely, G.J., Rizzo, M.L., Bakirov, N.K., 2007. Measuring and testing dependence by correlation of
distances. Annals of Statistics 35, 2769–2794.
Woodford, M., 2012. Inﬂation targeting and ﬁnancial stability. Sveriges Riksbank Economic Review 1,
7–32.
Yu, Y., Si, X., Hu, C., Zhang, J., 2019. A review of recurrent neural networks: LSTM cells and network
architectures. Neural Computation 31, 1235–1270.
Zahara, S., Ilmiddaviq, M., et al., 2020. Consumer price index prediction using long short-term memory
(LSTM) based cloud computing, in: Journal of Physics: Conference Series, IOP Publishing. p. 012022.
Zhou, Z., 2012. Measuring nonlinear dependence in time-series, a distance correlation approach. Journal
of Time Series Analysis 33, 438–457.
Appendix A Additional Tables and Figures
Table 7: Indexes at Levels 0 and 1
Level Index Parent
0 All items 
1 All items less energy All items
1 All items less food All items
1 All items less food and energy All items
1 All items less food and shelter All items
1 All items less food, shelter, and energy All items
1 All items less food, shelter, energy, and used cars and trucks All items
1 All items less homeowners costs All items
1 All items less medical care All items
1 All items less shelter All items
1 Apparel All items
1 Apparel less footwear All items
1 Commodities All items
1 Commodities less food All items
1 Durables All items
1 Education and communication All items
1 Energy All items
1 Entertainment All items
1 Food All items
1 Food and beverages All items
1 Fuels and utilities All items
1 Household furnishings and operations All items
1 Housing All items
1 Medical care All items
1 Nondurables All items
1 Nondurables less food All items
1 Nondurables less food and apparel All items
1 Other goods and services All items
1 Other services All items
1 Recreation All items
1 Services All items
1 Services less medical care services All items
1 Services less rent of shelter All items
1 Transportation All items
1 Utilities and public transportation All items
Note: Levels and parents of indexes may change over time.
Table 8: Indexes at Level 2
Level Index Parent
2 All items less food and energy All items less energy
2 Apparel commodities Apparel
2 Apparel services Apparel
2 Commodities less food Commodities
2 Commodities less food and beverages Commodities
2 Commodities less food and energy commodities All items less food and energy
2 Commodities less food, energy, and used cars and trucks Commodities
2 Communication Education and communication
2 Domestically produced farm food Food and beverages
2 Education Education and communication
2 Energy commodities Energy
2 Energy services Energy
2 Entertainment commodities Entertainment
2 Entertainment services Entertainment
2 Food Food and beverages
2 Food at home Food
2 Food away from home Food
2 Footwear Apparel
2 Fuels and utilities Housing
2 Homeowners costs Housing
2 Household energy Fuels and utilities
2 Household furnishings and operations Housing
2 Infants’ and toddlers’ apparel Apparel
2 Medical care commodities Medical care
2 Medical care services Medical care
2 Men’s and boys’ apparel Apparel
2 Nondurables less food Nondurables
2 Nondurables less food and apparel Nondurables
2 Nondurables less food and beverages Nondurables
2 Nondurables less food, beverages, and apparel Nondurables
2 Other services Services
2 Personal and educational expenses Other goods and services
2 Personal care Other goods and services
2 Pets, pet products and services Recreation
2 Photography Recreation
2 Private transportation Transportation
2 Public transportation Transportation
2 Rent of shelter Services
2 Services less energy services All items less food and energy
2 Services less medical care services Services
2 Services less rent of shelter Services
2 Shelter Housing
2 Tobacco and smoking products Other goods and services
2 Transportation services Services
2 Video and audio Recreation
2 Women’s and girls’ apparel Apparel
Note: Levels and parents of indexes have changed over the years.
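The child-to-parent structure listed in Tables 7 and 8 can be represented programmatically, which is how the hierarchy is consumed when higher-level information is propagated to lower-level predictions. The following sketch (illustrative, not from the paper; it uses only a small subset of the table entries) encodes the hierarchy as a child-to-parent mapping and derives each index's ancestor chain and level:

```python
# Child -> parent mapping for a small subset of Tables 7 and 8.
# "All items" is the level-0 root; its parent is None.
PARENT = {
    "All items": None,           # level 0
    "Housing": "All items",      # level 1
    "Apparel": "All items",      # level 1
    "Shelter": "Housing",        # level 2
    "Footwear": "Apparel",       # level 2
}

def ancestors(index: str) -> list:
    """Return the chain of parents from `index` up to 'All items'."""
    chain = []
    parent = PARENT.get(index)
    while parent is not None:
        chain.append(parent)
        parent = PARENT.get(parent)
    return chain

def level(index: str) -> int:
    """Depth of `index` in the hierarchy: 0 for the root, 1, 2, ..."""
    return len(ancestors(index))
```

For example, `ancestors("Shelter")` walks Shelter → Housing → All items, consistent with Table 8's entry listing Housing as Shelter's parent; as the tables' notes caution, such a mapping would need to be rebuilt per period because parents change over time.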
Figure 9. Additional Examples of HRNN(4) predictions for disaggregated indexes. Panels: (a) Admission to movies, theaters, and concerts; (b) Alcoholic beverages; (c) Bacon and related products; (d) Education; (e) Film processing; (f) Financial services; (g) Fruits and vegetables; (h) Gasoline, unleaded regular; (i) Haircuts and other personal care services; (j) Hospital services; (k) Household energy; (l) Household operations.
Figure 10. Additional Examples of HRNN(4) predictions for disaggregated indexes (different hierarchies and sectors). Panels: (a) Housing; (b) Intercity train fare; (c) Jewelry; (d) Medical care commodities; (e) Medical care; (f) Motor oil, coolant, and fluids; (g) Motor vehicle insurance; (h) Nonprescription drugs; (i) Private transportation; (j) Sports equipment; (k) White bread; (l) Women's apparel.
Note: Indexes were selected from different hierarchies and sectors.