
International Journal of Forecasting 39 (2023) 1145–1162

Contents lists available at ScienceDirect

International Journal of Forecasting

journal homepage: www.elsevier.com/locate/ijforecast

Forecasting CPI inflation components with Hierarchical

Recurrent Neural Networks✩

Oren Barkan a, Jonathan Benchimol b, Itamar Caspi b, Eliya Cohen c,

Allon Hammer c, Noam Koenigstein c,∗

a Department of Computer Science, The Open University, Israel

b Research Department, Bank of Israel, Israel

c Iby and Aladar Fleischman Faculty of Engineering, Tel Aviv University, Israel

article info

Keywords:

Inflation Forecasting

Disaggregated Inflation

Consumer Price Index

Machine Learning

Gated Recurrent Unit

Recurrent Neural Networks

abstract

We present a hierarchical architecture based on recurrent neural networks for predicting

disaggregated inflation components of the Consumer Price Index (CPI). While the

majority of existing research is focused on predicting headline inflation, many economic

and financial institutions are interested in its partial disaggregated components. To this

end, we developed the novel Hierarchical Recurrent Neural Network (HRNN) model,

which utilizes information from higher levels in the CPI hierarchy to improve predictions

at the more volatile lower levels. Based on a large dataset from the US CPI-U index, our

evaluations indicate that the HRNN model significantly outperforms a vast array of well-

known inflation prediction baselines. Our methodology and results provide additional

forecasting measures and possibilities to policy and market makers on sectoral and

component-specific price changes.

©2022 The Authors. Published by Elsevier B.V. on behalf of International Institute of

Forecasters. This is an open access article under the CC BY license

(http://creativecommons.org/licenses/by/4.0/).

1. Introduction

The consumer price index (CPI) is a measure of the av-

erage change over time in the prices paid by a representa-

tive consumer for a common basket of goods and services.

The CPI attempts to quantify and measure the average

cost of living in a given country by estimating the pur-

chasing power of a single unit of currency. Therefore, it is

the key macroeconomic indicator for measuring inflation

(or deflation). As such, the CPI is a major driving force in

the economy, influencing a plethora of market dynamics.

In this work, we present a novel model based on recurrent

neural networks (RNNs) for forecasting disaggregated CPI

inflation components.

✩The views expressed in this paper are those of the authors and

do not necessarily reflect the views of the Bank of Israel.

∗Corresponding author.

E-mail address: noamk@tauex.tau.ac.il (N. Koenigstein).

In the mid-1980s, many advanced economies began

a major process of disinflation known as the Great

Moderation. This period was characterized by steady low

inflation and moderate yet steady economic growth (Faust

& Wright, 2013). Later, the Global Financial Crisis (GFC)

of 2008, and more recently the economic effects of the

Covid-19 pandemic, were met with unprecedented mon-

etary policies, potentially altering the underlying inflation

dynamics worldwide (Bernanke et al., 2018; Gilchrist et al., 2017; Woodford, 2012). While economists still

debate the underlying forces that drive inflation, all agree

on the importance and value of contemporary inflation re-

search, measurements, and estimation. Moreover, the CPI

is a composite index comprising an elaborate hierarchy

of sub-indexes each with its own dynamics and driving

forces. Hence, in order to better understand inflation dy-

namics, it is useful to deconstruct the CPI index and look

into the specific disaggregated components underneath

the main headline.

https://doi.org/10.1016/j.ijforecast.2022.04.009

0169-2070/©2022 The Authors. Published by Elsevier B.V. on behalf of International Institute of Forecasters. This is an open access article under

the CC BY license (http://creativecommons.org/licenses/by/4.0/).


In the US, the CPI is calculated and reported by the

Bureau of Labor Statistics (BLS). It represents the cost of

a basket of goods and services across the country on a

monthly basis. The CPI is a hierarchical composite index

system that partitions all consumer goods and services

into a hierarchy of increasingly detailed categories. In

the US, the top CPI headline is composed of eight major

sector indexes: (1) Housing, (2) Food and Beverages, (3)

Medical Care, (4) Apparel, (5) Transportation, (6) Energy,

(7) Recreation, and (8) Other Goods and Services. Each

sector is composed of finer and finer sub-indexes until the

entry levels or ‘‘leaves’’ are reached. These entry-level in-

dexes represent concrete measurable products or services

whose price levels are being tracked. For example, the

White Bread entry is classified under the following eight-level hierarchy: All Items → Food and Beverages → Food at Home → Cereals and Bakery Products → Cereals and Cereal Products → Bakery Products → Bread → White Bread.

The ability to accurately estimate the upcoming disag-

gregated inflation rate is of high interest to policymakers

and market players: Inflation forecasting is a critical tool

for adjusting monetary policies around the world (Friedman, 1961). Central banks predict future inflation trends

to justify interest rate decisions and to control and main-

tain inflation around their targets. Better understanding

of upcoming inflation dynamics at the component level

can help inform and elucidate decision-makers for opti-

mal monetary policy (Ida, 2020). Predicting disaggregated

inflation rates is also important to fiscal authorities that

wish to forecast sectoral inflation dynamics to adjust so-

cial security payments and assistance packages to specific

industrial sectors. In the private sector, investors in fixed-

income markets wish to estimate future sectoral inflation

in order to foresee upcoming trends in discounted real

returns. Additionally, some private firms need to pre-

dict specific inflation components in order to forecast

price dynamics and mitigate risks accordingly. Finally,

both government and private debt levels and interest

payments heavily depend on the expected path of infla-

tion. These are just a few examples that emphasize the

importance of disaggregated inflation forecasting.

Most existing inflation forecasting models attempt to

predict the headline CPI while implicitly assuming that

the same approach can be effectively applied to its dis-

aggregated components (Faust & Wright, 2013). How-

ever, as we show below, and in line with the litera-

ture, the disaggregated components are more volatile and

harder to predict. Moreover, changes in the CPI compo-

nents are more prevalent at the lower levels than up at

the main categories. As a result, lower hierarchy levels

often have fewer historical measurements for training

modern machine learning algorithms.

In this work, we present the hierarchical recurrent

neural network (HRNN) model, a novel model based on

RNNs that utilizes the CPI’s inherent hierarchy for im-

proved predictions at its lower levels. The HRNN is a

hierarchical arrangement of RNNs analogous to the CPI’s

hierarchy. This architecture allows information to prop-

agate from higher to lower levels in order to mitigate

volatility and information sparsity that otherwise im-

pedes advanced machine learning approaches. Hence, a

key advantage of the HRNN model stems from its supe-

riority at inflation predictions at lower levels of the CPI

hierarchy. Our evaluations indicate that the HRNN out-

performs many existing baselines at inflation forecasting

of different CPI components below the top headline and

across different time horizons.

Finally, our data and code are publicly available on

GitHub1 to enable reproducibility and foster future eval-

uations of new methods. By doing so, we comply with

the call to make data and algorithms more open and

transparent to the community (Makridakis et al., 2018,

2020).

The remainder of the paper is organized as follows.

Section 2 presents a literature review of baseline inflation forecasting models and machine learning models. Section 3 explains RNN methodologies. Our novel HRNN model is presented in Section 4. Section 5 describes the price data and data transformations. In Section 6, we present our results and compare them to alternative approaches. Finally, we conclude in Section 7 by discussing potential implications of the current research and several future directions.

2. Related work

While inflation forecasting is a challenging task of high

importance, the literature indicates that significant im-

provement upon basic time-series models and heuristics

is hard to achieve. Indeed, Atkeson & Ohanian (2001)

found that forecasts based on simple averages of past

inflation were more accurate than all other alternatives,

including the canonical Phillips curve and other forms

of structural models. Similarly, Stock & Watson (2007,

2010) provided empirical evidence for the superiority of

univariate models in forecasting inflation during the Great

Moderation period (1985 to 2007) and during the re-

covery following the GFC. More recently, Faust & Wright

(2013) conducted an extensive survey of inflation fore-

casting methods and found that a simple ‘‘glide path’’

prediction from the current inflation rate performs as well

as model-based forecasts for long-run inflation rates and

often outperforms them.

Recently, an increasing amount of effort has been di-

rected towards the application of machine learning

models for inflation forecasting. For example, Medeiros

et al. (2021) compared inflation forecasting with several

machine learning models such as lasso regression, ran-

dom forests, and deep neural networks. However,

Medeiros et al. (2021) mainly focused on using exogenous

features such as cash and credit availability, online prices,

housing prices, consumer data, exchange rates, and inter-

est rates. When exogenous features are considered, the

emphasis shifts from learning the endogenous time series

patterns to effectively extracting the predictive informa-

tion from the exogenous features. In contrast to Medeiros

et al. (2021), we preclude the use of any exogenous

features and focus on harnessing the internal patterns of

1 The code and data are available at https://github.com/AllonHammer/CPI_HRNN.


the CPI series. Moreover, unlike previous works that dealt

with estimating the main headline, this work is focused

on predicting the disaggregated indexes that comprise the

CPI.

In general, machine learning methods flourish where

data are found in abundance and many training examples

are available. Unfortunately, this is not the case with CPI

inflation data. While a large amount of relevant exoge-

nous features exists, there are only 12 monthly readings

annually. Hence, the amount of available training exam-

ples is limited. Furthermore, Stock & Watson (2007) show

that statistics such as the average inflation rate, conditional volatility, and persistence shift over time.

Hence, inflation is a non-stationary process, which further

limits the amount of relevant historical data points.

Goulet Coulombe et al. (2022), Chakraborty & Joseph (2017), Athey (2018), and Mullainathan & Spiess

(2017) present comprehensive surveys of general ma-

chine learning applications in economics. Here, we do not

attempt to cover the plethora of research employing ma-

chine learning for economic forecasting. Instead, we focus

on models that apply neural networks to CPI forecasting

in the next section.

This paper joins several studies that apply neural net-

work methods to the specific task of inflation forecasting:

Nakamura (2005) employed a simple feed-forward net-

work to predict quarterly CPI headline values. Special em-

phasis is placed on early stopping methodologies in order

to prevent over-fitting. Their evaluations are based on US

CPI data from 1978–2003, and predictions are compared

against several autoregressive (AR) baselines. Presented in Section 6, our evaluations confirm the finding of Nakamura (2005) that a fully connected network is indeed

effective at predicting the headline CPI. However, when

the CPI components are considered, we show that the

model in this work demonstrates superior accuracy.

Choudhary & Haider (2012) used several neural net-

works to forecast monthly inflation rates in 28 countries

in the Organisation for Economic Cooperation and De-

velopment (OECD). Their findings showed that, on aver-

age, neural network models were superior in 45% of the

countries while simple AR models of order one (AR1) per-

formed better in 23% of the countries. They also proposed

combining an ensemble of multiple networks arithmeti-

cally for further accuracy.

Chen et al. (2001) explored semi-parametric nonlinear

autoregressive models with exogenous variables (NLARX)

based on neural networks. Their investigation covered

a comparison of different nonlinear activation functions

such as the sigmoid activation, radial basis activation, and

ridgelet activation.

McAdam & McNelis (2005) explored thick neural net-

work models that represent trimmed-mean forecasts from

several models. By combining the network with a linear

Phillips curve model, they predict the CPI for the US,

Japan, and Europe at different levels.

In contrast to the aforementioned works, our model

predicts monthly CPI values in all hierarchy levels. We

utilize information patterns from higher levels of the CPI

hierarchy in order to assist the predictions at lower levels.

Such predictions are more challenging due to the inherent

noise and information sparsity at the lower levels. More-

over, the HRNN model in this work is better equipped

to harness sequential patterns in the data by employing

recurrent neural networks. Finally, we exclude the use of

exogenous variables and rely solely on historical CPI data

to focus on internal CPI pattern modeling.

Almosova & Andresen (2019) employed long short-

term memory (LSTM) for inflation forecasting. They com-

pared their approach to multiple baselines such as

autoregressive models, random walk models, seasonal

autoregressive models, Markov switching models, and

fully connected neural networks. At all time horizons, the root mean squared forecast error of their LSTM model was approximately one-third that of the random walk model, and it was significantly more accurate than the other baselines.

As we explain in Section 3.3, our model uses gated recurrent units (GRUs), which are similar to LSTMs.

Unlike Almosova & Andresen (2019) and Zahara et al.

(2020), a key contribution of our model stems from its

ability to propagate useful information from higher levels

in the hierarchy down to the nodes at lower levels. By

ignoring the hierarchical relations between the different

CPI components, our model is reduced to a set of simple,

unrelated GRUs. This setup is similar to Almosova & An-

dresen (2019), as the difference between LSTMs and GRUs

is negligible. In Section 6, we perform an ablation study

in which the HRNN ignores the hierarchical relations and

is reduced to a collection of independent GRUs, similar

to the model in Almosova & Andresen (2019). Our eval-

uations indicate that this approach is not optimal at any

level of the CPI hierarchy.

3. Recurrent neural networks

Before describing the HRNN model in detail, we briefly

overview the main RNN approaches. RNNs are neural net-

works that model sequences of data in which each value is

assumed to be dependent on previous values. Specifically,

RNNs are feed-forward networks augmented by imple-

menting a feedback loop (Mandic & Chambers,2001). As

such, RNNs introduce a notion of time to the standard

feed-forward neural networks and excel at modeling tem-

poral dynamic behavior (Chung et al.,2014). Some RNN

units retain an internal memory state from previous time

steps representing an arbitrarily long context window.

Many RNN implementations were proposed and studied

in the past. A comprehensive review and comparison of

the different RNN architectures is available in Chung et al.

(2014) and Lipton et al. (2015). In this section, we cover

the three most popular units: basic RNNs, long short-term

memory (LSTM), and gated recurrent units (GRUs).

3.1. Basic recurrent neural networks

Let $\{x_t\}_{t=1}^{T}$ be the model's input time series consisting of $T$ samples. Similarly, let $\{s_t\}_{t=1}^{T}$ be the model's outputs, consisting of $T$ samples from the target time series. The model's input at $t$ is $x_t$, and its output (prediction) is $s_t$. The following equation defines a basic RNN unit:

$$s_t = \tanh(x_t u + s_{t-1} w + b), \qquad (1)$$


Fig. 1. Illustration of a basic RNN unit.

Each line carries an entire vector, from the output of one node to the

inputs of others. The yellow box is a learned neural network layer.

where $u$, $w$, and $b$ are the model's parameters, and $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ is the hyperbolic tangent function. The model's output from the previous period, $s_{t-1}$, is used as an additional input to the model at time $t$, along with the current input $x_t$. The linear combination $x_t u + s_{t-1} w + b$ is the argument of a hyperbolic tangent activation function, allowing the unit to model nonlinear relations between inputs and outputs. Different implementations may employ other activation functions, e.g., the sigmoid function, other logistic functions, or a rectified linear unit (ReLU) (Ramachandran et al., 2017). Fig. 1 depicts a basic RNN unit.
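The recurrence in Eq. (1) can be sketched in a few lines of NumPy. This is a scalar illustration with arbitrary, untrained parameter values, not the paper's implementation:

```python
import numpy as np

def rnn_forward(x, u, w, b):
    """Run a basic RNN over a 1-D series x, returning all outputs s_t.

    Implements s_t = tanh(x_t * u + s_{t-1} * w + b), with s_0 = 0.
    """
    s_prev = 0.0
    outputs = []
    for x_t in x:
        s_t = np.tanh(x_t * u + s_prev * w + b)  # Eq. (1)
        outputs.append(s_t)
        s_prev = s_t  # feedback loop: output becomes next step's state
    return np.array(outputs)

# Toy example with arbitrary parameters:
series = np.array([0.1, 0.2, -0.1, 0.3])
out = rnn_forward(series, u=0.5, w=0.8, b=0.0)
```

The feedback through `s_prev` is what distinguishes the unit from a plain feed-forward layer: each output depends on the entire history of the series.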

3.2. Long short-term memory networks

Basic RNNs suffer from the ‘‘short-term memory’’ prob-

lem: they utilize data from recent history to forecast, but

if a sequence is long enough, it cannot carry relevant

information from earlier periods to later ones, e.g., rel-

evant patterns from the same month in previous years.

Long short-term memory networks (LSTMs) mitigate the

‘‘short-term memory’’ problem by introducing gates that

enable the preservation of relevant ‘‘long-term memory’’

and combining it with the most recent data (Hochreiter

& Schmidhuber, 1997). The introduction of LSTMs paved

the way for significant strides forward in various fields,

such as natural language processing, speech recognition,

and robot control (Yu et al., 2019).

An LSTM unit has the ability to ‘‘memorize’’ or ‘‘forget’’

information through the use of a special memory cell state,

carefully regulated by three gates: an input gate, a forget

gate, and an output gate. The gates regulate the flow of

information into and out of the memory cell state. An

LSTM unit is defined by the following set of equations:

$$
\begin{aligned}
i &= \sigma(x_t u_i + s_{t-1} w_i + b_i),\\
f &= \sigma(x_t u_f + s_{t-1} w_f + b_f),\\
o &= \sigma(x_t u_o + s_{t-1} w_o + b_o),\\
\tilde{c} &= \tanh(x_t u_c + s_{t-1} w_c + b_c),\\
c_t &= f \times c_{t-1} + i \times \tilde{c},\\
s_t &= o \times \tanh(c_t),
\end{aligned}
\qquad (2)
$$

where $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the sigmoid (logistic) activation function; $u_i$, $w_i$, and $b_i$ are the learned parameters that control the input gate $i$; $u_f$, $w_f$, and $b_f$ are the learned parameters that control the forget gate $f$; $u_o$, $w_o$, and $b_o$ are the learned parameters that control the output gate $o$; and $\tilde{c}$ is the new candidate activation for the cell state, determined by the parameters $u_c$, $w_c$, and $b_c$. The cell state $c_t$ is itself updated by the linear combination $c_t = f \times c_{t-1} + i \times \tilde{c}$, where $c_{t-1}$ is the previous value of the cell state. The input gate $i$ determines which parts of the candidate $\tilde{c}$ should be used to modify the memory cell state, and the forget gate $f$ determines which parts of the previous memory $c_{t-1}$ should be discarded. Finally, the updated cell state $c_t$ is "squashed" through a nonlinear hyperbolic tangent, and the output gate $o$ determines which parts of it should be presented in the output $s_t$. Fig. 2 depicts an LSTM unit.
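Eq. (2) maps directly to code. The following scalar sketch performs one LSTM step; the parameter values are arbitrary and purely illustrative, not the paper's trained weights:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, s_prev, c_prev, p):
    """One LSTM step following Eq. (2); p holds the twelve scalar parameters."""
    i = sigmoid(x_t * p["ui"] + s_prev * p["wi"] + p["bi"])        # input gate
    f = sigmoid(x_t * p["uf"] + s_prev * p["wf"] + p["bf"])        # forget gate
    o = sigmoid(x_t * p["uo"] + s_prev * p["wo"] + p["bo"])        # output gate
    c_tilde = np.tanh(x_t * p["uc"] + s_prev * p["wc"] + p["bc"])  # candidate
    c_t = f * c_prev + i * c_tilde   # keep part of old memory, add new
    s_t = o * np.tanh(c_t)           # expose part of the squashed cell state
    return s_t, c_t

# Toy step with all parameters set to 0.5 (illustrative only):
keys = ("ui", "wi", "bi", "uf", "wf", "bf", "uo", "wo", "bo", "uc", "wc", "bc")
params = {k: 0.5 for k in keys}
s1, c1 = lstm_step(0.2, s_prev=0.0, c_prev=0.0, p=params)
```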

3.3. Gated recurrent units

A gated recurrent unit (GRU) simplifies the LSTM unit by dropping the cell state in favor of a leaner design that requires fewer learnable parameters (Dey & Salemt, 2017). GRUs employ only two gates instead of three:

an update gate and a reset gate. Using fewer parame-

ters, GRUs are faster and more efficient, especially when

training data are limited, such as in the case of infla-

tion predictions and particularly disaggregated inflation

components.

The following set of equations defines a GRU unit:

$$
\begin{aligned}
z &= \sigma(x_t u_z + s_{t-1} w_z + b_z),\\
r &= \sigma(x_t u_r + s_{t-1} w_r + b_r),\\
v &= \tanh(x_t u_v + (s_{t-1} \times r) w_v + b_v),\\
s_t &= z \times v + (1 - z) \times s_{t-1},
\end{aligned}
\qquad (3)
$$

where $u_z$, $w_z$, and $b_z$ are the learned parameters that control the update gate $z$, and $u_r$, $w_r$, and $b_r$ are the learned parameters that control the reset gate $r$. The candidate activation $v$ is a function of the input $x_t$ and the previous output $s_{t-1}$, and is controlled by the learned parameters $u_v$, $w_v$, and $b_v$. Finally, the output $s_t$ combines the candidate activation $v$ and the previous state $s_{t-1}$, controlled by the update gate $z$. Fig. 3 depicts a GRU unit.
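As a concrete sketch, the GRU update in Eq. (3) can be written in scalar form as follows; the parameters here are arbitrary placeholders, not the trained HRNN weights (the paper's actual code is in its GitHub repository):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, s_prev, p):
    """One GRU step following Eq. (3); p holds the nine scalar parameters."""
    z = sigmoid(x_t * p["uz"] + s_prev * p["wz"] + p["bz"])        # update gate
    r = sigmoid(x_t * p["ur"] + s_prev * p["wr"] + p["br"])        # reset gate
    v = np.tanh(x_t * p["uv"] + (s_prev * r) * p["wv"] + p["bv"])  # candidate
    return z * v + (1.0 - z) * s_prev  # blend candidate with previous state

params = {k: 0.1 for k in ("uz", "wz", "bz", "ur", "wr", "br", "uv", "wv", "bv")}
s = 0.0
for x_t in [0.3, -0.2, 0.5]:  # run the unit over a short toy series
    s = gru_step(x_t, s, params)
```

Note the single interpolation $z \times v + (1 - z) \times s_{t-1}$ replaces the LSTM's separate cell state, which is why the GRU needs nine parameters rather than twelve.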

GRUs enable the ‘‘memorization’’ of relevant informa-

tion patterns with significantly fewer parameters com-

pared to LSTMs (see Fig. 2). Hence, GRUs constitute the

basic unit for our novel HRNN model described in Sec-

tion 4.

4. Hierarchical recurrent neural networks

The disaggregated components at lower levels of the

CPI hierarchy (e.g., newspapers, medical care, etc.) suffer

from missing data as well as higher volatility in change

rates. The HRNN exhibits a network graph in which each

node is associated with an RNN unit that models the

inflation rate of a specific (sub-) index (node) in the full

CPI hierarchy. The HRNN’s unique architecture allows it to

propagate information from RNN nodes in higher levels to

lower levels in the CPI hierarchy, coarse to fine-grained,

via a chain of hierarchical informative priors over the

RNNs’ parameters. This unique property of the HRNN

is materialized in better predictions for nodes at lower

levels of the hierarchy, as we show in Section 6.


Fig. 2. Illustration of an LSTM unit.

Each line carries an entire vector, from the output of one node to the inputs of others. The pink circles represent point-wise operations, while the

yellow boxes are learned neural network layers. Lines merging denote concatenation, while a line forking denotes its content being copied and the

copies going to different locations.

Fig. 3. Illustration of a GRU unit.

Each line carries an entire vector, from the output of one node to the inputs of others. The pink circles represent point-wise operations, while the

yellow boxes are learned neural network layers. Lines merging denote concatenation, while a line forking denotes its content being copied and the

copies going to different locations.

4.1. Model formulation

Let $I = \{n\}_{n=1}^{N}$ be an enumeration of the nodes in the CPI hierarchy graph. In addition, we define $\pi_n \in I$ as the parent node of node $n$. For example, if the nodes $n = 5$ and $n = 19$ represent the indexes of Tomatoes and Vegetables, respectively, then $\pi_5 = 19$ (i.e., the parent node of Tomatoes is Vegetables).

For each node $n \in I$, we denote by $x^n_t \in \mathbb{R}$ the observed random variable that represents the CPI value of node $n$ at timestamp $t \in \mathbb{N}$. We further denote $X^n_t \triangleq (x^n_1, \ldots, x^n_t)$, where $1 \le t \le T_n$ and $T_n$ is the last timestamp for node $n$. Let $g: \mathbb{R}^m \times \Omega \to \mathbb{R}$ be a parametric function representing an RNN node in the hierarchy. Specifically, $\mathbb{R}^m$ is the space of parameters that control the RNN unit, $\Omega$ is the input time series space, and the function $g$ predicts a scalar value for the next value of the input series. Hence, our goal is to learn the parameters $\theta_n \in \mathbb{R}^m$ such that, for $X^n_t \in \Omega$, $g(\theta_n, X^n_t) = x^n_{t+1}$ for all $n \in I$ and $1 \le t < T_n$.
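To make the notation concrete, the parent map $\pi$ can be represented with a plain dictionary. The node numbering below is illustrative only, following the Tomatoes/Vegetables example above:

```python
# pi[n] gives the parent node of n; None marks the root (headline CPI).
pi = {
    0: None,   # All Items (headline CPI, the root)
    19: 0,     # Vegetables (illustrative numbering)
    5: 19,     # Tomatoes: pi[5] == 19, i.e. its parent is Vegetables
}

def ancestors(n, pi):
    """Walk from node n up to the root, collecting the chain of parents."""
    chain = []
    while pi[n] is not None:
        n = pi[n]
        chain.append(n)
    return chain
```

For the toy map above, `ancestors(5, pi)` walks Tomatoes → Vegetables → All Items.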

We proceed by assuming a Gaussian error on $g$'s predictions, which yields the following expression for the likelihood of the observed time series:

$$p(X^n_{T_n} \mid \theta_n, \tau_n) = \prod_{t=1}^{T_n} p(x^n_t \mid X^n_{t-1}, \theta_n, \tau_n) = \prod_{t=1}^{T_n} \mathcal{N}\!\left(x^n_t;\; g(\theta_n, X^n_{t-1}),\; \tau_n^{-1}\right), \qquad (4)$$

where $\tau_n^{-1} \in \mathbb{R}$ is the variance of $g$'s errors.

Next, we define a hierarchical network of normal priors over the nodes' parameters that ties each node's parameters to those of its parent node. The hierarchical priors follow

$$p(\theta_n \mid \theta_{\pi_n}, \tau_{\theta_n}) = \mathcal{N}\!\left(\theta_n;\; \theta_{\pi_n},\; \tau_{\theta_n}^{-1} I\right), \qquad (5)$$

where $\tau_{\theta_n}$ is a configurable precision parameter that determines the "strength" of the relation between node $n$'s parameters and the parameters of its parent $\pi_n$. Higher values of $\tau_{\theta_n}$ strengthen the attachment between $\theta_n$ and its prior $\theta_{\pi_n}$.

The precision parameter $\tau_{\theta_n}$ can be seen as a global hyperparameter of the model, to be optimized via cross-validation. However, different nodes in the CPI hierarchy have varying degrees of correlation with their parent nodes. Hence, the value of $\tau_{\theta_n}$ in HRNN is given by

$$\tau_{\theta_n} = e^{\alpha + C_n}, \qquad (6)$$

where $\alpha$ is a hyperparameter and $C_n = \rho\!\left(X^n_{T_n}, X^{\pi_n}_{T_{\pi_n}}\right)$ is the Pearson correlation coefficient between the time series of $n$ and its parent $\pi_n$.
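To illustrate Eq. (6), a helper computing the node-specific precision from the Pearson correlation of the two (aligned) series might look as follows; the function name is hypothetical and the series are toy data:

```python
import numpy as np

def prior_precision(child_series, parent_series, alpha):
    """tau_{theta_n} = exp(alpha + C_n), where C_n is the Pearson
    correlation between the child's and the parent's series (Eq. (6))."""
    c_n = np.corrcoef(child_series, parent_series)[0, 1]
    return np.exp(alpha + c_n)

# Perfectly (linearly) correlated toy series, so C_n = 1 and tau = exp(alpha + 1):
child = np.array([0.1, 0.2, 0.3, 0.4])
parent = 2.0 * child
tau = prior_precision(child, parent, alpha=0.0)
```

A highly correlated child is thus pulled exponentially harder toward its parent's parameters, while an uncorrelated child is left relatively free.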

Importantly, Eq. (5) describes a novel prior relationship between the parameters of a node and its parent in the hierarchy, one that grows stronger with the historical correlation between the two series. This ensures that a child node $n$ is kept close to its parent node $\pi_n$ in terms of the squared Euclidean distance in the parameter space, especially if they are highly correlated. Note that in the case of the root node (the headline CPI), $\pi_n$ does not exist; hence, we set a normal non-informative regularization prior with zero mean and unit variance.

Let us now denote the aggregation of all series from all levels by $X = \{X^n_{T_n}\}_{n \in I}$. Similarly, we denote by $\theta = \{\theta_n\}_{n \in I}$ and $\mathcal{T} = \{\tau_n\}_{n \in I}$ the aggregation of all the RNN parameters and precision parameters from all levels, respectively. Note that $X$ (the data) is observed, $\theta$ denotes unobserved learned variables, and $\mathcal{T}$ is determined by Eq. (6). The hyperparameter $\alpha$ from Eq. (6) is set by a cross-validation procedure.

With these definitions at hand, we proceed with Bayes' rule. From Eqs. (4) and (5), we extract the posterior probability:

$$p(\theta \mid X, \mathcal{T}) = \frac{p(X \mid \theta, \mathcal{T})\, p(\theta)}{p(X)} \propto \prod_{n \in I} \prod_{t=1}^{T_n} \mathcal{N}\!\left(x^n_t;\; g(\theta_n, X^n_{t-1}),\; \tau_n^{-1}\right) \prod_{n \in I} \mathcal{N}\!\left(\theta_n;\; \theta_{\pi_n},\; \tau_{\theta_n}^{-1} I\right). \qquad (7)$$

HRNN optimization follows a maximum a posteriori (MAP) approach. Namely, we wish to find optimal parameter values $\theta^*$ such that

$$\theta^* = \operatorname*{argmax}_{\theta} \; \log p(\theta \mid X, \mathcal{T}). \qquad (8)$$

Note that the objective in Eq. (8) depends on the parametric function $g$. The HRNN is a general framework that can use any RNN, e.g., a simple RNN, LSTM, GRU, etc. In this work, we chose $g$ to be a scalar GRU because GRUs are capable of long-term memory but with fewer parameters than LSTMs. Hence, each node $n$ is associated with a GRU with its own parameters: $\theta_n = [u^z_n, u^r_n, u^v_n, w^z_n, w^r_n, w^v_n, b^z_n, b^r_n, b^v_n]$. Then, $g(\theta_n, X^n_t)$ is computed by $t$ successive applications of the GRU to $x^n_i$ with $1 \le i \le t$, according to Eq. (3). Finally, the HRNN optimization proceeds with stochastic gradient ascent over the objective in Eq. (8). Fig. 4 depicts the entire HRNN architecture.
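Up to additive constants, maximizing the log-posterior in Eq. (8) for a single node amounts to minimizing a precision-weighted sum of squared prediction errors plus an L2 penalty that pulls $\theta_n$ toward $\theta_{\pi_n}$. The sketch below illustrates that per-node objective with hypothetical names and toy values; the actual implementation is in the paper's repository:

```python
import numpy as np

def hrnn_node_loss(pred_err, theta, parent_theta, tau_n, tau_theta):
    """Negative log-posterior (up to constants) for one node:
    0.5 * tau_n * sum of squared one-step prediction errors
    + 0.5 * tau_theta * squared distance from the parent's parameters."""
    fit = 0.5 * tau_n * np.sum(pred_err ** 2)                    # Gaussian likelihood term
    prior = 0.5 * tau_theta * np.sum((theta - parent_theta) ** 2)  # hierarchical prior term
    return fit + prior

errors = np.array([0.1, -0.2])   # toy one-step-ahead errors x_t - g(theta, X_{t-1})
theta = np.array([0.5, 0.5])     # child parameters (toy, 2-D for brevity)
parent = np.array([0.4, 0.6])    # parent parameters (the prior mean)
loss = hrnn_node_loss(errors, theta, parent, tau_n=1.0, tau_theta=2.0)
```

Summing this quantity over all nodes and minimizing it by gradient descent is equivalent to the gradient ascent over Eq. (8) described above.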

4.2. HRNN inference

In machine learning, after the model's parameters have been estimated in the training process, the model can be applied to make predictions in a process known as inference. In our case, equipped with the MAP estimate $\theta^*$, inference with the HRNN model is achieved as follows: Given a sequence of historical CPI values $X^n_t$ for node $n$, we predict the next CPI value $y^n_{t+1} = g(\theta_n, X^n_t)$, as explained above. This type of prediction is for next month's CPI, namely, horizon $h = 0$. In this work, we also tested the ability of the model to perform predictions for further horizons $h \in \{0, \ldots, 8\}$. The $h$-horizon predictions are obtained in a recursive manner, whereby each predicted value $y^n_t$ is fed back as an input for the prediction of $y^n_{t+1}$. As expected, Section 6 shows that the forecasting accuracy gradually degrades as the horizon $h$ increases.
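The recursive $h$-horizon procedure can be sketched as follows, where `predict_next` stands in for the trained $g(\theta_n, \cdot)$; here it is replaced by a simple moving-average rule purely for illustration:

```python
def forecast_recursive(history, predict_next, horizon):
    """Roll the model forward recursively: each prediction is appended to
    the input window and fed back in to predict the following step."""
    window = list(history)
    preds = []
    for _ in range(horizon + 1):  # horizon h = 0 predicts next month only
        y = predict_next(window)
        preds.append(y)
        window.append(y)  # feed the prediction back as an input
    return preds

# Stand-in "model": average of the last three observations.
avg3 = lambda w: sum(w[-3:]) / 3.0
out = forecast_recursive([0.3, 0.3, 0.3], avg3, horizon=2)
```

Because predicted values replace actual observations in the window, errors compound with each step, which is consistent with the accuracy degradation over $h$ reported in Section 6.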

5. Dataset

This work is based on monthly CPI data released by

the US Bureau of Labor Statistics (BLS). In what fol-

lows, we discuss the dataset’s characteristics and our

pre-processing procedures. For the sake of reproducibility,

the final version of the processed data is available in our

HRNN code.

5.1. The US consumer price index

The official CPI of each month is released by the BLS

several days into the following month. The price tags are

collected in 75 urban areas throughout the US from about

24,000 retail and service establishments. The housing and

rent rates are collected from about 50,000 landlords and

tenants across the country. The BLS releases two different

measurements according to urban demographics:

1. The CPI-U represents the CPI for urban consumers

and covers approximately 93% of the total pop-

ulation. The CPI items and their relative weights are derived from estimated expenditures reported in the Consumer Expenditure Survey. These

items and their weights are updated each year in

January.

2. The CPI-W represents the CPI for urban wage earn-

ers and clerical workers and covers about 29% of

the population. This index is focused on households

with at least 50% of income coming from clerical or

wage-paying jobs, and at least one of the house-

hold’s earners must have been employed for at

least 70% of the year. The CPI-W indicates changes

in the cost of benefits, as well as future contract

obligations.


Fig. 4. Illustration of the full HRNN model.

In this work, we focus on CPI-U, as it is generally con-

sidered the best measure for the average cost of living

in the US. Monthly CPI-U data per product are gener-

ally available from January 1994. Our samples thus span

from January 1994 to March 2019. Note that throughout

the years, new indexes were added, and some indexes

have been omitted. Consequently, hierarchies can change,

which contributes to the challenge of our exercise.

5.2. The CPI hierarchy

The CPI-U is an eight-level-deep hierarchy compris-

ing 424 different nodes (indexes). Level 0 represents the

headline CPI, or the aggregated index of all components.

An index at any level is associated with a weight be-

tween 0 and 100, which represents its contribution to the

headline CPI at level 0. Level 1 consists of the eight main

aggregated categories or sectors: (1) Food and Beverages,

(2) Housing, (3) Apparel, (4) Transportation, (5) Medical

Care, (6) Recreation, (7) Education and Communication,

and (8) Other Goods and Services. Mid-levels (2–5) consist

of more specific aggregations, e.g., Energy Commodities,

Household Insurance, etc. The lower levels (6–8) consist

of fine-grained indexes, e.g., Apples, Bacon and Related

Products, Eyeglasses and Eye Care, Tires, Airline Fares, etc.

Tables 7 and 8 (in the Appendix) depict the first three levels of the CPI hierarchy (levels 0–2).

5.3. Data preparation

We used publicly available data from the BLS website.2 However, the BLS releases hierarchical data on a monthly basis in separate files. Hence, separate monthly files from January 1994 until March 2019 were processed and aggregated to create a single repository. Moreover, the format of these files has changed over the years (e.g., txt, pdf, and csv formats were all in use), and significant effort was made to parse the changing formats from different time periods.

The hierarchical CPI data is released in terms of monthly index values. We transformed the CPI values to monthly logarithmic change rates as follows: we denote by $x_t$ the CPI value (of any node) at month $t$. The logarithmic change rate at month $t$ is denoted by $\mathrm{rate}(t)$ and given by

$$\mathrm{rate}(t) = 100 \times \log \frac{x_t}{x_{t-1}}. \tag{9}$$

Unless otherwise mentioned, the remainder of the paper relates to monthly logarithmic change rates, as in Eq. (9).
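As a concrete illustration, the transformation in Eq. (9) can be sketched in a few lines of Python (the index values below are hypothetical):

```python
import math

# Eq. (9): rate(t) = 100 * log(x_t / x_{t-1}), applied to one node's
# monthly CPI index values.
def log_change_rates(index_values):
    return [100.0 * math.log(curr / prev)
            for prev, curr in zip(index_values, index_values[1:])]

# A three-month toy series yields two monthly change rates.
rates = log_change_rates([100.0, 101.0, 100.5])
```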

We split the data into a training dataset and a test dataset as follows: for each time series, we kept the first (earliest in time) 70% of the measurements for the training dataset. The remaining 30% of the measurements were removed from the training dataset and used to form the test dataset. The training dataset was used to train the HRNN model as well as the other baselines, and the test dataset was used for evaluations. The results in Section 6 are based on this split.

2 www.bls.gov/cpi.

Table 1
Descriptive statistics.

Dataset         # of monthly    Mean   STD    Min      Max      # of     Avg. measurements
                measurements                                    indexes  per index
Headline only   303             0.18   0.33   −1.93    1.22     1        303
Level 1         6742            0.17   0.96   −18.61   11.32    34       198.29
Level 2         6879            0.12   1.10   −19.60   16.81    46       149.54
Level 3         7885            0.17   1.31   −34.23   16.37    51       121.31
Level 4         7403            0.08   1.97   −35.00   28.17    58       107.89
Level 5         10,809          0.01   1.43   −21.04   242.50   92       87.90
Level 6         7752            0.09   1.49   −11.71   16.52    85       86.13
Level 7         4037            0.11   1.53   −11.90   9.45     50       80.74
Level 8         595             0.08   1.56   −5.27    5.02     7        85.00
Full hierarchy  52,405          0.10   1.75   −35.00   242.50   424      123.31

Notes: General statistics of the headline CPI and CPI-U for each level in the hierarchy and the full hierarchy of indexes.
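The chronological 70/30 split described above can be sketched as follows (an illustration, not the authors' exact preprocessing code):

```python
# Keep the earliest 70% of each series for training; the rest is the test set.
def chronological_split(series, train_frac=0.70):
    cut = round(len(series) * train_frac)
    return series[:cut], series[cut:]

train, test = chronological_split(list(range(10)))
```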

Table 1 summarizes the number of data points and general statistics of the CPI time series after applying Eq. (9). When comparing the headline CPI with the full hierarchy, we see that at the lower levels, the standard deviation (STD) is significantly higher and the dynamic range is larger, implying much more volatility. The average number of measurements per index decreases at the lower levels of the hierarchy, as not all indexes are available for the entire period.

Fig. 5 depicts box plots of the CPI change rate distributions at different levels. The boxes depict the median value and the upper 75th and lower 25th percentiles. The whiskers indicate the overall minimum and maximum rates. Fig. 5 further emphasizes that the change rates become more volatile as we go down the CPI hierarchy.

High dynamic range, high standard deviation, and less training data are all indicators of the difficulty of making predictions inside the hierarchy. Based on this information, we can expect that predicting the disaggregated components inside the hierarchy will be more difficult than predicting the headline.

Finally, Fig. 6 depicts box plots of the CPI change rate distribution for different sectors. We notice that some sectors (e.g., apparel and energy) suffer from higher volatility than others. As expected, predictions for these sectors will be more difficult.
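The per-index statistics reported in Table 1 (mean, STD, min, max) can be reproduced for any series of change rates with a short sketch like the following (the input values are illustrative):

```python
from statistics import mean, stdev

# Descriptive statistics over one series of monthly log change rates.
def describe(rates):
    return {"n": len(rates), "mean": mean(rates), "std": stdev(rates),
            "min": min(rates), "max": max(rates)}

stats = describe([0.5, -0.2, 0.1, 0.3])
```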

6. Evaluation and results

We evaluated the HRNN and compared it with well-known baselines for inflation prediction as well as some alternative machine learning approaches. We use the following notation: let $x_t$ be the CPI log-change rate at month $t$. We consider models for $\hat{x}_t$, an estimate of $x_t$ based on historical values. Additionally, we denote by $\varepsilon_t$ the estimation error at time $t$. In all cases, the $h$-horizon forecasts were generated by recursively iterating the one-step forecasts forward. Hyperparameters were set through a ten-fold cross-validation procedure.
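The recursive multi-step scheme described above can be sketched generically; `one_step` is a placeholder for any fitted one-step model:

```python
# h-horizon forecasts by recursively feeding one-step predictions back in.
def recursive_forecast(one_step, history, h):
    hist = list(history)
    preds = []
    for _ in range(h):
        nxt = one_step(hist)      # one-step-ahead prediction
        preds.append(nxt)
        hist.append(nxt)          # treat the prediction as observed
    return preds

# Example with a naive "repeat the last value" one-step model.
preds = recursive_forecast(lambda hist: hist[-1], [0.1, 0.2], h=3)
```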

6.1. Baseline models

We compared the HRNN with the following CPI prediction baselines:

1. Autoregression (AR) – The AR($\rho$) model estimates $\hat{x}_t$ based on the previous $\rho$ months as follows: $\hat{x}_t = \alpha_0 + \sum_{i=1}^{\rho} \alpha_i x_{t-i} + \varepsilon_t$, where $\{\alpha_i\}_{i=0}^{\rho}$ denotes the model's parameters.

2. Phillips curve (PC) – The PC($\rho$) model is an extension of AR($\rho$) that adds the unemployment rate $u_t$ at month $t$ to the CPI forecasting model as follows: $\hat{x}_t = \alpha_0 + \sum_{i=1}^{\rho} \alpha_i x_{t-i} + \beta u_{t-1} + \varepsilon_t$, where $\{\alpha_i\}_{i=0}^{\rho}$ and $\beta$ are the model's parameters.

3. Vector autoregression (VAR) – The VAR($\rho$) model is a multivariate generalization of AR($\rho$). It is frequently used to model two or more time series together. VAR($\rho$) estimates next month's values of $k$ time series based on their historical values from the previous $\rho$ months as follows: $\hat{X}_t = A_0 + \sum_{i=1}^{\rho} A_i X_{t-i} + \epsilon_t$, where $X_t$ denotes the vector of values of the $k$ different time series at month $t$, and $\hat{X}_t$ denotes the model's estimates of these values; $\{A_i\}_{i=1}^{\rho}$ are $(k \times k)$ parameter matrices, $A_0$ is a vector of intercepts, and $\epsilon_t$ is a vector of error terms.

4. Random walk (RW) – We consider the RW($\rho$) model of Atkeson & Ohanian (2001). RW($\rho$) is a simple yet effective model that predicts next month's CPI as the average of the last $\rho$ months: $\hat{x}_t = \frac{1}{\rho} \sum_{i=1}^{\rho} x_{t-i} + \varepsilon_t$.

5. Autoregression in gap (AR-GAP) – The AR-GAP model subtracts a fixed inflation trend before predicting the inflation gap (Faust & Wright, 2013). The inflation gap is defined as $g_t = x_t - \tau_t$, where $\tau_t$ is the inflation trend at time $t$, which represents a slowly varying local mean. This trend value is estimated using RW($\rho$) as follows: $\tau_t = \frac{1}{\rho} \sum_{i=1}^{\rho} x_{t-i}$. By accounting for the local inflation trend $\tau_t$, the model attempts to increase stationarity in $g_t$ and estimates it by $\hat{g}_t = \alpha_0 + \sum_{i=1}^{\rho} \alpha_i g_{t-i} + \varepsilon_t$, where $\{\alpha_i\}_{i=0}^{\rho}$ denotes the model's parameters. Finally, $\tau_t$ is added back to $\hat{g}_t$ to obtain the final inflation prediction: $\hat{x}_t = \hat{g}_t + \tau_t$.

Fig. 5. Box plots of monthly inflation rate per hierarchy level.

Fig. 6. Box plots of monthly inflation rate per sector.

6. Logistic smooth transition autoregressive model (LSTAR) – The LSTAR model is an extension of AR that allows the model parameters to change according to a transition variable $F(t; \gamma, c)$. LSTAR($\rho$, $c$, $\gamma$) consists of two AR($\rho$) components that describe two regimes in the data (high and low) and a nonlinear transition function that links them as follows:

$$\hat{x}_t = \alpha_0 + \sum_{i=1}^{\rho} \alpha_i x_{t-i} \left(1 - F(t; \gamma, c)\right) + \beta_0 + \sum_{i=1}^{\rho} \beta_i x_{t-i} F(t; \gamma, c) + \varepsilon_t, \tag{10}$$

where $F(t; \gamma, c) = \frac{1}{1 + e^{-\gamma (t - c)}}$ is a first-order logistic transition function that depends on the location parameter $c$ and a smoothing parameter $\gamma$. The location parameter $c$ can be interpreted as the threshold between the two AR($\rho$) regimes, in the sense that the logistic function changes monotonically from 0 to 1 as $t$ increases and balances symmetrically at $t = c$ (van Dijk et al., 2002). The model's parameters are $\{\alpha_i\}_{i=0}^{\rho}$ and $\{\beta_i\}_{i=0}^{\rho}$, while $\gamma$ and $c$ are hyperparameters.

7. Random forests (RF) – The RF($\rho$) model is an ensemble learning method that builds a set of decision trees (Song & Ying, 2015) in order to mitigate overfitting and improve generalization (Breiman, 2001). At prediction time, the average prediction of the individual trees is returned. The inputs to the RF($\rho$) model are the last $\rho$ samples, and the output is the predicted value for the next month.

8. Gradient boosted trees (GBT) – The GBT($\rho$) model (Friedman, 2002) is based on an ensemble of decision trees that are trained in a stage-wise fashion, similar to other boosting models (Schapire, 1999). Unlike RF($\rho$), which averages the predictions of several decision trees, the GBT($\rho$) model trains each tree to minimize the residual error remaining from all previous trees. At prediction time, the sum of the predictions of all the trees is returned. The inputs to the GBT($\rho$) model are the last $\rho$ samples, and the output is the predicted value for the next month.

9. Fully connected neural network (FC) – The FC($\rho$) model is a fully connected neural network with one hidden layer and a ReLU activation (Ramachandran et al., 2017). The output layer employs no activation, formulating a regression problem with a squared loss optimization. The inputs to the FC($\rho$) model are the last $\rho$ samples, and the output is the predicted value for the next month.

10. Deep neural network (Deep-NN) – The Deep-NN($\rho$) model is a deep neural network consisting of ten layers with 100 neurons each, as in Olson et al. (2018), which was shown to perform well for inflation prediction (Goulet Coulombe, 2020). We used the original setup of Olson et al. (2018) and tuned its hyperparameters as follows: the learning rate was set to lr = 0.005, training lasted 50 epochs (instead of 200), and the ELU activation functions (Clevert et al., 2016) were replaced by ReLU activation functions. These changes yielded more accurate predictions; hence, we decided to include them in all our evaluations. The inputs to the Deep-NN($\rho$) model are the last $\rho$ samples, and the output is the predicted value for the next month.

11. Deep neural network with unemployment (Deep-NN + Unemployment) – Similar to PC($\rho$), which extends AR($\rho$) by including unemployment data, the Deep-NN($\rho$) + Unemployment model extends Deep-NN($\rho$) by including the last $\rho$ samples of the unemployment rate $u_t$. In terms of hyperparameters, we used values identical to those in Deep-NN($\rho$).
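To make the baselines above concrete, the following sketch illustrates three building blocks they share: an OLS fit of AR(1), the lagged feature/target pairs fed to the tree and network models, and the LSTAR logistic transition of Eq. (10). This is an illustration under simplified assumptions, not the authors' estimation code:

```python
import math

# (a) AR(1) by ordinary least squares: x_t = a0 + a1 * x_{t-1} + e_t.
def fit_ar1(x):
    X, Y = x[:-1], x[1:]
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    a1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(X, Y))
          / sum((xi - mx) ** 2 for xi in X))
    a0 = my - a1 * mx
    return a0, a1

# (b) Supervised pairs used by RF, GBT, FC, and Deep-NN: features are the
# last rho values, the target is the next month's value.
def lag_pairs(x, rho):
    X = [x[t - rho:t] for t in range(rho, len(x))]
    y = [x[t] for t in range(rho, len(x))]
    return X, y

# (c) The LSTAR logistic transition F(t; gamma, c) of Eq. (10).
def logistic_transition(t, gamma, c):
    return 1.0 / (1.0 + math.exp(-gamma * (t - c)))

# Recover the coefficients from a noiseless AR(1) series with a0=0.1, a1=0.5.
demo = [1.0]
for _ in range(6):
    demo.append(0.1 + 0.5 * demo[-1])
a0_hat, a1_hat = fit_ar1(demo)
```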

6.2. Ablation models

In order to demonstrate the contribution of the hierarchical component of the HRNN model, we conducted an ablation study that considered simpler alternatives to the HRNN based on GRUs without the hierarchical component:

1. Single GRU (S-GRU) – The S-GRU($\rho$) model is a single GRU that receives the last $\rho$ values as inputs in order to predict the next value. In S-GRU($\rho$), a single GRU is used for all the time series that comprise the CPI hierarchy. This baseline utilizes all the benefits of a GRU but assumes that the different components of the CPI behave similarly and that a single unit is sufficient to model all the nodes.

2. Independent GRUs (I-GRU) – In I-GRU($\rho$), we trained a different GRU($\rho$) unit for each CPI node. The S-GRU and I-GRU approaches represent two extremes: the first attempts to model all the CPI nodes with a single model, while the second treats each node separately. I-GRU($\rho$) is equivalent to a variant of the HRNN that ignores the hierarchy by setting the precision parameter $\tau_{\theta_n} = 0$ for all $n \in I$. That is, it is a simple variant of the HRNN that trains independent GRUs, one for each index in the hierarchy.

3. k-nearest neighbors GRU (KNN-GRU) – In order to further demonstrate the contribution of the hierarchical structure of the HRNN, we devised the KNN-GRU($\rho$) baseline. KNN-GRU attempts to utilize information from multiple Pearson-correlated CPI nodes without employing the hierarchical informative priors. Hence, KNN-GRU presents a simpler alternative to the HRNN that replaces the hierarchical structure with elementary vector GRUs as follows: first, the $k$ nearest neighbors of each CPI node were found using the Pearson correlation measure. Then, separate vector GRU($\rho$) units were trained for each CPI aggregate along with its $k$ most similar nodes, using the last $\rho$ values of node $n$ and its $k$ nearest nodes. By doing so, the KNN-GRU($\rho$) baseline was able to utilize the benefits of GRU units together with relevant information that comes from correlated nodes.
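The neighbor-selection step of KNN-GRU can be sketched as follows (series names and values here are illustrative):

```python
# Pearson correlation between two equal-length series.
def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

# Return the names of the k candidate series most correlated with `target`.
def k_nearest(target, candidates, k):
    ranked = sorted(candidates,
                    key=lambda name: pearson(target, candidates[name]),
                    reverse=True)
    return ranked[:k]

neighbors = k_nearest([1.0, 2.0, 3.0, 4.0],
                      {"up": [2.0, 4.0, 6.0, 8.0],
                       "down": [4.0, 3.0, 2.0, 1.0]}, k=1)
```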

6.3. Evaluation metrics

Following Aparicio & Bertolotto (2020) and Faust & Wright (2013), we report results in terms of three evaluation metrics:

1. Root mean squared error (RMSE) – The RMSE is given by

$$\mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left( x_t - \hat{x}_t \right)^2}, \tag{11}$$

where $x_t$ is the monthly change rate for month $t$, and $\hat{x}_t$ is the corresponding prediction.

2. Pearson correlation coefficient – The Pearson correlation coefficient $\varphi$ is given by

$$\varphi = \frac{\mathrm{COV}(X_T, \hat{X}_T)}{\sigma_{X_T} \times \sigma_{\hat{X}_T}}, \tag{12}$$

where $\mathrm{COV}(X_T, \hat{X}_T)$ is the covariance between the series of actual values and the predictions, and $\sigma_{X_T}$ and $\sigma_{\hat{X}_T}$ are the standard deviations of the actual values and the predictions, respectively.

3. Distance correlation coefficient – In contrast to the Pearson correlation measure, which detects linear associations between two random variables, the distance correlation measure can also detect nonlinear correlations (Székely et al., 2007; Zhou, 2012). The distance correlation coefficient $r_d$ is given by

$$r_d = \frac{\mathrm{dCov}(X_T, \hat{X}_T)}{\sqrt{\mathrm{dVar}(X_T) \times \mathrm{dVar}(\hat{X}_T)}}, \tag{13}$$

where $\mathrm{dCov}(X_T, \hat{X}_T)$ is the distance covariance between the series of actual values and the predictions, and $\mathrm{dVar}(X_T)$ and $\mathrm{dVar}(\hat{X}_T)$ are the distance variances of the actual values and the predictions, respectively.
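All three metrics can be written compactly; the distance correlation below follows the double-centering construction of Székely et al. (2007). A minimal sketch, not the evaluation code used in the paper:

```python
import math

# Eq. (11): root mean squared error.
def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

# Eq. (12): Pearson correlation coefficient.
def pearson_corr(actual, pred):
    n = len(actual)
    ma, mp = sum(actual) / n, sum(pred) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, pred))
    sa = sum((a - ma) ** 2 for a in actual) ** 0.5
    sp = sum((p - mp) ** 2 for p in pred) ** 0.5
    return cov / (sa * sp)

# Eq. (13): distance correlation via double-centered distance matrices.
def distance_corr(x, y):
    n = len(x)

    def centered(v):
        d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
        row = [sum(r) / n for r in d]
        col = [sum(d[i][j] for i in range(n)) / n for j in range(n)]
        grand = sum(row) / n
        return [[d[i][j] - row[i] - col[j] + grand for j in range(n)]
                for i in range(n)]

    A, B = centered(x), centered(y)
    dcov2 = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(a * a for r in A for a in r) / n ** 2
    dvar_y = sum(b * b for r in B for b in r) / n ** 2
    return math.sqrt(dcov2 / math.sqrt(dvar_x * dvar_y))
```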


Table 2
Average results on disaggregated CPI components.

Model name                  RMSE per horizon (AR(1) = 1.00)      Correlation (at horizon = 0)
                            0     1     2     3     4     8      Pearson  Distance
AR(1)                       1.00  1.00  1.00  1.00  1.00  1.00   0.06     0.05
AR(2)                       1.00  1.00  1.00  1.00  1.00  1.00   0.08     0.06
AR(3)                       1.00  1.00  1.00  1.00  1.00  1.00   0.08     0.06
AR(4)                       1.00  1.00  1.00  1.00  1.00  1.00   0.09     0.07
AR-GAP(3)                   1.00  1.00  1.00  1.00  1.00  1.00   0.08     0.06
AR-GAP(4)                   1.00  1.00  1.00  1.00  1.00  1.00   0.09     0.07
RW(4)                       1.00  1.00  1.00  1.00  1.00  1.00   0.05     0.04
Phillips(4)                 1.00  1.00  1.00  1.00  0.98  1.00   0.06     0.04
VAR(1)                      1.03  1.03  1.04  1.03  1.04  1.05   0.04     0.03
VAR(2)                      1.03  1.03  1.04  1.03  1.04  1.05   0.06     0.03
VAR(3)                      1.03  1.03  1.03  1.03  1.04  1.05   0.06     0.03
VAR(4)                      1.02  1.03  1.03  1.03  1.03  1.04   0.07     0.04
LSTAR(ρ=4, c=2, γ=0.3)      1.04  1.07  1.07  1.07  1.08  1.10   0.09     0.07
GBT(4)                      0.83  0.83  0.83  0.84  0.84  0.86   0.18     0.27
RF(4)                       0.84  0.85  0.86  0.86  0.86  0.87   0.19     0.29
FC(4)                       1.03  1.03  1.04  1.04  1.04  1.05   0.12     0.09
Deep-NN(4)                  0.90  0.90  0.90  0.90  0.91  0.91   0.13     0.22
Deep-NN(4) + Unemployment   0.85  0.85  0.85  0.85  0.85  0.86   0.12     0.22
S-GRU(4)                    1.02  1.06  1.06  1.07  1.04  1.12   0.10     0.08
I-GRU(4)                    0.83  0.84  0.85  0.85  0.86  0.89   0.17     0.13
KNN-GRU(1)                  0.91  0.93  0.96  0.97  0.96  0.96   0.19     0.15
KNN-GRU(2)                  0.90  0.93  0.95  0.97  0.96  0.96   0.20     0.15
KNN-GRU(3)                  0.89  0.92  0.95  0.96  0.96  0.95   0.20     0.15
KNN-GRU(4)                  0.89  0.91  0.95  0.95  0.95  0.95   0.20     0.15
HRNN(1)                     0.79  0.79  0.81  0.81  0.81  0.83   0.23     0.28
HRNN(2)                     0.78  0.79  0.81  0.81  0.80  0.82   0.22     0.29
HRNN(3)                     0.79  0.78  0.80  0.81  0.81  0.81   0.23     0.30
HRNN(4)                     0.78  0.78  0.79  0.79  0.79  0.80   0.24     0.29

Notes: Average results across all 424 inflation indexes that make up the headline CPI. The RMSE results are relative to the AR(1) model and normalized according to its results, i.e., RMSE_Model / RMSE_AR(1). The results are statistically significant according to a Diebold–Mariano test with p < 0.02.

6.4. Results

The HRNN model is unique in its ability to utilize information from higher levels in the CPI hierarchy in order to make predictions at lower levels. Therefore, we provide results for each level of the CPI hierarchy: overall, 424 disaggregated indexes belonging to eight different levels. For the sake of completeness, we also provide results for the headline CPI index by itself. It is important to note that in this case, the HRNN model cannot utilize its hierarchical mechanism and has no advantage over the alternatives, so we do not expect it to outperform them.

Table 2 shows the average results over all the disaggregated indexes in the CPI hierarchy. We present prediction results for horizons of 0, 1, 2, 3, 4, and 8 months. The results are relative to the AR(1) model and normalized according to RMSE_Model / RMSE_AR(1). In the HRNN, we set $\alpha = 1.5$, and the vector GRU($\rho$) models used in KNN-GRU were based on the $k = 5$ nearest neighbors.

Table 2 shows that the different versions of the HRNN model consistently outperform the alternatives at every horizon. Notably, the HRNN is superior to I-GRU, emphasizing the importance of using hierarchical information and the superiority of the HRNN over regular GRUs. Additionally, the HRNN is superior to the different KNN-GRU models, underscoring the specific way the HRNN employs informative priors based on the CPI hierarchy. These results are statistically significant according to Diebold & Mariano (1995) pairwise tests for a squared loss-differential with p-values below 0.02. Additionally, we performed a model confidence set (MCS) test (Hansen et al., 2011) for the leading models: RF(4), Deep-NN(4), Deep-NN(4) + Unemployment, GBT(4), I-GRU(4), HRNN(1), HRNN(2), HRNN(3), and HRNN(4). The MCS procedure removed all the baselines and left only the four HRNN variants, with HRNN(4) as the leading model ($p_{\mathrm{HRNN(4)}} = 1.00$).

For the sake of completeness, we also provide results for predictions of the headline CPI index. Table 3 summarizes these results. When considering only the headline, the hierarchical mechanism of the HRNN is redundant and the model is identical to a single GRU. In this case, we do not observe much advantage in employing the HRNN model. In contrast, we do see an advantage for the other deep learning models, such as FC(4) and Deep-NN(4) + Unemployment, which outperform the more traditional approaches.

Table 4 lists the results of the best model, HRNN(4), across all hierarchy levels (1–8, excluding the headline). We include the results of the best ablation model, I-GRU(4), for comparison. The results are averaged over all disaggregated components and normalized by the AR(1) model RMSE, as before. As evident from Table 4, the HRNN model shows the best relative performance at the lower levels of the hierarchy, where the CPI indexes are more volatile and the hierarchical priors are most effective.

Table 5 compares the results of HRNN(4) across different sectors. Again, we include the results of the I-GRU(4) model for comparison. The results are averaged over all disaggregated components and presented as normalized gains with respect to the AR(1) model, as before. The best relative improvement of the HRNN(4) model appears in the Food and Beverages group. This can be explained by the fact that the Food and Beverages sub-hierarchy is the deepest and most elaborate hierarchy in the CPI tree. When the hierarchy is deeper and more elaborate, the advantages of the HRNN are emphasized.

Table 3
CPI headline only.

Model name                  RMSE per horizon (AR(1) = 1.00)      Correlation (at horizon = 0)
                            0     1     2     3     4     8      Pearson  Distance
AR(1)                       1.00  1.00  1.00  1.00  1.00  1.00   0.29     0.22
AR(2)                       1.00  0.97  0.99  1.01  1.00  0.98   0.32     0.24
AR(3)                       1.00  0.98  0.98  1.00  0.96  0.97   0.33     0.25
AR(4)                       1.00  0.95  0.95  0.96  0.93  0.96   0.33     0.25
AR-GAP(3)                   1.00  0.98  0.98  1.00  0.96  0.97   0.33     0.25
AR-GAP(4)                   0.99  0.95  0.95  0.96  0.93  0.96   0.33     0.25
RW(4)                       1.05  0.98  0.99  1.01  0.97  0.96   0.23     0.20
Phillips(4)                 0.93  0.94  0.95  0.95  0.93  0.95   0.33     0.25
LSTAR(ρ=4, c=2, γ=0.3)      0.98  0.95  0.95  0.97  0.95  0.95   0.32     0.24
RF(4)                       1.05  1.06  1.03  1.07  1.04  1.03   0.27     0.28
GBT(4)                      0.97  0.99  0.93  0.95  0.93  0.93   0.25     0.35
FC(4)                       0.92  0.94  0.94  0.96  0.93  0.94   0.33     0.25
Deep-NN(4)                  0.94  0.97  0.96  0.98  0.94  0.92   0.31     0.32
Deep-NN(4) + Unemployment   1.00  0.97  0.92  0.94  0.92  0.91   0.37     0.32
HRNN(4)/GRU(4)              1.00  0.97  0.99  0.99  0.96  0.99   0.35     0.37

Notes: Prediction results for the CPI headline index alone. The RMSE results are relative to the AR(1) model and normalized according to its results, i.e., RMSE_Model / RMSE_AR(1).

Table 4
HRNN(4) vs. I-GRU(4) at different levels of the CPI hierarchy with respect to AR(1).

Hierarchy  HRNN(4)                                        I-GRU(4)
level      RMSE per horizon        Correlation            RMSE per horizon        Correlation
           (AR(1) = 1.00)          (at horizon = 0)       (AR(1) = 1.00)          (at horizon = 0)
           0     2     4     8     Pearson  Distance      0     2     4     8     Pearson  Distance
Level 1    0.95  0.97  0.99  1.00  0.33     0.37          0.98  0.98  0.99  0.97  0.25     0.38
Level 2    0.91  0.90  0.91  0.91  0.30     0.35          0.90  0.92  0.94  0.93  0.26     0.34
Level 3    0.79  0.79  0.80  0.81  0.21     0.31          0.82  0.89  0.94  0.94  0.23     0.37
Level 4    0.77  0.77  0.76  0.77  0.26     0.32          0.84  0.87  0.90  0.92  0.20     0.33
Level 5    0.79  0.77  0.77  0.80  0.21     0.31          0.85  0.89  0.89  0.93  0.22     0.29
Level 6    0.75  0.76  0.81  0.81  0.19     0.23          0.85  0.89  0.90  0.92  0.21     0.21
Level 7    0.75  0.78  0.77  0.80  0.17     0.17          0.87  0.89  0.92  0.94  0.18     0.15
Level 8    0.72  0.78  0.77  0.78  0.10     0.23          0.89  0.90  0.92  0.94  0.10     0.12

Notes: The RMSE results are relative to the AR(1) model and normalized according to its results, i.e., RMSE_Model / RMSE_AR(1).

Table 5
HRNN(4) vs. I-GRU(4) results for different CPI sectors with respect to AR(1).

Industry          HRNN(4)                                        I-GRU(4)
sector            RMSE per horizon        Correlation            RMSE per horizon        Correlation
                  (AR(1) = 1.00)          (at horizon = 0)       (AR(1) = 1.00)          (at horizon = 0)
                  0     2     4     8     Pearson  Distance      0     2     4     8     Pearson  Distance
Apparel           0.83  0.87  0.84  0.88  0.04     0.19          0.88  0.88  0.85  0.92  0.05     0.23
Energy            0.94  0.96  0.99  0.98  0.34     0.32          0.94  0.98  1.02  0.99  0.18     0.28
Food & Beverages  0.72  0.73  0.75  0.76  0.22     0.13          0.80  0.80  0.81  0.82  0.18     0.22
Housing           0.79  0.80  0.82  0.82  0.17     0.24          0.77  0.79  0.82  0.82  0.18     0.27
Medical Care      0.79  0.82  0.81  0.82  0.03     0.17          0.79  0.83  0.83  0.84  0.08     0.15
Recreation        0.99  0.99  1.00  1.00  0.05     0.17          1.00  0.99  1.00  1.00  −0.07    0.17
Services          0.90  0.92  0.95  0.94  0.04     0.15          0.89  0.94  0.95  0.96  0.02     0.21
Transportation    0.83  0.84  0.85  0.85  0.27     0.28          0.82  0.85  0.86  0.88  0.26     0.36

Notes: The RMSE results are relative to the AR(1) model and normalized according to its results, i.e., RMSE_Model / RMSE_AR(1).

Finally, Fig. 7 provides specific examples for three disaggregated indexes: Tomatoes, Bread, and Information Technology. The solid red line presents the actual CPI values, the dashed green line presents the HRNN(4) predictions, and the dotted blue line presents the I-GRU(4) predictions.

Fig. 7. Examples of HRNN(4) predictions for disaggregated indexes.

These indexes are located at the bottom of the CPI hierarchy and suffer from relatively high volatility. The HRNN(4) model seems to track and predict the trends of the real index accurately and often performs better than I-GRU(4). As can be seen, I-GRU's predictions appear to be more conservative than those of the HRNN. At first, this may appear counterintuitive, as the HRNN has more regularization than I-GRU. However, this additional regularization is informative regularization coming from the parameters of the upper levels in the CPI hierarchy, which allows the HRNN model to be more expressive without overfitting. In contrast, in order to ensure that I-GRU does not overfit the training data, its other regularization techniques, such as the learning rate hyperparameter and the early stopping procedure, prevent the I-GRU model from becoming overconfident. Figs. 9 and 10 in the Appendix provide additional examples for a large variety of disaggregated CPI components.

6.5. HRNN dynamics

In what follows, we take a closer look at several characteristics of the HRNN model that result from the non-stationary nature of the CPI. The HRNN is a deep hierarchical learning model that requires substantial training time, depending on the available hardware. In this work, the HRNN model was trained once using the training dataset and evaluated on the test dataset, as explained above. In order to investigate the potential benefit of retraining the HRNN every quarter, we performed the following experiment: for a test-set period from 2001–2018, we retrained HRNN(4) after each quarter, each time adding the hierarchical CPI values of the last three months. Fig. 8 presents the results of this experiment. The dashed green line presents the RMSE of HRNN(4) with the regular training used in this work, while the dotted blue line presents the results of retraining the HRNN every quarter. As expected, in most cases, retraining the model with additional data from the recent period improves the results. However, this improvement is moderate and the overall model quality is about the same.

Fig. 8. Effect of retraining HRNN(4) each quarter.

Table 6
Average results on disaggregated CPI components prior to the GFC.

Model name                  RMSE per horizon (AR(1) = 1.00)      Correlation (at horizon = 0)
                            0     1     2     3     4     8      Pearson  Distance
AR(1)                       1.00  1.00  1.00  1.00  1.00  1.00   0.07     0.05
AR(2)                       1.00  1.00  1.00  1.00  1.00  1.00   0.08     0.06
AR(3)                       1.00  1.00  1.00  1.00  1.00  1.00   0.09     0.07
AR(4)                       1.00  1.00  1.00  1.00  1.00  1.00   0.09     0.07
AR-GAP(3)                   1.00  1.00  1.00  1.00  1.00  1.00   0.09     0.07
AR-GAP(4)                   1.00  1.00  1.00  1.00  1.00  1.00   0.10     0.07
RW(4)                       1.00  1.00  1.00  1.00  1.00  1.00   0.05     0.04
Phillips(4)                 1.00  1.00  0.99  0.99  1.00  1.00   0.05     0.03
VAR(1)                      1.04  1.04  1.04  1.05  1.05  1.06   0.04     0.03
VAR(2)                      1.03  1.04  1.04  1.04  1.05  1.05   0.05     0.03
VAR(3)                      1.03  1.03  1.03  1.04  1.04  1.05   0.06     0.03
VAR(4)                      1.02  1.03  1.03  1.03  1.03  1.04   0.06     0.04
LSTAR(ρ=4, c=2, γ=0.3)      1.05  1.06  1.05  1.08  1.09  1.10   0.08     0.06
RF(4)                       0.92  0.91  0.91  0.92  0.92  0.95   0.20     0.29
GBT(4)                      0.91  0.92  0.91  0.93  0.92  0.97   0.18     0.34
FC(4)                       0.99  0.99  1.00  1.00  1.02  1.05   0.11     0.08
Deep-NN(4)                  0.94  0.95  0.94  0.94  0.94  0.95   0.15     0.32
Deep-NN(4) + Unemployment   0.92  0.92  0.94  0.95  0.93  0.95   0.20     0.35
S-GRU(4)                    1.05  1.09  1.09  1.10  1.09  1.10   0.09     0.07
I-GRU(4)                    0.86  0.90  0.90  0.92  0.93  0.94   0.33     0.35
KNN-GRU(1)                  0.94  0.96  0.96  0.96  0.97  0.98   0.10     0.07
KNN-GRU(2)                  0.94  0.96  0.95  0.96  0.97  0.98   0.11     0.08
KNN-GRU(3)                  0.93  0.96  0.95  0.96  0.96  0.98   0.11     0.08
KNN-GRU(4)                  0.93  0.96  0.96  0.95  0.96  0.97   0.12     0.09
HRNN(1)                     0.85  0.89  0.90  0.92  0.91  0.94   0.23     0.27
HRNN(2)                     0.84  0.89  0.90  0.92  0.91  0.94   0.24     0.25
HRNN(3)                     0.84  0.89  0.89  0.92  0.91  0.93   0.28     0.34
HRNN(4)                     0.83  0.88  0.88  0.91  0.90  0.93   0.35     0.37

Notes: Average results across all 424 inflation indexes that make up the headline CPI. In contrast to Table 2, here we focus on the results up to the GFC of 2008. The RMSE results are relative to the AR(1) model and normalized according to its results, i.e., RMSE_Model / RMSE_AR(1). The results are statistically significant according to a Diebold–Mariano test with p < 0.05.
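The quarterly-retraining experiment follows an expanding-window pattern that can be sketched generically; here `fit` and `predict` are placeholders for any model (a trailing-mean forecaster in this illustration), not the HRNN training code itself:

```python
# Refit every `retrain_every` months on all data seen so far and score
# one-step-ahead squared errors over the evaluation period.
def expanding_window_rmse(series, start, retrain_every=3,
                          fit=lambda hist: sum(hist) / len(hist),
                          predict=lambda model, hist: model):
    model = fit(series[:start])
    sq_errors = []
    for t in range(start, len(series)):
        if t > start and (t - start) % retrain_every == 0:
            model = fit(series[:t])   # quarterly refit on the expanding window
        sq_errors.append((predict(model, series[:t]) - series[t]) ** 2)
    return (sum(sq_errors) / len(sq_errors)) ** 0.5
```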

In order to study the effect of the global financial crisis (GFC) on the HRNN's performance, we removed the data from 2008 onward and repeated the experiment of Table 2 using only the data from 1997 up to 2008. The results of this experiment are summarized in Table 6. In terms of the RMSE, the gains of the HRNN in Table 2 vary from 0.78 up to 0.80, in contrast to Table 6, where the gains vary from 0.83 to 0.93. This reveals that during the turmoil of the GFC, when the demand for reliable and precise forecasting tools is enhanced, the HRNN's forecasting abilities remained robust. In fact, its forecasting superiority was somewhat enhanced during the GFC when compared to the AR(1) baseline.

7. Concluding remarks

Policymakers have a wide range of predictive tools at

their disposal to forecast headline inflation: survey data,

1158

O. Barkan, J. Benchimol, I. Caspi et al. International Journal of Forecasting 39 (2023) 1145–1162

Table 7

Indexes, levels 0 and 1.

Level Index Parent

0 All items –

1 All items less energy All items

1 All items less food All items

1 All items less food and energy All items

1 All items less food and shelter All items

1 All items less food, shelter, and energy All items

1 All items less food, shelter, energy, and used cars and trucks All items

1 All items less homeowners costs All items

1 All items less medical care All items

1 All items less shelter All items

1 Apparel All items

1 Apparel less footwear All items

1 Commodities All items

1 Commodities less food All items

1 Durables All items

1 Education and communication All items

1 Energy All items

1 Entertainment All items

1 Food All items

1 Food and beverages All items

1 Fuels and utilities All items

1 Household furnishings and operations All items

1 Housing All items

1 Medical care All items

1 Nondurables All items

1 Nondurables less food All items

1 Nondurables less food and apparel All items

1 Other goods and services All items

1 Other services All items

1 Recreation All items

1 Services All items

1 Services less medical care services All items

1 Services less rent of shelter All items

1 Transportation All items

1 Utilities and public transportation All items

Note: Levels and parents of indexes might change through time.

expert forecasts, inflation swaps, economic and econo-

metric models, etc. However, policy institutions lack mod-

els and data to assist with forecasting CPI components,

which are essential for a deeper understanding of the

underlying dynamics. The understanding of disaggregated

inflation trends can provide insight into the nature of

future inflation pressures, their transitory factors (sea-

sonal factors, energy, etc.), and other factors that influ-

ence market-makers and the conduct of monetary policy,

among other decision-makers. Hence, our hierarchical ap-

proach uses endogenous historical data to forecast the

CPI at the disaggregated level, rather than forecasting

headline inflation, even if it performs well (Ibarra,2012).

The business cycle plays an important role in inflation

dynamics, particularly through specific product classes.

CPI inflation dynamics are sometimes driven by compo-

nents unrelated to central bank policy objectives, such as

food and energy prices. A disaggregated CPI forecast pro-

vides a more accurate picture of the sources and features

of future inflation pressures in the economy, which in

turn improves policymakers’ response efficiency. Indeed,

forecasting sectoral inflation may improve the optimiza-

tion problem faced by the central bank (Ida,2020).

While similar headline inflation forecasts may corre-

spond to various underlying economic factors, a disaggre-

gated perspective allows understanding and analyzing the

decomposition of these inflation forecasts at the sectoral

or component level. Instead of disaggregating inflation to

forecast the headline inflation (Stock & Watson,2020), our

approach allows policy- and market-makers to forecast

specific sector and component prices, where information

is less available: almost no component- or sector-specific

survey forecasts, expert forecasts, or market-based fore-

casts exist. For instance, a central bank could use such

modeling features to consider components that contribute

to inflation (military, food, cigarettes, and energy) un-

related to its primary inflation objectives to improve

their final assessment of their inflation forecasts. Sector-

specific inflation forecasts should also inform economic

policy recommendations at the sectoral level, and market-

makers can better direct and tune their investment strate-

gies (Swinkels,2018).

In traditional approaches for inflation forecasting, a

theoretical or a linear model is often used, which in-

evitably biases the estimated forecasts. Our novel ap-

proach may overcome the usual shortcomings of

traditional forecasts, giving policymakers new insights

from a different angle. Disaggregated forecasts include

explanatory variables with hierarchies that reduce mea-

surement errors at the component level. Additionally, our

model structure attenuates component-specific residuals

derived from each level and sector, resulting in improved

forecasting. For all these reasons, we believe that the

1159

O. Barkan, J. Benchimol, I. Caspi et al. International Journal of Forecasting 39 (2023) 1145–1162

Table 8
Indexes, level 2.

Level | Index | Parent
2 | All items less food and energy | All items less energy
2 | Apparel commodities | Apparel
2 | Apparel services | Apparel
2 | Commodities less food | Commodities
2 | Commodities less food and beverages | Commodities
2 | Commodities less food and energy commodities | All items less food and energy
2 | Commodities less food, energy, and used cars and trucks | Commodities
2 | Communication | Education and communication
2 | Domestically produced farm food | Food and beverages
2 | Education | Education and communication
2 | Energy commodities | Energy
2 | Energy services | Energy
2 | Entertainment commodities | Entertainment
2 | Entertainment services | Entertainment
2 | Food | Food and beverages
2 | Food at home | Food
2 | Food away from home | Food
2 | Footwear | Apparel
2 | Fuels and utilities | Housing
2 | Homeowners costs | Housing
2 | Household energy | Fuels and utilities
2 | Household furnishings and operations | Housing
2 | Infants’ and toddlers’ apparel | Apparel
2 | Medical care commodities | Medical care
2 | Medical care services | Medical care
2 | Men’s and boys’ apparel | Apparel
2 | Nondurables less food | Nondurables
2 | Nondurables less food and apparel | Nondurables
2 | Nondurables less food and beverages | Nondurables
2 | Nondurables less food, beverages, and apparel | Nondurables
2 | Other services | Services
2 | Personal and educational expenses | Other goods and services
2 | Personal care | Other goods and services
2 | Pets, pet products and services | Recreation
2 | Photography | Recreation
2 | Private transportation | Transportation
2 | Public transportation | Transportation
2 | Rent of shelter | Services
2 | Services less energy services | All items less food and energy
2 | Services less medical care services | Services
2 | Services less rent of shelter | Services
2 | Shelter | Housing
2 | Tobacco and smoking products | Other goods and services
2 | Transportation services | Services
2 | Video and audio | Recreation
2 | Women’s and girls’ apparel | Apparel

Note: Levels and parents of indexes have changed over the years.

HRNN can be a valuable tool for asset managers, policy institutions, and market-makers lacking component-specific price forecasts critical to their decision processes.
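The parent relation in Table 8 is the structure the HRNN exploits when information flows from higher to lower levels of the CPI hierarchy. As a toy illustration (a minimal Python sketch using only a handful of rows from the table, with an assumed "All items" root; this is not the paper's implementation), the hierarchy can be stored as a child-to-parent map, and any component's ancestor chain recovered by walking up:

```python
# Child -> parent map for a small, illustrative subset of the CPI hierarchy.
# Treating "All items" as the root is an assumption of this sketch.
PARENT = {
    "Household energy": "Fuels and utilities",
    "Fuels and utilities": "Housing",
    "Housing": "All items",
    "Food at home": "Food",
    "Food": "Food and beverages",
    "Food and beverages": "All items",
}

def ancestors(index_name):
    """Walk up the hierarchy and return the chain of parents up to the root."""
    chain = []
    while index_name in PARENT:
        index_name = PARENT[index_name]
        chain.append(index_name)
    return chain

print(ancestors("Household energy"))
# ['Fuels and utilities', 'Housing', 'All items']
```

A lookup of this kind determines which higher-level series inform each lower-level, more volatile component, in the spirit of the hierarchical information flow described above.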

The HRNN model was designed to predict disaggregated CPI components. However, we believe its merits may extend to other hierarchical time series, such as GDP. In future work, we plan to investigate the performance of the HRNN model on additional hierarchical time series. Moreover, in this paper, we focused mainly on endogenous models that do not consider other economic variables. The HRNN can naturally be extended to include additional variables as side information by changing the input of the GRU components to a multi-dimensional time series (instead of a one-dimensional vector). We plan to experiment with additional side information that can potentially improve prediction accuracy; in particular, we will experiment with online price data, as in Aparicio & Bertolotto (2020). Finally, we will try to replace the RNNs in the model with neural self-attention (Shaw et al., 2018), which we hope will lead to improved accuracy and better explainability through the analysis of attention scores (Hsieh et al., 2021).
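To make the proposed extension concrete, the following is a minimal GRU sketch in NumPy (an illustration under our own assumptions, not the paper's implementation): moving from an endogenous model to one with side information only requires widening the per-step input from one feature to several.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, W, U, b):
    """One GRU step. x: (n_features,), h: (hidden,).
    W: (3, hidden, n_features), U: (3, hidden, hidden), b: (3, hidden),
    indexed as 0 = update gate, 1 = reset gate, 2 = candidate state."""
    z = sigmoid(W[0] @ x + U[0] @ h + b[0])              # update gate
    r = sigmoid(W[1] @ x + U[1] @ h + b[1])              # reset gate
    h_tilde = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])  # candidate state
    return (1.0 - z) * h + z * h_tilde

def run_gru(series, hidden=8, seed=0):
    """Run a randomly initialized GRU over a (time, n_features) series."""
    rng = np.random.default_rng(seed)
    n_features = series.shape[1]
    W = rng.normal(scale=0.1, size=(3, hidden, n_features))
    U = rng.normal(scale=0.1, size=(3, hidden, hidden))
    b = np.zeros((3, hidden))
    h = np.zeros(hidden)
    for x in series:
        h = gru_step(x, h, W, U, b)
    return h

h_endo = run_gru(np.random.randn(24, 1))  # endogenous: inflation series only
h_side = run_gru(np.random.randn(24, 3))  # inflation plus two hypothetical side series
print(h_endo.shape, h_side.shape)  # (8,) (8,)
```

The same widening applies when using a library GRU: only the input size changes (from 1 to the number of series), while the recurrent state and the rest of the architecture are untouched.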

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix. Additional tables and figures

See Figs. 9 and 10 and Tables 7 and 8.


Fig. 9. Additional examples of HRNN(4) predictions for disaggregated indexes.

Fig. 10. Additional examples of HRNN(4) predictions for disaggregated indexes.

Indexes in Figs. 9 and 10 were selected from different hierarchies and sectors.

References

Almosova, A., & Andresen, N. (2019). Nonlinear inflation forecasting with recurrent neural networks. Technical Report, European Central Bank (ECB).
Aparicio, D., & Bertolotto, M. I. (2020). Forecasting inflation with online prices. International Journal of Forecasting, 36(2), 232–247.
Athey, S. (2018). The impact of machine learning on economics. In The economics of artificial intelligence: An agenda (pp. 507–547). University of Chicago Press.

Atkeson, A., & Ohanian, L. E. (2001). Are Phillips curves useful for forecasting inflation? Federal Reserve Bank of Minneapolis Quarterly Review, 25(1), 2–11.

Bernanke, B. S., Laubach, T., Mishkin, F. S., & Posen, A. S. (2018). Inflation targeting: Lessons from the international experience. Princeton, NJ: Princeton University Press.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Chakraborty, C., & Joseph, A. (2017). Machine learning at central banks. Bank of England Working Papers, No. 674.
Chen, X., Racine, J., & Swanson, N. R. (2001). Semiparametric ARX neural-network models with an application to forecasting inflation. IEEE Transactions on Neural Networks, 12(4), 674–683.
Choudhary, M. A., & Haider, A. (2012). Neural network models for inflation forecasting: An appraisal. Applied Economics, 44(20), 2631–2635.
Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
Clevert, D. A., Unterthiner, T., & Hochreiter, S. (2016). Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289.
Dey, R., & Salem, F. M. (2017). Gate-variants of gated recurrent unit (GRU) neural networks. In 2017 IEEE 60th international midwest symposium on circuits and systems (pp. 1597–1600). IEEE.
Diebold, F. X., & Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business & Economic Statistics, 13(3), 253–263.
van Dijk, D., Teräsvirta, T., & Franses, P. H. (2002). Smooth transition autoregressive models — A survey of recent developments. Econometric Reviews, 21(1), 1–47.
Faust, J., & Wright, J. H. (2013). Forecasting inflation. In G. Elliott, C. Granger, & A. Timmermann (Eds.), Handbook of economic forecasting: Vol. 2 (pp. 2–56). Elsevier.
Friedman, M. (1961). The lag in effect of monetary policy. Journal of Political Economy, 69, 447.
Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367–378.
Gilchrist, S., Schoenle, R., Sim, J., & Zakrajšek, E. (2017). Inflation dynamics during the financial crisis. American Economic Review, 107(3), 785–823.
Goulet Coulombe, P. (2020). To bag is to prune. arXiv e-prints, arXiv–2008.
Goulet Coulombe, P., Leroux, M., Stevanovic, D., & Surprenant, S. (2022). How is machine learning useful for macroeconomic forecasting? Journal of Applied Econometrics, in press.
Hansen, P. R., Lunde, A., & Nason, J. M. (2011). The model confidence set. Econometrica, 79(2), 453–497.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Hsieh, T.-Y., Wang, S., Sun, Y., & Honavar, V. (2021). Explainable multivariate time series classification: A deep neural network which learns to attend to important variables as well as time intervals. In Proceedings of the 14th ACM international conference on web search and data mining (pp. 607–615).
Ibarra, R. (2012). Do disaggregated CPI data improve the accuracy of inflation forecasts? Economic Modelling, 29(4), 1305–1313.
Ida, D. (2020). Sectoral inflation persistence and optimal monetary policy. Journal of Macroeconomics, 65(C).
Lipton, Z. C., Berkowitz, J., & Elkan, C. (2015). A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019.
Makridakis, S., Assimakopoulos, V., & Spiliotis, E. (2018). Objectivity, reproducibility and replicability in forecasting research. International Journal of Forecasting, 34(4), 835–838.
Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2020). The M4 competition: 100,000 time series and 61 forecasting methods. International Journal of Forecasting, 36(1), 54–74.
Mandic, D., & Chambers, J. (2001). Recurrent neural networks for prediction: Learning algorithms, architectures and stability. Wiley.
McAdam, P., & McNelis, P. (2005). Forecasting inflation with thick models and neural networks. Economic Modelling, 22(5), 848–867.
Medeiros, M., Vasconcelos, G., Veiga, A., & Zilberman, E. (2021). Forecasting inflation in a data-rich environment: The benefits of machine learning methods. Journal of Business & Economic Statistics, 39(1).
Mullainathan, S., & Spiess, J. (2017). Machine learning: An applied econometric approach. Journal of Economic Perspectives, 31(2), 87–106.
Nakamura, E. (2005). Inflation forecasting using a neural network. Economics Letters, 86(3), 373–378.
Olson, M., Wyner, A. J., & Berk, R. (2018). Modern neural networks generalize on small data sets. In Proceedings of the 32nd international conference on neural information processing systems (pp. 3623–3632).
Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions. arXiv preprint arXiv:1710.05941.
Schapire, R. E. (1999). A brief introduction to boosting. In Proceedings of the 16th international joint conference on artificial intelligence: Vol. 2 (pp. 1401–1406). San Francisco, CA: Morgan Kaufmann Publishers Inc.
Shaw, P., Uszkoreit, J., & Vaswani, A. (2018). Self-attention with relative position representations. arXiv preprint arXiv:1803.02155.
Song, Y.-Y., & Ying, L. (2015). Decision tree methods: Applications for classification and prediction. Shanghai Archives of Psychiatry, 27(2), 130.
Stock, J. H., & Watson, M. W. (2007). Why has US inflation become harder to forecast? Journal of Money, Credit and Banking, 39, 3–33.
Stock, J. H., & Watson, M. W. (2010). Modeling inflation after the crisis. Technical Report, National Bureau of Economic Research.
Stock, J. H., & Watson, M. W. (2020). Trend, seasonal, and sectoral inflation in the euro area. In G. Castex, J. Galí, & D. Saravia (Eds.), Changing inflation dynamics, evolving monetary policy (Central Banking, Analysis, and Economic Policies Book Series, Vol. 27, pp. 317–344). Central Bank of Chile.
Swinkels, L. (2018). Simulating historical inflation-linked bond returns. Journal of Empirical Finance, 48(C), 374–389.
Székely, G. J., Rizzo, M. L., & Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. The Annals of Statistics, 35(6), 2769–2794.
Woodford, M. (2012). Inflation targeting and financial stability. Sveriges Riksbank Economic Review, 1, 7–32.
Yu, Y., Si, X., Hu, C., & Zhang, J. (2019). A review of recurrent neural networks: LSTM cells and network architectures. Neural Computation, 31(7), 1235–1270.
Zahara, S., & Ilmiddaviq, M. (2020). Consumer price index prediction using long short term memory (LSTM) based cloud computing. Journal of Physics: Conference Series, 1456, Article 012022.
Zhou, Z. (2012). Measuring nonlinear dependence in time-series, a distance correlation approach. Journal of Time Series Analysis, 33(3), 438–457.
