Forecasting CPI Inflation Components with
Hierarchical Recurrent Neural Network
Advanced Analytics: New Methods and Applications
for Macroeconomic Policy Conference
Bank of England, European Central Bank & King’s College London
July 21, 2022
Jonathan Benchimol
Research Department
Bank of Israel
Coauthors: Oren Barkan, Itamar Caspi, Eliya Cohen, Allon Hammer, and Noam Koenigstein
This presentation does not necessarily reflect the views of the Bank of Israel
Outline
●Consumer Price Index and Dataset Properties
●Recurrent Neural Networks (RNNs)
●Hierarchical Recurrent Neural Networks (HRNN)
●Evaluation and Results
●Policy Implications and Conclusion
CPI and Dataset Properties
Consumer Price Index
●The Consumer Price Index (CPI) measures the average change over
time in the prices consumers pay for a basket of goods and services.
●The CPI quantifies the average cost of living in a given country by
estimating the purchasing power of a single unit of currency.
●The CPI is the key macroeconomic indicator for measuring inflation (or
deflation).
US Consumer Price Index
●In the US, the CPI is calculated and reported by the Bureau of Labor
Statistics (BLS) on a monthly basis.
●The BLS has classified all expenditure items into more than 200 categories,
arranged into eight major groups: (1) Housing, (2) Food and Beverages, (3)
Medical Care, (4) Apparel, (5) Transportation, (6) Energy, (7) Recreation,
and (8) Services.
●The consumer goods and services are grouped in a hierarchy of
increasingly detailed categories (levels).
Hierarchical Data Structure
•Level 0
Aggregated CPI across all components
•Level 1
Aggregated components (e.g., Energy, Apparel)
•Mid levels (2-5)
Fine-grained components, expenditure classes, item strata (e.g., Insurance)
•Lower levels (6-8)
Finer-grained components (e.g., Bacon, Tomatoes)
Hierarchical Data Structure
[Figure: diagram of the CPI component hierarchy]
Example
●The White Bread entry is classified under the following eight-level
hierarchy (a minimal code sketch follows the list):
All Items
Food and Beverages
Food at Home
Cereals and Bakery Products
Cereals and Cereal Products
Bakery Products
Bread
White Bread
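A toy Python sketch (hypothetical data structure, for illustration only) that encodes this chain as child-to-parent links and recovers the path:

```python
# Hypothetical sketch: each CPI component maps to its parent component,
# mirroring the eight-level chain above (not the paper's actual code).
PARENT = {
    "White Bread": "Bread",
    "Bread": "Bakery Products",
    "Bakery Products": "Cereals and Cereal Products",
    "Cereals and Cereal Products": "Cereals and Bakery Products",
    "Cereals and Bakery Products": "Food at Home",
    "Food at Home": "Food and Beverages",
    "Food and Beverages": "All Items",
}

def path_to_root(component):
    """Walk from a component up to the headline CPI (level 0)."""
    path = [component]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

print(" -> ".join(path_to_root("White Bread")))
```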
Forecasting CPI
●Central banks conduct monetary policy to achieve price stability (low and
stable inflation).
●Investors in fixed income assets (such as government bonds) estimate
future inflation to foresee upcoming trends in discounted real returns.
●Government and private debt management depend on the expected path
of inflation.
●Policymakers and market makers monitor CPI component levels (e.g., core
inflation, oil-related products).
Related Work
●Most related work deals with predicting the headline CPI only.
●Forecasts based on simple averages of past inflation are more accurate than
structural models [1].
●ML models based on exogenous features: online prices, house prices,
exchange rates, etc. [2].
●Feed-forward NNs to predict the inflation rate in 28 OECD countries;
in about 50% of the countries, the NNs were superior to autoregressive models [3].
[1] Makridakis, Assimakopoulos, Spiliotis. Objectivity, reproducibility and replicability in forecasting research. International Journal of Forecasting (2018).
[2] Medeiros, Vasconcelos, Veiga, Zilberman. Forecasting inflation in a data-rich environment: the benefits of machine learning methods. Journal of Business & Economic Statistics (2021).
[3] Choudhary, Haider. Neural network models for inflation forecasting: an appraisal. Applied Economics (2012).
Objective
•Our goal: Forecast US monthly CPI inflation for all components,
without exogenous features.
•Harness the hierarchical pattern of the data to improve prediction at
low levels.
•Utilize the sequential pattern of the data by employing Recurrent
Neural Networks.
•Improve predictions of volatile and non-stationary time series at
lower-level components.
Dataset
•CPI-U (the CPI for urban consumers) from 1994 to 2019, from the BLS.
•Monthly prices of 424 components, structured hierarchically.
•Each component is a time series of inflation rates belonging to a
level between 0 and 8.
•The training set comprises the earliest 70% of observations; the
remaining 30% form the test set.
Inflation rate: $\pi_t = \frac{C_t - C_{t-1}}{C_{t-1}}$, where $C_t$ is the CPI-U at time $t$.
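A minimal pandas sketch of this preprocessing; the file name and column layout are assumptions, not the actual BLS interface:

```python
import pandas as pd

# Hypothetical CSV: one column of CPI-U index levels per component.
cpi = pd.read_csv("cpi_u_components.csv", index_col="date", parse_dates=True)

# Month-over-month inflation rate of every component: (C_t - C_{t-1}) / C_{t-1}.
inflation = cpi.pct_change().dropna()

# Chronological 70/30 split: earliest observations train, latest test.
split = int(len(inflation) * 0.7)
train, test = inflation.iloc[:split], inflation.iloc[split:]
```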
Summary Statistics
[Table: summary statistics of the CPI components]
Volatility at Different Levels
[Figure: volatility by hierarchy level]
Volatility at Different Sectors
[Figure: volatility by sector]
Recurrent Neural Networks
Artificial Neural Networks
A neural network is a set of algorithms that attempts to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates.
Recurrent Neural Networks
•RNNs are neural networks that model sequences of data in which each
value is assumed to be dependent on previous values.
•RNNs are feed-forward networks augmented by including a feedback loop.
•RNNs introduce a notion of time to the standard feed-forward neural
networks and excel at modeling temporal dynamic behavior (Chung et al., 2014).
•Some RNN units retain an internal memory state from previous time steps
representing an arbitrarily long context window.
•Our paper covers the three most popular units: basic RNN, Long
Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU).
Basic RNN
A linear combination of the current input and the previous state is the
argument of a hyperbolic tangent activation function, $s_t = \tanh(w x_t + u s_{t-1} + b)$,
allowing the unit to model nonlinear relations between inputs and outputs.
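A one-function numpy sketch of this update; the scalar weights w, u and bias b are illustrative names:

```python
import numpy as np

def rnn_step(x_t, s_prev, w, u, b):
    # Linear combination of the current input and the previous state,
    # squashed by a hyperbolic tangent to capture nonlinear relations.
    return np.tanh(w * x_t + u * s_prev + b)
```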
Long Short-Term Memory Networks
•Basic RNNs suffer from the "short-term memory" problem: they use recent
history to forecast, but for long enough sequences they cannot carry relevant
information from earlier to later periods, e.g., relevant patterns from the
same month in previous years.
•Long Short-Term Memory networks (LSTMs) deal with this problem by
introducing gates that preserve relevant "long-term memory" and combine
it with the most recent data.
•The introduction of LSTMs paved the way for significant strides in various
fields such as NLP, speech recognition, robot control, and more.
Long Short-Term Memory Networks
•An LSTM unit can "memorize" or "forget" information through the use of
a special memory cell state.
•The cell state is carefully regulated by three gates that control the flow
of information in the LSTM unit: the input gate (i), the forget gate (f),
and the output gate (o).
•The cell state $C_t$ is updated by a combination of its previous state
$C_{t-1}$ and its current candidate $\tilde{C}_t$.
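For reference, the standard LSTM update equations in generic textbook notation (not necessarily the paper's symbols); the W, U, b terms are the learned parameters:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) &&\text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) &&\text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) &&\text{(output gate)}\\
\tilde{C}_t &= \tanh(W_C x_t + U_C h_{t-1} + b_C) &&\text{(candidate cell state)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
```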
Gated Recurrent Unit
•A Gated Recurrent Unit (GRU) is a later refinement of the LSTM unit
that drops the separate cell state in favor of a simpler unit with
fewer learnable parameters.
•GRUs are faster and more efficient, especially when training data is limited,
as in the case of inflation prediction (and especially of disaggregated
inflation components).
Gated Recurrent Unit
•The candidate activation $v$ is a function
of the input and the previous output.
•The output $s$ is a combination of the
candidate activation $v$ and the previous
output, controlled by the update gate $z$.
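A minimal numpy sketch of one scalar GRU step in the slide's notation (v, s, z); the parameter names and the reset gate r follow the standard GRU formulation and are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, s_prev, p):
    """One scalar GRU step; p is a dict of learned scalars (illustrative)."""
    z = sigmoid(p["w_z"] * x_t + p["u_z"] * s_prev + p["b_z"])        # update gate
    r = sigmoid(p["w_r"] * x_t + p["u_r"] * s_prev + p["b_r"])        # reset gate
    v = np.tanh(p["w_v"] * x_t + p["u_v"] * (r * s_prev) + p["b_v"])  # candidate
    return (1.0 - z) * s_prev + z * v                                 # new output s
```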
Hierarchical Recurrent Neural Networks
Hierarchical Recurrent Neural Networks
•The HRNN's network graph follows the CPI hierarchy.
•Each node is an RNN that models the inflation rate of a specific
component in the full CPI hierarchy.
•HRNN propagates information from RNN models at higher levels to
lower levels via hierarchical priors over the RNNs' learned weights.
•Expected result: better predictions for lower-level components.
HRNN Formulation
•Define a parametric function $g$ representing an RNN node in the hierarchy;
$g$ predicts a scalar value for the next entry of a series.
•Assume a normal likelihood relation between $g$ and the observed time series:
$$\pi^n_{t+1} \sim \mathcal{N}\left(g\big(\pi^n_{1:t};\,\theta_n\big),\ \tau_n^{-1}\right), \quad t = 1, \dots, T_n - 1,$$
where $\pi^n_t$ is the inflation rate at time $t$ of node $n$, $T_n$ the last
time step of node $n$, $\tau_n$ the precision variable of node $n$, and
$\theta_n$ the RNN learned parameters of node $n$.
HRNN Formulation
•Define a hierarchical network of normal priors over the nodes' parameters:
$$\theta_n \sim \mathcal{N}\left(\theta_{p(n)},\ \eta_n^{-1} I\right),$$
where $\theta_{p(n)}$ are the learned parameters of node $n$'s parent, and
$\eta_n$ is a precision parameter induced by the Pearson correlation
coefficient $\rho_n$ between the parent's and the child's time series,
together with an additional hyperparameter $\nu$.
•This models the relationship between a node's parameters and its parent's
in the hierarchy; the relationship grows stronger with the correlation
between the two series.
•It ensures that each node is kept close to its parent, in terms of squared
Euclidean distance in parameter space.
HRNN Formulation
•By Bayes' rule, the posterior probability is:
$$p(\Theta \mid \Pi, \mathrm{T}) \propto p(\Pi \mid \Theta, \mathrm{T})\, p(\Theta \mid \mathrm{T}).$$
•Maximum a posteriori (MAP) estimation maximizes the log-posterior:
$$\Theta^{*} = \arg\max_{\Theta}\ \big[\log p(\Pi \mid \Theta, \mathrm{T}) + \log p(\Theta \mid \mathrm{T})\big],$$
where $N$ enumerates all nodes from all levels, $\Pi$ aggregates all series
from all levels, $\Theta$ aggregates all learned parameters from all levels,
and $\mathrm{T}$ aggregates all precision parameters from all levels.
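Under this formulation, the hierarchical prior acts like an L2 penalty pulling each node's weights toward its parent's. A minimal numpy sketch of one node's negative log-posterior, up to additive constants (names such as tau_lik and tau_prior are illustrative, not the paper's):

```python
import numpy as np

def node_neg_log_posterior(preds, obs, theta, theta_parent, tau_lik, tau_prior):
    # Likelihood term: squared errors of the GRU predictions,
    # scaled by the node's precision.
    nll = 0.5 * tau_lik * np.sum((np.asarray(obs) - np.asarray(preds)) ** 2)
    # Prior term: squared Euclidean distance between the node's parameters
    # and its parent's, scaled by the correlation-induced precision.
    prior = 0.5 * tau_prior * np.sum((np.asarray(theta) - np.asarray(theta_parent)) ** 2)
    return nll + prior
```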
HRNN Based on GRUs
•HRNN implements $g$ as a scalar GRU.
•Specifically, each node $n$ is associated
with a GRU of its own.
•HRNN optimization proceeds by
stochastic gradient ascent on the
MAP objective.
HRNN Architecture
•Each node is a scalar GRU
predicting the inflation in the
next time step for the given
component.
•Constraints from the parent
node are propagated down to
the child node.
•GRUs are trained from top to
bottom (see the sketch below).
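A hypothetical sketch of this top-to-bottom schedule; Node and train_node are assumed helpers, not the paper's API:

```python
from collections import namedtuple

Node = namedtuple("Node", ["name", "parent"])  # minimal node record (assumed)

def train_top_down(levels, train_node):
    """Train per-node GRUs level by level (root first), so each child's
    prior is centered on its already-trained parent's weights."""
    trained = {}
    for level in levels:                             # levels ordered 0, 1, 2, ...
        for node in level:
            parent_theta = trained.get(node.parent)  # None at the root
            trained[node.name] = train_node(node, parent_theta)
    return trained
```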
HRNN Inference
•Equipped with trained weights $\theta_n$ for node $n$, the one-step-ahead
prediction is $\hat{\pi}^n_{T_n+1} = g\big(\pi^n_{1:T_n};\,\theta_n\big)$.
•The prediction for horizon $h > 1$ is obtained iteratively from the
predictions for previous horizons, each time feeding the previously
predicted value back as input to the GRU.
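A minimal sketch of this iterative procedure, reusing the gru_step function sketched earlier; since the scalar GRU's output after reading x_t serves as the prediction of x_{t+1}, each prediction is fed back as the next input:

```python
def forecast(gru_step, params, history, horizon):
    s = 0.0
    for x in history:                        # warm up on the observed series
        s = gru_step(x, s, params)
    preds = [s]                              # one-step-ahead forecast
    for _ in range(horizon - 1):
        s = gru_step(preds[-1], s, params)   # previous prediction as input
        preds.append(s)
    return preds
```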
Evaluation and Results
Evaluation Metrics
●Evaluation metrics:
○RMSE: $\sqrt{\tfrac{1}{T}\sum_{t=1}^{T}\big(\pi_t - \hat{\pi}_t\big)^2}$
○Pearson correlation between the actual series $\pi$ and the predicted series $\hat{\pi}$
○Distance correlation between $\pi$ and $\hat{\pi}$
where $\pi_t$ and $\hat{\pi}_t$ are the actual and predicted inflation rates
at month $t$, and $\pi$ and $\hat{\pi}$ the actual and predicted inflation
rate series, respectively.
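For concreteness, minimal numpy implementations under the standard definitions of these metrics (the distance correlation uses the usual biased sample estimator; this is a sketch, not the paper's code):

```python
import numpy as np

def rmse(actual, pred):
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return np.sqrt(np.mean((actual - pred) ** 2))

def pearson(actual, pred):
    return np.corrcoef(actual, pred)[0, 1]

def distance_correlation(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])                          # pairwise distances
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()  # double centering
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()                                       # squared distance covariance
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))
```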
Baselines
●Autoregressive (AR): estimates next month's value
based on the previous d months.
●Phillips Curve (PC): adds the unemployment rate u,
which should have an inverse relation with inflation.
●Random Walk (RW): simple average of the last d months.
●Auto-Regression in Gap Form (AR-GAP): detrends the
time series using RW, uses AR to predict the gap,
and finally adds the trend back to the prediction.
●Vector Autoregression (VAR): learns the K most similar
time series together.
●Logistic Smooth Transition Autoregressive Model
(LSTAR): an extension of AR that allows the model
parameters to change according to a transition
variable F (van Dijk et al., 2000). A sketch of the AR baseline follows.
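As an illustration of the simplest of these baselines, a least-squares AR(d) one-step forecast (a minimal sketch; d is the lag order):

```python
import numpy as np

def ar_forecast(series, d=12):
    """Fit y_t = c + phi_1 y_{t-1} + ... + phi_d y_{t-d} by least squares
    and return the forecast for the next month."""
    y = np.asarray(series, dtype=float)
    X = np.array([y[t - d:t] for t in range(d, len(y))])  # lagged regressors
    X = np.column_stack([np.ones(len(X)), X])             # intercept column
    beta, *_ = np.linalg.lstsq(X, y[d:], rcond=None)
    return np.concatenate(([1.0], y[-d:])) @ beta
```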
Ablation Study
●Single scalar GRU: one scalar GRU for all components;
assumes that the different components of the CPI hierarchy behave
similarly.
●HRNN without hierarchy: set the prior precisions to zero, removing the
hierarchical priors; equivalent to N independent GRU units.
●Fully connected neural network (FC): similar to autoregression
but with nonlinearities.
●Vectorial GRU based on K nearest neighbors: a different GRU
for each node n, but each input is a vector that includes the time
series of its k nearest (most correlated) series.
[Ablation figures] Prediction without the hierarchy: no advantage for the GRU compared to a simple AR model.
Results
[Table] Average results of the best HRNN model on disaggregated CPI components, by hierarchy level.
HRNN shows the best performance at the lower levels, where CPI components are more volatile.
Results
[Table] Average results of the best HRNN model on disaggregated CPI components, by sector.
HRNN shows the best performance in the Food and Beverages sector, which contains the most low-level
CPI components.
Conclusion
Conclusion
•The hierarchical nature of the model enables information
propagation from higher levels to lower ones.
•HRNNs are superior at predicting low-level inflation
components.
•Policy implications.