Impulse Response Analysis of Structural
Nonlinear Time Series Models
Giovanni Ballarin∗
University of Mannheim
October 13, 2023
First version: May 30, 2023
Abstract: Linear time series models are the workhorse of structural macroeconometric
analysis. Yet, economic theory as well as data suggest that nonlinear and asymmetric effects
might be key to understanding the potential effects of sudden economic changes. This paper
proposes a new semi-nonparametric sieve approach to estimate impulse response functions of
nonlinear time series within a general class of structural models. Using physical dependence
conditions, I prove that a two-step procedure can flexibly accommodate nonlinear specifications,
avoiding the choice of fixed parametric forms. Sieve impulse responses are proven
to be consistent by deriving uniform estimation guarantees, while an iterative algorithm
makes it straightforward to compute them in practice. Simulations show that the proposed
semi-nonparametric approach provides insurance against misspecification at minor efficiency
costs. In a US monetary policy application, I find that the sieve GDP response associated
with a rate hike is, at its peak, 16% larger than that of a linear model. Finally, when
studying interest rate uncertainty shocks, sieve responses imply up to 54% and 71% stronger
contractionary effects on production and inflation, respectively.
∗E-mail: giovanni.ballarin@gess.uni-mannheim.de. I thank Otilia Boldea, Timo Dimitriadis, Juan Carlos
Escanciano, Lyudmila Grigoryeva, Klodiana Istrefi, Marina Khismatullina, So Jin Lee, Yuiching Li, Sarah
Mouabbi, Andrey Ramirez, Christoph Rothe, Carsten Trenkler and Mengshan Xu, as well as the participants
of the Econometrics Seminar at the University of Mannheim, the 2023 ENTER Jamboree, the 10th HKMEtrics
Workshop, the GSS Weekly Seminar at Tilburg University and the Internal Econometrics Seminar at Vrije
Universiteit Amsterdam for their comments, suggestions and feedback.
1 Introduction
This paper presents a semi-nonparametric method to study the structural dynamic effects of
unpredictable shocks in a class of nonlinear time series models.
Linear models are the foundation of economic structural time series modeling. The nature
of linear models makes them especially tractable and apt at describing fundamental interactions
and processes. For example, large classes of macroeconomic models in modern New Keynesian
theory can be reduced to linear VARMA form via linearization techniques. This often justifies the
application of the linear time series toolbox from a theoretical point of view. Concurrently, the
work of Sims (1980) on VARs reinvigorated the strain of macroeconometric literature that seeks
to study dynamic economic relationships. Brockwell and Davis (1991), Hamilton (1994b) and
Lütkepohl (2005) provide detailed overviews of linear time series modeling and its developments.
When the objects of interest are solely dynamic effects, the local projection (LP) approach of
Jordà (2005) has also gained popularity as an alternative thanks to its flexibility and ease of
implementation. LPs do not directly impose a linear model on the conditional distribution
of the time series, but rather consist of linear lag regressions. Throughout this paper, the
key dynamic effect under discussion will be the impulse response function (IRF), which is the
common inference object of both linear VARMA and LP analyses.
Nonlinear methods seek to flexibly study the dependence structure between variables of
interest by accommodating a potentially complex model structure. In recent years, research in
nonlinear and asymmetric effects has grown, partly due to the increasing availability of data,
making it feasible to estimate more elaborate models (Fuleky, 2020). From a macroeconomic
perspective, one can imagine at least three broad categories of nonlinearities that may be important
to study. Sign-dependence of impulse responses is a potential key factor in the evaluation
of monetary policy, as the specific effects of an interest rate change might be mitigated if the
central bank implements a rate drop rather than a rate hike, while some others might be enhanced
(Debortoli et al., 2020). If impulse responses are size-sensitive, large shocks and small
shocks can have vastly different economic impacts, meaning that the policymaker must account
for nonlinear scaling in the intensity of an intervention (Tenreyro and Thwaites, 2016). Finally,
if the researcher’s objective lies in studying exogenous changes impacting a variable that is nonlinear
by definition, such as volatility indexes, any valid structural model should account for this
feature.
The main contribution of this work is the development of an approach that allows estimating
structural IRFs which can account for general nonlinear effects. This goal entails solving two
related issues: first, structural identification of shocks, so that it is possible to give a valid
economic interpretation to impulse responses; second, estimation of nonlinear functions in the
setting of dependent data. In a linear setup, identification and estimation can be considered as
distinct problems, but when working with nonlinear models these questions become intertwined.
Without specific assumptions, nonlinear model classes are much too vast in terms of complexity:
there are too many channels for any variable to affect any other. Disentangling such channels
thus becomes impossible, and one cannot structurally interpret IRFs and dynamic effects such
as multipliers. This problem can be solved by being more precise about the classes of models
one is willing to entertain. I consider the structural nonlinear framework originally proposed by
Gonçalves et al. (2021), which involves selecting one variable to identify the structural shocks of
interest, $X_t$, and treating it separately from all other series, a vector $Y_t$, included in the model.
By imposing a few additional assumptions on the dependence structure of innovations, one is able
to include general nonlinear effects of $X_t$ and its lags onto $Y_t$. By further allowing the lags of $Y_t$ to
influence $X_t$, this setup permits nonlinear dynamics to propagate to all variables over time. The
significant upside of this paradigm is that structural identification is built-in, instead of being
treated as a separate step. The latter path is most often taken in the literature by implementing
the generalized impulse response function (GIRF) proposed by Koop et al. (1996). Kilian and
Lütkepohl (2017) have, however, highlighted that common linear identification strategies such as
long-term and sign restrictions are generally impossible to impose in general nonlinear models,
since closed-form expressions are not available but in a handful of special cases.
A weakness of the framework in Gonçalves et al. (2021) is that it requires choosing a specific
functional form for the nonlinear components of the model, such as the negative-censoring
map or a cubic map. These are used to tease out the sign and size effects of shocks.¹ Yet,
correct prior knowledge of such terms is often unreasonable, especially in multivariate, multi-lag
models. The natural way to avoid selecting a parametric nonlinear specification is to resort
to semi-nonparametric techniques. Nonparametric time series methods have a long history in
econometrics (Härdle et al., 1997), but until recently not much progress has been made in
applying them to studying dynamic effects. Impulse response functions are objects that depend
on the global properties of the model and, to be more precise, defining an IRF requires iterating
shock perturbations over time. In a nonlinear model, the perturbation depends on the variables’
state, so that one must consider the shock’s effects across possible states. That is, different
features of the nonlinear model such as level, slope, curvature must be evaluated over a range
of values. Therefore, in this setting, an econometrician must provide error guarantees that are
uniform over the variables’ domain. In this work, I combine the uniform inference framework
of Chen and Christensen (2015) with the structural nonlinear time series scheme discussed
above. The general idea is to resort to semi-nonparametric series estimation and work in a
physical dependence setup (Wu, 2005). On the one hand, I argue that physical dependence is a
natural way of imposing assumptions that lead to estimable models, being more transparent than
standard mixing conditions. On the other hand, the series approach makes it easy to estimate
models with linear and nonlinear components of the type considered in this paper. It also
provides well-developed theoretical results to study uncertainty. Under appropriate regularity
assumptions, I show that a two-step semi-nonparametric series estimation procedure is able
to consistently recover the structural model in a uniform sense. This result encompasses the
generated regressors’ problem, which arises in the second step due to the structural identification
strategy. Lastly, I prove that the nonlinear impulse response function estimates obtained are
themselves asymptotically consistent and, thanks to an iterative algorithm, straightforward to
¹ The negative-censoring map applied to variable $a$ is $a \mapsto \max(a, 0)$.
compute in practice.
To validate the proposed methodology, I provide simulation evidence. The first set of results
shows that, with realistic sample sizes, the efficiency costs of the semi-nonparametric procedure
are small compared to correctly-specified parametric estimates. A second set of simulations
demonstrates that whenever the nonlinear parametric model is mildly misspecified the large-sample
bias is large, while for semi-nonparametric estimates it is negligible. Finally, I study how
the IRFs computed with the new method compare with the ones from two previous empirical
exercises. In a small, quarterly model of the US macroeconomy, I find that the parametric nonlinear
and linear models appear to underestimate the intensity of GDP responses by 13% and 16%,
respectively, after a large exogenous monetary policy shock. Moreover, sieve responses achieve
maximum impact a year before their linear counterparts. Then, I evaluate the effects of interest
rate uncertainty on US output, prices, and unemployment following Istrefi and Mouabbi (2018).
In this exercise, the impact on industrial production of a one-standard-deviation increase in uncertainty
is approximately 54% stronger according to semi-nonparametric IRFs than under the comparable linear
specification. These findings suggest that structural impulse responses predicated on linear
specifications might be appreciably underestimating shock effects.
Related Literature. Nonlinear models for dependent data have been extensively developed
with the aim of analyzing diverse types of series, see e.g. the monographs of Tong (1990), Fan
and Yao (2003), Gao (2007), Tsay and Chen (2018). Teräsvirta et al. (2010) provide a thorough
discussion of nonlinear economic time series modeling, but, by only presenting the generalized
IRF (GIRF) approach proposed by Koop et al. (1996), Potter (2000) and Gourieroux and Jasiak
(2005), they do not explicitly address structural analysis.
Parametric nonlinear specifications are common prescriptions, for example, in time-varying
models (Auerbach and Gorodnichenko, 2012; Caggiano et al., 2015) and state-dependent models
(Ramey and Zubairy, 2018). They have been and are commonly used in time-homogeneous
models. Kilian and Vega (2011) provide a structural analysis of the effects of GDP on oil
price shocks and, in contrast to previous literature, find that asymmetries play a negligible role:
they do this by including a negative-censoring transformation of the structural variable and
testing for significance. Caggiano et al. (2017), Pellegrino (2021) and Caggiano et al. (2021) use
interacted VAR models to estimate effects of uncertainty and monetary policy shocks. From a
finance perspective, Forni et al. (2023a,b) study the economic effects of financial shocks. Their
generalized VMA specification, which is based on that of Debortoli et al. (2020), requires that
innovations be transformed with the quadratic map.² Gambetti et al. (2022) study news shocks
asymmetries by imposing that news changes enter their autoregressive model with a pre-specified
threshold function.
Extensions of nonparametric methods to nonlinear time series have already been discussed in
the recent literature. For example, Kanazawa (2020) proposed to use radial basis function neural
networks to estimate a nonlinear time series model of the US macroeconomy. This work focuses
² I will discuss how their nonlinear model setup compares to the one I consider below.
on estimating the GIRF of Koop et al. (1996), with its structural limitations: productivity is
assumed to be a fully exogenous variable. Gourieroux and Lee (2023) provide a framework for
nonparametric kernel estimation and inference of IRFs via local projections. Yet, they primarily
work in the one-dimensional case and only mention economic identification in multivariate setups
from the perspective of linear VARs. The work possibly closest to the present paper seems to be
that of Lanne and Nyberg (2023), who develop a nearest-neighbor approach to impulse response
estimation that builds on the local projection idea and the GIRF concept. These papers, save
for Gourieroux and Lee (2023), do not fully develop an asymptotic theory for their estimators,
which makes it hard to judge the econometric assumptions under which they are applicable.
Outline. The remainder of this paper is organized as follows. Section 2 provides the general
framework for the structural model. Section 3 describes the two-step semi-nonparametric estimation
strategy, provides a thorough treatment of physical dependence assumptions and reports
the key uniform consistency guarantees. Section 4 is devoted to the discussion of nonlinear impulse
response function computation, validity and consistency. In Section 5, I report simulation
results that show the performance of the proposed method, while in Section 6 I discuss empirical
applications. Finally, Section 7 concludes. All proofs and additional technical results, as well as
secondary plots, can be found in Appendices B and C, respectively.
Notation. A (vector) random variable will be denoted in capital or Greek letters, e.g. $Y_t$
or $\epsilon_t$, while its realization will be in lowercase Latin letters, that is $y_t$. For a process $\{Y_t\}_{t \in \mathbb{Z}}$,
we write $Y_{t:s} = (Y_t, Y_{t+1}, \ldots, Y_{s-1}, Y_s)$, as well as $Y_{*:t} = (\ldots, Y_{t-2}, Y_{t-1}, Y_t)$ for the left-infinite
history and $Y_{t:*} = (Y_t, Y_{t+1}, Y_{t+2}, \ldots)$ for its right-infinite history. The same notation is also
used for random variable realizations. For a matrix $A \in \mathbb{R}^{d \times d}$ where $d \geq 1$, $\|A\|$ is the spectral
norm, $\|A\|_\infty$ is the supremum norm and $\|A\|_r$ for $0 < r < \infty$ is the $r$-operator norm. For a
random vector or matrix, I will use $\|\cdot\|_{L^r}$ to denote the associated $L^r$ norm.
2 Model Framework
In this section, I introduce the nonlinear time series model that will be considered throughout
the paper. This model setup generalizes the one developed in Gonçalves et al.
(2021) by letting the form of the nonlinear components remain unspecified until estimation. The
idea behind the partial structural identification scheme is simple: if $Z_t$ is the full vector of time
series of interest, one must choose one series, call it $X_t$, as the structural variable, and add
specific assumptions on its dynamic effects on the remaining series, vector $Y_t$. The central goal
will be the estimation of the impulse responses of $Y_t$ due to a shock in $X_t$.
2.1 Model
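Let $Z_t := (X_t, Y_t')'$ where $X_t \in \mathcal{X} \subseteq \mathbb{R}$ and $Y_t \in \mathcal{Y} \subseteq \mathbb{R}^{d_Y}$, and let $d = 1 + d_Y$ be the dimension
of $Z_t$. The structural nonlinear data generating process has form
$$B_0 Z_t = b + B(L) Z_{t-1} + F(L) X_t + \epsilon_t, \qquad (1)$$
where $b = (b_1, b_2')' \in \mathbb{R}^d$ and $\epsilon_t = (\epsilon_{1t}, \epsilon_{2t}')' \in \mathcal{E} \subseteq \mathbb{R}^d$ are partitioned accordingly. I assume that
model (1) imposes a linear dependence of observables on $Y_t$ and its lags, while series $X_t$ can
enter nonlinearly. That is, $B(L) = B_1 + B_2 L + \ldots + B_p L^{p-1}$ and $F(L) = F_0 + F_1 L + \ldots + F_p L^p$
are linear and functional lag polynomials, respectively.³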
Matrices $(F_0, \ldots, F_p)$ are functional in the sense that their entries consist of real univariate
functions. The product between $F(L)$ and $X_t$ is to be interpreted as functional evaluation, i.e.
$$F(L) X_t = \begin{bmatrix} f_{0,1}(X_t) \\ \vdots \\ f_{0,d}(X_t) \end{bmatrix} + \begin{bmatrix} f_{1,1}(X_{t-1}) \\ \vdots \\ f_{1,d}(X_{t-1}) \end{bmatrix} + \ldots + \begin{bmatrix} f_{p,1}(X_{t-p}) \\ \vdots \\ f_{p,d}(X_{t-p}) \end{bmatrix},$$
where $\{f_{j,l}\} \in \Lambda$ for $j = 0, \ldots, p$, $l = 1, \ldots, d$, and $\Lambda$ is a sufficiently regular function class.⁴ The
modeling choice to remain within the autoregressive time series class with additive lag structure
has two core advantages. First, it yields a straightforward generalization to classical linear
models (Lütkepohl, 2005; Kilian and Lütkepohl, 2017). Second, it keeps semi-nonparametric estimation
of nonlinear components feasible. Additivity in variables and lags means that the curse
of dimensionality involved with multivariate nonparametric estimation is effectively mitigated
(Fan and Yao, 2003).
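To make the "functional evaluation" reading of $F(L) X_t$ concrete, the following is a minimal Python sketch. The function name `eval_functional_lag_poly`, the nested-list layout and the particular functions are hypothetical choices made only for illustration; they are not taken from the paper.

```python
import numpy as np

def eval_functional_lag_poly(funcs, x_hist):
    """Evaluate F(L) X_t by functional evaluation.

    funcs[j][l] plays the role of f_{j,l+1}; x_hist[j] is the scalar X_{t-j}.
    Returns a length-d vector whose l-th entry is sum_j f_{j,l}(X_{t-j}).
    """
    n_lags = len(funcs)      # p + 1 lag terms (lags 0, ..., p)
    d = len(funcs[0])        # number of equations
    return np.array([
        sum(funcs[j][l](x_hist[j]) for j in range(n_lags))
        for l in range(d)
    ])

# Hypothetical example with p = 1 and d = 2 (illustrative functions only).
funcs = [
    [lambda x: 0.0, lambda x: max(x, 0.0)],  # lag 0: f_{0,1}, f_{0,2}
    [lambda x: 0.0, lambda x: x ** 2],       # lag 1: f_{1,1}, f_{1,2}
]
value = eval_functional_lag_poly(funcs, x_hist=[1.5, -2.0])
print(value)
```

Because each entry is a sum of univariate functions of a single lag, the number of functions to estimate grows linearly in $p$ and $d$, which is the additivity argument made in the text.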
Let the lag polynomials be given by
$$B(L) = \begin{bmatrix} B_{11}(L) & B_{12}(L) \\ B_{21}(L) & B_{22}(L) \end{bmatrix}, \qquad F(L) = \begin{bmatrix} 0 \\ F_{21}(L) \end{bmatrix}.$$
This structural formulation means that the model equation for $X_t$ is restricted to be linear in
all regressors. It also implies that $X_t$ does not depend contemporaneously on itself. Note that
as long as $B_{12}(L) \neq 0$, $X_t$ still depends upon nonlinear functions of its own lags, which enter
via lags of $Y_t$. Next, I impose that $B_0 \in \mathbb{R}^{d \times d}$ has the form
$$B_0 = \begin{bmatrix} 1 & 0 \\ -B_{0,12} & B_{0,22} \end{bmatrix},$$
where $B_{0,22}$ is non-singular and normalized to have unit diagonal. The structural model is thus
³ This is a minor abuse of notation compared to e.g. Lütkepohl (2005). The choice to use a matrix notation is
due to the ease and clarity of writing a (multivariate) additive nonlinear model such as (1) in a manner consistent
with standard linear VAR models. In cases where a real matrix $A \in \mathbb{R}^{d \times d}$ is multiplied with a conformable
functional matrix $F$, I simply assume the natural product of a scalar times a function, e.g. $A_{ij} F_{k\ell}$, where $F_{k\ell}$ is
a function, returning a new real function.
⁴ To fix ideas, one may think of $\Lambda^q(M)$, the Hölder function class of smoothness $q > 0$ and domain $M \subseteq \mathbb{R}$.
We shall make more precise assumptions regarding $\Lambda$ in Section 3 when discussing model estimation.
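given by
$$X_t = b_1 + B_{12}(L) Y_{t-1} + B_{11}(L) X_{t-1} + \epsilon_{1t},$$
$$B_{0,22} Y_t = b_2 + B_{22}(L) Y_{t-1} + B_{21}(L) X_{t-1} + B_{0,12} X_t + F_{21}(L) X_t + \epsilon_{2t}.$$
Moreover, it follows that $B_0^{-1}$ exists and has form
$$B_0^{-1} = \begin{bmatrix} 1 & 0 \\ B_0^{21} & B_0^{22} \end{bmatrix}.$$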
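As a quick check of the stated form of the inverse, one can multiply out the blocks. The explicit expressions for $B_0^{21}$ and $B_0^{22}$ below are not given in the text; they are the standard inverse of a block lower-triangular matrix, assuming only that $B_{0,22}$ is non-singular:

```latex
B_0^{-1} =
\begin{bmatrix} 1 & 0 \\ B_{0,22}^{-1} B_{0,12} & B_{0,22}^{-1} \end{bmatrix},
\qquad
B_0 B_0^{-1} =
\begin{bmatrix}
1 & 0 \\
-B_{0,12} + B_{0,22} B_{0,22}^{-1} B_{0,12} & B_{0,22} B_{0,22}^{-1}
\end{bmatrix}
= I_d .
```

Under this reading, $B_0^{21} = B_{0,22}^{-1} B_{0,12}$ and $B_0^{22} = B_{0,22}^{-1}$, so the first row of $B_0^{-1}$ leaves the equation for $X_t$ unchanged.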
The constraints on $B_0$ yield a structural identification assumption and require that $X_t$ be
pre-determined with respect to $Y_t$ (Gonçalves et al., 2021). By introducing
$$\mu := B_0^{-1} b, \qquad A(L) := B_0^{-1} B(L) \qquad \text{and} \qquad G(L) := B_0^{-1} F(L),$$
one thus obtains
$$X_t = \mu_1 + A_{12}(L) Y_{t-1} + A_{11}(L) X_{t-1} + \epsilon_{1t},$$
$$Y_t = \mu_2 + A_{22}(L) Y_{t-1} + A_{21}(L) X_{t-1} + G_{21}(L) X_t + B_0^{21} \epsilon_{1t} + B_0^{22} \epsilon_{2t}, \qquad (2)$$
or, equivalently,
$$Z_t = \mu + A(L) Z_{t-1} + G(L) X_t + u_t, \qquad (3)$$
where $u_t = [u_{1t}, u_{2t}']'$, $u_{1t} \equiv \epsilon_{1t}$ and $u_{2t} := B_0^{21} \epsilon_{1t} + B_0^{22} \epsilon_{2t}$. Given the structure of $B_0^{-1}$, one can
see that $A_{12}(L) \equiv B_{12}(L)$, $A_{11}(L) \equiv B_{11}(L)$ and $G_{11}(L) = 0$. Importantly, one must also notice
that $A_{21}(L)$ and $G_{21}(L) = B_0^{22} F_{21}(L)$ might now not be properly identified without further
assumptions. Since $A_{21}(L)$ is not necessarily zero, linear effects of lags of $X_t$ on $Y_t$ can enter
by means of both lag polynomials. To resolve this issue, I therefore assume that the functional
polynomial $G_{21}(L)$ contains, at lags greater than zero, only nonlinear components.⁵
Example 2.1. (A Simple Bivariate Model). To give a concrete example of (2), assume that
one wants to model the effects of monetary policy shocks on U.S. GDP growth following Romer
and Romer (2004). Then, let
$$X_t = \epsilon_{1t},$$
$$Y_t = \mu_2 + A_2 Y_{t-1} + G(X_t) + B_0^{21} \epsilon_{1t} + \epsilon_{2t},$$
where $X_t$ are the policy shocks, which are assumed to be i.i.d., while $Y_t$ is a macroeconomic
variable whose responses the researcher is interested in, e.g. GDP growth or PCE inflation.
This setup is very minimal, and I assume here, for the sake of simplicity, that endogeneity of
$\epsilon_{2t}$ does not pose a problem. Then, the term $G(X_t) + B_0^{21} \epsilon_{1t} \equiv G(\epsilon_{1t}) + B_0^{21} \epsilon_{1t} =: H(\epsilon_{1t})$ fully
captures any contemporaneous effect of monetary policy shocks on $Y_t$. When $G(\epsilon_{1t}) = 0$, $H(\epsilon_{1t})$
⁵ When using a semi-nonparametric estimation strategy with B-splines, this will be feasible to implement
numerically. When using wavelets, this also is a natural approach. In practice, however, some care must be taken
to avoid constructing collinear regression matrices.
and the model are linear. If $G(\epsilon_{1t}) = \beta_0 \max(0, \epsilon_{1t})$ for some $\beta_0 \neq 0$, function $H$ is piece-wise
linear: contractionary and expansionary shocks have, in general, different effects on $Y_t$, but
shocks with the same sign have proportional impact. As a final example, if $G(\epsilon_{1t}) = \beta_0 \epsilon_{1t}^3$ then
$H(\epsilon_{1t})$ is a third-degree polynomial, so that both sign and size of monetary policy shocks are
fundamental determinants of $Y_t$’s impulse response function. In principle, to correctly quantify
the repercussions of a specific monetary intervention a researcher must model all of these effects,
unless they have a strong prior belief that either or both can be safely ignored. More complex
nonlinear and asymmetric relations are also possible. A more robust strategy - as proposed in
the present work - is to avoid choosing $G$ (or $H$) as part of the model’s specification, but rather
to empirically estimate it jointly with all other coefficients.
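The three cases of Example 2.1 can be compared numerically. The sketch below evaluates $H(\epsilon) = G(\epsilon) + B_0^{21} \epsilon$ on a small grid of shocks; the values `B21 = 1.0` and `BETA0 = 0.5` are hypothetical and chosen purely for illustration.

```python
import numpy as np

B21 = 1.0    # hypothetical value of the linear impact coefficient B_0^{21}
BETA0 = 0.5  # hypothetical value of the nonlinear loading beta_0

def H(eps, G):
    """Contemporaneous impact H(eps) = G(eps) + B_0^{21} * eps."""
    return G(eps) + B21 * eps

def G_linear(e):
    return np.zeros_like(e)            # G = 0: H is linear

def G_censored(e):
    return BETA0 * np.maximum(0.0, e)  # piece-wise linear H

def G_cubic(e):
    return BETA0 * e ** 3              # third-degree polynomial H

shocks = np.array([-2.0, -1.0, 1.0, 2.0])
h_linear = H(shocks, G_linear)
h_censored = H(shocks, G_censored)
h_cubic = H(shocks, G_cubic)
for name, vals in [("linear", h_linear), ("censored", h_censored), ("cubic", h_cubic)]:
    print(name, vals)
```

In the censored case the impact of a positive shock differs from that of a negative shock of the same magnitude, while doubling a shock of a given sign doubles its impact. In the cubic case doubling the shock more than doubles the impact, which is the size effect described in the example.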
Remark 2.1. (Constrained Models). The general approach of leaving $F(L)$ unconstrained is
appealing when no precise economic intuition or information is available. However, there might
be cases where the functional form of the nonlinear component is either partially known, or can
be restricted. A simple restriction is that of a uniform functional over lags,
$$F(L) = F + F L + F L^2 + \ldots + F L^p.$$
This is a constraint effectively imposed by e.g. Gonçalves et al. (2021), Kilian and Vega (2011)
and other references. They do this by fully specifying $F$, but nonparametric constraints may be
desired, e.g. monotonicity. Constrained estimation of $F(L)$ is addressed in Remark 3.2 below.
The system of equations in (2) provides the so-called pseudo-reduced form model. By design,
one does not need to identify the model fully, meaning that fewer assumptions on $Z_t$ and $\epsilon_t$ are
needed to estimate the structural effects of $\epsilon_{1t}$ on $Y_t$. This comes at the cost of not being able
to simultaneously study structural effects with respect to $\epsilon_{2t}$. An associated problem is that,
in general, $G_{21}(L) X_t$ is correlated with innovation $u_{2t}$ through $B_0^{21} \epsilon_{1t}$. The main challenge to
structural shock identification of $\epsilon_{1t}$ thus lies in the fact that if $B_0^{21} \neq 0$ and $G_{21}(0) \neq 0$, there
is endogeneity in the equations for $Y_t$ since $X_t$ depends linearly on $\epsilon_{1t}$. Gonçalves et al. (2021)
address the issue by proposing a two-step estimation procedure wherein one explicitly controls
for $\epsilon_{1t}$ by using regression residuals $\hat{\epsilon}_t$. In Section 3 below, I show that this approach also allows
for consistent semi-nonparametric estimation of structural impulse responses.
Remark 2.2. (Identification Schemes). Forni et al. (2023a,b) provide an alternative nonlinear
structural identification framework to that of Gonçalves et al. (2021). Their approach was
originally introduced in Debortoli et al. (2020) and is based on the VMA form of the time series.
Using the current notation, suppose that the structural representation of $Z_t$ is given by
$$Z_t = b + Q(L) F(\epsilon_{1t}) + B(L) \epsilon_t$$
where $\epsilon_t$ are independent structural shocks with zero mean and identity covariance, while $\epsilon_{1t}$
identifies, e.g., financial innovations and shocks. $Q(L)$ and $B(L)$ are both linear lag polynomials
and $F$ is a nonlinear function to be specified by the researcher. Imposing some additional
assumptions, the reduced form assumed by Forni et al. (2023a) is
$$Z_t = \mu + A(L) Z_t + Q_0 F(\epsilon_{1t}) + B_0 \epsilon_t, \qquad (4)$$
where $F(x) = x^2$ in their baseline specification. Forni et al. (2023b) use an analogous model,
while Debortoli et al. (2020) also consider more general setups where $Q_0$ is replaced by a general
lag polynomial $D(L)$. These kinds of structural assumptions are similar but not identical to
the ones imposed in Gonçalves et al. (2021) and this paper. For (4) to overlap with (2), one
must assume that $X_t$ is exogenous and independently distributed, so that its level does not
affect the mapping of $\epsilon_{1t}$ through $F$. That is, (4) requires that only the shocks have nonlinear
effects, not the structural variable itself. The upside of this approach is that one can directly
and explicitly model asymmetry in the innovation process. The drawbacks are that, without a
clear identification of a structural variable, one must fully identify $B_0$. Moreover, function $F$
remains to be specified a priori. Note, however, that if innovation sequence $\epsilon_{1t}$ is observable, a
generalization of the semi-nonparametric estimation results of this paper to the framework of
Debortoli et al. (2020) would be straightforward.
I now state some preliminary assumptions for the model.
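Assumption 1. $\{\epsilon_{1t}\}_{t \in \mathbb{Z}}$ and $\{\epsilon_{2t}\}_{t \in \mathbb{Z}}$ are mutually independent time series such that
$$\begin{bmatrix} \epsilon_{1t} \\ \epsilon_{2t} \end{bmatrix} \overset{\text{i.i.d.}}{\sim} \left( 0, \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \Sigma_2 \end{bmatrix} \right)$$
where $\Sigma_2$ is a diagonal positive definite matrix.
Assumption 2. $\{Z_t\}_{t \in \mathbb{Z}}$ is strictly stationary, ergodic and such that $\sup_t E[|Z_t|] < \infty$.
Assumption 3. The roots of equation $\det(I_d - A(L)L) = 0$ are outside the complex unit circle.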
Assumption 1 follows Gonçalves et al. (2021). Assumption 2 is a high-level assumption
on the properties of process $\{Z_t\}_{t \in \mathbb{Z}}$ and is common in the analysis of structural time series.
Assumption 3 ensures that it is possible to invert lag polynomial $(I - A(L)L)$ in order to define
impulse responses, as done below. However, Assumptions 2 and 3 will not be sufficient to make
sure that (2) is estimable from data, and in Section 3 additional constraints on $A(L)$ and $G(L)$
will be required in order to apply semi-nonparametric estimation. Moreover, Assumption 2 is
not easily interpretable: functional lag polynomial $G(L)$ makes it impossible to reduce semi-structural
equations (2) to an explicit infinite moving average form.
I will resolve both the former (sufficiency) and latter (interpretability) issues by using the
nonlinear dynamic model framework outlined by Pötscher and Prucha (1997). It will allow
introducing regularity assumptions on the dependence of $Z_t$ which enable the derivation of
consistency of impulse response estimates.
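The root condition in Assumption 3 can be checked numerically for given linear coefficients. The sketch below uses the standard equivalence for linear VARs: the roots of $\det(I_d - A_1 z - \ldots - A_p z^p) = 0$ lie outside the unit circle exactly when all eigenvalues of the companion matrix have modulus below one. The function name `is_stable` and the example matrices are hypothetical; the check concerns only the linear part $A(L)$ and says nothing about the nonlinear component $G(L)$.

```python
import numpy as np

def is_stable(A_list):
    """Check the linear stability condition for A(L) = A_1 + A_2 L + ... + A_p L^{p-1}.

    A_list = [A_1, ..., A_p], each a d x d array. Returns True if every
    eigenvalue of the companion matrix lies strictly inside the unit circle.
    """
    d = A_list[0].shape[0]
    p = len(A_list)
    companion = np.zeros((d * p, d * p))
    companion[:d, :] = np.hstack(A_list)           # first block row: [A_1 ... A_p]
    if p > 1:
        companion[d:, :-d] = np.eye(d * (p - 1))   # identity blocks below
    eigenvalues = np.linalg.eigvals(companion)
    return bool(np.max(np.abs(eigenvalues)) < 1.0)

# Hypothetical coefficient matrices for a bivariate system with two lags.
stable = is_stable([0.5 * np.eye(2), 0.3 * np.eye(2)])
unstable = is_stable([0.5 * np.eye(2), 0.6 * np.eye(2)])
print(stable, unstable)
```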
2.2 Structural Nonlinear Impulse Responses
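Starting from pseudo-reduced equations (2), by letting $\Psi(L) = (I_d - A(L)L)^{-1}$ one can further
derive that
$$Z_t = \eta + \Theta(L) \epsilon_t + \Gamma(L) X_t, \qquad (5)$$
where
$$\eta := \Psi(1) \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \qquad \Theta(L) := \Psi(L) B_0^{-1}, \qquad \text{and} \qquad \Gamma(L) := \Psi(L) \begin{bmatrix} 0 \\ G_{21}(L) \end{bmatrix}.$$
To formally define impulse responses, it is useful to partition the polynomial $\Theta(L)$ according to
$$\Theta(L) := \begin{bmatrix} \Theta_{\cdot 1}(L) & \Theta_{\cdot 2}(L) \end{bmatrix},$$
where $\Theta_{\cdot 1}(L)$ represents the first column of matrices in $\Theta(L)$, and $\Theta_{\cdot 2}(L)$ the remaining $d_Y$
columns.
Given impulse $\delta \in \mathbb{R}$ at time $t$, define the shocked innovation process as $\epsilon_{1s}(\delta) = \epsilon_{1s}$ for
$s \neq t$ and $\epsilon_{1t}(\delta) = \epsilon_{1t} + \delta$, as well as the shocked structural variable as $X_s(\delta) = X_s$ for $s < t$ and
$X_s(\delta) = X_s(Z_{t-1}, \epsilon_t + \delta, \epsilon_{t+1}, \ldots, \epsilon_s)$ for $s \geq t$. Further, let
$$Z_{t+h} := \eta + \Theta_{\cdot 1}(L) \epsilon_{1,t+h} + \Theta_{\cdot 2}(L) \epsilon_{2,t+h} + \Gamma(L) X_{t+h},$$
$$Z_{t+h}(\delta) := \eta + \Theta_{\cdot 1}(L) \epsilon_{1,t+h}(\delta) + \Theta_{\cdot 2}(L) \epsilon_{2,t+h} + \Gamma(L) X_{t+h}(\delta),$$
be the time-$t$ baseline and shocked series, respectively. The unconditional impulse response is
given by
$$\mathrm{IRF}_h(\delta) = E[Z_{t+h}(\delta) - Z_{t+h}]. \qquad (6)$$
The difference between shock and baseline is clearly
$$Z_{t+h}(\delta) - Z_{t+h} = \Theta_{h, \cdot 1} \delta + \Gamma(L) X_{t+h}(\delta) - \Gamma(L) X_{t+h}$$
$$= \Theta_{h, \cdot 1} \delta + (\Gamma_0 X_{t+h}(\delta) - \Gamma_0 X_{t+h}) + \ldots + (\Gamma_h X_t(\delta) - \Gamma_h X_t),$$
therefore the unconditional IRF reduces to
$$\mathrm{IRF}_h(\delta) = \Theta_{h, \cdot 1} \delta + E[\Gamma_0 X_{t+h}(\delta) - \Gamma_0 X_{t+h}] + \ldots + E[\Gamma_h X_t(\delta) - \Gamma_h X_t]. \qquad (7)$$
Notice that, in (7), while one can linearly separate expectations in the impulse response
formula, terms $E[\Gamma_j X_{t+h-j}(\delta) - \Gamma_j X_{t+h-j}]$ for $0 \leq j \leq h$ cannot be meaningfully simplified.
Coefficients $\Gamma_j$ are functional, therefore it is not possible to collect them across $X_{t+h-j}(\delta)$ and $X_{t+h-j}$.
Moreover, these expectations involve nonlinear functions of lags of $X_t$ and cannot be computed
explicitly. To address this issue, Section 4 provides an iterative procedure that makes computation
of nonlinear impulse responses in (7) straightforward.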
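To illustrate definition (6), the sketch below approximates the unconditional IRF by brute-force Monte Carlo in the simple bivariate model of Example 2.1, feeding the same innovations to a baseline path and to a path shocked by $\delta$ at the impact date. This is not the iterative algorithm of Section 4; the function `simulate_irf`, the parameter values and the common initial condition $Y_{t-1} = 0$ are assumptions made only for illustration.

```python
import numpy as np

def simulate_irf(G, delta, horizons=8, n_paths=20000,
                 mu2=0.0, a2=0.5, b21=1.0, seed=0):
    """Monte Carlo estimate of E[Y_{t+h}(delta) - Y_{t+h}] for h = 0..horizons.

    Model (Example 2.1, hypothetical parameters):
        X_t = eps1_t,
        Y_t = mu2 + a2 * Y_{t-1} + G(X_t) + b21 * eps1_t + eps2_t.
    """
    rng = np.random.default_rng(seed)
    eps1 = rng.standard_normal((n_paths, horizons + 1))
    eps2 = rng.standard_normal((n_paths, horizons + 1))
    eps1_shocked = eps1.copy()
    eps1_shocked[:, 0] += delta          # shock only the impact-date innovation

    def path(e1):
        y = np.zeros(n_paths)            # common initial condition Y_{t-1} = 0
        out = np.empty((n_paths, horizons + 1))
        for h in range(horizons + 1):
            y = mu2 + a2 * y + G(e1[:, h]) + b21 * e1[:, h] + eps2[:, h]
            out[:, h] = y
        return out

    # Average the shocked-minus-baseline difference over simulated paths.
    return (path(eps1_shocked) - path(eps1)).mean(axis=0)

def censored(e):
    return 0.5 * np.maximum(0.0, e)      # hypothetical G with beta_0 = 0.5

irf_linear = simulate_irf(lambda e: np.zeros_like(e), delta=1.0, horizons=3)
irf_pos = simulate_irf(censored, delta=1.0)
irf_neg = simulate_irf(censored, delta=-1.0)
print(irf_linear, irf_pos[0], irf_neg[0])
```

With $G = 0$ the difference between the two paths is deterministic and equals $a_2^h \, b_{21} \, \delta$, the usual linear response. With the censored map, the response to a positive shock is larger in absolute value than the response to a negative shock of the same size, so the expectation in (6) no longer scales linearly in $\delta$.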
Remark 2.3. (Local Projection Approaches). As mentioned in the introduction, in recent years
there has been growing interest in nonlinear IRF estimation procedures, and, accordingly, ways to
generalize the LP framework. Jordà (2005) already suggested that nonlinear impulse responses
can, in principle, be directly estimated with local projections via the so-called flexible local
projection approach. The flexible LP method relies on the Volterra expansion of time series to
account for nonlinearities. There are multiple issues with this method. First, Jordà (2005) does
not directly state how the validity of Volterra series implies the autoregressive form used in the
LP regression. Second, the flexible LP proposal is fundamentally equivalent to adding polynomial
factors to the linear regression specification. Thus, it is effectively a semi-nonparametric method,
yet Jordà (2005) does not provide a theoretical analysis from this viewpoint. Moreover, no
criterion or empirical rule-of-thumb for selecting the truncation order of the Volterra expansion
is suggested, which becomes a key issue in practice. Due to these concerns, application of
flexible LPs seems hard to justify from an econometric perspective.⁶ Lanne and Nyberg (2023)
propose to nonparametrically recover the conditional mean function with a nearest-neighbor
(k-NN) regression estimator. Their method is very flexible, but requires appropriately choosing
the neighborhood size $k$ and a distance measure for histories of realizations, and the authors do
not theoretically address these issues. Very recently, Gourieroux and Lee (2023) have considered
nonlinear IRF estimation with kernel-based methods by means of a novel conditional quantile
representation of the process. They prove kernel LP estimators based on such representation
are consistent, and that the direct estimator is asymptotically normal. The theory is developed
only for the univariate case, with an autoregressive structure of lag order one, limiting the
applicability of their procedure.
3 Estimation
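Pseudo-reduced form model (2) can be compactly rewritten as
$$X_t = \Pi_1' W_{1t} + \epsilon_{1t},$$
$$Y_t = \Pi_2' W_{2t} + u_{2t}, \qquad (8)$$
where
$$\Pi_1 := (\eta_1, A_{1,11}, \cdots, A_{p,11}, A_{1,12}', \cdots, A_{p,12}')' \in \mathbb{R}^{1 + pd},$$
$$\Pi_2 := \begin{bmatrix} \eta_2 & G_{1,21} & \cdots & G_{p,21} & A_{1,22} & \cdots & A_{p,22} & B_0^{21} \end{bmatrix}',$$
$$Z_{t-1:t-p} := (X_{t-1}, \ldots, X_{t-p}, Y_{t-1}', \ldots, Y_{t-p}')' \in \mathbb{R}^{pd},$$
$$W_{1t} := (1, Z_{t-1:t-p}')' \in \mathbb{R}^{1 + pd},$$
$$W_{2t} := (1, X_t, Z_{t-1:t-p}', \epsilon_{1t})' \in \mathbb{R}^{3 + pd}.$$
Additionally, let $W_1 = (W_{11}, \ldots, W_{1n})'$ and $W_2 = (W_{21}, \ldots, W_{2n})'$ be the design matrices for $X_t$
and $Y_t$, respectively.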
⁶ Moreover, the complexity of estimating Volterra kernels grows exponentially with the kernel order, and thus
more sophisticated approaches have been proposed to make estimation feasible, see e.g. Sirotko-Sibirskaya et al.
(2020) and Movahedifar and Dickhaus (2023).
Two-step Estimation Procedure. Since $W_{2t}$ is an infeasible vector of regressors, to estimate
$\Pi_2$ one can use $\widehat{W}_{2t} = (1, X_t, Z_{t-1:t-p}', \hat{\epsilon}_{1t})'$, which now contains generated regressors in the form
of residual $\hat{\epsilon}_{1t}$. This approach is an adaptation of the two-step procedure put forth by Gonçalves
et al. (2021), where I allow for semi-nonparametric estimation:
1. Regress $X_t$ onto $W_{1t}$ to get estimate $\hat{\Pi}_1$ and compute residuals $\hat{\epsilon}_{1t} = X_t - \hat{\Pi}_1' W_{1t}$.
2. Fit $Y_t$ using $\widehat{W}_{2t}$ to get estimate $\hat{\Pi}_2$. Since $G_{1,21}, \ldots, G_{p,21}$ contain functional parameters,
a semi-nonparametric estimation method is required.
3. Compute coefficients in $\hat{\Theta}(L)$ and $\hat{\Gamma}(L)$ from $\hat{\Pi}_1$ and $\hat{\Pi}_2$.
4. Consider the two paths with time $t$ shocks $\epsilon_t + \delta$ versus $\epsilon_t$: to construct the unconditional
IRF, average over histories as well as future shocks by using the algorithm detailed in
Proposition 4.1 or Proposition 4.2.
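Steps 1 and 2 above can be sketched on simulated data as follows. This is only an illustration under strong simplifying assumptions that are not the paper's: one lag, scalar $Y_t$, a hypothetical data generating process in which the nonlinearity is $0.4 X_t^2$, and a low-order polynomial basis standing in for the B-spline or wavelet sieves mentioned in the text. The linear term in $X_t$ is left out of the basis because $X_t$ is an exact linear combination of $W_{1t}$ and $\hat{\epsilon}_{1t}$, which would make the second-step regression matrix collinear.

```python
import numpy as np

def simulate(n=5000, seed=0):
    """Hypothetical DGP: X_t = 0.5 X_{t-1} + e1_t,
    Y_t = 0.3 Y_{t-1} + 0.4 X_t^2 + 0.8 e1_t + e2_t."""
    rng = np.random.default_rng(seed)
    e1, e2 = rng.standard_normal(n), rng.standard_normal(n)
    x, y = np.zeros(n), np.zeros(n)
    for t in range(1, n):
        x[t] = 0.5 * x[t - 1] + e1[t]
        y[t] = 0.3 * y[t - 1] + 0.4 * x[t] ** 2 + 0.8 * e1[t] + e2[t]
    return x, y

def two_step_sieve(x, y, degree=3):
    """Two-step least squares with a polynomial sieve in X_t (one lag)."""
    x_t, x_lag, y_t, y_lag = x[1:], x[:-1], y[1:], y[:-1]
    ones = np.ones_like(x_t)
    # Step 1: regress X_t on (1, X_{t-1}, Y_{t-1}) and keep the residuals.
    W1 = np.column_stack([ones, x_lag, y_lag])
    pi1, *_ = np.linalg.lstsq(W1, x_t, rcond=None)
    eps1_hat = x_t - W1 @ pi1
    # Step 2: regress Y_t on nonlinear basis terms of X_t (powers 2..degree),
    # the lags, and the first-step residual as a generated regressor.
    basis = np.column_stack([x_t ** k for k in range(2, degree + 1)])
    W2 = np.column_stack([ones, basis, x_lag, y_lag, eps1_hat])
    pi2, *_ = np.linalg.lstsq(W2, y_t, rcond=None)
    return pi1, pi2, eps1_hat

x, y = simulate()
pi1, pi2, eps1_hat = two_step_sieve(x, y)
# pi2 layout: [const, x^2, x^3, X_{t-1}, Y_{t-1}, eps1_hat]
print("x^2 coefficient:", pi2[1], " residual coefficient:", pi2[-1])
```

In this design the coefficient on $X_t^2$ should be close to 0.4 and the coefficient on the residual close to 0.8, the value playing the role of $B_0^{21}$. Steps 3 and 4 are not covered by this sketch.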
Gonçalves et al. (2021) only allow for pre-determined nonlinear transforms of $X_t$. The core
contribution of this paper is allowing $G_{1,21}, \ldots, G_{p,21}$ to be estimated in a nonparametric way.
I focus on series estimation in order to build on the extensive theory available in the setting of
dependent data (Chen, 2013; Chen and Christensen, 2015). This further adds to the framework
of Gonçalves et al. (2021), as their regularity assumptions are stated only as preconditions for
a uniform LLN to hold and are not easy to interpret.
Remark 3.1. (Alternative Estimation Approaches). One does not need to limit estimation of the nonlinear functional parameters $G_{1,21}, \ldots, G_{p,21}$ to series-type estimators. The literature on nonparametric regression is mature, and thus kernel (Tsybakov, 2009), nearest-neighbor (Li and Racine, 2009), partitioning (Cattaneo et al., 2020) and deep neural network (Farrell et al., 2021) estimators are all potentially valid alternatives. For example, Huang et al. (2014) use kernel regression to perform density estimation and regression under physical dependence. However, thanks to both the availability of uniform inference results (see also Belloni et al., 2015) and ease of implementation, series methods stand out as a natural choice for semi-nonparametric time series estimation and nonlinear impulse response computation.
In the remainder of this section, I first introduce the semi-nonparametric series estimation strategy in detail. Then, I outline the core assumptions of the sieve setup. Special focus is put on the dependence structure of the data: rather than directly assuming β-mixing as in Chen and Christensen (2015), I shall consider physical dependence assumptions (Wu, 2005) to provide transparent conditions on the model itself that, if satisfied, ensure consistency. I prove that the proposed two-step semi-nonparametric procedure is uniformly consistent under physical dependence assumptions. These assumptions can be imposed directly on the model, and, as such, may be empirically checked, if necessary. The uniform asymptotic guarantees are first stated for the infeasible estimator involving the true innovations $\epsilon_{1t}$ and later extended to encompass the feasible estimator $\widehat{\Pi}_2$.
3.1 Semi-nonparametric Series Estimation
Starting from (8), one can introduce the $i$th-row coefficient matrices
$$
G_i^{21} = [G_{1,21} \cdots G_{p,21}]_i, \qquad A_i^{22} = [A_{1,22} \cdots A_{p,22}]_i,
$$
and $B_{0i}^{21}$ accordingly. Consider now the regression problem for each individual component of $Y_t$,
$$
Y_{it} = G_i^{21} X_{t:t-p} + A_i^{22} Y_{t-1:t-p} + B_{0i}^{21} \epsilon_{1t} + u_{2it},
$$
where $X_{t:t-p} := (X_t, \ldots, X_{t-p})'$ and $i = 1, \ldots, d_Y$. For simplicity of notation, I suppress the intercept $\eta_{2i}$, but this is without loss of generality. Since $G_i^{21}$ consists of $1+p$ functional coefficients and $A_i^{22}$ can be segmented into $p$ row vectors of length $d_Y$, it is possible to rewrite the above as
$$
Y_{it} = \sum_{j=0}^{p} g_{ij}^{21}(X_{t-j}) + \sum_{j=1}^{p} A_{ij}^{22} Y_{t-j} + B_{0i}^{21} \epsilon_{1t} + u_{2it}. \tag{9}
$$
I will use $\pi_{2,i} := [G_i^{21} \;\, A_i^{22} \;\, B_{0i}^{21}]'$ to identify the vector of coefficients in the equation for the $i$th component of $Y_t$. From (9), $\Pi_2'$ can be decomposed into $d_Y$ rows of coefficients, i.e.
$$
\begin{bmatrix} Y_{1t} \\ \vdots \\ Y_{d_Y t} \end{bmatrix}
=
\begin{bmatrix} \pi_{2,1} \\ \vdots \\ \pi_{2,d_Y} \end{bmatrix} W_{2t} + u_{2t}
$$
and one can treat each equation separately.
A semi-nonparametric series estimator for (9) is built on the idea that, if functions $g_{ij}^{21}$ belong to an appropriate functional space, one can construct a growing collection of sets of basis functions (called a sieve) which, linearly combined, progressively approximate $g_{ij}^{21}$. That is, one can reduce the infinite dimensional problem of estimating the functional coefficients in $\pi_{2,i}$ to a linear regression problem. Although (9) features a sum of possibly nonlinear functions in $\{X_{t-j}\}_{j=0}^{p}$, as well as linear functions of $\{Y_{t-j}\}_{j=1}^{p}$ and $\epsilon_{1t}$, constructing a sieve is straightforward.$^{7}$
Assume that $g_{ij}^{21} \in \Lambda$, where $\Lambda$ is a sufficiently regular function class to be specified in the following, and let $B_\Lambda$ be a sieve for $\Lambda$. Let $b_{1\kappa}, \ldots, b_{\kappa\kappa}$ be the collection of $\kappa \geq 1$ sieve basis functions from $B_\Lambda$ and define
$$
b^\kappa(x) := (b_{1\kappa}(x), \ldots, b_{\kappa\kappa}(x))', \qquad
B_\kappa := (b^\kappa(X_{1:1-p}), \ldots, b^\kappa(X_{n:n-p}))'.
$$
The sieve space for $\pi_{2,i}$ is $B_\Lambda^{1+p} \times \mathbb{R}^{1+pd_Y}$, where here $\mathbb{R}$ identifies the space of linear functions.
Since the nonparametric components of $\Pi_2$ are linearly separable in the lag dimension, I take $B_\Lambda^{1+p}$ to be a direct product of sieve spaces.$^{8}$ Importantly, the same sieve can be used for all components of $Y_t$, as I assume the specification of the model does not change across $i$.
7 See Chen (2007) for a comprehensive exposition of sieve estimation. Chen and Shen (1998) and Chen (2013) also provide additional examples of partially linear semi-nonparametric models under dependence.
Let $b_{\pi,1K}, \ldots, b_{\pi,KK}$ be the sieve basis in $B_\Lambda^{1+p} \times \mathbb{R}^{1+pd_Y}$ which, for $\kappa \geq 1$ and $K = p\kappa + (1 + pd_Y)$, is given by
$$
\begin{aligned}
b_{\pi,1K}(W_{2t}) &= b_{1\kappa}(X_t), \\
&\;\;\vdots \\
b_{\pi,(p\kappa)K}(W_{2t}) &= b_{\kappa\kappa}(X_{t-p}), \\
b_{\pi,(p\kappa+1)K}(W_{2t}) &= Y_{t-1,1}, \\
&\;\;\vdots \\
b_{\pi,(K-1)K}(W_{2t}) &= Y_{t-p,d_Y}, \\
b_{\pi,KK}(W_{2t}) &= \epsilon_{1t},
\end{aligned}
$$
where $\kappa$ fixes the size of the nonparametric component of the sieve. Note that $K$, the overall size of the sieve, grows linearly in $\kappa$, which itself controls the effective dimension of the nonparametric component of the sieve, $b_{\pi,1\kappa}, \ldots, b_{\pi,\kappa\kappa}$. In all theoretical results, I will focus on the growth rate of $K$ rather than $\kappa$, as asymptotically they differ at most by a constant multiplicative factor.
The regression equation for $\pi_{2,i}$ is
$$
Y_i = \pi_{2,i}' W_2 + u_{2i},
$$
where $Y_i = (Y_{i1}, \ldots, Y_{in})'$ and $u_{2i} = (u_{2i1}, \ldots, u_{2in})'$. The estimation target is the conditional expectation $\pi_{2,i}(w) = \mathbb{E}[Y_{it} \mid W_{2t} = w]$ under the assumption $\mathbb{E}[u_{2it} \mid W_{2t}] = 0$. By introducing
$$
b_\pi^K(w) := (b_{\pi,1K}(w), \ldots, b_{\pi,KK}(w))', \qquad
B_\pi := \left(b_\pi^K(W_{21}), \ldots, b_\pi^K(W_{2n})\right)',
$$
the infeasible least squares series estimator $\hat{\pi}_{2,i}^*(w)$ is given by
$$
\hat{\pi}_{2,i}^*(w) = b_\pi^K(w)' (B_\pi' B_\pi)^{-1} B_\pi' Y_i.
$$
Similarly, consider the feasible series regression matrices
$$
b_\pi^K(w) := (b_{\pi,1K}(w), \ldots, b_{\pi,KK}(w))', \qquad
\widehat{B}_\pi := \left(b_\pi^K(\widehat{W}_{21}), \ldots, b_\pi^K(\widehat{W}_{2n})\right)'.
$$
8 It is not necessary to consider the more general case of tensor products of 1D sieve functions, as would be the case for a general $(1+d_Y)$-dimensional function $G_i^{21}(X_t, X_{t-1}, \ldots, X_{t-p})$. As previously discussed, the additive structure avoids the curse of dimensionality, which in nonlinear time series modeling is often a primary concern when working with moderate sample sizes (Fan and Yao, 2003).
Thus, the feasible least squares series estimator is
$$
\hat{\pi}_{2,i}(w) = b_\pi^K(w)' (\widehat{B}_\pi' \widehat{B}_\pi)^{-1} \widehat{B}_\pi' Y_i.
$$
Given that the semi-nonparametric estimation problem is the same across $i$, to further streamline notation, where it does not lead to confusion I will let $\pi_2$ be a generic coefficient vector belonging to $\{\pi_{2,i}\}_{i=1}^{d_Y}$, as well as define $\hat{\pi}_2$, $Y$ and $u_2$ accordingly.
Remark 3.2. (Constrained Sieve Estimation). The idea of constrained estimation was only briefly touched upon in Remark 2.1. In fully parametric nonlinear models, constraints are often imposed out of necessity or simplicity. If, say, $G_{1,21}$ consists only of the negative-censoring map, it is unclear why $G_{2,21}$ would instead consist of quadratic or cubic functions, for example. That is, specific parametric assumptions can be either unreasonable or hard to justify in practice.$^{9}$ Yet, constrained semi-nonparametric estimation might be desirable at times. If the shape of the regression function is to be constrained to ensure e.g. non-negativity, monotonicity or convexity, Chen (2007) gives examples of shape-preserving sieves, like cardinal B-spline wavelets. Constraints on a generic sieve can also be imposed at estimation time. For example, for simplicity suppose $d_Y = 1$ and $p = 2$, and that one wants to impose $G_{1,21} = G_{2,21}$. The constrained sieve estimator then solves
$$
\min_\beta \sum_{t=p+1}^{n} \left(Y_t - \beta' b_\pi^K(W_{2t})\right)^2 \quad \text{subject to} \quad \left[I_\kappa, \, -I_\kappa, \, 0_{\kappa \times (1+pd_Y)}\right] \beta = 0.
$$
Analysis of restricted or constrained estimators, however, is still a challenging problem in nonparametric theory, cf. Horowitz and Lee (2017), Freyberger and Reeves (2018), Chetverikov et al. (2018). Misspecification in particular is complex to address. Accordingly, I will not be imposing any specific restrictions on the nonlinear functions in $\Pi_2$ outside the ones necessary to derive uniform asymptotic theory.
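The equality-constrained least squares problem in the remark has a closed-form solution through its first-order (KKT) conditions. The snippet below is a minimal sketch of that standard device, not the paper's implementation: the function name `constrained_ls`, the polynomial basis and the simulated regressors are all assumptions made purely for illustration, with two blocks of $\kappa = 3$ basis columns followed by $1 + pd_Y = 3$ linear columns.

```python
import numpy as np

def constrained_ls(B, y, R):
    """Minimise ||y - B beta||^2 subject to R beta = 0 by solving the KKT system."""
    K, m = B.shape[1], R.shape[0]
    kkt = np.block([[B.T @ B, R.T], [R, np.zeros((m, m))]])
    rhs = np.concatenate([B.T @ y, np.zeros(m)])
    return np.linalg.solve(kkt, rhs)[:K]  # drop the Lagrange multipliers

rng = np.random.default_rng(0)
n, kappa = 500, 3
x1, x2 = rng.uniform(0, 1, n), rng.uniform(0, 1, n)   # stand-ins for two lags of X
powers = np.arange(1, kappa + 1)
lin = rng.normal(size=(n, 3))                          # stand-ins for the linear regressors
B = np.column_stack([x1[:, None] ** powers, x2[:, None] ** powers, lin])
y = np.sin(x1) + np.sin(x2) + 0.1 * rng.normal(size=n)

# Restriction matrix [I_kappa, -I_kappa, 0]: both lags share the same function.
R = np.hstack([np.eye(kappa), -np.eye(kappa), np.zeros((kappa, lin.shape[1]))])
beta = constrained_ls(B, y, R)
print(beta[:kappa], beta[kappa:2 * kappa])  # the two blocks coincide
```

The KKT matrix is invertible whenever the basis matrix has full column rank and the restriction matrix has full row rank, which holds here by construction.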
Spline Sieve. The B-spline sieve $\mathrm{BSpl}(\kappa, [0,1]^{d_Y}, r)$ of degree $r \geq 1$ over $[0,1]^{d_Y}$ can be constructed using the Cox-de Boor recursion formula. Alternatively, an equivalent way of constructing the spline sieve is as follows. For simplicity, let $d_Y = 1$ and let $0 < m_1 < \ldots < m_{\kappa-r-1} < 1$ be a set of knots. Then
$$
b_{\mathrm{spline}}^\kappa(x) := \left(1, x, x^2, \ldots, x^r, \max(x - m_1, 0)^r, \ldots, \max(x - m_{\kappa-r-1}, 0)^r\right)'.
$$
The resulting spline sieve is piece-wise polynomial of degree $r$. Moreover, notice that in practice the spline sieve already contains a linear and constant term, so care must be taken to avoid collinearity (for example, by not including an additional intercept and linear term in $X_t$ in the series regression).
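The truncated power construction of the spline sieve translates directly into code. The sketch below evaluates the basis vector at a set of points; the function name `truncated_power_basis` and the example knots are illustrative choices, not taken from the paper.

```python
import numpy as np

def truncated_power_basis(x, knots, r=3):
    """Rows are (1, x, ..., x^r, max(x - m_1, 0)^r, ..., max(x - m_M, 0)^r)."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    poly = np.vander(x, r + 1, increasing=True)                 # 1, x, ..., x^r
    trunc = np.maximum(x[:, None] - knots[None, :], 0.0) ** r   # one column per knot
    return np.hstack([poly, trunc])

x = np.array([0.0, 0.5, 1.0])
basis = truncated_power_basis(x, knots=[0.25, 0.75], r=3)
print(basis.shape)  # kappa = r + 1 + number of knots columns
```

With $r = 3$ and two knots the sieve has $\kappa = 6$ elements, in line with the knot count $\kappa - r - 1$ used in the definition.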
9For more precise examples and a more in-depth discussion, see Section 2.1 of Chen (2013).
3.2 Distributional and Sieve Assumptions
To develop the asymptotic uniform consistency theory, I rely on the general theoretical frame-
work established by Chen and Christensen (2015). Basic distributional and sieve assumptions
can be carried over from their setup mostly unchanged.
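Assumption 4. (i) $\{\epsilon_t\}_{t \in \mathbb{Z}}$ are such that $\epsilon_t \overset{\text{i.i.d.}}{\sim} (0, \Sigma)$, (ii) $\{\epsilon_{1t}\}_{t \in \mathbb{Z}}$ and $\{\epsilon_{2t}\}_{t \in \mathbb{Z}}$ are mutually independent, (iii) $\epsilon_t \in \mathcal{E}$ for all $t \in \mathbb{Z}$ where $\mathcal{E} \subset \mathbb{R}^{d_Y}$ is compact, convex and has nonempty interior.
Assumption 5. (i) $\{Z_t\}_{t \in \mathbb{Z}}$ is a strictly stationary and ergodic time series, (ii) $X_t \in \mathcal{X}$ for all $t \in \mathbb{Z}$ where $\mathcal{X} \subset \mathbb{R}$ is compact, convex and has nonempty interior, (iii) $Y_t \in \mathcal{Y}$ for all $t \in \mathbb{Z}$ where $\mathcal{Y} \subset \mathbb{R}^{d_Y}$ is compact, convex and has nonempty interior.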
Assumptions 4(i)-(ii) are a repetition of Assumption 1. As $W_{2t}$ depends only on $X_{t:t-p}$, $Y_{t-1:t-p}$, and $\epsilon_{1t}$, Assumption 1 also implies that entries of $u_{2t}$ are independent of $W_{2t}$, so that $\mathbb{E}[u_{2it} \mid W_{2t}] = 0$.$^{10}$ Assumption 5(i) also follows from Assumption 2. However, thanks to the results derived in Section 3.3, below I will impose more primitive conditions on the model for $Z_t$ that allow one to recover 5(i). Assumption 4(iii) and Assumptions 5(ii)-(iii) imply that $X_t$, $Y_t$, as well as $\epsilon_t$ are bounded random variables. In (semi-)nonparametric estimation, imposing that $X_t$ be bounded almost surely is a standard assumption. Since lags of $Y_t$ and innovations $\epsilon_t$ contribute linearly to all components of $Z_t$, it follows that they too must be bounded. Unbounded regressors are more complex to handle when working in the nonparametric setting. Generalization from bounded to unbounded domains under dependence has already been discussed by e.g. Fan and Yao (2003). Chen and Christensen (2015) also allow for an expanding support by using weighted sieves. I leave this extension for future work.
It is, however, important to highlight that bounded support assumptions are relatively uncommon in time series econometrics. This is clear when considering the extensive literature available on linear models such as, e.g., state-space, VARIMA and dynamic factor models (Hamilton, 1994a; Lütkepohl, 2005; Kilian and Lütkepohl, 2017; Stock and Watson, 2016). Avoiding Assumptions 4(iii) and 5(iii) can possibly be achieved with a change in the model's equations (so that, for example, lags of $Y_t$ only affect $X_t$ either via bounded functions or not at all), but I do not discuss this approach here. In practice, Assumptions 4(ii) and 5(ii)-(iii) are not excessively restrictive, as most credibly stationary economic series often have reasonable implicit (e.g. inflation) or explicit bounds (e.g. employment rate).$^{11}$
Let $\mathcal{F}_t = \sigma(\ldots, \epsilon_{1t-1}, u_{2t-1}, Y_{t-1}, \epsilon_{1t}, u_{2t}, Y_t)$ be the natural filtration defined up to time $t$. Thanks to Assumptions 4 and 5 the following moment requirements hold trivially.
10 Moreover, for any given $i$, the sequence $\{u_{2it}\}_{t \in \mathbb{Z}}$ is i.i.d. over time index $t$.
11This is not true, of course, when modeling extreme events like natural disasters, wars or financial crises. To
study these types of series, however, researchers often apply specialized models. Thinking in this direction, a
future development could be to extend the framework presented here to allow for innovations with unbounded
support.
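Assumption 6. (i) $\mathbb{E}[u_{2it}^2 \mid \mathcal{F}_{t-1}]$ is uniformly bounded for all $t \in \mathbb{Z}$ almost surely, (ii) $\mathbb{E}[|u_{2it}|^{2+\delta}] < \infty$ for some $\delta > 0$, (iii) $\mathbb{E}[|Y_{it}|^{2+\delta}]$ is uniformly bounded for all $t \in \mathbb{Z}$ almost surely, and (iv) $\mathbb{E}[Y_{it}^2 \mid \mathcal{F}_{t-1}] < \infty$ for any $\delta > 0$.
Now let $\mathcal{W}_2 \subset \mathbb{R}^d$ be the domain of $W_{2t}$. By assumption, $\mathcal{W}_2$ is compact and convex and is given by the direct product
$$
\mathcal{W}_2 = \mathcal{X}^{1+p} \times \mathcal{Y}^{p} \times \mathcal{E}_1,
$$
where $\mathcal{E}_1$ is the domain of structural innovations $\epsilon_{1t}$, i.e. $\mathcal{E} \equiv \mathcal{E}_1 \times \mathcal{E}_2$.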
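Assumption 7. Define $\zeta_{K,n} := \sup_{w \in \mathcal{W}_2} \| b_\pi^K(w) \|$ and
$$
\lambda_{K,n} := \left[\lambda_{\min}\left(\mathbb{E}[b_\pi^K(W_{2t}) b_\pi^K(W_{2t})']\right)\right]^{-1/2}.
$$
It holds:
(i) There exist $\omega_1, \omega_2 \geq 0$ s.t. $\sup_{w \in \mathcal{W}_2} \| \nabla b_\pi^K(w) \| \lesssim n^{\omega_1} K^{\omega_2}$.
(ii) There exist $\omega_1 \geq 0$, $\omega_2 > 0$ s.t. $\zeta_{K,n} \lesssim n^{\omega_1} K^{\omega_2}$.
(iii) $\lambda_{\min}\left(\mathbb{E}[b_\pi^K(W_{2t}) b_\pi^K(W_{2t})']\right) > 0$ for all $K$ and $n$.
Assumption 7 provides mild regularity conditions on the families of sieves that can be used for the series estimator. More generally, letting $\mathcal{W}_2$ be compact and rectangular makes Assumption 7 hold for commonly used basis functions (Chen and Christensen, 2015).$^{12}$ In particular, Assumption 7(i) holds with $\omega_1 = 0$ since the domain is fixed over the sample size.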
In the proofs, it is useful to consider the orthonormalized sieve basis. Let
$$
\tilde{b}_\pi^K(w) := \mathbb{E}\left[b_\pi^K(W_{2t}) b_\pi^K(W_{2t})'\right]^{-1/2} b_\pi^K(w), \qquad
\widetilde{B}_\pi := \left(\tilde{b}_\pi^K(W_{21}), \ldots, \tilde{b}_\pi^K(W_{2n})\right)'
$$
be the orthonormalized vector of basis functions and the orthonormalized regression matrix, respectively.
Assumption 8. It holds that $\| (\widetilde{B}_\pi' \widetilde{B}_\pi / n) - I_K \| = o_P(1)$.
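To build intuition for Assumption 8, one can compute the deviation of the orthonormalized sample Gram matrix from the identity in a case where the population second-moment matrix is known. The sketch below makes simplifying assumptions that go beyond the paper: regressors are i.i.d. uniform on $[0,1]$ (so there is no dependence), and the basis is the monomial basis $(1, x, \ldots, x^{K-1})$, for which $\mathbb{E}[b\,b']$ is the Hilbert matrix with entries $1/(i+j+1)$ for $i, j = 0, \ldots, K-1$. The function name `gram_deviation` is hypothetical.

```python
import numpy as np

def gram_deviation(n, K, rng):
    """Spectral norm of (B~' B~ / n) - I_K for a monomial basis and U[0,1] data."""
    x = rng.uniform(0.0, 1.0, size=n)
    B = np.vander(x, K, increasing=True)               # rows b(x) = (1, x, ..., x^{K-1})
    idx = np.arange(K)
    G = 1.0 / (idx[:, None] + idx[None, :] + 1.0)      # population E[b b'] (Hilbert matrix)
    evals, evecs = np.linalg.eigh(G)
    G_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    B_tilde = B @ G_inv_sqrt                           # orthonormalized regression matrix
    return np.linalg.norm(B_tilde.T @ B_tilde / n - np.eye(K), 2)

rng = np.random.default_rng(0)
for n in (500, 5000, 50000):
    print(n, gram_deviation(n, K=3, rng=rng))
```

The monomial basis is used only because its population Gram matrix is available in closed form; it becomes badly conditioned as $K$ grows, so this check is not meant for large sieves.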
Assumption 8 is the key assumption imposed by Chen and Christensen (2015) to derive uniform convergence rates under dependence. They prove that if $\{W_{2t}\}_{t \in \mathbb{Z}}$ is strictly stationary and β-mixing (with either geometric or algebraic decay, depending on the sieve family of interest), then Assumption 8 holds. Let $(\Omega, \mathcal{Q}, \mathbb{P})$ be the underlying probability space and define
$$
\beta(\mathcal{A}, \mathcal{B}) := \frac{1}{2} \sup \sum_{(i,j) \in I \times J} \left| \mathbb{P}(A_i \cap B_j) - \mathbb{P}(A_i) \mathbb{P}(B_j) \right|
$$
where $\mathcal{A}, \mathcal{B}$ are two σ-algebras, $\{A_i\}_{i \in I} \subset \mathcal{A}$, $\{B_j\}_{j \in J} \subset \mathcal{B}$ and the supremum is taken over all finite partitions of $\Omega$. The $h$-th β-mixing coefficient of process $\{W_{2t}\}_{t \in \mathbb{Z}}$ is defined as
$$
\beta(h) = \sup_t \beta\left(\sigma(\ldots, W_{2t-1}, W_{2t}), \, \sigma(W_{2t+h}, W_{2t+h+1}, \ldots)\right),
$$
and $W_{2t}$ is said to be geometric or exponential β-mixing if $\beta(h) \leq \gamma_1 \exp(-\gamma_2 h)$ for some $\gamma_1 > 0$ and $\gamma_2 > 0$. The main issue with mixing assumptions is that they are, in general, hard to compute and evaluate. Therefore, especially in nonlinear systems, assuming that $\beta(h)$ decays exponentially over $h$ imposes very high-level assumptions on the model. There are, however, many setups in which it is known that β-mixing holds under primitive assumptions (see Chen (2013) for examples).
12 See Chen (2007), Belloni et al. (2015) for additional discussion and examples of sieve families.
In the next subsection, I will argue that using a different concept of dependence - one rooted
in a physical understanding of the underlying stochastic process - leads to imposing transparent
assumptions on the model’s structure.
3.3 Physical Dependence Conditions
Consider now a non-structural model of the form
$$
Z_t = G(Z_{t-1}, \epsilon_t). \tag{10}
$$
This is a generalization of semi-reduced model (3) where linear and nonlinear components are absorbed into one functional term and $B_0$ is the identity matrix.$^{13}$ Indeed, note that models of the form $Z_t = G(Z_{t-1}, \ldots, Z_{t-p}, \epsilon_t)$ can be rewritten as (10) using a companion formulation. If $\epsilon_t$ is stochastic, (10) defines a causal nonlinear stochastic process. More generally, it defines a nonlinear difference equation and an associated dynamical system driven by $\epsilon_t$. Throughout this subsection, I shall assume that $Z_t \in \mathcal{Z} \subseteq \mathbb{R}^{d_Z}$ as well as $\epsilon_t \in \mathcal{E} \subseteq \mathbb{R}^{d_Z}$.
Relying on the framework of Pötscher and Prucha (1997), I now introduce explicit conditions that make it possible to control dependence in nonlinear models by using the toolbox of physical dependence measures developed by Wu (2005, 2011). The aim is to use a dynamical system perspective to address the question of imposing meaningful assumptions on nonlinear dynamic models. This makes it possible to give more primitive conditions under which one can actually estimate (8) in a semi-nonparametric way.
Stability. An important concept in dynamical system theory is that of stability. Stability turns out to play a key role in constructing valid asymptotic theory, as is well understood in linear models. It is also fundamental in developing the approximation theory of nonlinear stochastic systems.
Example 3.1. (Linear System). As a motivating example, first consider the linear system
$$
Z_t = B Z_{t-1} + \epsilon_t
$$
where we may assume that $\{\epsilon_t\}_{t \in \mathbb{Z}}$, $\epsilon_t \in \mathbb{R}^{d_Z}$, is a sequence of i.i.d. innovations.$^{14}$ It is well known that this system is stable if and only if the largest eigenvalue of $B$ is strictly less than one in absolute value (Lütkepohl, 2005). For a higher order linear system, $Z_t = B(L) Z_{t-1} + \epsilon_t$ where $B(L) = B_1 + B_2 L + \ldots + B_p L^{p-1}$, stability holds if and only if $|\lambda_{\max}(\mathbf{B})| < 1$ where
$$
\mathbf{B} :=
\begin{bmatrix}
B_1 & B_2 & \cdots & B_p \\
I_{d_Z} & 0 & \cdots & 0 \\
0 & I_{d_Z} & \cdots & 0 \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & I_{d_Z} & 0
\end{bmatrix}
$$
is the companion matrix.
13 In this specific subsection, shock identification does not play a role and, as such, one can safely ignore $B_0$.
14 One could alternatively think of the case of a deterministic input, setting $\epsilon_t \sim P_t(a_t)$ where $P_t(a_t)$ is a Dirac density on the deterministic sequence $\{a_t\}_{t \in \mathbb{Z}}$.
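The eigenvalue condition of Example 3.1 is easy to check numerically. The following minimal sketch builds the companion matrix from a list of coefficient blocks and tests whether its spectral radius is below one; the helper names `companion` and `is_stable` and the AR(2) coefficients are illustrative assumptions.

```python
import numpy as np

def companion(blocks):
    """Companion matrix of Z_t = B_1 Z_{t-1} + ... + B_p Z_{t-p} + eps_t."""
    p, d = len(blocks), blocks[0].shape[0]
    top = np.hstack(blocks)
    if p == 1:
        return top
    bottom = np.hstack([np.eye(d * (p - 1)), np.zeros((d * (p - 1), d))])
    return np.vstack([top, bottom])

def is_stable(blocks):
    """True if the largest eigenvalue modulus of the companion matrix is below one."""
    return np.max(np.abs(np.linalg.eigvals(companion(blocks)))) < 1.0

# Scalar AR(2) examples, each coefficient stored as a 1 x 1 block.
print(is_stable([np.array([[0.5]]), np.array([[0.3]])]))
print(is_stable([np.array([[0.9]]), np.array([[0.3]])]))
```

The same functions apply unchanged to vector systems, since each block may be any $d_Z \times d_Z$ array.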
Extending the notion of stability from linear to nonlinear systems requires some care. Pötscher and Prucha (1997) derived generic conditions that allow one to formally extend stability to nonlinear models by first analyzing contractive systems.
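Definition 3.1 (Contractive System). Let $Z_t \in \mathcal{Z} \subseteq \mathbb{R}^{d_Z}$, $\epsilon_t \in \mathcal{E} \subseteq \mathbb{R}^{d_Z}$, where $\{Z_t\}_{t \in \mathbb{Z}}$ is generated according to
$$
Z_t = G(Z_{t-1}, \epsilon_t).
$$
The system is contractive if for all $(z, z') \in \mathcal{Z} \times \mathcal{Z}$ and $(e, e') \in \mathcal{E} \times \mathcal{E}$
$$
\| G(z, e) - G(z', e') \| \leq C_Z \| z - z' \| + C_\epsilon \| e - e' \|
$$
holds with Lipschitz constants $0 \leq C_Z < 1$ and $0 \leq C_\epsilon < \infty$.
Sufficient conditions to establish contractivity are
$$
\sup \left\{ \left\| \operatorname{stack}_{i=1}^{d_Z} \left[ \frac{\partial G}{\partial Z}(z^i, e^i) \right]_i \right\| \;\middle|\; z^i \in \mathcal{Z}, \, e^i \in \mathcal{E} \right\} < 1 \tag{11}
$$
and
$$
\left\| \frac{\partial G}{\partial \epsilon} \right\| < \infty, \tag{12}
$$
where the stacking operator $\operatorname{stack}_{i=1}^{d_Z}[\cdot]_i$ progressively stacks the rows, indexed by $i$, of its argument (which can be changing with $i$) into a matrix. Values $(z^i, e^i) \in \mathcal{Z} \times \mathcal{E}$ change with index $i$ because the above condition is derived using the mean value theorem; therefore, it is necessary to consider a different set of values for each component of $Z_t$.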
It is easy to see, as Pötscher and Prucha (1997) point out, that contractivity is often too strong a condition to impose. Indeed, even in the simple case of a scalar AR(2) model $Z_t = b_1 Z_{t-1} + b_2 Z_{t-2} + \epsilon_t$, regardless of the values of $b_1, b_2 \in \mathbb{R}$ contractivity is violated. This is due to the fact that in a linear AR(2) model studying contractivity reduces to checking $\|\mathbf{B}\| < 1$ instead of $|\lambda_{\max}(\mathbf{B})| < 1$, and the former is a stronger condition than the latter.$^{15}$ One can weaken contractivity (which must hold for $G$ as a map from $Z_{t-1}$ to $Z_t$) to the idea of eventual contractivity. That is, intuitively, one can impose conditions on the dependence of $Z_{t+h}$ on $Z_t$ for $h > 1$ sufficiently large. To do this formally, I first introduce the definition of system map iterates.
15 See Pötscher and Prucha (1997), pp. 68-69.
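Definition 3.2 (System Map Iterates). Let $Z_t \in \mathcal{Z} \subseteq \mathbb{R}^{d_Z}$, $\epsilon_t \in \mathcal{E} \subseteq \mathbb{R}^{d_Z}$ where $\{Z_t\}_{t \in \mathbb{Z}}$ is generated from a sequence $\{\epsilon_t\}_{t \in \mathbb{Z}}$ according to
$$
Z_t = G(Z_{t-1}, \epsilon_t).
$$
The $h$-order system map iterate is defined to be
$$
\begin{aligned}
G^{(h)}(Z_t, \epsilon_{t+1}, \epsilon_{t+2}, \ldots, \epsilon_{t+h}) &:= G(G(\cdots G(Z_t, \epsilon_{t+1}) \cdots, \epsilon_{t+h-1}), \epsilon_{t+h}) \\
&= G(\cdot, \epsilon_{t+h}) \circ G(\cdot, \epsilon_{t+h-1}) \circ \cdots \circ G(Z_t, \epsilon_{t+1}),
\end{aligned}
$$
where $\circ$ signifies function composition and $G^{(0)}(Z_t) = Z_t$.
To shorten notation, in place of $G^{(h)}(Z_t, \epsilon_{t+1}, \epsilon_{t+2}, \ldots, \epsilon_{t+h})$ I shall use $G^{(h)}(Z_t, \epsilon_{t+1:t+h})$. Additionally, for $1 \leq j \leq h$, the partial derivative
$$
\frac{\partial G^{(h^*)}}{\partial \epsilon_j}
$$
for some fixed $h^*$ is to be understood with respect to $\epsilon_{t+j}$, the $j$-th entry of the input sequence. This derivative does not depend on the time index since by assumption $G$ is time-invariant and so is $G^{(h)}$.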
Taking again the linear autoregressive model as an example,
$$
Z_{t+h} = G^{(h)}(Z_t, \epsilon_{t+1:t+h}) = B_1^h Z_t + \sum_{i=0}^{h-1} B_1^i \epsilon_{t+h-i}
$$
since $G(z, \epsilon) = B_1 z + \epsilon$. If $B_1$ determines a stable system, then $\|B_1^h\| \to 0$ as $h \to \infty$ since $G^{h}$ converges to zero, and therefore $\|B_1^h\| \leq C_Z < 1$ for $h$ sufficiently large. It is thus possible to use system map iterates to define stability for higher-order nonlinear systems.
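The gap between contractivity and eventual contractivity can be seen numerically in the linear case. The sketch below, written under the assumption of a scalar AR(2) with hypothetical coefficients $b_1 = 0.5$ and $b_2 = 0.3$, searches for the first iterate $h$ at which the spectral norm of the $h$-th power of the companion matrix drops below one. The function name `first_contractive_iterate` is illustrative.

```python
import numpy as np

def first_contractive_iterate(B, h_max=1000):
    """Smallest h with ||B^h||_2 < 1, or None if no such h <= h_max exists."""
    P = np.eye(B.shape[0])
    for h in range(1, h_max + 1):
        P = P @ B
        if np.linalg.norm(P, 2) < 1.0:
            return h
    return None

# Companion matrix of Z_t = 0.5 Z_{t-1} + 0.3 Z_{t-2} + eps_t.
B = np.array([[0.5, 0.3],
              [1.0, 0.0]])
print(np.linalg.norm(B, 2))            # one-step norm: at least one, so not contractive
print(np.max(np.abs(np.linalg.eigvals(B))))  # spectral radius: below one, so stable
print(first_contractive_iterate(B))    # iterate at which the map becomes contractive
```

The one-step norm can never fall below one here because the first column of the companion matrix, $(b_1, 1)'$, already has Euclidean length of at least one.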
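Definition 3.3 (Stable System). Let $Z_t \in \mathcal{Z} \subseteq \mathbb{R}^{d_Z}$, $\epsilon_t \in \mathcal{E} \subseteq \mathbb{R}^{d_Z}$, where $\{Z_t\}_{t \in \mathbb{Z}}$ is generated according to the system
$$
Z_t = G(Z_{t-1}, \epsilon_t).
$$
The system is stable if there exists $h^* \geq 1$ such that for all $(z, z') \in \mathcal{Z} \times \mathcal{Z}$ and $(e_1, e_2, \ldots, e_{h^*}, e_1', e_2', \ldots, e_{h^*}') \in \mathop{\times}\limits_{i=1}^{2h^*} \mathcal{E}$
$$
\| G^{(h^*)}(z, e_{1:h^*}) - G^{(h^*)}(z', e_{1:h^*}') \| \leq C_Z \| z - z' \| + C_\epsilon \| e_{1:h^*} - e_{1:h^*}' \|
$$
holds with Lipschitz constants $0 \leq C_Z < 1$ and $0 \leq C_\epsilon < \infty$.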
It is important to remember that this definition encompasses systems with an arbitrary finite autoregressive structure, i.e., $Z_t = G(Z_{t-p+1}, \ldots, Z_{t-1}, \epsilon_t)$ for $p \geq 1$, thanks to the companion formulation of the process. An explicit stability condition, similar to that discussed above for contractivity, can be derived by means of the mean value theorem. Indeed, for a system to be stable it is sufficient that, at iterate $h^*$,
$$
\sup \left\{ \left\| \operatorname{stack}_{i=1}^{d_Z} \left[ \frac{\partial G^{(h^*)}}{\partial Z}(z^i, e^i_{1:h^*}) \right]_i \right\| \;\middle|\; z^i \in \mathcal{Z}, \, e^i_{1:h^*} \in \mathop{\times}\limits_{i=1}^{h^*} \mathcal{E} \right\} < 1 \tag{13}
$$
and
$$
\sup \left\{ \left\| \frac{\partial G^{(h^*)}}{\partial \epsilon_j}(z, e_{1:h^*}) \right\| \;\middle|\; z \in \mathcal{Z}, \, e_{1:h^*} \in \mathop{\times}\limits_{i=1}^{h^*} \mathcal{E} \right\} < \infty, \quad j = 1, \ldots, h^*. \tag{14}
$$
Pötscher and Prucha (1997) have used conditions (11)-(12) and (13)-(14) as a basis for uniform laws of large numbers and central limit theorems for $L^r$-approximable and near epoch dependent processes.
Physical Dependence. Wu (2005) first proposed alternatives to mixing concepts by introducing dependence measures rooted in a dynamical system view of a stochastic process. Much work has been done to use such measures to derive approximation results and estimator properties, see for example Wu et al. (2010), Wu (2011), Chen et al. (2016), and references therein.
Definition 3.4. If for all $t \in \mathbb{Z}$, $Z_t$ has finite $r$th moment, where $r \geq 1$, the functional physical dependence measure $\Delta_r$ is defined as
$$
\Delta_r(h) := \sup_t \left\| Z_{t+h} - G^{(h)}(Z_t', \epsilon_{t+1:t+h}) \right\|_{L^r}
$$
where $\| \cdot \|_{L^r} = (\mathbb{E}[\| \cdot \|_r^r])^{1/r}$, $Z_t'$ is constructed from $\mathcal{F}_t' = (\ldots, \epsilon_{t-1}', \epsilon_t')$ and $\{\epsilon_t'\}_{t \in \mathbb{Z}}$ is an independent copy of $\{\epsilon_t\}_{t \in \mathbb{Z}}$.
Chen et al. (2016), among others, show how one may replace the geometric β-mixing assumption with a physical dependence assumption.$^{16}$ They show that the key sufficient condition is for $\Delta_r(h)$ to decay sufficiently fast as $h$ grows.
Definition 3.5 (Geometric Moment Contracting Process). $\{Z_t\}_{t \in \mathbb{Z}}$ is geometric moment contracting (GMC) in $L^r$ norm if there exist $a_1 > 0$, $a_2 > 0$ and $\tau \in (0, 1]$ such that
$$
\Delta_r(h) \leq a_1 \exp(-a_2 h^\tau).
$$
GMC conditions can be considered more general than β-mixing, as they encompass well-known counterexamples, e.g., the one provided by $Z_t = (Z_{t-1} + \epsilon_t)/2$ for $\epsilon_t$ i.i.d. Bernoulli r.v.s (Chen et al., 2016). In the following proposition I prove that if contractivity or stability conditions as defined by Pötscher and Prucha (1997) hold for $G$ and $\{\epsilon_t\}_{t \in \mathbb{Z}}$ is an i.i.d. sequence, then process $\{Z_t\}_{t \in \mathbb{Z}}$ is GMC under weak moment assumptions.
16 I adapt the definitions of Chen et al. (2016) to work with a system of the form $Z_t = G(Z_{t-1}, \epsilon_t)$.
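Proposition 3.1. Assume that $\{\epsilon_t\}_{t \in \mathbb{Z}}$, $\epsilon_t \in \mathcal{E} \subseteq \mathbb{R}^{d_Z}$ are i.i.d. and $\{Z_t\}_{t \in \mathbb{Z}}$ is generated according to
$$
Z_t = G(Z_{t-1}, \epsilon_t),
$$
where $Z_t \in \mathcal{Z} \subseteq \mathbb{R}^{d_Z}$ and $G$ is a measurable function.
(a) If contractivity conditions (11)-(12) hold, $\sup_{t \in \mathbb{Z}} \| \epsilon_t \|_{L^r} < \infty$ for $r \geq 2$ and $\| G(z, \epsilon) \| < \infty$ for some $(z, \epsilon) \in \mathcal{Z} \times \mathcal{E}$, then $\{Z_t\}_{t \in \mathbb{Z}}$ is GMC with
$$
\Delta_r(h) \leq a \exp(-\gamma h)
$$
where $\gamma = -\log(C_Z)$ and $a = 2 \| Z_t \|_{L^r} < \infty$.
(b) If stability conditions (13)-(14) hold, $\sup_{t \in \mathbb{Z}} \| \epsilon_t \|_{L^r} < \infty$ for $r \geq 2$ and $\| \partial G / \partial Z \| \leq M_Z < \infty$, then $\{Z_t\}_{t \in \mathbb{Z}}$ is GMC with
$$
\Delta_r(h) \leq \bar{a} \exp(-\gamma_{h^*} h)
$$
where $\gamma_{h^*} = -\log(C_Z)/h^*$ and $\bar{a} = 2 \| Z_t \|_{L^r} \max\{M_Z^{h-1}, 1\} / C_Z < \infty$.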
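The mechanism behind part (a) can be illustrated with a coupling experiment: two paths started from different states but driven by the same shocks approach each other at the geometric rate $C_Z^h$. The sketch below uses a hypothetical scalar map $G(z, e) = C_Z \tanh(z) + e$ with $C_Z = 0.5$ and bounded uniform shocks; neither choice comes from the paper, and the experiment checks only the pathwise contraction, not the $L^r$ bound itself.

```python
import numpy as np

C_Z = 0.5

def G(z, e):
    """Hypothetical contractive map: |dG/dz| <= C_Z because |tanh'| <= 1."""
    return C_Z * np.tanh(z) + e

def coupled_gaps(z0, z0_prime, shocks):
    """Distance between two paths that share the same shock sequence."""
    z, zp, gaps = z0, z0_prime, []
    for e in shocks:
        z, zp = G(z, e), G(zp, e)
        gaps.append(abs(z - zp))
    return np.array(gaps)

rng = np.random.default_rng(0)
shocks = rng.uniform(-1.0, 1.0, 30)
gaps = coupled_gaps(2.0, -1.5, shocks)
bound = C_Z ** np.arange(1, 31) * abs(2.0 - (-1.5))   # C_Z^h times the initial gap
print(np.all(gaps <= bound + 1e-12))
```

Taking $L^r$ norms of such gaps, with the alternative starting state drawn from an independent copy of the process, is what produces a geometric bound of the form stated in the proposition.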
Proposition 3.1 is important in that it links the GMC property to transparent conditions on the structure of the nonlinear model. It also immediately allows handling multivariate systems, while previous work has focused on scalar systems (cf. Wu (2011) and Chen et al. (2016)). Finally, it is now possible to show that if $\{W_{2t}\}_{t \in \mathbb{Z}}$ satisfies physical dependence assumptions, then Assumption 8 is fulfilled, cf. Lemma 2.2 in Chen and Christensen (2015) for β-mixing assumptions.
Lemma 3.1. If Assumption 7(iii) holds and $\{W_{2t}\}_{t \in \mathbb{Z}}$ is strictly stationary and GMC then one may choose an integer sequence $q = q(n) \leq n/2$ with $(n/q)^{r+1} q K^{\rho} \Delta_r(q) = o(1)$ for $\rho = 5/2 - (r/2 + 2/r) + \omega_2$ and $r > 2$ such that
$$
\| (\widetilde{B}_\pi' \widetilde{B}_\pi / n) - I_K \| = O_P\left( \zeta_{K,n} \lambda_{K,n} \sqrt{\frac{q \log K}{n}} \right) = o_P(1)
$$
provided $\zeta_{K,n} \lambda_{K,n} \sqrt{(q \log K)/n} = o(1)$.
It can be seen that Lemma 3.1 holds by setting $\sqrt{K (\log(n))^2 / n} = o(1)$ and choosing $q(n) = \gamma^{-1} \log(K^{\rho} n^{r+1})$, where $\gamma$ is the GMC factor introduced in Proposition 3.1. Therefore, the rate is the same as the one derived by Chen and Christensen (2015) for exponentially β-mixing regressors. As shown in Proposition 3.1, system contractivity and stability conditions both imply geometric moment contractivity, meaning that in place of Assumption 8 one may require the following.
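Assumption 9. For $r > 2$ it holds either:
(i) $\{Z_t\}_{t \in \mathbb{Z}}$ is GMC in $L^r$ norm,
(ii) $\{Z_t\}_{t \in \mathbb{Z}}$ is generated according to $Z_t = \Phi(Z_{t-1}, \ldots, Z_{t-p}; \epsilon_t)$ where $\sup_{t \in \mathbb{Z}} \| \epsilon_t \|_{L^r} < \infty$ and $\Phi$ is either contractive according to Definition 3.1 or stable according to Definition 3.3.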
It is straightforward to prove that if GMC conditions are imposed on $\{Z_t\}_{t \in \mathbb{Z}}$, this implies that $\{W_{2t}\}_{t \in \mathbb{Z}}$ is also GMC.$^{17}$ Therefore, Lemma 3.1 applies and Assumption 8 as well as Assumption 5(i) are verified.
3.4 Uniform Convergence and Consistency
Since the key asymptotic condition of Chen and Christensen (2015) is upheld under GMC
assumptions, their uniform convergence bound on the approximation error of the series estimator
can be applied. In order to do so, one must also impose some regularity conditions on π2.
Without loss of generality, let $\mathcal{X} = [0,1]$ and let $\| \pi_2 \|_\infty := \sup_{w \in \mathcal{Y}} |\pi_2(w)|$ be the sup-norm of the conditional mean function $\pi_2(w)$.
Assumption 10. The unconditional density of $X_t$ is uniformly bounded away from zero and infinity over $\mathcal{X}$.
Assumption 11. For all $1 \leq i \leq d_Y$ and $0 \leq j \leq p$, the restriction of $g_{ij}^{21}$ to $[0,1]$ belongs to the Hölder class $\Lambda^s([0,1])$ of smoothness $s \geq 1$.
Assumptions 10 and 11 are standard in the nonparametric regression literature. One only needs to restrict the complexity of functions $g_{ij}^{21}$ since, for any $i$, the remainder of $\pi_{2,i}$ consists of linear functions. More precisely, what is really needed is that the nonparametric components of the sieve given by $b_{\pi,1K}, \ldots, b_{\pi,KK}$ are able to approximate $g_{ij}^{21}$ well enough.
Assumption 12. Sieve $B_\kappa$ belongs to $\mathrm{BSpl}(\kappa, [0,1]^{d_Y}, r)$, the B-spline sieve of degree $r$ over $[0,1]^{d_Y}$, or $\mathrm{Wav}(\kappa, [0,1]^{d_Y}, r)$, the wavelet sieve of regularity $r$ over $[0,1]^{d_Y}$, with $r > \max\{s, 1\}$.
In the remainder of the paper, I will consider the cubic spline sieve ($r = 3$), but theoretical results are stated in the more general setting. Moreover, $d$ will be the effective dimension of the joint estimation domain for $G_i^{21}$.
Theorem 3.1 (Chen and Christensen (2015)). Let Assumptions 4, 5, 6, 7, 9, 10, 11 and 12 hold. If
$$
K \asymp (n/\log(n))^{d/(2s+d)},
$$
then
$$
\| \hat{\pi}_2^* - \pi_2 \|_\infty = O_P\left( (n/\log(n))^{-s/(2s+d)} \right)
$$
provided that $\delta \geq 2/s$ (in Assumption 6) and $d < 2s$.
In Theorem 3.1 the sup-norm consistency rate generally depends on the dimension $d$ and thus, in principle, the curse of dimensionality slows down convergence compared to parametric estimation. Fortunately, under the current structural model assumptions, the nonlinear functional components in $\pi_2$ are linearly separable in the lag dimension, and thus one may take $d = 1$ as effective dimension. This also means that condition $d < 2s$ is trivially satisfied.
17A formal argument can be found in Appendix B.
Two-step Consistency. The following theorem ensures that the two-step estimation procedure produces consistent estimates. Since for impulse response functions one needs to study the iteration of the entire structural model, this result is stated in terms of the full coefficient matrices.
Theorem 3.2. Let $\{Z_t\}_{t \in \mathbb{Z}}$ be determined by structural model (1). Under Assumptions 1, 4, 5, 6, 7, 9, 10, 11 and 12, let $\widehat{\Pi}_1$ and $\widehat{\Pi}_2$ be the least squares and semi-nonparametric series estimators for $\Pi_1$ and $\Pi_2$, respectively, based on the two-step procedure. Then,
$$
\| \widehat{\Pi}_1 - \Pi_1 \|_\infty = O_P(n^{-1/2})
$$
and
$$
\| \widehat{\Pi}_2 - \Pi_2 \|_\infty \leq O_P\left( \zeta_{K,n} \lambda_{K,n} \frac{K}{\sqrt{n}} \right) + \| \widehat{\Pi}_2^* - \Pi_2 \|_\infty,
$$
where $\widehat{\Pi}_2^*$ is the infeasible series estimator involving $\epsilon_{1t}$.
Sup-norm bounds for $\| \widehat{\Pi}_2^* - \Pi_2 \|_\infty$ follow immediately from Lemma 2.3 and Lemma 2.4 in Chen and Christensen (2015). In particular, choosing the optimal nonparametric rate $K \asymp (n/\log(n))^{d/(2s+d)}$ for the infeasible estimator would yield
$$
\| \widehat{\Pi}_2^* - \Pi_2 \|_\infty = O_P\left( (n/\log(n))^{-s/(2s+d)} \right)
$$
as per Theorem 3.1. The condition for consistency in Theorem 3.2 reduces to
$$
\frac{K^{3/2}}{\sqrt{n}} = o(1),
$$
since for B-spline and wavelet sieves $\lambda_{K,n} \lesssim 1$ and $\zeta_{K,n} \lesssim \sqrt{K}$. It is simple to show that if for the feasible estimator $\widehat{\Pi}_2$ the same rate $(n/\log(n))^{d/(2s+d)}$ is chosen for $K$, the consistency condition in the above display is fulfilled assuming $s \geq 1$ and $d = 1$.$^{18}$
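The last claim can be verified with a short calculation, spelled out here as a worked check rather than a new result. Setting $d = 1$ and $K \asymp (n/\log(n))^{1/(2s+1)}$ gives
$$
\frac{K^{3/2}}{\sqrt{n}} \asymp n^{\frac{3}{2(2s+1)} - \frac{1}{2}} \, (\log(n))^{-\frac{3}{2(2s+1)}} = n^{\frac{1-s}{2s+1}} \, (\log(n))^{-\frac{3}{2(2s+1)}}.
$$
For $s > 1$ the power of $n$ is strictly negative, so the ratio vanishes. In the boundary case $s = 1$ the power of $n$ is zero and the ratio behaves like $(\log(n))^{-1/2}$, which still converges to zero.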
Remark 3.3. (Hyperparameter Selection). An important practical question when applying any series or kernel-type methods is the selection of hyperparameters. For the former, this entails the choice of the sieve's size $K$. Although theory provides only asymptotic rates, a number of methods can be used to select $K$, such as cross-validation, generalized cross-validation and Mallows' criterion (Li and Racine, 2009). In the case of piece-wise splines, once size is selected, knots can be chosen to be the $K$ uniform quantiles of the data. This ensures knots are not located in regions of the domain with very few observations. In simulations and applications, for simplicity, I select sieve sizes manually and locate knots approximately following empirical quantiles. In unreported numerical experiments, I check that results are robust to moderate changes in the number and approximate locations of spline knots.
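As a rough illustration of the remark, the sketch below places knots at equally spaced empirical quantiles and compares a few sieve sizes using the generalized cross-validation score $(\mathrm{RSS}/n)/(1 - K/n)^2$. Everything here is an assumption for demonstration purposes: the function names, the simulated regression, the candidate knot counts, and the fact that the score ignores serial dependence. It is not the selection rule used in the paper, where sieve sizes are chosen manually.

```python
import numpy as np

def quantile_knots(x, n_knots):
    """Interior knots at equally spaced empirical quantiles of x."""
    probs = np.arange(1, n_knots + 1) / (n_knots + 1)
    return np.quantile(x, probs)

def cubic_basis(x, knots):
    """Truncated power cubic spline basis (1, x, x^2, x^3, (x - m)_+^3, ...)."""
    poly = np.vander(x, 4, increasing=True)
    trunc = np.maximum(x[:, None] - knots[None, :], 0.0) ** 3
    return np.hstack([poly, trunc])

def gcv_score(B, y):
    """Generalized cross-validation score for least squares on the columns of B."""
    n, K = B.shape
    beta, *_ = np.linalg.lstsq(B, y, rcond=None)
    rss = np.sum((y - B @ beta) ** 2)
    return (rss / n) / (1.0 - K / n) ** 2

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 400)
y = np.sin(2.0 * np.pi * x) + 0.2 * rng.normal(size=400)

candidates = [1, 2, 4, 8]   # hypothetical numbers of interior knots
scores = {m: gcv_score(cubic_basis(x, quantile_knots(x, m)), y) for m in candidates}
best = min(scores, key=scores.get)
print(scores, best)
```

With dependent data a blocked or rolling validation scheme may be preferable; the score above is only the simplest choice to code.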
18The rate for Kmay be optimized by balancing the uniform (infeasible) rate with the error due to residuals.
Since this paper is not concerned with finding the optimal rate, I do not perform this exercise here.
4 Impulse Response Analysis
Once the model's linear, functional and structural coefficients are consistently estimated, computation of nonlinear impulse responses must be addressed. As discussed in Section 2, nonlinear IRFs are generally hard to compute, since the functional MA($\infty$) form of the process is highly non-trivial. In this section, I will provide an explicit, iterative algorithm to compute responses that is numerically straightforward and does not require the construction of moving average functional coefficients. Moreover, since to derive uniform bounds it is assumed that the data has compact support, I will introduce a novel yet familiar IRF definition, called the relaxed impulse response function, which is compatible with boundedness. Lastly, I prove that semi-nonparametric IRF estimates are consistent with respect to their population counterparts.
4.1 Computation
Recall from equation (7) in Section 2.1 that impulse responses involve two moving average lag polynomials, $\Theta(L)$ for the linear model component and $\Gamma(L)$ for the nonlinear component, respectively. As a first step, one can derive a semi-explicit recursive algorithm for computing $\mathrm{IRF}_h(\delta)$ in a manner that does not involve simulations of the innovations process.
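Proposition 4.1 (Gonçalves et al. (2021), Proposition 3.1). Under Assumptions 1, 2 and 3, for any $h = 0, 1, \ldots, H$, let
$$
V_j(\delta) := \mathbb{E}[\Gamma_j X_{t+j}(\delta)] - \mathbb{E}[\Gamma_j X_{t+j}].
$$
To compute
$$
\mathrm{IRF}_h(\delta) = \Theta_{h,\cdot 1} \delta + \sum_{j=0}^{h} V_j(\delta),
$$
the following steps can be used:
(i) For $j = 0$, set $X_t(\delta) = X_t + \delta$ and $V_0(\delta) = \mathbb{E}[\Gamma_0 X_t(\delta)] - \mathbb{E}[\Gamma_0 X_t]$.
(ii) For $j = 1, \ldots, h$, let
$$
\begin{aligned}
X_{t+j}(\delta) &= X_{t+j} + \Theta_{j,11} \delta + \sum_{k=1}^{j} \left( \Gamma_{k,11} X_{t+j-k}(\delta) - \Gamma_{k,11} X_{t+j-k} \right) \\
&= \gamma_j(X_{t+j:t}; \delta),
\end{aligned}
$$
where $\gamma_j$ are implicitly defined and depend on $\Theta(L)$ and $\Gamma(L)$.
(iii) For $j = 1, \ldots, h$, compute
$$
V_j(\delta) = \mathbb{E}[\Gamma_j \gamma_j(X_{t+j:t}; \delta)] - \mathbb{E}[\Gamma_j X_{t+j}].
$$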
The proof of Proposition 4.1 is identical to that in Gonçalves et al. (2021), with the only variation being that in the current setup it is not possible to collect the nonlinear function across $X_{t+j-k}(\delta)$ and $X_{t+j-k}$. Computation of $X_{t+j}(\delta)$ in step (ii) involves recursive evaluations of nonlinear functions, which is why the algorithm is semi-explicit. For each horizon $h$, one needs to evaluate $h+1$ iterations of $X_t(\delta)$. Importantly, however, this approach dispenses with the need to simulate innovations $\{\epsilon_{t+j}\}_{j=1}^{h-1}$ as the joint distribution of $\{X_{t+h-1}, X_{t+j-1}, \ldots, X_t\}$ already contains all relevant information. Gonçalves et al. (2021) argue that the algorithm outlined in Proposition 4.1 is significantly more efficient than schemes involving Monte Carlo simulations like e.g. the one used by Kilian and Vigfusson (2011).
However, $\{\Gamma_j\}_{j=1}^{h}$ are combinations of real and functional matrices and closed-form derivation is numerically impractical. Note that, by the definition of IRFs, the following explicit iterative algorithm is also valid.
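Proposition 4.2. In the same setup of Proposition 4.1, to compute $\mathrm{IRF}_h(\delta)$ the following steps
can be used:
(i′) For $j = 0$, let $X_t(\delta) = X_t + \delta$ and
$$\mathrm{IRF}_0(\delta) = \begin{bmatrix} \delta \\ B_0^{21}\delta \end{bmatrix} + \mathbb{E}\begin{bmatrix} 0 \\ G_{21,0} X_t(\delta) \end{bmatrix} - \mathbb{E}\begin{bmatrix} 0 \\ G_{21,0} X_t \end{bmatrix}.$$
(ii′) For $j = 1, \ldots, h$, let
$$\begin{aligned}
X_{t+j}(\delta) &= \mu_1 + A_{12}(L) Y_{t+j-1}(\delta) + A_{11}(L) X_{t+j-1}(\delta) + \epsilon_{1t+j}, \\
Y_{t+j}(\delta) &= \mu_2 + A_{22}(L) Y_{t+j-1}(\delta) + H_{21}(L) X_{t+j}(\delta) + B_0^{21}\epsilon_{1t+j} + u_{2t+j},
\end{aligned}$$
where $H_{21}(L) := A_{21}(L)L + G_{21}(L)$ and $u_{2t+j} := B_0^{22}\epsilon_{2t+j}$. Setting $Z_{t+j}(\delta) = (X_t(\delta), Y_t(\delta))'$
it holds
$$\mathrm{IRF}_h(\delta) = \mathbb{E}[Z_{t+j}(\delta)] - \mathbb{E}[Z_{t+j}].$$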
Proposition 4.2 follows directly from the definition of the unconditional impulse response
(6) combined with explicit iteration of the semi-reduced form (2) and sidesteps the MA($\infty$)
formulation in (7). Step (i′) is trivial in nature. Step (ii′) may not seem useful when compared
to (ii), since, in practice, innovations $\epsilon_{1t}$ and $u_{2t}$ are not available. However, let
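$$\widehat{\mu},\ \widehat{A}_{11}(L),\ \widehat{A}_{12}(L),\ \widehat{A}_{21}(L),\ \widehat{H}_{11}(L),\ \widehat{B}_0^{21}$$
be estimates of the model’s coefficients derived, for example, from series estimators $\widehat{\Pi}_1$ and $\widehat{\Pi}_2$.
In sample, one can compute residuals $\widehat{\epsilon}_{1t}$ and $\widehat{u}_{2t}$, and by definition it holds
$$\begin{aligned}
X_t &= \widehat{\mu}_1 + \widehat{A}_{12}(L) Y_{t-1} + \widehat{A}_{11}(L) X_{t-1} + \widehat{\epsilon}_{1t}, \\
Y_t &= \widehat{\mu}_2 + \widehat{A}_{22}(L) Y_{t-1} + \widehat{H}_{21}(L) X_t + \widehat{B}_0^{21}\widehat{\epsilon}_{1t} + \widehat{u}_{2t}.
\end{aligned}$$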
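This means that one can readily construct the shocked sequence recursively as
$$\begin{aligned}
\widehat{X}_{t+j}(\delta) &= \widehat{\mu}_1 + \widehat{A}_{12}(L)\widehat{Y}_{t+j-1}(\delta) + \widehat{A}_{11}(L)\widehat{X}_{t+j-1}(\delta) + \widehat{\epsilon}_{1t+j}, \\
\widehat{Y}_{t+j}(\delta) &= \widehat{\mu}_2 + \widehat{A}_{22}(L)\widehat{Y}_{t+j-1}(\delta) + \widehat{H}_{21}(L)\widehat{X}_{t+j}(\delta) + \widehat{B}_0^{21}\widehat{\epsilon}_{1t+j} + \widehat{u}_{2t+j},
\end{aligned}$$
for $j = 1, \ldots, h$, where $\widehat{X}_t(\delta) = X_t + \delta$, $\widehat{X}_{t-s} = X_{t-s}$ for all $s \geq 1$ and similarly for $\widehat{Y}_t(\delta)$. To
evaluate a structural IRF, over a sample of size $n$ one can compute
$$\widehat{\mathrm{IRF}}_h(\delta) = \frac{1}{n-j} \sum_{t=1}^{n-j} \left[ \widehat{Y}_{t+j}(\delta) - Y_t \right],$$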
which is still considerably less demanding than Monte Carlo simulations. Additionally, the
advantage in implementing steps (i′)-(ii′) over the procedure in Proposition 4.1 is that, when
$\widehat{H}_{21}(L)$ is a semi-nonparametric estimate, iterating model equations is numerically much more
straightforward than handling functional MA matrices $\{\widehat{\Gamma}_j\}_{j=1}^{h}$.
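As a rough illustration of the shocked-path recursion, the following Python sketch iterates it for a
deliberately simplified bivariate model: one lag, no intercepts, no feedback from $Y$ to $X$, and
$B_0^{21} = 0$. The function name `shocked_path_irf` and the arguments `a11`, `a22`, `h21_now` and
`h21_lag` are hypothetical stand-ins for fitted quantities and do not come from the paper. The shocked
path is differenced against the observed $Y_{t+j}$, which has the same population mean as $Y_t$ under
stationarity; this centring is a choice made for the sketch.

```python
import numpy as np


def shocked_path_irf(x, y, a11, a22, h21_now, h21_lag, delta, horizon):
    """Average response of Y to a shock of size delta hitting X at each start date.

    Assumed (simplified) model, one lag and no intercepts:
        X_t = a11 * X_{t-1} + e1_t
        Y_t = a22 * Y_{t-1} + h21_now(X_t) + h21_lag(X_{t-1}) + u2_t
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    # In-sample residuals implied by the supplied coefficients and functions.
    e1 = np.zeros(n)
    u2 = np.zeros(n)
    e1[1:] = x[1:] - a11 * x[:-1]
    u2[1:] = y[1:] - a22 * y[:-1] - h21_now(x[1:]) - h21_lag(x[:-1])

    # Start dates t for which both t-1 and t+horizon are inside the sample.
    starts = np.arange(1, n - horizon)
    irf = np.zeros(horizon + 1)

    # Impact step: translate X_t by delta and recompute Y_t.
    xs = x[starts] + delta
    ys = a22 * y[starts - 1] + h21_now(xs) + h21_lag(x[starts - 1]) + u2[starts]
    irf[0] = np.mean(ys - y[starts])

    # Propagation: iterate the model equations, reusing the residuals.
    for j in range(1, horizon + 1):
        xs_next = a11 * xs + e1[starts + j]
        ys = a22 * ys + h21_now(xs_next) + h21_lag(xs) + u2[starts + j]
        xs = xs_next
        irf[j] = np.mean(ys - y[starts + j])
    return irf
```

With linear `h21_now` and `h21_lag`, the output reduces to the usual linear impulse response, which
gives a quick sanity check before plugging in a nonlinear fit.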
4.2 Nonlinear Responses with Relaxed Shocks
Following Proposition 4.1, the sample impulse response would be
$$\widehat{\mathrm{IRF}}_h(\delta) := \widehat{\Theta}_{h,\cdot 1}\,\delta + \sum_{j=0}^{h} \bar{V}_j(\delta), \tag{15}$$
where
$$\bar{V}_j(\delta) := \frac{1}{n-j} \sum_{t=1}^{n-j} \left[ \widehat{\Gamma}_j \widehat{\gamma}_j(X_{t+j:t}; \delta) - \widehat{\Gamma}_j X_{t+j} \right]$$
and $\widehat{\Theta}$, $\widehat{\Gamma}$ and $\widehat{\gamma}_j$ are plug-in estimates of the respective quantities based on $\widehat{\Pi}_1$ and $\widehat{\Pi}_2$. However,
under Assumptions 4 and 5, the construction of impulse response (15) is improper. This can be
immediately seen by noticing that, at impact,
$$X_t(\delta) = \gamma_j(X_t; \delta) = X_t + \delta,$$
meaning that $\mathbb{P}(X_t(\delta) \notin \mathcal{X}) > 0$ since there is a translation of size $\delta$ in the support of $X_t$.
The problem is rooted in the fact that the standard definition of IRF involves a translation of
the distribution of time $t$ structural innovations, which is incompatible with the assumptions
imposed in Section 3 to derive semi-nonparametric consistency.
There are multiple ways to address this issue. One option, which would require substantial
technical work, is to extend Theorem 3.2 to encompass regressors with unbounded or expanding
domains. A potential direction could be coupling the weighted sieves of Chen and Christensen
(2015) with appropriately defined shocks. Instead, I propose to take a more direct approach by
changing the type of structural shock one studies in a way consistent with bounded domains for
all variables.
Definition 4.1. A mean-shift structural shock $\epsilon_{1t}(\delta)$ is a transformation of $\epsilon_{1t}$ such that
$$\mathbb{P}(\epsilon_{1t}(\delta) \in \mathcal{E}_1) = 1 \quad \text{and} \quad \mathbb{E}[\epsilon_{1t}(\delta)] = \delta.$$
A mean-shift shock is such that the distribution of time $t$ innovations is shifted to have
mean $\delta$, while retaining support $\mathcal{E}_1$ almost surely. This definition is natural in that it makes
evaluating the effect of the MA($\infty$) component of the unconditional IRF straightforward. With
a mean-shift shock, at impact it holds
$$X_t(\delta) = X_t + \epsilon_{1t}(\delta) - \epsilon_{1t},$$
yet $\epsilon_{1t}(\delta) - \epsilon_{1t}$ is not known unless the transformation for the mean-shift shock is itself known.
Unfortunately, the assumption that the mean of $\epsilon_{1t}(\delta)$ is exactly equal to $\delta$ requires that the
distribution of $\epsilon_{1t}$ be known to properly choose a mean-shift transform. If instead one is willing
to assume only that $\mathbb{E}[\epsilon_{1t}(\delta)] \approx \delta$, it is possible to sidestep this requirement by introducing a
shock relaxation function.

Figure 1: Example of symmetric shock relaxation. Unperturbed (left, blue) versus shocked
(right, orange) densities of innovations $\epsilon_{1t}$. The shock relaxation function (right, gray) and $\delta$
together determine the form of the relaxed shock used to compute the IRF.
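Definition 4.2 (Shock Relaxation Function). A shock relaxation function is a map $\rho : \mathcal{E}_1 \to [0, 1]$
such that $\rho(z) = 0$ for all $z \in \mathbb{R} \setminus \mathcal{E}_1$, $\rho(z) \geq 0$ for all $z \in \mathcal{E}_1$ and there exists $z_0 \in \mathcal{E}_1$ for
which $\rho(z_0) = 1$.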
In general, choosing a shock relaxation function without taking into account the shape
of domain E1does not necessarily imply that the relaxed shocks will not push the structural
variable out-of-bounds. Therefore, I also introduce the notion of compatibility.
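Definition 4.3 (Compatible Relaxation). Consider a shock $\delta \in \mathbb{R}$ and let $\mathcal{E}_1 = [a, b]$.
(i) If $\delta > 0$, $\rho$ is said to be right-compatible with $\delta$ if
$$\rho(z) \leq \frac{b - z}{|\delta|} \quad \text{for all } z \in \mathcal{E}_1.$$
(ii) If $\delta < 0$, $\rho$ is said to be left-compatible with $\delta$ if
$$\rho(z) \leq \frac{z - a}{|\delta|} \quad \text{for all } z \in \mathcal{E}_1.$$
(iii) Given shock size $|\delta| > 0$, $\rho$ is said to be compatible if it is both right- and left-compatible.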
By setting
$$\epsilon_{1t}(\delta) = \epsilon_{1t} + \delta\rho(\epsilon_{1t}),$$
where $\rho$ is compatible with $\delta$, it follows that $X_t(\delta) = X_t + \delta\rho(\epsilon_{1t})$ and
$|\mathbb{E}[\epsilon_{1t}(\delta)]| = |\delta\,\mathbb{E}[\rho(\epsilon_{1t})]| \leq |\delta|$ since $\mathbb{E}[\rho(\epsilon_{1t})] \in [0, 1)$ by definition of $\rho$. If $\rho$ is a bump
function, a relaxed shock is a structural shock that has been mitigated proportionally to the
density of innovations at the edges of $\mathcal{E}_1$ and the squareness of $\rho$. For better intuition, Figure 1
provides a graphical rendition of shock relaxation of a symmetric error distribution with a bump
function.
Remark 4.1. The definition of compatible relaxation function is static, as it considers only the
impact effect of a shock. Nonetheless, the assumption that $X_t \in \mathcal{X}$ for all $t$ must also hold for
$X_t(\delta)$, the shocked structural variable. In theory, given $\delta$, one can always either expand $\mathcal{X}$ or
strengthen $\rho$ so that compatibility is enforced at all horizons $1 \leq h \leq H$. For simulations, where
one has access to the data generating process, the choice of domains and relaxation functions
can be done transparently. In practice, some care is required. When working with empirical
data, unless one is willing to assume $X_t$ is wholly exogenous (as in Section 6.1 with monetary
policy shocks) or strictly autoregressive, some scenarios are more amenable to analysis with
the framework presented here than others. In Section 6.2, following Istrefi and Mouabbi (2018),
I will let $X_t$ be a non-negative uncertainty measure, so that negative shocks are harder to study
without producing sequences that contain negative uncertainty values. Thus, I will focus on
positive, contractionary shocks.
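For a given $X_t$, transformation $X_t + \delta\rho(\epsilon_{1t})$ is not directly applicable since $\epsilon_{1t}$ is not observed.
In practice, therefore, I will consider
$$\widehat{X}_t(\delta) := X_t + \delta\rho(\widehat{\epsilon}_{1t}).$$
For simplicity of notation, let $\tilde{\delta}_t := \delta\rho(\epsilon_{1t})$. Similarly to Step (ii) of Proposition 4.1, given a
path $X_{t+j:t}$ one finds
$$X_{t+j}(\tilde{\delta}_t) = X_{t+j} + \Theta_{j,11}\,\tilde{\delta}_t + \sum_{k=1}^{j} \left( \Gamma_{k,11} X_{t+j-k}(\tilde{\delta}_t) - \Gamma_{k,11} X_{t+j-k} \right) = \gamma_j(X_{t+j:t}; \tilde{\delta}_t).$$
The relaxed-shock impulse response is thus given by
$$\widetilde{\mathrm{IRF}}_h(\delta) := \mathbb{E}[Z_{t+j}(\tilde{\delta}_t) - Z_{t+j}] = \Theta_{h,\cdot 1}\,\delta\,\mathbb{E}[\rho(\epsilon_{1t})] + \sum_{k=1}^{j} \mathbb{E}\left[ \Gamma_k X_{t+j-k}(\tilde{\delta}_t) - \Gamma_k X_{t+j-k} \right].$$
In what follows, I show that by replacing $\tilde{\delta}_t$ with $\widehat{\tilde{\delta}}_t = \delta\rho(\widehat{\epsilon}_{1t})$ it is possible to consistently
estimate unconditional expectations involving $X_{t+j}(\tilde{\delta}_t)$ as well as $X_{t+j}$, and thus $\widetilde{\mathrm{IRF}}_h(\delta)$, by
averaging over sample realizations.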
4.3 Relaxed Impulse Response Consistency
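For a given $\delta \in \mathbb{R}$ and compatible shock relaxation function $\rho$, vector $V_j(\delta)$ is the nonlinear
component of impulse responses. One can focus on a specific variable’s response by introducing,
for $1 \leq \ell \leq d$,
$$V_{j,\ell}(\delta) := \frac{1}{n-j} \sum_{t=1}^{n-j} \left[ \Gamma_{j,\ell}\,\gamma_j(X_{t+j:t}; \tilde{\delta}_t) - \Gamma_{j,\ell} X_{t+j} \right],$$
where $V_{j,\ell}(\delta)$ is the horizon $j$ nonlinear effect on the $\ell$th variable and $\Gamma_{j,\ell}$ is the $\ell$th component
of functional vector $\Gamma_j$. For the sake of notation I also define
$$v_{j,\ell}(X_{t+j:t}; \tilde{\delta}_t) := \Gamma_{j,\ell}\,\gamma_j(X_{t+j:t}; \tilde{\delta}_t) - \Gamma_{j,\ell} X_{t+j}.$$
Let $\widehat{v}_{j,\ell}(X_{t+j:t}; \widehat{\tilde{\delta}}_t)$ be its sample equivalent, so that
$$\widehat{v}_{j,\ell}(X_{t+j:t}; \widehat{\tilde{\delta}}_t) = \widehat{\Gamma}_{j,\ell}\,\widehat{\gamma}_j(X_{t+j:t}; \widehat{\tilde{\delta}}_t) - \widehat{\Gamma}_{j,\ell} X_{t+j},$$
$$\widehat{V}_{j,\ell}(\delta) = \frac{1}{n-j} \sum_{t=1}^{n-j} \widehat{v}_{j,\ell}\big(X_{t+j:t}; \widehat{\tilde{\delta}}_t\big)$$
and
$$\widehat{\widetilde{\mathrm{IRF}}}_{h,\ell}(\delta) = \Theta_{h,\cdot 1}\,\delta\, n^{-1} \sum_{t=1}^{n} \rho(\widehat{\epsilon}_{1t}) + \sum_{j=0}^{h} \widehat{V}_{j,\ell}(\delta)$$
for $1 \leq \ell \leq d$.
Theorem 4.1. Let $\widehat{\widetilde{\mathrm{IRF}}}_{h,\ell}(\delta)$ be a semi-nonparametric estimate for the horizon $h$ relaxed shock
IRF of variable $\ell$. Under the same assumptions as in Theorem 3.2
$$\widehat{\widetilde{\mathrm{IRF}}}_{h,\ell}(\delta) \overset{P}{\to} \widetilde{\mathrm{IRF}}_{h,\ell}(\delta)$$
for any fixed integers $0 \leq h < \infty$ and $1 \leq \ell \leq d$.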
5 Simulations
This section is devoted to analyzing the empirical performance of the two-step semi-nonparametric
estimation strategy discussed above. I will consider the two simulation setups employed by
Gonçalves et al. (2021), with focus on bias and MSE of the estimated relaxed shocked impulse
response functions. Additionally, I provide simulations under a modified design which high-
light how in larger samples the non-parametric sieve estimator consistently recovers impulse
responses, while a least-squares estimator constructed with a pre-specified nonlinear transform
does not. In all simulations, I use a B-spline sieve of order 1.
5.1 Benchmark Bivariate Design
The first simulation setup involves a bivariate DGP where the structural shock does not directly
affect other observables. This is a simple environment to check that the two-step estimator
indeed recovers the nonlinear component of the model, that impulse responses are consistently
estimated, and that the MSE does not worsen excessively.
I consider three bivariate data generation processes. DGP 1 sets $X_t$ to be a fully exogenous
innovation process,
$$\begin{aligned}
X_t &= \epsilon_{1t}, \\
Y_t &= 0.5 Y_{t-1} + 0.5 X_t + 0.3 X_{t-1} - 0.4 \max(0, X_t) + 0.3 \max(0, X_{t-1}) + \epsilon_{2t}.
\end{aligned} \tag{16}$$
DGP 2 adds an autoregressive component to $X_t$, but maintains exogeneity,
$$\begin{aligned}
X_t &= 0.5 X_{t-1} + \epsilon_{1t}, \\
Y_t &= 0.5 Y_{t-1} + 0.5 X_t + 0.3 X_{t-1} - 0.4 \max(0, X_t) + 0.3 \max(0, X_{t-1}) + \epsilon_{2t}.
\end{aligned} \tag{17}$$
Finally, DGP 3 adds an endogenous effect of $Y_{t-1}$ on the structural variable by setting
$$\begin{aligned}
X_t &= 0.5 X_{t-1} + 0.2 Y_{t-1} + \epsilon_{1t}, \\
Y_t &= 0.5 Y_{t-1} + 0.5 X_t + 0.3 X_{t-1} - 0.4 \max(0, X_t) + 0.3 \max(0, X_{t-1}) + \epsilon_{2t}.
\end{aligned} \tag{18}$$
Following Assumption 1, innovations are mutually independent. To accommodate Assumptions
4 and 5, both $\epsilon_{1t}$ and $\epsilon_{2t}$ are drawn from a truncated standard Gaussian distribution over
$[-3, 3]$.¹⁹ All DGPs are centered to have zero intercept in population.
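A minimal Python sketch of how DGPs 1-3 could be simulated is given below. It assumes the clipping
construction for the truncated innovations described in footnote 19. The function name
`simulate_bivariate_dgp`, the burn-in length and the seed handling are illustrative choices rather
than the paper's code, and the centering step is omitted.

```python
import numpy as np


def simulate_bivariate_dgp(n, ar_x=0.5, feedback=0.0, burn_in=200, seed=0):
    """Simulate (X_t, Y_t) from the bivariate designs in equations (16)-(18).

    ar_x=0.0, feedback=0.0 -> DGP 1; ar_x=0.5, feedback=0.0 -> DGP 2;
    ar_x=0.5, feedback=0.2 -> DGP 3. The burn-in length is an assumption.
    """
    rng = np.random.default_rng(seed)
    total = n + burn_in
    # Standard Gaussian draws clipped to [-3, 3], as in footnote 19.
    eps = np.clip(rng.standard_normal((total, 2)), -3.0, 3.0)
    x = np.zeros(total)
    y = np.zeros(total)
    for t in range(1, total):
        x[t] = ar_x * x[t - 1] + feedback * y[t - 1] + eps[t, 0]
        y[t] = (
            0.5 * y[t - 1]
            + 0.5 * x[t]
            + 0.3 * x[t - 1]
            - 0.4 * max(0.0, x[t])
            + 0.3 * max(0.0, x[t - 1])
            + eps[t, 1]
        )
    # Drop the burn-in so the retained sample starts near the stationary regime.
    return x[burn_in:], y[burn_in:]
```

For example, `simulate_bivariate_dgp(240, ar_x=0.5, feedback=0.2)` would draw one sample of the
size used in the simulations from the DGP 3 design.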
I evaluate bias and MSE using 1000 Monte Carlo simulations. For a chosen horizon
$H$, the impact of a relaxed shock on $\epsilon_{1t}$ is evaluated on $Y_{t+h}$ for $h = 1, \ldots, H$. To compute the
population IRF, I employ a direct simulation strategy that replicates the shock’s propagation
through the model and I use 10 000 replications. To evaluate the estimated IRF, the two-step
procedure is implemented: a sample of length $n$ is drawn, the linear least squares and the
semi-nonparametric series estimators are used to estimate the model and the relaxed
IRF is computed following Proposition 4.2. For the sake of brevity, I discuss the case of $\delta = 1$
and I set the shock relaxation function to be
$$\rho(z) = \exp\left( 1 + \left[ \left( \frac{z}{3} \right)^{4} - 1 \right]^{-1} \right)$$
over interval $[-3, 3]$ and zero everywhere else.²⁰ Choices of $\delta = -1$ and $\delta = \pm 0.5$ yield similar
results in simulations, so I do not discuss them here.
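The compatibility of this relaxation function with a given shock size can be checked numerically.
The sketch below is only an illustration: it evaluates the bump function on a grid and verifies
directly that $z + \delta\rho(z)$ stays inside $[-3, 3]$. The names `bump_relaxation` and
`is_compatible`, as well as the grid size, are assumptions of the sketch.

```python
import numpy as np


def bump_relaxation(z, half_width=3.0, power=4.0):
    """Bump function exp(1 + 1 / (|z / half_width|**power - 1)), zero outside the interval."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    u = np.abs(z) / half_width
    rho = np.zeros_like(z)
    inside = u < 1.0  # the boundary itself is mapped to zero
    rho[inside] = np.exp(1.0 + 1.0 / (u[inside] ** power - 1.0))
    return rho


def is_compatible(rho, delta, a, b, n_grid=20001):
    """Grid check that the relaxed shock z + delta * rho(z) never leaves [a, b]."""
    z = np.linspace(a, b, n_grid)
    shocked = z + delta * rho(z)
    return bool(np.all((shocked >= a) & (shocked <= b)))
```

A call such as `is_compatible(bump_relaxation, 1.0, -3.0, 3.0)` performs the check for $\delta = 1$;
a grid check of this kind is a numerical diagnostic, not a proof.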
Figure 2 contains the results for sample size $n = 240$. This choice is motivated by considering
the average sample sizes found in most macroeconometric settings: it is equivalent to 20 years
of monthly data or 60 years of quarterly data (Gonçalves et al., 2021). The benchmark method
is an OLS regression that relies on a priori knowledge of the underlying DGP specification.
Given the moderate sample size, to construct the cubic spline sieve estimator of the nonlinear
component of the model I use a single knot, located at 0. The simulations in Figure 2 show that
while the MSE is slightly higher for the sieve model, the bias is comparable across methods. Note
that for DGP 3, due to the dependence of the structural variable on non-structural series lags,
the MSE and bias increase significantly, and there is no meaningful difference in performance
between the two estimation approaches.

¹⁹ Let $e_{it} \sim N(0, 1)$ for $i = 1, 2$, then the truncated Gaussian innovations used in simulation are set to be
$\epsilon_{it} = \min(\max(-3, e_{it}), 3)$. The resulting r.v.s have a non-continuous density with two mass points at $-3$ and $3$.
However, in practice, since these masses are negligible, for the moderate sample sizes used this choice does not
create issues.
²⁰ It can be easily checked that this choice of $\rho$ is compatible with shocks of size $0 \leq |\delta| \leq 1$.

Figure 2: Simulations results for DGPs 1-3. [Panels: $n = 240$; series: Sieve, OLS.]
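The single-knot cubic spline sieve regression can be sketched as follows. This is an illustration
only: it uses a truncated power basis, which spans the same function space as cubic B-splines with
the same knot but is not necessarily the basis used in the simulations, and the helper names
`cubic_spline_basis`, `fit_sieve` and `predict_sieve` are hypothetical.

```python
import numpy as np


def cubic_spline_basis(x, knots=(0.0,)):
    """Truncated power basis: 1, x, x^2, x^3 and (x - k)_+^3 for each knot k."""
    x = np.asarray(x, dtype=float)
    columns = [np.ones_like(x), x, x**2, x**3]
    columns += [np.maximum(0.0, x - k) ** 3 for k in knots]
    return np.column_stack(columns)


def fit_sieve(x, y, knots=(0.0,)):
    """Least-squares coefficients of y on the spline basis evaluated at x."""
    basis = cubic_spline_basis(x, knots)
    coef, *_ = np.linalg.lstsq(basis, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict_sieve(coef, x, knots=(0.0,)):
    """Evaluate the fitted series approximation at new points x."""
    return cubic_spline_basis(x, knots) @ coef
```

In a two-step procedure of the kind described in the text, a regression like `fit_sieve` would be
applied to the structural variable's terms in the $Y_t$ equation, with additional linear regressors
stacked alongside the basis columns.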
5.2 Structural Partial Identification Design
To showcase the validity of the proposed sieve estimator under the type of partial structural
identification discussed in the paper, I again rely on the simulation design proposed by Gonçalves
et al. (2021). All specifications are block-recursive, and require estimating the contemporaneous
effects of a structural shock on non-structural variables, unlike in the previous section.
The form of the DGPs is
$$B_0 Z_t = B_1 Z_{t-1} + C_0 f(X_t) + C_1 f(X_{t-1}) + \epsilon_t,$$
where in all variations of the model
$$B_0 = \begin{bmatrix} 1 & 0 & 0 \\ -0.45 & 1 & -0.3 \\ -0.05 & 0.1 & 1 \end{bmatrix}, \quad
C_0 = \begin{bmatrix} 0 \\ -0.2 \\ 0.08 \end{bmatrix}, \quad \text{and} \quad
C_1 = \begin{bmatrix} 0 \\ -0.1 \\ 0.2 \end{bmatrix}.$$
I focus on the case $f(x) = \max(0, x)$, since this type of nonlinearity is simpler to study. DGP 4
treats $X_t$ as an exogenous shock by setting
$$B_1 = \begin{bmatrix} 0 & 0 & 0 \\ 0.15 & 0.17 & -0.18 \\ -0.08 & 0.03 & 0.6 \end{bmatrix};$$
DGP 5 adds serial correlation to $X_t$,
$$B_1 = \begin{bmatrix} -0.13 & 0 & 0 \\ 0.15 & 0.17 & -0.18 \\ -0.08 & 0.03 & 0.6 \end{bmatrix};$$
and DGP 6 includes dependence on $Y_{t-1}$,
$$B_1 = \begin{bmatrix} -0.13 & 0.05 & -0.01 \\ 0.15 & 0.17 & -0.18 \\ -0.08 & 0.03 & 0.6 \end{bmatrix}.$$
For these data generating processes, I employ the same setup of simulations with DGPs 1-3,
including the number of replications, the type of relaxed shock and the sieve
grid. Here too I evaluate MSE and bias of both the sieve and the correct specification OLS
estimators with a sample size of $n = 240$ observations. The results in Figure 3 show again that
there is little difference in terms of performance between the semi-nonparametric sieve approach
and a correctly-specified OLS regression.
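For concreteness, the block-recursive design can be simulated along the following lines. The sketch
assumes $f(x) = \max(0, x)$, clipped Gaussian innovations as in the bivariate designs, and a zero
initial condition; the function name `simulate_block_recursive` is hypothetical. Because the first
row of $B_0$ is $(1, 0, 0)$ and the first entries of $C_0$ and $C_1$ are zero, $X_t$ can be computed
before solving for the remaining variables.

```python
import numpy as np

# Coefficient matrices as stated in the text (B1 shown for DGP 4).
B0 = np.array([[1.0, 0.0, 0.0], [-0.45, 1.0, -0.3], [-0.05, 0.1, 1.0]])
C0 = np.array([0.0, -0.2, 0.08])
C1 = np.array([0.0, -0.1, 0.2])
B1_DGP4 = np.array([[0.0, 0.0, 0.0], [0.15, 0.17, -0.18], [-0.08, 0.03, 0.6]])


def simulate_block_recursive(n, B1, seed=0):
    """Simulate B0 Z_t = B1 Z_{t-1} + C0 f(X_t) + C1 f(X_{t-1}) + eps_t with f = max(0, .)."""
    rng = np.random.default_rng(seed)
    eps = np.clip(rng.standard_normal((n, 3)), -3.0, 3.0)
    z = np.zeros((n, 3))  # zero initial condition (an assumption of this sketch)
    for t in range(1, n):
        # X_t only depends on lagged Z and its own innovation.
        x_t = B1[0] @ z[t - 1] + eps[t, 0]
        rhs = B1 @ z[t - 1] + C0 * max(0.0, x_t) + C1 * max(0.0, z[t - 1, 0]) + eps[t]
        # Solve the structural system for the full vector Z_t.
        z[t] = np.linalg.solve(B0, rhs)
    return z, eps
```

Passing the DGP 5 or DGP 6 version of $B_1$ instead of `B1_DGP4` changes only the first row of the
lag matrix, so the same routine covers all three designs.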
Figure 3: Simulations results for DGPs 4-6. [Panels: $n = 240$; series: Sieve, OLS.]

Figure 4: Plot of nonlinear function $\varphi(x)$ used in DGP 7.
5.3 Model Misspecification
The previous sections report results that support the use of the sieve IRF estimator in a sample of
moderate size, since it performs comparably to a regression performed with a priori knowledge
of the underlying DGP. I now show that the semi-nonparametric approach is also robust to
model misspecification compared to simpler specifications involving fixed choices for nonlinear
transformations.
To this end, I modify DGP 2 to use a smooth nonlinear transformation to define the effect
of structural variable $X_t$ on $Y_t$. That is, there is no compounding of linear and nonlinear effects.
The autoregressive coefficient in the equation for $X_t$ is also increased to make the shock more
persistent. The new data generating process, DGP 2′, is thus given by
$$\begin{aligned}
X_t &= 0.8 X_{t-1} + \epsilon_{1t}, \\
Y_t &= 0.5 Y_{t-1} + 0.9 \varphi(X_t) + 0.5 \varphi(X_{t-1}) + \epsilon_{2t},
\end{aligned} \tag{19}$$
where $\varphi(x) := (x - 1)(0.5 + \tanh(x - 1)/2)$, which is plotted in Figure 4.
To emphasize the difference in estimated IRFs, in this setup I focus on $\delta = \pm 2$, which
requires adapting the choice of innovations and shock relaxation function. In simulations of
DGP 2′, $\epsilon_{1t}$ and $\epsilon_{2t}$ are both drawn from a truncated standard Gaussian distribution over
$[-5, 5]$. The shock relaxation function of this setup is given by
$$\rho(z) = \exp\left( 1 + \left[ \left| \frac{z}{5} \right|^{3.9} - 1 \right]^{-1} \right)$$
over interval $[-5, 5]$ and zero everywhere else. This form of $\rho$ is adapted to choices of $\delta$ such
that $0 < |\delta| \leq 2$. The sieve grid now consists of 4 equidistant knots within $(-5, 5)$. I use the
same numbers of replications as in the previous simulations. Finally, the regression design is
identical to that used for DGP 2 under correct specification.
The results obtained with sample size $n = 2400$ are collected in Figure 5. I choose this larger
sample size to clearly showcase the inconsistency of impulse responses under misspecification:
as can be observed, the simple OLS estimator involving the negative-censoring transform
produces IRF estimates with consistently worse MSE and bias than those of the sieve estimator
at almost all horizons. Similar results are also obtained for more moderate shocks $\delta = \pm 1$,
but the differences are less pronounced. These simulations suggest that the semi-nonparametric
sieve estimator can produce substantially better IRF estimates in large samples than methods
involving nonlinear transformations selected a priori.

Figure 5: Simulations results for DGP 7. [Panels: (a) $\delta = +2$, (b) $\delta = -2$; $n = 2400$; series: Sieve, OLS.]
In this setup, it is also important to highlight the fact that the poor performance of OLS
IRF estimates does not come from $\varphi(x)$ being “complex”, and, thus, hard to approximate by
combinations of simple functions. In fact, if in DGP 2′ function $\varphi$ is replaced by
$\tilde{\varphi}(x) := \varphi(x + 1)$, the differences between sieve and OLS impulse response estimates become minimal in
simulations, with the bias of the latter decreasing by approximately an order of magnitude (see
Figure 8 in Appendix C). This is simply due to the fact that $\tilde{\varphi}(x)$ is well approximated by $\max(0, x)$
directly. However, one then requires either prior knowledge or sheer luck when constructing the
nonlinear transforms of $X_t$ for an OLS regression. The proposed series estimator, instead,
just requires an appropriate choice of sieve. Many data-driven procedures to select sieves in
applications have been proposed, see for example the discussion in Kang (2021).
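The point about the shifted transform can be illustrated numerically. The short Python sketch below
compares both $\varphi$ and its shifted version with $\max(0, x)$ by measuring the largest pointwise
gap on a grid over $[-5, 5]$; the grid, the gap measure and the function names are choices made for
this illustration only.

```python
import numpy as np


def phi(x):
    """Smooth transform used in DGP 2': (x - 1) * (0.5 + tanh(x - 1) / 2)."""
    x = np.asarray(x, dtype=float)
    return (x - 1.0) * (0.5 + np.tanh(x - 1.0) / 2.0)


def phi_shifted(x):
    """Shifted version phi(x + 1), which bends around zero instead of around one."""
    return phi(np.asarray(x, dtype=float) + 1.0)


def max_gap_to_relu(func, lo=-5.0, hi=5.0, n_grid=2001):
    """Largest absolute difference between func and max(0, x) on an evenly spaced grid."""
    grid = np.linspace(lo, hi, n_grid)
    return float(np.max(np.abs(func(grid) - np.maximum(0.0, grid))))
```

Comparing `max_gap_to_relu(phi)` with `max_gap_to_relu(phi_shifted)` shows how much closer the
shifted function lies to the negative-censoring transform used in the OLS regression.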
6 Empirical Applications
In this section, I showcase the practical utility of the proposed semi-nonparametric sieve estima-
tor by considering two applied exercises. First, I revisit the empirical analysis of Gonçalves et al.
(2021), which is itself based on the work of Tenreyro and Thwaites (2016). This provides both
linear and nonlinear benchmarks for the monetary policy responses within a compact econo-
metric model. I find that, although the differences between approaches are mild, nonparametric
IRFs in fact provide counter-evidence to the conclusions reported by Gonçalves et al. (2021). In
the second application, I compare the linear and nonlinear impulse responses that are produced
by uncertainty shocks in the setup studied by Istrefi and Mouabbi (2018). Here, sieve-estimated
IRFs show differences in shape, timing and intensity, chiefly when the sign of the shock changes.
6.1 Monetary Policy Shocks
The objective of the empirical analysis in Gonçalves et al. (2021) is to analyze the effects of a
monetary policy shock on a model of the US macroeconomy. Structural identification is achieved
via a narrative approach, following the seminal work of Romer and Romer (2004).
The four-variable model is set up identically to the one of Gonçalves et al. (2021), Section
7. Let $Z_t = (X_t, \mathrm{FFR}_t, \mathrm{GDP}_t, \mathrm{PCE}_t)'$, where $X_t$ is the series of narrative U.S. monetary policy
shocks, $\mathrm{FFR}_t$ is the federal funds rate, $\mathrm{GDP}_t$ is log real GDP and $\mathrm{PCE}_t$ is PCE inflation.²¹ As
a pre-processing step, GDP is transformed to log GDP and then linearly detrended. The data
is available quarterly and spans from 1969:Q1 to 2007:Q4. As in Tenreyro and Thwaites (2016),
I use a model with one lag, $p = 1$. Narrative shock $X_t$ is considered to be an i.i.d. sequence,
i.e. $X_t = \epsilon_{1t}$, therefore I assume no dependence on lagged variables when implementing
pseudo-reduced form (2). Like in Gonçalves et al. (2021), I consider positive and negative shocks of size
$|\delta| = 1$. As such, I choose
$$\rho(z) = \mathbb{I}\{|z| \leq 4\} \exp\left( 1 + \left[ \left( \frac{z}{4} \right)^{6} - 1 \right]^{-1} \right)$$
to be the shock relaxation function. Figure 10 in Appendix C provides a check for the validity of
$\rho$ given the sample distribution of $X_t$. Knots for sieve estimation are located at $\{-1, 0, 1\}$. The
model is block-recursive, and the structural formulation of Section 2.1 allows identifying the U.S.
monetary policy shocks without the need to impose additional assumptions on the remaining
shocks. Gonçalves et al. (2021), following Tenreyro and Thwaites (2016), use two nonlinear
transformations, $F(x) = \max(0, x)$ and $F(x) = x^3$, to try to gauge how negative versus positive
and large versus small shocks, respectively, affect the U.S. macroeconomy. For clarity, below
I refer to this approach as “parametric nonlinear method”. Since the authors find that the
inclusion of a cubic term does not meaningfully change impulse responses, I focus on comparing
the IRFs estimated via sieve regression with the ones obtained by setting $F(x) = \max(0, x)$, as
well as by not including nonlinear terms (i.e. linear IRFs).

²¹ In Gonçalves et al. (2021) p. 122, it is mentioned that CPI inflation is included in the model, but both
in the replication package made available by one of the authors (https://sites.google.com/site/lkilian2019/
research/code) from which I source the data, and Tenreyro and Thwaites (2016), PCE inflation is used instead.
Moreover, the authors say that both the FFR and PCE enter the model in first differences, yet in their code these
variables are kept in levels. I keep their original formulation to allow for a proper comparison between estimation
methods.
Figure 6 shows the estimated impulse response to both a positive and negative unforeseen
monetary policy shock. The impact on the federal funds rate is consistent across all three
procedures, but there are important differences in GDP and inflation responses. In case of an
exogenous monetary tightening, the sieve response for GDP, unlike in
the case of linear and parametric nonlinear IRFs, is nearly zero at impact and decreases
monotonically until around 10 quarters ahead. The change in shape is meaningful, as the procedure
of Gonçalves et al. (2021) still yields a small short-term upward jump in GDP when a monetary
tightening shock hits. Moreover, after the positive shock, the sieve GDP response reaches its
lowest value 4 and 2 quarters before the linear and parametric nonlinear responses, while its size
is 13% and 16% larger, respectively.²² Finally, the sieve PCE response is positive for a shorter
interval, but looks to be more persistent once it turns negative also 10 months after impact.
When the shock is expansionary, sieve IRFs show a pronounced asymmetry, even more than
that of parametric nonlinear responses. One can observe that the semi-nonparametric federal funds
rate IRF is marginally mitigated compared to the alternative estimates. An important puzzle
is due to the clearly negative impact on GDP. Indeed, both types of nonlinear responses show
a drop in output in the first 5 quarters. Also note that PCE inflation has a positive spike in
the first couple of quarters after impact. Such a quick change seems unrealistic, as one does not
expect inflation to suddenly reverse sign, but, as Gonçalves et al. (2021) also remark, the overall
impact on inflation of both shocks is small when compared to the change in federal funds rate.

²² The strength of this effect changes across different shock sizes, as Figure 12 in Appendix C shows. As
shock sizes get smaller, nonlinear IRFs, both parametric and sieve, show decreasing negative effects.

Figure 6: Effect of an unexpected U.S. monetary policy shock on federal funds rate, GDP and
inflation. Linear (gray, dashed), parametric nonlinear with $F(x) = \max(0, x)$ (red, point-dashed)
and sieve (blue, solid) structural impulse responses. For $\delta = +1$, the lowest point of the GDP
response is marked with a dot. [Panels: Fed Funds Rate, Log Real GDP, PCE Inflation for
(a) $\delta = +1$ and (b) $\delta = -1$; horizontal axis: Quarters.]
This comparison between methods, and specifically the nature of nonparametric impulse
responses, provides evidence that a small econometric model, such as the one studied by Tenreyro
and Thwaites (2016), may be inadequate to fully capture the dynamic effects of monetary policy
shocks. In both setups, however, impulse response interpretation is only suggestive, as confidence
bands are missing and only pointwise IRFs are available. Whether the puzzles highlighted above
would persist after accounting for estimation uncertainty is an important research question that
I leave for future analysis.
6.2 Uncertainty Shocks
Uncertainty in interest rates appears to be a significant factor in recent economic history. Start-
ing with the fundamental changes brought forth by the unprecedented measures of unconven-
tional monetary policy after the 2007-2008 financial crisis, to the powerful economic stimuli
during the COVID-19 pandemic, and finally the subsequent interest rate tightening and in-
flation phenomenon of 2022, central banks and institutional agents are often very concerned
about uncertainty. Since traditional central bank policymaking is heavily guided by the prin-
ciple that the central bank can and should influence expectations, controlling the (perceived)
level of ambiguity in current and future commitments is key.
Istrefi and Mouabbi (2018) provide an analysis of the impact of unforeseen changes in the
level of subjective interest rate uncertainty on the macroeconomy. They derive a collection of
new indices based on short- and long-term professional forecasts. Their empirical study goes in
depth into studying the different components that play a role in transmitting uncertainty shocks,
but here I will focus on re-evaluating their structural impulse response estimates under the light
of potentially-missing nonlinear effects. For the sake of simplicity, my evaluation will focus only
on the 3-months-ahead uncertainty measure for short-term interest rate maturities (3M3M) and
the US economy.23
Like in Istrefi and Mouabbi (2018), let $Z_t = (X_t, \mathrm{IP}_t, \mathrm{CPI}_t, \mathrm{PPI}_t, \mathrm{RT}_t, \mathrm{UR}_t)'$ be a vector
where $X_t$ is the chosen uncertainty measure, $\mathrm{IP}_t$ is the (log) industrial production index, $\mathrm{CPI}_t$
is the CPI inflation rate, $\mathrm{PPI}_t$ is the producer price inflation rate, $\mathrm{RT}_t$ is (log) retail sales and
$\mathrm{UR}_t$ is the unemployment rate. The nonlinear model specification is given by
$$Z_t = \mu + A_1 Z_{t-1} + A_2 Z_{t-2} + F_1(X_{t-1}) + F_2(X_{t-2}) + D W_t + u_t,$$
where $W_t$ includes a linear time trend and oil price $\mathrm{OIL}_t$.²⁴ The data has monthly frequency and
spans the period between May 1993 and July 2015.²⁵ Note here that, following the identification
strategy of Gonçalves et al. (2021), nonlinear functions $F_1$ and $F_2$ are to be understood as not
affecting $X_t$, which is the structural variable. The linear VAR specification of Istrefi and Mouabbi
(2018) is recovered by simply assuming $F_1 = F_2 = 0$ prior to estimation. Since they use recursive
identification and order the uncertainty measure first, this model too is block-recursive.
I consider a positive shock with intensity $\delta = \sigma_{\epsilon,1}$, where $\sigma_{\epsilon,1}$ is the standard deviation of
structural innovations. In this empirical exercise, the relaxation function is given by
$$\rho(z) = \mathbb{I}\left\{ |z| \leq \tfrac{1}{4} \right\} \exp\left( 1 + \left[ |4z|^{8} - 1 \right]^{-1} \right)$$
and I set $\{0.1, 0.3\}$ to be the cubic spline knots. As 3M3M is a non-negative measure of
uncertainty, some care must be taken to make sure that the shocked paths for $X_t$ do not reach
negative values. Figure 14 in Appendix C shows that the relaxation function is compatible, and
also that the shocked nonlinear paths of $X_t$ with impulse $\delta$ and $\delta'$ all do not cross below zero.
Figure 7 presents both the linear and nonlinear structural impulse responses obtained.
Importantly, even though Istrefi and Mouabbi (2018) estimate a Bayesian VAR model and here
I consider a frequentist vector autoregressive benchmark, the shape of the IRFs is retained, cf.
the median response in the top row of their Figure 4. When uncertainty increases, industrial
production drops, and the size and extent of this decrease are intensified in the nonlinear responses.
In fact, the sieve IP response reaches a value that is 54% lower than that of the respective linear
IRF.²⁶ A similar behavior holds true for retail sales (38% lower) and unemployment (23% higher),
indicating that this shock is more profoundly contractionary than suggested by the linear VAR
model. Further, CPI and PP inflation both show short-term fluctuations which strengthen the
short- and medium-term impact of the shock. CPI and PP nonlinear inflation responses are
76% and 41% stronger than their linear counterparts, respectively. These differences suggest
that linear IRFs might be both under-estimating the short-term intensity and misrepresenting
long-term persistence of inflation reactions. From another perspective, Nowzohour and Stracca
(2020) presented evidence that consumer consumption growth, credit growth and unemployment
do not co-move with the policy uncertainty index (EPU) of Baker et al. (2016), but are negatively
correlated with financial volatility. Given the strength of nonlinear IRFs, this discrepancy may
also suggest that the 3M3M uncertainty measure partially captures the financial channel, too.

²³ Istrefi and Mouabbi (2018) also provide comparisons with results obtained with the other uncertainty
measures, which they comment are all very similar to the ones obtained with 3M3M. Their paper additionally evaluates
a number of other highly developed countries.
²⁴ Inclusion of linear exogenous variables in the semi-nonparametric theoretical framework detailed in Section 3
is straightforward as long as one can assume that they are stationary and weakly dependent. The choice of using
$p = 2$ is identical to that of the original authors, based on BIC.
²⁵ I reuse the original data employed by the authors, who kindly shared it upon request, but rescale retail sales
($\mathrm{RT}_t$) so that the level on January 2000 equals 100.

Figure 7: Effect of an unexpected, one-standard-deviation uncertainty shock to US macroeconomic
variables. Linear (gray, dashed) and sieve (blue, solid) structural impulse responses. The
extreme points of the responses are marked with a dot. [Panels: 3M3M Uncertainty, IP, CPI Inflation,
PP Inflation, Retail, Unemployment; horizontal axis: Months.]
The introduction of nonlinear terms in the structural VAR of Istrefi and Mouabbi (2018)
thus provides evidence that fundamental impulse response features might otherwise be missed.
Indeed, Figure 13 in Appendix C, which plots regression functions of endogenous variables
with respect to $X_t$, shows that high and low uncertainty levels may have significantly different
effects on endogenous economic variables. In particular, at the second lag, tail effects appear to
be milder, while at low levels changes in uncertainty have more pronounced impact.

²⁶ Figure 15 in Appendix C confirms that this difference is consistent over a range of shock sizes, too.
7 Conclusion
This paper studies the application of semi-nonparametric series estimation to the problem of
structural impulse response analysis for time series. After first discussing the partial identifi-
cation model setup, I have used the conditions of system contractivity and stability to derive
physical measures of the dependence for nonlinear systems. In turn, these allow me to derive
primitive conditions under which series estimation can be employed and structural IRFs are
consistently estimated. The simulation results show that this approach is valid in moderate
samples and has the added benefit of being robust to misspecification of the nonlinear model
components. Finally, two empirical applications showcase the utility in departing from both
linear and parametric nonlinear specifications when estimating structural responses.
There are many possible avenues for extending the results I have presented here. A key
aspect that I have not touched upon is inference in the form of confidence intervals: the theory
of Chen and Christensen (2015) does not encompass uniform inference, and, as such, additional
results have to be developed. Indeed, (uniform) confidence bands are necessary to properly
quantify the uncertainty of IRF estimates. Belloni et al. (2015) give a uniform asymptotic
inference theory, but their derivations are limited to non-dependent data. Li and Liao (2020)
and Cattaneo et al. (2022) provide theoretical coupling results that could be exploited in order
to handle time series data. Chen and Christensen (2018) give a theory of uniform inference for
panel IV setups, which could possibly be generalized to handle nonlinear IRFs. In the spirit
of Kang (2021), it would also be important to derive inference results that are uniform in the
selection of series terms, as, in practice, a data-driven procedure for selecting $K$ should be used.
Studying other sieve spaces, such as neural networks or shape-preserving sieves (Chen, 2007),
would also be highly desirable. The latter can be especially useful in contexts where economic
knowledge suggests that the nonlinear components of the model are e.g. strictly monotonic
increasing or convex. Finally, sharpening of convergence rates used in the main proofs is of
independent interest.
References
Auerbach, A. J. and Gorodnichenko, Y. (2012). Measuring the Output Responses to Fiscal
Policy. American Economic Journal: Economic Policy, 4(2):1–27.
Baker, S. R., Bloom, N., and Davis, S. J. (2016). Measuring Economic Policy Uncertainty. The
Quarterly Journal of Economics, 131(4):1593–1636.
Belloni, A., Chernozhukov, V., Chetverikov, D., and Kato, K. (2015). Some new asymptotic
theory for least squares series: Pointwise and uniform results. Journal of Econometrics,
186(2):345–366.
Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods. Springer Science
& Business Media.
Caggiano, G., Castelnuovo, E., Colombo, V., and Nodari, G. (2015). Estimating Fiscal Multi-
pliers: News From A Non-linear World. The Economic Journal, 125(584):746–776.
Caggiano, G., Castelnuovo, E., and Figueres, J. M. (2017). Economic policy uncertainty and
unemployment in the United States: A nonlinear approach. Economics Letters, 151:31–34.
Caggiano, G., Castelnuovo, E., and Pellegrino, G. (2021). Uncertainty shocks and the great
recession: Nonlinearities matter. Economics Letters, 198:109669.
Cattaneo, M. D., Farrell, M. H., and Feng, Y. (2020). Large sample properties of partitioning-
based series estimators. The Annals of Statistics, 48(3):1718–1741.
Cattaneo, M. D., Masini, R. P., and Underwood, W. G. (2022). Yurinskii’s Coupling for Mar-
tingales. Working Paper.
Chen, X. (2007). Chapter 76 Large Sample Sieve Estimation of Semi-Nonparametric Models.
In Heckman, J. J. and Leamer, E. E., editors, Handbook of Econometrics, volume 6, pages
5549–5632. Elsevier.
Chen, X. (2013). Penalized Sieve Estimation and Inference of Seminonparametric Dynamic
Models: A Selective Review. In Acemoglu, D., Arellano, M., and Dekel, E., editors, Advances
in Economics and Econometrics, pages 485–544. Cambridge University Press, 1 edition.
Chen, X. and Christensen, T. M. (2015). Optimal uniform convergence rates and asymptotic
normality for series estimators under weak dependence and weak conditions. Journal of
Econometrics, 188(2):447–465.
Chen, X. and Christensen, T. M. (2018). Optimal sup-norm rates and uniform inference on
nonlinear functionals of nonparametric IV regression. Quantitative Economics, 9(1):39–84.
Chen, X., Shao, Q.-M., Wu, W. B., and Xu, L. (2016). Self-normalized Cramér-type moderate
deviations under dependence. The Annals of Statistics, 44(4):1593–1617.
Chen, X. and Shen, X. (1998). Sieve Extremum Estimates for Weakly Dependent Data. Econo-
metrica, 66(2):289.
Chetverikov, D., Santos, A., and Shaikh, A. M. (2018). The Econometrics of Shape Restrictions.
Annual Review of Economics, 10(1):31–63.
Debortoli, D., Forni, M., Gambetti, L., and Sala, L. (2020). Asymmetric Effects of Monetary
Policy Easing and Tightening. Working Paper.
Fan, J. and Yao, Q. (2003). Nonlinear time series: nonparametric and parametric methods,
volume 20. Springer.
Farrell, M. H., Liang, T., and Misra, S. (2021). Deep Neural Networks for Estimation and
Inference. Econometrica, 89(1):181–213.
Feng, B. Q. (2003). Equivalence constants for certain matrix norms. Linear Algebra and its
Applications, 374:247–253.
Forni, M., Gambetti, L., Maffei-Faccioli, N., and Sala, L. (2023a). Nonlinear transmission of
financial shocks: Some new evidence. Journal of Money, Credit and Banking.
Forni, M., Gambetti, L., and Sala, L. (2023b). Asymmetric effects of news through uncertainty.
Macroeconomic Dynamics, pages 1–25.
Freyberger, J. and Reeves, B. (2018). Inference under Shape Restrictions. Working Paper.
Fuleky, P., editor (2020). Macroeconomic Forecasting in the Era of Big Data: Theory and
Practice, volume 52 of Advanced Studies in Theoretical and Applied Econometrics. Springer
International Publishing, Cham.
Gambetti, L., Maffei-Faccioli, N., and Zoi, S. (2022). Bad News, Good News: Coverage and
Response Asymmetries. Working Paper.
Gao, J. (2007). Nonlinear Time Series: Semiparametric and Nonparametric Methods. Chapman
and Hall/CRC.
Gonçalves, S., Herrera, A. M., Kilian, L., and Pesavento, E. (2021). Impulse response analysis for
structural dynamic models with nonlinear regressors. Journal of Econometrics, 225(1):107–
130.
Gourieroux, C. and Jasiak, J. (2005). Nonlinear Innovations and Impulse Responses with Ap-
plication to VaR Sensitivity. Annales d’Économie et de Statistique, pages 1–31.
Gourieroux, C. and Lee, Q. (2023). Nonlinear impulse response functions and local projections.
Working Paper.
Hamilton, J. D. (1994a). State-space models. Handbook of Econometrics, 4:3039–3080.
Hamilton, J. D. (1994b). Time Series Analysis. Princeton University Press.
Härdle, W., Lütkepohl, H., and Chen, R. (1997). A Review of Nonparametric Time Series
Analysis. International Statistical Review, 65(1):49–72.
Horn, R. A. and Johnson, C. R. (2012). Matrix Analysis. Cambridge University Press, second
edition.
Horowitz, J. L. and Lee, S. (2017). Nonparametric estimation and inference under shape restric-
tions. Journal of Econometrics, 201(1):108–126.
Huang, Y., Chen, X., and Wu, W. B. (2014). Recursive Nonparametric Estimation for Time
Series. IEEE Transactions on Information Theory, 60(2):1301–1312.
Istrefi, K. and Mouabbi, S. (2018). Subjective interest rate uncertainty and the macroeconomy:
A cross-country analysis. Journal of International Money and Finance, 88:296–313.
Jordà, Ò. (2005). Estimation and Inference of Impulse Responses by Local Projections. American
Economic Review, 95(1):161–182.
Kanazawa, N. (2020). Radial basis functions neural networks for nonlinear time series analysis
and time-varying effects of supply shocks. Journal of Macroeconomics, 64:103210.
Kang, B. (2021). Inference In Nonparametric Series Estimation with Specification Searches for
the Number of Series Terms. Econometric Theory, 37(2):311–345.
Kilian, L. and Lütkepohl, H. (2017). Structural Vector Autoregressive Analysis. Themes in
Modern Econometrics. Cambridge University Press, Cambridge.
Kilian, L. and Vega, C. (2011). Do energy prices respond to US macroeconomic news? A test of
the hypothesis of predetermined energy prices. Review of Economics and Statistics, 93(2):660–
671.
Kilian, L. and Vigfusson, R. J. (2011). Are the responses of the US economy asymmetric in
energy price increases and decreases? Quantitative Economics, 2(3):419–453.
Koop, G., Pesaran, M. H., and Potter, S. M. (1996). Impulse response analysis in nonlinear
multivariate models. Journal of Econometrics, 74(1):119–147.
Lanne, M. and Nyberg, H. (2023). Nonparametric Impulse Response Analysis in Changing
Macroeconomic Conditions. Working Paper.
Li, J. and Liao, Z. (2020). Uniform nonparametric inference for time series. Journal of Econo-
metrics, page 14.
Li, Q. and Racine, J. S. (2009). Nonparametric econometric methods. Emerald Group Publishing.
Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. New York : Springer,
Berlin.
Movahedifar, M. and Dickhaus, T. (2023). On the closed-loop Volterra method for analyzing
time series. Working Paper.
Nowzohour, L. and Stracca, L. (2020). More than a feeling: Confidence, uncertainty, and
macroeconomic fluctuations. Journal of Economic Surveys, 34(4):691–726.
Pellegrino, G. (2021). Uncertainty and monetary policy in the US: A journey into nonlinear
territory. Economic Inquiry, 59(3):1106–1128.
Pötscher, B. M. and Prucha, I. (1997). Dynamic nonlinear econometric models: Asymptotic
theory. Springer Science & Business Media.
Potter, S. M. (2000). Nonlinear impulse response functions. Journal of Economic Dynamics
and Control, 24(10):1425–1446.
Ramey, V. A. and Zubairy, S. (2018). Government spending multipliers in good times and in
bad: Evidence from US historical data. Journal of Political Economy, 126(2):850–901.
Romer, C. D. and Romer, D. H. (2004). A new measure of monetary shocks: Derivation and
implications. American Economic Review, 94(4):1055–1084.
Sims, C. A. (1980). Macroeconomics and Reality. Econometrica, 48(1):1–48.
Sirotko-Sibirskaya, N., Franz, M. O., and Dickhaus, T. (2020). Volterra bootstrap: Resampling
higher-order statistics for strictly stationary univariate time series. Working Paper.
Stock, J. H. and Watson, M. W. (2016). Dynamic factor models, factor-augmented vector
autoregressions, and structural vector autoregressions in macroeconomics. In Handbook of
Macroeconomics, volume 2, pages 415–525. Elsevier.
Tenreyro, S. and Thwaites, G. (2016). Pushing on a string: US monetary policy is less powerful
in recessions. American Economic Journal: Macroeconomics, 8(4):43–74.
Teräsvirta, T., Tjøstheim, D., and Granger, C. W. J. (2010). Modelling Nonlinear Economic
Time Series. Oxford University Press.
Tong, H. (1990). Non-linear Time Series: a Dynamical System Approach. Oxford University
Press.
Tropp, J. A. (2012). User-Friendly Tail Bounds for Sums of Random Matrices. Foundations of
Computational Mathematics, 12(4):389–434.
Tsay, R. S. and Chen, R. (2018). Nonlinear Time Series Analysis, volume 891. John Wiley &
Sons.
Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer Series in Statistics.
Springer, New York ; London.
Wu, W. B. (2005). Nonlinear system theory: Another look at dependence. Proceedings of the
National Academy of Sciences, 102(40):14150–14154.
Wu, W. B. (2011). Asymptotic theory for stationary processes. Statistics and its Interface,
4(2):207–226.
Wu, W. B., Huang, Y., and Huang, Y. (2010). Kernel estimation for time series: An asymptotic
theory. Stochastic Processes and their Applications, 120(12):2412–2431.
Appendix
A Preliminaries
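Matrix Norms. Let
\[ \|A\|_r := \max\left\{ \|Ax\|_r \;\middle|\; \|x\|_r \le 1 \right\} \]
be the $r$-operator norm of matrix $A \in \mathbb{C}^{d_1 \times d_2}$. The following Theorem establishes the equivalence
between different operator norms as well as the compatibility constants.
Theorem A.1 (Feng (2003)). Let $1 \le p, q \le \infty$. Then for all $A \in \mathbb{C}^{d_1 \times d_2}$,
\[ \|A\|_p \le \lambda_{p,q}(d_1)\, \lambda_{q,p}(d_2)\, \|A\|_q, \]
where
\[ \lambda_{a,b}(d) := \begin{cases} 1 & \text{if } a \ge b, \\ d^{1/a - 1/b} & \text{if } a < b. \end{cases} \]
This norm inequality is sharp.
In particular, if $p > q$ then it holds
\[ \frac{1}{(d_2)^{1/q - 1/p}} \|A\|_p \le \|A\|_q \le (d_1)^{1/q - 1/p} \|A\|_p. \]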
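As a quick numerical sanity check of Theorem A.1, the sketch below evaluates the inequality on random real matrices for the three operator norms that NumPy computes directly ($p, q \in \{1, 2, \infty\}$). The helper names `lam` and `feng_bound_holds`, the matrix size and the random seed are illustrative assumptions, not part of the paper.

```python
import itertools
import numpy as np

def lam(a, b, d):
    """Compatibility constant lambda_{a,b}(d) as defined in Theorem A.1."""
    return 1.0 if a >= b else float(d) ** (1.0 / a - 1.0 / b)

def feng_bound_holds(A, p, q, tol=1e-10):
    """Check ||A||_p <= lambda_{p,q}(d1) * lambda_{q,p}(d2) * ||A||_q."""
    d1, d2 = A.shape
    lhs = np.linalg.norm(A, ord=p)   # operator norm for ord in {1, 2, inf}
    rhs = lam(p, q, d1) * lam(q, p, d2) * np.linalg.norm(A, ord=q)
    return lhs <= rhs + tol

rng = np.random.default_rng(0)
orders = [1, 2, np.inf]
results = [
    feng_bound_holds(rng.standard_normal((4, 7)), p, q)
    for p, q in itertools.product(orders, orders)
    for _ in range(50)
]
all_hold = all(results)
```

Only real matrices and three norm orders are covered here, so this is an illustration of the statement rather than a verification of it.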
B Proofs
B.1 GMC Conditions and Proposition 3.1
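Lemma B.1. Assume that $\{\epsilon_t\}_{t \in \mathbb{Z}}$, $\epsilon_t \in \mathcal{E} \subseteq \mathbb{R}^{d_Z}$, are i.i.d., and $\{Z_t\}_{t \in \mathbb{Z}}$ is generated according
to
\[ Z_t = G(Z_{t-1}, \epsilon_t), \]
where $Z_t \in \mathcal{Z} \subseteq \mathbb{R}^{d_Z}$ and $G$ is a measurable function. If either
(a) Contractivity conditions (11)-(12) hold, $\sup_{t \in \mathbb{Z}} \|\epsilon_t\|_{L^r} < \infty$ and $\|G(z, \epsilon)\| < \infty$ for some
$(z, \epsilon) \in \mathcal{Z} \times \mathcal{E}$;
(b) Stability conditions (13)-(14) hold, $\sup_{t \in \mathbb{Z}} \|\epsilon_t\|_{L^r} < \infty$ and $\|\partial G / \partial Z\| \le M_Z < \infty$;
then
\[ \sup_t \|Z_t\|_{L^r} < \infty \quad \text{w.p.1.} \]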
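Proof.
(a) In a first step, we show that, given event $\omega \in \Omega$, realization $Z_t(\omega)$ is unique with probability
one. To do this, introduce initial condition $z_\circ$ for $\ell > 1$ such that $z_\circ \in \mathcal{Z}$ and $\|z_\circ\| < \infty$.
Define
\[ Z_t^{(-\ell)}(\omega) = G^{(\ell)}(z_\circ, \epsilon_{t-\ell+1:t}(\omega)). \]
Further, let $Z_t'^{(-\ell)}$ be the realization with initial condition $z_\circ' \neq z_\circ$ and innovation realizations
$\epsilon_{t-\ell+1:t}(\omega)$. Note that
\[ \left\| Z_t^{(-\ell)}(\omega) - Z_t'^{(-\ell)}(\omega) \right\| \le C_Z^\ell \left\| z_\circ - z_\circ' \right\|, \]
which goes to zero as $\ell \to \infty$. Therefore, if we set $Z_t(\omega) := \lim_{\ell \to \infty} Z_t^{(-\ell)}(\omega)$, $Z_t(\omega)$ is
unique with respect to the choice of $z_\circ$ w.p.1. A similar recursion shows that
\[ \left\| Z_t^{(-\ell)}(\omega) \right\| \le C_Z^\ell \|z_\circ\| + \sum_{k=0}^{\ell-1} C_Z^k C_\epsilon \|\epsilon_{t-k}(\omega)\|. \]
By norm equivalence, this implies
\[ \left\| Z_t^{(-\ell)} \right\|_{L^r} \le C_Z^\ell \|z_\circ\|_r + \sum_{k=0}^{\ell-1} C_Z^k C_\epsilon \|\epsilon_{t-k}\|_{L^r} \le C_Z^\ell \|z_\circ\|_r + (1 - C_Z)^{-1} C_\epsilon \sup_{t \in \mathbb{Z}} \|\epsilon_t\|_{L^r} < \infty, \]
and taking the limit $\ell \to \infty$ proves the claim.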
(b) Consider again distinct initial conditions $z_\circ' \neq z_\circ$ and innovation realizations $\epsilon_{t-\ell+1:t}(\omega)$,
yielding $Z_t'^{(-\ell)}(\omega)$ and $Z_t^{(-\ell)}(\omega)$, respectively. We may use the contraction bound derived
in the proof of Proposition 3.1 (b) below, that is,
\[ \left\| Z_t^{(-\ell)}(\omega) - Z_t'^{(-\ell)}(\omega) \right\|_r \le C_Z^\ell C_2 \left\| z_\circ - z_\circ' \right\|_r, \]
where $C_2 > 0$ is a constant. With trivial adjustments, the uniqueness and limit arguments
used for (a) above apply here too.
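The "forgetting" of the initial condition used in the proof can be illustrated with a small simulation. The sketch below assumes a toy map $G(z, \epsilon) = C_Z \tanh(z) + \epsilon$, which is Lipschitz in $z$ with constant $C_Z$; this map, the value $C_Z = 0.7$ and all names are assumptions chosen for illustration and are not the model of the paper.

```python
import numpy as np

C_Z = 0.7  # assumed contraction constant, 0 <= C_Z < 1

def G(z, eps):
    """Toy contractive map: Lipschitz in z with constant C_Z (tanh is 1-Lipschitz)."""
    return C_Z * np.tanh(z) + eps

def gap_path(z0, z0_alt, innovations):
    """Euclidean distance between two paths driven by the same innovations."""
    z, z_alt, gaps = z0.copy(), z0_alt.copy(), []
    for eps in innovations:
        z, z_alt = G(z, eps), G(z_alt, eps)
        gaps.append(np.linalg.norm(z - z_alt))
    return np.array(gaps)

rng = np.random.default_rng(1)
innovations = rng.standard_normal((40, 3))        # common innovation sequence
z0, z0_alt = np.full(3, 5.0), np.full(3, -5.0)    # two distinct initial conditions
gaps = gap_path(z0, z0_alt, innovations)
bounds = C_Z ** np.arange(1, 41) * np.linalg.norm(z0 - z0_alt)  # C_Z^l * ||z0 - z0'||
```

Comparing `gaps` with `bounds` elementwise shows the geometric decay $C_Z^\ell \|z_\circ - z_\circ'\|$ used to define the limit $Z_t(\omega)$.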
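Proof of Proposition 3.1.
(a) By assumption, for all $(z, z') \in \mathcal{Z} \times \mathcal{Z}$ and $(e, e') \in \mathcal{E} \times \mathcal{E}$,
\[ \|G(z, e) - G(z', e')\| \le C_Z \|z - z'\| + C_\epsilon \|e - e'\|, \]
where $0 \le C_Z < 1$ and $0 \le C_\epsilon < \infty$. The equivalence of norms directly generalizes
this inequality to any $r$-norm for $r > 2$. We study $\|Z_{t+h} - Z'_{t+h}\|_r$ where $Z'_{t+h}$ is constructed
with a time-$t$ perturbation of the history of $Z_{t+h}$. Therefore, for any given $t$ and $h \ge 1$ it
holds that
\[ \left\| Z_{t+h} - G^{(h)}(Z'_t, \epsilon_{t+1:t+h}) \right\|_r \le C_Z \left\| G^{(h-1)}(Z_t, \epsilon_{t+1:t+h-1}) - G^{(h-1)}(Z'_t, \epsilon_{t+1:t+h-1}) \right\|_r \le C_Z^h \|Z_t - Z'_t\|_r, \]
since sequence $\epsilon_{t+1:t+h}$ is common between $Z_{t+h}$ and $Z'_{t+h}$. Clearly then
\[ \left\| Z_{t+h} - G^{(h)}(Z'_t, \epsilon_{t+1:t+h}) \right\|_r \le 2 \|Z_t\|_r \exp(-\gamma h) \]
for $\gamma = -\log(C_Z)$. Letting $a = 2\|Z_t\|_r$ and shifting time index $t$ backward by $h$, since
$\sup_t \|Z_t\|_{L^r} < \infty$ w.p.1 from Lemma B.1 the result for $L^r$ follows with $\tau = 1$.
(b) Proceed similarly to (a), but notice that now we must handle cases of steps $1 \le h < h^*$.
Consider iterate $h^* + 1$, for which
\[ \left\| Z_{t+h+1} - G^{(h+1)}(Z'_t, \epsilon_{t+1:t+h+1}) \right\|_r \le C_Z \left\| G^{(h)}(G(Z_t, \epsilon_{t+1}), \epsilon_{t+2:t+h}) - G^{(h)}(G(Z'_t, \epsilon_{t+1}), \epsilon_{t+2:t+h}) \right\|_r \]
\[ \le C_Z^h \left\| G(Z_t, \epsilon_{t+1}) - G(Z'_t, \epsilon_{t+1}) \right\|_r \le C_Z^h M_Z \|Z_t - Z'_t\|_r \]
by the mean value theorem. Here we may assume that $M_Z \ge 1$, otherwise we would fall
under case (a), so that $M_Z \le M_Z^2 \le \ldots \le M_Z^{h^*-1}$. More generally,
\[ \left\| Z_{t+h+1} - G^{(h+1)}(Z'_t, \epsilon_{t+1:t+h+1}) \right\|_r \le C_Z^{j(h)} \max\{M_Z^{h^*-1}, 1\} \|Z_t - Z'_t\|_r \]
for $j(h) := \lfloor h/h^* \rfloor$. Result (b) then follows by noting that $j(h) \ge h/h^* - 1$ and then
proceeding as in (a) to derive GMC coefficients.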
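The final step of case (b) can be spelled out as follows. This is a sketch under the additional assumption $0 < C_Z < 1$ (so that $\log C_Z$ is finite); the symbols $\gamma_b$ and $a_b$ are introduced here only for illustration and do not appear in the paper. Since $j(h) \ge h/h^* - 1$ and $C_Z < 1$,

\[ C_Z^{j(h)} \le C_Z^{h/h^* - 1} = C_Z^{-1} \exp(-\gamma_b h), \qquad \gamma_b := -\frac{\log C_Z}{h^*} > 0, \]

so that

\[ \left\| Z_{t+h+1} - G^{(h+1)}(Z'_t, \epsilon_{t+1:t+h+1}) \right\|_r \le a_b \exp(-\gamma_b h) \, \|Z_t - Z'_t\|_r, \qquad a_b := C_Z^{-1} \max\{M_Z^{h^*-1}, 1\}. \]

Relative to case (a), the decay rate is slowed by the factor $1/h^*$ and the leading constant is inflated, but the bound remains geometric in $h$.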
Companion and Lagged Vectors. The assumption of GMC for a process translates naturally
to vectors that are composed of stacked lags of realizations. This, for example, is important
in the discussion of Section 3 when imposing Assumption 9, since one needs that series regressors
$\{W_{2t}\}_{t \in \mathbb{Z}}$ be GMC.
Recall that $W_{2t} = (X_t, X_{t-1}, \ldots, X_{t-p}, Y_{t-1}, \ldots, Y_{t-p}, \epsilon_{1t})$. Here we shall reorder this vector
slightly to be
\[ W_{2t} = (X_t, X_{t-1}, Y_{t-1}, \ldots, X_{t-p}, Y_{t-p}, \epsilon_{1t}). \]
For $h > 0$ and $1 \le j \le h$, let $Z'_{t+j} := \Phi^{(j)}(Z'_t, \ldots, Z'_{t-p}; \epsilon_{t+1:t+j})$ be a perturbed version of
$Z_{t+j}$, where $Z'_t, \ldots, Z'_{t-p}$ are taken from an independent copy of $\{Z_t\}_{t \in \mathbb{Z}}$. Define
\[ W'_{2t} = (X'_t, X'_{t-1}, Y'_{t-1}, \ldots, X'_{t-p}, Y'_{t-p}, \epsilon_{1t}). \]
Using Minkowski's inequality
\[ \|W_{2t+h} - W'_{2t+h}\|_{L^r} \le \|X_{t+h} - X'_{t+h}\|_{L^r} + \sum_{j=1}^{p} \|Z_{t+h-j} - Z'_{t+h-j}\|_{L^r} \le \sum_{j=0}^{p} \|Z_{t+h-j} - Z'_{t+h-j}\|_{L^r}, \]
thus, since $p > 0$ is fixed and finite,
\[ \sup_t \|W_{2t+h} - W'_{2t+h}\|_{L^r} \le \sum_{j=0}^{p} \Delta_r(h - j) \le (p + 1)\, a_{1Z} \exp(-a_{2Z} h). \]
Above, $a_{1Z}$ and $a_{2Z}$ are the GMC coefficients of $\{Z_t\}_{t \in \mathbb{Z}}$.
B.2 Lemma 3.1 and Matrix Inequalities under Dependence
In order to prove Lemma 3.1, the idea is to modify the approach of Chen and Christensen (2015),
which relies on Berbee’s Lemma and an interlaced coupling, to handle variables with physical
dependence. Chen et al. (2016) provide an example on how to achieve this when working with
self-normalized sums. In what follows I modify their ideas to work with random dependent
matrices.
First of all, I recall below a Bernstein-type inequality for independent random matrices of
Tropp (2012).
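Theorem B.1. Let $\{\Xi_i\}_{i=1}^n$ be a finite sequence of independent random matrices with dimensions
$d_1 \times d_2$. Assume $E[\Xi_i] = 0$ for each $i$ and $\max_{1 \le i \le n} \|\Xi_i\| \le R_n$ and define
\[ \varsigma_n^2 := \max\left\{ \left\| \sum_{i=1}^n E\left[\Xi_i \Xi_i'\right] \right\|, \left\| \sum_{i=1}^n E\left[\Xi_i' \Xi_i\right] \right\| \right\}. \]
Then for all $z \ge 0$,
\[ P\left( \left\| \sum_{i=1}^n \Xi_i \right\| \ge z \right) \le (d_1 + d_2) \exp\left( \frac{-z^2/2}{\varsigma_n^2 + R_n z/3} \right). \]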
The main exponential matrix inequality due to Chen and Christensen (2015), Theorem 4.2
is as follows.
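Theorem B.2. Let $\{X_i\}_{i \in \mathbb{Z}}$ where $X_i \in \mathcal{X}$ be a $\beta$-mixing sequence and let $\Xi_{i,n} = \Xi_n(X_i)$ for
each $i$, where $\Xi_n : \mathcal{X} \mapsto \mathbb{R}^{d_1 \times d_2}$ is a sequence of measurable $d_1 \times d_2$ matrix-valued functions.
Assume that $E[\Xi_{i,n}] = 0$ and $\|\Xi_{i,n}\| \le R_n$ for each $i$ and define
\[ S_n^2 := \max\left\{ E\left[\|\Xi_{i,n} \Xi_{j,n}'\|\right], E\left[\|\Xi_{i,n}' \Xi_{j,n}\|\right] \right\}. \]
Let $1 \le q \le n/2$ be an integer and let $I_\bullet = \{q\lfloor n/q \rfloor + 1, \ldots, n\}$ when $q\lfloor n/q \rfloor < n$ and $I_\bullet = \emptyset$ otherwise.
Then, for all $z \ge 0$,
\[ P\left( \left\| \sum_{i=1}^n \Xi_{i,n} \right\| \ge 6z \right) \le \frac{n}{q} \beta(q) + P\left( \left\| \sum_{i \in I_\bullet} \Xi_{i,n} \right\| \ge z \right) + 2(d_1 + d_2) \exp\left( \frac{-z^2/2}{nqS_n^2 + qR_n z/3} \right), \]
where $\|\sum_{i \in I_\bullet} \Xi_{i,n}\| := 0$ whenever $I_\bullet = \emptyset$.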
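To fully extend Theorem B.2 to physical dependence, I will proceed in steps. First, I
derive a similar matrix inequality by directly assuming that random matrices $\Xi_{i,n}$ have physical
dependence coefficient $\Delta_r^\Xi(h)$. In the derivations I will use that
\[ \frac{1}{(d_2)^{1/2 - 1/r}} \|A\|_r \le \|A\|_2 \le (d_1)^{1/2 - 1/r} \|A\|_r \]
for $r \ge 2$.
Theorem B.3. Let $\{\epsilon_j\}_{j \in \mathbb{Z}}$ be a sequence of i.i.d. variables and let $\{\Xi_{i,n}\}_{i=1}^n$,
\[ \Xi_{i,n} = G_n^\Xi(\ldots, \epsilon_{i-1}, \epsilon_i) \]
for each $i$, where $\Xi_n : \mathcal{X} \mapsto \mathbb{R}^{d_1 \times d_2}$, be a sequence of measurable $d_1 \times d_2$ matrix-valued functions.
Assume that $E[\Xi_{i,n}] = 0$ and $\|\Xi_{i,n}\| \le R_n$ for each $i$ and define
\[ S_n^2 := \max\left\{ E\left[\|\Xi_{i,n} \Xi_{j,n}'\|\right], E\left[\|\Xi_{i,n}' \Xi_{j,n}\|\right] \right\}. \]
Additionally assume that $\|\Xi_{i,n}\|_{L^r} < \infty$ for $r > 2$ and define the matrix physical dependence
measure $\Delta_r^\Xi(h)$ as
\[ \Delta_r^\Xi(h) := \max_{1 \le i \le n} \left\| \Xi_{i,n} - \Xi_{i,n}^{h*} \right\|_{L^r}, \]
where $\Xi_{i,n}^{h*} := G_n^\Xi(\ldots, \epsilon^*_{i-h-1}, \epsilon^*_{i-h}, \epsilon_{i-h+1}, \ldots, \epsilon_{i-1}, \epsilon_i)$ for independent copy $\{\epsilon^*_j\}_{j \in \mathbb{Z}}$. Let $1 \le q \le n/2$
be an integer and let $I_\bullet = \{q\lfloor n/q \rfloor + 1, \ldots, n\}$ when $q\lfloor n/q \rfloor < n$ and $I_\bullet = \emptyset$ otherwise. Then, for
all $z \ge 0$,
\[ P\left( \left\| \sum_{i=1}^n \Xi_{i,n} \right\| \ge 6z \right) \le \frac{n^{r+1}}{q^r (d_2)^{r/2-1} z^r} \Delta_r^\Xi(q) + P\left( \left\| \sum_{i \in I_\bullet} \Xi_{i,n} \right\| \ge z \right) + 2(d_1 + d_2) \exp\left( \frac{-z^2/2}{nqS_n^2 + qR_n z/3} \right), \]
where $\|\sum_{i \in I_\bullet} \Xi_{i,n}\| := 0$ whenever $I_\bullet = \emptyset$.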
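Proof. To control dependence, we can adapt the interlacing block approach outlined by Chen
et al. (2016). To interlace the sum, split it into
\[ \sum_{i=1}^n \Xi_{i,n} = \sum_{j \in J_e} W_j + \sum_{j \in J_o} W_j + \sum_{i \in I_\bullet} \Xi_{i,n}, \]
where $W_j := \sum_{i=q(j-1)+1}^{qj} \Xi_{i,n}$ for $j = 1, \ldots, \lfloor n/q \rfloor$ are the blocks, $I_\bullet := \{q\lfloor n/q \rfloor + 1, \ldots, n\}$ if
$q\lfloor n/q \rfloor < n$ and $J_e$ and $J_o$ are the subsets of even and odd numbers of $\{1, \ldots, \lfloor n/q \rfloor\}$, respectively.
For simplicity define $J = J_e \cup J_o$ as the set of block indices and let
\[ W_j^\dagger := E\left[ W_j \mid \epsilon_\ell,\; q(j-2) + 1 \le \ell \le qj \right]. \]
Note that by construction $\{W_j^\dagger\}_{j \in J_e}$ are independent and also $\{W_j^\dagger\}_{j \in J_o}$ are independent. Using
the triangle inequality we find
\[ P\left( \left\| \sum_{i=1}^n \Xi_{i,n} \right\| \ge 6z \right) \le P\left( \left\| \sum_{j \in J} (W_j - W_j^\dagger) \right\| + \left\| \sum_{j \in J} W_j^\dagger \right\| + \left\| \sum_{i \in I_\bullet} \Xi_{i,n} \right\| \ge 6z \right) \]
\[ \le P\left( \left\| \sum_{j \in J} (W_j - W_j^\dagger) \right\| \ge z \right) + P\left( \left\| \sum_{j \in J_e} W_j^\dagger \right\| \ge z \right) + P\left( \left\| \sum_{j \in J_o} W_j^\dagger \right\| \ge z \right) + P\left( \left\| \sum_{i \in I_\bullet} \Xi_{i,n} \right\| \ge z \right) \]
\[ = I + II + III + IV. \]
We keep term $IV$ as is. As in the proof of Chen and Christensen (2015), terms $II$ and $III$
consist of sums of independent matrices, where each $W_j^\dagger$ satisfies $\|W_j^\dagger\| \le qR_n$ and
\[ \max\left\{ E\left[\|W_j^\dagger W_j^{\dagger\prime}\|\right], E\left[\|W_j^{\dagger\prime} W_j^\dagger\|\right] \right\} \le qS_n^2. \]
Then, using the exponential matrix inequality of Tropp (2012),
\[ P\left( \left\| \sum_{j \in J_e} W_j^\dagger \right\| \ge z \right) \le (d_1 + d_2) \exp\left( \frac{-z^2/2}{nqS_n^2 + qR_n z/3} \right). \]
The same holds for the sum over $J_o$. Finally, we use the physical dependence measure $\Delta_r^\Xi$ to
bound $I$. Start with the union bound to find
\[ P\left( \left\| \sum_{j \in J} (W_j - W_j^\dagger) \right\| \ge z \right) \le P\left( \sum_{j \in J} \left\| W_j - W_j^\dagger \right\| \ge z \right) \le \frac{n}{q} P\left( \left\| W_j - W_j^\dagger \right\| \ge \frac{q}{n} z \right), \]
where we have used that $\lfloor n/q \rfloor \le n/q$. Since $W_j$ and $W_j^\dagger$ differ only over a $\sigma$-algebra that is $q$
steps in the past, by assumption
\[ \left\| W_j - W_j^\dagger \right\|_{L^r} \le q \Delta_r^\Xi(q), \]
which implies, by means of the $r$th moment inequality,
\[ P\left( \left\| W_j - W_j^\dagger \right\| \ge \frac{q}{n} z \right) \le P\left( (d_2)^{1/r - 1/2} \left\| W_j - W_j^\dagger \right\|_r \ge \frac{q}{n} z \right) \le \frac{n^r}{q^{r-1} (d_2)^{r/2-1} z^r} \Delta_r^\Xi(q), \]
where $(d_2)^{1/r - 1/2}$ is the operator norm equivalence constant such that $\|\cdot\| \ge (d_2)^{1/r - 1/2} \|\cdot\|_r$
(Feng, 2003). Therefore,
\[ P\left( \left\| \sum_{j \in J} (W_j - W_j^\dagger) \right\| \ge z \right) \le \frac{n^{r+1}}{q^r (d_2)^{r/2-1} z^r} \Delta_r^\Xi(q) \]
as claimed.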
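The index bookkeeping behind the interlaced blocks can be sketched in a few lines. The function name `interlaced_blocks` and the example values $n = 23$, $q = 5$ are illustrative assumptions; the construction itself follows the definitions of $W_j$, $J_e$, $J_o$ and $I_\bullet$ given in the proof.

```python
def interlaced_blocks(n, q):
    """Split indices 1..n into blocks of length q, even/odd block sets, and a remainder."""
    n_blocks = n // q  # floor(n / q)
    # Block j collects indices q(j-1)+1, ..., qj, i.e. the summands of W_j.
    blocks = {j: list(range(q * (j - 1) + 1, q * j + 1)) for j in range(1, n_blocks + 1)}
    J_even = [j for j in blocks if j % 2 == 0]
    J_odd = [j for j in blocks if j % 2 == 1]
    # Remainder set I_bullet; empty when q divides n.
    remainder = list(range(q * n_blocks + 1, n + 1))
    return blocks, J_even, J_odd, remainder

blocks, J_even, J_odd, remainder = interlaced_blocks(n=23, q=5)
```

Within $J_e$ (or $J_o$) consecutive blocks are separated by a full block of length $q$, which is why the conditional expectations $W_j^\dagger$, each depending on innovations from at most $2q$ periods, are independent across that subset.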
Notice that the first term in the bound is weaker than that derived by Chen and Christensen
(2015). The $\beta$-mixing assumption and Berbee's Lemma give strong control over the probability
$P(\|\sum_{j \in J} (W_j - W_j^\dagger)\| \ge z)$. In contrast, assuming physical dependence means we have to explicitly
handle a moment condition. One might think of sharpening Theorem B.3 by sidestepping
the $r$th moment inequality (cf. avoiding Chebyshev's inequality in concentration results), but
I do not explore this approach here.
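The second step is to map the physical dependence of a generic vector time series $\{X_i\}_{i \in \mathbb{Z}}$
to matrix functions.
Proposition B.1. Let $\{X_i\}_{i \in \mathbb{Z}}$ where $X_i = G(\ldots, \epsilon_{i-1}, \epsilon_i) \in \mathcal{X}$ for $\{\epsilon_j\}_{j \in \mathbb{Z}}$ i.i.d. be a sequence
with finite $r$th moment, where $r > 0$, and functional physical dependence coefficients
\[ \Delta_r(h) = \sup_i \left\| X_{i+h} - G^{(h)}(X_i^*, \epsilon_{i+1:i+h}) \right\|_{L^r} \]
for $h \ge 1$. Let $\Xi_{i,n} = \Xi_n(X_i)$ for each $i$, where $\Xi_n : \mathcal{X} \mapsto \mathbb{R}^{d_1 \times d_2}$ is a sequence of measurable
$d_1 \times d_2$ matrix-valued functions such that $\Xi_n = (v_1, \ldots, v_{d_2})$ for $v_\ell \in \mathbb{R}^{d_1}$. If $\|\Xi_{i,n}\|_{L^r} < \infty$ and
\[ C_{\Xi,\ell} := \sup_{x \in \mathcal{X}} \|\nabla v_\ell(x)\| \le C_\Xi < \infty, \]
then matrices $\Xi_{i,n}$ have physical dependence coefficients
\[ \Delta_r^\Xi(h) = \sup_i \left\| \Xi_{i,n} - \Xi_{i,n}^{h*} \right\|_{L^r} \le \sqrt{d_1} \left( \frac{d_2}{d_1} \right)^{1/r} C_\Xi \Delta_r(h), \]
where $\Xi_{i,n}^{h*} = \Xi_n(G^{(h)}(X_i^*, \epsilon_{i+1:i+h}))$.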
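Proof. To derive the bound, we use $\Xi_n(X_i)$ and $\Xi_n(X_i^{h*})$ in place of $\Xi_{i,n}$ and $\Xi_{i,n}^{h*}$, respectively,
where $X_i^{h*} = G^{(h)}(X_i^*, \epsilon_{i+1:i+h})$. First we move from studying the operator $r$-norm (recall,
$r > 2$) to the Frobenius norm,
\[ \left\| \Xi_n(X_i) - \Xi_n(X_i^{h*}) \right\|_r \le (d_2)^{1/2 - 1/r} \left\| \Xi_n(X_i) - \Xi_n(X_i^{h*}) \right\|_F, \]
where as intermediate step we use the 2-norm. Let $\Xi_n = (v_1, \ldots, v_{d_2})$ for $v_\ell \in \mathbb{R}^{d_1}$ and $\ell \in
\{1, \ldots, d_2\}$, so that
\[ \|\Xi_n\|_F = \sqrt{ \sum_{\ell=1}^{d_2} \|v_\ell\|^2 }, \]
where $v_\ell = (v_{\ell 1}, \ldots, v_{\ell d_1})'$. Since $v_\ell : \mathcal{X} \mapsto \mathbb{R}^{d_1}$ are vector functions, the mean value theorem
gives that
\[ \left\| \Xi_n(X_i) - \Xi_n(X_i^{h*}) \right\|_F \le \sqrt{ \sum_{\ell=1}^{d_2} C_{\Xi,\ell}^2 \|X_i - X_i^{h*}\|^2 } \le \sqrt{d_2}\, C_\Xi \|X_i - X_i^{h*}\|. \]
Combining results and moving from the vector $r$-norm to the 2-norm yields
\[ \left\| \Xi_n(X_i) - \Xi_n(X_i^{h*}) \right\|_r \le (d_2)^{1 - 1/r} (d_1)^{1/2 - 1/r} C_\Xi \|X_i - X_i^{h*}\|_r. \]
The claim involving the $L^r$ norm follows immediately.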
The following Corollary, which specifically handles matrix functions defined as outer products
of vector functions, is immediate and covers the setups of series estimation.
Corollary B.1. Under the conditions of Proposition B.1, if
\[ \Xi_n(X_i) = \xi_n(X_i) \xi_n(X_i)' + Q_n, \]
where $\xi_n : \mathcal{X} \mapsto \mathbb{R}^d$ is a vector function and $Q_n \in \mathbb{R}^{d \times d}$ is a nonrandom matrix, then
\[ \Delta_r^\Xi(h) \le d^{3/2 - 2/r} C_\xi \Delta_r(h), \]
where $C_\xi := \sup_{x \in \mathcal{X}} \|\nabla \xi_n(x)\| < \infty$.
Proof. Matrix $Q_n$ cancels out since it is nonrandom and appears in both $\Xi_n(X_i)$ and $\Xi_n(X_i^{h*})$.
Since $\Xi_n(X_i)$ is square, the ratio of row to column dimensions simplifies.
The following Corollaries to Theorem B.3 can now be derived in a straightforward manner.
Corollary B.2. Under the conditions of Theorem B.3 and Proposition B.1, for all $z \ge 0$
\[ P\left( \left\| \sum_{i=1}^n \Xi_{i,n} \right\| \ge 6z \right) \le \frac{n^{r+1}}{q^r z^r} (d_2)^{2 - (r/2 + 1/r)} (d_1)^{1/2 - 1/r} C_\Xi \Delta_r(q) + P\left( \left\| \sum_{i \in I_\bullet} \Xi_{i,n} \right\| \ge z \right) + 2(d_1 + d_2) \exp\left( \frac{-z^2/2}{nqS_n^2 + qR_n z/3} \right), \]
where $\Delta_r(\cdot)$ is the functional physical dependence coefficient of $X_i$.
Corollary B.3. Under the conditions of Theorem B.3 and Proposition B.1, if $q = q(n)$ is chosen
such that
\[ \frac{n^{r+1}}{q^r} (d_2)^{2 - (r/2 + 1/r)} (d_1)^{1/2 - 1/r} C_\Xi \Delta_r(q) = o(1) \]
and $R_n \sqrt{q \log(d_1 + d_2)} = o(S_n \sqrt{n})$ then
\[ \left\| \sum_{i=1}^n \Xi_{i,n} \right\| = O_P\left( S_n \sqrt{nq \log(d_1 + d_2)} \right). \]
This result is almost identical to Corollary 4.2 in Chen and Christensen (2015), with the only
adaptation of using Theorem B.3 as a starting point. Condition $R_n \sqrt{q \log(d_1 + d_2)} = o(S_n \sqrt{n})$
is simple to verify by assuming, e.g., $q = o(n / \log(n))$ since $\log(d_1 + d_2) \lesssim \log(K)$ and $K = o(n)$.
Note that when $d_1 = d_2 \equiv K$, which is the case of interest in the series regression setup, the
first condition in Corollary B.3 reduces to
\[ K^{5/2 - (r/2 + 2/r)} C_\Xi \Delta_r(q) = o(1), \]
which also agrees with the rate of Corollary B.1. Assumption 7(i) and a compact domain further
allow one to explicitly bound factor $C_\Xi$ by
\[ C_\Xi \lesssim K^{\omega_2}, \]
so that the required rate becomes
\[ K^\rho \Delta_r(q) = o(1), \quad \text{where } \rho := \frac{3}{2} - \frac{r}{2} + \omega_2. \]
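Proof of Lemma 3.1. The proof follows from Corollary B.3 by the same steps of the proof of
Lemma 2.2 in Chen and Christensen (2015). Simply take $\Xi_{i,n} = n^{-1} (\tilde{b}_\pi^K(X_i) \tilde{b}_\pi^K(X_i)' - I_K)$
and note that $R_n \le n^{-1}(1 + \zeta_{K,n}^2 \lambda_{K,n}^2)$ and $S_n \le n^{-2}(1 + \zeta_{K,n}^2 \lambda_{K,n}^2)$.
For Lemma 3.1 to hold under GMC assumptions a valid choice for $q(n)$ is
\[ q(n) = \gamma^{-1} \log(K^\rho n^{r+1}), \]
where $\gamma$ is as in Proposition 3.1. This is due to
\[ \left( \frac{n}{q} \right)^{r+1} q K^\rho \Delta_r(q) \lesssim \frac{n^{r+1}}{q^r} K^\rho \exp(-\gamma q) \lesssim \frac{n^{r+1} K^\rho}{\log(K^\rho n^{r+1})^r} (K^\rho n^{r+1})^{-1} = \frac{1}{\log(K^\rho n^{r+1})^r} = o(1). \]
Note then that, if $\lambda_{K,n} \lesssim 1$ and $\zeta_{K,n} \lesssim \sqrt{K}$, since
\[ \zeta_{K,n} \lambda_{K,n} \sqrt{\frac{q \log K}{n}} \lesssim \sqrt{\frac{K \log(K^\rho n^{r+1}) \log(K)}{n}} \lesssim \sqrt{\frac{K \log(n^{\rho + r + 2}) \log(n)}{n}} \lesssim \sqrt{\frac{K \log(n)^2}{n}}, \]
to satisfy Assumption 8 we may assume $\sqrt{K \log(n)^2 / n} = o(1)$ as in Remark 2.3 of Chen and
Christensen (2015) for the case of exponential $\beta$-mixing regressors.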
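To see the block-length choice at work, the sketch below evaluates $q(n) = \gamma^{-1} \log(K^\rho n^{r+1})$ and the dependence term $(n^{r+1}/q^r) K^\rho \exp(-\gamma q)$ on a small grid. The parameter values ($r = 4$, $\rho = 0.5$, $\gamma = 1$) and the rule $K \approx n^{1/3}$ are purely illustrative assumptions, and integer rounding of $q$ is ignored; all quantities are computed in logs to avoid overflow.

```python
import math

def block_length(n, K, r, rho, gamma):
    """q(n) = log(K^rho * n^(r+1)) / gamma, computed in logs."""
    return (rho * math.log(K) + (r + 1) * math.log(n)) / gamma

def dependence_term(n, K, r, rho, gamma):
    """(n^(r+1) / q^r) * K^rho * exp(-gamma * q), evaluated at q = q(n)."""
    q = block_length(n, K, r, rho, gamma)
    log_value = (r + 1) * math.log(n) - r * math.log(q) + rho * math.log(K) - gamma * q
    return math.exp(log_value), q

r, rho, gamma = 4.0, 0.5, 1.0   # illustrative values only
table = []
for n in (10**2, 10**4, 10**6):
    K = round(n ** (1 / 3))     # hypothetical sieve dimension rule
    value, q = dependence_term(n, K, r, rho, gamma)
    table.append((n, K, q, value))
```

By construction the exponential factor cancels $K^\rho n^{r+1}$ exactly, so each entry of `table` has `value` equal to $q^{-r}$, which shrinks only at a logarithmic rate in $n$.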
B.3 Theorem 3.2
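Before delving into the proof of Theorem 3.2, note that we can decompose $\hat{\Pi}_2 - \Pi_2$ as
\[ \hat{\Pi}_2 - \Pi_2 = (\hat{\Pi}_2 - \hat{\Pi}_2^*) + (\hat{\Pi}_2^* - \tilde{\Pi}_2) + (\tilde{\Pi}_2 - \Pi_2), \]
where $\tilde{\Pi}_2$ is the projection of $\Pi_2$ onto the linear space spanned by the sieve. The last two terms
can be handled directly with the theory developed by Chen and Christensen (2015). Specifically,
their Lemma 2.3 controls the second term (variance term), while Lemma 2.4 handles the third
term (bias term). This means here we can focus on the first term, which is due to using generated
regressors $\hat{\epsilon}_{1t}$ in the second step.
Since $\hat{\Pi}_2$ can be decomposed in $d_Y$ rows of semi-nonparametric coefficients, i.e.,
\[ Y_t = \begin{bmatrix} \pi_{2,1} \\ \vdots \\ \pi_{2,d_Y} \end{bmatrix} W_{2t} + \tilde{u}_{2t}, \]
we further reduce to the scalar case. Let $\pi_2$ be any row of $\Pi_2$ and, with a slight abuse of notation,
$Y$ the vector of observations of the component of $Y_t$ of the same row, so that one may write
\[ \hat{\pi}_2(x) - \hat{\pi}_2^*(x) = \tilde{b}_\pi^K(x) \left( \widehat{\widetilde{B}}_\pi' \widehat{\widetilde{B}}_\pi \right)^{-} \left( \widehat{\widetilde{B}}_\pi - \widetilde{B}_\pi \right)' Y + \tilde{b}_\pi^K(x) \left[ \left( \widehat{\widetilde{B}}_\pi' \widehat{\widetilde{B}}_\pi \right)^{-} - \left( \widetilde{B}_\pi' \widetilde{B}_\pi \right)^{-} \right] \widetilde{B}_\pi' Y = I + II, \]
where $\tilde{b}_\pi^K(x) = \Gamma_{B,2}^{-1/2} b_\pi^K(x)$ is the orthonormalized sieve according to $\Gamma_{B,2} := E[b_\pi^K(W_{2t}) b_\pi^K(W_{2t})']$,
$\widetilde{B}_\pi$ is the infeasible orthonormalized design matrix (involving $\epsilon_{1t}$) and $\widehat{\widetilde{B}}_\pi$ is the feasible orthonormalized
design matrix (involving $\hat{\epsilon}_{1t}$). In particular, note that
\[ \hat{B}_\pi = B_\pi + R_n, \quad \text{where } R_n := \begin{bmatrix} 0 & \cdots & 0 & \hat{\epsilon}_{11} - \epsilon_{11} \\ \vdots & & \vdots & \vdots \\ 0 & \cdots & 0 & \hat{\epsilon}_{1n} - \epsilon_{1n} \end{bmatrix} \in \mathbb{R}^{n \times K}, \]
which implies $\widehat{\widetilde{B}}_\pi - \widetilde{B}_\pi = R_n \Gamma_{B,2}^{-1/2} =: \widetilde{R}_n$.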
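The next Lemma provides a bound for the difference $(\widehat{\widetilde{B}}_\pi' \widehat{\widetilde{B}}_\pi / n) - (\widetilde{B}_\pi' \widetilde{B}_\pi / n)$ that will be
useful in the proof of Theorem 3.2 below.
Lemma B.2. Under the setup of Theorem 3.1, it holds
\[ \left\| (\widehat{\widetilde{B}}_\pi' \widehat{\widetilde{B}}_\pi / n) - (\widetilde{B}_\pi' \widetilde{B}_\pi / n) \right\| = O_P(\sqrt{K/n}). \]
Proof. Using the expansion $\widehat{\widetilde{B}}_\pi' \widehat{\widetilde{B}}_\pi = \widetilde{B}_\pi' \widetilde{B}_\pi + (\widetilde{B}_\pi' \widetilde{R}_n + \widetilde{R}_n' \widetilde{B}_\pi) + \widetilde{R}_n' \widetilde{R}_n$, one immediately finds
that
\[ \left\| (\widehat{\widetilde{B}}_\pi' \widehat{\widetilde{B}}_\pi / n) - (\widetilde{B}_\pi' \widetilde{B}_\pi / n) \right\| \le 2 \left\| \widetilde{B}_\pi' \widetilde{R}_n / n \right\| + \left\| \widetilde{R}_n' \widetilde{R}_n / n \right\|. \]
The second right-hand side term satisfies $\| \widetilde{R}_n' \widetilde{R}_n / n \| \le \lambda_{K,n}^2 \| R_n' R_n / n \|$. Moreover,
\[ \left\| R_n' R_n / n \right\| = \left| \frac{1}{n} \sum_{t=1}^n (\hat{\epsilon}_{1t} - \epsilon_{1t})^2 \right| = \left| \frac{1}{n} \sum_{t=1}^n (\Pi_1 - \hat{\Pi}_1)' W_{1t} W_{1t}' (\Pi_1 - \hat{\Pi}_1) \right| \le \left\| \Pi_1 - \hat{\Pi}_1 \right\|^2 \left\| W_1' W_1 / n \right\| = O_P(n^{-1}), \]
since $\| W_1' W_1 / n \| = O_P(1)$. Under Assumption 12, $\lambda_{K,n}^2 / n = o_P(\sqrt{K/n})$ since B-splines and
wavelets satisfy $\lambda_{K,n} \lesssim 1$. Consequently, $\| \widetilde{R}_n' \widetilde{R}_n / n \| = o_P(\sqrt{K/n})$.
Factor $\| \widetilde{B}_\pi' R_n / n \|$ is also straightforward, but depends on sieve dimension $K$,
\[ \left\| \widetilde{B}_\pi' R_n / n \right\| \le \left\| \frac{1}{n} \sum_{t=1}^n \tilde{b}_\pi^K(W_{2t}) (\hat{\epsilon}_{1t} - \epsilon_{1t}) \right\| = \left\| \frac{1}{n} \sum_{t=1}^n \tilde{b}_\pi^K(W_{2t}) W_{1t}' (\Pi_1 - \hat{\Pi}_1) \right\| \le \left\| \Pi_1 - \hat{\Pi}_1 \right\| \left\| \widetilde{B}_\pi' W_1 / n \right\| = O_P(\sqrt{K/n}), \]
since $\| \widetilde{B}_\pi' W_1 / n \| = O_P(\sqrt{K})$ as the column dimension of $W_1$ is fixed. The claim then follows by
noting $O_P(\sqrt{K/n})$ is the dominating order of convergence.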
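The algebra behind the $O_P(n^{-1})$ term can be checked numerically: the mean squared gap between generated and true first-step residuals equals a quadratic form in $\Pi_1 - \hat{\Pi}_1$. The sketch below uses a toy scalar first-step regression with i.i.d. Gaussian regressors; the design, the coefficient values and all variable names are illustrative assumptions and do not reproduce the paper's data-generating process.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 500, 3
W1 = rng.standard_normal((n, d))      # first-step regressors (toy design)
Pi1 = np.array([0.5, -1.0, 0.25])     # hypothetical true coefficients
eps1 = rng.standard_normal(n)         # true first-step innovations
X = W1 @ Pi1 + eps1

Pi1_hat, *_ = np.linalg.lstsq(W1, X, rcond=None)  # least squares estimate
eps1_hat = X - W1 @ Pi1_hat                       # generated regressor

diff = Pi1 - Pi1_hat
mean_sq_error = np.mean((eps1_hat - eps1) ** 2)   # the single nonzero entry of R_n'R_n/n
quad_form = diff @ (W1.T @ W1 / n) @ diff         # (Pi1 - Pi1_hat)' (W1'W1/n) (Pi1 - Pi1_hat)
upper = np.linalg.norm(diff) ** 2 * np.linalg.norm(W1.T @ W1 / n, ord=2)
```

Here `mean_sq_error` and `quad_form` coincide up to floating-point error, and both are bounded by `upper`, mirroring the chain of (in)equalities in the proof.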
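Proof of Theorem 3.2. Since $\hat{\Pi}_1$ is the least squares estimator of a linear equation, the rate of
convergence is the parametric rate $n^{-1/2}$. The first result is therefore immediate.
For the second step, we consider
\[ \left\| \hat{\Pi}_2 - \Pi_2 \right\|_\infty \le \left\| \hat{\Pi}_2 - \hat{\Pi}_2^* \right\|_\infty + \left\| \hat{\Pi}_2^* - \Pi_2 \right\|_\infty, \]
and bound explicitly the first right-hand side term. For a given component of the regression
function,
\[ |\hat{\pi}_2(x) - \hat{\pi}_2^*(x)| \le |I| + |II|. \]
We now control each term on the right side.
(1) It holds
\[ |I| \le \| \tilde{b}_\pi^K(x) \| \left\| \left( \widehat{\widetilde{B}}_\pi' \widehat{\widetilde{B}}_\pi / n \right)^{-} \right\| \left\| \left( \widehat{\widetilde{B}}_\pi - \widetilde{B}_\pi \right)' Y / n \right\| \]
\[ \le \sup_{x \in \mathcal{W}_2} \| \tilde{b}_\pi^K(x) \| \left\| \left( \widehat{\widetilde{B}}_\pi' \widehat{\widetilde{B}}_\pi / n \right)^{-} \right\| \left\| \left( \widehat{\widetilde{B}}_\pi - \widetilde{B}_\pi \right)' Y / n \right\| \]
\[ \le \zeta_{K,n} \lambda_{K,n} \left\| \left( \widehat{\widetilde{B}}_\pi' \widehat{\widetilde{B}}_\pi / n \right)^{-} \right\| \left\| \left( \widehat{\widetilde{B}}_\pi - \widetilde{B}_\pi \right)' Y / n \right\|. \]
Let $A_n$ denote the event on which $\| \widehat{\widetilde{B}}_\pi' \widehat{\widetilde{B}}_\pi / n - I_K \| \le 1/2$, so that $\| ( \widehat{\widetilde{B}}_\pi' \widehat{\widetilde{B}}_\pi / n )^{-} \| \le 2$ on
$A_n$. Notice that since $\| (\widehat{\widetilde{B}}_\pi' \widehat{\widetilde{B}}_\pi / n) - (\widetilde{B}_\pi' \widetilde{B}_\pi / n) \| = o_P(1)$ (Lemma B.2) and, by assumption,
$\| \widetilde{B}_\pi' \widetilde{B}_\pi / n - I_K \| = o_P(1)$, then $P(A_n^c) = o(1)$. On $A_n$ then
\[ |I| \lesssim \zeta_{K,n} \lambda_{K,n}^2 \left\| \left( \hat{B}_\pi - B_\pi \right)' Y / n \right\| = \zeta_{K,n} \lambda_{K,n}^2 \left\| R_n' Y / n \right\|. \]
From $R_n' Y = \sum_{t=1}^n b_\pi^K(W_{2t}) (\hat{\epsilon}_{1t} - \epsilon_{1t}) Y_t = (\Pi_1 - \hat{\Pi}_1)' W_1' Y$ it follows that
\[ \left\| R_n' Y / n \right\| \le \left\| \Pi_1 - \hat{\Pi}_1 \right\| \left\| W_1' Y / n \right\| \]
on $A_n$, meaning
\[ |I| = O_P\left( \zeta_{K,n} \lambda_{K,n}^2 / \sqrt{n} \right) \]
as $\| W_1' Y / n \| = O_P(1)$ and $P(A_n^c) = o(1)$.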
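(2) Again we proceed by uniformly bounding $II$ according to
\[ |II| \le \zeta_{K,n} \lambda_{K,n} \left\| \left( \widehat{\widetilde{B}}_\pi' \widehat{\widetilde{B}}_\pi / n \right)^{-} - \left( \widetilde{B}_\pi' \widetilde{B}_\pi / n \right)^{-} \right\| \left\| \widetilde{B}_\pi' Y / n \right\|. \]
The last factor has order $\| \widetilde{B}_\pi' Y / n \| = O_P(\sqrt{K})$ since $\widetilde{B}_\pi'$ is growing in row dimension with
$K$. For the middle term, introduce
\[ \Delta_B := \widehat{\widetilde{B}}_\pi' \widehat{\widetilde{B}}_\pi / n - \widetilde{B}_\pi' \widetilde{B}_\pi / n \]
and event
\[ B_n := \left\{ \left\| \left( \widetilde{B}_\pi' \widetilde{B}_\pi / n \right)^{-} \Delta_B \right\| \le 1/2 \right\} \cap \left\{ \left\| \widetilde{B}_\pi' \widetilde{B}_\pi / n - I_K \right\| \le 1/2 \right\}. \]
On $B_n$, we can apply the bound (Horn and Johnson, 2012)
\[ \left\| \left( \widehat{\widetilde{B}}_\pi' \widehat{\widetilde{B}}_\pi / n \right)^{-} - \left( \widetilde{B}_\pi' \widetilde{B}_\pi / n \right)^{-} \right\| \le \frac{ \| ( \widetilde{B}_\pi' \widetilde{B}_\pi / n )^{-} \|^2 \, \| \Delta_B \| }{ 1 - \| ( \widetilde{B}_\pi' \widetilde{B}_\pi / n )^{-} \Delta_B \| } \lesssim \left\| \widehat{\widetilde{B}}_\pi' \widehat{\widetilde{B}}_\pi / n - \widetilde{B}_\pi' \widetilde{B}_\pi / n \right\|. \]
Since $\| \widehat{\widetilde{B}}_\pi' \widehat{\widetilde{B}}_\pi / n - \widetilde{B}_\pi' \widetilde{B}_\pi / n \| = O_P(\sqrt{K/n})$ by Lemma B.2, we get
\[ |II| = O_P\left( \zeta_{K,n} \lambda_{K,n} \frac{K}{\sqrt{n}} \right) \]
on $B_n$. Finally, using $P((A \cap B)^c) \le P(A^c) + P(B^c)$ we note that $P(B_n^c) = o(1)$ so that
the bound asymptotically holds irrespective of event $B_n$.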
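The inverse perturbation bound invoked in step (2) can be checked on a small example. The sketch below draws a symmetric matrix $A$ close to the identity (standing in for $\widetilde{B}_\pi' \widetilde{B}_\pi / n$) and a small symmetric perturbation $\Delta$ (standing in for $\Delta_B$); the dimension, scales and variable names are illustrative assumptions.

```python
import numpy as np

def op_norm(M):
    """Spectral (operator 2-) norm."""
    return np.linalg.norm(M, ord=2)

rng = np.random.default_rng(3)
K = 6
M = rng.standard_normal((K, K))
A = np.eye(K) + 0.05 * (M + M.T)     # symmetric, close to the identity
E = rng.standard_normal((K, K))
Delta = 0.01 * (E + E.T)             # small symmetric perturbation

A_inv = np.linalg.inv(A)
lhs = op_norm(np.linalg.inv(A + Delta) - A_inv)
contraction = op_norm(A_inv @ Delta)  # must be < 1 for the bound to apply
rhs = op_norm(A_inv) ** 2 * op_norm(Delta) / (1.0 - contraction)
```

When `contraction` is at most $1/2$, as required on event $B_n$, the denominator is bounded away from zero and `rhs` is at most $2 \|A^{-1}\|^2 \|\Delta\|$, which is the source of the $\lesssim$ step in the proof.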
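Thus, we have shown that
\[ |\hat{\pi}_2(x) - \hat{\pi}_2^*(x)| \le O_P\left( \zeta_{K,n} \lambda_{K,n}^2 \frac{1}{\sqrt{n}} \right) + O_P\left( \zeta_{K,n} \lambda_{K,n} \frac{K}{\sqrt{n}} \right) = O_P\left( \zeta_{K,n} \lambda_{K,n} \frac{K}{\sqrt{n}} \right) \]
as clearly $1/\sqrt{n} = o(K/\sqrt{n})$ and, as discussed in the proof of Lemma B.2, $\lambda_{K,n}^2 / n = o_P(\sqrt{K/n})$.
This bound is uniform in $x$ and holds for each of the (finite number of) components of $\hat{\Pi}_2$,
therefore the proof is complete.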
B.4 Theorem 4.1
Before proving impulse response consistency, I show that compositions of the model's autoregressive
nonlinear maps are also consistently estimated at any fixed horizon. This means that
the "functional moving average" coefficient matrices $\Gamma_j$ involved in Proposition 4.1 can be
consistently estimated with $\hat{\Pi}_1$ and $\hat{\Pi}_2$.
Lemma B.3. Under the assumptions of Theorem 3.2 and for any fixed integer $j \ge 0$ it holds
\[ \| \hat{\Gamma}_j - \Gamma_j \|_\infty = o_P(1). \]
Proof. By definition, recall that $\Gamma(L) = \Psi(L) G(L)$ where $\Psi = (I_d - A(L)L)^{-1}$. Since $\Psi(L)$ is
an MA($\infty$) lag polynomial, we have that
\[ \Gamma(L) = \left( \sum_{k=0}^{\infty} \Psi_k L^k \right) (G_0 + G_1 L + \ldots + G_p L^p), \]
where $\Psi_0 = I_d$, $\{\Psi_k\}_{k=1}^{\infty}$ are purely real matrices and $G_0$ is a functional vector that may also
contain linear components (i.e. allow linear functions of $X_t$). This means that $\Gamma_j$ is a convolution
of real and functional matrices,
\[ \Gamma_j = \sum_{k=1}^{\min\{j, p\}} \Psi_{j-k} G_k. \]
The linear coefficients of $A(L)$ can be consistently estimated by $\hat{\Pi}_1$ and $\hat{\Pi}_2$, and thus plug-in
estimate $\hat{\Psi}_j$ is consistent for $\Psi_j$ (Lütkepohl, 2005). Therefore,
\[ \| \hat{\Gamma}_j - \Gamma_j \|_\infty \le \sum_{k=1}^{\min\{j, p\}} \left\| \Psi_{j-k} G_k - \hat{\Psi}_{j-k} \hat{G}_k \right\|_\infty \]
\[ \le \sum_{k=1}^{\min\{j, p\}} \left\| \Psi_{j-k} - \hat{\Psi}_{j-k} \right\|_\infty \| G_k \|_\infty + \left\| \hat{\Psi}_{j-k} \right\|_\infty \left\| G_k - \hat{G}_k \right\|_\infty \]
\[ \le \sum_{k=1}^{\min\{j, p\}} o_P(1)\, C_{G,k} + O_P(1)\, o_P(1) = o_P(1), \]
where $C_{G,k}$ is a constant and $\| G_k - \hat{G}_k \|_\infty = o_P(1)$ as a direct consequence of Proposition 3.2.
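The convolution structure of $\Gamma_j$ can be sketched in code. The example below assumes that $A(L) = A_1 + A_2 L + \ldots + A_p L^{p-1}$, so that the moving average matrices follow the standard VAR recursion $\Psi_k = \sum_{i=1}^{\min\{k,p\}} \Psi_{k-i} A_i$, and it represents each functional vector $G_k$ as a plain Python callable. The function names, the VAR(1) coefficient matrix and the choice $G_1(x) = (\tanh(x), 0)'$ are hypothetical; the summation range follows the display in the proof.

```python
import numpy as np

def ma_coefficients(A_list, horizon):
    """Psi_0 = I and Psi_k = sum_{i=1}^{min(k,p)} Psi_{k-i} A_i (standard VAR-to-MA recursion)."""
    d = A_list[0].shape[0]
    Psi = [np.eye(d)]
    for k in range(1, horizon + 1):
        Psi.append(sum(Psi[k - i] @ A_list[i - 1]
                       for i in range(1, min(k, len(A_list)) + 1)))
    return Psi

def gamma_j(Psi, G_list, j, x):
    """Evaluate Gamma_j at x as sum_{k=1}^{min(j,p)} Psi_{j-k} G_k(x); G_list[k] holds G_k."""
    p = len(G_list) - 1
    out = np.zeros(Psi[0].shape[0])
    for k in range(1, min(j, p) + 1):
        out = out + Psi[j - k] @ G_list[k](x)
    return out

A1 = np.array([[0.5, 0.1], [0.0, 0.3]])            # hypothetical VAR(1) coefficients
Psi = ma_coefficients([A1], horizon=4)
G_list = [lambda x: np.zeros(2),                   # G_0 (unused by the sum above)
          lambda x: np.array([np.tanh(x), 0.0])]   # G_1, a toy nonlinear component
value = gamma_j(Psi, G_list, j=3, x=1.5)
```

A plug-in estimate $\hat{\Gamma}_j$ would be obtained by passing estimated matrices and estimated functions to the same routines, which is the structure the consistency argument exploits term by term.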
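Note. Since we assume that the model respects either contractivity or stability conditions,
the impulse responses must decay (eventually) exponentially fast to zero. This means that by
"stitching" bounds appropriately, one should also be able to achieve convergence uniformly over
$h = 0, 1, \ldots, \infty$.
Recall now that the sample estimate for the relaxed-shock impulse response is
\[ \widehat{\widetilde{IRF}}_{h,\ell}(\delta) = \Theta_{h,\cdot 1}\, \delta\, n^{-1} \sum_{t=1}^n \rho(\hat{\epsilon}_{1t}) + \sum_{j=0}^{h} \hat{V}_{j,\ell}(\delta), \]
where
\[ \hat{V}_{j,\ell}(\delta) = \frac{1}{n-j} \sum_{t=1}^{n-j} \hat{v}_{j,\ell}\left( X_{t+j:t}; \hat{\tilde{\delta}}_t \right) = \frac{1}{n-j} \sum_{t=1}^{n-j} \left[ \hat{\Gamma}_j \hat{\gamma}_j( X_{t+j:t}; \hat{\tilde{\delta}}_t ) - \hat{\Gamma}_j X_{t+j} \right]. \]
Therefore, the estimated horizon $h$ impulse response of the $\ell$th variable is
\[ \widehat{\widetilde{IRF}}_{h,\ell}(\delta) := \hat{\Theta}_{h,\ell 1}\, \delta\, n^{-1} \sum_{t=1}^n \rho(\hat{\epsilon}_{1t}) + \sum_{j=0}^{h} \left[ \frac{1}{n-j} \sum_{t=1}^{n-j} \hat{v}_{j,\ell}\left( X_{t+j:t}; \hat{\tilde{\delta}}_t \right) \right]. \]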
Lemma B.4. Under the assumptions of Theorem 4.1 , let xj:0 “ pxj, . . . , x0q P Xjand εPE1
be nonrandom quantities. Let r
δbe the relaxed shock determined by δ,ρand ε. Then
(i) supxj:0,ε |pγjpxj:0;r
δq ´ γjpxj:0;r
δq|“oPp1q,
(ii) supxj:0,ε |pvj,ℓ`xj:0;r
δ˘´vj,ℓ`xj:0 ;r
δ˘|“oPp1q,
for any fixed integers jě0and ℓP t1, . . . , du.
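Proof.

(i) From Proposition 4.1, we have that
\[
\gamma_j(x_{j:0}; \delta) = x_j + \Theta_{j,11}\, \delta\, \rho(\varepsilon) + \sum_{k=1}^{j} \big( \Gamma_{k,11}\, x_{j-k}(\tilde\delta) - \Gamma_{k,11}\, x_{j-k} \big),
\]
thus
\[
\begin{aligned}
\big| \hat\gamma_j(x_{j:0}; \delta) - \gamma_j(x_{j:0}; \delta) \big|
&= \left| \sum_{k=1}^{j} \Big[ \big( \hat\Gamma_{k,11}\, x_{j-k}(\tilde\delta) - \hat\Gamma_{k,11}\, x_{j-k} \big) - \big( \Gamma_{k,11}\, x_{j-k}(\tilde\delta) - \Gamma_{k,11}\, x_{j-k} \big) \Big] \right| \\
&\le \sum_{k=1}^{j} \big| \hat\Gamma_{k,11}\, x_{j-k}(\tilde\delta) - \Gamma_{k,11}\, x_{j-k}(\tilde\delta) \big| + \sum_{k=1}^{j} \big| \hat\Gamma_{k,11}\, x_{j-k} - \Gamma_{k,11}\, x_{j-k} \big| .
\end{aligned}
\]
This yields
\[
\sup_{x_{j:0},\, \varepsilon} \big| \hat\gamma_j(x_{j:0}; \tilde\delta) - \gamma_j(x_{j:0}; \tilde\delta) \big| \le 2j \max_{1 \le k \le j}\, \sup_{x \in \mathcal{X}} \big| \hat\Gamma_{k,11}\, x - \Gamma_{k,11}\, x \big| .
\]
Since $j$ is finite and fixed and the uniform consistency bound of Lemma B.3 holds, a fortiori $\sup_{x \in \mathcal{X}} \big| \hat\Gamma_{k,11}\, x - \Gamma_{k,11}\, x \big| = o_P(1)$ for each $k$.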
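(ii) Similarly to above,
\[
\begin{aligned}
\big| \hat v_{j,\ell}(x_{j:0}; \tilde\delta) - v_{j,\ell}(x_{j:0}; \tilde\delta) \big|
&= \Big| \big( \hat\Gamma_{j,\ell}\, \hat\gamma_j(x_{j:0}; \tilde\delta) - \Gamma_{j,\ell}\, \gamma_j(x_{j:0}; \tilde\delta) \big) - \big( \hat\Gamma_{j,\ell}\, x_j - \Gamma_{j,\ell}\, x_j \big) \Big| \\
&\le \|\hat\Gamma_{j,\ell} - \Gamma_{j,\ell}\|_\infty + \|\Gamma_{j,\ell}\|_\infty \big| \hat\gamma_j(x_{j:0}; \delta) - \gamma_j(x_{j:0}; \delta) \big| + \big| \hat\Gamma_{j,\ell}\, x_j - \Gamma_{j,\ell}\, x_j \big| \\
&\le 2 \|\hat\Gamma_{j,\ell} - \Gamma_{j,\ell}\|_\infty + C_{\Gamma,j,\ell} \big| \hat\gamma_j(x_{j:0}; \delta) - \gamma_j(x_{j:0}; \delta) \big| ,
\end{aligned}
\]
where we have used that $\gamma_j(x_{j:0}; \tilde\delta) \in \mathcal{X}$ to derive the first term in the second line. In the last line, $C_{\Gamma,j,\ell}$ is a constant such that
\[
\|\Gamma_{j,\ell}\|_\infty \le \sum_{k=1}^{\min\{j,p\}} \|\Psi_{j-k}\|_\infty \|G_k\|_\infty \le C_{\Gamma,j,\ell} .
\]
The claim then follows thanks to Lemma B.3 and (i).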
In what follows, define $\hat v_{j,\ell}\big(X_{t+j:t}; \tilde\delta_t\big)$ to be a version of $v_{j,\ell}$ that is constructed using coefficient estimates from $\{\hat\Pi_1, \hat\Pi_2\}$ but evaluated on the true innovations $\epsilon_t$.
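Proof of Theorem 4.1. If we introduce
\[
\widetilde{\mathrm{IRF}}^{*}_{h,\ell}(\delta) := \hat\Theta_{h,\ell 1}\, \delta\, n^{-1} \sum_{t=1}^{n} \rho(\epsilon_{1t}) + \sum_{j=0}^{h} \left[ \frac{1}{n-j} \sum_{t=1}^{n-j} \hat v_{j,\ell}\big(X_{t+j:t}; \tilde\delta_t\big) \right],
\]
then clearly
\[
\Big| \widehat{\widetilde{\mathrm{IRF}}}_{h,\ell}(\delta) - \widetilde{\mathrm{IRF}}_{h,\ell}(\delta) \Big|
\le \Big| \widehat{\widetilde{\mathrm{IRF}}}_{h,\ell}(\delta) - \widetilde{\mathrm{IRF}}^{*}_{h,\ell}(\delta) \Big| + \Big| \widetilde{\mathrm{IRF}}^{*}_{h,\ell}(\delta) - \widetilde{\mathrm{IRF}}_{h,\ell}(\delta) \Big|
= I + II .
\]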
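To control $II$, we can observe
\[
\begin{aligned}
II &\le \left| \hat\Theta_{h,\ell 1}\, \delta\, n^{-1} \sum_{t=1}^{n} \rho(\epsilon_{1t}) - \Theta_{h,\ell 1}\, \delta\, \mathbb{E}[\rho(\epsilon_{1t})] \right|
 + \sum_{j=0}^{h} \left| \frac{1}{n-j} \sum_{t=1}^{n-j} \hat v_{j,\ell}\big(X_{t+j:t}; \tilde\delta_t\big) - \mathbb{E}\big[ v_{j,\ell}\big(X_{t+j:t}; \tilde\delta\big) \big] \right| \\
&\le \delta\, \big| \hat\Theta_{h,\ell 1} - \Theta_{h,\ell 1} \big| \left| n^{-1} \sum_{t=1}^{n} \rho(\epsilon_{1t}) \right|
 + \delta\, \big| \hat\Theta_{h,\ell 1} \big| \left| n^{-1} \sum_{t=1}^{n} \rho(\epsilon_{1t}) - \mathbb{E}[\rho(\epsilon_{1t})] \right| \\
&\quad + \sum_{j=0}^{h} \left| \frac{1}{n-j} \sum_{t=1}^{n-j} \hat v_{j,\ell}\big(X_{t+j:t}; \tilde\delta_t\big) - \mathbb{E}\big[ v_{j,\ell}\big(X_{t+j:t}; \tilde\delta\big) \big] \right| \\
&\le \delta\, \big| \hat\Theta_{h,\ell 1} - \Theta_{h,\ell 1} \big| \left| n^{-1} \sum_{t=1}^{n} \rho(\epsilon_{1t}) \right|
 + \delta\, \big| \hat\Theta_{h,\ell 1} \big| \left| n^{-1} \sum_{t=1}^{n} \rho(\epsilon_{1t}) - \mathbb{E}[\rho(\epsilon_{1t})] \right| \\
&\quad + \sum_{j=0}^{h} \left| \frac{1}{n-j} \sum_{t=1}^{n-j} \Big[ \hat v_{j,\ell}\big(X_{t+j:t}; \tilde\delta_t\big) - v_{j,\ell}\big(X_{t+j:t}; \tilde\delta_t\big) \Big] \right| \\
&\quad + \sum_{j=0}^{h} \left| \frac{1}{n-j} \sum_{t=1}^{n-j} v_{j,\ell}\big(X_{t+j:t}; \tilde\delta_t\big) - \mathbb{E}\big[ v_{j,\ell}\big(X_{t+j:t}; \tilde\delta\big) \big] \right| .
\end{aligned}
\]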
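The first two terms in the last bound are $o_P(1)$ since $\big| \hat\Theta_{h,\ell 1} - \Theta_{h,\ell 1} \big| = o_P(1)$, as discussed in Lemma B.3, and $n^{-1} \sum_{t=1}^{n} \rho(\epsilon_{1t}) \overset{p}{\to} \mathbb{E}[\rho(\epsilon_{1t})]$ by a WLLN. For the other terms in the last sum above, we similarly note that
\[
\left| \frac{1}{n-j} \sum_{t=1}^{n-j} \Big[ \hat v_{j,\ell}\big(X_{t+j:t}; \tilde\delta_t\big) - v_{j,\ell}\big(X_{t+j:t}; \tilde\delta_t\big) \Big] \right| = o_P(1)
\]
from Lemma B.4, while thanks again to a WLLN it holds that
\[
\left| \frac{1}{n-j} \sum_{t=1}^{n-j} v_{j,\ell}\big(X_{t+j:t}; \tilde\delta_t\big) - \mathbb{E}\big[ v_{j,\ell}\big(X_{t+j:t}; \tilde\delta\big) \big] \right| = o_P(1) .
\]
Since $h$ is fixed and finite, this implies that $II = o_P(1)$.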
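Considering now $I$, we can write
\[
I \le \delta\, \big| \hat\Theta_{h,\ell 1} \big| \left| n^{-1} \sum_{t=1}^{n} \big[ \rho(\hat\epsilon_{1t}) - \rho(\epsilon_{1t}) \big] \right|
 + \sum_{j=0}^{h} \left| \frac{1}{n-j} \sum_{t=1}^{n-j} \Big[ \hat v_{j,\ell}\big(X_{t+j:t}; \hat{\tilde\delta}_t\big) - \hat v_{j,\ell}\big(X_{t+j:t}; \tilde\delta_t\big) \Big] \right|
 = I_1 + I_2 .
\]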
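Since by assumption $\rho$ is a bump function, and thus continuously differentiable over the range of $\epsilon_{1t}$, by the mean value theorem
\[
\left| n^{-1} \sum_{t=1}^{n} \big[ \rho(\hat\epsilon_{1t}) - \rho(\epsilon_{1t}) \big] \right| \le n^{-1} \sum_{t=1}^{n} |\rho'_t|\, \big| \hat\epsilon_{1t} - \epsilon_{1t} \big|
\]
for a sequence $\{\rho'_t\}_{t=1}^{n}$ of evaluations of the first-order derivative $\rho'$ at points in the interval with endpoints $\epsilon_{1t}$ and $\hat\epsilon_{1t}$. One can use $|\rho'_t| \le C_{\rho'}$ with a finite positive constant $C_{\rho'}$, and by recalling that $\hat\epsilon_{1t} - \epsilon_{1t} = (\Pi_1 - \hat\Pi_1)' W_{1t}$ one thus gets
\[
\left| n^{-1} \sum_{t=1}^{n} \big[ \rho(\hat\epsilon_{1t}) - \rho(\epsilon_{1t}) \big] \right|
\le C_{\rho'}\, \frac{1}{n} \sum_{t=1}^{n} \big| (\Pi_1 - \hat\Pi_1)' W_{1t} \big|
\le C_{\rho'}\, \|\Pi_1 - \hat\Pi_1\|_2\, \frac{1}{n} \sum_{t=1}^{n} \|W_{1t}\|_2 = o_P(1) .
\]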
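The chain of inequalities used for $I_1$ can be checked numerically. The sketch below is only an illustration under stated assumptions: the bump function `bump`, its support width, the simulated regressors `W`, and the coefficient values are all hypothetical, and $C_{\rho'}$ is approximated by the maximum of $|\rho'|$ over a fine grid rather than derived analytically.

```python
import numpy as np

def bump(e, width=3.0):
    """Illustrative smooth bump supported on (-width, width), equal to 1 at 0."""
    z = np.asarray(e, dtype=float) / width
    inside = np.abs(z) < 1.0
    safe = np.where(inside, 1.0 - z ** 2, 1.0)
    return np.where(inside, np.exp(1.0 - 1.0 / safe), 0.0)

def bump_derivative(e, width=3.0):
    """Derivative of bump with respect to e (zero outside the support)."""
    z = np.asarray(e, dtype=float) / width
    inside = np.abs(z) < 1.0
    safe = np.where(inside, 1.0 - z ** 2, 1.0)
    val = np.exp(1.0 - 1.0 / safe) * (-2.0 * z / safe ** 2) / width
    return np.where(inside, val, 0.0)

def mean_value_bounds(eps, W, Pi, Pi_hat, width=3.0):
    """Return (lhs, mid, rhs) for the three expressions in the I_1 bound."""
    diff = W @ (Pi - Pi_hat)                 # eps_hat - eps = (Pi - Pi_hat)' W_t
    eps_hat = eps + diff
    grid = np.linspace(-width, width, 200001)
    C = np.max(np.abs(bump_derivative(grid, width)))  # numerical proxy for C_{rho'}
    lhs = abs(np.mean(bump(eps_hat, width) - bump(eps, width)))
    mid = C * np.mean(np.abs(diff))          # after the mean value theorem
    rhs = C * np.linalg.norm(Pi - Pi_hat) * np.mean(np.linalg.norm(W, axis=1))  # Cauchy-Schwarz
    return lhs, mid, rhs

rng = np.random.default_rng(0)
n, k = 500, 3
W = rng.normal(size=(n, k))                  # hypothetical first-step regressors
Pi = np.array([0.5, -0.2, 0.1])              # hypothetical true coefficients
Pi_hat = Pi + 0.05 * rng.normal(size=k)      # hypothetical estimate
eps = rng.normal(size=n)
lhs, mid, rhs = mean_value_bounds(eps, W, Pi, Pi_hat)
print(lhs <= mid <= rhs)
```

The right-hand side shrinks in proportion to $\|\Pi_1 - \hat\Pi_1\|_2$ as long as the sample average of $\|W_{1t}\|_2$ stays bounded, which is the mechanism behind the $o_P(1)$ conclusion.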
This proves that term $I_1$ is itself $o_P(1)$. Finally, to control $I_2$, we use that by construction estimator $\hat\Pi_2$ is composed of sufficiently regular functional elements, i.e. B-spline estimates of order 1 or greater. Thanks again to the mean value theorem,
\[
\begin{aligned}
\left| \frac{1}{n-j} \sum_{t=1}^{n-j} \Big[ \hat v_{j,\ell}\big(X_{t+j:t}; \hat{\tilde\delta}_t\big) - \hat v_{j,\ell}\big(X_{t+j:t}; \tilde\delta_t\big) \Big] \right|
&\le \frac{1}{n-j} \sum_{t=1}^{n-j} \Big| \hat v_{j,\ell}\big(X_{t+j:t}; \hat{\tilde\delta}_t\big) - \hat v_{j,\ell}\big(X_{t+j:t}; \tilde\delta_t\big) \Big| \\
&\le C_{\hat v', j, \ell}\, \frac{1}{n-j} \sum_{t=1}^{n-j} \big| \hat\epsilon_{1t} - \epsilon_{1t} \big|
\end{aligned}
\]
for any fixed $j$ and some $C_{\hat v', j, \ell} > 0$. This holds since $\hat v_{j,\ell}$ is uniformly continuous by construction. Note that we have assumed that the nonlinear part of $\Pi_2$ belongs to a Hölder class with smoothness $s > 1$ (for simplicity, assume here that $s$ is an integer; otherwise a similar argument can be made). Then, even though $C_{\hat v', j, \ell}$ depends on the sample, it is bounded above in probability for $n$ sufficiently large. Following the discussion of term $I_1$, we deduce that the last line in the display above is $o_p(1)$. As $h$ is finite and independent of $n$, it follows that $I_2$ is also of order $o_P(1)$.
C Additional Plots
[Figure 8 image omitted. Panels: (a) $\delta = +2$, (b) $\delta = -2$. Each panel contains two plots with horizontal axis from 0 to 25, titled "n = 2400", with legend entries Sieve and OLS.]
Figure 8: Simulation results for DGP 2' when considering $\tilde\varphi$ in place of $\varphi$.
[Figure 9 image omitted. Six plots in two columns, titled Fed Funds Rate, Log Real GDP and PCE Inflation; horizontal axis from -2 to 2.]
Figure 9: Estimated nonlinear regression functions for the narrative U.S. monetary policy variable. Contemporaneous (left side) and one-period lag (right side) effects are shown, combining linear and nonlinear functions. For comparison, linear VAR coefficients (dark gray) and the identity map (light gray, dashed) are shown as lines.
[Figure 10 image omitted. Panels: (a) $\delta = +1$, (b) $\delta = -1$; horizontal axis from -5 to 5.]
Figure 10: Comparison of histograms and shock relaxation function for a positive (left) and negative (right) shock in monetary policy. Original (blue) versus shocked (orange) distribution of the sample realization of $\epsilon_{1t}$. The dashed vertical line is the mean of the original distribution, while the solid vertical line is the mean after the shock.
[Figure 11 image omitted. Panels: (a) $\delta = +1$, knots at $\{-1, 1\}$; (b) $\delta = -1$, knots at $\{-1, 1\}$; (c) $\delta = +1$, knot at $\{0\}$; (d) $\delta = -1$, knot at $\{0\}$. Each panel contains three plots, titled Fed Funds Rate, Log Real GDP and PCE Inflation, with horizontal axis "Quarters" from 0 to 20 and legend entries Linear, Nonlin-par and Sieve.]
Figure 11: Robustness plots for U.S. monetary policy shock when changing knots compared to those used in Figure 6. Note that linear and parametric nonlinear responses do not change.
[Figure 12 image omitted. Title: GDP. Six plots with horizontal axis "Quarters" from 0 to 20, vertical axis from -3 to 1.5, and legend entries Linear, Nonlin-par and Sieve.]
Figure 12: Relative changes in the GDP impulse response function when the size of the shock is reduced from that used in Figure 6. The standard deviation of $X_t \equiv \epsilon_{1t}$ is $\sigma_{\epsilon,1} \approx 0.5972$. Linear IRFs are re-scaled such that for all values of $\delta$ the linear response at $h = 0$ is one in absolute value. Nonlinear IRFs are re-scaled by $\delta$ times the linear response scaling factor.
[Figure 13 image omitted. Ten plots in two columns, titled IP, CPI, PP, RT and UR; horizontal axis from 0 to 0.4.]
Figure 13: Estimated nonlinear regression functions for the 3M3M subjective interest rate uncertainty measure. One-period (left side) and two-period lag (right side) effects are shown, combining linear and nonlinear functions. For comparison, linear VAR coefficients (dark gray) and the identity map (light gray, dashed) are shown as lines.
[Figure 14 image omitted. Panels: (a) $\delta = \sigma_\epsilon$, horizontal axis from -0.3 to 0.2; (b) Envelope, horizontal axis "Months" from 0 to 60.]
Figure 14: [Top] Histograms and shock relaxation function for a one-standard-deviation shock in interest rate uncertainty. Original (blue) versus shocked (orange) distribution of the sample realization of $\epsilon_{1t}$. The dashed vertical line is the mean of the original distribution, while the solid vertical line is the mean after the shock. [Bottom] Envelope (min-max) of shocked paths for one-standard-deviation impulse response.
[Figure 15 image omitted. Title: IP. Six plots with horizontal axis labeled "Quarters" from 0 to 60, vertical axis from -12 to 0, and legend entries Linear and Sieve.]
Figure 15: Relative changes in the industrial production impulse response function when the size of the shock is reduced from that used in Figure 7. The standard deviation of $\epsilon_{1t}$ is $\sigma_{\epsilon,1} \approx 0.0389$. Linear IRFs are re-scaled such that for all values of $\delta$ the linear response at $h = 0$ is one in absolute value. Nonlinear IRFs are re-scaled by $\delta$ times the linear response scaling factor.
[Figure 16 image omitted. Title: CPI. Six plots with horizontal axis labeled "Quarters" from 0 to 60, vertical axis from -3 to 1, and legend entries Linear and Sieve.]
Figure 16: Relative changes in the CPI impulse response function when the size of the shock is reduced from that used in Figure 7. The standard deviation of $\epsilon_{1t}$ is $\sigma_{\epsilon,1} \approx 0.0389$. Linear IRFs are re-scaled such that for all values of $\delta$ the linear response at $h = 0$ is one in absolute value. Nonlinear IRFs are re-scaled by $\delta$ times the linear response scaling factor.