# Practical methods for modelling weak VARMA processes: identification, estimation and specification with a macroeconomic application


Jean-Marie Dufour† (McGill University) and Denis Pelletier‡ (North Carolina State University)

October 2008

∗The authors thank Marine Carrasco, John Galbraith, Nour Meddahi and Rui Castro for several useful comments. The second author gratefully acknowledges financial assistance from the Social Sciences and Humanities Research Council of Canada, the Government of Québec (fonds FCAR), the CRDE and CIRANO. Earlier versions of the paper circulated under the title Linear Estimation of Weak VARMA Models With a Macroeconomic Application. This work was supported by the Social Sciences and Humanities Research Council of Canada, the Natural Sciences and Engineering Research Council of Canada, the Canadian Network of Centres of Excellence [program on Mathematics of Information Technology and Complex Systems (MITACS)], the Canada Council for the Arts (Killam Fellowship), the CIREQ, the CIRANO, and the Fonds FCAR (Government of Québec).

†William Dow Professor of Economics, McGill University, Centre interuniversitaire de recherche en analyse des organisations (CIRANO), and Centre interuniversitaire de recherche en économie quantitative (CIREQ). Mailing address: Department of Economics, McGill University, Leacock Building, Room 519, 855 Sherbrooke Street West, Montréal, Québec H3A 2T7, Canada. TEL: (1) 514 398 8879; FAX: (1) 514 398 4938; e-mail: jean-marie.dufour@mcgill.ca. Web page: http://www.jeanmariedufour.com

‡Department of Economics, Box 8110, North Carolina State University, Raleigh, NC 27695-8110, USA. Email: denis_pelletier@ncsu.edu. Web page: http://www4.ncsu.edu/~dpellet


ABSTRACT

In this paper, we develop practical methods for modelling weak VARMA processes. In a first part, we propose new identified VARMA representations, the diagonal MA equation form and the final MA equation form, where the MA operator is diagonal and scalar respectively. Both of these representations have the important feature that they constitute relatively simple modifications of a VAR model (in contrast with the echelon representation). In a second part, we study the problem of estimating VARMA models by relatively simple methods which only require linear regressions. We consider a generalization of the regression-based estimation method proposed by Hannan and Rissanen (1982). The asymptotic properties of the estimator are derived under weak hypotheses on the innovations (uncorrelated and strong mixing) so as to broaden the class of models to which it can be applied. In a third part, we present a modified information criterion which gives consistent estimates of the orders under the proposed representations. To demonstrate the importance of using VARMA models to study multivariate time series, we compare the impulse-response functions and the out-of-sample forecasts generated by VARMA and VAR models.

Key words: linear regression; VARMA; final equation form; information criterion; weak representation; strong mixing condition; impulse-response function.

Journal of Economic Literature Classification: C13, C32, C51, E0.


1. Introduction

In time series analysis and econometrics, VARMA models are scarcely used to represent multivariate time series. VAR models are much more widely employed because they are easier to implement. The latter models can be estimated by least squares methods, while VARMA models typically require nonlinear methods (such as maximum likelihood). Specification is also easier for VAR models since only one lag order must be chosen.

VAR models, however, have important drawbacks. First, they are typically less parsimonious than VARMA models [e.g., see Lütkepohl and Poskitt (1996b)]. Second, the family of VAR models is not closed under marginalization and temporal aggregation [see Lütkepohl (1991)]. The truth cannot always be a VAR. If a vector satisfies a VAR model, subvectors do not typically satisfy VAR models (but VARMA models). Similarly, if the variables of a VAR process are observed at a different frequency, the resulting process is not a VAR process. In contrast, the class of VARMA models is closed under such operations.

The importance of nonlinear models has been growing in the time series literature. These models are interesting and useful but may be hard to use. Because of this, and because many important classes of nonlinear processes admit an ARMA representation [e.g., see Francq and Zakoïan (1998), Francq, Roy, and Zakoïan (2003)], many researchers and practitioners still have an interest in linear ARMA models. However, the innovations in these ARMA representations do not have the usual i.i.d. or m.d.s. property, although they are uncorrelated. One must then be careful before applying usual results to the estimation of ARMA models because they usually rely on the above strong assumptions [e.g., see Brockwell and Davis (1991) and Lütkepohl (1991)]. We refer to these as strong and semi-strong ARMA models respectively, as opposed to weak ARMA models where the innovations are only uncorrelated. The i.i.d. and m.d.s. properties are also not robust to aggregation (the i.i.d. Gaussian case being an exception); see Francq and Zakoïan (1998), Francq, Roy, and Zakoïan (2003), Palm and Nijman (1984), Nijman and Palm (1990), Drost (1993). In fact, the Wold decomposition only guarantees that the innovations are uncorrelated.

It follows that (weak) VARMA models appear to be preferable from a theoretical viewpoint, but their adoption is complicated by identification and estimation difficulties. The direct multivariate generalization of ARMA models does not give an identified representation [see Lütkepohl (1991, Section 7.1.1)]. It follows that one has to decide on a set of constraints to impose so as to achieve identification. Standard estimation methods for VARMA models (maximum likelihood, nonlinear least squares) require nonlinear optimization, which may not be feasible as soon as the model involves more than a few time series, because the number of parameters can increase quickly.

In this paper, we consider the problem of modeling weak VARMA processes. Our goal is to develop a procedure which will ease the use of these models. It will cover three basic modelling operations: identification, estimation and specification.

First, in order to avoid identification problems and to further ease the use of VARMA models, we introduce three new identified VARMA representations: the diagonal MA equation form, the final MA equation form and the diagonal AR equation form. Under the diagonal MA equation form (diagonal AR equation form) representation, the MA (AR) operator is diagonal and each lag operator may have a different order. Under the final MA equation form representation, the MA operator is scalar, i.e. the operators are equal across equations. The diagonal and final MA equation form representations can be interpreted as simple extensions of the VAR model, which should be appealing to practitioners who prefer to employ VAR models due to their ease of use. The identified VARMA representation which is the most widely employed in the literature is the echelon form. Specification of VARMA models in echelon form does not amount to specifying the orders p and q as with ARMA models. Under this representation, VARMA models are specified by as many parameters, called Kronecker indices, as the number of time series studied. These indices determine the order of the elements of the AR and MA operators in a nontrivial way. The complicated nature of the echelon form representation is a major reason why practitioners are not using VARMA models, so the introduction of a simpler identified representation is of interest.

Second, we consider the problem of estimating VARMA models by relatively simple methods which only require linear regressions. For that purpose, we consider a multivariate generalization of the regression-based estimation method proposed by Hannan and Rissanen (1982) for univariate ARMA models. The method is performed in three steps. In a first step, a long autoregression is fitted to the data. In the second step, the lagged innovations in the ARMA model are replaced by the corresponding residuals from the long autoregression and a regression is performed. In a third step, the data from the second step are filtered so as to give estimates that have the same asymptotic covariance matrix as one would get with maximum likelihood [claimed in Hannan and Rissanen (1982), proven in Zhao-Guo (1985)]. Extension of this innovation-substitution method to VARMA models was also proposed by Hannan and Kavalieris (1984a) and Koreisha and Pukkila (1989), under the assumption that the innovations are a m.d.s.
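The first two steps of this scheme can be sketched in a univariate setting (the multivariate version stacks the regressions equation by equation). This is a minimal illustration, not the authors' implementation; the ARMA(1,1) coefficients, the sample size and the long-autoregression order n = 20 are arbitrary choices, and the third (filtering) step is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a strong ARMA(1,1): y_t = phi*y_{t-1} + u_t - theta*u_{t-1}
phi, theta, T, burn = 0.5, 0.3, 5000, 200
u = rng.standard_normal(T + burn)
y = np.zeros(T + burn)
for t in range(1, T + burn):
    y[t] = phi * y[t - 1] + u[t] - theta * u[t - 1]
y = y[burn:]

def ols(X, z):
    """Ordinary least-squares coefficients of z regressed on the columns of X."""
    return np.linalg.lstsq(X, z, rcond=None)[0]

# Step 1: long autoregression of order n; its residuals proxy the innovations.
n = 20
X1 = np.column_stack([y[n - k: T - k] for k in range(1, n + 1)])
uhat = y[n:] - X1 @ ols(X1, y[n:])      # uhat[j] estimates u at time t = n + j

# Step 2: regress y_t on y_{t-1} and the lagged residual uhat_{t-1};
# the regression coefficients estimate (phi, -theta).
X2 = np.column_stack([y[n:-1], uhat[:-1]])
b = ols(X2, y[n + 1:])
phi_hat, theta_hat = b[0], -b[1]
```

Both steps are ordinary least-squares regressions, which is the practical appeal of the method.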

Here, we extend these results by showing that the linear regression-based estimators are consistent under weak hypotheses on the innovations, and how filtering in the third step gives estimators that have the same asymptotic distribution as their nonlinear counterparts (maximum likelihood if the innovations are i.i.d., or nonlinear least squares if they are merely uncorrelated). In the non-i.i.d. case, we consider strong mixing conditions [Doukhan (1995), Bosq (1998)], rather than the usual m.d.s. assumption. By using weaker assumptions for the process of the innovations, we broaden the class of processes to which our method can be applied.

Third, we suggest a modified information criterion to choose the orders of VARMA models under these representations. This criterion is to be minimized in the second step of the estimation method over the orders of the AR and MA operators, and it gives consistent estimates of these orders. Our criterion is a generalization of the information criterion proposed by Hannan and Rissanen (1982), later corrected by Hannan and Rissanen (1983) and Hannan and Kavalieris (1984b), for choosing the orders p and q in ARMA models. The idea of generalizing this information criterion is mentioned in Koreisha and Pukkila (1989), but a specific generalization and its theoretical properties are not presented there.

Fourth, the method is applied to U.S. macroeconomic data previously studied by Bernanke and Mihov (1998) and McMillin (2001). To illustrate the impact of using VARMA models instead of VAR models to study multivariate time series, we compare the impulse-response functions generated by each model. We show that we can obtain much more precise estimates of the impulse-response function by using VARMA models instead of VAR models.

The rest of the paper is organized as follows. Our framework and notation are described in section 2. The new identified representations are presented in section 3. In section 4, we present the estimation method. In section 5, we describe the information criterion used for choosing the orders of VARMA models under the representations proposed in our work. Section 6 contains results of Monte Carlo simulations which illustrate the properties of our method. Section 7 presents the macroeconomic application where we compare the impulse-response functions from a VAR model and VARMA models. Section 8 contains a few concluding remarks. Finally, proofs are in the appendix.

2. Framework

Consider the following K-variate zero-mean VARMA(p, q) model in standard representation:

$$Y_t = \sum_{i=1}^{p} \Phi_i Y_{t-i} + U_t - \sum_{j=1}^{q} \Theta_j U_{t-j}, \qquad t \in \mathbb{Z}, \tag{2.1}$$
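For concreteness, a bivariate process of the form (2.1) with p = q = 1 can be simulated directly from the recursion. A minimal sketch; the coefficient matrices below are arbitrary illustrative choices (stable and invertible, and the innovations here are i.i.d., i.e. a strong VARMA):

```python
import numpy as np

rng = np.random.default_rng(1)

K, T, burn = 2, 500, 100
Phi1 = np.array([[0.5, 0.1],
                 [0.0, 0.4]])    # AR coefficient matrix (eigenvalues 0.5, 0.4)
Theta1 = np.array([[0.3, 0.0],
                   [0.1, 0.2]])  # MA coefficient matrix (eigenvalues 0.3, 0.2)

U = rng.standard_normal((T + burn, K))   # i.i.d. innovations
Y = np.zeros((T + burn, K))
for t in range(1, T + burn):
    # Equation (2.1) with p = q = 1: Y_t = Phi_1 Y_{t-1} + U_t - Theta_1 U_{t-1}
    Y[t] = Phi1 @ Y[t - 1] + U[t] - Theta1 @ U[t - 1]
Y = Y[burn:]   # drop the burn-in so the retained sample is close to stationarity
```

Replacing the i.i.d. draws with uncorrelated but dependent innovations (e.g. a multivariate GARCH) would produce a weak VARMA with the same linear structure.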

where U_t is a sequence of uncorrelated random variables with mean zero, defined on some probability space (Ω, A, P). The vectors Y_t and U_t contain the K univariate time series: Y_t = [y_{1t}, y_{2t}, ..., y_{Kt}]′ and U_t = [u_{1t}, u_{2t}, ..., u_{Kt}]′. We can also write the previous equation with lag operators:

$$\Phi(L) Y_t = \Theta(L) U_t \tag{2.2}$$

where

$$\Phi(L) = I_K - \Phi_1 L - \cdots - \Phi_p L^p, \tag{2.3}$$

$$\Theta(L) = I_K - \Theta_1 L - \cdots - \Theta_q L^q. \tag{2.4}$$

Let H_t be the Hilbert space generated by (Y_s, s < t). The process U_t can be interpreted as the linear innovation of Y_t:

$$U_t = Y_t - E_L[Y_t \mid H_t].$$

We assume that Y_t is a strictly stationary and ergodic sequence and that the process U_t has common variance (Var[U_t] = Σ_U) and finite fourth moment (E[|u_{it}|^{4+2δ}] < ∞ for all i and t, where δ > 0). We make the zero-mean hypothesis only to simplify notation.

Assuming that the process Y_t is stable,

$$\det[\Phi(z)] \neq 0 \quad \text{for all } |z| \le 1, \tag{2.5}$$

and invertible,

$$\det[\Theta(z)] \neq 0 \quad \text{for all } |z| \le 1, \tag{2.6}$$

it can be represented as an infinite VAR,

$$\Pi(L) Y_t = U_t \tag{2.7}$$

where

$$\Pi(L) = \Theta(L)^{-1} \Phi(L) = I_K - \sum_{i=1}^{\infty} \Pi_i L^i, \tag{2.8}$$


or an infinite VMA,

$$Y_t = \Psi(L) U_t \tag{2.9}$$

where

$$\Psi(L) = \Phi(L)^{-1} \Theta(L) = I_K - \sum_{j=1}^{\infty} \Psi_j L^j. \tag{2.10}$$
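The MA(∞) coefficient matrices can be computed recursively by matching powers of L in Φ(L)Ψ(L) = Θ(L). The sketch below uses the convention Y_t = Σ_{j≥0} C_j U_{t−j} with C_0 = I_K (so C_j = −Ψ_j for j ≥ 1 under the sign convention of (2.10)); the function name is ours:

```python
import numpy as np

def ma_infinity(Phis, Thetas, horizon):
    """Coefficient matrices C_0, ..., C_horizon of the MA(infinity)
    representation Y_t = sum_j C_j U_{t-j}, from Phi(L) C(L) = Theta(L):
        C_0 = I_K,
        C_k = sum_{i=1}^{min(k,p)} Phi_i C_{k-i} - Theta_k,  with Theta_k = 0 for k > q.
    These matrices are also the impulse responses of the model."""
    K = (Phis[0] if Phis else Thetas[0]).shape[0]
    C = [np.eye(K)]
    for k in range(1, horizon + 1):
        acc = -Thetas[k - 1] if k <= len(Thetas) else np.zeros((K, K))
        for i in range(1, min(k, len(Phis)) + 1):
            acc = acc + Phis[i - 1] @ C[k - i]
        C.append(acc)
    return C

Phi1 = np.array([[0.5, 0.2], [0.0, 0.4]])
Theta1 = np.array([[0.3, 0.0], [0.1, 0.2]])
C_var = ma_infinity([Phi1], [], 3)          # pure VAR(1): C_j = Phi1**j
C_varma = ma_infinity([Phi1], [Theta1], 3)  # VARMA(1,1): C_1 = Phi1 - Theta1
```

The VAR(1) case provides a built-in check of the recursion, since there C_j is simply the j-th power of the AR matrix.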

We will denote by ϕ_{ij}(L) the polynomial in row i and column j of Φ(L), and the row i or column j of Φ(L) by

$$\Phi_{i\bullet}(L) = [\phi_{i1}(L), \ldots, \phi_{iK}(L)], \tag{2.11}$$

$$\Phi_{\bullet j}(L) = [\phi_{1j}(L), \ldots, \phi_{Kj}(L)]'. \tag{2.12}$$

The diag operator creates a diagonal matrix,

$$\operatorname{diag}[\phi_{11}(L), \ldots, \phi_{KK}(L)] = \begin{bmatrix} \phi_{11}(L) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \phi_{KK}(L) \end{bmatrix} \tag{2.13}$$

where

$$\phi_{ii}(L) = 1 - \phi_{ii,1} L - \cdots - \phi_{ii,p} L^p. \tag{2.14}$$

The function deg[ϕ(L)] returns the degree of the polynomial ϕ(L) and the function dim(γ) gives the dimension of the vector γ.

We need to impose some structure on the process U_t. The typical hypothesis imposed in the time series literature is that the U_t's are either independent and identically distributed (i.i.d.) or a martingale difference sequence (m.d.s.). In this work, we do not impose such strong assumptions because we want to broaden the class of models to which our method can be applied. We only assume that U_t satisfies a strong mixing condition [Doukhan (1995), Bosq (1998)]. Let U_t be a strictly stationary process, and let

$$\alpha(h) = \sup_{\substack{B \in \sigma(U_s,\, s \le t) \\ C \in \sigma(U_s,\, s \ge t+h)}} |\Pr(B \cap C) - \Pr(B)\Pr(C)| \tag{2.15}$$

be the α-mixing coefficient of order h ≥ 1, where σ(U_s, s ≤ t) and σ(U_s, s ≥ t + h) are the σ-algebras associated with {U_s : s ≤ t} and {U_s : s ≥ t + h} respectively. We suppose that U_t is strong mixing, i.e.

$$\sum_{h=1}^{\infty} \alpha(h)^{\delta/(2+\delta)} < \infty \quad \text{for some } \delta > 0. \tag{2.16}$$

This is a fairly minimal condition that will be satisfied by many processes of interest.
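For instance, geometrically decaying mixing coefficients α(h) = ρ^h with 0 < ρ < 1 (as obtained for many stable Markov processes) satisfy (2.16), since the summands form a geometric series. A quick numeric check; ρ and δ below are arbitrary illustrative values:

```python
# alpha(h) = rho**h  =>  alpha(h)**(delta/(2+delta)) = r**h
# with r = rho**(delta/(2+delta)) < 1, so the series in (2.16)
# converges to the geometric closed form r / (1 - r).
rho, delta = 0.9, 1.0
r = rho ** (delta / (2.0 + delta))
partial_sum = sum(r ** h for h in range(1, 2000))
closed_form = r / (1.0 - r)
```

The partial sum agrees with the closed form to machine precision, confirming summability for this family of mixing coefficients.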


3. Identification and diagonal VARMA representations

It is important to note that we cannot work with the standard representation (2.1) because it is not identified. To help gain intuition on the identification of VARMA models, we can consider a more general representation where Φ_0 and Θ_0 are not identity matrices:

$$\Phi_0 Y_t = \Phi_1 Y_{t-1} + \cdots + \Phi_p Y_{t-p} + \Theta_0 U_t - \Theta_1 U_{t-1} - \cdots - \Theta_q U_{t-q}. \tag{3.1}$$

By this specification, we mean the well-defined process

$$Y_t = (\Phi_0 - \Phi_1 L - \cdots - \Phi_p L^p)^{-1} (\Theta_0 - \Theta_1 L - \cdots - \Theta_q L^q) U_t. \tag{3.2}$$

Such a process has a standard representation whenever Φ_0 and Θ_0 are nonsingular. To see this, we premultiply (3.1) by Φ_0^{-1} and define Ū_t = Φ_0^{-1}Θ_0 U_t:

$$Y_t = \Phi_0^{-1}\Phi_1 Y_{t-1} + \cdots + \Phi_0^{-1}\Phi_p Y_{t-p} + \bar{U}_t - \Phi_0^{-1}\Theta_1\Theta_0^{-1}\Phi_0 \bar{U}_{t-1} - \cdots - \Phi_0^{-1}\Theta_q\Theta_0^{-1}\Phi_0 \bar{U}_{t-q}. \tag{3.3}$$

Redefining the matrices, we get a representation of type (2.1). As long as Φ_0 and Θ_0 are nonsingular, we can transform a non-standard VARMA representation into a standard one.
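This equivalence can be verified numerically: the transfer functions from U_t to Y_t implied by (3.1) and by the standardized form (3.3) coincide. A sketch with p = q = 1, arbitrary nonsingular Φ_0 and Θ_0 (small random perturbations of the identity), evaluating both matrix rational functions at a test point z:

```python
import numpy as np

rng = np.random.default_rng(2)
K = 2
Phi0 = np.eye(K) + 0.1 * rng.standard_normal((K, K))    # nonsingular for this draw
Theta0 = np.eye(K) + 0.1 * rng.standard_normal((K, K))  # nonsingular for this draw
Phi1 = 0.3 * rng.standard_normal((K, K))
Theta1 = 0.3 * rng.standard_normal((K, K))
z = 0.7   # illustrative evaluation point

# Transfer function of (3.1): (Phi0 - Phi1 z)^{-1} (Theta0 - Theta1 z)
direct = np.linalg.solve(Phi0 - Phi1 * z, Theta0 - Theta1 * z)

# Standardized form (3.3), driven by Ubar_t = Phi0^{-1} Theta0 U_t:
A1 = np.linalg.solve(Phi0, Phi1)                                    # Phi0^{-1} Phi1
B1 = np.linalg.solve(Phi0, Theta1) @ np.linalg.solve(Theta0, Phi0)  # Phi0^{-1} Theta1 Theta0^{-1} Phi0
J = np.linalg.solve(Phi0, Theta0)                                   # Ubar_t = J U_t
standardized = np.linalg.solve(np.eye(K) - A1 * z, (np.eye(K) - B1 * z) @ J)
```

Both evaluations return the same matrix, as the algebra leading to (3.3) predicts.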

We say that two VARMA representations are equivalent if Φ(L)^{-1}Θ(L) results in the same operator Ψ(L). Thus, to ensure uniqueness of a VARMA representation, we must impose restrictions on the AR and MA operators such that for a given Ψ(L) there is one and only one set of operators Φ(L) and Θ(L) that can generate this infinite MA representation.

A first restriction that we impose is a multivariate equivalent of the coprime property in the univariate case. We do not want factors of Φ(L) and Θ(L) to "cancel out" when Φ(L)^{-1}Θ(L) is computed. This feature is called the left-coprime property [see Hannan (1969) and Lütkepohl (1993)]: the matrix operator Ψ[Φ(L), Θ(L)] ≡ Φ(L)^{-1}Θ(L) is left-coprime if, for any operators D(L), Φ̄(L), and Θ̄(L), the identity

$$D(L)\,\Psi[\bar{\Phi}(L), \bar{\Theta}(L)] = \Psi[\Phi(L), \Theta(L)] \tag{3.4}$$

implies that D(L) is unimodular [i.e., det D(L) is a nonzero constant]. To obtain uniqueness of left-coprime operators we have to impose restrictions ensuring that the only feasible unimodular operator D(L) in (3.4) is D(L) = I_K. There is no unique way of doing this. The dominant representation in the literature is the echelon form [see Deistler and Hannan (1981), Hannan and Kavalieris (1984b), Lütkepohl (1993), Lütkepohl and Poskitt (1996a)].

Definition 3.1 (Echelon form) The VARMA representation in (2.1) is said to be in echelon form if the AR and MA operators Φ(L) = [ϕ_{ij}(L)]_{i,j=1,...,K} and Θ(L) = [θ_{ij}(L)]_{i,j=1,...,K} satisfy the following conditions: all operators ϕ_{ij}(L) and θ_{ij}(L) in the i-th row of Φ(L) and Θ(L) have the same degree p_i and have the form

$$\phi_{ii}(L) = 1 - \sum_{m=1}^{p_i} \phi_{ii,m} L^m, \quad \text{for } i = 1, \ldots, K,$$

$$\phi_{ij}(L) = -\sum_{m=p_i-p_{ij}+1}^{p_i} \phi_{ij,m} L^m, \quad \text{for } j \neq i,$$

$$\theta_{ij}(L) = \sum_{m=0}^{p_i} \theta_{ij,m} L^m, \quad \text{for } i, j = 1, \ldots, K,$$

with Θ_0 = Φ_0. Further, in the VAR operator ϕ_{ij}(L),

$$p_{ij} = \begin{cases} \min(p_i + 1, p_j) & \text{for } i \ge j, \\ \min(p_i, p_j) & \text{for } i < j, \end{cases} \qquad i, j = 1, \ldots, K,$$

i.e. p_{ij} specifies the number of free coefficients in the operator ϕ_{ij}(L) for j ≠ i. The row orders (p_1, ..., p_K) are the Kronecker indices and their sum $\sum_{i=1}^{K} p_i$ is the McMillan degree. For the VARMA orders we have in general p = q = max(p_1, ..., p_K).

We see that dealing with VARMA models in echelon form is not as easy as dealing with univariate ARMA models, where everything is specified by choosing the values of p and q. The number of Kronecker indices is larger than two (if K is larger than two) and, when choosing p_{ij}, we have to consider whether we are above or below the diagonal. Having a summation subscript in the operator ϕ_{ij}, m = p_i − p_{ij} + 1, that differs across rows and columns also complicates the use of this representation. The task is far from impossible, but it is more complicated than for ARMA models. Specification of VARMA models in echelon form is discussed in Hannan and Kavalieris (1984b), Lütkepohl and Claessen (1997), Poskitt (1992), Nsiri and Roy (1992, 1996), Lütkepohl and Poskitt (1996b), and Bartel and Lütkepohl (1998). This might be a reason why practitioners are reluctant to employ VARMA models. Who could blame them for sticking with VAR models when they probably need to refer to a textbook simply to write down an identified VARMA representation?

In this work, to ease the use of VARMA models we present new VARMA representations which can be seen as simple extensions of the VAR model. To introduce them, we first review another identified representation, the final equation form, which we will refer to as the final AR equation form, under which the AR operator is scalar [see Zellner and Palm (1974), Hannan (1976), Wallis (1977), Lütkepohl (1993)].
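The rule for the p_ij can be coded directly from the Kronecker indices; a small helper (the function name is ours) makes the implied coefficient counts explicit:

```python
import numpy as np

def echelon_free_coeffs(p):
    """Matrix of the numbers p_ij of free coefficients in phi_ij(L) implied
    by the Kronecker indices p = (p_1, ..., p_K) of the echelon form:
        p_ij = min(p_i + 1, p_j) for i >= j,   p_ij = min(p_i, p_j) for i < j.
    On the diagonal the rule gives p_ii = p_i, consistent with phi_ii(L)
    having p_i free coefficients."""
    K = len(p)
    P = np.empty((K, K), dtype=int)
    for i in range(K):
        for j in range(K):
            P[i, j] = min(p[i] + 1, p[j]) if i >= j else min(p[i], p[j])
    return P

p = [1, 2, 2]                 # illustrative Kronecker indices; McMillan degree sum(p) = 5
P = echelon_free_coeffs(p)
```

Even for this small example the asymmetric treatment above and below the diagonal is visible in P, which is the complication the text describes.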

Definition 3.2 (Final AR equation form) The VARMA representation (2.1) is said to be in final AR equation form if Φ(L) = ϕ(L) I_K, where ϕ(L) = 1 − ϕ_1 L − ··· − ϕ_p L^p is a scalar polynomial with ϕ_p ≠ 0.

To see how we can obtain a VARMA model with a final AR equation form representation, we can proceed as follows. By standard linear algebra, we have

$$\Phi(L)^{*} \Phi(L) = \Phi(L) \Phi(L)^{*} = \det[\Phi(L)]\, I_K \tag{3.5}$$

where Φ(L)^{*} is the adjoint matrix of Φ(L). On multiplying both sides of (2.2) by Φ(L)^{*}, we get:

$$\det[\Phi(L)]\, Y_t = \Phi(L)^{*} \Theta(L) U_t. \tag{3.6}$$
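Identity (3.5) is the classical adjugate identity, applied to a polynomial matrix. For K = 2 the adjoint is available in closed form, so the identity can be checked numerically at any point z (the matrices below are arbitrary illustrative choices):

```python
import numpy as np

def adjugate2(A):
    """Adjugate (classical adjoint) of a 2x2 matrix: adj(A) A = A adj(A) = det(A) I."""
    return np.array([[A[1, 1], -A[0, 1]],
                     [-A[1, 0], A[0, 0]]])

rng = np.random.default_rng(3)
Phi1 = 0.4 * rng.standard_normal((2, 2))
z = 0.6
Phi_z = np.eye(2) - Phi1 * z        # Phi(z) for a VAR(1) operator, evaluated at z
lhs = adjugate2(Phi_z) @ Phi_z
rhs = np.linalg.det(Phi_z) * np.eye(2)
```

Since each entry of the adjugate is a polynomial in z of degree at most p(K − 1), and det[Φ(z)] has degree at most pK, (3.6) makes the order inflation of the final AR form explicit.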

This representation is not attractive for several reasons. First, it is quite far from usual VAR models, since it excludes lagged values of the other variables in each equation (e.g., the AR part of the first equation includes lagged values of y_{1t} but no lagged values of y_{2t}, ..., y_{Kt}). Further, the AR coefficients are the same in all the equations, which will require a polynomial of higher order (up to pK). Second, the interaction between the different variables is modeled through the MA part of the model, which may have to be quite complex.

However, we can derive alternative representations which are both more intuitive and practical. First, upon multiplying both sides of (2.2) by Θ(L)^{*}, we get:

$$\Theta(L)^{*} \Phi(L) Y_t = \det[\Theta(L)]\, U_t \tag{3.7}$$

where Θ(L)^{*} is the adjoint matrix of Θ(L). We refer to VARMA models in (3.7) as being in final MA equation form.

Definition 3.3 (Final MA equation form) The VARMA representation (2.1) is said to be in final MA equation form if

$$\Theta(L) = \theta(L) I_K \tag{3.8}$$

where θ(L) = 1 − θ_1 L − ··· − θ_q L^q is a scalar operator with θ_q ≠ 0.

By (3.7), it is clear that any VARMA process satisfying (2.1)-(2.6) can be written in final MA form. This form is much closer to the usual finite-order VAR model than the echelon representation or the final AR equation form, because the AR part is a finite-order VAR while the MA part of each equation is a univariate MA which only involves a single innovation process. The main drawback comes from the fact that the MA operator is the same in all the equations, which can lead to a high-order MA. It is however possible to get a more parsimonious representation by allowing for different MA polynomials in different equations.

Suppose there are common roots across rows for some columns of Θ(L), so that starting from (2.1) we can write:

$$\Phi(L) Y_t = \bar{\Theta}(L) D(L) U_t, \tag{3.9}$$

$$\bar{\Theta}(L)^{*} \Phi(L) Y_t = \det\big[\bar{\Theta}(L)\big] D(L) U_t, \tag{3.10}$$

where D(L) = diag[d_1(L), ..., d_K(L)] and d_j(L) is a polynomial common to θ_{ij}(L), ∀i = 1, ..., K. We see that allowing for diagonal polynomials in the moving average as in equation (3.10) may yield a more parsimonious representation than (3.7). We will call the representation (3.10) the diagonal MA equation form representation.


Definition 3.4 (Diagonal MA equation form) The VARMA representation (2.1) is said to be in diagonal MA equation form if Θ(L) = diag[θ_{ii}(L)] = I_K − Θ_1 L − ··· − Θ_q L^q, where θ_{ii}(L) = 1 − θ_{ii,1} L − ··· − θ_{ii,q_i} L^{q_i}, θ_{ii,q_i} ≠ 0 and q = max_{1≤i≤K}(q_i).

The latter representation is interesting because, contrary to the echelon form, it is relatively easy to specify. We do not have to deal with rules for the orders of the off-diagonal elements in the AR and MA operators. The fact that it can be seen as a simple extension of the VAR model is also appealing. Practitioners are comfortable using VAR models, so simply adding lags of u_{it} to equation i is a natural extension of the VAR model which could give a more parsimonious representation. It also has the advantage of putting the simple structure on the MA polynomials, the part which complicates the estimation, rather than on the AR part as in the final AR equation form. Notice that in VARMA models it is not necessary to include lags of all the innovations u_{1t}, ..., u_{Kt} in every equation. This could entice practitioners to consider VARMA models if it is combined with a simple regression-based estimation method.

From (3.7), it is clear that any process that satisfies (2.1)-(2.6) also possesses a diagonal MA representation (because the latter includes the final MA equation form as a special case). We will now give conditions ensuring that a diagonal MA representation is unique. For that purpose, we consider the following assumptions and use the following matrix lemma (which may be of separate interest).

Assumption 3.5 The matrices Φ(z) and Θ(z) have the following form:

$$\Phi(z) = I_K - \Phi_1 z - \cdots - \Phi_p z^p, \qquad \Theta(z) = I_K - \Theta_1 z - \cdots - \Theta_q z^q.$$

Assumption 3.6 Θ(z) is diagonal:

$$\Theta(z) = \operatorname{diag}[\theta_{11}(z), \ldots, \theta_{KK}(z)]$$

where θ_{ii}(z) = 1 − θ_{ii,1} z − ··· − θ_{ii,q_i} z^{q_i} and θ_{ii,q_i} ≠ 0, i = 1, ..., K.

Assumption 3.7 For each i = 1, ..., K, there are no roots common to Φ_{i•}(z) and θ_{ii}(z), i.e. there is no value z^{*} such that Φ_{i•}(z^{*}) = 0 and θ_{ii}(z^{*}) = 0.

Lemma 3.8 Let [Φ(z), Θ(z)] and [Φ̄(z), Θ̄(z)] be two pairs of polynomial matrices which satisfy Assumptions 3.5 to 3.7. If

$$\Phi(z)^{-1}\Theta(z) = \bar{\Phi}(z)^{-1}\bar{\Theta}(z), \quad \text{for } 0 \le |z| < \rho_0, \tag{3.11}$$

where ρ_0 is a positive constant, then

$$\Phi(z) = \bar{\Phi}(z) \quad \text{and} \quad \Theta(z) = \bar{\Theta}(z), \quad \forall z. \tag{3.12}$$

The proof of this lemma, as well as those of the other propositions, appears in the Appendix. The lemma entails that the matrix Φ(z)^{-1}Θ(z) has a unique factorization in terms of the polynomial matrices Φ(z) and Θ(z), i.e. the operators Φ(L) and Θ(L) are uniquely defined by Φ(L)^{-1}Θ(L). It is also easy to see that the condition

$$\Phi(z)^{-1}\Theta(z) = \bar{\Phi}(z)^{-1}\bar{\Theta}(z) \tag{3.13}$$

could be replaced by

$$\Theta(z)^{-1}\Phi(z) = \bar{\Theta}(z)^{-1}\bar{\Phi}(z) \tag{3.14}$$

since by assumption the inverses of Θ(z) and Θ̄(z) exist. Note that Assumption 3.5 is equivalent to (2.3)-(2.4). It is interesting to note that the conditions of Lemma 3.8 allow det[Φ(z)] and det[Θ(z)] to have roots on or inside the unit circle |z| = 1. Further, Assumption 3.7 is weaker than the hypothesis that det[Φ(L)] and det[Θ(L)] have no common roots, which would be a generalization of the usual identification condition for ARMA models. We can now show that a VARMA model in diagonal MA form has a unique representation.

Theorem 3.9 (Identification of the diagonal MA equation form representation) Let {Y_t : t ∈ Z} be a VARMA process satisfying the conditions (2.1)-(2.6). If Assumptions 3.6 and 3.7 hold, then the polynomial operators Φ(L) and Θ(L) are uniquely defined.

Similarly, we can demonstrate that the final MA equation form representation is identified under the following assumption.

Assumption 3.10 There are no roots common to Φ(z) and θ(z), i.e. there is no value z^{*} such that Φ(z^{*}) = 0 and θ(z^{*}) = 0.

Theorem 3.11 (Identification of the final MA equation form representation) Let {Y_t : t ∈ Z} be a VARMA process satisfying the conditions (2.1)-(2.6). If the model is in final MA equation form and Assumption 3.10 holds, then the polynomial operators Φ(L) and Θ(L) are uniquely defined.

From equation (3.7), we see that it is always possible to obtain a diagonal MA equation form representation starting from any VARMA representation. One case where we would obtain a diagonal, and not final, MA representation is when there are common factors across rows for some columns of Θ(L), as in (3.10).

A strong appeal of the diagonal and final MA equation form representations is that it is easy to obtain the equivalent (in terms of autocovariances) invertible MA representation of a non-invertible representation. With ARMA models, we simply have to invert the roots of the MA polynomial which are inside the unit circle and adjust the variance of the innovations (divide it by the square of these roots); see Hamilton (1994, Section 3.7). The same procedure can be applied to VARMA models in diagonal or final MA equation form.
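The univariate mechanics are easy to check in closed form. For an MA(1), y_t = u_t − θu_{t−1} with |θ| > 1, the root 1/θ of the MA polynomial lies inside the unit circle; flipping it and dividing the innovation variance by the squared root yields the invertible parameterization with identical autocovariances. This is the standard textbook result sketched here, not the authors' multivariate algorithm:

```python
def ma1_autocovariances(theta, sigma2):
    """(gamma_0, gamma_1) of the MA(1) process y_t = u_t - theta*u_{t-1}, Var(u_t) = sigma2."""
    return ((1.0 + theta ** 2) * sigma2, -theta * sigma2)

theta, sigma2 = 2.0, 1.0          # non-invertible: root of 1 - theta*L is 1/theta = 0.5
root = 1.0 / theta                # MA root inside the unit circle
theta_inv = root                  # flipped parameterization: new coefficient is 1/theta
sigma2_inv = sigma2 / root ** 2   # adjusted innovation variance

gamma = ma1_autocovariances(theta, sigma2)
gamma_inv = ma1_autocovariances(theta_inv, sigma2_inv)
```

Both parameterizations produce the same (γ_0, γ_1), so they are observationally equivalent at the level of second moments.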

For VARMA representations where no particular simple structure is imposed on the MA part, we are not aware at the moment of an algorithm to go from the non-invertible to the invertible representation, though theoretically this invertible representation exists and is unique as long as det[Θ(z)] ≠ 0 for |z| = 1; see Hannan and Deistler (1988, Chapter 1, Section 3). It might therefore be troublesome to use nonlinear optimization with these VARMA representations, since we do not know how to go from the non-invertible to the invertible representation.


We can also consider the following natural generalization of the final AR equation form, where

we simply replace the scalar AR operator by a diagonal operator.

Definition 3.12 (Diagonal AR equation form) The VARMA model (2.1) is said to be in diagonal AR equation form if Φ(L) = diag[ϕii(L)] = IK − Φ1 L − ··· − Φp L^p, where ϕii(L) = 1 − ϕii,1 L − ··· − ϕii,pi L^{pi} and p = max_{1≤i≤K}(pi).

Assumption 3.13 For each i = 1, ..., K, there are no roots common to ϕii(z) and Θi•(z), i.e. there is no value z⋆ such that ϕii(z⋆) = 0 and Θi•(z⋆) = 0.

Theorem 3.14 (Identification of diagonal AR equation form representation) Let {Yt: t ∈ Z}

be a VARMA process satisfying the conditions (2.1)-(2.6). If the model is in diagonal AR equation form and Assumption 3.13 holds, then the polynomial operators Φ(L) and Θ(L) are uniquely

defined.

From Theorem 3.9, we can see that one way to ensure identification is to impose constraints

on the MA operator. This is an alternative approach to the ones developed for example in Hannan

(1971, 1976) where the identification is obtained by restricting the autoregressive part to be lower

triangular with deg[ϕij(L)] ≤ deg[ϕii(L)] for j > i, or in the final AR equation form where Φ(L)

is scalar. It may be more interesting to impose constraints on the moving average part instead

because it is this part which causes problems in the estimation of VARMA models. Other identified

representations which do not have a simple MA operator include the reversed echelon canonical

form [see Poskitt (1992)], where the rows of the VARMA model in echelon form are permuted

so that the Kronecker indices are ordered from smallest to largest, and the scalar component model

[see Tiao and Tsay (1989)] where contemporaneous linear transformations of the vector process are

considered. A general treatment of algebraic and topological structure underlying VARMA models

is given in Hannan and Kavalieris (1984b).

4. Estimation

We next introduce elements of notation for the parameters of our model. First, irrespective of the

VARMA representation employed, we split the whole vector of parameters γ in two parts: γ1 (the parameters of the AR part) and γ2 (those of the MA part):

γ = [γ1, γ2]′.   (4.1)

For a VARMA model in diagonal MA equation form, γ1 and γ2 are

γ1 = [ϕ1•,1, ..., ϕ1•,p, ..., ϕK•,1, ..., ϕK•,p],   (4.2)
γ2 = [θ11,1, ..., θ11,q1, ..., θKK,1, ..., θKK,qK],   (4.3)

while for a VARMA model in final MA equation form, γ2 is

γ2 = [θ1, ..., θq].
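As an illustration of how (4.2)-(4.3) order the coefficients, a packing routine for the diagonal MA form might look as follows; the function name and argument layout are our own, not the paper's.

```python
import numpy as np

def pack_dma(Phi, theta_diag):
    """Build gamma = [gamma1, gamma2] for the diagonal MA form (sketch).
    gamma1 stacks the full AR coefficient rows phi_{i.,j}, grouped by
    equation i, then lag j; gamma2 stacks the diagonal MA coefficients.
    Phi: list of p arrays of shape (K, K);
    theta_diag[i]: [theta_ii,1, ..., theta_ii,q_i]."""
    K = Phi[0].shape[0]
    # For each equation i, concatenate row i of Phi_1, ..., Phi_p.
    gamma1 = np.concatenate(
        [np.concatenate([P[i] for P in Phi]) for i in range(K)])
    gamma2 = np.concatenate([np.asarray(t, dtype=float) for t in theta_diag])
    return np.concatenate([gamma1, gamma2])
```

With K = 2 and p = 1, the AR rows appear equation by equation, followed by the diagonal MA coefficients θ11,· and θ22,·.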


For VARMA models in diagonal AR equation form, the roles of γ1 and γ2 are simply interchanged:

γ1 = [ϕ11,1, ..., ϕ11,p1, ..., ϕKK,1, ..., ϕKK,pK],   (4.4)
γ2 = [θ1•,1, ..., θ1•,q, ..., θK•,1, ..., θK•,q],   (4.5)

while for a VARMA model in final AR equation form,

γ1 = [ϕ1, ..., ϕp].   (4.6)

The estimation method involves three steps.

Step 1. Estimate a VAR(nT) to approximate the VARMA(p,q) and recover the residuals, which we will call Ût:

Ût = Yt − Σ_{l=1}^{nT} Π̂l^{nT} Yt−l,   (4.7)

with T > 2K nT.

Step 2. With the residuals from Step 1, compute an estimate of the covariance matrix of Ut, Σ̂U = (1/T) Σ_{t=nT+1}^{T} Ût Ût′, and estimate by GLS the multivariate regression

Φ(L)Yt = [Θ(L) − IK] Ût + et,   (4.8)

to get estimates Φ̃(L) and Θ̃(L) of Φ(L) and Θ(L). The estimator is

γ̃ = [ Σ_{t=l}^{T} Ẑ′t−1 Σ̂U⁻¹ Ẑt−1 ]⁻¹ [ Σ_{t=l}^{T} Ẑ′t−1 Σ̂U⁻¹ Yt ],   (4.9)

where l = nT + max(p,q) + 1. Setting

Yt−1(p) = [y1,t−1, ..., yK,t−1, ..., y1,t−p, ..., yK,t−p],   (4.10)
Ût−1 = [û1,t−1, ..., ûK,t−1, ..., û1,t−q, ..., ûK,t−q],   (4.11)
yk,t−1 = [yk,t−1, ..., yk,t−pk],   (4.12)
ûk,t−1 = [ûk,t−1, ..., ûk,t−qk],   (4.13)

the matrix Ẑt−1 for the various representations is:

⎡ Yt−1(p)   ···     0        û1,t−1   ···     0     ⎤
Ẑ^DMA_{t−1} = ⎢    ⋮        ⋱      ⋮          ⋮       ⋱      ⋮     ⎥ ,   (4.14)
⎣    0      ···   Yt−1(p)      0      ···   ûK,t−1  ⎦

⎡ Yt−1(p)   ···     0        û1,t−1 ⎤
Ẑ^FMA_{t−1} = ⎢    ⋮        ⋱      ⋮          ⋮    ⎥ ,   (4.15)
⎣    0      ···   Yt−1(p)    ûK,t−1 ⎦


⎡ y1,t−1   ···     0       Ût−1   ···    0    ⎤
Ẑ^DAR_{t−1} = ⎢   ⋮       ⋱      ⋮        ⋮      ⋱     ⋮    ⎥ ,   (4.16)
⎣   0      ···   yK,t−1     0     ···   Ût−1  ⎦

⎡ y1,t−1    Ût−1   ···    0    ⎤
Ẑ^FAR_{t−1} = ⎢   ⋮        ⋮     ⋱     ⋮    ⎥ ,   (4.17)
⎣ yK,t−1     0     ···   Ût−1  ⎦

where DMA, FMA, DAR and FAR stand for Diagonal MA, Final MA, Diagonal AR and Final AR equation form, respectively.
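To make Steps 1 and 2 concrete, here is a minimal sketch with our own naming (not the paper's): a long VAR fitted by OLS supplies the residuals of (4.7), and the Ẑ^DMA regressor matrix of (4.14) is assembled per observation from the AR block IK ⊗ Yt−1(p) and a diagonal MA block.

```python
import numpy as np

def long_var_residuals(Y, n_T):
    """Step 1 (sketch): regress Y_t on (Y_{t-1}, ..., Y_{t-n_T}) by OLS and
    return the residuals U_hat.  Y has shape (T, K)."""
    T, K = Y.shape
    X = np.hstack([Y[n_T - l:T - l] for l in range(1, n_T + 1)])  # lagged Y's
    Pi, *_ = np.linalg.lstsq(X, Y[n_T:], rcond=None)
    return Y[n_T:] - X @ Pi                         # shape (T - n_T, K)

def Z_dma(Y_lags, u_lags):
    """One observation's Z^DMA_{t-1} (sketch): I_K kron Y_{t-1}(p) for the
    AR part, and (u_hat_{k,t-1}, ..., u_hat_{k,t-q_k}) placed diagonally
    for the MA part.  Y_lags: flat vector Y_{t-1}(p); u_lags: K vectors."""
    K = len(u_lags)
    ar = np.kron(np.eye(K), np.atleast_2d(Y_lags))  # K x (K^2 * p)
    ma = np.zeros((K, sum(len(u) for u in u_lags)))
    col = 0
    for k, u in enumerate(u_lags):                  # diagonal MA block
        ma[k, col:col + len(u)] = u
        col += len(u)
    return np.hstack([ar, ma])
```

The GLS estimator (4.9) then stacks these per-observation blocks; the sketch leaves out the weighting by Σ̂U⁻¹.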

Step 3. Using the second step estimates, we first form new residuals

Ũt = Yt − Σ_{i=1}^{p} Φ̃i Yt−i + Σ_{j=1}^{q} Θ̃j Ũt−j,   (4.18)

initializing with Ũt = 0, t ≤ max(p,q), and we define

Xt = Σ_{j=1}^{q} Θ̃j Xt−j + Yt,   (4.19)
Wt = Σ_{j=1}^{q} Θ̃j Wt−j + Ũt,   (4.20)

initializing with Xt = Wt = 0 for t ≤ max(p,q). We also compute a new estimate of ΣU, Σ̃U = (1/T) Σ_{t=max(p,q)+1}^{T} Ũt Ũt′. Then we regress by GLS Ũt + Xt − Wt on Ṽt−1, with

Ṽt = Σ_{j=1}^{q} Θ̃j Ṽt−j + Z̃t,   (4.21)

where Z̃t is just like Ẑt from Step 2 except that it is computed with Ũt instead of Ût, to obtain regression coefficients that we call Φ̂i and Θ̂j:

γ̂ = [ Σ_{t=max(p,q)+1}^{T} Ṽ′t−1 Σ̃U⁻¹ Ṽt−1 ]⁻¹ [ Σ_{t=max(p,q)+1}^{T} Ṽ′t−1 Σ̃U⁻¹ (Ũt + Xt − Wt) ].   (4.22)
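The recursions (4.18)-(4.20) are simple filters and can be sketched as follows; the function name is ours, and, for simplicity, pre-sample values are set to zero rather than zeroing the first max(p,q) observations exactly as in the text.

```python
import numpy as np

def third_step_filters(Y, Phi, Theta):
    """Sketch of the Step 3 recursions with zero initial conditions:
    U~_t = Y_t - sum_i Phi_i Y_{t-i} + sum_j Theta_j U~_{t-j},
    X_t  = sum_j Theta_j X_{t-j} + Y_t,
    W_t  = sum_j Theta_j W_{t-j} + U~_t.
    Y: (T, K); Phi, Theta: lists of (K, K) coefficient matrices."""
    T, K = Y.shape
    p, q = len(Phi), len(Theta)
    U = np.zeros((T, K)); X = np.zeros((T, K)); W = np.zeros((T, K))
    for t in range(T):
        U[t] = Y[t]
        for i in range(1, p + 1):
            if t - i >= 0:
                U[t] -= Phi[i - 1] @ Y[t - i]
        for j in range(1, q + 1):
            if t - j >= 0:
                U[t] += Theta[j - 1] @ U[t - j]
        X[t] = Y[t]; W[t] = U[t]
        for j in range(1, q + 1):
            if t - j >= 0:
                X[t] += Theta[j - 1] @ X[t - j]
                W[t] += Theta[j - 1] @ W[t - j]
    return U, X, W
```

The regressand Ũt + Xt − Wt of (4.22) is then formed from these three filtered series.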

The properties of the above estimates are summarized in the following three theorems. Theorem 4.1 is a generalization of results from Lewis and Reinsel (1985), where convergence is demonstrated for mixing rather than i.i.d. innovations. We denote the Euclidean norm by ‖B‖² = tr(B′B).

Theorem 4.1 (VARMA first step estimates) Let (i) the VARMA model be defined by equations (2.1)-(2.6); (ii) the strong mixing condition (2.16) hold; (iii) assume that E[|uit|^{4+2δ}] < ∞, ∀i and for some δ > 0. If nT grows at a rate faster than log T with nT²/T → 0, then for the first stage estimates

Σ_{l=1}^{nT} ‖Π̂l^{nT} − Πl‖ = Op(nT T^{−1/2}).   (4.23)

Theorem 4.2 (VARMA second step estimates) Let (i) the VARMA model be defined by equations (2.1)-(2.6) and be identified; (ii) let the strong mixing condition (2.16) hold; (iii) assume that E[|uit|^{4+2δ}] < ∞, ∀i and for some δ > 0. If nT grows at a rate faster than log T with nT²/T → 0, then the second stage estimates converge in quadratic mean to their true value and

√T (γ̃ − γ) →d N(0, J̃⁻¹ Ĩ J̃⁻¹),

where

Ĩ = Σ_{j=−∞}^{∞} E[(Z′t−1 ΣU⁻¹ Ut)(Z′t−1−j ΣU⁻¹ Ut−j)′],   J̃ = E[Z′t−1 ΣU⁻¹ Zt−1],

and Zt−1 is equal to the matrix Ẑt−1 where Ût is replaced by Ut. Further, if mT⁴/T → 0 with mT → ∞, then the matrices Ĩ and J̃ can be consistently estimated in probability by, respectively,

ĨT = (1/T) Σ_{j=−mT}^{mT} ω(j, mT) Σ_{t=l+|j|}^{T} (Ẑ′t−1 Σ̂U⁻¹ Ũt)(Ẑ′t−1−j Σ̂U⁻¹ Ũt−j)′,   (4.24)

J̃T = (1/T) Σ_{t=l}^{T} Ẑ′t−1 Σ̂U⁻¹ Ẑt−1,   (4.25)

with ω(j, mT) = 1 − |j|/(mT + 1).
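The estimator (4.24) is a Bartlett-kernel long-run variance of the score vectors st = Ẑ′t−1 Σ̂U⁻¹ Ũt. A sketch, with our own naming and the scores precomputed as rows of an array, exploits the symmetry between the +j and −j terms:

```python
import numpy as np

def hac_I(scores, m_T):
    """Bartlett-weighted estimate of I = sum_j E[s_t s_{t-j}'] (sketch).
    scores: (T, d) array whose row t is the score s_t;
    weights w(j, m_T) = 1 - |j|/(m_T + 1)."""
    S = np.asarray(scores, dtype=float)
    T = S.shape[0]
    I_T = (S.T @ S) / T                       # j = 0 term
    for j in range(1, m_T + 1):
        w = 1.0 - j / (m_T + 1)               # Bartlett weight
        G = (S[j:].T @ S[:-j]) / T            # Gamma_j = (1/T) sum_t s_t s_{t-j}'
        I_T += w * (G + G.T)                  # +j and -j terms together
    return I_T
```

The bandwidth m_T plays the role of mT in the theorem; the consistency condition mT⁴/T → 0 with mT → ∞ governs how fast it may grow with the sample size.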

Theorem 4.3 (VARMA third step estimates) Let (i) the VARMA model be defined by equations (2.1)-(2.6) and be identified; (ii) let the strong mixing condition (2.16) hold; (iii) assume that E[|uit|^{4+2δ}] < ∞, ∀i and for some δ > 0. If nT grows at a rate faster than log T with nT²/T → 0, then the third stage estimates converge in quadratic mean to their true value, and

√T (γ̂ − γ) →d N(0, Ĵ⁻¹ Î Ĵ⁻¹)   (4.26)

with

Î = Σ_{j=−∞}^{∞} E[(V′t−1 ΣU⁻¹ Ut)(V′t−1−j ΣU⁻¹ Ut−j)′],   Ĵ = E[V′t−1 ΣU⁻¹ Vt−1],   (4.27)

and Vt−1 is equal to the matrix Ṽt−1 where Ũt is replaced by Ut. Further, if mT⁴/T → 0 with

