
Generalized Autoregressive Score Models with Applications∗

Drew Creal(a), Siem Jan Koopman(b,d), André Lucas(c,d)

(a)University of Chicago, Booth School of Business

(b)Department of Econometrics, VU University Amsterdam

(c)Department of Finance, VU University Amsterdam, and Duisenberg school of finance

(d)Tinbergen Institute, Amsterdam

August 9, 2011

Abstract

We propose a class of observation driven time series models referred to as Generalized

Autoregressive Score (GAS) models. The mechanism to update the parameters over time

is the scaled score of the likelihood function. This new approach provides a unified and

consistent framework for introducing time-varying parameters in a wide class of non-linear

models. The GAS model encompasses other well-known models such as the generalized

autoregressive conditional heteroskedasticity, the autoregressive conditional duration, the

autoregressive conditional intensity, and Poisson count models with time-varying mean.

In addition, our approach can lead to new formulations of observation driven models. We

illustrate our framework by introducing new model specifications for time-varying copula

functions and for multivariate point processes with time-varying parameters. We study

the models in detail and provide simulation and empirical evidence.

Keywords: Copula functions, Dynamic models, Marked point processes, Time-varying parameters.

JEL classification codes: C10, C22, C32, C51.

∗We would like to thank Karim Abadir, Konrad Banachewicz, Charles Bos, Jianqing Fan, Clive Granger, Andrew Harvey, Marius Ooms, Neil Shephard and Michel van der Wel for their comments on an earlier draft of the paper. We have benefited from the comments of participants at the conference “High-Frequency Data Analysis in Financial Markets” of Hitotsubashi University, Tokyo, at a meeting of the Society for Financial Econometrics (SoFiE) in Geneva, and at seminar presentations at the University of Alicante, University of Chicago, NAKE 2008 Utrecht research day, Center for Operations Research and Econometrics at Université Catholique de Louvain, Imperial College London, Oxford-Man Institute, Princeton University, Erasmus University Rotterdam, Tinbergen Institute and VU University Amsterdam. We thank Moody’s for providing the credit rating transition data for one of our applications. Correspondence: Drew D. Creal, University of Chicago Booth School of Business, 5807 South Woodlawn Avenue, Chicago, IL 60637. Email: Drew.Creal@chicagobooth.edu


1 Introduction

In many settings of empirical interest, time variation in a selection of model parameters

is important for capturing the dynamic behavior of univariate and multivariate time series

processes. Time series models with time-varying parameters have been categorized by Cox

(1981) into two classes of models: observation driven models and parameter driven models.

In the observation driven approach, time variation of the parameters is introduced by letting

parameters be functions of lagged dependent variables as well as contemporaneous and lagged

exogenous variables. Although the parameters are stochastic, they are perfectly predictable

given the past information. This approach simplifies likelihood evaluation and explains why

observation driven models have become popular in the applied statistics and econometrics

literature. Typical examples of these models are the generalized autoregressive conditional

heteroskedasticity (GARCH) models of Engle (1982), Bollerslev (1986) and Engle and Bollerslev

(1986), the autoregressive conditional duration and intensity (ACD and ACI, respectively)

models of Engle and Russell (1998) and Russell (2001), the dynamic conditional correlation

(DCC) model of Engle (2002a), the Poisson count models discussed by Davis, Dunsmuir, and

Streett (2003), the dynamic copula models of Patton (2006), and the time-varying quantile

model of Engle and Manganelli (2004). Many of these existing observation driven models are encompassed by our modeling framework for time-varying parameters. In addition, new models can be formulated and investigated.

In parameter driven models, the parameters are stochastic processes with their own source

of error. Given past and concurrent observations, the parameters are not perfectly predictable.

Typical examples of parameter driven models are the stochastic volatility (SV) model, see

Shephard (2005) for a detailed discussion, and the stochastic intensity models of Bauwens

and Hautsch (2006) and Koopman, Lucas, and Monteiro (2008). Estimation is usually more

involved for these models because the associated likelihood functions are not available in closed form. Exceptions include linear Gaussian state space models and discrete-state hidden Markov models, see Harvey (1989) and Hamilton (1989), respectively. In most other cases, computing the likelihood function requires the evaluation of a high-dimensional integral based on simulation methods such as importance sampling and Markov chain Monte Carlo; for example, see Shephard and Pitt (1997).


The main contribution of this paper is the development of a framework for time-varying

parameters which is based on the score function of the predictive model density at time t. We

will argue that the score function is an effective choice for introducing a driving mechanism for

time-varying parameters. In particular, by scaling the score function appropriately, standard

observation driven models such as the GARCH, ACD, and ACI models can be recovered.

Application of this framework to other non-linear, non-Gaussian, possibly multivariate, models

will lead to the formulation of new observation driven models.

We refer to our observation driven model based on the score function as the generalized autoregressive score (GAS) model. The GAS model shares the advantages of other observation driven models. Likelihood evaluation is straightforward. Extensions to asymmetric, long memory, and other more complicated dynamics can be considered without introducing further complexities. Since the GAS model is based on the score, it exploits the complete density structure rather than means and higher moments only. This differentiates the GAS model from other observation driven models in the literature, such as the generalized autoregressive moving average models of Shephard (1995) and ?) and the vector multiplicative error models of Cipollini, Engle, and Gallo (2006).

In our first illustration, we develop new models for time-varying copulas. The copula function provides an important tool for the econometrics of financial risk measurement. Patton (2006) introduced the notion of time-varying copulas and provided the main properties of dynamic copula functions. Other models for time-varying copulas include the locally constant copula models of Giacomini, Härdle, and Spokoiny (2007) and the stochastic copula model of Hafner and Manner (2011). Another interesting copula-based model is developed by Lee and Long (2009), where the multivariate GARCH model is extended with copula functions to capture any remaining dependence in the volatility of the time series. An extended review of the recent developments of copula functions in time series models is given by Patton (2009).

In our second illustration, we create a new class of multivariate point-process models for

credit risk. Models for counterparty default and rating transition risk are an important element

in the current regulatory system for financial institutions. Many of the new models are based

on marked point-processes with time-varying intensities for different levels of risk. Parameter

estimation relies on computationally demanding methods, see for example, Duffie, Eckner,

Horel, and Saita (2009). One of the main challenges when modeling credit events is the sparse number of transitions for each individual company. We show how a multi-state model for

pooled marked point-processes follows naturally within our framework. We analyze an extensive

data set of Moody’s rating histories of more than 8,000 U.S. corporates over a time span of

almost thirty years. We compare the results of the GAS model with those of its parameter

driven counterpart. The parameters in the benchmark model need to be estimated using a Markov chain Monte Carlo method, which is computationally more demanding than our maximum likelihood procedure. Despite the substantial differences in computing time, the GAS model produces almost identical estimates of time-varying default and rating transition probabilities when compared with those of the parameter driven model.

The remainder of the paper is organized as follows. In Section 2 we provide the basic GAS

specification together with a set of motivating examples. Section 3 describes several new copula

models with time-varying parameters. Section 4 presents the model for marked point-processes

with time-varying parameters. Section 5 concludes.

2 Model specification and properties

In this section we formulate a general class of observation driven time-varying parameter models.

The basic specification is introduced and a set of examples is provided for illustrative purposes.

We also discuss maximum likelihood estimation and model specification.

2.1 Basic model specification

Let the $N \times 1$ vector $y_t$ denote the dependent variable of interest, $f_t$ the time-varying parameter vector, $x_t$ a vector of exogenous variables (covariates), all at time $t$, and $\theta$ a vector of static parameters. Define $Y_t = \{y_1, \ldots, y_t\}$, $F_t = \{f_0, f_1, \ldots, f_t\}$, and $X_t = \{x_1, \ldots, x_t\}$. The available information set at time $t$ consists of $\{f_t, \mathcal{F}_t\}$, where

$$\mathcal{F}_t = \{Y_{t-1}, F_{t-1}, X_t\}, \qquad t = 1, \ldots, n.$$

We assume that $y_t$ is generated by the observation density

$$y_t \sim p(y_t \mid f_t, \mathcal{F}_t; \theta). \qquad (1)$$


Furthermore, we assume that the mechanism for updating the time-varying parameter $f_t$ is given by the familiar autoregressive updating equation

$$f_{t+1} = \omega + \sum_{i=1}^{p} A_i s_{t-i+1} + \sum_{j=1}^{q} B_j f_{t-j+1}, \qquad (2)$$

where $\omega$ is a vector of constants, coefficient matrices $A_i$ and $B_j$ have appropriate dimensions for $i = 1, \ldots, p$ and $j = 1, \ldots, q$, while $s_t$ is an appropriate function of past data, $s_t = s_t(y_t, f_t, \mathcal{F}_t; \theta)$. The unknown coefficients in (2) are functions of $\theta$, that is $\omega = \omega(\theta)$, $A_i = A_i(\theta)$, and $B_j = B_j(\theta)$ for $i = 1, \ldots, p$ and $j = 1, \ldots, q$. The main contribution of this paper is the particular choice for the driving mechanism $s_t$ that is applicable over a wide class of observation densities and non-linear models.

Our approach is based on the observation density (1) for a given parameter $f_t$. When an observation $y_t$ is realized, we update the time-varying $f_t$ to the next period $t+1$ using (2) with

$$s_t = S_t \cdot \nabla_t, \qquad \nabla_t = \frac{\partial \ln p(y_t \mid f_t, \mathcal{F}_t; \theta)}{\partial f_t}, \qquad S_t = S(t, f_t, \mathcal{F}_t; \theta), \qquad (3)$$

where $S(\cdot)$ is a matrix function. Given the dependence of the driving mechanism in (2) on the scaled score vector (3), we let equations (1)–(3) define the generalized autoregressive score model with orders $p$ and $q$. We may abbreviate the resulting model as GAS($p$,$q$).
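To make the recursion (2) with the scaled score (3) concrete, the following is a minimal Python sketch of a scalar GAS(1,1) filter, assuming the user supplies the score $\nabla_t$ and the scaling $S_t$ as functions of $(y_t, f_t)$ and $f_t$. The names `gas11_filter`, `gauss_var_score` and `gauss_var_inv_info` are hypothetical illustrations, not code from the paper. The example plugs in a Gaussian density with time-varying variance $f_t$, for which $\nabla_t = (y_t^2 - f_t)/(2 f_t^2)$ and the inverse information is $2 f_t^2$; the parameter values are arbitrary.

```python
import numpy as np


def gas11_filter(y, f1, omega, A, B, score, scaling):
    """Scalar GAS(1,1): f_{t+1} = omega + A * s_t + B * f_t with s_t = S_t * grad_t.

    `score(y_t, f_t)` returns the score grad_t and `scaling(f_t)` returns S_t.
    Returns the array f_1, ..., f_{n+1}.
    """
    f = np.empty(len(y) + 1)
    f[0] = f1
    for t, yt in enumerate(y):
        s = scaling(f[t]) * score(yt, f[t])   # scaled score s_t, as in (3)
        f[t + 1] = omega + A * s + B * f[t]   # autoregressive update (2) with p = q = 1
    return f


def gauss_var_score(y, f):
    # d log N(y; 0, f) / d f
    return (y**2 - f) / (2.0 * f**2)


def gauss_var_inv_info(f):
    # inverse Fisher information for f, i.e. S_t = I^{-1}
    return 2.0 * f**2


rng = np.random.default_rng(0)
y = rng.standard_normal(200)
f = gas11_filter(y, f1=1.0, omega=0.05, A=0.1, B=0.95,
                 score=gauss_var_score, scaling=gauss_var_inv_info)
```

Passing the score and scaling as callables keeps the recursion itself independent of the observation density, which mirrors how (2) and (3) separate the updating equation from the choice of $p(y_t \mid f_t, \mathcal{F}_t; \theta)$ and $S_t$.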

The use of the score for updating $f_t$ is intuitive. It defines a steepest ascent direction for improving the model's local fit in terms of the likelihood or density at time $t$ given the current position of the parameter $f_t$. This provides the natural direction for updating the parameter. In addition, the score depends on the complete density, and not only on the first or second order moments of the observations $y_t$. This distinguishes the GAS framework from most of the other observation driven approaches in the literature. By exploiting the full density structure, the GAS model introduces new transformations of the data that can be used to update the time-varying parameter $f_t$.

Through its choice of the scaling matrix $S_t$, the GAS model allows for additional flexibility in how the score is used for updating $f_t$. It is important to note that each different choice for the scaling matrix $S_t$ results in a different GAS model. The statistical and empirical properties of each of these models can be different and warrant separate inspection.


In many situations, it is natural to consider a form of scaling that depends on the variance of the score. For example, we can define the scaling matrix as

$$S_t = \mathcal{I}_{t|t-1}^{-1}, \qquad \mathcal{I}_{t|t-1} = \mathrm{E}_{t-1}[\nabla_t \nabla_t'], \qquad (4)$$

where $\mathrm{E}_{t-1}$ denotes an expectation with respect to $p(y_t \mid f_t, \mathcal{F}_t; \theta)$. For this choice of $S_t$, the GAS model encompasses the well-known observation driven GARCH model of Engle (1982) and Bollerslev (1986), the ACD model of Engle and Russell (1998), and the ACI model of Russell (2001) as well as most of the Poisson count models considered by Davis et al. (2003).

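Another possibility that we consider in this paper is the GAS model with scaling matrix

$$S_t = \mathcal{J}_{t|t-1}, \qquad \mathcal{J}_{t|t-1}' \mathcal{J}_{t|t-1} = \mathcal{I}_{t|t-1}^{-1}, \qquad (5)$$

where $S_t$ is defined as the square root matrix of the (pseudo-)inverse information matrix for (1) with respect to $f_t$. An advantage of this specific choice for $S_t$ is that the statistical properties of the corresponding GAS model become more tractable. This follows from the fact that for $S_t = \mathcal{J}_{t|t-1}$ the GAS step $s_t$ has constant unit variance: when $\mathcal{I}_{t|t-1}$ is invertible, $\mathrm{E}_{t-1}[s_t s_t'] = \mathcal{J}_{t|t-1} \mathcal{I}_{t|t-1} \mathcal{J}_{t|t-1}' = I$.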
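The unit-variance property can be checked by simulation. The sketch below assumes the Gaussian variance example with $f_t = \sigma_t^2$ held fixed at an arbitrary value, so that $\nabla_t = (y_t^2 - f_t)/(2 f_t^2)$ and $\mathcal{I}_{t|t-1} = 1/(2 f_t^2)$. It compares the sample variance of $s_t$ under $S_t = \mathcal{I}_{t|t-1}^{-1}$ with that under the square-root choice $S_t = \mathcal{J}_{t|t-1} = \mathcal{I}_{t|t-1}^{-1/2}$, which is the positive scalar root.

```python
import numpy as np

rng = np.random.default_rng(1)
f = 2.5                                  # arbitrary fixed value of the parameter f_t
y = np.sqrt(f) * rng.standard_normal(200_000)

grad = (y**2 - f) / (2.0 * f**2)         # score of N(0, f) with respect to f
info = 1.0 / (2.0 * f**2)                # Fisher information I_{t|t-1}

s_inv = grad / info                      # S_t = I^{-1}: s_t = y^2 - f
s_sqrt = grad / np.sqrt(info)            # S_t = J = I^{-1/2} (scalar case)

print(s_inv.mean(), s_inv.var())         # approx 0 and I^{-1} = 2 f^2
print(s_sqrt.mean(), s_sqrt.var())       # approx 0 and 1
```

Under inverse-information scaling the variance of the step is $\mathcal{I}_{t|t-1}^{-1}$, which changes with $f_t$; under square-root scaling it stays at one whatever the value of $f_t$.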

Another convenient choice is $S_t = I$, the identity matrix. The GAS model then captures models such as the autoregressive conditional multinomial (ACM) model of Russell and Engle (2005) or the GARMA models of ?). In the context of a fully generic observation density $p(y_t \mid f_t, \mathcal{F}_t; \theta)$, however, the statistical properties of the GAS model for these alternative choices of $S_t$ are typically much more complicated.

We can further generalize the GAS updating equation (2) in various directions. For example, it may be interesting to include exogenous variables in (2), or to generalize the evolution of $f_t$ by including other non-linear effects such as regime-switching. In addition, it may be more appropriate in some applications to consider long-memory versions of (2), for example

$$f_{t+1} = \omega + \sum_{i=1}^{\infty} \frac{(i + d - 1)!}{i!\,(d - 1)!}\, s_{t-i+1},$$

for a scalar $f_t$ and a fractional integration parameter $d < 1/2$. We obtain the fractionally integrated GAS model specification in the same vein as the well-known ARFIMA and FIGARCH models, see the contributions of Hosking (1981) and Baillie, Bollerslev, and Mikkelsen (1996), respectively.
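For non-integer $d$ the factorials in the long-memory weights are read through the Gamma function, $(i+d-1)!/(i!\,(d-1)!) = \Gamma(i+d)/(\Gamma(i+1)\Gamma(d))$. The sketch below assumes the infinite sum is truncated at a finite lag (an approximation not discussed in the text). It computes the weights with the recursion $w_1 = d$, $w_i = w_{i-1}(i+d-1)/i$ and applies them to a given history of steps. The helpers `frac_weights` and `fi_gas_update` are hypothetical names.

```python
import math

import numpy as np


def frac_weights(d, K):
    """Weights (i+d-1)!/(i!(d-1)!) for i = 1..K, via w_i = w_{i-1} * (i+d-1)/i."""
    w = np.empty(K)
    w[0] = d                             # i = 1: Gamma(1+d) / (Gamma(2) Gamma(d)) = d
    for i in range(2, K + 1):
        w[i - 1] = w[i - 2] * (i + d - 1) / i
    return w


def fi_gas_update(omega, s_hist, d):
    """Truncated f_{t+1} = omega + sum_i w_i s_{t-i+1}.

    s_hist[0] is the most recent step s_t, s_hist[1] is s_{t-1}, and so on.
    """
    w = frac_weights(d, len(s_hist))
    return omega + np.dot(w, s_hist)


w = frac_weights(0.3, 5)
# Cross-check against the Gamma-function form for the same lags
gamma_form = [math.gamma(i + 0.3) / (math.gamma(i + 1) * math.gamma(0.3)) for i in range(1, 6)]
print(w, gamma_form)
```

The recursion avoids evaluating large Gamma function values directly, which would overflow for long truncation lags.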

2.2 Special cases of GAS models

In this section we provide a number of simple examples that show how to operationalize the GAS

framework. The examples also reveal that the GAS framework encompasses a large number of available observation driven models presented in the literature for an appropriate choice of the scaling matrix $S_t$.

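Example 1: GARCH models. Consider the basic model $y_t = \sigma_t \varepsilon_t$ where the Gaussian disturbance $\varepsilon_t$ has zero mean and unit variance while $\sigma_t$ is a time-varying standard deviation. It is a basic exercise to show that the GAS(1,1) model with $S_t = \mathcal{I}_{t|t-1}^{-1}$ and $f_t = \sigma_t^2$ reduces to

$$f_{t+1} = \omega + A_1 (y_t^2 - f_t) + B_1 f_t, \qquad (6)$$

since the score is $\nabla_t = (y_t^2 - f_t)/(2 f_t^2)$ and the information is $\mathcal{I}_{t|t-1} = 1/(2 f_t^2)$, so that $s_t = y_t^2 - f_t$. This is equivalent to the standard GARCH(1,1) model as given by

$$f_{t+1} = \alpha_0 + \alpha_1 y_t^2 + \beta_1 f_t, \qquad f_t = \sigma_t^2, \qquad (7)$$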

where coefficients $\alpha_0 = \omega$, $\alpha_1 = A_1$ and $\beta_1 = B_1 - A_1$ are unknown and require certain conditions for stationarity, see Bollerslev (1986). However, if we assume that $\varepsilon_t$ follows a Student's t distribution with $\nu$ degrees of freedom and unit variance, the GAS(1,1) specification for the conditional variance leads to the updating equation

$$f_{t+1} = \omega + A_1 \cdot (1 + 3\nu^{-1}) \cdot \left( \frac{(1 + \nu^{-1})}{(1 - 2\nu^{-1})\left(1 + \nu^{-1} y_t^2 / \left((1 - 2\nu^{-1}) f_t\right)\right)}\, y_t^2 - f_t \right) + B_1 f_t. \qquad (8)$$

In case $\nu^{-1} = 0$, the Student's t distribution reduces to the Gaussian distribution and update (8) collapses to (6) as required. The recursion in (8), however, differs in an important way from the standard t-GARCH(1,1) model of Bollerslev (1987), which has the Student's t density in (1) with the updating equation (6). The denominator of the second term on the right-hand side of (8) causes a more moderate increase in the variance for a large realization of $|y_t|$ as long as $\nu$ is finite. The intuition is clear: if the errors are modeled by a fat-tailed distribution, a large absolute realization of $y_t$ does not necessitate a substantial increase in the variance. The GAS updating mechanism for the model with Student's t errors therefore is substantially different from its familiar GARCH counterpart. In independent work, a variance updating equation similar to (8) for the univariate Student's t distribution is proposed by Harvey and Chakravarty (2008); they also discuss the properties of the model in more detail. Recently, Creal, Koopman, and Lucas (2011) have extended this model to the fully multivariate case with further generalizations and compared it to the popular DCC model of Engle (2002a).
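The following sketch contrasts the Gaussian step in (6) with the Student's t step in (8) for increasingly large observations. It assumes unit-variance Student's t errors with $\nu = 5$ and arbitrary values for $\omega$, $A_1$, $B_1$ and $f_t$; the function names are illustrative. Rewriting the weight in (8) gives $w_t y_t^2 = (\nu+1) f_t y_t^2 / ((\nu-2) f_t + y_t^2) < (\nu+1) f_t$, so the t step stays below $(\nu+3) f_t$, while the Gaussian step grows like $y_t^2$.

```python
import numpy as np


def gauss_step(y, f):
    """Scaled score for N(0, f) with S_t = I^{-1}: s_t = y^2 - f, as in (6)."""
    return y**2 - f


def t_step(y, f, nu):
    """Scaled score for unit-variance Student's t with S_t = I^{-1}, as in (8)."""
    inv_nu = 1.0 / nu
    weight = (1 + inv_nu) / ((1 - 2 * inv_nu) * (1 + inv_nu * y**2 / ((1 - 2 * inv_nu) * f)))
    return (1 + 3 * inv_nu) * (weight * y**2 - f)


omega, A1, B1, f_t, nu = 0.05, 0.1, 0.95, 1.0, 5.0
for y in [0.5, 2.0, 5.0, 20.0]:
    f_gauss = omega + A1 * gauss_step(y, f_t) + B1 * f_t
    f_student = omega + A1 * t_step(y, f_t, nu) + B1 * f_t
    print(f"y={y:5.1f}  Gaussian f_next={f_gauss:8.3f}  Student-t f_next={f_student:6.3f}")
```

Setting $\nu$ very large recovers the Gaussian step, in line with the remark that (8) collapses to (6) when $\nu^{-1} = 0$.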

The GAS framework also provides a range of alternative time-varying variance equations for other heavy-tailed distributions. For example, consider the asymmetric Laplace distribution obtained by $y_t = w_t \cdot \tilde y^L_t + (1 - w_t) \cdot \tilde y^R_t$, where $w_t$ is a Bernoulli random variable with $\Pr[w_t = 1] = (1 + \vartheta^2)^{-1}$ for coefficient $\vartheta > 0$, and where $-\tilde y^L_t$ and $\tilde y^R_t$ are exponentially distributed random variables with means $\vartheta\sigma_t/2^{1/2}$ and $\sigma_t/(2^{1/2}\vartheta)$, respectively. The random variables $w_t$, $\tilde y^L_t$ and $\tilde y^R_t$ are assumed to be independent. The mean and variance of $y_t$ are 0 and $\sigma_t^2$, respectively. If we let $f_t = \log(\sigma_t^2)$, the GAS step takes the form

$$s_t = 2\left(\frac{2^{1/2}(-y_t)}{\vartheta\sigma_t} - 1\right) \cdot 1_{\{y_t \mid y_t \le 0\}}(y_t) + 2\left(\frac{2^{1/2}\vartheta y_t}{\sigma_t} - 1\right) \cdot 1_{\{y_t \mid y_t > 0\}}(y_t), \qquad (9)$$

where $1_A(x)$ is the indicator function for the set $A$, that is $1_A(x) = 1$ if $x \in A$, and zero otherwise. The GAS driving mechanism (9) is composed of linear segments with unequal absolute slopes. We can rewrite this as

$$s_t = \tilde\vartheta_1 \frac{2^{1/2} y_t}{\sigma_t} + \tilde\vartheta_2 \left( \frac{2^{1/2}|y_t|}{\sigma_t} - 2\tilde\vartheta_2^{-1} \right), \qquad (10)$$

where $\tilde\vartheta_1 = (\vartheta^2 - 1)/\vartheta$ and $\tilde\vartheta_2 = (\vartheta^2 + 1)/\vartheta$. Specification (10) is equivalent to the driving mechanism of the EGARCH model of Nelson (1991), who used the generalized error distribution (GED) instead of the asymmetric Laplace distribution described here.
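A simulation sketch of the asymmetric Laplace example, assuming the illustrative values $\vartheta = 1.5$ and $\sigma_t = 2$. It draws $y_t$ from the mixture with $\Pr[w_t = 1] = (1+\vartheta^2)^{-1}$, which is the mixing probability that makes the mean zero and the variance $\sigma_t^2$ given the stated exponential means. It then checks these two moments, evaluates the step in its piecewise form (9) and its rewritten form (10), and confirms that the step has mean close to zero. The function names are hypothetical.

```python
import numpy as np


def step_piecewise(y, sigma, theta):
    """GAS step (9) for f_t = log(sigma_t^2) under the asymmetric Laplace density."""
    r2 = np.sqrt(2.0)
    left = 2.0 * (r2 * (-y) / (theta * sigma) - 1.0)
    right = 2.0 * (r2 * theta * y / sigma - 1.0)
    return np.where(y <= 0, left, right)


def step_rewritten(y, sigma, theta):
    """Form (10) with theta1 = (theta^2 - 1)/theta and theta2 = (theta^2 + 1)/theta."""
    r2 = np.sqrt(2.0)
    th1 = (theta**2 - 1.0) / theta
    th2 = (theta**2 + 1.0) / theta
    return th1 * r2 * y / sigma + th2 * (r2 * np.abs(y) / sigma - 2.0 / th2)


rng = np.random.default_rng(2)
theta, sigma, n = 1.5, 2.0, 400_000
w = rng.random(n) < 1.0 / (1.0 + theta**2)                    # Pr[w_t = 1] = 1/(1 + theta^2)
y_left = -rng.exponential(theta * sigma / np.sqrt(2.0), n)     # -y^L has mean theta*sigma/sqrt(2)
y_right = rng.exponential(sigma / (np.sqrt(2.0) * theta), n)   # y^R has mean sigma/(sqrt(2)*theta)
y = np.where(w, y_left, y_right)

s = step_piecewise(y, sigma, theta)
print(y.mean(), y.var(), s.mean())   # approx 0, sigma^2 = 4, and 0
```

Conditional on the side of zero, $|y_t|$ scaled by its exponential mean is a unit exponential variable, so each segment of (9) equals twice a centered unit exponential, which is why the step averages to zero.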

Example 2: MEM, ACD and ACI models. Consider the model $y_t = \mu_t \varepsilon_t$ where $\varepsilon_t$ has a gamma distribution with density $p(\varepsilon_t; \alpha) = \Gamma(\alpha)^{-1} \varepsilon_t^{\alpha - 1} \alpha^{\alpha} \exp(-\alpha \varepsilon_t)$, with coefficient $\alpha$ and unit mean, so that $\mu_t$ is the mean of $y_t$. Using a change of variable, we obtain the model density

$$p(y_t \mid \mu_t; \alpha) = \Gamma(\alpha)^{-1} y_t^{\alpha - 1} \alpha^{\alpha} \mu_t^{-\alpha} \exp\left(-\frac{\alpha y_t}{\mu_t}\right). \qquad (11)$$

In case we set $f_t = \mu_t$, the GAS(1,1) updating equation with $S_t = \mathcal{I}_{t|t-1}^{-1}$ becomes

$$f_{t+1} = \omega + A_1 (y_t - f_t) + B_1 f_t. \qquad (12)$$

This specification is equivalent to the multiplicative error model (MEM) proposed by Engle (2002b) and extended in Engle and Gallo (2006). The exponential distribution is a special case of the gamma distribution when $\alpha = 1$. Hence, ACD and ACI models are special cases of the MEM class. The ACD model of Engle and Russell (1998) follows directly from (11) for $\alpha = 1$ and factor recursion (12). In case we specify the exponential density in terms of its intensity rather than its expected duration, we obtain $p(y_t \mid \lambda_t) = \lambda_t \exp(-\lambda_t y_t)$ with intensity $\lambda_t = 1/\mu_t$. Letting $\tilde f_t = \log(\lambda_t)$, the GAS(1,1) updating equation becomes

$$\tilde f_{t+1} = \omega + A_1 \left[ 1 - y_t \exp(\tilde f_t) \right] + B_1 \tilde f_t, \qquad (13)$$

which is equivalent to the standard ACI(1,1) model of Russell (2001).
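A numerical check of Example 2, assuming $\alpha = 2.5$, $\mu_t = 1.7$ and $y_t = 0.9$ purely for illustration. The score of (11) with respect to $\mu_t$ is approximated by a central finite difference and multiplied by the inverse information $\mu_t^2/\alpha$ (for (11), $\mathcal{I}_{t|t-1} = \alpha/\mu_t^2$), which should reproduce the step $y_t - \mu_t$ in (12). The ACI step in (13) is checked the same way for the exponential density in intensity form, whose information with respect to $\log\lambda_t$ equals one. The helper names are hypothetical.

```python
import math


def gamma_logpdf(y, mu, alpha):
    """log of (11): Gamma(alpha)^{-1} y^{alpha-1} alpha^alpha mu^{-alpha} exp(-alpha y / mu)."""
    return (-math.lgamma(alpha) + (alpha - 1) * math.log(y)
            + alpha * math.log(alpha) - alpha * math.log(mu) - alpha * y / mu)


def central_diff(fun, x, h=1e-6):
    return (fun(x + h) - fun(x - h)) / (2 * h)


alpha, mu, y = 2.5, 1.7, 0.9
grad_mu = central_diff(lambda m: gamma_logpdf(y, m, alpha), mu)
s_mem = (mu**2 / alpha) * grad_mu           # S_t = I^{-1} = mu^2 / alpha
print(s_mem, y - mu)                        # both approx -0.8

# ACI form: log p(y | lambda) = log(lambda) - lambda * y with log_lam = log(lambda)
log_lam = 0.3
grad_loglam = central_diff(lambda g: g - y * math.exp(g), log_lam)
print(grad_loglam, 1 - y * math.exp(log_lam))
```

Because the scaled score in (12) does not depend on $\alpha$, the same recursion applies to the whole MEM class, with $\alpha$ entering only through the likelihood.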

Example 3: Dynamic exponential family models. The class of natural exponential family models for a vector of observations $y_t$ can be represented by the density function

$$p(y_t \mid f_t, \mathcal{F}_t; \theta) = \exp[\gamma' y_t - c(\gamma) + h(y_t)], \qquad (14)$$

with scalar functions $c(\cdot)$ and $h(\cdot)$ and $m \times 1$ parameter vector $\gamma$. We consider replacing $\gamma$ by a time-varying parameter vector $\gamma_t$ that is specified as

$$\gamma_t = d + Z f_t,$$

with $m \times 1$ constant vector $d$ and $m \times r$ factor loading matrix $Z$. The unknown coefficients in $d$ and $Z$ are placed in parameter vector $\theta$. Further, we impose a GAS specification on the time-varying factor $f_t$. The GAS driving mechanism with $S_t = \mathcal{I}_{t|t-1}^{-1}$ is given by

$$s_t = [Z' \ddot c(\gamma_t) Z]^{-1} Z' [y_t - \dot c(\gamma_t)],$$

where $\dot c(\gamma_t) = \partial c(\gamma_t)/\partial \gamma_t$ and $\ddot c(\gamma_t) = \partial^2 c(\gamma_t)/\partial \gamma_t \partial \gamma_t'$. This model directly encompasses some well-known models from the literature if we change the scaling choice. For example, for a Poisson density in (14) and $S_t = \mathcal{I}_{t|t-1}^{-1}$ we recover the observation driven model for Poisson counts of Davis et al. (2003).
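A minimal sketch of Example 3 for scalar Poisson counts, assuming the natural parameter $\gamma_t = \log\lambda_t$ with $d = 0$ and $Z = 1$, so that $c(\gamma) = \exp(\gamma)$ and $\dot c(\gamma_t) = \ddot c(\gamma_t) = \exp(\gamma_t)$. With $S_t = \mathcal{I}_{t|t-1}^{-1}$ the step reduces to $s_t = (y_t - \lambda_t)/\lambda_t$. The function name, parameter values and simulated counts are illustrative only.

```python
import numpy as np


def poisson_gas_filter(y, omega, A, B, f1):
    """GAS(1,1) for Poisson counts with f_t = gamma_t = log(lambda_t), d = 0, Z = 1."""
    f = np.empty(len(y) + 1)
    f[0] = f1
    for t, yt in enumerate(y):
        lam = np.exp(f[t])               # c_dot(gamma_t) = c_ddot(gamma_t) = exp(gamma_t)
        s = (yt - lam) / lam             # [Z' c_ddot Z]^{-1} Z' [y_t - c_dot] with Z = 1
        f[t + 1] = omega + A * s + B * f[t]
    return f


rng = np.random.default_rng(3)
counts = rng.poisson(lam=4.0, size=300)
f = poisson_gas_filter(counts, omega=0.1 * np.log(4.0), A=0.05, B=0.9, f1=np.log(4.0))
lam_path = np.exp(f)                     # filtered intensities lambda_t
```

Here $\omega/(1 - B_1) = \log 4$, so the unconditional level of $f_t$ matches the intensity used to simulate the counts; the step is bounded below by $-1$, which keeps the log-intensity from collapsing after runs of zero counts.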

2.3 Maximum likelihood estimation

A convenient property of observation driven models is the relatively simple way of estimating

parameters by maximum likelihood (ML). This feature applies to the GAS model as well. For

an observed time series $y_1, \ldots, y_n$ and by adopting the standard prediction error decomposition, we can express the maximization problem as

$$\hat\theta = \arg\max_{\theta} \sum_{t=1}^{n} \ell_t, \qquad (15)$$

where $\ell_t = \ln p(y_t \mid f_t, \mathcal{F}_t; \theta)$ for a realization of $y_t$. Evaluating the log-likelihood function of the GAS model is particularly simple. It only requires the implementation of the GAS updating equation (2) and the evaluation of $\ell_t$ for a particular value $\theta^*$ of $\theta$.
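A minimal sketch of (15) for the Gaussian GAS(1,1) variance model of Example 1, under several assumptions not taken from the text: simulated data, $f_1$ set to the sample variance, and a reparameterization that keeps $\alpha_0 > 0$, $\alpha_1 > 0$, $\beta_1 > 0$ and $\alpha_1 + \beta_1 < 1$ so that $f_t$ stays positive. The GAS coefficients then follow from the mapping below (7): $\omega = \alpha_0$, $A_1 = \alpha_1$ and $B_1 = \alpha_1 + \beta_1$. It uses `scipy.optimize.minimize` with the Nelder-Mead method; standard errors are not computed here.

```python
import numpy as np
from scipy.optimize import minimize


def unpack(params):
    """Map unconstrained parameters to (omega, A1, B1) of the GAS(1,1) model (6)."""
    alpha0 = np.exp(params[0])
    alpha1 = 1.0 / (1.0 + np.exp(-params[1]))
    beta1 = (1.0 - alpha1) / (1.0 + np.exp(-params[2]))
    return alpha0, alpha1, alpha1 + beta1        # omega, A1, B1 = alpha1 + beta1


def neg_loglik(params, y):
    """Minus the sum of l_t in (15) for N(0, f_t) observations."""
    omega, A1, B1 = unpack(params)
    f = np.var(y)                                # f_1: sample variance (an assumption)
    ll = 0.0
    for yt in y:
        ll -= 0.5 * (np.log(2 * np.pi) + np.log(f) + yt**2 / f)   # l_t
        f = omega + A1 * (yt**2 - f) + B1 * f                     # GAS(1,1) update (6)
    return -ll


# Simulated GARCH(1,1) data with alpha0 = 0.05, alpha1 = 0.1, beta1 = 0.85
rng = np.random.default_rng(4)
n, f = 1000, 1.0
y = np.empty(n)
for t in range(n):
    y[t] = np.sqrt(f) * rng.standard_normal()
    f = 0.05 + 0.1 * y[t] ** 2 + 0.85 * f

x0 = np.array([np.log(0.1), 0.0, 0.0])
res = minimize(neg_loglik, x0, args=(y,), method="Nelder-Mead")
omega_hat, A1_hat, B1_hat = unpack(res.x)
print(omega_hat, A1_hat, B1_hat, -res.fun)
```

The likelihood evaluation is a single pass of the filter, which is the property the text highlights; the optimizer only needs this function value.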

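It is possible to formulate recursions for computing the gradient of the likelihood with respect to the static parameter vector $\theta$. Gradient recursions for the GARCH model have been developed by Fiorentini, Calzolari, and Panattoni (1996). In case of the GAS(1,1) specification, the gradient is computed via the chain rule, that is

$$\frac{\partial \ell_t}{\partial \theta'} = \frac{\partial \ln p_t}{\partial \theta'} + \frac{\partial \ln p_t}{\partial f_t'} \cdot \frac{\partial f_t}{\partial \theta'}, \qquad (16)$$

with $p_t = p(y_t \mid f_t, \mathcal{F}_t; \theta)$ and

$$\frac{\partial f_t}{\partial \theta'} = \frac{\partial \omega}{\partial \theta'} + A_1 \frac{\partial s_{t-1}}{\partial \theta'} + B_1 \frac{\partial f_{t-1}}{\partial \theta'} + (s_{t-1}' \otimes I) \frac{\partial \vec A_1}{\partial \theta'} + (f_{t-1}' \otimes I) \frac{\partial \vec B_1}{\partial \theta'}, \qquad (17)$$

$$\frac{\partial s_{t-1}}{\partial \theta'} = S_{t-1} \frac{\partial \nabla_{t-1}}{\partial \theta'} + (\nabla_{t-1}' \otimes I) \frac{\partial \vec S_{t-1}}{\partial \theta'}, \qquad (18)$$

where $\vec A = \mathrm{vec}(A)$ denotes the vector with the stacked columns of the matrix $A$, and $\otimes$ is the Kronecker matrix product. The derivations for $\partial \nabla_{t-1}/\partial \theta'$ and $\partial \vec S_{t-1}/\partial \theta'$ should also consider the effect of $\theta$ through $f_t$ as in (16). The log-likelihood derivatives can be computed simultaneously with the time-varying parameters $f_t$. The analytic derivatives, particularly for (18), may be cumbersome to compute in specific cases. In such cases, we turn to likelihood maximization based on numerical derivatives.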
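For the scalar Gaussian variance model of Example 1 with $\theta = (\omega, A_1, B_1)$, the recursions (16)-(18) simplify: $\theta$ enters $p_t$ only through $f_t$, and $s_t = y_t^2 - f_t$ gives $\partial s_{t-1}/\partial\theta' = -\partial f_{t-1}/\partial\theta'$. The sketch below assumes a fixed starting value $f_1$ that does not depend on $\theta$. It runs the analytic gradient alongside the filter and compares it with central finite differences of the log-likelihood. The function name and parameter values are illustrative.

```python
import numpy as np


def loglik_and_grad(theta, y, f1=1.0):
    """Log-likelihood and analytic gradient for the Gaussian GAS(1,1) variance model (6).

    theta = (omega, A1, B1); f_1 = f1 is fixed, so d f_1 / d theta' = 0.
    """
    omega, A1, B1 = theta
    f, df = f1, np.zeros(3)                     # f_t and d f_t / d theta'
    ll, grad = 0.0, np.zeros(3)
    for yt in y:
        nabla = (yt**2 - f) / (2.0 * f**2)      # d ln p_t / d f_t
        ll += -0.5 * (np.log(2 * np.pi) + np.log(f) + yt**2 / f)
        grad += nabla * df                      # (16); theta enters p_t only through f_t here
        s = yt**2 - f                           # s_t with S_t = I^{-1}
        ds = -df                                # d s_t / d theta': s_t depends on theta via f_t only
        df = np.array([1.0, s, f]) + A1 * ds + B1 * df   # (17) for scalar A1, B1
        f = omega + A1 * s + B1 * f             # GAS(1,1) update (6)
    return ll, grad


rng = np.random.default_rng(5)
y = 1.2 * rng.standard_normal(500)
theta = np.array([0.1, 0.1, 0.95])
ll, grad = loglik_and_grad(theta, y)

# Central finite differences of the log-likelihood for comparison
h = 1e-6
num_grad = np.array([
    (loglik_and_grad(theta + h * e, y)[0] - loglik_and_grad(theta - h * e, y)[0]) / (2 * h)
    for e in np.eye(3)
])
print(grad)
print(num_grad)
```

The analytic gradient costs one extra three-vector recursion per time step, whereas the finite-difference version requires two full likelihood passes per parameter.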

We propose to compute standard errors and t-values for the estimated parameters based on the inverse Hessian of the log-likelihood evaluated at the optimum. In particular, if $\theta$ gathers all static parameters of the model, we conjecture that under suitable regularity conditions such as those of White (1994) and Wooldridge (1994), the maximum likelihood estimator $\hat\theta$ of $\theta$ is consistent and satisfies

$$\sqrt{n}\,(\hat\theta - \theta) \xrightarrow{d} N(0, H^{-1}),$$

where $H = \lim_{n\to\infty} \mathrm{E}[(\partial\ell/\partial\theta)(\partial\ell/\partial\theta')]/n$ and $\ell = \sum_{t=1}^{n} \ell_t$. A formal proof of these results for the general class of GAS models is beyond the scope of the present paper. The results have been established for specific subclasses of GAS models. For example, Davis, Dunsmuir, and Streett (2005) prove consistency and asymptotic normality of the ML estimator for first-order Poisson count models. Straumann and Mikosch (2006) provide a set of conditions for consistency and asymptotic normality for the Gaussian GARCH model and for more general GARCH specifications. The main challenges for proving the result for the general class of GAS models lie in verifying the stochastic equicontinuity of the likelihood function and in establishing a contracting property for the non-linear stochastic recurrence equation (2). A contracting property is needed to prove the stationarity and ergodicity of the data generating process.

A nice feature of the model is that under the assumption of a correct model specification, the series $s_t$ forms a martingale difference series, $\mathrm{E}_{t-1}[s_t] = 0$. In particular, if we set the scaling matrix $S_t = \mathcal{J}_{t|t-1}$, $s_t$ is a martingale difference with unit variance. If we then express the updating equation for GAS(1,1) in its infinite order moving average form, we obtain

$$f_{t+1} = (I - B_1)^{-1}\omega + A_1 \sum_{i=0}^{\infty} B_1^{i} s_{t-i}.$$

Therefore, it is necessary for the covariance stationarity of $f_t$ that the eigenvalues of $B_1$ lie inside the unit circle. Such necessary conditions are helpful for establishing the limiting distribution results mentioned above. For other choices of $S_t$, the derivation of such properties is less straightforward.
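The moving average form can be checked numerically in the scalar case. The sketch below uses arbitrary illustrative values, with i.i.d. standard normal draws standing in for the unit-variance steps $s_t$. It starts the recursion $f_{t+1} = \omega + A_1 s_t + B_1 f_t$ at the unconditional mean $(1 - B_1)^{-1}\omega$, so that the sum over the available past steps reproduces the recursion exactly. With $|B_1| < 1$ the effect of any other starting value dies out geometrically.

```python
import numpy as np

rng = np.random.default_rng(6)
omega, A1, B1, n = 0.2, 0.3, 0.9, 400
s = rng.standard_normal(n)              # stand-in for unit-variance GAS steps s_1..s_n

f = np.empty(n + 1)
f[0] = omega / (1 - B1)                 # f_1 at the unconditional mean
for t in range(n):
    f[t + 1] = omega + A1 * s[t] + B1 * f[t]

# Moving average form at the last time point:
# f_{n+1} = omega/(1-B1) + A1 * sum_{i=0}^{n-1} B1^i s_{n-i}
powers = B1 ** np.arange(n)
f_ma = omega / (1 - B1) + A1 * np.dot(powers, s[n - 1::-1])
print(f[n], f_ma)
```

With unit-variance steps, the same representation gives the unconditional variance $A_1^2/(1 - B_1^2)$ of $f_t$ in the scalar case, which is finite only when $|B_1| < 1$.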


2.4 Parameterizations

The GAS specification adapts naturally to different parameterizations of the observation density (1). In the GARCH example of Section 2.2, for example, the time-varying parameter is $f_t = \sigma_t^2$. If it is preferred to enforce the positivity of $\sigma_t^2$, an obvious alternative is to parameterize the model in terms of $\tilde f_t = \log(\sigma_t^2)$. The GAS dynamics automatically adapt to the choice of the parameterization. In general, assume that one prefers a different parameterization $\tilde f_t = h(f_t)$ for some continuous and invertible mapping $h(\cdot)$. Let $\dot h_t = \partial h(f_t)/\partial f_t'$, which is deterministic given the information set $\mathcal{F}_t$. For well-behaved densities, the information matrix equals both the expected outer product of scores and minus the expected second derivative of the log density. Therefore,

$$\tilde{\mathcal{J}}_{t|t-1}' \tilde{\mathcal{J}}_{t|t-1} = \left( \mathrm{E}_{t-1}\left[ (\dot h_t^{-1})' \nabla_t \nabla_t' \dot h_t^{-1} \right] \right)^{-1} = \dot h_t \mathcal{I}_{t|t-1}^{-1} \dot h_t' = \dot h_t \mathcal{J}_{t|t-1}' \mathcal{J}_{t|t-1} \dot h_t', \qquad (19)$$

where tildes denote that derivatives are taken with respect to $\tilde f_t$ rather than $f_t$. Similarly, we have

$$\tilde\nabla_t = \frac{\partial \ln p(y_t \mid f_t, \mathcal{F}_t; \theta)}{\partial \tilde f_t} = (\dot h_t')^{-1} \nabla_t. \qquad (20)$$

The GAS updating step for $\tilde f_t$ with square root information scaling is then given by

$$\tilde s_t = \tilde{\mathcal{J}}_{t|t-1} \tilde\nabla_t = \tilde{\mathcal{J}}_{t|t-1} (\dot h_t')^{-1} \mathcal{J}_{t|t-1}^{-1} s_t, \qquad (21)$$

since $s_t = \mathcal{J}_{t|t-1} \nabla_t$. For the univariate case with an increasing $h(\cdot)$ and positive square roots, it is easy to see that $\tilde{\mathcal{J}}_{t|t-1} (\dot h_t')^{-1} \mathcal{J}_{t|t-1}^{-1} = 1$. For the multivariate case it follows that the updating step under the reparameterization is an orthogonal linear transformation of the original step since

$$\left( \tilde{\mathcal{J}}_{t|t-1} (\dot h_t')^{-1} \mathcal{J}_{t|t-1}^{-1} \right) \left( \tilde{\mathcal{J}}_{t|t-1} (\dot h_t')^{-1} \mathcal{J}_{t|t-1}^{-1} \right)' = \tilde{\mathcal{J}}_{t|t-1} (\dot h_t')^{-1} \mathcal{I}_{t|t-1} (\dot h_t)^{-1} \tilde{\mathcal{J}}_{t|t-1}' = I, \qquad (22)$$

where the last equality follows from (19). The choice of parameterization thus only has a minor effect on the form of the updating step $s_t$ if we adopt $\mathcal{J}_{t|t-1}$ as our scaling matrix. In particular, the new $\tilde s_t$ is also a unit variance martingale difference series. Other forms of scaling have different implications. For example, if we scale the score by the inverse information matrix $\mathcal{I}_{t|t-1}^{-1}$, it is easy to derive that the updating step $\tilde s_t$ for $\tilde f_t$ equals $\tilde s_t = \dot h_t s_t$.
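The sketch below checks (20), the univariate case of (21)-(22), and the final claim for the Gaussian variance example. It assumes $f_t = \sigma_t^2$ and $\tilde f_t = h(f_t) = \log f_t$, so that $\dot h_t = 1/f_t > 0$. Scores are computed by central finite differences; the information quantities are used in closed form ($1/(2f_t^2)$ for $f_t$ and $1/2$ for $\log f_t$). The values of $y_t$ and $f_t$ are arbitrary and the helper names are hypothetical.

```python
import numpy as np


def logpdf(y, f):
    """Log density of N(0, f)."""
    return -0.5 * (np.log(2 * np.pi) + np.log(f) + y**2 / f)


def fd(fun, x, h=1e-6):
    return (fun(x + h) - fun(x - h)) / (2 * h)


y, f = 1.7, 0.8
h_dot = 1.0 / f                                               # d log(f) / d f

grad_f = fd(lambda v: logpdf(y, v), f)                        # nabla_t
grad_logf = fd(lambda g: logpdf(y, np.exp(g)), np.log(f))     # tilde nabla_t
info_f, info_logf = 1.0 / (2 * f**2), 0.5                     # information in each parameterization

print(grad_logf, grad_f / h_dot)                              # (20)

# Inverse-information scaling: tilde s_t = h_dot * s_t
s_inv, s_inv_tilde = grad_f / info_f, grad_logf / info_logf
# Square-root information scaling: the step is unchanged in the univariate case
s_sqrt, s_sqrt_tilde = grad_f / np.sqrt(info_f), grad_logf / np.sqrt(info_logf)
print(s_inv_tilde, h_dot * s_inv)
print(s_sqrt_tilde, s_sqrt)
```

For this example, inverse-information scaling turns the step $y_t^2 - f_t$ into the relative step $y_t^2/f_t - 1$, while square-root scaling gives $(y_t^2/f_t - 1)/\sqrt{2}$ in both parameterizations.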


3 Dynamic copula models

In this section, we introduce several new dynamic copula models. Patton (2006) introduced

the notion of time-varying copulas, see also Dias and Embrechts (2004), van den Goorbergh,

Genest, and Werker (2005), Lee and Long (2009), and Patton (2009) for a review.

3.1 The dynamic Gaussian copula model

Copulas have recently become popular in financial risk management. A copula is a multivariate

distribution function over a hypercube with uniform marginals. It can be used to link marginal

distributions into a multivariate distribution using Sklar’s theorem in Sklar (1959). In this

section, we demonstrate that the GAS framework can provide a new model specification for

the bivariate Gaussian copula.

We consider a simple Gaussian copula where the GAS model suggests an alternative dynamic structure compared to earlier suggestions in the literature. The (Gaussian) correlation parameter $\rho_t$ is modeled through the transformation $\rho_t = [1 - \exp(-f_t)]/[1 + \exp(-f_t)]$ of the time-varying parameter $f_t$. In Patton (2006), the driving mechanism for the dynamic bivariate Gaussian copula is given by

$$f_{t+1} = \omega + A_1 \cdot \sum_{i=1}^{m} \Phi^{-1}(u_{1,t-i+1})\, \Phi^{-1}(u_{2,t-i+1}) + B_1 f_t, \qquad (23)$$

where $\Phi^{-1}(\cdot)$ is the inverse of the normal distribution function, $u_{1t}$ and $u_{2t}$ are the probability integral transforms using the univariate marginals, and $m$ is a positive integer determining the smoothness of $f_t$. Equation (23) is intuitively appealing and builds on our understanding of covariances: if the transformed marginals have the same sign, the correlation should increase. The reverse holds if the transformed marginals are of opposite sign.

By using the density of the Gaussian copula, we can derive the GAS specification for the time-varying correlation parameter. The score with respect to the correlation parameter is the same for the Gaussian copula and for the bivariate normal distribution. For $m = 1$, Patton's model (23) reduces to

$$f_{t+1} = \omega + A_1 \cdot y_t + B_1 \cdot f_t, \qquad (24)$$

where $y_t = \Phi^{-1}(u_{1t})\, \Phi^{-1}(u_{2t})$. The GAS(1,1) updating equation for $f_t$ is obtained as

$$f_{t+1} = \omega + A_1 \frac{2}{(1 - \rho_t^2)} \left[ y_t - \rho_t - \rho_t \frac{(x_t - 2)}{(1 + \rho_t^2)} \right] + B_1 f_t, \qquad (25)$$

where $x_t = \Phi^{-1}(u_{1t})^2 + \Phi^{-1}(u_{2t})^2$. The similarities and differences between (24) and (25) are as follows. Both models are driven by $y_t$ so that positively clustered transformed marginals lead to an increase of the correlation parameter. The additional scaling factor $2/(1 - \rho_t^2)$ in (25) is a consequence of modeling the transformed correlation parameter $f_t$ rather than $\rho_t$ directly.

The most interesting difference between the two model specifications is that the GAS model includes the term $x_t$, where $x_t - 2$ is a martingale difference. To understand the impact of this term, consider two possible scenarios: we might observe $\Phi^{-1}(u_{1t}) = 1$ and $\Phi^{-1}(u_{2t}) = 1$ or, alternatively, $\Phi^{-1}(u_{1t}) = 0.25$ and $\Phi^{-1}(u_{2t}) = 4$. In both cases, the cross-product term $y_t = 1$ is the same and the recursion in (24) will cause $f_{t+1}$ to be the same regardless of which of the two scenarios we observe. By contrast, the sum of squares term $x_t$ in the GAS model provides information to distinguish between these two cases ($x_t = 2$ in the first scenario and $x_t = 16.0625$ in the second). The behavior of $f_{t+1}$ will depend on the current value of the correlation $\rho_t$. If the correlation is positive, a value of $x_t$ above 2 has a negative impact on the GAS step. In this case, the $(x_t - 2)$ term offsets part of the effect of $(y_t - \rho_t)$ if the latter has a positive value. If $(y_t - \rho_t)$ has a negative value, however, the $(x_t - 2)$ term reinforces the magnitude of the GAS step. The reverse holds for negative $\rho_t$.
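The two scenarios can be reproduced with a short sketch. It assumes $\rho_t = 0.5$ (an illustrative value) and $A_1 = 1$, and evaluates only the innovation terms of (24) and (25), leaving out $\omega$ and $B_1 f_t$, which are common to both scenarios. It also checks the scaled score in (25) against a finite-difference score of the bivariate normal log density with respect to $f_t$. That check uses the information $(1 + \rho_t^2)/4$ for $f_t$, which follows from $\rho_t = \tanh(f_t/2)$ and the information $(1+\rho_t^2)/(1-\rho_t^2)^2$ for $\rho_t$; this derivation step is not spelled out in the text. The function names are hypothetical.

```python
import numpy as np


def rho_of_f(f):
    return (1 - np.exp(-f)) / (1 + np.exp(-f))       # equals tanh(f / 2)


def patton_innovation(z1, z2):
    return z1 * z2                                   # y_t in (24)


def gas_innovation(z1, z2, rho):
    y, x = z1 * z2, z1**2 + z2**2
    return 2.0 / (1 - rho**2) * (y - rho - rho * (x - 2) / (1 + rho**2))   # A_1 term in (25)


def bvn_logpdf(z1, z2, rho):
    """Standard bivariate normal log density with correlation rho."""
    q = (z1**2 - 2 * rho * z1 * z2 + z2**2) / (1 - rho**2)
    return -np.log(2 * np.pi) - 0.5 * np.log(1 - rho**2) - 0.5 * q


f = 2 * np.arctanh(0.5)                              # so that rho_t = 0.5
rho = rho_of_f(f)
for z1, z2 in [(1.0, 1.0), (0.25, 4.0)]:
    print(z1, z2, patton_innovation(z1, z2), gas_innovation(z1, z2, rho))

# Finite-difference check of (25): I_f^{-1} * d log p / d f with I_f = (1 + rho^2) / 4
h, z1, z2 = 1e-6, 0.25, 4.0
score_f = (bvn_logpdf(z1, z2, rho_of_f(f + h)) - bvn_logpdf(z1, z2, rho_of_f(f - h))) / (2 * h)
print(score_f * 4 / (1 + rho**2), gas_innovation(z1, z2, rho))
```

Both scenarios give the same Patton innovation $y_t = 1$. The GAS innovation is positive in the first scenario ($4/3$) but strongly negative in the second ($-41/3$), because the large $x_t - 2$ outweighs $y_t - \rho_t$ when $\rho_t = 0.5$.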

For illustrative purposes, we extend the example from Patton (2006) to investigate the dependence between the daily exchange rate of the German Mark (later the Euro) against the US dollar and the daily exchange rates of the Japanese Yen and the British Pound, also both against the US dollar. The sample period is January 1986 through August 2008. The log returns of the exchange rate series are analyzed by the AR-GARCH model: an autoregressive process for the conditional mean and a GARCH process for the conditional variance. We construct the probability integral transforms $u_{1t}$ and $u_{2t}$ from these marginal models and use them as inputs for the Gaussian copula model.

Table 1 reports that the log-likelihood value increases by roughly 27 points (Yen pair) and 128 points (Pound pair) when considering GAS instead of Patton for the same number of parameters. The estimates of the parameter $B_1$ imply that the GAS specification leads to a more persistently time-varying correlation process. However, the increased sensitivity of the score mechanism to correlation shocks in the GAS specification allows $f_t$ to react more fiercely to exchange rate returns of opposite sign if the


Table 1: Estimation results for different dynamic copula models

Parameter estimates for the GAS and Patton models in (24)–(25). The data are the marginal AR-GARCH transforms of log exchange rates for the German Mark-US dollar and Japanese Yen-US dollar (upper panel) and for the German Mark-US dollar and British Pound-US dollar (lower panel), January 1986–August 2008. The asymmetric confidence interval is in parentheses for B1, otherwise the standard error is in parentheses.

German Mark (Euro)–US $, Japanese Yen–US $
Model     10^3 ω          A1              ln(B1/(1 − B1))   B1                     log-lik
GAS       6.11 (2.48)     0.058 (0.009)   5.30 (0.37)       0.995 (0.990, 0.998)   1218.16
Patton    −1.60 (0.85)    0.036 (0.003)   4.27 (0.10)       0.986 (0.983, 0.989)   1191.51

German Mark (Euro)–US $, British Pound–US $
Model     10^3 ω          A1              ln(B1/(1 − B1))   B1                     log-lik
GAS       12.55 (3.55)    0.082 (0.008)   4.97 (0.26)       0.993 (0.988, 0.996)   2218.82
Patton    −0.97 (0.84)    0.025 (0.002)   4.71 (0.11)       0.991 (0.989, 0.993)   2090.42

[Figure 1. Left panel: German Mark (Euro) and Japanese Yen versus Dollar; right panel: German Mark (Euro) and British Pound versus Dollar. Each panel shows the GAS and Patton correlation estimates, 1986–2008.]

Figure 1: A copula illustration: comparisons of the correlation parameter estimates for the GAS and Patton models in (24)–(25). The data are the marginal AR-GARCH transforms of log exchange rates for the German Mark-US dollar and Japanese Yen-US dollar (left panel) and for the German Mark-US dollar and British Pound-US dollar (right panel). The sample period is January 1986–August 2008.