Generalized Autoregressive Score Models with Applications
ABSTRACT We propose a class of observation-driven time series models referred to as generalized autoregressive score (GAS) models. The mechanism to update the parameters over time is the scaled score of the likelihood function. This new approach provides a unified and consistent framework for introducing time-varying parameters in a wide class of nonlinear models. The GAS model encompasses other well-known models such as the generalized autoregressive conditional heteroskedasticity, autoregressive conditional duration, autoregressive conditional intensity, and Poisson count models with time-varying mean. In addition, our approach can lead to new formulations of observation-driven models. We illustrate our framework by introducing new model specifications for time-varying copula functions and for multivariate point processes with time-varying parameters. We study the models in detail and provide simulation and empirical evidence. Copyright © 2012 John Wiley & Sons, Ltd.

Generalized Autoregressive Score
Models with Applications∗
Drew Creal (a), Siem Jan Koopman (b,d), André Lucas (c,d)
(a)University of Chicago, Booth School of Business
(b)Department of Econometrics, VU University Amsterdam
(c)Department of Finance, VU University Amsterdam, and Duisenberg school of finance
(d)Tinbergen Institute, Amsterdam
August 9, 2011
Abstract
We propose a class of observation driven time series models referred to as Generalized
Autoregressive Score (GAS) models. The mechanism to update the parameters over time
is the scaled score of the likelihood function. This new approach provides a unified and
consistent framework for introducing time-varying parameters in a wide class of nonlinear
models. The GAS model encompasses other well-known models such as the generalized
autoregressive conditional heteroskedasticity, the autoregressive conditional duration, the
autoregressive conditional intensity, and Poisson count models with time-varying mean.
In addition, our approach can lead to new formulations of observation driven models. We
illustrate our framework by introducing new model specifications for time-varying copula
functions and for multivariate point processes with time-varying parameters. We study
the models in detail and provide simulation and empirical evidence.
Keywords : Copula functions, Dynamic models, Marked point processes, Time-varying
parameters.
JEL classification codes : C10, C22, C32, C51.
∗We would like to thank Karim Abadir, Konrad Banachewicz, Charles Bos, Jianqing Fan, Clive Granger,
Andrew Harvey, Marius Ooms, Neil Shephard and Michel van der Wel for their comments on an earlier draft
of the paper. We have benefited from the comments of participants at the conference “High-Frequency Data
Analysis in Financial Markets” of Hitotsubashi University, Tokyo, at a meeting of Society for Financial Econometrics
(SoFiE) in Geneva, and at seminar presentations at the University of Alicante, University of Chicago,
NAKE 2008 Utrecht research day, Center for Operations Research and Econometrics at Université Catholique
de Louvain, Imperial College London, Oxford-Man Institute, Princeton University, Erasmus University Rotterdam,
Tinbergen Institute and VU University Amsterdam. We thank Moody’s for providing the credit rating
transition data for one of our applications. Correspondence: Drew D. Creal, University of Chicago Booth School
of Business, 5807 South Woodlawn Avenue, Chicago, IL 60637. Email: Drew.Creal@chicagobooth.edu
1 Introduction
In many settings of empirical interest, time variation in a selection of model parameters
is important for capturing the dynamic behavior of univariate and multivariate time series
processes. Time series models with time-varying parameters have been categorized by Cox
(1981) into two classes of models: observation driven models and parameter driven models.
In the observation driven approach, time variation of the parameters is introduced by letting
parameters be functions of lagged dependent variables as well as contemporaneous and lagged
exogenous variables. Although the parameters are stochastic, they are perfectly predictable
given the past information. This approach simplifies likelihood evaluation and explains why
observation driven models have become popular in the applied statistics and econometrics
literature. Typical examples of these models are the generalized autoregressive conditional
heteroskedasticity (GARCH) models of Engle (1982), Bollerslev (1986) and Engle and Bollerslev
(1986), the autoregressive conditional duration and intensity (ACD and ACI, respectively)
models of Engle and Russell (1998) and Russell (2001), the dynamic conditional correlation
(DCC) model of Engle (2002a), the Poisson count models discussed by Davis, Dunsmuir, and
Streett (2003), the dynamic copula models of Patton (2006), and the time-varying quantile
model of Engle and Manganelli (2004). In our modeling framework for time-varying parameters,
many of the existing observation driven models are encompassed as mentioned above. In
addition, new models can be formulated and investigated.
In parameter driven models, the parameters are stochastic processes with their own source
of error. Given past and concurrent observations, the parameters are not perfectly predictable.
Typical examples of parameter driven models are the stochastic volatility (SV) model, see
Shephard (2005) for a detailed discussion, and the stochastic intensity models of Bauwens
and Hautsch (2006) and Koopman, Lucas, and Monteiro (2008). Estimation is usually more
involved for these models because the associated likelihood functions are not available in closed
form. Exceptions include linear Gaussian state space models and discrete-state hidden Markov
models, see Harvey (1989) and Hamilton (1989), respectively. In most other cases, computing
the likelihood function requires the evaluation of a high-dimensional integral based on simulation
methods such as importance sampling and Markov chain Monte Carlo; for example, see
Shephard and Pitt (1997).
The main contribution of this paper is the development of a framework for time-varying
parameters which is based on the score function of the predictive model density at time t. We
will argue that the score function is an effective choice for introducing a driving mechanism for
time-varying parameters. In particular, by scaling the score function appropriately, standard
observation driven models such as the GARCH, ACD, and ACI models can be recovered.
Application of this framework to other nonlinear, non-Gaussian, possibly multivariate, models
will lead to the formulation of new observation driven models.
We refer to our observation driven model based on the score function as the generalized
autoregressive score (GAS) model. The GAS model has the advantages of other observation driven
models. Likelihood evaluation is straightforward. Extensions to asymmetric, long memory, and
other more complicated dynamics can be considered without introducing further complexities.
Since the GAS model is based on the score, it exploits the complete density structure rather
than means and higher moments only. This differentiates the GAS model from other observation
driven models in the literature, such as the generalized autoregressive moving average models
of Shephard (1995) and ?) and the vector multiplicative error models of Cipollini, Engle, and
Gallo (2006).
In our first illustration, we develop new models for time-varying copulas. The copula
function provides an important tool for the econometrics of financial risk measurement. Patton
(2006) introduced the notion of time-varying copulas and provided the main properties of
dynamic copula functions. Other models for time-varying copulas include Giacomini, Härdle,
and Spokoiny (2007) who developed locally constant copula models, and the stochastic copula
model of Hafner and Manner (2011). Another interesting copula-based model is developed by
Lee and Long (2009) where the multivariate GARCH model is extended with copula functions
to capture any remaining dependence in the volatility of the time series. An extended review
of the recent developments of copula functions in time series models is given by Patton (2009).
In our second illustration, we create a new class of multivariate point-process models for
credit risk. Models for counterparty default and rating transition risk are an important element
in the current regulatory system for financial institutions. Many of the new models are based
on marked point-processes with time-varying intensities for different levels of risk. Parameter
estimation relies on computationally demanding methods, see for example, Duffie, Eckner,
Horel, and Saita (2009). One of the main challenges when modeling credit events is the
sparse number of transitions for each individual company. We show how a multi-state model for
pooled marked point-processes follows naturally within our framework. We analyze an extensive
data set of Moody’s rating histories of more than 8,000 U.S. corporates over a time span of
almost thirty years. We compare the results of the GAS model with those of its parameter
driven counterpart. The parameters in the benchmark model need to be estimated using a
Markov chain Monte Carlo method which is computationally more demanding compared to
our maximum likelihood procedure. Despite the substantial differences in computing time, the
GAS model produces almost identical estimates of time varying defaults and rating transition
probabilities when compared with those of the parameter driven model.
The remainder of the paper is organized as follows. In Section 2 we provide the basic GAS
specification together with a set of motivating examples. Section 3 describes several new copula
models with time-varying parameters. Section 4 presents the model for marked point-processes
with time-varying parameters. Section 5 concludes.
2 Model specification and properties
In this section we formulate a general class of observation driven time-varying parameter models.
The basic specification is introduced and a set of examples is provided for illustrative purposes.
We also discuss maximum likelihood estimation and model specification.
2.1 Basic model specification
Let the N × 1 vector $y_t$ denote the dependent variable of interest, $f_t$ the time-varying parameter
vector, $x_t$ a vector of exogenous variables (covariates), all at time t, and $\theta$ a vector of static
parameters. Define $Y^t = \{y_1,\ldots,y_t\}$, $F^t = \{f_0,f_1,\ldots,f_t\}$, and $X^t = \{x_1,\ldots,x_t\}$. The
available information set at time t consists of $\{f_t, \mathcal{F}_t\}$ where
$$\mathcal{F}_t = \{Y^{t-1}, F^{t-1}, X^t\}, \qquad \text{for } t = 1,\ldots,n.$$
We assume that $y_t$ is generated by the observation density
$$y_t \sim p(y_t \mid f_t, \mathcal{F}_t; \theta). \tag{1}$$
Furthermore, we assume that the mechanism for updating the time-varying parameter $f_t$ is
given by the familiar autoregressive updating equation
$$f_{t+1} = \omega + \sum_{i=1}^{p} A_i s_{t-i+1} + \sum_{j=1}^{q} B_j f_{t-j+1}, \tag{2}$$
where $\omega$ is a vector of constants, coefficient matrices $A_i$ and $B_j$ have appropriate dimensions
for i = 1,...,p and j = 1,...,q, while $s_t$ is an appropriate function of past data, $s_t = s_t(y_t, f_t, \mathcal{F}_t; \theta)$.
The unknown coefficients in (2) are functions of $\theta$, that is $\omega = \omega(\theta)$, $A_i = A_i(\theta)$,
and $B_j = B_j(\theta)$ for i = 1,...,p and j = 1,...,q. The main contribution of this paper is the
particular choice for the driving mechanism $s_t$ that is applicable over a wide class of observation
densities and nonlinear models.
Our approach is based on the observation density (1) for a given parameter $f_t$. When an
observation $y_t$ is realized, we update the time-varying $f_t$ to the next period t+1 using (2) with
$$s_t = S_t \cdot \nabla_t, \qquad \nabla_t = \frac{\partial \ln p(y_t \mid f_t, \mathcal{F}_t; \theta)}{\partial f_t}, \qquad S_t = S(t, f_t, \mathcal{F}_t; \theta), \tag{3}$$
where S(·) is a matrix function. Given the dependence of the driving mechanism in (2) on the
scaled score vector (3), we let the equations (1) – (3) define the generalized autoregressive score
model with orders p and q. We may abbreviate the resulting model as GAS(p,q).
The use of the score for updating $f_t$ is intuitive. It defines a steepest ascent direction for
improving the model’s local fit in terms of the likelihood or density at time t given the current
position of the parameter $f_t$. This provides the natural direction for updating the parameter.
In addition, the score depends on the complete density, and not only on the first or second
order moments of the observations $y_t$. This distinguishes the GAS framework from most of the
other observation driven approaches in the literature. By exploiting the full density structure,
the GAS model introduces new transformations of the data that can be used to update the
time-varying parameter $f_t$.
Via its choice of the scaling matrix $S_t$, the GAS model allows for additional flexibility in
how the score is used for updating $f_t$. It is important to note that each different choice for the
scaling matrix $S_t$ results in a different GAS model. The statistical and empirical properties of
each of these models can differ and warrant separate inspection.
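The recursion (2) with driver (3) can be sketched for the scalar GAS(1,1) case as follows. This is only an illustration under stated assumptions: the names `gas_filter` and `gaussian_mean_step` are hypothetical, and the example density (a Gaussian with time-varying mean and known variance) is chosen for simplicity and is not one of the applications in this paper.

```python
def gas_filter(y, omega, A1, B1, f0, scaled_score):
    """Scalar GAS(1,1) recursion f[t+1] = omega + A1 * s[t] + B1 * f[t].

    scaled_score(y_t, f_t) returns s_t = S_t * grad_t for the observation
    density and scaling chosen by the caller (hypothetical interface).
    """
    f = [f0]
    for y_t in y:
        s_t = scaled_score(y_t, f[-1])
        f.append(omega + A1 * s_t + B1 * f[-1])
    return f  # f[t] depends only on observations before position t


def gaussian_mean_step(y_t, f_t, sigma2=4.0):
    # Assumed example: y_t ~ N(f_t, sigma2) with known sigma2.
    score = (y_t - f_t) / sigma2      # d log p / d f_t
    information = 1.0 / sigma2        # variance of the score
    return score / information        # inverse-information scaling gives y_t - f_t


path = gas_filter([1.0, 3.0], omega=0.0, A1=0.5, B1=1.0, f0=0.0,
                  scaled_score=gaussian_mean_step)
```

With this assumed density the scaled score is the prediction error, so the filter reduces to an exponentially weighted moving average; other densities only change the function passed as `scaled_score`.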
In many situations, it is natural to consider a form of scaling that depends on the variance
of the score. For example, we can define the scaling matrix as
$$S_t = \mathcal{I}_{t|t-1}^{-1}, \qquad \mathcal{I}_{t|t-1} = E_{t-1}[\nabla_t \nabla_t'], \tag{4}$$
where $E_{t-1}$ denotes an expectation with respect to $p(y_t \mid f_t, \mathcal{F}_t; \theta)$. For this choice of $S_t$, the
GAS model encompasses the well-known observation driven GARCH model of Engle (1982)
and Bollerslev (1986), the ACD model of Engle and Russell (1998), and the ACI model of
Russell (2001) as well as most of the Poisson count models considered by Davis et al. (2003).
Another possibility that we consider in this paper is the GAS model with scaling matrix
$$S_t = \mathcal{J}_{t|t-1}, \qquad \mathcal{J}_{t|t-1}' \mathcal{J}_{t|t-1} = \mathcal{I}_{t|t-1}^{-1}, \tag{5}$$
where $S_t$ is defined as the square root matrix of the (pseudo)inverse information matrix for (1)
with respect to $f_t$. An advantage of this specific choice for $S_t$ is that the statistical properties
of the corresponding GAS model become more tractable. This follows from the fact that for
$S_t = \mathcal{J}_{t|t-1}$ the GAS step $s_t$ has constant unit variance.
Another convenient choice is $S_t = I$. The GAS model then captures models such as the
autoregressive conditional multinomial (ACM) model of Russell and Engle (2005) or the GARMA
models of ?). In the context of a fully generic observation density $p(y_t \mid f_t, \mathcal{F}_t; \theta)$, however, the
statistical properties of the GAS model for these alternative choices of $S_t$ are typically much
more complicated.
We can further generalize the GAS updating equation (2) in various directions. For example,
it may be interesting to include exogenous variables in (2), or to generalize the evolution of
$f_t$ by including other nonlinear effects such as regime-switching. In addition, it may be more
appropriate in some applications to consider long-memory versions of (2), for example
$$f_{t+1} = \omega + \sum_{i=1}^{\infty} \frac{(i+d-1)!}{i!\,(d-1)!}\, s_{t-i+1},$$
for a scalar $f_t$ and a fractional integration parameter d < 1/2. We obtain the fractionally
integrated GAS model specification in the same vein as the well-known ARFIMA and FIGARCH
models, see the contributions of Hosking (1981) and Baillie, Bollerslev, and Mikkelsen (1996),
respectively.
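The weights in the long-memory recursion can be computed without evaluating factorials directly. The sketch below assumes the usual reading of the factorials of non-integer arguments as gamma functions, so that the weight on $s_{t-i+1}$ is Γ(i+d)/(Γ(i+1)Γ(d)), and it restricts attention to 0 < d < 1/2; the function name `frac_weights` is hypothetical.

```python
import math


def frac_weights(d, n_terms):
    """Weights w_i = (i+d-1)! / (i! (d-1)!) for i = 1, ..., n_terms.

    Uses the recursion w_i = w_{i-1} * (i - 1 + d) / i with w_0 = 1,
    which follows from Gamma(i + d) = (i - 1 + d) * Gamma(i - 1 + d).
    """
    weights, w = [], 1.0
    for i in range(1, n_terms + 1):
        w *= (i - 1 + d) / i
        weights.append(w)
    return weights


w = frac_weights(0.3, 50)
# Cross-check of the fifth weight against the gamma-function form.
w5_direct = math.gamma(5 + 0.3) / (math.gamma(6) * math.gamma(0.3))
```

In practice the infinite sum has to be truncated at some lag; the slow decay of these weights is what distinguishes the fractionally integrated specification from the geometric decay implied by (2).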
2.2 Special cases of GAS models
In this section we provide a number of simple examples that show how to operationalize the GAS
framework. The examples also reveal that the GAS framework encompasses a large number of
available observation driven models presented in the literature for an appropriate choice of the
scaling matrix St.
Example 1 : GARCH models Consider the basic model $y_t = \sigma_t \varepsilon_t$ where the Gaussian
disturbance $\varepsilon_t$ has zero mean and unit variance while $\sigma_t$ is a time-varying standard deviation.
It is a basic exercise to show that the GAS(1,1) model with $S_t = \mathcal{I}_{t|t-1}^{-1}$ and $f_t = \sigma_t^2$ reduces to
$$f_{t+1} = \omega + A_1\left(y_t^2 - f_t\right) + B_1 f_t, \tag{6}$$
which is equivalent to the standard GARCH(1,1) model as given by
$$f_{t+1} = \alpha_0 + \alpha_1 y_t^2 + \beta_1 f_t, \qquad f_t = \sigma_t^2, \tag{7}$$
where coefficients $\alpha_0 = \omega$, $\alpha_1 = A_1$ and $\beta_1 = B_1 - A_1$ are unknown and require certain conditions
for stationarity, see Bollerslev (1986). However, if we assume that $\varepsilon_t$ follows a Student’s t
distribution with $\nu$ degrees of freedom and unit variance, the GAS(1,1) specification for the
conditional variance leads to the updating equation
$$f_{t+1} = \omega + A_1 \cdot \left(1 + 3\nu^{-1}\right) \cdot \left( \frac{(1+\nu^{-1})}{(1-2\nu^{-1})\left(1 + \nu^{-1} y_t^2 / \left((1-2\nu^{-1})\, f_t\right)\right)}\, y_t^2 - f_t \right) + B_1 f_t. \tag{8}$$
In case $\nu^{-1} = 0$, the Student’s t distribution reduces to the Gaussian distribution and update
(8) collapses to (6) as required. The recursion in (8), however, has an important difference
with the standard t-GARCH(1,1) model of Bollerslev (1987) which has the Student’s t density
in (1) with the updating equation (6). The denominator of the second term in the right
hand side of (8) causes a more moderate increase in the variance for a large realization of
$y_t$ as long as $\nu$ is finite. The intuition is clear: if the errors are modeled by a fat-tailed
distribution, a large absolute realization of yt does not necessitate a substantial increase in
the variance. The GAS updating mechanism for the model with Student’s t errors therefore is
substantially different from its familiar GARCH counterpart. In independent work, a
variance updating equation similar to (8) for the univariate Student’s t distribution is proposed by
Harvey and Chakravarty (2008); they also discuss the properties of the model in more detail.
Recently, Creal, Koopman, and Lucas (2011) have extended this model to the fully multivariate
case with further generalizations and compared it to the popular DCC model of Engle (2002a).
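The relations in Example 1 can be checked numerically. The sketch below uses hypothetical function names and arbitrary illustrative parameter values; it runs recursions (6) and (7) side by side with $\alpha_1 = A_1$ and $\beta_1 = B_1 - A_1$, and evaluates the driving term of (8) to show how a large observation is down-weighted when $\nu$ is finite (the formula requires $\nu > 2$).

```python
def gas_gaussian_path(y, omega, A1, B1, f0):
    # Equation (6): f[t+1] = omega + A1 * (y_t^2 - f_t) + B1 * f_t
    f = [f0]
    for y_t in y:
        f.append(omega + A1 * (y_t ** 2 - f[-1]) + B1 * f[-1])
    return f


def garch_path(y, alpha0, alpha1, beta1, f0):
    # Equation (7): f[t+1] = alpha0 + alpha1 * y_t^2 + beta1 * f_t
    f = [f0]
    for y_t in y:
        f.append(alpha0 + alpha1 * y_t ** 2 + beta1 * f[-1])
    return f


def student_t_driver(y_t, f_t, nu):
    # Term multiplying A1 in equation (8); valid for nu > 2.
    inv = 1.0 / nu
    weight = (1 + inv) / ((1 - 2 * inv) * (1 + inv * y_t ** 2 / ((1 - 2 * inv) * f_t)))
    return (1 + 3 * inv) * (weight * y_t ** 2 - f_t)


y = [0.3, -2.5, 1.1, 0.0, 4.0]          # illustrative data, not from the paper
omega, A1, B1 = 0.05, 0.10, 0.95        # illustrative values
gas = gas_gaussian_path(y, omega, A1, B1, f0=1.0)
garch = garch_path(y, omega, A1, B1 - A1, f0=1.0)

gaussian_shock = 5.0 ** 2 - 1.0          # driver of (6) at y_t = 5, f_t = 1
t_shock = student_t_driver(5.0, 1.0, nu=5.0)
```

Here `gas` and `garch` coincide up to rounding, while `t_shock` is much smaller than `gaussian_shock`, which is the moderating effect of the denominator in (8) described above.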
The GAS framework also provides a range of alternative time-varying variance equations
for other heavy-tailed distributions. For example, consider the asymmetric Laplace distribution
obtained by $y_t = w_t \cdot \tilde{y}^L_t + (1 - w_t) \cdot \tilde{y}^R_t$, where $w_t$ is a Bernoulli random variable with
$\Pr[w_t = 0] = (1 + \vartheta^2)^{-1}$ for coefficient $\vartheta > 0$ and where $-\tilde{y}^L_t$ and $\tilde{y}^R_t$ are exponentially distributed random
variables with means $\vartheta\sigma/2^{1/2}$ and $\sigma/(2^{1/2}\vartheta)$, respectively. The random variables $w_t$, $\tilde{y}^L_t$ and $\tilde{y}^R_t$
are assumed to be independent. The mean and variance of $y_t$ are 0 and $\sigma^2$, respectively. If we
let $f_t = \log(\sigma_t^2)$, the GAS step takes the form
$$s_t = 2\left(\frac{2^{1/2}(-y_t)}{\vartheta\sigma} - 1\right) \cdot 1_{\{y_t \mid y_t \le 0\}}(y_t) + 2\left(\frac{2^{1/2}\vartheta y_t}{\sigma} - 1\right) \cdot 1_{\{y_t \mid y_t > 0\}}(y_t), \tag{9}$$
where $1_A(x)$ is the indicator function for the set A, that is $1_A(x) = 1$ if $x \in A$, and zero
otherwise. The GAS driving mechanism (9) is composed of linear segments with unequal
absolute slopes. We can rewrite this as
$$s_t = \tilde{\vartheta}_1 \frac{2^{1/2} y_t}{\sigma} + \tilde{\vartheta}_2 \left(\frac{2^{1/2} |y_t|}{\sigma} - 2\tilde{\vartheta}_2^{-1}\right), \tag{10}$$
where $\tilde{\vartheta}_1 = (\vartheta^2 - 1)/\vartheta$ and $\tilde{\vartheta}_2 = (\vartheta^2 + 1)/\vartheta$. Specification (10) is equivalent to the driving
mechanism of the EGARCH model of Nelson (1991), who used the generalized error distribution
(GED) instead of the asymmetric Laplace described here.
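Example 2 : MEM, ACD and ACI models Consider the model $y_t = \mu_t \varepsilon_t$ where $\varepsilon_t$ has
a gamma distribution with density $p(\varepsilon_t; \alpha) = \Gamma(\alpha)^{-1} \varepsilon_t^{\alpha-1} \alpha^{\alpha} \exp(-\alpha\varepsilon_t)$ and coefficient $\alpha$, so that
$\varepsilon_t$ has unit mean and $\mu_t$ is the mean of $y_t$. Using a change of variable, we obtain the model density
$$p(y_t \mid \mu_t; \alpha) = \Gamma(\alpha)^{-1} y_t^{\alpha-1} \alpha^{\alpha} \mu_t^{-\alpha} \exp\left(-\alpha \frac{y_t}{\mu_t}\right). \tag{11}$$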
In case we set $f_t = \mu_t$, the GAS(1,1) updating equation with $S_t = \mathcal{I}_{t|t-1}^{-1}$ becomes
$$f_{t+1} = \omega + A_1 (y_t - f_t) + B_1 f_t. \tag{12}$$
This specification is equivalent to the multiplicative error model (MEM) proposed by Engle
(2002b) and extended in Engle and Gallo (2006). The exponential distribution is a special case
of the gamma distribution when $\alpha = 1$. Hence, ACD and ACI models are special cases of the
MEM class. The ACD model of Engle and Russell (1998) follows directly from (11) for $\alpha = 1$
and factor recursion (12). In case we specify the exponential density in terms of its intensity
rather than its expected duration, we obtain $p(y_t \mid \lambda_t) = \lambda_t \exp(-\lambda_t y_t)$ with intensity $\lambda_t = 1/\mu_t$.
Let $\tilde{f}_t = \log(\lambda_t)$, the GAS(1,1) updating equation becomes
$$\tilde{f}_{t+1} = \omega + A_1 \left[1 - y_t \exp(\tilde{f}_t)\right] + B_1 \tilde{f}_t, \tag{13}$$
which is equivalent to the standard ACI(1,1) model of Russell (2001).
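As a rough numerical check of Example 2, the sketch below differentiates the log of density (11) with respect to $\mu_t$ by central differences and scales the result by the inverse information, which for this gamma density is assumed here to be $\mu_t^2/\alpha$ (the variance of the score works out to $\alpha/\mu_t^2$). The function names are hypothetical.

```python
import math


def gamma_logpdf(y, mu, alpha):
    # Logarithm of the model density (11).
    return (-math.lgamma(alpha) + (alpha - 1) * math.log(y)
            + alpha * math.log(alpha) - alpha * math.log(mu) - alpha * y / mu)


def mem_step(y, mu, alpha, eps=1e-6):
    # Numerical score with respect to mu, scaled by the inverse information.
    score = (gamma_logpdf(y, mu + eps, alpha)
             - gamma_logpdf(y, mu - eps, alpha)) / (2 * eps)
    information = alpha / mu ** 2
    return score / information       # should be close to y - mu, as in (12)


def aci_step(y, f_tilde):
    # Driver of (13): score of log(lambda) - lambda * y with respect to
    # f_tilde = log(lambda); the information equals one in this case.
    return 1.0 - y * math.exp(f_tilde)


step = mem_step(2.5, 1.5, alpha=3.0)
```

The value of `step` agrees with $y_t - \mu_t$ up to the finite-difference error, and `aci_step` is zero exactly when the observed duration equals its expected value $1/\lambda_t$.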
Example 3 : Dynamic exponential family models The class of natural exponential family
models for a vector of observations $y_t$ can be represented by the density function
$$p(y_t \mid f_t, \mathcal{F}_t; \theta) = \exp\left[\gamma' y_t - c(\gamma) + h(y_t)\right], \tag{14}$$
with scalar functions c(·) and h(·) and m × 1 parameter vector $\gamma$. We consider replacing $\gamma$ by
a time-varying parameter vector $\gamma_t$ that is specified as
$$\gamma_t = d + Z f_t,$$
with m × 1 constant vector d and m × r factor loading matrix Z. The unknown coefficients
in d and Z are placed in parameter vector $\theta$. Further, we impose a GAS specification on the
time-varying factor $f_t$. The GAS driving mechanism with $S_t = \mathcal{I}_{t|t-1}^{-1}$ is given by
$$s_t = \left[Z' \ddot{c}(\gamma_t) Z\right]^{-1} Z' \left[y_t - \dot{c}(\gamma_t)\right],$$
where $\dot{c}(\gamma_t) = \partial c(\gamma_t)/\partial\gamma_t$ and $\ddot{c}(\gamma_t) = \partial^2 c(\gamma_t)/\partial\gamma_t \partial\gamma_t'$. This model directly encompasses
some well-known models from the literature if we change the scaling choice. For example, for
a Poisson density in (14) and $S_t = \mathcal{I}_{t|t-1}^{-1}$ we recover the observation driven model for Poisson
counts of Davis et al. (2003).
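For a scalar observation and a scalar factor, the driving mechanism of Example 3 can be sketched as below. The function name `expfam_step` is hypothetical, and the Poisson case is obtained under the standard assumption that the natural parameter is the log intensity, so that $c(\gamma) = \exp(\gamma)$ and both derivatives of c equal the intensity.

```python
import math


def expfam_step(y, f, d, z, c_dot, c_ddot):
    """Scalar version of s_t = [Z' c''(gamma) Z]^{-1} Z' [y - c'(gamma)]."""
    gamma = d + z * f                  # gamma_t = d + Z f_t
    return z * (y - c_dot(gamma)) / (z * c_ddot(gamma) * z)


# Poisson counts: c(gamma) = exp(gamma), so c' = c'' = exp.
# With d = log(2), z = 1 and f = 0 the intensity is 2.
poisson_step = expfam_step(3, 0.0, math.log(2.0), 1.0, math.exp, math.exp)
```

The step is the prediction error of the count divided by the current intensity; passing different `c_dot` and `c_ddot` functions gives the corresponding step for other members of the exponential family.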
2.3 Maximum likelihood estimation
A convenient property of observation driven models is the relatively simple way of estimating
parameters by maximum likelihood (ML). This feature applies to the GAS model as well. For
an observed time series $y_1,\ldots,y_n$ and by adopting the standard prediction error decomposition,
we can express the maximization problem as
$$\hat{\theta} = \arg\max_{\theta} \sum_{t=1}^{n} \ell_t, \tag{15}$$
where $\ell_t = \ln p(y_t \mid f_t, \mathcal{F}_t; \theta)$ for a realization of $y_t$. Evaluating the log-likelihood function of the
GAS model is particularly simple. It only requires the implementation of the GAS updating
equation (2) and the evaluation of $\ell_t$ for a particular value $\theta^*$ of $\theta$.
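The evaluation of (15) can be sketched for the Gaussian variance model of Example 1. Everything below is illustrative: the function name `gas_loglik`, the simulated data, the starting value `f0` and the coarse parameter grid are assumptions, and the grid search merely stands in for a numerical optimizer.

```python
import math
import random


def gas_loglik(theta, y, f0):
    """Sum of l_t for y_t ~ N(0, f_t) with f_t updated by recursion (6)."""
    omega, A1, B1 = theta
    f, total = f0, 0.0
    for y_t in y:
        if f <= 0.0:
            return -math.inf           # variance left the admissible region
        total += -0.5 * (math.log(2 * math.pi) + math.log(f) + y_t ** 2 / f)
        f = omega + A1 * (y_t ** 2 - f) + B1 * f
    return total


random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(500)]   # simulated, not real data

grid = [(w, a, b)
        for w in (0.02, 0.05, 0.10)
        for a in (0.02, 0.05, 0.10)
        for b in (0.90, 0.95)]
theta_hat = max(grid, key=lambda th: gas_loglik(th, y, f0=1.0))
```

Each likelihood evaluation is a single pass through the data, which is the computational advantage of observation driven models noted above.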
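It is possible to formulate recursions for computing the gradient of the likelihood with
respect to the static parameter vector $\theta$. Gradient recursions for the GARCH model have been
developed by Fiorentini, Calzolari, and Panattoni (1996). In case of the GAS(1,1) specification,
the gradient is computed via the chain rule, that is
$$\frac{\partial \ell_t}{\partial\theta'} = \frac{\partial \ln p_t}{\partial\theta'} + \frac{\partial \ln p_t}{\partial f_t'} \cdot \frac{\partial f_t}{\partial\theta'}, \tag{16}$$
with $p_t = p(y_t \mid f_t, \mathcal{F}_t; \theta)$ and
$$\frac{\partial f_t}{\partial\theta'} = \frac{\partial\omega}{\partial\theta'} + A_1 \frac{\partial s_{t-1}}{\partial\theta'} + B_1 \frac{\partial f_{t-1}}{\partial\theta'} + (s_{t-1}' \otimes I) \frac{\partial \vec{A}_1}{\partial\theta'} + (f_{t-1}' \otimes I) \frac{\partial \vec{B}_1}{\partial\theta'}, \tag{17}$$
$$\frac{\partial s_{t-1}}{\partial\theta'} = S_{t-1} \frac{\partial \nabla_{t-1}}{\partial\theta'} + (\nabla_{t-1}' \otimes I) \frac{\partial \vec{S}_{t-1}}{\partial\theta'}, \tag{18}$$
where $\vec{A} = \mathrm{vec}(A)$ denotes the vector with the stacked columns of the matrix A, and ⊗ is
the Kronecker matrix product. The derivations for $\partial\nabla_{t-1}/\partial\theta'$ and $\partial\vec{S}_{t-1}/\partial\theta'$ should also
consider the effect of $\theta$ through $f_t$ as in (16). The log-likelihood derivatives can be computed
simultaneously with the time-varying parameters $f_t$. The analytic derivatives, particularly for
(18), may be cumbersome to compute in specific cases. We then turn to likelihood maximization
based on numerical derivatives.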
We propose to compute standard errors and t-values for the estimated parameters based on
the inverse Hessian of the log-likelihood evaluated at the optimum. In particular, if $\theta$ gathers
all static parameters of the model, we conjecture that under suitable regularity conditions such
as those of White (1994) and Wooldridge (1994), the maximum likelihood estimator $\hat{\theta}$ of $\theta$ is
consistent and satisfies
$$\sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{d} N(0, H^{-1}),$$
where $H = \lim_{n\to\infty} E[(\partial\ell/\partial\theta)(\partial\ell/\partial\theta')]/n$ and $\ell = \sum_{t=1}^{n} \ell_t$. A formal proof of these results
for the general class of GAS models is beyond the scope of the present paper. The results
have been established for specific subclasses of GAS models. For example, Davis, Dunsmuir,
and Streett (2005) prove consistency and asymptotic normality of the ML estimator for first
order Poisson count models. Straumann and Mikosch (2006) provide a set of conditions for
consistency and asymptotic normality for the Gaussian GARCH model and for more general
GARCH specifications. The main challenges for proving the result for the general class of
GAS models lie in verifying the stochastic equicontinuity of the likelihood function and in
establishing a contracting property for the nonlinear stochastic recurrence equation (2). A
contracting property is needed to prove the stationarity and ergodicity of the data generating
process.
A nice feature of the model is that under the assumption of a correct model specification,
the series $s_t$ forms a martingale difference series, $E_{t-1}[s_t] = 0$. In particular, if we set the
scaling matrix $S_t = \mathcal{J}_{t|t-1}$, $s_t$ is a martingale difference with unit variance. If we then express
the updating equation for GAS(1,1) in its infinite order moving average form, we obtain
$$f_t = (I - B_1)^{-1}\omega + A_1 \sum_{i=0}^{\infty} B_1^{i} s_{t-i}.$$
Therefore, it is necessary for the covariance stationarity of $f_t$ that the roots of $B_1$ lie inside
the unit circle. Such necessary conditions are helpful for establishing the limiting distribution
results mentioned above. For other choices of $S_t$, the derivation of such properties is less evident.
2.4 Parameterizations
The GAS specification adapts naturally to different parameterizations of the observation density
(1). In the GARCH example of Section 2.2, for example, the time-varying parameter is $f_t = \sigma_t^2$.
If it is preferred to enforce the positivity of $\sigma_t^2$, an obvious alternative is to parameterize the
model in terms of $\tilde{f}_t = \log(\sigma_t^2)$. The GAS dynamics automatically adapt to the choice of the
parameterization. In general, assume that one prefers a different parameterization $\tilde{f}_t = h(f_t)$
for some continuous and invertible mapping h(·). Let $\dot{h}_t = \partial h(f_t)/\partial f_t'$ which is deterministic
given the information set $\mathcal{F}_t$. For well behaved densities, the information matrix equals both
the expected outer product of scores and minus the expected second derivative of the log density.
Therefore,
$$\tilde{\mathcal{J}}_{t|t-1}' \tilde{\mathcal{J}}_{t|t-1} = \left( E_{t-1}\left[(\dot{h}_t^{-1})' \nabla_t \nabla_t' \dot{h}_t^{-1}\right] \right)^{-1} = \dot{h}_t \mathcal{I}_{t|t-1}^{-1} \dot{h}_t' = \dot{h}_t \mathcal{J}_{t|t-1}' \mathcal{J}_{t|t-1} \dot{h}_t', \tag{19}$$
where tildes denote that derivatives are taken with respect to $\tilde{f}_t$ rather than $f_t$. Similarly, we
have
$$\tilde{\nabla}_t = \frac{\partial \ln p(y_t \mid f_t, \mathcal{F}_t; \theta)}{\partial \tilde{f}_t} = (\dot{h}_t')^{-1} \nabla_t. \tag{20}$$
The GAS updating step for $\tilde{f}_t$ with square root information scaling is then given by
$$\tilde{s}_t = \tilde{\mathcal{J}}_{t|t-1} \tilde{\nabla}_t = \tilde{\mathcal{J}}_{t|t-1} (\dot{h}_t')^{-1} \mathcal{J}_{t|t-1}^{-1} s_t, \tag{21}$$
since $s_t = \mathcal{J}_{t|t-1} \nabla_t$. For the univariate case, it is easy to see that $\tilde{\mathcal{J}}_{t|t-1} (\dot{h}_t')^{-1} \mathcal{J}_{t|t-1}^{-1} = 1$.
For the multivariate case it follows that the updating step under the reparameterization is an
orthogonal linear transformation of the original step since
$$\left( \tilde{\mathcal{J}}_{t|t-1} (\dot{h}_t')^{-1} \mathcal{J}_{t|t-1}^{-1} \right) \left( \tilde{\mathcal{J}}_{t|t-1} (\dot{h}_t')^{-1} \mathcal{J}_{t|t-1}^{-1} \right)' = \tilde{\mathcal{J}}_{t|t-1} (\dot{h}_t')^{-1} \mathcal{I}_{t|t-1} (\dot{h}_t)^{-1} \tilde{\mathcal{J}}_{t|t-1}' = I, \tag{22}$$
where the last equality follows from (19). The choice of parameterization thus only has a minor
effect on the form of the updating step $s_t$ if we adopt $\mathcal{J}_{t|t-1}$ as our scaling matrix. In particular,
the new $\tilde{s}_t$ is also a unit variance martingale difference series. Other forms of scaling have
different implications. For example, if we scale the score by the inverse information matrix
$\mathcal{I}_{t|t-1}^{-1}$, it is easy to derive that the updating step $\tilde{s}_t$ for $\tilde{f}_t$ equals $\tilde{s}_t = \dot{h}_t s_t$.
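The univariate statements in this section can be verified for the Gaussian variance example with $f_t = \sigma_t^2$ and $\tilde{f}_t = \log(\sigma_t^2)$. The sketch below uses the standard Gaussian expressions for the score and information in each parameterization (assumed here rather than derived in the text) and hypothetical function names.

```python
import math


def steps_variance(y, sigma2):
    # Parameter f = sigma^2: score and information of the N(0, sigma^2) density.
    score = (y ** 2 - sigma2) / (2.0 * sigma2 ** 2)
    information = 1.0 / (2.0 * sigma2 ** 2)
    sqrt_scaled = score / math.sqrt(information)   # S_t = J_{t|t-1}
    inv_scaled = score / information               # S_t = I_{t|t-1}^{-1}
    return sqrt_scaled, inv_scaled


def steps_log_variance(y, sigma2):
    # Parameter f~ = log(sigma^2): score and information in the new parameter.
    score = 0.5 * (y ** 2 / sigma2 - 1.0)
    information = 0.5
    return score / math.sqrt(information), score / information


y_t, sigma2 = 1.7, 0.8                  # illustrative values
s_sqrt, s_inv = steps_variance(y_t, sigma2)
st_sqrt, st_inv = steps_log_variance(y_t, sigma2)
h_dot = 1.0 / sigma2                    # derivative of log(f) with respect to f
```

Under square root information scaling the two steps coincide, while under inverse information scaling the step in the log parameterization equals `h_dot` times the original step.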
3 Dynamic copula models
In this section, we introduce several new dynamic copula models. Patton (2006) introduced
the notion of time-varying copulas, see also Dias and Embrechts (2004), van den Goorbergh,
Genest, and Werker (2005), Lee and Long (2009), and Patton (2009) for a review.
3.1 The dynamic Gaussian copula model
Copulas have recently become popular in financial risk management. A copula is a multivariate
distribution function over a hypercube with uniform marginals. It can be used to link marginal
distributions into a multivariate distribution using Sklar’s theorem in Sklar (1959). In this
section, we demonstrate that the GAS framework can provide a new model specification for
the bivariate Gaussian copula.
We consider a simple Gaussian copula where the GAS model suggests an alternative dynamic
structure compared to earlier suggestions in the literature. The (Gaussian) correlation
parameter $\rho_t$ is modeled by the transformed parameter $\rho_t = [1 - \exp(-f_t)]/[1 + \exp(-f_t)]$. In
Patton (2006), the driving mechanism for the dynamic bivariate Gaussian copula is given by
$$f_{t+1} = \omega + A_1 \cdot \sum_{i=1}^{m} \Phi^{-1}(u_{1,t-i+1})\, \Phi^{-1}(u_{2,t-i+1}) + B_1 f_t, \tag{23}$$
where $\Phi^{-1}(\cdot)$ is the inverse of the normal distribution function, $u_{1t}$ and $u_{2t}$ are the probability
integral transforms using the univariate marginals, and m is a positive integer determining the
smoothness of $f_t$. Equation (23) is intuitively appealing and builds on our understanding of
covariances: if the transformed marginals have the same sign, the correlation should increase.
The reverse holds if the transformed marginals are of opposite sign.
By using the density of the Gaussian copula, we can derive the GAS specification for the
time-varying correlation parameter. The score with respect to the correlation parameter is the
same for the Gaussian copula and for the bivariate normal distribution. For m = 1, Patton’s
model (23) reduces to
$$f_{t+1} = \omega + A_1 \cdot y_t + B_1 \cdot f_t, \tag{24}$$
where $y_t = \Phi^{-1}(u_{1t})\,\Phi^{-1}(u_{2t})$. The GAS(1,1) updating equation for $f_t$ is obtained as
$$f_{t+1} = \omega + A_1 \frac{2}{(1 - \rho_t^2)} \left[ y_t - \rho_t - \rho_t \frac{(x_t - 2)}{(1 + \rho_t^2)} \right] + B_1 f_t, \tag{25}$$
where $x_t = \Phi^{-1}(u_{1t})^2 + \Phi^{-1}(u_{2t})^2$. The similarities and differences between (24) and (25) are as
follows. Both models are driven by $y_t$ so that positively clustered transformed marginals lead
to an increase of the correlation parameter. The additional scaling factor $2/(1 - \rho_t^2)$ in (25)
is a consequence of modeling the transformed correlation parameter $f_t$ rather than $\rho_t$ directly.
The most interesting difference between the two model specifications is that the GAS model
includes the term $x_t$, where $x_t - 2$ is a martingale difference. To understand the impact of
this term, consider two possible scenarios: we might observe $\Phi^{-1}(u_{1t}) = 1$ and $\Phi^{-1}(u_{2t}) = 1$ or,
alternatively, $\Phi^{-1}(u_{1t}) = 0.25$ and $\Phi^{-1}(u_{2t}) = 4$. In both cases, the cross-product term $y_t = 1$
is the same and the recursion in (24) will cause $f_{t+1}$ to be the same regardless of which of the
two scenarios we observe. Conversely, the sum of squares term $x_t$ in the GAS model provides
information to distinguish between these two cases. The behavior of $f_{t+1}$ will depend on the
current value of the correlation $\rho_t$. If the correlation is positive, the impact of the value of
$(x_t - 2)$ is negative. In this case, the $(x_t - 2)$ term offsets part of the effect of $(y_t - \rho_t)$ if the
latter has a positive value. If $(y_t - \rho_t)$ has a negative value, however, the $(x_t - 2)$ term reinforces
the magnitude of the GAS step for negative $\rho_t$.
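The two scenarios discussed above can be evaluated directly from (24) and (25). In the sketch below the function names are hypothetical and the current correlation $\rho_t = 0.5$ is an arbitrary illustrative value.

```python
def patton_driver(z1, z2):
    # Term multiplying A1 in (24): the cross product y_t.
    return z1 * z2


def gas_copula_driver(z1, z2, rho):
    # Term multiplying A1 in (25), with z_i = Phi^{-1}(u_it).
    y = z1 * z2
    x = z1 ** 2 + z2 ** 2
    return 2.0 / (1.0 - rho ** 2) * (y - rho - rho * (x - 2.0) / (1.0 + rho ** 2))


rho = 0.5                                        # assumed current correlation
patton_a = patton_driver(1.0, 1.0)
patton_b = patton_driver(0.25, 4.0)
gas_a = gas_copula_driver(1.0, 1.0, rho)         # x_t = 2, so the x-term vanishes
gas_b = gas_copula_driver(0.25, 4.0, rho)        # x_t = 16.0625
```

Both scenarios give the same driver under (24), whereas under (25) the first scenario raises $f_{t+1}$ and the second, with its large sum of squares, lowers it for this positive value of $\rho_t$.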
For illustrative purposes, we extend the example from Patton (2006) to investigate the
dependence of the daily exchange rates of the German Mark (later Euro), against the US
dollar, with the Japanese Yen and with the British Pound, also both against the US dollar.
The sample period is January 1986 through August 2008. The log returns of the exchange rate
series are analyzed by the AR-GARCH model: an autoregressive process for the conditional
mean and a GARCH process for the conditional variance. We construct the transformed series
for $u_{1t}$ and $u_{2t}$ and use these as inputs for the Gaussian copula model.
Table 1 reports that the log-likelihood value increases 25 to 125 points when considering
GAS instead of Patton for the same number of parameters. The estimates of the parameter $B_1$
imply that the GAS specification leads to a more persistently time-varying correlation process.
However, the increased sensitivity of the score mechanism to correlation shocks in the GAS
specification allows $f_t$ to react more fiercely to exchange rate returns of opposite sign if the
Table 1: Estimation results for different dynamic copula models

            10^3 ω     A1         ln(B1/(1 − B1))   B1               loglik
German Mark (Euro)–US $, Japanese Yen–US $
GAS         6.11       0.058      5.30              0.995            1218.16
            (2.48)     (0.009)    (0.37)            (0.990,0.998)
Patton      −1.60      0.036      4.27              0.986            1191.51
            (0.85)     (0.003)    (0.10)            (0.983,0.989)
German Mark (Euro)–US $, British Pound–US $
GAS         12.55      0.082      4.97              0.993            2218.82
            (3.55)     (0.008)    (0.26)            (0.988,0.996)
Patton      −0.97      0.025      4.71              0.991            2090.42
            (0.84)     (0.002)    (0.11)            (0.989,0.993)

Parameter estimates for the GAS and Patton models in (24)–(25). The data are the marginal AR-GARCH
transforms of log exchange rates for the German Mark-US dollar and Japanese Yen-US dollar (left panel) and
for the German Mark-US dollar and British Pound-US dollar (right panel), January 1986–August 2008. The
asymmetric confidence interval is in parentheses for B1, otherwise the standard error is in parentheses.
[Figure 1 panels: "German Mark (Euro) and Japanese Yen versus Dollar" and "German Mark (Euro) and British Pound versus Dollar"; horizontal axis: 1986 to 2008; plotted series: GAS, Patton.]
Figure 1: A copula illustration: comparisons of the correlation parameter estimates for the GAS and Patton
models in (24)–(25). The data are the marginal AR-GARCH transforms of log exchange rates for the German
Mark-US dollar and Japanese Yen-US dollar (left panel) and for the German Mark-US dollar and British
Pound-US dollar (right panel). The sample period is January 1986–August 2008.