Hierarchical Dynamic Models
Marina Silva Paez and Dani Gamerman
Universidade Federal do Rio de Janeiro, Brazil
marina,dani@im.ufrj.br
1 Introduction
Consider a population of students divided into groups (schools or classes), for which we
believe there are similarities (about a certain aspect of interest) among students in the
same group. More specifically, suppose we are interested in analyzing the proficiency in
maths of students attending the same level at different schools, all of whom take the
same test. It is reasonable to expect that the average score obtained by
the students is not the same at every school, as characteristics of the school could possibly
be affecting this average. For this specific example, the teaching skills of the maths
teacher could be an important aspect. Other variables could also be important, such as
the average socio-economic background of the parents. This problem can be expressed in
a hierarchical framework with two levels: first, a regression equation is specified at the
student level, possibly including explanatory variables such as gender or age. The
intercept and possibly some of the regression coefficients are allowed to vary between
schools, according to equations specified in the second level of the hierarchy. These
equations can be specified, for example, as regressions at the school level, taking into
account explanatory variables at that level. By allowing the regression coefficients of
certain explanatory variables to vary between schools, the model takes into account the
fact that the effects of these variables on the students' maths scores can vary by school.
Suppose now that the scores are obtained yearly for a certain period of time. It
is reasonable to expect that the conditions of each school could be changing through
time. Therefore, the model described above can be modified to allow the regression
intercept to vary between schools and also through time. In a more general framework,
all the regression coefficients can be allowed to vary across these dimensions.
Hierarchical models in which the parameters vary over time through dynamic linear
models (Harrison and Stevens, 1979) are called Dynamic Hierarchical Models. These
models are well documented in Gamerman and Migon (1993).
Dynamic Hierarchical Models can be applied to data in many fields of interest. In
particular, many applications involve environmental data, due to the usual need to model
their variation in time, and often in space as well. Some motivating examples and
applications to real data sets are presented throughout this chapter. The remainder of
the chapter is organized as follows: in Section 2, the Dynamic Hierarchical Models are
presented, and inference for these models is presented in Section 3. Section 4 presents
extensions of these models for observations in time and space. Finally, Section 5 presents
some concluding remarks.
2 Dynamic Hierarchical Models for Univariate Observations
In this section, Dynamic Hierarchical Models are introduced through Example 1 below,
which illustrates the problem presented in the introduction of this chapter. After the
methodology is presented and its specific notation is introduced, two other examples will
be given.
Example 1: maths scores of pupils from different schools over time
This example illustrates the problem presented in the introduction of this chapter,
where the interest lies in modeling the scores obtained in maths tests by students of
different schools.
First, suppose we have a total of J schools, with n_j students being tested in the jth
school, j = 1, ..., J. Let y_{ij} be the maths score of the ith student at the jth school,
and suppose his/her age (x_{1ij}) and gender (x_{2ij}) were also observed. A linear regression
model could be specified to explain the maths scores using age and gender as explanatory
variables. A possibly better specification, however, would take into consideration that
students coming from the same school tend to be similar. This means that students with
the same characteristics (gender and age) should not be expected to achieve the same
scores if they study at different schools. To accommodate this, one could allow the
intercept of the original regression to vary by school. Variables observed at the school
level could help explain this variation. In this example, we suppose that the following
school-related variables were also observed: years of experience of the maths teacher
(x_{3j}), number of maths classes per week (x_{4j}), and an indicator of whether the students
received any support from a tutor (x_{5j}), j = 1, ..., J. It is reasonable to expect that the
regression coefficients related to these explanatory variables will all be positive, meaning
that the scores of the students at a given school tend to be higher if the maths teacher is
more experienced, if they have more maths classes per week, and if they receive the
support of a tutor. A possible hierarchical model for these data would be:

y_{ij} = β_{0,j} + β_1 x_{1ij} + β_2 x_{2ij} + ε_{ij},   ε_{ij} ~ N(0, σ²),
β_{0,j} = β_0 + β_3 x_{3j} + β_4 x_{4j} + β_5 x_{5j} + u_j,   u_j ~ N(0, σ_u²).
Now suppose tests are applied yearly to students of a certain level over a period of
T years. For simplification purposes, we suppose that the number of students tested in
each school does not vary through the years, under the hypothesis that no student will
change schools over this period or fail a year. It is now reasonable to assume that the
intercept of the regression at the student level changes over time as well as by school,
taking into account that characteristics of the schools could be changing over time.
We will also suppose that the intercept of the regression at the school level changes
over time through an autoregressive structure. This reflects that other characteristics
of the school, not specified in the model, can also be affecting the scores of the
students, and these characteristics may vary smoothly in time. The proposed model
becomes:
y_{ijt} = β_{0,jt} + β_1 x_{1ij} + β_2 x_{2ij} + ε_{ijt},   ε_{ijt} ~ N(0, σ²),   (1)
β_{0,jt} = β_{0,t} + β_3 x_{3jt} + β_4 x_{4jt} + β_5 x_{5jt} + u_{jt},   u_{jt} ~ N(0, σ_u²),   (2)
β_{0,t} = β_{0,t−1} + w_t,   w_t ~ N(0, σ_w²),   (3)
where y_{ijt} denotes the score obtained by the ith student of the jth school in the tth year.
Note also that the variables measured at the school level are now indexed by time. The
age of the students can be considered fixed if we work, for example, with the age given
at the beginning of the experiment.

For this example, the coefficients related to gender and age are fixed. Note, however,
that they could also vary by school, by time, or both. Similarly, the slope coefficients
at the second level of the hierarchy, β_3, β_4 and β_5, could also vary over time.
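To make equations (1)-(3) concrete, the data-generating process can be simulated directly. The sketch below is ours, with all sizes, coefficient values and covariate distributions chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
J, n_j, T = 5, 20, 8                 # hypothetical: J schools, n_j pupils, T years
beta1, beta2 = 0.3, 1.5              # age and gender effects (equation (1))
beta3, beta4, beta5 = 0.5, 0.8, 2.0  # school-level effects (equation (2))
sigma, sigma_u, sigma_w = 3.0, 1.0, 0.5

age = rng.uniform(10, 12, size=(J, n_j))            # fixed at the start of the study
gender = rng.integers(0, 2, size=(J, n_j))
x3 = rng.uniform(1, 20, size=(J, T))                # teacher's experience
x4 = rng.integers(2, 6, size=(J, T)).astype(float)  # maths classes per week
x5 = rng.integers(0, 2, size=(J, T)).astype(float)  # tutor indicator

beta0_t = np.empty(T)
beta0_t[0] = 50.0                    # arbitrary starting level
y = np.empty((J, n_j, T))
for t in range(T):
    if t > 0:                        # random-walk evolution, equation (3)
        beta0_t[t] = beta0_t[t - 1] + rng.normal(0.0, sigma_w)
    # school-level intercepts, equation (2)
    b0jt = (beta0_t[t] + beta3 * x3[:, t] + beta4 * x4[:, t]
            + beta5 * x5[:, t] + rng.normal(0.0, sigma_u, J))
    # student-level scores, equation (1)
    y[:, :, t] = (b0jt[:, None] + beta1 * age + beta2 * gender
                  + rng.normal(0.0, sigma, (J, n_j)))
```

Simulations of this kind are useful for checking an estimation routine before applying it to real data.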
In matrix notation, the above model can be written as

y_t = X_{1t} β_{1,t} + v_{1t},   v_{1t} ~ N_n(0, V_1),   (4)
β_{1,t} = X_{2t} β_{2,t} + v_{2t},   v_{2t} ~ N_{3J}(0, V_2),   (5)
β_{2,t} = I_4 β_{2,t−1} + w_t,   w_t ~ N_4(0, W),   t ≥ 1,   (6)
where I_k denotes the k×k identity matrix. y_t is a vector of size n whose first n_1
elements are the scores of the students of the 1st school, whose following n_2 elements
are the scores of the students of the 2nd school, and so on, for a fixed time t. That
way, n = n_1 + n_2 + ··· + n_J. X_{1t} is a matrix of dimension n×3J, obtained by the direct
sum of the matrices θ_{1j}, j = 1, ..., J, where θ_{1j} is an n_j×3 matrix with all the elements in
the first column equal to one, and the ith elements of the second and third columns
being the gender and the age of the ith student of the jth school. β_{1,t} is a 3J×1 vector
of coefficients, where the first sequence of three elements is given by β_{0,1t}, β_1
and β_2 (the coefficients corresponding to the first school); the following sequence of
three elements is given by β_{0,2t}, β_1 and β_2 (the coefficients corresponding to the second
school), and so forth. v_{1t} is an n×1 vector of errors and V_1 = I_n σ². Analogously, X_{2t}
is a 3J×4 matrix, obtained by the direct sum of the matrices θ_{2j}, where θ_{2j} is a 1×4
matrix with a unitary element in the first column and the other three columns being the
values of the covariates at the second level of the hierarchy for school j; β_{2,t} and v_{2t} are
vectors of coefficients and errors, respectively, of dimension 4×1, and V_2 = I_J σ_u². w_t
is a vector of size 4, with the first element representing the error term in equation (3)
and the other elements constant and equal to zero (as β_3, β_4 and β_5 do not vary
in time). That way, W is a 4×4 matrix with element [1,1] equal to σ_w² and all other
elements equal to zero.
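The direct-sum construction of X_{1t} is just a block-diagonal stacking of the school matrices θ_{1j}. A small sketch (the helper name and the toy data are ours):

```python
import numpy as np

def direct_sum(blocks):
    """Place the given matrices along the diagonal, zeros elsewhere."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

# theta_1j for school j: a column of ones, then the gender and age of its pupils
n_pupils = [2, 3]  # hypothetical sizes, J = 2 schools
thetas = [np.column_stack([np.ones(nj),
                           np.arange(nj) % 2,       # stand-in genders
                           10.0 + np.arange(nj)])   # stand-in ages
          for nj in n_pupils]
X1t = direct_sum(thetas)  # dimension (n_1 + n_2) x 3J = 5 x 6
```

`scipy.linalg.block_diag` performs the same construction in a single call.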
Now we are ready to present a general formulation for the Dynamic Hierarchical
Models, as introduced by Gamerman and Migon (1993). The model is composed of
three parts: an observation equation, structural equations and a system equation. The
observation equation describes the distribution of the observations through a regression
model (as in (4)); the structural equation describes the structure of hierarchy of the
regression parameters (as in (5)); and the system equation describes the evolution of the
parameters through time (as in (6)). For a linear hierarchical model of three levels, these
equations can be written, respectively, as:
observation equation:

y_t = X_{1t} β_{1,t} + v_{1t},   v_{1t} ~ N_n(0, V_{1t}),   (7)

structural equations:

β_{1,t} = X_{2t} β_{2,t} + v_{2t},   v_{2t} ~ N_{r_1}(0, V_{2t}),   (8)
β_{2,t} = X_{3t} β_{3,t} + v_{3t},   v_{3t} ~ N_{r_2}(0, V_{3t}),   (9)

system equation:

β_{3,t} = G_t β_{3,t−1} + w_t,   w_t ~ N_{r_3}(0, W_t),   (10)

where n is the total number of observations; v_{1t}, v_{2t}, v_{3t} and w_t are mutually
independent disturbance terms; X_{1t}, X_{2t}, X_{3t} and G_t are known matrices, possibly
incorporating explanatory variables; V_{1t}, V_{2t}, V_{3t} and W_t are variance-covariance
matrices that can be allowed to vary over time; and β_{i,t} is a vector of coefficients of
size r_i, i = 1, ..., 3, with r_1 > r_2 > r_3.
Hierarchical models with more levels can be easily obtained by adding extra structural
equations. The two-level model can be obtained by setting X_{3t} to be the identity
matrix and V_{3t} to be a matrix of zeros.

Other simple hypothetical examples are presented below to illustrate the use of these
models.
Example 2: weight measurements in a population of patients under treatment

Suppose that the variation of weight in a population of patients under the same kind
of experimental treatment is being investigated. Since the beginning of the experiment,
a different sample of patients is selected from the population every week and weighed,
for a total of T weeks. Suppose that at the tth week, a sample of size n_t is selected. To
model the variation of weight through time, we can use a simple two-stage model, given
by:
observation equation:

y_{it} = β_{i,t} + ε_{it},   ε_{it} ~ N(0, σ²),   i = 1, ..., n_t,

structural equations:

β_{i,t} = μ_t + u_{it},   u_{it} ~ N(0, σ_u²),   i = 1, ..., n_t,

system equation:

μ_t = μ_{t−1} + w_t,   w_t ~ N(0, σ_w²).
This model is a collection of the steady models of West and Harrison (1997), related
through similar mean levels. The β_{i,t} are the observation levels, assumed to form an
exchangeable (with respect to the index i) sample with common mean μ_t. Note that the
mean μ_t is allowed to vary over time, as the treatment can cause average weight loss
or weight gain through time. This model can be written in matrix notation as in (7)-(10):
y_t = X_{1t} β_{1,t} + v_{1t},   v_{1t} ~ N_{n_t}(0, V_{1t}),   (11)
β_{1,t} = X_{2t} β_{2,t} + v_{2t},   v_{2t} ~ N_{n_t}(0, V_{2t}),   (12)
β_{2,t} = G_t β_{2,t−1} + w_t,   w_t ~ N(0, W_t),   (13)

where y_t = (y_{1t}, ..., y_{n_t t})^T, β_{1,t} = (β_{1,t}, ..., β_{n_t,t})^T, X_{1t} = I_{n_t},
V_{1t} = I_{n_t} σ², X_{2t} = 1_{n_t}, β_{2,t} = μ_t, V_{2t} = I_{n_t} σ_u², G_t = 1 and
W_t = σ_w², where 1_n represents an n-dimensional vector of ones.
Example 3: weight measurements in a population of children with malnutrition

This example can be illustrated by the following experiment: a population of children
with malnutrition is being treated with a caloric diet. As in Example 2, a different
sample (of size n_t) of children is selected from the population at the tth week of the
experiment and weighed. Differently from the previous example, however, the children
are expected to gain weight through time. The model proposed in Example 2 can
therefore be modified to accommodate this expected growth in the average weight. The
proposed model for this example can be written as:
observation equation:

y_{it} = β_{i,t} + ε_{it},   ε_{it} ~ N(0, σ²),   i = 1, ..., n_t,

structural equations:

β_{i,t} = μ_t + u_{it},   u_{it} ~ N(0, σ_u²),   i = 1, ..., n_t,

system equations:

μ_t = ρ_1(μ_{t−1} + δ_{t−1}) + w_{1t},   w_{1t} ~ N(0, σ_{w1}²),
δ_t = ρ_2 δ_{t−1} + w_{2t},   w_{2t} ~ N(0, σ_{w2}²).
Note that the expected growth of the mean μ_t is specified by the system equations.
This model can be represented in matrix notation as in (11)-(13), where y_t and β_{1,t}
are defined as before and X_{1t} = I_{n_t}, V_{1t} = I_{n_t} σ², β_{2,t} = (μ_t, δ_t)^T,
X_{2t} = (1_{n_t}, 0_{n_t}), V_{2t} = I_{n_t} σ_u²,

G_t = ( ρ_1  ρ_1
         0   ρ_2 )    and    W_t = ( σ_{w1}²    0
                                        0     σ_{w2}² ),

where 0_n denotes an n-dimensional vector of zeros.
3 Inference for Dynamic Hierarchical Models
Inference will be presented here from the Bayesian point of view. To do so, it will be
necessary to specify prior distributions for all the unknown parameters of the model.
More details about the Bayesian approach can be seen in Chapter 4. Our aim for this
kind of modeling is usually to obtain the posterior distribution of these parameters and
perform forecasting for future observations. In this section, some basic results which were
presented in Gamerman and Migon (1993) will be reviewed.
Let us define D_t as all the information obtained up to time t, including the observations
y = {y_1, ..., y_t} and the prior information, represented by D_0. We assume therefore
that D_t = {y_t, D_{t−1}}. We can represent the (k-stage) dynamic hierarchical model as:

y_t | β_{1,t} ~ N_n(X_{1t} β_{1,t}, V_{1t}),   (14)
β_{i,t} | β_{i+1,t} ~ N_{r_i}(X_{i+1,t} β_{i+1,t}, V_{i+1,t}),   i = 1, ..., k−1,   (15)
β_{k,t} | β_{k,t−1} ~ N_{r_k}(G_t β_{k,t−1}, W_t),   (16)

with initial prior β_{k,0} | D_0 ~ N(m_{k,0}, C_{k,0}). Note that the matrices X_{it}, i = 1, ..., k,
and G_t are assumed known.
3.1 Updating and forecasting: known variances case
At this point, let us also assume that the variances V_{it} are known. Even though this is in
most cases an unrealistic assumption, it enables us to obtain the basic updating and
forecasting operations. Under that assumption, Gamerman and Migon (1993) showed that
the model specified by equations (14)-(16), with the prior β_{k,0} | D_0 ~ N(m_{k,0}, C_{k,0}),
gives the following prior, posterior and predictive distributions:
• the prior distribution at time t is given by

β_{i,t} | D_{t−1} ~ N(a_{it}, R_{it}),   i = 1, ..., k,   (17)

where a_{it} = X_{i+1,t} a_{i+1,t} and R_{it} = X_{i+1,t} R_{i+1,t} X_{i+1,t}^T + V_{i+1,t} for
i = 1, ..., k−1, and a_{kt} = G_t m_{k,t−1} and R_{kt} = G_t C_{k,t−1} G_t^T + W_t;

• the one-step-ahead predictive distribution is given by

y_t | D_{t−1} ~ N(f_t, Q_t),   (18)

where f_t = X_{1t} a_{1t} and Q_t = X_{1t} R_{1t} X_{1t}^T + V_{1t};

• the posterior distribution at time t is given by

β_{i,t} | D_t ~ N(m_{it}, C_{it}),   i = 1, ..., k,   (19)

where m_{it} = a_{it} + S_{it} Q_t^{−1}(y_t − f_t), C_{it} = R_{it} − S_{it} Q_t^{−1} S_{it}^T,
S_{it} = R_{it} E_{0it}^T, E_{ijt} = ∏_{l=i+1}^{j} X_{lt} for 0 ≤ i < j ≤ k, and E_{iit} = I_{r_i}
for 1 ≤ i ≤ k.
Gamerman and Migon (1993) also show that h-steps-ahead forecasts, h > 1, can be
easily obtained from this result. Suppose we are interested in predicting y_{t+h} given D_t,
or in other words, we want to obtain the distribution of y_{t+h} | D_t. Note that

y_{t+h} | β_{1,t+h} ~ N(X_{1,t+h} β_{1,t+h}, V_{1,t+h}),   and
β_{k,t} | D_t ~ N(m_{kt}, C_{kt}).

Then, the distribution β_{k,t+h} | D_t ~ N(a_{kt}(h), R_{kt}(h)) can be recursively obtained
(West and Harrison, 1997) with a_{kt}(h) = G_{t+h} a_{kt}(h−1) and
R_{kt}(h) = G_{t+h} R_{kt}(h−1) G_{t+h}^T + W_{t+h}, with starting values a_{kt}(0) = m_{kt} and
R_{kt}(0) = C_{kt}. Successive integrations give:

β_{i,t+h} | D_t ~ N(a_{it}(h), R_{it}(h)),   i = 1, ..., k−1,
y_{t+h} | D_t ~ N(f_t(h), Q_t(h)),

where f_t(h) and Q_t(h) are analogous to f_t and Q_t as defined previously, substituting a_{it}
and R_{it} by a_{it}(h) and R_{it}(h).
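For the same hypothetical two-level setup, the h-step recursion only propagates the top-level moments h times before mapping down through the hierarchy; a sketch with names and inputs of our own choosing:

```python
import numpy as np

def hdlm_forecast(h, m2, C2, X1, X2, G, V1, V2, W):
    """h-steps-ahead forecast moments (f_t(h), Q_t(h)) for a two-level model."""
    a, R = m2, C2                      # starting values a_kt(0) = m_kt, R_kt(0) = C_kt
    for _ in range(h):                 # a_kt(h) = G a_kt(h-1), R_kt(h) = G R G^T + W
        a = G @ a
        R = G @ R @ G.T + W
    a1 = X2 @ a                        # map down through the hierarchy
    R1 = X2 @ R @ X2.T + V2
    return X1 @ a1, X1 @ R1 @ X1.T + V1

X1 = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
X2 = np.array([[1.0], [0.5]])
G = np.array([[1.0]])
V1, V2, W = 0.5 * np.eye(3), 0.2 * np.eye(2), np.array([[0.1]])
f1, Q1 = hdlm_forecast(1, np.array([1.0]), np.array([[1.0]]), X1, X2, G, V1, V2, W)
f3, Q3 = hdlm_forecast(3, np.array([1.0]), np.array([[1.0]]), X1, X2, G, V1, V2, W)
```

With G = I the forecast mean is constant in h while the forecast variance grows, reflecting the accumulated evolution noise W.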
When a more accurate description of the model parameters is required, or if events in
the past are still of interest, a procedure called smoothing or filtering can be applied at
any given time point. This procedure filters back the information smoothly via a recursive
algorithm, allowing for parametric estimation of the system at previous times given all
available information at a given period of time. Thus, the smoothed distributions of the
state parameters are denoted by [β_{i,t} | D_n], as opposed to their online posterior
distributions, denoted by [β_{i,t} | D_t]. Gamerman and Migon (1993) show that the
smoothed distribution for a dynamic hierarchical model is β_{i,t} | D_n ~ N(m_{it}^n, C_{it}^n),
t = 1, ..., n, i = 1, ..., k, with moments recursively defined as

m_{it}^n = m_{it} + A_{it}^T R_{i,t+1}^{−1}(m_{i,t+1}^n − a_{i,t+1}),
C_{it}^n = C_{it} − A_{it}^T R_{i,t+1}^{−1}(R_{i,t+1} − C_{i,t+1}^n) R_{i,t+1}^{−1} A_{it},

and initialized at t = n with m_{in}^n = m_{in} and C_{in}^n = C_{in}, where
A_{it} = E_{ik,t+1} G_{t+1} C_{kt} {(I_{r_i} − V*_{ikt} E_{0it}^T V*_{0kt}^{−1} E_{0it}) E_{ikt}}^T
and V*_{ijt} = Σ_{l=i+1}^{j} E_{i,l−1,t} V_{lt} E_{i,l−1,t}^T.
3.2 Updating and forecasting: unknown variances case
As stated before, the hypothesis that the variances in the model are known is hardly
ever realistic in a real application. In many applications, however, it is reasonable to
suppose independence of the errors in the observation equation, such that V_{1t} = σ_{1t}² I_N,
where N is the dimension of y_t. A conjugate analysis is possible when all the variances
V_{it}, W_t and C_{k,0} are scaled by σ², an unknown factor, with an Inverted Gamma prior
distribution: σ² | D_0 ~ IG(n_0/2, d_0/2). The model can be written as in (14)-(16) with the
variances multiplied by the factor σ².
The distribution of σ² can be updated at time t to IG(n_t/2, d_t/2), where n_t = n_{t−1} + n
and d_t = d_{t−1} + (y_t − f_t)^T Q_t^{−1}(y_t − f_t). The results presented in equations (17),
(18) and (19) remain valid, except that all variances should be multiplied by σ². The
one-step-ahead predictive distribution in this case (after integrating σ² out) is obtained
by replacing the Normal distribution by a Student t distribution with n_{t−1} degrees of
freedom and substituting σ² by its estimate d_{t−1}/n_{t−1}. The posterior distribution is
also obtained by substituting the Normal by a Student t distribution, but in this case
with n_t degrees of freedom and with the estimate for σ² given by d_t/n_t. We denote the
Student t distribution with mean m, ν degrees of freedom and variance-covariance matrix
Σ by T(m, ν, Σ).
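The scale update itself is a two-line computation. The sketch below (our naming) takes as input the forecast error and Q_t with the σ² factor removed:

```python
import numpy as np

def update_sigma2(n_prev, d_prev, e, Q_star):
    """Conjugate update of sigma^2 ~ IG(n_t/2, d_t/2):
    n_t = n_{t-1} + n and d_t = d_{t-1} + e^T Q_t^{-1} e,
    where e = y_t - f_t and Q_star is Q_t scaled so sigma^2 is factored out."""
    n_t = n_prev + e.shape[0]
    d_t = d_prev + float(e @ np.linalg.solve(Q_star, e))
    return n_t, d_t, d_t / n_t   # the last value estimates sigma^2

n_t, d_t, est = update_sigma2(2.0, 2.0, np.array([1.0, 0.0]), np.eye(2))
```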
In a more realistic approach, however, a single unknown factor σ² is not enough to
handle all of the uncertainty at every level of the hierarchy. In general, a more realistic
representation considers at least one unknown variance parameter for the errors of
each hierarchical level, with a single parameter per level corresponding to constant
variance under the assumption of independent errors. Models of this kind are not
analytically tractable, and it is therefore necessary to use computational methods to
simulate from the distributions of interest (posterior and/or predictive), or to use
deterministic approximation methods.
Numerical methods are a very important tool for applied Bayesian analysis. When
the posterior (or predictive) distribution of the unknown parameters is not analytically
tractable, samples can hardly ever be obtained directly from this distribution; as an
alternative, simulation methods that sample indirectly from the posterior distribution
can be applied. One of the most widely used approaches is Markov chain Monte Carlo
(MCMC), a computer-intensive method whose idea is to sample from a Markov chain
whose equilibrium distribution is the posterior/predictive distribution of interest. That
way, after convergence is reached, the simulated values can be seen as a sample from the
distribution of interest, and inference can be performed based on this sample. Different
algorithms can be proposed within MCMC, with Gibbs sampling and Metropolis-Hastings
being the most common. A broad discussion of these and other algorithms can be found
in Gamerman and Lopes (2006).
More recently, a promising alternative to inference via MCMC in latent Gaussian
models, the Integrated Nested Laplace Approximation (INLA), was proposed by Rue et
al. (2009). INLA is an approximation method for the posterior/predictive distributions
of interest that, unlike empirical Bayes approaches (Fahrmeir et al., 2004), incorporates
posterior uncertainty with respect to the hyperparameters. Other alternative methods
that have been successfully used for inference in problems of this kind are particle
filters, which are sequential Monte Carlo methods based on point-mass (or "particle")
representations of probability densities, and are very attractive computationally.
Examples of the use of these methods can be found in Liu and West (2001), Johannes
and Polson (2007) and Carvalho et al. (2010), amongst others. Both of these methods
(INLA and particle filters) can be applied to the Hierarchical Dynamic Models presented
here and to the extensions of these models presented in Chapter 5.
4 Model Extensions
In this section, extensions to the Dynamic Hierarchical Models will be presented,
including examples illustrating applications to real data sets. In Section 4.1, the
Matrix-variate Dynamic Hierarchical Models will be presented as a multivariate extension
of the univariate case already introduced. In Section 4.2, a particular case of this
model will be considered, imposing a parametric structure to account for the spatial
correlation between observations made at different locations in space. In Section 4.3, an
extension to observations in the exponential family will be presented.
4.1 Matrix-variate Dynamic Hierarchical Models
Suppose q observations are made through time at r different locations in space, such
that y_{1t}, ..., y_{qt} represent q vectors of observations of dimension r×1. Data with this
structure are easily found in the literature on environmental processes. As an example,
in many locations around the world, monitoring stations are implemented to monitor air
quality, measuring concentrations of all kinds of pollutants at a certain periodicity. q
pollutant measurements made through time, registered at r different locations, can be
seen as q vectors of observations of dimension r×1. In this section, the Hierarchical
Dynamic Models for univariate responses will be extended to responses of this kind,
leading to the Matrix-variate Hierarchical Dynamic Models. To introduce the notation,
Section 4.1.1 presents the Matrix-variate Dynamic Models. The Matrix-variate
Hierarchical Dynamic Models are then presented in Section 4.1.2 as a simple extension.
Inference for these models is presented in Section 4.1.3, and finally Section 4.1.4
presents an application.
4.1.1 Matrix-variate Dynamic Models
Let y_{1t}, ..., y_{qt} represent q vectors of observations of dimension r×1, with the ith
element of each vector representing an observation made at location i. If we denote

y_t = [y_{1t}, ..., y_{qt}],

we have a matrix of observations through time. West and Harrison (1989) define a
Matrix-variate Dynamic Model by the equations:

y_t = X_t β_t + v_t,
β_t = G_t β_{t−1} + w_t,
where

• y_t is the r×q observation matrix at time t;
• X_t is a known r×p matrix of regressors;
• β_t is the p×q matrix of unknown parameters;
• v_t is an r×q matrix of errors;
• G_t is a known p×p evolution matrix;
• w_t is the p×q matrix of evolution errors.
It is assumed that β_t and v_t are independent, β_{t−1} and w_t are independent, and
v_t and w_t are independent. Under the hypothesis of normality, the errors v_t and w_t,
which are matrices, follow a matrix-variate normal distribution. If Z is a random
matrix of dimension r×q, we say that Z has a matrix-variate normal distribution with
right covariance matrix Σ and left covariance matrix C, denoted by Z ~ N(M, C, Σ), if
vec(Z) ~ N_{rq}(vec(M), Σ ⊗ C), where vec(Z) represents the vectorization of the matrix
Z by columns.

As an example, suppose that q types of pollutants are observed at r different locations
in space, resulting in an r×q matrix of observations, for a fixed period of time t. If this
matrix follows a N(M, C, Σ) distribution, then M is the r×q matrix of means; the left
matrix C is r×r and represents the correlation of the observations in space; and the
right matrix Σ is q×q and represents the correlation between the different kinds of
pollutants. Note that in this model C and Σ are not separately identifiable, as
N(M, C, Σ) is equivalent to N(M, C/k, kΣ) for any k > 0. If the observations of each
variable of interest are not correlated in space at a given period of time t, then C = I_r.
If the observations are independent at a fixed period of time and fixed location in space,
then Σ = I_q.
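The definition via vec(Z) gives a direct way to simulate from this distribution: if Z = M + L_C E L_Σ^T, with E an r×q matrix of iid N(0,1) entries and L_C, L_Σ the Cholesky factors of C and Σ, then vec(Z) has covariance Σ ⊗ C. A sketch (the function name is ours):

```python
import numpy as np

def sample_matrix_normal(M, C, Sigma, rng):
    """Draw Z ~ N(M, C, Sigma), i.e. vec(Z) ~ N(vec(M), Sigma ⊗ C)."""
    L_C = np.linalg.cholesky(C)          # left (row, e.g. spatial) factor
    L_S = np.linalg.cholesky(Sigma)      # right (column, e.g. response) factor
    E = rng.standard_normal(M.shape)     # iid standard normal entries
    return M + L_C @ E @ L_S.T
```

The identifiability remark above shows up here too: scaling C by 1/k and Σ by k rescales L_C and L_Σ in opposite directions, leaving L_C E L_Σ^T distributed identically.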
4.1.2 Hierarchical Formulation of the Model
The Matrix-variate Hierarchical Dynamic Models extend both the model described above
and the Hierarchical Dynamic Models: for each period of time, we observe a matrix of
observations instead of a vector. This model can be written, as defined by Landim and
Gamerman (2000), as follows:

y_t = X_{1t} β_{1,t} + v_{1t},   v_{1t} | Σ ~ N(0, V_1, Σ),   (20)
β_{1,t} = X_{2t} β_{2,t} + v_{2t},   v_{2t} | Σ ~ N(0, V_2, Σ),   (21)
⋮
β_{k−1,t} = X_{kt} β_{k,t} + v_{kt},   v_{kt} | Σ ~ N(0, V_k, Σ),   (22)
β_{k,t} = G_t β_{k,t−1} + w_t,   w_t | Σ ~ N(0, W, Σ),   (23)
where, for a fixed period of time t,

• y_t is the r×q observation matrix;
• β_{i,t}, i = 1, ..., k, are matrices of regression coefficients of order r_i×q;
• X_{it} are known matrices of regressors of order r_{i−1}×r_i, with r_0 = r, i = 1, ..., k;
• G_t is the known r_k×r_k evolution matrix;
• w_t is the r_k×q matrix of evolution errors.
We assume that β_{i,t} and v_{it} are independent, i = 1, ..., k, that β_{k,t−1} and w_t are
independent, and that v_{1t}, v_{2t}, ..., v_{kt} and w_t are independent. The matrices V_1,
V_2, ..., V_k and W are considered known and, for simplicity, fixed over time.
Note that a restriction is imposed: the part of the conditional covariance between
elements of y_t that accounts for the different responses is specified in Σ. The same
restriction is imposed on the conditional covariances of β_{1,t}, β_{2,t}, ..., β_{k,t}, all
sharing the same structure Σ to account for the covariance between elements coming from
different responses. This assumption is usually made in multivariate dynamic models (see,
for example, West and Harrison (1997, ch. 16) and Quintana (1987)), and it is reasonable
since the magnitudes of the covariances between elements at different hierarchical levels
are not forced to be the same. The models above could be specified and operated without
this restriction, but that would substantially increase parameter dimensionality.
To complete the specification of the model presented above, it is necessary to specify
prior distributions for the unknown parameters. Usual choices of priors, used by West
and Harrison (1997), Quintana (1987) and Landim and Gamerman (2000), are the
following:

β_{k,0} | Σ, D_0 ~ N(M_{k,0}, C_{k,0}, Σ),   and
Σ | D_0 ~ WI(n_0, S_0).
4.1.3 Inference
As in the beginning of Section 3, we start by presenting inference results based on the
assumption that V_1, V_2, ..., V_k and W are known. Conditionally on the value of Σ, it
is possible to obtain the distributions of β_{i,t} | D_{t−1}, y_t | D_{t−1} and β_{i,t} | D_t
analytically. These results are given below; their derivation can be found in Landim and
Gamerman (2000). For t = 1, 2, ... and i = 1, ..., k,
a) β_{i,t} | D_{t−1} ~ N(a_{it}, R_{it}, Σ), with
a_{it} = X_{i+1,t} a_{i+1,t}, i = 1, ..., k−1,   a_{kt} = G_t M_{k,t−1},
R_{it} = X_{i+1,t} R_{i+1,t} X_{i+1,t}^T + V_{i+1,t}, i = 1, ..., k−1,   R_{kt} = G_t C_{k,t−1} G_t^T + W_t;

b) y_t | D_{t−1} ~ N(f_t, Q_t, Σ), with
f_t = X_{1t} a_{1t},   Q_t = X_{1t} R_{1t} X_{1t}^T + V_{1t};

c) β_{i,t} | D_t ~ N(M_{it}, C_{it}, Σ), with
M_{it} = a_{it} + S_{it} Q_t^{−1} E_t,   S_{it} = R_{it} E_{0it}^T,
E_t = y_t − f_t,
C_{it} = R_{it} − S_{it} Q_t^{−1} S_{it}^T.
The posterior distribution of Σ can also be obtained analytically, and it is given by

Σ | D_t ~ WI(n_t, S_t),

with n_t = n_{t−1} + N and S_t = S_{t−1} + E_t^T Q_t^{−1} E_t. Unconditionally on Σ, the
distributions of β_{i,t} | D_{t−1}, y_t | D_{t−1} and β_{i,t} | D_t become matrix-variate
Student t:

a) β_{i,t} | D_{t−1} ~ T(a_{it}, R_{it}, n_t, S_t),
b) y_t | D_{t−1} ~ T(f_t, Q_t, n_t, S_t),
c) β_{i,t} | D_t ~ T(M_{it}, C_{it}, n_t, S_t),

where T(a, R, n, S) denotes the matrix-variate Student t distribution with mean a, left
covariance matrix R, n degrees of freedom and right covariance matrix S. If Z is a
random matrix such that Z ~ T(a, R, n, S), then vec(Z) ~ T(vec(a), n, R ⊗ S).
As in the case of the univariate Hierarchical Dynamic Linear Models, h-steps-ahead
forecasts, i.e. the distribution of (y_{t+h} | D_t, Σ), can be obtained.

When the variance matrices V_1, V_2, ..., V_k, W and Σ are unknown, it is not possible
to obtain the distributions of β_{i,t} | D_{t−1}, y_t | D_{t−1} and β_{i,t} | D_t analytically.
In that case, numerical methods can help to obtain approximations for these
distributions.
Under the Bayesian point of view, prior distributions must be specified for these
quantities. Landim and Gamerman (2000) considered independent Inverted Wishart prior
distributions for all of the variance-covariance matrices, and a Normal prior distribution
for β_{k,0} | Σ, D_0, given by β_{k,0} | Σ, D_0 ~ N(M_{k,0}, C_{k,0}, Σ).
The joint posterior distribution of ({β_1}, ..., {β_k}, V_1, ..., V_k, W, Σ) is given by

p({β_1}, ..., {β_k}, V_1, ..., V_k, W, Σ | y) ∝ [ ∏_{t=1}^{T} p(y_t | β_{1,t}, V_1, Σ) ]
× p(W) p(Σ) p(β_{k,0} | Σ) [ ∏_{i=1}^{k} p(V_i) ]
× [ ∏_{t=1}^{T} ∏_{i=1}^{k−1} p(β_{i,t} | β_{i+1,t}, V_{i+1}, Σ) ] [ ∏_{t=2}^{T} p(β_{k,t} | β_{k,t−1}, W, Σ) ].
Even though it is not possible to obtain the posterior distribution of the model
parameters analytically, it is possible to obtain their full conditional distributions.
Landim and Gamerman (2000) showed that the full conditional distribution of each of the
parameters β_{i,t}, t = 1, ..., n, i = 1, ..., k, is Normal, and that the full conditional
distribution of each of the variance parameters V_1, V_2, ..., V_k, W and Σ is Inverted
Wishart. That is, it is possible to obtain samples from the posterior distribution of
these parameters using Gibbs sampling, which is an MCMC algorithm. Details of how to
obtain the full conditional distributions and of the proposed algorithm can be seen in
Landim and Gamerman (2000).
4.1.4 Application to the occupied population and average salary in Brazil
Landim and Gamerman (2000) presented an application jointly modeling the population
at work and the average salary for the largest metropolitan regions in Brazil: Rio
de Janeiro (RJ), Salvador (SAL), São Paulo (SP), Belo Horizonte (BH), Porto Alegre
(PA) and Recife (RE). The data were made available by the Instituto de Pesquisa
Econômica Aplicada (IPEA), in Brazil.

Observations were made monthly from May 1982 to December 1996. Preliminary analysis
showed a seasonal pattern for both variables in time. Landim and Gamerman (2000)
worked with the series aggregated every three months, giving a total of T = 56
observation periods.
The original data were log-transformed, which made the observations closer to Normal.
Also, the series of salaries was deflated using the INPC, a national consumer price
index, with December 1996 as the base; that way, all salaries are expressed in reais (R$)
of December 1996. Figure 1 shows the aggregated series of the logarithm of people at
work and of the average salary for the six metropolitan regions, showing that both
variables change significantly and smoothly in time. Note that this example concerns two
response variables varying not only in time but also in space. Preliminary analysis
showed that it is reasonable to consider that the observations follow an autoregressive
model in time, with coefficients varying in time and space.
Figure 1: (a) logarithm of people at work; and (b) logarithm of average salary, for each
one of the six regions through time.

Let y_t be a 6×2 matrix with element [i,1] representing the logarithm of the population
size in the ith region, and [i,2] representing the logarithm of the average salary in the
ith region, for a fixed period of time t. That way, the response variable in the model
can be represented by

y_t = ( y_{11t}  y_{12t}
          ⋮        ⋮
        y_{61t}  y_{62t} ).
Landim and Gamerman (2000) suggest a dynamic autoregressive model of order 1 to model this response matrix, where the observation equation for the element $[i,j]$ of $y_t$ is given by:
$$
y_{ijt} = \beta_{1,ijt} + \beta_{2,ijt}\, y_{i1,t-1} + \beta_{3,ijt}\, y_{i2,t-1} + \epsilon_{ijt},
$$
$i = 1, \cdots, 6$, $j = 1, 2$ and $t = 1, \cdots, 56$. In matrix notation, we can write:
$$
y_t = X_{1t}\beta_{1,t} + v_{1t}, \qquad v_{1t}|\Sigma \sim N(0, V_1, \Sigma),
$$
where $X_{1t}$ is given by:
$$
X_{1t} = \begin{pmatrix}
1 & y_{11,t-1} & y_{12,t-1} & 0 & \cdots & 0 & 0 & 0 \\
\vdots & & & & \ddots & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 & y_{61,t-1} & y_{62,t-1}
\end{pmatrix},
$$
and the matrix of autoregressive coefficients is given by:
$$
\beta_{1,t}^{T} = \begin{pmatrix}
\beta_{1,11t} & \beta_{2,11t} & \beta_{3,11t} & \cdots & \beta_{1,61t} & \beta_{2,61t} & \beta_{3,61t} \\
\beta_{1,12t} & \beta_{2,12t} & \beta_{3,12t} & \cdots & \beta_{1,62t} & \beta_{2,62t} & \beta_{3,62t}
\end{pmatrix}.
$$
For each $j = 1, 2$ and fixed period of time $t$, the parameters $\beta_1$, $\beta_2$ and $\beta_3$ are supposed to have the same mean for every location $i$, as specified by the equations below:
$$
\beta_{1,ijt} = \beta_{1,jt} + v_{21,ijt}, \qquad
\beta_{2,ijt} = \beta_{2,jt} + v_{22,ijt}, \qquad
\beta_{3,ijt} = \beta_{3,jt} + v_{23,ijt},
$$
where $i = 1, \cdots, 6$. That way, we can write the matrix $\beta_{1,t}$ in the form:
$$
\beta_{1,t} = X_{2t}\beta_{2,t} + v_{2t}, \qquad v_{2t}|\Sigma \sim N(0, V_2, \Sigma),
$$
where $X_{2t}$ is $18 \times 3$ and is given by
$$
X_{2t}^{T} = \begin{pmatrix}
1 & 0 & 0 & \cdots & 1 & 0 & 0 \\
0 & 1 & 0 & \cdots & 0 & 1 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 & 1
\end{pmatrix} = \mathbf{1}_6^{T} \otimes I_3,
$$
the hyperparameter matrix is given by
$$
\beta_{2,t} = \begin{pmatrix}
\beta_{1,1t} & \beta_{1,2t} \\
\beta_{2,1t} & \beta_{2,2t} \\
\beta_{3,1t} & \beta_{3,2t}
\end{pmatrix},
$$
and
$$
v_{2t}^{T} = \begin{pmatrix}
v_{21,11t} & v_{22,11t} & v_{23,11t} & \cdots & v_{21,61t} & v_{22,61t} & v_{23,61t} \\
v_{21,12t} & v_{22,12t} & v_{23,12t} & \cdots & v_{21,62t} & v_{22,62t} & v_{23,62t}
\end{pmatrix}.
$$
The evolution equation is given by:
$$
\beta_{2,t} = \beta_{2,t-1} + w_t, \qquad w_t \sim N(0, W, \Sigma).
$$
To complete the model, Inverted Wishart distributions were set as the prior distributions for the parameters $V_1$, $V_2$, $W$ and $\Sigma$, and a Normal distribution was set as the prior distribution for $\beta_{2,0}|\Sigma$.
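To make the three-level generative structure concrete, the sketch below simulates data from a model of this form with the dimensions of the application ($r = 6$ regions, $q = 2$ responses, $p = 3$ coefficients per region and response). It is a simplified illustration, not the fitted model: the matrix-variate noise terms $N(0, V, \Sigma)$ are replaced by independent Gaussian noise, and the scales `s_obs`, `s_beta1` and `s_w` are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
r, q, p = 6, 2, 3          # regions, responses, coefficients per region/response
T = 56                     # quarterly periods, as in the application

# Illustrative innovation scales (assumptions, not estimates from the paper)
s_obs, s_beta1, s_w = 0.05, 0.02, 0.01

beta2 = np.zeros((p, q))                   # hyperparameter matrix beta_{2,t}
y_prev = rng.normal(size=(r, q))           # arbitrary starting values y_0
X2 = np.kron(np.ones((r, 1)), np.eye(p))   # X_2t = 1_r (x) I_p, here 18 x 3

ys = []
for t in range(T):
    beta2 = beta2 + s_w * rng.normal(size=(p, q))              # evolution equation
    beta1 = X2 @ beta2 + s_beta1 * rng.normal(size=(r * p, q)) # second level
    # X_1t: row i holds (1, y_{i1,t-1}, y_{i2,t-1}) in its own block of columns
    X1 = np.zeros((r, r * p))
    for i in range(r):
        X1[i, i * p:(i + 1) * p] = [1.0, y_prev[i, 0], y_prev[i, 1]]
    y = X1 @ beta1 + s_obs * rng.normal(size=(r, q))           # observation equation
    ys.append(y)
    y_prev = y

ys = np.stack(ys)          # T x r x q array of simulated observations
print(ys.shape)            # (56, 6, 2)
```

Stacking the per-region coefficients into `beta1` and using the block structure of `X1` reproduces the matrix formulation of the observation equation while keeping each region's autoregression explicit.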
Landim and Gamerman (2000) used an MCMC algorithm to obtain samples from the posterior distribution of the unknown model parameters. After 59,500 iterations, convergence was achieved and the last 5,000 iterations were used as samples of the posterior distribution. They obtained credibility intervals for each unknown parameter of the model. One important finding was that the correlations in matrix $V_1$ were not significantly different from zero, which indicates independence between observations made in different regions. Besides that, there is a strong indication that matrix $V_2$ could be written as a block diagonal matrix. These hypotheses were incorporated into the model, and it was re-estimated; convergence in this case was faster. The posterior distributions of the coefficients $\beta_2$ showed that past observations of population size are a good explanatory variable for the present population size, but do not explain much of the average salary. In the same way, past observations of average salary explain part of the variability of the present average salary, but do not explain much of the population size. Matrix $\Sigma$ showed small but positive correlation between the response variables.
4.2 Spatially Structured Matrix-variate Dynamic Hierarchical Models
In this section we describe a class of models proposed by Paez et al. (2008) which are a particular case of the matrix-variate dynamic hierarchical models of Landim and Gamerman (2000), but impose a parametric structure to account for the spatial correlation between observations made at different locations in space. With that restriction, the spatial correlation can be captured without the need to estimate completely unknown covariance matrices. A spatially structured matrix can be specified with a small number of parameters and, as a consequence, interpolation of the response variable can be made, for any fixed period of time, to any set of non-observed locations in space.
Gelfand et al. (2005) use a similar approach to deal with this same kind of problem, where the temporal variation is described by dynamic components which capture the association between measurements made for a fixed location in space and period of time, as well as the dependence between space and time themselves. The multivariate spatial processes described in that paper were developed through the use of coregionalization techniques and non-separable space-time covariance structures. They applied the proposed methodology to model the variation of monthly maximum temperature in the State of Colorado, U.S.A., using the monthly mean precipitation as an explanatory variable.
Other applications of multilevel modeling taking spatial structures into consideration
can be seen in Chapter 33.
4.2.1 General Model Framework

The model presented in this section is a special case of the model described by equations (20) to (23). Without loss of generality, the model will be described for a two-level hierarchical dynamic model; a simple extension can be made to obtain models with more hierarchical levels. Suppose we observe $q$ ($q > 1$) response variables in a discrete set of periods of time $t = 1, \ldots, T$ and a set of locations $\{s_1, \ldots, s_r\}$ in a continuous space $S$. Analogously to equations (20) to (23), the matrix-variate hierarchical dynamic space-time model can be written as:
$$
y_t = X_{1t}\beta_{1,t} + v_{1t}, \qquad v_{1t} \sim N(0, V_1, \Sigma), \qquad (24)
$$
$$
\beta_{1,t} = X_{2t}\beta_{2,t} + v_{2t}, \qquad v_{2t} \sim N(0, V_2, \Sigma), \qquad (25)
$$
$$
\beta_{2,t} = G_t\beta_{2,t-1} + w_t, \qquad w_t \sim N(0, W, \Sigma), \qquad (26)
$$
for $t = 1, \ldots, T$. We assume that the matrices $X_{1t}$, $X_{2t}$ and $G_t$ are known, with $X_{1t}$ and $X_{2t}$ possibly incorporating the values of explanatory variables. The dimensions of the matrices in equations (24)-(26) are the same as specified previously for equations (20)-(23).
A spatial structure can be incorporated into the variance matrices $V_1$ and $V_2$ by specifying them through parametric structures describing spatial dependency. In particular, one can define $V_i = C_V$, which corresponds to $v_{it} \sim N(0, V_i, \Sigma)$, where $C_V$ is the matrix specified through the correlation function $\rho_i(\lambda_i, \cdot)$, $i = 1, 2$. For a fixed period of time $t$ and given the parameter $\lambda_i$, the correlation function $\rho_i(\lambda_i, \cdot)$ gives the correlation between the elements of $v_{it}$, $i = 1, 2$. This function can be appropriately specified in order to model spatial dependency, giving higher correlation between elements which correspond to observations made closer in space.
Paez et al. (2008) suggest independent Inverted Wishart prior distributions for the covariance matrices and a matrix-variate Normal prior distribution for $\beta_{2,0}|\Sigma$. A prior distribution must also be specified for the parameters $\lambda_i$, $i = 1, 2$, which define the spatial correlation matrices. Inference for this model, including the estimation of the model parameters, time forecasting and spatial interpolation, can be performed using MCMC methods as described in Paez et al. (2008).
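The full conditional distributions of the spatial correlation parameters are generally not of known form, so MCMC implementations typically update them with Metropolis-Hastings steps inside the Gibbs sampler. The sketch below is a minimal illustration of such a step, added here for concreteness: a random-walk proposal on the log scale for the range parameter of an exponential correlation function, applied to a toy one-dimensional data set. All names and values are assumptions for the example, not part of the algorithm of Paez et al. (2008).

```python
import numpy as np

rng = np.random.default_rng(2)

def log_post(phi, z, D):
    """Log-posterior (up to a constant) of range phi for z ~ N(0, C(phi)),
    with C = exp(-D/phi) and a vague 1/phi prior."""
    if phi <= 0:
        return -np.inf
    C = np.exp(-D / phi)
    sign, logdet = np.linalg.slogdet(C)
    quad = z @ np.linalg.solve(C, z)
    return -0.5 * (logdet + quad) - np.log(phi)

# toy data: 10 sites on a line, z drawn with true range 2.0
x = np.linspace(0, 9, 10)
D = np.abs(x[:, None] - x[None, :])
L = np.linalg.cholesky(np.exp(-D / 2.0))
z = L @ rng.normal(size=10)

phi, draws = 1.0, []
for _ in range(2000):
    prop = phi * np.exp(0.3 * rng.normal())   # log-scale random walk proposal
    # accept/reject; log(prop) - log(phi) is the Jacobian of the log transform
    if np.log(rng.uniform()) < (log_post(prop, z, D) - log_post(phi, z, D)
                                + np.log(prop) - np.log(phi)):
        phi = prop
    draws.append(phi)
print(np.mean(draws[500:]))
```

The same pattern (propose, evaluate the log-posterior ratio, accept or reject) would be applied to each $\lambda_i$ in turn, conditionally on the current values of the remaining parameters.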
4.2.2 Pollution in the Northeast of the United States
This example was presented in Paez et al. (2008), and refers to a data-set made available
by the Clean Air Status and Trends Network (CASTNet) at the web site www.epa.gov/castnet.
CASTNet is the primary source for data on dry acidic deposition and rural ground-level ozone in the United States. It consists of over 70 sites across the eastern and western United States and is cooperatively operated and funded by the United States Environmental Protection Agency (EPA) with the National Park Service.
When dealing with pollutants (and most environmental data-sets), the main interest is usually to explain some possible causes of variation as well as to perform predictions. In this particular example, the interest is to model and make predictions of the levels of two pollutants, SO$_2$ and NO$_3$, which were measured in micrograms per cubic meter of air ($\mu$g/m$^3$). Paez et al. (2008) work with observations made at 24 monitoring stations from the 1st of January 1998 to the 30th of December 2004, in a total of 342 periods of time. The data, however, is not completely available: about 4% of it is missing. Under the Bayesian approach, handling missing observations is straightforward, as they can be considered unknown parameters of the model and estimated if needed.

Figure 2: Map of the 24 monitoring stations in the northeast United States. The coordinates Latitude and Longitude are expressed in decimal degrees.
Figure 2 shows a map of the 24 monitoring stations in the northeast United States and Figure 3 shows the trajectories of the time series observed at five randomly selected locations in space, after a logarithmic transformation. No explanatory variables are available to explain the variation of these pollutants. However, it is clear from Figure 3 that part of the time variation can be explained by seasonal components, such as sine and cosine waves. Exploratory analysis clearly shows that the levels of the time series vary with the location where they were measured, and it can be noticed that the amplitude of their oscillation varies as well. The mean level of the series is also varying in time (it is slowly decreasing), as is their amplitude (which is slowly increasing). These variations suggest the use of a hierarchical model with a varying intercept and also varying coefficients of sine and cosine waves. It is intuitive, however, that these coefficients are not independent for each period of time or location in space, but rather vary smoothly in these dimensions. This should also be considered in the model specification.

Figure 3: Series of log(SO$_2$) and log(NO$_3$) through time for five randomly selected monitoring stations.
A logarithmic transformation was applied to the observed levels of both pollutants. Preliminary analysis showed strong correlation between the transformed response variables log(SO$_2$) and log(NO$_3$). Paez et al. (2008) worked with observations from 22 monitoring stations and left the other 2 (stations LRL and SPD, which can be seen in Figure 2) to validate the models.
The authors compared the interpolation performance of two univariate models (one for each response variable) and one bivariate model (in which the two response variables are jointly modeled). They showed an advantage of working under the multivariate model, which was described by the authors through the following equations:
$$
y_t = X_{1t}\beta_{1,t} + v_{1t}, \qquad v_{1t} \sim N(0, I_r, \Sigma),
$$
$$
\beta_{1,t} = \beta_{2,t} + v_{2t}, \qquad v_{2t} \sim N(0, C_V, \Sigma),
$$
$$
\beta_{2,t} = \beta_{2,t-1} + w_t, \qquad w_t \sim N(0, W, \Sigma),
$$
where $y_t = (y_{1t}, y_{2t})$, with $y_{1t} = \log(SO_2)_t$ and $y_{2t} = \log(NO_3)_t$. Sine and cosine waves were used as explanatory variables to explain the seasonality present in the observations, so that for each location $s_i$, $X_{1t}(s_i) = (1, \sin(2\pi t/52), \cos(2\pi t/52))$, $i = 1, \cdots, 22$, with $X_{1t} = (X_{1t}(s_1), \cdots, X_{1t}(s_r))^T$. Note that this model is a special case of the model specified by equations (24)-(26), where $V_1 = I_r$, $V_2 = C_V$, $X_{2t} = I_3$ and $G_t = I_3$. The spatial structure is specified through a spatial correlation function which defines matrix $C$.
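As a concrete illustration, the weekly seasonal design matrix above can be assembled as follows. This is a minimal sketch: the station count and the 52-week period are taken from the text, and the function name is ours.

```python
import numpy as np

T, r = 342, 22   # weekly time periods and monitoring stations used for fitting

def X1(t):
    """X_1t(s_i) = (1, sin(2*pi*t/52), cos(2*pi*t/52)), identical across stations."""
    row = np.array([1.0, np.sin(2 * np.pi * t / 52), np.cos(2 * np.pi * t / 52)])
    return np.tile(row, (r, 1))   # r x 3 design matrix for period t

X = np.stack([X1(t) for t in range(1, T + 1)])
print(X.shape)   # (342, 22, 3)
```

Because the harmonic has period 52, the design repeats yearly: $X_{1,t}$ and $X_{1,t+52}$ are identical, so all temporal adaptation comes from the dynamic coefficients rather than the covariates.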
In this application, the spatial correlation function was specified in the Matérn family (Matérn, 1986; Handcock and Stein, 1993), which is given by
$$
C(i,j) = \frac{1}{2^{\kappa-1}\Gamma(\kappa)} \left(\frac{d_{ij}}{\phi}\right)^{\kappa} K_{\kappa}\!\left(\frac{d_{ij}}{\phi}\right), \qquad \phi, \kappa > 0,
$$
where $K_{\kappa}$ is the modified Bessel function of order $\kappa$. This is a flexible family, containing the exponential function ($\kappa = 0.5$) and the squared exponential function, which is the limiting case when $\kappa \to \infty$. $\phi$ is a range parameter that controls how fast the correlation decays with distance, and $\kappa$ is a smoothness (or roughness) parameter that controls geometrical properties of the random field; $d_{ij}$ is a measure of distance between the locations of observation.
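The Matérn correlation above can be evaluated with the modified Bessel function available in scipy. The sketch below (our illustration, with arbitrary distance values) also checks the $\kappa = 0.5$ special case against the exponential function:

```python
import numpy as np
from scipy.special import kv, gamma

def matern(d, phi, kappa):
    """Matern correlation for distances d, range phi, smoothness kappa."""
    d = np.asarray(d, dtype=float)
    out = np.ones_like(d)                  # correlation is 1 at zero distance
    nz = d > 0
    u = d[nz] / phi
    out[nz] = (u ** kappa) * kv(kappa, u) / (2 ** (kappa - 1) * gamma(kappa))
    return out

d = np.linspace(0.0, 5.0, 6)
print(matern(d, phi=1.0, kappa=0.5))   # kappa = 0.5 reduces to exp(-d/phi)
print(np.exp(-d))
```

Since $K_{1/2}(u) = \sqrt{\pi/(2u)}\,e^{-u}$, the two printed vectors agree, confirming the exponential special case.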
As the scale and range parameters in the Matérn family cannot be estimated consistently (Zhang, 2004), Paez et al. (2008) worked with a fixed value of $\kappa = 2.5$, and gave $\phi$ a prior distribution based on the reference prior of Berger, De Oliveira and Sansó (2001). The choice of $\kappa = 2.5$ was made by comparing forecast performance over a set of different values for this parameter. Non-informative priors were specified for the other unknown parameters of the model.

Figure 4: Posterior median and 95% credibility intervals for the elements of matrix $\beta_2$ through time.
Samples from the posterior distributions of the parameters were obtained through an MCMC algorithm. The elements of matrix $\beta_2$ were shown to change significantly in time, as can be seen in Figure 4, for both log(SO$_2$) and log(NO$_3$). Thus, the use of temporally varying coefficients seems to be justified in this application. An interpolation of $\beta_1$ was performed on an equally spaced grid of points, for a fixed period of time $t = 342$. Figure 5 shows significant spatial variation of the elements of $\beta_1$, which supports the importance of allowing these parameters to vary in this dimension. Thus, the use of spatially varying coefficients also seems to be justified in this application. The variation of these elements is smooth in space, especially for the coefficients of sine and cosine of log(NO$_3$).

Figure 5: Posterior median of the elements of matrix $\beta_1$ through space.

Based on the interpolated coefficients $\beta_1$, the response process $y_t$ is also interpolated at $t = 342$. Each value sampled from $y_{1,342}$ and $y_{2,342}$ received an exponential transformation, so that a sample of values in the original scale of the pollutants SO$_2$ and NO$_3$ was obtained. The spatial variation of the posterior median of these pollutants is shown, in the original scale, in Figure 6.

Figure 6: Predicted surface of SO$_2$ and NO$_3$ levels ($\mu$g/m$^3$) for $t = 342$.

Finally, Figures 7 and 8 show the predictions (posterior median and 95% credibility intervals) made for SO$_2$ (Figure 7) and NO$_3$ (Figure 8) at the two stations (SPD and LRL) which were used for validation. The true observed trajectories are also shown for comparison. The obtained predictions are good, with most of the real observations falling inside the posterior credibility intervals.
Figure 7: SO$_2$ levels ($\mu$g/m$^3$) observed at stations SPD and LRL, for $t = 1, \cdots, 342$, and the posterior median and 95% credibility intervals obtained for SO$_2$. The true trajectory is in gray.

4.3 Dynamic Hierarchical Models: Exponential Family Observations

The methodology introduced here was presented by Hansen (2009) and is an extension of the previous models to observations in the exponential family. The motivation for this extension is that it is sometimes not possible to work under the hypothesis of normality of the observations, which could otherwise be modeled through the hierarchical dynamic models with spatial structure proposed by Paez et al. (2008). Measurements of rainfall are an example of observations that could benefit from this new approach, as they cannot be assumed to be normally distributed. It is reasonable, however, to assume that they come from another distribution belonging to the exponential family.
4.3.1 General Model Framework

Consider a set of discrete periods of time $t = 1, \cdots, T$, where for every $t$, $q$ variables are observed at $r$ different locations in space $s_1, \cdots, s_r$. Let $y_t$ be the $r \times q$ observation matrix, and suppose that the distribution of $y_t$ belongs to the exponential family. A family of distributions is said to belong to the exponential family if the probability density function (or probability mass function, for discrete distributions) can be written as:
$$
f(x|\lambda) = h(x)\exp\left(\sum_{i=1}^{s}\eta_i(\lambda)T_i(x) - A(\lambda)\right),
$$
where $h(x)$, $A(\lambda)$, $T_i(x)$ and $\eta_i(\lambda)$, $i = 1, \cdots, s$, are known functions. We will use the notation $X \sim EF(\mu)$ to denote that the random variable (either a scalar, vector or matrix) $X$ follows a distribution from the exponential family with mean $\mu$.

Figure 8: NO$_3$ levels ($\mu$g/m$^3$) observed at stations SPD and LRL, for $t = 1, \cdots, 342$, and the posterior median and 95% credibility intervals obtained for NO$_3$. The true trajectory is in gray.
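As an example of this representation (a worked illustration added here, using the shape-rate parameterization $\lambda = (\alpha, \beta)$ with $s = 2$), the Gamma density used in the application of Section 4.3.2 can be written in this form:

```latex
f(x \mid \alpha, \beta)
  = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}
  = \underbrace{1}_{h(x)}
    \exp\!\Big(
      \underbrace{(\alpha-1)}_{\eta_1(\lambda)}\,\underbrace{\log x}_{T_1(x)}
      + \underbrace{(-\beta)}_{\eta_2(\lambda)}\,\underbrace{x}_{T_2(x)}
      - \underbrace{\big(\log\Gamma(\alpha) - \alpha\log\beta\big)}_{A(\lambda)}
    \Big), \qquad x > 0.
```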
Suppose that $y_t$ has mean $\mu_t$, and that $\mu_t$ can be modeled through a function of a regression equation in which the covariate effects vary smoothly through time and space. This function is called the link function, and it links the linear predictor to the mean of the distribution, as in generalized linear models (Nelder and Wedderburn, 1972). The model can be specified as below:
$$
y_t \sim EF(\mu_t), \qquad (27)
$$
$$
g(\mu_t) = X_{1t}\beta_{1,t} + v_{1t}, \qquad v_{1t} \sim N(0, V_{1t}, \Sigma), \qquad (28)
$$
$$
\beta_{1,t} = X_{2t}\beta_{2,t} + v_{2t}, \qquad v_{2t} \sim N(0, V_{2t}, \Sigma), \qquad (29)
$$
$$
\beta_{2,t} = G_t\beta_{2,t-1} + w_t, \qquad w_t \sim N(0, W, \Sigma), \qquad (30)
$$
where $g(\mu_t)$ is a known link function and the other quantities are defined as in (24)-(26).
4.3.2 Application: Rainfall data-set in Australia

In this section we present an application of the models presented above to a single response variable consisting of measurements of rainfall at $r = 15$ monitoring stations in Australia (Hansen, 2009). Here the observations can be assumed to come from a Gamma distribution, which belongs to the exponential family. We denote by $y_t(s_i)$ the amount of rainfall observed at time $t$ and location $s_i$, and assume that $y_t(s_i)$ follows a Gamma distribution with mean $\phi_t(s_i)$ and coefficient of variation $\eta$, denoted by $y_t(s_i) \sim G(\phi_t(s_i), \eta)$.

Exploratory analysis showed that the observations present a clear seasonal cycle of one year. Hansen (2009) worked with rainfall amounts aggregated by year to eliminate the seasonal effect. No explanatory variables were considered in the model; another possibility would be to consider seasonal covariates. To model the rainfall observations, capturing time and space dependencies, Hansen (2009) proposed the use of a particular case of the model in (27)-(30) for a univariate response, where $X_{1t} = I_r$, $X_{2t} = \mathbf{1}_r$, $G_t = 1$, $V_{1t} = 0 \times I_r$, $V_{2t} = C$ and $W = \sigma_w^2$. A spatial structure is incorporated through the specification of the elements of matrix $C$. In this application, these elements are given by an exponential covariance function, $C[i,j] = \rho\exp\{-\lambda d_{i,j}\}$. Note that this is an isotropic covariance function, which depends on the locations of observation $s_i$ and $s_j$ only through the distance $d_{i,j}$ between them, and depends on two unknown parameters, $\rho$ and $\lambda$. Hansen (2009) also works with a logarithmic link function. The model can be written as follows:
$$
y_t(s_i) \sim G(\phi_t(s_i), \eta), \qquad (31)
$$
$$
\log(\phi_t(s_i)) = \beta_{1,t}(s_i), \qquad (32)
$$
$$
\beta_{1,t} = \beta_{2,t} + v_{2t}, \qquad v_{2t} \sim N(0, C), \qquad (33)
$$
$$
\beta_{2,t} = \beta_{2,t-1} + w_t, \qquad w_t \sim N(0, \sigma_w^2), \qquad (34)
$$
where $\beta_{1,t} = (\beta_{1,t}(s_1), \cdots, \beta_{1,t}(s_r))$. To complete the model specification, prior distributions for the unknown model parameters must be specified. Hansen (2009) works with vaguely informative priors. As in the previous applications, $\beta_{2,0}$ follows a Normal distribution and $W$ follows an Inverted Gamma distribution. The prior distribution for $\rho$, which is also a variance parameter, was set to be an Inverted Gamma as well. Gamma priors were specified for $\eta$ and $\lambda$. Samples of the posterior distribution of the unknown parameters were obtained through MCMC, using Gibbs Sampling and Metropolis-Hastings.
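A short simulation from the model (31)-(34) illustrates its generative structure. Station coordinates are hypothetical, the parameter values are set near the posterior medians reported in Table 1, and the Gamma distribution is parameterized through its mean and coefficient of variation:

```python
import numpy as np

rng = np.random.default_rng(3)
r, T = 15, 30
eta, sigma_w = 0.18, 0.08 ** 0.5   # values near the reported posterior medians

# spatial covariance C built from hypothetical station coordinates
coords = rng.uniform(0, 30, size=(r, 2))
D = np.sqrt(((coords[:, None] - coords[None, :]) ** 2).sum(-1))
C = 0.80 * np.exp(-0.08 * D)
Lc = np.linalg.cholesky(C)

beta2, y = 0.0, np.empty((T, r))
for t in range(T):
    beta2 += sigma_w * rng.normal()            # evolution: beta_{2,t}, eq. (34)
    beta1 = beta2 + Lc @ rng.normal(size=r)    # spatially correlated level, eq. (33)
    phi = np.exp(beta1)                        # log link: mean rainfall, eq. (32)
    shape = 1.0 / eta ** 2                     # Gamma with mean phi and
    y[t] = rng.gamma(shape, phi / shape)       # coefficient of variation eta, eq. (31)
print(y.shape, (y > 0).all())
```

A Gamma with shape $1/\eta^2$ and scale $\phi\eta^2$ has mean $\phi$ and coefficient of variation $\eta$, matching the parameterization used in the text.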
Table 1 shows statistics of the posterior distribution of the fixed parameters of the model.
parameter      mean    2.5%    median    97.5%
$\eta$         0.      0.09    0.18      0.22
$\rho$         0.80    0.67    0.79      0.94
$\lambda$      0.08    0.07    0.08      0.10
$\sigma_w^2$   0.08    0.05    0.08      0.14

Table 1: Statistics from the posterior samples obtained for the fixed parameters.
It can be noticed that $\lambda$ is centered around 0.08 and $\rho$ is centered around 0.8. Taking into account the distances between the monitoring stations, these estimated values define a covariance matrix $C$ with values that vary from 0.16 to 0.70. We thus conclude that the model captured some high correlations between observations made at different locations in space. The posterior samples obtained for $\sigma_w^2$ show high probability density around 0.08, meaning that the parameters $\beta_{2,t}$ do not vary much in time. The spatial variation of these parameters, however, is higher, with the estimated value of the spatial variance parameter $\rho$ being around ten times larger than the estimated value of $\sigma_w^2$.
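The range of covariance values quoted above can be checked by inverting $C[i,j] = \rho\exp\{-\lambda d_{i,j}\}$: assuming the posterior means as point estimates, the reported values 0.70 and 0.16 correspond to inter-station distances of roughly 1.7 and 20 distance units.

```python
import numpy as np

rho, lam = 0.80, 0.08                   # posterior means of the covariance parameters
d = lambda c: -np.log(c / rho) / lam    # distance implied by covariance value c
print(round(d(0.70), 1), round(d(0.16), 1))   # -> 1.7 20.1
```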
5 Concluding Remarks
In this chapter we presented a methodological review on hierarchical dynamic models and
generalizations of these models to accommodate multivariate responses, spatial variation
of regression coefficients and observations in the exponential family.
The models presented here are very flexible, permitting the smooth variation of regression coefficients in time and/or space, and they can be applied to model data in many areas of interest. In this chapter we presented examples and applications made with real data-sets, focusing mainly on environmental problems. The interest in this kind of application is usually to do time forecasting and interpolation in space, which can be easily done under the proposed methodology.
Inference is made under the Bayesian point of view. Usually for models like the
ones presented in this chapter, the posterior distributions of the unknown parameters are
not analytically tractable, and numerical methods must be used to approximate these
distributions. In the applications presented here, MCMC methods were used.
Acknowledgments
The authors would like to thank Dr. Aline Maia for providing us with the Australian rainfall data-set and Dr. Flávia Landim for providing us with the occupied population and mean salary data-set for the metropolitan regions in Brazil. They would also like to thank CNPq and FAPERJ for the financial support.
References
[1] Berger, J. O., De Oliveira, V. and Sansó, B. (2001). Objective Bayesian analysis of spatially correlated data. Journal of the American Statistical Association, 96, 1361-1374.
[2] Fahrmeir, L., Kneib, T. and Lang, S. (2004). Penalized structured additive regression for space-time data: a Bayesian perspective. Statistica Sinica, 14(3), 731-761.
[3] Gamerman, D. and Lopes, H. F. (2006). Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, 2nd edition. London: Chapman and Hall.
[4] Gamerman, D. and Migon, H. S. (1993). Dynamic hierarchical models. Journal of the Royal Statistical Society: Series B, 55(3), 629-642.
[5] Gelfand, A. E., Banerjee, S. and Gamerman, D. (2005). Spatial process modelling
for univariate and multivariate dynamic spatial data. Environmetrics,16, 465-479.
[6] Handcock, M. S. and Stein, M. L. (1993). A Bayesian analysis of kriging. Technometrics, 35, 403-410.
[7] Hansen, N. (2009). Models with dynamic coefficients varying in space for data in the exponential family. Unpublished MSc dissertation, IM, Universidade Federal do Rio de Janeiro (in Portuguese).
[8] Harrison, P. J. and Stevens, C. F. (1976). Bayesian forecasting. Journal of the Royal Statistical Society: Series B, 38, 205-247.
[9] Johannes, M., Polson, N.G. (2007) Exact Particle Filtering and Learning. Working
Paper. The University of Chicago Booth School of Business.
[10] Landim, F. M. P. F. and Gamerman, D. (2000). Dynamic hierarchical models: an extension to matrix-variate observations. Computational Statistics and Data Analysis, 35, 11-42.
[11] Liu, J. and West, M. (2001). Combined parameter and state estimation in simulation-based filtering. In Sequential Monte Carlo Methods in Practice. New York: Springer-Verlag.
[12] Carvalho, C. M., Johannes, M. S., Lopes, H. F. and Polson, N. G. (2010). Particle
Learning and Smoothing. Statistical Science,25(1), 88-106.
[13] Matérn, B. (1986). Spatial Variation, 2nd edition. Berlin: Springer-Verlag.
[14] Nelder, J. A. and Wedderburn, R.W.M. (1972). Generalized linear models. Journal
of the Royal Statistical Society: Series A,135, 370-384.
[15] Paez, M. S., Gamerman, D., Landim, F. M. P. F. and Salazar, E. (2008). Spatially
varying dynamic coefficient models. Journal of Statistical Planning and Inference,
138, 1038-1058.
[16] Quintana, J. (1987). Multivariate Bayesian forecasting models. Unpublished Ph.D. thesis, University of Warwick.
[17] Rue, H., Martino, S. and Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations (with discussion). Journal of the Royal Statistical Society: Series B, 71(2), 319-392.
[18] West, M. and Harrison, P. J. (1989). Bayesian Forecasting and Dynamic Models. New York: Springer.

[19] West, M. and Harrison, P. J. (1997). Bayesian Forecasting and Dynamic Models, 2nd edition. New York: Springer.
[20] Zhang, H. (2004). Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics. Journal of the American Statistical Association, 99, 250-261.