Akaike Causality in State Space
Part I - Instantaneous Causality Between Visual
Cortex in fMRI Time Series
K.F. Kevin Wong, Tohru Ozaki
December 21, 2006
Abstract
We present a new approach to explaining partial causality in multivariate fMRI time series by a state space model. A given single time series can be divided into two noise-driven processes: a homogeneous process shared among the multivariate time series and a particular process refining the homogeneous process. A causality map is drawn using Akaike noise contribution ratio theory, under the assumption that the noises are independent. The method is illustrated by an application to fMRI data recorded under visual stimulus.
Keywords: Akaike causality, noise contribution ratio, state space model, common source, partial causality, functional MRI, primary visual cortex, middle temporal cortex, posterior parietal cortex.
1 Introduction
For the purpose of causality analysis in multivariate time series data, Akaike (1968) decomposed the power spectral density into components, each coming from an independent noise of a multivariate autoregressive (VAR) model. Controversy over Akaike noise contribution ratio (NCR) causality mainly concerns the validity of the causality when the residuals are spatially highly correlated, a phenomenon reflected in large off-diagonal entries of the noise covariance matrix. When the driving noises are highly correlated instantaneously, the independence assumption on the noises is not adequate, and a non-zero noise covariance is essential to improve the time series model. This indispensable covariance suggests that the two corresponding time series are driven by similar noises, which apparently show an instantaneous causal relationship from one to the other, without a clue as to which is causing which.
The causality being discussed is known as instantaneous causality. Geweke (1982) tested the likelihood ratio in order to decide the significance of instantaneous causality. One deficiency is that the causality is no longer clear-cut when the model order gets high: feedback of the instantaneous causality through the autoregressive process occurs, and the instantaneous causality plays a role beyond the merely instantaneous.
We propose an alternative way to look at instantaneous causality, by a state space model (Wong, 2005). In particular, we assume, instead of an undirected causality between two variables, a directed causality from a latent variable to the two variables. We will model the fMRI by a linear autoregressive model plus a homogeneous variable in a state space framework.
2 fMRI data under visual stimulus
The data selected as an example to illustrate our new method are taken from a recent study of Yamashita et al. (2005). The time series of the BOLD signal of a healthy subject under visual stimulation was obtained in an fMRI scanner. A black screen was presented to the subject for 30 seconds; then white dots appeared on the black screen and flew outwards from the center of the screen for 30 seconds. The two screens alternated every 30 seconds. Detailed experimental and pre-processing procedures can be found in Yamashita et al. (2005).
Yamashita et al. (2005) selected three regions of interest: primary visual cortex (V1), visual cortex area 5 (V5) and posterior parietal cortex (PP). They are reported to respond to human attention to visual motion (Büchel & Friston, 1997). The primary visual cortex (V1) is the entrance of visual stimuli. Through V1, information is further transmitted to other visual areas, such as V2, V3, V4 and V5. The visual area V5, also known as visual area MT (middle temporal), is a region of extrastriate visual cortex that is thought to play a major role in the perception of motion. The posterior parietal cortex (PP) is another distinctive cortical area; it appears to be important for spatial processing and the control of eye movements, and may also have a central role in visual attention. We are interested in the connectivity among these areas in response to visual stimulus.
In figure 1 we show the time series data on a time axis in seconds. The data set contains four discontinuous segments. Each segment has 270 time points covering 270 seconds. Yamashita et al. (2005) analyzed the time series by a VAR, adding the onset of the stimulus as an exogenous variable to the model. They reported that strong connectivity exists from V1 to V5 and from V5 to PP at a period of 60 seconds, which is the time between the starting times of two consecutive stimuli.
3 Method and Result
We intend to fit the time series to a state space model and plot a causality map based on the model. A latent variable is included in the state vector in order to remove a common dynamic which drives the three cortical areas simultaneously. Nevertheless, the three individual driving noises representing the corresponding cortical areas retain mutual causality, through a feedback system provided by the transition matrix.

Figure 1: fMRI BOLD signals of V5, V1 and PP under visual stimuli (time axis in seconds, 0 to 1080).
Let $y_t$ denote the observed data and $x_t$ the unobserved state. We assume that $x_t$ depends on its past values through a linear stochastic model, containing a dynamical noise term, and that $y_t$ follows from $x_t$ through a linear observation model, containing an observation noise term; then the following state space model applies:
$$x_t = F x_{t-1} + G w_t \qquad (1)$$
$$y_t = H x_t + \varepsilon_t \qquad (2)$$
Equations (1) and (2) are commonly known as the system equation and the observation equation, respectively. Here $w_t$ denotes the dynamical noise term of the system equation, assumed to follow a multivariate Gaussian distribution $w_t \sim N(0, Q)$, while $\varepsilon_t$ denotes the observation noise term of the observation equation, assumed to follow a Gaussian distribution $\varepsilon_t \sim N(0, R)$.
Kalman (1960) introduced a filtering technique for state space models which can efficiently calculate the conditional prediction and the conditional filtered estimate of the unobserved states. Comprehensive introductions to state space models and Kalman filtering are given by Kalman (1960), Harrison & Stevens (1976), Harvey (1989) and Grewal & Andrews (2001).
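Under the model of Equations (1) and (2), one Kalman recursion can be sketched as follows (a minimal NumPy sketch; the function and variable names are ours, not from any implementation used in the paper):

```python
import numpy as np

def kalman_step(x, P, y, F, G, H, Q, R):
    """One Kalman recursion for x_t = F x_{t-1} + G w_t, y_t = H x_t + e_t."""
    # Prediction: x_{t|t-1} = F x_{t-1|t-1}, P_{t|t-1} = F P F' + G Q G'
    x_pred = F @ x
    P_pred = F @ P @ F.T + G @ Q @ G.T
    # Innovation v_t and its covariance S_t (these also give the Gaussian likelihood)
    v = y - H @ x_pred
    S = H @ P_pred @ H.T + R
    # Kalman gain and filtered estimate
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_filt = x_pred + K @ v
    P_filt = P_pred - K @ H @ P_pred
    return x_pred, P_pred, x_filt, P_filt, v, S
```

Iterating this over $t$ yields the one-step prediction errors $v_t$ and covariances $S_t$ from which the likelihood used for parameter estimation is computed.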
Since we aim at decomposing the time series into a common source component and a particular source component, we choose a special structure for the state space model, such that the last element of the state vector $x_t$ represents the common source component, while the preceding elements of the vector form a 3-variate AR model. By this we have a canonical form (Aoki, 1990) for the 3-variate AR and a coefficient for the common source along the diagonal of $F$. The 3-variate AR should capture the main characteristics of the time series, but the common source should only capture the instantaneous and simultaneous dynamics; therefore the coefficient for the common source should be small, for instance 0.05.
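This structure can be assembled mechanically from the AR coefficient matrices, the common-source loadings and the small common-source coefficient; a sketch under our reading of the canonical form (the helper name `build_F` is ours):

```python
import numpy as np

def build_F(A, c, rho):
    """Transition matrix of size (3p+1) x (3p+1): a canonical-form 3-variate
    AR(p) block plus one common-source state with AR(1) coefficient rho.

    A   : list of p coefficient matrices, each 3 x 3
    c   : length-3 loadings of the common source on the three series
    rho : small common-source coefficient, e.g. 0.05
    """
    p = len(A)
    m = 3 * p + 1
    F = np.zeros((m, m))
    for j, Aj in enumerate(A):
        F[3 * j:3 * j + 3, 0:3] = Aj                 # stacked AR coefficients, first block column
    F[0:3 * (p - 1), 3:3 * p] = np.eye(3 * (p - 1))  # shift blocks of the canonical form
    F[0:3, m - 1] = c                                # common source feeds the three series
    F[m - 1, m - 1] = rho                            # common source evolves as AR(1)
    return F
```

With p = 4 this reproduces the 13 × 13 layout of the estimate reported below.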
The model parameters in Equations (1) and (2) are estimated from the given data by the maximum likelihood method. Given a set of parameters, computation of the likelihood from the errors of the data prediction through application of the Kalman filter is straightforward; see Mehra (1971), Åström & Kallstrom (1973), Sorenson (1985) and Valdés-Sosa et al. (1999) for a detailed treatment. A maximum likelihood estimate for the state space model is as follows.
$$F = \begin{pmatrix}
3.0165 & 0.1486 & -0.0516 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0.0414 & 3.1754 & 0.0023 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1.1335 \\
0.0246 & 0.0690 & 3.1140 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0.9725 \\
-3.6868 & -0.3493 & 0.0693 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
-0.0307 & -4.0918 & 0.0083 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0.0236 & -0.1487 & -4.0049 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
2.2126 & 0.2975 & -0.0214 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
-0.0451 & 2.5811 & -0.0089 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
-0.0801 & 0.1102 & 2.5351 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
-0.5601 & -0.0873 & -0.0069 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.0348 & -0.6735 & -0.0007 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.0348 & -0.0219 & -0.6720 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.0500
\end{pmatrix},$$

$$G = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}, \qquad
H = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0
\end{pmatrix},$$

$$Q = \begin{pmatrix}
0.0460 & 0 & 0 & 0 \\
0 & 0.0515 & 0 & 0 \\
0 & 0 & 0.0631 & 0 \\
0 & 0 & 0 & 0.0335
\end{pmatrix}, \qquad
R = \begin{pmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{pmatrix}.$$
The Akaike information criterion (AIC), a value for comparing statistical models by weighing the likelihood function against the number of model parameters, is 965.3 (= 879.3 + 2 × 43) for the state space model, compared to 984.4 (= 900.4 + 2 × 42) for a VAR(4) with a full noise covariance matrix. This suggests that the state space model is the more suitable model for the time series.
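The comparison is simply AIC = −2 log L + 2k evaluated with the figures quoted above; as a sketch:

```python
def aic(minus_2_log_likelihood, n_params):
    """Akaike information criterion: -2 log L + 2k."""
    return minus_2_log_likelihood + 2 * n_params

aic_state_space = aic(879.3, 43)   # state space model with common source
aic_var4_full   = aic(900.4, 42)   # VAR(4) with full noise covariance matrix
# the model with the smaller AIC is preferred
```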
In figure 2(a) we show the spectra of the fMRI series of, from left to right, PP, V1 and V5, based on the estimated state space model. Each spectrum is composed of 4 colours, corresponding to the 4 system noises of the state space model. By the state space structure, green, red, yellow and black represent PP, V1, V5 and the common source, respectively. Through $F$, $G$ and $H$, the 4 noises contribute to the time series distinctively, as shown by the model spectra. Among the 3 spectra, that of V1 has the
Figure 2: Model spectra (a), NCR causality map (b) and partial NCR causality map (c) of the state space model (panels from left to right: PP, V1, V5; horizontal axis: frequency, 0 to 0.5).
highest power intensity, at around 0.02 (roughly a 50-second period oscillation), which can also be seen clearly in the data.
In figure 2(b) we show the NCR causality map, which is obtained by normalizing the spectra in (a). At each frequency the spectral power intensity is squeezed into 0% to 100%, so that the ratio of the contribution from each noise variance can be seen clearly at each frequency. Since most of the power intensity is concentrated in the interval 0 to 0.06, we shall explain causality based on this interval. The black colour is, by assumption, the noise driving the time series simultaneously. We can see that this common source explains over 50% of the power intensity at 0 Hz in all the spectra. It also shares over 50% of the power intensity in the lower frequency region of V1. Note that this common source has been introduced to the state space model through an AR process with coefficient 0.05, meaning that this noise does not provide an additional characteristic root to the transition matrix, but spares more room for the correlated residuals of the AR.
In figure 2(c) we show the partial NCR causality map, in which the contribution of the common source, i.e. black, is eliminated. The remaining colours tell the causality from the independent noises to the time series. V1 shows up around the low frequency range, indicating that the causality from V1 to PP and V5 is significant. PP causes V1 and V5 a little, mostly in the neighbourhood of 0.04-0.05 (a 20-25 s period oscillation), and at the same time V5 causes PP a little and V1 negligibly.
We compare the above result to the causality result from an AR whose noise variance matrix is diagonal, estimated by the least squares method. The AIC of an AR(4) with diagonal noise variance is 1452.7 (= 1374.7 + 2 × 39), a value much greater than the AIC of the state space model, meaning this AR(4) is less suitable for the time series.
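The diagonal-noise VAR used in this comparison can be fitted equation-by-equation by least squares; a minimal sketch (our own code, assuming zero-mean data, not the authors' implementation):

```python
import numpy as np

def fit_var(y, p):
    """Least-squares fit of a zero-mean VAR(p).

    y : T x d data array.  Returns (A, sigma2), where A[j] is the d x d
    coefficient matrix of lag j+1 and sigma2 the residual variances
    (the diagonal of the noise covariance matrix)."""
    T, d = y.shape
    # regressors: [y_{t-1}, ..., y_{t-p}] stacked side by side, for t = p..T-1
    X = np.hstack([y[p - j - 1:T - j - 1] for j in range(p)])
    Y = y[p:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    A = [B[d * j:d * (j + 1)].T for j in range(p)]
    sigma2 = (Y - X @ B).var(axis=0)
    return A, sigma2
```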
$$\begin{pmatrix} y^{(1)}_t \\ y^{(2)}_t \\ y^{(3)}_t \end{pmatrix} =
\begin{pmatrix} 3.0183 & 0.1499 & -0.0504 \\ 0.0433 & 3.1768 & 0.0036 \\ 0.0262 & 0.0701 & 3.1151 \end{pmatrix}
\begin{pmatrix} y^{(1)}_{t-1} \\ y^{(2)}_{t-1} \\ y^{(3)}_{t-1} \end{pmatrix}
+ \begin{pmatrix} -3.6896 & -0.3517 & 0.0674 \\ -0.0337 & -4.0944 & 0.0063 \\ 0.0212 & -0.1507 & -4.0064 \end{pmatrix}
\begin{pmatrix} y^{(1)}_{t-2} \\ y^{(2)}_{t-2} \\ y^{(3)}_{t-2} \end{pmatrix}$$
$$+ \begin{pmatrix} 2.2126 & 0.2985 & -0.0205 \\ -0.0453 & 2.5821 & -0.0080 \\ -0.0805 & 0.1108 & 2.5357 \end{pmatrix}
\begin{pmatrix} y^{(1)}_{t-3} \\ y^{(2)}_{t-3} \\ y^{(3)}_{t-3} \end{pmatrix}
+ \begin{pmatrix} -0.5589 & -0.0873 & -0.0070 \\ 0.0362 & -0.6734 & -0.0008 \\ 0.0360 & -0.0217 & -0.6720 \end{pmatrix}
\begin{pmatrix} y^{(1)}_{t-4} \\ y^{(2)}_{t-4} \\ y^{(3)}_{t-4} \end{pmatrix}
+ \begin{pmatrix} \eta^{(1)}_t \\ \eta^{(2)}_t \\ \eta^{(3)}_t \end{pmatrix},$$
$$\begin{pmatrix} \eta^{(1)}_t \\ \eta^{(2)}_t \\ \eta^{(3)}_t \end{pmatrix} \sim
N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 0.0797 & 0 & 0 \\ 0 & 0.0948 & 0 \\ 0 & 0 & 0.0949 \end{pmatrix} \right).$$
From this AR(4) we plot the model spectra in figure 3(a) and the NCR causality map in figure 3(b). To our surprise, this NCR causality map is very similar to that in figure 2(b). On one hand, this shows that the common source component was added to lessen the squared residuals but not, by our assumption, to take away any model characteristics; on the other hand, it assures us that our result in state space is consistent.
Figure 3: Model spectra (a) and NCR causality map (b) of the AR(4) model (horizontal axis: frequency, 0 to 0.5).
4 Discussion
We proposed a new method to apply Akaike causality in state space frame-
work so that the only limitation of Akaike causality is solved when residuals
of VAR is highly correlated. Correlation between noises in VAR is sorted
out as an additional independent noise homogeneously driving multivariate
time series in a state space framework. By comparing the AIC we found
that the state space model fits better than the VAR.
The idea in this paper can be further extended. Besides a common source component for all time series, pairwise or tuple-wise common source components can be added to the model. For instance, in this paper, in addition to the common source in black, we could introduce one more colour for a common source of V1 and V5 but not PP, so that the common source within the visual cortex can be further eliminated. More generally, we should also add the other two combinations of pairwise common source components.
However, time series in real applications often share common characteristics. A common characteristic greatly shared by two time series may also appear in the other time series, so it should not be treated as negligibly zero even though its strength is small. Therefore pairwise common variables could easily be absorbed by an overall homogeneous variable. See Tanokura & Kitagawa (2003) for a similar treatment.
Like any other causality theory, Akaike causality has to be based on a model. The goodness of the estimated model very much affects the causality conclusion. Therefore, before drawing any causality conclusion, much effort in finding a suitable model is necessary.
5 Appendix
5.1 State space model and its ARMA representation
Here we give the ARMA representation of a state space model. Referring to Equations (1) and (2), let $F$, $G$ and $H$ be of size $m \times m$, $m \times k$ and $\ell \times m$ respectively. Then $F$ has $m$ eigenvalues, hence a characteristic polynomial of order $m$, so that we can transform $F$ linearly to zero by the Cayley-Hamilton theorem:
$$F^m - \phi_1 F^{m-1} - \phi_2 F^{m-2} - \cdots - \phi_{m-1} F - \phi_m I = 0.$$
By this, a linear state space model can be transformed to a VARMA in terms of the observed data $y$ and noises $\eta$:
$$y_t - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \cdots - \phi_m y_{t-m} =
\Theta_0 \eta_t + \Theta_1 \eta_{t-1} + \Theta_2 \eta_{t-2} + \cdots + \Theta_{m-1} \eta_{t-m+1} + \Theta_m \eta_{t-m} \qquad (3)$$
Let $I$ be the identity matrix. Then
$$\Theta_0 = \begin{pmatrix} HG & I \end{pmatrix},$$
$$\Theta_1 = \begin{pmatrix} H(F - \phi_1 I)G & -\phi_1 I \end{pmatrix},$$
$$\Theta_2 = \begin{pmatrix} H(F^2 - \phi_1 F - \phi_2 I)G & -\phi_2 I \end{pmatrix},$$
$$\vdots$$
$$\Theta_{m-1} = \begin{pmatrix} H(F^{m-1} - \phi_1 F^{m-2} - \cdots - \phi_{m-2} F - \phi_{m-1} I)G & -\phi_{m-1} I \end{pmatrix},$$
$$\Theta_m = \begin{pmatrix} 0 & -\phi_m I \end{pmatrix},$$
$$\eta_{t-j} = \begin{pmatrix} w_{t-j} \\ \varepsilon_{t-j} \end{pmatrix} \sim N(0, \Sigma), \qquad
\Sigma = \begin{pmatrix} Q & 0 \\ 0 & R \end{pmatrix}.$$
The autoregressive coefficients of the VARMA are scalars, namely the coefficients of the characteristic equation of $F$. The moving average coefficients $\Theta$ are formed by two block matrices, of sizes $\ell \times k$ and $\ell \times \ell$, which depend only on $F$, $G$ and $H$. The noise vector $\eta$ is formed by stacking $w_t$ and $\varepsilon_t$ vertically. Note that the size of $\eta$ is not necessarily the same as that of $y$. Although the autoregressive part is molded identically for all variables in $y$, the moving average part refines each variable uniquely.
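These quantities can be computed recursively; a sketch assuming NumPy (`np.poly` returns the characteristic polynomial of $F$, and the recursion $M_j = F M_{j-1} - \phi_j I$ reproduces the bracketed matrices above):

```python
import numpy as np

def varma_from_state_space(F, G, H):
    """ARMA representation of x_t = F x_{t-1} + G w_t, y_t = H x_t + e_t.

    Returns (phi, Theta): scalar AR coefficients phi[j-1] = phi_j and MA
    blocks Theta[j] of size l x (k + l), as in Equation (3)."""
    m = F.shape[0]
    l = H.shape[0]
    # characteristic polynomial of F: lambda^m - phi_1 lambda^(m-1) - ... - phi_m
    phi = -np.poly(F)[1:]
    Theta = [np.hstack([H @ G, np.eye(l)])]      # Theta_0 = (HG  I)
    M = np.eye(m)                                # M_0 = I
    for j in range(1, m + 1):
        M = F @ M - phi[j - 1] * np.eye(m)       # M_j = F^j - phi_1 F^{j-1} - ... - phi_j I
        Theta.append(np.hstack([H @ M @ G, -phi[j - 1] * np.eye(l)]))
    return phi, Theta
```

By the Cayley-Hamilton theorem $M_m = 0$, so the left block of $\Theta_m$ vanishes, as stated above.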
5.2 Akaike causality for VARMA and State Space
Here we derive Akaike causality for a VARMA only. Akaike causality for state space follows directly by combining this result with the formulas in the previous subsection.
From Equation (3) we obtain the power spectral density matrix $P_f$ for a VARMA:
$$F_f(\Phi) = -I + \sum_{j=1}^{p} \Phi_j e^{-2ji\pi f}, \qquad
F_f(\Theta) = \sum_{j=0}^{q} \Theta_j e^{-2ji\pi f},$$
$$P_f = F_f(\Phi)^{-1}\, F_f(\Theta)\, \Sigma\, F_f(\Theta)^H \left\{ F_f(\Phi)^{-1} \right\}^H.$$
At each frequency $f$, the diagonal elements of $P_f$ are the spectral densities of the time series and the off-diagonal elements are the cross spectral densities. If $\Sigma$ is a diagonal matrix, each diagonal element of $P_f$ is a weighted sum of the diagonal elements of $\Sigma$. By this, Akaike NCR causality is defined as the proportion of the power from one noise variance to the power from all noise variances:
$$\mathrm{NCR}\left(\sigma^2, y_t\right) =
\frac{\text{spectral density going to } y_t \text{ from } \sigma^2}
{\text{total spectral density going to } y_t \text{ from all variances}}$$
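For a diagonal $\Sigma$ this ratio can be computed directly from the VARMA transfer function; a minimal sketch (our own code, not the authors'):

```python
import numpy as np

def ncr(f, Phi, Theta, Sigma):
    """Akaike noise contribution ratios of a VARMA at frequency f.

    Phi: AR matrices Phi_1..Phi_p; Theta: MA matrices Theta_0..Theta_q;
    Sigma: diagonal noise covariance.  Entry (i, j) of the result is the
    fraction of the spectral density of series i contributed by noise j;
    each row sums to 1."""
    n = Phi[0].shape[0]
    A = -np.eye(n) + sum(Pj * np.exp(-2j * np.pi * f * (jj + 1))
                         for jj, Pj in enumerate(Phi))
    B = sum(Tj * np.exp(-2j * np.pi * f * jj) for jj, Tj in enumerate(Theta))
    Tf = np.linalg.inv(A) @ B                     # transfer function, noises -> series
    contrib = (np.abs(Tf) ** 2) * np.diag(Sigma)  # power from each noise variance
    return contrib / contrib.sum(axis=1, keepdims=True)
```

Normalizing in this way at every frequency is exactly how the causality maps in figures 2(b) and 3(b) are drawn.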
Acknowledgements
The authors would like to thank Dr Okito Yamashita and Prof Norihiro
Sadato for providing the fMRI data, and special thanks to Prof Rolando
Biscay for his comments and guidance.
This work was supported by the Atsumi International Scholarship Foundation, the Iwatani Naoji Foundation, the Research Institute of Science and Technology for Society of the Japan Science and Technology Agency, and the Japan Society for the Promotion of Science through Kiban B no. 173000922301.
References
Akaike, H. (1968). On the use of a linear model for the identification of
feedback systems. Annals of the Institute of Statistical Mathematics 20
425–439.
Aoki, M. (1990). State Space Modeling of Time Series. New York: Springer-
Verlag.
Åström, K. J. & Kallstrom, C. G. (1973). Application of system identification techniques to the determination of ship dynamics. In P. Eykhoff, ed., Identification and system parameter estimation. Amsterdam: North-Holland.
Büchel, C. & Friston, K. J. (1997). Modulation of connectivity in visual pathways by attention: Cortical interactions evaluated with structural equation modelling and fMRI. Cereb. Cortex 7 768-778.
Geweke, J. F. (1982). Measurement of linear dependence and feedback
between multiple time series. Journal of the American Statistical Associ-
ation 77 304–324.
Grewal, M. S. & Andrews, A. P. (2001). Kalman Filtering: Theory and Practice Using MATLAB, 2nd edition. New York: Wiley.
Harrison, J. & Stevens, C. F. (1976). Bayesian forecasting (with dis-
cussion). Journal of the Royal Statistical Society, Series B 38 205–247.
Harvey, A. C. (1989). Forecasting, structural time series models and the Kalman filter. Cambridge: Cambridge University Press.
Kalman, R. E. (1960). A new approach to linear filtering and prediction
problems. Journal of Basic Engineering 82 35–45.
Mehra, R. K. (1971). Identification of stochastic linear dynamic systems.
American Institute of Aeronautics and Astronautics Journal 9 28–31.
Sorenson, H. W. (1985). Kalman Filtering: Theory and Application.
IEEE Press.
Tanokura, Y. & Kitagawa, G. (2003). Extended power contribution that can be applied without independence assumption. Tech. Rep. 886, The Institute of Statistical Mathematics.
Valdés-Sosa, P., Jimenez, J. C., Riera, J., Biscay, R. & Ozaki, T. (1999). Nonlinear EEG analysis based on a neural mass model. Biological Cybernetics 81 348-358.
Wong, K. F. K. (2005). Multivariate Time Series Analysis of Het-
eroscedastic Data with Application to Neuroscience. Ph.D. thesis, Grad-
uate University for Advanced Studies.
Yamashita, O., Sadato, N., Okada, N. & Ozaki, T. (2005). Evaluating frequency-wise directed connectivity of BOLD signals applying relative power contribution with the linear multivariate time series models. NeuroImage 25 478-490.